
Background: The reliable and usable (semi)automation of data extraction can support the field of systematic review by reducing the workload required to gather information about the conduct and results of the included studies. This living systematic review examines published approaches for data extraction from reports of clinical studies.

Methods: We systematically and continually search MEDLINE, Institute of Electrical and Electronics Engineers (IEEE), arXiv, and the dblp computer science bibliography databases. Full text screening and data extraction are conducted within an open-source living systematic review application created for the purpose of this review. This iteration of the living review includes publications up to a cut-off date of 22 April 2020.

Results: In total, 53 publications are included in this version of our review. Of these, 41 (77%) addressed extraction of data from abstracts, while 14 (26%) used full texts. A total of 48 (90%) publications developed and evaluated classifiers that used randomised controlled trials as the main target texts. Over 30 entities were extracted, with PICOs (population, intervention, comparator, outcome) being the most frequently extracted. A description of their datasets was provided by 49 publications (94%), but only seven (13%) made the data publicly available. Code was made available by 10 (19%) publications, and five (9%) implemented publicly available tools.

Conclusions: This living systematic review presents an overview of (semi)automated data-extraction literature of interest to different types of systematic review. We identified a broad evidence base of publications describing data extraction for interventional reviews and a small number of publications extracting epidemiological or diagnostic accuracy data. The lack of publicly available gold-standard data for evaluation, and lack of application thereof, makes it difficult to draw conclusions on which is the best-performing system for each data extraction target. With this living review we aim to review the literature continually.

In a systematic review, data extraction is the process of capturing key characteristics of studies in structured and standardised form based on information in journal articles and reports. It is a necessary precursor to assessing the risk of bias in individual studies and synthesising their findings. Interventional, diagnostic, or prognostic systematic reviews routinely extract information from a specific set of fields that can be predefined.¹ The most common fields for extraction in interventional reviews are defined in the PICO framework (population, intervention, comparison, outcome), and similar frameworks are available for other review types. The data extraction task can be time-consuming and repetitive when done by hand. This creates opportunities for support through intelligent software that identifies and extracts information automatically. When applied to the field of health research, this semi-automation sits at the interface between evidence-based medicine (EBM) and data science, and as described in the following section, interest in its development has grown in parallel with interest in AI in other areas of computer science.

1.1 Related systematic reviews and overviews

This review is, to the best of our knowledge, the only living systematic review of data extraction methods.
