Background: open access (OA) repositories of bibliographic records are increasing in size and coverage. Microsoft Academic Graph (MAG) contains more than 215 million OA bibliographic records from across science, connected in a ‘graph’ of citation and conceptual relationships. If MAG contains the large majority of studies needed for systematic reviews (SRs), then focus can shift away from Boolean searches of multiple databases towards data-mining of a single repository. Efficient data-mining of MAG, harnessing its graph structure and machine learning (ML) classifiers, could then achieve a step-change in the efficiency of study identification workflows for SRs.
Objectives: to assess the recall of MAG for: 1) study reports in the Cochrane Tobacco Addiction Group (TAG) Specialised Register; 2) study reports included in published TAG intervention reviews. To assess the performance of novel, semi-automated ‘MAG workflows’, designed for the efficient retrieval of study reports from MAG, in two use scenarios: 1) maintaining the TAG Specialised Register; 2) conducting and updating TAG intervention reviews.
Methods: we matched each TAG Specialised Register record to its corresponding MAG record, or else coded it as ‘not found in MAG’. We computed MAG recall and coded factors that could explain why a record was not found in MAG. These data were collected automatically using EPPI Reviewer software. MAG workflows combine ML, network graph analysis and conventional study selection methods. Evaluation of MAG workflows, compared with current standard methods, will use relative recall and economic evaluation methods to simulate impacts on recall, precision, workload and associated resource use.
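As an illustration of the measures named in the Methods, the following minimal Python sketch computes MAG recall over a set of matched register records, tallies coded reasons for records not found, and computes relative recall for one workflow against another. It is not the EPPI Reviewer implementation: the record fields (`register_id`, `mag_id`, `not_found_reason`), the function names and the sample data are all hypothetical, and relative recall is assumed here to mean the share of eligible reports found by either method that the workflow retrieves.

```python
from collections import Counter
from typing import Optional, TypedDict


class RegisterRecord(TypedDict):
    """Hypothetical shape of one register record after matching to MAG."""
    register_id: str
    mag_id: Optional[str]            # None means 'not found in MAG'
    not_found_reason: Optional[str]  # coded factor, only set when not found


def mag_recall(records: list[RegisterRecord]) -> float:
    """Proportion of register records that have a matching MAG record."""
    if not records:
        raise ValueError("no records supplied")
    found = sum(1 for r in records if r["mag_id"] is not None)
    return found / len(records)


def not_found_reasons(records: list[RegisterRecord]) -> Counter:
    """Tally the coded reasons for records not found in MAG."""
    return Counter(
        r["not_found_reason"] or "uncoded"
        for r in records
        if r["mag_id"] is None
    )


def relative_recall(workflow_ids: set[str], standard_ids: set[str]) -> float:
    """Eligible reports found by the workflow, divided by eligible reports
    found by either method (an assumed definition of relative recall)."""
    pooled = workflow_ids | standard_ids
    if not pooled:
        raise ValueError("no eligible reports found by either method")
    return len(workflow_ids) / len(pooled)


# Made-up sample data, for illustration only.
sample: list[RegisterRecord] = [
    {"register_id": "R1", "mag_id": "M1", "not_found_reason": None},
    {"register_id": "R2", "mag_id": "M2", "not_found_reason": None},
    {"register_id": "R3", "mag_id": None, "not_found_reason": "conference proceeding"},
    {"register_id": "R4", "mag_id": None, "not_found_reason": "trial registry record"},
]

sample_recall = mag_recall(sample)
sample_reasons = not_found_reasons(sample)
sample_relative = relative_recall({"R1", "R2"}, {"R2", "R3"})
```

With the sample data, two of four records are matched, and the workflow finds two of the three eligible reports identified by either method.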
Results: of 13,657 unique study reports included on the TAG Specialised Register, 82% are found in MAG. Of the 2,408 unique TAG Specialised Register study reports not found in MAG, more than 85% are conference proceedings, trial registry records, dissertations or dissertation abstracts, or other grey literature, while only 6% are published in a host journal not indexed in MAG. Of the study reports included in at least one TAG review, 86% are found in MAG; the reports found represent 94% of the studies included in at least one TAG review. We will report results from analyses of MAG workflow performance in TAG use scenarios.
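The headline recall figure can be checked from the two counts reported in the Results. The short sketch below only restates that arithmetic; the variable names are illustrative.

```python
# Counts taken from the Results above.
total_reports = 13_657   # unique study reports on the TAG Specialised Register
not_found = 2_408        # unique study reports not found in MAG

found = total_reports - not_found          # reports found in MAG
recall_percent = 100 * found / total_reports

print(f"{found} of {total_reports} found ({recall_percent:.1f}%)")
```

This gives 11,249 reports found, or about 82.4%, which matches the reported 82% after rounding.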
Conclusions: MAG has clear potential to become a primary OA resource from which eligible studies are identified for Cochrane and other SRs. Our MAG workflows are now ready for implementation and further evaluation in other Cochrane Review Groups and use scenarios. Implementing these workflows can substantively reduce the costs associated with identifying studies for Cochrane and other SRs.
Patient or healthcare consumer involvement: this is a methods research study with no direct patient or healthcare consumer involvement. Its findings are expected to help ensure that SRs can more easily be kept up to date, with indirect benefits for all patients and healthcare consumers.