
Evidence Synthesis Guide: Risk of Bias by Study Design


Risk of Bias of Individual Studies


"Assessment of risk of bias is a key step that informs many other steps and decisions made in conducting systematic reviews. It plays an important role in the final assessment of the strength of the evidence." 1

Risk of Bias by Study Design (featured tools)

  • Systematic Reviews
  • Non-RCTs or Observational Studies
  • Diagnostic Accuracy
  • Animal Studies
  • Qualitative Research
  • Tool Repository
  • AMSTAR 2 (A MeaSurement Tool to Assess systematic Reviews): The original AMSTAR was developed to assess the risk of bias in systematic reviews that included only randomized controlled trials. AMSTAR 2, published in 2017, allows researchers to identify high-quality systematic reviews, including those based on non-randomised studies of healthcare interventions.
  • ROBIS (Risk of Bias in Systematic Reviews): ROBIS is a tool designed specifically to assess the risk of bias in systematic reviews. The tool is completed in three phases: (1) assess relevance (optional), (2) identify concerns with the review process, and (3) judge risk of bias in the review. Signaling questions are included to help assess specific concerns about potential biases with the review.
  • BMJ Framework for Assessing Systematic Reviews: This framework provides a checklist for evaluating the quality of a systematic review.
  • CASP (Critical Appraisal Skills Programme) Checklist for Systematic Reviews: This CASP checklist is not a scoring system, but rather a method of appraising systematic reviews by considering: (1) Are the results of the study valid? (2) What are the results? (3) Will the results help locally?
  • CEBM (Centre for Evidence-Based Medicine) Systematic Reviews Critical Appraisal Sheet: The CEBM’s critical appraisal sheets are designed to help you appraise the reliability, importance, and applicability of clinical evidence.
  • JBI Critical Appraisal Tools, Checklist for Systematic Reviews: JBI Critical Appraisal Tools help you assess the methodological quality of a study and determine the extent to which a study has addressed the possibility of bias in its design, conduct, and analysis.
  • NHLBI (National Heart, Lung, and Blood Institute) Study Quality Assessment of Systematic Reviews and Meta-Analyses: The NHLBI’s quality assessment tools were designed to assist reviewers in focusing on concepts that are key for critical appraisal of the internal validity of a study.
  • RoB 2 (revised tool to assess Risk of Bias in randomized trials): RoB 2 provides a framework for assessing the risk of bias in a single estimate of an intervention effect reported from a randomized trial, rather than the entire trial.
  • CASP Randomised Controlled Trials Checklist: This CASP checklist considers various aspects of an RCT that require critical appraisal: (1) Is the basic study design valid for a randomized controlled trial? (2) Was the study methodologically sound? (3) What are the results? (4) Will the results help locally?
  • CONSORT (Consolidated Standards of Reporting Trials) Statement: The CONSORT checklist includes 25 items to determine the quality of randomized controlled trials. Critical appraisal of the quality of clinical trials is possible only if the design, conduct, and analysis of RCTs are thoroughly and accurately described in the report.
  • NHLBI Study Quality Assessment of Controlled Intervention Studies: The NHLBI’s quality assessment tools were designed to assist reviewers in focusing on concepts that are key for critical appraisal of the internal validity of a study.
  • JBI Critical Appraisal Tools, Checklist for Randomized Controlled Trials: JBI Critical Appraisal Tools help you assess the methodological quality of a study and determine the extent to which a study has addressed the possibility of bias in its design, conduct, and analysis.
  • ROBINS-I (Risk Of Bias in Non-randomized Studies – of Interventions): ROBINS-I is a tool for evaluating risk of bias in estimates of the comparative effectiveness of interventions from studies that did not use randomization to allocate units to comparison groups.
  • NOS (Newcastle-Ottawa Scale): This tool is used primarily to evaluate and appraise case-control or cohort studies.
  • AXIS (Appraisal tool for Cross-Sectional Studies): Cross-sectional studies are frequently used as an evidence base for diagnostic testing, risk factors for disease, and prevalence studies. The AXIS tool focuses mainly on the presented study methods and results.
  • NHLBI Study Quality Assessment Tools for Non-Randomized Studies: The NHLBI’s quality assessment tools were designed to assist reviewers in focusing on concepts that are key for critical appraisal of the internal validity of a study. Tools include the Quality Assessment Tool for Observational Cohort and Cross-Sectional Studies; Quality Assessment of Case-Control Studies; Quality Assessment Tool for Before-After (Pre-Post) Studies With No Control Group; and Quality Assessment Tool for Case Series Studies.
  • Case Series Studies Quality Appraisal Checklist: Developed by the Institute of Health Economics (Canada), the checklist comprises 20 questions to assess the robustness of the evidence of uncontrolled case series studies.
  • Methodological Quality and Synthesis of Case Series and Case Reports: In this paper, Dr. Murad and colleagues present a framework for appraisal, synthesis, and application of evidence derived from case reports and case series.
  • MINORS (Methodological Index for Non-Randomized Studies): The MINORS instrument contains 12 items and was developed for evaluating the quality of observational or non-randomized studies. This tool may be of particular interest to researchers who would like to critically appraise surgical studies.
  • JBI Critical Appraisal Tools for Non-Randomized Trials: JBI Critical Appraisal Tools help you assess the methodological quality of a study and determine the extent to which a study has addressed the possibility of bias in its design, conduct, and analysis. Checklists are available for Analytical Cross Sectional Studies, Case Control Studies, Case Reports, Case Series, and Cohort Studies.
  • QUADAS-2 (a revised tool for the Quality Assessment of Diagnostic Accuracy Studies): The QUADAS-2 tool is designed to assess the quality of primary diagnostic accuracy studies. It consists of four key domains covering patient selection, the index test, the reference standard, and the flow of patients through the study and timing of the index test and reference standard.
  • JBI Critical Appraisal Tools, Checklist for Diagnostic Test Accuracy Studies: JBI Critical Appraisal Tools help you assess the methodological quality of a study and determine the extent to which a study has addressed the possibility of bias in its design, conduct, and analysis.
  • STARD 2015 (Standards for the Reporting of Diagnostic Accuracy Studies): The authors of the standards note that essential elements of diagnostic accuracy study methods are often poorly described and sometimes completely omitted, making both critical appraisal and replication difficult, if not impossible. STARD 2015 was developed to help improve the completeness and transparency of reporting of diagnostic accuracy studies.
  • CASP Diagnostic Study Checklist: This CASP checklist considers various aspects of diagnostic test studies, including: (1) Are the results of the study valid? (2) What were the results? (3) Will the results help locally?
  • CEBM Diagnostic Critical Appraisal Sheet: The CEBM’s critical appraisal sheets are designed to help you appraise the reliability, importance, and applicability of clinical evidence.
  • SYRCLE’s RoB (SYstematic Review Center for Laboratory animal Experimentation’s Risk of Bias): Implementation of SYRCLE’s RoB tool will facilitate and improve critical appraisal of evidence from animal studies. This may enhance the efficiency of translating animal research into clinical practice and increase awareness of the necessity of improving the methodological quality of animal studies.
  • ARRIVE 2.0 (Animal Research: Reporting of In Vivo Experiments): The ARRIVE 2.0 guidelines are a checklist of information to include in a manuscript to ensure that publications on in vivo animal studies contain enough information to add to the knowledge base.
  • Critical Appraisal of Studies Using Laboratory Animal Models: This article provides an approach to critically appraising papers based on the results of laboratory animal experiments and discusses various bias domains in the literature that critical appraisal can identify.
  • CEBM Critical Appraisal of Qualitative Studies Sheet: The CEBM’s critical appraisal sheets are designed to help you appraise the reliability, importance, and applicability of clinical evidence.
  • CASP Qualitative Studies Checklist: This CASP checklist considers various aspects of qualitative research studies, including: (1) Are the results of the study valid? (2) What were the results? (3) Will the results help locally?
  • Quality Assessment and Risk of Bias Tool Repository: Created by librarians at Duke University, this extensive listing contains over 100 commonly used risk of bias tools that may be sorted by study type.
  • Latitudes Network: A library of risk of bias tools for use in evidence syntheses that provides selection help and training videos.

References & Recommended Reading

1. Viswanathan M, Patnode CD, Berkman ND, et al. Recommendations for assessing the risk of bias in systematic reviews of health-care interventions. J Clin Epidemiol. 2018;97:26-34.

2. Kolaski K, Logan LR, Ioannidis JPA. Guidance to best tools and practices for systematic reviews. Br J Pharmacol. 2024;181(1):180-210.

3. Fowkes FG, Fulton PM. Critical appraisal of published research: introductory guidelines. BMJ. 1991;302(6785):1136-1140.

4. Shea BJ, Reeves BC, Wells G, et al. AMSTAR 2: a critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ. 2017;358:j4008.

5. Whiting P, Savovic J, Higgins JPT, et al. ROBIS: a new tool to assess risk of bias in systematic reviews was developed. J Clin Epidemiol. 2016;69:225-234.

6. Sterne JAC, Savovic J, Page MJ, et al. RoB 2: a revised tool for assessing risk of bias in randomised trials. BMJ. 2019;366:l4898.

7. Moher D, Hopewell S, Schulz KF, et al. CONSORT 2010 explanation and elaboration: updated guidelines for reporting parallel group randomised trials. J Clin Epidemiol. 2010;63(8):e1-37.

8. Sterne JA, Hernan MA, Reeves BC, et al. ROBINS-I: a tool for assessing risk of bias in non-randomised studies of interventions. BMJ. 2016;355:i4919.

9. Downes MJ, Brennan ML, Williams HC, Dean RS. Development of a critical appraisal tool to assess the quality of cross-sectional studies (AXIS). BMJ Open. 2016;6(12):e011458.

10. Guo B, Moga C, Harstall C, Schopflocher D. A principal component analysis is conducted for a case series quality appraisal checklist. J Clin Epidemiol. 2016;69:199-207.

11. Murad MH, Sultan S, Haffar S, Bazerbachi F. Methodological quality and synthesis of case series and case reports. BMJ Evid Based Med. 2018;23(2):60-63.

12. Slim K, Nini E, Forestier D, Kwiatkowski F, Panis Y, Chipponi J. Methodological index for non-randomized studies (MINORS): development and validation of a new instrument. ANZ J Surg. 2003;73(9):712-716.

13. Whiting PF, Rutjes AWS, Westwood ME, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. 2011;155(8):529-536.

14. Bossuyt PM, Reitsma JB, Bruns DE, et al. STARD 2015: an updated list of essential items for reporting diagnostic accuracy studies. BMJ. 2015;351:h5527.

15. Hooijmans CR, Rovers MM, de Vries RBM, Leenaars M, Ritskes-Hoitinga M, Langendam MW. SYRCLE's risk of bias tool for animal studies. BMC Med Res Methodol. 2014;14:43.

16. Percie du Sert N, Ahluwalia A, Alam S, et al. Reporting animal research: explanation and elaboration for the ARRIVE guidelines 2.0. PLoS Biol. 2020;18(7):e3000411.

17. O'Connor AM, Sargeant JM. Critical appraisal of studies using laboratory animal models. ILAR J. 2014;55(3):405-417.

How to critically appraise a systematic review: an aide for the reader and reviewer


Conflicts of interest: H.W. founded the Cochrane Skin Group in 1987 and was coordinating editor until 2018. The other authors declare they have no conflicts of interest.

John Frewen, Marianne de Brito, Anjali Pathak, Richard Barlow, Hywel C Williams, How to critically appraise a systematic review: an aide for the reader and reviewer, Clinical and Experimental Dermatology , Volume 48, Issue 8, August 2023, Pages 854–859, https://doi.org/10.1093/ced/llad141


The number of published systematic reviews has soared rapidly in recent years. Sadly, the quality of most systematic reviews in dermatology is substandard. With the continued increase in exposure to systematic reviews, and their potential to influence clinical practice, we sought to describe a sequence of useful tips for the busy clinician reader to determine study quality and clinical utility. Important factors to consider when assessing systematic reviews include: determining the motivation for performing the study, establishing whether the study protocol was prepublished, assessing quality of reporting using the PRISMA checklist, assessing study quality using the AMSTAR 2 critical appraisal checklist, assessing for evidence of spin, and summarizing the main strengths and limitations of the study to determine if it could change clinical practice. Having a set of heuristics to consider when reading systematic reviews saves time, enables assessment of quality in a structured way, and allows the reader to come to a prompt conclusion about the merits of a review article in order to inform the care of dermatology patients.

A systematic review aims to systematically and transparently summarize the available data on a defined clinical question, via a rigorous search for studies, a critique of the quality of included studies and a qualitative and/or quantitative synthesis. 1 Systematic reviews are at the top of the pyramid in most evidence hierarchies for informing evidence-based healthcare as they are considered of greater validity and clinical applicability than those study types lower down, such as case series or individual trials. 2

A good systematic review should provide an unbiased overview of studies to inform clinical practice. Systematic reviews can reconcile apparently conflicting results, add precision to estimates of smaller treatment effects, highlight the evidence’s limitations and biases, and identify research gaps. Guidance is available, via the PRISMA checklist, to help systematic reviewers transparently report why the review was done, what the authors did, and what they found. 3

The sharp rise in systematic review publications over time raises concern that the majority are unnecessary, misleading and/or conflicted. 4 A review of dermatology systematic reviews noted that 93% failed to report at least one PRISMA checklist item. 5 Another review of a random sample of 140/732 dermatology systematic reviews in 2017 found that 90% were of low quality. 6 Some improvements have occurred: compliance with reporting standards improved slightly between 2013 and 2017, 5 and several leading dermatology journals including the British Journal of Dermatology have changed editorial policies, requiring authors to preregister review protocols.

Given the surge in poor-quality systematic review publications, we sought to describe a checklist of seven practical tips from the authors’ collective experience of writing and critically appraising systematic reviews, hoping that they will assist busy clinicians to critically appraise systematic reviews both as manuscript reviewers and as readers and research users.

1. Read the abstract to develop a sense of the subject.

2. What was the motivation for completing the review?

3. Has the review protocol been published and have changes been made to it?

4. Review the reporting quality.

5. Review the quality of the article and the depth of the review question.

6. Consider the authors’ interpretation and assess for spin.

7. Summarize and come to a position.

Read the abstract to develop a sense of the subject

From the abstract, use the PICO (population, intervention, comparator and outcome) framework to establish if the subject, intervention and outcomes are relevant to clinical practice. Is the review question clear and appropriate?

What was the motivation for completing the review?

Inspect the authors’ conflicts of interest and funding sources. Self-disclosed financial conflicts are often insufficiently described or not declared at all. 7 If you suspect conflicts for authors with no stated conflicts, briefly searching the senior authors’ names on PubMed or the Open Payments website (for US authors) may reveal hidden conflicts. 8 Is the motivation for the systematic review justified in the introduction? Can new insights be formed by combining studies? If the systematic review is an update, what new data justify this? Search for similar recent systematic reviews (which may have been omitted intentionally). Is it a redundant duplicate review that adds little new useful information? 9 Has the author recently published reviews on similar subjects? Salami publications refer to authors chopping up a topic into smaller pieces to obtain maximum publications. 10

Has the review protocol been published and have changes been made to it?

Search PROSPERO for publication of the review protocol. 11 A prepublished review protocol in a publicly accessible site offers reassurance that the systematic review followed a clear plan with prespecified PICO elements. Put bluntly, it reduces authors’ opportunity for deception by selective analysis and highlighting of results that are more likely to get published. If a protocol is found, assess deviation from this protocol and justification, if present. Protocol registration allows improved PRISMA reporting. 12 A registered protocol with reporting of deviations allows the reader to judge whether any modifications are justified, for example adjusting for unexpected challenges during analysis. 10

Review the reporting quality

Look for supplementary material detailing the PRISMA checklist. Commonly under-reported PRISMA items include protocol and registration, risk of bias across studies, risk of bias in individual studies, the data collection process and review objectives. 5 Adequate reporting quality using PRISMA does not necessarily indicate the review is clinically useful; however, it allows the reader to assess the study’s utility (see Table 1). Additional assessments of review quality are described below.

Table 1. The relationship between systematic review reporting quality and study qualityᵃ

| Reporting quality | Study quality: Good | Study quality: Flawed |
| --- | --- | --- |
| Clear | May be helpful for clinical practice | At least you can tell it is flawed and make a judgement on utility |
| Poor | A sparkling diamond – but how do you know? | Difficult to distinguish from a good but poorly reported study |

ᵃ Adapted with permission from Williams. 21

Review the quality of the article and the depth of the review question

Distinct from completeness of reporting, assessing the review’s quality allows assessment of the overall clinical meaningfulness of the results. Does the PICO make sense with respect to this? The AMSTAR 2 critical appraisal instrument is useful for determining the quality of quantitative systematic reviews. 13 This checklist covers the key aspects of a systematic review and yields an overall rating of review quality. 14 If meta-analysis was performed, did the authors justify and use appropriate methods for statistical combination of results? Were weighted techniques used to combine results, with adjustment for heterogeneity if present? If heterogeneity was present, were its sources investigated? Did the authors assess the potential impact of individual studies’ risk of bias (RoB) and perform analyses to investigate the impact of RoB on the summary estimates of effect? See Table 2 for an example of a completed AMSTAR 2 checklist for a recently published poor-quality systematic review. 15
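Before turning to the AMSTAR 2 example in Table 2, the statistical questions in this step (appropriate combination, weighting, heterogeneity) are easier to check if you know what the standard computation looks like. The following is a minimal illustrative sketch, not taken from this article, of fixed-effect inverse-variance pooling with Cochran’s Q and the I² heterogeneity statistic; the three effect estimates and standard errors are hypothetical.

```python
import math

def inverse_variance_pool(effects, ses):
    """Fixed-effect inverse-variance pooling with Cochran's Q and I^2.

    effects: per-study effect estimates (e.g., log risk ratios)
    ses:     their standard errors
    """
    weights = [1.0 / se ** 2 for se in ses]                            # w_i = 1 / SE_i^2
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    se_pooled = math.sqrt(1.0 / sum(weights))
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))   # Cochran's Q
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0                # I^2 as a percentage
    return pooled, se_pooled, q, i2

# Three hypothetical trials reporting log risk ratios and standard errors
pooled, se, q, i2 = inverse_variance_pool([-0.30, -0.10, -0.25], [0.12, 0.15, 0.20])
print(f"pooled log RR = {pooled:.3f} (SE {se:.3f}), Q = {q:.2f}, I^2 = {i2:.0f}%")
```

If I² is substantial, a reader should expect the review authors to have investigated sources of heterogeneity or used a random-effects model rather than the fixed-effect pooling sketched here.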

Table 2. An example of assessment of the quality of a systematic review (Drake et al.) 15 using the AMSTAR 2 checklist, an explanation of which can be found at https://amstar.ca/Amstar-2.php

| Checklist item | Response |
| --- | --- |
| 1. Did the research questions and inclusion criteria for the review include the components of PICO? | No |
| 2. Did the report of the review contain an explicit statement that the review methods were established prior to the conduct of the review and did the report justify any significant deviations from the protocol?ᵃ | No |
| 3. Did the review authors explain their selection of the study designs for inclusion in the review? | No |
| 4. Did the review authors use a comprehensive literature search strategy?ᵃ | Partial Yes |
| 5. Did the review authors perform study selection in duplicate? | Yes |
| 6. Did the review authors perform data extraction in duplicate? | Yes |
| 7. Did the review authors provide a list of excluded studies and justify the exclusions?ᵃ | No |
| 8. Did the review authors describe the included studies in adequate detail? | No |
| 9. Did the review authors use a satisfactory technique for assessing the risk of bias (RoB) in individual studies that were included in the review?ᵃ | No |
| 10. Did the review authors report on the sources of funding for the studies included in the review? | No |
| 11. If meta-analysis was performed, did the review authors use appropriate methods for statistical combination of results?ᵃ | N/A |
| 12. If meta-analysis was performed, did the review authors assess the potential impact of RoB in individual studies on the results of the meta-analysis or other evidence synthesis? | N/A |
| 13. Did the review authors account for RoB in individual studies when interpreting/discussing the results of the review?ᵃ | No |
| 14. Did the review authors provide a satisfactory explanation for, and discussion of, any heterogeneity observed in the results of the review? | No |
| 15. If they performed quantitative synthesis, did the review authors carry out an adequate investigation of publication bias (small study bias) and discuss its likely impact on the results of the review?ᵃ | N/A |
| 16. Did the review authors report any potential sources of conflict of interest, including any funding they received for conducting the review? | Yes |

N/A, not applicable; PICO, population, intervention, comparator and outcome. ᵃ Denotes an AMSTAR 2 critical domain. The overall confidence in the results of the review depends on these critical domains. When one critical domain is not satisfied, confidence is rated as ‘low’ and the review may not provide an accurate and comprehensive summary of the available studies that address the question of interest. When more than one critical domain is not satisfied, confidence in the results of the review is rated as ‘critically low’ and the review should not be relied on to provide an accurate and comprehensive summary of the available studies.
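The rating rule described in the footnote can be written out explicitly. The sketch below assumes the critical-domain list from the published AMSTAR 2 guidance (items 2, 4, 7, 9, 11, 13 and 15) and the ‘high’/‘moderate’ rules from Shea et al., 13 which the footnote itself does not spell out:

```python
# Critical domains per the published AMSTAR 2 guidance (Shea et al.)
CRITICAL = {2, 4, 7, 9, 11, 13, 15}

def amstar2_confidence(weaknesses):
    """Map the set of unsatisfied AMSTAR 2 items to an overall confidence rating."""
    critical_flaws = len(set(weaknesses) & CRITICAL)
    non_critical = len(set(weaknesses) - CRITICAL)
    if critical_flaws > 1:
        return "critically low"
    if critical_flaws == 1:
        return "low"
    # No critical flaws: rating turns on the number of non-critical weaknesses
    return "moderate" if non_critical > 1 else "high"

# Items answered "No" in the Table 2 example (Drake et al.)
print(amstar2_confidence({1, 2, 3, 7, 8, 9, 10, 13, 14}))  # -> critically low
```

Applied to the Table 2 responses, four critical domains fail, so confidence in that review is rated ‘critically low’.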

Quality checklists for assessment of qualitative research include Consolidated Criteria for Reporting Qualitative research (COREQ), Standards for Reporting Qualitative Research (SRQR) and Critical Appraisal Skills Programme (CASP). 16 Such checklists aim to improve identification of high-quality qualitative research in journal articles, as well as acting as a guide for conducting research. 16

Consider the authors’ interpretation and assess for spin

Spin is a distorted interpretation of results. It manifests in studies as (i) misleading reporting, (ii) misleading interpretation, and (iii) inappropriate extrapolation. 14 Do the conclusions make recommendations for clinical practice that are not supported by the studies’ findings? Is the title misleading? Is there selective reporting? These are the three most severe forms of spin occurring in systematic reviews. 17

Summarize and come to a position

Summarize the review’s main positives and negatives and establish whether its quality is sufficient to merit changing clinical practice, or whether fatal flaws nullify the review’s clinical utility. Consider internal validity (are the results true?) and external validity (are the results applicable to my patient group?). When applying the systematic review results to a particular patient, it may help to consider these points: (i) how similar are the study participants to my patient? (ii) do the outcomes make sense to me? (iii) what was the magnitude of treatment benefit? (work out the number needed to treat); 18 (iv) what are the adverse events? and (v) what are my patient’s values and preferences? 19
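Point (iii) rests on a simple calculation: NNT = 1 / ARR, where the absolute risk reduction (ARR) is the control event rate minus the treatment event rate, conventionally rounded up to a whole patient. A quick worked sketch with hypothetical rates, not drawn from any study cited here:

```python
import math

def number_needed_to_treat(control_event_rate, treated_event_rate):
    """NNT = 1 / absolute risk reduction, rounded up to a whole patient."""
    arr = control_event_rate - treated_event_rate
    if arr <= 0:
        raise ValueError("no absolute risk reduction; NNT is not defined")
    return math.ceil(1.0 / arr)

# Hypothetical: an adverse outcome falls from 40% (control) to 25% (treatment)
print(number_needed_to_treat(0.40, 0.25))  # ARR = 0.15 -> NNT = 7
```

So roughly seven patients would need the treatment for one additional patient to benefit, a figure that can then be weighed against adverse events and patient preferences.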

Although systematic reviews have potential for summarizing evidence for dermatological interventions in a systematic and unbiased way, the rapid expansion of poorly reported and poor-quality reviews (Table 3) is regrettable. We do not claim our checklist items (Table 4) are superior to other checklists such as those suggested by CASP, 20 but they are based on the practical experience of critical appraisal of dermatology systematic reviews conducted by the authors.

Table 3. The top seven ‘sins’ of dermatology systematic reviewsᵃ

| Systematic review sin type | Explanation | Solution |
| --- | --- | --- |
| Superficial | Narrow minor topics of questionable clinical value that are often done as student projects | Senior authors to show more responsibility and undertake some form of prioritization exercises with patients and clinicians |
| Salami | Chopping up a topic into several smaller pieces in order to obtain as many publications as possible | Editors to spot and decline potential salami topics and encourage broader reviews |
| Selective | Failing to register the protocol for a systematic review and only reporting the outcomes that look interesting | Funders and journals to make prospective registration on PROSPERO mandatory |
| Sloppy | Poorly reported reviews that fail to comply with basic PRISMA reporting guidance | Journal editors to require authors to complete the PRISMA checklist and to check those responses |
| Seen before | Covert duplication of existing reviews | Readers to expose and journals to investigate and retract if response inadequate |
| Specious | Reviews that give an air of spurious precision by presenting lots of numbers and statistical methods yet fail to engage with content expertise to make any sense of the topic | Review teams to include content experts; journals to employ an associate editor with systematic review expertise |
| Seriously wrong | Sausage factory reviews that get past journal editors, but which contain serious errors such as including the same study more than once in a meta-analysis | All systematic reviews with meta-analysis should be sent for statistical and content expertise review |

ᵃ Adapted with permission from Williams. 10

Table 4. Checklist of questions, considerations and tips for critical appraisal of systematic reviews

1. Read the abstract to develop a sense of the subject
  • Is there a clear PICO and is it relevant to clinical practice?
  • Is it clear and appropriate?

2. What was the motivation for completing the review?
  • Are there any conflicts of interest or financial considerations?
  • Does the introduction provide a compelling reason for the systematic review to be performed?
  • Are there other similar systematic reviews, perhaps not even referenced in this paper?

3. Has the review protocol been published and have changes been made to it?
  • Is this systematic review registered on PROSPERO?
  • Was the protocol adhered to and, if not, was this justified?

4. Review the reporting quality
  • Has a PRISMA checklist been completed and is this accurate?
  • Pay particular attention to reporting of bias.

5. Review the quality of the article and the depth of the review question
  • Consider using a formal checklist, e.g. AMSTAR 2.
  • If meta-analysis was performed, was it appropriate to combine the studies?
  • Were weighted techniques used to combine study results and adjusted for heterogeneity if present?
  • If heterogeneity was present, were sources of this investigated?
  • Did authors assess the potential impact of risk of bias from individual studies?

6. Consider the authors’ interpretation and assess for spin
  • Do the conclusions correlate with the results? (If not, is there misleading reporting, misleading interpretation or inappropriate extrapolation?)
  • Do the authors make recommendations for clinical practice which are not supported by the study’s findings?
  • Is the title misleading?
  • Is there evidence of selective reporting?

7. Summarize and come to a position
  • What are the main positives and negatives?
  • Consider the internal validity: are the results true?
  • If they are true, consider external validity: are the (true) results applicable to my patient group?
  • How similar are the study participants to my patient?
  • Do the outcomes make sense to me?
  • What was the magnitude of treatment effects? (Calculate the NNT.)
  • What were the adverse events?
  • What are my patients’ values and preferences?

NNT, number needed to treat.

Considering each question suggested in our checklist when faced with yet another systematic review allows the reader or reviewer to draw a timely conclusion about its quality and applicability to clinical practice. Although the checklist may sound exhaustive and time-consuming, we recommend cutting it short if there are major red flags early on, such as the absence of a protocol or of an assessment of RoB. Given the growing number of systematic reviews, having an efficient and succinct aide for appraising articles saves the reader time and energy, while simplifying the decision regarding what merits a change in clinical practice. Our intention is not to criticize others’ well-intentioned efforts, but to improve standards of reliable evidence to inform patient care.

Systematic reviews of randomized controlled trials offer one of the best methods to summarize the evidence surrounding therapeutic interventions for skin conditions.

The number of systematic reviews in the dermatology literature is increasing rapidly.

The quality of dermatology systematic reviews is generally poor.

We describe a checklist for the busy clinician or reviewer to consider when faced with a systematic review.

Key factors to consider include: determining the review motivation, establishing if the study protocol was prepublished, assessing quality of reporting using the PRISMA checklist, assessing study quality using the AMSTAR 2 critical appraisal checklist, and assessing for evidence of spin.

Summarizing the main qualities and limitations of a systematic review will help to determine if the review is robust enough to potentially change clinical practice for patient benefit.

This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.

No new data generated.

Ethical approval: not applicable. Informed consent: not applicable.

References

1. Moher D, Liberati A, Tetzlaff J, et al. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. Ann Intern Med 2009;151:264-9.

2. Murad MH, Asi N, Alsawas M, Alahdab F. New evidence pyramid. Evid Based Med 2016;21:125-7.

3. Page MJ, McKenzie JE, Bossuyt PM, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ 2021;372:n71.

4. Ioannidis JP. The mass production of redundant, misleading, and conflicted systematic reviews and meta-analyses. Milbank Q 2016;94:485-514.

5. Croitoru DO, Huang Y, Kurdina A, et al. Quality of reporting in systematic reviews published in dermatology journals. Br J Dermatol 2020;182:1469-76.

6. Smires S, Afach S, Mazaud C, et al. Quality and reporting completeness of systematic reviews and meta-analyses in dermatology. J Invest Dermatol 2021;141:64-71.

7. Baraldi JH, Picozzo SA, Arnold JC, et al. A cross-sectional examination of conflict-of-interest disclosures of physician-authors publishing in high-impact US medical journals. BMJ Open 2022;12:e057598.

8. Centers for Medicare & Medicaid Services. Open Payments search tool. Available at: https://openpaymentsdata.cms.gov/about (last accessed 22 April 2023).

9. Guelimi R, Afach S, Régnaux JP, et al. Overlapping network meta-analyses on psoriasis systemic treatments, an overview: quantity does not make quality. Br J Dermatol 2022;187:29-41.

10. Williams HC. Are dermatology systematic reviews spinning out of control? Dermatology 2021;237:493-5.

11. National Institute for Health and Care Research. About PROSPERO. Available at: https://www.crd.york.ac.uk/prospero/#aboutpage (last accessed 22 April 2023).

12. Barbieri JS, Wehner MR. Systematic reviews in dermatology: opportunities for improvement. Br J Dermatol 2020;182:1329-30.

13. Shea BJ, Reeves BC, Wells G, et al. AMSTAR 2: a critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ 2017;358:j4008.

14. AMSTAR. AMSTAR checklist. Available at: https://amstar.ca/Amstar_Checklist.php (last accessed 22 April 2023).

15. Drake L, Reyes-Hadsall S, Martinez J, et al. Evaluation of the safety and effectiveness of nutritional supplements for treating hair loss: a systematic review. JAMA Dermatol 2023;159:79-86.

16. Stenfors T, Kajamaa A, Bennett D. How to … assess the quality of qualitative research. Clin Teach 2020;17:596-9.

17. Yavchitz A, Ravaud P, Altman DG, et al. A new classification of spin in systematic reviews and meta-analyses was developed and ranked according to the severity. J Clin Epidemiol 2016;75:56-65.

18. Manriquez JJ, Villouta MF, Williams HC. Evidence-based dermatology: number needed to treat and its relation to other risk measures. J Am Acad Dermatol 2007;56:664-71.

19. Williams HC. Applying trial evidence back to the patient. Arch Dermatol 2003;139:1195-200.

20. CASP. CASP checklists. Available at: https://casp-uk.net/casp-tools-checklists/ (last accessed 22 April 2023).

21. Williams HC. Cars, CONSORT 2010, and clinical practice. Trials 2010;11:33.

Learning objective

To demonstrate up-to-date knowledge on assessing systematic reviews.

Which of the following critical appraisal checklists is useful for assessment of items that should be reported in a systematic review?

Which one of the following statements is correct?

The number of published systematic reviews in the dermatology literature is falling.

The quality of published dermatology systematic reviews is generally very good.

Publishing details of the PRISMA checklist in a systematic review indicates that the study quality is high.

External validity refers to the applicability of results to your patient group.

Internal validity refers to the applicability of results to your patient group.

Spin in systematic reviews can be described by which one of the following measures?

Authors declaring all conflicts of interest.

Title suggesting beneficial effect not supported by findings.

Adequate reporting of study limitations.

Conclusion formulating recommendations for clinical practice supported by findings.

Reporting a departure from study protocol that may modify interpretation of results.

PICO stands for which of the following?

PubMed, inclusion, comparator, outcome.

Population, items, comparator, outcome.

Population, intervention, context, observations.

Protocol, intervention, certainty, outcome.

Population, intervention, comparator, outcome.

Publication of a systematic review study protocol can be found at which source?

Cochrane Library.

ClinicalTrials.gov.

This learning activity is freely available online at https://oupce.rievent.com/a/TWWDCK

Users are encouraged to:

Read the article in print or online, paying particular attention to the learning points and any author conflict of interest disclosures.

Reflect on the article.

Register or login online at https://oupce.rievent.com/a/TWWDCK and answer the CPD questions.

Complete the required evaluation component of the activity.

Once the test is passed, you will receive a certificate and the learning activity can be added to your RCP CPD diary as a self-certified entry.

This activity will be available for CPD credit for 5 years following its publication date. At that time, it will be reviewed and potentially updated and extended for an additional period.


Systematic Reviews & Evidence Synthesis Methods

Critical Appraisal


Some reviews require a critical appraisal of each study that makes it through the screening process. This involves a risk of bias assessment and/or a quality assessment. The goal of these reviews is not just to find all of the studies, but to determine their methodological rigor and, therefore, their credibility.

"Critical appraisal is the balanced assessment of a piece of research, looking for its strengths and weaknesses and them coming to a balanced judgement about its trustworthiness and its suitability for use in a particular context." 1

It's important to consider the impact that poorly designed studies could have on your findings and to rule out inaccurate or biased work.

Selection of a valid critical appraisal tool, testing the tool with several of the selected studies, and involving two or more reviewers in the appraisal are good practices to follow.

1. Purssell E, McCrae N. How to Perform a Systematic Literature Review: A Guide for Healthcare Researchers, Practitioners and Students. 1st ed. Springer; 2020.

Evaluation Tools

  • The Appraisal of Guidelines for Research & Evaluation Instrument (AGREE II) Developed to address the issue of variability in the quality of practice guidelines.
  • Centre for Evidence-Based Medicine (CEBM). Critical Appraisal Tools "contains useful tools and downloads for the critical appraisal of different types of medical evidence. Example appraisal sheets are provided together with several helpful examples."
  • Critical Appraisal Skills Programme (CASP) Checklists Critical Appraisal checklists for many different study types
  • Critical Review Form for Qualitative Studies Version 2, developed out of McMaster University
  • Development of a critical appraisal tool to assess the quality of cross-sectional studies (AXIS) Downes MJ, Brennan ML, Williams HC, et al. Development of a critical appraisal tool to assess the quality of cross-sectional studies (AXIS). BMJ Open 2016;6:e011458. doi:10.1136/bmjopen-2016-011458
  • Downs & Black Checklist for Assessing Studies Downs, S. H., & Black, N. (1998). The Feasibility of Creating a Checklist for the Assessment of the Methodological Quality Both of Randomised and Non-Randomised Studies of Health Care Interventions. Journal of Epidemiology and Community Health (1979-), 52(6), 377–384.
  • GRADE The Grading of Recommendations Assessment, Development and Evaluation (GRADE) working group "has developed a common, sensible and transparent approach to grading quality (or certainty) of evidence and strength of recommendations."
  • Grade Handbook Full handbook on the GRADE method for grading quality of evidence.
  • MAGIC (Making GRADE the Irresistible choice) Clear succinct guidance in how to use GRADE
  • Joanna Briggs Institute. Critical Appraisal Tools "JBI’s critical appraisal tools assist in assessing the trustworthiness, relevance and results of published papers." Includes checklists for 13 types of articles.
  • Latitudes Network This is a searchable library of validity assessment tools for use in evidence syntheses. This website also provides access to training on the process of validity assessment.
  • Mixed Methods Appraisal Tool A tool that can be used to appraise a mix of studies that are included in a systematic review - qualitative research, RCTs, non-randomized studies, quantitative studies, mixed methods studies.
  • RoB 2 Tool Higgins JPT, Sterne JAC, Savović J, Page MJ, Hróbjartsson A, Boutron I, Reeves B, Eldridge S. A revised tool for assessing risk of bias in randomized trials In: Chandler J, McKenzie J, Boutron I, Welch V (editors). Cochrane Methods. Cochrane Database of Systematic Reviews 2016, Issue 10 (Suppl 1). dx.doi.org/10.1002/14651858.CD201601.
  • ROBINS-I Risk of Bias for non-randomized (observational) studies or cohorts of interventions Sterne J A, Hernán M A, Reeves B C, Savović J, Berkman N D, Viswanathan M et al. ROBINS-I: a tool for assessing risk of bias in non-randomised studies of interventions BMJ 2016; 355 :i4919 doi:10.1136/bmj.i4919
  • Scottish Intercollegiate Guidelines Network. Critical Appraisal Notes and Checklists "Methodological assessment of studies selected as potential sources of evidence is based on a number of criteria that focus on those aspects of the study design that research has shown to have a significant effect on the risk of bias in the results reported and conclusions drawn. These criteria differ between study types, and a range of checklists is used to bring a degree of consistency to the assessment process."
  • The TREND Statement (CDC) Des Jarlais DC, Lyles C, Crepaz N, and the TREND Group. Improving the reporting quality of nonrandomized evaluations of behavioral and public health interventions: The TREND statement. Am J Public Health. 2004;94:361-366.
  • Assembling the Pieces of a Systematic Reviews, Chapter 8: Evaluating: Study Selection and Critical Appraisal.
  • How to Perform a Systematic Literature Review, Chapter: Critical Appraisal: Assessing the Quality of Studies.

Other library guides

  • Duke University Medical Center Library. Systematic Reviews: Assess for Quality and Bias
  • UNC Health Sciences Library. Systematic Reviews: Assess Quality of Included Studies


Guidance to best tools and practices for systematic reviews

Kat Kolaski, Lynne Romeiser Logan & John P. A. Ioannidis

Systematic Reviews, volume 12, article number 96 (2023)


Data continue to accumulate indicating that many systematic reviews are methodologically flawed, biased, redundant, or uninformative. Some improvements have occurred in recent years based on empirical methods research and standardization of appraisal tools; however, many authors do not routinely or consistently apply these updated methods. In addition, guideline developers, peer reviewers, and journal editors often disregard current methodological standards. Although extensively acknowledged and explored in the methodological literature, most clinicians seem unaware of these issues and may automatically accept evidence syntheses (and clinical practice guidelines based on their conclusions) as trustworthy.

A plethora of methods and tools are recommended for the development and evaluation of evidence syntheses. It is important to understand what these are intended to do (and cannot do) and how they can be utilized. Our objective is to distill this sprawling information into a format that is understandable and readily accessible to authors, peer reviewers, and editors. In doing so, we aim to promote appreciation and understanding of the demanding science of evidence synthesis among stakeholders. We focus on well-documented deficiencies in key components of evidence syntheses to elucidate the rationale for current standards. The constructs underlying the tools developed to assess reporting, risk of bias, and methodological quality of evidence syntheses are distinguished from those involved in determining overall certainty of a body of evidence. Another important distinction is made between those tools used by authors to develop their syntheses as opposed to those used to ultimately judge their work.

Exemplar methods and research practices are described, complemented by novel pragmatic strategies to improve evidence syntheses. The latter include preferred terminology and a scheme to characterize types of research evidence. We organize best practice resources in a Concise Guide that can be widely adopted and adapted for routine implementation by authors and journals. Appropriate, informed use of these is encouraged, but we caution against their superficial application and emphasize their endorsement does not substitute for in-depth methodological training. By highlighting best practices with their rationale, we hope this guidance will inspire further evolution of methods and tools that can advance the field.

Part 1. The state of evidence synthesis

Evidence syntheses are commonly regarded as the foundation of evidence-based medicine (EBM). They are widely accredited for providing reliable evidence and, as such, they have significantly influenced medical research and clinical practice. Despite their uptake throughout health care and ubiquity in contemporary medical literature, some important aspects of evidence syntheses are generally overlooked or not well recognized. Evidence syntheses are mostly retrospective exercises, they often depend on weak or irreparably flawed data, and they may use tools that have acknowledged or yet unrecognized limitations. They are complicated and time-consuming undertakings prone to bias and errors. Production of a good evidence synthesis requires careful preparation and high levels of organization in order to limit potential pitfalls [ 1 ]. Many authors do not recognize the complexity of such an endeavor and the many methodological challenges they may encounter. Failure to do so is likely to result in research and resource waste.

Given their potential impact on people’s lives, it is crucial for evidence syntheses to correctly report on the current knowledge base. In order to be perceived as trustworthy, reliable demonstration of the accuracy of evidence syntheses is equally imperative [ 2 ]. Concerns about the trustworthiness of evidence syntheses are not recent developments. From the early years when EBM first began to gain traction until recent times, when thousands of systematic reviews are published monthly [ 3 ], the rigor of evidence syntheses has always varied. Many systematic reviews and meta-analyses had obvious deficiencies because original methods and processes had gaps, lacked precision, and/or were not widely known. The situation has improved with empirical research concerning which methods to use and standardization of appraisal tools. However, given the geometrical increase in the number of evidence syntheses being published, a relatively larger pool of unreliable evidence syntheses is being published today.

Publication of methodological studies that critically appraise the methods used in evidence syntheses is increasing at a fast pace. This reflects the availability of tools specifically developed for this purpose [ 4 , 5 , 6 ]. Yet many clinical specialties report that alarming numbers of evidence syntheses fail on these assessments. The syntheses identified report on a broad range of common conditions including, but not limited to, cancer, [ 7 ] chronic obstructive pulmonary disease, [ 8 ] osteoporosis, [ 9 ] stroke, [ 10 ] cerebral palsy, [ 11 ] chronic low back pain, [ 12 ] refractive error, [ 13 ] major depression, [ 14 ] pain, [ 15 ] and obesity [ 16 , 17 ]. The situation is even more concerning with regard to evidence syntheses included in clinical practice guidelines (CPGs) [ 18 , 19 , 20 ]. Astonishingly, in a sample of CPGs published in 2017–18, more than half did not apply even basic systematic methods in the evidence syntheses used to inform their recommendations [ 21 ].

These reports, while not widely acknowledged, suggest there are pervasive problems not limited to evidence syntheses that evaluate specific kinds of interventions or include primary research of a particular study design (eg, randomized versus non-randomized) [ 22 ]. Similar concerns about the reliability of evidence syntheses have been expressed by proponents of EBM in highly circulated medical journals [ 23 , 24 , 25 , 26 ]. These publications have also raised awareness about redundancy, inadequate input of statistical expertise, and deficient reporting. These issues plague primary research as well; however, there is heightened concern for the impact of these deficiencies given the critical role of evidence syntheses in policy and clinical decision-making.

Methods and guidance to produce a reliable evidence synthesis

Several international consortiums of EBM experts and national health care organizations currently provide detailed guidance (Table 1 ). They draw criteria from the reporting and methodological standards of currently recommended appraisal tools, and regularly review and update their methods to reflect new information and changing needs. In addition, they endorse the Grading of Recommendations Assessment, Development and Evaluation (GRADE) system for rating the overall quality of a body of evidence [ 27 ]. These groups typically certify or commission systematic reviews that are published in exclusive databases (eg, Cochrane, JBI) or are used to develop government or agency sponsored guidelines or health technology assessments (eg, National Institute for Health and Care Excellence [NICE], Scottish Intercollegiate Guidelines Network [SIGN], Agency for Healthcare Research and Quality [AHRQ]). They offer developers of evidence syntheses various levels of methodological advice, technical and administrative support, and editorial assistance. Use of specific protocols and checklists is required for development teams within these groups, but their online methodological resources are accessible to any potential author.

Notably, Cochrane is the largest single producer of evidence syntheses in biomedical research; however, these only account for 15% of the total [ 28 ]. The World Health Organization requires Cochrane standards be used to develop evidence syntheses that inform their CPGs [ 29 ]. Authors investigating questions of intervention effectiveness in syntheses developed for Cochrane follow the Methodological Expectations of Cochrane Intervention Reviews [ 30 ] and undergo multi-tiered peer review [ 31 , 32 ]. Several empirical evaluations have shown that Cochrane systematic reviews are of higher methodological quality compared with non-Cochrane reviews [ 4 , 7 , 9 , 11 , 14 , 32 , 33 , 34 , 35 ]. However, some of these assessments have biases: they may be conducted by Cochrane-affiliated authors, and they sometimes use scales and tools developed and used in the Cochrane environment and by its partners. In addition, evidence syntheses published in the Cochrane database are not subject to space or word restrictions, while non-Cochrane syntheses are often limited. As a result, information that may be relevant to the critical appraisal of non-Cochrane reviews is often removed or is relegated to online-only supplements that may not be readily or fully accessible [ 28 ].

Influences on the state of evidence synthesis

Many authors are familiar with the evidence syntheses produced by the leading EBM organizations but can be intimidated by the time and effort necessary to apply their standards. Instead of following their guidance, authors may employ methods that are discouraged or outdated [ 28 ]. Suboptimal methods described in the literature may then be taken up by others. For example, the Newcastle–Ottawa Scale (NOS) is a commonly used tool for appraising non-randomized studies [ 36 ]. Many authors justify their selection of this tool with reference to a publication that describes the unreliability of the NOS and recommends against its use [ 37 ]. Obviously, the authors who cite this report for that purpose have not read it. Authors and peer reviewers have a responsibility to use reliable and accurate methods, not to copy citations from previous or substandard work uncritically [ 38 , 39 ]. Similar cautions may potentially extend to automation tools. These have concentrated on evidence searching [ 40 ] and selection, given how demanding it is for humans to maintain truly up-to-date evidence [ 2 , 41 ]. Cochrane has deployed machine learning to identify randomized controlled trials (RCTs) and studies related to COVID-19 [ 2 , 42 ], but such tools are not yet commonly used [ 43 ]. The routine integration of automation tools in the development of future evidence syntheses should not displace the interpretive part of the process.

Editorials about unreliable or misleading systematic reviews highlight several of the intertwining factors that may contribute to continued publication of unreliable evidence syntheses: shortcomings and inconsistencies of the peer review process, lack of endorsement of current standards on the part of journal editors, the incentive structure of academia, industry influences, publication bias, and the lure of “predatory” journals [ 44 , 45 , 46 , 47 , 48 ]. At this juncture, clarification of the extent to which each of these factors contributes remains speculative, but their impact is likely to be synergistic.

Over time, the generalized acceptance of the conclusions of systematic reviews as incontrovertible has affected trends in the dissemination and uptake of evidence. Reporting of the results of evidence syntheses and recommendations of CPGs has shifted beyond medical journals to press releases and news headlines and, more recently, to the realm of social media and influencers. The lay public and policy makers may depend on these outlets for interpreting evidence syntheses and CPGs. Unfortunately, communication to the general public often reflects intentional or non-intentional misrepresentation or “spin” of the research findings [ 49 , 50 , 51 , 52 ]. News and social media outlets also tend to reduce conclusions on a body of evidence and recommendations for treatment to binary choices (eg, “do it” versus “don’t do it”) that may be assigned an actionable symbol (eg, red/green traffic lights, smiley/frowning face emoji).

Strategies for improvement

Many authors and peer reviewers are volunteer health care professionals or trainees who lack formal training in evidence synthesis [ 46 , 53 ]. Informing them about research methodology could increase the likelihood they will apply rigorous methods [ 25 , 33 , 45 ]. We tackle this challenge, from both a theoretical and a practical perspective, by offering guidance applicable to any specialty. It is based on recent methodological research that is extensively referenced to promote self-study. However, the information presented is not intended to be a substitute for committed training in evidence synthesis methodology; instead, we hope to inspire our target audience to seek such training. We also hope to inform a broader audience of clinicians and guideline developers influenced by evidence syntheses. Notably, these communities often include the same members who serve in different capacities.

In the following sections, we highlight methodological concepts and practices that may be unfamiliar, problematic, confusing, or controversial. In Part 2, we consider various types of evidence syntheses and the types of research evidence summarized by them. In Part 3, we examine some widely used (and misused) tools for the critical appraisal of systematic reviews and reporting guidelines for evidence syntheses. In Part 4, we discuss how to meet methodological conduct standards applicable to key components of systematic reviews. In Part 5, we describe the merits and caveats of rating the overall certainty of a body of evidence. Finally, in Part 6, we summarize suggested terminology, methods, and tools for development and evaluation of evidence syntheses that reflect current best practices.

Part 2. Types of syntheses and research evidence

A good foundation for the development of evidence syntheses requires an appreciation of their various methodologies and the ability to correctly identify the types of research potentially available for inclusion in the synthesis.

Types of evidence syntheses

Systematic reviews have historically focused on the benefits and harms of interventions; over time, various types of systematic reviews have emerged to address the diverse information needs of clinicians, patients, and policy makers [ 54 ]. Systematic reviews with traditional components have become defined by the different topics they assess (Table 2.1 ). In addition, other distinctive types of evidence syntheses have evolved, including overviews or umbrella reviews, scoping reviews, rapid reviews, and living reviews. The popularity of these has been increasing in recent years [ 55 , 56 , 57 , 58 ]. A summary of the development, methods, available guidance, and indications for these unique types of evidence syntheses is available in Additional File 2 A.

Both Cochrane [ 30 , 59 ] and JBI [ 60 ] provide methodologies for many types of evidence syntheses; they describe these with different terminology, but there is obvious overlap (Table 2.2 ). The majority of evidence syntheses published by Cochrane (96%) and JBI (62%) are categorized as intervention reviews. This reflects the earlier development and dissemination of their intervention review methodologies; these remain well-established [ 30 , 59 , 61 ] as both organizations continue to focus on topics related to treatment efficacy and harms. In contrast, intervention reviews represent only about half of the total published in the general medical literature, and several non-intervention review types contribute to a significant proportion of the other half.

Types of research evidence

There is consensus on the importance of using multiple study designs in evidence syntheses; at the same time, there is a lack of agreement on methods to identify included study designs. Authors of evidence syntheses may use various taxonomies and associated algorithms to guide selection and/or classification of study designs. These tools differentiate categories of research and apply labels to individual study designs (eg, RCT, cross-sectional). A familiar example is the Design Tree endorsed by the Centre for Evidence-Based Medicine [ 70 ]. Such tools may not be helpful to authors of evidence syntheses for multiple reasons.

Suboptimal levels of agreement and accuracy even among trained methodologists reflect challenges with the application of such tools [ 71 , 72 ]. Problematic distinctions or decision points (eg, experimental or observational, controlled or uncontrolled, prospective or retrospective) and design labels (eg, cohort, case control, uncontrolled trial) have been reported [ 71 ]. The variable application of ambiguous study design labels to non-randomized studies is common, making them especially prone to misclassification [ 73 ]. In addition, study labels do not denote the unique design features that make different types of non-randomized studies susceptible to different biases, including those related to how the data are obtained (eg, clinical trials, disease registries, wearable devices). Given this limitation, it is important to be aware that design labels preclude the accurate assignment of non-randomized studies to a “level of evidence” in traditional hierarchies [ 74 ].

These concerns suggest that available tools and nomenclature used to distinguish types of research evidence may not uniformly apply to biomedical research and non-health fields that utilize evidence syntheses (eg, education, economics) [ 75 , 76 ]. Moreover, primary research reports often do not describe study design or do so incompletely or inaccurately; thus, indexing in PubMed and other databases does not address the potential for misclassification [ 77 ]. Yet proper identification of research evidence has implications for several key components of evidence syntheses. For example, search strategies limited by index terms using design labels or study selection based on labels applied by the authors of primary studies may cause inconsistent or unjustified study inclusions and/or exclusions [ 77 ]. In addition, because risk of bias (RoB) tools consider attributes specific to certain types of studies and study design features, results of these assessments may be invalidated if an inappropriate tool is used. Appropriate classification of studies is also relevant for the selection of a suitable method of synthesis and interpretation of those results.

An alternative to these tools and nomenclature involves application of a few fundamental distinctions that encompass a wide range of research designs and contexts. While these distinctions are not novel, we integrate them into a practical scheme (see Fig. 1 ) designed to guide authors of evidence syntheses in the basic identification of research evidence. The initial distinction is between primary and secondary studies. Primary studies are then further distinguished by: 1) the type of data reported (qualitative or quantitative); and 2) two defining design features (group or single-case and randomized or non-randomized). The different types of studies and study designs represented in the scheme are described in detail in Additional File 2 B. It is important to conceptualize their methods as complementary as opposed to contrasting or hierarchical [ 78 ]; each offers advantages and disadvantages that determine their appropriateness for answering different kinds of research questions in an evidence synthesis.

Figure 1. Distinguishing types of research evidence

Application of these basic distinctions may avoid some of the potential difficulties associated with study design labels and taxonomies. Nevertheless, debatable methodological issues are raised when certain types of research identified in this scheme are included in an evidence synthesis. We briefly highlight those associated with inclusion of non-randomized studies, case reports and series, and a combination of primary and secondary studies.

Non-randomized studies

When investigating an intervention’s effectiveness, it is important for authors to recognize the uncertainty of observed effects reported by studies with high RoB. Results of statistical analyses that include such studies need to be interpreted with caution in order to avoid misleading conclusions [ 74 ]. Review authors may consider excluding randomized studies with high RoB from meta-analyses. Non-randomized studies of interventions (NRSI) are affected by a greater potential range of biases and thus vary more than RCTs in their ability to estimate a causal effect [ 79 ]. If data from NRSI are synthesized in meta-analyses, it is helpful to separately report their summary estimates [ 6 , 74 ].

Nonetheless, certain design features of NRSI (eg, which parts of the study were prospectively designed) may help to distinguish stronger from weaker ones. Cochrane recommends that authors of a review including NRSI focus on relevant study design features when determining eligibility criteria instead of relying on non-informative study design labels [ 79 , 80 ]. This process is facilitated by a study design feature checklist; guidance on using the checklist is included with developers’ description of the tool [ 73 , 74 ]. Authors collect information about these design features during data extraction and then consider it when making final study selection decisions and when performing RoB assessments of the included NRSI.

Case reports and case series

Correctly identified case reports and case series can contribute evidence not well captured by other designs [ 81 ]; in addition, some topics may be limited to a body of evidence that consists primarily of uncontrolled clinical observations. Murad and colleagues offer a framework for how to include case reports and series in an evidence synthesis [ 82 ]. Distinguishing between cohort studies and case series in these syntheses is important, especially for those that rely on evidence from NRSI. Additional data obtained from studies misclassified as case series can potentially increase the confidence in effect estimates. Mathes and Pieper provide authors of evidence syntheses with specific guidance on distinguishing between cohort studies and case series, but emphasize the increased workload involved [ 77 ].

Primary and secondary studies

Synthesis of combined evidence from primary and secondary studies may provide a broad perspective on the entirety of available literature on a topic. This is, in fact, the recommended strategy for scoping reviews that may include a variety of sources of evidence (eg, CPGs, popular media). However, except for scoping reviews, the synthesis of data from primary and secondary studies is discouraged unless there are strong reasons to justify doing so.

Combining primary and secondary sources of evidence is challenging for authors of other types of evidence syntheses for several reasons [ 83 ]. Assessments of RoB for primary and secondary studies are derived from conceptually different tools, thus obfuscating the ability to make an overall RoB assessment of a combination of these study types. In addition, authors who include primary and secondary studies must devise non-standardized methods for synthesis. Note this contrasts with well-established methods available for updating existing evidence syntheses with additional data from new primary studies [ 84 , 85 , 86 ]. However, a new review that synthesizes data from primary and secondary studies raises questions of validity and may unintentionally support a biased conclusion because no existing methodological guidance is currently available [ 87 ].

Recommendations

We suggest that journal editors require authors to identify which type of evidence synthesis they are submitting and reference the specific methodology used for its development. This will clarify the research question and methods for peer reviewers and potentially simplify the editorial process. Editors should announce this practice and include it in the instructions to authors. To decrease bias and apply correct methods, authors must also accurately identify the types of research evidence included in their syntheses.

Part 3. Conduct and reporting

The need to develop criteria to assess the rigor of systematic reviews was recognized soon after the EBM movement began to gain international traction [ 88 , 89 ]. Systematic reviews rapidly became popular, but many were very poorly conceived, conducted, and reported. These problems remain highly prevalent [ 23 ] despite development of guidelines and tools to standardize and improve the performance and reporting of evidence syntheses [ 22 , 28 ]. Table 3.1  provides some historical perspective on the evolution of tools developed specifically for the evaluation of systematic reviews, with or without meta-analysis.

These tools are often interchangeably invoked when referring to the “quality” of an evidence synthesis. However, quality is a vague term that is frequently misused and misunderstood; more precisely, these tools specify different standards for evidence syntheses. Methodological standards address how well a systematic review was designed and performed [ 5 ]. RoB assessments refer to systematic flaws or limitations in the design, conduct, or analysis of research that distort the findings of the review [ 4 ]. Reporting standards help systematic review authors describe the methodology they used and the results of their synthesis in sufficient detail [ 92 ]. It is essential to distinguish between these evaluations: a systematic review may be biased, it may fail to report sufficient information on essential features, or it may exhibit both problems; a thoroughly reported systematic review may still be biased and flawed, while an otherwise unbiased one may suffer from deficient documentation.

We direct attention to the currently recommended tools listed in Table 3.1  but concentrate on AMSTAR-2 (update of AMSTAR [A Measurement Tool to Assess Systematic Reviews]) and ROBIS (Risk of Bias in Systematic Reviews), which evaluate methodological quality and RoB, respectively. For comparison and completeness, we include PRISMA 2020 (update of the 2009 Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement), which offers guidance on reporting standards. The exclusive focus on these three tools is by design; it addresses concerns related to the considerable variability in tools used for the evaluation of systematic reviews [ 28 , 88 , 96 , 97 ]. We highlight the underlying constructs these tools were designed to assess, then describe their components and applications. Their known (or potential) uptake and impact and limitations are also discussed.

Evaluation of conduct

Development

AMSTAR [ 5 ] was in use for a decade prior to the 2017 publication of AMSTAR-2; both provide a broad evaluation of methodological quality of intervention systematic reviews, including flaws arising through poor conduct of the review [ 6 ]. ROBIS, published in 2016, was developed to specifically assess RoB introduced by the conduct of the review; it is applicable to systematic reviews of interventions and several other types of reviews [ 4 ]. Both tools reflect a shift to a domain-based approach as opposed to generic quality checklists. There are a few items unique to each tool; however, similarities between items have been demonstrated [ 98 , 99 ]. AMSTAR-2 and ROBIS are recommended for use by: 1) authors of overviews or umbrella reviews and CPGs to evaluate systematic reviews considered as evidence; 2) authors of methodological research studies to appraise included systematic reviews; and 3) peer reviewers for appraisal of submitted systematic review manuscripts. For authors, these tools may function as teaching aids and inform conduct of their review during its development.

Description

Systematic reviews that include randomized and/or non-randomized studies as evidence can be appraised with AMSTAR-2 and ROBIS. Other characteristics of AMSTAR-2 and ROBIS are summarized in Table 3.2 . Both tools define categories for an overall rating; however, neither tool is intended to generate a total score by simply calculating the number of responses satisfying criteria for individual items [ 4 , 6 ]. AMSTAR-2 focuses on the rigor of a review’s methods irrespective of the specific subject matter. ROBIS places emphasis on a review’s results section; this suggests it may be optimally applied by appraisers with some knowledge of the review’s topic, as they may be better equipped to determine if certain procedures (or lack thereof) would impact the validity of a review’s findings [ 98 , 100 ]. Reliability studies show AMSTAR-2 overall confidence ratings strongly correlate with the overall RoB ratings in ROBIS [ 100 , 101 ].

Interrater reliability has been shown to be acceptable for AMSTAR-2 [ 6 , 11 , 102 ] and ROBIS [ 4 , 98 , 103 ] but neither tool has been shown to be superior in this regard [ 100 , 101 , 104 , 105 ]. Overall, variability in reliability for both tools has been reported across items, between pairs of raters, and between centers [ 6 , 100 , 101 , 104 ]. The effects of appraiser experience on the results of AMSTAR-2 and ROBIS require further evaluation [ 101 , 105 ]. Updates to both tools should address items shown to be prone to individual appraisers’ subjective biases and opinions [ 11 , 100 ]; this may involve modifications of the current domains and signaling questions as well as incorporation of methods to make an appraiser’s judgments more explicit. Future revisions of these tools may also consider the addition of standards for aspects of systematic review development currently lacking (eg, rating overall certainty of evidence, [ 99 ] methods for synthesis without meta-analysis [ 105 ]) and removal of items that assess aspects of reporting that are thoroughly evaluated by PRISMA 2020.

Application

A good understanding of what is required to satisfy the standards of AMSTAR-2 and ROBIS involves study of the accompanying guidance documents written by the tools’ developers; these contain detailed descriptions of each item’s standards. In addition, accurate appraisal of a systematic review with either tool requires training. Most experts recommend independent assessment by at least two appraisers with a process for resolving discrepancies as well as procedures to establish interrater reliability, such as pilot testing, a calibration phase or exercise, and development of predefined decision rules [ 35 , 99 , 100 , 101 , 103 , 104 , 106 ]. These methods may, to some extent, address the challenges associated with the diversity in methodological training, subject matter expertise, and experience using the tools that are likely to exist among appraisers.
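For teams that wish to quantify agreement during such a calibration exercise, a chance-corrected statistic such as Cohen’s kappa is one common choice. The minimal Python sketch below (the item-level ratings and function name are our own hypothetical illustrations, not part of either tool) shows the calculation for two appraisers; in practice, teams should follow the reliability procedures described in each tool’s guidance documents.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement under independence, from each rater's marginal frequencies
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    expected = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)
    return (observed - expected) / (1 - expected)

# Hypothetical AMSTAR-2 item ratings from a pilot round
# ("Y" = yes, "PY" = partial yes, "N" = no)
appraiser_1 = ["Y", "Y", "PY", "N", "Y", "N", "PY", "Y", "N", "Y"]
appraiser_2 = ["Y", "PY", "PY", "N", "Y", "N", "Y", "Y", "N", "Y"]
print(f"kappa = {cohens_kappa(appraiser_1, appraiser_2):.2f}")  # 0.68 here
```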

The standards of AMSTAR, AMSTAR-2, and ROBIS have been used in many methodological studies and epidemiological investigations. However, the increased publication of overviews or umbrella reviews and CPGs has likely been a greater influence on the widening acceptance of these tools. Critical appraisal of the secondary studies considered evidence is essential to the trustworthiness of both the recommendations of CPGs and the conclusions of overviews. Currently both Cochrane [ 55 ] and JBI [ 107 ] recommend AMSTAR-2 and ROBIS in their guidance for authors of overviews or umbrella reviews. However, ROBIS and AMSTAR-2 were released in 2016 and 2017, respectively; thus, to date, limited data have been reported about the uptake of these tools or which of the two may be preferred [ 21 , 106 ]. Currently, in relation to CPGs, AMSTAR-2 appears to be overwhelmingly popular compared to ROBIS. A Google Scholar search of this topic (search terms “AMSTAR 2 AND clinical practice guidelines,” “ROBIS AND clinical practice guidelines” 13 May 2022) found 12,700 hits for AMSTAR-2 and 1,280 for ROBIS. The apparent greater appeal of AMSTAR-2 may relate to its longer track record given the original version of the tool was in use for 10 years prior to its update in 2017.

Barriers to the uptake of AMSTAR-2 and ROBIS include the real or perceived time and resources necessary to complete the items they include and appraisers’ confidence in their own ratings [ 104 ]. Reports from comparative studies available to date indicate that appraisers find AMSTAR-2 questions, responses, and guidance to be clearer and simpler compared with ROBIS [ 11 , 101 , 104 , 105 ]. This suggests that for appraisal of intervention systematic reviews, AMSTAR-2 may be a more practical tool than ROBIS, especially for novice appraisers [ 101 , 103 , 104 , 105 ]. The unique characteristics of each tool, as well as their potential advantages and disadvantages, should be taken into consideration when deciding which tool should be used for an appraisal of a systematic review. In addition, the choice of one or the other may depend on how the results of an appraisal will be used; for example, a peer reviewer’s appraisal of a single manuscript versus an appraisal of multiple systematic reviews in an overview or umbrella review, CPG, or systematic methodological study.

Authors of overviews and CPGs report results of AMSTAR-2 and ROBIS appraisals for each of the systematic reviews they include as evidence. Ideally, an independent judgment of their appraisals can be made by the end users of overviews and CPGs; however, most stakeholders, including clinicians, are unlikely to have a sophisticated understanding of these tools. Nevertheless, they should at least be aware that AMSTAR-2 and ROBIS ratings reported in overviews and CPGs may be inaccurate because the tools are not applied as intended by their developers. This can result from inadequate training of the overview or CPG authors who perform the appraisals, or from modifications of the appraisal tools imposed by them. The potential variability in overall confidence and RoB ratings highlights why appraisers applying these tools need to support their judgments with explicit documentation; this allows readers to judge for themselves whether they agree with the criteria used by appraisers [ 4 , 108 ]. When these judgments are explicit, the underlying rationale used when applying these tools can be assessed [ 109 ].

Theoretically, we would expect the use of AMSTAR-2 to be associated with improved methodological rigor, and the use of ROBIS with lower RoB, in recent systematic reviews compared with those published before 2017. To our knowledge, this has not yet been demonstrated; however, like reports about the actual uptake of these tools, time will tell. Additional data on user experience are also needed to further elucidate the practical challenges and methodological nuances encountered with the application of these tools. This information could potentially inform the creation of unifying criteria to guide and standardize the appraisal of evidence syntheses [ 109 ].

Evaluation of reporting

Complete reporting is essential for users to establish the trustworthiness and applicability of a systematic review’s findings. Efforts to standardize and improve the reporting of systematic reviews resulted in the 2009 publication of the PRISMA statement [ 92 ] with its accompanying explanation and elaboration document [ 110 ]. This guideline was designed to help authors prepare a complete and transparent report of their systematic review. In addition, adherence to PRISMA is often used to evaluate the thoroughness of reporting of published systematic reviews [ 111 ]. The updated version, PRISMA 2020 [ 93 ], and its guidance document [ 112 ] were published in 2021. Items on the original and updated versions of PRISMA are organized by the six basic review components they address (title, abstract, introduction, methods, results, discussion). The PRISMA 2020 update is a considerably expanded version of the original; it includes standards and examples for the 27 original and 13 additional reporting items that capture methodological advances and may enhance the replicability of reviews [ 113 ].

The original PRISMA statement fostered the development of various PRISMA extensions (Table 3.3 ). These include reporting guidance for scoping reviews and reviews of diagnostic test accuracy and for intervention reviews that report on the following: harms outcomes, equity issues, the effects of acupuncture, the results of network meta-analyses and analyses of individual participant data. Detailed reporting guidance for specific systematic review components (abstracts, protocols, literature searches) is also available.

Uptake and impact

The 2009 PRISMA standards [ 92 ] for reporting have been widely endorsed by authors, journals, and EBM-related organizations. We anticipate the same for PRISMA 2020 [ 93 ] given its co-publication in multiple high-impact journals. However, to date, there is a lack of strong evidence for an association between improved systematic review reporting and endorsement of PRISMA 2009 standards [ 43 , 111 ]. Most journals require a PRISMA checklist accompany submissions of systematic review manuscripts. However, the accuracy of information presented on these self-reported checklists is not necessarily verified. It remains unclear which strategies (eg, authors’ self-report of checklists, peer reviewer checks) might improve adherence to the PRISMA reporting standards; in addition, the feasibility of any potentially effective strategies must be taken into consideration given the structure and limitations of current research and publication practices [ 124 ].

Pitfalls and limitations of PRISMA, AMSTAR-2, and ROBIS

Misunderstanding of the roles of these tools and their misapplication may be widespread problems. PRISMA 2020 is a reporting guideline that is most beneficial if consulted when developing a review as opposed to merely completing a checklist when submitting to a journal; at that point, the review is finished, with good or bad methodological choices. PRISMA checklists evaluate how completely an element of review conduct was reported; they do not evaluate the caliber of conduct or performance of a review. Thus, review authors and readers should not think that a rigorous systematic review can be produced by simply following the PRISMA 2020 guidelines. Similarly, it is important to recognize that AMSTAR-2 and ROBIS are tools to evaluate the conduct of a review but do not substitute for conceptual methodological guidance. In addition, they are not intended to be simple checklists. In fact, they have the potential for misuse or abuse if applied as such; for example, by calculating a total score to make a judgment about a review’s overall confidence or RoB. Proper selection of a response for the individual items on AMSTAR-2 and ROBIS requires training or at least reference to their accompanying guidance documents.

Not surprisingly, it has been shown that compliance with the PRISMA checklist is not necessarily associated with satisfying the standards of ROBIS [ 125 ]. AMSTAR-2 and ROBIS were not available when PRISMA 2009 was developed; however, they were considered in the development of PRISMA 2020 [ 113 ]. Therefore, future studies may show a positive relationship between fulfillment of PRISMA 2020 standards for reporting and meeting the standards of tools evaluating methodological quality and RoB.

Choice of an appropriate tool for the evaluation of a systematic review first involves identification of the underlying construct to be assessed. For systematic reviews of interventions, recommended tools include AMSTAR-2 and ROBIS for appraisal of conduct and PRISMA 2020 for completeness of reporting. All three tools were developed rigorously and provide easily accessible and detailed user guidance, which is necessary for their proper application and interpretation. When considering a manuscript for publication, training in these tools can sensitize peer reviewers and editors to major issues that may affect the review’s trustworthiness and completeness of reporting. Judgment of the overall certainty of a body of evidence and formulation of recommendations rely, in part, on AMSTAR-2 or ROBIS appraisals of systematic reviews. Therefore, training on the application of these tools is essential for authors of overviews and developers of CPGs. Peer reviewers and editors considering an overview or CPG for publication must hold their authors to a high standard of transparency regarding both the conduct and reporting of these appraisals.

Part 4. Meeting conduct standards

Many authors, peer reviewers, and editors erroneously equate fulfillment of the items on the PRISMA checklist with superior methodological rigor. For direction on methodology, we refer them to available resources that provide comprehensive conceptual guidance [ 59 , 60 ] as well as primers with basic step-by-step instructions [ 1 , 126 , 127 ]. This section is intended to complement study of such resources by facilitating use of AMSTAR-2 and ROBIS, tools specifically developed to evaluate methodological rigor of systematic reviews. These tools are widely accepted by methodologists; however, in the general medical literature, they are not uniformly selected for the critical appraisal of systematic reviews [ 88 , 96 ].

To enable their uptake, Table 4.1  links review components to the corresponding appraisal tool items. Expectations of AMSTAR-2 and ROBIS are concisely stated, and reasoning provided.

Issues involved in meeting the standards for seven review components (identified in bold in Table 4.1 ) are addressed in detail. These were chosen for elaboration for one (or both) of two reasons: 1) the component has been identified as potentially problematic for systematic review authors based on consistent reports of their frequent AMSTAR-2 or ROBIS deficiencies [ 9 , 11 , 15 , 88 , 128 , 129 ]; and/or 2) the review component is judged by standards of an AMSTAR-2 “critical” domain. These have the greatest implications for how a systematic review will be appraised: if standards for any one of these critical domains are not met, the review is rated as having “critically low confidence.”

Research question

Specific and unambiguous research questions may have more value for reviews that deal with hypothesis testing. Mnemonics for the various elements of research questions are suggested by JBI and Cochrane (Table 2.1 ). These prompt authors to consider the specialized methods involved in developing different types of systematic reviews; however, while inclusion of the suggested elements makes a review compliant with a particular methodology, it does not necessarily make the research question appropriate. Table 4.2  lists acronyms that may aid in developing the research question. They include overlapping concepts of importance in this time of proliferating reviews of uncertain value [ 130 ]. If these issues are not prospectively contemplated, systematic review authors may establish an overly broad scope or develop a runaway scope that allows them to stray from predefined choices relating to key comparisons and outcomes.

Once a research question is established, searching on registry sites and databases for existing systematic reviews addressing the same or a similar topic is necessary in order to avoid contributing to research waste [ 131 ]. Repeating an existing systematic review must be justified, for example, if previous reviews are out of date or methodologically flawed. A full discussion on replication of intervention systematic reviews, including a consensus checklist, can be found in the work of Tugwell and colleagues [ 84 ].

Protocol development is considered a core component of systematic reviews [ 125 , 126 , 132 ]. Review protocols may allow researchers to plan and anticipate potential issues, assess validity of methods, prevent arbitrary decision-making, and minimize bias that can be introduced by the conduct of the review. Registration of a protocol that allows public access promotes transparency of the systematic review’s methods and processes and reduces the potential for duplication [ 132 ]. Thinking early and carefully about all the steps of a systematic review is pragmatic and logical and may mitigate the influence of the authors’ prior knowledge of the evidence [ 133 ]. In addition, the protocol stage is when the scope of the review can be carefully considered by authors, reviewers, and editors; this may help to avoid production of overly ambitious reviews that include excessive numbers of comparisons and outcomes or are undisciplined in their study selection.

Systematic reviews with published prospective protocols have been reported to better attain AMSTAR standards [ 134 ]. However, completeness of reporting does not seem to differ between reviews with a protocol and those without one [ 135 ]. PRISMA-P [ 116 ] and its accompanying elaboration and explanation document [ 136 ] can be used to guide and assess the reporting of protocols. A final version of the review should fully describe any protocol deviations. Peer reviewers may compare the submitted manuscript with any available pre-registered protocol; this is required if AMSTAR-2 or ROBIS are used for critical appraisal.

There are multiple options for the recording of protocols (Table 4.3 ). Some journals will peer review and publish protocols. In addition, many online sites offer date-stamped and publicly accessible protocol registration. Some of these are exclusively for protocols of evidence syntheses; others are less restrictive and offer researchers the capacity for data storage, sharing, and other workflow features. These sites document protocol details to varying extents and have different requirements [ 137 ]. The most popular site for systematic reviews, the International Prospective Register of Systematic Reviews (PROSPERO), for example, only registers reviews that report on an outcome with direct relevance to human health. The PROSPERO record documents protocols for all types of reviews except literature and scoping reviews. Of note, PROSPERO requires authors register their review protocols prior to any data extraction [ 133 , 138 ]. The electronic records of most of these registry sites allow authors to update their protocols and facilitate transparent tracking of protocol changes, which are not unexpected during the progress of the review [ 139 ].

Study design inclusion

For most systematic reviews, broad inclusion of study designs is recommended [ 126 ]. This may allow comparison of results between contrasting study design types [ 126 ]. Certain study designs may be considered preferable depending on the type of review and nature of the research question. However, prevailing stereotypes about what each study design does best may not be accurate. For example, in systematic reviews of interventions, randomized designs are typically thought to answer highly specific questions while non-randomized designs often are expected to reveal greater information about harms or real-world evidence [ 126 , 140 , 141 ]. This may be a false distinction; randomized trials may be pragmatic [ 142 ], they may offer important (and more unbiased) information on harms [ 143 ], and data from non-randomized trials may not necessarily be more real-world-oriented [ 144 ].

Moreover, there may not be any available evidence reported by RCTs for certain research questions; in some cases, there may not be any RCTs or NRSI. When the available evidence is limited to case reports and case series, it is not possible to test hypotheses nor provide descriptive estimates or associations; however, a systematic review of these studies can still offer important insights [ 81 , 145 ]. When authors anticipate that limited evidence of any kind may be available to inform their research questions, a scoping review can be considered. Alternatively, decisions regarding inclusion of indirect as opposed to direct evidence can be addressed during protocol development [ 146 ]. Including indirect evidence at an early stage of intervention systematic review development allows authors to decide if such studies offer any additional and/or different understanding of treatment effects for their population or comparison of interest. Issues of indirectness of included studies are accounted for later in the process, during determination of the overall certainty of evidence (see Part 5 for details).

Evidence search

Both AMSTAR-2 and ROBIS require systematic and comprehensive searches for evidence. This is essential for any systematic review. Both tools discourage search restrictions based on language and publication source. Given increasing globalism in health care, the practice of including English-only literature should be avoided [ 126 ]. There are many examples in which language bias (different results in studies published in different languages) has been documented [ 147 , 148 ]. This does not mean that all literature, in all languages, is equally trustworthy [ 148 ]; however, the only way to formally probe for the potential of such biases is to consider all languages in the initial search. Searches of the gray literature and trial registries may also reveal important details about topics that would otherwise be missed [ 149 , 150 , 151 ]. Again, inclusiveness will allow review authors to investigate whether results differ in the gray literature and in registered trials [ 41 , 151 , 152 , 153 ].

Authors should make every attempt to complete their review within one year, as that is the likely viable life of a search. If that is not possible, the search should be updated close to the time of completion [ 154 ]. Some research topics may warrant even less of a delay; for example, in rapidly changing fields (as in the case of the COVID-19 pandemic), even one month may radically change the available evidence.

Excluded studies

AMSTAR-2 requires authors to provide references for any studies excluded at the full text phase of study selection along with reasons for exclusion; this allows readers to feel confident that all relevant literature has been considered for inclusion and that exclusions are defensible.

Risk of bias assessment of included studies

The design of the studies included in a systematic review (eg, RCT, cohort, case series) should not be equated with appraisal of its RoB. To meet AMSTAR-2 and ROBIS standards, systematic review authors must examine RoB issues specific to the design of each primary study they include as evidence. It is unlikely that a single RoB appraisal tool will be suitable for all research designs. In addition to tools for randomized and non-randomized studies, specific tools are available for evaluation of RoB in case reports and case series [ 82 ] and single-case experimental designs [ 155 , 156 ]. Note the RoB tools selected must meet the standards of the appraisal tool used to judge the conduct of the review. For example, AMSTAR-2 identifies four sources of bias specific to RCTs and NRSI that must be addressed by the RoB tool(s) chosen by the review authors. The Cochrane RoB 2 tool [ 157 ] for RCTs and ROBINS-I [ 158 ] for NRSI meet the AMSTAR-2 standards. Appraisers on the review team should not modify any RoB tool without complete transparency and acknowledgment that they have invalidated the interpretation of the tool as intended by its developers [ 159 ]. Conduct of RoB assessments is not addressed by AMSTAR-2; to meet ROBIS standards, two independent reviewers should complete RoB assessments of included primary studies.

Implications of the RoB assessments must be explicitly discussed and considered in the conclusions of the review. Discussion of the overall RoB of included studies may consider the weight of the studies at high RoB, the importance of the sources of bias in the studies being summarized, and if their importance differs in relationship to the outcomes reported. If a meta-analysis is performed, serious concerns for RoB of individual studies should be accounted for in these results as well. If the results of the meta-analysis for a specific outcome change when studies at high RoB are excluded, readers will have a more accurate understanding of this body of evidence. However, while investigating the potential impact of specific biases is a useful exercise, it is important to avoid over-interpretation, especially when there are sparse data.

Synthesis methods for quantitative data

Syntheses of quantitative data reported by primary studies are broadly categorized as one of two types: meta-analysis and synthesis without meta-analysis (Table 4.4 ). Before deciding on one of these methods, authors should seek methodological advice about whether reported data can be transformed or used in other ways to provide a consistent effect measure across studies [ 160 , 161 ].
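As a concrete example of such transformations, the brief Python sketch below shows two standard conversions described in common meta-analysis references: deriving a log risk ratio and its standard error from 2×2 counts, and recovering a standard error from a reported 95% confidence interval. All numbers are hypothetical and the function names are ours; zero cells and other complications are exactly the situations in which statistical advice is needed.

```python
import math

def log_rr_from_counts(events_trt, n_trt, events_ctl, n_ctl):
    """Log risk ratio and its SE from 2x2 counts (zero cells not handled)."""
    log_rr = math.log((events_trt / n_trt) / (events_ctl / n_ctl))
    se = math.sqrt(1 / events_trt - 1 / n_trt + 1 / events_ctl - 1 / n_ctl)
    return log_rr, se

def log_effect_from_ci(point, lower, upper, z=1.96):
    """Recover a log-scale estimate and SE from a ratio and its 95% CI."""
    return math.log(point), (math.log(upper) - math.log(lower)) / (2 * z)

# Hypothetical inputs: one study reports raw counts, another only an RR with 95% CI
print(log_rr_from_counts(12, 100, 24, 98))   # -> (log RR, SE)
print(log_effect_from_ci(0.55, 0.31, 0.97))  # -> (log RR, SE)
```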

Meta-analysis

Systematic reviews that employ meta-analysis should not be referred to simply as “meta-analyses.” The term meta-analysis strictly refers to a specific statistical technique used when study effect estimates and their variances are available, yielding a quantitative summary of results. In general, methods for meta-analysis involve use of a weighted average of effect estimates from two or more studies. If considered carefully, meta-analysis increases the precision of the estimated magnitude of effect and can offer useful insights about heterogeneity and estimates of effects. We refer to standard references for a thorough introduction and formal training [ 165 , 166 , 167 ].
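To make the idea of a weighted average concrete, the sketch below pools hypothetical log risk ratios by inverse-variance weighting, with an optional DerSimonian–Laird random-effects adjustment. This illustrates the arithmetic only; real analyses should rely on established software and statistical expertise.

```python
import math

def inverse_variance_pool(effects, ses, random_effects=False):
    """Pool study effects (eg, log risk ratios) by inverse-variance weighting.

    With random_effects=True, the DerSimonian-Laird estimate of
    between-study variance (tau^2) is added to each study's variance.
    """
    variances = [se ** 2 for se in ses]
    weights = [1 / v for v in variances]
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    if random_effects:
        q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
        c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
        tau2 = max(0.0, (q - (len(effects) - 1)) / c)
        weights = [1 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return pooled, math.sqrt(1 / sum(weights))

# Hypothetical log risk ratios and standard errors from three studies
est, se = inverse_variance_pool([-0.60, -0.22, -0.35], [0.25, 0.18, 0.30],
                                random_effects=True)
print(f"pooled log RR = {est:.2f}, 95% CI {est - 1.96*se:.2f} to {est + 1.96*se:.2f}")
```

Re-running such a pooling after excluding studies rated at high RoB is one simple way to implement the sensitivity analysis described earlier in this section.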

There are three common approaches to meta-analysis in current health care–related systematic reviews (Table 4.4 ). Aggregate data meta-analysis is the most familiar to authors of evidence syntheses and their end users. This standard meta-analysis combines data on effect estimates reported by studies that investigate similar research questions involving direct comparisons of an intervention and comparator. Results of these analyses provide a single summary intervention effect estimate. If the included studies in a systematic review measure an outcome differently, their reported results may be transformed to make them comparable [ 161 ]. Forest plots visually present essential information about the individual studies and the overall pooled analysis (see Additional File 4  for details).

Less familiar and more challenging meta-analytical approaches used in secondary research include individual participant data (IPD) and network meta-analyses (NMA); PRISMA extensions provide reporting guidelines for both [ 117 , 118 ]. In IPD, the raw data on each participant from each eligible study are re-analyzed as opposed to the study-level data analyzed in aggregate data meta-analyses [ 168 ]. This may offer advantages, including the potential for limiting concerns about bias and allowing more robust analyses [ 163 ]. As suggested by the description in Table 4.4 , NMA is a complex statistical approach. It combines aggregate data [ 169 ] or IPD [ 170 ] for effect estimates from direct and indirect comparisons reported in two or more studies of three or more interventions. This makes it a potentially powerful statistical tool; while multiple interventions are typically available to treat a condition, few have been evaluated in head-to-head trials [ 171 ]. Both IPD and NMA facilitate a broader scope, and potentially provide more reliable and/or detailed results; however, compared with standard aggregate data meta-analyses, their methods are more complicated, time-consuming, and resource-intensive, and they have their own biases, so one needs sufficient funding, technical expertise, and preparation to employ them successfully [ 41 , 172 , 173 ].
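The intuition behind combining direct and indirect evidence can be glimpsed in the simplest NMA building block, the anchored (Bucher) indirect comparison: when treatments A and B have each been compared with a common comparator C but never head-to-head, an indirect estimate of A versus B is formed by differencing the direct estimates, with their variances adding. The Python sketch below uses hypothetical values; an actual NMA requires specialized software and methodological expertise.

```python
import math

def bucher_indirect(d_ac, se_ac, d_bc, se_bc):
    """Anchored indirect comparison of A vs B via common comparator C.

    d_ac, d_bc: direct effect estimates (eg, log odds ratios) of A vs C
    and B vs C; the variances of the two direct comparisons add.
    """
    return d_ac - d_bc, math.sqrt(se_ac ** 2 + se_bc ** 2)

# Hypothetical log odds ratios versus a shared placebo arm
d_ab, se_ab = bucher_indirect(-0.50, 0.20, -0.30, 0.25)
print(f"indirect A vs B: {d_ab:.2f} (SE {se_ab:.2f})")
```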

Several items in AMSTAR-2 and ROBIS address meta-analysis; thus, understanding the strengths, weaknesses, assumptions, and limitations of methods for meta-analyses is important. According to the standards of both tools, plans for a meta-analysis must be addressed in the review protocol, including reasoning, description of the type of quantitative data to be synthesized, and the methods planned for combining the data. This should not consist of stock statements describing conventional meta-analysis techniques; rather, authors are expected to anticipate issues specific to their research questions. Concern for the lack of training in meta-analysis methods among systematic review authors cannot be overstated. For those with training, the use of popular software (eg, RevMan [ 174 ], MetaXL [ 175 ], JBI SUMARI [ 176 ]) may facilitate exploration of these methods; however, such programs cannot substitute for the accurate interpretation of the results of meta-analyses, especially for more complex meta-analytical approaches.

Synthesis without meta-analysis

There are varied reasons a meta-analysis may not be appropriate or desirable [ 160 , 161 ]. Syntheses that informally use statistical methods other than meta-analysis are variably referred to as descriptive, narrative, or qualitative syntheses or summaries; these terms are also applied to syntheses that make no attempt to statistically combine data from individual studies. However, use of such imprecise terminology is discouraged; in order to fully explore the results of any type of synthesis, some narration or description is needed to supplement the data visually presented in tabular or graphic forms [ 63 , 177 ]. In addition, the term “qualitative synthesis” is easily confused with a synthesis of qualitative data in a qualitative or mixed methods review. “Synthesis without meta-analysis” is currently the preferred description of other ways to combine quantitative data from two or more studies. Use of this specific terminology when referring to these types of syntheses also implies the application of formal methods (Table 4.4 ).

Methods for syntheses without meta-analysis involve structured presentations of the data in tables and plots. In comparison to narrative descriptions of each study, these are designed to more effectively and transparently show patterns and convey detailed information about the data; they also allow informal exploration of heterogeneity [ 178 ]. In addition, acceptable quantitative statistical methods (Table 4.4 ) are formally applied; however, it is important to recognize these methods have significant limitations for the interpretation of the effectiveness of an intervention [ 160 ]. Nevertheless, when meta-analysis is not possible, the application of these methods is less prone to bias compared with an unstructured narrative description of included studies [ 178 , 179 ].

Vote counting is commonly used in systematic reviews and involves a tally of studies reporting results that meet some threshold of importance applied by review authors. Until recently, it has not typically been identified as a method for synthesis without meta-analysis. Guidance on an acceptable vote counting method based on direction of effect is currently available [ 160 ] and should be used instead of narrative descriptions of such results (eg, “more than half the studies showed improvement”; “only a few studies reported adverse effects”; “7 out of 10 studies favored the intervention”). Unacceptable methods include vote counting by statistical significance or magnitude of effect or some subjective rule applied by the authors.
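In the spirit of that guidance, the minimal sketch below tallies only the direction of effect in each study and applies a sign test, which asks how surprising the observed split would be if either direction were equally likely. The tallies are hypothetical, and thresholds of statistical significance or effect magnitude deliberately play no role.

```python
from math import comb

# Hypothetical direction of the observed effect in each included study:
# True = favors intervention, False = favors comparator
directions = [True, True, True, False, True, True, False, True, True, True]

k, n = sum(directions), len(directions)
# One-sided sign test: P(X >= k) under Binomial(n, 0.5)
p_value = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
print(f"{k}/{n} studies favor the intervention; sign test p = {p_value:.3f}")
```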

AMSTAR-2 and ROBIS standards do not explicitly address conduct of syntheses without meta-analysis, although AMSTAR-2 items 13 and 14 might be considered relevant. Guidance for the complete reporting of syntheses without meta-analysis for systematic reviews of interventions is available in the Synthesis without Meta-analysis (SWiM) guideline [ 180 ] and methodological guidance is available in the Cochrane Handbook [ 160 , 181 ].

Familiarity with AMSTAR-2 and ROBIS makes sense for authors of systematic reviews as these appraisal tools will be used to judge their work; however, training is necessary for authors to truly appreciate and apply methodological rigor. Moreover, judgment of the potential contribution of a systematic review to the current knowledge base goes beyond meeting the standards of AMSTAR-2 and ROBIS. These tools do not explicitly address some crucial concepts involved in the development of a systematic review; this further emphasizes the need for author training.

We recommend that systematic review authors incorporate specific practices or exercises when formulating a research question at the protocol stage. These should be designed to raise the review team’s awareness of how to prevent research and resource waste [ 84 , 130 ] and to stimulate careful contemplation of the scope of the review [ 30 ]. Authors’ training should also focus on justifiably choosing a formal method for the synthesis of quantitative and/or qualitative data from primary research; both types of data require specific expertise. For typical reviews that involve syntheses of quantitative data, statistical expertise is necessary, initially for decisions about appropriate methods, [ 160 , 161 ] and then to inform any meta-analyses [ 167 ] or other statistical methods applied [ 160 ].

Part 5. Rating overall certainty of evidence

Report of an overall certainty of evidence assessment in a systematic review is an important new reporting standard of the updated PRISMA 2020 guidelines [ 93 ]. Systematic review authors are well acquainted with assessing RoB in individual primary studies, but much less familiar with assessment of overall certainty across an entire body of evidence. Yet a reliable way to evaluate this broader concept is now recognized as a vital part of interpreting the evidence.

Historical systems for rating evidence are based on study design and usually involve hierarchical levels or classes of evidence that use numbers and/or letters to designate the level/class. These systems were endorsed by various EBM-related organizations. Professional societies and regulatory groups then widely adopted them, often with modifications for application to the available primary research base in specific clinical areas. In 2002, a report issued by the AHRQ identified 40 systems to rate quality of a body of evidence [ 182 ]. A critical appraisal of systems used by prominent health care organizations published in 2004 revealed limitations in sensibility, reproducibility, applicability to different questions, and usability to different end users [ 183 ]. Persistent use of hierarchical rating schemes to describe overall quality continues to complicate the interpretation of evidence. This is indicated by recent reports of poor interpretability of systematic review results by readers [ 184 , 185 , 186 ] and misleading interpretations of the evidence related to the “spin” systematic review authors may put on their conclusions [ 50 , 187 ].

Recognition of the shortcomings of hierarchical rating systems raised concerns that misleading clinical recommendations could result even if based on a rigorous systematic review. In addition, the number and variability of these systems were considered obstacles to quick and accurate interpretations of the evidence by clinicians, patients, and policymakers [ 183 ]. These issues contributed to the development of the GRADE approach. An international working group, which continues to actively evaluate and refine it, first introduced GRADE in 2004 [ 188 ]. Currently, more than 110 organizations from 19 countries around the world have endorsed or are using GRADE [ 189 ].

GRADE approach to rating overall certainty

GRADE offers a consistent and sensible approach for two separate processes: rating the overall certainty of a body of evidence and the strength of recommendations. The former is the expected conclusion of a systematic review, while the latter is pertinent to the development of CPGs. As such, GRADE provides a mechanism to bridge the gap from evidence synthesis to application of the evidence for informed clinical decision-making [ 27 , 190 ]. We briefly examine the GRADE approach but only as it applies to rating overall certainty of evidence in systematic reviews.

In GRADE, use of “certainty” of a body of evidence is preferred over the term “quality” [ 191 ]. Certainty refers to the level of confidence systematic review authors have that, for each outcome, an effect estimate represents the true effect. The GRADE approach to rating confidence in estimates begins with identifying the study type (RCT or NRSI) and then systematically considers criteria to rate the certainty of evidence up or down (Table 5.1 ).

This process results in assignment of one of the four GRADE certainty ratings to each outcome; these are clearly conveyed with the use of basic interpretation symbols (Table 5.2 ) [ 192 ]. Notably, when multiple outcomes are reported in a systematic review, each outcome is assigned a unique certainty rating; thus different levels of certainty may exist in the body of evidence being examined.
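The logic of the starting points and directional adjustments can be summarized in a schematic sketch (ours; GRADE ratings are structured judgments, not arithmetic, so treat this only as a mnemonic for Tables 5.1 and 5.2):

```python
RATINGS = {4: "High (⊕⊕⊕⊕)", 3: "Moderate (⊕⊕⊕◯)",
           2: "Low (⊕⊕◯◯)", 1: "Very low (⊕◯◯◯)"}

def grade_certainty(study_type: str, rate_down: dict[str, int],
                    rate_up: dict[str, int]) -> str:
    """Schematic of GRADE starting points and directions of adjustment.
    rate_down: 0-2 points each for risk of bias, inconsistency, indirectness,
    imprecision, and publication bias; rate_up: 0-2 points each for large
    effect, dose-response, and opposing plausible confounding (rating up
    applies chiefly to bodies of non-randomized evidence)."""
    level = 4 if study_type == "RCT" else 2  # RCTs start high, NRSI start low
    level = level - sum(rate_down.values()) + sum(rate_up.values())
    return RATINGS[max(1, min(4, level))]  # clamp to the four categories

# Example: a body of RCT evidence rated down one level for imprecision
print(grade_certainty("RCT", {"imprecision": 1}, {}))  # Moderate (⊕⊕⊕◯)
```

In practice, each adjustment must be justified and documented against the GRADE criteria; the arithmetic above only captures the direction and bounds of the process.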

GRADE’s developers acknowledge some subjectivity is involved in this process [ 193 ]. In addition, they emphasize that both the criteria for rating evidence up and down (Table 5.1 ) as well as the four overall certainty ratings (Table 5.2 ) reflect a continuum as opposed to discrete categories [ 194 ]. Consequently, deciding whether a study falls above or below the threshold for rating up or down may not be straightforward, and preliminary overall certainty ratings may be intermediate (eg, between low and moderate). Thus, the proper application of GRADE requires systematic review authors to take an overall view of the body of evidence and explicitly describe the rationale for their final ratings.

Advantages of GRADE

Outcomes important to the individuals who experience the problem of interest maintain a prominent role throughout the GRADE process [ 191 ]. These outcomes must inform the research questions (eg, PICO [population, intervention, comparator, outcome]) that are specified a priori in a systematic review protocol. Evidence for these outcomes is then investigated and each critical or important outcome is ultimately assigned a certainty of evidence as the end point of the review. Notably, limitations of the included studies have an impact at the outcome level. Ultimately, the certainty ratings for each outcome reported in a systematic review are considered by guideline panels. They use a different process to formulate recommendations that involves assessment of the evidence across outcomes [ 201 ]. It is beyond our scope to describe the GRADE process for formulating recommendations; however, it is critical to understand how these two outcome-centric concepts of certainty of evidence in the GRADE framework are related and distinguished. An in-depth illustration using examples from recently published evidence syntheses and CPGs is provided in Additional File 5 A (Table AF5A-1).

The GRADE approach is applicable irrespective of whether the certainty of the primary research evidence is high or very low; in some circumstances, indirect evidence of higher certainty may be considered if direct evidence is unavailable or of low certainty [ 27 ]. In fact, most interventions and outcomes in medicine have low or very low certainty of evidence based on GRADE and there seems to be no major improvement over time [ 202 , 203 ]. This is still a very important (even if sobering) realization for calibrating our understanding of medical evidence. A major appeal of the GRADE approach is that it offers a common framework that enables authors of evidence syntheses to make complex judgments about evidence certainty and to convey these with unambiguous terminology. This prevents some common mistakes made by review authors, including overstating results (or under-reporting harms) [ 187 ] and making recommendations for treatment. This is illustrated in Table AF5A-2 (Additional File 5 A), which compares the concluding statements made about overall certainty in a systematic review with and without application of the GRADE approach.

Theoretically, application of GRADE should improve consistency of judgments about certainty of evidence, both between authors and across systematic reviews. In one empirical evaluation conducted by the GRADE Working Group, interrater reliability of two individual raters assessing certainty of the evidence for a specific outcome increased from ~ 0.3 without using GRADE to ~ 0.7 by using GRADE [ 204 ]. However, others report variable agreement among those experienced in GRADE assessments of evidence certainty [ 190 ]. Like any other tool, GRADE requires training in order to be properly applied. The intricacies of the GRADE approach and the necessary subjectivity involved suggest that improving agreement may require strict rules for its application; alternatively, use of general guidance and consensus among review authors may result in less consistency but provide important information for the end user [ 190 ].
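For readers unfamiliar with reliability coefficients of this kind, the following sketch (ours; the cited evaluation used its own statistic and data, and these ratings are invented) computes Cohen's kappa for two raters, which corrects raw agreement for the agreement expected by chance:

```python
from collections import Counter

def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa for two raters assigning the same items
    (eg, outcomes) to categories (eg, the four GRADE certainty levels)."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    return (observed - expected) / (1 - expected)

# Two raters grading ten outcomes on the four GRADE levels (invented data)
a = ["high", "mod", "low", "low", "vlow", "mod", "mod", "low", "high", "vlow"]
b = ["high", "low", "low", "low", "vlow", "mod", "low", "low", "mod", "vlow"]
print(round(cohens_kappa(a, b), 2))  # 0.59: moderate agreement beyond chance
```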

GRADE caveats

Simply invoking “the GRADE approach” does not automatically ensure GRADE methods were employed by authors of a systematic review (or developers of a CPG). Table 5.3 lists the criteria the GRADE Working Group has established for this purpose. These criteria highlight the specific terminology and methods that apply to rating the certainty of evidence for outcomes reported in a systematic review [ 191 ], which is different from rating overall certainty across outcomes considered in the formulation of recommendations [ 205 ]. Modifications of standard GRADE methods and terminology are discouraged as these may detract from GRADE’s objectives to minimize conceptual confusion and maximize clear communication [ 206 ].

Nevertheless, GRADE is prone to misapplications [ 207 , 208 ], which can distort a systematic review’s conclusions about the certainty of evidence. Systematic review authors without proper GRADE training are likely to misinterpret the terms “quality” and “grade” and to misunderstand the constructs assessed by GRADE versus other appraisal tools. For example, review authors may reference the standard GRADE certainty ratings (Table 5.2 ) to describe evidence for their outcome(s) of interest. However, these ratings are invalidated if authors omit or inadequately perform RoB evaluations of each included primary study. Such deficiencies in RoB assessments are unacceptable but not uncommon, as reported in methodological studies of systematic reviews and overviews [ 104 , 186 , 209 , 210 ]. GRADE ratings are also invalidated if review authors do not formally address and report on the other criteria (Table 5.1 ) necessary for a GRADE certainty rating.

Other caveats pertain to application of a GRADE certainty of evidence rating in various types of evidence syntheses. Current adaptations of GRADE are described in Additional File 5 B and included on Table 6.3 , which is introduced in the next section.

The expected culmination of a systematic review should be a rating of overall certainty of a body of evidence for each outcome reported. The GRADE approach is recommended for making these judgments for outcomes reported in systematic reviews of interventions and can be adapted for other types of reviews. This represents the initial step in the process of making recommendations based on evidence syntheses. Peer reviewers should ensure authors meet the minimal criteria for use of the GRADE approach when reviewing any evidence synthesis that reports certainty ratings derived using GRADE. Authors and peer reviewers of evidence syntheses unfamiliar with GRADE are encouraged to seek formal training and take advantage of the resources available on the GRADE website [ 211 , 212 ].

Part 6. Concise Guide to best practices

Accumulating data in recent years suggest that many evidence syntheses (with or without meta-analysis) are not reliable. This relates in part to the fact that their authors, who are often clinicians, can be overwhelmed by the plethora of ways to evaluate evidence. They tend to resort to familiar but often inadequate, inappropriate, or obsolete methods and tools and, as a result, produce unreliable reviews. These manuscripts may not be recognized as such by peer reviewers and journal editors, who may themselves disregard current standards. When such a systematic review is published or included in a CPG, clinicians and stakeholders tend to believe that it is trustworthy. A vicious cycle in which inadequate methodology is rewarded and potentially misleading conclusions are accepted is thus perpetuated. There is no quick or easy way to break this cycle; however, increasing awareness of best practices among all these stakeholder groups, who often have minimal (if any) training in methodology, may begin to mitigate it. This is the rationale for inclusion of Parts 2 through 5 in this guidance document. These sections present core concepts and important methodological developments that inform current standards and recommendations. We conclude by taking a direct and practical approach.

Inconsistent and imprecise terminology used in the context of development and evaluation of evidence syntheses is problematic for authors, peer reviewers and editors, and may lead to the application of inappropriate methods and tools. In response, we endorse use of the basic terms (Table 6.1 ) defined in the PRISMA 2020 statement [ 93 ]. In addition, we have identified several problematic expressions and nomenclature. In Table 6.2 , we compile suggestions for preferred terms less likely to be misinterpreted.

We also propose a Concise Guide (Table 6.3 ) that summarizes the methods and tools recommended for the development and evaluation of nine types of evidence syntheses. Suggestions for specific tools are based on the rigor of their development as well as the availability of detailed guidance from their developers to ensure their proper application. The formatting of the Concise Guide addresses a well-known source of confusion by clearly distinguishing the underlying methodological constructs that these tools were designed to assess. Important clarifications and explanations follow in the guide’s footnotes; associated websites, if available, are listed in Additional File 6 .

To encourage uptake of best practices, journal editors may consider adopting or adapting the Concise Guide in their instructions to authors and peer reviewers of evidence syntheses. Given the evolving nature of evidence synthesis methodology, the suggested methods and tools are likely to require regular updates. Authors of evidence syntheses should monitor the literature to ensure they are employing current methods and tools. Some types of evidence syntheses (eg, rapid, economic, methodological) are not included in the Concise Guide; for these, authors are advised to obtain recommendations for acceptable methods by consulting with their target journal.

We encourage the appropriate and informed use of the methods and tools discussed throughout this commentary and summarized in the Concise Guide (Table 6.3 ). However, we caution against their application in a perfunctory or superficial fashion. This is a common pitfall among authors of evidence syntheses, especially as the standards of such tools become associated with acceptance of a manuscript by a journal. Consequently, published evidence syntheses may show improved adherence to the requirements of these tools without necessarily making genuine improvements in their performance.

In line with our main objective, the suggested tools in the Concise Guide address the reliability of evidence syntheses; however, we recognize that the utility of systematic reviews is an equally important concern. An unbiased and thoroughly reported evidence synthesis may still not be highly informative if the evidence it summarizes is sparse, weak, and/or biased [ 24 ]. Many intervention systematic reviews, including those developed by Cochrane [ 203 ] and those applying GRADE [ 202 ], ultimately find no evidence, or find the evidence to be inconclusive (eg, “weak,” “mixed,” or of “low certainty”). This often reflects the primary research base; however, it is important to know what is known (or not known) about a topic when considering an intervention for patients and discussing treatment options with them.

Alternatively, the frequency of “empty” and inconclusive reviews published in the medical literature may relate to limitations of conventional methods that focus on hypothesis testing; these have emphasized the importance of statistical significance in primary research and effect sizes from aggregate meta-analyses [ 183 ]. It is becoming increasingly apparent that this approach may not be appropriate for all topics [ 130 ]. Development of the GRADE approach has facilitated a better understanding of significant factors (beyond effect size) that contribute to the overall certainty of evidence. Other notable responses include the development of integrative synthesis methods for the evaluation of complex interventions [ 230 , 231 ], the incorporation of crowdsourcing and machine learning into systematic review workflows (eg, the Cochrane Evidence Pipeline) [ 2 ], the shift in paradigm to living systematic review and NMA platforms [ 232 , 233 ], and the proposal of a new evidence ecosystem that fosters bidirectional collaborations and interactions among a global network of evidence synthesis stakeholders [ 234 ]. These evolutions in data sources and methods may ultimately make evidence syntheses more streamlined, less duplicative, and, more importantly, more useful for timely policy and clinical decision-making; however, that will only be the case if they are rigorously conducted and reported.

We look forward to others’ ideas and proposals for the advancement of methods for evidence syntheses. For now, we encourage dissemination and uptake of the currently accepted best tools and practices for their development and evaluation; at the same time, we stress that uptake of appraisal tools, checklists, and software programs cannot substitute for proper education in the methodology of evidence syntheses and meta-analysis. Authors, peer reviewers, and editors must strive to make accurate and reliable contributions to the present evidence knowledge base; online alerts, upcoming technology, and accessible education may make this more feasible than ever before. Our intention is to improve the trustworthiness of evidence syntheses across disciplines, topics, and types of evidence syntheses. All of us must continue to study, teach, and act cooperatively for that to happen.

Muka T, Glisic M, Milic J, Verhoog S, Bohlius J, Bramer W, et al. A 24-step guide on how to design, conduct, and successfully publish a systematic review and meta-analysis in medical research. Eur J Epidemiol. 2020;35(1):49–60.

Thomas J, McDonald S, Noel-Storr A, Shemilt I, Elliott J, Mavergames C, et al. Machine learning reduced workload with minimal risk of missing studies: development and evaluation of a randomized controlled trial classifier for Cochrane reviews. J Clin Epidemiol. 2021;133:140–51.

Fontelo P, Liu F. A review of recent publication trends from top publishing countries. Syst Rev. 2018;7(1):147.

Whiting P, Savović J, Higgins JPT, Caldwell DM, Reeves BC, Shea B, et al. ROBIS: a new tool to assess risk of bias in systematic reviews was developed. J Clin Epidemiol. 2016;69:225–34.

Shea BJ, Grimshaw JM, Wells GA, Boers M, Andersson N, Hamel C, et al. Development of AMSTAR: a measurement tool to assess the methodological quality of systematic reviews. BMC Med Res Methodol. 2007;7:1–7.

Shea BJ, Reeves BC, Wells G, Thuku M, Hamel C, Moran J, et al. AMSTAR 2: a critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ. 2017;358: j4008.

Goldkuhle M, Narayan VM, Weigl A, Dahm P, Skoetz N. A systematic assessment of Cochrane reviews and systematic reviews published in high-impact medical journals related to cancer. BMJ Open. 2018;8(3): e020869.

Ho RS, Wu X, Yuan J, Liu S, Lai X, Wong SY, et al. Methodological quality of meta-analyses on treatments for chronic obstructive pulmonary disease: a cross-sectional study using the AMSTAR (Assessing the Methodological Quality of Systematic Reviews) tool. NPJ Prim Care Respir Med. 2015;25:14102.

Tsoi AKN, Ho LTF, Wu IXY, Wong CHL, Ho RST, Lim JYY, et al. Methodological quality of systematic reviews on treatments for osteoporosis: a cross-sectional study. Bone. 2020;139(June): 115541.

Arienti C, Lazzarini SG, Pollock A, Negrini S. Rehabilitation interventions for improving balance following stroke: an overview of systematic reviews. PLoS ONE. 2019;14(7):1–23.

Kolaski K, Romeiser Logan L, Goss KD, Butler C. Quality appraisal of systematic reviews of interventions for children with cerebral palsy reveals critically low confidence. Dev Med Child Neurol. 2021;63(11):1316–26.

Almeida MO, Yamato TP, Parreira PCS, do Costa LOP, Kamper S, Saragiotto BT. Overall confidence in the results of systematic reviews on exercise therapy for chronic low back pain: a cross-sectional analysis using the Assessing the Methodological Quality of Systematic Reviews (AMSTAR) 2 tool. Braz J Phys Ther. 2020;24(2):103–17.

Mayo-Wilson E, Ng SM, Chuck RS, Li T. The quality of systematic reviews about interventions for refractive error can be improved: a review of systematic reviews. BMC Ophthalmol. 2017;17(1):1–10.

Matthias K, Rissling O, Pieper D, Morche J, Nocon M, Jacobs A, et al. The methodological quality of systematic reviews on the treatment of adult major depression needs improvement according to AMSTAR 2: a cross-sectional study. Heliyon. 2020;6(9): e04776.

Riado Minguez D, Kowalski M, Vallve Odena M, Longin Pontzen D, Jelicic Kadic A, Jeric M, et al. Methodological and reporting quality of systematic reviews published in the highest ranking journals in the field of pain. Anesth Analg. 2017;125(4):1348–54.

Churuangsuk C, Kherouf M, Combet E, Lean M. Low-carbohydrate diets for overweight and obesity: a systematic review of the systematic reviews. Obes Rev. 2018;19(12):1700–18.

Storman M, Storman D, Jasinska KW, Swierz MJ, Bala MM. The quality of systematic reviews/meta-analyses published in the field of bariatrics: a cross-sectional systematic survey using AMSTAR 2 and ROBIS. Obes Rev. 2020;21(5):1–11.

Franco JVA, Arancibia M, Meza N, Madrid E, Kopitowski K. [Clinical practice guidelines: concepts, limitations and challenges]. Medwave. 2020;20(3):e7887 (in Spanish).

Brito JP, Tsapas A, Griebeler ML, Wang Z, Prutsky GJ, Domecq JP, et al. Systematic reviews supporting practice guideline recommendations lack protection against bias. J Clin Epidemiol. 2013;66(6):633–8.

Zhou Q, Wang Z, Shi Q, Zhao S, Xun Y, Liu H, et al. Clinical epidemiology in China series. Paper 4: the reporting and methodological quality of Chinese clinical practice guidelines published between 2014 and 2018: a systematic review. J Clin Epidemiol. 2021;140:189–99.

Lunny C, Ramasubbu C, Puil L, Liu T, Gerrish S, Salzwedel DM, et al. Over half of clinical practice guidelines use non-systematic methods to inform recommendations: a methods study. PLoS ONE. 2021;16(4):1–21.

Faber T, Ravaud P, Riveros C, Perrodeau E, Dechartres A. Meta-analyses including non-randomized studies of therapeutic interventions: a methodological review. BMC Med Res Methodol. 2016;16(1):1–26.

Ioannidis JPA. The mass production of redundant, misleading, and conflicted systematic reviews and meta-analyses. Milbank Q. 2016;94(3):485–514.

Møller MH, Ioannidis JPA, Darmon M. Are systematic reviews and meta-analyses still useful research? We are not sure. Intensive Care Med. 2018;44(4):518–20.

Moher D, Glasziou P, Chalmers I, Nasser M, Bossuyt PMM, Korevaar DA, et al. Increasing value and reducing waste in biomedical research: who’s listening? Lancet. 2016;387(10027):1573–86.

Barnard ND, Willett WC, Ding EL. The misuse of meta-analysis in nutrition research. JAMA. 2017;318(15):1435–6.

Guyatt G, Oxman AD, Akl EA, Kunz R, Vist G, Brozek J, et al. GRADE guidelines: 1. Introduction - GRADE evidence profiles and summary of findings tables. J Clin Epidemiol. 2011;64(4):383–94.

Page MJ, Shamseer L, Altman DG, Tetzlaff J, Sampson M, Tricco AC, et al. Epidemiology and reporting characteristics of systematic reviews of biomedical research: a cross-sectional study. PLoS Med. 2016;13(5):1–31.

World Health Organization. WHO handbook for guideline development, 2nd edn. WHO; 2014. Available from: https://www.who.int/publications/i/item/9789241548960 . Cited 2022 Jan 20

Higgins J, Lasserson T, Chandler J, Tovey D, Thomas J, Flemying E, et al. Methodological expectations of Cochrane intervention reviews. Cochrane; 2022. Available from: https://community.cochrane.org/mecir-manual/key-points-and-introduction . Cited 2022 Jul 19

Cumpston M, Chandler J. Chapter II: Planning a Cochrane review. In: Higgins J, Thomas J, Chandler J, Cumpston M, Li T, Page M, et al., editors. Cochrane handbook for systematic reviews of interventions. Cochrane; 2022. Available from: https://training.cochrane.org/handbook . Cited 2022 Jan 30

Henderson LK, Craig JC, Willis NS, Tovey D, Webster AC. How to write a Cochrane systematic review. Nephrology. 2010;15(6):617–24.

Page MJ, Altman DG, Shamseer L, McKenzie JE, Ahmadzai N, Wolfe D, et al. Reproducible research practices are underused in systematic reviews of biomedical interventions. J Clin Epidemiol. 2018;94:8–18.

Lorenz RC, Matthias K, Pieper D, Wegewitz U, Morche J, Nocon M, et al. AMSTAR 2 overall confidence rating: lacking discriminating capacity or requirement of high methodological quality? J Clin Epidemiol. 2020;119:142–4.

Posadzki P, Pieper D, Bajpai R, Makaruk H, Könsgen N, Neuhaus AL, et al. Exercise/physical activity and health outcomes: an overview of Cochrane systematic reviews. BMC Public Health. 2020;20(1):1–12.

Wells G, Shea B, O’Connell D, Peterson J, Welch V, Losos M. The Newcastle-Ottawa Scale (NOS) for assessing the quality of nonrandomized studies in meta-analyses. The Ottawa Hospital; 2009. Available from: https://www.ohri.ca/programs/clinical_epidemiology/oxford.asp . Cited 2022 Jul 19

Stang A. Critical evaluation of the Newcastle-Ottawa scale for the assessment of the quality of nonrandomized studies in meta-analyses. Eur J Epidemiol. 2010;25(9):603–5.

Stang A, Jonas S, Poole C. Case study in major quotation errors: a critical commentary on the Newcastle-Ottawa scale. Eur J Epidemiol. 2018;33(11):1025–31.

Ioannidis JPA. Massive citations to misleading methods and research tools: Matthew effect, quotation error and citation copying. Eur J Epidemiol. 2018;33(11):1021–3.

Khalil H, Ameen D, Zarnegar A. Tools to support the automation of systematic reviews: a scoping review. J Clin Epidemiol. 2022;144:22–42.

Crequit P, Boutron I, Meerpohl J, Williams H, Craig J, Ravaud P. Future of evidence ecosystem series: 2. Current opportunities and need for better tools and methods. J Clin Epidemiol. 2020;123:143–52.

Shemilt I, Noel-Storr A, Thomas J, Featherstone R, Mavergames C. Machine learning reduced workload for the Cochrane COVID-19 study register: development and evaluation of the Cochrane COVID-19 study classifier. Syst Rev. 2022;11(1):15.

Nguyen P-Y, Kanukula R, McKenzie J, Alqaidoom Z, Brennan SE, Haddaway N, et al. Changing patterns in reporting and sharing of review data in systematic reviews with meta-analysis of the effects of interventions: a meta-research study. medRxiv; 2022. Available from: https://doi.org/10.1101/2022.04.11.22273688v3 . Cited 2022 Nov 18

Afshari A, Møller MH. Broken science and the failure of academics—resignation or reaction? Acta Anaesthesiol Scand. 2018;62(8):1038–40.

Butler E, Granholm A, Aneman A. Trustworthy systematic reviews–can journals do more? Acta Anaesthesiol Scand. 2019;63(4):558–9.

Negrini S, Côté P, Kiekens C. Methodological quality of systematic reviews on interventions for children with cerebral palsy: the evidence pyramid paradox. Dev Med Child Neurol. 2021;63(11):1244–5.

Page MJ, Moher D. Mass production of systematic reviews and meta-analyses: an exercise in mega-silliness? Milbank Q. 2016;94(3):515–9.

Clarke M, Chalmers I. Reflections on the history of systematic reviews. BMJ Evid Based Med. 2018;23(4):121–2.

Alnemer A, Khalid M, Alhuzaim W, Alnemer A, Ahmed B, Alharbi B, et al. Are health-related tweets evidence based? Review and analysis of health-related tweets on Twitter. J Med Internet Res. 2015;17(10): e246.

Haber N, Smith ER, Moscoe E, Andrews K, Audy R, Bell W, et al. Causal language and strength of inference in academic and media articles shared in social media (CLAIMS): a systematic review. PLoS ONE. 2018;13(5): e196346.

Swetland SB, Rothrock AN, Andris H, Davis B, Nguyen L, Davis P, et al. Accuracy of health-related information regarding COVID-19 on Twitter during a global pandemic. World Med Health Policy. 2021;13(3):503–17.

Nascimento DP, Almeida MO, Scola LFC, Vanin AA, Oliveira LA, Costa LCM, et al. Letter to the editor – not even the top general medical journals are free of spin: a wake-up call based on an overview of reviews. J Clin Epidemiol. 2021;139:232–4.

Ioannidis JPA, Fanelli D, Dunne DD, Goodman SN. Meta-research: evaluation and improvement of research methods and practices. PLoS Biol. 2015;13(10):1–7.

Munn Z, Stern C, Aromataris E, Lockwood C, Jordan Z. What kind of systematic review should I conduct? A proposed typology and guidance for systematic reviewers in the medical and health sciences. BMC Med Res Methodol. 2018;18(1):1–9.

Pollock M, Fernandez R, Becker LA, Pieper D, Hartling L. Chapter V: Overviews of reviews. In: Higgins J, Thomas J, Chandler J, Cumpston M, Li T, Page M, et al., editors. Cochrane handbook for systematic reviews of interventions. Cochrane; 2022. Available from: https://training.cochrane.org/handbook/current/chapter-v . Cited 2022 Mar 7

Tricco AC, Lillie E, Zarin W, O’Brien K, Colquhoun H, Kastner M, et al. A scoping review on the conduct and reporting of scoping reviews. BMC Med Res Methodol. 2016;16(1):1–10.

Garritty C, Gartlehner G, Nussbaumer-Streit B, King VJ, Hamel C, Kamel C, et al. Cochrane rapid reviews methods group offers evidence-informed guidance to conduct rapid reviews. J Clin Epidemiol. 2021;130:13–22.

Elliott JH, Synnot A, Turner T, Simmonds M, Akl EA, McDonald S, et al. Living systematic review: 1. Introduction—the why, what, when, and how. J Clin Epidemiol. 2017;91:23–30.

Higgins JPT, Thomas J, Chandler J. Cochrane handbook for systematic reviews of interventions. Cochrane; 2022. Available from: https://training.cochrane.org/handbook . Cited 2022 Jan 25

Aromataris E, Munn Z. JBI Manual for Evidence Synthesis [internet]. JBI; 2020 [cited 2022 Jan 15]. Available from: https://synthesismanual.jbi.global .

Tufanaru C, Munn Z, Aromataris E, Campbell J, Hopp L. Chapter 3: Systematic reviews of effectiveness. In: Aromataris E, Munn Z, editors. JBI Manual for Evidence Synthesis [internet]. JBI; 2020 [cited 2022 Jan 25]. Available from: https://synthesismanual.jbi.global .

Leeflang MMG, Davenport C, Bossuyt PM. Defining the review question. In: Deeks JJ, Bossuyt PM, Leeflang MMG, Takwoingi Y, editors. Cochrane handbook for systematic reviews of diagnostic test accuracy [internet]. Cochrane; 2022 [cited 2022 Mar 30]. Available from: https://training.cochrane.org/6-defining-review-question .

Noyes J, Booth A, Cargo M, Flemming K, Harden A, Harris J, et al. Qualitative evidence. In: Higgins J, Thomas J, Chandler J, Cumpston M, Li T, Page M, et al., editors. Cochrane handbook for systematic reviews of interventions [internet]. Cochrane; 2022 [cited 2022 Mar 30]. Available from: https://training.cochrane.org/handbook/current/chapter-21#section-21-5 .

Lockwood C, Porritt K, Munn Z, Rittenmeyer L, Salmond S, Bjerrum M, et al. Chapter 2: Systematic reviews of qualitative evidence. In: Aromataris E, Munn Z, editors. JBI Manual for Evidence Synthesis [internet]. JBI; 2020 [cited 2022 Jul 11]. Available from: https://synthesismanual.jbi.global .

Debray TPA, Damen JAAG, Snell KIE, Ensor J, Hooft L, Reitsma JB, et al. A guide to systematic review and meta-analysis of prediction model performance. BMJ. 2017;356:i6460.

Moola S, Munn Z, Tufanaru C, Aromataris E, Sears K, Sfetcu R, et al. Systematic reviews of etiology and risk. In: Aromataris E, Munn Z, editors. JBI Manual for Evidence Synthesis [internet]. JBI; 2020 [cited 2022 Mar 30]. Available from: https://synthesismanual.jbi.global/ .

Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, et al. The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study. Qual Life Res. 2010;19(4):539–49.

Prinsen CAC, Mokkink LB, Bouter LM, Alonso J, Patrick DL, de Vet HCW, et al. COSMIN guideline for systematic reviews of patient-reported outcome measures. Qual Life Res. 2018;27(5):1147–57.

Munn Z, Moola S, Lisy K, Riitano D, Tufanaru C. Chapter 5: Systematic reviews of prevalence and incidence. In: Aromataris E, Munn Z, editors. JBI Manual for Evidence Synthesis [internet]. JBI; 2020 [cited 2022 Mar 30]. Available from: https://synthesismanual.jbi.global/ .

Centre for Evidence-Based Medicine. Study designs. CEBM; 2016. Available from: https://www.cebm.ox.ac.uk/resources/ebm-tools/study-designs . Cited 2022 Aug 30

Hartling L, Bond K, Santaguida PL, Viswanathan M, Dryden DM. Testing a tool for the classification of study designs in systematic reviews of interventions and exposures showed moderate reliability and low accuracy. J Clin Epidemiol. 2011;64(8):861–71.

Crowe M, Sheppard L, Campbell A. Reliability analysis for a proposed critical appraisal tool demonstrated value for diverse research designs. J Clin Epidemiol. 2012;65(4):375–83.

Reeves BC, Wells GA, Waddington H. Quasi-experimental study designs series—paper 5: a checklist for classifying studies evaluating the effects on health interventions—a taxonomy without labels. J Clin Epidemiol. 2017;89:30–42.

Reeves BC, Deeks JJ, Higgins JPT, Shea B, Tugwell P, Wells GA. Chapter 24: Including non-randomized studies on intervention effects. In: Higgins J, Thomas J, Chandler J, Cumpston M, Li T, Page M, et al., editors. Cochrane handbook for systematic reviews of interventions. Cochrane; 2022. Available from: https://training.cochrane.org/handbook/current/chapter-24 . Cited 2022 Mar 1

Reeves B. A framework for classifying study designs to evaluate health care interventions. Forsch Komplementarmed Kl Naturheilkd. 2004;11(Suppl 1):13–7.

Rockers PC, Røttingen J, Shemilt I. Inclusion of quasi-experimental studies in systematic reviews of health systems research. Health Policy. 2015;119(4):511–21.

Mathes T, Pieper D. Clarifying the distinction between case series and cohort studies in systematic reviews of comparative studies: potential impact on body of evidence and workload. BMC Med Res Methodol. 2017;17(1):8–13.

Jhangiani R, Cuttler C, Leighton D. Single subject research. In: Jhangiani R, Cuttler C, Leighton D, editors. Research methods in psychology, 4th edn. Pressbooks KPU; 2019. Available from: https://kpu.pressbooks.pub/psychmethods4e/part/single-subject-research/ . Cited 2022 Aug 15

Higgins JP, Ramsay C, Reeves BC, Deeks JJ, Shea B, Valentine JC, et al. Issues relating to study design and risk of bias when including non-randomized studies in systematic reviews on the effects of interventions. Res Synth Methods. 2013;4(1):12–25.

Cumpston M, Lasserson T, Chandler J, Page M. 3.4.1 Criteria for considering studies for this review, Chapter III: Reporting the review. In: Higgins J, Thomas J, Chandler J, Cumpston M, Li T, Page M, et al., editors. Cochrane handbook for systematic reviews of interventions. Cochrane; 2022. Available from: https://training.cochrane.org/handbook/current/chapter-iii#section-iii-3-4-1 . Cited 2022 Oct 12

Kooistra B, Dijkman B, Einhorn TA, Bhandari M. How to design a good case series. J Bone Jt Surg. 2009;91(Suppl 3):21–6.

Murad MH, Sultan S, Haffar S, Bazerbachi F. Methodological quality and synthesis of case series and case reports. Evid Based Med. 2018;23(2):60–3.

Robinson K, Chou R, Berkman N, Newberry S, Fu R, Hartling L, et al. Methods guide for comparative effectiveness reviews integrating bodies of evidence: existing systematic reviews and primary studies. AHRQ; 2015. Available from: https://archive.org/details/integrating-evidence-report-150226 . Cited 2022 Aug 7

Tugwell P, Welch VA, Karunananthan S, Maxwell LJ, Akl EA, Avey MT, et al. When to replicate systematic reviews of interventions: consensus checklist. BMJ. 2020;370: m2864.

Tsertsvadze A, Maglione M, Chou R, Garritty C, Coleman C, Lux L, et al. Updating comparative effectiveness reviews: current efforts in AHRQ’s effective health care program. J Clin Epidemiol. 2011;64(11):1208–15.

Cumpston M, Chandler J. Chapter IV: Updating a review. In: Higgins J, Thomas J, Chandler J, Cumpston M, Li T, Page M, et al., editors. Cochrane handbook for systematic reviews of interventions. Cochrane; 2022. Available from: https://training.cochrane.org/handbook . Cited 2022 Aug 2

Pollock M, Fernandes RM, Newton AS, Scott SD, Hartling L. A decision tool to help researchers make decisions about including systematic reviews in overviews of reviews of healthcare interventions. Syst Rev. 2019;8(1):1–8.

Pussegoda K, Turner L, Garritty C, Mayhew A, Skidmore B, Stevens A, et al. Identifying approaches for assessing methodological and reporting quality of systematic reviews: a descriptive study. Syst Rev. 2017;6(1):1–12.

Bhaumik S. Use of evidence for clinical practice guideline development. Trop Parasitol. 2017;7(2):65–71.

Moher D, Eastwood S, Olkin I, Rennie D, Stroup D. Improving the quality of reports of meta-analyses of randomised controlled trials: the QUOROM statement. Lancet. 1999;354:1896–900.

Stroup D, Berlin J, Morton S, Olkin I, Williamson G, Rennie D, et al. Meta-analysis of observational studies in epidemiology: a proposal for reporting. JAMA. 2000;283(15):2008–12.

Moher D, Liberati A, Tetzlaff J, Altman DG. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. J Clin Epidemiol. 2009;62(10):1006–12.

Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372: n71.

Oxman AD, Guyatt GH. Validation of an index of the quality of review articles. J Clin Epidemiol. 1991;44(11):1271–8.

Centre for Evidence-Based Medicine. Critical appraisal tools. CEBM; 2015. Available from: https://www.cebm.ox.ac.uk/resources/ebm-tools/critical-appraisal-tools . Cited 2022 Apr 10

Page MJ, McKenzie JE, Higgins JPT. Tools for assessing risk of reporting biases in studies and syntheses of studies: a systematic review. BMJ Open. 2018;8(3):1–16.

Ma LL, Wang YY, Yang ZH, Huang D, Weng H, Zeng XT. Methodological quality (risk of bias) assessment tools for primary and secondary medical studies: what are they and which is better? Mil Med Res. 2020;7(1):1–11.

Banzi R, Cinquini M, Gonzalez-Lorenzo M, Pecoraro V, Capobussi M, Minozzi S. Quality assessment versus risk of bias in systematic reviews: AMSTAR and ROBIS had similar reliability but differed in their construct and applicability. J Clin Epidemiol. 2018;99:24–32.

Swierz MJ, Storman D, Zajac J, Koperny M, Weglarz P, Staskiewicz W, et al. Similarities, reliability and gaps in assessing the quality of conduct of systematic reviews using AMSTAR-2 and ROBIS: systematic survey of nutrition reviews. BMC Med Res Methodol. 2021;21(1):1–10.

Pieper D, Puljak L, González-Lorenzo M, Minozzi S. Minor differences were found between AMSTAR 2 and ROBIS in the assessment of systematic reviews including both randomized and nonrandomized studies. J Clin Epidemiol. 2019;108:26–33.

Lorenz RC, Matthias K, Pieper D, Wegewitz U, Morche J, Nocon M, et al. A psychometric study found AMSTAR 2 to be a valid and moderately reliable appraisal tool. J Clin Epidemiol. 2019;114:133–40.

Leclercq V, Hiligsmann M, Parisi G, Beaudart C, Tirelli E, Bruyère O. Best-worst scaling identified adequate statistical methods and literature search as the most important items of AMSTAR2 (A measurement tool to assess systematic reviews). J Clin Epidemiol. 2020;128:74–82.

Bühn S, Mathes T, Prengel P, Wegewitz U, Ostermann T, Robens S, et al. The risk of bias in systematic reviews tool showed fair reliability and good construct validity. J Clin Epidemiol. 2017;91:121–8.

Gates M, Gates A, Duarte G, Cary M, Becker M, Prediger B, et al. Quality and risk of bias appraisals of systematic reviews are inconsistent across reviewers and centers. J Clin Epidemiol. 2020;125:9–15.

Perry R, Whitmarsh A, Leach V, Davies P. A comparison of two assessment tools used in overviews of systematic reviews: ROBIS versus AMSTAR-2. Syst Rev. 2021;10(1):273.

Gates M, Gates A, Guitard S, Pollock M, Hartling L. Guidance for overviews of reviews continues to accumulate, but important challenges remain: a scoping review. Syst Rev. 2020;9(1):1–19.

Aromataris E, Fernandez R, Godfrey C, Holly C, Khalil H, Tungpunkom P. Chapter 10: Umbrella reviews. In: Aromataris E, Munn Z, editors. JBI Manual for Evidence Synthesis. JBI; 2020. Available from: https://synthesismanual.jbi.global . Cited 2022 Jul 11

Pieper D, Lorenz RC, Rombey T, Jacobs A, Rissling O, Freitag S, et al. Authors should clearly report how they derived the overall rating when applying AMSTAR 2—a cross-sectional study. J Clin Epidemiol. 2021;129:97–103.

Franco JVA, Meza N. Authors should also report the support for judgment when applying AMSTAR 2. J Clin Epidemiol. 2021;138:240.

Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gøtzsche PC, Ioannidis JPA, et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. PLoS Med. 2009;6(7): e1000100.

Page MJ, Moher D. Evaluations of the uptake and impact of the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) statement and extensions: a scoping review. Syst Rev. 2017;6(1):263.

Page MJ, Moher D, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. PRISMA 2020 explanation and elaboration: updated guidance and exemplars for reporting systematic reviews. BMJ. 2021;372: n160.

Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. Updating guidance for reporting systematic reviews: development of the PRISMA 2020 statement. J Clin Epidemiol. 2021;134:103–12.

Welch V, Petticrew M, Petkovic J, Moher D, Waters E, White H, et al. Extending the PRISMA statement to equity-focused systematic reviews (PRISMA-E 2012): explanation and elaboration. J Clin Epidemiol. 2016;70:68–89.

Beller EM, Glasziou PP, Altman DG, Hopewell S, Bastian H, Chalmers I, et al. PRISMA for abstracts: reporting systematic reviews in journal and conference abstracts. PLoS Med. 2013;10(4): e1001419.

Moher D, Shamseer L, Clarke M. Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015 statement. Syst Rev. 2015;4(1):1.

Hutton B, Salanti G, Caldwell DM, Chaimani A, Schmid CH, Cameron C, et al. The PRISMA extension statement for reporting of systematic reviews incorporating network meta-analyses of health care interventions: checklist and explanations. Ann Intern Med. 2015;162(11):777–84.

Stewart LA, Clarke M, Rovers M, Riley RD, Simmonds M, Stewart G, et al. Preferred reporting items for a systematic review and meta-analysis of individual participant data: The PRISMA-IPD statement. JAMA. 2015;313(16):1657–65.

Zorzela L, Loke YK, Ioannidis JP, Golder S, Santaguida P, Altman DG, et al. PRISMA harms checklist: Improving harms reporting in systematic reviews. BMJ. 2016;352: i157.

McInnes MDF, Moher D, Thombs BD, McGrath TA, Bossuyt PM, Clifford T, et al. Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy studies: the PRISMA-DTA statement. JAMA. 2018;319(4):388–96.

Tricco AC, Lillie E, Zarin W, O’Brien KK, Colquhoun H, Levac D, et al. PRISMA extension for scoping reviews (PRISMA-ScR): checklist and explanation. Ann Intern Med. 2018;169(7):467–73.

Wang X, Chen Y, Liu Y, Yao L, Estill J, Bian Z, et al. Reporting items for systematic reviews and meta-analyses of acupuncture: the PRISMA for acupuncture checklist. BMC Complement Altern Med. 2019;19(1):1–10.

Rethlefsen ML, Kirtley S, Waffenschmidt S, Ayala AP, Moher D, Page MJ, et al. PRISMA-S: An extension to the PRISMA statement for reporting literature searches in systematic reviews. J Med Libr Assoc. 2021;109(2):174–200.

Blanco D, Altman D, Moher D, Boutron I, Kirkham JJ, Cobo E. Scoping review on interventions to improve adherence to reporting guidelines in health research. BMJ Open. 2019;9(5): e26589.

Koster TM, Wetterslev J, Gluud C, Keus F, van der Horst ICC. Systematic overview and critical appraisal of meta-analyses of interventions in intensive care medicine. Acta Anaesthesiol Scand. 2018;62(8):1041–9.

Johnson BT, Hennessy EA. Systematic reviews and meta-analyses in the health sciences: best practice methods for research syntheses. Soc Sci Med. 2019;233:237–51.

Pollock A, Berge E. How to do a systematic review. Int J Stroke. 2018;13(2):138–56.

Gagnier JJ, Kellam PJ. Reporting and methodological quality of systematic reviews in the orthopaedic literature. J Bone Jt Surg. 2013;95(11):1–7.

Martinez-Monedero R, Danielian A, Angajala V, Dinalo JE, Kezirian EJ. Methodological quality of systematic reviews and meta-analyses published in high-impact otolaryngology journals. Otolaryngol Head Neck Surg. 2020;163(5):892–905.

Boutron I, Crequit P, Williams H, Meerpohl J, Craig J, Ravaud P. Future of evidence ecosystem series: 1. Introduction - evidence synthesis ecosystem needs dramatic change. J Clin Epidemiol. 2020;123:135–42.

Ioannidis JPA, Bhattacharya S, Evers JLH, Der Veen F, Van SE, Barratt CLR, et al. Protect us from poor-quality medical research. Hum Reprod. 2018;33(5):770–6.

Lasserson T, Thomas J, Higgins J. Section 1.5 Protocol development, Chapter 1: Starting a review. In: Higgins J, Thomas J, Chandler J, Cumpston M, Li T, Page M, et al., editors. Cochrane handbook for systematic reviews of interventions. Cochrane; 2022. Available from: https://training.cochrane.org/handbook/archive/v6/chapter-01#section-1-5 . Cited 2022 Mar 20

Stewart L, Moher D, Shekelle P. Why prospective registration of systematic reviews makes sense. Syst Rev. 2012;1(1):7–10.

Allers K, Hoffmann F, Mathes T, Pieper D. Systematic reviews with published protocols compared to those without: more effort, older search. J Clin Epidemiol. 2018;95:102–10.

Ge L, Tian J, Li Y, Pan J, Li G, Wei D, et al. Association between prospective registration and overall reporting and methodological quality of systematic reviews: a meta-epidemiological study. J Clin Epidemiol. 2018;93:45–55.

Shamseer L, Moher D, Clarke M, Ghersi D, Liberati A, Petticrew M, et al. Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols (PRISMA-P) 2015: elaboration and explanation. BMJ. 2015;350: g7647.

Pieper D, Rombey T. Where to prospectively register a systematic review. Syst Rev. 2022;11(1):8.

PROSPERO. PROSPERO will require earlier registration. NIHR; 2022. Available from: https://www.crd.york.ac.uk/prospero/ . Cited 2022 Mar 20

Kirkham JJ, Altman DG, Williamson PR. Bias due to changes in specified outcomes during the systematic review process. PLoS ONE. 2010;5(3):3–7.

Victora CG, Habicht JP, Bryce J. Evidence-based public health: moving beyond randomized trials. Am J Public Health. 2004;94(3):400–5.

Peinemann F, Kleijnen J. Development of an algorithm to provide awareness in choosing study designs for inclusion in systematic reviews of healthcare interventions: a method study. BMJ Open. 2015;5(8): e007540.

Loudon K, Treweek S, Sullivan F, Donnan P, Thorpe KE, Zwarenstein M. The PRECIS-2 tool: designing trials that are fit for purpose. BMJ. 2015;350: h2147.

Junqueira DR, Phillips R, Zorzela L, Golder S, Loke Y, Moher D, et al. Time to improve the reporting of harms in randomized controlled trials. J Clin Epidemiol. 2021;136:216–20.

Hemkens LG, Contopoulos-Ioannidis DG, Ioannidis JPA. Routinely collected data and comparative effectiveness evidence: promises and limitations. CMAJ. 2016;188(8):E158–64.

Murad MH. Clinical practice guidelines: a primer on development and dissemination. Mayo Clin Proc. 2017;92(3):423–33.

Abdelhamid AS, Loke YK, Parekh-Bhurke S, Chen Y-F, Sutton A, Eastwood A, et al. Use of indirect comparison methods in systematic reviews: a survey of Cochrane review authors. Res Synth Methods. 2012;3(2):71–9.

Jüni P, Holenstein F, Sterne J, Bartlett C, Egger M. Direction and impact of language bias in meta-analyses of controlled trials: empirical study. Int J Epidemiol. 2002;31(1):115–23.

Vickers A, Goyal N, Harland R, Rees R. Do certain countries produce only positive results? A systematic review of controlled trials. Control Clin Trials. 1998;19(2):159–66.

Jones CW, Keil LG, Weaver MA, Platts-Mills TF. Clinical trials registries are under-utilized in the conduct of systematic reviews: a cross-sectional analysis. Syst Rev. 2014;3(1):1–7.

Baudard M, Yavchitz A, Ravaud P, Perrodeau E, Boutron I. Impact of searching clinical trial registries in systematic reviews of pharmaceutical treatments: methodological systematic review and reanalysis of meta-analyses. BMJ. 2017;356: j448.

Fanelli D, Costas R, Ioannidis JPA. Meta-assessment of bias in science. Proc Natl Acad Sci USA. 2017;114(14):3714–9.

Hartling L, Featherstone R, Nuspl M, Shave K, Dryden DM, Vandermeer B. Grey literature in systematic reviews: a cross-sectional study of the contribution of non-English reports, unpublished studies and dissertations to the results of meta-analyses in child-relevant reviews. BMC Med Res Methodol. 2017;17(1):64.

Hopewell S, McDonald S, Clarke M, Egger M. Grey literature in meta-analyses of randomized trials of health care interventions. Cochrane Database Syst Rev. 2007;2:MR000010.

Shojania K, Sampson M, Ansari MT, Ji J, Garritty C, Rader T, et al. Updating systematic reviews. AHRQ Technical Reviews. 2007: Report 07–0087.

Tate RL, Perdices M, Rosenkoetter U, Wakim D, Godbee K, Togher L, et al. Revision of a method quality rating scale for single-case experimental designs and n-of-1 trials: The 15-item Risk of Bias in N-of-1 Trials (RoBiNT) Scale. Neuropsychol Rehabil. 2013;23(5):619–38.

Tate RL, Perdices M, McDonald S, Togher L, Rosenkoetter U. The design, conduct and report of single-case research: Resources to improve the quality of the neurorehabilitation literature. Neuropsychol Rehabil. 2014;24(3–4):315–31.

Sterne JAC, Savović J, Page MJ, Elbers RG, Blencowe NS, Boutron I, et al. RoB 2: a revised tool for assessing risk of bias in randomised trials. BMJ. 2019;366: l4894.

Sterne JA, Hernán MA, Reeves BC, Savović J, Berkman ND, Viswanathan M, et al. ROBINS-I: a tool for assessing risk of bias in non-randomised studies of interventions. BMJ. 2016;355: i4919.

Igelström E, Campbell M, Craig P, Katikireddi SV. Cochrane’s risk of bias tool for non-randomized studies (ROBINS-I) is frequently misapplied: a methodological systematic review. J Clin Epidemiol. 2021;140:22–32.

McKenzie JE, Brennan SE. Chapter 12: Synthesizing and presenting findings using other methods. In: Higgins J, Thomas J, Chandler J, Cumpston M, Li T, Page M, et al., editors. Cochrane handbook for systematic reviews of interventions. Cochrane; 2022. Available from: https://training.cochrane.org/handbook/current/chapter-12 . Cited 2022 Apr 10

Ioannidis J, Patsopoulos N, Rothstein H. Reasons or excuses for avoiding meta-analysis in forest plots. BMJ. 2008;336(7658):1413–5.

Stewart LA, Tierney JF. To IPD or not to IPD? Eval Health Prof. 2002;25(1):76–97.

Tierney JF, Stewart LA, Clarke M. Chapter 26: Individual participant data. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page M, et al., editors. Cochrane handbook for systematic reviews of interventions. Cochrane; 2022. Available from: https://training.cochrane.org/handbook/current/chapter-26 . Cited 2022 Oct 12

Chaimani A, Caldwell D, Li T, Higgins J, Salanti G. Chapter 11: Undertaking network meta-analyses. In: Higgins J, Thomas J, Chandler J, Cumpston M, Li T, Page M, et al., editors. Cochrane handbook for systematic reviews of interventions. Cochrane; 2022. Available from: https://training.cochrane.org/handbook . Cited 2022 Oct 12.

Cooper H, Hedges L, Valentine J. The handbook of research synthesis and meta-analysis. 3rd ed. Russell Sage Foundation; 2019.

Sutton AJ, Abrams KR, Jones DR, Sheldon T, Song F. Methods for meta-analysis in medical research. John Wiley & Sons; 2000.

Deeks J, Higgins JPT, Altman DG. Chapter 10: Analysing data and undertaking meta-analyses. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page M, et al., editors. Cochrane handbook for systematic review of interventions. Cochrane; 2022. Available from: http://www.training.cochrane.org/handbook . Cited 2022 Mar 20.

Clarke MJ. Individual patient data meta-analyses. Best Pract Res Clin Obstet Gynaecol. 2005;19(1):47–55.

Catalá-López F, Tobías A, Cameron C, Moher D, Hutton B. Network meta-analysis for comparing treatment effects of multiple interventions: an introduction. Rheumatol Int. 2014;34(11):1489–96.

Debray T, Schuit E, Efthimiou O, Reitsma J, Ioannidis J, Salanti G, et al. An overview of methods for network meta-analysis using individual participant data: when do benefits arise? Stat Methods Med Res. 2016;27(5):1351–64.

Tonin FS, Rotta I, Mendes AM, Pontarolo R. Network meta-analysis: a technique to gather evidence from direct and indirect comparisons. Pharm Pract (Granada). 2017;15(1):943.

Tierney JF, Vale C, Riley R, Smith CT, Stewart L, Clarke M, et al. Individual participant data (IPD) meta-analyses of randomised controlled trials: guidance on their use. PLoS Med. 2015;12(7): e1001855.

Rouse B, Chaimani A, Li T. Network meta-analysis: an introduction for clinicians. Intern Emerg Med. 2017;12(1):103–11.

Cochrane Training. Review Manager RevMan Web. Cochrane; 2022. Available from: https://training.cochrane.org/online-learning/core-software/revman . Cited 2022 Jun 24

MetaXL. Epi Gear; 2016. Available from: http://epigear.com/index_files/metaxl.html . Cited 2022 Jun 24.

JBI. JBI SUMARI. JBI; 2019. Available from: https://sumari.jbi.global/ . Cited 2022 Jun 24.

Ryan R. Cochrane Consumers and Communication Review Group: data synthesis and analysis. Cochrane Consumers and Communication Review Group; 2013. Available from: http://cccrg.cochrane.org . Cited 2022 Jun 24

McKenzie JE, Beller EM, Forbes AB. Introduction to systematic reviews and meta-analysis. Respirology. 2016;21(4):626–37.

Campbell M, Katikireddi SV, Sowden A, Thomson H. Lack of transparency in reporting narrative synthesis of quantitative data: a methodological assessment of systematic reviews. J Clin Epidemiol. 2019;105:1–9.

Campbell M, McKenzie JE, Sowden A, Katikireddi SV, Brennan SE, Ellis S, et al. Synthesis without meta-analysis (SWiM) in systematic reviews: reporting guideline. BMJ. 2020;368: l6890.

McKenzie JE, Brennan S, Ryan R. Summarizing study characteristics and preparing for synthesis. In: Higgins J, Thomas J, Chandler J, Cumpston M, Li T, Page M, et al., editors. Cochrane handbook for systematic reviews of interventions. Cochrane; 2022. Available from: https://training.cochrane.org/handbook . Cited 2022 Oct 12

AHRQ. Systems to rate the strength of scientific evidence. Evidence report/technology assessment no. 47. AHRQ; 2002. Available from: https://archive.ahrq.gov/clinic/epcsums/strengthsum.htm . Cited 2022 Apr 10.

Atkins D, Eccles M, Flottorp S, Guyatt GH, Henry D, Hill S, et al. Systems for grading the quality of evidence and the strength of recommendations I: critical appraisal of existing approaches. BMC Health Serv Res. 2004;4(1):38.

Ioannidis JPA. Meta-research: the art of getting it wrong.  Res Synth Methods. 2010;1(3–4):169–84.

Lai NM, Teng CL, Lee ML. Interpreting systematic reviews:  are we ready to make our own conclusions? A cross sectional study. BMC Med. 2011;9(1):30.

Glenton C, Santesso N, Rosenbaum S, Nilsen ES, Rader T, Ciapponi A, et al. Presenting the results of Cochrane systematic reviews to a consumer audience: a qualitative study. Med Decis Making. 2010;30(5):566–77.

Yavchitz A, Ravaud P, Altman DG, Moher D, Hrobjartsson A, Lasserson T, et al. A new classification of spin in systematic reviews and meta-analyses was developed and ranked according to the severity. J Clin Epidemiol. 2016;75:56–65.

Atkins D, Best D, Briss PA, Eccles M, Falck-Ytter Y, Flottorp S, et al. GRADE Working Group. Grading quality of evidence and strength of recommendations. BMJ. 2004;328(7454):1490.

GRADE Working Group. Organizations. GRADE; 2022 [cited 2023 May 2].  Available from: www.gradeworkinggroup.org .

Hartling L, Fernandes RM, Seida J, Vandermeer B, Dryden DM. From the trenches: a cross-sectional study applying the GRADE tool in systematic reviews of healthcare interventions. PLoS ONE. 2012;7(4):e34697.

Hultcrantz M, Rind D, Akl EA, Treweek S, Mustafa RA, Iorio A, et al. The GRADE working group clarifies the construct of certainty of evidence. J Clin Epidemiol. 2017;87:4–13.

Schünemann H, Brozek J, Guyatt G, Oxman AD, editors. Section 6.3.2. Symbolic representation. GRADE Handbook [internet]. GRADE; 2013 [cited 2022 Jan 27]. Available from: https://gdt.gradepro.org/app/handbook/handbook.html#h.lr8e9vq954 .

Siemieniuk R, Guyatt G. What is GRADE? [internet] BMJ Best Practice; 2017 [cited 2022 Jul 20]. Available from: https://bestpractice.bmj.com/info/toolkit/learn-ebm/what-is-grade/ .

Guyatt G, Oxman AD, Sultan S, Brozek J, Glasziou P, Alonso-Coello P, et al. GRADE guidelines: 11. Making an overall rating of confidence in effect estimates for a single outcome and for all outcomes. J Clin Epidemiol. 2013;66(2):151–7.

Guyatt GH, Oxman AD, Sultan S, Glasziou P, Akl EA, Alonso-Coello P, et al. GRADE guidelines: 9. Rating up the quality of evidence. J Clin Epidemiol. 2011;64(12):1311–6.

Guyatt GH, Oxman AD, Vist G, Kunz R, Brozek J, Alonso-Coello P, et al. GRADE guidelines: 4. Rating the quality of evidence - Study limitations (risk of bias). J Clin Epidemiol. 2011;64(4):407–15.

Guyatt GH, Oxman AD, Kunz R, Brozek J, Alonso-Coello P, Rind D, et al. GRADE guidelines 6. Rating the quality of evidence - Imprecision. J Clin Epidemiol. 2011;64(12):1283–93.

Guyatt GH, Oxman AD, Kunz R, Woodcock J, Brozek J, Helfand M, et al. GRADE guidelines: 7. Rating the quality of evidence - Inconsistency. J Clin Epidemiol. 2011;64(12):1294–302.

Guyatt GH, Oxman AD, Kunz R, Woodcock J, Brozek J, Helfand M, et al. GRADE guidelines: 8. Rating the quality of evidence - Indirectness. J Clin Epidemiol. 2011;64(12):1303–10.

Guyatt GH, Oxman AD, Montori V, Vist G, Kunz R, Brozek J, et al. GRADE guidelines: 5. Rating the quality of evidence - Publication bias. J Clin Epidemiol. 2011;64(12):1277–82.

Andrews JC, Schünemann HJ, Oxman AD, Pottie K, Meerpohl JJ, Coello PA, et al. GRADE guidelines: 15. Going from evidence to recommendation - Determinants of a recommendation’s direction and strength. J Clin Epidemiol. 2013;66(7):726–35.

Fleming PS, Koletsi D, Ioannidis JPA, Pandis N. High quality of the evidence for medical and other health-related interventions was uncommon in Cochrane systematic reviews. J Clin Epidemiol. 2016;78:34–42.

Howick J, Koletsi D, Pandis N, Fleming PS, Loef M, Walach H, et al. The quality of evidence for medical interventions does not improve or worsen: a metaepidemiological study of Cochrane reviews. J Clin Epidemiol. 2020;126:154–9.

Mustafa RA, Santesso N, Brozek J, Akl EA, Walter SD, Norman G, et al. The GRADE approach is reproducible in assessing the quality of evidence of quantitative evidence syntheses. J Clin Epidemiol. 2013;66(7):736-742.e5.

Schünemann H, Brozek J, Guyatt G, Oxman A, editors. Section 5.4: Overall quality of evidence. GRADE Handbook. GRADE; 2013. Available from: https://gdt.gradepro.org/app/handbook/handbook.html#h.lr8e9vq954a . Cited 2022 Mar 25.

GRADE Working Group. Criteria for using GRADE. GRADE; 2016. Available from: https://www.gradeworkinggroup.org/docs/Criteria_for_using_GRADE_2016-04-05.pdf . Cited 2022 Jan 26

Werner SS, Binder N, Toews I, Schünemann HJ, Meerpohl JJ, Schwingshackl L. Use of GRADE in evidence syntheses published in high-impact-factor nutrition journals: a methodological survey. J Clin Epidemiol. 2021;135:54–69.

Zhang S, Wu QJ, Liu SX. A methodologic survey on use of the GRADE approach in evidence syntheses published in high-impact factor urology and nephrology journals. BMC Med Res Methodol. 2022;22(1):220.

Li L, Tian J, Tian H, Sun R, Liu Y, Yang K. Quality and transparency of overviews of systematic reviews. J Evid Based Med. 2012;5(3):166–73.

Pieper D, Buechter R, Jerinic P, Eikermann M. Overviews of reviews often have limited rigor: a systematic review. J Clin Epidemiol. 2012;65(12):1267–73.

Cochrane Editorial Unit. Appendix 1: Checklist for auditing GRADE and SoF tables in protocols of intervention reviews. Cochrane Training; 2022. Available from: https://training.cochrane.org/gomo/modules/522/resources/8307/Checklist for GRADE and SoF methods in Protocols for Gomo.pdf. Cited 2022 Mar 12

Ryan R, Hill S. How to GRADE the quality of the evidence. Cochrane Consumers and Communication Group. Cochrane; 2016. Available from: https://cccrg.cochrane.org/author-resources .

Cunningham M, France EF, Ring N, Uny I, Duncan EA, Roberts RJ, et al. Developing a reporting guideline to improve meta-ethnography in health research: the eMERGe mixed-methods study. Heal Serv Deliv Res. 2019;7(4):1–116.

Tong A, Flemming K, McInnes E, Oliver S, Craig J. Enhancing transparency in reporting the synthesis of qualitative research: ENTREQ. BMC Med Res Methodol. 2012;12:181.

Gates M, Gates G, Pieper D, Fernandes R, Tricco A, Moher D, et al. Reporting guideline for overviews of reviews of healthcare interventions: development of the PRIOR statement. BMJ. 2022;378:e070849.

Whiting PF, Reitsma JB, Leeflang MMG, Sterne JAC, Bossuyt PMM, Rutjes AWSS, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. 2011;155(4):529–36.

Hayden JA, van der Windt DA, Cartwright JL, Co P. Research and reporting methods assessing bias in studies of prognostic factors. Ann Intern Med. 2013;158(4):280–6.

Critical Appraisal Skills Programme. CASP qualitative checklist. CASP; 2018. Available from: https://casp-uk.net/images/checklist/documents/CASP-Qualitative-Studies-Checklist/CASP-Qualitative-Checklist-2018_fillable_form.pdf . Cited 2022 Apr 26

Hannes K, Lockwood C, Pearson A. A comparative analysis of three online appraisal instruments’ ability to assess validity in qualitative research. Qual Health Res. 2010;20(12):1736–43.

Munn Z, Moola S, Riitano D, Lisy K. The development of a critical appraisal tool for use in systematic reviews addressing questions of prevalence. Int J Heal Policy Manag. 2014;3(3):123–8.

Lewin S, Bohren M, Rashidian A, Munthe-Kaas H, Glenton C, Colvin CJ, et al. Applying GRADE-CERQual to qualitative evidence synthesis findings-paper 2: how to make an overall CERQual assessment of confidence and create a Summary of Qualitative Findings table. Implement Sci. 2018;13(suppl 1):10.

Munn Z, Porritt K, Lockwood C, Aromataris E, Pearson A.  Establishing confidence in the output of qualitative research synthesis: the ConQual approach. BMC Med Res Methodol. 2014;14(1):108.

Flemming K, Booth A, Hannes K, Cargo M, Noyes J. Cochrane Qualitative and Implementation Methods Group guidance series—paper 6: reporting guidelines for qualitative, implementation, and process evaluation evidence syntheses. J Clin Epidemiol. 2018;97:79–85.

Lockwood C, Munn Z, Porritt K. Qualitative research synthesis:  methodological guidance for systematic reviewers utilizing meta-aggregation. Int J Evid Based Health. 2015;13(3):179–87.

Schünemann HJ, Mustafa RA, Brozek J, Steingart KR, Leeflang M, Murad MH, et al. GRADE guidelines: 21 part 1.  Study design, risk of bias, and indirectness in rating the certainty across a body of evidence for test accuracy. J Clin Epidemiol. 2020;122:129–41.

Schünemann HJ, Mustafa RA, Brozek J, Steingart KR, Leeflang M, Murad MH, et al. GRADE guidelines: 21 part 2. Test accuracy: inconsistency, imprecision, publication bias, and other domains for rating the certainty of evidence and presenting it in evidence profiles and summary of findings tables. J Clin Epidemiol. 2020;122:142–52.

Foroutan F, Guyatt G, Zuk V, Vandvik PO, Alba AC, Mustafa R, et al. GRADE Guidelines 28: use of GRADE for the assessment of evidence about prognostic factors:  rating certainty in identification of groups of patients with different absolute risks. J Clin Epidemiol. 2020;121:62–70.

Janiaud P, Agarwal A, Belbasis L, Tzoulaki I. An umbrella review of umbrella reviews for non-randomized observational evidence on putative risk and protective factors [internet]. OSF protocol; 2021 [cited 2022 May 28]. Available from: https://osf.io/xj5cf/ .

Mokkink LB, Prinsen CA, Patrick DL, Alonso J, Bouter LM, et al. COSMIN methodology for systematic reviews of Patient-Reported Outcome Measures (PROMs) - user manual. COSMIN; 2018 [cited 2022 Feb 15]. Available from:  http://www.cosmin.nl/ .

Thomas J, M P, Noyes J, Chandler J, Rehfuess E, Tugwell P, et al. Chapter 17: Intervention complexity. In: Higgins J, Thomas J, Chandler J, Cumpston M, Li T, Page M, et al., editors. Cochrane handbook for systematic reviews of interventions. Cochrane; 2022. Available from: https://training.cochrane.org/handbook/current/chapter-17 . Cited 2022 Oct 12

Guise JM, Chang C, Butler M, Viswanathan M, Tugwell P. AHRQ series on complex intervention systematic reviews—paper 1: an introduction to a series of articles that provide guidance and tools for reviews of complex interventions. J Clin Epidemiol. 2017;90:6–10.

Riaz IB, He H, Ryu AJ, Siddiqi R, Naqvi SAA, Yao Y, et al. A living, interactive systematic review and network meta-analysis of first-line treatment of metastatic renal cell carcinoma [formula presented]. Eur Urol. 2021;80(6):712–23.

Créquit P, Trinquart L, Ravaud P. Live cumulative network meta-analysis: protocol for second-line treatments in advanced non-small-cell lung cancer with wild-type or unknown status for epidermal growth factor receptor. BMJ Open. 2016;6(8):e011841.

Ravaud P, Créquit P, Williams HC, Meerpohl J, Craig JC, Boutron I. Future of evidence ecosystem series: 3. From an evidence synthesis ecosystem to an evidence ecosystem. J Clin Epidemiol. 2020;123:153–61.

Download references

Acknowledgements

Michelle Oakman Hayes for her assistance with the graphics, Mike Clarke for his willingness to answer our seemingly arbitrary questions, and Bernard Dan for his encouragement of this project.

The work of John Ioannidis has been supported by an unrestricted gift from Sue and Bob O’Donnell to Stanford University.

Author information

Authors and Affiliations

Departments of Orthopaedic Surgery, Pediatrics, and Neurology, Wake Forest School of Medicine, Winston-Salem, NC, USA

Kat Kolaski

Department of Physical Medicine and Rehabilitation, SUNY Upstate Medical University, Syracuse, NY, USA

Lynne Romeiser Logan

Departments of Medicine, of Epidemiology and Population Health, of Biomedical Data Science, and of Statistics, and Meta-Research Innovation Center at Stanford (METRICS), Stanford University School of Medicine, Stanford, CA, USA

John P. A. Ioannidis

Contributions

All authors participated in the development of the ideas, writing, and review of this manuscript. The author(s) read and approved the final manuscript.

Corresponding author

Correspondence to Kat Kolaski.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article has been published simultaneously in BMC Systematic Reviews, Acta Anaesthesiologica Scandinavica, BMC Infectious Diseases, British Journal of Pharmacology, JBI Evidence Synthesis, the Journal of Bone and Joint Surgery Reviews, and the Journal of Pediatric Rehabilitation Medicine.

Supplementary Information

Additional file 2A. Overviews, scoping reviews, rapid reviews and living reviews.

Additional file 2B. Practical scheme for distinguishing types of research evidence.

Additional file 4. Presentation of forest plots.

Additional file 5A. Illustrations of the GRADE approach.

Additional file 5B. Adaptations of GRADE for evidence syntheses.

Additional file 6. Links to Concise Guide online resources.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Kolaski, K., Logan, L.R. & Ioannidis, J.P.A. Guidance to best tools and practices for systematic reviews. Syst Rev 12, 96 (2023). https://doi.org/10.1186/s13643-023-02255-9

Received: 03 October 2022

Accepted: 19 February 2023

Published: 08 June 2023

DOI: https://doi.org/10.1186/s13643-023-02255-9

Keywords:

  • Certainty of evidence
  • Critical appraisal
  • Methodological quality
  • Risk of bias
  • Systematic review

Critical Appraisal of Systematic Reviews and Meta-Analyses

  • First Online: 27 June 2021

  • Sanjay Patole

Systematic reviews are the most reliable and comprehensive statement about what works. They focus on a specific question and use clearly stated, prespecified scientific methods to identify, select, assess, and summarise the findings of similar but separate studies. A systematic review may or may not contain a meta-analysis, for various reasons. Given the hierarchy of evidence-based medicine, a systematic review and meta-analysis are expected to provide robust evidence to guide clinical practice and research. However, the methodological rigour (design, conduct, analysis, interpretation, and reporting) of both the systematic review and meta-analysis and the included studies deserves equal attention when judging the validity of a review's findings. Reproducibility is a critical aspect of science: without transparency about what was done and how it was done, results are difficult to reproduce, which calls the validity of any study into question. This chapter focuses on the critical appraisal of a systematic review and meta-analysis based on their principles and practice.

Author information

Authors and Affiliations

School of Medicine, University of Western Australia, Perth, WA, 6009, Australia

Sanjay Patole

Neonatal Directorate, King Edward Memorial Hospital for Women, Perth, WA, 6008, Australia

Corresponding author

Correspondence to Sanjay Patole.

Editor information

Editors and Affiliations

School of Medicine, University of Western Australia, Perth, WA, Australia

Sanjay Patole

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Patole, S. (2021). Critical Appraisal of Systematic Reviews and Meta-Analyses. In: Patole, S. (eds) Principles and Practice of Systematic Reviews and Meta-Analysis. Springer, Cham. https://doi.org/10.1007/978-3-030-71921-0_12

DOI: https://doi.org/10.1007/978-3-030-71921-0_12

Published: 27 June 2021

Publisher Name: Springer, Cham

Print ISBN: 978-3-030-71920-3

Online ISBN: 978-3-030-71921-0

eBook Packages: Biomedical and Life Sciences (R0)

  • Overview of systematic reviews
  • Systematic or scoping?
  • Other review types
  • Glossary of terms
  • Define question
  • Top tools and techniques
  • How to search
  • Where to search
  • Subject headings
  • Search filters
  • Review your search
  • Run your search on other databases
  • Search the grey literature
  • Report search results
  • Updating a search
  • How to screen

Critical appraisal

Critical appraisal overview

Before using studies in your review, you need to critically appraise them for quality and risk of bias.

Critical appraisal tools

Use a formal Critical Appraisal Tool ("CAT") to assess your papers. The tool must be applied without adaptation to the appropriate study design.

  • AMSTAR Assessment of Multiple Systematic Reviews (AMSTAR) is an 11-item assessment tool used to assess the methodological quality of systematic reviews.
  • COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN)
  • Critical appraisal skills programme (CASP) Tools for systematic reviews, randomised controlled trials, cohort studies, and case control studies
  • GATE (Graphic Appraisal Tool for Epidemiology)
  • JBI: Critical appraisal tools CATs for a range of study types: case control, case reports, cohort, diagnostic test, economic evaluations, prevalence, quasi-experimental, RCTs, SRs and more.
  • Methodological index for non-randomized studies (MINORS)
  • PEDro – Physiotherapy Evidence Database Developed to support evidence-based practice in physiotherapy, PEDro can be used to find trials, reviews and guidelines evaluating physiotherapy interventions. Trials are assessed for quality using the PEDro scale. Aimed at a global audience and produced in Australia by the Institute for Musculoskeletal Health.

The application of a CAT ('CASP')

The Cochrane Common Mental Disorders group have produced 7 videos demonstrating the application of the CASP checklist to different study designs.

  • 1. Introduction to critical appraisal
  • 2. Systematic reviews and meta-analysis
  • 3. Randomised controlled trials
  • 4. Cohort studies
  • 5. Case control studies
  • 6. Cross sectional studies
  • 7. Diagnostic studies

Guidelines and standards

  • Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) website

'Specify the methods used to assess risk of bias in the included studies, including details of the tool(s) used, how many reviewers assessed each study and whether they worked independently, and, if applicable, details of automation tools used in the process.' - PRISMA 2020 Explanation and Elaboration, p. 11

Other standards

  • Overview of systematic reviews See the overview page (of this guide) for additional guidelines and standards.

Hierarchy of evidence variations

Many hierarchies have been developed to show the different levels of evidence, and to 'rank' different study designs. See some common ones below:

  • NHMRC Evidence Hierarchy Evidence Hierarchy used by Australia's National Health & Medical Research Council.
  • Brian Haynes 6S Hierarchy of Evidence Based Resources Developed in 2009, this is a useful model for guiding clinical decision making.
  • Levels of Evidence FAME JBI, Collaborating Centres, and Evidence Translation Groups assign a level of evidence to all conclusions drawn in JBI Systematic Reviews.
  • CEBM 2020 Levels of Evidence System In addition to traditional critical appraisal, this enables clinicians/patients to answer clinical questions quickly and without pre-appraised sources.

Further resources

  • Quality of reporting of observational longitudinal research This study developed and tested a checklist of criteria related to threats to the internal and external validity of observational longitudinal studies.
  • Study designs (Centre for Evidence-Based Medicine) This short article gives a brief guide to the different study types and a comparison of the advantages and disadvantages.
  • Using preprints in evidence synthesis: Commentary on experience during the COVID-19 pandemic Clyne, B., Walsh, K. A., O'Murchu, E., et al. (2021). Journal of Clinical Epidemiology, 138, 203-210. https://doi.org/10.1016/j.jclinepi.2021.05.010

J Clin Diagn Res. 2017 May;11(5)

Critical Appraisal of Clinical Research

Azzam Al-Jundi

1 Professor, Department of Orthodontics, King Saud bin Abdul Aziz University for Health Sciences-College of Dentistry, Riyadh, Kingdom of Saudi Arabia.

Salah Sakka

2 Associate Professor, Department of Oral and Maxillofacial Surgery, Al Farabi Dental College, Riyadh, KSA.

Evidence-based practice is the integration of individual clinical expertise with the best available external clinical evidence from systematic research, and with patients' values and expectations, in the decision-making process for patient care. Being able to identify and appraise the best available evidence, and to integrate it with your own clinical experience and patients' values, is a fundamental skill. The aim of this article is to provide a robust and simple process for assessing the credibility of articles and their value to your clinical practice.

Introduction

Decisions about patient care are made by integrating the best existing evidence with clinical experience and patient preference. Critical appraisal is the process of carefully and systematically examining research to assess its reliability, value, and relevance, in order to guide professionals in their clinical decision making [ 1 ].

Critical appraisal is essential to:

  • Combat information overload;
  • Identify papers that are clinically relevant;
  • Continuing Professional Development (CPD).

Carrying out Critical Appraisal:

Assessing the research methods used in the study is a prime step in its critical appraisal. This is done using checklists which are specific to the study design.

Standard Common Questions:

  • What is the research question?
  • What is the study type (design)?
  • Selection issues.
  • What are the outcome factors and how are they measured?
  • What are the study factors and how are they measured?
  • What important potential confounders are considered?
  • What is the statistical method used in the study?
  • Statistical results.
  • What conclusions did the authors reach about the research question?
  • Are ethical issues considered?

The Critical Appraisal starts by double checking the following main sections:

I. Overview of the paper:

  • The publishing journal and the year
  • The article title: Does it state key trial objectives?
  • The author(s) and their institution(s)

The presence of a peer review process in a journal's acceptance protocols adds robustness to the assessment of research papers and reduces the likelihood that poor-quality research is published. Other areas to consider include the authors' declarations of interest and potential market bias. Attention should be paid to any declared funding or research grant, in order to check for a conflict of interest [ 2 ].

II. ABSTRACT: Reading the abstract is a quick way of getting to know the article and its purpose, major procedures and methods, main findings, and conclusions.

  • Aim of the study: It should be well and clearly written.
  • Materials and Methods: The study design, type of groups, randomization process, sample size, gender, age, procedure rendered to each group, and measuring tool(s) should be clearly stated.
  • Results: The measured variables with their statistical analysis and significance.
  • Conclusion: It must clearly answer the question of interest.

III. Introduction/Background section:

An excellent introduction will thoroughly reference earlier work related to the area under discussion and express the importance and limitations of what is already known [ 2 ].

-Why is this study considered necessary? What is its purpose? Was the purpose identified before the study, or was a chance result revealed as part of ‘data searching’?

-What has already been achieved, and how does this study differ?

-Does the scientific approach outline the advantages along with possible drawbacks associated with the intervention or observations?

IV. Methods and Materials section: Full details on how the study was actually carried out should be mentioned. Precise information is given on the study design, the population, the sample size and the interventions presented. All measurement approaches should be clearly stated [ 3 ].

V. Results section: This section should clearly reveal what actually occurred to the subjects. The results might contain raw data and explain the statistical analysis. These can be shown in related tables, diagrams and graphs.

VI. Discussion section: This section should include a thorough comparison of what is already known in the topic of interest and the clinical relevance of what has been newly established. Possible limitations and the need for further studies should also be discussed.

Does it summarize the main findings of the study and relate them to any deficiencies in the study design or problems in the conduct of the study?

  • Does it address any source of potential bias?
  • Are interpretations consistent with the results?
  • How are null findings interpreted?
  • Does it mention how do the findings of this study relate to previous work in the area?
  • Can they be generalized (external validity)?
  • Does it mention their clinical implications/applicability?
  • What are the results/outcomes/findings applicable to and will they affect a clinical practice?
  • Does the conclusion answer the study question?
  • -Is the conclusion convincing?
  • -Does the paper indicate ethics approval?
  • -Can you identify potential ethical issues?
  • -Do the results apply to the population in which you are interested?
  • -Will you use the results of the study?

Once you have answered the preliminary and key questions and identified the research method used, you can incorporate specific questions related to each method into your appraisal process or checklist.

1-What is the research question?

For a study to gain value, it should address a significant problem within healthcare and provide new or meaningful results. A useful structure for assessing the problem addressed in an article is the Problem, Intervention, Comparison, Outcome (PICO) method [ 3 ].

P = Patient/Problem/Population: It involves identifying whether the research has a focused question. What is the chief complaint? e.g., disease status, previous ailments, current medications.

I = Intervention: An appropriately and clearly stated management strategy, e.g., a new diagnostic test, treatment, or adjunctive therapy.

C = Comparison: A suitable control or alternative, e.g., specific and limited to one alternative choice.

O = Outcomes: The desired results or patient-related consequences have to be identified, e.g., eliminating symptoms, improving function, esthetics.
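
Because PICO is simply a four-field structure, it can be captured in a small record type. The sketch below is a minimal illustration in Python; the field names and all clinical content are invented for demonstration, not taken from the article.

```python
# A minimal sketch (not from the article): representing a PICO question as a
# small data structure. All clinical content below is invented for illustration.
from dataclasses import dataclass

@dataclass
class PICOQuestion:
    population: str    # P: patient, problem, or population
    intervention: str  # I: the management strategy being assessed
    comparison: str    # C: the control or alternative
    outcome: str       # O: the desired, patient-relevant result

question = PICOQuestion(
    population="adults with chronic periodontitis",
    intervention="scaling and root planing plus adjunctive systemic antibiotics",
    comparison="scaling and root planing alone",
    outcome="reduction in probing pocket depth at 6 months",
)
print(question)
```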

The clinical question determines which study designs are appropriate. There are five broad categories of clinical questions, as shown in [ Table/Fig-1 ].

[Table/Fig-1]:

Categories of clinical questions and the related study designs.

  • Aetiology/Causation: What caused the disorder and how is this related to the development of illness? Example designs: randomized controlled trial, case-control study, cohort study.
  • Therapy: Which treatments do more good than harm compared with an alternative treatment? Example designs: randomized controlled trial, systematic review, meta-analysis.
  • Prognosis: What is the likely course of a patient’s illness? What is the balance of the risks and benefits of a treatment? Example designs: cohort study, longitudinal survey.
  • Diagnosis: How valid and reliable is a diagnostic test? What does the test tell the doctor? Example designs: cohort study, case-control study.
  • Cost-effectiveness: Which intervention is worth prescribing? Is a newer treatment X worth prescribing compared with older treatment Y? Example design: economic analysis.

2- What is the study type (design)?

The design of a study is fundamental to its usefulness.

In a clinical paper, the methodology employed to generate the results is fully explained. In general, all questions about the related clinical query, the study design, the subjects, and the measures taken to reduce bias and confounding should be adequately and thoroughly explored and answered.

Participants/Sample Population:

Researchers identify the target population they are interested in. A sample population is therefore taken and results from this sample are then generalized to the target population.

The sample should be representative of the target population from which it came. Knowing the baseline characteristics of the sample population is important because this allows researchers to see how closely the subjects match their own patients [ 4 ].

Sample size calculation (Power calculation): A trial should be large enough to have a high chance of detecting a worthwhile effect if it exists. Statisticians can work out before the trial begins how large the sample size should be in order to have a good chance of detecting a true difference between the intervention and control groups [ 5 ].
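
As a numeric illustration of the arithmetic behind such a calculation, the sketch below applies the standard normal-approximation formula for comparing two proportions. The event rates, alpha, and power are assumed values chosen only for demonstration; they are not from the article.

```python
# Illustrative a-priori sample size calculation for comparing two proportions,
# using the standard normal-approximation formula. The event rates, alpha, and
# power below are assumed values chosen only for demonstration.
from math import ceil
from scipy.stats import norm

def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size per group for a two-sided two-proportion test."""
    z_alpha = norm.ppf(1 - alpha / 2)  # critical value for the chosen alpha
    z_beta = norm.ppf(power)           # z-value corresponding to the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# To detect an improvement from a 60% to a 75% success rate:
print(n_per_group(0.60, 0.75))  # -> 150 participants per group
```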

  • Is the sample defined? Human, Animals (type); what population does it represent?
  • Does it mention eligibility criteria with reasons?
  • Does it mention where and how the sample were recruited, selected and assessed?
  • Does it mention where was the study carried out?
  • Is the sample size justified? Correctly calculated? Is it adequate to detect statistically and clinically significant results?
  • Does it mention a suitable study design/type?
  • Is the study type appropriate to the research question?
  • Is the study adequately controlled? Does it mention type of randomization process? Does it mention the presence of control group or explain lack of it?
  • Are the samples similar at baseline? Is sample attrition mentioned?
  • All studies report the number of participants/specimens at the start of a study, together with details of how many of them completed the study and reasons for incomplete follow up if there is any.
  • Does it mention who was blinded? Are the assessors and participants blind to the interventions received?
  • Is it mentioned how the data were analysed?
  • Are any measurements taken likely to be valid?

Researchers use measuring techniques and instruments that have been shown to be valid and reliable.

Validity refers to the extent to which a test measures what it is supposed to measure.

(the extent to which the value obtained represents the object of interest.)

  • -Soundness, effectiveness of the measuring instrument;
  • -What does the test measure?
  • -Does it measure what it is supposed to measure?
  • -How well, how accurately does it measure?

Reliability: In research, the term reliability means “repeatability” or “consistency”.

Reliability refers to how consistent a test is on repeated measurements. It is especially important if assessments are made on different occasions and/or by different examiners. Studies should state the method for assessing the reliability of any measurements taken and what the intra-examiner reliability was [ 6 ].
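
Reliability can be quantified in several ways; for categorical ratings, Cohen's kappa is one common agreement statistic. The sketch below is an invented example (not from the article) of how intra-examiner agreement might be computed when the same examiner scores the same items on two occasions.

```python
# Illustrative sketch: quantifying intra-examiner reliability with Cohen's
# kappa, one common agreement statistic for categorical ratings. The two
# readings below (same examiner, same 10 radiographs, two occasions) are
# invented for illustration.
from sklearn.metrics import cohen_kappa_score

first_reading  = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
second_reading = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]

kappa = cohen_kappa_score(first_reading, second_reading)
print(f"Cohen's kappa = {kappa:.2f}")  # 1.0 = perfect agreement, 0 = chance-level
```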

3-Selection issues:

The following questions should be raised:

  • - How were subjects chosen or recruited? If not random, are they representative of the population?
  • - Types of Blinding (Masking) Single, Double, Triple?
  • - Is there a control group? How was it chosen?
  • - How are patients followed up? Who are the dropouts? Why and how many are there?
  • - Are the independent (predictor) and dependent (outcome) variables in the study clearly identified, defined, and measured?
  • - Is there a statement about sample size issues or statistical power (especially important in negative studies)?
  • - If a multicenter study, what quality assurance measures were employed to obtain consistency across sites?
  • - Are there selection biases?
  • • In a case-control study, if exercise habits are to be compared:
  • - Are the controls appropriate?
  • - Were records of cases and controls reviewed blindly?
  • - How were possible selection biases controlled (Prevalence bias, Admission Rate bias, Volunteer bias, Recall bias, Lead Time bias, Detection bias, etc.,)?
  • • Cross Sectional Studies:
  • - Was the sample selected in an appropriate manner (random, convenience, etc.,)?
  • - Were efforts made to ensure a good response rate or to minimize the occurrence of missing data?
  • - Were reliability (reproducibility) and validity reported?
  • • In an intervention study, how were subjects recruited and assigned to groups?
  • • In a cohort study, how many reached final follow-up?
  • - Are the subjects representative of the population to which the findings are applied?
  • - Is there evidence of volunteer bias? Was there adequate follow-up time?
  • - What was the drop-out rate?
  • - Any shortcoming in the methodology can lead to results that do not reflect the truth. If clinical practice is changed on the basis of these results, patients could be harmed.

Researchers employ a variety of techniques to make the methodology more robust, such as matching, restriction, randomization, and blinding [ 7 ].

Bias is the term used to describe an error at any stage of the study that was not due to chance. Bias leads to results in which there are a systematic deviation from the truth. As bias cannot be measured, researchers need to rely on good research design to minimize bias [ 8 ]. To minimize any bias within a study the sample population should be representative of the population. It is also imperative to consider the sample size in the study and identify if the study is adequately powered to produce statistically significant results, i.e., p-values quoted are <0.05 [ 9 ].

4-What are the outcome factors and how are they measured?

  • -Are all relevant outcomes assessed?
  • -Is measurement error an important source of bias?

5-What are the study factors and how are they measured?

  • -Are all the relevant study factors included in the study?
  • -Have the factors been measured using appropriate tools?

Data Analysis and Results:

- Were the tests appropriate for the data?

- Are confidence intervals or p-values given?

  • How strong is the association between intervention and outcome?
  • How precise is the estimate of the risk?
  • Does it clearly mention the main finding(s) and does the data support them?
  • Does it mention the clinical significance of the result?
  • Is adverse event or lack of it mentioned?
  • Are all relevant outcomes assessed?
  • Was the sample size adequate to detect a clinically/socially significant result?
  • Are the results presented in a way to help in health policy decisions?
  • Is there measurement error?
  • Is measurement error an important source of bias?

Confounding Factors:

A confounder has a triangular relationship with both the exposure and the outcome. However, it is not on the causal pathway. It makes it appear as if there is a direct relationship between the exposure and the outcome or it might even mask an association that would otherwise have been present [ 9 ].

6- What important potential confounders are considered?

  • -Are potential confounders examined and controlled for?
  • -Is confounding an important source of bias?

7- What is the statistical method in the study?

  • -Are the statistical methods described appropriate to compare participants for primary and secondary outcomes?
  • -Are statistical methods specified in sufficient detail (if I had access to the raw data, could I reproduce the analysis)?
  • -Were the tests appropriate for the data?
  • -Are confidence intervals or p-values given?
  • -Are results presented as absolute risk reduction as well as relative risk reduction?

Interpretation of p-value:

The p-value is the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true, i.e., the probability that the outcome arose by chance. A p-value of less than 1 in 20 (p<0.05) is conventionally considered statistically significant.

  • When the p-value is less than the significance level, which is usually 0.05, we reject the null hypothesis and the result is considered statistically significant. Conversely, when the p-value is greater than 0.05, we conclude that the result is not statistically significant and we fail to reject the null hypothesis.
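
The sketch below is a minimal illustration of how a p-value is obtained and read. The data are simulated with a true between-group difference built in; the means, spread, and group sizes are assumed values, not from the article.

```python
# Minimal sketch of how a p-value is obtained and read. The data are simulated
# with a true between-group difference built in; means, spread, and group
# sizes are assumed values for illustration.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(seed=42)
control = rng.normal(loc=50.0, scale=10.0, size=30)  # e.g., outcome scores
treated = rng.normal(loc=57.0, scale=10.0, size=30)  # true mean shifted upward

t_stat, p_value = ttest_ind(treated, control)
if p_value < 0.05:
    print(f"p = {p_value:.4f}: reject the null hypothesis of no difference")
else:
    print(f"p = {p_value:.4f}: fail to reject the null hypothesis")
```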

Confidence interval:

Multiple repetitions of the same trial would not yield exactly the same results every time. However, on average the results would be within a certain range. A 95% confidence interval means that if the trial were repeated many times, about 95% of the intervals so constructed would contain the true size of effect.
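
The sketch below computes a risk ratio with its 95% confidence interval from an invented 2x2 table, using the standard log-transform method, and also reports the absolute risk reduction mentioned in the checklist above. All event counts are assumed for illustration.

```python
# Illustrative sketch: a 95% confidence interval for a risk ratio from a 2x2
# table, using the standard log-transform method; the absolute risk reduction
# is also reported. All event counts are invented.
import math
from scipy.stats import norm

events_tx, n_tx = 30, 100    # events / total, treatment group
events_ctl, n_ctl = 45, 100  # events / total, control group

risk_tx, risk_ctl = events_tx / n_tx, events_ctl / n_ctl
rr = risk_tx / risk_ctl
arr = risk_ctl - risk_tx  # absolute risk reduction

# Standard error of log(RR); back-transform the interval limits.
se_log_rr = math.sqrt(1 / events_tx - 1 / n_tx + 1 / events_ctl - 1 / n_ctl)
z = norm.ppf(0.975)
lower = math.exp(math.log(rr) - z * se_log_rr)
upper = math.exp(math.log(rr) + z * se_log_rr)

print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f}); ARR = {arr:.2f}")
# Here the interval excludes 1, so the relative risk is statistically significant.
```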

8- Statistical results:

  • -Do statistical tests answer the research question?

Are statistical tests performed and comparisons made (data searching)?

Correct statistical analysis of results is crucial to the reliability of the conclusions drawn from the research paper. Depending on the study design and sample selection method employed, descriptive or inferential statistical analysis may be carried out on the results of the study.

It is important to identify if this is appropriate for the study [ 9 ].

  • -Was the sample size adequate to detect a clinically/socially significant result?
  • -Are the results presented in a way to help in health policy decisions?

Clinical significance:

Statistical significance as shown by the p-value is not the same as clinical significance. Statistical significance judges whether treatment effects are explicable as chance findings, whereas clinical significance assesses whether treatment effects are worthwhile in real life. Small improvements that are statistically significant might not result in any meaningful improvement clinically. The following questions should always be kept in mind:

  • -If the results are statistically significant, do they also have clinical significance?
  • -If the results are not statistically significant, was the sample size sufficiently large to detect a meaningful difference or effect?
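
A quick numeric illustration of this distinction: with a very large simulated sample, even a trivially small difference reaches statistical significance, which says nothing about whether it matters clinically. All numbers below are invented for demonstration.

```python
# Numeric illustration (invented data): with a very large sample, even a
# clinically trivial difference of 0.2 mm in mean probing depth becomes
# highly statistically significant.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(seed=1)
group_a = rng.normal(loc=4.0, scale=1.0, size=20_000)
group_b = rng.normal(loc=4.2, scale=1.0, size=20_000)  # true difference: 0.2 mm

t_stat, p_value = ttest_ind(group_a, group_b)
print(f"p = {p_value:.2e}")  # far below 0.05, yet 0.2 mm is unlikely to matter clinically
```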

9- What conclusions did the authors reach about the study question?

Conclusions and any recommendations should be supported by the results obtained and stay within the scope of the study. The authors should also address the limitations of the study, their effects on the outcomes, and proposed suggestions for future studies [ 10 ].

  • -Are the questions posed in the study adequately addressed?
  • -Are the conclusions justified by the data?
  • -Do the authors extrapolate beyond the data?
  • -Are shortcomings of the study addressed and constructive suggestions given for future research?
  • -Bibliography/References:

Do the citations follow one of the Council of Biology Editors’ (CBE) standard formats?

10- Are ethical issues considered?

If a study involves human subjects, human tissues, or animals, was approval from appropriate institutional or governmental entities obtained? [ 10 , 11 ].

Critical appraisal of RCTs: Factors to look for:

  • Allocation (randomization, stratification, confounders).
  • Follow up of participants (intention to treat).
  • Data collection (bias).
  • Sample size (power calculation).
  • Presentation of results (clear, precise).
  • Applicability to local population.

[ Table/Fig-2 ] summarizes the guidelines for Consolidated Standards of Reporting Trials CONSORT [ 12 ].

[Table/Fig-2]:

Summary of the CONSORT guidelines.

  • Title and abstract: Identification as an RCT in the title; structured summary (trial design, methods, results, and conclusions).
  • Introduction: Scientific background; objectives.
  • Methods: Description of trial design and important changes to methods; eligibility criteria for participants; the interventions for each group; completely defined and assessed primary and secondary outcome measures; how sample size was determined; method used to generate the random allocation sequence; mechanism used to implement the random allocation sequence; blinding details; statistical methods used.
  • Results: Numbers of participants, losses and exclusions after randomization; results for each group and the estimated effect size and its precision (such as 95% confidence interval); results of any other subgroup analyses performed.
  • Discussion: Trial limitations; generalisability.
  • Other information: Registration number.

Critical appraisal of systematic reviews: systematic reviews provide an overview of all primary studies on a topic and try to obtain an overall picture of the results.

In a systematic review, all the primary studies identified are critically appraised and only the best ones are selected. A meta-analysis (i.e., a statistical analysis) of the results from selected studies may be included. Factors to look for:

  • Literature search (did it include published and unpublished materials as well as non-English language studies? Was personal contact with experts sought?).
  • Quality-control of studies included (type of study; scoring system used to rate studies; analysis performed by at least two experts).
  • Homogeneity of studies.

[ Table/Fig-3 ] summarizes the guidelines for Preferred Reporting Items for Systematic reviews and Meta-Analyses PRISMA [ 13 ].

[Table/Fig-3]:

Summary of PRISMA guidelines.

Title: identification of the report as a systematic review, meta-analysis, or both.
Abstract: structured summary (background; objectives; eligibility criteria; results; limitations; conclusions; systematic review registration number).
Introduction: description of the rationale for the review; a defined statement of the questions being addressed with regard to participants, interventions, comparisons, outcomes, and study design (PICOS).
Methods: specification of study eligibility criteria; description of all information sources; presentation of the full electronic search strategy; the process for selecting studies; description of the method of data extraction from reports and the methods used for assessing risk of bias of individual studies, as well as methods of handling data and combining results of studies.
Results: full details of study selection; study characteristics (e.g., study size, PICOS, follow-up period); risk of bias within studies; results of each meta-analysis done, including confidence intervals and measures of consistency; methods of additional analyses (e.g., sensitivity or subgroup analyses, meta-regression).
Discussion: summary of the main findings, including the strength of evidence for each main outcome; discussion of limitations at study and outcome level; a general interpretation of the results in the context of other evidence.
Funding: source and role of funders.

Critical appraisal is a fundamental skill in modern practice for assessing the value of clinical research and providing an indication of its relevance to the profession. It is a skill set, developed throughout a professional career, that, through integration with clinical experience and patient preference, permits the practice of evidence-based medicine and dentistry. By following a systematic approach, such evidence can be considered and applied to clinical practice.


Critical Appraisal of a Systematic Review: A Concise Review

Affiliations.

  • 1 Division of Pulmonary and Critical Care Medicine, Department of Medicine, Medical College of Wisconsin, Milwaukee, WI.
  • 2 Department of Anesthesiology, University Hospital RWTH Aachen University, Aachen, Germany.
  • 3 Department of Intensive Care Medicine, University Hospital RWTH Aachen University, Aachen, Germany.
  • 4 Department of Anesthesiology, Faculty of Medicine, University of Malaya, Kuala Lumpur, Malaysia.
  • 5 Clinical Evaluation Research Unit, Department of Critical Care Medicine, Queen's University, KGH Research Institute, Kingston Health Sciences Centre, Kingston, ON, Canada.
  • 6 Department of Anesthesiology and Intensive Care Medicine, University Hospital Wuerzburg, Wuerzburg, Germany.
  • PMID: 35853198
  • DOI: 10.1097/CCM.0000000000005602

Objectives: Concise definitive review of how to read and critically appraise a systematic review.

Data sources: None.

Study selection: Current literature describing the conduct, reporting, and appraisal of systematic reviews and meta-analyses.

Data extraction: Best practices for conducting, reporting, and appraising systematic reviews were summarized.

Data synthesis: A systematic review is a review of a clearly formulated question that uses systematic and explicit methods to identify, select, and critically appraise relevant original research, and to collect and analyze data from the studies that are included in the review. Critical appraisal methods address both the credibility (quality of conduct) of a systematic review and the confidence in the quality of the evidence it summarizes. The A Measurement Tool to Assess Systematic Reviews-2 tool is a widely used practical tool to appraise the conduct of a systematic review. Confidence in estimates of effect is determined by assessing for risk of bias, inconsistency of results, imprecision, indirectness of evidence, and publication bias.

Conclusions: Systematic reviews are transparent and reproducible summaries of research and conclusions drawn from them are only as credible and reliable as their development process and the studies which form the systematic review. Applying evidence from a systematic review to patient care considers whether the results can be directly applied, whether all important outcomes have been considered, and if the benefits are worth potential harms and costs.

Copyright © 2022 by the Society of Critical Care Medicine and Wolters Kluwer Health, Inc. All Rights Reserved.


Conflict of interest statement

Dr. Hill’s institution received funding from Fresenius Kabi and the Medical Faculty Rheinisch-Westfälische Technische Hochschule Aachen; she received funding from Fresenius Kabi. The remaining authors have disclosed that they do not have any potential conflicts of interest.



Critical Appraisal of Research Articles: Systematic Reviews


What is a Systematic Review?

A systematic review is a review of a clearly formulated question that uses systematic and explicit methods to identify, select, and critically appraise relevant research, and to collect and analyze data from studies that are included in the review. Statistical methods may or may not be used to analyze and summarize the results of the included studies.

How to Find Systematic Reviews

1. Search the Cochrane Database of Systematic Reviews

2. Using PubMed, either use the 'Systematic Reviews' filter or add 'AND (systematic review[ti])' to the end of your search (see the sketch after this list).

3. If searching CINAHL , limit by publication type (select "Systematic Review").
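
For teams that prefer to script this step, the same filtered search can be run against the PubMed API. Below is a minimal sketch assuming the Biopython package (an assumption of this illustration; the guide itself only describes the web interface). Entrez requires a contact email, and the query topic is just a placeholder:

    # Minimal sketch: querying PubMed for systematic reviews on a topic.
    from Bio import Entrez

    Entrez.email = "you@example.org"  # placeholder contact address

    query = "(low back pain) AND (systematic review[ti])"
    handle = Entrez.esearch(db="pubmed", term=query, retmax=20)
    record = Entrez.read(handle)
    handle.close()

    print(f"Total matches: {record['Count']}")
    print("First PMIDs:", record["IdList"])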

Questions to Ask

  • 1. Is it a systematic review of the right type of studies, relevant to your question?
  • 2. Does the methods section describe how all the relevant trials were found and assessed? The paper should give a comprehensive account of the sources consulted in the search for relevant papers, the search strategy used to find them, and the quality and relevance criteria used to decide whether to include them in the review. In particular:
      • The authors should include hand searching of journals and searching for unpublished literature.
      • Were any obvious databases missed?
      • Did the authors check the reference lists of articles and textbooks?
      • Did they contact experts (to get their list of references checked for completeness and to try to find out about ongoing or unpublished research)?
      • Did they use an appropriate search strategy; were important subject terms missed?
      • Who were the study participants and how is their disease status defined?
      • What intervention(s) were given, how, and in what setting?
      • How were outcomes assessed?
  • 3. Are the studies consistent, both clinically and statistically?
  • 4. Compare with PRISMA: look at the most recent PRISMA checklist to see how well the authors documented the various preferred reporting items.

Appraisal Checklists for Systematic Reviews

  • Critical Appraisals Skills Programme (CASP)
  • Joanna Briggs Institute

Systematic Reviews


The P-I-E-C-E-S framework:

  • P = Plan: decide on your search methods
  • I = Identify: search for studies that match your criteria
  • E = Evaluate: exclude or include studies
  • C = Collect: extract and synthesize key data
  • E = Explain: give context and rate the strength of the studies
  • S = Summarize: write and publish your final report


Congratulations!

You've decided to conduct a Systematic Review! Please see the associated steps below. You can follow the  P-I-E-C-E-S = Plan, Identify, Evaluate, Collect, Explain, Summarize  system or any number of systematic review processes available  (Foster & Jewell, 2017) .

P = Plan: decide on your search methods

Determine your Research Question 

By now you should have identified gaps in the field and have a specific question you are seeking to answer. This will likely have taken several iterations and is the most important part of the Systematic Review process. 

Identify Relevant Systematic Reviews 

Once you've finalized a research question, you should be able to locate existing systematic reviews on or similar to your topic. Existing systematic reviews will be your clues to mine for keywords and sample searches in various databases, and will help your team finalize your review question and develop your inclusion and exclusion criteria.

Decide on a Protocol and Reporting Standard

Your protocol is essentially a project plan and data management strategy for an objective, reproducible, sound methodology for peer review. Reporting standards or guidelines are not protocols, but rather sets of standards to guide the development of your systematic review; often they include checklists. Following a reporting standard is not required, but it is highly recommended.

Protocol registry:   Reviewing existing systematic reviews and registering your protocol will increase transparency, minimize bias, and reduce the redundancy of groups working on the same topics ( PLoS Medicine Editors, 2011 ). Protocols can serve as internal or external documents. Protocols can be made public prior to the review. Some registries allow for keeping a protocol private for a set period of time.

Cochrane Database of Systematic Reviews (UGA Login)  (Health Sciences)

A collection of regularly updated, systematic reviews of the effects of health care. New reviews are added with each issue of The Cochrane Library. Reviews are mainly of randomized controlled trials. All reviews have protocols.

PROSPERO  (General)

This is an international register of systematic reviews and is public. 

Campbell Collaboration  (Education & Social Sciences)

Topics covered include Ageing; Business and Management; Climate Solutions; Crime and Justice; Disability;  Education; International Development; Knowledge Translation and Implementation; Methods; Nutrition and Food Security; Sexual Orientation and Gender Identity; Social Welfare; and Training.

Systematic Review for Animals and Food  (Vet Med & Animal Science)

Reporting Standards:  

Campbell MECCIR Standards  (Education & Social Sciences)

Cochrane Guides & Handbooks  (Health & Medical Sciences)

Institute of Medicine of the National Academies: Finding What Works in Healthcare: Standards for Systematic Reviews  (healthcare)

  • PRISMA for Systematic Review Protocols  (General)
  • PRISMA Checklist  (General)
  • PRISMA for Scoping Reviews  (General)

Decide on Databases and Grey Literature for Systematic Review Research

Because the purpose of an SR is to find all studies related to your research question, you will need to search multiple databases. You should be able to name the databases you are already familiar with. Your librarian will be able to recommend additional databases, including some of the following:

  • PubMed  (Health Sciences)
  • Web of Science  
  • Cochrane Database  (Biomedical)
  • National and Regional Databases (i.e. WHO LILACS scientific health information from Latin America and the Caribbean countries)
  • CINAHL  (Health Sciences)
  • PsycINFO  (Psychology)

Depending on your topic, you may want to search clinical trials and grey literature. See this  guide  for more on grey literature.

Develop Keywords and Write a Search Strategy

Go   here for help with writing your search strategy  

Translate Search Strategies

Each database you use will have different methods of searching and resulting search strings, including syntax. Ideally you will create one master keyword list and translate it for each database. Below are tools to assist with translating search strings, followed by a toy illustration of what such translation involves.

Includes syntax for Cochrane Library, EBSCO, ProQuest, Ovid, and POPLINE.

The IEBH SR-Accelerator is a suite of tools to assist in speeding up portions of the Systematic Review process, including the Polyglot tool which translates searches across databases. 

University of Michigan Search 101 - SR Database Cheat Sheet
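
As a toy illustration of what translation involves (a hypothetical sketch, not how Polyglot or any real tool works), the Python snippet below maps a few PubMed field tags to their Ovid MEDLINE equivalents:

    # Toy sketch: translate a few PubMed field tags into Ovid MEDLINE syntax.
    # Deliberately simplified -- real translation also handles truncation,
    # proximity operators, MeSH explosion, and more.
    import re

    TAG_MAP = {"tiab": ".ti,ab.", "ti": ".ti.", "ab": ".ab."}

    def pubmed_to_ovid(query: str) -> str:
        # Rewrite each '[tag]' suffix with its Ovid equivalent, e.g. '[tiab]' -> '.ti,ab.'
        def repl(match: re.Match) -> str:
            tag = match.group(1).lower()
            return TAG_MAP.get(tag, match.group(0))  # leave unknown tags untouched
        return re.sub(r"\[(\w+)\]", repl, query)

    print(pubmed_to_ovid("low back pain[tiab] AND exercise[ti]"))
    # -> low back pain.ti,ab. AND exercise.ti.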

Storing, Screening and Full-Text Screening of Your Citations

Because systematic review literature searches may produce thousands of citations and abstracts, the research team will be screening and systematically reviewing large amounts of results. During  screening , you will remove duplicates and remove studies that are not relevant to your topic based on a review of titles and abstracts. Of what remains, the  full-text screening  of the studies will then need to be conducted to confirm that they fit within your  inclusion/exclusion criteria.   
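
As a minimal illustration of the de-duplication step (a hypothetical sketch in Python, not the algorithm of any particular tool), records can be keyed on DOI when available, falling back to a normalised title:

    # Minimal sketch: de-duplicate citation records by DOI, falling back to a
    # normalised title. Illustrative only -- real tools use fuzzier matching.
    import re

    def dedup_key(record: dict) -> str:
        doi = (record.get("doi") or "").strip().lower()
        if doi:
            return doi
        # Normalise the title: lowercase, drop punctuation, collapse whitespace.
        title = re.sub(r"[^\w\s]", "", record.get("title", "").lower())
        return re.sub(r"\s+", " ", title).strip()

    def deduplicate(records: list[dict]) -> list[dict]:
        seen, unique = set(), []
        for rec in records:
            key = dedup_key(rec)
            if key not in seen:
                seen.add(key)
                unique.append(rec)
        return unique

    refs = [
        {"title": "Exercise for chronic low back pain.", "doi": "10.1000/xyz1"},
        {"title": "Exercise for Chronic Low Back Pain", "doi": "10.1000/xyz1"},
        {"title": "A different study", "doi": ""},
    ]
    print(len(deduplicate(refs)))  # -> 2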

The results of the literature review and screening processes are best managed by various tools and software. You can also use a simple form or table to log the relevant information from each study. Consider whether you will be coding your data during the extraction process in your decision on which tool or software to use. Your librarian can consult on which of these is best suited to your research needs.

  • EndNote  Guide (UGA supported citation tracking software) - great for storing, organizing, and de-duplication
  • RefWorks  Guide (UGA supported citation tracking software) - great for storing, organizing, and de-duplication
  • Rayyan  (free service) - great for initial title/abstract screening or full-text screening (the tool does not differentiate between the two); not ideal for de-duplication
  • Covidence  (requires a subscription) - full suite of systematic review tools including meta-analysis
  • Combining Software  (EndNote, Google Forms, Excel) 
  • Forms such as  Qualtrics  (UGA EITS software) can record who the coder is and create charts and tables; useful when a review includes multiple types of studies

Data Extraction

Data extraction processes differ between qualitative and quantitative evidence syntheses. In both cases, you must provide the reader with a clear overview of the studies you have included, their similarities and differences, and the findings. Extraction should be done in accordance with pre-established guidelines, such as PRISMA.

Some systematic reviews contain a meta-analysis of the quantitative findings of the results. Consider including a statistician on your team to complete the analysis of all individual study results. Meta-analysis estimates the size of the overall effect across studies and quantifies the variability between them; the results are typically displayed in a forest plot.
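
To make the statistics concrete, the following minimal sketch (with made-up numbers) computes the fixed-effect inverse-variance pooled estimate, its 95% confidence interval, and the I² heterogeneity statistic, the core quantities summarised in a forest plot:

    # Minimal sketch: fixed-effect inverse-variance meta-analysis, made-up data.
    import numpy as np

    effects = np.array([-0.40, -0.25, -0.60])   # per-study effect estimates
    variances = np.array([0.04, 0.02, 0.09])    # per-study variances of the estimates

    weights = 1.0 / variances                   # inverse-variance weights
    pooled = np.sum(weights * effects) / np.sum(weights)
    se = np.sqrt(1.0 / np.sum(weights))
    ci_low, ci_high = pooled - 1.96 * se, pooled + 1.96 * se

    # Cochran's Q and I^2 quantify between-study heterogeneity.
    q = np.sum(weights * (effects - pooled) ** 2)
    df = len(effects) - 1
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    print(f"Pooled effect: {pooled:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
    print(f"Q = {q:.2f}, I^2 = {i_squared:.0f}%")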

Systematic review price models have changed over the years. Previously, you had to depend on departmental access to software that could cost several hundred dollars. Now that the software is cloud-based, tiered payment systems are available. Sometimes there is a free tier, but costs go up with functionality, number of users, or both. Depending on the organization's model, payments may be monthly, annual, or per project/review.

  • Always check your departmental resources before making a purchase.
  • View all training videos and other resources before starting your project.
  • If your access is limited to a specific amount of time, wait to purchase until the appropriate work stage

Software list

  • Abstrackr

Tool created by Brown University to assist with screening for systematic reviews.

  • Cochrane's RevMan

Review Manager (RevMan) is the software used for preparing and maintaining Cochrane Reviews.

Systematic review tool intended to assist with the screening and extraction process. (Requires subscription)

  • Distiller SR

DistillerSR is an online application designed specifically for the screening and data extraction phases of a systematic review (Requires subscription) Student and Faculty tiers have monthly pricing with a three month minimum. Number of projects is limited by pricing.

  • EPPI Reviewer  (requires subscription, free trial)

It includes features such as text mining, data clustering, classification and term extraction

  • Rayyan

Rayyan is a free web-based application that can be used to screen titles, abstracts, and full text. Allows for multiple simultaneous users.

  • AHRQ's  SRDR tool  (free) which is web-based and has a training environment, tutorials, and example templates of systematic review data extraction forms

"System for the Unified Management, Assessment and Review of Information, the Joanna Briggs Institutes premier software for the systematic review of literature."

  • Syras Pricing is based on both number of abstracts and number of collaborators. The free tier is limited to 300 abstracts and two collaborators. Rather than monthly pricing, the payment is one-time per project.

Evidence Synthesis or Critical Appraisal

PRISMA guidelines suggest including critical appraisal of the included studies to assess the risk of bias and to include the assessment in your final manuscripts. There are several appraisal tools available depending on your discipline and area of research.

Simple overview of risk of bias assessment, including examples of how to assess and present your conclusions.

CASP is an organization that provides resources for healthcare professionals, but their appraisal tools can be used for varying study types across disciplines.

From the Joanna Briggs Institute: "JBI’s critical appraisal tools assist in assessing the trustworthiness, relevance and results of published papers."

Johns Hopkins Evidence-Based Practice Model  (health sciences)

National Academies of Sciences, Engineering, and Medicine

Includes standards such as documenting the search and including a methods section (standard 5.1.6).

List  of additional critical appraisal tools from Cardiff University. 

Synthesize, Map, or Describe the Results

Prepare your process and findings in a final manuscript. Be sure to check your PRISMA checklist or other reporting standard. You will want to include the full formatted search strategy for the appendix, as well as include documentation of your search methodology. A convenient way to illustrate this process is through a  PRISMA  Flow Diagram. 

Attribution: Unless noted otherwise, this section of the guide was adapted from Texas A&M's "Systematic Reviews and Related Evidence Syntheses" 



Systematic Review Guide

Critical Appraisal


Critical appraisal is the careful analysis of studies to determine their relative value. The  Institute of Medicine's Standards for Systematic Reviews  includes Standard 3.6: “Critically appraise each study:

3.6.1 Systematically assess the risk of bias, using predefined criteria
3.6.2 Assess the relevance of the study’s populations, interventions, and outcome measures
3.6.3 Assess the fidelity of the implementation of interventions”
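
Predefined criteria are easier to apply consistently when each judgement is captured in a structured record. The sketch below is a hypothetical illustration in Python; the domain names echo the RoB 2 domains, and the overall roll-up rule is deliberately simplified rather than the official RoB 2 algorithm:

    # Minimal sketch: recording per-domain risk-of-bias judgements for one study.
    # Domain names follow RoB 2-style domains; the overall rule is simplified.
    from dataclasses import dataclass, field

    DOMAINS = [
        "randomization process",
        "deviations from intended interventions",
        "missing outcome data",
        "measurement of the outcome",
        "selection of the reported result",
    ]

    @dataclass
    class BiasAssessment:
        study_id: str
        judgements: dict[str, str] = field(default_factory=dict)  # domain -> Low / Some concerns / High

        def overall(self) -> str:
            # Simplified roll-up: worst judgement across domains wins.
            levels = set(self.judgements.values())
            if "High" in levels:
                return "High"
            if "Some concerns" in levels:
                return "Some concerns"
            return "Low"

    a = BiasAssessment("Smith 2021", {d: "Low" for d in DOMAINS})
    a.judgements["missing outcome data"] = "Some concerns"
    print(a.overall())  # -> Some concerns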

See also: What is Critical Appraisal?

Critical Appraisal Starting Points

  • AMSTAR Checklist (A Measurement Tool to Assess Systematic Reviews)  A valid, reliable and useable instrument that helps users differentiate between systematic reviews, focusing on their methodological quality and expert consensus.
  • CATMaker and EBM Calculators  Centre for Evidence-Based Medicine (Oxford, UK). CATmaker is a computer-assisted critical appraisal tool, now free. It provides guided appraisal and calculations for therapy, diagnosis, prognosis, etc.
  • Critical Appraisal Checklists by BMJ Best Practice
  • Critical Appraisal Skills Programme (CASP) Tools & Checklists
  • Critical Appraisal Tools from CEBM
  • GRADE Working Group  The working group has developed a common, sensible and transparent approach to grading quality of evidence and strength of recommendations.
  • Health Evidence Quality Assessment Tool for Review Articles
  • International Centre for Allied Health Evidence, Critical Analysis Tools
  • The ROBINS-I tool (Risk Of Bias In Non-randomized Studies - of Interventions)  by the Cochrane Methods Bias Group
  • SIGN Critical Appraisal Notes and Checklists  Includes checklists for systematic reviews, RCTs, and other types of studies. 
  • Systematic Review Toolbox - Try the Advanced search on "Quality Assessment" features
  • What is Critical Appraisal?



Describing and appraising studies


Describing studies

Studies that are used in a review are described in a standardised way that is suitable for each review. The detail provided facilitates transparency in how each study contributes to the overall findings of the review, and the overall reliability of the review.

There are three key reasons for describing (or coding) the studies in a systematic review.

  • To know more about the included studies. Studies may be described on characteristics such as the aims, methods, or particular elements to describe the research sample and any outcomes measured. Historically such descriptions may have been limited to basic information such as authors' names, place of publication, and research methods. It is now recognized that information about what has or has not been studied is a useful product in its own right, as highlighted in the box on systematic evidence maps.
  • To identify the research findings of the individual studies to be synthesized
  • To identify the methods used in each study, so the study can be critically appraised for trustworthiness and relevance to the review. Methodological aspects of each study might be described in terms of how sampling was undertaken, recruitment of the sample, data collection methods, data analysis methods.
  • Coding for systematic reviews and meta-analysis, Sandra Wilson. Video (78 mins) on describing studies.

Examples of describing studies

  • Young people, pregnancy and social exclusion, Brunton et al. 2006. An example of the coding used for all the included studies for a review on young people, pregnancy and social exclusion.

The EPPI-Centre coding guidelines in education:

  • Education guidelines version 0.94 (2001) Guideline for descriptive keywording.
  • Education guidelines version 0.97 (2003) Guideline for extracting and appraising data from studies.

Critical appraisal of quality and relevance

Critical appraisal involves checking the quality, reliability and relevance of the studies in the review in relation to the review question. It appraises each study in terms of the following aspects:

  • Is the study relevant to the research question?
  • Is the study valid? E.g. Were the study methods applied appropriately?
  • Were appropriate methods used in relation to the review question?

In addition, the studies are collectively appraised in terms of how they support the review findings and evidence claims of the review. For example, if the research evidence comprises studies with widely varying findings, this reduces the strength of the evidence claims.

There are many standardised tools available for critical appraisal depending on the study design and the type of review. The approach to critical appraisal and the appraisal decisions for each study should be reported.

  • Critical Appraisal Skills Programme (CASP) Tools (checklists) for systematic reviews, randomised controlled trials, cohort studies, case control studies, economic evaluations, diagnostic studies, qualitative studies, clinical prediction rule
  • EBP checklists: University of Glasgow Adaptations of the CASP checklists for appraising other types of study, including an education intervention, and studies on treatment and prevention, and guidelines.
  • Newcastle-Ottawa Scale Useful for assessing quality of non-randomised studies such as case-control and cohort studies.
  • Risk of Bias Cochrane tool Version 2 of the Cochrane risk-of-bias tool for randomized trials (RoB 2). RoB 2 is structured into a fixed set of domains of bias, focussing on different aspects of trial design, conduct, and reporting.

Commonly-used tools for appraising research evidence in reviews:

  • GRADE For appraising reviews of effectiveness.
  • CERQual For appraising reviews including qualitative research evidence.

Appraising the quality of a systematic review

It is important for users of systematic reviews to consider the quality of the whole review. There are three separate elements that contribute to an appraisal:

  • the quality and relevance of the methods used to address the review questions;
  • the quality and relevance  of the methods used by the individual studies included in the review;
  • the nature and extent of the total evidence from studies included in the review.

There are tools to help with the appraisal of a whole review. Some of these are specific to certain types of reviews, and others are more generic. 

  • JBI checklist for systematic reviews An appraisal tool that can be used for a range of types of systematic reviews. From the Joanna Briggs Institute.

Some tools focus only on appraising the methods of specific types of reviews:

  • AMSTAR 2 Used to appraise only reviews that examine the effectiveness of interventions.
  • ROBIS A tool from the University of Bristol for assessing risk of bias in certain types of reviews.

Further reading:

  • Evidence Standards: A Dimensions of Difference Framework for Justifiable Evidence Claims, Gough (2016). This short paper provides an overarching conceptual framework for considering the main dimensions involved in appraising the evidence claims of a review.

Related guides

  • LibrarySkills@UCL: Evaluating information


Systematic Reviews

Step 8: Appraise Studies & Assess Risk of Bias

What is Critical Appraisal?


Critical appraisal, simply put, is the process of systematically looking at research papers to assess three important things: trustworthiness, value and relevance. When critically appraising a research paper, the first step is to examine the study for any bias.

Bias can occur in the design or methodology of the study, and this can distort the study's findings so that they do not accurately reflect the truth. It should be noted that no study is totally free from bias; for this reason it is necessary to systematically check that the researchers have done all they can to minimise bias.

A study which is sufficiently free from bias is said to have  internal validity . A study will be said to have  external validity  when it   can be  generalised  to the clinical (or wider population) context.

Critical appraisal checklists  provide a framework for interpreting and determining the reliability of the evidence. Checklists are designed to help you answer the questions - is the study unbiased, are the findings reliable, and are the findings valid?

Critical Appraisal Tools

AMSTAR 2

An updated version of AMSTAR that appraises systematic reviews, including ones based on non-randomised studies of healthcare interventions. Includes additional criteria such as inclusion of PICO, risk of bias in the evidence synthesis stage, causes and significance of heterogeneity, and justification of chosen study design. 'Yes' answers to questions denote positive results.

Joanna Briggs Institute (JBI)

JBI’s critical appraisal tools assist in assessing the trustworthiness, relevance and results of published papers.

  • JBI’s Critical Appraisal Tools

Critical Appraisal Skills Programme (CASP)

This set of eight critical appraisal tools is designed to be used when reading research. It includes tools for Systematic Reviews, Randomised Controlled Trials, Cohort Studies, Case Control Studies, Economic Evaluations, Diagnostic Studies, Qualitative Studies and Clinical Prediction Rules.

These are free to download and can be used by anyone under the Creative Commons License.

  • CASP Checklists

Johns Hopkins Research Evidence Appraisal Tool

Contains questions used to evaluate an article's study design and level of evidence. This tool contains three questions that allow the reviewer to determine a study's methodology. Uses a 16-item checklist for research studies and a 12-item checklist for systematic reviews and meta-analyses.

Other Tools

  • Appraisal tools from Royal Melbourne Hospital Summary of critical appraisal tools from key EBM centres, including Glasgow University; NHMRC; BMJ
  • CEBM: Centre for Evidence Based Medicine, Oxford University Includes critical appraisal checklists.
  • Duke University Medical Center Library LibGuide Includes critical appraisal worksheets for different study types.
  • University of South Australia, International Centre for Allied Health Evidence Includes critical appraisal tools from study types ranging from RCT's to systematic reviews.

Types of Bias

Bias may result from systematic errors in the research methodology. This table from the Cochrane Handbook summarises the different types of bias. 

A common classification scheme for bias:

  • Selection bias: systematic differences between baseline characteristics of the groups that are compared.
  • Performance bias: systematic differences between groups in the care that is provided, or in exposure to factors other than the interventions of interest.
  • Attrition bias: systematic differences between groups in withdrawals from a study (incomplete outcome data).
  • Detection bias: systematic differences between groups in how outcomes are determined.
  • Reporting bias: systematic differences between reported and unreported findings.

Books on Critical Appraisal

  • Critical Appraisal of Epidemiological Studies and Clinical Trials, 2007, 3rd ed.
  • Evidence-Based Medicine, Sharon E. Straus, 2011, 4th ed.
  • Users' Guides to the Medical Literature: A Manual for Evidence-Based Clinical Practice, Gordon Guyatt et al., 2015, 3rd ed.
  • How to Read a Paper, Trisha Greenhalgh, 2014, 5th ed.

Tools for assessing 'risk of bias'.

The Cochrane risk of bias tool is now recommended for use within all Cochrane reviews, and is widely used by non-Cochrane reviews of randomized controlled trials.

  • The Cochrane Collaboration’s tool for assessing risk of bias The Cochrane risk of bias tool covers 6 domains of bias: selection, performance, detection, attrition, reporting, and other bias.
  • RoB 2.0 tool A revised tool to assess risk of bias in randomised trials (RoB 2)
  • ROBINS-I tool ROBINS-I (Risk Of Bias In Non-randomised Studies - of Interventions)
  • Newcastle-Ottawa Scale (NOS) The Newcastle-Ottawa Scale (NOS) for assessing the quality of nonrandomised studies in meta-analyses
  • ROBINS-E tool (Risk of Bias in Non-randomised Studies - of Exposures) Under development

Further Reading

  • Article: ROBIS: a new tool to assess risk of bias in Systematic Reviews was developed
  • Article: Risk of bias tools in systematic reviews of health interventions: an analysis of PROSPERO-registered protocols
  • Chapter 8: Assessing risk of bias

Assess for Quality and Bias of Studies

Quality Assessment/Risk of Bias

Studies that are included in a systematic review may include biases in their results or conclusions. Bias can lead to either underestimation or overestimation of the true effect of an intervention, with varying degrees of impact. Biases may arise from the actions of primary study investigators, review authors, or limitations in the research process, and can be influenced by conflicts of interest (Boutron et al., 2023). Studies should be evaluated for risk of bias with a tool that fits the study designs included in your systematic review. 

Per the  Cochrane Handbook :

 "Methodological quality refers to  critical appraisal  of a study or systematic review and the extent to which study authors conducted and reported their research to the highest possible standard.  Bias  refers to systematic deviation of results or inferences from the truth. These deviations can occur as a result of flaws in design, conduct, analysis, and/or reporting. It is not always possible to know whether an estimate is biased even if there is a flaw in the study; further, it is difficult to quantify and at times to predict the direction of bias" (Higgins et al., 2023).

The most recent version of the Cochrane handbook also states that "Most recent tools for assessing the internal validity of findings from quantitative studies in health now focus on risk of bias, whereas previous tools targeted the broader notion of ‘methodological quality’" (Boutron et al., 2023).

Types of bias can also include:

  • Publication bias 
  • Location bias
  • Citation bias
  • Language bias
  • Outcome reporting bias 

For more information on bias see:

Cochrane Handbook Chapter 7: Considering Bias and Conflicts of Interest Among the Included Studies .

Cochrane Handbook Chapter 8: Assessing Risk of Bias in a Randomized Trial.

Finding What Works in Health Care: Reporting Bias   (IOM, 2011)

  • Quality Assessment/Risk of Bias Tools
  • Assess for Risk of Bias  The sixth part in a series of free online courses about how to conduct a systematic review from Evidence Synthesis Academy.

Boutron I, Page MJ, Higgins JPT, Altman DG, Lundh A, Hróbjartsson A. Chapter 7: Considering bias and conflicts of interest among the included studies. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (editors).  Cochrane Handbook for Systematic Reviews of Interventions  version 6.4 (updated August 2023). Cochrane, 2023. Available from www.training.cochrane.org/handbook.

Higgins, JPT, Thomas, J, Chandler, J, Cumpston, M, Li, T, Page, MJ, & Welch, VA (Eds.). (2023).  Cochrane handbook for systematic reviews of interventions, version 6.4. Cochrane. www.training.cochrane.org/handbook.

Institute of Medicine. (2011).  Finding what works in health care: Standards for systematic reviews.  The National Academies Press. https://doi.org/10.17226/13059.

  • Correspondence
  • Open access
  • Published: 21 August 2024

A critical appraisal of “Effects of exercise therapy on disability, mobility, and quality of life in the elderly with chronic low back pain: a systematic review and meta-analysis of randomised controlled trials”

  • Zhang Shikun 1 &
  • Zhou Wensheng 2

Journal of Orthopaedic Surgery and Research, volume 19, Article number: 495 (2024)


This response letter addresses the comments received on our paper. The main points of our response include: Clarification of the definitions of primary and secondary indexes; Justification for the use of the RoB2 tool for quality assessment; Measures to improve sensitivity analysis and data consistency; Explanation and improvement plans regarding the timing of Prospero registration. We have provided detailed explanations of the study design and outlined specific measures for future improvements to enhance research transparency and quality.

Dear Editor,

Thank you for providing us with the opportunity to publish our study and for the valuable comments on our paper titled “Effects of exercise therapy on disability, mobility, and quality of life in the elderly with chronic low back pain: a systematic review and meta-analysis of randomised controlled trials.” We highly value these comments and hope to clarify some issues through the following response.

It was pointed out that the distinction between primary and secondary indexes in our study is unclear, which might have led to misunderstandings, as pain was not explicitly mentioned in the title. While the pain indicator is not mentioned in the title, it is detailed in the study hypothesis and methods section. In future studies, we will clearly define primary and secondary indexes to ensure that readers fully understand our study design.

We used the RoB2 tool to assess the quality of the included studies, because it is widely used in randomized controlled trials. Based on articles by Sterne et al. [ 2 ] in BMJ and Higgins et al. [ 1 ] in the Cochrane Handbook for Systematic Reviews of Interventions, we believe RoB2 is suitable for this study. In future studies, we will consider using more appropriate assessment tools for non-pharmacological research, such as the Downs and Black scale, to more comprehensively assess study quality.

The sensitivity analysis provided valuable insights, showing that heterogeneity of the results significantly decreased after excluding studies with a high risk of bias. Regarding concerns about sensitivity analysis and data consistency, we will more rigorously review our data processing procedures to ensure the accuracy of all analyses and figures.

Once again, we appreciate the valuable comments and suggestions on our study. We will make improvements based on the feedback and hope to provide more effective treatment options for elderly patients with chronic low back pain through further research. We hope our response clarifies some misunderstandings and further advances research in this field.

Zhang Shikun.

Data availability

No datasets were generated or analysed during the current study.

Sterne, J. A. C., Savović, J., Page, M. J., Elbers, R. G., Blencowe, N. S., Boutron, I., Cates, C. J., Cheng, H.-Y., Corbett, M. S., Eldridge, S. M., Emberson, J. R., Hernán, M. A., Hopewell, S., Hróbjartsson, A., Junqueira, D. R., Jüni, P., Kirkham, J. J., Lasserson, T., Li, T., … Higgins, J. P. T. (2019). RoB 2: a revised tool for assessing risk of bias in randomised trials. BMJ, 366, 1–8. https://doi.org/10.1136/bmj.l4898

Higgins, J. P., Savović, J., Page, M. J., Elbers, R. G., & Sterne, J. A. (2019). Assessing risk of bias in a randomized trial. In J. P. T. Higgins, J. Thomas, J. Chandler, M. Cumpston, T. Li, M. J. Page, & V. A. Welch (Eds.), Cochrane Handbook for Systematic Reviews of Interventions. https://doi.org/10.1002/9781119536604.ch8


Acknowledgements

Not applicable.

Author information

Authors and Affiliations

Jiangsu Police Institute, Nanjing, China

Zhang Shikun

Jiangsu Second Normal University, Nanjing, China

Zhou Wensheng


Contributions

Zhang Shikun wrote the response letter; Zhou Wensheng checked the letter.

Corresponding author

Correspondence to Zhang Shikun.

Ethics declarations

Ethics approval and consent to participate

Consent for publication

All authors consent to the publication of this manuscript and have confirmed this.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article.

Shikun, Z., Wensheng, Z. A critical appraisal of “Effects of exercise therapy on disability, mobility, and quality of life in the elderly with chronic low back pain: a systematic review and meta-analysis of randomised controlled trials”. J Orthop Surg Res 19, 495 (2024). https://doi.org/10.1186/s13018-024-04884-9


Received : 30 May 2024

Accepted : 27 June 2024

Published : 21 August 2024

DOI : https://doi.org/10.1186/s13018-024-04884-9



A systematic review and critical appraisal of menopause guidelines

  • Chronic Disease & Ageing
  • Epidemiology and Preventive Medicine Alfred Hospital
  • School of Nursing and Midwifery

Research output : Contribution to journal › Review Article › Research › peer-review

Objective and rationale: To identify and appraise current national and international clinical menopause guidance documents, and to extract and compare the recommendations of the most robust examples.

Design: Systematic review.

Data sources: Ovid MEDLINE, EMBASE, PsycINFO and Web of Science.

Eligibility criteria for selecting studies: Practice guidance documents for menopause published from 2015 until 20 July 2023. Quality was assessed by the Appraisal of Guidelines for Research and Evaluation II (AGREE II) instrument.

Results: Twenty-six guidance papers were identified. Of these, five clinical practice guidelines (CPGs) and one non-hormonal therapy position statement met AGREE II criteria of being at least of moderate quality. The five CPGs listed symptoms associated with the perimenopause and menopause to be vasomotor symptoms (VMS), disturbed sleep, musculoskeletal pain, decreased sexual function or desire, and mood disturbance (low mood, mood changes or depressive symptoms). Acknowledged potential long-term menopause consequences were urogenital atrophy, and increased risks of cardiovascular disease and osteoporosis. VMS and menopause-associated mood disturbance were the only consistent indications for systemic menopausal hormone therapy (MHT). Some CPGs supported MHT to prevent or treat osteoporosis, but specific guidance was lacking. None recommended MHT for cognitive symptoms or prevention of other chronic disease. Perimenopause-specific recommendations were scant. A neurokinin 3B antagonist, selective serotonin/norepinephrine (noradrenaline) reuptake inhibitors and gabapentin were recommended non-hormonal medications for VMS, and cognitive behavioural therapy and hypnosis were consistently considered as being of potential benefit.

Discussion: The highest quality CPGs consistently recommended MHT for VMS and menopause-associated mood disturbance, whereas clinical depression or cognitive symptoms, and cardiometabolic disease and dementia prevention were not treatment indications. Further research is needed to inform clinical recommendations for symptomatic perimenopausal women.

Original language: English
Pages (from-to): 122-138
Number of pages: 17
Journal: BMJ Sexual and Reproductive Health
ISSN: 2515-1991
Volume: 50
Issue number: 2
DOI: 10.1136/bmjsrh-2023-202099
Publication status: Published - Apr 2024


Projects per year

MenoPROMPT: a co-designed, comprehensive, evidence-based program to improve the care of women at and after menopause

Davis, S. , Manski-Nankervis, J. A. E., Bell, R., Islam, R. , Vincent, A. , Boyle, D., Temple-Smith, M. J., Ebeling, P. , Jane, F., Allan, C., Tonkin, A. & McMorrow, R.

1/11/22 → 31/10/26

Project : Research

Authors: Hemachandra, Chandima; Taylor, Sasha; Islam, Rakibul M.; Fooladi, Ensieh; Davis, Susan R.

Funding information: This research was funded by the Australian National Health and Medical Research Council (NHMRC) (Grant 2015514). SRD holds an NHMRC Leadership Grant (2016627). Publisher copyright: © Author(s) (or their employer(s)) 2024.

The Ambiguous Correlation of Blautia with Obesity: A Systematic Review

Reviewer 1 Report

In the manuscript titled "The ambiguous correlation of Blautia with obesity: A systematic review," Warren Chanda and colleagues address the global epidemic of obesity, which poses significant health and economic challenges due to its complex and multifactorial nature. In addition to diet and lifestyle, the gut microbiota is increasingly recognized as a contributor to obesity development. A number of studies have reported both potential probiotic properties and causal roles in obesity for Blautia, one of the major intestinal bacteria of the Firmicutes phylum. The purpose of this systematic review is to summarize the current understanding of the Blautia-obesity relationship and to evaluate evidence from animal and clinical studies, in order to inform future research and therapeutic strategies targeting gut Blautia in obese patients. Regarding the present manuscript, I would like to make a few comments.

The manuscript should introduce terms related to gut microbiota.

Since the manuscript is presented as a systematic review, in my opinion the introduction is too long, particularly in conjunction with the table.

Several steps are missing from the present manuscript regarding the materials and methods of the systematic review. It is important to understand the search equation, the PICOS criteria, the bias evaluation, and how the articles were selected and assessed. This can be accomplished using several tools, such as the JBI guidelines or RevMan.

The results section is divided according to the manuscripts found in the literature. Perhaps this should be explained in the introduction to facilitate understanding.

The discussion is excellent; however, the narrative review should be changed to a systematic review in accordance with the PRISMA guidelines.

In my view, the introduction should be reorganized to describe the subsequent results sections, several key aspects should be added to the materials and methods, and the key points in the results and discussion should be emphasized more.

Author Response

  • Comment 1: The manuscript should introduce terms related to gut microbiota

Response. Thank you. Terms such as probiotics (Line 242), prebiotics (Line 263) and SCFAs (Line 55), as well as gut microbiota (Line 48), have been introduced.

  • Comment 2: Since the manuscript is presented as a systematic review, in my opinion the introduction is too long, particularly in conjunction with the table.

Response. Thank you. We have revised the introduction section (Lines 30-74). To further shorten the introduction, Table 1 has been changed to supplementary Table S1 (Lines 61, 87).

  • Comment 3: Several steps are missing from the present manuscript regarding the materials and methods of the systematic review. It is important to understand the search equation, the PICOS criteria, the bias evaluation, and how the articles were selected and assessed. This can be accomplished using several tools, such as the JBI guidelines or RevMan.

Response. Thank you. The Methods section has been rearranged in line with the PRISMA guidelines. It consists of the literature search strategy (Lines 94-99), study selection with the PICOS criteria (Lines 106-113), data extraction (Lines 115-121), bias evaluation (Lines 124-130), and data synthesis (Lines 132-133). Because we wanted to understand the effect of medical treatment or lifestyle interventions on the gut microbiota with respect to the outcome variables (Blautia population vs obesity status), we included all studies that met the inclusion criteria for data extraction, irrespective of their quality (as permitted by the JBI guidelines). The risk-of-bias domains considered were selection bias ("Are the groups comparable such that an observed difference is likely attributable to the treatment rather than a confounder?"), performance bias ("Was the approach to husbandry the same for all treatment groups and was caregiving done without knowledge of the treatment group?"), and detection bias ("Was the approach to assessing the outcomes the same in both groups and done without knowledge of the group?"), as outlined in: Annette M. O'Connor, Jan M. Sargeant, Critical Appraisal of Studies Using Laboratory Animal Models, ILAR Journal, Volume 55, Issue 3, 2014, Pages 405-417, https://doi.org/10.1093/ilar/ilu038.

  • Comment 4: The results section is divided according to the manuscripts found in the literature. Perhaps this should be explained in the introduction to facilitate understanding.

Response. Thank you. The last paragraph of the introduction has been revised to summarize the objectives of the systematic review effectively and to provide a clear overview of the research focus and the potential implications of the findings. It highlights the divisions of the results sections (Lines 68-74). A statement has also been added to explain the divisions (Lines 176-180).

  • Comment 5: The discussion is excellent; however, the narrative review should be changed to a systematic review in accordance with the PRISMA guidelines.

Response. Thank you. The requirements have been met. For instance, on the PRISMA checklist, item 23a about the general interpretation of the results is covered in Lines 407-476; items 23b and 23c about the limitations of the evidence and of the review process are included in Lines 477-486; and item 23d about the implications of the results for practice, policy, and future research is included in Lines 487-507.

  • Comment 6: In my view, the introduction should be reorganized to describe the subsequent results sections, several key aspects should be added to the materials and methods, and the key points in the results and discussion should be emphasized more.

Response. Thank you. The introduction has been reorganized so that the last paragraph (Lines 68-74) indicates the main objective (Blautia's role in obesity) and the specific objectives, which include (1) exploring the abundance of Blautia in the gut microbiome of obese individuals in relation to any treatment and lifestyle interventions, (2) exploring how Blautia populations respond to treatments and lifestyle changes in obese individuals, and (3) examining associations between changes in Blautia abundance and the efficacy of the employed interventions in managing obesity. These specific objectives are the basis for the divisions in the results section: Section 3.2 addresses objectives 1 and 2, while Section 3.3 addresses objective 3. Both the results and discussion sections emphasize the three outlined objectives. The Methods section has been rearranged following the PRISMA guidelines (see the response to Comment 3).

Reviewer 2 Report

Obesity is a complex and multifactorial disease with global epidemic proportions, posing significant health and economic challenges. Whilst diet and lifestyle are well-established contributors to the pathogenesis, the gut microbiota's role in obesity development is increasingly recognized. Blautia, one of the major intestinal bacteria of the Firmicutes phylum, has been reported in different studies to have both potential probiotic properties and causal roles in obesity, making its role controversial.

The writing is generally clear and understandable, but there are a few grammatical errors and awkward phrasings that could be improved. For instance, "casual factors" should be corrected to "causal factors."

Figure 1 does not seem to be explained in the text, nor is the content of figure 1B commented on. 

  • Comment 1: The writing is generally clear and understandable, but there are a few grammatical errors and awkward phrasings that could be improved. For instance, "casual factors" should be corrected to "causal factors."

Response. Thank you. It has been revised (Line 17).

  • Comment 2: Figure 1 does not seem to be explained in the text, nor is the content of figure 1B commented on.

Response. Thank you. Fig. 1A shows health conditions that can occur with, or be exacerbated by, obesity; Lines 33-39 refer to Fig. 1A in the text. Fig. 1B indicates the detection of Blautia in various phenotypes; we have referred to it in the text (Lines 61-62), and it is further cited in Line 422.

Thank you for taking into account my previous comments and rearranging your manuscript into a systematic review. I have no further comments to make.

Chanda, W.; Jiang, H.; Liu, S.-J. The Ambiguous Correlation of Blautia with Obesity: A Systematic Review. Microorganisms 2024, 12, 1768. https://doi.org/10.3390/microorganisms12091768

Pesticide safety practice and its public health risk in African regions: systematic review and meta-analysis

  • Dechasa Adare Mengistu,
  • Abraham Geremew &
  • Roba Argaw Tessema (ORCID: orcid.org/0000-0001-8734-4030)

BMC Public Health, volume 24, Article number: 2295 (2024)

Background

Although pesticides play an integral role in food security and in protecting public health from vector-borne diseases, inappropriate handling and the continued use of restricted organochlorine pesticides pose short- and long-term adverse effects and have become a public health concern in the African region. This study aimed to determine the combined level of protective equipment use, management of empty pesticide containers, and leftover pesticide residues in the African region.

Methods

This study was carried out following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) protocol. The Scopus, PubMed, Web of Science, Google Scholar, DOAJ, and National Repository databases were searched between November 12, 2023, and January 2, 2024. The meta-analysis data were visualized using a forest plot. A random-effects model was applied when heterogeneity existed in the pooled studies. Subgroup analysis was performed based on the location where the study was conducted and the publication year. Meta-regression and sensitivity analyses were performed to evaluate the robustness of the pooled prevalence estimates. Publication bias was assessed using a funnel plot. The authors used the Joanna Briggs Institute (JBI) Critical Appraisal tool to determine the quality of the studies.

Results

In this review, 2174 articles were identified from the included electronic databases, 24 of which were included in the present study. The study revealed that the pooled prevalence of wearing a mask, gloves, boots/safety shoes, coveralls, and head cover was 18% (95% CI: 11.9 to 26.1%, p < 0.001), 18% (95% CI: 11.7 to 26.9%, p < 0.001), 23% (95% CI: 15.7 to 33.3%, p < 0.001), 26% (95% CI: 16.2 to 38.7%, p < 0.001), and 14% (95% CI: 8.90 to 22.0%, p < 0.001), respectively. The prevalence of pesticides stored in the living room and of pesticide containers reused for other purposes was 51% and 26%, respectively.

Conclusions

Poor pesticide safety practices were identified. A substantial proportion of respondents reported storing pesticide residues in their living rooms and reusing empty pesticide containers. Regional institutions should lead the design of safety strategies to reduce the public health risks of pesticide exposure.

Introduction

Organochlorine pesticides (OPs) have been extensively used in developing countries, including the African regions, to control pests, weeds and insects for crop protection and to safeguard public health from vector-borne diseases such as malaria and typhus [ 1 , 2 ]. However, due to inappropriate handling, misuse, and the continued use of restricted OPs such as dichloro-diphenyl-trichloroethane (DDT), the negative impact of pesticides, especially OPs, has increased over time [ 3 ]. Globally, the use of pesticides increased by 30% from 2000 to 2020, reaching 2.7 million tons in 2020 [ 4 , 5 ]. Data from pesticide manufacturers (2011) suggest that global pesticide production will increase 2.7-fold by 2050, reaching 10.1 million tons per year [ 6 ].

In Africa, the use of pesticides increased by 20%, from 84,762 tons in 2010 to 105,758 tons in 2020 [ 5 , 7 ]. The widespread misuse of OPs has caused health problems ranging from short-term effects, such as headache and nausea, to chronic effects, such as cancer, reproductive harm, immunosuppression, endocrine disruption and acute neurological damage [ 8 , 9 , 10 ]. Recent evidence indicates that pesticides accounted for 14–20% of global suicides from 2006 to 2015 and led to 110,000–168,000 fatalities annually from 2010 to 2014 [ 11 ]. This might be mainly attributed to mishandling and inappropriate use of pesticides.

Pesticide exposure is one of the main work-related risks for farmers in developing countries. The primary exposure routes include dietary residue exposure, occupational exposure, indoor and outdoor pesticide exposure, and incorrect pesticide application to domestic animals [ 12 ]. The improper handling of pesticides by users, mainly farmers, is one of the risk factors for the occurrence of health problems associated with pesticides [ 13 ]. The global impact of the inappropriate handling of pesticides led to 155,488 deaths and 7,362,493 disability-adjusted life years (DALYs) in 2016 [ 14 ]. Improper handling is a significant concern in many developing countries, including different African regions, and results in serious health threats [ 13 , 15 , 16 , 17 ].

The Food and Agriculture Organization (FAO) and World Health Organization (WHO) recommend that pesticide applicators always use personal protective equipment (PPE), such as a face mask or respirator, washable hats, eye-wear and face protection (goggles), safe shoes/boots, aprons, gloves, and clean long-sleeved coveralls, during the application of pesticides. They also recommend the need to ensure proper storage and management of empty pesticide containers, leftover pesticide residues, and application equipment. To this end, the current review aimed to measure the level of compliance with FAO/WHO recommendations for safety practices in the use of PPEs by pesticide applicators and the extent of leftover pesticide residue and empty pesticide container management in the African region. To our knowledge, only two reviews have elucidated the role of PPE in the prevention of pesticide use risks in agricultural settings and the factors affecting PPE use; however, there is scarce evidence generated through systematic reviews and meta-analyses to determine the combined level of PPE use, management of empty pesticide containers, and leftover pesticide residues in the African region.

Therefore, this study aimed to determine the safety practices among pesticide applicators in the African region. The findings of the review will be used as input to design appropriate interventions to protect the health of pesticide operators, re-entry workers, bystanders, residents, and the environment at large.

Materials and methods

Eligibility criteria

Inclusion criteria

Population

Pesticide applicators/users in the African regions.

Outcome

Studies that reported pesticide safety practices, including the use of personal protective equipment (PPE) such as masks, gloves, head protection (hats), safety boots, and coveralls, and studies that reported places of pesticide storage and empty-container management.

Intervention

Pesticide application safety practices.

Types of articles

Full-text, peer-reviewed, and published articles written in English.

Publication year

Articles published from 2010 to January 2024.

Study area

Studies conducted in the African region.

Exclusion criteria

Review articles, reports, editorial papers, short communications, preprints, articles with a high risk of bias, and commentaries were excluded from this study.

Information sources

The authors (DAM, AG, and RAT) retrieved articles from different electronic databases, such as SCOPUS, PubMed, Web of Science, DOAJ, Google Scholar, and National Repository, from November 12, 2023, to January 2, 2024.

Search strategies

To retrieve these articles, the authors used a combination of Boolean logic operators (AND, OR, and NOT), Medical Subject Headings (MeSH), and main keywords. After the database searches were performed and eligibility was assessed, the reference lists of eligible studies were also screened for additional articles. The search strategies employed in this study are summarized in Table  1 .
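The full search strings are given in the paper's Table 1. Purely as an illustration of how such Boolean blocks combine, the short Python sketch below builds a query from hypothetical stand-in keyword groups; the authors' actual terms and database syntax may differ.

```python
# Illustrative only: OR joins synonyms within a concept, AND joins
# concepts. The terms below are hypothetical stand-ins, not the
# authors' actual search strings (those are in Table 1 of the paper).
population = ["farmer*", '"pesticide applicator*"', '"agricultural worker*"']
exposure = ["pesticide*", "organochlorine*", "insecticide*"]
outcome = ['"safety practice*"', '"personal protective equipment"', "PPE"]

def block(terms):
    """Join synonyms for one concept with OR and parenthesize them."""
    return "(" + " OR ".join(terms) + ")"

query = " AND ".join(block(t) for t in (population, exposure, outcome))
print(query)
```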

Study selection process

The authors used a PRISMA flow chart for the selection of studies. The PRISMA flow chart reports the number of articles included in the study, as well as those excluded, with the reasons for exclusion. Duplicated articles were checked and removed using ENDNOTE (Thomson Reuters, USA).
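EndNote handles deduplication automatically; purely as a minimal sketch of the underlying idea, assuming hypothetical record dictionaries keyed by DOI and title, deduplication could look like this.

```python
# Sketch only: keep the first occurrence of each DOI, falling back to
# a normalized title when no DOI is present. Record fields are
# hypothetical, not EndNote's internal format.
import re

def normalize(title: str) -> str:
    """Lowercase a title and strip non-alphanumeric characters."""
    return re.sub(r"[^a-z0-9]", "", title.lower())

def deduplicate(records):
    """Return records with DOI/title duplicates removed."""
    seen, unique = set(), []
    for rec in records:
        key = rec.get("doi") or normalize(rec.get("title", ""))
        if key and key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

records = [
    {"doi": "10.1186/s12889-024-19764-4", "title": "Pesticide safety practice..."},
    {"doi": "10.1186/s12889-024-19764-4", "title": "Pesticide safety practice..."},
    {"title": "Pesticide use practices among smallholder vegetable farmers"},
]
print(len(deduplicate(records)))  # 2
```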

The authors (DAM, AG, and RAT) independently screened the articles based on their titles and abstracts to determine their eligibility. The authors then evaluated the full-text articles to determine their eligibility for the current study. Disagreements between the authors concerning the inclusion and exclusion of articles were resolved by consensus after discussion. Finally, articles that met the inclusion criteria were included in the present study.

Quality assessment

After the authors (DAM, AG, and RAT) evaluated the articles for eligibility, the quality of the articles was assessed using the Joanna Briggs Institute (JBI) Critical Appraisal Tool for prevalence studies [ 21 ]. This tool has nine evaluation criteria, each scored one if the criterion was met and zero if it was not. Based on the total score across the nine criteria, each article was categorized as having a low (85% or above), moderate (60-85%), or high (60% or less) risk of bias. Articles with a moderate or low risk of bias were included in the current study. Disagreements between the authors (DAM, AG, and RAT) during the quality evaluation were resolved by discussing unclear points and repeating the same procedures. The nine JBI evaluation criteria are: (1) appropriate sampling frame; (2) proper sampling technique; (3) adequate sample size; (4) description of the study subjects and setting; (5) sufficient data analysis; (6) use of valid methods for the identified conditions; (7) valid measurement for all participants; (8) use of appropriate statistical analysis; and (9) adequate response rate.
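The scoring rule above is simple enough to state as code. The sketch below is not an official JBI implementation; it merely encodes the paper's cut-offs over nine yes/no criteria.

```python
def jbi_risk_of_bias(answers):
    """Categorize risk of bias from nine yes/no JBI criteria.

    answers: list of 9 booleans, one per criterion (True = met).
    Cut-offs follow the rule stated above; low- and moderate-risk
    articles were included in the review.
    """
    assert len(answers) == 9, "the JBI prevalence checklist has 9 items"
    score = 100 * sum(answers) / len(answers)
    if score >= 85:
        return "low risk of bias"
    if score >= 60:
        return "moderate risk of bias"
    return "high risk of bias"

# Example: 7 of 9 criteria met -> 77.8% -> moderate risk (included).
print(jbi_risk_of_bias([True] * 7 + [False] * 2))
```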

Data extraction

The data were extracted from the included articles using a Microsoft Excel form developed by the authors. Data on the main characteristics and outcomes of the studies, including the publication year, the location/region where the study was conducted, the study population, and pesticide safety practices, covering the use of PPE (mask, gloves, head protection (hat), safety shoes/boots and coveralls), the location of pesticide storage, and empty-container management, were extracted from the included articles. Disagreements regarding the data extraction were resolved through discussion.

Statistical procedures and data analysis

The pooled prevalence of pesticide safety practices among pesticide applicators was determined using Comprehensive Meta-Analysis version 3.0 and Stata version 17.0. Prevalences with 95% confidence intervals (95% CIs) were calculated. The pooled prevalence was determined for each pesticide safety practice, including the use of PPE and the safe management of empty containers, as well as the location of pesticide storage. The data were cleaned and re-entered for each practice to reduce error. The cleaned data were then visualized using a forest plot. Heterogeneity between the articles was evaluated using the I-squared (I²) statistic, which describes the percentage of total variation across studies that is due to heterogeneity rather than chance. A random-effects model, which incorporates both within-study and between-study variability, was applied when the I² index was greater than 50%; otherwise, when heterogeneity was insignificant (I² less than 50%), a fixed-effects model was applied. The level of heterogeneity was classified as follows: 0% ≤ I² ≤ 25%, heterogeneity might not be important; 25% < I² ≤ 50%, moderate heterogeneity; 50% < I² ≤ 75%, substantial heterogeneity; and 75% < I² ≤ 100%, considerable heterogeneity [ 22 ]. Furthermore, subgroup analysis was performed based on the location (country) where the study was conducted and the publication year. Differences with p values less than 0.05 were considered statistically significant. Sensitivity analysis was performed to evaluate the robustness of the pooled prevalence estimates. Publication bias was evaluated with a funnel plot.
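The authors performed these computations in Comprehensive Meta-Analysis and Stata. For readers who want to see the mechanics, the following is a minimal Python sketch of one common random-effects implementation (DerSimonian-Laird) applied to logit-transformed prevalences, together with Cochran's Q and I²; the toy event counts are hypothetical, not data from the included studies.

```python
# Minimal sketch of the pooling logic described above; not the
# authors' implementation. Inputs are per-study event counts and
# sample sizes.
import math

def pooled_prevalence(events, totals):
    """DerSimonian-Laird random-effects pool of logit prevalences."""
    # Logit-transform each prevalence; var(logit p) ~ 1/e + 1/(n - e).
    y = [math.log(e / (n - e)) for e, n in zip(events, totals)]
    v = [1.0 / e + 1.0 / (n - e) for e, n in zip(events, totals)]
    w = [1.0 / vi for vi in v]

    # Fixed-effect estimate and Cochran's Q for heterogeneity.
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # Between-study variance (tau^2), then random-effects weights.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    # Back-transform the logit estimate and its 95% CI to a prevalence.
    inv = lambda x: 1.0 / (1.0 + math.exp(-x))
    return inv(mu), (inv(mu - 1.96 * se), inv(mu + 1.96 * se)), i2

# Toy data (hypothetical studies): mask-use events out of totals.
prev, ci, i2 = pooled_prevalence([30, 55, 12], [200, 180, 150])
print(f"pooled prevalence {prev:.1%}, 95% CI {ci[0]:.1%}-{ci[1]:.1%}, I2 {i2:.0f}%")
```

The logit transform keeps the pooled prevalence and its confidence limits inside (0, 1); when I² falls below 50%, the fixed-effect estimate computed along the way would be reported instead, matching the decision rule described above.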

Results

For this study, 2174 articles were retrieved from the various electronic databases, and 870 were excluded as duplicates. After the articles were evaluated on their titles and abstracts, 363 studies were excluded. A further 444 articles were excluded after 480 articles were evaluated for their objectives, methods, and outcomes of interest. Thirty-six full-text articles were assessed for bias, 12 of which were excluded due to a high risk of bias. Finally, 24 articles conducted in various African countries that met the eligibility criteria were included in this study (Fig.  1 ).

Figure 1. PRISMA flowchart of study selection to determine pesticide safety practices in the African region, 2024

General characteristics of the studies

In the present study, 7146 participants were included, with individual study samples ranging from 70 [ 23 ] to 644 [ 24 ]. Among the 24 articles included from 10 African countries, 8, 4, 3, 2, and 2 were from Ethiopia [ 15 , 24 , 25 , 26 , 27 , 28 , 29 , 30 ], Nigeria [ 31 , 32 , 33 , 34 ], Ghana [ 35 , 36 , 37 ], Tanzania [ 38 , 39 ] and Cameroon [ 40 , 41 ], respectively. The remaining 5 articles were conducted in Benin [ 42 ], Rwanda [ 43 ], Uganda [ 44 ], Egypt [ 23 ], and Kenya [ 45 ].

Among the included articles, 19 studies reported the use of masks [ 23 , 24 , 25 , 26 , 29 , 30 , 31 , 32 , 33 , 34 , 35 , 37 , 38 , 40 , 41 , 42 , 43 , 44 , 45 ] during pesticide application. Similarly, 19 studies reported the use of gloves [ 23 , 24 , 25 , 26 , 29 , 30 , 31 , 32 , 33 , 34 , 35 , 37 , 38 , 40 , 41 , 42 , 43 , 44 , 45 ] during the application of pesticides, while 18, 14, and 16 articles reported the use of safe shoes (boots) [ 23 , 24 , 25 , 28 , 29 , 30 , 31 , 32 , 33 , 35 , 37 , 38 , 40 , 41 , 42 , 43 , 44 , 45 ], coverall [ 23 , 24 , 25 , 28 , 29 , 30 , 31 , 35 , 37 , 40 , 41 , 43 , 44 , 45 ] and head protector (hat) [ 23 , 24 , 25 , 26 , 28 , 29 , 30 , 32 , 33 , 35 , 37 , 38 , 40 , 42 , 43 , 44 ], respectively. Furthermore, 12 and 11 studies reported the place of pesticide storage [ 15 , 29 , 30 , 31 , 32 , 33 , 34 , 36 , 37 , 40 , 43 , 45 ] and the management of empty pesticide containers [ 23 , 24 , 28 , 29 , 30 , 31 , 32 , 33 , 34 , 39 , 42 ], respectively (Table  2 ).

Pesticide safety practice among pesticide applicators

Use of personal protective equipment

The prevalence of the use of face masks

Figure  2 shows the use of face masks during pesticide application. A random-effects model was utilized due to significant heterogeneity (I² = 97.90%, p < 0.001). The pooled prevalence of the use of masks during pesticide application was 18.0% (95% CI: 11.9 to 26.3%), with a p value < 0.001 (Fig.  2 ).

Figure 2. Meta-analysis of the pooled prevalence of the use of face masks among pesticide applicators during pesticide application in the African region

Prevalence of glove use

Figure  3 illustrates the use of gloves during pesticide application. A random-effects model was employed to compute the pooled estimate because of the significant heterogeneity (I² = 98.02%, p < 0.001). The pooled prevalence of the use of gloves during pesticide application among the applicators was 18.1% (95% CI: 11.7 to 26.9%), with a p value < 0.001 (Fig.  3 ).

Figure 3. Meta-analysis of the pooled prevalence of glove use among pesticide applicators during pesticide application in the African region

Prevalence of the use of safe shoes (boots)

Figure  4 illustrates the use of safe shoes during pesticide application. Because of the significant heterogeneity (I² = 97.97%, p < 0.001), a random-effects model was utilized to calculate the pooled estimate. The overall pooled prevalence of the use of safe shoes (boots) during pesticide application was 23.5% (95% CI: 15.8 to 33.5%), with a p value < 0.001 (Fig.  4 ).

Figure 4. Meta-analysis of the pooled prevalence of wearing safe shoes (boots) among pesticide applicators during pesticide application in the African region

The prevalence of the use of coveralls (full-body protection)

Figure  5 shows the wearing of full-body protection during pesticide application. Because of the significant heterogeneity (I² = 98.36%, p < 0.001), a random-effects model was used to compute the pooled prevalence. The pooled prevalence of wearing coveralls during the application of pesticides was 25.9% (95% CI: 16.2 to 38.7%), with a p value = 0.001 (Fig.  5 ).

Figure 5. Meta-analysis of the pooled prevalence of wearing coveralls among pesticide applicators in the African region

The prevalence of the use of head protection (hats)

Figure  6 depicts the use of head protection during pesticide application. Because of the significant heterogeneity (I² = 97.45%, p < 0.001), a random-effects model was used to compute the pooled prevalence. The overall pooled prevalence of wearing head protectors (hats) during pesticide application was 14.4% (95% CI: 9.00 to 22.2%), with a p value < 0.001 (Fig.  6 ).

Figure 6. Meta-analysis of the pooled prevalence of wearing head protectors during pesticide application among pesticide applicators in the African region

Pesticide residues and container management

Storage of pesticide residues in the house before or after application

Figure  7 illustrates the storage location of pesticide residues before or after application by pesticide applicators. A random-effects model was used to compute the pooled prevalence because of the presence of significant heterogeneity (I² = 98.26%, p < 0.001). The pooled prevalence of storing pesticide residues in the house or living room was 51.4% (95% CI: 37.6 to 65.1%), with a p value < 0.05 (Fig.  7 ).

Figure 7. Meta-analysis of the pooled prevalence of stored pesticide residues in the living room before and/or after pesticide application in the African region

The practice of reusing pesticide containers for other purposes

Figure  8 illustrates the pooled prevalence of reusing pesticide containers for different purposes. A random-effects model was used to calculate the pooled prevalence because of the presence of significant heterogeneity (I² = 96.94%, p < 0.001). The pooled prevalence of pesticide container reuse was 26.4% (95% CI: 18.8 to 35.7%), with a p value < 0.001 (Fig.  8 ).

Figure 8. Meta-analysis of the pooled prevalence of reusing pesticide containers for other purposes in the African region

In this study, the results of the meta-analysis demonstrated that there were statistically significant (p < 0.05) differences in the pooled prevalence of pesticide safety practices (Figs.  2 , 3 , 4 , 5 , 6 , 7 and 8 ) among the included studies. Additionally, the meta-analysis revealed significant heterogeneity (I² > 75%, p < 0.001) in the studies examining the prevalence of pesticide safety practices, indicating considerable variation. These significant heterogeneity values imply that factors other than random variation may contribute to the observed differences, such as variations in study design (reliability of outcome measures), applicator characteristics (such as age, sex and health status) and exposure levels (application dose, duration of application).

Subgroup analysis

Table  3 illustrates the subgroup analysis based on the study area, publication year, and pooled prevalence with 95% CIs. Based on the data extracted from more than one article, the pooled prevalence of the use of face masks in Cameroon, Ethiopia, Ghana, and Nigeria was 32.5%, 13.5%, 27.8%, and 38.0%, respectively. The pooled prevalence of the use of gloves in Cameroon, Ethiopia, Ghana, and Nigeria was 8.3%, 11.8%, 30.6% and 51.1%, respectively. Moreover, this study revealed that the prevalence of reuse of empty pesticide containers in Benin, Egypt, Ethiopia, Ghana, Nigeria, and Tanzania was 22.7%, 25.7%, 31.0%, 20.7%, 30.0%, and 9.0%, respectively (Table  3 ).

Meta-regression

In this study, meta-regression was employed to examine potential sources of heterogeneity (e.g., the impact of sample size) for each pesticide safety practice using a random-effects model (Table  4 ).

Key: CI, confidence interval.
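As a rough stand-in for the random-effects meta-regression reported in Table 4, the sketch below fits a weighted least-squares line of logit prevalence on sample size (the moderator mentioned above). A full random-effects meta-regression would also estimate the between-study variance, so treat this as illustrative only, with hypothetical data.

```python
# Sketch of a meta-regression on a single moderator (sample size) as a
# weighted least-squares fit on logit prevalences; the authors used
# CMA/Stata, so this is an approximation with made-up numbers.
import numpy as np
import statsmodels.api as sm

events = np.array([30, 55, 12, 80])
totals = np.array([200, 180, 150, 400])
y = np.log(events / (totals - events))      # logit prevalence per study
var = 1 / events + 1 / (totals - events)    # within-study variance

X = sm.add_constant(totals)                 # moderator: sample size
fit = sm.WLS(y, X, weights=1 / var).fit()
print(fit.params, fit.pvalues)              # slope ~ effect of sample size
```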

Sensitivity analysis

To take all potential outliers into account, the sensitivity analysis was conducted by removing the largest and/or smallest outcomes, which were expected to influence the overall pooled estimate. Table 5 shows the final pooled estimates after removing the outliers; the outliers were identified by running a funnel plot for each pesticide safety practice (Figs.  9 and 10 ).

Figure 9. Funnel plots of face mask use (A), protective gloves (B), safe shoes (C), and overall PPE wearing (D) in studies of pesticide safety practices in African regions, used to investigate publication bias. The solid black data points indicate the largest and/or smallest study outcomes removed from each pesticide safety practice

Figure 10. Funnel plots of head protection (E), in-house pesticide storage (F), and reuse of empty pesticide containers (G) from the studies of pesticide safety practice in African regions, used to investigate publication bias. The solid black data points indicate the largest and/or smallest study outcomes removed from each pesticide safety practice
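The funnel plots in Figs. 9 and 10 were produced with the authors' statistical software. A funnel plot is simply each study's effect size plotted against its precision, so a minimal matplotlib sketch, with hypothetical data on the logit scale, looks like this.

```python
# Illustrative funnel plot: logit prevalence against its standard
# error, with larger (more precise) studies toward the top. Data are
# hypothetical, not the review's study estimates.
import numpy as np
import matplotlib.pyplot as plt

events = np.array([30, 55, 12, 80, 20])
totals = np.array([200, 180, 150, 400, 90])
y = np.log(events / (totals - events))        # logit prevalence
se = np.sqrt(1 / events + 1 / (totals - events))

plt.scatter(y, se)
plt.gca().invert_yaxis()                      # smallest SE at the top
plt.axvline(np.average(y, weights=1 / se**2), linestyle="--")
plt.xlabel("Logit prevalence")
plt.ylabel("Standard error")
plt.savefig("funnel.png")
```

Asymmetry around the dashed pooled-estimate line is the visual cue for possible publication bias or small-study effects.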

Discussion

This systematic review and meta-analysis aimed to determine and provide information on pesticide safety practices and their health risks among pesticide applicators in the African region. In the present study, 2174 articles were retrieved from the included electronic databases; 24 of these, published in various African countries, met the eligibility criteria and were included in this study.

As pesticide use is ongoing, farmers should use appropriate personal protective equipment (PPE) at all stages of pesticide handling and application, particularly in developing countries [ 46 ]. However, evidence has revealed that farmers do not use PPE properly before, during, or after pesticide application [ 46 ]. The findings of the present study indicated that the pooled prevalence of the use of face masks during pesticide application was approximately 18%. This is lower than the global prevalence of face mask use (43%) [ 20 ]. The variation may be attributed to the difference in the scope of the studies, as the current study focused only on the African region.

Protective gloves and workplace hygiene can reduce exposure to pesticides and the risk of various diseases, including Parkinson’s disease [ 47 ]. However, the current study revealed that the combined prevalence of glove use during pesticide application among pesticide applicators was 18%. This finding was lower than the results reported by a systematic review conducted by Sapbamrer and Thammachai, where the global prevalence of glove use was approximately 41% [ 20 ]. The variation may be due to the difference in the implementation of safety practices and the scope of the study.

Similarly, the overall pooled prevalence of wearing coveralls and head protection was approximately 26% and 14%, respectively. The findings of the current study were lower than those of a study conducted elsewhere, where the global prevalence of the use of head protection (hats) was nearly 47%, and 34% in the African region [ 20 ]. Furthermore, the present study findings indicated that the pooled prevalence of the use of safe shoes (boots) during pesticide application was approximately 23%, which was lower than that reported elsewhere, where the prevalence of the use of safe shoes (boots) was nearly 45% [ 20 ].

Based on a subgroup analysis, the pooled prevalence of the use of different PPEs varied across different African countries. The prevalence of mask use during pesticide application in Cameroon, Ethiopia, Ghana, and Nigeria was nearly 33%, 14%, 28%, and 38%, respectively. Similarly, the pooled prevalence of the use of gloves was 8%, 12%, 31%, and 51% in Cameroon, Ethiopia, Ghana, and Nigeria, respectively. Furthermore, the prevalence of the reuse of empty pesticide containers was approximately 23%, 26%, 31%, 21%, 30%, and 9% in Benin, Egypt, Ethiopia, Ghana, Nigeria, and Tanzania, respectively. In addition to the variation in the prevalence of safety practices among the African countries under investigation, the prevalence of pesticide safety practices in each country was also low.

Furthermore, appropriate storage of pesticide residues and proper management of leftover pesticide residues and empty containers play pivotal roles in the prevention of pesticide exposure. However, the current study revealed that approximately half (51%) of the users stored pesticide residues in their living rooms, while approximately 26% reused empty pesticide containers. This indicates the need for urgent intervention to improve the level of safety practices among pesticide applicators, particularly in the African region. In general, based on the findings of this systematic review and meta-analysis, the pooled prevalence of pesticide safety practices is insufficient in the African region. This may be attributed to several factors, such as deficient and ineffective pesticide management, monitoring, and evaluation systems; poor enforcement of restrictive laws; poor accessibility of PPE at affordable prices; poor awareness of the short- and long-term health risks of pesticide exposure; and inadequate pesticide waste disposal. These are crucial points that require intervention to safeguard the health of pesticide applicators and the public at large.

Strengths and limitations

This study used multiple electronic databases to retrieve articles. The quality of the articles was assessed using a standard tool for the quality assessment of prevalence studies (JBI). Furthermore, this study was conducted following the PRISMA guidelines for systematic reviews and meta-analyses. Alongside these strengths, there were several limitations, including methodological limitations such as language bias resulting from the search being limited to English, the scarcity of high-quality publications, and poor or absent reporting of some variables in the identified studies (e.g., wind direction during pesticide application, showering habits post-application, and risks of child exposure). Attention should also be given to the potential heterogeneity of the included studies when interpreting the pooled estimates. Despite these limitations, this meta-analysis provides credible findings that are essential for designing appropriate interventions.

Conclusions and recommendations

In conclusion, the present study revealed poor pesticide safety practices, including inadequate use of face masks, gloves, hats, coveralls, and safe shoes, among pesticide applicators in the African region. PPE holds a privileged position in safety interventions for the control of pesticide exposure in many countries, even though it should only be used as a last resort in the hierarchy of pesticide prevention and control measures. Nearly half of the participants reported storing pesticides in their living rooms, which can pose a major health risk, and a substantial proportion of respondents reported reusing empty pesticide containers for other purposes.

The application of appropriate safety practices is recommended to reduce the health risks associated with pesticide exposure. Regional institutions, policymakers, agricultural extension services, and health education programs should plan appropriate interventions to improve pesticide safety practices and increase public awareness. In-depth education and field-based, practice-oriented, regular training for farmers on the proper use of PPE during pesticide application are imperative for improving their understanding. Optimizing the pesticide management system through strict enforcement of laws on pesticide use and handling, and providing adequate in-service pesticide safety training to ensure sufficient knowledge and skills for adopting self-protective behaviors, are essential.

Data availability

No datasets were generated or analysed during the current study.

Abbreviations

CMA: Comprehensive Meta-Analysis

FAO: Food and Agriculture Organization of the United Nations

JBI: Joanna Briggs Institute

MeSH: Medical Subject Headings

OPs: Organochlorine pesticides

PPE: Personal protective equipment

PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses

WHO: World Health Organization

Idowu GA, Aiyesanmi AF, Oyegoke FO. Organochlorine pesticide residues in pods and beans of cocoa (Theobroma cacao L.) from Ondo State Central District, Nigeria. Environ Adv. 2022;7:100162.

Tessema RA, Nagy K, Ádám B. Occupational and environmental pesticide exposure and associated health risks among pesticide applicators and nonapplicator residents in rural Ethiopia. Front Public Health. 2022;10:1017189. https://doi.org/10.3389/fpubh

Megha P, Sreedev P. Organochlorine pesticides, their toxic effects on living organisms and their fate in the environment. Interdiscip Toxicol. 2016;9(3–4):90–100. https://doi.org/10.1515/intox-2016-0012 .

Tudi M, Daniel Ruan H, Wang L, Lyu J, Sadler R, Connell D, et al. Agriculture development, pesticide application, and its impact on the environment. Int J Environ Res Public Health. 2021;18(3):1112.

FAO. World Food and Agriculture – Statistical Yearbook 2022. Rome. https://doi.org/10.4060/cc2211en

Grube A, Donaldson D, Kiely T, Wu L. Pesticides industry sales and usage. Washington, DC: US EPA; 2011.

Jepson PC, Murray K, Bach O, Bonilla MA, Neumeister L. Selection of pesticides to reduce human and environmental health risks: a global guideline and minimum pesticides list. Lancet Planet Health. 2020;4(2):e56–63.

World Health Organization. Preventing disease through healthy environments: exposure to highly hazardous pesticides: a major public health concern. World Health Organization; 2019.

Buah-Kwofie A, Humphries MS, Combrink X, Myburgh JG. Accumulation of organochlorine pesticides in fat tissue of wild Nile crocodiles (Crocodylus niloticus) from iSimangaliso Wetland Park, South Africa. Chemosphere. 2018;195:463–71.

Gerber R, Bouwman H, Govender D, Ishizuka M, Ikenaka Y, Yohannes YB, et al. Levels of DDTs and other organochlorine pesticides in healthy wild Nile crocodiles (Crocodylus niloticus) from a flagship conservation area. Chemosphere. 2021;264:128368.

Mew EJ, Padmanathan P, Konradsen F, Eddleston M, Chang S–S, Phillips MR, et al. The global burden of fatal self-poisoning with pesticides 2006-15: a systematic review. J Affect Disord. 2017;219:93–104.

Allsop M, Huxdorff C, Johnston P, Santillo D, Thompson K. Pesticides and our health: a growing concern. Exeter, UK: Greenpeace Research Laboratories, School of Biosciences, University of Exeter; 2015.

Olisah C, Okoh OO, Okoh AI. Occurrence of organochlorine pesticide residues in biological and environmental matrices in Africa: a two-decade review. Heliyon. 2020;6(3).

Boedeker W, Watts M, Clausing P, Marquez E. The global distribution of acute unintentional pesticide poisoning: estimations based on a systematic review. BMC Public Health. 2020;20(1):1875.

Alebachew F, Azage M, Kassie GG, Chanie M. Pesticide use safety practices and associated factors among farmers in Fogera district wetland areas, south Gondar Zone, Northwest Ethiopia. PLoS ONE. 2023;18(1):e0280185.

Berni I, Menouni A, El IG, Duca R-C, Kestemont M-P, Godderis L, et al. Understanding farmers’ safety behavior regarding pesticide use in Morocco. Sustainable Prod Consum. 2021;25:471–83.

Sheahan M, Barrett CB, Goldvale C. Human health and pesticide use in sub-saharan Africa. Agric Econ. 2017;48(S1):27–41.

FAO. International Code of Conduct on Pesticide Management – Guidelines on Licensing of Public Health Pest Control Operators. Rome, Italy; 2015. p. 37. ISBN 9789241509923.

Garrigou A, Laurent C, Berthet A, Colosio C, Jas N, Daubas-Letourneux V, et al. Critical review of the role of PPE in the prevention of risks related to agricultural pesticide use. Saf Sci. 2020;123:104527.

Sapbamrer R, Thammachai A. Factors affecting the use of personal protective equipment and pesticide safety practices: a systematic review. Environ Res. 2020;185:109444.

JBI. The Joanna Briggs Institute. Critical appraisal tools for use in the JBI systematic reviews: Checklist for prevalence studies. Joanna Briggs Institute; 2017.

Ades AE, Lu G, Higgins JP. The interpretation of random-effects meta-analysis in decision models. Med Decis Making. 2005;25(6):646–54.

Shalaby SE, Abdou GY, Sallam AA. Pesticide-residue relationship and its adverse effects on occupational workers in Dakahlyia, Egypt. Appl Biol Res. 2012;14(1):24–32.

Gesesew HA, Woldemichael K, Massa D, Mwanri L. Farmers knowledge, attitudes, practices and health problems associated with pesticide use in rural irrigation villages, Southwest Ethiopia. PLoS ONE. 2016;11(9):e0162527.

Endalew M, Gebrehiwot M, Dessie A. Pesticide use knowledge, attitude, practices and practices associated factors among floriculture workers in Bahirdar city, North West, Ethiopia, 2020. Environ Health Insights. 2022;16:11786302221076250.

Lelamo S, Ashenafi T, Ejeso A, Soboksa NE, Negassa B, Aregu MB. Pesticide Use Practice and Associated factors among Rural Community of Malga District, Sidama Regional State, South Ethiopia. Environ Health Insights. 2023;17:11786302231157226.

Mequanint C, Getachew B, Mindaye Y, Amare DE, Guadu T, Dagne H. Practice towards pesticide handling, storage and its associated factors among farmers working in irrigations in Gondar town, Ethiopia, 2019. BMC Res Notes. 2019;12:1–6.

Mengistie BT, Mol AP, Oosterveer P. Pesticide use practices among smallholder vegetable farmers in Ethiopian Central Rift Valley. Environ Dev Sustain. 2017;19:301–24.

Aliyi N, Sorsa S, Deribe E. Pesticide usage and Safety Measures Awareness of Small Scale Farmers in Gera District, Jimma Zone, Western Ethiopia. Ethiop J Appl Sci Technol. 2018;9(1):19–30.

Deressa DA, Alemu K. Assessment of Pesticide Use by Farmers in Assosa District. Benishagul Gumuz National Regional State of Ethiopia; 2022.

Olalekan RM, Muhammad IH, Okoronkwo UL, Akopjubaro EH. Assessment of safety practices and farmer’s behaviors adopted when handling pesticides in rural Kano state. Nigeria Arts Humanit Open Access J. 2020;4(5):191–201.

Adesuyi AA, Longinus NK, Olatunde AM, Chinedu NV. Pesticides related knowledge, attitude, and safety practices among small-scale vegetable farmers in lagoon wetlands, Lagos, Nigeria. J Agric Environ Int Dev (JAEID). 2018;112(1):81–99.

Emeribe C, Ogbomida E, Akukwe T, Eze E, Nwobodo T, Aganmwonyi I, et al. Smallholder Farmers Perception and Awareness of Public Health Effects of Pesticides usage in selected Agrarian communities, Edo Central, Edo State, Nigeria. J Appl Sci Environ Manage. 2023;27(10):2133–51.

Nwadike C, Joshua VI, Doka PJ, Ajaj R, Abubakar Hashidu U, Gwary-Moda S, et al. Occupational safety knowledge, attitude, and practice among farmers in Northern Nigeria during pesticide application—a case study. Sustainability. 2021;13(18):10107.

Okoffo ED, Mensah M, Fosu-Mensah BY. Pesticide exposure and the use of personal protective equipment by cocoa farmers in Ghana. Environ Syst Res. 2016;5:1–15.

Wumbei A, Houbraken M, Spanoghe P. Pesticides use and exposure among yam farmers in the Nanumba traditional area of Ghana. Environ Monit Assess. 2019;191:1–16.

Miyittah MK, Kwadzo M, Gyamfua AP, Dodor DE. Health risk factors associated with pesticide use by watermelon farmers in central region, Ghana. Environ Syst Res. 2020;9:1–13.

Manyilizu WB, Mdegela RH, Helleve A, Skjerve E, Kazwala R, Nonga H, et al. Self-reported symptoms and pesticide use among farm workers in Arusha, Northern Tanzania: a cross-sectional study. Toxics. 2017;5(4):24.

Lekei EE, Ngowi AV, London L. Farmers’ knowledge, practices and injuries associated with pesticide exposure in rural farming villages in Tanzania. BMC Public Health. 2014;14:389.

Nguemo CC, Tita M, Abdel-Wahhab MA. Pesticide knowledge and safety practices in farm workers from Tubah Sub-division, North West Region, Cameroon. Int J Halal Res. 2019;1(1):39–47.

Jean S, Emmanuel F, Edouard AN, Brownlinda S. Farmers’ Knowledge, Attitude and Practices on Pesticide Safety: A Case Study of Vegetable Farmers in Mount-Bamboutos Agricultural Area, Cameroon. 2019.

Vikkey HA, Fidel D, Pazou Elisabeth Y, Hilaire H, Hervé L, Badirou A, et al. Risk factors for pesticide poisoning and pesticide users’ cholinesterase levels in cotton production areas: Glazoué and savè townships, in the central Republic of Benin. Environ Health Insights. 2017;11:1178630217704659.

Ndayambaje B, Amuguni H, Coffin-Schmitt J, Sibo N, Ntawubizi M, VanWormer E. Pesticide application practices and knowledge among small-scale local rice growers and communities in Rwanda: a cross-sectional study. Int J Environ Res Public Health. 2019;16(23):4770.

Oesterlund AH, Thomsen JF, Sekimpi DK, Maziina J, Racheal A, Jørs E. Pesticide knowledge, practice and attitude and how it affects the health of small-scale farmers in Uganda: a cross-sectional study. Afr Health Sci. 2014;14(2):420–33.

Macharia I, Mithöfer D, Waibel H. Pesticide handling practices by a vegetable farmer in Kenya. Environ Dev Sustain. 2013;15:887–902.

Yarpuz-Bozdogan N. The importance of personal protective equipment in pesticide applications in agriculture. Curr Opin Environ Sci Health. 2018;4:1–4.

Furlong M, Tanner CM, Goldman SM, Bhudhikanok GS, Blair A, Chade A, et al. Protective glove use and hygiene habits modify the associations of specific pesticides with Parkinson’s disease. Environ Int. 2015;75:144–50.

Acknowledgements

Not applicable.

Funding

The authors did not receive funding for this work.

Author information

Authors and Affiliations

College of Health and Medical Science, School of Environmental Health, Haramaya University, PO Box 235, Harar, Ethiopia

Dechasa Adare Mengistu, Abraham Geremew & Roba Argaw Tessema

Contributions

D.A. made contributions to the conception and design, literature search, study selection, data extraction, risk of bias assessment, data analysis, and drafting of the manuscript. A.G. and R.A. contributed to the conception and design of the study, risk of bias assessment, data analysis, data interpretation, and revision of the manuscript for critically important intellectual content. All authors have read and approved the final version of the manuscript and agree with the order of presentation of the authors.

Corresponding author

Correspondence to Roba Argaw Tessema .

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .

About this article

Cite this article

Mengistu, D.A., Geremew, A. & Tessema, R.A. Pesticide safety practice and its public health risk in African regions: systematic review and meta-analysis. BMC Public Health 24 , 2295 (2024). https://doi.org/10.1186/s12889-024-19764-4

Received : 17 March 2024

Accepted : 12 August 2024

Published : 23 August 2024

DOI : https://doi.org/10.1186/s12889-024-19764-4

Keywords

  • Safety practice
  • Pesticide handling
  • Health risk
  • African region

  • DOI: 10.1177/26331055241270591
  • Corpus ID: 271878581

Amygdala fMRI—A Critical Appraisal of the Extant Literature

  • T. Varkevisser, E. Geuze, J. van Honk
  • Published in Neuroscience Insights, 13 August 2024

COMMENTS

  1. Full article: Critical appraisal

    The purpose of the current article is to define critical appraisal, identify its benefits, discuss conceptual issues influencing the adequacy of a critical appraisal, and detail procedures to help reviewers undertake critical appraisals. A critical appraisal involves a careful and systematic assessment of a study's trustworthiness or ...

  2. Evidence Synthesis Guide : Risk of Bias by Study Design

    This guide provides information and resources which may be helpful when undertaking a systematic review, scoping review or other type of evidence synthesis review.

  3. Appraising systematic reviews: a comprehensive guide to ensuring

    These recommendations advocate for systematic approaches and emphasize the documentation of critical components, including the search strategy and study selection. A thorough evaluation of methodologies, research quality, and overall evidence strength is essential during the appraisal process.

  4. PDF Checklist for Systematic Reviews and Research Syntheses

    The systematic review should present a clear statement that critical appraisal was conducted and provide the details of the items that were used to assess the included studies.

  5. How to critically appraise a systematic review: an aide for the reader

    Given the surge in poor-quality systematic review publications, we sought to describe a checklist of seven practical tips from the authors' collective experience of writing and critically appraising systematic reviews, hoping that they will assist busy clinicians to critically appraise systematic reviews both as manuscript reviewers and as readers and research users.

  6. Critical Appraisal Tools and Reporting Guidelines

    AMSTAR 2: A critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ, 358, j4008. https://doi.org/10.1136/bmj.j4008 (a minimal sketch of how AMSTAR 2's overall confidence rating is derived appears after this list).

  7. Guidelines for writing a systematic review

    A preliminary review, undertaken to understand the available research literature and often leading to a full systematic review, is usually time- or scope-limited. An umbrella review compiles evidence from multiple reviews and does not search for primary studies. Early steps include identifying a topic and developing inclusion/exclusion criteria.

  8. Critical Appraisal

    Some reviews require a critical appraisal of each study that makes it through the screening process. This involves a risk of bias assessment and/or a quality assessment. The goal of these reviews is not just to find all of the studies, but to determine their methodological rigor and, therefore, their credibility. "Critical appraisal is the balanced assessment of a piece of ...

  9. Guidance to best tools and practices for systematic reviews

    In Part 3, we examine some widely used (and misused) tools for the critical appraisal of systematic reviews and reporting guidelines for evidence syntheses. In Part 4, we discuss how to meet methodological conduct standards applicable to key components of systematic reviews.

  10. Critical Appraisal of Systematic Reviews and Meta-Analyses

    Systematic reviews are the most reliable and comprehensive statement about what works. They focus on a specific question and use clearly stated, prespecified scientific methods to identify, select, assess, and summarise the findings of similar but separate studies.

  11. Guidance to best tools and practices for systematic reviews

    Choice of an appropriate tool for the evaluation of a systematic review first involves identification of the underlying construct to be assessed. For systematic reviews of interventions, recommended tools include AMSTAR-2 and ROBIS for appraisal of conduct and PRISMA 2020 for completeness of reporting.

  12. Critical appraisal

    Critical appraisal tools. Use a formal critical appraisal tool ("CAT") to assess your papers; the tool must be applied without adaptation to the appropriate study design. AMSTAR (A MeaSurement Tool to Assess systematic Reviews) is an 11-item assessment tool used to assess the methodological quality of systematic reviews.

  13. Critical Appraisal of Clinical Research

    Systematic reviews provide an overview of all primary studies on a topic and aim to obtain an overall picture of the results. In a systematic review, all the primary studies identified are critically appraised and only the best ones are selected.

  14. Critical appraisal

    A critical appraisal involves a careful and systematic assessment of a study's trustworthiness or methodological rigour, and contributes to assessing how confident people can be in the ...

  15. Critical Appraisal of a Systematic Review: A Concise Review

    Data synthesis: A systematic review is a review of a clearly formulated question that uses systematic and explicit methods to identify, select, and critically appraise relevant original research, and to collect and analyze data from the studies that are included in the review. Critical appraisal methods address both the credibility (quality of ...

  16. Systematic reviews: Structure, form and content

    This article provides an overview of the structure, form and content of systematic reviews, with a particular focus on the literature searching component. It will also discuss tools and resources - including those relating to reporting standards and critical appraisal of the articles included in the review - which will be of use to researchers conducting a systematic review.

  17. Critical Appraisal of Research Articles: Systematic Reviews

    Is it a systematic review of the right type of studies which are relevant to your question? Does the methods section describe how all the relevant trials were found and assessed? The paper should give a comprehensive account of the sources consulted in the search for relevant papers, the search strategy used to find them, and the quality and relevance criteria used to decide whether to include ...

  18. Steps of a Systematic Review

    PRISMA guidelines suggest including a critical appraisal of the included studies to assess the risk of bias, and including that assessment in your final manuscript.

  19. Critical Appraisal

    Critical appraisal is the careful analysis of studies to determine their relative value. The Institute of Medicine's Standards for Systematic Reviews includes Standard 3.6: "Critically appraise each study: 3.6.1 Systematically assess the risk of bias, using predefined criteria. 3.6.2 Assess the relevance of the study's populations ...

  20. JBI Critical Appraisal Tools

    JBI's critical appraisal tools assist in assessing the trustworthiness, relevance and results of published papers. These tools have been revised. Recently published articles detail the revision. "Assessing the risk of bias of quantitative analytical studies: introducing the vision for critical appraisal within JBI systematic reviews"

  21. Systematic reviews

    Critical appraisal involves checking the quality, reliability and relevance of the studies in the review in relation to the review question. It appraises each study in terms of aspects such as: Is the study relevant to the research question? Is the study valid?

  22. Systematic Reviews

    Critical appraisal, simply put, is the process of systematically looking at research papers to assess three important things: trustworthiness, value and relevance. When critically appraising a research paper, the first step is to examine the study for any bias.

  23. Assess for Quality and Bias of Studies

    Per the Cochrane Handbook: "Methodological quality refers to critical appraisal of a study or systematic review and the extent to which study authors conducted and reported their research to the highest possible standard. Bias refers to systematic deviation of results or inferences from the truth."

  24. Critical Appraisal of Papers: Systematic and Narrative Reviews (Nursing Essay)

    The intention of this essay is to critically appraise one systematic review paper and one narrative review paper using the CASP tool, and then compare and contrast the differences between the two reviews. The assignment starts by illustrating the types of literature review (systematic and narrative) and the strengths and ...

  25. A critical appraisal of"effects of exercise therapy on disability

    We used the RoB 2 tool to assess the quality of the included studies, because it is widely used in randomized controlled trials. Based on articles by Sterne et al. in BMJ and Higgins et al. in the Cochrane Handbook for Systematic Reviews of Interventions, we believe RoB 2 is suitable for this study. In future studies, we will consider using more appropriate assessment tools for non- ... (a minimal sketch of RoB 2's overall judgement logic appears after this list).

  26. The importance of systematic reviews

    The extensive and comprehensive systematic review and meta-analysis of Shool et al. (2024) complements the work of Liu et al. (2008), the only systematic review on motorcycle helmet use and injury outcomes in the Cochrane database. Shool and collaborators aimed to identify the underlying causes of the variation in helmet usage ...

  27. A systematic review and critical appraisal of menopause guidelines

    Hemachandra, C., Taylor, S., Islam, R. M., Fooladi, E., & Davis, S. R. A systematic review and critical appraisal of menopause guidelines. Funding: Australian National Health and Medical Research Council (NHMRC), Grant 2015514.

  28. The Ambiguous Correlation of Blautia with Obesity: A Systematic Review

    A systematic review by Warren Chanda and colleagues. See also: Jan M. Sargeant, Critical Appraisal of Studies Using Laboratory Animal Models, ILAR Journal, Volume 55, Issue 3, 2014, Pages 405 ...

  29. Pesticide safety practice and its public health risk in African regions

    The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) protocol was followed. The Scopus, PubMed, Web of Science, Google Scholar, DOAJ, and National Repository databases were searched between November 12, 2023, and January 2, 2024.

  30. Amygdala fMRI—A Critical Appraisal of the Extant Literature

    T. Varkevisser, E. Geuze, J. van Honk. Neuroscience Insights, 13 August 2024. DOI: 10.1177/26331055241270591. Even before the advent of fMRI, the amygdala occupied a central space in the affective neurosciences. Yet this amygdala-centred view on emotion processing gained even wider acceptance after the inception of fMRI in the early 1990s, a landmark that triggered a goldrush of fMRI studies targeting the amygdala in vivo. Initially, this amygdala fMRI research was mostly confined to task-activation ...
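The AMSTAR 2 rating logic referenced in items 6 and 12 is mechanical enough to sketch in code. The Python fragment below is a minimal illustration, assuming the published guidance (Shea et al., BMJ 2017): sixteen items are appraised, seven of which are treated as critical domains by default, and the overall confidence rating follows from the counts of critical flaws and non-critical weaknesses. The function name and the set-based encoding are illustrative conveniences, not part of the official tool.

```python
# Minimal sketch of AMSTAR 2's overall confidence rating
# (after Shea et al., BMJ 2017). Names are illustrative; the
# official tool is a guided checklist, not software.

# The seven items treated as critical domains by default:
# protocol (2), search (4), exclusions justified (7), risk of
# bias assessed (9), meta-analysis methods (11), RoB considered
# in interpretation (13), publication bias (15).
CRITICAL_ITEMS = {2, 4, 7, 9, 11, 13, 15}

def overall_confidence(weak_items: set[int]) -> str:
    """Map the set of AMSTAR 2 items (1-16) judged as weaknesses
    to an overall confidence rating."""
    critical_flaws = len(weak_items & CRITICAL_ITEMS)
    non_critical = len(weak_items - CRITICAL_ITEMS)

    if critical_flaws > 1:
        return "critically low"  # more than one critical flaw
    if critical_flaws == 1:
        return "low"             # one critical flaw, +/- non-critical weaknesses
    if non_critical > 1:
        return "moderate"        # more than one non-critical weakness
    return "high"                # no or one non-critical weakness

# Example: no registered protocol (item 2) and funding sources
# of included studies not reported (item 10) -> one critical flaw.
print(overall_confidence({2, 10}))  # -> low
```

Note that reviewers may justifiably move items into or out of the critical set for a given review question, so `CRITICAL_ITEMS` should be read as a default, not a rule.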
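Similarly, the RoB 2 tool mentioned in item 25 maps five domain-level judgements to an overall risk-of-bias judgement. The sketch below follows the mapping described by Sterne et al. (BMJ 2019); the escalation of several "some concerns" judgements to "high" is a reviewer decision in the real tool, so the threshold used here is an explicit, illustrative assumption.

```python
# Minimal sketch of combining RoB 2 domain judgements into an
# overall judgement (after Sterne et al., BMJ 2019). The five
# domains are: randomization process, deviations from intended
# interventions, missing outcome data, outcome measurement, and
# selection of the reported result.
from typing import Iterable

LEVELS = ("low", "some concerns", "high")

def overall_rob2(domains: Iterable[str], escalate_threshold: int = 3) -> str:
    """Combine per-domain judgements into an overall judgement:
    "high" if any domain is high risk; "low" only if all domains
    are low risk; otherwise "some concerns". RoB 2 guidance notes
    that "some concerns" in several domains may substantially
    lower confidence; `escalate_threshold` models that as an
    illustrative, not official, cut-off."""
    judgements = list(domains)
    assert all(j in LEVELS for j in judgements), "unknown judgement"

    if "high" in judgements:
        return "high"
    concerns = judgements.count("some concerns")
    if concerns == 0:
        return "low"
    if concerns >= escalate_threshold:
        return "high"  # reviewer judgement call in the real tool
    return "some concerns"

# Example: one domain with some concerns, four low.
print(overall_rob2(["low", "some concerns", "low", "low", "low"]))
# -> some concerns
```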