
What Is Peer Review? | Types & Examples

Published on December 17, 2021 by Tegan George . Revised on June 22, 2023.

Peer review, sometimes referred to as refereeing, is the process of evaluating submissions to an academic journal. Using strict criteria, a panel of reviewers in the same subject area decides whether to accept each submission for publication.

Peer-reviewed articles are considered a highly credible source due to the stringent process they go through before publication.

There are various types of peer review. The main difference between them is to what extent the authors, reviewers, and editors know each other’s identities. The most common types are:

  • Single-blind review
  • Double-blind review
  • Triple-blind review
  • Collaborative review
  • Open review

Relatedly, peer assessment is a process where your peers provide you with feedback on something you’ve written, based on a set of criteria or benchmarks from an instructor. They then give constructive feedback, compliments, or guidance to help you improve your draft.

Table of contents

  • What is the purpose of peer review?
  • Types of peer review
  • The peer review process
  • Providing feedback to your peers
  • Peer review example
  • Advantages of peer review
  • Criticisms of peer review
  • Other interesting articles
  • Frequently asked questions about peer reviews

Many academic fields use peer review, largely to determine whether a manuscript is suitable for publication. Peer review enhances the credibility of the manuscript. For this reason, academic journals are among the most credible sources you can refer to.

However, peer review is also common in non-academic settings. The United Nations, the European Union, and many individual nations use peer review to evaluate grant applications. It is also widely used in medical and health-related fields as a teaching or quality-of-care measure.

Peer assessment is often used in the classroom as a pedagogical tool. Both receiving feedback and providing it are thought to enhance the learning process, helping students think critically and collaboratively.


Depending on the journal, there are several types of peer review.

Single-blind peer review

The most common type of peer review is single-blind (or single anonymized) review. Here, the names of the reviewers are not known to the author.

While this gives the reviewers the ability to give feedback without the possibility of interference from the author, there has been substantial criticism of this method in the last few years. Many argue that single-blind reviewing can lead to poaching or intellectual theft or that anonymized comments cause reviewers to be too harsh.

Double-blind peer review

In double-blind (or double anonymized) review, both the author and the reviewers are anonymous.

Arguments for double-blind review highlight that this mitigates any risk of prejudice on the side of the reviewer, while protecting the nature of the process. In theory, it also leads to manuscripts being published on merit rather than on the reputation of the author.

Triple-blind peer review

While triple-blind (or triple anonymized) review, where the identities of the author, reviewers, and editors are all anonymized, does exist, it is difficult to carry out in practice.

Proponents of adopting triple-blind review for journal submissions argue that it minimizes potential conflicts of interest and biases. However, ensuring anonymity is logistically challenging, and current editing software is not always able to fully anonymize everyone involved in the process.

In collaborative review, authors and reviewers interact with each other directly throughout the process. However, the identity of the reviewer is not known to the author. This gives all parties the opportunity to resolve any inconsistencies or contradictions in real time, and provides them a rich forum for discussion. It can mitigate the need for multiple rounds of editing and minimize back-and-forth.

Collaborative review can be time- and resource-intensive for the journal, however. For these collaborations to occur, there has to be a set system in place, often a technological platform, with staff monitoring and fixing any bugs or glitches.

Lastly, in open review, all parties know each other’s identities throughout the process. Often, open review can also include feedback from a larger audience, such as an online forum, or reviewer feedback included as part of the final published product.

While many argue that greater transparency prevents plagiarism or unnecessary harshness, there is also concern about the quality of future scholarship if reviewers feel they have to censor their comments.

In general, the peer review process includes the following steps:

  • First, the author submits the manuscript to the editor.
  • The editor then either rejects the manuscript and sends it back to the author, or sends it onward to the selected peer reviewer(s).
  • Next, the peer review process occurs. The reviewer provides feedback, addressing any major or minor issues with the manuscript, and gives their advice regarding what edits should be made.
  • Lastly, the edited manuscript is sent back to the author. They input the edits and resubmit it to the editor for publication.


In an effort to be transparent, many journals are now disclosing who reviewed each article in the published product. There are also increasing opportunities for collaboration and feedback, with some journals allowing open communication between reviewers and authors.

It can seem daunting at first to conduct a peer review or peer assessment. If you’re not sure where to start, there are several best practices you can use.

Summarize the argument in your own words

Summarizing the main argument helps the author see how their argument is interpreted by readers, and gives you a jumping-off point for providing feedback. If you’re having trouble doing this, it’s a sign that the argument needs to be clearer, more concise, or worded differently.

If the author sees that you’ve interpreted their argument differently than they intended, they have an opportunity to address any misunderstandings when they get the manuscript back.

Separate your feedback into major and minor issues

It can be challenging to keep feedback organized. One strategy is to start out with any major issues and then flow into the more minor points. It’s often helpful to keep your feedback in a numbered list, so the author has concrete points to refer back to.

Major issues typically consist of any problems with the style, flow, or key points of the manuscript. Minor issues include spelling errors, citation errors, or other smaller, easy-to-apply feedback.

Tip: Try not to focus too much on the minor issues. If the manuscript has a lot of typos, consider making a note that the author should address spelling and grammar issues, rather than going through and fixing each one.

The best feedback you can provide is anything that helps them strengthen their argument or resolve major stylistic issues.

Give the type of feedback that you would like to receive

No one likes being criticized, and it can be difficult to give honest feedback without sounding overly harsh or critical. One strategy you can use here is the “compliment sandwich,” where you “sandwich” your constructive criticism between two compliments.

Be sure you are giving concrete, actionable feedback that will help the author submit a successful final draft. While you shouldn’t tell them exactly what they should do, your feedback should help them resolve any issues they may have overlooked.

As a rule of thumb, your feedback should be:

  • Easy to understand
  • Constructive


Below is a brief annotated research example.

Influence of phone use on sleep

Studies show that teens from the US are getting less sleep than they were a decade ago (Johnson, 2019) . On average, teens only slept for 6 hours a night in 2021, compared to 8 hours a night in 2011. Johnson mentions several potential causes, such as increased anxiety, changed diets, and increased phone use.

The current study focuses on the effect phone use before bedtime has on the number of hours of sleep teens are getting.

For this study, a sample of 300 teens was recruited using social media, such as Facebook, Instagram, and Snapchat. The first week, all teens were allowed to use their phone the way they normally would, in order to obtain a baseline.

The sample was then divided into 3 groups:

  • Group 1 was not allowed to use their phone before bedtime.
  • Group 2 used their phone for 1 hour before bedtime.
  • Group 3 used their phone for 3 hours before bedtime.

All participants were asked to go to sleep around 10 p.m. to control for variation in bedtime. In the morning, their Fitbit showed the number of hours they’d slept. They kept track of these numbers themselves for 1 week.

Two independent t tests were used to compare Group 1 with Group 2, and Group 1 with Group 3. The first t test showed no significant difference (p > .05) between the number of hours slept for Group 1 (M = 7.8, SD = 0.6) and Group 2 (M = 7.0, SD = 0.8). The second t test showed a significant difference (p < .01) between the average number of hours for Group 1 (M = 7.8, SD = 0.6) and Group 3 (M = 6.1, SD = 1.5).

This shows that teens sleep fewer hours a night if they use their phone for over an hour before bedtime, compared to teens who use their phone for 0 to 1 hours.
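The group comparison above can be sketched in code. This is a minimal, hypothetical re-analysis using Python's standard library: the group means and SDs are taken from the example, but the raw data are simulated, so the resulting t value will not match the study's exactly.

```python
# Hypothetical re-analysis of the example study's Group 1 vs Group 3
# comparison. Group means and SDs come from the example text; the raw
# per-participant data are simulated, so results are illustrative only.
import math
import random
import statistics

random.seed(42)  # fixed seed for reproducibility

def simulate_group(mean, sd, n=100):
    """Simulate n participants' nightly sleep hours (normally distributed)."""
    return [random.gauss(mean, sd) for _ in range(n)]

# 300 teens split into 3 groups of 100, as in the example.
group1 = simulate_group(7.8, 0.6)  # no phone use before bedtime
group3 = simulate_group(6.1, 1.5)  # 3 hours of phone use before bedtime

def welch_t(a, b):
    """Welch's t statistic for two independent samples with unequal variances."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (mean_a - mean_b) / standard_error

t_stat = welch_t(group1, group3)
print(f"Group 1 vs Group 3: t = {t_stat:.2f}")
```

With 100 simulated participants per group, a mean difference of 1.7 hours relative to the pooled standard error yields a large t statistic, consistent with the significant difference reported for Group 1 versus Group 3.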

Peer review is an established and hallowed process in academia, dating back hundreds of years. It provides various fields of study with metrics, expectations, and guidance to ensure published work is consistent with predetermined standards.

  • Protects the quality of published research

Peer review can stop obviously problematic, falsified, or otherwise untrustworthy research from being published. Any content that raises red flags for reviewers can be closely examined in the review stage, preventing plagiarized or duplicated research from being published.

  • Gives you access to feedback from experts in your field

Peer review represents an excellent opportunity to get feedback from renowned experts in your field and to improve your writing through their feedback and guidance. Experts with knowledge about your subject matter can give you feedback on both style and content, and they may also suggest avenues for further research that you hadn’t yet considered.

  • Helps you identify any weaknesses in your argument

Peer review acts as a first defense, helping you ensure your argument is clear and that there are no gaps, vague terms, or unanswered questions for readers who weren’t involved in the research process. This way, you’ll end up with a more robust, more cohesive article.

While peer review is a widely accepted metric for credibility, it’s not without its drawbacks.

  • Reviewer bias

The double-blind system, which conceals author and reviewer identities, is not yet very common, and this can lead to bias in reviewing. A common criticism is that an excellent paper by a new researcher may be declined, while an objectively lower-quality submission by an established researcher would be accepted.

  • Delays in publication

The thoroughness of the peer review process can lead to significant delays in publishing time. Research that was current at the time of submission may not be as current by the time it’s published. There is also high risk of publication bias , where journals are more likely to publish studies with positive findings than studies with negative findings.

  • Risk of human error

By its very nature, peer review carries a risk of human error. In particular, falsification often cannot be detected, given that reviewers would have to replicate entire experiments to ensure the validity of results.

If you want to know more about statistics, methodology, or research bias, make sure to check out some of our other articles with explanations and examples.

  • Normal distribution
  • Measures of central tendency
  • Chi square tests
  • Confidence interval
  • Quartiles & Quantiles
  • Cluster sampling
  • Stratified sampling
  • Thematic analysis
  • Discourse analysis
  • Cohort study
  • Ethnography

Research bias

  • Implicit bias
  • Cognitive bias
  • Conformity bias
  • Hawthorne effect
  • Availability heuristic
  • Attrition bias
  • Social desirability bias

Peer review is a process of evaluating submissions to an academic journal. Utilizing rigorous criteria, a panel of reviewers in the same subject area decides whether to accept each submission for publication. For this reason, academic journals are often considered among the most credible sources you can use in a research project, provided that the journal itself is trustworthy and well-regarded.

In general, the peer review process follows these steps:

  • First, the author submits the manuscript to the editor.
  • The editor then either rejects the manuscript and sends it back to the author, or sends it onward to the selected peer reviewer(s).
  • Next, the peer review process occurs. The reviewer provides feedback, addressing any major or minor issues with the manuscript, and gives their advice regarding what edits should be made.
  • Lastly, the edited manuscript is sent back to the author. They input the edits and resubmit it to the editor for publication.

Peer review can stop obviously problematic, falsified, or otherwise untrustworthy research from being published. It also represents an excellent opportunity to get feedback from renowned experts in your field. It acts as a first defense, helping you ensure your argument is clear and that there are no gaps, vague terms, or unanswered questions for readers who weren’t involved in the research process.

Peer-reviewed articles are considered a highly credible source due to the stringent process they go through before publication.

Many academic fields use peer review, largely to determine whether a manuscript is suitable for publication. Peer review enhances the credibility of the published manuscript.

However, peer review is also common in non-academic settings. The United Nations, the European Union, and many individual nations use peer review to evaluate grant applications. It is also widely used in medical and health-related fields as a teaching or quality-of-care measure. 

A credible source should pass the CRAAP test and follow these guidelines:

  • The information should be up to date and current.
  • The author and publication should be a trusted authority on the subject you are researching.
  • The sources the author cited should be easy to find, clear, and unbiased.
  • For a web source, the URL and layout should signify that it is trustworthy.


Understanding Peer Review in Science

Peer Review Process

Peer review is an essential element of the scientific publishing process that helps ensure that research articles are evaluated, critiqued, and improved before release into the academic community. Take a look at the significance of peer review in scientific publications, the typical steps of the process, and how to approach peer review if you are asked to assess a manuscript.

What Is Peer Review?

Peer review is the evaluation of work by peers, who are people with comparable experience and competency. Peers assess each other’s work in educational settings, in professional settings, and in the publishing world. The goal of peer review is improving quality, defining and maintaining standards, and helping people learn from one another.

In the context of scientific publication, peer review helps editors determine which submissions merit publication and improves the quality of manuscripts prior to their final release.

Types of Peer Review for Manuscripts

There are three main types of peer review:

  • Single-blind review: The reviewers know the identities of the authors, but the authors do not know the identities of the reviewers.
  • Double-blind review: Both the authors and reviewers remain anonymous to each other.
  • Open peer review: The identities of both the authors and reviewers are disclosed, promoting transparency and collaboration.

There are advantages and disadvantages to each method. Anonymous reviews reduce bias but limit collaboration, while open reviews are more transparent but may increase the risk of bias.

Key Elements of Peer Review

Proper selection of a peer group improves the outcome of the process:

  • Expertise: Reviewers should possess adequate knowledge and experience in the relevant field to provide constructive feedback.
  • Objectivity: Reviewers assess the manuscript impartially and without personal bias.
  • Confidentiality: The peer review process maintains confidentiality to protect intellectual property and encourage honest feedback.
  • Timeliness: Reviewers provide feedback within a reasonable timeframe to ensure timely publication.

Steps of the Peer Review Process

The typical peer review process for scientific publications involves the following steps:

  • Submission: Authors submit their manuscript to a journal that aligns with their research topic.
  • Editorial assessment: The journal editor examines the manuscript and determines whether or not it is suitable for publication. If it is not, the manuscript is rejected.
  • Peer review: If it is suitable, the editor sends the article to peer reviewers who are experts in the relevant field.
  • Reviewer feedback: Reviewers provide feedback, critique, and suggestions for improvement.
  • Revision and resubmission: Authors address the feedback and make necessary revisions before resubmitting the manuscript.
  • Final decision: The editor makes a final decision on whether to accept or reject the manuscript based on the revised version and reviewer comments.
  • Publication: If accepted, the manuscript undergoes copyediting and formatting before being published in the journal.

Pros and Cons

While the goal of peer review is improving the quality of published research, the process isn’t without its drawbacks.

Advantages:

  • Quality assurance: Peer review helps ensure the quality and reliability of published research.
  • Error detection: The process identifies errors and flaws that the authors may have overlooked.
  • Credibility: The scientific community generally considers peer-reviewed articles to be more credible.
  • Professional development: Reviewers can learn from the work of others and enhance their own knowledge and understanding.

Disadvantages:

  • Time-consuming: The peer review process can be lengthy, delaying the publication of potentially valuable research.
  • Bias: Personal biases of reviewers impact their evaluation of the manuscript.
  • Inconsistency: Different reviewers may provide conflicting feedback, making it challenging for authors to address all concerns.
  • Limited effectiveness: Peer review does not always detect significant errors or misconduct.
  • Poaching: Some reviewers take an idea from a submission and gain publication before the authors of the original research.

Steps for Conducting Peer Review of an Article

Generally, an editor provides guidance when you are asked to provide peer review of a manuscript. Here are typical steps of the process.

  • Accept the right assignment: Accept invitations to review articles that align with your area of expertise to ensure you can provide well-informed feedback.
  • Manage your time: Allocate sufficient time to thoroughly read and evaluate the manuscript, while adhering to the journal’s deadline for providing feedback.
  • Read the manuscript multiple times: First, read the manuscript for an overall understanding of the research. Then, read it more closely to assess the details, methodology, results, and conclusions.
  • Evaluate the structure and organization: Check if the manuscript follows the journal’s guidelines and is structured logically, with clear headings, subheadings, and a coherent flow of information.
  • Assess the quality of the research: Evaluate the research question, study design, methodology, data collection, analysis, and interpretation. Consider whether the methods are appropriate, the results are valid, and the conclusions are supported by the data.
  • Examine the originality and relevance: Determine if the research offers new insights, builds on existing knowledge, and is relevant to the field.
  • Check for clarity and consistency: Review the manuscript for clarity of writing, consistent terminology, and proper formatting of figures, tables, and references.
  • Identify ethical issues: Look for potential ethical concerns, such as plagiarism, data fabrication, or conflicts of interest.
  • Provide constructive feedback: Offer specific, actionable, and objective suggestions for improvement, highlighting both the strengths and weaknesses of the manuscript. Don’t be mean.
  • Organize your review: Structure your review with an overview of your evaluation, followed by detailed comments and suggestions organized by section (e.g., introduction, methods, results, discussion, and conclusion).
  • Be professional and respectful: Maintain a respectful tone in your feedback, avoiding personal criticism or derogatory language.
  • Proofread your review: Before submitting your review, proofread it for typos, grammar, and clarity.
Reumatologia. 2021; 59(1).

Peer review guidance: a primer for researchers

Olena Zimba

1 Department of Internal Medicine No. 2, Danylo Halytsky Lviv National Medical University, Lviv, Ukraine

Armen Yuri Gasparyan

2 Departments of Rheumatology and Research and Development, Dudley Group NHS Foundation Trust (Teaching Trust of the University of Birmingham, UK), Russells Hall Hospital, Dudley, West Midlands, UK

The peer review process is essential for quality checks and validation of journal submissions. Although it has some limitations, including manipulations and biased and unfair evaluations, there is no real alternative to the system. Several peer review models are now practised, with public review being the most appropriate in view of the open science movement. Constructive reviewer comments are increasingly recognised as scholarly contributions which should meet certain ethics and reporting standards. The Publons platform, which is now part of the Web of Science Group (Clarivate Analytics), credits validated reviewer accomplishments and serves as an instrument for selecting and promoting the best reviewers. All authors with relevant profiles may act as reviewers. Adherence to research reporting standards and access to bibliographic databases are recommended to help reviewers draft evidence-based and detailed comments.

Introduction

The peer review process is essential for evaluating the quality of scholarly works, suggesting corrections, and learning from other authors’ mistakes. The principles of peer review are largely based on professionalism, eloquence, and collegiate attitude. As such, reviewing journal submissions is a privilege and responsibility for ‘elite’ research fellows who contribute to their professional societies and add value by voluntarily sharing their knowledge and experience.

Since the launch of the first academic periodicals back in 1665, peer review has been mandatory for validating scientific facts, selecting influential works, and minimizing the chances of publishing erroneous research reports [1]. Over the past centuries, peer review models have evolved from single-handed editorial evaluations to collegial discussions, with numerous strengths and inevitable limitations of each practised model [2, 3]. With the multiplication of periodicals and editorial management platforms, the reviewer pool has expanded and internationalized. Various sets of rules have been proposed to select skilled reviewers and employ globally acceptable tools and language styles [4, 5].

In the era of digitization, the ethical dimension of peer review has emerged, necessitating the involvement of peers with a full understanding of research and publication ethics to exclude unethical articles from the pool of evidence-based research and reviews [6]. In the time of the COVID-19 pandemic, some, if not most, journals face the unavailability of skilled reviewers, resulting in an unprecedented increase of articles without a history of peer review or those with surprisingly short evaluation timelines [7].

Editorial recommendations and the best reviewers

Guidance on peer review and selection of reviewers is currently available in the recommendations of global editorial associations, which can be consulted by journal editors for updating their ethics statements and by research managers for crediting the evaluators. The International Committee of Medical Journal Editors (ICMJE) qualifies peer review as a continuation of the scientific process that should involve experts who respond to reviewer invitations in a timely manner, submit unbiased and constructive comments, and keep confidentiality [8].

The reviewer roles and responsibilities are listed in the updated recommendations of the Council of Science Editors (CSE) [9], where ethical conduct is viewed as a premise of quality evaluations. The Committee on Publication Ethics (COPE) further emphasizes editorial strategies that ensure transparent and unbiased reviewer evaluations by trained professionals [10]. Finally, the World Association of Medical Editors (WAME) prioritizes selecting the best reviewers with validated profiles to avoid substandard or fraudulent reviewer comments [11]. Accordingly, the Sarajevo Declaration on Integrity and Visibility of Scholarly Publications encourages reviewers to register with the Open Researcher and Contributor ID (ORCID) platform to validate and publicize their scholarly activities [12].

Although the best reviewer criteria are not listed in the editorial recommendations, it is apparent that the manuscript evaluators should be active researchers with extensive experience in the subject matter and an impressive list of relevant and recent publications [13]. All authors embarking on an academic career and publishing articles with active contact details can be involved in the evaluation of others’ scholarly works [14]. Ideally, the reviewers should be peers of the manuscript authors with equal scholarly ranks and credentials.

However, journal editors may employ schemes that engage junior research fellows as co-reviewers along with their mentors and senior fellows [15]. Such a scheme is successfully practised within the framework of the Emerging EULAR (European League Against Rheumatism) Network (EMEUNET), where seasoned authors (mentors) train early-career researchers (mentees) in how to evaluate submissions to the top rheumatology journals, and the best evaluators are selected as regular contributors to these journals [16].

Awareness of the EQUATOR Network reporting standards may help reviewers evaluate methodology and suggest related revisions. Statistical skills help reviewers detect basic mistakes and suggest additional analyses. For example, scanning data presentation and revealing mistakes in the presentation of means and standard deviations often prompts re-analyses of distributions and replacement of parametric tests with non-parametric ones [17, 18].

Constructive reviewer comments

The main goal of the peer review is to support authors in their attempt to publish ethically sound and professionally validated works that may attract readers’ attention and positively influence healthcare research and practice. As such, an optimal reviewer comment has to comprehensively examine all parts of the research and review work (Table I). The best reviewers are viewed as contributors who guide authors on how to correct mistakes, discuss study limitations, and highlight its strengths [19].

Structure of a reviewer comment to be forwarded to authors

  • Introductory line: Summarizes the overall impression about the manuscript validity and implications.
  • Evaluation of the title, abstract and keywords: Evaluates the title correctness and completeness, inclusion of all relevant keywords, study design terms, information load, and relevance of the abstract.
  • Major comments: Specifically analyses each manuscript part in line with available research reporting standards; supports all suggestions with solid evidence; weighs novelty of hypotheses and methodological rigour; highlights the choice of study design; points to missing/incomplete ethics approval statements, rights to re-use graphics, accuracy and completeness of statistical analyses, professionalism of bibliographic searches, and inclusion of updated and relevant references.
  • Minor comments: Identifies language mistakes, typos, inappropriate format of graphics and references, length of texts and tables, use of supplementary material, unusual sections and order, completeness of scholarly contribution, conflict of interest, and funding statements.
  • Concluding remarks: Reflects on take-home messages and implications.

Some of the currently practised review models are well positioned to help authors reveal and correct their mistakes at pre- or post-publication stages (Table II). The global move toward open science is particularly instrumental in increasing the quality and transparency of reviewer contributions.

Advantages and disadvantages of common manuscript evaluation models

In-house (internal) editorial review
  Advantages: Allows detection of major flaws and errors that justify outright rejections; rarely, outstanding manuscripts are accepted without delays.
  Disadvantages: Journal staff evaluations may be biased; manuscript acceptance without external review may raise concerns of soft quality checks.

Single-blind peer review
  Advantages: Masking reviewer identity prevents personal conflicts in small (closed) professional communities.
  Disadvantages: Reviewer access to author profiles may result in biased and subjective evaluations.

Double-blind peer review
  Advantages: Concealing author and reviewer identities prevents biased evaluations, particularly in small communities.
  Disadvantages: Masking all identifying information is technically burdensome and not always possible.

Open (public) peer review
  Advantages: May increase the quality, objectivity, and accountability of reviewer evaluations; it is now part of open science culture.
  Disadvantages: Peers who do not wish to disclose their identity may decline reviewer invitations.

Post-publication open peer review
  Advantages: May accelerate dissemination of influential reports in line with the concept “publish first, judge later”, which is practised by some open-access journals (e.g., F1000 Research).
  Disadvantages: Not all manuscripts benefit from open dissemination without peers’ input; post-publication review may delay detection of minor or major mistakes.

Post-publication social media commenting
  Advantages: May reveal some mistakes and misconduct and improve public perception of article implications.
  Disadvantages: Not all communities use social media for commenting and other academic purposes.

Since there are no universally accepted criteria for selecting reviewers and structuring their comments, the instructions of every peer-reviewed journal should specify priorities, models, and expected review outcomes [ 20 ]. Monitoring and reporting average peer review timelines is also required to encourage timely evaluations and avoid delays. Depending on journal policies and article types, the first round of peer review may last from a few days to a few weeks. Fast-track review (up to 3 days) is practised by some top journals which process clinical trial reports and other priority items.

In exceptional cases, reviewer contributions may result in substantive changes, appreciated by authors in the official acknowledgments. In most cases, however, reviewers should avoid engaging in the authors’ research and writing. They should refrain from instructing the authors on additional tests and data collection as these may delay publication of original submissions with conclusive results.

Established publishers often employ advanced editorial management systems that support reviewers by providing instantaneous access to the review instructions, online structured forms, and some bibliographic databases. Such support enables drafting of evidence-based comments that examine the novelty, ethical soundness, and implications of the reviewed manuscripts [ 21 ].

Encouraging reviewers to submit their recommendations on manuscript acceptance/rejection and related editorial tasks is now a common practice. Skilled reviewers may prompt the editors to reject or transfer manuscripts which fall outside the journal scope, perform additional ethics checks, and minimize chances of publishing erroneous and unethical articles. They may also raise concerns over the editorial strategies in their comments to the editors.

Since reviewer and editor roles are distinct, reviewer recommendations are aimed at helping editors, but not at replacing their decision-making functions. The final decisions rest with handling editors. Handling editors weigh not only reviewer comments, but also priorities related to article types and geographic origins, space limitations in certain periods, and envisaged influence in terms of social media attention and citations. This is why rejections of even flawless manuscripts are likely at early rounds of internal and external evaluations across most peer-reviewed journals.

Reviewers are often requested to comment on language correctness and overall readability of the evaluated manuscripts. Given the wide availability of in-house and external editing services, reviewer comments on language mistakes and typos are categorized as minor. At the same time, non-Anglophone experts’ poor language skills often exclude them from contributing to the peer review in most influential journals [ 22 ]. Comments should be properly edited to convey messages in positive or neutral tones, express ideas of varying degrees of certainty, and present logical order of words, sentences, and paragraphs [ 23 , 24 ]. Consulting linguists on communication culture, passing advanced language courses, and honing commenting skills may increase the overall quality and appeal of the reviewer accomplishments [ 5 , 25 ].

Peer reviewer credits

Various crediting mechanisms have been proposed to motivate reviewers and maintain the integrity of science communication [ 26 ]. Annual reviewer acknowledgments are widely practised for naming manuscript evaluators and appreciating their scholarly contributions. Given the need to weigh reviewer contributions, some journal editors distinguish ‘elite’ reviewers with numerous evaluations and award those with timely and outstanding accomplishments [ 27 ]. Such targeted recognition ensures ethical soundness of the peer review and facilitates promotion of the best candidates for grant funding and academic job appointments [ 28 ].

Also, large publishers and learned societies issue certificates of excellence in reviewing which may include Continuing Professional Development (CPD) points [ 29 ]. Finally, an entirely new crediting mechanism is proposed to award bonus points to active reviewers who may collect, transfer, and use these points to discount gold open-access charges within the publisher consortia [ 30 ].

With the launch of Publons ( http://publons.com/ ) and its integration with Web of Science Group (Clarivate Analytics), reviewer recognition has become a matter of scientific prestige. Reviewers can now freely open their Publons accounts and record their contributions to online journals with Digital Object Identifiers (DOI). Journal editors, in turn, may generate official reviewer acknowledgments and encourage reviewers to forward them to Publons for building up individual reviewer and journal profiles. All published articles maintain e-links to their review records and post-publication promotion on social media, allowing the reviewers to continuously track expert evaluations and comments. A paid-up partnership is also available to journals and publishers for automatically transferring peer-review records to Publons upon mutually acceptable arrangements.

Listing reviewer accomplishments on an individual Publons profile showcases scholarly contributions of the account holder. The reviewer accomplishments placed next to the account holders’ own articles and editorial accomplishments point to the diversity of scholarly contributions. Researchers may establish links between their Publons and ORCID accounts to further benefit from complementary services of both platforms. Publons Academy ( https://publons.com/community/academy/ ) additionally offers an online training course to novice researchers who may improve their reviewing skills under the guidance of experienced mentors and journal editors. Finally, journal editors may conduct searches through the Publons platform to select the best reviewers across academic disciplines.

Peer review ethics

Prior to accepting reviewer invitations, scholars need to weigh a number of factors which may compromise their evaluations. First of all, they should accept reviewer invitations only if they are able to submit their comments in a timely manner. Peer review timelines depend on article type and vary widely across journals. The rules of transparent publishing necessitate recording manuscript submission and acceptance dates in article footnotes to inform readers of the evaluation speed and to help investigators in the event of multiple unethical submissions. Timely reviewer accomplishments often enable fast publication of valuable works with positive implications for healthcare. Unjustifiably long peer review, on the contrary, delays dissemination of influential reports and can result in ethical misconduct, such as plagiarism of a manuscript under evaluation [ 31 ].

In times of proliferation of open-access journals relying on article processing charges, an unjustifiably short review may point to the absence of quality evaluation and an apparently ‘predatory’ publishing practice [ 32 , 33 ]. When choosing their target journals, authors should take the peer review strategy and associated timelines into account to avoid substandard periodicals.

Reviewer primary interests (unbiased evaluation of manuscripts) may come into conflict with secondary interests (promotion of their own scholarly works), necessitating disclosures by filling in related parts in the online reviewer window or uploading the ICMJE conflict of interest forms. Biomedical reviewers, who are directly or indirectly supported by the pharmaceutical industry, may encounter conflicts while evaluating drug research. Such instances require explicit disclosures of conflicts and/or rejections of reviewer invitations.

Journal editors are obliged to employ mechanisms for disclosing reviewer financial and non-financial conflicts of interest to avoid processing of biased comments [ 34 ]. They should also cautiously process negative comments that oppose dissenting, but still valid, scientific ideas [ 35 ]. Reviewer conflicts that stem from academic activities in a competitive environment may introduce biases, resulting in unfair rejections of manuscripts with opposing concepts, results, and interpretations. The same academic conflicts may lead to coercive reviewer self-citations, forcing authors to incorporate suggested reviewer references or face negative feedback and an unjustified rejection [ 36 ]. Notably, several publisher investigations have demonstrated a global scale of such misconduct, involving some highly cited researchers and top scientific journals [ 37 ].

Fake peer review, an extreme example of conflict of interest, is another form of misconduct that has surfaced in the time of mass proliferation of gold open-access journals and publication of articles without quality checks [ 38 ]. Fake reviews are generated by manipulative authors and commercial editing agencies with full access to their own manuscripts and peer review evaluations in the journal editorial management systems. The sole aim of these reviews is to subvert the manuscript evaluation process and to pave the way for publication of pseudoscientific articles. Authors of these articles are often supported by funds intended for the growth of science in non-Anglophone countries [ 39 ]. Iranian and Chinese authors are often caught submitting fake reviews, resulting in mass retractions by large publishers [ 38 ]. Several suggestions have been made to overcome this issue, with assigning independent reviewers and requesting their ORCID IDs viewed as the most practical options [ 40 ].

Conclusions

The peer review process is regulated by publishers and editors, enforcing updated global editorial recommendations. Selecting the best reviewers and providing authors with constructive comments may improve the quality of published articles. Reviewers are selected in view of their professional backgrounds and skills in research reporting, statistics, ethics, and language. Quality reviewer comments attract superior submissions and add to the journal’s scientific prestige [ 41 ].

In the era of digitization and open science, various online tools and platforms are available to upgrade the peer review and credit experts for their scholarly contributions. With its links to the ORCID platform and social media channels, Publons now offers the optimal model for crediting and keeping track of the best and most active reviewers. Publons Academy additionally offers online training for novice researchers who may benefit from the experience of their mentoring editors. Overall, reviewer training in how to evaluate journal submissions and avoid related misconduct is an important process, which some indexed journals are experimenting with [ 42 ].

The timelines and rigour of the peer review may change during the current pandemic. However, journal editors should mobilize their resources to avoid publication of unchecked and misleading reports. Additional efforts are required to monitor published contents and encourage readers to post their comments on publishers’ online platforms (blogs) and other social media channels [ 43 , 44 ].

The authors declare no conflict of interest.

NEWS Q&A
01 September 2022

The researchers using AI to analyse peer review

Richard Van Noorden


Do more-highly cited journals have higher-quality peer review? Reviews are generally confidential and the definition of ‘quality’ is elusive, so this is a difficult question to answer. But researchers who used machine learning to study 10,000 peer-review reports in biomedical journals have tried. They invented proxy measures for quality, which they term thoroughness and helpfulness.


Nature 609 , 455 (2022)

doi: https://doi.org/10.1038/d41586-022-02787-5

This interview has been edited for length and clarity.

Severin, A. et al. Preprint at https://arxiv.org/abs/2207.09821 (2022).

van Rooyen, S., Black, N. & Godlee, F. J. Clin. Epidemiol. 52 , 625–629 (1999).


Superchi, C. et al. BMJ Open 10 , e035604 (2020).

Buljan, I., Garcia-Costa, D., Grimaldo, F., Squazzoni, F. & Marušić, A. eLife 9 , e53249 (2020).

Squazzoni, F. et al. Sci. Adv. 7 , eabd0299 (2021).

Eve, M. P. et al. Reading Peer Review (Cambridge Univ. Press, 2021).




What Is Peer Review? | Types & Examples

Published on 6 May 2022 by Tegan George. Revised on 2 September 2022.


Many academic fields use peer review, largely to determine whether a manuscript is suitable for publication. Peer review enhances the credibility of the manuscript. For this reason, academic journals are among the most credible sources you can refer to.

However, peer review is also common in non-academic settings. The United Nations, the European Union, and many individual nations use peer review to evaluate grant applications. It is also widely used in medical and health-related fields as a teaching or quality-of-care measure.

Peer assessment is often used in the classroom as a pedagogical tool. Both receiving feedback and providing it are thought to enhance the learning process, helping students think critically and collaboratively.


Depending on the journal, there are several types of peer review.

Single-blind peer review

The most common type of peer review is single-blind (or single anonymised) review . Here, the names of the reviewers are not known by the author.

While this gives the reviewers the ability to give feedback without the possibility of interference from the author, there has been substantial criticism of this method in the last few years. Many argue that single-blind reviewing can lead to poaching or intellectual theft or that anonymised comments cause reviewers to be too harsh.

Double-blind peer review

In double-blind (or double anonymised) review , both the author and the reviewers are anonymous.

Arguments for double-blind review highlight that this mitigates any risk of prejudice on the side of the reviewer, while protecting the nature of the process. In theory, it also leads to manuscripts being published on merit rather than on the reputation of the author.

Triple-blind peer review

While triple-blind (or triple anonymised) review – where the identities of the author, reviewers, and editors are all anonymised – does exist, it is difficult to carry out in practice.

Proponents of adopting triple-blind review for journal submissions argue that it minimises potential conflicts of interest and biases. However, ensuring anonymity is logistically challenging, and current editing software is not always able to fully anonymise everyone involved in the process.

In collaborative review , authors and reviewers interact with each other directly throughout the process. However, the identity of the reviewer is not known to the author. This gives all parties the opportunity to resolve any inconsistencies or contradictions in real time, and provides them a rich forum for discussion. It can mitigate the need for multiple rounds of editing and minimise back-and-forth.

Collaborative review can be time- and resource-intensive for the journal, however. For these collaborations to occur, there has to be a set system in place, often a technological platform, with staff monitoring and fixing any bugs or glitches.

Lastly, in open review , all parties know each other’s identities throughout the process. Often, open review can also include feedback from a larger audience, such as an online forum, or reviewer feedback included as part of the final published product.

While many argue that greater transparency prevents plagiarism or unnecessary harshness, there is also concern about the quality of future scholarship if reviewers feel they have to censor their comments.

In general, the peer review process includes the following steps:

  • First, the author submits the manuscript to the editor.
  • The editor then either:
    • Rejects the manuscript and sends it back to the author, or
    • Sends it onward to the selected peer reviewer(s).
  • Next, the peer review process occurs. The reviewer provides feedback, addressing any major or minor issues with the manuscript, and gives their advice regarding what edits should be made.
  • Lastly, the edited manuscript is sent back to the author. They input the edits and resubmit it to the editor for publication.

The peer review process

In an effort to be transparent, many journals are now disclosing who reviewed each article in the published product. There are also increasing opportunities for collaboration and feedback, with some journals allowing open communication between reviewers and authors.

It can seem daunting at first to conduct a peer review or peer assessment. If you’re not sure where to start, there are several best practices you can use.

Summarise the argument in your own words

Summarising the main argument helps the author see how their argument is interpreted by readers, and gives you a jumping-off point for providing feedback. If you’re having trouble doing this, it’s a sign that the argument needs to be clearer, more concise, or worded differently.

If the author sees that you’ve interpreted their argument differently than they intended, they have an opportunity to address any misunderstandings when they get the manuscript back.

Separate your feedback into major and minor issues

It can be challenging to keep feedback organised. One strategy is to start out with any major issues and then flow into the more minor points. It’s often helpful to keep your feedback in a numbered list, so the author has concrete points to refer back to.

Major issues typically consist of any problems with the style, flow, or key points of the manuscript. Minor issues include spelling errors, citation errors, or other smaller, easy-to-apply feedback.

The best feedback you can provide is anything that helps them strengthen their argument or resolve major stylistic issues.

Give the type of feedback that you would like to receive

No one likes being criticised, and it can be difficult to give honest feedback without sounding overly harsh or critical. One strategy you can use here is the ‘compliment sandwich’, where you ‘sandwich’ your constructive criticism between two compliments.

Be sure you are giving concrete, actionable feedback that will help the author submit a successful final draft. While you shouldn’t tell them exactly what they should do, your feedback should help them resolve any issues they may have overlooked.

As a rule of thumb, your feedback should be:

  • Easy to understand
  • Constructive

Below is a brief annotated research example.

Influence of phone use on sleep

Studies show that teens from the US are getting less sleep than they were a decade ago (Johnson, 2019). On average, teens only slept for 6 hours a night in 2021, compared to 8 hours a night in 2011. Johnson mentions several potential causes, such as increased anxiety, changed diets, and increased phone use.

The current study focuses on the effect phone use before bedtime has on the number of hours of sleep teens are getting.

For this study, a sample of 300 teens was recruited using social media, such as Facebook, Instagram, and Snapchat. The first week, all teens were allowed to use their phone the way they normally would, in order to obtain a baseline.

The sample was then divided into 3 groups:

  • Group 1 was not allowed to use their phone before bedtime.
  • Group 2 used their phone for 1 hour before bedtime.
  • Group 3 used their phone for 3 hours before bedtime.

All participants were asked to go to sleep around 10 p.m. to control for variation in bedtime. In the morning, their Fitbit showed the number of hours they’d slept. They kept track of these numbers themselves for 1 week.

Two independent t tests were used to compare Group 1 with Group 2, and Group 1 with Group 3. The first t test showed no significant difference (p > .05) between the number of hours for Group 1 (M = 7.8, SD = 0.6) and Group 2 (M = 7.0, SD = 0.8). The second t test showed a significant difference (p < .01) between the average number of hours for Group 1 (M = 7.8, SD = 0.6) and Group 3 (M = 6.1, SD = 1.5).

This shows that teens sleep fewer hours a night if they use their phone for over an hour before bedtime, compared to teens who use their phone for 0 to 1 hours.
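For readers who want to see the mechanics behind such a comparison, a t statistic can be recomputed from summary statistics alone. The sketch below uses Welch's formula and assumes equal arms of 100 participants each (the example reports only a total sample of 300 split into 3 groups); the resulting statistics and p-values depend entirely on these assumed group sizes, so this does not reproduce the example's reported p-values.

```python
import math

def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic for two independent groups,
    computed from means, standard deviations, and group sizes."""
    return (m1 - m2) / math.sqrt(s1**2 / n1 + s2**2 / n2)

# Group 1: M = 7.8, SD = 0.6; Group 2: M = 7.0, SD = 0.8; Group 3: M = 6.1, SD = 1.5
t_1_vs_2 = welch_t(7.8, 0.6, 100, 7.0, 0.8, 100)
t_1_vs_3 = welch_t(7.8, 0.6, 100, 6.1, 1.5, 100)
print(t_1_vs_2, t_1_vs_3)  # the Group 1 vs Group 3 contrast is the larger of the two
```

In practice one would use a statistics library that also returns degrees of freedom and a p-value; the point here is only that the larger mean difference between Groups 1 and 3, despite the larger SD, yields the stronger contrast.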

Peer review is an established and hallowed process in academia, dating back hundreds of years. It provides various fields of study with metrics, expectations, and guidance to ensure published work is consistent with predetermined standards.

  • Protects the quality of published research

Peer review can stop obviously problematic, falsified, or otherwise untrustworthy research from being published. Any content that raises red flags for reviewers can be closely examined in the review stage, preventing plagiarised or duplicated research from being published.

  • Gives you access to feedback from experts in your field

Peer review represents an excellent opportunity to get feedback from renowned experts in your field and to improve your writing through their feedback and guidance. Experts with knowledge about your subject matter can give you feedback on both style and content, and they may also suggest avenues for further research that you hadn’t yet considered.

  • Helps you identify any weaknesses in your argument

Peer review acts as a first defence, helping you ensure your argument is clear and that there are no gaps, vague terms, or unanswered questions for readers who weren’t involved in the research process. This way, you’ll end up with a more robust, more cohesive article.

While peer review is a widely accepted metric for credibility, it’s not without its drawbacks.

  • Reviewer bias

The double-blind system, which conceals identities on both sides, is not yet very common, which can lead to bias in reviewing. A common criticism is that an excellent paper by a new researcher may be declined, while an objectively lower-quality submission by an established researcher would be accepted.

  • Delays in publication

The thoroughness of the peer review process can lead to significant delays in publishing time. Research that was current at the time of submission may not be as current by the time it’s published.

  • Risk of human error

By its very nature, peer review carries a risk of human error. In particular, falsification often cannot be detected, given that reviewers would have to replicate entire experiments to ensure the validity of results.


Cite this Scribbr article


George, T. (2022, September 02). What Is Peer Review? | Types & Examples. Scribbr. Retrieved 21 August 2024, from https://www.scribbr.co.uk/research-methods/peer-reviews/


  • Research article
  • Open access
  • Published: 06 March 2019

Tools used to assess the quality of peer review reports: a methodological systematic review

  • Cecilia Superchi   ORCID: orcid.org/0000-0002-5375-6018 1 , 2 , 3 ,
  • José Antonio González 1 ,
  • Ivan Solà 4 , 5 ,
  • Erik Cobo 1 ,
  • Darko Hren 6 &
  • Isabelle Boutron 7  

BMC Medical Research Methodology volume  19 , Article number:  48 ( 2019 ) Cite this article


A strong need exists for a validated tool that clearly defines peer review report quality in biomedical research, as it will allow evaluating interventions aimed at improving the peer review process in well-performed trials. We aim to identify and describe existing tools for assessing the quality of peer review reports in biomedical research.

We conducted a methodological systematic review by searching PubMed, EMBASE (via Ovid) and The Cochrane Methodology Register (via The Cochrane Library) as well as Google® for all reports in English describing a tool for assessing the quality of a peer review report in biomedical research. Data extraction was performed in duplicate using a standardized data extraction form. We extracted information on the structure, development and validation of each tool. We also identified quality components across tools using a systematic multi-step approach and we investigated quality domain similarities among tools by performing hierarchical, complete-linkage clustering analysis.

We identified a total of 24 tools: 23 scales and 1 checklist. Six tools consisted of a single item and 18 had several items, ranging from 4 to 26. None of the tools reported a definition of ‘quality’. Only 1 tool described the scale development and 10 provided measures of validity and reliability. Five tools were used as an outcome in a randomized controlled trial (RCT). Moreover, we classified the quality components of the 18 tools with more than one item into 9 main quality domains and 11 subdomains. The tools contained from two to seven quality domains. Some domains and subdomains were considered in most tools, such as the detailed/thorough (11/18) nature of reviewer’s comments. Others were rarely considered, such as whether or not the reviewer made comments on the statistical methods (1/18).

Several tools are available to assess the quality of peer review reports; however, the development and validation process is questionable and the concepts evaluated by these tools vary widely. The results from this study and from further investigations will inform the development of a new tool for assessing the quality of peer review reports in biomedical research.


The use of editorial peer review originates in the eighteenth century [ 1 ]. It is a longstanding and established process that generally aims to provide a fair decision-making mechanism and improve the quality of a submitted manuscript [ 2 ]. Despite the long history and application of the peer review system, its efficacy is still a matter of controversy [ 3 , 4 , 5 , 6 , 7 ]. About 30 years after the first international Peer Review Congress, there are still ‘scarcely any bars to eventual publication. There seems to be no study too fragmented, no hypothesis too trivial [...] for a paper to end up in print’ (Drummond Rennie, chair of the advisory board) [ 8 ].

Recent evidence suggests that many current editors and peer reviewers in biomedical journals still lack the appropriate competencies [ 9 ]. In particular, it has been shown that peer reviewers rarely receive formal training [ 3 ]. Moreover, their capacity to detect errors [ 10 , 11 ], identify deficiencies in reporting [ 12 ] and spin [ 13 ] has been found lacking.

Some systematic reviews have been performed to estimate the effect of interventions aimed at improving the peer review process [ 2 , 14 , 15 ]. These studies showed that there is still a lack of evidence supporting the use of interventions to improve the quality of the peer review process. Furthermore, Bruce and colleagues highlighted the urgent need to clarify outcomes, such as peer review report quality, that should be used in randomized controlled trials evaluating these interventions [ 15 ].

A validated tool that clearly defines peer review report quality in biomedical research is greatly needed. This will allow researchers to have a structured instrument to evaluate the impact of interventions aimed at improving the peer review process in well-performed trials. Such a tool could also be regularly used by editors to evaluate the work of reviewers.

Herein, as starting point for the development of a new tool, we identify and describe existing tools that assess the quality of peer review reports in biomedical research.

Study design

We conducted a methodological systematic review and followed the standard Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [ 16 ]. The quality of peer review reports is an outcome that, in the long term, is related to clinical relevance and patient care. However, the protocol was not registered in PROSPERO, as this review does not contain direct health-related outcomes [ 17 ].

Information sources and search strategy

We searched PubMed, EMBASE (via Ovid) and The Cochrane Methodology Register (via The Cochrane Library) from their inception to October 27, 2017 as well as Google® (search date: October 20, 2017) for all reports describing a tool to assess the quality of a peer review report in biomedical research. Search strategies were refined in collaboration with an expert methodologist (IS) and are presented in the Additional file  1 . We hand-searched the citation lists of included papers and consulted a senior editor with expertise in editorial policies and peer review processes to further identify relevant reports.

Eligibility criteria

We included all reports describing a tool to assess the quality of a peer review report. Sanderson and colleagues defined a tool as ‘any structured instrument aimed at aiding the user to assess the quality [...]’ [ 18 ]. Building on this definition, we defined a quality tool as any structured or unstructured instrument assisting the user to assess the quality of a peer review report (for definitions see Table  1 ). We restricted inclusion to reports in English.

Study selection

We exported the references retrieved from the search into the reference manager Endnote X7 (Clarivate Analytics, Philadelphia, United States), which was subsequently used to remove duplicates. We reviewed all records manually to verify and remove duplicates that had not been previously detected. A reviewer (CS) screened all titles and abstracts of the retrieved citations. A second reviewer (JAG) carried out quality control on a 25% random sample obtained using the statistical software R 3.3.3 [ 19 ]. We obtained and independently examined the full-text copies of potentially eligible reports for further assessment. In the case of disagreement, consensus was determined by a discussion or by involving a third reviewer (DH). We reported the result of this process through a PRISMA flowchart [ 16 ]. When several tools were reported in the same article, they were included as separate tools. When a tool was reported in more than one article, we extracted data from all related reports.
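The screening quality control described above (a second reviewer re-screening a 25% random sample) was performed with the statistical software R 3.3.3. Purely as an illustration of the sampling step — the function name, seed, and record identifiers below are hypothetical, not the authors' code — a minimal Python sketch:

```python
import random

def qc_sample(record_ids, fraction=0.25, seed=42):
    """Draw a reproducible random subset of screened records for
    independent quality control by a second reviewer."""
    rng = random.Random(seed)
    k = round(len(record_ids) * fraction)
    return sorted(rng.sample(record_ids, k))

# e.g. a 25% QC sample from the 4312 retrieved records
sample = qc_sample(list(range(1, 4313)), fraction=0.25)
print(len(sample))  # 1078
```

Fixing the seed makes the sample auditable: the second reviewer's subset can be regenerated exactly when resolving disagreements.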

Data extraction

General characteristics of tools.

We designed a data extraction form using Google® Docs and extracted the general characteristics of the tools. We determined whether each tool was a scale or a checklist. We defined a tool as a scale when it included a numeric or nominal overall quality score, and as a checklist when an overall quality score was not present. We recorded the total number of items (for definitions see Table 1 ). For scales with more than 1 item, we extracted how items were weighted, how the overall score was calculated, and the scoring range. Moreover, we checked whether the scoring instructions were adequately defined, partially defined, or not defined, according to the subjective judgement of two reviewers (CS and JAG) (an example of the definition for scoring instructions is shown in Table  2 ). Finally, we extracted all information related to the development and validation of each tool, the assessment of its reliability, and whether the concept of quality was defined.

Two reviewers (CS and JAG) piloted and refined the data extraction form on a random 5% sample of extracted articles. Full data extraction was conducted by two reviewers (CS and JAG) working independently for all included articles. In the case of disagreement, consensus was obtained by discussion or by involving a third reviewer (DH). Authors of the reports were contacted in cases where we needed further clarification of the tool.

Quality components of the peer review report considered in the tools

We followed the systematic multi-step approach recently described by Gentles [ 20 ], which is based on a constant comparative method of analysis developed within the Grounded Theory approach [ 21 ]. Initially, a researcher (CS) extracted all items included in the tools and for each item identified a ‘key concept’ representing a quality component of peer review reports. Next, two researchers (CS and DH) organized the key concepts into a domain-specific matrix (analogous to the topic-specific matrices described by Gentles). Initially, the matrix consisted of domains for peer review report quality, followed by items representative of each domain and references to literature sources that items were extracted from. As the analysis progressed, subdomains were created and the final version of the matrix included domains, subdomains, items and references.

Furthermore, we calculated the proportions of domains based on the number of items included in each domain for each tool. According to the proportions obtained, we created a domain profile for each tool. Then, we calculated the matrix of Euclidean distances between the domain profiles. These distances were used to perform the hierarchical, complete-linkage clustering analysis, which provided us with a tree structure that we represent in a chart. Through this graphical summary, we were able to identify domain similarities among the different tools, which helped us draw our analytical conclusions. The calculations and graphical representations were obtained using the statistical software R 3.3.3 [ 19 ].
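The authors performed these calculations in R 3.3.3. As a hedged illustration of the technique only — with made-up domain profiles, not the review's data — a naive complete-linkage agglomerative clustering over Euclidean distances can be sketched in Python:

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)

def complete_linkage(profiles, n_clusters):
    """Naive agglomerative clustering with complete linkage: repeatedly
    merge the two clusters whose *farthest* pair of profiles is closest."""
    clusters = [[i] for i in range(len(profiles))]
    link = lambda a, b: max(dist(profiles[i], profiles[j]) for i in a for j in b)
    while len(clusters) > n_clusters:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda p: link(clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] += clusters.pop(j)  # merge the closest pair
    return clusters

# Hypothetical domain profiles (proportions over three of the nine domains)
profiles = [(0.6, 0.2, 0.2), (0.5, 0.3, 0.2), (0.0, 1.0, 0.0), (0.1, 0.9, 0.0)]
print(complete_linkage(profiles, 2))  # [[0, 1], [2, 3]]
```

Tools with similar domain profiles end up in the same cluster, which is exactly the similarity structure the tree in Fig. 3 summarizes.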

Study selection and general characteristics of reports

The screening process is summarized in a flow diagram (Fig. 1 ). Of the 4312 records retrieved, we finally included 46 reports: 39 research articles, 3 editorials, 2 information guides, 1 letter to the editor, and 1 study available only as an abstract (excluded studies are listed in Additional file  2 ; included studies are listed in Additional file  3 ).

figure 1

Study selection flow diagram

General characteristics of the tools

In the 46 reports, we identified 24 tools, including 23 scales and 1 checklist. The tools were developed from 1985 to 2017. Four tools had from 2 to 4 versions [ 22 , 23 , 24 , 25 ]. Five tools were used as an outcome in a randomized controlled trial [ 23 , 25 , 26 , 27 , 28 ]. Table  3 lists the general characteristics of the identified tools. Table  4 presents a more complete descriptive summary of the tools’ characteristics, including types and measures of validity and reliability.

Six scales consisted of a single item enquiring into the overall quality of the peer review report, all based on directly asking users to score the overall quality [ 22 , 25 , 29 , 30 , 31 , 32 ]. These tools assessed the quality of a peer review report using: 1) a 4- or 5-point Likert scale ( n  = 4); 2) a ‘good’, ‘fair’, or ‘poor’ rating ( n  = 1); or 3) a restricted scale from 80 to 100 ( n  = 1). Seventeen scales and one checklist had several items, ranging in number from 4 to 26. Of these, 10 used the same weight for each item [ 23 , 24 , 27 , 28 , 33 , 34 , 35 , 36 , 37 , 38 ]. The overall quality score was the sum of the item scores ( n  = 3); the mean of the item scores ( n  = 6); or a summary score ( n  = 11) (for definitions see Table 1 ). Three scales reported more than one way to assess the overall quality [ 23 , 24 , 36 ]. The scoring system instructions were not defined in 67% of the tools.
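As a generic illustration of the first two aggregation rules above — the ratings are hypothetical, and the third rule (a ‘summary score’, defined in Table 1, which is not reproduced here) is assumed to be assigned directly rather than derived from items — a minimal sketch:

```python
def overall_score(item_scores, method):
    """Aggregate per-item ratings into an overall quality score.
    'sum' and 'mean' mirror two of the aggregation rules reported across
    the scales; the 'summary score' is assumed to be rated directly
    (Table 1 not reproduced here), so it is not derived from items."""
    if method == "sum":
        return sum(item_scores)
    if method == "mean":
        return sum(item_scores) / len(item_scores)
    raise ValueError(f"unsupported method: {method}")

ratings = [4, 3, 5, 4]  # hypothetical ratings on a 5-point Likert scale
print(overall_score(ratings, "sum"))   # 16
print(overall_score(ratings, "mean"))  # 4.0
```

Note that equal weighting is implicit in both rules; the later discussion of scales points out that any such weighting choice shapes the resulting quality assessment.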

None of the tools reported the definition of peer review report quality, and only one described the tool development [ 39 ]. The first version of this tool was designed by a development group composed of four researchers and three editors. It was based on a tool used in an earlier study and that had been developed by reviewing the literature and interviewing editors. Successively, the tool was modified by rewording some questions after some group discussions and a guideline for using the tool was drawn up.

Only 3 tools assessed and reported a validation process [ 39 , 40 , 41 ]. The assessed types of validity included face validity, content validity, construct validity, and preliminary criterion validity. Face and content validity could involve either a sole editor and author or a group of researchers and editors. Construct validity was assessed with multiple regression analysis using discriminant criteria (reviewer characteristics such as age, sex, and country of residence) and convergent criteria (training in epidemiology and/or statistics); or with the overall assessment of the peer review report by authors and an assessment of ( n  = 4–8) specific components of the peer review report by editors or authors. Preliminary criterion validity was assessed by comparing grades given by an editor to those given by an editor-in-chief using an earlier version of the tool. Reliability was assessed in 9 tools [ 24 , 25 , 26 , 27 , 31 , 36 , 39 , 41 , 42 ]; all reported inter-rater reliability and 2 also reported test-retest reliability. One tool reported internal consistency measured with Cronbach’s alpha [ 39 ].
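For readers unfamiliar with the internal-consistency statistic mentioned above, Cronbach's alpha can be computed from the standard formula alpha = k/(k−1) · (1 − Σ item variances / variance of total scores). The ratings below are hypothetical, purely to show the calculation; they are not data from the review:

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """item_scores: one list of ratings per item, aligned across the
    rated peer review reports. Uses population variances throughout."""
    k = len(item_scores)
    totals = [sum(vals) for vals in zip(*item_scores)]  # total score per report
    return k / (k - 1) * (1 - sum(pvariance(v) for v in item_scores) / pvariance(totals))

# hypothetical ratings of 5 peer review reports on 3 quality items
items = [[4, 3, 5, 4, 2],
         [4, 2, 5, 4, 3],
         [5, 3, 4, 4, 2]]
print(round(cronbach_alpha(items), 2))  # 0.9
```

Values near 1 indicate that the items vary together across reports, i.e. they plausibly measure a single underlying quality construct.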

Quality components of the peer review reports considered in the tools with more than one item

We extracted 132 items included in the 18 tools. One item asking for the percentage of co-reviews the reviewer had graded was not included in the classification because it represented a method of measuring reviewer’s performance and not a component of peer review report quality.

We organized the key concepts from each item into ‘topic-specific matrices’ (Additional file  4 ), identifying nine main domains and 11 subdomains: 1) relevance of study ( n  = 9); 2) originality of the study ( n  = 5); 3) interpretation of study results ( n  = 6); 4) strengths and weaknesses of the study ( n  = 12) (general, methods and statistical methods); 5) presentation and organization of the manuscript ( n  = 8); 6) structure of the reviewer’s comments ( n  = 4); 7) characteristics of reviewer’s comments ( n  = 14) (clarity, constructiveness, detail/thoroughness, fairness, knowledgeability, tone); 8) timeliness of the review report ( n  = 7); and 9) usefulness of the review report ( n  = 10) (decision making and manuscript improvement). The total number of tools corresponding to each domain and subdomain is shown in Fig.  2 . An explanation and example of all domains and subdomains is provided in Table  5 . Some domains and subdomains were considered in most tools, such as whether the reviewers’ comments were detailed/thorough ( n  = 11) and constructive ( n  = 9), whether the reviewers’ comments were on the relevance of the study ( n  = 9) and if the peer review report was useful for manuscript improvement ( n  = 9). However, other items were rarely considered, such as whether the reviewer made comments on the statistical methods ( n  = 1).

figure 2

Frequency of quality domains and subdomains

Clustering analysis among tools

We created a domain profile for each tool. For example, the tool developed by Justice et al. consisted of 5 items [ 35 ]. We classified three items under the domain ‘ Characteristics of the reviewer’s comments ’, one under ‘ Timeliness of the review report ’ and one under ‘ Usefulness of the review report ’. According to this classification, the domain profile (represented by proportions of domains) for this tool was 0.6:0.2:0.2 for these three domains and 0 for the remaining ones. The hierarchical clustering used the matrix of Euclidean distances among domain profiles, which led to five main clusters (Fig.  3 ).
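The worked example above can be reproduced mechanically. The sketch below is illustrative only (the short domain labels are this sketch's own, not the paper's exact wording), using the Justice et al. item classification stated in the text:

```python
from collections import Counter

# Shorthand labels for the nine quality domains identified in the review
DOMAINS = ["relevance", "originality", "interpretation", "strengths_weaknesses",
           "presentation", "structure", "characteristics", "timeliness", "usefulness"]

def domain_profile(item_domains):
    """Proportion of a tool's items falling in each of the nine domains."""
    counts = Counter(item_domains)
    n = len(item_domains)
    return [counts[d] / n for d in DOMAINS]

# The five items of the Justice et al. tool, as classified in the text:
# three under characteristics, one under timeliness, one under usefulness
justice = ["characteristics"] * 3 + ["timeliness", "usefulness"]
profile = domain_profile(justice)
print(profile)  # 0.6 for characteristics, 0.2 each for the other two, 0 elsewhere
```

Each tool's profile is a point in nine-dimensional space, so Euclidean distances between profiles directly feed the clustering described above.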

figure 3

Hierarchical clustering of tools based on the nine quality domains. The figure shows which quality domains are present in each tool. A slice of the chart represents a tool, and each slice is divided into sectors, indicating quality domains (in different colours). The area of each sector corresponds to the proportion of each domain within the tool. For instance, the “Review Rating” tool consists of two domains: Timeliness , meaning that 25% of all its items are encompassed in this domain, and Characteristics of reviewer’s comments occupying the remaining 75%. The blue lines starting from the centre of the chart define how the tools are divided into the five clusters. Clusters #1, #2 and #3 are sub-nodes of a major node grouping all three, meaning that the tools in these clusters have a similar domain profile compared to the tools in clusters #4 and #5

The first cluster consisted of 5 tools developed from 1990 to 2016. All of these tools included at least one item in the characteristics of the reviewer’s comments domain, representing at least 50% of each domain profile. The second cluster contained 3 tools developed from 1994 to 2006, characterized by incorporating at least one item in the usefulness and timeliness domains. The third cluster included 6 tools developed from 1998 to 2010 and exhibited the most heterogeneous mix of domains. These tools were distinct from the rest because they encompassed items related to interpretation of the study results and originality of the study . Moreover, the third cluster included two tools with different versions and variations. The first, second, and third clusters were linked together in the hierarchical tree, as each of their tools had at least one quality component in the characteristics of the reviewer’s comments domain. The fourth cluster contained 2 tools developed from 2011 to 2017, each with at least one component in the strengths and weaknesses domain. Finally, the fifth cluster included 2 tools developed from 2009 to 2012 that consisted of the same 2 domains. The fourth and fifth clusters were separated from the rest in the hierarchical tree, as their tools covered only a few domains.

To the best of our knowledge, this is the first comprehensive review to systematically identify tools used in biomedical research for assessing the quality of peer review reports. We identified 24 tools from both the medical literature and an internet search: 23 scales and 1 checklist. One out of four tools consisted of a single item that simply asked the evaluator for a direct assessment of the peer review report’s ‘overall quality’. The remaining tools had between 4 and 26 items, with the overall quality assessed as the sum of all items, their mean, or a summary score.

Since a definition of overall quality was not provided, these tools relied exclusively on the evaluators’ subjective quality assessment. Moreover, we found that only one study reported a rigorous development process for its tool, and even that involved a very limited number of people. This is of concern because it means that the identified tools were, in fact, not suitable for assessing the quality of a peer review report, particularly because they lack a focused theoretical basis. We found 10 tools that were evaluated for validity and reliability; notably, criterion validity was not assessed for any tool.

Most of the scales with more than one item resulted in a summary score. These scales did not consider how items could be weighted differently. Although commonly used, scales are controversial tools for assessing quality, primarily because combining items into a summary score implicitly weights them, which can bias the estimation of the measured construct [ 43 ]. It is not clear how weights should be assigned to each item of a scale [ 18 ]. Thus different weightings would produce different scales, which could provide varying quality assessments of an individual study [ 44 ].

In our methodological systematic review, we found only one checklist. However, it was neither rigorously developed nor validated, and therefore we could not consider it adequate for assessing peer review report quality. We believe that checklists may be a more appropriate means of assessing quality because they do not present an overall score, meaning they do not require weights for the items.

It is necessary to clearly define what a tool measures. For example, the Risk of Bias (RoB) tool [ 45 ] has a clear aim (to assess trial conduct and not reporting), and it provides a detailed definition of each domain in the tool, including support for judgment. Furthermore, it was developed with transparent procedures, including wide consultation and review of the empirical evidence. Bias and uncertainty can arise when using tools that are not evidence-based, rigorously developed, validated, and reliable; this is particularly true for tools used to evaluate interventions aimed at improving the peer review process in RCTs, as it affects how trial results are interpreted.

We found that most of the items included in the different tools neither covered the scientific aspects of a peer review report nor were specific to biomedical research. Surprisingly, few tools included an item related to the methods used in the study, and only one inquired about the statistical methods.

In line with a previous study published in 1990 [ 28 ], we believe that the quality components found across all tools could be further organized according to the perspective of either an editor or an author, taking into account the different yet complementary uses of a peer review report. For instance, reviewer’s comments on the relevance of the study and the interpretation of its results could assist editors in making an editorial decision, while the clarity and detail/thoroughness of reviewer’s comments are important attributes that help authors improve manuscript quality. We plan to further investigate the perspectives of biomedical editors and authors on the quality of peer review reports by conducting an international online survey. We will also include patient editors as survey participants, as their involvement in the peer review process can further ensure that research manuscripts are relevant and appropriate to end-users [ 46 ].

The present study has strengths but also some limitations. Although we implemented a comprehensive search strategy by following the guidance for conducting methodological reviews [ 20 ], we cannot exclude the possibility that some tools were not identified. Moreover, we limited the eligibility criteria to reports published in English. Finally, although the number of eligible records we identified through Google® was very limited, it is possible that we introduced selection bias due to a (re)search bubble effect [ 47 ].

Due to the lack of a standard definition of quality, a variety of tools exist for assessing the quality of a peer review report. Overall, we were able to establish 9 quality domains; each of the 18 multi-item tools covered between two and seven of them. The variety of items and item combinations amongst tools raises concern about variations in the quality of publications across biomedical journals. Low-quality biomedical research implies a tremendous waste of resources [ 48 ] and directly affects patients’ lives. We strongly believe that a validated tool providing a clear definition of peer review report quality is necessary in order to evaluate interventions aimed at improving the peer review process in well-performed trials.

Conclusions

The findings from this methodological systematic review show that the tools for assessing the quality of a peer review report have various components, which have been grouped into 9 domains. We plan to survey a sample of editors and authors in order to refine our preliminary classifications. The results from further investigations will allow us to develop a new tool for assessing the quality of peer review reports. This in turn could be used to evaluate interventions aimed at improving the peer review process in RCTs. Furthermore, it would help editors: 1) evaluate the work of reviewers; 2) provide specific feedback to reviewers; and 3) identify reviewers who provide outstanding review reports. Finally, it might be further used to score the quality of peer review reports in developing programs to train new reviewers.

Abbreviations

PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses

RCT: Randomized controlled trial

RoB: Risk of Bias

Kronick DA. Peer review in 18th-century scientific journalism. JAMA. 1990;263(10):1321–2.


Jefferson T, Alderson P, Wager E, Davidoff F. Effects of editorial peer review. JAMA. 2002;287(21):2784–6.


Smith R. Peer review: a flawed process at the heart of science and journals. J R Soc Med. 2006;99:178–82.

Baxt WG, Waeckerle JF, Berlin JA, Callaham ML. Who reviews the reviewers? Feasibility of using a fictitious manuscript to evaluate peer reviewer performance. Ann Emerg Med. 1998;32(3):310–7.

Kravitz RL, Franks P, Feldman MD, Gerrity M, Byrne C, William M. Editorial peer reviewers’ recommendations at a general medical journal : are they reliable and do editors care? PLoS One. 2010;5(4):2–6.

Yaffe MB. Re-reviewing peer review. Sci Signal. 2009;2(85):1–3.

Stahel PF, Moore EE. Peer review for biomedical publications : we can improve the system. BMC Med. 2014;12(179):1–4.


Rennie D. Make peer review scientific. Nature. 2016;535:31–3.

Moher D. Custodians of high-quality science: are editors and peer reviewers good enough? https://www.youtube.com/watch?v=RV2tknDtyDs&t=454s . Accessed 16 Oct 2017.

Ghimire S, Kyung E, Kang W, Kim E. Assessment of adherence to the CONSORT statement for quality of reports on randomized controlled trial abstracts from four high-impact general medical journals. Trials. 2012;13:77.

Boutron I, Dutton S, Ravaud P, Altman DG. Reporting and interpretation of randomized controlled trials with statistically nonsignificant results. JAMA. 2010;303(20):2058–64.

Hopewell S, Collins GS, Boutron I, Yu L-M, Cook J, Shanyinde M, et al. Impact of peer review on reports of randomised trials published in open peer review journals: retrospective before and after study. BMJ. 2014;349:g4145.

Lazarus C, Haneef R, Ravaud P, Boutron I. Classification and prevalence of spin in abstracts of non-randomized studies evaluating an intervention. BMC Med Res Methodol. 2015;15:85.

Jefferson T, Rudin M, Brodney Folse S, et al. Editorial peer review for improving the quality of reports of biomedical studies. Cochrane Database Syst Rev. 2007;2:MR000016.

Bruce R, Chauvin A, Trinquart L, Ravaud P, Boutron I. Impact of interventions to improve the quality of peer review of biomedical journals: a systematic review and meta-analysis. BMC Med. 2016;14:85.

Moher D, Liberati A, Tetzlaff J, Altman DG, Group TP. Preferred reporting items for systematic reviews and meta-analyses : the PRISMA statement. PLoS Med. 2009;6(7):e1000097.

NHS. PROSPERO International prospective register of systematic reviews. https://www.crd.york.ac.uk/prospero/ . Accessed 6 Nov 2017.

Sanderson S, Tatt ID, Higgins JPT. Tools for assessing quality and susceptibility to bias in observational studies in epidemiology: a systematic review and annotated bibliography. Intern J Epidemiol. 2007;36:666–76.

R Core Team. R: a language and environment for statistical computing. http://www.r-project.org/ . Accessed 4 Dec 2017.

Gentles SJ, Charles C, Nicholas DB, Ploeg J, McKibbon KA. Reviewing the research methods literature: principles and strategies illustrated by a systematic overview of sampling in qualitative research. Syst Rev. 2016;5:172.

Glaser B, Strauss A. The discovery of grounded theory. Chicago: Aldine; 1967.

Friedman DP. Manuscript peer review at the AJR: facts, figures, and quality assessment. Am J Roentgenol. 1995;164(4):1007–9.

Black N, Van Rooyen S, Godlee F, Smith R, Evans S. What makes a good reviewer and a good review for a general medical journal? JAMA. 1998;280(3):231–3.

Henly SJ, Dougherty MC. Quality of manuscript reviews in nursing research. Nurs Outlook. 2009;57(1):18–26.

Callaham ML, Baxt WG, Waeckerle JF, Wears RL. Reliability of editors’ subjective quality ratings of peer reviews of manuscripts. JAMA. 1998;280(3):229–31.

Callaham ML, Knopp RK, Gallagher EJ. Effect of written feedback by editors on quality of reviews: two randomized trials. JAMA. 2002;287(21):2781–3.

Van Rooyen S, Godlee F, Evans S, Black N, Smith R. Effect of open peer review on quality of reviews and on reviewers ’ recommendations : a randomised trial. BMJ. 1999;318(7175):23–7.

Mcnutt RA, Evans AT, Fletcher RH, Fletcher SW. The effects of blinding on the quality of peer review. JAMA. 1990;263(10):1371–6.

Moore A, Jones R. Supporting and enhancing peer review in the BJGP. Br J Gen Pract. 2014;64(624):e459–61.

Stossel TP. Reviewer status and review quality. N Engl J Med. 1985;312(10):658–9.

Thompson SR, Agel J, Losina E. The JBJS peer-review scoring scale: a valid, reliable instrument for measuring the quality of peer review reports. Learn Publ. 2016;29:23–5.

Rajesh A, Cloud G, Harisinghani MG. Improving the quality of manuscript reviews : impact of introducing a structured electronic template to submit reviews. AJR. 2013;200:20–3.

Shattell MM, Chinn P, Thomas SP, Cowling WR. Authors’ and editors’ perspectives on peer review quality in three scholarly nursing journals. J Nurs Scholarsh. 2010;42(1):58–65.

Jawaid SA, Jawaid M, Jafary MH. Characteristics of reviewers and quality of reviews: a retrospective study of reviewers at Pakistan journal of medical sciences. Pakistan J Med Sci. 2006;22(2):101–6.

Justice AC, Cho MK, Winker MA, Berlin JA. Does masking author identity improve peer review quality ? A randomized controlled trial. JAMA. 1998;280(3):240–3.

Henly SJ, Bennett JA, Dougherty MC. Scientific and statistical reviews of manuscripts submitted to nursing research: comparison of completeness, quality, and usefulness. Nurs Outlook. 2010;58(4):188–99.

Hettyey A, Griggio M, Mann M, Raveh S, Schaedelin FC, Thonhauser KE, et al. Peerage of science: will it work? Trends Ecol Evol. 2012;27(4):189–90.

Publons. Publons for editors: overview. https://static1.squarespace.com/static/576fcda2e4fcb5ab5152b4d8/t/58e21609d482e9ebf98163be/1491211787054/Publons_for_Editors_Overview.pdf . Accessed 20 Oct 2017.

Van Rooyen S, Black N, Godlee F. Development of the review quality instrument (RQI) for assessing peer reviews of manuscripts. J Clin Epidemiol. 1999;52(7):625–9.

Evans AT, McNutt RA, Fletcher SW, Fletcher RH. The characteristics of peer reviewers who produce good-quality reviews. J Gen Intern Med. 1993;8(8):422–8.

Feurer I, Becker G, Picus D, Ramirez E, Darcy M, Hicks M. Evaluating peer reviews: pilot testing of a grading instrument. JAMA. 1994;272(2):98–100.



Acknowledgments

The authors would like to thank the MiRoR consortium for their support, Elizabeth Moylan for helping to identify further relevant reports and Melissa Sharp for providing advice during the writing of this article.

This project was supported by the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement no 676207. The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Availability of data and materials

The datasets supporting the conclusions of the present study will be available in the Zenodo repository in the Methods in Research on Research (MiRoR) community [ https://zenodo.org/communities/miror/?page=1&size=20 ].

Author information

Authors and Affiliations

Department of Statistics and Operations Research, Barcelona-Tech, UPC, c/ Jordi Girona 1-3, 08034, Barcelona, Spain

Cecilia Superchi, José Antonio González & Erik Cobo

INSERM, U1153 Epidemiology and Biostatistics Sorbonne Paris Cité Research Center (CRESS), Methods of therapeutic evaluation of chronic diseases Team (METHODS), F-75014, Paris, France

Cecilia Superchi

Paris Descartes University, Sorbonne Paris Cité, Paris, France

Iberoamerican Cochrane Centre, Hospital de la Santa Creu i Sant Pau, C/ Sant Antoni Maria Claret 167, Pavelló 18 - planta 0, 08025, Barcelona, Spain

CIBER de Epidemiología y Salud Pública (CIBERESP), Madrid, Spain

Department of Psychology, Faculty of Humanities and Social Sciences, University of Split, Split, Croatia

Centre d’épidémiologie Clinique, Hôpital Hôtel-Dieu, 1 place du Paris Notre-Dame, 75004, Paris, France

Isabelle Boutron


Contributions

All authors provided intellectual contributions to the development of this study. CS, EC and IB had the initial idea and with JAG and DH, designed the study. CS designed the search in collaboration with IS. CS conducted the screening and JAG carried out a quality control of a 25% random sample. CS and JAG conducted the data extraction. CS conducted the analysis and with JAG designed the figures. CS led the writing of the manuscript. IB led the supervision of the manuscript preparation. All authors provided detailed comments on earlier drafts and approved the final manuscript.

Corresponding author

Correspondence to Cecilia Superchi .

Ethics declarations

Ethics approval and consent to participate

Not required.

Consent for publication

Not applicable.

Competing interests

All authors have completed the ICMJE uniform disclosure form at http://www.icmje.org/coi_disclosure.pdf (available on request from the corresponding author) and declare that (1) no authors have support from any company for the submitted work; (2) IB is the deputy director of French EQUATOR that might have an interest in the work submitted; (3) no author’s spouse, partner, or children have any financial relationships that could be relevant to the submitted work; and (4) none of the authors has any non-financial interests that could be relevant to the submitted work.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional files

Additional file 1:

Search strategies. (PDF 182 kb)

Additional file 2:

Excluded studies. (PDF 332 kb)

Additional file 3:

Included studies. (PDF 244 kb)

Additional file 4:

Classification of peer review report quality components. (PDF 2660 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated.


About this article

Cite this article

Superchi, C., González, J.A., Solà, I. et al. Tools used to assess the quality of peer review reports: a methodological systematic review. BMC Med Res Methodol 19 , 48 (2019). https://doi.org/10.1186/s12874-019-0688-x

Download citation

Received : 11 July 2018

Accepted : 20 February 2019

Published : 06 March 2019

DOI : https://doi.org/10.1186/s12874-019-0688-x


Keywords

  • Peer review
  • Quality control
  • Systematic review

BMC Medical Research Methodology

ISSN: 1471-2288


Open Access

Peer-reviewed

Research Article

Fragments of peer review: A quantitative analysis of the literature (1969-2015)

Francisco Grimaldo
Roles: Conceptualization, Data curation, Formal analysis, Methodology, Software, Visualization, Writing – original draft, Writing – review & editing
* E-mail: [email protected]
Affiliation: Department of Computer Science, University of Valencia, Burjassot, Valencian Community, Spain

Ana Marušić
Roles: Formal analysis, Investigation, Methodology, Supervision, Validation, Writing – original draft
Affiliation: Department of Research in Biomedicine and Health, University of Split, Split, Split-Dalmatia, Croatia

Flaminio Squazzoni
Roles: Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Supervision, Validation, Writing – original draft, Writing – review & editing
Affiliation: Department of Economics and Management, University of Brescia, Brescia, Lombardy, Italy

  • Published: February 21, 2018
  • https://doi.org/10.1371/journal.pone.0193148


This paper examines research on peer review between 1969 and 2015 by looking at records indexed in the Scopus database. Although it is often argued that peer review has been poorly investigated, we found that the number of publications in this field doubled from 2005. Half of this work was indexed as research articles, a third as editorial notes and literature reviews, and the rest as book chapters or letters. We identified the most prolific and influential scholars, the most cited publications and the most important journals in the field. Co-authorship network analysis showed that research on peer review is fragmented, with the largest group of co-authors including only 2.1% of the whole community. Co-citation network analysis indicated a fragmented structure also in terms of knowledge. This shows that despite its central role in research, peer review has been examined only through small-scale research projects. Our findings suggest a need to encourage collaboration and knowledge sharing across different research communities.

Citation: Grimaldo F, Marušić A, Squazzoni F (2018) Fragments of peer review: A quantitative analysis of the literature (1969-2015). PLoS ONE 13(2): e0193148. https://doi.org/10.1371/journal.pone.0193148

Editor: Lutz Bornmann, Max Planck Society, GERMANY

Received: June 9, 2017; Accepted: February 4, 2018; Published: February 21, 2018

Copyright: © 2018 Grimaldo et al. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the paper and its Supporting Information files.

Funding: This work has been supported by the TD1306 COST Action PEERE. The first author, Francisco Grimaldo, was also funded by the Spanish Ministry of Economy, Industry and Competitiveness project TIN2015-66972-C5-5-R. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Peer review is central to research. It is essential to ensure the quality of scientific publications, but also to help the scientific community self-regulate its reputation and resource allocation [ 1 ]. Whether directly or indirectly, it also influences funding and publication [ 2 ]. The transition of publishing and reading to the digital era has not changed the value of peer review, although it has stimulated the call for new models and more reliable standards [ 3 – 5 ].

Under the impact of recent scandals, where manipulated research passed the screening of peer review and was eventually published in influential journals, many analysts have suggested that more research is needed on this delicate subject [ 6 – 10 ]. The lack of data and robust evidence on the quality of the process has led many observers even to question the real value of peer review and to contemplate alternatives [ 11 – 13 ].

This study aims to provide a comprehensive analysis of peer review literature from 1969 to 2015, by looking at articles indexed in Scopus. This analysis can help to reveal the structure of the field by finding the more prolific and influential authors, the most authoritative journals and the most active research institutions. By looking at co-authorship and co-citation networks, we measured structural characteristics of the scientific community, including collaboration and knowledge sharing. This was to understand whether, despite the growing number of publications on peer review in the last years, research is excessively fragmented to give rise to a coherent and connected field.

Finally, it is important to note that the period covered by our analysis is highly representative. Although many analysts suggest that peer review is deeply rooted in the historical evolution of modern science since the 17th century [ 14 ], recent historical analysis suggests that peer review as an institutionalized system of evaluation in scholarly journals was established only about 70 years ago, when the terms “peer review” and “referee” also became common currency [ 15 ].

Fig 1 (left panel) shows that the number of publications on peer review doubled from 2005. From 2004 to 2015, the annual growth of publications on peer review was 12% on average, reaching 28% and 38% between 2004–2005 and 2005–2006, respectively. The volume of published research on peer review grew faster than the total number of publications overall, which had an average growth of 5% from 2004 to 2015, reaching 15% in 2004–2005 ( Fig 1 , right panel). The peak-based dynamics of this growth can be related to the impact of the International Congresses on Peer Review and Biomedical Publication, held every four years since 1989, with JAMA publishing abstracts and articles from the second, third and fourth editions of the congress [ 10 ]. This was also confirmed by data from PubMed and Web of Science (see Figure A in S1 Appendix ).


Number of records on peer review (left) and number of records published in English on any topic from 1969 to 2015 in Scopus (right).

https://doi.org/10.1371/journal.pone.0193148.g001

About half of the records were journal articles, the rest mostly being editorial notes, commentaries, letters and literature reviews (see Figure B in S1 Appendix ). However, the number of original research contributions, e.g., research articles, book chapters or conference proceedings papers, increased from 2005 onward to exceed the number of editorial notes, reviews and letters ( Fig 2 ). This would indicate that empirical, data-driven research has been increasing in recent years.


https://doi.org/10.1371/journal.pone.0193148.g002

Fig 3 shows the top 10 most productive countries by origin of research authors. Peer review is studied predominantly in the US, followed by the UK, Australia, Canada and Germany. While this may partly reflect the size of these research communities, previous studies have suggested that peer review is especially intrinsic to the Anglo-Saxon institutional context [ 16 ]. However, looking at the top 10 most productive institutions, in which research is likely to have been cumulative and more systematic, we also found two prominent European institutions, the ETH Zurich and the University of Zurich ( Fig 4 ). This indicates that research on peer review is truly international.


https://doi.org/10.1371/journal.pone.0193148.g003


https://doi.org/10.1371/journal.pone.0193148.g004

Fig 5 shows the most prolific authors. While the top ones are European social scientists, i.e., Lutz Bornmann and Hans-Dieter Daniel, who published 45 and 34 papers, respectively, the pioneers of research on peer review were medical scholars and journal editors, such as Drummond Rennie, Richard Smith and Annette Flanagin.


https://doi.org/10.1371/journal.pone.0193148.g005

Among publication outlets, the journals that published on peer review most frequently were: Science (n = 136 papers), Nature (n = 110), JAMA (n = 99), Scientometrics (n = 65), Behavioral and Brain Sciences (n = 48), Chemical and Engineering News (n = 34), Academic Medicine (n = 32), Australian Clinical Review (n = 32), Learned Publishing (n = 31) and Research Evaluation (n = 31). However, research articles on peer review were published mostly by JAMA (n = 62), Behavioral and Brain Sciences (n = 44) and Scientometrics (n = 42). This means that top journals such as Science and Nature typically published commentaries, editorial notes or short surveys, while research contributions have mostly been published elsewhere. If we look at the impact of research on peer review on journal citations (see Table A in S1 Appendix ), the impact has been minimal with the exception of Scientometrics , whose articles on peer review contributed significantly to the journal’s success (10.97% of the journal’s citations were received by articles on peer review). However, it is worth noting that the contribution of research on peer review to journal citations has been increasing over time (see Fig 6 for a restricted sample of journals listed in Table B in S1 Appendix ).


https://doi.org/10.1371/journal.pone.0193148.g006

Among the most important topics, the keywords revealed that research has most often examined the connection between peer review and “quality assurance” (103 papers), “publishing” (93), “research” (76), “open access” (56), “quality improvement” (47), “evaluation” (46), “publication” (44), “assessment” (41), “ethics” (40) and “bibliometrics” (39). The primacy of the link between peer review and the concept of “quality” was confirmed by looking at nouns, verbs and adjectives in the paper titles (“quality” appearing 527 times, against 454 for “journal” and 434 for “research”) and in the abstracts (“quality” recurring 2208 times, against 2038 for “research” and 1014 for “medical”). This would confirm that peer review has been viewed mainly as a “quality control” process rather than as a collaborative process aimed at increasing the knowledge value of a manuscript [ 17 ].

Data showed that research on peer review is typically pursued in small collaborative networks (75% of the records had fewer than three co-authors), with the exception of one article published in 2012, which was co-authored by 198 authors and was therefore excluded from the subsequent analysis of co-authorship networks to avoid statistical bias (see Figure D in S1 Appendix ). Around 83% of the co-authorship networks included fewer than six authors (see Figure E in S1 Appendix ). The most prolific authors were also those with more co-authors, although not those with the highest average number of co-authors per paper (see Table E in S1 Appendix ).

The most prolific authors in our analysis were not always those instrumental in connecting the community studying peer review (e.g., compare Table 1 and Fig 5 ). Fragmentation and small-scale collaboration networks were dominant (e.g., see Table B and Figure E in S1 Appendix ). We found 1912 clusters with an average size of 4.1, which is very small. However, it is important to emphasize certain differences in the position of scientists across the three samples. When excluding records published in medicine journals, we found a more connected co-authorship network, with scientists working in more cohesive and stable groups, as indicated by the lower number of clusters, higher density and shorter diameter of sample 3 in Table 2 , which is not simply a linear consequence of the smaller numbers of nodes and edges.


https://doi.org/10.1371/journal.pone.0193148.t001


https://doi.org/10.1371/journal.pone.0193148.t002

To look at this in more detail, we plotted the co-authorship network linking all authors of the papers on peer review. Each node was a different author, and links were established between two authors whenever they co-authored a paper. The more papers two authors co-authored, the thicker the corresponding link.

The co-authorship network was highly disaggregated, with 7971 authors forming 1910 communities ( Table 1 ). With the exception of a large community of 167 researchers and a dozen communities including around 30 to 50 scientists, 98% of communities had fewer than 15 scientists. Note that the giant component (n = 167 scientists) represents only 2.1% of the total number of scientists in the sample. It included co-authorship relations between the top 10 most central authors and their collaborators ( Fig 7 ). The situation is different if we look at the largest communities and restrict our analysis to research articles and articles published in non-medicine journals ( Fig 8 ). In this case, collaboration groups were more cohesive (see Fig 8 , right panel).


Note that the node size refers to the author’s betweenness centrality.

https://doi.org/10.1371/journal.pone.0193148.g007


(Sample 1 on the left; sample 3, i.e., outside medicine, on the right.) Note that the node size refers to the author’s betweenness centrality.

https://doi.org/10.1371/journal.pone.0193148.g008

In order to look at the internal structure of the field, we built a co-citation network measuring relations between cited publications. It is important to note that here a co-citation meant that two records were cited in the same document. For the sake of clarity, we report data only for pairs co-cited more than once.

Fig 9 shows the co-citation network, which included 6402 articles and 71548 references. In the top-right corner is a community of 84 papers, while the two small clusters at the bottom-centre and middle-left are examples of isolated co-citation links generated by a small number of articles (e.g., the bottom-centre cluster was generated by four citing articles by the same authors with a high number of co-citation links). Table 3 presents the co-citation network metrics, including data on the giant component. Results suggest that the field is characterized by network fragmentation, with 192 clusters of limited size. While the giant component covered 33% of the nodes, it accounted for only 0.9% of the total number of cited sources in all records. Furthermore, 79.2% of co-citation links included no more than five cited references.


https://doi.org/10.1371/journal.pone.0193148.g009


https://doi.org/10.1371/journal.pone.0193148.t003

Table 4 shows a selection of the most important references that were instrumental in clustering the co-citation network as part of the giant component. Results demonstrated not only the importance of certain classical contributions to the sociology of science, e.g., Robert Merton’s work, which showed an interest in peer review as early as the 1970s; more recent works, including literature reviews, were also important in reducing the disconnection of scientists in the field [ 2 ]. They also show that, at least for the larger co-citation subnetwork, the field is potentially inter-disciplinary, with important contributions from scholars in medicine as well as sociology and the behavioural sciences.


https://doi.org/10.1371/journal.pone.0193148.t004

Discussion and conclusions

Our analysis showed that research on peer review has been growing rapidly, especially since 2005. Not only did the number of publications increase; so did the number of citations, and thus the impact of research on peer review [ 18 ]. We also found that research is international, with a longer tradition in the US but with important research groups also in Europe. However, when looking at co-authorship networks, findings indicate that research is fragmented. Scholars do not collaborate on a significant scale, with the largest group of co-authors including only 2.1% of the whole community. When looking at co-citation patterns, we found that knowledge sharing is also fragmented. The largest network covers only 33% of the nodes, which account for only 0.9% of the total number of cited sources in all records.

This calls for serious consideration of certain structural problems of studies on peer review. First, difficulties in accessing data from journals and funding agencies and in performing large-scale quantitative research have probably limited collaborative research [ 19 ]. While the lack of data may be due in part to the confidentiality and anonymity that characterize peer review, it is also possible that editorial boards of journals and administrative bodies of funding agencies have an interest in obstructing independent research as a means to protect internal decisions [ 8 ]. However, the full digitalisation of editorial management processes and the increasing emphasis on open data and research integrity among science stakeholders are creating a favourable context in which researchers will soon be able to access peer review data more easily and more frequently [ 20 ]. This is expected to stimulate collaboration and increase the scale of research on peer review. Secondly, the lack of specific funding schemes supporting research on peer review has probably obstructed systematic studies [ 10 ], and has probably made it difficult for scholars to establish large-scale, cross-disciplinary collaboration.

In conclusion, although peer review may reflect context-specificities and disciplinary traditions, the common challenge of understanding the complexity of this process, testing the efficacy of different models in reducing bias and allocating credit and reputation fairly requires ensuring comparison and encouraging collaboration and knowledge sharing across communities [ 21 ]. Here, a recently released document on data sharing by a European project has indicated that data sharing on peer review is instrumental to promoting the quality of the process, with relevant collective benefits [ 22 ]. Not only are such initiatives important for improving the quality of research; they can also promote an evidence-based approach to peer review reorganizations and innovations, which is currently not well developed.

Our sample included all records on peer review published from 1969 to 2015, which were extracted from Scopus on July 1st, 2016. We used the Advanced Search tab on the Scopus website to run our query strings (for detail, see below) and exported all available fields for each document retrieved as a CSV (comma-separated values) file. After several tests and checks on the dataset, we identified three samples of records that were hierarchically linked as follows:

  • Sample 1 (n = 6402 documents), which included any paper reporting “peer review” either in the “article title” or “author keywords” fields. (Using other fields, such as “Abstract” and “Keywords”, yielded a high number of documents that mentioned peer review but, as we verified, were not studies of peer review, merely papers that had gone through a peer review process; these were excluded.) This sample was obtained after deduplication of the following query to Scopus: (TITLE("peer review") OR AUTHKEY("peer review")) AND PUBYEAR < 2016.
  • Sample 2 (n = 3725 documents), which restricted sample 1 to journal articles (already published or just available online), books, book chapters and conference papers, so excluding editorial notes, reviews, commentaries and letters. This sample was obtained after deduplication of the following query to Scopus: (TITLE("peer review") OR AUTHKEY("peer review")) AND (DOCTYPE("ar") OR DOCTYPE ("ip") OR DOCTYPE ("bk") OR DOCTYPE ("ch") OR DOCTYPE ("cp")) AND PUBYEAR < 2016.
  • Sample 3 (n = 1955 documents), which restricted sample 2 to records that were not listed among “Medicine” as subject area. This sample was obtained after deduplication of the following query to Scopus: (TITLE("peer review") OR AUTHKEY("peer review")) AND (DOCTYPE("ar") OR DOCTYPE ("ip") OR DOCTYPE ("bk") OR DOCTYPE ("ch") OR DOCTYPE ("cp")) AND PUBYEAR < 2016 AND NOT(SUBJAREA("MEDI")).

With sample 1, we aimed to exclude records that did not explicitly address peer review as an object of study. With sample 2, we identified only articles that reported results, data or cases. With sample 3, we aimed to understand specificities and differences between studies on peer review in medicine and other studies. Unless otherwise mentioned, we report results on sample 1. Note that, in order to check data consistency, we compared our Scopus dataset with other datasets and repositories, such as PubMed and WoS (see Figure A in S1 Appendix ).
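The nested sample construction above can be sketched as successive filters over an exported record set. The following Python fragment is illustrative only: the study's actual analysis used an R script, and the field names and document-type codes below are assumptions rather than the exact Scopus export schema.

```python
# Sketch: deriving the three nested samples from a Scopus-style export.
# Field names ("title", "doctype", etc.) are illustrative assumptions.

RESEARCH_TYPES = {"ar", "ip", "bk", "ch", "cp"}  # article, in-press, book, chapter, conf. paper

def in_sample_1(rec):
    """Any record with "peer review" in the title or author keywords, pre-2016."""
    text = (rec["title"] + " " + rec["author_keywords"]).lower()
    return "peer review" in text and rec["year"] < 2016

def in_sample_2(rec):
    """Sample 1 restricted to original research document types."""
    return in_sample_1(rec) and rec["doctype"] in RESEARCH_TYPES

def in_sample_3(rec):
    """Sample 2 restricted to records outside the Medicine subject area."""
    return in_sample_2(rec) and "MEDI" not in rec["subject_areas"]

# Two hypothetical records: a sociology research article and a medical editorial.
records = [
    {"title": "Bias in peer review", "author_keywords": "", "year": 2010,
     "doctype": "ar", "subject_areas": {"SOCI"}},
    {"title": "Peer review in crisis", "author_keywords": "", "year": 2014,
     "doctype": "ed", "subject_areas": {"MEDI"}},  # editorial: sample 1 only
]
print([in_sample_1(r) for r in records])  # [True, True]
print([in_sample_2(r) for r in records])  # [True, False]
```

Because the predicates are nested (each calls the previous one), every record in sample 3 is guaranteed to belong to samples 2 and 1, mirroring the hierarchical queries above.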

The queries to Scopus proposed in this paper allowed us to retrieve the corpus at a sufficient level of generality to look at the big picture of this field of research. Querying titles and author keywords about “peer review” did not restrict the search only to specific aspects, contexts or cases in which peer review could have been studied (e.g., peer review of scientific manuscripts). Although these queries could filter out some relevant papers, we strongly believe these cases had only a marginal impact on our analysis. For instance, we tried to use other related search terms and found a few papers from Scopus (e.g. just 2 documents for “grant decision making” and 3 documents for “grant selection”) and a number of false positives (e.g. the first 20 of the 69 documents obtained for “panel review” did not really deal with peer review as a field of research).

In order to visualize the collaboration structure in the field, we calculated co-authorship networks [ 23 ] in all samples. Each node in the co-authorship network represented a researcher, while each edge connected two researchers who co-authored a paper. In order to map knowledge sharing, we extracted co-citation networks [ 24 – 26 ]. In this case, nodes represented bibliographic references while edges connected two references when they were cited in the same paper. These methods are key to understand the emergence and evolution of research on peer review as a field driven by scientists’ connections and knowledge flows [ 27 ].
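Both network constructions can be sketched with the same pairing routine: co-authorship links pairs of authors on the same paper, and co-citation links pairs of references cited by the same paper, with edge weights counting repeated pairings. This is a minimal illustration on hypothetical data, not the study's code.

```python
# Sketch of the two network constructions described above.
from itertools import combinations
from collections import Counter

def pair_network(groups):
    """Count undirected edges between items appearing in the same group."""
    edges = Counter()
    for group in groups:
        for a, b in combinations(sorted(set(group)), 2):
            edges[(a, b)] += 1  # weight = number of shared papers
    return edges

# Hypothetical papers: each entry lists a paper's authors.
papers_authors = [["Alice", "Bob"], ["Alice", "Bob", "Carol"], ["Dan"]]
coauthorship = pair_network(papers_authors)
print(coauthorship[("Alice", "Bob")])  # 2 -> drawn as a thicker link

# The same routine builds the co-citation network from reference lists.
papers_refs = [["R1", "R2", "R3"], ["R1", "R2"]]
cocitation = pair_network(papers_refs)
print(cocitation[("R1", "R2")])  # 2 -> co-cited in two documents
```

Sorting each deduplicated group before pairing keeps edges undirected and canonical, so repeated collaborations accumulate on a single key.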

When constructing co-authorship and co-citation networks, we only used information about documents explicitly dealing with “peer review”. The rationale behind this decision was that we wanted to measure the kind of collaboration that can be attributed to these publications, regardless of the total productivity of the scientists involved. Minor data inconsistencies can also occur because the data exported from Scopus, WoS and PubMed are not complete, clean and free of errors. If a paper is missing, all co-authorship links that could be derived from it will be missing too. If an author name is written in two ways, two different nodes will represent the same researcher and links will be distributed between them. The continuous refinement of the algorithms behind these databases keeps the amount of mistakes and missing information negligible for a large-scale analysis. In any case, we implemented automatic mechanisms that cleaned the data and removed duplicated records, reducing these inconsistencies to a marginal level given the scope of our study (see the R script used to perform the analysis in S1 File ).

Research has extensively used co-authorship and co-citation networks to study collaboration patterns by means of different network descriptors [ 28 ]. Here, we focussed on the following indicators, which were used to extract information from the samples presented above:

  • Number of nodes: the number of different co-authors and different bibliographic references, which was used to provide a structural picture of the community of researchers studying peer review.
  • Number of edges: the sum of all different two-by-two relationships between researchers and between bibliographic references, which was used to represent the collaboration structure of this field of research.
  • Network density: the ratio between the number of edges in the co-authorship network and the total number of edges that this network would have if it were completely connected, which was used to understand whether the community was cohesive or fragmented (i.e., the higher the ratio, the more cohesive the research community).
  • Diameter: the longest shortest path in the network, which indicated the distance between its two farthest nodes (i.e., two authors or two references), so showing the degree of separation in the network.
  • Betweenness centrality: the number of shortest paths between any two nodes that pass through a particular node of the network. Nodes around the edge of the network typically have a lower betweenness centrality, whereas a higher betweenness centrality indicates, in our case, that a scientist or a paper connects different parts of the co-authorship or co-citation network, thus playing a central role in connecting the community.
  • Number and size of clusters: the number and size of connected sub-communities in the network, found by performing a simple breadth-first search. We used these indicators to understand whether the community was characterized by a multitude of sub-groups relatively independent from each other, with some sub-groups connecting more researchers than others.
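Most of these descriptors (all but betweenness centrality) can be computed with a breadth-first search, as in this minimal Python sketch on an undirected adjacency map. The graph is a toy example, not the study's data, and the study's own computation was done in an R script.

```python
# Minimal sketch of the network descriptors above (toy graph, illustrative only).
from collections import deque

def bfs_distances(adj, start):
    """Shortest-path lengths from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def descriptors(adj):
    n = len(adj)                                               # number of nodes
    m = sum(len(nbrs) for nbrs in adj.values()) // 2           # number of edges
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0          # edges / possible edges
    # Clusters: connected components found by repeated breadth-first search.
    seen, clusters = set(), []
    for node in adj:
        if node not in seen:
            comp = set(bfs_distances(adj, node))
            seen |= comp
            clusters.append(comp)
    # Diameter: longest shortest path, taken within the giant (largest) component.
    giant = max(clusters, key=len)
    diameter = max(max(bfs_distances(adj, u).values()) for u in giant)
    return {"nodes": n, "edges": m, "density": round(density, 3),
            "clusters": len(clusters), "diameter": diameter}

# Toy graph: one triangle-less star {a-b, a-c} plus an isolated pair {d-e}.
toy = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}, "d": {"e"}, "e": {"d"}}
print(descriptors(toy))  # {'nodes': 5, 'edges': 3, 'density': 0.3, 'clusters': 2, 'diameter': 2}
```

Restricting the diameter to the giant component is a common convention for disconnected graphs, since distances between components are undefined.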

Supporting information

S1 Appendix.

https://doi.org/10.1371/journal.pone.0193148.s001

S1 File. R code script used to perform the quantitative analysis.

https://doi.org/10.1371/journal.pone.0193148.s002

Acknowledgments

We would like to thank Rocio Tortajada for her help in the early stages of this research and Emilia López-Iñesta for her help in formatting the references. This paper was based upon work from the COST Action TD1306 “New Frontiers of Peer Review”-PEERE ( www.peere.org ).

  • 9. The Economist. Unreliable research. Trouble at the lab. 18 Oct 2013. https://www.economist.com/news/briefing/21588057-scientists-think-science-self-correcting-alarming-degree-it-not-trouble . Cited 28 Sept 2017.
  • 13. Gould THP. Do we still need peer review? An argument for change. Plymouth, UK: Scarecrow Press; 2013.
  • 16. Lamont M. How professors think: inside the curious world of academic judgment. Cambridge, MA: Harvard University Press; 2009.
  • 28. Wasserman S, Faust K. Social network analysis: methods and applications. Cambridge: Cambridge University Press; 1994.

Understand Peer Review

"Peer review" is an article review process where experts in a specific field review an anonymous study or article before its potential publication. While there are limits to the peer review process, this method allows multiple people to rigorously review submitted content (research, ideas, data, results, etc.) to ensure that the author has met the highest academic standards and has presented relevant findings based on sound methodology. "Peer review" is most often associated with scholarly journals, but books, primarily those published by university presses, can also be peer-reviewed.

In master's and doctoral programs, you are expected to analyze and cite peer-reviewed materials. Keep in mind that even if a journal is considered peer-reviewed, not all of its articles will have been peer-reviewed. Commentaries, opinion articles, book reviews, letters to the editor, and the editor's introduction likely have not gone through the peer-review process. Check out the following sources to learn more about peer review:

  • Taylor & Francis Author Services: Understand Peer Review Find out how the peer review process works and how you can use it to ensure every article you publish is as good as it can be.
  • The Peer-Review Process Explained [Video] Defines peer-reviewed articles as scholarship and outlines the peer-review process toward publication.

Finding Peer-Reviewed Sources

This screenshot shows the "Peer Review" filter in ProQuest

How can I tell if something is peer-reviewed?

For Journals

The best place to find out if something is peer reviewed is the publisher’s website. Look for information about submitting articles and/or information about the publishing process. 

Additionally, if the journal is in the CityU collection, you can use the Journal Finder to see if the journal is peer reviewed. Keep in mind that not all articles within a peer reviewed journal are necessarily peer reviewed. 

  • Find a Journal

For Books, Edited Collections, and Monographs

The process of peer review generally applies to journal articles, but it is possible for a book to be peer reviewed as well. Although many books go through some sort of editorial or review process, there is not an easy method for determining whether a book is peer reviewed.

One method for locating peer-reviewed books is to look at book publications from university presses. Books published by university presses almost always go through a process of peer review, which typically involves two or three independent referees who initially review the manuscript. If the manuscript receives a positive review, the university press sends it to its editorial board, made up entirely of faculty members, for final review. This review process is required for membership in the Association of University Presses, whose member directory is available on its website.

Another method for determining whether a book is peer reviewed is to locate book reviews within scholarly journals on that particular book. These book reviews may provide a deep evaluation regarding the quality of scholarship and authority in the book.

To find a review for a particular book, search for the title in the main library search bar, and filter by "Book Review" under "Content Type."

Screenshot of content type filter

Limitations of Peer Review

Systemic Inequities in Peer Review

Systemic inequities in the peer review process reflect the broader inequities within academia. According to the National Center for Education Statistics (2021), as of fall 2018, around 75% of all full-time faculty in degree-granting postsecondary institutions were white. The same systems that exclude communities of color also shape the type of research that gets published: this is evident in everything from who is granted the time and money to conduct research to the types of research that are valued in publishing.

Additionally, individual biases held by both reviewers and editors, whether those are intentional or unconscious, can impact the peer review process at an institutional level. This is true both in the review process itself and, in the case of editors, the scope of the research topics and methodologies that are chosen to be included in a collection or valued more broadly within the realm of academic publishing.

While the peer review process is in place to ensure the reliability of published material, the process itself is not without flaws and has drawn criticism. In fact, there are several types of peer review processes, and some are better than others. For example, double-anonymous (sometimes called "double-blind") peer review can help prevent bias based on the reviewers' perception of the authors' race or gender, whereas a fully transparent peer review process, like open peer review, can help hold reviewers accountable and allow for more objective reviews.

You can read about the different types of peer review below.

Types of Peer Review and their Limitations

The main types of peer review and their limitations are presented below. It should be noted that none of these fully address the issues of bias against certain types of research. This is especially true for research that disrupts traditional methodologies or work that challenges or criticizes existing power structures and systems. 

Single-Anonymous

The most traditional form of peer review. Identities of the reviewer(s) are hidden from the author(s). This process can aid in perpetuating reviewer bias.

Double-Anonymous

The identities of both the reviewer(s) and the author(s) are kept anonymous from each other. This can help protect a work from reviewer bias but does not protect an author from biases held by an editor.

Triple-Anonymous

The same as double-anonymous but the identity of the author(s) is also hidden from the editor(s).

Open Peer Review

The most transparent option: in this form of peer review, the identities of both reviewer(s) and author(s) are known to each other. However, the term can also refer to a process in which a reviewer's identity is revealed at some point during the review or publication process, not necessarily at the outset.


  • Last Updated: Mar 21, 2024 9:53 AM


What is peer review?

Reviewers play a pivotal role in scholarly publishing. The peer review system exists to validate academic work, helps to improve the quality of published research, and increases networking possibilities within research communities. Despite criticisms, peer review is still the only widely accepted method for research validation and has continued successfully with relatively minor changes for some 350 years.

Elsevier relies on the peer review process to uphold the quality and validity of individual articles and the journals that publish them.

Peer review has been a formal part of scientific communication since the first scientific journals appeared more than 300 years ago. The Philosophical Transactions of the Royal Society is thought to be the first journal to formalize the peer review process, under the editorship of Henry Oldenburg (1618–1677).

Despite many criticisms about the integrity of peer review, the majority of the research community still believes peer review is the best form of scientific evaluation. This opinion was endorsed by the outcome of a survey Elsevier and Sense About Science conducted in 2009 and has since been further confirmed by other publisher and scholarly organization surveys. Furthermore, in a 2015 survey by the Publishing Research Consortium, 82% of researchers agreed that "without peer review there is no control in scientific communication."

To learn more about peer review, visit Elsevier’s free e-learning platform Researcher Academy and see our resources below.

The peer review process

Types of peer review

Peer review comes in different flavours. Each model has its own advantages and disadvantages, and often one type of review will be preferred by a subject community. Before submitting or reviewing a paper, you must therefore check which type is employed by the journal so you are aware of the respective rules. In case of questions regarding the peer review model employed by the journal for which you have been invited to review, consult the journal’s homepage or contact the editorial office directly.  

Single anonymized review

In this type of review, the names of the reviewers are hidden from the author. This is the traditional method of reviewing and is the most common type by far. Points to consider regarding single anonymized review include:

Reviewer anonymity allows for impartial decisions, as the reviewers will not be influenced by potential criticism from the authors.

Authors may be concerned that reviewers in their field could delay publication, giving the reviewers a chance to publish first.

Reviewers may use their anonymity as justification for being unnecessarily critical or harsh when commenting on the authors’ work.

Double anonymized review

Both the reviewer and the author are anonymous in this model. Some advantages of this model are listed below.

Author anonymity limits reviewer bias, such as bias based on the author's gender, country of origin, academic status, or previous publication history.

Articles written by prestigious or renowned authors are considered based on the content of their papers, rather than their reputation.

But bear in mind that despite the above, reviewers can often identify the author through their writing style, subject matter, or self-citation – it is exceedingly difficult to guarantee total author anonymity. More information for authors can be found in our  double-anonymized peer review guidelines .

Triple anonymized review

With triple anonymized review, reviewers are anonymous to the author, and the author's identity is unknown to both the reviewers and the editor. Articles are anonymized at the submission stage and are handled in a way to minimize any potential bias towards the authors. However, it should be noted that: 

The complexities involved with anonymizing articles/authors to this level are considerable.

As with double anonymized review, there is still a possibility for the editor and/or reviewers to correctly identify the author(s) from their writing style, subject matter, citation patterns, or other methodologies.

Open review

Open peer review is an umbrella term for many different models aiming at greater transparency during and after the peer review process. The most common definition of open review is when both the reviewer and author are known to each other during the peer review process. Other types of open peer review consist of:

  • Publication of reviewers’ names on the article page
  • Publication of peer review reports alongside the article, either signed or anonymous
  • Publication of peer review reports (signed or anonymous) with authors’ and editors’ responses alongside the article
  • Publication of the paper after pre-checks and opening a discussion forum to the community, who can then comment (named or anonymous) on the article

Many believe this is the best way to prevent malicious comments, stop plagiarism, prevent reviewers from following their own agenda, and encourage open, honest reviewing. Others see open review as a less honest process, in which politeness or fear of retribution may cause a reviewer to withhold or tone down criticism. For three years, five Elsevier journals experimented with publication of peer review reports (signed or anonymous) as articles alongside the accepted paper on ScienceDirect.

Read more about the experiment

More transparent peer review

Transparency is the key to trust in peer review and as such there is an increasing call towards more  transparency around the peer review process . In an effort to promote transparency in the peer review process, many Elsevier journals therefore publish the name of the handling editor of the published paper on ScienceDirect. Some journals also provide details about the number of reviewers who reviewed the article before acceptance. Furthermore, in order to provide updates and feedback to reviewers, most Elsevier journals inform reviewers about the editor’s decision and their peers’ recommendations. 

Article transfer service: sharing reviewer comments

Elsevier authors may be invited to  transfer  their article submission from one journal to another for free if their initial submission was not successful. 

As a referee, your review report (including all comments to the author and editor) will be transferred to the destination journal, along with the manuscript. The main benefit is that reviewers are not asked to review the same manuscript several times for different journals. 

Tools & resources

Elsevier Researcher Academy modules

The certified peer reviewer course

Transparency in peer review

Reviewers’ Update articles

Peer review using today’s technology

Lifting the lid on publishing peer review reports: an interview with Bahar Mehmani and Flaminio Squazzoni

How face-to-face peer review can benefit authors and journals alike

Innovation in peer review: introducing “volunpeers”

Results masked review: peer review without publication bias

Interesting reads

"Is Peer Review in Crisis?" Perspectives in Publishing No 2, August 2004, by Adrian Mulligan

“The history of the peer-review process” Trends in Biotechnology, 2002, by Ray Spier

Publishing Research Consortium. Peer review survey 2015. Elsevier; 2015.


The emergence of a field: a network analysis of research on peer review

Affiliations

  • 1 Department of Theoretical Computer Science, Institute of Mathematics, Physics and Mechanics, Jadranska 19, 1000 Ljubljana, Slovenia.
  • 2 Andrej Marušič Institute, University of Primorska, Muzejski trg 2, 6000 Koper, Slovenia.
  • 3 Faculty of Social Sciences, University of Ljubljana, Kardeljeva pl. 5, 1000 Ljubljana, Slovenia.
  • 4 Department of Economics and Management, University of Brescia, Via San Faustino 74/B, 25122 Brescia, Italy.
  • PMID: 29056788
  • PMCID: PMC5629241
  • DOI: 10.1007/s11192-017-2522-8

This article provides a quantitative analysis of peer review as an emerging field of research by revealing patterns and connections between authors, fields and journals from 1950 to 2016. By collecting all available sources from Web of Science, we built a dataset that included approximately 23,000 indexed records and reconstructed collaboration and citation networks over time. This allowed us to trace the emergence and evolution of this field of research by identifying relevant authors, publications and journals and revealing important development stages. Results showed that while the term "peer review" itself was relatively unknown before 1970 ("referee" was more frequently used), publications on peer review grew significantly, especially after 1990. We found that the field was marked by three development stages: (1) before 1982, in which the most influential studies were made by social scientists; (2) from 1983 to 2002, in which research was dominated by biomedical journals; and (3) from 2003 to 2016, in which specialised journals on science studies, such as Scientometrics, gained momentum, frequently publishing research on peer review and thus becoming the most influential outlets. The evolution of citation networks revealed a body of 47 publications that form the main path of the field, i.e., sources cited in all the most influential publications. They could be viewed as the main corpus of knowledge for any newcomer to the field.

Keywords: Authors; Citation networks; Journals; Main path; Peer review.

Figures: record from Web of Science; citation network structure; growth of the number of works and the citation year distribution; degree distributions in the citation network; preprint transformation; selected strong components; search path count method (SPC); main paths for the 100 largest weights; cuts and islands; SPC islands [20,200]; SPC link Island 1 [100].




Evaluating Resources: Peer Review

What is peer review?

The term peer review can be confusing, since in some of your courses you may be asked to review the work of your peers. When we talk about peer-reviewed journal articles, this has nothing to do with your peers!

Peer-reviewed journals, also called refereed journals, are journals that use a specific scholarly review process to try to ensure the accuracy and reliability of published articles. When an article is submitted to a peer-reviewed journal for publication, the journal sends the article to other scholars/experts in that field and has them review the article for accuracy and reliability.

Find out more about peer review with our Peer Review Guide:

  • Peer Review Guide

Types of peer review

Single blind

In this process, the names of the reviewers are not known to the author(s). The reviewers do know the name of the author(s).

Double blind

Here, neither reviewers nor authors know each other's names.

Open review

In the open review process, both reviewers and authors know each other's names.

What about editorial review?

Journals also use an editorial review process. This is not the same as peer review. In an editorial review process an article is evaluated for style guidelines and for clarity. Reviewers here do not look at technical accuracy or errors in data or methodology, but instead look at grammar, style, and whether an article is well written.

What is the difference between scholarly and peer review?

Not all scholarly journals are peer reviewed, but all peer-reviewed journals are scholarly.

  • Things that are written for a scholarly or academic audience are considered scholarly writing.
  • Peer-reviewed journals are a part of the larger category of scholarly writing.
  • Scholarly writing includes many resources that are not peer reviewed, such as books, textbooks, and dissertations.

Scholarly writing does not come with a label that says scholarly . You will need to evaluate the resource to see if it is

  • aimed at a scholarly audience
  • reporting research, theories or other types of information important to scholars
  • documenting and citing sources used to help authenticate the research done

The standard peer review process only applies to journals. While scholarly writing has certainly been edited and reviewed, peer review is a specific process only used by peer-reviewed journals. Books and dissertations may be scholarly, but are not considered peer reviewed.

Check out Select the Right Source for help with what kinds of resources are appropriate for discussion posts, assignments, projects, and more:

  • Select the Right Source

How do I locate or verify peer-reviewed articles?

The peer review process is initiated by the journal publisher before an article is even published. Nowhere in the article will it tell you whether or not the article has gone through a peer review process.

You can locate peer-reviewed articles in the Library databases, typically by checking a limiter box.

  • Quick Answer: How do I find scholarly, peer reviewed journal articles?

You can verify whether a journal uses a peer review process by using Ulrich's Periodicals Directory.

  • Quick Answer: How do I verify that my article is peer reviewed?

What about resources that are not peer-reviewed?

Limiting your search to peer review is a way to ensure that you're looking at scholarly journal articles, and not popular or trade publications. Because peer-reviewed articles have been vetted by experts in the field, they are held to a higher standard and considered a high-quality source, which is why professors often prefer them.

There are times, though, when the information you need may not be available in a peer-reviewed article.

  • You may need to find original work on a theory that was first published in a book.
  • You may need to find very current statistical data that comes from a government website.
  • You may need background information that comes from a scholarly encyclopedia.

You will want to evaluate these resources to make sure that they are the best source for the information you need.

Note: If you are required for an assignment to find information from a peer-reviewed journal, then you will not be able to use non-peer-reviewed sources such as books, dissertations, or government websites. It's always best to clarify any questions over assignments with your professor.



  • Feature Article

Meta-Research: Large-scale language analysis of peer review reports

  • Ivan Buljan
  • Daniel Garcia-Costa
  • Francisco Grimaldo
  • Flaminio Squazzoni
  • Ana Marušić
  • Department of Research in Biomedicine and Health, University of Split School of Medicine, Croatia
  • Department d'Informàtica, University of Valencia, Spain
  • Department of Social and Political Sciences, University of Milan, Italy

Introduction


Peer review is often criticized for being flawed, subjective and biased, but research into peer review has been hindered by a lack of access to peer review reports. Here we report the results of a study in which text-analysis software was used to determine the linguistic characteristics of 472,449 peer review reports. A range of characteristics (including analytical tone, authenticity, clout, three measures of sentiment, and morality) were studied as a function of reviewer recommendation, area of research, type of peer review and reviewer gender. We found that reviewer recommendation had the biggest impact on the linguistic characteristics of reports, and that area of research, type of peer review and reviewer gender had little or no impact. The lack of influence of research area, type of review or reviewer gender on the linguistic characteristics is a sign of the robustness of peer review.

Most journals rely on peer review to ensure that the papers they publish are of a certain quality, but there are concerns that peer review suffers from a number of shortcomings ( Grimaldo et al., 2018 ; Fyfe et al., 2020 ). These include gender bias, and other less obvious forms of bias, such as more favourable reviews for articles with positive findings, articles by authors from prestigious institutions, or articles by authors from the same country as the reviewer ( Haffar et al., 2019 ; Lee et al., 2013 ; Resnik and Elmore, 2016 ).

Analysing the linguistic characteristics of written texts, speeches, and audio-visual materials is well established in the humanities and psychology ( Pennebaker, 2017 ). A recent example of this is the use of machine learning by Garg et al. to track gender and ethnic stereotypes in the United States over the past 100 years ( Garg et al., 2018 ). Similar techniques have been used to analyse scientific articles, with an early study showing that scientific writing is a complex process that is sensitive to formal and informal standards, context-specific canons and subjective factors ( Hartley et al., 2003 ). Later studies found that fraudulent scientific papers seem to be less readable than non-fraudulent papers ( Markowitz and Hancock, 2016 ), and that papers in economics written by women are better written than equivalent papers by men (and that this gap increases during the peer review process; Hengel, 2018 ). There is clearly scope for these techniques to be used to study other aspects of the research and publishing process.

Most research to date on the linguistic characteristics of peer review has focused on comparisons between different types of peer review, and it has been shown that open peer review (in which peer review reports and/or the names of reviewers are made public) leads to longer reports and a more positive emotional tone compared to confidential peer review ( Bravo et al., 2019 ; Bornmann et al., 2012 ). Similar techniques have been used to explore possible gender bias in the peer review of grant applications, but a consensus has not yet been reached ( Marsh et al., 2011 ; Magua et al., 2017 ). To date, however, these techniques have not been applied to the peer review process at a large scale, largely because most journals strictly limit access to peer review reports.

Here we report the results of a linguistic analysis of 472,449 peer review reports from the PEERE database ( Squazzoni et al., 2017 ). The reports came from 61 journals published by Elsevier in four broad areas of research: health and medical sciences (22 journals); life sciences (5); physical sciences (30); social sciences and economics (4). For each review we had data on the following: i) the recommendation made by the reviewer (accept [n = 26,387, 5.6%]; minor revisions required [n = 134,858, 28.5%]; major revisions required [n = 161,696, 34.2%]; reject [n = 149,508, 31.7%]); ii) the broad area of research; iii) the type of peer review used by the journal (single-blind [n = 411,727, 87.1%] or double-blind [n = 60,722, 12.9%]); and iv) the gender of the reviewer (75.9% were male; 24.1% were female).

We used various linguistic tools to examine the peer review reports in our sample (see Methods for more details). Linguistic Inquiry and Word Count (LIWC) text-analysis software was used to perform word counts and to return scores of between 0% and 100% for ‘analytical tone’, ‘clout’ and ‘authenticity’ ( Pennebaker et al., 2015 ). Three different approaches were used to perform sentiment analysis: i) LIWC returns a score between 0% and 100% for ‘emotional tone’ (with more positive emotions leading to higher scores); ii) the SentimentR package returns a majority of scores between –1 (negative sentiment) and +1 (positive sentiment), with an extremely low number of results outside that range (0.03% in our sample); iii) the Stanford CoreNLP returns a score between 0 (negative sentiment) to +4 (positive sentiment). We also used LIWC to analyse the reports in terms of five foundations of morality ( Graham et al., 2009 ).

Length of report

For all combinations of area of research, type of peer review and reviewer gender, reports recommending accept were shortest, followed by reports recommending minor revisions, reject, and major revisions ( Figure 1 ). Reports written by reviewers for social sciences and economics journals were significantly longer than those written by reviewers for medical journals; men also tended to write longer reports than women; however, the type of peer review (i.e., single- vs. double-blind) did not have any influence on the length of reports (see Table 2 in Supplementary file 1 ).


Word counts in peer review reports.

Word count (mean and 95% confidence interval; LIWC analysis) of peer review reports in four broad areas of research for double-blind review (top) and single-blind review (bottom), and for female reviewers (left) and male reviewers (right). Reports recommending accept (red) were consistently the shortest, and reports recommending major revisions (green) were consistently the longest. See Supplementary file 1 for summary data and mixed model linear regression coefficients and residuals. HMS: health and medical sciences; LS: life sciences; PS: physical sciences; SS&E: social sciences and economics.

Analytical tone, clout and authenticity

LIWC returned high scores (typically between 85.0 and 91.0) for analytical tone, and low scores (typically between 18.0 and 25.0) for authenticity, for the peer review reports in our sample ( Figure 2A,C ; Figure 2—figure supplement 1A,C ). High authenticity of a text is defined as the use of more personal words (I-words), present tense words, and relativity words, and fewer non-personal words and modal words ( Pennebaker et al., 2015 ). Low authenticity and high analytical tone are characteristic of texts describing medical research ( Karačić et al., 2019 ; Glonti et al., 2017 ). There was some variation with reviewer recommendation in the scores returned for clout, with accept having the highest scores for clout, followed by minor revisions, major revisions and reject ( Figure 2B ; Figure 2—figure supplement 1B ).


Analytical tone, clout and authenticity in peer review reports for single-blind review.

Scores returned by LIWC (mean percentages and 95% confidence interval) for analytical tone ( A ), clout ( B ) and authenticity ( C ) for peer review reports in four broad areas of research for female reviewers (left) and male reviewers (right) using single-blind review. Reports recommending accept (red) consistently had the most clout, and reports recommending reject (purple) consistently had the least clout. See Supplementary files 2 – 4 for summary data, mixed model linear regression coefficients and residuals, and examples of reports with high and low scores for analytical tone, clout and authenticity. HMS: health and medical sciences; LS: life sciences; PS: physical sciences; SS&E: social sciences and economics.

When reviewers recommended major revisions, the text of the report was more analytical. The analytical tone was higher when reviewers were women and for single-blind peer review, but we did not find any effect of the area of research (see Table 4 in Supplementary file 2 ).

Clout levels varied with area of research, with the highest levels in social sciences and economics journals (see Table 7 in Supplementary file 3 ). When reviewers recommended rejection, the text showed low levels of clout, as it did when reviewers were men and when the journal used single-blind peer review (see Table 7 in Supplementary file 3 ).

The text of reports in social sciences and economics journals had the highest levels of authenticity. Authenticity was also higher when reviewers recommended rejection. There was no significant variation in authenticity by reviewer gender or type of peer review (see Table 10 in Supplementary file 4 ).

Sentiment analysis

The three approaches used to perform sentiment analysis on our sample – LIWC, SentimentR and the Stanford CoreNLP – produced similar results. Reports recommending accept had the highest scores, indicating higher sentiment, followed by reports recommending minor revisions, major revisions and reject ( Figure 3 ; Figure 3—figure supplement 1 ). Furthermore, reports for social sciences and economics journals had the highest levels of sentiment, as did reviews written by women. We did not find any association between sentiment and the type of peer review (see Table 13 in  Supplementary file 5 , Table 16 in  Supplementary file 6 and Table 19 in  Supplementary file 7 ).
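Because the three tools return scores on different native scales (LIWC 0–100, SentimentR mostly −1 to +1, Stanford CoreNLP 0 to +4), comparing them side by side requires mapping onto a common interval. A minimal sketch of such a mapping (the ranges are those stated above; the linear rescaling itself is our illustration, not part of the published analysis):

```python
def to_unit_interval(score: float, lo: float, hi: float) -> float:
    """Linearly map a score from its native range [lo, hi] onto [0, 1]."""
    return (score - lo) / (hi - lo)

# Native ranges as described in the text:
# LIWC emotional tone 0-100, SentimentR mostly -1..+1, CoreNLP 0..+4.
liwc = to_unit_interval(75.0, 0.0, 100.0)       # 0.75
sentiment_r = to_unit_interval(0.5, -1.0, 1.0)  # 0.75
corenlp = to_unit_interval(3.0, 0.0, 4.0)       # 0.75
```

On the common scale, these three hypothetical scores all express the same moderately positive sentiment.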


Sentiment analysis of peer review reports for single-blind review.

Scores for sentiment analysis returned by LIWC ( A ; mean percentage and 95% confidence interval, CI), SentimentR ( B ; mean score and 95% CI), and Stanford CoreNLP ( C ; mean score and 95% CI) for peer review reports in four broad areas of research for female reviewers (left) and male reviewers (right) using single-blind review. See Supplementary files 5 – 7 for summary data, mixed model linear regression coefficients and residuals, and examples of reports with high and low scores for sentiment according to LIWC, SentimentR and Stanford CoreNLP analysis.

Moral foundations

LIWC was also used to explore the morality of the reports in our sample ( Graham et al., 2009 ). The differences between peer review recommendations were statistically significant. Reports recommending acceptance had the highest scores for general morality, followed by reports recommending minor revisions, major revisions and reject ( Figure 4 ). Regarding the research area, the lowest proportion of words related to morality was found in the social sciences and economics; morality scores were also lower when reviewers were men and when single-blind peer review was used ( Figure 4 ).


Moral foundations in peer review reports.

Scores returned by LIWC (mean percentage on a log scale) for general morality in peer review reports in four broad areas of research for double-blind review (top) and single-blind review (bottom), and for female reviewers (left) and male reviewers (right). Reports recommending accept (red) consistently had the highest scores. See Supplementary file 8 for lists of the ten most frequent words found in peer review reports for general morality and the five moral foundation variables. HMS: health and medical sciences; LS: life sciences; PS: physical sciences; SS&E: social sciences and economics.

We also explored five foundations of morality – care/harm, fairness/cheating, loyalty/betrayal, authority/subversion, and sanctity/degradation – but no clear patterns emerged ( Figure 4—figure supplements 1 – 5 ). See the Methods section for more details, and Supplementary file 8 for lists of the ten most common phrases from the LIWC Moral Foundation dictionary. In general, the prevalence of these words was minimal, with average scores lower than 1%. Moreover, these words tended to be part of common phrases and thus did not speak to the moral content of the reviews. This suggests that a combination of qualitative and quantitative methods, including machine learning tools, will be required to explore the moral aspects of peer review.

Our study suggests that the reviewer recommendation has the biggest influence on the linguistic characteristics (and length) of peer review reports, which is consistent with previous, case-based research ( Casnici et al., 2017 ). It is probable that whenever reviewers recommend revision, they write a longer report in order to justify their requests and/or to suggest changes to improve the manuscript (which they do not have to do when they recommend acceptance or rejection). In our study, in the case of the two more negative recommendations (reject and major revisions), the reports were longer, and the language was less emotional and more analytical. We found that the type of peer review – single-blind or double-blind – had no significant influence on the reports, contrary to previous reports on smaller samples ( Bravo et al., 2019 ; van Rooyen et al., 1999 ). Likewise, area of research had no significant influence on the reports in the sample, and neither did reviewer gender, which is consistent with a previous smaller study ( Bravo et al., 2019 ). The lack of influence exerted by the area of research, the type of peer review or the reviewer gender on the linguistic characteristics of the reports is a sign of the robustness of peer review.

The results of our study should be considered in the light of certain limitations. Most of the journals were in the health and medical sciences and the physical sciences, and most used single-blind peer review. However, the size, depth and uniqueness of our dataset helped us provide a more comprehensive analysis of peer review reports than previous studies, which were often limited to small samples and incomplete data ( van den Besselaar et al., 2018 ; Sizo et al., 2019 ; Falk Delgado et al., 2019 ). Future research would benefit from baseline data against which results could be compared (although our results match the preliminary results from a study at a single biomedical journal; Glonti et al., 2017 ), and from knowing more about the referees (such as their status or expertise). Finally, we did not examine the actual content of the manuscripts under review, so we could not determine how reliable reviewers were in their assessments. Combining language analyses of peer review reports with estimates of peer review reliability for the same manuscripts (via inter-reviewer ratings) could provide new insights into the peer review process.

The PEERE dataset

PEERE is a collaboration between publishers and researchers ( Squazzoni et al., 2020 ), and the PEERE dataset contains 583,365 peer review reports from 61 journals published by Elsevier, with data on reviewer recommendation, area of research (health and medical sciences; life sciences; physical sciences; social sciences and economics), type of peer review (single-blind or double-blind), and reviewer gender for each report. Most of the reports (N = 481,961) are for original research papers, with the rest (N = 101,404) being for opinion pieces, editorials and letters to the editor. The database was first filtered to exclude reviews that referred to manuscript revisions, resulting in 583,365 reports. We then excluded 110,636 reports for which reviewer gender could not be determined, and 260 for which no recommendation was recorded. Our analysis was therefore performed on a total of 472,449 peer review reports.

Gender determination

To determine reviewer gender, we followed a standard disambiguation algorithm that has already been validated on a dataset of scientists extracted from the Web of Science database covering a similar publication time window ( Santamaría and Mihaljević, 2018 ). Gender was assigned using a multi-stage inference procedure consisting of three steps. First, we performed a preliminary gender determination using, when available, the gender salutation (i.e., Mr, Mrs, Ms...). Second, we queried the Python package gender-guesser with the extracted first names and country of origin, if any. Gender-guesser has been shown to achieve the lowest misclassification rate and to introduce the smallest gender bias ( Paltridge, 2017 ). Lastly, we queried the best-performing gender-inference service, Gender API ( https://gender-api.com/ ), and used the returned gender whenever we found a minimum of 62 samples with at least 57% accuracy, following the optimal values found in benchmark 2 of previous research ( Santamaría and Mihaljević, 2018 ). This threshold for the confidence parameters ensured that the rate of misclassified names did not exceed 5% ( Santamaría and Mihaljević, 2018 ). This allowed us to determine the gender of 81.1% of reviewers, of whom 75.9% were male and 24.1% female. With regard to the three possible gender sources, 6.3% of genders came from the salutation, 77.2% from gender-guesser, and 16.5% from the Gender API. The remaining 18.9% of reviewers were assigned an unknown gender. This level of gender determination is consistent with the non-classification rate for names of scientists in previous research ( Santamaría and Mihaljević, 2018 ).
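The three-step fallback described above can be sketched as follows. The lookup tables are hypothetical stand-ins for the salutation field, the gender-guesser package and the Gender API service; only the acceptance thresholds (62 samples, 57% accuracy) come from the text:

```python
# Hypothetical stand-ins for the three sources used in the procedure.
SALUTATIONS = {"mr": "male", "mrs": "female", "ms": "female"}
NAME_GUESSES = {"maria": "female", "ivan": "male"}   # gender-guesser stand-in
API_GUESSES = {"noa": ("female", 60, 0.58)}          # (gender, samples, accuracy)

def infer_gender(salutation=None, first_name=None,
                 min_samples=62, min_accuracy=0.57):
    """Multi-stage fallback: salutation first, then a name dictionary, then an
    API result accepted only above the sample and accuracy thresholds."""
    if salutation and salutation.lower() in SALUTATIONS:
        return SALUTATIONS[salutation.lower()]
    name = (first_name or "").lower()
    if name in NAME_GUESSES:
        return NAME_GUESSES[name]
    if name in API_GUESSES:
        gender, samples, accuracy = API_GUESSES[name]
        if samples >= min_samples and accuracy >= min_accuracy:
            return gender
    return "unknown"
```

In this sketch, "Noa" stays unknown because the (invented) API result has only 60 samples, below the 62-sample threshold; in the study, such reviewers fell into the 18.9% with unassigned gender.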

Analytical tone, authenticity and clout

We used a version of the Linguistic Inquiry and Word Count (LIWC) text-analysis software with standardized scores ( http://liwc.wpengine.com/ ) to analyse the peer review reports in our sample. LIWC measures the percentage of words related to three psychological features (so scores range from 0 to 100): ‘analytical tone’, ‘clout’ and ‘authenticity’. A high score for analytical tone indicates a report with a logical and hierarchical style of writing. Clout reveals personal sensitivity towards social status, confidence or leadership: a low score for clout is associated with insecurities and a less confident and more tentative tone ( Kacewicz et al., 2014 ). A high score for authenticity indicates a report written in a style that is honest and humble, whereas a low score indicates a style that is deceptive and superficial ( Pennebaker et al., 2015 ). The words people use also reflect how authentic or personal they sound: people who are authentic tend to use more I-words (e.g. I, me, mine), present-tense verbs, and relativity words (e.g. near, new), and fewer she-he words (e.g. his, her) and discrepancies (e.g. should, could) ( Pennebaker et al., 2015 ).
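As word-count-based measures, LIWC-style scores can be sketched as below. The real LIWC dictionaries are proprietary and far larger; the tiny I-word set here is purely illustrative of one ingredient of the authenticity score:

```python
import re

def category_percentage(text: str, category_words: set) -> float:
    """Percentage of words in `text` that belong to a category dictionary,
    mirroring how LIWC-style scores are computed from word counts."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in category_words)
    return 100.0 * hits / len(words)

# Illustrative mini-dictionary of I-words
I_WORDS = {"i", "me", "my", "mine"}
```

For example, `category_percentage("I think the methods in my view are sound", I_WORDS)` counts 2 hits out of 9 words, giving roughly 22%.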

We used three different methodological approaches to assess sentiment. (i) LIWC measures ‘emotional tone’, which indicates writing dominated by either positive or negative emotions, by counting the number of words from a pre-specified dictionary. (ii) The SentimentR package ( Rinker, 2019 ) quantifies the proportion of words related to sentiment in the text, similarly to the ‘emotional tone’ scores in LIWC but using a different vocabulary. The SentimentR score reflects the valence of sentiment-related words, with the majority of scores (99.97% in our sample) ranging from −1 (negative sentiment) to +1 (positive sentiment). (iii) Stanford CoreNLP is a deep language analysis program that uses machine learning to determine the emotional valence of the text ( Socher et al., 2013 ); its score ranges from 0 (negative sentiment) to +4 (positive sentiment). Examples of characteristic text variables from the peer review reports analysed with these approaches are given in Supplementary files 5 – 7 .
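A dictionary-and-valence score in the spirit of SentimentR can be sketched as below. The mini-lexicon is hypothetical, and the real SentimentR additionally handles valence shifters (negators, amplifiers), which this sketch omits:

```python
import re

# Hypothetical mini valence lexicon (the real lexicon is far larger)
VALENCE = {"clear": 0.5, "excellent": 1.0, "unclear": -0.5, "flawed": -0.75}

def mean_valence(text: str) -> float:
    """Average valence of lexicon words over all words in the text, giving a
    score that, like SentimentR's, usually falls within [-1, +1]."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    return sum(VALENCE.get(w, 0.0) for w in words) / len(words)
```

Under this toy lexicon, "the methods are flawed" scores −0.75/4 ≈ −0.19, while "the presentation is excellent and clear" scores 1.5/6 = 0.25.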

We used LIWC and Moral Foundations Theory ( https://moralfoundations.org/other-materials/ ) to analyse the reports in our sample according to five moral foundations: care/harm (also known as care-virtue/care-vice); fairness/cheating (or fairness-virtue/fairness-vice); loyalty/betrayal (or loyalty-virtue/loyalty-vice); authority/subversion (authority virtue/authority-vice); and sanctity/degradation (or sanctity-virtue/sanctity-vice).

Statistical methods

Data were analysed using the R programming language, version 3.6.3 ( R Development Core Team, 2017 ). To test interaction effects and compare different peer review characteristics, we conducted a mixed model linear analysis on each variable (analytical tone, authenticity, clout; the measures of sentiment; and the measures of morality) with reviewer recommendation, area of research, type of peer review (single- or double-blind) and reviewer gender as fixed factors (predictors) and journal, word count and article type as random factors. This controlled for across-journal variation, report length and article type.
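The model described above can be written out schematically as follows (our notation; for readability the random terms for word count and article type are folded into a single journal-level random intercept $u_j$):

```latex
y_{ij} = \beta_0 + \beta_1\,\mathrm{recommendation}_{ij}
       + \beta_2\,\mathrm{area}_{ij} + \beta_3\,\mathrm{type}_{ij}
       + \beta_4\,\mathrm{gender}_{ij} + u_j + \varepsilon_{ij}
```

where $y_{ij}$ is the linguistic score (e.g. clout) of report $i$ in journal $j$, $u_j \sim \mathcal{N}(0, \sigma_u^2)$ is the journal random effect, and $\varepsilon_{ij}$ is the residual error.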

The journal dataset required a data sharing agreement to be established between the authors and publishers. A protocol on data sharing entitled 'TD1306 COST Action New frontiers of peer review (PEERE) PEERE policy on data sharing on peer review' was signed by all partners involved in this research on 1 March 2017, as part of a collaborative project funded by the EU Commission. The protocol established rules and practices for data sharing from a sample of scholarly journals, which included a specific data management policy covering data minimization, retention and storage, privacy impact assessment, anonymization, and dissemination. The protocol required that data access and use were restricted to the authors of this manuscript and that data aggregation and reporting were done in such a way as to avoid any identification of the publishers, journals or individual records involved. The protocol was written to protect the interests of any stakeholder involved, including publishers, journal editors and academic scholars, who could potentially be affected by data sharing, use and release. The full version of the protocol is available on the peere.org website. To request additional information on the dataset and for any claim or objection, please contact the PEERE data controller at [email protected].

  • Peter Rodgers, Senior and Reviewing Editor, eLife, United Kingdom
  • Erin Hengel, Reviewer

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Thank you for submitting your article "Meta research: Large-scale language analysis of peer review reports reveals lack of moral bias" to eLife for consideration as a Feature Article. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by the eLife Features Editor. One of the reviewers was Erin Hengel; the other two reviewers have opted to remain anonymous.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

The paper presents an explorative study of the linguistic content of a very large set (roughly 500,000) of peer review reports. To our knowledge, this is the first paper addressing such a large data set on peer review and the potential biases often experienced by scholars of all fields. This is highly valuable to academic research as a whole. However, we do not believe the evidence warrants the authors' conclusion that peer review lacks moral bias (see below), and we suggest that the authors revise their paper to focus exclusively on the LIWC indicators (and conduct further research into moral bias, informed by the comments from referee #2 at the end of this email, with a view to writing a second manuscript for future submission). There are also a number of other issues that need to be addressed in a revised manuscript.

Essential revisions:

1. Reducing the emphasis on moral bias in the present manuscript will involve deleting the last 10 rows of table 1 (and similarly for tables 2, 3 and 4), and making changes to the text. The title also needs to be changed, but Figure 1 does not need to be changed.

2. The authors should analyze the data they have for word length, analytical thinking, clout, authenticity, emotional tone and morality in general in greater detail through regression analyses and/or multilevel models.

Here is what referee #1 said on this topic: I am concerned that the explorative approach misses a lot of underlying information. By comparing means of groups, the in-group variance is overlooked. Some of the observed effects might be large enough to hold some sort of universal "truth", but for other cases, substantial effects might still exist within groups. My suggestion would be to rethink the design into one that allows for multidimensionality. One approach could be regression analysis, which would require a more strict type of test-design though. Another approach could be to build reviewer-profiles based on their characteristics. E.g. reviews with a high degree of clout and negative emotion, and low analytical thinking could be one type. Where is this type found? What characterizes it? Additionally, this would allow the authors to include information about the reviewer, e.g. are they always negative? This might also solve the baseline problem, as this would be a classification issue rather than a measurement issue.

Here is what referee #3 said on this topic: All the analyses are descriptive ANOVAs and the like (bivariate). Given the supposed quality of the data, I'd recommend they explore multilevel models. Then they can fix out differences by fields, journals, etc so we get a more clear sense of what's going on.

3. Please include a fuller description of your dataset and describe how representative/biased the sampling is by field and type of journal.
4. Please provide more information on what LIWC measures and how it is calculated. It would be especially helpful if you showed several LIWC scores together with sample texts (preferably from your own database) to illustrate how well it analyses text. This can be shown in an appendix. If you can't show text from your own database due to privacy concerns, feel free to show passages from this report. (Alternatively, you could take a few reports from, say, the BMJ, which makes referee reports publicly available for published papers.)
5. What are baseline effects? What kind of changes should we expect? E.g. when arguing that the language is cold and analytical, what is this comparable to? I would expect most scientific writing to be mostly in this category, and it should not be a surprise - I hope. It would be very useful for the reader to have some type of comparison.
6. Does the analysis allow for the fact that different subject areas will have different LIWC scores? The largest sample of reports comes from the physical sciences, which use single-blind review the most. Reports from this field are also shorter, slightly more analytic and display less Clout and Authenticity. I think your results are picking up this selection bias instead of representing actual differences between the two review processes.

Please discuss.

7. Please add a section on the limitations of your study. For example, there is no discussion of sample bias and representation really.
Also, LIWC is full of issues and problems (NLP has come a ways since LIWC arrived on the scene): do you use the standardized version of LIWC constructs with high cronbach alphas, or the raw versions with poor cronbach alphas?
Does the analysis distinguish between original research content and other forms of content (eg, editorials, opinion pieces, book reviews etc)?

[We repeat the reviewers’ points here in italic, and include our replies point by point in Roman.]

Essential revisions: 1. Reducing the emphasis on moral bias in the present manuscript will involve deleting the last 10 rows of table 1 (and similarly for tables 2, 3 and 4), and making changes to the text. The title also needs to be changed, but Figure 1 does not need to be changed.

Thank you for your comment. We revised the tables according to the reviewers’ recommendations and created new tables where we present a multilevel description of the review reports based on the interaction effects of reviewer recommendation, journal discipline, journal peer review type and reviewer gender. We created the tables for the five LIWC variables (word count, analytical tone, clout, authenticity and emotional tone) and two additional measures (SentimentR, an R package that has its own dictionaries for the emotional tone of the text, and Stanford CoreNLP, a deep language analysis tool, which served as a concurrent validity assessment of the tone variables from the LIWC package). All multilevel relations are now presented in new figures, which are the results of the mixed model linear regression in which we controlled for the random effect of the journal, word count (except for LIWC word count) and article type. In light of the new results, and in line with the referee recommendations, we changed the title and replaced Figure 1. We now have seven new graphs describing the linguistic characteristics of the reviews between groups in the main text and eleven graphs presenting the moral variables in the Supplementary file.

Thank you for your comment. As mentioned above, we performed a new analysis using a mixed model approach with reviewer recommendation, journal discipline, journal peer review type and reviewer gender as predictors (fixed factors) and journal, word count and article type as random factors, which enabled us to control for variation between journals (there were 61 journals in total, of which some were more represented than others, and the majority of the articles were original research articles). We found significant interactions of reviewer recommendation, the journal's field of research, the type of peer review and the reviewer's gender in each variable assessment, but we understand that this significance could be due to the large sample size. We therefore presented figures with the within-group relations on standardized scales, showing the differences between groups.

Here is what referee #1 said on this topic: I am concerned that the explorative approach misses a lot of underlying information. By comparing means of groups, the in-group variance is overlooked. Some of the observed effects might be large enough to hold some sort of universal "truth", but for other cases, substantial effects might still exist within groups. My suggestion would be to rethink the design into one that allows for multidimensionality. One approach could be regression analysis, which would require a stricter type of test-design though. Another approach could be to build reviewer-profiles based on their characteristics. E.g. reviews with a high degree of clout and negative emotion, and low analytical thinking could be one type. Where is this type found? What characterizes it? Additionally, this would allow the authors to include information about the reviewer, e.g. are they always negative? This might also solve the baseline problem, as this would be a classification issue rather than a measurement issue.

Thank you for the comment. Excellent point. We re-performed the analysis accordingly. The mixed model approach revealed that the majority of the differences in the writing style of the reviews can be attributed to reviewer recommendations, and much less to the journal's field of research, the type of peer review and the reviewer's gender. We tried to provide an overview of the general writing style in peer reviews by presenting the relevant variables in the same graphs, so that a reader can see which peer review characteristics predict different language styles.

Here is what referee #3 said on this topic: All the analyses are descriptive ANOVAs and the like (bivariate). Given the supposed quality of the data, I'd recommend they explore multilevel models. Then they can fix out differences by fields, journals, etc so we get a clearer sense of what's going on.

As explained above, we used a mixed model approach in which these effects were analysed jointly. The current analysis provides an overview of the interaction effects in peer review characteristics and the sizes of the differences between them.

Thank you, we added this both to the methods and the limitations in the Discussion section.

LIWC has a dictionary of words associated with different tones, and it counts the number of words for each tone type in a given text. The LIWC output is the percentage of words from a tone category in the text. We have now provided the calculation of the different tone variables in the Supplementary file, both for high and low levels of tone. The examples are anonymized.

Another good point. The results in our study were similar to an unpublished study that focused on the analysis of peer review linguistic characteristics, but on a much smaller sample and only in a single journal ( https://peerreviewcongress.org/prc17-0234 ). However, with the new methodological approach we looked at the relationship between the linguistic characteristics and different aspects of the peer review process and found important differences. The analytical tone was indeed predominant in all types of peer review reports, but we found differences in other linguistic characteristics. The new results are presented in the revised manuscript.

Thank you for your comment. The dataset characteristics are now described in the limitations in the Discussion section, and we are aware that there is a higher prevalence of journals from the physical sciences, single-blind reviews, and reviews which asked for revisions. However, the new analyses now include the interaction of peer review characteristics, and so we introduced better control for this selection bias.

As mentioned previously, we added a limitation section to the revised manuscript.

The LIWC version we used is the standardized version with high Cronbach alphas. This has now been clarified in the Methods section of the revised manuscript. We also analysed the data using Stanford CoreNLP deep learning tool in order to increase internal validity of our approach.

There were no book reviews in the dataset. However, we did distinguish between original articles and other formats (a total of 388,737 original articles and 83,972 articles of other types among those included in the mixed model analyses), which is now described in the Methods. We used this as the random factor in the mixed model linear regression.

Author details

Ivan Buljan is in the Department of Research in Biomedicine and Health, University of Split School of Medicine, Split, Croatia

Daniel Garcia-Costa is in the Department d'Informàtica, University of Valencia, Burjassot-València, Spain

Francisco Grimaldo is in the Department d'Informàtica, University of Valencia, Burjassot-València, Spain

Flaminio Squazzoni is in the Department of Social and Political Sciences, University of Milan, Milan, Italy

Ana Marušić is in the Department of Research in Biomedicine and Health, University of Split School of Medicine, Split, Croatia

Funding

Ministerio de Ciencia e Innovación (RTI2018-095820-B-I00), Spanish Agencia Estatal de Investigación (RTI2018-095820-B-I00), European Regional Development Fund (RTI2018-095820-B-I00), and Croatian Science Foundation (IP-2019-04-4882).

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Dr Bahar Mehmani from Elsevier for helping us with data collection.

Publication history

  • Received: November 1, 2019
  • Accepted: July 16, 2020
  • Accepted Manuscript published : July 17, 2020
  • Version of Record published : July 29, 2020

© 2020, Buljan et al.

This article is distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use and redistribution provided that the original author and source are credited.

Categories and tags

  • peer review
  • sentiment analysis
  • linguistic analysis
  • meta-research
  • scientific publishing

  • Research article
  • Open access
  • Published: 22 August 2024

A systematic review and meta-analysis of randomized trials of substituting soymilk for cow’s milk and intermediate cardiometabolic outcomes: understanding the impact of dairy alternatives in the transition to plant-based diets on cardiometabolic health

  • M. N. Erlich 1,2,
  • D. Ghidanac 1,2,
  • S. Blanco Mejia 1,2,
  • T. A. Khan 1,2,
  • L. Chiavaroli 1,2,3,
  • A. Zurbau 1,2,
  • S. Ayoub-Charette 1,2,
  • A. Almneni 4,
  • M. Messina 5,
  • L. A. Leiter 1,2,3,6,7,
  • R. P. Bazinet 1,
  • D. J. A. Jenkins 1,2,3,6,7,
  • C. W. C. Kendall 1,2,8 &
  • J. L. Sievenpiper 1,2,3,6,7

BMC Medicine volume 22, Article number: 336 (2024)

Dietary guidelines recommend a shift to plant-based diets. Fortified soymilk, a prototypical plant protein food used in the transition to plant-based diets, usually contains added sugars to match the sweetness of cow’s milk and is classified as an ultra-processed food. Whether soymilk can replace minimally processed cow’s milk without the adverse cardiometabolic effects attributed to added sugars and ultra-processed foods remains unclear. We conducted a systematic review and meta-analysis of randomized controlled trials to assess the effect of substituting soymilk for cow’s milk and its modification by added sugars (sweetened versus unsweetened) on intermediate cardiometabolic outcomes.

MEDLINE, Embase, and the Cochrane Central Register of Controlled Trials were searched (through June 2024) for randomized controlled trials of ≥ 3 weeks in adults. Outcomes included established markers of blood lipids, glycemic control, blood pressure, inflammation, adiposity, renal disease, uric acid, and non-alcoholic fatty liver disease. Two independent reviewers extracted data and assessed risk of bias. The certainty of evidence was assessed using GRADE (Grading of Recommendations, Assessment, Development, and Evaluation). To explore the role of added sugars in sweetened soymilk, a sub-study of lactose versus sucrose outside of a dairy-like matrix was conducted following the same methodology.

Eligibility criteria were met by 17 trials ( n  = 504 adults with a range of health statuses), assessing the effect of a median daily dose of 500 mL of soymilk (22 g soy protein and 17.2 g or 6.9 g/250 mL added sugars) in substitution for 500 mL of cow’s milk (24 g milk protein and 24 g or 12 g/250 mL total sugars as lactose) on 19 intermediate outcomes. The substitution of soymilk for cow’s milk resulted in moderate reductions in non-HDL-C (mean difference, − 0.26 mmol/L [95% confidence interval, − 0.43 to − 0.10]), systolic blood pressure (− 8.00 mmHg [− 14.89 to − 1.11]), and diastolic blood pressure (− 4.74 mmHg [− 9.17 to − 0.31]); small important reductions in LDL-C (− 0.19 mmol/L [− 0.29 to − 0.09]) and C-reactive protein (CRP) (− 0.82 mg/L [− 1.26 to − 0.37]); and a trivial increase in HDL-C (0.05 mmol/L [0.00 to 0.09]). No other outcomes showed differences. There was no meaningful effect modification by added sugars across outcomes. The certainty of evidence was high for LDL-C and non-HDL-C; moderate for systolic blood pressure, diastolic blood pressure, CRP, and HDL-C; and generally moderate-to-low for all other outcomes. We could not conduct the sub-study of the effect of lactose versus added sugars, as no eligible trials could be identified.

Conclusions

Current evidence provides a good indication that replacing cow’s milk with soymilk (including sweetened soymilk) does not adversely affect established cardiometabolic risk factors and may result in advantages for blood lipids, blood pressure, and inflammation in adults with a mix of health statuses. The classification of plant-based dairy alternatives such as soymilk as ultra-processed may be misleading as it relates to their cardiometabolic effects and may need to be reconsidered in the transition to plant-based diets.

Trial registration

ClinicalTrials.gov identifier, NCT05637866.

Peer Review reports

Major dietary guidelines recommend a shift to plant-based diets for public and planetary health [ 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 ] , while recommending simultaneous reductions in ultra-processed foods [ 2 , 3 , 4 , 5 , 6 , 7 , 8 ]. The shift to plant-based diets has resulted in an explosion of dairy, meat, and egg alternatives with plant protein foods projected to reach almost 10% of the global protein market by 2030 [ 9 ]. Although these foods can aid in the transition to plant-based diets, food classification systems such as the World Health Organization (WHO)-endorsed NOVA classification system classify them as ultra-processed foods to be avoided [ 10 ].

Dairy alternatives are an important example of a food category at the crossroads of these competing recommendations. School milk programs provide > 150 million servings of cow’s milk to children worldwide [ 11 ]. These programs are in addition to the food service and procurement policies of public institutions such as schools, universities, hospitals, long-term care homes, and prisons. Many of these programs and policies do not allow for the free replacement of cow’s milk with nutrient-dense plant milks [ 12 , 13 ]. Although the Dietary Guidelines for Americans [ 1 ], Canada’s Food Guide [ 3 ], and several European food-based dietary guidelines [ 14 ] recognize fortified soymilk [ 1 ] as nutritionally equivalent to cow’s milk, school nutrition programs in the United States (US) [ 12 ] and Europe [ 13 ] only provide funding for cow’s milk. There is a bipartisan bill before the US Congress to change this policy and provide funding for fortified soymilk [ 15 ]. A major barrier to the use of fortified soymilk is that it contains added sugars to match the sweetness of cow’s milk at a level which would disqualify it from meeting the Food and Drug Administration’s proposed definition of “healthy” [ 16 ] (although its total sugar content is usually ~ 60% less than that of cow’s milk given the higher sweetness intensity of sucrose vs lactose) [ 17 ] and is classified (irrespective of its sugar content) as an ultra-processed food to be avoided [ 10 , 18 ]. Cow’s milk, on the other hand, enjoys classification as a “healthy,” minimally processed food to be encouraged [ 10 , 18 ].

As industry innovates in response to the growing demand and policy makers develop public health nutrition policies and programs in response to the evolving dietary guidance for more plant-based diets, it is important to understand whether nutrient-dense ultra-processed plant protein foods can replace minimally processed dairy foods without the adverse cardiometabolic effects attributed to added sugars and ultra-processed foods. We conducted a systematic review and meta-analysis of randomized controlled trials of the effect of substituting soymilk for minimally processed cow’s milk and its modification by added sugars (sweetened versus unsweetened) on intermediate cardiometabolic outcomes as a basis for understanding the role of nutrient-dense ultra-processed plant protein foods in the transition to plant-based diets.

We followed the Cochrane Handbook for Systematic Reviews of Interventions in conducting this systematic review and meta-analysis and reported our results in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [ 19 , 20 ] (Additional file 1 : Table 1). To explore whether added sugars mediate any effects observed in sweetened soymilk studies, we conducted an additional systematic review and meta-analysis sub-study. This separate investigation followed the same protocol and methodology as our main study. It focused on controlled trials examining the impact of lactose in isocaloric comparisons with fructose-containing sugars (such as sucrose, high-fructose corn syrup [HFCS], or fructose), when not included in a dairy-like matrix, on all outcomes in the main study. The protocol is registered at ClinicalTrials.gov (NCT05637866).

Data sources and search strategy

We searched MEDLINE, Embase, and the Cochrane Central Register of Controlled Trials databases through June 2024. The detailed search strategies for the main study and sub-study were based on validated search terms [ 21 ] (Additional file 1 : Tables 2 and 4). Manual searches of the reference lists of included studies supplemented the systematic search.

Study selection

The main study included randomized controlled trials in human adults of any health status. Included trials had a study duration of ≥ 3 weeks and investigated the effects of soymilk compared with cow’s milk under energy-matched conditions on intermediate cardiometabolic outcomes (Additional file 1 : Table 3). Trials that included comparators other than cow’s milk or that had no viable outcome data were excluded. No restrictions were placed on language. For the sub-study, we included controlled trials involving adults of all health statuses that had a study duration of ≥ 3 weeks and investigated the effects of added sugars compared with lactose on the same intermediate cardiometabolic outcomes (Additional file 1 : Table 5).

Data extraction

A minimum of two investigators (ME, DG, SBM, AA) independently extracted relevant data from eligible studies. Extracted data included study design, sample size, sample characteristics (age, body mass index [BMI], sex, health status), intervention characteristics (soymilk volume, total sugars content, soy protein dose), control characteristics (cow’s milk volume, total sugars content, milk protein dose, milk fat content), baseline outcome levels, background diet, follow-up duration, setting, funding sources, and outcome data. The authors were contacted for missing outcome data when it was indicated that a relevant outcome was measured but not reported. Graphically presented data were extracted from figures using Plot Digitizer [ 22 ].

Outcomes for the main study and sub-study included blood lipids (low-density lipoprotein cholesterol [LDL-C], high-density lipoprotein cholesterol [HDL-C], non-high-density lipoprotein cholesterol [non-HDL-C], triglycerides, and apolipoprotein B [ApoB]), glycemic control (hemoglobin A1c [HbA1c], fasting plasma glucose, 2-h postprandial glucose, fasting insulin, and plasma glucose area under the curve [PG-AUC]), blood pressure (systolic blood pressure and diastolic blood pressure), inflammation (c-reactive protein [CRP]), adiposity (body weight, BMI, body fat, and waist circumference), kidney function and structure (creatinine, creatinine clearance, glomerular filtration rate [GFR], estimated glomerular filtration rate [eGFR], albuminuria, and albumin-creatinine ratio [ACR]), uric acid, and non-alcoholic fatty liver disease (NAFLD) (intrahepatocellular lipid [IHCL], alanine transaminase [ALT], aspartate aminotransferase [AST], and fatty liver index).

Mean differences (MDs) between the intervention and control arm and their respective standard errors were extracted for each trial. If these were not provided, they were derived from available data using published formulas [ 19 ]. Mean pairwise differences in change-from-baseline values were preferred over end values. When median data were provided, they were converted to mean data with corresponding variances using methods developed by McGrath et al. [ 23 ]. When no variance data were available, the standard deviation of the MDs was borrowed from a trial similar in size, participants, and nature of intervention. All disagreements were reconciled by consensus or with a senior reviewer (JLS).

Risk of bias assessment

Included studies were assessed for the risk of bias independently and in duplicate by at least two investigators (ME, DG, SBM, AA) using the Cochrane Risk of Bias (ROB) 2 Tool [ 24 ]. The assessment was performed across six domains of bias (randomization process, deviations from intended interventions, missing outcome data, measurement of the outcome, selection of the reported result, and overall bias). Crossover studies were assessed for an additional domain of bias (risk of bias arising from period or carryover effects). The ROB for each domain was assessed as “low” (plausible bias unlikely to seriously alter the results), “high” (plausible bias that seriously weakens confidence in results), or “some concern” (plausible bias that raises some doubt about the results). Reviewer discrepancies were resolved by consensus or arbitration by a senior investigator (JLS).

Statistical analysis

STATA (version 17; StataCorp LP, College Station, TX) was used for all analyses for the main study and sub-study. The principal effect measures were the mean pair-wise differences in change from baseline (or alternatively, end differences) between the intervention arm providing the soymilk and the cow’s milk comparator/control arm in each trial (significance at P MD  < 0.05). Results are reported as MDs with 95% confidence intervals (95% CI). As one of our primary research questions relates to the role of added sugars as a mediator of any observed differences between soymilk and cow’s milk, we stratified results by the presence of added sugars in the soymilk (sweetened versus unsweetened) and assessed effect modification by this variable on pooled estimates. Data were pooled using the generic inverse variance method with DerSimonian and Laird random effects models [ 25 ]. Fixed effects were used when fewer than five trials were available for an outcome [ 26 ]. A paired analysis was applied for crossover designs, assuming a within-individual correlation coefficient between treatments of 0.5, as described by Elbourne et al. [ 27 , 28 ].
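The pooling step can be sketched as follows. This is a minimal illustration of generic inverse-variance pooling with a DerSimonian and Laird random-effects model, with made-up mean differences and standard errors; it is not the review's data or its STATA implementation.

```python
# Sketch of generic inverse-variance pooling with a DerSimonian-Laird
# random-effects model. Inputs are per-trial mean differences (MDs) and
# their standard errors; the numbers below are illustrative only.
import math

def dersimonian_laird(md, se):
    w = [1 / s**2 for s in se]                                 # fixed-effect weights
    fixed = sum(wi * di for wi, di in zip(w, md)) / sum(w)     # fixed-effect pool
    q = sum(wi * (di - fixed) ** 2 for wi, di in zip(w, md))   # Cochran's Q
    df = len(md) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                              # between-trial variance
    w_re = [1 / (s**2 + tau2) for s in se]                     # random-effects weights
    pooled = sum(wi * di for wi, di in zip(w_re, md)) / sum(w_re)
    se_pooled = math.sqrt(1 / sum(w_re))
    ci = (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)
    i2 = max(0.0, 100 * (q - df) / q) if q > 0 else 0.0        # I-squared statistic
    return pooled, ci, q, i2

pooled, ci, q, i2 = dersimonian_laird(
    md=[-0.25, -0.10, -0.30, -0.15], se=[0.10, 0.12, 0.15, 0.08]
)
```

When the estimated between-trial variance is zero (as in this toy input, where Q is below its degrees of freedom), the random-effects estimate coincides with the fixed-effect estimate.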

Heterogeneity was assessed using the Cochran’s Q statistic and quantified using the I 2 statistic, where I 2  ≥ 50% and P Q  < 0.10 were used as evidence of substantial heterogeneity [ 19 ]. Potential sources of heterogeneity were explored using sensitivity analyses. Sensitivity analyses were done via two methods. We conducted an influence analysis by systematically removing one trial at a time and recalculating the overall effect estimate and heterogeneity. A trial was considered influential if its removal explained the substantial heterogeneity or altered the direction, magnitude, or significance of the summary estimate. To determine whether the overall summary estimates were robust to the use of an assumed correlation coefficient for crossover trials, we conducted a second sensitivity analysis by using correlation coefficients of 0.25 and 0.75. If ≥ 10 trials were available, meta-regression analyses were used to assess the significance of each subgroup categorically and when possible, continuously (significance at P  < 0.05). A priori subgroup analyses included soy protein dose, follow-up duration, baseline outcome levels, comparator, design, age, health status, funding, and risk of bias.
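The influence (leave-one-out) analysis described above can be sketched as follows. For brevity this minimal sketch uses a simple fixed-effect inverse-variance pool, and the input numbers are illustrative, not trial data from the review.

```python
# Sketch of the influence (leave-one-out) analysis: each trial is removed
# in turn and the fixed-effect (inverse-variance) estimate and Cochran's Q
# are recomputed, to see whether a single trial drives the pooled result
# or the heterogeneity. Illustrative inputs only.
def pool_fixed(md, se):
    w = [1 / s**2 for s in se]
    est = sum(wi * di for wi, di in zip(w, md)) / sum(w)
    q = sum(wi * (di - est) ** 2 for wi, di in zip(w, md))  # Cochran's Q
    return est, q

def leave_one_out(md, se):
    results = []
    for i in range(len(md)):
        sub_md = md[:i] + md[i + 1:]     # drop trial i
        sub_se = se[:i] + se[i + 1:]
        results.append((i, *pool_fixed(sub_md, sub_se)))
    return results

# Trial at index 2 is an outlier; removing it should minimize Q.
influence = leave_one_out(md=[-0.2, -0.15, -0.9, -0.1], se=[0.1, 0.1, 0.1, 0.1])
```

In this toy input, dropping the outlying third trial collapses the heterogeneity, which is exactly the pattern an influence analysis is meant to expose.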

If ≥ 6 trials were available [ 29 ], dose–response analyses were performed using meta-regression to assess linear (generalized least squares trend [GLST] estimation models) and non-linear (MKSPLINE spline curve modeling) dose–response gradients (significance at P  < 0.05).

If ≥ 10 studies were available, publication bias was assessed by inspection of contour-enhanced funnel plots and formal testing with Egger’s and Begg’s tests (significance at P  < 0.10) [ 30 , 31 , 32 ]. If evidence of publication bias was suspected, the Duval and Tweedie trim-and-fill method was performed to adjust for funnel plot asymmetry by imputing missing study data and assess for small-study effects [ 33 ].
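Egger's test is, at its core, a regression of the standardized effect on precision, with funnel-plot asymmetry indicated by a non-zero intercept. A minimal sketch (illustrative data; the t-test on the intercept that yields the P value is omitted here):

```python
# Sketch of Egger's regression test for funnel-plot asymmetry: regress the
# standardized effect (MD / SE) on precision (1 / SE) by ordinary least
# squares; a non-zero intercept suggests small-study effects.
def egger_intercept(md, se):
    y = [d / s for d, s in zip(md, se)]     # standardized effects
    x = [1 / s for s in se]                 # precisions
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum(
        (xi - xbar) ** 2 for xi in x
    )
    return ybar - slope * xbar              # Egger's intercept

b0 = egger_intercept(
    md=[-0.2, -0.15, -0.3, -0.1, -0.25], se=[0.05, 0.08, 0.12, 0.06, 0.10]
)
```

As a sanity check, a set of trials with identical mean differences produces an intercept of exactly zero, since the standardized effect is then perfectly proportional to precision.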

Certainty of evidence

The Grading of Recommendations Assessment, Development, and Evaluation (GRADE) approach was used to assess the certainty of evidence. The GRADE Handbook and GRADEpro V.3.2 software were used [ 34 , 35 ]. A minimum of two investigators (ME, DG, SBM) independently performed GRADE assessments for each outcome [ 36 ]. Discrepancies were resolved by consensus or arbitration by the senior author (JLS). The overall certainty of evidence was graded as high, moderate, low, or very low. Randomized trials are initially graded as high by default and then downgraded or upgraded based on prespecified criteria. Reasons for downgrading the evidence included study limitations (risk of bias assessed by the Cochrane ROB Tool), inconsistency of results (substantial unexplained interstudy heterogeneity, I 2  > 50% and P Q  < 0.10), indirectness of evidence (presence of factors that limit the generalizability of the results), imprecision (the 95% CI for the effect estimate overlaps with the MID for benefit or harm), and publication bias (evidence of small-study effects). The evidence was upgraded if a significant dose–response gradient was detected. We defined the importance of the magnitude of the pooled effect estimates using prespecified minimally important differences (MIDs; Additional file 1 : Table 6) with GRADE guidance [ 36 , 37 , 38 , 39 , 40 , 41 , 42 , 43 , 44 , 45 , 46 , 47 , 48 , 49 , 50 ] according to five levels: very large (≥ 10 MID); large (≥ 5 MID); moderate (≥ 2 MID); small important (≥ 1 MID); and trivial/unimportant (< 1 MID) effects.
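The five-level grading reduces to comparing the pooled effect against multiples of the outcome's MID. A minimal sketch of that comparison (the MID value in the usage line is a placeholder, not one of the review's prespecified MIDs):

```python
# Sketch of the prespecified magnitude grading: the pooled effect is
# expressed in multiples of the minimally important difference (MID) for
# that outcome, then mapped to the five levels named in the text.
def effect_magnitude(pooled_effect, mid):
    ratio = abs(pooled_effect) / mid
    if ratio >= 10:
        return "very large"
    if ratio >= 5:
        return "large"
    if ratio >= 2:
        return "moderate"
    if ratio >= 1:
        return "small important"
    return "trivial/unimportant"

# e.g. the pooled LDL-C MD against a placeholder MID of 0.10 mmol/L:
label = effect_magnitude(-0.19, mid=0.10)
```

With this placeholder MID, an effect of −0.19 falls between 1 and 2 MID and is therefore graded "small important".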

Search results

Figure 1 in Appendix shows the flow of the literature for the main analysis. We identified 522 reports through database and manual searches. A total of 17 reports [ 51 , 52 , 53 , 54 , 55 , 56 , 57 , 58 , 59 , 60 , 61 , 62 , 63 , 64 , 65 , 66 , 67 ] met the inclusion criteria and contained data for LDL-C (10 trials, n  = 312), HDL-C (8 trials, n  = 271), non-HDL-C (7 trials, n  = 243), triglycerides (9 trials, n  = 278), HbA1c (1 trial, n  = 25), fasting plasma glucose (5 trials, n  = 147), 2-h plasma glucose (1 trial, n  = 28), fasting insulin (4 trials, n  = 119), systolic blood pressure (5 trials, n  = 158), diastolic blood pressure (5 trials, n  = 158), CRP (5 trials, n  = 147), body weight (6 trials, n  = 163), BMI (6 trials, n  = 173), body fat (1 trial, n  = 43), waist circumference (3 trials, n  = 90), creatinine (1 trial, n  = 25), eGFR (1 trial, n  = 25), ALT (1 trial, n  = 24), and AST (1 trial, n  = 24), involving 504 participants. No trials were available for ApoB, PG-AUC, creatinine clearance, GFR, albuminuria, ACR, uric acid, IHCL, or fatty liver index.

Additional file 1 : Fig. 1 shows the flow of literature for the sub-study. We identified 1010 reports through database and manual searches. After excluding 305 duplicates, a total of 705 reports were reviewed by title and abstract. No reports met the inclusion criteria and therefore no data was available for analysis.

Trial characteristics

Table 1 shows the characteristics of the included trials. The trials were conducted in a variety of locations, with most conducted in Iran (7/17 trials, 41%), followed by the US (3/17 trials, 18%), Italy (2/17 trials, 12%), Brazil (1/17 trials, 6%), Scotland (1/17 trials, 6%), Sweden (1/17 trials, 6%), Spain (1/17 trials, 6%), and Australia (1/17 trials, 6%). All trials took place in outpatient settings (17/17, 100%). The median trial size was 25 participants (range, 7–60 participants). The median age of the participants was 48.5 years (range, 20–70 years) and the median BMI was 27.9 kg/m 2 (range, 20–31.1 kg/m 2 ). The trials included participants with hypercholesterolemia (4/17 trials, 24%), overweight or obesity (4/17 trials, 24%), type 2 diabetes (2/17 trials, 12%), hypertension (1/17 trials, 6%), or rheumatoid arthritis (1/17 trials, 6%), or participants who were healthy (3/17 trials, 18%) or post-menopausal (2/17 trials, 12%). Both crossover (10/17 trials, 59%) and parallel (7/17 trials, 41%) designs were included. The intervention included sweetened (11/17 trials, 65%) and unsweetened (6/17 trials, 35%) soymilk.

The median soymilk dose was 500 mL/day (range, 240–1000 mL/day) with a median soy protein of 22 g/day (range, 2.5–70 g/day) or 6.6 g/250 mL (range, 2.6–35 g/250 mL) and median total (added) sugars of 17.2 g/day (range, 4.0–32 g/day) or 6.9 g/250 mL (range, 1–16 g/250 mL) in the sweetened soymilk. The comparators included skim (0% milk fat) (2/17 trials, 12%), low-fat (1% milk fat) (4/17 trials, 24%), reduced fat (1.5–2.5% milk fat) (7/17 trials, 41%), and whole (3% milk fat) (1/17 trials, 6%) cow’s milk. Three trials did not report the milk fat content of the cow’s milk used. The median cow’s milk dose was 500 mL/day (range, 236–1000 mL/day) with a median milk protein of 24 g/day (range, 3.3–70 g/day) or 8.3 g/250 mL (range, 3.4–35 g/250 mL) and median total (lactose) sugars of 24 g/day (range, 11.5–49.2 g/day) or 12 g/250 mL (range, 10.8–12.8 g/250 mL). The median study duration was 4 weeks (range, 4–16 weeks). The trials received funding from industry (1/17 trials, 6%), agency (8/17 trials, 47%), or both industry and agency (4/17 trials, 24%), or they did not report the funding source (4/17 trials, 24%).

Additional file 1 : Fig. 2 shows the ROB assessments of the included trials. Two trials were assessed as having some concerns from period or carryover effects: Bricarello et al. [ 53 ] and Steele [ 67 ]. All other trials were judged as having an overall low risk of bias. There was no evidence of serious risk of bias across the included trials.

Markers of blood lipids

Figure 2 and Additional file 1 : Figs. 3–6 show the effect of substituting soymilk for cow’s milk on markers of blood lipids. The substitution resulted in a small important reduction in LDL-C (10 trials; MD: − 0.19 mmol/L; 95% CI: − 0.29 to − 0.09 mmol/L; P MD  < 0.001; no heterogeneity: I 2  = 0.0%; P Q  = 0.823), a trivial increase in HDL-C (8 trials; MD: 0.05 mmol/L; 95% CI: 0.00 to 0.09 mmol/L; P MD  = 0.036; no heterogeneity: I 2  = 0.0%; P Q  = 0.053), a moderate reduction in non-HDL-C (7 trials; MD: − 0.26 mmol/L; 95% CI: − 0.43 to − 0.10 mmol/L; P MD  = 0.002; no heterogeneity: I 2  = 0.0%; P Q  = 0.977), and no effect on triglycerides. There were no interactions by added sugars in soymilk for any blood lipid markers ( P  = 0.49–0.821).

Markers of glycemic control

Figure 2 and Additional file 1 : Figs. 7–10 show the effect of substituting soymilk for cow’s milk on markers of glycemic control. The substitution had no effect on HbA1c, fasting plasma glucose, 2-h plasma glucose, or fasting insulin. There was no interaction by added sugars in soymilk for fasting plasma glucose ( P  = 0.747), but there was an interaction for fasting insulin ( P  = 0.026), although the lack of effect remained in both groups: neither the sweetened soymilk (non-significant increase) nor the unsweetened soymilk (non-significant decrease) showed an effect on fasting insulin. We could not assess this interaction for HbA1c or 2-h plasma glucose, as there was only one trial available for each outcome.

Blood pressure

Figure 2 and Additional file 1 : Figs. 11 and 12 show the effect of substituting soymilk for cow’s milk on blood pressure. The substitution resulted in a moderate reduction in both systolic blood pressure (5 trials; MD: − 8.00 mmHg; 95% CI: − 14.89 to − 1.11 mmHg; P MD  = 0.023; substantial heterogeneity: I 2  = 86.89%; P Q  ≤ 0.001) and diastolic blood pressure (5 trials; MD: − 4.74 mmHg; 95% CI: − 9.17 to − 0.31 mmHg; P MD  = 0.036; substantial heterogeneity: I 2  = 77.3%; P Q  = 0.001). There were no interactions by added sugars in soymilk for blood pressure ( P  = 0.747 and 0.964).

Markers of inflammation

Figure 2 and Additional file 1 : Fig. 13 show the effect of substituting soymilk for cow’s milk on markers of inflammation. The substitution resulted in a small important reduction in CRP (5 trials; MD: − 0.81 mg/L; 95% CI: − 1.26 to − 0.37 mg/L; P MD  < 0.001; no heterogeneity: I 2  = 0.0%; P Q  = 0.814). There was no interaction by added sugars in soymilk for CRP ( P  = 0.275).

Markers of adiposity

Figure 2 and Additional file 1 : Figs. 14–17 show the effect of substituting soymilk for cow’s milk on markers of adiposity. The substitution had no effect on body weight, BMI, body fat, or waist circumference. There were no interactions by added sugars in soymilk for any adiposity outcome ( P  = 0.664–0.733).

Markers of kidney function

Figure 2 and Additional file 1 : Figs. 18 and 19 show the effect of substituting soymilk for cow’s milk on markers of kidney function. The substitution had no effect on creatinine or eGFR. We could not assess the interaction by added sugars in soymilk for creatinine or eGFR, as only one trial, which used soymilk without added sugars, was available for each outcome.

Markers of NAFLD

Figure 2 and Additional file 1 : Figs. 20 and 21 show the effect of substituting soymilk for cow’s milk on markers of NAFLD. The substitution had no effect on ALT or AST. We could not assess heterogeneity or the interaction by added sugars in soymilk for ALT or AST, as only one trial, which used soymilk without added sugars, was available for each outcome.

Sensitivity analysis

Additional file 1 : Figs. 22–33 present the influence analyses across all outcomes. The removal of Bricarello et al. [ 53 ] or Steele [ 67 ] each resulted in loss of significant effect for HDL-C. The removal of Onning et al. [ 62 ] or Steele [ 67 ] each resulted in a partial explanation of heterogeneity for triglycerides. The removal of Hasanpour et al. [ 56 ] explained the heterogeneity for fasting insulin. The removal of Keshavarz et al. [ 57 ] or Miraghajani et al. [ 59 ] each resulted in a loss of significant effect for systolic blood pressure and the removal of Rivas et al. [ 63 ] resulted in a partial explanation of the heterogeneity for systolic blood pressure. The removal of Hasanpour et al. [ 56 ], Keshavarz et al. [ 57 ], Miraghajani et al. [ 59 ], or Rivas et al. [ 63 ] each resulted in a loss of significant effect for diastolic blood pressure and the removal of Rivas et al. [ 63 ] resulted in a partial explanation of heterogeneity for diastolic blood pressure. The removal of Mohammad-Shahi et al. [ 58 ] resulted in loss of significant effect for CRP.

Additional file 1 : Table 8 shows the sensitivity analyses for the different correlation coefficients (0.25 and 0.75) used in paired analyses of crossover trials for all outcomes. The different correlation coefficients did not alter the direction, magnitude, or significance of the effect or the evidence for heterogeneity, with one exception: loss of significance for the effect of the substitution on HDL-C, both with the use of 0.25 (8 trials; MD: 0.04 mmol/L; 95% CI: − 0.10 to 0.01 mmol/L; P MD  = 0.107; I 2  = 0.0%; P Q  = 0.670) and with the use of 0.75 (8 trials; MD: 0.05 mmol/L; 95% CI: − 0.10 to 0.01 mmol/L; P MD  = 0.089; I 2  = 0.0%; P Q  = 0.640).
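The sensitivity to the assumed correlation follows directly from the paired-difference variance used for crossover trials (Elbourne et al.): a higher within-individual correlation shrinks the standard error of the mean difference. A minimal sketch with illustrative standard deviations and sample size:

```python
# Sketch of the paired-analysis standard error for a crossover trial under
# an assumed within-individual correlation r (Elbourne et al.): the
# variance of the within-person difference shrinks as r grows.
# The standard deviations and n below are illustrative only.
import math

def paired_se(sd_treat, sd_control, n, r):
    # Variance of the paired (within-person) difference, then SE of its mean.
    var_diff = sd_treat**2 + sd_control**2 - 2 * r * sd_treat * sd_control
    return math.sqrt(var_diff / n)

# Sensitivity of the SE to the assumed correlation (0.25, 0.5, 0.75):
ses = {r: paired_se(0.6, 0.5, 25, r) for r in (0.25, 0.5, 0.75)}
```

Because the standard error decreases monotonically in r, assuming 0.25 instead of 0.5 widens confidence intervals, which is how a borderline effect such as the one on HDL-C can lose significance under the sensitivity analysis.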

Subgroup analyses

Additional file 1 : Figs. 34–36 present the subgroup analyses and continuous meta-regression analyses for LDL-C. Subgroup analysis was not conducted for any other outcome as there were < 10 trials included. There was no significant effect modification by health status, BMI, age, comparator, baseline LDL-C, study design, follow-up duration, funding source, dose of soy protein, or risk of bias for LDL-C. However, there were tendencies towards a greater reduction in LDL-C by point estimates in groups with certain health statuses (hypercholesterolemic and overweight/obesity), a higher baseline LDL-C, and a higher soy protein dose (> 25 g/day).

Dose–response analyses

Additional file 1 : Figs. 37–42 present linear and non-linear dose–response analyses for LDL-C, HDL-C, non-HDL-C, triglycerides, body weight, and BMI. There was no dose–response seen for the effect of substituting soymilk for cow’s milk, with the exception of a positive linear dose–response for triglycerides ( P linear  = 0.038). We did not downgrade the certainty of evidence as the greater reduction in triglycerides seen at lower doses of soy protein was lost at higher doses. There were no dose–response analyses performed for the remaining outcomes because there were < 6 trials available for each.

Publication bias assessment

Additional file 1 : Fig. 43 presents the contour-enhanced funnel plot for the assessment of publication bias for LDL-C. There was no asymmetry on visual inspection and no evidence of funnel plot asymmetry for LDL-C on formal testing (Begg’s test, P  = 0.721; Egger’s test, P  = 0.856). No other publication bias analyses could be performed as there were < 10 trials available for each outcome.

Adverse events and acceptability

Additional file 1 : Table 9 shows the reported adverse events and acceptability of the study beverages. Adverse events were reported in nine trials. In one trial by Gardner et al. [ 55 ], one participant experienced a recurrence of cancer; however, this was considered unrelated to the short-term consumption of the study milks. Three trials (Miraghajani et al., Hasanpour et al., and Mohammad-Shahi et al.) [ 56 , 58 , 59 ] reported one to two withdrawals due to digestive difficulties related to soymilk consumption. Two trials (Sirtori et al. 1999 and 2002) [ 65 , 66 ] reported one or more participants with digestive difficulties related to cow’s milk consumption. Two trials (Nourieh et al. and Keshavarz et al.) [ 57 , 61 ] each reported two participant withdrawals related to digestive problems that were not specific to either study beverage. Four trials indicated that most participants found the soymilk and cow’s milk acceptable and tolerable. One trial, by Onning et al. [ 62 ], incorporated a sensory evaluation of appearance, consistency, flavor, and overall impression, which showed declining scores for both types of milk over the 3-week test period.

GRADE assessment

Additional file 1 : Table 10 presents the GRADE assessment. The certainty of evidence for the effect of substituting soymilk for cow’s milk was high for LDL-C, non-HDL-C, fasting plasma glucose, and waist circumference. The certainty of evidence was moderate for HDL-C, triglycerides, fasting insulin, systolic blood pressure, diastolic blood pressure, CRP, body weight, and BMI owing to a downgrade for imprecision of the pooled effect estimates and was moderate for body fat owing to a downgrade for indirectness. The certainty of evidence was low for HbA1c, 2-h plasma glucose, creatinine, eGFR, ALT, and AST owing to downgrades for indirectness and imprecision.

Discussion

We conducted a systematic review and meta-analysis of 17 trials that examined the effect of substituting soymilk (median dose of 22 g/day or 6.6 g/250 mL serving of soy protein per day and 17.2 g/day or 6.9 g/250 mL of total [added] sugars in the sweetened soymilk) for cow’s milk (median dose of 24 g/day or 8.3 g/250 mL of milk protein and 24 g/day or 12 g/250 mL of total sugars [lactose]) and its modification by added sugars (sweetened versus unsweetened soymilk) on 19 intermediate cardiometabolic outcomes over a median follow-up of 4 weeks in adults of varying health status. The substitution of soymilk for cow’s milk led to moderate reductions in non-HDL-C (− 0.26 mmol/L or ~ − 7%), systolic blood pressure (− 8.00 mmHg), and diastolic blood pressure (− 4.74 mmHg); small important reductions in LDL-C (− 0.19 mmol/L or ~ − 6%) and CRP (− 0.81 mg/L or ~ − 22%); and a trivial increase in HDL-C (0.05 mmol/L or ~ 4%), with no adverse effects on other intermediate cardiometabolic outcomes. There was no meaningful interaction by added sugars in soymilk, with sweetened and unsweetened soymilk showing similar effects across outcomes. No dose–response relationship was seen across the outcomes for which dose–response analyses were performed.
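The pooled mean differences summarized above come from random-effects meta-analysis. A minimal DerSimonian–Laird estimator, the random-effects method cited in the paper’s statistical references, can be sketched as follows; the trial data are hypothetical.

```python
import numpy as np

def dersimonian_laird(effects, ses):
    """Pool trial mean differences with DerSimonian-Laird random effects.

    Returns the pooled estimate and its approximate 95% confidence interval.
    """
    y = np.asarray(effects, float)
    v = np.asarray(ses, float) ** 2
    w = 1.0 / v                                   # fixed-effect weights
    mu_fe = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - mu_fe) ** 2)              # Cochran's Q statistic
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c)       # between-trial variance
    w_re = 1.0 / (v + tau2)                       # random-effects weights
    mu = np.sum(w_re * y) / np.sum(w_re)
    se = np.sqrt(1.0 / np.sum(w_re))
    return mu, (mu - 1.96 * se, mu + 1.96 * se)

# Hypothetical LDL-C mean differences (mmol/L) from three trials
pooled, ci = dersimonian_laird([-0.25, -0.15, -0.20], [0.05, 0.06, 0.07])
```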

Findings in relation to the literature

Our findings agree with previous evidence syntheses of soy. Regulatory authorities such as the United States Food and Drug Administration (FDA) and Health Canada have conducted comprehensive evaluations of randomized controlled trials of the effect of soy protein from different sources on total-C and LDL-C, resulting in approved health claims for soy protein (based on an intake of 25 g/day of soy protein irrespective of source) for cholesterol reduction [ 68 ] and coronary heart disease risk reduction [ 69 ]. Updated systematic reviews and meta-analyses of the 46 randomized controlled trials included in the re-evaluation of the FDA health claim [ 70 ] showed a reduction in LDL-C of 3.2% [ 71 ]. This reduction has been stable since the health claim was first approved in 1999 [ 72 ] and is smaller than, but consistent with, our findings specifically for soymilk. No increase in HDL-C, however, was detected. Previous systematic reviews and meta-analyses of randomized controlled trials of soy protein and soy isoflavones have also shown significant but smaller reductions in systolic blood pressure (− 1.70 mmHg) and diastolic blood pressure (− 1.27 mmHg) [ 73 ] than were found in the current analysis. These reductions in LDL-C and blood pressure are further supported by reductions in clinical events, with updated pooled analyses of prospective cohort studies showing that legumes including soy are associated with reduced incidence of total cardiovascular disease and coronary heart disease [ 74 ].

Systematic reviews and meta-analyses that specifically isolated the effect of soymilk (as a single food matrix) in its intended substitution for cow’s milk are lacking. Sohouli and coworkers [ 75 ] conducted a systematic review and meta-analysis of 18 randomized controlled trials in 665 individuals of varying health status that assessed the effect of soymilk in comparison with a mix of comparators on intermediate cardiometabolic outcomes but did not isolate its substitution for cow’s milk. This synthesis showed similar improvements in LDL-C (− 0.24 mmol/L), systolic blood pressure (− 7.38 mmHg), diastolic blood pressure (− 4.36 mmHg), and CRP (− 1.07 mg/L), while also showing reductions in waist circumference and TNF-α [ 75 ]. In syntheses of prospective cohort studies, the substitution of legumes, including soy, for various animal protein sources, and more specifically of legumes/nuts (the only exposure available) for dairy, has also shown reductions in incident total cardiovascular disease and all-cause mortality [ 76 ].

Indirect evidence from dietary patterns that contain soy foods including soymilk in substitution for different animal sources of protein including cow’s milk further supports our findings. Systematic reviews and meta-analyses of randomized trials of the Portfolio diet and vegetarian and vegan dietary patterns have shown additive reductions in LDL-C, non-HDL-C, blood pressure, and CRP when soy foods including soymilk are combined with other foods that target these same intermediate risk factors with displacement of different animal sources of protein including cow’s milk [ 77 , 78 ]. These reductions have also been shown to translate to reductions in clinical events with systematic reviews and meta-analyses of prospective cohort studies showing that adherence to these dietary patterns is associated with reductions in incident coronary heart disease, total cardiovascular disease, and all-cause mortality [ 79 , 80 , 81 ].

Potential mechanisms of action

The potential mechanisms mediating the effects of soy remain unclear. Specific components within the soy food matrix, including soy protein and phytochemicals like isoflavones [ 82 ], have been implicated. The well-established lipid-lowering effect of soy [ 72 ] may be attributed to the 7S globulin fraction of soy protein, which exerts its primary action by upregulating LDL-C receptors predominantly within the liver, thereby augmenting the clearance of LDL-C from circulation [ 82 ]. The isoflavone, fiber, fatty acid, and anti-nutrient components may also exert some mediation [ 83 ]. The reduction in blood pressure has been most linked to the soy isoflavones [ 83 ]. There is evidence that soy isoflavones may modulate the renin–angiotensin–aldosterone system (RAAS), with the capacity to inhibit the production of angiotensin II and aldosterone, thereby contributing to the regulation of blood pressure [ 73 ]. Another blood pressure lowering mechanism may involve the ability of soy isoflavones to enhance endothelial function by mitigating oxidative stress and inflammation, consequently promoting the release of the relaxing factor nitric oxide (NO) [ 73 ]. This potential mechanism of isoflavones may also explain the reductions seen in inflammation.

Strengths and limitations

Our evidence synthesis had several strengths. First, we completed a comprehensive and reproducible systematic search and selection process of the available literature examining the effect of substituting soymilk for cow’s milk on intermediate cardiometabolic outcomes. Second, we synthesized the totality of available evidence from a large body of randomized controlled trials, which gives the greatest protection against systematic error. Third, we included an extensive and comprehensive list of outcomes to fully capture the impact of soymilk on cardiometabolic health. Fourth, we only included randomized controlled trials that compared soymilk to cow’s milk directly, to increase the specificity of our conclusion. Finally, we included a GRADE assessment to explore the certainty of available evidence.

There were also several limitations. First, we could not conduct the sub-study of the effect of lactose versus added sugars outside of a dairy-like matrix, as no eligible trials could be identified. Although this analysis is important for isolating the effect of added sugars as a mediator of any adverse effects, we did not observe any meaningful interaction by added sugars in soymilk. Second, there was serious imprecision in the pooled estimates across many of the outcomes, with the 95% confidence intervals overlapping the MID in each case, with the exception of LDL-C, non-HDL-C, fasting plasma glucose, and waist circumference. The certainty of evidence for HDL-C, triglycerides, HbA1c, 2-h plasma glucose, fasting insulin, systolic blood pressure, diastolic blood pressure, CRP, body weight, BMI, body fat, creatinine, eGFR, ALT, and AST was downgraded for this reason. Third, there was evidence of indirectness related to insufficient trials for HbA1c, 2-h plasma glucose, creatinine, eGFR, ALT, and AST, which limits generalizability. Each outcome with data from only one trial was downgraded for this reason. Another source of indirectness could be the median follow-up duration of 4 weeks (range, 4–16 weeks). This time frame may be sufficient for observing certain effects, but other outcomes may require a longer period to manifest changes. Despite acknowledging this variation in response time among different outcomes, we did not further downgrade for this aspect of indirectness. Instead, we tailored our conclusions to reflect short-to-moderate-term effects. Finally, although publication bias was not suspected, we were only able to make this assessment for LDL-C, as there were < 10 trials for all other outcomes.

Considering these strengths and limitations, we assessed the certainty of evidence as high for LDL-C and non-HDL-C; moderate for systolic blood pressure, diastolic blood pressure, CRP, and HDL-C; and moderate-to-low for all outcomes where significant effects were not observed.

Implications

This work has important implications for plant protein foods in the recommended shift to more plant-based diets. Major international dietary guidelines in the US [ 1 ], Canada [ 3 ], and Europe [ 4 , 5 , 6 ] recommend fortified soymilk as the only suitable replacement for cow’s milk. Our findings support this recommendation, showing that soymilk, including sweetened soymilk (up to 7 g added sugars per 250 mL), does not have any adverse effects compared with cow’s milk across 19 intermediate cardiometabolic outcomes, with benefits for lipids, blood pressure, and inflammation. This evidence suggests that it may be misleading, as it relates to their cardiometabolic effects, to classify fortified soymilk as an ultra-processed food to be avoided while classifying cow’s milk as a minimally processed food to be encouraged (based on the WHO-endorsed NOVA classification system [ 10 ]). It also suggests that it may be misleading not to allow fortified soymilk that is sweetened with small amounts of sugars to be classified as “healthy” (based on the FDA’s new proposed definition, which only permits this claim on products with added sugars ≤ 2.5 g or 5% daily value (DV) per 250 mL serving [ 16 ]). The proposed FDA criteria would prevent this claim on soymilk products designed to be iso-sweet analogs of cow’s milk, in which 5 g (10% DV) of added sugars from sucrose in soymilk is equivalent in sweetness to the 12 g of lactose in cow’s milk per 250 mL serving, as sucrose is ~ 2.4 times as sweet as lactose [ 17 ]. To prevent confusion, policy makers may want to exempt fortified soymilk from classification as an ultra-processed food and allow added sugars up to 10% DV for the definition of “healthy,” as has been proposed by the FDA for sodium and saturated fat in dairy products (including soy-based dairy alternatives) to account for accepted processing and preservation methods [ 16 ]. These policy considerations would balance the need to limit nutrient-poor, energy-dense foods with the need to promote nutrient-dense foods like fortified soymilk in the shift to healthy plant-based diets.
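The iso-sweetness arithmetic behind the “healthy” threshold argument can be checked directly. The sketch assumes the FDA daily value of 50 g for added sugars and a sucrose-to-lactose relative sweetness consistent with the 5 g ↔ 12 g equivalence stated in the text.

```python
# Sweetness-matched added sugars: how much sucrose matches the 12 g of
# lactose in a 250 mL serving of cow's milk? Assumes sucrose is ~2.4
# times as sweet as lactose (consistent with the 5 g <-> 12 g
# equivalence stated in the text) and the FDA daily value of 50 g for
# added sugars.
LACTOSE_PER_SERVING_G = 12.0        # cow's milk, per 250 mL serving
SUCROSE_TO_LACTOSE_SWEETNESS = 2.4  # assumed relative sweetness
ADDED_SUGARS_DV_G = 50.0            # FDA daily value for added sugars

sucrose_equivalent_g = LACTOSE_PER_SERVING_G / SUCROSE_TO_LACTOSE_SWEETNESS
percent_dv = 100.0 * sucrose_equivalent_g / ADDED_SUGARS_DV_G
# 5.0 g of sucrose, i.e. 10% DV -- double the 2.5 g (5% DV) cutoff in
# the proposed "healthy" definition.
```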

In conclusion, the evidence provides a good indication that substituting either sweetened or unsweetened soymilk for cow’s milk in adults of varying health status does not have the adverse effects on intermediate cardiometabolic outcomes attributed to added sugars and ultra-processed foods in the short-to-moderate term. There even appear to be advantages, with small to moderate reductions in established markers of blood lipids (LDL-C, non-HDL-C) that are in line with approved health claims for cholesterol and coronary heart disease risk reduction, as well as small to moderate reductions in blood pressure and inflammation (CRP). Sources of uncertainty include imprecision and indirectness in several of the estimates. There remains a need for more well-powered randomized controlled trials of the effect of substituting soymilk for cow’s milk on less studied intermediate cardiometabolic outcomes, especially established markers of glycemic control, kidney structure and function, and NAFLD. There is also a need for trials comparing lactose versus added sugars outside of a dairy-like matrix to better understand the role of added sugars at different levels in substitution for lactose across outcomes. In the meantime, our findings support the use of fortified soymilk with up to 7 g added sugars per 250 mL as a suitable replacement for cow’s milk and suggest that its classification as ultra-processed and/or not healthy based on small amounts of added sugars may be misleading and may need to be reconsidered to facilitate the recommended transition to plant-based diets.

Availability of data and materials

All data generated or analyzed during this study are included in this published article and its additional information file (Additional file 1).

Abbreviations

GRADE: Grading of Recommendations, Assessment, Development, and Evaluation

Non-HDL-C: Non-high-density lipoprotein cholesterol

LDL-C: Low-density lipoprotein cholesterol

CRP: C-reactive protein

HDL-C: High-density lipoprotein cholesterol

WHO: World Health Organization

US: United States

PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses

HFCS: High-fructose corn syrup

BMI: Body mass index

ApoB: Apolipoprotein B

HbA1c: Hemoglobin A1c

AUC: Plasma glucose area under the curve

GFR: Glomerular filtration rate

eGFR: Estimated glomerular filtration rate

ACR: Albumin–creatinine ratio

NAFLD: Non-alcoholic fatty liver disease

IHCL: Intrahepatocellular lipid

ALT: Alanine transaminase

AST: Aspartate aminotransferase

MD: Mean difference

RoB: Risk of bias

95% CI: 95% Confidence interval

GLST: Generalized least squares trend

FDA: Food and Drug Administration

TNF-α: Tumor necrosis factor alpha

RAAS: Renin–angiotensin–aldosterone system

NO: Nitric oxide

DV: Daily value

References

Dietary guidelines for Americans, 2020–2025. 2020. Available from: www.dietaryguidelines.gov .

Canada, Health. Canada’s Food Guide. Ottawa; 2019.  https://food-guide.canada.ca/en/ .

Canada’s food guide. Ottawa; 2019. Available from: https://food-guide.canada.ca/en/ .

Blomhoff R, Andersen R, Arnesen EK, Christensen JJ, Eneroth H, Erkkola M, Gudanaviciene I, Halldórsson ÞI, Höyer-Lund A, Lemming EW. Nordic nutrition recommendations 2023: integrating environmental aspects. Nordisk Ministerråd; 2023.

García EL, Lesmes IB, Perales AD, Arribas VM, del Puy Portillo Baquedano M, Velasco AMR, Salvo UF, Romero LT, Porcel FBO, Laín SA. Report of the Scientific Committee of the Spanish Agency for Food Safety and Nutrition (AESAN) on sustainable dietary and physical activity recommendations for the Spanish population. Wiley Online Library; 2023. Report No.: 2940–1399.

Brink E, van Rossum C, Postma-Smeets A, Stafleu A, Wolvers D, van Dooren C, et al. Development of healthy and sustainable food-based dietary guidelines for the Netherlands. Public Health Nutr. 2019;22(13):2419–35.


Lichtenstein AH, Appel LJ, Vadiveloo M, Hu FB, Kris-Etherton PM, Rebholz CM, et al. 2021 dietary guidance to improve cardiovascular health: a scientific statement from the American Heart Association. Circulation. 2021;144(23):e472–87.


Willett W, Rockström J, Loken B, Springmann M, Lang T, Vermeulen S, et al. Food in the Anthropocene: the EAT–Lancet Commission on healthy diets from sustainable food systems. The lancet. 2019;393(10170):447–92.


Bartashus J, Srinivasan G. Plant-based foods poised for explosive growth. Bloomberg Intelligence. 2021.

Monteiro CA, Cannon G, Lawrence M, Costa Louzada Md, Pereira Machado P. Ultra-processed foods, diet quality, and health using the NOVA classification system. Rome: FAO; 2019. p. 48.

International Dairy Federation. The contribution of school milk programmes to the nutrition of children worldwide. Brussels: Belgium; 2020.


USDA Food and Nutrition Service. Special Milk Program. Available from: https://www.fns.usda.gov/smp/special-milk-program .

The European Parliament. European Parliament resolution of 9 May 2023 on the implementation of the school scheme. Available from: https://www.europarl.europa.eu/doceo/document/TA-9-2023-0135_EN.html .

European Commission. Summary of FBDG recommendations for milk and dairy products for the EU, Iceland, Norway, Switzerland and the United Kingdom. Available from: https://knowledge4policy.ec.europa.eu/health-promotion-knowledge-gateway/food-based-dietary-guidelines-europe-table-7_en .

Addressing Digestive Distress in Stomachs of Our Youth (ADD SOY) Act, House of Representatives, 1st Sess.; 2023.  https://troycarter.house.gov/sites/evo-subsites/troycarter.house.gov/files/evo-media-document/add-soy-act.pdf .

Food and Drug Administration. Food labeling: nutrient content claims; definition of term “healthy”. In: Department of Health and Human Services (HHS); 2022.  https://www.federalregister.gov/documents/2022/09/29/2022-20975/food-labeling-nutrient-content-claims-definition-of-term-healthy .

Helstad S. Chapter 20 - corn sweeteners. In: Serna-Saldivar SO, editor. Corn. 3rd ed. Oxford: AACC International Press; 2019. p. 551–91.


Messina M, Sievenpiper JL, Williamson P, Kiel J, Erdman JW. Perspective: soy-based meat and dairy alternatives, despite classification as ultra-processed foods, deliver high-quality nutrition on par with unprocessed or minimally processed animal-based counterparts. Adv Nutr. 2022;13(3):726–38.


Higgins J, Thomas J, Chandler J. Cochrane handbook for systematic reviews of interventions version 6.2. 2021.

Moher D, Liberati A, Tetzlaff J, Altman DG, The PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. Ann Intern Med. 2009;151(4):264–9.

BMJ Best Practice. Search strategies. Available from: https://bestpractice.bmj.com/info/toolkit/learn-ebm/study-design-search-filters/ .

Rohatgi A. WebPlotDigitizer 4.6; 2022.  https://automeris.io/WebPlotDigitizer/ .

McGrath S, Zhao X, Steele R, Thombs BD, Benedetti A, the DEPRESsion Screening Data (DEPRESSD) Collaboration. Estimating the sample mean and standard deviation from commonly reported quantiles in meta-analysis. Stat Methods Med Res. 2020;29(9):2520–37.

Sterne JAC, Savovic J, Page MJ, Elbers RG, Blencowe NS, Boutron I, et al. RoB 2: a revised tool for assessing risk of bias in randomised trials. BMJ. 2019;366: l4898.

DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin Trials. 1986;7(3):177–88.


Tufanaru C, Munn Z, Stephenson M, Aromataris E. Fixed or random effects meta-analysis? Common methodological issues in systematic reviews of effectiveness. Int J Evid Based Healthc. 2015;13(3):196–207.

Elbourne DR, Altman DG, Higgins JP, Curtin F, Worthington HV, Vail A. Meta-analyses involving cross-over trials: methodological issues. Int J Epidemiol. 2002;31(1):140–9.

Balk EM, Earley A, Patel K, Trikalinos TA, Dahabreh IJ. Empirical assessment of within-arm correlation imputation in trials of continuous outcomes. 2013.

Fu R, Gartlehner G, Grant M, Shamliyan T, Sedrakyan A, Wilt TJ, et al. Conducting quantitative synthesis when comparing medical interventions: AHRQ and the Effective Health Care Program. J Clin Epidemiol. 2011;64(11):1187–97.

Peters JL, Sutton AJ, Jones DR, Abrams KR, Rushton L. Contour-enhanced meta-analysis funnel plots help distinguish publication bias from other causes of asymmetry. J Clin Epidemiol. 2008;61(10):991–6.

Egger M, Smith GD, Schneider M, Minder C. Bias in meta-analysis detected by a simple, graphical test. BMJ. 1997;315(7109):629–34.

Begg CB, Mazumdar M. Operating characteristics of a rank correlation test for publication bias. Biometrics. 1994;50(4):1088–101.

Duval S, Tweedie R. Trim and fill: a simple funnel-plot-based method of testing and adjusting for publication bias in meta-analysis. Biometrics. 2000;56(2):455–63.

Schünemann H, Brożek J, Guyatt G, Oxman A. GRADE handbook. Grading of Recommendations Assessment, Development and Evaluation, Grade Working Group. 2013.

McMaster University and Evidence Prime. GRADEpro GDT: GRADEpro Guideline Development Tool [Software]. gradepro.org .

Brunetti M, Shemilt I, Pregno S, Vale L, Oxman AD, Lord J, et al. GRADE guidelines: 10. Considering resource use and rating the quality of economic evidence. J Clin Epidemiol. 2013;66(2):140–50.

Guyatt GH, Oxman AD, Kunz R, Brozek J, Alonso-Coello P, Rind D, et al. GRADE guidelines 6. Rating the quality of evidence—imprecision. J Clin Epidemiol. 2011;64(12):1283–93.

Guyatt GH, Oxman AD, Kunz R, Woodcock J, Brozek J, Helfand M, et al. GRADE guidelines: 8. Rating the quality of evidence—indirectness. J Clin Epidemiol. 2011;64(12):1303–10.

Guyatt GH, Oxman AD, Kunz R, Woodcock J, Brozek J, Helfand M, et al. GRADE guidelines: 7. Rating the quality of evidence—inconsistency. J Clin Epidemiol. 2011;64(12):1294–302.

Guyatt GH, Oxman AD, Montori V, Vist G, Kunz R, Brozek J, et al. GRADE guidelines: 5. Rating the quality of evidence—publication bias. J Clin Epidemiol. 2011;64(12):1277–82.

Guyatt GH, Oxman AD, Santesso N, Helfand M, Vist G, Kunz R, et al. GRADE guidelines: 12. Preparing summary of findings tables-binary outcomes. J Clin Epidemiol. 2013;66(2):158–72.

Guyatt GH, Oxman AD, Sultan S, Glasziou P, Akl EA, Alonso-Coello P, et al. GRADE guidelines: 9. Rating up the quality of evidence. J Clin Epidemiol. 2011;64(12):1311–6.

Guyatt GH, Oxman AD, Vist G, Kunz R, Brozek J, Alonso-Coello P, et al. GRADE guidelines: 4. Rating the quality of evidence—study limitations (risk of bias). J Clin Epidemiol. 2011;64(4):407–15.

Guyatt GH, Thorlund K, Oxman AD, Walter SD, Patrick D, Furukawa TA, et al. GRADE guidelines: 13. Preparing summary of findings tables and evidence profiles-continuous outcomes. J Clin Epidemiol. 2013;66(2):173–83.

Kaminski-Hartenthaler A, Gartlehner G, Kien C, Meerpohl JJ, Langer G, Perleth M, et al. GRADE-Leitlinien: 11. Gesamtbeurteilung des Vertrauens in Effektschätzer für einen einzelnen Studienendpunkt und für alle Endpunkte. Zeitschrift für Evidenz, Fortbildung und Qualität im Gesundheitswesen. 2013;107(9):638–45.

Langendam M, Carrasco-Labra A, Santesso N, Mustafa RA, Brignardello-Petersen R, Ventresca M, et al. Improving GRADE evidence tables part 2: a systematic survey of explanatory notes shows more guidance is needed. J Clin Epidemiol. 2016;74:19–27.

Santesso N, Carrasco-Labra A, Langendam M, Brignardello-Petersen R, Mustafa RA, Heus P, et al. Improving GRADE evidence tables part 3: detailed guidance for explanatory footnotes supports creating and understanding GRADE certainty in the evidence judgments. J Clin Epidemiol. 2016;74:28–39.

Santesso N, Glenton C, Dahm P, Garner P, Akl EA, Alper B, et al. GRADE guidelines 26: informative statements to communicate the findings of systematic reviews of interventions. J Clin Epidemiol. 2020;119:126–35.

Balshem H, Helfand M, Schünemann HJ, Oxman AD, Kunz R, Brozek J, et al. GRADE guidelines: 3. Rating the quality of evidence. J Clin Epidemiol. 2011;64(4):401–6.

Schünemann HJ, Higgins JPT, Vist GE, Glasziou P, Akl EA, Skoetz N, Guyatt GH, on behalf of the Cochrane GRADEing Methods Group and the Cochrane Statistical Methods Group. Chapter 14: completing ‘summary of findings’ tables and grading the certainty of the evidence. In: Cochrane handbook for systematic reviews of interventions. 2019. p. 375–402.

Azadbakht L, Nurbakhsh S. Effect of soy drink replacement in a weight reducing diet on anthropometric values and blood pressure among overweight and obese female youths. Asia Pac J Clin Nutr. 2011;20(3):383–9.


Beavers KM, Serra MC, Beavers DP, Cooke MB, Willoughby DS. Soymilk supplementation does not alter plasma markers of inflammation and oxidative stress in postmenopausal women. Nutr Res. 2009;29(9):616–22.

Bricarello LP, Kasinski N, Bertolami MC, Faludi A, Pinto LA, Relvas WG, et al. Comparison between the effects of soy milk and non-fat cow milk on lipid profile and lipid peroxidation in patients with primary hypercholesterolemia. Nutrition. 2004;20(2):200–4.

Faghih S, Hedayati M, Abadi A, Kimiagar M. Comparison of the effects of cow’s milk, fortified soy milk, and calcium supplement on plasma adipocytokines in overweight and obese women. Iranian Journal of Endocrinology and Metabolism. 2009;11(6):692–8.

Gardner CD, Messina M, Kiazand A, Morris JL, Franke AA. Effect of two types of soy milk and dairy milk on plasma lipids in hypercholesterolemic adults: a randomized trial. J Am Coll Nutr. 2007;26(6):669–77.

Hasanpour A, Babajafari S, Mazloomi SM, Shams M. The effects of soymilk plus probiotics supplementation on cardiovascular risk factors in patients with type 2 diabetes mellitus: a randomized clinical trial. BMC Endocr Disord. 2023;23(1):36.

Keshavarz SA, Nourieh Z, Attar MJ, Azadbakht L. Effect of soymilk consumption on waist circumference and cardiovascular risks among overweight and obese female adults. Int J Prev Med. 2012;3(11):798–805.


Mohammad-Shahi M, Mowla K, Haidari F, Zarei M, Choghakhori R. Soy milk consumption, markers of inflammation and oxidative stress in women with rheumatoid arthritis: a randomised cross-over clinical trial. Nutr Diet. 2016;73(2):139–45.

Miraghajani MS, Esmaillzadeh A, Najafabadi MM, Mirlohi M, Azadbakht L. Soy milk consumption, inflammation, coagulation, and oxidative stress among type 2 diabetic patients with nephropathy. Diabetes Care. 2012;35(10):1981–5.

Mitchell JH, Collins AR. Effects of a soy milk supplement on plasma cholesterol levels and oxidative DNA damage in men—a pilot study. Eur J Nutr. 1999;38(3):143–8.

Nourieh Z, Keshavarz SA, Attar MJH, Azadbakht L. Effects of soy milk consumption on inflammatory markers and lipid profiles among non-menopausal overweight and obese female adults. Int J Prev Med. 2012;3:798.

Onning G, Akesson B, Oste R, Lundquist I. Effects of consumption of oat milk, soya milk, or cow’s milk on plasma lipids and antioxidative capacity in healthy subjects. Ann Nutr Metab. 1998;42(4):211–20.

Rivas M, Garay RP, Escanero JF, Cia P Jr, Cia P, Alda JO. Soy milk lowers blood pressure in men and women with mild to moderate essential hypertension. J Nutr. 2002;132(7):1900–2.

Ryan-Borchers TA, Park JS, Chew BP, McGuire MK, Fournier LR, Beerman KA. Soy isoflavones modulate immune function in healthy postmenopausal women. Am J Clin Nutr. 2006;83(5):1118–25.

Sirtori CR, Pazzucconi F, Colombo L, Battistin P, Bondioli A, Descheemaeker K. Double-blind study of the addition of high-protein soya milk v. cows’ milk to the diet of patients with severe hypercholesterolaemia and resistance to or intolerance of statins. Br J Nutr. 1999;82(2):91–6.

Sirtori CR, Bosisio R, Pazzucconi F, Bondioli A, Gatti E, Lovati MR, et al. Soy milk with a high glycitein content does not reduce low-density lipoprotein cholesterolemia in type II hypercholesterolemic patients. Ann Nutr Metab. 2002;46(2):88–92.

Steele M. Effect on serum cholesterol levels of substituting milk with a soya beverage. Aust J Nutr Diet. 1992;49(1):24–8.

Summary of Health Canada’s assessment of a health claim about soy protein and cholesterol lowering. Ottawa: Health Canada; 2015. Available from: https://www.canada.ca/en/health-canada/services/food-nutrition/food-labelling/health-claims/assessments/summary-assessment-health-claim-about-protein-cholesterol-lowering.html .

Food and Drug Administration. Food labeling health claims; soy protein and coronary heart disease. Fed Regist. 1999;64:57699–733.

Food and Drug Administration. Food labeling health claims; soy protein and coronary heart disease. Fed Regist. 2017;82:50324–46.

Blanco Mejia S, Messina M, Li SS, Viguiliouk E, Chiavaroli L, Khan TA, et al. A meta-analysis of 46 studies identified by the FDA demonstrates that soy protein decreases circulating LDL and total cholesterol concentrations in adults. J Nutr. 2019;149(6):968–81.

Jenkins DJA, Blanco Mejia S, Chiavaroli L, Viguiliouk E, Li SS, Kendall CWC, et al. Cumulative meta-analysis of the soy effect over time. J Am Heart Assoc. 2019;8(13):e012458.

Mosallanezhad Z, Mahmoodi M, Ranjbar S, Hosseini R, Clark CCT, Carson-Chahhoud K, et al. Soy intake is associated with lowering blood pressure in adults: a systematic review and meta-analysis of randomized double-blind placebo-controlled trials. Complement Ther Med. 2021;59:102692.

Viguiliouk E, Glenn AJ, Nishi SK, Chiavaroli L, Seider M, Khan T, et al. Associations between dietary pulses alone or with other legumes and cardiometabolic disease outcomes: an umbrella review and updated systematic review and meta-analysis of prospective cohort studies. Adv Nutr. 2019;10(Suppl_4):S308–19.

Sohouli MH, Lari A, Fatahi S, Shidfar F, Găman M-A, Guimaraes NS, et al. Impact of soy milk consumption on cardiometabolic risk factors: a systematic review and meta-analysis of randomized controlled trials. Journal of Functional Foods. 2021;83:104499.

Neuenschwander M, Stadelmaier J, Eble J, Grummich K, Szczerba E, Kiesswetter E, et al. Substitution of animal-based with plant-based foods on cardiometabolic health and all-cause mortality: a systematic review and meta-analysis of prospective studies. BMC Med. 2023;21(1):404.

Chiavaroli L, Nishi SK, Khan TA, Braunstein CR, Glenn AJ, Mejia SB, et al. Portfolio dietary pattern and cardiovascular disease: a systematic review and meta-analysis of controlled trials. Prog Cardiovasc Dis. 2018;61(1):43–53.

Viguiliouk E, Kendall CW, Kahleova H, Rahelic D, Salas-Salvado J, Choo VL, et al. Effect of vegetarian dietary patterns on cardiometabolic risk factors in diabetes: a systematic review and meta-analysis of randomized controlled trials. Clin Nutr. 2019;38(3):1133–45.

Glenn AJ, Guasch-Ferre M, Malik VS, Kendall CWC, Manson JE, Rimm EB, et al. Portfolio diet score and risk of cardiovascular disease: findings from 3 prospective cohort studies. Circulation. 2023;148(22):1750–63.

Glenn AJ, Lo K, Jenkins DJA, Boucher BA, Hanley AJ, Kendall CWC, et al. Relationship between a plant-based dietary portfolio and risk of cardiovascular disease: findings from the Women’s Health Initiative prospective cohort study. J Am Heart Assoc. 2021;10(16): e021515.

Lo K, Glenn AJ, Yeung S, Kendall CWC, Sievenpiper JL, Jenkins DJA, Woo J. Prospective association of the portfolio diet with all-cause and cause-specific mortality risk in the Mr. OS and Ms. OS study. Nutrients. 2021;13(12):4360.  https://doi.org/10.3390/nu13124360 .

Jenkins DJ, Mirrahimi A, Srichaikul K, Berryman CE, Wang L, Carleton A, et al. Soy protein reduces serum cholesterol by both intrinsic and food displacement mechanisms. J Nutr. 2010;140(12):2302S–2311S.

Ramdath DD, Padhi EM, Sarfaraz S, Renwick S, Duncan AM. Beyond the cholesterol-lowering effect of soy protein: a review of the effects of dietary soy and its constituents on risk factors for cardiovascular disease. Nutrients. 2017;9(4):324.  https://doi.org/10.3390/nu9040324 .


Acknowledgements

Aspects of this work were presented at the following conferences: Canadian Nutrition Society (CNS), Quebec City, Canada, May 4–6, 2023; 40th International Symposium on Diabetes and Nutrition, Pula, Croatia, June 15–18, 2023; and Nutrition 2023—American Society for Nutrition (ASN), Boston, USA, July 22–25, 2023.

Authors’ Twitter handles

@Toronto_3D_Unit.

Funding

This work was supported by the United Soybean Board (the United States Department of Agriculture Soybean Checkoff Program [funding reference number, 2411–108-0101]) and the Canadian Institutes of Health Research (funding reference number, 129920) through the Canada-wide Human Nutrition Trialists’ Network (NTN). The Diet, Digestive tract, and Disease (3D) Centre, funded through the Canada Foundation for Innovation and the Ministry of Research and Innovation’s Ontario Research Fund, provided the infrastructure for the conduct of this work. ME was funded by a CIHR Canada Graduate Scholarship and Toronto 3D PhD Scholarship award. DG was funded by an Ontario Graduate Scholarship. TAK and AZ were funded by a Toronto 3D Postdoctoral Fellowship Award. LC was funded by a Toronto 3D New Investigator Award. SA-C was funded by a CIHR Canadian Graduate Scholarship. DJAJ was funded by the Government of Canada through the Canada Research Chair Endowment. None of the sponsors had any role in study design; in the collection, analysis, and interpretation of data; in the writing of the report; or in the decision to submit the article for publication. However, one of the co-authors, Mark Messina, who was involved in all aspects of the study except data collection or analysis, is the Director of Nutrition Science and Research at the Soy Nutrition Institute Global, an organization that receives partial funding from the principal funder, the United Soybean Board (USB).

Author information

Authors and Affiliations

Department of Nutritional Sciences, Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada

M. N. Erlich, D. Ghidanac, S. Blanco Mejia, T. A. Khan, L. Chiavaroli, A. Zurbau, S. Ayoub-Charette, L. A. Leiter, R. P. Bazinet, D. J. A. Jenkins, C. W. C. Kendall & J. L. Sievenpiper

Toronto 3D Knowledge Synthesis and Clinical Trials Unit, Clinical Nutrition and Risk Factor Modification Centre, St. Michael’s Hospital, Toronto, ON, Canada

M. N. Erlich, D. Ghidanac, S. Blanco Mejia, T. A. Khan, L. Chiavaroli, A. Zurbau, S. Ayoub-Charette, L. A. Leiter, D. J. A. Jenkins, C. W. C. Kendall & J. L. Sievenpiper

Li Ka Shing Knowledge Institute, St. Michael’s Hospital, Toronto, ON, Canada

L. Chiavaroli, L. A. Leiter, D. J. A. Jenkins & J. L. Sievenpiper

Royal College of Surgeons in Ireland, Dublin, Ireland

Soy Nutrition Institute Global, Washington, DC, USA

Department of Medicine, Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada

L. A. Leiter, D. J. A. Jenkins & J. L. Sievenpiper

Division of Endocrinology and Metabolism, Department of Medicine, St. Michael’s Hospital, Toronto, ON, Canada

College of Pharmacy and Nutrition, University of Saskatchewan, Saskatoon, SK, Canada

C. W. C. Kendall


Contributions

The authors’ responsibilities were as follows: JLS designed the research (conception, development of overall research plan, and study oversight); ME and DG acquired the data; ME, SBM, TAK, and SAC performed the data analysis; JLS, ME, DG, SBM, AA, TAK, and LC interpreted the data; JLS and ME drafted the manuscript, have primary responsibility for the final content, and take responsibility for the integrity of the data and accuracy of the data analysis; JLS, MNE, DG, SBM, TAK, LC, AZ, SAC, AA, MM, LAL, RPB, CWCK, and DJD contributed to the project conception and critical revision of the manuscript for important intellectual content and read and approved the final version of the manuscript. The corresponding author attests that all listed authors meet the authorship criteria and that no others meeting the criteria have been omitted. All authors read and approved the final manuscript.

Corresponding author

Correspondence to J. L. Sievenpiper .

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Competing interests

TAK reports receiving grants from Institute for the Advancement of Food and Nutrition Sciences (IAFNS, formerly ILSI North America) and National Honey Board (USDA Checkoff program). He has received honorariums from Advancement of Food and Nutrition Sciences (IAFNS), the International Food Information Council (IFIC), the Calorie Control Council (CCC), the International Sweeteners Association (ISA), and AmCham Dubai. He has received funding from the Toronto 3D Knowledge Synthesis and Clinical Trials foundation. LC has received research support from the Canadian Institutes of health Research (CIHR), Protein Industries Canada (a Government of Canada Global Innovation Clusters), The United Soybean Board (USDA soy “Checkoff” program), and the Alberta Pulse Growers Association. AZ is a part-time research associate at INQUIS Clinical Research, Ltd., a contract research organization. She has received consulting fees from Glycemic Index Foundation Inc. SA-C has received an honorarium from the International Food Information Council (IFIC) for a talk on artificial sweeteners, the gut microbiome, and the risk for diabetes. MM was employed by the Soy Nutrition Institute Global, an organization that receives funding from the United Soybean Board (USB) and from members involved in the soy industry. RPB has received industrial grants, including those matched by the Canadian government, and/or travel support or consulting fees largely related to work on brain fatty acid metabolism or nutrition from Arctic Nutrition, Bunge Ltd., Dairy Farmers of Canada, DSM, Fonterra Inc, Mead Johnson, Natures Crops International, Nestec Inc. Pharmavite, Sancero Inc., and Spore Wellness Inc. Moreover, Dr. Bazinet is on the executive of the International Society for the Study of Fatty Acids and Lipids and held a meeting on behalf of Fatty Acids and Cell Signaling, both of which rely on corporate sponsorship. Dr. Bazinet has given expert testimony in relation to supplements and the brain. 
DJAJ has received research grants from Saskatchewan & Alberta Pulse Growers Associations, the Agricultural Bioproducts Innovation Program through the Pulse Research Network, the Advanced Foods and Material Network, Loblaw Companies Ltd., Unilever Canada and Netherlands, Barilla, the Almond Board of California, Agriculture and Agri-food Canada, Pulse Canada, Kellogg’s Company, Canada, Quaker Oats, Canada, Procter & Gamble Technical Centre Ltd., Bayer Consumer Care, Springfield, NJ, Pepsi/Quaker, International Nut & Dried Fruit Council (INC), Soy Foods Association of North America, the Coca-Cola Company (investigator initiated, unrestricted grant), Solae, Haine Celestial, the Sanitarium Company, Orafti, the International Tree Nut Council Nutrition Research and Education Foundation, the Peanut Institute, Soy Nutrition Institute (SNI), the Canola and Flax Councils of Canada, the Calorie Control Council, the Canadian Institutes of Health Research (CIHR), the Canada Foundation for Innovation (CFI), and the Ontario Research Fund (ORF). He has received in-kind supplies for trials as a research support from the Almond board of California, Walnut Council of California, the Peanut Institute, Barilla, Unilever, Unico, Primo, Loblaw Companies, Quaker (Pepsico), Pristine Gourmet, Bunge Limited, Kellogg Canada, and WhiteWave Foods. 
He has been on the speaker’s panel, served on the scientific advisory board and/or received travel support and/or honoraria from Lawson Centre Nutrition Digital Series, Nutritional Fundamentals for Health (NFH)-Nutramedica, Saint Barnabas Medical Center, The University of Chicago, 2020 China Glycemic Index (GI) International Conference, Atlantic Pain Conference, Academy of Life Long Learning, the Almond Board of California, Canadian Agriculture Policy Institute, Loblaw Companies Ltd, the Griffin Hospital (for the development of the NuVal scoring system), the Coca-Cola Company, Epicure, Danone, Diet Quality Photo Navigation (DQPN), Better Therapeutics (FareWell), Verywell, True Health Initiative (THI), Heali AI Corp, Institute of Food Technologists (IFT), Soy Nutrition Institute (SNI), Herbalife Nutrition Institute (HNI), Saskatchewan & Alberta Pulse Growers Associations, Sanitarium Company, Orafti, the International Tree Nut Council Nutrition Research and Education Foundation, the Peanut Institute, Herbalife International, Pacific Health Laboratories, Barilla, Metagenics, Bayer Consumer Care, Unilever Canada and Netherlands, Solae, Kellogg, Quaker Oats, Procter & Gamble, Abbott Laboratories, Dean Foods, the California Strawberry Commission, Haine Celestial, PepsiCo, the Alpro Foundation, Pioneer Hi-Bred International, DuPont Nutrition and Health, Spherix Consulting and WhiteWave Foods, the Advanced Foods and Material Network, the Canola and Flax Councils of Canada, Agri-Culture and Agri-Food Canada, the Canadian Agri-Food Policy Institute, Pulse Canada, the Soy Foods Association of North America, the Nutrition Foundation of Italy (NFI), Nutra-Source Diagnostics, the McDougall Program, the Toronto Knowledge Translation Group (St. 
Michael’s Hospital), the Canadian College of Naturopathic Medicine, The Hospital for Sick Children, the Canadian Nutrition Society (CNS), the American Society of Nutrition (ASN), Arizona State University, Paolo Sorbini Foundation, and the Institute of Nutrition, Metabolism and Diabetes. He received an honorarium from the United States Department of Agriculture to present the 2013 W.O. Atwater Memorial Lecture. He received the 2013 Award for Excellence in Research from the International Nut and Dried Fruit Council. He received funding and travel support from the Canadian Society of Endocrinology and Metabolism to produce mini cases for the Canadian Diabetes Association (CDA). He is a member of the International Carbohydrate Quality Consortium (ICQC). His wife, Alexandra L Jenkins, is a director and partner of INQUIS Clinical Research for the Food Industry, his 2 daughters, Wendy Jenkins and Amy Jenkins, have published a vegetarian book that promotes the use of the foods described here, The Portfolio Diet for Cardiovascular Risk Reduction (Academic Press/Elsevier 2020 ISBN:978–0-12–810510-8), and his sister, Caroline Brydson, received funding through a grant from the St. Michael’s Hospital Foundation to develop a cookbook for one of his studies. He is also a vegan. CWCK has received grants or research support from the Advanced Food Materials Network, Agriculture and Agri-Foods Canada (AAFC), Almond Board of California, Barilla, Canadian Institutes of Health Research (CIHR), Canola Council of Canada, International Nut and Dried Fruit Council, International Tree Nut Council Research and Education Foundation, Loblaw Brands Ltd, the Peanut Institute, Pulse Canada, and Unilever. He has received in-kind research support from the Almond Board of California, Barilla, California Walnut Commission, Kellogg Canada, Loblaw Companies, Nutrartis, Quaker (PepsiCo), the Peanut Institute, Primo, Unico, Unilever, and WhiteWave Foods/Danone. 
He has received travel support and/or honoraria from the Barilla, California Walnut Commission, Canola Council of Canada, General Mills, International Nut and Dried Fruit Council, International Pasta Organization, Lantmannen, Loblaw Brands Ltd, Nutrition Foundation of Italy, Oldways Preservation Trust, Paramount Farms, the Peanut Institute, Pulse Canada, Sun-Maid, Tate & Lyle, Unilever, and White Wave Foods/Danone. He has served on the scientific advisory board for the International Tree Nut Council, International Pasta Organization, McCormick Science Institute, and Oldways Preservation Trust. He is a founding member of the International Carbohydrate Quality Consortium (ICQC), Executive Board Member of the Diabetes and Nutrition Study Group (DNSG) of the European Association for the Study of Diabetes (EASD), is on the Clinical Practice Guidelines Expert Committee for Nutrition Therapy of the EASD, and is a Director of the Toronto 3D Knowledge Synthesis and Clinical Trials foundation. JLS has received research support from the Canadian Foundation for Innovation, Ontario Research Fund, Province of Ontario Ministry of Research and Innovation and Science, Canadian Institutes of health Research (CIHR), Diabetes Canada, American Society for Nutrition (ASN), National Honey Board (U.S. 
Department of Agriculture [USDA] honey “Checkoff” program), Institute for the Advancement of Food and Nutrition Sciences (IAFNS), Pulse Canada, Quaker Oats Center of Excellence, INC International Nut and Dried Fruit Council Foundation, The United Soybean Board (USDA soy “Checkoff” program), Protein Industries Canada (a Government of Canada Global Innovation Cluster), Almond Board of California, European Fruit Juice Association, The Tate and Lyle Nutritional Research Fund at the University of Toronto, The Glycemic Control and Cardiovascular Disease in Type 2 Diabetes Fund at the University of Toronto (a fund established by the Alberta Pulse Growers), The Plant Protein Fund at the University of Toronto (a fund which has received contributions from IFF among other donors), The Plant Milk Fund at the University of Toronto (a fund established by the Karuna Foundation through Vegan Grants), and The Nutrition Trialists Network Fund at the University of Toronto (a fund established by donations from the Calorie Control Council and Physicians Committee for Responsible Medicine). He has received food donations to support randomized controlled trials from the Almond Board of California, California Walnut Commission, Danone, Nutrartis, Soylent, and Dairy Farmers of Canada. He has received travel support, speaker fees and/or honoraria from Danone, FoodMinds LLC, Nestlé, Abbott, General Mills, Nutrition Communications, International Food Information Council (IFIC), Arab Beverages, International Sweeteners Association, Association Calorie Control Council, and Phynova. He has or has had ad hoc consulting arrangements with Perkins Coie LLP, Tate & Lyle, Ingredion, and Brightseed. He is on the Clinical Practice Guidelines Expert Committees of Diabetes Canada, European Association for the study of Diabetes (EASD), Canadian Cardiovascular Society (CCS), and Obesity Canada/Canadian Association of Bariatric Physicians and Surgeons. 
He serves as an unpaid member of the Board of Trustees of IAFNS. He is a Director at Large of the Canadian Nutrition Society (CNS), founding member of the International Carbohydrate Quality Consortium (ICQC), Executive Board Member of the Diabetes and Nutrition Study Group (DNSG) of the EASD, and Director of the Toronto 3D Knowledge Synthesis and Clinical Trials foundation. His spouse is an employee of AB InBev. All other authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

12916_2024_3524_MOESM1_ESM.docx

Additional file 1: Contains the PRISMA checklist, further details on the search process, and additional results.

Figure 1

Flow of literature on the effect of substituting soymilk for cow’s milk on intermediate cardiometabolic outcomes. Exclusion criteria: duplicate, abstract only (conference abstract), non-human (animal study), in vitro, review/position paper/commentary/letter, observational (observational study), no soymilk (intervention was not soymilk), children (participants < 18 years of age), no suitable comparator (comparator was not cow’s milk), isolated soy protein (an ISP powder was given to participants), acute (follow-up of < 3 weeks), combined intervention (effects of intervention and comparator could not be isolated), wrong endpoint (no data for outcomes of interest), alternative publication (repeated data from original publication)

Figure 2

A summary plot for the effect of substituting soymilk for cow’s milk on intermediate cardiometabolic outcomes. Analyses were conducted using generic inverse-variance random-effects models (at least 5 trials available) or fixed-effects models (fewer than 5 trials available). Between-study heterogeneity was assessed by the Cochrane Q statistic, where PQ < 0.100 was considered statistically significant, and quantified by the I² statistic, where I² ≥ 50% was considered evidence of substantial heterogeneity. Under GRADE, evidence from randomized controlled trials starts at “high” certainty and can be downgraded across 5 domains and upgraded for 1 domain. The white squares represent no downgrades, the filled black squares indicate a single downgrade or upgrade for an outcome, and the black square with a white “2” indicates a double downgrade for an outcome. Because all included trials were randomized or nonrandomized controlled trials, the certainty of the evidence was graded as high for all outcomes by default and then downgraded or upgraded based on prespecified criteria. Criteria for downgrades included risk of bias (downgraded if most trials were considered to be at high risk of bias); inconsistency (downgraded if there was substantial unexplained heterogeneity: I² ≥ 50%; PQ < 0.10); indirectness (downgraded if factors relating to the participants, interventions, or outcomes limited the generalizability of the results); imprecision (downgraded if the 95% CI crossed the minimally important difference (MID) for harm or benefit); and publication bias (downgraded if there was evidence of publication bias based on funnel plot asymmetry and/or a significant Egger or Begg test (P < 0.10), with confirmation by adjustment using the trim-and-fill analysis of Duval and Tweedie). The criteria for upgrades included a significant dose–response gradient.
For the interpretation of magnitude, we used the MIDs to assess the importance of our point estimates using the effect size categories according to the GRADE guidance, as follows: a large effect (≥ 5 × MID); a moderate effect (≥ 2 × MID); a small important effect (≥ 1 × MID); and a trivial/unimportant effect (< 1 × MID). *HDL-C values reversed to show benefit. **LDL-C was not downgraded for imprecision, as the degree to which the upper 95% CI crosses the MID is not clinically meaningful. Additionally, the moderate change in non-HDL-C, with high certainty of evidence, substantiates the high certainty of the LDL-C results.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Erlich, M.N., Ghidanac, D., Blanco Mejia, S. et al. A systematic review and meta-analysis of randomized trials of substituting soymilk for cow’s milk and intermediate cardiometabolic outcomes: understanding the impact of dairy alternatives in the transition to plant-based diets on cardiometabolic health. BMC Med 22 , 336 (2024). https://doi.org/10.1186/s12916-024-03524-7

Download citation

Received: 20 December 2023

Accepted: 09 July 2024

Published: 22 August 2024

DOI: https://doi.org/10.1186/s12916-024-03524-7


Keywords

  • Soy protein
  • Cardiovascular disease
  • Systematic review
  • Meta-analysis
  • Randomized controlled feeding trials

BMC Medicine

ISSN: 1741-7015


Suicide rates among physicians compared with the general population in studies from 20 countries: gender stratified systematic review and meta-analysis

Linked editorial: Doctors and suicide

  • 1 Department of Epidemiology, Center for Public Health, Medical University of Vienna, Vienna, Austria
  • 2 Department of Emergency Medicine, Vienna General Hospital, Medical University of Vienna, Vienna, Austria
  • 3 Department of Social and Preventive Medicine, Center for Public Health, Medical University of Vienna, Vienna, Austria
  • 4 Department of Epidemiology, Harvard TH Chan School of Public Health, Boston, MA, USA
  • Correspondence to: E Schernhammer eva.schernhammer{at}muv.ac.at ( @EvaSchernhammer on X)
  • Accepted 10 June 2024

Objectives To estimate age standardised suicide rate ratios in male and female physicians compared with the general population, and to examine heterogeneity across study results.

Design Systematic review and meta-analysis.

Data sources Studies published between 1960 and 31 March 2024 were retrieved from Embase, Medline, and PsycINFO. There were no language restrictions. Forward and backward reference screening was performed for selected studies using Google Scholar.

Eligibility criteria for selecting studies Observational studies with directly or indirectly age standardised mortality ratios for physician deaths by suicide, or suicide rates per 100 000 person years of physicians and a reference group similar to the general population, or extractable data on physician deaths by suicide suitable for the calculation of ratios. Two independent reviewers extracted data and assessed the risk of bias using an adapted version of the Joanna Briggs Institute checklist for prevalence studies. Mean effect estimates for male and female physicians were calculated based on random effects models, with subgroup analyses for geographical region and a secondary analysis of deaths by suicide in physicians compared with other professions.

Results Among 39 included studies, 38 studies for male physicians and 26 for female physicians were eligible for analyses, with a total of 3303 suicides in male physicians and 587 in female physicians (observation periods 1935-2020 and 1960-2020, respectively). Across all studies, the suicide rate ratio for male physicians was 1.05 (95% confidence interval 0.90 to 1.22). For female physicians, the rate ratio was significantly higher at 1.76 (1.40 to 2.21). Heterogeneity was high for both analyses. Meta-regression revealed a significant effect of the midpoint of study observation period, indicating decreasing effect sizes over time. The suicide rate ratio for male physicians compared with other professions was 1.81 (1.55 to 2.12).

Conclusion Standardised suicide rate ratios for male and female physicians decreased over time. However, the rates remained increased for female physicians. The findings of this meta-analysis are limited by a scarcity of studies from regions outside of Europe, the United States, and Australasia. These results call for continued efforts in research and prevention of physician deaths by suicide, particularly among female physicians and at risk subgroups.

Systematic review registration PROSPERO CRD42019118956.

Introduction

In 2019, suicide caused over 700 000 deaths globally, which was more than one in every 100 deaths that year (1.3%). While the worldwide age standardised suicide rate was estimated at 9.0 per 100 000 population, there was great variation between individual countries (from <2 to >80 suicide deaths per 100 000). 1 The overall global decline in suicide rates by 36% since 2000 is not a universal trend because some countries like the United States or Brazil saw an increase of roughly the same magnitude. 1 2 Among many other social and environmental factors, occupation has been shown to influence suicide risk beyond established risk factors such as low socioeconomic status or educational attainment. 3 4 5 6 7

Physicians are one of several occupational groups linked to a higher risk of death by suicide, and the medical community has a longstanding and often conflicted history in addressing this issue. 8 A JAMA editorial from 1903 reviewed annual suicide numbers for US physicians and concluded that their suicide risk is higher compared with the general population. 9 A substantial amount of evidence has been accumulated globally in the 120 years since then, providing more insight on the topic and the challenges involved in its assessment. Most earlier research reported higher suicide rates for male and female physicians compared with the general population, and the mean effect estimates from the first meta-analysis in 2004 indicated a significantly increased standardised mortality ratio (SMR) of 1.41 for male physicians and 2.27 for female physicians. 10 This meta-analysis included 22 studies on suicide in physicians with observation periods between 1910 and 1998 and revealed some heterogeneity among study results, which was partly explained by the decline in risk over time. Similarly, another meta-analysis that included nine studies with observation periods between 1980 and 2015 reported a significantly decreased SMR of 0.68 for male physicians and a significantly increased SMR of 1.46 for female physicians. 11

In addition to publication year, several other factors could potentially drive heterogeneity between the published studies. Methodological differences in study design, outcome measures, and level of age standardisation could explain heterogeneity between studies. Furthermore, individual countries and world regions have varying levels of stigma about suicide in general and among physicians in particular, associated with different risks of underreporting, access to support systems, and generally different training and working conditions.

In this study, we aimed to perform an appraisal of the currently available evidence on suicide deaths in male and female physicians compared with the general population. We also aimed to explore heterogeneity by considering a broader spectrum of potential covariates. We hypothesise that suicide rate ratios for male and female physicians have declined over time, but gender differences persist and suicide risk remains increased for female physicians.

Search strategy and study selection

This meta-analysis was conducted based on recommendations of the Cochrane Collaboration, 12 and is reported in accordance with the preferred reporting items for systematic review and meta-analyses (PRISMA) statement. 13 We searched for observational studies with data on suicide rates in physicians compared with the general population or similar using Medline, PsycINFO, and Embase. “Physician,” “mortality,” and “suicide” were entered as MeSH terms and text words and then connected through Boolean operators. The specific search strategy was developed and adapted for each database with the support of librarians from the Medical University of Vienna (supplement table S1). Following Schernhammer and Colditz, 10 we limited the search period to articles published after 1960 but updated it through to 31 March 2024. No constraints were placed on the language in which the reports were written, the region where study participants lived, or their age group. Articles published in languages other than English or German were screened with the help of the translation software DeepL 14 and colleagues fluent in these languages. Screening of the literature was done independently by two reviewers (CZ and SS). We also performed forward and backwards reference screening for the included articles and searched for unpublished data from sources and databases listed in included articles, such as the US National Institute for Occupational Safety and Health, the UK Office for National Statistics, Switzerland’s Federal Statistical Office, and Statistics Denmark.

We excluded studies that reported only on specific suicide methods in physicians, non-fatal suicidal behaviour or thoughts, mental health and burnout, and suicide prevention. We also excluded conference abstracts, editorials, case studies, and letters. Only reports with adequate data about physician deaths by suicide (not attempts) were eligible.

At the full text screening stage, we decided to only include rate based outcome measures that compare the suicide mortality in a physician population with the suicide mortality in a reference population. This includes the indirectly standardised mortality ratio (SMR), directly standardised rate ratio (SRR), and the comparative mortality figure. Even though their formulas and recommended uses differ and might yield slightly different results when calculated for the exact same population, 15 it can be argued that they are comparable estimates for the purpose of meta-analysing suicide deaths in physicians compared with a reference population. We also included rate ratios, even though their level of age standardisation is typically less detailed and only comprises one age group (with lower or upper age cutoff points). However, the proportionate mortality ratio expresses a different concept (the cause specific SMR divided by the all cause SMR, or the rate of suicides in all physician deaths divided by the rate of suicides in all population deaths). This outcome measure is not suitable for calculation of combined estimates with SMRs, especially in target populations with higher general life expectancy like physicians, 16 and was therefore not included. We also excluded studies that reported odds ratios and relative risk calculations because these are not based on rates.
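
The contrast between the rate based measures and the proportionate mortality ratio can be written out explicitly. A sketch, with O the observed suicides among physicians, n_i the physician person years in age stratum i, and r_i the reference population suicide rate in that stratum:

```latex
\mathrm{SMR} = \frac{O}{E} = \frac{O}{\sum_i n_i r_i},
\qquad
\mathrm{PMR} = \frac{\mathrm{SMR}_{\text{suicide}}}{\mathrm{SMR}_{\text{all cause}}}
```

Because physicians tend to have a below-average all-cause SMR (higher general life expectancy), dividing by it inflates the PMR relative to the suicide SMR, which is why PMRs were not pooled with the rate based measures.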

We avoided overlapping time periods of the same geographical regions among included studies so that any physician death by suicide would only be counted once towards the pooled result. In case of overlaps, only one study was included, and the decision of which to include was based on three criteria in sequential order: sample size (higher number of observed suicides); risk of bias (lower risk of bias based on the Joanna Briggs Institute (JBI) checklist for prevalence studies); and recentness (more recent midpoint of observation period). We also excluded studies that only reported overall (and not gender stratified) suicide ratios, only covered physician subgroups (eg, medical specialties), or did not meet minimum requirements for sample size (ie, an expected number of one suicide). When necessary information for inclusion was missing from eligible studies or the source of data was unclear, we contacted the authors. We excluded studies if the necessary information could not be obtained. A detailed list of excluded references including reason for exclusion can be found in the supplement (table S2).
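
The three sequential criteria for resolving overlaps amount to a lexicographic ordering, which can be sketched directly (the `Study` fields and function name here are hypothetical, for illustration only):

```python
from dataclasses import dataclass

@dataclass
class Study:
    name: str
    n_suicides: int         # observed suicides (sample size criterion)
    low_risk_of_bias: bool  # JBI checklist based assessment
    midpoint_year: float    # midpoint of the observation period

def pick_from_overlap(studies):
    """Resolve geographically overlapping studies with the three sequential
    criteria: larger sample, then lower risk of bias, then more recent midpoint."""
    return max(studies, key=lambda s: (s.n_suicides, s.low_risk_of_bias, s.midpoint_year))
```

Python compares the key tuples element by element, so risk of bias only breaks ties in sample size, and recentness only breaks ties in both, matching the stated sequential order.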

Data extraction and risk of bias

Data extraction was conducted by two reviewers (CZ and SS) using a standardised table in Microsoft Excel. If studies did not include an SMR, but reported the numbers of observed (O) and expected (E) suicides or the necessary information to calculate them, the SMR was calculated by the reviewers (SMR=O/E). If the studies did not include an SRR or rate ratio, but reported (age standardised) suicide rates per 100 000 person years for physicians (R1) and a suitable reference population (R2) for a similar time period, the SRR or rate ratio was calculated (SRR=R1/R2, rate ratio=R1/R2). For one study, R1 and R2 were estimated from graphs. 17 Because not all studies reported confidence limits and the ones that did used different methods, we calculated 95% confidence intervals (CIs) for all studies based on Fisher’s exact test using observed and expected suicide numbers. For SRRs or rate ratios, we calculated the expected suicides by treating the SRR as an SMR (E=O/SRR). Standard errors were derived from the calculated 95% CIs by using the formula recommended for ratios in the Cochrane handbook (standard error=(ln upper CI limit – ln lower CI limit)/3.92). 12
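
The extraction formulas above can be sketched in code. This assumes the "exact" limits are the standard Garwood exact Poisson interval for the observed count, a common realization of confidence limits based on Fisher's exact test for SMRs; scipy is assumed available, and the function names are illustrative:

```python
import numpy as np
from scipy.stats import chi2

def smr_with_exact_ci(observed, expected, alpha=0.05):
    """SMR = O/E with exact Poisson (Garwood) limits for the observed count."""
    smr = observed / expected
    lower = 0.5 * chi2.ppf(alpha / 2, 2 * observed) / expected if observed > 0 else 0.0
    upper = 0.5 * chi2.ppf(1 - alpha / 2, 2 * (observed + 1)) / expected
    return smr, lower, upper

def expected_from_ratio(observed, ratio):
    """Back-calculate expected suicides by treating an SRR like an SMR (E = O/SRR)."""
    return observed / ratio

def se_from_ci(lower, upper):
    """Cochrane handbook formula: SE of a log ratio from its 95% CI."""
    return (np.log(upper) - np.log(lower)) / 3.92
```

For example, 10 observed against 8 expected suicides gives SMR 1.25 with exact limits of roughly 0.60 to 2.30, from which a standard error on the log scale can be recovered for pooling.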

In addition to variables relating to the main outcome, we extracted data on the following study characteristics to be used in sensitivity analyses: geographical location, observation period, age range, level of age standardisation, suicide classification, study design, and reference group. We used duplicate extraction and checked the final extraction table for errors to ensure accuracy.

Because there was no suitable validated scale to assess the quality of observational studies on mortality ratios, we used the JBI checklist for prevalence studies 18 as a critical appraisal tool for risk of bias assessment. Out of nine questions on this checklist, three were deemed not applicable owing to the investigation of mortality rather than morbidity (see supplement table S3a). Two reviewers (CZ and SS) independently evaluated a subsample of the included studies and the JBI checklist was subsequently further specified to achieve clear criteria for risk of bias assessment (see supplement table S3b). The same two reviewers then independently evaluated all studies (supplement table S4a and S4b). Consistency in rating was high; disagreements were resolved through discussion. If all applicable items of the JBI checklist were rated positive, a study was classified as having low risk of bias. If at least one item was rated negative or unclear, a study was classified as having moderate or high risk of bias.

Data analysis

We performed separate meta-analyses of suicide rate ratios for male and female physicians. Random effects models were chosen a priori owing to the assumption that the included studies represent a random sample of different yet comparable physician populations with some heterogeneity in effect size. 19 Random effects models were calculated based on the Hartung-Knapp method (also known as the Sidik-Jonkman method). 20 Cumulative meta-analyses were performed to examine changes in the overall mean effect estimate over time. Heterogeneity was assessed by Q tests, I², T², and prediction intervals.
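The random effects pooling described above can be sketched as follows. This is an illustrative Python implementation of the Hartung-Knapp adjustment with a 95% prediction interval, not the Stata routines the authors used; the DerSimonian-Laird τ² estimator here is an assumption, since the paper does not state which between-study variance estimator was applied:

```python
import numpy as np
from scipy.stats import t

def hksj_random_effects(yi, sei):
    """Random effects pooling of log rate ratios with the Hartung-Knapp
    adjustment (minimal sketch).

    yi  : log effect sizes (e.g. ln SMR)
    sei : their standard errors
    Returns pooled ratio, 95% CI, 95% prediction interval, tau^2, I^2 (%).
    """
    yi, sei = np.asarray(yi, float), np.asarray(sei, float)
    k = len(yi)
    wi = 1.0 / sei**2                                   # fixed-effect weights
    mu_fe = np.sum(wi * yi) / np.sum(wi)
    q = np.sum(wi * (yi - mu_fe)**2)                    # Cochran's Q
    c = np.sum(wi) - np.sum(wi**2) / np.sum(wi)
    tau2 = max(0.0, (q - (k - 1)) / c)                  # DerSimonian-Laird tau^2
    wri = 1.0 / (sei**2 + tau2)                         # random-effects weights
    mu = np.sum(wri * yi) / np.sum(wri)
    # Hartung-Knapp: weighted residual variance with a t-based CI
    var_hk = np.sum(wri * (yi - mu)**2) / ((k - 1) * np.sum(wri))
    half = t.ppf(0.975, k - 1) * np.sqrt(var_hk)
    ci = (np.exp(mu - half), np.exp(mu + half))
    # 95% prediction interval for the effect in a new, comparable population
    half_pi = t.ppf(0.975, k - 2) * np.sqrt(tau2 + var_hk)
    pi = (np.exp(mu - half_pi), np.exp(mu + half_pi))
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0
    return np.exp(mu), ci, pi, tau2, i2

# hypothetical toy data: three log rate ratios with equal standard errors
pooled, ci, pi, tau2, i2 = hksj_random_effects([0.1, 0.2, 0.3], [0.1, 0.1, 0.1])
```

The prediction interval is always at least as wide as the confidence interval, which is why the paper reports both: the CI describes the mean effect, the prediction interval the plausible range in a future comparable physician population.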

Begg and Egger tests were conducted to evaluate the possibility of publication bias, which was also assessed by funnel plot and trim-and-fill analysis. We performed sensitivity analyses using meta-regression (for single covariates and adjusted for study observation period midpoint), including binary variables for several study characteristics (see supplement table S5a and S5b): risk of bias (low risk v moderate or high risk studies), study design (registry based studies v others), outcome measures (SMR v others), level of age standardisation (detailed with several age groups used v others), suicide classification (narrow international classification of diseases (ICD) definition without deaths of undetermined intent v others), age range (studies with a cutoff point around retirement age v others), and reference group (general population v similar). We also performed meta-regressions for length of observation period and number of suicides. Subgroup analysis was performed to assess geographical differences in two categorisations: World Health Organization world regions (with studies from the Americas, European Region, and Western Pacific Region for male and female physicians, only one study from the African Region for male physicians, and no studies from the South East Asian and Eastern Mediterranean Region) and most common study origin regions, reflecting the accumulation of reports from certain parts of the world (US, UK, Scandinavia, other European countries, rest of the world). We also used subgroups to calculate mean effect estimates in older and more recent studies. Two groups were formed based on the midpoint of study observation period, with one subgroup consisting of the 10 most recent studies, and another subgroup with the remaining studies. To account for multiple testing, we set the level of significance at P<0.01 for all sensitivity analyses.
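The classic Egger test regresses the standardised effect on precision and tests whether the intercept differs from zero; small-study effects pull the intercept away from the origin. A minimal Python sketch of that formulation (illustrative only; the paper's tests were run in Stata):

```python
import numpy as np
from scipy import stats

def egger_test(yi, sei):
    """Egger regression test for small-study effects (minimal sketch).

    yi  : effect sizes on the log scale
    sei : their standard errors
    Returns the regression intercept and a two-sided p value (t, df=k-2).
    """
    yi, sei = np.asarray(yi, float), np.asarray(sei, float)
    z = yi / sei              # standardised effects
    precision = 1.0 / sei
    res = stats.linregress(precision, z)
    k = len(yi)
    t_stat = res.intercept / res.intercept_stderr
    p = 2 * stats.t.sf(abs(t_stat), k - 2)
    return res.intercept, p

# hypothetical toy data (three studies)
intercept, p = egger_test([1.0, 0.5, 2 / 3], [1.0, 0.5, 1 / 3])
```

An intercept near zero with a large p value is consistent with a symmetric funnel plot, matching the paper's finding of no evidence of publication bias.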

We conducted a secondary meta-analysis on suicide rates in physicians compared with another reference group that was more similar than the general population in terms of socioeconomic status. Studies were included if they provided data on deaths by suicide in physicians as well as a group of other professions with similar socioeconomic status (all other eligibility criteria remained the same).

All analyses were performed with Stata (version 17). This study was registered at the International Prospective Register of Ongoing Systematic Reviews (PROSPERO) under CRD42019118956.

Patient and public involvement

Several authors of this paper have trained and worked as physicians, and lived through the loss of colleagues to suicide. Their firsthand experiences offered valuable insights similar to those typically provided by patients. Because of the highly methodical nature of a systematic review and meta-analysis, it was difficult to involve members of the public in most areas of the study design and execution. However, patient and public involvement representatives reviewed the manuscript after submission and offered suggestions on language, dissemination, and general improvements to increase its relevance to those affected by physician deaths by suicide.

Included studies

The initial literature search yielded 23 458 studies. After removing duplicates and screening titles and abstracts, we were left with 786 articles. Application of the inclusion criteria resulted in 75 reports and we found a further 22 potentially eligible studies through reference list and registry based searches. Full text screening resulted in 38 studies for male physicians and 26 for female physicians that were eligible for analyses ( fig 1 ). Because a few studies provided more than one effect estimate, 21 22 a total of 42 datasets (male physicians) and 27 datasets (female physicians) were used for meta-analysis ( table 1 and table 2 ).

Fig 1

Flowchart showing study selection


Characteristics of included studies on male physicians


Characteristics of included studies on female physicians

Meta-analyses

The meta-analysis on suicide deaths in male physicians ( fig 2 ) produced a mean effect estimate of 1.05 (95% CI 0.90 to 1.22). The Q test was highly significant (Q=460.2, df=41, P<0.001), and the I² of 94% indicated that a high proportion of variance in the observed effects was caused by heterogeneity in true effects compared with sampling error. The variance of true effect size estimated with T² was 0.216, and the standard deviation T was 0.465. The resulting prediction interval ranged from 0.41 to 2.72, which indicates that in 95% of all comparable future studies in male physician populations, the true effect size will fall in this interval. This finding reflects a high level of dispersion, suggesting that the suicide rates are decreased in some male physician populations but increased in others compared with the general population. Meta-regression confirmed calendar time (measured by midpoint of study observation period) as a highly significant covariate (β=−0.015, P<0.001), with an adjusted R² indicating an explained proportion of 52% of between-study variance.

Fig 2

Forest plot of suicide rate ratios for male physicians compared with general population

The mean effect estimate for suicide deaths in female physicians ( fig 3 ) was 1.76 (95% CI 1.40 to 2.21). The Q test for heterogeneity was highly significant (Q=143.2, df=26, P<0.001), and the I² of 84% indicated a high proportion of variance caused by heterogeneity in true effects, with T² estimated at 0.278 and T at 0.523. The prediction interval ranged from 0.58 to 5.35, so the dispersion of the true effect size across studies on female physicians was also substantial, ranging from decreased suicide rates in some female physician populations to considerably increased rates in others. The midpoint of study observation period also showed a highly significant association with the pooled estimate in a meta-regression (β=−0.024, P<0.001), explaining 87% of between-study variance.

Fig 3

Forest plot of suicide rate ratios for female physicians compared with general population

Cumulative meta-analyses show a decrease in suicide rate ratios over time (supplement figures S1a and S1b): pooled estimates decline for female physicians throughout all studies, and for male physicians in studies with midpoints of observation period after 1985.

Further analyses

We performed sensitivity analyses across all studies using meta-regression. We did not observe any significant (P<0.01) associations for male or female physicians with study design, outcome measures, level of age standardisation, suicide classification, age range, reference group, length of observation period, or number of suicides. We found a significant association between risk of bias and effect size for male (β=−0.475, P=0.001) and female (β=−0.601, P=0.003) physicians, but after adjusting for midpoint of observation period, this association was no longer significant.

The Egger and Begg tests gave no evidence of publication bias for studies on male or female physicians. The funnel plots showed no asymmetry, although they did reflect the high heterogeneity between studies (figures S2a and S2b). The non-parametric trim-and-fill analyses imputed no studies for male or female physicians; therefore, no difference in effect size was found for observed versus observed plus imputed studies.

We also performed subgroup analyses based on geographical study location in two different categorisations: WHO world regions and most common study origin regions. With both analyses, the decrease in effect sizes over time was visible in most subgroups, and lower effect sizes were observed especially in studies from Asian countries (supplement figures S3a, S3b, S4a, and S4b). This finding translates to a lower overall suicide rate ratio for male physicians in the Western Pacific Region of 0.61 (95% CI 0.35 to 1.04) and, similarly, for studies outside of Europe and the US of 0.69 (0.45 to 1.06). This pattern was not observed for female physicians, although the suicide rate ratio for the Western Pacific Region (1.06, 0.34 to 3.32) was also the lowest compared with all other subgroups.

Given that calendar time has been shown to have a strong association with effect size, we also performed a subgroup analysis of the 10 most recent studies versus all older studies. For male physicians (supplement figure S5a), the mean effect estimate in the subgroup of 32 older datasets was increased at 1.17 (0.96 to 1.41), whereas in the subgroup of the 10 most recent studies it was significantly decreased at 0.78 (0.70 to 0.88). For female physicians (supplement figure S5b), the mean suicide rate ratio in the subgroup of 17 older studies was significantly increased at 2.21 (1.63 to 3.01). In the subgroup of the 10 most recent studies, the mean effect was still significantly increased at a lower level of 1.24 (1.00 to 1.55).

Secondary meta-analysis

We conducted another meta-analysis on suicide rates in physicians compared with other professions of similar socioeconomic status and identified eight studies that compared male physicians with a reference group of other academics, other professionals, other health professionals, or members of social class I (supplement figure S6 and table S6). The pooled effect estimate was significantly increased at 1.81 (95% CI 1.55 to 2.12). The Q test (Q=17.6, df=7, P=0.01) was significant, but the I² of 58% and the prediction interval of 1.15 to 2.87 indicated a lower level of heterogeneity compared with the main analysis, and a more similar effect size across studies. We found five studies on female physicians (supplement table S6). The results of these studies appeared similar to those for male physicians, but we deemed the number of eligible studies too low for a random effects meta-analysis. 62

In this meta-analysis summarising the available evidence on physician deaths by suicide, we found the rate ratio for female physicians to be significantly raised, but not for male physicians. This result confirmed our hypothesis that mean effect estimates would be lower than in a previous meta-analysis on the subject published in 2004. 10 Calendar time was identified as a significant covariate in both analyses, indicating decreasing suicide rate ratios for physicians over time. The high level of heterogeneity in results from different studies suggests that suicide risk for male and female physicians is not consistent across various physician populations. Therefore, the pooled effect estimate is only of limited use in describing the overall suicide risk for physicians compared with the general population. In a secondary meta-analysis, the suicide rate ratio of male physicians was shown to be significantly raised when other professional groups with similar socioeconomic status were used as a reference group, with less heterogeneity across study results.

Strengths and limitations of this study

We did not impose any language restrictions on our search strategy so that relevant studies from different geographical regions were found. Consequently, we were able to include a large number of studies from 20 countries providing overall and recent summary estimates based on a complete assessment of the available evidence. This study also explored a range of covariates as potential causes for heterogeneity.

Several weaknesses should also be mentioned. Underreporting of suicide deaths might be more common for physicians compared with the general population, 8 influencing ratios between those two populations in the original studies. Despite the large number of included reports, several geographical regions are still underrepresented in the available evidence, which limits the generalisability of findings.

Comparison with other studies

A systematic review on physician deaths by suicide included a meta-analysis of studies with observation periods between 1980 and 2015, 11 but found only nine eligible studies (a third of which were already included in the first meta-analysis by Schernhammer and Colditz 10 ). This analysis was also subject to some methodological limitations, such as using a potentially arbitrary starting point for study observation periods and not accounting for overlap between included studies (therefore counting some physician deaths by suicide twice). Another systematic review and meta-analysis on physician and healthcare worker deaths by suicide included only one new study compared with Schernhammer and Colditz 10 and so did not provide an updated estimate. 63 Additionally, this analysis included a large US study that reported increased proportionate mortality ratios, shifting the pooled estimate for male physicians towards showing an effect.

Meaning of the study

The results of this study suggest that across different physician populations, the suicide risk is decreasing compared with the general population, although it remains raised for female physicians. The causes of this decline are unknown, but several factors might play a part. The critical appraisal of the included studies indicated better study quality among more recent studies, which might have contributed to the decrease in effect sizes over time. Meta-regression results by Duarte and colleagues suggested that the decrease in suicide risk in male physicians was driven by a reduction in the rate of suicide deaths in physicians rather than an increase in suicide deaths in the population. 11 This finding could mean that physicians have benefitted more from general or targeted suicide prevention efforts compared with the general population, which is testament to the repeated calls for more awareness and interventions to support the mental health of physicians. 64 65 Furthermore, the proportion of female physicians has increased over recent decades, and the average proportion of female physicians across all OECD (Organisation for Economic Co-operation and Development) countries reached 50% in 2021. 66 This change is likely to affect working conditions in a historically male dominated field, in ways that could be relevant to the mental health of workers. Some evidence exists that occupational gender composition affects the availability of workplace support and affective wellbeing, with higher support levels in mixed rather than male dominated occupations. 67 68

It is important to note, however, that considerable heterogeneity exists in the suicide risk of different physician populations that is still partly unexplained. Working as a physician is probably associated with different risk and protective factors across diverse healthcare systems, as well as training and work environments. Additionally, prevailing attitudes and stigma about mental health and suicide could vary. Societal influences on suicide rates over time might affect physicians differently compared with the general population (eg, mental health stigma might differ for physicians compared with the general population, and change at a different rate). Therefore, it seems plausible that the relation between suicide deaths in physicians compared with the general population differs between regions and countries.

Policy implications

Overall, this study highlights the ongoing need for suicide prevention measures among physicians. We found evidence for increased suicide rates in female physicians compared with the general population, and for male physicians compared with other professionals. Additionally, the decreasing trend in suicide risk in physicians is not a universal phenomenon. An Australian study found a substantial increase in suicide risk for female physicians, which doubled between 2001 and 2017. 58 The recent covid-19 pandemic has put additional strain on the mental health of physicians, potentially exacerbating risk factors for suicide such as depression and substance use. 69 70 Other important risk factors include suicidal ideation and attempted suicide, and their prevalence among physicians was estimated by a recent meta-analysis. The results suggest higher levels of suicidal ideation among physicians compared with the general population, whereas the prevalence of suicide attempts appeared to be lower. 71 This finding could indicate that suicidal intent in physicians is more likely to result in fatal rather than non-fatal suicidal behaviours. 72 A systematic review on mental illness in physicians concluded that a coordinated range of mental health initiatives needs to be implemented at the individual and organisational level to create workplaces that support their mental health. 73 Evidence exists for effective physician directed interventions, but there is hardly any research on organisational measures to address suicide risk in physicians. 74 Continued advances in organisational strategies for the mental wellbeing of physicians are essential to support individual medical institutions in their efforts to foster supportive environments, combat gender discrimination, and integrate mental health awareness into medical education and training.

Recommendations for future research

In addition to more primary studies from world regions other than Europe, the US, and Australia, future research also needs to systematically look into other factors beyond study characteristics that might explain the heterogeneity in suicide risk in physicians. Such research would help in identifying physicians who are at risk, with targeted prevention measures and ways to adapt them to different clinical and cultural contexts. Because geographical or national differences appear to be important factors, future studies on suicide risk in physicians should bear in mind that the specific settings of any physician population might influence their risk and resilience factors to a much higher degree than previously assumed. Other major events that affect healthcare, such as the covid-19 pandemic, could also have a large impact. Future research is needed to assess any covid-19 related effects on suicide rates in physicians around the world.

What is already known on this topic

Many studies reported increased suicide rates for physicians, and a 2004 meta-analysis found significantly increased suicide rates for male and female physicians compared with the general population

Evidence on increased suicide rates for physicians is inconsistent across countries

What this study adds

Suicide rate ratios for physicians appear to have decreased over time, but are still increased for female physicians

A high level of heterogeneity exists across studies, suggesting that suicide risk varies among different physician populations

Further research is needed to identify physician populations and subgroups at higher risk of suicide

Ethics statements

Ethical approval.

Not required.

Data availability statement

Additional data are available from the corresponding author at [email protected] upon request.

Acknowledgments

The authors are grateful for the support in developing the literature search strategy that was provided by the library staff at the Medical University of Vienna, and for the generous help with translations that was provided by a number of colleagues from within and outside of this institution. The authors also want to acknowledge the efforts undertaken by the Federal Statistics Office (Switzerland) and the Office for National Statistics (UK) to provide original data that were used in this analysis. Furthermore, the authors thank Eduardo Vega who reviewed the paper after submission as a member of the public, as well as Lena Hübl and Klaus Michael Fröhlich who provided their perspectives as physicians.

Contributors: CZ, SS, and ES conceived and designed the study, HH and TN contributed and advised on methodological aspects. CZ performed the literature search and was the first reviewer for article screening, data extraction, and risk of bias assessment. SS was the second reviewer for article screening, data extraction, and risk of bias assessment. CZ performed the statistical analyses and SS accessed and verified the underlying study data. CZ, SS, and ES interpreted the data. CZ drafted the manuscript and prepared tables and figures. All authors critically revised the manuscript for intellectual content and approved the final version. ES supervised the study. CZ is the study guarantor. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.

Funding: This study was partially supported by the Vienna Anniversary Foundation for Higher Education (grant number H-303766/2019). The funder had no role in the study design, data collection, analysis, or interpretation, or in writing or submitting the report. The researchers were independent from the funder and all authors had full access to all of the data (including statistical reports and tables) and can take responsibility for the integrity of the data and the accuracy of the data analysis.

Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/disclosure-of-interest/ and declare: CZ received partial funding from the Vienna Anniversary Foundation for Higher Education for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.

Transparency: The lead author (the manuscript’s guarantor) affirms that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as originally planned (and, if relevant, registered) have been explained.

Dissemination to participants and related patient and public communities: The authors plan to disseminate the study findings through conference presentations, talks, press releases, social media, and in mandatory courses on mental wellbeing for medical students. The results will also be forwarded to national and international organisations that the authors have had contact with, to be disseminated both within these organisations and through their communication channels. This includes organisations in the field of mental health, public health, suicide prevention, and professional associations (for physicians and medical students); examples include the American Foundation for Suicide Prevention, the International Association for Suicide Prevention and particularly its Special Interest Group on Suicide and the Workplace, the Canadian Medical Association, the Austrian Public Health Association, and the Austrian Medical Chamber. Discussions on how these findings might be used in local and national suicide prevention efforts in Austria will involve physicians, hospital administrators, mental and occupational health professionals, and interested members of the public who are affected by suicidality among physicians.

Provenance and peer review: Not commissioned; externally peer reviewed.

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/ .

  • ↵ World Health Organization. Suicide worldwide in 2019: global health estimates. Published online 2021. Accessed 12 Sep 2023. https://www.who.int/publications-detail-redirect/9789240026643
  • ↵ Cochrane. Cochrane Handbook for Systematic Reviews of Interventions. Version 6.3. Published 2022. Accessed 19 Jul 2023. https://training.cochrane.org/handbook/current
  • ↵ Deep L. DeepL Translator. Accessed 19 Jul 2023. https://www.DeepL.com/translator
  • ↵ Windsor-Shellard B. Suicide by Occupation, England: 2011 to 2015 . Office for National Statistics; 2017. Accessed 17 Sep 2023. https://www.ons.gov.uk/releases/suicidesbyoccupationengland2011to2015
  • ↵ Munn Z, Moola S, Lisy K, Riitano D, Tufanaru C. Chapter 5: Systematic reviews of prevalence and incidence. In: JBI Manual for Evidence Synthesis . 2020. Accessed 19 Jul 2023. https://jbi.global/critical-appraisal-tools
  • ↵ Borenstein M, Hedges LV, Higgins JPT, Rothstein HR. Introduction to Meta-Analysis . Wiley; 2021. https://www.google.at/books/edition/Introduction_to_Meta_Analysis/2oYmEAAAQBAJ
  • ↵ Office of Population Censuses and Surveys. Occupational Mortality 1979-80, 1982-83, Decennial Supplement, Part I+II. Her Majesty’s Stationery Office; 1986.
  • Dean G. The causes of death of South African doctors and dentists. Afr Med J . Published online 1969.
  • FSO (Federal Statistics Office) Switzerland. Data request for physician suicide data. https://www.bfs.admin.ch/bfs/en/home.html
  • FMH (Foederatio Medicorum Helveticorum). Online-tool for the physician statistics of the Swiss Medical Association (FMH), data for 2008-2020. https://aerztestatistik.fmh.ch/
  • ONS (Office for National Statistics) UK. Suicide by Occupation in England: 2011 to 2015 and 2016 to 2020. Published 2021. https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/deaths/adhocs/13768suicidebyhealthcarerelatedoccupationsengland2011to2015and2016to2020registrations
  • Rothman K, Boyce J. Epidemiologic Analysis with a Programmable Calculator . Vol NIH Publication No. 79-1649. National Institutes of Health; 1979. Accessed 13 Sep 2023. https://hero.epa.gov/hero/index.cfm/reference/details/reference_id/3978444
  • ↵ OECD. Health at a Glance 2023: OECD Indicators. OECD Publishing. Published online 2023. doi:10.1787/7a7afb35-en
  • ↵ Frank E, Dingle AD. Self-reported depression and suicide attempts among U.S. women physicians. Am J Psychiatry . Published online 1999. Accessed 25 Jun 2018. https://ajp.psychiatryonline.org/doi/pdf/10.1176/ajp.156.12.1887


  • Open access
  • Published: 24 August 2024

Technical efficiency and its determinants in health service delivery of public health centers in East Wollega Zone, Oromia Regional State, Ethiopia: Two-stage data envelope analysis

  • Edosa Tesfaye Geta 1 ,
  • Dufera Rikitu Terefa 1 ,
  • Adisu Tafari Shama 1 ,
  • Adisu Ewunetu Desisa 1 ,
  • Wase Benti Hailu 1 ,
  • Wolkite Olani 1 ,
  • Melese Chego Cheme 1 &
  • Matiyos Lema 1  

BMC Health Services Research, volume 24, Article number: 980 (2024)


Priority-setting becomes more difficult for decision-makers when the demand for health services and health care resources rises. Although the Ethiopian healthcare system places a strong focus on the efficient utilization and allocation of health care resources, studies of efficiency in healthcare facilities have been very limited. Hence, this study aimed to evaluate efficiency and its determinants in public health centers.

A cross-sectional study was conducted in the East Wollega zone, Oromia Regional State, Ethiopia. Data for the Ethiopian fiscal year 2021–2022 were collected from August 1–30, 2022, and 34 health centers (decision-making units) were included in the analysis. Data envelope analysis was used to analyze technical efficiency. A Tobit regression model was used to identify determinants of efficiency, declaring statistical significance at P < 0.05, using 95% confidence intervals.

The overall efficiency score was estimated to be 0.47 (95% CI = 0.36–0.57). Out of 34 health centers, only 3 (8.82%) were technically efficient, with an efficiency score of 1, and 31 (91.2%) were scale-inefficient, with an average score of 0.54. A majority, 30 (88.2%), of inefficient health centers exhibited increasing returns to scale. The technical efficiency of urban health centers and of health centers whose catchment areas were affected by armed conflicts declined by 35% (β = −0.35, 95% CI: −0.54, −0.07) and 21% (β = −0.21, 95% CI: −0.39, −0.03), respectively. Providing in-service training for healthcare providers increased efficiency by 27% (β = 0.27, 95% CI: 0.05–0.49).

Conclusions

Only about one in ten health centers was technically efficient; nine out of ten were scale-inefficient and used nearly half of their healthcare resources inefficiently, meaning they could potentially reduce their inputs by nearly half while still maintaining the same level of outputs. The location of health centers and armed conflict incidents significantly reduced the efficiency scores, whereas in-service training improved efficiency. Therefore, the government and health sector should work on the efficient utilization of healthcare resources, resolving armed conflicts, organizing training opportunities, and taking into account the locations of healthcare facilities during resource allocation.


The physical relationship between resources used (inputs) and outputs is referred to as technical efficiency (TE). A technically efficient position is reached when a set of inputs yields the maximum improvement in outputs [ 1 ]. Health care can therefore be viewed as an intermediate good, a tool for achieving better health, and efficiency is the study of the relationship between final health outcomes (lives saved, life years gained, or quality-adjusted life years) and resource inputs (costs in the form of labor, capital, or equipment) [ 2 ].

Efficiency is a quality of performance evaluated by comparing the financial worth of the inputs, the resources utilized to produce a certain output, with the output itself, which is a component of the health care system. Either maximizing output for a given set of inputs or minimizing the inputs required to produce a given output would make a primary health care (PHC) facility efficient. Technical efficiency means using the minimum amount of resources required to produce a given output; wastage or inefficiency occurs when more resources are used than required to produce a given level of output [ 3 ].

According to the WHO, making progress towards universal health coverage (UHC) requires both more funding for healthcare and greater value for that funding; the 2010 World Health Report estimated that 20–40% of all resources used for health care are wasted [ 4 ]. In most countries, a sizable share of total spending goes to the health sector. Therefore, decision-makers and health administrators should place a high priority on improving the efficiency of health systems [ 5 ].

Efficient utilization of healthcare resources has a significant impact on the delivery of health services. It leads to better access to health services and improves their quality by optimizing the use of resources. Healthcare systems can reduce wait times, increase the number of patients served, and enhance the overall patient experience. When resources are used efficiently, it can result in cost savings for healthcare systems, which allows for the reallocation of funds to other areas in need, potentially expanding services or investing in new technologies [ 6 ].

Also, efficient use of healthcare resources can contribute to better health outcomes. For example, proper management of medical supplies can ensure that patients receive the necessary treatments without delay, leading to improved recovery rates, and it is key to the sustainability of health services by ensuring that healthcare systems can continue to provide care without exhausting financial or material resources [ 6 , 7 ].

Furthermore, proper resource allocation can help to reduce disparities in healthcare delivery by ensuring that resources are distributed based on need so that healthcare systems can work towards providing equitable care to all populations. Efficient resource utilization contributes to the resilience of health systems, enabling them to respond effectively to emergencies, such as pandemics or natural disasters, without compromising the quality of care [ 8 ].

One of the quality dimensions emphasized in the strategy of the Ethiopian Health Sector Transformation Plan (HSTP), which is a component of Ethiopia's National Health Financing Strategy (2015–2035), is excellence in quality improvement and assurance: providing healthcare in a way that optimizes resource utilization and minimizes wastage [ 9 ]. The majority of efficiency evaluations of Ethiopia's health system have been conducted at a cross-country scale, evaluating the relative efficiency of various nations.

Spending on public health nearly doubled between 1995 and 2011. As one of the fastest-growing economies, Ethiopia saw its gross domestic product (GDP) grow in real terms by an average of 9% between 1999 and 2012 [ 5 ]. As a result, the total government budget tripled within the same period (at constant 2010 prices), which resulted in additional funding for health [ 10 ].

External resources also rose from US$6 million in 1995 to US$836 million in 2011 (in constant 2012 dollars) [ 11 ]. The development of the health sector, particularly primary care, depended on this ongoing external financing, with external funding accounting for half of primary care spending in 2011 [ 12 ]. Over the past 20 years, Ethiopia's health system has experienced exceptional growth, especially at the primary care level. Prior to 2005, hospitals and urban areas received a disproportionate share of public health spending [ 13 ].

It is becoming more and more necessary for decision-makers to manage the demand for healthcare services and the available resources while striking a balance with competing goals from other sectors. As PHC enters a new transformative phase, beginning with the Health Sector Transformation Plan (HSTP), plans call for increased resource utilization efficiency. Over the course of the subsequent five years (2015/2016–2019/2020), Ethiopia planned to achieve UHC by strengthening the implementation of the nutrition programme and expanding PHC coverage to everyone through improved access to basic curative and preventative health care services [ 9 , 14 ].

Increasing efficiency in the health sector is one way to create fiscal space for health, potentially freeing up even more resources for delivering high-quality healthcare [ 15 ]. While there was considerable emphasis on more efficient resource allocation and utilization during Ethiopia's Health Care and Financing Strategy (1998–2015), problems with health institutions' efficient utilization of resources persisted during this period [ 10 ]. Ethiopia has one of the least efficient health systems in the world, ranked 169th out of 191 countries [ 16 ].

Although maximizing health care outputs requires evaluating the technical efficiency of health facilities in providing medical care, there is a lack of studies of this kind carried out across the country. Although the primary focus of health care reforms in Ethiopia is the efficient allocation and utilization of resources within the health system, there is a lack of studies on the efficiency of the country's primary health care system that could identify contributing factors, including incidents of armed conflict within the catchment populations of healthcare facilities, that may affect the efficiency of these facilities. Therefore, in the current study, the factors that might influence the technical efficiency of the health centers were grouped into three categories: factors related to the environment, factors related to the health care facilities, and factors related to the health care providers (Fig.  1 ).

figure 1

Conceptual framework for technical efficiency of health centers in East Wollega zone, Oromia regional state, Ethiopia, 2022

In addition, the annual reports of the East Wollega zonal health department for the Ethiopian fiscal years (EFY) 2021 and 2022 indicated that the performance of the health care facilities in the zone was low compared to other administrative zones of Oromia Regional State. Therefore, this study aimed to evaluate technical efficiency and its determinants in the public health centers of the East Wollega Zone, Oromia Regional State, Ethiopia.

Methods and materials

Study settings and design

The study was carried out in public health care facilities, specifically health centers, in the East Wollega Zone, Oromia Regional State, Ethiopia. The East Wollega Zone is located in the western part of the country; its capital city, Nekemte, lies about 330 km from Addis Ababa, the capital of Ethiopia. Data for the EFY of July 2021 to June 2022 were retrospectively collected from August 1–30, 2022.

Data envelopment analysis conceptual framework

A two-stage data envelopment analysis (DEA) was employed in the current study. In the first stage of the methodological framework, the two widely used DEA models, Banker, Charnes, and Cooper (BCC) and Charnes, Cooper, and Rhodes (CCR), were used to determine the technical efficiency (TE), pure technical efficiency (PTE), and scale efficiency (SE) scores for the individual health centers, which were considered as decision-making units (DMUs). The overall technical efficiency (OTE) of the DMUs was determined using the CCR model, which assumed constant returns to scale (CRS), strong disposability of inputs and outputs, and convexity of the production possibility set. The efficiency score ranges from 0 to 1. Since the aim was to use the least amount of inputs while keeping the level of production of the health centers unchanged, the model used an input-oriented approach: it evaluated each health center's ability to produce a given level of output using the minimum amount of inputs. The model is formulated as follows, where \(y_{rj}\) is the amount of output r from health center j; \(x_{ij}\) is the amount of input i to health center j; \(u_r\) is the weight given to output r; \(v_i\) is the weight given to input i; n is the number of health centers; s is the number of outputs; and m is the number of inputs [ 17 , 18 ].

\(Max\;h_o=\frac{\sum_{r=1}^{s}u_{r}y_{rj_o}}{\sum_{i=1}^{m}v_{i}x_{ij_o}}\)

\(Subject\;to:\)

\(\frac{\sum_{r=1}^{s}u_{r}y_{rj}}{\sum_{i=1}^{m}v_{i}x_{ij}}\leq1,\;j=1,\cdots,j_o,\cdots,n\)

\(u_{r}\geq0,\;r=1,\cdots,s\;\;and\;\;v_{i}\geq0,\;i=1,\cdots,m\)

This fractional program is converted into the equivalent linear program:

\(Max\;h_o=\sum_{r=1}^{s}u_{r}y_{rj_o}\)

\(Subject\;to:\)

\(\sum_{i=1}^{m}v_{i}x_{ij_o}=1\)

\(\sum_{r=1}^{s}u_{r}y_{rj}-\sum_{i=1}^{m}v_{i}x_{ij}\leq0,\;j=1,\cdots,n\)

\(u_{r},\;v_{i}\geq0\)

The CCR model, which assumes constant returns to scale (CRS), measured each health center's ability to produce the expected amount of output from a given amount of input using the following formulation:

\(Max\;h_o=\sum_{r=1}^{s}u_{r}y_{rj_o}\)

\(Subject\;to:\)

\(\sum_{i=1}^{m}v_{i}x_{ij_o}=1\)

\(\sum_{r=1}^{s}u_{r}y_{rj}-\sum_{i=1}^{m}v_{i}x_{ij}\leq0,\;j=1,\cdots,n\)

\(u_{r},\;v_{i}\geq0\)
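The input-oriented CCR score can equivalently be computed from the dual (envelopment) form of the linear program above: minimize the input contraction factor θ subject to a composite peer producing no more input and no less output than DMU o. The following is a minimal illustrative sketch, not the authors' implementation (they used MaxDEA); it assumes NumPy and SciPy are available, and the toy data are hypothetical:

```python
import numpy as np
from scipy.optimize import linprog

def ccr_input_efficiency(X, Y, o):
    """Input-oriented CCR (constant returns to scale) efficiency of DMU o,
    solved in envelopment form: min theta s.t.
    sum_j lam_j * x_j <= theta * x_o,  sum_j lam_j * y_j >= y_o,  lam >= 0."""
    n, m = X.shape                    # n DMUs, m inputs
    s = Y.shape[1]                    # s outputs
    c = np.zeros(n + 1)
    c[0] = 1.0                        # decision vars: [theta, lam_1..lam_n]
    A_ub = np.vstack([
        np.hstack([-X[o].reshape(m, 1), X.T]),   # composite inputs <= theta*x_o
        np.hstack([np.zeros((s, 1)), -Y.T]),     # composite outputs >= y_o
    ])
    b_ub = np.concatenate([np.zeros(m), -Y[o]])
    bounds = [(None, None)] + [(0, None)] * n    # theta free, lambdas >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[0]

# toy data: 3 health centers, 1 input (e.g. staff), 1 output (e.g. visits)
X = np.array([[2.0], [4.0], [4.0]])
Y = np.array([[2.0], [4.0], [2.0]])
scores = [ccr_input_efficiency(X, Y, o) for o in range(3)]
# DMU 2 produces half the output of DMU 1 from the same input, so its
# score is 0.5: it could shed half its input and keep the same output.
```

A score of 1 places the DMU on the best-practice frontier; a score below 1 is the fraction of current inputs that would suffice for the same outputs.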

The BCC model was used to measure variable returns to scale (VRS). This model is well suited to evaluating the PTE of health centers when outputs do not vary proportionately with an increase in all inputs. The equation used is:

\(Max\;h_o=\sum_{r=1}^{s}u_{r}y_{rj_o}+z_{j_o}\)

\(Subject\;to:\)

\(\sum_{i=1}^{m}v_{i}x_{ij_o}=1\)

\(\sum_{r=1}^{s}u_{r}y_{rj}-\sum_{i=1}^{m}v_{i}x_{ij}+z_{j_o}\leq0,\;j=1,\cdots,n\)

\(u_{r},\;v_{i}\geq0,\;z_{j_o}\;unrestricted\;in\;sign\)
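In envelopment form, the BCC model differs from the CCR model only by a convexity constraint on the intensity weights (the lambdas must sum to 1), and scale efficiency is the ratio of the CRS score to the VRS score. The sketch below is illustrative only (hypothetical data; assumes NumPy and SciPy), not the authors' MaxDEA workflow:

```python
import numpy as np
from scipy.optimize import linprog

def dea_input_efficiency(X, Y, o, vrs=False):
    """Input-oriented envelopment DEA for DMU o.
    vrs=False gives the CCR (CRS) score; vrs=True adds the BCC convexity
    constraint sum(lambda) = 1, giving the VRS (pure technical) score."""
    n, m = X.shape
    s = Y.shape[1]
    c = np.zeros(n + 1)
    c[0] = 1.0                                     # minimize theta
    A_ub = np.vstack([
        np.hstack([-X[o].reshape(m, 1), X.T]),     # inputs <= theta * x_o
        np.hstack([np.zeros((s, 1)), -Y.T]),       # outputs >= y_o
    ])
    b_ub = np.concatenate([np.zeros(m), -Y[o]])
    A_eq, b_eq = None, None
    if vrs:                                        # BCC convexity constraint
        A_eq = np.concatenate([[0.0], np.ones(n)]).reshape(1, -1)
        b_eq = [1.0]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(None, None)] + [(0, None)] * n, method="highs")
    return res.x[0]

# a small facility operating under increasing returns to scale
X = np.array([[1.0], [4.0]])
Y = np.array([[1.0], [8.0]])
crs = dea_input_efficiency(X, Y, 0)                # overall TE (CCR)
vrs = dea_input_efficiency(X, Y, 0, vrs=True)      # pure TE (BCC)
se = crs / vrs                                     # scale efficiency
```

Here the small facility is on the VRS frontier (pure TE = 1) yet scale-inefficient (SE = 0.5), the same decomposition the paper applies to flag facilities exhibiting increasing returns to scale.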

In the second stage of the methodological framework, the OTE scores estimated in the first stage were regressed using a Tobit regression model. This was done to identify determinants of the technical efficiency scores of the primary health care facilities, which included factors related to health centers, health care providers, and the environment. The coefficients (β) of the independent factors indicated their direction of influence on the dependent variable, the OTE score. The model used is expressed below [ 19 ].

\(Y_i^{\ast}={\beta}_0+\beta x_i+{\varepsilon}_i,\;i=1,\;2,\;\dots,n\)

\(Y_i=0\;\;if\;\;Y_i^{\ast}\leq0\)

\(Y_i=Y_i^{\ast}\;\;if\;\;0<Y_i^{\ast}<1\)

\(Y_i=1\;\;if\;\;Y_i^{\ast}\geq1\)

where \(Y_i^{\ast}\) is the latent dependent variable representing the technical efficiency score, \(Y_i\) is the observed (censored) dependent variable, \(x_i\) is the vector of independent variables (factors related to health centers, health care providers, and the environment), \(\beta_0\) is the intercept, \(\beta_1\), \(\beta_2\), and \(\beta_3\) are the coefficients of the independent variables, \(\varepsilon_i\) is a disturbance term assumed to be independently and normally distributed with zero mean and constant variance \(\sigma^2\), and i = 1, 2, … n (n = 34 health centers).
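The two-limit Tobit model above (censoring at 0 and 1) can be estimated by maximum likelihood: uncensored observations contribute a normal density term, and censored observations contribute the probability mass beyond the limit. The sketch below is an illustrative reconstruction, not the authors' STATA estimation; the simulated data, function name, and starting values are assumptions:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def tobit_two_limit(y, X):
    """Two-limit Tobit MLE: Y* = b0 + X b + e, e ~ N(0, s^2);
    observed Y = 0 if Y* <= 0, Y = Y* if 0 < Y* < 1, Y = 1 if Y* >= 1."""
    Xd = np.column_stack([np.ones(len(y)), X])   # add intercept column
    k = Xd.shape[1]

    def negll(params):
        b, s = params[:k], np.exp(params[k])     # s kept positive via exp(log s)
        xb = Xd @ b
        lo, hi = y <= 0.0, y >= 1.0
        mid = ~lo & ~hi
        ll = np.empty(len(y))
        ll[lo] = norm.logcdf((0.0 - xb[lo]) / s)                   # left-censored
        ll[hi] = norm.logsf((1.0 - xb[hi]) / s)                    # right-censored
        ll[mid] = norm.logpdf((y[mid] - xb[mid]) / s) - np.log(s)  # uncensored
        return -ll.sum()

    # start from OLS on the censored data plus the sample SD
    start = np.append(np.linalg.lstsq(Xd, y, rcond=None)[0], np.log(y.std()))
    res = minimize(negll, start, method="Nelder-Mead",
                   options={"maxiter": 20000, "maxfev": 20000,
                            "xatol": 1e-9, "fatol": 1e-9})
    return res.x[:k], np.exp(res.x[k])

# simulated check with known truth: intercept 0.3, slope 0.4, sigma 0.2
rng = np.random.default_rng(0)
x = rng.normal(size=(400, 1))
y = np.clip(0.3 + 0.4 * x[:, 0] + rng.normal(0.0, 0.2, 400), 0.0, 1.0)
beta, sigma = tobit_two_limit(y, x)   # estimates close to (0.3, 0.4) and 0.2
```

Unlike OLS, this likelihood accounts for the pile-up of efficiency scores at 1, which is why the paper uses Tobit rather than linear regression for the second stage.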

Study variables

Input variables

The input variables comprised financial resources (salaries and incentives) and human resources (numbers of administrative staff; clinical and midwifery nurses; laboratory technicians and technologists; pharmacy technicians and pharmacists; public health officers; general physicians; other health care professionals; and other non-clinical staff).

Output variables

Output variables comprised the number of women who had four antenatal care visits (4ANC), the number of deliveries, the number of mothers who received postnatal care (PNC), the number of women who had family planning visits, the number of children who received full immunization, the number of children aged 6–59 months who received vitamin A supplements, the number of clients counseled and tested for human immunodeficiency virus (HIV), the number of HIV patients who had follow-up care, the number of patients diagnosed with tuberculosis (TB), the number of TB patients who had follow-up care and completed their treatment, and the number of outpatients who visited the health facilities for other general health services.

Dependent variable

Overall technical efficiency scores of the health centers.

Independent variables

The explanatory variables used in the Tobit regression model were the location of the health centers, accessibility of the health centers to transportation services, support from non-governmental organisations (NGOs), armed conflict incidents in the catchment areas, adequate electricity and water supply, in-service health care provider training, availability of diagnostic services (laboratory services), availability of adequate drug supply, room arrangements for proximity between related services, and marking the rooms with the number and type of services they provide.

Study health facilities

Public health centers in the districts of the East Wollega Zone were the study facilities. In the context of the Ethiopian health care system, a health center is a facility within the primary health care system that provides promotive, preventive, curative, and rehabilitative outpatient care, including basic laboratory and pharmacy services. Such a facility typically has a capacity of 10 beds for emergency and delivery services. Health centers serve as referral centers for health posts and provide supportive supervision for health extension workers (HEWs). One health center is expected to serve a population of 15,000–25,000 within its designated catchment area. There were 17 districts and 67 public health centers in the zone; nine districts and thirty-four health centers (about half of each) were included in the analysis.

Data collection instrument and technique

Data collection was conducted using the document review checklist, which was developed after the review of the Ethiopian standard related to the requirements for health care facilities. Data for the EFY of July 2021 to June 2022 was retrospectively collected. The contents of the document review checklist (data collection instrument) included inputs, outputs, and factors related to health centers, the environment, and health care providers.

Data analysis

Initially, STATA 14 was used to compute descriptive statistics for each input and output variable. For each input and output variable, the mean, standard deviation (SD), minimum and maximum values were presented. Next, MaxDEA7 ( http://maxdea.com ) was used to compute the technical efficiency, pure technical efficiency, scale efficiency scores, and input reduction and/or output increases.

The efficiency of health centers below the efficiency frontier was measured by their distance from the frontier. TE scores typically fall within the range of 0 to 1: a score close to 0 indicates that a health center is technically inefficient because its production lies well below the frontier, whereas a score of 1 indicates that the facility operated at maximum efficiency in health service delivery. Scores between these two extremes represent varying levels of performance, and as the score moves from 0 to 1 it reflects the health center's progress toward optimal resource utilization [ 20 ]. Health centers on the best-practice frontier were considered technically efficient, with an efficiency score of 1 (100% efficient): they utilized their resources optimally, with no scope for increasing outputs without increasing inputs. Health centers with a TE score below 1 were considered inefficient, meaning they did not utilize their resources efficiently, resulting in wastage of resources and suboptimal outputs.

In the second stage, the estimated overall technical efficiency scores obtained from the DEA were considered as the dependent variable and regressed against the set of independent variables (Fig.  1 ) namely healthcare facility-related, healthcare provider-related and environment-related factors. Finally, the statistical significance level was declared at P  < 0.05 using the 95% confidence interval (CI).

Inputs used and outputs produced

A total of 34 DMUs were included in the study, and input and output data were collected from these DMUs for one EFY, July 1, 2021 to June 30, 2022. For the purpose of analysis, the input variables were categorized into financial and human resources, while maternal and child health (MCH), delivery, and general outpatient services were considered output variables (Table  1 ).

Efficiency of the health centers

Efficient decision-making units in the DEA model were defined relative to the other units, not in absolute terms. The DMUs in our case were health centers. The estimation technique evaluated an individual health center's efficiency by comparing its performance with a group of efficient health centers. A health center's efficiency reference set comprised the efficient health centers against which it was evaluated. The reasons behind the classification of inefficient health centers as inefficient units were demonstrated by the efficient reference set's performance across the evaluation dimensions (Table  2 ).

Out of 34 health centers, only 3 (8.82%) were technically efficient, and the remaining 31 (91.18%) were inefficient. On average, the OTE of all 34 health centers was estimated to be 0.47, 95% CI = (0.36, 0.57). The OTE scores varied greatly, from a low of 0.0003 to a high of 1, implying that most health centers were using more resources to produce output than other health centers with comparable resource levels.

Scale-inefficient health centers had efficiency scores ranging from 0.0004 to 0.99. The thirty-one (91.2%) scale-inefficient health centers had an average score of 0.54, indicating that these health centers could, on average, reduce their resources by 46% while maintaining the same amount of outputs. With a scale efficiency of 100%, three of the healthcare facilities (8.82%) had the most productive size for their particular input–output mix.

Regarding PTE scores, 8 (23.53%) of the health centers were efficient, and the average score was 0.77 ± 0.18. The returns to scale (RTS) of 1 (2.94%), 3 (8.82%), and 30 (88.24%) health centers were decreasing returns to scale (DRS), constant returns to scale (CRS), and increasing returns to scale (IRS), respectively.

Determinants of overall technical efficiency

In this study, the Tobit regression model was used to identify the determinants of the technical efficiency of the health centers. The health centers' technical efficiency scores calculated from the DEA served as the dependent variable, and Tobit regression was subsequently carried out (Table  3 ).

The location of the health centers, armed conflict incidents in their catchment areas, and in-service training of the healthcare providers working in the facilities significantly influenced the technical efficiency scores of the health centers. Accordingly, the OTE of health centers located in urban areas of the districts declined by 35% (β = -0.35; 95% CI: -0.54, -0.07) compared to health centers in rural areas. Similarly, the OTE of health centers whose catchment areas faced armed conflict incidents declined by 21% (β = -0.21; 95% CI: -0.39, -0.03) compared to health centers whose catchment areas did not face this problem.

However, in-service training of the health care providers working in the study facilities significantly improved the technical efficiency scores: the OTE of health centers whose health care providers received adequate in-service training increased by 27% (β = 0.27; 95% CI: 0.05, 0.49).

The current study evaluated the technical efficiency of the health centers and identified the determinants of their efficiency. Only about one in ten health centers operated efficiently, meaning that about 90% of health centers were inefficient. The average PTE score was 77%, which purely reflects the health centers' managerial performance in organizing inputs; this indicates a 23% shortfall in managerial performance in organizing the available health care resources. The ratio of OTE to PTE, or CRS to VRS, provided the SE scores. Accordingly, the majority of the DMUs (88.24%) exhibited IRS and could expand their scale of operation without additional inputs, whereas only about 3% exhibited DRS and would need to scale down their operations in order to operate at the most productive scale size (MPSS). In contrast, a study conducted in China showed that more than half of the health care facilities operated at DRS, meaning that a gain in efficiency could be achieved only by downsizing the scale of operation in nearly 60% of the provinces [ 21 ].

In this study, the technical inefficiency of the health centers was considerably higher than the levels reported in studies conducted in Sub-Saharan African (SSA) countries: 65% of public health centers in Ghana [ 22 ], 59% in the Pujehun district of Sierra Leone [ 23 ], 56% of public health centers in Kenya [ 24 ], and 50% of public health centers in the Jimma Zone of Ethiopia [ 25 ] were technically inefficient. Similarly, a systematic review conducted in SSA showed that less than 40% of the studied health facilities were technically efficient [ 26 ]. These substantial discrepancies could be due to the armed conflict incidents in the catchment areas of the study health centers, as almost half of the catchment areas of the study health centers experienced such conflicts.

The efficiency scores of the health centers varied greatly, from a low of 0.0003 to a high of 1, indicating that some health centers were using more resources to produce output than other health centers with comparable amounts of resources. Only about one out of ten health centers had a scale efficiency of 100%, indicating the most productive size for their particular input–output mix; in contrast, nine out of ten health centers were technically inefficient, with 54% scale efficiency, implying that they could reduce their healthcare resources almost by half while maintaining the same quantity of outputs (health services). This efficiency score was lower than that of health care facilities in Afghanistan, where the average efficiency score was 0.74 and only 8.1% of the facilities had efficiency scores of 1 (100% efficient) [ 27 ].

In the present study, the level of inefficiency of the health care facilities was high, which may have had an impact on the delivery of health care services. Different studies have shown that the delivery of healthcare services is greatly affected by the efficient use of healthcare resources [ 6 , 7 , 8 ]. Despite the scarcity of health care resources in the sector, inefficiency persists in most low- and middle-income countries (LMICs) [ 28 ].

The study also identified determinants of the technical efficiency of the health centers. The efficiency score of health centers located in the urban areas of the study districts declined by one-third. This finding is in line with a study conducted in SSA countries, which showed that the location of health care facilities is significantly associated with their technical efficiency [ 26 ]. Similarly, a study conducted in Europe showed that, despite performing similarly across the efficiency dimensions, a number of rural healthcare facilities were found to be the best performers compared to urban health facilities [ 29 ]. A study conducted in China also revealed that the average technical efficiency of urban primary healthcare institutions fluctuated between 63.3% and 67.1% from 2009 to 2019, lower than that of rural facilities (75.8–82.2%) [ 30 ].

The availability of different public and private health facilities in urban areas, such as public hospitals and private clinics, might explain why rural health centers were significantly more efficient than those in the urban areas of the study districts. Patients might opt for these facilities rather than public health centers in urban areas, whereas in rural areas such options were not available. In addition, public and private health facilities might share the same catchment areas in urban areas, which could reduce health care utilization at the health centers, resulting in under-utilization and lower outputs (the number of patients and clients who utilized health services from the facilities).

Similarly, armed conflict incidents in the catchment areas of the health centers had a significant impact on their technical efficiency. The efficiency of health centers whose catchment areas experienced armed conflict declined by one-fifth compared to health centers whose catchment areas did not experience such conflicts.

In the same way, a study conducted in Syria showed that the utilization of routine health services, such as ANC and outpatient consultations, was negatively correlated with conflict incidents [ 31 ]; a study in Cameroon revealed that the population's utilization of healthcare services declined during the armed conflict [ 32 ]; and a study in Nigeria showed that living in a conflict-affected area significantly decreases the likelihood of using healthcare services [ 33 ].

This could be due to the many obstacles that healthcare providers face in areas affected by violence. First, they encounter health system limitations: shortages of medicines, medical supplies, healthcare workers, and financial resources are all consequences of conflict, which also harms health and the infrastructure that supports it, adding to the load already placed on health services. Second, armed conflict makes it more challenging for both populations in need of health care and health personnel to reach the communities requiring care [ 33 ].

Furthermore, in-service training of the health care providers significantly improved the efficiency of the health centers. In the current study, the efficiency scores of health centers whose health care providers had adequate in-service training increased by one-fourth compared to health centers whose staff had inadequate in-service training. Similarly, a scoping review in LMICs revealed that combined and multidimensional training interventions could help enhance the knowledge, competencies, and abilities of healthcare professionals in data administration and health care delivery [ 34 ].

Limitations of the study

This study thoroughly evaluated the technical efficiency of public health centers in delivering health services using an input-oriented DEA model, and it identified the determinants of technical efficiency in these health centers using a Tobit regression analysis. However, the technical efficiency analysis in this study was based on input and output data for the 2021–2022 EFY, and much may have changed since then. The findings are intended to draw attention to the potential advantages of this type of efficiency study rather than to provide definitive guidance for decision-making in the health care system. Due to a lack of data, the study did not include spending on drugs, non-pharmaceutical supplies, and other non-wage expenditures among the inputs. The DEA model only measures efficiency relative to best practice within the sample of health centers; thus, any change in the type or number of health facilities and variables included in the analysis could yield different findings.

Policy implication of the study

In the current study, it was found that 90% of health centers were operating below scale efficiency, leading to the wastage of nearly half of the healthcare resources. This inefficiency likely had detrimental effects on healthcare service delivery. The findings suggest that merely allocating resources is insufficient for enhancing facility efficiency. Instead, a dual approach is necessary: addressing enabling factors, such as providing in-service training opportunities for healthcare providers and considering the strategic location of healthcare facilities, while simultaneously mitigating disabling factors, such as armed conflict incidents within the catchment areas of these facilities. Implementing these measures at all levels could significantly improve the efficiency of health care facilities in healthcare delivery.

Only one out of ten health centers operated with technical efficiency, indicating that approximately nine out of ten health centers used nearly half of their healthcare resources inefficiently; they could potentially reduce their inputs by nearly half while still maintaining the same level of output. The location of the health centers and armed conflict incidents in their catchment areas significantly reduced their efficiency scores, whereas in-service training of the health care providers significantly increased their efficiency.

Therefore, we strongly recommend that the government and the health sector focus on improving health service delivery in the health centers by utilizing health care resources efficiently, resolving armed conflicts with the concerned bodies, organizing training opportunities for health care providers, and taking into account the rural and urban locations of healthcare facilities when allocating resources.

Availability of data and materials

The datasets used and/or analyzed during this study are available from the corresponding author on reasonable request.


Acknowledgements

Our special thanks go to Wollega University and the study health facilities.

We received no financial support to be disclosed.

Author information

Authors and affiliations

School of Public Health, Institute of Health Sciences, Wollega University, Nekemte, Oromia, Ethiopia

Edosa Tesfaye Geta, Dufera Rikitu Terefa, Adisu Tafari Shama, Adisu Ewunetu Desisa, Wase Benti Hailu, Wolkite Olani, Melese Chego Cheme & Matiyos Lema

Contributions

All authors participated in developing the study concept and design. ET contributed to data analysis, interpretation, report writing, and manuscript preparation, and acted as the corresponding author. DR, AT, AE, WB, WO, MC, and ML contributed to developing the data collection tools, supervising data collection, entering data into statistical software, and report writing.

Corresponding author

Correspondence to Edosa Tesfaye Geta .

Ethics declarations

Ethics approval and consent to participate

This study was carried out in adherence to Wollega University's research ethics guidelines. The research ethics review committee (RERC) of Wollega University granted ethical clearance number WURD-202–44/22. A formal letter from the East Wollega Zonal Health Department was obtained and given to the district health offices. The objective of the study was clearly communicated to all study health center directors, and the required informed consent was obtained from all study health centers. Confidentiality was maintained: the codes DMU001 to DMU034 were used in place of health facility identifiers in the data collection checklists, and each electronic and paper record was stored in a secure area. Only the research team had access to the collected data, and data sharing will be done in accordance with ethical and legal guidelines.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .

About this article

Cite this article

Geta, E.T., Terefa, D.R., Shama, A.T. et al. Technical efficiency and its determinants in health service delivery of public health centers in East Wollega Zone, Oromia Regional State, Ethiopia: Two-stage data envelope analysis. BMC Health Serv Res 24 , 980 (2024). https://doi.org/10.1186/s12913-024-11431-z

Received: 10 November 2023

Accepted: 12 August 2024

Published: 24 August 2024

DOI: https://doi.org/10.1186/s12913-024-11431-z

  • Health centers
  • Health service delivery
  • Technical efficiency

BMC Health Services Research

ISSN: 1472-6963

Contributing studies for clinically elevated depression symptoms are presented in order of largest to smallest prevalence rate. Square data markers represent prevalence rates, with lines around the marker indicating 95% CIs. The diamond data marker represents the overall effect size based on included studies.

Contributing studies for clinically elevated anxiety symptoms are presented in order of largest to smallest prevalence rate. Square data markers represent prevalence rates, with lines around the marker indicating 95% CIs. The diamond data marker represents the overall effect size based on included studies.

eTable 1. Example Search Strategy from Medline

eTable 2. Study Quality Evaluation Criteria

eTable 3. Quality Assessment of Studies Included

eTable 4. Sensitivity analysis excluding low quality studies (score=2) for moderators of the prevalence of clinically elevated depressive symptoms in children and adolescents during COVID-19

eTable 5. Sensitivity analysis excluding low quality studies (score=2) for moderators of the prevalence of clinically elevated anxiety symptoms in children and adolescents during COVID-19

eFigure 1. PRISMA diagram of review search strategy

eFigure 2. Funnel plot for studies included in the clinically elevated depressive symptoms

eFigure 3. Funnel plot for studies included in the clinically elevated anxiety symptoms

Racine N, McArthur BA, Cooke JE, Eirich R, Zhu J, Madigan S. Global Prevalence of Depressive and Anxiety Symptoms in Children and Adolescents During COVID-19: A Meta-analysis. JAMA Pediatr. 2021;175(11):1142–1150. doi:10.1001/jamapediatrics.2021.2482

Global Prevalence of Depressive and Anxiety Symptoms in Children and Adolescents During COVID-19: A Meta-analysis

  • 1 Department of Psychology, University of Calgary, Calgary, Alberta, Canada
  • 2 Alberta Children’s Hospital Research Institute, Calgary, Alberta, Canada
Question   What is the global prevalence of clinically elevated child and adolescent anxiety and depression symptoms during COVID-19?

Findings   In this meta-analysis of 29 studies including 80 879 youth globally, the pooled prevalence estimates of clinically elevated child and adolescent depression and anxiety were 25.2% and 20.5%, respectively. The prevalence of depression and anxiety symptoms during COVID-19 has doubled compared with prepandemic estimates, and moderator analyses revealed that prevalence rates were higher when collected later in the pandemic, in older adolescents, and in girls.

Meaning   The global estimates of child and adolescent mental illness observed in the first year of the COVID-19 pandemic in this study indicate that the prevalence has significantly increased, remains high, and therefore warrants attention for mental health recovery planning.

Importance   Emerging research suggests that the global prevalence of child and adolescent mental illness has increased considerably during COVID-19. However, substantial variability in prevalence rates has been reported across the literature.

Objective   To ascertain more precise estimates of the global prevalence of child and adolescent clinically elevated depression and anxiety symptoms during COVID-19; to compare these rates with prepandemic estimates; and to examine whether demographic (eg, age, sex), geographical (ie, global region), or methodological (eg, pandemic data collection time point, informant of mental illness, study quality) factors explained variation in prevalence rates across studies.

Data Sources   Four databases were searched (PsycInfo, Embase, MEDLINE, and Cochrane Central Register of Controlled Trials) from January 1, 2020, to February 16, 2021, and unpublished studies were searched in PsycArXiv on March 8, 2021, for studies reporting on child/adolescent depression and anxiety symptoms. The search strategy combined search terms from 3 themes: (1) mental illness (including depression and anxiety), (2) COVID-19, and (3) children and adolescents (age ≤18 years). For PsycArXiv , the key terms COVID-19 , mental health , and child/adolescent were used.

Study Selection   Studies were included if they were published in English, had quantitative data, and reported prevalence of clinically elevated depression or anxiety in youth (age ≤18 years).

Data Extraction and Synthesis   A total of 3094 nonduplicate titles/abstracts were retrieved, and 136 full-text articles were reviewed. Data were analyzed from March 8 to 22, 2021.

Main Outcomes and Measures   Prevalence rates of clinically elevated depression and anxiety symptoms in youth.

Results   Random-effect meta-analyses were conducted. Twenty-nine studies including 80 879 participants met full inclusion criteria. Pooled prevalence estimates of clinically elevated depression and anxiety symptoms were 25.2% (95% CI, 21.2%-29.7%) and 20.5% (95% CI, 17.2%-24.4%), respectively. Moderator analyses revealed that the prevalence of clinically elevated depression and anxiety symptoms was higher in studies collected later in the pandemic and in girls. Depression symptoms were higher in older children.

Conclusions and Relevance   Pooled estimates obtained in the first year of the COVID-19 pandemic suggest that 1 in 4 youth globally are experiencing clinically elevated depression symptoms, while 1 in 5 youth are experiencing clinically elevated anxiety symptoms. These pooled estimates, which increased over time, are double the prepandemic estimates. An influx of mental health care utilization is expected, and allocation of resources to address child and adolescent mental health concerns is essential.

Prior to the COVID-19 pandemic, rates of clinically significant generalized anxiety and depressive symptoms in large youth cohorts were approximately 11.6% 1 and 12.9%, 2 respectively. Since COVID-19 was declared an international public health emergency, youth around the world have experienced dramatic disruptions to their everyday lives. 3 Youth are enduring pervasive social isolation and missed milestones, along with school closures, quarantine orders, increased family stress, and decreased peer interactions, all potential precipitants of psychological distress and mental health difficulties in youth. 4 - 7 Indeed, in both cross-sectional 8 , 9 and longitudinal studies 10 , 11 amassed to date, the prevalence of youth mental illness appears to have increased during the COVID-19 pandemic. 3 However, reported rates vary considerably, ranging from 2.2% 12 to 63.8% 13 for clinically elevated depression symptoms and from 1.8% 12 to 49.5% 13 for clinically elevated anxiety symptoms. As governments and policy makers deploy and implement recovery plans, ascertaining precise estimates of the burden of mental illness for youth is urgently needed to inform service deployment and resource allocation.

Depression and generalized anxiety are 2 of the most common mental health concerns in youth. 14 Depressive symptoms, which include feelings of sadness, loss of interest and pleasure in activities, as well as disruption to regulatory functions such as sleep and appetite, 15 could be elevated during the pandemic as a result of social isolation due to school closures and physical distancing requirements. 6 Generalized anxiety symptoms in youth manifest as uncontrollable worry, fear, and hyperarousal. 15 Uncertainty, disruptions in daily routines, and concerns for the health and well-being of family and loved ones during the COVID-19 pandemic are likely associated with increases in generalized anxiety in youth. 16

When heterogeneity is observed across studies, as is the case with youth mental illness during COVID-19, it often points to the need to examine demographic, geographical, and methodological moderators. Moderator analyses can determine for whom and under what circumstances prevalence is higher vs lower. With regard to demographic factors, prevalence rates of mental illness both prior to and during the COVID-19 pandemic are differentially reported across child age and sex, with girls 17 , 18 and older children 17 , 19 being at greater risk for internalizing disorders. Studies have also shown that youth living in regions that experienced greater disease burden 2 and urban areas 20 had greater mental illness severity. Methodological characteristics of studies also have the potential to influence estimated prevalence rates. For example, studies of poorer methodological quality may be more likely to overestimate prevalence rates. 21 The symptom reporter (ie, child vs parent) may also contribute to variability in the prevalence of mental illness across studies. Indeed, research prior to the pandemic has demonstrated that child and parent reports of internalizing symptoms vary, 22 with children/adolescents reporting more internalizing symptoms than parents. 23 Lastly, it is important to consider the role of data collection timing on prevalence rates. While feelings of stress may have been greater in the early months of the pandemic than later, 24 extended social isolation and school closures may have exacerbated mental health concerns as the pandemic persisted.

Although a narrative systematic review of 6 studies early in the pandemic was conducted, 8 to our knowledge, no meta-analysis of prevalence rates of child and adolescent mental illness during the pandemic has been undertaken. In the current study, we conducted a meta-analysis of the global prevalence of clinically elevated symptoms of depression and anxiety (ie, exceeding a clinical cutoff score on a validated measure or falling in the moderate to severe symptom range of anxiety and depression) in youth during the first year of the COVID-19 pandemic. While research has documented a worsening of symptoms for children and youth with a wide range of anxiety disorders, 25 including social anxiety, 26 clinically elevated symptoms of generalized anxiety are the focus of the current meta-analysis. In addition to deriving pooled prevalence estimates, we examined demographic, geographical, and methodological factors that may explain between-study differences. Given that there have been several precipitants of psychological distress for youth during COVID-19, we hypothesized that pooled prevalence rates would be higher than prepandemic estimates. We also hypothesized that prevalence would be higher in studies with older children, a higher percentage of female individuals, and data collected later in the pandemic, and that higher-quality studies would report lower prevalence rates.

This systematic review was registered as a protocol with PROSPERO (CRD42020184903) and the Preferred Reporting Items for Systematic Reviews and Meta-analyses ( PRISMA ) reporting guideline was followed. 27 Ethics review was not required for the study. Electronic searches were conducted in collaboration with a health sciences librarian in PsycInfo, Cochrane Central Register of Controlled Trials (CENTRAL), Embase, and MEDLINE from inception to February 16, 2021. The search strategy (eTable 1 in the Supplement ) combined search terms from 3 themes: (1) mental illness (including depression and anxiety), (2) COVID-19, and (3) children and adolescents (age ≤18 years). Both database and subject headings were used to search keywords. As a result of the rapidly evolving nature of research during the COVID-19 pandemic, we also searched a repository of unpublished preprints, PsycArXiv . The key terms COVID-19 , mental health , and child/adolescent were used on March 8, 2021, and yielded 38 studies of which 1 met inclusion criteria.

The following inclusion criteria were applied: (1) sample was drawn from a general population; (2) proportion of individuals meeting clinical cutoff scores or falling in the moderate to severe symptom range of anxiety or depression as predetermined by validated self-report measures were provided; (3) data were collected during COVID-19; (4) participants were 18 years or younger; (5) study was empirical; and (6) studies were written in English. Samples of participants who may be affected differently from a mental health perspective during COVID-19 were excluded (eg, children with preexisting psychiatric diagnoses, children with chronic illnesses, children diagnosed or suspected of having COVID-19). We also excluded case studies and qualitative analyses.

Five authors (N.R., B.A.M., J.E.C., R.E., and J.Z.) used Covidence software (Covidence Inc) to review all abstracts and determine whether each study met criteria for inclusion. Twenty percent of abstracts reviewed for inclusion were double-coded, and the mean random agreement probability was 0.89; disagreements were resolved via consensus with the first author (N.R.). Two authors (N.R. and B.A.M.) reviewed full-text articles to determine whether they met all inclusion criteria; percent agreement was 0.80, and discrepancies were resolved via consensus.

When studies met inclusion criteria, prevalence rates for anxiety and depression were extracted, as well as potential moderators. When more than 1 wave of data was provided, the wave with the largest sample size was selected. For 1 study in which both parent and youth reports were provided, 26 the youth report was selected, given research showing that youth are reliable informants of their own behavior. 28 The following moderators were extracted: (1) study quality (see the next subsection); (2) participant age (continuously, as a mean); (3) sex (% female in a sample); (4) geographical region (eg, East Asia, Europe, North America); (5) informant (child, parent); and (6) month in 2020 when data were collected (range, 1-12). Data from all studies were extracted by 1 coder and the first author (N.R.). Discrepancies were resolved via consensus.

A short 5-item questionnaire, adapted from the National Institutes of Health Quality Assessment Tool for Observational Cohort and Cross-Sectional Studies, was used (eTable 2 in the Supplement ). 29 Studies were given a score of 0 (no) or 1 (yes) for each of the 5 criteria (validated measure, peer-reviewed, response rate ≥50%, objective assessment, sufficient exposure time), and scores were summed to give a total out of 5. When information was unclear or not provided by the study authors, the criterion was marked as 0 (no).
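The scoring rule described above amounts to summing binary criteria, with missing or unclear information treated as a 0. A minimal sketch (the criterion keys below are hypothetical labels for the rubric in eTable 2):

```python
# Hypothetical keys for the 5 binary quality criteria described above.
CRITERIA = ["validated_measure", "peer_reviewed", "response_rate_ge_50",
            "objective_assessment", "sufficient_exposure_time"]

def quality_score(study: dict) -> int:
    # Anything other than an explicit True (eg, missing or unclear) scores 0.
    return sum(1 for c in CRITERIA if study.get(c) is True)

score = quality_score({"validated_measure": True, "peer_reviewed": True,
                       "response_rate_ge_50": None,  # not reported -> 0
                       "objective_assessment": True,
                       "sufficient_exposure_time": False})
# score == 3, within the 2-4 range reported for the included studies
```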

All included studies are from independent samples. Comprehensive Meta-Analysis version 3.0 (Biostat) software was used for data analysis. Pooled prevalence estimates with associated 95% CIs were computed. Individual prevalence estimates were weighted by the inverse of their variance, which gives greater weight to studies with larger sample sizes.
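The inverse-variance weighting described above can be sketched as follows for prevalence pooled on the logit scale; the study counts are made up, and this fixed-effect version omits the between-study variance component that the paper's random-effects models add to each study's variance:

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

# Hypothetical (events, sample size) per study; not the paper's data.
studies = [(250, 1000), (180, 600), (90, 450)]

weights, estimates = [], []
for events, n in studies:
    p = events / n
    var = 1 / events + 1 / (n - events)  # variance of the logit event rate
    weights.append(1 / var)              # inverse-variance weight
    estimates.append(logit(p))

pooled_logit = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
pooled = inv_logit(pooled_logit)         # back-transform to a prevalence
```

Larger studies get larger weights because the logit-scale variance, 1/events + 1/(n − events), shrinks with sample size.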

We used random-effects models to reflect the variation observed across studies and assessed between-study heterogeneity using the Q and I 2 statistics. Pooled prevalence is reported as an event rate (ie, 0.30) but interpreted as a percentage (ie, 30.0%). Significant Q statistics and I 2 values greater than 75% suggest moderator analyses should be explored. 30 As recommended by Borenstein et al, 30 we examined categorical moderators when k was 10 or higher, with a minimum cell size of k greater than 3. A P value of .05 was considered statistically significant. For continuous moderators, random-effect meta-regression analyses were conducted. Publication bias was examined using the Egger test 31 and by inspecting funnel plots for symmetry.
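Under the common DerSimonian-Laird approach (assumed here; the text does not name its estimator), the Q and I² statistics and the random-effects reweighting described above can be sketched with hypothetical logit-scale estimates and inverse-variance weights:

```python
# Hypothetical per-study logit-scale estimates and inverse-variance weights.
theta = [-1.10, -0.85, -1.39, -0.60]
w = [187.5, 126.0, 72.0, 150.0]

# Fixed-effect pooled estimate, then Cochran's Q and Higgins' I^2.
theta_fixed = sum(wi * ti for wi, ti in zip(w, theta)) / sum(w)
Q = sum(wi * (ti - theta_fixed) ** 2 for wi, ti in zip(w, theta))
df = len(theta) - 1
I2 = max(0.0, (Q - df) / Q) * 100 if Q > 0 else 0.0

# DerSimonian-Laird between-study variance tau^2.
c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
tau2 = max(0.0, (Q - df) / c)

# Random-effects weights add tau^2 to each study's variance (1/w_i).
w_re = [1 / (1 / wi + tau2) for wi in w]
theta_re = sum(wi * ti for wi, ti in zip(w_re, theta)) / sum(w_re)
```

With these illustrative numbers, I² exceeds 75%, the threshold the text treats as a cue to explore moderators.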

Our electronic search yielded 3094 nonduplicate records (eFigure 1 in the Supplement ). Based on the abstract review, a total of 136 full-text articles were retrieved to examine against inclusion criteria, and 29 nonoverlapping studies 10 , 12 , 13 , 17 , 19 , 20 , 26 , 32 - 53 met full inclusion criteria.

A total of 29 studies were included in the meta-analyses, of which 26 had youth symptom reports and 3 studies 39 , 42 , 48 had parent reports of child symptoms. As outlined in Table 1 , across all 29 studies, 80 879 participants were included, of which the mean (SD) percentage of female individuals was 52.7% (12.3%), and the mean age was 13.0 years (range, 4.1-17.6 years). All studies provided binary reports of sex or gender. Sixteen studies (55.2%) were from East Asia, 4 were from Europe (13.8%), 6 were from North America (20.7%), 2 were from Central America and South America (6.9%), and 1 study was from the Middle East (3.4%). Eight studies (27.6%) reported having racial or ethnic minority participants, with the mean across studies being 36.9%. Examining study quality, the mean score was 3.10 (range, 2-4; eTable 3 in the Supplement ).

A random-effects meta-analysis of 26 studies revealed a pooled prevalence of clinically elevated depressive symptoms of 0.25 (95% CI, 0.21-0.30; Figure 1 ), or 25.2%. The funnel plot was symmetrical (eFigure 2 in the Supplement ); however, the Egger test was statistically significant (intercept, −9.5; 95% CI, −18.4 to −0.48; P  = .02). The between-study heterogeneity statistic was significant ( Q  = 4675.91; P  < .001; I 2  = 99.47). Significant moderators are reported below, and all moderator analyses are presented in Table 2 .
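The Egger intercepts reported in these results come from regressing each study's standardized effect on its precision; an intercept far from zero signals funnel-plot asymmetry. A minimal sketch of that regression (the standard Egger formulation, not the authors' code; a full implementation would also return the confidence interval and P value):

```python
import math

def egger_intercept(estimates, variances):
    """Intercept of the Egger regression: standardized effect on precision."""
    ses = [math.sqrt(v) for v in variances]
    y = [e / s for e, s in zip(estimates, ses)]  # standardized effects
    x = [1.0 / s for s in ses]                   # precisions (1 / SE)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # ordinary least-squares slope and intercept
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx  # intercept far from 0 suggests asymmetry

# Constructed data lying exactly on y = 2 + 0.5x recovers intercept 2:
print(round(egger_intercept([2.5, 1.5, 1.0], [1.0, 0.25, 0.0625]), 2))  # → 2.0
```

Because small studies have low precision (x near 0), a nonzero intercept indicates that small-study effects differ systematically from large-study effects, consistent with possible publication bias.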

As the number of months in the year increased, so too did the prevalence of depressive symptoms ( b  = 0.26; 95% CI, 0.06-0.46). Prevalence rates were higher as child age increased ( b  = 0.08; 95% CI, 0.01-0.15) and as the percentage of female individuals in samples increased ( b  = 0.03; 95% CI, 0.01-0.05). Sensitivity analyses removing low-quality studies (ie, scores of 2) 32 , 43 were conducted (eTable 4 in the Supplement ). Moderators remained significant, except for age, which became nonsignificant ( b  = 0.06; 95% CI, −0.02 to 0.13; P  = .14).

The overall pooled prevalence rate across 25 studies for elevated anxiety was 0.21 (95% CI, 0.17-0.24; Figure 2 ) or 20.5%. The funnel plot was symmetrical (eFigure 3 in the Supplement ) and the Egger test was nonsignificant (intercept, −6.24; 95% CI, −14.10 to 1.62; P  = .06). The heterogeneity statistic was significant ( Q  = 3300.17; P  < .001; I 2  = 99.27). Significant moderators are reported below, and all moderator analyses are presented in Table 3 .

As the number of months in the year increased, so too did the prevalence of anxiety symptoms ( b  = 0.27; 95% CI, 0.10-0.44). Prevalence rates of clinically elevated anxiety were higher as the percentage of female individuals in the sample increased ( b  = 0.04; 95% CI, 0.01-0.07) and were also higher in European countries ( k  = 4; rate = 0.34; 95% CI, 0.23-0.46; P  = .01) compared with East Asian countries ( k  = 14; rate = 0.17; 95% CI, 0.13-0.21; P  < .001). Lastly, the prevalence of clinically elevated anxiety was higher in studies deemed to have poorer quality ( k  = 21; rate = 0.22; 95% CI, 0.18-0.27; P  < .001) compared with studies with better study quality scores ( k  = 4; rate = 0.12; 95% CI, 0.07-0.20; P  < .001). Sensitivity analyses removing low-quality studies (ie, scores of 2) 32 , 43 yielded the same pattern of results (eTable 5 in the Supplement ).

The current meta-analysis provides a timely estimate of clinically elevated depression and generalized anxiety symptoms globally among youth during the COVID-19 pandemic. Across 29 samples and 80 879 youth, the pooled prevalence of clinically elevated depression and anxiety symptoms was 25.2% and 20.5%, respectively. Thus, 1 in 4 youth globally are experiencing clinically elevated depression symptoms, while 1 in 5 youth are experiencing clinically elevated anxiety symptoms. A comparison of these findings to prepandemic estimates (12.9% for depression 2 and 11.6% for anxiety 1 ) suggests that youth mental health difficulties during the COVID-19 pandemic have likely doubled.

The COVID-19 pandemic, and its associated restrictions and consequences, appear to have taken a considerable toll on youth and their psychological well-being. Loss of peer interactions, social isolation, and reduced contact with buffering supports (eg, teachers, coaches) may have precipitated these increases. 3 In addition, schools are often a primary location for receiving psychological services, with 80% of children relying on school-based services to address their mental health needs. 54 For many children, these services were rendered unavailable owing to school closures.

As the month of data collection increased, rates of depression and anxiety increased correspondingly. One possibility is that ongoing social isolation, 6 family financial difficulties, 55 missed milestones, and school disruptions 3 are compounding over time for youth and having a cumulative association. However, longitudinal research supporting this possibility is currently scarce and urgently needed. A second possibility is that studies conducted in the earlier months of the pandemic (February to March 2020) 12 , 51 were more likely to be conducted in East Asia where self-reported prevalence of mental health symptoms tends to be lower. 56 Longitudinal trajectory research on youth well-being as the pandemic progresses and in pandemic recovery phases will be needed to confirm the long-term mental health implications of the COVID-19 pandemic on youth mental illness.

Prevalence rates for anxiety varied according to study quality, with lower-quality studies yielding higher prevalence rates. It is important to note that in sensitivity analyses removing lower-quality studies, other significant moderators (ie, child sex and data collection time point) remained significant. There has been a rapid proliferation of youth mental health research during the COVID-19 pandemic; however, the rapid execution of these studies has been criticized owing to the potential for some studies to sacrifice methodological quality for expediency. 21 , 57 Additionally, several studies estimating prevalence rates of mental illness during the pandemic have used nonprobability or convenience samples, which increases the likelihood of bias in reporting. 21 Studies with representative samples and/or longitudinal follow-up studies that have the potential to demonstrate changes in mental health symptoms from before to after the pandemic should be prioritized in future research.

In line with previous research on mental illness in childhood and adolescence, 58 female sex was associated with both increased depressive and anxiety symptoms. Biological susceptibility, lower baseline self-esteem, a higher likelihood of having experienced interpersonal violence, and exposure to stress associated with gender inequity may all be contributing factors. 59 Higher rates of depression in older children were observed and may be due to puberty and hormonal changes 60 in addition to the added effects of social isolation and physical distancing on older children who particularly rely on socialization with peers. 6 , 61 However, age was not a significant moderator for prevalence rates of anxiety. Although older children may be more acutely aware of the stress of their parents and the implications of the current global pandemic, younger children may be able to recognize changes to their routine, both of which may contribute to similar rates of anxiety with different underlying mechanisms.

In terms of practice implications, a routine touch point for many youth is the family physician or pediatrician’s office. Within this context, it is critical to inquire about or screen for youth mental health difficulties. Emerging research 42 suggests that in families using more routines during COVID-19, lower child depression and conduct problems are observed. Thus, a tangible solution to help mitigate the adverse effects of COVID-19 on youth is working with children and families to implement consistent and predictable routines around schoolwork, sleep, screen use, and physical activity. Additional resources should be made available, and clinical referrals should be placed when children experience clinically elevated mental distress. At a policy level, research suggests that social isolation may contribute to and confer risk for mental health concerns. 4 , 5 As such, the closure of schools and recreational activities should be considered a last resort. 62 In addition, methods of delivering mental health resources widely to youth, such as group and individual telemental health services, need to be adapted to increase scalability, while also prioritizing equitable access across diverse populations. 63

There are some limitations to the current study. First, although the current meta-analysis includes global estimates of child and adolescent mental illness, it will be important to reexamine cross-regional differences once additional data from underrepresented countries are available. Second, most study designs were cross-sectional in nature, which precluded an examination of the long-term association of COVID-19 with child mental health over time. To determine whether clinically elevated symptoms are sustained, exacerbated, or mitigated, longitudinal studies with baseline estimates of anxiety and depression are needed. Third, few studies included racial or ethnic minority participants (27.6%), and no studies included gender-minority youth. Given that racial and ethnic minority 64 and gender-diverse youth 65 , 66 may be at increased risk for mental health difficulties during the pandemic, future work should include and focus on these groups. Finally, all studies used self- or parent-reported questionnaires to examine the prevalence of clinically elevated (ie, moderate to high) symptoms. Thus, studies using criterion standard assessments of child depression and anxiety disorders via diagnostic interviews or multimethod approaches may supplement current findings and provide further details on changes beyond generalized anxiety symptoms, such as symptoms of social anxiety, separation anxiety, and panic.

Overall, this meta-analysis shows increased rates of clinically elevated anxiety and depression symptoms for youth during the COVID-19 pandemic. While this meta-analysis supports an urgent need for intervention and recovery efforts aimed at improving child and adolescent well-being, it also highlights that individual differences need to be considered when determining targets for intervention (eg, age, sex, exposure to COVID-19 stressors). Research on the long-term effect of the COVID-19 pandemic on mental health, including studies with pre– to post–COVID-19 measurement, is needed to augment understanding of the implications of this crisis on the mental health trajectories of today’s children and youth.

Corresponding Author: Sheri Madigan, PhD, RPsych, Department of Psychology, University of Calgary, Calgary, AB T2N 1N4, Canada ( [email protected] ).

Accepted for Publication: May 19, 2021.

Published Online: August 9, 2021. doi:10.1001/jamapediatrics.2021.2482

Author Contributions: Drs Racine and Madigan had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.

Concept and design: Racine, Madigan.

Acquisition, analysis, or interpretation of data: All authors.

Drafting of the manuscript: Racine, McArthur, Eirich, Zhu, Madigan.

Critical revision of the manuscript for important intellectual content: Racine, Cooke, Eirich, Madigan.

Statistical analysis: Racine, McArthur.

Administrative, technical, or material support: Madigan.

Supervision: Racine, Madigan.

Conflict of Interest Disclosures: Dr Racine reported fellowship support from Alberta Innovates. Dr McArthur reported a postdoctoral fellowship award from the Alberta Children’s Hospital Research Institute. Ms Cooke reported graduate scholarship support from Vanier Canada and Alberta Innovates Health Solutions outside the submitted work. Ms Eirich reported graduate scholarship support from the Social Science and Humanities Research Council. No other disclosures were reported.

Additional Contributions: We acknowledge Nicole Dunnewold, MLIS (Research and Learning Librarian, Health Sciences Library, University of Calgary), for her assistance with the search strategy, for which she was not compensated outside of her salary. We also acknowledge the contribution of members of the Determinants of Child Development Laboratory at the University of Calgary, in particular, Julianna Watt, BA, and Katarina Padilla, BSc, for their contribution to data extraction, for which they were paid as research assistants.

  • Open access
  • Published: 20 August 2024

Exploring healthcare workers’ perceptions of child health research at Kamuzu Central Hospital, Malawi: an interpretative phenomenological analysis

  • Myness Kasanda Ndambo 1 , 2 ,
  • Tuntufye Brighton Ndambo 3 &
  • Lucinda Manda-Taylor 1 , 2  

Human Resources for Health volume  22 , Article number:  57 ( 2024 ) Cite this article


Children’s health is a global public health priority and a determinant of development and sustainability. Its effective delivery and further improvements require constant and dedicated research on children, especially by child healthcare workers (HCWs). Studies have shown a high involvement of child HCWs from developed countries in child health research, with an under-representation from the global south in authorship and leadership in international collaborations. To our knowledge, there is very little literature on challenges faced by child HCWs in Malawi in conducting child health research. We sought to explore the lived experiences of child HCWs at Kamuzu Central Hospital (KCH) in Malawi by examining their perceptions of child health research and assessing the availability of child health research opportunities.

From July 2023 to August 2023, we conducted five key informant interviews with purposively sampled policymakers and 20 in-depth interviews with child HCWs at KCH. The interviews were conducted in English, audio-recorded, and transcribed verbatim. We utilised interpretative phenomenological analysis by reviewing initial transcripts for familiarity, generating codes manually, and refining them into broader themes through comparisons and iterative processes.

The analysis revealed three main themes on perceptions of child HCWs at KCH in child health research. These are (i) perceived motivation and challenges for engaging in child health research, (ii) perceptions of resource availability and research opportunities at KCH, and (iii) perceptions of gaps in research training and participation among child HCWs.

Conclusions

Our study has uncovered critical factors influencing the low participation of child HCWs in child health research at KCH. Lack of collaboration, limited financial opportunities, and non-research-based training were the key barriers to participation in child health research among child HCWs at KCH. We advocate for the inclusion of child HCWs at all stages of collaborative health research, transparency on funding opportunities for child health research, and inclusion of research in the training of HCWs. These initiatives can strengthen the participation of child HCWs in child health research and ultimately enhance child health outcomes in Malawi.

Peer Review reports

Children’s health is a global public health priority and a determinant of development and sustainability [ 1 , 2 , 3 ]. Studies show a significant improvement in children’s health, as evidenced by a sustained reduction in global child mortality from 12.7 million to 5.7 million deaths between 1990 and 2015. Child health research has significantly reduced mortality and morbidity [ 1 , 4 , 5 , 6 , 7 ]. Despite the reduction, an estimated 16,000 children worldwide continue to die daily due to preventable causes [ 8 ]. Most of these deaths are clustered in developing countries [ 9 , 10 , 11 ] and comprise preventable infectious diseases [ 12 , 13 ]. Despite the higher burden of child mortality in developing countries, research in child health is dominated by researchers from developed countries [ 2 , 12 , 14 , 15 , 16 ].

Participation in child health research is one of the glaring inequalities between the developed and the developing world [ 17 , 18 ]. Previous studies have reported an under-representation in research by health professionals from the developing world in authorship and leadership in international collaborative research [ 18 ]. For instance, a global literature search on HIV/AIDs reported high dominance by North America and Western and Central Europe in scientific productions [ 18 ]. The study also showed low levels of leadership by Africans in international collaborative HIV/AIDS research [ 18 ]. It is argued that the absence of infrastructure and inadequate funding exacerbates the low participation in research leadership and authorship in the global south [ 19 , 20 , 21 ].

African researchers often do not take leading roles in studies and authorship because of inadequate methodological skills in research design, inadequate analytical skills, and English-language barriers that hinder the writing of publications [ 18 ]. A study in Malawi on research experience among health professionals reported that all participants (100%) indicated a willingness to be trained in research, 3 (5.3%) had ever written a journal article, 23 (40.4%) had ever participated in research projects, and 18 (31.6%) had been trained in research methods [ 22 ].

Research has further shown that limitations of child health research in different areas result in gaps that cause physicians to extrapolate from adult studies, implement interventions that may not have been adequately evaluated, and even give out medications that may be potentially harmful to children or culturally and socially unacceptable [ 12 ].

Like other developing countries, Malawi needs medical treatments that reflect biological and non-biological variations [ 12 , 23 ] to make evidence-based decisions on the most efficient and cost-effective interventions [ 24 ]. In support, the Government of Malawi emphasises the need to conduct health research on child health [ 25 ]. The Malawi National Health Policy II points to insufficient capacity in research, among others, as a serious challenge that affects service delivery [ 26 , 27 ]. However, to our knowledge, there is little literature on the perspectives of child healthcare workers (HCWs) at Kamuzu Central Hospital (KCH) in Malawi on conducting child health research. Therefore, this study sought to explore child HCWs’ experiences conducting child health research at KCH, assessing their perspectives and the availability of research opportunities at both delivery and policy levels.

Study design and setting

We applied interpretative phenomenological analysis (IPA), a qualitative research approach that investigates how individuals make sense of their lived experiences [ 28 , 29 , 30 , 31 , 32 , 33 ], to examine how child HCWs perceive their research experiences. IPA is suitable when deeper understanding of an under-explored phenomenon is needed.

Key Informant Interviews (KIIs) and Individual In-Depth Interviews (IDIs) generated detailed descriptions of child HCWs' experiences in conducting child health research at KCH, a tertiary hospital with a well-established paediatric section in Lilongwe, Malawi (Fig.  1 ). KCH is a primary site for child health studies, making it suitable for exploring child HCWs' research experiences. Policymakers were based at the Ministry of Health (MoH) Headquarters, also in Lilongwe.

figure 1

Map showing the study area

Recruitment

We purposively [ 34 ] sampled 20 HCWs involved in child healthcare delivery and five policymakers responsible for the paediatric section at KCH. Previous research has shown that at least six IDIs are enough to reach saturation [ 35 ]. Given the IPA approach, 25 participants were more than sufficient to obtain detailed accounts of research perceptions without targeting saturation [ 29 ]. Policymakers included health directors, managers, and a coordinator in the child health space. HCWs included doctors, clinicians, nurses, pharmacists, lab technicians, and biomedical engineers. The principal investigator (MKN) and the unit leader at the KCH paediatric unit compiled a list of prospective participants. MKN contacted them to explain the study before sending consent forms and planning for interviews. All contacted participants agreed to participate.

Data collection

MKN collected data using the same interview guide for both categories from July 2023 to August 2023. The interview guide was developed in English (Supplementary Information 1) and pre-tested by MKN with five purposively sampled HCWs involved in child healthcare delivery at Area 25 Health Centre in Lilongwe. The pre-test results assisted in refining the questions for clarity. The interview guide included questions on perceptions of child health research in Malawi, the importance of child health research, child HCWs’ current capacity in research, and the availability of research opportunities among child HCWs.

The IDIs provided a detailed exploration of each individual’s perspective [ 36 ], and the KIIs created room for triangulation of results. The study was explained to participants, and MKN obtained signed informed consent. All participants were identified using numbers. MKN conducted interviews in English and audio-recorded them to ensure that everything was captured. All participants were professional HCWs who had completed tertiary education and were conversant in English. Participants were interviewed in their offices for confidentiality and to create a safe environment in which to explain their perceptions freely. The interviews lasted 30–45 min, and we provided lunch allowances to all the participants. MKN took notes and summarised key points after every interview for validation [ 30 ].

Data analysis

The recordings were transcribed verbatim in English by MKN. TBN listened to all recordings and cross-checked the verbatim transcription. Using IPA, we analysed data flexibly in five steps described by Smith et al. [ 29 ]. Firstly, MKN was immersed in the data set by repeatedly reading the transcripts while stepping into the participants’ shoes as deeply as possible to note the initial thoughts, observations, and responses concerning the research objectives [ 37 ]. Secondly, MKN developed a codebook examining codes related to general perceptions of child health research.

Thirdly, LMT and TBN checked the codebook for validation by independently reading the first two transcripts line by line and identifying emerging codes to ensure coding reliability and consistency. The researchers regrouped for a final codebook through a consensus process by looking at commonalities and differences [ 38 ]. The final codebook was agreed upon by the joint consensus of all authors [ 38 , 39 ]. Fourthly, MKN coded all transcripts manually using the validated codebook by grouping similar excerpts in a Word document for easy immersion and familiarisation with the data through repeated and active reading [ 40 ]. Fifthly, all authors regrouped again and identified relationships between codes. The frequently identified codes were merged, and themes were generated from these codes. Throughout these steps, we focused on participants’ interpretations of their experiences in research. This manuscript comprises summaries, interpretations, and quotes from participants’ excerpts.

Ethical considerations

The College of Medicine Research Ethics Committee (COMREC) [P.06/23-0089] approved this study. Before data collection, we obtained written informed consent from all participants. We maintained confidentiality and anonymity by allocating numbers and transcripts to the participants. Each participant was informed about voluntary participation and the option to withdraw at any stage without repercussions. This study was conducted per the Declaration of Helsinki guidelines and regulations [ 41 ].

Reflexivity

We were mindful of our prior experiences and preconceptions shaped by our backgrounds in health research [ 31 , 32 , 33 , 37 , 42 ]. As researchers immersed in data collection, transcription, and analysis, we acknowledged that our understanding could influence interpretations. Through ongoing reflexive practices, such as team discussions and journaling, we recognised the subjectivity of our perspectives [ 31 , 32 , 33 , 37 , 42 ].

MKN played a pivotal role throughout the study, engaging in data collection, transcription, and leading the analysis, which enabled nuanced interpretations of HCWs' participation in child health research [ 43 ]. Our collective experience in qualitative research shaped our analytical stance and guided the emphasis on certain themes. By acknowledging our positionality and the iterative nature of our interpretations, we aimed to enhance the transparency and rigour of our study [ 43 ]. Reflexivity enriched our understanding and guided the interpretation and presentation of findings.

There were more female participants in the HCW category (60%) but only one female participant in the policymaker category (20%) (Table  1 ). Most participants in the HCW category were aged between 25 and 34 years (65%), and most policymakers were above 45 years old (60%). All the nurses were female. Two doctors in the HCW group had one publication each, a pharmacist had two publications, and the policymakers who participated in the KIIs had 104 publications in total.

Participants in this study reported low involvement in research. Three key themes emerged from the transcripts. These are (i) perceived motivation and challenges for engaging in child health research, (ii) perceptions of resource availability and research opportunities at KCH, and (iii) perceptions of gaps in research training and participation among child HCWs. The themes are discussed below.

Perceived motivation and challenges for engaging in child health research

Participants reported some intrinsic professional motivation and multifaceted challenges for engaging in child health research. A willingness to better understand child health issues was reported as a strong motivator among child HCWs at KCH to engage in child health research.

“We know that part of our job is to look at the progress of the diseases and how the evolution has been for many years… looking at the fact that medicine is dynamic…all the changes that are happening in terms of vaccines, medicines, and all the changes, make everybody who works in the pediatric department to have that feeling and need to do more research. So yes, the willingness is there.” IDI 18, Doctor “Research that can tell us about the changing epidemiology of the disease would be paramount, and that should be very well documented and disseminated through the layers for implementation purposes to influence decision making that is still not happening... emerging conditions like Non-Communicable Diseases (NCDs) in children...” KII 04, Policymaker

Child HCWs and policymakers expressed that financial incentives influence their motivation for engaging in research. Participants associate research with significant economic gains despite the struggles researchers have to go through to secure grants.

“…research comes with a lot of resources, so some can be used to improve the lives of the people conducting research.” IDI 10, Clinical Administrator “There is freedom of money in research if you find a grant. I know you have to struggle to find a grant, but after the struggle, there’s something you can benefit from. Healthcare workers do not know that research has some monetary benefits.” KII 04, Policymaker

However, other participants showed reluctance to engage in research due to various barriers within their environment, as explained below.

“…People are not oriented on how they can conduct research in children… in our lab, we generate data, so we expect that people will come to ask what they have noted, but only a few have come to ask us for data... So, it just shows that people are not interested in the data.” IDI 05, Lab Technician “Most of us get discouraged because we do not see the results of most studies happening here.” IDI 19, Pediatric nurse

Policymakers emphasised the need for increased research initiatives to empower child HCWs to research in their respective hospital settings.

“The effort of research itself is low… We need to move in a direction where you can wake up and start writing a research question independently. It is something that needs to be pushed.” KII 04, Policymaker

Participants highlighted the limited involvement of nursing professionals in research, attributing it to insufficient capacity. Nurses reported that, unlike medical professionals, they are mostly overlooked in child health research.

“Most of us, especially nurses, are quiet. We are not active, unwilling, and do not participate more compared to the other side of medical, like the clinicians and the other team. Most of us are not experienced in research. So, with a lack of knowledge and expertise in research, we are not active compared to the other team.” IDI 01, Palliative nurse “No, in nursing, no. In pediatric? I have never heard of it, but the medical ones, like the doctors, are the ones who do that. For us, it is just continued professional development (CPD). Maybe because I am a junior, I don’t know much, but I have worked here for four years.” IDI 20, Pediatric nurse

Some participants attributed their limited involvement in research to their busy clinical schedules, as outlined below.

“We don’t have time... doing normal clinical work is a lot of burden, and then there is the administration, clinical work, and teaching. … You need to formulate a time frame for research. People are torn between sitting behind their laptops and working on research or seeing patients. So the patient always takes precedence.” IDI 10, Clinical Administrator “Research needs time, and for health professionals in Government hospitals, there is so much pressure for work, so we prioritise seeing clients over doing research. … We are so much interested in doing research, especially in medical equipment for neonates, children, and all that, but what limits us is the time factor.” IDI 16, Biomedical Engineer

In addition to time constraints, female nurses link their limited participation in research to gender roles, which hinder their ability to pursue research opportunities outside of their working hours.

“Women are too busy than men in our culture. Males find a lot of information on the Internet on how to conduct research. For females, we come to work and are busy with our daily routine; we go back home, are tired with the kids, and go to bed ...” IDI 19, Pediatric nurse.

These findings underscore the importance of addressing these barriers and enhancing support for child HCWs and policymakers to foster a conducive environment for research in child health.

Perceptions of resource availability and research opportunities at KCH

In this study, we were keen to understand the available research opportunities for child HCWs at KCH. Participants reported that research is not a priority in the annual budget at KCH, with only a small portion allocated for research activities. Child HCWs indicated that these funds often get diverted into clinical expenditures, leaving little-to-no resources available for research endeavours.

“Government policy demands every institution to allocate 1-2% of the annual funding to research, but in most cases, this money is not available for research …maybe the priority is on the clinical part of treating the patients. So, if I am interested in doing a study, I have to find funding to conduct it even though it will benefit the hospital.” IDI 05, Pharmacist.

Participants, therefore, emphasised the need for institutional support, including allocating a budget line specifically for research to minimise the diversion of funds from research to service delivery and encourage child HCWs to engage in research activities.

“They should allocate a certain amount for research because if they can, it will have its budget line within the hospital that the hospital cannot tap from for other expenses... That would make people interested in research because they would know there are already some funds I can utilise elsewhere.” IDI 05, Pharmacist

Child HCWs expressed concerns about limited funding opportunities for research, advocating for funders to be more open to supporting new researchers. They suggested that funders should allow new researchers to participate in grant writing competitions and allocate grants to encourage early career researchers.

“Funders should be more open to new people writing grants. The case of looking for proven records and experience. Where do you get the experience if you are starting? Give new people some small grants and see how they handle that...” IDI 10, Clinical Administrator

These sentiments were echoed by policymakers who admitted that, as a country, very little money is allocated for research.

“As a country, we are not investing in research in terms of money. People can have ideas, but we do not expect them to take the little money from their pockets.” KII 04, Policymaker

Child HCWs highlighted the difficulty of securing external funding for research projects, a challenge compounded by inadequate institutional support and resources. This theme exposes the systemic barriers child HCWs face at KCH in pursuing research opportunities, including inadequate funding and competing priorities within the healthcare system.

Perceptions of gaps in research training and participation among child HCWs

Participants expressed a clear need for more emphasis on research training to address capability gaps among HCWs.

“…if we were exposed to training … I think I can be confident enough to research on my own and develop some manuscripts for publication.” IDI 19, Pediatric Nurse

“When someone is doing a study, we ask for our involvement to get mentored. But how involved are we? They have already developed a concept, done everything, and now they are on data collection; that is when we are involved. So, I would say our involvement should start from conceptualisation.” IDI 03, Nurse Administrator

Participants expressed inadequate training and capacity-building opportunities among HCWs, hindering their ability to engage in research activities effectively.

“I would say the opportunities are limited; if we had such opportunities, we have the team of people that are always willing to work in research, to do more research, but such opportunities are minimal…we need some sort of training here and there...” IDI 05, Pharmacist

“I wouldn’t say there is any training in the department or at a hospital level to enhance someone’s progress with research. We get interested in doing research, but at the hospital level, there are no training and capacity-building activities.” IDI 18, Doctor

Participants attributed their limited exposure to research training to the absence of research concepts in clinical, medical, and nursing education curricula. They suggested incorporating research concepts into continuing professional development (CPD) programs as a potential solution to bridge this gap.

“We underwent medical training and internship and were introduced to research concepts when we started working. It is a new concept to us, so it becomes a challenge... Again, it should be part of CPD. We do CPD as professionals, but mostly, it’s the same things that we do in the hospitals. I have never seen research being part of it.” IDI 18, Doctor

Policymakers acknowledged the importance of incorporating child HCWs into technical working group meetings to expose them to research gaps in the health system.

“I think it is about including them in our technical working groups; that is when they will be open and be exposed to implementation arrangements. They will also be motivated to say this is the area we can do something on.” KII 01, Policymaker

There was a perceived need to strengthen collaborations with stakeholders in various areas of healthcare to improve child HCWs' participation in research.

“…People should know that who is there in this area. We have people specialising in child health, but where are they? Do we know them? Why? They are somewhere in an organisation where we can’t even access them. But why do we have people specialising in child health? They are the ones who are supposed to be in the forefront...” IDI 03, Nurse Administrator

The results under this theme describe the multifaceted gaps that hinder child HCWs’ engagement in research activities and the importance of addressing these gaps through enhanced researcher involvement, training and capacity-building opportunities, policy support on health education curriculum, and improved collaboration with stakeholders.

Discussion

This study offers valuable insights into the perceptions of child HCWs at KCH regarding child health research. Despite a general willingness among child HCWs to engage in research, participation remains low due to various challenges.

A significant challenge identified was limited research capacity among child HCWs at KCH. Our study found that child HCWs at KCH have a low drive to engage in child health research because they received no research training during their studies. This contrasts with Tanzania, where a similar study found that over half of the participants had received research training at a university or medical college [ 44 ], suggesting regional differences in research capacity among child HCWs in sub-Saharan Africa. These disparities could affect research outcomes and the effectiveness of child health interventions in different countries. We recommend urgent reforms to the clinical education curriculum in Malawi to incorporate research training and bridge this gap.

Our findings highlight a significant involvement gap in research collaboration practices at KCH, where child HCWs are mainly involved in data collection rather than other processes, such as developing protocols, data analysis, and manuscript writing. The challenge is further exacerbated by the absence of a dedicated child health department within the Ministry of Health to foster collaboration and ownership of child health initiatives. In addition to advocating for a standalone child health department to improve stakeholder collaboration and child health outcomes [ 45 , 46 ], we advocate for comprehensively engaging child HCWs in research from conceptualisation through report writing. Collaboration between academic and research institutions can provide cost-effective training and expertise sharing, as recommended by previous studies [ 44 , 47 , 48 , 49 ].

Gender constraints were also identified as a significant barrier to research participation among female child HCWs at KCH. While male child HCWs have taken the initiative to self-train in research during their spare time using the internet, female child HCWs lack such opportunities because household gender roles fill their non-professional time. Similar findings have been reported from other African countries [ 44 , 50 ]. Policymakers are urged to implement strategies to empower female child HCWs, including early involvement in research processes and creating supportive environments conducive to research engagement. Additionally, KCH child HCWs face significant time constraints due to heavy workloads, similar to findings from both African and developed countries [ 50 , 51 , 52 , 53 , 54 , 55 ]. To address this, we recommend allocating protected research time within the hospital, supported by increased human resources and integration of research into duty rosters.

Financial constraints were another significant barrier to research participation at KCH, as the annual budget does not prioritise research. This highlights the need for a dedicated budget line to ensure adequate research funding. A lack of financial resources limits the ability to conduct research and affects the quality and scope of studies; with sufficient funding, it is easier to procure materials, compensate participants, and cover other essential expenses. Prior research has also identified insufficient finances as a barrier to African research [ 44 , 47 ], indicating a broader systemic issue. Addressing this financial gap is crucial to fostering a research culture, building capacity among child HCWs, and improving healthcare outcomes through evidence-based practices. Enhanced funding mechanisms from governmental and non-governmental sources are needed to overcome these obstacles and promote a research-oriented healthcare environment.

Limitations

This is the first study to explore HCW experiences in child health research at KCH in Malawi. Our child HCW sample had more females, while the policymaker sample included only one female, potentially introducing gender bias. Interpretative Phenomenological Analysis (IPA) allowed for detailed and nuanced perceptions of child health research at KCH. However, a qualitative study cannot establish causality, and the sample from a single hospital in central Malawi may limit generalizability. Despite this, the identified barriers and recommendations likely apply to all government hospitals in Malawi, as they share common health education institutions and policies. Further research involving multiple sites and more balanced gender representation is needed to validate and extend these findings.

Conclusion

Our study has identified crucial factors contributing to the low participation of child HCWs in child health research at KCH. We found notable gaps in research participation among child HCWs at KCH, including a lack of collaboration, limited funding opportunities, and non-research-based training. Some child HCWs expressed a strong interest in research, but challenges at both individual and institutional levels hinder engagement. We advocate for targeted capacity-building interventions to address these challenges and promote a culture of research excellence. Prioritising these initiatives can foster a conducive environment for child health research and enhance outcomes in Malawi.

Availability of data and materials

The dataset generated and analysed during the current study is not publicly available. Even without identifiers such as names, the dataset could hold identifiable participant information in aggregate form, as some narratives in the transcripts come from participants sampled in a single district and the pediatric department at KCH is a small section. Given these potential identifiers in the transcript narratives, we believe it would be ethically inappropriate to publicly share data that could reveal our participants’ identities if read by someone within the district or KCH. The dataset, or part of it, is available from the corresponding author upon reasonable request with permission from KCH.

Abbreviations

HCW: Healthcare Workers

MoH: Ministry of Health

KCH: Kamuzu Central Hospital

HIV: Human Immunodeficiency Virus

AIDS: Acquired Immunodeficiency Syndrome

COMREC: College of Medicine Research Ethics Committee

KII: Key Informant Interview

IDI: Individual In-depth Interview

Hanifiha M, Ghanbari A, Keykhaei M, Saeedi Moghaddam S, Rezaei N, Pasha Zanous M, et al. Global, regional, and national burden and quality of care index in children and adolescents: a systematic analysis for the global burden of disease study 1990–2017. PLoS ONE. 2022. https://doi.org/10.1371/journal.pone.0267596 .


Khan MI, Memon ZA, Bhutta ZA. Challenges and opportunities in conducting research in developing countries. Pediatr Epidemiol. 2017. https://doi.org/10.1159/000481323 .


Liu L, Oza S, Hogan D, Perin J, Rudan I, Lawn JE, et al. Global, regional, and national causes of child mortality in 2000–13, with projections to inform post-2015 priorities: an updated systematic analysis. Lancet. 2015;385:430–40. https://doi.org/10.1016/S0140-6736(14)61698-6 .


Requejo JH, Bhutta ZA. The post-2015 agenda: staying the course in maternal and child survival. Arch Dis Child. 2015;100(suppl 1):S76–81. https://doi.org/10.1136/archdischild-2013-305737 .

The World Health Organisation. Child health research. A foundation for improving child health. https://apps.who.int/iris/bitstream/handle/10665/68359/WHO_FCH_CAH_02.3.pdf;jsessionid=8860E98FB42246C4A3869DD54B713910?sequence=1

Lang T, Siribaddana S. Clinical trials have gone global: is this a good thing? PLoS Med. 2012. https://doi.org/10.1371/journal.pmed.1001228 .

Mbuagbaw L, Thabane L, Ongolo-Zogo P, Lang T. The challenges and opportunities of conducting a clinical trial in a low-resource setting: the case of the Cameroon mobile phone SMS (CAMPS) trial, an investigator-initiated trial. Trials. 2011;12:1. https://doi.org/10.1186/1745-6215-12-145 .

Khodaee GH, Khademi G, Saeidi M. Under-five mortality in the world (1900–2015). Int J Pediatr. 2015;3:1093–5. https://doi.org/10.22038/ijp.2015.6078 .

Victora CG, Wagstaff A, Schellenberg JA, Gwatkin D, Claeson M, Habicht J. Applying an equity lens to child health and mortality: more of the same is not enough. Lancet. 2003;362:233–41. https://doi.org/10.1016/S0140-6736(03)13917-7 .

Reiner RC Jr, Olsen HE, Ikeda CT, Echko MM, Ballestreros KE, Manguerra H, et al. Diseases, injuries, and risk factors in child and adolescent health, 1990 to 2017: findings from the global burden of diseases, injuries, and risk factors 2017 study. JAMA Pediatr. 2019. https://doi.org/10.1001/jamapediatrics.2019.0337 .

Dabis F, Orne-Gliemann J, Perez F, Leroy V, Newell ML, Coutsoudis A, et al. Working group on women and child health. Improving child health: the role of research. BMJ. 2002;324(7351):1444–7. https://doi.org/10.1136/bmj.324.7351.1444 .


Alemayehu C, Mitchell G, Nikles J. Barriers for conducting clinical trials in developing countries—a systematic review. Int J Equity Health. 2018;17:37. https://doi.org/10.1186/s12939-018-0748-6 .

Murray CJ, Lopez AD. Global comparative assessments in the health sector: disease burden, expenditures and intervention packages. 1994. https://www.ncbi.nlm.nih.gov/books/NBK209104/ .

Orne-Gliemann J, Perez F, Leroy V, Newell ML, Dabis F. A decade of child health research in developing countries. Sante. 2003;13:69–75.


Røttingen JA, Regmi S, Eide M, Young AJ, Viergever RF, Ardal C, et al. Mapping of available health research and development data: what’s there, what’s missing, and what role is there for a global observatory? Lancet. 2013;382(9900):1286–307. https://doi.org/10.1016/S0140-6736(13)61046-6 .

Moon S, Bermudez J, Hoen E. Innovation and access to medicines for neglected populations: could a treaty address a broken pharmaceutical R&D system? PLoS Med. 2012;9: e1001218. https://doi.org/10.1371/journal.pmed.1001218 .

Chu KM, Jayaraman S, Kyamanywa P, Ntakiyiruta G. Building research capacity in Africa: equity and global health collaborations. PLoS Med. 2014;11(3): e1001612. https://doi.org/10.1371/journal.pmed.1001612 .

González-Alcaide G, Menchi-Elanzi M, Nacarapa E, Ramos-Rincón JM. HIV/AIDS research in Africa and the Middle East: participation and equity in North-South collaborations and relationships. Glob Health. 2020;16:83. https://doi.org/10.1186/s12992-020-00609-9 .

Zakumumpa H, Bennett S, Ssengooba F. Leveraging the lessons learned from financing HIV programs to advance the universal health coverage (UHC) agenda in the East African community. Glob Health Res Policy. 2019;4:27. https://doi.org/10.1186/s41256-019-0118-y .

Langer A, Díaz-Olavarrieta C, Berdichevsky K, Villar J. Why is research from developing countries underrepresented in international health literature, and what can be done about it? Bull World Health Organ. 2004;82:802–3.


Smith E, Hunt M, Master Z. Authorship ethics in global health research partnerships between researchers from low or middle-income countries and high-income countries. BMC Med Ethics. 2014;15:42. https://doi.org/10.1186/1472-6939-15-42 .

Muula AS, Misiri H, Chimalizeni Y, Mpando D, Phiri C, Nyaka A. Access to continued professional education among health workers in Blantyre, Malawi. Afr Health Sci. 2004;4(3):182–4.

WHO Expert Committee. Guidelines for Good Clinical Practice (GCP) for trials on pharmaceutical products and the use of essential drugs. Sixth report of the WHO expert committee. Geneva: WHO Technical Report Series; 1995. https://www.femh-irb.org/content_pages/files_add/doc_arb/I01_9712011000.pdf .

McMichael C, Waters E, Volmink J. Evidence-based public health: what does it offer developing countries? J Public Health. 2005;27:215–21. https://doi.org/10.1093/PubMed/fdi024 .

Ministry of Health. National Health Research Agenda 2012–2016. 2012.

Masefield SC, Msosa A, Grugel J. Challenges to effective governance in a low-income healthcare system: a qualitative study of stakeholder perceptions in Malawi. BMC Health Serv Res. 2020;20:1142. https://doi.org/10.1186/s12913-020-06002-x .

Government of the Republic of Malawi. Health Sector Strategic Plan II 2017–2022. Towards universal health coverage. Lilongwe: Ministry of Health; 2017. https://www.healthdatacollaborative.org/fileadmin/uploads/hdc/Documents/Country_documents/HSSP_II_Final_HQ_complete_file.pdf.pdf

Sihre HK, Gill P, Lindenmeyer A, McGuiness M, Berrisford G, Jankovic J, et al. Understanding the lived experiences of severe postnatal psychiatric illnesses in English speaking South Asian women, living in the UK: a qualitative study protocol. BMJ Open. 2019. https://doi.org/10.1136/bmjopen-2018-025928 .

Smith JA, Fieldsend M. Interpretative phenomenological analysis. In: Camic PM, editor. Qualitative research in psychology: expanding perspectives in methodology and design (2nd ed.). Washington: American Psychological Association; 2021. p. 147–66. https://doi.org/10.1037/0000252-008 .


Pietkiewicz I, Smith JA. A practical guide to using interpretative phenomenological analysis in qualitative research psychology. Psychol J. 2014;20:7–14. https://doi.org/10.14691/CPPJ.20.1.7 .

Heidegger M. Introduction to phenomenological research. Indiana University Press. 2005.

Rodriguez A, Smith J. Phenomenology as a healthcare research method. Evid Based Nurs. 2018;21(4):96–8. https://doi.org/10.1136/eb-2018-102990 .

Crotty M. Phenomenology and nursing research. Melbourne: Churchill Livingstone; 1996.


Tongco MDC. Purposive sampling as a tool for informant selection. Ethnobot Res Appl. 2007;5:147–58.

Guest G, Bunce A, Johnson L. How many interviews are enough? An experiment with data saturation and variability. Field Methods. 2006;18:59–82. https://doi.org/10.1177/1525822X05279903 .

Ritchie J, Lewis J. Qualitative research practice: a guide for social science students and researchers. 2013.

Heidegger M. Being and time (J. Macquarrie, & E. Robinson, Trans.). Oxford, UK & Cambridge, USA: Blackwell Publishers Ltd; 1962 (Original work published 1927).

Burnard P, Gill P, Stewart K, Treasure E, Chadwick B. Analysing and presenting qualitative data. Br Dent J. 2008;204(8):429–32. https://doi.org/10.1038/SJ.bdj.2008.292 .


Tong A, Sainsbury P, Craig J. Consolidated criteria for reporting qualitative research (COREQ): a 32-item checklist for interviews and focus groups. Int J Qual Health Care. 2007;19(6):349–57. https://doi.org/10.1093/intqhc/mzm042 .

Braun V, Clarke V. Using thematic analysis in psychology. Qual Res Psychol. 2006;3(2):77–101. https://doi.org/10.1191/1478088706qp063oa .

World Medical Association. Declaration of Helsinki. Brazil: The World Medical Association, Inc. 2013. https://www.wma.net/policies-post/wma-declaration-of-helsinki-ethical-principles-for-medical-research-involving-human-subjects/ .

Olmos-Vega FM, Stalmeijer RE, Varpio L, Kahlke R. A practical guide to reflexivity in qualitative research: AMEE Guide No. 149. Med Teach. 2022. https://doi.org/10.1080/0142159X.2022.2057287 .

Mills J, Bonner A, Francis K. The development of constructivist grounded theory. Int J Qual Methods. 2006;5:25–35. https://doi.org/10.1177/160940690600500103 .

Kengia JT, Kalolo A, Barash D, Chwa C, Hayirli TC, Kapologwe NA, et al. Research capacity, motivators and barriers to conducting research among healthcare providers in Tanzania’s public health system: a mixed methods study. Hum Resour Health. 2023;21(1):73. https://doi.org/10.1186/s12960-023-00858-w .

Franzen SRP, Chandler C, Lang T. Health research capacity development in low and middle-income countries: reality or rhetoric? A systematic meta-narrative review of the qualitative literature. BMJ Open. 2017;7: e012332. https://doi.org/10.1136/bmjopen-2016-012332 .

Gureje O, Seedat S, Kola L, Appiah-Poku J, Othieno C, Harris B, et al. Partnership for mental health development in sub-Saharan Africa (PaM-D): a collaborative initiative for research and capacity building. Epidemiol Psychiatr Sci. 2019;28:389–96. https://doi.org/10.1017/S2045796018000707 .

Sun C, Dlamini PS, Maimbolwa MC, Changala Lukwesa Mukonka C, Nyamakura R, Omoni G, et al. Success stories: overcoming barriers to research in southern and eastern African countries. Clin Nurs Res. 2017;26:399–418. https://doi.org/10.1177/1054773817718935 .

Thomson DR, Semakula M, Hirschhorn LR, Murray M, Ndahindwa V, Manzi A, et al. Applied statistical training to strengthen analysis and health research capacity in Rwanda. Health Res Policy Syst. 2016;14:73. https://doi.org/10.1186/s12961-016-0144-x .

Bates I, Boyd A, Smith H, Cole DC. A practical and systematic approach to organisational capacity strengthening for research in the health sector in Africa. Health Res Policy Syst. 2014;12:11. https://doi.org/10.1186/1478-4505-12-11 .

Pascal Iloh GU, Amadi AN, Iro OK, Agboola SM, Aguocha GU, Chukwuonye ME. Attitude, practice orientation, benefits and barriers towards health research and publications among medical practitioners in Abia State, Nigeria: a cross-sectional study. Niger J Clin Pract. 2020;23:129–37. https://doi.org/10.4103/njcp.njcp_284_18 .

Corchon S, Portillo MC, Watson R, Saracíbar M. Nursing research capacity building in a Spanish hospital: an intervention study. J Clin Nurs. 2011;20(17–18):2479–89. https://doi.org/10.1111/j.1365-2702.2011.03744.x .

Gulland A. Doctors cite lack of time as greatest barrier to research. Br Med J. 2016. https://doi.org/10.1136/bmj.i1488 .

Habineza H, Nsanzabaganwa C, Nyirimanzi N, Umuhoza C, Cartledge K, Conard C, et al. Perceived attitudes of the importance and barriers to research amongst Rwandan interns and pediatric residents—a cross-sectional study. BMC Med Educ. 2019;19:4. https://doi.org/10.1186/s12909-018-1425-6 .

Golenko X, Pager S, Holden L. A thematic analysis of the role of the organisation in building allied health research capacity: a senior managers’ perspective. BMC Health Serv Res. 2012;12:276. https://doi.org/10.1186/1472-6963-12-276 .

Pager S, Holden L, Golenko X. Motivators, enablers, and barriers to building allied health research capacity. J Multidiscip Healthc. 2012;5:53–9. https://doi.org/10.2147/JMDH.S27638 .


Acknowledgements

The authors are grateful to the Management and Staff at Kamuzu Central Hospital (KCH), the Management and Staff of the KCH pediatric section, and all HCWs who participated in the study. The authors acknowledge the International Child Health Group for the financial support. The authors recognise the Ministry of Health and all Policymakers who participated in the study. Last but not least, the authors appreciate the staff members at the Training and Research Unit of Excellence for their administrative support throughout the study.

Funding

This study was supported by the International Child Health Group (ICHG), UK (ICHG Project No. ICHG835/537044). The funding body had no role in the study’s design, data collection, analysis, interpretation of data, or manuscript write-up.

Author information

Authors and Affiliations

School of Global and Public Health, Kamuzu University of Health Sciences, Blantyre, Malawi

Myness Kasanda Ndambo & Lucinda Manda-Taylor

Training and Research Unit of Excellence (TRUE), Kufa Road, Mandala, P. O. Box 30538, Blantyre, 3, Malawi

Department of Civic Education, Ministry of Local Government, Unity and Culture, Lilongwe, Malawi

Tuntufye Brighton Ndambo


Contributions

MKN, TBN, and LMT conceptualised and designed the study. MKN collected and analysed data with TBN. MKN drafted the manuscript. TBN and LMT reviewed the manuscript, provided input, and suggested additions and changes. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Myness Kasanda Ndambo .

Ethics declarations

Ethics approval and consent to participate

The College of Medicine Research Ethics Committee (COMREC) approved the study [Protocol No. P.06/23-0089]. Before data collection, we obtained written informed consent from all participants. We maintained confidentiality and anonymity by allocating numbers to the participant transcripts. The information letter informed participants of their choice to participate and the option to withdraw at any stage of the research process. This study was conducted per the Declaration of Helsinki guidelines and regulations.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary material 1. Interview guide for policymakers and child healthcare workers.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .

Reprints and permissions

About this article

Cite this article.

Ndambo, M.K., Ndambo, T.B. & Manda-Taylor, L. Exploring healthcare workers’ perceptions of child health research at Kamuzu Central Hospital, Malawi: an interpretative phenomenological analysis. Hum Resour Health 22 , 57 (2024). https://doi.org/10.1186/s12960-024-00938-5

Download citation

Received : 08 April 2024

Accepted : 22 July 2024

Published : 20 August 2024

DOI : https://doi.org/10.1186/s12960-024-00938-5


Keywords

  • Child health
  • Child health research
  • Interpretative phenomenological analysis
  • Healthcare workers

Human Resources for Health

ISSN: 1478-4491


peer review analysis of research

IMAGES

  1. Behind the Scenes in Academic Publishing: A Closer Look at Peer Review

    peer review analysis of research

  2. The Peer Review Process

    peer review analysis of research

  3. Peer Review Definition

    peer review analysis of research

  4. How to Publish Your Article in a Peer-Reviewed Journal: Survival Guide

    peer review analysis of research

  5. What Is Peer Review?

    peer review analysis of research

  6. Importance Of Peer Review Research When Formulating New Food And Drug Products

    peer review analysis of research

COMMENTS

  1. What Is Peer Review?

    The most common types are: Single-blind review. Double-blind review. Triple-blind review. Collaborative review. Open review. Relatedly, peer assessment is a process where your peers provide you with feedback on something you've written, based on a set of criteria or benchmarks from an instructor.

  2. A practical guide to data analysis in general literature reviews

    This article is a practical guide to conducting data analysis in general literature reviews. The general literature review is a synthesis and analysis of published research on a relevant clinical issue, and is a common format for academic theses at the bachelor's and master's levels in nursing, physiotherapy, occupational therapy, public health and other related fields.

  3. Research Methods: How to Perform an Effective Peer Review
