Organizing Your Social Sciences Research Paper

  • The Research Problem/Question

A research problem is a definite or clear expression [statement] about an area of concern, a condition to be improved upon, a difficulty to be eliminated, or a troubling question that exists in scholarly literature, in theory, or within existing practice that points to a need for meaningful understanding and deliberate investigation. A research problem does not state how to do something, offer a vague or broad proposition, or present a value question. In the social and behavioral sciences, studies are most often framed around examining a problem that needs to be understood and resolved in order to improve society and the human condition.

Bryman, Alan. “The Research Question in Social Research: What is its Role?” International Journal of Social Research Methodology 10 (2007): 5-20; Guba, Egon G., and Yvonna S. Lincoln. “Competing Paradigms in Qualitative Research.” In Handbook of Qualitative Research. Norman K. Denzin and Yvonna S. Lincoln, editors. (Thousand Oaks, CA: Sage, 1994), pp. 105-117; Pardede, Parlindungan. “Identifying and Formulating the Research Problem.” Research in ELT: Module 4 (October 2018): 1-13; Li, Yanmei, and Sumei Zhang. “Identifying the Research Problem.” In Applied Research Methods in Urban and Regional Planning. (Cham, Switzerland: Springer International Publishing, 2022), pp. 13-21.

Importance of...

The purpose of a problem statement is to:

  • Introduce the reader to the importance of the topic being studied. The reader is oriented to the significance of the study.
  • Anchor the research questions, hypotheses, or assumptions to follow. It offers a concise statement about the purpose of your paper.
  • Place the topic into a particular context that defines the parameters of what is to be investigated.
  • Provide the framework for reporting the results, indicate what is probably necessary to conduct the study, and explain how the findings will present this information.

In the social sciences, the research problem establishes the means by which you must answer the "So What?" question. This declarative question refers to a research problem surviving the relevancy test [the quality of a measurement procedure that provides repeatability and accuracy]. Note that answering the "So What?" question requires a commitment on your part not only to show that you have reviewed the literature, but also to show that you have thoroughly considered the significance of the research problem and its implications for creating new knowledge and understanding or for informing practice.

To survive the "So What" question, problem statements should possess the following attributes:

  • Clarity and precision [a well-written statement does not make sweeping generalizations and irresponsible pronouncements; it also does not include unspecific determinates like "very" or "giant"],
  • Demonstration of a researchable topic or issue [i.e., feasibility of conducting the study is based upon access to information that can be effectively acquired, gathered, interpreted, synthesized, and understood],
  • Identification of what would be studied, while avoiding the use of value-laden words and terms,
  • Identification of an overarching question or small set of questions accompanied by key factors or variables,
  • Identification of key concepts and terms,
  • Articulation of the study's conceptual boundaries or parameters or limitations,
  • Some generalizability in regard to applicability and bringing results into general use,
  • Conveyance of the study's importance, benefits, and justification [i.e., regardless of the type of research, it is important to demonstrate that the research is not trivial],
  • Absence of unnecessary jargon or overly complex sentence constructions; and,
  • Conveyance of more than the mere gathering of descriptive data providing only a snapshot of the issue or phenomenon under investigation.

Bryman, Alan. “The Research Question in Social Research: What is its Role?” International Journal of Social Research Methodology 10 (2007): 5-20; Brown, Perry J., Allen Dyer, and Ross S. Whaley. "Recreation Research—So What?" Journal of Leisure Research 5 (1973): 16-24; Castellanos, Susie. Critical Writing and Thinking. The Writing Center. Dean of the College. Brown University; Ellis, Timothy J. and Yair Levy Nova. "Framework of Problem-Based Research: A Guide for Novice Researchers on the Development of a Research-Worthy Problem." Informing Science: the International Journal of an Emerging Transdiscipline 11 (2008); Thesis and Purpose Statements. The Writer’s Handbook. Writing Center. University of Wisconsin, Madison; Thesis Statements. The Writing Center. University of North Carolina; Tips and Examples for Writing Thesis Statements. The Writing Lab and The OWL. Purdue University; Selwyn, Neil. "‘So What?’…A Question that Every Journal Article Needs to Answer." Learning, Media, and Technology 39 (2014): 1-5; Shoket, Mohd. "Research Problem: Identification and Formulation." International Journal of Research 1 (May 2014): 512-518.

Structure and Writing Style

I.  Types and Content

There are four general conceptualizations of a research problem in the social sciences:

  • Casuist Research Problem -- this type of problem relates to the determination of right and wrong in questions of conduct or conscience by analyzing moral dilemmas through the application of general rules and the careful distinction of special cases.
  • Difference Research Problem -- typically asks the question, “Is there a difference between two or more groups or treatments?” This type of problem statement is used when the researcher compares or contrasts two or more phenomena. This is a common approach to defining a problem in the clinical social sciences or behavioral sciences.
  • Descriptive Research Problem -- typically asks the question, "what is...?" with the underlying purpose to describe the significance of a situation, state, or existence of a specific phenomenon. This problem is often associated with revealing hidden or understudied issues.
  • Relational Research Problem -- suggests a relationship of some sort between two or more variables to be investigated. The underlying purpose is to investigate specific qualities or characteristics that may be connected in some way.

A problem statement in the social sciences should contain:

  • A lead-in that helps ensure the reader will maintain interest over the study,
  • A declaration of originality [e.g., mentioning a knowledge void or a lack of clarity about a topic that will be revealed in the literature review of prior research],
  • An indication of the central focus of the study [establishing the boundaries of analysis], and
  • An explanation of the study's significance or the benefits to be derived from investigating the research problem.

NOTE:   A statement describing the research problem of your paper should not be viewed as a thesis statement that you may be familiar with from high school. Given the content listed above, a description of the research problem is usually a short paragraph in length.

II.  Sources of Problems for Investigation

The identification of a problem to study can be challenging, not because there's a lack of issues that could be investigated, but due to the challenge of formulating an academically relevant and researchable problem which is unique and does not simply duplicate the work of others. To facilitate how you might select a problem from which to build a research study, consider these sources of inspiration:

Deductions from Theory

This relates to deductions made from social philosophy or generalizations embodied in life and in society that the researcher is familiar with. These deductions from human behavior are then placed within an empirical frame of reference through research. From a theory, the researcher can formulate a research problem or hypothesis stating the expected findings in certain empirical situations. The research asks the question: “What relationship between variables will be observed if theory aptly summarizes the state of affairs?” One can then design and carry out a systematic investigation to assess whether empirical data confirm or reject the hypothesis, and hence, the theory.

Interdisciplinary Perspectives

Identifying a problem that forms the basis for a research study can come from academic movements and scholarship originating in disciplines outside of your primary area of study. This can be an intellectually stimulating exercise. A review of pertinent literature should include examining research from related disciplines that can reveal new avenues of exploration and analysis. An interdisciplinary approach to selecting a research problem offers an opportunity to construct a more comprehensive understanding of a very complex issue than any single discipline may be able to provide.

Interviewing Practitioners

The identification of research problems about particular topics can arise from formal interviews or informal discussions with practitioners who provide insight into new directions for future research and how to make research findings more relevant to practice. Discussions with experts in the field, such as teachers, social workers, health care providers, lawyers, and business leaders, offer the chance to identify practical, “real world” problems that may be understudied or ignored within academic circles. This approach also provides some practical knowledge which may help in the process of designing and conducting your study.

Personal Experience

Don't undervalue your everyday experiences or encounters as worthwhile problems for investigation. Think critically about your own experiences and/or frustrations with an issue facing society or related to your community, your neighborhood, your family, or your personal life. This can be derived, for example, from deliberate observations of certain relationships for which there is no clear explanation or witnessing an event that appears harmful to a person or group or that is out of the ordinary.

Relevant Literature

The selection of a research problem can be derived from a thorough review of pertinent research associated with your overall area of interest. This may reveal where gaps exist in understanding a topic or where an issue has been understudied. Research may be conducted to: 1) fill such gaps in knowledge; 2) evaluate if the methodologies employed in prior studies can be adapted to solve other problems; or, 3) determine if a similar study could be conducted in a different subject area or applied in a different context or to a different study sample [i.e., different setting or different group of people]. Also, authors frequently conclude their studies by noting implications for further research; read the conclusion of pertinent studies because statements about further research can be a valuable source for identifying new problems to investigate. The fact that a researcher has identified a topic worthy of further exploration validates the fact it is worth pursuing.

III.  What Makes a Good Research Statement?

A good problem statement begins by introducing the broad area in which your research is centered, gradually leading the reader to the more specific issues you are investigating. The statement need not be lengthy, but a good research problem should incorporate the following features:

1.  Compelling Topic: The problem chosen should be one that motivates you to address it, but simple curiosity is not a good enough reason to pursue a research study because this does not indicate significance. The problem that you choose to explore must be important to you, but it must also be viewed as important by your readers and by the larger academic and/or social community that could be impacted by the results of your study.

2.  Supports Multiple Perspectives: The problem must be phrased in a way that avoids dichotomies and instead supports the generation and exploration of multiple perspectives. A general rule of thumb in the social sciences is that a good research problem is one that would generate a variety of viewpoints from a composite audience made up of reasonable people.

3.  Researchability: This isn't a real word but it represents an important aspect of creating a good research statement. It seems a bit obvious, but you don't want to find yourself in the midst of investigating a complex research project and realize that you don't have enough prior research to draw from for your analysis. There's nothing inherently wrong with original research, but you must choose research problems that can be supported, in some way, by the resources available to you. If you are not sure if something is researchable, don't assume that it isn't if you don't find information right away--seek help from a librarian!

NOTE:   Do not confuse a research problem with a research topic. A topic is something to read and obtain information about, whereas a problem is something to be solved or framed as a question raised for inquiry, consideration, or solution, or explained as a source of perplexity, distress, or vexation. In short, a research topic is something to be understood; a research problem is something that needs to be investigated.

IV.  Asking Analytical Questions about the Research Problem

Research problems in the social and behavioral sciences are often analyzed around critical questions that must be investigated. These questions can be explicitly listed in the introduction [i.e., "This study addresses three research questions about women's psychological recovery from domestic abuse in multi-generational home settings..."], or, the questions are implied in the text as specific areas of study related to the research problem. Explicitly listing your research questions at the end of your introduction can help in designing a clear roadmap of what you plan to address in your study, whereas, implicitly integrating them into the text of the introduction allows you to create a more compelling narrative around the key issues under investigation. Either approach is appropriate.

The number of questions you attempt to address should be based on the complexity of the problem you are investigating and what areas of inquiry you find most critical to study. Practical considerations, such as the length of the paper you are writing or the availability of resources to analyze the issue, can also factor into how many questions to ask. In general, however, there should be no more than four research questions underpinning a single research problem.

Given this, a well-developed analytical question does one or more of the following:

  • Highlights a genuine dilemma, area of ambiguity, or point of confusion about a topic open to interpretation by your readers;
  • Yields an answer that is unexpected and not obvious rather than inevitable and self-evident;
  • Provokes meaningful thought or discussion;
  • Raises the visibility of the key ideas or concepts that may be understudied or hidden;
  • Suggests the need for complex analysis or argument rather than a basic description or summary; and,
  • Offers a specific path of inquiry that avoids eliciting generalizations about the problem.

NOTE:   Questions of how and why concerning a research problem often require more analysis than questions about who, what, where, and when. You should still ask yourself these latter questions, however. Thinking introspectively about the who, what, where, and when of a research problem can help ensure that you have thoroughly considered all aspects of the problem under investigation and helps define the scope of the study in relation to the problem.

V.  Mistakes to Avoid

Beware of circular reasoning! Do not state the research problem as simply the absence of the thing you are suggesting. For example, if you propose the following, "The problem in this community is that there is no hospital," this only leads to a research problem where:

  • The need is for a hospital
  • The objective is to create a hospital
  • The method is to plan for building a hospital, and
  • The evaluation is to measure if there is a hospital or not.

This is an example of a research problem that fails the "So What?" test. In this example, the problem does not reveal the relevance of why you are investigating the fact there is no hospital in the community [e.g., perhaps there's a hospital in the community ten miles away]; it does not elucidate the significance of why one should study the fact there is no hospital in the community [e.g., that hospital in the community ten miles away has no emergency room]; the research problem does not offer an intellectual pathway towards adding new knowledge or clarifying prior knowledge [e.g., the county in which there is no hospital already conducted a study about the need for a hospital, but it was conducted ten years ago]; and, the problem does not offer meaningful outcomes that lead to recommendations that can be generalized for other situations or that could suggest areas for further research [e.g., the challenges of building a new hospital serve as a case study for other communities].

Alvesson, Mats and Jörgen Sandberg. “Generating Research Questions Through Problematization.” Academy of Management Review 36 (April 2011): 247-271; Choosing and Refining Topics. Writing@CSU. Colorado State University; D'Souza, Victor S. "Use of Induction and Deduction in Research in Social Sciences: An Illustration." Journal of the Indian Law Institute 24 (1982): 655-661; Ellis, Timothy J. and Yair Levy Nova. "Framework of Problem-Based Research: A Guide for Novice Researchers on the Development of a Research-Worthy Problem." Informing Science: the International Journal of an Emerging Transdiscipline 11 (2008); How to Write a Research Question. The Writing Center. George Mason University; Invention: Developing a Thesis Statement. The Reading/Writing Center. Hunter College; Problem Statements PowerPoint Presentation. The Writing Lab and The OWL. Purdue University; Procter, Margaret. Using Thesis Statements. University College Writing Centre. University of Toronto; Shoket, Mohd. "Research Problem: Identification and Formulation." International Journal of Research 1 (May 2014): 512-518; Trochim, William M.K. Problem Formulation. Research Methods Knowledge Base. 2006; Thesis and Purpose Statements. The Writer’s Handbook. Writing Center. University of Wisconsin, Madison; Thesis Statements. The Writing Center. University of North Carolina; Tips and Examples for Writing Thesis Statements. The Writing Lab and The OWL. Purdue University; Pardede, Parlindungan. “Identifying and Formulating the Research Problem.” Research in ELT: Module 4 (October 2018): 1-13; Walk, Kerry. Asking an Analytical Question. [Class handout or worksheet]. Princeton University; White, Patrick. Developing Research Questions: A Guide for Social Scientists. New York: Palgrave Macmillan, 2009; Li, Yanmei, and Sumei Zhang. "Identifying the Research Problem." In Applied Research Methods in Urban and Regional Planning. (Cham, Switzerland: Springer International Publishing, 2022), pp. 13-21.

  • Last Updated: Aug 21, 2024 8:54 AM
  • URL: https://libguides.usc.edu/writingguide

Identifying a Research Problem: A Step-by-Step Guide

Identifying a research problem is a crucial first step in the research process, serving as the foundation for all subsequent research activities. This guide provides a comprehensive overview of the steps involved in identifying a research problem, from understanding its essence to employing advanced strategies for refinement.

Key Takeaways

  • Understanding the definition and importance of a research problem is essential for academic success.
  • Exploring diverse sources such as literature reviews and consultations can help in formulating a solid research problem.
  • A clear problem statement, aligned research objectives, and well-defined questions are crucial for a focused study.
  • Evaluating the feasibility and potential impact of a research problem ensures its relevance and scope.
  • Advanced strategies, including interdisciplinary approaches and technology utilization, can enhance the identification and refinement of research problems.

Understanding the Essence of Identifying a Research Problem

Defining the Research Problem

A research problem is the focal point of any academic inquiry. It is a concise and well-defined statement that outlines the specific issue or question that the research aims to address. This research problem usually sets the tone for the entire study and provides you, the researcher, with a clear purpose and a clear direction on how to go about conducting your research.

Importance in Academic Research

A well-defined research problem also demonstrates the significance of your research and its potential to contribute new knowledge to the existing body of literature. A compelling research problem not only captivates the attention of your peers but also lays the foundation for impactful and meaningful research outcomes.

Initial Steps to Identification

To identify a research problem, you need a systematic approach and a deep understanding of the subject area. Below are some steps to guide you in this process:

  • Conduct a thorough literature review to understand what has been studied before.
  • Identify gaps in the existing research that could form the basis of your study.
  • Consult with academic mentors to refine your ideas and approach.

Exploring Sources for Research Problem Identification

Literature Review

When you embark on the journey of identifying a research problem, a thorough literature review is indispensable. This process involves scrutinizing existing research to find literature gaps and unexplored areas that could form the basis of your research. It's crucial to analyze recent studies, seminal works, and review articles to ensure a comprehensive understanding of the topic.

Existing Theories and Frameworks

The exploration of existing theories and frameworks provides a solid foundation for developing a research problem. By understanding the established models and theories, you can identify inconsistencies or areas lacking in depth which might offer fruitful avenues for research.

Consultation with Academic Mentors

Engaging with academic mentors is vital in shaping a well-defined research problem. Their expertise can guide you through the complexities of your field, offering insights into feasible research questions and helping you refine your focus. This interaction often leads to the identification of unique and significant research opportunities that align with current academic and industry trends.

Formulating the Research Problem

Crafting a Clear Problem Statement

To effectively address your research problem, start by crafting a clear problem statement. This involves succinctly describing who is affected by the problem, why it is important, and how your research will contribute to solving it. Ensure your problem statement is concise and specific to guide the entire research process.

Setting Research Objectives

Setting clear research objectives is crucial for maintaining focus throughout your study. These objectives should directly align with the problem statement and guide your research activities. Consider using a bulleted list to outline your main objectives:

  • Understand the underlying factors contributing to the problem
  • Explore potential solutions
  • Evaluate the effectiveness of proposed solutions

Determining Research Questions

The formulation of precise research questions is a pivotal step in defining the scope and direction of your study. These questions should be directly derived from your research objectives and designed to be answerable through your chosen research methods. Crafting well-defined research questions will help you maintain a clear focus and avoid common pitfalls in the research process.

Evaluating the Scope and Relevance of the Research Problem

Feasibility Assessment

Before you finalize a research problem, it is crucial to assess its feasibility. Consider the availability of resources, time, and expertise required to conduct the research. Evaluate potential constraints and determine if the research problem can be realistically tackled within the given limitations.

Significance to the Field

Ensure that your research problem has a clear and direct impact on the field. It should aim to contribute to existing knowledge and address a real-world issue that is relevant to your academic discipline.

Potential Impact on Existing Knowledge

The potential impact of your research problem on existing knowledge cannot be overstated. It should challenge, extend, or refine current understanding in a meaningful way. Consider how your research can add value to the existing body of work and potentially lead to significant advancements in your field.

Techniques for Refining the Research Problem

Narrowing Down the Focus

To effectively refine your research problem, start by narrowing down the focus. This involves pinpointing the specific aspects of your topic that are most significant and ensuring that your research problem is not too broad. This targeted approach helps in identifying knowledge gaps and formulating more precise research questions.

Incorporating Feedback

Feedback is crucial in the refinement process. Engage with academic mentors, peers, and experts in your field to gather insights and suggestions. This collaborative feedback can lead to significant improvements in your research problem, making it more robust and relevant.

Iterative Refinement Process

Refinement should be seen as an iterative process, where you continuously refine and revise your research problem based on new information and feedback. This approach ensures that your research problem remains aligned with current trends and academic standards, ultimately enhancing its feasibility and relevance.

Challenges in Identifying a Research Problem

Common Pitfalls and How to Avoid Them

Identifying a research problem can be fraught with common pitfalls such as selecting a topic that is too broad or too narrow. To avoid these, you should conduct a thorough literature review and seek feedback from peers and mentors. This proactive approach ensures that your research question is both relevant and manageable.

Dealing with Ambiguity

Ambiguity in defining the research problem can lead to significant challenges down the line. Ensure clarity by operationalizing variables and explicitly stating the research objectives. This clarity will guide your entire research process, making it more structured and focused.

Balancing Novelty and Practicality

While it's important to address a novel issue in your research, practicality should not be overlooked. A research problem should not only contribute new knowledge but also be feasible and have clear implications. Balancing these aspects often requires iterative refinement and consultation with academic mentors to align your research with real-world applications.

Advanced Strategies for Identifying a Research Problem

Interdisciplinary Approaches

Embrace the power of interdisciplinary approaches to uncover unique and comprehensive research problems. By integrating knowledge from various disciplines, you can address complex issues that single-field studies might overlook. This method not only broadens the scope of your research but also enhances its applicability and depth.

Utilizing Technology and Data Analytics

Leverage technology and data analytics to identify and refine research problems with precision. Advanced tools like machine learning and big data analysis can reveal patterns and insights that traditional methods might miss. This approach is particularly useful in fields where large datasets are involved, or where real-time data integration can lead to more dynamic research outcomes.

Engaging with Industry and Community Needs

Focus on the needs of industry and community to ensure your research is not only academically sound but also practically relevant. Engaging with real-world problems can provide a rich source of research questions that are directly applicable and beneficial to society. This strategy not only enhances the relevance of your research but also increases its potential for impact.

In conclusion, identifying a research problem is a foundational step in the academic research process that requires careful consideration and a systematic approach. This guide has outlined the essential steps involved, from understanding the context and reviewing existing literature to formulating clear research questions. By adhering to these guidelines, researchers can ensure that their studies are grounded in a well-defined problem, enhancing the relevance and impact of their findings. It is crucial for scholars to approach this task with rigor and critical thinking to contribute meaningfully to the body of knowledge in their respective fields.

Frequently Asked Questions

What is a research problem?

A research problem is a specific issue, inconsistency, or gap in knowledge that needs to be addressed through scientific inquiry. It forms the foundation of a research study, guiding the research questions, methodology, and analysis.

Why is identifying a research problem important?

Identifying a research problem is crucial as it determines the direction and scope of the study. It helps researchers focus their inquiry, formulate hypotheses, and contribute to the existing body of knowledge.

How do I identify a suitable research problem?

To identify a suitable research problem, start with a thorough literature review to understand existing research and identify gaps. Consult with academic mentors, and consider relevance, feasibility, and your own interests.

What are some common pitfalls in identifying a research problem?

Common pitfalls include choosing a problem that is too broad or too narrow, not aligning with existing literature, lack of originality, and failing to consider the practical implications and feasibility of the study.

Can technology help in identifying a research problem?

Yes, technology and data analytics can aid in identifying research problems by providing access to a vast amount of data, revealing patterns and trends that might not be visible otherwise. Tools like digital libraries and research databases are particularly useful.

How can I refine my research problem?

Refine your research problem by narrowing its focus, seeking feedback from peers and mentors, and continually reviewing and adjusting the problem statement based on new information and insights gained during preliminary research.

The Research Problem & Statement

What they are & how to write them (with examples)

By: Derek Jansen (MBA) | Expert Reviewed By: Eunice Rautenbach (DTech) | March 2023

If you’re new to academic research, you’re bound to encounter the concept of a “research problem” or “problem statement” fairly early in your learning journey. Having a good research problem is essential, as it provides a foundation for developing high-quality research, from relatively small research papers to full-length PhD dissertations and theses.

In this post, we’ll unpack what a research problem is and how it’s related to a problem statement. We’ll also share some examples and provide a step-by-step process you can follow to identify and evaluate study-worthy research problems for your own project.

Overview: Research Problem 101

  • What is a research problem?
  • What is a problem statement?
  • Where do research problems come from?
  • How to find a suitable research problem
  • Key takeaways

What is a research problem?

A research problem is, at the simplest level, the core issue that a study will try to solve or (at least) examine. In other words, it’s an explicit declaration about the problem that your dissertation, thesis or research paper will address. More technically, it identifies the research gap that the study will attempt to fill (more on that later).

Let’s look at an example to make the research problem a little more tangible.

To justify a hypothetical study, you might argue that there’s currently a lack of research regarding the challenges experienced by first-generation college students when writing their dissertations [PROBLEM]. As a result, these students struggle to successfully complete their dissertations, leading to higher-than-average dropout rates [CONSEQUENCE]. Therefore, your study will aim to address this lack of research – i.e., this research problem [SOLUTION].

A research problem can be theoretical in nature, focusing on an area of academic research that is lacking in some way. Alternatively, a research problem can be more applied in nature, focused on finding a practical solution to an established problem within an industry or an organisation. In other words, theoretical research problems are motivated by the desire to grow the overall body of knowledge, while applied research problems are motivated by the need to find practical solutions to current real-world problems (such as the one in the example above).

As you can probably see, the research problem acts as the driving force behind any study, as it directly shapes the research aims, objectives and research questions, as well as the research approach. Therefore, it’s really important to develop a very clearly articulated research problem before you even start your research proposal. A vague research problem will lead to unfocused, potentially conflicting research aims, objectives and research questions.

What is a research problem statement?

As the name suggests, a problem statement (within a research context, at least) is an explicit statement that clearly and concisely articulates the specific research problem your study will address. While your research problem can span over multiple paragraphs, your problem statement should be brief, ideally no longer than one paragraph. Importantly, it must clearly state what the problem is (whether theoretical or practical in nature) and how the study will address it.

Here’s an example of a statement of the problem in a research context:

Rural communities across Ghana lack access to clean water, leading to high rates of waterborne illnesses and infant mortality. Despite this, there is little research investigating the effectiveness of community-led water supply projects within the Ghanaian context. Therefore, this study aims to investigate the effectiveness of such projects in improving access to clean water and reducing rates of waterborne illnesses in these communities.

As you can see, this problem statement clearly and concisely identifies the issue that needs to be addressed (i.e., a lack of research regarding the effectiveness of community-led water supply projects) and the research question that the study aims to answer (i.e., are community-led water supply projects effective in reducing waterborne illnesses?), all within one short paragraph.

Where do research problems come from?

Wherever there is a lack of well-established and agreed-upon academic literature, there is an opportunity for research problems to arise, since there is a paucity of (credible) knowledge. In other words, research problems are derived from research gaps. These gaps can arise from various sources, including the emergence of new frontiers or new contexts, as well as disagreements within the existing research.

Let’s look at each of these scenarios:

New frontiers – new technologies, discoveries or breakthroughs can open up entirely new frontiers where there is very little existing research, thereby creating fresh research gaps. For example, as generative AI technology became accessible to the general public in 2023, the full implications and knock-on effects of this were (or perhaps, still are) largely unknown and therefore present multiple avenues for researchers to explore.

New contexts – very often, existing research tends to be concentrated on specific contexts and geographies. Therefore, even within well-studied fields, there is often a lack of research within niche contexts. For example, just because a study finds certain results within a western context doesn’t mean that it would necessarily find the same within an eastern context. If there’s reason to believe that results may vary across these geographies, a potential research gap emerges.

Disagreements – within many areas of existing research, there are (quite naturally) conflicting views between researchers, where each side presents strong points that pull in opposing directions. In such cases, it’s still somewhat uncertain as to which viewpoint (if any) is more accurate. As a result, there is room for further research in an attempt to “settle” the debate.

Of course, many other potential scenarios can give rise to research gaps, and consequently, research problems, but these common ones are a useful starting point.

How to find a research problem

Given that research problems flow from research gaps, finding a strong research problem for your research project means that you’ll need to first identify a clear research gap. Below, we’ll present a four-step process to help you find and evaluate potential research problems.

If you’ve read our other articles about finding a research topic, you’ll find the process below very familiar as the research problem is the foundation of any study. In other words, finding a research problem is much the same as finding a research topic.

Step 1 – Identify your area of interest

Naturally, the starting point is to first identify a general area of interest. Chances are you already have something in mind, but if not, have a look at past dissertations and theses within your institution to get some inspiration. These present a goldmine of information as they’ll not only give you ideas for your own research, but they’ll also help you see exactly what the norms and expectations are for these types of projects.

At this stage, you don’t need to get super specific. The objective is simply to identify a couple of potential research areas that interest you. For example, if you’re undertaking research as part of a business degree, you may be interested in social media marketing strategies for small businesses, leadership strategies for multinational companies, etc.

Depending on the type of project you’re undertaking, there may also be restrictions or requirements regarding what topic areas you’re allowed to investigate, what type of methodology you can utilise, etc. So, be sure to first familiarise yourself with your institution’s specific requirements and keep these front of mind as you explore potential research ideas.

Step 2 – Review the literature and develop a shortlist

Once you’ve decided on an area that interests you, it’s time to sink your teeth into the literature. In other words, you’ll need to familiarise yourself with the existing research regarding your interest area. Google Scholar is a good starting point for this, as you can simply enter a few keywords and quickly get a feel for what’s out there. Keep an eye out for recent literature reviews and systematic review-type journal articles, as these will provide a good overview of the current state of research.

At this stage, you don’t need to read every journal article from start to finish. A good strategy is to pay attention to the abstract, intro and conclusion, as together these provide a snapshot of the key takeaways. As you work your way through the literature, keep an eye out for what’s missing – in other words, what questions does the current research not answer adequately (or at all)? Importantly, pay attention to the section titled “further research is needed”, typically found towards the very end of each journal article. This section will specifically outline potential research gaps that you can explore, based on the current state of knowledge (provided the article you’re looking at is recent).

Take the time to engage with the literature and develop a big-picture understanding of the current state of knowledge. Reviewing the literature takes time and is an iterative process, but it’s an essential part of the research process, so don’t cut corners at this stage.

As you work through the review process, take note of any potential research gaps that are of interest to you. From there, develop a shortlist of potential research gaps (and resultant research problems) – ideally 3 – 5 options that interest you.

Figure: The relationship between the research problem and research gap

Step 3 – Evaluate your potential options

Once you’ve developed your shortlist, you’ll need to evaluate your options to identify a winner. There are many potential evaluation criteria that you can use, but we’ll outline three common ones here: value, practicality and personal appeal.

Value – a good research problem needs to create value when successfully addressed. Ask yourself:

  • Who will this study benefit (e.g., practitioners, researchers, academia)?
  • How will it benefit them specifically?
  • How much will it benefit them?

Practicality – a good research problem needs to be manageable in light of your resources. Ask yourself:

  • What data will I need access to?
  • What knowledge and skills will I need to undertake the analysis?
  • What equipment or software will I need to process and/or analyse the data?
  • How much time will I need?
  • What costs might I incur?

Personal appeal – a research project is a commitment, so the research problem that you choose needs to be genuinely attractive and interesting to you. Ask yourself:

  • How appealing is the prospect of solving this research problem (on a scale of 1 – 10)?
  • Why, specifically, is it attractive (or unattractive) to me?
  • Does the research align with my longer-term goals (e.g., career goals, educational path, etc)?

Depending on how many potential options you have, you may want to consider creating a spreadsheet where you numerically rate each of the options in terms of these criteria. Remember to also include any criteria specified by your institution. From there, tally up the numbers and pick a winner.
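The rating-and-tallying approach described above can also be sketched in a few lines of Python instead of a spreadsheet. This is only an illustration under stated assumptions: the three criteria come from this step, but the equal weights, the 1 – 10 ratings and the third option are made up for the example, and the function names `total_score` and `rank_options` are hypothetical.

```python
# Hypothetical weights for the three criteria from Step 3 (equal by default).
# Add institution-specific criteria here if your programme specifies any.
WEIGHTS = {"value": 1.0, "practicality": 1.0, "personal_appeal": 1.0}


def total_score(ratings, weights=WEIGHTS):
    """Weighted sum of one option's ratings across all criteria."""
    return sum(weights[criterion] * ratings[criterion] for criterion in weights)


def rank_options(options, weights=WEIGHTS):
    """Return option names sorted from highest to lowest total score."""
    return sorted(
        options,
        key=lambda name: total_score(options[name], weights),
        reverse=True,
    )


# Made-up ratings on a 1-10 scale, purely for illustration.
options = {
    "Social media marketing strategies for small businesses": {
        "value": 7, "practicality": 8, "personal_appeal": 9,
    },
    "Leadership strategies for multinational companies": {
        "value": 8, "practicality": 5, "personal_appeal": 6,
    },
    "Remote onboarding practices (hypothetical third option)": {
        "value": 6, "practicality": 7, "personal_appeal": 5,
    },
}

ranking = rank_options(options)
for name in ranking:
    print(f"{total_score(options[name]):5.1f}  {name}")
```

If one criterion matters more to you (practicality on a tight deadline, for instance), raising its weight changes the tally without re-rating every option.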

Step 4 – Craft your problem statement

Once you’ve selected your research problem, the final step is to craft a problem statement. Remember, your problem statement needs to be a concise outline of what the core issue is and how your study will address it. Aim to fit this within one paragraph – don’t waffle on. Have a look at the problem statement example we mentioned earlier if you need some inspiration.

Key Takeaways

We’ve covered a lot of ground. Let’s do a quick recap of the key takeaways:

  • A research problem is an explanation of the issue that your study will try to solve. This explanation needs to highlight the problem, the consequence and the solution or response.
  • A problem statement is a clear and concise summary of the research problem, typically contained within one paragraph.
  • Research problems emerge from research gaps, which themselves can emerge from multiple potential sources, including new frontiers, new contexts or disagreements within the existing literature.
  • To find a research problem, you need to first identify your area of interest, then review the literature and develop a shortlist, after which you’ll evaluate your options, select a winner and craft a problem statement.

What is a Problem Statement in Research? How to Write It with Examples

The question, “What is a research problem statement?” is usually followed by “Why should I care about problem statements, and how can they affect my research?” In this article, we will try to simplify the concept so that you not only grasp its meaning but internalize its importance and learn how to craft a problem statement.

To put it simply, a “problem statement”, as the name implies, is any statement that describes a problem in research. When you conduct a study, your aim as a researcher is to answer a query or resolve a problem. This learned information is then typically disseminated by writing a research paper that details the entire process for readers (both experts and the general public). To better grasp this concept, we’ll try to explain what a research problem statement is from the viewpoint of a reader. For the purpose of clarity and brevity, the topic is divided into subsections.

What is a research problem?

A research problem is a clearly defined issue in a particular field of study that requires additional investigation and study to resolve. Once identified, the problem can be succinctly stated to highlight existing knowledge gaps, the importance of solving the research problem, and the difference between a current situation and an improved state.

But why is it important to have a research problem ready? Keep in mind that a good research problem helps you define the main concepts and terms of research that not only guide your study but help you add to or update existing literature. A research problem statement should ideally be clear, precise, and tangible enough to assist you in developing a framework for establishing the objectives, techniques, and analysis of the research project. Hence, any research project, if it is to be completed successfully, must start with a well-defined research problem.

What is a research problem statement?

A research problem statement is one of the most crucial components of any study, and researchers have several reasons to get it right, including securing funding and boosting readership. We’ve already established that a “research problem” is the specific issue that the research is addressing. Now, let’s discuss the significance of the problem statement in research and how to formulate one, using a few examples.

Do you recall the thoughts that went through your head the last time you read a research article? Have you ever tried to quickly scan the introduction or background of the research article to get a sense of the context and the exact issue the authors were attempting to address through the study? Were you stuck attempting to pinpoint the key sentence(s) that encapsulates the background and context of the study, the motivation behind its initial conduct, and its goals? A research problem statement is the descriptive statement that conveys the issue a researcher is trying to address through the study, with the aim of informing the reader of the context and significance of the study at hand. The research problem statement is crucial for researchers to focus on a particular component of a vast field of study, and for readers to comprehend the significance of the research. A well-defined problem allows you to create a framework to develop research objectives or hypotheses.

Now that we are aware of the significance of a problem statement in research, we can concentrate on creating one that is compelling. Writing a problem statement is a fairly simple process: first, you select a broad topic or research area based on your expertise and the resources at your disposal. Then, you narrow it down to a specific research question or problem relevant to that area of research while keeping the gaps in existing knowledge in mind. To give you step-by-step instructions on how to write a problem statement for a research proposal, we’ve broken the process down into sections discussing individual aspects.

When to write a problem statement?

The placement of the research problem in the research project is another crucial component when developing a problem statement. Since the research problem statement is fundamental to writing any research project, it is best to write it at the start of the research process, before experimental setup, data collection, and analysis. Without identifying a specific research problem, you don’t know what exactly you are trying to address through the research so it would not be possible for you to set up the right conditions and foundation for the research project.

It is important to describe the research problem statement at the beginning of the research process to guide the research design and methodology. Another benefit of having a clear and defined research problem early on is that it helps researchers stay on track and focus on the problem at hand without deviating into other trajectories. Writing down the research problem statement also ensures that the current study is relevant, fitting, and fills a knowledge gap. However, note that a research problem statement can be refined or modified as the research advances and new information becomes available. This could be anything from further deconstructing a specific query to posing a fresh query related to the selected topic area. In fact, it is common practice to revise the problem statement in research to maintain specificity and clarity and to allow room to reflect advancement in the research field.

Bonus point:

A well-defined research problem statement that is referenced in the proper position in the research proposal/article is crucial to effectively communicate the goal and significance of the study to all stakeholders concerned with the research. It piques the reader’s interest in the research area, which can advance the work in several ways and open up future partnerships and even employment opportunities for authors.

What does a research problem statement include?

If you have to create a problem statement from scratch, follow the steps/important aspects listed below to create a well-defined research problem statement.

  • Describe the wide-ranging research topics

To put things in perspective, it is important to first describe the background of the research issue, which derives from a broad area of study or interest that the research project is concerned with.

  • Talk about the research problem/issue

As mentioned earlier, it’s important to state the problem or issues that the research project seeks to address in a clear, succinct manner, preferably in a sentence or two to set the premise of the entire study.

  • Emphasize the importance of the issue

After defining the problem your research will try to solve, explain why it’s significant in the larger context and how your study aims to close the knowledge gap between the current state of knowledge and the ideal scenario.

  • Outline research questions to address the issue

Give a brief description of the list of research questions your study will use to solve the problem at hand and explain how these will address various components of the problem statement.

  • Specify the key goals of the research project

Next, carefully define a set of specific and measurable research objectives that the research project aims to address.

  • Describe the experimental setup

Be sure to include a description of the experimental design, including the intended sample (population/size), setting, or context in the problem statement.

  • Discuss the theoretical framework

Mention the numerous theoretical ideas and precepts necessary to comprehend the study issue and guide the research activity in this section.

  • Include the research methodology

To provide a clear and concise research framework, add a brief description of the research methodologies, including collection and analysis of data, which will be needed to address the research questions and objectives.

Characteristics of a research problem statement

It is essential for a research problem statement to be clear and concise so that it can guide the development of the research project. A good research problem statement also helps other stakeholders comprehend the scope and relevance of the research, which could further lead to opportunities for collaboration or exploration. Here is a list of the key characteristics of a research problem that you should keep in mind when writing an effective research problem statement.

  • The “need” to resolve the issue must be present.

It is not enough to choose a problem in your area of interest and expertise; the research problem should have larger implications for a population or a specific subset. Unless the significance of the research problem is elaborated in detail, the research is not deemed significant. Hence, mentioning the “need” to conduct the research in the context of the subject area and how it will create a difference is of utmost importance.

  • The research problem needs to be presented rationally and clearly

The research problem statement must be written at the start and be simple enough for even researchers outside the subject area to understand. The two fundamental elements of a successful research problem statement are clarity and specificity. So, check and rewrite your research problem statement if your peers have trouble understanding it. Aim to write in a straightforward manner while addressing all relevant issues and presenting coherent arguments.

  • The research issue is supported by facts and evidence

Before you begin writing the problem statement, you must collect all relevant information available to gain a better understanding of the research topic and existing gaps. A thorough literature search will give you an idea about the current situation and the specific questions you need to ask to close any knowledge gaps. This will also prevent you from asking the questions or identifying issues that have already been addressed. Also, the problem statement should be based on facts and data and should not depend upon hypothetical events.

  • The research problem should generate more research questions

Ideally, the research problem should be such that it helps advance research and encourage more questions. The new questions could be specific to the research that highlights different components or aspects of the problem. These questions must also aid in addressing the problem in a more comprehensive manner which provides a solid foundation for the research study.

  • The research problem should be tangible

The research issue should be concrete, which means that the study project’s budget and time constraints should be met. The research problem should not call for any actions and experiments that are impractical or outside of your area of competence.

To summarize the main characteristics of a research problem statement, it must:

  • Address the knowledge gap
  • Be current and relevant
  • Aid in advancing the field
  • Support future research
  • Be tangible and suit the researcher’s time and interest
  • Be based on facts and data

How to write a problem statement in a research proposal

The format of a problem statement might vary based on the nature and subject of the research; there is no set format. It is typically written in clear, concise sentences and can range from a few sentences to a few pages. Three considerations must be made when formulating a problem statement for a research proposal:

  • Context: The research problem statement needs to be created in the right setting with sufficient background information on the research topic. Context makes it easier to distinguish between the current state and the ideal one in which the issue would not exist. In this section, you can also include instances of any prior attempts and significant roadblocks to solving the problem.
  • Relevance: The main goal of the researcher here is to highlight the relevance of the research study. Explain how the research problem affects society or the field of research and, if the study is conducted to mitigate the issue, what an ideal scenario would look like. Who your study will most affect if the issue is resolved and how it can impact future research are other arguments that might be made in this section.
  • Strategy: Be sure to mention the goals and objectives of your research, and your approach to solve the problem. The purpose of this section is to lay out the research approach for tackling various parts of the research subject.

Example of a problem statement in a research proposal

To put what we learned into practice, let’s look at an example of a problem statement in a research report. Suppose you decide to conduct a study on the topic of attention span of different generations. After a thorough literature search, you conclude that the attention span of university students is decreasing from one generation to the next, even though there are many websites and apps to simplify tasks and make learning easy. This decrease in attention span is attributed to constant exposure to digital content and multiple screens.

In this scenario, the problem statement could be written as – “The problem this study addresses is the lack of regulative measures to control consumption of digital content by young university students, which negatively impacts their attention span”. The research’s goals and objectives, which may employ strategies to increase university students’ attention span by limiting their internet exposure, can then be described in more detail in subsequent paragraphs.

Frequently asked questions

What is a problem statement?

A problem statement is a succinct and unambiguous overview of the research issue that the study is trying to solve.

What is the difference between problem statement and thesis statement?

A problem statement differs from a thesis statement in that the former identifies the issue for which research is being done, whilst the latter highlights the main points of a research paper while emphasizing the hypothesis.

Why is a problem statement needed in a research proposal?

A problem statement identifies the specific problem that the researchers are trying to solve through their research. It is necessary to establish a framework for the project, focus the researcher’s attention, and inform stakeholders of the study’s importance.

Research Problem – Definition, Steps & Tips

Published by Jamie Walker on August 12th, 2021, Revised on October 3, 2023

Once you have chosen a research topic, the next stage is to explain the research problem: the detailed issue, ambiguity of the research, gap analysis, or gaps in knowledge and findings that you will discuss.

Here, in this article, we explore a research problem in a dissertation or an essay with some research problem examples to help you better understand how and when you should write a research problem.

“A research problem is a specific statement relating to an area of concern and is contingent on the type of research. Some research studies focus on theoretical and practical problems, while some focus on only one.”

The problem statement in a dissertation, essay, research paper, or other academic paper should be clearly stated and should aim to expand knowledge or contribute to change.

If you are unsure how to define your research problem, this article will help you identify and elaborate one. Identifying and formulating a research problem is the most notable challenge in the research process. You will need to formulate a problem statement and research questions when finalizing the research proposal or the introduction of your dissertation or thesis.

Why is a Research Problem Critical?

An interesting research topic is only the first step. The real challenge of the research process is to develop a well-rounded research problem.

A well-formulated research problem clarifies the research procedure; without it, your research is likely to be unfocused and unmanageable.

Research follows a sequence of steps, and a research problem helps you follow and complete them in order. It also helps you avoid simply repeating the existing literature.

The research problem in a dissertation or an essay therefore needs to be well thought out and presented with a clear purpose, so that your work adds value to existing knowledge. You need to understand the problem well in order to present logical solutions.

Formulating a research problem is the first step of conducting research, whether you are writing an essay, research paper, dissertation, or research proposal.


What is a Research Problem

Step 1: Identifying the Problem Area – What is a Research Problem?

The most significant step in any research is to look for unexplored areas, topics, and controversies. Your aim is to find gaps that your work will fill. Here are some research problem examples to help you understand the concept.

Practical Research Problems

To conduct practical research, you will need practical research problems. These are typically identified by analysing reports and previous research studies, and by talking to experienced personnel in the relevant disciplines. You might search for:

  • Problems with performance or competence in an organization
  • Institutional practices that could be enhanced
  • Practitioners of relevant fields and their areas of concern
  • Problems confronted by specific groups of people within your area of study

If your research relates to an internship or a job, it is important to identify a research problem that addresses issues faced by the firm concerned.

Examples of Practical Research Problems

Decreased voter participation in county A, as compared to the rest of the country.

The high employee turnover rate in department X of company Y has affected efficiency and team performance.

A charity institution, Y, suffers from a lack of funding, resulting in budget cuts for its programmes.

Theoretical Research Problems

Theoretical research relates to predicting, explaining, and understanding various phenomena. It also expands and challenges existing information and knowledge.

Identification of a research problem in theoretical research is achieved by analysing theories and recent research literature relating to a broad area of research. This practice helps you find gaps in the research done by others and support the argument of your topic.

You might look for:

  • A case or framework that has not been deeply analysed
  • A conflict between two or more viewpoints
  • An unstudied condition or relationship
  • A problematic issue that needs to be addressed

Theoretical problems often have practical implications, but their results often do not resolve immediate issues. If you need to resolve an immediate issue, you might want to adopt a different research approach to achieve the desired outcomes.

Examples of Theoretical Research Problems

The effects of long-term Vitamin D deficiency on cardiac patients are not well researched.

The relationship between race, sex, and income inequality needs to be studied with reference to the economy of a specific country or region.

Historians of Scottish nationalism disagree about the contribution of Imperial Britain to the creation of Scotland’s national identity.


Step 2: Understanding the Research Problem

The researcher then investigates the selected area further to gather knowledge and information relating to the research problem, so that the research can address it.

Background and Rationale

  • Which population is affected by the problem?
  • Is it a persistent problem, or one that has only recently come to light?
  • What research has already been conducted on this problem?
  • Have any solutions to the problem been proposed?
  • What are the recent arguments concerning the problem, and what gaps do they leave?


Particularity and Suitability

  • What specific place, time, and/or people will be focused on?
  • Are there any aspects of the research that you may not be able to deal with?
  • What will the consequences be if the problem remains unresolved?
  • Who will benefit from resolving the problem (e.g. future researchers or an organisation’s management)?

Example of a Specific Research Problem

A non-profit institution X has been studied with regard to the retention of its existing support base, but the existing research does not address how to target new donors effectively. To continue its work, the institution needs more research to find strategies for effective fundraising.

Once the problem is narrowed down, the next stage is to propose a problem statement and hypothesis or research questions.


Frequently Asked Questions

What is a research problem, with an example?

A research problem is a specific challenge that requires investigation. Example: “What is the impact of social media on mental health among adolescents?” This problem drives research to analyse the relationship between social media use and mental well-being in young people.

How many types of research problems do we have?

  • Descriptive: Describing phenomena as they exist.
  • Explanatory: Understanding causes and effects.
  • Exploratory: Investigating little-understood phenomena.
  • Predictive: Forecasting future outcomes.
  • Prescriptive: Recommending actions.
  • Normative: Describing what ought to be.

What are the principles of the research problem?

  • Relevance: Addresses a significant issue.
  • Researchability: Amenable to empirical investigation.
  • Clarity: Clearly defined without ambiguity.
  • Specificity: Narrowly framed, avoiding vagueness.
  • Feasibility: Realistic to conduct with available resources.
  • Novelty: Offers new insights or challenges existing knowledge.
  • Ethical considerations: Respect rights, dignity, and safety.

Why is a research problem important?

A research problem is crucial because it identifies knowledge gaps, directs the inquiry’s focus, and forms the foundation for generating hypotheses or questions. It drives the methodology and determination of study relevance, ensuring that research contributes meaningfully to academic discourse and potentially addresses real-world challenges.

How do you write a research problem?

To write a research problem, identify a knowledge gap or an unresolved issue in your field. Start with a broad topic, then narrow it down. Clearly articulate the problem in a concise statement, ensuring it’s researchable, significant, and relevant. Ground it in the existing literature to highlight its importance and context.

How can we solve a research problem?

To solve a research problem, start by conducting a thorough literature review. Formulate hypotheses or research questions. Choose an appropriate research methodology. Collect and analyse data systematically. Interpret findings in the context of existing knowledge. Ensure validity and reliability, and discuss implications, limitations, and potential future research directions.

How To Formulate A Research Problem

Emmanuel

Introduction

In the dynamic realm of academia, research problems serve as crucial stepping stones for groundbreaking discoveries and advancements. They lay the groundwork for the inquiry and exploration involved in conducting research, and they direct the path toward knowledge expansion.

In this blog post, we will discuss the different ways you can identify and formulate a research problem. We will also highlight how you can write a research problem, its significance in guiding your research journey, and how it contributes to knowledge advancement.

Understanding the Essence of a Research Problem

A research problem is defined as the focal point of any academic inquiry. It is a concise and well-defined statement that outlines the specific issue or question that the research aims to address. This research problem usually sets the tone for the entire study and provides you, the researcher, with a clear purpose and a clear direction on how to go about conducting your research.

A research problem serves two purposes. First, it helps you define the scope of your study and decide what to focus on, so that the study stays relevant and manageable.

Second, it gives you a step-by-step guide for exploring and carrying out your research. It directs your efforts and determines the type of data you need to collect and analyze. Furthermore, a well-developed research problem contributes to the credibility and validity of your study.

It also demonstrates the significance of your research and its potential to contribute new knowledge to the existing body of literature. A compelling research problem not only captures the attention of your peers but also lays the foundation for impactful and meaningful research outcomes.

Identifying a Research Problem

To identify a research problem, you need a systematic approach and a deep understanding of the subject area. Below are some steps to guide you in this process:

  • Conduct a Literature Review: Before you dive into your research problem, ensure you get familiar with the existing literature in your field. Analyze gaps, controversies, and unanswered questions. This will help you identify areas where your research can make a meaningful contribution.
  • Consult with Peers and Mentors: Participate in discussions with your peers and mentors to gain insights and feedback on potential research problems. Their perspectives can help you refine and validate your ideas.
  • Define Your Research Objectives: Clearly outline the objectives of your study. What do you want to achieve through your research? What specific outcomes are you aiming for?

Formulating a Research Problem

Once you have identified the general area of interest and specific research objectives, you can then formulate your research problem. Things to consider when formulating a research problem:

  • Clarity and Specificity: Your research problem should be concise, specific, and devoid of ambiguity. Avoid vague statements that could lead to confusion or misinterpretation.
  • Originality: Strive to formulate a research problem that addresses a unique and unexplored aspect of your field. Originality is key to making a meaningful contribution to the existing knowledge.
  • Feasibility: Ensure that your research problem is feasible within the constraints of time, resources, and available data. Unrealistic research problems can hinder the progress of your study.
  • Refining the Research Problem: It is common for the research problem to evolve as you delve deeper into your study. Don’t be afraid to refine and revise your research problem if necessary. Seek feedback from colleagues, mentors, and experts in your field to ensure the strength and relevance of your research problem.

How Do You Write a Research Problem?

Steps to consider in writing a Research Problem:

  • Select a Topic: The first step in writing a research problem is to select a specific topic of interest within your field of study. This topic should be relevant and meaningful, and it should have the potential to contribute to existing knowledge.
  • Conduct a Literature Review: Before formulating your research problem, conduct a thorough literature review to understand the current state of research on your chosen topic. This will help you identify gaps, controversies, or areas that need further exploration.
  • Identify the Research Gap: Based on your literature review, pinpoint the specific gap or problem that your research aims to address. This gap should be something that has not been adequately studied or resolved in previous research.
  • Be Specific and Clear: The research problem should be framed in a clear and concise manner. It should be specific enough to guide your research but broad enough to allow for meaningful investigation.
  • Ensure Feasibility: Consider the resources and constraints available to you when formulating the research problem. Ensure that it is feasible to address the problem within the scope of your study.
  • Align your Research Goals: The research problem should align with the overall goals and objectives of your study. It should be directly related to the research questions you intend to answer.

Research Problem vs Research Questions

Research Problem: The research problem is a broad statement that outlines the overarching issue or gap in knowledge that your research aims to address. It provides the context and motivation for your study and helps establish its significance and relevance. The research problem is typically stated in the introduction section of your research proposal or thesis.

Research Questions: Research questions are specific inquiries that you seek to answer through your research. These questions are derived from the research problem and help guide the focus of your study. They are often more detailed and narrow in scope compared to the research problem. Research questions are usually listed in the methodology section of your research proposal or thesis.

Difference Between a Research Problem and a Research Topic

Research Problem: A research problem is a specific issue, gap, or question that requires investigation and can be addressed through research. It is a clearly defined and focused problem that the researcher aims to solve or explore. The research problem provides the context and rationale for the study and guides the research process. It is usually stated as a question or a statement in the introduction section of a research proposal or thesis.

Example of a Research Problem: “What are the factors influencing consumer purchasing decisions in the online retail industry?”

Research Topic: A research topic, on the other hand, is a broader subject or area of interest within a particular field of study. It is a general idea or subject that the researcher wants to explore in their research. The research topic is more general and does not yet specify a specific problem or question to be addressed. It serves as the starting point for the research, and the researcher further refines it to formulate a specific research problem.

Example of a Research Topic: “Consumer behavior in the online retail industry.”

In summary, a research topic is a general area of interest, while a research problem is a specific issue or question within that area that the researcher aims to investigate.

Difference Between a Research Problem and Problem Statement

Research Problem: As explained earlier, a research problem is a specific issue, gap, or question that you as a researcher aim to address through your research. It is a clear and concise statement that defines the focus of the study and provides a rationale for why it is worth investigating.

Example of a Research Problem: “What is the impact of social media usage on the mental health and well-being of adolescents?”

Problem Statement: The problem statement, on the other hand, is a brief and clear description of the problem that you want to solve or investigate. It is more focused and specific than the research problem and provides a snapshot of the main issue being addressed.

Example of a Problem Statement: “The purpose of this study is to examine the relationship between social media usage and the mental health outcomes of adolescents, with a focus on depression, anxiety, and self-esteem.”

In summary, a research problem is the broader issue or question guiding the study, while the problem statement is a concise description of the specific problem being addressed in the research. The problem statement is usually found in the introduction section of a research proposal or thesis.

Challenges and Considerations

Formulating a research problem involves several challenges and considerations that researchers should carefully address:

  • Feasibility: Before you finalize a research problem, it is crucial to assess its feasibility. Consider the availability of resources, time, and expertise required to conduct the research. Evaluate potential constraints and determine if the research problem can be realistically tackled within the given limitations.
  • Novelty and Contribution: A well-crafted research problem should aim to contribute to existing knowledge in the field. Ensure that your research problem addresses a gap in the literature or provides innovative insights. Review past studies to understand what has already been done and how your research can build upon or offer something new.
  • Ethical and Social Implications: Take into account the ethical and social implications of your research problem. Research involving human subjects or sensitive topics requires ethical considerations. Consider the potential impact of your research on individuals, communities, or society as a whole. 
  • Scope and Focus: Be mindful of the scope of your research problem. A problem that is too broad may be challenging to address comprehensively, while one that is too narrow might limit the significance of the findings. Strike a balance between a focused research problem that can be thoroughly investigated and one that has broader implications.
  • Clear Objectives: Ensure that your research problem aligns with specific research objectives. Clearly define what you intend to achieve through your study. Having well-defined objectives will help you stay on track and maintain clarity throughout the research process.
  • Relevance and Significance: Consider the relevance and significance of your research problem in the context of your field of study. Assess its potential implications for theory, practice, or policymaking. A research problem that addresses important questions and has practical implications is more likely to be valuable to the academic community and beyond.
  • Stakeholder Involvement: In some cases, involving relevant stakeholders early in the process of formulating a research problem can be beneficial. This could include experts in the field, practitioners, or individuals who may be impacted by the research. Their input can provide valuable insights that can help you enhance the quality of the research problem.

In conclusion, understanding how to formulate a research problem is fundamental to meaningful research and intellectual growth. Remember that a well-crafted research problem serves as the foundation for discoveries and advancements in various fields. It not only enhances the credibility and relevance of your study but also contributes to the expansion of knowledge and the betterment of society.

Therefore, approach the process of identifying and formulating research problems with effort, enthusiasm, and curiosity. Engage in comprehensive literature reviews, observe your surroundings, and reflect on the gaps in existing knowledge. Lastly, be mindful of the challenges and considerations above, and ensure your research problem aligns with clear objectives and ethical principles.


How to Write an Effective Problem Statement for Your Research Paper


The problem statement usually appears at the beginning of an article, making it one of the first things readers encounter. An excellent problem statement not only explains the relevance and importance of the research but also, by clearly defining the topic, helps readers quickly determine whether the article aligns with their interests. The problem statement therefore plays an important role in disseminating the paper widely and enhancing the researcher’s academic influence.

In this article, we will focus on writing ideas, structure, and practical examples of the problem statement, helping researchers easily write an excellent problem statement.  

Basic Writing Strategies for the Problem Statement  

The problem statement aims to highlight the pressing issue the research intends to address. It should be concise and to the point. Researchers can follow a two-step approach: first, think about the content of the problem statement, and then organize the writing framework.  

Before writing, clarify the following points¹:

  • What is the reader’s level of understanding of the research topic?  
  • How can the significance of the research be effectively conveyed to the reader?  

After addressing these two questions, you can organize the content according to the following structure:  

  • Clarify what you aim to achieve with your research.  
  • Explore why the problem exists and explain how solving it helps reach the goal.  
  • Outline the potential impact of the research, such as possible outcomes, challenges, and benefits.  
  • Propose a plan for your study that follows sound scientific method.
  • Explain the potential consequences if the problem is not resolved (if applicable).  

Three Important Parts of the Problem Statement  

The content and length of the problem statement can vary depending on the type of research. Although there’s no fixed format, it’s helpful to include these three key parts:  

Research Background:

Explain clearly what problem your research focuses on. Describe how things would be better if this problem didn’t exist. Also, talk about what other researchers have tried to do about this problem and what still needs to be figured out.

Research Significance:

Clarify the impact of the problem on the research field and society, and analyze the cause of the problem. Explain who will benefit from solving the problem, thus demonstrating the relevance of the research and its contribution to the existing research system.² To illustrate the relevance, consider aspects such as the geographical location or process where the problem occurs, the time period during which it exists, and the severity of the problem.

Solution:

Describe the research objective and the expected solution or results.

Understanding the Writing Method Through Examples  

To further explore the writing method of the problem statement, let’s look at the following case.  

Research Topic: 

The benefits of vitamin D supplementation on the immune system.  

Problem Statement: 

  • Review existing research on the role of vitamin D in the immune system, emphasizing the potential impacts of vitamin D deficiency on the human body.  
  • List the obstacles encountered when trying to increase vitamin D levels in the body through supplements, and briefly mention the physiological or molecular mechanisms behind these obstacles.  
  • Clarify feasible ways to overcome these obstacles, such as new methods to promote the absorption of vitamin D in the intestine. Then, focus on the benefits of these methods, such as helping postmenopausal women with breast cancer improve their blood vitamin D levels.   

Points to Note: 

When crafting your problem statement, focus on essential details and avoid unnecessary information. Additionally, absolute terms such as “must” should be avoided.  

( The examples in this article are used only to illustrate writing points, and the academic views contained therein are not for reference. )  

By mastering these techniques and methods, you can enhance the clarity and impact of your problem statements. This not only makes your articles more engaging for reviewers and readers but also increases the likelihood of broader dissemination.


References:  

  1. SURF Workshop Resources: Problem Statements – Purdue OWL® – Purdue University. (n.d.). https://owl.purdue.edu/owl/subject_specific_writing/writing_in_the_purdue_surf_program/surf_workshop_resources_problem_statements/index.html
  2. Problem Statement | A practical guide to delivering results. (n.d.). https://deliveringresults.leeds.ac.uk/delivering-results-lifecycle/problem-statement/


Identifying a Research Problem

This InfoGuide assists students starting their research proposal and literature review.


A research problem is a specific issue or gap in existing knowledge that you aim to address in your research. You may look for practical problems aimed at contributing to change or theoretical problems aimed at expanding knowledge.

Some research will do both of these things, but usually, the research problem focuses on one or the other. The research problem you choose depends on your broad topic of interest and the type of research you think will fit best.

This section helps you identify and refine a research problem. When writing your research proposal or introduction, formulate it as a problem statement and/or research questions.

Research Problems Steps

Why is the research problem important?

Having an interesting topic isn’t a strong enough basis for academic research. Without a well-defined research problem, you will likely end up with an unfocused and unmanageable project.

You might end up repeating what other people have already said, trying to say too much, or doing research without a clear purpose and justification. You need a clear problem to research that contributes new and relevant insights.

Whether planning your thesis, starting a research paper, or writing a research proposal, the research problem is the first step towards knowing exactly what you’ll do and why.

Identify a broad problem area As you read about your topic, look for under-explored aspects or areas of concern, conflict, or controversy. Your goal is to find a gap that your research project can fill.

Practical research problems If you are doing practical research, you can identify a problem by reading reports, following up on previous research, or talking to people who work in the relevant field or organization. You might look for:

  • Issues with performance or efficiency
  • Processes that could be improved
  • Areas of concern among practitioners
  • Difficulties faced by specific groups of people

Examples of practical research problems

  • Voter turnout in New England has been decreasing, in contrast to the rest of the country.
  • The HR department of a local chain of restaurants has a high staff turnover rate.
  • A non-profit organization faces a funding gap that means some of its programs will have to be cut.

Theoretical research problems

If you are doing theoretical research, you can identify a research problem by reading existing research, theory, and debates on your topic to find a gap in what is currently known about it. You might look for:

  • A phenomenon or context that has not been closely studied
  • A contradiction between two or more perspectives
  • A situation or relationship that is not well understood
  • A troubling question that has yet to be resolved

Examples of theoretical research problems

  • The effects of long-term Vitamin D deficiency on cardiovascular health are not well understood.
  • The relationship between gender, race, and income inequality has yet to be closely studied in the context of the millennial gig economy.
  • Historians of Scottish nationalism disagree about the role of the British Empire in developing Scotland’s national identity.

Learn more about the problem

Next, you have to find out what is already known about the problem and pinpoint the exact aspect that your research will address.

Context and background

  • Who does the problem affect?
  • Is it a newly discovered problem, or a well-established one?
  • What research has already been done?
  • What, if any, solutions have been proposed?
  • What are the current debates about the problem? What is missing from these debates?

Specificity and relevance

  • What particular place, time, and/or group of people will you focus on?
  • What aspects will you not be able to tackle?
  • What will the consequences be if the problem is not resolved?

Example of a specific research problem

A local non-profit organization that alleviates food insecurity has always fundraised from its existing support base. It lacks an understanding of how best to target potential new donors. To continue its work, the organization requires research into more effective fundraising strategies.

Once you have narrowed down your research problem, the next step is to formulate a problem statement, as well as your research questions or hypotheses.


How to identify and resolve research problems

Updated July 12, 2023

In this article, we’re going to take you through one of the most pertinent parts of conducting research: a research problem (also known as a research problem statement).

When trying to formulate a good research statement, and understand how to solve it for complex projects, it can be difficult to know where to start.

Not only are there multiple perspectives (from stakeholders to project marketers who want answers), you have to consider the particular context of the research topic: is it timely, is it relevant and most importantly of all, is it valuable?

In other words: are you looking at a research-worthy problem?

The fact is, a well-defined, precise, and goal-centric research problem will keep your researchers, stakeholders, and business focused and your results actionable.

And when it works well, it's a powerful tool to identify practical solutions that can drive change and secure buy-in from your workforce.


What is a research problem?

In social research methodology and behavioral sciences, a research problem establishes the direction of research, often relating to a specific topic or opportunity for discussion.

For example: climate change and sustainability, analyzing moral dilemmas or wage disparity amongst classes could all be areas that the research problem focuses on.

As well as outlining the topic and/or opportunity, a research problem will explain:

  • why the area/issue needs to be addressed,
  • why the area/issue is of importance,
  • the parameters of the research study,
  • the research objective,
  • the reporting framework for the results, and
  • the overall benefit of doing so (whether to society as a whole or to other researchers and projects).

Having identified the main topic or opportunity for discussion, you can then narrow it down into one or several specific questions that can be scrutinized and answered through the research process.

What are research questions?

Generating research questions underpinning your study usually starts with problems that require further research and understanding while fulfilling the objectives of the study.

A good problem statement begins by asking deeper questions to gain insights about a specific topic.

For example, using the problems above, our questions could be:

"How will climate change policies influence sustainability standards across specific geographies?"

"What measures can be taken to address wage disparity without increasing inflation?"

Developing a research-worthy problem is the first step, and one of the most important, in any kind of research.

It’s also a task that will come up again and again because any business research process is cyclical. New questions arise as you iterate and progress through discovering, refining, and improving your products and processes. A research question can also be referred to as a "problem statement".

Note: good research supports multiple perspectives through empirical data. It’s focused on key concepts rather than a broad area, providing readily actionable insight and areas for further research.

Research question or research problem?

As we've highlighted, the terms “research question” and “research problem” are often used interchangeably, becoming a vague or broad proposition for many.

The term "problem statement" is far more representative, but finds little use among academics.

Instead, some researchers think in terms of a single research problem and several research questions that arise from it.

As mentioned above, the questions are lines of inquiry to explore in trying to solve the overarching research problem.

Ultimately, this provides a more meaningful understanding of a topic area.

It may be useful to think of questions and problems as coming out of your business data – that’s the O-data (otherwise known as operational data) like sales figures and website metrics.

What's an example of a research problem?

Your overall research problem could be: "How do we improve sales across EMEA and reduce lost deals?"

This research problem then has a subset of questions, such as:

"Why do sales peak at certain times of the day?"

"Why are customers abandoning their online carts at the point of sale?"

As well as helping you to solve business problems, research problems (and associated questions) help you to think critically about topics and/or issues (business or otherwise). You can also use your old research to aid future research -- a good example is laying the foundation for comparative trend reports or a complex research project.

(Also, if you want to see the bigger picture when it comes to research problems, why not check out our ultimate guide to market research? In it you'll find out: what effective market research looks like, the use cases for market research, carrying out a research study, and how to examine and action research findings).

The research process: why are research problems important?

Research problems have two essential roles in setting your research project on a course for success.

1. They set the scope

The research problem defines what problem or opportunity you’re looking at and what your research goals are. It stops you from getting side-tracked or allowing the scope of research to creep off-course.

Without a strong research problem or problem statement, your team could end up spending resources unnecessarily, or coming up with results that aren’t actionable - or worse, harmful to your business - because the field of study is too broad.

2. They tie your work to business goals and actions

To formulate a research problem in terms of business decisions means you always have clarity on what’s needed to make those decisions. You can show the effects of what you’ve studied using real outcomes.

Then, by focusing your research problem statement on a series of questions tied to business objectives, you can reduce the risk of the research being unactionable or inaccurate.

It's also worth examining research or other scholarly literature (you’ll find plenty of similar, pertinent research online) to see how others have explored specific topics and noting implications that could have for your research.

Four steps to defining your research problem


1. Observe and identify

Businesses today have so much data that it can be difficult to know which problems to address first. Researchers also have business stakeholders who come to them with problems they would like to have explored. A researcher’s job is to sift through these inputs and discover exactly what higher-level trends and key concepts are worth investing in.

This often means asking questions and doing some initial investigation to decide which avenues to pursue. This could mean gathering interdisciplinary perspectives and identifying additional expertise and contextual information.

Sometimes, a small-scale preliminary study might be worth doing to help get a more comprehensive understanding of the business context and needs, and to make sure your research problem addresses the most critical questions.

This could take the form of qualitative research using a few in-depth interviews, an environmental scan, or reviewing relevant literature.

The sales manager of a sportswear company has a problem: sales of trail running shoes are down year-on-year and she isn’t sure why. She approaches the company’s research team for input and they begin asking questions within the company and reviewing their knowledge of the wider market.

2. Review the key factors involved

As a marketing researcher, you must work closely with your team of researchers to define and test the influencing factors and the wider context involved in your study. These might include demographic and economic trends or the business environment affecting the question at hand. This is referred to as a relational research problem.

To do this, you have to identify the factors that will affect the research and begin formulating different methods to control them.

You also need to consider the relationships between factors and the degree of control you have over them. For example, you may be able to control the loading speed of your website but you can’t control the fluctuations of the stock market.

Doing this will help you determine whether the findings of your project will produce enough information to be worth the cost.

You need to determine:

  • which factors affect the solution to the research proposal.
  • which ones can be controlled and used for the purposes of the company, and to what extent.
  • the functional relationships between the factors.
  • which ones are critical to the solution of the research study.

The research team at the running shoe company is hard at work. They explore the factors involved and the context of why YoY sales are down for trail shoes, including things like what the company’s competitors are doing, what the weather has been like – affecting outdoor exercise – and the relative spend on marketing for the brand from year to year.

The final factor is within the company’s control, although the first two are not. They check the figures and determine marketing spend has a significant impact on the company.

3. Prioritize

Once you and your research team have a few observations, prioritize them based on their business impact and importance. It may be that you can answer more than one question with a single study, but don’t do it at the risk of losing focus on your overarching research problem.

Questions to ask:

  • Who? Who are the people with the problem? Are they end-users, stakeholders, teams within your business? Have you validated the information to see what the scale of the problem is?
  • What? What is its nature and what is the supporting evidence?
  • Why? What is the business case for solving the problem? How will it help?
  • Where? How does the problem manifest and where is it observed?

To help you understand all dimensions, you might want to consider focus groups or preliminary interviews with external (including consumers and existing customers) and internal (salespeople, managers, and other stakeholders) parties to provide what is sometimes much-needed insight into a particular set of questions or problems.

After observing and investigating, the running shoe researchers come up with a few candidate questions, including:

  • What is the relationship between US average temperatures and sales of our products year on year?
  • At present, how does our customer base rank Competitor X and Competitor Y’s trail running shoe compared to our brand?
  • What is the relationship between marketing spend and trail shoe product sales over the last 12 months?

They opt for the final question, because the variables involved are fully within the company’s control, and based on their initial research and stakeholder input, seem the most likely cause of the dive in sales. The research question is specific enough to keep the work on course towards an actionable result, but it allows for a few different avenues to be explored, such as the different budget allocations of offline and online marketing and the kinds of messaging used.
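As a rough illustration of how the chosen question could be explored, the sketch below computes a Pearson correlation between monthly marketing spend and trail shoe sales. The figures are hypothetical and invented purely for the example, and the function name `pearson_r` is an assumption, not something the article prescribes. A strong correlation would only suggest a relationship worth investigating; it would not show that spend causes sales.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("both series need the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Hypothetical 12 months of data (thousands of dollars); not from the article.
marketing_spend = [40, 38, 35, 30, 28, 25, 24, 22, 20, 18, 15, 12]
trail_shoe_sales = [210, 205, 190, 170, 165, 150, 148, 140, 128, 120, 105, 95]

r = pearson_r(marketing_spend, trail_shoe_sales)
print(f"Pearson r between spend and sales: {r:.2f}")
```

The same function could be reused for the temperature question, which is one reason to keep the calculation separate from the data.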

4. Align

Get feedback from the key teams within your business to make sure everyone is aligned and has the same understanding of the research problem and questions, and the actions you hope to take based on the results. Now is also a good time to demonstrate the ROI of your research and lay out its potential benefits to your stakeholders.

Different groups may have different goals and perspectives on the issue. This step is vital for getting the necessary buy-in and pushing the project forward.

The running shoe company researchers now have everything they need to begin. They call a meeting with the sales manager and consult with the product team, marketing team, and C-suite to make sure everyone is aligned and has bought into the direction of the research topic. They identify and agree that the likely course of action will be a rethink of how marketing resources are allocated, and potentially testing out some new channels and messaging strategies.

Can you explore a broad area and is it practical to do so?

A broader research problem or report can be a great way to bring attention to prevalent issues, societal or otherwise, but is often undertaken by those with the resources to do so.

Take a typical government cybersecurity breach survey, for example. Most of these reports raise awareness of cybercrime, from the day-to-day threats businesses face to what security measures some organizations are taking. What these reports don't do, however, is provide actionable advice - mostly because every organization is different.

The point here is that while some researchers will explore a very complex issue in detail, others will provide only a snapshot to maintain interest and encourage further investigation. The "value" of the data is wholly determined by the recipients of it - and what information you choose to include.

To summarize, it can be practical to undertake a broader research problem, certainly, but it may not be possible to cover everything or provide the detail your audience needs. Likewise, a more systematic investigation of an issue or topic will be more valuable, but you may also find that you cover far less ground.

It's important to think about your research objectives and expected findings before going ahead.

Ensuring your research project is a success

A complex research project can be made significantly easier with clear research objectives, a descriptive research problem, and a central focus, all of which we've outlined in this article.

If you have previous research, even better. Use it as a benchmark.

Remember: what separates a good research paper from an average one is actually very simple: valuable, empirical data that explores a prevalent societal or business issue and provides actionable insights.


Scott Smith

Scott Smith, Ph.D. is a contributor to the Qualtrics blog.


10 Research Question Examples to Guide your Research Project

Published on October 30, 2022 by Shona McCombes. Revised on October 19, 2023.

The research question is one of the most important parts of your research paper, thesis or dissertation. It’s important to spend some time assessing and refining your question before you get started.

The exact form of your question will depend on a few things, such as the length of your project, the type of research you’re conducting, the topic, and the research problem. However, all research questions should be focused, specific, and relevant to a timely social or scholarly issue.

Once you’ve read our guide on how to write a research question, you can use these examples to craft your own.

Research question Explanation
The first question is not […] enough. The second question is more […], using […].
Starting with “why” often means that your question is not […] enough: there are too many possible answers. By targeting just one aspect of the problem, the second question offers a clear path for research.
The first question is too broad and subjective: there are no clear criteria for what counts as “better.” The second question is much more […]. It uses clearly defined terms and narrows its focus to a specific population.
It is generally not […] for academic research to answer broad normative questions. The second question is more specific, aiming to gain an understanding of possible solutions in order to make informed recommendations.
The first question is too simple: it can be answered with a simple yes or no. The second question is […], requiring in-depth investigation and the development of an original argument.
The first question is too broad and not very […]. The second question identifies an underexplored aspect of the topic that requires investigation of various […] to answer.
The first question is not […] enough: it tries to address two different problems (the quality of sexual health services and LGBT support services). Even though the two issues are related, it’s not clear how the research will bring them together. The second integrates the two problems into one focused, specific question.
The first question is too simple, asking for a straightforward fact that can be easily found online. The second is a more […] question that requires […] and detailed discussion to answer.
[…] dealt with the theme of racism through casting, staging, and allusion to contemporary events? The first question is not […]: it would be very difficult to contribute anything new. The second question takes a specific angle to make an original argument, and has more relevance to current social concerns and debates.
The first question asks for a ready-made solution, and is not […]. The second question is a clearer comparative question, but note that it may not be practically […]. For a smaller research project or thesis, it could be narrowed down further to focus on the effectiveness of drunk driving laws in just one or two countries.

Note that the design of your research question can depend on what method you are pursuing. Here are a few options for qualitative, quantitative, and statistical research questions.

Type of research Example question
Qualitative research question
Quantitative research question
Statistical research question


McCombes, S. (2023, October 19). 10 Research Question Examples to Guide your Research Project. Scribbr. Retrieved August 21, 2024, from https://www.scribbr.com/research-process/research-question-examples/


Sacred Heart University Library

Organizing Academic Research Papers: The Research Problem/Question


A research problem is a statement about an area of concern, a condition to be improved, a difficulty to be eliminated, or a troubling question that exists in scholarly literature, in theory, or in practice that points to the need for meaningful understanding and deliberate investigation. In some social science disciplines the research problem is typically posed in the form of a question. A research problem does not state how to do something, offer a vague or broad proposition, or present a value question.

Importance of...

The purpose of a problem statement is to:

  • Introduce the reader to the importance of the topic being studied. The reader is oriented to the significance of the study and the research questions or hypotheses to follow.
  • Place the problem into a particular context that defines the parameters of what is to be investigated.
  • Provide the framework for reporting the results, indicate what is probably necessary to conduct the study, and explain how the findings will present this information.

In the social sciences, the research problem establishes the means by which you must answer the "So What?" question. The "So What?" question refers to a research problem surviving the relevancy test [the quality of a measurement procedure that provides repeatability and accuracy]. Note that answering the "So What" question requires a commitment on your part to not only show that you have researched the material, but that you have thought about its significance.

To survive the "So What" question, problem statements should possess the following attributes:

  • Clarity and precision [a well-written statement does not make sweeping generalizations and irresponsible statements],
  • Identification of what would be studied, while avoiding the use of value-laden words and terms,
  • Identification of an overarching question and key factors or variables,
  • Identification of key concepts and terms,
  • Articulation of the study's boundaries or parameters,
  • Some generalizability in regard to applicability and bringing results into general use,
  • Conveyance of the study's importance, benefits, and justification [regardless of the type of research, it is important to address the “so what” question by demonstrating that the research is not trivial],
  • Avoidance of unnecessary jargon; and,
  • Conveyance of more than the mere gathering of descriptive data providing only a snapshot of the issue or phenomenon under investigation.

Castellanos, Susie. Critical Writing and Thinking. The Writing Center. Dean of the College. Brown University; Ellis, Timothy J., and Yair Levy. "Framework of Problem-Based Research: A Guide for Novice Researchers on the Development of a Research-Worthy Problem." Informing Science: the International Journal of an Emerging Transdiscipline 11 (2008); Thesis and Purpose Statements. The Writer’s Handbook. Writing Center. University of Wisconsin, Madison; Thesis Statements. The Writing Center. University of North Carolina; Tips and Examples for Writing Thesis Statements. The Writing Lab and The OWL. Purdue University.

Structure and Writing Style

I.  Types and Content

There are four general conceptualizations of a research problem in the social sciences:

  • Casuist Research Problem -- this type of problem relates to the determination of right and wrong in questions of conduct or conscience by analyzing moral dilemmas through the application of general rules and the careful distinction of special cases.
  • Difference Research Problem -- typically asks the question, “Is there a difference between two or more groups or treatments?” This type of problem statement is used when the researcher compares or contrasts two or more phenomena.
  • Descriptive Research Problem -- typically asks the question, "what is...?" with the underlying purpose to describe a situation, state, or existence of a specific phenomenon.
  • Relational Research Problem -- suggests a relationship of some sort between two or more variables to be investigated. The underlying purpose is to investigate qualities/characteristics that are connected in some way.

A problem statement in the social sciences should contain:

  • A lead-in that helps ensure the reader will maintain interest over the study
  • A declaration of originality [e.g., mentioning a knowledge void, which would be supported by the literature review]
  • An indication of the central focus of the study, and
  • An explanation of the study's significance or the benefits to be derived from investigating the problem.

II.  Sources of Problems for Investigation

Identifying a problem to study can be challenging, not because there is a lack of issues that could be investigated, but because the goal is to formulate a socially relevant and researchable problem statement that is unique and does not simply duplicate the work of others. To help you select a problem from which to build a research study, consider these five broad sources of inspiration:

Deductions from Theory: This relates to deductions made from social philosophy or generalizations embodied in life in society that the researcher is familiar with. These deductions from human behavior are then fitted within an empirical frame of reference through research. From a theory, the researcher can formulate a research problem or hypothesis stating the expected findings in certain empirical situations. The researcher asks the question: “What relationship between variables will be observed if theory aptly summarizes the state of affairs?” One can then design and carry out a systematic investigation to assess whether empirical data confirm or reject the hypothesis and hence the theory.

Interdisciplinary Perspectives: Identifying a problem that forms the basis for a research study can come from academic movements and scholarship originating in disciplines outside of your primary area of study. A review of pertinent literature should include examining research from related disciplines, which can expose you to new avenues of exploration and analysis. An interdisciplinary approach to selecting a research problem offers an opportunity to construct a more comprehensive understanding of a very complex issue than any single discipline might provide.

Interviewing Practitioners: The identification of research problems about particular topics can arise from formal or informal discussions with practitioners who provide insight into new directions for future research and how to make research findings increasingly relevant to practice. Discussions with experts in the field, such as teachers, social workers, and health care providers, offer the chance to identify practical, “real world” problems that may be understudied or ignored within academic circles. This approach also provides some practical knowledge which may help in the process of designing and conducting your study.

Personal Experience: Your everyday experiences can give rise to worthwhile problems for investigation. Think critically about your own experiences and/or frustrations with an issue facing society, your community, or your neighborhood. This can be derived, for example, from deliberate observations of certain relationships for which there is no clear explanation or from witnessing an event that appears harmful to a person or group or that is out of the ordinary.

Relevant Literature: The selection of a research problem can often be derived from an extensive and thorough review of pertinent research associated with your overall area of interest. This may reveal where gaps remain in our understanding of a topic. Research may be conducted to: 1) fill such gaps in knowledge; 2) evaluate if the methodologies employed in prior studies can be adapted to solve other problems; or, 3) determine if a similar study could be conducted in a different subject area or applied to a different study sample [i.e., different groups of people]. Also, authors frequently conclude their studies by noting implications for further research; this can also be a valuable source of problems to investigate.

III.  What Makes a Good Research Statement?

A good problem statement begins by introducing the broad area in which your research is centered and then gradually leads the reader to the more narrow questions you are posing. The statement need not be lengthy but a good research problem should incorporate the following features:

  • Compelling topic -- simple curiosity is not a good enough reason to pursue a research study. The problem that you choose to explore must be important to you and to a larger community you share. The problem chosen must be one that motivates you to address it.
  • Supports multiple perspectives -- the problem must be phrased in a way that avoids dichotomies and instead supports the generation and exploration of multiple perspectives. A general rule of thumb is that a good research problem is one that would generate a variety of viewpoints from a composite audience made up of reasonable people.
  • Researchable -- it seems a bit obvious, but you don't want to find yourself in the midst of investigating a complex research project and realize that you don't have much to draw on for your research. Choose research problems that can be supported by the resources available to you. Not sure? Seek out help from a librarian!

NOTE: Do not confuse a research problem with a research topic. A topic is something to read and obtain information about, whereas a problem is something to be solved or framed as a question that must be answered.

IV.  Mistakes to Avoid

Beware of circular reasoning. Don’t state the research problem as simply the absence of the thing you are suggesting. For example, consider the proposal, "The problem in this community is that it has no hospital."

This only leads to a research problem where:

  • The need is for a hospital
  • The objective is to create a hospital
  • The method is to plan for building a hospital, and
  • The evaluation is to measure if there is a hospital or not.

This is an example of a research problem that fails the "so what?" test because it does not reveal the relevance of why you are investigating the problem of having no hospital in the community [e.g., there's a hospital in the community ten miles away] and because the research problem does not elucidate the significance of why one should study the fact that no hospital exists in the community [e.g., that hospital in the community ten miles away has no emergency room].

Choosing and Refining Topics. Writing@CSU. Colorado State University; Ellis, Timothy J. and Yair Levy. "Framework of Problem-Based Research: A Guide for Novice Researchers on the Development of a Research-Worthy Problem." Informing Science: the International Journal of an Emerging Transdiscipline 11 (2008); How to Write a Research Question. The Writing Center. George Mason University; Invention: Developing a Thesis Statement. The Reading/Writing Center. Hunter College; Problem Statements PowerPoint Presentation. The Writing Lab and The OWL. Purdue University; Procter, Margaret. Using Thesis Statements. University College Writing Centre. University of Toronto; Trochim, William M.K. Problem Formulation. Research Methods Knowledge Base. 2006; Thesis and Purpose Statements. The Writer’s Handbook. Writing Center. University of Wisconsin, Madison; Thesis Statements. The Writing Center. University of North Carolina; Tips and Examples for Writing Thesis Statements. The Writing Lab and The OWL. Purdue University.

  • Last Updated: Jul 18, 2023 11:58 AM
  • URL: https://library.sacredheart.edu/c.php?g=29803

How to Define a Research Problem | Ideas & Examples

Published on 8 November 2022 by Shona McCombes and Tegan George.

A research problem is a specific issue or gap in existing knowledge that you aim to address in your research. You may choose to look for practical problems aimed at contributing to change, or theoretical problems aimed at expanding knowledge.

Some research will do both of these things, but usually the research problem focuses on one or the other. The type of research problem you choose depends on your broad topic of interest and the type of research you think will fit best.

This article helps you identify and refine a research problem. When writing your research proposal or introduction, formulate it as a problem statement and/or research questions.

Table of contents

  • Why is the research problem important?
  • Step 1: Identify a broad problem area
  • Step 2: Learn more about the problem
  • Frequently asked questions about research problems

Why is the research problem important?

Having an interesting topic isn’t a strong enough basis for academic research. Without a well-defined research problem, you are likely to end up with an unfocused and unmanageable project.

You might end up repeating what other people have already said, trying to say too much, or doing research without a clear purpose and justification. You need a clear problem in order to do research that contributes new and relevant insights.

Whether you’re planning your thesis, starting a research paper, or writing a research proposal, the research problem is the first step towards knowing exactly what you’ll do and why.


Step 1: Identify a broad problem area

As you read about your topic, look for under-explored aspects or areas of concern, conflict, or controversy. Your goal is to find a gap that your research project can fill.

Practical research problems

If you are doing practical research, you can identify a problem by reading reports, following up on previous research, or talking to people who work in the relevant field or organisation. You might look for:

  • Issues with performance or efficiency
  • Processes that could be improved
  • Areas of concern among practitioners
  • Difficulties faced by specific groups of people

Examples of practical research problems

Voter turnout in New England has been decreasing, in contrast to the rest of the country.

The HR department of a local chain of restaurants has a high staff turnover rate.

A non-profit organisation faces a funding gap that means some of its programs will have to be cut.

Theoretical research problems

If you are doing theoretical research, you can identify a research problem by reading existing research, theory, and debates on your topic to find a gap in what is currently known about it. You might look for:

  • A phenomenon or context that has not been closely studied
  • A contradiction between two or more perspectives
  • A situation or relationship that is not well understood
  • A troubling question that has yet to be resolved

Examples of theoretical research problems

The effects of long-term Vitamin D deficiency on cardiovascular health are not well understood.

The relationship between gender, race, and income inequality has yet to be closely studied in the context of the millennial gig economy.

Historians of Scottish nationalism disagree about the role of the British Empire in the development of Scotland’s national identity.

Step 2: Learn more about the problem

Next, you have to find out what is already known about the problem, and pinpoint the exact aspect that your research will address.

Context and background

  • Who does the problem affect?
  • Is it a newly-discovered problem, or a well-established one?
  • What research has already been done?
  • What, if any, solutions have been proposed?
  • What are the current debates about the problem? What is missing from these debates?

Specificity and relevance

  • What particular place, time, and/or group of people will you focus on?
  • What aspects will you not be able to tackle?
  • What will the consequences be if the problem is not resolved?

Example of a specific research problem

A local non-profit organisation focused on alleviating food insecurity has always fundraised from its existing support base. It lacks understanding of how best to target potential new donors. To be able to continue its work, the organisation requires research into more effective fundraising strategies.

Once you have narrowed down your research problem, the next step is to formulate a problem statement, as well as your research questions or hypotheses.

Frequently asked questions about research problems

Once you’ve decided on your research objectives, you need to explain them in your paper, at the end of your problem statement.

Keep your research objectives clear and concise, and use appropriate verbs to accurately convey the work that you will carry out for each one, for example: “I will compare …”

The way you present your research problem in your introduction varies depending on the nature of your research paper. A research paper that presents a sustained argument will usually encapsulate this argument in a thesis statement.

A research paper designed to present the results of empirical research tends to present a research question that it seeks to answer. It may also include a hypothesis – a prediction that will be confirmed or disproved by your research.

Research objectives describe what you intend your research project to accomplish.

They summarise the approach and purpose of the project and help to focus your research.

Your objectives should appear in the introduction of your research paper, at the end of your problem statement.

Cite this Scribbr article


McCombes, S. & George, T. (2022, November 08). How to Define a Research Problem | Ideas & Examples. Scribbr. Retrieved 21 August 2024, from https://www.scribbr.co.uk/the-research-process/define-research-problem/


Organizing Your Social Sciences Research Paper: Choosing a Research Problem


A research problem is the main organizing principle guiding the analysis of your paper. The problem under investigation offers us an occasion for writing and a focus that governs what we want to say. It represents the core subject matter of scholarly communication, and the means by which we arrive at other topics of conversation and the discovery of new knowledge and understanding.

Choosing a Research Problem

Do not assume that choosing a research problem to study will be a quick or easy task! You should be thinking about it at the start of the course. There are generally three ways you are asked to write about a research problem: 1) your professor provides you with a general topic from which you study a particular aspect; 2) your professor provides you with a list of possible topics to study and you choose a topic from that list; or, 3) your professor leaves it up to you to choose a topic and you only have to obtain permission to write about it before beginning your investigation. Here are some strategies for getting started for each scenario.

Alvesson, Mats and Jörgen Sandberg. Constructing Research Questions: Doing Interesting Research. London: Sage, 2013; Chapter 1: Research and the Research Problem. Nicholas Walliman. Your Research Project: Designing and Planning Your Work. 3rd edition. Thousand Oaks, CA: Sage Publications, 2011.

I. How To Begin: You are given the topic to write about

Step 1: Identify concepts and terms that make up the topic statement. For example, your professor wants the class to focus on the following research problem: "Is the European Union a credible security actor with the capacity to contribute to confronting global terrorism?" The main concepts in this problem are: European Union, global terrorism, credibility [hint: focus on identifying proper nouns, nouns or noun phrases, and action verbs in the assignment description].

Step 2: Review related literature to help refine how you will approach examining the topic and finding a way to analyze it. You can begin by doing any or all of the following: reading through background information from materials listed in your course syllabus; searching the University Libraries Catalog to find a recent book on the topic and, if appropriate, more specialized works about the topic; conducting a preliminary review of the research literature using multidisciplinary library databases such as Start Your Research or subject-specific databases from the "Databases By Subject" page.

Choose the advanced search option and enter into each search box the main concept terms you developed in Step 1. Also consider using their synonyms to retrieve relevant articles. This will help you refine and frame the scope of the research problem. You will likely need to do this several times before you can finalize how to approach writing about the topic. NOTE: Always review the references cited by the authors of your most relevant research results in footnotes, endnotes, or a bibliography to locate related research on your topic. This is a good strategy for identifying important prior research about the topic because titles that are repeatedly cited indicate their significance in laying a foundation for understanding the problem. However, if you’re having trouble at this point locating relevant research literature, ask a librarian for help!

ANOTHER NOTE: If you find an article from a journal that's particularly helpful, put quotes around the title of the article and paste it into Google Scholar. If the article record appears, look for a "cited by" reference followed by a number. This number indicates how many times other researchers have cited that article since it was first published. This is an excellent strategy for identifying more current, related research on your topic. Finding additional "cited by" references from your original list of "cited by" references helps you navigate through the literature and, by so doing, understand the evolution of thought around a particular research problem.

Step 3: Since social science research papers are generally designed to get you to develop your own ideas and arguments, look for sources that can help broaden, modify, or strengthen your initial thoughts and arguments. For example, if you decide to argue that the European Union is ill prepared to take on responsibilities for broader global security because of the debt crisis in many EU countries, then focus on identifying sources that support as well as refute this position. From the advanced search option in Start Your Research, a sample search would use "European Union" in one search box, "global security" in the second search box, and "debt crisis" in an added third search box.
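The search-box approach described above can be pictured as a Boolean query: each box holds one concept, synonyms for a concept are alternatives, and the boxes are combined so that every concept must appear. The short Python sketch below is only an illustration of that logic. The function name `build_query` is hypothetical, the `AND`/`OR` syntax is an assumption (actual syntax varies by database, so check the help page of the one you are using), and the synonym "EU" is added purely as an example.

```python
def build_query(concept_groups):
    """Combine concept groups into one Boolean search string.

    Each group is a list holding a main concept term plus any synonyms.
    Synonyms are joined with OR; separate concepts are joined with AND.
    (AND/OR syntax is assumed here; databases differ.)
    """
    clauses = []
    for terms in concept_groups:
        # Quote each term so multi-word phrases are searched as phrases.
        quoted = ['"{}"'.format(term) for term in terms]
        clause = " OR ".join(quoted)
        # Parenthesize only when a concept has more than one alternative.
        if len(quoted) > 1:
            clause = "(" + clause + ")"
        clauses.append(clause)
    return " AND ".join(clauses)


# One list per search box; "EU" is an illustrative synonym.
query = build_query([
    ["European Union", "EU"],
    ["global security"],
    ["debt crisis"],
])
print(query)
```

Keeping one list per concept mirrors the advice to re-run the search several times: you can add or swap synonyms in a single box without disturbing the other concepts.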

There are at least four appropriate roles your related literature plays in helping you formulate how to begin your analysis:

  • Sources of criticism -- frequently, you'll find yourself reading materials that are relevant to your chosen topic, but you disagree with the author's position. Therefore, one way that you can use a source is to describe the counter-argument, provide evidence from your review of the literature as to why the prevailing argument is unsatisfactory, and to discuss how your own view is more appropriate based upon your interpretation of the evidence.
  • Sources of new ideas -- while a general goal in writing college research papers in the social sciences is to approach a research problem with some basic idea of what position you'd like to take and what grounds you'd like to stand upon, it is certainly acceptable [and often encouraged] to read the literature and extend, modify, and refine your own position in light of the ideas proposed by others. Just make sure that you cite the sources!
  • Sources for historical context -- another role your related literature plays in helping you formulate how to begin your analysis is to place issues and events in proper historical context. This can help to demonstrate familiarity with developments in relevant scholarship about your topic, provide a means of comparing historical versus contemporary issues and events, and identify key people, places, and events that had an important role related to the research problem.
  • Sources of interdisciplinary insight -- an advantage of using databases like Start Your Research to begin exploring your topic is that they cover publications from a variety of different disciplines. Another way to formulate how to study the topic is to look at it from different disciplinary perspectives. If the topic concerns immigration reform, for example, ask yourself how studies from sociological journals found by searching Start Your Research vary in their analysis from those in law journals. A goal in reviewing related literature is to provide a means of approaching a topic from multiple perspectives rather than the perspective offered from just one discipline.

NOTE: Remember to keep careful notes at every stage or utilize a citation management system like EndNote. You may think you'll remember what you have searched and where you found things, but it’s easy to forget or get confused. Most databases have a search history feature that allows you to go back and see what searches you conducted previously as long as you haven't closed your session. If you start over, that history could be deleted.

Step 4: Assuming you've done an effective job of synthesizing and thinking about the results of your initial search for related literature, you're ready to prepare a detailed outline for your paper that lays the foundation for a more in-depth and focused review of relevant research literature [after consulting with a librarian, if needed!]. How will you know you haven't done an effective job of synthesizing and thinking about the results of your initial search for related literature? A good indication is that you start composing your paper outline and gaps appear in how you want to approach the study. This indicates the need to gather further background information and analysis about your research problem.

II. How To Begin: Your professor leaves it up to you to choose a topic

Step 1 : Under this scenario, the key process is turning an idea or general thought into a topic that can be configured into a research problem. When given an assignment where you choose the research topic, don't begin by thinking about what to write about, but rather, ask yourself the question, "What do I want to know?" Treat an open-ended assignment as an opportunity to learn about something that's new or exciting to you.

Step 2 : If you lack ideas, or wish to gain focus, try any or all of the following strategies:

  • Review your course readings, particularly the suggested readings, for topic ideas. Don't just review what you've already read but jump ahead in the syllabus to readings that have not been covered yet.
  • Search the University Libraries Catalog for a good, recently published book and, if appropriate, more specialized works related to the discipline area of the course [e.g., for the course SOCI 335, search for books on population and society].
  • Browse through some current journals in your subject discipline. Even if most of the articles are not relevant, you can skim through the contents quickly. You only need one to be the spark that begins the process of wanting to learn more about a topic. Consult with a librarian and/or your professor about the core journals within your subject discipline.
  • Think about essays you have written for past classes, other courses you have taken, or academic lectures and programs you have attended. Thinking back, what interested you the most? What would you like to know more about? Place this in the context of the current course assignment.
  • Search online media sources, such as CNN, the Los Angeles Times, Huffington Post, Fox News, or Newsweek, to see if your idea has been covered by the media. Use this coverage to refine your idea into something that you'd like to investigate further, but in a more deliberate, scholarly way based on a particular problem that needs to be researched.

Step 3: To build upon your initial idea, use the suggestions under this tab to help narrow, broaden, or increase the timeliness of your idea so you can write it out as a research problem.

Once you are comfortable with having turned your idea into a research problem, follow Steps 1 - 4 listed in Part I above to further develop it into a research paper.

Alderman, Jim. "Choosing a Research Topic." Beginning Library and Information Systems Strategies. Paper 17. Jacksonville, FL: University of North Florida Digital Commons, 2014; Alvesson, Mats and Jörgen Sandberg. Constructing Research Questions: Doing Interesting Research. London: Sage, 2013; Chapter 2: Choosing a Research Topic. Adrian R. Eley. Becoming a Successful Early Career Researcher. New York: Routledge, 2012; Answering the Question. Academic Skills Centre. University of Canberra; Brainstorming. Department of English Writing Guide. George Mason University; Brainstorming. The Writing Center. University of North Carolina; Chapter 1: Research and the Research Problem. Nicholas Walliman. Your Research Project: Designing and Planning Your Work. 3rd edition. Thousand Oaks, CA: Sage Publications, 2011; Choosing a Topic. The Writing Lab and The OWL. Purdue University; Coming Up With Your Topic. Institute for Writing Rhetoric. Dartmouth College; How To Write a Thesis Statement. Writing Tutorial Services, Center for Innovative Teaching and Learning. Indiana University; Identify Your Question. Start Your Research. University Library, University of California, Santa Cruz; The Process of Writing a Research Paper. Department of History. Trent University; Trochim, William M.K. Problem Formulation. Research Methods Knowledge Base. 2006.

III. How To Begin: You are provided a list of possible topics to choose from

Step 1: I know what you’re thinking--which topic from this list my professor has given me will be the easiest to find the most information on? An effective instructor should never include a topic that is so obscure or complex that no research is available to examine and from which to begin to design a study. Instead of searching for the path of least resistance, choose a topic that you find interesting in some way, or that is controversial and that you have a strong opinion about, or that has some personal meaning for you. You're going to be working on your topic for quite some time, so choose one that you find interesting and engaging or that motivates you to take a position. Embrace the opportunity to learn something new! Once you’ve settled on a topic of interest from the list, follow Steps 1 - 4 listed in Part I above to further develop it into a research paper.

NOTE: It’s ok to review related literature to help refine how you will approach analyzing a topic, and then discover that the topic isn’t all that interesting to you. In that case, you can choose another from the list. Just don’t wait too long to make a switch and, of course, be sure to inform your professor that you are changing your topic.

Resources for Identifying a Topic


If you are having difficulty identifying a topic to study or need basic background information, the following web resources and databases can be useful:

Reports on a wide range of topics, with overviews, background and timeline of a topic, assessment of the current situation, tables and maps, pro/con statements from opposing positions. 1923 to present.

  • New York Times Topics Each topic page collects news, reference and archival information, photos, graphics, audio and video files published on a variety of topics. Content is available without charge on articles going back to 1981.

TexShare

Writing Tip

Don't be a Martyr!

In thinking about a research topic to study, don't adopt the mindset of pursuing an esoteric or incredibly complicated topic just to impress your professor when, in reality, it holds no real interest for you. As best as you can, choose a topic that has at least some interest to you or that you care about. Obviously, this is easier for courses within your major, but even for those nasty prerequisite classes that you must take in order to graduate [and that provide an additional revenue stream to the university], try to apply issues associated with your major to the general topic given to you. For example, if you are an IR major taking a philosophy class where the assignment asks you to apply the question of "what is truth" to some aspect of life, you could choose to study how government leaders attempt to shape truth through the use of propaganda.

Another Writing Tip

Not Finding Anything on Your Topic? Ask a Librarian!

Librarians are experts in locating information and providing strategies for analyzing existing knowledge in new ways. Don't assume or jump to the conclusion that your topic is too narrowly defined or obscure just because you haven’t found any information about it. Always consult a librarian before you consider giving up on finding information about the topic you want to investigate. If there isn't a lot of information about your topic, a librarian can often help you identify a closely related topic that you can study. Follow this link to contact a librarian.

  • Last Updated: Sep 8, 2023 12:19 PM
  • URL: https://guides.library.txstate.edu/socialscienceresearch

Sir Kim Workman’s major police investigation released: What the seven reports reveal

Julia Gabel

WARNING: This story deals with suicide

A major report into unconscious bias among the police has found being Māori increases the chance of being prosecuted by 11% compared with Pākehā.

In addition, Māori made up 42% of people who were tasered during the review period, despite comprising 17.8% of the population.

The three-year study into systemic bias has also found most people tasered by police were mentally distressed or unwell and there were instances of officers wrongly interpreting that person’s behaviour as disobedience rather than an inability to understand instructions.

The panel, chaired by esteemed legal experts Tā (Sir) Kim Workman and Professor Khylee Quince, revealed systemic bias and racial profiling within policing is continuing – partly due to gaps in police training.

The investigation gave researchers for the first time unrestricted access to police systems, complaints, meeting minutes and documents.

Key findings include:

  • 54% of all Taser discharge events over a six-month period involved people who were mentally distressed, unwell or attempting self-harm or suicide
  • Being Māori increased the likelihood of prosecution by 11% compared to Pākehā for the same offence
  • Māori were over-represented in Taser events, with complaints of “racism or racial profiling”
  • Police could be influenced by stereotypes when making decisions under threat
  • Being in a gang, an associate or having prior convictions “significantly increased” the likelihood of prosecution
  • The panel has “significant concerns” over new police Tasers, known as Taser10
  • Four people staying at mental health facilities, and three emotionally distressed young people aged between 14 and 17, were tasered during the report period
  • Five children were “laser painted”, meaning the Taser’s laser was shone on their bodies by police
  • Illegal photographing of rangatahi: Police focused on ensuring their practice was “within the law” without considering the effect their actions had on public trust and confidence.

The first tranche of documents, including seven reports, was released today. It includes 40 wide-ranging recommendations, such as stopping the use of police ethnicity data for decision-making, reviewing all incidents where a Taser is deployed on young people or those aged over 60, and increasing de-escalation training for police responding to individuals experiencing a mental health crisis.

Quince, who is dean of AUT’s law school, told the Herald police leadership would “absolutely” accept systemic bias and racial profiling occurred, as detailed in the report – but the “big question” was how they would act and “whether or not it would land”.

The panel said the review was focused on the systems within policing that could contribute to systemic bias, not individual officers. Findings were sense-checked with a group of 30 front-line police officers.

“It’s no secret that as a whole the Māori community doesn’t have high levels of trust and confidence in the police. But what they do want is equity and fairness.”

However, Police Minister Mark Mitchell has denied there is systemic bias in policing. Mitchell had not finished reading the report, but was presented with some of its findings by reporters.

“I’m not denying that Māori are over-represented in our justice system, absolutely they are,” he said.

“What I’m saying to you though is stop pointing the finger and blaming the police... social investment is something that I firmly believe in, we’ve got to get into people’s lives much earlier.”

Police Minister Mark Mitchell. Photo / Marty Melville

The study, named Understanding Policing Delivery (UPD), was commissioned by Police Commissioner Andrew Coster in 2020 during a time when police globally faced intense scrutiny following the killing of George Floyd and the Black Lives Matter movement.

High Taser rates for mentally unwell people

Researchers reviewed six months of Taser data, including footage of the events. Of the Taser events over this period, 54% involved people who were experiencing mental distress, mentally unwell and/or attempting self-harm or suicide.

The Taser reports indicated a belief among police that those clearly experiencing distress were non-compliant, rather than unwell or unable to follow instruction, the report noted.

In some instances, multiple officers issued commands at the same time, often confusing the person involved. In other instances, footage indicated the person may have had a cognitive or physical disability, but this was not noted by the officer in the report.

“Police appeared to be unwilling to approach individuals they perceived to be unwell, and reported this in the narratives, preferring to maintain distance and deploy a Taser,” the report noted.

Khylee Quince, chairwoman of the independent panel for the Understanding Policing Delivery Project, speaking about the report findings. Photo / Alyse Wright

In some instances, urgency to gain control of a situation involving a mentally unwell person appeared to influence decision-making. It appeared discharging a Taser was seen as the quickest and most efficient way of gaining compliance, the report noted.

“Camera footage of events indicated not all police have the skills to appropriately respond, manage or de-escalate a mental health crisis using appropriate humanistic tactics.”

Quince said the proportion of Taser events involving mentally distressed people was “pretty shocking” but police had been saying “for some time” they did not want to be the agency responding to such events.

The panel’s recommendations include establishing a cross-government response to these events from health, welfare and other agencies, rather than police.

Tā Kim Workman: Māori-police relationship needs work

Māori are over-represented at every stage of the justice system and the study shows the long-known disparities in the experiences of Māori in policing and the justice system continue today.

Māori were 11% more likely to be prosecuted than Pākehā for the same offence and Māori made 38% of complaints of racism or racial profiling, the highest proportion for an ethnicity. For complaints about the use of force, the report found the level of force described tended to be more serious for Māori and Pacific people.

Although Māori made up 17.8% of the population, they were involved in 42% of Taser discharge events.

Esteemed justice reform advocate and former police officer Tā Kim Workman said up until recently, the history of policing had not been taught at police college – but it is an important history that illustrates why the relationship between Māori and police today may be fraught.

“Most police officers, even senior police officers, had a very sketchy understanding [of that history],” he told the Herald .

“When you start to tell the story, it starts to reveal that Māori were a subject of discriminatory and systemic racism from the very beginning.”

As Workman explained, from the 1840s onwards the purpose of the colonial police force was to “subjugate the local indigenous people and mould them to a European way of life”.

“They were granted extraordinary powers to ensure that they did, including legislation which allowed them to assault and suppress Māori.”

Through the 1860s, efforts to subdue Māori resistance to land sales led to the Waikato War and mass arrests, imprisonment and execution often without trial, the report noted.

Other historic instances of police force against Māori include the 1881 constabulary invasion of Parihaka, a community that had campaigned on passive resistance, and the 1916 police raid on the Ngāi Tūhoe settlement of Maungapōhatu to arrest prophet and community leader Rua Kēnana.

More recently, reporting by RNZ revealed a widespread practice of police stopping and illegally photographing Māori youth. An investigation was launched and in 2021, police leadership acknowledged police practices could be improved.

The panel found police’s focus was on ensuring photographs could be taken within the law, without considering the effect of police practice on public trust and confidence. Workman described the issue as “unresolved”.

Justice reform advocate and former policeman Tā Kim Workman. Photo / Supplied

“It’s inevitable when that stuff happens over many years that the level of trust and confidence in the police and the criminal justice system is impacted.”

Police were part of a wider system of government agencies, including education and health, that had underserved Māori. As a result, Māori were more likely to live in poverty and be exposed to police intervention in their lives, he said.

“When that happens repeatedly over time, those sorts of issue coalesce into people’s whānau so you have whānau who have low education, have mental health issues or drug and alcohol issues, unemployed, poor housing, and so forth, that it’s inevitable that crime becomes an outcome,” he said.

“When you see that happening over decades, then you start to understand about the level of over-representation of Māori within the system.”

Workman said people valued police kindness and empathy, which was noted in the feedback from people thanking police for their warmth, even though they had done something unlawful.

Workman said Māori, including people who were sceptical of police, had an ultimate “desire to be accepted fully as a New Zealand citizen and be Māori”.

“When they are engaged with in a way which acknowledges that they have the right to be treated with respect and dignity, that they are a human and they have rights, they respond really well to that.”

Police Commissioner Andrew Coster talking with Prime Minister Christopher Luxon in Auckland. Photo / Michael Craig

Commissioner: Police on the “right path”

Police Commissioner Andrew Coster acknowledged there was work to do, but said the sector was on the “right” path and “the momentum would continue.”

“I am very confident in the direction our organisation is heading. We have one of the best police services in the world, there is no doubt about that. We have incredible people doing great work.

“What we have taken from this research are opportunities to keep shaping the way training and the way our systems position our people to deliver fairness in policing. Not all of the things are within police’s control, so I’m real confident this is on the right path.”

“What we can’t tell from the research, in terms of that remaining gap, is how much is within police’s control and how much isn’t. Things like the high prevalence of mental health crisis demand may be driving some of that discrepancy in a way that is beyond police’s control. The path is right.”

Young people among those tasered

Young people in “emotionally heightened and distressed” states were tasered during the report period. All three were Māori males, aged between 14-17. They were aggressive and had threatened police, the report said.

One of the young men had a weapon, was intoxicated and had stolen a car, while another youth was tasered after an order from Oranga Tamariki to remove him from the house he had run away to. He had refused to leave the house and had become “assaultive” toward an officer.

Five children were “laser painted”, meaning the Taser’s laser was shone on their body, before they complied with police instruction. Two of the children had a knife and were threatening police; the youngest, a 10-year-old, was attempting suicide.

‘Significant concerns’ over police Tasers

During the study, the panel raised “significant concerns” over plans to replace police’s current high-voltage stun gun with the Taser10 model. The old model is being discontinued and the new model does not have a camera.

The panel said the “assurance model” was based around the ability to review footage from in-built Taser cameras. Taser cameras were an “important layer of national assurance”, the report said.

16 internal police complaints

Researchers were given access to complaints filed against police during a three-month period. This included 16 internal complaints (police officers complaining about other officers). Themes that emerged from the complaints included bullying and harassment, sexism, subcultures of police and negative impacts on police work.

In the complaints, officers described feeling harassed, bullied and targeted, which impacted their mental health and wellbeing. Individuals reported they didn’t feel safe at work, and this led to feelings of paranoia and being on edge, the report noted.

Public lap up police kindness, empathy

Just under a quarter of the feedback received during the reporting period was praise, often where officers were seen to go the extra mile to help people.

This included officers responding to mental health situations, including one instance where “fantastic” officers convinced a woman’s flatmate, who was attempting suicide, to “seek help at the hospital despite the ambulance not helping”.

“They went above and beyond their duty, and I believe they are the reason my roommate is alive.”

SUICIDE AND DEPRESSION – Where to get help:

  • Lifeline : Call 0800 543 354 or text 4357 (HELP) (available 24/7)
  • Suicide Crisis Helpline : Call 0508 828 865 (0508 TAUTOKO) (available 24/7)
  • Youth services: (06) 3555 906
  • Youthline : Call 0800 376 633 or text 234
  • What’s Up : Call 0800 942 8787 (11am to 11pm) or webchat (11am to 10.30pm)
  • Depression helpline : Call 0800 111 757 or text 4202 (available 24/7)
  • Helpline: Need to talk? Call or text 1737
  • Aoake te Rā (Bereaved by Suicide Service) : Call 0800 000 053

If it is an emergency and you feel like you or someone else is at risk, call 111.

Julia Gabel is a Wellington-based political reporter. She joined the Herald in 2020 and has most recently focused on data journalism.


  • Open access
  • Published: 20 August 2024

Current limitations in predicting mRNA translation with deep learning models

  • Niels Schlusser,
  • Asier González,
  • Muskan Pandey &
  • Mihaela Zavolan (ORCID: orcid.org/0000-0002-8832-2041)

Genome Biology volume 25, Article number: 227 (2024)

The design of nucleotide sequences with defined properties is a long-standing problem in bioengineering. An important application is protein expression, be it in the context of research or the production of mRNA vaccines. The rate of protein synthesis depends on the 5′ untranslated region (5′UTR) of the mRNAs, and recently, deep learning models were proposed to predict the translation output of mRNAs from the 5′UTR sequence. At the same time, large data sets of endogenous and reporter mRNA translation have become available.

In this study, we use complementary data obtained in two different cell types to assess the accuracy and generality of currently available models for predicting translational output. We find that while performing well on the data sets on which they were trained, deep learning models do not generalize well to other data sets, in particular of endogenous mRNAs, which differ in many properties from reporter constructs.

Conclusions

These differences limit the ability of deep learning models to uncover mechanisms of translation control and to predict the impact of genetic variation. We suggest directions that combine high-throughput measurements and machine learning to unravel mechanisms of translation control and improve construct design.

The translation of most mRNAs into proteins is initiated by the recruitment of the eIF4F complex at the 7-methylguanosine cap, followed by eIF3, the initiator tRNA and the 40S subunit of the ribosome [ 1 ]. The 40S subunit scans the mRNA’s 5′ untranslated region (5′UTR) until it recognizes a start codon; then, the 60S subunit joins to complete the ribosome assembly and initiate protein synthesis. Initiation is the limiting step of translation, largely determining the rate of protein synthesis [ 2 ]. It is influenced by multiple features of the 5′UTR, from the structural accessibility of the cap-proximal region [ 3 ], to the strength of the Kozak sequence around the start codon (consensus gccRccAUGG; upper case: highly conserved bases; R = A or G [ 4 ]), and the number and properties of upstream open reading frames (uORFs) that can hinder ribosome scanning to the main ORF (mORF), inhibiting its translation [ 5 , 6 , 7 , 8 ]. These (and presumably other) factors lead to initiation rates that differ up to 100-fold between mRNAs [ 9 ] and a similarly wide range of protein relative to mRNA abundance [ 10 ].

Accurate prediction of protein output from the mRNA sequence is of great interest for protein engineering and increasingly relevant with the rise of RNA-based therapies. This has prompted the development of both experimental methods for the high-throughput measurement of protein outputs and computational models that can be trained on these data. An important development has been the introduction of ribosome footprinting (also known as ribosome profiling), a technique for capturing and sequencing the footprints of translating ribosomes (RPFs) on individual mRNAs [ 2 ]. The ratio of normalized RPFs and RNA-seq reads over the coding region is used as an estimate of “translation efficiency” (TE), which is considered a proxy for the synthesis rate of the encoded protein [ 2 ]. Ribosome footprinting has been applied to a variety of cells and organisms [ 11 ], yielding new mechanistic and regulatory insights (e.g., [ 12 , 13 ]). An early study of yeast translation concluded that up to 58% of the variance in TE can be explained with 6 parameters, though the most predictive was the mRNA-level expression of the gene, which is not a feature that can be derived from the sequence of the mRNA [ 6 ].

At the same time, massively parallel reporter assays (MPRA) were developed to measure translation for large libraries of reporter constructs, further used to train deep learning (DL) models. A convolutional neural network (CNN) [ 14 ] explained 93% of the variance in the mean ribosome load (MRL) of reporter constructs, but less, 81%, for 5′UTR fragments taken from endogenous mRNAs. The CNN also recovered some of the important regulatory elements such as uORFs [ 14 ]. More recently, a novel experimental design was used to accurately measure the output of yeast reporters driven by natural 5′UTRs [ 15 ], while novel DL architectures and training approaches aimed to improve prediction accuracy [ 16 , 17 ]. A potential limitation of DL models built from synthetic sequences is that it is a priori unclear whether the training set contains the regulatory elements that are relevant in vivo and whether the features extracted by the model generalize well across systems such as cell types and readouts of the process of interest. These bottlenecks may limit not only the understanding of regulatory mechanisms but also the use of derived models for predictions of the functional impact of sequence variations and for construct design. To assess whether these issues impact the current RNA sequence-based models of translation, we carried out a detailed comparison of model performance in a standardized setting that uses complementary data sets obtained in two distinct cell types. We trained and applied models to the prediction of translation output in yeast and human cells, addressing the following questions: (1) are models trained on synthetic sequences able to predict the translation output of endogenous mRNAs in the same cellular system? (2) do these models generalize between different cellular systems (different cell types, different species)? (3) what is their parameter-efficiency (fraction of explained variance per model parameter)? (4) what are the conserved regulatory elements of translation that have so far been learned by DL models?

Experimental measurements of translation output

The current method of choice for measuring the translation output of endogenous mRNAs is ribosome footprinting, consisting in the purification and sequencing of mRNA fragments that are protected from RNase digestion by translating ribosomes [ 2 ]. The TE of an mRNA is then estimated as the ratio of ribosome-protected fragments (RPFs) obtained from the mRNA by ribosome footprinting and coding-region-mapped reads obtained by RNA-seq from the same sample [ 18 ]. Ribosome footprinting has been applied to many systems, including yeast cells [ 6 , 19 ] and the human embryonic kidney cell line HEK 293 [ 20 ], for which a variety of omics measurements are available. Importantly, MPRAs of translation were carried out in these cell types, giving us the opportunity to determine whether reporter-based models can predict translation of endogenous mRNAs in a given cell type. Figure 1 summarizes the main approaches used to measure translation in yeast and human cells, starting from the just-described ribosome footprinting technique (Fig. 1A). The MPRA approach used by [ 14 ] to generate the Optimus50/100 MPRA data sets (Fig. 1B) consists in the transfection of in vitro-transcribed mRNAs with randomized 5′UTRs upstream of the eGFP coding region into HEK 293 cells, followed by sequencing of RNAs from polysome fractions. The MRL, i.e., the average number of ribosomes on individual mRNAs, is derived from the abundance profile of individual mRNAs along polysome fractions. In another approach, called DART (for direct analysis of ribosome targeting), Niederer and colleagues [ 15 ] synthesized in vitro translation-competent mRNAs consisting of natural 5′UTRs and a very short (24 nucleotides (nts)) coding sequence. A few mutations were introduced in the 5′UTRs, as necessary to unambiguously define the translation start. After incubation with yeast extract, ribosome-associated mRNAs were isolated and sequenced, and a ribosome recruitment score (RRS) of an mRNA was calculated as the ratio of its abundance in the ribosome-bound fraction relative to the input mRNA pool (Fig. 1C). A previously developed MPRA used plasmids containing randomized 5′UTRs placed upstream of the HIS3 gene to transform yeast cells lacking a native copy of the gene [ 21 ]. The amount of HIS3 protein generated from the reporters was assessed in a competitive growth assay, by culturing the yeast cells in media lacking histidine and using the enrichment of reporter constructs in the output vs. the input culture as a measure of HIS3 expression (Fig. 1D).
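As a rough illustration of the three readouts described above, the sketch below computes a TE, an MRL and an RRS from toy numbers. The function names, the dictionary-based input layout and the library-size normalization are assumptions made for this example; they are not the processing pipelines of the cited studies.

```python
def translation_efficiency(rpf_counts, rna_counts, gene):
    """TE of one gene: normalized RPF signal divided by normalized RNA-seq signal.

    Assumption: each library is normalized by its total read count.
    """
    rpf_share = rpf_counts[gene] / sum(rpf_counts.values())
    rna_share = rna_counts[gene] / sum(rna_counts.values())
    return rpf_share / rna_share


def mean_ribosome_load(fraction_abundance):
    """MRL: abundance-weighted average number of ribosomes per mRNA.

    fraction_abundance maps the number of ribosomes in a polysome
    fraction to the abundance of the mRNA in that fraction.
    """
    total = sum(fraction_abundance.values())
    return sum(n * a for n, a in fraction_abundance.items()) / total


def ribosome_recruitment_score(bound_abundance, input_abundance):
    """RRS: abundance in the ribosome-bound fraction relative to the input pool."""
    return bound_abundance / input_abundance


rpf = {"geneA": 30, "geneB": 70}  # toy ribosome-protected fragment counts
rna = {"geneA": 50, "geneB": 50}  # toy RNA-seq counts
print(translation_efficiency(rpf, rna, "geneA"))
print(mean_ribosome_load({1: 10, 2: 30, 3: 60}))
print(ribosome_recruitment_score(8.0, 4.0))
```

All three quantities are simple ratios, which is why noise in either the numerator or the denominator carries over directly into the estimate.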

Fig. 1 Experimental approaches to quantifying translation output. A Sequencing of total mRNA and ribosome-protected fragments of endogenous mRNAs is used to estimate the translation efficiency per mRNA. B Massively parallel reporter assays (MPRA) measure the output of constructs consisting in randomized 5′UTRs attached to the coding region of a reporter protein. Sequencing of polysome fractions enables the calculation of a mean ribosome load per construct, which is used as a measure of translation output. C DART follows a similar approach with endogenous 5′UTRs, once upstream AUGs (uAUGs) located in the 5′UTR are mutated to AGU to avoid ambiguity in translation start sites. D In an alternative MPRA in yeast, the enrichment of 5′UTRs driving expression of a protein required for growth served as proxy for the translation output of the respective constructs. More details can be found in the Methods - Experimental methods section

The reproducibility of experimental measurements sets an upper bound on the accuracy of models’ predictions of different types of data. For MPRA data sets the \(R^2\) of replicate measurements is usually very high, with values of 0.95 being generally reported [ 14 ]. In contrast, the reproducibility of TE, which is a ratio of two variables (ribosome footprints and RNA-seq fragments mapping to a given mRNA), is generally lower. In the HEK 293 ribo-seq data set that we analyzed [ 20 ], the \(R^2\) for RPFs was in the range \(0.77-0.82\), while for mRNA-seq it was 0.96, leading to an \(R^2\) of \(0.47-0.52\) for the TE estimates (Additional file 1: Fig. S1). We further obtained an additional ribo-seq data set from another human cell line, HepG2, with the aim of exploring the limits of replicate reproducibility of this type of measurement and evaluating the conservation of TE between cell types, which is also important when applying a model trained in a particular cell type to predict data from another cell type (cf. Additional file 1: Fig. S2). The TE estimates from HepG2 cells were more reproducible, with an \(R^2\) for replicates of \(0.68-0.8\). When comparing the TE estimates from HEK 293 and HepG2 cells, we obtained an \(R^2 = 0.31\), which would be an upper bound on the accuracy of a model trained on one of these data sets in predicting the TEs in the other cell line. The general reproducibility of translational efficiency as well as coverage in RNA sequencing and ribosome footprinting in yeast (data from [ 19 ]) appears to be of similar quality as the HepG2 data, as can be seen from Additional file 1: Fig. S3.
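To make the reproducibility measure concrete, the following minimal sketch scores two replicate vectors by their squared Pearson correlation. Treating \(R^2\) as the squared Pearson correlation, and the function name `replicate_r2`, are assumptions for illustration; the text does not specify whether values were, for example, log-transformed first.

```python
def replicate_r2(x, y):
    """Squared Pearson correlation between two replicate measurements."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Toy replicates: values are invented for the example.
replicate_1 = [1.0, 2.0, 3.0, 4.0]
replicate_2 = [1.0, 3.0, 2.0, 4.0]
print(replicate_r2(replicate_1, replicate_2))
```

Because a model cannot be expected to agree with one replicate better than another replicate does, a score of this kind between replicates (or between cell lines) bounds the accuracy one can hope to measure.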

To ensure comparability of our results with those of previous studies, we aimed to replicate their model training strategy, which generally involved setting aside the highest quality data (constructs with the largest number of mapped reads) for testing and using the rest for training [ 14 , 16 ]. High expression is not the only determinant of measurement accuracy for endogenous mRNA data sets. For example, in yeast, the integrity of the sequenced RNAs was previously identified as a key source of noise for TE estimates [ 6 ]. A proxy for RNA integrity is the transcript integrity score (TIN) [ 22 ], which quantifies the uniformity of coverage of the RNA by sequenced reads and ranges from 0 (3′ bias) to 100 (perfectly uniform coverage). As the TE reproducibility in HEK 293 cells increased with the TIN score ( \(R^2 = 0.67-0.75\) for TIN \(> 70\) vs. \(0.47-0.52\) for all), for the human endogenous data, we used the mRNAs with TIN \(>70\) ( \(\approx\) 10% of mRNAs) for testing and all the others for training. In yeast, however, the reproducibility of TE does not depend on TIN (Additional file 1: Fig. S3), indicating that RNA degradation is much less dominant here than in human cells. As can be seen from Additional file 1: Fig. S4, selection of the test data set based on the TIN score does not introduce any bias in 5′UTR length, main ORF length, or TE in HEK 293 cells. The situation for yeast, however, is slightly different; transcripts with higher TIN score have higher TE. Also, in the yeast data, the increase in inter-replicate reproducibility of TE when selecting the transcripts with TIN \(>70\) is negligible. Therefore, we used three different random splits for the endogenous yeast data set and provide the average performance.
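The two splitting strategies described above can be sketched as follows. The record layout (dictionaries with a `"tin"` key), the function names, the seeds and the 10% test fraction used for the random splits are hypothetical choices for this example; the text only states the TIN threshold of 70 and the use of three random splits.

```python
import random


def split_by_tin(transcripts, threshold=70.0):
    """Hold out high-integrity transcripts (TIN above threshold) for testing."""
    test = [t for t in transcripts if t["tin"] > threshold]
    train = [t for t in transcripts if t["tin"] <= threshold]
    return train, test


def random_splits(transcripts, test_fraction=0.1, seeds=(0, 1, 2)):
    """Return one (train, test) pair per seed; test_fraction is an assumption."""
    splits = []
    for seed in seeds:
        shuffled = list(transcripts)
        random.Random(seed).shuffle(shuffled)
        n_test = max(1, round(test_fraction * len(shuffled)))
        splits.append((shuffled[n_test:], shuffled[:n_test]))
    return splits


# Toy records with invented TIN scores.
records = [{"id": i, "tin": tin} for i, tin in enumerate([35, 55, 68, 72, 90, 40, 61, 66, 50, 45])]
train, test = split_by_tin(records)
print(len(train), len(test))
print([len(test_part) for _, test_part in random_splits(records)])
```

Averaging a performance metric over the pairs returned by `random_splits` corresponds to the reporting scheme described for the yeast data.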

Models for predicting translation output from the mRNA sequence

To explain the translation efficiency estimated by ribosome footprinting in yeast, Weinberg et al. [ 6 ] proposed a simple, 6-parameter linear model with the following features: lengths of the CDS and 5′UTR, G/C content of the 5′UTR, number of uAUGs, free energy of folding of the 5′cap-proximal region and the mRNA abundance. This linear model was surprisingly accurate ( \(R^2 = 0.58\) ) in predicting the efficiency, though leaving out the mRNA level reduced the \(R^2\) to 0.39. Here, we use a similar model as baseline to assess the parameter efficiency of DL models, i.e., the fraction of explained variance per model parameter. The features of our linear model are as follows: we use the same length and G/C content measures, the 5′UTR folding free energy divided by the 5′UTR length, the number of out-of-frame upstream AUGs (OOF uAUGs), the number of in-frame upstream AUGs (IF uAUGs), and the number of exons in the mRNA [ 23 ]. A bias term adds an additional parameter.
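A minimal sketch of how the sequence-derived features of such a linear model could be computed is given below. The function name, the returned keys and the convention that the 5′UTR string ends immediately before the main start codon are assumptions for this example. The folding free energy and the exon count are omitted because they require an external folding tool and a transcript annotation, respectively.

```python
def utr_features(utr, cds_length):
    """Sequence features of a 5'UTR that ends right before the main AUG."""
    utr = utr.upper().replace("T", "U")
    n = len(utr)
    gc_content = sum(base in "GC" for base in utr) / n if n else 0.0

    in_frame = 0
    out_of_frame = 0
    for i in range(n - 2):
        if utr[i:i + 3] == "AUG":
            # Distance to the main start codon decides the reading frame.
            if (n - i) % 3 == 0:
                in_frame += 1
            else:
                out_of_frame += 1

    return {
        "utr_length": n,
        "cds_length": cds_length,
        "gc_content": gc_content,
        "if_uaug": in_frame,
        "oof_uaug": out_of_frame,
    }


print(utr_features("GGAUGCCAUGA", cds_length=900))
```

A linear model would then combine these values with one weight per feature plus the bias term mentioned above.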

The first type of DL architecture trained on MPRA data was the Optimus 5-Prime CNN [ 14 ], operating on one-hot-encoded 5′UTR sequences. Optimus 5-Prime has 3 convolutional layers, each with 120 filters of 8 nucleotides (nts), followed by two dense layers separated by a dropout layer. The output of the last layer is the predicted translation output, i.e., the MRL for the HEK 293 cell line data set (Fig. 1B, Fig. 2A) and the relative growth rate for yeast cells [ 21 ] (Fig. 1D). While reaching very good performance in predicting the MRL for synthetic sequences, Optimus 5-Prime could only make predictions for UTRs of up to 100 nts, which account for only \(\sim\)32% of annotated human 5′UTRs. Longer 5′UTRs could be accommodated by truncation to the upstream vicinity of the start codon. The Optimus 5-Prime model just described has 474,681 parameters.
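The sketch below illustrates one-hot encoding and a parameter count for a network with the layer layout described above. The input length of 50 nts, "same" padding in the convolutions and a 40-unit hidden dense layer are assumptions not stated in the text; they are used here because, together with the stated 3 layers of 120 filters of width 8, they reproduce the quoted total of 474,681. The function names are illustrative only.

```python
def one_hot(seq, alphabet="ACGU"):
    """Encode a nucleotide sequence as a list of 4-element indicator vectors."""
    seq = seq.upper().replace("T", "U")
    return [[1.0 if base == letter else 0.0 for letter in alphabet] for base in seq]


def conv1d_params(in_channels, filters, kernel):
    """Weights plus one bias per filter."""
    return in_channels * kernel * filters + filters


def dense_params(n_in, n_out):
    """Weights plus one bias per output unit."""
    return n_in * n_out + n_out


def optimus_like_param_count(seq_len=50, filters=120, kernel=8, hidden=40):
    """Parameter count for 3 conv layers + 2 dense layers (assumed sizes)."""
    total = conv1d_params(4, filters, kernel)              # first conv on one-hot input
    total += 2 * conv1d_params(filters, filters, kernel)   # second and third conv
    total += dense_params(seq_len * filters, hidden)       # flattened conv output
    total += dense_params(hidden, 1)                       # scalar prediction
    return total


print(one_hot("AUG"))
print(optimus_like_param_count())
```

Because the first dense layer takes the flattened convolution output, its size grows with the input length, which is one way to see why a model of this shape is tied to a fixed maximum 5′UTR length.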

Fig. 2 Architectures of different artificial neural networks used to predict the output of translation. Optimus 5-Prime [ 14 ] uses three convolutional layers and two dense layers ( A ), Framepool ( B ) is similar, but with a customized frame-wise pooling operation between convolutional and dense layers [ 16 ], and MTtrans ( C ) stacks a “task-specific” tower of two recurrent and one dense layers on top of a “shared encoder” of four convolutional layers. D represents an approach entirely relying on recurrent layers; it is built from two bidirectional LSTM layers, followed by a dropout and fully connected layer. TranslateLSTM ( E ) consists of three sub-networks: a two-layer bidirectional LSTM network for the 5′UTRs, another two-layer bidirectional LSTM network for the first 100 nts of the CDS, and non-sequential input features previously found to control translation. For further information, we refer to the Methods - Model architectures section

The 5′UTR length limitation was overcome by Framepool [ 16 ], another CNN containing 3 convolutional layers with 128 7-nts filters (Fig.  2 B). Importantly, Framepool slices the output of the third convolutional layer according to the frame relative to the start codon, then pools the data frame-wise, taking the maximum and average values for each pool. This allows both the processing of sequences of arbitrary length and the detection of motifs in specific reading frames. The frame-wise results are the input for the final two dense layers. For variable length 5′UTRs, Framepool was shown to yield somewhat improved predictions relative to Optimus 5-Prime [ 16 ], with a smaller number of parameters, 282,625.
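The frame-wise pooling step can be sketched in plain Python as below. This is an illustration of the idea rather than the Framepool implementation: the frame convention (distance of a position from the 3′ end of the 5′UTR, modulo 3), the zero fill for empty frames, the function names and the 64-unit hidden dense layer are assumptions. With 3 layers of 128 filters of width 7 as stated above, that assumed hidden size reproduces the quoted 282,625 parameters.

```python
def framewise_pool(activations):
    """Pool conv activations per reading frame; output length is 6 * channels.

    activations: list of positions (5'->3'), each a list of channel values.
    For every frame, the per-channel maxima are emitted first, then the means.
    """
    length = len(activations)
    channels = len(activations[0]) if length else 0
    pooled = []
    for frame in range(3):
        rows = [activations[i] for i in range(length) if (length - 1 - i) % 3 == frame]
        for c in range(channels):
            column = [row[c] for row in rows]
            pooled.append(max(column) if column else 0.0)
        for c in range(channels):
            column = [row[c] for row in rows]
            pooled.append(sum(column) / len(column) if column else 0.0)
    return pooled


def framepool_like_param_count(filters=128, kernel=7, hidden=64):
    """Parameter count for 3 conv layers, frame-wise pooling, 2 dense layers."""
    conv = (4 * kernel * filters + filters) + 2 * (filters * kernel * filters + filters)
    pooled_size = 3 * 2 * filters  # 3 frames x (max, mean) x channels
    dense = (pooled_size * hidden + hidden) + (hidden + 1)
    return conv + dense


print(framewise_pool([[1.0], [2.0], [3.0], [4.0]]))
print(len(framewise_pool([[0.5, 0.1]] * 37)))
print(framepool_like_param_count())
```

The pooled vector has the same length for any input length, so the dense layers, and therefore the parameter count, do not depend on the 5′UTR length.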

MTtrans [ 17 ] is the most recently proposed DL model (Fig. 2C). Its basic premise is that the elements controlling translation generalize across data sets generated with different experimental techniques. Each data set is viewed as a task. The model combines 4 convolutional layers with batch normalization, regularization and dropout, with two bidirectional gated recurrent unit (GRU) layers and a final dense layer. GRUs are recurrent layers that process the input sequence token by token and map these to an internal space that allows contextual information to be preserved. Inputs of different lengths can naturally be processed by this architecture. The 4 convolutional layers, which the authors called “shared encoder,” are assumed to be universal among different prediction tasks, while the recurrent and dense layers (“task-specific tower”) are specific to each task. The shared encoder is therefore trained on multiple data sets, while the task-specific tower is trained only on the respective task. In comparison to Optimus 5-Prime and Framepool, MTtrans provides an increase in \(R^2\) of \(0.015-0.06\) in prediction accuracy, depending on the data set [ 17 ]. Interestingly, training MTtrans on multiple data sets at once rather than in a sequential, task-specific manner achieved a similar effect. While we were able to obtain the code for MTtrans from [ 24 ], we were unable to run the code “out-of-the-box.” Therefore, we set up MTtrans as described in the conference publication [ 17 ], though this left many details unclear, such as the exact layout of the “task-specific tower,” its recurrent and dense layers, the number of training epochs, the exact training rate schedule, and criteria for early stopping. It also led to a different number of parameters in our implementation, 776,097, compared to the number reported by the authors, \(\sim 2.1\) million. Consequently, we trained with a callback that automatically stops training once overfitting is reached and restores the best weights. Although, in our experience, these details have only a minor impact on the model performance, we note that our results differ to some extent from those reported in [ 17 ]. The use of GRUs in the task-specific tower allows MTtrans to predict output for any 5′UTR length.

As DL models become increasingly parameter-rich, their performance improves only marginally, leading to a decrease in the gained accuracy per parameter. We were therefore interested in whether the parameter-efficiency of DL models can be improved, i.e., whether the top performance can be achieved with smaller rather than larger models. To address this, we turned to long short-term memory networks (LSTMs), a variety of recurrent neural networks (RNNs) designed to detect and take advantage of long-range dependencies in sequences [ 25 ]. While such dependencies are expected in structured 5′UTRs, LSTMs have not yet been applied to the prediction of translation output. We therefore implemented two LSTM-based architectures: one operating only on 5′UTR sequences and a second one, TranslateLSTM, operating not only on 5′UTRs but also on the first 100 nts of the associated coding regions and the non-sequential features of the linear model described above. The extended TranslateLSTM allows for factors such as the secondary structure and codon bias in the vicinity of the start codon [ 5 ] to impact the translation output. One-hot-encoded sequences are fed into two bidirectional LSTM layers, the outputs of the second layers are concatenated and sent to a dense layer, which predicts the output (Fig.  2 D). TranslateLSTM has 268,549–268,552 parameters, while the 5′UTR-only LSTM model has 134,273 parameters. We further note that, depending on the experimental design, not all data sets to which a given model is applied require the same number of parameters. For instance, a data set in which all sequences have the same length, like Optimus50, does not require the sequence length as a parameter in TranslateLSTM or the linear model. Similarly, as the first 100 nts of the CDS are the same in all MPRA data sets, the associated parameters are not needed in TranslateLSTM, which reduces the number of parameters to about 50% relative to the full model.
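
The parameter count quoted for the 5′UTR-only LSTM can be sanity-checked with a short calculation. The sketch below is not the authors' code. It assumes a 4-channel one-hot input, the single-bias-per-gate convention for LSTM layers (as used, for example, in Keras), a scalar output from the dense layer, and 64 hidden units per direction. The hidden size is our assumption and is not stated in the text; it is chosen because it reproduces the reported count of 134,273.

```python
def lstm_params(input_dim: int, units: int) -> int:
    """Trainable parameters of one unidirectional LSTM layer.

    Four gates, each with input weights, recurrent weights and one bias vector.
    """
    return 4 * (units * (input_dim + units) + units)


def bilstm_stack_params(input_dim: int, units: int, n_layers: int = 2) -> int:
    """Parameters of stacked bidirectional LSTM layers plus a scalar dense output."""
    total = 0
    dim = input_dim
    for _ in range(n_layers):
        total += 2 * lstm_params(dim, units)  # forward and backward direction
        dim = 2 * units  # concatenated directions feed the next layer
    total += dim + 1  # dense layer: one weight per feature plus a bias
    return total


HIDDEN_UNITS = 64  # assumption: not given in the text
print(bilstm_stack_params(input_dim=4, units=HIDDEN_UNITS))  # 134273
```

Under these assumptions, nearly all parameters sit in the recurrent layers, which is why dropping one sequence input (such as the constant CDS in MPRA data) roughly halves the size of the full model.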

Available DL models do not generalize well across experimental systems

The results of our comprehensive tests of prediction accuracy of all models across multiple data sets are summarized in Fig.  3 A. The most salient result is that differences in performance between DL models applied to a particular data set are much smaller than differences between applications of the same model to distinct data sets. In particular, DL models can be trained on synthetic constructs to predict the output of held-out constructs, but they cannot be trained well on TE data to predict the translation of endogenous mRNAs (compare lines 1,2,3 and 4,5 in Fig.  3 A). To make sure that the test data selection strategy is not the issue, we also tested three different random splits and stratified splits (enforcing similar distributions in test and train sets) along the TIN and TE axes as selection strategies for the test data in HEK 293 cells for Optimus 5-Prime and TranslateLSTM. These splits showed comparable but mildly worse performance (difference in \(R^2\) around 0.02). The stratified split along the TE-axis performed the worst, whereas the random split and the stratified split along the TIN-axis yielded \(R^2\) values only 0.01 smaller than test data selection based on the TIN, where only the transcripts with the highest TIN scores were used for testing. Figure  3 B shows scatter plots of the ribosome load predicted by each of the discussed DL architectures against the measured values. OOF uAUGs clearly and strongly inhibit translation. Moreover, the TranslateLSTM predictions are most uniformly spread around the diagonal, as measured by the sum of the differences between predicted and measured MRL for every model, where we find \(-360\) (TranslateLSTM) vs. \(-652\) (Optimus 5-Prime) vs. \(-762\) (Framepool) vs. \(-461\) (MTtrans). The size of the training data set is not strictly a limiting factor, because DL models can be trained to some extent on the relatively small DART data set of \(\sim 7,000\) natural yeast 5′UTRs (Fig.  3 A, l. 6). 
Furthermore, models trained on synthetic 5′UTRs do not predict the TE of endogenous mRNAs measured in the same cell type (see Fig.  3 A ls. 7,8). This reduced performance was previously attributed to the different experimental readout and to the influence of the coding sequence, which is the same in MPRA, but different in TE assays [ 16 ]. To test this, we applied the models trained on human MPRA data to the prediction of MPRA data from yeast and vice versa. This involved not only a very different readout of translation (MRL in human, growth-dependent enrichment of 5′UTRs in yeast) but also entirely different organisms. In both cases, the cross-system performance was substantially higher, \(R^2 = 0.41\) and 0.64 (cf. Fig.  3 A ls. 9,10), compared to the performance of the model trained on synthetic data in predicting the TE data in the same cell type. Thus, the type of readout is not the main factor behind the reduced predictability of TE data. Another limiting factor, not discussed before, could be the accuracy of the experimental measurements. MPRA and DART-based measurements are very reproducible, with \(R^2 \approx 0.95\) , while the TE estimates are much less so ( \(R^2 \approx 0.5\) for the HEK 293 data set, Additional file 1: Fig. S1). Thus, the TE data may be less predictable because it is also noisier. However, the measurement accuracy is not a factor in the highly reproducible DART experiments, yet models trained on synthetic construct data from yeast could not predict the RRS measured in the DART experiment, also done in yeast. Altogether, these results indicate that synthetic sequences differ substantially from natural, evolved 5′UTRs, so that models trained on synthetic data cannot adequately capture the 5′UTR features that are relevant for the translation of endogenous mRNAs. 
We also applied a transfer-learning strategy to human HEK 293 data, where we first trained the models on the Optimus100 data set, then re-trained the last layer on endogenous data, and finally trained the entire network on the endogenous data for some epochs. For the models that did not specify a certain number of training epochs, training was terminated automatically by a callback function with a patience of 10 epochs. Typically, that led to \(\sim 30\) epochs of pre-training, \(\sim 50\) epochs of re-training the last layer, and \(\sim 15\) epochs of fine-tuning the entire network. The results are displayed in Fig.  3 A l. 13. Applying transfer learning indeed led to a small performance increase of 0.04 in \(R^2\) .
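
The stopping rule used in each training phase can be illustrated with a framework-independent sketch. This is not the authors' implementation: the function name is hypothetical, and we assume that the monitored quantity is a validation loss, which the text does not specify. Training stops once the loss has not improved for `patience` consecutive epochs, and the epoch with the best loss identifies the weights to restore.

```python
from typing import Iterable, Tuple


def early_stopping(val_losses: Iterable[float], patience: int = 10) -> Tuple[int, int]:
    """Return (best_epoch, last_epoch_run), both 1-based.

    `val_losses` stands in for a training loop that yields one validation
    loss per epoch (an assumption made for this illustration).
    """
    best_loss = float("inf")
    best_epoch = 0
    last_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        last_epoch = epoch
        if loss < best_loss:
            # A real training loop would snapshot the model weights here.
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop, restore the snapshot.
            break
    return best_epoch, last_epoch


# Toy validation curve: improves for 20 epochs, then overfits.
toy = [1.0 / e for e in range(1, 21)] + [0.06 + 0.001 * k for k in range(1, 40)]
print(early_stopping(toy, patience=10))  # (20, 30)
```

In the three-phase transfer-learning schedule, the same rule would be applied separately to pre-training, last-layer re-training, and fine-tuning.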

figure 3

Performance of all evaluated models in different application scenarios ( A ), measured by the squared Pearson correlation coefficient \(R^2_{\text {Pearson}}\) between experimentally measured and predicted translation output in the test data. Different random splits of the DART data lead to variations in \(R^2\) of \(\lesssim 0.02\) , with the exception of the Framepool model, which had differences of up to 0.16 between splits. The average of the correlation of TE between different replicates for the endogenous HEK 293 and yeast data sets serves as a theoretical upper bound on the predictive power of the model, imposed by measurement reproducibility. Values for DART and Optimus50 data sets were taken from the corresponding publications [ 14 , 15 ]. B Correlation of predicted and true ribosome load of four model architectures trained on the Optimus100 data set. OOF uAUGs clearly inhibit translation initiation. TranslateLSTM predicts the most even scattering pattern around the diagonal, as measured by the sum of the differences between predicted and measured ribosome load for all transcripts in the test set.

As seen above, the yeast-human cross-species prediction accuracy is substantial, indicating that the translation regulatory elements inferred from synthetic constructs in the two species are partially conserved. Given that the cell type in which a model is developed will generally differ from the cell type where model predictions are of interest, we asked whether the TEs of human mRNAs are largely similar across human cell lines. We thus generated ribosome footprinting data from the human HepG2 liver cancer cell line and compared the TEs inferred from this cell line with those from the HEK 293 cell line data set. The HepG2 estimates of TE were more reproducible than those from the HEK 293 cells ( \(R^2 = 0.68-0.8\) for replicate experiments). The TEs estimated from HEK 293 were moderately similar to those from HepG2 cells ( \(R^2 = 0.31\) ), especially when considering mRNAs with TIN \(> 70\) ( \(R^2 = 0.44\) ). This indicates that within an organism, transcript-intrinsic properties contribute substantially to the variation in translation output relative to the cellular context. This is a good basis for developing models in model systems, provided that the protocol allows for highly accurate measurements of translation output (Additional file 1: Fig. S2).
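
The comparison between cell lines can be sketched as follows. The function names are hypothetical, and the sketch assumes that TE values for the same transcripts are available as aligned lists. Whether the TEs are compared on a log scale is not stated here, so any transformation is left to the caller.

```python
from typing import Sequence


def pearson_r2(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy * sxy / (sxx * syy)


def r2_above_tin(te_a: Sequence[float], te_b: Sequence[float],
                 tin: Sequence[float], min_tin: float = 70.0) -> float:
    """R^2 between two cell lines, restricted to transcripts with TIN > min_tin."""
    keep = [i for i, t in enumerate(tin) if t > min_tin]
    return pearson_r2([te_a[i] for i in keep], [te_b[i] for i in keep])


print(pearson_r2([1, 2, 3, 4], [1, 3, 2, 4]))  # approximately 0.64
```

Restricting the comparison to high-TIN transcripts removes measurements from degraded RNA, which is one plausible reason for the higher agreement reported for TIN \(> 70\).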

Although DL models are not generally benchmarked against simple models with more limited predictive power, this test provides an assessment of parameter-efficiency (gain in predictive power per parameter) as well as insights into model interpretation. Trained on synthetic construct data, the 8-parameter linear model described above could explain as much as 60% of the variance in the respective test sets, which is quite remarkable given the size of the model. In addition, this model could also be trained to some extent on TE measurements of endogenous mRNAs. Strikingly, the accuracy of cross-system predictions of synthetic construct-based DL models is similar to the accuracy of linear models inferred from the respective data sets. This indicates that the conserved mechanisms of translation control learned by the DL architectures from synthetic sequences are represented in a small number of features and that currently available DL architectures are heavily over-parameterized.

Reporter sequences differ in translation-relevant properties from endogenous mRNAs

To further identify the most predictive and conserved features, we inspected the weights learned by the linear model from individual data sets (Additional file 1: Fig. S5A). We found that only the uAUGs, especially those located out-of-frame (OOF) with respect to the mORF, consistently contributed to the prediction of translation output across all systems. OOF uORFs/uAUGs are known to repress translation, by hindering ribosome scanning towards the mORF [ 26 ] and triggering the mechanism of nonsense-mediated mRNA decay [ 27 ]. uAUGs contribute much more to the translation output of human or yeast reporter constructs than to that of endogenous mRNAs, which is a reflection of differences in sequence composition between synthetic and natural 5′UTRs (Additional file 1: Fig. S6). To gain further insight into the sequence features learned by the LSTMs, we visualized the contributions (Shapley values) of single nucleotides in test sequences to the output of the LSTM architecture using the SHAP package [ 28 ]. While the inhibitory effect of a uAUG becomes evident for representative sequences from the Optimus50 data set (Additional file 1: Fig. S5C), this is not the case for sequences from the HEK 293 data, where individual nucleotides composing the AUG codon may even have contributions of opposite signs (Additional file 1: Fig. S5D). A superposition of 200 high-TIN sequences from the HEK 293 data set in Additional file 1: Fig. S5E shows position-dependent nucleotide biases that contribute to the translation output of endogenous sequences (with the caveat of a small predictive power in this setting). Specifically, C nucleotides contribute positively when located upstream and in the vicinity of the start codon, while G nucleotides contribute negatively, especially when located at the 5′ end, downstream of the cap. 
Thus, test examples, the weights of the linear model, and the visualization of the effect of individual nucleotides on the LSTM predictions all suggest that models trained on synthetic sequences will incorrectly weigh the translation-relevant features they learned from these sequences when predicting the output of natural 5′UTRs, leading to reduced prediction accuracy. To illustrate this, we carried out a simulation using the Optimus50 data set: we set aside the 20,000 constructs with the highest coverage in mRNA sequencing for testing as before but trained the Optimus 5-Prime model on the subset of remaining constructs that did not contain uAUGs. As shown in Additional file 1: Fig. S5B, the resulting model performs poorly on the test set, specifically on the subset of test sequences that do contain uAUGs. However, the model trained on the entire spectrum of sequences, which could, in principle, learn all regulatory elements of translation, does not predict the translation output of the DART data set of natural yeast 5′UTRs lacking uAUGs (see l. 11 of Fig.  3 A). These results demonstrate that the similarity of the distributions of translation-relevant features between training and test sets is key to the ability of the DL model to generalize. Having undergone extensive selection under a variety of constraints, endogenous 5′UTRs likely accumulated multiple elements that control their translation, elements that are probably not represented among synthetic 5′UTRs. This leads to large differences in performance when models trained on synthetic data are applied to other data sets.

Previous studies reached different conclusions concerning the impact of IF uAUGs on translation [ 15 , 21 , 29 , 30 ]. To clarify this, we determined the relationship between the location of OOF and IF uAUGs in the 5′UTR and the translation output of the mRNAs, in both yeast and human sequences, synthetic or endogenous. To avoid a superposition of effects from multiple uAUGs, we analyzed only constructs with a single uAUG in the 5′UTR. As shown in Additional file 1: Fig. S7A-F, the repressive effect of IF uAUGs increases with their distance from the mORF, while the repressive effect of OOF uAUGs on the translation of synthetic constructs only weakly depends on the position. The data for endogenous mRNAs was too noisy to verify or falsify the trend observed in synthetic data (Additional file 1: Fig. S7E, F). These results indicate that both the frame and the distance of uAUGs with respect to the mORF should be taken into account when predicting their impact on translation.
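
The frame and distance of uAUGs relative to the mORF can be derived directly from the 5′UTR sequence. The following is a minimal sketch, not the authors' code. It assumes that the 5′UTR string ends immediately before the A of the main start codon, and the class and function names are hypothetical.

```python
from typing import List, NamedTuple


class UAUG(NamedTuple):
    position: int   # 0-based index of the A of the uAUG in the 5'UTR
    distance: int   # nucleotides from the A of the uAUG to the A of the main AUG
    in_frame: bool  # True if the uAUG is in the reading frame of the mORF


def find_uaugs(utr: str) -> List[UAUG]:
    """List all uAUGs in a 5'UTR, accepting DNA (T) or RNA (U) letters."""
    seq = utr.upper().replace("T", "U")
    hits = []
    start = seq.find("AUG")
    while start != -1:
        distance = len(seq) - start
        # In frame if a whole number of codons separates the two AUGs.
        hits.append(UAUG(start, distance, distance % 3 == 0))
        start = seq.find("AUG", start + 1)
    return hits


print(find_uaugs("GGAUGCCC"))
# [UAUG(position=2, distance=6, in_frame=True)]
```

Selecting constructs with a single uAUG, as in the analysis above, then amounts to keeping sequences for which `len(find_uaugs(utr)) == 1`.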

A more accurate and parameter-efficient DL model to predict the impact of 5′UTR sequence variation on translation

To provide a more accurate model of endogenous mRNA translation, accommodating different constraints on uAUGs and improving parameter-efficiency, we turned to LSTM-based architectures. The two architectures that we implemented, LSTM and TranslateLSTM (see Fig.  2 ), performed similarly on the synthetic data sets and were more accurate than the other DL models tested. The largest performance gain was reached for RNAs with IF uAUGs, as may be expected from the model’s treatment of sequence context (Additional file 1: Fig. S8). The similar performance of LSTM and TranslateLSTM on synthetic data indicates that the LSTM can learn correlates of the non-sequential features represented in TranslateLSTM. However, these features were important for the performance of TranslateLSTM on the endogenous HEK 293 TE data (Fig.  3 A and Additional file 1: Fig. S5A).

To demonstrate the relevance of DL models for interpreting the functional significance of single nucleotide polymorphisms (SNPs), Sample et al. [ 14 ] measured the MRL of constructs with 50 nts-long fragments of natural 5′UTRs as well as of variants with naturally occurring SNPs. TranslateLSTM predicted the measured MRL of these sequences better than the Optimus 5-Prime model (Fig.  4 A, B). However, in this experiment, 5′UTR sequences were taken out of their endogenous context, which, as we have shown above, is important for the prediction of translation output and thereby functional impact. Therefore, we sought to improve the prediction of SNP effects on translation by taking advantage of the insights provided by our analyses. We used transfer learning (TL) to extract information from both synthetic and endogenous 5′UTRs, and we applied the resulting model to all the 5′UTR-located SNPs from the ClinVar database [ 31 , 32 ], in their native 5′UTR context. Of the 2,300,005 SNPs, 84,128 were located in 5′UTRs, and of these, 7238 were located in mRNA isoforms (one per gene) expressed and with measured TE in HEK 293. As shown in Fig.  3 A l. 13, the TL strategy leads to better predictions than the training on endogenous data alone and also better than the predictions of other DL models trained by TL. The distribution of the log-ratio of predicted translation output of variant and wildtype sequences is shown in Fig.  4 C. One hundred ten of the 7238 variants are predicted to affect the TE by 10-fold or more, 34 increasing and 76 decreasing the TE compared to the wildtype sequence. Interestingly, despite the large predicted impact, none of the 110 SNPs creates or destroys a uAUG. However, overall, while absolute numbers of uAUG changes are small (328 of 7238 variants), creation/destruction of a uAUG was associated with a predicted reduction/increase of translation output. Moreover, the pathogenic variants had a small bias for increased TE (Fig.  4 D).
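
The two variant annotations discussed above, namely whether a SNP creates or destroys a uAUG and whether its predicted effect on TE reaches 10-fold, can be sketched as follows. The function names and the 0-based coordinate convention are assumptions made for illustration. The TE values passed to `is_large_effect` stand in for model predictions, which are not reproduced here.

```python
from math import log10


def count_uaugs(utr: str) -> int:
    """Number of AUG triplets in a 5'UTR, accepting DNA (T) or RNA (U) letters."""
    seq = utr.upper().replace("T", "U")
    return sum(seq[i:i + 3] == "AUG" for i in range(len(seq) - 2))


def apply_snp(utr: str, pos: int, alt: str) -> str:
    """Return the variant 5'UTR with the base at 0-based `pos` replaced by `alt`."""
    if len(alt) != 1 or not 0 <= pos < len(utr):
        raise ValueError("expected a single-nucleotide substitution inside the 5'UTR")
    return utr[:pos] + alt + utr[pos + 1:]


def uaug_effect(utr: str, pos: int, alt: str) -> str:
    """Classify a SNP by its net change in the number of uAUGs."""
    delta = count_uaugs(apply_snp(utr, pos, alt)) - count_uaugs(utr)
    return "creates" if delta > 0 else "destroys" if delta < 0 else "neutral"


def is_large_effect(te_variant: float, te_wildtype: float, fold: float = 10.0) -> bool:
    """True if the TE changes by at least `fold` in either direction."""
    return abs(log10(te_variant / te_wildtype)) >= log10(fold)


print(uaug_effect("GGACGCC", 3, "U"))  # creates
print(uaug_effect("GGAUGCC", 4, "C"))  # destroys
```

Because the fold threshold is applied to a ratio, the result does not depend on the base of the logarithm used for the log-ratio.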

figure 4

Effect of 5′UTR sequence variation on mRNA translation output. A Optimus 5-Prime was trained on a pool of randomized 50-nt-long sequences and applied to a pool of equally long known variants (see the “ Human genetic variants ” section). Yellow points indicate 5′UTRs with OOF uAUGs; purple points indicate 5′UTRs without OOF uAUGs. The same was done for the TranslateLSTM architecture in panel ( B ). C TranslateLSTM was used to predict the TE of known clinical variants of endogenous sequences from the ClinVar database [ 31 ], which were compared to the measured TEs of their wildtype counterparts (see the “ ClinVar data ” section) to obtain predictions of log-fold changes of the translation efficiency (TE LFC). These follow a normal distribution, where a negative TE LFC can be associated with a propensity for the variant to create uAUGs (orange fraction of bars), while a positive TE LFC is associated with a propensity for breaking uAUGs (green fraction of bars). D  Clinical variants annotated as  pathogenic (clinical significance annotation: pathogenic, likely pathogenic, risk factor) are predicted to significantly increase the TE compared to variants with neutral phenotype annotation (clinical significance annotation: other, uncertain), whereas variants with benign phenotypes (clinical significance annotation: benign, likely benign, protective, drug response) do not significantly alter the distribution, as demonstrated by Kolmogorov-Smirnov tests.

The wider dynamic range of protein compared to mRNA expression suggested an important role of translation control in determining protein levels [ 10 ]. Initiation is the limiting step of translation [ 2 , 6 , 7 ], modulated by a variety of regulatory elements in the 5′UTRs [ 6 , 33 ], from uORFs to internal ribosome entry sites [ 34 , 35 , 36 ]. With the rise in mRNA-based therapies, the interest in designing 5′UTRs to drive specific levels of protein expression has surged [ 14 ], prompting the development of DL models to predict the translation output of mRNAs from the 5′UTR sequence. To satisfy the large data needs of these models, a few groups have devised innovative approaches to measure the translation output of large numbers of constructs, containing either random 5′UTRs or fragments of endogenous sequences [ 14 , 15 , 21 ]. DL models trained on these data achieve impressive prediction performance on leave-out data and are used to identify sequence elements that modulate translation, the most predictive among these being uAUGs/uORFs. However, DL models trained on synthetic data do not predict well the translation output of endogenous mRNAs. In this study, we carried out an extensive comparison of models of translation across multiple data sets and settings, to understand the limits of their applicability and generality.

We took advantage of two systems in which the translation output has been measured for both synthetic and endogenous 5′UTRs, namely yeast [ 6 , 21 ], and HEK 293 cells [ 14 , 20 ]. For yeast, an additional library of \(\sim 12,000\) endogenous 5′UTRs devoid of uAUGs was tested for their ability to recruit ribosomes [ 15 ]. We observed the best performance in the yeast-human cross-prediction of translation output of synthetic constructs, even though the readouts of the assays were very different for the two organisms. This prediction relies on a small number of conserved determinants of translation output, in particular uAUGs, as underscored by the similar performance achieved with an 8-parameter linear model trained on the same data sets. However, models trained on synthetic constructs do not predict the translation output of endogenous mRNAs. The coding region or trans-acting factors do not explain this discrepancy, as demonstrated with the various yeast data sets, where these factors were controlled. Rather, endogenous sequences have been selected in evolution under multiple constraints, not limited to translation output, and have acquired a variety of regulatory elements that are not well-represented in the randomized 5′UTRs. As a result, models trained on synthetic data have no opportunity to learn such features. We could most clearly demonstrate this with a simulation, in which a model trained on sequences lacking uAUGs performed poorly on a data set in which these elements are represented. While in this case the outcome may seem obvious, as uAUGs are important modulators of translation output, there are likely many other elements that are not well represented among synthetic sequences yet affect the translation output in various ways, for example by influencing mRNA stability. All of these factors ultimately contribute to the poor performance of models trained on synthetic 5′UTRs in predicting the translation output of endogenous mRNAs.

The same issues likely compound the prediction of SNP effects. As the genetic variation of human populations is being mapped, DL models are increasingly used to predict various molecular phenotypes, including the translation output of mRNAs [ 14 , 16 ]. Genetic variation is manifested in the native gene expression context, implying that predictions of models trained on synthetic sequences will not be reliable. Given that with TranslateLSTM we were able to explain more of the variance in TE compared to other DL models, we also sought to provide updated predictions of the potential impact of ClinVar variants on TE [ 31 ]. Surprisingly, variants classified as pathogenic are predicted to more often increase than decrease the TE of the respective mRNA, i.e., they tend to be gain-of-function variants. Interestingly, the increase is not generally explained by the removal of a repressive uAUG, as relatively few SNPs changed the number of uAUGs in the 5′UTRs. The 1000 SNPs predicted to increase the TE the most came from genes involved in protein and organelle localization (Additional file 1: Tab. S1), predictions that could be tested in a future study.

That \(\sim\) 60% of the variance in MPRA data can be explained with models constructed from species as distant as yeast and human indicates that the models have learned deeply conserved mechanisms of controlling the translation output. That the simple, 8-parameter linear model performs almost on par with the DL models in this setting indicates not only that these mechanisms are reflected in a small number of mRNA features but also that the DL models are heavily over-parameterized. Indeed, the cross-species prediction power comes largely from the OOF uAUGs, as demonstrated by the poor performance of the linear model lacking this element. The 5′UTR G/C content and/or free energy of folding appear to be additional conserved regulatory elements, with a more prominent role in explaining the translation output of evolved 5′UTRs.

To the extent to which synthetic data sets and DL models are used to uncover molecular mechanisms, it is important to consider whether the synthetic sequences cover these mechanisms as well as whether the model architecture allows for the appropriate representation of these mechanisms. This is, of course, difficult to ensure a priori, when the mechanisms are unknown. However, an improved grasp of the parameter efficiency of models and their interpretation should facilitate the discovery of regulatory mechanisms and avoid false inferences. For example, CNN-type architectures may be able to encode correlates of RNA secondary structure sufficiently well to predict the translation of short, synthetic 5′UTRs. Yet, the sequence motifs learned by the CNN need not themselves represent a regulatory mechanism. Instead, they could reflect long-range secondary structure constraints, which could be more efficiently captured by a different type of representation than the CNN allows.

A main application of DL models trained on synthetic sequences is the design of constructs with desired translation outputs [ 14 ]. While this has been demonstrated within the setting of the MPRA, where randomized 5′UTRs drive the expression of proteins such as eGFP or mCherry, whether the same accuracy can be achieved for endogenous mRNAs of interest remains to be determined. Sample et al. [ 14 ] tested the same 5′UTR library in the context of the eGFP and mCherry coding regions and found that the model trained on the eGFP constructs explains 77–78% of the variance in mCherry expression, in contrast to the 93% of the variance explained in eGFP expression. Interestingly, this difference has been attributed, in part, to differences in the polysome profiling protocol [ 14 ]. This points to the importance of establishing robust experimental protocols for generating reference data sets. However, the different coding regions likely contribute to the discrepancy in prediction accuracy as well, underscoring the importance of measuring the same library of constructs in different systems to identify the mechanisms responsible for a specific readout.

In summary, our analysis suggests a few directions for the study of translation control and applications to protein expression. First, to continue to uncover mechanisms that explain the expression of endogenous sequences, it will be important to include these sequences in high-throughput assays. The method of choice for measuring the translation output of endogenous mRNAs is ribosome footprinting, a method that, on its own, is very reproducible ( \(R^2 \gtrsim 0.8\) ). However, factoring in the mRNA-seq-based estimation of mRNA abundance to calculate the TE leads to increased error in the TE estimate. Ensuring high accuracy of mRNA-seq and ribo-seq is important for obtaining reference data sets of TE. An additional limitation of endogenous mRNA translation data is its size. Currently, the number of mRNAs whose TE is estimated in a typical experiment is \(\sim 20,000\) , which corresponds roughly to one isoform per gene. Accurate estimation of the TE of individual isoforms could be an important direction of methodological development [ 37 , 38 ]. However, it is unlikely that many isoforms are simultaneously expressed at a high enough level to be accurately measured in a given cell type or that sufficiently accurate data can be currently obtained from single cells [ 39 ] that express distinct isoforms. As a suboptimal alternative, TE measurements could be obtained in closely related cell types in which sufficient variation of transcription and thereby translation start sites occurs. In terms of training DL models on such data, an important consideration will be to ensure that training and test sets do not contain related sequences, to prevent models from achieving high prediction accuracy simply based on sequence similarity, without learning the regulatory grammar [ 40 ].
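
Why the TE estimate is noisier than the footprint measurement alone can be made explicit with a first-order error propagation. The sketch below assumes that the TE is computed as the ratio of ribosome footprint density \(F\) to mRNA abundance \(M\) and that the measurement errors of the two assays are independent. Both the symbols and the independence assumption are ours, introduced only for illustration.

```latex
\[
\log \textrm{TE} = \log F - \log M
\quad\Longrightarrow\quad
\textrm{Var}(\log \textrm{TE}) \approx \textrm{Var}(\log F) + \textrm{Var}(\log M)
\]
```

Under these assumptions the variances add, so even two individually reproducible assays yield a less reproducible ratio.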

Second, towards predicting the impact of SNPs on translation, accurate models of endogenous mRNA expression are needed. As we have seen here, architectures beyond CNNs are desirable, and models used in natural language processing may provide a useful stepping stone. However, it will be interesting to develop architectures that can represent long-range dependencies of RNA secondary structures, perhaps also incorporating co-evolution constraints, as done for protein structure prediction [ 41 , 42 ].

Third, towards the goal of designing constructs with specified translation outputs, it will be important to first determine the range of variation afforded by randomized 5′UTR variants by actually measuring the range of protein expression that can be covered with these variants. If this range is sufficient, it will be important to determine the impact of unexplored parameters, such as the cellular context of construct expression and the impact of the coding region downstream of the randomized construct. For the former, the same construct library can be tested in various cell types, especially those that are closest to the cell type in which the mRNAs will be ultimately expressed (e.g., muscle cells for mRNA vaccines) [ 43 ]. Regarding the coding region, it will be interesting to test at least a few that cover the range of endogenous expression, from mRNAs with different lifetimes and codon bias.

To conclude, DL models can be trained to very high precision on synthetic data, irrespective of their architecture. However, so far, synthetic data does not appropriately cover the space of regulatory elements influencing translation initiation. To achieve a comprehensive and predictive model, and to understand translation, training on endogenous sequences is necessary. The main bottleneck at the moment is obtaining sufficient and highly reproducible data on the translation of endogenous mRNAs. Experiments in a single cell type such as a cell line may not yield sufficiently many reliably measured 5′UTRs to train models such as TranslateLSTM. Perhaps this limitation can be circumvented by collecting data from multiple cell types, as they may contain distinct isoforms, with distinct 5′UTRs and translation efficiencies. Such a model could then be used for a broad variety of tasks, such as predicting the effect of point mutations, predicting the translation efficiency of synthetic constructs, and deepening our mechanistic understanding of translational control.

Experimental methods

We outline the experimental procedure for RNA and ribosome footprint sequencing of HepG2 cells.

Cell culture

The HepG2 cell line was obtained from the laboratory of Dr. Salvatore Piscuoglio (DBM, Basel) and was cultured in Dulbecco’s Modified Eagle Medium (DMEM) containing 4.5 g/l glucose, 10% fetal calf serum, 4 mM L-glutamine, 1X NEAA, 50 U/ml penicillin and 50 µg/ml streptomycin at 5% \(\textrm{CO}_2\) , \(37^{\circ }\textrm{C}\) . Cells were passaged every 3–4 days.

Cell lysis and collection

Cells were grown in 15-cm dishes to 70–80% confluency. Medium was replenished 3 h prior to cell lysis. Cycloheximide (CHX) was added to a final concentration of 100 µg/ml to arrest elongating ribosomes. Medium was immediately discarded and cells were washed once with ice-cold PBS containing 100 µg/ml CHX. Five hundred microliters of lysis buffer (20 mM Tris-HCl pH 7.5, 100 mM NaCl, 10 mM \(\textrm{MgCl}_2\) , 1% Triton X-100, 2 mM dithiothreitol (DTT), 100 µg/ml CHX, 0.8 U/µl RNasin plus RNase inhibitor (Promega), 0.04 U/µl Turbo DNase (Invitrogen), and EDTA-free protease inhibitor cocktail (Roche)) was added directly to the cells on the Petri dish. Cells were scraped and collected into 1.5 ml tubes. Then, samples were incubated for 5 min at \(4^{\circ }\textrm{C}\) with continuous rotation (60 rpm), passed through a 23G needle 10 times, and again incubated for 5 min at \(4^{\circ }\textrm{C}\) with continuous rotation (60 rpm). Lysates were clarified by centrifugation at 3000×g for 3 min at \(4^{\circ }\textrm{C}\) . Supernatants were centrifuged again at 10,000×g for 5 min at \(4^{\circ }\textrm{C}\) .

Ribosome footprint sequencing

The ribosome footprint sequencing protocol was adapted from protocols described in Refs. [ 18 , 19 , 44 ]. An equivalent of 8 OD260 units of lysate was treated with 66 U RNase I (Invitrogen) for 45 min at \(22^{\circ }\textrm{C}\) in a thermomixer with mixing at 1000 rpm. Then, 200 U SUPERase·In RNase inhibitor (20 U/µl, Invitrogen) was added to each sample. Digested lysates were loaded onto 10–50% home-made sucrose density gradients in open-top polyclear centrifuge tubes (Seton Scientific). Tubes were centrifuged at 35,000 rpm (210,100×g) for 3 h at \(4^{\circ }\textrm{C}\) (SW-41 Ti rotor, Beckman Coulter ultracentrifuge). Samples were fractionated using the Piston Gradient Fractionator (Biocomp Instruments) at 0.75 ml/min by monitoring A260 values. Thirty fractions of 0.37 ml were collected in 1.5 ml tubes, flash frozen, and stored at \(-80^{\circ }\textrm{C}\) . The fractions (typically 3 or 4) corresponding to the digested monosome peak were pooled. RNA was extracted using the hot acid phenol/chloroform method. The ribosome-protected RNA fragments (28–32 nt) were selected by electrophoresis on 15% polyacrylamide urea TBE gels and visualized with SYBR Gold Nucleic Acid Gel Stain (ThermoFisher Scientific). Size-selected RNA was dephosphorylated by T4 PNK (NEB) for 1 h at \(37^{\circ }\textrm{C}\) . RNA was purified using the acid phenol/chloroform method. Depletion of rRNA was performed using the riboPOOL kit (siTOOLs biotech) from 433 ng of RNA according to the manufacturer’s instructions. Libraries were prepared using the SMARTer smRNA-Seq Kit for Illumina (Takara) following the manufacturer’s instructions from 15 ng of RNA. Libraries were purified by electrophoresis on 8% polyacrylamide TBE gels and sequenced on the Illumina NextSeq 500 sequencer in the Genomics Facility Basel (Department of Biosystems Science and Engineering (D-BSSE), ETH Zürich).

RNA-sequencing

RNA was extracted from 15 µl of cell lysate using the Direct-zol RNA Microprep Kit (Zymo Research) following the manufacturer’s instructions and including DNase treatment for 15 min at room temperature. Samples were eluted with 15 µl nuclease-free water. The RNA integrity numbers (RIN) of the samples were between 9.9 and 10.0, measured using High Sensitivity RNA ScreenTape (TapeStation system, Agilent). RNA was quantified using a Qubit Flex fluorometer (Thermo Fisher Scientific). Libraries were prepared using the SMART-seq Stranded for total RNA-seq kit (Takara) from 5 ng of RNA and sequenced on the Illumina NextSeq 500 sequencer in the Genomics Facility Basel (Department of Biosystems Science and Engineering (D-BSSE), ETH Zürich).

The data sets used in this study are as follows.

Optimus50

Constructs consisted of 25 nts of identical sequence (for PCR amplification) followed by a 50-nt-long random 5′UTR sequence upstream of the GFP coding region. Their sequences and associated mean ribosome load (MRL) measurements were obtained from the GEO repository, accession number GSE114002 [ 45 ]. Non-sequential features were computed and annotated for each sequence with a Python script. The normalized 5′UTR folding energy was determined with the RNAfold program from the ViennaRNA package [ 46 ]. The G/C fraction was calculated using the Biopython package [ 47 ]. The numbers of OOF and IF uAUGs were calculated with standard Python methods. ORF/UTR length and number of exons were identical in this data set and therefore uninformative. Following [ 14 ], we held out the 20,000 5′UTRs with the highest coverage in mRNA-seq for testing and kept the rest for training.
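The sequence-derived features mentioned above can be illustrated with a minimal Python sketch. It is not the authors' script: the function names are hypothetical, and the frame convention (the main start codon begins directly after the last nucleotide of the 5′UTR) is an assumption. The folding energy is left out because it requires an external RNAfold call.

```python
from typing import Tuple


def gc_fraction(seq: str) -> float:
    """Fraction of G and C nucleotides in a sequence (0.0 for an empty one)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def count_uaugs(utr: str) -> Tuple[int, int]:
    """Count in-frame (IF) and out-of-frame (OOF) upstream AUGs in a 5'UTR.

    Assumption: the main start codon begins immediately after the last
    nucleotide of `utr`, so an upstream AUG at 0-based position i is in
    frame when (len(utr) - i) is a multiple of 3.
    """
    utr = utr.upper().replace("U", "T")
    n_if = 0
    n_oof = 0
    for i in range(len(utr) - 2):
        if utr[i:i + 3] == "ATG":
            if (len(utr) - i) % 3 == 0:
                n_if += 1
            else:
                n_oof += 1
    return n_if, n_oof
```

For instance, `count_uaugs("ATGAAA")` reports one in-frame uAUG, whereas shifting the same AUG by one nucleotide (`"ATGAAAA"`) turns it into an out-of-frame one.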

Optimus100

Constructs were made from random sequences, human 5′UTRs of suitable size (25–100 nts), their single nucleotide polymorphism-containing variants, and 3′-terminal fragments of longer 5′UTRs. MRL measurements were done as for the Optimus50 data set. Sequences and associated MRL estimates were obtained from the GEO repository, accession number GSE114002 [ 45 ]. The non-sequential features were computed just as for Optimus50, with the UTR length being an additional degree of freedom. The 5000 5′UTRs with the highest coverage in mRNA-seq were held out for testing, just as in [ 14 ].

Human genetic variants

Sample et al. [ 14 ] extracted 3577 5′UTR SNPs from the ClinVar database [ 31 ] and constructed variant 5′UTRs containing these SNPs. These variants were transfected into HEK 293 cells, and the respective MRL was measured as described in the paragraph about Optimus50. We also appended non-sequential features as outlined there, with the UTR length as an additional variable. The sequences and MRL were downloaded from GEO repository GSE114002 [ 45 ].

Yeast colonies were grown in media without HIS3 . Yeast cells were transduced with plasmids containing the HIS3 -ORF attached to a pool of \(\sim 500,000\) randomized 50-nt-long 5′UTRs. The growth rate is directly controlled by the amount of HIS3 protein, which in turn is controlled only by the 5′UTR sequence. The data were obtained from GEO, accession number GSE104252 [ 48 ]. The calculation of non-sequential features followed the exact same procedure as for Optimus50. The top 5% of 5′UTRs in terms of read coverage were used for testing.

DART

We downloaded the training data from Suppl. Tab. S2 of [ 15 ]. Non-sequential features were calculated as for Optimus50. Since uAUGs are mutated in this data set to avoid ambiguity in the translation start site, we did not include the number of OOF or IF uAUGs in the list of non-sequential features to learn from. Also, DART uses a luciferase reporter that includes only the first part of the coding sequence, so neither the number of exons nor the CDS length is meaningful; therefore, we did not include these features either. The first part of the CDS sequence is available as a separate column in their Suppl. Tab. S2. We used three different random splits of 10% of the data for testing.

Human mRNA sequences

The human transcript sequences were pulled from ENSEMBL [ 49 ] with pybiomart. We use the GRCh38.105 annotation and the GRCh38.dna_sm.primary_assembly.fa primary assembly file. The human transcriptome sequences were assembled with gffread version 0.12.7 [ 50 ].

Yeast mRNA sequences

We used the R64.1.1 yeast genome [ 51 ] with the R64.1.1.110 annotation from the Saccharomyces cerevisiae Genome Database (SGD). We enriched this annotation with the longest annotated transcript from TIF-seq, see [ 52 ], providing us with 5′UTR sequences. Gffread [ 50 ] yielded the yeast transcriptome.

Yeast TE data

We used ribosome footprinting (GSM2278862, GSM2278863, GSM2278864) and RNA sequencing data (GSM2278844, GSM2278845, GSM2278846) from the control experiments performed in [ 19 ], downloaded from the European Nucleotide Archive, accession PRJNA338918 [ 53 ]. The riboseq analysis was conducted as in [ 54 ]; the RNA-seq analysis was performed using zarp [ 55 ]. All non-sequential features (log ORF length, UTR length, G/C-content fraction of the UTR, number of exons, number of OOF uAUGs, number of IF uAUGs, normalized 5′UTR folding energy) were computed or extracted from the genome annotation. The 10% of transcripts with the highest TIN were used for testing purposes.
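The hold-out rule used here (and, with read coverage instead of TIN, for the MPRA data sets) can be sketched as follows. This is an illustration rather than the authors' code; the function name and the record layout (a list of dictionaries with a score field) are assumptions.

```python
def split_by_score(records, score_key, test_fraction):
    """Hold out the top `test_fraction` of records, ranked by `score_key`.

    Returns a (train, test) pair; `test` contains the highest-scoring
    records, e.g. the transcripts with the highest TIN.
    """
    ranked = sorted(records, key=lambda r: r[score_key], reverse=True)
    n_test = int(round(test_fraction * len(ranked)))
    return ranked[n_test:], ranked[:n_test]


# Hypothetical usage: 20 transcripts with made-up TIN values.
transcripts = [{"id": f"tx{i}", "tin": float(i)} for i in range(20)]
train, test = split_by_score(transcripts, "tin", 0.10)
```

Selecting the test set by data quality rather than at random means that test performance reflects the most reliably measured transcripts, which is worth keeping in mind when comparing against the random splits used for DART.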

HEK 293 TE data

Ribo-seq and mRNA-seq data were obtained from the European Nucleotide Archive, accession PRJNA591214 [ 56 ]. The riboseq analysis was conducted as in [ 54 ]; the RNA-seq analysis was performed as in [ 55 ]. For the calculation of the translation efficiency, we only took into account RNA-seq and ribo-seq reads in the CDS, not on the entire transcript. For stringency in the attribution of reads to mRNAs, we calculated relative isoform abundances by running salmon [ 57 ] on the RNA-seq samples and selected the most abundant isoform as representative, to which we mapped the RNA and ribo-seq reads. The 10% of transcripts with the highest TIN (squared average over the three replicates) were used for testing.
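As an illustration of a CDS-restricted translation efficiency, the sketch below computes a log ratio of library-size-normalized ribo-seq and RNA-seq read counts, followed by the standardization to zero mean and unit variance that is used for the model target. The library-size normalization, the pseudocount, and the function names are assumptions made for this example and are not taken from the analysis pipeline of this study.

```python
import math
from statistics import mean, pstdev


def log_te(ribo_cds, rna_cds, ribo_total, rna_total, pseudocount=0.5):
    """Log translation efficiency from reads mapped to the CDS only.

    Both counts cover the same CDS, so the CDS length cancels in the
    ratio. `pseudocount` (an assumption) avoids taking the log of zero.
    """
    ribo = (ribo_cds + pseudocount) / ribo_total
    rna = (rna_cds + pseudocount) / rna_total
    return math.log(ribo / rna)


def standardize(values):
    """Scale values to zero mean and unit (population) variance."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("cannot standardize values with zero variance")
    return [(v - mu) / sd for v in values]
```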

HepG2 TE data

We followed the experimental procedure outlined in the experimental methods. The rest of the analysis was done as for the HEK 293 TE data. The data was deposited in the European Nucleotide Archive under accession PRJNA1045106 [ 58 ].

ClinVar data

We downloaded the ClinVar database VCF file ( vcf_GRCh38 [ 32 ]). With bedtools-intersect (v 2.30) [ 59 ], we identified variants from ClinVar in annotated genes and only kept variants in annotated 5′UTRs. With a Python script, we calculated the coordinates of the polymorphisms on all affected transcripts. Then, we constructed the variant 5′UTRs in the human transcriptome (created with gffread [ 50 ] from the GRCh38.105 ENSEMBL annotation) and extracted the coding regions. This left us with 84,127 mutated transcripts. Next, we computed the non-sequential features as for Optimus50. We predicted the variant TE with the transfer-learning version of TranslateLSTM (trained on human endogenous HEK 293, pre-trained on the Optimus100 data set). Matching the transcript variants and predictions to transcripts for which we have TE measurements left us with 7238 transcripts.
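Constructing a variant 5′UTR once the variant coordinate on the transcript is known can be sketched as below. The example is restricted to single-nucleotide substitutions and assumes that the position is 0-based and that the alleles are already given on the transcript strand; the function name is hypothetical, and insertions, deletions, and strand conversion are not covered.

```python
def apply_snv(utr, pos, ref, alt):
    """Return `utr` with the substitution ref>alt at 0-based position `pos`.

    The reference allele is checked against the sequence so that
    coordinate errors surface early instead of producing a silently
    wrong variant sequence.
    """
    if not 0 <= pos < len(utr):
        raise ValueError("variant position lies outside the 5'UTR")
    if utr[pos].upper() != ref.upper():
        raise ValueError(
            f"reference allele mismatch at position {pos}: "
            f"found {utr[pos]}, expected {ref}"
        )
    return utr[:pos] + alt + utr[pos + 1:]
```

Checking the reference allele is a cheap safeguard: a mismatch usually indicates an off-by-one coordinate or a variant that was not converted to the transcript strand.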

Model architectures

We implemented previously published models that predict the translation output from 5′UTR sequences according to their description in the respective studies. We used tensorflow 2.10, along with cuda toolkit 11.8.0 on NVIDIA titanx GPUs with 12GB of graphics memory.

Optimus 5-Prime

Optimus 5-Prime was the first neural network trained to predict translation initiation efficiency [ 14 ]. It consists of three convolutional layers with 120 8nt-long filters each. They all feature a relu activation function and are succeeded by two dense layers, one reducing the input dimensionality to 40 with another relu nonlinearity, and a last dense layer reducing to a single number. The two last layers are separated by a dropout layer that stochastically ignores 20% of the input signals during training. The configuration allowing predictions for 5′UTRs up to 100 nts in length has 714,681 parameters. Two different configurations of Optimus 5-Prime were proposed: one trained on a pool of \(\sim 280,000\) 5′UTR sequences of 50 nts and another trained on a pool of \(\sim 105,000\) 5′UTRs of 25–100 nts. Variable lengths were handled by anchoring the 5′UTRs at their 3′ end (adjacent to the start codon) and padding the 5′ end with 0s in the one-hot encoded representation. Since endogenous 5′UTRs vary widely in length, we used the latter configuration and data set, considering it to be more realistic. However, the size of the model is also larger. To run the Optimus models on the local scientific computing infrastructure, the model training was re-implemented in a Python script, rather than a Jupyter notebook as in the git repository cited in [ 14 ].
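The quoted parameter count and the padding scheme can be reproduced with a short, dependency-free Python sketch. It assumes that the convolutions preserve the sequence length ("same" padding) and that the output of the last convolution is flattened before the first dense layer; under these assumptions the arithmetic yields exactly 714,681. The channel order A, C, G, T and the truncation of over-long sequences to their 3′-most nucleotides are further assumptions made for illustration.

```python
def conv1d_params(in_channels, filters, kernel_size):
    """Weights plus biases of a 1D convolutional layer."""
    return in_channels * kernel_size * filters + filters


def dense_params(inputs, units):
    """Weights plus biases of a fully connected layer."""
    return inputs * units + units


def optimus_param_count(max_len=100, filters=120, kernel_size=8, hidden=40):
    """Parameter count of three conv layers followed by two dense layers."""
    conv = conv1d_params(4, filters, kernel_size)
    conv += 2 * conv1d_params(filters, filters, kernel_size)
    dense = dense_params(max_len * filters, hidden) + dense_params(hidden, 1)
    return conv + dense


def one_hot_left_padded(utr, max_len=100):
    """One-hot encode a 5'UTR, anchored at its 3' end and zero-padded 5'."""
    alphabet = "ACGT"  # assumed channel order
    utr = utr.upper().replace("U", "T")[-max_len:]
    rows = [[0, 0, 0, 0] for _ in range(max_len - len(utr))]
    for nt in utr:
        rows.append([1 if nt == base else 0 for base in alphabet])
    return rows


print(optimus_param_count())  # 714681
```

The first dense layer accounts for 480,040 of these parameters, which is why the model size grows with the maximal 5′UTR length it is configured for.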

Framepool

Framepool [ 16 ] technically overcomes the limitation on 5′UTR length. While also relying on convolutional layers and a final two-layer perceptron, Framepool introduced an operation called “framewise pooling.” This was motivated by previous observations that out-of-frame uORFs have a strong impact on the translation output. Framewise pooling involves pooling the output of the convolutional layers separately for the +0, +1, and +2 frames. The subsequent multi-layer perceptron (MLP) takes as input the average and the maximum of each of the three pools (per convolutional filter). This makes the input of the final MLP independent of the input length and allows for varying UTR lengths from a technical standpoint. Trained on the same data sets as Optimus 5-Prime, Framepool performed better on data of varying UTR length. The number of parameters in Framepool is only about a third of what Optimus 5-Prime requires for UTR lengths \(\le 100\)  nt, namely 282,625 parameters. We pulled Framepool from the git repository referenced in [ 16 ]. A Python script related the model to our format of input data.
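To make the length independence concrete, the following sketch applies framewise pooling to a toy convolutional output given as nested Python lists. It is a simplified illustration, not the Framepool implementation: the frame of a position is assumed to be counted from the 3′ end of the 5′UTR (the last position is in frame 0), and the ordering of the output vector is arbitrary.

```python
def framewise_pool(conv_out):
    """Pool convolutional activations separately for the three frames.

    `conv_out` is a list of positions (5' to 3'), each holding one
    activation per filter. For every frame, the maximum and the mean
    of each filter are returned, giving 6 * n_filters values whatever
    the sequence length.
    """
    if not conv_out:
        raise ValueError("conv_out must contain at least one position")
    n_pos = len(conv_out)
    n_filters = len(conv_out[0])
    features = []
    for frame in range(3):
        # Assumption: frames are counted from the 3' end of the UTR.
        rows = [conv_out[i] for i in range(n_pos)
                if (n_pos - 1 - i) % 3 == frame]
        columns = [[row[f] for row in rows] for f in range(n_filters)]
        features.extend(max(col) if col else 0.0 for col in columns)
        features.extend(sum(col) / len(col) if col else 0.0 for col in columns)
    return features
```

A 6-nt and a 60-nt input with the same number of filters both produce a vector of identical length, so the same MLP can be applied to either.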

MTtrans

While CNNs are generally not a natural choice when it comes to modeling sequences of variable length, recurrent neural networks (RNNs) were developed exactly for this purpose. Conventional RNNs suffer from the so-called vanishing gradient problem, whereby memory of distant context is lost. Moreover, they can only memorize the left-side context, since they process sequences from left to right. These problems are solved by long short-term memory (LSTM) units [ 25 ] and bidirectional layers. However, as there is no correspondence between output cells and position in the sequence, this type of model is more challenging to interpret. MTtrans [ 17 ] has been proposed as an attempt to get the best of both CNN and LSTM worlds. It follows the general idea of detecting motifs, with four convolutional layers stacked on top of each other, and with batch normalization, L2 regularization, and dropout layers in between to avoid over-fitting and ensure stability. This component of MTtrans is called “shared encoder” and is followed by two bidirectional gated recurrent unit (GRU) [ 60 ] layers and two dense layers to make the final prediction. GRUs are quite similar to LSTMs, but they do not feature an output gate [ 25 , 60 ] and therefore have fewer weights to adjust than LSTM layers. This second component of MTtrans is called “task-specific tower,” because it is re-trained for each data set (task), while the encoder is shared across all tasks. By training the encoder on data sets of different organisms and cells, the authors aim to capture general features of translation that apply to all of the studied systems. This is an example of transfer learning, hence the “trans” in the name MTtrans. MTtrans appears to be considerably bigger than its two predecessors, with \(\sim 2,100,000\) parameters. A re-evaluation of the results in [ 17 ] was unfortunately not possible since the code in the provided GitHub repository was still work in progress. Therefore, we attempted to reconstruct MTtrans in our own ML framework, but will quote the numbers reported in [ 17 ] wherever available.
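The difference in size between LSTM and GRU layers can be made explicit with the textbook parameter formulas: an LSTM cell has four weight blocks (input, forget, and output gates plus the candidate state), whereas a GRU cell has three (update and reset gates plus the candidate state). The layer sizes below are hypothetical and are not those of MTtrans; some implementations also add a second set of bias terms to the GRU, which slightly changes the count.

```python
def lstm_params(input_dim, units):
    """Parameters of one unidirectional LSTM layer (four weight blocks)."""
    return 4 * (units * (input_dim + units) + units)


def gru_params(input_dim, units):
    """Parameters of one unidirectional GRU layer (three weight blocks).

    Textbook formulation with a single bias vector per block.
    """
    return 3 * (units * (input_dim + units) + units)


# Hypothetical layer sizes, chosen only for illustration.
print(lstm_params(128, 64))  # 49408
print(gru_params(128, 64))   # 37056
```

In this formulation a GRU layer always has three quarters of the parameters of an LSTM layer of the same width, and a bidirectional layer doubles either count.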

TranslateLSTM

The start of the coding region has a biased nucleotide composition that also plays a role in translation initiation (cf. [ 61 ]). Putting the first 100 nts into another bidirectional LSTM model therefore provides additional information about initiation likelihood. These three models, bidirectional LSTM for 5′UTRs, bidirectional LSTM for the beginning of the ORF, and non-sequential features, can now be concatenated into a big model. There is, of course, a lot of redundancy in these inputs, as the folding energy of the 5′UTR is determined by its nucleotide sequence, GC-content, and length. One way to mitigate this redundancy is to use high dropout rates after the final bidirectional LSTM layer of both RNNs (5′UTR and ORF). For training from scratch, we used dropout rates of 0.8 for the 5′UTR model and 0.2 for the CDS model. After concatenating the numerical input data with the two bidirectional LSTMs, a final dense layer computes a single number, the logarithm of the translation efficiency, scaled to a Gaussian distribution with unit variance and expectation value 0. The network was implemented in Python, using the keras API with a tensorflow backend [ 62 , 63 ]. We used the adam algorithm [ 64 ] for training, with the default learning rate of 0.001, which proved to be superior even to more sophisticated learning rate schedules. Beyond dropout layers, overfitting is prevented by imposing an early stopping criterion. To this end, we used a keras callback object. This object monitors the validation loss and terminates training once it sees a consistent increase in the validation loss over a given number of epochs (“patience” parameter). We set the patience to 10 epochs and restored the best weights within these 10 epochs after termination. Of the randomly reshuffled training data, 20% serve validation purposes. To further improve the performance of the model, we pretrained it on the variable-length Optimus100 data set before training on the endogenous data. In that scenario, we used a lower dropout rate of 0.5 for the 5′UTR LSTM.
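The early stopping logic can be sketched without any deep learning framework. The function below mimics, in a simplified way, what a patience-based callback such as the Keras `EarlyStopping` callback (with its `patience` and `restore_best_weights` options) does; it is an illustration of the criterion, not the training code of this study, and the loss values in the usage example are made up.

```python
def early_stopping_epoch(val_losses, patience=10):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops at the first epoch at which the validation loss has
    not improved for `patience` consecutive epochs. `best_epoch` marks
    the epoch whose weights would be restored afterwards.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    # The criterion was never met: training ran through all epochs.
    return len(val_losses) - 1, best_epoch


# Made-up validation losses; the minimum is reached at epoch 2.
losses = [1.0, 0.8, 0.7, 0.75, 0.9, 0.95]
stop, best = early_stopping_epoch(losses, patience=2)
```

With a patience of 2, training in this toy example stops at epoch 4 and the weights from epoch 2 are kept.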

Linear models

As the translation initiation efficiency (TIE) was reported to be explained, to a large extent, by a small number of mRNA features [ 6 ], we have included in our study two variants of a small linear model. The features were as follows. First, upstream open reading frames (uORFs) were reported in many studies to reduce the initiation of translation at the main ORF [ 65 ]. The effect was found to be largely due to uORFs that are in a different frame than the main ORF, which we have referred to as “out-of-frame ORFs” or “out-of-frame AUGs,” because the presence and position of stop codons matching these ORFs is not generally considered. Thus, one of the linear models included the numbers of out-of-frame and in-frame AUGs, while the other included only the former. The secondary structure of the 5′ cap-proximal region of the mRNA is known to interfere with the binding of the eIF4F cap-binding complex [ 5 ], and thus a weak positive correlation has been observed between the free energy of folding of the first 80 5′UTR nts and the translation initiation efficiency of yeast mRNAs [ 6 , 7 , 8 ]. A smaller impact on yeast translation has also been attributed to the ORF length (negative correlation with TIE) [ 6 , 66 ], 5′UTR length, and G/C content [ 6 ]. For human cells, the number of exon-exon junctions has also been linked to TIE [ 23 ]. Additional file 1: Fig. S6 shows density plots of these parameters, comparing the major data sets we used, i.e., the three MPRA data sets, DART, and the two endogenous ones.

The linear models are of compelling simplicity: they only have as many parameters as features they cover, plus a single global bias term. For instance, the linear model describing the Optimus50 data set consists of weights multiplying the normalized 5′UTR folding energy, the G/C-content, the number of IF and OOF upstream AUGs, and the bias term, totaling to 5 parameters.
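The parameter count of such a model follows directly from its definition, as the sketch below shows for the four Optimus50 features. The feature names are hypothetical labels, and the weights in the usage example are invented for illustration; they are not fitted values from this study.

```python
# Hypothetical labels for the four features of the Optimus50 linear model.
FEATURES = ("folding_energy", "gc_content", "n_if_uaug", "n_oof_uaug")


def n_parameters(features):
    """One weight per feature plus a single global bias term."""
    return len(features) + 1


def predict(weights, bias, x):
    """Linear prediction: bias plus the weighted sum of the features."""
    return bias + sum(weights[name] * x[name] for name in weights)


# Invented weights and feature values, for illustration only.
weights = {"folding_energy": 0.5, "gc_content": -1.0,
           "n_if_uaug": -0.25, "n_oof_uaug": -0.75}
example = {"folding_energy": -0.2, "gc_content": 0.5,
           "n_if_uaug": 0, "n_oof_uaug": 2}
prediction = predict(weights, bias=1.0, x=example)
```

Because every parameter multiplies a single named feature, the sign and magnitude of each fitted weight can be read directly as the direction and strength of that feature's association with the translation output.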

Availability of data and materials

Sequencing data from ribosome footprinting and RNA sequencing in the HepG2 cell line are available under BioProject ID PRJNA1045106 [ 58 ]. The evaluated clinical variants from the ClinVar database are attached in Additional File 1: Tab. S1. MPRA measurements in HEK 293 cells from [ 14 ] are publicly available from GEO repository GSE114002 [ 45 ], and MPRA measurements in yeast (see [ 21 ]) from GEO repository GSE104252 [ 48 ]. DART measurements from yeast are available from Supp. Tab. 2 of [ 15 ]. Yeast RNA sequencing and ribosome footprinting data from [ 19 ] were retrieved from the European Nucleotide Archive under accession number PRJNA338918 [ 53 ]. RNA sequencing and ribosome footprinting data from HEK 293 cells (cf. [ 20 ]) for this study were downloaded from the European Nucleotide Archive under accession number PRJNA591214 [ 56 ].

Code availability

The code for TranslateLSTM and data-preprocessing is publicly available under the MIT license in the github repository [ 67 ], with a Zenodo version at the time of publication under [ 68 ]. The scripts to create the plots are available under [ 69 ].

Galloway A, Cowling VH. mRNA cap regulation in mammalian cell function and fate. Biochim Biophys Acta Gene Regul Mech. 2019;1862(3):270–9.

Ingolia NT, Ghaemmaghami S, Newman JR, Weissman JS. Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science. 2009;324(5924):218–23.

de Smit MH, van Duin J. Secondary structure of the ribosome binding site determines translational efficiency: a quantitative analysis. Proc Natl Acad Sci U S A. 1990;87(19):7668–72.

Kozak M. An analysis of 5’-noncoding sequences from 699 vertebrate messenger RNAs. Nucleic Acids Res. 1987;15(20):8125–48.

Kertesz M, Wan Y, Mazor E, Rinn JL, Nutter RC, Chang HY, et al. Genome-wide measurement of RNA secondary structure in yeast. Nature. 2010;467(7311):103–7.

Weinberg DE, Shah P, Eichhorn SW, Hussmann JA, Plotkin JB, Bartel DP. Improved ribosome-footprint and mRNA measurements provide insights into dynamics and regulation of yeast translation. Cell Rep. 2016;14(7):1787–99.

Shah P, Ding Y, Niemczyk M, Kudla G, Plotkin JB. Rate-limiting steps in yeast protein translation. Cell. 2013;153(7):1589–601.

Godefroy-Colburn T, Ravelonandro M, Pinck L. Cap accessibility correlates with the initiation efficiency of alfalfa mosaic virus RNAs. Eur J Biochem. 1985;147(3):549–52.

Loo LS, Cohen RE, Gleason KK. Chain mobility in the amorphous region of nylon 6 observed under active uniaxial deformation. Science. 2000;288(5463):116–9.

Schwanhaeusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, et al. Global quantification of mammalian gene expression control. Nature. 2011;473(7347):337–42.

Tierney J, Swirski M, Tjeldes H, Carancini G, Kiran A, Michel A, et al. RiboSeq.Org. 2016. https://riboseq.org . Accessed July 2024.

Li GW, Burkhardt D, Gross C, Weissman JS. Quantifying absolute protein synthesis rates reveals principles underlying allocation of cellular resources. Cell. 2014;157(3):624–35.

Bohlen J, Fenzl K, Kramer G, Bukau B, Teleman AA. Selective 40S footprinting reveals cap-tethered ribosome scanning in human cells. Mol Cell. 2020;79(4):561–74.

Sample PJ, Wang B, Reid DW, Presnyak V, McFadyen IJ, Morris DR, et al. Human 5’ UTR design and variant effect prediction from a massively parallel translation assay. Nat Biotechnol. 2019;37(7):803–9.

Niederer RO, Rojas-Duran MF, Zinshteyn B, Gilbert WV. Direct analysis of ribosome targeting illuminates thousand-fold regulation of translation initiation. Cell Syst. 2022;13(3):256–64.

Karollus A, Avsec Z, Gagneur J. Predicting mean ribosome load for 5’UTR of any length using deep learning. PLoS Comput Biol. 2021;17(5):e1008982.

Zheng W, Fong JHC, Wan YK, Chu AHY, Huang Y, Wong ASL, et al. Translation rate prediction and regulatory motif discovery with multi-task learning. In: Tang H, editor. Research in Computational Molecular Biology. Cham: Springer Nature Switzerland; 2023. pp. 139–54.

Ingolia NT, Brar GA, Rouskin S, McGeachy AM, Weissman JS. The ribosome profiling strategy for monitoring translation in vivo by deep sequencing of ribosome-protected mRNA fragments. Nat Protoc. 2012;7(8):1534–50.

Mittal N, Guimaraes JC, Gross T, Schmidt A, Vina-Vilaseca A, Nedialkova DD, et al. The Gcn4 transcription factor reduces protein synthesis capacity and extends yeast lifespan. Nat Commun. 2017;8(1):457.

Alexaki A, Kames J, Hettiarachchi GK, Athey JC, Katneni UK, Hunt RC, et al. Ribosome profiling of HEK293T cells overexpressing codon optimized coagulation factor IX. F1000Res. 2020;9:174.

Cuperus JT, Groves B, Kuchina A, Rosenberg AB, Jojic N, Fields S, et al. Deep learning of the regulatory grammar of yeast 5’ untranslated regions from 500,000 random sequences. Genome Res. 2017;27(12):2015–24.

Wang L, Nie J, Sicotte H, Li Y, Eckel-Passow JE, Dasari S, et al. Measure transcript integrity using RNA-seq data. BMC Bioinformatics. 2016;17:58.

Nott A, Le Hir H, Moore MJ. Splicing enhances translation in mammalian cells: an additional function of the exon junction complex. Genes Dev. 2004;18(2):210–22.

Ho JWK. MTtrans. Github. 2022. https://github.com/holab-hku/MTtrans . Accessed Aug 2023.

Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9(8):1735–80.

Li K, Kong J, Zhang S, Zhao T, Qian W. Distance-dependent inhibition of translation initiation by downstream out-of-frame AUGs is consistent with a Brownian ratchet process of ribosome scanning. Genome Biol. 2022;23(1):254.

Russell PJ, Slivka JA, Boyle EP, Burghes AHM, Kearse MG. Translation reinitiation after uORFs does not fully protect mRNAs from nonsense-mediated decay. RNA. 2023;29(6):735–44.

Lundberg SM, Lee SI. A unified approach to interpreting model predictions. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, et al., editors. Advances in Neural Information Processing Systems. vol. 30. Red Hook: Curran Associates, Inc.; 2017. p. 4765–74.

Nikolados EM, Wongprommoon A, Aodha OM, Cambray G, Oyarzún DA. Accuracy and data efficiency in deep learning models of protein expression. Nat Commun. 2022;13(1):7755.

May GE, Akirtava C, Agar-Johnson M, Micic J, Woolford J, McManus J. Unraveling the influences of sequence and position on yeast uORF activity using massively parallel reporter systems and machine learning. Elife. 2023;12:e69611.

Landrum MJ, Lee JM, Benson M, Brown GR, Chao C, Chitipiralla S, et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 2018;46(D1):D1062–7.

Landrum MJ, Chitipiralla S, Brown GR, Chen C, Gu B, Hart J, et al. ClinVar: improvements to accessing data. Nucleic Acids Res. 2020;48(D1):D835–44.

Riba A, Di Nanni N, Mittal N, Arhné E, Schmidt A, Zavolan M. Protein synthesis rates and ribosome occupancies reveal determinants of translation elongation rates. Proc Natl Acad Sci U S A. 2019;116(30):15023–32.

Hinnebusch AG. Translational regulation of yeast GCN4. A window on factors that control initiator-trna binding to the ribosome. J Biol Chem. 1997;272(35):21661–4.

Jang SK, Kräusslich HG, Nicklin MJ, Duke GM, Palmenberg AC, Wimmer E. A segment of the 5’ nontranslated region of encephalomyocarditis virus RNA directs internal entry of ribosomes during in vitro translation. J Virol. 1988;62(8):2636–43.

Pelletier J, Sonenberg N. Internal initiation of translation of eukaryotic mRNA directed by a sequence derived from poliovirus RNA. Nature. 1988;334(6180):320–5.

Weber R, Ghoshdastider U, Spies D, Duré C, Valdivia-Francia F, Forny M, et al. Monitoring the 5’UTR landscape reveals isoform switches to drive translational efficiencies in cancer. Oncogene. 2023;42(9):638–50.

Calviello L, Mukherjee N, Wyler E, Zauber H, Hirsekorn A, Selbach M, et al. Detecting actively translated open reading frames in ribosome profiling data. Nat Methods. 2016;13(2):165–70.

VanInsberghe M, van den Berg J, Andersson-Rolf A, Clevers H, van Oudenaarden A. Single-cell Ribo-seq reveals cell cycle-dependent translational pausing. Nature. 2021;597(7877):561–5.

Riba A, Emmenlauer M, Chen A, Sigoillot F, Cong F, Dehio C, et al. Explicit modeling of siRNA-dependent on- and off-target repression improves the interpretation of screening results. Cell Syst. 2017;4(2):182–93.

Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A, Kaiser L, Polosukhin I. Attention is all you need. 2017. http://arxiv.org/abs/1706.03762 .

Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596(7873):583–9.

Tharakan R, Ubaida-Mohien C, Piao Y, Gorospe M, Ferrucci L. Ribosome profiling analysis of human skeletal muscle identifies reduced translation of mitochondrial proteins with age. RNA Biol. 2021;18(11):1555–9.

Hornstein N, Torres D, Das Sharma S, Tang G, Canoll P, Sims PA. Ligation-free ribosome profiling of cell type-specific translation in the brain. Genome Biol. 2016;17(1):149.

Sample PJ, Wang B, Seelig G. Human 5’ UTR design and variant effect prediction from a massively parallel translation assay. GSM3130435, GSM4084997, GSM3130443. Gene Expression Omnibus; 2018. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE114002 . Accessed Sept  2021.

Lorenz R, Bernhart SH, Zu Siederdissen CH, Tafer H, Flamm C, Stadler PF, et al. ViennaRNA Package 2.0. Algorithms Mol Biol. 2011;6:26.

Cock PJ, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A, et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009;25(11):1422–3.

Cuperus J, Groves B, Kuchina A, Rosenberg AB, Jojic N, Fields S, et al. Learning the regulatory grammar of yeast 5’ untranslated regions from a large library of random sequences. GSM2793751. Gene Expression Omnibus; 2017. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE104252 . Accessed Sept 2021.

Martin FJ, Amode MR, Aneja A, Austine-Orimoloye O, Azov AG, Barnes I, et al. Ensembl 2023. Nucleic Acids Res. 2023;51(D1):D933–41.

Pertea G, Pertea M. GFF utilities: GffRead and GffCompare. F1000Res. 2020;9.

Liachko I, Youngblood RA, Keich U, Dunham MJ. High-resolution mapping, characterization, and optimization of autonomously replicating sequences in yeast. Genome Res. 2013;23(4):698–704.

Pelechano V, Wei W, Steinmetz LM. Extensive transcriptional heterogeneity revealed by isoform profiling. Nature. 2013;497(7447):127–31.

Mittal N. The Gcn4 transcription factor reduces protein synthesis capacity and extends yeast lifespan. GSM2278862, GSM2278863, GSM2278864, GSM2278844, GSM2278845, GSM2278846. Bioproject; 2016. https://www.ncbi.nlm.nih.gov/bioproject/?term=338918 . Accessed Apr 2024.

Banerjee A, Ataman M, Smialek MJ, Mookherjee D, Rabl J, Mironov A, et al. Ribosomal protein RPL39L is an efficiency factor in the cotranslational folding of proteins with alpha helical domains. Nucleic Acids Res. 2024:gkae630. https://doi.org/10.1093/nar/gkae630 .

Katsantoni M, Gypas F, Herrmann CJ, Burri D, Bak M, Iborra P, et al. ZARP: an automated workflow for processing of RNA-seq data. bioRxiv. 2021. https://doi.org/10.1101/2021.11.18.469017 .

Alexaki A. Ribosome profiling of HEK-293T cells stably expressing wild-type and codon optimized coagulation factor IX. Bioproject; 2019. https://www.ncbi.nlm.nih.gov/bioproject/591214 . Accessed Sept 2023.

Patro R, Duggal G, Love MI, Irizarry RA, Kingsford C. Salmon provides fast and bias-aware quantification of transcript expression. Nat Methods. 2017;14(4):417–9.

Schlusser N, Gonzalez A, Pandey M, Zavolan M. Current limitations in predicting mRNA translation with deep learning models. Bioproject; 2023. https://www.ncbi.nlm.nih.gov/bioproject/?term=1045106 . Accessed July 2024.

Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26(6):841–2.

Cho K, van Merrienboer B, Gülçehre Ç, Bougares F, Schwenk H, Bengio Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. CoRR. 2014;abs/1406.1078. http://arxiv.org/abs/1406.1078 . Accessed July 2024.

Archer SK, Shirokikh NE, Beilharz TH, Preiss T. Dynamics of ribosome scanning and recycling revealed by translation complex profiling. Nature. 2016;535(7613):570–4.

Chollet F, et al. Keras. 2015. https://keras.io . Accessed Aug 2021.

Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, et al. TensorFlow: large-scale machine learning on heterogeneous systems. 2015. Software available from tensorflow.org. https://www.tensorflow.org/ . Accessed Aug 2021.

Kingma DP, Ba J. Adam: a method for stochastic optimization. 2017. https://arxiv.org/abs/1412.6980 . Accessed Aug 2021.

Zur H, Tuller T. New universal rules of eukaryotic translation initiation fidelity. PLoS Comput Biol. 2013;9(7):e1003136.

Arava Y, Wang Y, Storey JD, Liu CL, Brown PO, Herschlag D. Genome-wide analysis of mRNA translation profiles in Saccharomyces cerevisiae. Proc Natl Acad Sci U S A. 2003;100(7):3889–94.

Schlusser N. predicting-translation-initiation-efficiency. GitHub; 2024. https://git.scicore.unibas.ch/zavolan_group/data_analysis/predicting-translation-initiation-efficiency/-/tree/main?ref_type=heads .

Schlusser N. Version of GitHub repository: predicting-translation-initiation-efficiency. Zenodo; 2024. https://doi.org/10.5281/zenodo.13133725 .

Schlusser N. Plot scripts for: Current limitations in predicting mRNA translation with deep learning models. Zenodo; 2024. https://doi.org/10.5281/zenodo.10463090 .

Acknowledgements

We would like to thank Aleksei Mironov for providing us with yeast 5′UTR annotations and, along with Meric Ataman, for helpful discussions.

Review history

The review history is available as Additional file 2.

Peer review information

Andrew Cosgrove was the primary editor of this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Open access funding provided by University of Basel. This work has been supported by the Swiss National Science Foundation grant #310030_204517 to M.Z. Calculations were performed at sciCORE ( http://scicore.unibas.ch/ ) scientific computing core facility at University of Basel.

Author information

Authors and affiliations.

Biozentrum, University of Basel, Spitalstrasse 41, 4056, Basel, Switzerland

Niels Schlusser, Asier González, Muskan Pandey & Mihaela Zavolan

Departament de Bioquímica i Biologia Molecular and Institut de Biotecnologia i Biomedicina, Universitat Autònoma de Barcelona, 08193, Cerdanyola del Vallès, Spain

Asier González

Current address: Institute of Molecular Biology and Biophysics, Department of Biology, ETH Zurich, 8093, Zurich, Switzerland

Muskan Pandey

Contributions

M.Z. and N.S. conceived the study. N.S. implemented the models and carried out the analysis with input from M.Z. M.P. and A.G. provided experimental data.

Corresponding authors

Correspondence to Niels Schlusser or Mihaela Zavolan .

Ethics declarations

Ethics approval and consent to participate.

Not applicable.

Consent for publication

Competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

13059_2024_3369_moesm1_esm.zip.

Additional file 1: Contains Supplementary Figures S1-S8 and the description of Supplementary Table S1. For readability, the 7238 lines of the table are not displayed in the additional file. Instead, the table is provided separately as supp_tab_1.tsv.

Additional file 2: Contains the review history.

Rights and permissions.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Schlusser, N., González, A., Pandey, M. et al. Current limitations in predicting mRNA translation with deep learning models. Genome Biol 25 , 227 (2024). https://doi.org/10.1186/s13059-024-03369-6

Received : 26 December 2023

Accepted : 07 August 2024

Published : 20 August 2024

DOI : https://doi.org/10.1186/s13059-024-03369-6

  • Translation control
  • Deep learning
  • Explainable AI
  • Systems biology

Genome Biology

ISSN: 1474-760X

research problem article

IMAGES

  1. How To Identify The Problem In A Research Study

  2. How to Formulate a Research Problem: Useful Tips

  3. 🌱 How to write a research problem statement. How to Write a Problem

  4. How To Identify A Problem Statement In A Research Article

  5. Research Problem Statement Examples : Welcome to the Purdue OWL

  6. (PDF) Identifying and Stating the Problem through the Use of a Research

COMMENTS

  1. What is a Research Problem? Characteristics, Types, and Examples

    A research problem is at the heart of scientific inquiry. It guides the trajectory of an investigation, helping to define the research scope and identify the key questions that need to be answered. Read this detailed article to learn what a research problem is, its types and key characteristics, and how to define one, with examples.

  2. How to Define a Research Problem

    The type of research problem you choose depends on your broad topic of interest and the type of research you think will fit best. This article helps you identify and refine a research problem. When writing your research proposal or introduction, formulate it as a problem statement and/or research questions.

  3. (PDF) Identifying and Formulating the Research Problem

    Abstract. The first and most important step of research is the formulation of research problems. It is like the foundation of a building to be constructed. To solve a problem someone has to know ...

  4. 1. Choosing a Research Problem

    The research problem, therefore, is the main organizing principle guiding the analysis of your research. The problem under investigation establishes an occasion for writing and a focus that governs what you want to say. It represents the core subject matter of scholarly communication and the means by which scholars arrive at other topics of ...

  5. The Research Problem/Question

    A research problem is a definite or clear expression [statement] about an area of concern, a condition to be improved upon, a difficulty to be eliminated, or a troubling question that exists in scholarly literature, in theory, or within existing practice that points to a need for meaningful understanding and deliberate investigation. A research ...

  6. Identifying a Research Problem: A Step-by-Step Guide

    A compelling research problem not only captivates the attention of your peers but also lays the foundation for impactful and meaningful research outcomes. Initial Steps to Identification. To identify a research problem, you need a systematic approach and a deep understanding of the subject area. Below are some steps to guide you in this process:

  7. The Research Problem & Problem Statement

    A research problem can be theoretical in nature, focusing on an area of academic research that is lacking in some way. Alternatively, a research problem can be more applied in nature, focused on finding a practical solution to an established problem within an industry or an organisation. In other words, theoretical research problems are motivated by the desire to grow the overall body of ...

  8. PDF Identifying a Research Problem and Question, and Searching Relevant

    research question for a study, depending on the complexity and breadth of your proposed work. Each question should be clear and specific, refer to the problem or phenomenon, reflect an intervention in experimental work, and note the target population or participants (see Figure 2.1). Identifying a research question will provide greater focus ...

  9. What is a Problem Statement in Research? How to Write It with Examples

    A research problem statement is a descriptive statement that conveys the issue a researcher is trying to address through the study, with the aim of informing the reader of the context and significance of the study at hand. The research problem statement is crucial for researchers to focus on a particular component of a vast field of ...

  10. Research Problem

    Applications of Research Problem. Applications of Research Problem are as follows: Academic research: Research problems are used to guide academic research in various fields, including social sciences, natural sciences, humanities, and engineering. Researchers use research problems to identify gaps in knowledge, address theoretical or practical problems, and explore new areas of study.

  11. Research Problem

    Here, in this article, we explore a research problem in a dissertation or an essay with some research problem examples to help you better understand how and when you should write a research problem. "A research problem is a specific statement relating to an area of concern and is contingent on the type of research. Some research studies focus ...

  12. How To Formulate A Research Problem

    Difference Between a Research Problem and a Research Topic. Research Problem: A research problem is a specific issue, gap, or question that requires investigation and can be addressed through research. It is a clearly defined and focused problem that the researcher aims to solve or explore. The research problem provides the context and ...

  13. How to Write an Effective Problem Statement

    The problem statement usually appears at the beginning of an article, making it one of the first things readers encounter. An excellent problem statement not only explains the relevance and importance of the research but also helps readers quickly determine if the article aligns with their interests by clearly defining the topic.

  14. Identifying a Research Problem

    A research problem is a specific issue or gap in existing knowledge that you aim to address in your research. You may look for practical problems aimed at contributing to change or theoretical problems aimed at expanding knowledge. Some research will do both of these things, but usually, the research problem focuses on one or the other.

  15. Full article: Research Problems and Hypotheses in Empirical Research

    A research problem can be broadly or narrowly formulated, and a broad problem will in general require more resources. Note that the first initial-need subcriterion equals the first need subcriterion for produced knowledge claims, that the second initial-need subcriterion refers to the before-study knowledge space, and that the second need ...

  16. Research Problems: How to Identify & Resolve

    A research problem has two essential roles in setting your research project on a course for success. 1. They set the scope. The research problem defines what problem or opportunity you're looking at and what your research goals are. It stops you from getting side-tracked or allowing the scope of research to creep off-course.

  17. (PDF) Research Problem

    The research problem is typically formulated based on gaps or deficiencies in existing knowledge, unresolved questions, practical concerns, or emerging issues within a particular field or ...

  18. How To Define a Research Problem in 6 Steps (With Types)

    5. Select and include important variables. A clear and manageable research problem typically includes the variables that are most relevant to the study. A research team summarizes how they plan to consider and use these variables and how they might influence the results of the study. Selecting the most important variables can help the study's ...

  19. Q: How do I identify a research problem and properly state it?

    The problem statement is a crystallization - a focused expression - of the research problem. A good problem statement will do the following: describe the problem(s) succinctly; include a vision (solution); suggest a method to solve the problem(s); provide a hypothesis. Again, here is an excellent detailed article, with multiple examples and ...

  20. 10 Research Question Examples to Guide your Research Project

    The first question asks for a ready-made solution, and is not focused or researchable. The second question is a clearer comparative question, but note that it may not be practically feasible. For a smaller research project or thesis, it could be narrowed down further to focus on the effectiveness of drunk driving laws in just one or two countries.

  21. The Research Problem/Question

    A research problem is a statement about an area of concern, a condition to be improved, a difficulty to be eliminated, or a troubling question that exists in scholarly literature, in theory, or in practice that points to the need for meaningful understanding and deliberate investigation. In some social science disciplines the research problem is typically posed in the form of a question.

  23. Choosing a Research Problem

    A research problem is the main organizing principle guiding the analysis of your paper. The problem under investigation offers us an occasion for writing and a focus that governs what we want to say. It represents the core subject matter of scholarly communication, and the means by which we arrive at other topics of conversations and the discovery of new knowledge and understanding.

  24. Sir Kim Workman's major police investigation released: What the seven

    A major report into unconscious bias among the police has found being Māori increases the chance of being prosecuted by 11% compared with Pākehā. In addition, Māori made up 42% of people who ...

  25. Current limitations in predicting mRNA translation with deep learning

    The design of nucleotide sequences with defined properties is a long-standing problem in bioengineering. An important application is protein expression, be it in the context of research or the production of mRNA vaccines. The rate of protein synthesis depends on the 5′ untranslated region (5′UTR) of the mRNAs, and recently, deep learning models were proposed to predict the translation ...