Research-Methodology

Regression Analysis

Regression analysis is a quantitative research method which is used when the study involves modelling and analysing several variables, where the relationship includes a dependent variable and one or more independent variables. In simple terms, regression analysis is a quantitative method used to test the nature of relationships between a dependent variable and one or more independent variables.

The basic form of regression models includes unknown parameters (β), independent variables (X), and the dependent variable (Y).

Regression model, basically, specifies the relation of dependent variable (Y) to a function combination of independent variables (X) and unknown parameters (β)

                                    Y  ≈  f (X, β)   

Regression equation can be used to predict the values of ‘y’, if the value of ‘x’ is given, and both ‘y’ and ‘x’ are the two sets of measures of a sample size of ‘n’. The formulae for regression equation would be

Regression analysis

Do not be intimidated by visual complexity of correlation and regression formulae above. You don’t have to apply the formula manually, and correlation and regression analyses can be run with the application of popular analytical software such as Microsoft Excel, Microsoft Access, SPSS and others.

Linear regression analysis is based on the following set of assumptions:

1. Assumption of linearity . There is a linear relationship between dependent and independent variables.

2. Assumption of homoscedasticity . Data values for dependent and independent variables have equal variances.

3. Assumption of absence of collinearity or multicollinearity . There is no correlation between two or more independent variables.

4. Assumption of normal distribution . The data for the independent variables and dependent variable are normally distributed

My e-book,  The Ultimate Guide to Writing a Dissertation in Business Studies: a step by step assistance  offers practical assistance to complete a dissertation with minimum or no stress. The e-book covers all stages of writing a dissertation starting from the selection to the research area to submitting the completed version of the work within the deadline. John Dudovskiy

Regression analysis

Duquesne University Logo

Quantitative Research Methods

  • Introduction
  • Descriptive and Inferential Statistics
  • Hypothesis Testing
  • Regression and Correlation
  • Time Series
  • Meta-Analysis
  • Mixed Methods
  • Additional Resources
  • Get Research Help

regression analysis quantitative research

Correlation is the relationship or association between two variables. There are multiple ways to measure correlation, but the most common is Pearson's correlation coefficient (r), which tells you the strength of the linear relationship between two variables. The value of r has a range of -1 to 1 (0 indicates no relationship). Values of r closer to -1 or 1 indicate a stronger relationship and values closer to 0 indicate a weaker relationship.  Because Pearson's coefficient only picks up on linear relationships, and there are many other ways for variables to be associated, it's always best to plot your variables on a scatter plot, so that you can visually inspect them for other types of correlation.

  • Correlation Penn State University tutorial
  • Correlation and Causation Australian Bureau of Statistics Article

Spurious Relationships

It's important to remember that correlation does not always indicate causation. Two variables can be correlated without either variable causing the other. For instance, ice cream sales and drownings might be correlated, but that doesn't mean that ice cream causes drownings—instead, both ice cream sales and drownings increase when the weather is hot. Relationships like this are called spurious correlations.

  • Spuriousness Harvard Business Review article.
  • New Evidence for Theory of The Stork A satirical article demonstrating the dangers of confusing correlation with causation.

regression analysis quantitative research

Regression is a statistical method for estimating the relationship between two or more variables. In theory, regression can be used to predict the value of one variable (the dependent variable) from the value of one or more other variables (the independent variable/s or predictor/s). There are many different types of regression, depending on the number of variables and the properties of the data that one is working with, and each makes assumptions about the relationship between the variables. (For instance, most types of regression assume that the variables have a linear relationship.) Therefore, it is important to understand the assumptions underlying the type of regression that you use and how to properly interpret its results. Because regression will always output a relationship, whether or not the variables are truly causally associated, it is also important to carefully select your predictor variables.

  • A Refresher on Regression Analysis Harvard Business Review article.
  • Introductory Business Statistics - Regression

Simple Linear Regression

Simple linear regression estimates a linear relationship between one dependent variable and one independent variable.

  • Simple Linear Regression Tutorial Penn State University Tutorial
  • Statistics 101: Linear Regression, The Very Basics YouTube video from Brandon Foltz.

Multiple Linear Regression

Multiple linear regression estimates a linear relationship between one dependent variable and two or more independent variables.

  • Multiple Linear Regression Tutorial Penn State University Tutorial
  • Multiple Regression Basics NYU course materials.
  • Statistics 101: Multiple Linear Regression, The Very Basics YouTube video from Brandon Foltz.

If you do a subject search for Regression Analysis you'll see that the library has over 200 books about regression.  Select books are listed below.  Also, note that econometrics texts will often include regression analysis and other related methods.  

regression analysis quantitative research

Search for ebooks using Quicksearch .  Use keywords to search for e-books about Regression .  

regression analysis quantitative research

  • << Previous: Hypothesis Testing
  • Next: ANOVA >>
  • Last Updated: Aug 18, 2023 11:55 AM
  • URL: https://guides.library.duq.edu/quant-methods
  • Privacy Policy

Research Method

Home » Regression Analysis – Methods, Types and Examples

Regression Analysis – Methods, Types and Examples

Table of Contents

Regression Analysis

Regression Analysis

Regression analysis is a set of statistical processes for estimating the relationships among variables . It includes many techniques for modeling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables (or ‘predictors’).

Regression Analysis Methodology

Here is a general methodology for performing regression analysis:

  • Define the research question: Clearly state the research question or hypothesis you want to investigate. Identify the dependent variable (also called the response variable or outcome variable) and the independent variables (also called predictor variables or explanatory variables) that you believe are related to the dependent variable.
  • Collect data: Gather the data for the dependent variable and independent variables. Ensure that the data is relevant, accurate, and representative of the population or phenomenon you are studying.
  • Explore the data: Perform exploratory data analysis to understand the characteristics of the data, identify any missing values or outliers, and assess the relationships between variables through scatter plots, histograms, or summary statistics.
  • Choose the regression model: Select an appropriate regression model based on the nature of the variables and the research question. Common regression models include linear regression, multiple regression, logistic regression, polynomial regression, and time series regression, among others.
  • Assess assumptions: Check the assumptions of the regression model. Some common assumptions include linearity (the relationship between variables is linear), independence of errors, homoscedasticity (constant variance of errors), and normality of errors. Violation of these assumptions may require additional steps or alternative models.
  • Estimate the model: Use a suitable method to estimate the parameters of the regression model. The most common method is ordinary least squares (OLS), which minimizes the sum of squared differences between the observed and predicted values of the dependent variable.
  • I nterpret the results: Analyze the estimated coefficients, p-values, confidence intervals, and goodness-of-fit measures (e.g., R-squared) to interpret the results. Determine the significance and direction of the relationships between the independent variables and the dependent variable.
  • Evaluate model performance: Assess the overall performance of the regression model using appropriate measures, such as R-squared, adjusted R-squared, and root mean squared error (RMSE). These measures indicate how well the model fits the data and how much of the variation in the dependent variable is explained by the independent variables.
  • Test assumptions and diagnose problems: Check the residuals (the differences between observed and predicted values) for any patterns or deviations from assumptions. Conduct diagnostic tests, such as examining residual plots, testing for multicollinearity among independent variables, and assessing heteroscedasticity or autocorrelation, if applicable.
  • Make predictions and draw conclusions: Once you have a satisfactory model, use it to make predictions on new or unseen data. Draw conclusions based on the results of the analysis, considering the limitations and potential implications of the findings.

Types of Regression Analysis

Types of Regression Analysis are as follows:

Linear Regression

Linear regression is the most basic and widely used form of regression analysis. It models the linear relationship between a dependent variable and one or more independent variables. The goal is to find the best-fitting line that minimizes the sum of squared differences between observed and predicted values.

Multiple Regression

Multiple regression extends linear regression by incorporating two or more independent variables to predict the dependent variable. It allows for examining the simultaneous effects of multiple predictors on the outcome variable.

Polynomial Regression

Polynomial regression models non-linear relationships between variables by adding polynomial terms (e.g., squared or cubic terms) to the regression equation. It can capture curved or nonlinear patterns in the data.

Logistic Regression

Logistic regression is used when the dependent variable is binary or categorical. It models the probability of the occurrence of a certain event or outcome based on the independent variables. Logistic regression estimates the coefficients using the logistic function, which transforms the linear combination of predictors into a probability.

Ridge Regression and Lasso Regression

Ridge regression and Lasso regression are techniques used for addressing multicollinearity (high correlation between independent variables) and variable selection. Both methods introduce a penalty term to the regression equation to shrink or eliminate less important variables. Ridge regression uses L2 regularization, while Lasso regression uses L1 regularization.

Time Series Regression

Time series regression analyzes the relationship between a dependent variable and independent variables when the data is collected over time. It accounts for autocorrelation and trends in the data and is used in forecasting and studying temporal relationships.

Nonlinear Regression

Nonlinear regression models are used when the relationship between the dependent variable and independent variables is not linear. These models can take various functional forms and require estimation techniques different from those used in linear regression.

Poisson Regression

Poisson regression is employed when the dependent variable represents count data. It models the relationship between the independent variables and the expected count, assuming a Poisson distribution for the dependent variable.

Generalized Linear Models (GLM)

GLMs are a flexible class of regression models that extend the linear regression framework to handle different types of dependent variables, including binary, count, and continuous variables. GLMs incorporate various probability distributions and link functions.

Regression Analysis Formulas

Regression analysis involves estimating the parameters of a regression model to describe the relationship between the dependent variable (Y) and one or more independent variables (X). Here are the basic formulas for linear regression, multiple regression, and logistic regression:

Linear Regression:

Simple Linear Regression Model: Y = β0 + β1X + ε

Multiple Linear Regression Model: Y = β0 + β1X1 + β2X2 + … + βnXn + ε

In both formulas:

  • Y represents the dependent variable (response variable).
  • X represents the independent variable(s) (predictor variable(s)).
  • β0, β1, β2, …, βn are the regression coefficients or parameters that need to be estimated.
  • ε represents the error term or residual (the difference between the observed and predicted values).

Multiple Regression:

Multiple regression extends the concept of simple linear regression by including multiple independent variables.

Multiple Regression Model: Y = β0 + β1X1 + β2X2 + … + βnXn + ε

The formulas are similar to those in linear regression, with the addition of more independent variables.

Logistic Regression:

Logistic regression is used when the dependent variable is binary or categorical. The logistic regression model applies a logistic or sigmoid function to the linear combination of the independent variables.

Logistic Regression Model: p = 1 / (1 + e^-(β0 + β1X1 + β2X2 + … + βnXn))

In the formula:

  • p represents the probability of the event occurring (e.g., the probability of success or belonging to a certain category).
  • X1, X2, …, Xn represent the independent variables.
  • e is the base of the natural logarithm.

The logistic function ensures that the predicted probabilities lie between 0 and 1, allowing for binary classification.

Regression Analysis Examples

Regression Analysis Examples are as follows:

  • Stock Market Prediction: Regression analysis can be used to predict stock prices based on various factors such as historical prices, trading volume, news sentiment, and economic indicators. Traders and investors can use this analysis to make informed decisions about buying or selling stocks.
  • Demand Forecasting: In retail and e-commerce, real-time It can help forecast demand for products. By analyzing historical sales data along with real-time data such as website traffic, promotional activities, and market trends, businesses can adjust their inventory levels and production schedules to meet customer demand more effectively.
  • Energy Load Forecasting: Utility companies often use real-time regression analysis to forecast electricity demand. By analyzing historical energy consumption data, weather conditions, and other relevant factors, they can predict future energy loads. This information helps them optimize power generation and distribution, ensuring a stable and efficient energy supply.
  • Online Advertising Performance: It can be used to assess the performance of online advertising campaigns. By analyzing real-time data on ad impressions, click-through rates, conversion rates, and other metrics, advertisers can adjust their targeting, messaging, and ad placement strategies to maximize their return on investment.
  • Predictive Maintenance: Regression analysis can be applied to predict equipment failures or maintenance needs. By continuously monitoring sensor data from machines or vehicles, regression models can identify patterns or anomalies that indicate potential failures. This enables proactive maintenance, reducing downtime and optimizing maintenance schedules.
  • Financial Risk Assessment: Real-time regression analysis can help financial institutions assess the risk associated with lending or investment decisions. By analyzing real-time data on factors such as borrower financials, market conditions, and macroeconomic indicators, regression models can estimate the likelihood of default or assess the risk-return tradeoff for investment portfolios.

Importance of Regression Analysis

Importance of Regression Analysis is as follows:

  • Relationship Identification: Regression analysis helps in identifying and quantifying the relationship between a dependent variable and one or more independent variables. It allows us to determine how changes in independent variables impact the dependent variable. This information is crucial for decision-making, planning, and forecasting.
  • Prediction and Forecasting: Regression analysis enables us to make predictions and forecasts based on the relationships identified. By estimating the values of the dependent variable using known values of independent variables, regression models can provide valuable insights into future outcomes. This is particularly useful in business, economics, finance, and other fields where forecasting is vital for planning and strategy development.
  • Causality Assessment: While correlation does not imply causation, regression analysis provides a framework for assessing causality by considering the direction and strength of the relationship between variables. It allows researchers to control for other factors and assess the impact of a specific independent variable on the dependent variable. This helps in determining the causal effect and identifying significant factors that influence outcomes.
  • Model Building and Variable Selection: Regression analysis aids in model building by determining the most appropriate functional form of the relationship between variables. It helps researchers select relevant independent variables and eliminate irrelevant ones, reducing complexity and improving model accuracy. This process is crucial for creating robust and interpretable models.
  • Hypothesis Testing: Regression analysis provides a statistical framework for hypothesis testing. Researchers can test the significance of individual coefficients, assess the overall model fit, and determine if the relationship between variables is statistically significant. This allows for rigorous analysis and validation of research hypotheses.
  • Policy Evaluation and Decision-Making: Regression analysis plays a vital role in policy evaluation and decision-making processes. By analyzing historical data, researchers can evaluate the effectiveness of policy interventions and identify the key factors contributing to certain outcomes. This information helps policymakers make informed decisions, allocate resources effectively, and optimize policy implementation.
  • Risk Assessment and Control: Regression analysis can be used for risk assessment and control purposes. By analyzing historical data, organizations can identify risk factors and develop models that predict the likelihood of certain outcomes, such as defaults, accidents, or failures. This enables proactive risk management, allowing organizations to take preventive measures and mitigate potential risks.

When to Use Regression Analysis

  • Prediction : Regression analysis is often employed to predict the value of the dependent variable based on the values of independent variables. For example, you might use regression to predict sales based on advertising expenditure, or to predict a student’s academic performance based on variables like study time, attendance, and previous grades.
  • Relationship analysis: Regression can help determine the strength and direction of the relationship between variables. It can be used to examine whether there is a linear association between variables, identify which independent variables have a significant impact on the dependent variable, and quantify the magnitude of those effects.
  • Causal inference: Regression analysis can be used to explore cause-and-effect relationships by controlling for other variables. For example, in a medical study, you might use regression to determine the impact of a specific treatment while accounting for other factors like age, gender, and lifestyle.
  • Forecasting : Regression models can be utilized to forecast future trends or outcomes. By fitting a regression model to historical data, you can make predictions about future values of the dependent variable based on changes in the independent variables.
  • Model evaluation: Regression analysis can be used to evaluate the performance of a model or test the significance of variables. You can assess how well the model fits the data, determine if additional variables improve the model’s predictive power, or test the statistical significance of coefficients.
  • Data exploration : Regression analysis can help uncover patterns and insights in the data. By examining the relationships between variables, you can gain a deeper understanding of the data set and identify potential patterns, outliers, or influential observations.

Applications of Regression Analysis

Here are some common applications of regression analysis:

  • Economic Forecasting: Regression analysis is frequently employed in economics to forecast variables such as GDP growth, inflation rates, or stock market performance. By analyzing historical data and identifying the underlying relationships, economists can make predictions about future economic conditions.
  • Financial Analysis: Regression analysis plays a crucial role in financial analysis, such as predicting stock prices or evaluating the impact of financial factors on company performance. It helps analysts understand how variables like interest rates, company earnings, or market indices influence financial outcomes.
  • Marketing Research: Regression analysis helps marketers understand consumer behavior and make data-driven decisions. It can be used to predict sales based on advertising expenditures, pricing strategies, or demographic variables. Regression models provide insights into which marketing efforts are most effective and help optimize marketing campaigns.
  • Health Sciences: Regression analysis is extensively used in medical research and public health studies. It helps examine the relationship between risk factors and health outcomes, such as the impact of smoking on lung cancer or the relationship between diet and heart disease. Regression analysis also helps in predicting health outcomes based on various factors like age, genetic markers, or lifestyle choices.
  • Social Sciences: Regression analysis is widely used in social sciences like sociology, psychology, and education research. Researchers can investigate the impact of variables like income, education level, or social factors on various outcomes such as crime rates, academic performance, or job satisfaction.
  • Operations Research: Regression analysis is applied in operations research to optimize processes and improve efficiency. For example, it can be used to predict demand based on historical sales data, determine the factors influencing production output, or optimize supply chain logistics.
  • Environmental Studies: Regression analysis helps in understanding and predicting environmental phenomena. It can be used to analyze the impact of factors like temperature, pollution levels, or land use patterns on phenomena such as species diversity, water quality, or climate change.
  • Sports Analytics: Regression analysis is increasingly used in sports analytics to gain insights into player performance, team strategies, and game outcomes. It helps analyze the relationship between various factors like player statistics, coaching strategies, or environmental conditions and their impact on game outcomes.

Advantages and Disadvantages of Regression Analysis

Advantages of Regression AnalysisDisadvantages of Regression Analysis
Provides a quantitative measure of the relationship between variablesAssumes a linear relationship between variables, which may not always hold true
Helps in predicting and forecasting outcomes based on historical dataRequires a large sample size to produce reliable results
Identifies and measures the significance of independent variables on the dependent variableAssumes no multicollinearity, meaning that independent variables should not be highly correlated with each other
Provides estimates of the coefficients that represent the strength and direction of the relationship between variablesAssumes the absence of outliers or influential data points
Allows for hypothesis testing to determine the statistical significance of the relationshipCan be sensitive to the inclusion or exclusion of certain variables, leading to different results
Can handle both continuous and categorical variablesAssumes the independence of observations, which may not hold true in some cases
Offers a visual representation of the relationship through the use of scatter plots and regression linesMay not capture complex non-linear relationships between variables without appropriate transformations
Provides insights into the marginal effects of independent variables on the dependent variableRequires the assumption of homoscedasticity, meaning that the variance of errors is constant across all levels of the independent variables

About the author

' src=

Muhammad Hassan

Researcher, Academic Writer, Web developer

You may also like

Grounded Theory

Grounded Theory – Methods, Examples and Guide

Probability Histogram

Probability Histogram – Definition, Examples and...

Phenomenology

Phenomenology – Methods, Examples and Guide

Narrative Analysis

Narrative Analysis – Types, Methods and Examples

Methodological Framework

Methodological Framework – Types, Examples and...

Multidimensional Scaling

Multidimensional Scaling – Types, Formulas and...

Root out friction in every digital experience, super-charge conversion rates, and optimize digital self-service

Uncover insights from any interaction, deliver AI-powered agent coaching, and reduce cost to serve

Increase revenue and loyalty with real-time insights and recommendations delivered to teams on the ground

Know how your people feel and empower managers to improve employee engagement, productivity, and retention

Take action in the moments that matter most along the employee journey and drive bottom line growth

Whatever they’re are saying, wherever they’re saying it, know exactly what’s going on with your people

Get faster, richer insights with qual and quant tools that make powerful market research available to everyone

Run concept tests, pricing studies, prototyping + more with fast, powerful studies designed by UX research experts

Track your brand performance 24/7 and act quickly to respond to opportunities and challenges in your market

Explore the platform powering Experience Management

  • Free Account
  • Product Demos
  • For Digital
  • For Customer Care
  • For Human Resources
  • For Researchers
  • Financial Services
  • All Industries

Popular Use Cases

  • Customer Experience
  • Employee Experience
  • Net Promoter Score
  • Voice of Customer
  • Customer Success Hub
  • Product Documentation
  • Training & Certification
  • XM Institute
  • Popular Resources
  • Customer Stories
  • Artificial Intelligence
  • Market Research
  • Partnerships
  • Marketplace

The annual gathering of the experience leaders at the world’s iconic brands building breakthrough business results, live in Salt Lake City.

  • English/AU & NZ
  • Español/Europa
  • Español/América Latina
  • Português Brasileiro
  • REQUEST DEMO
  • Experience Management
  • Survey Data Analysis & Reporting
  • Regression Analysis

Try Qualtrics for free

The complete guide to regression analysis.

19 min read What is regression analysis and why is it useful? While most of us have heard the term, understanding regression analysis in detail may be something you need to brush up on. Here’s what you need to know about this popular method of analysis.

When you rely on data to drive and guide business decisions, as well as predict market trends, just gathering and analyzing what you find isn’t enough — you need to ensure it’s relevant and valuable.

The challenge, however, is that so many variables can influence business data: market conditions, economic disruption, even the weather! As such, it’s essential you know which variables are affecting your data and forecasts, and what data you can discard.

And one of the most effective ways to determine data value and monitor trends (and the relationships between them) is to use regression analysis, a set of statistical methods used for the estimation of relationships between independent and dependent variables.

In this guide, we’ll cover the fundamentals of regression analysis, from what it is and how it works to its benefits and practical applications.

Free eBook: 2024 global market research trends report

What is regression analysis?

Regression analysis is a statistical method. It’s used for analyzing different factors that might influence an objective – such as the success of a product launch, business growth, a new marketing campaign – and determining which factors are important and which ones can be ignored.

Regression analysis can also help leaders understand how different variables impact each other and what the outcomes are. For example, when forecasting financial performance, regression analysis can help leaders determine how changes in the business can influence revenue or expenses in the future.

Running an analysis of this kind, you might find that there’s a high correlation between the number of marketers employed by the company, the leads generated, and the opportunities closed.

This seems to suggest that a high number of marketers and a high number of leads generated influences sales success. But do you need both factors to close those sales? By analyzing the effects of these variables on your outcome,  you might learn that when leads increase but the number of marketers employed stays constant, there is no impact on the number of opportunities closed, but if the number of marketers increases, leads and closed opportunities both rise.

Regression analysis can help you tease out these complex relationships so you can determine which areas you need to focus on in order to get your desired results, and avoid wasting time with those that have little or no impact. In this example, that might mean hiring more marketers rather than trying to increase leads generated.

How does regression analysis work?

Regression analysis starts with variables that are categorized into two types: dependent and independent variables. The variables you select depend on the outcomes you’re analyzing.

Understanding variables:

1. dependent variable.

This is the main variable that you want to analyze and predict. For example, operational (O) data such as your quarterly or annual sales, or experience (X) data such as your net promoter score (NPS) or customer satisfaction score (CSAT) .

These variables are also called response variables, outcome variables, or left-hand-side variables (because they appear on the left-hand side of a regression equation).

There are three easy ways to identify them:

  • Is the variable measured as an outcome of the study?
  • Does the variable depend on another in the study?
  • Do you measure the variable only after other variables are altered?

2. Independent variable

Independent variables are the factors that could affect your dependent variables. For example, a price rise in the second quarter could make an impact on your sales figures.

You can identify independent variables with the following list of questions:

  • Is the variable manipulated, controlled, or used as a subject grouping method by the researcher?
  • Does this variable come before the other variable in time?
  • Are you trying to understand whether or how this variable affects another?

Independent variables are often referred to differently in regression depending on the purpose of the analysis. You might hear them called:

Explanatory variables

Explanatory variables are those which explain an event or an outcome in your study. For example, explaining why your sales dropped or increased.

Predictor variables

Predictor variables are used to predict the value of the dependent variable. For example, predicting how much sales will increase when new product features are rolled out .

Experimental variables

These are variables that can be manipulated or changed directly by researchers to assess the impact. For example, assessing how different product pricing ($10 vs $15 vs $20) will impact the likelihood to purchase.

Subject variables (also called fixed effects)

Subject variables can’t be changed directly, but vary across the sample. For example, age, gender, or income of consumers.

Unlike experimental variables, you can’t randomly assign or change subject variables, but you can design your regression analysis to determine the different outcomes of groups of participants with the same characteristics. For example, ‘how do price rises impact sales based on income?’

Carrying out regression analysis

Regression analysis

So regression is about the relationships between dependent and independent variables. But how exactly do you do it?

Assuming you have your data collection done already, the first and foremost thing you need to do is plot your results on a graph. Doing this makes interpreting regression analysis results much easier as you can clearly see the correlations between dependent and independent variables.

Let’s say you want to carry out a regression analysis to understand the relationship between the number of ads placed and revenue generated.

On the Y-axis, you place the revenue generated. On the X-axis, the number of digital ads. By plotting the information on the graph, and drawing a line (called the regression line) through the middle of the data, you can see the relationship between the number of digital ads placed and revenue generated.

Regression analysis - step by step

This regression line is the line that provides the best description of the relationship between your independent variables and your dependent variable. In this example, we’ve used a simple linear regression model.

Regression analysis - step by step

Statistical analysis software can draw this line for you and precisely calculate the regression line. The software then provides a formula for the slope of the line, adding further context to the relationship between your dependent and independent variables.

Simple linear regression analysis

A simple linear model uses a single straight line to determine the relationship between a single independent variable and a dependent variable.

This regression model is mostly used when you want to determine the relationship between two variables (like price increases and sales) or the value of the dependent variable at certain points of the independent variable (for example the sales levels at a certain price rise).

While linear regression is useful, it does require you to make some assumptions.

For example, it requires you to assume that:

  • the data was collected using a statistically valid sample collection method that is representative of the target population
  • The observed relationship between the variables can’t be explained by a ‘hidden’ third variable – in other words, there are no spurious correlations.
  • the relationship between the independent variable and dependent variable is linear – meaning that the best fit along the data points is a straight line and not a curved one

Multiple regression analysis

As the name suggests, multiple regression analysis is a type of regression that uses multiple variables. It uses multiple independent variables to predict the outcome of a single dependent variable. Of the various kinds of multiple regression, multiple linear regression is one of the best-known.

Multiple linear regression is a close relative of the simple linear regression model in that it looks at the impact of several independent variables on one dependent variable. However, like simple linear regression, multiple regression analysis also requires you to make some basic assumptions.

For example, you will be assuming that:

  • there is a linear relationship between the dependent and independent variables (it creates a straight line and not a curve through the data points)
  • the independent variables aren’t highly correlated in their own right

An example of multiple linear regression would be an analysis of how marketing spend, revenue growth, and general market sentiment affect the share price of a company.

With multiple linear regression models you can estimate how these variables will influence the share price, and to what extent.

Multivariate linear regression

Multivariate linear regression involves more than one dependent variable as well as multiple independent variables, making it more complicated than linear or multiple linear regressions. However, this also makes it much more powerful and capable of making predictions about complex real-world situations.

For example, if an organization wants to establish or estimate how the COVID-19 pandemic has affected employees in its different markets, it can use multivariate linear regression, with the different geographical regions as dependent variables and the different facets of the pandemic as independent variables (such as mental health self-rating scores, proportion of employees working at home, lockdown durations and employee sick days).

Through multivariate linear regression, you can look at relationships between variables in a holistic way and quantify the relationships between them. As you can clearly visualize those relationships, you can make adjustments to dependent and independent variables to see which conditions influence them. Overall, multivariate linear regression provides a more realistic picture than looking at a single variable.

However, because multivariate techniques are complex, they involve high-level mathematics that require a statistical program to analyze the data.

Logistic regression

Logistic regression models the probability of a binary outcome based on independent variables.

So, what is a binary outcome? It’s when there are only two possible scenarios, either the event happens (1) or it doesn’t (0). e.g. yes/no outcomes, pass/fail outcomes, and so on. In other words, if the outcome can be described as being in either one of two categories.

Logistic regression makes predictions based on independent variables that are assumed or known to have an influence on the outcome. For example, the probability of a sports team winning their game might be affected by independent variables like weather, day of the week, whether they are playing at home or away and how they fared in previous matches.

What are some common mistakes with regression analysis?

Across the globe, businesses are increasingly relying on quality data and insights to drive decision-making — but to make accurate decisions, it’s important that the data collected and statistical methods used to analyze it are reliable and accurate.

Using the wrong data or the wrong assumptions can result in poor decision-making, lead to missed opportunities to improve efficiency and savings, and — ultimately — damage your business long term.

  • Assumptions

When running regression analysis, be it a simple linear or multiple regression, it’s really important to check that the assumptions your chosen method requires have been met. If your data points don’t conform to a straight line of best fit, for example, you need to apply additional statistical modifications to accommodate the non-linear data. For example, if you are looking at income data, which scales on a logarithmic distribution, you should take the Natural Log of Income as your variable then adjust the outcome after the model is created.

  • Correlation vs. causation

It’s a well-worn phrase that bears repeating – correlation does not equal causation. While variables that are linked by causality will always show correlation, the reverse is not always true. Moreover, there is no statistic that can determine causality (although the design of your study overall can).

If you observe a correlation in your results, such as in the first example we gave in this article where there was a correlation between leads and sales, you can’t assume that one thing has influenced the other. Instead, you should use it as a starting point for investigating the relationship between the variables in more depth.

  • Choosing the wrong variables to analyze

Before you use any kind of statistical method, it’s important to understand the subject you’re researching in detail. Doing so means you’re making informed choices of variables and you’re not overlooking something important that might have a significant bearing on your dependent variable.

  • Model building The variables you include in your analysis are just as important as the variables you choose to exclude. That’s because the strength of each independent variable is influenced by the other variables in the model. Other techniques, such as Key Drivers Analysis, are able to account for these variable interdependencies.

Benefits of using regression analysis

There are several benefits to using regression analysis to judge how changing variables will affect your business and to ensure you focus on the right things when forecasting.

Here are just a few of those benefits:

Make accurate predictions

Regression analysis is commonly used when forecasting and forward planning for a business. For example, when predicting sales for the year ahead, a number of different variables will come into play to determine the eventual result.

Regression analysis can help you determine which of these variables are likely to have the biggest impact based on previous events and help you make more accurate forecasts and predictions.

Identify inefficiencies

Using a regression equation a business can identify areas for improvement when it comes to efficiency, either in terms of people, processes, or equipment.

For example, regression analysis can help a car manufacturer determine order numbers based on external factors like the economy or environment.

Using the initial regression equation, they can use it to determine how many members of staff and how much equipment they need to meet orders.

Drive better decisions

Improving processes or business outcomes is always on the minds of owners and business leaders, but without actionable data, they’re simply relying on instinct, and this doesn’t always work out.

This is particularly true when it comes to issues of price. For example, to what extent will raising the price (and to what level) affect next quarter’s sales?

There’s no way to know this without data analysis. Regression analysis can help provide insights into the correlation between price rises and sales based on historical data.

How do businesses use regression? A real-life example

Marketing and advertising spending are common topics for regression analysis. Companies use regression when trying to assess the value of ad spend and marketing spend on revenue.

A typical example is using a regression equation to assess the correlation between ad costs and conversions of new customers. In this instance,

  • our dependent variable (the factor we’re trying to assess the outcomes of) will be our conversions
  • the independent variable (the factor we’ll change to assess how it changes the outcome) will be the daily ad spend
  • the regression equation will try to determine whether an increase in ad spend has a direct correlation with the number of conversions we have

The analysis is relatively straightforward — using historical data from an ad account, we can use daily data to judge ad spend vs conversions and how changes to the spend alter the conversions.

By assessing this data over time, we can make predictions not only on whether increasing ad spend will lead to increased conversions but also what level of spending will lead to what increase in conversions. This can help to optimize campaign spend and ensure marketing delivers good ROI.

This is an example of a simple linear model. If you wanted to carry out a more complex regression equation, we could also factor in other independent variables such as seasonality, GDP, and the current reach of our chosen advertising networks.

By increasing the number of independent variables, we can get a better understanding of whether ad spend is resulting in an increase in conversions, whether it’s exerting an influence in combination with another set of variables, or if we’re dealing with a correlation with no causal impact – which might be useful for predictions anyway, but isn’t a lever we can use to increase sales.

Using this predicted value of each independent variable, we can more accurately predict how spend will change the conversion rate of advertising.

Regression analysis tools

Regression analysis is an important tool when it comes to better decision-making and improved business outcomes. To get the best out of it, you need to invest in the right kind of statistical analysis software.

The best option is likely to be one that sits at the intersection of powerful statistical analysis and intuitive ease of use, as this will empower everyone from beginners to expert analysts to uncover meaning from data, identify hidden trends and produce predictive models without statistical training being required.

IQ stats in action

To help prevent costly errors, choose a tool that automatically runs the right statistical tests and visualizations and then translates the results into simple language that anyone can put into action.

With software that’s both powerful and user-friendly, you can isolate key experience drivers, understand what influences the business, apply the most appropriate regression methods, identify data issues, and much more.

Regression analysis tools

With Qualtrics’ Stats iQ™, you don’t have to worry about the regression equation because our statistical software will run the appropriate equation for you automatically based on the variable type you want to monitor. You can also use several equations, including linear regression and logistic regression, to gain deeper insights into business outcomes and make more accurate, data-driven decisions.

Related resources

Analysis & Reporting

Data Analysis 31 min read

Social media analytics 13 min read, kano analysis 21 min read, margin of error 11 min read, data saturation in qualitative research 8 min read, thematic analysis 11 min read, behavioral analytics 12 min read, request demo.

Ready to learn more about Qualtrics?

regression analysis quantitative research

Quantitative Data Analysis 101

The lingo, methods and techniques, explained simply.

By: Derek Jansen (MBA)  and Kerryn Warren (PhD) | December 2020

Quantitative data analysis is one of those things that often strikes fear in students. It’s totally understandable – quantitative analysis is a complex topic, full of daunting lingo , like medians, modes, correlation and regression. Suddenly we’re all wishing we’d paid a little more attention in math class…

The good news is that while quantitative data analysis is a mammoth topic, gaining a working understanding of the basics isn’t that hard , even for those of us who avoid numbers and math . In this post, we’ll break quantitative analysis down into simple , bite-sized chunks so you can approach your research with confidence.

Quantitative data analysis methods and techniques 101

Overview: Quantitative Data Analysis 101

  • What (exactly) is quantitative data analysis?
  • When to use quantitative analysis
  • How quantitative analysis works

The two “branches” of quantitative analysis

  • Descriptive statistics 101
  • Inferential statistics 101
  • How to choose the right quantitative methods
  • Recap & summary

What is quantitative data analysis?

Despite being a mouthful, quantitative data analysis simply means analysing data that is numbers-based – or data that can be easily “converted” into numbers without losing any meaning.

For example, category-based variables like gender, ethnicity, or native language could all be “converted” into numbers without losing meaning – for example, English could equal 1, French 2, etc.

This contrasts against qualitative data analysis, where the focus is on words, phrases and expressions that can’t be reduced to numbers. If you’re interested in learning about qualitative analysis, check out our post and video here .

What is quantitative analysis used for?

Quantitative analysis is generally used for three purposes.

  • Firstly, it’s used to measure differences between groups . For example, the popularity of different clothing colours or brands.
  • Secondly, it’s used to assess relationships between variables . For example, the relationship between weather temperature and voter turnout.
  • And third, it’s used to test hypotheses in a scientifically rigorous way. For example, a hypothesis about the impact of a certain vaccine.

Again, this contrasts with qualitative analysis , which can be used to analyse people’s perceptions and feelings about an event or situation. In other words, things that can’t be reduced to numbers.

How does quantitative analysis work?

Well, since quantitative data analysis is all about analysing numbers , it’s no surprise that it involves statistics . Statistical analysis methods form the engine that powers quantitative analysis, and these methods can vary from pretty basic calculations (for example, averages and medians) to more sophisticated analyses (for example, correlations and regressions).

Sounds like gibberish? Don’t worry. We’ll explain all of that in this post. Importantly, you don’t need to be a statistician or math wiz to pull off a good quantitative analysis. We’ll break down all the technical mumbo jumbo in this post.

Need a helping hand?

regression analysis quantitative research

As I mentioned, quantitative analysis is powered by statistical analysis methods . There are two main “branches” of statistical methods that are used – descriptive statistics and inferential statistics . In your research, you might only use descriptive statistics, or you might use a mix of both , depending on what you’re trying to figure out. In other words, depending on your research questions, aims and objectives . I’ll explain how to choose your methods later.

So, what are descriptive and inferential statistics?

Well, before I can explain that, we need to take a quick detour to explain some lingo. To understand the difference between these two branches of statistics, you need to understand two important words. These words are population and sample .

First up, population . In statistics, the population is the entire group of people (or animals or organisations or whatever) that you’re interested in researching. For example, if you were interested in researching Tesla owners in the US, then the population would be all Tesla owners in the US.

However, it’s extremely unlikely that you’re going to be able to interview or survey every single Tesla owner in the US. Realistically, you’ll likely only get access to a few hundred, or maybe a few thousand owners using an online survey. This smaller group of accessible people whose data you actually collect is called your sample .

So, to recap – the population is the entire group of people you’re interested in, and the sample is the subset of the population that you can actually get access to. In other words, the population is the full chocolate cake , whereas the sample is a slice of that cake.

So, why is this sample-population thing important?

Well, descriptive statistics focus on describing the sample , while inferential statistics aim to make predictions about the population, based on the findings within the sample. In other words, we use one group of statistical methods – descriptive statistics – to investigate the slice of cake, and another group of methods – inferential statistics – to draw conclusions about the entire cake. There I go with the cake analogy again…

With that out the way, let’s take a closer look at each of these branches in more detail.

Descriptive statistics vs inferential statistics

Branch 1: Descriptive Statistics

Descriptive statistics serve a simple but critically important role in your research – to describe your data set – hence the name. In other words, they help you understand the details of your sample . Unlike inferential statistics (which we’ll get to soon), descriptive statistics don’t aim to make inferences or predictions about the entire population – they’re purely interested in the details of your specific sample .

When you’re writing up your analysis, descriptive statistics are the first set of stats you’ll cover, before moving on to inferential statistics. But, that said, depending on your research objectives and research questions , they may be the only type of statistics you use. We’ll explore that a little later.

So, what kind of statistics are usually covered in this section?

Some common statistical tests used in this branch include the following:

  • Mean – this is simply the mathematical average of a range of numbers.
  • Median – this is the midpoint in a range of numbers when the numbers are arranged in numerical order. If the data set makes up an odd number, then the median is the number right in the middle of the set. If the data set makes up an even number, then the median is the midpoint between the two middle numbers.
  • Mode – this is simply the most commonly occurring number in the data set.
  • In cases where most of the numbers are quite close to the average, the standard deviation will be relatively low.
  • Conversely, in cases where the numbers are scattered all over the place, the standard deviation will be relatively high.
  • Skewness . As the name suggests, skewness indicates how symmetrical a range of numbers is. In other words, do they tend to cluster into a smooth bell curve shape in the middle of the graph, or do they skew to the left or right?

Feeling a bit confused? Let’s look at a practical example using a small data set.

Descriptive statistics example data

On the left-hand side is the data set. This details the bodyweight of a sample of 10 people. On the right-hand side, we have the descriptive statistics. Let’s take a look at each of them.

First, we can see that the mean weight is 72.4 kilograms. In other words, the average weight across the sample is 72.4 kilograms. Straightforward.

Next, we can see that the median is very similar to the mean (the average). This suggests that this data set has a reasonably symmetrical distribution (in other words, a relatively smooth, centred distribution of weights, clustered towards the centre).

In terms of the mode , there is no mode in this data set. This is because each number is present only once and so there cannot be a “most common number”. If there were two people who were both 65 kilograms, for example, then the mode would be 65.

Next up is the standard deviation . 10.6 indicates that there’s quite a wide spread of numbers. We can see this quite easily by looking at the numbers themselves, which range from 55 to 90, which is quite a stretch from the mean of 72.4.

And lastly, the skewness of -0.2 tells us that the data is very slightly negatively skewed. This makes sense since the mean and the median are slightly different.

As you can see, these descriptive statistics give us some useful insight into the data set. Of course, this is a very small data set (only 10 records), so we can’t read into these statistics too much. Also, keep in mind that this is not a list of all possible descriptive statistics – just the most common ones.

But why do all of these numbers matter?

While these descriptive statistics are all fairly basic, they’re important for a few reasons:

  • Firstly, they help you get both a macro and micro-level view of your data. In other words, they help you understand both the big picture and the finer details.
  • Secondly, they help you spot potential errors in the data – for example, if an average is way higher than you’d expect, or responses to a question are highly varied, this can act as a warning sign that you need to double-check the data.
  • And lastly, these descriptive statistics help inform which inferential statistical techniques you can use, as those techniques depend on the skewness (in other words, the symmetry and normality) of the data.

Simply put, descriptive statistics are really important , even though the statistical techniques used are fairly basic. All too often at Grad Coach, we see students skimming over the descriptives in their eagerness to get to the more exciting inferential methods, and then landing up with some very flawed results.

Don’t be a sucker – give your descriptive statistics the love and attention they deserve!

Examples of descriptive statistics

Branch 2: Inferential Statistics

As I mentioned, while descriptive statistics are all about the details of your specific data set – your sample – inferential statistics aim to make inferences about the population . In other words, you’ll use inferential statistics to make predictions about what you’d expect to find in the full population.

What kind of predictions, you ask? Well, there are two common types of predictions that researchers try to make using inferential stats:

  • Firstly, predictions about differences between groups – for example, height differences between children grouped by their favourite meal or gender.
  • And secondly, relationships between variables – for example, the relationship between body weight and the number of hours a week a person does yoga.

In other words, inferential statistics (when done correctly), allow you to connect the dots and make predictions about what you expect to see in the real world population, based on what you observe in your sample data. For this reason, inferential statistics are used for hypothesis testing – in other words, to test hypotheses that predict changes or differences.

Inferential statistics are used to make predictions about what you’d expect to find in the full population, based on the sample.

Of course, when you’re working with inferential statistics, the composition of your sample is really important. In other words, if your sample doesn’t accurately represent the population you’re researching, then your findings won’t necessarily be very useful.

For example, if your population of interest is a mix of 50% male and 50% female , but your sample is 80% male , you can’t make inferences about the population based on your sample, since it’s not representative. This area of statistics is called sampling, but we won’t go down that rabbit hole here (it’s a deep one!) – we’ll save that for another post .

What statistics are usually used in this branch?

There are many, many different statistical analysis methods within the inferential branch and it’d be impossible for us to discuss them all here. So we’ll just take a look at some of the most common inferential statistical methods so that you have a solid starting point.

First up are T-Tests . T-tests compare the means (the averages) of two groups of data to assess whether they’re statistically significantly different. In other words, do they have significantly different means, standard deviations and skewness.

This type of testing is very useful for understanding just how similar or different two groups of data are. For example, you might want to compare the mean blood pressure between two groups of people – one that has taken a new medication and one that hasn’t – to assess whether they are significantly different.

Kicking things up a level, we have ANOVA, which stands for “analysis of variance”. This test is similar to a T-test in that it compares the means of various groups, but ANOVA allows you to analyse multiple groups , not just two groups So it’s basically a t-test on steroids…

Next, we have correlation analysis . This type of analysis assesses the relationship between two variables. In other words, if one variable increases, does the other variable also increase, decrease or stay the same. For example, if the average temperature goes up, do average ice creams sales increase too? We’d expect some sort of relationship between these two variables intuitively , but correlation analysis allows us to measure that relationship scientifically .

Lastly, we have regression analysis – this is quite similar to correlation in that it assesses the relationship between variables, but it goes a step further to understand cause and effect between variables, not just whether they move together. In other words, does the one variable actually cause the other one to move, or do they just happen to move together naturally thanks to another force? Just because two variables correlate doesn’t necessarily mean that one causes the other.

Stats overload…

I hear you. To make this all a little more tangible, let’s take a look at an example of a correlation in action.

Here’s a scatter plot demonstrating the correlation (relationship) between weight and height. Intuitively, we’d expect there to be some relationship between these two variables, which is what we see in this scatter plot. In other words, the results tend to cluster together in a diagonal line from bottom left to top right.

Sample correlation

As I mentioned, these are are just a handful of inferential techniques – there are many, many more. Importantly, each statistical method has its own assumptions and limitations .

For example, some methods only work with normally distributed (parametric) data, while other methods are designed specifically for non-parametric data. And that’s exactly why descriptive statistics are so important – they’re the first step to knowing which inferential techniques you can and can’t use.

Remember that every statistical method has its own assumptions and limitations,  so you need to be aware of these.

How to choose the right analysis method

To choose the right statistical methods, you need to think about two important factors :

  • The type of quantitative data you have (specifically, level of measurement and the shape of the data). And,
  • Your research questions and hypotheses

Let’s take a closer look at each of these.

Factor 1 – Data type

The first thing you need to consider is the type of data you’ve collected (or the type of data you will collect). By data types, I’m referring to the four levels of measurement – namely, nominal, ordinal, interval and ratio. If you’re not familiar with this lingo, check out the video below.

Why does this matter?

Well, because different statistical methods and techniques require different types of data. This is one of the “assumptions” I mentioned earlier – every method has its assumptions regarding the type of data.

For example, some techniques work with categorical data (for example, yes/no type questions, or gender or ethnicity), while others work with continuous numerical data (for example, age, weight or income) – and, of course, some work with multiple data types.

If you try to use a statistical method that doesn’t support the data type you have, your results will be largely meaningless . So, make sure that you have a clear understanding of what types of data you’ve collected (or will collect). Once you have this, you can then check which statistical methods would support your data types here .

If you haven’t collected your data yet, you can work in reverse and look at which statistical method would give you the most useful insights, and then design your data collection strategy to collect the correct data types.

Another important factor to consider is the shape of your data . Specifically, does it have a normal distribution (in other words, is it a bell-shaped curve, centred in the middle) or is it very skewed to the left or the right? Again, different statistical techniques work for different shapes of data – some are designed for symmetrical data while others are designed for skewed data.

This is another reminder of why descriptive statistics are so important – they tell you all about the shape of your data.

Factor 2: Your research questions

The next thing you need to consider is your specific research questions, as well as your hypotheses (if you have some). The nature of your research questions and research hypotheses will heavily influence which statistical methods and techniques you should use.

If you’re just interested in understanding the attributes of your sample (as opposed to the entire population), then descriptive statistics are probably all you need. For example, if you just want to assess the means (averages) and medians (centre points) of variables in a group of people.

On the other hand, if you aim to understand differences between groups or relationships between variables and to infer or predict outcomes in the population, then you’ll likely need both descriptive statistics and inferential statistics.

So, it’s really important to get very clear about your research aims and research questions, as well your hypotheses – before you start looking at which statistical techniques to use.

Never shoehorn a specific statistical technique into your research just because you like it or have some experience with it. Your choice of methods must align with all the factors we’ve covered here.

Time to recap…

You’re still with me? That’s impressive. We’ve covered a lot of ground here, so let’s recap on the key points:

  • Quantitative data analysis is all about  analysing number-based data  (which includes categorical and numerical data) using various statistical techniques.
  • The two main  branches  of statistics are  descriptive statistics  and  inferential statistics . Descriptives describe your sample, whereas inferentials make predictions about what you’ll find in the population.
  • Common  descriptive statistical methods include  mean  (average),  median , standard  deviation  and  skewness .
  • Common  inferential statistical methods include  t-tests ,  ANOVA ,  correlation  and  regression  analysis.
  • To choose the right statistical methods and techniques, you need to consider the  type of data you’re working with , as well as your  research questions  and hypotheses.

regression analysis quantitative research

Psst... there’s more!

This post was based on one of our popular Research Bootcamps . If you're working on a research project, you'll definitely want to check this out ...

77 Comments

Oddy Labs

Hi, I have read your article. Such a brilliant post you have created.

Derek Jansen

Thank you for the feedback. Good luck with your quantitative analysis.

Abdullahi Ramat

Thank you so much.

Obi Eric Onyedikachi

Thank you so much. I learnt much well. I love your summaries of the concepts. I had love you to explain how to input data using SPSS

MWASOMOLA, BROWN

Very useful, I have got the concept

Lumbuka Kaunda

Amazing and simple way of breaking down quantitative methods.

Charles Lwanga

This is beautiful….especially for non-statisticians. I have skimmed through but I wish to read again. and please include me in other articles of the same nature when you do post. I am interested. I am sure, I could easily learn from you and get off the fear that I have had in the past. Thank you sincerely.

Essau Sefolo

Send me every new information you might have.

fatime

i need every new information

Dr Peter

Thank you for the blog. It is quite informative. Dr Peter Nemaenzhe PhD

Mvogo Mvogo Ephrem

It is wonderful. l’ve understood some of the concepts in a more compréhensive manner

Maya

Your article is so good! However, I am still a bit lost. I am doing a secondary research on Gun control in the US and increase in crime rates and I am not sure which analysis method I should use?

Joy

Based on the given learning points, this is inferential analysis, thus, use ‘t-tests, ANOVA, correlation and regression analysis’

Peter

Well explained notes. Am an MPH student and currently working on my thesis proposal, this has really helped me understand some of the things I didn’t know.

Jejamaije Mujoro

I like your page..helpful

prashant pandey

wonderful i got my concept crystal clear. thankyou!!

Dailess Banda

This is really helpful , thank you

Lulu

Thank you so much this helped

wossen

Wonderfully explained

Niamatullah zaheer

thank u so much, it was so informative

mona

THANKYOU, this was very informative and very helpful

Thaddeus Ogwoka

This is great GRADACOACH I am not a statistician but I require more of this in my thesis

Include me in your posts.

Alem Teshome

This is so great and fully useful. I would like to thank you again and again.

Mrinal

Glad to read this article. I’ve read lot of articles but this article is clear on all concepts. Thanks for sharing.

Emiola Adesina

Thank you so much. This is a very good foundation and intro into quantitative data analysis. Appreciate!

Josyl Hey Aquilam

You have a very impressive, simple but concise explanation of data analysis for Quantitative Research here. This is a God-send link for me to appreciate research more. Thank you so much!

Lynnet Chikwaikwai

Avery good presentation followed by the write up. yes you simplified statistics to make sense even to a layman like me. Thank so much keep it up. The presenter did ell too. i would like more of this for Qualitative and exhaust more of the test example like the Anova.

Adewole Ikeoluwa

This is a very helpful article, couldn’t have been clearer. Thank you.

Samih Soud ALBusaidi

Awesome and phenomenal information.Well done

Nūr

The video with the accompanying article is super helpful to demystify this topic. Very well done. Thank you so much.

Lalah

thank you so much, your presentation helped me a lot

Anjali

I don’t know how should I express that ur article is saviour for me 🥺😍

Saiqa Aftab Tunio

It is well defined information and thanks for sharing. It helps me a lot in understanding the statistical data.

Funeka Mvandaba

I gain a lot and thanks for sharing brilliant ideas, so wish to be linked on your email update.

Rita Kathomi Gikonyo

Very helpful and clear .Thank you Gradcoach.

Hilaria Barsabal

Thank for sharing this article, well organized and information presented are very clear.

AMON TAYEBWA

VERY INTERESTING AND SUPPORTIVE TO NEW RESEARCHERS LIKE ME. AT LEAST SOME BASICS ABOUT QUANTITATIVE.

Tariq

An outstanding, well explained and helpful article. This will help me so much with my data analysis for my research project. Thank you!

chikumbutso

wow this has just simplified everything i was scared of how i am gonna analyse my data but thanks to you i will be able to do so

Idris Haruna

simple and constant direction to research. thanks

Mbunda Castro

This is helpful

AshikB

Great writing!! Comprehensive and very helpful.

himalaya ravi

Do you provide any assistance for other steps of research methodology like making research problem testing hypothesis report and thesis writing?

Sarah chiwamba

Thank you so much for such useful article!

Lopamudra

Amazing article. So nicely explained. Wow

Thisali Liyanage

Very insightfull. Thanks

Melissa

I am doing a quality improvement project to determine if the implementation of a protocol will change prescribing habits. Would this be a t-test?

Aliyah

The is a very helpful blog, however, I’m still not sure how to analyze my data collected. I’m doing a research on “Free Education at the University of Guyana”

Belayneh Kassahun

tnx. fruitful blog!

Suzanne

So I am writing exams and would like to know how do establish which method of data analysis to use from the below research questions: I am a bit lost as to how I determine the data analysis method from the research questions.

Do female employees report higher job satisfaction than male employees with similar job descriptions across the South African telecommunications sector? – I though that maybe Chi Square could be used here. – Is there a gender difference in talented employees’ actual turnover decisions across the South African telecommunications sector? T-tests or Correlation in this one. – Is there a gender difference in the cost of actual turnover decisions across the South African telecommunications sector? T-tests or Correlation in this one. – What practical recommendations can be made to the management of South African telecommunications companies on leveraging gender to mitigate employee turnover decisions?

Your assistance will be appreciated if I could get a response as early as possible tomorrow

Like

This was quite helpful. Thank you so much.

kidane Getachew

wow I got a lot from this article, thank you very much, keep it up

FAROUK AHMAD NKENGA

Thanks for yhe guidance. Can you send me this guidance on my email? To enable offline reading?

Nosi Ruth Xabendlini

Thank you very much, this service is very helpful.

George William Kiyingi

Every novice researcher needs to read this article as it puts things so clear and easy to follow. Its been very helpful.

Adebisi

Wonderful!!!! you explained everything in a way that anyone can learn. Thank you!!

Miss Annah

I really enjoyed reading though this. Very easy to follow. Thank you

Reza Kia

Many thanks for your useful lecture, I would be really appreciated if you could possibly share with me the PPT of presentation related to Data type?

Protasia Tairo

Thank you very much for sharing, I got much from this article

Fatuma Chobo

This is a very informative write-up. Kindly include me in your latest posts.

naphtal

Very interesting mostly for social scientists

Boy M. Bachtiar

Thank you so much, very helpfull

You’re welcome 🙂

Dr Mafaza Mansoor

woow, its great, its very informative and well understood because of your way of writing like teaching in front of me in simple languages.

Opio Len

I have been struggling to understand a lot of these concepts. Thank you for the informative piece which is written with outstanding clarity.

Eric

very informative article. Easy to understand

Leena Fukey

Beautiful read, much needed.

didin

Always greet intro and summary. I learn so much from GradCoach

Mmusyoka

Quite informative. Simple and clear summary.

Jewel Faver

I thoroughly enjoyed reading your informative and inspiring piece. Your profound insights into this topic truly provide a better understanding of its complexity. I agree with the points you raised, especially when you delved into the specifics of the article. In my opinion, that aspect is often overlooked and deserves further attention.

Shantae

Absolutely!!! Thank you

Thazika Chitimera

Thank you very much for this post. It made me to understand how to do my data analysis.

lule victor

its nice work and excellent job ,you have made my work easier

Pedro Uwadum

Wow! So explicit. Well done.

Submit a Comment Cancel reply

Your email address will not be published. Required fields are marked *

Save my name, email, and website in this browser for the next time I comment.

  • Print Friendly
  • Skip to main content
  • Skip to primary sidebar
  • Skip to footer
  • QuestionPro

survey software icon

  • Solutions Industries Gaming Automotive Sports and events Education Government Travel & Hospitality Financial Services Healthcare Cannabis Technology Use Case NPS+ Communities Audience Contactless surveys Mobile LivePolls Member Experience GDPR Positive People Science 360 Feedback Surveys
  • Resources Blog eBooks Survey Templates Case Studies Training Help center

regression analysis quantitative research

Home Market Research

Regression Analysis: Definition, Types, Usage & Advantages

regression analysis quantitative research

Regression analysis is perhaps one of the most widely used statistical methods for investigating or estimating the relationship between a set of independent and dependent variables. In statistical analysis , distinguishing between categorical data and numerical data is essential, as categorical data involves distinct categories or labels, while numerical data consists of measurable quantities.

It is also used as a blanket term for various data analysis techniques utilized in a qualitative research method for modeling and analyzing numerous variables. In the regression method, the dependent variable is a predictor or an explanatory element, and the dependent variable is the outcome or a response to a specific query.

LEARN ABOUT:   Statistical Analysis Methods

Content Index

Definition of Regression Analysis

Types of regression analysis, regression analysis usage in market research, how regression analysis derives insights from surveys, advantages of using regression analysis in an online survey.

Regression analysis is often used to model or analyze data. Most survey analysts use it to understand the relationship between the variables, which can be further utilized to predict the precise outcome.

For Example – Suppose a soft drink company wants to expand its manufacturing unit to a newer location. Before moving forward, the company wants to analyze its revenue generation model and the various factors that might impact it. Hence, the company conducts an online survey with a specific questionnaire.

After using regression analysis, it becomes easier for the company to analyze the survey results and understand the relationship between different variables like electricity and revenue – here, revenue is the dependent variable.

LEARN ABOUT: Level of Analysis

In addition, understanding the relationship between different independent variables like pricing, number of workers, and logistics with the revenue helps the company estimate the impact of varied factors on sales and profits.

Survey researchers often use this technique to examine and find a correlation between different variables of interest. It provides an opportunity to gauge the influence of different independent variables on a dependent variable.

Overall, regression analysis saves the survey researchers’ additional efforts in arranging several independent variables in tables and testing or calculating their effect on a dependent variable. Different types of analytical research methods are widely used to evaluate new business ideas and make informed decisions.

Create a Free Account

Researchers usually start by learning linear and logistic regression first. Due to the widespread knowledge of these two methods and ease of application, many analysts think there are only two types of models. Each model has its own specialty and ability to perform if specific conditions are met.

This blog explains the commonly used seven types of multiple regression analysis methods that can be used to interpret the enumerated data in various formats.

01. Linear Regression Analysis

It is one of the most widely known modeling techniques, as it is amongst the first elite regression analysis methods picked up by people at the time of learning predictive modeling. Here, the dependent variable is continuous, and the independent variable is more often continuous or discreet with a linear regression line.

Please note that multiple linear regression has more than one independent variable than simple linear regression. Thus, linear regression is best to be used only when there is a linear relationship between the independent and a dependent variable.

A business can use linear regression to measure the effectiveness of the marketing campaigns, pricing, and promotions on sales of a product. Suppose a company selling sports equipment wants to understand if the funds they have invested in the marketing and branding of their products have given them substantial returns or not.

Linear regression is the best statistical method to interpret the results. The best thing about linear regression is it also helps in analyzing the obscure impact of each marketing and branding activity, yet controlling the constituent’s potential to regulate the sales.

If the company is running two or more advertising campaigns simultaneously, one on television and two on radio, then linear regression can easily analyze the independent and combined influence of running these advertisements together.

LEARN ABOUT: Data Analytics Projects

02. Logistic Regression Analysis

Logistic regression is commonly used to determine the probability of event success and event failure. Logistic regression is used whenever the dependent variable is binary, like 0/1, True/False, or Yes/No. Thus, it can be said that logistic regression is used to analyze either the close-ended questions in a survey or the questions demanding numeric responses in a survey.

Please note logistic regression does not need a linear relationship between a dependent and an independent variable, just like linear regression. Logistic regression applies a non-linear log transformation for predicting the odds ratio; therefore, it easily handles various types of relationships between a dependent and an independent variable.

Logistic regression is widely used to analyze categorical data, particularly for binary response data in business data modeling. More often, logistic regression is used when the dependent variable is categorical, like to predict whether the health claim made by a person is real(1) or fraudulent, to understand if the tumor is malignant(1) or not.

Businesses use logistic regression to predict whether the consumers in a particular demographic will purchase their product or will buy from the competitors based on age, income, gender, race, state of residence, previous purchase, etc.

03. Polynomial Regression Analysis

Polynomial regression is commonly used to analyze curvilinear data when an independent variable’s power is more than 1. In this regression analysis method, the best-fit line is never a ‘straight line’ but always a ‘curve line’ fitting into the data points.

Please note that polynomial regression is better to use when two or more variables have exponents and a few do not.

Additionally, it can model non-linearly separable data offering the liberty to choose the exact exponent for each variable, and that too with full control over the modeling features available.

When combined with response surface analysis, polynomial regression is considered one of the sophisticated statistical methods commonly used in multisource feedback research. Polynomial regression is used mostly in finance and insurance-related industries where the relationship between dependent and independent variables is curvilinear.

Suppose a person wants to budget expense planning by determining how long it would take to earn a definitive sum. Polynomial regression, by taking into account his/her income and predicting expenses, can easily determine the precise time he/she needs to work to earn that specific sum amount.

04. Stepwise Regression Analysis

This is a semi-automated process with which a statistical model is built either by adding or removing the dependent variable on the t-statistics of their estimated coefficients.

If used properly, the stepwise regression will provide you with more powerful data at your fingertips than any method. It works well when you are working with a large number of independent variables. It just fine-tunes the unit of analysis model by poking variables randomly.

Stepwise regression analysis is recommended to be used when there are multiple independent variables, wherein the selection of independent variables is done automatically without human intervention.

Please note, in stepwise regression modeling, the variable is added or subtracted from the set of explanatory variables. The set of added or removed variables is chosen depending on the test statistics of the estimated coefficient.

Suppose you have a set of independent variables like age, weight, body surface area, duration of hypertension, basal pulse, and stress index based on which you want to analyze its impact on the blood pressure.

In stepwise regression, the best subset of the independent variable is automatically chosen; it either starts by choosing no variable to proceed further (as it adds one variable at a time) or starts with all variables in the model and proceeds backward (removes one variable at a time).

Thus, using regression analysis, you can calculate the impact of each or a group of variables on blood pressure.

05. Ridge Regression Analysis

Ridge regression is based on an ordinary least square method which is used to analyze multicollinearity data (data where independent variables are highly correlated). Collinearity can be explained as a near-linear relationship between variables.

Whenever there is multicollinearity, the estimates of least squares will be unbiased, but if the difference between them is larger, then it may be far away from the true value. However, ridge regression eliminates the standard errors by appending some degree of bias to the regression estimates with a motive to provide more reliable estimates.

If you want, you can also learn about Selection Bias through our blog.

Please note, Assumptions derived through the ridge regression are similar to the least squared regression, the only difference being the normality. Although the value of the coefficient is constricted in the ridge regression, it never reaches zero suggesting the inability to select variables.

Suppose you are crazy about two guitarists performing live at an event near you, and you go to watch their performance with a motive to find out who is a better guitarist. But when the performance starts, you notice that both are playing black-and-blue notes at the same time.

Is it possible to find out the best guitarist having the biggest impact on sound among them when they are both playing loud and fast? As both of them are playing different notes, it is substantially difficult to differentiate them, making it the best case of multicollinearity, which tends to increase the standard errors of the coefficients.

Ridge regression addresses multicollinearity in cases like these and includes bias or a shrinkage estimation to derive results.

06. Lasso Regression Analysis

Lasso (Least Absolute Shrinkage and Selection Operator) is similar to ridge regression; however, it uses an absolute value bias instead of the square bias used in ridge regression.

It was developed way back in 1989 as an alternative to the traditional least-squares estimate with the intention to deduce the majority of problems related to overfitting when the data has a large number of independent variables.

Lasso has the capability to perform both – selecting variables and regularizing them along with a soft threshold. Applying lasso regression makes it easier to derive a subset of predictors from minimizing prediction errors while analyzing a quantitative response.

Please note that regression coefficients reaching zero value after shrinkage are excluded from the lasso model. On the contrary, regression coefficients having more value than zero are strongly associated with the response variables, wherein the explanatory variables can be either quantitative, categorical, or both.

Suppose an automobile company wants to perform a research analysis on average fuel consumption by cars in the US. For samples, they chose 32 models of car and 10 features of automobile design – Number of cylinders, Displacement, Gross horsepower, Rear axle ratio, Weight, ¼ mile time, v/s engine, transmission, number of gears, and number of carburetors.

As you can see a correlation between the response variable mpg (miles per gallon) is extremely correlated to some variables like weight, displacement, number of cylinders, and horsepower. The problem can be analyzed by using the glmnet package in R and lasso regression for feature selection.

07. Elastic Net Regression Analysis

It is a mixture of ridge and lasso regression models trained with L1 and L2 norms. The elastic net brings about a grouping effect wherein strongly correlated predictors tend to be in/out of the model together. Using the elastic net regression model is recommended when the number of predictors is far greater than the number of observations.

Please note that the elastic net regression model came into existence as an option to the lasso regression model as lasso’s variable section was too much dependent on data, making it unstable. By using elastic net regression, statisticians became capable of over-bridging the penalties of ridge and lasso regression only to get the best out of both models.

A clinical research team having access to a microarray data set on leukemia (LEU) was interested in constructing a diagnostic rule based on the expression level of presented gene samples for predicting the type of leukemia. The data set they had, consisted of a large number of genes and a few samples.

Apart from that, they were given a specific set of samples to be used as training samples, out of which some were infected with type 1 leukemia (acute lymphoblastic leukemia) and some with type 2 leukemia (acute myeloid leukemia).

Model fitting and tuning parameter selection by tenfold CV were carried out on the training data. Then they compared the performance of those methods by computing their prediction mean-squared error on the test data to get the necessary results.

A market research survey focuses on three major matrices; Customer Satisfaction , Customer Loyalty , and Customer Advocacy . Remember, although these matrices tell us about customer health and intentions, they fail to tell us ways of improving the position. Therefore, an in-depth survey questionnaire intended to ask consumers the reason behind their dissatisfaction is definitely a way to gain practical insights.

However, it has been found that people often struggle to put forth their motivation or demotivation or describe their satisfaction or dissatisfaction. In addition to that, people always give undue importance to some rational factors, such as price, packaging, etc. Overall, it acts as a predictive analytic and forecasting tool in market research.

When used as a forecasting tool, regression analysis can determine an organization’s sales figures by taking into account external market data. A multinational company conducts a market research survey to understand the impact of various factors such as GDP (Gross Domestic Product), CPI (Consumer Price Index), and other similar factors on its revenue generation model.

Obviously, regression analysis in consideration of forecasted marketing indicators was used to predict a tentative revenue that will be generated in future quarters and even in future years. However, the more forward you go in the future, the data will become more unreliable, leaving a wide margin of error .

Case study of using regression analysis

A water purifier company wanted to understand the factors leading to brand favorability. The survey was the best medium for reaching out to existing and prospective customers. A large-scale consumer survey was planned, and a discreet questionnaire was prepared using the best survey tool .

A number of questions related to the brand, favorability, satisfaction, and probable dissatisfaction were effectively asked in the survey. After getting optimum responses to the survey, regression analysis was used to narrow down the top ten factors responsible for driving brand favorability.

All the ten attributes derived (mentioned in the image below) in one or the other way highlighted their importance in impacting the favorability of that specific water purifier brand.

Regression Analysis in Market Research

It is easy to run a regression analysis using Excel or SPSS, but while doing so, the importance of four numbers in interpreting the data must be understood.

The first two numbers out of the four numbers directly relate to the regression model itself.

  • F-Value: It helps in measuring the statistical significance of the survey model. Remember, an F-Value significantly less than 0.05 is considered to be more meaningful. Less than 0.05 F-Value ensures survey analysis output is not by chance.
  • R-Squared: This is the value wherein the independent variables try to explain the amount of movement by dependent variables. Considering the R-Squared value is 0.7, a tested independent variable can explain 70% of the dependent variable’s movement. It means the survey analysis output we will be getting is highly predictive in nature and can be considered accurate.

The other two numbers relate to each of the independent variables while interpreting regression analysis.

  • P-Value: Like F-Value, even the P-Value is statistically significant. Moreover, here it indicates how relevant and statistically significant the independent variable’s effect is. Once again, we are looking for a value of less than 0.05.
  • Interpretation: The fourth number relates to the coefficient achieved after measuring the impact of variables. For instance, we test multiple independent variables to get a coefficient. It tells us, ‘by what value the dependent variable is expected to increase when independent variables (which we are considering) increase by one when all other independent variables are stagnant at the same value.

In a few cases, the simple coefficient is replaced by a standardized coefficient demonstrating the contribution from each independent variable to move or bring about a change in the dependent variable.

01. Get access to predictive analytics

Do you know utilizing regression analysis to understand the outcome of a business survey is like having the power to unveil future opportunities and risks?

For example, after seeing a particular television advertisement slot, we can predict the exact number of businesses using that data to estimate a maximum bid for that slot. The finance and insurance industry as a whole depends a lot on regression analysis of survey data to identify trends and opportunities for more accurate planning and decision-making.

02. Enhance operational efficiency

Do you know businesses use regression analysis to optimize their business processes?

For example, before launching a new product line, businesses conduct consumer surveys to better understand the impact of various factors on the product’s production, packaging, distribution, and consumption.

A data-driven foresight helps eliminate the guesswork, hypothesis, and internal politics from decision-making. A deeper understanding of the areas impacting operational efficiencies and revenues leads to better business optimization.

03. Quantitative support for decision-making

Business surveys today generate a lot of data related to finance, revenue, operation, purchases, etc., and business owners are heavily dependent on various data analysis models to make informed business decisions.

For example, regression analysis helps enterprises to make informed strategic workforce decisions. Conducting and interpreting the outcome of employee surveys like Employee Engagement Surveys, Employee Satisfaction Surveys, Employer Improvement Surveys, Employee Exit Surveys, etc., boosts the understanding of the relationship between employees and the enterprise.

It also helps get a fair idea of certain issues impacting the organization’s working culture, working environment, and productivity. Furthermore, intelligent business-oriented interpretations reduce the huge pile of raw data into actionable information to make a more informed decision.

04. Prevent mistakes from happening due to intuitions

By knowing how to use regression analysis for interpreting survey results, one can easily provide factual support to management for making informed decisions. ; but do you know that it also helps in keeping out faults in the judgment?

For example, a mall manager thinks if he extends the closing time of the mall, then it will result in more sales. Regression analysis contradicts the belief that predicting increased revenue due to increased sales won’t support the increased operating expenses arising from longer working hours.

Regression analysis is a useful statistical method for modeling and comprehending the relationships between variables. It provides numerous advantages to various data types and interactions. Researchers and analysts may gain useful insights into the factors influencing a dependent variable and use the results to make informed decisions. 

With QuestionPro Research, you can improve the efficiency and accuracy of regression analysis by streamlining the data gathering, analysis, and reporting processes. The platform’s user-friendly interface and wide range of features make it a valuable tool for researchers and analysts conducting regression analysis as part of their research projects.

Sign up for the free trial today and let your research dreams fly!

LEARN MORE         FREE TRIAL

MORE LIKE THIS

Types of organizational change

Types of Organizational Change & Strategies for Business

Aug 2, 2024

voice of the customer programs

Voice of the Customer Programs: What It Is, Implementations

Aug 1, 2024

regression analysis quantitative research

A Case for Empowerment and Being Bold — Tuesday CX Thoughts

Jul 30, 2024

typeform vs google forms

Typeform vs. Google Forms: Which one is best for my needs?

Other categories.

  • Academic Research
  • Artificial Intelligence
  • Assessments
  • Brand Awareness
  • Case Studies
  • Communities
  • Consumer Insights
  • Customer effort score
  • Customer Engagement
  • Customer Experience
  • Customer Loyalty
  • Customer Research
  • Customer Satisfaction
  • Employee Benefits
  • Employee Engagement
  • Employee Retention
  • Friday Five
  • General Data Protection Regulation
  • Insights Hub
  • Life@QuestionPro
  • Market Research
  • Mobile diaries
  • Mobile Surveys
  • New Features
  • Online Communities
  • Question Types
  • Questionnaire
  • QuestionPro Products
  • Release Notes
  • Research Tools and Apps
  • Revenue at Risk
  • Survey Templates
  • Training Tips
  • Tuesday CX Thoughts (TCXT)
  • Uncategorized
  • What’s Coming Up
  • Workforce Intelligence

What is Regression Analysis?

  • Regression Analysis – Linear Model Assumptions
  • Regression Analysis – Simple Linear Regression
  • Regression Analysis – Multiple Linear Regression

Regression Analysis in Finance

Regression tools, additional resources, regression analysis.

The estimation of relationships between a dependent variable and one or more independent variables

Regression analysis is a set of statistical methods used for the estimation of relationships between a dependent variable and one or more independent variables . It can be utilized to assess the strength of the relationship between variables and for modeling the future relationship between them.

Regression Analysis - Types of Regression Analysis

Regression analysis includes several variations, such as linear, multiple linear, and nonlinear. The most common models are simple linear and multiple linear. Nonlinear regression analysis is commonly used for more complicated data sets in which the dependent and independent variables show a nonlinear relationship.

Regression analysis offers numerous applications in various disciplines, including finance .

Regression Analysis – Linear Model Assumptions

Linear regression analysis is based on six fundamental assumptions:

  • The dependent and independent variables show a linear relationship between the slope and the intercept.
  • The independent variable is not random.
  • The value of the residual (error) is zero.
  • The value of the residual (error) is constant across all observations.
  • The value of the residual (error) is not correlated across all observations.
  • The residual (error) values follow the normal distribution.

Regression Analysis – Simple Linear Regression

Simple linear regression is a model that assesses the relationship between a dependent variable and an independent variable. The simple linear model is expressed using the following equation:

Y = a + bX + ϵ

  • Y – Dependent variable
  • X – Independent (explanatory) variable
  • a – Intercept
  • b – Slope
  • ϵ – Residual (error)

Check out the following video to learn more about simple linear regression:

Regression Analysis – Multiple Linear Regression

Multiple linear regression analysis is essentially similar to the simple linear model, with the exception that multiple independent variables are used in the model. The mathematical representation of multiple linear regression is:

Y = a + b X 1  + c X 2  + d X 3 + ϵ

  • X 1 , X 2 , X 3  – Independent (explanatory) variables
  • b, c, d – Slopes

Multiple linear regression follows the same conditions as the simple linear model. However, since there are several independent variables in multiple linear analysis, there is another mandatory condition for the model:

  • Non-collinearity: Independent variables should show a minimum correlation with each other. If the independent variables are highly correlated with each other, it will be difficult to assess the true relationships between the dependent and independent variables.

Regression analysis comes with several applications in finance. For example, the statistical method is fundamental to the Capital Asset Pricing Model (CAPM) . Essentially, the CAPM equation is a model that determines the relationship between the expected return of an asset and the market risk premium.

The analysis is also used to forecast the returns of securities, based on different factors, or to forecast the performance of a business. Learn more forecasting methods in CFI’s Budgeting and Forecasting Course !

1. Beta and CAPM

In finance, regression analysis is used to calculate the Beta (volatility of returns relative to the overall market) for a stock. It can be done in Excel using the Slope function .

Screenshot of Beta Calculator Template in Excel

Download CFI’s free beta calculator !

2. Forecasting Revenues and Expenses

When forecasting financial statements for a company, it may be useful to do a multiple regression analysis to determine how changes in certain assumptions or drivers of the business will impact revenue or expenses in the future. For example, there may be a very high correlation between the number of salespeople employed by a company, the number of stores they operate, and the revenue the business generates.

Simple Linear Regression - Forecasting Revenues and Expenses

The above example shows how to use the Forecast function in Excel to calculate a company’s revenue, based on the number of ads it runs.

Learn more forecasting methods in CFI’s Budgeting and Forecasting Course !

Excel remains a popular tool to conduct basic regression analysis in finance, however, there are many more advanced statistical tools that can be used.

Python and R are both powerful coding languages that have become popular for all types of financial modeling, including regression. These techniques form a core part of data science and machine learning, where models are trained to detect these relationships in data.

Learn more about regression analysis, Python, and Machine Learning in CFI’s Business Intelligence & Data Analysis certification.

To learn more about related topics, check out the following free CFI resources:

  • Cost Behavior Analysis
  • Forecasting Methods
  • Joseph Effect
  • Variance Inflation Factor (VIF)
  • High Low Method vs. Regression Analysis
  • See all data science resources
  • Share this article

Excel Fundamentals - Formulas for Finance

Create a free account to unlock this Template

Access and download collection of free Templates to help power your productivity and performance.

Already have an account? Log in

Supercharge your skills with Premium Templates

Take your learning and productivity to the next level with our Premium Templates.

Upgrading to a paid membership gives you access to our extensive collection of plug-and-play Templates designed to power your performance—as well as CFI's full course catalog and accredited Certification Programs.

Already have a Self-Study or Full-Immersion membership? Log in

Access Exclusive Templates

Gain unlimited access to more than 250 productivity Templates, CFI's full course catalog and accredited Certification Programs, hundreds of resources, expert reviews and support, the chance to work with real-world finance and research tools, and more.

Already have a Full-Immersion membership? Log in

Regression Analysis: A Comprehensive Guide to Quantitative Forecasting

Master regression analysis for accurate forecasting with our expert guide. Dive into quantitative methods for predictive insights.

Regression analysis stands as a cornerstone in the realm of quantitative forecasting, offering an extensive suite of methods for researchers and analysts who seek to understand and predict relationships among variables. It is an indispensable statistical tool that aids decision-making across fields as varied as economics, medicine, and environmental studies. At its core, regression analysis is utilized to discern patterns in data, forecast future trends, optimize business strategies, and support scientific research.

This academic exposé delves into the intricacies of regression analysis, highlighting its multifaceted uses, strengths, and limitations. We begin by establishing a sound foundation on the topic and thereafter explore the types, methodology, outputs, applications, recent developments, and lastly provide a summation of its crucial role in today’s data-driven landscape.

Introduction to Regression Analysis

Definition of regression analysis.

Regression analysis is a statistical method used to model the relationship between a dependent variable and one or more independent variables. The primary goal is to understand how the typical value of the dependent variable changes when any one of the independent variables is varied while the other independent variables are held fixed.

In simpler terms, it attempts to explain the variation in a variable of interest, such as sales or growth, by breaking it down into the effect of various factors.

Brief History and Evolution of Regression Analysis

Historically, regression analysis has its origins in the 19th century with Sir Francis Galton's work on heredity, coining the term "regression" to describe the phenomenon he observed that the heights of descendants of tall ancestors tended to regress down towards a normal average.

Over the years, the technique has evolved significantly, absorbing contributions from mathematicians and statisticians to become more sophisticated and applicable across various scientific disciplines.

Importance and Application of Regression Analysis in Various Fields

Unlocking the power of statistics with a probability-based approach.

Root Cause Tree Analysis: Insights to Forensic Decision Making

Aristotle: The Wise Choice of Excellence

Unlocking the Secrets of Da Vinci: A Comprehensive Guide

Today, regression analysis is crucial across myriad fields for making informed decisions. In finance, it predicts stock prices, in marketing it analyzes consumer behavior, and in healthcare, it assesses treatment effectiveness. Its applications are not limited to these areas and its versatility is what makes it an essential analytical tool for professionals and researchers alike.

Types of Regression Analysis

Explanation of simple linear regression.

Simple linear regression is the most basic form of regression that involves predicting a quantitative response based on a single predictor variable. It is represented by the equation Y = a + bX + e, where Y is the dependent variable, X is the independent variable, a is the y-intercept, b is the slope, and e is the error term. The method assumes a straight-line relationship between the two variables.

Understanding Multiple Linear Regression

When more than one independent variable is present, multiple linear regression is employed. This method is capable of handling numerous predictors and gauging the influence of each on the dependent variable. It extends the simple linear regression model by incorporating multiple coefficients, one for each variable. This allows for a multi-dimensional analysis of data.

Unveiling the Concept of Polynomial Regression

Polynomial regression steps beyond the straight-line relationship and involves an equation where the power of the independent variable is greater than one. It is particularly useful when the relationship between variables is curvilinear. This type of regression can model a wider range of curves and can thus fit complex datasets more flexibly than simple or multiple linear regressions.

Overview of Ridge Regression

Ridge Regression is a technique used when data suffer from multicollinearity, where predictor variables are highly correlated. Unlike standard least squares regression, which can have significant problems in the presence of multicollinearity, Ridge Regression adds a degree of bias to the regression estimates, which serves to reduce the standard errors.

Understanding Lasso Regression

The Lasso Regression is similar to Ridge Regression but has the ability to reduce the coefficient estimates for the least important variables all the way to zero. This acts as a form of automatic variable selection and thus produces simpler and more interpretable models, which is particularly beneficial in the context of large datasets with many features.

Insights into Logistic Regression

Unlike the previously mentioned methods that predict quantitative outcomes, Logistic Regression is used for categorical dependent variables, particularly for binary classification. It estimates the probability that a certain event occurs, such as pass/fail, win/lose, alive/dead, based on an underlying linear relationship between the logits of the probabilities and the predictors.

Highlights on Stepwise Regression

Stepwise Regression is a method of fitting regression models in which the choice of predictive variables is carried out by an automated procedure. During this process, variables are added or subtracted from the multivariable model based on their statistical significance, often using F-tests or t-tests.

Other Types of Regression (Brief Overview)

There are numerous other types of regression analysis techniques available to address specific circumstances and datasets including Quantile Regression, Cox Regression, and Elastic Net Regression. Each carries its own assumptions, application contexts, and considerations, offering diverse tools for robust analysis of complex data patterns.

Steps in Conducting a Regression Analysis

Problem definition.

The first crucial step in conducting a regression analysis is defining the problem. This involves understanding the context and outlining the specific question or hypothesis that the regression model aims to address. A clear problem statement guides the direction of the analysis and ensures that the right type of regression analysis is employed.

Data Collection

Data collection comes as the second step. This phase involves gathering adequate and relevant data to work with. The quality and quantity of data collected directly influence the reliability of the analytical results. The researcher must pay attention to the sources, nature, and integrity of the data to mitigate any potential biases or errors.

Variables Identification

Once data is collected, identifying and classifying variables into independent and dependent categories is imperative. This process requires a thorough understanding of the dataset and the hypothesized relationships. Proper identification ensures that the appropriate modeling techniques are applied and that the findings from the analysis will be valid.

Model Specification

Model specification involves choosing the suitable regression model based upon the nature of the dependent variable and the shape of the relationship between the variables. Here, the researcher decides whether to use simple linear, multiple linear, or another type of regression and defines how the variables will be included in the model.

Model Fit and Assumptions Checking

Once the model is specified, fitting the model to the data is the next step. This includes estimating the regression coefficients. Additionally, checking the underlying assumptions of the selected regression model, such as linearity, independence of errors, homoscedasticity, and normality of error distributions, is critical to ensure accuracy and reliability of the results.

Interpretation of Results

The last step is interpreting the results obtained from the analysis. Coefficients need to be examined to understand the relationship between the independent variables and the dependent variable, the error term to check the model’s predictive power, and the significance levels to determine the reliability of the predictions. It's essential to report and interpret these findings in a manner that's comprehensible and actionable for the intended audience.

Understanding Regression Analysis Outputs

Deciphering coefficients.

In regression analysis, coefficients represent the magnitude and direction of the relationship between an independent variable and the dependent variable. Deciphering these values allows researchers to understand how much the dependent variable is expected to change with a one-unit change in the independent variable, holding all other variables constant.

Recognizing Error Term

The error term in a regression equation is indicative of the variation in the dependent variable that cannot be explained by the independent variables in the model. It represents the distance between the actual data points and the predicted values by the model, often reflecting information that was not accounted for in the model.

Understanding R-squared and Adjusted R-squared

R-squared, also known as the coefficient of determination, is a statistical measure that represents the proportion of the variance in the dependent variable that is predictable from the independent variables. Meanwhile, the Adjusted R-squared adjusts the R-squared value based on the number of predictors in the model, providing a more accurate reflection of the model's explanatory power, especially in the context of multiple regression.

Making Sense of Confidence Intervals and Significance Levels

Confidence intervals and significance levels are crucial in assessing the reliability of the regression estimates. Confidence intervals offer a range of values within which the true population parameter is likely to fall, while significance levels (often denoted by p-values) inform whether the observed relationship between the variables is statistically significant, not likely due to random chance.

Advantages and Limitations of Regression Analysis

Highlighting the strengths of regression analysis.

The major advantages of regression analysis include its ability to infer relationships, predict future values, and control for various confounding variables. It enables analysts to quantify the impact of changes in predictor variables on the outcome, making it an essential tool for data-driven decision-making.

Identifying the Weaknesses and Pitfalls

However, regression analysis is not without its limitations. The accuracy of the results depends heavily on the appropriateness of the selected model and underlying assumptions. Misinterpretation of results can occur if these conditions are not properly checked or understood. Influential points, multicollinearity, or autocorrelation can also distort the outcome, and it's critical to be aware of these potential pitfalls.

Practical Application of Regression Analysis

Regression analysis in business and economics.

In the realms of business and economics, regression analysis is frequently employed for demand forecasting, risk management, and optimizing operational efficiencies. Example usages include predicting sales based on advertising spends or assessing the impact of economic variables on market trends.

Role of Regression Analysis in Healthcare and Medicine

Healthcare and medicine leverage regression analysis to analyze patient outcomes, the efficacy of new drugs, and to calculate risk scores for diseases. It helps in building models that can predict health events or responses to treatments, contributing immensely to patient care and public health policies.

Use of Regression Analysis in Social Sciences

In the social sciences, regression analysis provides insights into the factors that influence human behavior and social phenomena. It's instrumental in fields such as psychology, sociology, and political science, where researchers can isolate and examine the effects of socioeconomic variables on various outcomes.

Regression Analysis in Environmental Studies

Environmental studies utilize regression analysis to model ecological processes and forecast environmental changes. For instance, understanding the factors that influence pollution levels or the impact of climate variables on species distributions.

Regression Analysis in Engineering

Engineers apply regression analysis for quality control, product design, and optimization. It assists in understanding how various design parameters affect the performance or reliability of engineered systems, leading to better and more efficient designs.

Recent Developments and Future Trends in Regression Analysis

Venture into machine learning integration with regression.

The interfacing of regression analysis with machine learning signifies a significant development, as it enhances predictive modeling with algorithms that can learn patterns from large datasets. Techniques such as regularized regression, like Lasso and Ridge mentioned earlier, are at the forefront of this overlap.

Overview of Big Data and Regression Analysis

As we stride deeper into the age of Big Data, regression analysis techniques adapt and evolve to handle the immense volume and complexity of data. Big data analytics often require sophisticated forms of regression that can process high-dimensional datasets efficiently and effectively.

Predictions and Future Trends in the Field

The future of regression analysis promises to unfurl with the continued integration of new computational techniques and the adoption of more robust statistical methodologies to accommodate evolving data trends. The omnipresence of data and the drive towards precision in predictions assure that regression analysis will persist as a linchpin in quantitative analysis for years to come.

Concluding Thoughts on Regression Analysis

Recap of key points of regression analysis.

From the simple linear models to complex machine learning integrations, regression analysis encompasses an expansive spectrum of methods tailored to interpret the past, illuminate the present, and predict the future. It provides a robust framework for quantitative forecasting and decision-making across a variety of domains.

Importance of Regression Analysis in Decision Making and Policy Formulation

The ability of regression analysis to distill insights from raw data and identify cause-and-effect relationships underpins its significant role in guiding policy formulation and strategic decision-making. Its structured approach enables stakeholders to make data-driven choices with increased confidence.

Encouraging Further Study and Application of Regression Analysis

The persistent evolution of analytical methods, alongside the increasing volume and variety of available data, underscores the importance of continuous learning and application of regression analysis techniques. Individuals and organizations are encouraged to invest in problem solving training courses and online certificate course offerings, broadening their analytical repertoire and enhancing their ability to harness the full potential of regression analysis in the data-rich world that lies ahead.

What are the primary assumptions made in regression analysis and why are they vital for accurate forecasting?

Understanding regression analysis.

Regression analysis stands as a statistical tool. It models relationships between variables. Forecasters and researchers rely on it heavily. For accurate forecasting, assumptions must hold true. Proper understanding of these assumptions ensures robust models.

Linearity Assumption

The linearity assumption is fundamental. It posits a linear relationship between predictor and outcome variables. When this assumption is violated, predictions become unreliable. Linearity can be checked with scatter plots or residual plots. Non-linear relationships require alternative modeling approaches.

Independence Assumption

Independence assumes observations are not correlated. When they are, we encounter autocorrelation . Autocorrelation distorts standard errors. This leads to incorrect statistical tests. Time series data often violate this assumption. Thus, special care is necessary in such analyses.

Homoscedasticity Assumption

Homoscedasticity implies constant variance of errors. Unequal variances, or heteroscedasticity , affect confidence intervals and hypothesis tests. This assumption can be scrutinized through residual plots. Corrective measures include transformations or robust standard errors.

Normality Assumption

Errors should distribute normally for precise hypothesis testing. Non-normality signals potential model issues. These may include incorrect specification or outliers. The normality assumption mainly affects small sample sizes.

No Multicollinearity Assumption

Multicollinearity exists when predictors correlate strongly. This complicates the interpretation of individual coefficients. Variance inflation factor (VIF) helps detect multicollinearity. High VIF values suggest a need to reconsider the model.

Why These Assumptions Matter

Assumptions in regression are not arbitrary. They cement the foundation for reliable results. Valid inference on coefficients depends on these. Accurate forecasting does too.

- Predictive Accuracy : Correct assumptions guide toward accurate predictions.

- Correct Inference : Meeting assumptions leads to valid hypothesis tests.

- Confidence in Results : Adhering to assumptions builds confidence in findings.

- Tool Selection : Awareness of assumptions guides the choice of statistical tools.

These conditions interlink to ensure that the regression models crafted produce outcomes close to reality. It is this adherence that transforms raw data into insightful, actionable forecasts. For those keen on extracting truth from numbers, the journey begins and ends with meeting these assumptions.

How is multicollinearity detected in regression analysis and what strategies can be used to address it?

Multicollinearity detection.

Detecting multicollinearity involves several statistical methods. Analysts often start with correlation matrices . Strong correlations suggest multicollinearity. Correlations close to 1 or -1 are red flags. Correlation coefficients represent the strength and direction of linear relationships. They range from -1 to 1. High absolute values indicate potential problems.

Variance Inflation Factor

Another key tool is the Variance Inflation Factor (VIF) . VIF quantifies multicollinearity severity. It measures how much variance increases for estimated regression coefficients. VIF values above 5 or 10 indicate high multicollinearity. Some experts accept a lower threshold. They consider VIF above 2.5 as problematic.

Tolerance Levels

VIF relates inversely to tolerance . Tolerance measures how well a model predicts without a predictor. Low tolerance values suggest multicollinearity. Values below 0.1 often warrant further investigation. They can signal that the independent variable has multicollinearity issues.

Eigenvalue Analysis

Eigenvalue analysis offers deeper insight. It involves decomposing the matrix. Small eigenvalues can show multicollinearity presence. Analysts compare them to the condition index. A condition index over 30 suggests serious multicollinearity.

Condition Index

The condition index is crucial. It measures matrix sensitivity to minor changes. High values can indicate numerical problems. They often flag high multicollinearity.

Addressing Multicollinearity

Omit variables.

One strategy is to omit variables . Multicollinear variables may not all be necessary. Removing one can solve the problem. Depth in understanding the data guides this choice. It involves model simplification .

Combine Variables

Another method is to combine variables . This can involve creating indices or scores. It reduces the number of predictors. It combines related information into a single predictor.

Principal Component Analysis

Principal Component Analysis (PCA) is more complex. It creates uncorrelated predictors. PCA transforms the data into principal components. These components help maintain the information. They do so without multicollinearity.

Regularization Techniques

Regularization techniques like Ridge regression adjust coefficients. They shrink them towards zero. This can reduce multicollinearity impacts. It ensures better generalization for the model.

Increase Sample Size

Lastly, increasing the sample size can help. More data provides more information. It can reduce variance in the estimates. It also lowers the chances of finding false relationships.

Understanding and addressing multicollinearity strengthens regression analysis. It ensures valid, reliable, and interpretable models. Analysts must detect and remedy this issue to ensure clear conclusions. We can better understand how variables really relate to each other. With this insight, we make more accurate predictions and better decisions.

How are outliers identified and treated in regression analysis to ensure reliability of the forecast?

Outliers in regression analysis, defining outliers.

Outliers present significant challenges in regression analysis. These are atypical observations. They deviate markedly from other data points. Analysts often spot them during preliminary data analysis. Outliers can distort predictions. They can affect the regression equation disproportionately. Accurate identification is crucial for reliable forecasting.

Identifying Outliers

Several methods aid outlier detection. Visual approaches include scatter plots. They allow quick outlier identification. Histograms and boxplots also serve this purpose. Statistical tests offer more precision. The Z-score method detects data points far from the mean. Grubbs' test identifies the most extreme outlier.

Standardizing Data

d -values standardize the difference between values. The interquartile range (IQR) method detects values beyond a threshold. Usually, these are 1.5 times the IQR above the third quartile. Or below the first quartile.

Treatment of Outliers

Once identified, several treatment options exist. Simplest is removal. This option suits clear errors or irrelevant data. Another approach involves transformation. It reduces the impact of extreme values. Logarithmic transformation is one example.

Advanced Methods

Robust regression techniques downplay outliers. They weigh them less in the analysis. This method maintains outlier inclusion while reducing influence. Winsorizing is another technique. It replaces extreme values. It uses the nearest value within the acceptable range.

Addressing Influential Points

Influential points affect regression results significantly. These outliers can skew regression lines dramatically. Cook’s Distance is a measure of influence. Analysts use it to assess each point's impact on the regression coefficients.

Testing and Validation

After outlier treatment, model reevaluation is necessary. One must check for improvement in model fit. Adjustments continue until the model shows robust predictive power. Cross-validation can assess the regression's reliability.

Outliers have major effects on regression analyses. Identifying and addressing them is key. Proper treatment ensures reliable and accurate forecasting. Analysts must balance outlier detection and treatment. This balance ensures the integrity of their models. It also prevents overfitting and maintains model validity.

A middle-aged man is seen wearing a pair of black-rimmed glasses. His hair is slightly tousled, and he looks off to the side, suggesting he is deep in thought. He is wearing a navy blue sweater, and his hands are folded in front of him. His facial expression is one of concentration and contemplation. He appears to be in an office, with a white wall in the background and a few bookshelves visible behind him. He looks calm and composed.

He is a content producer who specializes in blog content. He has a master's degree in business administration and he lives in the Netherlands.

Learn how to develop a positive attitude to problem solving and gain the skills to tackle any challenge. Discover the power of a positive mindset and how it can help you succeed.

A Positive Attitude for Problem Solving Skills

Unlock the power of data and uncover new insights with this probabilitybased approach Statistics Tool

Developing Problem Solving Skills Since 1960s WSEIAC Report

A magnifying glass with a light emitting from it is being held up against a black background. The light is illuminating a white letter O on the background. Below the magnifying glass, a spider is visible, with a web encircling it. In the foreground, a woman wearing a white turtleneck and black jacket is visible. She looks to be examining the magnifying glass and the spider. The scene is illuminated by the magnifying glass's bright light, and the spider web is highlighted against the dark background. A close-up of the spider web reveals intricate details of the structure. This image can be used to demonstrate the power of a magnifying glass in exploring the world of tiny creatures.

How Darwin Cultivated His Problem-Solving Skills

Logo for Rhode Island College Digital Publishing

Want to create or adapt books like this? Learn more about how Pressbooks supports open publishing practices.

Quantitative Data Analysis With SPSS

17 Quantitative Analysis with SPSS: Bivariate Regression

Mikaila Mariel Lemonik Arthur

This chapter will detail how to conduct basic bivariate linear regression analysis using one continuous independent variable and one continuous dependent variable. The concepts and mathematics underpinning regression are discussed more fully in the chapter on Correlation and Regression . Some more advanced regression techniques will be discussed in the chapter on Multivariate Regression .

Before beginning a regression analysis, analysts should first run appropriate descriptive statistics . In addition, they should create a scatterplot with regression line, as described in the chapter on Quantitative Analysis with SPSS: Correlation & descriptive statistics. One important reason why is that linear regression has as a basic assumption the idea that data are arranged in a linear—or line-like—shape. When relationships are weak, it will not be possible to see just by glancing at the scatterplot whether it is linear or not, or if there is no relationship at all.

A diagram showing what a curvilinear, arc-shaped relationship would look like.

However, there are cases where it is quite obvious that there *is* a relationship, but that this relationship is not line-like in shape. For example, if the scatterplot shows a clear curve, as in Figure 1, one that could not be approximated by a line, then the relationship is not sufficiently linear to be detected by a linear regression. [1] Thus, any results you obtain from linear regression analysis would considerably underestimate the strength of such a relationship and would not be able to discern its nature. Therefore, looking at the scatterplot before running a regression allows the analyst to determine if the particular relationship of interest can appropriately be tested with a linear regression.

Assuming that the relationship of interest is appropriate for linear regression, the regression can be produced by going to Analyze → Regression → Linear [2] (Alt+A, Alt+R, Alt+L). The dependent variable is placed in the Dependent box; the independent in the “Block 1 of 1” box. Under Statistics, be sure both Estimates and Model fit are checked. Here, we are using the independent variable AGE and the dependent variable CARHR. Once the regression is set up, click OK to run it.

A screenshot of the linear regression dialog and the statistics popup from within the dialog. Alt+D navigates to the dependent variable box; tab must be used to get to the Block 1 of 1 box, where independent variables go. Alt+S opens the statistics window; here, Alt+E toggles displaying estimates and Alt+M toggles displaying Model Fit (both of these should stay on). There are many, many other options both in the main window and in the statistics window, but these are beyond the scope of the chapter.

The results will appear in the output window. There will be four tables: Variables Entered/Removed; Model Summary; ANOVA; and Coefficients. The first of these simply documents the variables you have used. [3] The other three contain important elements of the analysis. Results are shown in Tables 1, 2, and 3.

Table 1. Model Summary
Model R R Square Adjusted R Square Std. Error of the Estimate
1 .100 .010 .009 8.656
a. Predictors: (Constant), Age of respondent
Table 2. ANOVA
Model Sum of Squares df Mean Square F Sig.
1 Regression 1290.505 1 1290.505 17.225 <.001
Residual 127966.152 1708 74.922
Total 129256.657 1709
a. Dependent Variable: How many hours in a typical week does r spend in a car or other motor vehicle, not counting public transit
b. Predictors: (Constant), Age of respondent
Table 3. Coefficients
Model Unstandardized Coefficients Standardized Coefficients t Sig.
B Std. Error Beta
1 (Constant) 9.037 .680 13.297 <.001
Age of respondent -.051 .012 -.100 -4.150 <.001
a. Dependent Variable: How many hours in a typical week does r spend in a car or other motor vehicle, not counting public transit

When interpreting the results of a bivariate linear regression, we need to answer the following questions:

  • What is the significance of the regression?
  • What is the  strength of the observed relationship?
  • What is the  direction of the observed relationship?
  • What is the  actual numerical relationship ?

Each of these questions is answered by numbers found in different places within the set of tables we have produced.

First, let’s look at the significance. The significance is found in two places among our results, under “Sig.” in the ANOVA table (here, table 2) and under “Sig.” in the Coefficients table (here, table 3). In the Coefficients table, look at the significance number in the row with the independent variable; you will also see a significance number for the constant, which will be discussed later. In a bivariate regression, these two significance numbers are the same (this is not true for multivariate regressions). So, in these results, the significance is p<0.001, which means we can conclude that the results are significant.

Next, we look at the strength. Again, we can look in two places for the strength, under R in the Model Summary table (here, table 1) and under Beta in the Coefficients table. Beta refers to the Greek letter [latex]\beta[/latex], and beta and [latex]\beta[/latex] are used interchangeably when referring to the standardized coefficient. R, here, refers to Pearson’s r, and both it and Beta are interpreted the same way as measures of association usually are. While the R and the Beta will be the same in a bivariate regression, the sign (whether the number is positive or negative) may not be; again, in multivariate regressions, the numbers will not be the same. This is because Beta is used to look at the strength of the relationship each individual independent variable has with the dependent variable. Here, the R/Beta is 0.100, so the relationship is moderate in strength.

The direction of the relationship is determined by whether the Beta is positive or negative. Here, it is negative, so that means it is an inverse relationship. In other words, as age goes up, time spent in cars each week goes down. And the B value, found in the Coefficients table, tells us by how much it goes down. Here we see that for every one year of additional age, time spent in cars goes down by about 0.051 hours (a little more than three minutes).

One final thing to look at is the R squared (R 2 ) in the Model Summary table. The R 2 tells us how much of the variance in our dependent variable is explained by our independent variable. Here, then, age explains 1% (0.010 converted to a percent by multiplying it times 100) of the variance in time spent in a car each week. That might not seem like very much, and it is not very much. But considering all the things that matter to how much time you spend in a car each week, it is clear that age is contributing somehow.

The numbers in the Coefficients table also allow us to construct the regression equation (the equation for the line of best fit). The number under B for the constant row is the y intercept (in other words, if X were 0, what would Y be?), and the number under B for the variable is the slope of the line. We apply asterisks to indicate significance, giving us the following equation: [latex]y=9.037-0.051x^{***}[/latex]. Note that whether or not the constant/intercept is statistically significant is just telling us whether the constant/intercept is statistically significantly different from zero, which is not actually very interesting, and thus most analysts do not pay much attention to the significance of the constant/intercept.

So, in summary, our results tell us that age has a significant, moderate, inverse relationship with time spent in a car each week; that age explains 1% of the variance in time spent in the car each week, and that for every one year of additional age, just over 3 more minutes per week are spent in the car.

  • Choose two continuous variables of interest. Write a hypothesis about the relationship between the variables.
  • Create a scatterplot for these two variables with regression line (line of best fit). Explain what the scatterplot shows.
  • Run a bivariate regression for these two variables. Interpret the results, being sure to discuss significance, strength, direction, and the actual magnitude of the effect.
  • Create the regression equation for your regression results.

Media Attributions

  • curvilinear © Mikaila Mariel Lemonik Arthur is licensed under a CC BY-NC-SA (Attribution NonCommercial ShareAlike) license
  • linear regression dialog © IBM SPSS is licensed under a All Rights Reserved license
  • There are other regression techniques that are appropriate for such relationships, but they are beyond the scope of this text. ↵
  • You will notice that there are many, many options and tools within the Linear Regression dialog; some of these will be discussed in the chapter on Multivariate Regression, while others are beyond the scope of this text. ↵
  • The Variables Entered/Removed table is important to those running a series of multivariate models while adding or removing individual variables, but is not useful when only one model is run at a time. ↵

Social Data Analysis Copyright © 2021 by Mikaila Mariel Lemonik Arthur is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.

  • Introduction
  • Conclusions
  • Article Information

Individual effects and overall effect of resistance exercise training on depressive symptoms. The different sizes of the data markers indicate the respective weight of the individual effects in the overall analysis. Studies are cited multiple times because multiple effects were derived from individual trials. Each citation represents a unique effect. The dashed vertical lines show the difference between the overall effect and each individual effect.

eFigure 1. Flowchart of Study Selection

eFigure 2. Funnel Plot of Hedges d Effect Sizes vs Study Standard Error

eTable 1. Individual Scores on Amended Detsky Quality Assessment

eTable 2. Values Used to Calculate Hedges d Effect Size and Primary Moderator Values

eTable 3. Definitions for Each Moderator and Associated Levels

eReference s

  • Efficacy of Resistance Exercise Training With Depressive Symptoms JAMA Psychiatry Comment & Response October 1, 2018 Sammi R. Chekroud, BA; Adam M. Chekroud, PhD

See More About

Select your interests.

Customize your JAMA Network experience by selecting one or more topics from the list below.

  • Academic Medicine
  • Acid Base, Electrolytes, Fluids
  • Allergy and Clinical Immunology
  • American Indian or Alaska Natives
  • Anesthesiology
  • Anticoagulation
  • Art and Images in Psychiatry
  • Artificial Intelligence
  • Assisted Reproduction
  • Bleeding and Transfusion
  • Caring for the Critically Ill Patient
  • Challenges in Clinical Electrocardiography
  • Climate and Health
  • Climate Change
  • Clinical Challenge
  • Clinical Decision Support
  • Clinical Implications of Basic Neuroscience
  • Clinical Pharmacy and Pharmacology
  • Complementary and Alternative Medicine
  • Consensus Statements
  • Coronavirus (COVID-19)
  • Critical Care Medicine
  • Cultural Competency
  • Dental Medicine
  • Dermatology
  • Diabetes and Endocrinology
  • Diagnostic Test Interpretation
  • Drug Development
  • Electronic Health Records
  • Emergency Medicine
  • End of Life, Hospice, Palliative Care
  • Environmental Health
  • Equity, Diversity, and Inclusion
  • Facial Plastic Surgery
  • Gastroenterology and Hepatology
  • Genetics and Genomics
  • Genomics and Precision Health
  • Global Health
  • Guide to Statistics and Methods
  • Hair Disorders
  • Health Care Delivery Models
  • Health Care Economics, Insurance, Payment
  • Health Care Quality
  • Health Care Reform
  • Health Care Safety
  • Health Care Workforce
  • Health Disparities
  • Health Inequities
  • Health Policy
  • Health Systems Science
  • History of Medicine
  • Hypertension
  • Images in Neurology
  • Implementation Science
  • Infectious Diseases
  • Innovations in Health Care Delivery
  • JAMA Infographic
  • Law and Medicine
  • Leading Change
  • Less is More
  • LGBTQIA Medicine
  • Lifestyle Behaviors
  • Medical Coding
  • Medical Devices and Equipment
  • Medical Education
  • Medical Education and Training
  • Medical Journals and Publishing
  • Mobile Health and Telemedicine
  • Narrative Medicine
  • Neuroscience and Psychiatry
  • Notable Notes
  • Nutrition, Obesity, Exercise
  • Obstetrics and Gynecology
  • Occupational Health
  • Ophthalmology
  • Orthopedics
  • Otolaryngology
  • Pain Medicine
  • Palliative Care
  • Pathology and Laboratory Medicine
  • Patient Care
  • Patient Information
  • Performance Improvement
  • Performance Measures
  • Perioperative Care and Consultation
  • Pharmacoeconomics
  • Pharmacoepidemiology
  • Pharmacogenetics
  • Pharmacy and Clinical Pharmacology
  • Physical Medicine and Rehabilitation
  • Physical Therapy
  • Physician Leadership
  • Population Health
  • Primary Care
  • Professional Well-being
  • Professionalism
  • Psychiatry and Behavioral Health
  • Public Health
  • Pulmonary Medicine
  • Regulatory Agencies
  • Reproductive Health
  • Research, Methods, Statistics
  • Resuscitation
  • Rheumatology
  • Risk Management
  • Scientific Discovery and the Future of Medicine
  • Shared Decision Making and Communication
  • Sleep Medicine
  • Sports Medicine
  • Stem Cell Transplantation
  • Substance Use and Addiction Medicine
  • Surgical Innovation
  • Surgical Pearls
  • Teachable Moment
  • Technology and Finance
  • The Art of JAMA
  • The Arts and Medicine
  • The Rational Clinical Examination
  • Tobacco and e-Cigarettes
  • Translational Medicine
  • Trauma and Injury
  • Treatment Adherence
  • Ultrasonography
  • Users' Guide to the Medical Literature
  • Vaccination
  • Venous Thromboembolism
  • Veterans Health
  • Women's Health
  • Workflow and Process
  • Wound Care, Infection, Healing

Others Also Liked

  • Download PDF
  • X Facebook More LinkedIn

Gordon BR , McDowell CP , Hallgren M , Meyer JD , Lyons M , Herring MP. Association of Efficacy of Resistance Exercise Training With Depressive Symptoms : Meta-analysis and Meta-regression Analysis of Randomized Clinical Trials . JAMA Psychiatry. 2018;75(6):566–576. doi:10.1001/jamapsychiatry.2018.0572

Manage citations:

© 2024

  • Permissions

Association of Efficacy of Resistance Exercise Training With Depressive Symptoms : Meta-analysis and Meta-regression Analysis of Randomized Clinical Trials

  • 1 Department of Physical Education and Sport Sciences, University of Limerick, Limerick, Ireland
  • 2 Department of Public Health Sciences, Karolinska Institutet, Stockholm, Sweden
  • 3 Department of Kinesiology, Iowa State University, Ames
  • 4 Health Research Institute, University of Limerick, Limerick, Ireland
  • Comment & Response Efficacy of Resistance Exercise Training With Depressive Symptoms Sammi R. Chekroud, BA; Adam M. Chekroud, PhD JAMA Psychiatry

Question   What is the overall association of efficacy of resistance exercise training with depressive symptoms, and which logical, theoretical, and/or prior empirical variables are associated with depressive symptoms?

Findings   In this meta-analysis of 33 clinical trials including 1877 participants, resistance exercise training was associated with a significant reduction in depressive symptoms, with a moderate-sized mean effect. Total volume of resistance exercise training, health status, and strength improvements were not associated with the antidepressant effect; however, smaller reductions in depressive symptoms were derived from trials with blinded allocation and/or assessment.

Meaning   The available empirical evidence supports resistance exercise training as an alternative and/or adjuvant therapy for depressive symptoms.

Importance   The physical benefits of resistance exercise training (RET) are well documented, but less is known regarding the association of RET with mental health outcomes. To date, no quantitative synthesis of the antidepressant effects of RET has been conducted.

Objectives   To estimate the association of efficacy of RET with depressive symptoms and determine the extent to which logical, theoretical, and/or prior empirical variables are associated with depressive symptoms and whether the association of efficacy of RET with depressive symptoms accounts for variability in the overall effect size.

Data Sources   Articles published before August 2017, located using Google Scholar, MEDLINE, PsycINFO, PubMed, and Web of Science.

Study Selection   Randomized clinical trials included randomization to RET (n = 947) or a nonactive control condition (n = 930).

Data Extraction and Synthesis   Hedges d effect sizes were computed and random-effects models were used for all analyses. Meta-regression was conducted to quantify the potential moderating influence of participant and trial characteristics.

Main Outcomes and Measures   Randomized clinical trials used validated measures of depressive symptoms assessed at baseline and midintervention and/or postintervention. Four primary moderators were selected a priori to provide focused research hypotheses about variation in effect size: total volume of prescribed RET, whether participants were healthy or physically or mentally ill, whether or not allocation and/or assessment were blinded, and whether or not the RET intervention resulted in a significant improvement in strength.

Results   Fifty-four effects were derived from 33 randomized clinical trials involving 1877 participants. Resistance exercise training was associated with a significant reduction in depressive symptoms with a moderate-sized mean effect ∆ of 0.66 (95% CI, 0.48-0.83; z  = 7.35; P  < .001). Significant heterogeneity was indicated (total Q  = 216.92, df  = 53; P  < .001; I 2  = 76.0% [95% CI, 72.7%-79.0%]), and sampling error accounted for 32.9% of observed variance. The number needed to treat was 4. Total volume of prescribed RET, participant health status, and strength improvements were not significantly associated with the antidepressant effect of RET. However, smaller reductions in depressive symptoms were derived from randomized clinical trials with blinded allocation and/or assessment.

Conclusions and Relevance   Resistance exercise training significantly reduced depressive symptoms among adults regardless of health status, total prescribed volume of RET, or significant improvements in strength. Better-quality randomized clinical trials blinding both allocation and assessment and comparing RET with other empirically supported treatments for depressive symptoms are needed.

Depression is a highly prevalent global burden, affecting more than 300 million people worldwide 1 ; is a significant source of absenteeism and disability in the work force 2 ; has an economic burden of approximately $118 billion annually 3 ; and is the most costly mental health disorder in Europe, accounting for 1% of the total gross domestic product. 4 Depressive symptoms are highly comorbid and significantly associated with poor health, 5 including an increased risk of cardiovascular diseases, 6 , 7 Alzheimer disease, 8 type 2 diabetes, 9 mortality, 10 and noncompliance with medical treatment. 11

Current frontline treatments for depression include medication and psychotherapy. However, for individuals with mild to moderate or severe depression, medication can be expensive, with limited efficacy ( d  < 0.20). 12 , 13 Psychotherapy can be expensive and inaccessible, and previously reported effects may be overestimated owing to publication bias. 14 Moreover, among individuals with depression who are seeking treatment, depressive symptoms persist for approximately 67% after first-line treatment of up to 14 weeks, and at least 30% remain depressed after 4 rounds of distinct 12-week treatments. 15 Thus, there is continued interest in alternative treatments for depression and continued need to compare potential alternative treatments with established treatments.

Exercise interventions are promising treatments for depressive symptoms, and these interventions are free from the adverse effects and high costs associated with antidepressant medications and psychotherapy. 16 , 17 Exercise interventions also have established benefits for cardiovascular diseases, the leading cause of death among individuals with major depressive disorder. 6 Exercise training improves depressive symptoms among otherwise healthy adults, 18 chronically ill adults, 19 and adults with a depressive disorder. 17 However, the magnitude of the effect remains unclear, as publication bias and flawed inclusion criteria may have resulted in underestimations of the magnitude of exercise effects. 17 , 20 The benefits of acute aerobic exercise and aerobic exercise training (AET) for depressive symptoms among otherwise healthy adults and chronically ill adults are well established, 18 , 19 , 21 , 22 but less is known regarding the associations of resistance exercise training (RET) with depressive symptoms. In addition, few trials have included both an RET and an AET arm in the same investigation, limiting direct comparisons between the modalities.

Resistance exercise training interventions are generally designed to increase strength, skeletal muscle mass, endurance, and/or power. 23 Evidence has supported significant anxiolytic effects of RET among adults, regardless of their health status, 24 and a previous narrative review supported the antidepressant effects of RET. 25 However, no quantitative synthesis of randomized clinical trials (RCTs) of the antidepressant effect of RET has been conducted. Furthermore, there is a need to identify potential sources of variability in the antidepressant effect of RET, particularly modifiable participant and trial characteristics, to better inform the prescription of RET and future RET interventions.

Quiz Ref ID The key objectives of this meta-analysis and meta-regression analysis were to estimate the overall association of efficacy of RET with depressive symptoms; determine the extent to which the overall effect varies based on variables of logical, theoretical, and/or prior empirical variables associated with depressive symptoms; and compare the effect of different exercise modes derived from RCTs in which participants were randomized to RET, AET, or a nonactive control condition.

This systematic review was conducted in accordance with the PRISMA guidelines. 26 Articles published before August 2017 were identified using Google Scholar, MEDLINE, PsycINFO, PubMed, and Web of Science. Key words used included combinations of strength training , resistance training , and weight training , along with depress* . Supplementary searches of relevant systematic reviews 17 , 18 , 24 , 25 , 27 and references within included articles were performed manually.

Inclusion criteria were peer-reviewed publication, clinical trials, randomized allocation to either an RET intervention or a nonactive control condition, and a validated self-report or clinician-rated measure of depressive symptoms assessed at baseline and at midintervention and/or postintervention. Investigations were excluded that included exercise as part of a multicomponent intervention but did not include the additional component in comparison conditions, and/or compared RET only with an active treatment for depression, including cognitive therapy, pharmacotherapy, relaxation or meditation, and flexibility training. One article 28 was excluded because the depressive outcomes were reported in an earlier included article. 29 eFigure 1 in the Supplement provides a flowchart of article inclusion and exclusion.

Data were extracted from the included RCTs into an SPSS (SPSS Inc) file by 3 of us (B.R.G., C.P.M., and M.P.H.). The data extracted included the characteristics of the participants and the trials and the associations of exercise with outcomes of logical, theoretical, and/or prior empirical relation to depressive symptoms and/or the associations of RET with depressive symptoms; these included age, sex, physical and mental health status, type of control condition, whether allocation and/or assessment were blinded, duration of exercise program, frequency, session duration, RET intensity, whether or not RET sessions were supervised, whether or not the primary outcome of the trial was depressive symptoms, depressive symptom measure used, and whether or not there was a significant improvement in strength. To calculate total volume of RET prescribed, intervention duration (weeks), weekly frequency (days), and session duration (minutes) were multiplied together.

Two of us (B.R.G. and M.P.H.) independently assessed trial quality (scored 0-13) using the Detsky scale. 30 This scale was amended to include research design, control condition, randomization and blinding methods, outcome measures, adherence, and characteristics of the exercise intervention. Higher scores indicated better study quality. The individual scores of each included RCT are presented in eTable 1 in the Supplement .

To calculate Hedges d effect sizes, the mean change for the control was subtracted from the mean change for RET, and the difference was divided by the pooled baseline SD. 31 Larger reductions in depressive symptoms for RET resulted in positive effect sizes. eTable 2 in the Supplement presents the values used to calculate Hedges d and primary moderator values. Interrater reliability for effect size calculations was examined by calculating 2-way (effects × raters) intraclass correlation coefficients for absolute agreement. The initial intraclass correlation coefficients were greater than 0.90. Quiz Ref ID When means and SDs were not reported, the authors were contacted. When these values could not be provided ( k  = 5), they were estimated from exact P values reported in the trial, 32 included graphs, 33 , 34 or from the largest other study of the same population sample that used the same measure of depressive symptoms, 35 , 36 in accordance with common meta-analytic protocols. 37 Discrepancies (eg, values of SDs estimated from included graphs) were resolved by consensus among the investigators involved in the data extraction (B.R.G., C.P.M., and M.P.H.).

Meta-regression was used for moderator analyses because it reduces the probability of type I error by computing concurrent estimates of independent effects by multiple moderators on the variation in effect size across trials. Random-effects models were used with macros (MeanES; MetaReg) 38 to aggregate the mean effect size delta (Δ) and test the variation in effects according to moderator variables. 31 , 38 Heterogeneity was evaluated with Cochrane Q , and consistency was evaluated with I 2 . 37 If sampling error accounted for less than 75% of the observed variance, heterogeneity was indicated. 31 The mean reduction in depressive symptoms among participants engaging in RET, expressed as a function of absolute risk reduction, was calculated to determine the number needed to treat. 39 The number of unretrieved or unpublished studies of null effect that would diminish the significance of observed effects of P  > .05 was estimated as fail-safe N+. 40

As a sensitivity analysis, the mean effect was recalculated, extracting single effects from the included RCTs determined by the effect with the maximum dose of RET, and the effect in which the Beck Depression Inventory was used, 41 for homogeneity of results. There were 3 exceptions in which 2 effects remained extracted from single RCTs because these RCTs each contained 2 treatment groups and 2 control groups. 33 , 42 , 43

To examine publication bias, funnel plot symmetry was examined, Egger regression 44 and Begg rank correlation tests were calculated, 45 and trim and fill analysis adjusting to the left of the mean was performed. 46 Potential outliers, effects substantially larger than most, were also removed, and the mean effect size ∆ was recalculated for additional sensitivity analysis.

Four primary moderators were selected a priori to provide focused research hypotheses about variation in effect size: total volume of prescribed RET, participant’s health status, whether or not allocation and/or assessment was blinded, and whether or not the RET intervention resulted in a significant improvement in strength. Definitions for each primary and secondary moderator and associated levels are presented in eTable 3 in the Supplement .

Each of the 4 primary moderators were coded according to the planned contrasts ( P  ≤ .05) among its levels. 47 Primary moderators were included in the mixed-effects multiple linear regression analyses with maximum likelihood estimation, adjusting for nonindependence of multiple effects contributed by single studies, baseline depressive symptoms, and the depressive symptom measure. 31 , 38 Tests of the regression model ( Q R ) and its residual error ( Q E ) are reported.

Secondary moderators were selected for exploratory univariate analyses. Random-effects models were used to calculate the mean effect sizes (Δ) and 95% CIs for moderator variables. 38 Each secondary moderator was included in random-effects univariate meta-regression analysis with maximum likelihood estimation. 31 , 38

Fifty-four effects were derived from 33 RCTs of 1877 participants (RET group, 947 participants; control group, 930 participants). Table 1 presents the relevant characteristics for each of the included RCTs. 28 , 32 - 36 , 42 , 43 , 48 - 72 Depressive symptoms were the primary outcome in 18 RCTs ( k  = 37). Quiz Ref ID The mean (SD) sample age was 52 (18) years, and 67% of participants were female. The mean prescribed RET program duration was 16 weeks (range, 6-52 weeks). The frequency of RET sessions ranged from 2 to 7 days per week; the most common frequency was 3 days per week (20 RCTs; k  = 30). Twenty-five RCTs ( k  = 39) evaluated participants with a physical or mental illness. Twenty-five RET interventions ( k  = 44) were fully supervised by various health care professionals. Seven RET interventions ( k  = 9) included a combination of supervised and unsupervised sessions, and 1 RET intervention was unsupervised. Adherence or compliance was reported in 15 of the 33 RCTs; the mean (SD) adherence rate was 78% (18%). Of the 18 remaining RCTs that did not report adherence or compliance, 2 reported attendance rates, which ranged from 87.5% 53 to 94%. 71 The Beck Depression Inventory 41 was the most frequently used measure of depressive symptoms ( k  = 21).

A forest plot of the distribution of effects is presented in the Figure . Forty-eight of the 54 effects (89%) were larger than zero, indicating a reduction in depressive symptoms favoring RET. Twenty effects significantly favored RET. The mean effect size ∆ was 0.66 (95% CI, 0.48-0.83; z  = 7.35; P  < .001). The effect was heterogeneous (total Q  = 216.92, df  = 53; P  < .001; I 2  = 76.0% [95% CI, 72.7%-79.0%]), and sampling error accounted for 32.9% of observed variance. The mean quality score was 10.5 (range, 7-13). The fail-safe number of effects was 1358, indicating that 1358 null effects would be needed to diminish the overall effect to P  > .05. Significant Begg rank correlation (Kendall τ = 0.45; P  < .001) and Egger regression tests (intercept = –1.34; SE = 0.52; P  = .01) indicated significant funnel plot asymmetry (eFigure 2 in the Supplement ). Trim and fill analyses did not change the overall effect (∆ = 0.66; 95% CI, 0.48-0.83; 0 RCTs trimmed). Quiz Ref ID The mean reduction in depressive symptoms among participants engaging in RET resulted in a number needed to treat of 4.

Three effects substantially larger than most were derived from 1 RCT. 69 The magnitude of these effects appeared to be due partly to greater depressive symptoms among participants who were randomized to the intervention group compared with controls. The mean effect was recalculated with this RCT removed, and the effect remained moderate and significant (∆ = 0.53; 95% CI, 0.38-0.68; z  = 7.00; P  < .001). Similarly, a nonsignificant reduction in the overall effect was observed when calculated with single effects derived from each study (∆ = 0.48; 95% CI, 0.30-0.67; z  = 5.08; P  < .001).

The overall meta-regression model was significant ( Q R  = 17.97, df  = 7; P  = .01; R 2  = 0.30; Q E   =  42.57, df  = 31; P  = .08; I 2  = 38.88% [95% CI, 25.63%-49.77%]). Blinded allocation and/or assessment of outcomes accounted for significant variation in the antidepressant effects of RET (β = –0.39; z  = –2.50; P  = .01). Effects were significantly smaller when outcome allocation and/or assessment was blinded (∆ = 0.56; 95% CI, 0.40-0.71) compared with when outcome allocation and/or assessment was not blinded (∆ = 1.07; 95% CI, 0.36-1.78). Total volume of prescribed exercise (β = –0.28; P  = .09), significant improvements in strength (β = 0.32; P  = .09), and participant’s health status (β = –0.23; P  = .17) were not significantly related to effect size ( Table 2 ).

The results of univariate moderator analyses for the primary and secondary moderators are presented in Table 3 .

To facilitate subanalyses between RET and AET, data were extracted from 9 RCTs ( k  = 17) in which participants were randomized to RET, AET, or a nonactive control condition. 32 , 35 , 48 , 51 , 54 - 56 , 58 , 61 , 66 Effects were not significantly different for the RET interventions (∆ = 0.64; 95% CI, 0.34-0.93) than for the AET interventions (∆ = 0.46; 95% CI, 0.22-0.70) compared with the control groups ( P  = .48). When directly comparing the effects of RET with AET (positive effects favoring RET), a small, nonsignificant mean effect ∆ favoring RET was found (∆ = 0.15; 95% CI, –0.004 to 0.30; z =  1.91; P  = .06).

To our knowledge, this is the first meta-analysis to examine RCTs to assess the efficacy of RET on depressive symptoms. Across 33 RCTs, RET was associated with a significant reduction in depressive symptoms regardless of the participants’ characteristics (ie, age, sex, and health status) or the features of the RET stimulus (ie, program duration, session duration, intensity, frequency, or total prescribed volume). However, while simultaneously considering the potential variation associated with baseline depressive scores, multiple effects from single RCTs, whether or not strength was significantly improved, total prescribed RET volume, and participant’s health status, blinded allocation and/or assessment was significantly associated with the overall effect of RET, such that significantly smaller reductions in depressive symptoms were found when investigators were blinded to allocation and/or assessment.

Univariate analyses showed that significantly larger reductions in depressive symptoms were derived from RCTs of participants with scores indicative of mild to moderate depression compared with RCTs of participants without scores indicating mild to moderate depression, and from RCTs of shorter RET sessions (<45 minutes) compared with RCTs featuring longer session durations. In addition, significantly larger reductions were found in fully supervised RCTs compared with RCTs that used combinations of supervised and unsupervised RET, and in RCTs in which the primary outcome was depressive symptoms ( Table 3 ).

The magnitude of the overall mean effect (Δ = 0.66; 95% CI, 0.48-0.83) is consistent with the association of diverse types of exercise training with depression (pooled standardized mean difference, –0.62; 95% CI, –0.81 to 0.42, with negative scores favoring exercise) 18 and is larger than the recently reported association of RET with anxiety (∆ = 0.31). 24 In addition, the magnitude of the overall mean effect and the magnitude of the effects among important subsamples are consistent with previously reported effects. Specifically, the mean effect for individuals with a physical illness (∆ = 0.34; 95% CI, 0.17-0.52) is consistent with previous evidence of the associations of all types of exercise training with depressive symptoms among adults with a chronic illness (∆ = 0.30; 95% CI, 0.25-0.36) 19 and adults with neurologic disorders (∆ = 0.28; 95% CI, 0.15-0.41). 73

The large effect of RET found among adults with depressive symptoms indicative of mild to moderate depression (∆ = 0.90; 95% CI, 0.68-1.11) is consistent with previously reported effects of all exercise modes among people with major depressive disorder (standardized mean difference, 1.11; 95% CI, 0.79-1.43). 17 Twelve RCTs ( k  = 25) included samples that reported clinically significant elevations in depressive symptoms, based on cutoff scores commonly used for clinical screening. 74 - 77 The mean scores for 10 of the 25 effects (40%) suggested potential remission based on a frequently used response threshold of a 50% or greater reduction in baseline scores. 78 The mean percentage reduction from baseline scores for all 25 of these effects was 45%. Moreover, the mean effect for RCTs in which baseline scores were indicative of mild to moderate depression (Δ = 0.90; 95% CI, 0.68-1.12; z =  8.12; P  < .001) was significantly larger than effects from RCTs in which baseline scores were below suggested clinical cutoff scores (Δ = 0.45; 95% CI, 0.23-0.67; z  = 4.02; P  = .03) ( Table 3 ). The larger percentage reduction found from RCTs of participants with elevated depressive symptoms, coupled with the significant difference based on initial severity of depressive symptoms, suggests that RET may be particularly helpful for reducing depressive symptoms in people with greater depressive symptoms. These findings support potentially different mechanisms of action and/or unique interactions in participants with clinical depression that may not be present in participants with subclinical depressive symptoms.

Blinded allocation and/or assessment was independently and significantly associated with reductions in depressive symptoms; smaller reductions occurred in RCTs with blinded allocation and/or assessment (∆ = 0.56; 95% CI, 0.40-0.71). Blinded allocation and assessment of outcomes can limit biases associated with self-reported measures in exercise interventions. 79 - 81 Previous reports have demonstrated a reduction in the overall effect of exercise on depression after exclusion of trials that do not adequately blind allocation and/or assessment. 18

Blinded allocation and/or assessment is also an indication of intervention quality. 30 , 82 Based on the study quality assessment used here, the overall quality of RCTs was high, with a mean score of 10.5 (range, 7-13) on a 13-point scale. When blinding was removed from the overall quality score, such that the maximum total score was 11, RCTs that reported blinded allocation and/or assessment had significantly higher mean (SD) quality scores (10.0 [1.0]) compared with those without blinded allocation and/or assessment (8.0 [0.9]) ( t  = 5.82, df  = 31; P  < .001). Blinded allocation and/or assessment may indicate a higher-quality research design, which may have resulted in smaller effects by providing a more rigorous estimation of the “true” effect of RET on depressive symptoms.

Participant’s health status, volume of prescribed RET, and whether or not strength was significantly improved were not independently associated with the overall mean reduction in depressive symptoms. These findings are consistent with previous evidence showing that the antidepressant effects of exercise training were not dependent on a significant improvement in fitness. 19 These findings are also consistent with recently reported associations of RET with anxiety. 24

Although RET significantly reduced depressive symptoms independent of total prescribed volume of RET, this measure of total volume (intervention length × frequency × session duration) could not be extracted for all RCTs because 8 RCTs ( k  = 14) did not report the duration of RET sessions. In addition, this measure of total volume did not include the intensity of prescribed RET. Heterogeneous reporting of prescribed intensity did not allow differentiation between low-intensity RET and moderate-intensity RET, necessitating their merger and comparison with vigorous-intensity RET. Only 4 interventions ( k  = 9) 28 , 36 , 70 , 71 were of vigorous intensity. The relationship between RET intensity and strength gains is moderated by participant training status, as moderate-intensity RET improves strength most in untrained participants, and vigorous-intensity RET improves strength most in trained participants. 83 There is a paucity of within-study comparisons of RET dose, multiarm RCTs comparing RET and other strictly matched exercise modalities, and investigations of the influence of exercise volume, exercise intensity, and their interaction. For example, more frequently completed vigorous RET may afford the possibility of shorter exercise sessions while meeting recommended guidelines, 84 potentially increasing feasibility while maintaining positive mental health benefits.

There is continued interest in the comparative effects of different exercise modes on mental health outcomes. However, with one notable exception, 85 , 86 few RCTs have directly compared the antidepressant effects of different exercise modes in a single study sample. Nine RCTs included here directly compared RET with AET and a nonactive control condition. 32 , 35 , 48 , 51 , 54 - 56 , 58 , 61 , 66 Although the magnitude of improvement for AET and RET did not differ significantly, consistent with recent results of the comparative associations of AET and RET with anxiety symptoms, 24 only 2 RCTs attempted to match AET and RET interventions in any capacity. One trial matched AET and RET based on energy expenditure, 55 and 1 trial more thoroughly matched AET and RET based on body region, positive work, time actively engaged in exercise, and load progression. 58 Future trials, matching different exercise modes on relevant features of the exercise stimulus, will allow more rigorous and controlled comparisons between exercise modalities, and the examination of interactions between factors such as frequency, intensity, duration, and exercise modality.

In addition, authors should report the mean session duration, the numbers of sets performed, the numbers of repetitions, the lengths of rest periods between sets, and the intensity (eg, the percentages of 1-repetition maximum and the rate of perceived exertion), to more thoroughly assess the total volume of exercise prescribed. Authors should report whether interventions were performed in groups or individually. When exercise sessions are supervised, the efforts made to control for social interaction during sessions should be reported. Future trials should blind allocation, blind assessors from group assignment, explicitly report this process, and state how missing data and dropouts were handled, including explicitly stating if intention-to-treat analyses were conducted.

Six RCTs assessed the effects of RET on depressive symptoms in participants with a clinical diagnosis of depression or anxiety, and 8 RCTs assessed depressive symptoms in participants who had scores indicative of moderate depression without an actual diagnosis. More important, individuals who display elevated subclinical depressive or anxiety symptoms are at increased risk of developing clinically significant psychopathologic features. 87 Because participants with baseline scores indicative of mild to moderate depression had significantly larger improvements than those who did not, investigating RET interventions among individuals at different points on the severity spectrum may be particularly interesting.

Quiz Ref ID There was a notable lack of clear and complete reporting of intervention design, protocol, data analyses, participant information, medication use, adherence, and compliance, which should be emphasized in future trial reporting. Medication use was insufficiently reported to allow comparisons between RCTs; 12 of the 33 RCTs (36%) did not report information regarding medication use. Twenty-one of 33 RCTs (64%) did not report adherence or compliance with the interventions. Prescribed antidepressant medication use is associated with poor adherence to exercise programs among patients, 88 making this omission particularly problematic.

The available empirical evidence supports RET as an alternative or adjuvant therapy for depressive symptoms. Future trials should include thorough reporting of trial and RET design, specifically blinded allocation, assessment, and adherence. In addition, future trials should compare RET with other empirically supported therapies for depressive symptoms.

Accepted for Publication: February 12, 2018.

Corresponding Author: Brett R. Gordon, MSc, Department of Physical Education and Sport Sciences, University of Limerick, P-1039, The Physical Education and Sport Sciences Bldg, Limerick V94 T9PX, Ireland ( [email protected] ).

Published Online: May 9, 2018. doi:10.1001/jamapsychiatry.2018.0572

Author Contributions: Mr Gordon had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study concept and design: Gordon, Herring.

Acquisition, analysis, or interpretation of data: All authors.

Drafting of the manuscript: Gordon, McDowell, Herring.

Critical revision of the manuscript for important intellectual content: All authors.

Statistical analysis: Gordon, McDowell, Meyer, Herring.

Administrative, technical, or material support: McDowell, Hallgren, Herring.

Study supervision: Meyer, Lyons, Herring.

Conflict of Interest Disclosures: None reported.

  • Register for email alerts with links to free full-text articles
  • Access PDFs of free articles
  • Manage your interests
  • Save searches and receive search alerts

U.S. flag

An official website of the United States government

Here's how you know

Official websites use .gov A .gov website belongs to an official government organization in the United States.

Secure .gov websites use HTTPS A lock ( Lock A locked padlock ) or https:// means you’ve safely connected to the .gov website. Share sensitive information only on official, secure websites.

  • Data and Tools
  • National Forest Service Library

Comparison of Maximum Likelihood Estimation Approach and Regression Approach in Detecting Quantitative Trait Lco Using RAPD Markers

Authors: Changren Weng, Thomas L. Kubisiak, C. Dana Nelson, James P. Geaghan, Michael Stine
Year: 1999
Type: Scientific Journal
Station: Southern Research Station
Source: Proceedings of the Southern Forest Tree Improvement Conference, New Orleans, Louisiana, USA, July 11-14, 1999

Thank you for visiting nature.com. You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

  • View all journals
  • Explore content
  • About the journal
  • Publish with us
  • Sign up for alerts
  • Open access
  • Published: 05 July 2024

Mitigating urban heat island through neighboring rural land cover

  • Miao Yang 1 , 2 ,
  • Chen Ren   ORCID: orcid.org/0000-0002-4691-1198 1 , 2 ,
  • Haorui Wang 1 , 2 ,
  • Junqi Wang   ORCID: orcid.org/0000-0001-8843-7781 1 , 2 ,
  • Zhuangbo Feng 1 , 2 ,
  • Prashant Kumar 1 , 3 , 4 ,
  • Fariborz Haghighat 1 , 5 &
  • Shi-Jie Cao   ORCID: orcid.org/0000-0001-9136-0949 1 , 2 , 3  

Nature Cities volume  1 ,  pages 522–532 ( 2024 ) Cite this article

4945 Accesses

1 Citations

163 Altmetric

Metrics details

  • Climate-change mitigation
  • Environmental impact

Globally, the deteriorating Urban Heat Island (UHI) effect poses a significant threat to human health and undermines ecosystem stability. UHI mitigation strategies have been investigated and utilized extensively within cities by the provision of green, blue or gray infrastructures. However, urban land is precious and limited for these interventions, making it challenging to address this issue. Neighboring rural land cover may serve as a cooling source and have a great potential to mitigate UHI through processes such as heat absorption and circulation. This study aims to address the following questions: (1) what is the location of neighboring rural land cover to effectively mitigate UHI for the entire city and (2) what are the key parameters of the landscape. We investigated the quantitative and qualitative relationships between rural land cover and UHI, drawing on geographical and environmental data from 30 Chinese cities between 2000 and 2020. We found that the rural land cover extending outward from the urban boundary, approximately half of the equivalent diameter of city, had the most pronounced impact on UHI mitigation. The number and adjacency of landscape patches (a patch is a homogeneous and nonlinear basic unit of a landscape pattern, distinct from its surroundings) emerged as two key factors in mitigating UHI, with their individual potential to reduce UHI by up to 0.5 °C. The proposed recommendations were to avoid fragmentation and enhance shape complexity and distribution uniformity of patches. This work opens new avenues for addressing high-temperature urban catastrophes from a rural perspective, which may also promote coordinated development between urban and rural areas.

Similar content being viewed by others

regression analysis quantitative research

Global mapping of urban nature-based solutions for climate change adaptation

regression analysis quantitative research

Global climate-driven trade-offs between the water retention and cooling benefits of urban greening

regression analysis quantitative research

The evolution of social-ecological system interactions and their impact on the urban thermal environment

Cities are the cradle of human civilization, ensuring human progress, scientific innovations and economic advancements. Despite the constructive development activities within cities, they have also created and intensified certain environmental challenges 1 , 2 . The Urban Heat Island (UHI) effect causing urban overheating is a prominent example of these concerns resulting from rising urbanization and anthropogenic activities 3 , 4 . It has seriously endangered human lives, well-being and ecosystems, ultimately leading to economic consequences 5 . In July 2023, the world experienced the hottest month on record, with widespread heatwaves across many countries 6 . Moreover, temperature extremes on land will increase even faster compared to the increase of global mean temperature (land and ocean) due to climate change from human activities 4 . Hence, holistically formulating an effective adaptation and mitigation strategy for the UHI effect has become a focal issue for sustainable urban development 7 , 8 .

The existing literature on mitigating the UHI effect is primarily focused on strategies that seek solutions within the city limits, such as the provision of green, blue or gray infrastructure including trees 9 , grass 10 , parks 11 , green walls 12 , green roofs 13 , lakes 14 and so on. These function as urban cooling sources/heat sinks, reducing the temperature in surrounding areas through processes such as heat absorption, evapotranspiration, convection and circulation 15 , 16 . However, urban land is precious and limited for those mitigation interventions. These interventions have limited capacity to reduce UHI intensity in a specific urban district 17 , 18 , making it challenging to sufficiently reduce the UHI at the city scale. Urban heat is not confined within the physical boundaries of a city. Instead, urban heat can diffuse to the neighboring rural areas, which have more natural land cover than the city, including trees, rivers, grassland, cropland and so on. 19 , 20 , 21 . This suggests that rural land cover may also serve as an element for absorbing urban heat, thereby harboring the significant potential for mitigating urban heat islands. This offers a unique opportunity to mitigate the UHI effect through the utilization of neighboring rural land cover (NRLC) to effectively tackle the UHI challenge.

The existing literature discusses the advantages and implications of NRLC in mitigating the UHI 22 , 23 . Yao et al. 24 reported that the effect of greening in rural areas was an important and widespread driver of diurnal surface urban heat island intensity variability, responsible for 22.5%. Two cities with relatively comparable urban configuration and population density may have different UHI intensities merely due to different surrounding rural land cover characteristics 25 . Given the limited number of investigations on UHI mitigation in rural areas, there remains a dearth of knowledge about the quantified impact of rural land cover on UHI. Specifically, there is lack of understanding regarding the influencing location and patterns of mitigating UHI through NRLC. It is significant to explore the impact of rural land cover patterns on mitigating UHI for enhancing the potential of their effective implementation.

The innovations of this study lie in: (1) systematically quantifying the spatial extent (location) of rural land cover that affect UHI along with physical mechanisms discussed and (2) identifying key factors of rural land cover and evaluating their potential in UHI mitigation. Both qualitatively and quantitatively associated relations are revealed between rural land cover and UHI mitigation based on Chinese cities, considering that China has undergone rapid urbanization in the past 30 years. These cities have vast characteristics such as diversity of geographic features, climatic conditions, urban forms, development and so on, which will facilitate the feasibility and validity of technology deployment. This study can significantly contribute to the development of UHI mitigation strategies and sustainable urban development.

Locations of rural land cover for UHI mitigation

The quantitative influence of rural land cover on the Urban Heat Island Intensity (UHII) was analyzed as shown in Fig. 1 , using the regression model depicted in Methods . Two important parameters of rural land cover were considered, that is, the distance from the urban boundary (locations) and land cover types. Here the urban areas were divided into five different rings, namely urban ladders UL i ( i  = 1–5). UL 1 represented the area of urban center, whereas UL 2–5 gradually expanded outwards ( i  = 5 corresponding to the urban boundary area), as shown in Fig. 2 . The urban boundary was taken as the baseline and was extended outwards to obtain four rings, that is, rural ladders RL j ( j  = 1–4), of different radii. The inner boundary of each RL was the urban boundary. The average equivalent diameter of the selected cities was 22 km. Rural land cover was classified into five types: woodland, cropland, impervious surface, water body and grassland (grassland was not analyzed independently; explanation in Methods ).

figure 1

a – e , From the left to right, respectively: NRLC, woodland, cropland, impervious surface and water body. The horizontal coordinates represent the different urban ladders (UL i , i  = 1–5). The vertical coordinates represent the explanation degrees ( R 2 ) of different cover types to the surface UHI. f , The schematic representation of urban regions and rural land cover (the variation of color range standing for different urban regions). g , The specific locations of various urban and rural regions.

Source data

figure 2

a , Urban ladders. b , Rural ladders.

Figure 1a shows that NRLC (considering the totality of five types) can achieve cooling for all ULs. Specially, NRLC had the largest explanation degree of nearly 30% on the surface UHII variance in UL 1 , due to the effect of heat island circulation and convection (Supplementary Information B ). The NRLC in RL 3 , RL 2 and RL 4 had the highest explanation degree on the surface UHII variance in UL 1,2,5 , UL 4 and UL 3 , respectively. Given that NRLC in RL 3 still had the second largest explanation degree to the surface UHII variance in UL 3,4 , the NRLC in RL 3 potentially had the greatest UHI mitigation capacity for the entire urban area compared with RL 1,2,4 .

Figure 1b–e shows that impervious surfaces had the most significant influence on surface UHII, followed by cropland, and finally woodland and water body. The explanation degree of impervious surface in RL 3 and RL 4 to surface UHII variance was compared, with the small difference being within 0.01. The UHI mitigation capacity of impervious surface in RL 3 and RL 4 was greater than that in RL 1 and RL 2 . Cropland and woodland in RL 1 presented a markedly lower explanation degree to UHI variance compared with other RLs. Cropland in RL 3 explained the greatest degree of the UHI variance in UL 3,5 and exhibited a slightly lower explanation degree compared with the maximum degree in UL 1,2,4 (that is, difference of 0.015, 0.005, 0.005 for UL 1 , UL 2 and UL 4 , respectively). Water body in RL 2 explained the UHI variations of all ULs to a higher extent than other RLs. Supplementary Fig. A2 shows the corresponding combinations UL i – RL j , that is, NRLC or land cover types in RL j have the largest explanation degree of surface UHII variance in UL i compared with other RLs. Taking NRLC as an example, the corresponding combinations were UL 1 –RL 3 , UL 2 –RL 3 , UL 3 –RL 4 , UL 4 –RL 2 , UL 5 –RL 3 .

Key parameters of rural land cover for UHI mitigation

This section aimed to rank the landscape-level parameters (LLPs, used for NRLC) with correlations to the UHI mitigation. These parameters were determined through the calculation of SHapley Additive exPlanations (SHAP) values and Pearson correlation analyses. In the previous section, the corresponding RL for each UL (rural land cover in RL explained most of the surface UHII variance in UL) was obtained. On the basis of it, Fig. 3 illustrates the ranking of the SHAP values for the LLPs in rural regions, which reflects the capability to mitigate the surface UHII of the corresponded urban areas (that is, UL i , i  = 1–5). The higher the SHAP value, the better the UHI mitigation. It can be noted that AI (aggregation index), COHESION (patch cohesion index), PR (patch richness) and DIVISION (Landscape Division Index) were almost at the top of the SHAP rankings. To sum up, the key LLPs for different corresponding combinations on UHI mitigation were: UL 1 –RL 3 (PR, DIVISION, NP (number of patches)); UL 2 –RL 3 (AI, IJI (interspersion & juxtaposition index), PR, NP); UL 3 –RL 4 (PR, LSI (landscape shape index), COHESION, NP, IJI); UL 4 –RL 2 (LSI, COHESION, IJI, PR); and UL 5 –RL 3 (COHESION, LSI, PR, NP) (more details can be found in Supplementary Table A1 ).

figure 3

Columns from left to right represent UL 1 –RL 3 , UL 2 –RL 3 , UL 3 –RL 4 , UL 4 –RL 2 and UL 5 –RL 3 .

Supplementary Figs. A3 and A4 and Supplementary Table A1 show the extraction process and final results regarding the key parameters of four land cover types. The key parameters of woodland were mainly NP, IJI and AI. The key parameters of cropland were mainly CIRCLE_AM (circle index distribution) and IJI. The key parameters of impervious surface were mainly IJI, SPLIT (Splitting Index) and CIRCLE_AM. The key parameters for water body were mainly IJI, PD (patch density), CA (total (class) area), SHAPE_AM (shape index distribution) and CLUMPY (clumpiness index).

Impact of key parameters of rural land cover on UHI

This section aimed to elaborate the individual impact of the key LLPs on UHII for the aforementioned combinations (that is, UL 1 –RL 3 , UL 2 –RL 3 , UL 3 –RL 4 , UL 4 –RL 2 and UL 5 –RL 3 ), as presented in Fig. 4 . Most of the key landscape parameters and the surface UHII were close to a simple monotonic relationship. For instance, taking the urban region UL 1 (the hottest region) as an example (that is, UL 1 –RL 3 ; Fig. 4a ): (1) the value of surface UHII decreased as PR increased (PR, namely patch richness, indicates the total number of land cover types); (2) the value of surface UHII decreased as DIVISION increased (DIVISION, that is, landscape division index, indicates the probability that two randomly chosen pixels in the landscape are not situated in the same patch) and (3) the value of surface UHII decreased as NP decreased.

figure 4

Accumulated local effects (ALE) plot is centered so that the mean effect is zero. The value of the ALE can be interpreted as the main effect of the feature at a certain value compared to the average prediction of the data, that is, the smaller the value, the more effective it is in UHI mitigation. Panels a– e correspond to the five combinations of urban and rural ladders, that is, UL 1 –RL 3 , UL 2 –RL 3 , UL 3 –RL 4 , UL 4 –RL 2 and UL 5 –RL 3 . The effects of key parameters of LLPs for each combination on the surface UHI are shown by subpanels under the panel of each combination, that is, subpanels (i)–(iii) for PR, DIVISION, NP in a ; subpanels (i)–(iv) for AI, IJI, PR, NP in b ; subpanels (i)–(v) for PR, LSI, COHESION, NP, IJI in c ; subpanels (i)–(iv) for LSI, COHESION, IJI, PR in d and subpanels (i)–(iv) for COHESION, LSI, PR, NP in e .

Figure 4a–e columns elaborate that some key LLPs belonged to more than one UL and had a similar relationship with the surface UHII of different ULs. For example, the surface UHII at UL 1,2,3,5 decreased with the decrease of NP, the surface UHII at UL 3,4,5 decreased with the increase of LSI and COHESION and the surface UHII at UL 2,3,4 decreased with the decrease of IJI. This finding provided an opportunity to achieve effective cooling of a large part of the urban area through simple regulation of the same landscape parameters. The influencing pattern of key parameters, which belong to more than or equal to three ULs and having the same influence pattern on different ULs, were recommended to be used for generating key strategies on UHI mitigation. The other key LLPs, which can only mitigate surface UHII of a particular UL or have different influencing mechanisms on the surface UHII of different ULs, were used for generating supplementary strategies for localized area of refined UHI mitigation. As shown in Fig. 5 , the influence patterns of NP, IJI, LSI and COHESION were considered in key strategies to mitigate UHI, that is, (1) decreasing the number of patches (NP); (2) decreasing the even distribution of adjacencies among patch types (IJI); (3) avoiding square patch shapes (LSI) and (4) increasing the connectedness of the patches (COHESION). AI, DIVISION and PR were selected in complementary strategies. Taking the combination of UL 2 –RL 3 as an example: (1) increasing the connectedness of the patches (AI) and (2) increasing the number of patch types (PR).

figure 5

Strategies and suggestions for the key LLPs on UHI mitigation.

The impacts of key LCPs (landscape class parameters) for four cover types on surface UHII and the process of generating key strategies can be seen in Supplementary Figs. A5 – A12 . The influencing pattern of IJI, NP and AI for woodland were selected in key mitigation strategies on UHI. The influencing pattern of CIRCLE_AM and IJI for cropland; CIRCLE_AM, SPLIT and IJI for impervious surface; and NP, CA, CLUMPY, SHAPE_AM and IJI for water body were also selected in main strategies. All the main strategies are summarized in Table 1 .

The rural regions, with their rich natural land cover and simpler functional patterns, hold great potential for mitigating UHI 24 . This study aims to bridge this knowledge gap by investigating both quantitative and qualitative influence of NRLC on UHI mitigation in China from 2000 to 2020, as shown in Extended Data Figs. 1 and 2 . Results indicate that NRLC can possess the capacity to mitigate UHI for entire cities. Specifically, we discover that NRLC can contribute to urban cooling, with the most pronounced impact occurring within a 10–15 km radius from the urban boundary, which is closely interacted with the urban area. It further suggests that NRLC within this range can contribute up to 30% to the reduction of UHII in urban centers. The richness and density of landscape patches emerge as key factors in mitigating UHI, with the potential to reduce UHI by 0.5 °C through the modulation of key parameters. More suggestions are summarized in Table 1 .

Why do we need rural land cover types at specific locations to mitigate UHI? We explain this through urban physics, leveraging the concept of heat island circulation (Supplementary Information B ) and convection. Heat circulation holds paramount importance in urban ventilation and the exchange of energy between urban and rural environments. In this dynamic circulation process, air is warmed up within urban areas due to the effect of buoyancy force, creating a low-pressure zone near the ground. Subsequently the heated air will be transported to the rural regions via convection and diffusion, drawing cooler air from rural areas to continuously replenish the urban core. Our study reveals that RL 3 land cover can exert the most significant effect on UHI, with its circular radius encompassing a range of 10–15 km, which is approximately half of the city’s equivalent diameter, similar to the finding from Fan et al. 26

During the heat circulation cycles between different ladders (UL i and RL j ), the heat is absorbed by the rural landscapes at a different extent depending on the types and locations. Hence, well-designed landscape patches (including features such as the richness and density) should be promoted to realize self-cooling in rural regions.

The complexity and diversity of urban characteristics, including shape, development level, geographical location and climatic conditions, pose potential risks of introducing significantly deviated findings in this study. To address it, our research focuses exclusively on single-centered cities exceeding 200 km 2 , primarily characterized by plains interspersed with scattered terraces, hills and low mountains. This approach helps to reduce the impact of the cities’ shapes and geographical features under investigation. Furthermore, cities are categorized into five concentric rings (UL i ) based on their varying urban development intensities (UDI). This stratification enables a differentiated analysis of the impact of rural land cover on UHI effect across different urban development intensities. By doing so, we successfully group cities based on their urban development levels, thereby limiting and quantifying the influence of urban development on our findings. To validate the influence of climate, cities are grouped according to their climatic zones, and separate analyses are conducted. The mechanisms of rural land cover in UHI mitigation have differences between various climate zones. However, a significant impact of climate is not observed from the results. The overlap of key landscape parameters between different climate zones and China is basically higher than 0.7. Additionally, the majority of mitigation strategies identified in China are transferrable to different climate zones (Supplementary Information C ). Consequently, our findings still have relatively high generalizability and applicability in different cities. In the future, this study holds promise in offering valuable methodological and strategic guidance for refinement studies at the city scale and the development of context-specific policy formulations.

The heat island circulation can cross the physical urban boundaries to facilitate heat exchange between cities and rural areas to mitigate UHI. However, at the same time, due to the interaction and collision of energy and heat between cities and rural areas, it may lead to a large amount of pollutants flowing back into the city with the heat island circulation, causing pollution to the urban ecosystem. The local microcirculation between urban, suburban and urban–rural (buffer zone) areas can be improved by rationalizing the landscape patches of suburban and rural areas to provide spaces for heat exchange and pollutant filtration and sinking and to avoid pollutant refluxes. This study shows that different types and locations of rural landscapes may mitigate UHI to different degrees due to the different temperature gradients of the thermal cycles.

To sum up, rather than perceiving urbanization as an undesirable trend that opposes sustainable urban development, it is more constructive to embrace it as a continuous process. Unlike the intricate process of balancing urban development with sustainability, the regulation of rural land cover would yield numerous co-benefits for both urban and rural areas, including offering a nature-based solution without encroaching on urban land 3 , preserving rural landscape, boosting rural economy, assisting in mitigating the UHI and supporting ongoing urban prosperity and sustainability.

This study aims to explore the impacts of neighboring rural land cover (locations and landscape types) on urban heat island (UHI) mitigation. The method logic contains three main scenarios: (1) investigating the influence of rural land cover in different locations on UHI at the urban scale; (2) extracting the key rural landscape parameters on UHI mitigation and (3) identifying the impact of individual key landscape parameters on UHI and proposing key mitigation strategies. The applied research framework is shown in Extended Data Fig. 1 . First, 30 Chinese cities are selected as case studies. The data of the UHI intensity (UHII) and rural land cover for these cities are collected. Second, urban areas are divided into urban ladders (UL i ) based on UDI 27 , and rural areas are divided into Rural Ladders (RL j ) with different distances from the urban boundary 28 . The UHII values of different UL i are calculated and the land cover types of different RL j are categorized. Third, regression models are used to analyze the impact of different rural land cover from varying distances to the urban boundary 29 . Then, SHapley Additive exPlanations (SHAP) is employed to rank the key landscape parameters of rural land cover, including landscape-level parameters (LLPs) and landscape-class parameters (LCPs) 30 . Finally, accumulated local effects (ALE) plots are used to reveal the impact of individual key landscape parameters on UHII 31 .

Case studies

On the basis of the China Urban Statistical Yearbook 2020 ( http://www.stats.gov.cn ), 30 monocentric cities in China are selected with an urban area of more than 200 km 2 for investigation 32 . Documents have reported that the urban shape substantially influences the UHI; monocentric cities are more likely to experience the severe UHI phenomena 33 . As shown in Extended Data Fig. 2 , except Urumqi, all the cities are evenly distributed in China’s monsoon climate zone, representing a high level of urbanization. The landform characteristics of the sample cities can basically be categorized as dominated by plains, with scattered terraces, hills and low mountains in the city. Climatic differences in different regions of the country have negligible effects on the results, with sufficient data demonstrated in Supplementary Information C .

Data collection of UHII and rural land cover

Land cover data for 2000, 2005 and 2010 are obtained through Landsat 5 TM (Thematic Mapper) and for 2015 and 2020 through Landsat 8 OLI (Operational Land Imager). According to the common reference system of remote sensing monitoring in China (National Land Use/Cover Classification System for Remote Sensing Monitoring), the rural land cover was divided into five types: impervious surface, woodland, grassland, cropland and water body 34 . Neighboring rural land cover (NRLC) includes the totality of all land cover, that is, impervious surface, woodland, grassland, cropland and water body. When analyzing landscape types independently, only four land cover types (impervious surface, woodland, cropland and water body) are chosen and grassland is excluded. Grassland has limited latent heat and low heat absorption efficiency 35 , 36 and is not a common land cover type in rural area neighboring cities of China. The training sample points are equally distributed throughout the sample region obtained from high-resolution photos in Google Earth Pro. Landsat 5/8 Level 2, Collection 2, Tier 1 dataset and the Random Forest (RF) algorithm model are employed for land classification. The quality of categorization is assessed through Kappa values, which are all greater than 0.90.

This study incorporates 18 LLPs such as total area (TA), contagion (CONTAG) and Shannon’s evenness index (SHEI) and 22 LCPs such as the largest patch index (LPI), edge density (ED) and intersectionality and juxtaposition Index (IJI) 37 . LLPs are indicators for the landscape as a whole (NRLC); LCPs are indicators for individual landscape types (impervious surface, cropland, water body, woodland). The selected LLPs and LCPs are listed in Supplementary Table A2 . With the support of ArcGIS 10.2, the ArcGrid raster format images from 2000 to 2020 are imported in Fragstats 3.4. Background values of landscape types are filtered using the Class properties file. Finally, the LLPs and LCPs are selected and calculated.

The surface UHII is calculated based on remotely sensed land surface temperature (LST) data. LST is considered to be strongly connected to near-ground temperature and is commonly employed to investigate the geographic and temporal features of the UHI impact 38 . The LST dataset for the selected cities incorporates synthetic temperature data from MOD11A1 V6.1/LST_Day_1km in July and August. Due to its wide bandwidth, this dataset is well suited to regional-scale cross-sectional data analysis and modeling. Previous research has demonstrated the reliability of MODIS LST results, with errors typically within 1 K (ref. 39 ). The monthly average of the daily LST values is used to calculate the average summer LST for the selected cities in 2000, 2005, 2010, 2015 and 2020.

This study selects a reference line obtained by offsetting 20 km from the urban boundary as the baseline to calculate the surface UHII 40 . A ring equal to the urban area is obtained. The LST of this ring is considered to be unaffected by the UHI footprint and used to calculate the surface UHI. In this approach, the non-urban area used for calculating the surface UHII does not overlap with the RL area, to minimize the impact of UHII variation resulting from the temperature change of RL j .

where \({\mathrm{surface}}\;{\mathrm{UHII}}_{{\mathrm{UL}}_{i}}\) is the surface UHII of UL i in °C; \({\mathrm{LST}}_{{\mathrm{UL}}_{i}}\) is the average LST for UL i in June to August, °C; and \({\mathrm{LST}}_{\mathrm{rural}}\) is the average LST of the ring from June to August, °C.

Demarcation of UL i and RL j

To investigate the impact of rural land cover on UHI mitigation of different urban regions (at a city scale), both urban and rural areas are demarcated for the sake of cross analysis. The cities are divided into five UL i and the selected rural areas are divided into four RL j .

The urban development intensity indicator (UDI i ), which typically exhibits a linear correlation with UHI 41 , is used for the division of urban areas. Different UL i correspond to different UDI i with certain intervals. Each city is subdivided into five UL i to maximize the segmentation of the metropolitan area and prevent the UL i from becoming excessively small and fragmented 42 . There are significant differences in surface UHII between the different UL i shown in Supplementary Table A4 . The city clustering algorithm (CCA) is utilized in this study to define city boundaries 43 . Initially, a city map with a resolution of 3,000 m is generated using an UDI threshold higher than 25% (ref. 44 ). UDI is calculated as the proportion of total number of impervious grids within each 3,000 × 3,000 m pane 45 , as shown in equation ( 2 ). Subsequently, the urban area is identified using CCA with a clustering parameter of 3,000 m, corresponding to the spatial resolution of the initial urban map 45 . Therefore, two pixels in the city maps previously processed through UDI with a distance between pixel centers not exceeding 3,000 m (clustering parameter of CCA) is assigned to the same city. Then, the complete shape of an urban area is obtained and the periphery of the urban area is extracted as the urban boundary. Finally, UDI intervals within the urban area are further subdivided to better represent the changes in UHI along the UDI. With an UDI interval of 15%, five UL i of UL 1 (UDI = 85–100%, UL 2 (UDI = 70–85%), UL 3 (UDI = 55–70%), UL 4 (UDI = 40–55%) and UL 5 (UDI = 25–40%), were derived, as illustrated in Fig. 1 .

where i denotes the i th image element on the raster map; UHI i denotes the Urban Development Intensity value of the i th image element; S i denotes the total area of the i th image element; and S Impervious surface, i denotes the area of imperviousness in the i th image element.

The RL j outside the urban area is delineated to ascertain the extent of rural land cover that can exert the most significant impact on the UHI. In previous studies, the widely adopted approach for determining RL j radius is to make the RL j area equal to the urban area 46 or to employ a uniform RL j radius of 5 km or 10 km (refs. 47 , 48 ). However, when the RL j radius is too small, for instance, less than 1 km, it becomes challenging to reflect the true rural LST and there is also likelihood of it being influenced by the UHI footprint 49 . Therefore, in this study, we adopt a varying RL j radius methodology 50 . On the basis of the selection of the rural region in most UHI mitigation studies and the distances between urban boundaries of sample cities (approximately 40 km). We tentatively establish a maximum study area of 20 km from the urban boundaries for rural land cover. The rural area within this boundary is also in close proximity to the metropolitan area. The width of the RL j radius, denoted as D , is determined to change by one level at a distance of 5 km. This choice aligns with the 900-m resolution of rural land cover, and the observed land cover changes between different RL j are significant. Thus, the urban boundary is offset outward by four RL j , each with varying ring widths. As shown in Fig. 1 , the minimum and maximum radii for these RL j are set as 2.5 km and 5 km (RL 1 ), 5 km and 10 km (RL 2 ), 10 km and 15 km (RL 3 ) and 15 km and 20 km (RL 4 ), respectively. The specific RL j radius of each city is calculated by equation ( 3 ).

where D is the radius of the RL j ; D min is the minimum radius of the RL j ; D max is the maximum radius of the RL j ; S is the urban area of the sample cities; S min is the minimum metropolitan area among all sample cities; and S max is the maximum urban area among all sample cities.

Analytic method to determine rural land cover regions

After dividing urban and rural areas, the rural region (RL j ) to maximize the surface UHI mitigation for each urban area can be determined by R 2 . R 2 is a parameter reflecting the goodness of a regression model 51 . This statistic also indicates the percentage of variance in the dependent variable that is jointly explained by the independent variables. In other words, R 2 offers a measure of how effectively the variations in surface UHII can be explained by the NRLC in the regression model 52 . The greater the extent to which rural land cover elucidates variations in the UHI, the more pronounced its influence on UHI dynamics and its efficacy in mitigating UHI effects. On the basis of this, seven machine learning models (Lasso regression, Ridge regression, ElasticNet regression, Random Forest regression, Support Vector regression, K-Nearest Neighbors regression and Multilayer Perceptron regression) are used to train the dataset (independent variable: LLPs and LCPs, dependent variable: surface UHII) in this study. These models are selected because they are significantly different in training methods and recognized for their ability to train high-dimensional data with various capabilities 53 . To achieve better training results, a tenfold cross validation is used to tune the model parameters. To prevent model overfitting, the dataset is divided into training (70%) and test (30%) sets. Random seeds may significantly affect the model training outcomes 54 . To ensure the robustness of the results, random seeds controlling the division of the training and test sets are not defined, and the model output parameters are averaged over multiple training sessions 55 . The average values of model output parameters reach equilibrium when the number of trainings reaches 100. The R 2 of each regression model in this study is the average value obtained after training 100 times. The non-parametric test is also used to test for significant differences in the training results of different regression models, showing significant variability between the models. In this context, a single example will yield seven distinct R 2 values derived from seven separate regression models. The highest among these seven R 2 values is extracted to measure the degree to which rural land cover accounts for variations in UHI changes, termed as the ‘explanation degree’, as depicted in equation ( 4 ). The next interpretability analyses are carried out based on the model corresponding to the largest R 2 (explanation degree).

where R 2 denotes the extent to which the independent variables in a regression model explain changes in the dependent variable and k represents seven regression models.

By comparing the explanation degree ( R 2 ) of different rural regions (RL j ), the corresponded rural region (RL j ) to maximize the surface UHI mitigation for each urban area (UL i , i  = 1–5) are obtained, and together make up the combinations (RL j –UL i ).

In this study, three linear regression models (including Lasso regression, Ridge regression and ElasticNet regression) and four nonlinear regression models, including Random Forest regression, Support Vector regression, K-Nearest Neighbors regression and Multilayer Perceptron regression, are used for correlation analyses, which are described as follows. Supplementary Fig. A1 shows the distribution of R 2 for these regression models, and the appropriate model is determined by comparing the R 2 values.

Ranking method of key landscape parameters for rural land cover

On the basis of the obtained corresponding combinations (RL j –UL i ), the average marginal contribution of different landscape parameters to UHI changes is calculated by coupling the SHAP model with the best-fit machine learning model obtained in the previous section. SHAP belongs to the method of ex post interpretation. The basic idea is to calculate the marginal contribution of a feature when it is added to the model and then take the mean value, that is, the SHAP baseline value of the feature, considering the different marginal contributions in the case of all feature sequences. The SHAP values of LLPs and LCPs are sorted and accumulated, and when the accumulated amount reaches 80% of the total number, the parameters whose SHAP values are accumulated are chosen. Because these parameters may not be independent, considering all these parameters may affect or even restrict each other when applying. Therefore, the parameter screening is done by the correlation analysis. Most of the existing data analysis studies use 0.5 to 0.7 as the correlation value for high-dimensional parameter screening 56 , 57 , and we chose the middle value of 0.6 as the screening value for this study. In the set of data with correlation coefficients greater than 0.6, parameters with smaller SHAP values are eliminated. This is because the larger the SHAP value, the larger the effect of the parameter on the UHI within its own range of variation.

Influence of the key rural landscape parameters responding to UHI

ALE are used to recognize the influencing patterns of key landscape parameters on UHI mitigation. ALE is a global explanation technique that can describe how key parameters affect the prediction values from a machine learning model, which is a faster and unbiased alternative to partial dependence plots. In this study, ALE can examine the relationship between feature values (that is, landscape parameters) and target variables (that is, UHII). ALE averages and adds the difference in predictions throughout the key landscape parameters, thereby isolating the impacts of each feature value, which is at the cost of a greater number of observations and a nearly uniform distribution. Overall, the ALE model shows the main effects of individual predictor variables and their second-order interactions in black-box supervised learning models that are easy to understand. According to the interactive relationships between variables, ALE plots can be generated based on the fitted supervised learning model.

Data analysis

The data analysis process is shown in Supplementary Fig. A17 , which can be considered as a nesting of three loops. The first level of the loop is to train the LLPs and LCPs (independent variable) for a specific RL j and the UHII (dependent variable) for a particular UL i through seven machine learning models. The regression model with the largest R 2 is considered as the best trained. This regression model and the corresponding R 2 , that is, explanation degree, are the output terms of the first layer of the loop. The first loop is to obtain the best-trained model for the specified UL i and RL j (corresponding combination). The second level of the loop, based on the result of the first loop, can be used to compare the extent to which the land cover of different RL j affects the UHI of a specified UL i . At the end, to output the rural region that has the most significant effect on the UHI of a specified UL i . The first two layers of the cycle allow the first objective of this study to be achieved. The criterion for judging whether to continue analyzing the key parameters and patterns of the impact of rural land affects the UHI (the third level of the cycle) is whether there is a RL j of land cover that has a greater than 0° impact on the intensity of the heat island in that urban ladder. The third level of the cycle performs the previous steps once in each of the five UL i to obtain the region of rural land cover that has the most significant impact on the UHI of the respective UL i and the impact extent of rural land cover on the UHI, that is, explanation degree. If explanation degree is less than 0, it suggests that the UHI of this UL i is not affected by rural land cover. If explanation degree is greater than 0, the LLPs and LCPs of rural land cover are ranked from the largest to the smallest by SHAP value. Accumulation starts with the first SHAP value and stops, when the value reaches 80% of the total. The accumulated LLPs and LCPs are subjected to the correlation analysis, and parameters with lower SHAP rankings in the set of parameters with correlation coefficients greater than 0.6 are deleted, which aims to identify the key parameters of rural land cover affecting the UHI. Finally, ALE plots of these key parameters are plotted to explain the pattern of UHI response to the key parameters. At this point, the last question of this study is answered.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

Landsat 5/8 Level 2, Collection 2, Tier 1 dataset and MOD11A1 V6.1 product are available through Google Earth Engine platform at http://code.earthengine.google.com . The dataset produced and used in this study is available as Supplementary Information and via Zenodo at https://doi.org/10.5281/zenodo.10424322 (ref. 58 ).

Code availability

The code used to produce and analyze the data in this study is available via Zenodo at https://doi.org/10.5281/zenodo.10424322 (ref. 58 ).

Zhao, C. et al. Long-term trends in surface thermal environment and its potential drivers along the urban development gradients in rapidly urbanizing regions of China. Sustain. Cities Soc. 105 , 105324 (2024).

Article   Google Scholar  

Yang, M. et al. A global challenge of accurately predicting building energy consumption under urban heat island effect. Indoor Built Environ. 32 , 455–459 (2023).

Kumar, P. et al. Urban heat mitigation by green and blue infrastructure: drivers, effectiveness, and future needs. Innovation 5 , 100588 (2024).

Google Scholar  

Tuholske, C. & Chapman, H. How to cool American cities. Nat. Cities 1 , 16–17 (2024).

Haddad, S. et al. Quantifying the energy impact of heat mitigation technologies at the urban scale. Nat. Cities 1 , 62–72 (2024).

NASA NASA Finds June 2023 Hottest on Record (NASA, 2023).

Mirzaei, P. A. et al. Urban neighborhood characteristics influence on a building indoor environment. Sustain. Cities Soc. 19 , 403–413 (2015).

Xi, C. et al. How can greenery space mitigate urban heat island? An analysis of cooling effect, carbon sequestration, and nurturing cost at the street scale. J. Cleaner Prod. https://doi.org/10.1016/j.jclepro.2023.138230 (2023).

Meng, Y. et al. Investigation of heat stress on urban roadways for commuting children and mitigation strategies from the perspective of urban design. Urban Clim. https://doi.org/10.1016/j.uclim.2023.101564 (2023).

Kim, H., Gu, D. & Kim, H. Y. Effects of urban heat island mitigation in various climate zones in the United States. Sustain. Cities Soc. 41 , 841–852 (2018).

Fu, J. C. et al. Impact of urban park design on microclimate in cold regions using newly developped prediction method. Sustain. Cities Soc. https://doi.org/10.1016/j.scs.2022.103781 (2022).

Cao, S. J. et al. Low-carbon design towards sustainable city development: integrating glass space with vertical greenery. Sci. China Technol. Sci. https://doi.org/10.1007/s11431-023-2570-x (2023).

Adilkhanova, I., Santamouris, M. & Yun, G. Y. Green roofs save energy in cities and fight regional climate change. Nat. Cities 1 , 238–249 (2024).

Schatz, J. & Kucharik, C. J. Seasonality of the urban heat island effect in Madison, Wisconsin. J. Appl. Meteorol. Climatol. 53 , 2371–2386 (2014).

Mirzaei, P. A. & Haghighat, F. Approaches to study urban heat island—abilities and limitations. Build. Environ. 45 , 2192–2201 (2010).

Yao, L. et al. Are water bodies effective for urban heat mitigation? Evidence from field studies of urban lakes in two humid subtropical cities. Build. Environ. 245 , 110860 (2023).

Aboelata, A. & Sodoudi, S. Evaluating the effect of trees on UHI mitigation and reduction of energy usage in different built up areas in Cairo. Build. Environ. 168 , 106490 (2020).

Sun, R. & Chen, L. How can urban water bodies be designed for climate adaptation? Landscape Urban Plann. 105 , 27–33 (2012).

Yang, J. & Huang, X. The 30 m annual land cover dataset and its dynamics in China from 1990 to 2019. Earth Syst. Sci. Data 13 , 3907–3925 (2021).

Gong, P. et al. Stable classification with limited sample: transferring a 30-m resolution sample set collected in 2015 to mapping 10-m resolution global land cover in 2017. Sci. Bull. 64 , 370–373 (2019).

Li, Z. et al. SinoLC-1: the first 1 m resolution national-scale land-cover map of China created with a deep learning framework and open-access data. Earth Syst. Sci. Data 15 , 4749–4780 (2023).

Angel, S. et al. The dimensions of global urban expansion: estimates and projections for all countries, 2000–2050. Prog. Plann. https://doi.org/10.1016/j.progress.2011.04.001 (2011).

Martilli, A., Krayenhoff, E. S. & Nazarian, N. Is the Urban Heat Island intensity relevant for heat mitigation studies? Urban Clim. https://doi.org/10.1016/j.uclim.2019.100541 (2020).

Yao, R. et al. Greening in rural areas increases the surface urban heat island intensity. Geophys. Res. Lett. 46 , 2204–2212 (2019).

Stewart, I. D., Oke, T. R. & Krayenhoff, E. S. Evaluation of the ‘local climate zone’ scheme using temperature observations and model simulations. Int. J. Climatol. 34 , 1062–1080 (2014).

Fan, Y. et al. Horizontal extent of the urban heat dome flow. Sci. Rep. 7 , 11681 (2017).

Molinaro, R. et al. Urban development index (UDI): a comparison between the city of Rio de Janeiro and four other global cities. Sustainability https://doi.org/10.3390/su12030823 (2020).

Zhang, Q. M. et al. The influence of different urban and rural selection methods on the spatial variation of urban heat island intensity. In IEEE International Geoscience and Remote Sensing Symposium (IGARSS) Yokohama, JAPAN, Jul 28–Aug 02 2019 4403–4406 (IGARSS, 2019).

Kong, H., Choi, N. and Park, S. Thermal environment analysis of landscape parameters of an urban park in summer—a case study in Suwon, Republic of Korea. Urban For. Urban Greening https://doi.org/10.1016/j.ufug.2021.127377 (2021).

Mangalathu, S., Hwang, S. H. and Jeon, J. S. Failure mode and effects analysis of RC members based on machine-learning-based SHapley Additive exPlanations (SHAP) approach. Eng. Struct. https://doi.org/10.1016/j.engstruct.2020.110927 (2020).

Galkin, F. et al. Human microbiome aging clocks based on deep learning and tandem of permutation feature importance and accumulated local effects. Preprint at b ioRxiv https://doi.org/10.1101/507780 (2018).

National Bureau of Statistics of China (NBS). China City Statistical Yearbook (China Statistics Press, 2021).

Liu, X. et al. Influences of landform and urban form factors on urban heat island: comparative case study between Chengdu and Chongqing. Sci. Total Environ. https://doi.org/10.1016/j.scitotenv.2022.153395 (2022).

Mugabowindekwe, M. et al. Nation-wide mapping of tree-level aboveground carbon stocks in Rwanda. Nat. Clim. Change 13 , 91–97 (2023).

Yu, Z. et al. Quantifying seasonal and diurnal contributions of urban landscapes to heat energy dynamics. Appl. Energy 264 , 114724 (2020).

Xue, Y. et al. Measurements and estimation of turbulent fluxes over a sparse-short grassland in Mangshan Forest Area in Beijing. Plateau Meteorol. 32 , 1692–1703 (2013).

Riitters, K. H. et al. A factor-analysis of landscape pattern and structure metrics. Landscape Ecol. 10 , 23–39 (1995).

Yu, Z. W. et al. Spatiotemporal patterns and characteristics of remotely sensed region heat islands during the rapid urbanization (1995-2015) of Southern China. Sci. Total Environ. 674 , 242–254 (2019).

Wan, Z. M. New refinements and validation of the collection-6 MODIS land-surface temperature/emissivity product. Remote Sens. Environ. 140 , 36–45 (2014).

Liu, H, et al. The influence of urban form on surface urban heat island and its planning implications: evidence from 1288 urban clusters in China. Sustain. Cities Soc. https://doi.org/10.1016/j.scs.2021.102987 (2021).

Li, Y. et al. On the influence of density and morphology on the urban heat island intensity. Nat. Commun. https://doi.org/10.1038/s41467-020-16461-9 (2020).

Zhou, D. C. et al. Spatiotemporal trends of urban heat island effect along the urban development intensity gradient in China. Sci. Total Environ. 544 , 617–626 (2016).

Rozenfeld, H. D. et al. Laws of population growth. Proc. Natl Acad. Sci. USA 105 , 18702–18707 (2008).

Liu, H. M., Huang, B. & Yang, C. Assessing the coordination between economic growth and urban climate change in China from 2000 to 2015. Sci. Total Environ. https://doi.org/10.1016/j.scitotenv.2020.139283 (2020).

Zhou, B., Rybski, D. & Kropp, J. P. On the statistics of urban heat island intensity. Geophys. Res. Lett. 40 , 5486–5491 (2013).

Zhou, D. C. et al. Surface urban heat island in China’s 32 major cities: spatial patterns and drivers. Remote Sens. Environ. 152 , 51–61 (2014).

Clinton, N. & Gong, P. MODIS detected surface urban heat islands and sinks: global locations and controls. Remote Sens. Environ. 134 , 294–304 (2013).

Rasul, A., Balzter, H. & Smith, C. Spatial variation of the daytime surface urban cool island during the dry season in Erbil, Iraqi Kurdistan, from Landsat 8. Urban Clim. 14 , 176–186 (2015).

Yang, Q., Huang, X. & Tang, Q. The footprint of urban heat island effect in 302 Chinese cities: temporal trends and associated factors. Sci. Total Environ. 655 , 652–662 (2019).

Liang, Z. et al. The relationship between urban form and heat island intensity along the urban development gradients. Sci. Total Environ. https://doi.org/10.1016/j.scitotenv.2019.135011 (2020).

Satapathy, S. K., Jagadev, A. K. & Dehuri, S. An empirical analysis of training algorithms of neural networks: a case study of EEG signal classification using Java Framework. in Intelligent Computing, Communication and Devices. Advances in Intelligent Systems and Computing, vol 309 (eds Jain, L. et al.) (Springer, 2015).

Lin, Y. & Wiegand, K. Low R2 in ecology: bitter, or B-side? Ecol. Indic. 153 , 110406 (2023).

Wang, H. et al. Intelligent anti-infection ventilation strategy based on computer audition: towards healthy built environment and low carbon emission. Sustain. Cities Soc. 99 , 104888 (2023).

Kaczmarczyk, K. & Miałkowska, K. Backtesting comparison of machine learning algorithms with different random seed. Procedia Comput. Sci. 207 , 1901–1910 (2022).

Ma, J. et al. Metaheuristic-based support vector regression for landslide displacement prediction: a comparative study. Landslides 19 , 2489–2511 (2022).

Rusakov, D. A. A misadventure of the correlation coefficient. Trends Neurosci. https://doi.org/10.1016/j.tins.2022.09.009 (2023).

Owusu, C. et al. Developing a granular scale environmental burden index (EBI) for diverse land cover types across the contiguous United States. Sci. Total Environ. 838 , 155908 (2022).

Cao, S.-J. et al. Mitigating urban heat island through neighboring rural land cover: dataset. Zenodo https://doi.org/10.5281/zenodo.10424322 (2024).

Download references

Acknowledgements

We disclose support for this work from National Natural Science Funds for Distinguished Young Scholar (grant number 52225005). We greatly appreciate the free access to the Landsat data provided by the United States Geological Survey (USGS) and MOD11A1 V6.1 product provided by the USGS and National Aeronautics and Space Administration (NASA). We thank the Google Earth Engine team for their excellent work to maintain the planetary-scale geospatial cloud platform.

Author information

Authors and affiliations.

School of Architecture, Southeast University, Nanjing, China

Miao Yang, Chen Ren, Haorui Wang, Junqi Wang, Zhuangbo Feng, Prashant Kumar, Fariborz Haghighat & Shi-Jie Cao

Jiangsu Province Engineering Research Center of Urban Heat and Pollution Control, Southeast University, Nanjing, China

Miao Yang, Chen Ren, Haorui Wang, Junqi Wang, Zhuangbo Feng & Shi-Jie Cao

Global Centre for Clean Air Research, School of Sustainability, Civil and Environmental Engineering, Faculty of Engineering and Physical Sciences, University of Surrey, Surrey, UK

Prashant Kumar & Shi-Jie Cao

Institute for Sustainability, University of Surrey, Surrey, UK

Prashant Kumar

Energy and Environment Group, Department of Building, Civil and Environmental Engineering, Concordia University, Montreal, Quebec, Canada

Fariborz Haghighat

You can also search for this author in PubMed   Google Scholar

Contributions

M.Y.: writing–original draft, conceptualization, data curation, formal analysis, investigation, methodology, software, visualization. C.R.: writing–original draft, writing–review and editing, methodology. H.W.: writing–original draft, writing–review and editing, methodology, software. J.W.: writing–review and editing, methodology, project administration. Z.F.: writing–review and editing, methodology, conceptualization. P.K.: writing–review and editing, methodology. F.H.: writing–review and editing, methodology, conceptualization. S.-J.C.: writing–review and editing, conceptualization, funding acquisition, project administration, resources, supervision, visualization.

Corresponding author

Correspondence to Shi-Jie Cao .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Peer review

Peer review information.

Nature Cities thanks Murat Atasoy, Mingliang Liu, Chaobin Yang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended data fig. 1.

Framework of this work .

Extended Data Fig. 2

Locations of the 30 selected cities in the mainland China .

Supplementary information

Supplementary information.

Appendix A Figs. 1–17, Tables 1–4; Appendix B Figs. 1–2; Appendix C Figs. 1–16, Table 1, Discussions of variables, Statistics and Regressions.

Reporting Summary

Source data fig. 1.

Source data of Fig. 1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

Reprints and permissions

About this article

Cite this article.

Yang, M., Ren, C., Wang, H. et al. Mitigating urban heat island through neighboring rural land cover. Nat Cities 1 , 522–532 (2024). https://doi.org/10.1038/s44284-024-00091-z

Download citation

Received : 02 December 2023

Accepted : 17 June 2024

Published : 05 July 2024

Issue Date : August 2024

DOI : https://doi.org/10.1038/s44284-024-00091-z

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

Quick links

  • Explore articles by subject
  • Guide to authors
  • Editorial policies

Sign up for the Nature Briefing: Anthropocene newsletter — what matters in anthropocene research, free to your inbox weekly.

regression analysis quantitative research

U.S. flag

An official website of the United States government

The .gov means it’s official. Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

The site is secure. The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

  • Publications
  • Account settings

Preview improvements coming to the PMC website in October 2024. Learn More or Try it out now .

  • Advanced Search
  • Journal List
  • BMC Musculoskelet Disord
  • PMC11267785

Logo of bmcmudis

Meta-analysis of the accuracy of the serum procalcitonin diagnostic test for osteomyelitis in children

1 Department of Emergency Surgery, The Second People’s Hospital of Lianyungang, Lianyungang, China

Dongsheng Zhu

2 Department of Pediatric Orthopedics, The First People’s Hospital of Lianyungang, Lianyungang, 222000 China

Xiaodong Wang

3 Department of Orthopedics, Children’s Hospital of Soochow University, Suzhou, China

4 Department of Pediatric, Xiangcheng District People’s Hospital, Suzhou, Jiangsu Province 215000 China

Associated Data

Datasets used and/or analysed during the current study are available from the corresponding authors upon reasonable request.

This study sought to assess the sensitivity, specificity, and predictive utility of serum procalcitonin (PCT) in the diagnosis of pediatric osteomyelitis.

A systematic computer-based search was conducted for eligible literature focusing on PCT for the diagnosis of osteomyelitis in children. Records were manually screened according to the Preferred Reporting Items for Systematic Reviews and Meta-Analysis guidelines. Statistical analysis was performed using Review Manager software 5.3, Meta-disc software1.4, STATA 12.0, and R 3.4 software.

A total of 5 investigations were included. Of these, 148 children with osteomyelitis were tested for bacterial cultures in PCT. For PCT in the diagnosis of pediatric osteomyelitis, diagnostic meta-analysis revealed a pooled sensitivity and specificity of 0.58 (95% confidence interval (CI): 0.49 to 0.68) and 0.92 (95% CI: 0.90 to 0.93) respectively. The PCT had the greatest area under the curve (AUC) at 0.80 for the diagnosis of osteomyelitis in children. The Deeks’ regression test for asymmetry results indicated that there was no publication bias when evaluating publication bias ( P  = 0.90).

This study provided a comprehensive review of the literature on the use of PCT in pediatric osteomyelitis diagnosis. PCT may be used as a biomarker for osteomyelitis diagnosis; however, its sensitivity was low. It still needs to be validated by a large sample study.

Introduction

Osteomyelitis in children and adolescents is an inflammation of bones induced by an acute bacterial infection [ 1 ]. Based on a study conducted by Agarwal et al. over the past decade, estimates of the total prevalence of bone and joint infections range from 2 to 13 per 10,000 youngsters [ 2 ]. However, it is often not clinically obvious where the infection-causing bacteria originated, making the colonization of the skin, mouth, or respiratory mucous membranes the most likely entry point for the infection, which is truly a diagnostic enigma in pediatric orthopedics [ 3 ]. Localized redness, swelling, suppuration, and other signs are the primary clinical indications of osteomyelitis [ 4 ]After an infection has taken hold, a rapid chain reaction of local and systemic inflammatory reactions occurs with the production of several inflammatory mediators. Laboratory serum inflammatory biomarkers had the features of accurate diagnosis in suspected instances of acute bone infections, both in pediatric and adult patients [ 5 ]. Procalcitonin (PCT) has been demonstrated in clinical studies to be a more accurate serum inflammatory biomarker for the detection of bacterial infections [ 6 ]. This biomarker exhibits the characteristics of a diagnostically reliable biomarker in suspected instances of acute bone infection. There were previously several findings in the literature that now suggest the usefulness of PCT in identifying acute osteomyelitis [ 7 – 9 ]. A research found PCT, at 0.4 ng/ml, was 85.2% sensitive and 87.3% specific in diagnosing septic arthritis and acute osteomyelitis; However, in comparison, PCT at conventional cut-off of 0.5 ng/ml is 66.7% sensitive and 91% specific, it seemed 0.4 ng/ml more suitable for diagnosis osteomyelitis [ 7 ]. In pyogenic spondylodiscitis, Italian scholars have found that PCT also has certain diagnostic value, especially when PCT was greater than 0.11ng/ml, it indicated poor prognosis [ 10 ]. Given the increasing use of PCT in the diagnosis of pediatric osteomyelitis, we conducted a thorough and quantitative review of current relevant literature reports to provide a scientific basis for the clinical use of PCT in the diagnosis of pediatric osteomyelitis.

Materials and methods

Search strategy.

PubMed, Embase and the Cochrane Library were used to identify articles eligible for the meta-analysis (last search: April 01, 2023), with the following subject terms: “osteomyelitis” or “bone infection” and “procalcitonin” or “PCT” and “adolescent” or “children” or “pediatrics”. We also reviewed the reference lists of previous systematic reviews to identify additional studies that might have been eligible for inclusion in the analysis. Reference lists of relevant reviews and previous meta-analyses were also searched.

Inclusion and exclusion criteria

The analysis comprised the following studies: (a) objective was to assess PCT’s diagnostic utility for osteomyelitis in children; (b) studies had a sufficient amount of data to build two-by-two contingency tables; (c)age under 18;(d) gold standard for the diagnosis of osteomyelitis was pathogen isolation or culture. The following studies were excluded from the analysis: review articles, case reports, clinical guideline and animal experiments.

Quality assessment

Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool was used to assess the methodological quality of the included studies.

Literature screening and data extraction

Two reviewers reached an agreement on which studies should ultimately be included in the analysis after each of them separately assessed the studies that were eligible for inclusion. Negotiations or the assistance of a third researcher were used to resolve disagreements. Most of the retrieved information consisted of the following: study, year, country, age and sample, PCT cut-off and true positive (TP), false positive (FP), true negative (TN), and false negative (FN) of PCT.

Criteria of heterogeneity test

Threshold and non-threshold effects were the primary sources of variability in diagnostic tests. On the receiver operating characteristic (ROC) plane graph, sensitivity and specificity displayed a “shoulder-arm” point distribution when the threshold effect was present. Otherwise, there was no correlation. When there was no threshold effect, individual evaluation metrics, such as sensitivity and specificity, can be combined immediately. When threshold effect was present, Summary ROC (SROC) curve approach should be used to calculate the area and Q-index.

Statistical methods

Sensitivity, specificity, positive likelihood ratio (LR), negative likelihood ratio (LR), and diagnostic odds ratio (DOR) were calculated. Spearman correlation analysis was used to detect the threshold effect, and the SROC curve approximation was used to calculate the area and Q-index if threshold effect was present. I 2 values were used to measure the heterogeneity between studies. I 2 values > 50% suggested that there was significant heterogeneity present. We used the Deeks’ regression test for funnel plot asymmetry to examine for potential publication bias. Statistical analysis was performed using Review Manager Software 5.3, Meta-disc software1.4, STATA 12.0, and R 3.4 software.

Basic characteristic

The above searches yielded a total of 137,544 documents; however, only 5 articles met the inclusion requirements (Fig.  1 ) [ 11 – 15 ]. Finally, 5 articles came from USA, France, Israel, China and South Africa included 1235 children between 2005 and 2022 (Tables  1 and ​ and2 2 ).

An external file that holds a picture, illustration, etc.
Object name is 12891_2024_7716_Fig1_HTML.jpg

Diagram of workflow in the systematic review and meta-analysis

Basic characteristic of the included studies

AuthorYearNationAge(year)Study DesignPro-calcitonin detection methodInstrument company
Butbul-Aviel2005Israel.7.9 (7.6 ± 5.5)prospectiveA semi-quantitative rapid immunoassayBRAHMS
Faesch2009France4(0.1–14)retrospectiveAn automatic quantitative methodBRAHMS
Greeff2013South Africa< 14prospectivePro-calcitonin sensitive kitBRAHMS
Cui2017China6.50 ± 3.44prospectiveMicro-particle enzyme immunoassayBRAHMS
Lyons2022USA7(4–11)prospectiveSensitive compact immunoassayBRAHMS

Basic data of the included studies

AuthorYearSampleosteomyelitisTrue positiveFalse positiveFalse negativeTrue negativeSensitivitySpecificityCut-off
Butbul-Aviel200544127352958.33%90.62%0.5ng/mL
Faesch2009339205101530925.00%96.9%0.5ng/mL
Greeff201333431111875.00%62.07%0.5ng/mL
Cui2017172924435138077.17%69.47%0.356 ng/mL
Lyons2022647207321359535.00%96.86%0.5ng/mL

Quality evaluation

Evaluation of literature quality based on QUADAS-2 entries, including patient selection, index test, reference standard, and flow & timing, only one study was described as ambiguous in the reference criteria (Fig.  2 ).

An external file that holds a picture, illustration, etc.
Object name is 12891_2024_7716_Fig2_HTML.jpg

Methodological quality summaries

Threshold effect detection

This can lead to a threshold effect due to the different critical values of the PCT. Therefore, the first step is to test whether there is a threshold effect in this diagnostic experiment. The Spearman correlation coefficient was 0.9( P  = 0.037) and ROC plane graph displayed a “shoulder-arm” point distribution, indicating a threshold effect in PCT detection of pediatric osteomyelitis (Fig.  3 ). Therefore, the SROC curve approximation should be employed to calculate the area and Q-index.

An external file that holds a picture, illustration, etc.
Object name is 12891_2024_7716_Fig3_HTML.jpg

ROC plane graph for threshold effect detection

The value of PCT in diagnosing osteomyelitis in children

In all 5 trials included in the final meta-analysis, PCT had a pooled sensitivity of 0.58 (95% Confidence interval (CI), 0.49–0.68) to osteomyelitis diagnosis in children (Fig.  4 A) and a pooled specificity of 0.92 (95% CI, 0.90–0.93)to osteomyelitis diagnosis in children (Fig.  4 B).The pooled positive LR was 4.05(95% CI, 2.30–7.14)(Fig.  5 A), the pooled negative LR was 0.55 (95% CI, 0.37–0.82)(Fig.  5 B), and the diagnostic OR was 8.93, 95% CI (5.46 − 14.61)(Fig.  5 C), for the detection of osteomyelitis by PCT. Fagan’s nomogram showed the change in the predictive power of the PCT in pediatric osteomyelitis diagnosis after meta-analysis (Fig.  6 A)and its predictive utility (Fig.  6 B).Based on the SROC curve of the PCT, the AUC of the PCT was found to be 0.80 (Fig.  7 ).

An external file that holds a picture, illustration, etc.
Object name is 12891_2024_7716_Fig4_HTML.jpg

( A ) Forest of sensitivity for procalcitonin diagnosis of children with osteomyelitis; ( B ) Forest of specificity for procalcitonin diagnosis of children with osteomyelitis

An external file that holds a picture, illustration, etc.
Object name is 12891_2024_7716_Fig5_HTML.jpg

( A ) Forest of positive likelihood ratio for procalcitonin diagnosis of children with osteomyelitis; ( B ) Forest of negative likelihood ratio for procalcitonin diagnosis of children with osteomyelitis; ( C ) Forest of diagnostic odds ratio for procalcitonin diagnosis of children with osteomyelitis

An external file that holds a picture, illustration, etc.
Object name is 12891_2024_7716_Fig6_HTML.jpg

( A ) Fagan nomogram of PCT for diagnosis of children with osteomyelitis; ( B ) The value gram of positive likelihood ratio and negative likelihood ratio for procalcitonin diagnosis of children with osteomyelitis

An external file that holds a picture, illustration, etc.
Object name is 12891_2024_7716_Fig7_HTML.jpg

Summary receiver operating characteristics curve for procalcitonin diagnosis of children with osteomyelitis

Publication bias

Using Deeks’ regression test for asymmetry to detect publication bias ( P  = 0.9,> 0.05), we found that there was no discernible publication bias in the papers that were part of the study (Fig.  8 ).

An external file that holds a picture, illustration, etc.
Object name is 12891_2024_7716_Fig8_HTML.jpg

Deeks’ funnel plot for procalcitonin diagnosis of children with osteomyelitis

Osteomyelitis is most common in the long bones of the limbs, such as the femur, tibia, humerus, and radius, especially the femur and tibia [ 16 ]. It is a bacterial infection that spreads into bone tissue via the bloodstream, trauma, local diffusion, or surgery [ 17 ]. Microbiological examinations of children were previously used to determine the diagnostic criteria for osteomyelitis. However, bacterial culture typically takes several days, which is not conducive to the early diagnosis of acute osteomyelitis in children. Thus, serum inflammation biomarkers play a potential role in clinical practice for patients with suspected osteomyelitis, of which PCT is one [ 6 ]. Osteomyelitis must be diagnosed and treated at an early stage, especially for children and adolescents who are still growing and developing [ 18 ]. If it develops into chronic osteomyelitis, it can potentially impair a child’s bone growth and can remain morbid, sometimes taking years to heal [ 19 ].

PCT is a precursor peptide of calcitonin that is stable in humans [ 20 ]. Within a few hours of contracting a bacterial infection, children emit huge amounts of interleukins and tumor necrosis factor, which will boost the expression of PCT gene [ 21 ]. Since PCT is a sensitive biomarker of an important inflammatory component in bacterial infections, it is now routinely employed as a marker of infection [ 22 ]. Following an extensive literature review, we did not find any meta-analysis of PCT in pediatric osteomyelitis diagnosis, only a similar report [ 23 ]. However, this paper incorporated pediatric osteomyelitis and purulent arthritis into one set of data, which inevitably leads to a bias. To investigate the usefulness of PCT as a diagnostic biomarker for osteomyelitis in children, we performed a meta-analysis that eventually included 5 studies. We used pooled data from 5 studies enrolling a total of 148 children with osteomyelitis presentation. This study contained one retrospective study and four prospective trials of excellent overall quality. Our meta-analysis indicated a pooled sensitivity of 58% and specificity of 91% for osteomyelitis diagnosis in children. The findings suggested that the rate of misdiagnosis of osteomyelitis in children is minimal, but the rate of missed diagnosis is high.

Pooled likelihood ratio estimation: positive LR and negative LR were used to calculate the post-test probabilities to make our results more clinically informative. They are relatively independent and clinically significant evaluation indicator of the efficacy of diagnostic trials [ 24 ]. The meta-analysis indicated a pooled positive LR was 4.05, and a pooled negative LR was 0.55. In addition, Fagn’s nomogram showed that given a pre-test probability of 20%, a pooled negative LR of 0.50 reduces the post-test probability to 11%. Thus, only 1 out 9 patients with negative PCT results may end up with osteomyelitis. However, as reported in previous literature, when the pooled positive LR > 10, and the pooled negative LR < 0.1, the likelihood of diagnosing or ruling out a certain disease is significantly increased [ 25 ]. This suggests that the PCT still has some shortcomings in its application to the diagnosis of pediatric osteomyelitis. In 2021, a meta-analysis examined the value of C-reactive protein (CRP) in the diagnosis of bone and joint infections in children and adolescents. They found a sensitivity and specificity of 0.86 and 0.90 for CRP in the diagnosis of bone and joint infections in children and adolescents, respectively; however the positive LR and negative LR were 5.3 and 0.1, respectively [ 26 ]. This indicates that the ability to diagnose bone and joint infections in children and adolescents using CRP alone remains inadequate. ​Similar shortcomings were found in our meta-analysis of PCT, so a combined test is still necessary. ​In addition, there is still no consensus on the best PCT cut-off for diagnosing inflammatory bone and joint disease. PCT at the conventional cut-off of 0.5 ng/ml; however, some scholars have suggested that 0.4 ng/ml is more appropriate for the diagnosis of osteomyelitis [ 7 ]. In pyogenic spondylodiscitis, Italian scholars have found that the PCT also has some diagnostic value, especially when the PCT is greater than 0.11ng/ml, indicating a poor prognosis [ 10 ]. Therefor, more high-quality studies are necessary in the future to further investigate the role of PCT in the diagnosis of pediatric osteomyelitis. When the AUC is lower than 0.5, it indicates no diagnostic value; 0.5 ~ 0.7 indicates a low diagnostic value; 0.7–0.9 indicates a high diagnostic value; When it is higher than 0.9, the diagnostic accuracy is highest [ 27 ]. The results of this study showed that the PCT AUC reaches 0.80, indicating high diagnostic accuracy. The Q-index is the value corresponding to the point closest to the upper left corner of the SROC curve, where sensitivity and specificity are equal. The higher the -Q-index, the higher the accuracy of the diagnostic test. The Q-index of this study was 0.737, which, combined with the AUC, indicats that PTC is of good diagnostic value for the diagnosis of osteomyelitis in children. Our study provides a comprehensive review of the literature on the application of PCT in the diagnosis of pediatric osteomyelitis and yields several important conclusions. PCT emerges as a potential biomarker for osteomyelitis diagnosis, with its elevated levels showing a certain degree of correlation with the occurrence of the disease. This offers a novel diagnostic approach. However, given the low diagnostic sensitivity of the PCT, it is still best to combine other indicators for the detection of pediatric osteomyelitis. Despite the limited sensitivity of PCT, our findings still make significant contributions to the current understanding of PCT in pediatric osteomyelitis diagnosis. Firstly, our review provides clinicians with comprehensive information on the application of PCT in pediatric osteomyelitis diagnosis, helping them better understand the advantages and limitations of this biomarker. Secondly, our findings highlight the direction for future research, which is to further validate the value of PCT in pediatric osteomyelitis diagnosis through large-sample studies and explore possible methods to improve its sensitivity.

The significant findings of our study have important implications for the clinical management of pediatric osteomyelitis. Our results indicate that PCT may serve as a reliable biomarker for the diagnosis of this condition, with potential to improve diagnostic accuracy and reduce unnecessary antibiotic exposure. This has significant value given the risks associated with misdiagnosis and overtreatment of pediatric osteomyelitis. Moreover, our work builds upon and extends the current understanding of PCT as a diagnostic marker in pediatric osteomyelitis. While previous studies have examined the utility of PCT in this context, our study provides additional insights by focusing specifically on pediatric patients and analyzing a larger sample size. This allows us to more precisely characterize the diagnostic performance of PCT in this patient population.

However, it is important to acknowledge that our study has limitations, including the retrospective nature of the analysis and the potential for confounding factors. Limitations of the study: (1) The design of the included studies contained prospective research and perspective studies, which may have had some impact on the results; (2) The threshold values for inclusion in the study were different, and different thresholds may still affect the sensitivity and specificity of the PCT diagnosis; 3)Spearman correlation coefficients suggested the presence of threshold effects, and subgroup analyses were not performed due to the limited literature included; 4)This study included only English-language literatures, and may have been language bias. Future prospective studies with larger sample sizes and rigorous methodological designs will be necessary to further validate our findings and determine the optimal use of PCT in pediatric osteomyelitis diagnosis. Additionally, we can consider incorporating more clinical parameters and biomarkers, such as white blood cell count, CRP, and erythrocyte sedimentation rate, to enhance the accuracy and reliability of diagnosis. Futhermore, we can explore new detection techniques, such as gene- or proteome-based detection methods, to potentially improve the sensitivity of PCT in pediatric osteomyelitis diagnosis.

This study provided a comprehensive assessment of the existing literature on the diagnosis of osteomyelitis in children using PCT, which may serve as a biomarker for osteomyelitis diagnosis. However, the PCT has a low sensitivity to pediatric osteomyelitis diagnosis. Hence, its use as a biomarker for pediatric osteomyelitis detection alone is not recommended and needs to be combined with other detection methods. Due to the limitations in the number and quality of included studies, a large number of high-quality studies are required in the future to further explore the application value of their combined diagnostics.

Acknowledgements

Not applicable.

Abbreviations

PCTProcalcitonin
QUADAS-2Quality Assessment of Diagnostic Accuracy Studies-2
TPTrue positive
FPFalse positive
TNTrue negative
FNFalse negative
ROCReceiver operating characteristic
SROCSummery receiver operating characteristic
LRLikelihood ratio
DORDiagnostic odds ratio
CIConfidence interval

Author contributions

D.S.Z wrote the main manuscript text and D.S.Z, H.Q, J.W, X.D.W prepared Figs.  1 , ​ ,2, 2 , ​ ,3, 3 , ​ ,4, 4 , ​ ,5, 5 , ​ ,6, 6 , ​ ,7 7 and ​ and8. 8 . All authors reviewed the manuscript.

Medical research Project of Jiangsu Provincial Health Committee, K2019005; Lianyungang City maternal and child health research project (F202319).

Data availability

Declarations.

Not Applicable.

The authors declare no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Han Qi and Dongsheng Zhu are co-first authors.

Contributor Information

Dongsheng Zhu, Email: nc.ude.umt@gnehsgnoduhz .

Jian Wu, Email: moc.361@oowynav .

Log in using your username and password

  • Search More Search for this keyword Advanced search
  • Latest content
  • Current issue
  • For authors
  • New editors
  • BMJ Journals

You are here

  • Volume 57, Issue 16
  • Exercise as medicine for depressive symptoms? A systematic review and meta-analysis with meta-regression
  • Article Text
  • Article info
  • Citation Tools
  • Rapid Responses
  • Article metrics

Download PDF

  • http://orcid.org/0000-0001-9270-7027 Andreas Heissel 1 ,
  • http://orcid.org/0000-0001-9172-9718 Darlene Heinen 1 ,
  • http://orcid.org/0000-0003-4136-9704 Luisa Leonie Brokmeier 1 ,
  • Nora Skarabis 1 ,
  • http://orcid.org/0000-0001-8693-2949 Maria Kangas 2 ,
  • http://orcid.org/0000-0002-4592-8625 Davy Vancampfort 3 ,
  • http://orcid.org/0000-0001-7387-3791 Brendon Stubbs 4 ,
  • http://orcid.org/0000-0002-0618-2752 Joseph Firth 5 ,
  • http://orcid.org/0000-0002-5779-7722 Philip B Ward 6 ,
  • http://orcid.org/0000-0002-8984-4941 Simon Rosenbaum 7 ,
  • http://orcid.org/0000-0002-0599-2403 Mats Hallgren 8 ,
  • http://orcid.org/0000-0002-5190-4515 Felipe Schuch 9 , 10 , 11
  • 1 Social and Preventive Medicine, Department of Sports and Health Sciences, Intra faculty unit "Cognitive Sciences" , Faculty of Human Science and Faculty of Health Sciences Brandenburg, Research Area Services Research and e-Health , Potsdam , Brandenburg , Germany
  • 2 School of Psychological Sciences, Centre for Emotional Health , Macquarie University , Sydney , New South Wales , Australia
  • 3 Department of Rehabilitation Sciences , University of Leuven , Leuven , Belgium
  • 4 Physiotherapy Department , South London and Maudsley NHS Foundation Trust; Department of Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience, Kings College London , London , UK
  • 5 Division of Psychology and Mental Health , University of Manchester, Manchester Academic Health Science Centre, Manchester, UK; NICM Health Research Institute, Western Sydney University, Westmead Australia; Greater Manchester Mental Health NHS Foundation Trust, Manchester Academic Health Science Centre , Manchester , UK
  • 6 Discipline of Psychiatry and Mental Health, UNSW Sydney, Sydney New South Wales, Australia; Ingham Institute of Applied Medical Research , UNSW , Liverpool BC , New South Wales , Australia
  • 7 School of Psychiatry , UNSW , Sydney , New South Wales , Australia
  • 8 Epidemiology of Psychiatric Conditions, Substance use and Social Environment (EPiCSS) , Department of Public Health Sciences, Karolinska Institute Solna , Solna , Sverige , Sweden
  • 9 Department of Sports Methods and Techniques , Federal University of Santa Maria , Santa Maria , Brazil
  • 10 Institute of Psychiatry , Federal University of Rio de Janeiro , Rio de Janeiro , Brazil
  • 11 Universidad Autónoma de Chile , Providencia , Chile
  • Correspondence to Dr Andreas Heissel, Sport and Health Sciences, Social and Preventive Medicine, Faculty of Human Sciences, University of Potsdam, 14476 Potsdam, Germany; andreas.heissel{at}uni-potsdam.de

BMJ Learning - Take the Test

Objective To estimate the efficacy of exercise on depressive symptoms compared with non-active control groups and to determine the moderating effects of exercise on depression and the presence of publication bias.

Design Systematic review and meta-analysis with meta-regression.

Data sources The Cochrane Central Register of Controlled Trials, PubMed, MEDLINE, Embase, SPORTDiscus, PsycINFO, Scopus and Web of Science were searched without language restrictions from inception to 13 September2022 (PROSPERO registration no CRD42020210651).

Eligibility criteria for selecting studies Randomised controlled trials including participants aged 18 years or older with a diagnosis of major depressive disorder or those with depressive symptoms determined by validated screening measures scoring above the threshold value, investigating the effects of an exercise intervention (aerobic and/or resistance exercise) compared with a non-exercising control group.

Results Forty-one studies, comprising 2264 participants post intervention were included in the meta-analysis demonstrating large effects (standardised mean difference (SMD)=−0.946, 95% CI −1.18 to −0.71) favouring exercise interventions which corresponds to the number needed to treat (NNT)=2 (95% CI 1.68 to 2.59). Large effects were found in studies with individuals with major depressive disorder (SMD=−0.998, 95% CI −1.39 to −0.61, k=20), supervised exercise interventions (SMD=−1.026, 95% CI −1.28 to −0.77, k=40) and moderate effects when analyses were restricted to low risk of bias studies (SMD=−0.666, 95% CI −0.99 to −0.34, k=12, NNT=2.8 (95% CI 1.94 to 5.22)).

Conclusion Exercise is efficacious in treating depression and depressive symptoms and should be offered as an evidence-based treatment option focusing on supervised and group exercise with moderate intensity and aerobic exercise regimes. The small sample sizes of many trials and high heterogeneity in methods should be considered when interpreting the results.

  • Public health

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:  http://creativecommons.org/licenses/by-nc/4.0/ .

https://doi.org/10.1136/bjsports-2022-106282

Statistics from Altmetric.com

Request permissions.

If you wish to reuse any or all of this article please use the link below which will take you to the Copyright Clearance Center’s RightsLink service. You will be able to get a quick price and instant permission to reuse the content in many different ways.

WHAT IS ALREADY KNOWN?

Depression is the leading cause of disability worldwide with potentially increasing prevalence since the COVID-19 pandemic, yet more than two thirds of adults diagnosed with depression remain untreated.

Exercise is an efficacious treatment option for reducing depressive symptoms for individuals with depression.

However, evidence reported by meta-analyses reveals heterogeneous effects and is not up to date.

WHAT ARE THE NEW FINDINGS?

This methodologically sound systematic review and meta-analysis with meta-regression is the largest synthesis of the effect of exercise on major depressive disorder (MDD) and depressive symptoms covering 41 included studies, accounting for 2.264 adult participants postintervention.

Results show moderate to large effects of exercise on depressive symptoms even when limiting the analysis to low risk of bias studies or only MDD, although high heterogeneity among the studies was addressed with meta-regression.

Non-inferiority trials indicate that exercise is non-inferior to current first line treatments, and evidence that exercise is effective at long-term follow-ups are needed to clarify the identified evidence gaps.

Introduction

Depression is a prevalent and disabling disorder associated with reduced quality of life, medical comorbidity and mortality. 1 2 Over 300 million people live with depressive disorder, equating to approximately 4.4% of the world’s population. 3 The prevalence of depression has increased during the COVID-19 pandemic 4–7 by an estimated 27.6%, 7 highlighting the need for appropriate, accessible and cost-effective treatment options. 8

Currently, recommended treatments include psychotherapy and antidepressant medication (or a combination of both). 9 However, psychotherapy achieves remission rates of only 50% while typically being cost-intensive. 10 Side effects and relapses from antidepressant medication commonly occur 11 as can withdrawal symptoms. 12 Importantly, about two thirds of adults with depression do not receive adequate treatment. 13 Untreated depression often leads to intensification of the illness including the development of comorbidities resulting in even higher costs for society. 14 This attests to the need for rapid and readily available alternative treatment options.

Exercise has been recommended as an adjunct treatment for depression by both the WHO 15 and National Institute for Health and Care Excellence (NICE) guidelines. 16 Evidence for these recommendations included results from multiple meta-analyses investigating the antidepressant effect of exercise in people with depression. 17–20 However, some of these meta-analyses 18 21 found moderate, weak or no effects of exercise while others reported large effects. 17 19 20 These mixed results stem from methodological and conceptual differences regarding inclusion criteria and analytical approaches. For example, some studies focused on individuals with a diagnosis of major depressive disorder (MDD) while excluding studies that evaluated the presence of depression based on validated screening measures. 18 Others 17 investigated the effect of exercise alone or as a complementary treatment for depression to pharmacological therapy for studies published from 2003 to 2019. Further, some reviews included studies where patients also received exercise interventions 21 in the control groups. This creates the potential for bias 22 as even light intensity exercise can exert antidepressant effects. Importantly, a cause for concern has been raised in several reviews that exercise does not have a significant effect when restricted to ‘low risk of bias’ randomised controlled trials (RCTs) . 18 21 Therefore, extant meta-analyses have failed to provide convincing evidence to enable clinicians globally to implement exercise as an evidence-based effective treatment option for depression. One meta-analysis 20 addressed these methodological shortcomings by focusing on studies that included samples with depression using cut-offs on validated screening instruments and samples with MDD diagnosis assessed with diagnostic tools and including only studies that compared exercise versus non-active controls. The authors excluded trials comparing different exercise regimens. However, a large volume of studies has been published within the last 5 years, requiring an updated meta-analysis on the antidepressant effects of exercise, while addressing the shortcomings of previous reviews.

The objective of this meta‐analysis was to update the current evidence on the effects of exercise in reducing depressive symptoms in adults with clinically elevated levels of depression including MDD and dysthymia, comparing exercise with non-exercising control groups. Additionally, we aimed to investigate the potential moderators of the antidepressant effects of exercise, and the presence of publication bias.

Search strategy and selection criteria

This systematic review and meta-analysis was registered in the International Prospective Register of Systematic Reviews (PROSPERO) with the protocol number CRD42020210651. The PRISMA Statement was followed 23 in its updated version 24 additionally considering the PERSiST guidance (implementing PRISMA in Exercise, Rehabilitation, Sport medicine and SporTs science). 25

To structure the eligibility criteria, the PICOS (Patient/Population; Intervention; Comparison, Outcome; Study design) approach was used. 26 Eligible for this meta-analysis included studies that: (1) Investigated participants aged 18 years or older with a primary diagnosis of MDD or dysthymia defined by the Research Diagnostic Criteria, 24 Diagnostic and Statistical Manual of Mental Disorders (DSM-IV or DSM-5) 27 or International Classification of Diseases (ICD-10) 28 or adults with depressive symptoms determined by validated screening measures scoring above the threshold value (eg, Beck Depression Inventory (BDI) or Hamilton Rating Scale for Depression (HAM-D)). 29 30 If scales did not have validated cut-offs, the cut-off used by the author was accepted. (2) Investigated an exercise intervention in the treatment of depression, where exercise was defined as planned, structured, repetitive and purposive physical activity with the purpose to improve or maintain physical fitness. 31 Studies using yoga, tai chi or other mind-body activities were excluded, because the focus of such mind-body interventions are behavioural techniques that include, but not limited to, deep breathing, meditation/mindfulness and self-awareness. 32 (3) Included a non-exercising control group, such as usual-care, wait-list control conditions or placebo pills. Studies with any other exercise intervention (such as stretching or low-dose exercise) as a comparator were excluded as well as control and intervention groups commencing standardised interventions (eg, psychotherapy, medication) at the beginning of the intervention even if this was applied to both intervention and control groups (eg, starting medication treatment at the beginning of the intervention in both groups). However, ongoing treatments started at least 3 months before intervention initiation were included. (4) Examined the pre-post effects of exercise interventions on depressive symptoms using a validated depression scale. (5) Were RCTs and were published in peer-reviewed journals or as part of a dissertation. Conference proceedings were not included.

An electronic search of the following databases was conducted: Cochrane Central Register of Controlled Trials (CENTRAL), PubMed, MEDLINE, Embase, SPORTDiscus and PsycINFO without any (eg, language or date) restrictions from inception to 13 September 2022. The search used a range of relevant terms to capture all potentially eligible results relating to exercise interventions for depressive symptoms (for the full list of search terms, see online supplemental text 1 ). Duplicate references were removed electronically and manually. To identify unpublished or ongoing studies, clinical trials.gov ( www.clinicaltrials.gov ) was searched. Additionally, reference lists of all eligible articles of recent reviews investigating the effectiveness of exercise versus control were screened to identify potentially eligible articles. All manuscripts were reviewed by at least two independent reviewers. Three reviewers (NS/LLB, DH) independently determined potentially eligible articles meeting the inclusion criteria using the titles and abstracts. Three independent reviewers (NS/LLB, DH) then applied the eligibility criteria after obtaining the full texts and generated a final list of included articles through consensus. If full-texts were not available, study authors were contacted to provide them. Five investigators (NS/ LLB, DH, FS, AH) judged article eligibility with any disagreements resolved through discussion.

Supplemental material

Data extraction.

Data extraction was done by three reviewers (NS/ LLB, DH) independently. A systematic extraction form was used for each article to collect the following data: (1) sample description (eg, sample size, mean age of participants); (2) intervention features (eg, type of exercise, length of trial); (3) methodological factors (eg, risk of bias, instruments used for diagnosis and symptom assessment); (4) effects on depressive symptoms (eg, changes in total depressive symptoms scored before and after intervention). For further information of extracted data see online supplemental tables 1, 4 and 5 .

Primary outcome

The primary outcome was the mean change in depressive symptoms in the exercise compared with the control group from baseline to postintervention. The primary outcome proposed by the authors was selected if two or more instruments were used.

Study quality assessment

Selected studies were assessed by three independent authors (NS/ LLB, DH) given an overall estimation of risk of bias (ie, low risk, some concerns or high risk) according to the revised Cochrane risk-of-bias tool for randomised trials (RoB2). 33 According to RoB2, the following domains were considered for the assessment of risk of bias: randomisation process, deviations from intended interventions, missing outcome data, measurement of the outcome and selection of the reported result (see online supplemental table 3 ).

Data-analysis

A random effects meta-analysis was calculated due to expected heterogeneity. The standardised mean difference (SMD) and 95% CIs were used as the effect size (ES) measure. The SMD was calculated using the difference from pre to post intervention 34 with correlations of 0.7 between the exercise and the control group. All results were calculated on an intention-to-treat basis. Heterogeneity was calculated using the I 2 . 35 Sensitivity analyses with pre-post-correlations of 0.6 and 0.8 were calculated to investigate changes in effect with less or more conservative values. Sensitivity analyses were further calculated excluding one study due to unequal distribution of psychotherapy among the intervention and control group and excluding studies with high risk for bias. Potential moderators (see table 1 ) of the antidepressant effects of exercise were investigated using linear meta-regression analyses for all studies and, separately, for studies including only patients with a diagnosis of MDD and/or dysthymia. Meta-regression assumptions were tested in JASP. Subgroup analyses were calculated to estimate the effect across depression classification, risk of bias, differing control conditions, intensity of exercise, exercise type, exercising in a group or individually, sample sizes and supervision (by different supervisors). We also calculated the mean difference (MD) on studies that assessed depressive symptoms using the Hamilton scale for depression or the BDI separately. Significance level was set at 0.05. 36 Publication bias was assessed with visual inspection of funnel plots and with the Begg-Mazumdar Kendall’s tau 37 and Egger bias test. 38 Whenever significant, the Duval and Tweedie Trim and Fill was applied. 39 Fail safe number of negative studies that would be required to nullify (ie, make p>0.05) the ES were calculated. 40 All analyses were performed using Comprehensive Meta-Analysis software, 38 and number needed to treat (NNT) 41 analyses were calculated using Lenhard and Lenhard 42 with the formula of converting Cohens’ d to NNT from Furukawa and Leucht. 43 Additionally, studies reporting (severe) adverse events and side effects were listed.

  • View inline

Subgroup meta-analysis of studies included in the quantitative analyses

Search results

Searching of databases yielded 15 734 studies and an additional 84 studies were identified through other sources. Following removal of duplicates, 7100 potentially eligible studies remained for which abstracts were screened. At full text stage, 207 studies were reviewed and 166 removed because they failed to meet inclusion criteria (see online supplemental table 2 for references and exclusion reasons). The remaining 41 studies were included in the review and quantitative synthesis (see figure 1 ).

  • Download figure
  • Open in new tab
  • Download powerpoint

Flowchart of study selection. Flowchart adapted from the PRISMA 2020 statement. 42 RCT, randomised controlled trial.

Study characteristics

In total, 2544 participants are included in the review and 2264 completed treatments (post-treatment n) and were included in the meta-analytical calculations, 1227 in intervention groups and 1037 in control groups. Twenty-one studies assessed depressive symptoms, 44–64 while MDD was diagnosed in 20 studies. 65–84 Percentage of females ranged from 26% to 100%, mean age from 18.8 to 87.9 years. Of the 41 included RCTs, studies originated from North and South America, Europe, Asia and Australia. See online supplemental table 1 for characteristics of selected studies (further characteristics are summarised in online supplemental tables 4 and 5 ).

Risk of bias

Risk of bias assessment revealed 12 studies to be rated of low risk of bias, 49 58 60 65 66 68–70 74 79 80 82 while 7 were rated with some concerns. 44 45 48 51 55 73 76 For 22 studies, RoB2 indicated high risk for bias. 46 47 50 52–54 56 57 59 61–64 67 71 72 75 77 78 81 83 84 For full details, see online supplemental table 3 .

Main analysis

The main analysis of pooled data from 41 studies showed a large effect favouring exercise for a pre-post-correlation of 0.7 (SMD=−0.946, 95% CI −1.18 to −0.71, p<0.001, I 2 =82.49, p<0.001; see figure 2 ). Publication bias was indicated by Begg-Mazumdar Kendall’s Tau 37 (=−0.379, p<0.001 and the Egger 38 tests (intercept=−2.706, p<0.001). However, Duval and Tweedies’ trim and fill method did not affect the effect. Fail-safe number of additional negative studies was 2789. The visual inspection of the funnel plot (see online supplemental figure 1 ) did not indicate risk of bias. Sensitivity analyses revealed a trivial change in the effect from −0.930 (95% CI −1.16 to −0.70, p<0.001, I 2 =82.032, p<0.001) for a 0.8 pre-post-correlation to −0.957 (95% CI −1.19 to −0.72, p<0.001, I 2 =82.820, p<0.001) for a 0.6 pre-post-correlation. Excluding one study 74 due to unequal distribution of psychotherapy treatments among the intervention and control group (20% vs 0%, respectively) revealed an effect of SMD=−0.938 (95% CI −1.17 to −0.70, p<0.001, I 2 =82.703, p<0.001). Excluding studies with high risk of bias (see online supplemental table 3 ) rendered a moderate effect favouring exercise intervention (SMD=−0.717, 95% CI −1.01 to −0.43, p<0.001, I 2 =82.372, p<0.001). Excluding studies with less than 6 weeks of intervention (see online supplemental table 1 ) showed a large effect favouring exercise intervention (SMD=−0.959, 95% CI −1.21 to −0.71, p<0.001, I 2 =84.132, p<0.001). I 2 is suggesting substantial heterogeneity for the analyses.

Meta-analysis of overall studies. N, preintervention n, postintervention, SMD, standardised difference.

Subgroup analyses

Subgroup analyses (summarised in table 1 ) showed that the beneficial effect of exercise on depression remained for all subgroups regarding depression classification, risk of bias, group exercise, the sample size of the trial and supervision by exercise professionals. Aerobic (SMD=−1.156) and resistance training (−1.042) as exercise types showed large effects whereas mixed aerobic and resistance training showed small effects (−0.455). Large effects were also found for studies including sample sizes in the intervention arm of less than 25 participants (SMD=−0.868 to −1.281) whereas larger samples of participants revealed moderate effects (SMD=−0.532). Subgroup analyses with health education, with light exercise interventions, or with unsupervised training only including small numbers of analysed studies showed comparable SMDs but no significant effects. Subgroup analyses with studies with low or moderate risk of bias confirm results by showing similar outcomes (see online supplemental table 6 ) as well as subgroup analyses for studies with the diagnosis of MDD and dysthymia (see online supplemental table 7 ).

Adverse events and side effects

In 10 studies, it was documented that no (serious) adverse events occurred. 52 62 63 69 73 74 80–82 Three of these studies reported minor adverse events like muscle or joint pain, headache and fatigue. 52 70 74 One study reported that adverse events occurred but did not provide further information. 79 Three studies reported few side effects like worsening of pre-existing orthopaedic injuries or admittance to psychiatric ward due to major depression. 46 66 71

Meta-regression

Meta-regression (see table 2 ) was calculated for the main analysis and MDD only. In the main analysis, duration of trial in weeks moderated the effect of exercise on depression, with shorter trials associated with larger effects (β=0.032, 95% CI 0.01 to 0.09, p=0.032, R²=0.06). For MDD only, higher antidepressant use by the control group was associated with smaller effects (β=−0.013, 95% CI −0.02 to −0.01, p=0.012, R²=0.28). A meta-regression with studies with low and moderate risk of bias (see online supplemental table 8 ) rendered a moderating effect of duration of trials overall (β=0.064, 95% CI 0.01 to 0.126, p=0.04, R²=0.12) as well as for MDD only (β=0.070, 95% CI 0.01 to 0.14, p=0.034, R²=0.26).

Meta-regression of moderators/correlates of effects of exercise on depression

Mean change and numbers needed to treat (NNT)

We found a mean change of −4.70 points (95% CI −6.25 to −3.15, p<0.001, n=685) on the HAM-D and for the BDI of −6.49 points (95% CI −8.55 to −4.42, p<0.001, n=275) as an additional improvement effect of exercise over control conditions. The calculated NNT was 2.0 (95% CI 1.68 to 2.59) for the main-analysis, and 2.8 (95% CI 1.94 to 5.22) for the low risk of bias studies. For MDD, only the NNT was 1.9 (95% CI 1.49 to 2.99) and 1.6 (95% CI 1.58 to 2.41) in supervision by other professionals/students.

This is the largest meta-analysis investigating the effects of exercise for depressive symptoms within samples with diagnosed or indicated depression. Among 41 RCTs, we found that exercise interventions had a large effect favouring exercise over control conditions. Publication bias tests indicate an underestimation of this effect. Subgroup analyses resolved several key questions that lacked clarity from previous reviews; 17–20 specifically, the positive effect of exercise remained significant regardless of risk of bias, depression classification, exercise type, group setting, type of supervision or sample size. Subgroup analyses with health education (k=3), with light exercise interventions (k=2) or with unsupervised training (k=6) showed comparable SMDs but no statistical significance, which can be attributable to the lack of power due to the small numbers of studies included in the subgroup analyses. Surprisingly, the combination of mixed aerobic and resistance training showed smaller effects than aerobic or resistance training as single interventions. We also found a decline in ES from large to moderate for studies with sample sizes in the intervention arm of 25 or more participants. Focusing on solely diagnosed MDD, significant effects of exercise were found for all subgroup analyses except for light and mixed exercise, unsupervised training and for studies with some concern for risk of bias which can again be attributable to a lack of power due to the small number of included studies in the analyses (k=2 to 3). Limiting analyses to studies with low risk of bias and some concerns according to ROB 2 reveal similar results but with ESs declining from high to moderate for most analyses (see online supplemental table 6 ). Meta-regressions indicated a moderating effect of trial duration favouring shorter interventions and remained robust in meta-regressions without studies of high risk of bias. Regarding the type of exercise, most trial arms (k=30) investigated aerobic exercise detecting large effects followed by resistance training with comparable outcomes. In terms of the exercise intensity, only two arms investigated light intensity exercise while 26 and 10 trials applied moderate and vigorous intensity respectively, with all trials evidencing large effects. Supervised exercise revealed large ESs compared with unsupervised exercise with small effects. Minimal differences were detected between group and non-group exercise interventions favouring group exercise, with both showing large effects. Intervention arms with samples sizes ≥25 revealed moderate effects (see table 1 for details).

A recent meta-analysis of Cuijpers et al 85 found a moderate ES for psychotherapy treatment for depression across all age groups (g=0.75), and also when solely including studies with low risk of bias (g=0.51); while in terms of antidepressant efficacy, Cipriani et al 86 found medication to be more effective in comparison to placebo with Odds ratio of 2.13 indicating a small ES of d=0.417. This is notable as the presented results suggest exercise to qualify as an efficacious treatment option for depressive symptoms among individuals with depression.

These results extend the findings from an earlier meta-analysis of Schuch et al 20 (based on 25 studies including 1487 participants, revealing high heterogeneity of I 2 82.10%). Notably, the present findings are based on an additional 17 studies 44 45 48 52 53 55 59 62 67–69 71 74 78 79 84 since Schuch et al ’s 20 review and 4 45 55 68 74 studies following the most recent meta-analysis by Carneiro et al , 17 comprising only 15 studies focusing on different inclusion criteria including medication in treatment and control arm conditions.

In contrast to Krogh et al , 18 the analyses including only low risk of bias studies resulted in moderate effects with wide 95% CIs ranging from −0.99 to −0.34. Of note, we used the current risk of bias tool (RoB2) and included a greater number of low risk of bias trials compared with Krogh et al ’s 18 meta-analysis (11 vs 4). To reduce risk of bias, we compared exercise treatment groups with non-exercising control groups only. From the included 35 trials in the Cochrane Meta-Analysis by Cooney et al 21 consisting of 1356 participants, they reported 63% heterogeneity for the main analyses, the current review excluded 13 of these studies as groups were labelled as either ‘controls’ (ie, they received psychotherapy or pharmacotherapy) or groups labelled as ‘exercise’ groups (ie, they received a combination of exercise and another form of therapy or no therapy at all) or participants did not meet criteria for depression (see Ref. 22 for a critical appraisal). Krogh et al 18 also included 35 trials comprising 2498 participants with high heterogeneity (I 2 =81%) of which the current review excluded 17 studies with control groups that received stretching, relaxation or compared exercise to psychotherapy, medication or combined exercise with psychotherapy. Morres et al 19 included 11 trials involving 455 patients revealing low and non-statistically significant heterogeneity (I²=21%) but focused on aerobic exercise only; however, five of these studies were excluded from the current review because they included medication, active control conditions or cognitive or counselling therapy as comparator conditions. Carneiro et al 17 included 15 studies in their meta-analysis with a total sample size of 1532 individuals focusing on pharmacological treatment, exercise treatment and combined exercise with psychotherapy, of which the current review excluded 7 studies due to the inclusion of pharmacological therapy as a comparator condition either alone or in combination with psychotherapy. A further study was also excluded because participants were offered internet guided text modules on how to become more physically active but no actual exercise intervention was administered. Carneiro et al . 17 overall reported moderate heterogeneity (I²= 33%).

This summary reveals that a notable methodological limitation based on the former published meta-analyses in this field, included a proportion of trials with questionable intervention or control group conditions, which resulted in the inability to detect the effect of exercise per se (while excluding other forms of interventions). Hence, this notable shortcoming was addressed in this current meta-analytic review. Although we explored heterogeneity with sub-analyses and meta regression, we also found similar large heterogeneity comparable to previous larger meta-analyses 18 20 21 which guarantees comparability, yet needs to be considered when interpreting the results.

Our meta-regressions indicated that shorter trials are associated with larger effects than longer trials. A possible explanation is that larger trials had more dropouts, and higher dropout rates can reduce the effect in intention to treat analyses. 87 Alternatively, it is possible that the effect wanes with the time. However, all but three studies had interventions lasting 16 weeks or less and further studies with longer follow-ups should confirm this effect. 88 Also, we have found that studies in which control groups had a higher percentage of participants taking antidepressants identified smaller effects of exercise. This is expected as the difference on the magnitude of the improvement on depressive symptoms is smaller when exercise is compared with effective treatments, such as the use of pharmacological antidepressants, or when compared with controls without any treatment. 87

Clinical implications included that if 100 people were each in the control and the exercise group, 20 participants in the control and 54 in the exercise group for the main analysis and 43 in the exercise group for the low risk of bias studies, analyses can be expected to have favourable outcomes. 89 The NNT for the main-analysis was 2, while it was 2.8 in the low risk of bias studies, 1.9 in MDD only and 1.6 in supervision by other professionals/students. This effect is comparable to recent meta-analyses with psychotherapy revealing a NNT of 2.5 for the main analyses and 3.5 in the low risk of bias studies and for medication of 4.3. 85 86 Based on a NNT of 2 for the main analyses this means that for every two people treated with exercise, it is expected at least one to have a large magnitude reduction in depressive symptoms. 43 Furthermore, exercise showed an additional declining effect over control conditions of −4.70 points in the HAM-D as a diagnostic clinician measure in 16 studies and −6.49 in the BDI in eight studies indicating a clinically meaningful reduction of depressive symptoms from moderate to mild depression. According to the NICE guidelines, a three-point change is indicated as clinically meaningful for both measures. 16

Limitations

We acknowledge that limitations lie in the high heterogeneity of the included studies that can stem from different control group conditions, cultural backgrounds, gender distribution, variable forms of assessments and diagnosis of depression severity or MDD. Notwithstanding, we have performed several subgroup analyses and meta-regressions to explore the sources of this heterogeneity. Additionally, most of the included studies comprised small sample sizes for example, 13 studies with intervention arms of ≤10 participants in each group postintervention which we addressed with subanalyses. However, studies with larger samples sizes showed smaller but still moderate effects. Some subanalyses showed non-significant results as they lacked power due to the small number of studies included. In principle, the overreliance of significance testing should be avoided and interpretation of results based on SMD and 95% CI along with p values. Mostly seen wider ranges in CIs within the analyses can stem to a large extent from smaller studies (eg, 10 studies with n<10) and small number of studies in the subanalyses (especially less than k=10) which brings some uncertainty pertaining to the true effect. However, for the main analyses, 95% CIs were documented for exercise conditions comprising moderate intensity, aerobic exercise, group exercise and supervised exercise (ranging between 26 and 41 included studies), thus indicating moderate to large effects even for the lower limits. These outcomes provide adequate evidence to support the recommendation that exercise has utility in treating depression based on the aforementioned conditions. Long-term effects could not be investigated due to missing follow-up data for most studies. Moreover, it was not possible to control for placebo effects due to the nature of the interventions. Furthermore, 6 out of the 41 included studies were published prior to 2001 and can therefore be assigned to the pre-CONSORT era. This means that these earlier trials might not reflect the current standards and/or feature incomplete reporting of methodological details that was introduced with the CONSORT guidelines and checklist, therefore increasing scope in biased risk assessments and heterogeneity. 22 90

Further steps need to be undertaken to consider exercise as a first-line treatment for depression alongside psychotherapy and medication, including conducting non-inferiority trials to demonstrate that exercise is non-inferior to current first line treatments, and evidence that exercise is effective at long-term follow-ups. Future large-scale research studies should also investigate which patients benefit most from which exercise condition and identify any groups for whom exercise might not be the optimal treatment choice. It is noteworthy that the studies included in the current and former reviews consisted of samples which met the trial inclusion criteria comprising individuals that were willing, motivated and physically able to take part in the exercise regimen (eg, assessed by the Physical Activity Readiness Questionnaire 91 ) and excluded individuals with diagnoses that exercise may pose a risk (for example, individuals with cardiovascular diseases that require physician guidance to undertake exercise). Further, adverse events and outcomes due to exercise may occur in rare instances (nevertheless, they should be reported which was not documented for the majority of studies in this review), and not everyone has access to any form of exercise or exercise with the needed quality (eg, with a former sport medical examination). It is also noteworthy that the included studies were mainly conducted in high-income and upper-middle income countries, for example, no study was identified from the African continent. Future study designs should consider these relevant points including motivational aspects of attendance and samples from developing countries or rural areas to increase the generalisability of the results for healthcare.

Further strengthening the evidence base for exercise also has utility as it may be a less stigmatising treatment option for depressed individuals who may be reluctant to seek and adhere to psychotherapy and/or medication.

The findings from this review represent the most up to date and comprehensive meta-analysis of the available evidence and further supports the use of exercise focusing specifically on supervised and group exercise with moderate intensity and aerobic exercise regimes. This offers a further evidence-based treatment option for the large amount of untreated individuals with depression, including individuals who refuse or cannot tolerate medication and/or psychotherapy. However, given the high heterogeneity and mainly small and selected samples of the included studies, this requires individual decisions involving the treating physician to determine if and which conditions of exercise are the optimal treatment of choice while also recognising the potential synergistic effects of exercise in managing both physical and mental well-being. Updated guidelines as well as routine clinical decisions regarding interventions for treating depression should consider the current findings. This is particularly timely, following the post COVID-19 pandemic, given that rates of depression have continued to increase worldwide.

Ethics statements

Patient consent for publication.

Not applicable.

  • Andrade LH ,
  • Hwang I , et al
  • World Health Organization
  • Stevenson J ,
  • Lazuras L , et al
  • González-Sanguino C ,
  • Castellanos MA , et al
  • Veldhuis CB ,
  • Nesoff ED ,
  • McKowen ALW , et al
  • Santomauro DF ,
  • Mantilla Herrera AM ,
  • Pfefferbaum B ,
  • Bassett D , et al
  • Cuijpers P ,
  • Karyotaki E , et al
  • Jakobsen JC ,
  • Katakam KK ,
  • Schou A , et al
  • Thornicroft G ,
  • Chatterji S ,
  • Evans-Lacko S , et al
  • National Institute for Health and Care Excellence
  • Carneiro L ,
  • Vasoncelos-Raposo J , et al
  • Hjorthøj C ,
  • Speyer H , et al
  • Morres ID ,
  • Hatzigeorgiadis A ,
  • Stathi A , et al
  • Schuch FB ,
  • Vancampfort D ,
  • Richards J , et al
  • Cooney GM ,
  • Greig CA , et al
  • Ekkekakis P
  • Liberati A ,
  • Tetzlaff J , et al
  • McKenzie JE ,
  • Bossuyt PM , et al
  • Ardern CL ,
  • Büttner F ,
  • Andrade R , et al
  • Spitzer RL ,
  • Endicott J ,
  • American Psychiatric Association
  • Mendelson M , et al
  • Caspersen CJ ,
  • Powell KE ,
  • Christenson GM
  • Etnier J , et al
  • Sterne JAC ,
  • Savović J ,
  • Page MJ , et al
  • Higgins JPT ,
  • Thompson SG ,
  • Deeks JJ , et al
  • Wetterslev J ,
  • Winkel P , et al
  • Davey Smith G ,
  • Schneider M , et al
  • Rosenthal R ,
  • Borenstein M ,
  • Higgins J , et al
  • Lenhard W ,
  • Furukawa TA ,
  • Mather AS ,
  • Rodriguez C ,
  • Guthrie MF , et al
  • Abdelbasset WK ,
  • Alqahtani BA ,
  • Elshehawy AA , et al
  • Brenes GA ,
  • Williamson JD ,
  • Messier SP , et al
  • Ossip-Klein DJ ,
  • Bowman ED , et al
  • Qi H , et al
  • Hallgren M ,
  • Kraepelien M ,
  • Öjehagen A , et al
  • Hemat-Far A ,
  • Shahsavari A ,
  • Tsai YH , et al
  • McNeil JK ,
  • LeBlanc EM ,
  • Makizako H ,
  • Tsutsumimoto K ,
  • Doi T , et al
  • Mutrie N , et al
  • Nabkasorn C ,
  • Sootmongkol A , et al
  • Prakhinkit S ,
  • Suppapitiporn S ,
  • Tanaka H , et al
  • Shahidi M ,
  • Mojtahed A ,
  • Modabbernia A , et al
  • Davidson S , et al
  • Taylor N , et al
  • Williams CL ,
  • Alfonso H ,
  • Newton RU , et al
  • Blumenthal JA ,
  • Babyak MA ,
  • Doraiswamy PM , et al
  • Carneiro LSF ,
  • Fonseca AM ,
  • Vieira-Coelho MA , et al
  • Wong EYW , et al
  • Cheung LK ,
  • Danielsson L ,
  • Papoulias I ,
  • Petersson EL , et al
  • Ziegenbein M ,
  • Hoos O , et al
  • Dunbar SB ,
  • Higgins MK , et al
  • La Rocque CL ,
  • Mazurka R ,
  • Stuckless TJR , et al
  • Mota-Pereira J ,
  • Silverio J ,
  • Carvalho S , et al
  • Oertel-Knöchel V ,
  • Thiel C , et al
  • Hardoy MC , et al
  • Govindan R ,
  • Muralidharan K
  • Schneider KL ,
  • Handschin B , et al
  • Vasconcelos-Moreno MP ,
  • Borowsky C , et al
  • Clements KM ,
  • Fiatarone MA
  • Stavrinos TM ,
  • Scarbek Y , et al
  • Le Fevre K ,
  • Pantelis C , et al
  • Vieira JLL ,
  • Rocha PGM da
  • Karyotaki E ,
  • Eckshtain D , et al
  • Cipriani A ,
  • Salanti G , et al
  • Rosenbaum S , et al
  • Magnusson K
  • Lepage L , et al
  • Marshall AL ,
  • Sjöström M , et al

Supplementary materials

Supplementary data.

This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

  • Data supplement 1

Twitter @AndreasHeissel, @LuisaBrokmeier, @MariaKangas88, @davyvancampfort, @PhilWardAu, @simon_rosenbaum, @SchuchFelipe

Correction notice This article has been corrected since it published Online First. The article type has been changed to systematic review.

Contributors AH and FS conceived and designed the study. AH, LLB and FS had full access to all data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. NS/LLB and DH did the literature search. AH, LLB, NS, DH and FS conducted the analyses, interpreted the data and wrote the first draft of the manuscript. All authors contributed to critical revision of the report for important intellectual content.

Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

Competing interests AH is founder and CEO of the Centre for Emotional Health Germany GmbH supported by the Potsdam Transfer Centre from the University of Potsdam. BS has an NIHR Advanced fellowship (NIHR-301206, 2021–2026) and is coinvestigator on an NIHR program grant: Supporting Physical and Activity through Co-production in people with Severe Mental Illness (SPACES). BS is on the Editorial board of Mental Health and Physical Activity and The Brazilian Journal of Psychiatry. BS has received honorarium from a coedited book on exercise and mental illness and advisory work from ASICS for unrelated work. MK is on the Editorial boards for Behavior Therapy (Associate Editor), Stress and Health (Sections Editor), Psychological Bulletin, and Behaviour Research and Therapy. JF is supported by a UK Research and Innovation Future Leaders Fellowship (MR/T021780/1) and has received honoraria / consultancy fees from Atheneum, Informa, Gillian Kenny Associates, Big Health, Wood For Trees, Nutritional Medicine Institute, Angelini, ParachuteBH, Richmond Foundation and Nirakara, independent of this work. FS is on the Editorial board of Mental Health and Physical Activity, The Brazilian Journal of Psychiatry and Journal Brasileiro de Psiquiatria. FS has received honorarium from a co-edited book on lifestyle and mental illness. The other authors declare no funding, editorial or potential competing interests.

Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

Provenance and peer review Not commissioned; externally peer reviewed.

Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.

Read the full text or download the PDF:

COMMENTS

  1. Regression Analysis

    Regression analysis is a quantitative research method which is used when the study involves modelling and analysing several variables, where the relationship includes a dependent variable and one or more independent variables. In simple terms, regression analysis is a quantitative method used to test the nature of relationships between a dependent variable and one or more independent variables.

  2. Regression and Correlation

    Quantitative Research Methods. Correlation is the relationship or association between two variables. There are multiple ways to measure correlation, but the most common is Pearson's correlation coefficient (r), which tells you the strength of the linear relationship between two variables. The value of r has a range of -1 to 1 (0 indicates no ...

  3. Regression Analysis

    Marketing Research: Regression analysis helps marketers understand consumer behavior and make data-driven decisions. It can be used to predict sales based on advertising expenditures, pricing strategies, or demographic variables. ... Disadvantages of Regression Analysis; Provides a quantitative measure of the relationship between variables ...

  4. Introduction to Research Statistical Analysis: An Overview of the

    Introduction. Statistical analysis is necessary for any research project seeking to make quantitative conclusions. The following is a primer for research-based statistical analysis. It is intended to be a high-level overview of appropriate statistical testing, while not diving too deep into any specific methodology.

  5. Regression Analysis: The Complete Guide

    Regression analysis is a statistical method. It's used for analyzing different factors that might influence an objective - such as the success of a product launch, business growth, a new marketing campaign - and determining which factors are important and which ones can be ignored.

  6. The clinician's guide to interpreting a regression analysis

    Regression analysis is an important statistical method that is commonly used to determine the relationship between several factors ... Logistic regression in medical research. Anesth Analg. 2021 ...

  7. Sage Research Methods Foundations

    Regression analysis is a useful tool not only for highly sophisticated and seemingly esoteric academic applications but also for basic descriptions of data covering many societal contexts. Across applications, however, the strengths and limitations of regression modeling approaches should be kept front and center.

  8. Sage Research Methods

    Subject index. Understanding Regression Analysis: An Introductory Guide presents the fundamentals of regression analysis, from its meaning to uses, in a concise, easy-to-read, and non-technical style. It illustrates how regression coefficients are estimated, interpreted, and used in a variety of settings within the social sciences, business ...

  9. Quantitative Data Analysis Methods & Techniques 101

    Quantitative data analysis is one of those things that often strikes fear in students. It's totally understandable - quantitative analysis is a complex topic, full of daunting lingo, like medians, modes, correlation and regression.Suddenly we're all wishing we'd paid a little more attention in math class…. The good news is that while quantitative data analysis is a mammoth topic ...

  10. Regression Analysis

    Regression analysis is a statistical method for analyzing a relationship between two or more variables in such a manner that one of the variables can be predicted or explained by the information on the other variables. The term "regression" was first introduced by Sir Francis Galton in the late 1800s to explain the relation between heights ...

  11. Regression Analysis: Definition, Types, Usage & Advantages

    Overall, regression analysis saves the survey researchers' additional efforts in arranging several independent variables in tables and testing or calculating their effect on a dependent variable. Different types of analytical research methods are widely used to evaluate new business ideas and make informed decisions.

  12. Introduction to Multivariate Regression Analysis

    These questions can in principle be answered by multiple linear regression analysis. In the multiple linear regression model, Y has normal distribution with mean. The model parameters β 0 + β 1 + +β ρ and σ must be estimated from data. β 0 = intercept. β 1 β ρ = regression coefficients.

  13. Regression Analysis

    Multiple linear regression analysis is essentially similar to the simple linear model, with the exception that multiple independent variables are used in the model. The mathematical representation of multiple linear regression is: Y = a + b X1 + c X2 + d X3 + ϵ. Where: Y - Dependent variable. X1, X2, X3 - Independent (explanatory) variables.

  14. What Is Quantitative Research? An Overview and Guidelines

    The necessity, importance, relevance, and urgency of quantitative research are articulated, establishing a strong foundation for the subsequent discussion, which delineates the scope, objectivity, goals, data, and methods that distinguish quantitative research, alongside a balanced inspection of its strengths and shortcomings, particularly in ...

  15. Regression Analysis: A Comprehensive Guide to Quantitative Forecasting

    23 November 2023. Regression analysis stands as a cornerstone in the realm of quantitative forecasting, offering an extensive suite of methods for researchers and analysts who seek to understand and predict relationships among variables. It is an indispensable statistical tool that aids decision-making across fields as varied as economics ...

  16. 17 Quantitative Analysis with SPSS: Bivariate Regression

    Assuming that the relationship of interest is appropriate for linear regression, the regression can be produced by going to Analyze → Regression → Linear [2] (Alt+A, Alt+R, Alt+L). The dependent variable is placed in the Dependent box; the independent in the "Block 1 of 1" box. Under Statistics, be sure both Estimates and Model fit are ...

  17. A Practical Guide to Writing Quantitative and Qualitative Research

    In quantitative research, hypotheses predict the expected relationships among variables.15 Relationships among variables that can be predicted include 1) between a single dependent variable and a single independent variable ... A logistic regression analysis was performed for employment status, ...

  18. Association of Efficacy of Resistance Exercise Training ...

    Key Points. Question What is the overall association of efficacy of resistance exercise training with depressive symptoms, and which logical, theoretical, and/or prior empirical variables are associated with depressive symptoms?. Findings In this meta-analysis of 33 clinical trials including 1877 participants, resistance exercise training was associated with a significant reduction in ...

  19. Quantitative assessment of machine learning ...

    Risk Analysis is an international journal publishing new developments, empirical research and commentaries on a wide range of topics in the field of risk analysis. Abstract Advances in machine learning (ML) have led to applications in safety-critical domains, including security, defense, and healthcare. ... fewer consider quantitative ...

  20. Comparison of Maximum Likelihood Estimation Approach and Regression

    Maximum likelihood estimation was found to be more power than regression and could also estimate the distance between markers and QTLs. Test statistic for the relationship between simple regression and maximum likelihood estimation is introduced. A total of four major QTLs linked to random amplified polymorphic.

  21. Mitigating urban heat island through neighboring rural land cover

    The quantitative influence of rural land cover on the Urban Heat Island Intensity (UHII) was analyzed as shown in Fig. 1, using the regression model depicted in Methods. Two important parameters ...

  22. Meta-analysis of the accuracy of the serum procalcitonin diagnostic

    Result. A total of 5 investigations were included. Of these, 148 children with osteomyelitis were tested for bacterial cultures in PCT. For PCT in the diagnosis of pediatric osteomyelitis, diagnostic meta-analysis revealed a pooled sensitivity and specificity of 0.58 (95% confidence interval (CI): 0.49 to 0.68) and 0.92 (95% CI: 0.90 to 0.93) respectively.

  23. British Journal of Sports Medicine

    Moved Permanently. The document has moved here.