Data Analysis in Research: Types & Methods
What is data analysis in research?
Definition of data analysis in research: According to LeCompte and Schensul, research data analysis is a process used by researchers to reduce data to a story and interpret it to derive insights. The data analysis process helps reduce a large body of data into smaller fragments that make sense.
Three essential things occur during the data analysis process. The first is data organization. The second is data reduction, achieved through summarization and categorization, which helps identify and link patterns and themes in the data. The third is the analysis itself, which researchers conduct in both top-down and bottom-up fashion.
On the other hand, Marshall and Rossman describe data analysis as a messy, ambiguous, and time-consuming but creative and fascinating process through which a mass of collected data is brought to order, structure and meaning.
We can say that “data analysis and data interpretation constitute a process representing the application of deductive and inductive logic to the research.”
Why analyze data in research?
Researchers rely heavily on data, as they have a story to tell or research problems to solve. It starts with a question, and data is nothing but an answer to that question. But what if there is no question to ask? It is still possible to explore data without a problem; we call it ‘data mining,’ which often reveals interesting patterns within the data that are worth exploring.
Regardless of the type of data researchers explore, their mission and their audience’s vision guide them to find the patterns that shape the story they want to tell. One of the essential things expected from researchers while analyzing data is to stay open and remain unbiased toward unexpected patterns, expressions, and results. Remember, data analysis sometimes tells the most unforeseen yet exciting stories that were not expected when the analysis began. Therefore, rely on the data you have at hand and enjoy the journey of exploratory research.
Types of data in research
Every kind of data has the quality of describing things once a specific value is assigned to it. For analysis, you need to organize these values and have them processed and presented in a given context to make them useful. Data can come in different forms; here are the primary data types.
- Qualitative data: When the data presented has words and descriptions, we call it qualitative data. Although you can observe this data, it is subjective and harder to analyze, especially for comparison. Example: anything describing taste, experience, texture, or an opinion is considered qualitative data. This type of data is usually collected through focus groups, personal qualitative interviews, qualitative observation, or open-ended questions in surveys.
- Quantitative data: Any data expressed in numbers or numerical figures is called quantitative data. This type of data can be distinguished into categories, grouped, measured, calculated, or ranked. Example: questions about age, rank, cost, length, weight, scores, etc. all produce this type of data. You can present such data in graphical formats or charts, or apply statistical analysis methods to it. Outcomes Measurement Systems (OMS) questionnaires in surveys are a significant source of numeric data.
- Categorical data: It is data presented in groups. However, an item included in categorical data cannot belong to more than one group. Example: a person responding to a survey by describing their living style, marital status, smoking habit, or drinking habit provides categorical data. A chi-square test is a standard method used to analyze this data.
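As a minimal sketch (the response counts below are invented, not from any real survey), a chi-square test of independence between two categorical variables can be run in Python:

```python
from scipy.stats import chi2_contingency

# Hypothetical counts: rows = marital status, columns = smoking habit
observed = [[90, 60],   # married: non-smoker, smoker
            [70, 80]]   # single:  non-smoker, smoker

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}, dof = {dof}")
# A small p-value suggests the two categorical variables are associated.
```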
Data analysis in qualitative research
Data analysis in qualitative research works a little differently than with numerical data, as qualitative data is made up of words, descriptions, images, objects, and sometimes symbols. Getting insights from such complex information is a challenging process; hence, it is typically reserved for exploratory research and data analysis.
Finding patterns in the qualitative data
Although there are several ways to find patterns in textual information, a word-based method is the most relied-upon and widely used technique for research and data analysis. Notably, the data analysis process in qualitative research is largely manual: researchers usually read the available data and identify repetitive or commonly used words.
For example, while studying data collected from African countries to understand the most pressing issues people face, researchers might find “food” and “hunger” are the most commonly used words and will highlight them for further analysis.
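A minimal sketch of such word-frequency counting in Python (the responses here are invented placeholders):

```python
from collections import Counter
import re

# Hypothetical open-ended survey responses
responses = [
    "Food prices keep rising and hunger is widespread",
    "Lack of food and clean water is the biggest problem",
    "Hunger affects children the most",
]

# Tokenize, lowercase, and count word occurrences across all responses
words = re.findall(r"[a-z]+", " ".join(responses).lower())
print(Counter(words).most_common(5))  # e.g., [('food', 2), ('hunger', 2), ...]
```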
The keyword context is another widely used word-based technique. In this method, the researcher tries to understand the concept by analyzing the context in which the participants use a particular keyword.
For example , researchers conducting research and data analysis for studying the concept of ‘diabetes’ amongst respondents might analyze the context of when and how the respondent has used or referred to the word ‘diabetes.’
The scrutiny-based technique is another highly recommended text analysis method used to identify patterns in qualitative data. Compare-and-contrast is the most widely used method under this technique, differentiating how specific texts are similar to or different from one another.
For example: to assess the “importance of a resident doctor in a company,” the collected data is divided into people who think it is necessary to hire a resident doctor and those who think it is unnecessary. Compare-and-contrast is the best method for analyzing polls with single-answer question types.
Metaphors can be used to reduce the data pile and find patterns in it so that it becomes easier to connect data with theory.
Variable Partitioning is another technique used to split variables so that researchers can find more coherent descriptions and explanations from the enormous data.
Methods used for data analysis in qualitative research
There are several techniques for analyzing the data in qualitative research, but here are some commonly used methods:
- Content Analysis: It is widely accepted and the most frequently employed technique for data analysis in research methodology. It can be used to analyze documented information in the form of text, images, and sometimes physical items. When and where to use this method depends on the research questions.
- Narrative Analysis: This method is used to analyze content gathered from various sources, such as personal interviews, field observations, and surveys. Most of the time, the stories or opinions shared by people focus on finding answers to the research questions.
- Discourse Analysis: Similar to narrative analysis, discourse analysis is used to analyze the interactions with people. Nevertheless, this particular method considers the social context under which or within which the communication between the researcher and respondent takes place. In addition to that, discourse analysis also focuses on the lifestyle and day-to-day environment while deriving any conclusion.
- Grounded Theory: When you want to explain why a particular phenomenon happened, using grounded theory to analyze qualitative data is the best resort. Grounded theory is applied to study data about a host of similar cases occurring in different settings. When researchers use this method, they may alter explanations or produce new ones until they arrive at a conclusion.
Choosing the right software can be tough. Whether you’re a researcher, business leader, or marketer, check out the top 10 qualitative data analysis software tools.
Data analysis in quantitative research
Preparing data for analysis
The first stage in research and data analysis is to prepare the data for analysis so that the nominal data can be converted into something meaningful. Data preparation consists of the phases below.
Phase I: Data Validation
Data validation is done to understand whether the collected data sample meets the pre-set standards or is a biased sample. It is divided into four different stages:
- Fraud: To ensure an actual human being records each response to the survey or the questionnaire
- Screening: To make sure each participant or respondent is selected or chosen in compliance with the research criteria
- Procedure: To ensure ethical standards were maintained while collecting the data sample
- Completeness: To ensure that the respondent answered all the questions in an online survey or, for interviews, that the interviewer asked all the questions devised in the questionnaire
Phase II: Data Editing
More often than not, an extensive research data sample comes loaded with errors. Respondents sometimes fill in fields incorrectly or skip them accidentally. Data editing is a process wherein the researchers confirm that the provided data is free of such errors. They need to conduct the necessary consistency and outlier checks to edit the raw data and make it ready for analysis.
Phase III: Data Coding
Out of all three, this is the most critical phase of data preparation, associated with grouping and assigning values to the survey responses. If a survey is completed with a sample size of 1,000, the researcher might create age brackets to distinguish respondents based on their age. It then becomes easier to analyze small data buckets rather than deal with the massive data pile.
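As a sketch of this kind of coding step (the ages and bracket labels below are hypothetical), pandas can assign each respondent to an age bracket:

```python
import pandas as pd

# Hypothetical respondent ages from a survey
df = pd.DataFrame({"age": [19, 24, 37, 45, 52, 61, 68, 73]})

# Code raw ages into brackets so responses can be analyzed per bucket
df["age_group"] = pd.cut(df["age"],
                         bins=[0, 25, 40, 60, 120],
                         labels=["18-25", "26-40", "41-60", "60+"])
print(df["age_group"].value_counts())
```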
Methods used for data analysis in quantitative research
After the data is prepared for analysis, researchers can apply different research and data analysis methods to derive meaningful insights. Statistical analysis plans are certainly the most favored way to analyze numerical data. In statistical analysis, distinguishing between categorical data and numerical data is essential, as categorical data involves distinct categories or labels, while numerical data consists of measurable quantities. The methods fall into two groups: ‘descriptive statistics,’ used to describe the data, and ‘inferential statistics,’ which help compare the data.
Descriptive statistics
This method is used to describe the basic features of various types of data in research. It presents the data in such a meaningful way that patterns in the data start making sense. Nevertheless, descriptive analysis does not allow conclusions to be drawn beyond the data at hand; any conclusions are based on the hypotheses researchers have formulated so far. Here are a few major types of descriptive analysis methods.
Measures of Frequency
- Count, Percent, Frequency
- It is used to denote how often a particular event occurs.
- Researchers use it when they want to showcase how often a response is given.
Measures of Central Tendency
- Mean, Median, Mode
- The method is widely used to demonstrate the central point of a distribution.
- Researchers use this method when they want to showcase the most common or the average response.
Measures of Dispersion or Variation
- Range, Variance, Standard deviation
- These measures capture the high and low points of the data (range) and how far observed scores fall from the mean (variance and standard deviation).
- They are used to identify the spread of scores by stating intervals.
- Researchers use this method to showcase how spread out the data is; the extent of the spread directly affects how representative the mean is.
Measures of Position
- Percentile ranks, Quartile ranks
- It relies on standardized scores, helping researchers identify the relationship between different scores.
- It is often used when researchers want to compare scores with the average count.
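To make these measures concrete, here is a minimal Python sketch (the scores are invented) computing frequency, central tendency, dispersion, and position measures:

```python
import numpy as np
from statistics import mode

scores = [4, 5, 5, 6, 7, 7, 7, 8, 9, 10]  # hypothetical survey scores

print("mean:", np.mean(scores))             # central tendency
print("median:", np.median(scores))
print("mode:", mode(scores))
print("range:", max(scores) - min(scores))  # dispersion
print("variance:", np.var(scores, ddof=1))
print("std dev:", np.std(scores, ddof=1))
print("quartiles:", np.percentile(scores, [25, 50, 75]))  # position
```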
For quantitative research, descriptive analysis often gives absolute numbers, but those numbers alone are never sufficient to demonstrate the rationale behind them without more in-depth analysis. Nevertheless, it is necessary to think about the research and data analysis method best suiting your survey questionnaire and the story researchers want to tell. For example, the mean is the best way to demonstrate students’ average scores in schools. It is better to rely on descriptive statistics when the researchers intend to keep the research or outcome limited to the provided sample without generalizing it: for example, when you want to compare the average votes cast in two different cities, descriptive statistics are enough.
Descriptive analysis is also called a ‘univariate analysis’ since it is commonly used to analyze a single variable.
Inferential statistics
Inferential statistics are used to make predictions about a larger population after research and data analysis of a representative sample collected from that population. For example, you can ask some 100 audience members at a movie theater if they like the movie they are watching. Researchers then use inferential statistics on the collected sample to reason that about 80-90% of people like the movie.
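A minimal sketch of that kind of inference, using a normal approximation to put a confidence interval around the sample proportion (the counts are invented):

```python
import math

n, liked = 100, 85   # hypothetical: 85 of 100 viewers liked the movie
p_hat = liked / n    # sample proportion

# 95% confidence interval via the normal approximation
se = math.sqrt(p_hat * (1 - p_hat) / n)
lo, hi = p_hat - 1.96 * se, p_hat + 1.96 * se
print(f"Estimated share who like the movie: {p_hat:.0%} "
      f"(95% CI {lo:.0%} to {hi:.0%})")
```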
Here are two significant areas of inferential statistics.
- Estimating parameters: It takes statistics from the sample research data and demonstrates something about the population parameter.
- Hypothesis test: It’s about sampling research data to answer the survey research questions. For example, researchers might be interested in understanding whether a newly launched shade of lipstick is good or not, or whether multivitamin capsules help children perform better at games.
These are sophisticated analysis methods used to showcase the relationship between different variables instead of describing a single variable. It is often used when researchers want something beyond absolute numbers to understand the relationship between variables.
Here are some of the commonly used methods for data analysis in research.
- Correlation: When researchers are not conducting experimental or quasi-experimental research but are interested in understanding the relationship between two or more variables, they opt for correlational research methods.
- Cross-tabulation: Also called contingency tables, cross-tabulation is used to analyze the relationship between multiple variables. Suppose the provided data has age and gender categories presented in rows and columns. A two-dimensional cross-tabulation supports seamless data analysis and research by showing the number of males and females in each age category (a short code sketch follows this list).
- Regression analysis: For understanding the strength of the relationship between two variables, researchers rarely look beyond regression analysis, the primary and most commonly used method, which is also a type of predictive analysis. In this method, you have an essential factor called the dependent variable, along with one or more independent variables. You undertake efforts to find out the impact of the independent variables on the dependent variable. The values of both independent and dependent variables are assumed to be ascertained in an error-free, random manner.
- Frequency tables: A frequency table summarizes how often each value or response occurs in the data. It is a simple way to spot the most and least common categories before applying more advanced statistical tests.
- Analysis of variance: This statistical procedure is used to test the degree to which two or more groups vary or differ in an experiment. A considerable degree of variation means the research findings were significant. In many contexts, ANOVA testing and variance analysis are similar.
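To make a couple of these concrete, here is a minimal sketch (invented data; the column names such as ad_spend and sales are hypothetical) of a cross-tabulation and a correlation in Python:

```python
import pandas as pd
from scipy.stats import pearsonr

# Hypothetical survey data
df = pd.DataFrame({
    "gender":    ["M", "F", "F", "M", "F", "M", "F", "M"],
    "age_group": ["18-25", "18-25", "26-40", "26-40",
                  "41-60", "41-60", "60+", "60+"],
    "ad_spend":  [10, 12, 15, 17, 20, 22, 25, 27],
    "sales":     [30, 33, 40, 44, 50, 55, 60, 66],
})

# Cross-tabulation: number of males and females in each age category
print(pd.crosstab(df["age_group"], df["gender"]))

# Correlation between two quantitative variables
r, p = pearsonr(df["ad_spend"], df["sales"])
print(f"r = {r:.2f}, p = {p:.4f}")
```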
Considerations in research data analysis
- Researchers must have the necessary research skills to analyze and manipulate the data, and should be trained to demonstrate a high standard of research practice. Ideally, researchers should possess more than a basic understanding of the rationale for selecting one statistical method over another to obtain better data insights.
- Usually, research and data analytics projects differ by scientific discipline; therefore, getting statistical advice at the beginning of analysis helps design a survey questionnaire, select data collection methods , and choose samples.
- The primary aim of data research and analysis is to derive ultimate insights that are unbiased. Any mistake in, or bias while, collecting data, selecting an analysis method, or choosing an audience sample is likely to draw a biased inference.
- No degree of sophistication in research data analysis can rectify poorly defined objectives or outcome measurements. Whether the design is at fault or the intentions are not clear, a lack of clarity might mislead readers, so avoid this practice.
- The motive behind data analysis in research is to present accurate and reliable data. As far as possible, avoid statistical errors, and find ways to deal with everyday challenges like outliers, missing data, data alteration, data mining, or developing graphical representations.
The sheer amount of data generated daily is staggering, especially now that data analysis has taken center stage. In 2018, the total data supply amounted to 2.8 trillion gigabytes. Hence, it is clear that enterprises willing to survive in the hypercompetitive world must possess an excellent capability to analyze complex research data, derive actionable insights, and adapt to new market needs.
QuestionPro is an online survey platform that empowers organizations in data analysis and research and provides them with a medium to collect data by creating appealing surveys.
Comprehensive guidelines for appropriate statistical analysis methods in research
Jonghae Kim, Dong Hyuck Kim, Sang Gyu Kwak
Corresponding author: Sang Gyu Kwak, Ph.D. Department of Medical Statistics, Daegu Catholic University School of Medicine, 33 Duryugongwon-ro 17-gil, Nam-gu, Daegu, 42472, Korea Tel: +82-53-650-4724 Fax: +82-53-650-4517 Email: [email protected]
Received 2024 Jan 7; Revised 2024 May 24; Accepted 2024 Jul 11; Issue date 2024 Oct.
This is an open-access article distributed under the terms of the Creative Commons Attribution Non-Commercial License ( https://creativecommons.org/licenses/by-nc/4.0/ ) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
The selection of statistical analysis methods in research is a critical and nuanced task that requires a scientific and rational approach. Aligning the chosen method with the specifics of the research design and hypothesis is paramount, as it can significantly impact the reliability and quality of the research outcomes.
This study explores a comprehensive guideline for systematically choosing appropriate statistical analysis methods, with a particular focus on the statistical hypothesis testing stage and categorization of variables. By providing a detailed examination of these aspects, this study aims to provide researchers with a solid foundation for informed methodological decision making. Moving beyond theoretical considerations, this study delves into the practical realm by examining the null and alternative hypotheses tailored to specific statistical methods of analysis. The dynamic relationship between these hypotheses and statistical methods is thoroughly explored, and a carefully crafted flowchart for selecting the statistical analysis method is proposed.
Based on the flowchart, we examined whether exemplary research papers appropriately used statistical methods that align with the variables chosen and hypotheses built for the research. This iterative process ensures the adaptability and relevance of this flowchart across diverse research contexts, contributing to both theoretical insights and tangible tools for methodological decision-making.
Conclusions
This study emphasizes the importance of a scientific and rational approach for the selection of statistical analysis methods. By providing comprehensive guidelines, insights into the null and alternative hypotheses, and a practical flowchart, this study aims to empower researchers and enhance the overall quality and reliability of scientific studies.
Keywords: Algorithms, Biostatistics, Data analysis, Guideline, Statistical data interpretation, Statistical model
Introduction
The nuanced process of selecting appropriate statistical analysis methods has emerged as a pivotal and multifaceted challenge in navigating the dynamic and ever-evolving landscape of modern scientific research. The proliferation of data and increasing complexity of research questions highlight the need for a thorough and comprehensive understanding of statistical methodology. At the heart of this complex effort is the recognition that statistical analyses play a central role in shaping and defining the integrity and interpretability of research results.
Choosing the appropriate statistical analysis method is not a one-size-fits-all task. Rather, it is a dynamic process influenced by the complex interactions between different datasets and the complexities inherent in the research hypothesis. As researchers grapple with a wide range of methodological choices, the importance of each decision becomes increasingly clear and the far-reaching implications for the validity and reliability of research findings grow. The multifaceted nature of statistical methodology becomes apparent when we consider the number of variables that are more than simple numbers and the complexities inherent in the experimental design, sample characteristics, and underlying assumptions of each statistical approach. Each of these factors contributes to the complex decision-making process that researchers must navigate, requiring a nuanced understanding and careful consideration of the unique needs that arise in each research endeavor. The interdependence of the data and research hypotheses further complicates this situation. A deep understanding of statistical analysis tools and a keen understanding of how these tools interact with specific research questions are therefore essential.
As statistical analysis methods and research hypotheses interact, the chosen statistical approach must accurately reflect the characteristics of the data and objectives of the study. In response to this complexity, researchers must develop a methodological sophistication that extends beyond the simple application of statistical techniques. Rather, a keen awareness of the assumptions, hypotheses, strengths, and limitations inherent to each method is necessary. Navigating this environment therefore involves not only choosing a statistical tool but also understanding why a particular tool is appropriate for a given situation. Efforts to guide researchers through this complex process require thus more than an explanation of the statistical techniques.
This study presents a detailed description of the sequential steps for statistical hypothesis testing. It also includes an explanation of the variable types, an exploration of different statistical hypothesis tests, and a careful examination of important considerations when choosing a statistical analysis method. We also introduce a structured flowchart designed to serve as a practical tool for researchers to navigate through the various methodological options.
The final goal of this study is to improve methodological precision by facilitating researchers’ understanding of a comprehensive algorithm for choosing statistical methods based on the variables of interest and research hypotheses. By exploring the complexities of statistical analysis, we aim to provide researchers with the insights and resources needed to delve into their scientific inquiries with confidence, methodological rigor, and an unwavering commitment to advancing knowledge.
Materials and Methods
Statistical hypothesis testing
Statistical hypothesis testing is a structured process involving five key steps [ 1 ]. First, the hypothesis is formulated, then the significance level is established, the test statistic is calculated, the rejection area or significance probability (P value) is determined, and conclusions are drawn. In the conclusion stage, if the test statistic falls outside the rejection area, or the P value is greater than the predetermined significance level, “the null hypothesis cannot be rejected at the predetermined significance level.” Conversely, if the test statistic falls within the rejection area, or if the P value is less than the predetermined significance level, “the null hypothesis is rejected at the significance level.” In this case, conclusions are drawn and interpreted in alignment with the alternative hypothesis rather than the null hypothesis. For example, in a statistical hypothesis test where the significance level is set at 0.05 and the calculated significance probability is 0.002, the null hypothesis is rejected. Similarly, in a statistical hypothesis test where the significance level is set at 0.1 and the calculated significance probability is 0.07, the null hypothesis is also rejected. The conclusion is based on the content of the alternative hypothesis. This process provides a systematic framework for researchers to rigorously evaluate hypotheses and draw meaningful conclusions based on statistical evidence. The decisive stages of hypothesis testing serve as a robust foundation for deriving insights into the underlying dynamics of null and alternative hypotheses and contribute to the integrity and reliability of the research outcomes.
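As a minimal illustration of this decision rule (the sample values below are invented), a one-sample t-test in Python compares the calculated P value with the predetermined significance level:

```python
from scipy import stats

sample = [5.1, 4.8, 5.6, 5.0, 5.3, 4.9, 5.4, 5.2]  # hypothetical measurements
alpha = 0.05                                        # predetermined significance level

t_stat, p_value = stats.ttest_1samp(sample, popmean=5.0)

if p_value < alpha:
    print(f"P = {p_value:.3f} < {alpha}: reject the null hypothesis")
else:
    print(f"P = {p_value:.3f} >= {alpha}: the null hypothesis cannot be rejected")
```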
Types of variables
As the word suggests, a variable is a “changeable number.” Variables with their inherent property of variability can be investigated through measurements or observations, manifesting diverse values based on the object under scrutiny. Examples of variables include anthropometric measures such as height, demographic factors such as age, and health indicators such as body mass index (BMI). These diverse entities allow researchers to capture and quantify essential characteristics in their studies. Broadly, variables fall into two primary categories: categorical (qualitative) and quantitative. Categorical variables encapsulate characteristics that resist straightforward quantification and further branch into two subtypes: nominal and ordinal. Nominal variables serve as descriptors representing names, labels, or categories without any inherent order. A classic example is sex, in which the categories of male and female do not have a natural ranking. On the other hand, ordinal variables introduce an element of order, defining values based on a ranking system between different categories. The quintessential example is assessing the satisfaction level using a Likert-type scale (e.g., “very dissatisfied,” “dissatisfied,” “neither dissatisfied nor satisfied,” “satisfied,” and “very satisfied”).
By contrast, quantitative variables denote characteristics that can be precisely quantified and expressed as numerical values. A quantitative variable is further subdivided into continuous and discrete variables. Continuous variables can assume an infinite number of real values within a defined interval, offering a nuanced representation of the attributes. Examples include age, which captures a spectrum of real values, and height, which spans a continuous range of measurements. In contrast, discrete variables can only take a finite set of real values within a given range. An example is the number of children, where the possible values are limited to zero and positive integers (e.g., 0, 1, 2, etc.).
In essence, variables, as discerned from their names, embody a concept of change that can be expressed numerically. This intricate tapestry of variability allows for a nuanced understanding of data in research and analysis.
Statistical analysis methods and hypotheses
As mentioned previously, determining the statistical hypothesis testing method is contingent on the hypothesis established for the analysis. In other words, hypotheses are formulated in alignment with the selected statistical analysis method. In this section, we delve deeper into specific hypotheses associated with the various statistical analysis methods. Table 1 summarizes the types of variables and the null and alternative hypotheses for the various statistical analysis methods. In this study, various statistical analysis methods are discussed, including normality test, one-group mean and independent two-group mean difference test, dependent or before-and-after group mean difference test, one-way analysis of variance (ANOVA), repeated-measures ANOVA, chi-square test, Fisher’s exact test, correlation analysis, linear regression analysis, and logistic regression analysis.
Types of Variables, Null Hypotheses, and Alternative Hypotheses for Various Statistical Analysis Methods
ANOVA: analysis of variance, CO: coefficient, CV: categorical variable, EV: explanatory variable, QV: quantitative variable, RG: regression, RM ANOVA: repeated-measures analysis of variance, RV: response variable, ST: significance test.
We also examine the null and alternative hypotheses in detail and show how to correctly interpret the result when the null hypothesis is rejected. To determine whether the null hypothesis should be rejected, the results obtained from the statistical analysis are compared with predetermined significance levels. When the null hypothesis is rejected, the observed data provides sufficient evidence to contradict the notion that no effect or difference exists. In a two-tailed test, rejection of the null hypothesis implies that the observed outcome falls in either the extreme right or left tail of the distribution, suggesting a statistically significant deviation from the expected outcome. Understanding the nuances of null and alternative hypotheses and the outcomes of their testing is pivotal for researchers and practitioners to draw meaningful conclusions from statistical analyses. This section demystifies these concepts and offers insights into the intricacies of hypothesis testing across a spectrum of statistical methods.
Normality test
The normality test is a statistical method used to test whether the collected data follows a normal distribution or satisfies normality [ 2 ]. The null hypothesis states that “the data follows a normal distribution” and the alternative hypothesis states that “the data does not follow a normal distribution.” If the null hypothesis is rejected, the conclusion is that “the data cannot be said to statistically follow a normal distribution under significance level,” and thus the data does not follow a normal distribution. If the data satisfies normality, the data should be analyzed using a parametric approach and presented with statistics, such as the mean, standard deviation, and confidence interval. If the data does not satisfy normality, the data should be analyzed using a nonparametric approach and presented with statistics, such as the median, quartiles, and interquartile range [ 2 , 3 ].
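A brief sketch of this branching logic, using the Shapiro-Wilk normality test on invented data:

```python
import numpy as np
from scipy import stats

data = np.array([2.1, 2.5, 2.4, 2.9, 3.1, 2.7, 2.6, 3.0, 2.8, 2.3])  # hypothetical

stat, p = stats.shapiro(data)  # H0: the data follows a normal distribution
if p >= 0.05:
    # Normality not rejected: use parametric methods and report mean/SD
    print(f"mean = {data.mean():.2f}, SD = {data.std(ddof=1):.2f}")
else:
    # Normality rejected: use nonparametric methods and report median/IQR
    q1, med, q3 = np.percentile(data, [25, 50, 75])
    print(f"median = {med:.2f}, IQR = {q1:.2f}-{q3:.2f}")
```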
T-test
The types of t-tests include the one-sample t-test (one-group mean difference test), two-sample t-test (independent two-group mean difference test), and paired t-test (dependent or before-and-after group mean difference test) [ 4 , 5 ]. The one-sample t-test is a statistical hypothesis-testing method used to assess whether the average of a group is the same or different from a specific value. The null hypothesis states that “the average of the group is equal to the specific value” and the alternative hypothesis states that “the average of the group is different from the specific value.” If the null hypothesis is rejected, the conclusion is that “the average of the group cannot be statistically equal to the specific value under the significance level,” and the average of the group can be judged to be different from the specific value. The two-sample t-test is a statistical hypothesis-testing method used to test whether the averages of two independent groups are the same or different. The null hypothesis states that “the averages of group A and group B are the same” and the alternative hypothesis states that “the averages of group A and group B are different.” If the null hypothesis is rejected, the conclusion is that “the averages of group A and group B cannot be said to be statistically the same under the significance level,” and it can be determined that the averages of group A and group B are different. The paired t-test is a statistical hypothesis-testing method used to assess whether the average difference between the dependent or before-and-after groups is zero or not. Data from the two dependent or before-and-after groups is created when two measurements are taken from the same subject at different times. An initial measurement is made, and then the measurement is taken again after some type of intervention, such as an educational program, training program, surgery, or medication. Therefore, the purpose of this type of study is to determine whether an intervention performed between two measurements is effective, and the amount of change must be calculated for each subject to determine whether the average is zero. The null hypothesis states that “the average of the difference between the two dependent or before-and-after groups is equal to zero” and the alternative hypothesis states that “the average of the difference between the two dependent or before-and-after groups is different from zero.” If the null hypothesis is rejected, the conclusion is that “the average of the difference between the two dependent or before-and-after groups cannot be said to be statistically equal to zero under the significance level,” and the average of the difference between the two dependent or before-and-after groups can be judged to be different from zero. The specific type of intervention performed between the two measurements can then be said to be statistically effective.
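The three t-tests correspond directly to standard scipy functions; the following is a minimal sketch with invented data:

```python
from scipy import stats

group_a = [72, 75, 78, 71, 74, 77]   # hypothetical scores
group_b = [68, 70, 73, 69, 72, 71]
before  = [120, 132, 128, 141, 135]  # hypothetical pre-intervention values
after   = [115, 125, 122, 133, 130]  # same subjects after the intervention

# One-sample t-test: is the group mean equal to a specific value (e.g., 70)?
print(stats.ttest_1samp(group_a, popmean=70))

# Two-sample t-test: are the means of two independent groups equal?
print(stats.ttest_ind(group_a, group_b))

# Paired t-test: is the mean before-and-after difference zero?
print(stats.ttest_rel(before, after))
```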
One-way analysis of variance
An ANOVA is a statistical hypothesis testing method used to determine whether the averages of three or more independent groups are the same [ 6 ]. If the averages of three independent groups are compared, the null hypothesis states that “all averages of groups A, B, and C are the same” and the alternative hypothesis is that “all averages of groups A, B, and C are not the same.” Importantly, the alternative hypothesis states that “they are not the same,” which is different from “they are all different.” Although the averages of all three groups may indeed be different, the phrase “not the same” can also mean that two of the three groups are the same and one group is different. Therefore, if the null hypothesis is rejected, a post-hoc analysis or multiple comparison should be conducted to examine the differences among the various cases that are “not the same” [ 7 , 8 ]. In addition, because testing the same hypothesis multiple times inflates the type 1 error, investigators must consider a correction for multiple comparisons, such as the Bonferroni correction, to the extent that multiple comparisons are performed.
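A minimal one-way ANOVA sketch on three invented groups, with a reminder that rejecting the null hypothesis calls for post-hoc multiple comparisons:

```python
from scipy import stats

group_a = [23, 25, 27, 22, 26]  # hypothetical measurements
group_b = [30, 31, 29, 32, 33]
group_c = [24, 26, 25, 27, 23]

f_stat, p = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p:.4f}")
# If p < alpha, the group means are "not all the same"; follow up with a
# post-hoc multiple-comparison procedure (e.g., Bonferroni-adjusted t-tests).
```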
Additional expanded concepts of ANOVA include analysis of covariance (ANCOVA) and multivariate analysis of variance (MANOVA). ANCOVA is a method for testing the pure impact of an explanatory variable on a response variable by controlling for covariates that can affect the relationship between the explanatory and response variables. MANOVA is a method used to test the relationship between two or more response and explanatory variables. Multivariate analysis of covariance (MANCOVA) is the term used when covariates are considered in MANOVA. In this study, detailed information regarding the Bonferroni correction problem, ANCOVA, MANOVA, and MANCOVA is not provided.
One-factor or two-factor repeated-measures analysis of variance
A one-factor repeated-measures ANOVA is a statistical hypothesis testing method used for data measured three or more times to determine whether the averages for each measurement are the same. Repeated measurements generally refer to repeated measurements over time, but they may also depend on the location, such as the general ward, operating room, post-anesthetic care unit, and surgical intensive care unit. The point in time is the “one factor” measured repeatedly. If the averages of three repeated measurements are being compared, the null hypothesis states that “all averages in the first, second, and third measurements are the same” and the alternative hypothesis states that “all averages in the first, second, and third measurements are not the same.” As mentioned with one-way ANOVA, the alternative hypothesis for one-factor repeated-measures ANOVA states that the averages are “not the same”; therefore, if the null hypothesis is rejected, the individual cases that are “not the same” need to be further analyzed.
Two-factor repeated-measures ANOVA is a statistical hypothesis testing method used on data measured repeatedly three or more times for each group out of two or more groups. The repeatedly measured point in time is one factor and the group is another factor, so it is considered “two-factor.” In two-factor repeated-measures ANOVA, a total of three tests are performed [ 9 ]. First, whether the averages at each measured time point are the same or not is tested, ignoring the effect of the group (i.e., only differences in the test variable among the time points are tested). Second, whether the averages for each group are the same or not is tested, ignoring the effect of the time point (i.e., only differences in the test variable among the groups are tested). Finally, whether the patterns of changes in the test variable among the groups are the same or not among the time points is tested (i.e., the differences in the changes in the test variable among the time points and groups [interaction effects between time point factor and group factor] are tested.). If the research design uses a two-factor repeated-measures ANOVA, the primary goal is to see the pattern of change between groups as the time points change. If you are comparing the averages of three repeated measurements for two groups, the null hypothesis for the time points alone states that “all averages in the first, second, and third measurements are the same” and the alternative hypothesis states that “all averages in the first, second, and third measurements are not the same.” If the null hypothesis is rejected, differences among the many cases that are “not the same” need to be further analyzed. The null hypothesis for the group alone states that “the averages of the two groups are the same” and the alternative hypothesis states that “the averages of the two groups are not the same.” Lastly, the null hypothesis for the difference in the pattern of change between the two groups and among the three time points states that “the change patterns between the two groups are the same as time progresses from the first point to the third point,” and the alternative hypothesis is “the change patterns between the two groups are not the same as time progresses from the first point to the third point.” If the null hypothesis is rejected, it needs to be further evaluated between which time points the change patterns between the two groups exist.
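For the one-factor case, a minimal sketch (invented long-format data: subject, time point, score) using statsmodels’ AnovaRM:

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical data: 4 subjects, each measured at 3 time points
df = pd.DataFrame({
    "subject": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "time":    ["t1", "t2", "t3"] * 4,
    "score":   [10, 12, 15, 9, 13, 14, 11, 12, 16, 10, 14, 15],
})

# One-factor (time) repeated-measures ANOVA
result = AnovaRM(data=df, depvar="score", subject="subject",
                 within=["time"]).fit()
print(result)
```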
Chi-square test and Fisher’s exact test
The chi-square method can be used for a goodness-of-fit test, which is used to determine whether the observed frequency follows a specific distribution, and an independence or homogeneity test to determine whether two categorical variables are independent or homogeneous, respectively [ 10 ]. As an example of a goodness-of-fit test, data on the number of traffic accidents is collected each day from Monday to Friday, and researchers determine whether they occur equally on each day, following a specific distribution (Monday: Tuesday: Wednesday: Thursday: Friday = 1:1:1:1:1). The null hypothesis states that “the collected data follows a specific distribution” and the alternative hypothesis states that “the collected data does not follow a specific distribution.” If the null hypothesis is rejected, the conclusion is that “the data cannot be said to statistically follow a specific distribution under the significance level,” and it can be determined that the specified distribution is not followed.
Alternatively, an independence or homogeneity test can be conducted to examine whether there is a relationship between smoking and the development of lung cancer. The sentence “whether you smoke or not is independent of the incidence of lung cancer” has the same meaning as the sentence “the distribution of lung cancer in subjects who smoke and the distribution of lung cancer in subjects who do not smoke are the same.” However, whether a test is classified as an independence or homogeneity test depends on the topic of the research and the content of the data. The null hypothesis in the independence test states that “the relationship between the two categorical variables is independent” and the alternative hypothesis states that “the relationship between the two categorical variables is dependent.” If the null hypothesis is rejected, the conclusion is that “the relationship between the two categorical variables cannot be said to be statistically independent under the significance level,” and the relationship between the two categorical variables can be judged to be dependent. The null hypothesis in the homogeneity test states that “the distribution of categorical variable B according to categorical variable A is homogeneous” and the alternative hypothesis states that “the distribution of categorical variable B according to categorical variable A is not homogeneous.” If the null hypothesis is rejected, the conclusion is that “the distribution of categorical variable B according to categorical variable A cannot be said to be statistically homogeneous under the significance level,” and it can be determined that the distribution of categorical variable B according to categorical variable A is heterogeneous.
The Fisher’s exact test is an analysis method that identifies the relationship between two categorical variables in a 2 × 2 contingency table when the sample size is small. The conditions for performing the Fisher’s exact test [ 10 ] are as follows: 1) a 2 × 2 contingency table is required (i.e., both categorical variables must have two levels), 2) one cell must have an expected frequency < 5, and 3) the total number of study subjects or the sample size must be < 40. The Fisher’s exact test is performed when all three conditions are satisfied. If one of the conditions is not satisfied, the Fisher’s exact test should not be performed and the chi-square test should be performed instead.
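A sketch of both tests on an invented 2 × 2 table (smoking versus lung cancer) that happens to satisfy the Fisher conditions above:

```python
from scipy.stats import chi2_contingency, fisher_exact

# Hypothetical 2 x 2 contingency table:
#                 cancer  no cancer
table = [[8,  2],   # smokers
         [3, 12]]   # non-smokers

chi2, p_chi2, dof, expected = chi2_contingency(table)
print(f"chi-square: p = {p_chi2:.4f}, expected frequencies:\n{expected}")

# Small sample (one expected frequency < 5, N < 40, 2 x 2): Fisher's exact test
odds_ratio, p_fisher = fisher_exact(table)
print(f"Fisher's exact: OR = {odds_ratio:.2f}, p = {p_fisher:.4f}")
```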
Correlation analysis
Correlation analyses involve 1) calculating a correlation coefficient that measures the strength of the linear relationship between two variables and 2) testing the significance of the correlation coefficient to determine whether the calculated correlation coefficient is zero [ 11 ]. If the two variables being analyzed are ratio scales, Pearson’s correlation coefficient is calculated and if one of the two variables is a rank scale, Spearman’s correlation coefficient is calculated. The two correlation coefficients measure the strength of the linear relationship between two variables. Additionally, the closer to +1 the correlation coefficient, the stronger the positive linear relationship between the two variables, whereas the closer to -1 the correlation coefficient, the stronger the negative linear relationship. If the correlation coefficient is 0, no linear relationship is indicated. Therefore, a significance test must be performed to determine whether the calculated correlation coefficient is zero. The null hypothesis states that “the correlation coefficient is zero” and the alternative hypothesis states that “the correlation coefficient is not zero.” If the null hypothesis is rejected, the conclusion is that “the correlation coefficient cannot be said to be statistically zero under the significance level,” and a significant positive or negative linear correlation is indicated, depending on the sign of the correlation coefficient.
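A sketch computing and testing both correlation coefficients on invented data:

```python
from scipy.stats import pearsonr, spearmanr

x = [1.2, 2.3, 3.1, 4.0, 5.2, 6.1]   # hypothetical ratio-scale variable
y = [2.0, 2.9, 3.8, 5.1, 5.9, 7.2]

r, p = pearsonr(x, y)       # both variables on a ratio scale
rho, p_s = spearmanr(x, y)  # use when at least one variable is rank-scaled

# H0 for each significance test: the correlation coefficient is zero
print(f"Pearson r = {r:.2f} (p = {p:.4f}); Spearman rho = {rho:.2f} (p = {p_s:.4f})")
```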
Linear regression analysis
Linear regression is a statistical analysis method used to estimate a regression model that defines the linear relationship between one or more explanatory variable(s) and a quantitative response variable. Linear regression analyses involve the following: 1) the regression coefficient of each explanatory variable is estimated, 2) the regression model is tested on whether all estimated regression coefficients are equal to zero, 3) it is tested whether the regression coefficients of each explanatory variable are equal to zero, 4) the final regression model is built, and 5) the coefficient of determination (R 2 ) is calculated to show how well the regression model explains the data used to build the model [ 12 ].
The null hypothesis for the significance test of the regression model states that “all regression coefficients are zero” and the alternative hypothesis states that “at least one regression coefficient is not zero.” If the null hypothesis is rejected, the conclusion is that “it cannot be stated that all the regression coefficients are zero under the significance level.” Because at least one regression coefficient is not zero, the estimated regression model shows a significant linear relationship between the response and explanatory variables. For testing the significance of the regression coefficient, the null hypothesis states that “the regression coefficient is zero” and the alternative hypothesis states that “the regression coefficient is not zero.” If the null hypothesis is rejected, the conclusion is that “the regression coefficient cannot be said to statistically be zero under the significance level.” Because the calculated regression coefficient is not zero, a 1-unit change in the explanatory variable results in the changes in the response variable by the value of the regression coefficient if the other explanatory variables are held constant.
Linear regression analyses must satisfy several assumptions in order to be performed. First, the distribution of the residuals must satisfy normality. Otherwise, a generalized linear model (GLM) should be conducted rather than a linear regression analysis. Second, the residuals must satisfy homoscedasticity. If the residuals do not satisfy homoscedasticity, the regression model needs to be modified by transforming the response variable to fulfill the homoscedasticity. Third, the residuals must satisfy independence. If the residuals do not satisfy independence, which indicates a dependent relationship between the residuals, a time-series regression analysis should be performed rather than a linear regression analysis. Fourth, linearity of the regression model must be satisfied. If linearity of the regression model is not satisfied, the explanatory or response variable must be transformed or the model must be reset such that linearity is satisfied. Finally, multicollinearity should not exist between the explanatory variables used in the linear regression model [ 13 ]. If multicollinearity does exist, variable selection methods such as stepwise selection, forward selection, and backward elimination should be used to modify the regression model. In summary, linear regression analyses must satisfy five assumptions: 1) normality of residuals, 2) homoscedasticity of residuals, 3) independence of residuals, 4) linearity of the model, and 5) absence of multicollinearity among the explanatory variables. If any of these assumptions are not satisfied, the linear regression model should not be accepted. Therefore, to create a reliable linear regression model, the following steps should be repeated until the five assumptions are met: 1) setting up a linear regression model, 2) assessing whether the five assumptions are met, and 3) modifying the linear regression model.
Linear regression analyses can be divided into simple and multiple linear regression analyses, depending on the number of explanatory variables. If the number of explanatory variables is only one, it is classified as a simple linear regression analysis, whereas if the number of explanatory variables is two or more, it is classified as a multiple linear regression analysis.
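A minimal multiple linear regression sketch on simulated data; the model summary reports the F-test of the model, t-tests of the individual coefficients, and the coefficient of determination (R²):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)                 # hypothetical explanatory variables
x2 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 - 0.8 * x2 + rng.normal(scale=0.5, size=n)  # response

X = sm.add_constant(np.column_stack([x1, x2]))
model = sm.OLS(y, X).fit()
print(model.summary())   # F-test of the model, t-tests of coefficients, R^2
```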
Logistic regression analysis
Logistic regression is a statistical analysis method used to estimate a regression model that defines the linear relationship between one or more explanatory variables and a log odds ratio (logit) of a categorical response variable [ 14 ]. Most concepts for logistic regression analyses are the same as those for linear regression analyses; however, since the response variable is a categorical rather than a quantitative variable, it is logit transformed. Therefore, rather than calculating the regression coefficient, as in the linear regression model, the odds ratio is calculated. Logistic regression analyses involve the following: 1) the odds ratios for each explanatory variable are estimated, 2) the significance of the logistic regression model is tested, and 3) the significance of the odds ratios for each explanatory variable is tested, 4) the final logistic regression model is constructed.
The null hypothesis in the significance test of the logistic regression model states that “all the odds ratios are equal to one” and the alternative hypothesis states that “at least one odds ratio is not one.” If the null hypothesis is rejected, the conclusion is that “all the odds ratios cannot be said to be statistically equal to one under the significance level.” Because at least one odds ratio is not one, the obtained logistic regression model is significant. For testing the significance of the odds ratio for each explanatory variable, the null hypothesis states that “the odds ratio is one” and the alternative hypothesis states that “the odds ratio is not one.” If the null hypothesis is rejected, the conclusion is that “the odds ratio cannot be said to statistically be one under the significance level.” Because the calculated odds ratio is not one, the relationship between the explanatory variable and the response variable can be explained as follows.
The odds ratio is interpreted differently depending on whether the explanatory variable used in the logistic regression analysis is a categorical or quantitative variable. If the explanatory variable is categorical, one level of the explanatory variable should become the reference, and the odds ratios to the other levels are calculated. For example, if the explanatory variable is sex, the levels of which are male (reference) and female, the odds ratio of females to males can be calculated. The odds ratio for male is 1, because male is used as the reference. Conversely, female can be used as the reference, in which case the odds ratio for female would be 1. The researcher can set the reference according to the research situation. By contrast, if the explanatory variable is quantitative, designating a reference is not necessary. In this case, a 1-unit increase in the explanatory variable increases the odds of the response variable by the odds ratio if the other explanatory variables are held constant (e.g., if the odds ratio of a quantitative explanatory variable is 1.2, a 1-unit increase in the explanatory variable increases the odds by 20%).
Similar to linear regression analyses, logistic regression analyses can be divided into simple and multiple logistic regression analyses, depending on the number of explanatory variables. If the number of explanatory variables is one, a simple logistic regression analysis is conducted, whereas if the number of explanatory variables is two or more, a multiple logistic regression analysis is conducted. Depending on the number of levels of the response variable, logistic regression analyses can be further divided into binary and multinomial. If the number of levels for the response variable is two, a binary logistic regression analysis is conducted, whereas if the number of levels for the response variable is three or more, a multinomial logistic regression analysis is conducted. Therefore, four logistic regression analysis divisions can be used according to the number of explanatory variables and the number of levels for the response variable as follows: 1) if the number of explanatory variables is one and the number of levels for the response variable is two, the classification is a simple binary logistic regression analysis; 2) if the number of explanatory variables is one and the number of levels for the response variable is three or more, the classification is a simple multinomial logistic regression analysis; 3) if the number of explanatory variables is two or more and the number of levels for the response variable is two, the classification is a multiple binary logistic regression analysis; and 4) if the number of explanatory variables is two or more and the number of levels for the response variable is three or more, the classification is a multiple multinomial logistic regression analysis. However, if the response variable is an ordinal variable, the word “multinomial” is changed to “ordinal” ( Table 2 ).
Classification of Logistic Regression Analyses according to the Number of Explanatory Variables and Levels for the Response Variable
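A sketch of a simple binary logistic regression on simulated data, with odds ratios obtained by exponentiating the estimated coefficients:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)                         # hypothetical explanatory variable
logit = -0.5 + 1.2 * x                         # true log-odds (simulation only)
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))  # binary response

X = sm.add_constant(x)
res = sm.Logit(y, X).fit(disp=0)
print(np.exp(res.params))   # odds ratios; H0 for each: odds ratio = 1
print(res.pvalues)
```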
Flowchart for selecting the statistical analysis method
Fig. 1 shows a flowchart designed to guide researchers in selecting the appropriate statistical analysis method. In the flowchart, the flowline represents the execution order of the process and the rectangular shape represents the proposed statistical analysis method. The diamond shapes represent a condition that, according to the investigator’s response (e.g., yes/no, quantitative/categorical, one/two, etc.), determines the direction of the pathway to follow.
Flowchart for selecting the statistical analysis method. Expanded concepts of analysis of variance (ANOVA) include analysis of covariance (ANCOVA) and multivariate analysis of variance (MANOVA). ANCOVA is a method used to test the pure impact of an explanatory variable on a response variable by controlling for covariates that can affect the relationship between the explanatory and response variables. MANOVA is a method used to test the relationship between two or more response and explanatory variables. If covariates are considered in MANOVA, multivariate analysis of covariance (MANCOVA) is indicated. C: categorical, CC: correlation coefficient, CO: coefficient, Dep: dependent, EF: expected frequency, EV: explanatory variable, Ind: independent, IQR: interquartile range, M: mean, MCR: multiple comparison result, ME: median, N: number of subjects, N(EV): number of explanatory variables, NL(EV): number of levels for the explanatory variable, NL(RV): number of levels for the response variable, OR: odds ratio, Q: quantitative, Ref: reference, RV: response variable, SD: standard deviation, SMD: standardized mean difference, SMED: standardized median difference, w/: with, w/o: without. * Both the response variable and the explanatory variable should be the ratio scale. † At least one of the response variable and the explanatory variable is the rank scale. ‡ C and Q mixed: explanatory variables have at least one categorical and one quantitative variable.
The decision process begins from the black diamond located at the center, the condition for which is the type of response variable (categorical or quantitative). Proceeding along the pathway tailored to match the characteristics of each study and dataset will lead to the identification of the most suitable statistical analysis method. The recommended statistical analysis methods are encapsulated within rectangles on a gray background, each accompanied by a description of the corresponding statistics to be presented in the results. This visual aid serves as a navigational tool that simplifies the process of selecting an appropriate statistical analysis method based on the context of the study and the unique properties of the data.
To demonstrate the appropriate use of the proposed flowchart, a few examples will be presented and the process described. First, we can consider the prospective randomized controlled trial conducted by Lee et al. [ 15 ] that examines whether intravenous patient-controlled analgesia or a continuous block prevents rebound pain following an infraclavicular brachial plexus block after distal radius fracture fixation. In this study, visual analog scale (VAS) scores and total opioid equivalent consumption were compared among three groups, each consisting of 22 patients (the brachial plexus block only, intravenous patient-controlled analgesia, and continuous block groups). Additionally, age (years), BMI (kg/m 2 ), sex, American Society of Anesthesiologists (ASA) classification (1/2/3), and fracture type (A/B/C) were compared as baseline information. The statistical analysis section of this study states that the Kruskal-Wallis test was used to compare groups for age, BMI, VAS scores at each time point, and total opioid consumption. Fig. 2 illustrates the use of the flowchart presented in Fig. 1 (expressed by thick solid lines and black diamonds) to select the appropriate statistical analysis method for this study. First, as the response variables (age and BMI) are quantitative variables, the pathway from the starting (first) diamond to the right is indicated. For the second diamond, the explanatory variable (group with three levels [3 groups]) is a categorical variable, indicating the pathway to the left. For the third diamond, as there is only one explanatory variable, the pathway to the left is indicated. For the fourth diamond, the explanatory variable, group, has three categories, so the right pathway should be followed. For the fifth diamond, the groups are independent, so the left pathway is followed. For the sixth diamond, because normality is not satisfied, the path leading upward should be followed, finally indicating that the Kruskal-Wallis statistical analysis method should be used.
Application of the flowchart for selecting the statistical analysis method to the study by Lee et al. ( Lee JH, Kim HJ, Kim JK, Cheon S, Shin YH. Does intravenous patient-controlled analgesia or continuous block prevent rebound pain following infraclavicular brachial plexus block after distal radius fracture fixation? A prospective randomized controlled trial. Korean J Anesthesiol 2023; 76: 559-66 ), in which the Kruskal-Wallis test was used. ANOVA: analysis of variance, C: categorical, CO: coefficient, Dep: dependent, EV: explanatory variable, Ind: independent, IQR: interquartile range, M: mean, MCR: multiple comparison result, ME: median, N(EV): number of explanatory variables, NL(EV): number of levels for the explanatory variable, Q: quantitative, RV: response variable, SD: standard deviation, SMD: standardized mean difference, SMED: standardized median difference. * C and Q mixed: explanatory variables have at least one categorical and one quantitative variable.
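For readers who wish to reproduce this kind of comparison, the following is a minimal sketch of the Kruskal-Wallis test using SciPy; the three groups contain invented stand-in values, not the study's actual VAS data.

```python
# Kruskal-Wallis test for three independent groups (synthetic VAS-like scores).
from scipy import stats

block_only = [4.2, 5.1, 6.0, 5.5, 4.8]
iv_pca     = [3.1, 2.8, 3.5, 4.0, 3.3]
continuous = [2.5, 3.0, 2.2, 2.9, 3.4]

stat, p = stats.kruskal(block_only, iv_pca, continuous)
print(f"H = {stat:.2f}, p = {p:.3f}")  # reject the null hypothesis if p < 0.05
```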
Additionally, the authors mention that the Fisher’s exact test was used for group comparisons of sex, ASA classification, and fracture type. Fig. 3 indicates, with thick solid lines and black diamonds, how the flowchart from Fig. 1 can be used to select this statistical analysis method. First, the response variable (sex) is a categorical variable, indicating the pathway from the starting diamond to the left. For the second diamond, the response variable has two categories (male/female), indicating the downward path. For the third diamond, the explanatory variable (group with three levels) is a categorical variable; thus, the path to the left should be followed. For the fourth diamond, there is one explanatory variable, so the path to the left is indicated. For the fifth diamond, the contingency table is 2 × 3 rather than 2 × 2; thus, the right pathway is followed, finally indicating that the chi-square test (not Fisher’s exact test) should be used for statistical analysis.
Application of the flowchart for selecting the statistical analysis method to the study by Lee et al. ( Lee JH, Kim HJ, Kim JK, Cheon S, Shin YH. Does intravenous patient-controlled analgesia or continuous block prevent rebound pain following infraclavicular brachial plexus block after distal radius fracture fixation? A prospective randomized controlled trial. Korean J Anesthesiol 2023; 76: 559-66 ), in which the chi-square test was used. C: categorical, EF: expected frequency, EV: explanatory variable, N: number of subjects, N(EV): number of explanatory variables, NL(RV): number of levels for the response variable, OR: odds ratio, Q: quantitative, Ref: reference, RV: response variable, w/: with, w/o: without. * C and Q mixed: explanatory variables have at least one categorical and one quantitative variable.
Next, the response variables (ASA classification and fracture type) are categorical variables; therefore, the pathway from the starting diamond to the left is indicated. For the second diamond, the response variables have three categories; thus, the upward path should be followed. For the third diamond, the explanatory variable (group) is a categorical variable; thus, the left path is indicated. For the fourth diamond, the number of explanatory variables is one, so the left pathway is followed, finally indicating that the chi-square test should be used for statistical analysis. For these cases, the corresponding pathways are not marked with thick solid lines and black diamonds in the figure.
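A minimal sketch of a chi-square test on a 2 × 3 contingency table (sex by group) with SciPy follows; the cell counts are invented for illustration only.

```python
# Chi-square test of independence on a 2 x 3 table (sex x group).
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[12, 10, 14],   # male counts per group
                  [10, 12,  8]])  # female counts per group

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.3f}")
print(expected)  # check that expected frequencies are adequate (e.g., >= 5)
```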
Another example is the study entitled “Neuromodulation of the median nerve in carpal tunnel syndrome, a single-blind, randomized controlled study,” published by Genç Perdecioğlu et al. [ 16 ]. In this study, the Boston Carpal Tunnel Syndrome Questionnaire (BCTQ) score was measured in 36 and 26 patients with carpal tunnel syndrome in the noninvasive pulsed radiofrequency and splinting (control) groups, respectively, at baseline, the 4th week, and the 8th week. The patients’ age (years), sex, and electroneuromyography findings were compared as baseline data. In the statistical analysis section of this study, the authors indicate that the chi-square test was used for categorical variables and the t-test was used for quantitative variables. Fig. 4 illustrates how the flowchart from Fig. 1 could be used to select the statistical analysis method. First, the response variable (BCTQ score) is a quantitative variable, thus the path to the right of the starting diamond is indicated. For the second diamond, the explanatory variable (group with two levels) is categorical; thus, the left path is indicated. For the third diamond, the number of explanatory variables is one, indicating the left path. For the fourth diamond, the explanatory variable has two categories; thus, the left pathway is indicated. For the fifth diamond, the explanatory variables are independent; thus, the left pathway should be followed. For the sixth diamond, as normality is satisfied, the path to the left is followed, finally indicating that the two-sample t-test should be used for statistical analysis.
Application of the flowchart for selecting the statistical analysis method to the study by Genç Perdecioğlu et al. ( Genç Perdecioğlu GR, Panpallı Ateş M, Yürük D, Akkaya ÖT. Neuromodulation of the median nerve in carpal tunnel syndrome, a single-blind, randomized controlled study. Korean J Pain 2024; 37: 34-40 ), in which the two-sample t-test was used. ANOVA: analysis of variance, C: categorical, CO: coefficient, Dep: dependent, EV: explanatory variable, Ind: independent, IQR: interquartile range, M: mean, MCR: multiple comparison result, ME: median, N(EV): number of explanatory variables, NL(EV): number of levels for the explanatory variable, Q: quantitative, RV: response variable, SD: standard deviation, SMD: standardized mean difference, SMED: standardized median difference. * C and Q mixed: explanatory variables have at least one categorical and one quantitative variable.
The authors also state that as the BCTQ was scored three times, a two-way repeated-measures ANOVA was used. Fig. 5 demonstrates how the flowchart from Fig. 1 could have been used to determine the statistical analysis method. First, the response variable (BCTQ score) is a quantitative variable, indicating the pathway to the right of the starting diamond. For the second diamond, the explanatory variable (group) is categorical; thus, the left path should be followed. For the third diamond, the number of explanatory variables is one, so the left pathway is indicated. For the fourth diamond, the number of explanatory variable categories is three; thus, the right pathway should be followed. For the fifth diamond, the explanatory variable is dependent, indicating the right path. For the sixth diamond, normality is satisfied, so the left pathway is indicated. For the seventh diamond, sphericity is satisfied, so the pathway to the left should be followed, finally indicating that the repeated-measures ANOVA statistical analysis method should be used.
Application of the flowchart for selecting the statistical analysis method to the study by Genç Perdecioğlu et al. ( Genç Perdecioğlu GR, Panpallı Ateş M, Yürük D, Akkaya ÖT. Neuromodulation of the median nerve in carpal tunnel syndrome, a single-blind, randomized controlled study. Korean J Pain 2024; 37: 34-40 ), in which the repeated-measures ANOVA was used. ANOVA: analysis of variance, C: categorical, CO: coefficient, Dep: dependent, EV: explanatory variable, Ind: independent, IQR: interquartile range, M: mean, MCR: multiple comparison result, ME: median, N(EV): number of explanatory variables, NL(EV): number of levels for the explanatory variable, Q: quantitative, RV: response variable, SD: standard deviation, SMD: standardized mean difference, SMED: standardized median difference. * C and Q mixed: explanatory variables have at least one categorical and one quantitative variable.
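As a companion to this pathway, here is a sketch of a repeated-measures ANOVA using statsmodels' AnovaRM on hypothetical long-format data. Note that the study used a two-way design (group by time), whereas this sketch covers only a single within-subject time factor.

```python
# One-within-factor repeated-measures ANOVA on hypothetical data.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

data = pd.DataFrame({
    "subject": list(range(10)) * 3,  # 10 subjects, each measured 3 times
    "time":    ["baseline"] * 10 + ["week4"] * 10 + ["week8"] * 10,
    "score":   [5, 6, 5, 7, 6, 5, 6, 7, 5, 6,
                4, 5, 4, 6, 5, 4, 5, 6, 4, 5,
                3, 4, 3, 5, 4, 3, 4, 5, 3, 4],
})

result = AnovaRM(data, depvar="score", subject="subject", within=["time"]).fit()
print(result.anova_table)  # F test for the within-subject factor "time"
```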
In this study, we have delineated the precise formulations of the null and alternative hypotheses according to various statistical analysis methods. Emphasis was placed on critical considerations for the application and interpretation of these hypotheses. The systematic steps involved in statistical hypothesis testing, including the sequential processes of hypothesis formulation, establishment of the significance level, computation of test statistics, determination of the rejection area and significance probability, and drawing conclusive inferences, were discussed. The identification and characterization of different types of variables were explored to elucidate their distinctive features. This involved a detailed examination of the null and alternative hypotheses specific to commonly utilized statistical analysis methods, accompanied by a discussion of the essential precautions relevant for testing each statistical hypothesis. We also introduced a flowchart designed as a visual aid to facilitate the selection of the most suitable statistical analysis method. This innovative tool provides researchers with a structured path to explore various types of research data and serves as a comprehensive guideline for selecting statistical analysis methods.
It is hoped that this study will help researchers select appropriate statistical analysis methods and establish accurate hypotheses in statistical hypothesis testing.
Conflicts of Interest
No potential conflict of interest relevant to this article was reported.
Data Availability
The datasets generated and/or analyzed during the current study are available from the corresponding author upon reasonable request.
Author Contributions
Jonghae Kim (Conceptualization; Formal analysis; Methodology; Validation; Writing – review & editing)
Dong Hyuck Kim (Data curation; Methodology; Writing – review & editing)
Sang Gyu Kwak (Conceptualization; Data curation; Formal analysis; Investigation; Methodology; Writing – original draft)
References
1. Kwak S. Are only p-values less than 0.05 significant? A p-value greater than 0.05 is also significant! J Lipid Atheroscler. 2023;12:89–95. doi: 10.12997/jla.2023.12.2.89.
2. Kim TK, Park JH. More about the basic assumptions of t-test: normality and sample size. Korean J Anesthesiol. 2019;72:331–5. doi: 10.4097/kja.d.18.00292.
3. Nahm FS. Nonparametric statistical tests for the continuous data: the basic concept and the practical use. Korean J Anesthesiol. 2016;69:8–14. doi: 10.4097/kjae.2016.69.1.8.
4. Casella G, Berger RL. Statistical inference. 2nd ed. Pacific Grove: Duxbury/Thomson Learning; 2002. pp 374–82.
5. Kim TK. T test as a parametric statistic. Korean J Anesthesiol. 2015;68:540–6. doi: 10.4097/kjae.2015.68.6.540.
6. Pukelsheim F. Optimal design of experiments. Philadelphia: SIAM; 2006. pp 1–4.
7. Lee S, Lee DK. What is the proper way to apply the multiple comparison test? Korean J Anesthesiol. 2018;71:353–60. Erratum in: Korean J Anesthesiol 2020; 73: 572. doi: 10.4097/kja.d.18.00242.
8. Kim TK. Understanding one-way ANOVA using conceptual figures. Korean J Anesthesiol. 2017;70:22–6. doi: 10.4097/kjae.2017.70.1.22.
9. Lee Y. What repeated measures analysis of variances really tells us. Korean J Anesthesiol. 2015;68:340–5. doi: 10.4097/kjae.2015.68.4.340.
10. Agresti A. Categorical data analysis. 3rd ed. Hoboken: John Wiley & Sons; 2013. pp 90–3.
11. Kutner MH, Nachtsheim C, Neter J. Applied linear regression models. 4th ed. New York: McGraw-Hill/Irwin; 2004. pp 78–87.
12. Lewis-Beck C, Lewis-Beck M. Applied regression: an introduction. 2nd ed. Thousand Oaks: Sage Publications; 2015. pp 55–60.
13. Kim JH. Multicollinearity and misleading statistical results. Korean J Anesthesiol. 2019;72:558–69. doi: 10.4097/kja.19087.
14. Hosmer DW Jr, Lemeshow S, Sturdivant RX. Applied logistic regression. Hoboken: John Wiley & Sons; 2013. pp 1–10.
15. Lee JH, Kim HJ, Kim JK, Cheon S, Shin YH. Does intravenous patient-controlled analgesia or continuous block prevent rebound pain following infraclavicular brachial plexus block after distal radius fracture fixation? A prospective randomized controlled trial. Korean J Anesthesiol. 2023;76:559–66. doi: 10.4097/kja.23076.
16. Genç Perdecioğlu GR, Panpallı Ateş M, Yürük D, Akkaya ÖT. Neuromodulation of the median nerve in carpal tunnel syndrome, a single-blind, randomized controlled study. Korean J Pain. 2024;37:34–40. doi: 10.3344/kjp.23232.
From ANOVA to regression: 10 key statistical analysis methods explained
Last updated 24 October 2024 | Reviewed by Miroslav Damyanov
Every action we take generates data. When you stream a video, browse a website, or even make a purchase, valuable data is created. However, without statistical analysis, the potential of this information remains untapped.
Understanding how different statistical analysis methods work can help you make the right choice. Each is applicable to a certain situation, data type, and goal.
- What is statistical analysis?
Statistical analysis is the process of collecting, organizing, and interpreting data. The goal is to identify trends and relationships. These insights help analysts forecast outcomes and make strategic business decisions.
This type of analysis can apply to multiple business functions and industries, including the following:
Finance : helps companies assess investment risks and performance
Marketing : enables marketers to identify customer behavior patterns, segment markets, and measure the effectiveness of advertising campaigns
Operations: helps streamline process optimization and reduce waste
Human resources : helps track employee performance trends or analyze turnover rates
Product development : helps with feature prioritization, evaluating A/B test results, and improving product iterations based on user data
Scientific research: supports hypothesis testing, experiment validation, and the identification of significant relations in data
Government: informs public policy decisions, such as understanding population demographics or analyzing inflation
With high-quality statistical analysis, businesses can base their decisions on data-driven insights rather than assumptions. This helps build more effective strategies and ultimately improves the bottom line.
- Importance of statistical analysis
Statistical analysis is an integral part of working with data. Implementing it at different stages of operations or research helps you gain insights that prevent costly errors.
Here are the key benefits of statistical analysis:
Informed decision-making
Statistical analysis allows businesses to base their decisions on solid data rather than assumptions.
By collecting and interpreting data, decision-makers can evaluate the potential outcomes of their strategies before they implement them. This approach reduces risks and increases the chances of success.
Understanding relationships and trends
In many complex environments, the key to insights is understanding relationships between different variables. Statistical methods such as regression or factor analysis help uncover these relationships.
Uncovering correlations through statistical methods can pave the way for breakthroughs in fields like medicine, but the true impact lies in identifying and validating cause-effect relationships. By distinguishing between simple associations and meaningful patterns, statistical analysis helps guide critical decisions, such as developing potentially life-saving treatments.
Predicting future outcomes
Statistical analysis, particularly predictive analysis and time series analysis, provides businesses with tools to forecast events based on historical data.
These forecasts help organizations prepare for future challenges (such as fluctuations in demand, market trends, or operational bottlenecks). Being able to predict outcomes allows for better resource allocation and risk mitigation.
Improving efficiency and reducing waste
Using statistical analysis can lead to improved efficiency in areas where waste occurs. In operations, this can result in streamlining processes.
For example, manufacturers can use causal analysis to identify the factors contributing to defective products and then implement targeted improvements to eliminate the causes.
Enhancing accuracy in research
In scientific research, statistical methods ensure accurate results by validating hypotheses and analyzing experimental data.
Methods such as regression analysis and ANOVA (analysis of variance) allow researchers to draw conclusions from experiments by examining relationships between variables and identifying key factors that influence outcomes.
Without statistical analysis, research findings may not be reliable. This could result in teams drawing incorrect conclusions and forming strategies that cost more than they’re worth.
Validating business assumptions
When businesses make assumptions about customer preferences, market conditions, or operational outcomes, statistical analysis can validate them.
For example, hypothesis testing can provide a framework to either confirm or reject an assumption. With these results at hand, businesses reduce the likelihood of pursuing incorrect strategies and improve their overall performance.
- Types of statistical analysis
The two main types of statistical analysis are descriptive and inferential. However, there are also other types. Here’s a short breakdown:
Descriptive analysis
Descriptive analysis focuses on summarizing and presenting data in a clear and understandable way. You can do this with simple tools like graphs and charts.
This type of statistical analysis helps break down large datasets into smaller, digestible pieces. This is usually done by calculating averages, frequencies, and ranges. The goal is to present the data in an orderly fashion and answer the question, “What happened?”
Businesses can use descriptive analysis to evaluate customer demographics or sales trends. A visual breakdown of complex data is often enough for people to draw useful conclusions.
Diagnostic statistics
This analysis is used to determine the cause of a particular outcome or behavior by examining relationships between variables. It answers the question, “Why did this happen?”
This approach often involves identifying anomalies or trends in data to understand underlying issues.
Inferential analysis
Inferential analysis involves drawing conclusions about a larger population based on a sample of data. It helps predict trends and test hypotheses by accounting for uncertainty and potential errors in the data.
For example, a marketing team can arrive at a conclusion about their potential audience’s demographics by analyzing their existing customer base. Another example is vaccine trials, which allow researchers to come to conclusions about side effects based on how the trial group reacts.
Predictive analysis
Predictive analysis uses historical data to forecast future outcomes. It answers the question, “What might happen in the future?”
For example, a business owner can predict future customer behavior by analyzing their past interactions with the company. Meanwhile, marketers can anticipate which products are likely to succeed based on past sales data.
This type of analysis relies on more complex techniques, and even its best results are still educated guesses, not error-free conclusions.
Prescriptive analysis
Prescriptive analysis goes beyond predicting outcomes. It suggests actionable steps to achieve desired results.
This type of statistical analysis combines data, algorithms, and business rules to recommend actual strategies. It often uses optimization techniques to suggest the best course of action in a given scenario, answering the question, “What should we do next?”
For example, in supply chain management, prescriptive analysis helps optimize inventory levels by providing specific recommendations based on forecasts. A bank can use this analysis to predict loan defaults based on economic trends and adjust lending policies accordingly.
Exploratory data analysis
Exploratory data analysis (EDA) allows you to investigate datasets to discover patterns or anomalies without predefined hypotheses. This approach can summarize a dataset’s main characteristics, often using visual methods.
EDA is particularly useful for uncovering new insights that weren’t anticipated during initial data collection.
Causal analysis
Causal analysis seeks to identify cause-and-effect relationships between variables. It helps determine why certain events happen, often employing techniques such as experiments or quasi-experimental designs to establish causality.
Understanding the “why” of specific events can help design accurate proactive and reactive strategies.
For example, in marketing, causal analysis can be applied to understand the impact of a new advertising campaign on sales.
Bayesian statistics
This approach incorporates prior knowledge or beliefs into the statistical analysis. It involves updating the probability of a hypothesis as more evidence becomes available.
- Statistical analysis methods
Depending on your industry, needs, and budget, you can implement different statistical analysis methods. Here are some of the most common techniques:
1. T-tests
A t-test helps determine if there’s a significant difference between the means of two groups. It works well when you want to compare the average performance of two groups under different conditions.
There are different types of t-tests, including independent (unpaired) and dependent (paired) t-tests.
T-tests are often used in research experiments and quality control processes. For example, they work well in drug testing when one group receives a real drug and another receives a placebo. If the group that received the real drug shows significant improvement, a t-test helps determine whether the improvement is real or due to chance.
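As a sketch of how such a comparison might be run, the following uses SciPy's independent two-sample t-test on made-up scores; Welch's variant is used so equal variances need not be assumed.

```python
# Independent two-sample t-test: drug group vs. placebo group (synthetic data).
from scipy import stats

drug    = [7.1, 6.8, 7.5, 8.0, 7.2, 6.9]
placebo = [6.0, 6.2, 5.8, 6.5, 6.1, 5.9]

t, p = stats.ttest_ind(drug, placebo, equal_var=False)  # Welch's t-test
print(f"t = {t:.2f}, p = {p:.4f}")
```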
2. Chi-square tests
Chi-square tests examine the relationship between categorical variables. They compare observed results with expected results. The goal is to understand if the difference between the two is due to chance or the relationship between the variables.
For instance, a company might use a chi-square test to analyze whether customer preferences for a product differ by region.
It’s particularly useful in market research, where businesses analyze responses to surveys.
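As a small sketch of such a test, the snippet below builds the contingency table from raw survey-style records and runs a chi-square test with pandas and SciPy; all values are invented.

```python
# Does product preference differ by region? (hypothetical survey records)
import pandas as pd
from scipy.stats import chi2_contingency

survey = pd.DataFrame({
    "region":     ["North", "North", "South", "South", "North", "South"] * 10,
    "preference": ["A", "B", "A", "A", "B", "B"] * 10,
})

table = pd.crosstab(survey["region"], survey["preference"])  # observed counts
chi2, p, dof, expected = chi2_contingency(table)
print(table)
print(f"p = {p:.3f}")
```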
3. ANOVA
ANOVA, which stands for analysis of variance, compares the means of three or more groups to determine if there are statistically significant differences among them.
Unlike t-tests, which are limited to two groups, ANOVA is ideal when comparing multiple groups at once.
One-way ANOVA: analysis with one independent variable and one dependent variable
Two-way ANOVA: analysis with two independent variables
Multivariate ANOVA (MANOVA): analysis with two or more dependent (response) variables
Businesses often use ANOVA to compare product performance across different markets and evaluate customer satisfaction across various demographics. The method is also common in experimental research, where multiple groups are exposed to different conditions.
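A minimal one-way ANOVA sketch with SciPy follows; the groups are placeholder satisfaction scores, invented for illustration.

```python
# One-way ANOVA comparing three groups (synthetic satisfaction scores).
from scipy import stats

group_a = [8, 7, 9, 8, 7]
group_b = [6, 5, 7, 6, 6]
group_c = [7, 8, 7, 9, 8]

f, p = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f:.2f}, p = {p:.3f}")
```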
4. Regression analysis
Regression analysis examines the relationship between one dependent variable and one or more independent variables. It helps businesses and researchers predict outcomes and understand which factors influence results the most.
This method determines a best-fit line and allows the researcher to observe how the data is distributed around this line.
It helps economists with asset valuations and predictions. It can also help marketers determine how variables like advertising affect sales.
A company might use regression analysis to forecast future sales based on marketing spend, product price, and customer demographics.
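A sketch of such a model using statsmodels' ordinary least squares follows; the data, coefficients, and noise level are all synthetic assumptions, not real figures.

```python
# Multiple linear regression: predicting sales from spend and price (synthetic).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
spend = rng.uniform(10, 100, size=50)
price = rng.uniform(5, 20, size=50)
sales = 3.0 * spend - 2.0 * price + rng.normal(0, 10, size=50)

X = sm.add_constant(np.column_stack([spend, price]))  # intercept + 2 predictors
model = sm.OLS(sales, X).fit()
print(model.params)  # [intercept, spend coefficient, price coefficient]
```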
6. Time series analysis
Time series analysis evaluates data points collected over time to identify trends. An analyst records data points at equal intervals over a certain period instead of doing it randomly.
This method can help businesses and researchers forecast future outcomes based on historical data. For example, retailers might use time series analysis to plan inventory around holiday shopping trends, while financial institutions rely on it to track stock market trends. An energy company can use it to evaluate consumption trends and streamline the production schedule.
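As a small illustration, the sketch below smooths a synthetic monthly sales series with a 3-month rolling mean in pandas; the dates and values are invented.

```python
# Rolling mean to smooth short-term noise in a monthly series (synthetic).
import numpy as np
import pandas as pd

idx = pd.date_range("2023-01-01", periods=24, freq="MS")  # month starts
noise = np.random.default_rng(2).normal(0, 5, 24)
sales = pd.Series(100 + np.arange(24) * 2 + noise, index=idx)

trend = sales.rolling(window=3).mean()  # 3-month moving average
print(trend.tail())
```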
7. Survival analysis
Survival analysis focuses on time-to-event data, such as the time it takes for a machine to break down or for a customer to churn. It looks at a variable with a start time and end time. The time between them is the focus of the analysis.
This method is highly useful in medical research—for example, when studying the time between the beginning of a patient’s cancer remission and relapse. It can help doctors understand which treatments have desired or unexpected effects.
This analysis also has important applications in business. For example, companies use survival analysis to predict customer retention, product lifespan, or time until product failure.
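Below is a sketch of a Kaplan-Meier survival estimate, assuming the third-party lifelines package is installed; the churn durations and censoring flags are made up.

```python
# Kaplan-Meier estimate of time until customer churn, with censoring.
from lifelines import KaplanMeierFitter

durations = [5, 8, 12, 3, 9, 15, 7, 11]   # months observed
churned   = [1, 1, 0, 1, 0, 1, 1, 0]      # 0 = still a customer (censored)

kmf = KaplanMeierFitter()
kmf.fit(durations, event_observed=churned)
print(kmf.survival_function_)  # estimated probability of "surviving" past t
```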
8. Factor analysis
Factor analysis (FA) reduces large sets of variables into fewer components. It’s useful when dealing with complex datasets because it helps identify underlying structures and simplify data interpretation. This analysis is great for extracting maximum common variance from all necessary variables and turning them into a single score.
For example, in market research, businesses use factor analysis to group customer responses into broad categories. This helps reveal hidden patterns in consumer behavior.
It’s also helpful in product development, where it can use survey data to identify which product features are most important to customers.
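A minimal factor analysis sketch with scikit-learn follows; the "survey responses" are random placeholder data, so the resulting loadings are illustrative only.

```python
# Reducing six survey items to two latent factors (placeholder data).
import numpy as np
from sklearn.decomposition import FactorAnalysis

X = np.random.default_rng(3).normal(size=(200, 6))  # 200 respondents, 6 items
fa = FactorAnalysis(n_components=2).fit(X)
print(fa.components_.shape)  # (2, 6): loadings of each item on each factor
```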
9. Cluster analysis
Cluster analysis groups objects or individuals based on their similarities. This technique works great for customer segmentation, where businesses group customers based on common factors (such as purchasing behavior, demographics, and location).
Distinct clusters help companies tailor marketing strategies and develop personalized services. In education, this analysis can help identify groups of students who require additional assistance based on their achievement data. In medicine, it can help identify patients with similar symptoms to create targeted treatment plans.
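A k-means segmentation sketch with scikit-learn follows, using two synthetic customer features (spend and visit frequency) constructed to form two obvious groups.

```python
# Customer segmentation via k-means on two synthetic features.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
X = np.vstack([rng.normal([10, 2], 1, (50, 2)),    # low-spend segment
               rng.normal([50, 8], 1, (50, 2))])   # high-spend segment

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels[:5], labels[-5:])  # cluster assignment per customer
```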
10. Principal component analysis
Principal component analysis (PCA) is a dimensionality-reduction technique that simplifies large datasets by converting them into fewer components. It helps discard redundant, highly correlated variables while preserving most of the information in the data.
PCA is widely used in fields like finance, marketing, and genetics because it helps handle large datasets with many variables. For example, marketers can use PCA to identify which factors most influence customer buying decisions.
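A short PCA sketch with scikit-learn on placeholder data:

```python
# Compressing ten variables into two principal components (placeholder data).
import numpy as np
from sklearn.decomposition import PCA

X = np.random.default_rng(5).normal(size=(100, 10))
pca = PCA(n_components=2)
scores = pca.fit_transform(X)
print(pca.explained_variance_ratio_)  # share of variance each component keeps
```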
- How to choose the right statistical analysis method
Since numerous statistical analysis methods exist, choosing the right one for your needs may be complicated. Several methods can often be applied to the same situation, so understanding where to start can save time and money.
Define your objective
Before choosing any statistical method, clearly define the objective of your analysis. What do you want to find out? Are you looking to compare groups, predict outcomes, or identify relationships between variables?
For example, if your goal is to compare averages between two groups, you can use a t-test. If you want to understand the effect of multiple factors on a single outcome, regression analysis could be the right choice for you.
Identify your data type
Data can be categorical (like yes/no or product types) or numerical (like sales figures or temperature readings).
For example, if you’re analyzing the relationship between two categorical variables, you may need a chi-square test. If you’re working with numerical data and need to predict future outcomes, you could use a time series analysis.
Evaluate the number of variables
The number of variables involved in your analysis influences the method you should choose. If you’re working with one dependent variable and one or more independent variables, regression analysis or ANOVA may be appropriate.
If you’re handling multiple variables, factor analysis or PCA can help simplify your dataset.
Determine sample size and data availability
Sample size matters as well: parametric methods such as t-tests and ANOVA rely on distributional assumptions that are hard to justify with very small samples, where nonparametric alternatives may be safer.
Consider the assumptions of each method
Each statistical method has its own set of assumptions, such as the distribution of the data or the relationship between variables.
For example, ANOVA assumes that the groups being compared have similar variances, while regression assumes a linear relationship between independent and dependent variables.
Understand if observations are paired or unpaired
When choosing a statistical test, you need to figure out if the data is paired or unpaired.
Paired data : the same subjects are measured more than once, like before and after a treatment or when using different methods.
Unpaired data: each group has different subjects.
For example, if you’re comparing the average scores of two groups, use a paired t-test for paired data and an independent t-test for unpaired data.
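The sketch below contrasts the two tests on the same made-up before-and-after scores using SciPy; which p-value is meaningful depends on whether the measurements truly come from the same subjects.

```python
# Paired vs. independent t-test on the same synthetic numbers.
from scipy import stats

before = [70, 68, 75, 80, 72]
after  = [74, 71, 78, 85, 76]

t_paired, p_paired = stats.ttest_rel(before, after)  # same subjects twice
t_indep,  p_indep  = stats.ttest_ind(before, after)  # if groups were unrelated
print(f"paired p = {p_paired:.3f}, independent p = {p_indep:.3f}")
```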
- Making the most of key statistical analysis methods
Each statistical analysis method is designed to simplify the process of gaining insights from a specific dataset. Understanding which data you need to analyze and which results you want to see can help you choose the right method.
With a comprehensive approach to analytics, you can maximize the benefits of insights and streamline decision-making. This isn’t just applicable in research and science. Businesses across multiple industries can reap significant benefits from well-structured statistical analysis.
Data Analysis – Process, Methods and Types
Data Analysis
Definition:
Data analysis refers to the process of inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information, drawing conclusions, and supporting decision-making. It involves applying various statistical and computational techniques to interpret and derive insights from large datasets. The ultimate aim of data analysis is to convert raw data into actionable insights that can inform business decisions, scientific research, and other endeavors.
Data Analysis Process
The following are step-by-step guides to the data analysis process:
Define the Problem
The first step in data analysis is to clearly define the problem or question that needs to be answered. This involves identifying the purpose of the analysis, the data required, and the intended outcome.
Collect the Data
The next step is to collect the relevant data from various sources. This may involve collecting data from surveys, databases, or other sources. It is important to ensure that the data collected is accurate, complete, and relevant to the problem being analyzed.
Clean and Organize the Data
Once the data has been collected, it needs to be cleaned and organized. This involves removing any errors or inconsistencies in the data, filling in missing values, and ensuring that the data is in a format that can be easily analyzed.
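As a minimal sketch of this step, the following pandas snippet (with hypothetical columns) removes duplicate rows, fills missing values, and drops rows that still lack a key field.

```python
# Common cleaning steps with pandas (hypothetical columns and values).
import pandas as pd

df = pd.DataFrame({
    "age":   [25, None, 34, 34, 29],
    "score": [88, 92, None, None, 75],
})

df = df.drop_duplicates()                          # remove repeated rows
df["age"] = df["age"].fillna(df["age"].median())   # fill missing values
df = df.dropna(subset=["score"])                   # drop rows missing key fields
print(df)
```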
Analyze the Data
The next step is to analyze the data using various statistical and analytical techniques. This may involve identifying patterns in the data, conducting statistical tests, or using machine learning algorithms to identify trends and insights.
Interpret the Results
After analyzing the data, the next step is to interpret the results. This involves drawing conclusions based on the analysis and identifying any significant findings or trends.
Communicate the Findings
Once the results have been interpreted, they need to be communicated to stakeholders. This may involve creating reports, visualizations, or presentations to effectively communicate the findings and recommendations.
Take Action
The final step in the data analysis process is to take action based on the findings. This may involve implementing new policies or procedures, making strategic decisions, or taking other actions based on the insights gained from the analysis.
Types of Data Analysis
Types of Data Analysis are as follows:
Descriptive Analysis
This type of analysis involves summarizing and describing the main characteristics of a dataset, such as the mean, median, mode, standard deviation, and range.
Inferential Analysis
This type of analysis involves making inferences about a population based on a sample. Inferential analysis can help determine whether a certain relationship or pattern observed in a sample is likely to be present in the entire population.
Diagnostic Analysis
This type of analysis involves identifying and diagnosing problems or issues within a dataset. Diagnostic analysis can help identify outliers, errors, missing data, or other anomalies in the dataset.
Predictive Analysis
This type of analysis involves using statistical models and algorithms to predict future outcomes or trends based on historical data. Predictive analysis can help businesses and organizations make informed decisions about the future.
Prescriptive Analysis
This type of analysis involves recommending a course of action based on the results of previous analyses. Prescriptive analysis can help organizations make data-driven decisions about how to optimize their operations, products, or services.
Exploratory Analysis
This type of analysis involves exploring the relationships and patterns within a dataset to identify new insights and trends. Exploratory analysis is often used in the early stages of research or data analysis to generate hypotheses and identify areas for further investigation.
Data Analysis Methods
Data Analysis Methods are as follows:
Statistical Analysis
This method involves the use of mathematical models and statistical tools to analyze and interpret data. It includes measures of central tendency, correlation analysis, regression analysis, hypothesis testing, and more.
Machine Learning
This method involves the use of algorithms to identify patterns and relationships in data. It includes supervised and unsupervised learning, classification, clustering, and predictive modeling.
Data Mining
This method involves using statistical and machine learning techniques to extract information and insights from large and complex datasets.
Text Analysis
This method involves using natural language processing (NLP) techniques to analyze and interpret text data. It includes sentiment analysis, topic modeling, and entity recognition.
Network Analysis
This method involves analyzing the relationships and connections between entities in a network, such as social networks or computer networks. It includes social network analysis and graph theory.
Time Series Analysis
This method involves analyzing data collected over time to identify patterns and trends. It includes forecasting, decomposition, and smoothing techniques.
Spatial Analysis
This method involves analyzing geographic data to identify spatial patterns and relationships. It includes spatial statistics, spatial regression, and geospatial data visualization.
Data Visualization
This method involves using graphs, charts, and other visual representations to help communicate the findings of the analysis. It includes scatter plots, bar charts, heat maps, and interactive dashboards.
Qualitative Analysis
This method involves analyzing non-numeric data such as interviews, observations, and open-ended survey responses. It includes thematic analysis, content analysis, and grounded theory.
Multi-criteria Decision Analysis
This method involves analyzing multiple criteria and objectives to support decision-making. It includes techniques such as the analytical hierarchy process, TOPSIS, and ELECTRE.
Data Analysis Tools
There are various data analysis tools available that can help with different aspects of data analysis. Below is a list of some commonly used data analysis tools:
- Microsoft Excel: A widely used spreadsheet program that allows for data organization, analysis, and visualization.
- SQL : A programming language used to manage and manipulate relational databases.
- R : An open-source programming language and software environment for statistical computing and graphics.
- Python : A general-purpose programming language that is widely used in data analysis and machine learning.
- Tableau : A data visualization software that allows for interactive and dynamic visualizations of data.
- SAS : A statistical analysis software used for data management, analysis, and reporting.
- SPSS : A statistical analysis software used for data analysis, reporting, and modeling.
- Matlab : A numerical computing software that is widely used in scientific research and engineering.
- RapidMiner : A data science platform that offers a wide range of data analysis and machine learning tools.
Applications of Data Analysis
Data analysis has numerous applications across various fields. Below are some examples of how data analysis is used in different fields:
- Business : Data analysis is used to gain insights into customer behavior, market trends, and financial performance. This includes customer segmentation, sales forecasting, and market research.
- Healthcare : Data analysis is used to identify patterns and trends in patient data, improve patient outcomes, and optimize healthcare operations. This includes clinical decision support, disease surveillance, and healthcare cost analysis.
- Education : Data analysis is used to measure student performance, evaluate teaching effectiveness, and improve educational programs. This includes assessment analytics, learning analytics, and program evaluation.
- Finance : Data analysis is used to monitor and evaluate financial performance, identify risks, and make investment decisions. This includes risk management, portfolio optimization, and fraud detection.
- Government : Data analysis is used to inform policy-making, improve public services, and enhance public safety. This includes crime analysis, disaster response planning, and social welfare program evaluation.
- Sports : Data analysis is used to gain insights into athlete performance, improve team strategy, and enhance fan engagement. This includes player evaluation, scouting analysis, and game strategy optimization.
- Marketing : Data analysis is used to measure the effectiveness of marketing campaigns, understand customer behavior, and develop targeted marketing strategies. This includes customer segmentation, marketing attribution analysis, and social media analytics.
- Environmental science : Data analysis is used to monitor and evaluate environmental conditions, assess the impact of human activities on the environment, and develop environmental policies. This includes climate modeling, ecological forecasting, and pollution monitoring.
When to Use Data Analysis
Data analysis is useful when you need to extract meaningful insights and information from large and complex datasets. It is a crucial step in the decision-making process, as it helps you understand the underlying patterns and relationships within the data, and identify potential areas for improvement or opportunities for growth.
Here are some specific scenarios where data analysis can be particularly helpful:
- Problem-solving : When you encounter a problem or challenge, data analysis can help you identify the root cause and develop effective solutions.
- Optimization : Data analysis can help you optimize processes, products, or services to increase efficiency, reduce costs, and improve overall performance.
- Prediction: Data analysis can help you make predictions about future trends or outcomes, which can inform strategic planning and decision-making.
- Performance evaluation : Data analysis can help you evaluate the performance of a process, product, or service to identify areas for improvement and potential opportunities for growth.
- Risk assessment : Data analysis can help you assess and mitigate risks, whether it is financial, operational, or related to safety.
- Market research : Data analysis can help you understand customer behavior and preferences, identify market trends, and develop effective marketing strategies.
- Quality control: Data analysis can help you ensure product quality and customer satisfaction by identifying and addressing quality issues.
Purpose of Data Analysis
The primary purposes of data analysis can be summarized as follows:
- To gain insights: Data analysis allows you to identify patterns and trends in data, which can provide valuable insights into the underlying factors that influence a particular phenomenon or process.
- To inform decision-making: Data analysis can help you make informed decisions based on the information that is available. By analyzing data, you can identify potential risks, opportunities, and solutions to problems.
- To improve performance: Data analysis can help you optimize processes, products, or services by identifying areas for improvement and potential opportunities for growth.
- To measure progress: Data analysis can help you measure progress towards a specific goal or objective, allowing you to track performance over time and adjust your strategies accordingly.
- To identify new opportunities: Data analysis can help you identify new opportunities for growth and innovation by identifying patterns and trends that may not have been visible before.
Examples of Data Analysis
Some Examples of Data Analysis are as follows:
- Social Media Monitoring: Companies use data analysis to monitor social media activity in real-time to understand their brand reputation, identify potential customer issues, and track competitors. By analyzing social media data, businesses can make informed decisions on product development, marketing strategies, and customer service.
- Financial Trading: Financial traders use data analysis to make real-time decisions about buying and selling stocks, bonds, and other financial instruments. By analyzing real-time market data, traders can identify trends and patterns that help them make informed investment decisions.
- Traffic Monitoring : Cities use data analysis to monitor traffic patterns and make real-time decisions about traffic management. By analyzing data from traffic cameras, sensors, and other sources, cities can identify congestion hotspots and make changes to improve traffic flow.
- Healthcare Monitoring: Healthcare providers use data analysis to monitor patient health in real-time. By analyzing data from wearable devices, electronic health records, and other sources, healthcare providers can identify potential health issues and provide timely interventions.
- Online Advertising: Online advertisers use data analysis to make real-time decisions about advertising campaigns. By analyzing data on user behavior and ad performance, advertisers can make adjustments to their campaigns to improve their effectiveness.
- Sports Analysis : Sports teams use data analysis to make real-time decisions about strategy and player performance. By analyzing data on player movement, ball position, and other variables, coaches can make informed decisions about substitutions, game strategy, and training regimens.
- Energy Management : Energy companies use data analysis to monitor energy consumption in real-time. By analyzing data on energy usage patterns, companies can identify opportunities to reduce energy consumption and improve efficiency.
Characteristics of Data Analysis
Characteristics of Data Analysis are as follows:
- Objective : Data analysis should be objective and based on empirical evidence, rather than subjective assumptions or opinions.
- Systematic : Data analysis should follow a systematic approach, using established methods and procedures for collecting, cleaning, and analyzing data.
- Accurate : Data analysis should produce accurate results, free from errors and bias. Data should be validated and verified to ensure its quality.
- Relevant : Data analysis should be relevant to the research question or problem being addressed. It should focus on the data that is most useful for answering the research question or solving the problem.
- Comprehensive : Data analysis should be comprehensive and consider all relevant factors that may affect the research question or problem.
- Timely : Data analysis should be conducted in a timely manner, so that the results are available when they are needed.
- Reproducible : Data analysis should be reproducible, meaning that other researchers should be able to replicate the analysis using the same data and methods.
- Communicable : Data analysis should be communicated clearly and effectively to stakeholders and other interested parties. The results should be presented in a way that is understandable and useful for decision-making.
Advantages of Data Analysis
Advantages of Data Analysis are as follows:
- Better decision-making: Data analysis helps in making informed decisions based on facts and evidence, rather than intuition or guesswork.
- Improved efficiency: Data analysis can identify inefficiencies and bottlenecks in business processes, allowing organizations to optimize their operations and reduce costs.
- Increased accuracy: Data analysis helps to reduce errors and bias, providing more accurate and reliable information.
- Better customer service: Data analysis can help organizations understand their customers better, allowing them to provide better customer service and improve customer satisfaction.
- Competitive advantage: Data analysis can provide organizations with insights into their competitors, allowing them to identify areas where they can gain a competitive advantage.
- Identification of trends and patterns : Data analysis can identify trends and patterns in data that may not be immediately apparent, helping organizations to make predictions and plan for the future.
- Improved risk management : Data analysis can help organizations identify potential risks and take proactive steps to mitigate them.
- Innovation: Data analysis can inspire innovation and new ideas by revealing new opportunities or previously unknown correlations in data.
Limitations of Data Analysis
- Data quality: The quality of data can impact the accuracy and reliability of analysis results. If data is incomplete, inconsistent, or outdated, the analysis may not provide meaningful insights.
- Limited scope: Data analysis is limited by the scope of the data available. If data is incomplete or does not capture all relevant factors, the analysis may not provide a complete picture.
- Human error : Data analysis is often conducted by humans, and errors can occur in data collection, cleaning, and analysis.
- Cost : Data analysis can be expensive, requiring specialized tools, software, and expertise.
- Time-consuming : Data analysis can be time-consuming, especially when working with large datasets or conducting complex analyses.
- Overreliance on data: Data analysis should be complemented with human intuition and expertise. Overreliance on data can lead to a lack of creativity and innovation.
- Privacy concerns: Data analysis can raise privacy concerns if personal or sensitive information is used without proper consent or security measures.
About the author
Muhammad Hassan
Researcher, Academic Writer, Web developer
Data Analysis in Quantitative Research
Yong Moon Jung
Quantitative data analysis serves as part of an essential process of evidence-making in health and social sciences. It is adopted for any type of research question and design, whether descriptive, explanatory, or causal. However, compared with its qualitative counterpart, quantitative data analysis has less flexibility. Conducting quantitative data analysis requires a prerequisite understanding of statistical knowledge and skills. It also requires rigor in the choice of an appropriate analysis model and in the interpretation of the analysis outcomes. Basically, the choice of appropriate analysis techniques is determined by the type of research question and the nature of the data. In addition, different analysis techniques require different assumptions about the data. This chapter provides an introductory guide to assist readers with informed decision-making in choosing the correct analysis models. To this end, it begins with a discussion of the levels of measurement: nominal, ordinal, and scale. Some commonly used analysis techniques in univariate, bivariate, and multivariate data analysis are presented with practical examples. Example analysis outcomes are produced using SPSS (Statistical Package for the Social Sciences).
Author information
Authors and Affiliations
Centre for Business and Social Innovation, University of Technology Sydney, Ultimo, NSW, Australia
Yong Moon Jung
Corresponding author
Correspondence to Yong Moon Jung.
Editor information
Editors and Affiliations
School of Science and Health, Western Sydney University, Penrith, NSW, Australia
Pranee Liamputtong
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this entry
Cite this entry.
Jung, Y.M. (2019). Data Analysis in Quantitative Research. In: Liamputtong, P. (eds) Handbook of Research Methods in Health Social Sciences. Springer, Singapore. https://doi.org/10.1007/978-981-10-5251-4_109
DOI: https://doi.org/10.1007/978-981-10-5251-4_109
Published: 13 January 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-5250-7
Online ISBN: 978-981-10-5251-4
eBook Packages: Social Sciences; Reference Module Humanities and Social Sciences; Reference Module Business, Economics and Social Sciences
Data Analysis
What is Data Analysis?
According to the federal government, data analysis is "the process of systematically applying statistical and/or logical techniques to describe and illustrate, condense and recap, and evaluate data" ( Responsible Conduct in Data Management ). Important components of data analysis include searching for patterns, remaining unbiased in drawing inference from data, practicing responsible data management , and maintaining "honest and accurate analysis" ( Responsible Conduct in Data Management ).
In order to understand data analysis further, it can be helpful to take a step back and ask, "What is data?" Many of us associate data with spreadsheets of numbers and values; however, data can encompass much more than that. According to the federal government, data is "The recorded factual material commonly accepted in the scientific community as necessary to validate research findings" ( OMB Circular 110 ). This broad definition can include information in many formats.
Some examples of types of data are as follows:
- Photographs
- Hand-written notes from field observation
- Machine learning training data sets
- Ethnographic interview transcripts
- Sheet music
- Scripts for plays and musicals
- Observations from laboratory experiments ( CMU Data 101 )
Thus, data analysis includes the processing and manipulation of these data sources in order to gain additional insight from data, answer a research question, or confirm a research hypothesis.
Data analysis falls within the larger research data lifecycle ( University of Virginia ).
Why Analyze Data?
Through data analysis, a researcher can gain additional insight from data and draw conclusions to address the research question or hypothesis. Use of data analysis tools helps researchers understand and interpret data.
What are the Types of Data Analysis?
Data analysis can be quantitative, qualitative, or mixed methods.
Quantitative research typically involves numbers and "close-ended questions and responses" ( Creswell & Creswell, 2018 , p. 3). Quantitative research tests objective theories by examining variables that are usually measured on instruments and analyzed using statistical procedures ( Creswell & Creswell, 2018 , p. 4). Quantitative analysis usually uses deductive reasoning.
Qualitative research typically involves words and "open-ended questions and responses" ( Creswell & Creswell, 2018 , p. 3). According to Creswell & Creswell, "qualitative research is an approach for exploring and understanding the meaning individuals or groups ascribe to a social or human problem" ( 2018 , p. 4). Thus, qualitative analysis usually invokes inductive reasoning.
Mixed methods research uses methods from both quantitative and qualitative research approaches. Mixed methods research works under the "core assumption... that the integration of qualitative and quantitative data yields additional insight beyond the information provided by either the quantitative or qualitative data alone" ( Creswell & Creswell, 2018 , p. 4).
Comprehensive Guide to Data Analysis in Qualitative Research
Data analysis in qualitative research is a transformative process focused on examining non-numeric data to uncover patterns, themes, and insights. This careful and iterative approach allows researchers to delve into complex phenomena in great detail. Unlike quantitative analysis, which prioritizes numbers and statistical significance, qualitative data analysis provides a deeper understanding through detailed examination of narratives, interviews, and observations.
Qualitative researchers immerse themselves in the data, often revisiting it multiple times to identify emerging patterns and relationships. This can involve coding the data, categorizing different pieces of information, and seeking connections that reveal underlying themes. The richness of qualitative data lies in its ability to capture the nuances of human experiences, providing context and meaning that numbers alone cannot convey. By delving into the subjective aspects of human behavior, qualitative analysis offers insights that help inform social policies, educational practices, and organizational strategies, among other applications.
Best Practices for Conducting Data Analysis in Qualitative Research
When conducting data analysis for qualitative research, it's essential to remain open-minded and flexible throughout the process. Begin by meticulously organizing your data, which often involves categorizing and labeling each piece of information for easy reference. Immerse yourself in the dataset to gain a comprehensive understanding of the context and nuances of the data collected. Employ various coding techniques to segment the data into meaningful categories, allowing for systematic examination. Utilize qualitative data tools to assist in identifying recurring themes and patterns across different data points. Consistent reflection and iterative analysis are key to ensuring robust insights, as they allow the researcher to refine their understanding and interpretations continually. Engaging in discussions with peers or mentors can also provide new perspectives and enhance the depth of the analysis.
How to Code and Categorize Data in Qualitative Research
Coding is a pivotal step in qualitative data analysis. It involves labeling and categorizing segments of text to identify patterns and themes. Start with open coding, then progress to axial coding to establish connections between categories. AI-assisted qualitative data analysis software can streamline this process and enhance reliability.
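To make the coding workflow a little more concrete, here's a minimal Python sketch of how coded segments might be stored and then grouped into categories. It's only an illustration of the open-coding-then-axial-coding idea – the transcript excerpts, code labels, and category groupings are all invented, and dedicated software like NVivo or Atlas.ti handles this far more richly.

```python
# A minimal sketch of open coding followed by axial coding.
# All transcript excerpts, codes, and categories are invented examples.
from collections import defaultdict

# Open coding: label each data segment with an initial code.
open_coded = [
    ("I never know who to ask when the system breaks.", "uncertainty"),
    ("My manager checks in with me every morning.", "supervision"),
    ("Training was a one-day session, years ago.", "limited_training"),
    ("I usually figure things out by trial and error.", "self_reliance"),
]

# Axial coding: group related open codes under broader categories.
axial_categories = {
    "support_structures": {"supervision"},
    "knowledge_gaps": {"uncertainty", "limited_training", "self_reliance"},
}

# Collect segments under each category to surface emerging themes.
themes = defaultdict(list)
for segment, code in open_coded:
    for category, codes in axial_categories.items():
        if code in codes:
            themes[category].append(segment)

for category, segments in themes.items():
    print(f"{category}: {len(segments)} segment(s)")
```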
Data analysis in qualitative studies involves examining non-numerical data to identify patterns, themes, and insights. Researchers often use methods such as coding, thematic analysis, and narrative analysis to interpret the data collected from interviews, focus groups, and observations. The process is iterative, allowing researchers to refine their understanding as they delve deeper into the data. By exploring participants' perspectives and experiences, qualitative data analysis provides a rich, nuanced understanding of complex social phenomena, offering valuable insights that quantitative studies might overlook.
Qualitative Data Analysis: Tools and Techniques for Accurate Results
There are numerous qualitative data tools available to aid researchers in their work. Software such as NVivo and Atlas.ti can facilitate the intricate coding process and help organize large datasets effectively, making it easier to identify patterns and themes. These tools allow researchers to link data segments, annotate findings, and visualize data in various ways. Employing techniques like thematic analysis can help researchers identify recurring themes and patterns across the data, while narrative analysis allows for a deeper exploration of how individual stories and experiences shape those patterns. By using these methods, researchers can generate comprehensive insights and ensure the validity and reliability of their findings, ultimately contributing to a more nuanced understanding of the research topic.
1. What is qualitative data analysis, and how is it different from quantitative analysis?
Qualitative data analysis focuses on exploring qualitative data through thematic exploration and pattern recognition, unlike quantitative analysis, which relies on statistical techniques and numerical data.
2. What are the most common methods used for qualitative data analysis?
Common methods include thematic analysis, grounded theory, and narrative analysis. These methods emphasize identifying themes and constructing narratives from the data.
3. How do I code data in qualitative research?
Begin with open coding to identify initial themes, then use axial coding to explore relationships. Qual data analysis software can assist in managing and refining codes.
4. What is thematic analysis, and how is it applied in qualitative research?
Thematic analysis is a method for pinpointing and analyzing patterns within qualitative data. It involves systematic examination of the data to identify recurring themes and insights.
5. How can I ensure the reliability and validity of qualitative data analysis?
Maintain reliability through meticulous data organization and consistent coding practices. Validity can be enhanced by triangulating data sources and using qualitative data tools to cross-verify findings. Additionally, data analysis for focus groups can provide multiple perspectives, enriching the analysis and ensuring a comprehensive understanding.
Quantitative Data Analysis 101
The Lingo, Methods and Techniques – Explained Simply.
By: Derek Jansen (MBA) and Kerryn Warren (PhD) | December 2020
Overview: Quantitative Data Analysis 101
- What (exactly) is quantitative data analysis?
- When to use quantitative analysis
- How quantitative analysis works
- The two “branches” of quantitative analysis
- Descriptive statistics 101
- Inferential statistics 101
- How to choose the right quantitative methods
- Recap & summary
What is quantitative data analysis?
Despite being a mouthful, quantitative data analysis simply means analysing data that is numbers-based – or data that can be easily “converted” into numbers without losing any meaning.
For example, category-based variables like gender, ethnicity, or native language could all be “converted” into numbers without losing meaning – for example, English could equal 1, French 2, etc.
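As a rough illustration of that kind of "conversion", here's a tiny Python sketch. The languages and the numbers assigned to them are arbitrary examples – for nominal data like this, the numbers are just labels, not quantities.

```python
# Converting a category-based variable into numbers.
# The mapping below is arbitrary: the codes are labels, not quantities.
language_codes = {"English": 1, "French": 2, "Spanish": 3}

responses = ["French", "English", "English", "Spanish"]
encoded = [language_codes[r] for r in responses]

print(encoded)  # [2, 1, 1, 3]
```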
This contrasts against qualitative data analysis, where the focus is on words, phrases and expressions that can’t be reduced to numbers. If you’re interested in learning about qualitative analysis, check out our post and video here .
What is quantitative analysis used for?
Quantitative analysis is generally used for three purposes.
- Firstly, it’s used to measure differences between groups . For example, the popularity of different clothing colours or brands.
- Secondly, it’s used to assess relationships between variables . For example, the relationship between weather temperature and voter turnout.
- And third, it’s used to test hypotheses in a scientifically rigorous way. For example, a hypothesis about the impact of a certain vaccine.
Again, this contrasts with qualitative analysis , which can be used to analyse people’s perceptions and feelings about an event or situation. In other words, things that can’t be reduced to numbers.
How does quantitative analysis work?
Well, since quantitative data analysis is all about analysing numbers , it’s no surprise that it involves statistics . Statistical analysis methods form the engine that powers quantitative analysis, and these methods can vary from pretty basic calculations (for example, averages and medians) to more sophisticated analyses (for example, correlations and regressions).
Sounds like gibberish? Don’t worry. We’ll explain all of that in this post. Importantly, you don’t need to be a statistician or math wiz to pull off a good quantitative analysis. We’ll break down all the technical mumbo jumbo in this post.
As I mentioned, quantitative analysis is powered by statistical analysis methods . There are two main “branches” of statistical methods that are used – descriptive statistics and inferential statistics . In your research, you might only use descriptive statistics, or you might use a mix of both , depending on what you’re trying to figure out. In other words, depending on your research questions, aims and objectives . I’ll explain how to choose your methods later.
So, what are descriptive and inferential statistics?
Well, before I can explain that, we need to take a quick detour to explain some lingo. To understand the difference between these two branches of statistics, you need to understand two important words. These words are population and sample .
First up, population . In statistics, the population is the entire group of people (or animals or organisations or whatever) that you’re interested in researching. For example, if you were interested in researching Tesla owners in the US, then the population would be all Tesla owners in the US.
However, it’s extremely unlikely that you’re going to be able to interview or survey every single Tesla owner in the US. Realistically, you’ll likely only get access to a few hundred, or maybe a few thousand owners using an online survey. This smaller group of accessible people whose data you actually collect is called your sample .
So, to recap – the population is the entire group of people you’re interested in, and the sample is the subset of the population that you can actually get access to. In other words, the population is the full chocolate cake , whereas the sample is a slice of that cake.
So, why is this sample-population thing important?
Well, descriptive statistics focus on describing the sample , while inferential statistics aim to make predictions about the population, based on the findings within the sample. In other words, we use one group of statistical methods – descriptive statistics – to investigate the slice of cake, and another group of methods – inferential statistics – to draw conclusions about the entire cake. There I go with the cake analogy again…
With that out the way, let’s take a closer look at each of these branches in more detail.
Branch 1: Descriptive Statistics
Descriptive statistics serve a simple but critically important role in your research – to describe your data set – hence the name. In other words, they help you understand the details of your sample . Unlike inferential statistics (which we’ll get to soon), descriptive statistics don’t aim to make inferences or predictions about the entire population – they’re purely interested in the details of your specific sample .
When you’re writing up your analysis, descriptive statistics are the first set of stats you’ll cover, before moving on to inferential statistics. But, that said, depending on your research objectives and research questions , they may be the only type of statistics you use. We’ll explore that a little later.
So, what kind of statistics are usually covered in this section?
Some common statistical tests used in this branch include the following:
- Mean – this is simply the mathematical average of a range of numbers.
- Median – this is the midpoint in a range of numbers when the numbers are arranged in numerical order. If the data set contains an odd number of values, the median is the value right in the middle of the set. If it contains an even number of values, the median is the midpoint between the two middle values.
- Mode – this is simply the most commonly occurring number in the data set.
- Standard deviation – this measures how dispersed the numbers are around the mean (the average). In cases where most of the numbers are quite close to the average, the standard deviation will be relatively low. Conversely, in cases where the numbers are scattered all over the place, the standard deviation will be relatively high.
- Skewness . As the name suggests, skewness indicates how symmetrical a range of numbers is. In other words, do they tend to cluster into a smooth bell curve shape in the middle of the graph, or do they skew to the left or right?
Feeling a bit confused? Let’s look at a practical example using a small data set – the body weights of a sample of 10 people – together with the descriptive statistics calculated from them.
First, we can see that the mean weight is 72.4 kilograms. In other words, the average weight across the sample is 72.4 kilograms. Straightforward.
Next, we can see that the median is very similar to the mean (the average). This suggests that this data set has a reasonably symmetrical distribution (in other words, a relatively smooth, centred distribution of weights, clustered towards the centre).
In terms of the mode , there is no mode in this data set. This is because each number is present only once and so there cannot be a “most common number”. If there were two people who were both 65 kilograms, for example, then the mode would be 65.
Next up is the standard deviation. A value of 10.6 indicates that there’s quite a wide spread of numbers. We can see this quite easily by looking at the numbers themselves, which range from 55 to 90 – quite a stretch from the mean of 72.4.
And lastly, the skewness of -0.2 tells us that the data is very slightly negatively skewed. This makes sense since the mean and the median are slightly different.
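If you'd like to reproduce these kinds of measures yourself, here's a small sketch using only Python's standard library. The ten weights below are invented for illustration (they're not the article's actual data set), and the skewness line is a simple moment-based estimate rather than the only possible formula.

```python
# Computing the descriptive statistics discussed above.
# The ten body weights (kg) are invented examples.
import statistics

weights = [55, 61, 65, 68, 71, 73, 76, 79, 86, 90]

n = len(weights)
mean = statistics.mean(weights)      # the arithmetic average
median = statistics.median(weights)  # the middle value once sorted
stdev = statistics.stdev(weights)    # sample standard deviation

# multimode() lists the most common value(s). When every value is
# unique, as here, everything ties - i.e. there is no meaningful mode.
modes = statistics.multimode(weights)

# A simple moment-based skewness estimate: the mean cubed deviation
# divided by the standard deviation cubed.
skewness = sum((x - mean) ** 3 for x in weights) / n / stdev ** 3

print(f"mean={mean:.1f}, median={median:.1f}, "
      f"stdev={stdev:.1f}, skewness={skewness:.2f}")
```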
As you can see, these descriptive statistics give us some useful insight into the data set. Of course, this is a very small data set (only 10 records), so we can’t read into these statistics too much. Also, keep in mind that this is not a list of all possible descriptive statistics – just the most common ones. But why do all of these numbers matter?
While these descriptive statistics are all fairly basic, they’re important for a few reasons:
- Firstly, they help you get both a macro and micro-level view of your data. In other words, they help you understand both the big picture and the finer details.
- Secondly, they help you spot potential errors in the data – for example, if an average is way higher than you’d expect, or responses to a question are highly varied, this can act as a warning sign that you need to double-check the data.
- And lastly, these descriptive statistics help inform which inferential statistical techniques you can use, as those techniques depend on the skewness (in other words, the symmetry and normality) of the data.
Simply put, descriptive statistics are really important , even though the statistical techniques used are fairly basic. All too often at Grad Coach, we see students skimming over the descriptives in their eagerness to get to the more exciting inferential methods, and then landing up with some very flawed results.
Branch 2: Inferential Statistics
As I mentioned, while descriptive statistics are all about the details of your specific data set – your sample – inferential statistics aim to make inferences about the population . In other words, you’ll use inferential statistics to make predictions about what you’d expect to find in the full population.
What kind of predictions, you ask? Well, there are two common types of predictions that researchers try to make using inferential stats:
- Firstly, predictions about differences between groups – for example, height differences between children grouped by their favourite meal or gender.
- And secondly, relationships between variables – for example, the relationship between body weight and the number of hours a week a person does yoga.
In other words, inferential statistics (when done correctly) allow you to connect the dots and make predictions about what you expect to see in the real-world population, based on what you observe in your sample data. For this reason, inferential statistics are used for hypothesis testing – in other words, to test hypotheses that predict changes or differences.
Of course, those predictions are only as good as your sample is representative. For example, if your population of interest is a mix of 50% male and 50% female , but your sample is 80% male , you can’t make inferences about the population based on your sample, since it’s not representative. This area of statistics is called sampling, but we won’t go down that rabbit hole here (it’s a deep one!) – we’ll save that for another post.
So, what statistics are usually used in this branch?
There are many, many different statistical analysis methods within the inferential branch and it’d be impossible for us to discuss them all here. So we’ll just take a look at some of the most common inferential statistical methods so that you have a solid starting point.
First up are T-tests . T-tests compare the means (the averages) of two groups of data to assess whether they’re statistically significantly different. In other words, is the difference between the two group means larger than what you’d expect from chance alone?
This type of testing is very useful for understanding just how similar or different two groups of data are. For example, you might want to compare the mean blood pressure between two groups of people – one that has taken a new medication and one that hasn’t – to assess whether they are significantly different.
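Here's a hedged sketch of that blood-pressure comparison using scipy's independent-samples t-test. The readings for both groups are invented for the example.

```python
# Independent-samples t-test: do the two groups have different means?
# All blood pressure readings below are invented examples.
from scipy import stats

medicated = [118, 121, 115, 119, 122, 117, 120]
unmedicated = [128, 131, 126, 133, 129, 127, 130]

t_stat, p_value = stats.ttest_ind(medicated, unmedicated)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# A small p-value (commonly below 0.05) suggests the difference in
# means is larger than chance alone would plausibly produce.
```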
Kicking things up a level, we have ANOVA, which stands for “analysis of variance”. This test is similar to a T-test in that it compares the means of various groups, but ANOVA allows you to analyse multiple groups , not just two. So it’s basically a t-test on steroids…
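Again as a rough sketch with invented numbers, a one-way ANOVA with scipy looks almost identical – you just pass in more than two groups.

```python
# One-way ANOVA: comparing the means of three (invented) groups.
from scipy import stats

group_a = [72, 75, 71, 74, 73]
group_b = [78, 80, 77, 81, 79]
group_c = [69, 70, 68, 71, 67]

f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```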
Next, we have correlation analysis . This type of analysis assesses the relationship between two variables. In other words, if one variable increases, does the other variable also increase, decrease or stay the same? For example, if the average temperature goes up, do average ice cream sales increase too? We’d expect some sort of relationship between these two variables intuitively , but correlation analysis allows us to measure that relationship scientifically .
Lastly, we have regression analysis – this is quite similar to correlation in that it assesses the relationship between variables, but it goes a step further by modelling that relationship: it fits an equation that estimates how one variable changes as another changes. Importantly, though, regression on its own doesn’t prove cause and effect – just because two variables move together doesn’t necessarily mean that one causes the other; they might both be driven by some third force. Stats overload…
I hear you. To make this all a little more tangible, let’s take a look at an example of a correlation in action.
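Here's one way that might look in Python, with invented temperature and ice cream sales figures. The same scipy functions also cover the simple regression case – note, as above, that a good fit still doesn't prove causation.

```python
# Correlation and simple linear regression on invented example data.
from scipy import stats

temperature = [18, 21, 24, 27, 30, 33]        # degrees Celsius
sales = [120, 135, 160, 180, 210, 230]        # ice creams sold per day

# Correlation: how strongly do the two variables move together (-1 to 1)?
r, p_value = stats.pearsonr(temperature, sales)
print(f"Pearson r = {r:.2f}, p = {p_value:.4f}")

# Regression: fit a straight line predicting sales from temperature.
result = stats.linregress(temperature, sales)
print(f"sales = {result.slope:.1f} * temperature + {result.intercept:.1f}"
      " (approximately)")
```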
How to choose the right analysis method
To choose the right statistical methods, you need to think about two important factors :
- The type of quantitative data you have (specifically, level of measurement and the shape of the data). And,
- Your research questions and hypotheses
Let’s take a closer look at each of these.
Factor 1 – Data type
The first thing you need to consider is the type of data you’ve collected (or the type of data you will collect). By data types, I’m referring to the four levels of measurement – namely, nominal, ordinal, interval and ratio. If you’re not familiar with this lingo, check out the video below.
So, why does the data type matter? Well, because different statistical methods and techniques require different types of data. This is one of the “assumptions” I mentioned earlier – every method has its assumptions regarding the type of data.
For example, some techniques work with categorical data (for example, yes/no type questions, or gender or ethnicity), while others work with continuous numerical data (for example, age, weight or income) – and, of course, some work with multiple data types.
If you try to use a statistical method that doesn’t support the data type you have, your results will be largely meaningless . So, make sure that you have a clear understanding of what types of data you’ve collected (or will collect). Once you have this, you can then check which statistical methods would support your data types here .
If you haven’t collected your data yet, you can work in reverse and look at which statistical method would give you the most useful insights, and then design your data collection strategy to collect the correct data types.
Another important factor to consider is the shape of your data . Specifically, does it have a normal distribution (in other words, is it a bell-shaped curve, centred in the middle) or is it very skewed to the left or the right? Again, different statistical techniques work for different shapes of data – some are designed for symmetrical data while others are designed for skewed data.
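If you want a quick, rough way to inspect the shape of your data, here's a small sketch using scipy. The sample values are invented, and a formal normality test like Shapiro–Wilk is only one of several ways to check this.

```python
# Rough shape check: skewness plus a Shapiro-Wilk normality test.
# The sample values are invented examples.
from scipy import stats

sample = [55, 61, 65, 68, 71, 73, 76, 79, 86, 90]

print(f"skewness = {stats.skew(sample):.2f}")

w_stat, p_value = stats.shapiro(sample)
print(f"Shapiro-Wilk W = {w_stat:.3f}, p = {p_value:.4f}")
# A p-value above ~0.05 means the test found no strong evidence against
# normality - which is not the same as proving the data are normal.
```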
Factor 2 – Your research questions
The next thing you need to consider is your specific research questions, as well as your hypotheses (if you have some). The nature of your research questions and research hypotheses will heavily influence which statistical methods and techniques you should use.
If you’re just interested in understanding the attributes of your sample (as opposed to the entire population), then descriptive statistics are probably all you need. For example, if you just want to assess the means (averages) and medians (centre points) of variables in a group of people.
On the other hand, if you aim to understand differences between groups or relationships between variables and to infer or predict outcomes in the population, then you’ll likely need both descriptive statistics and inferential statistics.
So, it’s really important to get very clear about your research aims and research questions, as well as your hypotheses – before you start looking at which statistical techniques to use.
Never shoehorn a specific statistical technique into your research just because you like it or have some experience with it. Your choice of methods must align with all the factors we’ve covered here.
Time to recap…
You’re still with me? That’s impressive. We’ve covered a lot of ground here, so let’s recap on the key points:
- Quantitative data analysis is all about analysing number-based data (which includes categorical and numerical data) using various statistical techniques.
- The two main branches of statistics are descriptive statistics and inferential statistics . Descriptives describe your sample, whereas inferentials make predictions about what you’ll find in the population.
- Common descriptive statistical methods include mean (average), median , standard deviation and skewness .
- Common inferential statistical methods include t-tests , ANOVA , correlation and regression analysis.
- To choose the right statistical methods and techniques, you need to consider the type of data you’re working with , as well as your research questions and hypotheses.