Open Access | Published: 16 March 2018

An ANOVA approach for statistical comparisons of brain networks

  • Daniel Fraiman (ORCID: orcid.org/0000-0002-0482-9137)^{1,2} &
  • Ricardo Fraiman^{3,4}

Scientific Reports volume 8, Article number: 4746 (2018)


Subjects: Data processing, Statistical methods

Abstract

The study of brain networks has developed extensively over the last couple of decades. By contrast, techniques for the statistical analysis of these networks are less developed. In this paper, we focus on the statistical comparison of brain networks in a nonparametric framework and discuss the associated detection and identification problems. We tested network differences between groups with an analysis of variance (ANOVA) test we developed specifically for networks. We also propose and analyse the behaviour of a new statistical procedure designed to identify the subnetworks that differ between groups. As an example, we show the application of these tools to resting-state fMRI data obtained from the Human Connectome Project. We identify, among other results, that the amount of sleep in the days before the scan is a relevant variable that must be controlled. Finally, we discuss the potential bias in neuroimaging findings that is generated by some behavioural and brain structure variables. Our method can also be applied to other kinds of networks, such as protein interaction networks, gene networks or social networks.


Introduction

Understanding how individual neurons, groups of neurons and brain regions connect is a fundamental issue in neuroscience. Imaging and electrophysiology have allowed researchers to investigate this issue at different brain scales. At the macroscale, the study of brain connectivity is dominated by MRI, which is the main technique used to study how different brain regions connect and communicate. Researchers use different experimental protocols in an attempt to describe the true brain networks of individuals with disorders as well as those of healthy individuals. Understanding resting state networks is crucial for understanding modified networks, such as those involved in emotion, pain, motor learning, memory, reward processing, and cognitive development, among others. Comparing brain networks accurately can also lead to the precise early diagnosis of neuropsychiatric and neurological disorders 1 , 2 . Rigorous mathematical methods are needed to conduct such comparisons.

Currently, the two main techniques used to measure brain networks at the whole brain scale are Diffusion Tensor Imaging (DTI) and resting-state functional magnetic resonance imaging (rs-fMRI). In DTI, large white-matter fibres are measured to create a connectional neuroanatomy brain network, while in rs-fMRI, functional connections are inferred by measuring the BOLD activity at each voxel and creating a whole brain functional network based on functionally-connected voxels (i.e., those with similar behaviour). Despite technical limitations, both techniques are routinely used to provide a structural and dynamic explanation for some aspects of human brain function. These magnetic resonance neuroimages are typically analysed by applying network theory 3 , 4 , which has gained considerable attention for the analysis of brain data over the last 10 years.

The space of networks with as few as 10 nodes (brain regions) contains as many as 10^13 different networks. Thus, one can imagine the number of networks if one analyses brain network populations (e.g. healthy and unhealthy) with, say, 1000 nodes. However, most studies currently report data with few subjects, and the neuroscience community has recently begun to address this issue 5 , 6 , 7 and question the reproducibility of such findings 8 , 9 , 10 . In this work, we present a tool for comparing samples of brain networks. This study contributes to a fast-growing area of research: network statistics of network samples 11 , 12 , 13 , 14 .

We organized the paper as follows: In the Results section, we first present a discussion about the type of differences that can be observed when comparing brain networks. Second, we present the method for comparing brain networks and identifying network differences that works well even with small samples. Third, we present an example that illustrates in greater detail the concept of comparing networks. Next, we apply the method to resting-state fMRI data from the Human Connectome Project and discuss the potential biases generated by some behavioural and brain structural variables. Finally, in the Discussion section, we discuss possible improvements, the impact of sample size, and the effects of confounding variables.

Preliminaries

Most studies that compare brain networks (e.g., in healthy controls vs. patients) try to identify the subnetworks, hubs, modules, etc. that are affected in the particular disease. There is a widespread belief (largely supported by data) that the brain network modifications induced by the factor studied (disease, age, sex, stimulus) are specific. This means that the factor affects the brains of different people in a similar way.

On the other hand, labeled networks can be modified in many different ways while preserving the nodes, and these modifications can be grouped into three categories. In the first, called here localized modifications, specific identified links are changed by the factor. In the second, called unlocalized modifications, some links change, but the changed links differ among subjects. For example, the degree of interconnection of some nodes may decrease/increase by 50%, but in some individuals this happens in the frontal lobe, in others in the right parietal lobe or the occipital lobe, and so on. In this case, the localization of the links/nodes affected by the factor can be considered random. In the third category, called here global modifications, some links (not the same across subjects) are changed, and these changes produce a global alteration of the network. For example, they can notably decrease/increase the average path length, the average degree, or the number of modules, or just produce more heterogeneous networks in a population of homogeneous ones. This last category is similar to the unlocalized modifications case, but here an important global change in the network occurs.

In all cases, there are changes in the links influenced by the "factor", while the nodes are fixed. How to detect whether any of these changes have occurred (hereinafter called detection) is one of the main challenges of this work. And, once their occurrence has been determined, we aim to identify where they occurred (hereinafter called identification). The difficulty lies in statistically asserting that the factor produced true modifications in the huge space of labeled networks. We aim to detect all three types of network modifications. Clearly, as is always true in statistics, more precise methods can be proposed when the hypotheses regarding the data are more specific (e.g., that the differences belong to the global modifications category). However, such an approach requires many more assumptions about the brain's behaviour, and these assumptions are generally unverifiable; for this reason, we use a nonparametric approach, following the adage "less is more", which is often very useful in statistics. For the detection problem, we developed an analysis of variance (ANOVA) test specifically for networks. As is well known, ANOVA is designed to test differences among the means of subpopulations, and distributions with equal means may still differ. However, we propose a definition of the mean network whose value will differ in the presence of any of the three modification categories mentioned above. As is well known, the identification stage is computationally far more complicated, and we address it partially by looking for the subset of links, or the subnetwork, that presents the largest network differences between groups.

Network Theory Framework

A network (or graph), denoted by G = (V, E), is an object described by a set V of nodes (vertices) and a set E ⊂ V × V of links (edges) between them. In what follows, we consider families of networks defined over the same fixed finite set of n nodes (brain regions). A network is completely described by its adjacency matrix $A \in \{0,1\}^{n\times n}$, where A(i, j) = 1 if and only if the link (i, j) ∈ E. If the matrix A is symmetric, then the graph is undirected; otherwise, we have a directed graph.

Let us suppose we are interested in studying the brain networks of a given population, where most likely the brain networks differ from each other to some extent. If we randomly choose a person from this population and study his/her brain network, what we obtain is a random network. This random network, G, will have a given probability of being network G_1, another probability of being network G_2, and so on up to $G_{\tilde{n}}$. Therefore, a random network is completely characterized by its probability law,

$$P(\mathbf{G}=G_k)=p_k,\qquad k=1,\ldots,\tilde{n}. \qquad (1)$$
Likewise, a random variable is also completely characterized by its probability law. In this case, the most common test for comparing many subpopulations is the analysis of variance test (ANOVA). This test rejects the null hypothesis of equal means if the averages are statistically different. Here, we propose an ANOVA test designed specifically to compare networks.

To develop this test, we first need to specify the null assumption in terms of some notion of mean network and a statistic to base the test on. We only have at hand two main tools for that: the adjacency matrices of the networks and a notion of distance between networks.

The first step in comparing networks is to define a distance or metric between them. Given two networks G_1, G_2, we consider the most classical distance, the edit distance 15, defined as

$$d(G_1,G_2)=\sum_{i<j}\left|A_{G_1}(i,j)-A_{G_2}(i,j)\right|. \qquad (2)$$

This distance corresponds to the minimum number of links that must be added or removed to transform G_1 into G_2 (i.e., the number of differing links), and it is the L^1 distance between the two adjacency matrices. We will also use equation (2) for the case of weighted networks, i.e. for matrices with A(i, j) taking values between 0 and 1. It is important to mention that the results presented here remain valid under other metrics 16 , 17 , 18 .
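For concreteness, here is a minimal sketch of this distance in code (our illustration, not from the paper), for networks stored as numpy adjacency arrays. Because it takes absolute entrywise differences, it also covers the weighted case mentioned above.

```python
# Minimal sketch of the edit distance of equation (2): the L1 distance
# between adjacency matrices, i.e. the number of differing links.
import numpy as np

def edit_distance(A1, A2):
    """Count the links that must be added or removed to turn G1 into G2."""
    iu = np.triu_indices_from(A1, k=1)  # each undirected link counted once
    return np.abs(A1[iu] - A2[iu]).sum()
```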

Next, we consider the average weighted network - hereinafter called the average network - defined as the network whose adjacency matrix is the average of the adjacency matrices in the sample of networks. More precisely, we consider the following definitions.

Definition 1

Given a sample of networks {G_1, …, G_l} with the same distribution as the random network G, we define:

The average network $\mathcal{M}$ as the network that has as adjacency matrix the average of the adjacency matrices,

$$\mathcal{M}(i,j)=\frac{1}{l}\sum_{k=1}^{l}A_{G_k}(i,j), \qquad (3)$$

which in terms of the population version corresponds to the mean matrix $\mathcal{M}(i,j)={\mathbb{E}}(A_{\mathbf{G}}(i,j))=:p_{ij}$.

The average distance around a graph H as

$$\bar{d}_{G}(H)=\frac{1}{l}\sum_{k=1}^{l}d(G_k,H), \qquad (4)$$

which corresponds to the mean population distance ${\mathbb{E}}(d(\mathbf{G},H))$.

With these definitions in mind, the natural way to define a measure of network variability is

$$\sigma=\bar{d}_{G}(\mathcal{M}), \qquad (5)$$

which measures the average distance (variability) of the networks around the average weighted network.
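A short sketch of these definitions in code (reusing `edit_distance` from above; `sample` is a list of adjacency arrays):

```python
import numpy as np

def average_network(sample):
    """Adjacency matrix M(i, j): the entrywise mean of the sample matrices."""
    return np.mean(sample, axis=0)  # a weighted network with entries in [0, 1]

def variability(sample):
    """sigma: mean edit distance of the sample networks to the average network."""
    M = average_network(sample)
    return np.mean([edit_distance(A, M) for A in sample])
```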

Given m subpopulations $\mathbf{G}_1,\ldots,\mathbf{G}_m$, the null assumption for our ANOVA test will be that the means $\mathcal{M}_1,\ldots,\mathcal{M}_m$ of the m subpopulations are the same. The test statistic will be based on a normalized version of the sum of the differences between $\bar{d}_{G^i}(\mathcal{M}_i)$ and $\bar{d}_{G}(\mathcal{M}_i)$, where $\bar{d}_{G^i}$ and $\bar{d}_{G}$ are calculated according to (4) using the i-th sample and the pooled sample, respectively. This is developed in more detail in the next section.

Detecting and identifying network differences

Now we address the testing problem. Let $G_{1}^{1},G_{2}^{1},\ldots,G_{n_1}^{1}$ denote the networks from subpopulation 1, $G_{1}^{2},G_{2}^{2},\ldots,G_{n_2}^{2}$ those from subpopulation 2, and so on up to $G_{1}^{m},G_{2}^{m},\ldots,G_{n_m}^{m}$, the networks of subpopulation m. Let $G_1,G_2,\ldots,G_n$ denote, without superscript, the complete pooled sample of networks, where $n=\sum_{i=1}^{m}n_i$. Finally, let $\mathcal{M}_i$ and $\sigma_i$ denote the average network and the variability of the i-th subpopulation of networks. We want to test

$$H_0:\ \mathcal{M}_1=\mathcal{M}_2=\cdots=\mathcal{M}_m,$$

that is, that all the subpopulations have the same mean network, against the alternative that at least one subpopulation has a different mean network.

It is interesting to note that for objects that are networks, the average network ($\mathcal{M}$) and the variability ($\sigma$) are not independent summary measures. In fact, since for a 0-1 link ${\mathbb{E}}|A_{\mathbf{G}}(i,j)-p_{ij}|=2p_{ij}(1-p_{ij})$, the relationship between them is given by

$$\sigma=2\sum_{i<j}p_{ij}(1-p_{ij}).$$
Therefore, the proposed test can also be considered a test for equal variability. The proposed statistic for testing the null hypothesis is

$$T=a\sum_{i=1}^{m}\left(\bar{d}_{G^i}(\mathcal{M}_i)-\bar{d}_{G}(\mathcal{M}_i)\right),$$

where a is a normalization constant given in Supplementary Information 1.3 (which gives the exact form of the statistic; the display above shows its structure). This statistic measures the difference between the network variability of each specific subpopulation and the average distance of the pooled sample to that subpopulation's average network. Theorem 1 states that under the null hypothesis (items (i) and (ii)) T is asymptotically Normal(0, 1), and that if H_0 is false (item (iii)) T will be smaller than any negative constant c for large enough samples. These properties are established in the following theorem (see Supplementary Information 1 for the proof).

Theorem 1. Under the null hypothesis, the T statistic fulfills (i) and (ii), while (iii) shows that T is sensitive to the alternative hypothesis:

(i) ${\mathbb{E}}(T)=0$.

(ii) T is asymptotically ($K:=\min\{n_1,n_2,\ldots,n_m\}\to\infty$) Normal(0, 1).

(iii) Under the alternative hypothesis, T will be smaller than any negative value if K is large enough (the test is consistent).

This theorem provides a procedure for testing whether two or more groups of networks are different. Although having a procedure like the one described is important, we not only want to detect network differences, we also want to identify the specific network changes or differences. We discuss this issue next.
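As an illustration, the sketch below computes the unnormalized core of T: the sum over groups of the difference between each group's variability and the pooled sample's average distance to that group's mean network. The normalization constant a of Supplementary Information 1.3 is not reproduced here, so this is a structural sketch rather than the exact statistic. It reuses `edit_distance` and `average_network` from the sketches above.

```python
# Hedged sketch of the core of the test statistic T (without the constant a).
# `groups` is a list of groups, each a list of adjacency arrays.
import numpy as np

def t_statistic_unnormalized(groups):
    pooled = [A for group in groups for A in group]
    total = 0.0
    for group in groups:
        M_i = average_network(group)                                 # group mean
        d_within = np.mean([edit_distance(A, M_i) for A in group])   # \bar{d}_{G^i}(M_i)
        d_pooled = np.mean([edit_distance(A, M_i) for A in pooled])  # \bar{d}_{G}(M_i)
        total += d_within - d_pooled
    return total
```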

Identification

Let us suppose that the ANOVA test for networks rejects the null hypothesis, so that the main goal is now to identify the network differences. Two main objectives are discussed:

Identification of all the links that show statistical differences between groups.

Identification of a set of nodes (a subnetwork) that presents the highest network differences between groups.

The identification procedure we describe below aims to eliminate the noise (links or nodes without differences between subpopulations) while keeping the signal (links or nodes with differences between subpopulations).

Given a network G = (V, E) and a subset of links $\tilde{E}\subset E$, let us denote by $G_{\tilde{E}}$ the subnetwork with the same nodes but only the links in $\tilde{E}$; the rest of the links are erased. Given a subset of nodes $\tilde{V}\subset V$, let us denote by $G_{\tilde{V}}$ the subnetwork containing only the nodes in $\tilde{V}$, with the links between them. The T statistic for the sample of networks restricted to the links in $\tilde{E}$ is denoted $T_{\tilde{E}}$, and the T statistic computed for the sample networks restricted to the nodes in $\tilde{V}$ is denoted $T_{\tilde{V}}$.

The procedure we propose for identifying all the links that show statistical differences between groups is based on the minimization of $T_{\tilde{E}}$ over $\tilde{E}\subset E$. The set of links $\bar{E}$ defined by

$$\bar{E}=\underset{\tilde{E}\subset E}{\arg\min}\ T_{\tilde{E}}$$

contains all the links that show statistical differences between subpopulations. One limitation of this identification procedure is that the search space is huge ($\#E=2^{n(n-1)/2}$, where n is the number of nodes), and an efficient algorithm is needed to find the minimum. That is why we focus instead on identifying a group of nodes (or a subnetwork) expressing the largest differences.

The procedure proposed for identifying the subnetwork with the highest statistical differences between groups is similar to the previous one and is based on the minimization of $T_{\tilde{V}}$. The set of nodes N defined by

$$N=\underset{\tilde{V}\subset V}{\arg\min}\ T_{\tilde{V}}$$

contains all the relevant nodes; these nodes make up the subnetwork with the largest difference between groups. In this case, the complexity is smaller, since the search space is not as big ($\#V=2^{n}-n-1$).

As in other well-known statistical procedures, such as cluster analysis or variable selection in regression models, finding the size $\tilde{j}:=\#N$ of the true subnetwork is a difficult problem, due to possible overestimation from noisy data. The advantage of knowing $\tilde{j}$ is that it reduces the computational complexity of finding the minimum to an order of $n^{\tilde{j}}$, instead of the $2^{n}$ required to search over all possible sizes. However, the problem in our setup is less severe than in other cases, since the objective function ($T_{\tilde{V}}$) is not monotonic as the size of the space increases. To solve this problem, we suggest the following algorithm.

Let $V_{\{j\}}$ be the space of subnetworks with j distinguishable nodes, $j\in\{2,3,\ldots,n\}$, and $V=\cup_{j}V_{\{j\}}$. The nodes

$$N_{j}=\underset{\tilde{V}\in V_{\{j\}}}{\arg\min}\ T_{\tilde{V}}$$

define a subnetwork, and we write $T_{j}:=T_{N_{j}}$. In order to find the true subnetwork with differences between the groups, we study the sequence $T_2,T_3,\ldots,T_n$, continuing the search (increasing j) until we find $\tilde{j}$ fulfilling

$$T_{\tilde{j}+1}-T_{\tilde{j}}\ \ge\ -g(\text{sample size}),$$

where g is a positive function that decreases with the sample size (in practice, a real value). $N_{\tilde{j}}$ are the nodes that make up the subnetwork with the largest differences among the groups or subpopulations studied.
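A brute-force sketch of this search follows (the helper names and the tolerance value g are our assumptions; it reuses `t_statistic_unnormalized` from above and is feasible only for small n, given the combinatorial cost just discussed):

```python
from itertools import combinations
import numpy as np

def restrict(A, nodes):
    """Adjacency matrix of the subnetwork induced by `nodes`."""
    return A[np.ix_(nodes, nodes)]

def identify_subnetwork(groups, n_nodes, g=0.5):
    """Scan j = 2, 3, ... and stop when T_{j+1} - T_j >= -g (ad hoc tolerance)."""
    best_nodes, prev_T = None, np.inf
    for j in range(2, n_nodes + 1):
        T_j, N_j = min(
            (t_statistic_unnormalized(
                [[restrict(A, c) for A in group] for group in groups]), c)
            for c in combinations(range(n_nodes), j)
        )
        if T_j - prev_T >= -g:      # the decrease is too small: stop
            return best_nodes
        best_nodes, prev_T = N_j, T_j
    return best_nodes
```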

It is important to mention that the procedures described above do not impose any assumption regarding the real connectivity differences between the populations. With additional hypotheses, the procedure can be improved. For instance, in 14 , 19 the authors proposed a methodology for the edge-identification problem that is powerful only when the links that truly differ between the populations form a single large connected component.

Examples and Applications

A relevant problem in the current neuroimaging research agenda is how to compare populations based on their brain networks. The ANOVA test presented above deals with this problem. Moreover, the ANOVA procedure allows the identification of the variables related to the brain network structure. In this section, we show an example and an application of this procedure in neuroimaging (EEG, MEG, fMRI, eCoG). In the example, we show the robustness of the testing and identification procedures under different sample sizes. In the application, we analyse fMRI data to understand which variables in the dataset are dependent on the brain network structure. Identifying these variables is also very important, because any fair comparison between two or more populations requires these variables to be controlled (i.e., to have similar values across groups).

Let us suppose we have three groups of subjects with equal sample size, K , and the brain network of each subject is studied using 16 regions (electrodes or voxels). Studies show connectivity between certain brain regions is different in certain neuropathologies, in aging, under the influence of psychedelic drugs, and more recently, in motor learning 20 , 21 . Recently, we have shown that a simple way to study connectivity is by what the physics community calls “the correlation function” 22 . This function describes the correlation between regions as a function of the distance between them. Although there exist long range connections, on average, regions (voxels or electrodes) closer to each other interact strongly, while distant ones interact more weakly. We have shown that the way in which this function decays with distance is a marker of certain diseases 23 , 24 , 25 . For example, patients with a traumatic brachial plexus lesion with root avulsions revealed a faster correlation decay as a function of distance in the primary motor cortex region corresponding to the arm 24 .

Next we present a toy model that analyses the method's performance. In a network context, the behaviour described above can be modeled in the following way: since the probability that two regions are connected is a monotonic function of the correlation between them (i.e., on average, distant regions share fewer links than nearby regions), we decided to skip the correlations and directly model the link probability as an exponential function that decays with distance. We assume that the probability that region i is connected with region j is

$$P(i\leftrightarrow j)=e^{-\lambda\,d(i,j)},$$

where d ( i , j ) is the distance between regions i and j . For the alternative hypothesis, we consider that there are six frontal brain regions (see Fig.  1 Panel A) that interact with a different decay rate in each of the three subpopulations. Figure  1 panel (A) shows the 16 regions analysed on an x-y scale. Panel (B) shows the link probability function for all electrodes and for each subpopulation. As shown, there is a slight difference between the decay of the interactions between the frontal electrodes in each subpopulation ( λ 1 = 1, λ 2 = 0.8 and λ 3 = 0.6 for groups 1, 2 and 3, respectively). The aim is to determine whether the ANOVA test for networks detects the network differences that are induced by the link probability function.
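A sketch of this toy-model simulator follows (the exact x-y electrode layout of Fig. 1A is not reproduced here; `coords` is any array of electrode positions and `frontal` the set of frontal-node indices):

```python
import numpy as np
rng = np.random.default_rng(0)

def simulate_network(coords, frontal, lam_frontal, lam_other=1.0):
    """Draw one network: P(i <-> j) = exp(-lambda * d(i, j)), with a
    different decay rate among the frontal nodes only."""
    n = len(coords)
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(coords[i] - coords[j])
            lam = lam_frontal if (i in frontal and j in frontal) else lam_other
            A[i, j] = A[j, i] = rng.random() < np.exp(-lam * d)
    return A
```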

Figure 1. Detection problem. (A) Diagram of the scalp (each node represents an EEG electrode) on an x-y scale, and the link probability; the three groups satisfy $P(\circ\leftrightarrow\bullet)=P(\bullet\leftrightarrow\bullet)=e^{-d}$. (B) Link probability between frontal electrodes, $P(\circ\leftrightarrow\circ)$, as a function of distance for each of the three subpopulations. (C) Power of the tests as a function of sample size, K. Both tests are presented.

Here we investigated the power of the proposed test by simulating the model under different sample sizes ( K ). K networks were computed for each of the three subpopulations and the T statistic was computed for each of 10,000 replicates. The proportion of replicates with a T value smaller than −1.65 is an estimation of the power of the test for a significance level of 0.05 (unilateral hypothesis testing). Star symbols in Fig.  1C represent the power of the test for the different sample sizes. For example, for a sample size of 100, the test detects this small difference between the networks 100% of the time. As expected, the test has less power for small sample sizes, and if we change the values λ 2 and λ 3 in the model to 0.66 and 0.5, respectively, power increases. In this last case, the power changed from 64% to 96% for a sample size of 30 (see Supplementary Fig.  S1 for the complete behaviour).
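The power experiment can be sketched as follows; `t_stat` stands for an implementation of the properly normalized T statistic (with the constant a from the supplement), which we have not reproduced, and `simulate_network` is the sketch above:

```python
def estimated_power(t_stat, coords, frontal, K,
                    lams=(1.0, 0.8, 0.6), reps=10_000, crit=-1.65):
    """Fraction of replicates rejected at the 5% level (one-sided test)."""
    hits = 0
    for _ in range(reps):
        groups = [[simulate_network(coords, frontal, lam) for _ in range(K)]
                  for lam in lams]
        hits += t_stat(groups) < crit
    return hits / reps
```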

To the best of our knowledge, the T statistic is the first proposal of an ANOVA test for networks. Thus, here we compare it with a naive test in which each individual link is compared among the subpopulations: for each link, we compute a test for equal proportions between the three groups, obtaining one p-value per link. Since we are conducting multiple comparisons, we apply the Benjamini-Hochberg procedure controlling at a significance level of α = 0.05. The procedure is as follows:

1. Compute the p-value of each link comparison, pv 1 , pv 2 , …, pv m .

2. Find the largest j such that $pv_{(j)}\le\frac{j}{m}\alpha$, where $pv_{(1)}\le\cdots\le pv_{(m)}$ are the ordered p-values.

3. Declare that the link probability is different for all links that have a p-value ≤  pv ( j ) .

This procedure detects differences in the individual links while controlling for multiple comparisons. Finally, we consider the networks as being different if at least one link (of the 15 that have real differences) was detected to have significant differences. We will call this procedure the “Links Test”. Crosses in Fig.  1C correspond to the power of this test as a function of the sample size. As can be observed, the test proposed for testing equal mean networks is much more powerful than the previous test.
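A sketch of the Links Test follows (the per-link chi-squared contingency test is our choice for the "test for equal proportions"; the paper does not specify which one it uses):

```python
import numpy as np
from scipy.stats import chi2_contingency

def links_test(groups, alpha=0.05):
    """Per-link test of equal link proportions across groups, with
    Benjamini-Hochberg control at level alpha. Returns the significant links."""
    n = groups[0][0].shape[0]
    links = list(zip(*np.triu_indices(n, k=1)))
    pvals = []
    for i, j in links:
        present = [int(sum(A[i, j] for A in g)) for g in groups]
        table = np.array([[p, len(g) - p] for p, g in zip(present, groups)])
        if table[:, 0].sum() == 0 or table[:, 1].sum() == 0:
            pvals.append(1.0)   # link always absent or always present: no evidence
            continue
        pvals.append(chi2_contingency(table)[1])
    pvals = np.array(pvals)
    order = np.argsort(pvals)
    m = len(pvals)
    passed = pvals[order] <= alpha * np.arange(1, m + 1) / m
    k = passed.nonzero()[0].max() + 1 if passed.any() else 0
    return [links[idx] for idx in order[:k]]   # links declared different
```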

Theorem 1 states that T is asymptotically (sample size → ∞) Normal(0, 1) under the null hypothesis. Next, we investigated how large the sample size must be to obtain a good approximation; we applied Theorem 1 in the simulations above for K = {30, 50, 70, 100}, but we had not yet shown that the approximation is valid for, say, K = 30. Here, we show that the normal approximation is valid even for K = 30 in the case of 16-node networks. We simulated 10,000 replicates of the model considering that all three groups have exactly the same probability law, given by group 1; i.e., all brain connections satisfy $P(i\leftrightarrow j)={e}^{-{\lambda }_{1}d(i,j)}$ for the three groups (the H_0 hypothesis). The T value was computed for each replicate of sample size K = 30, and the distribution is shown in Fig. 2(A). The histogram shows that the distribution is very close to normal. Moreover, the Kolmogorov-Smirnov test did not reject the hypothesis of a normal distribution for the T statistic (p-value = 0.52). For sample sizes smaller than 30, the distribution has more variance; for example, for K = 10, the standard deviation of T is 1.1 instead of 1 (see Supplementary Fig. S2). This deviation from a normal distribution can also be observed in panel B, where we show the percentage of Type I errors as a function of the sample size (K). For sample sizes smaller than 30, this percentage is slightly greater than 5%, which is consistent with a variance greater than 1. The Links Test procedure yielded a Type I error percentage smaller than 5% for small sample sizes.

Figure 2. Null hypothesis. (A) Histogram of the T statistic for K = 30. (B) Percentage of Type I error as a function of sample size, K. Both tests are presented.

Finally, we applied the subnetwork identification procedure described above to this example. Fifty simulations were performed for the model with a sample size of K = 100. For each replication, the minimum statistic $T_j$ was studied as a function of the number of nodes j in the subnetwork. Figure 3A and B show two of the 50 simulation outcomes for $T_j$ as a function of the number of nodes j. Panel A shows that, as nodes are incorporated into the subnetwork, the statistic decreases sharply up to six nodes; incorporating further nodes produces only a very small decay of $T_j$ in the region between six and nine nodes, and adding even more nodes makes the statistic increase. A similar behaviour is observed in the simulation shown in panel B, but there the "change point" appears at five nodes. If we define the number of nodes with differences, $\tilde{j}$, as the first value fulfilling

$$T_{\tilde{j}+1}-T_{\tilde{j}}\ \ge\ -g,$$

we obtain the values circled. For each of the 50 simulations, we studied the value $\tilde{j}$, and a histogram of the results is shown in Panel C. With this criterion, most of the simulations (85%) result in a subnetwork of 6 nodes, as expected. Moreover, these 6 nodes correspond to the real subnetwork with differences between subpopulations (white nodes in Fig. 1A); this was observed in 100% of the simulations with $\tilde{j}=6$ (blue circles in Panel D). In the simulations where this value was 5, five of the six true nodes were identified, and which five were identified varied between simulations (grey circles in Panel D). For the simulations where $\tilde{j}=7$, all six real nodes were identified, plus one false node (grey circle) that changed between simulations.

Figure 3. Identification problem. (A,B) Statistic $T_j$ as a function of the number of nodes j in the subnetwork, for two simulations; blue circles mark the value $\tilde{j}$ chosen by the criterion described in the text. (C) Histogram of the number of subnetwork nodes showing differences, $\tilde{j}$. (D) Identification of the nodes: blue and grey circles represent the nodes identified in the set $N_{\tilde{j}}$. Blue-circled nodes are those identified 100% of the time; grey circles represent nodes identified only some of the time. On the left, grey circles alternate among the six white nodes; on the right, the grey circle alternates among the black nodes.

The identification procedure was also studied for a smaller sample size of K = 30; in this case, the real subnetwork was identified only 28% of the time (see Supplementary Fig. S3 for more details). Identifying the correct subnetwork is more difficult (larger sample sizes are needed) than detecting global differences between the group networks.

Resting-state fMRI functional networks

In this section, we analysed resting-state fMRI data from the 900-participant 2015 release of the Human Connectome Project (HCP 26 ). We included data from the 812 healthy participants who had four complete 15-minute rs-fMRI runs, for a total of one hour of brain activity. We partitioned the 812 participants into three subgroups and studied the differences between the group brain networks. Clearly, if the participants are randomly divided into groups, no subgroup differences are expected, but if the participants are divided in an intentional way, differences may appear. For example, if we divided the 812 participants by the number of hours slept before the scan (G_1 less than 6 hours, G_2 between 6 and 7 hours, and G_3 more than 7), one might expect 27 , 28 to observe differences in brain connectivity on the day of the scan. Moreover, as a by-product, we would conclude that this variable is an important factor to control before the scan. Fortunately, HCP provides rich individual socio-demographic, behavioural and structural brain data to facilitate this analysis. Using a previous release of the HCP data (461 subjects), Smith et al. 29 , using a multivariate analysis (canonical correlation), showed that a linear combination of demographic and behavioural variables correlates highly with a linear combination of functional interactions between brain parcellations (obtained by Independent Component Analysis). Our approach has the same spirit, but with some differences. In our case, the main objective is to identify variables that "explain" (that are dependent with) the individual brain network. We study the brain network as a whole object, without different "loads" on each edge, and our method does not impose any kind of linearity between non-imaging and imaging variables: it detects both linear and non-linear dependence structures.

Data were pre-processed by HCP 30 , 31 , 32 (details can be found in 30 ), yielding the following outputs:

Group-average brain regional parcellations obtained by means of group-Independent Component Analysis (ICA 33 ). Fifteen components are described.

Subject-specific time series per ICA component.

Figure 4(A) shows three of the 15 ICA components with the corresponding one-hour time series for a particular subject. These signals were used to construct an association matrix between pairs of ICA components per subject. This matrix represents the strength of the association between each pair of components, which can be quantified by different functional coupling metrics; here we adopted the Pearson correlation coefficient between the component signals (panel (B)). For each of the 812 subjects, we studied functional connectivity by transforming each correlation matrix, Σ, into a binary matrix or network, G (panel (C)). Two criteria for this transformation were used 34 , 35 , 36 : a fixed correlation threshold and a fixed number of links. In the first, the matrix is thresholded by a value ρ, yielding networks with varying numbers of links. In the second, the number of links is fixed and a subject-specific threshold is chosen accordingly.
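A sketch of the two binarization criteria (function names are ours):

```python
import numpy as np

def threshold_network(corr, rho):
    """Fixed correlation threshold: keep links with correlation above rho."""
    A = (corr > rho).astype(int)
    np.fill_diagonal(A, 0)
    return A

def fixed_links_network(corr, L):
    """Fixed number of links: keep the L strongest correlations
    (an implicit subject-specific threshold)."""
    n = corr.shape[0]
    iu = np.triu_indices(n, k=1)
    top = np.argsort(corr[iu])[-L:]     # indices of the L largest correlations
    A = np.zeros((n, n), dtype=int)
    A[iu[0][top], iu[1][top]] = 1
    return A + A.T
```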

Figure 4. (A) ICA components and their corresponding time series. (B) Correlation matrix of the time series. (C) Network representation; the links correspond to the nine highest correlations.

As we have already mentioned, HCP provides rich individual socio-demographic, behavioural and structural brain data. Variables are grouped into seven main categories: alertness, motor response, cognition, emotion, personality, sensory, and brain anatomy. The volume, thickness and area of different brain regions were computed from the T1-weighted images of each subject in FreeSurfer 37 . Thus, for each subject, we obtained a brain functional network, G, and a multivariate vector X that contains this non-imaging and anatomical information.

The main focus of this section is to analyse the "impact" of each of these variables (X) on the brain networks (i.e., on brain activity). To this end, we first selected a variable, say the k-th one, X_k, and assigned each subject to exactly one of three categories (Low, Medium, or High) by sorting the values in ascending order and splitting at the 33.3rd and 66.7th percentiles. In this way, we obtained three groups of subjects, each identified by its correlation matrices $\Sigma_1^L,\ldots,\Sigma_{n_L}^L$, $\Sigma_1^M,\ldots,\Sigma_{n_M}^M$, and $\Sigma_1^H,\ldots,\Sigma_{n_H}^H$, or by the corresponding networks (once the criterion and its parameter are chosen) $G_1^L,\ldots,G_{n_L}^L$, $G_1^M,\ldots,G_{n_M}^M$, and $G_1^H,\ldots,G_{n_H}^H$. The sample size of each group (n_L, n_M, and n_H) is approximately one third of 812, except in cases where there were ties. Once we obtained these three sets of networks, we applied the test developed above. If differences exist between the three groups, then we have evidence of dependence between the factoring variable and the functional networks. However, we cannot yet elucidate directionality (do different networks lead to different sleeping patterns, or vice versa?).
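A sketch of the grouping step (tertile split by a variable X_k):

```python
import numpy as np

def tertile_groups(networks, x):
    """Split subjects into Low/Medium/High groups by the tertiles of X_k."""
    q1, q2 = np.percentile(x, [33.3, 66.7])
    low  = [G for G, v in zip(networks, x) if v <= q1]
    med  = [G for G, v in zip(networks, x) if q1 < v <= q2]
    high = [G for G, v in zip(networks, x) if v > q2]
    return low, med, high
```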

After filtering the data, we identified 221 variables with 100% complete information for the 812 subjects, and 90 other variables with almost complete information, giving a total of 311 variables. We applied the network ANOVA test for each of these 311 variables and report the T statistic. Figure 5(A) shows the T statistic for the variable Thickness of the right Inferior Parietal region. All T values lie between −2 and 2 for all ρ values under the fixed correlation criterion (left panel) for constructing the networks, and the same occurs under the fixed number of links criterion (right panel). According to Theorem 1, when there are no differences between groups, T is asymptotically Normal(0, 1), and therefore a value smaller than −3 is very unlikely (p-value = 0.00135). Since all T values are between −2 and 2, we assert that the thickness of the right Inferior Parietal region is not associated with the resting-state functional interactions. In panel (B), we show the T statistic for the variable Amount of hours spent sleeping on the 30 nights prior to the scan ("During the past month, how many hours of actual sleep did you get at night? (This may be different than the number of hours you spent in bed.)"), which belongs to the alertness category. As one can see, most T values are much lower than −3, rejecting the hypothesis of equal mean networks. Importantly, this shows that the number of hours a person sleeps is associated with their brain functional networks (or brain activity). However, as explained above, we do not know whether the number of hours slept on the nights before the scan represents these individuals' habitual sleeping patterns, complicating any effort to infer causation. In other words, six hours of sleep for an individual who habitually sleeps six hours may not produce the same network pattern as six hours for an individual who normally sleeps eight hours (and is likely tired during the scan). Alternatively, different activity during waking hours may "produce" different sleep behaviours. Nevertheless, we know that the number of hours slept before the scan should be measured and controlled when scanning a subject. In panel (C), we show that brain volumetric variables can also influence resting-state fMRI networks: the T values for the variable Area of the left Middle Temporal region show significant differences under both network construction criteria.

Figure 5. (A-C) T statistics as a function of (left panels) ρ and (right panels) the number of links, for three variables: (A) Right Inferior Parietal thickness, (B) number of hours slept on the nights prior to the scan, and (C) Left Middle Temporal area. (D) W-statistic distribution (black bars) based on a bootstrap strategy; the W statistics of the three variables studied are depicted with dots.

Under the hypothesis of equal mean networks between groups, we expect not to obtain a T statistic less than −3 when comparing the sample networks. We tested several different thresholds and numbers of links in order to present a more robust methodology. However, in this way, we generate sets of networks that are dependent within each criterion and between criteria, similarly to what happens when studying dynamic networks with overlapping sliding windows. This makes the statistical inference more difficult. To address this problem, we decided to define a new statistic based on T, called $W_3$, and study its distribution using the bootstrap resampling technique. The new statistic is defined as

$$W_3=\min\left\{{\Delta}_{+}^{\rho},\,{\Delta}_{-}^{\rho},\,{\Delta}_{+}^{L},\,{\Delta}_{-}^{L}\right\},$$

where Δ is the number of values of T that are lower than −3 over the resolution (grid of thresholds) studied. The superscript of Δ indicates the criterion (correlation threshold ρ, or fixed number of links L) and the subscript indicates whether it refers to positive or negative parameter values (of ρ or of the number of links). For example, Fig. 5(C) reveals that the variable Area of the left Middle Temporal region has ${\Delta}_{+}^{\rho}=10$, ${\Delta}_{-}^{\rho}=10$, ${\Delta}_{+}^{L}=9$, and ${\Delta}_{-}^{L}=9$, and therefore $W_3=9$. The distribution of $W_3$ under the null hypothesis is studied numerically: ten thousand random resamplings of the real networks were drawn and the $W_3$ statistic was computed for each one. Figure 5(D) shows the empirical $W_3$ distribution (under the null hypothesis) in black bars. Most $W_3$ values are zero, as expected. In this figure, the $W_3$ values of the three variables described are also represented by dots. The extreme values of $W_3$ for the variables Amount of Sleep and Left Middle Temporal Area confirm that these differences are not a matter of chance; both variables are related to brain network connectivity.
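A sketch of the $W_3$ summary, mirroring the min-of-four-counts reconstruction above (itself inferred from the worked example):

```python
import numpy as np

def w3(T_rho_pos, T_rho_neg, T_L_pos, T_L_neg, cut=-3.0):
    """W3: the minimum, over the four grids (rho / fixed links, positive /
    negative parameter values), of the number of T values below the cut."""
    deltas = [int(np.sum(np.asarray(T) < cut))
              for T in (T_rho_pos, T_rho_neg, T_L_pos, T_L_neg)]
    return min(deltas)
```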

So far we have shown, among other things, that functional networks differ between individuals who get more or fewer hours of sleep, but how exactly do these networks differ? Fig. 6(A) shows the average networks for the three groups of subjects. There are differences in connectivity strength between some of the nodes (ICA components). These differences are more evident in panel (B), which presents a weighted network Ψ whose links show the variability among the subpopulations' average networks. This weighted network is defined as

$$\Psi(i,j)=\sum_{s=1}^{3}\left|{\mathcal{M}}^{\mathrm{grp}\,s}(i,j)-\overline{\mathcal{M}}(i,j)\right|,$$

where $\overline{\mathcal{M}}(i,j)=\frac{1}{3}\sum_{s=1}^{3}{\mathcal{M}}^{\mathrm{grp}\,s}(i,j)$. The role of Ψ is to highlight the differences between the mean networks. The greatest difference is observed between nodes 1 and 11. Individuals who sleep 6.5 hours or less show the strongest connection between ICA component number 1 (which corresponds to the occipital pole and the cuneal cortex in the occipital lobe) and ICA component number 11 (which includes the middle and superior frontal gyri in the frontal lobe, and the superior parietal lobule and the angular gyrus in the parietal lobe). Another important connection that differs between groups is the one between ICA components 1 and 8; the latter corresponds to the anterior and posterior lobes of the cerebellum. Using the subnetwork identification procedure previously described (see Fig. 6C), we identified a 7-node subnetwork as the most significant for network differences. The nodes that make up that network are presented in panel D.
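A sketch of Ψ under the absolute-deviation reading reconstructed above (the exact deviation measure is an assumption):

```python
import numpy as np

def psi(group_means):
    """Entrywise deviation of each group's average network from the grand
    mean of the group averages (absolute deviations; an assumption here)."""
    M_bar = np.mean(group_means, axis=0)
    return np.sum([np.abs(M - M_bar) for M in group_means], axis=0)
```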

Figure 6. (A) Average network for each subgroup defined by hours of sleep. (B) Weighted network with links that represent the differences among the subpopulation mean networks. (C) $T_j$ statistic as a function of the number of nodes j in each subnetwork; the nodes identified by the minimum $T_j$ are presented in the boxes, while the number of nodes identified by the procedure is marked with a red circle. (D) Nodes from the identified subnetwork are circled in blue; they correspond to those in panel (B).

The results described above refer to only three of the 311 variables we analysed. Among the remaining variables, we found several more that partitioned the subjects into groups with statistical differences between the corresponding brain networks. Two more behavioural variables were identified: Dimensional Change Card Sort (CardSort_AgeAdj and CardSort_Unadj), a measure of cognitive flexibility, and motor strength (Strength_AgeAdj and Strength_Unadj). In addition, 20 different brain volumetric variables were identified; the complete list is shown in Suppl. Table S1. It is important to note that these brain volumetric variables are largely dependent on each other; for example, individuals with larger inferior-temporal areas often have a greater supratentorial volume, and so on (see Suppl. Fig. S4).

We have reported only those variables for which there is very strong statistical evidence of dependence between the functional networks and the "behavioural" variables, irrespective of the threshold used to build the networks. Other variables show this dependence only for some values of the threshold parameter, but we do not report these, to avoid presenting results that may not be significant. Our results complement those observed in 29 . In particular, Smith et al. report that the variable Picture Vocabulary test is the most significant. With a less restrictive criterion, this variable can also be considered significant under our methodology: its $W_3$ value equals 3 (see Supplementary Fig. S5 for details), which supports the notion (see panel D in Fig. 5) that the Picture Vocabulary test is also relevant for explaining the functional networks. On the other hand, the variable we found to be most significant ($W_3 = 9$), the Amount of sleep, is not reported by Smith et al. Perhaps the canonical correlation cannot find this variable because it looks for linear correlations in a high-dimensional space; it is well known that non-linearities typically appear in high-dimensional statistical problems (see, for instance, 38 ). To capture nonlinear associations, kernel CCA methods were introduced; see 39 , 40 and the references therein. By contrast, our method does not impose any kind of linearity and detects both linear and non-linear dependence structures. The variable Cognitive flexibility (Card Sort) found here was also reported in 29 . Finally, the brain volumetric variables we found to be relevant here were not analysed in 29 .

So far, we have applied the methodology to brain data using only the 15 ICA dimensions provided by HCP. But what is the impact of working with more ICA components? Do we identify more covariates? Fortunately, we can answer these questions, since parcellations with more ICA dimensions were recently made available on the HCP webpage. Three new cognitive variables, Working memory, Relational processing and Self-regulation/Impulsivity, were identified at higher network dimensions (50 and 300 ICA dimensions; see Suppl. Table S2 for details).

Discussion

Performing statistical inference on brain networks is important in neuroimaging. In this paper, we presented a new method for comparing anatomical and functional brain networks of two or more subgroups of subjects. Two problems were studied: the detection of differences between the groups and the identification of the specific network differences. For the first problem, we developed an ANOVA test based on the distance between networks. This test performed well in terms of detecting existing differences (high statistical power). Finally, based on the statistics developed for the testing problem, we proposed a way of solving the identification problem. Next, we discuss our findings.

Based on the minimization of the T statistic, we propose a method for identifying the subnetwork that differs among the subgroups. This subnetwork is very useful. On the one hand, it allows us to understand which brain regions are involved in the specific comparison study (neurobiological interpretation), and on the other, it allows us to identify/diagnose new subjects with greater accuracy.

The relationship between the minimum T value for a fixed number of nodes and the number of nodes ($T_j$ vs. j) is very informative. A large decrease in $T_j$ when incorporating a new node into the subnetwork ($T_{j+1} \ll T_j$) means that the new node and its connections explain much of the difference between groups. A very small decrease shows that the new node explains little of the difference, either because the subgroup difference is small for the connections of the new node, or because there is a problem of overestimation.

The correct number of nodes in each subnetwork must verify

$$T_{\tilde{j}+1}-T_{\tilde{j}}\ \ge\ -g(\text{sample size}).$$

In this paper, we used ad hoc criteria in each example (a certain constant for g(sample size)) and we do not give a general formula for g(sample size). We believe that this could be improved in theory; in practice, one can propose a natural way to define the upper bound and subsequently identify the subnetwork, as we showed in the example and in the application, by observing $T_j$ as a function of j. Statistical methods such as those developed for change-point detection may be useful in solving this problem.

Sample size

What is an adequate sample size for comparing brain networks? This is typically the first question in any comparison study. Clearly, the answer depends on the magnitude of the network differences between the groups and on the power of the test. If the subpopulations differ greatly, then a moderate number of networks in each group is enough. On the other hand, if the differences are not very big, then a larger sample size is required to achieve reasonable power of detection. The problem becomes more complicated when it comes to identification. We showed in the example that we obtain a good identification rate when a sample size of 100 networks is selected from each subgroup, whereas the rate of correct identification is low for a sample size of, say, 30.

Confounding variables in Neuroimaging

Humans are highly variable in their brain activity, which can be influenced, in turn, by their level of alertness, mood, motivation, health and many other factors. Even the amount of coffee drunk prior to the scan can greatly influence resting-state neural activity. What variables must be controlled to make a fair comparison between two or more groups? Certainly age, gender, and education are among those variables, and in this study we found that the number of hours slept on the nights prior to the scan is also relevant. Although this might seem fairly obvious, to the best of our knowledge most studies do not control for this variable. Five other variables were identified, each one related to some dimension of cognitive flexibility, self-regulation/impulsivity, relational processing, working memory or motor strength. Finally, we identified as relevant a set of 20 highly interdependent brain volumetric variables. In principle, the role of these variables is not surprising, since comparing brain activity between individuals requires one to pre-process the images by realigning and normalizing them to a standard brain. In other words, the relevance of specific area volumes may simply be a by-product of the standardization process. However, if our finding that brain volumetric variables affect functional networks is replicated in other studies, this poses a problem for future experimental designs. Specifically, groups will not only have to be matched on variables such as age, gender and education level, but also on volumetric variables, which can only be observed in the scanner. Therefore, several individuals would have to be scanned before selecting the final study groups.

In sum, a large number of subjects in each group must be tested to obtain highly reproducible findings when analysing resting-state data with network methodologies. Also, whenever possible, the same participants should be tested both as controls and as the treatment group (paired samples) in order to minimize the impact of brain volumetric variables.

References

1. Deco, G. & Kringelbach, M. L. Great expectations: using whole-brain computational connectomics for understanding neuropsychiatric disorders. Neuron 84, 892–905 (2014).
2. Stephan, K. E., Iglesias, S., Heinzle, J. & Diaconescu, A. O. Translational perspectives for computational neuroimaging. Neuron 87, 716–732 (2015).
3. Bullmore, E. & Sporns, O. Complex brain networks: graph theoretical analysis of structural and functional systems. Nature Reviews Neuroscience 10, 186–198 (2009).
4. Fornito, A., Zalesky, A. & Bullmore, E. Fundamentals of Brain Network Analysis. Elsevier (2016).
5. Anonymous. Focus on human brain mapping. Nature Neuroscience 20, 297–298 (2017).
6. Button, K. S. et al. Power failure: why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience 14, 365–376 (2013).
7. Poldrack, R. Scanning the horizon: towards transparent and reproducible neuroimaging research. Nature Reviews Neuroscience 18, 115–126 (2017).
8. Nichols, T. E. et al. Best practices in data analysis and sharing in neuroimaging using MRI. Nature Neuroscience 20, 299–303 (2017).
9. Bennett, C. M. & Miller, M. B. How reliable are the results from functional magnetic resonance imaging? Annals of the New York Academy of Sciences 1191, 133–155 (2010).
10. Brown, E. N. & Behrmann, M. Controversy in statistical analysis of functional magnetic resonance imaging data. Proceedings of the National Academy of Sciences USA 114, E3368–E3369 (2017).
11. Fraiman, D., Fraiman, N. & Fraiman, R. Non-parametric statistics of dynamic networks with distinguishable nodes. Test 26, 546–573 (2017).
12. Cerqueira, A., Fraiman, D., Vargas, C. & Leonardi, F. A test of hypotheses for random graph distributions built from EEG data. IEEE Transactions on Network Science and Engineering 4, 75–82 (2017).
13. Kolar, M., Song, L., Ahmed, A. & Xing, E. Estimating time-varying networks. Annals of Applied Statistics 4, 94–123 (2010).
14. Zalesky, A., Fornito, A. & Bullmore, E. Network-based statistic: identifying differences in brain networks. Neuroimage 53, 1197–1207 (2010).
15. Sanfeliu, A. & Fu, K. A distance measure between attributed relational graphs. IEEE Transactions on Systems, Man, and Cybernetics 13, 353–363 (1983).
16. Schieber, T. et al. Quantification of network structural dissimilarities. Nature Communications 8, 13928 (2017).
17. Shimada, Y., Hirata, Y., Ikeguchi, T. & Aihara, K. Graph distance for complex networks. Scientific Reports 6, 34944 (2016).
18. Gao, X., Xiao, B., Tao, D. & Li, X. A survey of graph edit distance. Pattern Analysis and Applications 13, 113–129 (2010).
19. Zalesky, A., Cocchi, L., Fornito, A., Murray, M. & Bullmore, E. Connectivity differences in brain networks. Neuroimage 60, 1055–1062 (2012).
20. Della-Maggiore, V., Villalta, J. I., Kovacevic, N. & McIntosh, A. R. Functional evidence for memory stabilization in sensorimotor adaptation: a 24-h resting-state fMRI study. Cerebral Cortex 27, 1748–1757 (2015).
21. Mawase, F., Bar-Haim, S. & Shmuelof, L. Formation of long-term locomotor memories is associated with functional connectivity changes in the cerebellar-thalamic-cortical network. Journal of Neuroscience 37, 349–361 (2017).
22. Fraiman, D. & Chialvo, D. What kind of noise is brain noise: anomalous scaling behavior of the resting brain activity fluctuations. Frontiers in Physiology 3, 307 (2012).
23. Garcia-Cordero, I. et al. Stroke and neurodegeneration induce different connectivity aberrations in the insula. Stroke 46, 2673–2677 (2015).
24. Fraiman, D. et al. Reduced functional connectivity within the primary motor cortex of patients with brachial plexus injury. Neuroimage: Clinical 12, 277–284 (2016).
25. Dottori, M. et al. Towards affordable biomarkers of frontotemporal dementia: a classification study via network's information sharing. Scientific Reports 7, 3822 (2017).
26. Human Connectome Project. http://www.humanconnectomeproject.org/
27. Kaufmann, T. et al. The brain functional connectome is robustly altered by lack of sleep. NeuroImage 127, 324–332 (2016).
28. Krause, A. et al. The sleep-deprived human brain. Nature Reviews Neuroscience 18, 404–418 (2017).
29. Smith, S. et al. A positive-negative mode of population covariation links brain connectivity, demographics and behavior. Nature Neuroscience 18, 1565–1567 (2015).
30. Human Connectome Project. WU-Minn HCP 900 Subjects Data Release: Reference Manual, 67–87 (2015).
31. Griffanti, L. et al. ICA-based artefact removal and accelerated fMRI acquisition for improved resting state network imaging. Neuroimage 95, 232–247 (2014).
32. Smith, S. M. et al. Resting-state fMRI in the Human Connectome Project. Neuroimage 80, 144–168 (2013).
33. Beckmann, C., DeLuca, M., Devlin, J. & Smith, S. Investigations into resting-state connectivity using independent component analysis. Philosophical Transactions of the Royal Society of London B: Biological Sciences 360, 1001–1013 (2005).
34. Fraiman, D., Saunier, G., Martins, E. & Vargas, C. Biological motion coding in the brain: analysis of visually driven EEG functional networks. PLoS ONE 9, e84612 (2014).
35. Amoruso, L. et al. Brain network organization predicts style-specific expertise during Tango dance observation. Neuroimage 146, 690–700 (2017).
36. van den Heuvel, M. P. et al. Proportional thresholding in resting-state fMRI functional connectivity networks and consequences for patient-control connectome studies: issues and recommendations. Neuroimage 152, 437–449 (2017).
37. Fischl, B. FreeSurfer. Neuroimage 62, 774–781 (2012).
38. Bühlmann, P. & van de Geer, S. Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer (2011).
39. Yoshida, K., Yoshimoto, J. & Doya, K. Sparse kernel canonical correlation analysis for discovery of nonlinear interactions in high-dimensional data. BMC Bioinformatics 18, 108 (2017).
40. Yamanishi, Y., Vert, J. P., Nakaya, A. & Kanehisa, M. Extraction of correlated gene clusters from multiple genomic data by generalized kernel canonical correlation analysis. Bioinformatics 19, 323–330 (2003).

Acknowledgements

We thank two anonymous reviewers for extensive comments that helped improve the manuscript significantly. Data were provided by the Human Connectome Project, WU-Minn Consortium (Principal Investigators: David Van Essen and Kamil Ugurbil; 1U54MH091657) funded by the 16 NIH Institutes and Centers that support the NIH Blueprint for Neuroscience Research; and by the McDonnell Center for Systems Neuroscience at Washington University. This paper was produced as part of the activities of FAPESP Research, Innovation and Dissemination Center for Neuromathematics (Grant No. 2013/07699-0, S. Paulo Research Foundation). This work was partially supported by PAI UdeSA.

Author information

Authors and Affiliations

Departamento de Matemática y Ciencias, Universidad de San Andrés, Buenos Aires, Argentina

Daniel Fraiman

Consejo Nacional de Investigaciones Científicas y Tecnológicas, Buenos Aires, Argentina

Centro de Matemática, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay

Ricardo Fraiman

Instituto Pasteur de Montevideo, Montevideo, Uruguay


Contributions

D.F. and R.F. conceived the research, analysed the data and wrote the manuscript.

Corresponding author

Correspondence to Daniel Fraiman .

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article

Fraiman, D., Fraiman, R. An ANOVA approach for statistical comparisons of brain networks. Sci Rep 8 , 4746 (2018). https://doi.org/10.1038/s41598-018-23152-5


Received : 13 November 2017

Accepted : 06 March 2018

Published : 16 March 2018

DOI : https://doi.org/10.1038/s41598-018-23152-5


One-Way ANOVA

  • First Online: 13 March 2020


  • Ronald Christensen

Part of the book series: Springer Texts in Statistics (STS)

3429 Accesses

4 Citations

This chapter considers the analysis of the one-way ANOVA models originally exploited by R.A. Fisher.


Author information

Authors and Affiliations

Department of Mathematics and Statistics, University of New Mexico, Albuquerque, NM, USA

Ronald Christensen


Corresponding author

Correspondence to Ronald Christensen .


Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Christensen, R. (2020). One-Way ANOVA. In: Plane Answers to Complex Questions. Springer Texts in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-030-32097-3_4

Download citation

DOI : https://doi.org/10.1007/978-3-030-32097-3_4

Published : 13 March 2020

Publisher Name : Springer, Cham

Print ISBN : 978-3-030-32096-6

Online ISBN : 978-3-030-32097-3

eBook Packages: Mathematics and Statistics (R0)


The one-way ANOVA test explained

Affiliation

  • 1 University of Limerick, Limerick, Republic of Ireland.
  • PMID: 37317616
  • DOI: 10.7748/nr.2023.e1885

Background: Quantitative methods and statistical analysis are essential tools in nursing research: they support researchers in testing phenomena, illustrate findings clearly and accurately, and provide explanation or generalisation of the phenomenon being investigated. The most popular inferential statistical test is the one-way analysis of variance (ANOVA), the test designated for comparing the means of a study's target groups to identify whether they differ statistically from one another. However, the nursing literature has identified that statistical tests are not always used correctly and that findings are reported incorrectly.

Aim: To present and explain the one-way ANOVA.

Discussion: The article presents the purpose of inferential statistics and explains one-way ANOVA. It uses relevant examples to examine the steps needed to successfully apply the one-way ANOVA. The authors also provide recommendations for other statistical tests and measurements in parallel to one-way ANOVA.

Conclusion: Nurses need to develop their understanding and knowledge of statistical methods, to engage in research and evidence-based practice.

Implications for practice: This article enhances the understanding and application of one-way ANOVAs by nursing students, novice researchers, nurses and those engaged in academic studies. Nurses, nursing students and nurse researchers need to familiarise themselves with statistical terminology and develop their understanding of statistical concepts, to support evidence-based, quality, safe care.

Keywords: data analysis; quantitative research; research; study design.

©2023 RCN Publishing Company Ltd. All rights reserved. Not to be copied, transmitted or recorded in any way, in whole or part, without prior permission of the publishers.


Conflict of interest statement

None declared



One-way ANOVA | When and How to Use It (With Examples)

Published on March 6, 2020 by Rebecca Bevans . Revised on May 10, 2024.

ANOVA , which stands for Analysis of Variance, is a statistical test used to analyze the difference between the means of more than two groups.

A one-way ANOVA uses one independent variable , while a two-way ANOVA uses two independent variables.

Table of contents

  • When to use a one-way ANOVA
  • How does an ANOVA test work
  • Assumptions of ANOVA
  • Performing a one-way ANOVA
  • Interpreting the results
  • Post-hoc testing
  • Reporting the results of ANOVA
  • Other interesting articles
  • Frequently asked questions about one-way ANOVA

When to use a one-way ANOVA

Use a one-way ANOVA when you have collected data about one categorical independent variable and one quantitative dependent variable. The independent variable should have at least three levels (i.e. at least three different groups or categories).

ANOVA tells you if the dependent variable changes according to the level of the independent variable. For example:

  • Your independent variable is social media use, and you assign groups to low, medium, and high levels of social media use to find out if there is a difference in hours of sleep per night.
  • Your independent variable is brand of soda, and you collect data on Coke, Pepsi, Sprite, and Fanta to find out if there is a difference in the price per 100ml.
  • Your independent variable is type of fertilizer, and you treat crop fields with mixtures 1, 2, and 3 to find out if there is a difference in crop yield.

The null hypothesis (H₀) of ANOVA is that there is no difference among group means. The alternative hypothesis (Hₐ) is that at least one group differs significantly from the overall mean of the dependent variable.

If you only want to compare two groups, use a t test instead.


How does an ANOVA test work?

ANOVA determines whether the groups created by the levels of the independent variable are statistically different by calculating whether the means of the treatment levels are different from the overall mean of the dependent variable.

If any of the group means is significantly different from the overall mean, then the null hypothesis is rejected.

ANOVA uses the F test for statistical significance . This allows for comparison of multiple means at once, because the error is calculated for the whole set of comparisons rather than for each individual two-way comparison (which would happen with a t test).

The F test compares the variance between the group means with the variance within the groups. If the variance within groups is smaller than the variance between groups, the F test will return a higher F value, and therefore a higher likelihood that the difference observed is real and not due to chance.
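In symbols (standard notation, summarizing the paragraph above rather than reproducing a formula from the original page), the test statistic is the ratio of the two mean squares:

F = MS_between / MS_within = [SS_between / (k − 1)] / [SS_within / (N − k)]

where k is the number of groups and N is the total number of observations.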

Assumptions of ANOVA

The assumptions of the ANOVA test are the same as the general assumptions for any parametric test:

  • Independence of observations : the data were collected using statistically valid sampling methods , and there are no hidden relationships among observations. If your data fail to meet this assumption because you have a confounding variable that you need to control for statistically, use an ANOVA with blocking variables.
  • Normally-distributed response variable : The values of the dependent variable follow a normal distribution .
  • Homogeneity of variance : The variation within each group being compared is similar for every group. If the variances are different among the groups, then ANOVA probably isn’t the right fit for the data.

Performing a one-way ANOVA

While you can perform an ANOVA by hand, it is difficult to do so with more than a few observations. We will perform our analysis in the R statistical program because it is free, powerful, and widely available. For a full walkthrough of this ANOVA example, see our guide to performing ANOVA in R.

The sample dataset from our imaginary crop yield experiment contains data about:

  • fertilizer type (type 1, 2, or 3)
  • planting density (1 = low density, 2 = high density)
  • planting location in the field (blocks 1, 2, 3, or 4)
  • final crop yield (in bushels per acre).

This gives us enough information to run various different ANOVA tests and see which model is the best fit for the data.

For the one-way ANOVA, we will only analyze the effect of fertilizer type on crop yield.

[Image: sample dataset for ANOVA]

After loading the dataset into our R environment, we can use the command aov() to run an ANOVA. In this example we will model the differences in the mean of the response variable , crop yield, as a function of type of fertilizer.
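A minimal sketch of that call, assuming the data have been loaded into a data frame named crop_data with columns yield and fertilizer (both names are illustrative, not taken from the original dataset file):

```r
# One-way ANOVA: model crop yield as a function of fertilizer type.
crop_data$fertilizer <- as.factor(crop_data$fertilizer)  # treat fertilizer as categorical
one_way <- aov(yield ~ fertilizer, data = crop_data)
summary(one_way)  # prints the ANOVA table discussed below
```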


Interpreting the results

To view the summary of a statistical model in R, use the summary() function.

The summary of an ANOVA test (in R) looks like this:

[Image: one-way ANOVA summary output in R]

The ANOVA output provides an estimate of how much of the variation in the dependent variable can be explained by the independent variable.

  • The first column lists the independent variable along with the model residuals (aka the model error).
  • The Df column displays the degrees of freedom for the independent variable (calculated by taking the number of levels within the variable and subtracting 1), and the degrees of freedom for the residuals (calculated by taking the total number of observations minus 1, then subtracting the degrees of freedom used by each of the independent variables).
  • The Sum Sq column displays the sum of squares (a.k.a. the total variation) between the group means and the overall mean explained by that variable. The sum of squares for the fertilizer variable is 6.07, while the sum of squares of the residuals is 35.89.
  • The Mean Sq column is the mean of the sum of squares, which is calculated by dividing the sum of squares by the degrees of freedom.
  • The F value column is the test statistic from the F test: the mean square of each independent variable divided by the mean square of the residuals. The larger the F value, the more likely it is that the variation associated with the independent variable is real and not due to chance.
  • The Pr(>F) column is the p value of the F statistic. This shows how likely it is that the F value calculated from the test would have occurred if the null hypothesis of no difference among group means were true.

Because the p value of the independent variable, fertilizer, is statistically significant ( p < 0.05), it is likely that fertilizer type does have a significant effect on average crop yield.

Post-hoc testing

ANOVA will tell you if there are differences among the levels of the independent variable, but not which differences are significant. To find how the treatment levels differ from one another, perform a TukeyHSD (Tukey's Honestly Significant Difference) post-hoc test.

The Tukey test runs pairwise comparisons among each of the groups, and uses a conservative error estimate to find the groups which are statistically different from one another.
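Continuing the sketch above (same assumed object names):

```r
# Pairwise comparisons among fertilizer types with family-wise error control.
tukey <- TukeyHSD(one_way)
tukey        # differences, 95% confidence bounds, and adjusted p values
plot(tukey)  # optional: plot the confidence intervals of the pairwise differences
```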

The output of the TukeyHSD looks like this:

[Image: TukeyHSD summary for the one-way ANOVA]

First, the table reports the model being tested (‘Fit’). Next it lists the pairwise differences among groups for the independent variable.

Under the ‘$fertilizer’ section, we see the mean difference between each fertilizer treatment (‘diff’), the lower and upper bounds of the 95% confidence interval (‘lwr’ and ‘upr’), and the p value , adjusted for multiple pairwise comparisons.

The pairwise comparisons show that fertilizer type 3 has a significantly higher mean yield than both fertilizer 2 and fertilizer 1, but the difference between the mean yields of fertilizers 2 and 1 is not statistically significant.

Reporting the results of ANOVA

When reporting the results of an ANOVA, include a brief description of the variables you tested, the F value, degrees of freedom, and p values for each independent variable, and explain what the results mean.

If you want to provide more detailed information about the differences found in your test, you can also include a graph of the ANOVA results , with grouping letters above each level of the independent variable to show which groups are statistically different from one another:

[Image: one-way ANOVA results graph with grouping letters above each level]

Other interesting articles

If you want to know more about statistics, methodology, or research bias, make sure to check out some of our other articles with explanations and examples.

  • Chi square test of independence
  • Statistical power
  • Descriptive statistics
  • Degrees of freedom
  • Pearson correlation
  • Null hypothesis

Methodology

  • Double-blind study
  • Case-control study
  • Research ethics
  • Data collection
  • Hypothesis testing
  • Structured interviews

Research bias

  • Hawthorne effect
  • Unconscious bias
  • Recall bias
  • Halo effect
  • Self-serving bias
  • Information bias

Frequently asked questions about one-way ANOVA

The only difference between one-way and two-way ANOVA is the number of independent variables. A one-way ANOVA has one independent variable, while a two-way ANOVA has two.

  • One-way ANOVA : Testing the relationship between shoe brand (Nike, Adidas, Saucony, Hoka) and race finish times in a marathon.
  • Two-way ANOVA : Testing the relationship between shoe brand (Nike, Adidas, Saucony, Hoka), runner age group (junior, senior, master’s), and race finishing times in a marathon.

All ANOVAs are designed to test for differences among three or more groups. If you are only testing for a difference between two groups, use a t-test instead.

A factorial ANOVA is any ANOVA that uses more than one categorical independent variable . A two-way ANOVA is a type of factorial ANOVA.

Some examples of factorial ANOVAs include:

  • Testing the combined effects of vaccination (vaccinated or not vaccinated) and health status (healthy or pre-existing condition) on the rate of flu infection in a population.
  • Testing the effects of marital status (married, single, divorced, widowed), job status (employed, self-employed, unemployed, retired), and family history (no family history, some family history) on the incidence of depression in a population.
  • Testing the effects of feed type (type A, B, or C) and barn crowding (not crowded, somewhat crowded, very crowded) on the final weight of chickens in a commercial farming operation.

In ANOVA, the null hypothesis is that there is no difference among group means. If any group differs significantly from the overall group mean, then the ANOVA will report a statistically significant result.

Significant differences among group means are calculated using the F statistic, which is the ratio of the mean sum of squares (the variance explained by the independent variable) to the mean square error (the variance left over).

If the F statistic is higher than the critical value (the value of F that corresponds with your alpha value, usually 0.05), then the difference among groups is deemed statistically significant.

Quantitative variables are any variables where the data represent amounts (e.g. height, weight, or age).

Categorical variables are any variables where the data represent groups. This includes rankings (e.g. finishing places in a race), classifications (e.g. brands of cereal), and binary outcomes (e.g. coin flips).

You need to know what type of variables you are working with to choose the right statistical test for your data and interpret your results .

Cite this Scribbr article

If you want to cite this source, you can copy and paste the citation or click the “Cite this Scribbr article” button to automatically add the citation to our free Citation Generator.

Bevans, R. (2024, May 09). One-way ANOVA | When and How to Use It (With Examples). Scribbr. Retrieved August 13, 2024, from https://www.scribbr.com/statistics/one-way-anova/



ANOVA (Analysis of variance) – Formulas, Types, and Examples


Analysis of Variance (ANOVA)

Analysis of Variance (ANOVA) is a statistical method used to test differences between two or more means. It is similar to the t-test, but the t-test is generally used for comparing two means, while ANOVA is used when you have more than two means to compare.

ANOVA is based on comparing the variance (or variation) between the data samples to the variation within each particular sample. If the between-group variance is high and the within-group variance is low, this provides evidence that the means of the groups are significantly different.

ANOVA Terminology

When discussing ANOVA, there are several key terms to understand:

  • Factor : This is another term for the independent variable in your analysis. In a one-way ANOVA, there is one factor, while in a two-way ANOVA, there are two factors.
  • Levels : These are the different groups or categories within a factor. For example, if the factor is ‘diet’ the levels might be ‘low fat’, ‘medium fat’, and ‘high fat’.
  • Response Variable : This is the dependent variable or the outcome that you are measuring.
  • Within-group Variance : This is the variance or spread of scores within each level of your factor.
  • Between-group Variance : This is the variance or spread of scores between the different levels of your factor.
  • Grand Mean : This is the overall mean when you consider all the data together, regardless of the factor level.
  • Treatment Sums of Squares (SS) : This represents the between-group variability. It is the sum of the squared differences between the group means and the grand mean.
  • Error Sums of Squares (SS) : This represents the within-group variability. It’s the sum of the squared differences between each observation and its group mean.
  • Total Sums of Squares (SS) : This is the sum of the Treatment SS and the Error SS. It represents the total variability in the data.
  • Degrees of Freedom (df) : The degrees of freedom are the number of values that have the freedom to vary when computing a statistic. For example, if you have ‘n’ observations in one group, then the degrees of freedom for that group is ‘n-1’.
  • Mean Square (MS) : Mean Square is the average squared deviation and is calculated by dividing the sum of squares by the corresponding degrees of freedom.
  • F-Ratio : This is the test statistic for ANOVAs, and it’s the ratio of the between-group variance to the within-group variance. If the between-group variance is significantly larger than the within-group variance, the F-ratio will be large and likely significant.
  • Null Hypothesis (H0) : This is the hypothesis that there is no difference between the group means.
  • Alternative Hypothesis (H1) : This is the hypothesis that there is a difference between at least two of the group means.
  • p-value : This is the probability of obtaining a test statistic as extreme as the one that was actually observed, assuming that the null hypothesis is true. If the p-value is less than the significance level (usually 0.05), then the null hypothesis is rejected in favor of the alternative hypothesis.
  • Post-hoc tests : These are follow-up tests conducted after an ANOVA when the null hypothesis is rejected, to determine which specific groups’ means (levels) are different from each other. Examples include Tukey’s HSD, Scheffe, Bonferroni, among others.

Types of ANOVA

Types of ANOVA are as follows:

One-way (or one-factor) ANOVA

This is the simplest type of ANOVA, which involves one independent variable . For example, comparing the effect of different types of diet (vegetarian, pescatarian, omnivore) on cholesterol level.

Two-way (or two-factor) ANOVA

This involves two independent variables. This allows for testing the effect of each independent variable on the dependent variable , as well as testing if there’s an interaction effect between the independent variables on the dependent variable.

Repeated Measures ANOVA

This is used when the same subjects are measured multiple times under different conditions, or at different points in time. This type of ANOVA is often used in longitudinal studies.

Mixed Design ANOVA

This combines features of both between-subjects (independent groups) and within-subjects (repeated measures) designs. In this model, one factor is a between-subjects variable and the other is a within-subjects variable.

Multivariate Analysis of Variance (MANOVA)

This is used when there are two or more dependent variables. It tests whether changes in the independent variable(s) correspond to changes in the dependent variables.

Analysis of Covariance (ANCOVA)

This combines ANOVA and regression. ANCOVA tests whether certain factors have an effect on the outcome variable after removing the variance for which quantitative covariates (interval variables) account. This allows the comparison of one variable outcome between groups, while statistically controlling for the effect of other continuous variables that are not of primary interest.

Nested ANOVA

This model is used when the groups can be clustered into categories. For example, if you were comparing students’ performance from different classrooms and different schools, “classroom” could be nested within “school.”

ANOVA Formulas

ANOVA Formulas are as follows:

Sum of Squares Total (SST)

This represents the total variability in the data. It is the sum of the squared differences between each observation and the overall mean:

SST = Σᵢ (yᵢ − ȳ)²

  • yᵢ represents each individual data point
  • ȳ represents the grand mean (mean of all observations)

Sum of Squares Within (SSW)

This represents the variability within each group or factor level. It is the sum of the squared differences between each observation and its group mean:

SSW = Σᵢ Σⱼ (yᵢⱼ − ȳᵢ)²

  • yᵢⱼ represents each individual data point within group i
  • ȳᵢ represents the mean of the ith group

Sum of Squares Between (SSB)

This represents the variability between the groups. It is the sum of the squared differences between the group means and the grand mean, multiplied by the number of observations in each group:

SSB = Σᵢ nᵢ (ȳᵢ − ȳ)²

  • nᵢ represents the number of observations in group i
  • ȳ represents the grand mean

Degrees of Freedom

The degrees of freedom are the number of values that have the freedom to vary when calculating a statistic.

For within groups (dfW): dfW = N − k

For between groups (dfB): dfB = k − 1

For total (dfT): dfT = N − 1

  • N represents the total number of observations
  • k represents the number of groups

Mean Squares

Mean squares are the sum of squares divided by the respective degrees of freedom.

Mean Squares Between (MSB): MSB = SSB / dfB

Mean Squares Within (MSW): MSW = SSW / dfW

F-Statistic

The F-statistic is used to test whether the variability between the groups is significantly greater than the variability within the groups:

F = MSB / MSW

If the F-statistic is significantly higher than what would be expected by chance, we reject the null hypothesis that all group means are equal.

Examples of ANOVA

Example 1:

Suppose a psychologist wants to test the effect of three different types of exercise (yoga, aerobic exercise, and weight training) on stress reduction. The dependent variable is the stress level, which can be measured using a stress rating scale.

Here are hypothetical stress ratings for a group of participants after they followed each of the exercise regimes for a period:

  • Yoga: [3, 2, 2, 1, 2, 2, 3, 2, 1, 2]
  • Aerobic Exercise: [2, 3, 3, 2, 3, 2, 3, 3, 2, 2]
  • Weight Training: [4, 4, 5, 5, 4, 5, 4, 5, 4, 5]

The psychologist wants to determine if there is a statistically significant difference in stress levels between these different types of exercise.

To conduct the ANOVA:

1. State the hypotheses:

  • Null Hypothesis (H0): There is no difference in mean stress levels between the three types of exercise.
  • Alternative Hypothesis (H1): There is a difference in mean stress levels between at least two of the types of exercise.

2. Calculate the ANOVA statistics:

  • Compute the Sum of Squares Between (SSB), Sum of Squares Within (SSW), and Sum of Squares Total (SST).
  • Calculate the Degrees of Freedom (dfB, dfW, dfT).
  • Calculate the Mean Squares Between (MSB) and Mean Squares Within (MSW).
  • Compute the F-statistic (F = MSB / MSW).

3. Check the p-value associated with the calculated F-statistic.

  • If the p-value is less than the chosen significance level (often 0.05), then we reject the null hypothesis in favor of the alternative hypothesis. This suggests there is a statistically significant difference in mean stress levels between the three exercise types.

4. Post-hoc tests

  • If we reject the null hypothesis, we conduct a post-hoc test to determine which specific groups’ means (exercise types) are different from each other.
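Putting these steps together in code — a minimal sketch in R using the stress ratings listed above (the object and column names are illustrative):

```r
# One-way ANOVA for the exercise-and-stress example.
stress <- data.frame(
  rating = c(3, 2, 2, 1, 2, 2, 3, 2, 1, 2,   # yoga
             2, 3, 3, 2, 3, 2, 3, 3, 2, 2,   # aerobic exercise
             4, 4, 5, 5, 4, 5, 4, 5, 4, 5),  # weight training
  exercise = rep(c("yoga", "aerobic", "weights"), each = 10)
)
fit <- aov(rating ~ exercise, data = stress)
summary(fit)   # F-statistic and p-value for the overall test
TukeyHSD(fit)  # post-hoc pairwise comparisons if the overall test is significant
```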

Example 2:

Suppose an agricultural scientist wants to compare the yield of three varieties of wheat. The scientist randomly selects four fields for each variety and plants them. After harvest, the yield from each field is measured in bushels. Here are the hypothetical yields:

[Table: hypothetical yields (in bushels) for the three wheat varieties]

The scientist wants to know if the differences in yields are due to the different varieties or just random variation.

Here’s how to apply the one-way ANOVA to this situation:

  • Null Hypothesis (H0): The means of the three populations are equal.
  • Alternative Hypothesis (H1): At least one population mean is different.
  • Calculate the Degrees of Freedom (dfB for between groups, dfW for within groups, dfT for total).
  • If the p-value is less than the chosen significance level (often 0.05), then we reject the null hypothesis in favor of the alternative hypothesis. This would suggest there is a statistically significant difference in mean yields among the three varieties.
  • If we reject the null hypothesis, we conduct a post-hoc test to determine which specific groups’ means (wheat varieties) are different from each other.

How to Conduct ANOVA

Conducting an Analysis of Variance (ANOVA) involves several steps. Here’s a general guideline on how to perform it:

  • Null Hypothesis (H0): The means of all groups are equal.
  • Alternative Hypothesis (H1): At least one group mean is different from the others.
  • The significance level (often denoted as α) is usually set at 0.05. This implies that you are willing to accept a 5% chance that you are wrong in rejecting the null hypothesis.
  • Data should be collected for each group under study. Make sure that the data meet the assumptions of an ANOVA: normality, independence, and homogeneity of variances.
  • Calculate the Degrees of Freedom (df) for each sum of squares (dfB, dfW, dfT).
  • Compute the Mean Squares Between (MSB) and Mean Squares Within (MSW) by dividing the sum of squares by the corresponding degrees of freedom.
  • Compute the F-statistic as the ratio of MSB to MSW.
  • Determine the critical F-value from the F-distribution table using dfB and dfW.
  • If the calculated F-statistic is greater than the critical F-value, reject the null hypothesis.
  • If the p-value associated with the calculated F-statistic is smaller than the significance level (0.05 typically), you reject the null hypothesis.
  • If you rejected the null hypothesis, you can conduct post-hoc tests (like Tukey’s HSD) to determine which specific groups’ means (if you have more than two groups) are different from each other.
  • Regardless of the result, report your findings in a clear, understandable manner. This typically includes reporting the test statistic, p-value, and whether the null hypothesis was rejected.

When to use ANOVA

ANOVA (Analysis of Variance) is used when you have three or more groups and you want to compare their means to see if they are significantly different from each other. It is a statistical method that is used in a variety of research scenarios. Here are some examples of when you might use ANOVA:

  • Comparing Groups : If you want to compare the performance of more than two groups, for example, testing the effectiveness of different teaching methods on student performance.
  • Evaluating Interactions : In a two-way or factorial ANOVA, you can test for an interaction effect. This means you are not only interested in the effect of each individual factor, but also whether the effect of one factor depends on the level of another factor.
  • Repeated Measures : If you have measured the same subjects under different conditions or at different time points, you can use repeated measures ANOVA to compare the means of these repeated measures while accounting for the correlation between measures from the same subject.
  • Experimental Designs : ANOVA is often used in experimental research designs when subjects are randomly assigned to different conditions and the goal is to compare the means of the conditions.

Here are the assumptions that must be met to use ANOVA:

  • Normality : The data should be approximately normally distributed.
  • Homogeneity of Variances : The variances of the groups you are comparing should be roughly equal. This assumption can be tested using Levene’s test or Bartlett’s test.
  • Independence : The observations should be independent of each other. This assumption is met if the data is collected appropriately with no related groups (e.g., twins, matched pairs, repeated measures).
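A quick way to probe the first two of these assumptions in R, reusing the fitted model from the sketch above (a rough illustration; in practice residual plots are usually examined as well):

```r
# Normality: Shapiro-Wilk test on the model residuals.
shapiro.test(residuals(fit))
# Homogeneity of variances: Bartlett's test across the exercise groups.
bartlett.test(rating ~ exercise, data = stress)
```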

Applications of ANOVA

The Analysis of Variance (ANOVA) is a powerful statistical technique that is used widely across various fields and industries. Here are some of its key applications:

Agriculture

ANOVA is commonly used in agricultural research to compare the effectiveness of different types of fertilizers, crop varieties, or farming methods. For example, an agricultural researcher could use ANOVA to determine if there are significant differences in the yields of several varieties of wheat under the same conditions.

Manufacturing and Quality Control

ANOVA is used to determine if different manufacturing processes or machines produce different levels of product quality. For instance, an engineer might use it to test whether there are differences in the strength of a product based on the machine that produced it.

Marketing Research

Marketers often use ANOVA to test the effectiveness of different advertising strategies. For example, a marketer could use ANOVA to determine whether different marketing messages have a significant impact on consumer purchase intentions.

Healthcare and Medicine

In medical research, ANOVA can be used to compare the effectiveness of different treatments or drugs. For example, a medical researcher could use ANOVA to test whether there are significant differences in recovery times for patients who receive different types of therapy.

Education

ANOVA is used in educational research to compare the effectiveness of different teaching methods or educational interventions. For example, an educator could use it to test whether students perform significantly differently when taught with different teaching methods.

Psychology and Social Sciences

Psychologists and social scientists use ANOVA to compare group means on various psychological and social variables. For example, a psychologist could use it to determine if there are significant differences in stress levels among individuals in different occupations.

Biology and Environmental Sciences

Biologists and environmental scientists use ANOVA to compare different biological and environmental conditions. For example, an environmental scientist could use it to determine if there are significant differences in the levels of a pollutant in different bodies of water.

Advantages of ANOVA

Here are some advantages of using ANOVA:

Comparing Multiple Groups: One of the key advantages of ANOVA is the ability to compare the means of three or more groups. This makes it more powerful and flexible than the t-test, which is limited to comparing only two groups.

Control of Type I Error: When comparing multiple groups, the chances of making a Type I error (false positive) increases. One of the strengths of ANOVA is that it controls the Type I error rate across all comparisons. This is in contrast to performing multiple pairwise t-tests which can inflate the Type I error rate.

Testing Interactions: In factorial ANOVA, you can test not only the main effect of each factor, but also the interaction effect between factors. This can provide valuable insights into how different factors or variables interact with each other.

Handling Continuous and Categorical Variables: ANOVA can handle both continuous and categorical variables . The dependent variable is continuous and the independent variables are categorical.

Robustness: ANOVA is considered robust to violations of normality assumption when group sizes are equal. This means that even if your data do not perfectly meet the normality assumption, you might still get valid results.

Provides Detailed Analysis: ANOVA provides a detailed breakdown of variances and interactions between variables which can be useful in understanding the underlying factors affecting the outcome.

Capability to Handle Complex Experimental Designs: Advanced types of ANOVA (like repeated measures ANOVA, MANOVA, etc.) can handle more complex experimental designs, including those where measurements are taken on the same subjects over time, or when you want to analyze multiple dependent variables at once.

Disadvantages of ANOVA

Some limitations or disadvantages that are important to consider:

Assumptions: ANOVA relies on several assumptions including normality (the data follows a normal distribution), independence (the observations are independent of each other), and homogeneity of variances (the variances of the groups are roughly equal). If these assumptions are violated, the results of the ANOVA may not be valid.

Sensitivity to Outliers: ANOVA can be sensitive to outliers. A single extreme value in one group can affect the sum of squares and consequently influence the F-statistic and the overall result of the test.

Dichotomous Variables: ANOVA is not suitable for dichotomous variables (variables that can take only two values, like yes/no or male/female). It is used to compare the means of groups for a continuous dependent variable.

Lack of Specificity: Although ANOVA can tell you that there is a significant difference between groups, it doesn’t tell you which specific groups are significantly different from each other. You need to carry out further post-hoc tests (like Tukey’s HSD or Bonferroni) for these pairwise comparisons.

Complexity with Multiple Factors: When dealing with multiple factors and interactions in factorial ANOVA, interpretation can become complex. The presence of interaction effects can make main effects difficult to interpret.

Requires Larger Sample Sizes: To detect an effect of a certain size, ANOVA generally requires larger sample sizes than a t-test.

Equal Group Sizes: While not always a strict requirement, ANOVA is most powerful and its assumptions are most likely to be met when groups are of equal or similar sizes.


Restor Dent Endod, v.39(2), May 2014

Statistical notes for clinical researchers: Two-way analysis of variance (ANOVA)-exploring possible interaction between factors

Hae-Young Kim

Department of Dental Laboratory Science and Engineering, College of Health Science & Department of Public Health Science, Graduate School, Korea University, Seoul, Korea.

When we have a continuous outcome, e.g., bonding strength, and two categorical explanatory variables, such as 4 different resin types and 2 different curing light sources, we usually consider applying the two-way ANOVA to analyse the relationships. However, because implementing the two-way ANOVA is relatively complicated, some clinical researchers prefer to apply the one-way ANOVA repeatedly, for one factor on each level of the other factor. They often insist that they are interested only in one factor (e.g., manipulation methods) and not in the other factor (e.g., brands), claiming that the one-way ANOVA is the more appropriate strategy. Even though testing a variety of brands may be considered a simple way of generalizing across brand types, possible differing effects of materials from different brands can never be detected by the one-way ANOVA. In fact, materials of different brands may have slightly different ingredient compositions, which may elicit different effects in combination with the other factor. Application of the one-way ANOVA cannot detect a possible interaction between the two explanatory factors.

Table 1 shows data on the bonding strength of four types of resin (A, B, C and D) on the tooth surface with the simultaneous use of two different curing light sources (Halogen, LED). The highest overall bonding strength is found for resin D, followed by resin C and resin B, with insignificant differences among them (see the superscripts a, b, and c). Considering cases using the Halogen source, resin D is the strongest among the four resin types, while resin C shows the highest value when the LED was used. This explicitly shows that the effects of the different resin types do not follow a similar trend across the levels of the curing method, Halogen or LED. Figure 1a shows that the trend of (descriptive) mean bonding strengths across resin types changes according to the level of the curing method.


Graphs for bonding strength by resin materials (A, B, C, and D) displayed as separated lines of different curing light sources (Halogen & LED): (a) Descriptive means; (b) Estimated means by the model with the interaction term; (c) Estimated means by the model without the interaction term between two factors.

Measurements of bonding strength (MPa) according to four different types of resin and two curing methods


* Different letters indicate significantly different values at a type I error rate of 0.05.

By the independent t-test: p-value (Halogen vs. LED) = 0.498.

By the one-way ANOVA (A vs. B vs. C vs. D): p-value (all methods) < 0.001; p-value (only Halogen) < 0.001; p-value (only LED) = 0.025.

Interaction model or main effect (no-interaction) model?

When we have a quantitative continuous outcome and two categorical explanatory variables, we may consider two kinds of relationship between the two categorical variables, typified by Figures 1b and 1c. Figure 1c shows that the relative effect of each level of the material category does not change across levels of the curing method, which means an additive relationship between the two categorical variables, i.e., the second categorical variable adds a uniform effect to the relationship between the outcome and the first categorical variable. In this relationship we can distinguish the effect of one factor from that of the other. This type of model is called a main effect model or no-interaction model. However, Figure 1b shows that the effect of material depends on the level of the curing method, so we cannot describe the effect of one factor separately, i.e., which light source produces stronger bonding? This is called an interaction model because an interaction relationship is included. We may see that the interaction model easily reproduces the actual relationship among the descriptive means, as seen in Figure 1a.

Therefore, generally the first step in applying the two-way ANOVA is to fit the interaction model, specified as the "Full factorial model" (Part A, d-1, below), and test the significance of the interaction term. The resulting ANOVA table of the two-way ANOVA interaction model is shown in Table 2 and g-1 (below), where we find that the interaction term (Light*Resin) is statistically significant at an alpha level of 0.05 (p < 0.001). As the effect of a level of one variable depends on the levels of the other variable, we cannot separate the effects of the two variables: neither an independent effect of resin type nor an independent effect of curing light can be reported. The levels of the two categorical variables should be combined into a total of eight categories (2 levels of Light * 4 levels of Resin), and post-hoc multiple comparisons may be implemented among the eight categories as if they constituted one (combined) factor (shown in h-1 ①; variable name = 'light_mat'). Table 3 provides the complete report of the analysis results from the interaction model as well as the post-hoc multiple comparisons. The plot in Figure 1 may be displayed by requesting plots (e, below). The underlying assumptions of the two-way ANOVA model are the same as those of the one-way ANOVA: normal distribution of outcomes and equal variances. The assumption of normality should be checked in an exploratory procedure, and the assumption of equal variances may be tested as the homogeneity test for the null hypothesis of equal variances for all groups, as shown in procedure f, below.

The ANOVA table from the two-way ANOVA considering two factors with the interaction term (correct)


Comparative mean bonding strength under the two-way ANOVA model with the interaction model (correct, g-1)


p-value (model) < 0.001; p-value (light) = 0.297; p-value (resin) < 0.001; p-value (resin*light) < 0.001; R-square = 0.61.

On the other hand, if the interaction term is insignificant, unlike the results above, we consider a main effect (no-interaction) model, as shown in Part B below; for this data set, such a model is actually incorrect. Table 4 shows the comprehensive results of the analyses based on the main effect model, although the model is inadequate here because it does not fit the data well. The superscripts represent statistical differences among the levels of resin type only, because the effect of Light was insignificant (p = 0.412). You may add upper-case superscripts to represent statistical differences among the levels of the Light variable if Light is significant.

Comparative mean bonding strength under the two-way ANOVA using the main effect model (no-interaction model, incorrect † , g-2)


† This table simply shows how to report results of the main effect model, only for the purpose of illustration. Table 3 displays the correct results, which reflect the given data well.

p-value (model) < 0.001; R-square = 0.35.

The two-way ANOVA with an interaction term can be implemented in the SPSS statistical package (SPSS Inc., Chicago, IL, USA) using the following procedures:

[Figure: SPSS procedure screenshots for the two-way ANOVA with interaction term]
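For readers working outside SPSS, a rough equivalent of the same full factorial model in R might look like this (the data frame and column names are assumptions for illustration, not taken from the original article):

```r
# Two-way ANOVA with an interaction term (full factorial model).
# 'bond' is a hypothetical data frame with columns:
#   strength (bonding strength, MPa), light ("Halogen"/"LED"), resin ("A"-"D").
fit2 <- aov(strength ~ light * resin, data = bond)
summary(fit2)  # reports both main effects and the light:resin interaction
```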

Hypothesis Testing - Analysis of Variance (ANOVA)

Lisa Sullivan, PhD

Professor of Biostatistics

Boston University School of Public Health


Introduction

This module will continue the discussion of hypothesis testing, where a specific statement or hypothesis is generated about a population parameter, and sample statistics are used to assess the likelihood that the hypothesis is true. The hypothesis is based on available information and the investigator's belief about the population parameters. The specific test considered here is called analysis of variance (ANOVA) and is a test of hypothesis that is appropriate to compare means of a continuous variable in two or more independent comparison groups. For example, in some clinical trials there are more than two comparison groups. In a clinical trial to evaluate a new medication for asthma, investigators might compare an experimental medication to a placebo and to a standard treatment (i.e., a medication currently being used). In an observational study such as the Framingham Heart Study, it might be of interest to compare mean blood pressure or mean cholesterol levels in persons who are underweight, normal weight, overweight and obese.  

The technique to test for a difference in more than two independent means is an extension of the two independent samples procedure discussed previously which applies when there are exactly two independent comparison groups. The ANOVA technique applies when there are two or more than two independent groups. The ANOVA procedure is used to compare the means of the comparison groups and is conducted using the same five step approach used in the scenarios discussed in previous sections. Because there are more than two groups, however, the computation of the test statistic is more involved. The test statistic must take into account the sample sizes, sample means and sample standard deviations in each of the comparison groups.

If one is examining the means observed among, say, three groups, it might be tempting to perform three separate group-to-group comparisons, but this approach is incorrect because each of these comparisons fails to take into account the total data, and it increases the likelihood of incorrectly concluding that there are statistically significant differences, since each comparison adds to the probability of a type I error. Analysis of variance avoids these problems by asking a more global question, i.e., whether there are significant differences among the groups, without addressing differences between any two groups in particular (although there are additional tests that can do this if the analysis of variance indicates that there are differences among the groups).

The fundamental strategy of ANOVA is to systematically examine variability within groups being compared and also examine variability among the groups being compared.

Learning Objectives

After completing this module, the student will be able to:

  • Perform analysis of variance by hand
  • Appropriately interpret results of analysis of variance tests
  • Distinguish between one and two factor analysis of variance tests
  • Identify the appropriate hypothesis testing procedure based on type of outcome variable and number of samples

The ANOVA Approach

Consider an example with four independent groups and a continuous outcome measure. The independent groups might be defined by a particular characteristic of the participants such as BMI (e.g., underweight, normal weight, overweight, obese) or by the investigator (e.g., randomizing participants to one of four competing treatments, call them A, B, C and D). Suppose that the outcome is systolic blood pressure, and we wish to test whether there is a statistically significant difference in mean systolic blood pressures among the four groups. The sample data are organized as follows:

 

                            Group 1    Group 2    Group 3    Group 4
Sample size                 n₁         n₂         n₃         n₄
Sample mean                 X̄₁         X̄₂         X̄₃         X̄₄
Sample standard deviation   s₁         s₂         s₃         s₄

The hypotheses of interest in an ANOVA are as follows:

  • H₀: μ₁ = μ₂ = μ₃ = ... = μₖ
  • H₁: Means are not all equal.

where k = the number of independent comparison groups.

In this example, the hypotheses are:

  • H₀: μ₁ = μ₂ = μ₃ = μ₄
  • H₁: The means are not all equal.

The null hypothesis in ANOVA is always that there is no difference in means. The research or alternative hypothesis is always that the means are not all equal and is usually written in words rather than in mathematical symbols. The research hypothesis captures any difference in means and includes, for example, the situation where all four means are unequal, where one is different from the other three, where two are different, and so on. The alternative hypothesis, as shown above, captures all possible situations other than equality of all means specified in the null hypothesis.

Test Statistic for ANOVA

The test statistic for testing H₀: μ₁ = μ₂ = ... = μₖ is:

F = MSB / MSE

where MSB is the mean square between treatments and MSE is the mean square error or residual (both are defined in the ANOVA table below), and the critical value is found in a table of probability values for the F distribution with degrees of freedom df₁ = k − 1, df₂ = N − k.

NOTE: The test statistic F assumes equal variability in the k populations (i.e., the population variances are equal, or s₁² = s₂² = ... = sₖ²). This means that the outcome is equally variable in each of the comparison populations. This assumption is the same as that assumed for appropriate use of the test statistic to test equality of two independent means. It is possible to assess the likelihood that the assumption of equal variances is true, and the test can be conducted in most statistical computing packages. If the variability in the k comparison groups is not similar, then alternative techniques must be used.

The F statistic is computed by taking the ratio of what is called the "between treatment" variability to the "residual or error" variability. This is where the name of the procedure originates. In analysis of variance we are testing for a difference in means (H0: means are all equal versus H1: means are not all equal) by evaluating variability in the data. The numerator captures between treatment variability (i.e., differences among the sample means) and the denominator contains an estimate of the variability in the outcome. The test statistic is a measure that allows us to assess whether the differences among the sample means (numerator) are more than would be expected by chance if the null hypothesis is true. Recall that in the two independent samples test, the test statistic was computed by taking the ratio of the difference in sample means (numerator) to the variability in the outcome (estimated by Sp).

The decision rule for the F test in ANOVA is set up in a similar way to the decision rules we established for t tests. The decision rule again depends on the level of significance and the degrees of freedom. The F statistic has two degrees of freedom. These are denoted df1 and df2, and called the numerator and denominator degrees of freedom, respectively. The degrees of freedom are defined as follows:

df1 = k-1 and df2 = N-k,

where k is the number of comparison groups and N is the total number of observations in the analysis. If the null hypothesis is true, the between treatment variation (numerator) will not exceed the residual or error variation (denominator) and the F statistic will be small. If the null hypothesis is false, then the F statistic will be large. The rejection region for the F test is always in the upper (right-hand) tail of the distribution, as shown below.

Rejection Region for F Test with α = 0.05, df1 = 3 and df2 = 36 (k = 4, N = 40)

[Figure: the F distribution with the upper-tail rejection region shaded at α = 0.05]

For the scenario depicted here, the decision rule is: Reject H 0 if F > 2.87.
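Rather than consulting a printed table, critical values of the F distribution can be looked up directly in R with qf(); a one-line sketch for the scenario above, plus the two examples worked later in this module:

```r
# Critical value for the F test sketched above: alpha = 0.05, df1 = 3, df2 = 36.
qf(0.95, df1 = 3, df2 = 36)   # ~2.87

# The same call gives the critical values used in the examples that follow:
qf(0.95, df1 = 3, df2 = 16)   # ~3.24 (weight-loss example, k = 4, N = 20)
qf(0.95, df1 = 2, df2 = 15)   # ~3.68 (calcium example, k = 3, N = 18)
```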

The ANOVA Procedure

We will next illustrate the ANOVA procedure using the five-step approach. Because the computation of the test statistic is involved, the computations are often organized in an ANOVA table. The ANOVA table breaks down the components of variation in the data into variation between treatments and error or residual variation. Statistical computing packages also produce ANOVA tables as part of their standard output for ANOVA, and the ANOVA table is set up as follows:

Source of Variation    Sums of Squares (SS)       Degrees of Freedom (df)    Mean Squares (MS)    F
Between Treatments     SSB = Σ n_j (X̄_j - X̄)²     k-1                        MSB = SSB/(k-1)      F = MSB/MSE
Error (or Residual)    SSE = ΣΣ (X - X̄_j)²        N-k                        MSE = SSE/(N-k)
Total                  SST = ΣΣ (X - X̄)²          N-1

where  

  • X = individual observation,
  • k = the number of treatments or independent comparison groups, and
  • N = total number of observations or total sample size.

The ANOVA table above is organized as follows.

  • The first column is entitled "Source of Variation" and delineates the between treatment and error or residual variation. The total variation is the sum of the between treatment and error variation.
  • The second column is entitled "Sums of Squares (SS)". The between treatment sums of squares is

    SSB = Σ n_j (X̄_j - X̄)²

and is computed by summing the squared differences between each treatment (or group) mean and the overall mean. The squared differences are weighted by the sample sizes per group (n_j). The error sums of squares is:

    SSE = ΣΣ (X - X̄_j)²

and is computed by summing the squared differences between each observation and its group mean (i.e., the squared differences between each observation in group 1 and the group 1 mean, the squared differences between each observation in group 2 and the group 2 mean, and so on). The double summation (ΣΣ) indicates summation of the squared differences within each treatment and then summation of these totals across treatments to produce a single value. (This will be illustrated in the following examples.) The total sums of squares is:

    SST = ΣΣ (X - X̄)²

and is computed by summing the squared differences between each observation and the overall sample mean. In an ANOVA, data are organized by comparison or treatment groups. If all of the data were pooled into a single sample, SST would reflect the numerator of the sample variance computed on the pooled or total sample. SST does not figure into the F statistic directly. However, SST = SSB + SSE; thus if two of the sums of squares are known, the third can be computed from the other two.

  • The third column contains the degrees of freedom. The between treatment degrees of freedom is df1 = k-1. The error degrees of freedom is df2 = N-k. The total degrees of freedom is N-1 (and it is also true that (k-1) + (N-k) = N-1).
  • The fourth column contains the "Mean Squares (MS)", which are computed by dividing sums of squares (SS) by degrees of freedom (df), row by row. Specifically, MSB = SSB/(k-1) and MSE = SSE/(N-k). Dividing SST by N-1 produces the variance of the total sample. The F statistic is in the rightmost column of the ANOVA table and is computed by taking the ratio of MSB/MSE. (A short R sketch of these computations follows this list.)
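The following R sketch mirrors these formulas on a small made-up dataset (two groups of three observations); all object names are ours.

```r
# Computing SSB, SSE, SST and the F statistic "by hand" in R,
# using hypothetical data: two groups of three observations each.
x     <- c(4, 6, 5,  9, 8, 10)
group <- factor(rep(c("g1", "g2"), each = 3))

grand_mean  <- mean(x)
group_means <- tapply(x, group, mean)       # one mean per group
n_j         <- tapply(x, group, length)     # group sample sizes

SSB <- sum(n_j * (group_means - grand_mean)^2)  # between treatments
SSE <- sum((x - ave(x, group))^2)               # within (error/residual)
SST <- sum((x - grand_mean)^2)                  # total; equals SSB + SSE

k <- nlevels(group); N <- length(x)
F_stat <- (SSB / (k - 1)) / (SSE / (N - k))     # MSB / MSE
F_stat
```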

A clinical trial is run to compare weight loss programs and participants are randomly assigned to one of the comparison programs and are counseled on the details of the assigned program. Participants follow the assigned program for 8 weeks. The outcome of interest is weight loss, defined as the difference in weight measured at the start of the study (baseline) and weight measured at the end of the study (8 weeks), measured in pounds.  

Three popular weight loss programs are considered. The first is a low calorie diet. The second is a low fat diet and the third is a low carbohydrate diet. For comparison purposes, a fourth group is considered as a control group. Participants in the fourth group are told that they are participating in a study of healthy behaviors with weight loss only one component of interest. The control group is included here to assess the placebo effect (i.e., weight loss due to simply participating in the study). A total of twenty patients agree to participate in the study and are randomly assigned to one of the four diet groups. Weights are measured at baseline and patients are counseled on the proper implementation of the assigned diet (with the exception of the control group). After 8 weeks, each patient's weight is again measured and the difference in weights is computed by subtracting the 8 week weight from the baseline weight. Positive differences indicate weight losses and negative differences indicate weight gains. For interpretation purposes, we refer to the differences in weights as weight losses and the observed weight losses are shown below.

Low Calorie    Low Fat    Low Carbohydrate    Control
8              2          3                    2
9              4          5                    2
6              3          4                   -1
7              5          2                    0
3              1          3                    3

Is there a statistically significant difference in the mean weight loss among the four diets?  We will run the ANOVA using the five-step approach.

  • Step 1. Set up hypotheses and determine level of significance

H0: μ1 = μ2 = μ3 = μ4    H1: Means are not all equal              α = 0.05

  • Step 2. Select the appropriate test statistic.  

The test statistic is the F statistic for ANOVA, F=MSB/MSE.

  • Step 3. Set up decision rule.  

The appropriate critical value can be found in a table of probabilities for the F distribution. In order to determine the critical value of F we need the degrees of freedom, df1 = k-1 and df2 = N-k. In this example, df1 = k-1 = 4-1 = 3 and df2 = N-k = 20-4 = 16. The critical value is 3.24 and the decision rule is as follows: Reject H0 if F > 3.24.

  • Step 4. Compute the test statistic.  

To organize our computations we complete the ANOVA table. In order to compute the sums of squares we must first compute the sample means for each group and the overall mean based on the total sample.  

 

              Low Calorie    Low Fat    Low Carbohydrate    Control
n             5              5          5                   5
Group mean    6.6            3.0        3.4                 1.2

If we pool all N = 20 observations, the overall mean is 3.55. We can now compute:

SSB = Σ n_j (X̄_j - X̄)²

So, in this case:

SSB = 5(6.6 - 3.55)² + 5(3.0 - 3.55)² + 5(3.4 - 3.55)² + 5(1.2 - 3.55)² = 75.75 ≈ 75.8

Next we compute SSE.

SSE requires computing the squared differences between each observation and its group mean. We will compute SSE in parts. For the participants in the low calorie diet:  

(Group mean = 6.6)

X        X - 6.6    (X - 6.6)²
8         1.4         2.0
9         2.4         5.8
6        -0.6         0.4
7         0.4         0.2
3        -3.6        13.0
Totals    0          21.4

For the participants in the low fat diet:  

(Group mean = 3.0)

X        X - 3.0    (X - 3.0)²
2        -1.0         1.0
4         1.0         1.0
3         0.0         0.0
5         2.0         4.0
1        -2.0         4.0
Totals    0          10.0

For the participants in the low carbohydrate diet:  

(Group mean = 3.4)

X        X - 3.4    (X - 3.4)²
3        -0.4         0.2
5         1.6         2.6
4         0.6         0.4
2        -1.4         2.0
3        -0.4         0.2
Totals    0           5.4

For the participants in the control group:

(Group mean = 1.2)

X        X - 1.2    (X - 1.2)²
2         0.8         0.6
2         0.8         0.6
-1       -2.2         4.8
0        -1.2         1.4
3         1.8         3.2
Totals    0          10.6

Therefore, SSE = 21.4 + 10.0 + 5.4 + 10.6 = 47.4. We can now construct the ANOVA table.

Source of Variation    Sums of Squares (SS)    Degrees of Freedom (df)    Mean Squares (MS)    F
Between Treatments     75.8                    4-1 = 3                    75.8/3 = 25.3        25.3/3.0 = 8.43
Error (or Residual)    47.4                    20-4 = 16                  47.4/16 = 3.0
Total                  123.2                   20-1 = 19

  • Step 5. Conclusion.  

We reject H0 because 8.43 > 3.24. We have statistically significant evidence at α = 0.05 to show that there is a difference in mean weight loss among the four diets.

ANOVA is a test that provides a global assessment of a statistical difference in more than two independent means. In this example, we find that there is a statistically significant difference in mean weight loss among the four diets considered. In addition to reporting the results of the statistical test of hypothesis (i.e., that there is a statistically significant difference in mean weight losses at α = 0.05), investigators should also report the observed sample means to facilitate interpretation of the results. In this example, participants in the low calorie diet lost an average of 6.6 pounds over 8 weeks, as compared to 3.0 and 3.4 pounds in the low fat and low carbohydrate groups, respectively. Participants in the control group lost an average of 1.2 pounds, which could be called the placebo effect because these participants were not participating in an active arm of the trial specifically targeted for weight loss. Are the observed weight losses clinically meaningful?
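Since the global test is significant, pairwise follow-up comparisons are a natural next step. Below is a sketch in R that reproduces this analysis and applies Tukey's honestly-significant-difference procedure; the data are transcribed from the table above, and the object names are ours. (R's unrounded arithmetic gives F ≈ 8.52; the hand computation above rounds the mean squares first.)

```r
# One-way ANOVA on the weight-loss data, followed by Tukey's HSD
# for pairwise comparisons among the four diets.
loss <- c(8, 9, 6, 7, 3,    # low calorie
          2, 4, 3, 5, 1,    # low fat
          3, 5, 4, 2, 3,    # low carbohydrate
          2, 2, -1, 0, 3)   # control
diet <- factor(rep(c("LowCal", "LowFat", "LowCarb", "Control"), each = 5))

fit <- aov(loss ~ diet)
summary(fit)      # F on (3, 16) df, significant at alpha = 0.05
TukeyHSD(fit)     # which pairs of diets differ
```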

Another ANOVA Example

Calcium is an essential mineral that regulates the heart, is important for blood clotting and for building healthy bones. The National Osteoporosis Foundation recommends a daily calcium intake of 1000-1200 mg/day for adult men and women. While calcium is contained in some foods, most adults do not get enough calcium in their diets and take supplements. Unfortunately, some of the supplements have side effects such as gastric distress, making them difficult for some patients to take on a regular basis.

 A study is designed to test whether there is a difference in mean daily calcium intake in adults with normal bone density, adults with osteopenia (a low bone density which may lead to osteoporosis) and adults with osteoporosis. Adults 60 years of age with normal bone density, osteopenia and osteoporosis are selected at random from hospital records and invited to participate in the study. Each participant's daily calcium intake is measured based on reported food intake and supplements. The data are shown below.   

Normal Bone Density    Osteopenia    Osteoporosis
1200                   1000          890
1000                   1100          650
980                    700           1100
900                    800           900
750                    500           400
800                    700           350

Is there a statistically significant difference in mean calcium intake in patients with normal bone density as compared to patients with osteopenia and osteoporosis? We will run the ANOVA using the five-step approach.

H0: μ1 = μ2 = μ3    H1: Means are not all equal                            α = 0.05

In order to determine the critical value of F we need the degrees of freedom, df1 = k-1 and df2 = N-k. In this example, df1 = k-1 = 3-1 = 2 and df2 = N-k = 18-3 = 15. The critical value is 3.68 and the decision rule is as follows: Reject H0 if F > 3.68.

To organize our computations we will complete the ANOVA table. In order to compute the sums of squares we must first compute the sample means for each group and the overall mean.  

              Normal Bone Density    Osteopenia    Osteoporosis
n             6                      6             6
Group mean    938.3                  800.0         715.0

If we pool all N = 18 observations, the overall mean is 817.8.

We can now compute:

SSB = Σ n_j (X̄_j - X̄)²

Substituting (and carrying full precision in the group means):

SSB = 6(938.33 - 817.78)² + 6(800.00 - 817.78)² + 6(715.00 - 817.78)² = 152,477.7

SSE requires computing the squared differences between each observation and its group mean. We will compute SSE in parts. For the participants with normal bone density:

(Group mean = 938.3)

X        X - 938.3     (X - 938.3)²
1200      261.6667       68,486.9
1000       61.6667        3,806.9
980        41.6667        1,738.9
900       -38.3333        1,466.9
750      -188.333        35,456.9
800      -138.333        19,126.9
Total       0           130,083.3

For participants with osteopenia:

(Group mean = 800.0)

X        X - 800    (X - 800)²
1000      200        40,000
1100      300        90,000
700      -100        10,000
800         0             0
500      -300        90,000
700      -100        10,000
Total       0       240,000

For participants with osteoporosis:

(Group mean = 715.0)

X        X - 715    (X - 715)²
890       175        30,625
650       -65         4,225
1100      385       148,225
900       185        34,225
400      -315        99,225
350      -365       133,225
Total       0       449,750

Therefore, SSE = 130,083.3 + 240,000 + 449,750 = 819,833.3. We can now construct the ANOVA table.

Source of Variation    Sums of Squares (SS)    Degrees of Freedom (df)    Mean Squares (MS)    F
Between Treatments     152,477.7               2                          76,238.9             1.395
Error (or Residual)    819,833.3               15                         54,655.5
Total                  972,311.0               17

We do not reject H0 because 1.395 < 3.68. We do not have statistically significant evidence at α = 0.05 to show that there is a difference in mean calcium intake in patients with normal bone density as compared to patients with osteopenia and osteoporosis. Are the differences in mean calcium intake clinically meaningful? If so, what might account for the lack of statistical significance?
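For reference, the same analysis can be reproduced in R; the sketch below transcribes the data from the table above, and the object names are ours.

```r
# One-way ANOVA on the calcium-intake data (mg/day) for the three
# bone-density groups; reproduces F ~ 1.39 on (2, 15) df.
calcium <- c(1200, 1000, 980, 900, 750, 800,    # normal bone density
             1000, 1100, 700, 800, 500, 700,    # osteopenia
             890, 650, 1100, 900, 400, 350)     # osteoporosis
status  <- factor(rep(c("Normal", "Osteopenia", "Osteoporosis"), each = 6))
summary(aov(calcium ~ status))   # p > 0.05, consistent with the conclusion above
```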

One-Way ANOVA in R

The video below by Mike Marin demonstrates how to perform analysis of variance in R. It also covers some other statistical issues, but the initial part of the video will be useful to you.
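As a complement to the video, the calls below sketch the two standard entry points in base R, revisiting the weight-loss data from the first example. oneway.test() is one alternative worth knowing when the equal-variance assumption is questionable; the object names are ours.

```r
# One-way ANOVA in R on the weight-loss data from the first example.
loss <- c(8, 9, 6, 7, 3,  2, 4, 3, 5, 1,  3, 5, 4, 2, 3,  2, 2, -1, 0, 3)
diet <- factor(rep(c("LowCal", "LowFat", "LowCarb", "Control"), each = 5))

summary(aov(loss ~ diet))                    # classical ANOVA (equal variances)
oneway.test(loss ~ diet, var.equal = FALSE)  # Welch-type test, variances may differ
```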

Two-Factor ANOVA

The ANOVA tests described above are called one-factor ANOVAs. There is one treatment or grouping factor with k > 2 levels and we wish to compare the means across the different categories of this factor. The factor might represent different diets, different classifications of risk for disease (e.g., osteoporosis), different medical treatments, different age groups, or different racial/ethnic groups. There are situations where it may be of interest to compare means of a continuous outcome across two or more factors. For example, suppose a clinical trial is designed to compare five different treatments for joint pain in patients with osteoarthritis. Investigators might also hypothesize that there are differences in the outcome by sex. This is an example of a two-factor ANOVA where the factors are treatment (with 5 levels) and sex (with 2 levels). In the two-factor ANOVA, investigators can assess whether there are differences in means due to the treatment, by sex or whether there is a difference in outcomes by the combination or interaction of treatment and sex. Higher order ANOVAs are conducted in the same way as one-factor ANOVAs presented here and the computations are again organized in ANOVA tables with more rows to distinguish the different sources of variation (e.g., between treatments, between men and women). The following example illustrates the approach.

Consider the clinical trial outlined above in which three competing treatments for joint pain are compared in terms of their mean time to pain relief in patients with osteoarthritis. Because investigators hypothesize that there may be a difference in time to pain relief in men versus women, they randomly assign 15 participating men to one of the three competing treatments and randomly assign 15 participating women to one of the three competing treatments (i.e., stratified randomization). Participating men and women do not know to which treatment they are assigned. They are instructed to take the assigned medication when they experience joint pain and to record the time, in minutes, until the pain subsides. The data (times to pain relief) are shown below and are organized by the assigned treatment and sex of the participant.

Table of Time to Pain Relief by Treatment and Sex

               Male    Female
Treatment A    12      21
               15      19
               16      18
               17      24
               14      25
Treatment B    14      21
               17      20
               19      23
               20      27
               17      25
Treatment C    25      37
               27      34
               29      36
               24      26
               22      29

The analysis in two-factor ANOVA is similar to that illustrated above for one-factor ANOVA. The computations are again organized in an ANOVA table, but the total variation is partitioned into that due to the main effect of treatment, the main effect of sex and the interaction effect. The results of the analysis are shown below (and were generated with a statistical computing package - here we focus on interpretation). 

 ANOVA Table for Two-Factor ANOVA

Source of Variation    Sums of Squares (SS)    Degrees of Freedom (df)    Mean Squares (MS)    F       P-Value
Model                  967.0                   5                          193.4                20.7    0.0001
Treatment              651.5                   2                          325.7                34.8    0.0001
Sex                    313.6                   1                          313.6                33.5    0.0001
Treatment * Sex        1.9                     2                          0.9                  0.1     0.9054
Error (or Residual)    224.4                   24                         9.4
Total                  1191.4                  29

There are 4 statistical tests in the ANOVA table above. The first test is an overall test to assess whether there is a difference among the 6 cell means (cells are defined by treatment and sex). The F statistic is 20.7 and is highly statistically significant with p=0.0001. When the overall test is significant, focus then turns to the factors that may be driving the significance (in this example, treatment, sex or the interaction between the two). The next three statistical tests assess the significance of the main effect of treatment, the main effect of sex and the interaction effect. In this example, there is a highly significant main effect of treatment (p=0.0001) and a highly significant main effect of sex (p=0.0001). The interaction between the two does not reach statistical significance (p=0.91). The table below contains the mean times to pain relief in each of the treatments for men and women (Note that each sample mean is computed on the 5 observations measured under that experimental condition).  
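Results like these are typically produced by a statistical package; as a sketch, the base-R call below fits the same two-factor model with interaction. The data are transcribed from the table above, and the object names are ours.

```r
# Two-factor ANOVA with interaction: time to pain relief by treatment and sex.
pain <- data.frame(
  time = c(12, 15, 16, 17, 14,   21, 19, 18, 24, 25,    # Treatment A: men, women
           14, 17, 19, 20, 17,   21, 20, 23, 27, 25,    # Treatment B: men, women
           25, 27, 29, 24, 22,   37, 34, 36, 26, 29),   # Treatment C: men, women
  treatment = factor(rep(c("A", "B", "C"), each = 10)),
  sex       = factor(rep(rep(c("Male", "Female"), each = 5), times = 3))
)
summary(aov(time ~ treatment * sex, data = pain))  # main effects + interaction
```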

Mean Time to Pain Relief by Treatment and Gender

               Male    Female
Treatment A    14.8    21.4
Treatment B    17.4    23.2
Treatment C    25.4    32.4

Treatment A appears to be the most efficacious treatment for both men and women. The mean times to relief are lower in Treatment A for both men and women and highest in Treatment C for both men and women. Across all treatments, women report longer times to pain relief (See below).  

[Figure: mean time to pain relief plotted by treatment, with separate lines for men and women]

Notice that there is the same pattern of time to pain relief across treatments in both men and women (treatment effect). There is also a sex effect - specifically, time to pain relief is longer in women in every treatment.  
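A plot of this kind can be sketched with the base-R interaction.plot() function; the call below assumes the pain data frame built in the earlier sketch.

```r
# Profile plot of mean time to pain relief: one line per sex across treatments.
# Roughly parallel lines are consistent with the absence of an interaction.
# Assumes the 'pain' data frame from the two-factor sketch above.
interaction.plot(x.factor     = pain$treatment,
                 trace.factor = pain$sex,
                 response     = pain$time,
                 xlab = "Treatment", ylab = "Mean time to pain relief",
                 trace.label = "Sex")
```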

Suppose that the same clinical trial is replicated in a second clinical site and the following data are observed.

Table - Time to Pain Relief by Treatment and Sex - Clinical Site 2

               Male    Female
Treatment A    22      21
               25      19
               26      18
               27      24
               24      25
Treatment B    14      21
               17      20
               19      23
               20      27
               17      25
Treatment C    15      37
               17      34
               19      36
               14      26
               12      29

The ANOVA table for the data measured in clinical site 2 is shown below.

Table - Summary of Two-Factor ANOVA - Clinical Site 2

Source of Variation    Sums of Squares (SS)    Degrees of Freedom (df)    Mean Squares (MS)    F       P-Value
Model                  907.0                   5                          181.4                19.4    0.0001
Treatment              71.5                    2                          35.7                 3.8     0.0362
Sex                    313.6                   1                          313.6                33.5    0.0001
Treatment * Sex        521.9                   2                          260.9                27.9    0.0001
Error (or Residual)    224.4                   24                         9.4
Total                  1131.4                  29

Notice that the overall test is significant (F=19.4, p=0.0001), there is a significant treatment effect, sex effect and a highly significant interaction effect. The table below contains the mean times to relief in each of the treatments for men and women.  

Table - Mean Time to Pain Relief by Treatment and Gender - Clinical Site 2

               Male    Female
Treatment A    24.8    21.4
Treatment B    17.4    23.2
Treatment C    15.4    32.4

Notice that now the differences in mean time to pain relief among the treatments depend on sex. Among men, the mean time to pain relief is highest in Treatment A and lowest in Treatment C. Among women, the reverse is true. This is an interaction effect (see below).  

[Figure: mean time to pain relief by treatment for men and women at clinical site 2; the lines cross, indicating an interaction]

Notice above that the treatment effect varies depending on sex. Thus, we cannot summarize an overall treatment effect (in men, treatment C is best, in women, treatment A is best).    

When interaction effects are present, some investigators do not examine main effects (i.e., do not test for treatment effect because the effect of treatment depends on sex). This issue is complex and is discussed in more detail in a later module. 


Reporting and Interpreting One-Way Analysis of Variance (ANOVA) Using a Data-Driven Example: A Practical Guide for Social Science Researchers

  • Simon NTUMI University of Education, Winneba, West Africa, Ghana

One-way (between-groups) analysis of variance (ANOVA) is a statistical procedure used to analyse variation in a response variable (a continuous random variable) measured under conditions defined by discrete factors (classification variables, often with nominal levels). The tool is used to detect a difference in the means of three or more independent groups. It compares the means of the samples or groups in order to make inferences about the population means, and it can be construed as an extension of the independent t-test. Given the omnibus nature of ANOVA, many researchers in the social sciences and related fields have difficulty reporting and interpreting ANOVA results in their studies. This paper provides detailed processes and steps for how researchers can practically analyse and interpret ANOVA in their research. The paper explains that in applying ANOVA, a researcher must first formulate the null and, where appropriate, the alternative hypothesis. After the data have been gathered and cleaned, the researcher must check the statistical assumptions to see whether the data meet them. The researcher then carries out the necessary computations and calculates the F-ratio using software, and compares the calculated F-ratio with the table (critical) value, or simply examines the p-value against the established alpha. If the calculated F-ratio is greater than the table value, the null hypothesis is rejected and the alternative hypothesis is upheld.
