N | DF | Chi-Sq | P-Value |
---|---|---|---|
40 | 3 | 1.8 | 0.615 |
All expected values are at least 5 so we can use the chi-square distribution to approximate the sampling distribution. Our results are \(\chi^2 (3) = 1.8\). \(p = 0.615\). Because our p-value is greater than the standard alpha level of 0.05, we fail to reject the null hypothesis. There is not enough evidence to conclude that the proportions are different in the population.
The example above tested equal population proportions. Minitab also has the ability to conduct a chi-square goodness-of-fit test when the hypothesized population proportions are not all equal. To do this, you can choose to test specified proportions or to use proportions based on historical counts.
Example: tulips.
A company selling tulip bulbs claims they have equal proportions of white, pink, and purple bulbs and that they fill customer orders by randomly selecting bulbs from the population of all of their bulbs.
You ordered 30 bulbs and received 16 white, 8 pink, and 6 purple.
Is there convincing evidence the bulbs you received were not randomly selected from a population with an equal proportion of each color?
Use Minitab to conduct a hypothesis test to address this research question.
We'll go through each of the steps in the hypothesis test:
\(H_0\colon p_{white}=p_{pink}=p_{purple}=\dfrac{1}{3}\) \(H_a\colon\) at least one \(p_i\) is not \(\dfrac{1}{3}\)
All \(p_i\) are \(\frac{1}{3}\), so every expected count is \(30\left(\frac{1}{3}\right)=10\), which is at least 5. This assumption is met, and we can approximate the sampling distribution using the chi-square distribution.
Let's use Minitab to calculate this.
First, enter the summarized data into a Minitab Worksheet.
 | C1: Color | C2: Count |
---|---|---|
1 | White | 16 |
2 | Pink | 8 |
3 | Purple | 6 |
Category | Observed | Test Proportion | Expected | Contribution to Chi-Sq |
---|---|---|---|---|
White | 16 | 0.333333 | 10 | 3.6 |
Pink | 8 | 0.333333 | 10 | 0.4 |
Purple | 6 | 0.333333 | 10 | 1.6 |
N | DF | Chi-Sq | P-Value |
---|---|---|---|
30 | 2 | 5.6 | 0.061 |
The test statistic is a Chi-Square of 5.6.
The p-value from the output is 0.061.
Because the p-value (0.061) is greater than 0.05, we fail to reject the null hypothesis. There is not enough evidence to conclude that the tulip bulbs were not randomly selected from a population with equal proportions of white, pink, and purple.
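If you want to reproduce the Minitab output in code, here is a minimal Python sketch using scipy (an addition for illustration; it is not part of the original Minitab workflow):

```python
from scipy import stats

observed = [16, 8, 6]      # white, pink, purple bulbs received
# With no expected counts supplied, chisquare tests equal proportions (10 of each here)
result = stats.chisquare(observed)

print(result.statistic)    # 5.6
print(result.pvalue)       # about 0.061
```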
Example: roulette.
An American roulette wheel contains 38 slots: 18 red, 18 black, and 2 green. A casino has purchased a new wheel and they want to know if there is convincing evidence that the wheel is unfair. They spin the wheel 100 times and it lands on red 44 times, black 49 times, and green 7 times.
Use Minitab to conduct a hypothesis test to address this question.
If the wheel is 'fair', then the probabilities of red and black are both 18/38 and the probability of green is 2/38.
\(H_0\colon p_{red}=\dfrac{18}{38}, p_{black}=\dfrac{18}{38}, p_{green}=\dfrac{2}{38}\) \(H_a\colon\) at least one \(p_i\) is not as specified in the null
With n = 100, the expected counts are 47.37 for red, 47.37 for black, and 5.26 for green, all at least 5, so we meet the assumptions needed to use the chi-square distribution.
 | C1: Color | C2: Count |
---|---|---|
1 | Red | 44 |
2 | Black | 49 |
3 | Green | 7 |
Category | Observed | Historical Counts | Test Proportion | Expected | Contribution to Chi-Sq |
---|---|---|---|---|---|
Red | 44 | 18 | 0.473684 | 47.3684 | 0.239532 |
Black | 49 | 18 | 0.473684 | 47.3684 | 0.056199 |
Green | 7 | 2 | 0.052632 | 5.2632 | 0.573158 |
N | DF | Chi-Sq | P-Value |
---|---|---|---|
100 | 2 | 0.868889 | 0.648 |
The test statistic is a Chi-Square of 0.87.
The p-value from the output is 0.648.
Because the p-value (0.648) is greater than 0.05, we fail to reject the null hypothesis. There is not enough evidence to state that this roulette wheel is unfair.
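The same check can be done in code. A minimal Python sketch (again an addition, not part of the Minitab lesson) passes the hypothesized proportions 18/38, 18/38, and 2/38 as expected counts:

```python
import numpy as np
from scipy import stats

observed = np.array([44, 49, 7])            # red, black, green in 100 spins
proportions = np.array([18, 18, 2]) / 38    # hypothesized 'fair wheel' proportions
expected = observed.sum() * proportions     # about 47.37, 47.37, 5.26

statistic, p_value = stats.chisquare(observed, f_exp=expected)
print(statistic)   # about 0.87
print(p_value)     # about 0.648
```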
What is the chi-square goodness of fit test?
The Chi-square goodness of fit test is a statistical hypothesis test used to determine whether a variable is likely to come from a specified distribution or not. It is often used to evaluate whether sample data is representative of the full population.
You can use the test when you have counts of values for a categorical variable.
Using the chi-square goodness of fit test.
The Chi-square goodness of fit test checks whether your sample data is likely to be from a specific theoretical distribution. We have a set of data values, and an idea about how the data values are distributed. The test gives us a way to decide if the data values have a “good enough” fit to our idea, or if our idea is questionable.
For the goodness of fit test, we need one variable. We also need an idea, or hypothesis, about how that variable is distributed. Two examples used on this page are bags of candy whose five flavors we think occur in equal proportions, and children's sports teams whose players we think fall into three groups in specified, unequal proportions.
To apply the goodness of fit test to a data set we need:
Let’s use the bags of candy as an example. We collect a random sample of ten bags. Each bag has 100 pieces of candy and five flavors. Our hypothesis is that the proportions of the five flavors in each bag are the same.
Let’s start by answering: Is the Chi-square goodness of fit test an appropriate method to evaluate the distribution of flavors in bags of candy?
Based on the answers above, yes, the Chi-square goodness of fit test is an appropriate method to evaluate the distribution of the flavors in bags of candy.
Figure 1 below shows the combined flavor counts from all 10 bags of candy.
Without doing any statistics, we can see that the number of pieces for each flavor are not the same. Some flavors have fewer than the expected 200 pieces and some have more. But how different are the proportions of flavors? Are the number of pieces “close enough” for us to conclude that across many bags there are the same number of pieces for each flavor? Or are the number of pieces too different for us to draw this conclusion? Another way to phrase this is, do our data values give a “good enough” fit to the idea of equal numbers of pieces of candy for each flavor or not?
To decide, we find the difference between what we have and what we expect. Then, to give flavors with fewer pieces than expected the same importance as flavors with more pieces than expected, we square the difference. Next, we divide the square by the expected count, and sum those values. This gives us our test statistic.
These steps are much easier to understand using numbers from our example.
Let’s start by listing what we expect if each bag has the same number of pieces for each flavor. With 10 bags of 100 pieces each and five flavors, we expect 10 × 100 / 5 = 200 pieces of each flavor.
Flavor | Number of Pieces of Candy (10 bags) | Expected Number of Pieces of Candy |
---|---|---|
Apple | 180 | 200 |
Lime | 250 | 200 |
Cherry | 120 | 200 |
Orange | 225 | 200 |
Grape | 225 | 200 |
Now, we find the difference between what we have observed in our data and what we expect. The last column in Table 2 below shows this difference:
Flavor | Number of Pieces of Candy (10 bags) | Expected Number of Pieces of Candy | Observed-Expected |
---|---|---|---|
Apple | 180 | 200 | 180-200 = -20 |
Lime | 250 | 200 | 250-200 = 50 |
Cherry | 120 | 200 | 120-200 = -80 |
Orange | 225 | 200 | 225-200 = 25 |
Grape | 225 | 200 | 225-200 = 25 |
Some of the differences are positive and some are negative. If we simply added them up, we would get zero. Instead, we square the differences. This gives equal importance to the flavors of candy that have fewer pieces than expected, and the flavors that have more pieces than expected.
Flavor | Number of Pieces of Candy (10 bags) | Expected Number of Pieces of Candy | Observed-Expected | Squared Difference |
---|---|---|---|---|
Apple | 180 | 200 | 180-200 = -20 | 400 |
Lime | 250 | 200 | 250-200 = 50 | 2500 |
Cherry | 120 | 200 | 120-200 = -80 | 6400 |
Orange | 225 | 200 | 225-200 = 25 | 625 |
Grape | 225 | 200 | 225-200 = 25 | 625 |
Next, we divide the squared difference by the expected number:
Flavor | Number of Pieces of Candy (10 bags) | Expected Number of Pieces of Candy | Observed-Expected | Squared Difference | Squared Difference / Expected Number |
---|---|---|---|---|---|
Apple | 180 | 200 | 180-200 = -20 | 400 | 400 / 200 = 2 |
Lime | 250 | 200 | 250-200 = 50 | 2500 | 2500 / 200 = 12.5 |
Cherry | 120 | 200 | 120-200 = -80 | 6400 | 6400 / 200 = 32 |
Orange | 225 | 200 | 225-200 = 25 | 625 | 625 / 200 = 3.125 |
Grape | 225 | 200 | 225-200 = 25 | 625 | 625 / 200 = 3.125 |
Finally, we add the numbers in the final column to calculate our test statistic:
$ 2 + 12.5 + 32 + 3.125 + 3.125 = 52.75 $
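As a cross-check of the hand calculation, a short Python snippet (an addition to the original article) computes the same statistic from the observed and expected counts:

```python
import numpy as np

observed = np.array([180, 250, 120, 225, 225])   # apple, lime, cherry, orange, grape
expected = np.full(5, 200)                       # 10 bags x 100 pieces / 5 flavors = 200 each

chi_sq = np.sum((observed - expected) ** 2 / expected)
print(chi_sq)   # 52.75
```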
To draw a conclusion, we compare the test statistic to a critical value from the Chi-square distribution. This activity involves four steps:
We make a practical conclusion that bags of candy across the full population do not have an equal number of pieces for the five flavors. This makes sense if you look at the original data. If your favorite flavor is Lime, you are likely to have more of your favorite flavor than the other flavors. If your favorite flavor is Cherry, you are likely to be unhappy because there will be fewer pieces of Cherry candy than you expect.
Let’s use a few graphs to understand the test and the results.
A simple bar chart of the data shows the observed counts for the flavors of candy:
Another simple bar chart shows the expected counts of 200 per flavor. This is what our chart would look like if the bags of candy had an equal number of pieces of each flavor.
The side-by-side chart below shows the actual observed number of pieces of candy in blue. The orange bars show the expected number of pieces. You can see that some flavors have more pieces than we expect, and other flavors have fewer pieces.
The statistical test is a way to quantify the difference. Is the actual data from our sample “close enough” to what is expected to conclude that the flavor proportions in the full population of bags are equal? Or not? From the candy data above, most people would say the data is not “close enough” even without a statistical test.
What if your data looked like the example in Figure 5 below instead? The purple bars show the observed counts and the orange bars show the expected counts. Some people would say the data is “close enough” but others would say it is not. The statistical test gives a common way to make the decision, so that everyone makes the same decision on a set of data values.
Let’s look at the candy data and the Chi-square test for goodness of fit using statistical terms. This test is also known as Pearson’s Chi-square test.
Our null hypothesis is that the proportion of flavors in each bag is the same. We have five flavors. The null hypothesis is written as:
$ H_0: p_1 = p_2 = p_3 = p_4 = p_5 $
The formula above uses p for the proportion of each flavor. If each 100-piece bag contains equal numbers of pieces of candy for each of the five flavors, then the bag contains 20 pieces of each flavor. The proportion of each flavor is 20 / 100 = 0.2.
The alternative hypothesis is that at least one of the proportions is different from the others. This is written as:
$ H_a: \text{at least one } p_i \text{ is not equal} $
In some cases, we are not testing for equal proportions. Look again at the example of children's sports teams near the top of this page. Using that as an example, our null and alternative hypotheses are:
$ H_0: p_1 = 0.2, p_2 = 0.65, p_3 = 0.15 $
$ H_a: \text{at least one } p_i \text{ is not equal to its expected value} $
Unlike other hypotheses that involve a single population parameter, we cannot use just a formula. We need to use words as well as symbols to describe our hypotheses.
We calculate the test statistic using the formula below:
$ \chi^2 = \sum^n_{i=1} \frac{(O_i-E_i)^2}{E_i} $
In the formula above, we have n groups. The $ \sum $ symbol means to add up the calculations for each group. For each group, we do the same steps as in the candy example. The formula shows $ O_i $ as the observed value and $ E_i $ as the expected value for group i.
We then compare the test statistic to a Chi-square value with our chosen significance level (also called the alpha level) and the degrees of freedom for our data. Using the candy data as an example, we set α = 0.05 and have four degrees of freedom. For the candy data, the Chi-square value is written as:
$ \chi^2_{0.05,4} $
There are two possible results from our comparison:

If the test statistic is lower than the Chi-square critical value, we fail to reject the null hypothesis; the data have a "good enough" fit to the hypothesized distribution.

If the test statistic is higher than the Chi-square critical value, we reject the null hypothesis; the data do not fit the hypothesized distribution well enough.
Let’s use a graph of the Chi-square distribution to better understand the test results. You are checking to see if your test statistic is a more extreme value in the distribution than the critical value. The distribution below shows a Chi-square distribution with four degrees of freedom. It shows how the critical value of 9.488 “cuts off” 95% of the data. Only 5% of the data is greater than 9.488.
The next distribution plot includes our results. You can see how far out “in the tail” our test statistic is, represented by the dotted line at 52.75. In fact, with this scale, it looks like the curve is at zero where it intersects with the dotted line. It isn’t, but it is very, very close to zero. We conclude that it is very unlikely for this situation to happen by chance. If the true population of bags of candy had equal flavor counts, we would be extremely unlikely to see the results that we collected from our random sample of 10 bags.
Most statistical software shows the p-value for a test. This is the likelihood of finding a more extreme value for the test statistic in a similar sample, assuming that the null hypothesis is correct. It’s difficult to calculate the p-value by hand. For the figure above, if the test statistic is exactly 9.488, then the p-value will be p = 0.05. With the test statistic of 52.75, the p-value is very, very small. In this example, most statistical software will report the p-value as “p < 0.0001.” This means that the likelihood of another sample of 10 bags of candy resulting in a more extreme value for the test statistic is less than one chance in 10,000, assuming our null hypothesis of equal counts of flavors is true.
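The critical value and p-value quoted above can be reproduced with the chi-square distribution functions in scipy; this is an added sketch, not part of the original article:

```python
from scipy import stats

df = 4                                    # five flavors minus one
critical_value = stats.chi2.ppf(0.95, df) # cuts off the upper 5% of the distribution
print(critical_value)                     # about 9.488

p_value = stats.chi2.sf(52.75, df)        # upper-tail area beyond the test statistic
print(p_value)                            # far smaller than 0.0001
```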
The chi-square goodness-of-fit test can be applied to either a categorical variable or a discrete quantitative variable with a finite number of possible values. The objective of the chi-square goodness-of-fit test is to determine whether the variable follows the probability distribution specified in the null hypothesis [latex]H_0[/latex] or deviates from it.
The main idea behind the chi-square goodness-of-fit test is to compare the observed frequencies ([latex]O[/latex]) to the expected frequencies ([latex]E[/latex]), which are based on the probability distribution specified in [latex]H_0[/latex]. If [latex]H_0[/latex] is true, the observed and expected frequencies should be reasonably similar. Therefore, we reject [latex]H_0[/latex] if the observed and expected frequencies are very different. The discrepancy between the observed and expected frequencies can be quantified by the chi-square statistic
[latex]\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E}[/latex]
which follows a chi-square distribution with [latex]df = k-1[/latex], where [latex]k[/latex] is the number of possible values for the variable under consideration. The chi-square statistic will be large when the observed and expected frequencies are very different. Thus, we reject the null hypothesis when the chi-square statistic is sufficiently large. More specifically, at the significance level of [latex]\alpha[/latex], we reject [latex]H_0[/latex] if the chi-square statistic is larger than the critical value [latex]\chi_{\alpha}^2[/latex]. Since we only reject [latex]H_0[/latex] if the chi-square statistic is sufficiently large, chi-square tests are always right-tailed. That is, both the rejection region and the p-value are upper-tailed probabilities.
Assumptions:

1. All expected frequencies are at least 1.
2. No more than 20% of the expected frequencies are less than 5.

Note: If assumptions 1 or 2 are violated, one can consider combining the cells to increase the counts in those cells.
Steps to perform a chi-square goodness-of-fit test:
First, check the assumptions. Calculate the expected frequency for each possible value of the variable using [latex]E=np[/latex], where [latex]n[/latex] is the total number of observations and [latex]p[/latex] is the relative frequency (or probability) specified in the null hypothesis. Check whether the expected frequencies satisfy assumptions 1 and 2. If not, consider combining some cells.
Rejection region | [latex]\chi^2 \geq \chi_{\alpha}^2[/latex] the region to the right of [latex]\chi_{\alpha}^2[/latex], the area is [latex]\alpha[/latex] |
---|---|
P-value | [latex]P(\chi^2 \geq \chi_o^2)[/latex] the area to the right of [latex]\chi_o^2[/latex] under the curve |
Example: Chi-Square Goodness-of-Fit Test
According to the results of the federal election in 2015, 31.9% of votes supported the Conservative Party, 39.5% supported the Liberal Party, 19.7% supported the New Democratic Party (NDP), 4.7% supported Bloc Québécois, and 3.4% supported the Green Party (data from Wikipedia). Thirty-seven students in my Stat151 class responded to an online survey and their preferences are summarized in the following table:
Table 11.2: Voting Preference of the Class
Test at the 5% significance level whether the class had different voting preferences than all Canadians in the 2015 election.
Check the assumptions : since [latex]n = 37[/latex], each expected frequency is computed as [latex]E = np = 37 \times p[/latex]. For example, the expected count of conservative voters is [latex]E = 37 \times 0.319 = 11.803[/latex]. The following table gives all expected counts:
Table 11.3: Expected Frequency of Voting Preference

 | Conservative | Green | Liberal | NDP | Bloc Québécois | Others |
---|---|---|---|---|---|---|
Proportion [latex](p)[/latex] | 0.319 | 0.034 | 0.395 | 0.197 | 0.047 | 0.008 |
Expected counts [latex](E = 37p)[/latex] | 11.803 | 1.258 | 14.615 | 7.289 | 1.739 | 0.296 |
There are [latex]k = 6[/latex] cells, so at most [latex]6 \times 0.2 = 1.2[/latex] cells (that is, at most one cell) may have an expected count less than 5; however, three cells actually have expected counts less than 5. We can combine the cells “Green”, “Bloc Québécois” and “Others” into a single cell, which we again call “Others”. This gives the working table below.
Table 11.4: Working Table for a Chi-Square Goodness-of-Fit Test (Example)

 | Conservative | Liberal | NDP | Others |
---|---|---|---|---|
Proportion [latex](p)[/latex] | 0.319 | 0.395 | 0.197 | 0.089 |
Expected counts [latex](E = 37p)[/latex] | 11.803 | 14.615 | 7.289 | 3.293 |
Note: After combining the cells, all the expected counts are greater than 1, while 25% of the expected counts are below 5 (the expected count for Others is below 5). Since more than 20% of the expected counts are below 5, there is still a violation in the assumptions. However, the expected frequency for “Others” is 3.293 which is not very far away from 5. To maintain a meaningful number of parties, we proceed to conduct the chi-square goodness-of-fit test.
If using the critical value approach, steps 4–6 are as follows:
Exercise: Chi-square goodness-of-fit test
A company claims their deluxe mixed nuts consist of 20% peanuts, 60% cashews, and 20% almonds. An inspector obtains a random sample of [latex]n = 100[/latex] nuts and observes 30 peanuts, 55 cashews, and 15 almonds. Test at the 5% significance level whether the percentages differ from what the company claims.
Check the assumptions : [latex]n = 100[/latex] and the expected counts are [latex]E_{\text{peanut}} = 100 \times 0.2 = 20, E_{\text{cashew}} = 100 \times 0.6 = 60,[/latex] [latex]E_{\text{almond}} = 100 \times 0.2 = 20[/latex] and all greater than 5.
Table 11.5: Working Table for Chi-Square Goodness-of-Fit Test (Exercise)

 | Observed [latex](O)[/latex] | Proportion [latex](p)[/latex] | [latex]E = np = 100 \times p[/latex] | [latex]\frac{(O-E)^2}{E}[/latex] |
---|---|---|---|---|
Peanuts | 30 | 0.2 | 20 | 5.000 |
Cashews | 55 | 0.6 | 60 | 0.417 |
Almonds | 15 | 0.2 | 20 | 1.250 |
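A short Python sketch (not part of the textbook's solution) carries out the calculation for this exercise. Under the stated counts the statistic is about 6.67, which exceeds the critical value of about 5.99, so the company's claimed percentages are rejected at the 5% level:

```python
from scipy import stats

observed = [30, 55, 15]            # peanuts, cashews, almonds
expected = [20, 60, 20]            # 100 nuts at the claimed 20% / 60% / 20%

statistic, p_value = stats.chisquare(observed, f_exp=expected)
print(statistic)                   # about 6.67
print(p_value)                     # about 0.036

print(stats.chi2.ppf(0.95, df=2))  # critical value, about 5.99
```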
Goodness-of-Fit Test
In this type of hypothesis test, you determine whether the data “fit” a particular distribution or not. For example, you may suspect your unknown data fit a binomial distribution. You use a chi-square test (meaning the distribution for the hypothesis test is chi-square) to determine if there is a fit or not. The null and the alternative hypotheses for this test may be written in sentences or may be stated as equations or inequalities.
The test statistic for a goodness-of-fit test is: [latex]\displaystyle{\sum_{k}}\frac{{({O}-{E})}^{{2}}}{{E}}[/latex]
The observed values are the data values and the expected values are the values you would expect to get if the null hypothesis were true. There are n terms of the form [latex]\displaystyle\frac{{({O}-{E})}^{{2}}}{{E}}[/latex].
The number of degrees of freedom is df = (number of categories – 1).
The goodness-of-fit test is almost always right-tailed. If the observed values and the corresponding expected values are not close to each other, then the test statistic can get very large and will be way out in the right tail of the chi-square curve.
Note: The expected value for each cell needs to be at least five in order for you to use this test.
Absenteeism of college students from math classes is a major concern to math instructors because missing class appears to increase the drop rate. Suppose that a study was done to determine if the actual student absenteeism rate follows faculty perception. The faculty expected that a group of 100 students would miss class according to this table.
Number of absences per term | Expected number of students |
---|---|
0–2 | 50 |
3–5 | 30 |
6–8 | 12 |
9–11 | 6 |
12+ | 2 |
A random survey across all mathematics courses was then done to determine the actual number (observed) of absences in a course. The chart in this table displays the results of that survey.
Number of absences per term | Actual number of students |
---|---|
0–2 | 35 |
3–5 | 40 |
6–8 | 20 |
9–11 | 1 |
12+ | 4 |
Determine the null and alternative hypotheses needed to conduct a goodness-of-fit test.
H 0 : Student absenteeism fits faculty perception.
The alternative hypothesis is the opposite of the null hypothesis.
H a : Student absenteeism does not fit faculty perception.
Because the expected count for the 12+ category is only 2, which is less than 5, the last two categories are combined into a single 9+ category. The revised expected and observed tables are:

Number of absences per term | Expected number of students |
---|---|
0–2 | 50 |
3–5 | 30 |
6–8 | 12 |
9+ | 8 |
Number of absences per term | Actual number of students |
---|---|
0–2 | 35 |
3–5 | 40 |
6–8 | 20 |
9+ | 5 |
A factory manager needs to understand how many products are defective versus how many are produced. The number of expected defects is listed in the table.
Number produced | Number defective |
---|---|
0–100 | 5 |
101–200 | 6 |
201–300 | 7 |
301–400 | 8 |
401–500 | 10 |
A random sample was taken to determine the actual number of defects. This table shows the results of the survey.
Number produced | Number defective |
---|---|
0–100 | 5 |
101–200 | 7 |
201–300 | 8 |
301–400 | 9 |
401–500 | 11 |
State the null and alternative hypotheses needed to conduct a goodness-of-fit test, and state the degrees of freedom.
H 0 : The number of defects fits expectations.
H a : The number of defects does not fit expectations.
The degrees of freedom are df = 5 – 1 = 4.
Employers want to know which days of the week employees are absent in a five-day work week. Most employers would like to believe that employees are absent equally during the week. Suppose a random sample of 60 managers were asked on which day of the week they had the highest number of employee absences. The results were distributed as in the table below. For the population of employees, do the days for the highest number of absences occur with equal frequencies during a five-day work week? Test at a 5% significance level.
Day of the Week Employees were Most Absent
Monday | Tuesday | Wednesday | Thursday | Friday | |
---|---|---|---|---|---|
Number of Absences | 15 | 12 | 9 | 9 | 15 |
The null and alternative hypotheses are:
H 0 : The absent days occur with equal frequencies; that is, they fit a uniform distribution.
H a : The absent days occur with unequal frequencies; that is, they do not fit a uniform distribution.
If the absent days occur with equal frequencies, then, out of 60 absent days (the total in the sample: 15 + 12 + 9 + 9 + 15 = 60), there would be 12 absences on Monday, 12 on Tuesday, 12 on Wednesday, 12 on Thursday, and 12 on Friday. These numbers are the expected ( E ) values. The values in the table are the observed ( O ) values or data.
This time, calculate the χ 2 test statistic by hand. Make a chart with the following headings and fill in the columns: expected (E) values, observed (O) values, (O – E), (O – E)², and (O – E)²/E.
Now add (sum) the last column. The sum is three. This is the χ 2 test statistic.
To find the p -value, calculate P ( χ 2 > 3). This test is right-tailed. (Use a computer or calculator to find the p -value. You should get p -value = 0.5578.)
The degrees of freedom are the number of cells – 1 = 5 – 1 = 4.
Press 2nd DISTR . Arrow down to χ2cdf . Press ENTER . Enter (3,10^99,4) . Rounded to four decimal places, you should see 0.5578, which is the p-value.
Next, complete a graph like the following one with the proper labeling and shading. (You should shade the right tail.)
The decision is not to reject the null hypothesis.
Conclusion: At a 5% level of significance, from the sample data, there is not sufficient evidence to conclude that the absent days do not occur with equal frequencies.
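If you would rather check the by-hand result with software instead of the TI steps, a small Python sketch (an addition to the original text) gives the same statistic and p-value:

```python
from scipy import stats

observed = [15, 12, 9, 9, 15]   # Monday through Friday absences
# With no expected counts supplied, chisquare uses equal frequencies (12 per day)
statistic, p_value = stats.chisquare(observed)

print(statistic)   # 3.0
print(p_value)     # about 0.5578
```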
TI-83+ and some TI-84 calculators do not have a special program for the test statistic for the goodness-of-fit test. The next example has the calculator instructions. The newer TI-84 calculators have in STAT TESTS the test Chi2 GOF . To run the test, put the observed values (the data) into a first list and the expected values (the values you expect if the null hypothesis is true) into a second list. Press STAT TESTS and Chi2 GOF . Enter the list names for the Observed list and the Expected list. Enter the degrees of freedom and press calculate or draw . Make sure you clear any lists before you start. To Clear Lists in the calculators: Go into STAT EDIT and arrow up to the list name area of the particular list. Press CLEAR and then arrow down. The list will be cleared. Alternatively, you can press STAT and press 4 (for ClrList ). Enter the list name and press ENTER .
Teachers want to know which night each week their students are doing most of their homework. Most teachers think that students do homework equally throughout the week. Suppose a random sample of 56 students were asked on which night of the week they did the most homework. The results were distributed as in the table.
Sunday | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday | |
---|---|---|---|---|---|---|---|
Number of Students | 11 | 8 | 10 | 7 | 10 | 5 | 5 |
From the population of students, do the nights for the highest number of students doing the majority of their homework occur with equal frequencies during a week? What type of hypothesis test should you use?
p -value = 0.6093
We fail to reject the null hypothesis. There is not enough evidence to conclude that students do not do the majority of their homework equally throughout the week.
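The same Try It can be verified with a short Python sketch (not part of the original exercise); with 56 students the expected count is 56/7 = 8 per night:

```python
from scipy import stats

observed = [11, 8, 10, 7, 10, 5, 5]            # Sunday through Saturday
statistic, p_value = stats.chisquare(observed) # equal expected counts of 8 per night

print(statistic)   # 4.5
print(p_value)     # about 0.6093
```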
One study indicates that the number of televisions that American families have is distributed (this is the given distribution for the American population) as in the table.
Number of Televisions | Percent |
---|---|
0 | 10 |
1 | 16 |
2 | 55 |
3 | 11 |
4+ | 8 |
The table contains expected ( E ) percents.
A random sample of 600 families in the far western United States resulted in the data in this table.
Number of Televisions | Frequency |
---|---|
0 | 66 |
1 | 119 |
2 | 340 |
3 | 60 |
4+ | 15 |
Total | 600 |
The table contains observed ( O ) frequency values.
At the 1% significance level, does it appear that the distribution “number of televisions” of far western United States families is different from the distribution for the American population as a whole?
This problem asks you to test whether the far western United States families distribution fits the distribution of the American families. This test is always right-tailed.
The first table contains expected percentages. To get expected ( E ) frequencies, multiply the percentage by 600. The expected frequencies are shown in this table.
Number of Televisions | Percent | Expected Frequency |
---|---|---|
0 | 10 | (0.10)(600) = 60 |
1 | 16 | (0.16)(600) = 96 |
2 | 55 | (0.55)(600) = 330 |
3 | 11 | (0.11)(600) = 66 |
4+ | 8 | (0.08)(600) = 48 |
Therefore, the expected frequencies are 60, 96, 330, 66, and 48. In the TI calculators, you can let the calculator do the math. For example, instead of 60, enter 0.10*600.
H 0 : The “number of televisions” distribution of far western United States families is the same as the “number of televisions” distribution of the American population.
H a : The “number of televisions” distribution of far western United States families is different from the “number of televisions” distribution of the American population.
Distribution for the test: [latex]\displaystyle\chi^{2}_{4}[/latex] where df = (the number of cells) – 1 = 5 – 1 = 4.
Note : [latex]df\neq600-1[/latex]
Calculate the test statistic: χ 2 = 29.65
Probability statement: p -value = P ( χ 2 > 29.65) = 0.000006
Compare α and the p -value:
α = 0.01 p -value = 0.000006
So, α > p -value.
Make a decision: Since α > p -value, reject H 0 .
This means you reject the belief that the distribution for the far western states is the same as that of the American population as a whole.
Conclusion: At the 1% significance level, from the data, there is sufficient evidence to conclude that the “number of televisions” distribution for the far western United States is different from the “number of televisions” distribution for the American population as a whole.
Press STAT and ENTER . Make sure to clear lists L1 , L2 , and L3 if they have data in them (see the note at the end of Example 2). Into L1 , put the observed frequencies 66 , 119 , 340 , 60 , 15 . Into L2 , put the expected frequencies .10*600, .16*600 , .55*600 , .11*600 , .08*600 . Arrow over to list L3 and up to the name area L3 . Enter (L1-L2)^2/L2 and ENTER . Press 2nd QUIT . Press 2nd LIST and arrow over to MATH . Press 5 . You should see "sum" (Enter L3) . Rounded to 2 decimal places, you should see 29.65 . Press 2nd DISTR . Press 7 or Arrow down to 7:χ2cdf and press ENTER . Enter (29.65,1E99,4) . Rounded to four places, you should see 5.77E-6 = .000006 (rounded to six decimal places), which is the p-value.
The newer TI-84 calculators have in STAT TESTS the test Chi2 GOF . To run the test, put the observed values (the data) into a first list and the expected values (the values you expect if the null hypothesis is true) into a second list. Press STAT TESTS and Chi2 GOF . Enter the list names for the Observed list and the Expected list. Enter the degrees of freedom and press calculate or draw . Make sure you clear any lists before you start.
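For readers not using a TI calculator, a minimal Python sketch (added here, not part of the original solution) reproduces the same test statistic and p-value:

```python
import numpy as np
from scipy import stats

observed = np.array([66, 119, 340, 60, 15])
percents = np.array([0.10, 0.16, 0.55, 0.11, 0.08])
expected = 600 * percents                   # 60, 96, 330, 66, 48

statistic, p_value = stats.chisquare(observed, f_exp=expected)
print(statistic)   # about 29.65
print(p_value)     # about 0.000006
```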
The expected percentage of the number of pets students have in their homes is distributed (this is the given distribution for the student population of the United States) as in this table.
Number of Pets | Percent |
---|---|
0 | 18 |
1 | 25 |
2 | 30 |
3 | 18 |
4+ | 9 |
A random sample of 1,000 students from the Eastern United States resulted in the data in the table below.
Number of Pets | Frequency |
---|---|
0 | 210 |
1 | 240 |
2 | 320 |
3 | 140 |
4+ | 90 |
At the 1% significance level, does it appear that the distribution “number of pets” of students in the Eastern United States is different from the distribution for the United States student population as a whole? What is the p -value?
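The text does not show the answer here, so as an added, hedged check only: a short Python sketch using the stated sample of 1,000 students gives a test statistic of roughly 15.6 and a p-value of roughly 0.004, which is below the 1% significance level:

```python
import numpy as np
from scipy import stats

observed = np.array([210, 240, 320, 140, 90])
percents = np.array([0.18, 0.25, 0.30, 0.18, 0.09])
expected = 1000 * percents                  # 180, 250, 300, 180, 90

statistic, p_value = stats.chisquare(observed, f_exp=expected)
print(statistic)   # about 15.6
print(p_value)     # about 0.004
```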
Suppose you flip two coins 100 times. The results are 20 HH , 27 HT , 30 TH , and 23 TT . Are the coins fair? Test at a 5% significance level.
This problem can be set up as a goodness-of-fit problem. The sample space for flipping two fair coins is { HH , HT , TH , TT }. Out of 100 flips, you would expect 25 HH , 25 HT , 25 TH , and 25 TT . This is the expected distribution. The question, “Are the coins fair?” is the same as saying, “Does the distribution of the coins (20 HH , 27 HT , 30 TH , 23 TT ) fit the expected distribution?”
Random Variable: Let X = the number of heads in one flip of the two coins. X takes on the values 0, 1, 2. (There are 0, 1, or 2 heads in the flip of two coins.) Therefore, the number of cells is three . Since X = the number of heads, the observed frequencies are 20 (for two heads), 57 (for one head), and 23 (for zero heads or both tails). The expected frequencies are 25 (for two heads), 50 (for one head), and 25 (for zero heads or both tails). This test is right-tailed.
H 0 : The coins are fair.
H a : The coins are not fair.
Distribution for the test: [latex]\chi^2_2[/latex] where df = 3 – 1 = 2.
Calculate the test statistic: χ 2 = 2.14
Probability statement: p -value = P ( χ 2 > 2.14) = 0.3430
α < p -value.
Make a decision: Since α < p -value, do not reject H 0 .
Conclusion: There is insufficient evidence to conclude that the coins are not fair.
Press STAT and ENTER . Make sure you clear lists L1 , L2 , and L3 if they have data in them. Into L1 , put the observed frequencies 20 , 57 , 23 . Into L2 , put the expected frequencies 25 , 50 , 25 . Arrow over to list L3 and up to the name area "L3" . Enter (L1-L2)^2/L2 and ENTER . Press 2nd QUIT . Press 2nd LIST and arrow over to MATH . Press 5 . You should see "sum" . Enter L3 . Rounded to two decimal places, you should see 2.14 . Press 2nd DISTR . Arrow down to 7:χ2cdf (or press 7 ). Press ENTER . Enter 2.14,1E99,2) . Rounded to four places, you should see .3430 , which is the p-value.
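The same numbers can be reproduced without a calculator; here is a minimal Python sketch (an addition to the original example):

```python
from scipy import stats

observed = [23, 57, 20]   # zero heads, one head, two heads
expected = [25, 50, 25]   # what two fair coins should give in 100 flips

statistic, p_value = stats.chisquare(observed, f_exp=expected)
print(statistic)   # 2.14
print(p_value)     # about 0.3430
```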
Students in a social studies class hypothesize that the literacy rates across the world for every region are 82%. This table shows the actual literacy rates across the world broken down by region. What are the test statistic and the degrees of freedom?
MDG Region | Adult Literacy Rate (%) |
---|---|
Developed Regions | 99.0 |
Commonwealth of Independent States | 99.5 |
Northern Africa | 67.3 |
Sub-Saharan Africa | 62.5 |
Latin America and the Caribbean | 91.0 |
Eastern Asia | 93.8 |
Southern Asia | 61.9 |
South-Eastern Asia | 91.9 |
Western Asia | 84.5 |
Oceania | 66.4 |
χ 2 test statistic = 26.38; degrees of freedom = 10 – 1 = 9
Press STAT and ENTER . Make sure you clear lists L1, L2, and L3 if they have data in them. Into L1, put the observed frequencies 99, 99.5, 67.3, 62.5, 91, 93.8, 61.9, 91.9, 84.5, 66.4 . Into L2 , put the expected frequencies 82, 82, 82, 82, 82, 82, 82, 82, 82, 82 . Arrow over to list L3 and up to the name area L3 . Enter (L1-L2)^2/L2 and ENTER . Press 2nd QUIT . Press 2nd LIST and arrow over to MATH . Press 5 . You should see "sum" . Enter L3 . Rounded to two decimal places, you should see 26.38 . Press 2nd DISTR . Arrow down to 7:χ2cdf (or press 7 ). Press ENTER . Enter 26.38,1E99,9) . Rounded to four places, you should see .0018 , which is the p -value.
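A short Python sketch (added here) reproduces the textbook's calculation. As in the calculator steps above, it treats the literacy percentages themselves as the observed values, so the observed and expected totals differ slightly; for that reason the statistic is computed directly rather than with scipy's chisquare helper:

```python
import numpy as np
from scipy import stats

observed = np.array([99.0, 99.5, 67.3, 62.5, 91.0, 93.8, 61.9, 91.9, 84.5, 66.4])
expected = np.full(10, 82.0)   # hypothesized 82% literacy for every region

chi_sq = np.sum((observed - expected) ** 2 / expected)
p_value = stats.chi2.sf(chi_sq, df=10 - 1)

print(chi_sq)    # about 26.38
print(p_value)   # about 0.0018
```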
By Jim Frost
Goodness of fit evaluates how well observed data align with the expected values from a statistical model.
When diving into statistics, you’ll often ask, “How well does my model fit the data?” A tight fit? Your model’s excellent. A loose fit? Maybe reconsider that model. That’s the essence of goodness of fit. More specifically:
A model that fits the data well provides accurate predictions and deeper insights, while a poor fit can lead to misleading conclusions and predictions. Ensuring a good fit is crucial for reliable outcomes and informed actions.
A goodness of fit measure summarizes the size of the differences between the observed data and the model’s expected values. A goodness of fit test determines whether the differences are statistically significant. Moreover, they can guide us in choosing a model offering better representation. The appropriate goodness of fit measure and test depend on the setting.
In this blog post, you’ll learn about the essence of goodness of fit in the crucial contexts of regression models and probability distributions. We’ll measure it in regression models and learn how to test sample data against distributions using goodness of fit tests.
In regression models, understanding the goodness of fit is crucial to ensure accurate predictions and meaningful insights; here, we’ll delve into key metrics that reveal this alignment with the data.
A regression model fits the data well when the differences between the observed and predicted values are small and unbiased. Statisticians refer to these differences as residuals .
As the goodness of fit increases, the data points move closer to the model’s fitted line.
R-squared is a goodness of fit statistic for linear regression models. It measures the percentage of the dependent variable variation the model explains using a convenient 0 – 100% scale.
R-squared evaluates the spread of the data around the fitted regression line. For a data set, higher R-squared values indicate smaller differences between the sample data and the fitted values.
The model with the wider spread has an R-squared of 15% while the one with the narrower spread is 85%.
Think of R² as the percentage of the variation that the model explains. Higher R²? Better fit.
Remember, it’s not the sole indicator. High R² doesn’t always mean a perfect model!
Learn more about How to Interpret R-squared and Independent vs Dependent Variables .
The standard error of the regression (S) is a goodness of fit measure that provides the typical size of the absolute difference between observed and predicted values. S uses the units of the dependent variable (DV).
Suppose your model uses body mass index (BMI) to predict the body fat percentage (the DV). Consequently, if your model’s S is 3.5, then you know that its predicted values are typically 3.5% from the observed body fat percentage values.
However, don’t view it in isolation. Compare it with the dependent variable’s units for context.
Learn more about the Standard Error of the Regression .
Akaike’s Information Criterion is a goodness of fit measure that statisticians designed to compare models and help you pick the best one. The AIC value isn’t meaningful itself, but you’re looking for the model with the lowest AIC.
Learn why you want a simpler model, which statisticians refer to as a parsimonious model: What is a Parsimonious Model? Benefits & Selecting .
There are other indicators, like Adjusted R² and BIC. Each has its unique strength. But for a start, focus on these three.
Sometimes, your statistical model is that your data follow a particular probability distribution, such as the normal , lognormal , Poisson , or some other distribution. You want to know if your sample’s distribution is consistent with the hypothesized distribution. Learn more about Probability Distributions .
Because many statistical tests and methods rest on distributional assumptions.
For instance, t-tests and ANOVA assume your data are normal. Conversely, you might expect a Poisson distribution if you’re analyzing the number of daily website visits. Capability analysis in the quality arena depends on knowing precisely which distribution your data follow.
Enter goodness of fit tests.
A goodness of fit test determines whether the differences between your sample data and the distribution are statistically significant. In this context, statistical significance indicates the model does not adequately fit the data. The test results can guide the analytical procedures you’ll use.
I’ll cover two of the many available goodness of fit tests. The Anderson-Darling test works for continuous data, and the chi-square goodness of fit test is for categorical and discrete data.
The Anderson-Darling goodness of fit test compares continuous sample data to a particular probability distribution. Statisticians often use it for normality tests, but the Anderson-Darling Test can also assess other probability distributions, making it versatile in statistical analysis.
The hypotheses for the Anderson-Darling test are the following:

Null hypothesis: The data follow the specified distribution.

Alternative hypothesis: The data do not follow the specified distribution.
When the p-value is less than your significance level , reject the null hypothesis . Consequently, statistically significant results for a goodness of fit test suggest your data do not fit the chosen distribution, prompting further investigation or model adjustments.
Imagine you’re researching the body fat percentages of pre-teen girls, and you want to know if these percentages follow a normal distribution. You can download the CSV data file: body_fat .
After collecting body fat data from 92 girls, you perform the Anderson-Darling Test and obtain the following results.
Because the p-value is less than 0.05, reject the null hypothesis and conclude the sample data do not follow a normal distribution.
Learn how to identify the distribution of this bodyfat dataset using the Anderson-Darling goodness of fit test.
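The body fat dataset itself is not reproduced on this page, so the sketch below is an illustration only: it runs scipy's Anderson-Darling normality test on simulated skewed data standing in for the real measurements. Note that scipy reports the statistic together with critical values at fixed significance levels rather than a p-value:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
sample = rng.lognormal(mean=3.0, sigma=0.4, size=92)  # stand-in for the 92 body fat values

result = stats.anderson(sample, dist='norm')
print(result.statistic)            # Anderson-Darling statistic
print(result.critical_values)      # critical values at 15%, 10%, 5%, 2.5%, 1%
print(result.significance_level)   # the matching significance levels

# If the statistic exceeds the critical value at your chosen level, reject normality.
```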
The chi square goodness of fit test reveals if the proportions of a discrete or categorical variable follow a distribution with hypothesized proportions.
Statisticians often use the chi square goodness of fit test to evaluate if the proportions of categorical outcomes are all equal. Or the analyst can list the proportions to use in the test. Alternatively, this test can determine if the observed outcomes fit a discrete probability distribution, like the Poisson distribution.
This goodness of fit test does the following:
Imagine you’re curious about dice fairness. You roll a six-sided die 600 times, expecting each face to come up 100 times if it’s fair.
The observed counts are 90, 110, 95, 105, 95, and 105 for sides 1 through 6. The observed values don’t match the expected values of 100 for each die face. Let’s run the Chi-square goodness of fit test for these data to see if those differences are statistically significant.
The p-value of 0.700 is greater than 0.05, so you fail to reject the null hypothesis . The observed frequencies don’t differ significantly from the expected frequencies. Your sample data do not support the claim that the die is unfair!
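Here is a minimal Python version of the dice check (an added sketch, not from the original post):

```python
from scipy import stats

observed = [90, 110, 95, 105, 95, 105]          # counts for faces 1 through 6 in 600 rolls
statistic, p_value = stats.chisquare(observed)  # expected 100 per face by default

print(statistic)   # 3.0
print(p_value)     # about 0.700
```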
To explore other examples of the chi square test in action, read the following:
Goodness of fit tells the story of your data and its relationship with a model. It’s like a quality check. For regression, R², S, and AIC are great starters. For probability distributions, the Anderson-Darling and Chi-squared goodness of fit tests are go-tos. Dive in, fit that model, and let the data guide you!
October 23, 2023 at 11:02 am
Jim, I have a pricing curve model to estimate the curvature of per unit cost (decrease) as purchased quantity increases. It follows the power law Y=Ax^B.
In my related log-log linear regression, the average residual is $0.00, which makes sense because we kept the Y-intercept in the model. However, in the transformed model in natural units, the residuals no longer average $0.00. Why does that property not carry over to the Y=Ax^B form of regression?
As a side note, I have your book “Regression Analysis,” which I have read several times and learned quite a lot. I believe there are two similar errors in Chapter 13, not related to my question above.
On page 323, when transforming the fitted line in log units back to natural units, the coefficient A in Y=Ax^B should be the common antilog of 0.5758 or 3.7653. Similarly, on page 325, the coefficient A should be the common antilog of 1.879 or 75.6833. This can be visually checked for reasonableness by looking at the graph on page 325. If we look at the x-axis, say at x=1, it appears y should be slightly less than 100. If we evaluate the power expression Y=75.6833x^(-0.6383), the fitted value is 75.68, which seems to be what the graph predicts.
The relevant logarithmic identity is log(ab) = log(a) + log(b). The Y-intercept in the log-log linear model is necessarily in log units, not natural units.
October 23, 2023 at 2:29 pm
Those are good questions.
I’m not exactly sure what is happening in your model but here are my top two possibilities.
When you transform data, especially using non-linear transformations like logarithms, the relationship between the variables can change. In the log-log linear regression, the relationship is linear, and the residuals (differences between observed and predicted values) average out to $0.00. However, when you transform back to the natural units using an exponential function, the relationship becomes non-linear. This non-linearity can cause the residuals to no longer average out to $0.00.
When you re-express the log-log model to its natural units form, there might be some approximation or rounding errors. These errors can accumulate and affect the average of the residuals.
As for the output in the book, that was all calculated by statistical software that I trust (Minitab). I’ll have to look deeper into what is going on, but I trust the results.
Hypothesis Testing - Chi Squared Test
Lisa Sullivan, PhD
Professor of Biostatistics
Boston University School of Public Health
This module will continue the discussion of hypothesis testing, where a specific statement or hypothesis is generated about a population parameter, and sample statistics are used to assess the likelihood that the hypothesis is true. The hypothesis is based on available information and the investigator's belief about the population parameters. The specific tests considered here are called chi-square tests and are appropriate when the outcome is discrete (dichotomous, ordinal or categorical). For example, in some clinical trials the outcome is a classification such as hypertensive, pre-hypertensive or normotensive. We could use the same classification in an observational study such as the Framingham Heart Study to compare men and women in terms of their blood pressure status - again using the classification of hypertensive, pre-hypertensive or normotensive status.
The technique to analyze a discrete outcome uses what is called a chi-square test. Specifically, the test statistic follows a chi-square probability distribution. We will consider chi-square tests here with one, two and more than two independent comparison groups.
After completing this module, the student will be able to:
Here we consider hypothesis testing with a discrete outcome variable in a single population. Discrete variables are variables that take on more than two distinct responses or categories and the responses can be ordered or unordered (i.e., the outcome can be ordinal or categorical). The procedure we describe here can be used for dichotomous (exactly 2 response options), ordinal or categorical discrete outcomes and the objective is to compare the distribution of responses, or the proportions of participants in each response category, to a known distribution. The known distribution is derived from another study or report and it is again important in setting up the hypotheses that the comparator distribution specified in the null hypothesis is a fair comparison. The comparator is sometimes called an external or a historical control.
In one sample tests for a discrete outcome, we set up our hypotheses against an appropriate comparator. We select a sample and compute descriptive statistics on the sample data. Specifically, we compute the sample size (n) and the proportions of participants in each response category.
Test Statistic for Testing \(H_0\colon p_1 = p_{10}, p_2 = p_{20}, \ldots, p_k = p_{k0}\)

\(\chi^2 = \sum \dfrac{(O - E)^2}{E}\)
We find the critical value in a table of probabilities for the chi-square distribution with degrees of freedom (df) = k-1. In the test statistic, O = observed frequency and E = expected frequency in each of the response categories. The observed frequencies are those observed in the sample and the expected frequencies are computed as described below. χ 2 (chi-square) is another probability distribution and ranges from 0 to ∞. The test statistic formula above is appropriate for large samples, defined as expected frequencies of at least 5 in each of the response categories.
When we conduct a χ 2 test, we compare the observed frequencies in each response category to the frequencies we would expect if the null hypothesis were true. These expected frequencies are determined by allocating the sample to the response categories according to the distribution specified in H 0 . This is done by multiplying the observed sample size (n) by the proportions specified in the null hypothesis \((p_{10}, p_{20}, \ldots, p_{k0})\). To ensure that the sample size is appropriate for the use of the test statistic above, we need to ensure the following: \(\min(np_{10}, np_{20}, \ldots, np_{k0}) > 5\).
The test of hypothesis with a discrete outcome measured in a single sample, where the goal is to assess whether the distribution of responses follows a known distribution, is called the χ 2 goodness-of-fit test. As the name indicates, the idea is to assess whether the pattern or distribution of responses in the sample "fits" a specified population (external or historical) distribution. In the next example we illustrate the test. As we work through the example, we provide additional details related to the use of this new test statistic.
A University conducted a survey of its recent graduates to collect demographic and health information for future planning purposes as well as to assess students' satisfaction with their undergraduate experiences. The survey revealed that a substantial proportion of students were not engaging in regular exercise, many felt their nutrition was poor and a substantial number were smoking. In response to a question on regular exercise, 60% of all graduates reported getting no regular exercise, 25% reported exercising sporadically and 15% reported exercising regularly as undergraduates. The next year the University launched a health promotion campaign on campus in an attempt to increase health behaviors among undergraduates. The program included modules on exercise, nutrition and smoking cessation. To evaluate the impact of the program, the University again surveyed graduates and asked the same questions. The survey was completed by 470 graduates and the following data were collected on the exercise question:
 | No Regular Exercise | Sporadic Exercise | Regular Exercise | Total |
---|---|---|---|---|
Number of Students | 255 | 125 | 90 | 470 |
Based on the data, is there evidence of a shift in the distribution of responses to the exercise question following the implementation of the health promotion campaign on campus? Run the test at a 5% level of significance.
In this example, we have one sample and a discrete (ordinal) outcome variable (with three response options). We specifically want to compare the distribution of responses in the sample to the distribution reported the previous year (i.e., 60%, 25%, 15% reporting no, sporadic and regular exercise, respectively). We now run the test using the five-step approach.
The null hypothesis again represents the "no change" or "no difference" situation. If the health promotion campaign has no impact then we expect the distribution of responses to the exercise question to be the same as that measured prior to the implementation of the program.
\(H_0\colon p_1 = 0.60, p_2 = 0.25, p_3 = 0.15\), or equivalently, \(H_0\colon\) Distribution of responses is 0.60, 0.25, 0.15

\(H_1\colon H_0\) is false. \(\alpha = 0.05\)
Notice that the research hypothesis is written in words rather than in symbols. The research hypothesis as stated captures any difference in the distribution of responses from that specified in the null hypothesis. We do not specify a specific alternative distribution, instead we are testing whether the sample data "fit" the distribution in H 0 or not. With the χ 2 goodness-of-fit test there is no upper or lower tailed version of the test.
The test statistic is \(\chi^2 = \sum \dfrac{(O - E)^2}{E}\).
We must first assess whether the sample size is adequate. Specifically, we need to check \(\min(np_{10}, np_{20}, \ldots, np_{k0}) > 5\). The sample size here is n=470 and the proportions specified in the null hypothesis are 0.60, 0.25 and 0.15. Thus, min(470(0.60), 470(0.25), 470(0.15)) = min(282, 117.5, 70.5) = 70.5. The sample size is more than adequate so the formula can be used.
The decision rule for the χ 2 test depends on the level of significance and the degrees of freedom, defined as degrees of freedom (df) = k-1 (where k is the number of response categories). If the null hypothesis is true, the observed and expected frequencies will be close in value and the χ 2 statistic will be close to zero. If the null hypothesis is false, then the χ 2 statistic will be large. Critical values can be found in a table of probabilities for the χ 2 distribution. Here we have df=k-1=3-1=2 and a 5% level of significance. The appropriate critical value is 5.99, and the decision rule is as follows: Reject H 0 if χ 2 > 5.99.
We now compute the expected frequencies using the sample size and the proportions specified in the null hypothesis. We then substitute the sample data (observed frequencies) and the expected frequencies into the formula for the test statistic identified in Step 2. The computations can be organized as follows.
 | No Regular Exercise | Sporadic Exercise | Regular Exercise | Total |
---|---|---|---|---|
Observed Frequencies (O) | 255 | 125 | 90 | 470 |
Expected Frequencies (E) | 470(0.60) = 282 | 470(0.25) = 117.5 | 470(0.15) = 70.5 | 470 |
Notice that the expected frequencies are taken to one decimal place and that the sum of the observed frequencies is equal to the sum of the expected frequencies. The test statistic is computed as follows:

\(\chi^2 = \dfrac{(255-282)^2}{282} + \dfrac{(125-117.5)^2}{117.5} + \dfrac{(90-70.5)^2}{70.5} = 2.59 + 0.48 + 5.39 = 8.46\)
We reject H 0 because 8.46 > 5.99. We have statistically significant evidence at α=0.05 to show that H 0 is false, or that the distribution of responses is not 0.60, 0.25, 0.15. The p-value is approximately 0.015 (between 0.01 and 0.025).
In the χ 2 goodness-of-fit test, we conclude that either the distribution specified in H 0 is false (when we reject H 0 ) or that we do not have sufficient evidence to show that the distribution specified in H 0 is false (when we fail to reject H 0 ). Here, we reject H 0 and concluded that the distribution of responses to the exercise question following the implementation of the health promotion campaign was not the same as the distribution prior. The test itself does not provide details of how the distribution has shifted. A comparison of the observed and expected frequencies will provide some insight into the shift (when the null hypothesis is rejected). Does it appear that the health promotion campaign was effective?
Consider the following:
 | No Regular Exercise | Sporadic Exercise | Regular Exercise | Total |
---|---|---|---|---|
Observed Frequencies (O) | 255 | 125 | 90 | 470 |
Expected Frequencies (E) | 282 | 117.5 | 70.5 | 470 |
If the null hypothesis were true (i.e., no change from the prior year) we would have expected more students to fall in the "No Regular Exercise" category and fewer in the "Regular Exercise" categories. In the sample, 255/470 = 54% reported no regular exercise and 90/470=19% reported regular exercise. Thus, there is a shift toward more regular exercise following the implementation of the health promotion campaign. There is evidence of a statistical difference, is this a meaningful difference? Is there room for improvement?
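As a cross-check of the hand computation above, a short Python sketch (an addition to the module) gives the same statistic along with an exact p-value:

```python
import numpy as np
from scipy import stats

observed = np.array([255, 125, 90])          # no regular, sporadic, regular exercise
proportions = np.array([0.60, 0.25, 0.15])   # distribution from the prior year's survey
expected = 470 * proportions                 # 282, 117.5, 70.5

statistic, p_value = stats.chisquare(observed, f_exp=expected)
print(statistic)   # about 8.46
print(p_value)     # about 0.015
```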
The National Center for Health Statistics (NCHS) provided data on the distribution of weight (in categories) among Americans in 2002. The distribution was based on specific values of body mass index (BMI) computed as weight in kilograms over height in meters squared. Underweight was defined as BMI< 18.5, Normal weight as BMI between 18.5 and 24.9, overweight as BMI between 25 and 29.9 and obese as BMI of 30 or greater. Americans in 2002 were distributed as follows: 2% Underweight, 39% Normal Weight, 36% Overweight, and 23% Obese. Suppose we want to assess whether the distribution of BMI is different in the Framingham Offspring sample. Using data from the n=3,326 participants who attended the seventh examination of the Offspring in the Framingham Heart Study we created the BMI categories as defined and observed the following:
 | Underweight (BMI < 18.5) | Normal Weight (BMI 18.5–24.9) | Overweight (BMI 25–29.9) | Obese (BMI ≥ 30) | Total |
---|---|---|---|---|---|
Number of Participants | 20 | 932 | 1374 | 1000 | 3326 |
\(H_0\colon p_1 = 0.02, p_2 = 0.39, p_3 = 0.36, p_4 = 0.23\), or equivalently,

\(H_0\colon\) Distribution of responses is 0.02, 0.39, 0.36, 0.23

\(H_1\colon H_0\) is false. \(\alpha = 0.05\)
The formula for the test statistic is \(\chi^2 = \sum \dfrac{(O - E)^2}{E}\).
We must assess whether the sample size is adequate. Specifically, we need to check \(\min(np_{10}, np_{20}, \ldots, np_{k0}) > 5\). The sample size here is n=3,326 and the proportions specified in the null hypothesis are 0.02, 0.39, 0.36 and 0.23. Thus, min(3326(0.02), 3326(0.39), 3326(0.36), 3326(0.23)) = min(66.5, 1297.1, 1197.4, 765.0) = 66.5. The sample size is more than adequate, so the formula can be used.
Here we have df=k-1=4-1=3 and a 5% level of significance. The appropriate critical value is 7.81 and the decision rule is as follows: Reject H 0 if χ 2 > 7.81.
We now compute the expected frequencies using the sample size and the proportions specified in the null hypothesis. We then substitute the sample data (observed frequencies) into the formula for the test statistic identified in Step 2. We organize the computations in the following table.
 | Underweight (BMI < 18.5) | Normal Weight (BMI 18.5–24.9) | Overweight (BMI 25–29.9) | Obese (BMI ≥ 30) | Total |
---|---|---|---|---|---|
Observed Frequencies (O) | 20 | 932 | 1374 | 1000 | 3326 |
Expected Frequencies (E) | 66.5 | 1297.1 | 1197.4 | 765.0 | 3326 |
The test statistic is computed as follows:

\(\chi^2 = \dfrac{(20-66.5)^2}{66.5} + \dfrac{(932-1297.1)^2}{1297.1} + \dfrac{(1374-1197.4)^2}{1197.4} + \dfrac{(1000-765.0)^2}{765.0} = 32.52 + 102.77 + 26.05 + 72.19 = 233.53\)
We reject H 0 because 233.53 > 7.81. We have statistically significant evidence at α=0.05 to show that H 0 is false or that the distribution of BMI in Framingham is different from the national data reported in 2002, p < 0.005.
Again, the χ 2 goodness-of-fit test allows us to assess whether the distribution of responses "fits" a specified distribution. Here we show that the distribution of BMI in the Framingham Offspring Study is different from the national distribution. To understand the nature of the difference we can compare observed and expected frequencies or observed and expected proportions (or percentages). The frequencies are large because of the large sample size, the observed percentages of patients in the Framingham sample are as follows: 0.6% underweight, 28% normal weight, 41% overweight and 30% obese. In the Framingham Offspring sample there are higher percentages of overweight and obese persons (41% and 30% in Framingham as compared to 36% and 23% in the national data), and lower proportions of underweight and normal weight persons (0.6% and 28% in Framingham as compared to 2% and 39% in the national data). Are these meaningful differences?
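The Framingham BMI computation can also be reproduced with a short Python sketch (added here, not part of the original module):

```python
import numpy as np
from scipy import stats

observed = np.array([20, 932, 1374, 1000])          # under, normal, overweight, obese
proportions = np.array([0.02, 0.39, 0.36, 0.23])    # 2002 national distribution
expected = 3326 * proportions                       # 66.5, 1297.1, 1197.4, 765.0

statistic, p_value = stats.chisquare(observed, f_exp=expected)
print(statistic)   # about 233.5
print(p_value)     # far below 0.005
```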
In the module on hypothesis testing for means and proportions, we discussed hypothesis testing applications with a dichotomous outcome variable in a single population. We presented a test using a test statistic Z to test whether an observed (sample) proportion differed significantly from a historical or external comparator. The chi-square goodness-of-fit test can also be used with a dichotomous outcome and the results are mathematically equivalent.
In the prior module, we considered the following example. Here we show the equivalence to the chi-square goodness-of-fit test.
The NCHS report indicated that in 2002, 75% of children aged 2 to 17 saw a dentist in the past year. An investigator wants to assess whether use of dental services is similar in children living in the city of Boston. A sample of 125 children aged 2 to 17 living in Boston are surveyed and 64 reported seeing a dentist over the past 12 months. Is there a significant difference in use of dental services between children living in Boston and the national data?
We presented the following approach to the test using a Z statistic.
\(H_0\colon p = 0.75\)
\(H_1\colon p \neq 0.75\), α=0.05
We must first check that the sample size is adequate. Specifically, we need to check \(\min(np_0, n(1-p_0)) = \min(125(0.75), 125(0.25)) = \min(93.75, 31.25) = 31.25\). The sample size is more than adequate, so the following formula can be used: \(Z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}\).
This is a two-tailed test, using a Z statistic and a 5% level of significance. Reject H 0 if Z < -1.960 or if Z > 1.960.
We now substitute the sample data into the formula for the test statistic identified in Step 2. The sample proportion is \(\hat{p} = \dfrac{64}{125} = 0.512\), and the test statistic is \(Z = \dfrac{0.512 - 0.75}{\sqrt{\dfrac{0.75(0.25)}{125}}} = -6.15\).
We reject H0 because -6.15 < -1.960. We have statistically significant evidence at α=0.05 to show that there is a statistically significant difference in the use of dental services by children living in Boston as compared to the national data (p < 0.0001).
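The same Z calculation can be carried out directly in R; the snippet below is a minimal sketch of the arithmetic rather than a canned procedure, with illustrative variable names.

```r
phat <- 64 / 125                        # sample proportion, 0.512
p0   <- 0.75                            # value specified by the null hypothesis
n    <- 125
z    <- (phat - p0) / sqrt(p0 * (1 - p0) / n)
z                                       # about -6.15
2 * pnorm(abs(z), lower.tail = FALSE)   # two-sided p-value, well below 0.0001
```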
We now conduct the same test using the chi-square goodness-of-fit test. First, we summarize our sample data as follows:
| Saw a Dentist in Past 12 Months | Did Not See a Dentist in Past 12 Months | Total |
---|---|---|---|
# of Participants | 64 | 61 | 125 |
\(H_0\colon p_1=0.75, p_2=0.25\), or equivalently \(H_0\colon\) Distribution of responses is 0.75, 0.25
We must assess whether the sample size is adequate. Specifically, we need to check that \(\min(np_1, np_2, \ldots, np_k) \geq 5\). The sample size here is n=125 and the proportions specified in the null hypothesis are 0.75 and 0.25. Thus, min(125(0.75), 125(0.25)) = min(93.75, 31.25) = 31.25. The sample size is more than adequate, so the formula can be used.
Here we have df=k-1=2-1=1 and a 5% level of significance. The appropriate critical value is 3.84, and the decision rule is as follows: Reject H0 if χ2 > 3.84. (Note that 1.96² = 3.84, where 1.96 was the critical value used in the Z test for proportions shown above.)
| | Saw a Dentist in Past 12 Months | Did Not See a Dentist in Past 12 Months | Total |
---|---|---|---|
| Observed Frequencies (O) | 64 | 61 | 125 |
| Expected Frequencies (E) | 93.75 | 31.25 | 125 |
The test statistic is computed as follows: \(\chi^2 = \dfrac{(64-93.75)^2}{93.75} + \dfrac{(61-31.25)^2}{31.25} = 9.44 + 28.32 = 37.8\). (Note that (−6.15)² = 37.8, where −6.15 was the value of the Z statistic in the test for proportions shown above.)
We reject H0 because 37.8 > 3.84. We have statistically significant evidence at α=0.05 to show that there is a statistically significant difference in the use of dental services by children living in Boston as compared to the national data (p < 0.0001). This is the same conclusion we reached when we conducted the test using the Z test above. With a dichotomous outcome, Z² = χ²! In statistics, there are often several approaches that can be used to test hypotheses.
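A short R check of this equivalence, assuming base R's chisq.test and prop.test functions (the latter without its continuity correction so that the statistic matches the hand computation), is sketched below.

```r
# Chi-square goodness-of-fit test on the dichotomous dental-visit data
chisq.test(c(64, 61), p = c(0.75, 0.25))        # X-squared about 37.8, df = 1

# The same statistic from the one-sample proportion test (no continuity correction)
prop.test(64, 125, p = 0.75, correct = FALSE)   # X-squared about 37.8 (= Z^2)
```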
Here we extend that application of the chi-square test to the case with two or more independent comparison groups. Specifically, the outcome of interest is discrete with two or more responses and the responses can be ordered or unordered (i.e., the outcome can be dichotomous, ordinal or categorical). We now consider the situation where there are two or more independent comparison groups and the goal of the analysis is to compare the distribution of responses to the discrete outcome variable among several independent comparison groups.
The test is called the χ 2 test of independence and the null hypothesis is that there is no difference in the distribution of responses to the outcome across comparison groups. This is often stated as follows: The outcome variable and the grouping variable (e.g., the comparison treatments or comparison groups) are independent (hence the name of the test). Independence here implies homogeneity in the distribution of the outcome among comparison groups.
The null hypothesis in the χ 2 test of independence is often stated in words as: H 0 : The distribution of the outcome is independent of the groups. The alternative or research hypothesis is that there is a difference in the distribution of responses to the outcome variable among the comparison groups (i.e., that the distribution of responses "depends" on the group). In order to test the hypothesis, we measure the discrete outcome variable in each participant in each comparison group. The data of interest are the observed frequencies (or number of participants in each response category in each group). The formula for the test statistic for the χ 2 test of independence is given below.
Test Statistic for Testing H0: Distribution of outcome is independent of groups
\(\chi^2 = \sum \dfrac{(O-E)^2}{E}\)
and we find the critical value in a table of probabilities for the chi-square distribution with df = (r−1)(c−1).
Here O = observed frequency, E=expected frequency in each of the response categories in each group, r = the number of rows in the two-way table and c = the number of columns in the two-way table. r and c correspond to the number of comparison groups and the number of response options in the outcome (see below for more details). The observed frequencies are the sample data and the expected frequencies are computed as described below. The test statistic is appropriate for large samples, defined as expected frequencies of at least 5 in each of the response categories in each group.
The data for the χ 2 test of independence are organized in a two-way table. The outcome and grouping variable are shown in the rows and columns of the table. The sample table below illustrates the data layout. The table entries (blank below) are the numbers of participants in each group responding to each response category of the outcome variable.
Table - Possible outcomes are listed in the columns; the groups being compared are listed in the rows.
| | Response 1 | Response 2 | ... | Response c | Row Totals |
---|---|---|---|---|---|
| Group 1 | | | | | |
| Group 2 | | | | | |
| ... | | | | | |
| Group r | | | | | |
| Column Totals | | | | | N |
In the table above, the grouping variable is shown in the rows of the table; r denotes the number of independent groups. The outcome variable is shown in the columns of the table; c denotes the number of response options in the outcome variable. Each combination of a row (group) and column (response) is called a cell of the table. The table has r*c cells and is sometimes called an r x c ("r by c") table. For example, if there are 4 groups and 5 categories in the outcome variable, the data are organized in a 4 X 5 table. The row and column totals are shown along the right-hand margin and the bottom of the table, respectively. The total sample size, N, can be computed by summing the row totals or the column totals. Similar to ANOVA, N does not refer to a population size here but rather to the total sample size in the analysis. The sample data can be organized into a table like the above. The numbers of participants within each group who select each response option are shown in the cells of the table and these are the observed frequencies used in the test statistic.
The test statistic for the χ 2 test of independence involves comparing observed (sample data) and expected frequencies in each cell of the table. The expected frequencies are computed assuming that the null hypothesis is true. The null hypothesis states that the two variables (the grouping variable and the outcome) are independent. The definition of independence is as follows:
Two events, A and B, are independent if P(A|B) = P(A), or equivalently, if P(A and B) = P(A) P(B).
The second statement indicates that if two events, A and B, are independent then the probability of their intersection can be computed by multiplying the probability of each individual event. To conduct the χ 2 test of independence, we need to compute expected frequencies in each cell of the table. Expected frequencies are computed by assuming that the grouping variable and outcome are independent (i.e., under the null hypothesis). Thus, if the null hypothesis is true, using the definition of independence:
P(Group 1 and Response Option 1) = P(Group 1) P(Response Option 1).
The above states that the probability that an individual is in Group 1 and their outcome is Response Option 1 is computed by multiplying the probability that a person is in Group 1 by the probability that a person is in Response Option 1. To conduct the χ2 test of independence, we need expected frequencies and not expected probabilities. To convert the above probability to a frequency, we multiply by N. Consider the following small example.
| | Response 1 | Response 2 | Response 3 | Total |
---|---|---|---|---|
| Group 1 | 10 | 8 | 7 | 25 |
| Group 2 | 22 | 15 | 13 | 50 |
| Group 3 | 30 | 28 | 17 | 75 |
| Total | 62 | 51 | 37 | 150 |
The data shown above are measured in a sample of size N=150. The frequencies in the cells of the table are the observed frequencies. If Group and Response are independent, then we can compute the probability that a person in the sample is in Group 1 and Response category 1 using:
P(Group 1 and Response 1) = P(Group 1) P(Response 1),
P(Group 1 and Response 1) = (25/150) (62/150) = 0.069.
Thus if Group and Response are independent we would expect 6.9% of the sample to be in the top left cell of the table (Group 1 and Response 1). The expected frequency is 150(0.069) = 10.4. We could do the same for Group 2 and Response 1:
P(Group 2 and Response 1) = P(Group 2) P(Response 1),
P(Group 2 and Response 1) = (50/150) (62/150) = 0.138.
The expected frequency in Group 2 and Response 1 is 150(0.138) = 20.7.
Thus, the formula for determining the expected cell frequencies in the χ 2 test of independence is as follows:
Expected Cell Frequency = (Row Total * Column Total)/N.
The above computes the expected frequency in one step rather than computing the expected probability first and then converting to a frequency.
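The row-total-by-column-total shortcut is easy to verify in R for the small example above; the sketch below computes all expected cell frequencies at once. The values differ slightly from the 10.4 and 20.7 quoted above only because the text rounded the probabilities to three decimals before converting to frequencies.

```r
observed <- matrix(c(10,  8,  7,
                     22, 15, 13,
                     30, 28, 17),
                   nrow = 3, byrow = TRUE)

# Expected cell frequency = (row total * column total) / N
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
round(expected, 1)
# Group 1: 10.3  8.5  6.2
# Group 2: 20.7 17.0 12.3
# Group 3: 31.0 25.5 18.5
```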
In a prior example we evaluated data from a survey of university graduates which assessed, among other things, how frequently they exercised. The survey was completed by 470 graduates. In the prior example we used the χ 2 goodness-of-fit test to assess whether there was a shift in the distribution of responses to the exercise question following the implementation of a health promotion campaign on campus. We specifically considered one sample (all students) and compared the observed distribution to the distribution of responses the prior year (a historical control). Suppose we now wish to assess whether there is a relationship between exercise on campus and students' living arrangements. As part of the same survey, graduates were asked where they lived their senior year. The response options were dormitory, on-campus apartment, off-campus apartment, and at home (i.e., commuted to and from the university). The data are shown below.
| | No Regular Exercise | Sporadic Exercise | Regular Exercise | Total |
---|---|---|---|---|
| Dormitory | 32 | 30 | 28 | 90 |
| On-Campus Apartment | 74 | 64 | 42 | 180 |
| Off-Campus Apartment | 110 | 25 | 15 | 150 |
| At Home | 39 | 6 | 5 | 50 |
| Total | 255 | 125 | 90 | 470 |
Based on the data, is there a relationship between exercise and students' living arrangements? Do you think where a person lives affects their exercise status? Here we have four independent comparison groups (living arrangement) and a discrete (ordinal) outcome variable with three response options. We specifically want to test whether living arrangement and exercise are independent. We will run the test using the five-step approach.
H 0 : Living arrangement and exercise are independent
H 1 : H 0 is false. α=0.05
The null and research hypotheses are written in words rather than in symbols. The research hypothesis is that the grouping variable (living arrangement) and the outcome variable (exercise) are dependent or related.
The condition for appropriate use of the above test statistic is that each expected frequency is at least 5. In Step 4 we will compute the expected frequencies and we will ensure that the condition is met.
The decision rule depends on the level of significance and the degrees of freedom, defined as df = (r-1)(c-1), where r and c are the numbers of rows and columns in the two-way data table. The row variable is the living arrangement and there are 4 arrangements considered, thus r=4. The column variable is exercise and 3 responses are considered, thus c=3. For this test, df=(4-1)(3-1)=3(2)=6. Again, with χ2 tests there are no upper, lower or two-tailed tests. If the null hypothesis is true, the observed and expected frequencies will be close in value and the χ2 statistic will be close to zero. If the null hypothesis is false, then the χ2 statistic will be large. The rejection region for the χ2 test of independence is always in the upper (right-hand) tail of the distribution. For df=6 and a 5% level of significance, the appropriate critical value is 12.59 and the decision rule is as follows: Reject H0 if χ2 > 12.59.
We now compute the expected frequencies using the formula,
Expected Frequency = (Row Total * Column Total)/N.
The computations can be organized in a two-way table. In each cell, the observed frequency is listed first and the corresponding expected frequency is shown in parentheses.
| | No Regular Exercise | Sporadic Exercise | Regular Exercise | Total |
---|---|---|---|---|
| Dormitory | 32 (48.8) | 30 (23.9) | 28 (17.2) | 90 |
| On-Campus Apartment | 74 (97.7) | 64 (47.9) | 42 (34.5) | 180 |
| Off-Campus Apartment | 110 (81.4) | 25 (39.9) | 15 (28.7) | 150 |
| At Home | 39 (27.1) | 6 (13.3) | 5 (9.6) | 50 |
| Total | 255 | 125 | 90 | 470 |
Notice that the expected frequencies are taken to one decimal place and that the sums of the observed frequencies are equal to the sums of the expected frequencies in each row and column of the table.
Recall in Step 2 a condition for the appropriate use of the test statistic was that each expected frequency is at least 5. This is true for this sample (the smallest expected frequency is 9.6) and therefore it is appropriate to use the test statistic.
We reject H0 because 60.5 > 12.59. We have statistically significant evidence at α=0.05 to show that H0 is false or that living arrangement and exercise are not independent (i.e., they are dependent or related), p < 0.005.
Again, the χ 2 test of independence is used to test whether the distribution of the outcome variable is similar across the comparison groups. Here we rejected H 0 and concluded that the distribution of exercise is not independent of living arrangement, or that there is a relationship between living arrangement and exercise. The test provides an overall assessment of statistical significance. When the null hypothesis is rejected, it is important to review the sample data to understand the nature of the relationship. Consider again the sample data.
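As a software check, the entire test of independence for this example can be reproduced in R by applying chisq.test to the observed two-way table; the row and column labels below are illustrative.

```r
exercise <- matrix(c( 32, 30, 28,
                      74, 64, 42,
                     110, 25, 15,
                      39,  6,  5),
                   nrow = 4, byrow = TRUE,
                   dimnames = list(c("Dormitory", "On-Campus Apartment",
                                     "Off-Campus Apartment", "At Home"),
                                   c("No Regular", "Sporadic", "Regular")))

test <- chisq.test(exercise)       # test of independence on the r x c table
test                               # X-squared about 60.5, df = 6, p < 0.001
round(test$expected, 1)            # matches the expected frequencies shown above
```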
Because there are different numbers of students in each living situation, it is difficult to compare exercise patterns across groups on the basis of the frequencies alone. The following table displays the percentages of students in each exercise category by living arrangement. The percentages sum to 100% in each row of the table. For comparison purposes, percentages are also shown for the total sample along the bottom row of the table.
| | No Regular Exercise | Sporadic Exercise | Regular Exercise |
---|---|---|---|
| Dormitory | 36% | 33% | 31% |
| On-Campus Apartment | 41% | 36% | 23% |
| Off-Campus Apartment | 73% | 17% | 10% |
| At Home | 78% | 12% | 10% |
| All Students | 54% | 27% | 19% |
From the above, it is clear that higher percentages of students living in dormitories and in on-campus apartments reported regular exercise (31% and 23%) as compared to students living in off-campus apartments and at home (10% each).
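The row percentages in the table above can be obtained in R with prop.table; a minimal, self-contained sketch (with illustrative labels) is shown below.

```r
exercise <- matrix(c( 32, 30, 28,
                      74, 64, 42,
                     110, 25, 15,
                      39,  6,  5),
                   nrow = 4, byrow = TRUE,
                   dimnames = list(c("Dormitory", "On-Campus Apt",
                                     "Off-Campus Apt", "At Home"),
                                   c("No Regular", "Sporadic", "Regular")))

# Percentages within each living arrangement (rows sum to 100%)
round(100 * prop.table(exercise, margin = 1))
```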
Test Yourself
Pancreaticoduodenectomy (PD) is a procedure that is associated with considerable morbidity. A study was recently conducted on 553 patients who had a successful PD between January 2000 and December 2010 to determine whether their Surgical Apgar Score (SAS) is related to 30-day perioperative morbidity and mortality. The table below gives the number of patients experiencing no, minor, or major morbidity by SAS category.
| Surgical Apgar Score | No Morbidity | Minor Morbidity | Major Morbidity or Mortality |
---|---|---|---|
| 0-4 | 21 | 20 | 16 |
| 5-6 | 135 | 71 | 35 |
| 7-10 | 158 | 62 | 35 |
Question: What would be an appropriate statistical test to examine whether there is an association between Surgical Apgar Score and patient outcome? Using 14.13 as the value of the test statistic for these data, carry out the appropriate test at a 5% level of significance. Show all parts of your test.
In the module on hypothesis testing for means and proportions, we discussed hypothesis testing applications with a dichotomous outcome variable and two independent comparison groups. We presented a test using a test statistic Z to test for equality of independent proportions. The chi-square test of independence can also be used with a dichotomous outcome and the results are mathematically equivalent.
In the prior module, we considered the following example. Here we show the equivalence to the chi-square test of independence.
A randomized trial is designed to evaluate the effectiveness of a newly developed pain reliever designed to reduce pain in patients following joint replacement surgery. The trial compares the new pain reliever to the pain reliever currently in use (called the standard of care). A total of 100 patients undergoing joint replacement surgery agreed to participate in the trial. Patients were randomly assigned to receive either the new pain reliever or the standard pain reliever following surgery and were blind to the treatment assignment. Before receiving the assigned treatment, patients were asked to rate their pain on a scale of 0-10 with higher scores indicative of more pain. Each patient was then given the assigned treatment and after 30 minutes was again asked to rate their pain on the same scale. The primary outcome was a reduction in pain of 3 or more scale points (defined by clinicians as a clinically meaningful reduction). The following data were observed in the trial.
| Treatment Group | Number of Patients (n) | Number with Reduction of 3+ Points | Proportion with Reduction of 3+ Points |
---|---|---|---|
| New Pain Reliever | 50 | 23 | 0.46 |
| Standard Pain Reliever | 50 | 11 | 0.22 |
We tested whether there was a significant difference in the proportions of patients reporting a meaningful reduction (i.e., a reduction of 3 or more scale points) using a Z statistic, as follows.
\(H_0\colon p_1 = p_2\)
\(H_1\colon p_1 \neq p_2\), α=0.05
Here the new or experimental pain reliever is group 1 and the standard pain reliever is group 2.
We must first check that the sample size is adequate. Specifically, we need to ensure that we have at least 5 successes and 5 failures in each comparison group, i.e., that \(\min(n_1\hat{p}_1, n_1(1-\hat{p}_1), n_2\hat{p}_2, n_2(1-\hat{p}_2)) \geq 5\).
In this example, we have min(50(0.46), 50(0.54), 50(0.22), 50(0.78)) = min(23, 27, 11, 39) = 11.
Therefore, the sample size is adequate, so the following formula can be used: \(Z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}}\), where \(\hat{p}\) is the pooled (overall) proportion of successes.
Reject H 0 if Z < -1.960 or if Z > 1.960.
We now substitute the sample data into the formula for the test statistic identified in Step 2. We first compute the overall (pooled) proportion of successes: \(\hat{p} = \dfrac{23+11}{50+50} = \dfrac{34}{100} = 0.34\).
We now substitute to compute the test statistic: \(Z = \dfrac{0.46-0.22}{\sqrt{0.34(0.66)\left(\dfrac{1}{50}+\dfrac{1}{50}\right)}} = 2.53\). Because 2.53 > 1.960, we reject H0 and conclude that there is a statistically significant difference in the proportions of patients reporting a meaningful reduction in pain.
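The pooled-proportion arithmetic is simple enough to script directly; the following R sketch mirrors the hand computation above, with illustrative variable names.

```r
x1 <- 23; n1 <- 50      # new pain reliever: successes and sample size
x2 <- 11; n2 <- 50      # standard pain reliever
p1 <- x1 / n1           # 0.46
p2 <- x2 / n2           # 0.22
p_pooled <- (x1 + x2) / (n1 + n2)   # 0.34

z <- (p1 - p2) / sqrt(p_pooled * (1 - p_pooled) * (1/n1 + 1/n2))
z                                     # about 2.53
2 * pnorm(abs(z), lower.tail = FALSE) # two-sided p-value, about 0.01
```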
We now conduct the same test using the chi-square test of independence.
H 0 : Treatment and outcome (meaningful reduction in pain) are independent
H 1 : H 0 is false. α=0.05
The formula for the test statistic is \(\chi^2 = \sum \dfrac{(O-E)^2}{E}\), with df = (r−1)(c−1).
For this test, df=(2-1)(2-1)=1. At a 5% level of significance, the appropriate critical value is 3.84 and the decision rule is as follows: Reject H0 if χ2 > 3.84. (Note that 1.96² = 3.84, where 1.96 was the critical value used in the Z test for proportions shown above.)
We now compute the expected frequencies using: Expected Frequency = (Row Total × Column Total)/N.
The computations can be organized in a two-way table. In each cell, the observed frequency is listed first and the corresponding expected frequency is shown in parentheses.
| | Reduction of 3+ Points | No Reduction of 3+ Points | Total |
---|---|---|---|
| New Pain Reliever | 23 (17.0) | 27 (33.0) | 50 |
| Standard Pain Reliever | 11 (17.0) | 39 (33.0) | 50 |
| Total | 34 | 66 | 100 |
A condition for the appropriate use of the test statistic was that each expected frequency is at least 5. This is true for this sample (the smallest expected frequency is 17.0) and therefore it is appropriate to use the test statistic.
The test statistic is computed as follows: \(\chi^2 = \dfrac{(23-17)^2}{17} + \dfrac{(27-33)^2}{33} + \dfrac{(11-17)^2}{17} + \dfrac{(39-33)^2}{33} = 2.12 + 1.09 + 2.12 + 1.09 = 6.4\). (Note that (2.53)² = 6.4, where 2.53 was the value of the Z statistic in the test for proportions shown above.) We reject H0 because 6.4 > 3.84; this is the same conclusion we reached when we conducted the test using the Z statistic above.
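In R, the same equivalence can be confirmed with chisq.test (suppressing the continuity correction so the statistic matches the hand computation) or with prop.test; a brief sketch with illustrative labels:

```r
trial <- matrix(c(23, 27,
                  11, 39),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("New", "Standard"),
                                c("Reduction of 3+ points", "No reduction")))

chisq.test(trial, correct = FALSE)                 # X-squared about 6.4, df = 1
prop.test(c(23, 11), c(50, 50), correct = FALSE)   # same X-squared (= Z^2)
```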
The video below by Mike Marin demonstrates how to perform chi-squared tests in the R programming language.
We have 3 independent comparison groups (Surgical Apgar Score) and a categorical outcome variable (morbidity/mortality). We can run a Chi-Squared test of independence.
H 0 : Apgar scores and patient outcome are independent of one another.
H A : Apgar scores and patient outcome are not independent.
Chi-square = 14.13 with df = (3−1)(3−1) = 4; the critical value at α=0.05 is 9.49.
Since 14.13 is greater than 9.49, we reject H0.
There is an association between Apgar scores and patient outcome. The lowest Apgar score group (0 to 4) experienced the highest percentage of major morbidity or mortality (16 out of 57=28%) compared to the other Apgar score groups.
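A quick way to verify this Test Yourself answer is to run chisq.test on the observed table of Surgical Apgar Score by outcome; the sketch below uses the counts given above, with illustrative labels.

```r
sas <- matrix(c( 21, 20, 16,
                135, 71, 35,
                158, 62, 35),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("SAS 0-4", "SAS 5-6", "SAS 7-10"),
                              c("None", "Minor", "Major/Mortality")))

chisq.test(sas)   # X-squared about 14.1, df = 4, p about 0.007
```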
The chi-square goodness of fit test is a variation of the more general chi-square test. The setting for this test is a single categorical variable that can have many levels. Often in this situation, we have a theoretical model in mind for the categorical variable, and through this model we expect certain proportions of the population to fall into each of these levels. A goodness of fit test determines how well the expected proportions in our theoretical model match reality.
The null and alternative hypotheses for a goodness of fit test look different from those of some of our other hypothesis tests. One reason for this is that the chi-square goodness of fit test is a nonparametric method: the test does not concern a single population parameter, so the null hypothesis does not state that a single parameter takes on a certain value.
We start with a categorical variable with k levels and let \(p_i\) be the proportion of the population at level i. Our theoretical model has values \(q_i\) for each of the proportions. The null and alternative hypotheses are: \(H_0\colon p_i = q_i\) for every level i; \(H_a\colon p_i \neq q_i\) for at least one level i.
The calculation of a chi-square statistic involves a comparison between actual counts of variables from the data in our simple random sample and the expected counts of these variables. The actual counts come directly from our sample. The way that the expected counts are calculated depends upon the particular chi-square test that we are using.
For a goodness of fit test, we have a theoretical model for how our data should be proportioned. We simply multiply these proportions by the sample size n to obtain our expected counts.
The chi-square statistic for a goodness of fit test is determined by comparing the actual and expected counts for each level of our categorical variable. The steps are as follows: for each level, subtract the expected count from the observed count; square each of these differences; divide each squared difference by the corresponding expected count; and sum these values across all levels to obtain the chi-square statistic.
If our theoretical model matches the observed data perfectly, then the expected counts will show no deviation whatsoever from the observed counts of our variable. This will mean that we will have a chi-square statistic of zero. In any other situation, the chi-square statistic will be a positive number.
The number of degrees of freedom requires no difficult calculations. All that we need to do is subtract one from the number of levels of our categorical variable. This number will inform us on which of the infinite chi-square distributions we should use.
The chi-square statistic that we calculated corresponds to a particular location on a chi-square distribution with the appropriate number of degrees of freedom. The p-value determines the probability of obtaining a test statistic this extreme, assuming that the null hypothesis is true. We can use a table of values for a chi-square distribution to determine the p-value of our hypothesis test. If we have statistical software available, then this can be used to obtain a better estimate of the p-value.
We make our decision on whether to reject the null hypothesis based upon a predetermined level of significance. If our p-value is less than or equal to this level of significance, then we reject the null hypothesis. Otherwise, we fail to reject the null hypothesis.
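The full recipe above (expected counts, statistic, degrees of freedom, p-value) can be written in a few lines of R; the counts and hypothesized proportions in the sketch below are purely hypothetical placeholders.

```r
# Hypothetical observed counts and hypothesized proportions (q_i) for a
# categorical variable with three levels -- placeholders, not real data
observed   <- c(30, 45, 25)
null_props <- c(0.25, 0.50, 0.25)

expected <- sum(observed) * null_props                   # n * q_i for each level
chi_sq   <- sum((observed - expected)^2 / expected)      # sum of (O - E)^2 / E
df       <- length(observed) - 1                         # (number of levels) - 1
p_value  <- pchisq(chi_sq, df = df, lower.tail = FALSE)  # upper-tail probability
c(statistic = chi_sq, df = df, p.value = p_value)
```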
The chi-square goodness of fit test is a non-parametric test used to find out how significantly the observed values of a given phenomenon differ from the expected values. In the chi-square goodness of fit test, the term "goodness of fit" refers to comparing the observed sample distribution with the expected probability distribution. The test determines how well a theoretical distribution (such as the normal, binomial, or Poisson distribution) fits the empirical distribution. The sample data are divided into intervals, and the number of points that fall into each interval is compared with the expected number of points in that interval.
Procedure for Chi-Square Goodness of Fit Test:
A. Null hypothesis : The null hypothesis assumes that there is no significant difference between the observed and the expected value.
B. Alternative hypothesis : The alternative hypothesis assumes that there is a significant difference between the observed and the expected value.
Degrees of freedom: The degrees of freedom depend on the distribution being fitted. The following table shows each distribution and the associated degrees of freedom (here n is the number of intervals or categories):

Type of distribution | Number of constraints | Degrees of freedom |
---|---|---|
Binomial distribution | 1 | n-1 |
Poisson distribution | 2 | n-2 |
Normal distribution | 3 | n-3 |
Hypothesis testing: Hypothesis testing proceeds as in other tests, such as the t-test and ANOVA. The calculated value of the chi-square goodness of fit statistic is compared with the table (critical) value. If the calculated value is greater than the table value, we reject the null hypothesis and conclude that there is a significant difference between the observed and the expected frequencies. If the calculated value is less than the table value, we fail to reject the null hypothesis and conclude that there is no significant difference between the observed and expected values.
The Hotel Architectural Design Factors Influencing Consumer Destinations: A Case Study of Three-Star Hotels in Hua Hin, Thailand
Group 1: Aesthetic Group | ||||
---|---|---|---|---|
Factor | Observed Frequency (O) | Expected Frequency (E) | (O − E)² / E | Test Result |
Design Concept Theme | 55 | 47.56 | 1.17 | Not Significant |
Harmony | 41 | 47.56 | 0.91 | Not Significant |
Balance | 40 | 47.56 | 1.2 | Not Significant |
Space | 39 | 47.56 | 1.54 | Not Significant |
Style | 53 | 47.56 | 0.63 | Not Significant |
Beautiful | 61 | 47.56 | 3.92 | Significant |
Creativity | 47 | 47.56 | 0.01 | Not Significant |
Environment | 50 | 47.56 | 0.13 | Not Significant |
Perspective & Visual | 42 | 47.56 | 0.65 | Not Significant |
Group 2: Physical Comfort Group | ||||
---|---|---|---|---|
Factor | Observed Frequency (O) | Expected Frequency (E) | (O − E)² / E | Test Result |
Function | 60 | 47.56 | 3.19 | Significant |
Shape | 35 | 47.56 | 3.38 | Significant |
Proportion & Mass | 48 | 47.56 | 0 | Not Significant |
Texture & Material | 49 | 47.56 | 0.05 | Not Significant |
Human Scale | 38 | 47.56 | 1.92 | Not Significant |
Durability | 43 | 47.56 | 0.44 | Not Significant |
Color | 50 | 47.56 | 0.13 | Not Significant |
Furniture | 45 | 47.56 | 0.14 | Not Significant |
Comfortable | 60 | 47.56 | 3.19 | Significant |
Facilities | 38 | 47.56 | 1.92 | Not Significant |
Circulation | 40 | 47.56 | 1.2 | Not Significant |
Total | 466 | 523.16 | 15.56 | Not Significant |
Group 3: Emotional Comfort Group | ||||
---|---|---|---|---|
Factor | Observed Frequency (O) | Expected Frequency (E) | (O − E)² / E | Test Result |
Sense of Place | 52 | 40.45 | 3.26 | Significant |
Location | 35 | 40.45 | 0.73 | Not Significant |
Feeling | 38 | 40.45 | 0.15 | Not Significant |
Relationships & Ties | 33 | 40.45 | 1.38 | Not Significant |
Natural Touch | 47 | 40.45 | 1.07 | Not Significant |
Relax | 42 | 40.45 | 0.06 | Not Significant |
Warmth | 37 | 40.45 | 0.29 | Not Significant |
Peaceful | 40 | 40.45 | 0.01 | Not Significant |
Service | 55 | 40.45 | 5.3 | Significant |
Social | 28 | 40.45 | 3.79 | Significant |
Friendly | 45 | 40.45 | 0.51 | Not Significant |
Total | 452 | 445 | 16.55 | Not Significant |
Group 4: The Security and Sensibility Group | ||||
---|---|---|---|---|
Factor | Observed Frequency (O) | Expected Frequency (E) | (O − E)² / E | Test Result |
Safety | 50 | 39.56 | 2.75 | Significant |
Security | 35 | 39.56 | 0.53 | Not Significant |
Risk | 30 | 39.56 | 2.31 | Significant |
Satisfaction | 48 | 39.56 | 1.79 | Significant |
Loyalty | 45 | 39.56 | 0.75 | Not Significant |
Communication | 33 | 39.56 | 1.09 | Not Significant |
Legal Requirements | 36 | 39.56 | 0.32 | Not Significant |
Modernity | 38 | 39.56 | 0.06 | Not Significant |
Innovation | 47 | 39.56 | 1.4 | Significant |
Sustainability | 32 | 39.56 | 1.44 | Not Significant |
Value/Equality | 39 | 39.56 | 0.01 | Not Significant |
Quality | 55 | 39.56 | 6.05 | Significant |
Efficiency | 40 | 39.56 | 0 | Not Significant |
Expectations | 42 | 39.56 | 0.15 | Not Significant |
Convenient | 38 | 39.56 | 0.06 | Not Significant |
Cleanliness | 47 | 39.56 | 1.4 | Significant |
Room Comfort | 45 | 39.56 | 0.75 | Not Significant |
Remember | 32 | 39.56 | 1.44 | Not Significant |
Total | 724 | 711.08 | 21.7 | Not Significant |
Source: Sirirat, S.; Thampanichwat, C.; Pongsermpol, C.; Moorapun, C. The Hotel Architectural Design Factors Influencing Consumer Destinations: A Case Study of Three-Star Hotels in Hua Hin, Thailand. Buildings 2024, 14, 2428. https://doi.org/10.3390/buildings14082428