
Chi-Square Goodness of Fit Test

This lesson explains how to conduct a chi-square goodness of fit test. The test is applied when you have one categorical variable from a single population. It is used to determine whether sample data are consistent with a hypothesized distribution.

For example, suppose a company printed baseball cards. It claimed that 30% of its cards were rookies; 60% were veterans but not All-Stars; and 10% were veteran All-Stars. We could gather a random sample of baseball cards and use a chi-square goodness of fit test to see whether our sample distribution differed significantly from the distribution claimed by the company. The sample problem at the end of the lesson considers this example.

When to Use the Chi-Square Goodness of Fit Test

The chi-square goodness of fit test is appropriate when the following conditions are met:

  • The sampling method is simple random sampling.
  • The variable under study is categorical.
  • The expected value of the number of sample observations in each level of the variable is at least 5.

This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results.

State the Hypotheses

Every hypothesis test requires the analyst to state a null hypothesis (H₀) and an alternative hypothesis (Hₐ). The hypotheses are stated in such a way that they are mutually exclusive. That is, if one is true, the other must be false; and vice versa.

For a chi-square goodness of fit test, the hypotheses take the following form.

  • H₀: The data are consistent with a specified distribution.
  • Hₐ: The data are not consistent with a specified distribution.

Typically, the null hypothesis (H₀) specifies the proportion of observations at each level of the categorical variable. The alternative hypothesis (Hₐ) is that at least one of the specified proportions is not true.

Formulate an Analysis Plan

The analysis plan describes how to use sample data to reject or fail to reject the null hypothesis. The plan should specify the following elements.

  • Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or 0.10; but any value between 0 and 1 can be used.
  • Test method. Use the chi-square goodness of fit test to determine whether observed sample frequencies differ significantly from expected frequencies specified in the null hypothesis. The chi-square goodness of fit test is described in the next section, and demonstrated in the sample problem at the end of this lesson.

Analyze Sample Data

Using sample data, find the degrees of freedom, expected frequency counts, test statistic, and the P-value associated with the test statistic.

  • Degrees of freedom. The degrees of freedom (DF) are equal to the number of levels (k) of the categorical variable minus 1: DF = k - 1.
  • Expected frequency counts. The expected frequency count for each level of the categorical variable is the sample size times the hypothesized proportion from the null hypothesis: Eᵢ = n × pᵢ.
  • Test statistic. The test statistic is a chi-square random variable (Χ²) defined by Χ² = Σ[ (Oᵢ - Eᵢ)² / Eᵢ ], where Oᵢ is the observed frequency count and Eᵢ is the expected frequency count for level i.
  • P-value. The P-value is the probability of observing a sample statistic as extreme as the test statistic. Since the test statistic is a chi-square, use the Chi-Square Distribution Calculator to assess the probability associated with the test statistic. Use the degrees of freedom computed above.
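As a quick illustration, the statistic and P-value above can be computed in a few lines of Python (a sketch assuming scipy is available; the counts are made up for illustration and are not part of this lesson):

```python
from scipy.stats import chisquare

# Hypothetical counts for a three-level categorical variable (illustrative only)
observed = [18, 22, 20]
expected = [20, 20, 20]  # E_i = n * p_i under the null hypothesis

# chisquare computes X^2 = sum((O_i - E_i)^2 / E_i) and its right-tail P-value
stat, p_value = chisquare(f_obs=observed, f_exp=expected)
print(round(stat, 1))  # 0.4, with DF = k - 1 = 2
```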

Interpret Results

If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null hypothesis. Typically, this involves comparing the P-value to the significance level, and rejecting the null hypothesis when the P-value is less than the significance level.

Test Your Understanding

Acme Toy Company prints baseball cards. The company claims that 30% of the cards are rookies, 60% are veterans but not All-Stars, and 10% are veteran All-Stars.

Suppose a random sample of 100 cards has 50 rookies, 45 veterans, and 5 All-Stars. Is this consistent with Acme's claim? Use a 0.05 level of significance.

The solution to this problem takes four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results. We work through those steps below:

  • Null hypothesis: The proportion of rookies, veterans, and All-Stars is 30%, 60% and 10%, respectively.
  • Alternative hypothesis: At least one of the proportions in the null hypothesis is false.
  • Formulate an analysis plan. For this analysis, the significance level is 0.05. Using sample data, we will conduct a chi-square goodness of fit test of the null hypothesis.

DF = k - 1 = 3 - 1 = 2

Eᵢ = n × pᵢ
E₁ = 100 × 0.30 = 30
E₂ = 100 × 0.60 = 60
E₃ = 100 × 0.10 = 10

Χ² = Σ[ (Oᵢ - Eᵢ)² / Eᵢ ]
Χ² = (50 - 30)²/30 + (45 - 60)²/60 + (5 - 10)²/10
Χ² = 400/30 + 225/60 + 25/10 = 13.33 + 3.75 + 2.50 = 19.58

where DF is the degrees of freedom, k is the number of levels of the categorical variable, n is the number of observations in the sample, Eᵢ is the expected frequency count for level i, Oᵢ is the observed frequency count for level i, and Χ² is the chi-square test statistic.

The P-value is the probability that a chi-square statistic having 2 degrees of freedom is more extreme (bigger) than 19.58. We use the Chi-Square Distribution Calculator to find P(Χ² > 19.58) = 0.00006.

  • Interpret results. Since the P-value (0.00006) is less than the significance level (0.05), we reject the null hypothesis.

Note: If you use this approach on an exam, you may also want to mention why this approach is appropriate. Specifically, the approach is appropriate because the sampling method was simple random sampling, the variable under study was categorical, and each level of the categorical variable had an expected frequency count of at least 5.
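The arithmetic in this sample problem can be cross-checked in Python; this is only a verification of the worked solution above, assuming scipy is available:

```python
from scipy.stats import chisquare

observed = [50, 45, 5]   # rookies, veterans (not All-Stars), veteran All-Stars
expected = [30, 60, 10]  # 100 * (0.30, 0.60, 0.10) under the null hypothesis

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
print(round(stat, 2))     # 19.58
print(round(p_value, 5))  # 6e-05, i.e. 0.00006
```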


Chi-Square (Χ²) Tests | Types, Formula & Examples

Published on May 23, 2022 by Shaun Turney . Revised on June 22, 2023.

A Pearson’s chi-square test is a statistical test for categorical data. It is used to determine whether your data are significantly different from what you expected. There are two types of Pearson’s chi-square tests:

  • The chi-square goodness of fit test is used to test whether the frequency distribution of a categorical variable is different from your expectations.
  • The chi-square test of independence is used to test whether two categorical variables are related to each other.

Table of contents

  • What is a chi-square test?
  • The chi-square formula
  • When to use a chi-square test
  • Types of chi-square tests
  • How to perform a chi-square test
  • How to report a chi-square test
  • Practice questions
  • Other interesting articles
  • Frequently asked questions about chi-square tests

Pearson’s chi-square (Χ²) tests, often referred to simply as chi-square tests, are among the most common nonparametric tests. Nonparametric tests are used for data that don’t follow the assumptions of parametric tests, especially the assumption of a normal distribution.

If you want to test a hypothesis about the distribution of a categorical variable, you’ll need to use a chi-square test or another nonparametric test. Categorical variables can be nominal or ordinal and represent groupings such as species or nationalities. Because they can only have a few specific values, they can’t have a normal distribution.

Test hypotheses about frequency distributions

There are two types of Pearson’s chi-square tests, but they both test whether the observed frequency distribution of a categorical variable is significantly different from its expected frequency distribution. A frequency distribution describes how observations are distributed between different groups.

Frequency distributions are often displayed using frequency distribution tables . A frequency distribution table shows the number of observations in each group. When there are two categorical variables, you can use a specific type of frequency distribution table called a contingency table to show the number of observations in each combination of groups.

Frequency of visits by bird species at a bird feeder during a 24-hour period

Bird species            Frequency
House sparrow           15
House finch             12
Black-capped chickadee  9
Common grackle          8
European starling       8
Mourning dove           6

Contingency table of the handedness of a sample of Americans and Canadians

          Right-handed  Left-handed
American  236           19
Canadian  157           16


Both of Pearson’s chi-square tests use the same formula to calculate the test statistic, chi-square (Χ²):

\begin{equation*} X^2=\sum{\frac{(O-E)^2}{E}} \end{equation*}

  • Χ² is the chi-square test statistic
  • Σ is the summation operator (it means “take the sum of”)
  • O is the observed frequency
  • E is the expected frequency

The larger the difference between the observations and the expectations ( O − E in the equation), the bigger the chi-square will be. To decide whether the difference is big enough to be statistically significant , you compare the chi-square value to a critical value.
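The critical-value comparison can be sketched with the bird-feeder counts from the table above (a sketch assuming scipy; the equal-proportions null used here is just one possible expectation):

```python
from scipy.stats import chi2, chisquare

# Observed visits: sparrow, finch, chickadee, grackle, starling, dove
observed = [15, 12, 9, 8, 8, 6]

# Expected frequencies default to equal proportions across the six species
stat, _ = chisquare(observed)

# Critical value of the chi-square distribution at alpha = 0.05, df = 5
critical = chi2.ppf(0.95, df=len(observed) - 1)
print(stat > critical)  # False: the difference is not statistically significant
```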

A Pearson’s chi-square test may be an appropriate option for your data if all of the following are true:

  • You want to test a hypothesis about one or more categorical variables. If one or more of your variables is quantitative, you should use a different statistical test. Alternatively, you could convert the quantitative variable into a categorical variable by separating the observations into intervals.
  • The sample was randomly selected from the population.
  • There are a minimum of five observations expected in each group or combination of groups.

The two types of Pearson’s chi-square tests are:

Chi-square goodness of fit test

Chi-square test of independence.

Mathematically, these are actually the same test. However, we often think of them as different tests because they’re used for different purposes.

You can use a chi-square goodness of fit test when you have one categorical variable. It allows you to test whether the frequency distribution of the categorical variable is significantly different from your expectations. Often, but not always, the expectation is that the categories will have equal proportions.

Expectation of equal proportions

  • Null hypothesis (H₀): The bird species visit the bird feeder in equal proportions.
  • Alternative hypothesis (Hₐ): The bird species visit the bird feeder in different proportions.

Expectation of different proportions

  • Null hypothesis (H₀): The bird species visit the bird feeder in the same proportions as the average over the past five years.
  • Alternative hypothesis (Hₐ): The bird species visit the bird feeder in different proportions from the average over the past five years.

You can use a chi-square test of independence when you have two categorical variables. It allows you to test whether the two variables are related to each other. If two variables are independent (unrelated), the probability of belonging to a certain group of one variable isn’t affected by the other variable.

  • Null hypothesis (H₀): The proportion of people who are left-handed is the same for Americans and Canadians.
  • Alternative hypothesis (Hₐ): The proportion of people who are left-handed differs between nationalities.
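These hypotheses can be tested on the handedness contingency table shown earlier; a sketch using scipy (scipy is an assumption here, not part of the article):

```python
from scipy.stats import chi2_contingency

# Rows: American, Canadian; columns: right-handed, left-handed
table = [[236, 19],
         [157, 16]]

stat, p_value, dof, expected = chi2_contingency(table)
print(dof)             # 1 degree of freedom for a 2x2 table
print(p_value > 0.05)  # True: no significant association in this sample
```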

Other types of chi-square tests

Some consider the chi-square test of homogeneity to be another variety of Pearson’s chi-square test. It tests whether two populations come from the same distribution by determining whether the two populations have the same proportions as each other. You can consider it simply a different way of thinking about the chi-square test of independence.

McNemar’s test is a test that uses the chi-square test statistic. It isn’t a variety of Pearson’s chi-square test, but it’s closely related. You can conduct this test when you have a related pair of categorical variables that each have two groups. It allows you to determine whether the proportions of the variables are equal.

Contingency table of ice cream flavor preference

                 Like chocolate  Dislike chocolate
Like vanilla     47              32
Dislike vanilla  8               13

  • Null hypothesis (H₀): The proportion of people who like chocolate is the same as the proportion of people who like vanilla.
  • Alternative hypothesis (Hₐ): The proportion of people who like chocolate is different from the proportion of people who like vanilla.

There are several other types of chi-square tests that are not Pearson’s chi-square tests, including the test of a single variance and the likelihood ratio chi-square test.

The exact procedure for performing a Pearson’s chi-square test depends on which test you’re using, but it generally follows these steps:

  • Create a table of the observed and expected frequencies. This can sometimes be the most difficult step because you will need to carefully consider which expected values are most appropriate for your null hypothesis.
  • Calculate the chi-square value from your observed and expected frequencies using the chi-square formula.
  • Find the critical chi-square value in a chi-square critical value table or using statistical software.
  • Compare the chi-square value to the critical value to determine which is larger.
  • Decide whether to reject the null hypothesis. You should reject the null hypothesis if the chi-square value is greater than the critical value. If you reject the null hypothesis, you can conclude that your data are significantly different from what you expected.

If you decide to include a Pearson’s chi-square test in your research paper, dissertation or thesis, you should report it in your results section. You can follow these rules if you want to report statistics in APA Style:

  • You don’t need to provide a reference or formula since the chi-square test is a commonly used statistic.
  • Refer to chi-square using its Greek symbol, Χ². Although the symbol looks very similar to an “X” from the Latin alphabet, it’s actually a different symbol. Greek symbols should not be italicized.
  • Include a space on either side of the equal sign.
  • If your chi-square is less than one, you should include a leading zero (a zero before the decimal point) since the chi-square can be greater than one.
  • Provide two significant digits after the decimal point.
  • Report the chi-square alongside its degrees of freedom, sample size, and p value, following this format: Χ²(degrees of freedom, N = sample size) = chi-square value, p = p value.
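A small helper can produce that format; `report_chi_square` is a hypothetical name for illustration, not an APA or library function:

```python
def report_chi_square(stat, df, n, p):
    """Format a chi-square result in APA style (hypothetical helper)."""
    # p cannot exceed 1, so APA style drops its leading zero
    p_str = f"{p:.3f}".lstrip("0")
    return f"Χ²({df}, N = {n}) = {stat:.2f}, p = {p_str}"

print(report_chi_square(1.8, 3, 40, 0.615))  # Χ²(3, N = 40) = 1.80, p = .615
```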

If you want to know more about statistics, methodology, or research bias, make sure to check out some of our other articles with explanations and examples.

  • Chi square test of independence
  • Statistical power
  • Descriptive statistics
  • Degrees of freedom
  • Pearson correlation
  • Null hypothesis

Methodology

  • Double-blind study
  • Case-control study
  • Research ethics
  • Data collection
  • Hypothesis testing
  • Structured interviews

Research bias

  • Hawthorne effect
  • Unconscious bias
  • Recall bias
  • Halo effect
  • Self-serving bias
  • Information bias

The two main chi-square tests are the chi-square goodness of fit test and the chi-square test of independence.

Both chi-square tests and t tests can test for differences between two groups. However, a t test is used when you have a dependent quantitative variable and an independent categorical variable (with two groups). A chi-square test of independence is used when you have two categorical variables.

Both correlations and chi-square tests can test for relationships between two variables. However, a correlation is used when you have two quantitative variables and a chi-square test of independence is used when you have two categorical variables.

Quantitative variables are any variables where the data represent amounts (e.g. height, weight, or age).

Categorical variables are any variables where the data represent groups. This includes rankings (e.g. finishing places in a race), classifications (e.g. brands of cereal), and binary outcomes (e.g. coin flips).

You need to know what type of variables you are working with to choose the right statistical test for your data and interpret your results .


Turney, S. (2023, June 22). Chi-Square (Χ²) Tests | Types, Formula & Examples. Scribbr. Retrieved August 7, 2024, from https://www.scribbr.com/statistics/chi-square-tests/


11.2 - Goodness of Fit Test

A chi-square goodness-of-fit test can be conducted when there is one categorical variable with more than two levels. If there are exactly two categories, then a one proportion z test may be conducted. The levels of that categorical variable must be mutually exclusive. In other words, each case must fit into one and only one category.

We can test that the proportions are all equal to one another or we can test any specific set of proportions.

If the expected counts, which we'll learn how to compute shortly, are all at least five, then the chi-square distribution may be used to approximate the sampling distribution. If any expected count is less than five, then a randomization test should be conducted. 

  • According to one research study, about 90% of American adults are right-handed, 9% are left-handed, and 1% are ambidextrous. Are the proportions of Penn State students who are right-handed, left-handed, and ambidextrous different from these national values?
  • A concessions stand sells blue, red, purple, and green freezer pops. They survey a sample of children and ask which of the four colors is their favorite. They want to know if the colors differ in popularity. 

Test Statistic

In conducting a goodness-of-fit test, we compare observed counts to expected counts. Observed counts are the number of cases in the sample in each group. Expected counts are computed given that the null hypothesis is true; this is the number of cases we would expect to see in each cell if the null hypothesis were true.

Expected count \(=n(p_i)\), where \(n\) is the total sample size and \(p_i\) is the hypothesized proportion of the \(i\)th group.

The observed and expected values are then used to compute the chi-square (\(\chi^2\)) test statistic.

\(\chi^2=\sum \dfrac{(Observed-Expected)^2}{Expected}\)

Approximating the Sampling Distribution

StatKey has the ability to conduct a randomization test for a goodness-of-fit test. There is an example of this in Section 7.1 of the Lock 5 textbook. If all expected values are at least five, then the sampling distribution can be approximated using a chi-square distribution.

Like the t distribution, the chi-square distribution varies depending on the degrees of freedom. Degrees of freedom for a chi-square goodness-of-fit test are equal to the number of groups minus 1. The distribution plot below compares the chi-square distributions with 2, 4, and 6 degrees of freedom.

Probability distribution plot made using Minitab Express; 3 chi-square distributions are overlaid with degrees of freedom of 2, 4, and 6

To find the p-value we find the area under the chi-square distribution to the right of our test statistic. A chi-square test is always right-tailed. 
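The right-tail area can be computed directly from the chi-square survival function; a sketch assuming scipy (5.991 is the familiar 0.05 cutoff for 2 degrees of freedom):

```python
from scipy.stats import chi2

stat, df = 5.991, 2

# sf is the survival function, 1 - cdf: the area to the right of the statistic.
# A chi-square test is always right-tailed, so this area is the p-value.
p_value = chi2.sf(stat, df)
print(round(p_value, 3))  # 0.05
```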

11.2.1 - Five Step Hypothesis Testing Procedure

The examples on the following pages use the five step hypothesis testing procedure outlined below. This is the same procedure that we used to conduct a hypothesis test for a single mean, single proportion, difference in two means, and difference in two proportions.

When conducting a chi-square goodness-of-fit test, it makes the most sense to write the hypotheses first. The hypotheses will depend on the research question. The null hypothesis will always contain the equalities and the alternative hypothesis will be that at least one population proportion is not as specified in the null.

In order to use the chi-square distribution to approximate the sampling distribution, all expected counts must be at least five.

Expected Count

\(Expected\;count=n(p_i)\)

Where \(n\) is the total sample size and \(p_i\) is the hypothesized population proportion in the \(i\)th group.

To check this assumption, compute all expected counts and confirm that each is at least five.

In Step 1 you already computed the expected counts. Use this formula to compute the chi-square test statistic:

Chi-Square Test Statistic

\(\chi^2=\sum \dfrac{(O-E)^2}{E}\)

Where \(O\) is the observed count for each cell and \(E\) is the expected count for each cell.

Construct a chi-square distribution with degrees of freedom equal to the number of groups minus one. The p-value is the area under that distribution to the right of the test statistic that was computed in Step 2. You can find this area by constructing a probability distribution plot in Minitab. 

Unless otherwise stated, use the standard 0.05 alpha level.

\(p \leq \alpha\) reject the null hypothesis.

\(p > \alpha\) fail to reject the null hypothesis.

Go back to the original research question and address it directly. If you rejected the null hypothesis, then there is convincing evidence that at least one of the population proportions is not as stated in the null hypothesis. If you failed to reject the null hypothesis, then there is not enough evidence that any of the population proportions are different from what is stated in the null hypothesis. 

11.2.1.1 - Video: Cupcakes (Equal Proportions)

11.2.1.2 - Cards (Equal Proportions)

Example: cards.

Research question : When randomly selecting a card from a deck with replacement, are we equally likely to select a heart, diamond, spade, and club?

I randomly selected a card from a standard deck 40 times with replacement. I pulled 13 hearts, 8 diamonds, 8 spades, and 11 clubs.

Let's use the five-step hypothesis testing procedure:

\(H_0: p_h=p_d=p_s=p_c=0.25\)

\(H_a:\) at least one \(p_i\) is not as specified in the null

We can use the null hypothesis to check the assumption that all expected counts are at least 5.

\(Expected\;count=n (p_i)\)

All \(p_i\) are 0.25. \(40(0.25)=10\), thus this assumption is met and we can approximate the sampling distribution using the chi-square distribution.

 \(\chi^2=\sum \dfrac{(Observed-Expected)^2}{Expected} \)

All expected values are 10. Our observed values were 13, 8, 8, and 11.

\(\chi^2=\dfrac{(13-10)^2}{10}+\dfrac{(8-10)^2}{10}+\dfrac{(8-10)^2}{10}+\dfrac{(11-10)^2}{10}\)

\(\chi^2=\dfrac{9}{10}+\dfrac{4}{10}+\dfrac{4}{10}+\dfrac{1}{10}\)

\(\chi^2=1.8\)

Our sampling distribution will be a chi-square distribution.

\(df=k-1=4-1=3\)

We can find the p-value by constructing a chi-square distribution with 3 degrees of freedom to find the area to the right of \(\chi^2=1.8\)

Chi-squared distribution plot made using Minitab Express; degrees of freedom equal 3; area to the right of chi-squared value of 1.8 is 0.614935

The p-value is 0.614935.

\(p>0.05\) therefore we fail to reject the null hypothesis.

There is not enough evidence to state that the proportion of hearts, diamonds, spades, and clubs that are randomly drawn from this deck are different.
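This cards example can be reproduced with scipy as a cross-check (scipy is an assumption; its expected counts default to equal proportions, matching the null here):

```python
from scipy.stats import chisquare

observed = [13, 8, 8, 11]            # hearts, diamonds, spades, clubs
stat, p_value = chisquare(observed)  # expected counts default to 40/4 = 10

print(round(stat, 1))     # 1.8
print(round(p_value, 3))  # 0.615
```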

11.2.1.3 - Roulette Wheel (Different Proportions)

Example: roulette wheel.

Research Question : An American roulette wheel contains 38 slots: 18 red, 18 black, and 2 green.  A casino has purchased a new wheel and they want to know if there is convincing evidence that the wheel is unfair. They spin the wheel 100 times and it lands on red 44 times, black 49 times, and green 7 times.

If the wheel is fair then \(p_{red}=\dfrac{18}{38}\), \(p_{black}=\dfrac{18}{38}\), and \(p_{green}=\dfrac{2}{38}\).

All of these proportions combined equal 1.

\(H_0: p_{red}=\dfrac{18}{38},\;p_{black}=\dfrac{18}{38}\;and\;p_{green}=\dfrac{2}{38}\)

\(H_a: At\;least\;one\;p_i\;is \;not\;as\;specified\;in\;the\;null\)

In order to conduct a chi-square goodness of fit test all expected values must be at least 5. 

For both red and black: \(Expected \;count=100(\dfrac{18}{38})=47.368\)

For green: \(Expected\;count=100(\dfrac{2}{38})=5.263\)

All expected counts are at least 5 so we can conduct a chi-square goodness of fit test. 

In the first step we computed the expected values for red and black to be 47.368 and for green to be 5.263.

 \(\chi^2= \dfrac{(44-47.368)^2}{47.368}+\dfrac{(49-47.368)^2}{47.368}+\dfrac{(7-5.263)^2}{5.263} \)

 \(\chi^2=0.239+0.056+0.573=0.868\)

\(df=k-1=3-1=2\)

We can find the p-value by constructing a chi-square distribution with 2 degrees of freedom to find the area to the right of \(\chi^2=0.868\)

Chi-squared distribution plot made using Minitab Express; degrees of freedom equal 2; area to the right of chi-squared value of 0.868 is shaded with a proportion of 0.647912

The p-value is 0.647912.

\(p>0.05\) therefore we fail to reject the null hypothesis.

There is not enough evidence that this roulette wheel is unfair.
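The roulette example can also be reproduced with scipy; note that f_exp takes the expected counts, not the proportions (scipy assumed, as a cross-check only):

```python
from scipy.stats import chisquare

observed = [44, 49, 7]                       # red, black, green
n = sum(observed)
expected = [n * 18/38, n * 18/38, n * 2/38]  # fair-wheel expected counts

stat, p_value = chisquare(observed, f_exp=expected)
print(round(stat, 3))     # 0.869
print(round(p_value, 3))  # 0.648
```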

11.2.2 - Minitab: Goodness-of-Fit Test

Research Question : When randomly selecting a card from a deck with replacement, are we equally likely to select a heart, diamond, spade, and club?

I randomly selected a card from a standard deck 40 times with replacement. I pulled 13 Hearts ( ♥ ), 8 Diamonds ( ♦ ), 8 Spades (♠), and 11 Clubs (♣).

Minitab ®  – Conducting a Chi-Square Goodness-of-Fit Test

Summarized data, equal proportions.

To perform a chi-square goodness-of-fit test in Minitab using summarized data we first need to enter the data into the worksheet. Below you can see that we have one column with the names of each group and one column with the observed counts for each group.

  C1 C2
  Suit Count
1 Hearts 13
2 Diamonds 8
3 Spades 8
4 Clubs 11
  • After entering the data, select Stat > Tables > Chi-Square Goodness of Fit Test (One Variable)
  • Double-click  Count  to enter it into the  Observed Counts  box
  • Double-click  Suit  to enter it into the  Category names (optional) box
  • Click  OK

This should result in the following output:

Chi-Square Goodness-of-Fit Test: Count

Observed and expected counts.

Category  Observed  Test Proportion  Expected  Contribution to Chi-Sq
Hearts    13        0.25             10        0.9
Diamonds  8         0.25             10        0.4
Spades    8         0.25             10        0.4
Clubs     11        0.25             10        0.1

Chi-Square Test

N DF Chi-Sq P-Value
40 3 1.8 0.615

All expected values are at least 5 so we can use the chi-square distribution to approximate the sampling distribution. Our results are \(\chi^2 (3) = 1.8\). \(p = 0.615\). Because our p-value is greater than the standard alpha level of 0.05, we fail to reject the null hypothesis. There is not enough evidence to conclude that the proportions are different in the population.

The example above tested equal population proportions. Minitab also has the ability to conduct a chi-square goodness-of-fit test when the hypothesized population proportions are not all equal. To do this, you can choose to test specified proportions or to use proportions based on historical counts.

11.2.2.1 - Example: Summarized Data, Equal Proportions

Example: tulips.

A company selling tulip bulbs claims they have equal proportions of white, pink, and purple bulbs and that they fill customer orders by randomly selecting bulbs from the population of all of their bulbs.

You ordered 30 bulbs and received 16 white, 8 pink, and 6 purple.

Is there convincing evidence the bulbs you received were not randomly selected from a population with an equal proportion of each color?

Use Minitab to conduct a hypothesis test to address this research question. 

We'll go through each of the steps in the hypotheses test:

\(H_0\colon p_{white}=p_{pink}=p_{purple}=\dfrac{1}{3}\)

\(H_a\colon\) at least one \(p_i\) is not \(\dfrac{1}{3}\)

All \(p_i\) are \(\frac{1}{3}\). \(30(\frac{1}{3})=10\), thus this assumption is met and we can approximate the sampling distribution using the chi-square distribution.

Let's use Minitab to calculate this.

First, enter the summarized data into a Minitab Worksheet.

 

   C1      C2
   Color   Count
1  White   16
2  Pink    8
3  Purple  6

  • After entering the data, select Stat > Tables > Chi-Square Goodness of Fit Test (One Variable)
  • Double-click Count to enter it into the Observed Counts box
  • Double-click Color to enter it into the Category names (optional) box
  • Click OK

Category  Observed  Test Proportion  Expected  Contribution to Chi-Sq
White     16        0.333333         10        3.6
Pink      8         0.333333         10        0.4
Purple    6         0.333333         10        1.6

N   DF  Chi-Sq  P-Value
30  2   5.6     0.061

The test statistic is a Chi-Square of 5.6.

The p-value from the output is 0.061.  

Since the p-value (0.061) is greater than the 0.05 alpha level, we fail to reject the null hypothesis. There is not enough evidence that the tulip bulbs were not randomly selected from a population with equal proportions of white, pink, and purple.
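The Minitab output above can be cross-checked in scipy (an assumption here, not part of the Minitab workflow):

```python
from scipy.stats import chisquare

observed = [16, 8, 6]                # white, pink, purple bulbs received
stat, p_value = chisquare(observed)  # equal proportions: expected 30/3 = 10 each

print(round(stat, 1))     # 5.6
print(round(p_value, 3))  # 0.061
```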

11.2.2.2 - Example: Summarized Data, Different Proportions

Example: roulette.

An American roulette wheel contains 38 slots: 18 red, 18 black, and 2 green. A casino has purchased a new wheel and they want to know if there is convincing evidence that the wheel is unfair. They spin the wheel 100 times and it lands on red 44 times, black 49 times, and green 7 times.

Use Minitab to conduct a hypothesis test to address this question. 

If the wheel is 'fair' then the probability of red and black are both 18/38 and the probability of green is 2/38.

\(H_0\colon p_{red}=\dfrac{18}{38}, p_{black}=\dfrac{18}{38}, p_{green}=\dfrac{2}{38}\)

\(H_a\colon\) at least one \(p_i\) is not as specified in the null

With n = 100 we meet the assumptions needed to use Chi-square.

 

   C1     C2
   Color  Count
1  Red    44
2  Black  49
3  Green  7

  • After entering the data, select Stat > Tables > Chi-Square Goodness of Fit Test (One Variable)
  • Double-click Count to enter it into the Observed Counts box
  • Double-click Color to enter it into the Category names (optional) box
  • For Test select Input constants
  • Select Proportions specified by historical counts (this is what we would expect if the null were true)
  • Enter 18/38 for Black, 2/38 for Green, and 18/38 for Red
  • Click OK

Category  Observed  Historical Counts  Test Proportion  Expected  Contribution to Chi-Sq
Red       44        18                 0.473684         47.3684   0.239532
Black     49        18                 0.473684         47.3684   0.056199
Green     7         2                  0.052632         5.2632    0.573158

N    DF  Chi-Sq    P-Value
100  2   0.868889  0.648

The test statistic is a Chi-Square of 0.87.

The p-value from the output is 0.648.  

There is not enough evidence to state that this roulette wheel is unfair.

JMP | Statistical Discovery.™ From SAS.

Statistics Knowledge Portal

A free online introduction to statistics

Chi-Square Goodness of Fit Test

What is the chi-square goodness of fit test?

The Chi-square goodness of fit test is a statistical hypothesis test used to determine whether a variable is likely to come from a specified distribution or not. It is often used to evaluate whether sample data is representative of the full population.

When can I use the test?

You can use the test when you have counts of values for a categorical variable.

Is this test the same as Pearson’s Chi-square test?

Yes. The Chi-square goodness of fit test is one of the two types of Pearson’s Chi-square tests; the other is the Chi-square test of independence.

Using the Chi-square goodness of fit test

The Chi-square goodness of fit test checks whether your sample data is likely to be from a specific theoretical distribution. We have a set of data values, and an idea about how the data values are distributed. The test gives us a way to decide if the data values have a “good enough” fit to our idea, or if our idea is questionable.

What do we need?

For the goodness of fit test, we need one variable. We also need an idea, or hypothesis, about how that variable is distributed. Here are a couple of examples:

  • We have bags of candy with five flavors in each bag. The bags should contain an equal number of pieces of each flavor. The idea we'd like to test is that the proportions of the five flavors in each bag are the same.
  • For a group of children’s sports teams, we want children with a lot of experience, some experience and no experience shared evenly across the teams. Suppose we know that 20 percent of the players in the league have a lot of experience, 65 percent have some experience and 15 percent are new players with no experience. The idea we'd like to test is that each team has the same proportion of children with a lot, some or no experience as the league as a whole.

To apply the goodness of fit test to a data set we need:

  • Data values that are a simple random sample from the full population.
  • Categorical or nominal data. The Chi-square goodness of fit test is not appropriate for continuous data.
  • A data set that is large enough so that at least five values are expected in each of the observed data categories. 

Chi-square goodness of fit test example

Let’s use the bags of candy as an example. We collect a random sample of ten bags. Each bag has 100 pieces of candy and five flavors. Our hypothesis is that the proportions of the five flavors in each bag are the same.

Let’s start by answering: Is the Chi-square goodness of fit test an appropriate method to evaluate the distribution of flavors in bags of candy?

  • We have a simple random sample of 10 bags of candy. We meet this requirement.
  • Our categorical variable is the flavors of candy. We have the count of each flavor in 10 bags of candy. We meet this requirement.
  • Each bag has 100 pieces of candy. Each bag has five flavors of candy. We expect to have equal numbers for each flavor. This means we expect 100 / 5 = 20 pieces of candy in each flavor from each bag. For 10 bags in our sample, we expect 10 x 20 = 200 pieces of candy in each flavor. This is more than the requirement of five expected values in each category.

Based on the answers above, yes, the Chi-square goodness of fit test is an appropriate method to evaluate the distribution of the flavors in bags of candy. 

Figure 1 below shows the combined flavor counts from all 10 bags of candy.

[Figure 1: Bar chart of the combined flavor counts from all 10 bags of candy]

Without doing any statistics, we can see that the number of pieces for each flavor are not the same. Some flavors have fewer than the expected 200 pieces and some have more. But how different are the proportions of flavors? Are the number of pieces “close enough” for us to conclude that across many bags there are the same number of pieces for each flavor? Or are the number of pieces too different for us to draw this conclusion? Another way to phrase this is, do our data values give a “good enough” fit to the idea of equal numbers of pieces of candy for each flavor or not?

To decide, we find the difference between what we have and what we expect. Then, to give flavors with fewer pieces than expected the same importance as flavors with more pieces than expected, we square the difference. Next, we divide the square by the expected count, and sum those values. This gives us our test statistic.

These steps are much easier to understand using numbers from our example.

Let’s start by listing what we expect if each bag has the same number of pieces for each flavor.  Above, we calculated this as 200 for 10 bags of candy.

Table 1: Comparison of actual vs expected number of pieces of each flavor of candy

Flavor | Number of Pieces of Candy (10 bags) | Expected Number of Pieces of Candy
Apple  | 180 | 200
Lime   | 250 | 200
Cherry | 120 | 200
Orange | 225 | 200
Grape  | 225 | 200

Now, we find the difference between what we have observed in our data and what we expect. The last column in Table 2 below shows this difference:

Table 2: Difference between observed and expected pieces of candy by flavor

Flavor | Number of Pieces of Candy (10 bags) | Expected Number of Pieces of Candy | Observed - Expected
Apple  | 180 | 200 | 180 - 200 = -20
Lime   | 250 | 200 | 250 - 200 = 50
Cherry | 120 | 200 | 120 - 200 = -80
Orange | 225 | 200 | 225 - 200 = 25
Grape  | 225 | 200 | 225 - 200 = 25

Some of the differences are positive and some are negative. If we simply added them up, we would get zero. Instead, we square the differences. This gives equal importance to the flavors of candy that have fewer pieces than expected, and the flavors that have more pieces than expected.

Table 3: Calculation of the squared difference between Observed and Expected for each flavor of candy

Flavor | Number of Pieces of Candy (10 bags) | Expected Number of Pieces of Candy | Observed - Expected | Squared Difference
Apple  | 180 | 200 | 180 - 200 = -20 | 400
Lime   | 250 | 200 | 250 - 200 = 50  | 2500
Cherry | 120 | 200 | 120 - 200 = -80 | 6400
Orange | 225 | 200 | 225 - 200 = 25  | 625
Grape  | 225 | 200 | 225 - 200 = 25  | 625

Next, we divide the squared difference by the expected number:

Table 4: Calculation of the squared difference/expected number of pieces of candy per flavor

Flavor | Number of Pieces of Candy (10 bags) | Expected Number of Pieces of Candy | Observed - Expected | Squared Difference | Squared Difference / Expected Number
Apple  | 180 | 200 | 180 - 200 = -20 | 400  | 400 / 200 = 2
Lime   | 250 | 200 | 250 - 200 = 50  | 2500 | 2500 / 200 = 12.5
Cherry | 120 | 200 | 120 - 200 = -80 | 6400 | 6400 / 200 = 32
Orange | 225 | 200 | 225 - 200 = 25  | 625  | 625 / 200 = 3.125
Grape  | 225 | 200 | 225 - 200 = 25  | 625  | 625 / 200 = 3.125

Finally, we add the numbers in the final column to calculate our test statistic:

$ 2 + 12.5 + 32 + 3.125 + 3.125 = 52.75 $
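The arithmetic in Tables 1–4 can be reproduced in a few lines. This is an illustrative sketch, not part of the original example:

```python
# Candy counts from Table 1; 200 pieces per flavor are expected.
observed = {"Apple": 180, "Lime": 250, "Cherry": 120, "Orange": 225, "Grape": 225}
expected = 200

# Per-flavor contributions (the last column of Table 4) and their sum.
contributions = {f: (o - expected) ** 2 / expected for f, o in observed.items()}
chi_sq = sum(contributions.values())

print(contributions["Cherry"], chi_sq)  # 32.0 52.75
```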

To draw a conclusion, we compare the test statistic to a critical value from the Chi-Square distribution . This activity involves four steps:

  • We first decide on the risk we are willing to take of drawing an incorrect conclusion based on our sample observations. For the candy data, we decide prior to collecting data that we are willing to take a 5% risk of concluding that the flavor counts in each bag across the full population are not equal when they really are. In statistics-speak, we set the significance level, α , to 0.05.
  • We calculate a test statistic. Our test statistic is 52.75.
  • We find the theoretical value from the Chi-square distribution based on our significance level. The theoretical value is the value we would expect if the bags contain the same number of pieces of candy for each flavor. In addition to the significance level, we also need the degrees of freedom to find this value. For the goodness of fit test, this is one fewer than the number of categories. We have five flavors of candy, so we have 5 – 1 = 4 degrees of freedom. The Chi-square value with α = 0.05 and 4 degrees of freedom is 9.488.
  • We compare the value of our test statistic (52.75) to the Chi-square value. Since 52.75 > 9.488, we reject the null hypothesis that the proportions of flavors of candy are equal.

We make a practical conclusion that bags of candy across the full population do not have an equal number of pieces for the five flavors. This makes sense if you look at the original data. If your favorite flavor is Lime, you are likely to have more of your favorite flavor than the other flavors. If your favorite flavor is Cherry, you are likely to be unhappy because there will be fewer pieces of Cherry candy than you expect.

Understanding results

Let’s use a few graphs to understand the test and the results.

A simple bar chart of the data shows the observed counts for the flavors of candy:

[Figure 2: Bar chart of the observed counts for each flavor of candy]

Another simple bar chart shows the expected counts of 200 per flavor. This is what our chart would look like if the bags of candy had an equal number of pieces of each flavor.

[Figure 3: Bar chart of the expected counts, 200 pieces per flavor]

The side-by-side chart below shows the actual observed number of pieces of candy in blue. The orange bars show the expected number of pieces. You can see that some flavors have more pieces than we expect, and other flavors have fewer pieces. 

[Figure 4: Side-by-side bar chart of observed counts (blue) and expected counts (orange)]

The statistical test is a way to quantify the difference. Is the actual data from our sample “close enough” to what is expected to conclude that the flavor proportions in the full population of bags are equal? Or not? From the candy data above, most people would say the data is not “close enough” even without a statistical test.

What if your data looked like the example in Figure 5 below instead? The purple bars show the observed counts and the orange bars show the expected counts. Some people would say the data is “close enough” but others would say it is not. The statistical test gives a common way to make the decision, so that everyone makes the same decision on a set of data values. 

[Figure 5: Side-by-side bar chart of observed counts (purple) and expected counts (orange) for a closer fit]

Statistical details

Let’s look at the candy data and the Chi-square test for goodness of fit using statistical terms. This test is also known as Pearson’s Chi-square test.

Our null hypothesis is that the proportion of flavors in each bag is the same. We have five flavors. The null hypothesis is written as:

$ H_0: p_1 = p_2 = p_3 = p_4 = p_5 $

The formula above uses p for the proportion of each flavor. If each 100-piece bag contains equal numbers of pieces of candy for each of the five flavors, then the bag contains 20 pieces of each flavor. The proportion of each flavor is 20 / 100 = 0.2.

The alternative hypothesis is that at least one of the proportions is different from the others. This is written as:

$ H_a: at\ least\ one\ p_i\ not\ equal $

In some cases, we are not testing for equal proportions. Look again at the example of children's sports teams near the top of this page.  Using that as an example, our null and alternative hypotheses are:

$ H_0: p_1 = 0.2, p_2 = 0.65, p_3 = 0.15 $

$ H_a: at\ least\ one\ p_i\ not\ equal\ to\ expected\ value $

Unlike other hypotheses that involve a single population parameter, we cannot use just a formula. We need to use words as well as symbols to describe our hypotheses.

We calculate the test statistic using the formula below:

$ \sum^n_{i=1} \frac{(O_i-E_i)^2}{E_i} $

In the formula above, we have n groups. The $ \sum $ symbol means to add up the calculations for each group. For each group, we do the same steps as in the candy example. The formula shows O i  as the Observed value and E i    as the Expected value for a group.

We then compare the test statistic to a Chi-square value with our chosen significance level (also called the alpha level) and the degrees of freedom for our data. Using the candy data as an example, we set α = 0.05 and have four degrees of freedom. For the candy data, the Chi-square value is written as:

$ \chi^2_{0.05, 4} $

There are two possible results from our comparison:

  • The test statistic is lower than the Chi-square value. You fail to reject the hypothesis of equal proportions. You conclude that the bags of candy across the entire population have the same number of pieces of each flavor in them. The fit of equal proportions is “good enough.”
  • The test statistic is higher than the Chi-Square value. You reject the hypothesis of equal proportions. You cannot conclude that the bags of candy have the same number of pieces of each flavor. The fit of equal proportions is “not good enough.”

Let’s use a graph of the Chi-square distribution to better understand the test results. You are checking to see if your test statistic is a more extreme value in the distribution than the critical value. The distribution below shows a Chi-square distribution with four degrees of freedom. It shows how the critical value of 9.488 “cuts off” 95% of the data. Only 5% of the data is greater than 9.488.

[Figure: Chi-square distribution with four degrees of freedom; the critical value 9.488 cuts off the upper 5% of the distribution]

The next distribution plot includes our results. You can see how far out “in the tail” our test statistic is, represented by the dotted line at 52.75. In fact, with this scale, it looks like the curve is at zero where it intersects with the dotted line. It isn’t, but it is very, very close to zero. We conclude that it is very unlikely for this situation to happen by chance. If the true population of bags of candy had equal flavor counts, we would be extremely unlikely to see the results that we collected from our random sample of 10 bags.

[Figure: The same chi-square distribution with the test statistic of 52.75 marked by a dotted line far out in the right tail]

Most statistical software shows the p-value for a test. This is the likelihood of finding a more extreme value for the test statistic in a similar sample, assuming that the null hypothesis is correct. It’s difficult to calculate the p-value by hand. For the figure above, if the test statistic is exactly 9.488, then the p - value will be p=0.05. With the test statistic of 52.75, the p - value is very, very small. In this example, most statistical software will report the p - value as “p < 0.0001.” This means that the likelihood of another sample of 10 bags of candy resulting in a more extreme value for the test statistic is less than one chance in 10,000, assuming our null hypothesis of equal counts of flavors is true.
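As an illustrative sketch (not from the original article), the relationship between the critical value 9.488, the significance level 0.05, and the tiny p-value for 52.75 can be checked numerically. For even degrees of freedom the chi-square survival function has a closed form, used below:

```python
import math

def chi2_sf(x, df):
    """Chi-square survival function P(X >= x), closed form for even df."""
    assert df % 2 == 0, "this closed form only covers even degrees of freedom"
    return math.exp(-x / 2) * sum(
        (x / 2) ** j / math.factorial(j) for j in range(df // 2)
    )

# At the critical value, the upper-tail area equals the significance level.
print(round(chi2_sf(9.488, 4), 3))  # 0.05

# The test statistic of 52.75 is so far in the tail that p < 0.0001.
print(chi2_sf(52.75, 4) < 0.0001)  # True
```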


11.3 Chi-Square Goodness-of-Fit Test

The chi-square goodness-of-fit test can be applied to either a categorical or discrete quantitative variable with a finite number of values. The objective of the chi-square goodness-of-fit test is to test whether the variable follows the probability distribution specified in the null hypothesis [latex]H_0[/latex].

The main idea behind the chi-square goodness-of-fit test is to compare the observed frequencies ([latex]O[/latex]) to the expected frequencies ([latex]E[/latex]), which are based on the probability distribution specified in [latex]H_0[/latex]. If [latex]H_0[/latex] is true, the observed and expected frequencies should be reasonably similar; therefore, we reject [latex]H_0[/latex] if the observed and expected frequencies are very different. The discrepancy between the observed and expected frequencies is quantified by the chi-square statistic

[latex]\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E}[/latex]

which follows a chi-square distribution with [latex]df = k-1[/latex], where [latex]k[/latex] is the number of possible values for the variable under consideration. The chi-square statistic will be large when the observed and expected frequencies are very different. Thus, we reject the null hypothesis when the chi-square statistic is sufficiently large. More specifically, at the significance level of [latex]\alpha[/latex], we reject [latex]H_0[/latex] if the chi-square statistic is larger than the critical value [latex]\chi_{\alpha}^2[/latex]. Since we only reject [latex]H_0[/latex] if the chi-square statistic is sufficiently large, chi-square tests are always right-tailed. That is, both the rejection region and the p-value are upper-tailed probabilities.

Chi-Square Goodness-of-Fit Test

Assumptions :

  • All expected frequencies are at least 1.
  • At most 20% of the expected frequencies are less than 5.
  • Simple random sample (if you need to generalize the conclusion to a larger population).

Note : If assumptions 1 or 2 are violated, one can consider combining the cells to increase the counts in those cells.

Steps to perform a chi-square goodness-of-fit test :

First, check the assumptions. Calculate the expected frequency for each possible value of the variable using [latex]E=np[/latex], where [latex]n[/latex] is the total number of observations and [latex]p[/latex] is the relative frequency (or probability) specified in the null hypothesis. Check whether the expected frequencies satisfy assumptions 1 and 2. If not, consider combining some cells.

  • Set up the hypotheses: [latex]\begin{align*} H_0 &: \text{The variable has the specified distribution }\\ H_a &: \text{The variable does not have the specified distribution}. \end{align*}[/latex]
  • State the significance level [latex]\alpha[/latex].
  • Compute the value of the test statistic: [latex]\chi_o^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E}[/latex] with [latex]df = k-1[/latex].
Rejection region [latex]\chi^2 \geq \chi_{\alpha}^2[/latex] the region to the right of [latex]\chi_{\alpha}^2[/latex], the area is [latex]\alpha[/latex]
P-value  [latex]P(\chi^2 \geq \chi_o^2)[/latex] the area to the right of [latex]\chi_o^2[/latex] under the curve
  • Reject the null [latex]H_0[/latex] if P-value [latex]\leq \alpha[/latex] or [latex]\chi_o^2[/latex] falls in the rejection region.
  • Conclusion.
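The steps above can be collected into a small helper. This is an illustrative sketch (the function name and the built-in assumption check are ours, not the textbook's), demonstrated with the roulette counts from earlier on this page:

```python
def gof_check_and_stat(observed, probs):
    """Chi-square goodness-of-fit: expected counts, assumption check, statistic, df.

    Hypothetical helper; `observed` holds the counts and `probs` the
    proportions specified by the null hypothesis.
    """
    n = sum(observed)
    expected = [n * p for p in probs]  # E = n * p for each cell
    # Assumptions: all expected frequencies >= 1, at most 20% of them below 5.
    assumptions_met = (
        min(expected) >= 1
        and sum(e < 5 for e in expected) <= 0.2 * len(expected)
    )
    chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return chi_sq, df, assumptions_met

# Demo with the roulette counts from the example earlier on this page.
stat, df, ok = gof_check_and_stat([44, 49, 7], [18 / 38, 18 / 38, 2 / 38])
print(round(stat, 4), df, ok)  # 0.8689 2 True
```

If the assumption check fails, the textbook's advice applies: combine cells to raise the low expected counts and rerun.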

Example: Chi-Square Goodness-of-Fit Test

According to the results of the federal election in 2015, 31.9% of votes supported the Conservative Party, 39.5% supported the Liberal Party, 19.7% supported the New Democratic Party (NDP), 4.7% supported Bloc Québécois, and 3.4% supported the Green Party (data from Wikipedia). Thirty-seven students in my Stat151 class responded to an online survey and their preferences are summarized in the following table:

Table 11.2 : Voting Preference of the Class

Test at the 5% significance level whether the class had different voting preferences than all Canadians in the 2015 election.

Check the assumptions : since [latex]n = 37[/latex], each expected frequency is computed as [latex]E = np = 37 \times p[/latex]. For example, the expected count of conservative voters is [latex]E = 37 \times 0.319 = 11.803[/latex]. The following table gives all expected counts:

Table 11.3 : Expected Frequency of Voting Preference

                                 | Conservative | Green | Liberal | NDP   | Bloc Québécois | Others
Proportion [latex](p)[/latex]    | 0.319        | 0.034 | 0.395   | 0.197 | 0.047          | 0.008
Counts [latex](E = 37p)[/latex]  | 11.803       | 1.258 | 14.615  | 7.289 | 1.739          | 0.296

There are [latex]k = 6[/latex] cells, and the assumption allows at most [latex]6 \times 0.2 = 1.2[/latex] cells (that is, at most one cell) to have an expected count below 5; however, three cells actually have expected counts below 5. We can combine the cells “Green”, “Bloc Québécois” and “Others” into a single cell labelled “Others”. Therefore, we have the working table as follows.

Table 11.4 : Working Table for a Chi-Square Goodness of Fit Test (Example)

                                  | Conservative | Liberal | NDP   | Others
Proportion [latex](p)[/latex]     | 0.319        | 0.395   | 0.197 | 0.089
Expected [latex](E = 37p)[/latex] | 11.803       | 14.615  | 7.289 | 3.293

Note: After combining the cells, all the expected counts are greater than 1, while 25% of the expected counts are below 5 (the expected count for Others is below 5). Since more than 20% of the expected counts are below 5, there is still a violation in the assumptions. However, the expected frequency for “Others” is 3.293 which is not very far away from 5. To maintain a meaningful number of parties, we proceed to conduct the chi-square goodness-of-fit test.

  • Set up the hypotheses: [latex]\begin{align*} H_0 & : p_{\scriptsize C} = 0.319, p_{\scriptsize L} = 0.395, p_{\scriptsize NDP} = 0.197, p_{\scriptsize Others} = 0.089 \\ H_a & : \text{At least one proportion is different from those specified in } H_0. \end{align*}[/latex]
  • The significance level is [latex]\alpha = 0.05[/latex].
  • The test statistic: [latex]\chi_o^2 = \sum_{\text{all cells}} \frac{(O- E)^2}{E} = 2.1677[/latex], with [latex]df = k -1 = 4 - 1 =3[/latex].
  • Find the P-value. Since chi-square tests are always right-tailed, P-value [latex]= P(\chi^2 \geq \chi_o^2) = P(\chi^2 \geq 2.1677) \: \gt \: 0.1[/latex] (from a chi-square table with [latex]df = 3[/latex]).
  • Decision: We do not reject the null [latex]H_0[/latex], since P-value [latex]\: \gt \: 0.1 \: \gt \: 0.05(\alpha)[/latex].
  • Conclusion: At the 5% significance level, we do not have sufficient evidence that the class had different voting preferences than all Canadians in the 2015 election.

If using the critical value approach, steps 4–6 are as follows :

  • Find the rejection region. For a right-tailed test with [latex]df=3[/latex], the rejection region is to the right of the critical value [latex]\chi^2 \geq \chi_{\alpha}^2 = \chi_{0.05}^2 = 7.815[/latex].
  • Decision: We do not reject the null [latex]H_0[/latex] since [latex]\chi_o^2 = 2.1677 \lt 7.815[/latex], i.e., the test statistic does not fall in the rejection region.
  • Conclusion: The same as in the P-value approach.

[Figure: Chi-square curve with df = 3; the rejection region lies to the right of the critical value 7.815]

Exercise: Chi-square goodness-of-fit test

A company claims their deluxe mixed nuts consist of 20% peanuts, 60% cashews, and 20% almonds. An inspector obtains a random sample of [latex]n = 100[/latex] nuts and observes 30 peanuts, 55 cashews, and 15 almonds. Test at the 5% significance level whether the percentages differ from what the company claims.

Check the assumptions : [latex]n = 100[/latex] and the expected counts are [latex]E_{\text{peanut}} = 100 \times 0.2 = 20, E_{\text{cashew}} = 100 \times 0.6 = 60,[/latex] [latex]E_{\text{almond}} = 100 \times 0.2 = 20[/latex] and all greater than 5.

  • Set up the hypotheses: [latex]\begin{align*} H_0 &: p_{\text{peanut}} = 0.2, p_{\text{cashew}} = 0.6, p_{\text{almond}} = 0.2  \\ H_a &: \text{at least one proportion is different from those specified in } H_0. \end{align*}[/latex]

Table 11.5 : Working Table for Chi-Square Goodness-of-Fit Test (Exercise)

                                         | Peanut | Cashew | Almond
Observed [latex](O)[/latex]              | 30     | 55     | 15
[latex]E = np = 100 \times p[/latex]     | 20     | 60     | 20
[latex]\frac{(O-E)^2}{E}[/latex]         | 5      | 0.417  | 1.25

  • Compute the test statistic: [latex]\chi_o^2 = 5 + 0.417 + 1.25 = 6.667[/latex] with [latex]df = k - 1 = 3 - 1 = 2[/latex].
  • Find the P-value: P-value [latex]P(\chi^2 \geq \chi_o^2) = P(\chi^2 \geq 6.667)[/latex]. Since [latex]5.991 (\chi_{0.05}^2) < \chi_o^2=6.667 < 7.378 (\chi_{0.025}^2)[/latex], 0.025 < P-value < 0.05.
  • Decision: We should reject the null [latex]H_0[/latex] since P-value <0.05([latex]\alpha[/latex]).
  • Conclusion: At the 5% significance level, we have sufficient evidence that the percentages of nuts are different from what the company claims.
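As a sketch (not part of the original exercise), the statistic and p-value can be checked directly; with df = 2 the chi-square survival function reduces to exp(-x/2):

```python
import math

# Observed nut counts and the company's claimed proportions.
observed = {"peanut": 30, "cashew": 55, "almond": 15}
claimed = {"peanut": 0.2, "cashew": 0.6, "almond": 0.2}
n = sum(observed.values())  # 100 nuts

expected = {k: n * p for k, p in claimed.items()}  # 20, 60, 20
chi_sq = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)

# df = 3 - 1 = 2, so P(X >= x) = exp(-x/2) exactly.
p_value = math.exp(-chi_sq / 2)
print(round(chi_sq, 3), round(p_value, 4))  # 6.667 0.0357
```

The exact p-value of about 0.0357 agrees with the table bracketing 0.025 < P-value < 0.05 used in the exercise.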

Introduction to Applied Statistics Copyright © 2024 by Wanhua Su is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.

Module 11: The Chi Square Distribution

Goodness-of-Fit Test

Learning Outcomes

  • Conduct and interpret chi-square goodness-of-fit hypothesis tests

In this type of hypothesis test, you determine whether the data “fit” a particular distribution or not. For example, you may suspect your unknown data fit a binomial distribution. You use a chi-square test (meaning the distribution for the hypothesis test is chi-square) to determine if there is a fit or not. The null and the alternative hypotheses for this test may be written in sentences or may be stated as equations or inequalities.

The test statistic for a goodness-of-fit test is: [latex]\displaystyle{\sum_{k}}\frac{{({O}-{E})}^{{2}}}{{E}}[/latex]

  • O = observed values (data)
  • E = expected values (from theory)
  • k = the number of different data cells or categories

The observed values are the data values and the expected values are the values you would expect to get if the null hypothesis were true. There are k terms of the form [latex]\displaystyle\frac{{({O}-{E})}^{{2}}}{{E}}[/latex].

The number of degrees of freedom is  df = (number of categories – 1).

The goodness-of-fit test is almost always right-tailed. If the observed values and the corresponding expected values are not close to each other, then the test statistic can get very large and will be way out in the right tail of the chi-square curve.

Note:  The expected value for each cell needs to be at least five in order for you to use this test.

Absenteeism of college students from math classes is a major concern to math instructors because missing class appears to increase the drop rate. Suppose that a study was done to determine if the actual student absenteeism rate follows faculty perception. The faculty expected that a group of 100 students would miss class according to this table.

Number of absences per term Expected number of students
0–2 50
3–5 30
6–8 12
9–11 6
12+ 2

A random survey across all mathematics courses was then done to determine the actual number  (observed) of absences in a course. The chart in this table displays the results of that survey.

Number of absences per term Actual number of students
0–2 35
3–5 40
6–8 20
9–11 1
12+ 4

Determine the null and alternative hypotheses needed to conduct a goodness-of-fit test.

H 0 : Student absenteeism fits faculty perception.

The alternative hypothesis is the opposite of the null hypothesis.

H a : Student absenteeism does not fit faculty perception.

  • Can you use the information as it appears in the charts to conduct the goodness-of-fit test?
  • What is the number of degrees of freedom ( df )?

No, you cannot use the information as it appears in the charts: the expected frequency for the “12+” category is only 2, which is less than five. Combine the last two categories into a single “9+” category (expected frequency 6 + 2 = 8) so that every cell has an expected value of at least five. The revised tables are:
Number of absences per term Expected number of students
0–2 50
3–5 30
6–8 12
9+ 8
Number of absences per term Actual number of students
0–2 35
3–5 40
6–8 20
9+ 5
  • There are four “cells” or categories in each of the new tables. df = number of cells – 1 = 4 – 1 = 3
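The exercise stops after finding df. As a sketch (the computation below is ours, not the source's), the statistic for the combined tables can be computed and compared with the 5% critical value for df = 3, which is 7.815:

```python
# Combined categories 0-2, 3-5, 6-8, 9+: observed counts vs faculty expectation.
observed = [35, 40, 20, 5]
expected = [50, 30, 12, 8]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi_sq, 2))  # 14.29

# 14.29 > 7.815 (the 5% critical value for df = 3), so at the 5% level
# we would reject H0: absenteeism does not fit faculty perception.
```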

A factory manager needs to understand how many products are defective versus how many are produced. The number of expected defects is listed in the table.

Number produced Number defective
0–100 5
101–200 6
201–300 7
301–400 8
401–500 10

A random sample was taken to determine the actual number of defects. This table shows the results of the survey.

Number produced Number defective
0–100 5
101–200 7
201–300 8
301–400 9
401–500 11

State the null and alternative hypotheses needed to conduct a goodness-of-fit test, and state the degrees of freedom.

H 0 : The number of defects fits expectations.

H a : The number of defects does not fit expectations.

df = number of cells – 1 = 5 – 1 = 4

Employers want to know which days of the week employees are absent in a five-day work week. Most employers would like to believe that employees are absent equally during the week. Suppose a random sample of 60 managers were asked on which day of the week they had the highest number of employee absences. The results were distributed as in the table below. For the population of employees, do the days for the highest number of absences occur with equal frequencies during a five-day work week? Test at a 5% significance level.

Day of the Week Employees were Most Absent

Monday Tuesday Wednesday Thursday Friday
Number of Absences 15 12 9 9 15

The null and alternative hypotheses are:

  • H 0 : The absent days occur with equal frequencies, that is, they fit a uniform distribution.
  • H a : The absent days occur with unequal frequencies, that is, they do not fit a uniform distribution.

If the absent days occur with equal frequencies, then, out of 60 absent days (the total in the sample: 15 + 12 + 9 + 9 + 15 = 60), there would be 12 absences on Monday, 12 on Tuesday, 12 on Wednesday, 12 on Thursday, and 12 on Friday. These numbers are the expected ( E ) values. The values in the table are the observed ( O ) values or data.

This time, calculate the  χ 2 test statistic by hand. Make a chart with the following headings and fill in the columns:

  • Expected ( E ) values (12, 12, 12, 12, 12)
  • Observed ( O ) values (15, 12, 9, 9, 15)
  • ( O – E ) 2
  • [latex]\displaystyle\frac{{({O}-{E})}^{{2}}}{{E}}[/latex]

Now add (sum) the last column. The sum is three. This is the  χ 2 test statistic.

To find the  p -value, calculate P ( χ 2 > 3). This test is right-tailed. (Use a computer or calculator to find the p -value. You should get p -value = 0.5578.)

The  dfs are the number of cells – 1 = 5 – 1 = 4

Press  2nd DISTR . Arrow down to  χ2cdf . Press ENTER . Enter (3,10^99,4) . Rounded to four decimal places, you should see 0.5578, which is the p-value.

Next, complete a graph like the following one with the proper labeling and shading. (You should shade the right tail.)

This is a blank nonsymmetrical chi-square curve for the test statistic of the days of the week absent.

The decision is not to reject the null hypothesis.

Conclusion: At a 5% level of significance, from the sample data, there is not sufficient evidence to conclude that the absent days do not occur with equal frequencies.
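The hand calculation and the calculator's χ²cdf step can also be done directly. A sketch (df = 4 is even, so the survival function has a closed form):

```python
import math

# Absences Monday through Friday; under H0 each day expects 60/5 = 12.
observed = [15, 12, 9, 9, 15]
expected = [sum(observed) / 5] * 5

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# df = 5 - 1 = 4 (even), so P(X >= x) = exp(-x/2) * (1 + x/2).
p_value = math.exp(-chi_sq / 2) * (1 + chi_sq / 2)
print(chi_sq, round(p_value, 4))  # 3.0 0.5578
```

The p-value of 0.5578 matches the χ²cdf(3, 10^99, 4) result quoted above.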

TI-83+ and some TI-84 calculators do not have a special program for the test statistic for the goodness-of-fit test. The next example has the calculator instructions. The newer TI-84 calculators have in  STAT TESTS the test Chi2 GOF . To run the test, put the observed values (the data) into a first list and the expected values (the values you expect if the null hypothesis is true) into a second list. Press STAT TESTS  and Chi2 GOF . Enter the list names for the Observed list and the Expected list. Enter the degrees of freedom and press  calculate or draw . Make sure you clear any lists before you start. To Clear Lists in the calculators: Go into  STAT EDIT and arrow up to the list name area of the particular list. Press  CLEAR and then arrow down. The list will be cleared. Alternatively, you can press STAT and press 4 (for  ClrList ). Enter the list name and press ENTER .

Teachers want to know which night each week their students are doing most of their homework. Most teachers think that students do homework equally throughout the week. Suppose a random sample of 49 students were asked on which night of the week they did the most homework. The results were distributed as in the table.

Sunday Monday Tuesday Wednesday Thursday Friday Saturday
Number of Students 11 8 10 7 10 5 5

From the population of students, do the nights for the highest number of students doing the majority of their homework occur with equal frequencies during a week? What type of hypothesis test should you use?

p -value = 0.6093

We decline to reject the null hypothesis. There is not enough evidence to support that students do not do the majority of their homework equally throughout the week.

One study indicates that the number of televisions that American families have is distributed (this is the  given distribution for the American population) as in the table.

Number of Televisions Percent
0 10
1 16
2 55
3 11
4+ 8

The table contains expected ( E ) percents.

A random sample of 600 families in the far western United States resulted in the data in this table.

Number of Televisions Frequency
Total = 600
0 66
1 119
2 340
3 60
4+ 15

The table contains observed ( O ) frequency values.

At the 1% significance level, does it appear that the distribution “number of televisions” of far western United States families is different from the distribution for the American population as a whole?

This problem asks you to test whether the far western United States families distribution fits the distribution of the American families. This test is always right-tailed.

The first table contains expected percentages. To get expected ( E ) frequencies, multiply the percentage by 600. The expected frequencies are shown in this table.

Number of Televisions Percent Expected Frequency
0 10 (0.10)(600) = 60
1 16 (0.16)(600) = 96
2 55 (0.55)(600) = 330
3 11 (0.11)(600) = 66
over 3 8 (0.08)(600) = 48

Therefore, the expected frequencies are 60, 96, 330, 66, and 48. In the TI calculators, you can let the calculator do the math. For example, instead of 60, enter 0.10*600.

H 0 : The “number of televisions” distribution of far western United States families is the same as the “number of televisions” distribution of the American population.

H a : The “number of televisions” distribution of far western United States families is different from the “number of televisions” distribution of the American population.

Distribution for the test: [latex]\displaystyle\chi^{2}_{4}[/latex] where df = (the number of cells) – 1 = 5 – 1 = 4.

Note : [latex]df\neq600-1[/latex]

Calculate the test statistic: χ 2 = 29.65

This is a nonsymmetric chi-square curve with values of 0, 4, and 29.65 labeled on the horizontal axis. The value 4 coincides with the peak of the curve. A vertical upward line extends from 29.65 to the curve, and the region to the right of this line is shaded. The shaded area is equal to the p-value.

Probability statement: p -value = P ( χ 2 > 29.65) = 0.000006

Compare α and the p -value:

α = 0.01 p -value = 0.000006

So, α > p -value.

Make a decision: Since α > p -value, reject H o .

This means you reject the belief that the distribution for the far western states is the same as that of the American population as a whole.

Conclusion: At the 1% significance level, from the data, there is sufficient evidence to conclude that the “number of televisions” distribution for the far western United States is different from the “number of televisions” distribution for the American population as a whole.
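As a sketch (not part of the original module), the statistic and p-value can be reproduced without a calculator; df = 4 is even, so the chi-square survival function has a closed form:

```python
import math

# Observed counts (far western sample) and expected counts (national percents x 600).
observed = [66, 119, 340, 60, 15]
percents = [0.10, 0.16, 0.55, 0.11, 0.08]
expected = [p * 600 for p in percents]  # 60, 96, 330, 66, 48

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# df = 5 - 1 = 4 (even), so P(X >= x) = exp(-x/2) * (1 + x/2).
p_value = math.exp(-chi_sq / 2) * (1 + chi_sq / 2)
print(round(chi_sq, 2), f"{p_value:.6f}")  # 29.65 0.000006
```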

Press STAT and ENTER . Make sure to clear lists L1 , L2 , and L3 if they have data in them (see the note at the end of Example 2). Into L1 , put the observed frequencies 66 , 119 , 340 , 60 , 15 (these sum to the sample size of 600). Into L2 , put the expected frequencies .10*600, .16*600 , .55*600 , .11*600 , .08*600 . Arrow over to list L3 and up to the name area L3 . Enter (L1-L2)^2/L2 and press ENTER . Press 2nd QUIT . Press 2nd LIST and arrow over to MATH . Press 5 . You should see "sum" . Enter L3 and press ENTER . Rounded to two decimal places, you should see 29.65 . Press 2nd DISTR . Press 7 or arrow down to 7:χ2cdf and press ENTER . Enter (29.65,1E99,4) . You should see 5.77E-6 , which is .000006 rounded to six decimal places; this is the p-value.

The newer TI-84 calculators have in STAT TESTS the test Chi2 GOF . To run the test, put the observed values (the data) into a first list and the expected values (the values you expect if the null hypothesis is true) into a second list. Press STAT TESTS and Chi2 GOF . Enter the list names for the Observed list and the Expected list. Enter the degrees of freedom and press calculate or draw . Make sure you clear any lists before you start.
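If you prefer software to a TI calculator, the same computation can be sketched in Python using only the standard library. Because df = 4 is even, the chi-square tail probability has a closed form (exp(-x/2) times a short polynomial sum), so no statistics package is needed.

```python
import math

# Observed counts from the sample of 600 far western families (sum to 600)
observed = [66, 119, 340, 60, 15]
# Expected counts from the national percentages: 10%, 16%, 55%, 11%, 8%
expected = [p * 600 for p in (0.10, 0.16, 0.55, 0.11, 0.08)]

# Goodness-of-fit statistic: sum of (O - E)^2 / E over the categories
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# For even df = 2k, P(chi-square > x) = exp(-x/2) * sum_{i<k} (x/2)^i / i!
def chi2_sf_even(x, df):
    k = df // 2
    return math.exp(-x / 2) * sum((x / 2) ** i / math.factorial(i) for i in range(k))

df = len(observed) - 1  # 4 degrees of freedom
p_value = chi2_sf_even(chi_sq, df)

print(round(chi_sq, 2))   # 29.65
print(round(p_value, 6))  # 6e-06, i.e. 0.000006
```

This reproduces the test statistic of 29.65 and the p-value of 0.000006 found on the calculator.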

The table below gives the expected distribution (in percent) of the number of pets students have in their homes for the student population of the United States.

Number of Pets Percent
0 18
1 25
2 30
3 18
4+ 9

A random sample of 1,000 students from the Eastern United States resulted in the data in the table below.

Number of Pets Frequency
0 210
1 240
2 320
3 140
4+ 90

At the 1% significance level, does it appear that the distribution “number of pets” of students in the Eastern United States is different from the distribution for the United States student population as a whole? What is the p -value?
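If you want to check your work with software instead of a calculator, a minimal Python sketch of the test statistic for this exercise looks like this:

```python
# Observed counts from the Eastern U.S. sample of 1,000 students
observed = [210, 240, 320, 140, 90]
# Expected counts from the national percentages: 18%, 25%, 30%, 18%, 9%
expected = [p * 1000 for p in (0.18, 0.25, 0.30, 0.18, 0.09)]

# Goodness-of-fit statistic: sum of (O - E)^2 / E
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1  # 5 categories, so df = 4

print(round(chi_sq, 2), df)  # 15.62 4
```

The corresponding p-value, χ2cdf(15.62, 1E99, 4), is about 0.0036, which you can compare against α = 0.01.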

Suppose you flip two coins 100 times. The results are 20 HH , 27 HT , 30 TH , and 23 TT . Are the coins fair? Test at a 5% significance level.

This problem can be set up as a goodness-of-fit problem. The sample space for flipping two fair coins is { HH , HT , TH , TT }. Out of 100 flips, you would expect 25 HH , 25 HT , 25 TH , and 25 TT . This is the expected distribution. The question, “Are the coins fair?” is the same as saying, “Does the distribution of the coins (20 HH , 27 HT , 30 TH , 23 TT ) fit the expected distribution?”

Random Variable: Let X = the number of heads in one flip of the two coins. X takes on the values 0, 1, 2. (There are 0, 1, or 2 heads in the flip of two coins.) Therefore, the number of cells is three . Since X = the number of heads, the observed frequencies are 20 (for two heads), 57 (for one head), and 23 (for zero heads or both tails). The expected frequencies are 25 (for two heads), 50 (for one head), and 25 (for zero heads or both tails). This test is right-tailed.

H 0 : The coins are fair.

H a : The coins are not fair.

Distribution for the test:  [latex]\chi^2_2[/latex] where df = 3 – 1 = 2.

Calculate the test statistic: χ 2 = 2.14

This is a nonsymmetrical chi-square curve with values of 0 and 2.14 labeled on the horizontal axis. A vertical upward line extends from 2.14 to the curve and the region to the right of this line is shaded. The shaded area is equal to the p-value.

Probability statement: p -value = P ( χ 2 > 2.14) = 0.3430

Compare α and the p -value: α = 0.05 and p -value = 0.3430, so α < p -value.

Make a decision: Since α < p -value, do not reject H 0 .

Conclusion: There is insufficient evidence to conclude that the coins are not fair.

Press STAT and ENTER . Make sure you clear lists L1 , L2 , and L3 if they have data in them. Into L1 , put the observed frequencies 20 , 57 , 23 . Into L2 , put the expected frequencies 25 , 50 , 25 . Arrow over to list L3 and up to the name area "L3" . Enter (L1-L2)^2/L2 and ENTER . Press 2nd QUIT . Press 2nd LIST and arrow over to MATH . Press 5 . You should see "sum" . Enter L3 . Rounded to two decimal places, you should see 2.14 . Press 2nd DISTR . Arrow down to 7:χ2cdf (or press 7 ). Press ENTER . Enter (2.14,1E99,2) . Rounded to four places, you should see .3430 , which is the p-value.
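The coin example is small enough to verify with a few lines of Python. With df = 2 the chi-square tail probability reduces to exp(-x/2), so the standard library suffices (a sketch, not calculator output):

```python
import math

observed = [20, 57, 23]   # two heads, one head, zero heads
expected = [25, 50, 25]   # fair-coin expectations out of 100 flips

# Goodness-of-fit statistic: sum of (O - E)^2 / E
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# With df = 2, P(chi-square > x) = exp(-x/2)
p_value = math.exp(-chi_sq / 2)

print(round(chi_sq, 2), round(p_value, 4))  # 2.14 0.343
```

Since 0.343 is well above α = 0.05, the data give no reason to doubt that the coins are fair.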

Students in a social studies class hypothesize that the literacy rates across the world for every region are 82%. This table shows the actual literacy rates across the world broken down by region. What are the test statistic and the degrees of freedom?

MDG Region Adult Literacy Rate (%)
Developed Regions 99.0
Commonwealth of Independent States 99.5
Northern Africa 67.3
Sub-Saharan Africa 62.5
Latin America and the Caribbean 91.0
Eastern Asia 93.8
Southern Asia 61.9
South-Eastern Asia 91.9
Western Asia 84.5
Oceania 66.4

χ 2 test statistic = 26.38; degrees of freedom: df = 10 – 1 = 9

This is a nonsymmetric chi-square curve with df = 9. The values 0, 9, and 26.38 are labeled on the horizontal axis. The value 9 coincides with the peak of the curve. A vertical upward line extends from 26.38 to the curve, and the region to the right of this line is shaded. The shaded area is equal to the p-value.

Press STAT and ENTER . Make sure you clear lists L1, L2, and L3 if they have data in them. Into L1, put the observed frequencies 99, 99.5, 67.3, 62.5, 91, 93.8, 61.9, 91.9, 84.5, 66.4 . Into L2 , put the expected frequencies 82, 82, 82, 82, 82, 82, 82, 82, 82, 82 . Arrow over to list L3 and up to the name area L3 . Enter (L1-L2)^2/L2 and ENTER . Press 2nd QUIT . Press 2nd LIST and arrow over to MATH . Press 5 . You should see "sum" . Enter L3 . Rounded to two decimal places, you should see 26.38 . Press 2nd DISTR . Arrow down to 7:χ2cdf (or press 7 ). Press ENTER . Enter (26.38,1E99,9) . Rounded to four places, you should see .0018 , which is the p -value.
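A short Python sketch of the same arithmetic (note that the "observed" values here are the literacy rates themselves, as in the problem statement):

```python
# Adult literacy rates (%) by MDG region, treated as the observed values
observed = [99.0, 99.5, 67.3, 62.5, 91.0, 93.8, 61.9, 91.9, 84.5, 66.4]
# Hypothesized rate of 82% in every region
expected = [82.0] * len(observed)

# Goodness-of-fit statistic: sum of (O - E)^2 / E
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1  # 10 regions, so df = 9

print(round(chi_sq, 2), df)  # 26.38 9
```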

  • Introductory Statistics . Authored by : Barbara Illowski, Susan Dean. Provided by : Open Stax. Located at : http://cnx.org/contents/[email protected] . License : CC BY: Attribution . License Terms : Download for free at http://cnx.org/contents/[email protected]
  • Pearson's chi square test (goodness of fit) | Probability and Statistics | Khan Academy . Authored by : Khan Academy. Located at : https://www.youtube.com/embed/2QeDRsxSF9M . License : All Rights Reserved . License Terms : Standard YouTube License

Statistics By Jim

Making statistics intuitive

Goodness of Fit: Definition & Tests

By Jim Frost

What is Goodness of Fit?

Goodness of fit evaluates how well observed data align with the expected values from a statistical model.


When diving into statistics , you’ll often ask, “How well does my model fit the data?” A tight fit? Your model’s excellent. A loose fit? Maybe reconsider that model. That’s the essence of goodness of fit. More specifically:

  • A high goodness of fit indicates the observed values are close to the model’s expected values.
  • A low goodness of fit shows the observed values are relatively far from the expected values.

A model that fits the data well provides accurate predictions and deeper insights, while a poor fit can lead to misleading conclusions and predictions. Ensuring a good fit is crucial for reliable outcomes and informed actions.

A goodness of fit measure summarizes the size of the differences between the observed data and the model’s expected values. A goodness of fit test determines whether the differences are statistically significant. Moreover, they can guide us in choosing a model offering better representation. The appropriate goodness of fit measure and test depend on the setting.

In this blog post, you’ll learn about the essence of goodness of fit in the crucial contexts of regression models and probability distributions. We’ll measure it in regression models and learn how to test sample data against distributions using goodness of fit tests.

Goodness of Fit in Regression Models

In regression models, understanding the goodness of fit is crucial to ensure accurate predictions and meaningful insights; here, we’ll delve into key metrics that reveal this alignment with the data.

A regression model fits the data well when the differences between the observed and predicted values are small and unbiased. Statisticians refer to these differences as residuals .


As the goodness of fit increases, the data points move closer to the model’s fitted line.

R-squared (R²)

R-squared is a goodness of fit statistic for linear regression models. It measures the percentage of the dependent variable’s variation that the model explains, using a convenient 0 – 100% scale.

R-squared evaluates the spread of the data around the fitted regression line. For a data set, higher R-squared values indicate smaller differences between the sample data and the fitted values.

[Graph: two regression models, one with a low R-squared and one with a high R-squared.]

The model with the wider spread has an R-squared of 15% while the one with the narrower spread is 85%.

Think of R² as the percentage that explains the variation. Higher R²? Better fit.

  • High R²: Your model captures a lot of variation.
  • Low R²: The model doesn’t explain much of the variance.

Remember, it’s not the sole indicator. High R² doesn’t always mean a perfect model!

Learn more about How to Interpret R-squared and Independent vs Dependent Variables .
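To make the idea concrete, here is a minimal Python sketch that computes R² for a simple least-squares line. The x and y values are made up for illustration:

```python
# Hypothetical data (made up for illustration)
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Ordinary least-squares slope and intercept for a simple linear fit
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

fitted = [intercept + slope * xi for xi in x]

# R^2 = 1 - SS_residual / SS_total: the share of y's variation explained
ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
ss_tot = sum((yi - mean_y) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot

print(round(r_squared, 3))  # 0.998
```

Here the residuals are tiny relative to the total variation, so R² is close to 100%, a tight fit.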

Standard Error of the Regression (S)

The standard error of the regression is a goodness of fit measure that provides the typical size of the absolute difference between observed and predicted values. S uses the units of the dependent variable (DV).

  • Small S: Predictions are close to the data values.
  • Large S: Predictions deviate more.

Suppose your model uses body mass index (BMI) to predict the body fat percentage (the DV). Consequently, if your model’s S is 3.5, then you know that its predicted values are typically 3.5% from the observed body fat percentage values.

However, don’t view it in isolation. Compare it with the dependent variable’s units for context.

Learn more about the Standard Error of the Regression .

Akaike’s Information Criterion (AIC)

Akaike’s Information Criterion is a goodness of fit measure that statisticians designed to compare models and help you pick the best one. The AIC value isn’t meaningful itself, but you’re looking for the model with the lowest AIC.

  • Lower AIC: Your model is probably better (when comparing).
  • Adjusts for complexity: Simpler models are preferred when they fit well.

Learn why you want a simpler model, which statisticians refer to as a parsimonious model: What is a Parsimonious Model? Benefits & Selecting .

There are other indicators, like Adjusted R² and BIC. Each has its unique strength. But for a start, focus on these three.

Goodness of Fit for Probability Distributions

Sometimes, your statistical model is that your data follow a particular probability distribution, such as the normal , lognormal , Poisson , or some other distribution. You want to know if your sample’s distribution is consistent with the hypothesized distribution. Learn more about Probability Distributions .

Why does this matter? Many statistical tests and methods rest on distributional assumptions.

For instance, t-tests and ANOVA assume your data are normal. Conversely, you might expect a Poisson distribution if you’re analyzing the number of daily website visits. Capability analysis in the quality arena depends on knowing precisely which distribution your data follow.

Enter goodness of fit tests.

A goodness of fit test determines whether the differences between your sample data and the distribution are statistically significant. In this context, statistical significance indicates the model does not adequately fit the data. The test results can guide the analytical procedures you’ll use.

I’ll cover two of the many available goodness of fit tests. The Anderson-Darling test works for continuous data , and the chi-square goodness of fit test is for categorical and discrete data.

Anderson-Darling Test

The Anderson-Darling goodness of fit test compares continuous sample data to a particular probability distribution. Statisticians often use it for normality tests, but the Anderson-Darling Test can also assess other probability distributions, making it versatile in statistical analysis.

The hypotheses for the Anderson-Darling test are the following:

  • Null Hypothesis (H₀) : The data follow the specified distribution.
  • Alternative Hypothesis (H A ) : The data do not follow the distribution.

When the p-value is less than your significance level , reject the null hypothesis . Consequently, statistically significant results for a goodness of fit test suggest your data do not fit the chosen distribution, prompting further investigation or model adjustments.

Imagine you’re researching the body fat percentages of pre-teen girls, and you want to know if these percentages follow a normal distribution. You can download the CSV data file: body_fat .

After collecting body fat data from 92 girls, you perform the Anderson-Darling Test and obtain the following results.

[Image: statistical results for the normality goodness of fit test.]

Because the p-value is less than 0.05, reject the null hypothesis and conclude the sample data do not follow a normal distribution.

Learn how to identify the distribution of this bodyfat dataset using the Anderson-Darling goodness of fit test.

Chi-squared Goodness of Fit Test

The chi square goodness of fit test reveals if the proportions of a discrete or categorical variable follow a distribution with hypothesized proportions.

Statisticians often use the chi square goodness of fit test to evaluate if the proportions of categorical outcomes are all equal. Or the analyst can list the proportions to use in the test. Alternatively, this test can determine if the observed outcomes fit a discrete probability distribution, like the Poisson distribution.

This goodness of fit test does the following:

  • Calculates deviations: Uses the squared difference between observed and expected.
  • P-value < 0.05: Observed and expected frequencies don’t match.

Imagine you’re curious about dice fairness. You roll a six-sided die 600 times, expecting each face to come up 100 times if it’s fair.

The observed counts are 90, 110, 95, 105, 95, and 105 for sides 1 through 6. The observed values don’t match the expected values of 100 for each die face. Let’s run the Chi-square goodness of fit test for these data to see if those differences are statistically significant.

[Image: statistical results for the chi-squared goodness of fit test.]

The p-value of 0.700 is greater than 0.05, so you fail to reject the null hypothesis . The observed frequencies don’t differ significantly from the expected frequencies. Your sample data do not support the claim that the die is unfair!
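A sketch of the die calculation in Python (standard library only); the p-value of 0.700 reported above comes from the chi-square distribution with df = 5:

```python
observed = [90, 110, 95, 105, 95, 105]  # counts for sides 1 through 6
expected = [100] * 6                    # 600 rolls of a fair six-sided die

# Goodness-of-fit statistic: sum of (O - E)^2 / E
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

# The critical value for alpha = 0.05 with df = 5 is 11.07, so a
# statistic of 3.0 falls far short of significance.
print(chi_sq, df)  # 3.0 5
```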

To explore other examples of the chi square test in action, read the following:

  • Chi-Square Goodness of Fit Test: Uses & Example
  • How the Chi-Square Test of Independence Works

Goodness of fit tells the story of your data and its relationship with a model. It’s like a quality check. For regression, R², S, and AIC are great starters. For probability distributions, the Anderson-Darling and Chi-squared goodness of fit tests are go-tos. Dive in, fit that model, and let the data guide you!


Reader Interactions


October 23, 2023 at 11:02 am

Jim, I have a pricing curve model to estimate the curvature of per unit cost (decrease) as purchased quantity increases. It follows the power law Y=Ax^B.

In my related log-log linear regression, the average residual is $0.00, which makes sense because we kept the Y-intercept in the model. However, in the transformed model in natural units, the residuals no longer average $0.00. Why does that property not carry over to the Y=Ax^B form of regression?

As a side note, I have your book “Regression Analysis,” which I have read several times and learned quite a lot. I believe there are two similar errors in Chapter 13, not related to my question above.

On page 323, when transforming the fitted line in log units back to natural units, the coefficient A in Y=Ax^B should be the common antilog of 0.5758 or 3.7653. Similarly, on page 325, the coefficient A should be the common antilog of 1.879 or 75.6833. This can be visually checked for reasonableness by looking at the graph on page 325. If we look at the x-axis, say at x=1, it appears y should be slightly less than 100. If we evaluate the power expression Y=75.6833x^(-0.6383), the fitted value is 75.68, which seems to be what the graph predicts.

The relevant logarithmic identity is log(ab) = log(a) + log(b). The Y-intercept in the log-log linear model is necessarily in log units, not natural units.


October 23, 2023 at 2:29 pm

Those are good questions.

I’m not exactly sure what is happening in your model but here are my top two possibilities.

When you transform data, especially using non-linear transformations like logarithms, the relationship between the variables can change. In the log-log linear regression, the relationship is linear, and the residuals (differences between observed and predicted values) average out to $0.00. However, when you transform back to the natural units using an exponential function, the relationship becomes non-linear. This non-linearity can cause the residuals to no longer average out to $0.00.

When you re-express the log-log model to its natural units form, there might be some approximation or rounding errors. These errors can accumulate and affect the average of the residuals.

As for the output in the book, that was all calculated by statistical software that I trust (Minitab). I’ll have to look deeper into what is going on, but I trust the results.


Hypothesis Testing - Chi Squared Test

Lisa Sullivan, PhD

Professor of Biostatistics

Boston University School of Public Health


Introduction

This module will continue the discussion of hypothesis testing, where a specific statement or hypothesis is generated about a population parameter, and sample statistics are used to assess the likelihood that the hypothesis is true. The hypothesis is based on available information and the investigator's belief about the population parameters. The specific tests considered here are called chi-square tests and are appropriate when the outcome is discrete (dichotomous, ordinal or categorical). For example, in some clinical trials the outcome is a classification such as hypertensive, pre-hypertensive or normotensive. We could use the same classification in an observational study such as the Framingham Heart Study to compare men and women in terms of their blood pressure status - again using the classification of hypertensive, pre-hypertensive or normotensive status.  

The technique to analyze a discrete outcome uses what is called a chi-square test. Specifically, the test statistic follows a chi-square probability distribution. We will consider chi-square tests here with one, two and more than two independent comparison groups.

Learning Objectives

After completing this module, the student will be able to:

  • Perform chi-square tests by hand
  • Appropriately interpret results of chi-square tests
  • Identify the appropriate hypothesis testing procedure based on type of outcome variable and number of samples

Tests with One Sample, Discrete Outcome

Here we consider hypothesis testing with a discrete outcome variable in a single population. Discrete variables are variables that take on more than two distinct responses or categories and the responses can be ordered or unordered (i.e., the outcome can be ordinal or categorical). The procedure we describe here can be used for dichotomous (exactly 2 response options), ordinal or categorical discrete outcomes and the objective is to compare the distribution of responses, or the proportions of participants in each response category, to a known distribution. The known distribution is derived from another study or report and it is again important in setting up the hypotheses that the comparator distribution specified in the null hypothesis is a fair comparison. The comparator is sometimes called an external or a historical control.   

In one sample tests for a discrete outcome, we set up our hypotheses against an appropriate comparator. We select a sample and compute descriptive statistics on the sample data. Specifically, we compute the sample size (n) and the proportions of participants in each response category.

Test Statistic for Testing H 0 : p 1 = p 10 , p 2 = p 20 , ..., p k = p k0

The test statistic for the goodness-of-fit test is [latex]\displaystyle\chi^2 = \sum\frac{(O-E)^2}{E}[/latex], where O = observed frequency and E = expected frequency in each of the response categories, and the sum is taken over the k response categories. We find the critical value in a table of probabilities for the chi-square distribution with degrees of freedom (df) = k-1. The observed frequencies are those observed in the sample and the expected frequencies are computed as described below. χ 2 (chi-square) is another probability distribution and ranges from 0 to ∞. The test statistic formula above is appropriate for large samples, defined as expected frequencies of at least 5 in each of the response categories.

When we conduct a χ 2 test, we compare the observed frequencies in each response category to the frequencies we would expect if the null hypothesis were true. These expected frequencies are determined by allocating the sample to the response categories according to the distribution specified in H 0 . This is done by multiplying the observed sample size (n) by the proportions specified in the null hypothesis (p 10 , p 20 , ..., p k0 ). To ensure that the sample size is appropriate for the use of the test statistic above, we need to ensure that the following: min(np 10 , n p 20 , ..., n p k0 ) > 5.  

The test of hypothesis with a discrete outcome measured in a single sample, where the goal is to assess whether the distribution of responses follows a known distribution, is called the χ 2 goodness-of-fit test. As the name indicates, the idea is to assess whether the pattern or distribution of responses in the sample "fits" a specified population (external or historical) distribution. In the next example we illustrate the test. As we work through the example, we provide additional details related to the use of this new test statistic.  

A University conducted a survey of its recent graduates to collect demographic and health information for future planning purposes as well as to assess students' satisfaction with their undergraduate experiences. The survey revealed that a substantial proportion of students were not engaging in regular exercise, many felt their nutrition was poor and a substantial number were smoking. In response to a question on regular exercise, 60% of all graduates reported getting no regular exercise, 25% reported exercising sporadically and 15% reported exercising regularly as undergraduates. The next year the University launched a health promotion campaign on campus in an attempt to increase health behaviors among undergraduates. The program included modules on exercise, nutrition and smoking cessation. To evaluate the impact of the program, the University again surveyed graduates and asked the same questions. The survey was completed by 470 graduates and the following data were collected on the exercise question:

 

Number of Students

255

125

90

470

Based on the data, is there evidence of a shift in the distribution of responses to the exercise question following the implementation of the health promotion campaign on campus? Run the test at a 5% level of significance.

In this example, we have one sample and a discrete (ordinal) outcome variable (with three response options). We specifically want to compare the distribution of responses in the sample to the distribution reported the previous year (i.e., 60%, 25%, 15% reporting no, sporadic and regular exercise, respectively). We now run the test using the five-step approach.  

  • Step 1. Set up hypotheses and determine level of significance.

The null hypothesis again represents the "no change" or "no difference" situation. If the health promotion campaign has no impact then we expect the distribution of responses to the exercise question to be the same as that measured prior to the implementation of the program.

H 0 : p 1 =0.60, p 2 =0.25, p 3 =0.15,  or equivalently H 0 : Distribution of responses is 0.60, 0.25, 0.15  

H 1 :   H 0 is false.          α =0.05

Notice that the research hypothesis is written in words rather than in symbols. The research hypothesis as stated captures any difference in the distribution of responses from that specified in the null hypothesis. We do not specify a specific alternative distribution, instead we are testing whether the sample data "fit" the distribution in H 0 or not. With the χ 2 goodness-of-fit test there is no upper or lower tailed version of the test.

  • Step 2. Select the appropriate test statistic.  

The test statistic is:

[latex]\displaystyle\chi^2 = \sum\frac{(O-E)^2}{E}[/latex]

We must first assess whether the sample size is adequate. Specifically, we need to check min(np 10 , np 20 , ..., np k0 ) > 5. The sample size here is n=470 and the proportions specified in the null hypothesis are 0.60, 0.25 and 0.15. Thus, min(470(0.60), 470(0.25), 470(0.15)) = min(282, 117.5, 70.5) = 70.5. The sample size is more than adequate so the formula can be used.

  • Step 3. Set up decision rule.  

The decision rule for the χ 2 test depends on the level of significance and the degrees of freedom, defined as degrees of freedom (df) = k-1 (where k is the number of response categories). If the null hypothesis is true, the observed and expected frequencies will be close in value and the χ 2 statistic will be close to zero. If the null hypothesis is false, then the χ 2 statistic will be large. Critical values can be found in a table of probabilities for the χ 2 distribution. Here we have df=k-1=3-1=2 and a 5% level of significance. The appropriate critical value is 5.99, and the decision rule is as follows: Reject H 0 if χ 2 > 5.99.

  • Step 4. Compute the test statistic.  

We now compute the expected frequencies using the sample size and the proportions specified in the null hypothesis. We then substitute the sample data (observed frequencies) and the expected frequencies into the formula for the test statistic identified in Step 2. The computations can be organized as follows.

Exercise Category Observed Frequency (O) Expected Frequency (E)
No Regular Exercise 255 470(0.60) = 282
Sporadic Exercise 125 470(0.25) = 117.5
Regular Exercise 90 470(0.15) = 70.5
Total 470 470

Notice that the expected frequencies are taken to one decimal place and that the sum of the observed frequencies is equal to the sum of the expected frequencies. The test statistic is computed as follows:

[latex]\displaystyle\chi^2 = \frac{(255-282)^2}{282} + \frac{(125-117.5)^2}{117.5} + \frac{(90-70.5)^2}{70.5} = 2.59 + 0.48 + 5.39 = 8.46[/latex]

  • Step 5. Conclusion.  

We reject H 0 because 8.46 > 5.99. We have statistically significant evidence at α=0.05 to show that H 0 is false, or that the distribution of responses is not 0.60, 0.25, 0.15.  The p-value is p < 0.005.  

In the χ 2 goodness-of-fit test, we conclude that either the distribution specified in H 0 is false (when we reject H 0 ) or that we do not have sufficient evidence to show that the distribution specified in H 0 is false (when we fail to reject H 0 ). Here, we reject H 0 and concluded that the distribution of responses to the exercise question following the implementation of the health promotion campaign was not the same as the distribution prior. The test itself does not provide details of how the distribution has shifted. A comparison of the observed and expected frequencies will provide some insight into the shift (when the null hypothesis is rejected). Does it appear that the health promotion campaign was effective?  
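The per-category contributions computed in Step 4 can be double-checked with a short Python sketch:

```python
observed = [255, 125, 90]         # no, sporadic, regular exercise
null_props = [0.60, 0.25, 0.15]   # distribution from the prior year
n = sum(observed)                 # 470 graduates

expected = [p * n for p in null_props]  # 282, 117.5, 70.5

# Per-category contributions (O - E)^2 / E, then their sum
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_sq = sum(contributions)

print([round(c, 2) for c in contributions])  # [2.59, 0.48, 5.39]
print(round(chi_sq, 2))                      # 8.46
```

The "Regular Exercise" category contributes the most (5.39) to the statistic, which points to where the distribution shifted.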

Consider the following: 

 

Exercise Category Observed Frequency Expected Frequency
No Regular Exercise 255 282
Sporadic Exercise 125 117.5
Regular Exercise 90 70.5
Total 470 470

If the null hypothesis were true (i.e., no change from the prior year) we would have expected more students to fall in the "No Regular Exercise" category and fewer in the "Regular Exercise" categories. In the sample, 255/470 = 54% reported no regular exercise and 90/470=19% reported regular exercise. Thus, there is a shift toward more regular exercise following the implementation of the health promotion campaign. There is evidence of a statistical difference, is this a meaningful difference? Is there room for improvement?

The National Center for Health Statistics (NCHS) provided data on the distribution of weight (in categories) among Americans in 2002. The distribution was based on specific values of body mass index (BMI) computed as weight in kilograms over height in meters squared. Underweight was defined as BMI< 18.5, Normal weight as BMI between 18.5 and 24.9, overweight as BMI between 25 and 29.9 and obese as BMI of 30 or greater. Americans in 2002 were distributed as follows: 2% Underweight, 39% Normal Weight, 36% Overweight, and 23% Obese. Suppose we want to assess whether the distribution of BMI is different in the Framingham Offspring sample. Using data from the n=3,326 participants who attended the seventh examination of the Offspring in the Framingham Heart Study we created the BMI categories as defined and observed the following:

 

BMI Category Observed Frequency
Underweight (BMI < 18.5) 20
Normal Weight (BMI 18.5–24.9) 932
Overweight (BMI 25–29.9) 1374
Obese (BMI ≥ 30) 1000
Total 3326

  • Step 1.  Set up hypotheses and determine level of significance.

H 0 : p 1 =0.02, p 2 =0.39, p 3 =0.36, p 4 =0.23     or equivalently

H 0 : Distribution of responses is 0.02, 0.39, 0.36, 0.23

H 1 :   H 0 is false.        α=0.05

The formula for the test statistic is:

[latex]\displaystyle\chi^2 = \sum\frac{(O-E)^2}{E}[/latex]

We must assess whether the sample size is adequate. Specifically, we need to check min(np 0 , np 1, ..., n p k ) > 5. The sample size here is n=3,326 and the proportions specified in the null hypothesis are 0.02, 0.39, 0.36 and 0.23. Thus, min( 3326(0.02), 3326(0.39), 3326(0.36), 3326(0.23))=min(66.5, 1297.1, 1197.4, 765.0)=66.5. The sample size is more than adequate, so the formula can be used.

Here we have df=k-1=4-1=3 and a 5% level of significance. The appropriate critical value is 7.81 and the decision rule is as follows: Reject H 0 if χ 2 > 7.81.

We now compute the expected frequencies using the sample size and the proportions specified in the null hypothesis. We then substitute the sample data (observed frequencies) into the formula for the test statistic identified in Step 2. We organize the computations in the following table.

 

BMI Category Observed Frequency (O) Expected Frequency (E)
Underweight 20 3326(0.02) = 66.5
Normal Weight 932 3326(0.39) = 1297.1
Overweight 1374 3326(0.36) = 1197.4
Obese 1000 3326(0.23) = 765.0
Total 3326 3326

The test statistic is computed as follows:

[latex]\displaystyle\chi^2 = \frac{(20-66.5)^2}{66.5} + \frac{(932-1297.1)^2}{1297.1} + \frac{(1374-1197.4)^2}{1197.4} + \frac{(1000-765.0)^2}{765.0} = 32.52 + 102.77 + 26.05 + 72.19 = 233.53[/latex]

We reject H 0 because 233.53 > 7.81. We have statistically significant evidence at α=0.05 to show that H 0 is false or that the distribution of BMI in Framingham is different from the national data reported in 2002, p < 0.005.  

Again, the χ 2   goodness-of-fit test allows us to assess whether the distribution of responses "fits" a specified distribution. Here we show that the distribution of BMI in the Framingham Offspring Study is different from the national distribution. To understand the nature of the difference we can compare observed and expected frequencies or observed and expected proportions (or percentages). The frequencies are large because of the large sample size; the observed percentages of participants in the Framingham sample are as follows: 0.6% underweight, 28% normal weight, 41% overweight and 30% obese. In the Framingham Offspring sample there are higher percentages of overweight and obese persons (41% and 30% in Framingham as compared to 36% and 23% in the national data), and lower proportions of underweight and normal weight persons (0.6% and 28% in Framingham as compared to 2% and 39% in the national data). Are these meaningful differences?
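The BMI computation can be verified the same way; here the per-category contributions are rounded to two decimal places before summing, matching the by-hand arithmetic shown above:

```python
observed = [20, 932, 1374, 1000]            # under, normal, over, obese
expected = [66.5, 1297.1, 1197.4, 765.0]    # 3326 times 0.02, 0.39, 0.36, 0.23

# Per-category contributions, rounded to two decimals as in the text
contributions = [round((o - e) ** 2 / e, 2) for o, e in zip(observed, expected)]
chi_sq = sum(contributions)

print(contributions)     # [32.52, 102.77, 26.05, 72.19]
print(round(chi_sq, 2))  # 233.53
```

The statistic of 233.53 is far beyond the critical value of 7.81 for df = 3, consistent with the conclusion above.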

In the module on hypothesis testing for means and proportions, we discussed hypothesis testing applications with a dichotomous outcome variable in a single population. We presented a test using a test statistic Z to test whether an observed (sample) proportion differed significantly from a historical or external comparator. The chi-square goodness-of-fit test can also be used with a dichotomous outcome and the results are mathematically equivalent.  

In the prior module, we considered the following example. Here we show the equivalence to the chi-square goodness-of-fit test.

The NCHS report indicated that in 2002, 75% of children aged 2 to 17 saw a dentist in the past year. An investigator wants to assess whether use of dental services is similar in children living in the city of Boston. A sample of 125 children aged 2 to 17 living in Boston are surveyed and 64 reported seeing a dentist over the past 12 months. Is there a significant difference in use of dental services between children living in Boston and the national data?

We presented the following approach to the test using a Z statistic. 

  • Step 1. Set up hypotheses and determine level of significance

H 0 : p = 0.75

H 1 : p ≠ 0.75                               α=0.05

We must first check that the sample size is adequate. Specifically, we need to check min(np 0 , n(1-p 0 )) = min( 125(0.75), 125(1-0.75) ) = min(93.75, 31.25) = 31.25. The sample size is more than adequate, so the following formula can be used:

Z = (p̂ - p 0 ) / sqrt( p 0 (1 - p 0 )/n )

This is a two-tailed test, using a Z statistic and a 5% level of significance. Reject H 0 if Z < -1.960 or if Z > 1.960.

We now substitute the sample data into the formula for the test statistic identified in Step 2. The sample proportion is:

p̂ = 64/125 = 0.512

and the test statistic is:

Z = (0.512 - 0.75) / sqrt( 0.75(1 - 0.75)/125 ) = -0.238/0.0387 = -6.15

We reject H 0 because -6.15 < -1.960. We have statistically significant evidence at α=0.05 to show that there is a statistically significant difference in the use of dental services by children living in Boston as compared to the national data (p < 0.0001).

We now conduct the same test using the chi-square goodness-of-fit test. First, we summarize our sample data as follows:

 

                    Saw a Dentist         Did Not See a Dentist     Total
                    in Past 12 Months     in Past 12 Months
# of Participants   64                    61                        125

H 0 : p 1 =0.75, p 2 =0.25     or equivalently H 0 : Distribution of responses is 0.75, 0.25 

We must assess whether the sample size is adequate. Specifically, we need to check min(np 1 , np 2 , ..., np k ) ≥ 5. The sample size here is n=125 and the proportions specified in the null hypothesis are 0.75 and 0.25. Thus, min( 125(0.75), 125(0.25) ) = min(93.75, 31.25) = 31.25. The sample size is more than adequate so the formula can be used.

Here we have df=k-1=2-1=1 and a 5% level of significance. The appropriate critical value is 3.84, and the decision rule is as follows: Reject H 0 if χ 2 > 3.84. (Note that 1.96 2 = 3.84, where 1.96 was the critical value used in the Z test for proportions shown above.)

 

                       Saw a Dentist         Did Not See a Dentist     Total
Observed Frequencies   64                    61                        125
Expected Frequencies   93.75                 31.25                     125

The test statistic is computed as follows:

χ 2 = (64-93.75) 2 /93.75 + (61-31.25) 2 /31.25 = 9.44 + 28.32 = 37.8

(Note that (-6.15) 2 = 37.8, where -6.15 was the value of the Z statistic in the test for proportions shown above.)

We reject H 0 because 37.8 > 3.84. We have statistically significant evidence at α=0.05 to show that there is a statistically significant difference in the use of dental service by children living in Boston as compared to the national data.  (p < 0.0001). This is the same conclusion we reached when we conducted the test using the Z test above. With a dichotomous outcome, Z 2 = χ 2 !   In statistics, there are often several approaches that can be used to test hypotheses. 
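The equivalence Z 2 = χ 2 can be verified numerically with the dental-visit data (a minimal sketch):

```python
from math import sqrt

# Dental-visit example: n=125 children, 64 saw a dentist; H0: p = 0.75.
n, successes, p0 = 125, 64, 0.75
p_hat = successes / n                        # 0.512

# Z test for one proportion
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # ≈ -6.15

# Chi-square goodness-of-fit test on the same data (two categories)
observed = [successes, n - successes]        # [64, 61]
expected = [n * p0, n * (1 - p0)]            # [93.75, 31.25]
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# With a dichotomous outcome, Z squared equals the chi-square statistic exactly.
print(round(z, 2), round(chi_sq, 2), round(z ** 2, 2))
```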

Tests for Two or More Independent Samples, Discrete Outcome

Here we extend that application of the chi-square test to the case with two or more independent comparison groups. Specifically, the outcome of interest is discrete with two or more responses and the responses can be ordered or unordered (i.e., the outcome can be dichotomous, ordinal or categorical). We now consider the situation where there are two or more independent comparison groups and the goal of the analysis is to compare the distribution of responses to the discrete outcome variable among several independent comparison groups.  

The test is called the χ 2 test of independence and the null hypothesis is that there is no difference in the distribution of responses to the outcome across comparison groups. This is often stated as follows: The outcome variable and the grouping variable (e.g., the comparison treatments or comparison groups) are independent (hence the name of the test). Independence here implies homogeneity in the distribution of the outcome among comparison groups.    

The null hypothesis in the χ 2 test of independence is often stated in words as: H 0 : The distribution of the outcome is independent of the groups. The alternative or research hypothesis is that there is a difference in the distribution of responses to the outcome variable among the comparison groups (i.e., that the distribution of responses "depends" on the group). In order to test the hypothesis, we measure the discrete outcome variable in each participant in each comparison group. The data of interest are the observed frequencies (or number of participants in each response category in each group). The formula for the test statistic for the χ 2 test of independence is given below.

Test Statistic for Testing H 0 : Distribution of outcome is independent of groups

χ 2 = Σ (O-E) 2 /E

and we find the critical value in a table of probabilities for the chi-square distribution with df=(r-1)*(c-1).

Here O = observed frequency, E=expected frequency in each of the response categories in each group, r = the number of rows in the two-way table and c = the number of columns in the two-way table.   r and c correspond to the number of comparison groups and the number of response options in the outcome (see below for more details). The observed frequencies are the sample data and the expected frequencies are computed as described below. The test statistic is appropriate for large samples, defined as expected frequencies of at least 5 in each of the response categories in each group.  

The data for the χ 2 test of independence are organized in a two-way table. The outcome and grouping variable are shown in the rows and columns of the table. The sample table below illustrates the data layout. The table entries (blank below) are the numbers of participants in each group responding to each response category of the outcome variable.

Table - Possible outcomes are listed in the columns; the groups being compared are listed in the rows. The cells (blank here) hold the numbers of participants in each group giving each response.

            Response 1    Response 2    ...    Response c    Total
Group 1
Group 2
...
Group r
Total                                                        N

In the table above, the grouping variable is shown in the rows of the table; r denotes the number of independent groups. The outcome variable is shown in the columns of the table; c denotes the number of response options in the outcome variable. Each combination of a row (group) and column (response) is called a cell of the table. The table has r*c cells and is sometimes called an r x c ("r by c") table. For example, if there are 4 groups and 5 categories in the outcome variable, the data are organized in a 4 X 5 table. The row and column totals are shown along the right-hand margin and the bottom of the table, respectively. The total sample size, N, can be computed by summing the row totals or the column totals. Similar to ANOVA, N does not refer to a population size here but rather to the total sample size in the analysis. The sample data can be organized into a table like the above. The numbers of participants within each group who select each response option are shown in the cells of the table and these are the observed frequencies used in the test statistic.

The test statistic for the χ 2 test of independence involves comparing observed (sample data) and expected frequencies in each cell of the table. The expected frequencies are computed assuming that the null hypothesis is true. The null hypothesis states that the two variables (the grouping variable and the outcome) are independent. The definition of independence is as follows:

 Two events, A and B, are independent if P(A|B) = P(A), or equivalently, if P(A and B) = P(A) P(B).

The second statement indicates that if two events, A and B, are independent then the probability of their intersection can be computed by multiplying the probability of each individual event. To conduct the χ 2 test of independence, we need to compute expected frequencies in each cell of the table. Expected frequencies are computed by assuming that the grouping variable and outcome are independent (i.e., under the null hypothesis). Thus, if the null hypothesis is true, using the definition of independence:

P(Group 1 and Response Option 1) = P(Group 1) P(Response Option 1).

 The above states that the probability that an individual is in Group 1 and their outcome is Response Option 1 is computed by multiplying the probability that person is in Group 1 by the probability that a person is in Response Option 1. To conduct the χ 2 test of independence, we need expected frequencies and not expected probabilities . To convert the above probability to a frequency, we multiply by N. Consider the following small example.

 

          Response 1    Response 2    Response 3    Total
Group 1   10            8             7             25
Group 2   22            15            13            50
Group 3   30            28            17            75
Total     62            51            37            150

The data shown above are measured in a sample of size N=150. The frequencies in the cells of the table are the observed frequencies. If Group and Response are independent, then we can compute the probability that a person in the sample is in Group 1 and Response category 1 using:

P(Group 1 and Response 1) = P(Group 1) P(Response 1),

P(Group 1 and Response 1) = (25/150) (62/150) = 0.069.

Thus if Group and Response are independent we would expect 6.9% of the sample to be in the top left cell of the table (Group 1 and Response 1). The expected frequency is 150(0.069) = 10.4.   We could do the same for Group 2 and Response 1:

P(Group 2 and Response 1) = P(Group 2) P(Response 1),

P(Group 2 and Response 1) = (50/150) (62/150) = 0.138.

The expected frequency in Group 2 and Response 1 is 150(0.138) = 20.7.

Thus, the formula for determining the expected cell frequencies in the χ 2 test of independence is as follows:

Expected Cell Frequency = (Row Total * Column Total)/N.

The above computes the expected frequency in one step rather than computing the expected probability first and then converting to a frequency.  
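The one-step formula can be sketched in Python for the small table above (a minimal illustration; note that the unrounded expected counts, 10.3 and 20.7, agree with the two-step values up to rounding):

```python
# Expected cell frequencies under independence: (row total * column total) / N.
observed = [
    [10,  8,  7],   # Group 1
    [22, 15, 13],   # Group 2
    [30, 28, 17],   # Group 3
]
row_totals = [sum(row) for row in observed]          # [25, 50, 75]
col_totals = [sum(col) for col in zip(*observed)]    # [62, 51, 37]
n = sum(row_totals)                                  # 150

expected = [[r * c / n for c in col_totals] for r in row_totals]

# Group 1/Response 1 and Group 2/Response 1, as computed in the text.
print(round(expected[0][0], 1), round(expected[1][0], 1))
```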

In a prior example we evaluated data from a survey of university graduates which assessed, among other things, how frequently they exercised. The survey was completed by 470 graduates. In the prior example we used the χ 2 goodness-of-fit test to assess whether there was a shift in the distribution of responses to the exercise question following the implementation of a health promotion campaign on campus. We specifically considered one sample (all students) and compared the observed distribution to the distribution of responses the prior year (a historical control). Suppose we now wish to assess whether there is a relationship between exercise on campus and students' living arrangements. As part of the same survey, graduates were asked where they lived their senior year. The response options were dormitory, on-campus apartment, off-campus apartment, and at home (i.e., commuted to and from the university). The data are shown below.

 

                       No Regular    Sporadic    Regular     Total
                       Exercise      Exercise    Exercise
Dormitory              32            30          28          90
On-Campus Apartment    74            64          42          180
Off-Campus Apartment   110           25          15          150
At Home                39            6           5           50
Total                  255           125         90          470

Based on the data, is there a relationship between exercise and students' living arrangement? Do you think where a person lives affects their exercise status? Here we have four independent comparison groups (living arrangement) and a discrete (ordinal) outcome variable with three response options. We specifically want to test whether living arrangement and exercise are independent. We will run the test using the five-step approach.

H 0 : Living arrangement and exercise are independent

H 1 : H 0 is false.                α=0.05

The null and research hypotheses are written in words rather than in symbols. The research hypothesis is that the grouping variable (living arrangement) and the outcome variable (exercise) are dependent or related.   

  • Step 2.  Select the appropriate test statistic.  

The condition for appropriate use of the above test statistic is that each expected frequency is at least 5. In Step 4 we will compute the expected frequencies and we will ensure that the condition is met.

The decision rule depends on the level of significance and the degrees of freedom, defined as df = (r-1)(c-1), where r and c are the numbers of rows and columns in the two-way data table. The row variable is the living arrangement and there are 4 arrangements considered, thus r=4. The column variable is exercise and 3 responses are considered, thus c=3. For this test, df=(4-1)(3-1)=3(2)=6. Again, with χ 2 tests there are no upper, lower or two-tailed tests. If the null hypothesis is true, the observed and expected frequencies will be close in value and the χ 2 statistic will be close to zero. If the null hypothesis is false, then the χ 2 statistic will be large. The rejection region for the χ 2 test of independence is always in the upper (right-hand) tail of the distribution. For df=6 and a 5% level of significance, the appropriate critical value is 12.59 and the decision rule is as follows: Reject H 0 if χ 2 > 12.59.

We now compute the expected frequencies using the formula,

Expected Frequency = (Row Total * Column Total)/N.

The computations can be organized in a two-way table. The top number in each cell of the table is the observed frequency and the bottom number is the expected frequency.   The expected frequencies are shown in parentheses.

 

                       No Regular     Sporadic      Regular      Total
                       Exercise       Exercise      Exercise
Dormitory              32 (48.8)      30 (23.9)     28 (17.2)    90
On-Campus Apartment    74 (97.7)      64 (47.9)     42 (34.5)    180
Off-Campus Apartment   110 (81.4)     25 (39.9)     15 (28.7)    150
At Home                39 (27.1)      6 (13.3)      5 (9.6)      50
Total                  255            125           90           470

Notice that the expected frequencies are taken to one decimal place and that the sums of the observed frequencies are equal to the sums of the expected frequencies in each row and column of the table.  

Recall in Step 2 a condition for the appropriate use of the test statistic was that each expected frequency is at least 5. This is true for this sample (the smallest expected frequency is 9.6) and therefore it is appropriate to use the test statistic. The test statistic is computed as follows:

χ 2 = (32-48.8) 2 /48.8 + (30-23.9) 2 /23.9 + ... + (5-9.6) 2 /9.6 = 60.5

We reject H 0 because 60.5 > 12.59. We have statistically significant evidence at α=0.05 to show that H 0 is false or that living arrangement and exercise are not independent (i.e., they are dependent or related), p < 0.005.

Again, the χ 2 test of independence is used to test whether the distribution of the outcome variable is similar across the comparison groups. Here we rejected H 0 and concluded that the distribution of exercise is not independent of living arrangement, or that there is a relationship between living arrangement and exercise. The test provides an overall assessment of statistical significance. When the null hypothesis is rejected, it is important to review the sample data to understand the nature of the relationship. Consider again the sample data. 

Because there are different numbers of students in each living situation, it is difficult to compare exercise patterns on the basis of frequencies alone. The following table displays the percentages of students in each exercise category by living arrangement. The percentages sum to 100% in each row of the table. For comparison purposes, percentages are also shown for the total sample along the bottom row of the table.

                       No Regular    Sporadic    Regular
                       Exercise      Exercise    Exercise
Dormitory              36%           33%         31%
On-Campus Apartment    41%           36%         23%
Off-Campus Apartment   73%           17%         10%
At Home                78%           12%         10%
Total                  54%           27%         19%

From the above, it is clear that higher percentages of students living in dormitories and in on-campus apartments reported regular exercise (31% and 23%) as compared to students living in off-campus apartments and at home (10% each).  
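As a check on the computations in this example, the whole test of independence can be sketched in Python (counts are from the table above; using unrounded expected frequencies gives χ 2 ≈ 60.4, versus 60.5 when expecteds are first rounded to one decimal):

```python
# Chi-square test of independence: living arrangement (rows) vs. exercise (columns).
observed = [
    [32, 30, 28],    # Dormitory
    [74, 64, 42],    # On-campus apartment
    [110, 25, 15],   # Off-campus apartment
    [39, 6, 5],      # At home
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)  # 470

chi_sq = 0.0
for i, r in enumerate(row_totals):
    for j, c in enumerate(col_totals):
        e = r * c / n                      # expected frequency under H0
        chi_sq += (observed[i][j] - e) ** 2 / e

df = (len(row_totals) - 1) * (len(col_totals) - 1)  # (4-1)(3-1) = 6

# ≈ 60.4 with df=6; this exceeds the critical value 12.59, so H0 is rejected.
print(round(chi_sq, 1), df)
```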

Test Yourself

 Pancreaticoduodenectomy (PD) is a procedure that is associated with considerable morbidity. A study was recently conducted on 553 patients who had a successful PD between January 2000 and December 2010 to determine whether their Surgical Apgar Score (SAS) is related to 30-day perioperative morbidity and mortality. The table below gives the number of patients experiencing no, minor, or major morbidity by SAS category.  

SAS      No Morbidity    Minor Morbidity    Major Morbidity or Mortality
0-4      21              20                 16
5-6      135             71                 35
7-10     158             62                 35

Question: What would be an appropriate statistical test to examine whether there is an association between Surgical Apgar Score and patient outcome? Using 14.13 as the value of the test statistic for these data, carry out the appropriate test at a 5% level of significance. Show all parts of your test.

In the module on hypothesis testing for means and proportions, we discussed hypothesis testing applications with a dichotomous outcome variable and two independent comparison groups. We presented a test using a test statistic Z to test for equality of independent proportions. The chi-square test of independence can also be used with a dichotomous outcome and the results are mathematically equivalent.  

In the prior module, we considered the following example. Here we show the equivalence to the chi-square test of independence.

A randomized trial is designed to evaluate the effectiveness of a newly developed pain reliever designed to reduce pain in patients following joint replacement surgery. The trial compares the new pain reliever to the pain reliever currently in use (called the standard of care). A total of 100 patients undergoing joint replacement surgery agreed to participate in the trial. Patients were randomly assigned to receive either the new pain reliever or the standard pain reliever following surgery and were blind to the treatment assignment. Before receiving the assigned treatment, patients were asked to rate their pain on a scale of 0-10 with higher scores indicative of more pain. Each patient was then given the assigned treatment and after 30 minutes was again asked to rate their pain on the same scale. The primary outcome was a reduction in pain of 3 or more scale points (defined by clinicians as a clinically meaningful reduction). The following data were observed in the trial.

Treatment                 n     # with Reduction     Proportion with Reduction
                                of 3+ Points         of 3+ Points
New Pain Reliever         50    23                   0.46
Standard Pain Reliever    50    11                   0.22

We tested whether there was a significant difference in the proportions of patients reporting a meaningful reduction (i.e., a reduction of 3 or more scale points) using a Z statistic, as follows. 

H 0 : p 1 = p 2    

H 1 : p 1 ≠ p 2                             α=0.05

Here the new or experimental pain reliever is group 1 and the standard pain reliever is group 2.

We must first check that the sample size is adequate. Specifically, we need to ensure that we have at least 5 successes and 5 failures in each comparison group or that:

In this example, we have

min( 50(0.46), 50(1-0.46), 50(0.22), 50(1-0.22) ) = min(23, 27, 11, 39) = 11

Therefore, the sample size is adequate, so the following formula can be used:

Z = (p̂ 1 - p̂ 2 ) / sqrt( p̂(1 - p̂)(1/n 1 + 1/n 2 ) ), where p̂ is the pooled proportion of successes.

Reject H 0 if Z < -1.960 or if Z > 1.960.

We now substitute the sample data into the formula for the test statistic identified in Step 2. We first compute the overall proportion of successes:

p̂ = (23 + 11)/(50 + 50) = 34/100 = 0.34

We now substitute to compute the test statistic:

Z = (0.46 - 0.22) / sqrt( 0.34(1 - 0.34)(1/50 + 1/50) ) = 0.24/0.0947 = 2.53

  • Step 5.  Conclusion.

We reject H 0 because 2.53 > 1.960. We have statistically significant evidence at α=0.05 to show that there is a difference in the proportions of patients reporting a meaningful reduction in pain between the new and standard pain relievers.

We now conduct the same test using the chi-square test of independence.  

H 0 : Treatment and outcome (meaningful reduction in pain) are independent

H 1 :   H 0 is false.         α=0.05

The formula for the test statistic is:  

For this test, df=(2-1)(2-1)=1. At a 5% level of significance, the appropriate critical value is 3.84 and the decision rule is as follows: Reject H0 if χ 2 > 3.84. (Note that 1.96 2 = 3.84, where 1.96 was the critical value used in the Z test for proportions shown above.)

We now compute the expected frequencies using:

The computations can be organized in a two-way table. The top number in each cell of the table is the observed frequency and the bottom number is the expected frequency. The expected frequencies are shown in parentheses.

                          Reduction         No Reduction      Total
                          of 3+ Points      of 3+ Points
New Pain Reliever         23 (17.0)         27 (33.0)         50
Standard Pain Reliever    11 (17.0)         39 (33.0)         50
Total                     34                66                100

A condition for the appropriate use of the test statistic was that each expected frequency is at least 5. This is true for this sample (the smallest expected frequency is 17.0) and therefore it is appropriate to use the test statistic. The test statistic is computed as follows:

χ 2 = (23-17) 2 /17 + (27-33) 2 /33 + (11-17) 2 /17 + (39-33) 2 /33 = 2.12 + 1.09 + 2.12 + 1.09 = 6.4

(Note that (2.53) 2 = 6.4, where 2.53 was the value of the Z statistic in the test for proportions shown above.) We reject H 0 because 6.4 > 3.84, the same conclusion we reached using the Z test.
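The 2 × 2 equivalence can be verified numerically (a sketch using the trial counts above):

```python
from math import sqrt

# Pain-reliever trial: 23/50 successes on the new treatment, 11/50 on standard.
n1, x1 = 50, 23
n2, x2 = 50, 11
p1, p2 = x1 / n1, x2 / n2         # 0.46 and 0.22
p_pooled = (x1 + x2) / (n1 + n2)  # 0.34

# Z test for two independent proportions (pooled standard error)
z = (p1 - p2) / sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))  # ≈ 2.53

# Chi-square test of independence on the same 2x2 table
observed = [[x1, n1 - x1], [x2, n2 - x2]]
row_totals = [n1, n2]
col_totals = [x1 + x2, (n1 - x1) + (n2 - x2)]
n = n1 + n2
chi_sq = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(2) for j in range(2)
)                                 # ≈ 6.4

# For a 2x2 table, the pooled Z squared equals the Pearson chi-square exactly.
print(round(z, 2), round(chi_sq, 2))
```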

Chi-Squared Tests in R

The video below by Mike Marin demonstrates how to perform chi-squared tests in the R programming language.

Answer to Problem on Pancreaticoduodenectomy and Surgical Apgar Scores

We have 3 independent comparison groups (Surgical Apgar Score) and a categorical outcome variable (morbidity/mortality). We can run a Chi-Squared test of independence.

H 0 : Apgar scores and patient outcome are independent of one another.

H A : Apgar scores and patient outcome are not independent.

Chi-square = 14.13, with df = (3-1)(3-1) = 4 and critical value 9.49 at α=0.05.

Since 14.13 is greater than 9.49, we reject H 0 .

There is an association between Apgar scores and patient outcome. The lowest Apgar score group (0 to 4) experienced the highest percentage of major morbidity or mortality (16 out of 57=28%) compared to the other Apgar score groups.

Chi-Square Goodness of Fit Test


The chi-square goodness of fit test is a variation of the more general chi-square test. The setting for this test is a single categorical variable that can have many levels. Often in this situation, we will have a theoretical model in mind for a categorical variable. Through this model we expect certain proportions of the population to fall into each of these levels. A goodness of fit test determines how well the expected proportions in our theoretical model match reality.

Null and Alternative Hypotheses

The null and alternative hypotheses for a goodness of fit test look different from those of many of our other hypothesis tests. One reason for this is that a chi-square goodness of fit test is a nonparametric method . This means that our test does not concern a single population parameter. Thus the null hypothesis does not state that a single parameter takes on a certain value.

We start with a categorical variable with n levels and let p i be the proportion of the population at level i . Our theoretical model has values of q i for each of the proportions. The statements of the null and alternative hypotheses are as follows:

  • H 0 : p 1 = q 1 , p 2 = q 2 , . . . p n = q n
  • H a : For at least one i , p i is not equal to q i .

Actual and Expected Counts

The calculation of a chi-square statistic involves a comparison between actual counts of variables from the data in our simple random sample and the expected counts of these variables. The actual counts come directly from our sample. The way that the expected counts are calculated depends upon the particular chi-square test that we are using.

For a goodness of fit test, we have a theoretical model for how our data should be proportioned. We simply multiply each of these proportions by the total sample size to obtain our expected counts.

Computing Test Statistic

The chi-square statistic for a goodness of fit test is determined by comparing the actual and expected counts for each level of our categorical variable. The steps for computing the chi-square statistic for a goodness of fit test are as follows:

  • For each level, subtract the observed count from the expected count.
  • Square each of these differences.
  • Divide each of these squared differences by the corresponding expected value.
  • Add all of the numbers from the previous step together. This is our chi-square statistic.

If our theoretical model matches the observed data perfectly, then the expected counts will show no deviation whatsoever from the observed counts of our variable. This will mean that we will have a chi-square statistic of zero. In any other situation, the chi-square statistic will be a positive number.
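The four steps above can be sketched as a short Python function; the die-roll counts below are hypothetical, invented only for illustration:

```python
def chi_square_gof(observed, expected):
    """Chi-square goodness-of-fit statistic: sum of (O - E)^2 / E over all levels."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical example: 100 rolls of a die that is claimed to be fair.
observed = [13, 20, 15, 19, 16, 17]
expected = [100 / 6] * 6          # equal expected counts under the null model

# ≈ 2.0, a small value: the observed counts are close to the fair-die model.
print(round(chi_square_gof(observed, expected), 2))
```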

Degrees of Freedom

The number of degrees of freedom requires no difficult calculations. All that we need to do is subtract one from the number of levels of our categorical variable. This number will inform us on which of the infinite chi-square distributions we should use.

Chi-square Table and P-Value

The chi-square statistic that we calculated corresponds to a particular location on a chi-square distribution with the appropriate number of degrees of freedom. The p-value is the probability of obtaining a test statistic at least this extreme, assuming that the null hypothesis is true. We can use a table of values for a chi-square distribution to determine the p-value of our hypothesis test. If we have statistical software available, then this can be used to obtain a better estimate of the p-value.
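For even degrees of freedom the chi-square upper-tail probability has a simple closed form, which gives a quick way to check a table lookup (a sketch; in practice a statistics library routine such as scipy.stats.chi2.sf handles any df):

```python
from math import exp

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square distribution with EVEN df."""
    if df % 2 != 0:
        raise ValueError("closed form shown here applies to even df only")
    # P(X > x) = exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= (x / 2) / i
        total += term
    return exp(-x / 2) * total

# The familiar critical value 5.99 for df=2 at α=0.05 corresponds to p ≈ 0.05.
print(round(chi2_sf_even_df(5.99, 2), 3))
```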

Decision Rule

We make our decision on whether to reject the null hypothesis based upon a predetermined level of significance. If our p-value is less than or equal to this level of significance, then we reject the null hypothesis. Otherwise, we fail to reject the null hypothesis.


Chi-Square Goodness of Fit Test

The Chi-Square goodness of fit test is a non-parametric test used to determine whether the observed values of a given phenomenon differ significantly from the expected values. In the Chi-Square goodness of fit test, the term "goodness of fit" refers to comparing the observed sample distribution with the expected probability distribution. The test determines how well a theoretical distribution (such as the normal, binomial, or Poisson) fits the empirical distribution. Sample data are divided into intervals, and the number of observations that fall into each interval is compared with the expected number for that interval.

Procedure for Chi-Square Goodness of Fit Test:

  • Set up the hypothesis:

A. Null hypothesis : The null hypothesis assumes that there is no significant difference between the observed and the expected value.

B. Alternative hypothesis : The alternative hypothesis assumes that there is a significant difference between the observed and the expected value.

  • Compute the value of the Chi-Square goodness of fit statistic using the following formula:

χ 2 = Σ (O - E) 2 /E, where O is the observed frequency and E is the expected frequency.

Degrees of freedom: The degrees of freedom depend on the distribution of the sample. The following table shows each distribution and the associated degrees of freedom:

Type of distribution     No. of constraints     Degrees of freedom
Binomial distribution    1                      n-1
Poisson distribution     2                      n-2
Normal distribution      3                      n-3

Hypothesis testing: Hypothesis testing proceeds as in other tests, such as the t-test and ANOVA . The calculated value of the Chi-Square goodness of fit statistic is compared with the table (critical) value. If the calculated value is greater than the table value, we reject the null hypothesis and conclude that there is a significant difference between the observed and the expected frequencies. If the calculated value is less than the table value, we fail to reject the null hypothesis and conclude that there is no significant difference between the observed and expected values.

  • Gifford, R.; Hine, D.W.; Muller-Clemm, W.; Reynolds, D.A.J., Jr.; Shaw, K.T. Decoding modern architecture a lens model approach for understanding the aesthetic differences of architects and laypersons. Environ. Behav. 2000 , 32 , 163–187. [ Google Scholar ] [ CrossRef ]
  • Gifford, R. Environmental Psychology: Principles and Practice ; Optimal Books: Colville, WA, USA, 2002. [ Google Scholar ]
  • Gifford, R. The Consequences of Living in High-Rise Buildings. Archit. Sci. Rev. 2011 , 50 , 2–17. [ Google Scholar ] [ CrossRef ]
  • Kotler, P.; Amstrong, G. Principles of Marketing , 14th ed.; Pearson Education Limited: Hoboken, NJ, USA, 2011. [ Google Scholar ]
  • Nasar, J.L. Environmental Aesthetics: Theory, Research, and Application ; Cambridge University Press: Cambridge, UK, 1992. [ Google Scholar ]
  • Ozbay, G.; Sariisik, M.; Ceylan, V.; Çakmak, M. A comparative evaluation between the impact of previous outbreaks and COVID-19 on the tourism industry. Int. Hosp. Rev. 2022 , 36 , 65–82. [ Google Scholar ] [ CrossRef ]
  • Park, E.; Kim, W.H.; Kim, S.B. How does COVID-19 differ from previous crises? A comparative study of health-related crisis research in the tourism and hospitality context. Int. J. Hosp. Manag. 2022 , 103 , 103199. [ Google Scholar ] [ CrossRef ]
  • Sanabria-Díaz, J.M.; Aguiar-Quintana, T.; Araujo-Cabrera, Y. Public strategies to rescue the hospitality industry following the impact of COVID-19: A case study of the European Union. Int. J. Hosp. Manag. 2021 , 97 , 102988. [ Google Scholar ] [ CrossRef ]
  • Danziger, S.; Israeli, A.; Bekerman, M. The relative role of strategic assets in determining customer perceptions of hotel room price. Int. J. Hosp. Manag. 2006 , 25 , 129–145. [ Google Scholar ] [ CrossRef ]
  • Wong, A.K.F.; Kim, S.; Liu, Y.Y.; Grace Baah, N. COVID-19 Research in Hospitality and Tourism: Critical Analysis, Reflection, and Lessons Learned. J. Hosp. Tour. Res. 2023 , 10 , 10963480231156079. [ Google Scholar ] [ CrossRef ] [ PubMed Central ]
  • Doğan, H.; Barutçu, S.; Nebioğlu, O.; Doğan, İ. Perceptions of Hotel Top Managers for Opportunities and Strategic Collaboration with a Foreign Partner in Tourism Sector: An Applied Research in Alanya Destination. Procedia-Soc. Behav. Sci. 2012 , 58 , 1218–1227. [ Google Scholar ] [ CrossRef ]
  • Kim, M.J.; Lee, C.K.; Jung, T. Exploring Consumer Behavior in Virtual Reality Tourism Using an Extended Stimulus-Organism-Response Model. J. Travel Res. 2019 , 58 , 897–912. [ Google Scholar ] [ CrossRef ]
  • Anastasiadou, P.; Sarantakou, E.; Maniati, E.; Tsilika, E. Exploring Stakeholders’ Perspectives on Hotel Design. In Transcending Borders in Tourism Through Innovation and Cultural Heritage, Proceedings of the 8th International Conference, IACuDiT, Hydra, Greece, 1–3 September 2021 ; Springer: Cham, Switzerland, 2022. [ Google Scholar ] [ CrossRef ]
  • Nasar, J.L. Connotative meanings of house styles. In The Meaning and Use of Housing: Ethnoscapes ; Arias, G., Ed.; Gower: Avebury, UK, 1993; Volume 7, pp. 143–167. [ Google Scholar ]
  • Nasar, J.L. Urban Design Aesthetics The Evaluative Qualities of Building Exteriors. Environ. Behav. 1994 , 26 , 377–401. [ Google Scholar ] [ CrossRef ]
  • Nasar, J.L. Environmental Psychology Urban Design. In Companion to Urban Design ; Banerjee, T., Loukaitou-Sideris, A., Eds.; Routledge: London, UK, 2011. [ Google Scholar ]
  • Ghomeishi, M. An Assessment of Aesthetics in Conceptual Properties and Its Relation to Complexity Among Architects and Non-Architects in Residential Façade Design in Iran. J. Build. Sustain. 2017 , 2 , 1–15. Available online: https://www.academia.edu/92486821/An_assessment_of_Aesthetics_in_Conceptual_Properties_and_its_Relation_to_Complexity_among_Architects_and_Non_Architects_in_Residential_Fa%C3%A7ade_Design_in_Iran (accessed on 20 June 2024).
  • Llinares, C.; Montañana, A.; Navarro, E. Differences in Architects and Nonarchitects’ Perception of Urban Design: An Application of Kansei Engineering Techniques. Urban Stud. Res. 2011 , 2011 , 736307. [ Google Scholar ] [ CrossRef ]
  • Nasar, J.L. Symbolic meaning of house styles. Environ. Behav. 1989 , 21 , 235–257. [ Google Scholar ] [ CrossRef ]
  • Devlin, K.; Nasar, J.L. The beauty and the beast: Some preliminary comparisons of ‘high’ versus ‘popular’ residential architecture and public versus architect judgements of same. J. Environ. Psychol. 1989 , 9 , 333–344. [ Google Scholar ] [ CrossRef ]
  • Gibson, C.; Ostrom, E.; Ahn, T.-K. The concept of scale and the human dimensions of global change: A survey. Ecol. Econ. 2000 , 32 , 217–239. [ Google Scholar ] [ CrossRef ]
  • Herzog, T.R. A cognitive analysis of preference for urban spaces. J. Environ. Psychol. 1992 , 12 , 237–248. [ Google Scholar ] [ CrossRef ]
  • Herzog, T.R.; Kaplan, S.; Kaplan, R. The prediction of preference for unfamiliar urban places. Popul. Environ. Behav. Soc. Issues 1982 , 5 , 43–59. [ Google Scholar ] [ CrossRef ]
  • Ghomeshi, M.; Jusan, M.M. Investigating different aesthetic preferences between architects and non-architects in residential fac¸ade designs. Indoor Built Environ. 2013 , 22 , 952–964. [ Google Scholar ] [ CrossRef ]
  • Feast, L.; Melles, G. Epistemological positions in design research: A brief review of theliterature. In Proceedings of the 2nd International Conference of Design Education, Sydney, Australia, 28 June–1 July 2010. [ Google Scholar ]
  • Alexander, C.; Silverstein, M.; Ishikawa, S. A Pattern Language: Towns, Buildings, Construction ; Oxford University Press: New York, NY, USA, 1977. [ Google Scholar ]
  • Colaço, C.A.; Acarturk, C. Visual behaviour during perception of architectural drawings: Differences between architects and non architects. In Design Computing and Cognition’18 ; Springer: Berlin/Heidelberg, Germany, 2018. [ Google Scholar ]
  • Cross, N. Expertise in design: An overview. Des. Stud. 2004 , 25 , 427–441. [ Google Scholar ] [ CrossRef ]
  • Lawson, B. How Designers Think , 4th ed.; Routledge: New York, NY, USA, 2005. [ Google Scholar ]
  • Purcell, A.T.; Nasar, J.L. Environmental and differences in Environmental Experience. J. Environ. Psychol. 1992 , 12 , 199–211. [ Google Scholar ] [ CrossRef ]
  • Huang, W.-J.; Chen, C.C.; Lai, Y.M. Five-Star Quality at Three-Star Prices? Opaque Booking and Hotel Service Expectations. J. Hosp. Mark. Manag. 2018 , 27 , 833–854. [ Google Scholar ] [ CrossRef ]
  • Gifford, R.; Hine, D.W.; Veitch, J.A. Meta-analysis for environment-behavior research illuminated with a study of lighting level effects on office task performance. In Toward the Integration of Theory, Methods, Research, and Utilization ; Advances in Environment, Behavior, and Design; Moore, G.T., Marans, R.W., Eds.; Springer: New York, NY, USA, 1997; Volume 4, pp. 223–253. [ Google Scholar ]
  • Barbey, G. L’appropriation des espaces du logement: Tentative de cadrage théorique. In Actes de la 3ème Conférence Internationale de Psychologie de l’Espace Construit ; Korosec-Serfaty, P., Ed.; Université de Strasbourg Press: Strasbourg, Germany, 1976; pp. 215–218. [ Google Scholar ]
  • Kim, J.J.; Han, H.; Ariza-Montes, A. The impact of hotel attributes, well-being perception, and attitudes on brand loyalty: Examining the moderating role of COVID-19 pandemic. J. Retail. Consum. Serv. 2021 , 62 , 102634. Available online: https://ideas.repec.org/a/eee/joreco/v62y2021ics0969698921002009.html (accessed on 20 June 2024).
  • Countryman, C.C.; Jang, S. The effects of atmospheric elements on custmer impression: The case of hotel lobbies. Int. J. Contemp. Hosp. Manag. 2006 , 18 , 534–545. [ Google Scholar ] [ CrossRef ]
  • Tussyadiah, I.P. The influence of innovativeness on on-site smartphone use among american travelers: Implications for context-based push marketing. J. Travel Tour. Mark. 2016 , 33 , 806–823. [ Google Scholar ] [ CrossRef ]
  • Tussyadiah, I.P.; Wang, D.; Jia, C.H. Virtual reality and attitudes toward tourism destinations. In Information and Communication Technologies in Tourism 2017, Proceedings of the International Conference in Rome, Italy, 24–26 January 2017 ; Springer: Cham, Switzerland, 2017; pp. 229–239. [ Google Scholar ]
  • Tussyadiah, I.P.; Wang, D.; Jung, T.H.; tom Dieck, M.C. Virtual reality, presence, and attitude change: Empirical evidence from tourism. Tour. Manag. 2018 , 66 , 140–154. [ Google Scholar ] [ CrossRef ]
  • Jung, T.; tom Dieck, M.C.; Lee, H.; Chung, N. Effects of virtual reality and augmented reality on visitor experiences in museum. In Information and Communication Technologies in Tourism 2016, Proceedings of the International Conference in Bilbao, Spain, 2–5 February 2016 ; Springer: Cham, Switzerland, 2016; pp. 621–635. [ Google Scholar ]
  • Jung, T.; tom Dieck, M.C.; Rauschnabel, P.; Ascenção, M.; Tuominen, P.; Moilanen, T. Functional, hedonic or social? Exploring antecedents and consequences of virtual reality rollercoaster usage. In Augmented Reality and Virtual Reality ; Springer: Cham, Switzerland, 2018; pp. 247–258. [ Google Scholar ]
  • Uzzell, D. The psychological significance of architectural design in fostering sustainable behaviours within hotel environments. J. Environ. Psychol. 2009 , 29 , 423–431. [ Google Scholar ]
  • Chen, W.; Peng, Y. The influence of hotel design on guests’ perception of safety and comfort: A comparative analysis. J. Hosp. Tour. Res. 2020 , 44 , 345–367. [ Google Scholar ]
  • Pizam, A.; Mansfeld, Y. Tourism, Security and Safety: From Theory to Practice ; Elsevier: Amsterdam, The Netherlands, 2006. [ Google Scholar ]
  • Kolcaba, K. A theory of holistic comfort for nursing. J. Adv. Nurs. 1994 , 19 , 1178–1184. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Alrawadieh, Z.; Law, R. Determinants of Hotel Guests’ Satisfaction from the Perspective of Online Hotel Reviewers. Int. J. Cult. Tour. Hosp. Res. 2019 , 13 , 84–97. [ Google Scholar ] [ CrossRef ]
  • Altman, I. The Environment and Social Behavior ; Brooks/Cole: Monterey, CA, USA, 1975. [ Google Scholar ]
  • Korosec-Serfaty, P. (Ed.) Appropriation of Space, Proceedings of the IAPS-1976, the 3rd International Architectural Psychology Conference ; Louis Pasteur University: Strasbourg, France, 1976. [ Google Scholar ]
  • DeFranco, A.; Lee, J.; Cai, Y.M.; Lee, M. Exploring Influential Factors Affecting Guest Satisfaction: Big Data and Business Analytics in Consumer-Generated Reviews. J. Hosp. Tour. Technol. 2020 , 11 , 137–153. [ Google Scholar ] [ CrossRef ]
  • Bonfanti, A.; Vigolo, V.; Negri, F. Hotel Responses to Guests’ Online Reviews: An Exploratory Study on Communication Styles ; Information and Communication Technologies in Tourism; Springer: Cham, Switzerland, 2008; Available online: https://www.researchgate.net/publication/312053681_Hotel_Responses_to_Guests'_Online_Reviews_An_Exploratory_Study_on_Communication_Styles (accessed on 20 June 2024).
  • Altman, I.; Low, S.M. Place Attachment. Human Behaviour and Environment: Advances in Theory and Research ; Plenum: New York, NY, USA, 1992; Volume 12. [ Google Scholar ]
  • Moser, G.; Uzzell, D. Environmental Psychology. In Handbook of Psychology ; Wiley: Hoboken, NJ, USA, 2003. [ Google Scholar ] [ CrossRef ]
  • Sundstrom, E. Work environments: Offices factories. In Handbook of Environmental Psychology ; Stokols, D., Altman, I., Eds.; John Wiley: New York, NY, USA, 1987; Volume II, pp. 733–782. [ Google Scholar ]
  • Sundstrom, E.; Town, J.P.; Rice, R.W.; Osborn, D.P.; Brill, M. Office noise, satisfaction and performance. Environ. Behav. 1994 , 26 , 195–222. [ Google Scholar ] [ CrossRef ]
  • Giglio, S.; Pantano, E.; Bilotta, E.; Melewar, T.C. Branding luxury hotels: Evidence from the analysis of consumers’ “big” visual data on TripAdvisor. J. Bus. Res. 2020 , 119 , 495–501. Available online: https://www.sciencedirect.com/science/article/abs/pii/S0148296319306435?via%3Dihub (accessed on 20 June 2024).
  • Kim, J.J.; Han, H. Redefining in-room amenities for hotel staycationers in the new era of tourism: A deep dive into guest well-being and intentions. Int. J. Hosp. Manag. 2022 , 102 , 103168. Available online: https://www.researchgate.net/publication/358148866_Redefining_in-room_amenities_for_hotel_staycationers_in_the_new_era_of_tourism_A_deep_dive_into_guest_well-being_and_intentions (accessed on 20 June 2024).
  • Proshansky, H.M.; Fabian, A.K.; Kaminoff, R. Place identity: Physical world socialisation of the self. J. Environ. Psychol. 1983 , 3 , 57–83. [ Google Scholar ] [ CrossRef ]
  • Beck, J.; Egger, R. Emotionalise me: Self-reporting and arousal measurements in virtual tourism environments. In Information and Communication Technologies in Tourism 2018, Proceedings of the International Conference in Jönköping, Sweden, 24–26 January 2018 ; Springer: Cham, Switzerland, 2018; pp. 3–15. [ Google Scholar ]
  • Ruiz, D.; Castro, B.; Diaz, I. Creating customer value through service experiences: An empirical study in the hotel industry. Tour. Hosp. Manag. 2012 , 18 , 37–53. [ Google Scholar ] [ CrossRef ]
  • Ryu, K.; Jang, S. The effect of environmental perceptions on behavioral intentions through emotions: The case of upscale restaurants. J. Hosp. Tour. Res. 2007 , 31 , 56–72. [ Google Scholar ] [ CrossRef ]
  • Ryu, K.; Lee, H.; Kim, W. The influence of the quality of the physical environment, food, & service on restaurant image, customer perceived value, customer satisfaction, and behavioral intentions. Int. J. Contemp. Hosp. Manag. 2012 , 24 , 200–223. [ Google Scholar ] [ CrossRef ]
  • Zeithaml, V. Consumer perceptions of price, quality and value: A means-end model and synthesis of evidence. J. Mark. 1988 , 52 , 2–22. [ Google Scholar ] [ CrossRef ]
  • Aaker, D.A. Managing Brand Equity: Capitalizing on the Value of a Brand Name ; The Free Press: New York, NY, USA, 1991. [ Google Scholar ]
  • Keller, K.L. Conceptualizing, Measuring, Managing Customer-Based Brand Equity. J. Mark. 1993 , 57 , 1–22. [ Google Scholar ] [ CrossRef ]
  • Kotler, P. Marketing Management: Analysis, Planning and Control ; Prentice-Hall: Englewood Cliffs, NY, USA, 1988. [ Google Scholar ]
  • Erickson, G.M.; Johansson, J.K. The Role of Price in Multi-Attribute Product Evaluations. J. Consum. Res. 1985 , 12 , 195–199. [ Google Scholar ] [ CrossRef ]
  • Wu, H.-C.; Ai, C.-H.; Cheng, C.-C. The influence of physical environment on customer emotions, satisfaction and loyalty: A case study of restaurants. J. Foodserv. Bus. Res. 2010 , 13 , 279–299. [ Google Scholar ] [ CrossRef ]
  • Kim, W.G.; Moon, Y.J. Customers’ cognitive, emotional, and actionable response to the servicescape: A test of the moderating effect of the restaurant type. Int. J. Hosp. Manag. 2009 , 28 , 144–156. [ Google Scholar ] [ CrossRef ]
  • Lee, H.; Overby, J.W. Creating value for online shoppers: Implications for satisfaction and loyalty. J. Consum. Satisf. Dissatisf. Complain. Behav. 2004 , 17 , 54–67. [ Google Scholar ]
  • Creswell, J.W.; Plano Clark, V.L. Designing and Conducting Mixed Methods Research , 3rd ed.; Sage Publications: Thousand Oaks, CA, USA, 2017. [ Google Scholar ]
  • Tashakkori, A.; Teddlie, C. SAGE Handbook of Mixed Methods in Social & Behavioral Research , 2nd ed.; Sage Publications: Thousand Oaks, CA, USA, 2010. [ Google Scholar ]
  • Kvale, S.; Brinkmann, S. InterViews: Learning the Craft of Qualitative Research Interviewing , 2nd ed.; Sage Publications: Thousand Oaks, CA, USA, 2009. [ Google Scholar ]
  • Barber, N.; Scarcelli, J.M. Enhancing the assessment of tangible service quality through the creation of a cleanliness measurement scale. Manag. Serv. Qual. Int. J. 2010 , 20 , 70–88. [ Google Scholar ] [ CrossRef ]
  • Bryman, A. Social Research Methods , 5th ed.; Oxford University Press: Oxford, UK, 2016. [ Google Scholar ]
  • Guba, E.G.; Lincoln, Y.S. Fourth Generation Evaluation ; Sage Publications: Thousand Oaks, CA, USA, 1989. [ Google Scholar ]
  • Dillman, D.A.; Smyth, J.D.; Christian, L.M. Internet, Phone, Mail, and Mixed-Mode Surveys: The Tailored Design Method , 4th ed.; Wiley: Hoboken, NJ, USA, 2014. [ Google Scholar ]
  • Venkatesh, V.; Brown, S.A.; Bala, H. Bridging the qualitative-quantitative divide: Guidelines for conducting mixed methods research in information systems. MIS Q. 2013 , 37 , 21–54. [ Google Scholar ] [ CrossRef ]
  • Johnston, L.G.; Sabin, K. Sampling hard-to-reach populations with respondent driven sampling. Methodol. Innov. 2019 , 12 , 2059799119829906. [ Google Scholar ] [ CrossRef ]
  • Barnett-Page, E.; Thomas, J. Methods for the synthesis of qualitative research: A critical review. BMC Med. Res. Methodol. 2009 , 9 , 59. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Braun, V.; Clarke, V. Using thematic analysis in psychology. Qual. Res. Psychol. 2006 , 3 , 77–101. [ Google Scholar ] [ CrossRef ]
  • Hair, J.F.; Black, W.C.; Babin, B.J.; Anderson, R.E. Multivariate Data Analysis , 8th ed.; Cengage Learning: Boston, MA, USA, 2019. [ Google Scholar ]
  • Eze, S.C.; Chinedu-Eze, V.C.; Bello, A.O. The utilization of e-learning facilities in the educational delivery system of Nigeria: A study of M-University. Int. J. Inf. Learn. Technol. 2020 , 37 , 85–108. [ Google Scholar ]
  • Zhang, X.; Zhao, K.; Xu, X. The interplay of architecture and marketing in the hospitality industry. Tour. Manag. Perspect. 2021 , 35 , 100692. [ Google Scholar ]
  • Jiang, Y.; Kim, Y. Developing multi-dimensional green value: Extending social exchange theory to explore customers’ purchase intention in green hotels—Evidence from Korea. Sustainability 2020 , 12 , 1911. [ Google Scholar ] [ CrossRef ]
  • Zhang, H.; Leung, X.Y. A systematic review of big data analytics in hospitality and tourism. J. Hosp. Tour. Technol. 2019 , 10 , 539–570. [ Google Scholar ]
  • Wu, Y.; Yang, Y. Exploring the impact of service quality on customer satisfaction in the hospitality industry. J. Hosp. Tour. Manag. 2021 , 47 , 170–179. [ Google Scholar ]
  • Lee, J.; Kim, H. The role of experiential value in the hotel industry: The mediating effect of customer satisfaction and the moderating effect of hotel type. J. Hosp. Tour. Manag. 2020 , 43 , 94–104. [ Google Scholar ]
  • Shin, D.; Park, J. Customer engagement in the hospitality industry: The role of emotional labor and job satisfaction. Int. J. Contemp. Hosp. Manag. 2022 , 34 , 502–520. [ Google Scholar ]
  • Wang, Y.; Liu, H.; Wu, J. The role of design and architecture in enhancing hotel brand equity. J. Bus. Res. 2019 , 101 , 560–569. [ Google Scholar ]
  • Wen, H.; Huang, S. Consumer Perception of Hotel Competitive Sets. Cornell Hosp. Q. 2018 . Available online: https://ecommons.cornell.edu/handle/1813/41223 (accessed on 20 June 2024).
  • Kim, J.; Lee, C.; Bonn, M. The impact of hotel design on well-being and guest satisfaction. J. Hosp. Tour. Res. 2020 , 44 , 1017–1040. [ Google Scholar ]
  • Lee, S.; Jeong, M. Effects of experiential design on guest loyalty in luxury hotels. J. Travel Tour. Mark. 2019 , 36 , 995–1008. [ Google Scholar ]
Group 1: Aesthetic Group

| Factor | Observed Frequency (O) | Expected Frequency (E) | (O − E)²/E | Test Result |
|---|---|---|---|---|
| Design Concept Theme | 55 | 47.56 | 1.17 | Not Significant |
| Harmony | 41 | 47.56 | 0.91 | Not Significant |
| Balance | 40 | 47.56 | 1.20 | Not Significant |
| Space | 39 | 47.56 | 1.54 | Not Significant |
| Style | 53 | 47.56 | 0.63 | Not Significant |
| Beautiful | 61 | 47.56 | 3.92 | Significant |
| Creativity | 47 | 47.56 | 0.01 | Not Significant |
| Environment | 50 | 47.56 | 0.13 | Not Significant |
| Perspective & Visual | 42 | 47.56 | 0.65 | Not Significant |
Group 2: Physical Comfort Group

| Factor | Observed Frequency (O) | Expected Frequency (E) | (O − E)²/E | Test Result |
|---|---|---|---|---|
| Function | 60 | 47.56 | 3.19 | Significant |
| Shape | 35 | 47.56 | 3.38 | Significant |
| Proportion & Mass | 48 | 47.56 | 0.00 | Not Significant |
| Texture & Material | 49 | 47.56 | 0.05 | Not Significant |
| Human Scale | 38 | 47.56 | 1.92 | Not Significant |
| Durability | 43 | 47.56 | 0.44 | Not Significant |
| Color | 50 | 47.56 | 0.13 | Not Significant |
| Furniture | 45 | 47.56 | 0.14 | Not Significant |
| Comfortable | 60 | 47.56 | 3.19 | Significant |
| Facilities | 38 | 47.56 | 1.92 | Not Significant |
| Circulation | 40 | 47.56 | 1.20 | Not Significant |
| Total | 466 | 523.16 | 15.56 | Not Significant |
Group 3: Emotional Comfort Group

| Factor | Observed Frequency (O) | Expected Frequency (E) | (O − E)²/E | Test Result |
|---|---|---|---|---|
| Sense of Place | 52 | 40.45 | 3.26 | Significant |
| Location | 35 | 40.45 | 0.73 | Not Significant |
| Feeling | 38 | 40.45 | 0.15 | Not Significant |
| Relationships & Ties | 33 | 40.45 | 1.38 | Not Significant |
| Natural Touch | 47 | 40.45 | 1.07 | Not Significant |
| Relax | 42 | 40.45 | 0.06 | Not Significant |
| Warmth | 37 | 40.45 | 0.29 | Not Significant |
| Peaceful | 40 | 40.45 | 0.01 | Not Significant |
| Service | 55 | 40.45 | 5.30 | Significant |
| Social | 28 | 40.45 | 3.79 | Significant |
| Friendly | 45 | 40.45 | 0.51 | Not Significant |
| Total | 452 | 445 | 16.55 | Not Significant |
Group 4: The Security and Sensibility Group

| Factor | Observed Frequency (O) | Expected Frequency (E) | (O − E)²/E | Test Result |
|---|---|---|---|---|
| Safety | 50 | 39.56 | 2.75 | Significant |
| Security | 35 | 39.56 | 0.53 | Not Significant |
| Risk | 30 | 39.56 | 2.31 | Significant |
| Satisfaction | 48 | 39.56 | 1.79 | Significant |
| Loyalty | 45 | 39.56 | 0.75 | Not Significant |
| Communication | 33 | 39.56 | 1.09 | Not Significant |
| Legal Requirements | 36 | 39.56 | 0.32 | Not Significant |
| Modernity | 38 | 39.56 | 0.06 | Not Significant |
| Innovation | 47 | 39.56 | 1.40 | Significant |
| Sustainability | 32 | 39.56 | 1.44 | Not Significant |
| Value/Equality | 39 | 39.56 | 0.01 | Not Significant |
| Quality | 55 | 39.56 | 6.05 | Significant |
| Efficiency | 40 | 39.56 | 0.00 | Not Significant |
| Expectations | 42 | 39.56 | 0.15 | Not Significant |
| Convenient | 38 | 39.56 | 0.06 | Not Significant |
| Cleanliness | 47 | 39.56 | 1.40 | Significant |
| Room Comfort | 45 | 39.56 | 0.75 | Not Significant |
| Remember | 32 | 39.56 | 1.44 | Not Significant |
| Total | 724 | 711.08 | 21.70 | Not Significant |
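The per-cell figures in these tables are the standard goodness-of-fit contributions. A minimal sketch using the Group 1 counts above, assuming (as the table implies) that every factor is expected to be mentioned equally often, so E is the mean of the observed counts; results may differ slightly from the published column because the paper rounds E to 47.56:

```python
# Chi-square goodness-of-fit sketch for the Group 1 (Aesthetic) counts.
observed = {
    "Design Concept Theme": 55, "Harmony": 41, "Balance": 40,
    "Space": 39, "Style": 53, "Beautiful": 61,
    "Creativity": 47, "Environment": 50, "Perspective & Visual": 42,
}

# Expected frequency under an equal-frequency null: the mean observed count.
E = sum(observed.values()) / len(observed)

# Per-cell contribution (O - E)^2 / E, i.e. the table's third numeric column.
contribution = {name: (O - E) ** 2 / E for name, O in observed.items()}
chi_square = sum(contribution.values())
df = len(observed) - 1

print(f"E = {E:.2f}")                               # E = 47.56
print(f"chi-square = {chi_square:.2f} on {df} df")  # chi-square = 10.01 on 8 df
```

Summing the contributions gives the overall statistic, which would then be compared against the chi-square critical value for 8 degrees of freedom.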

Share and Cite

Sirirat, S.; Thampanichwat, C.; Pongsermpol, C.; Moorapun, C. The Hotel Architectural Design Factors Influencing Consumer Destinations: A Case Study of Three-Star Hotels in Hua Hin, Thailand. Buildings 2024 , 14 , 2428. https://doi.org/10.3390/buildings14082428




Further Reading

  1. Chi-Square Goodness of Fit Test

    Example: Chi-square goodness of fit test conditions. You can use a chi-square goodness of fit test to analyze the dog food data because all three conditions have been met: You want to test a hypothesis about the distribution of one categorical variable. The categorical variable is the dog food flavors. You recruited a random sample of 75 dogs.

  2. Chi-Square Goodness of Fit Test: Definition, Formula, and Example

A Chi-Square goodness of fit test uses the following null and alternative hypotheses: H0: ... 0.05, and 0.01) then you can reject the null hypothesis. Chi-Square Goodness of Fit Test: Example. A shop owner claims that an equal number of customers come into his shop each weekday. To test this hypothesis, an independent researcher records the ...

  3. Chi-Square Goodness of Fit Test: Uses & Examples

Null: The sample data follow the hypothesized distribution. Alternative: The sample data do not follow the hypothesized distribution. When the p-value for the chi-square goodness of fit test is less than your significance level, reject the null hypothesis. Your data favor the hypothesis that the sample does not follow the hypothesized distribution. Let's work through two examples using the ...

  4. Pearson's chi square test (goodness of fit)

    I understand that if the chi-square value exceeds the appropriate minimum value in the chi-square distribution table, taken into account the degrees of freedom, you can reject the null hypothesis. (And that the same is true of the reverse, if the chi-square value does not exceed the appropriate minimum value in the chi-square distribution you ...

  5. Chi-square statistic for hypothesis testing

    And we got a chi-squared value. Our chi-squared statistic was six. So this right over here tells us the probability of getting a 6.25 or greater for our chi-squared value is 10%. If we go back to this chart, we just learned that this probability from 6.25 and up, when we have three degrees of freedom, that this right over here is 10%.
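The excerpt's 10% figure comes from comparing the statistic with the chi-square critical value for 3 degrees of freedom. A minimal sketch of that decision rule; the critical values are the standard table entries for df = 3, and the statistic of 6 is the one quoted above:

```python
# Decision rule for a chi-square test with 3 degrees of freedom.
# Standard chi-square critical values for df = 3 at common alpha levels.
CRITICAL_DF3 = {0.10: 6.251, 0.05: 7.815, 0.01: 11.345}

def decide(statistic: float, alpha: float) -> str:
    """Reject H0 when the statistic exceeds the critical value for alpha."""
    return "reject H0" if statistic > CRITICAL_DF3[alpha] else "fail to reject H0"

# A statistic of 6 falls just below 6.251, so even at the 10% level
# the null hypothesis is not rejected.
print(decide(6.0, 0.10))  # fail to reject H0
print(decide(6.0, 0.05))  # fail to reject H0
```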

  6. Chi-Square Goodness of Fit Test

    The chi-square goodness of fit test is appropriate when the following conditions are met: The sampling method is simple random sampling. The variable under study is categorical. The expected value of the number of sample observations in each level of the variable is at least 5. This approach consists of four steps: (1) state the hypotheses, (2 ...

  7. 11.3: Goodness-of-Fit Test

The test statistic for a goodness-of-fit test is: ∑k (O − E)²/E (11.3.1), where: O = observed values (data), E = expected values (from theory), k = the number of different data cells or categories. The observed values are the data values and the expected values are the values you would expect to get if the null hypothesis were true.

  8. 11.2: Chi-Square One-Sample Goodness-of-Fit Tests

the observed count O of each cell in Table 11.2.5 is at least 5, then χ² approximately follows a chi-square distribution with df = I − 1 degrees of freedom. The test is known as a goodness-of-fit χ² test since it tests the null hypothesis that the sample fits the assumed probability distribution well. It is always right-tailed, since ...

  9. Chi-Square (Χ²) Tests

The chi-square goodness of fit test is used to test whether the frequency distribution of a categorical variable is different from your expectations. ... Example: Chi-square test of independence. Null hypothesis (H0): The proportion of people who are left-handed is the same for Americans and Canadians.

  10. 11.2

    When conducting a chi-square goodness-of-fit test, it makes the most sense to write the hypotheses first. The hypotheses will depend on the research question. The null hypothesis will always contain the equalities and the alternative hypothesis will be that at least one population proportion is not as specified in the null.

  11. Chi-Square Goodness of Fit Test

    Let's look at the candy data and the Chi-square test for goodness of fit using statistical terms. This test is also known as Pearson's Chi-square test. Our null hypothesis is that the proportion of flavors in each bag is the same. We have five flavors. The null hypothesis is written as: $ H_0: p_1 = p_2 = p_3 = p_4 = p_5 $

  12. 11.3 Chi-Square Goodness-of-Fit Test

Steps to perform a chi-square goodness-of-fit test: First, check the assumptions. Calculate the expected frequency for each possible value of the variable using E = np, where n is the total number of observations and p is the relative frequency (or probability) specified in the null hypothesis. Check whether the expected frequencies ...
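The E = np step can be shown directly. A small sketch with a hypothetical null distribution of 30%/60%/10% across three categories (the probabilities and sample size are made up for illustration):

```python
# Expected counts E = n * p for a goodness-of-fit test.
n = 100                      # total observations (hypothetical)
probs = [0.30, 0.60, 0.10]   # category probabilities under the null (hypothetical)

expected = [n * p for p in probs]
print(expected)              # [30.0, 60.0, 10.0]

# Common rule of thumb: every expected count should be at least 5,
# otherwise the chi-square approximation is unreliable.
assert all(e >= 5 for e in expected)
```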

  13. Goodness-of-Fit Test

The test statistic for a goodness-of-fit test is: ∑k (O − E)²/E, where: O = observed values (data), E = expected values (from theory), k = the number of different data cells or categories. The observed values are the data values and the expected values are the values you would expect to get if the null hypothesis were true.

  14. Goodness of Fit: Definition & Tests

    The Anderson-Darling test works for continuous data, and the chi-square goodness of fit test is for categorical and discrete data. ... Because the p-value is less than 0.05, reject the null hypothesis and conclude the sample data do not follow a normal distribution.

  15. Hypothesis Testing

Here we show the equivalence to the chi-square goodness-of-fit test. ... The test is called the χ² test of independence and the null hypothesis is that there is no difference in the distribution of responses to the outcome across comparison groups. This is often stated as follows: The outcome variable and the grouping variable (e.g., the ...

  16. When to Use a Chi-Square Test (With Examples)

    You should use the Chi-Square Goodness of Fit Test whenever you would like to know if some categorical variable follows some hypothesized distribution. Here are some examples of when you might use this test: Example 1: Counting Customers. A shop owner wants to know if an equal number of people come into a shop each day of the week, so he counts ...

  17. Chi-Square Goodness of Fit Test

Example: In the gambling example above, the chi-square test statistic was calculated to be 23.367. Since k = 4 in this case (the possibilities are 0, 1, 2, or 3 sixes), the test statistic is associated with the chi-square distribution with 3 degrees of freedom. If we are interested in a significance level of 0.05 we may reject the null hypothesis (that the dice are fair) if χ² > 7.815, the value ...

  18. Chi-Square Goodness of Fit Test Calculator

    A Chi-Square Goodness of Fit Test is used to determine whether or not a categorical variable follows a hypothesized distribution. To perform a Chi-Square Goodness of Fit Test, simply enter a list of observed and expected values for up to 10 categories in the boxes below, then click the "Calculate" button: X 2 Test Statistic: 4.360000.

  19. Chi Square Goodness of Fit

    Chi-square Goodness of Fit is a statistical test commonly used to compare observed data with data we would expect to obtain. ... The level of significance is the maximum tolerable probability of rejecting a true null hypothesis (a Type I error). We use 0.05. ... The resulting value is the P value for the Chi-Square test. If you don't want it to be in ...

  20. Chi-Square Goodness of Fit Test

    The chi-square statistic for goodness of fit test is determined by comparing the actual and expected counts for each level of our categorical variable. The steps to computing the chi-square statistic for a goodness of fit test are as follows: For each level, subtract the observed count from the expected count. Square each of these differences.
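The steps listed above (subtract, square, divide by the expected count, then sum) can be followed literally, keeping each intermediate list visible. The counts are illustrative:

```python
# Step-by-step chi-square computation, mirroring the listed procedure.
observed = [28, 64, 8]    # illustrative observed counts
expected = [30, 60, 10]   # illustrative expected counts

diffs = [e - o for o, e in zip(observed, expected)]        # 1) expected minus observed
squared = [d ** 2 for d in diffs]                          # 2) square each difference
ratios = [s / e for s, e in zip(squared, expected)]        # 3) divide by expected count
chi_sq = sum(ratios)                                       # 4) sum across categories
```

For these numbers the differences are 2, −4, and 2, and the statistic is 4/30 + 16/60 + 4/10 = 0.8, a small value suggesting the observed counts are close to what the null hypothesis predicts.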

  21. Chi-Square Goodness of Fit Test

    Hypothesis testing: Hypothesis testing proceeds as in other tests, like the t-test, ANOVA, etc. The calculated value of the chi-square goodness-of-fit statistic is compared with the table value. If the calculated value is greater than the table value, we reject the null hypothesis and conclude that there is a significant difference between the observed and the expected frequencies.

  22. 4 Examples of Using Chi-Square Tests in Real Life

    1. The Chi-Square Goodness of Fit Test - Used to determine whether or not a categorical variable follows a hypothesized distribution. 2. The Chi-Square Test of Independence - Used to determine whether or not there is a significant association between two categorical variables. In this article, we share several examples of how each of these ...

  23. The Hotel Architectural Design Factors Influencing Consumer ...

    If the calculated value is greater than the critical value, reject the null hypothesis of equal distribution. ... The chi-square goodness of fit test results suggest that the distribution of mentions for factors in the security and sensibility group is uniform, meaning no single factor is disproportionately mentioned more than the others. ...