Retired course
This course has been retired and is no longer supported.
About this lesson
Exercise files
Download this lesson’s related exercise files.
Chi Square Test.xlsx (11.7 KB)
Chi Square Test - Solution.docx (220 KB)
Quick reference
Chi Square Test
The Chi Square test is a hypothesis test that considers categories and counts of discrete data items to determine whether the categories are independent.
When to use
When the data being analyzed is discrete and consists of counts in different categories, the Chi Square test is the appropriate hypothesis test to determine whether the categories are independent.
Instructions
The Chi Square test is a commonly used test to determine the independence of categories within a data set. This test can be used to separate statistically significant dependent relationships from those that are independent. Chi Square can be used when working across multiple sample sets of data.
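For reference, the calculation can be written out in the standard Pearson form. The notation here is ours, not the lesson's: O_ij is the Actual count in row i and column j, R_i and C_j are the row and column totals, N is the grand total, and r and c are the numbers of rows and columns. The expected count (row percentage times column percentage times grand total) simplifies to R_i C_j / N. As far as we understand, Excel's CHISQ.TEST and Minitab then compare the statistic below against a chi-square distribution with the stated degrees of freedom.

$$E_{ij} = \frac{R_i}{N}\cdot\frac{C_j}{N}\cdot N = \frac{R_i\,C_j}{N}$$

$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{\left(O_{ij}-E_{ij}\right)^2}{E_{ij}}, \qquad \mathrm{df} = (r-1)(c-1)$$

The P value is the probability of seeing a chi-square value at least this large if the null hypothesis of independence were true.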
The data is normally organized in a table of counts. The columns are the count categories and the rows represent the different samples as shown in the table below.
- Excel has a function for conducting a Chi Square test.
- The data is first recorded in a table in the format shown above – this is the “Actuals” table.
- An “Expected” table is created by calculating, for each cell in the matrix, the row percentage times the column percentage times the grand total of the counts.
- The total for each row and each column should be the same in both matrices, although the Actual matrix will have whole number counts in each cell and the Expected matrix will have a calculated value that is normally not a whole number in each cell.
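- Use the CHISQ.TEST function, providing the Actual range as the first argument and the Expected range as the second: CHISQ.TEST(actual_range, expected_range).
- Excel returns the P value for the test of independence.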
- Minitab is able to calculate a Chi Square test.
- Stat > Tables > Chi Square Test for Association
- Select the data columns. (You do not need to create the “Expected” table; Minitab will do that automatically.)
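The Excel steps above can be sketched in Python as a cross-check. This is a minimal, standard-library-only sketch under our own assumptions: the names expected_table, upper_gamma_q and chisq_test are hypothetical, a table is a list of rows of counts with at least two rows and two columns, and the P value is taken as the upper tail of a chi-square distribution with (rows - 1) x (columns - 1) degrees of freedom, evaluated with a textbook series / continued-fraction method for the regularized incomplete gamma function. It is not Excel's or Minitab's own code.

```python
import math


def expected_table(actual):
    """Build the "Expected" table: row % x column % x grand total for each cell."""
    row_totals = [sum(row) for row in actual]
    col_totals = [sum(col) for col in zip(*actual)]
    grand_total = sum(row_totals)
    return [[(r / grand_total) * (c / grand_total) * grand_total for c in col_totals]
            for r in row_totals]


def upper_gamma_q(a, x):
    """Regularized upper incomplete gamma Q(a, x); the chi-square upper tail is Q(df/2, stat/2)."""
    if x <= 0:
        return 1.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1:
        # Series for the lower regularized gamma P(a, x), then Q = 1 - P.
        term = total = 1.0 / a
        n = 0
        while abs(term) > abs(total) * 1e-15:
            n += 1
            term *= x / (a + n)
            total += term
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction for Q(a, x), evaluated with the modified Lentz method.
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(log_prefactor) * h


def chisq_test(actual, expected):
    """CHISQ.TEST-style P value (assumes at least 2 rows and 2 columns)."""
    statistic = sum((o - e) ** 2 / e
                    for a_row, e_row in zip(actual, expected)
                    for o, e in zip(a_row, e_row))
    dof = (len(actual) - 1) * (len(actual[0]) - 1)
    return upper_gamma_q(dof / 2.0, statistic / 2.0)


actual = [[10, 20, 30],   # hypothetical counts, not the lesson's data
          [30, 20, 10]]
expected = expected_table(actual)    # every cell is 20 (up to float rounding)
print(chisq_test(actual, expected))  # about 4.54e-05, i.e. e**-10 for chi-square = 20, df = 2
```

The counts in the usage lines are made up for illustration; pass only the data cells of your Actuals table, not the total and percentage rows and columns. If SciPy is available, scipy.stats.chi2_contingency performs the same Pearson test and also returns the expected table (note that it applies Yates' continuity correction to 2 x 2 tables by default).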
Hints & tips
- If doing the analysis in Excel, be sure the totals for columns and rows are the same in both the Actual and Expected tables.
- Chi Square will tell you whether at least some of the factors are dependent (that is, whether to reject the null hypothesis that all factors are independent). However, if some factors are related and others are not, it will not separate out which factors are independent and which are dependent. You will need to test that by reducing some of the columns in your table.
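The two hints above can be supported with small helpers. This is a sketch under our own assumptions: totals, totals_match and drop_columns are hypothetical names, and tables are lists of rows. The first check confirms that the Actual and Expected tables share row and column totals before CHISQ.TEST is called; the second builds a reduced table from a chosen subset of columns so the test can be re-run on it.

```python
import math


def totals(table):
    """Return (row totals, column totals) for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return row_totals, col_totals


def totals_match(actual, expected, tol=1e-9):
    """True if both tables have the same shape and the same row and column totals."""
    a_rows, a_cols = totals(actual)
    e_rows, e_cols = totals(expected)
    if len(a_rows) != len(e_rows) or len(a_cols) != len(e_cols):
        return False
    return all(math.isclose(a, e, abs_tol=tol)
               for a, e in zip(a_rows + a_cols, e_rows + e_cols))


def drop_columns(table, keep):
    """Keep only the column indexes listed in `keep`.

    The Expected table must then be rebuilt from this reduced Actual table,
    because its row totals and grand total change.
    """
    return [[row[j] for j in keep] for row in table]


actual = [[10, 20, 30],   # hypothetical counts, not the lesson's data
          [30, 20, 10]]
print(totals_match(actual, [[20, 20, 20], [20, 20, 20]]))  # True
print(totals_match(actual, [[20, 20, 20], [20, 20, 21]]))  # False
print(drop_columns(actual, [0, 2]))                         # [[10, 30], [30, 10]]
```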
- 00:04 Hi I'm Ray Sheen.
- 00:05 Now, sometimes we have many data sets with discrete data, and
- 00:09 we want to understand whether there's a relationship between them.
- 00:13 The Chi Square Hypothesis Test will answer that question for us.
- 00:18 Starting with our hypothesis test decision tree, we go to normal data, discrete x and
- 00:23 y data, and more than two variables, which is the chi-square test.
- 00:27 The chi-square test is ideally suited for hypothesis tests on multiple attribute data.
- 00:33 If you have two or more discrete variables in your sample,
- 00:36 it can determine whether they are independent or linked.
- 00:39 Some of you may be aware of other uses of the Chi Square Test in basic research,
- 00:43 such as checking for goodness of fit or homogeneity.
- 00:46 We will focus on Lean Six Sigma projects and multiple discrete attribute
- 00:50 testing. The test can use more than just true/false or on/off data;
- 00:54 it is excellent for counts.
- 00:56 If you're comparing two different samples,
- 00:57 the two-sample test of proportions is best.
- 01:00 But if you're comparing two or
- 01:01 more variables within one sample, then use this test.
- 01:05 This test relies on the number of counts for each variable or
- 01:08 attribute characteristic to determine if the different attributes are truly
- 01:12 independent or if they are related to each other or to some other factor.
- 01:17 When using this test in Lean Six Sigma, be careful to check that your categories represent
- 01:21 real, potential root causes.
- 01:23 The Chi-Square test doesn't know whether the categories make sense
- 01:26 to track separately.
- 01:28 If you're trying to determine factors relating to people who have arthritis,
- 01:32 Chi-Square will likely show you that there is a relationship between people with
- 01:35 gray hair and arthritis, but that does not mean gray hair causes arthritis.
- 01:40 Obviously, a much more relevant factor would be age,
- 01:43 which increases the likelihood of both gray hair and susceptibility to arthritis.
- 01:49 The null hypothesis for Chi-Square is always that the factors are all independent.
- 01:53 There is nothing connecting any of them.
- 01:55 The alternative hypothesis is that the factors are dependent.
- 01:58 There is a relationship between at least two of the factors.
- 02:02 Let's look at how to do this test with Excel.
- 02:05 It's a rather complex process that must be followed.
- 02:08 Excel uses the CHISQ.TEST function to compare two tables.
- 02:13 One table is the actual values and one is expected values.
- 02:17 The actual values we get from our data,
- 02:19 the expected values we have to create manually.
- 02:23 Both tables are structured in the same manner.
- 02:25 The columns are the event categories where we have the count data.
- 02:29 The rows are the categories we're using to create the different proportions.
- 02:34 The actual table is easy to populate.
- 02:37 Create your row and column categories and then count the instances for
- 02:40 each cell in the table.
- 02:41 The expected table is calculated from the percentage values for
- 02:46 each row total and each column total in the actual table.
- 02:51 The value in each cell is the row percentage times the column percentage times
- 02:55 the total count of all items.
- 02:57 Let's look at an example.
- 02:59 In this example, we want to determine whether the type of entertainment venue people
- 03:03 attend is dependent upon their age.
- 03:06 The null hypothesis is that the attendance at the different venues and
- 03:09 a person's age are independent.
- 03:12 The alternative hypothesis is that the type of entertainment venue attended
- 03:15 does depend upon the age of the individual.
- 03:18 This table is the actuals table.
- 03:20 The ages are divided into eight categories, and
- 03:23 there are seven different entertainment venues.
- 03:25 The total for each row and each column is added up and the total for
- 03:30 the entire table is added up in the lower right corner.
- 03:33 A total of 1181 data points in the table.
- 03:37 Now, the percentage for each row and each column is also
- 03:42 determined by dividing the row or column totals by 1181.
- 03:47 Now we build the expected table.
- 03:49 To do this for each cell, take the column percentage for that cell, multiply it
- 03:54 times the row percentage for that cell and then multiply that times the grand total.
- 04:00 So for our upper left cell, we take the column percentage for
- 04:03 movies of 19.31%, multiply it times the row percentage for
- 04:09 age 6 to 12 which is 5.42% and then multiply that
- 04:13 result times 1181 which is the total number of counts in the table.
- 04:19 This gives us a value of 12.36 for that cell.
- 04:24 Then continue that process for all of the other 55 cells in the table.
- 04:28 Now you are ready to use the CHISQ.TEST function.
- 04:31 The arguments for that function are the ranges of the two tables.
- 04:35 Be sure that you do not use the total and percentage rows and
- 04:38 columns in the actual table, just the rows and columns with data.
- 04:43 The answer from Excel is 1.59 times 10 to the -39 (displayed as 1.59E-39), which means there are 38
- 04:50 zeros to the right of the decimal point before we start to see any other digits.
- 04:55 In Lean Six Sigma, we call that a P value of 0, so reject the null hypothesis.
- 05:00 The entertainment venue selected by an individual does depend upon their age.
- 05:06 Well, the Chi Square Test is much easier to complete in Minitab.
- 05:09 The Minitab version of Chi Square Test uses the same actual table as Excel.
- 05:14 But Minitab will create the expected table for
- 05:16 you, so you don't need to do that manually.
- 05:19 To do the test, go to the Stat pulldown menu, select Tables which is
- 05:23 near the bottom of the list, and then select Chi Square Test for association.
- 05:28 That will bring up this panel.
- 05:29 Assuming your data is in the table format, select Summarized data in a two-way table.
- 05:35 Then select the table columns that contain data.
- 05:38 Now select the column with your category labels.
- 05:41 In our example, it was the Ages column.
- 05:43 Minitab now creates the expected table, runs the calculation, and
- 05:48 provides a P value just like with Excel.
- 05:50 The P value is 0, so reject the null hypothesis.
- 05:54 Age does influence entertainment venue selection.
- 05:57 The Chi Square test takes some work to do in Excel.
- 06:02 It's definitely much easier in Minitab.
- 06:05 But whichever application you use,
- 06:07 this hypothesis test can bring order to confusion.
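The arithmetic quoted in the entertainment-venue example can be checked with a few lines of Python. The helper name expected_cell is hypothetical, the 0.05 significance level is our assumption (the lesson does not state one), and the degrees-of-freedom line applies the standard (rows - 1) x (columns - 1) rule rather than anything shown in the lesson.

```python
import math


def expected_cell(col_pct, row_pct, grand_total):
    """One Expected-table cell: column % x row % x grand total."""
    return col_pct * row_pct * grand_total


# Figures quoted in the lesson: movies column 19.31%, ages 6 to 12 row 5.42%, 1181 counts.
movies_6_to_12 = expected_cell(0.1931, 0.0542, 1181)
print(round(movies_6_to_12, 2))  # 12.36

# 8 age categories x 7 venues; standard Pearson degrees of freedom (assumption).
rows, cols = 8, 7
cells = rows * cols
dof = (rows - 1) * (cols - 1)

# Excel shows 1.59E-39, i.e. 1.59 x 10**-39 (not e**-39).
p_value = 1.59e-39
leading_zeros = -math.floor(math.log10(p_value)) - 1
alpha = 0.05  # assumed significance level
print(cells, dof, leading_zeros, p_value < alpha)  # 56 42 38 True
```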