Retired course
This course has been retired and is no longer supported.
About this lesson
Quick reference
ANOVA
ANOVA is a hypothesis test for comparing the means across multiple samples to determine if they are statistically equivalent.
When to use
The ANOVA tool is widely used in Lean Six Sigma. It is the tool that is used in Gage R&R studies and with Design of Experiments. However, with respect to hypothesis testing, ANOVA is used to test for the equivalence of means across multiple samples when either the X or Y is discrete and the other is continuous.
Instructions
ANOVA stands for ANalysis Of VAriance. It tests the means of multiple samples to determine their equivalence. Unfortunately, when the P Value is low and the Null hypothesis is rejected, the ANOVA does not specifically identify which sample was different. A further study of the data, or in the case of Minitab, the Boxplots, is needed to determine which sample is different.
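To make the "analysis of variance" idea concrete, here is a minimal sketch of the F statistic behind a one-way ANOVA, written in plain Python. The function name `one_way_anova_f` and the sample data are hypothetical and are not taken from the lesson's exercise files. The sketch stops at the F statistic; converting F to a P Value requires the F distribution, which Excel and Minitab handle for you.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group sum of squares: how far each sample mean sits
    # from the grand mean, weighted by the sample size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

    # Within-group sum of squares: scatter of each observation
    # around its own sample mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k

    # F is the ratio of the two mean squares.
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical data: three samples of three measurements each.
samples = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0],
]
f_stat, df_b, df_w = one_way_anova_f(samples)
print(f_stat, df_b, df_w)
```

A large F means the sample means are spread out far more than the scatter inside each sample would explain, which is what drives a low P Value.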
With only two samples, ANOVA performs the same analysis as a Two-sample T Test, so either hypothesis test can be used. When there are more than two samples, ANOVA should be used. Multiple T Tests could be performed on every pair of samples, but each test carries its own risk of a Type I Error, and across multiple tests those risks compound.
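The compounding can be worked out directly. The sketch below reproduces the four-sample case discussed in the lesson video: four samples need six pairwise T Tests, and at a 5% risk per test the chance of at least one Type I Error is about 26.5%. The function name `familywise_error_rate` is hypothetical, and the calculation assumes the tests are independent, which is only an approximation when the pairs share samples.

```python
from math import comb


def familywise_error_rate(num_samples, alpha=0.05):
    """Return (number of pairwise tests, chance of at least one Type I error).

    Assumes every pairwise test is independent and uses the same alpha.
    """
    num_tests = comb(num_samples, 2)  # every pair of samples gets one T Test
    rate = 1 - (1 - alpha) ** num_tests
    return num_tests, rate


tests, rate = familywise_error_rate(4)
print(tests, round(rate, 3))  # prints: 6 0.265
```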
Excel and Minitab can both calculate ANOVA for one or two variables (Excel's tool names call these factors). Minitab can also calculate an ANOVA with more than two variables.
- Excel – single Y variable
- Data Analysis > ANOVA Single Factor
- Enter the data range. Data must be in adjacent columns, and each column is a sample set of data.
- Excel – two Y variables
- Data Analysis > ANOVA Two Factor without Replication
- Enter the data range. Data must be in adjacent columns, and each column is a sample set of data.
- Minitab – single Y variable
- Stat > ANOVA > One Way
- Select the format of your data and then the data columns
- With the Options button you can change the relationship and you can change the assumption of equal variances (based on the results of Bartlett’s test).
- With the graphs button you can select the graph of your choice to visualize the comparison of the mean values.
- Minitab – multiple Y variables
- Stat > ANOVA > General Linear Model > Fit General Linear Model
- Select your Y Response variables
- Select your X Factor variables
- With the Model button, interaction between factors can be added as another variable.
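For readers who want to see what a two-factor analysis without replication computes, here is a minimal sketch in plain Python. It assumes a table with exactly one observation per cell, where rows are the levels of one factor and columns are the levels of the other. The function name `two_factor_anova_f` and the data are hypothetical, and the sketch returns only the two F statistics; the tools above also report the matching P Values.

```python
def two_factor_anova_f(table):
    """Return (F_rows, F_columns) for a table with one value per cell."""
    n_rows = len(table)
    n_cols = len(table[0])
    grand_mean = sum(sum(row) for row in table) / (n_rows * n_cols)

    row_means = [sum(row) / n_cols for row in table]
    col_means = [
        sum(table[i][j] for i in range(n_rows)) / n_rows
        for j in range(n_cols)
    ]

    # Variation explained by each factor.
    ss_rows = n_cols * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n_rows * sum((m - grand_mean) ** 2 for m in col_means)

    # Whatever the two factors do not explain is treated as error.
    ss_total = sum((x - grand_mean) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    df_rows = n_rows - 1
    df_cols = n_cols - 1
    df_error = df_rows * df_cols
    ms_error = ss_error / df_error

    f_rows = (ss_rows / df_rows) / ms_error
    f_cols = (ss_cols / df_cols) / ms_error
    return f_rows, f_cols


# Hypothetical 3 x 3 table: one measurement per row/column combination.
data = [
    [1.0, 2.0, 4.0],
    [2.0, 4.0, 5.0],
    [4.0, 5.0, 9.0],
]
f_rows, f_cols = two_factor_anova_f(data)
print(f_rows, f_cols)
```

Because there is only one value per cell, this layout cannot separate an interaction between the two factors from random error, which is why interaction terms need replicated data or a tool such as the General Linear Model.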
Hints & tips
- If your analysis indicates you should reject the Null hypothesis, rerun the analysis after dropping the data column whose mean is farthest from the other mean values, to check whether the remaining samples are equivalent.
- ANOVA is rather forgiving on the Normality assumption.
- 00:05 Hello, I'm Ray Sheen.
- 00:06 We're now going to look at the ANOVA test.
- 00:08 This test becomes strongly linked to Lean Six Sigma because it is used in so
- 00:13 many of the advanced Lean Six Sigma techniques.
- 00:16 Again we start with the hypothesis testing decision tree.
- 00:20 We have normal data, discrete and continuous X and Y, and
- 00:24 multiple variables, and that gets us to the ANOVA.
- 00:28 So let's look at the one way ANOVA.
- 00:30 ANOVA stands for Analysis of Variance.
- 00:33 We compare the means from each of the samples in the analysis to determine if
- 00:37 one of the means is statistically different from the others.
- 00:41 In that regard, it is similar to the 2-Sample T-Test, but
- 00:44 instead of just two samples, it can have many samples.
- 00:48 It is often used to identify significant subsets of the data in large populations.
- 00:53 That is the feature that is used in both the Gage R&R analysis, and
- 00:57 the design of experiments analysis.
- 00:59 It can sort through the many different data points based upon the categories
- 01:03 being used.
- 01:03 And determine which if any provide a result that is statistically different
- 01:07 from the others.
- 01:09 You may be wondering why we would use ANOVA when we could do the same thing with
- 01:13 T-Tests.
- 01:15 Well, the problem is that you can't get the same fidelity in the answer
- 01:18 by using multiple T-Tests.
- 01:20 Keep in mind that with our normal Lean Six Sigma confidence level of 0.95,
- 01:25 there is still a 5% chance of a Type I error, or false positive.
- 01:29 Now if we have four samples, we would need to conduct six tests.
- 01:33 So the probability of a type I error is compounded.
- 01:37 That means that our chance of making a type I error is now 26.5%.
- 01:42 And keep in mind that the Type II error probability is often even higher than
- 01:46 the Type I error.
- 01:47 So the chance of a Type II error is often even higher still.
- 01:51 The more T-Tests we run, the more likely we are to make a wrong decision,
- 01:55 either Type I or Type II.
- 01:58 However, with ANOVA, only one test needs to be run and
- 02:02 that will check all those relationships between the sample means.
- 02:06 In addition to that,
- 02:07 ANOVA is somewhat forgiving on the assumption of normal data.
- 02:11 It will tolerate a low level of non normal behavior and
- 02:14 still provide excellent results.
- 02:16 So let's look at how we do this test.
- 02:19 In Excel, select the Data Analysis menu from the Data ribbon, then
- 02:23 select ANOVA Single Factor.
- 02:25 Indicate whether the data is grouped by rows or columns and provide the range for
- 02:30 your data table.
- 02:31 The rows or columns must be next to each other and
- 02:34 there cannot be any blank rows or columns in the data.
- 02:37 Excel will calculate a P-value.
- 02:39 In this example, the P-value of 0.022 is less than our 0.05,
- 02:44 so we reject the null hypothesis.
- 02:47 The mean values are statistically different.
- 02:49 Minitab will go one step further in testing using ANOVA.
- 02:53 Start in a similar manner, go to the Stat pull down menu and select ANOVA,
- 02:58 then select One Way.
- 02:59 Select the format of your data columns like you did on other tests and
- 03:03 then select the data column for analysis.
- 03:06 Then go to the Options panel to select equal variances if you have that
- 03:10 condition.
- 03:11 Also, you can go to the Graphs panel to select the type of graphs you want.
- 03:16 I recommend the interval plot under data plots and the three-in-one residual plot.
- 03:20 Minitab will provide both plots and
- 03:23 a summary of the analysis in the session window.
- 03:26 As you can see,
- 03:27 one of the items it provides is a P-value which is still 0.022.
- 03:32 Let's take a minute to look at the graphs that we get from Minitab.
- 03:36 The Minitab and Excel P-value will tell us if there's a statistically
- 03:40 significant difference in the means.
- 03:42 But they don't tell you which sample was the problem.
- 03:45 That is where the graphs add value.
- 03:47 A quick glance at the graphs will normally reveal the difference.
- 03:51 As I mentioned on the last slide, select the Graphs button,
- 03:54 then select both the data graph and residual graphs.
- 03:58 Personally, I like the interval plot, but if a different view works better for you,
- 04:02 then use it.
- 04:03 You can select individual residual plots or do the three in one or
- 04:07 four in one option, depending on the type of test.
- 04:10 So let's look at the interval plot, we have four different categories of data and
- 04:14 the mean was definitely changing between the different categories.
- 04:18 Time 1 and Time 4 don't even overlap at the edges of their confidence intervals.
- 04:23 So they are clearly not from the same population.
- 04:26 We'll probably need to look at other contextual information to understand
- 04:30 possible reasons for
- 04:31 these differences, that could include looking at the residual plot.
- 04:35 The normality line does not look normal and
- 04:38 we would expect that since there are different populations in the data.
- 04:41 This is a typical reason for non-normality and
- 04:45 we see that the histogram was heavily skewed.
- 04:47 Again, an indication that the data is not normal.
- 04:50 It's interesting to see that the versus fit data shows that the range of
- 04:55 residuals was similar, except for that one outlier in the sample on the far right.
- 05:00 That point is up at the upper right hand corner of the plot.
- 05:03 But since the residuals versus fit is similar in other respects,
- 05:07 there is probably not a time-based pattern at work.
- 05:10 But rather, some other attribute that is not captured in the data that is affecting
- 05:15 each of these samples.
- 05:16 So far, we've been trying to evaluate one factor with multiple samples.
- 05:21 However, we can also have multiple factors when working with ANOVA.
- 05:25 Both Excel and Minitab can run an ANOVA with two factors and
- 05:29 in fact Minitab can work with even more than two.
- 05:32 In Excel, select the Data Analysis menu on the data ribbon and
- 05:36 then select ANOVA Two Factor without replication.
- 05:40 We'll talk about replication when we do the class on Design of Experiments.
- 05:45 Now enter the data range in the same way you would with one way ANOVA.
- 05:48 The Excel results will provide a P-value for both column analysis and row analysis.
- 05:54 In Minitab, start with the Stat pulldown menu,
- 05:57 select ANOVA, select General Linear Model, and
- 06:00 then select Fit General Linear Model, and you get this panel.
- 06:05 Now select your Response columns, then select your Factor columns.
- 06:09 You can even include interaction effects by selecting the model button.
- 06:13 And then on the panel that comes up deciding which categories of interactions
- 06:17 you want to include and selecting the Add button for that category.
- 06:21 ANOVA is an excellent test for many applications within Lean Six Sigma.
- 06:27 It does a great job of sorting out complex situations.