About this lesson
The Analysis of Variance (ANOVA) test is a commonly used test in Lean Six Sigma projects. It compares multiple data sets to determine whether there is a statistically significant difference between their means. The analysis can be easily done in both Excel and Minitab. This lesson addresses the basics of ANOVA.
Exercise files
Download this lesson’s related exercise files.
ANOVA Approach Exercise.xlsx (10.7 KB)
ANOVA Approach Exercise Solution.docx (60.2 KB)
Quick reference
ANOVA Approach
The One-way ANOVA is a hypothesis test for comparing the means across multiple samples to determine if they are statistically equivalent.
When to use
The One-way ANOVA tool is the hypothesis testing approach used to test for the equivalence of means across multiple samples when either the X or Y is discrete and the other is continuous.
Instructions
ANOVA stands for ANalysis Of VAriance. It tests the means of multiple samples to determine their equivalence. With exactly two samples, the one-way ANOVA gives the same result as a Two-sample T Test that assumes equal variances, so either hypothesis test can be used. When there are more than two samples, the ANOVA should be used. Multiple T Tests could be performed with every combination of samples, but each of those tests carries its own risk of a Type I Error, and across multiple tests those risks compound.
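The compounding of Type I error risk can be sketched with a few lines of Python. This is a minimal illustration, not part of the lesson materials: the function name `familywise_error_rate` is hypothetical, and the calculation assumes the pairwise tests are independent, which is only an approximation when the tests share samples.

```python
from math import comb

def familywise_error_rate(num_samples, alpha=0.05):
    """Return (number of pairwise tests, chance of at least one Type I error).

    Assumes every pair of samples is tested once and that the tests are
    independent (an approximation used here for illustration).
    """
    num_tests = comb(num_samples, 2)            # pairs of samples to compare
    rate = 1 - (1 - alpha) ** num_tests         # 1 - P(no false positive in any test)
    return num_tests, rate

for k in (2, 3, 4, 5):
    tests, rate = familywise_error_rate(k)
    print(f"{k} samples: {tests} T Tests, chance of a Type I error = {rate:.1%}")
```

With four samples and a 5% alpha risk per test, this gives six tests and a combined risk of about 26.5%, which is why a single ANOVA is preferred over many pairwise tests.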
The form of the hypothesis test for ANOVA, where μi is the mean of the population behind sample i, is:
H0: μ1 = μ2 = μ3 = μ4 = …
Ha: at least one μi differs from the others
ANOVA determines the variation “between” groups and the variation “within” each group. These are compared to see if the “between group” variation is so large that it cannot be accounted for by the normal “within group” variation. This is represented by the F statistic, which is the ratio of the “between” variation to the “within” variation.
The actual values calculated for each are based upon the Mean Sum of Squares formulas for the between and the within conditions. Equivalently, the ratio can be built from the Sum of Squares of the treatment (between) and the Sum of Squares of the residuals (within): dividing each by its degrees of freedom gives the same two Mean Sum of Squares values.
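The calculation described above can be sketched by hand for a small example. The data values and the function name `one_way_anova_f` are made up for illustration, and the sketch stops at the F statistic. In practice, software such as Excel or Minitab also compares F against the F distribution to report a p-value.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (SS between, SS within, F) for a list of samples.

    Sample sizes do not need to be equal.
    """
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)          # number of samples (groups)
    n = len(all_values)      # total number of data points

    # Between: each sample mean compared with the grand mean, weighted by sample size
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within: each data point compared with its own sample mean (the residuals)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)   # divide by between-group degrees of freedom
    ms_within = ss_within / (n - k)     # divide by within-group degrees of freedom
    return ss_between, ss_within, ms_between / ms_within

# Hypothetical measurements from three process settings
samples = [[4, 5, 6], [6, 7, 8], [8, 9, 10, 9]]
ss_b, ss_w, f_stat = one_way_anova_f(samples)
print(f"SS between = {ss_b:.2f}, SS within = {ss_w:.2f}, F = {f_stat:.2f}")
```

For this made-up data the between sum of squares is 27.6 with 2 degrees of freedom and the within sum of squares is 6 with 7 degrees of freedom, so F = 13.8 / 0.857 ≈ 16.1.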
Hints & tips
- The one-way ANOVA is the preferred method for analyzing the means of multiple datasets.
- Each dataset should have enough data points to determine a statistically significant mean value. With very small sample sizes, the confidence interval on each within-group mean is so wide that a large spread of between-group means can exist without a statistically significant result.
- Whenever possible, use software for conducting the ANOVA to avoid the possibility of math errors.
Transcript
- 00:04 Hi, I'm Ray Sheen.
- 00:06 Well we're now going to talk about the ANOVA test.
- 00:09 This test has become strongly linked to Lean Six Sigma because
- 00:13 it's used in many of the advanced Lean Six Sigma analyses.
- 00:18 Again, we start with a hypothesis testing decision tree with discrete and
- 00:23 continuous x and y data, normal data, and multiple variables.
- 00:28 That gets to ANOVA.
- 00:30 Just to be clear, we could use ANOVA test with two variables, but
- 00:33 the T-Test is much simpler from a mathematical standpoint.
- 00:37 However, since we are letting the computer do the math there's nothing wrong with
- 00:42 using an ANOVA test with just two samples.
- 00:45 So let's look at the one-way ANOVA.
- 00:48 ANOVA stands for ANalysis of VAriance.
- 00:50 A bit of a misnomer because it uses both the means and
- 00:53 the variance in the analysis.
- 00:56 It compares the means from each of the samples in the analysis to determine if
- 01:00 one of the means is statistically different from the others.
- 01:04 In this regard, it's similar to the two sample T-Test, but
- 01:07 instead of just two samples it can have many samples.
- 01:11 It's often used to identify significant subsets of
- 01:14 the data in a large population. That is the feature behind its use in
- 01:18 both the Gage R&R analysis and the design of experiments analysis.
- 01:23 It can sort through the many data points based upon the categories being used and
- 01:27 determine which, if any,
- 01:29 provides a result that is statistically different from the others.
- 01:33 So the null hypothesis is that the mean of all the subsets is the same.
- 01:38 There's nothing to see here.
- 01:39 And, of course, the alternative hypothesis is that at least one of these subsets is
- 01:43 significantly different from the others.
- 01:46 You may be wondering why we would use ANOVA when we could do the same thing with
- 01:50 T-Tests.
- 01:51 Well, the problem is that you can't get the same fidelity in the answer by using
- 01:56 multiple T-Tests.
- 01:57 Keep in mind that with our normal Lean Six Sigma confidence level of 0.95
- 02:02 there's still a 5% chance for a type I error, a false positive.
- 02:07 If we had four samples we would need to conduct six T-Tests.
- 02:11 So the probability of a Type I error is compounded, that means that our
- 02:16 chance of making a Type I error across all six tests is now up to 26.5%.
- 02:21 And keep in mind that the Type II error probability is often higher than
- 02:26 the Type I error, so the chance of a Type II error is even higher.
- 02:30 The more T-Tests we run, the more likely we are to make a wrong decision.
- 02:35 However, with ANOVA it's only one test that needs to be run and
- 02:38 that will check all of the relationships between the sample means.
- 02:42 In addition to that, ANOVA is somewhat forgiving on the assumption
- 02:46 of normal data, it will tolerate a low level of non-normality and
- 02:51 still provide excellent results.
- 02:54 Let's spend another minute on the ANOVA approach.
- 02:58 The ANOVA analysis will determine if the between group variation is large
- 03:03 enough to distinguish a difference from the within group variation.
- 03:07 Now between group variation is a variation of the different subset means.
- 03:13 The within group variation, this is where variance comes into play,
- 03:17 is based upon the variance that is occurring within the entire group itself.
- 03:21 ANOVA will calculate an F statistic, which is the ratio of the between
- 03:27 group mean sum of squares over the within group mean sum of squares.
- 03:32 Another feature of ANOVA is that it does this calculation without assuming that
- 03:37 the sample sizes in the groups are the same.
- 03:41 The formula for each of these mean sum of squares is shown here.
- 03:46 Note that the between group mean sum of squares is comparing sample means to
- 03:50 the grand mean of all the data, while the within group mean sum of squares is
- 03:54 looking at the variation in each unique sample and then summing those results.
- 04:00 Quick word about the sum of squares.
- 04:03 The F statistic can also be expressed using the sum of squares of the treatment
- 04:08 and the sum of squares of the residuals.
- 04:10 The sum of squares of the treatment is another way of expressing the between
- 04:15 variance.
- 04:16 You can see the equation here.
- 04:18 And the sum of squares of the residual error
- 04:20 is another way of expressing the within variance.
- 04:24 It's based upon the residuals of each of the data points.
- 04:28 We can then convert the sum of square values into mean square values
- 04:32 when we divide by the appropriate number of degrees of freedom.
- 04:37 This is tied to the number of samples that we are working with.
- 04:40 The ratio indicates the magnitude of these effects compared with
- 04:44 the inherent variability.
- 04:45 The larger the magnitude, the larger the effect.
- 04:48 If the magnitude of the ratio is small, it implies that whatever variation we see
- 04:53 from sample to sample, well that's just based upon chance because
- 04:57 all the variation is still within the predicted interval for variation.
- 05:02 ANOVA is a great test for many applications within Lean Six Sigma, and
- 05:08 in particular, it's well suited for sorting out a very complex situation.
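The remark in the transcript that ANOVA can be used with just two samples can be checked numerically. The sketch below uses made-up data and hypothetical function names (`pooled_t`, `anova_f`). It compares the one-way ANOVA F statistic with the square of the two-sample t statistic computed with a pooled (equal-variance) estimate, which is the version of the T-Test that matches ANOVA.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(a, b):
    """Two-sample t statistic using a pooled (equal-variance) estimate."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))

def anova_f(groups):
    """One-way ANOVA F statistic: between mean square over within mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical data: two samples of unequal size
sample_a = [4, 5, 6]
sample_b = [6, 7, 8, 9]

t_stat = pooled_t(sample_a, sample_b)
f_stat = anova_f([sample_a, sample_b])
print(f"t squared = {t_stat ** 2:.4f}, F = {f_stat:.4f}")
```

The two printed values agree, so with two samples the choice between the tests is a matter of convenience. The advantage of ANOVA appears only when a third sample is added.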
PMI, PMP, CAPM and PMBOK are registered marks of the Project Management Institute, Inc.