About this lesson
The ANOVA analysis can be done either numerically with a P value or graphically. The P value indicates whether at least one of the datasets' means is statistically different. The graphical analysis shows which dataset or datasets are different.
Quick reference
ANOVA Analysis
ANOVA is a hypothesis test for comparing the means across multiple samples to determine if they are statistically equivalent.
When to use
The ANOVA tool is widely used in Lean Six Sigma. The two-way ANOVA is used in Gage R&R studies and with Design of Experiments. The one-way ANOVA is used as the hypothesis test to test for the equivalence of means across multiple samples when either the X or Y is discrete and the other is continuous.
Instructions
The one-way ANOVA tests the means of multiple datasets to determine if they are statistically different. The two-way ANOVA will compare multiple independent factors to determine which are statistically different. Lean Six Sigma hypothesis testing uses one-way ANOVA. Several advanced statistical techniques used with Lean Six Sigma, such as Design of Experiments and Measurement Systems Analysis, rely on the two-way ANOVA. Those techniques are addressed in separate courses.
The ANOVA analysis can be done manually using the equations introduced in a previous lesson on ANOVA. In that lesson, the calculation for the F statistic was introduced. That F statistic should be compared to the value from the F Distribution table: if the calculated F statistic is larger than the table value, there are statistically significant differences between the means of the datasets. The value in the F Distribution table is found by using the number of degrees of freedom between the datasets (the table column) and the total number of degrees of freedom within the datasets (the table row).
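The manual method described above can be sketched in a few lines of Python. This is only an illustration: the three samples are hypothetical (they are not the lesson's exercise data), and the sums-of-squares formulas are the standard one-way ANOVA definitions, which are assumed to match the equations from the previous lesson. The critical-value helper uses a closed form that holds only when there are exactly three samples (2 degrees of freedom between the datasets); for any other case, use the F Distribution table or statistical software.

```python
# Hypothetical data: three samples of six items each (not from the lesson).
samples = {
    "A": [9, 11, 10, 12, 11, 13],
    "B": [11, 13, 12, 14, 13, 15],
    "C": [13, 15, 14, 16, 15, 17],
}


def one_way_anova(groups):
    """Return (F statistic, df between, df within) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Variation of the sample means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the items around their own sample mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1                  # number of samples minus 1
    df_within = sum(len(g) - 1 for g in groups)   # sum of (sample size minus 1)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def f_critical_three_samples(alpha, df_within):
    """Critical F value; this closed form is valid only when df between is 2."""
    return (df_within / 2) * (alpha ** (-2 / df_within) - 1)


f_stat, df_between, df_within = one_way_anova(list(samples.values()))
f_crit = f_critical_three_samples(0.05, df_within)
reject_null = f_stat > f_crit

print(f"F = {f_stat:.2f}, df = ({df_between}, {df_within}), critical F = {f_crit:.2f}")
print("Reject the null hypothesis" if reject_null else "Fail to reject the null hypothesis")
```

With three samples of six items, the degrees of freedom are 2 and 15, so the critical value agrees with the 3.68 read from the F Distribution table at an alpha of 0.05.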
Excel and Minitab can both calculate an ANOVA with one or two factors (X variables). Minitab can also calculate an ANOVA with more than two factors.
- Excel – one factor
  - Data Analysis > Anova: Single Factor
  - Enter the data range. The data must be in adjacent columns, and each column is a sample set of data.
- Excel – two factors
  - Data Analysis > Anova: Two-Factor Without Replication
  - Enter the data range. The data must be in adjacent rows and columns, with the rows representing the levels of one factor and the columns the levels of the other.
- Minitab – one factor
  - Stat > ANOVA > One-Way
  - Select the format of your data and then the data columns.
  - With the Options button you can change the relationship and you can change the assumption of equal variances (based upon the result of Bartlett’s test).
  - With the Graphs button you can select the graph of your choice to visualize the comparison of the mean values.
- Minitab – multiple factors
  - Stat > ANOVA > General Linear Model > Fit General Linear Model
  - Select your Y response variables.
  - Select your X factor variables.
  - With the Model button, interaction between factors can be added as another term.
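For readers who want to see what a two-factor analysis without replication works out, here is a minimal Python sketch. It is an assumption-laden illustration, not a reproduction of Excel's or Minitab's internals: the 3 x 3 table is hypothetical, the formulas are the standard two-factor sums of squares, and the P value helper uses a closed form that is valid only when the factor being tested has exactly three levels (2 degrees of freedom).

```python
# Hypothetical 3 x 3 table: rows are levels of one factor, columns of the other.
table = [
    [4, 6, 8],
    [5, 8, 8],
    [6, 7, 11],
]


def two_factor_anova(data):
    """F statistics for the row factor and the column factor (no replication)."""
    n_rows, n_cols = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in data]
    col_means = [sum(row[j] for row in data) / n_rows for j in range(n_cols)]
    ss_rows = n_cols * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n_rows * sum((m - grand) ** 2 for m in col_means)
    # What is left after removing the row and column effects.
    ss_error = sum(
        (data[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n_rows)
        for j in range(n_cols)
    )
    df_rows, df_cols = n_rows - 1, n_cols - 1
    df_error = df_rows * df_cols
    ms_error = ss_error / df_error
    return {
        "f_rows": (ss_rows / df_rows) / ms_error,
        "f_columns": (ss_cols / df_cols) / ms_error,
        "df_error": df_error,
    }


def p_value_two_numerator_df(f_stat, df_error):
    """P value for an F statistic; valid only when the numerator df is 2."""
    return (1 + 2 * f_stat / df_error) ** (-df_error / 2)


result = two_factor_anova(table)
p_rows = p_value_two_numerator_df(result["f_rows"], result["df_error"])
p_cols = p_value_two_numerator_df(result["f_columns"], result["df_error"])

print(f"Rows:    F = {result['f_rows']:.2f}, P = {p_rows:.3f}")
print(f"Columns: F = {result['f_columns']:.2f}, P = {p_cols:.3f}")
```

In this made-up table the column factor has a P value below 0.05 and the row factor does not, which is the kind of separate row and column result the two-factor output reports.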
Unfortunately, when the P value is low and the null hypothesis is rejected, the one-way ANOVA does not specifically identify which sample is different. Further study of the data or, in the case of Minitab, of the graphs (such as the box plot or interval plot) is needed to determine which sample is different.
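One way to study the data further without Minitab is to calculate a confidence interval for each sample mean and look for intervals that do not overlap, which is roughly what an interval plot shows. The sketch below is a hedged illustration only: the samples are hypothetical, the intervals assume a pooled standard deviation across the samples, and the t value of 2.131 (two-sided 95%, 15 degrees of freedom) is taken from a standard t table rather than calculated.

```python
import math

# Hypothetical data: three samples of six items each (not from the lesson).
samples = {
    "A": [9, 11, 10, 12, 11, 13],
    "B": [11, 13, 12, 14, 13, 15],
    "C": [13, 15, 14, 16, 15, 17],
}

# Two-sided 95% t value for 15 degrees of freedom, read from a t table.
T_CRIT_95_DF15 = 2.131

means = {name: sum(v) / len(v) for name, v in samples.items()}
ss_within = sum((x - means[name]) ** 2 for name, v in samples.items() for x in v)
df_within = sum(len(v) - 1 for v in samples.values())  # 15 for this data
pooled_sd = math.sqrt(ss_within / df_within)

# 95% confidence interval for each sample mean, using the pooled standard deviation.
intervals = {}
for name, v in samples.items():
    half_width = T_CRIT_95_DF15 * pooled_sd / math.sqrt(len(v))
    intervals[name] = (means[name] - half_width, means[name] + half_width)

# Pairs of samples whose intervals do not overlap at all.
names = sorted(intervals)
non_overlapping = [
    (a, b)
    for i, a in enumerate(names)
    for b in names[i + 1:]
    if intervals[a][1] < intervals[b][0] or intervals[b][1] < intervals[a][0]
]

for name in names:
    low, high = intervals[name]
    print(f"{name}: mean {means[name]:.2f}, 95% CI ({low:.2f}, {high:.2f})")
print("Non-overlapping pairs:", non_overlapping)
```

Treat non-overlapping intervals as a visual screen for where the difference lies, not as a formal follow-up test.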
Hints & tips
- When hypothesis testing, use one-way ANOVA. The two-way ANOVA is for more complex analysis.
- If your analysis indicates you should reject the null hypothesis, drop the data column whose mean is farthest from the other mean values and rerun the analysis to see whether the remaining samples are statistically equivalent.
- ANOVA is rather forgiving on the Normality assumption.
Transcript
- 00:04 Hi, I'm Ray Sheen.
- 00:05 Well, we've considered a bit of the theory behind ANOVA.
- 00:08 Now let's take a look at how we actually do the analysis.
- 00:12 There are actually several flavors to the ANOVA test.
- 00:15 One-way ANOVA compares the mean values of multiple data sets.
- 00:20 It seeks to determine if there is a significant difference.
- 00:23 Two-way ANOVA compares the means of independent factors within
- 00:28 the data sets to determine if there is a significant difference.
- 00:32 We can do the one-way ANOVA analysis in three different ways.
- 00:36 We can do it manually using the table on the next slide.
- 00:39 We can use Excel, or
- 00:41 we can use a statistical software package such as Minitab.
- 00:45 For two-way ANOVA, you have to use statistical software.
- 00:49 Like many of our hypothesis tests,
- 00:52 the results of the ANOVA can be shown graphically or with a statistical value.
- 00:58 Let's look at the manual method with the F Distribution Table.
- 01:02 You may have a question on the IASSC Lean Six Sigma Black Belt exam that
- 01:05 will require the use of this table.
- 01:07 The table is based upon several data set parameters.
- 01:10 In this case, we're using an alpha of 0.05.
- 01:14 If different parameters are used, it'll require a different table.
- 01:18 We will do a quick exercise.
- 01:20 We have three samples, and each of these three samples is made up of 6 items.
- 01:27 The column entry is the degrees of freedom based upon the number of samples.
- 01:31 Well, that degree of freedom is calculated by subtracting 1 from the total number of
- 01:35 datasets.
- 01:36 In our case, that means 3-1, which is 2.
- 01:40 So we go to column number 2.
- 01:42 Now, the row entry is the sum of the degrees of freedom for
- 01:46 each of the unique samples.
- 01:48 Since each sample, in this case, had 6 items,
- 01:51 the degree of freedom for each sample is 1 less than that, or 5.
- 01:56 And since there are 3 of those, we add 5 three times and come up with a total of 15.
- 02:03 So we go in on row 15.
- 02:05 So looking at column 2 and row 15, the F value then is 3.68.
- 02:11 Now let's look at how we do the test in Excel or Minitab.
- 02:14 In Excel, select the data analysis menu from the data ribbon.
- 02:18 Then select ANOVA single factor.
- 02:21 Know whether the data is grouped by rows or columns and provide the range for
- 02:26 the data table.
- 02:27 The rows or columns must be next to each other,
- 02:30 there cannot be any blank rows or columns in the data.
- 02:34 Excel will calculate a P value.
- 02:37 In this example, the P value of 0.022 is less than 0.05,
- 02:42 so reject the null hypothesis.
- 02:44 The mean values are statistically different.
- 02:47 Minitab will go one step further in its analysis using ANOVA.
- 02:51 Start in a similar manner, so go to the Stat pulldown menu,
- 02:55 select ANOVA, then select One Way.
- 02:58 Select the format of your data columns like you did on other tests.
- 03:02 Then select the data columns for analysis.
- 03:05 Now, go to the Options panel and select Equal Variance if you have that condition.
- 03:10 Also, you can go to the Graphs panel to select the types of graphs that you want.
- 03:15 I recommend the interval plot under the data plots and
- 03:19 the Three in One residual plot.
- 03:21 Minitab will provide both plots and
- 03:23 a summary of the analysis in the session window.
- 03:27 As you can see, one of the items it provided is the P value,
- 03:32 which is 0.022.
- 03:34 Let's take a minute to look at the graphs that we get from Minitab.
- 03:38 The Minitab and Excel P values will tell you that there is a statistically significant
- 03:43 difference in the means.
- 03:44 But that doesn't tell you which sample was the problem.
- 03:48 This is where the graphs add value.
- 03:51 A quick glance at the graphs will normally reveal the difference.
- 03:54 As I mentioned in the last slide, select the graphs button and
- 03:58 then select both data graphs and residual graphs.
- 04:01 Personally, I like the interval plot, but if a different view works better for
- 04:06 you then use it.
- 04:06 You can select individual residual plots or
- 04:10 do the Three in One or Four in One option.
- 04:13 So let's look at the interval plot.
- 04:16 We have four different categories of data and
- 04:18 the mean was definitely changing between these different categories.
- 04:23 The mean for time 1 falls within the confidence interval for time 2, and
- 04:27 the mean for time 2 falls within the confidence interval for time 3.
- 04:32 The mean for time 3 falls within the confidence interval for
- 04:35 time 2 and time 4.
- 04:37 And the mean for time 4 falls within the confidence interval for time 3, but
- 04:41 the mean for time 4 does not fall within the confidence interval for time 1.
- 04:46 There's no overlap for those confidence intervals.
- 04:48 So they're clearly not from the same population.
- 04:52 We may need to look at other contextual information to
- 04:54 understand the possible reason for the difference.
- 04:58 That could include looking at the residual plots.
- 05:01 This line does not look normal, and
- 05:03 we would expect that since there are different populations in the data.
- 05:07 That is a typical reason for non-normality.
- 05:11 And we see that the histogram is heavily skewed,
- 05:14 again an indication that the data is not normal.
- 05:17 It's interesting to note that the versus fit data shows that the range of residuals
- 05:23 was similar except for that outlier point on the sample at the far right.
- 05:27 The point is in the upper right corner of that plot.
- 05:30 But since the residual versus fit is similar on the other plots,
- 05:34 this is probably not a time based pattern at work but rather there's some other
- 05:39 special cause attribute not reflected in this data that's affecting the samples.
- 05:44 So far, we've been trying to evaluate one factor with multiple samples.
- 05:49 However, we can also have multiple factors when working with ANOVA.
- 05:54 Both Excel and Minitab can run an ANOVA with two factors, and
- 05:59 Minitab can even run it with more than two.
- 06:01 In Excel, select the Data Analysis menu on the data ribbon, and
- 06:06 then select ANOVA Two Factor without replication.
- 06:10 We'll talk about replication when we do the class on the design of experiments.
- 06:15 Now enter the data range in the same way as you would with a one-way ANOVA,
- 06:20 the Excel results will provide a P value for both a column analysis and
- 06:24 a row analysis.
- 06:25 In Minitab, start with the stat pulldown menu,
- 06:29 select ANOVA, select General Linear Model,
- 06:32 then select Fit General Linear Model, and you get this panel.
- 06:36 Next, select your response columns and then select your factor columns.
- 06:41 You can even include interaction effects by selecting the Model button, and
- 06:45 then on the panel that comes up deciding which categories of interaction you want
- 06:49 to include, and selecting the Add button for that category.
- 06:54 Well, there are several ways to do the ANOVA analysis.
- 06:57 Take your pick based upon the tools available to you.