About this lesson
The statistical analysis of a small sample of data can point to root causes of problems in the full data set. The selection of the correct statistical test is key to conducting an effective analysis.
Quick reference
Advanced Statistical Analysis
When to use
Advanced statistical analysis, in the form of hypothesis testing (a branch of inferential statistics), is done during the Analyze stage of a Lean Six Sigma project.
Instructions
Hypothesis testing is a broad area of inferential statistics, and there are many different hypothesis tests. Selecting the right test is an essential step in effective hypothesis testing. The selection is based on factors such as:
- Whether the X and Y elements of the problem are continuous data or discrete (attribute) data
- Whether the data is normal or non-normal
- Whether the data is stable, exhibiting only common cause variation, or whether a special cause is present
- The number of samples or variables available within the datasets
In some cases, more than one test may be applicable, with one test comparing the mean values of the datasets and another comparing their variances.
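As a rough illustration of how these factors narrow the choice, the sketch below maps them to the names of commonly used tests. The `DataProfile` fields and the `suggest_test` function are hypothetical names introduced here, and the mapping is a simplified assumption that only covers comparisons of two or more samples. It is not the decision table from the Hypothesis Testing course.

```python
from dataclasses import dataclass


@dataclass
class DataProfile:
    """Answers to the selection questions (all field names are hypothetical)."""
    stable: bool            # have special causes been removed?
    normal: bool            # is the sample approximately normal?
    x_continuous: bool      # is the X (input) continuous?
    y_continuous: bool      # is the Y (output) continuous?
    groups: int = 2         # number of samples being compared
    compare: str = "mean"   # "mean" or "variance"


def suggest_test(p: DataProfile) -> str:
    """Return the name of a commonly used test for this profile (simplified)."""
    if not p.stable:
        return "none: remove special causes first"
    if p.x_continuous and p.y_continuous:
        return "regression analysis"
    if p.x_continuous and not p.y_continuous:
        return "logistic regression"
    if not p.x_continuous and not p.y_continuous:
        return "chi-square test"
    # Remaining case: discrete X (groups) and continuous Y.
    if p.groups < 2:
        raise ValueError("this sketch only covers two or more samples")
    if p.compare == "variance":
        if not p.normal:
            return "Levene's test"
        return "F-test" if p.groups == 2 else "Bartlett's test"
    if p.normal:
        return "2-sample t-test" if p.groups == 2 else "one-way ANOVA"
    return "Mann-Whitney U test" if p.groups == 2 else "Kruskal-Wallis test"


# Two stable, normal samples: the answer depends on what is being compared.
print(suggest_test(DataProfile(True, True, False, True, groups=2)))
print(suggest_test(DataProfile(True, True, False, True, groups=2, compare="variance")))
```

The same two datasets lead to different tests depending on whether the team is comparing means or variances, which is the situation described above.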
Hypothesis tests allow the Lean Six Sigma team to accept or reject an assertion about a parameter, essentially confirming or refuting what we think we know.
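To make the accept-or-reject decision concrete, here is a minimal sketch of one simple test, an exact permutation test for a difference in means. This technique was chosen because it needs only the Python standard library; it is not a test named in this lesson. The cycle-time data, the function name and the 0.05 significance level are all assumptions for illustration.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(a, b):
    """Exact two-sided permutation test for a difference in means.

    Counts how many ways of splitting the pooled data into groups of the
    same sizes give a mean difference at least as large as the observed one.
    """
    pooled = list(a) + list(b)
    observed = abs(mean(a) - mean(b))
    total = sum(pooled)
    n_a, n = len(a), len(pooled)
    extreme = count = 0
    for idx in combinations(range(n), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / (n - n_a))
        count += 1
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
    return extreme / count


# Hypothetical cycle times (minutes) before and after a process change.
before = [12.1, 11.8, 12.4, 12.0, 12.3]
after = [11.2, 11.5, 11.0, 11.4, 11.3]

alpha = 0.05  # assumed significance level
p_value = permutation_p_value(before, after)
print(f"p-value = {p_value:.4f}")
print("reject the 'no difference' hypothesis" if p_value < alpha
      else "cannot reject the 'no difference' hypothesis")
```

With these made-up numbers, only 2 of the 252 possible splits are as extreme as the observed one, so the p-value is about 0.008 and the "no difference" hypothesis is rejected at the assumed level.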
The results of the hypothesis testing clarify how the “x” terms relate to the “Y” terms in the Y=F(x) equation. This clarifies the “F” relationship and gives the Lean Six Sigma team the insight needed to change that relationship or to predict performance based on a selection of the “x’s.” In fact, hypothesis tests like regression analysis will actually create the Y=F(x) formula.
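As a minimal sketch of how a regression produces a Y=F(x) formula, the example below fits a straight line by ordinary least squares using plain Python. The data values and the `fit_line` and `predict` names are hypothetical; a real project would normally use Excel, Minitab or a statistics library and would also check how well the line fits.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(x, slope, intercept):
    """Evaluate the fitted Y = F(x) at a chosen x."""
    return slope * x + intercept


# Hypothetical measurements: x is a process setting, y is the output.
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(xs, ys)
print(f"Y = {slope:.2f} * x + {intercept:.2f}")
print(f"Predicted Y at x = 6: {predict(6, slope, intercept):.2f}")
```

For this made-up data the fitted line is Y = 1.97x + 1.09, which the team could then use to predict Y for an x value that was not measured.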
Hints & tips
- Many of the tests are available using Excel. All are available in Minitab.
- A detailed description of each test and how to use it is found in the Hypothesis Testing course.
- Often the insight comes from a visual analysis of the data and the statistical analysis is used to verify the hypothesis created from the visual analysis.
- 00:04 Hi, I'm Ray Sheen.
- 00:05 If you're doing a statistical analysis to solve your problem,
- 00:09 an obvious question is, which analysis should we be using?
- 00:12 Let's discuss that question.
- 00:14 We have an entire course devoted to hypothesis testing.
- 00:17 So this is just a brief overview of some highlights.
- 00:21 Hypothesis testing is a type of inferential statistics.
- 00:25 We analyze a subset of the data and
- 00:27 infer something about the full set because of that analysis.
- 00:32 The analysis allows us to accept a hypothesis, which is an assertion we made
- 00:37 about the data set or it may give us the confidence to reject the hypothesis.
- 00:43 The point is that we can use a subset of the data to draw a conclusion about
- 00:47 the full data population.
- 00:48 We don't need all of the data to be able to make a decision to accept or
- 00:53 reject a hypothesis.
- 00:55 Now another point associated with hypothesis testing is that you must first
- 00:59 make the hypothesis in order to test it.
- 01:02 So we have often done graphical analysis or
- 01:05 some other discovery process to identify likely causes of a problem.
- 01:10 We can then use hypothesis testing to statistically decide whether to accept or
- 01:15 reject a hypothesis.
- 01:17 And depending on the test selected,
- 01:19 you can confidently draw conclusions with only a small sample of the data available.
- 01:25 We often say that hypothesis testing is a means of using data to
- 01:29 demonstrate what we already think we know, but
- 01:33 instead of relying on our gut feel, we now can rely on data.
- 01:37 Now as far as testing goes, you need to understand your test, so
- 01:41 you don't fool yourself into thinking something is good or bad when it isn't.
- 01:45 There are many different statistical tests.
- 01:48 Different ones are appropriate depending upon the nature of your data.
- 01:52 I don't want you to feel intimidated about hypothesis testing.
- 01:55 Most of the tests can be done using Excel,
- 01:58 you just need to load your data into your spreadsheet and then use the correct test,
- 02:03 depending upon the data and the hypothesis.
- 02:05 While most of the tests we discussed can be done in Excel,
- 02:08 all of them can be done using other statistical software such as Minitab.
- 02:12 Again, the hypothesis testing course will discuss each of the tests and
- 02:17 how to do them.
- 02:18 When selecting which hypothesis test to use,
- 02:21 there will be four questions that help us to select the best test.
- 02:25 The first question is, have special causes been removed?
- 02:28 If they haven't been removed,
- 02:30 you really can't have any confidence in any statistical analysis.
- 02:33 The next question is, is the sample data set normal?
- 02:37 Some of the tests work with normal data, some work with non-normal data.
- 02:41 The third question, are the variables discrete or continuous?
- 02:46 Now, this question applies to both the x and Y elements of the Y = F(x) equation.
- 02:51 Different tests work with different types of data.
- 02:54 And the fourth question, how many samples are available for comparison?
- 02:59 The types of tests can also change based upon the magnitude of the dataset and
- 03:04 the number of datasets available.
- 03:06 Now, don't worry about memorizing this decision table,
- 03:09 we cover it in-depth in the hypothesis testing course.
- 03:12 The point is that there are many different tests and
- 03:15 in order to trust the results, you need to use the correct one.
- 03:19 You'll see that the branch points occur based upon the answers to the four
- 03:23 questions we discussed on the previous slide.
- 03:26 Let's wrap this up by going back to your Y = F(x) equation.
- 03:30 You may recall that in the Analyze phase of DMAIC,
- 03:34 we wanted to understand the function part of the equation, that is the F.
- 03:39 In the Define phase, we identified the Y we cared about.
- 03:42 And in the Measure phase, we identified the x variables and gathered data around those.
- 03:47 Now we investigate to understand how these all come together.
- 03:51 A good understanding of the function gives us confidence that we can
- 03:55 predict a Y value based upon the X inputs,
- 03:58 which allows us to tune the design of the process and control it going forward.
- 04:04 In fact, some of the hypothesis tests such as regression analysis will
- 04:09 actually generate an equation for you and will fully define the F function.
- 04:14 Even the tests that do not provide a precise mathematical formula will at least
- 04:18 provide some sense of the relationship.
- 04:20 So that will help us to determine which parameters are impacting others and
- 04:25 the sense of the strength of that relationship.
- 04:28 Hypothesis tests can provide insight about problems and
- 04:34 can be used to verify root causes.