About this lesson
Statistical tests are often used to aid problem analysis. The result of a test is a statistical measure of how well the sample data supports a hypothesis about the problem, which is then used to infer something about that problem across the entire data population.
Quick reference
Statistical Analysis
Statistical tests are often used to aid problem analysis by inferring attributes of the problem from an analysis of a sample set of data. The statistical analysis provides a statistical measure, the P value, that is used to decide whether to reject the Null hypothesis or fail to reject it (often loosely described as accepting it).
When to use
In some cases, the problem analysis will point to an obvious root cause, but often the cause is not obvious. Statistical analysis is used to confirm a hypothesis when doing hypothesis testing as part of the Analyse stage of a Lean Six Sigma project.
Instructions
Statistical analysis has a context, and it is important to understand it. In Lean Six Sigma projects, statistical analysis is normally used in conjunction with hypothesis testing. The completed analysis generates a “P” value, or probability value: the probability of seeing results at least as extreme as those in the sample if the Null hypothesis were true. Based upon the “P” value, the Null hypothesis is either rejected or not rejected (often loosely described as accepted).
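The decision rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cycle-time data, the null mean of 31 minutes, the 0.05 threshold, and the function name `two_sided_p_value` are all hypothetical, and the calculation uses a normal approximation. With a sample this small, a t-test from a statistics package would normally be preferred.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def two_sided_p_value(sample, null_mean):
    """Two-sided P value for H0: population mean == null_mean.

    Uses a normal (Z) approximation, which is an assumption of this sketch.
    """
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)
    z = (mean(sample) - null_mean) / standard_error
    # Probability of a result at least this far from null_mean if H0 were true.
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical cycle times in minutes; H0: the process mean is 31 minutes.
cycle_times = [31.2, 32.5, 30.8, 33.1, 31.9, 32.2,
               30.5, 32.8, 31.4, 33.0, 31.7, 32.4]
alpha = 0.05  # corresponds to a 95% confidence level

p_value = two_sided_p_value(cycle_times, null_mean=31.0)
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"P value = {p_value:.4f}: {decision}")
```

A small P value says the sample would be unlikely if the Null hypothesis were true. It does not by itself prove that a particular alternative hypothesis is correct.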
Inferential Statistics
In most cases, the data set being used in the Lean Six Sigma project will not represent the entire possible population of the problem. It is difficult or impossible to get data from all potential instances with all relevant customers, relevant products, relevant locations, across all time periods – past, present, and future. Therefore, a subset, or sample, of the data is used. The statistical analysis is applied to the data from the sample and then the results are inferred to apply to the entire population.
In order to do this inference, additional analysis should be done based on sampling approach and confidence levels. By confidence level, we mean being able to state something about the entire population based on the sample data with a 90%, 95%, or 99% confidence. The detailed analysis of confidence levels and confidence intervals is addressed in the Hypothesis Testing course.
The desired confidence level and the descriptive statistics of the sample data (mean and standard deviation) can be used to calculate a confidence interval for the location of the total population mean. The confidence interval calculation can also be used in reverse: starting from an interval width, it gives the confidence that the population mean lies within that distance of the sample mean.
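As a rough sketch of both directions of that calculation, the Python below computes a confidence interval from a confidence level, and then a confidence level from a chosen interval half-width. It assumes a normal (Z) approximation; the function names and the sample figures (mean 50.0, standard deviation 4.0, 64 observations) are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def confidence_interval(sample_mean, sample_sd, n, confidence=0.95):
    """Interval for the population mean using a Z approximation."""
    # Z value that leaves (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sample_sd / sqrt(n)  # Z times the standard error
    return sample_mean - half_width, sample_mean + half_width


def confidence_for_half_width(half_width, sample_sd, n):
    """Reverse calculation: confidence level implied by a chosen half-width."""
    z = half_width * sqrt(n) / sample_sd
    return 2 * NormalDist().cdf(z) - 1


# Hypothetical sample: mean 50.0, standard deviation 4.0, 64 observations.
low, high = confidence_interval(50.0, 4.0, 64, confidence=0.95)
print(f"95% interval: {low:.2f} to {high:.2f}")  # roughly 49.02 to 50.98

# Starting from a half-width of 1.0 instead of a confidence level.
level = confidence_for_half_width(1.0, 4.0, 64)
print(f"Confidence for +/- 1.0: {level:.4f}")  # roughly 0.9545
```

The two functions are inverses of each other: feeding the half-width of the 95% interval back into the second function returns 0.95.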
Hints & tips
- Don’t blindly apply a statistical analysis to your project. If the root cause is obvious, no analysis is needed. If the root cause is not obvious, create a set of hypotheses and, based upon each hypothesis and the characteristics of the data, apply the one right test for each hypothesis.
- Confidence level is not precisely the probability that the population statistic falls within the range associated with that confidence level, but it is essentially that.
- Most organizations use a 95% confidence level and that is the default in Minitab.
- 00:05 Hi, I'm Ray Sheen.
- 00:06 Now, during the analyze stage of a Lean Six Sigma project,
- 00:10 we will often be conducting statistical analysis.
- 00:13 Let's cover a few of the principles involved in this analysis.
- 00:17 So how should we approach the use of statistical analysis for our problem?
- 00:22 We'll start with the basic problem-solving elements that we have already discussed.
- 00:26 Use inductive and deductive reasoning to create a null and alternative hypothesis.
- 00:32 We'll then use the statistical analysis to accept or reject the hypothesis.
- 00:38 Depending upon the structure of your hypothesis and
- 00:41 the characteristics of your data set, such as whether it is variable or
- 00:45 attribute and how many data sets you have.
- 00:48 There are a number of different statistical tests that can be used.
- 00:51 Different tests work with different types of data.
- 00:55 Unfortunately, there is no universal test that works with everything.
- 00:59 That's why we created a separate Hypothesis Testing course, which
- 01:04 allows us to focus in on the statistical tests used in Lean Six Sigma analysis.
- 01:09 While there is no universal test, there is a universal measure for
- 01:13 the statistical test analysis used with hypothesis testing.
- 01:17 The Universal measure is the P value, or the probability value.
- 01:22 This measure tells us how probable results at least as extreme as yours would be if your null hypothesis were true.
- 01:26 Based upon the size of the P value, you can either accept or
- 01:30 reject the null hypothesis.
- 01:32 Now, there's a caution here.
- 01:34 The reason we'll be picky about how to write a null and an alternative hypothesis
- 01:38 is that the P value tells you whether to reject the null hypothesis or not.
- 01:43 It does not specifically say that your alternative hypothesis is true.
- 01:48 A poorly constructed alternative hypothesis that is not truly the inverse
- 01:53 of the null hypothesis may not be true either, but
- 01:56 more about that in the hypothesis testing course.
- 01:59 For now it's enough to know that when I write good hypotheses,
- 02:03 there are statistical tests that will tell me whether to accept them or reject them.
- 02:07 Another important topic to discuss is inferential statistics.
- 02:12 This means answering the question,
- 02:14 is the data of your analysis similar to all of the problem data?
- 02:18 Inferential statistics allow us to draw conclusions about all
- 02:22 the data in a population, even though we only have access to a portion of the data.
- 02:27 Political pollsters do this all the time,
- 02:30 they do a poll of several hundred or thousand people and use the statistics
- 02:35 from that data sample to predict the outcome of a national election.
- 02:39 Occasionally, you'll be lucky enough to have all the data for your problem,
- 02:44 but most of the time we only have a portion of the data, and
- 02:48 that's when we need to do inferential statistics.
- 02:51 We can't go back in time and collect data that was not recorded.
- 02:55 We may not have access to all the products or all the customers or all the locations.
- 03:01 Now let's be clear, when we analyze a dataset,
- 03:04 that's only giving us information about the items that are in that dataset.
- 03:09 But with some additional analysis,
- 03:11 we can determine to what degree the analysis of that data set can be used to
- 03:16 infer the characteristics of the full population of all the instances.
- 03:21 This additional analysis is based upon sampling and confidence intervals.
- 03:26 The purpose of inferential statistics is to allow us to use the sample
- 03:30 statistics as a surrogate for the entire population statistics.
- 03:34 Now, the statistical value calculated from the sample, such as the mean, is a point.
- 03:40 But there is some uncertainty as to how close
- 03:43 the sample point is to the real point for the entire population.
- 03:48 The uncertainty is described with a confidence interval.
- 03:52 Essentially, that is the range around the calculated point in
- 03:56 which the real point exists.
- 03:59 The size of that range is based upon characteristics of your sample data and
- 04:04 your desired confidence level, 90% confident, 95, 99% confident.
- 04:10 Although not precisely correct,
- 04:12 it's essentially correct to say that it is probable that the actual mean value
- 04:17 will be within the confidence interval around the sample mean point.
- 04:21 I won't go through all the calculations right now,
- 04:24 we do that in the hypothesis testing course.
- 04:26 But to use this confidence interval, we need the standard error
- 04:30 of the mean, which we can derive from the size of the sample and
- 04:34 the standard deviation of the sample itself.
- 04:37 Then we need to decide what confidence level to use.
- 04:40 Most Lean Six Sigma projects use 95%, or
- 04:43 we need to decide on an interval width with which we are comfortable.
- 04:48 If we start with the confidence level, we'll calculate the confidence interval.
- 04:52 If we start with the interval, we can determine the confidence level.
- 04:56 The confidence interval will ultimately be described with a point estimate and
- 05:01 a range around that point estimate.
- 05:04 The size of that range depends upon the confidence level, and
- 05:08 is essentially plus or minus the Z value that you see in the table.
- 05:13 Statistical analysis will allow us to move from guessing the problem to knowing
- 05:17 the problem.
- 05:18 And with inferential statistics,
- 05:20 we can even apply a confidence level to our knowledge of this situation.