Retired course
This course has been retired and is no longer supported.
About this lesson
Statistical tests are often used to aid the problem analysis. The statistical analysis of a small sample of data can point to root causes of problems in the full data set.
Quick reference
Statistical Analysis
Statistical tests are often used to aid the problem analysis by indicating whether to accept or reject the Null hypothesis. The statistical analysis of a small sample of data can point to root causes of problems in the full data set.
When to use
In some cases the problem analysis will point to an obvious root cause, but often the cause is not as apparent. Statistical analysis is used to confirm a hypothesis when doing hypothesis testing as part of the Analyze phase of a Lean Six Sigma project.
Instructions
There is a context to statistical analysis and it is important to understand that context. Normally, statistical analysis is being used in conjunction with hypothesis testing. The statistical analysis confirms whether the Null hypothesis should be accepted or rejected. There are literally dozens of possible statistical tests. Which test to use depends upon the structure of the hypothesis and the characteristics of the data. Selecting the correct test will be discussed in the Hypothesis Testing course.
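When the statistical analysis is completed, a “P” value, or probability value, will be generated. The “P” value is the probability of seeing data at least as extreme as the sample if the Null hypothesis were true. When the “P” value falls below the chosen significance level (0.05 for a 95% confidence level), the Null hypothesis is rejected; otherwise it is retained (accepted, in this lesson's terms).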
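As a rough illustration of how a “P” value drives that decision, the sketch below runs a two-sided one-sample z-test using only the Python standard library. The hold times, the Null-hypothesis mean of 7 minutes, the known standard deviation of 1 minute, and the function name `two_sided_p_value` are all assumptions made up for this example; a real project would choose the test from the structure of the hypothesis and the data.

```python
from math import sqrt
from statistics import NormalDist, mean


def two_sided_p_value(sample, null_mean, sigma):
    """P value of a two-sided one-sample z-test (sigma is assumed known)."""
    n = len(sample)
    # How many standard errors the sample mean sits from the hypothesized mean
    z = (mean(sample) - null_mean) / (sigma / sqrt(n))
    # Probability of a result at least this extreme if the null hypothesis is true
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical hold times in minutes (not data from the lesson)
hold_times = [8.1, 7.9, 9.2, 8.4, 7.6, 8.8, 9.0, 8.3, 7.7, 8.5]
alpha = 0.05  # significance level that pairs with a 95% confidence level

p = two_sided_p_value(hold_times, null_mean=7.0, sigma=1.0)
if p < alpha:
    decision = "reject the null hypothesis"
else:
    decision = "fail to reject the null hypothesis"
print(f"P value = {p:.6f}: {decision}")
```

A small “P” value says the sample would be surprising if the Null hypothesis were true. It does not by itself prove the alternative hypothesis.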
Inferential Statistics
In most cases, the data set being used in the Lean Six Sigma project will not represent the entire possible population of the problem. It is difficult or impossible to get data from all potential instances with all relevant customers, relevant products, relevant locations, across all time periods – past, present, and future. Therefore, a subset, or sample, of the data is used. The statistical analysis is applied to the data from the sample and then the results are inferred to apply to the entire population.
In order to do this inference, additional analysis should be done based upon sampling approach and confidence levels. By confidence level we mean being able to state something about the entire population based upon the sample data with a 90%, 95%, or 99% confidence. The detailed analysis of confidence levels and confidence intervals is addressed in the Hypothesis Testing course.
The desired confidence level and the descriptive statistics of the sample data (mean and standard deviation) can be used to calculate a confidence interval for the location of the total population mean. The confidence interval calculation can be used in reverse to determine the confidence that a particular value is the mean of the total population based upon the sample mean.
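The two directions of that calculation can be sketched as follows. This is a minimal example under stated assumptions: it uses the normal (Z) approximation, the sample values are hypothetical, and the function names are invented for illustration. For small samples a t-based interval would normally be wider.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    # Z multiplier: about 1.645, 1.960, 2.576 for 90%, 95%, 99%
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Standard error of the mean from the sample size and sample standard deviation
    half_width = z * stdev(sample) / sqrt(len(sample))
    center = mean(sample)
    return center - half_width, center + half_width


def confidence_for_half_width(sample, half_width):
    """Reverse calculation: confidence level implied by a chosen half-width."""
    z = half_width * sqrt(len(sample)) / stdev(sample)
    return 2 * NormalDist().cdf(z) - 1


# Hypothetical hold times in minutes (not data from the lesson)
hold_times = [8.1, 7.9, 9.2, 8.4, 7.6, 8.8, 9.0, 8.3, 7.7, 8.5]

low, high = confidence_interval(hold_times, confidence=0.95)
print(f"95% confidence interval for the mean: {low:.2f} to {high:.2f}")

level = confidence_for_half_width(hold_times, half_width=0.5)
print(f"Confidence that the mean is within 0.5 of the sample mean: {level:.1%}")
```

Starting from a confidence level gives the interval; starting from an acceptable interval width gives the confidence level.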
Hints & tips
- Don’t blindly apply a statistical analysis to your project. If the root cause is obvious, no analysis is needed. If the root cause is not obvious, create a set of hypotheses and based upon the hypotheses and characteristics of the data, apply the one right test for each hypothesis.
- Confidence level is not precisely the probability that the population statistic falls within the range associated with that confidence level, but is essentially that.
- Most organizations use a 95% confidence level and that is the default in Minitab.
- 00:05 Hi, I'm Ray Sheen.
- 00:06 During the Analyze stage of a Lean Six Sigma project,
- 00:09 we will often be conducting statistical analysis.
- 00:13 Let's cover a few of the principles involved in this analysis.
- 00:18 Let me start by saying you need to understand your statistics so
- 00:22 that you don't fool yourself into thinking something is good or bad when it isn't.
- 00:27 I'll illustrate this point by using a simple statistical measure, the mean or
- 00:32 average value.
- 00:33 No one statistic provides everything you need to know about the data set.
- 00:36 So while the mean is a very important number, and we use it often,
- 00:40 it still does not tell the whole story.
- 00:43 That's because statistical analysis works with the data set, not an individual
- 00:48 data point, but your customer feels their individual data point.
- 00:52 Their instance is what matters to them, so
- 00:54 while the average value in a data set may be a problem for most customers,
- 00:59 you may still have some customers with perfectly acceptable instances, and
- 01:03 vice versa.
- 01:04 The average may be fine, but
- 01:06 there are isolated customers who are not satisfied with the process performance.
- 01:11 Let me illustrate with an example.
- 01:13 We have two call centers that are answering customers' questions.
- 01:17 The customer calls and stays on hold until a customer service rep speaks with them.
- 01:21 The average hold time in both call centers is the same, seven minutes.
- 01:26 Now, that level itself may be unacceptable for some industries or customer groups but
- 01:30 may be considered normal performance for others.
- 01:33 However, let's look in a little more detail at the data.
- 01:36 In the first call center, customers waited between five and
- 01:40 eight minutes with an average of seven minutes.
- 01:42 Depending upon the industry that might mean four angry customers or
- 01:45 four customers who are enjoying the sound of that beautiful telephone hold music.
- 01:50 In the second call center, three of the four customers only waited for
- 01:54 1 minute, but one customer waited for 25 minutes.
- 01:58 So in that case, there are three customers who are pleased with
- 02:01 the responsiveness and one customer who is contemplating how to
- 02:04 create a viral YouTube post complaining about the awful service.
- 02:08 The averages in both cases are the same, but
- 02:10 the customer experiences are very different.
- 02:14 So be careful just throwing around statistics,
- 02:16 make sure you understand what they mean.
- 02:19 So how should we approach the use of statistical analysis in our problem?
- 02:23 Well, start with the basic problem-solving elements that we've already discussed.
- 02:28 Use inductive and deductive reasoning to create a null and alternative hypothesis.
- 02:33 We will then use statistical analysis to accept or reject the hypothesis.
- 02:37 Depending upon the structure of your hypothesis and the characteristics of your
- 02:41 data set, such as whether it is variable or attribute data
- 02:44 and how many data sets you have to work with,
- 02:46 there are a number of statistical tests that you can use.
- 02:50 Different tests work with different types of data. Unfortunately,
- 02:54 there's no universal test that works with everything.
- 02:57 That is why we created a separate Hypothesis Testing course.
- 03:01 It allows us to focus on each of the statistical tests used
- 03:04 in Lean Six Sigma analysis.
- 03:06 And while there's no universal test, there is a universal measure for
- 03:11 the statistical test analysis used when doing hypothesis testing,
- 03:16 that universal measure is the P value or probability value.
- 03:21 This measure tells you how likely data like yours would be if your null hypothesis were true.
- 03:26 Based upon the size of the P value, you can either accept or
- 03:29 reject the null hypothesis.
- 03:32 Now there's a caution here,
- 03:34 the reason we will be picky about how to write the null and alternative hypothesis,
- 03:38 is that the P value tells you whether you can reject the null hypothesis or not.
- 03:43 It does not specifically say that your alternative hypothesis is true.
- 03:47 A poorly constructed alternative hypothesis
- 03:50 that is not a true inverse of the null hypothesis may or may not be true.
- 03:55 But more about all that in the hypothesis testing course.
- 03:58 For now, it's enough to know that when I write a good hypothesis,
- 04:02 there are statistical tests that will tell me whether to accept or reject it.
- 04:07 Another important topic to discuss is inferential statistics.
- 04:12 That means answering the question,
- 04:14 is the data in your analysis similar to all the problem data?
- 04:18 Inferential statistics allows us to draw conclusions about all the data in
- 04:22 a population, even though we only have access to a portion of that data.
- 04:26 Political pollsters do this all the time. They do a poll of several hundred or
- 04:30 thousand people, and use the statistics from that
- 04:33 data sample to predict the outcome of a national election.
- 04:37 Occasionally, you'll be lucky enough to have all the data for your problem.
- 04:41 But most of the time, we only have a portion of a data set,
- 04:44 and that is why we need to do inferential statistics.
- 04:48 We can't go back in time and collect data that was never recorded.
- 04:52 And we may not have access to all the products or customers or locations.
- 04:56 Now let's be clear, when we analyze a data set,
- 04:59 we are only getting information about the items represented by that data set.
- 05:04 But with some additional analysis,
- 05:06 we can determine to what degree the analysis of that data set can be used
- 05:11 to infer the characteristics of the full population of instances.
- 05:17 This additional analysis is based upon sampling rules and confidence intervals.
- 05:22 The purpose of inferential statistics is to allow us to use sample statistics
- 05:27 as a surrogate for the entire population statistics.
- 05:31 Now the statistical value calculated, such as a mean, is a point, but
- 05:36 there's some uncertainty as to how close the sample point is to the real point for
- 05:41 the entire population.
- 05:42 That uncertainty is described with a confidence interval.
- 05:46 Essentially, that is the range around the calculated point in which
- 05:50 the real point is expected to lie.
- 05:53 The size of this range is based upon the characteristics of your sample data
- 05:58 and the desired confidence level: 90%, 95%, or 99%.
- 06:02 Although not precisely correct,
- 06:03 it is essentially correct to say that it is the probability that the actual mean will
- 06:07 be within the confidence interval around the sample mean.
- 06:10 I won't go through all the calculations right now; we'll do that in
- 06:14 the Hypothesis Testing course.
- 06:17 But to use this confidence interval, we need to have the standard error
- 06:21 of the mean, which we can derive from the size of the sample and
- 06:25 the standard deviation of the sample.
- 06:27 Then we either need to decide what confidence level to use (most
- 06:32 people use 95%), or we need to decide what interval width we are comfortable with.
- 06:37 If we start with a confidence level, we will calculate the confidence interval;
- 06:41 if we start with the interval, we will determine the confidence level.
- 06:45 The confidence interval will ultimately be described with a point estimate and
- 06:49 a range around that point estimate.
- 06:51 The size of the range depends upon the confidence level, and
- 06:55 is essentially plus or minus the Z value that you see in this table.
- 07:00 Statistical analysis will allow us to move from guessing the problem to knowing
- 07:05 the problem.
- 07:06 And with inferential statistics,
- 07:08 we can even apply a confidence level to our knowledge of this situation.
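The call center example from the transcript can be checked with a few lines of Python. The second center's hold times (three customers at 1 minute, one at 25 minutes) come from the transcript. The first center's individual values are an assumption, chosen only to fit "between five and eight minutes with an average of seven minutes".

```python
from statistics import mean, stdev

# Hold times in minutes for four customers at each call center
center_a = [5, 7, 8, 8]   # assumed values: between 5 and 8, averaging 7
center_b = [1, 1, 1, 25]  # from the transcript: three at 1 minute, one at 25

for name, times in (("Center A", center_a), ("Center B", center_b)):
    print(
        f"{name}: mean = {mean(times):.1f} min, "
        f"std dev = {stdev(times):.2f} min, "
        f"longest wait = {max(times)} min"
    )
```

Both centers report the same mean of 7 minutes, but the spread and the longest wait are very different. That is why the mean should be read alongside a measure of variation.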