Retired course
This course has been retired and is no longer supported.
About this lesson
Quick reference
Hypothesis Tests
There are many different statistical tests that can be used with hypotheses. The tests shown in this module are the most commonly used tests in Lean Six Sigma. The decision to choose a test is based upon many factors associated both with the data and the nature of the hypothesis statements.
When to use
Use hypothesis tests to provide statistical confidence in your assumptions about the root causes of the problem and about the performance of the improved process.
Instructions
The decision tree shown below should be used for selecting which hypothesis test to use. In some cases, more than one test could be used, so the most commonly used one is shown. In several of the non-normal categories, there are multiple options. For those, choose the test that best matches the shape and characteristics of the data. In some cases, a preliminary hypothesis test (F test, Bartlett’s, or Levene’s) must be done before the final test, because its result determines how the final test is carried out.
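Because the tree is walked one decision at a time, it can also be sketched as a small function. The Python sketch below is illustrative only: the function name `select_hypothesis_test`, its parameters, and the returned labels are hypothetical, and the branches follow the tree as the lesson's transcript describes it. Assigning the F test to two samples and Bartlett's test to more than two is an assumption; the lesson names both without stating which applies where.

```python
def select_hypothesis_test(stable, data_type, n_samples=1,
                           normal=True, n_independent_vars=1):
    """Suggest tests by walking the decision tree in order.

    data_type: "continuous" (all variables continuous),
               "discrete" (all variables discrete), or
               "mixed" (one discrete, one continuous).
    """
    # Decision 1: process stability. Special causes invalidate the tests.
    if not stable:
        return ["Address special causes before testing"]

    # Decision 2: data type.
    if data_type == "continuous":
        if n_independent_vars == 1:
            return ["Correlation", "Simple linear regression"]
        return ["Multiple regression"]

    if data_type == "discrete":
        if n_samples == 1:
            return ["One-sample test of proportions"]
        if n_samples == 2:
            return ["Two-sample test of proportions"]
        return ["Chi-square test"]

    if data_type != "mixed":
        raise ValueError(f"unknown data_type: {data_type!r}")

    # Decision 3 (mixed data only): normality, then number of samples.
    if normal:
        if n_samples == 1:
            return ["One-sample t-test", "One-sample test for variance"]
        if n_samples == 2:
            # Assumption: F test as the preliminary variance check for two samples.
            return ["F test", "Two-sample t-test or paired t-test"]
        # Assumption: Bartlett's test as the preliminary check for more than two.
        return ["Bartlett's test", "ANOVA"]

    if n_samples == 1:
        return ["One-sample sign test", "One-sample Wilcoxon test"]
    if n_samples == 2:
        return ["Levene's test", "Mann-Whitney test"]
    return ["Mood's median test", "Kruskal-Wallis test", "Friedman test"]


# Example: three samples of normal data, one discrete and one continuous variable.
print(select_hypothesis_test(True, "mixed", n_samples=3, normal=True))
```

Where a branch returns more than one label, the choice among them still depends on the effect being analyzed or the shape of the data, so the sketch narrows the options rather than making the final decision.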
Hints & tips
- Follow the paths of the decision tree; do not skip a decision.
- In some cases, a test (ANOVA, for example) may be applicable for multiple conditions. However, this decision tree attempts to show the simplest test for each of the conditions.
- Many of the Normal tests are somewhat forgiving if the data is close to normal. However, if the data is definitely non-normal, use a non-normal test.
- Excel is able to conduct virtually all of the normal tests and none of the non-normal tests.
- Minitab will perform all the tests listed (and many more that are not listed).
- 00:04 Hi, I'm Ray Sheen.
- 00:05 You know, I kept talking about Hypothesis Tests.
- 00:09 And you're probably starting to wonder, what are these tests?
- 00:11 Well, let's take a look.
- 00:14 Well, in this lesson we'll lay out a decision tree for
- 00:17 determining which test to use when.
- 00:19 Steps 3 and 4 of the Hypothesis Testing process are to gather data and
- 00:23 apply the appropriate test to the data, in order to calculate the test statistic.
- 00:28 It should come as no surprise that different tests are used depending upon
- 00:31 the type of data that is available, and
- 00:33 that different tests calculate different test statistics.
- 00:36 Based upon the test results, we either accept or reject the null hypothesis.
- 00:41 When determining which test to use, there are a number of questions that we need to
- 00:45 ask about the data that's available and the hypothesis.
- 00:48 The answers will determine which test is best suited to give us a good statistic.
- 00:53 One question is, whether or not special causes are present in the data.
- 00:57 Special causes will distort the data, and invalidate the hypothesis test.
- 01:01 Another key question is whether or not the data is normal.
- 01:04 We have different tests depending upon normality.
- 01:07 The nature of the data structure, whether it's discrete or continuous,
- 01:10 will also impact the hypothesis test selection.
- 01:13 And finally, the number of data samples in the investigation
- 01:17 can impact which test we should use.
- 01:20 So let me take you through a decision tree for
- 01:22 how to select the hypothesis test.
- 01:24 This decision tree contains twenty different tests. There are even more tests,
- 01:29 but these are the most commonly used ones. In some cases
- 01:32 several different tests could be used; I've identified the one that I normally use,
- 01:37 and it provides an accurate test statistic for that condition.
- 01:40 If your business requires you to use a different test, then use it.
- 01:44 You'll probably reach the same conclusion with respect to the hypothesis,
- 01:48 to either accept or reject the null.
- 01:50 So we start by determining if the process is stable. If not,
- 01:54 address the special causes.
- 01:56 If the process is stable, then we consider the type of data:
- 01:59 is it discrete, continuous, or some of both?
- 02:02 When all the variables are continuous, we next ask the question of
- 02:06 how many independent variables are being considered.
- 02:09 If there's only one independent variable, you can check for correlation and,
- 02:12 if it exists, create a simple linear regression equation.
- 02:16 If there are multiple variables, do multiple regression analysis.
- 02:20 When both the dependent and independent variables are discrete,
- 02:23 we need to ask the question, how many samples are involved in the hypothesis?
- 02:28 If only one sample is involved, do a one-sample test of proportions.
- 02:32 If two samples, do a two-sample test of proportions.
- 02:35 And if more than two samples, do a Chi-Square test.
- 02:38 Now consider the case where one of the variables is discrete, and
- 02:42 one is continuous.
- 02:43 At this point you need to determine if the data is normal or not.
- 02:46 If the data is normal,
- 02:47 we'll ask our same question about the number of data samples in the analysis.
- 02:52 If it's just one sample, do a one-sample T-test, or a one-sample test for
- 02:56 variance, depending upon the effect you want to analyze.
- 02:59 If more than one sample, we need to do a variance test, either the F test or
- 03:03 Bartlett's test.
- 03:05 This is because the remaining tests are conducted differently,
- 03:07 depending upon the answer to the question, of whether the variances are equal.
- 03:12 If there are two samples involved, either use a Two-Sample T-Test or
- 03:15 a Paired T-Test, depending upon the structure of the data and the hypothesis.
- 03:20 If there are more than two samples, use the ANOVA test.
- 03:23 Now let's look at the case where the data is not normal. Again,
- 03:27 we ask the question of how many samples are involved.
- 03:30 If there is only one sample, use the one-sample sign test or
- 03:34 the one-sample Wilcoxon test, depending upon the shape of the data.
- 03:37 If there are two samples, do Levene's test for variance,
- 03:41 and then the Mann-Whitney test for the means.
- 03:44 And finally, if there are two or more samples, you can use the Mood's median,
- 03:47 Kruskal-Wallis, or Friedman test, depending upon the data characteristics.
- 03:52 We'll cover all these tests in detail in the remaining lessons, but
- 03:56 let's first consider a few of the preliminary questions
- 03:59 that are in the front half of the decision tree.
- 04:02 I'll start with common cause and special cause.
- 04:04 We've discussed this in detail in other lessons,
- 04:07 in other parts of the Lean Six Sigma program.
- 04:09 So here's a quick review. Common cause is normal variation.
- 04:13 It's random in nature, but within normal bounds for
- 04:16 the process. Because of that, it is predictable.
- 04:19 Every process has some level of common cause variation
- 04:22 that is inherent in the process design.
- 04:25 You cannot change this type of variation
- 04:26 without a fundamental change to the process.
- 04:29 In contrast, special cause variation is unique.
- 04:32 It is unpredictable, non-random variation;
- 04:34 it is not due to the inherent design of the process.
- 04:37 Therefore, it is outside the control of the process, but
- 04:40 it impacts process performance.
- 04:43 Although it can't be predicted, it can be controlled by placing controls
- 04:47 on the process or its inputs that screen or filter out the special causes.
- 04:52 When you have special causes present in your Lean Six Sigma project,
- 04:55 you don't need to do further hypothesis testing.
- 04:58 You already have a clear root cause of the problem that you need to solve,
- 05:01 so, fix it.
- 05:03 Next, we ask whether the data is continuous or discrete.
- 05:06 Again, we've discussed these topics in detail in Lean Six Sigma courses, so
- 05:10 this is just a quick review.
- 05:12 Discrete data is data that can take on only a set of predefined, specific values.
- 05:17 There is a clear break between data values, and
- 05:20 no meaningful data exists between these values.
- 05:24 For instance, a switch is either on or off.
- 05:26 Or we are using material from supplier A or from supplier B.
- 05:30 Most of these tests will require that the information be expressed as an integer,
- 05:34 such as off is zero and on is one.
- 05:37 Or supplier A is one and supplier B is two.
- 05:40 If you were to plot the data, it's a step function with a value at
- 05:43 each integer that represents a discrete value.
- 05:46 There are no fractional values between those levels.
- 05:49 In contrast, continuous data can take on any value.
- 05:52 Between any two continuous data values,
- 05:54 there is the ability to have another value.
- 05:57 Often the data is represented in decimals or fractions, rather than integers.
- 06:01 And a data plot will have a smooth curve,
- 06:04 rather than the stair-step effect with discrete data.
- 06:07 Finally, let's review normal and non-normal.
- 06:10 Normal data has some very specific characteristics.
- 06:13 First, it's symmetric:
- 06:14 there are as many data points above the mean as below the mean.
- 06:17 Second, it is peaked in the center; there is a central tendency
- 06:21 that puts many more data points at the center of the distribution
- 06:24 than at the edges.
- 06:26 And third, while the edges are dropping to near zero,
- 06:29 they theoretically could extend to infinity.
- 06:32 But practically, they will reach zero.
- 06:35 This gives us the bell-shaped curve that is so
- 06:37 frequently discussed when looking at data.
- 06:40 Finally, the curve is smooth, or at least is becoming smooth,
- 06:44 based upon the number of increments in the measurement system.
- 06:47 Non-normal data can be characterized by one or more of these effects.
- 06:52 First, it may not be symmetric, the data is skewed to one side or the other.
- 06:57 Second, it may not have a peaked center due to central tendency.
- 07:01 It could be uniform, where the data is level across the distribution, or
- 07:05 even a bathtub shape, where the edges are high but the center is low.
- 07:09 And third,
- 07:10 the edges may have a sharp limit, often due to a physical limitation in the system.
- 07:14 Finally, the curve may not be smooth, but rather have several big steps.
- 07:19 For our purposes, unless the data meets all of the criteria for
- 07:23 normal data, we will treat it like it is not normal.
- 07:28 There are many different hypothesis tests.
- 07:30 You must choose the correct test to get a meaningful answer to your question.
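The symmetry and central-peak criteria from the normality review can be screened numerically before choosing a branch of the tree. The sketch below is a rough screen, not a formal normality test, and it uses only Python's standard library; the function name `shape_summary` and the sample values are hypothetical. Skewness near zero suggests symmetry, while strongly negative excess kurtosis suggests a flat (uniform) or bathtub shape rather than a central peak.

```python
import statistics


def shape_summary(data):
    """Return (skewness, excess kurtosis) using population moments."""
    n = len(data)
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    if sd == 0:
        raise ValueError("all values are identical; shape is undefined")
    # Standardize each point, then average the third and fourth powers.
    z = [(x - mean) / sd for x in data]
    skewness = sum(v ** 3 for v in z) / n
    excess_kurtosis = sum(v ** 4 for v in z) / n - 3.0  # 0 for a normal curve
    return skewness, excess_kurtosis


# Hypothetical samples chosen only to illustrate the two shape measures.
symmetric_flat = [1, 2, 3, 4, 5]   # evenly spread, no central peak
right_skewed = [1, 1, 1, 2, 10]    # long tail on the high side

for name, sample in [("symmetric_flat", symmetric_flat),
                     ("right_skewed", right_skewed)]:
    skew, kurt = shape_summary(sample)
    print(f"{name}: skewness={skew:.2f}, excess kurtosis={kurt:.2f}")
```

With so few points these numbers are only indicative; in practice a probability plot or a formal normality test in a statistics package would accompany a screen like this.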