Retired course
This course has been retired and is no longer supported.
About this lesson
A well-written hypothesis contains two elements: the Null hypothesis and the Alternative hypothesis. Writing a clear hypothesis that can be quickly analyzed with a statistical test is a skill that is illustrated and practiced in this lesson.
Quick reference
Writing Hypotheses
The hypothesis is a two part statement that forms the basis for the statistical test. This lesson explains and illustrates the format of the hypothesis.
When to use
The hypothesis must be written in order to determine what type of test is needed to reject or fail to reject the Null hypothesis. Therefore, whenever you do a hypothesis test, you must first write the hypothesis.
Instructions
It is no surprise that Hypothesis testing requires a hypothesis. When doing hypothesis testing for Lean Six Sigma projects, the hypothesis is a pair of statements about data associated with the process, product or problem being investigated. When well written, the two statements are opposites of each other. If one is true, the other cannot be true.
The two statements are always called the Null hypothesis and the Alternative hypothesis. Because of how the statistical tests are performed, the Null hypothesis must always take the position that the factor being investigated has no impact on the process. The Alternative hypothesis, then, is that the factor does have an impact on the process. More broadly stated, the Null hypothesis describes the status quo: any changes are due to random chance. The Alternative hypothesis is often what we are trying to prove. If we think that a factor has an impact on performance, we want to show that effect statistically.
The characteristics of well-written Hypotheses are:
- They clearly identify the population being considered.
- They identify the dependent variable and independent variable(s).
- They state the type or direction of the effect – such as greater than or less than.
The Hypothesis test will then do a statistical assessment of the data and conclude to either reject the Null hypothesis or fail to reject the Null hypothesis. The focus of the Hypothesis test is to see if there is evidence of something different about the data. If that difference is found, then reject the Null. If there is no difference found, there is no reason to reject the Null.
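The three characteristics and the two possible conclusions described above can be sketched in a few lines of Python. This is only an illustration, not part of the lesson material: the class name `HypothesisPair`, the function `decide`, the 0.05 significance level, and the p-values passed in at the end are all assumptions made for the example. The production line wording is borrowed from the lesson's own example.

```python
from dataclasses import dataclass

# Symbols for the kinds of effect an Alternative hypothesis can state.
_ALTERNATIVE_SYMBOL = {"less": "<", "greater": ">", "different": "!="}


@dataclass(frozen=True)
class HypothesisPair:
    """Hypothetical container for the parts of a well-written hypothesis."""
    population: str   # the population being considered
    dependent: str    # the dependent variable (what is measured)
    group_a: str      # first level of the independent variable
    group_b: str      # second level of the independent variable
    direction: str    # "less", "greater", or "different"

    def null(self) -> str:
        # The Null hypothesis always states that there is no difference.
        return f"Ho: {self.dependent}({self.group_a}) = {self.dependent}({self.group_b})"

    def alternative(self) -> str:
        # The Alternative hypothesis states the type or direction of the effect.
        symbol = _ALTERNATIVE_SYMBOL[self.direction]
        return f"Ha: {self.dependent}({self.group_a}) {symbol} {self.dependent}({self.group_b})"


def decide(p_value: float, alpha: float = 0.05) -> str:
    """Return one of the two conclusions a hypothesis test can reach.

    alpha is an assumed significance level; the lesson does not specify one.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    return "reject Ho" if p_value < alpha else "fail to reject Ho"


pair = HypothesisPair(
    population="production lines",
    dependent="mean test defects",
    group_a="line A",
    group_b="line B",
    direction="less",
)
print(pair.null())
print(pair.alternative())
print(decide(p_value=0.03))  # hypothetical p-value below alpha
print(decide(p_value=0.20))  # hypothetical p-value above alpha
```

Note that `decide` never returns "accept Ho": a large p-value only means the data gave no reason to reject the Null.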
Hints & tips
- Be clear about the population of data required and the measure you want to use. These are needed to determine which Hypothesis test should be used.
- An excellent way to provide tips is to show several examples:
- Example 1:
- Ho: There is no significant difference in test defects between the data from production line A and production line B.
- Ha: The incidence of test defects on production line A is less than on production line B.
- Example 2:
- Ho: There is no statistically significant difference in repair rates between product made with supplier A material and supplier B material.
- Ha: There is a higher incidence of repair when using materials from supplier A rather than with materials from supplier B.
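Written as equations, the two examples might look like the following. The symbols are assumptions chosen for illustration: \mu stands for the average test defect value of a production line, and r stands for the normalized repair rate for a supplier's material.

```latex
% Example 1: average test defect value for production lines A and B
\[
H_0:\ \mu_A = \mu_B
\qquad\text{versus}\qquad
H_a:\ \mu_A < \mu_B
\]

% Example 2: normalized repair rate with supplier A and supplier B material
\[
H_0:\ r_A = r_B
\qquad\text{versus}\qquad
H_a:\ r_A > r_B
\]
```

In both cases the Null hypothesis is the equality and the Alternative hypothesis carries the direction of the effect.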
- 00:04 Hello, I'm Ray Sheen.
- 00:06 Let's take a few minutes now and discuss hypothesis.
- 00:10 A well written hypothesis statement is at the core of successful hypothesis testing.
- 00:15 So let's discuss the principles for writing a hypothesis.
- 00:20 Writing the hypothesis creates focus for the rest of the hypothesis testing analysis.
- 00:25 The hypothesis should both define the population of data or
- 00:28 information to be used in the analysis, and the parameter of interest.
- 00:32 When writing the hypothesis,
- 00:34 first start by stating, in practical terms, what you want to investigate.
- 00:38 It could be something like, does the value of this process characteristic make any
- 00:42 difference in how the process performs?
- 00:44 Or, does this change in the process improve the quality of the process
- 00:48 results?
- 00:50 Now, we convert the question into two practical statements
- 00:53 that can be expressed using mathematical equations.
- 00:56 These equations will involve a parameter of the data sets.
- 00:59 So for the first question, does the value of this process characteristic make
- 01:03 a difference in how the process performs?
- 01:05 We can create an equation that states that the mean of the total data set
- 01:08 of the process performance equals or does not equal
- 01:11 the mean of the data set when this process characteristic is at one extreme value.
- 01:16 And when considering the question,
- 01:18 does this change in the process improve the quality of the process results?
- 01:22 We can create an equation for the case where the mean of the process quality
- 01:26 metric is lower after the change is put in place and
- 01:29 one for the case where the mean stays the same after the change is put in place.
- 01:32 Note that we are creating two equations, one for when the effect occurs and one for
- 01:37 when the effect does not occur.
- 01:39 I find, often,
- 01:40 it's easier to write the equation for the alternative hypothesis first.
- 01:44 This is the equation for when the effect I'm trying to prove or disprove occurs.
- 01:48 So in the first case, it is that the mean of the full data set does not equal
- 01:52 the mean of the data when the process characteristic is at the extreme value.
- 01:56 And for the second one, it is the mean of the process quality metric is lower after
- 02:00 the change has been put into the process.
- 02:03 The null hypothesis then would apply if the effect did not make a difference.
- 02:08 In both of these examples,
- 02:09 it would be that the means of the two data sets are equal.
- 02:12 As you can see, often the null and alternative hypotheses are opposite
- 02:16 answers with respect to the question that is being asked.
- 02:20 Let's look at the formal definition of the two parts of the hypothesis.
- 02:24 The null hypothesis is designated as H sub 0.
- 02:28 It always represents the status quo.
- 02:30 It is saying that whatever change we are seeing between the statistical metrics of
- 02:34 the data sets is just based upon the normal variation that occurs.
- 02:37 It's like our commute to work.
- 02:39 Each day it's a little longer or shorter, depending upon traffic or the weather.
- 02:43 However, unless there's some major accident or other event that changes
- 02:46 traffic patterns and schedules, the variation from day to day is just random.
- 02:51 There's nothing worth looking at here.
- 02:53 Nothing significant is happening.
- 02:55 In our hypothesis testing approach, we'll start by assuming that this is the case.
- 03:00 That means that the data has to prove that something significant is happening.
- 03:04 We don't start by assuming that there is a major change and
- 03:07 the data has to prove that there isn't a difference.
- 03:09 More about that in another lesson.
- 03:11 The final result of our hypothesis test is that we accept or
- 03:15 reject the null hypothesis.
- 03:17 And just to be clear,
- 03:18 we accept the null hypothesis unless we can prove that we should reject it.
- 03:23 Now let's consider the alternative hypothesis.
- 03:26 This is represented by the symbol H sub a.
- 03:29 The hypothesis describes the differences we are attempting to prove.
- 03:33 Recall I said that the hypothesis testing is using data to prove what we
- 03:37 think we know.
- 03:38 This is what we think we know.
- 03:40 This is the difference we think exists in the data.
- 03:43 The hypothesis testing will prove whether that difference is
- 03:46 statistically significant or just due to random chance.
- 03:49 It will validate whether there is a real difference or not.
- 03:52 Let's go through some more examples.
- 03:54 And you should definitely do the exercises to practice writing the hypothesis.
- 03:58 A well written hypothesis will easily convert to an equation.
- 04:02 Remember, the hypothesis is a pair of statements
- 04:04 that expresses two different mathematical relationships for the data sets.
- 04:08 That is because it clearly spells out the population being investigated; therefore,
- 04:12 we know what subset of data to use.
- 04:15 It identifies the dependent variable and the independent variable, which helps us to
- 04:18 determine the data characteristic to be measured, which is the dependent variable,
- 04:23 and the screening or differentiation of that data population to be used,
- 04:26 which is the independent variable.
- 04:29 Finally, the type of relationship or the direction of the relationship is clear.
- 04:33 Do we just wanna find out if there's a difference?
- 04:35 Or do we want to show that the parameter for one data set is larger or
- 04:38 smaller than for the other data set?
- 04:41 Let's look at an example.
- 04:43 I'll start with the alternative hypothesis since that is often the one we
- 04:46 write first.
- 04:47 Production line A has a lower incidence of test defects than production line B.
- 04:52 Notice that it tells us the population we are dealing with,
- 04:55 our production lines. It tells us the dependent variable, which is the defects, and
- 05:00 the independent variable, which is line A or line B.
- 05:03 And it gives us a direction for the relationship.
- 05:05 Line A is lower than line B.
- 05:08 Mathematically, this hypothesis statement would be that the average defect value for
- 05:12 line A is less than the average defect value for line B.
- 05:17 The null hypothesis is the opposite position.
- 05:19 There is no significant difference in test defects
- 05:22 between data from production line A and production line B.
- 05:26 Mathematically, this would be that the average defect value for
- 05:29 both lines is equal.
- 05:32 Let's look at another example,
- 05:33 the alternative hypothesis states there is a higher incidence of repair when
- 05:38 using materials from supplier A rather than material from supplier B.
- 05:42 The population we are looking at is our repair process.
- 05:45 The dependent variable is the incidence of repair, not the repair time or
- 05:49 the repair cost.
- 05:50 The independent variable is the material, either supplier A or Supplier B.
- 05:55 And the relationship is that the rate of repair with supplier A material
- 05:58 is greater than supplier B material.
- 06:00 Since we're talking about the incidence or rate of repair, we need to normalize
- 06:04 the repair numbers based on the amount of product with each type of material.
- 06:08 Mathematically, we would state that
- 06:10 the normalized average number of units being repaired with material A
- 06:14 is greater than the normalized average number of units being repaired with material B.
- 06:18 And, of course, the null hypothesis is, there is no statistically significant
- 06:22 difference in repair rates between products made with supplier A material and
- 06:26 supplier B material.
- 06:28 Mathematically, we would say that the normalized average repair rate for
- 06:31 both material types is the same.
- 06:35 Writing a clear null and alternative hypothesis will focus your analysis and
- 06:40 lead to a clear answer to the question you're investigating.
- 06:44 But you must follow that null and alternative hypothesis format.
- 06:48 All of the tests are structured to either accept or reject the null hypothesis.
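As a rough illustration of how the supplier example could be carried through to a conclusion, here is a minimal sketch using only the Python standard library. The lesson does not say which test to use, so the one-sided two-proportion z-test, the 0.05 significance level, the function name `repair_rate_test`, and the repair counts are all assumptions made for this example and are not data from the lesson.

```python
from math import sqrt
from statistics import NormalDist


def repair_rate_test(repairs_a, units_a, repairs_b, units_b, alpha=0.05):
    """One-sided two-proportion z-test.

    Ho: repair rate with supplier A material = rate with supplier B material
    Ha: repair rate with supplier A material > rate with supplier B material
    """
    # Normalize the repair counts by the amount of product for each material.
    rate_a = repairs_a / units_a
    rate_b = repairs_b / units_b

    # Under Ho both materials share one rate, so pool the data to estimate it.
    pooled = (repairs_a + repairs_b) / (units_a + units_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / units_a + 1 / units_b))
    if std_err == 0:
        raise ValueError("no variation in the data; the test is undefined")

    z = (rate_a - rate_b) / std_err
    # Upper-tail probability, because Ha says the rate for A is greater.
    p_value = 1 - NormalDist().cdf(z)
    decision = "reject Ho" if p_value < alpha else "fail to reject Ho"
    return rate_a, rate_b, z, p_value, decision


# Hypothetical counts: 30 repairs in 400 units (A), 18 repairs in 450 units (B).
rate_a, rate_b, z, p_value, decision = repair_rate_test(30, 400, 18, 450)
print(f"rate A = {rate_a:.3f}, rate B = {rate_b:.3f}")
print(f"z = {z:.2f}, p = {p_value:.4f}, decision: {decision}")
```

Because the Alternative hypothesis states a direction, only the upper tail of the distribution is used; a "not equal" alternative would use both tails instead.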