About this lesson
One of the most important criteria for selecting a hypothesis test is whether the data being analyzed is normal or not normal. The normality question does not prove or disprove the hypothesis; rather, it determines the type of statistical test that should be performed. This lesson reviews the concept of normality and how to determine it.
Quick reference
Normal Distribution
Hypothesis tests can be done with either normal or non-normal data, but different tests are used for each. Therefore, a Lean Six Sigma team must be able to determine whether its data is normal or non-normal so that it can choose the correct hypothesis test.
When to use
Prior to actually conducting the hypothesis test, the data should always be checked to determine if it is normal or non-normal so as to be able to choose the correct test.
Instructions
The normal distribution, which is also called the Gaussian distribution or the bell-shaped curve, is characterized by a symmetric distribution. There are as many data points above the mean as below the mean. Also, there is a central tendency. The points are clustered near the mean. That is why when it is graphed, the center is high and the edges or tails are very small and approach zero.
A normal data distribution represents random variation that occurs within every physical system.
Hypothesis testing can be done with either normal or non-normal data. There are different tests that are done depending on the type of data. That is why this is a key question that is asked in the Hypothesis Testing Decision Tree. Depending upon this answer, a completely different set of tests will be involved.
Normality is determined using basic descriptive statistics of the data sample. When doing that test, several parameters are determined:
- Mean – the average of all the data points. This is often used in hypothesis tests with normal data.
- Median – the midpoint of the data points. This is often used in hypothesis tests with non-normal data.
- Standard Deviation – a measure of the spread or width of the distribution. This measure and Variance, which is the standard deviation squared, are often used in hypothesis testing.
- Skewness – this is a measure of symmetry. A symmetrical distribution will have a skewness value of zero. The distribution is considered normal as long as the value is between -0.8 and +0.8.
- Kurtosis – this is a measure of whether the tails are “heavy” or “light.” When they are light, they taper down to near zero on the upper and lower edges of the distribution. Kurtosis can be measured in several ways. The method used in Excel is “Sample Excess Kurtosis.” This measure has the advantage that a Normal curve score will be zero – just like with Skewness. In this case, values from -0.8 to +0.8 are still considered Normal.
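For teams that want to cross-check the Excel output, the skewness and kurtosis check can be sketched in Python. This is a minimal sketch and not part of the lesson: the function names `describe` and `looks_normal` are hypothetical, and the formulas are the adjusted sample skewness and sample excess kurtosis that Excel documents for its SKEW and KURT functions, which are assumed here to be what Descriptive Statistics reports. The 0.8 limit is the rule of thumb given above.

```python
import statistics


def describe(data):
    """Return the descriptive statistics listed above for one sample."""
    n = len(data)
    if n < 4:
        raise ValueError("sample excess kurtosis needs at least 4 data points")
    mean = statistics.mean(data)
    stdev = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    z = [(x - mean) / stdev for x in data]  # standardized values

    # Adjusted sample skewness (assumption: same formula as Excel's SKEW).
    skewness = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)

    # Sample excess kurtosis (assumption: same formula as Excel's KURT).
    kurtosis = (
        n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
        - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    )

    return {
        "mean": mean,
        "median": statistics.median(data),
        "stdev": stdev,
        "variance": stdev ** 2,
        "skewness": skewness,
        "kurtosis": kurtosis,
    }


def looks_normal(stats, limit=0.8):
    """Apply the lesson's rule of thumb: both values within -limit..+limit."""
    return abs(stats["skewness"]) <= limit and abs(stats["kurtosis"]) <= limit


if __name__ == "__main__":
    sample = [3, 4, 4, 5, 5, 5, 6, 6, 7]  # hypothetical measurements
    stats = describe(sample)
    for name, value in stats.items():
        print(f"{name}: {value:.4f}")
    print("treat as normal:", looks_normal(stats))
```

For this made-up sample the skewness is 0 and the excess kurtosis is about -0.29, so the rule of thumb treats the data as normal. A perfectly flat sample such as the integers 1 through 9 is also symmetric, but its excess kurtosis is -1.2, so it fails the check.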
Normality can be checked in either Excel or Minitab.
- Excel:
- Select “Data Analysis” on the “Data” ribbon.
- Select “Descriptive Statistics” and click “OK.”
- Enter the range for your data in “Input Range.”
- Select where you want the results – in a new worksheet or in a location in the current worksheet.
- Select “Summary Statistics.”
- Click on “OK.”
- View the results and analyze for normality.
- Minitab:
- “Stat” Menu
- Select “Basic Statistics”
- Select “Normality Test”
- Enter the name of the column with your sample data in the “Variable” window
- Ensure the “Anderson-Darling” box is checked
- View the results and check for normality.
Excel will provide a table with the statistical values and you can then decide if the data is normal or non-normal. Minitab will provide a plot of the data against a normal line and provide a P value that can be used to determine if the data is normal.
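The P value decision can also be sketched outside Minitab. The following is a rough sketch under stated assumptions: it computes the Anderson-Darling statistic for normality, with the mean and standard deviation estimated from the sample, and converts it to an approximate P value using the small-sample adjustment and piecewise formulas published by D'Agostino and Stephens. Whether Minitab uses exactly this approximation is an assumption, so small differences from Minitab's reported P value should be expected. The function name `anderson_darling_normal` and the sample values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def anderson_darling_normal(data):
    """Return (A2, approximate P value) for H0: the data is normal."""
    n = len(data)
    if n < 8:
        # Assumption: the P value approximation is usually quoted for n >= 8.
        raise ValueError("use at least 8 data points")
    m, s = mean(data), stdev(data)
    std_normal = NormalDist()

    # Normal CDF of each standardized value, in ascending order.
    cdf = [std_normal.cdf((x - m) / s) for x in sorted(data)]

    # Anderson-Darling statistic A2 (0-based i, so the weight 2i - 1 becomes 2i + 1).
    total = sum(
        (2 * i + 1) * (math.log(cdf[i]) + math.log(1 - cdf[n - 1 - i]))
        for i in range(n)
    )
    a2 = -n - total / n

    # Small-sample adjustment, then the piecewise P value approximation.
    a = a2 * (1 + 0.75 / n + 2.25 / n ** 2)
    if a >= 0.6:
        p = math.exp(1.2937 - 5.709 * a + 0.0186 * a ** 2)
    elif a > 0.34:
        p = math.exp(0.9177 - 4.279 * a - 1.38 * a ** 2)
    elif a > 0.2:
        p = 1 - math.exp(-8.318 + 42.796 * a - 59.938 * a ** 2)
    else:
        p = 1 - math.exp(-13.436 + 101.14 * a - 223.73 * a ** 2)
    return a2, p


if __name__ == "__main__":
    # Hypothetical measurements, one column of data as Minitab expects.
    sample = [9.8, 10.1, 10.0, 9.7, 10.3, 10.2, 9.9, 10.0, 10.4, 9.6, 10.1, 9.9]
    a2, p = anderson_darling_normal(sample)
    print(f"A-squared = {a2:.3f}, P value = {p:.3f}")
    if p > 0.05:
        print("Fail to reject the null hypothesis: treat the data as normal.")
    else:
        print("Reject the null hypothesis: treat the data as non-normal.")
```

The decision rule at the end mirrors the one used with Minitab's output: a P value above 0.05 means there is no evidence against normality, so the tests for normal data can be used.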
Hints & tips
- If the Data Analysis menu does not show on your Data ribbon in Excel, you need to add the Analysis ToolPak add-in. Go to the File menu, select Options, then select Add-ins. Enable the Analysis ToolPak add-in. This is a free feature that is already in Excel; you just need to enable it. You may need to close and reopen Excel for the menu to appear.
- If you don’t have Minitab, consider downloading a free trial. Minitab normally has a 30-day free trial period. All of the hypothesis tests will be demonstrated in Minitab. Approximately half of the tests will also be demonstrated in Excel, but the other half are not available in Excel. If you want to practice doing all the tests, you will need Minitab. Be sure you complete the course within 30 days before your trial expires.
- When using Minitab, data must always be entered into columns - never into rows. Minitab uses column names for identifying data sets.
- Data can be copied and pasted back and forth between Excel and Minitab. I often collect data in Excel because that is easier for data collection, and then copy it to Minitab for analysis.
- 00:04 Hi, I'm Ray Sheen.
- 00:06 A key question to ask when doing a hypothesis test is whether or
- 00:09 not your data set is normal.
- 00:11 Let's review what we mean by normal.
- 00:15 >> If we look at the decision tree for hypothesis test selection,
- 00:19 a critical decision for selection between many of the tests is the question
- 00:23 of whether or not the data is normal.
- 00:26 Many of the normal data tests are somewhat forgiving for
- 00:29 data that is close to normal.
- 00:31 But if the data is definitely non-normal,
- 00:33 you should use a hypothesis test designed for non-normal data.
- 00:36 When we had to do those tests by hand, the decision was a big deal.
- 00:41 The non-normal tests were generally more complicated.
- 00:43 So we hoped to be able to do the normal tests if possible.
- 00:47 It still is an issue for you if you do your statistical analysis in Excel.
- 00:52 While Excel has a function for almost all of the hypothesis tests that use
- 00:55 normal data, it does not have a function for those that use non-normal data.
- 01:00 If you need to do non-normal testing,
- 01:03 you should plan to use a statistical analysis tool like Minitab.
- 01:07 Let's quickly review what we mean by normal data.
- 01:10 Normal data occurs when the only reason variation exists in the data set is due to
- 01:15 the true random variation that happens with any physical process.
- 01:20 There are no special causes of variation.
- 01:23 Normal data can be graphically represented with the normal curve or
- 01:27 as it is more commonly called the Bell-Shaped curve.
- 01:31 This is characterized by a symmetric curve with a central peak and
- 01:34 tails that go to zero.
- 01:36 And this is also referred to as the Gaussian curve.
- 01:40 As we know from our decision tree, many of the hypothesis tests require normal data.
- 01:45 Often they will use the mean or standard deviation of the data, and
- 01:49 if the distribution is not normal, the mean or
- 01:52 average may not be a good representation of the data.
- 01:55 So bottom line, always check for normality before selecting your hypothesis test.
- 02:01 So let's look at how we statistically check whether we should treat a dataset as
- 02:05 normal or non-normal.
- 02:06 I'll start showing how we do this in Excel.
- 02:10 We do this with the Descriptive Statistics function in the Excel Data Analysis menu.
- 02:14 The data analysis menu is in the data ribbon of Excel.
- 02:18 If you don't have the data analysis menu in your data ribbon,
- 02:21 you will need to enable this add-in.
- 02:23 Just go to the File menu, select Options, then select Add-ins.
- 02:28 Now, select the Analysis ToolPak, and click OK.
- 02:32 This add-in is free, so don't hesitate to put that into your data ribbon.
- 02:37 So in Excel, go to the data ribbon and select the Data Analysis menu.
- 02:41 That brings up a menu of statistical functions.
- 02:44 Select Descriptive Statistics and select OK.
- 02:48 That will bring up the next panel, in which we can identify our data for analysis.
- 02:53 Enter the row or column for your data in the input range field.
- 02:57 Select where you want the results presented, either in a new worksheet or
- 03:00 at a designated spot within the current worksheet.
- 03:03 Finally, select Summary Statistics as a type of output that you want to see.
- 03:08 Now click OK, and Excel will create the summary statistics for
- 03:12 the selected data set.
- 03:14 Let's look at what Excel will give us for descriptive statistics.
- 03:18 These statistics can be used for determining normality.
- 03:21 It starts with some of the basic statistics for the distribution,
- 03:25 such as the mean or average value and the standard deviation for that dataset.
- 03:30 Another measure is skewness, which helps us to determine if the dataset is
- 03:34 symmetric or if it is weighted towards the upper or lower end of the data.
- 03:37 A perfectly symmetrical data set will have a skewness value of 0.
- 03:42 But frankly as long as the skewness is not less than -0.8 or
- 03:46 greater than +0.8, then we consider it to be normal.
- 03:51 The last measure I want to look at is kurtosis.
- 03:54 This is the measure for the peak at the center of the data and the edges or
- 03:59 tails approaching 0.
- 04:01 Excel uses the measure Sample Excess Kurtosis,
- 04:04 which subtracts out the nominal value of Kurtosis.
- 04:08 The Sample Excess Kurtosis measure has the advantage that just like with skewness,
- 04:13 a 0 value is a perfectly normal curve.
- 04:17 For our purposes, any value between -0.8 and +0.8 is
- 04:21 considered acceptable for a normal curve.
- 04:25 So, given the values we see in this table, this data is normal.
- 04:30 Now, let's look at using Minitab to do the normality tests.
- 04:34 By the way, Minitab has a 30 day free trial.
- 04:37 So if you want to try Minitab, get the download.
- 04:39 It will be very useful for completing exercises in this course.
- 04:44 Like Excel, Minitab will check for normality of a data set.
- 04:49 If you've not used Minitab before, a couple of pointers,
- 04:52 the menu structure works just like Excel.
- 04:55 In fact, you can cut and paste data back and forth between Excel and Minitab.
- 05:00 However, one constraint, Minitab likes to work with columns of data, so if your data
- 05:05 is in rows, you need to transpose it to columns before putting it into Minitab.
- 05:10 To check normality of a data set, go to the stat pull down menu,
- 05:14 select the first item, Basic Statistics,
- 05:17 and then near the bottom of that menu, select the normality test.
- 05:22 On the panel that comes up, select the column with your data and
- 05:25 make sure the Anderson-Darling test is selected.
- 05:29 Let's look at the results of the normality test in Minitab.
- 05:32 Minitab will plot the data in the dataset against a line which represents
- 05:37 the theoretically perfect normal curve.
- 05:40 Minitab will calculate some of the same dataset statistics such as mean and
- 05:45 standard deviation that we saw with Excel.
- 05:48 Minitab will also calculate a P Value for the test, which in this case is 0.383.
- 05:55 Since the P value is greater than 0.05,
- 05:58 we will stick with the null hypothesis.
- 06:01 The null hypothesis in a normality check is always that the data is normal.
- 06:06 There is nothing special in the data.
- 06:08 So just as we determined by looking at the skewness and kurtosis in Excel,
- 06:13 we can reach the same conclusion here
- 06:16 by checking the P value in a Minitab normality test: this data set is normal.
- 06:21 >> Determining whether data is normal is a key decision for
- 06:24 selecting between different hypothesis tests.