
Normal Versus Non-Normal

Retired course

This course has been retired and is no longer supported.

About this lesson

One of the most important criteria for selecting a hypothesis test is whether the data is normal or non-normal. The normality question does not prove or disprove the hypothesis; rather, it steers the nature of the analysis. This lesson reviews this concept and its application in hypothesis testing.

Exercise files

Download this lesson’s related exercise files.

Normal Versus Non-Normal.xlsx (10.1 KB)
Normal Versus Non-Normal - Solution.docx (238.9 KB)

Normal versus Non-Normal

Hypothesis tests can be done with either normal or non-normal data, but different tests are used for each. Therefore, a Lean Six Sigma team must be able to determine whether their data is normal or non-normal so that they can choose the correct hypothesis test.

When to Use Normal versus Non-Normal

Before conducting the hypothesis test, always check whether the data is normal or non-normal so that you can choose the correct test.

Instructions

The normal distribution, which is also called the Gaussian distribution or the bell-shaped curve, is characterized by a symmetric distribution. There are as many data points above the mean as below the mean. Also, there is a central tendency. The points are clustered near the mean. That is why when it is graphed, the center is high and the edges or tails are very small and approach zero.
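The properties described above can be checked numerically. The following is a minimal sketch using Python's standard library; the choice of a standard normal curve (mean 0, standard deviation 1) is an assumption made only for illustration.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1 (illustrative choice).
curve = NormalDist(mu=0.0, sigma=1.0)

# Symmetry: half of the area under the curve lies below the mean.
share_below_mean = curve.cdf(0.0)

# Central tendency: share of values within 1, 2, and 3 standard deviations.
within = {k: curve.cdf(k) - curve.cdf(-k) for k in (1, 2, 3)}

# Tails: the curve height shrinks toward zero as we move away from the mean.
heights = {x: curve.pdf(x) for x in (0, 1, 2, 3, 4)}

print(f"share below mean: {share_below_mean:.3f}")
for k, share in within.items():
    print(f"within {k} standard deviation(s): {share:.4f}")
for x, height in heights.items():
    print(f"curve height at {x}: {height:.5f}")
```

Roughly two thirds of the values fall within one standard deviation of the mean, which is the clustering near the center that gives the curve its bell shape.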

A normal data distribution represents the random variation that occurs within every physical system. Characteristics that can cause a data distribution to become non-normal include too many extreme points or outliers, an overlap of several different processes in the data, or a physical limit that truncates one of the tails prematurely.

Hypothesis testing can be done with either normal or non-normal data. There are different tests that are done depending upon the type of data. That is why this is the first question that is asked in the Hypothesis Testing Decision Tree after special causes have been removed. Depending upon this answer, a completely different set of tests will be involved.

Normality is determined using basic descriptive statistics of the data sample. When doing that test, several parameters are determined:

Mean – the average of all the data points. This is often used in hypothesis tests with normal data.
Median – the midpoint of the data points. This is often used in hypothesis tests with non-normal data.
Standard Deviation – a measure of the spread or width of the distribution. This measure and the variance, which is the standard deviation squared, are often used in hypothesis testing.
Skewness – a measure of symmetry. A symmetrical distribution has a skewness value of zero. As a rule of thumb in this lesson, the distribution is treated as normal as long as the value is between -0.8 and +0.8.
Kurtosis – a measure of whether the tails are “heavy” or “light.” When they are light, they taper down to near zero on the upper and lower edges of the distribution. Kurtosis can be measured in several ways. The method used in Excel is “Sample Excess Kurtosis.” This measure has the advantage that a normal curve scores zero, just as with skewness. Here too, values from -0.8 to +0.8 are treated as normal.
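For readers who want to see the arithmetic behind these statistics, here is a minimal Python sketch. The function names (`excel_skew`, `excel_kurt`, `looks_normal`) and both data sets are made up for illustration. The skewness and kurtosis formulas are the sample versions documented for Excel's SKEW and KURT worksheet functions, and the ±0.8 limit is this lesson's rule of thumb, not a universal standard.

```python
from statistics import mean, median, stdev

def excel_skew(data):
    """Sample skewness, following the formula documented for Excel's SKEW."""
    n = len(data)
    m, s = mean(data), stdev(data)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in data)

def excel_kurt(data):
    """Sample excess kurtosis, following the formula documented for Excel's KURT."""
    n = len(data)
    m, s = mean(data), stdev(data)
    fourth = sum(((x - m) / s) ** 4 for x in data)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction

def looks_normal(data, limit=0.8):
    """Lesson's rule of thumb: skewness and kurtosis both within +/- limit."""
    return abs(excel_skew(data)) <= limit and abs(excel_kurt(data)) <= limit

# Hypothetical samples, invented for this example only.
sample = [9.5, 9.7, 9.8, 9.8, 9.9, 9.9, 10.0, 10.0,
          10.1, 10.1, 10.2, 10.2, 10.3, 10.5]
wait_times = [1, 1, 2, 2, 2, 3, 3, 4, 5, 7, 12, 20]

for name, data in (("sample", sample), ("wait_times", wait_times)):
    print(name)
    print(f"  mean={mean(data):.3f}  median={median(data):.3f}  "
          f"stdev={stdev(data):.3f}")
    print(f"  skewness={excel_skew(data):.3f}  "
          f"excess kurtosis={excel_kurt(data):.3f}")
    print(f"  treat as normal: {looks_normal(data)}")
```

The first sample is symmetric and bell-shaped, so it passes the rule of thumb. The second has a long upper tail, which pushes its skewness well above +0.8.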

Normality can be checked in either Excel or Minitab.

Excel:
- Select “Data Analysis” on the “Data” ribbon.
- Select “Descriptive Statistics” and click “OK.”
- Enter the range for your data in “Input Range.”
- Select where you want the results – in a new worksheet or in a location in the current worksheet.
- Select “Summary Statistics.”
- Click on “OK.”

Minitab:
- Open the “Stat” menu.
- Select “Basic Statistics.”
- Select “Normality Test.”
- Enter the name of the column with your sample data in the “Variable” window.
- Ensure the “Anderson-Darling” box is checked.

Excel will provide a table with the statistical values and you can then decide if the data is normal or non-normal. Minitab will provide a plot of the data against a normal line and provide a P value that can be used to determine if the data is normal.
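To show roughly what an Anderson-Darling normality check computes, here is a minimal Python sketch. It is not Minitab's implementation. The function name `anderson_darling_normal` is hypothetical, the p-value comes from the published D'Agostino-Stephens approximation for the case where the mean and standard deviation are estimated from the sample, and both data sets are synthetic. Results may not match Minitab's output digit for digit.

```python
from math import exp, log
from statistics import NormalDist, mean, stdev

def anderson_darling_normal(data):
    """Return (A-squared, approximate p-value) for a normality check."""
    n = len(data)
    m, s = mean(data), stdev(data)
    std_normal = NormalDist()
    # Normal cumulative probability of each sorted, standardized point.
    cdf = [std_normal.cdf((x - m) / s) for x in sorted(data)]
    total = sum(
        (2 * i + 1) * (log(cdf[i]) + log(1.0 - cdf[n - 1 - i]))
        for i in range(n)
    )
    a2 = -n - total / n
    # Small-sample adjustment, then the piecewise p-value approximation.
    a2_adj = a2 * (1 + 0.75 / n + 2.25 / n**2)
    if a2_adj >= 0.6:
        p = exp(1.2937 - 5.709 * a2_adj + 0.0186 * a2_adj**2)
    elif a2_adj > 0.34:
        p = exp(0.9177 - 4.279 * a2_adj - 1.38 * a2_adj**2)
    elif a2_adj > 0.2:
        p = 1 - exp(-8.318 + 42.796 * a2_adj - 59.938 * a2_adj**2)
    else:
        p = 1 - exp(-13.436 + 101.14 * a2_adj - 223.73 * a2_adj**2)
    return a2, p

# Synthetic data: evenly spaced quantiles of a bell curve and of a
# strongly right-skewed (exponential) curve, 40 points each.
bell = [NormalDist(50, 5).inv_cdf((i + 0.5) / 40) for i in range(40)]
skewed = [-log(1 - (i + 0.5) / 40) for i in range(40)]

for name, data in (("bell", bell), ("skewed", skewed)):
    a2, p = anderson_darling_normal(data)
    verdict = "treat as normal" if p > 0.05 else "treat as non-normal"
    print(f"{name}: A-squared={a2:.3f}, p={p:.4f} -> {verdict}")
```

The decision rule is the same one applied to the Minitab output: a P value above 0.05 means there is no evidence against normality, so the data set is treated as normal.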

Hints and Tips

If the Data Analysis menu does not show on your Data ribbon in Excel, you need to add the Analysis ToolPak add-in. Go to the File menu, select Options, then select Add-ins, and enable the Analysis ToolPak add-in. This is a free feature that is already in Excel; you just need to enable it. You may need to close and reopen Excel for the menu to appear.
If you don’t have Minitab, consider downloading a free trial. Minitab normally has a 30-day free trial period. All of the hypothesis tests will be demonstrated in Minitab. Approximately half of the tests will also be demonstrated in Excel, but the other half are not available in Excel. If you want to practice doing all the tests, you will need Minitab. Be sure you complete the course within the 30 days before your trial expires.
When using Minitab, data must always be entered into columns - never into rows. Minitab uses the column names for identifying data sets.
Data can be copied and pasted back and forth between Excel and Minitab. I often collect data in Excel because that is easier for data collection, and then copy it to Minitab for analysis.
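If your data was collected in rows, it needs to be turned into columns first. A minimal Python sketch of that transposition follows. The labels and readings are hypothetical, and it assumes every row has the same number of readings (otherwise `zip` silently drops the extra values). Excel's own Transpose paste option does the same job without any code.

```python
import csv
import io

# Hypothetical measurements laid out in rows: label first, readings after.
rows = [
    ["Line A", 10.1, 9.8, 10.3],
    ["Line B", 11.0, 10.7, 10.9],
]

# zip(*rows) turns each row into a column; the labels become the header row.
columns = list(zip(*rows))

# Emit tab-separated text with one data series per column.
buffer = io.StringIO()
csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(columns)
print(buffer.getvalue())
```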

00:04 Hi, I'm Ray Sheen.
00:06 A key question to ask when preparing to conduct a hypothesis test is whether
00:10 the data set is normal.
00:12 Let's review how we make that determination.
00:16 If you look at the decision tree for the hypothesis test selection, the first
00:20 decision for selecting between different tests is the question of whether or
00:24 not the data is normal.
00:25 Many of the normal data tests are somewhat forgiving for
00:28 data that is close to normal.
00:30 But if the data is definitely not normal,
00:32 you should do a hypothesis test of non-normal data.
00:36 When we had to do these tests by hand, this decision was a big deal.
00:40 The non-normal tests were generally more complicated, and so
00:43 we hoped that the normal data tests, if possible, could still be used.
00:47 It's still an issue for you if you do your statistical analysis in Excel.
00:51 While Excel has a function for almost all of the hypothesis tests that
00:55 will use normal data, it does not have a function for the non-normal tests.
00:59 If you need to do non-normal tests,
01:01 you should plan on using a statistical application like Minitab.
01:05 Let's quickly review what we mean by normal data.
01:09 Normal data can be graphically represented with the normal curve or
01:12 as it is more commonly called, the bell-shaped curve.
01:15 This is characterized by a symmetric curve with a central peak and
01:19 tails that go to zero.
01:21 This is also referred to as the Gaussian curve.
01:24 A few of the mathematical or
01:25 statistical attributes of this curve are that it can be divided in half
01:29 with an equal amount of data falling on each side of the center point.
01:33 The center peak is not only the center of the data,
01:35 it is also the average value or mean for the data.
01:39 Finally, mathematically, we say that the edges never reach zero but
01:43 rather approach it asymptotically.
01:45 However, in practice, there will be an upper and lower limit to the data.
01:49 The normal curve tells us that whatever we see in the distribution
01:53 is due to random effects that occur within any process.
01:57 So let's now talk about what we mean when we say that the data is not normal.
02:02 First, let's acknowledge that many times the data is not normal.
02:06 There are many things in the physical world, or in process and
02:08 equipment design, that prevent the data distribution from being normal.
02:13 Examples include extreme points that disrupt the edges of the distribution.
02:17 Too many of these points is an indication of a special cause occurring.
02:22 Also, there may be physical limits.
02:24 For instance, a parameter may be limited so that it cannot be less than zero.
02:28 Another thing we're often trying to test with our hypothesis is whether or
02:31 not we have a combination of two or
02:33 more processes that are overlapping in our data.
02:36 This is not an exhaustive list, but
02:38 it's an illustrative one of why there is non-normal data.
02:41 And as we saw from our decision tree,
02:43 hypothesis testing does not require normal data.
02:46 In that regard it is different from some of the other statistical methods such as
02:49 statistical process control which does require that the data is normal.
02:54 The statistical test we will use tells us whether the data is normal, but
02:57 I will suggest that you graph it.
03:00 I can recognize non-normality from graphical data much easier
03:03 than from a table of numbers.
03:05 Don't be scared by non-normal data, just use your appropriate non-normal tests in
03:10 Minitab or other statistical software package.
03:13 So let's look at how we statistically check whether we should treat a data set
03:16 as normal or non-normal.
03:18 I'll start by showing how to do that in Excel.
03:21 We do this with the Descriptive Statistics function in the Excel Data Analysis menu.
03:26 The Data Analysis menu is in the Data ribbon of Excel.
03:30 If you don't have the Data Analysis menu, you will need to enable this add-in.
03:34 Just go to the File menu, select Options, then select Add-ins.
03:39 Now select the Analysis ToolPak and click OK.
03:43 This add-in is free, so don't hesitate to load it into your Data ribbon.
03:48 So in Excel, go to the Data ribbon and select the Data Analysis menu.
03:52 This brings up this menu of statistical functions.
03:55 Select Descriptive Statistics and select OK.
03:58 This will bring up the next panel where we can identify our data for analysis.
04:02 Enter the row or column for your data in the input range field.
04:06 Select where you want the results presented, either in a new worksheet, or
04:10 in the designated spot within the current worksheet.
04:12 Finally, select Summary Statistics as a type of output you want to see.
04:17 Now click OK and Excel will create the summary statistics for
04:21 the selected data set.
04:23 Let's look at what Excel will give us for our descriptive statistics.
04:26 These statistics can be used to determine normality.
04:30 It starts with some of the basic statistics for the distribution
04:33 such as the mean or average value and the standard deviation for the data set.
04:38 Another measure is the skewness, which helps us determine if the data is
04:41 symmetric, or if it is weighted towards the upper or lower end of the data.
04:46 A perfectly symmetric data set has a skewness value of 0.
04:49 But frankly, as long as the skewness is not less than -.8 or
04:53 greater than positive .8, then we can consider it to be normal.
04:58 The last measure I want to mention is Kurtosis.
05:01 This is the measure for the size of the peak at the center of the data and
05:05 of the edges or tails approaching 0.
05:07 Excel uses the measure called Sample Excess Kurtosis
05:11 which subtracts out the nominal value for Kurtosis.
05:14 This Kurtosis measure has the advantage that it works just like skewness.
05:18 A 0 value is a perfectly normal curve and
05:21 any value between -.8 to +.8 is considered acceptable for a normal curve.
05:27 So, given the values in this table, this data is normal.
05:31 Now, let's look at using Minitab to do a normality test.
05:35 By the way, Minitab has a 30-day free trial.
05:38 So if you wanna try Minitab, get the download.
05:40 It will be very useful for completing the exercises in this course.
05:44 Like Excel, Minitab will check for normality of a data set.
05:48 If you have not used Minitab before, a couple of pointers.
05:52 The menu structure works like Excel.
05:54 In fact, you can cut and paste data back and forth between Excel and Minitab.
05:59 However, one constraint.
06:00 Minitab likes to work with columns of data.
06:04 So if your data is in rows,
06:05 you will need to transpose that to columns before putting it into Minitab.
06:09 To check the normality of a data set, go to the Stat pull down menu.
06:13 Select the first item, Basic Statistics, and
06:16 then near the bottom of that menu, select the Normality Test.
06:20 On the panel that comes up, select the columns with your data and
06:24 make sure the Anderson-Darling test is selected.
06:27 Let's look at the results of the Normality test in Minitab.
06:31 Minitab will plot the data set against the line which represents
06:35 the theoretically perfect normal curve.
06:37 Minitab will calculate some of the same data set statistics such as mean and
06:42 standard deviation.
06:43 These are the same as with Excel.
06:45 Minitab will also calculate a P value for this test, which is 0.383.
06:51 As we know, if the P value is greater than .05, we stick with the null hypothesis.
06:57 The null hypothesis in a normality check is always that the data set is normal.
07:02 There is nothing special in the data.
07:04 So as we determined by looking at skewness and
07:06 Kurtosis in Excel, we can reach the same conclusion
07:10 that the data is normal by checking the P value in a Minitab normality test.
07:16 Determining whether our data is normal or
07:19 non-normal is the first decision we must make when selecting a hypothesis test.


