About this lesson
Non-normal data often has a pattern. Knowing that pattern can help you transform the data into normally distributed data and select an appropriate hypothesis test.
Exercise files
Download this lesson’s related exercise files.
- Non-Normal Data Exercise.docx (62.6 KB)
- Non-Normal Data Exercise Solution.docx (62.6 KB)
Quick reference
Non-normal Distribution
Non-normal data can occur from stable physical systems. Hypothesis tests can be done using non-normal data.
When to use
Before conducting a hypothesis test with a continuous “Y” and a discrete “X,” check whether the data are normal or non-normal so that you can choose the correct test.
Instructions
Non-normal data is often produced by stable physical systems, usually because of constraints in the system or its environment. Some hypothesis tests are designed to accept non-normal data sets, but different tests are best suited to different types of non-normality. The non-normal hypothesis test lessons describe which method to use for each type. Graphing or plotting the data is often the best way to determine the nature of the non-normality.
Normality is assessed using basic descriptive statistics of the data sample. Non-normal tests usually use either the median or the variance, and descriptive statistics can also indicate how far the data depart from normality. When computing descriptive statistics, several parameters are determined and several patterns of non-normality can be identified:
- Median – the middle value of the sorted data points. It is often used in hypothesis tests with non-normal data.
- Variance – a measure of the spread or width of the distribution. This measure is calculated by squaring the standard deviation of the data set.
- Skewness – a measure of symmetry. A symmetrical distribution has a skewness of zero. As a rule of thumb, the distribution is considered normal as long as the skewness is between -0.8 and +0.8; beyond that range, it is treated as non-normal.
- Kurtosis – a measure of the tails of the distribution that indicates whether they are “heavy” or “light.” There are three types of kurtosis. Leptokurtic distributions have heavy tails, with many points near the upper and lower bounds of the data. Mesokurtic distributions have tails like those of the normal curve. Platykurtic distributions have light tails that drop rapidly to near zero at the upper and lower edges of the distribution. Kurtosis can be measured in several ways. Excel uses “sample excess kurtosis,” which has the advantage that a normal curve scores zero, just as with skewness. On this scale, values from -0.8 to +0.8 are again considered normal. Minitab uses the true kurtosis scale, which places the midpoint at 3.0.
- Multi-modal – occurs when data from several distinct sources or processes are combined into the one data set being investigated. The combined data set will often have multiple “peaks,” one for each constituent data set.
- Granularity – occurs when the resolution of the measurement system is too coarse for the data, so all the data points fall into just a few discrete values.
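To make the parameters above concrete, here is a minimal Python sketch that computes the median, variance, skewness and excess kurtosis of a sample and applies the ±0.8 rule of thumb. It assumes the adjusted Fisher-Pearson skewness and the bias-corrected sample excess kurtosis formulas (the ones Excel documents for its SKEW and KURT functions), so a normal curve scores about zero on both; the function names `describe` and `classify` and the sample data are hypothetical, for illustration only.

```python
import math
import statistics


def describe(data):
    """Return median, sample variance, skewness and excess kurtosis.

    Assumes the adjusted Fisher-Pearson skewness and the bias-corrected
    sample excess kurtosis formulas, so a normal curve scores about zero.
    """
    n = len(data)
    if n < 4:
        raise ValueError("need at least 4 data points")
    mean = statistics.fmean(data)
    var = statistics.variance(data)      # sample variance (n - 1 denominator)
    sd = math.sqrt(var)                  # standard deviation = sqrt(variance)
    if sd == 0:
        raise ValueError("all values are identical")
    z = [(x - mean) / sd for x in data]  # standardized values
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "median": statistics.median(data),
        "variance": var,
        "skewness": skew,
        "excess_kurtosis": kurt,
    }


def classify(stats, limit=0.8):
    """Apply the +/-0.8 rule of thumb to skewness and excess kurtosis."""
    k = stats["excess_kurtosis"]
    if k > limit:
        tails = "leptokurtic (heavy tails)"
    elif k < -limit:
        tails = "platykurtic (light tails)"
    else:
        tails = "mesokurtic"
    normal = abs(stats["skewness"]) <= limit and abs(k) <= limit
    return normal, tails


if __name__ == "__main__":
    sample = [1, 1, 1, 2, 2, 3, 10]  # made-up, right-skewed example data
    s = describe(sample)
    print(s)
    print(classify(s))
```

With the made-up sample, the single large value pulls the skewness well above +0.8, so `classify` flags the data as non-normal. The sketch does not attempt to reproduce Minitab's output.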
Hints & tips
- If the Data Analysis menu does not appear on the Data ribbon in Excel, you need to enable the Analysis ToolPak add-in. Go to the File menu, select Options, then select Add-ins. In the Manage box, choose Excel Add-ins, click Go, and check Analysis ToolPak. This free feature is already included in Excel; you only need to enable it. You may need to close and reopen Excel for the menu to appear.
- If you don’t have Minitab, consider downloading the free trial. Minitab normally has a 30-day free trial period. All the non-normal tests we discuss are available in Minitab.
- If your data is non-normal, place the data points in a column or row of Excel and create a chart, such as a histogram, to see the shape of the data. The choice of non-normal hypothesis test depends on the nature of the non-normality.
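The hint above suggests charting the data in Excel. As a rough alternative outside Excel, the Python sketch below prints a text histogram and adds two crude checks tied to the patterns listed earlier: a peak count that hints at multi-modal data, and a count of distinct values that hints at granularity. The function names, the equal-width binning and the peak rule are hypothetical, illustrative assumptions rather than a formal test; with few points or many bins, noise alone can create extra peaks.

```python
def text_histogram(data, bins=10):
    """Print a text histogram of the data and return the count in each bin."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins or 1.0  # fall back to width 1 if all values are equal
    counts = [0] * bins
    for x in data:
        counts[min(int((x - lo) / width), bins - 1)] += 1  # maximum goes in last bin
    for i, c in enumerate(counts):
        print(f"{lo + i * width:8.2f} | {'#' * c}")  # left edge of bin, then bar
    return counts


def count_peaks(counts):
    """Count non-empty bins higher than the left neighbour and at least as high as the right."""
    peaks = 0
    for i, c in enumerate(counts):
        left = counts[i - 1] if i > 0 else -1
        right = counts[i + 1] if i < len(counts) - 1 else -1
        if c > 0 and c > left and c >= right:
            peaks += 1
    return peaks


def distinct_values(data):
    """Number of distinct values; very few relative to len(data) hints at granularity."""
    return len(set(data))


if __name__ == "__main__":
    sample = [1, 1, 2, 2, 2, 3, 8, 9, 9, 9, 10, 10]  # made-up data with two clusters
    counts = text_histogram(sample, bins=9)
    print("peaks:", count_peaks(counts))
    print("distinct values:", distinct_values(sample), "of", len(sample))
```

For the made-up sample, the histogram shows two separate clusters and `count_peaks` reports two peaks, which is the kind of shape that points toward multi-modal data.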
PMI, PMP, CAPM and PMBOK are registered marks of the Project Management Institute, Inc.