Retired course
This course has been retired and is no longer supported.
About this lesson
The Central Limit Theorem is a statistical principle that can be used to transform non-normal raw data into a normally distributed data set.
Exercise files
Download this lesson’s related exercise files.
Central Limit Theorem.docx (102.3 KB)
Central Limit Theorem - Dice Roll Data.xlsx (10.4 KB)
Central Limit Theorem - Solution.docx (88.6 KB)
Quick reference
Central Limit Theorem
The Central Limit Theorem is a statistical principle that can be used to transform non-normal raw data into a normally distributed data set.
When to use
If data is non-normal because of inherent process attributes rather than a special cause, it can be transformed into a normal data set using the Central Limit Theorem.
Instructions
Some processes do not produce a normal curve even when they are operating normally. The physical characteristics of the process may skew the output to the high or low side, or a physical limit or stop at the upper or lower end of the output may truncate the shape of the curve. This presents a problem, since many basic statistical analyses rest on the assumption that the data set being analysed follows a normal distribution.
This type of non-normal data can be transformed into normal data by applying the Central Limit Theorem. The theorem states, “When independent random variables are added, their sum tends toward a normal distribution, even if the original variables themselves are not normally distributed.” Using this principle, groups or samples of the process output can be added together, and if the data is truly independent and random, the resulting set of summed values will be approximately normal.
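As a rough illustration, the sketch below (in Python, which is not part of the course materials) simulates individual dice rolls, echoing the lesson's dice-roll exercise file, and sums fixed-size subgroups. The rolls themselves follow a uniform distribution, not a normal one, yet the subgroup sums come out far closer to normal. The simulated data, seed, and subgroup size of five are illustrative assumptions, not values taken from the exercise files.

import numpy as np
from scipy import stats

# Illustrative raw data: 5,000 individual dice rolls (uniform, not normal).
rng = np.random.default_rng(seed=1)
raw = rng.integers(1, 7, size=5000)

# Sum fixed-size subgroups; the subgroup size must be the same every time.
subgroup_size = 5
sums = raw.reshape(-1, subgroup_size).sum(axis=1)  # 1,000 subgroup sums

# Shapiro-Wilk normality check: a small p-value suggests non-normal data.
# The raw rolls should fail badly; the subgroup sums should fare far better.
_, p_raw = stats.shapiro(raw[:500])
_, p_sums = stats.shapiro(sums[:500])
print("raw rolls p-value:    ", p_raw)
print("subgroup sums p-value:", p_sums)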
There are several key points to applying this principle. First, the number of points in each subgroup must be the same every time. Second, the data must be in the same units and measured in the same manner. For instance, if the data is discrete pass/fail data, the criteria for what constitutes a “pass” or a “fail” must be the same for all data points.
A critical decision is how many data points to include in each subgroup. This is determined by the structure of the original raw data; the table below provides guidance.
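The guidance table itself is not reproduced in these notes. As an illustration of why the choice matters, the sketch below runs the same normality check for several candidate subgroup sizes, using right-skewed exponential data as a hypothetical stand-in for real process output; the sizes tried here are arbitrary examples, not the table's recommendations.

import numpy as np
from scipy import stats

# Hypothetical right-skewed raw data standing in for real process output.
rng = np.random.default_rng(seed=2)
raw = rng.exponential(scale=2.0, size=6000)

for size in (2, 5, 10, 30):  # candidate subgroup sizes to try
    usable = (len(raw) // size) * size           # trim so the reshape is even
    sums = raw[:usable].reshape(-1, size).sum(axis=1)
    _, p = stats.shapiro(sums[:500])             # Shapiro-Wilk normality check
    print(f"subgroup size {size:2d}: n={len(sums):4d}, p-value={p:.3f}")
# Larger subgroups generally push heavily skewed data closer to normal.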
Hints & tips
- The Central Limit Theorem is often used when collecting and analysing data. It is very easy to apply.
- If the data is still not normal after applying the Central Limit Theorem, then you know the underlying data is not experiencing just normal random process variation; rather, some special root cause is affecting the results (see the sketch after this list).
- When deciding on subgroup size, consider normal physical characteristics of the process. If the data is collected sequentially, try to set your subgroup size at a normal stop time for the process such as the end of a shift – even if that means using a subgroup size that is a little more than the minimum for the type of data.
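To illustrate the second hint above, here is a minimal sketch assuming a hypothetical special cause in the form of a steady upward drift in the process mean. The stable data's subgroup sums should look far closer to normal than the drifting data's, flagging that more than random variation is at work; the drift size, seed, and subgroup size are illustrative assumptions.

import numpy as np
from scipy import stats

# Stable process output versus the same output with a hypothetical
# special cause: a steady upward drift in the process mean.
rng = np.random.default_rng(seed=3)
stable = rng.exponential(scale=2.0, size=3000)
drifting = stable + np.linspace(0, 20, 3000)  # trend added by a special cause

for label, raw in (("stable", stable), ("drifting", drifting)):
    sums = raw.reshape(-1, 30).sum(axis=1)    # 100 subgroup sums of size 30
    _, p = stats.shapiro(sums)                # normality check on the sums
    print(f"{label:8s} subgroup sums p-value: {p:.4f}")
# The drifting data's sums inherit the trend, so they stay clearly
# non-normal even after the Central Limit Theorem transformation.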