About this lesson
When doing Lean Six Sigma statistical analysis, you are often working with a small sample of the potential data associated with the problem. We analyze the sample data and from that we can make inferences about the full data population for the problem.
Quick reference
Sample versus Population Statistics
When doing Lean Six Sigma statistical analysis, you are often working with a small sample of the potential data associated with the problem. In that case, analyze the sample data, and from that the Lean Six Sigma team can make inferences about the full data population associated with the problem.
When to use
If the full population of data is available, use it. When it is not available, or acquiring it would be very difficult, use a sample from the population and apply inferential statistics.
Instructions
To fully understand a problem or process, the full set of data points representing the physical situation should be analyzed. However, in many cases, the full dataset is not available. The data may be limited by the time period over which it was collected, the location, the operator, or some other constraint. All that is available is a portion, or sample, of the full data population.
The statistics associated with this sample of data can be used to make inferences about the statistics of the full data population. In general, the more data points in a representative sample, the closer the sample statistics will be to those of the full data population.
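The effect of sample size can be illustrated with a short simulation. The following is a minimal sketch, not part of the lesson material: the population is simulated (normally distributed values with an assumed mean of 50 and standard deviation of 5), and the helper name `mean_abs_error` is hypothetical. It draws repeated random samples of a given size and reports how far, on average, the sample mean lands from the population mean.

```python
import random
import statistics


def mean_abs_error(population, n, trials=200, seed=0):
    """Average absolute gap between a sample mean and the population mean."""
    rng = random.Random(seed)
    mu = statistics.fmean(population)
    errors = []
    for _ in range(trials):
        # Simple random sample of n points, drawn without replacement
        sample = rng.sample(population, n)
        errors.append(abs(statistics.fmean(sample) - mu))
    return statistics.fmean(errors)


# Hypothetical population: 10,000 simulated measurements
pop_rng = random.Random(42)
population = [pop_rng.gauss(50.0, 5.0) for _ in range(10_000)]

for n in (5, 50, 500):
    print(n, round(mean_abs_error(population, n), 3))
```

Under these assumptions, the printed error should shrink as n grows, which is the behavior described above.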
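The descriptive statistics for the sample dataset and the full population dataset are calculated in similar ways, but they use different symbols and, in some cases, slightly different equations.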
The number of data points in the sample is represented by the letter n.
The number of data points in the full population is represented by the letter N.
The mean value is calculated in the same manner for both: the sum of the data points is divided by the count of the data points. However, the symbol used changes based on whether it is a sample mean or the full population mean.
Sample mean: $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$
Full population mean: $\mu = \dfrac{\sum_{i=1}^{N} x_i}{N}$
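As an illustration of the two means, here is a small Python sketch. The data values are hypothetical (cycle times in minutes for a ten-order population), and taking the first four orders as the sample is only an assumption made for the example.

```python
import statistics

# Hypothetical data: cycle times (minutes) for every order in a small population
population = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.5, 12.1]
sample = population[:4]  # hypothetical sample: the first four orders

N = len(population)  # count of data points in the full population
n = len(sample)      # count of data points in the sample

mu = sum(population) / N  # full population mean
x_bar = sum(sample) / n   # sample mean

print(f"N = {N}, population mean = {mu:.3f}")
print(f"n = {n}, sample mean = {x_bar:.3f}")
```

The calculation is identical in both cases. Only the symbol, and the set of data points it is applied to, changes.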
The standard deviation equation is slightly different between the sample and the full population. In the sample, the denominator is reduced by 1, so the sum of squared deviations is divided by n - 1 rather than by the full count. In addition, the symbol is different: s for the sample standard deviation and σ for the full population standard deviation.
The sample standard deviation is: $s = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
The full population standard deviation is: $\sigma = \sqrt{\dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
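The difference in the denominator can be sketched in Python as follows. The function names `sample_std` and `population_std` and the data values are hypothetical, chosen only for illustration. Python's standard `statistics` module provides `stdev` (sample) and `pstdev` (population), which are printed alongside for comparison.

```python
import math
import statistics


def sample_std(data):
    """Sample standard deviation: divide by n - 1."""
    n = len(data)
    x_bar = sum(data) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))


def population_std(data):
    """Full population standard deviation: divide by N."""
    N = len(data)
    mu = sum(data) / N
    return math.sqrt(sum((x - mu) ** 2 for x in data) / N)


data = [4.0, 6.0, 8.0, 10.0, 12.0]  # hypothetical measurements

print("treated as a sample:    ", sample_std(data), statistics.stdev(data))
print("treated as a population:", population_std(data), statistics.pstdev(data))
```

For the same data points, the sample formula always gives the larger value, because the same sum of squared deviations is divided by a smaller number.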
The variance is the standard deviation squared, so it too will be slightly different depending upon whether the dataset is a sample or the full population.
The sample variance is: $s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
The full population variance is: $\sigma^2 = \dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
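A brief sketch of the variance, using Python's standard `statistics` module, where `variance` uses the n - 1 denominator and `pvariance` uses the N denominator. The data values are hypothetical.

```python
import statistics

data = [4.0, 6.0, 8.0, 10.0, 12.0]  # hypothetical measurements

s2 = statistics.variance(data)       # sample variance, n - 1 denominator
sigma2 = statistics.pvariance(data)  # full population variance, N denominator

print("sample variance:    ", s2)
print("population variance:", sigma2)

# In each case the variance is the corresponding standard deviation squared
print("stdev squared: ", statistics.stdev(data) ** 2)
print("pstdev squared:", statistics.pstdev(data) ** 2)
```

Which function applies depends on whether the dataset is treated as a sample or as the full population, not on the data values themselves.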
Hints & tips
- When a representative sample becomes very large, the mean, standard deviation, and variance of the sample become virtually identical to those of the full population.
- Often the parameters of the sample are dictated by physical conditions, such as when data from a particular time period or location is not available. However, when intentionally selecting a sample, the sample should be representative of the entire population so that inferences drawn from it are accurate for the population.
- The course on Hypothesis Testing provides additional information on selecting and working with samples in order to have accurate inferential statistics.
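The tip about representative samples can be illustrated with a small simulation. This is a sketch under stated assumptions, not data from the lesson: the population is simulated as two hypothetical shifts (day and night) with different average values, and the variable names are illustrative. A sample taken from only one shift is compared with a simple random sample of the same size from the whole population.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical population: two shifts with different average measurements
day = [rng.gauss(10.0, 1.0) for _ in range(5000)]
night = [rng.gauss(14.0, 1.0) for _ in range(5000)]
population = day + night
mu = statistics.fmean(population)

# Sample limited by a physical condition: only day-shift data is used
convenience_sample = day[:200]
# Representative sample: simple random sample from the whole population
random_sample = rng.sample(population, 200)

error_convenience = abs(statistics.fmean(convenience_sample) - mu)
error_random = abs(statistics.fmean(random_sample) - mu)

print(f"population mean:         {mu:.2f}")
print(f"day-shift-only error:    {error_convenience:.2f}")
print(f"random-sample error:     {error_random:.2f}")
```

Both samples contain 200 points, so the difference in error comes from how the sample was selected rather than from its size.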