About this lesson
When doing Lean Six Sigma statistical analysis, you are often working with a small sample of the potential data associated with the problem. We analyze the sample data and from that we can make inferences about the full data population for the problem.
Quick reference
Sample versus Population Statistics
When doing Lean Six Sigma statistical analysis, you are often working with a small sample of the potential data associated with the problem. In that case, analyze the sample data, and from that the Lean Six Sigma team can make inferences about the full data population associated with the problem.
When to use
If the full population of data is available, use it. When it is not available, or acquiring it would be very difficult, use a sample from the population and apply inferential statistics.
Instructions
To fully understand a problem or process, the full set of data points representing the physical situation should be analyzed. However, in many cases, the full dataset is not available. The data may be limited by the time period over which it was collected, the location, the operator, or some other constraint. All that is available is a portion, or sample, of the full data population.
The statistics associated with this sample of data can be used to make inferences about the statistics of the full data population. In general, the more data points in a representative sample, the more closely its statistics approximate those of the full data population.
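As a rough illustration of the point about sample size, the sketch below draws simple random samples of increasing size from a simulated population and reports how far each sample mean lands from the population mean. The population values, the seeds, and the helper name `mean_error_by_sample_size` are assumptions made up for this example and are not part of the lesson.

```python
import random
import statistics


def mean_error_by_sample_size(population, sample_sizes, seed=0):
    """Return {n: |sample mean - population mean|} for each sample size n."""
    rng = random.Random(seed)  # fixed seed so the illustration is repeatable
    mu = statistics.fmean(population)
    errors = {}
    for n in sample_sizes:
        # Simple random sample of n points, drawn without replacement.
        sample = rng.sample(population, n)
        errors[n] = abs(statistics.fmean(sample) - mu)
    return errors


# Hypothetical population: 10,000 simulated measurements (illustrative only).
_rng = random.Random(42)
population = [_rng.gauss(50.0, 5.0) for _ in range(10_000)]

errors = mean_error_by_sample_size(population, [10, 100, 1_000, 10_000])
for n, err in errors.items():
    print(f"n = {n:>6}: |x_bar - mu| = {err:.4f}")
```

Any single small sample can land close to the population mean by chance, so the trend holds on average rather than for every draw. When the sample contains the whole population (n = N), the difference is zero.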
The descriptive statistics for the sample dataset and full population dataset are similar.
The number of data points in the sample is represented by the letter n.
The number of data points in the full population is represented by the letter N.
The mean value is calculated in the same manner: the sum of the data points is divided by the count of the data points. However, the symbol changes depending on whether it is a sample mean or the full population mean.
Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Full population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
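A minimal sketch of the mean calculation follows. The data values are hypothetical and chosen only so the arithmetic is easy to follow; the same function serves both cases because only the symbol differs.

```python
def mean(values):
    """Sum of the data points divided by the count of the data points."""
    return sum(values) / len(values)


# Hypothetical full population of N = 8 measurements (illustrative only).
population = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0, 10.0, 13.0]

# Hypothetical sample of n = 4 points taken from that population.
sample = [15.0, 14.0, 16.0, 13.0]

x_bar = mean(sample)    # sample mean: 58 / 4 = 14.5
mu = mean(population)   # population mean: 104 / 8 = 13.0

print(f"n = {len(sample)}, x_bar = {x_bar}")
print(f"N = {len(population)}, mu = {mu}")
```

The sample here happens to contain mostly higher values, so its mean overstates the population mean. That is one reason a representative sample matters.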
The standard deviation equation is slightly different between the sample and the full population. In the sample, the denominator is reduced by 1. In addition, the symbol is different between the sample and the full population standard deviation.
The sample standard deviation is: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$
The full population standard deviation is: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
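The two standard deviation formulas can be sketched as below, applied to the same hypothetical dataset so that the effect of the n - 1 denominator is visible. The function names are made up for this example. Python's standard library offers `statistics.stdev` (sample) and `statistics.pstdev` (population) for the same calculations.

```python
import math
import statistics


def sample_std(values):
    """Sample standard deviation s: squared deviations divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))


def population_std(values):
    """Population standard deviation sigma: squared deviations divided by N."""
    N = len(values)
    mu = sum(values) / N
    return math.sqrt(sum((x - mu) ** 2 for x in values) / N)


# Hypothetical data (illustrative only): mean is 5, squared deviations sum to 32.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

s = sample_std(data)          # sqrt(32 / 7)
sigma = population_std(data)  # sqrt(32 / 8) = 2.0

print(f"s = {s:.4f}, sigma = {sigma:.4f}")
```

For the same set of data points, the n - 1 denominator always yields a slightly larger value than dividing by the full count.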
The variance is the standard deviation squared, so it too will be slightly different depending upon whether the dataset is a sample or the full population.
The sample variance is: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$
The full population variance is: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
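As a quick check that the variance is the standard deviation squared, the sketch below uses Python's standard `statistics` module, where `variance` and `stdev` use the n - 1 denominator and `pvariance` and `pstdev` use N. The data values are hypothetical.

```python
import math
import statistics

# Hypothetical data (illustrative only): mean is 14, squared deviations sum to 40.
data = [10.0, 12.0, 14.0, 16.0, 18.0]
n = len(data)

sample_variance = statistics.variance(data)       # 40 / (n - 1) = 10.0
population_variance = statistics.pvariance(data)  # 40 / n = 8.0

# Variance equals the corresponding standard deviation squared.
assert math.isclose(statistics.stdev(data) ** 2, sample_variance)
assert math.isclose(statistics.pstdev(data) ** 2, population_variance)

# The two variances differ only by the factor (n - 1) / n.
assert math.isclose(population_variance, sample_variance * (n - 1) / n)

print(f"s^2 = {sample_variance}, sigma^2 = {population_variance}")
```

Which function to call depends on whether the dataset in hand is treated as a sample or as the full population.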
Hints & tips
- When the sample becomes very large, the mean, standard deviation, and variance of the sample become virtually identical to those of the full population.
- Often the makeup of the sample is dictated by physical conditions, for example when data from a particular time period or location is not available. However, when intentionally selecting a sample, make it representative of the entire population so that inferences drawn from it are accurate.
- The course on Hypothesis Testing provides additional information on selecting and working with samples in order to have accurate inferential statistics.
- 00:04 Hi, this is Ray Sheen.
- 00:06 We use sample data sets as a surrogate for a full data population.
- 00:11 Let's take a look at how these are related.
- 00:13 The concept starts with the dataset.
- 00:16 Recall that data is facts about the real world.
- 00:19 Therefore, the data set is a collection of facts or
- 00:23 data points that provide a representation of a problem or process.
- 00:27 The more data points in the data set,
- 00:29 the better we can understand the real world problem or process.
- 00:33 The descriptive statistics provide insight about the real world problem.
- 00:38 Most of the time, the data set available to the Lean Six Sigma project team is
- 00:43 only a subset of the full data population that would represent all
- 00:47 the instances of the problem or process.
- 00:50 The data set is for a limited time or
- 00:52 based on a collection of facts from just a limited location.
- 00:56 Sometimes, it's limited to the work by only one operator or one shift, and
- 01:01 in some cases the data collected is just a periodic sample of the data available.
- 01:05 So there's a limitation in the nature of the sampling plan.
- 01:09 However, based upon the data we have in the sample,
- 01:12 we can reach some conclusions about the full population of data.
- 01:16 Let me describe the differences between them.
- 01:19 The population is a term that we use for the entire data set.
- 01:24 That means all data from all time and all locations that represents
- 01:29 the particular problem or process we're discussing.
- 01:32 This can be described with descriptive statistics such as the mean and
- 01:37 the standard deviation.
- 01:39 And the data population may be a finite population, but
- 01:42 in some cases it's infinite.
- 01:45 It's impossible to ever get all the data because
- 01:48 the source of the data is an infinite source.
- 01:50 A sample is a subset or portion of the full population.
- 01:55 Just like with the full population, descriptive statistics can be used to
- 01:59 describe the characteristics of the sample data set.
- 02:03 Even though the descriptive statistics are for a subset of the population,
- 02:08 we can use those to make inferences about the full population.
- 02:12 The accuracy will be based upon the characteristics of the sample as
- 02:16 compared to the full population.
- 02:18 We'll often be working with descriptive statistics on both populations and
- 02:23 samples.
- 02:23 It's important to be able to distinguish between them, so
- 02:27 let's take a look at ways we can do that.
- 02:30 As we look at this table, the descriptive statistics for samples are on the left and
- 02:35 for the full population are on the right.
- 02:37 First, let's look at the number of items in the data set.
- 02:40 The sample size is shown with a small n, but
- 02:44 a population size is represented with a capital N.
- 02:47 In either case, the source of the data has been based on observation.
- 02:51 By that I mean, we counted the number of data points.
- 02:55 The sample mean or average is represented with an X bar.
- 03:00 But with the population, the mean is represented with the Greek symbol, mu.
- 03:05 The formula for determining the mean is the same.
- 03:07 You add up all of your values for your data points and
- 03:10 then divide by the number of data points in that data set.
- 03:14 The standard deviation will be slightly different between sample and population.
- 03:19 The standard deviation for the sample is represented by the small s, however,
- 03:24 the standard deviation for
- 03:25 the full population is represented by the Greek symbol sigma.
- 03:29 With respect to the calculation, in both cases, determine the deviation for
- 03:35 each data point, square that deviation, and then add all of those up.
- 03:40 That makes the numerator,
- 03:41 but the denominator is different depending upon whether it's a sample or population.
- 03:46 If it's a sample, the denominator is the sample size minus 1.
- 03:51 However, if it's the full population, then it's the full count.
- 03:54 In both cases, the final portion of the calculation is to take the square root of
- 03:59 the deviation calculation divided by the count.
- 04:02 An observation is that the sample standard deviation will be slightly
- 04:07 larger than the population standard deviation because
- 04:10 the sample denominator is slightly smaller due to the minus 1.
- 04:15 The last descriptive statistic I want to mention,
- 04:18 because it's often used in hypothesis testing, is the variance.
- 04:22 Sample variance is the sample standard deviation squared, and
- 04:25 the population variance is the population standard deviation squared.
- 04:29 The descriptive statistics for samples and populations will be used frequently during
- 04:34 the Analyze phase, so you need to have a clear understanding of the difference.