About this lesson
One of the most important statistical measures of a data set is the mean or average value. For inferential statistics to be valid, the mean of the sample should be approximately the same as the mean of the entire population of data. The Standard Error is the measure of how accurately the sample mean will approximate the population mean. In this lesson, we will determine how to calculate the standard error and how the sampling process can affect that error.
Quick reference
Standard Error
Standard error, also known as sampling error, is a quick way to quantify the uncertainty between a sample mean and the population mean.
When to use
Standard error can be calculated whenever a sample of a population is statistically analyzed. The standard error is a quick calculation to test for the “goodness” of the sample metrics. It is often used with data sets that are composed of the means of many samples.
Instructions
Sampling Error (aka Standard Error) is a measure of the uncertainty in estimating a population parameter from a sample, based on the sample size and the variability in the sample data.
Standard Error is computed by dividing the sample standard deviation by the square root of the number of items in the sample.
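The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the lesson's exercise file: the function name `standard_error` and the measurement values are invented for the example, and `statistics.stdev` is assumed to be the appropriate sample standard deviation (it uses the n - 1 denominator).

```python
import math
import statistics


def standard_error(sample):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two items to estimate a standard deviation")
    # statistics.stdev computes the sample standard deviation (n - 1 denominator)
    return statistics.stdev(sample) / math.sqrt(n)


# Hypothetical measurements, invented for illustration only
measurements = [2, 4, 4, 4, 5, 5, 7, 9]
se = standard_error(measurements)
print(f"n = {len(measurements)}, mean = {statistics.mean(measurements)}, SE = {se:.3f}")
```

For these eight values the sample standard deviation is about 2.14, so the standard error comes out at roughly 0.76.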
The Standard Error is often used with sample statistics, such as the sample mean, to indicate the level of uncertainty between the sample statistic and the population statistic. In this regard, it serves a similar purpose to a confidence interval. The confidence interval formula also includes a term that is a standard deviation divided by the square root of the number of points in the sample. The difference is that the confidence interval multiplies that ratio by a Z factor based on the confidence level that is being used.
Because of the similarity, an alternative formula for Confidence Interval is sometimes given as:
Confidence Interval ≈ Sample Mean ± 2 × Standard Error
The Z term is approximated at a value of 2. This is because the Z term for a 95% confidence level, which is the most common level in hypothesis testing, is 1.96. The other minor difference is that the Confidence Interval uses the population standard deviation, while the Standard Error uses the sample standard deviation. When the sample size becomes reasonably large, that is, greater than 20 items, these two values will typically be almost identical.
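To see how close the "plus or minus two standard errors" shortcut is to the interval built with the exact Z value, the two can be compared side by side. The sketch below is illustrative only: the helper name `ci_from_standard_error` and the 25 sample values are assumptions made for this example, and the exact Z value is taken from Python's standard normal distribution.

```python
import math
import statistics


def ci_from_standard_error(sample, z=2.0):
    """Approximate confidence interval: sample mean +/- z * standard error."""
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - z * se, mean + z * se


# Two-sided Z value for a 95% confidence level (alpha / 2 = 0.025 in each tail)
z_95 = statistics.NormalDist().inv_cdf(0.975)

# Hypothetical sample of 25 measurements, invented for illustration only
sample = [
    10.2, 9.8, 10.1, 10.4, 9.7, 10.0, 10.3, 9.9, 10.1, 9.6,
    10.2, 10.0, 9.8, 10.5, 9.9, 10.1, 10.0, 9.7, 10.3, 10.2,
    9.9, 10.0, 10.1, 9.8, 10.4,
]

low_2, high_2 = ci_from_standard_error(sample)        # shortcut with Z = 2
low_z, high_z = ci_from_standard_error(sample, z_95)  # Z of about 1.96

print(f"Z for 95% confidence: {z_95:.4f}")
print(f"mean +/- 2 SE:    ({low_2:.4f}, {high_2:.4f})")
print(f"mean +/- 1.96 SE: ({low_z:.4f}, {high_z:.4f})")
```

The shortcut interval is always slightly wider than the one using 1.96, by a factor of 2 / 1.96, or about 2%.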
The last point to consider is the sources of sampling error and what to do about them:
- Sampling errors – the uncertainty due to random selection. The Standard Error calculation addresses this.
- Sampling bias – selecting the sample from a segment of the population and not the entire population. Examples could be always taking the first piece manufactured during a shift or taking all the samples from the same line, even though the organization has multiple lines in operation. Since some items can never be selected, this is not a truly random sample. The sampling approach needs to change to address as much of the full population as possible.
- Error in measurement – this is due to issues associated with the design and maintenance of the measurement equipment. A major aspect of this is calibration, but it also includes problems with operator use and linearity. A measurement systems analysis is needed to ensure that the system is capable and accurate.
- Lack of measurement validity – the measurement approach is inappropriate for the characteristic being measured. It may be that it is measured at the wrong step in the process, that the measurement system used does not measure the correct parameter, or the measurement system has become contaminated in some manner. When this is the problem, it is normally necessary to change the measurement system.
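The difference between sampling error and sampling bias in the list above can be illustrated with a small, fully made-up example. It assumes a hypothetical shift of 200 parts whose measured value drifts upward during the shift, and compares a "first three parts" plan with a plan that spreads samples across the whole shift. A systematic every-20th-part sample is used here only to keep the example deterministic; it stands in for a random sample drawn from the full population.

```python
import math
import statistics

# Hypothetical shift of 200 parts whose measured value drifts upward
# as the shift goes on (invented data, for illustration only).
population = [10.0 + 0.01 * i for i in range(200)]
population_mean = statistics.mean(population)


def mean_and_se(sample):
    """Return the sample mean and its standard error (stdev / sqrt(n))."""
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return statistics.mean(sample), se


# Biased plan: always take the first three parts of the shift.
biased_mean, biased_se = mean_and_se(population[:3])

# Plan covering the whole shift: every 20th part, starting at part 10.
spread_mean, spread_se = mean_and_se(population[10::20])

print(f"population mean:   {population_mean:.3f}")
print(f"first three parts: mean {biased_mean:.3f}, SE {biased_se:.4f}")
print(f"spread over shift: mean {spread_mean:.3f}, SE {spread_se:.4f}")
```

In this sketch the biased plan reports a very small standard error even though its mean is far from the population mean, because the standard error only reflects the variability inside the sample. That is why sampling bias has to be fixed by changing the sampling plan rather than by relying on the standard error.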
Hints & tips
- Standard error is often reported by statistical software as a measure of the uncertainty in the results that have been calculated.
- Standard error is more frequently used in sociological statistical analysis than with Lean Six Sigma problem-solving.
- Standard error is often used when the data set is actually the descriptive statistics of numerous unique samples.
Transcript
- 00:03 Hi, I'm Ray Sheen.
- 00:05 Let's talk about a term that's frequently used in hypothesis testing and
- 00:10 statistical analysis called the standard error.
- 00:13 So let's see what the standard error is and how to use it.
- 00:17 First, with respect to hypothesis testing,
- 00:20 it goes by two names, standard error and sampling error.
- 00:25 These two names mean the same thing.
- 00:27 It is a measure of the uncertainty of a population's parameter with
- 00:32 respect to a sample based upon the variability in that sample data.
- 00:37 The formula is quite simple.
- 00:39 Standard error is the standard deviation
- 00:42 divided by the square root of the number of items in the sample.
- 00:46 As you can see, it's heavily dependent upon the standard deviation of the sample.
- 00:52 The standard error is frequently cited with a sample statistic,
- 00:56 such as a mean, to indicate the level of uncertainty around the mean.
- 01:00 That is the uncertainty between the value of the sample mean and
- 01:04 the mean value of the full population.
- 01:07 Standard error is frequently used with data sets that have many samples in them.
- 01:12 The standard error can provide some insight into the grand mean of
- 01:17 the samples as compared to the population mean.
- 01:20 You're probably already thinking that this standard error looks
- 01:25 similar to the term used for confidence interval, and you would be right.
- 01:30 The confidence interval formula has a term that is very similar to the standard
- 01:35 error term.
- 01:36 We see that in the farthest right portion of the confidence interval equation.
- 01:39 Confidence Interval uses a ratio of the population standard deviation,
- 01:45 divided by the square root of the number of samples.
- 01:48 And standard error uses the sample standard deviation,
- 01:51 divided by the square root of the number of samples.
- 01:54 When the sample size is large, the population standard deviation and
- 01:59 the sample standard deviation are essentially the same.
- 02:03 The confidence interval for the population mean is often approximated with
- 02:08 the equation sample mean, plus or minus two times the standard error.
- 02:13 Let's look at why that is a good approximation.
- 02:16 The confidence interval equation uses the Z value alpha over 2.
- 02:22 Well, the Z value for a 95% confidence level is 1.96, which is almost 2.
- 02:29 And as we said, when the sample is large, the standard deviation for
- 02:34 both sample and population are almost the same.
- 02:38 So for the case of a fairly large sample and a 95% confidence level,
- 02:42 the standard error method for
- 02:44 confidence interval is virtually identical to the earlier formula we were using.
- 02:49 And given that almost all Six Sigma projects use a 95% confidence level,
- 02:55 we can use this formula.
- 02:57 Finally, let's discuss some of the reasons that the sample statistic could be
- 03:02 different from the population statistic, leading to a large standard error.
- 03:07 The first reason is called sampling error,
- 03:10 and it's just based upon the random nature of the selection.
- 03:13 The items selected by chance may be near an extreme or
- 03:17 generally higher or lower than others.
- 03:20 More samples will have a tendency to reduce that error,
- 03:24 which is why the statistical equation has the square root
- 03:27 of the number of terms in the denominator.
- 03:30 Increase the number of sample items and it decreases the standard error.
- 03:35 Next is the sample bias.
- 03:38 This is due to failure within the sampling plan.
- 03:41 In this case, there are certain conditions associated with sampling which
- 03:45 essentially exclude some of the possible items.
- 03:48 So if the sampling plan said to take the first three items off the manufacturing
- 03:53 line from each shift, there would be a sampling bias because there's no
- 03:57 possibility to ever get a sample from the middle of the manufacturing shift or
- 04:02 during the end of the shift.
- 04:03 Next is an error in the measurement.
- 04:06 All measurement systems have some error in them.
- 04:09 Ideally, that error is very small relative to what is being measured.
- 04:13 For more information on this topic,
- 04:15 take our course on measurement systems analysis.
- 04:18 Suffice to say that with a good measurement system,
- 04:23 this value is very small.
- 04:25 Finally, there is the lack of measurement validity.
- 04:28 In this case, the measurement approach that is used is inappropriate for
- 04:31 what we're trying to measure.
- 04:33 It could be that it's sampling the wrong population or
- 04:36 sampling the wrong parameter.
- 04:38 Whatever the reason, the point is that the data does not provide
- 04:42 useful information about the sample or population mean.
- 04:46 The concept of standard error can be a useful statistical measure to quickly
- 04:51 establish confidence in your statistical approach.
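The transcript's point that more sample items reduce the standard error follows from the square root in the denominator, and can be checked with a short sketch. The standard deviation of 4.0 and the sample sizes are hypothetical values chosen for illustration, and `standard_error_from_sd` is a helper name introduced only for this example.

```python
import math


def standard_error_from_sd(sd, n):
    """Standard error given a sample standard deviation and a sample size."""
    return sd / math.sqrt(n)


sd = 4.0  # hypothetical sample standard deviation
results = {n: standard_error_from_sd(sd, n) for n in (25, 100, 400)}

for n, se in results.items():
    print(f"n = {n:3d}  ->  SE = {se:.2f}")
```

With the standard deviation held fixed, each fourfold increase in sample size halves the standard error (0.80, 0.40, 0.20 here), so the gain from additional samples shrinks as the sample grows.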