
Confidence Intervals

Retired course

This course has been retired and is no longer supported.

About this lesson

It is often impossible to analyze all the items in a data set, so a sample is used. But that means the sample statistics might not perfectly represent the full data set. Depending upon the nature of the data and the sample, a confidence interval can be established around any sample statistic to show the range in which the actual population value is likely to lie. This lesson discusses how to calculate this range and the sampling choices that affect it.

Exercise files

Download this lesson’s related exercise files.

Confidence Intervals.docx (59.9 KB)
Confidence Intervals - Solution.docx (66.5 KB)

Confidence Intervals

When inferring statistical values from a sample, there is a band of uncertainty around the sample statistic within which the population statistic is likely to lie. This band of uncertainty can be calculated and, to an extent, controlled by the sampling approach used.

When to use

When inferential statistics are used instead of descriptive statistics, a confidence interval and confidence level should always accompany the statistical values.

Instructions

Descriptive statistics provide a complete statistical description of a data population. However, the full population is often not available, so inferential statistics are used instead: descriptive statistics are calculated for a sample from the population, and the likely population statistics are inferred from them. Because the sample does not include every data point in the population, the actual population statistics will likely differ from the sample statistics. Using information from the sample and the population, it is possible to calculate the zone in which the population statistic is likely to fall. This zone, or range, is called the confidence interval. Its size depends in part on how confident we want to be that the actual statistic lies within it; this degree of confidence is known as the confidence level.

The formula for the confidence interval is:

$$CI = \bar{X} \pm Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

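Where: CI is the confidence interval range.

$\bar{X}$ (x-bar) is the mean from the sample

$\sigma$ (sigma) is the standard deviation. Strictly, the formula calls for the population standard deviation; the sample standard deviation is used in its place when the sample is large enough (see Hints & tips).

n is the number of items in the sample

$\alpha$ (alpha) is 1 – confidence level, with the confidence level written as a proportion (for example, α = 0.05 for a 95% confidence level)

$Z_{\alpha/2}$ is the Z transformation value that leaves an area of α/2 in each tail of the standard normal distribution curve (see diagram below).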
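As a minimal sketch (not part of the original lesson), α and $Z_{\alpha/2}$ for the three most commonly used confidence levels can be computed with Python's standard library. `statistics.NormalDist().inv_cdf(p)` returns the point below which an area p of the standard normal curve lies; the helper name `z_table` is hypothetical.

```python
from statistics import NormalDist


def z_table(levels=(0.90, 0.95, 0.99)) -> dict:
    """Map each confidence level to (alpha, Z(alpha/2)) for a two-sided interval."""
    table = {}
    for confidence in levels:
        alpha = 1.0 - confidence
        # Z(alpha/2) leaves an area of alpha/2 beyond it in the upper tail
        # (and, by symmetry, alpha/2 below -Z in the lower tail).
        z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
        table[confidence] = (alpha, z)
    return table


for confidence, (alpha, z) in z_table().items():
    print(f"{confidence:.0%} confidence: alpha = {alpha:.2f}, Z = {z:.3f}")
```

Run as written, this prints Z values of 1.645, 1.960, and 2.576 for the 90%, 95%, and 99% confidence levels.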

A diagram of the confidence interval formula is shown below.

Generally, we want the confidence interval to be as narrow as possible so that there is little uncertainty about the population statistics. This formula leads to three important conclusions. First, if the standard deviation decreases, the confidence interval narrows. Second, if the sample size increases, the confidence interval narrows. Third, if the confidence level is reduced, the confidence interval narrows. The third conclusion follows from the values of Z for common confidence levels: a lower confidence level has a smaller Z value (for example, about 1.645 at 90% versus 1.96 at 95%), and therefore a narrower interval.
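As a minimal sketch (not part of the original lesson), the confidence interval formula, x-bar ± Z(α/2) × σ / √n, can be computed with Python's standard library, and the three conclusions checked by changing one input at a time. The function names and the example figures (mean 100, σ = 10, n = 36) are hypothetical, and σ is treated as known, as the formula assumes.

```python
import math
from statistics import NormalDist


def z_for_confidence(confidence: float) -> float:
    """Two-sided critical value Z(alpha/2) for a confidence level such as 0.95."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    alpha = 1.0 - confidence
    # Leave an area of alpha/2 in each tail of the standard normal curve.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def confidence_interval(x_bar: float, sigma: float, n: int, confidence: float) -> tuple:
    """Return (lower, upper) for x_bar +/- Z(alpha/2) * sigma / sqrt(n)."""
    margin = z_for_confidence(confidence) * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin


def width(interval: tuple) -> float:
    """Full width (upper minus lower) of an interval."""
    lower, upper = interval
    return upper - lower


if __name__ == "__main__":
    # Hypothetical figures, chosen only to show the three effects.
    scenarios = {
        "baseline (sigma=10, n=36, 95%)": confidence_interval(100.0, 10.0, 36, 0.95),
        "smaller standard deviation (5)": confidence_interval(100.0, 5.0, 36, 0.95),
        "larger sample (n=144)": confidence_interval(100.0, 10.0, 144, 0.95),
        "lower confidence level (90%)": confidence_interval(100.0, 10.0, 36, 0.90),
    }
    for label, (lower, upper) in scenarios.items():
        print(f"{label}: {lower:.2f} to {upper:.2f} (width {upper - lower:.2f})")
```

Each change narrows the interval. Note that quadrupling the sample size only halves the width, because n enters the formula under a square root.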

The confidence interval formula can be rearranged to find the sample size (n) required for a desired spread of the confidence interval. This is done by taking only the plus-and-minus term from the confidence interval formula, setting it equal to the allowable uncertainty E (the half-width of the interval), and solving for the sample size. This formula is:

$$n = \left(\frac{Z_{\alpha/2}\,\sigma}{E}\right)^2$$

This formula can be used when planning sample data collection.
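A minimal Python sketch of this planning calculation follows. It assumes σ is known (or estimated beforehand) and writes the allowable ± uncertainty as `margin`; the function name and the baseline figures (σ = 10, a ± 2 band, 95% confidence) are hypothetical. Because n = (Z(α/2) × σ / margin)² is rarely a whole number, the sketch rounds up so the achieved margin is no wider than requested. It also changes one factor at a time to show how the required sample size responds.

```python
import math
from statistics import NormalDist


def required_sample_size(sigma: float, margin: float, confidence: float) -> int:
    """Smallest n for which Z(alpha/2) * sigma / sqrt(n) <= margin."""
    if sigma <= 0 or margin <= 0:
        raise ValueError("sigma and margin must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Round up: a fractional item cannot be sampled, and rounding down
    # would leave the margin slightly wider than requested.
    return math.ceil((z * sigma / margin) ** 2)


if __name__ == "__main__":
    # Hypothetical plan: sigma of about 10 units, mean wanted within +/- 2 units at 95%.
    baseline = {"sigma": 10.0, "margin": 2.0, "confidence": 0.95}
    scenarios = {
        "baseline": baseline,
        "99% confidence": {**baseline, "confidence": 0.99},
        "doubled standard deviation": {**baseline, "sigma": 20.0},
        "halved uncertainty band": {**baseline, "margin": 1.0},
    }
    for label, params in scenarios.items():
        print(f"{label}: n = {required_sample_size(**params)}")
```

With these hypothetical figures the baseline needs 97 items, 99% confidence needs 166, and doubling σ or halving the ± band each needs 385: a higher confidence level, more variation, or a tighter band all raise the required sample size.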

Hints & tips

The only two elements that you can impact are the confidence level and the sample size. The mean and standard deviation come from the data. If you want to narrow your confidence interval without reducing your confidence level, your only option is to collect more data in your sample.
The actual formula for these calculations uses the standard deviation of the full population, not of the sample. However, Walter Shewhart’s research showed us that once a sample has at least 30 points in it, the standard deviation changes very little as more points are added, and it is an excellent approximation of the full population standard deviation, provided of course that the sample is representative and random.
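The behavior of the sample standard deviation as the sample grows can be explored with a small simulation. This is a sketch under stated assumptions, not a reproduction of Shewhart's work: it draws repeated random samples from a normal population with a hypothetical σ = 10 and measures how much the sample standard deviation scatters from one sample to the next at several sample sizes. The function name `stdev_scatter` is hypothetical.

```python
import random
import statistics


def stdev_scatter(n: int, sigma: float = 10.0, trials: int = 1000, seed: int = 1) -> float:
    """Spread of the sample standard deviation across repeated samples of size n.

    Assumes a normal population with mean 0 and standard deviation sigma.
    """
    if n < 2:
        raise ValueError("a sample standard deviation needs at least 2 points")
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    estimates = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        estimates.append(statistics.stdev(sample))
    return statistics.stdev(estimates)


if __name__ == "__main__":
    for n in (5, 10, 30, 100, 300):
        print(f"n = {n:>3}: sample SD typically varies by about {stdev_scatter(n):.2f}")
```

In a simulation like this the scatter drops sharply over the first few dozen points and then keeps shrinking more gradually, so each additional point beyond about 30 improves the estimate of σ by less.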

00:04 Hi, I'm Ray Sheen.
00:05 Now, whenever we have a discussion about Inferential Statistics,
00:09 we have to include the discussion about Confidence Interval.
00:14 Let me explain the concept of Confidence Interval.
00:17 When using inferential statistics,
00:19 we use a statistic from a sample of data to estimate the same statistic for
00:23 the full population from which that sample is extracted.
00:27 So we take a sample mean, x bar, and use it to estimate the population mean, mu.
00:32 But associated with that sample is a range called the confidence interval.
00:37 This range represents the possible values within which the true value of
00:40 the characteristic is likely to occur.
00:42 In other words, this is the range in which we have confidence that the true value
00:46 actually lies.
00:48 And when the sample includes the entire population,
00:50 the sample statistic is the population statistic.
00:54 The range associated with the confidence interval can go to zero because there is
00:57 no uncertainty.
00:59 But when the sample is not the complete population,
01:01 there is a possibility that when the rest of the population is included,
01:05 the actual value will be higher or lower than the sample statistic.
01:10 Now, it should come as no surprise that the larger the sample, the less the uncertainty in
01:14 the true value for the population and the smaller the confidence interval.
01:19 In fact, let's look at the relationship between confidence interval and
01:22 sample size.
01:23 Every confidence interval has an associated confidence level.
01:27 For instance,
01:28 the interval for a 95% confidence level is wider than for a 90% confidence level.
01:34 And the interval for a 99% confidence level is wider still.
01:37 The confidence interval calculation is based upon the Z Transformation.
01:41 You may recall from our lesson on this topic in the Lean Six Sigma Principles class
01:46 that the Z Transformation converts all of the individual values of
01:49 a distribution into units of standard deviation.
01:53 Given these units, the width from minus 2 standard deviations to plus 2 standard
01:57 deviations represents approximately 95% of the process results.
02:02 In addition to the Z Transformation,
02:04 the confidence interval calculation includes the value for
02:07 the sample size, the sample mean, and the population standard deviation.
02:13 Here's the formula, confidence interval, or CI, equals x bar plus or
02:18 minus the z transformation of one-half alpha times the ratio of the standard
02:24 deviation, divided by the square root of the number of items in the sample.
02:29 In this case, x bar is the sample mean.
02:31 The population's standard deviation is sigma, but
02:35 often we don't know the population standard deviation.
02:38 However, when the sample is relatively large,
02:40 at least 30 data points, the sample standard deviation will be
02:43 almost identical to the population's standard deviation.
02:46 So we will use the sample standard deviation in its place.
02:50 n is the number of items in the sample, and alpha is 1 minus the confidence level.
02:56 So if the confidence level is 90%, alpha is 1 - 0.9, or
03:00 one-tenth, and Z is the Z transformation value, which locates in standard deviations
03:05 the point that cuts off alpha/2 at each end of the distribution curve.
03:10 So let's consider an example.
03:12 Here's a standard normal distribution.
03:14 X bar has been calculated.
03:16 And we'll assume that there are more than 30 data points so the population standard
03:20 deviation can be approximated with the sample standard deviation.
03:24 We apply the confidence interval and you can see in blue, the range for
03:28 the confidence interval using a 95% confidence level.
03:32 This means that with a 95% confidence level, I can state that
03:36 the population mean lies within the range around the sample mean that is shown in blue.
03:41 And let me help you a bit with the Z transformation math.
03:44 The three most commonly used confidence levels are 90%, 95%, and 99%.
03:50 Here's the applicable value of Z for each of those levels.
03:54 Let's run through a quick illustration.
03:57 The width of the interval associated with the confidence level
04:00 is critical if the information is to be useful.
04:03 The smaller the width of the confidence interval, the more useful the information.
04:07 For instance, suppose we determine that with a 95% confidence level,
04:12 I can state that the starting salary of a new university engineering graduate lies
04:17 between $30,000 a year and $80,000 a year.
04:20 The mean is $55,000.
04:23 But the confidence interval is $50,000 wide which is quite large.
04:28 Contrast that with being able to say, with a 95% confidence level,
04:32 that the starting salary for
04:33 a new university engineering graduate is between $50,000 and $60,000.
04:39 The average or mean is still 55,000 but
04:41 now the confidence interval has been reduced from $50,000 wide to only $10,000 wide.
04:49 The narrower estimate is much more helpful, because it gives that new graduating
04:54 engineer a much better idea of what starting salary to expect.
05:00 So one of our goals when sampling is often
05:03 to work to shrink the width of the confidence interval.
05:06 In fact, let's discuss how we determine our sample size based upon the confidence
05:11 interval and the level of uncertainty we're prepared to tolerate.
05:15 We'll start with just one part of the confidence interval equation, and
05:18 that is the uncertainty band.
05:20 That is the width of the plus or minus part of the equation.
05:24 And it is the Z transformation of one-half alpha times the standard deviation,
05:29 divided by the square root of the number of items in the sample.
05:32 This equation can be rearranged so that the number of samples equals the ratio
05:37 of the Z transformation of one-half alpha times the standard deviation divided
05:43 by the allowed or acceptable width of uncertainty, with that ratio squared.
05:48 So this shows that the number of items in the sample
05:51 is based upon the size of the standard deviation,
05:54 the desired confidence level which will change the Z transformation value and
05:59 the expected or allowable uncertainty around the sample mean.
06:03 And what are the implications?
06:05 Well, one is that if we increase the confidence level from 95% to 99%,
06:10 it will increase the Z transformation value and
06:13 that will mean an increase in the required number of samples.
06:17 Also, if the standard deviation is high, which implies high variation
06:22 in the distribution, we will need a larger number of samples.
06:25 And third, if we decrease the amount of uncertainty that we're willing to accept
06:29 in our confidence interval, it will increase the required number of samples.
06:34 Well, we've discussed the uncertainty of the actual population value
06:39 based upon the sample size, and shown how to calculate a minimum sample size
06:44 based upon our confidence level and the uncertainty band that we will accept.
06:50 This should help us to make wise decisions about our sampling approach.
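The lesson does not say how the narrower salary estimate was obtained. If, hypothetically, it came only from a larger sample, with the same standard deviation and the same 95% confidence level, then the sample size relationship described in the lesson, n = (Z(α/2) × σ / E)² with E as the ± half-width, implies that the required sample grows with the square of the narrowing factor. A minimal Python sketch of that arithmetic, with a hypothetical function name:

```python
def sample_size_factor(old_half_width: float, new_half_width: float) -> float:
    """How many times larger n must be to shrink the +/- band from old to new.

    Assumes sigma and the confidence level (and therefore Z) stay the same,
    so n = (Z * sigma / E) ** 2 is proportional to 1 / E ** 2.
    """
    if old_half_width <= 0 or new_half_width <= 0:
        raise ValueError("half-widths must be positive")
    return (old_half_width / new_half_width) ** 2


# Salary example: $30,000 to $80,000 is +/- $25,000 around $55,000;
# $50,000 to $60,000 is +/- $5,000.
print(sample_size_factor(25_000, 5_000))  # prints 25.0
```

Under these assumptions, a band five times narrower needs about 25 times as many data points.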


