Quick reference
Confidence Intervals
When inferring statistical values based upon a sample, there is a band of uncertainty around the sample statistic in which the population statistic lies. This band of uncertainty can be calculated and, to an extent, controlled by the sampling approach used.
When to use
When inferential statistics are used instead of descriptive statistics, a confidence interval and confidence level should always accompany the statistical values.
Instructions
Descriptive statistics provide a complete statistical description of a data population. However, often the full population is not available. Therefore, inferential statistics are used. This is done by calculating descriptive statistics for a sample from the population and inferring from those statistics the likely population statistics. However, since the sample does not include all data points from the population, the actual population statistics will likely be different than the sample statistics. It is possible to calculate the zone in which the population statistics will fall based upon information from the sample and the population. This zone or range is called the confidence interval. The size of this interval will depend in part upon the level of desired confidence that the actual statistic will be within the interval. This level is known as the confidence level.
The formula for the confidence interval is:
$$CI = \bar{x} \pm Z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$
Where: CI is the Confidence Interval range.
X-bar is the mean from the sample
Sigma is the standard deviation from the sample
n is the number of items in the sample
Alpha is 1 – Confidence level %
Z is the Z transformation for an area that represents alpha/2 data from either end of the distribution curve. (see diagram below)
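As a minimal sketch of the calculation defined above, the Python snippet below computes a confidence interval for a mean using only the standard library. The function name `confidence_interval` and the sample figures (mean 55, standard deviation 10, 100 items) are illustrative assumptions, not values from the lesson.

```python
from math import sqrt
from statistics import NormalDist


def confidence_interval(x_bar, sigma, n, confidence_level=0.95):
    """Return (lower, upper) for x_bar +/- Z(alpha/2) * sigma / sqrt(n)."""
    alpha = 1 - confidence_level
    # Z value that leaves alpha/2 in each tail of the standard normal curve
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


# Hypothetical sample: mean 55, standard deviation 10, 100 items
lower, upper = confidence_interval(55, 10, 100, 0.95)
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

With these assumed inputs the interval runs from about 53.04 to 56.96, that is, 55 plus or minus roughly 1.96.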
A diagram of the confidence level formula is shown below.
Generally, we want the confidence interval to be as small as possible so that there is little uncertainty with regard to population statistics. Based upon this formula we can draw some important conclusions. First, if the standard deviation decreases, the confidence interval will decrease. Second, if the sample size increases, the confidence interval will decrease. Third, if the confidence level is reduced, the confidence interval will decrease. This third conclusion is based upon the value of Z for common confidence levels.
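The three conclusions can be checked numerically. The sketch below is only an illustration: the helper `ci_width` and the baseline figures (standard deviation 10, 50 items, 95% confidence level) are assumptions chosen for the demonstration. It changes one factor at a time and prints the resulting interval width.

```python
from math import sqrt
from statistics import NormalDist


def ci_width(sigma, n, confidence_level):
    """Full width of the interval: 2 * Z(alpha/2) * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return 2 * z * sigma / sqrt(n)


# Hypothetical baseline, then one factor changed at a time
results = {
    "baseline": ci_width(sigma=10, n=50, confidence_level=0.95),
    "half the std deviation": ci_width(sigma=5, n=50, confidence_level=0.95),
    "four times the sample": ci_width(sigma=10, n=200, confidence_level=0.95),
    "90% instead of 95%": ci_width(sigma=10, n=50, confidence_level=0.90),
}

for label, width in results.items():
    print(f"{label:<24} width = {width:.2f}")
```

Each change narrows the interval. Because the sample size sits under a square root, quadrupling the sample only halves the width, the same effect as halving the standard deviation.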
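The Confidence Interval formula can be transformed so that a required sample size (n) can be determined based upon the desired spread of the Confidence Interval. This is done by taking just the plus and minus term from the confidence interval formula and manipulating terms to solve for the sample size. This formula is:
$$n = \left( \frac{Z_{\alpha/2} \cdot \sigma}{E} \right)^2$$
Where: E is the acceptable uncertainty, that is, the size of the plus and minus term on either side of the sample mean.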
This formula can be used when planning sample data collection.
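A minimal sketch of this planning calculation follows. The function name `required_sample_size`, the parameter `margin` (the acceptable plus and minus term) and the example figures are assumptions for illustration. Rounding up to the next whole item is also an assumption of this sketch, made because a sample cannot contain a fraction of an item.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(sigma, margin, confidence_level=0.95):
    """Smallest whole n for which Z(alpha/2) * sigma / sqrt(n) <= margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return ceil((z * sigma / margin) ** 2)


# Hypothetical plan: standard deviation 10, acceptable uncertainty of +/- 2
print(required_sample_size(sigma=10, margin=2, confidence_level=0.95))  # 97
print(required_sample_size(sigma=10, margin=2, confidence_level=0.99))  # 166
```

Under these assumptions, moving from a 95% to a 99% confidence level raises the required sample from 97 to 166 items for the same uncertainty band.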
Hints & tips
- The only two elements that you can impact are the confidence level and the sample size. The mean and standard deviation come from the data. If you want to reduce your confidence interval, without reducing your confidence level, your only option is to collect more data in your sample.
- The actual formula for these calculations uses the standard deviation from the full population, not from the sample. However, Walter Shewhart’s research showed us that once a sample has at least 30 points in it, the sample standard deviation changes very little as more points are added and is an excellent approximation of the full population standard deviation, provided of course that the sample is representative and random.
- 00:04 Hi, I'm Ray Sheen.
- 00:05 Now, whenever we have a discussion about Inferential Statistics,
- 00:09 we have to include the discussion about Confidence Interval.
- 00:14 Let me explain the concept of Confidence Interval.
- 00:17 When using inferential statistics,
- 00:19 we use a statistic from the sample data to estimate the same statistic within
- 00:23 the full population from which that sample is extracted.
- 00:27 So we take a sample mean, x bar, and use it to estimate the population mean, mu.
- 00:32 But associated with that sample is a range called the confidence interval.
- 00:37 This range represents the possible values in which a random value for
- 00:40 the characteristic will occur.
- 00:42 In other words, this is the range in which we have confidence that the true value
- 00:46 actually lies.
- 00:48 And when the sample represents the entire population,
- 00:50 the sample statistic is the population statistic.
- 00:54 The range associated with the confidence interval can go to zero because there is
- 00:57 no uncertainty.
- 00:59 But when the sample is not the complete population,
- 01:01 there is a possibility that when the rest of the population is included,
- 01:05 the actual value will be higher or lower than the sample statistic.
- 01:10 Now, it should come as no surprise that the larger the sample, the less the uncertainty in
- 01:14 the true value for the population and the smaller the confidence interval.
- 01:19 In fact, let's look at the relationship between confidence interval and
- 01:22 sample size.
- 01:23 The confidence interval has some associated confidence level.
- 01:27 For instance,
- 01:28 the range for a 95% confidence level is larger than for a 90% confidence level.
- 01:34 And the range for a 99% confidence level is larger still.
- 01:37 The confidence interval calculation is based upon the Z Transformation.
- 01:41 You may recall from our lesson on this topic in the Lean Six Sigma principles class
- 01:46 that the Z Transformation converts all of the independent values for
- 01:49 a distribution into units of standard deviation.
- 01:53 Given these units, the width from minus 2 standard deviations to plus 2 standard
- 01:57 deviations represents approximately 95% of the process results.
- 02:02 In addition to the Z Transformation,
- 02:04 the confidence interval calculation includes the value for
- 02:07 the sample size, the sample mean, and the population standard deviation.
- 02:13 Here's the formula, confidence interval, or CI, equals x bar plus or
- 02:18 minus the z transformation of one-half alpha times the ratio of the standard
- 02:24 deviation, divided by the square root of the number of items in the sample.
- 02:29 In this case, x bar is the sample mean.
- 02:31 The population's standard deviation is sigma, but
- 02:35 often we don't know the population standard deviation.
- 02:38 However, when the sample is relatively large,
- 02:40 at least 30 data points, the sample standard deviation will be
- 02:43 almost identical to the population's standard deviation.
- 02:46 So we will use the sample standard deviation, and
- 02:50 n is the number of items in the sample, alpha is 1 minus the confidence level.
- 02:56 So if the confidence level is 90%, alpha is 1-0.9 or
- 03:00 one-tenth, and Z is the Z transformation which locates the value
- 03:05 in standard deviations from the end of the distribution curve.
- 03:10 So let's consider an example.
- 03:12 Here's a standard normal distribution.
- 03:14 X bar has been calculated.
- 03:16 And we'll assume that there are more than 30 data points so the population standard
- 03:20 deviation can be approximated with the sample standard deviation.
- 03:24 We apply the confidence interval and you can see in blue, the range for
- 03:28 the confidence interval using a 95% confidence level.
- 03:32 This means that with a 95% confidence level, I can state that
- 03:36 the population mean is in the range around the sample mean that is shown in blue.
- 03:41 And let me help you a bit with the Z transformation math.
- 03:44 The three most commonly used confidence levels are 90%, 95%, and 99%.
- 03:50 Here's the applicable value of Z for each of those levels.
- 03:54 Let's run through a quick illustration.
- 03:57 The width of the interval associated with the confidence level
- 04:00 is critical if the information is to be useful.
- 04:03 The smaller the width of the confidence interval, the more useful the information.
- 04:07 For instance, suppose we determine that with a 95% confidence level,
- 04:12 I can state that the starting salary of a new university engineering graduate lies
- 04:17 between $30,000 a year and $80,000 a year.
- 04:20 The mean is $55,000.
- 04:23 But the confidence interval is $50,000 wide which is quite large.
- 04:28 Contrast that with the statement that I can say with 95% confidence level,
- 04:32 that the starting salary for
- 04:33 a new university engineering graduate is between 50,000 and 60,000.
- 04:39 The average or mean is still 55,000 but
- 04:41 now the confidence interval has been reduced from $50,000 wide to only $10,000 wide.
- 04:49 The smaller or narrower estimate is much more helpful to that new graduating
- 04:54 engineer, who now has a much better idea of what to expect as a starting salary.
- 05:00 So one of our goals when sampling is often
- 05:03 to work to shrink the width of the confidence interval.
- 05:06 In fact, let's discuss how we determine our sample size based upon the confidence
- 05:11 interval and the level of uncertainty we're prepared to tolerate.
- 05:15 We'll start with just one part of the confidence interval equation, and
- 05:18 that is the uncertainty band.
- 05:20 That is the width of the plus or minus part of the equation.
- 05:24 And it is the Z transformation of one-half alpha times the standard deviation,
- 05:29 divided by the square root of the number of items in the sample.
- 05:32 This equation can be transformed into the number of samples equals the ratio
- 05:37 of Z transformation of one-half alpha times the standard deviation divided
- 05:43 by the allowed or acceptable width of uncertainty, and that ratio is squared.
- 05:48 So this shows that the number of items in the sample
- 05:51 is based upon the sizes of the standard deviation,
- 05:54 the desired confidence level which will change the Z transformation value and
- 05:59 the expected or allowable uncertainty around the sample mean.
- 06:03 And what are the implications?
- 06:05 Well, one is that if we increase the confidence level from 95% to 99%,
- 06:10 it will increase the Z transformation value and
- 06:13 that will mean an increase in the required number of samples.
- 06:17 Also, if the standard deviation is high, which implies high variation
- 06:22 in the distribution, we will need a larger number of samples.
- 06:25 And third, if we decrease the amount of uncertainty that we're willing to accept
- 06:29 in our confidence interval, it will increase the required number of samples.
- 06:34 Well, we've discussed the uncertainty of the actual population value,
- 06:39 based upon the sample size, and shown how to calculate a minimum sample size
- 06:44 based upon our confidence level and the uncertainty band that we will accept.
- 06:50 This should help us to make wise decisions about our sampling approach.
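The transcript mentions the Z values for the three common confidence levels and the roughly 95% of results lying within plus or minus 2 standard deviations, but the slide showing them is not reproduced here. As a hedged illustration, the values can be recomputed with Python's standard library; the variable names below are assumptions for this sketch.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# Z value that leaves alpha/2 in each tail, for each common confidence level
z_values = {}
for level in (0.90, 0.95, 0.99):
    alpha = 1 - level
    z_values[level] = std_normal.inv_cdf(1 - alpha / 2)
    print(f"{level:.0%} confidence level: Z = {z_values[level]:.3f}")

# Share of the distribution between -2 and +2 standard deviations
coverage = std_normal.cdf(2) - std_normal.cdf(-2)
print(f"Area within +/-2 standard deviations: {coverage:.4f}")
```

This prints Z values of 1.645, 1.960 and 2.576 for the 90%, 95% and 99% levels, and an area of 0.9545 within plus or minus 2 standard deviations, which is why 2 is a common rule-of-thumb stand-in for the 95% value of 1.96.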