About this lesson
Since it is often impossible to analyze all the data items in a population of data, a sample is selected from the data population. But there is a chance that the sample may not perfectly represent the full population. Based upon an understanding of the data sample and population, a range or interval can be established around any sample statistic that represents the boundaries within which the population statistic exists. In this lesson, we learn how to determine the size of that range or confidence interval.
Quick reference
Confidence Intervals
When inferring statistical values based on a sample, there is a band of uncertainty around the sample statistic in which the population statistic lies. This band of uncertainty can be calculated based on the desired confidence level and the sample statistics.
When to use
When inferential statistics are used instead of descriptive statistics, a confidence interval and confidence level should always accompany the statistical analysis.
Instructions
Descriptive statistics provide a complete statistical description of a dataset. However, often the full population of data is not available and only a sample subset is analyzed. Therefore, inferential statistics are used. This is done by calculating descriptive statistics for a sample from the population and inferring from those statistics the likely population statistics. However, since the sample does not include all data points from the population, the actual population statistics will likely be different than the sample statistics. It is possible to calculate the zone in which the population statistics will likely fall based on information from the sample and the population. This zone or range is called the confidence interval. The size of this interval will depend in part upon the level of desired confidence that the actual statistic will be within the interval. This desired confidence is known as the confidence level.
The formula for the confidence interval is:
$$CI = \bar{x} \pm Z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$
Where: CI is the Confidence Interval range (from min to max).
X-bar is the mean from the sample
Sigma is the standard deviation from the population
n is the number of items in the sample
Alpha is 1 – Confidence level %
Z is the Z transformation for an area that represents alpha/2 data from either end of the distribution curve
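The calculation defined above can be sketched in Python. This is a minimal illustration, assuming the population standard deviation is known or is approximated by the sample standard deviation of a large sample. The function name `confidence_interval` and the input values are hypothetical and are not taken from the lesson. `statistics.NormalDist` from the standard library supplies the Z value.

```python
from math import sqrt
from statistics import NormalDist


def confidence_interval(x_bar, sigma, n, confidence_level=0.95):
    """Return (lower, upper) bounds of the Z-based confidence interval for the mean."""
    alpha = 1.0 - confidence_level
    # Z value that leaves alpha/2 of the area in each tail of the standard normal curve
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


# Hypothetical inputs: sample mean 100, standard deviation 15, 36 items in the sample
lower, upper = confidence_interval(100.0, 15.0, 36, 0.95)
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

With these assumed inputs the margin is roughly 1.96 x 15 / 6, or about 4.9 on either side of the sample mean.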
Generally, we want the confidence interval to be as narrow as possible so that there is little uncertainty with regard to population statistics. Based on this formula we can draw some important conclusions. First, if the standard deviation decreases, the confidence interval will narrow. Second, if the sample size increases, the confidence interval will narrow. Third, if the confidence level is reduced, the confidence interval will narrow. This third conclusion is based on the value of Z for common confidence levels.
Confidence Level | Z Value |
90% | 1.64 |
95% | 1.96 |
99% | 2.58 |
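The Z values in the table and the three conclusions above can be checked numerically. The sketch below is illustrative only: the helper names `z_value` and `interval_width`, and the baseline of sigma = 10 and n = 25, are hypothetical choices, not values from the lesson.

```python
from math import sqrt
from statistics import NormalDist


def z_value(confidence_level):
    """Two-sided Z value for a confidence level such as 0.95."""
    alpha = 1.0 - confidence_level
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def interval_width(sigma, n, confidence_level):
    """Full width (max minus min) of the Z-based confidence interval."""
    return 2.0 * z_value(confidence_level) * sigma / sqrt(n)


# Z values for the common confidence levels
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: Z = {z_value(level):.3f}")

# Hypothetical baseline: sigma = 10, n = 25, 95% confidence
print(f"baseline width:         {interval_width(10.0, 25, 0.95):.2f}")
print(f"smaller sigma (5):      {interval_width(5.0, 25, 0.95):.2f}")
print(f"larger sample (100):    {interval_width(10.0, 100, 0.95):.2f}")
print(f"lower confidence (90%): {interval_width(10.0, 25, 0.90):.2f}")
```

Carried to three decimals, the Z values are 1.645, 1.960, and 2.576, consistent with the rounded values in the table. Each of the three changes produces a narrower interval than the baseline.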
Hints & tips
- The only two elements that you can impact are the confidence level and the sample size. The mean and standard deviation come from the existing data. If you want to reduce your confidence interval, without reducing your confidence level, your only option is to collect more data in your sample.
- The actual formula for these calculations uses the standard deviation from the full population, not from the sample. However, Walter Shewhart’s research showed us that once a normal sample has at least 30 points in it, the standard deviation changes very little as more points are added, so the standard deviation of the sample is a good approximation of the full population standard deviation, provided of course that the sample is representative and random.
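The first tip, that collecting more data is the only way to narrow the interval at a fixed confidence level, can be made concrete by rearranging the formula to solve for n. The sketch below assumes the standard deviation is already known or well estimated. The function name `required_sample_size` and the example numbers are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(sigma, margin, confidence_level=0.95):
    """Smallest n for which Z * sigma / sqrt(n) does not exceed the target margin."""
    alpha = 1.0 - confidence_level
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Rearranged from margin = z * sigma / sqrt(n); round up to a whole item
    return ceil((z * sigma / margin) ** 2)


# Hypothetical target: standard deviation 15, desired margin of +/- 2 at 95% confidence
n_needed = required_sample_size(15.0, 2.0, 0.95)
print(f"items needed for +/- 2: {n_needed}")

# Halving the margin requires roughly four times as many items
n_tighter = required_sample_size(15.0, 1.0, 0.95)
print(f"items needed for +/- 1: {n_tighter}")
```

Because n sits under a square root in the formula, cutting the interval width in half takes about four times the data.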
- 00:04 Hi, I'm Ray Sheen.
- 00:06 Whenever we're having a discussion about inferential statistics,
- 00:09 we have to include a discussion about confidence interval.
- 00:13 Let me explain the concept of confidence interval.
- 00:17 When using inferential statistics,
- 00:19 we use the statistics of a sample of the data to estimate the same statistic within
- 00:23 the full population from which that sample was extracted.
- 00:27 So we take a sample mean of x bar and use it to estimate the population mean of mu,
- 00:33 but associated with that sample is a range called the confidence interval.
- 00:39 This range represents the possible values within which any subset or sample value for
- 00:44 the characteristic will occur.
- 00:46 In other words, this is the range in which we have confidence that the true value
- 00:51 actually lies.
- 00:53 Now, when the sample contains the entire population,
- 00:57 the sample statistic is the population statistic.
- 01:01 The range associated with the confidence interval can go to zero because there is
- 01:06 no uncertainty.
- 01:08 But when the sample is not the complete population,
- 01:11 then it is possible that when the rest of the population is included,
- 01:16 the actual value will be slightly higher or lower than the sample value.
- 01:21 Now, it should come as no surprise that the larger the sample,
- 01:25 the less the uncertainty in the true value for the population, and
- 01:29 the smaller the confidence interval.
- 01:32 In fact, let's look at the relationship between confidence interval and
- 01:37 sample size.
- 01:38 A confidence interval has some associated confidence level.
- 01:43 For instance, the range for a 95% confidence level is larger than the range for
- 01:48 a 90% confidence level, and the range for a 99% confidence level is larger still.
- 01:54 The confidence interval calculation is based upon the Z Transformation.
- 02:00 We have a lesson coming up that explains how the Z Transformation is calculated and
- 02:05 how it converts all the independent values for a
- 02:07 distribution into units of standard deviation.
- 02:11 Given these units, the width from -2 standard deviations to +2 standard
- 02:16 deviations represents approximately 95% of the process results.
- 02:21 In addition to the Z Transformation,
- 02:23 the confidence interval calculation includes the value for the sample size,
- 02:28 the sample mean, and the population standard deviation.
- 02:32 Here's the formula, Confidence Interval or CI equals x bar plus or
- 02:37 minus the Z Transformation of one half alpha times the standard
- 02:42 deviation divided by the square root of the number of items in the sample.
- 02:49 Now, in this case, x bar is the sample mean.
- 02:52 The population standard deviation is sigma but
- 02:55 often we can't know the population standard deviation.
- 03:00 However, when the sample is normal and relatively large, at least 30 data points,
- 03:04 the sample standard deviation will be almost identical to the population
- 03:09 standard deviation, so we'll use the sample standard deviation.
- 03:13 n is the number of items in the sample.
- 03:17 Alpha is 1 minus the confidence level.
- 03:21 So if the confidence level is 90%, alpha is 1 - 0.9 or 1/10.
- 03:28 And z is the z transformation which locates the value in standard
- 03:32 deviations from each end of the distribution curve.
- 03:37 So let's consider an example.
- 03:38 Here is a standard normal distribution.
- 03:41 x bar has been calculated, and we'll assume there are more than 30 data points.
- 03:45 So the population standard deviation can be approximated by using the sample
- 03:49 standard deviation.
- 03:52 We apply the confidence interval, and you can see in blue the range for
- 03:56 the confidence interval, we're using a 95% confidence level.
- 04:00 That means that with a 95% confidence, I can state that the population
- 04:05 mean is within the range around the sample mean that is shown in blue.
- 04:10 And let me help you out a bit with the Z Transformation math.
- 04:14 The three most commonly used confidence levels are 90%,
- 04:19 95%, and 99%, and here's the applicable value of Z for each of those levels.
- 04:27 Let's run through a quick illustration.
- 04:29 The width of the interval associated with the confidence level is critical if
- 04:33 the information is to be useful.
- 04:35 The smaller the width of the confidence interval, the more useful the information.
- 04:40 Let's do a simple illustration.
- 04:42 We collect some data from recent engineering graduates and
- 04:45 find that the average starting salary is $55,000.
- 04:49 Now a new engineer wants to establish a budget so
- 04:51 they can decide what type of car they can afford to buy.
- 04:55 The question they face is,
- 04:57 how much confidence can they place in that $55,000 value?
- 05:02 If using the 95% confidence level,
- 05:05 I can state that the starting salary is between 30,000 and 80,000.
- 05:10 The confidence interval is $50,000 wide.
- 05:13 The mean is 55,000 but that range is quite large.
- 05:18 It's hard to create a budget with that much uncertainty.
- 05:21 If instead the 95% confidence interval for the starting salary is between 50,000 and
- 05:27 60,000, there is much less uncertainty.
- 05:30 The upside potential is not as great as with the wider interval but
- 05:34 the downside is not as likely.
- 05:36 The new engineer can create a budget with a much higher degree of confidence
- 05:40 in their starting salary.
- 05:43 The smaller or narrower estimate is much more helpful for
- 05:46 the newly graduated engineer.
- 05:49 So one of our goals when sampling is often to work to shrink the width of
- 05:53 the confidence interval.
- 05:55 Well, we've seen the importance of confidence interval.
- 05:58 Whenever you are working with inferential statistics, there is a confidence
- 06:03 level and a confidence interval associated with your analysis.
PMI, PMP, CAPM and PMBOK are registered marks of the Project Management Institute, Inc.