About this lesson
Since it is often impossible to analyze every data item in a population, a sample is selected from the population instead. However, the sample may not perfectly represent the full population. Based upon an understanding of the sample and the population, a range or interval can be established around any sample statistic that marks the boundaries within which the population statistic is likely to lie. In this lesson, we learn how to determine the size of that range, known as the confidence interval.
Quick reference
Confidence Intervals
When inferring statistical values from a sample, there is a band of uncertainty around the sample statistic in which the population statistic is likely to lie. The width of this band can be calculated from the desired confidence level and the sample statistics.
When to use
When inferential statistics are used instead of descriptive statistics, a confidence interval and confidence level should always accompany the statistical analysis.
Instructions
Descriptive statistics provide a complete statistical description of a dataset. Often, however, the full population of data is not available and only a sample is analyzed, so inferential statistics are used instead. This is done by calculating descriptive statistics for a sample from the population and inferring the likely population statistics from them. Since the sample does not include every data point in the population, the actual population statistics will likely differ from the sample statistics.
It is possible to calculate the zone in which the population statistics will likely fall, based on information from the sample and the population. This zone or range is called the confidence interval. The size of this interval depends in part on how confident we want to be that the actual statistic falls within it. This desired confidence is known as the confidence level.
The formula for the confidence interval is:
$$CI = \bar{X} \pm Z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$
Where: CI is the confidence interval range (from min to max)
X-bar is the mean of the sample
Sigma is the standard deviation of the population
n is the number of items in the sample
Alpha is 1 – confidence level (for example, 0.05 for a 95% confidence level)
Z is the Z value (standard normal score) that leaves an area of alpha/2 in each tail of the distribution curve
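As a rough illustration of the definitions above, the calculation can be sketched in Python. The function name `confidence_interval` and the example figures (a sample mean of 50, a population standard deviation of 10, and a sample of 100 items) are assumptions chosen for illustration, not values from the lesson. The sketch assumes the population standard deviation is known and the data are approximately normal.

```python
from math import sqrt
from statistics import NormalDist


def confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Return (low, high) bounds for the population mean.

    Implements X-bar +/- Z * sigma / sqrt(n), assuming sigma is the
    known population standard deviation (an assumption of this sketch).
    """
    alpha = 1 - confidence
    # Z value that leaves alpha/2 in each tail of the standard normal curve
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Hypothetical example values, for illustration only
low, high = confidence_interval(sample_mean=50.0, sigma=10.0, n=100, confidence=0.95)
print(f"95% CI: {low:.2f} to {high:.2f}")  # 95% CI: 48.04 to 51.96
```

With these made-up inputs, the margin on either side of the sample mean is about 1.96, so the interval runs from roughly 48.04 to 51.96.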
Generally, we want the confidence interval to be as narrow as possible so that there is little uncertainty about the population statistics. Based on this formula we can draw some important conclusions. First, if the standard deviation decreases, the confidence interval narrows. Second, if the sample size increases, the confidence interval narrows. Third, if the confidence level is reduced, the confidence interval narrows. This third conclusion follows from the value of Z for common confidence levels:
Confidence Level | Z Value |
90% | 1.645 |
95% | 1.96 |
99% | 2.58 |
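The rounded Z values in the table, and the three conclusions drawn from the formula, can be checked with a short Python sketch. The helper names `z_value` and `interval_width` and the input numbers are hypothetical, chosen only to show the direction of each effect.

```python
from math import sqrt
from statistics import NormalDist


def z_value(confidence):
    """Z value that leaves (1 - confidence) / 2 in each tail."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def interval_width(sigma, n, confidence):
    """Full width of the confidence interval: 2 * Z * sigma / sqrt(n)."""
    return 2 * z_value(confidence) * sigma / sqrt(n)


# Z values for the common confidence levels
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: Z = {z_value(level):.3f}")

# Hypothetical baseline, then one factor changed at a time
base = interval_width(sigma=10.0, n=100, confidence=0.95)
smaller_sigma = interval_width(sigma=5.0, n=100, confidence=0.95)
larger_sample = interval_width(sigma=10.0, n=400, confidence=0.95)
lower_confidence = interval_width(sigma=10.0, n=100, confidence=0.90)

print(f"baseline width:        {base:.2f}")
print(f"smaller std deviation: {smaller_sigma:.2f}")
print(f"larger sample:         {larger_sample:.2f}")
print(f"lower confidence:      {lower_confidence:.2f}")
```

Each of the three changes produces a narrower interval than the baseline. Note that quadrupling the sample size only halves the width, because the sample size enters the formula under a square root.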
Hints & tips
- The only two elements that you can influence are the confidence level and the sample size. The mean and standard deviation come from the existing data. If you want to narrow your confidence interval without reducing your confidence level, your only option is to collect more data for your sample.
- The formula uses the standard deviation of the full population, not of the sample. However, a common rule of thumb, which this lesson attributes to Walter Shewhart’s research, is that once a sample from a normal population has at least 30 points, its standard deviation changes little as more points are added and is a good approximation of the full population standard deviation, provided of course that the sample is representative and random.
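The first tip above implies that the formula can be rearranged to estimate how much data is needed for a given interval size. A minimal sketch, assuming the standard deviation is already known or well estimated: solving Z * sigma / sqrt(n) = E for n gives n = (Z * sigma / E)^2, where E is the acceptable margin on either side of the mean. The function name `required_sample_size` and the example numbers are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(sigma, margin, confidence=0.95):
    """Smallest n for which Z * sigma / sqrt(n) <= margin.

    Assumes sigma is a known (or well-estimated) standard deviation.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Round up, since a fractional sample item is not possible
    return ceil((z * sigma / margin) ** 2)


# Hypothetical example: sigma = 10, target margin of +/- 1 at 95% confidence
n_needed = required_sample_size(sigma=10.0, margin=1.0, confidence=0.95)
print(n_needed)  # 385
```

Relaxing the margin to 2 in this example drops the requirement to 97 items, which shows how quickly the data collection effort grows as the target interval shrinks.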