About this lesson
A regression formula can be used to predict how a system responds to various inputs. However, depending on the nature of the data set, there will be some uncertainty in the accuracy of that prediction. This lesson shows how to determine the level of uncertainty in a regression model's prediction.
Quick reference
Prediction Interval
The prediction interval is a calculation that determines the degree of uncertainty around a calculated value. It is a zone in which the real-world “Y” should occur based on the accuracy of the model used to calculate “Y.”
When to use
Prediction intervals can be calculated when residuals can be calculated. Therefore, prediction intervals are often used with regression analysis and ANOVA.
Instructions
Some hypothesis tests such as regression analyses form a “best fit” equation to model the system performance based on the data set. These “best fit” equations should closely approximate the real world. But normally the actual values will be slightly different. The prediction interval estimates this level of uncertainty. It is based on the actual variation between the model and the real-world values.
The prediction interval calculation relies on the descriptive statistics of the residuals that are based on the regression model. In particular, the formula uses the mean value and the standard deviation of the residual distribution. The formula is shown below:
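Written out in a commonly used textbook form, and assuming the residuals are approximately normally distributed, the calculation looks like the following. The symbols \hat{y} (the value predicted by the model), \bar{e} (the mean of the residuals) and s_e (the standard deviation of the residuals) are labels chosen here for illustration.

```latex
\hat{y} + \bar{e} \;\pm\; t_{\alpha/2}\, s_e \sqrt{1 + \frac{1}{n}}
```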
The mean value should be very close to zero, since that is a key criterion for determining the best-fit regression model. The standard deviation is a measure of variability in the residual distribution. The better the fit of the regression model, the smaller the standard deviation and therefore the narrower the prediction interval. The “n” value is the number of samples used when creating the regression model. The “tα/2” value is determined from the T table. The T table will be discussed in more detail in another lesson; it is used in much the same way as the Z table, which we have already discussed. In both cases the α value determines the constant to be selected; for the T table the sample size is also needed. In Lean Six Sigma analysis we normally use an α of 0.05, which corresponds to a 95% confidence level. The “plus” and “minus” portions of the calculation give the upper and lower limits of the interval, and the distance between them is the interval width.
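The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function name `prediction_interval`, the sample residuals, and the predicted value of 50 are all made up for the example, and the form t × s × √(1 + 1/n) is the common textbook version. The T-table constant is passed in by the caller, here 2.776 for α = 0.05 (two-sided) with n − 1 = 4 degrees of freedom. A regression package would typically use the residual degrees of freedom of the fitted model instead.

```python
import math
import statistics


def prediction_interval(predicted_y, residuals, t_crit):
    """Return (lower, upper) limits around a predicted Y value.

    t_crit is the two-sided T-table constant for the chosen alpha,
    looked up by the caller (assumed here: n - 1 degrees of freedom).
    """
    n = len(residuals)
    mean_res = statistics.mean(residuals)   # should be close to zero
    s = statistics.stdev(residuals)         # sample standard deviation
    margin = t_crit * s * math.sqrt(1 + 1 / n)
    center = predicted_y + mean_res
    return center - margin, center + margin


# Made-up residuals (actual minus predicted) from a five-point fit.
residuals = [-2.0, -1.0, 0.0, 1.0, 2.0]
# T-table value for alpha = 0.05 (two-sided) and 4 degrees of freedom.
t_crit = 2.776
lower, upper = prediction_interval(50.0, residuals, t_crit)
print(f"95% prediction interval: {lower:.2f} to {upper:.2f}")
```

With these illustrative numbers, a larger residual standard deviation (a poorer fit) widens the interval in direct proportion.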
The prediction interval and confidence interval are quite similar in concept and in the form of the equation. However, their applications are quite different. Prediction intervals are used with regression models to determine the uncertainty around the next unique point to be calculated using the model. Confidence intervals are used to identify the uncertainty in the central tendency of an entire distribution. The prediction interval therefore shows the uncertainty of individual points in the distribution rather than the uncertainty of the mean of the entire distribution. The prediction interval uses the descriptive statistics of the residual distribution, whereas the confidence interval uses the descriptive statistics of the entire distribution of real-world values. We often find prediction intervals used with relatively small samples, as long as the regression has a strong correlation coefficient. The confidence interval often will not become stable until relatively large samples are being used.
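To make the similarity in form concrete, the sketch below compares the half-width of a prediction interval, t × s × √(1 + 1/n), with the half-width of the standard confidence interval for a mean, t × s / √n. It is an illustration under stated assumptions: the function names and the standard deviation of 2.0 are hypothetical, the same s is used in both formulas purely to compare the equations (in practice the prediction interval uses the residuals and the confidence interval uses the real-world values), and the T-table constants are for α = 0.05 (two-sided) with n − 1 degrees of freedom.

```python
import math


def prediction_half_width(s, n, t_crit):
    """Half-width of an interval for the next individual point."""
    return t_crit * s * math.sqrt(1 + 1 / n)


def confidence_half_width(s, n, t_crit):
    """Half-width of an interval for the mean of the distribution."""
    return t_crit * s / math.sqrt(n)


# T-table constants for alpha = 0.05 (two-sided), keyed by sample size n
# (degrees of freedom n - 1 = 4, 9 and 24).
T_TABLE = {5: 2.776, 10: 2.262, 25: 2.064}

s = 2.0  # hypothetical standard deviation, same for both formulas
results = {}
for n, t_crit in T_TABLE.items():
    pi = prediction_half_width(s, n, t_crit)
    ci = confidence_half_width(s, n, t_crit)
    results[n] = (pi, ci)
    print(f"n={n:2d}  prediction ±{pi:.3f}  confidence ±{ci:.3f}")
```

In this sketch the confidence half-width keeps shrinking as n grows, while the prediction half-width levels off near t × s, because a single new point always carries the full point-to-point variation.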
Hints & tips
- Prediction intervals can be calculated with either Excel or Minitab.
- Prediction intervals are very sensitive to the model fit.