About this lesson
A regression formula can be used to predict how a system responds to various inputs. However, based on the nature of the data set, there will be some uncertainty to the accuracy of that prediction. This lesson shows how to determine that level of uncertainty in the regression model prediction.
Quick reference
Prediction Interval
The prediction interval is a calculation that determines the degree of uncertainty around a calculated value. It is a zone in which the real-world “Y” should occur based on the accuracy of the model used to calculate “Y.”
When to use
Prediction intervals can be calculated when residuals can be calculated. Therefore, prediction intervals are often used with regression analysis and ANOVA.
Instructions
Some hypothesis tests such as regression analyses form a “best fit” equation to model the system performance based on the data set. These “best fit” equations should closely approximate the real world. But normally the actual values will be slightly different. The prediction interval estimates this level of uncertainty. It is based on the actual variation between the model and the real-world values.
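As a minimal sketch of where the residuals come from, the Python below fits a straight line by least squares and subtracts each calculated “Y” from the real-world “Y.” The data values and the helper names `fit_line` and `residuals` are hypothetical and made up for illustration; they are not part of the lesson's exercise files.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def residuals(xs, ys, intercept, slope):
    """Real-world y minus the y calculated by the model, per data point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Hypothetical data set, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(xs, ys)
res = residuals(xs, ys, intercept, slope)

print(f"y = {intercept:.3f} + {slope:.3f} * x")
print("residuals:", [round(r, 3) for r in res])
```

For a least-squares line with an intercept, the residuals sum to zero apart from rounding, which is why their mean is expected to be very close to zero.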
The prediction interval calculation relies on the descriptive statistics of the residuals that are based on the regression model. In particular, the formula uses the mean value and the standard deviation of the residual distribution. The formula is shown below:
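The formula image did not carry over into this text. A standard textbook form that uses the same terms described in this lesson is sketched below. The grouping under the square root and the n − 1 degrees of freedom are assumptions, so the lesson's own figure may arrange the terms differently.

```latex
\text{Prediction interval} \;=\; \bar{r} \;\pm\; t_{\alpha/2,\,n-1} \cdot s_r \cdot \sqrt{1 + \frac{1}{n}}
```

Here \bar{r} is the mean of the residuals, s_r is the standard deviation of the residuals, n is the number of data points, and t_{\alpha/2,\,n-1} is the value read from the T table.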
The mean value should be very close to zero, since that is a key criterion for determining the best-fit regression model. The standard deviation is a measure of variability in the residual distribution. The better the fit of the regression model, the smaller the standard deviation and therefore the smaller the prediction interval. The “n” value is the number of samples used when creating the regression model. The “tα/2” value is determined from the T table. The T table will be discussed in more detail in another lesson; it is used in much the same way as the Z table, which we have discussed already. In both cases, the α value determines the constant to be selected; for the T table, the sample size is also needed. In Lean Six Sigma analysis we normally use an α of 0.05, which corresponds to a 95% confidence level. The “plus” and “minus” portions of the calculation give the upper and lower limits of the interval, and the difference between them is the interval width.
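The calculation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the lesson's Excel or Minitab procedure: it assumes the common textbook form mean ± t × s × √(1 + 1/n), takes the t value as an input looked up from a T table, and uses made-up residuals. The function name `prediction_interval` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def prediction_interval(residuals, t_crit):
    """Return (lower, upper) limits around the mean of the residuals.

    Assumes the textbook form: mean +/- t * s * sqrt(1 + 1/n).
    t_crit is the two-sided t value for the chosen alpha, read from a T table.
    """
    n = len(residuals)
    center = mean(residuals)
    s = stdev(residuals)  # sample standard deviation (n - 1 in the denominator)
    half_width = t_crit * s * sqrt(1 + 1 / n)
    return center - half_width, center + half_width

# Hypothetical residuals, for illustration only.
residuals = [-1.0, 0.5, 0.5, -0.5, 0.5]

# T table value for alpha = 0.05 (two-sided) and n - 1 = 4 degrees of freedom.
T_CRIT = 2.776

lower, upper = prediction_interval(residuals, T_CRIT)
print(f"prediction interval: {lower:.3f} to {upper:.3f}")
print(f"interval width: {upper - lower:.3f}")
```

A smaller standard deviation of the residuals shrinks `half_width` directly, which is the sense in which a better model fit gives a narrower prediction interval.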
The prediction interval and confidence interval are quite similar in concept and in the form of the equation. However, the application is quite different. Prediction intervals are used with regression models to determine the uncertainty around the next unique point to be calculated using the model. Confidence intervals are used to identify the uncertainty in the central tendency of an entire distribution. The prediction interval therefore shows the uncertainty of individual distribution points rather than the uncertainty of the mean for the entire distribution. The prediction interval uses the descriptive statistics of the residual distribution, whereas the confidence interval uses the descriptive statistics of the entire distribution of real-world values. We often find prediction intervals used with relatively small samples, as long as the regression has a strong correlation coefficient. The confidence interval often will not become stable until relatively large samples are being used.
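To make the difference in the equations concrete, the sketch below applies both textbook forms to the same small set of values. This is for comparison only: in practice the confidence interval would use the real-world output values and the prediction interval would use the residuals. The forms used (t × s / √n for the confidence interval, t × s × √(1 + 1/n) for the prediction interval), the data, and the function names are assumptions for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def confidence_interval(values, t_crit):
    """Range for the MEAN of the distribution: mean +/- t * s / sqrt(n)."""
    n = len(values)
    half = t_crit * stdev(values) / sqrt(n)
    m = mean(values)
    return m - half, m + half

def prediction_interval(values, t_crit):
    """Range for the NEXT single point: mean +/- t * s * sqrt(1 + 1/n)."""
    n = len(values)
    half = t_crit * stdev(values) * sqrt(1 + 1 / n)
    m = mean(values)
    return m - half, m + half

# Hypothetical values, for illustration only.
values = [9.0, 10.0, 11.0, 10.0, 10.0]
T_CRIT = 2.776  # T table value for alpha = 0.05 (two-sided), 4 degrees of freedom

ci_lo, ci_hi = confidence_interval(values, T_CRIT)
pi_lo, pi_hi = prediction_interval(values, T_CRIT)

print(f"confidence interval: {ci_lo:.3f} to {ci_hi:.3f}")
print(f"prediction interval: {pi_lo:.3f} to {pi_hi:.3f}")
```

With these forms the prediction interval is always the wider of the two, because it covers the scatter of an individual point and not just the uncertainty in the mean.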
Hints & tips
- A prediction interval can be calculated with either Excel or Minitab.
- Prediction intervals are very sensitive to the model fit.
- 00:04 Hi, I'm Ray Sheen.
- 00:05 I'd like to introduce another principle associated with regression analysis,
- 00:10 prediction interval.
- 00:11 You're probably asking, what is the prediction interval?
- 00:15 Recall that when doing regression analysis, you create a model or
- 00:19 formula that represents the data set.
- 00:22 The formula is of the form y equals some function of x,
- 00:26 where x is the independent variable or possible multiple x's.
- 00:32 With the formula, I can plug in any value for x and
- 00:35 it gives me the corresponding y value.
- 00:39 But like everything else with statistical analysis, there is a little bit of
- 00:43 uncertainty when going between the real world and the statistical world.
- 00:48 Which means that the actual real world y will probably be a bit different from
- 00:53 the calculated y value.
- 00:55 Now, the better the fit of the formula, the less uncertainty or
- 00:58 error term in the formula.
- 01:01 The prediction interval is a calculation that determines the degree of uncertainty
- 01:06 around the calculated y value.
- 01:08 It is a zone in which the real world y should occur.
- 01:12 This may seem familiar to you.
- 01:13 It's the same concept as the confidence interval that we discussed earlier when
- 01:18 looking at normality.
- 01:19 Like that interval, we will use some of the descriptive statistics of the data set
- 01:24 to determine the prediction interval.
- 01:27 So let's look at how this is calculated.
- 01:30 First, some background.
- 01:31 The prediction interval will be based upon the results of the residual analysis.
- 01:37 In fact, we'll use some of the descriptive statistics associated with
- 01:41 the distribution of the residual analysis as we calculate the prediction interval.
- 01:46 In particular, if the model fits really well, the residuals will be very small and
- 01:51 therefore the standard deviation of the residual distribution is small.
- 01:56 If the model is poor, well, then we have larger residuals and
- 02:00 a larger standard deviation.
- 02:03 The formula is straightforward.
- 02:04 The prediction interval is the range between the mean value of the residuals
- 02:09 plus the t function at alpha divided by 2 times the standard deviation of
- 02:14 the residuals plus the standard deviation of the residuals divided by
- 02:18 the square root of the number of data points in the residual data set.
- 02:23 Now, that's at the top of the prediction interval.
- 02:25 For the bottom of the prediction interval, you just subtract that t statistic
- 02:30 at alpha divided by 2 times the standard deviation
- 02:33 plus the standard deviation divided by the square root of the number of points in the data set.
- 02:38 We'll talk about the t statistic more later, but for now,
- 02:42 consider it to be similar to the z statistic.
- 02:45 So if the standard deviation is small, the range between the upper and
- 02:50 the lower value will also be very small.
- 02:54 Since I mentioned confidence interval, let me compare confidence interval and
- 02:57 prediction interval.
- 02:59 You can expect at least one question on the IASSC exam to try and
- 03:02 trip you up between the two.
- 03:05 Now, I grant you that the form of the equation between the two is similar in
- 03:10 many respects, but the purpose and application are what is different.
- 03:15 The prediction interval is often used to predict the next y value based upon
- 03:19 the input of the next unique x value in a regression formula.
- 03:23 It is the range in which that specific y will actually occur.
- 03:28 Whereas the confidence interval is the range around the subset mean value
- 03:32 in which the population actual mean is occurring.
- 03:36 The prediction interval is an indication of the uncertainty in the model used to
- 03:41 represent the distribution.
- 03:43 The confidence interval is the level of uncertainty with respect to
- 03:47 the central tendency of an existing distribution.
- 03:52 There is no model involved.
- 03:54 The prediction interval uses the descriptive statistics
- 03:57 of the residual distribution.
- 03:59 Whereas the confidence interval is using the descriptive statistics of the actual
- 04:04 output variable that's being investigated.
- 04:06 And the prediction interval assumes the sample is rather small.
- 04:11 Whereas the confidence interval assumes that it's a large sample that was
- 04:15 collected before calculating the descriptive statistics.
- 04:19 The prediction interval informs us as to how good our regression model is and
- 04:26 to what degree we can actually trust the answers.
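The use described in the transcript, predicting the next y for a new x and stating the range in which it should fall, can be sketched as follows. The model coefficients, residuals, and the function name `predict_with_interval` are hypothetical. The sketch follows the simplified residual-based approach from this lesson with the textbook form t × s × √(1 + 1/n) and n − 1 degrees of freedom, which are assumptions; statistical software typically also adjusts for the number of fitted parameters and for how far the new x lies from the data.

```python
from math import sqrt
from statistics import mean, stdev

def predict_with_interval(x_new, intercept, slope, residuals, t_crit):
    """Return (y_hat, lower, upper) for a new x.

    y_hat comes from the regression formula; the interval is built from
    the mean and standard deviation of the model's residuals.
    """
    y_hat = intercept + slope * x_new
    n = len(residuals)
    half_width = t_crit * stdev(residuals) * sqrt(1 + 1 / n)
    center = y_hat + mean(residuals)  # mean residual should be close to zero
    return y_hat, center - half_width, center + half_width

# Hypothetical fitted model y = 0.05 + 1.99 * x and its residuals.
INTERCEPT, SLOPE = 0.05, 1.99
model_residuals = [0.06, -0.13, 0.18, -0.21, 0.10]
T_CRIT = 2.776  # T table value for alpha = 0.05 (two-sided), 4 degrees of freedom

y_hat, lower, upper = predict_with_interval(6.0, INTERCEPT, SLOPE, model_residuals, T_CRIT)
print(f"calculated y: {y_hat:.2f}")
print(f"real-world y expected between {lower:.2f} and {upper:.2f}")
```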