About this lesson
A statistical analysis or test creates a mathematical model to fit the data in the sample. Real-world data seldom fit the model precisely. The differences between the model and the actual data are known as residuals. This lesson explains how to read residual graphs and analyses.
Quick reference
Residual Analysis
Residuals are the differences between the actual data values and the values predicted by the model that the hypothesis test produces. Analyzing the residuals is a way of assessing whether the hypothesis test, and the model behind it, is valid.
When to use
When a hypothesis test creates a formula or prediction of the data values, residuals can be calculated. Hypothesis tests that use regression analysis and ANOVA both produce residuals.
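For ANOVA the "prediction" is simple: a one-way ANOVA model predicts every observation in a group by that group's mean. The minimal sketch below, using made-up data and a hypothetical helper name `anova_residuals`, shows how the residuals follow from that.

```python
from statistics import mean

def anova_residuals(groups):
    """Residuals for a one-way ANOVA model.

    The model predicts each observation by its group mean, so every
    residual is the observation minus that group's mean.
    """
    residuals = {}
    for name, values in groups.items():
        group_mean = mean(values)  # the model's prediction for this group
        residuals[name] = [value - group_mean for value in values]
    return residuals

# Hypothetical data: three groups, e.g. three machine settings.
groups = {
    "A": [10.0, 12.0, 11.0],
    "B": [15.0, 14.0, 16.0],
    "C": [9.0, 8.0, 10.0],
}

res = anova_residuals(groups)
for name, r in res.items():
    print(name, r)
```

Within each group the residuals sum to zero, because the group mean is exactly the value that balances them.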
Instructions
Some hypothesis tests form a “best fit” equation that models the system performance based on the data set. The “best fit” equation should closely approximate the real world, but the actual values will normally differ slightly from the predicted ones. For each data point, the difference between the actual value and the predicted value is a residual. The “best fit” solution is calculated from these residuals: in a least-squares fit (the method used by regression and ANOVA), the equation is chosen so that the sum of the squared residuals is as small as possible. When the equation includes a constant term, the residuals of that fit sum to zero, so their mean is zero.
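The least-squares idea can be checked numerically. This is a minimal sketch for a straight line y = b0 + b1·x on hypothetical data; the function names are illustrative only. It fits the line, confirms the residuals sum to zero, and shows that moving the slope or intercept away from the fitted values increases the sum of squared residuals (SSE).

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least-squares fit of y = b0 + b1*x; returns (b0, b1)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def residuals(x, y, b0, b1):
    """Residual = actual value minus predicted value."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

def sse(res):
    """Sum of squared residuals, the quantity least squares minimizes."""
    return sum(r * r for r in res)

# Hypothetical sample data
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(x, y)
res = residuals(x, y, b0, b1)
print(f"fit: y = {b0:.3f} + {b1:.3f}x")
print("sum of residuals:", round(sum(res), 10))
print("SSE at the best fit:", round(sse(res), 4))

# Any other line does worse: nudge the slope and the SSE goes up.
print("SSE with slope + 0.1:", round(sse(residuals(x, y, b0, b1 + 0.1)), 4))
```

With these numbers the fit comes out as roughly y = 0.05 + 1.99x, and the residuals sum to zero up to floating-point rounding.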
The residuals can be plotted, and reviewing them shows whether the “best fit” equation is truly a good model for the data set. There are several things to consider when reviewing the residuals. The first is whether the residuals are normally distributed; a valid “best fit” should produce normally distributed residuals. Their mean should be zero. There should be approximately the same number of points above and below zero, so the distribution is not skewed. And the data should show a central tendency, meaning appropriate kurtosis: most residuals lie close to zero and few lie far from it. When plotted in a histogram, the residuals should form a bell-shaped curve. When plotted on a normal probability plot, the residuals should fall on or very near the straight line.
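The normality checks can also be expressed as numbers. The sketch below is an illustration only: it computes skewness, excess kurtosis, and how closely the ordered residuals follow a straight line on a normal probability plot (the correlation of the plot's points). The residual lists are made up, and the plotting positions (i - 0.5)/n are an assumption; Minitab and Excel may use a slightly different formula.

```python
from statistics import NormalDist, mean, pstdev

def skewness(res):
    """Moment skewness; near 0 when residuals are symmetric about their mean."""
    m, s = mean(res), pstdev(res)
    return sum((r - m) ** 3 for r in res) / (len(res) * s ** 3)

def excess_kurtosis(res):
    """Moment kurtosis minus 3; near 0 for normally distributed residuals."""
    m, s = mean(res), pstdev(res)
    return sum((r - m) ** 4 for r in res) / (len(res) * s ** 4) - 3

def normal_plot_points(res):
    """(theoretical normal quantile, ordered residual) pairs for a normal plot."""
    n = len(res)
    z = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(z, sorted(res)))

def pearson_r(pairs):
    """Correlation of the plot points; close to 1 means close to a straight line."""
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    num = sum((a - mx) * (b - my) for a, b in pairs)
    den = (sum((a - mx) ** 2 for a in xs) * sum((b - my) ** 2 for b in ys)) ** 0.5
    return num / den

# Hypothetical residual sets, both with mean zero.
symmetric = [-1.2, -0.8, -0.5, -0.2, 0.0, 0.2, 0.5, 0.8, 1.2]
skewed = [-0.5, -0.4, -0.4, -0.3, -0.3, -0.2, 0.1, 0.6, 1.4]

for label, res in (("symmetric", symmetric), ("skewed", skewed)):
    print(
        label,
        "skew:", round(skewness(res), 2),
        "excess kurtosis:", round(excess_kurtosis(res), 2),
        "normal-plot r:", round(pearson_r(normal_plot_points(res)), 3),
    )
```

The skewed set has a long right tail, so its skewness is clearly positive and its points bend away from the normal-plot line.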
In addition, the residuals should not depend on the time order in which the process produced the data. Neither their mean nor their spread (the typical absolute size of the residuals) should change over time.
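One simple way to check for time dependence, sketched here under the assumption that the residuals are listed in the order the observations were collected, is to split them into consecutive segments and compare each segment's mean and spread. The helper name `segment_stats` and both data sets are hypothetical.

```python
from statistics import mean, pstdev

def segment_stats(res, k=3):
    """Split time-ordered residuals into k consecutive segments and
    return (mean, standard deviation) for each segment."""
    n = len(res)
    if n < k:
        raise ValueError("need at least one residual per segment")
    bounds = [round(i * n / k) for i in range(k + 1)]
    return [(mean(res[a:b]), pstdev(res[a:b])) for a, b in zip(bounds, bounds[1:])]

# Hypothetical residuals in time order.
drift = [-0.5, -0.4, -0.6, 0.0, 0.1, -0.1, 0.5, 0.4, 0.6]   # mean shifts over time
funnel = [0.1, -0.1, 0.0, -0.4, 0.5, -0.1, 1.0, -1.2, 0.2]  # spread grows over time

for label, res in (("drift", drift), ("funnel", funnel)):
    stats = [(round(m, 2), round(s, 2)) for m, s in segment_stats(res)]
    print(label, stats)
```

In the drift set the segment means climb from about -0.5 to 0.5; in the funnel set the means stay at zero but the standard deviation grows, which is the "absolute value" kind of time dependence.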
Finally, when reviewing residual plots made against the order in which the observations occurred or against the fitted (predicted) value of the response variable, watch for patterns such as curves, trends, or a funnel shape. A strong pattern is an indication that the “best fit” equation is missing something.
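A pattern can be spotted numerically by counting runs of same-signed residuals. This is a swapped-in technique (a Wald-Wolfowitz style runs count), not something the lesson prescribes. The sketch fits a straight line to hypothetical curved data (y = x²), orders the residuals by fitted value, and compares the number of sign runs with the number expected if the signs were in random order. All function names are illustrative.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least-squares straight line; returns (intercept, slope)."""
    xb, yb = mean(x), mean(y)
    slope = sum((a - xb) * (b - yb) for a, b in zip(x, y)) / sum((a - xb) ** 2 for a in x)
    return yb - slope * xb, slope

def count_runs(res):
    """Count runs of consecutive residuals with the same sign (zeros skipped)."""
    signs = [r > 0 for r in res if r != 0]
    return 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)

def expected_runs(res):
    """Expected number of runs if the signs were in random order."""
    n_pos = sum(1 for r in res if r > 0)
    n_neg = sum(1 for r in res if r < 0)
    return 2 * n_pos * n_neg / (n_pos + n_neg) + 1

# Hypothetical curved process, y = x^2, fitted with a straight line.
x = list(range(1, 11))
y = [xi ** 2 for xi in x]
b0, b1 = fit_line(x, y)
fitted = [b0 + b1 * xi for xi in x]
res = [yi - fi for yi, fi in zip(y, fitted)]

# Residuals ordered by fitted value (the "versus fits" view).
ordered = [r for _, r in sorted(zip(fitted, res))]
print("signs:", "".join("+" if r > 0 else "-" for r in ordered))
print("runs:", count_runs(ordered), "expected if random:", round(expected_runs(ordered), 1))
```

With these numbers the signs run ++------++: only 3 runs where about 5.8 would be expected for random scatter. That U shape is what a straight line leaves behind on curved data.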
When the residual analysis indicates a problem with the “best fit,” the usual solution is to switch to a multivariate solution (additional input variables) or a non-linear solution. Both approaches add terms to the “best fit” equation that account for the observed issues. These topics are covered in later lessons.
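As a preview of how an extra term can absorb a residual pattern, here is a minimal sketch that adds a squared term (one kind of curved model) to a straight-line fit. It solves the least-squares normal equations with plain Gaussian elimination on hypothetical data. Names such as `poly_fit` are illustrative, and real statistics packages use more numerically stable methods.

```python
def poly_fit(x, y, degree):
    """Least-squares polynomial fit via the normal equations (X^T X) b = X^T y.

    Returns [b0, b1, ..., b_degree]. Adequate for small, well-scaled examples.
    """
    m = degree + 1
    # Normal-equation matrix A = X^T X and right-hand side v = X^T y.
    A = [[sum(xi ** (i + j) for xi in x) for j in range(m)] for i in range(m)]
    v = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        v[col], v[pivot] = v[pivot], v[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            v[r] -= f * v[col]
    # Back substitution.
    b = [0.0] * m
    for i in reversed(range(m)):
        b[i] = (v[i] - sum(A[i][j] * b[j] for j in range(i + 1, m))) / A[i][i]
    return b

def residuals(x, y, b):
    """Actual minus predicted, for polynomial coefficients b."""
    return [yi - sum(bk * xi ** k for k, bk in enumerate(b)) for xi, yi in zip(x, y)]

def sse(res):
    return sum(r * r for r in res)

# Hypothetical curved data: about 2 + 0.5x + 0.8x^2 with small disturbances.
x = [1, 2, 3, 4, 5, 6, 7, 8]
noise = [0.3, -0.2, 0.1, -0.3, 0.2, 0.0, -0.1, 0.2]
y = [2 + 0.5 * xi + 0.8 * xi ** 2 + e for xi, e in zip(x, noise)]

for degree in (1, 2):
    b = poly_fit(x, y, degree)
    print(f"degree {degree}: coefficients {[round(c, 3) for c in b]}, "
          f"SSE {sse(residuals(x, y, b)):.3f}")
```

The straight line leaves a large SSE (and a U-shaped residual pattern); adding the x² term drops the SSE to roughly the size of the disturbances.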
Hints & tips
- Minitab creates residual plots when you select the Graphs button in the analysis dialog and choose which residual plots to display. I normally select the 3-in-one or 4-in-one views.
- Excel can create a residual plot as part of a regression analysis: the Regression tool in the Analysis ToolPak has a Residual Plots option.
PMI, PMP, CAPM and PMBOK are registered marks of the Project Management Institute, Inc.