About this lesson
Regression models are formulas that allow us to predict the performance of the system being analyzed. Used as a hypothesis test, regression lets us determine whether the formula is able to predict the performance seen in the sample data set. This lesson defines the different types of regression analyses that will be discussed in later lessons and explains how to choose between the regression approaches.
Quick reference
Regression models
Regression creates a mathematical model of the relationship between independent factor(s) and the dependent factor that is the focus of the analysis.
When to use
Whenever correlation is established between one or more independent continuous factors and a dependent continuous factor, a regression model should be created to mathematically describe the relationship.
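As a minimal sketch of this "check correlation, then model" sequence, the snippet below computes the Pearson coefficient with NumPy and fits a simple linear model only when the correlation is strong. The sample data, the `pearson_r` helper, and the 0.7 cutoff are illustrative assumptions, not values taken from the lesson.

```python
import numpy as np

# Hypothetical sample data: x is the independent factor, y the dependent factor.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.0])


def pearson_r(x, y):
    """Return the Pearson correlation coefficient of two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


r = pearson_r(x, y)

# Assumption: 0.7 is an illustrative cutoff for "correlation is established".
THRESHOLD = 0.7

if abs(r) >= THRESHOLD:
    # np.polyfit returns coefficients from the highest power down: [b, a].
    slope, intercept = np.polyfit(x, y, deg=1)
    print(f"r = {r:.3f}; fitted model: y = {intercept:.2f} + {slope:.2f}x")
else:
    print(f"r = {r:.3f}; correlation is too weak to justify a regression model")
```

In practice the decision about whether correlation exists would come from the hypothesis test covered in the correlation lesson rather than from a fixed cutoff.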
Instructions
A regression model is a mathematical formula in the form of “y = F(x).” This formula can take many different forms. When there is only one “x” factor, the regression model is referred to as a simple model. If there are multiple “x” factors, the regression model is referred to as a multiple regression model. If “y” changes at a constant rate as “x” changes, so that the relationship plots as a straight line, it is referred to as a linear model. If the rate of change varies, so that the relationship plots as a curve, it is referred to as a nonlinear model.
The best regression model for a given set of “x” and “y” factors is determined by doing a “best fit” analysis with residuals: for each candidate model, the predicted values are compared with the actual data, and the model with the smallest overall residual is selected. This best fit model is created based on the historical data that was used to test for correlation. Once this model is created, it can be used to predict the performance of the “y” factor based on controlling the “x” factor(s).
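The best-fit comparison described above can be sketched as follows. This example fits a simple linear and a simple quadratic model to the same data, compares their sums of squared residuals, and then predicts a new value. The data and the `sum_squared_residuals` helper are hypothetical, and using the sum of squared residuals as the "overall residual" is an assumption (it is the usual least-squares measure).

```python
import numpy as np

# Hypothetical historical data with a curved trend.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.1, 1.4, 3.2, 5.4, 9.1, 13.4, 19.0])


def sum_squared_residuals(x, y, degree):
    """Fit a polynomial of the given degree; return (coefficients, SSR)."""
    coeffs = np.polyfit(x, y, deg=degree)
    predicted = np.polyval(coeffs, x)
    residuals = y - predicted  # actual minus predicted, one per data point
    return coeffs, float(np.sum(residuals ** 2))


linear_coeffs, linear_ssr = sum_squared_residuals(x, y, 1)
quad_coeffs, quad_ssr = sum_squared_residuals(x, y, 2)

print(f"linear SSR:    {linear_ssr:.3f}")
print(f"quadratic SSR: {quad_ssr:.3f}")

# Use the better-fitting model to predict y for a new x value.
best_coeffs = quad_coeffs if quad_ssr < linear_ssr else linear_coeffs
x_new = 7.0
y_pred = float(np.polyval(best_coeffs, x_new))
print(f"predicted y at x = {x_new}: {y_pred:.2f}")
```

One caution: adding terms never increases the residual sum on the data used for fitting, so a lower value alone does not prove the more complex model is better. The residual analysis lesson covers the fuller check.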
The form of typical regression models is:
Simple linear: $y = a + bx$
Simple quadratic: $y = a + bx + cx^2$
Simple cubic: $y = a + bx + cx^2 + dx^3$
Simple inverse: $y = a + b/x$
Simple exponential: $y = a(b^x)$
Simple logarithmic: $y = a \log(x)$
Multiple linear: $y = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots$
Multiple non-linear: $y = a + b_1 x_1 + b_2 x_2 + c_1 x_1^2 + c_2 x_2^2 + m_{12} x_1 x_2 + \dots$
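To illustrate the multiple non-linear form, here is a minimal sketch that estimates its coefficients with ordinary least squares. Although the model is non-linear in the “x” factors, it is linear in the coefficients, so each term can be treated as a column of a design matrix. The data is synthetic and noise-free, and the "true" coefficient values are made up for the example.

```python
import numpy as np

# Synthetic, noise-free data generated from hypothetical coefficients
# a=2.0, b1=1.5, b2=-0.5, c1=0.3, c2=0.1, m12=0.8.
rng = np.random.default_rng(0)
x1 = rng.uniform(0.0, 5.0, 50)
x2 = rng.uniform(0.0, 5.0, 50)
y = 2.0 + 1.5 * x1 - 0.5 * x2 + 0.3 * x1**2 + 0.1 * x2**2 + 0.8 * x1 * x2

# One column per term: constant, x1, x2, x1^2, x2^2, and the x1*x2 interaction.
X = np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])

# Least-squares solution for [a, b1, b2, c1, c2, m12].
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)

for name, value in zip(["a", "b1", "b2", "c1", "c2", "m12"], coeffs):
    print(f"{name} = {value:.3f}")
```

Dropping the last three columns of `X` gives the multiple linear model from the same list.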
Hints & tips
- A regression equation can be calculated for any two continuous variables, but it is only meaningful when correlation has been established. Always check correlation first.
- If a higher order term in a non-linear model has a coefficient (constant multiplier) that is near zero, remove the term and refit the regression model.
- A multiple nonlinear model may have exponential, logarithmic, cubic or inverse terms in addition to quadratic and interaction terms shown in the example.
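The tip about near-zero higher order terms can be sketched as follows: fit a cubic model, inspect the coefficient on the cubed term, and refit as a quadratic if that coefficient is negligible. The data is hypothetical and noise-free, and the tolerance is an illustrative assumption. With real, noisy data, "near zero" would normally be judged against the scale of the data and the term's statistical significance.

```python
import numpy as np

# Hypothetical data generated from an exactly quadratic relationship.
x = np.linspace(0.0, 10.0, 21)
y = 3.0 + 2.0 * x + 0.5 * x**2

# Fit a cubic; np.polyfit orders coefficients from highest power: [d, c, b, a].
cubic = np.polyfit(x, y, deg=3)
d = cubic[0]

# Assumption: an illustrative tolerance for treating the cubed term as zero.
TOLERANCE = 1e-6

if abs(d) < TOLERANCE:
    # The cubed term contributes nothing, so drop it and refit.
    model = np.polyfit(x, y, deg=2)
else:
    model = cubic

print("coefficients (highest power first):", np.round(model, 4))
```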
Transcript
- 00:04 Hi, I'm Ray Sheen.
- 00:06 Now we just looked at correlation and now we're going to move on to regression,
- 00:10 which predicts the mathematical relationship of correlated variables.
- 00:14 >> So once again we look at the hypothesis test decision tree.
- 00:19 If correlation exists, we determine the nature with the regression analysis.
- 00:24 It could be a regression with just two variables, or
- 00:27 it could have multiple variables.
- 00:30 Think of regression as a process model for the correlation relationship.
- 00:35 The correlation coefficient, the Pearson value,
- 00:38 tells us the strength of that relationship but
- 00:40 it doesn't tell us anything about the actual model of the relationship.
- 00:45 There are some modifiers that go along with the term regression that tell us
- 00:48 something about the nature of the model.
- 00:51 If there are only two variables involved, an X and a Y, then it is a simple regression.
- 00:56 When there is more than one X variable involved,
- 00:59 then it is a multiple regression.
- 01:02 When the relationship between the X and the Y is a straight line,
- 01:06 it is a linear relationship.
- 01:08 And when the relationship is not a straight line, meaning it's a curve up or
- 01:12 down, then it is a non-linear relationship.
- 01:16 So which relationship you use will depend upon the type of
- 01:19 model that best fits the data.
- 01:21 This is done by doing a residual analysis.
- 01:24 Which means we take the actual historical data and
- 01:27 compare that to the predicted data based upon the model.
- 01:30 The difference for each data point is called the residual.
- 01:34 The condition which results in the lowest total residual value is our
- 01:38 best fit model.
- 01:39 Generically we call this the Y equals F of X function.
- 01:43 And once we have the model, we can now predict a Y value based upon the X value.
- 01:49 If you're a little confused about residuals,
- 01:51 don't worry, we'll talk about that more in another lesson.
- 01:54 Let's look at the form of the different regression models.
- 01:59 A simple linear regression is Y = a + bx.
- 02:03 That is one x factor and a straight line.
- 02:07 A quadratic regression model is one factor with an X squared term.
- 02:12 The formula for this model is Y equals a + b times x + c times x squared.
- 02:18 And following that same pattern, the cubic regression model has an x cubed term.
- 02:24 The formula is Y = a + b times x + c times x squared + d times x cubed.
- 02:31 Now a simple non-linear model doesn't have to just have a squared or
- 02:36 cubed term, it could have an exponential term.
- 02:39 So in this model, the formula is Y equals a times b raised to the x power.
- 02:46 By the same token, the non-linear relationship could be best modeled
- 02:51 with a logarithmic function, or in this case Y equals a times the log of x.
- 02:56 So what about multiple regression?
- 02:59 Well, first let's look at a multiple linear regression.
- 03:03 In this case, there are many x terms, but none of them are in a non-linear form.
- 03:08 The equation is Y equals a plus b1 times x1, plus b2 times x2,
- 03:13 plus b3 times x3, and so on, depending on how many terms you have.
- 03:20 And using the same pattern, the multiple nonlinear regression would
- 03:25 have non-linear terms for x1, x2, x3, and so on.
- 03:29 In fact, it may even include some terms that are products, such as x1 times x2,
- 03:35 or x2 times x3.
- 03:37 One example of a multiple non-linear regression model is Y
- 03:42 equals a plus b1 times x1 plus b2 times x2,
- 03:46 then plus c1 times x1 squared plus c2 times x2 squared,
- 03:51 and then plus m12 times x1 times x2.
- 03:55 And again you could include multiple terms beyond that.
- 03:58 Ultimately, the hypothesis test will tell us if there's a reliable formula
- 04:02 that can accurately predict the relationship between the variables.
- 04:06 If the hypothesis test tells us to reject the null hypothesis,
- 04:10 then the regression analysis explains that relationship.
- 04:15 >> The regression formula is very useful if you want to be able to predict how
- 04:19 changes in one or more variables will impact the results in another.