Retired course
This course has been retired and is no longer supported.
About this lesson
Exercise files
Download this lesson’s related exercise files.
Multiple Linear Regression.xlsx (11.4 KB)
Multiple Linear Regression - Solution.docx (230.8 KB)
Quick reference
Multiple Linear Regression
Multiple linear regression analysis creates an equation in which multiple independent X variables together influence a single Y response variable. The equation is fitted to the data set and models the conditions represented in that data.
When to use
Use multiple linear regression when several independent variables correlate with the system response. The resulting equation can be used to predict process performance and to identify which factors have the primary impact on it.
Instructions
Multiple linear regression is the appropriate technique to use when the data set has multiple continuous independent input variables and a continuous response variable. The technique determines which variables are statistically significant and creates an equation that shows the relationship of the variables to the response. To improve the accuracy of the analysis, there should be at least ten data points for each independent variable. The equation takes on the form:
- Y = β₀ + β₁X₁ + β₂X₂ + β₃X₃ + …
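Where the first beta is the constant term and each remaining beta coefficient gives the expected change in Y for a one-unit change in its variable, with the other variables held constant. The coefficients show the relative importance of each variable only when the variables are measured on comparable scales.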
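As a rough illustration of how such an equation is fitted, the sketch below uses ordinary least squares in NumPy on made-up data. The data set, the three inputs, and the "true" coefficients are assumptions chosen for the example; they do not come from the lesson's exercise files, and this is not the Minitab procedure.

```python
import numpy as np

# Hypothetical data: 40 runs of a process with three continuous inputs.
rng = np.random.default_rng(seed=0)
n_points = 40
X = rng.uniform(low=0.0, high=10.0, size=(n_points, 3))

# Assumed "true" coefficients, used only to generate the illustrative response.
true_betas = np.array([5.0, 2.0, -1.5, 0.5])  # constant, then one per input
noise = rng.normal(loc=0.0, scale=0.5, size=n_points)
y = true_betas[0] + X @ true_betas[1:] + noise

# Add a column of ones so the first fitted coefficient is the constant term.
design = np.column_stack([np.ones(n_points), X])

# Ordinary least squares: pick the betas that minimize the squared residuals.
betas, *_ = np.linalg.lstsq(design, y, rcond=None)

print("Fitted coefficients (constant first):", np.round(betas, 2))
```

With 40 points for three inputs, the example also satisfies the ten-points-per-variable guideline, so the fitted coefficients should land close to the assumed ones.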
Multiple linear regression can be used to predict process performance based upon the values of the inputs. Input levels for ideal performance can be defined and tolerance levels that ensure acceptable performance can be determined using the regression equation. The equation will also be helpful for setting process controls.
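One way to picture this use of the equation is sketched below: predict the response from chosen inputs, then solve for the setting of a single control input that hits a target while the other inputs are locked. The coefficients, the target of 12, and the acceptable band of 11 to 13 are hypothetical values, and `predict` and `solve_for_control` are illustrative helper names, not part of any tool.

```python
# Hypothetical fitted equation: Y = 5.0 + 2.0*X1 - 1.5*X2 + 0.5*X3
BETAS = [5.0, 2.0, -1.5, 0.5]  # constant, then one coefficient per input


def predict(x_values, betas=BETAS):
    """Return the predicted response for one set of input values."""
    return betas[0] + sum(b * x for b, x in zip(betas[1:], x_values))


def solve_for_control(target_y, control_index, fixed_values, betas=BETAS):
    """Return the setting of one control input that yields target_y.

    fixed_values holds the locked-in settings of all inputs; the entry at
    control_index is ignored because it is the value being solved for.
    """
    slope = betas[control_index + 1]
    if slope == 0:
        raise ValueError("The control input has no effect on the response.")
    others = betas[0] + sum(
        b * x
        for i, (b, x) in enumerate(zip(betas[1:], fixed_values))
        if i != control_index
    )
    return (target_y - others) / slope


# Lock X2 and X3 at standard settings and use X1 as the control input.
fixed = [0.0, 4.0, 6.0]
x1_target = solve_for_control(target_y=12.0, control_index=0, fixed_values=fixed)

# Tolerance on X1 that keeps the predicted response between 11 and 13.
x1_low = solve_for_control(11.0, 0, fixed)
x1_high = solve_for_control(13.0, 0, fixed)

print("X1 setting:", x1_target, "tolerance:", (x1_low, x1_high))
```

If the control input had a negative coefficient, the low and high settings would swap, so a real implementation would sort them before reporting a tolerance band.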
This lesson performs the analysis in Minitab rather than Excel (Excel's LINEST function and the Regression tool in its Analysis ToolPak add-in can fit multiple predictors, but they are not covered here). In Minitab, use the “Fit Regression Model” option in the Regression menu. This displays an input panel where the response variable and input variables can be selected. If the analysis shows that a variable is not statistically significant, check the residual plots to see whether the residuals are normally distributed. If they are not, remove the variable that is not statistically significant and rerun the analysis; the normality of the residuals should improve.
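The remove-and-rerun loop described above can be sketched outside Minitab as follows. This is a simplified illustration on made-up data, under several assumptions: the 0.05 cutoff is a common convention rather than something stated in the lesson, the p-values use a normal approximation to the t distribution (reasonable only for larger samples), and the skewness figure is a crude numeric stand-in for inspecting residual plots.

```python
import math
import numpy as np

# Hypothetical data: X1 and X2 drive the response; X3 has no real effect.
rng = np.random.default_rng(seed=1)
n_points = 60
X = rng.normal(size=(n_points, 3))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n_points)


def fit_with_t_stats(X, y):
    """Fit by ordinary least squares; return betas, t-statistics, residuals."""
    design = np.column_stack([np.ones(len(y)), X])
    betas, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ betas
    dof = len(y) - design.shape[1]          # data points minus fitted terms
    sigma2 = residuals @ residuals / dof    # estimate of residual variance
    cov = sigma2 * np.linalg.inv(design.T @ design)
    t_stats = betas / np.sqrt(np.diag(cov))
    return betas, t_stats, residuals


def approx_p_value(t):
    """Two-sided p-value from a normal approximation to the t distribution."""
    return math.erfc(abs(t) / math.sqrt(2.0))


def skewness(residuals):
    """Sample skewness; values near zero suggest symmetric residuals."""
    z = (residuals - residuals.mean()) / residuals.std()
    return float(np.mean(z ** 3))


betas, t_stats, residuals = fit_with_t_stats(X, y)
p_values = [approx_p_value(t) for t in t_stats]

# Keep predictors whose approximate p-value is below the assumed 0.05 cutoff,
# then rerun the analysis with only those columns.
keep = [i for i, p in enumerate(p_values[1:]) if p < 0.05]
betas_reduced, t_reduced, residuals_reduced = fit_with_t_stats(X[:, keep], y)

print("Kept predictor columns:", keep)
print("Residual skewness before and after:",
      round(skewness(residuals), 3), round(skewness(residuals_reduced), 3))
```

A statistics package reports exact t-based p-values and proper normality plots, so this sketch is only meant to show the order of the steps.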
Hints & tips
- Too many variables increase uncertainty in the analysis. There should be at least ten data points for each variable. Drop variables that are not statistically significant to improve the accuracy of the equation.
- The analysis assumes a linear (straight line) effect. If the residuals indicate a bad fit, you will need to add higher-order terms and run a non-linear analysis. This is discussed in another lesson.
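Both tips can be illustrated with a short sketch: a helper for the ten-points-per-variable guideline, and a comparison of a straight-line fit against one that adds a squared term. The data is made up with a deliberately curved effect, and `minimum_data_points` and `residual_sum_of_squares` are hypothetical helper names for this example only.

```python
import numpy as np


def minimum_data_points(n_variables, points_per_variable=10):
    """Rule of thumb from the tips above: ten data points per variable."""
    return n_variables * points_per_variable


def residual_sum_of_squares(design, y):
    """Fit by ordinary least squares and return the sum of squared residuals."""
    betas, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ betas
    return float(residuals @ residuals)


# Hypothetical data in which X1 has a curved (quadratic) effect on the response.
rng = np.random.default_rng(seed=2)
n_points = 40
x1 = rng.uniform(0.0, 10.0, size=n_points)
x2 = rng.uniform(0.0, 10.0, size=n_points)
y = 1.0 + 0.5 * x1 ** 2 + 2.0 * x2 + rng.normal(scale=0.5, size=n_points)

print("Enough data for two variables:",
      n_points >= minimum_data_points(n_variables=2))

ones = np.ones(n_points)
linear_design = np.column_stack([ones, x1, x2])           # straight-line terms
curved_design = np.column_stack([ones, x1, x2, x1 ** 2])  # adds a squared term

rss_linear = residual_sum_of_squares(linear_design, y)
rss_curved = residual_sum_of_squares(curved_design, y)
print("Residual sum of squares, linear vs. with squared term:",
      round(rss_linear, 1), round(rss_curved, 1))
```

A large drop in the residual sum of squares after adding the squared term is one sign that the straight-line assumption did not hold for that variable.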
Transcript
- 00:04 Hello, I'm Ray Sheen.
- 00:06 Sometimes the dependent or
- 00:07 response variable in the process depends upon more than one factor.
- 00:11 When that happens, you need to do a multi-linear regression analysis.
- 00:17 Once again, I'll start with our decision tree for hypothesis testing.
- 00:20 When we have continuous variables for the process response, and
- 00:24 the process independent variables, we turn to regression.
- 00:27 And when we want to analyze multiple variables at the same time,
- 00:30 we use the multiple regression technique.
- 00:33 Let's take a few minutes to explain what we mean by multiple regression analysis.
- 00:37 Recall that regression analysis determines the relationship between process variables.
- 00:43 And it is no surprise that multiple regression considers multiple
- 00:46 independent variables instead of just one.
- 00:49 The analysis determines the impact that each of the independent variables has on
- 00:53 the dependent variable.
- 00:55 It will determine the relative significance of each of the factors to
- 00:59 each other, in addition to the dependent factor.
- 01:02 The form of the equation is the dependent variable is equal to a constant,
- 01:06 which is referred to, in this equation, with beta zero.
- 01:10 And then each of the terms with their appropriate scaling factor.
- 01:14 So we see beta one times variable one, beta two times variable two and so on.
- 01:21 Multiple regression analysis is particularly useful for
- 01:24 predicting process performance.
- 01:27 The multiple regression analysis will result in an equation that relates
- 01:31 all of the independent variables to the dependent or response variable.
- 01:35 This equation is incredibly helpful when you're designing a solution for
- 01:39 a problem in a Lean Six Sigma project.
- 01:41 The equation predicts the dependent variable performance
- 01:44 based upon whatever values have been selected for the independent variables.
- 01:48 So when designing the solution, determine the ideal process performance.
- 01:52 Then determine what independent variable settings are needed
- 01:55 to achieve that performance.
- 01:57 Based upon the scaling constant for each of the factors,
- 02:00 you can also decide which factors will be the primary control for the process.
- 02:05 I prefer to use one easily controlled independent factor to control the overall
- 02:09 process.
- 02:10 And if possible, set the other factors in zones
- 02:13 that are very easy to lock in to a standard setting.
- 02:16 You can't always do that,
- 02:17 but it does make the process control much easier when you can.
- 02:22 So let's look at how we conduct a multiple regression analysis.
- 02:26 Excel does not have a function for
- 02:27 conducting a multiple linear regression analysis.
- 02:30 So I'll just focus on the Minitab approach.
- 02:33 In Minitab, go to the Stat pull-down menu, select Regression.
- 02:37 Select Regression again, and
- 02:39 then select Fit Regression model, just like is shown here.
- 02:43 That will bring up this panel.
- 02:45 Place your cursor in the response window to activate the list of data columns
- 02:49 in the window on the left.
- 02:51 Then select the dependent variable, often referred to as the y factor, and
- 02:55 click on the select button.
- 02:57 That column name should now move to the response window.
- 03:00 Now, place your cursor in the continuous predictors and
- 03:04 then select the appropriate columns.
- 03:06 You can also use categorical or discrete factors
- 03:09 if you have them. However, if using this type of factor, I recommend that you
- 03:14 always use factors that are bimodal, such as a true/false,
- 03:18 and set one of those criteria to one and the other to zero.
- 03:22 And one more point, you can get the residual plots by selecting the graphs
- 03:27 button and then choosing residual four in one.
- 03:31 Let's finish off this topic with a few warnings about some pitfalls
- 03:34 when doing multiple regression analysis.
- 03:36 This analysis still assumes linear effects,
- 03:40 which means straight line effects for each of the independent variables.
- 03:43 We'll look at interactive effects when we talk about non-linear
- 03:46 regression in another class.
- 03:49 Adding lots of independent variables can increase uncertainty.
- 03:52 If you find that some factor has virtually no effect,
- 03:55 I would remove it from the analysis just to simplify things.
- 03:59 The no effect will be indicated by a very small beta value for that factor.
- 04:04 Too many factors creates too many potential interactions and it becomes
- 04:08 difficult to statistically validate the effect of each independent variable.
- 04:13 A good rule of thumb is that your dataset size should be at least 10 times
- 04:17 the number of independent factors being analyzed.
- 04:20 So if you want to analyze four factors at once,
- 04:23 the dataset needs to have a minimum of 40 points.
- 04:26 Also, when there are many independent factors in the analysis,
- 04:29 the regression formula becomes much more sensitive to outliers.
- 04:34 In many cases,
- 04:35 the multiple linear regression analysis is just what you need to understand
- 04:40 the handful of independent variables that are affecting the overall process output.
- 04:46 That formula that's created is also very helpful when designing the solution
- 04:50 for your problem.