About this lesson
Often, several factors influence the response variable in a problem. Multiple regression determines the relationship between the response variable and multiple control factors. As with simple linear regression, the result is an equation that supports both analysis and prediction of the process and problem.
Quick reference
Multiple Linear Regression
Multiple linear regression analysis is the creation of an equation with multiple independent X variables that all influence a Y response variable. This equation is based upon an existing data set and models the conditions represented in the data.
When to use
When there are multiple independent variables that correlate with the system response, a multiple linear regression should be done. This can be used to predict process performance and identify which factors have the primary impact on process performance.
Instructions
Multiple linear regression is the appropriate technique to use when the data set has multiple continuous independent input variables and a continuous response variable. The technique determines which variables are statistically significant and creates an equation that shows the relationship of the variables to the response. To improve the accuracy of the analysis, there should be at least ten data points for each independent variable. The equation takes on the form:
Y = a + b1X1 + b2X2 + b3X3 + …
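Where “a” is the constant and each “b” coefficient is the change in Y for a one-unit change in its X variable. The absolute value of the “b” coefficients shows the relative importance of each variable, provided the X variables are measured on comparable scales.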
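As a rough illustration of how such an equation is obtained from a data set, the sketch below fits Y = a + b1X1 + b2X2 by ordinary least squares using NumPy. The data are synthetic and the "true" values (5, 2, -3) are assumptions chosen for the example; they are not from the lesson's exercise files.

```python
import numpy as np

# Synthetic data (assumed for illustration): Y = 5 + 2*X1 - 3*X2 + noise
rng = np.random.default_rng(0)
n = 40
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 10, n)
y = 5.0 + 2.0 * x1 - 3.0 * x2 + rng.normal(0, 0.5, n)

# Design matrix: a column of ones for the constant a, then one column per X
X = np.column_stack([np.ones(n), x1, x2])

# Ordinary least squares: coefficients that minimize the sum of squared residuals
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
a, b1, b2 = coef

print(f"Y = {a:.2f} + {b1:.2f}*X1 + {b2:.2f}*X2")
```

The fitted coefficients should land close to the values used to generate the data, which is a quick sanity check that the design matrix was built correctly.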
Multiple linear regression can be used to predict process performance based on the values of the inputs. Input levels for ideal performance can be defined and tolerance levels that ensure acceptable performance can be determined using the regression equation. The equation will also be helpful for setting process controls.
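The sketch below shows one way the equation might be used for prediction and for setting a tolerance window. The coefficients are hypothetical, and the helper names `predict` and `x1_window` are made up for this example. One input (X2) is held at a fixed setting and the equation is solved for the range of the other input (X1) that keeps Y within acceptable limits.

```python
# Hypothetical fitted equation: Y = A + B1*X1 + B2*X2 (coefficients are made up)
A, B1, B2 = 5.0, 2.0, -3.0

def predict(x1, x2):
    """Predicted response for the chosen input values."""
    return A + B1 * x1 + B2 * x2

def x1_window(y_low, y_high, x2):
    """Range of X1 that keeps Y within [y_low, y_high] when X2 is held fixed.

    Assumes B1 is non-zero, i.e. X1 actually affects the response.
    """
    lo = (y_low - A - B2 * x2) / B1
    hi = (y_high - A - B2 * x2) / B1
    return (min(lo, hi), max(lo, hi))

print(predict(4.0, 1.0))           # 5 + 2*4 - 3*1 = 10.0
print(x1_window(9.0, 11.0, 1.0))   # (3.5, 4.5)
```

Taking the min and max of the two bounds keeps the window correctly ordered even when the coefficient of the controlled variable is negative.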
This lesson does not use Excel for multiple linear regression; the analysis is done in Minitab using the “Fit Regression Model” option in the Regression menu. This displays an input panel where the response variable and input variables can be selected. If the analysis shows that a variable is not statistically significant, check the residual plots to see whether the residuals are normally distributed. If they are not, remove the variable that is not statistically significant and rerun the analysis. The normality of the residuals should improve.
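For readers without Minitab, the significance check can be approximated by hand. The sketch below is a NumPy illustration, not Minitab's output: it fits three variables to synthetic data in which X3 has no real effect, then computes a t-statistic for each coefficient (coefficient divided by its standard error). With roughly 36 degrees of freedom, an absolute t-value above about 2 corresponds to the usual 5% significance level. All data and names are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 10, n)
x3 = rng.uniform(0, 10, n)   # has no real effect on y in this synthetic data
y = 5.0 + 2.0 * x1 - 3.0 * x2 + rng.normal(0, 0.5, n)

X = np.column_stack([np.ones(n), x1, x2, x3])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

residuals = y - X @ coef
dof = n - X.shape[1]                     # data points minus fitted coefficients
sigma2 = residuals @ residuals / dof     # estimate of the residual variance
cov = sigma2 * np.linalg.inv(X.T @ X)    # covariance matrix of the coefficients
std_err = np.sqrt(np.diag(cov))
t_stats = coef / std_err

for name, b, t in zip(["const", "X1", "X2", "X3"], coef, t_stats):
    print(f"{name:>5}: coef={b:7.3f}  t={t:7.2f}")
```

A variable whose t-statistic stays small is a candidate for removal; after dropping it, refit the model and recheck the residuals.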
Hints & tips
- Too many variables increase uncertainty in the analysis. There should be at least ten data points for each variable (e.g., with three variables, have at least 30 data points).
- Drop variables that are not statistically significant to improve the accuracy of the equation.
- The analysis assumes a linear (straight line) effect. If the residuals indicate a bad fit, you will need to add higher-order terms and create a non-linear analysis. This is discussed in another lesson.
- Always check the residual plots to ensure the residuals are normally distributed, have equal variance, and are independent.
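The ten-points-per-variable rule of thumb above is simple arithmetic, and can be written as a quick check before running the analysis. The function names are hypothetical.

```python
def min_data_points(num_variables, points_per_variable=10):
    """Minimum data set size under the ten-points-per-variable rule of thumb."""
    return num_variables * points_per_variable

def enough_data(num_points, num_variables):
    """True if the data set meets the rule of thumb for this many variables."""
    return num_points >= min_data_points(num_variables)

print(min_data_points(3))   # 30
print(enough_data(25, 3))   # False
```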
- 00:04 Hi, I'm Ray Sheen.
- 00:05 Sometimes the dependent or
- 00:07 response variable in the analysis depends upon more than one factor.
- 00:12 When that happens, you may need to do a multiple linear regression analysis.
- 00:18 Once again, I'll start with our decision tree for hypothesis testing.
- 00:22 When we have continuous variables for the process response and
- 00:26 the process independent variables, we turn to the regression analysis, and when we
- 00:30 have multiple variables at the same time, we use the multiple regression technique.
- 00:36 Let's take a few minutes to explain what we mean by multiple regression analysis.
- 00:42 Recall that regression analysis determines the relationship between process
- 00:45 variables.
- 00:47 And it's no surprise that multiple regression considers multiple independent
- 00:51 variables instead of just one.
- 00:53 The analysis determines the impact of each of these independent
- 00:57 variables upon the dependent variable.
- 01:00 If you tried a simple linear analysis and it wasn't a good fit,
- 01:03 you can consider adding some additional terms.
- 01:07 Now, regardless of the number of terms, the format for
- 01:09 the hypothesis tests is still the same as for a simple linear regression.
- 01:13 The null hypothesis is that there is no relationship, and
- 01:17 the alternative hypothesis is that there is a relationship.
- 01:21 The analysis will determine the relative significance of each of the factors to
- 01:25 each other in addition to the dependent factor.
- 01:28 It will show up with coefficients of these factors.
- 01:31 The form of the multiple linear equation is a dependent variable y is equal to
- 01:36 a constant indicated by a in this equation plus a term with each of the variables,
- 01:41 which has a coefficient associated with it.
- 01:44 So it's beta 1 times x1 + beta 2 times x2 + beta 3 times x3, and so on.
- 01:53 Multiple regression analysis is particularly useful for
- 01:56 predicting process performance.
- 01:59 The multiple regression analysis will result in an equation that relates all
- 02:03 the independent variables to the dependent variable.
- 02:07 This analysis provides the terms and the coefficients.
- 02:12 This equation is incredibly helpful when you're designing the solution for
- 02:15 a problem in a Lean Six Sigma Project.
- 02:18 The equation predicts the dependent variable performance based upon whatever
- 02:22 values you've selected for the independent variables.
- 02:25 So, as you're designing the solution, you may want to design one of your independent
- 02:30 variables to be in a particular zone that's well controlled.
- 02:34 This will then help you determine how with the other variables you can achieve
- 02:38 the desired performance.
- 02:40 Based upon the scaling constants for each of the factors,
- 02:44 you can also decide which factors will have the primary control for the process.
- 02:49 I prefer to use one easily controlled independent factor to control
- 02:53 an overall process and, if possible, to set the other factors in
- 02:58 zones that are very easy to lock into a standard setting.
- 03:02 You can't always do that, but if you can, it makes process control much easier.
- 03:07 So let's look at how we conduct a multiple regression analysis.
- 03:11 Excel does not have a function for
- 03:13 conducting multiple linear regression analysis.
- 03:16 So we will have to rely on Minitab.
- 03:19 In Minitab, go to the Stat pull-down menu, select Regression, select
- 03:24 Regression again, and then select Fit Regression Model, just as shown here.
- 03:30 That will bring up this panel.
- 03:32 Place your cursor in the response window to activate the list of
- 03:36 data values in the window on the left, then select the dependent variable,
- 03:41 often referred to as the y factor, and click on the selection button.
- 03:46 The column should move to the response window.
- 03:48 Now, place your cursor in the continuous predictors window and
- 03:52 then select the appropriate column.
- 03:54 You can also use categorical or discrete factors if you have them.
- 03:58 However, if you're using these factors, I always recommend you use factors that
- 04:02 are binary, such as true/false.
- 04:04 Set one of those to a value of 1 and the other to a value of 0.
- 04:08 And one more point, you get the residual plots by selecting the Graphs button and
- 04:13 then choosing the Four in one residual plots option.
- 04:17 Let's finish off this topic with a few warnings and
- 04:20 some pitfalls when doing multiple regression analysis.
- 04:24 This analysis still assumes linear effect, which means straight line effects for
- 04:28 each of the independent variables.
- 04:31 We'll look at interaction effects when we look at nonlinear regression in
- 04:34 another lesson.
- 04:36 Adding lots of independent variables can increase uncertainty.
- 04:40 If you find that some factor has virtually no effect,
- 04:43 then I recommend removing it from your analysis to simplify things.
- 04:48 Check your residual plots to make sure the residuals are normally distributed.
- 04:53 This is another indication that you have a good solution.
- 04:56 And finally, too many factors create too many potential interactions,
- 05:00 and it becomes difficult to statistically validate the effect of each independent
- 05:04 variable.
- 05:05 A good rule of thumb is that your data set size should be at least 10 times
- 05:10 the number of independent factors being analyzed.
- 05:14 So if you want to analyze four factors at once,
- 05:17 the data set needs to have a minimum of 40 points.
- 05:20 Also, when there are many independent factors in the analysis,
- 05:24 the regression formula becomes much more sensitive to outliers.
- 05:29 In many cases,
- 05:30 the multiple linear regression analysis is just what is needed to understand
- 05:34 how the handful of independent variables affects the overall process output.
- 05:40 The formula created is extremely helpful when
- 05:44 determining the optimal solution for your problem.
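The transcript's advice on two-level factors (set one level to 1 and the other to 0) can be sketched as a small helper. The function name `encode_binary` and the example column are hypothetical. The helper raises an error if the factor has more than two levels, since this simple 1/0 coding assumes a two-level factor.

```python
def encode_binary(values, one_label):
    """Map a two-level factor to 1/0: one_label becomes 1, the other level becomes 0."""
    levels = set(values)
    if len(levels) > 2:
        raise ValueError(f"expected at most two levels, got {sorted(map(str, levels))}")
    return [1 if v == one_label else 0 for v in values]

# Hypothetical column: was the machine pre-heated before each run?
preheated = [True, False, False, True, True]
print(encode_binary(preheated, one_label=True))   # [1, 0, 0, 1, 1]
```

The coded column can then be used as a predictor alongside the continuous variables.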