Retired course
This course has been retired and is no longer supported.
About this lesson
Exercise files
Download this lesson’s related exercise files.
Non-Linear Regression.xlsx (10.4 KB)
Non-Linear Regression - Solution.docx (335.6 KB)
Quick reference
Non-Linear Regression
Non-linear regression analysis creates a regression equation that includes higher order terms or multi-variable interaction terms. These equations often represent real-world effects better than linear regression models.
When to use
Start your regression analysis with either a simple linear regression or a multiple linear regression, depending on the number of independent variables. If the residual analysis is unacceptable, switch to a non-linear analysis and iterate until the residuals are acceptable.
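As a minimal sketch of this workflow outside Minitab, the Python below fits a straight line to data that actually follows a squared relationship and then lists the residuals. The data set and the helper name `fit_line` are hypothetical and chosen only for illustration.

```python
def fit_line(x, y):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data with a squared effect: y = 2 + 0.5 * x**2
x = list(range(10))
y = [2 + 0.5 * xi ** 2 for xi in x]

intercept, slope = fit_line(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Residuals that are positive at both ends and negative in the middle
# form a curve, which suggests a higher order term is missing.
for xi, r in zip(x, residuals):
    print(f"x={xi}  residual={r:+.2f}")
```

A curved pattern in the residuals, rather than random scatter around zero, is the signal to try a non-linear model.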
Instructions
Linear regression assumes that the response variable changes at a constant rate as the independent variable changes. This can be represented by a straight-line relationship. However, in many cases the real-world effect is not a constant rate of change; it varies. This is represented by a curved-line plot of the relationship between the variables. Lean Six Sigma teams encounter many real-world effects with a varying rate. These include system degradation due to wear and tear, which often accelerates near the end of life; pressure and temperature effects on materials, which vary especially when the material is close to changing state; and electronics, which often saturate at the low or high end of their performance spectrum, causing the performance line to flatten.
When non-linear effects are present, they can usually be modelled with either a higher order term or a mixed variable term, meaning a term that tracks the interaction effect of two otherwise independent variables.
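To illustrate a mixed variable term, the sketch below estimates an interaction coefficient from a hypothetical two-factor experiment. It assumes a balanced design with both factors coded as -1 and +1, so the columns are orthogonal and each coefficient can be computed on its own. General data would need a full least-squares solver. The factor names, the data, and the helper `effect` are assumptions made for this example.

```python
# Hypothetical balanced two-level design: each factor is coded -1 or +1.
runs = [(-1, -1), (1, -1), (-1, 1), (1, 1)]

# Hypothetical response generated from y = 10 + 2*A + 3*B + 1.5*A*B
response = [10 + 2 * a + 3 * b + 1.5 * a * b for a, b in runs]


def effect(column, y):
    """Least-squares coefficient for one column of an orthogonal design."""
    return sum(c * yi for c, yi in zip(column, y)) / sum(c * c for c in column)


col_a = [a for a, _ in runs]
col_b = [b for _, b in runs]
col_ab = [a * b for a, b in runs]  # interaction column: product of both factors
ones = [1] * len(runs)

coefficients = {
    "constant": effect(ones, response),
    "A": effect(col_a, response),
    "B": effect(col_b, response),
    "A*B": effect(col_ab, response),
}
print(coefficients)
```

The interaction column is simply the product of the two factor columns, which matches the description of a constant multiplied by both independent variables.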
Excel will not create a non-linear regression, but Minitab can do it in several ways.
- If the nature of the non-linear relationship is already known, you can select “Non-Linear” in the Regression menu and enter the relationship directly.
- If there is only one variable, you can select the “Fitted Line Plot” option in the Regression menu and then select the level of the higher order term you want to include.
- If there are multiple terms, you can select “Fit Regression Model” in the Regression submenu. In this case you can use the Model button to select interaction terms and higher order terms to include. Or you can select the Options button to enable a Box-Cox transformation, which tries multiple exponents to determine which provides the best fit.
Regardless of the method selected, check the residuals to be certain the solution is appropriate.
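The single-variable case can be sketched in Python as a polynomial least-squares fit. This is not how Minitab implements its Fitted Line Plot; it is an illustration under stated assumptions. The data are hypothetical and follow an exact squared relationship, and the helpers `solve` and `fit_polynomial` are names invented for this example. Fitting a cubic model should return a cube coefficient of approximately zero, and the residuals confirm the fit.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - tail) / aug[i][i]
    return solution


def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns [b0, b1, ..., b_degree] for y = b0 + b1*x + b2*x**2 + ...
    """
    rows = [[xi ** p for p in range(degree + 1)] for xi in x]
    k = degree + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data that only needs a squared term: y = 2 + 0.5 * x**2
x = [float(v) for v in range(-4, 6)]
y = [2 + 0.5 * xi ** 2 for xi in x]

coeffs = fit_polynomial(x, y, degree=3)
residuals = [
    yi - sum(b * xi ** p for p, b in enumerate(coeffs))
    for xi, yi in zip(x, y)
]

print("coefficients (constant, x, x^2, x^3):", [round(b, 6) for b in coeffs])
print("largest residual:", max(abs(r) for r in residuals))
```

With real, noisy data the unneeded coefficient will not be exactly zero, so the residuals remain the deciding check.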
The Box-Cox transformation normally raises the response variable to a power, called lambda. A positive integer lambda raises the response to that power. A negative integer lambda uses 1 divided by the response raised to that power. A lambda of 0.5 is the square root and a lambda of 0 is the natural logarithm.
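The powers described above can be sketched as a pair of Python functions. This follows the simple power form given in this reference (the response raised to lambda, with the natural logarithm for a lambda of 0) and assumes a strictly positive response. The function names `box_cox` and `inverse_box_cox` are hypothetical.

```python
import math


def box_cox(y, lam):
    """Simple power form: y**lam, or ln(y) when lam is 0. Requires y > 0."""
    if y <= 0:
        raise ValueError("Box-Cox requires a positive response value")
    return math.log(y) if lam == 0 else y ** lam


def inverse_box_cox(w, lam):
    """Undo the transformation to return to the original units."""
    return math.exp(w) if lam == 0 else w ** (1.0 / lam)


value = 4.0
for lam in (-1, 0, 0.5, 2):
    transformed = box_cox(value, lam)
    restored = inverse_box_cox(transformed, lam)
    print(f"lambda={lam}: transformed={transformed:.4f}, restored={restored:.4f}")
```

The inverse function matters in practice: predictions from a model fitted to a transformed response are in transformed units and must be converted back before they are applied to the real process.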
Hints & tips
- Remember correlation is not causation. You may have missed a term in your analysis so be sure to include all possible terms.
- However, adding terms adds uncertainty to the analysis. You need at least ten data points for every term in the equation, and that includes higher order and mixed interaction terms.
- You may need to iterate through several combinations to find a regression that has acceptable residuals.
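The ten-points-per-term guideline above is simple arithmetic, sketched here in Python. The term list and sample size are hypothetical, and the count assumes the constant is not treated as a term.

```python
def minimum_points(n_terms, points_per_term=10):
    """Smallest data set size suggested by the ten-points-per-term guideline."""
    return points_per_term * n_terms


# Hypothetical model: two variables, one squared term, one interaction term
terms = ["x1", "x2", "x1^2", "x1*x2"]
needed = minimum_points(len(terms))
available = 25  # hypothetical number of data points collected
enough = available >= needed

print(f"terms={len(terms)}, needed={needed}, available={available}, enough={enough}")
```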
- 00:05 Hello, I'm Ray Sheen.
- 00:06 We've been focused on linear regression.
- 00:09 Well, now it's time to move to the next level, and discuss Non-linear Regression.
- 00:15 >> Let me explain what I mean by linear versus non-linear regression.
- 00:19 In a linear relationship, the dependent variable changes at
- 00:24 a constant rate with respect to the change in the independent variable.
- 00:28 That is why it is modeled with a straight line.
- 00:30 On a graph, the slope of the line does not change.
- 00:34 Let's contrast that with non-linear.
- 00:36 There's a relationship between the independent variable and
- 00:39 dependent variable.
- 00:40 But a difference is that the rate of change between the variables is changing.
- 00:45 The effect is normally modeled by a higher order effect for
- 00:48 the independent variable such as squared, cubed, or square root.
- 00:52 Another thing that can create a non-linear effect is that there is an interaction
- 00:56 term between two independent variables.
- 00:59 In this case, the term in the equation would be a constant times both of
- 01:03 the independent variables.
- 01:05 So we see that a non-linear relationship can still be modeled with a line, but
- 01:09 it is a curved line, not a straight line.
- 01:12 So what could cause a non-linear effect?
- 01:15 First, let's recognize that we can have non-linear
- 01:18 relationships with either simple linear regression or multiple regression.
- 01:22 The number of independent variables does not limit whether non-linear
- 01:26 relationships exist.
- 01:28 And as I mentioned, this is often modeled by an exponent on the independent variable.
- 01:32 There are many effects that have a squared, square root, or
- 01:35 logarithmic effect.
- 01:37 In fact, many of the problems that are encountered by Lean Six Sigma projects
- 01:41 have that type of effect.
- 01:42 Performance degradation of equipment due to wear and
- 01:44 tear often accelerates as an item of equipment wears out.
- 01:49 Many times, there are pressure and
- 01:50 temperature effects on material that are impacted by squared or square root terms.
- 01:55 Now, one of my favorites is the failure rate of electronics.
- 01:59 Often, that is high at the beginning of use with infant mortality issues,
- 02:03 then it tapers off to a very low rate until you get to the end of life,
- 02:06 and the rate of failure starts to accelerate again.
- 02:10 It's because of these effects and
- 02:11 others that the higher order terms are needed to model the real-world effects.
- 02:16 So now, let's look at how to determine the non-linear regression relationship.
- 02:20 Minitab can be used to determine and model non-linear relationships.
- 02:25 That formula can then be used to predict process performance.
- 02:28 If the nature of the relationship is already known,
- 02:31 you can immediately model it.
- 02:32 You do that by selecting the Stat pulldown menu, select Regression, and
- 02:36 then select Non-linear Regression.
- 02:39 At that point, you can directly type the equation into the Minitab form.
- 02:44 If you're working with just one independent variable and
- 02:47 you don't know the relationship,
- 02:49 you can start with the Stat pulldown menu, select Regression, and
- 02:52 then select Fitted Line Plot.
- 02:55 After selecting the response variable, and
- 02:57 the predictor, then select the exponent you want to try.
- 03:01 As you can see in this plot, I didn't know the exponent.
- 03:04 I selected the cube level, but it turned out that I only needed a squared term.
- 03:09 Minitab figured that out and
- 03:11 you can see the equation that is above the plot has a zero for the cube term,
- 03:16 but it has a positive coefficient on the independent variable where
- 03:20 that variable is squared.
- 03:22 A common approach used to create the exponent term
- 03:25 is the Box-Cox transformation.
- 03:27 This is a methodology that is often used to transform an independent variable into
- 03:31 a higher order term.
- 03:33 In fact, let's look at the Box-Cox transformation.
- 03:36 Box-Cox adds a higher order term to an independent variable in order to create
- 03:41 the curvilinear effect.
- 03:42 The Box-Cox transformation will analyze a factor
- 03:46 to determine what exponent effect is the best model for the data in the data set.
- 03:51 It will do integer exponents ranging from -5 to +5.
- 03:55 It also includes a one half exponent, which is a square root, and
- 03:59 it checks for the logarithmic function, and refers to that as the 0 exponent.
- 04:04 The Box-Cox transformation is a great way to assess complex relationships
- 04:09 when doing regression analysis.
- 04:11 Just remember that if you transform an independent variable before putting it into
- 04:14 the regression formula, you need to undo the transformation
- 04:18 when you translate back to the real-world application.
- 04:21 To use this approach, start with the Stat pulldown menu, select Regression,
- 04:25 then Regression again, and finally, Fit Regression Model.
- 04:29 Then select the Options button on the regression panel that comes up.
- 04:33 You can enter the Box-Cox exponent by putting it in or
- 04:37 entering the lambda value.
- 04:39 By the way, if you want to enter an interaction effect, you will need to
- 04:42 select the Model button, and then you can add the interaction terms.
- 04:47 As you try different options for exponents or
- 04:49 interactions, you should compare the residual values for
- 04:52 the different trials to determine which one is the best.
- 04:56 There are a few traps and pitfalls to watch for
- 04:59 when doing Non-linear Regression Analysis.
- 05:02 The analysis will probably require several iterations to try different exponents or
- 05:06 interactions, comparing the residuals to pick the best.
- 05:09 If I have a multiple regression to accomplish, I will first do simple linear
- 05:13 with each independent variable to determine which exponent to use.
- 05:16 And then, start the multiple regression analysis with those exponents.
- 05:20 But the final selection comes down to the residuals.
- 05:24 Another thing is a point I made in an earlier lesson.
- 05:26 Correlation does not mean causation.
- 05:29 If you don't include the right factors in the analysis,
- 05:32 you may show a correlation and totally miss the causation.
- 05:36 The correlation is just a reflection that both the independent variable and
- 05:39 dependent variable are moving together because of a different factor.
- 05:44 So feel free to add other factors, but make sure you have enough data points for
- 05:49 all the factors.
- 05:50 And the danger there is that if there are too many independent variables as compared
- 05:54 to the size of the data set, the true effects might be missed.
- 05:58 You need a data set that is at least 10 times larger than the number of
- 06:02 independent variables.
- 06:03 And as we said before, with many independent factors
- 06:07 you'll be more susceptible to the influence of outliers.
- 06:10 If the linear regression doesn't create a good fit, as shown by
- 06:14 the residual analysis,
- 06:16 well, then it's time to move to a Non-linear Regression Analysis.
- 06:20 And these situations do occur often in the real world.