About this lesson
Many of the problems encountered in Lean Six Sigma projects do not have the straight-line correlation effect that we discussed with simple linear regression or multiple regression analysis. The relationship is better modeled by an exponential curve, a parabola, or another non-linear relationship. This lesson will use Minitab to assist in determining the best model to predict performance.
Exercise files
Download this lesson’s related exercise files.
Non-linear Regression Exercise.xlsx (10.7 KB)
Non-linear Regression Exercise Solution.docx (88.4 KB)
Quick reference
Non-linear Regression
Non-linear regression analysis is the creation of a regression equation with higher-order terms or mixed-variable (interaction) terms. These are often better representations of real-world effects than linear regression models.
When to use
Start your regression analysis with either a simple linear regression or a multiple linear regression, depending on the number of independent variables. If the residuals analysis is unacceptable, switch to a non-linear analysis and iterate until the residuals are acceptable.
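To make "unacceptable residuals" concrete, here is a minimal Python sketch (it assumes NumPy; the lesson itself works in Minitab) that fits a straight line to made-up data that actually follow a curve. The helper `fit_line` and the data are hypothetical, for illustration only.

```python
import numpy as np


def fit_line(x, y):
    """Fit y = b0 + b1*x by least squares and return (b0, b1)."""
    b1, b0 = np.polyfit(x, y, deg=1)  # polyfit returns the highest power first
    return b0, b1


# Illustrative data: the response really follows a curve (y = x^2).
x = np.arange(-5, 6, dtype=float)
y = x ** 2

b0, b1 = fit_line(x, y)
residuals = y - (b0 + b1 * x)

# A straight line through curved data leaves a systematic pattern:
# positive residuals at both ends, negative residuals in the middle.
print(np.round(residuals, 2))
```

For this symmetric data the fitted line is flat at y = 10, so the residuals are about 15 at both ends and about -10 in the middle. A pattern like that, rather than random scatter around zero, is the kind of residuals result that suggests switching to a non-linear model.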
Instructions
Linear regression assumes that a change in the independent variable produces a change in the response variable at a constant rate. This can be represented by a straight-line relationship. However, in many cases the real-world effect does not change at a constant rate; the rate varies. This is represented by a curved-line plot of the relationship between the variables. Lean Six Sigma teams encounter many real-world effects with varying rates of change. These include system degradation due to wear and tear, which often accelerates near end of life; pressure and temperature effects on materials, which vary, especially when the material is close to changing state; and electronics, which often saturate at the low or high end of their performance range, causing the performance curve to flatten.
When non-linear effects are present, they can usually be modeled with a higher-order term (squared or cubed), an exponential term, a logarithmic term, or a mixed-variable term, meaning a term that tracks the interaction effect of two otherwise independent variables.
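A minimal sketch of how these kinds of terms can enter a regression equation, assuming Python with NumPy rather than Minitab: each squared, logarithmic, or interaction term becomes its own column, and the coefficients are then estimated by ordinary least squares. The data and the "true" coefficients are invented for illustration.

```python
import numpy as np

# Hypothetical predictors; x1 is kept positive because we take its log.
rng = np.random.default_rng(0)
x1 = rng.uniform(1.0, 10.0, size=50)
x2 = rng.uniform(0.0, 5.0, size=50)

# Invented "true" relationship with a squared, a log, and an interaction term.
y = 4.0 + 0.5 * x1**2 + 3.0 * np.log(x1) - 1.5 * x1 * x2

# Each curved effect becomes its own column; the constant is a column of ones.
X = np.column_stack([
    np.ones_like(x1),  # constant
    x1**2,             # higher-order (squared) term
    np.log(x1),        # logarithmic term
    x1 * x2,           # mixed-variable (interaction) term
])

# Ordinary least squares on the expanded columns.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(coef, 3))
```

Because the made-up data contain no noise, the fit should return coefficients very close to 4.0, 0.5, 3.0 and -1.5. With 50 rows and three non-constant terms, the example also stays within the ten-points-per-term guideline given in the hints for this lesson.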
Excel's Regression analysis tool will not create a non-linear regression, but Minitab can do it in several ways.
- If the nature of the non-linear relationship is already known, you can select “Nonlinear Regression” in the Regression menu and enter the relationship directly.
- If there is only one independent variable, you can select the “Fitted Line Plot” option in the Regression menu and then select the type of model (linear, quadratic, or cubic) you want to fit.
- If there are multiple terms, you can select “Fit Regression Model” in the Regression submenu. In this case, use the Model button to select the interaction terms and higher-order terms to include. Or you can use the Options button to enable a Box-Cox transformation, which tries a range of power transformations of the response variable to determine which provides the best fit.
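The Fitted Line Plot comparison can be sketched outside Minitab as well. The Python example below, which assumes NumPy and uses invented quadratic data, fits linear, quadratic and cubic polynomials with `numpy.polyfit` and reports each model's MSE. It illustrates the idea; it is not a reproduction of Minitab's output.

```python
import numpy as np

# Invented data that truly follow a quadratic curve (no noise, for clarity).
x = np.linspace(0.0, 10.0, 30)
y = 2.0 + 1.5 * x + 0.8 * x**2


def mse(y_obs, y_pred):
    """Mean squared error: (1/n) * sum of squared residuals."""
    return float(np.mean((y_obs - y_pred) ** 2))


results = {}
for name, degree in [("linear", 1), ("quadratic", 2), ("cubic", 3)]:
    coefs = np.polyfit(x, y, deg=degree)  # coefficients, highest power first
    results[name] = (coefs, mse(y, np.polyval(coefs, x)))
    print(name, np.round(coefs, 3), round(results[name][1], 4))
```

The quadratic and cubic fits both reach an MSE of essentially zero, while the linear fit does not, and the cubic coefficient comes out at essentially 0, much like the Minitab example in the video. Because adding a term can never increase the in-sample MSE, a near-zero coefficient on the extra term is a sign that the simpler model is enough.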
Regardless of the method selected, check the residuals to be certain the solution is appropriate. When doing non-linear regression, the p-value and R-squared value are not reliable measures of goodness of fit. A better measure is the mean squared error (MSE). The formula for MSE is:
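$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2$$
where $Y_i$ is the observed response, $\hat{Y}_i$ is the value predicted by the regression equation, and $n$ is the number of observations.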
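A small Python sketch of the MSE formula, using no third-party libraries. `mse` follows the 1/n definition above. `mse_error_df` is an added, hypothetical variant that divides the sum of squared errors by n − p (p being the number of fitted parameters), which is the convention regression ANOVA tables, including Minitab's, typically use for the error mean square; treat it as context, not as part of the lesson's formula.

```python
def mse(observed, predicted):
    """Mean squared error: (1/n) * sum((Y_i - Yhat_i) ** 2)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(observed)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted)) / n


def mse_error_df(observed, predicted, n_params):
    """Hypothetical variant: sum of squared errors divided by n - p instead of n."""
    n = len(observed)
    if n <= n_params:
        raise ValueError("need more observations than fitted parameters")
    return mse(observed, predicted) * n / (n - n_params)


# Illustrative observed values and regression predictions.
y_obs = [3.0, 5.0, 7.5, 11.0]
y_fit = [3.2, 4.7, 7.9, 10.8]
print(mse(y_obs, y_fit), mse_error_df(y_obs, y_fit, n_params=2))
```

For the four illustrative points, the squared residuals sum to 0.33, so `mse` gives about 0.0825 and `mse_error_df` with p = 2 gives about 0.165.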
Hints & tips
- Remember that correlation is not causation. You may have missed a factor in your analysis, so be sure to include all plausible factors.
- However, adding terms adds uncertainty to the analysis. You need at least ten data points for every term in the equation, including higher-order and interaction terms.
- You may need to iterate through several combinations to find a regression that has acceptable residuals.
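The ten-points-per-term rule is easy to check before fitting. This hypothetical Python helper counts the terms in a model with linear terms, optional squared terms, and optional pairwise interaction terms, then compares the data set size against ten points per term. The lesson does not say whether the constant counts as a term; this sketch leaves it out.

```python
from math import comb


def count_terms(n_predictors, include_squares=False, include_interactions=False):
    """Count model terms, excluding the constant."""
    terms = n_predictors                  # one linear term per predictor
    if include_squares:
        terms += n_predictors             # one squared term per predictor
    if include_interactions:
        terms += comb(n_predictors, 2)    # one term per pair of predictors
    return terms


def enough_data(n_points, n_terms, points_per_term=10):
    """Rule of thumb from the hints: at least ten data points for every term."""
    return n_points >= points_per_term * n_terms


terms = count_terms(3, include_squares=True, include_interactions=True)
print(terms, enough_data(60, terms), enough_data(90, terms))
```

With three predictors, their squares, and all three pairwise interactions there are 9 terms, so the sketch reports 60 points as too few and 90 as enough.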
- 00:04 Hi, I'm Ray Sheen.
- 00:06 We've been talking about linear regression.
- 00:08 Now it's time to move to the next level, and discuss non-linear regression.
- 00:13 Of course, we're working with regression analysis in the decision tree, and
- 00:17 I briefly introduced the concept of non-linear regression in earlier lessons.
- 00:21 Now it's time to discuss how we actually work with
- 00:25 a non-linear regression analysis.
- 00:28 In a linear relationship, the rate of change of the dependent variable
- 00:32 is constant with respect to changes in the independent variable,
- 00:36 which is why it is modeled with a straight line.
- 00:39 On a graph, the slope of the line does not change.
- 00:42 Let's contrast that with non-linear.
- 00:44 There is a relationship between the independent variable and
- 00:47 dependent variable.
- 00:48 But the difference is that the rate of change is changing between the variables.
- 00:53 The effect is normally modeled by a higher order effect for
- 00:57 the independent variables, such as a squared, cubed, log, or exponential term.
- 01:02 Another thing that can create a non-linear effect is if there are interaction terms
- 01:07 between the two independent variables.
- 01:10 In this case, the term in the equation would be a constant times the product of
- 01:14 the two independent variables.
- 01:15 So we see a non-linear relationship can still be modeled with a line, but
- 01:20 it's just a curved line, not a straight line.
- 01:24 So what would cause a non-linear effect?
- 01:27 First, let's recognize that we can have non-linear relationships with either
- 01:31 simple linear regression or multiple regression.
- 01:34 The number of independent variables does not limit whether non-linear
- 01:38 relationships exist.
- 01:39 As I mentioned, this is often modeled with some type of an exponent on
- 01:43 the independent variable.
- 01:45 There are many physical effects that have a square, square root, or
- 01:49 logarithmic effect.
- 01:50 And this is true of many of the Lean Six Sigma projects you'll be working with.
- 01:54 For instance, performance degradation of equipment due to wear and
- 01:58 tear often accelerates as the equipment wears out.
- 02:01 Many times there are pressure or
- 02:03 temperature effects on materials that are impacted by squares or square root terms.
- 02:08 And one of my favorites is the failure rate of electronics.
- 02:11 Often, that is high at the beginning of use with infant mortality.
- 02:15 It then tapers off and we have a long period with no failures, and
- 02:20 then we get failures again at end of life.
- 02:23 Because of these effects and others,
- 02:26 higher order terms are needed to model the real-world effects.
- 02:30 And if you're able to plot the data, you might be able to guess the type of effect.
- 02:35 If not, you'll probably need to try several effects to see if any of
- 02:40 them result in a very low mean squared error.
- 02:43 The P value is not reliable with multiple non-linear regression.
- 02:48 So the mean squared error, using the formula shown, which Minitab will calculate for
- 02:53 you, is a better indication of when you have a very good fit for your curve.
- 02:58 Speaking of Minitab,
- 03:00 let's look at how to determine the non-linear regression relationship.
- 03:05 Minitab can be used to determine and model non-linear relationships, and
- 03:09 that formula can then be used to predict process performance.
- 03:13 If the nature of the relationship is already known,
- 03:16 you can immediately model it.
- 03:18 You do that by selecting the Stat pull down menu, select Regression, and
- 03:23 then select Non-linear Regression.
- 03:25 At that point, you can directly type the equation into the Minitab form.
- 03:31 If you're working with just one independent variable, and
- 03:34 you don't know the relationship, you can start with the Stat pulldown menu,
- 03:38 select Regression, and then select Fitted Line Plot.
- 03:41 After selecting the response variable and the predictor,
- 03:45 then select the exponent you want to try.
- 03:47 As you can see in the plot, I didn't know which exponent to use.
- 03:51 I selected the cube level, but as it turned out, I only needed a squared term.
- 03:55 Minitab figured that out.
- 03:57 And you can see the equation above the plot has a coefficient of 0 for
- 04:01 the cube term, but positive coefficients for the constant,
- 04:06 the independent variable, and the variable squared.
- 04:09 An approach used to create the exponent terms is the Box-Cox transformation.
- 04:15 This is a methodology that is often used to transform a variable by raising it to
- 04:20 a power, which creates a higher order term.
- 04:21 We'll discuss this in another lesson.
- 04:24 There are a few traps and
- 04:26 pitfalls that I want to review when we're using non-linear regression analysis.
- 04:31 The analysis will probably require several iterations to try different exponents or
- 04:37 interactions, comparing the residuals to pick the best.
- 04:40 If I have a multiple regression to accomplish, I'll first do a simple linear regression
- 04:45 with each independent variable to determine which exponents to use, and
- 04:49 then start the multiple regression analysis with these exponents.
- 04:53 But the final selection will come down to that mean squared error.
- 04:57 Another thing is a point I made in an earlier lesson,
- 05:00 correlation does not mean causation.
- 05:02 If you don't include the right factors in the analysis,
- 05:06 you may show a correlation and totally miss the true cause.
- 05:10 The correlation is just a reflection that both the independent and
- 05:13 dependent variables are moving together because of a different factor.
- 05:18 So feel free to add other factors, but
- 05:20 make sure you have enough data points for all of the factors.
- 05:24 And the danger here is that if there are too many independent variables
- 05:28 compared to the size of the dataset, the true effect might be masked.
- 05:33 You need a data set that is at least 10 times larger than the number of
- 05:37 independent variables.
- 05:39 As we said before, with many independent factors,
- 05:42 you'll be more susceptible to the influence of outliers.
- 05:46 If the linear regressions don't create a good fit as shown in the residuals,
- 05:51 then it's time to switch to a nonlinear regression.
- 05:55 These relationships are very common in real-world problems.