About this lesson
Data transformation can convert a Lean Six Sigma problem from a non-linear regression analysis into a linear regression, which is often easier to understand and explain to stakeholders. The most common transformation approach is the Box-Cox transformation. In this lesson, we demonstrate how this transformation works and discuss when to use it.
Exercise files
Download this lesson’s related exercise files.
Transformation Exercise.xlsx (11.2 KB)
Transformation Exercise Solution.docx (85.4 KB)
Quick reference
Box-Cox and Transformations
Transformations modify non-linear terms in a prescribed manner so that they can be treated as linear terms in a regression analysis. The most common transformation is the Box-Cox transformation.
When to use
Transformations are used when working with higher-order terms, especially when working with multiple regression analyses. By applying a transformation, a linear analysis can be done instead of a non-linear analysis.
Instructions
Many real-world effects that Lean Six Sigma teams encounter are characterized by non-linear behavior. It is more difficult to create a non-linear regression analysis than a linear one when doing the analysis by hand. Fortunately, there is statistical software that can assist with non-linear models. However, if your tool of choice is Excel, you do not have the ability to do non-linear regression analysis. In that case, a transformation of one or more factors can change the problem into a linear relationship which can then be solved.
The most common transformation used in Lean Six Sigma regression analysis is the Box-Cox transformation. Box-Cox organizes the transformation approach into a logical sequence that can be tried by hand to see if there is a suitable linear relationship. Box-Cox uses a number to designate the exponent applied in the transformation, normally a whole number, with 0.5 and -0.5 as the exceptions. A “2” designates that the factor should be squared and a “3” indicates it should be cubed. A “0.5” is a square root, and a Box-Cox value of “0” is the natural log of the factor. Negative Box-Cox values produce the same effects, only with the factor in the denominator of a fraction. Therefore, a “-2” is 1/x². Box-Cox exponents can range from -5 to 5, but in practice the magnitude seldom exceeds 2.
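The exponent rules above can be sketched in a few lines of Python. This is a minimal illustration, not Minitab's implementation: the function name `ladder_transform` is hypothetical, the data are made up, and the sketch assumes strictly positive values. It follows the simple power form described in this lesson rather than the scaled (x^λ − 1)/λ form found in some textbooks.

```python
import math

def ladder_transform(x, exponent):
    """Apply the power transformation described above to one positive value."""
    if x <= 0:
        # Assumption of this sketch: logs and fractional powers need positive data.
        raise ValueError("this sketch assumes strictly positive data")
    if exponent == 0:
        return math.log(x)   # a Box-Cox value of 0 means the natural log
    return x ** exponent     # negative exponents put the factor in the denominator

# Made-up factor values, transformed with several Box-Cox values.
values = [1.0, 4.0, 9.0]
squared = [ladder_transform(v, 2) for v in values]        # x squared
roots = [ladder_transform(v, 0.5) for v in values]        # square root of x
inverse_sq = [ladder_transform(v, -2) for v in values]    # 1 / x squared
logs = [ladder_transform(v, 0) for v in values]           # ln(x)
```

Keeping the zero case as an explicit branch matters, because raising a value to the power 0 would return 1 for every input instead of the natural log.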
To use Box-Cox in Minitab, load your data in the normal manner. Then select Stat → Regression → Regression → Fit Regression Model.
Set up the regression analysis with the response factor and the control factors. Then select the “Options” button. On the Options panel, there are radio buttons to select various Box-Cox values. This transformation will be applied to your response factor. To apply the transformation to the control factors, select Non-linear analysis from the Regression menu and enter the selected exponent or function for each control factor. If you use Box-Cox to transform your response factor (Y), you need to transform the results back to obtain the “real-world” values.
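Transforming back simply means applying the inverse operation to any value expressed in transformed units. The following Python sketch shows one way to do that, assuming the simple power form of the transformation and positive values; the function name `back_transform` and the numbers are hypothetical.

```python
import math

def back_transform(y_transformed, exponent):
    """Undo a power transformation of the response (assumes positive values)."""
    if exponent == 0:
        return math.exp(y_transformed)          # inverse of the natural log
    return y_transformed ** (1.0 / exponent)    # inverse of raising to the exponent

# Hypothetical case: the model was fitted on sqrt(Y), i.e. a Box-Cox value of 0.5,
# and it predicts 12.0 in transformed units.
predicted_sqrt_y = 12.0
predicted_y = back_transform(predicted_sqrt_y, 0.5)    # back in real-world units

# Hypothetical case: the model was fitted on ln(Y), i.e. a Box-Cox value of 0.
predicted_ln_y = math.log(50.0)
predicted_y_from_log = back_transform(predicted_ln_y, 0)
```

The same inverse applies to limits and targets: a real-world limit is converted into transformed units before comparing it with the model, or model output is converted back before comparing it with the limit.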
Multiple non-linear regression analyses often will include transformations to simplify them, resulting in either a non-linear analysis with one factor or a multiple linear analysis. If you have process knowledge that allows for a transformation to one of those methods, it will likely give you a better fit for the result. Whenever working with multiple non-linear analyses, the standard error is the preferred technique for determining if the fit is adequate.
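$$SE = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - k - 1}}$$
where “n” is the sample size, “k” is the number of independent variables, y_i is each observed response, and \hat{y}_i is the corresponding value predicted by the model.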
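The standard error of a fit with k independent variables is commonly computed as the square root of the sum of squared residuals divided by n − k − 1. A minimal Python sketch under that assumption follows; the function name and the data are made up for illustration.

```python
import math

def standard_error(observed, predicted, k):
    """Standard error of the estimate for a model with k independent variables."""
    n = len(observed)
    if n - k - 1 <= 0:
        raise ValueError("need more observations than n = k + 1")
    # Sum of squared residuals (observed minus predicted).
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    return math.sqrt(sse / (n - k - 1))

# Made-up observed responses and the values predicted by a one-factor model.
observed = [2.0, 4.1, 5.9, 8.2, 9.8]
predicted = [2.0, 4.0, 6.0, 8.0, 10.0]
se = standard_error(observed, predicted, k=1)
```

A smaller value indicates that the fitted points sit closer to the observed data, which is why it can be used to compare candidate models fitted to the same response.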
Hints & tips
- Box-Cox is a framework for cataloging the transformation operators to be used. It is not a stand-alone mathematical operation.
- It may take several tries to find the best transformation.
- When reviewing the residual plots, if the “Versus Fit” plot has a definite shape to it, then use a Box-Cox transformation until the pattern becomes random.
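The trial-and-error search mentioned in these tips can be sketched in Python. This example swaps the visual check of residual plots for a numeric comparison of standard error, and it transforms an X factor so that the response stays in its original units and the errors remain comparable. All names and data are hypothetical, and the fit is an ordinary least-squares line written out by hand.

```python
import math

def transform(x, exponent):
    """Power transformation; a value of 0 means the natural log (positive x only)."""
    return math.log(x) if exponent == 0 else x ** exponent

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one factor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def standard_error(xs, ys, slope, intercept):
    """Standard error of the fit with one independent variable (k = 1)."""
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (len(xs) - 2))

# Made-up data that follows y = 3 * x^2 + 5 exactly.
x_raw = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [3.0 * x ** 2 + 5.0 for x in x_raw]

# Try several Box-Cox values on the X factor and record the error of each fit.
candidates = [-2, -1, -0.5, 0, 0.5, 1, 2, 3]
results = {}
for exponent in candidates:
    xs = [transform(x, exponent) for x in x_raw]
    slope, intercept = fit_line(xs, y)
    results[exponent] = standard_error(xs, y, slope, intercept)

best_exponent = min(results, key=results.get)  # exponent with the smallest error
```

With real data the smallest error will not be zero, and the residual plots should still be reviewed for the chosen exponent before accepting the model.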
Transcript
- 00:04 Hi, I'm Ray Sheen.
- 00:06 There's another topic I'd like to discuss with relation to regression analysis,
- 00:11 and that is transformations.
- 00:13 What does a transformation do?
- 00:15 Well, when you have a non-linear term in your relationship,
- 00:19 you can attempt to transform that term to turn it into a linear term which then
- 00:24 simplifies the analysis.
- 00:27 In fact, you can transform either the Y parameter or one of the X parameters.
- 00:31 But just because you can do something doesn't mean you should do that thing.
- 00:37 The reasons you may consider doing this are, first, to work with linear models.
- 00:41 Linear models are easier to explain.
- 00:44 We can show the straight line and the rate of change is constant.
- 00:48 The direct relationship is easier to follow.
- 00:51 On top of that,
- 00:51 there are additional benefits because of the tools that can now be used.
- 00:55 For instance, if you're working in Excel, you do not have any non-linear tools.
- 01:00 But if the data is transformed, you can then use the Excel tools.
- 01:05 Of course, there are some disadvantages.
- 01:08 Sometimes it's hard to create a good understanding of what that
- 01:12 transformed input actually means,
- 01:14 since it's not using the physical units that occur in nature.
- 01:18 Also, depending upon the transformation you use,
- 01:21 you may need statistical software to do the calculations for you.
- 01:26 So with all that said, the most popular transformation of data within
- 01:30 Lean Six Sigma projects is the Box-Cox transformation.
- 01:34 Now, let's go through how Box-Cox works.
- 01:37 With Box-Cox, we add a higher order term into the analysis.
- 01:43 This then lets us model squared, cubed, even higher order effects if needed.
- 01:48 When using Box-Cox we always use a whole number for the exponent of
- 01:53 the factor except for the case when we're using 0.5 or -0.5.
- 01:58 In fact, the range of the exponents can go from -5 to +5.
- 02:03 So just to be clear, when the exponent is 1, we're using the value as is and
- 02:10 when the exponent is -1, we're using 1 over that value.
- 02:14 When the exponent is 2, we are squaring the term and
- 02:19 when it is -2, the term is 1 over the factor squared.
- 02:23 There are several special cases.
- 02:25 When the exponent is 0.5, we take the square root and
- 02:30 -0.5 is 1 over the square root.
- 02:33 And the final special case is that when the exponent is 0,
- 02:37 Box-Cox takes the natural logarithm of that value.
- 02:41 Box-Cox is commonly used because it is easy to understand.
- 02:45 When working with Minitab, start like you're doing a simple linear regression.
- 02:49 Select Stat then Regression and Regression again.
- 02:53 When the panel comes up, select the Fit Regression Model button, and
- 02:57 then select the Options button.
- 02:59 You can try several different Box-Cox solutions and
- 03:02 compare the residual plots to see which one gives you the best fit.
- 03:07 One caution, if you're transforming the Y factor instead of an X factor,
- 03:11 you need to transform it back when looking at things like limits,
- 03:15 targets, or other real world parameters.
- 03:19 If it is a multiple regression analysis, things can be a bit more complicated, and
- 03:24 we'll try to use Box-Cox to get to the point of a multiple linear regression.
- 03:29 So in this case I'm talking about where a simple non-linear analysis did not work
- 03:34 because of other factors but a multiple linear analysis did not work either.
- 03:40 You find you need to use multiple X terms and at least one of them is non-linear.
- 03:46 So in that case,
- 03:47 rely on your process knowledge to guess at the likely non-linear effect.
- 03:51 And then use a Box-Cox transformation of the appropriate term that
- 03:55 would create a multiple linear regression analysis of the new set of terms.
- 04:00 Which is one transformed by Box-Cox and the others in their natural form.
- 04:05 Just like with the other simple nonlinear analysis,
- 04:08 you determine your error terms for each of the points.
- 04:11 The line with the smallest multiple standard error is your best fit curve.
- 04:16 Transforming the data used in the analysis can simplify the regression solution.
- 04:21 And the Box-Cox transformation is the most commonly used one for
- 04:27 Lean Six Sigma projects.