Retired course
This course has been retired and is no longer supported.
About this lesson
Simple linear regression analysis creates an equation that describes the relationship between two correlated factors. This equation helps in understanding problems, and it can also be used to manage the problem or process going forward. This lesson shows how to calculate this line with the help of either Excel or Minitab.
Exercise files
Download this lesson’s related exercise files.
- Simple Linear Regression.xlsx (10.3 KB)
- Simple Linear Regression - Solution.docx (201.9 KB)
Quick reference
Simple Linear Regression
Simple linear regression is the creation of a formula that shows the straight-line relationship between two correlated variables. One variable is the independent variable and the other is the dependent variable.
When to use
Once correlation is established between two factors, and if both factors are continuous variables, a simple linear regression line formula can be created. This formula can be used to determine the effect of the independent variable on the dependent variable during the Analyze phase and to predict process performance when designing a solution during the Improve phase.
Instructions
Once correlation has been established between two continuous variables, a simple linear regression line can be determined. This line has the form:
y = mx + b
In this equation, x is the independent variable and y is the dependent variable. The slope, m, quantifies the relationship: it is the change in y for each one-unit change in x. Finally, b is the y-intercept, the value of y when x = 0; it positions the line so that it gives correct values when used for prediction.
This line is the best-fit line through the data points. In real life, the actual data points are seldom exactly on the line, but when there is high correlation, the data points will be close to the line.
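The "best fit" used by both Excel's Regression tool and Minitab's Fitted Line Plot is the ordinary least squares line: the line that minimizes the sum of the squared vertical distances between the data points and the line. Its coefficients are m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b = ȳ − m·x̄. The minimal Python sketch below computes them directly; the function name `fit_line` and the five data points are made up purely for illustration and do not come from the exercise file.

```python
def fit_line(x, y):
    """Return (m, b) for the least squares line y = m*x + b."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sum of cross-deviations and sum of squared x-deviations
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    m = sxy / sxx              # slope: change in y per one-unit change in x
    b = y_mean - m * x_mean    # intercept: the line passes through (x_mean, y_mean)
    return m, b


# Hypothetical toy data (not from the exercise file)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
m, b = fit_line(x, y)
print(f"y = {m:.2f}x + {b:.2f}")  # y = 0.60x + 2.20
```

None of the points lies exactly on y = 0.6x + 2.2, but no other straight line has a smaller sum of squared vertical distances to them.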
While this line is valuable both for investigating root causes and for predicting performance when designing a solution, it has some limitations. Mathematically, the line extends to infinity in either direction. In reality, there are almost always limits. For instance, an analysis found that the amount of time a student spent studying was correlated with the student's score on the test, so a simple linear regression line was created. However, a student cannot study for a negative amount of time, so the lower limit on the independent variable was zero. Furthermore, a student could not score better than perfect on the test, so the upper limit on the dependent variable was 100%.
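Excel and Minitab will not apply these limits for you. One way to respect them when using the line for prediction is to reject inputs outside the valid range and cap the output, as in this Python sketch. The slope (3.207) and intercept (61.25) are the values Minitab reported for the study-hours example in this lesson; the function name `predict_score` and the choice to cap (rather than reject) out-of-range predictions are illustrative assumptions.

```python
SLOPE = 3.207       # points gained per study hour (lesson's study-hours example)
INTERCEPT = 61.25   # predicted score at zero study hours (same example)
MAX_SCORE = 100.0   # upper limit: a score cannot exceed 100%


def predict_score(study_hours):
    """Predict a test score from study hours while respecting physical limits."""
    if study_hours < 0:
        # Lower limit on the independent variable: no negative study time
        raise ValueError("study hours cannot be negative")
    raw = SLOPE * study_hours + INTERCEPT
    # Upper limit on the dependent variable: cap at a perfect score
    return min(raw, MAX_SCORE)


for hours in (0, 5, 20):
    print(hours, round(predict_score(hours), 2))
```

At 20 hours the unconstrained line would predict about 125, which is impossible, so the sketch returns 100.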
Both Excel and Minitab will determine the coefficients needed to create the simple linear regression line.
- Excel:
  - Data Analysis (on the Data ribbon)
  - Regression
  - Enter the Y and X data ranges, similar to what was done to check correlation
- Minitab:
  - Stat
  - Regression
  - Fitted Line Plot
  - Enter the column for the Y variable and the X variable
  - Ensure “Linear” is selected
Hints & tips
- Always check correlation first. Excel and Minitab can calculate a line even when there is no correlation, but that line is meaningless. Without correlation, the next y value is not truly related to the x value, so the regression line cannot predict it.
- The Excel regression function provides more information than just the linear regression line coefficients. We will discuss the other information with the appropriate hypothesis test.
- 00:04 Hello, I'm Ray Sheen.
- 00:06 The next type of hypothesis test that I wanna discuss is Simple Linear Regression.
- 00:11 This test is used both in the Analyze phase of a Lean Six Sigma project
- 00:15 to find a problem and in the Improve phase to design an optimal solution.
- 00:20 We start with the hypothesis testing decision tree.
- 00:25 If we have established that there is a correlation between
- 00:28 an independent variable and the corresponding dependent variable,
- 00:31 and we know that these are continuous variables,
- 00:33 then we can determine the linear regression line for those variables.
- 00:37 Linear regression builds on the concept of correlation.
- 00:41 Recall that correlation exists when there is a relationship between two or
- 00:44 more things.
- 00:45 For instance, when one goes up, the other goes up also.
- 00:48 This correlation relationship can be turned into a formula or
- 00:51 equation that shows the relationship.
- 00:54 As the independent factor or variable is changed,
- 00:56 the dependent variable changes according to a set rule or pattern.
- 01:01 If correlation was established, a formula can be generated and
- 01:04 that can be used to uncover the root cause or causes of a problem.
- 01:08 The defect in the dependent variable can be traced back to a certain value or
- 01:12 effect that was occurring in the independent variable.
- 01:15 This formula can also be used to design a solution in the Improve phase.
- 01:19 Since it predicts the dependent variable response, controls can be put in place on
- 01:24 the independent variable to ensure that the response is always acceptable.
- 01:29 Let's explain this idea of a simple linear regression formula a bit more.
- 01:33 The regression line is meant to be a quantified correlation relationship.
- 01:38 If there is no relationship, don't try to calculate the regression line.
- 01:42 You may be able to create a line, but it's meaningless and arbitrary.
- 01:46 If there was no correlation, any perceived interactions were just random chance.
- 01:51 We calculate this line so we can explain the behavior of the dependent variable.
- 01:56 And assuming that variable is meaningful in the overall problem we are analysing,
- 02:00 that line also allows us to predict the dependent variable
- 02:04 based upon the value of the independent variable.
- 02:07 The regression line will treat the independent variable and
- 02:10 dependent variable differently.
- 02:12 The independent variable is normally denoted by X, and
- 02:16 all of the relationship constant in some factors act on it.
- 02:20 The dependent variable is denoted by Y,
- 02:22 and it will be the result of the mathematical calculations in the equation.
- 02:27 Our software application will determine the formula for
- 02:30 that relationship that best fits the data provided.
- 02:33 It is called simple because there is only one independent variable.
- 02:37 It is called linear because the equation will be in the form of a straight line, and
- 02:41 regression refers to the fact that there is a relationship
- 02:44 between the independent and dependent variables.
- 02:47 If you think back to your algebra days,
- 02:49 the formula of that line will be in the form that Y = MX + B.
- 02:55 Now let's get into the math of that linear regression equation.
- 02:59 The equation is the best fit line for the two variables.
- 03:02 In the real world, the data points virtually never fall exactly on the line.
- 03:07 So this line is calculated so that the sum of the squared differences between the line and
- 03:11 the actual points is minimized.
- 03:13 While the math of the line would allow the line to extend
- 03:16 indefinitely in either direction, the physical world often has limitations.
- 03:21 For example, if one of the variables is elapsed time, we can't have negative
- 03:26 elapsed time, or there might be a physical limit to one of the variables.
- 03:30 For instance, the temperature of a fluid will only go up so
- 03:33 far before the fluid turns into a gas.
- 03:36 Both Excel and Minitab will calculate the regression line.
- 03:40 If you're doing it in Excel, select the Data Analysis menu in the data ribbon, and
- 03:44 then select the Regression function.
- 03:46 If you're doing it with Minitab, start with the Stat pulldown menu,
- 03:51 select Regression, and then select Fitted Line Plot.
- 03:56 Let's look at an example.
- 03:58 This example looks at high school test scores based upon the number of study hours.
- 04:03 The null hypothesis is that study hours have no impact on test scores.
- 04:08 The alternative hypothesis is that study hours do impact test scores.
- 04:13 At this point, we haven't determined whether it's a positive or a negative impact.
- 04:18 The study hours and test scores were entered into data columns,
- 04:22 and the Fitted Line Plot in Minitab was selected.
- 04:25 This is the result.
- 04:26 It definitely looks like there is a positive relationship.
- 04:29 The more students studied, the higher the score.
- 04:32 The slope of the line is the angle or rise over run.
- 04:35 That value is 3.207.
- 04:38 There's also a value for the intercept of the line with the Y axis,
- 04:43 and that is 61.25.
- 04:44 This is a great example of a regression line that has limits on it.
- 04:48 A student could not study a negative number of hours.
- 04:52 And if they studied hundreds of hours,
- 04:54 they still could not get more than 100% on the exam.
- 04:57 So when creating these regression lines, be aware of those limitations.
- 05:02 Excel and Minitab will not understand that, they'll just do the math for us.
- 05:06 We also have a Pearson coefficient of 0.934.
- 05:09 That's just about as high as you can get, and
- 05:13 the p-value is 0, definitely reject the null hypothesis.
- 05:18 There is a relationship between studying and doing well on the test.
- 05:23 By the way, for those of you planning on sitting for the ISSC Green belt or
- 05:27 Black belt exam, there is also a relationship between studying for
- 05:31 that test and the score that you achieve.
- 05:34 So if that's part of your plan, be sure to study the quizzes, exercises, and
- 05:38 reference guides that we provide.
- 05:40 The linear regression line helps us to understand the relationship between
- 05:45 the independent variable and the dependent variable.
- 05:48 We use this to both understand the problem and to plan our solution.
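The lesson notes that the regression formula can guide controls on the independent variable during the Improve phase. One way to sketch that is to invert the line, x = (y − b) / m, to find the x value that yields a target y. The Python below uses the slope and intercept from the study-hours example; the function name `x_for_target` and the target score of 90 are illustrative assumptions. The line predicts the average response and individual points scatter around it, so a real control limit would need a safety margin beyond this point estimate.

```python
SLOPE = 3.207       # from the lesson's Minitab fitted line
INTERCEPT = 61.25   # from the same fitted line


def x_for_target(target_y, slope=SLOPE, intercept=INTERCEPT):
    """Solve y = slope * x + intercept for x (the setting that yields target_y)."""
    if slope == 0:
        raise ValueError("a flat line cannot be inverted")
    return (target_y - intercept) / slope


hours_needed = x_for_target(90.0)
print(f"About {hours_needed:.1f} study hours for a predicted score of 90")
```

Here the estimate is roughly nine hours of study for an average predicted score of 90.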