About this lesson
Simple linear regression analysis creates an equation that describes the relationship between two correlated factors. This equation helps in understanding problems, and it can also be used to manage the problem or process going forward. This lesson shows how to calculate this line with either Excel or Minitab.
Exercise files
- Simple Linear Regression Exercise.xlsx
- Simple Linear Regression Exercise Solution.docx
Quick reference
Simple Linear Regression
Simple linear regression is the creation of a formula that shows the straight-line relationship between two correlated variables. One variable is the independent variable and the other is the dependent variable.
When to use
Once correlation is established between two factors that are continuous variables, a simple linear regression formula can be created. This formula can be used to determine the effect of the independent variable on the dependent variable during the Analyze phase, and to predict process performance when designing a solution during the Improve phase.
Instructions
Once correlation has been established between two continuous variables, a simple linear regression line can be determined. This line has the form:
y = a + bx + ε
In this equation, “x” is the independent variable and “y” is the dependent variable. “b” is the slope of the line; it quantifies the relationship, giving the change in y for each one-unit change in x. “a” is the y-intercept, the value of y when x is zero; it anchors the line so that it gives the correct values when used for prediction. “ε” is the error term, or residual: the vertical distance between an actual data point and the line. The “best fit” line is the one that minimizes the sum of the squared residuals, and for that line the residuals sum to zero.
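The “best fit” described above is usually found by ordinary least squares. The Python sketch below is a minimal illustration, assuming made-up data and hypothetical helper names (`fit_line`, `residuals`, `sse`); it is not how Excel or Minitab are implemented internally. It computes b from the deviations of x and y about their means, then a from the means, and checks that the residuals sum to (essentially) zero and that moving the line away from the fit increases the sum of squared residuals.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of cross-products and squares of deviations from the means
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("x values must not all be equal")
    b = s_xy / s_xx        # slope
    a = y_bar - b * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return a, b


def residuals(xs, ys, a, b):
    """Residual for each point: actual y minus the y predicted by the line."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def sse(xs, ys, a, b):
    """Sum of squared residuals, the quantity least squares minimizes."""
    return sum(e ** 2 for e in residuals(xs, ys, a, b))


# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(xs, ys)
print(f"y = {a:.3f} + {b:.3f}x")
print("sum of residuals:", sum(residuals(xs, ys, a, b)))
# Nudging the slope away from the fitted value increases the SSE
print(sse(xs, ys, a, b) < sse(xs, ys, a, b + 0.1))
```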
In real life, the actual data points are seldom exactly on the line, but when there is high correlation, the data points will be close to the line.
While this line is valuable both for investigating root causes and for predicting performance when designing a solution, it has some limitations. Mathematically, the line extends to infinity in either direction. In reality, there are almost always limits. For instance, an analysis found that the amount of time a student studied was correlated with the student’s score on the test, so a simple linear regression line was created. However, a student cannot study for a negative amount of time, so the lower limit on the independent variable was zero. Furthermore, a student could not score better than perfect on the test, so the upper limit on the dependent variable was 100%.
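A fitted line does not know about these limits, so they have to be enforced when the line is used. The sketch below is one way to do that in Python, assuming a hypothetical helper `predict_bounded`: it rejects x values outside the valid range and caps predictions at the limits of y. The coefficients 61.25 (intercept) and 3.207 (slope) are taken from the study-hours example in the video portion of this lesson; the limits (0 hours, 100%) come from the same example.

```python
def predict_bounded(x, a, b, x_min=None, x_max=None, y_min=None, y_max=None):
    """Predict y = a + b*x, rejecting x outside its valid range and capping y.

    The limits are supplied by the analyst; the regression math itself
    knows nothing about them. (Hypothetical helper, for illustration.)
    """
    if x_min is not None and x < x_min:
        raise ValueError(f"x={x} is below the lower limit {x_min}")
    if x_max is not None and x > x_max:
        raise ValueError(f"x={x} is above the upper limit {x_max}")
    y = a + b * x
    if y_min is not None:
        y = max(y, y_min)
    if y_max is not None:
        y = min(y, y_max)
    return y


# Study-hours example: hours cannot be negative; a score cannot exceed 100%
A, B = 61.25, 3.207

print(predict_bounded(5, A, B, x_min=0, y_max=100))   # about 77.285
print(predict_bounded(20, A, B, x_min=0, y_max=100))  # 125.39 would exceed 100, so it is capped at 100
try:
    predict_bounded(-2, A, B, x_min=0, y_max=100)
except ValueError as err:
    print(err)
```

Capping y is a design choice; in some processes it is safer to raise an error or flag the prediction as an extrapolation instead.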
Both Excel and Minitab will determine the coefficients needed to create a simple linear regression line.
- Excel:
- Data Analysis (on the Data ribbon; requires the Analysis ToolPak add-in)
- Regression
- Enter the Input Y Range and Input X Range, similar to what was done to check correlation
- Minitab:
- Stat
- Regression
- Fitted Line Plot
- Enter the column for the Y variable and the X variable
- Ensure “Linear” is selected
Hints & tips
- Always check correlation first. Excel and Minitab can calculate a line even when there is no correlation, but that line is meaningless. Without correlation, the next y value is not truly related to the x value, so the regression line cannot predict it.
- The Excel regression function provides more information than just the linear regression line coefficients. We will discuss the other information with the appropriate hypothesis test.
- This solution is a linear regression line. The best fit may be a non-linear curve, which will be discussed in another lesson. Check the p-value and a graph of your data to determine whether a linear regression is the best fit.
- 00:04 Hi, I'm Ray Sheen.
- 00:05 The next type of hypothesis testing to discuss is simple linear regression.
- 00:11 This is useful both in the analysis phase to understand what's happening,
- 00:16 but also in the improve phase of a Lean Six Sigma project to design a solution.
- 00:21 We start with a hypothesis testing decision tree.
- 00:24 If we've established that there is a relationship between a continuous
- 00:28 independent variable and a continuous dependent variable,
- 00:31 then we can determine if there is a linear regression line for these variables.
- 00:36 Linear regression builds on the concept of correlation.
- 00:40 Recall that correlation exists when there is a relationship between two or
- 00:44 more things.
- 00:45 For instance, when one goes up the other goes down.
- 00:49 The correlation relationship can be turned into a formula or
- 00:53 equation that shows the relationship.
- 00:55 As the independent factor or variable is changing,
- 00:59 the dependent variable changes according to a set rule or pattern.
- 01:04 If correlation was established, the formula can be generated, and
- 01:08 we can then use that to uncover the root cause or causes of a problem.
- 01:12 The defect in the dependent variable can be traced back to a certain value or
- 01:17 effect that was occurring in the independent variable.
- 01:20 This formula can also be used to design the solution in the Improve phase.
- 01:24 Since it depicts the dependent variable response, controls can be set up on
- 01:29 the independent variable to ensure that the response is always acceptable.
- 01:33 Let's explore this idea of a simple linear regression formula a bit more.
- 01:38 The regression line is meant to quantify the correlation relationship.
- 01:42 If there is no relationship, don't try to calculate a regression line.
- 01:46 You may be able to actually create a line,
- 01:49 but it's meaningless because there is no relationship.
- 01:53 If there was no correlation,
- 01:55 any perceived interactions would be just due to random chance.
- 02:00 We calculate this line so we can explain the behavior of the dependent variable.
- 02:05 And assuming that the variable is meaningful to the overall problem,
- 02:08 we can then use that line to predict the dependent variable based
- 02:13 upon the value of the independent variable.
- 02:15 The regression line will treat the independent variable and
- 02:19 dependent variable differently.
- 02:21 The independent variable is normally denoted as x, and
- 02:25 all the relationship constants and factors act on it.
- 02:28 The dependent variable, denoted as y, will be the result of
- 02:33 the mathematical calculations against the x in the equation.
- 02:37 Our software applications will determine
- 02:40 the form of that relationship that best fits the data provided.
- 02:43 It's called a simple regression, because there's only one independent variable.
- 02:48 It's a linear regression because the formula is a straight line.
- 02:52 And regression refers to the fact that there is a relationship between
- 02:56 the independent and dependent variable.
- 02:58 The formula of the line that we'll be using is in the form y = a + bx,
- 03:04 and sometimes a term for the residuals.
- 03:08 Now, let's get into the math of the linear regression equation.
- 03:12 The equation is the best fit line for two variables.
- 03:16 In the real world, the variables virtually never fall exactly on a line.
- 03:21 So the line is calculated so that the sum of the squared differences between the line and
- 03:26 the actual points is minimized.
- 03:28 We would love for it to be 0, and we'll get as close to that as possible.
- 03:33 While the math of this line would allow the line to extend indefinitely in
- 03:38 either direction, the physical world often has limitations.
- 03:42 For instance, if one of the variables is elapsed time,
- 03:46 we can't have negative elapsed time.
- 03:48 Or there might be a physical limit to one of the variables.
- 03:52 For instance, temperature of a fluid will only go up so
- 03:56 far before the fluid turns into a gas.
- 03:59 Both Excel and Minitab will calculate the regression line for us.
- 04:04 If doing it in Excel, select Data Analysis on the Data ribbon,
- 04:08 and then select the Regression function.
- 04:10 If doing it in Minitab, start with the Stat pulldown menu,
- 04:14 select Regression, and then select Fitted Line Plot, as shown in this panel.
- 04:20 Let's look at an example.
- 04:22 This example uses high school test scores based upon the number of study hours.
- 04:28 The null hypothesis is that study hours have no impact on test scores.
- 04:33 The alternative hypothesis is that the study hours did impact test scores.
- 04:38 At this point, we haven't determined if it is a positive or a negative effect.
- 04:43 The study hours and test scores will be entered into the data columns, and
- 04:47 when the fitted line plot on Minitab is selected, this is the result.
- 04:50 It definitely looks like there is a positive relationship.
- 04:54 The more study hours, the higher the score.
- 04:57 The slope of the line is the rise over the run.
- 05:02 This value is 3.207.
- 05:05 There is also a value for the intercept of the line with the y-axis,
- 05:09 that is, the value of y when x is 0, and that's 61.25.
- 05:13 This is a great example of a regression line that has limits on it.
- 05:17 A student could not study a negative number of hours.
- 05:21 And if they studied hundreds of hours,
- 05:23 they still could not score more than 100% on the test.
- 05:27 So when creating these regression lines, be aware of these limitations.
- 05:31 Excel and Minitab will not understand that; they'll just do the math.
- 05:35 We also have a Pearson coefficient of 0.934, and the p-value is effectively 0.
- 05:41 Definitely reject the null hypothesis.
- 05:44 There is a relationship between studying and doing well on the test.
- 05:49 And, by the way, for those of you planning to sit for the IASSC Green Belt or
- 05:53 Black Belt exam, there is also a relationship between studying for
- 05:57 that test and the score you achieve.
- 05:59 So if that is part of your plan, be sure to study the quizzes, exercises, and
- 06:04 reference guides.
- 06:05 The linear regression line can help us to understand the relationship between
- 06:09 the independent variable and the dependent variable.
- 06:12 It's useful both in analysis and when planning our solution.
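The video reports a Pearson coefficient and a p-value for the study-hours example. For a simple linear regression, testing the null hypothesis that the slope is zero is equivalent to testing for zero correlation, and both use the t statistic below, compared against a t distribution with n - 2 degrees of freedom. Here β is the true slope, b its estimate from the data, SE(b) its standard error, r the Pearson coefficient, and n the number of data pairs. The lesson does not state how many students were in the sample, so no numeric value is computed here.

```latex
H_0:\ \beta = 0 \qquad H_1:\ \beta \neq 0

t = \frac{b}{SE(b)} = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}, \qquad \text{df} = n - 2
```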