Locked lesson.
About this lesson
A common investigation in Lean Six Sigma problem solving is to determine if two factors are correlated. This insight will often point to an underlying cause of the problem. This lesson explains how to do correlation analysis using both Excel and Minitab. It also includes a discussion of the Pearson correlation coefficient.
Exercise files
Download this lesson’s related exercise files.
Correlation Exercise.xlsx10.6 KB Correlation Exercise Solution.docx
220.2 KB
Quick reference
Correlation
Correlation is the condition when two factors are related. When the value of one changes, the value of the other will also change.
When to use
Use correlation with continuous data. It is used to determine if the value of two factors is related.
Instructions
There are two types of correlation, positive and negative. Positive correlation is the condition where as one factor increases, the other factor increases. Negative correlation is the condition where as one factor increases the other factor decreases. In perfect correlation, the rate of increase or decrease is always the same. In real-world applications, the rate will often vary.
A hypothesis test for correlation is often used in the Analysis phase of a project to determine which factors are related. The Null hypothesis states that the factors are not related. The Alternative states they are related and sometimes will even indicate whether that relationship is positive or negative. Be careful assuming too much with correlation. Correlation does not mean causation. The factors may be related because they are moving the same way, but that movement may not be caused by one of those factors but rather by another factor that you have not included in your analysis.
The degree of correlation is measured with the Pearson correlation coefficient. The Pearson will vary from -1 to +1. Minus 1 is a perfect negative correlation. Plus 1 is perfect positive correlation. Zero is no correlation. A Pearson value above 0.8 or below -0.8 is always statistically significant. A Pearson value between -0.2 and 0.2 is not statistically significant. If the Pearson value is between -0.8 and -0.2 or 0.2 and 0.8 you should also check the P value to determine significance.
Correlation is often illustrated by using a scatterplot or scatter diagram of the two factors. Positive correlation will have an upward slope. Negative correlation will have a downward slope. No correlation will either be a scatter with no obvious relationship or a horizontal or vertical line, showing that one factor is not impacting the other.
Both Excel and Minitab will check correlation. There must be the same number of values in each data column or row. The columns or rows should be aligned so that it is clear which values go with which point.
- Excel:
- Data Analysis
- Correlation
- Enter the range of the independent and dependent variables.
- Minitab:
- Stat
- Basic Statistics
- Correlation
- Select the data columns that you want to check for correlation.
Hints & tips
- Use the scatter diagram or scatter plot graphing function to illustrate the correlation.
- Minitab provides a P value in addition to the Pearson value. Excel does not.
- When working with Minitab, data must be in columns, not rows.
- 00:04 Hi, I'm Ray Sheen.
- 00:05 Let's discuss correlation and the use of correlation tests.
- 00:09 Now, notice that, within the term correlation is the word relation.
- 00:14 And that's exactly what we're looking for here.
- 00:16 How are things related to each other?
- 00:19 So let's start with the hypothesis test decision tree.
- 00:22 When we have two continuous data points, both the dependent variable and
- 00:27 the independent variable,
- 00:28 then we need to consider how many independent variables we are analyzing.
- 00:32 Each independent variable can be analyzed with respect to the dependent variable
- 00:37 to determine if there is a correlation between them.
- 00:41 Let me explain what we mean by correlation and its effect on the dependent variable.
- 00:46 We say that a correlation exists when there is a mutual and measurable
- 00:50 relationship between the independent variable and the dependent variable.
- 00:55 Correlation is investigated to identify what factors need to be controlled
- 01:00 when conducting a Lean Six Sigma project.
- 01:02 When the question is correlation,
- 01:05 the null hypothesis is that there is no correlation, nothing to see here.
- 01:10 The alternative hypothesis is that there is a correlation between the two
- 01:14 variables.
- 01:15 A positive correlation exists when the two factors move together.
- 01:19 As one gets bigger, the other one also increases.
- 01:23 And a negative correlation exists when the factors move in opposite direction.
- 01:27 As the first one gets bigger, the second one decreases in value.
- 01:31 One important caution when doing correlation.
- 01:34 Correlation does not mean causation.
- 01:38 The dependent variable may be moving because of a change in the independent
- 01:43 variable, but it's definitely possible that both of these are moving
- 01:47 because of a change in some other factor that was not included in the analysis.
- 01:53 For example, we could find that there's a correlation between the number of shoes
- 01:57 a person owns, and the person's wealth.
- 01:59 But that doesn't mean that if you go out and buy a lot of shoes,
- 02:03 you will automatically become wealthy.
- 02:05 Let's look at the Pearson correlation coefficient.
- 02:08 We can measure the level of correlation using a statistical measure that is called
- 02:13 the Pearson correlation coefficient.
- 02:15 That coefficient is referred to as the R value, R for
- 02:19 the full population and a small r for a subset.
- 02:23 The Pearson coefficient takes on a value between -1 to +1.
- 02:29 +1 is perfect positive correlation, and -1 is perfect negative correlation.
- 02:35 Of course a value of 0 means there is no correlation whatever,
- 02:38 between the independent and dependent variable.
- 02:41 When the value of the Pearson coefficient is squared,
- 02:44 it is known as the coefficient of determination.
- 02:47 This is a measure of the variance of the data points from the linear
- 02:52 regression line that is mathematically predicting the correlation.
- 02:56 I'll spend an entire lesson on how to calculate that line.
- 03:00 But for now, though, we want to acknowledge that the R squared value
- 03:05 is often used as an additional measure of the level of correlation.
- 03:10 Let's look at the graphical representation of correlation.
- 03:13 The graphical display of the relationship between the dependent variable and
- 03:18 independent variable,
- 03:20 is one of the best ways to communicate when a correlation exists.
- 03:23 I found that that visual display really tells a story for me.
- 03:28 The old saying of a picture's worth a 1000
- 03:31 words is definitely true when you're trying to create a picture of a statistic.
- 03:35 If the Pearson coefficient is greater than 0.8 on a positive or
- 03:40 less than -0.8 on the negative side, I reject the null hypothesis.
- 03:45 If it's less than 0.2 on the positive, or
- 03:49 greater than -2 on the negative correlation, then the null is true.
- 03:54 And if it's between the 0.2 and the 0.8 values, well,
- 03:58 then I go check the p-value to make my final decision.
- 04:02 If I saw a plot where the data points created something like the top graph,
- 04:07 I would expect a positive correlation and a positive Pearson value.
- 04:12 If the graph looked like the second one,
- 04:14 I would expect a negative Pearson coefficient.
- 04:17 And if there was no correlation at all, the plot would be all over the place.
- 04:21 I would expect a Pearson value that shows no quarrel correlation.
- 04:25 And also that would be true if it was a flat horizontal or vertical line,
- 04:30 it would also have a Pearson value of 0.
- 04:33 The visual plots are created using the scatter diagram graphing approach.
- 04:37 Both Excel and Minitab will calculate the Pearson coefficient for you.
- 04:42 In fact, let's take a look at how we would do that in Excel and MiniTab.
- 04:47 In Excel, you would choose the data analysis menu from the data ribbon, and
- 04:52 then select the correlation function.
- 04:54 Now enter the range of your data, both the dependent variable and
- 04:58 the independent variable.
- 05:00 These two data sets need to be in adjacent rows or columns.
- 05:05 Make sure you have the exact same number of data values in each set, and
- 05:09 that they're aligned so that the data points are adjacent.
- 05:12 Then click OK.
- 05:15 It's just as easy in Minitab.
- 05:17 Select the Stat pull down menu, select Basic Statistics, and
- 05:21 then select Correlation.
- 05:23 This will pull up this panel.
- 05:25 Now, select select the data columns for analysis, one for the x value,
- 05:29 one for the y value.
- 05:31 These don't have to be adjacent in Minitab,
- 05:33 you can select each column separately.
- 05:36 Just highlight that column name in the window on the left, and
- 05:39 then click on the Select button that is below it.
- 05:42 When that happens, the column name should appear in the variables window.
- 05:46 Again, make sure there are exactly the same number of data points in both
- 05:51 columns.
- 05:51 Minitab will check for correlation based upon
- 05:54 pairing the two values row by row as it goes down the columns.
- 05:58 And whether you're using Excel or Minitab,
- 06:01 the Pearson coefficient will then be calculated.
- 06:04 When two variables are correlated, we know something is happening there.
- 06:08 Correlation may not give us the final answer, but it puts us on the right track.
Lesson notes are only available for subscribers.
PMI, PMP, CAPM and PMBOK are registered marks of the Project Management Institute, Inc.