Retired course
This course has been retired and is no longer supported.
About this lesson
Exercise files
Download this lesson’s related exercise files.
- Correlation.xlsx (10.4 KB)
- Correlation - Solution.docx (220.4 KB)
Quick reference
Correlation
Correlation is the condition in which two factors are related: when the value of one changes, the value of the other tends to change as well.
When to use
Use correlation with continuous data to determine whether the values of two factors are related.
Instructions
There are two types of correlation: positive and negative. Positive correlation is the condition where, as one factor increases, the other factor also increases. Negative correlation is the condition where, as one factor increases, the other factor decreases. In perfect correlation, the rate of increase or decrease is always the same. In real-world applications, the rate will often vary.
A hypothesis test for correlation is often used in the Analyze phase of a project to determine which factors are related. The null hypothesis states that the factors are not related. The alternative hypothesis states that they are related, and sometimes it will even indicate whether that relationship is positive or negative. Be careful about assuming too much from correlation. Correlation does not mean causation. Two factors may be related because they move together, but that movement may be caused not by either of those factors but by another factor that you have not included in your analysis.
The degree of correlation is measured with the Pearson correlation coefficient. The coefficient ranges from -1 to +1. Minus 1 is perfect negative correlation. Plus 1 is perfect positive correlation. Zero is no correlation. Correlation is often illustrated with a scatterplot, or scatter diagram, of the two factors. Positive correlation will have an upward slope. Negative correlation will have a downward slope. No correlation will appear either as a scatter with no obvious relationship or as a horizontal or vertical line, showing that one factor is not impacting the other. (Strictly, the coefficient cannot be calculated for a perfectly flat line, because one factor has no variation.)
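The calculation behind the Pearson coefficient can be sketched in a few lines of Python. This is a minimal illustration rather than what Excel or Minitab run internally; the function name `pearson_r` and the sample data are hypothetical, invented only to show the +1, -1, and flat-line cases described above.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient for paired data."""
    if len(xs) != len(ys):
        raise ValueError("both factors need the same number of values")
    n = len(xs)
    if n < 2:
        raise ValueError("at least two paired points are required")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        # Flat line: one factor never changes, so the formula divides by zero
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, invented for illustration only
hours = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]
falling = [10, 8, 6, 4, 2]
flat = [7, 7, 7, 7, 7]

print(pearson_r(hours, rising))   # 1.0  (perfect positive correlation)
print(pearson_r(hours, falling))  # -1.0 (perfect negative correlation)
print(pearson_r(hours, flat))     # nan  (no variation in one factor)
```

Returning not-a-number for the flat line is a design choice in this sketch: the formula has no defined value there, and in practice that result is read as "no usable correlation" rather than as a measured zero.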
Both Excel and Minitab can calculate correlation. There must be the same number of values in each data column or row. The columns or rows should be aligned so that the two values at each position form one paired data point.
- Excel:
  - Data Analysis
  - Correlation
  - Enter the range of the independent and dependent variables.
- Minitab:
  - Stat
  - Basic Statistics
  - Correlation
  - Select the data columns that you want to check for correlation.
Hints & tips
- Use the scatter diagram or scatter plot graphing function to illustrate the correlation.
- Minitab provides a p-value in addition to the Pearson value. Excel's Correlation tool does not.
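When only the Pearson value is available, the significance test has to be done as a separate step. One common approach, shown here as an assumption rather than as the lesson's own method, converts r into a t statistic with n - 2 degrees of freedom. The function name `correlation_t_statistic` and the example inputs are hypothetical.

```python
import math

def correlation_t_statistic(r, n):
    """Convert a Pearson r from n paired points into a t statistic.

    Returns the t statistic and its degrees of freedom (n - 2).
    """
    if n < 3:
        raise ValueError("need at least three paired points")
    if abs(r) >= 1:
        raise ValueError("t is unbounded when r is exactly +1 or -1")
    df = n - 2
    t = r * math.sqrt(df / (1 - r ** 2))
    return t, df

# Hypothetical example: r = 0.6 measured on 27 paired points
t, df = correlation_t_statistic(0.6, 27)
print(f"t = {t:.2f} with {df} degrees of freedom")
```

The resulting t value (about 3.75 here) can be compared against a t table, or converted to a two-sided p-value in Excel with a function such as `T.DIST.2T`, to decide whether to reject the null hypothesis of no correlation.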
- 00:04 Hi, I'm Ray Sheen.
- 00:06 Let's now discuss correlation and the use of the correlation tests.
- 00:10 Notice that within the word correlation is the term relation.
- 00:15 And that's just what we're looking for here, how are things related?
- 00:19 That's the purpose of this testing.
- 00:21 So let's start with a hypothesis test decision tree.
- 00:26 When we have normal data that is continuous,
- 00:28 both the dependent variable and the independent variable,
- 00:30 then we need to consider how many independent variables we are analyzing.
- 00:35 Each independent variable can be analyzed with respect to the dependent variable
- 00:39 to determine if there is a correlation between them.
- 00:42 Let me explain what we mean by correlation and
- 00:44 its effect upon the dependent variable.
- 00:47 We say that a correlation exists when there is a measurable relationship between
- 00:51 the independent variable and the dependent variable.
- 00:54 Correlation is investigated to identify what factors need to be controlled in
- 00:58 conducting a Lean Six Sigma project.
- 01:01 When the question is correlation,
- 01:02 the null hypothesis is that there is no correlation, nothing to see here.
- 01:07 The alternative hypothesis is that there is a correlation
- 01:10 between the two variables.
- 01:12 A positive correlation exists when two factors move together.
- 01:16 As one gets bigger, the other one also increases.
- 01:20 And a negative correlation exists when they move in opposite directions.
- 01:24 As one gets bigger, the other gets smaller in value.
- 01:27 One important caution when doing correlation,
- 01:30 correlation does not mean causation.
- 01:33 The dependent variable may be moving because of a change in
- 01:36 the independent variable.
- 01:37 But it's definitely possible that both of these variables
- 01:41 are actually moving because of a change in a totally different factor
- 01:44 that is not included in the analysis.
- 01:47 For instance, we could find that there is a correlation between the number of shoes
- 01:50 a person owns and that person's income.
- 01:53 That does not mean that if you go out and buy a lot of shoes,
- 01:56 you'll automatically become wealthy.
- 01:59 Let's look at the Pearson correlation coefficient.
- 02:02 We can measure the level of correlation using a statistical measure that is called
- 02:06 the Pearson Coefficient.
- 02:08 This coefficient is referred to as the R value.
- 02:11 The Pearson Coefficient takes on a value between -1 and +1.
- 02:16 +1 is perfect positive correlation, -1 is perfect negative correlation.
- 02:21 And 0 is no correlation whatever between the independent and dependent variables.
- 02:26 When the value of the Pearson Coefficient is squared,
- 02:29 it is then known as the coefficient of determination.
- 02:32 This is a measure of how much of the variation in the data points is explained by the linear
- 02:36 regression line that mathematically models the relationship.
- 02:39 I'll spend more time in another lesson on how to calculate that line.
- 02:43 For now, though, we want to acknowledge that the R² value is often used
- 02:48 as an additional measure of the level of correlation.
- 02:51 Both Excel and Minitab will calculate the Pearson Coefficient.
- 02:54 I'm showing the Minitab selection screens here, but
- 02:57 let's walk through the step-by-step analysis.
- 03:00 It's easy to check correlation in both Excel and Minitab.
- 03:04 In Excel, you will choose the data analysis menu from the data ribbon and
- 03:08 then select the correlation function.
- 03:11 Now enter the range of your data, both the dependent variable and
- 03:14 the independent variable.
- 03:15 These two datasets should be in adjacent rows or columns.
- 03:19 Make sure you have the exact same number of data points in each set and
- 03:22 that they're aligned so that the paired data points are adjacent.
- 03:26 It's just as easy in Minitab.
- 03:28 Select the Stat pull-down menu, select Basic Statistics and
- 03:32 then select Correlation,
- 03:33 as I showed on the previous slide.
- 03:36 This will pull up this panel.
- 03:38 Select the data columns for analysis.
- 03:40 These don't need to be adjacent, you can select each column separately.
- 03:44 Just highlight the column name in the window on the left and
- 03:48 then click on the select button that is below that window.
- 03:51 When that happens, the column name should appear in the variable window.
- 03:55 Again, make sure there are exactly the same number of data points in both
- 03:58 columns.
- 03:59 Minitab will check for the correlation based upon the pairing of
- 04:02 these two values, row by row as it goes down the column.
- 04:06 Whether Excel or Minitab, the Pearson Coefficient will be calculated.
- 04:09 Let's finish this lesson with a discussion of the graphical representation of
- 04:14 correlation.
- 04:15 The graphical display of the relation between the dependent variable and
- 04:18 the independent variable is one of the best ways to communicate that
- 04:22 a correlation exists.
- 04:23 In fact, I often will start with a visual check of different data items.
- 04:28 And when I see what looks like a visual correlation, I will then test it with
- 04:31 the statistical analysis of the relationship or lack of relationship.
- 04:35 Let me show you what I mean.
- 04:36 If I saw a plot where the data points created something like this,
- 04:40 I would expect a positive correlation and a positive Pearson value.
- 04:44 But if the plot looked like this, I would expect a negative correlation and
- 04:48 a negative Pearson value.
- 04:50 If there was no correlation at all, the plot would be all over the place and
- 04:54 I would expect that the Pearson would show no correlation.
- 04:58 If the plot is a flat horizontal line or
- 05:00 a vertical line, there is also no correlation to report, because one variable never changes.
- 05:04 Visual correlation plots are easy to create.
- 05:07 Both Excel and Minitab refer to them as scatter diagrams or scatter plots.
- 05:12 Each data point has an x and y value based upon the values of the independent and
- 05:16 dependent variable, and the points are plotted based upon their value.
- 05:20 I have found that creating the visual correlation plots is one of the best
- 05:23 ways for communicating to team members or
- 05:25 business leaders the relationship between variables.
- 05:29 The old saying that a picture is worth 1,000 words is definitely true
- 05:32 when we can create a picture of a statistic.
- 05:36 When two variables are correlated, we know that something is happening.
- 05:40 Correlation may not give us the final answer, but
- 05:43 at least it starts us down the right trail.
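The coefficient of determination mentioned at 02:26 can be checked numerically. The sketch below fits a least-squares line, computes R² as the share of variation in the dependent variable that the line accounts for, and confirms that it matches the squared Pearson value. The function names and the five data points are hypothetical, and the line fit shown is one standard method, not necessarily the one covered in the later lesson.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope and intercept for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(xs, ys):
    """Share of the variation in ys accounted for by the fitted line."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    # Residual variation around the line vs. total variation around the mean
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired data, invented for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print(f"r = {r:.3f}, r squared = {r ** 2:.3f}, R2 from line = {r_squared(x, y):.3f}")
```

For this data the Pearson value is roughly 0.775 and both versions of R² come out at about 0.6, meaning the fitted line accounts for about 60% of the variation in y.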