Quick reference
Descriptive Statistics
Lean Six Sigma methodology relies heavily on statistical analysis of problems and solutions. A single data point is not sufficient; a collection of data is needed for analysis. This collection will have some natural variability within it, and descriptive statistics explain the boundaries of that variability.
When to use
Descriptive statistics are used whenever there is a data set to be analyzed. That will occur in the Measure, Analyze, Improve, and Control phases.
Instructions
While a single data point is interesting, a set of data values for a process parameter provides a much richer and more complete picture of that aspect of the process. In fact, the more data points the more accurate the picture. However, a large set of numbers is awkward to work with, so the data set is described using a set of standard statistical measures. These descriptive statistics are used throughout the Lean Six Sigma process.
The first three statistics are often used to describe the central tendency of the data set.
Mean – the average value of the data set. It is calculated by adding all the values in the data set and dividing that sum by the number of data values. It is often expressed as (x̄). The mean can be heavily influenced by outlier values.
Median – the middle point in the data set. Order the data set from smallest value to largest value. The center point is the median. If the data set has an even number of data points, the average of the two center points is the median. This is a better measure of central tendency when the data is skewed or there are outliers.
Mode – the most frequently occurring value within the data set. This statistic is seldom used in Lean Six Sigma.
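As an illustration of the three central tendency measures, here is a minimal Python sketch using the standard library `statistics` module. The `restore_minutes` values are hypothetical numbers invented for this example, not data from the lesson. The second half adds one extreme outlier to show how much more the mean moves than the median.

```python
from statistics import mean, median, mode

# Hypothetical service-restoration times in minutes (illustrative values only).
restore_minutes = [30, 45, 45, 50, 60, 75]

avg = mean(restore_minutes)            # sum of the values / number of values
mid = median(restore_minutes)          # even count: average of the two center values
most_common = mode(restore_minutes)    # most frequently occurring value

print(f"mean={avg:.2f} median={mid} mode={most_common}")

# One extreme outlier pulls the mean far more than the median.
with_outlier = restore_minutes + [600]
avg_outlier = mean(with_outlier)
mid_outlier = median(with_outlier)

print(f"with outlier: mean={avg_outlier:.2f} median={mid_outlier}")
```

In this made-up data set the median barely changes when the outlier is added, which is why the median is preferred for skewed data.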
The next three statistics are used to describe some aspect of the span or width of the data set.
Range – the span of the data set. It is calculated by subtracting the smallest data value from the largest data value.
Deviation – the span from the average value of the data set to a specific data point. Deviation is always associated with a specific point, not the entire data set.
Standard Deviation – the square root of the average of the squared deviations. This value provides a measure of the width of the data set that accounts for the central tendency and the full range of the data. This statistic is often represented by the Greek letter sigma, σ.
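The three span measures can be sketched in Python as follows. The `data` values are hypothetical and chosen so the arithmetic is easy to follow by hand. The calculation divides by the number of data points, as described above, which is the population standard deviation (`statistics.pstdev`); note that the sample standard deviation (`statistics.stdev`) divides by one less than the number of points and would give a slightly larger value.

```python
from math import sqrt
from statistics import mean, pstdev

# Hypothetical data set (illustrative values only).
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# Deviation: distance from the mean, one value per data point.
x_bar = mean(data)
deviations = [x - x_bar for x in data]

# Standard deviation: square each deviation, average the squares
# (dividing by the number of data points), then take the square root.
std_dev = sqrt(sum(d ** 2 for d in deviations) / len(data))

print(f"range={data_range} mean={x_bar} deviations={deviations}")
print(f"standard deviation={std_dev}")

# The manual result agrees with the library's population standard deviation.
library_std_dev = pstdev(data)
```

Here the mean is 5, the squared deviations sum to 32, and 32 divided by 8 data points is 4, so the standard deviation is 2.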
Hints & tips
- Know these definitions and how to find these values. You will be using them often.
- 00:05 Hi, I'm Ray Sheen and it's time to lay a foundation for
- 00:08 the Lean Six Sigma Statistics.
- 00:11 Descriptive Statistics will be the building blocks that we'll start with.
- 00:15 Statistics are a way to describe a data set.
- 00:19 So, let's start there.
- 00:21 A data set is a collection of data points associated with a problem or
- 00:24 process parameter.
- 00:26 While one data point is interesting, a data set which is made up of many data points
- 00:30 would provide a better description of that process parameter.
- 00:34 In fact, the more data points you have, the more complete the picture.
- 00:38 When working with variable data, there are some data set characteristics
- 00:42 that can accurately describe that process parameter.
- 00:46 Those characteristics are mean, median,
- 00:48 mode, range, deviation and standard deviation.
- 00:53 Many times the data set will be a combination of variable and
- 00:56 attribute data.
- 00:57 So for instance in this table we have the attribute data element of a region, East,
- 01:02 North and West, or the reason for the service outage.
- 01:05 But then we have the variable data element of how much time until service was restored
- 01:11 and how many customers were affected.
- 01:14 Let's first discuss mean, median and mode.
- 01:17 All three of these are statistics that tell us something about the central
- 01:20 tendency or the most common value for the data set.
- 01:24 First is the mean or average value.
- 01:27 This is often represented by an x with a bar over it and called xbar.
- 01:32 The mean is very easy to calculate.
- 01:34 Just add all the data values and divide that sum by the number of data points.
- 01:39 The mean is our favorite measure when we have normal data.
- 01:41 And I'll talk about that more in other sessions.
- 01:45 One caution with the mean.
- 01:46 When there are large outliers in the data set,
- 01:49 meaning points that are extremely high or extremely low,
- 01:53 the mean can be significantly influenced by those few points.
- 01:58 Next is the median.
- 01:59 This is the middle point in the dataset.
- 02:01 To find the median, you must first take all the values in your dataset and
- 02:06 put them in order by size from the smallest to the largest.
- 02:10 Then, if there is an odd number of points, the middle point is the median.
- 02:14 If there are an even number of points,
- 02:16 the median is the average of the two center values.
- 02:20 The median is used when we have non-normal or skewed data since it provides
- 02:25 a better indication of central tendency than the mean does in that case.
- 02:31 The last measure is the mode.
- 02:33 This is the easiest one to find since it is just the data value that occurs
- 02:37 most frequently.
- 02:38 While easy to find, this value is seldom used with Lean Six Sigma statistics.
- 02:44 So let's shift gears and now consider range, deviation and standard deviation.
- 02:49 Whereas the mean and median were looking at the center of the distribution,
- 02:52 these statistics look at the edges of the distribution.
- 02:55 They will tell us something about the span, or
- 02:57 the width of the data from lowest to highest.
- 03:01 The first one is the range.
- 03:03 This is normally easy to determine.
- 03:05 Again, we order the data points in the data set from lowest to highest.
- 03:09 Subtract the value of the lowest from the highest, and you have the range.
- 03:13 It is the distance between the two extreme values in the data set.
- 03:18 Range is the distance from minimum to maximum.
- 03:21 Deviation is also the distance between two values.
- 03:25 However, in this case it is the difference between the value of a data point and
- 03:29 the mean or average of all the data points in the data set.
- 03:33 So deviation is always associated with a specific data point.
- 03:38 And it tells us how close that point is to the center of the data.
- 03:42 If the deviation was zero,
- 03:43 it would mean that data point was exactly equal to the mean of the data set.
- 03:48 Now, information about a single data point is interesting.
- 03:51 What we really want is a statistic that describes the entire data set.
- 03:55 And that is why we often use the standard deviation in our statistical analysis.
- 04:00 The standard deviation provides a sense of the size of the expected data range.
- 04:06 This is calculated by taking each deviation,
- 04:09 which I just discussed a moment ago, and squaring each of them.
- 04:12 That means to multiply the deviation by itself.
- 04:15 Then add all of those squared deviations and divide by the number of data points.
- 04:21 Now finally, we take the square root of the result of that division.
- 04:26 This is the standard deviation calculation and
- 04:29 it's normally represented by the Greek symbol, sigma.
- 04:32 A standard deviation is an excellent statistic for
- 04:36 estimating the normal range or span of data that will occur for a data set.
- 04:41 In fact, the six sigma methodology derived its name
- 04:44 from the standard deviation sigma.
- 04:46 The goal of six sigma was to create a process with a standard deviation
- 04:51 that was so small that a range of -6 sigma to +6 sigma could fit
- 04:56 within the allowable tolerance limits for the process.
- 05:02 These basic descriptive statistics will be referred to again and
- 05:06 again throughout this course.
- 05:08 And they will be covered exhaustively on the IASSC exam.
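The six sigma goal described in the transcript, a span of -6 sigma to +6 sigma that fits within the allowable tolerance limits, could be checked with a sketch like the one below. The function name `fits_six_sigma`, its parameters, and the example numbers are all hypothetical and introduced only for illustration. It also assumes the span is centered on the process mean, which the lesson does not state explicitly.

```python
def fits_six_sigma(process_mean, sigma, lower_limit, upper_limit):
    """Return True if mean - 6*sigma .. mean + 6*sigma lies inside the tolerance limits.

    Hypothetical helper: assumes the +/- 6 sigma span is centered on the process mean.
    """
    low_edge = process_mean - 6 * sigma
    high_edge = process_mean + 6 * sigma
    return lower_limit <= low_edge and high_edge <= upper_limit


# Illustrative numbers only: a target of 10.0 with tolerance limits 9.6 to 10.4.
tight_process = fits_six_sigma(10.0, 0.05, 9.6, 10.4)  # span is roughly 9.7 to 10.3
wide_process = fits_six_sigma(10.0, 0.10, 9.6, 10.4)   # span is roughly 9.4 to 10.6

print(tight_process, wide_process)
```

With these made-up limits, the process with the smaller standard deviation fits inside the tolerance and the one with the larger standard deviation does not.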