About this lesson
When the process or problem data set has multiple characteristics, there is a set of graphing techniques that can show these effects. Although more complex than the basic techniques, they are easy to use and create a picture of the data set.
Quick reference
Graphing of Complex Data
When the process or problem data set has multiple characteristics, there is a set of graphing techniques that can show these effects. Although more complex than the basic techniques, they are easy to use and create a picture of the data set.
When to use
Graphical analysis is an excellent way to visualize patterns and key insights from data. Graphical analysis is also an excellent way to communicate data to the Lean Six Sigma team and stakeholders. These techniques should be used whenever discussing data with team members or stakeholders.
Instructions
Graphical analysis creates a picture of the data which helps to put the data into context. Graphical analysis can display a great deal of data on one graph and the patterns in the data can reveal problems and correlation between process parameters and factors. Three graphs are often used with multi-variate data.
Horizontal Bar Chart
The horizontal bar chart is an excellent technique with attribute data. It is similar to the vertical bar charts but with the bars running horizontally. The categories of the data are shown on the vertical axis and are represented by rows on the chart. The horizontal axis is the count of instances for the categories – not a time scale. The rows of data are normally sorted so that the longest data bar is at the top and the shortest at the bottom. Unlike the vertical bar chart, there is no limit to the number of rows shown on the chart.
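The counting and sorting behind a horizontal bar chart can be sketched in a few lines of Python. This is only an illustration, not a prescribed tool: the function names and the defect categories are hypothetical, and the bars are drawn as text so that no charting library is needed.

```python
from collections import Counter


def horizontal_bar_rows(observations):
    """Count each category and return (category, count) rows, longest bar first."""
    counts = Counter(observations)
    # Sort by count, largest first; break ties alphabetically so the order is stable.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def render_bars(rows, symbol="#"):
    """Draw one text row per category; the bar length is the count of instances."""
    label_width = max(len(category) for category, _ in rows)
    return "\n".join(
        f"{category.ljust(label_width)} | {symbol * count} {count}"
        for category, count in rows
    )


# Hypothetical defect categories, made up for illustration.
defects = ["scratch", "dent", "scratch", "misprint", "scratch", "dent"]
rows = horizontal_bar_rows(defects)
print(render_bars(rows))
```

Because each category is a row, adding more categories only makes the chart taller, which is why the row count is not limited the way it is on a vertical bar chart.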
Pie Chart
The pie chart is a graphical illustration of the relative percentages of attributes or categories. The angle of each slice, and therefore its area, shows the percentage associated with that attribute value. Pie charts are frequently used for comparisons, such as comparing different pies for different locations or different products. I have also used them for comparing before and after conditions in the product or process being improved.
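A pie chart can be built from raw counts, but what it displays is always each category's share of the whole. The sketch below shows that conversion, and it also lumps the smallest categories into a single slice when there are too many to read. The function name, the six-slice default, the "Miscellaneous" label, and the complaint counts are all assumptions chosen for illustration.

```python
def pie_slices(counts, max_slices=6, other_label="Miscellaneous"):
    """Convert raw counts to percentage slices, merging the smallest categories."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    # Rank categories from largest to smallest count.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    if len(ranked) > max_slices:
        # Keep the largest (max_slices - 1) categories and merge the rest.
        kept = ranked[: max_slices - 1]
        remainder = sum(count for _, count in ranked[max_slices - 1:])
        ranked = kept + [(other_label, remainder)]
    return [(label, 100.0 * count / total) for label, count in ranked]


# Hypothetical complaint counts, made up for illustration.
complaints = {
    "late delivery": 40, "damaged": 25, "wrong item": 15, "billing": 10,
    "rude staff": 4, "website": 3, "packaging": 2, "returns": 1,
}
slices = pie_slices(complaints)
for label, pct in slices:
    print(f"{label:15s} {pct:5.1f}%")
```

Running the same function on the before and after counts for a process gives two sets of percentages that can be compared slice by slice.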
Box Plots
The box plot chart is normally used to present a family of box plots. Each box plot represents one data set, such as the data for one of several locations, products, or customers. The box plot requires variable data, and the data sets are normally separated from each other based on attribute data.

Five values from each data set are used to create its box plot: the minimum value, the maximum value, the midpoint (median value), the value at the 25% point in the sorted data, and the value at the 75% point in the sorted data. To find them, sort the data values from largest to smallest. A horizontal line is drawn that spans the spread of the data, with one end at the minimum and the other at the maximum. A box is placed over the line so that one side is at the 25% point and the other side is at the 75% point. Finally, the median value is shown with a line through the interior of the box.

In some cases, a few points may be shown as outliers, each marked with an “x” beyond the end of the horizontal line. To identify the outliers, determine the magnitude of the spread from the 25% point to the 75% point and multiply that value by 1.5. Any point that is more than this amount below the 25% point or above the 75% point is an outlier. Show each such point with an “x” and move the endpoint of the horizontal line to the first data point that is within the calculated range. While I have described box plots that are oriented horizontally, they can also be oriented vertically.
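The five values and the 1.5 times rule for outliers can be worked through in code. The following is a minimal sketch using Python's standard `statistics` module. The function name and the sample cycle times are hypothetical, and the "inclusive" quartile method is an assumption: different tools calculate the 25% and 75% points in slightly different ways, so your software may give slightly different box edges.

```python
import statistics


def box_plot_summary(values):
    """Return the whisker ends, box edges, median, and outliers for one data set."""
    data = sorted(values)
    # Assumption: the "inclusive" quartile method; other conventions exist.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    spread = q3 - q1                      # distance from the 25% to the 75% point
    low_limit = q1 - 1.5 * spread
    high_limit = q3 + 1.5 * spread
    inside = [v for v in data if low_limit <= v <= high_limit]
    outliers = [v for v in data if v < low_limit or v > high_limit]
    return {
        "whisker_low": inside[0],         # first point that is not an outlier
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_high": inside[-1],       # last point that is not an outlier
        "outliers": outliers,             # these would be drawn as an "x"
    }


# Hypothetical cycle times in minutes, made up for illustration.
cycle_times = [12, 14, 15, 15, 16, 17, 18, 19, 20, 41]
summary = box_plot_summary(cycle_times)
print(summary)
```

Calling the function once per location, product, or customer gives the family of summaries that a box plot chart would draw side by side.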
Data Tables
Multi-variate data items can also be shown in a table. The table is structured so that each row is a data item. Each column represents one of the categories of data – either variable or attribute. Normally, the table will be sorted from highest to lowest value using data found in one of the columns. Large tables of numbers can be difficult to read, so the units for numeric data should be chosen so that most data values have two or three digits.
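Sorting a table by one column and choosing readable units are both simple transformations. The sketch below assumes each row is a Python dictionary. The helper names, the associates, and every value in the table are hypothetical and exist only to show the mechanics.

```python
def sort_table(rows, column, descending=True):
    """Return a new list of rows ordered by the values in one column."""
    return sorted(rows, key=lambda row: row[column], reverse=descending)


def rescale(rows, column, factor, new_column):
    """Add a column in larger units so values read as two or three digits."""
    return [{**row, new_column: round(row[column] / factor)} for row in rows]


# Hypothetical call-center rows, made up for illustration.
associates = [
    {"name": "Avery", "years": 2, "errors": 7, "talk_seconds": 342000},
    {"name": "Blake", "years": 9, "errors": 6, "talk_seconds": 298800},
    {"name": "Casey", "years": 5, "errors": 8, "talk_seconds": 385200},
]

# Sort by experience, highest first, and report talk time in hours, not seconds.
table = rescale(sort_table(associates, "years"), "talk_seconds", 3600, "talk_hours")
for row in table:
    print(f'{row["name"]:<8}{row["years"]:>6}{row["errors"]:>8}{row["talk_hours"]:>12}')
```

Re-sorting the same rows by a different column is a quick way to look for a relationship between that column and the others before committing to a graph.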
Hints & tips
- When creating a horizontal bar chart with a large number of rows (40 or 50) it is often helpful to use multiple colors for the bars. However, create a pattern of colors and maintain that pattern so that people won’t think that the color is also a data attribute.
- If the categories on a bar chart are “buckets” or a range on a sliding scale, the graph may look different depending on the width of the “bucket.” Try various widths to see which provides the most insight.
- Don’t use time-based data items with a horizontal bar chart. People try to make the horizontal axis a timeline instead of a count of instances.
- Typically, limit the pie chart to about six or seven slices; otherwise, some slices are so small that they are unreadable.
- Box plots provide a visual cue for whether data sets are similar. If you are not sure about similarity, you will need to do a statistical analysis.
- Typically, box plots are oriented vertically when the x values are discrete and the y value is continuous, and horizontally when the y value is discrete and the x values are continuous.
- If your graph has data points that sit directly on top of each other so that the top one hides all below it, apply a jitter function to the output so as to create clusters of data points.
- Graphs are supposed to help us understand the data. Don’t let your graphs become so complex that they are confusing to read.
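Two of the hints above, bucket width and jitter, are easier to see with a small example. The sketch below is one possible way to do each, not a standard implementation. The function names, the uniform random offset used for the jitter, and the sample wait times are assumptions made for illustration.

```python
import random
from collections import Counter


def bucket_counts(values, width):
    """Group numeric values into buckets of the given width and count each one."""
    if width <= 0:
        raise ValueError("width must be positive")
    # Each value is assigned to the bucket whose lower edge is a multiple of width.
    counts = Counter((value // width) * width for value in values)
    return dict(sorted(counts.items()))


def jitter(values, amount, seed=None):
    """Return copies of the values, each shifted by a small random offset."""
    rng = random.Random(seed)
    return [value + rng.uniform(-amount, amount) for value in values]


# Hypothetical wait times in minutes, made up for illustration.
wait_times = [3, 4, 4, 7, 8, 8, 8, 12, 13, 19]
narrow = bucket_counts(wait_times, 5)    # buckets start at 0, 5, 10, 15
wide = bucket_counts(wait_times, 10)     # buckets start at 0, 10
print(narrow)
print(wide)

# Three identical readings would plot as one dot; jitter spreads them into a cluster.
plotted = jitter([8, 8, 8], amount=0.2, seed=1)
print(plotted)
```

The same ten values give four bars with the narrow buckets and only two with the wide ones, which is the effect the bucket hint warns about. Keep the jitter amount small relative to the axis scale so that the cluster still reads as a single location on the graph.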
- 00:05 Hi, I'm Ray Sheen.
- 00:06 We've looked at some basic graphical analysis techniques.
- 00:10 Let's consider now a few that work better with more complex data.
- 00:14 I'll start with the horizontal bar chart.
- 00:16 I've already discussed the vertical bar chart.
- 00:19 The horizontal bar chart has many similarities.
- 00:22 It's an excellent chart for counting instances of an attribute and
- 00:25 showing them graphically.
- 00:26 The length of the horizontal bar shows the magnitude of an attribute being counted.
- 00:32 To make it easy to read,
- 00:34 I recommend that you sort the categories from largest to smallest.
- 00:38 That way the most significant items are at the top of the list.
- 00:42 One of the advantages of the horizontal bar chart over the vertical chart is that
- 00:46 you can exceed the 8-category rule that we had suggested for vertical bar charts.
- 00:51 You can have as many rows as you want.
- 00:54 I personally have used over 50 rows, and
- 00:56 I have seen horizontal bar charts with more than 100.
- 00:59 On a vertical bar chart, it's very difficult to read the different categories
- 01:03 when there are a large number of bars.
- 01:05 That's not the case with the horizontal bar chart.
- 01:08 We've just extended the chart onto the next page and the next page.
- 01:12 However, like with the vertical bar chart, I recommend you use just one color.
- 01:18 I have occasionally used multiple colors so
- 01:20 that it's easier to distinguish the lines.
- 01:23 But in that case, I always repeat the same color pattern so that no one begins
- 01:28 to think that the colors have significant meaning in and of themselves.
- 01:33 One caution when using the horizontal bar chart,
- 01:35 don't use it with time-based attribute data.
- 01:38 Inevitably, people begin to think that the length of the bar represents the amount of
- 01:43 time rather than the attribute that you're dealing with.
- 01:46 Next, I want to discuss the pie chart.
- 01:49 What makes this chart a little more difficult is that
- 01:52 it shows percentages instead of raw data.
- 01:55 Now, you can build it with raw data, but the visual representation of
- 01:59 the categories is always as a percentage of the whole.
- 02:02 This chart is really good for comparison charting,
- 02:05 showing how the percentage of a category changes based upon
- 02:09 different conditions such as location, product line, or comparing the before and
- 02:14 after conditions on a process that you've been analyzing and improving.
- 02:18 To make it easy to read, I try to limit the number of slices to six.
- 02:25 You can go a little higher if they are roughly equal slices.
- 02:28 But if you have several dominant ones, the minor ones are so
- 02:32 small that you might as well just lump them together and call it miscellaneous.
- 02:38 And in this case, colors are very helpful for
- 02:41 being able to distinguish between the different size of the slices.
- 02:45 As with the vertical and the horizontal bar charts,
- 02:48 this chart helps to show what factors are dominant.
- 02:51 Of course, the big difference is that this chart uses percentages.
- 02:56 A very interesting chart to discuss is the box plot.
- 03:00 Box plots are ideal for
- 03:02 showing the comparison between datasets because they show the median and
- 03:07 extreme values of each dataset and a level of central tendency.
- 03:11 I use box plots with attribute data to compare how datasets from
- 03:15 different locations or customers or product lines compare to each other.
- 03:21 The box plot quickly reveals when one data set for
- 03:24 an attribute is very different from all the others.
- 03:27 When discussing box plots, we often talk about the box and whiskers.
- 03:31 As you can see, each data set has a box with a vertical line somewhere inside it,
- 03:36 and then the two horizontal lines extending out of either end.
- 03:40 To create the box and whiskers, five values are needed from the data set.
- 03:45 To get these five values, first order the data set from largest to smallest.
- 03:51 The largest and smallest values are normally the two endpoints for
- 03:57 the horizontal lines, the whiskers.
- 04:00 The midpoint of the data, or median, is the vertical line inside the box.
- 04:03 The 25% and 75% points of the dataset are the vertical sides of the box.
- 04:11 This then shows the data spread, the central tendency of the data and
- 04:15 whether the data is biased towards the high end or the low end of the spread.
- 04:20 Notice in some cases that there are a few x's that are outside the box and whiskers.
- 04:25 When a data point lies outside the box by more than one and a half times the span between the 25%
- 04:31 value and 75% value, it is classified as an outlier.
- 04:36 These values are shown as an x and the end of the whiskers is now placed
- 04:40 at the first data point that is not an outlier.
- 04:44 Comparing boxes for different data sets can visually reveal which ones
- 04:48 are the same and which ones are different.
- 04:51 In the illustration shown here, all four boxes have a similar upper value.
- 04:55 However, the lower values, midpoints, and width of the boxes are quite different.
- 05:00 The top two boxes are similar and the bottom two boxes are similar.
- 05:05 Box plots can also be graphed vertically in addition to
- 05:08 the horizontal depiction that we have here.
- 05:10 Now a few caveats about data plots.
- 05:13 With respect to bar charts, the data categories are normally obvious
- 05:17 categories that are related to the problem.
- 05:20 Sometimes, instead of discrete categories,
- 05:23 a range of data values will be measured and grouped together.
- 05:27 Depending upon the interval used,
- 05:30 different pictures of the data can be exposed.
- 05:33 Now let me talk for a minute about the box plots that use the y value and
- 05:36 some of its attributes.
- 05:38 When the y value is a continuous function, but the x values are discrete,
- 05:43 we normally orient the box plot in a vertical fashion.
- 05:47 When the y value is discrete but
- 05:49 the x values are continuous, then we normally orient it horizontally.
- 05:54 Finally, a jitter function can be very helpful with some plots.
- 05:57 The jitter function is used when a dataset is likely to have multiple points
- 06:02 with the same value.
- 06:04 When that is the case, many of the data points are hidden by other points.
- 06:09 The jitter function will vary the point value slightly so
- 06:12 that instead of a single point, there is now a cluster of points.
- 06:16 This provides a visual cue that there are many data points at that location on
- 06:21 the diagram.
- 06:22 Finally, let's look at putting data into tables.
- 06:25 I know it's not really a graph, but there are a few guidelines to follow to make it
- 06:30 easier to find and understand tabular data.
- 06:32 Sometimes there are multiple correlation relationships across attributes within
- 06:37 a data set.
- 06:38 These can be hidden when the data is in the table because all the numbers tend to blur
- 06:42 together.
- 06:43 There are no pictures of what the data is saying.
- 06:46 However, when a data set does have multiple attribute values,
- 06:50 I like to start with a data table.
- 06:52 In the table, each data point is a row, and
- 06:55 the columns represent the different attribute categories.
- 06:59 Like in the table shown here, where we have associates in a call center and
- 07:03 we have their names, years of experience, the number of errors or defects they
- 07:07 created in the past month, and their professional certification level.
- 07:11 Now, to understand what the data is telling us,
- 07:14 I can sort the data based upon any of these columns.
- 07:18 For instance,
- 07:19 I could sort the table based upon the alphabetical listing of the names.
- 07:24 But that doesn't really provide any useful insight in the data.
- 07:28 However, when sorting by the number of years of experience as shown here,
- 07:32 we can draw a few conclusions.
- 07:34 We see that the number of years of experience does appear to correlate with
- 07:38 the training and certification level, but
- 07:40 it does not appear to have any significant effect on the number of defects.
- 07:45 Apparently, neither experience, training, nor
- 07:48 certification seems to affect the number of defects.
- 07:51 One suggestion when using tables for showing data is to try to use units
- 07:56 that result in table values with only two or three significant digits.
- 08:01 Don't show data values that have long strings of numbers.
- 08:05 They become mind-numbing for almost all the stakeholders.
- 08:09 These additional graphical techniques can be very useful when your data sets
- 08:13 have several levels of complexity.