Retired course
This course has been retired and is no longer supported.
About this lesson
When the process or problem data set has multiple characteristics, a set of graphing techniques can show their effects. Although more complex than the basic techniques, they are easy to use and create a picture of the data set.
Quick reference
Graphing Of Complex Data
When to use
Graphical analysis is an excellent way to visualize patterns and key insights from data. It is also a great way to communicate data to the Lean Six Sigma team and stakeholders. These techniques apply to multivariate data.
Instructions
Graphical analysis creates a picture of the data, which helps put the data into context. It makes it possible to display a great deal of data on one graph, and the patterns in the data can reveal problems and correlations between process parameters and factors. Three graphs, along with data tables, are often used with multivariate data.
Horizontal Bar Chart
The horizontal bar chart is an excellent technique for attribute data. It is similar to the vertical bar chart, but with the bars running horizontally. The categories of the data are shown on the vertical axis and are represented by rows on the chart. The horizontal axis is the count of instances for each category, not a time scale. The rows are normally sorted so that the longest bar is at the top and the shortest at the bottom. Unlike the vertical bar chart, there is no practical limit to the number of rows shown on the chart.
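As a minimal sketch of the sorting step described above, the snippet below counts instances per category and lists them largest first, drawing each row as a text bar whose length equals the count. The defect categories and the helper names `sorted_counts` and `text_bar_chart` are assumptions made for illustration; they do not come from the lesson.

```python
from collections import Counter

# Hypothetical defect categories logged for a process (illustrative data only).
defects = ["scratch", "dent", "scratch", "misalignment", "scratch", "dent"]


def sorted_counts(items):
    """Return (category, count) pairs, largest count first."""
    return Counter(items).most_common()


def text_bar_chart(pairs):
    """Render one row per category; the bar length equals the count."""
    width = max(len(name) for name, _ in pairs)
    return [f"{name.ljust(width)} | {'#' * count} {count}" for name, count in pairs]


rows = sorted_counts(defects)
chart = text_bar_chart(rows)
for line in chart:
    print(line)
```

The same sorted pairs could be passed to any charting tool; the text rendering here only keeps the example free of third-party dependencies.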
Pie Chart
The pie chart is a graphical illustration of the relative percentages of attributes or categories. The size (angle) of each slice of the pie shows the percentage associated with that attribute value. Pie charts are frequently used for comparison, such as comparing different pies for different locations or different products. I have frequently used them to compare before and after conditions in the product or process being improved.
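Because a pie chart always displays each category as a share of the whole, the raw counts have to be converted to percentages before two pies can be compared. The sketch below shows that arithmetic for a before and after comparison; the counts and the function name `slice_percentages` are hypothetical.

```python
def slice_percentages(counts):
    """Convert raw category counts to percentages of the whole."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one count must be non-zero")
    return {name: 100.0 * value / total for name, value in counts.items()}


# Hypothetical before/after defect counts (illustrative data only).
before = {"scratch": 30, "dent": 15, "misalignment": 5}
after = {"scratch": 10, "dent": 8, "misalignment": 2}

for label, counts in (("before", before), ("after", after)):
    shares = slice_percentages(counts)
    print(label, {name: round(pct, 1) for name, pct in shares.items()})
```

Note that in this made-up data every raw count falls after the improvement, yet the share for "dent" rises from 30% to 40%. This is the sense in which a pie chart shows percentages rather than raw data.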
Box Plots
The box plot chart is normally used to present a family of box plots. Each box plot represents a data set, such as one for each of multiple locations, products, or customers. The box plot requires variable data. The data values are sorted from largest to smallest, and five points in the data set are used to create the plot: the minimum value, the maximum value, the midpoint (median value), the value at the 25% point in the sorted data, and the value at the 75% point. A horizontal line is drawn that spans the spread of the data, with the minimum at one end and the maximum at the other. A box is placed over the line so that the 25% point is one side and the 75% point is the other. Finally, the median value is shown with a line through the box.
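The five values described above can be computed with Python's standard library, as in the sketch below. It assumes the "inclusive" quartile method of `statistics.quantiles`; other tools use slightly different quartile conventions and may report somewhat different 25% and 75% points. The site data and the function name `five_number_summary` are hypothetical.

```python
import statistics


def five_number_summary(values):
    """Return the five values needed to draw a box plot.

    Sort direction does not change the result, so the data is sorted ascending.
    Quartiles use the 'inclusive' method, which is an assumption of this sketch.
    """
    ordered = sorted(values)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "min": ordered[0],    # end of one whisker
        "q1": q1,             # one side of the box (25% point)
        "median": median,     # line through the box
        "q3": q3,             # other side of the box (75% point)
        "max": ordered[-1],   # end of the other whisker
    }


# Hypothetical measurements from two sites (illustrative data only).
site_a = [7, 1, 5, 3, 9]
site_b = [2, 4, 4, 6, 14]

for name, data in (("site A", site_a), ("site B", site_b)):
    print(name, five_number_summary(data))
```

Computing one summary per data set and placing the results side by side is the numeric counterpart of the family of box plots.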
Data Tables
Multivariate data items can also be shown in a table. The table is structured so that each row is a data item. Each column represents one of the categories of data – either variable or attribute. Normally, the table will be sorted from highest to lowest value using data found in one of the columns. Large tables of numbers can be difficult to read, so the units for numeric data should be chosen so that most data values have two or three digits.
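A minimal sketch of these table guidelines: each row is one data item, the table is sorted from highest to lowest on a chosen column, and a long numeric value is converted to a unit that leaves about three digits. The records, column names, and the helpers `sort_table` and `with_seconds` are all hypothetical.

```python
# Hypothetical call-center records; names and numbers are illustrative only.
associates = [
    {"name": "Associate A", "years_experience": 2, "errors": 4, "handle_time_ms": 312000},
    {"name": "Associate B", "years_experience": 9, "errors": 5, "handle_time_ms": 287000},
    {"name": "Associate C", "years_experience": 5, "errors": 3, "handle_time_ms": 305000},
]


def sort_table(rows, column, descending=True):
    """Return the rows sorted on one column, highest value first by default."""
    return sorted(rows, key=lambda row: row[column], reverse=descending)


def with_seconds(rows):
    """Add handle time in seconds so the displayed values have about three digits."""
    return [{**row, "handle_time_s": round(row["handle_time_ms"] / 1000)} for row in rows]


table = with_seconds(sort_table(associates, "years_experience"))
print(f"{'Name':<12}{'Years':>6}{'Errors':>7}{'Time (s)':>9}")
for row in table:
    print(f"{row['name']:<12}{row['years_experience']:>6}{row['errors']:>7}{row['handle_time_s']:>9}")
```

Sorting on a different column only requires changing the `column` argument, which makes it easy to try several orderings when looking for a pattern.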
Hints & tips
- When creating a horizontal bar chart with a large number of rows (40 or 50) it is often helpful to use multiple colors for the bars. However, create a pattern of colors and maintain that pattern so that people won’t think that the color is also a data attribute.
- Don’t use time-based data items with a horizontal bar chart. People try to make the horizontal axis a timeline instead of a count of instances.
- Typically limit the pie chart to about six or seven slices; otherwise, some slices are so small that they are unreadable.
- Box plots provide a visual cue for whether data sets are similar. If you are not sure about similarity, you will need to do a statistical analysis.
- Graphs are supposed to help to understand the data. Don’t let your graphs become so complex that they are confusing to read.
- 00:05 Hi, I'm Ray Sheen.
- 00:06 We've looked at some basic graphical analysis techniques.
- 00:10 Let's consider now a few that work better with more complex data.
- 00:15 I'll start with putting data into tables.
- 00:17 I know it's not really a graph, but there are a few guidelines to follow
- 00:21 to make it easier to find and understand tabular data.
- 00:24 Sometimes there are multiple correlation relationships
- 00:27 across attributes within a data set.
- 00:29 And these are often hidden when the data is in a table because all the numbers
- 00:34 just kinda blend together.
- 00:35 There's no picture of what the data is saying.
- 00:38 However, when a data point does have multiple attribute values,
- 00:42 I like to start with a data table.
- 00:44 In the table each data point is a row, and
- 00:47 the column represents the different attribute categories,
- 00:50 like in the table shown, where we have associates in a call center.
- 00:55 And we have their name, their years of experience, the number of errors or
- 01:00 defects that were created in the past month, and
- 01:03 their professional certification level.
- 01:05 Now to understand what the data is telling me,
- 01:08 I can sort the data based upon any column.
- 01:10 For instance, I could sort the table based upon an alphabetical listing of the names,
- 01:14 although that probably would not provide any useful insight.
- 01:18 However, when sorting by the number of years of experience, as shown here,
- 01:22 we can draw a few conclusions.
- 01:24 We see that the number of years of experience does appear to
- 01:27 correlate with the training and certification level, but
- 01:29 it does not appear to have any significant effect on the number of defects.
- 01:33 Apparently, neither experience nor training and
- 01:36 certification seems to affect that number.
- 01:39 One suggestion when using tables for showing data,
- 01:42 try to use units that result in only 2 or 3 significant digits.
- 01:46 Don't show data tables with long strings of numbers.
- 01:50 They become mind-numbing for most stakeholders.
- 01:53 Let's switch now to the horizontal bar chart.
- 01:56 I've already discussed the vertical bar chart, and
- 01:59 the horizontal bar chart has many similarities.
- 02:02 It is an excellent chart for quantifying instances of an attribute and
- 02:05 showing that graphically.
- 02:07 The length of the horizontal bar shows the magnitude of the attribute being measured.
- 02:12 To make it easier to read,
- 02:14 I recommend that you sort the categories from largest to smallest.
- 02:17 That way, the most significant items appear at the top of the chart.
- 02:21 One of the advantages of the horizontal bar chart over the vertical bar chart is
- 02:25 that you can exceed the eight category rule that we use with vertical charts.
- 02:30 You can have as many rows as you want.
- 02:33 I personally have used over 50 rows, and
- 02:35 I have seen horizontal bar charts with even more than that.
- 02:39 On a vertical bar chart,
- 02:40 it becomes hard to read categories when there's a large number.
- 02:43 That's not the case on a horizontal bar chart.
- 02:46 However, like with the vertical bar chart, I recommend that you use just one color.
- 02:51 I've occasionally used multiple colors so
- 02:53 that it was easier to distinguish the lines.
- 02:55 But in that case, I always repeated the same pattern for the colors so
- 02:59 that no one would start to think that different colors meant different things.
- 03:04 One caution when using the horizontal bar chart,
- 03:06 don't use it with a time-based attribute.
- 03:09 Inevitably people will start to look at the horizontal scale like it
- 03:14 is a time scale instead of a count of the number of instances.
- 03:18 Next I wanna discuss the pie chart.
- 03:20 What makes this chart a little more difficult is it is showing percentages
- 03:24 instead of the raw data.
- 03:26 Now you can build it with raw data, but the visual representation
- 03:29 of the categories is always as a percentage of the whole.
- 03:33 This chart is really good for doing comparison charting,
- 03:36 showing how the percentage of a category changes based upon different conditions,
- 03:41 such as location or product line.
- 03:43 Or comparing the before and
- 03:44 after condition of a process that you are analyzing.
- 03:48 To make it easy to read, I try to limit to about 6 slices of the pie.
- 03:53 You can go a little higher if you have roughly equal slices.
- 03:57 But if you have dominant slices,
- 03:58 the minor ones are minuscule by the time you get over 6 slices.
- 04:03 And in this case, colors are very helpful for
- 04:05 being able to discriminate between the slices.
- 04:08 As with the vertical and horizontal bar charts,
- 04:11 this chart helps to show which factors are dominant.
- 04:14 Of course, the big difference is that in this case you're using percentages.
- 04:19 The last chart to discuss is the box plot.
- 04:22 Box plots are ideal for showing comparisons between data sets because
- 04:27 they show the median and extremes of the data sets and a level of central tendency.
- 04:33 I use box plots, grouped by an attribute, to compare how data sets for different
- 04:37 locations or customers or product lines look with respect to each other.
- 04:42 The box plot quickly reveals when the data set for
- 04:45 one attribute item is very different from all the others.
- 04:49 When discussing box plots, we often talk about the box and whiskers.
- 04:53 As you can see, each data set has a box with a vertical line somewhere inside it
- 04:58 and two horizontal lines extending to either end.
- 05:01 To create the box and whiskers, five values are needed from the data set.
- 05:06 To get these five values,
- 05:07 first order the data within the set from the largest to the smallest.
- 05:11 The largest and smallest values are the two endpoints for
- 05:15 the horizontal lines, the whiskers.
- 05:17 The midpoint of the data, or median, is the vertical line that is within the box.
- 05:23 The 25% and 75% points of the data are the vertical sides of the box.
- 05:30 This plot then shows the data spread and central tendency of the data and
- 05:34 whether the data is biased towards the high end or the low end of the spread.
- 05:38 Comparing boxes for different data sets
- 05:41 can usually reveal which ones are the same and which are different.
- 05:45 In the illustration shown here, the four boxes have a similar upper value.
- 05:50 However, the lower value, midpoint, and width of the boxes are quite different.
- 05:55 The top two boxes are similar, and the bottom two boxes are also similar.
- 06:00 These additional graphical techniques can be very useful when your data sets
- 06:05 have several levels of complexity.