About this lesson
There are three types of data: variable, attribute, and ranked. Each type is useful in measuring process performance and analyzing process problems, but they need to be treated differently.
Quick reference
Data Types
There are three types of data: variable, attribute, and ranked. Each type is useful in measuring process performance and analyzing process problems, but they need to be treated differently.
When to use
Whenever data is collected or analyzed – which can happen in all phases of a Lean Six Sigma project – the differentiation between the data types should be clear so that the data is handled correctly.
Instructions
Data is the set of real-world facts about the problem or process that is being studied. Statistics are the numerical interpretation of those facts. We will spend many lessons discussing ways to interpret the data. However, it is important to recognize that depending upon the category of the data, different types of statistical analysis can be done.
There are three categories of data: attribute, variable, and ranked. All three are meaningful for our analysis.
Attribute data is used to indicate the status or condition of the process or problem. It is often expressed in descriptive terms such as: on/off, true/false, Brazil/Argentina/Chile, First place/Second place/Third place, or Good/Better/Best. It is often easier to collect than variable data because typically there are no measurements to be made; it is just a check of the status. Attribute data may be expressed in numerical terms, but it is still a status. For instance, on/off may be represented as 1/0.
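As a minimal sketch of how numerically coded attribute data is handled, the Python snippet below counts instances of each status instead of averaging them. The `readings` list and the 1/0 coding are illustrative assumptions, not data from the lesson.

```python
from collections import Counter

# Hypothetical status checks for a switch: 1 = on, 0 = off.
# The numbers are status indicators, not measurements.
readings = [1, 1, 0, 1, 0, 1, 1]

# Attribute data is analyzed by counting instances of each status.
status_counts = Counter(readings)
on_count = status_counts[1]
off_count = status_counts[0]

print(f"on: {on_count}, off: {off_count}")
```

Counting keeps the analysis within the defined statuses. A value between 0 and 1 would have no meaning for this kind of data.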
Variable data is data measured along some scale. Because it is along a scale, it can take on many values, essentially an infinite number of values depending upon the ability of the scale to discriminate. Variable data is considered to be richer than attribute data because in the analysis we can see values moving and drifting, and that movement can be analyzed. Examples of variable data include time, temperature, distance, dimensions, and percentages.
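To illustrate the kind of analysis variable data supports, here is a short Python sketch that computes a few descriptive statistics. The `cycle_times` values are hypothetical measurements, assumed to be in minutes.

```python
import statistics

# Hypothetical cycle times in minutes, measured on a continuous scale.
cycle_times = [12.1, 11.8, 12.4, 12.0, 12.7]

mean_time = statistics.mean(cycle_times)
median_time = statistics.median(cycle_times)
std_dev = statistics.stdev(cycle_times)  # sample standard deviation

print(f"mean: {mean_time:.2f} min")
print(f"median: {median_time:.2f} min")
print(f"standard deviation: {std_dev:.2f} min")
```

The mean here does not match any individual measurement. It is still a meaningful value because the data lies on a continuous scale.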
Ranked data is a combination of variable and attribute data. The variable data points are sorted in ascending or descending order, and then categories are applied along the scale of the data. The number of data points that lie within each category is counted. In this way, the variable data is converted into a count that can be used in graphical data analysis.
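The sort, categorize, and count steps described above might look like the following Python sketch. The `durations` values, the 3-minute interval width, and the `category` helper are all assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical call durations in minutes, in the order they were recorded.
durations = [4.2, 1.5, 7.8, 3.3, 9.1, 2.6, 5.0, 6.4]

# Step 1: sort the values, ignoring the order of collection.
sorted_durations = sorted(durations)

# Step 2: impose categories with a standard interval (assumed 3 minutes wide).
BIN_WIDTH = 3.0

def category(value: float) -> str:
    """Return the label of the half-open interval [lower, upper) holding value."""
    lower = int(value // BIN_WIDTH) * BIN_WIDTH
    return f"{lower:.0f}-{lower + BIN_WIDTH:.0f} min"

# Step 3: count the data points that fall in each category.
counts = Counter(category(v) for v in sorted_durations)

for label, n in counts.items():
    print(f"{label}: {n}")
```

The resulting integer counts are what a histogram or similar chart would plot.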
Hints & tips
- When collecting variable data, be sure you are using consistent units. There have been some disastrous analyses because one team member measured in meters and another in feet, and the two data sets were combined without converting to common units.
- During the Measure phase, more data is good. Collect any and all data, even if you don’t know whether you will need it or not. When you get to the Analyze phase, it may become very valuable.
- Some in the Lean Six Sigma community treat attribute or ranked data with disdain because it is not as rich. However, it is much easier to collect. In addition, when the problem is a special cause problem – meaning a unique or isolated event – the attribute data will often provide a clearer picture of what is happening. Attribute data can identify massive changes very quickly, whereas variable data is much more powerful for finding trends and slow changes. Ranked data is most useful with graphical data analysis.
- 00:04 Hi, this is Ray Sheen.
- 00:06 During the measure phase, we collect data in order to understand the problem.
- 00:10 And I've been talking a lot about data, so let's just take a few moments and
- 00:14 understand what we mean by that term.
- 00:16 Data characteristics help us understand how to work with the data.
- 00:20 Data is information and facts about the world.
- 00:24 In our case, it will be facts about things that are happening within or
- 00:28 around the process or problem that we're investigating.
- 00:32 Now, we can differentiate data from statistics.
- 00:35 Statistics are not facts, but rather, statistics are an analysis of facts.
- 00:41 These statistics give us the opportunity to interpret
- 00:45 the characteristics of the data concerning the problem or process.
- 00:50 Data that is continuous, meaning that the value can take on virtually
- 00:54 an infinite number of values, is characterized by descriptive statistics.
- 00:59 Which would include things like the mean, the standard deviation, and
- 01:03 the shape of the data.
- 01:05 More about that in another lesson.
- 01:07 By contrast, statistics for discrete data, which means data that
- 01:12 can only take on a relatively small number of predefined values,
- 01:16 will be analyzed based on the number of instances of the defined data values and
- 01:22 some graphical types, such as the Pareto analysis.
- 01:26 Let's dig deeper into attribute data.
- 01:29 Attribute data is a count of things.
- 01:32 It's normally much easier to collect because most of the time,
- 01:36 there is no need for measuring equipment.
- 01:38 The data describes things like a particular status.
- 01:41 Examples would be true or false, on or off.
- 01:45 The sequence of items, first, second, or third, a count of occurrences,
- 01:49 such as the number of defects, number of calls.
- 01:52 Which country, Brazil, Argentina, or Chile, which product,
- 01:57 A, B, or C, or a status of excellent, good, fair, or poor.
- 02:02 Attribute data has a limited number of possible values.
- 02:06 The switch is either on or off.
- 02:08 The manufacturing facility is either in Brazil or Argentina.
- 02:12 The car is either red, blue, or green.
- 02:16 Now you can't say that attribute data is any data that's not numeric,
- 02:20 because we'll often use a number to reflect the status or attribute.
- 02:24 On is 1, off is 0.
- 02:26 First place finish is a number 1, second place is 2, and
- 02:30 the third place finish is number 3.
- 02:33 But even though we may use numbers,
- 02:35 there is no meaningful information between the number values.
- 02:39 If on is 1, and off is 0, then what does 0.37 indicate?
- 02:44 The question doesn't even make sense.
- 02:46 This is because the numbers are not measurable values, but
- 02:50 rather are status indicators.
- 02:52 Now, let's look at variable data.
- 02:54 Variable data is considered by statisticians to be much richer
- 02:58 information.
- 02:59 There are many statistical analyses that can be done with variable data.
- 03:04 Variable data exists on a continuous scale.
- 03:07 That means there are fractions and decimals between the major values.
- 03:11 Examples of this kind of data are time, distance, temperature,
- 03:15 almost any dimension on a product, such as length, width, or height.
- 03:20 Percentages are also variable, and
- 03:22 this is one of the ways that we can transpose attribute data into variable
- 03:27 data by determining the percentage in each of the attribute categories.
- 03:31 Variable data can take on an infinite number of values.
- 03:35 It's limited only by the precision of the measuring equipment.
- 03:39 Which means that there can be meaningful data values between data points.
- 03:43 Data values like the median or average value may be unique from any of the data
- 03:49 points, but it's still a very significant number in some of our analyses.
- 03:55 One caution when collecting variable data from multiple data sets,
- 03:59 be sure the measurement systems are in the same units.
- 04:02 So if you're measuring temperature, make sure you know if it is centigrade or
- 04:06 Fahrenheit.
- 04:06 If measuring distance, is it feet, meters, miles, or furlongs?
- 04:11 Finally, let's consider ranked data.
- 04:14 This is data that starts as variable data or ordinal data but
- 04:18 is transposed into count data.
- 04:21 To do this, take all of the data values and
- 04:23 sort them in ascending or descending order.
- 04:27 Data often comes sorted in the timestamp of when they were collected or recorded.
- 04:31 Well, in this case, we ignore that timestamp and
- 04:34 we just take the value of the data itself.
- 04:37 Once sorted, categories are imposed upon the data.
- 04:40 Typically, these categories reflect either a real-world attribute or
- 04:45 are selected so as to have standard intervals.
- 04:48 The number of data points that fall in each category is then recorded.
- 04:53 This integer count of data points can be helpful for
- 04:57 pattern recognition or visual data analysis.
- 05:01 In your project, you'll be collecting a lot of data,
- 05:04 some of it will be attribute and some will be variable, both will be very useful.
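The transcript mentions converting attribute data into variable data by determining the percentage in each attribute category. A minimal Python sketch of that conversion follows, assuming a hypothetical list of inspection results named `results`.

```python
from collections import Counter

# Hypothetical inspection results recorded as attribute data.
results = ["good", "good", "poor", "fair", "good", "excellent", "fair", "good"]

# Count the instances of each status.
counts = Counter(results)
total = len(results)

# Convert each count into a percentage of all observations.
percentages = {status: 100 * n / total for status, n in counts.items()}

for status, pct in percentages.items():
    print(f"{status}: {pct:.1f}%")
```

Tracked over successive samples, these percentages can be analyzed as values on a continuous scale.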