About this lesson
Attribute data Gage R&R studies generate a number of metrics used to judge the performance of a measurement system that evaluates pass/fail attribute data. In this lesson, each of these metrics is described and its method of calculation is explained.
Exercise files
Download this lesson’s related exercise files.
Attribute Data Gage R&R Calculations.xlsx (11.7 KB)
Attribute Data Gage R&R Calculations - Solution.xlsx (15.5 KB)
Quick reference
Attribute Data Gage R&R Calculations
The formulas and calculations for attribute data Gage R&R studies are simple counts of comparisons across trials, between appraisers, and against the reference standard, with the counts converted to percentages.
When to use
The attribute data Gage R&R calculations are done once all the study data has been collected, in order to determine the repeatability, reproducibility, and accuracy measurements.
Instructions
Attribute data Gage R&R studies rely upon data that is in the form of pass/fail, yes/no, or good/bad. Only two states are possible: the desired state and an undesirable state.
Attribute data Gage R&R studies have many of the components seen in variable data Gage R&R studies. These include two or three appraisers and two or three trials. One difference is that attribute data studies require a minimum of 20 items, and 30 is definitely better.
- In order to assess the system performance, some items need to be acceptable and some unacceptable. Ideally these would be split 50% of one type and 50% of the other, although a 70-30 split will still provide acceptable study results. And of course, the items are inspected in a random order.
- One other input normally used with attribute data Gage R&R studies, although it does not help in the calculation of repeatability and reproducibility, is a reference standard giving the true value for each item in the study. This reference is used for accuracy calculations that are done in parallel with the Gage R&R. The results are analyzed by comparing assessments across trials, across appraisers, and against the standard. Minitab will also use the Chi-Square statistical tool to gain even more insight.
- Repeatability is the within-appraiser measure. An individual repeatability score is determined for each appraiser: the number of items on which that appraiser matched their own assessment across all trials, divided by the total number of items. The system repeatability is the average of the individual repeatability scores. The target for repeatability is 90% or better, with the marginal range being 80% to 90%. (A calculation sketch follows this list.)
- Reproducibility is the between-appraiser measure. It checks two aspects of the assessment: first, did each appraiser in a pair agree with themselves on all assessments of an item, and second, did the two appraisers agree with each other on all their assessments of that item. Count how many item and appraiser-pair combinations meet both conditions, then divide by the number of items times the number of appraiser pairs. The target percentage and marginal zone are the same as for repeatability.
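To make the counting concrete, here is a minimal Python sketch of the repeatability and reproducibility calculations. The appraiser names, items, and pass/fail data are hypothetical, and the study is far smaller than the 20-30 items a real study needs; the logic follows the definitions in the two bullets above.

```python
from itertools import combinations

# Hypothetical mini-study: 3 appraisers x 5 items x 2 trials ("P"/"F").
# A real study would use 20-30 items; the names and data are invented.
results = {
    "A": [["P", "P"], ["F", "F"], ["P", "F"], ["F", "F"], ["P", "P"]],
    "B": [["P", "P"], ["F", "F"], ["P", "P"], ["F", "F"], ["P", "P"]],
    "C": [["P", "P"], ["F", "P"], ["P", "P"], ["F", "F"], ["P", "P"]],
}
n_items = 5

# Repeatability (within appraiser): fraction of items on which an
# appraiser gave the same answer on every trial; the system value is
# the average of the individual scores.
individual = {
    name: sum(len(set(trials)) == 1 for trials in items) / n_items
    for name, items in results.items()
}
system_repeatability = sum(individual.values()) / len(individual)

# Reproducibility (between appraisers): for each item and appraiser
# pair, both appraisers must agree with themselves and with each other
# on all trials; divide the count by (items x appraiser pairs).
pairs = list(combinations(results, 2))
agreements = sum(
    set(results[a][i]) == set(results[b][i]) == {results[a][i][0]}
    for a, b in pairs
    for i in range(n_items)
)
reproducibility = agreements / (n_items * len(pairs))

print(f"Repeatability: {system_repeatability:.0%}, "
      f"Reproducibility: {reproducibility:.0%}")
```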
Attribute data Gage R&R studies include the reference value for each item. This allows the study to examine not only precision (repeatability and reproducibility) but also accuracy.
- Accuracy is the percentage of times an appraiser made the correct assessment of an item. It is the total number of correct assessments divided by the product of the number of items, trials, and appraisers. Once again we use a target of greater than 90% and a marginal zone of 80% to 90%.
- False Alarm is the condition in which an appraiser assessed an item to be a “Fail” when the reference standard rates it a “Pass.” This means an appraiser is rejecting a good item. The acceptable rate for this is 5% or less, and the marginal zone is 5% to 10%.
- Miss Rate is the opposite condition. It occurs when an appraiser assesses an item to be a “Pass” when its true value is a “Fail.” This is more dangerous to the customer, since it means a flawed output could be delivered to a customer who expects the product or service to be correct. The acceptable rate for this is less than 2%, and the marginal zone is 2% to 5%.
- Effectiveness is the final accuracy and precision measurement. It is done at the item level and is determined by counting the number of items where every appraiser's assessments agreed on all trials with the true value from the reference standard. This count is divided by the number of items to create a percentage effectiveness. The target is once again greater than 90%, and the marginal zone is 80% to 90%. (A calculation sketch for these accuracy measures follows this list.)
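The accuracy-related measures can be sketched in the same way. The data and the reference list of true values are again hypothetical. Note the denominators: accuracy uses every inspection, while the false alarm and miss rates divide by the number of inspections performed on good and bad items respectively.

```python
# Same hypothetical data as the sketch above, plus an assumed reference
# standard giving the true status of each item.
results = {
    "A": [["P", "P"], ["F", "F"], ["P", "F"], ["F", "F"], ["P", "P"]],
    "B": [["P", "P"], ["F", "F"], ["P", "P"], ["F", "F"], ["P", "P"]],
    "C": [["P", "P"], ["F", "P"], ["P", "P"], ["F", "F"], ["P", "P"]],
}
reference = ["P", "F", "P", "F", "P"]  # assumed true value per item
n_appraisers, n_trials = len(results), 2

# Flatten to (assessment, true value) pairs, one per inspection.
all_calls = [
    (call, reference[i])
    for items in results.values()
    for i, trials in enumerate(items)
    for call in trials
]

# Accuracy: correct assessments / (items x trials x appraisers).
accuracy = sum(call == truth for call, truth in all_calls) / len(all_calls)

# False alarm rate: "Fail" calls on truly good items, divided by the
# number of inspections performed on good items.
n_good = reference.count("P")
false_alarm = (sum(call == "F" and truth == "P" for call, truth in all_calls)
               / (n_good * n_trials * n_appraisers))

# Miss rate: "Pass" calls on truly bad items, divided by the number of
# inspections performed on bad items.
n_bad = reference.count("F")
miss = (sum(call == "P" and truth == "F" for call, truth in all_calls)
        / (n_bad * n_trials * n_appraisers))

# System effectiveness: items where every appraiser matched the
# reference on every trial, divided by the number of items.
effective = sum(
    all(call == reference[i]
        for items in results.values()
        for call in items[i])
    for i in range(len(reference))
)
effectiveness = effective / len(reference)

print(f"Accuracy: {accuracy:.0%}, False alarm: {false_alarm:.0%}, "
      f"Miss: {miss:.0%}, Effectiveness: {effectiveness:.0%}")
```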
Hints & tips
- You can see how important it is that the data is recorded for the correct item.
- Be sure to get at least a 70-30 split or better for the items – however, those that are bad should not be so obviously bad that the appraiser does not even need to assess the item with the measurement system.
- Basic repeatability and reproducibility do not need the reference standard, but every time I have run this type of study, I have included the reference standard so I could calculate the accuracy measurements also.
- 00:04 Hi, I'm Ray Sheen.
- 00:06 I'd now like to walk you through an attribute Data Gage R&R analysis.
- 00:11 This is actually a two part lesson.
- 00:13 In this first part, I'll explain the calculations and
- 00:16 then in the second part we'll use those formulas on a dataset
- 00:20 to determine the amount of error due to repeatability and reproducibility.
- 00:25 Let me quickly review the attribute Data Gage R&R study parameters.
- 00:30 The attribute Data Gage R&R study has data
- 00:33 that is in the form of pass-fail or good-bad.
- 00:36 There are no intermediate values, it's either one or the other.
- 00:41 This makes the analysis much simpler but
- 00:43 it also means we need more data points than with a variable Data Gage R&R.
- 00:48 Then we'll review the setup.
- 00:50 There are two or three appraisers.
- 00:52 You have at least 20 items for measurement, and 30 is better.
- 00:55 Some of the items are good or pass and some are bad or fail.
- 01:00 A 50/50 split is ideal but it should at least be a 70/30 split.
- 01:04 You'll do two or three trials,
- 01:06 typically just two and everything will be measured in a random order.
- 01:11 The calculations can be done in two manners.
- 01:14 Statistical software applications like Minitab
- 01:17 will often use the Chi-Square methodology.
- 01:19 This will provide a very accurate answer.
- 01:21 There are also some basic equations that we can use with our Excel spreadsheet.
- 01:26 These formulas will do some cross-tabulation comparisons against the reference and
- 01:30 I will discuss both options.
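As a rough illustration of the Chi-Square idea mentioned here (this is not Minitab's exact procedure, just the general technique), the sketch below tests whether an appraiser's calls are independent of the reference standard using SciPy's chi2_contingency; the counts are hypothetical.

```python
# Hypothetical cross-tabulation of one appraiser's calls against the
# reference standard; scipy is assumed to be available.
from scipy.stats import chi2_contingency

#                 ref Pass  ref Fail
table = [[28, 2],   # appraiser said Pass
         [3, 27]]   # appraiser said Fail

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
# A very small p-value indicates the appraiser's calls are strongly
# associated with the true status, as a capable appraiser's should be.
```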
- 01:32 I'll start with repeatability.
- 01:34 The first step is to check to see if the appraiser agreed on
- 01:37 all checks for an item.
- 01:38 That means that every time the appraiser checked that item,
- 01:41 they found it to be a pass or fail.
- 01:44 It doesn't mean they were right, just that they were consistent.
- 01:47 Determine how many times each appraiser was consistent and
- 01:50 then divide that by the total number of items that were checked and
- 01:54 you can then determine the percentage of time that each appraiser was consistent.
- 01:58 That is the repeatability for the appraiser.
- 02:01 Now, calculate the average repeatability for
- 02:04 all the appraisers to get the system repeatability.
- 02:06 Just add the individual repeatability scores and divide by the number of appraisers.
- 02:10 If that repeatability is less than 90%, the system has a weakness and
- 02:14 needs improvement.
- 02:15 If the reason it was less
- 02:16 was primarily due to one appraiser having a very low individual repeatability score,
- 02:21 you probably have a training problem with that appraiser.
- 02:24 If everyone was low, it's more likely that the system has too much variability or
- 02:29 some other inherent weakness.
- 02:31 So now let's look at reproducibility.
- 02:33 Reproducibility is based upon the number of times the appraisers agree with each
- 02:38 other on an item.
- 02:38 And again, it doesn't mean that they were right,
- 02:41 just that the appraisers were consistent with each other.
- 02:44 So if we had three appraisers, Appraiser 1 and Appraiser 2 agreed about an item
- 02:49 on all trials, Appraiser 1 and Appraiser 3 agreed about the same item on all trials
- 02:54 and Appraiser 2 and Appraiser 3 agreed about that same item on all trials.
- 02:59 Once again, we divide the number by the total number of items.
- 03:02 But we also need to divide by the number of appraiser pairs
- 03:05 to get our reproducibility value.
- 03:08 So again, if there were three appraisers, there are three appraiser pairs.
- 03:12 That is one and two, one and three, and two and three.
- 03:15 Again, we're looking for a value greater than 90%.
- 03:18 If we fail to meet that, it could be the same problems we had with repeatability of
- 03:22 training or system variability.
- 03:24 It could also be a poor operational definition of what is a pass or a fail.
- 03:29 Now, we wanna start working with the other column that we said might
- 03:32 be included in the data set, which was the reference standard
- 03:35 that indicated the correct assessment of an item status, either pass or fail.
- 03:40 With this standard, there are some additional accuracy related checks that we
- 03:43 can make and two of them are miss rate and false alarm rate.
- 03:48 These two are just the flip sides of the same coin.
- 03:51 The miss rate is the percentage of times an
- 03:53 appraiser passed an item whose true status is fail.
- 03:57 And the false alarm rate is the number of times an appraiser failed an item whose
- 04:01 true status is pass.
- 04:03 In either case, we compare how the appraiser rated an item
- 04:06 to the value in the reference standard. If they are different, it's either a miss or
- 04:11 a false alarm.
- 04:12 Once we've checked each trial for each appraiser, we add up the number of
- 04:15 occurrences in each category and divide by the total number of appraisers and
- 04:19 trials and the number of items in that category.
- 04:23 In other words, the total number of times an inspection
- 04:26 was done on an item of that category of pass or fail.
- 04:30 We are more concerned about the miss rate because we wanna minimize the likelihood
- 04:34 of a bad or failed item getting through the system.
- 04:37 So our target for miss rate is 2% or less.
- 04:40 And the false alarm rate is 5% or less.
- 04:43 The marginal zone for miss rate is 2% to 5%.
- 04:45 And the marginal zone for false alarm is 5% to 10%.
- 04:50 There are two other measures that we can consider by using the reference standard
- 04:54 and these are accuracy and effectiveness.
- 04:57 Accuracy is just the complement of the combination of miss rate and
- 05:00 false alarm rate that we just discussed.
- 05:02 It's the percentage of the time that an appraiser got it right.
- 05:06 The assessment agreed with the standard.
- 05:08 Add up all of those occasions and
- 05:10 divide by the total number of inspections that occurred.
- 05:14 Effectiveness is a little more complex and in my opinion, it's a more meaningful
- 05:18 number in the context of repeatability and reproducibility.
- 05:22 Individual effectiveness is a combination of accuracy and repeatability.
- 05:25 So there are two checks.
- 05:27 Did the appraiser reach the same conclusion about an item
- 05:30 on all of that appraiser's trials?
- 05:31 That is repeatability.
- 05:33 And was the conclusion the correct one, which is accuracy.
- 05:37 When you determine how often that occurred for each item for
- 05:39 an appraiser, divide by the number of items to get the individual effectiveness.
- 05:44 Now we can go even one step further.
- 05:47 Not only was one appraiser effective for
- 05:49 an item, we then ask if all the appraisers are effective for that item.
- 05:53 We're bringing in a reproducibility check.
- 05:56 So the question is for
- 05:57 an item, did all the appraisers correctly assess that item on all trials?
- 06:02 Once we know the number of times that that occurred,
- 06:04 we divide it by the number of items to get the system effectiveness.
- 06:08 And the target for system effectiveness is 90%.
- 06:11 So let's summarize these different repeatability and
- 06:14 reproducibility measurements.
- 06:16 Recognizing that we have lots of different measurements we could use when
- 06:19 working with attribute data.
- 06:21 It's not like the variable data where we could get one comprehensive GRR value.
- 06:26 This table sums up all of those measures with the target values and
- 06:29 the marginal zone.
- 06:31 In my opinion, the system effectiveness is the best overall measurement for
- 06:35 combining both accuracy and precision into one measure.
- 06:38 We can use the other measures of repeatability,
- 06:40 reproducibility, miss rate and false alarm rate
- 06:43 to understand the specific nature of any effectiveness weaknesses.
- 06:48 And you can use statistical software to determine all of these values and
- 06:51 even some more analysis and graphs to go with it.
- 06:55 The attribute Data Gage R&R study has simple equations and with those, you can
- 07:00 get a very good insight into the strengths and weaknesses of the measurement system.