Retired course
This course has been retired and is no longer supported.
About this lesson
Different tests are designed for different numbers of datasets or parameters. Choosing the correct test ensures a meaningful analysis.
Quick reference
Uni-, Bi-, Multi-Variate Tests
When to use
Every analysis is in one of these three categories. The nature of the hypothesis will determine which one to use.
Instructions
Uni-variate Tests
Uni-variate tests examine one dataset, or one parameter within a dataset, to determine its statistical behaviour and significance. This type of test provides insight into the performance of that parameter or dataset. However, it provides no insight into any other parameter, or into the relationship between that dataset or parameter and any other.
An example of a uni-variate test is the calculation of descriptive statistics for a dataset, such as its mean, median, and standard deviation.
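As a minimal sketch of such a uni-variate summary, the Python standard library can compute these three statistics. The `cycle_times` values and the `describe` helper are hypothetical and serve only as an illustration.

```python
import statistics

# Hypothetical cycle-time measurements (minutes); illustrative values only.
cycle_times = [12.1, 11.8, 12.4, 13.0, 11.9, 12.2, 12.6, 12.0]

def describe(data):
    """Return basic uni-variate descriptive statistics for one dataset."""
    return {
        "n": len(data),
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "stdev": statistics.stdev(data),  # sample standard deviation (n - 1)
    }

summary = describe(cycle_times)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

The summary describes this one dataset only. It says nothing about how the dataset relates to any other factor.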
Bi-Variate Tests
Bi-variate tests compare two parameters or datasets to determine if there is a relationship. These hypothesis tests normally look for a specific relationship. Whether or not that relationship exists determines whether to reject or fail to reject the null hypothesis. This type of test is commonly used to test different potential root causes for an observed defect or process performance condition.
One example of this type of test is a correlation study, which determines if two factors are related and the nature of that relationship. Another example is a test of two datasets to determine if their respective descriptive statistics are statistically different. If there is a difference, the characteristic that separates the two datasets is an indicator of the root cause for the difference.
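Both kinds of bi-variate test can be sketched with the standard library. The example below computes a Pearson correlation coefficient for two paired factors and Welch's t statistic for the means of two datasets. These are one common choice for each case, not the only one, and the data and the names `pearson_r` and `welch_t` are hypothetical.

```python
import math
import statistics

def pearson_r(x, y):
    """Pearson correlation coefficient between two paired factors."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def welch_t(a, b):
    """Welch's t statistic for the difference between two dataset means."""
    va, vb = statistics.variance(a), statistics.variance(b)
    std_err = math.sqrt(va / len(a) + vb / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / std_err

# Hypothetical correlation study: oven temperature vs. defects per batch.
oven_temp = [180, 185, 190, 195, 200, 205]
defects = [9, 8, 8, 6, 5, 4]

# Hypothetical comparison study: cycle times from two production lines.
line_a = [12.1, 11.8, 12.4, 12.0, 12.2]
line_b = [13.0, 12.9, 13.4, 13.1, 12.8]

r = pearson_r(oven_temp, defects)
t = welch_t(line_a, line_b)
print(f"correlation r = {r:.3f}")
print(f"Welch t statistic = {t:.3f}")
```

A strongly negative `r` suggests that defects fall as temperature rises. A large-magnitude `t` suggests the two lines differ. Converting `t` into a p-value requires the t-distribution, for example via `scipy.stats.ttest_ind(line_a, line_b, equal_var=False)` if SciPy is available.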
Multi-Variate Tests
These are tests that include two or more parameters or datasets. While these will work with just two datasets, they typically are used when there are more than two independent parameters or datasets. These tests are used to de-clutter or simplify a large complex dataset to focus on those aspects that are most significant. There are three types of multi-variate tests:
- Factor analysis – This type of test is used to determine which of the factors that describe a dataset are highly significant for predicting performance and which are not. In its simplest form, it is a Pareto analysis. In a multi-variate regression analysis, the most important variables or factors receive the largest coefficients, provided the factors are measured on comparable scales.
- Cluster Analysis – This is used with large datasets to simplify the dataset. Clusters of performance are identified. The analysis then determines which factors are the best predictors of segregation into the clusters. To simplify the analysis, we often start with a predetermined number of clusters. Otherwise the analysis can take a very long time and “overfit” the clusters as it strives to get a perfect match.
- Discriminant Analysis – This analysis is used when working with a dataset that already has clearly defined clusters of performance and has numerous potential factors that could influence the performance. The analysis determines which of the factors, or factor combinations, are the ones that best predict the difference in performance. An example of this type of test would be to work with a dataset that has classified the output as a success or failure. This analysis then determines which factors will best predict that performance.
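The cluster and discriminant ideas above can be sketched in plain Python. The example below runs a basic k-means with a predetermined number of clusters, then scores each factor by how well it separates the resulting groups. The same score can be applied to groups that are already known, such as success and failure. The data, the factor names, and the `kmeans` and `separation_score` helpers are hypothetical. The score is a simplified per-factor measure, not a full discriminant analysis.

```python
import math
import statistics

def kmeans(points, centroids, iterations=10):
    """Plain k-means; k is fixed in advance by the number of starting centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Move each centroid to the mean of its assigned points.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = tuple(statistics.mean(dim) for dim in zip(*members))
    return labels, centroids

def separation_score(values, labels):
    """Gap between the two group means, in pooled-standard-deviation units."""
    g0 = [v for v, lab in zip(values, labels) if lab == 0]
    g1 = [v for v, lab in zip(values, labels) if lab == 1]
    pooled = math.sqrt((statistics.variance(g0) + statistics.variance(g1)) / 2)
    return abs(statistics.mean(g0) - statistics.mean(g1)) / pooled

# Hypothetical samples: (pressure, humidity) recorded for each unit produced.
points = [(2.0, 50.0), (2.2, 52.0), (1.9, 49.0), (2.1, 51.0),
          (8.0, 50.5), (8.3, 49.5), (7.9, 51.5), (8.1, 50.0)]

# Start with a predetermined number of clusters (two starting centroids).
labels, centroids = kmeans(points, centroids=[points[0], points[4]])

# Rank the factors by how strongly they separate the two groups.
factors = {"pressure": [p[0] for p in points], "humidity": [p[1] for p in points]}
scores = {name: separation_score(vals, labels) for name, vals in factors.items()}
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.2f}")
```

In this made-up data, pressure separates the two groups far more strongly than humidity, so it would go on the short list for follow-up bi-variate tests. For real datasets, a library such as scikit-learn (`KMeans`, `LinearDiscriminantAnalysis`) is a more robust choice.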
Hints & tips
- All three types of analysis could be used in a complex problem-solving project. Multiple data sets may be collected and analysed using a uni-variate analysis to determine if the data is normal or non-normal. Then a multi-variate analysis is used with all the datasets to identify likely discriminating factors with respect to the performance. The short list of potential factors may then be tested with bi-variate tests to determine if there is any correlation or inter-dependent relationship between these factors.
- The number of factors or samples is a key decision point in the hypothesis test decision tree.
- 00:04 Hi, I'm Ray Sheen. As we dig into different types of hypothesis tests,
- 00:09 one of the factors that will determine what type of tests we do is the number of
- 00:13 subsets of data or factors that we're working with.
- 00:16 Let's explore this for a minute.
- 00:18 Hypothesis tests can be divided into categories based upon the number of
- 00:23 data sets or factors which we refer to as variates that are being considered.
- 00:29 The uni-variate tests have one data set or one factor.
- 00:33 These tests are analyzing or characterizing
- 00:35 that set of data in order to understand the behavior of that data set or factor.
- 00:40 To the extent that the data set is a factor in our broader analysis,
- 00:44 we need to know more about it.
- 00:46 Uni-variate tests will explain that factor.
- 00:50 The next type of test is the bi-variate test.
- 00:53 In this case, we are comparing two data sets
- 00:56 to determine if they are statistically the same, or statistically different.
- 01:00 Or we're looking at two factors to see if they're related.
- 01:03 Our goal is to understand whether there is a relationship
- 01:07 between the two data sets or factors.
- 01:09 If done as a correlation study, it helps us to understand more about each factor.
- 01:14 If done as a comparison study, we can determine if the two are separate,
- 01:19 unique, or they are part of the same overall data set.
- 01:23 In either case, based upon what that data set or factor represents,
- 01:27 we have a better insight into our overall analysis of the performance of the data.
- 01:33 Finally, there's a multi-variate analysis.
- 01:35 What this means is that there are two or more data sets or factors.
- 01:39 And we normally think of this as when there are more than two,
- 01:42 not just when they are two, which we'll call the bi-variate analysis.
- 01:46 These tests are again used to determine which factors are linked to differences
- 01:50 in the data performance and which factors are not.
- 01:54 This helps us to understand what is significant in predicting
- 01:56 overall process performance.
- 01:58 Some of these tests focus on the mean or median value and
- 02:01 some focus on the spread or the standard deviation within each data set.
- 02:06 An obvious question is,
- 02:07 how should we decide what type of analysis we should use?
- 02:11 And the obvious answer is,
- 02:13 that it depends upon how many data sets we are working with.
- 02:16 But it may not be quite that simple.
- 02:18 For example, you probably will need to do a uni-variate analysis
- 02:21 on each data set to determine if that data is normal.
- 02:25 That is because determining which of the follow-on analyses we will use
- 02:29 will depend upon whether we have normal data.
- 02:31 Multi-variate analysis is typically used to declutter or to whittle down
- 02:36 a large number of data sets or a large number of potential factors.
- 02:39 This then allows the analysis to continue in a more focused manner on one or
- 02:45 a smaller group of factors.
- 02:46 Bi-variate analysis is looking for linkage between two data sets or
- 02:51 between two parameters or factors within the data set.
- 02:54 The presence or absence of a linkage can be a strong pointer
- 02:57 towards one of the root causes for the effect that is being investigated.
- 03:02 The uni-variate analysis is often used to demonstrate that
- 03:05 a particular data set meets some standard or goal for the data.
- 03:09 It might be that the data is normal or
- 03:11 it might be that the data shows the process performance is not changed
- 03:15 from the original installed or as designed performance.
- 03:19 It's interesting to note that in some cases you may do all three analyses.
- 03:24 First, a uni-variate analysis on each data set to determine if it's normal.
- 03:28 Then a multi-variate analysis is done to determine which data sets are most
- 03:31 significant.
- 03:32 And then finally, several bi-variate analyses may be done to determine if those
- 03:36 most significant data sets are truly independent of each other or
- 03:39 if they're from the same population.
- 03:42 By the way you can do a type of multi-variate analysis by conducting
- 03:45 multiple bi-variate analyses with all the different combinations of factors or
- 03:50 subsets.
- 03:51 However, this can become very time consuming.
- 03:53 Let's spend a little more time discussing how to use a multi-variate analysis.
- 03:58 One approach is to test for
- 03:59 which factors create a significant difference in the data set.
- 04:03 In this case, we're analyzing multiple factors within a data set
- 04:07 to see which ones cause the major stratification and which ones do not.
- 04:12 In its simplest form, a Pareto analysis is this type of multi-variate analysis.
- 04:16 It tells us which factors are most important.
- 04:19 Another type of multi-variate analysis is a cluster analysis.
- 04:23 This looks for data samples that are similar and determines what
- 04:27 factors contribute to their similarity or are shared across the data sets.
- 04:31 And therefore, can be used in discriminating between those data points.
- 04:36 When working with large data sets,
- 04:38 this type of analysis can take a long time to search for the best characteristic.
- 04:43 So often, the analysis is done by pre-determining a number of data set
- 04:47 clusters, and then looking at the goodness of fit for those different clusters.
- 04:51 By seeking out clusters within a large data set, the data set can be reduced to
- 04:56 several smaller subsets of data that are easier to analyze.
- 04:59 And the discriminating factors that create the subsets
- 05:02 often provide insight into the root causes of the issues being investigated.
- 05:07 Sometimes there is already an obvious grouping such as
- 05:11 a sub-group of data points that contain the defect and a sub-group that do not.
- 05:15 A discriminant analysis will analyze the data set from the perspective of many
- 05:19 different factors simultaneously to determine which factors or
- 05:23 in some cases which factor interactions create
- 05:26 the most significant correlation with the known obvious grouping.
- 05:31 This would provide insight into why some data points are defective and
- 05:35 some are not.
- 05:36 Multi-variate analysis can speed up your data analysis.
- 05:40 It will help you sieve through all the possible factors to focus in on
- 05:44 the critical few.
- 05:46 And then you can do a more in-depth bi-variate or
- 05:48 uni-variate analysis on those critical few factors.