About this lesson
Hypothesis testing relies on the principle of inferential statistics. A sample data set from a larger population of data is statistically analyzed. The result of the analysis of the sample data is used to infer a conclusion about the larger population of data. This lesson will discuss the concept and ways to compare the data sample and the data population.
Quick reference
Inferential Statistics
Inferential statistics rely on the statistical analysis of a subset or sample of an entire population of occurrences to draw conclusions about the entire population. They build on the descriptive statistics of the sample dataset.
When to use
Inferential statistics are used when the data from an entire population of occurrences or iterations of a product or process are not readily available. This could be because of the long time that the product or process has been in use, or it could be because the access and availability of the product or process are limited.
Instructions
Inferential statistics is a branch of statistical analysis that uses the analysis of a subset, or sample, from a data population to draw inferences about the statistical measures that apply to the entire population. In many cases, the entire data population is not available for measurement. This is particularly true for products or processes that have been in use for a long time. The earlier iterations of the product or process either no longer exist or are out of the control of the product or process manager, and therefore cannot be measured as part of the population.
There are a few obvious differences between descriptive statistics and inferential statistics. Descriptive statistics analyze a set of data to provide insight into the real-world business processes associated with that data. Inferential statistics analyze a sample set of data to provide insights into the larger data population from which the sample was drawn. Descriptive statistics are a mathematical analysis of the existing data. Inferential statistics use the descriptive statistics from the sample data to infer population statistics that will fall within a certain range.
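As a minimal sketch of how sample statistics lead to a range for a population statistic, the example below computes a 95% confidence interval for a population mean. The measurements are hypothetical, and the normal approximation (a z value near 1.96) is an assumption made to keep the code within the Python standard library. For a sample this small, a t-distribution critical value would give a somewhat wider interval.

```python
import math
import statistics

# Hypothetical sample of 12 cycle-time measurements (illustrative values only).
sample = [41.2, 39.8, 40.5, 42.1, 38.9, 40.0, 41.7, 39.5, 40.8, 41.0, 39.9, 40.6]

n = len(sample)
x_bar = statistics.mean(sample)   # descriptive statistic: sample mean
s = statistics.stdev(sample)      # descriptive statistic: sample standard deviation

# Assumption: normal approximation at a 95% confidence level.
z = statistics.NormalDist().inv_cdf(0.975)
margin = z * s / math.sqrt(n)

# Inferred range for the population mean.
lower, upper = x_bar - margin, x_bar + margin

print(f"sample mean = {x_bar:.2f}, sample std dev = {s:.2f}")
print(f"inferred range for population mean: {lower:.2f} to {upper:.2f}")
```

The first print line reports the descriptive statistics of the sample. The second is the inferential step: a statement about the population that the sample only supports to within a margin.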
Calculating descriptive statistics for the sample data will provide insight into the statistics applicable to the full population. The terminology used in the hypothesis test discussions will at times differentiate between sample statistics and population statistics. The formulas for both are found in the equations handout.
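The equations handout is not reproduced here. As a reference sketch, the standard definitions below match the way the lesson describes them: n, x-bar, and s for the sample, and N, mu, and sigma for the population. The symbol x_i for an individual data value is an assumption introduced for illustration.

```latex
\begin{aligned}
\text{Sample (size } n\text{):}\quad
  & \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
    s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}, \qquad
    \text{variance} = s^2 \\
\text{Population (size } N\text{):}\quad
  & \mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
    \sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}, \qquad
    \text{variance} = \sigma^2
\end{aligned}
```

The mean is calculated the same way in both cases. The only structural difference is the n - 1 denominator in the sample standard deviation.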
Precise statistical values are available only for the actual data in the subset or sample. However, if that sample fairly represents the entire population, those values are excellent surrogates for the statistical measures of the entire population. There are several questions you can ask when beginning an analysis with inferential statistics. These questions will help ensure that you can trust the results of the analysis.
- What are you trying to determine?
- What tool can provide the needed information?
- What kind of data does that tool require?
- How will you collect the data?
- How confident are you in the data summaries?
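To illustrate why it matters that the sample fairly represents the population, the sketch below compares a random sample with a deliberately biased one. The population is simulated rather than real data, and the sizes, seed, and distribution parameters are assumptions chosen only for illustration.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 simulated process measurements.
population = [rng.gauss(100, 15) for _ in range(10_000)]
mu = statistics.fmean(population)

# Representative sample: every item has the same chance of being selected.
random_sample = rng.sample(population, 200)

# Biased sample: only the 200 smallest values, standing in for data
# collected from a single atypical source.
biased_sample = sorted(population)[:200]

random_error = abs(statistics.fmean(random_sample) - mu)
biased_error = abs(statistics.fmean(biased_sample) - mu)

print(f"population mean:     {mu:.2f}")
print(f"random sample error: {random_error:.2f}")
print(f"biased sample error: {biased_error:.2f}")
```

Both samples contain the same number of data points, so collecting more data does not by itself fix a biased collection method.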
Hints & tips
- If all the data is available, use it. Don’t rely on inferential statistics.
- Pay close attention to the formulas in the remaining lessons so that you know when to use sample statistics and when to rely on population statistics.
Transcript
- 00:04 Hi, I'm Ray Sheen.
- 00:06 Often, when conducting a hypothesis test,
- 00:08 all of the data from a population is not available.
- 00:12 Instead, you only have a sample.
- 00:14 Well, from that sample, you want to be able to infer meaningful information
- 00:19 about the entire population, and for that we need inferential statistics.
- 00:24 >> So what do we mean by the term inferential statistics?
- 00:28 Inferential statistics is exactly what it says.
- 00:31 It relies on a statistical analysis of a subset or sample data from a data
- 00:36 population to infer or draw conclusions about the entire population of data.
- 00:44 We use this approach when the entire population is not available,
- 00:47 only a subset.
- 00:49 We study the subset in detail and then draw conclusions about the full dataset.
- 00:55 Just to be clear, if the full data set is available, use it.
- 00:58 Years ago when the analysis was done by hand,
- 01:02 analyzing a large data set was time consuming and error-prone.
- 01:07 Now, with modern data analysis applications, that's not a problem.
- 01:12 So if you have the entire data set, use it all, but if not,
- 01:15 then we will work with a smaller sample.
- 01:19 Many times the data set that is available does not represent all
- 01:22 the data that is applicable.
- 01:25 You may not be able to get all the data from all locations, and
- 01:28 all occurrences that have happened throughout all time for that process.
- 01:33 A limitation that we have is that the analysis of a data sample tells us only about that
- 01:38 data sample.
- 01:39 To be able to extrapolate that data and
- 01:41 infer conclusions about the larger data set, we need to assess whether that
- 01:46 sample dataset is a good surrogate for the entire data population.
- 01:52 Before we get into that analysis, let's quickly review descriptive statistics and
- 01:57 compare them to inferential statistics.
- 02:00 Descriptive statistics describe the data set that is being studied.
- 02:05 This data is often a subset of some larger population, but
- 02:08 the descriptive statistics are only valid for that subset.
- 02:13 With this data, we often calculate things like mean, median and
- 02:17 standard deviation of the data subset.
- 02:20 The numerical analysis gives us a mathematical description of the real world
- 02:25 that is represented by that sample data.
- 02:28 But again, it only represents the information from that data sample.
- 02:33 Inferential statistics are used when we have a large data population and
- 02:37 we can't get access to all the data.
- 02:40 In fact, some data populations may be infinite in size,
- 02:44 and it would be impossible to measure the full data population.
- 02:48 So inferential statistics use the descriptive statistics from a sample set
- 02:53 of data to draw conclusions about what the descriptive statistics would be for
- 02:57 that full data population.
- 03:00 By an inferential calculation of the sample data statistics,
- 03:03 we can then draw conclusions about the performance of the larger population.
- 03:08 So the sample mean or standard deviation can provide insight
- 03:12 into the full population mean or standard deviation.
- 03:17 Of course to do this, we need to carefully consider what is in the sample.
- 03:22 When you have determined that inferential statistics must be used, you should answer
- 03:26 these questions to determine what is needed in the sample data set.
- 03:31 What are you trying to determine?
- 03:33 What information do you need about the population?
- 03:37 What tool can provide that needed information?
- 03:40 What type of analysis will need to be done to generate that information?
- 03:44 What kind of data does that tool require?
- 03:47 What should be measured?
- 03:49 How will you collect that data?
- 03:51 Do you have a system for collecting the data, or if the data population already
- 03:55 exists, do you have a means or a method for extracting a representative sample?
- 04:01 How confident are you in the data summaries?
- 04:03 How many data points can you collect?
- 04:05 And are you able to minimize the likelihood that the data is biased in
- 04:09 any way?
- 04:10 The answers to these questions will help you select the sample.
- 04:13 And we'll talk more about sampling in another lesson.
- 04:17 Let's finish this lesson with a quick review of the typical statistical measures
- 04:21 of a dataset.
- 04:22 We will contrast the sample dataset measures with
- 04:25 the population dataset measures.
- 04:28 Throughout the remaining lessons in this course, you'll be using these terms.
- 04:31 So it's helpful to understand the differences now and avoid confusion later.
- 04:37 The number of items in that data set is represented by the letter n.
- 04:41 A small n means you are working with the sample of the data and
- 04:45 a large N means the entire population.
- 04:49 Of course you can directly count the small n for the sample, but
- 04:52 the large N for the population must often be estimated.
- 04:57 The mean or average is either x bar or
- 04:59 mu depending upon whether it is the sample or the population.
- 05:04 The calculation is done the same way,
- 05:05 it's just the number of data points that's different.
- 05:08 The standard deviation is either s, or sigma.
- 05:11 In this case, there is a slight difference in the formula.
- 05:14 The sample standard deviation, s, has the term small n - 1 in the denominator.
- 05:21 The population has the term large N in the denominator.
- 05:25 Because of the large size of N,
- 05:27 there's essentially no difference between a large N and a large N - 1.
- 05:32 Finally, the variance is the standard deviation squared, and
- 05:37 therefore, is either s squared or sigma squared.
- 05:42 >> Hypothesis testing relies on the principle of inferential statistics.
- 05:47 What we know about the sample data
- 05:49 influences what we think we know about the full population.
- 05:54 Depending upon the data sample definition and collection,
- 05:59 we can have confidence in these inferences.
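The transcript's point about the denominators (n - 1 for the sample, N for the population) can be checked numerically. This is a minimal sketch using Python's standard `statistics` module, where `stdev` applies the n - 1 formula and `pstdev` applies the N formula. The data values are simulated, and the helper name `ratio_s_to_sigma` is hypothetical.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the sketch is repeatable

def ratio_s_to_sigma(values):
    """Ratio of the n - 1 formula (stdev) to the N formula (pstdev) on the same data."""
    return statistics.stdev(values) / statistics.pstdev(values)

# Simulated measurements: one very small data set and one very large one.
small = [rng.gauss(50, 5) for _ in range(5)]
large = [rng.gauss(50, 5) for _ in range(50_000)]

small_ratio = ratio_s_to_sigma(small)
large_ratio = ratio_s_to_sigma(large)

print(f"5 data points:      s / sigma = {small_ratio:.5f}")
print(f"50,000 data points: s / sigma = {large_ratio:.5f}")
```

The ratio depends only on the number of data points, because it equals the square root of n / (n - 1). With 5 points the two formulas differ by roughly 12 percent; with 50,000 points the difference is negligible, which is why the distinction matters mainly for small samples.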