About this lesson
Hypothesis testing relies on the use of data samples. However, the power and value of the hypothesis test are based on the size of the sample and the means by which it was selected. In this lesson, we consider factors for selecting sample data points and we determine the size of the sample needed based on the desired accuracy of the answer.
Quick reference
Samples and Sample Selection
The size of a sample data set affects the width of the confidence interval in inferential statistics. An accurate analysis requires a sample that is large enough to give precise estimates and whose data points are not biased or skewed by the way they were selected.
When to use
When the confidence level has been set and a desired margin of error or confidence interval has been established, use the sample size equation to determine the sample size. In addition, when collecting sample points, give appropriate consideration to which data points are selected so as not to bias the data.
Instructions
Inferential statistics are based on statistically analyzing a data sample and inferring characteristics of the full data population. Sampling is often done to save time and money because collecting every data point from a population is usually impractical. The obvious concerns are how big the sample is and how it was chosen.
The sample should meet certain characteristics if it is to be used as a surrogate for the entire population.
- Representative – it includes data points that capture the fluctuation and changes found in the data population
- Sufficient – there are enough data points to detect patterns in the data
- Contextual – other system or environmental effects that could influence the data are recorded
- Reliable – the measurement system is able to provide data that is precise and accurate
- Random – every data point has an equal chance of being selected
Data collection should then be a well-considered and planned activity to ensure the data points will meet these criteria. That means that before data is collected, the problem must be defined so that you can decide what data is needed to conduct the analysis. The sample size can then be calculated using the formula:
$$n = \left( \frac{Z_{\alpha/2} \, \sigma}{E_m} \right)^2$$
Where n is the required sample size, rounded up to the next whole number
Where Z α/2 is the Z value associated with the confidence level
Where σ is the population standard deviation
Where Em is the margin of error that is acceptable for the statistics
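As an illustration, the sample size calculation, n = (Z α/2 × σ / Em)², can be sketched in Python. This is a minimal sketch, not part of the lesson materials: the function name `required_sample_size` and the example values (σ = 15, Em = 2) are assumptions chosen for illustration, and the Z value is looked up with the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def required_sample_size(confidence_level, sigma, margin_of_error):
    """Return the smallest whole sample size n with n >= (Z * sigma / Em)^2.

    confidence_level: e.g. 0.95 for 95% confidence (assumed two-sided)
    sigma: population standard deviation (assumed known or estimated)
    margin_of_error: acceptable margin of error, in the same units as sigma
    """
    if not 0.0 < confidence_level < 1.0:
        raise ValueError("confidence_level must be between 0 and 1")
    if sigma <= 0 or margin_of_error <= 0:
        raise ValueError("sigma and margin_of_error must be positive")

    alpha = 1.0 - confidence_level
    # Z value that leaves alpha/2 in the upper tail of the standard normal
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    n = (z * sigma / margin_of_error) ** 2
    # A sample cannot contain a fraction of a data point, so round up
    return math.ceil(n)


# Hypothetical example: 95% confidence, sigma = 15, margin of error = 2
n_95 = required_sample_size(0.95, sigma=15.0, margin_of_error=2.0)
print(n_95)  # 217
```

With these assumed inputs, Z α/2 is about 1.96, so n = (1.96 × 15 / 2)² ≈ 216.1, which rounds up to 217 data points.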
Then decide on a sample selection process that ensures the data points are representative and random. Before collecting the data, ensure the measurement system will provide accurate and reliable data values.
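One way to give every data point an equal chance of being selected is simple random sampling without replacement. The sketch below is only an illustration under stated assumptions: the helper name `draw_simple_random_sample` and the list of record IDs are hypothetical, and it assumes the full list of candidate data points is available up front. Random selection alone does not guarantee that the sample is representative, so the other criteria still need to be checked.

```python
import random


def draw_simple_random_sample(population, sample_size, seed=None):
    """Pick sample_size items so that each item is equally likely to be chosen.

    population: any iterable of candidate data points (hypothetical records)
    seed: optional seed so that the selection can be reproduced and audited
    """
    items = list(population)
    if sample_size > len(items):
        raise ValueError("sample_size cannot exceed the population size")
    rng = random.Random(seed)
    # random.Random.sample selects without replacement
    return rng.sample(items, sample_size)


# Hypothetical population: record IDs 1 through 5000
record_ids = range(1, 5001)
selected = draw_simple_random_sample(record_ids, sample_size=217, seed=42)
print(len(selected))  # 217
```

Fixing the seed is a design choice for traceability: the same seed and population reproduce the same selection, which makes it possible to show afterwards how the sample was chosen.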
Hints & tips
- The data collection approach, including how the data is collected and how many samples are collected, will limit the accuracy of the statistical analysis. So be certain that your approach will provide enough reliable data to conduct the analysis.
- In some cases, there is already a large body of data that has been collected. If the contextual aspects indicate that you can get representative and reliable data from that database, additional data may not need to be collected.
- The sample size equation is derived from the equation for the confidence interval: the half-width of the interval, Z α/2 × σ / √n, is set equal to the margin of error, and the equation is solved for n.
- Based on the sample size equation, the required number of data points in the sample goes up when the confidence level is increased, or the standard deviation gets larger, or the margin of error is reduced. By the same token, the number of points in the sample can be reduced if the confidence level goes down, the standard deviation gets smaller or the allowed margin of error in the statistics can increase.
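The relationships in the last tip can be checked numerically. The following is a small sketch using assumed values (a baseline of 95% confidence, σ = 10, Em = 2); the function name `sample_size` is hypothetical and simply restates n = (Z α/2 × σ / Em)², rounded up.

```python
import math
from statistics import NormalDist


def sample_size(confidence_level, sigma, margin_of_error):
    """n = (Z * sigma / Em)^2, rounded up to a whole number of data points."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence_level) / 2.0)
    return math.ceil((z * sigma / margin_of_error) ** 2)


# Assumed baseline: 95% confidence, sigma = 10, margin of error = 2
baseline = sample_size(0.95, 10.0, 2.0)

# Change one input at a time and compare against the baseline
higher_confidence = sample_size(0.99, 10.0, 2.0)  # confidence level raised
larger_sigma = sample_size(0.95, 20.0, 2.0)       # standard deviation doubled
tighter_margin = sample_size(0.95, 10.0, 1.0)     # margin of error halved

print(baseline, higher_confidence, larger_sigma, tighter_margin)
# 97 166 385 385
```

Because σ and Em enter the equation as a squared ratio, doubling the standard deviation or halving the margin of error roughly quadruples the required sample size, while raising the confidence level from 95% to 99% increases it by about 70% in this example.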