About this lesson
There are many types of continuous data distributions. These are often associated with physical characteristics of the data or system being studied. The ability to recognize the type of distribution aids in the selection and analysis of a hypothesis test.
Quick reference
Continuous Data Distributions
Datasets are often displayed in distributions. Different continuous data distributions are indicative of different physical phenomena. The ability to recognize a distribution will aid in the identification of process performance issues.
When to use
Visualizations of datasets are often easier to use when explaining the characteristics of data than tables of numbers. In addition, the different continuous data distributions have specific characteristics which will dictate what type of hypothesis test is appropriate for that data.
Instructions
Continuous Distribution
Continuous data is that which can take on an infinite number of values. Between any two data values, there is another data value that could be detected if the measurement system could accurately discriminate that level of fraction or decimal. The plots are characterized by a smooth curve, not histogram bars. In all these plots, the horizontal axis is the independent variable and the vertical axis is the process performance dependent variable.
Gaussian (Normal) Distribution
This is the bell-shaped curve that represents common cause or random variation. It is symmetric, peaked in the center and the tails approach zero. This is normally our desired distribution for analysis because we know that it represents random variation around the typical process performance.
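The properties listed here (symmetry, a central peak, tails that approach zero) can be checked numerically. The following is a minimal sketch using Python's standard library; the process mean of 50.0 and standard deviation of 2.0 are hypothetical values chosen only for illustration.

```python
from statistics import NormalDist

# Hypothetical process: mean 50.0, standard deviation 2.0 (illustrative values only).
process = NormalDist(mu=50.0, sigma=2.0)

# Symmetry: the density is the same at equal distances on either side of the mean.
left = process.pdf(47.0)
right = process.pdf(53.0)

# The peak sits at the mean, and the tails approach zero.
peak = process.pdf(50.0)
far_tail = process.pdf(62.0)  # six standard deviations above the mean

# Share of values expected within one standard deviation of the mean.
within_one_sigma = process.cdf(52.0) - process.cdf(48.0)

print(left, right, peak, far_tail, round(within_one_sigma, 4))
```

Roughly two thirds of the values fall within one standard deviation of the mean, which is the usual quick check for random variation around typical process performance.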
Uniform Distribution
This is a horizontal line: the vertical value is essentially equal for all horizontal axis values. This represents the case where the process performance does not depend upon the independent variable.
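A flat distribution can be illustrated by drawing random values over a range and counting how many land in each equal-width bin. This is only a sketch: the range of 10.0 to 20.0, the sample size, and the seed are assumptions made for the example.

```python
import random

random.seed(7)  # fixed seed so the sketch is repeatable

low, high = 10.0, 20.0  # hypothetical range of the measured value
samples = [random.uniform(low, high) for _ in range(10_000)]

# Split the range into five equal-width bins and count the samples in each.
bin_count = 5
width = (high - low) / bin_count
counts = [0] * bin_count
for value in samples:
    index = min(int((value - low) / width), bin_count - 1)
    counts[index] += 1

# The theoretical density is the same constant everywhere in the range.
density = 1.0 / (high - low)

print(counts, density)
```

Each bin should hold close to one fifth of the samples, which is the sampled version of a horizontal line.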
Exponential Distribution
This is an asymmetric curve. One end starts at a point on the vertical axis and the other end of the curve approaches – but never reaches – zero value. A typical physical phenomenon that follows this pattern is the failure rate of a product or system that is subject to infant mortality.
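The shape described here can be sketched directly from the exponential density. The rate of 0.5 below is a hypothetical value, not one taken from the lesson.

```python
import math

def exponential_pdf(x, rate):
    """Density of the exponential distribution; zero for negative x."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

rate = 0.5  # hypothetical rate parameter, for illustration only

# The curve meets the vertical axis at a value equal to the rate.
start = exponential_pdf(0.0, rate)

# Further along the horizontal axis the curve keeps falling but stays above zero.
values = [exponential_pdf(x, rate) for x in (0, 2, 4, 8, 16)]

print(start, values)
```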
Log-normal Distribution
This is also an asymmetric curve. Both ends of the curve are at zero. However, one end quickly shoots up and then it slowly decays back to zero. This is also a commonly occurring pattern in the real world. For instance, machine downtime follows this pattern. It takes a finite amount of time to do a repair, which produces the major spike, and some repairs then take longer, which produces the long tail.
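One way to recognize log-normal data is that the raw values are skewed while their logarithms look roughly normal. The sketch below simulates hypothetical repair times (the parameters 0.0 and 0.5, the sample size, and the seed are all assumptions) and compares the mean and median before and after taking logs.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical repair times in hours, drawn from a log-normal distribution.
repair_times = [random.lognormvariate(0.0, 0.5) for _ in range(10_000)]

# Raw values: the long right tail pulls the mean above the median.
mean_time = statistics.fmean(repair_times)
median_time = statistics.median(repair_times)

# Logged values: mean and median nearly coincide, as they do for symmetric data.
logs = [math.log(t) for t in repair_times]
log_mean = statistics.fmean(logs)
log_median = statistics.median(logs)

print(round(mean_time, 3), round(median_time, 3))
print(round(log_mean, 3), round(log_median, 3))
```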
F Distribution
The F Distribution is a graph of the F statistic. The F statistic is used for comparing continuous data distributions with statistical techniques such as ANOVA. The actual shape of the curve will depend upon the number of degrees of freedom. The horizontal axis is the F value, which starts at zero and has no upper limit. The vertical axis is probability density.
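To see how the degrees of freedom change the curve, the F density can be evaluated at a few F values. This is a sketch written from the standard formula for the F density; the function name `f_pdf` and the chosen degrees of freedom are illustrative assumptions.

```python
import math

def f_pdf(x, d1, d2):
    """F density for x > 0 with d1 and d2 degrees of freedom (returns 0.0 for x <= 0)."""
    if x <= 0:
        return 0.0
    # Work in logs to avoid overflow in the gamma functions.
    log_beta = math.lgamma(d1 / 2) + math.lgamma(d2 / 2) - math.lgamma((d1 + d2) / 2)
    log_density = (
        (d1 / 2) * math.log(d1 / d2)
        + (d1 / 2 - 1) * math.log(x)
        - ((d1 + d2) / 2) * math.log(1 + d1 * x / d2)
        - log_beta
    )
    return math.exp(log_density)

# Compare two hypothetical pairs of degrees of freedom at the same F values.
for d1, d2 in ((2, 2), (5, 10)):
    curve = [round(f_pdf(x, d1, d2), 4) for x in (0.5, 1.0, 2.0, 4.0)]
    print(d1, d2, curve)
```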
Chi-Squared Distribution
This is an asymmetrical distribution of the Chi-Squared statistic. The level of skewness varies with the number of degrees of freedom in the test and decreases as that number grows. The horizontal axis is the Chi-squared value. The vertical axis is probability density.
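The change in skewness can be sketched by evaluating the Chi-squared density for several degrees of freedom. The code below uses the standard density formula and the standard skewness formula, the square root of 8 divided by the degrees of freedom; the specific degrees of freedom are illustrative choices.

```python
import math

def chi_squared_pdf(x, dof):
    """Chi-squared density for x > 0 with dof degrees of freedom."""
    return x ** (dof / 2 - 1) * math.exp(-x / 2) / (2 ** (dof / 2) * math.gamma(dof / 2))

def chi_squared_skewness(dof):
    """Theoretical skewness of the Chi-squared distribution."""
    return math.sqrt(8 / dof)

# Hypothetical degrees of freedom, chosen to show a range of shapes.
for dof in (2, 5, 20):
    curve = [round(chi_squared_pdf(x, dof), 4) for x in (1, 5, 10, 20, 30)]
    print(dof, curve, round(chi_squared_skewness(dof), 3))
```

The skewness shrinks as the degrees of freedom grow, so the curve looks less lopsided for larger tests, although it never becomes exactly symmetric.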
Beta Distribution
The Beta distribution is a family of curves defined by two shape factors. Depending on those factors, the curve can resemble an exponential or a normal curve, or even a bathtub with high ends and a low center. The horizontal axis ranges from zero to one. The vertical axis is probability density.
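The variety of Beta shapes can be sketched by evaluating the density near each end and in the middle of the zero-to-one range. The shape factor pairs and their labels below are hypothetical examples, not values from the lesson.

```python
import math

def beta_pdf(x, a, b):
    """Beta density for 0 < x < 1 with shape factors a and b."""
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_norm)

# Hypothetical shape factors, labeled by the shape they produce.
shapes = {"bathtub": (0.5, 0.5), "bell": (5.0, 5.0), "decaying": (1.0, 5.0)}
points = (0.05, 0.5, 0.95)  # near the left end, the center, near the right end

curves = {
    name: [beta_pdf(x, a, b) for x in points]
    for name, (a, b) in shapes.items()
}

for name, curve in curves.items():
    print(name, [round(v, 4) for v in curve])
```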
Gamma Distribution
The Gamma distribution is a family of curves based on shape factors. The Exponential distribution and Chi-squared distribution are special cases of the Gamma distribution. The horizontal axis is continuous, starting at zero, and the vertical axis is probability density.
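The special-case relationships can be checked numerically: a Gamma curve with shape 1 matches an Exponential curve, and a Gamma curve with shape equal to half the degrees of freedom and scale 2 matches a Chi-squared curve. The sketch below uses the standard density formulas; the rate of 0.8 and the 4 degrees of freedom are hypothetical values.

```python
import math

def gamma_pdf(x, shape, scale):
    """Gamma density for x > 0 with the given shape and scale."""
    return x ** (shape - 1) * math.exp(-x / scale) / (math.gamma(shape) * scale ** shape)

def exponential_pdf(x, rate):
    """Exponential density for x >= 0."""
    return rate * math.exp(-rate * x)

def chi_squared_pdf(x, dof):
    """Chi-squared density for x > 0 with dof degrees of freedom."""
    return x ** (dof / 2 - 1) * math.exp(-x / 2) / (2 ** (dof / 2) * math.gamma(dof / 2))

points = [0.5, 1.0, 2.5, 6.0]
rate = 0.8  # hypothetical exponential rate
dof = 4     # hypothetical degrees of freedom

# Exponential(rate) is Gamma(shape=1, scale=1/rate).
exponential_matches = all(
    math.isclose(gamma_pdf(x, 1.0, 1.0 / rate), exponential_pdf(x, rate)) for x in points
)

# Chi-squared(dof) is Gamma(shape=dof/2, scale=2).
chi_squared_matches = all(
    math.isclose(gamma_pdf(x, dof / 2, 2.0), chi_squared_pdf(x, dof)) for x in points
)

print(exponential_matches, chi_squared_matches)
```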
Weibull Distribution
The Weibull distribution is a family of curves that can take on many shapes, including an exponential shape and shapes that resemble a log-normal or even a normal curve. The actual shape varies based on factors or constants in the Weibull equation. This equation has proven very effective at modeling reliability in complex systems. The factors are based on the system design parameters.
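In reliability work the Weibull shape factor is commonly read through the failure (hazard) rate: a shape below 1 gives a falling failure rate, a shape of exactly 1 gives a constant rate and reduces the curve to an exponential, and a shape above 1 gives a rising rate. The sketch below uses the standard two-parameter Weibull formulas; the scale of 4.0 and the three shape values are hypothetical.

```python
import math

def weibull_pdf(x, shape, scale):
    """Two-parameter Weibull density (returns 0.0 for x <= 0)."""
    if x <= 0:
        return 0.0
    z = x / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))

def weibull_hazard(x, shape, scale):
    """Instantaneous failure rate for x > 0."""
    return (shape / scale) * (x / scale) ** (shape - 1)

scale = 4.0                    # hypothetical scale factor
times = (1.0, 2.0, 4.0, 8.0)   # hypothetical operating times

# Failure rate over time for three illustrative shape factors.
hazards = {
    shape: [weibull_hazard(t, shape, scale) for t in times]
    for shape in (0.5, 1.0, 3.0)
}

for shape, rates in hazards.items():
    print(shape, [round(r, 4) for r in rates])

# With shape 1 the Weibull density equals an exponential density with rate 1/scale.
print(math.isclose(weibull_pdf(2.0, 1.0, scale), math.exp(-2.0 / scale) / scale))
```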
- 00:04 Hi, I'm Ray Sheen.
- 00:05 So we just finished looking at PDF distribution graphs for discrete data.
- 00:10 Now, let's consider continuous data.
- 00:13 As a reminder, continuous data can take on any value,
- 00:17 unlike discrete data, which must take on one of a limited number of values.
- 00:23 So continuous data will be things like time, distance, volume, or percentages.
- 00:28 And the PDF for continuous data will typically be a line graph,
- 00:32 rather than the bars that we saw with discrete data.
- 00:35 The first one is a distribution that we have already seen and discussed,
- 00:39 it's the normal distribution, sometimes referred to as the Gaussian distribution.
- 00:43 It is characterized by a form that is symmetrical with a high peak and
- 00:48 tails that approach 0.
- 00:50 And as we have noted in the past,
- 00:52 this is representative of common, random variation within a process.
- 00:57 By definition, this data set is normal.
- 01:00 The next one is the uniform distribution.
- 01:03 This is symmetrical because it is essentially a flat horizontal line.
- 01:08 This represents that there's no relationship between the variables being
- 01:11 plotted.
- 01:12 No matter how you change the x factor, the y factor is unfazed.
- 01:16 This distribution is non-normal.
- 01:19 On to the exponential distribution.
- 01:22 This is definitely asymmetrical and non-normal.
- 01:26 One end will occur at some point on the y-axis and the distribution will trend
- 01:31 downward and flatten out at some other value on the y-axis, often 0.
- 01:35 And the exponential could occur either trending down or trending up.
- 01:40 Either way, the change is at a constantly increasing rate.
- 01:43 There are a number of physical phenomena that will behave in this exponential
- 01:47 fashion.
- 01:48 One that is commonly found is a failure rate.
- 01:51 Now let's look at the log-normal curve.
- 01:54 This is also an asymmetrical or non-normal curve.
- 01:58 It reflects a value of 0 on both the left side and the right side of the curve.
- 02:03 However, unlike the normal curve, it is heavily skewed to one side or the other.
- 02:08 It's also a good model for some physical phenomena such as machine downtime.
- 02:13 An interesting attribute is that the logarithmic value is often normally
- 02:18 distributed.
- 02:19 That lets me introduce another curve known as the F distribution.
- 02:23 The F distribution is based upon the F ratio, which is the test
- 02:27 statistic when comparing the variances of two normal distributions.
- 02:33 This is of interest to us because the F value is calculated as part of the ANOVA
- 02:37 hypothesis test, which is a frequently used test, and we'll discuss it later.
- 02:42 Frankly, the shape of the curve will vary depending upon the number of degrees of
- 02:46 freedom.
- 02:47 The curve shown is for five degrees of freedom and
- 02:50 you can assume that this curve will be non-normal.
- 02:53 Let's move on to the chi-square distribution.
- 02:55 Again, this is a curve of a hypothesis test statistic.
- 02:59 In this case, the statistic is the chi-square.
- 03:02 In this illustration, you can see that the shape of the curve will change depending
- 03:07 upon the number of factors used in the chi-squared analysis.
- 03:10 Again, assume that it will always be non-normal.
- 03:15 Just a few more to go,
- 03:16 the beta distribution is a probability of a proportion or percentage occurring.
- 03:22 Now depending upon a number of factors, the shape can be very interesting.
- 03:26 It might be a normal curve, however, it could also be a log-normal.
- 03:30 In fact, it can even have a bathtub effect,
- 03:33 where it's high on both ends and very low in the middle.
- 03:37 Factors associated with the probability will dictate the shape of the curve.
- 03:42 So let's consider the gamma distribution.
- 03:44 It's a special case of the beta curve, and it graphs the probability that the input
- 03:49 to the problem being investigated is skewed.
- 03:51 Again, different shape factors will determine the actual shape of
- 03:55 the distribution.
- 03:57 Because of the skewness, it will be non-normal.
- 04:00 The last distribution I want to mention is, again,
- 04:03 part of the family of data distributions and it is the Weibull curve.
- 04:07 Weibull is the probability of an event occurring and
- 04:10 can represent multiple effects, and therefore will take on multiple shapes,
- 04:15 including exponential or log normal.
- 04:18 The actual curve is based upon the selection of some of the parameters in
- 04:22 the formula, but those will often change over time.
- 04:26 The Weibull distribution is most closely associated with probability analysis and
- 04:30 prediction.
- 04:32 It's usually a non-normal curve.
- 04:35 You definitely need to be familiar with the different continuous data distributions if
- 04:40 you plan to sit for the IASSC exam.
- 04:42 And it's also very helpful to understand them so
- 04:45 as to spot likely physical phenomena in the data.