How to Model Continuous Distribution Sample in R

Continuous Distributions in R

A continuous function in mathematics is one whose graph can be drawn in one continuous motion without ever lifting pen from paper. We've already seen examples of continuous probability density functions. For example, the probability density function

$$f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$$

from The Standard Normal Distribution was an example of a continuous function, having the continuous graph shown in Figure 1.

Figure 1. The standard normal distribution is continuous.

In the activities The Standard Normal Distribution and The Normal Distribution, we saw that dnorm, pnorm, and qnorm provided values of the density function, cumulative probabilities, and quantiles, respectively. In this activity, we will explore several continuous probability density functions and we will see that each has variants of the d, p, and q commands.
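As a quick refresher, the three commands can be tried on the standard normal distribution. This is only a minimal sketch; the variable names are ours and are not part of the earlier activities.

```r
# Standard normal: mean 0 and standard deviation 1 are the defaults.
d0  <- dnorm(0)     # height of the density curve at x = 0
p0  <- pnorm(0)     # area under the curve to the left of 0
q50 <- qnorm(0.5)   # the number that has half of the area to its left

print(c(density = d0, probability = p0, quantile = q50))
```

By symmetry, half of the area lies to the left of 0, so pnorm(0) and qnorm(0.5) undo one another.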

The Uniform Distribution

The Uniform Distribution is defined on an interval [a, b]. The idea is that any number selected from the interval [a, b] has an equal chance of being selected. Therefore, the probability density function must be a constant function. Because the total area under the probability density curve must equal 1 over the interval [a, b], it must be the case that the probability density function is defined as follows:

$$f(x) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b \\ 0, & \text{otherwise} \end{cases}$$

Figure 2. The probability density function for the uniform distribution on [a, b].

For example, the uniform probability density function on the interval [1,5] would be defined by f(x) = 1/(5-1), or equivalently, f(x) = 1/4. The following code sketches this uniform probability density function over the interval [1,5]. The resulting plot is shown in Figure 3.

> x=seq(1,5,length=200)
> y=rep(1/4,200)
> plot(x,y,type="l",xlim=c(0,6),ylim=c(0,0.5),lwd=2,col="red",ylab="p")
> polygon(c(1,x,5),c(0,y,0),col="lightgray",border=NA)
> lines(x,y,type="l",lwd=2,col="red")

Some comments on this code are in order:

  1. The rep command takes the value 1/4 and duplicates it 200 times. This stores a vector in the variable y having length 200, each entry of which is 1/4. This is necessary because the plot(x,y) command requires that vectors x and y have equal length.
  2. We've seen the polygon command before, but setting the argument border=NA (NA is R's "not available" value) prevents the borders of the shaded region from being drawn.
  3. The last lines command redraws the horizontal uniform density function so that it appears atop the shaded polygon.

Key Idea: Note that the area of the shaded region is 1, as it should be for all probability density functions.

Figure 3. The shaded area under the uniform density function is 1.
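The claim that the shaded area equals 1 can also be checked numerically. The sketch below uses R's built-in integrate command, which the activity itself does not use; the names f, total_area, and rect_area are ours.

```r
# Uniform density on [1, 5], wrapped so that integrate() can call it.
f <- function(x) dunif(x, min = 1, max = 5)

# Numerical area under the density curve between 1 and 5.
total_area <- integrate(f, lower = 1, upper = 5)$value
print(total_area)

# Same check by geometry: width times height of the rectangle.
rect_area <- (5 - 1) * (1 / 4)
print(rect_area)
```

Both approaches should agree, since the region under a constant density is simply a rectangle.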

Alternatively, we could have used dunif, which will produce density values for the uniform distribution. The following code will also produce the image shown in Figure 3.

> x=seq(1,5,length=200)
> y=dunif(x,min=1,max=5)
> plot(x,y,type="l",xlim=c(0,6),ylim=c(0,0.5),lwd=2,col="red",ylab="p")
> polygon(c(1,x,5),c(0,y,0),col="lightgray",border=NA)
> lines(x,y,type="l",lwd=2,col="red")

Note that the arguments min=1 and max=5 provide the endpoints of the interval [1,5] on which the uniform probability density function is defined.
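To see how min and max shape the output, it may help to evaluate dunif at a few points inside and outside the interval. The test points below are arbitrary choices of ours.

```r
# Two points outside [1, 5] (0 and 6), both endpoints, and one interior point.
x <- c(0, 1, 3, 5, 6)
heights <- dunif(x, min = 1, max = 5)
print(heights)
```

The density is 1/4 on [1,5], endpoints included, and 0 everywhere else.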

Using punif

Suppose that we would like to find the probability that the random variable X is less than or equal to 2. To calculate this probability, we would shade the region under the density function to the left of and including 2, then calculate its area.

> x=seq(1,5,length=200)
> y=dunif(x,min=1,max=5)
> plot(x,y,type="l",xlim=c(0,6),ylim=c(0,0.5),lwd=2,col="red",ylab="p")
> x=seq(1,2,length=100)
> y=dunif(x,min=1,max=5)
> polygon(c(1,x,2),c(0,y,0),col="lightgray",border=NA)
> x=seq(1,5,length=200)
> y=dunif(x,min=1,max=5)
> lines(x,y,type="l",lwd=2,col="red")

The above code produces the image shown in Figure 4.

Figure 4. The shaded area under the uniform probability density function represents the probability that x ≤ 2.

In Figure 4, the shaded region has width 1 and height 1/4, so its area is 1/4. Thus, the probability that x ≤ 2 is 1/4.

We can use the punif command to compute the probability that x ≤ 2.

> punif(2,min=1,max=5)
[1] 0.25

Note that the punif command works in precisely the same manner as did the pnorm command in the activity The Normal Distribution. It calculates the area to the left of a given number under the uniform probability density function.
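For the uniform distribution, the area to the left of a number is just the area of a rectangle, so punif can be compared with a hand calculation. This is a sketch; the names a, b, and cutoff are ours.

```r
a <- 1        # left endpoint of the interval
b <- 5        # right endpoint of the interval
cutoff <- 2   # we want the area to the left of this number

# Width of the shaded rectangle times its height 1/(b - a).
by_formula <- (cutoff - a) * (1 / (b - a))
by_punif   <- punif(cutoff, min = a, max = b)

print(c(by_formula = by_formula, by_punif = by_punif))
```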

Let's look at another example. Suppose that we wanted to find the probability that x lies between 2 and 4. We could draw the uniform distribution, then shade the area under the curve between 2 and 4.

x=seq(1,5,length=200)
y=dunif(x,min=1,max=5)
plot(x,y,type="l",xlim=c(0,6),ylim=c(0,0.5),lwd=2,col="red",ylab="p")
x=seq(2,4,length=100)
y=dunif(x,min=1,max=5)
polygon(c(2,x,4),c(0,y,0),col="lightgray",border=NA)
x=seq(1,5,length=200)
y=dunif(x,min=1,max=5)
lines(x,y,type="l",lwd=2,col="red")

The above code produces the image shown in Figure 5.

Figure 5. The shaded area under the uniform probability density function represents the probability that 2 < x < 4.

Note that the shaded region has width 2 and height 1/4, so its area is 1/2. That is, the probability that 2 < x < 4 is 1/2.

We can use punif to arrive at the same conclusion. To find the area between 2 and 4, we must subtract the area to the left of 2 from the area to the left of 4.

> punif(4,min=1,max=5)-punif(2,min=1,max=5)
[1] 0.5

Finally, if we need to find the area to the right of a given number, simply subtract the area to the left of the given number from the total area; i.e., subtract the area to the left of the given number from 1. So, the following calculation will find the probability that x > 4.

> 1-punif(4,min=1,max=5)
[1] 0.25
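As an alternative to subtracting from 1, the p commands accept a lower.tail argument; setting it to FALSE asks for the area to the right. The activity does not use this argument, so treat the following as an optional sketch with variable names of our own.

```r
# Area to the right of 4, computed two ways.
right_by_subtraction <- 1 - punif(4, min = 1, max = 5)
right_by_argument    <- punif(4, min = 1, max = 5, lower.tail = FALSE)

print(c(right_by_subtraction, right_by_argument))
```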

Using qunif

The command qunif finds quantiles for the uniform distribution in the same way that qnorm found quantiles in the activity The Normal Distribution. Thus, to find the 25th percentile for the uniform distribution on the interval [1,5], we execute the following code.

> qunif(0.25,min=1,max=5)
[1] 2

Note that this is in total agreement with the result shown in Figure 4, where we saw that the area of the shaded rectangle was 0.25. Hence, the number 2 marks the point at which 25% of the total area under the density function is attained.
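The relationship between qunif and punif can be seen by making a round trip: start with some areas, convert them to quantiles, then convert the quantiles back to areas. The particular percentiles below are arbitrary choices of ours.

```r
p <- c(0.10, 0.25, 0.50, 0.90)         # areas to the left
x <- qunif(p, min = 1, max = 5)        # numbers that cut off those areas
back <- punif(x, min = 1, max = 5)     # areas to the left of those numbers

print(x)
print(back)
```

The second printed vector should reproduce p, which is another way of saying that qunif undoes punif.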

The Exponential Distribution

The exponential probability density function is defined on the interval [0, ∞). It has the following definition.

$$f(x) = \lambda e^{-\lambda x}, \quad x \ge 0$$

Figure 6. The exponential probability density function is continuous on [0, ∞).

The exponential distribution is known to have mean μ = 1/λ and standard deviation σ = 1/λ.

Suppose that we set λ = 1. Then the mean of the distribution should be μ = 1 and the standard deviation should be σ = 1 as well. We could use the formula in Figure 6 to produce values of the probability density function, but as we've seen before, it will be more efficient to use the dexp command.

x=seq(0,4,length=200)
y=dexp(x,rate=1)
plot(x,y,type="l",lwd=2,col="red",ylab="p")

Note that the argument rate expects us to respond with the value of λ. The code above produces the exponential distribution shown in Figure 7.

Figure 7. The exponential probability density function is shown on the interval [0,4].
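The statement that the mean and the standard deviation both equal 1/λ can be checked numerically. The sketch below uses R's integrate command and an illustrative rate of λ = 2; neither appears in the activity, and the variable names are ours.

```r
lambda <- 2   # illustrative rate, not a value from the text

# Mean: integral of x * f(x) over [0, Inf).
mu <- integrate(function(x) x * dexp(x, rate = lambda), 0, Inf)$value

# Variance: integral of (x - mu)^2 * f(x) over [0, Inf).
v <- integrate(function(x) (x - mu)^2 * dexp(x, rate = lambda), 0, Inf)$value
sigma <- sqrt(v)

print(c(mean = mu, sd = sigma, one_over_lambda = 1 / lambda))
```

All three printed numbers should be close to 0.5, up to the small error of numerical integration.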

There are a number of applications where the exponential distribution is a reasonable model, for example, waiting times in queues.

The exponential probability density function is shown on the interval [0,4] in Figure 7. However, remember that the full domain is [0,∞), so we've shown only part of the full picture. In determining on what domain to draw the function, we extended the interval three standard deviations to the right of the mean: μ + 3σ = 1 + 3(1) = 4.

Unlike the normal and uniform distributions, the exponential distribution is not symmetric about its mean. However, in Figure 7 there is reasonable evidence that the distribution will "balance" about the mean at μ = 1.

Using pexp

Suppose that we want to find the probability that x ≤ 1. We would shade the area under the exponential probability density function to the left of 1, as shown in Figure 8.

> x=seq(0,4,length=200)
> y=dexp(x,rate=1)
> plot(x,y,type="l",lwd=2,col="red",ylab="p")
> x=seq(0,1,length=200)
> y=dexp(x,rate=1)
> polygon(c(0,x,1),c(0,y,0),col="lightgray")

The code above produces the shaded region under the exponential distribution shown in Figure 8.

Figure 8. Shading the region under the density curve to the left of x = 1

Now, just as we did with the uniform distribution above, we will use the pexp command to compute the area of the shaded region in Figure 8.

> pexp(1,rate=1)
[1] 0.6321206

You may find it surprising that the answer was not 50%! However, the mean at x = 1 is not the median! The graph of the exponential is "skewed to the right" and the extreme outliers at the right strongly influence the mean, pushing it to the right of the median.
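The value 0.6321206 can be reproduced by hand. For the exponential distribution, the area to the left of x has the closed form 1 − e^(−λx); this formula is a standard fact that the activity does not state, so the comparison below is offered only as a sketch.

```r
# Area to the left of x = 1 when lambda = 1, computed two ways.
by_formula <- 1 - exp(-1 * 1)    # 1 - e^(-lambda * x)
by_pexp    <- pexp(1, rate = 1)

print(c(by_formula = by_formula, by_pexp = by_pexp))
```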

As a second example, suppose that we want to find the probability that x lies between 1 and 2.

> x=seq(0,4,length=200)
> y=dexp(x,rate=1)
> plot(x,y,type="l",lwd=2,col="red",ylab="p")
> x=seq(1,2,length=200)
> y=dexp(x,rate=1)
> polygon(c(1,x,2),c(0,y,0),col="lightgray")

The above code produces the shaded region under the exponential density curve in Figure 9.

Figure 9. Calculating the probability that x lies between 1 and 2.

Again, to find the shaded area in Figure 9, we must subtract the area to the left of x = 1 from the area to the left of x = 2.

> pexp(2,rate=1)-pexp(1,rate=1)
[1] 0.2325442

Thus, the probability of selecting a number between 1 and 2 from this distribution is approximately 0.2325442.

Finally, if we need to find the area to the right of a given number, simply subtract the area to the left of the given number from the total area. For example, to find the probability that x > 3, subtract the probability that x ≤ 3 from 1.

> 1-pexp(3,rate=1)
[1] 0.04978707
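Because the "subtract the left areas" recipe is the same for every distribution, it can be wrapped in a small function. The helper prob_between below is hypothetical (it is not part of R or of this activity); it is a sketch that accepts any p command and passes the remaining arguments along.

```r
# Hypothetical helper: pfun is any "p" command such as punif or pexp.
# Extra arguments (min, max, rate, ...) are passed through unchanged.
prob_between <- function(pfun, lower, upper, ...) {
  pfun(upper, ...) - pfun(lower, ...)
}

unif_2_4    <- prob_between(punif, 2, 4, min = 1, max = 5)  # P(2 < x < 4), uniform on [1,5]
exp_1_2     <- prob_between(pexp, 1, 2, rate = 1)           # P(1 < x < 2), exponential
exp_right_3 <- prob_between(pexp, 3, Inf, rate = 1)         # P(x > 3), exponential

print(c(unif_2_4, exp_1_2, exp_right_3))
```

Using Inf as the upper limit works because the area to the left of infinity is the total area, 1.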

Using qexp

The command qexp finds quantiles for the exponential distribution in the same way that qunif finds quantiles for the uniform distribution. Thus, to find the 50th percentile for the exponential distribution with λ = 1, we execute the following code.

> qexp(0.50,rate=1)
[1] 0.6931472

This result is in keeping with the fact that the distribution is skewed badly to the right. The outliers at the right end greatly influence the mean, pushing it to the right. With the mean at x = 1, the current result for the median makes good sense. Fifty percent of the data lies to the left of x = 0.6931472 and fifty percent of the data lies to the right of x = 0.6931472.
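The number 0.6931472 is the natural logarithm of 2. Setting the area to the left, 1 − e^(−λm), equal to 0.5 and solving for m gives m = ln(2)/λ; this derivation is not part of the activity, so the check below is only a sketch with variable names of our own.

```r
lambda <- 1

median_by_formula <- log(2) / lambda           # log() is the natural logarithm in R
median_by_qexp    <- qexp(0.50, rate = lambda)

# The area to the left of the median should be one half.
area_left <- pexp(median_by_qexp, rate = lambda)

print(c(median_by_formula, median_by_qexp, area_left))
```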

Regarding d, p, and q

As we've seen in this and previous activities, the letters d, p, and q have special meanings:

  • "d" is for "density." It is used to find values of the probability density function.
  • "p" is for "probability." It is used to find the probability that the random variable lies to the left of a given number.
  • "q" is for "quantile." It is used to find the quantiles of a given distribution.

In connection with the normal distribution, dnorm calculates values of the normal probability density function. Similarly, dunif and dexp calculate values of the uniform and exponential probability density functions, and dbinom calculates values of the binomial probability function (the binomial distribution is discrete, so these values are probabilities rather than densities).

In connection with the normal distribution, pnorm calculates area under the normal probability density function to the left of a given number. Similarly, punif and pexp calculate area under the uniform and exponential probability density functions to the left of a given number, and pbinom calculates the corresponding cumulative probability for the binomial distribution.

In connection with the normal distribution, qnorm calculates quantiles for the normal distribution. Similarly, qbinom, qunif, and qexp calculate quantiles for the binomial, uniform, and exponential distributions, respectively.

This common use of d, p, and q is by design. There are a host of statistical distributions that we've yet to introduce. For example, there is a distribution called the Beta Distribution. The commands dbeta, pbeta, and qbeta would be used to calculate values of the beta probability density function, to calculate the area to the left of a given number underneath the beta probability density function, and to calculate quantiles for the beta distribution.
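To illustrate, here is a minimal sketch of the three beta commands. The shape parameters shape1 = 2 and shape2 = 2 are our own illustrative choice; they give a density on [0,1] that is symmetric about 0.5.

```r
# Illustrative shape parameters (not from the text).
dens  <- dbeta(0.5, shape1 = 2, shape2 = 2)   # height of the density at 0.5
prob  <- pbeta(0.5, shape1 = 2, shape2 = 2)   # area to the left of 0.5
quant <- qbeta(0.5, shape1 = 2, shape2 = 2)   # 50th percentile

print(c(density = dens, probability = prob, quantile = quant))
```

Because this particular density is symmetric about 0.5, half of the area lies to the left of 0.5 and the 50th percentile is 0.5.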

Enjoy!

We hope you enjoyed this introduction to the uniform and exponential distributions in R. We encourage you to explore further. Try typing ?dunif or ?dexp and reading the resulting help files.

Source: https://mse.redwoods.edu/darnold/math15/UsingRInStatistics/ContinuousDistributions.php