Distributions

One of the main strengths of the scipy.stats module is the large number of probability distributions it implements, both continuous and discrete. The list includes at least 80 continuous distributions and 10 discrete distributions.

One of the most common uses of these distributions is generating random numbers. We have already used this technique to contaminate our images with noise; for example:

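>>> import scipy.misc
>>> from scipy.stats import signaltonoise
>>> from scipy.stats import norm     # Gaussian distribution
>>> # Note: scipy.misc.lena and scipy.stats.signaltonoise exist only in older SciPy releases
>>> lena = scipy.misc.lena().astype(float)
>>> lena += norm.rvs(loc=0, scale=16, size=lena.shape)   # zero-mean noise, standard deviation 16
>>> signaltonoise(lena, axis=None)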

The output is shown as follows:

array(2.459233897516763)
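Because scipy.misc.lena and scipy.stats.signaltonoise are not available in recent SciPy releases, the same idea can be sketched with plain NumPy. The block below is only an illustration under stated assumptions: the test image is a synthetic gradient standing in for the original picture, signal_to_noise is a hypothetical helper that mimics what the legacy function is understood to have computed (the mean divided by the standard deviation), and the fixed seed is there only for repeatability. Its numbers will not match the output shown above.

```python
import numpy as np
from scipy.stats import norm


def signal_to_noise(a, ddof=0):
    """Hypothetical helper: mean divided by standard deviation over all entries."""
    a = np.asarray(a, dtype=float)
    sd = a.std(ddof=ddof)
    return 0.0 if sd == 0 else a.mean() / sd


# Stand-in image (an assumption): a 512x512 horizontal gradient with values in [0, 255]
image = np.tile(np.linspace(0.0, 255.0, 512), (512, 1))

# Additive Gaussian noise with mean 0 and standard deviation 16
noise = norm.rvs(loc=0, scale=16, size=image.shape, random_state=0)
noisy = image + noise

snr_clean = signal_to_noise(image)
snr_noisy = signal_to_noise(noisy)
print(snr_clean, snr_noisy)
```

Adding zero-mean noise leaves the mean almost unchanged but increases the standard deviation, so the ratio for the noisy image should come out lower than for the clean one.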

Let's see the SciPy way of handling distributions. First, a random variable class is created (in SciPy there is the rv_continuous class for continuous random variables and the rv_discrete class for the discrete case). Each continuous random variable has an associated probability density function (pdf), a cumulative distribution function (cdf), a survival function along with its inverse (sf, isf), and the usual descriptive statistics such as the mean, variance, and median. Each one also has an rvs method, which is what we used above to generate the random samples. For example, for a Pareto continuous random variable with shape parameter b = 5, we could inspect these properties by issuing the following commands:

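>>> import numpy
>>> from scipy.stats import pareto
>>> import matplotlib.pyplot as plt
>>> x = numpy.linspace(1, 10, 1000)
>>> # Plot pdf and cdf against x so that the horizontal axis shows the variable's values
>>> plt.subplot(131); plt.plot(x, pareto.pdf(x, 5))
>>> plt.subplot(132); plt.plot(x, pareto.cdf(x, 5))
>>> # The random samples are plotted against their index
>>> plt.subplot(133); plt.plot(pareto.rvs(5, size=1000))
>>> plt.show()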

This produces three graphs: the probability density function (left), the cumulative distribution function (center), and 1,000 randomly generated samples (right).
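The remaining methods mentioned earlier (sf, isf, and the descriptive statistics) can be checked numerically as well. The following is a minimal sketch, assuming a SciPy version in which rvs accepts a random_state argument; the variable names, the probabilities in q, and the fixed seed are illustrative choices rather than part of the original example.

```python
import numpy as np
from scipy.stats import pareto

b = 5
rv = pareto(b)   # "frozen" Pareto random variable with shape parameter b = 5

x = np.linspace(1, 10, 1000)

# The survival function is the complement of the cdf
sf_matches_cdf = np.allclose(rv.sf(x), 1 - rv.cdf(x))

# isf inverts sf: applying sf to isf(q) should return q
q = np.array([0.1, 0.5, 0.9])
isf_roundtrip = np.allclose(rv.sf(rv.isf(q)), q)

# Mean and variance, compared with the closed forms for b > 2
mean, var = rv.stats(moments='mv')
mean_expected = b / (b - 1)
var_expected = b / ((b - 1) ** 2 * (b - 2))

# Random samples; the sample mean should be close to the theoretical mean
samples = rv.rvs(size=100000, random_state=0)

print(sf_matches_cdf, isf_roundtrip)
print(float(mean), float(var), samples.mean())
```

Freezing the distribution with pareto(b) avoids repeating the shape parameter in every call, which is convenient when several methods are evaluated for the same random variable.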