This is a short Bayesian analysis tutorial developed around the following problem: Consider the following dataset, which is a time series of recorded coal mining disasters in the UK from 1851 to 1962 [Jarrett, 1979].
The model can then be defined as
The nest step is fitting the probability model (linked collection of probabilistic variable) to the recorded mining disaster time series. This means we are trying to represent the posterior distribution. Markov chain Monte Carlo (MCMC) algorithm [Gamerman 1997] is the method of choice. In this case we represent the posterior p(s,e,l|D) by a set of joint samples of it. The MCMC sampler produces these samples by randomly updating the values of s, e and l for a number of iterations. This updating algorithm is called Metropolis-Hasting [Gelman et al. 2004].
With a reasonable large number of samples, the MCMC distribution of s, e and l converges to the stationary distribution.
There are many general-purpose MCMC packages. Here I use PyMC - a python module created by David Huard and Anand Patil and Chris Fonnesbeck. I prefer a programming interface to stand alone programs like WinBugs for its flexibility. It uses high performance numerical libraries like numpy and optimized Fortran routines. The following code imports the model, instantiates the MCMC object and run the sampler algorithm:
The iter-parameter specifies the number of updating iterations for the sampling algorithm. The algorithm assumes that the burn-parameter specifies a sufficient large number of iterations for convergence to stationarity of distribution s, e and l. To avoid strong autocorrelation among samples in the Markov chain, successive samples can be thinned. This means only every k-th sample is retained, where k is given by the thin-parameter. The number of samples for each variable is therefore (iter-burn)/thin.
Below are the 900 samples (left) of variable s - the year in which the rate parameter changed. The histogram (right) with a mean at 40 and 95% HPD interval (35, 44).
95% HPD interval (2.5, 3.7).
95% HPD interval (0.7, 1.2).