This is a short Bayesian analysis tutorial developed around the following problem: Consider the following dataset, which is a time series of recorded coal mining disasters in the UK from 1851 to 1962 [Jarrett, 1979]. The first step in the analysis is the Bayesian model building. We assume that the occurrences of the disasters can be modelled as a poisson process. Our hypothesis is, that at some point in time the mining process switched to a new technology, resulting in a lower rate of disasters in the later part of the time series. We define the following variables for the Bayesian stochastic model:
The model can then be defined as where D is dependent on s, e, and l. In a Bayesian network the probabilistic variables s, e, and l are considered as 'parent' nodes od D and D is a 'child' node of s, e, and l. Similarly, the 'parents' of s are t_l and t_h, and s is the 'child' of t_l and t_h.
The nest step is fitting the probability model (linked collection of probabilistic variable) to the recorded mining disaster time series. This means we are trying to represent the posterior distribution. Markov chain Monte Carlo (MCMC) algorithm [Gamerman 1997] is the method of choice. In this case we represent the posterior p(s,e,l|D) by a set of joint samples of it. The MCMC sampler produces these samples by randomly updating the values of s, e and l for a number of iterations. This updating algorithm is called Metropolis-Hasting [Gelman et al. 2004]. With a reasonable large number of samples, the MCMC distribution of s, e and l converges to the stationary distribution. There are many general-purpose MCMC packages. Here I use PyMC - a python module created by David Huard and Anand Patil and Chris Fonnesbeck. I prefer a programming interface to stand alone programs like WinBugs for its flexibility. It uses high performance numerical libraries like numpy and optimized Fortran routines. The following code imports the model, instantiates the MCMC object and run the sampler algorithm: >>> from pymc.examples import DisasterModel >>> from pymc import MCMC >>> M = MCMC(DisasterModel)>>> M.isample(iter=10000, burn=1000, thin=10)Below are the 900 samples (left) of variable s - the year in which the rate parameter changed. The histogram (right) with a mean at 40 and 95% HPD interval (35, 44). Next, the 900 samples (left) of variable e - the early rate parameter prior to s. The histogram (right) with a mean at 3.0 and 95% HPD interval (2.5, 3.7). Finally, the 900 samples (left) of variable l - the late rate parameter posterior to s. The histogram (right) with a mean at 0.9 and 95% HPD interval (0.7, 1.2).
|




