The nice thing about BUGS/JAGS/Stan/etc is that they can operate on arbitrarily complex bayesian networks. You can take my running ‘coin toss’ example and add extra layers. Imagine that we believe that the mint who made the coin produces coins who bias ranges between theta=0.7 and theta=0.9 uniformly. Now we can take data about coin tosses, and use it to infer not only knowledge about the bias of one coin, but also about the coins made by the mint.

But this kind of generality comes at a cost. Let’s look at a simpler model: we have ten datapoints, drawn from a normal distribution with mean mu and standard deviation sigma, and we start with uniform priors over mu and sigma.

For particular values of mu and sigma, the posterior density is proportional to the likelihood, which is a product of gaussians. However, we can avoid doing a naive N exponentials with a bit of algebra, instead doing a single exponential involving a summation. So, as we add more data points, the runtime cost of evaluating the posterior (or at least something proportional to it) will rise, but only at the cost of a few subtractions/squares/divides rather than more exponentials.

In contrast, when I use JAGS to evaluate 20 datapoints, it does twice as many log() calls as it does for 10 datapoints, so seems not to be leveraging any algebraic simplifications.

Next step: write a proof of concept MCMC sampler which runs faster than JAGS for the non-hierarchical cases which are most useful to me.