This short guide illustrates how to estimate hyperparameters from posterior samples with bilby, using a simple problem.

Setting up the problem

We are given three data sets labelled a, b, and c. Each data set consists of \(N\) observations of a dependent variable \(y\) taken at values of an independent variable \(x\). Notationally, we can write these three data sets as \((y_i^a, x_i^a), (y_i^b, x_i^b), (y_i^c, x_i^c)\), where \(i = 1, \ldots, N\) indexes the observations within each data set.
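To make the setup concrete, here is a minimal sketch of how data like this might be simulated. All parameter values (the population mean and width of the intercepts, the slopes, and the noise level) are assumptions for illustration, not values taken from the actual data sets.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 10  # observations per data set

# Assumed population: each data set's intercept c0 is drawn from a normal
# distribution N(mu_c0=3, sigma_c0=0.5); slopes and noise level are
# arbitrary choices for illustration.
data = {}
for label in ["a", "b", "c"]:
    c0 = rng.normal(3, 0.5)                 # intercept drawn from the population
    c1 = rng.uniform(-1, 1)                 # each data set has its own slope
    x = np.sort(rng.uniform(0, 10, N))      # independent variable
    y = c0 + c1 * x + rng.normal(0, 1, N)   # linear model plus unit Gaussian noise
    data[label] = (x, y)
```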

Plotting the data, we see that all three look like they could be modelled by a linear function with some slope and some intercept:


Given any individual data set, you could write down a linear model \(y(x) = c_0 + c_1 x\) and infer the intercept and gradient. For example, given the a-data set, you could calculate \(P(c_0|y_i^a, x_i^a)\) by fitting the model to the data. Here is a figure demonstrating the posteriors on the intercept for the three data sets given above:


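One way to produce posteriors like these with bilby is sketched below, continuing from the simulated data above. The prior ranges, the fixed noise level, and the run labels are illustrative assumptions.

```python
import bilby

def model(x, c0, c1):
    """Linear model y(x) = c0 + c1 * x."""
    return c0 + c1 * x

results = {}
for label, (x, y) in data.items():
    # Gaussian likelihood with a known noise standard deviation (assumed sigma=1)
    likelihood = bilby.core.likelihood.GaussianLikelihood(x, y, model, sigma=1)
    priors = dict(
        c0=bilby.core.prior.Uniform(-10, 10, "c0"),
        c1=bilby.core.prior.Uniform(-10, 10, "c1"),
    )
    results[label] = bilby.run_sampler(
        likelihood=likelihood, priors=priors, sampler="dynesty",
        label=f"linear_{label}", outdir="outdir",
    )
```

Each `results[label].posterior` is a pandas DataFrame; its `c0` column holds the posterior samples used in the hierarchical step below.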
While the data look noisy, the overlap in the \(c_0\) posteriors might (especially given context about how the data were produced and the physical setting) lead you to believe all three data sets share a common intercept. How would you go about estimating it? You could simply take the mean of the posterior means, and that would give you a reasonable estimate. However, we’ll now discuss how to do this with hyperparameters.

Understanding the population using hyperparameters

We first need to define our hyperparameterized model. In this case, it is as simple as

\[c_0 \sim \mathcal{N}(\mu_{c_0}, \sigma_{c_0})\]

That is, we model the population of intercepts \(c_0\) (from which each data set's intercept was drawn) as a normal distribution with mean \(\mu_{c_0}\) and standard deviation \(\sigma_{c_0}\).
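Written as code, the population model is just the normal density evaluated at each posterior sample of \(c_0\). This is a minimal sketch; the names `population_model`, `mu_c0`, and `sigma_c0` are placeholders for the quantities above.

```python
import numpy as np

def population_model(c0, mu_c0, sigma_c0):
    """Gaussian population density N(mu_c0, sigma_c0) evaluated at intercepts c0."""
    return np.exp(-((c0 - mu_c0) ** 2) / (2 * sigma_c0 ** 2)) / np.sqrt(
        2 * np.pi * sigma_c0 ** 2
    )
```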

The likelihood for the hyperparameters is obtained by marginalizing over the intercept of each data set. Writing \(\Lambda = (\mu_{c_0}, \sigma_{c_0})\) and \(d^k\) for the \(k\)-th data set,

\[\mathcal{L}(d^k | \Lambda) = \int \mathcal{L}(d^k | c_0)\, \pi(c_0 | \Lambda)\, dc_0 \,,\]

and the total likelihood is the product of this over the three data sets. If each data set has already been analysed under some sampling prior \(\pi(c_0)\), Bayes' theorem gives \(\mathcal{L}(d^k | c_0) = \mathcal{Z}_k\, p(c_0 | d^k) / \pi(c_0)\), where \(\mathcal{Z}_k\) is the evidence of that analysis. The integral is then an expectation over the posterior \(p(c_0 | d^k)\), which can be approximated using the \(n_k\) posterior samples \(c_0^{k,j}\):

\[\mathcal{L}(d^k | \Lambda) \approx \frac{\mathcal{Z}_k}{n_k} \sum_{j=1}^{n_k} \frac{\pi(c_0^{k,j} | \Lambda)}{\pi(c_0^{k,j})} \,.\]
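This reweighting sum can be written as a custom bilby likelihood. The sketch below is one way to do it, assuming each data set was analysed with the uniform \(c_0\) prior of width 20 used above; the class and variable names are illustrative. Since the \(\mathcal{Z}_k\) are constants with respect to \(\Lambda\), they are dropped. (bilby also ships `bilby.hyper.likelihood.HyperparameterLikelihood`, which implements the same idea; its exact signature varies between versions.)

```python
import numpy as np
import bilby

class SharedInterceptLikelihood(bilby.Likelihood):
    """Hierarchical likelihood: for each data set, average the population
    density over its c0 posterior samples and divide by the sampling prior.
    The constant evidences Z_k are dropped; they do not affect the
    posterior on the hyperparameters."""

    def __init__(self, c0_samples, c0_prior_width):
        super().__init__(parameters={"mu_c0": None, "sigma_c0": None})
        self.c0_samples = c0_samples  # list of sample arrays, one per data set
        self.log_run_prior = -np.log(c0_prior_width)  # log density of the uniform prior

    def log_likelihood(self):
        mu, sigma = self.parameters["mu_c0"], self.parameters["sigma_c0"]
        log_l = 0.0
        for c0 in self.c0_samples:
            weights = population_model(c0, mu, sigma)  # defined above
            log_l += np.log(np.mean(weights)) - self.log_run_prior
        return log_l

# Collect the intercept samples from the three fits above
c0_samples = [results[label].posterior["c0"].values for label in ["a", "b", "c"]]

hyper_priors = dict(
    mu_c0=bilby.core.prior.Uniform(-10, 10, "mu_c0"),
    sigma_c0=bilby.core.prior.Uniform(0, 10, "sigma_c0"),
)
hyper_result = bilby.run_sampler(
    likelihood=SharedInterceptLikelihood(c0_samples, c0_prior_width=20),
    priors=hyper_priors, sampler="dynesty", label="hyper", outdir="outdir",
)
```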

For the samples in the figure above, the posterior on these hyperparameters is given by