Gibbs Sampler
In this blog we are going to discuss the Gibbs Sampler with an applied example. But we will not dive into the Gibbs Sampler directly; we will first discuss stochastic processes and Markov chains. These will be nothing more than brief reviews of those topics. Let’s get started.
1. Stochastic Process
A stochastic process consists of a set of states and an element of random chance. As time goes by, the process may change its state according to some probability that comes from this randomness. In other words, even if two processes start from identical conditions, we may not get the same pattern of moves between those two processes as time evolves.
Let’s say the process can only change its value at discrete time points such as t = 1, 2, ..., n. It is then called a discrete-time stochastic process. We also have continuous-time stochastic processes, in which the process can change its state at any point in time. For simplicity we only consider discrete-time processes in this blog.
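As a toy illustration (not from the original post), a simple symmetric random walk is a discrete-time stochastic process: at each step the state moves up or down by 1 with equal probability, so two runs started from the same state will generally trace different paths.

```python
import numpy as np

def random_walk(n_steps, start=0, seed=None):
    """Simulate a simple symmetric random walk over n_steps discrete time steps."""
    rng = np.random.default_rng(seed)
    steps = rng.choice([-1, 1], size=n_steps)            # random +/-1 move at each step
    return start + np.concatenate(([0], np.cumsum(steps)))

# Two walks started from identical conditions generally follow different paths.
print(random_walk(10, seed=1))
print(random_walk(10, seed=2))
```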
2. Markov Chains
Markov chains are a type of stochastic process, that is, processes that move around a set of possible values where the future values cannot be predicted with certainty. There are probability distributions over the states, so there is some chance element in the evolution of the process through time. The set of possible values is called the state space of the process.
Markov chains have the “memoryless” property. That means the future state of the process depends only on the present state. Normally, the joint distribution of the process at times 0, 1, 2, ..., n can be built up sequentially from the previous joint probabilities and the conditional probabilities (the transition probabilities) of the next observation given the previous values of the process, using the multiplication rule:

P(X_0 = x_0, X_1 = x_1, ..., X_n = x_n) = P(X_0 = x_0) P(X_1 = x_1 | X_0 = x_0) ... P(X_n = x_n | X_0 = x_0, ..., X_{n-1} = x_{n-1})
But Markov chains do this in a different way. As stated earlier, they have the memoryless property, which means the future state of the process depends only on the present state. So, we can rewrite each conditional probability as

P(X_n = x_n | X_0 = x_0, ..., X_{n-1} = x_{n-1}) = P(X_n = x_n | X_{n-1} = x_{n-1}),

and the joint distribution becomes

P(X_0 = x_0, X_1 = x_1, ..., X_n = x_n) = P(X_0 = x_0) P(X_1 = x_1 | X_0 = x_0) ... P(X_n = x_n | X_{n-1} = x_{n-1}).
The memoryless property is also known as the Markov property.
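A minimal sketch of this in code (the 3-state transition matrix P below is made up for illustration): each row of P gives the distribution of the next state given the current state, and the simulation never looks further back than the current state.

```python
import numpy as np

# Hypothetical 3-state transition matrix: row i is the distribution
# of the next state given that the current state is i.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

def simulate_chain(P, n_steps, start=0, seed=0):
    """Simulate a discrete-time Markov chain; each move uses only the current state."""
    rng = np.random.default_rng(seed)
    states = [start]
    for _ in range(n_steps):
        current = states[-1]
        states.append(int(rng.choice(len(P), p=P[current])))
    return states

print(simulate_chain(P, n_steps=15))
```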
3. Gibbs Sampler
The Gibbs Sampler is a method for generating random variables from a marginal distribution by using conditional distributions. Knowledge of the conditional distributions is sufficient to determine a joint distribution [1]. Thanks to the Gibbs Sampler, we are able to avoid the numerical solution of difficult integrals, replacing them with a sequence of easier calculations. Let’s say we want to compute the marginal probability of X given the joint density f(x, y):

f(x) = ∫ f(x, y) dy    (1)
The integration in (1) is difficult in many cases. The Gibbs sampler is an alternative method for obtaining f(x). Rather than computing f(x) directly, the Gibbs sampler allows us to generate a sample X_1, ..., X_n from f(x) without knowing the actual f(x). If we draw a large enough sample using the Gibbs sampler, we can approximate any characteristic of f(x), such as its mean and variance. So, by taking n large enough, we can even approximate the density itself. As n gets larger, the accuracy of our approximation to f(x) will increase.
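For example, once we have draws X_1, ..., X_n that are (approximately) distributed according to f(x), the mean and variance can be approximated by the usual sample averages:

E[X] ≈ (1/n) Σ_{i=1}^{n} X_i,    Var(X) ≈ (1/n) Σ_{i=1}^{n} (X_i − X̄)²,

where X̄ denotes the sample mean of the draws.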
4. Gibbs Sequence
Let’s consider a pair of random variables (X, Y). The Gibbs Sampler generates a sample from f(x) by sampling instead from the conditional distributions f(x|y) and f(y|x). We work only with these conditional distributions. If we generate from f(x|y) enough times, we can recover f(x). This can be explained by the Rao-Blackwell theorem, which we cover in the next section. Now, let’s see how f(y|x) and f(x|y) are used.
Taking samples alternately from f(y|x) and f(x|y), starting from an initial value Y'_0 = y'_0, will give us the Gibbs sequence

Y'_0, X'_0, Y'_1, X'_1, Y'_2, X'_2, ..., Y'_k, X'_k    (**)
The X'_j's come from f(x | Y'_j = y'_j) and the Y'_{j+1}'s come from f(y | X'_j = x'_j). This iterative process is Gibbs sampling. As k in (**) goes to infinity, the distribution of X'_k converges to f(x).
In (**), we need to reach Y'_j first to get X'_j. So, if we want the marginal distribution of X, we need the X' values, and obtaining them requires generating the Y' values as well. To go from X'_j to X'_{j+1} we have to go through Y'_{j+1}, so the iteration sequence is X'_0 → Y'_1 → X'_1 → Y'_2 → X'_2 → ..., and X'_0, X'_1, X'_2, ... forms a Markov chain with transition probability

P(X'_{j+1} = x | X'_j = x') = ∫ f(x | y) f(y | x') dy.
As discussed in the second section, the next state depends only on the present state in a Markov chain.
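As a concrete sketch (not the applied example from the original post), consider a standard bivariate normal pair (X, Y) with zero means, unit variances, and correlation rho. Both conditionals are normal, f(x|y) = N(rho·y, 1 − rho²) and f(y|x) = N(rho·x, 1 − rho²), so the Gibbs sequence is easy to generate:

```python
import numpy as np

def gibbs_bivariate_normal(n_iter, rho=0.6, y0=0.0, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Alternates draws from f(x | y) = N(rho*y, 1 - rho^2) and
    f(y | x) = N(rho*x, 1 - rho^2), producing the Gibbs sequence
    Y'_0, X'_0, Y'_1, X'_1, ...
    """
    rng = np.random.default_rng(seed)
    sd = np.sqrt(1.0 - rho ** 2)          # conditional standard deviation
    xs = np.empty(n_iter)
    ys = np.empty(n_iter)
    y = y0                                # starting value Y'_0
    for i in range(n_iter):
        x = rng.normal(rho * y, sd)       # X'_i drawn from f(x | y)
        y = rng.normal(rho * x, sd)       # Y'_{i+1} drawn from f(y | x)
        xs[i], ys[i] = x, y
    return xs, ys

xs, ys = gibbs_bivariate_normal(n_iter=20000)
# The X' draws approximate the marginal f(x), which here is N(0, 1).
print(xs.mean(), xs.var())                # should be close to 0 and 1
```

In practice, the first part of the chain (the burn-in) is usually discarded so that the retained draws are closer to the target marginal distribution.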
5. Rao-Blackwell Theorem
The Rao-Blackwell theorem states that if θ̂ is an estimator of θ and T is a sufficient statistic for θ, then the conditional expectation E[θ̂ | T] is at least as good an estimator of θ as θ̂ itself (its variance is never larger). So, when there is a model with an estimator and a sufficient statistic, as we have in the Gibbs Sampler, you can use the conditional expectation above to get a better estimator.
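In the Gibbs sampling setting, this is why averaging the conditional densities over the sampled y values, a conditional expectation of exactly this kind, typically gives a better (lower-variance) estimate of the marginal than a histogram of the X' draws. Assuming f(x | y) is available in closed form, the estimate is

f̂(x) = (1/n) Σ_{i=1}^{n} f(x | Y'_i = y'_i),

which approaches f(x) as n grows.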