In this post we will explore Markov Chain Monte Carlo (MCMC) methods, and specifically Metropolis-Hastings. MCMC methods allow us to sample from a distribution when all we know is $\phi(x)=c\times f(x)$, where $f(x)$ is the density of the target distribution at point $x$ and $c$ is some fixed (possibly unknown) positive constant. This lets us sample from the distribution without knowing its normalization term. Recall that every density function must integrate to $1$, i.e., $\int f(x)\,dx=1$. Often, however, we do not know $f(x)$ exactly, but we do know $\phi(x)$ for some $c$. Notice that $\phi$ is then not technically a density, but through Metropolis-Hastings we can nevertheless sample as if it had been normalized.
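To make this concrete, here is a minimal sketch of random-walk Metropolis in Python (not the demo's actual code; the function name, step size, and example $\phi$ are illustrative). Because the proposal is a symmetric Gaussian step, the Hastings correction cancels, and the unknown constant $c$ cancels in the acceptance ratio $\phi(x')/\phi(x)$:

```python
import numpy as np

def metropolis_hastings(phi, x0, n_steps, step_size=0.5, rng=None):
    """Sample from a density proportional to phi using a symmetric
    Gaussian random-walk proposal (the Hastings correction cancels)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_steps,) + x.shape)
    for t in range(n_steps):
        proposal = x + step_size * rng.standard_normal(x.shape)
        # Accept with probability min(1, phi(proposal) / phi(x));
        # the unknown normalization constant c cancels in this ratio.
        if rng.random() < phi(proposal) / phi(x):
            x = proposal
        samples[t] = x
    return samples

# Example: phi is an unnormalized 2-D standard Gaussian
# (the sampler never needs the normalization constant).
phi = lambda x: np.exp(-0.5 * np.sum(x**2))
chain = metropolis_hastings(phi, x0=[3.0, 3.0], n_steps=20000,
                            rng=np.random.default_rng(0))
```

Discarding an initial burn-in (say the first half of `chain`) leaves samples whose empirical mean and spread match the target Gaussian.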
Running Metropolis-Hastings (MH) over several distributions
Through the MH algorithm, in the following we simulate the evolution of carefully designed 2-dimensional Markov chains, many times in parallel. MH gives us a way to make the stationary distribution of these Markov chains be the distribution of interest. In the following, you can try different distributions, some of which are constant over a particular shape, e.g. uniform over the map of Greece or the streets of Manhattan. We also include non-uniform distributions, such as a population heatmap of the USA and mixtures of Gaussians. You can also try the wiggly ring distribution, which changes over time. Finally, we include a 4-dimensional Markov chain, which assigns uniform probability over all pairs of points in Greece and Turkey that are not too far from each other.
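Running many chains in parallel vectorizes nicely, since each chain's accept/reject decision is independent. Below is a hedged sketch (again not the demo's actual code; `mh_parallel` and the mixture target are illustrative) targeting an unnormalized mixture of two Gaussians, one of the distribution types mentioned above:

```python
import numpy as np

def mh_parallel(phi, x0, n_steps, step_size=0.5, rng=None):
    """Run many independent random-walk Metropolis chains at once.
    x0 has shape (n_chains, dim); phi maps such an array to (n_chains,)
    unnormalized weights."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        proposal = x + step_size * rng.standard_normal(x.shape)
        # Vectorized accept/reject: one uniform draw per chain.
        accept = rng.random(x.shape[0]) < phi(proposal) / phi(x)
        x[accept] = proposal[accept]
    return x

# Unnormalized mixture of two 2-D Gaussians centered at (-2,0) and (2,0).
def phi(x):
    d1 = np.sum((x - np.array([-2.0, 0.0]))**2, axis=1)
    d2 = np.sum((x - np.array([ 2.0, 0.0]))**2, axis=1)
    return np.exp(-0.5 * d1) + np.exp(-0.5 * d2)

points = mh_parallel(phi, x0=np.zeros((1000, 2)), n_steps=3000,
                     rng=np.random.default_rng(0))
```

After enough steps, the cloud of chain positions approximates the mixture: both modes are populated, and by symmetry the cloud is roughly balanced between them.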
Challenges and Details
Points outside the support of the distribution
In the above implementation, points outside the support of the distribution are assigned a very low weight in $\phi$ (rather than $0$). Such points therefore perform essentially a random walk for some time, until they stumble upon a point within the support; there is then a positive but very low probability that they exit the support again. This has some benefits: for example, if the support consists of disjoint sets and the weight outside were exactly $0$, then unless the proposal step is sufficiently big we could never move between the sets, and the Markov chain would essentially be trapped in one subset of the states.
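The trick can be sketched as follows (an illustrative toy, not the demo's code; the two-disk support and the value of `EPS` are assumptions). Outside the support the acceptance ratio is $1$, so the chain random-walks freely; exiting the support is accepted only with probability `EPS`:

```python
import numpy as np

EPS = 1e-6  # tiny but nonzero weight outside the support

def phi(x):
    """Uniform over two disjoint unit disks centered at (-3,0) and (3,0);
    weight EPS (instead of 0) everywhere else."""
    in_left  = np.sum((x - np.array([-3.0, 0.0]))**2) <= 1.0
    in_right = np.sum((x - np.array([ 3.0, 0.0]))**2) <= 1.0
    return 1.0 if (in_left or in_right) else EPS

inside  = np.array([-3.0, 0.0])   # in the left disk
outside = np.array([ 0.0, 0.0])   # between the disks

# Exiting the support is accepted with probability EPS: rare but possible,
# so the chain can eventually cross from one disk to the other.
print(phi(outside) / phi(inside))   # 1e-06
# Between two points outside the support the ratio is 1, so the chain
# performs a pure random walk until it stumbles upon a disk.
print(phi(outside) / phi(outside))  # 1.0
```

With weight exactly $0$ outside, both ratios would be undefined or zero, and a small-step chain could never bridge the gap between the disks.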
If we start from a point outside the support, the Markov chain might take a while until it stumbles upon a point in the support. This could be mitigated if we know roughly where the support is, or if we can instead sample from a distribution whose density is proportional to the inverse of the distance from the support, or some other approach along these lines. Notice that, in particular, in the case of Greece-Turkey nearby points, chains take quite a while until they end up at a point within the support. One could initialize the chains from a point within the support if such a point is known. Notice also that the marginal distributions of the points in Turkey and of the points in Greece are not uniform, even though the distribution over pairs of points is.