Here, if $X$, $Y$, and $Z$ are disjoint sets of vertices such that every path from $X$ to $Y$ passes through $Z$, then $Z$ separates $X$ and $Y$.
The Hammersley-Clifford theorem implies an area law for mutual
information; we will explain what this means and sketch why it is true.
Divide a system into two disjoint pieces $A$ and $B$. We want to know
about the mutual information between $A$ and $B$, $I(A:B)$. The
Hammersley-Clifford theorem gives us a bound which depends only on the
size of the boundary between these sets. For simplicity,
assume that the boundary $\partial A$ (the sites of $A$ within
interaction range of $B$) is contained in $A$. Also, assume that the
interactions have bounded range; then, the Hammersley-Clifford theorem
tells us that $I(A \setminus \partial A : B \mid \partial A) = 0$.
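As a sanity check (not from the text; the four-site chain and the coupling strength are illustrative assumptions), the conditional-independence statement can be verified numerically for a small Gibbs distribution with nearest-neighbour couplings, taking $A = \{0,1\}$, $B = \{2,3\}$, and $\partial A = \{1\}$:

```python
# Hypothetical toy check: for a nearest-neighbour Gibbs distribution on a
# 4-spin chain, the interior of A is independent of B given the boundary spin.
# Sites 0,1 form A; sites 2,3 form B; site 1 is the boundary dA.
import itertools
import math

def gibbs_chain(beta=0.7, n=4):
    """p(s) proportional to exp(beta * sum_i s_i s_{i+1}) over s in {-1,+1}^n."""
    states = list(itertools.product([-1, 1], repeat=n))
    weights = [math.exp(beta * sum(s[i] * s[i + 1] for i in range(n - 1)))
               for s in states]
    Z = sum(weights)
    return {s: w / Z for s, w in zip(states, weights)}

def entropy(p):
    """Shannon entropy in nats of a dict state -> probability."""
    return -sum(q * math.log(q) for q in p.values() if q > 0)

def marginal(p, idx):
    """Marginal distribution on the coordinates listed in idx."""
    m = {}
    for s, q in p.items():
        key = tuple(s[i] for i in idx)
        m[key] = m.get(key, 0.0) + q
    return m

p = gibbs_chain()
# I(X:Y|Z) = H(XZ) + H(YZ) - H(XYZ) - H(Z), with X = {0}, Y = {2,3}, Z = {1}
cmi = (entropy(marginal(p, [0, 1])) + entropy(marginal(p, [1, 2, 3]))
       - entropy(marginal(p, [0, 1, 2, 3])) - entropy(marginal(p, [1])))
print(abs(cmi) < 1e-10)  # True: the conditional mutual information vanishes
```

The joint distribution factorizes as $f(s_0, s_1)\, g(s_1, s_2, s_3)$, which is why the conditional mutual information is exactly zero here.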
Now, we will use the fact that
$I(X:Y \mid Z) = I(XZ:Y) - I(Z:Y)$. We can see
this by writing out the expressions, but the intuition is that the term
on the left asks how much $X$ knows about $Y$, having already
known about $Z$. This equals how much $Y$ knows about $X$ and $Z$
combined, minus how much $Y$ knows about $Z$ alone. In
this case, since we said $I(A \setminus \partial A : B \mid \partial A) = 0$, then $I(A:B)$ is the
same as $I(\partial A : B)$. In general, however, we have an upper
bound:
\[
I(A:B) = I(\partial A : B) = H(\partial A) - H(\partial A \mid B) \le H(\partial A).
\]
In this calculation, we
have used $I(\partial A : B) = H(\partial A) - H(\partial A \mid B)$ (the
information between $\partial A$ and $B$ is the amount by which the
entropy of $\partial A$ gets reduced once we know $B$) and
$H(\partial A \mid B) \ge 0$ (which is true classically). Since
$H(\partial A) \le |\partial A| \log d$ for variables taking $d$ values,
the mutual information is bounded by the size of the boundary: an area law.
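The two identities used above can be checked numerically on an arbitrary joint distribution; the random three-bit distribution below is an illustrative assumption, not from the text:

```python
# Hypothetical numerical check of the chain rule I(X:Y|Z) = I(XZ:Y) - I(Z:Y)
# and the nonnegativity of classical conditional entropy, on a random joint
# distribution of three bits (X, Y, Z) = coordinates (0, 1, 2).
import itertools
import math
import random

random.seed(1)
states = list(itertools.product([0, 1], repeat=3))
weights = [random.random() for _ in states]
total = sum(weights)
p = {s: w / total for s, w in zip(states, weights)}

def entropy(p):
    """Shannon entropy in nats of a dict state -> probability."""
    return -sum(q * math.log(q) for q in p.values() if q > 0)

def marginal(p, idx):
    """Marginal distribution on the coordinates listed in idx."""
    m = {}
    for s, q in p.items():
        key = tuple(s[i] for i in idx)
        m[key] = m.get(key, 0.0) + q
    return m

def mutual_info(p, a, b):
    """I(a:b) = H(a) + H(b) - H(ab) for disjoint index sets a, b."""
    return (entropy(marginal(p, a)) + entropy(marginal(p, b))
            - entropy(marginal(p, a + b)))

# chain rule: I(X:Y|Z) = I(XZ:Y) - I(Z:Y)
cmi = (entropy(marginal(p, [0, 2])) + entropy(marginal(p, [1, 2]))
       - entropy(marginal(p, [0, 1, 2])) - entropy(marginal(p, [2])))
print(abs(cmi - (mutual_info(p, [0, 2], [1]) - mutual_info(p, [2], [1]))) < 1e-12)

# classically, conditional entropy H(X|Y) = H(XY) - H(Y) is nonnegative
print(entropy(marginal(p, [0, 1])) - entropy(marginal(p, [1])) >= 0)
```

Both checks print `True`; the second would fail for quantum states, whose conditional entropy can be negative.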
In Bayesian inference, we have a model for a system which can be very
complicated. The model represents our assumptions on how parts of the
system are causally related to the rest of the system. We have some
observations, and we want to sample from a distribution conditionally on
the fixed observations. Sampling from a conditional distribution is not
the same as sampling from the original distribution, but we can still
formally represent the conditional distribution as a Markov network.
Therefore, sampling from Markov networks is a broadly useful task.
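As an illustrative sketch (the chain-shaped network and the factor values are assumptions, not from the text), conditioning a Markov network $p(x_0, x_1, x_2) \propto f(x_0, x_1)\, g(x_1, x_2)$ on an observation $x_2 = v$ simply absorbs $v$ into the factor $g$, so the conditional distribution is again a Markov network over the unobserved variables:

```python
# Hypothetical example: conditioning a factorized distribution on an
# observation yields another factorized (Markov network) distribution.
import itertools

# arbitrary positive factors on binary variables (illustrative numbers)
f = {(a, b): 1.0 + a + 2 * b for a in (0, 1) for b in (0, 1)}
g = {(b, c): 1.0 + 3 * b + c for b in (0, 1) for c in (0, 1)}

# joint p(x0, x1, x2) proportional to f(x0, x1) * g(x1, x2)
joint = {(a, b, c): f[a, b] * g[b, c]
         for a, b, c in itertools.product((0, 1), repeat=3)}
Z = sum(joint.values())
p = {s: w / Z for s, w in joint.items()}

# condition on the observation x2 = 1 directly from the joint ...
obs = 1
cond = {(a, b): p[a, b, obs] for a in (0, 1) for b in (0, 1)}
Zc = sum(cond.values())
cond = {s: w / Zc for s, w in cond.items()}

# ... and compare with the Markov network whose factor has absorbed x2 = 1
net = {(a, b): f[a, b] * g[b, obs] for a in (0, 1) for b in (0, 1)}
Zn = sum(net.values())
net = {s: w / Zn for s, w in net.items()}

print(all(abs(cond[s] - net[s]) < 1e-12 for s in cond))  # True
```

The same absorption works for any factor graph: observed variables are fixed inside the factors that touch them, and the remaining factors define the conditional Markov network.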
This section will discuss Markov chain Monte Carlo (MCMC) methods,
namely the Metropolis-Hastings algorithm and Glauber dynamics. Readers
familiar with these methods may wish to skip to the discussion of
mixing times. For readers who wish to build more
intuition about Markov chains before proceeding, see the
Appendix, where the simple example of the random walk
on a cycle is treated in detail.
The general approach is to use a Markov chain. Let $\Omega$ be
the set of possible states of the system. Effectively, a Markov chain is a way
of doing a random walk over $\Omega$.
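As a minimal illustration (mirroring the Appendix's cycle example; the cycle length and step count here are arbitrary choices), a lazy random walk on a cycle is a Markov chain whose state distribution approaches the uniform distribution over $\Omega$:

```python
# Illustrative Markov chain: a lazy random walk on a cycle of n sites.
# Track the exact state distribution and watch it approach uniform.
n = 5                      # cycle length (arbitrary choice)
dist = [0.0] * n
dist[0] = 1.0              # start deterministically at site 0

def step(dist):
    """One step: stay put with prob 1/2, else move to a random neighbour."""
    new = [0.0] * n
    for i, q in enumerate(dist):
        new[i] += q / 2
        new[(i - 1) % n] += q / 4
        new[(i + 1) % n] += q / 4
    return new

for _ in range(200):
    dist = step(dist)

print(all(abs(q - 1 / n) < 1e-9 for q in dist))  # True: close to uniform
```

The "lazy" half-probability of staying put avoids the periodicity issues a plain walk can have, which is one standard way to guarantee convergence to the stationary distribution.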