Sampling: Rejection Sampling Explained

Sampling from a Distribution - Rejection Sampling

The main idea behind this method is that if we want to sample from a distribution $p(x)$, we use another, instrumental distribution, $q(x)$, to help us do so.
The only restriction is that $p(x) \leq Mq(x)$ for all $x$ and some constant $M > 1$.
Rejection sampling is primarily useful when the form of $p(x)$ makes it hard to sample from directly, but we can still evaluate $p(x)$ at any point $x$.

Here's a breakdown of the algorithm:
  1. Sample $x$ from $q(x)$.
  2. Sample $y$ from $U(0, Mq(x))$ (uniform distribution).
  3. If $y < p(x)$, then accept $x$ as a sample for $p(x)$, otherwise go to step 1.
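The three steps above can be sketched as a generic sampler. This is a minimal illustration, not the implementation used later in this post; the standard-normal target and wider-normal envelope in the usage example are hypothetical choices (for that pair, $p(x)/q(x) = 2e^{-3x^2/8} \leq 2$, so $M = 2$ is a valid envelope constant).

```python
import numpy as np
import scipy.stats as stats

def rejection_sample(p_pdf, q_sample, q_pdf, M, rng=None):
    """Draw one sample from p using the envelope M*q."""
    rng = np.random.default_rng() if rng is None else rng
    while True:
        x = q_sample(rng)                  # step 1: propose x from q
        y = rng.uniform(0, M * q_pdf(x))   # step 2: uniform height under M*q(x)
        if y < p_pdf(x):                   # step 3: accept if the height lands under p(x)
            return x

# Illustrative usage: standard normal target, wider normal envelope, M = 2
p, q, M = stats.norm(0, 1), stats.norm(0, 2), 2.0
rng = np.random.default_rng(0)
samples = [rejection_sample(p.pdf, lambda r: r.normal(0, 2), q.pdf, M, rng)
           for _ in range(2000)]
print(np.mean(samples), np.std(samples))  # roughly 0 and 1
```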
The reason this works is that the uniform draw "scales" the envelope $Mq(x)$ down to the PDF of $p(x)$. Another way to see it is to look at the probability of ending up with a particular sample $x_0$. This is proportional to the probability of proposing $x_0$ from $q$ times the fraction of the time we accept it, which is simply the ratio between $p(x_0)$ and $Mq(x_0)$:
\begin{align} P(\text{sampling } x_0) \propto q(x_0) \cdot \frac{p(x_0)}{Mq(x_0)} = \frac{p(x_0)}{M} \end{align}
This tells us the probability of sampling an arbitrary point is proportional to $p(x_0)$. The constant $M$ is just a normalization factor, so after drawing many points and computing the proportion of times we see $x_0$, it cancels out and we recover the correct PDF $p(x)$.
Let's take a look at it more visually with an example. Our target distribution, $p(x)$, that we want to sample from is the double gamma distribution, essentially a two-sided gamma distribution. We'll be using a normal distribution, $q(x)$, as our envelope distribution.
The code below shows how to find the scaling constant $M$ and draws a picture of how rejection sampling conceptually works.
In [1]:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats as stats

# Target = double gamma distribution
# Envelope = normal distribution 
dg = stats.dgamma(a=1)
norm = stats.norm(loc=0, scale=2)

# Generate samples for PDF
x = np.linspace(min(dg.ppf(0.001), norm.ppf(0.001)), 
                max(dg.ppf(0.999), norm.ppf(0.999)), 1000)
dg_samples = dg.pdf(x)
norm_samples = norm.pdf(x)

# Find scaling constant for envelope 
M = max(dg_samples / norm_samples)

# Plot
df = pd.DataFrame({'Target': dg_samples, 'Envelope': M * norm_samples}, index=x)
ax = df.plot(style=['--', '-'], color=['black', 'blue'], 
             figsize=(8,6), linewidth=2.0)
ax.plot((2, 2), (0, dg.pdf(2)), 'g--', linewidth=2.0)
ax.plot((2, 2), (dg.pdf(2), M * norm.pdf(2)), 'r--', linewidth=2.0)
ax.text(1.0, 0.20, 'Reject')
ax.text(1.0, 0.03, 'Accept')
Out[1]:
Text(1,0.03,'Accept')
From the figure, once we draw a sample from $q(x)$ (in this case $x=2$), we draw a height from a uniform distribution whose range equals the height of $Mq(x)$. If it falls within the height of the target PDF, we accept the sample (green); otherwise we reject it (red).

Implementing Rejection Sampling

The code below implements rejection sampling for our target double gamma distribution. It plots the scaled histogram and matches it up with the theoretical PDF we should get.
In [2]:
def rejection_sampling():
    while True:
        # Re-use global parameters from above
        x = np.random.normal(0, 2)
        envelope = M * norm.pdf(x)
        p = np.random.uniform(0, envelope)
        if p < dg.pdf(x):
            return x

# Generate samples from the rejection sampling algorithm
samples = [rejection_sampling() for _ in range(10000)]

# Plot Histogram vs. Target PDF
df['Target'].plot(color='blue', style='--', figsize=(8,6), linewidth=2.0)
pd.Series(samples).hist(bins=300, density=True, color='green', 
                        alpha=0.3, linewidth=0.0)
plt.legend(['Target PDF', 'Rejection Sampling'])
Out[2]:
<matplotlib.legend.Legend at 0x10b769940>
Overall quite a good fit from our rejection sampler. No doubt drawing more samples would improve the fit compared to the theoretical distribution.
Pros
  • Works for most distributions, even unnormalized ones (i.e., densities whose integral over the entire domain is not necessarily 1)
Cons
  • It can be slow: if $p$ and $q$ are too different ($M$ is large), most proposals are rejected, and on average it takes $M$ proposals to produce one accepted sample
  • It scales poorly to high-dimensional distributions, where $M$ typically grows very large
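The high-dimensional slowdown is easy to quantify in a toy case (an illustrative pair, not the example from this post): with target $N(0, I_d)$ and envelope $N(0, \sigma^2 I_d)$ with $\sigma > 1$, the density ratio peaks at the origin at $\sigma^d$, so $M = \sigma^d$ and the acceptance rate $1/M = \sigma^{-d}$ shrinks exponentially with dimension.

```python
# Acceptance rate 1/M for target N(0, I_d) vs envelope N(0, sigma^2 I_d):
# the density ratio p(x)/q(x) is maximized at the origin, where it equals
# sigma^d, so the smallest valid envelope constant is M = sigma^d.
sigma = 2.0
for d in (1, 2, 5, 10, 20):
    M = sigma ** d
    print(f"d={d:2d}  M={M:10.0f}  acceptance rate={1 / M:.7f}")
```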