Stick Breaking Process

One algorithmic protocol for generating Dirichlet Process draws.

Steps:

  1. Start with a stick of length 1
  2. Draw realization $i$ from a pre-configured Beta Distribution. (What is a pre-configured probability distribution) Call this realization $p_i$.
  3. Split stick of length 1 into two, with fraction $p_i$ of the stick on the left, and fraction $1 - p_i$ of the stick on the right.
  4. Store the left stick as $l_i$.
  5. Repeat this ad infinitum (if we're talking about it in abstract), or up till a fixed number of draws.

We'll now have a series of draws for $p_i$ and $l_i$:

  • $l = (l_1, l_2, l_3, ... l_n)$
  • $p = (p_1, p_2, p_3, ... p_n)$

Each $p$ came from an independent Beta Distribution draw, while each $l$ was the result of breaking whatever was leftover from the previous round of stick breaking.

If we finished at a finite stopping point, then $l$ is guaranteed to not sum to 1, as we never know what length of stick was leftover on that last stick breaking step. To use $l$ as a valid probability vector, it must be re-normalized to sum to 1, i.e.:

$$l_{norm} = \frac{l}{\sum{l}}$$

Beta Distribution

The Beta distribution is a Probability distribution that has support over the interval $[0, 1]$. It's most commonly used to express degree of belief in the value of a probability term.

Dirichlet Process

What exactly is a Dirichlet process?

We start from the Dirichlet Distribution, which provides a vector of probabilities over states. The vector of probabilities must sum to 1, and they give Categorical Distribution probabilities, i.e. the probability of getting a draw from that category.

In contrast to the Dirichlet distribution, which has a fixed number of categories, the Dirichlet process describes a generative process for an infinite number of categories. There are a few algorithmic variants of the Dirichlet process, including the Chinese Restaurant Process, the Indian Buffet Process, and the Stick Breaking Process.

In practice, when computing with Dirichlet processes, we tend to use the Stick Breaking Process with a large but finite number of states.

What is a pre-configured probability distribution

Probability distribution can come in what I think of as "pre-configured" or "raw" states.

An example is a Gaussian. Without any configurations, it's got an infinite set of possible $\mu$ and $\sigma$ values. With a configuration, it has one $\mu$ and one $\sigma$.

Another example is the Beta distribution. Without any configurations, it's got an infinite set of possible $\alpha$ and $\beta$ parameter values. With a configuration, it has one $\alpha$ and one $\beta$.