What is a pre-configured probability distribution

Probability distributions can come in what I think of as "pre-configured" or "raw" states.

An example is a Gaussian. Without any configuration, it has an infinite set of possible $\mu$ and $\sigma$ values. With a configuration, it has exactly one $\mu$ and one $\sigma$.

Another example is the Beta distribution. Without any configuration, it has an infinite set of possible $\alpha$ and $\beta$ parameter values. With a configuration, it has exactly one $\alpha$ and one $\beta$.
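One way to picture the "raw" versus "pre-configured" distinction is with partial function application. The sketch below uses only Python's standard library; the parameter values (`mu=0.0`, `sigma=1.0`, `alpha=2.0`, `beta=5.0`) and the variable names are illustrative assumptions, not anything prescribed by these notes.

```python
import random
from functools import partial

# "Raw" Gaussian: mu and sigma are still free, so they must be
# supplied on every call.
raw_gaussian = random.gauss

# "Pre-configured" Gaussian: mu and sigma are pinned to one value each.
# (The values here are arbitrary, chosen only for illustration.)
standard_gaussian = partial(random.gauss, mu=0.0, sigma=1.0)

# The same idea applied to the Beta distribution.
configured_beta = partial(random.betavariate, alpha=2.0, beta=5.0)

# Once configured, drawing a realization needs no further parameters.
gaussian_draw = standard_gaussian()
beta_draw = configured_beta()
```

The raw function still needs its parameters at call time (`raw_gaussian(0.0, 1.0)`), while the configured ones can be called with no arguments at all.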

Probability distribution

A probability distribution is an object that assigns credibility values to discrete or continuous values. For parametrized distributions, there is usually a math function that takes in one or more parameters, together with a point on the number line, and returns the credibility value at that point.
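As a concrete instance of such a math function, here is a minimal sketch of the Gaussian density written out by hand. The function name `gaussian_pdf` and the evaluation points are assumptions chosen for illustration.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of a Gaussian with mean mu and standard deviation sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# With mu and sigma fixed, the function assigns a credibility value
# to any point on the number line.
density_at_mean = gaussian_pdf(0.0, mu=0.0, sigma=1.0)
density_in_tail = gaussian_pdf(3.0, mu=0.0, sigma=1.0)
```

Points near the mean receive higher credibility than points far out in the tail.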

Stick Breaking Process

The stick breaking process is one algorithmic protocol for generating Dirichlet Process draws.

Steps:

  1. Start with a stick of length 1.
  2. Draw realization $i$ from a pre-configured Beta Distribution. (What is a pre-configured probability distribution) Call this realization $p_i$.
  3. Split the current stick (of length 1 in the first round) into two, with fraction $p_i$ of the stick on the left, and fraction $1 - p_i$ of the stick on the right.
  4. Store the length of the left stick as $l_i$. The right stick is what is left over for the next round.
  5. Repeat steps 2 to 4 on the leftover right stick ad infinitum (if we're talking about it in the abstract), or up to a fixed number of draws.
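The steps above can be sketched in a few lines of Python for the finite case. This is a minimal illustration using only the standard library, not a definitive implementation: the function name `stick_breaking`, its arguments, and the example Beta parameters (`alpha=1.0`, `beta=3.0`) are all assumptions made for this sketch.

```python
import random


def stick_breaking(n_draws, alpha, beta, seed=None):
    """Run n_draws rounds of stick breaking.

    Returns (p, l, remaining): the Beta draws, the stored left-stick
    lengths, and the length of stick left over after the last break.
    """
    rng = random.Random(seed)
    remaining = 1.0  # step 1: start with a stick of length 1
    p, l = [], []
    for _ in range(n_draws):
        p_i = rng.betavariate(alpha, beta)  # step 2: pre-configured Beta draw
        l_i = p_i * remaining               # step 3: left piece of current stick
        p.append(p_i)
        l.append(l_i)                       # step 4: store the left piece
        remaining -= l_i                    # the right piece carries over
    return p, l, remaining


# Hypothetical configuration: 10 draws from Beta(1, 3).
p, l, remaining = stick_breaking(n_draws=10, alpha=1.0, beta=3.0, seed=42)
```

Tracking `remaining` explicitly makes it easy to see that the stored pieces plus the final leftover always add back up to the original stick of length 1.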

We'll now have a series of draws for $p_i$ and $l_i$:

  • $l = (l_1, l_2, l_3, \ldots, l_n)$
  • $p = (p_1, p_2, p_3, \ldots, p_n)$

Each $p_i$ came from an independent Beta Distribution draw, while each $l_i$ was the result of breaking whatever was left over from the previous round of stick breaking.

If we finished at a finite stopping point, then $l$ will (almost surely) not sum to 1: the piece of stick left over after the last stick breaking step is not part of $l$, so the sum falls short of 1 by exactly that leftover length. To use $l$ as a valid probability vector, it must be re-normalized to sum to 1, i.e.:

$$l_{norm} = \frac{l}{\sum_{i=1}^{n} l_i}$$
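A minimal sketch of the re-normalization in Python follows. The helper name `renormalize` and the example vector are made up for illustration; the example lengths correspond to a stick where a piece of length 0.125 was left over after the last break.

```python
def renormalize(l):
    """Divide each stick length by the total so the result sums to 1."""
    total = sum(l)
    return [l_i / total for l_i in l]


# Hypothetical truncated draw: these lengths sum to 0.875, not 1.
l = [0.5, 0.25, 0.125]
l_norm = renormalize(l)
```

Dividing by the total preserves the relative sizes of the pieces while making the vector usable as a probability vector.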