Dirichlet Process
What exactly is a Dirichlet process?
We start from the Dirichlet Distribution, which provides a vector of probabilities over states. The vector of probabilities must sum to 1, and they give Categorical Distribution probabilities, i.e. the probability of getting a draw from that category.
In contrast to the Dirichlet distribution, which has a fixed number of categories, the Dirichlet process describes a generative process for an infinite number of categories. There are a few algorithmic variants of the Dirichlet process, including the Chinese Restaurant Process, the Indian Buffet Process, and the Stick Breaking Process.
In practice, when computing with Dirichlet processes, we tend to use the Stick Breaking Process with a large but finite number of states.
Notes on Statistics
Some learnings while training myself in statistics.
Distributions:
Statistical Estimation:
Papers that I'm writing:
Dealing with data:
Probabilistic Programming:
Indian Buffet Process
Chinese Restaurant Process
In probability theory, the Chinese restaurant process is a discrete-time stochastic process, analogous to seating customers at tables in a Chinese restaurant. Imagine a Chinese restaurant with an infinite number of circular tables, each with infinite capacity. Customer 1 sits at the first table. The next customer either sits at the same table as customer 1, or the next table. This continues, with each customer choosing to either sit at an occupied table with a probability proportional to the number of customers already there (i.e., they are more likely to sit at a table with many customers than few), or an unoccupied table. At time n, the n customers have been partitioned among m ≤ n tables (or blocks of the partition).
Let's try putting this into an algorithmic framing.
The probability vector that is calculated on-the-fly is effectively a Categorical Distribution.
I think this description should be enough to implement it in Python, which would be an ultimate test of my understanding.
Stick Breaking Process
One algorithmic protocol for generating Dirichlet Process draws.
Steps:
We'll now have a series of draws for $p_i$ and $l_i$:
Each $p$ came from an independent Beta Distribution draw, while each $l$ was the result of breaking whatever was leftover from the previous round of stick breaking.
If we finished at a finite stopping point, then $l$ is guaranteed to not sum to 1, as we never know what length of stick was leftover on that last stick breaking step. To use $l$ as a valid probability vector, it must be re-normalized to sum to 1, i.e.:
$$l_{norm} = \frac{l}{\sum{l}}$$
Categorical Distribution
The categorical distribution is a probability distribution that yields a category from a discrete set of categories.
It's parametrized by a probability vector, which has a length equal to the number of categories and sums to 1. The probability vector provides the probability of drawing each category.
Dirichlet Distribution
The Dirichlet distribution is often known as the multi-class (or multivariate) generalization of the Beta Distribution. In contrast to the Beta distribution, which provides a probability draw for one class, the Dirichlet distribution generalizes this to multi-class.