Categorical Distribution
The categorical distribution is a probability distribution that yields a category from a discrete set of categories.
It's parametrized by a probability vector, which has a length equal to the number of categories and sums to 1. The probability vector provides the probability of drawing each category.
Dirichlet Process
What exactly is a Dirichlet process?
We start from the Dirichlet Distribution, which provides a vector of probabilities over states. The vector of probabilities must sum to 1, and they give Categorical Distribution probabilities, i.e. the probability of getting a draw from that category.
In contrast to the Dirichlet distribution, which has a fixed number of categories, the Dirichlet process describes a generative process for an infinite number of categories. There are a few algorithmic variants of the Dirichlet process, including the Chinese Restaurant Process, the Indian Buffet Process, and the Stick Breaking Process.
In practice, when computing with Dirichlet processes, we tend to use the Stick Breaking Process with a large but finite number of states.
Chinese Restaurant Process
In probability theory, the Chinese restaurant process is a discrete-time stochastic process, analogous to seating customers at tables in a Chinese restaurant. Imagine a Chinese restaurant with an infinite number of circular tables, each with infinite capacity. Customer 1 sits at the first table. The next customer either sits at the same table as customer 1, or the next table. This continues, with each customer choosing to either sit at an occupied table with a probability proportional to the number of customers already there (i.e., they are more likely to sit at a table with many customers than few), or an unoccupied table. At time n, the n customers have been partitioned among m ≤ n tables (or blocks of the partition).
Let's try putting this into an algorithmic framing.
The probability vector that is calculated on-the-fly is effectively a Categorical Distribution.
I think this description should be enough to implement it in Python, which would be an ultimate test of my understanding.