Hidden Markov Model axes semantics
What are the semantics behind a Hidden Markov Model's matrices?
Previously, I wrote about the "event, batch, and sample" shape axes (see my blog post on probability distributions). Building off that, here are a few more.
First off, there's the transition matrix. It is a square matrix, and its axis semantics are `(num_states, num_states)`. From here, we already know that there are at least two tensor axes that carry a `states` semantic. The transition matrix is necessary and (I think) sufficient for specifying a Markov Model. (The "Hidden" piece needs extra machinery, which we'll get to below.)
The transition matrix is one of the inputs, and we can control it using a Dirichlet Process prior (see also: the Dirichlet Process section below).
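In a finite, truncated setting, that prior amounts to one Dirichlet draw per row; here's a minimal sketch (the concentration value is an arbitrary choice for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
num_states = 3

# One Dirichlet draw per row; alpha < 1 tends to concentrate mass
# on a few transitions per row (sparser rows).
alpha = np.full(num_states, 0.5)
transition = rng.dirichlet(alpha, size=num_states)  # (num_states, num_states)
assert np.allclose(transition.sum(axis=1), 1.0)
```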
There are also the emission parameters. Assume we have Gaussian-distributed observations from each of the hidden states, and that there are no autoregressive terms. Per state, we need a Gaussian central tendency vector $\mu$ of length `n_dimensions`, and a covariance matrix $\Sigma$ of shape `(n_dimensions, n_dimensions)`. Therefore, for an entire HMM, we also need to specify $\mu$ with shape `(n_states, n_dimensions)` and covariance matrix $\Sigma$ with shape `(n_states, n_dimensions, n_dimensions)`.
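Here's a shape-checking sketch of those emission parameters (identity covariances chosen purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_dimensions = 3, 2

# Per-state Gaussian parameters, stacked along a leading states axis.
mus = rng.normal(size=(n_states, n_dimensions))     # (n_states, n_dimensions)
covs = np.stack([np.eye(n_dimensions)] * n_states)  # (n_states, n_dimensions, n_dimensions)

# Emitting from hidden state k means indexing into the states axis.
k = 1
observation = rng.multivariate_normal(mus[k], covs[k])  # (n_dimensions,)
```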
We can view the Hidden Markov Model as a probability distribution by also specifying its output shapes. If we semantically define one draw of a Hidden Markov Model as a timeseries sequence of states drawn from the model, then the draw should have an event shape of `(n_timesteps,)`. Multiple draws then stack along the sample axis, giving an overall shape of `(n_samples, n_timesteps)` with sample shape `(n_samples,)`.
Assuming we have a Gaussian emission distribution, we now have to account for the emitted observations too: one draw of observations has shape `(n_timesteps, n_dimensions)`, and multiple draws have shape `(n_samples, n_timesteps, n_dimensions)`.
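Putting the pieces together, here's a minimal forward-simulation sketch that checks these shapes; the function name `hmm_draw` and all parameter values are mine, made up for illustration:

```python
import numpy as np

def hmm_draw(transition, mus, covs, p_init, n_timesteps, rng):
    """One draw: a (n_timesteps,) state path plus its (n_timesteps, n_dimensions) emissions."""
    n_states = transition.shape[0]
    states = np.empty(n_timesteps, dtype=int)
    states[0] = rng.choice(n_states, p=p_init)
    for t in range(1, n_timesteps):
        # Row states[t - 1] of the transition matrix is the categorical
        # distribution over the next state.
        states[t] = rng.choice(n_states, p=transition[states[t - 1]])
    observations = np.stack([rng.multivariate_normal(mus[s], covs[s]) for s in states])
    return states, observations

rng = np.random.default_rng(0)
n_states, n_dimensions, n_timesteps, n_samples = 3, 2, 50, 4

transition = rng.dirichlet(np.ones(n_states), size=n_states)
p_init = np.ones(n_states) / n_states
mus = rng.normal(size=(n_states, n_dimensions))
covs = np.stack([np.eye(n_dimensions)] * n_states)

states, observations = hmm_draw(transition, mus, covs, p_init, n_timesteps, rng)
assert states.shape == (n_timesteps,)
assert observations.shape == (n_timesteps, n_dimensions)

# Multiple draws stack along a leading sample axis.
many_states = np.stack(
    [hmm_draw(transition, mus, covs, p_init, n_timesteps, rng)[0] for _ in range(n_samples)]
)
assert many_states.shape == (n_samples, n_timesteps)
```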
Dirichlet Process
What exactly is a Dirichlet process?
We start from the Dirichlet Distribution, which provides a vector of probabilities over a fixed set of categories. The probabilities must sum to 1, and they parameterize a Categorical Distribution, i.e. each entry is the probability of drawing that category.
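Concretely (a NumPy sketch with made-up concentration parameters):

```python
import numpy as np

rng = np.random.default_rng(0)

# A Dirichlet draw is a probability vector over a fixed number of categories...
alpha = np.array([2.0, 3.0, 5.0])  # made-up concentration parameters
p = rng.dirichlet(alpha)
assert np.isclose(p.sum(), 1.0)

# ...which parameterizes a Categorical distribution we can then draw from.
category_draws = rng.choice(len(alpha), size=10, p=p)
```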
In contrast to the Dirichlet distribution, which has a fixed number of categories, the Dirichlet process describes a generative process over a countably infinite number of categories. There are a few constructive representations of it, including the Chinese Restaurant Process and the Stick Breaking Process. (The Indian Buffet Process is often mentioned alongside these, but it corresponds to the Beta process, which underlies latent feature models, rather than to the Dirichlet process.)
In practice, when computing with Dirichlet processes, we tend to use the Stick Breaking Process with a large but finite number of states.
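Here's a minimal sketch of that truncated stick-breaking construction (the function name and alpha value are made up for illustration):

```python
import numpy as np

def stick_breaking_weights(alpha, n_sticks, rng):
    """Truncated stick-breaking: break off Beta(1, alpha) fractions of what remains."""
    fractions = rng.beta(1.0, alpha, size=n_sticks)
    # Stick length remaining before each break: 1, (1 - b1), (1 - b1)(1 - b2), ...
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - fractions)[:-1]])
    return fractions * remaining

rng = np.random.default_rng(0)
weights = stick_breaking_weights(alpha=2.0, n_sticks=1000, rng=rng)
# With a large truncation, the weights sum to very nearly 1.
assert weights.sum() > 0.99
```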