What are the semantics behind a Hidden Markov Model's matrices?
Previously, I wrote about the "event, batch, and sample" shape axes (see my blog post on probability distributions). Building off that, here are a few more shape semantics, this time in the context of Hidden Markov Models.
Input shape
First off, there's the transition matrix. It is a square matrix, and its axis semantics are (num_states, num_states). From here, we already know that there are at least two tensor axes that carry a "states" semantic. The transition matrix is necessary and, paired with an initial state distribution, sufficient for specifying a Markov Model. (The "Hidden" piece needs extra stuff; we get to that below.)
The transition matrix is one of the inputs, and we can control it using a Dirichlet Process prior (see also: Hierarchical Dirichlet Process Hidden Markov Model Transition Matrices).
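To make that shape concrete, here is a minimal NumPy sketch. The names num_states and transition_matrix are just illustrative, and each row is drawn from a plain symmetric Dirichlet rather than a full Dirichlet Process prior, which is a more involved construction:

```python
import numpy as np

num_states = 3
rng = np.random.default_rng(42)

# Each row of the transition matrix is a categorical distribution over the
# next state, so each row must sum to 1. Drawing rows from a Dirichlet
# distribution is a simple way to generate a valid random transition matrix.
transition_matrix = rng.dirichlet(alpha=np.ones(num_states), size=num_states)

assert transition_matrix.shape == (num_states, num_states)
assert np.allclose(transition_matrix.sum(axis=1), 1.0)
```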
There's also the emission parameters. Assume we have Gaussian-distributed observations from each of the hidden states, and that there are no autoregressive terms. Per state, we need a Gaussian central tendency (mean) vector $\mu$ of length n_dimensions, and a covariance matrix $\Sigma$ of shape (n_dimensions, n_dimensions). Therefore, for an entire HMM, we also need to specify $\mu$ with shape (n_states, n_dimensions) and $\Sigma$ with shape (n_states, n_dimensions, n_dimensions).
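Here is a minimal sketch of those emission parameter shapes, assuming identity covariance matrices purely to keep the example valid (positive definite):

```python
import numpy as np

n_states = 3
n_dimensions = 2
rng = np.random.default_rng(42)

# One mean vector per hidden state.
mu = rng.normal(size=(n_states, n_dimensions))

# One covariance matrix per hidden state; identity matrices are the simplest
# valid (positive definite) choice for illustration.
sigma = np.stack([np.eye(n_dimensions) for _ in range(n_states)])

assert mu.shape == (n_states, n_dimensions)
assert sigma.shape == (n_states, n_dimensions, n_dimensions)
```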
Event shape
We can view the Hidden Markov Model as a probability distribution by also specifying its output shapes. If we semantically define one draw of a Hidden Markov Model as a timeseries sequence of states drawn from the model, then the draw should have an event shape of (n_timesteps,). Multiple draws stack along a sample axis, giving an array of shape (n_samples, n_timesteps), while the event shape itself stays (n_timesteps,).
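Here's a rough, pure-NumPy sketch of what one draw versus many draws looks like shape-wise. The initial state distribution p_init and the helper draw_states are hypothetical, not part of any library:

```python
import numpy as np

n_states, n_timesteps, n_samples = 3, 100, 5
rng = np.random.default_rng(42)

# Uniform initial state distribution and a random (valid) transition matrix.
p_init = np.ones(n_states) / n_states
transition_matrix = rng.dirichlet(np.ones(n_states), size=n_states)

def draw_states(rng, p_init, transition_matrix, n_timesteps):
    """Draw one hidden state sequence; the result has shape (n_timesteps,)."""
    states = np.empty(n_timesteps, dtype=int)
    states[0] = rng.choice(len(p_init), p=p_init)
    for t in range(1, n_timesteps):
        # The next state is drawn from the row of the transition matrix
        # indexed by the current state.
        states[t] = rng.choice(len(p_init), p=transition_matrix[states[t - 1]])
    return states

one_draw = draw_states(rng, p_init, transition_matrix, n_timesteps)
many_draws = np.stack(
    [draw_states(rng, p_init, transition_matrix, n_timesteps) for _ in range(n_samples)]
)

assert one_draw.shape == (n_timesteps,)
assert many_draws.shape == (n_samples, n_timesteps)
```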
Assuming we have a Gaussian emission distribution, we now also have to account for the shape of the emitted observations: one draw of the observed timeseries would have shape (n_timesteps, n_dimensions).
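As a hedged sketch of how the Gaussian emissions attach to a drawn state sequence: for each timestep, we draw from the Gaussian belonging to that timestep's hidden state. The states array below is a stand-in for a real HMM draw, and mu and sigma are the illustrative emission parameters from earlier:

```python
import numpy as np

n_states, n_dimensions, n_timesteps = 3, 2, 100
rng = np.random.default_rng(42)

# Hypothetical emission parameters and a stand-in state sequence.
mu = rng.normal(size=(n_states, n_dimensions))
sigma = np.stack([np.eye(n_dimensions) for _ in range(n_states)])
states = rng.integers(0, n_states, size=n_timesteps)  # stand-in for a real HMM draw

# At each timestep, draw the observation from the Gaussian that belongs to
# the current hidden state.
observations = np.stack(
    [rng.multivariate_normal(mean=mu[s], cov=sigma[s]) for s in states]
)

assert observations.shape == (n_timesteps, n_dimensions)
```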