Hidden Markov Model

Notes on Hidden Markov Models

Hidden Markov Model axis semantics

What are the semantics behind a Hidden Markov Model's matrices?

Previously, I wrote about the "event, batch, and sample" shape axes (see my blog post on probability distributions).

Building off that, here are a few more.

Input shape

First off, there's the transition matrix. It is a square matrix, and its axis semantics are (num_states, num_states). From here, we already know that there are at least two tensor axes that carry a "states" semantic. The transition matrix is necessary and (I think) sufficient for specifying a Markov Model. (The "Hidden" piece needs extra machinery, covered below.)
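
To make the axis semantics concrete, here is a minimal NumPy sketch (the variable names and values are placeholders of my own choosing):

```python
import numpy as np

n_states = 3  # placeholder number of hidden states

# Transition matrix: axis semantics are (num_states, num_states).
# Row i is the probability distribution over the next state
# given that the current state is i, so each row sums to 1.
rng = np.random.default_rng(42)
transition_matrix = rng.dirichlet(alpha=np.ones(n_states), size=n_states)

assert transition_matrix.shape == (n_states, n_states)
assert np.allclose(transition_matrix.sum(axis=1), 1.0)
```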

The transition matrix is one of the inputs, and we can control it using a Dirichlet Process prior (see also: Hierarchical Dirichlet Process Hidden Markov Model Transition Matrices).

There are also the emission parameters. Assume we have Gaussian-distributed observations from each of the hidden states, and that there are no autoregressive terms. Per state, we need a Gaussian mean vector $\mu$ of length n_dimensions and a covariance matrix $\sigma$ of shape (n_dimensions, n_dimensions). Therefore, for an entire HMM, we also need to specify $\mu$ with shape (n_states, n_dimensions) and $\sigma$ with shape (n_states, n_dimensions, n_dimensions).
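
Correspondingly, a sketch of the emission parameter shapes (only the shapes matter here; the values are placeholders):

```python
import numpy as np

n_states, n_dimensions = 3, 2  # placeholder sizes

# Per-state Gaussian emission parameters for the whole HMM:
# one mean vector and one covariance matrix per hidden state.
mu = np.zeros((n_states, n_dimensions))
sigma = np.stack([np.eye(n_dimensions) for _ in range(n_states)])

assert mu.shape == (n_states, n_dimensions)
assert sigma.shape == (n_states, n_dimensions, n_dimensions)
```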

Event shape

We can view the Hidden Markov Model as a probability distribution by also specifying its output shapes. If we semantically define one draw of a Hidden Markov Model as a timeseries sequence of states drawn from the model, then that draw should have an event shape of (n_timesteps,). Multiple draws would stack along a sample axis, giving a shape of (n_samples, n_timesteps), while the event shape itself stays (n_timesteps,).

Assuming we have a Gaussian emission distribution, we now also have to account for the observed emissions: one draw of the observed sequence has event shape (n_timesteps, n_dimensions), and multiple draws stack to (n_samples, n_timesteps, n_dimensions).
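
A sketch of one ancestral draw, to make those shapes explicit (all parameter values here are arbitrary placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_dimensions, n_timesteps = 3, 2, 100

# Model parameters (arbitrary values, for illustration only).
transition_matrix = rng.dirichlet(np.ones(n_states), size=n_states)
initial_probs = np.ones(n_states) / n_states
mu = rng.normal(size=(n_states, n_dimensions))
sigma = np.stack([np.eye(n_dimensions)] * n_states)

# One ancestral draw from the HMM: states, then emissions.
states = np.empty(n_timesteps, dtype=int)
emissions = np.empty((n_timesteps, n_dimensions))
states[0] = rng.choice(n_states, p=initial_probs)
emissions[0] = rng.multivariate_normal(mu[states[0]], sigma[states[0]])
for t in range(1, n_timesteps):
    states[t] = rng.choice(n_states, p=transition_matrix[states[t - 1]])
    emissions[t] = rng.multivariate_normal(mu[states[t]], sigma[states[t]])

assert states.shape == (n_timesteps,)                  # event shape of the state sequence
assert emissions.shape == (n_timesteps, n_dimensions)  # event shape of the observed sequence
```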

Hierarchical Dirichlet Process Hidden Markov Model Transition Matrices

From Matt Johnson's thesis.

Section 3.2.3 says:

The generative process HDP-HMM($\gamma$, $\alpha$, $H$) given concentration parameters $\gamma, \alpha > 0$ and base measure (observation prior) $H$ can be summarized as:

$$\beta \sim GEM(\gamma)$$
$$\pi^{(i)} \sim DP(\alpha, \beta)$$
$$x_t \sim \pi^{(x_{t-1})}$$
$$y_t \sim f(\theta^{(x_t)})$$

...
The HDP plays the role of a prior over infinite transition matrices: each $\pi^{(i)}$ is a DP draw, and is interpreted as the transition distribution from state $i$. The $\pi^{(i)}$ are linked by being DP draws parametrized by the same discrete measure $\beta$, thus $E[\pi^{(i)}] = \beta$ and the transition distributions tend to have their mass concentrated around a typical set of states, providing the desired bias towards re-entering and re-using a consistent set of states.

The key here: make sure each row vector of the transition matrix is generated as a Dirichlet Process draw. By generating every row from DP draws that share the same base measure $\beta$, we get a hierarchical Dirichlet process, in which a parental concentration parameter governs the generation of the state transition probabilities. That concentration parameter can be "concentrated" (i.e. low), expressing the belief that the Markov chain should re-enter and re-use a consistent set of states.
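
A sketch of how I'd simulate this prior over transition matrices in practice, using a finite (weak-limit) truncation with K states; the truncation level and the concentration values below are my own placeholder choices, not prescribed by the thesis excerpt above:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 10        # truncation level standing in for the "infinite" state space
gamma = 3.0   # top-level concentration parameter
alpha = 5.0   # per-row concentration parameter

# Finite approximation of beta ~ GEM(gamma).
beta = rng.dirichlet(np.full(K, gamma / K))

# Each transition row pi^(i) ~ DP(alpha, beta), approximated as Dirichlet(alpha * beta).
# Because every row shares the same base measure beta, E[pi^(i)] = beta,
# so the rows concentrate their mass on a shared, typical set of states.
transition_matrix = rng.dirichlet(alpha * beta, size=K)

assert transition_matrix.shape == (K, K)
assert np.allclose(transition_matrix.sum(axis=1), 1.0)
```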

Autoregressive Hidden Markov Model

Gaussian AR-HMMs and their structure in equations.

First, the HMM piece. The state at time $t$ is $s_{t}$, and the transition matrix is $p_{tr}$.

$$s_{t} | s_{t-1} \sim \text{Categorical}(p_{tr}[s_{t-1}])$$

(Just expressing that we slice out the row belonging to state $s_{t-1}$ from the transition matrix.)

Now, we have a conditional distribution: emission $y$ given state $s$.

$$y_t | s_t \sim \text{Normal}(\mu[s_t] + ky_{t-1}, \sigma[s_t])$$

This is the most common form: the mean depends on the previous timestep's emission $y_{t-1}$, scaled by an autoregressive coefficient $k$. We can also make the scale depend on the previous timestep:

$$y_t | s_t \sim \text{Normal}(\mu[s_t] + k y_{t-1},\ \sigma[s_t] \cdot k \sigma_{t-1})$$

There are many other ways to establish the autoregressive dependency; the general form is that the current emission distribution's parameters depend on the previous emission's output values.
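
A minimal simulation sketch of the Gaussian AR-HMM above, assuming one-dimensional emissions and the mean-only autoregressive dependence (the coefficient $k$ and all parameter values are placeholders):

```python
import numpy as np

rng = np.random.default_rng(2)
n_states, n_timesteps = 3, 200

# HMM piece: transition matrix p_tr, one row per current state.
p_tr = rng.dirichlet(np.ones(n_states), size=n_states)

# Emission piece: per-state mean and scale, plus an autoregressive coefficient k.
mu = np.array([-2.0, 0.0, 2.0])
sigma = np.array([0.5, 1.0, 0.3])
k = 0.8

s = np.empty(n_timesteps, dtype=int)
y = np.empty(n_timesteps)
s[0] = rng.choice(n_states)
y[0] = rng.normal(mu[s[0]], sigma[s[0]])
for t in range(1, n_timesteps):
    # s_t | s_{t-1} ~ Categorical(p_tr[s_{t-1}]): slice out the row for the previous state.
    s[t] = rng.choice(n_states, p=p_tr[s[t - 1]])
    # y_t | s_t ~ Normal(mu[s_t] + k * y_{t-1}, sigma[s_t]): the mean depends on the previous emission.
    y[t] = rng.normal(mu[s[t]] + k * y[t - 1], sigma[s[t]])
```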