Hidden Markov Models
Sequence Labeling¶
e.g. "Faith is a fine invention" -> Noun, Verb, Article, Adjective, Noun
One-to-one mapping between labels and input tokens
\(\mathbb{P}(y \cap x) = \prod\limits_{j=0}^{n} \mathbb{P}(y_{j+1} | y_j) \times \prod\limits_{j=1}^{n} \mathbb{P}(x_j | y_j)\) (with \(y_0 = \text{START}\), \(y_{n+1} = \text{STOP}\))
The first term -> transition probabilities \(a_{y_j, y_{j+1}}\)
The second term -> emission probabilities \(b_{y_j} (x_j)\)
Hidden Markov Model¶
Assumption: the current state is determined only by the previous state, and nothing else (first-order Markov assumption)
- An HMM is defined by a tuple: \(\langle \mathcal{T}, \mathcal{O}, \theta \rangle\)
- \(\mathcal{T}\): A set of states including START and STOP
- \(\mathcal{O}\): A set of observation symbols
- \(\theta\): Transition and emission parameters \(a_{u,v}\) and \(b_u (o)\)
The probability of observing the input "the dog the" with state sequence \(A\,B\,A\) is \(a_{START,A} b_A(\text{"the"}) a_{A,B} b_B(\text{"dog"}) a_{B,A} b_A(\text{"the"}) a_{A, STOP}\)
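As a quick sketch (not from the source), this walk-through is a few lines of Python; the `trans` and `emit` tables below are hypothetical toy numbers, and unseen events default to probability 0:

```python
# Sketch: score a (sentence, tag sequence) pair under hypothetical toy parameters.
trans = {("START", "A"): 0.5, ("A", "B"): 0.4, ("B", "A"): 0.3, ("A", "STOP"): 0.1}
emit = {("A", "the"): 0.6, ("B", "dog"): 0.2}

def score(words, tags):
    """Multiply transition and emission probabilities along START -> tags -> STOP."""
    p, prev = 1.0, "START"
    for w, t in zip(words, tags):
        p *= trans.get((prev, t), 0.0) * emit.get((t, w), 0.0)  # unseen events -> 0
        prev = t
    return p * trans.get((prev, "STOP"), 0.0)

# a_START,A * b_A(the) * a_A,B * b_B(dog) * a_B,A * b_A(the) * a_A,STOP
print(score(["the", "dog", "the"], ["A", "B", "A"]))  # ≈ 0.000432
```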
Estimation¶
From labeled data, estimation reduces to simple counting (no expensive inference needed):
\(a_{u,v} = \cfrac{\text{count}(u,v)}{\text{count}(u)}\)
\(b_u(o) = \cfrac{\text{count}(u \to o)}{\text{count}(u)}\)
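A minimal counting sketch, assuming the labeled data arrives as a list of sentences, each a list of `(word, tag)` pairs (the function name and data layout are assumptions, not from the source):

```python
from collections import Counter

def estimate(tagged_sentences):
    """MLE by counting: a[(u,v)] = count(u,v)/count(u), b[(u,o)] = count(u->o)/count(u)."""
    trans_count, emit_count, tag_count = Counter(), Counter(), Counter()
    for sent in tagged_sentences:              # sent: list of (word, tag) pairs
        tags = ["START"] + [t for _, t in sent] + ["STOP"]
        for u, v in zip(tags, tags[1:]):       # every tag (incl. START) is a source once
            trans_count[(u, v)] += 1
            tag_count[u] += 1
        for w, t in sent:
            emit_count[(t, w)] += 1
    a = {(u, v): c / tag_count[u] for (u, v), c in trans_count.items()}
    b = {(t, w): c / tag_count[t] for (t, w), c in emit_count.items()}
    return a, b
```

A single pass over the corpus suffices; both tables are just ratios of counts, matching the formulas above.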
MLE¶
likelihood = \(\prod\limits_{u,v} (a_{u, v})^{\text{count}(u, v)} \times \prod\limits_{u, o} (b_{u} (o))^{\text{count}(u \to o)}\)
log likelihood = \(\sum\limits_{u,v} \text{count}(u, v) \log (a_{u, v}) + \sum\limits_{u, o} \text{count}(u \to o) \log (b_{u} (o))\)
maximise!
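A sketch of why maximising recovers the count ratios above (the standard Lagrange-multiplier argument; the emission case is identical): add a multiplier \(\lambda_u\) for each constraint \(\sum_v a_{u,v} = 1\), then set the derivative to zero.

\(\cfrac{\partial}{\partial a_{u,v}} \left[ \sum\limits_{u,v} \text{count}(u, v) \log a_{u, v} - \sum\limits_{u} \lambda_u \left( \sum\limits_{v} a_{u,v} - 1 \right) \right] = \cfrac{\text{count}(u, v)}{a_{u, v}} - \lambda_u = 0\)

\(\Rightarrow a_{u,v} = \cfrac{\text{count}(u, v)}{\lambda_u}\), and summing over \(v\) under the constraint gives \(\lambda_u = \sum\limits_{v'} \text{count}(u, v') = \text{count}(u)\).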
Decoding¶
Answer the question: which label sequence is the most probable given the word sequence \(x\)?
\(y^* = \argmax_y \mathbb{P}(y | x)\) where \(y\) is the label seq and \(x\) is the input seq
\(y^* = \argmax_y \cfrac{\mathbb{P}(y \cap x)}{\mathbb{P}(x)}\)
\(y^* = \argmax_y \mathbb{P}(y \cap x)\) (denom const wrt \(y\))
Too complex to brute force: \(\mathcal{O}(|\mathcal{T}|^n)\) (\(|\mathcal{T}|\) - number of states, \(n\) - number of words in the sentence)
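For intuition only, a brute-force decoder would enumerate every candidate sequence; a sketch reusing the hypothetical `score` function from the earlier example:

```python
from itertools import product

def brute_force_decode(words, tag_set):
    """Try every label sequence: |T|^n candidates, infeasible beyond short sentences."""
    return max((list(y) for y in product(tag_set, repeat=len(words))),
               key=lambda y: score(words, y))
```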
Better: find the highest-scoring path in the transition graph using dynamic programming (the Viterbi algorithm)
\(\pi(j, u)\) = score of the highest-scoring path ending at word \(j\) with label \(u\)
\(\pi(0, \text{START}) = 1\) and \(\pi(0, u) = 0\) for \(u \neq \text{START}\)
\(\pi(j+1, u) = \max_{v \in \text{states}} (\pi(j, v) \times a_{v, u} \times b_u(x_{j+1}))\)
\(\pi(n+1, \text{STOP}) = \max_{v \in \text{states}} (\pi(n, v) \times a_{v, \text{STOP}})\)
Store the maximising \(v\) at each step as a backpointer, so we can backtrack to recover the actual path that achieves the score.
\(\mathcal{O}(n |\mathcal{T}|^2)\) complexity
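A minimal Viterbi sketch in the notation above, assuming `a` and `b` are the dictionaries produced by the counting sketch earlier, `tag_set` lists the ordinary tags (no START/STOP), and at least one path has nonzero probability:

```python
def viterbi(words, tag_set, a, b):
    """Return the highest-scoring tag path and its score, in O(n |T|^2)."""
    n = len(words)
    pi = {(0, "START"): 1.0}          # pi(0, START) = 1; all other pi(0, u) = 0
    back = {}                         # back[(j, u)] = argmax v, for backtracking
    for j in range(1, n + 1):
        prev_tags = ["START"] if j == 1 else tag_set
        for u in tag_set:
            best_v, best = None, 0.0
            for v in prev_tags:       # pi(j, u) = max_v pi(j-1, v) * a_{v,u} * b_u(x_j)
                s = pi.get((j - 1, v), 0.0) * a.get((v, u), 0.0) * b.get((u, words[j - 1]), 0.0)
                if s > best:
                    best_v, best = v, s
            pi[(j, u)], back[(j, u)] = best, best_v
    # final step: pi(n+1, STOP) = max_v pi(n, v) * a_{v,STOP}
    last, best = None, 0.0
    for v in tag_set:
        s = pi.get((n, v), 0.0) * a.get((v, "STOP"), 0.0)
        if s > best:
            last, best = v, s
    path = [last]                     # follow backpointers from the last tag
    for j in range(n, 1, -1):
        path.append(back[(j, path[-1])])
    return list(reversed(path)), best
```

On the toy tables from the first sketch, `viterbi(["the", "dog", "the"], ["A", "B"], trans, emit)` returns `["A", "B", "A"]` with the same score as the direct computation, while touching only \(n |\mathcal{T}|^2\) cells instead of \(|\mathcal{T}|^n\) paths.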