
Hidden Markov Models

Sequence Labeling

e.g. "Faith is a fine invention" -> Noun, Verb, Article, Adjective, Noun

One-to-one mapping between labels and input values

\(\prod\limits_{j=0}^{n} \mathbb{P}(y_{j+1} | y_j) \times \prod\limits_{j=1}^{n} \mathbb{P}(x_j | y_j)\), where \(y_0 = \text{START}\) and \(y_{n+1} = \text{STOP}\)

The first term -> Transition Probabilities \(a_{y_j, y_{j+1}}\)

The second term -> Emission Probabilities \(b_{y_j} (x_j)\)

Hidden Markov Model

Assumption: the current state is determined only by the previous state, and nothing else (the Markov assumption)

  • An HMM is defined by a tuple: \(\langle \mathcal{T}, \mathcal{O}, \theta \rangle\)
    • \(\mathcal{T}\): A set of states including START and STOP
    • \(\mathcal{O}\): A set of observation symbols
    • \(\theta\): Transition and emission parameters \(a_{u,v}\) and \(b_u (o)\)

The probability of observing the input "the dog the" with state sequence \(A\ B\ A\) is \(a_{\text{START},A} \, b_A(\text{the}) \, a_{A,B} \, b_B(\text{dog}) \, a_{B,A} \, b_A(\text{the}) \, a_{A, \text{STOP}}\)
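A minimal sketch of this computation in Python. The `transitions`/`emissions` dicts and all the numbers in them are made-up toy values, not from any dataset:

```python
# Hypothetical toy parameters; missing (state, state) or (state, word)
# pairs are treated as probability 0 via .get().
transitions = {
    ("START", "A"): 0.6, ("A", "B"): 0.5,
    ("B", "A"): 0.4, ("A", "STOP"): 0.2,
}
emissions = {("A", "the"): 0.7, ("B", "dog"): 0.3}

def score_path(words, tags):
    """P(x, y): product of transition and emission probabilities."""
    prob, prev = 1.0, "START"
    for word, tag in zip(words, tags):
        prob *= transitions.get((prev, tag), 0.0) * emissions.get((tag, word), 0.0)
        prev = tag
    return prob * transitions.get((prev, "STOP"), 0.0)

# a_{START,A} b_A(the) a_{A,B} b_B(dog) a_{B,A} b_A(the) a_{A,STOP}
print(score_path(["the", "dog", "the"], ["A", "B", "A"]))  # 0.003528
```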

Estimation

From labeled data, estimate directly by counting (the MLE below shows these count ratios are exactly the maximisers):

\(a_{u,v} = \cfrac{\text{count}(u,v)}{\text{count}(u)}\)

\(b_u(o) = \cfrac{\text{count}(u \to o)}{\text{count}(u)}\)
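As a sketch, these counts can be collected in one pass over a labeled corpus. The two-sentence corpus below is invented for illustration; note that each tag position has exactly one outgoing transition and one emission, so the same \(\text{count}(u)\) denominator serves both tables:

```python
from collections import Counter

corpus = [  # hypothetical labeled sentences: (word, tag) pairs
    [("the", "ART"), ("dog", "N"), ("barks", "V")],
    [("a", "ART"), ("dog", "N"), ("runs", "V")],
]

trans, emit, total = Counter(), Counter(), Counter()
for sentence in corpus:
    prev = "START"
    for word, tag in sentence:
        trans[(prev, tag)] += 1   # count(u, v)
        emit[(tag, word)] += 1    # count(u -> o)
        total[prev] += 1          # count(u), via outgoing transitions
        prev = tag
    trans[(prev, "STOP")] += 1
    total[prev] += 1

a = {(u, v): c / total[u] for (u, v), c in trans.items()}
b = {(u, o): c / total[u] for (u, o), c in emit.items()}
print(a[("ART", "N")], b[("N", "dog")])  # 1.0 1.0
```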

MLE

likelihood = \(\prod\limits_{u,v} (a_{u, v})^{\text{count}(u, v)} \times \prod\limits_{u, o} (b_{u} (o))^{\text{count}(u \to o)}\)

log likelihood = \(\sum\limits_{u,v} \text{count}(u, v) \log a_{u, v} + \sum\limits_{u, o} \text{count}(u \to o) \log b_{u} (o)\)

Maximise! (subject to the normalisation constraints \(\sum_v a_{u,v} = 1\) and \(\sum_o b_u(o) = 1\))
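One way to see that the counting estimates above are the maximisers (a standard Lagrange-multiplier argument, sketched here): fix a state \(u\) and maximise its term of the log likelihood subject to its normalisation constraint.

\[ \mathcal{L}_u = \sum_v \text{count}(u, v) \log a_{u,v} + \lambda \Big( 1 - \sum_v a_{u,v} \Big) \]

\[ \frac{\partial \mathcal{L}_u}{\partial a_{u,v}} = \frac{\text{count}(u,v)}{a_{u,v}} - \lambda = 0 \implies a_{u,v} = \frac{\text{count}(u,v)}{\lambda} \]

Normalising fixes \(\lambda = \sum_v \text{count}(u,v) = \text{count}(u)\), recovering \(a_{u,v} = \cfrac{\text{count}(u,v)}{\text{count}(u)}\); the same argument gives \(b_u(o)\).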

Decoding answers the question: which label sequence is the most probable given the word sequence \(x\)?

\(y^* = \argmax_y \mathbb{P}(y | x)\) where \(y\) is the label seq and \(x\) is the input seq

\(y^* = \argmax_y \cfrac{\mathbb{P}(y \cap x)}{\mathbb{P}(x)}\)

\(y^* = \argmax_y \mathbb{P}(y \cap x)\) (denom const wrt \(y\))

Too complex to brute force: \(\mathcal{O}(|\mathcal{T}|^n)\) (\(|\mathcal{T}|\) = number of states, \(n\) = number of words in the sentence)
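For contrast, a sketch of the brute-force search, reusing the hypothetical `score_path` from the earlier sketch; the `itertools.product` call makes the \(|\mathcal{T}|^n\) blow-up explicit:

```python
from itertools import product

def brute_force_decode(words, states):
    """Score all |T|^n tag sequences and keep the best one."""
    best_tags, best_prob = None, 0.0
    for tags in product(states, repeat=len(words)):
        p = score_path(words, tags)
        if p > best_prob:
            best_tags, best_prob = list(tags), p
    return best_tags, best_prob

print(brute_force_decode(["the", "dog", "the"], ["A", "B"]))
# (['A', 'B', 'A'], 0.003528) -- 2^3 = 8 sequences scored
```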

Better: find the highest-scoring path in the transition graph with dynamic programming (the Viterbi algorithm)

\(\pi(j, u)\) = score of the highest-scoring path ending at word \(j\) with label \(u\)

Base case: \(\pi(0, \text{START}) = 1\) and \(\pi(0, u) = 0\) for all \(u \neq \text{START}\)

\(\pi(j+1, u) = \max_{v \in \text{states}} (\pi(j, v) \times b_u(x_{j+1}) \times a_{v, u})\)

\(\pi(n+1, \text{STOP}) = \max_{v \in \text{states}} (\pi(n, v) \times a_{v, \text{STOP}})\)

Store the maximising \(v\) (a backpointer) at each step as well, so we can backtrack to find the actual path that achieves the score.

\(\mathcal{O}(n |\mathcal{T}|^2)\) complexity
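A minimal sketch of this recurrence in Python, reusing the hypothetical `transitions`/`emissions` dicts from the earlier sketches; `back` stores the maximising \(v\) for backtracking:

```python
def viterbi(words, states):
    n = len(words)
    pi = {(0, "START"): 1.0}          # pi(0, START) = 1; others default to 0
    back = {}
    prev_states = ["START"]
    for j in range(1, n + 1):
        for u in states:
            # pi(j, u) = max_v pi(j-1, v) * b_u(x_j) * a_{v,u}
            v, score = max(
                ((v, pi.get((j - 1, v), 0.0)
                      * emissions.get((u, words[j - 1]), 0.0)
                      * transitions.get((v, u), 0.0))
                 for v in prev_states),
                key=lambda t: t[1],
            )
            pi[(j, u)], back[(j, u)] = score, v
        prev_states = states
    # pi(n+1, STOP) = max_v pi(n, v) * a_{v,STOP}
    v, score = max(
        ((v, pi[(n, v)] * transitions.get((v, "STOP"), 0.0)) for v in states),
        key=lambda t: t[1],
    )
    path = [v]                        # backtrack through stored argmaxes
    for j in range(n, 1, -1):
        path.append(back[(j, path[-1])])
    return list(reversed(path)), score

print(viterbi(["the", "dog", "the"], ["A", "B"]))
# (['A', 'B', 'A'], 0.003528) -- same answer as brute force, in O(n|T|^2)
```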