Addendum to HMMs

In the previous chapter, we discussed how it's possible to train an HMM using the forward-backward algorithm, and we saw that it is a particular application of the EM algorithm. The reader can now understand its internal dynamics in terms of E and M steps. In fact, the procedure starts with randomly initialized A and B matrices and proceeds in an alternating manner:

- E-step: given the current estimates of A and B, the Forward and Backward algorithms are used to compute the posterior probabilities of the hidden states and of the state transitions for the observed sequences (as sketched in the code below).
- M-step: A and B are re-estimated from these expected counts, so as to increase the likelihood of the observations.

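To make the E-step concrete, what follows is a minimal NumPy sketch of the Forward and Backward recursions for a discrete-emission HMM. The names pi, A, B, forward, and backward are illustrative choices, not taken from any specific library; pi is the initial state distribution, A the transition matrix, and B the emission matrix.

    import numpy as np

    def forward(pi, A, B, obs):
        # alpha[t, i] = P(o_1, ..., o_t, state_t = i | current model)
        T, N = len(obs), len(pi)
        alpha = np.zeros((T, N))
        alpha[0] = pi * B[:, obs[0]]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        return alpha

    def backward(A, B, obs):
        # beta[t, i] = P(o_{t+1}, ..., o_T | state_t = i, current model)
        T, N = len(obs), A.shape[0]
        beta = np.zeros((T, N))
        beta[-1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
        return beta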
The procedure is repeated until convergence is reached. Even though there is no explicit definition of a Q function, the E-step determines an expression for the expected complete-data likelihood of the model given the observations, split into terms computed by the Forward and Backward algorithms, while the M-step updates the parameters A and B so as to maximize this likelihood.
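Building on the forward and backward functions sketched above, one full Baum-Welch iteration can be written as follows. This is a sketch under simplifying assumptions (a single observation sequence, discrete emissions, no numerical scaling); baum_welch_step and the toy parameters are hypothetical names used only for illustration.

    def baum_welch_step(pi, A, B, obs):
        # E-step: posteriors over states (gamma) and transitions (xi)
        alpha, beta = forward(pi, A, B, obs), backward(A, B, obs)
        T, N = len(obs), len(pi)
        likelihood = alpha[-1].sum()           # P(observations | current model)
        gamma = alpha * beta / likelihood      # gamma[t, i] = P(state_t = i | obs)
        xi = np.zeros((T - 1, N, N))
        for t in range(T - 1):
            # xi[t, i, j] = P(state_t = i, state_{t+1} = j | obs)
            xi[t] = alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :]
        xi /= likelihood
        # M-step: re-estimate the parameters from the expected counts
        new_pi = gamma[0]
        new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        obs_arr = np.asarray(obs)
        new_B = np.zeros_like(B)
        for k in range(B.shape[1]):
            new_B[:, k] = gamma[obs_arr == k].sum(axis=0)
        new_B /= gamma.sum(axis=0)[:, None]
        return new_pi, new_A, new_B, likelihood

    # Toy usage: two hidden states, three visible symbols
    pi = np.array([0.6, 0.4])
    A = np.array([[0.7, 0.3],
                  [0.4, 0.6]])
    B = np.array([[0.5, 0.4, 0.1],
                  [0.1, 0.3, 0.6]])
    obs = [0, 1, 2, 2, 1, 0, 0, 2]
    for _ in range(50):
        pi, A, B, likelihood = baum_welch_step(pi, A, B, obs)

In practice, the step is called in a loop until the likelihood stops improving appreciably; for long sequences, the recursions are usually carried out with per-step scaling or in log-space to avoid numerical underflow.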