CTC
Connectionist Temporal Classification
When working with sequences that are not guaranteed to be aligned, like handwritten text, we need a way to compare the model's guesses (usually one letter per vertical slice of the image) to the label.
In order to do this we need to answer:
How likely is this sequence compared to all possible sequences that would produce the label when compressed with the collapse function $\mathcal{B}$?
$\mathcal{B}$ is a simple compression function that takes a string and:
- Merges runs of repeated letters (an empty token between two identical letters keeps them separate)
- Removes empty tokens
So a sequence like H_ELL_L_OO_ is compressed to HELLO.
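The collapse function above can be sketched in a few lines of Python (a minimal sketch; the function name and the choice of "_" as the empty token are illustrative):

```python
def collapse(path, blank="_"):
    # Step 1: merge runs of repeated tokens. An empty token between two
    # identical letters breaks the run, so both letters survive.
    merged = []
    for token in path:
        if not merged or merged[-1] != token:
            merged.append(token)
    # Step 2: drop the empty tokens.
    return "".join(t for t in merged if t != blank)

print(collapse("H_ELL_L_OO_"))  # HELLO
```

Note that the order matters: merging repeats must happen before removing empty tokens, otherwise the double L in HELLO would collapse to a single L.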
To make this possible, an empty token (_) needs to be added to the vocabulary, and the model needs to be able to predict this token between letters.
Math
The probability of a label $l$ given input $x$ is the sum of all possible paths' probabilities. Each path is represented with $\pi$ and has length $T$ (one token per time step):

$$p(l \mid x) = \sum_{\pi \in \mathcal{B}^{-1}(l)} p(\pi \mid x), \qquad p(\pi \mid x) = \prod_{t=1}^{T} y^t_{\pi_t}$$

where $y^t_{\pi_t}$ is the model's probability for token $\pi_t$ at time step $t$.
And the loss is defined as the negative log of this probability: $\mathcal{L} = -\log p(l \mid x)$.
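The sum over paths can be checked directly on a tiny example by brute force (a sketch for intuition only; the probabilities below are made up, and real CTC implementations use the forward-backward dynamic program rather than enumerating all $|V|^T$ paths):

```python
import itertools
import math

def collapse(path, blank="_"):
    # Merge repeated tokens, then drop empty tokens.
    merged = [t for i, t in enumerate(path) if i == 0 or path[i - 1] != t]
    return "".join(t for t in merged if t != blank)

def label_probability(probs, label, vocab, blank="_"):
    # probs[t][c] is the model's probability of token c at time step t.
    # Brute force: enumerate every path of length T and sum the
    # probabilities of those that collapse to the label.
    total = 0.0
    for path in itertools.product(vocab, repeat=len(probs)):
        if collapse(path, blank) == label:
            p = 1.0
            for t, token in enumerate(path):
                p *= probs[t][token]
            total += p
    return total

# Two time steps, vocabulary {a, _}: the paths "aa", "a_", "_a" all
# collapse to "a", so their probabilities are summed.
probs = [{"a": 0.6, "_": 0.4}, {"a": 0.5, "_": 0.5}]
p = label_probability(probs, "a", vocab=["a", "_"])  # 0.3 + 0.3 + 0.2 = 0.8
loss = -math.log(p)
```

This makes the cost of the naive approach obvious: the number of paths grows exponentially with $T$, which is why the efficient forward-backward recursion is used in practice.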