February 10, 2026

CTC

Connectionist Temporal Classification

When working with sequences that are not guaranteed to be aligned with the input, like handwritten text, we need a way to compare the model's guesses (usually one letter per vertical slice of the image) to the label.

In order to do this we need to answer:

How likely is this label, summed over all possible sequences that would produce it when compressed with the B function?

The B function is a simple compression function that takes a string and:

  1. Merges consecutive duplicate letters
  2. Removes empty (blank) tokens

So a sequence like H_ELL_L_OO_ is compressed to HELLO.
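A minimal sketch of this collapse in Python (the function name `collapse` and the blank symbol `_` are illustrative, not part of any standard API):

```python
def collapse(seq: str, blank: str = "_") -> str:
    """Apply the CTC collapse (B) function: merge repeats, then drop blanks."""
    # 1. merge consecutive duplicate characters
    merged = [c for i, c in enumerate(seq) if i == 0 or c != seq[i - 1]]
    # 2. remove blank tokens
    return "".join(c for c in merged if c != blank)

print(collapse("H_ELL_L_OO_"))  # → HELLO
```

Note that the order matters: merging repeats before dropping blanks is what maps `LL_L` to `LL` rather than `L`.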

To make this possible, an empty token (_) needs to be added to the vocabulary, and the model needs to be able to predict this token between letters. The blank between repeats is also what lets a genuine double letter, like the LL in HELLO, survive step 1 of the compression.

Math

The probability of a label is the sum of all possible paths' probabilities. Each path is denoted $\pi$ and has length $T$.

$$P(y \mid x) = \sum_{\pi : B(\pi) = y}\left[\prod_{t=1}^{T} p_t(\pi_t)\right]$$
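For a tiny vocabulary this sum can be checked by brute force: enumerate every length-T path and keep those that collapse to the label. The two-timestep distributions below are made-up toy numbers, not from the source.

```python
from itertools import product

def collapse(seq, blank="_"):
    # CTC collapse: merge consecutive duplicates, then drop blanks
    merged = [c for i, c in enumerate(seq) if i == 0 or c != seq[i - 1]]
    return "".join(c for c in merged if c != blank)

# toy per-timestep distributions p_t over the vocabulary {_, A}
probs = [{"_": 0.4, "A": 0.6}, {"_": 0.5, "A": 0.5}]
label = "A"

# P(y|x): sum over all paths pi with B(pi) = y of prod_t p_t(pi_t)
p_label = 0.0
for path in product("_A", repeat=len(probs)):
    if collapse(path) == label:
        p = 1.0
        for t, token in enumerate(path):
            p *= probs[t][token]
        p_label += p

print(p_label)  # paths "A_", "_A", "AA" contribute 0.3 + 0.2 + 0.3
```

Brute force is exponential in T; in practice the same sum is computed efficiently with the forward-backward (dynamic programming) algorithm.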

And the loss is defined as the negative log of this probability.

$$\mathcal{L}_{CTC} = -\log P(y \mid x)$$
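As a quick numeric sanity check (the probability value here is an assumed toy number, not derived from real model outputs):

```python
import math

# suppose the summed path probability P(y|x) came out to 0.8
p_label = 0.8
loss = -math.log(p_label)  # the CTC loss for this example
print(round(loss, 4))
```

A probability near 1 gives a loss near 0, and the loss grows without bound as the probability approaches 0, which is why deep learning frameworks compute it in log space.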