February 10, 2026

CTC

Connectionist Temporal Classification

When working with sequences that are not guaranteed to be aligned with the input, like handwritten text, we need a way to compare the model's guesses (usually one letter per vertical slice of the image) to the label.

In order to do this we need to answer:

How likely is this label, summed over all possible sequences that would produce it when compressed with the B function?

The B function is a simple compression function that takes a string and:

  1. Merges consecutive duplicate letters
  2. Removes empty (blank) tokens

So a sequence like H_ELL_L_OO_ is compressed to HELLO.
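A minimal sketch of this collapse in Python (the function name `collapse` and the blank symbol `_` are illustrative, not part of any standard API):

```python
def collapse(seq: str, blank: str = "_") -> str:
    """Apply the CTC collapse (B) function: merge repeats, then drop blanks."""
    # 1. merge consecutive duplicate characters
    merged = [c for i, c in enumerate(seq) if i == 0 or c != seq[i - 1]]
    # 2. remove blank tokens
    return "".join(c for c in merged if c != blank)

print(collapse("H_ELL_L_OO_"))  # → HELLO
```

Note that the order matters: merging repeats before dropping blanks is what maps `LL_L` to `LL` rather than `L`.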

To make this possible, an empty token (_) needs to be added to the vocabulary, and the model needs to be able to predict this token between letters. The blank between repeats is also what lets a genuine double letter, like the LL in HELLO, survive step 1 of the compression.

Math

The probability of a label is the sum of all possible paths' probabilities. Each path is denoted $\pi$ and has length $T$.

$$P(y \mid x) = \sum_{\pi : B(\pi) = y}\left[\prod_{t=1}^{T} p_t(\pi_t)\right]$$
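For a tiny vocabulary this sum can be checked by brute force: enumerate every length-T path and keep those that collapse to the label. The two-timestep distributions below are made-up toy numbers, not from the source.

```python
from itertools import product

def collapse(seq, blank="_"):
    # CTC collapse: merge consecutive duplicates, then drop blanks
    merged = [c for i, c in enumerate(seq) if i == 0 or c != seq[i - 1]]
    return "".join(c for c in merged if c != blank)

# toy per-timestep distributions p_t over the vocabulary {_, A}
probs = [{"_": 0.4, "A": 0.6}, {"_": 0.5, "A": 0.5}]
label = "A"

# P(y|x): sum over all paths pi with B(pi) = y of prod_t p_t(pi_t)
p_label = 0.0
for path in product("_A", repeat=len(probs)):
    if collapse(path) == label:
        p = 1.0
        for t, token in enumerate(path):
            p *= probs[t][token]
        p_label += p

print(p_label)  # paths "A_", "_A", "AA" contribute 0.3 + 0.2 + 0.3
```

Brute force is exponential in T; in practice the same sum is computed efficiently with the forward-backward (dynamic programming) algorithm.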

And the loss is defined as the negative log of this probability.

$$\mathcal{L}_{CTC} = -\log P(y \mid x)$$
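As a quick numeric sanity check (the probability value here is an assumed toy number, not derived from real model outputs):

```python
import math

# suppose the summed path probability P(y|x) came out to 0.8
p_label = 0.8
loss = -math.log(p_label)  # the CTC loss for this example
print(round(loss, 4))
```

A probability near 1 gives a loss near 0, and the loss grows without bound as the probability approaches 0, which is why deep learning frameworks compute it in log space.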