Loss
Mean Squared Error (MSE) / Least Squares
One of the earliest loss functions, attributed to Gauss and Legendre. Best suited for regression-style problems.
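A minimal NumPy sketch of MSE (the function and variable names here are illustrative, not from the original):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error: average of the squared differences."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean((y_true - y_pred) ** 2)

# Example: regression predictions vs. targets
print(mse([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0]))  # 0.375
```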
Cross-Entropy (Log Loss)
Used when we want to know how likely the predicted result was, rather than a raw numeric measure of error. This is necessary because MSE does not work well for classification problems.
Categorical Cross-Entropy
Used for multiple classes:

$$\mathcal{L}_{CCE} = -\sum_{i=1}^{C} y_i \log(\hat{y}_i)$$

where:
- $C$ is the number of classes
- $y_i$ is the one-hot true label and $\hat{y}_i$ the predicted probability for class $i$
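A minimal NumPy sketch of the formula above (names are illustrative):

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """y_true: one-hot labels; y_pred: predicted class probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return -np.sum(y_true * np.log(y_pred))

# Example: 3 classes, true class is index 1
y_true = np.array([0, 1, 0])
y_pred = np.array([0.1, 0.7, 0.2])
print(categorical_cross_entropy(y_true, y_pred))  # -log(0.7) ≈ 0.357
```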
Binary Cross-Entropy
Used for binary (yes/no) classification problems.
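For reference, the standard binary cross-entropy formula (implied but not written out above):

$$\mathcal{L}_{BCE} = -\left[\, y \log(\hat{y}) + (1 - y)\log(1 - \hat{y}) \,\right]$$

And a minimal NumPy sketch, averaged over a batch (names are illustrative):

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """y_true in {0, 1}; y_pred: predicted probability of class 1."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

print(binary_cross_entropy(np.array([1, 0, 1]), np.array([0.9, 0.2, 0.6])))
```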
Hinge Loss
Like cross-entropy, but it stops measuring distance once the prediction is correct with a sufficient margin: confidently correct predictions incur zero loss. It originates from Support Vector Machines.
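The standard form, with labels in $\{-1, +1\}$:

$$\mathcal{L}_{hinge} = \max(0,\; 1 - y \cdot \hat{y})$$

A minimal NumPy sketch (names are illustrative):

```python
import numpy as np

def hinge_loss(y_true, scores):
    """y_true in {-1, +1}; scores: raw (unsquashed) model outputs.
    Loss is zero once a prediction is correct with margin >= 1."""
    return np.mean(np.maximum(0.0, 1.0 - y_true * scores))

# Correct with margin (loss 0), correct but inside the margin, and wrong
print(hinge_loss(np.array([1, 1, -1]), np.array([2.0, 0.5, 0.3])))  # 0.6
```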
Focal Loss
Cross-entropy is overwhelmed when the data is imbalanced and dominated by a majority class. Focal Loss adds a reshaping factor that down-weights easy (frequently occurring) examples and focuses training on hard (rare) examples:

$$FL(p_t) = -(1 - p_t)^{\gamma} \log(p_t)$$

where:
- $\gamma$ is the reshaping (focusing) factor
- $p_t$ is the predicted probability of the true class
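A minimal NumPy sketch of the binary case (names are illustrative; $\gamma = 2$ is a commonly used default):

```python
import numpy as np

def focal_loss(y_true, y_pred, gamma=2.0, eps=1e-12):
    """Binary focal loss: the (1 - p_t)^gamma factor down-weights
    easy examples so training focuses on hard, rare ones."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    p_t = np.where(y_true == 1, y_pred, 1 - y_pred)  # prob of the true class
    return -np.mean((1 - p_t) ** gamma * np.log(p_t))

# An easy example (p_t = 0.95) contributes far less than a hard one (p_t = 0.3)
print(focal_loss(np.array([1, 1]), np.array([0.95, 0.3])))
```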
Direct Preference Optimization (DPO) Loss
In scenarios where users can prefer some outputs and reject others, there needs to be a mechanism that increases the likelihood of the winning outputs and decreases the likelihood of the losing ones. DPO is designed to solve this issue.
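For a single preference pair, the DPO objective compares the trained policy $\pi_\theta$ against a frozen reference model $\pi_{\text{ref}}$:

$$\mathcal{L}_{DPO} = -\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}\right)$$

where $y_w$ is the preferred (winning) output, $y_l$ the rejected (losing) one, and $\beta$ a temperature controlling how far the policy may drift from the reference. A minimal NumPy sketch operating on log-probabilities (names and example values are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """logp_*: policy log-probs of the winning/losing responses;
    ref_logp_*: the same quantities under a frozen reference model."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -np.log(sigmoid(margin))

# Loss shrinks as the policy favors the winner more than the reference does
print(dpo_loss(logp_w=-10.0, logp_l=-12.0, ref_logp_w=-11.0, ref_logp_l=-11.0))
```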