• for classification problems, rather than using hard binary targets (0 or 1), label smoothing softens them, so the correct class becomes e.g. 0.9 instead of 1
    • https://towardsdatascience.com/what-is-label-smoothing-108debd7ef06
    • If we do not use label smoothing, the label vector is the one-hot encoded vector [1, 0, 0]. The model is pushed to make the logit of the correct class, a, much larger than the other logits b and c (a ≫ b and a ≫ c). For example, applying softmax to the logit vector [10, 0, 0] gives [0.9999, 0, 0] rounded to 4 decimal places.
    • If we use label smoothing with α = 0.1, the smoothed label vector is ≈ [0.9333, 0.0333, 0.0333]. After softmax, the logit vector [3.3322, 0, 0] already matches the smoothed label vector to 4 decimal places, and the gap between the largest logit and the rest is much smaller. This is why label smoothing is called a regularization technique: it restrains the largest logit from becoming much bigger than the rest (see the numpy sketch after this list).
    • Label smoothing replaces the one-hot encoded label vector y_hot with a mixture of y_hot and the uniform distribution: y_ls = (1 − α) * y_hot + α / K
        • where K is the number of classes and α is the smoothing parameter
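A minimal numpy sketch of the numbers above (the smoothing formula and the two softmax checks); the function names and the values α = 0.1, K = 3 are just this example, not code from the linked article:

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over a 1-D array of logits
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def smooth_labels(y_hot, alpha):
    # y_ls = (1 - alpha) * y_hot + alpha / K, where K is the number of classes
    K = y_hot.shape[-1]
    return (1.0 - alpha) * y_hot + alpha / K

y_hot = np.array([1.0, 0.0, 0.0])

# Hard targets: softmax of [10, 0, 0] is already ~[0.9999, 0, 0],
# so the model keeps pushing the top logit far above the others.
print(softmax(np.array([10.0, 0.0, 0.0])).round(4))   # ~[0.9999 0.     0.    ]

# Smoothed targets with alpha = 0.1
y_ls = smooth_labels(y_hot, alpha=0.1)
print(y_ls.round(4))                                   # ~[0.9333 0.0333 0.0333]

# A much smaller logit gap already reproduces the smoothed target after softmax.
print(softmax(np.array([3.3322, 0.0, 0.0])).round(4))  # ~[0.9333 0.0333 0.0333]
```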

I’m not sure how label smoothing is related to KLDivergence: