e.g. when you add an L2 regularization term to the loss function:

  • the regularization term increases the loss, which encourages the network to keep its weights small
  • smaller weights make the model less complex and reduce overfitting
    • Because a less complex model is pushed to learn only the most important patterns
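The idea above can be sketched in a few lines; this is a minimal illustration (the function name, `lam`, and the example values are made up for the sketch), not a specific framework's API:

```python
import numpy as np

def l2_regularized_loss(base_loss, weights, lam=0.01):
    """Add an L2 penalty (lam * sum of squared weights) to a base loss.

    Larger weights produce a larger penalty, so minimizing this total
    loss pushes the optimizer toward smaller weights.
    """
    penalty = lam * sum(np.sum(w ** 2) for w in weights)
    return base_loss + penalty

# Hypothetical example: two weight arrays and an arbitrary base loss.
weights = [np.array([0.5, -1.0]), np.array([2.0])]
total = l2_regularized_loss(base_loss=0.8, weights=weights, lam=0.01)
# penalty = 0.01 * (0.25 + 1.0 + 4.0) = 0.0525, so total = 0.8525
```

Note that `lam` (the regularization strength) controls the trade-off: larger values shrink the weights more aggressively but can underfit.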