e.g. when you add an L2 regularization term to the loss function:

  • the regularization term increases the loss, which encourages the network to keep its weights small
  • smaller weights make the model less complex and reduce overfitting
    • Because a less complex model is pushed to learn only the most important patterns
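The idea above can be sketched in a few lines; this is a minimal illustration (the function name, `lam`, and the example values are made up for the sketch), not a specific framework's API:

```python
import numpy as np

def l2_regularized_loss(base_loss, weights, lam=0.01):
    """Add an L2 penalty (lam * sum of squared weights) to a base loss.

    Larger weights produce a larger penalty, so minimizing this total
    loss pushes the optimizer toward smaller weights.
    """
    penalty = lam * sum(np.sum(w ** 2) for w in weights)
    return base_loss + penalty

# Hypothetical example: two weight arrays and an arbitrary base loss.
weights = [np.array([0.5, -1.0]), np.array([2.0])]
total = l2_regularized_loss(base_loss=0.8, weights=weights, lam=0.01)
# penalty = 0.01 * (0.25 + 1.0 + 4.0) = 0.0525, so total = 0.8525
```

Note that `lam` (the regularization strength) controls the trade-off: larger values shrink the weights more aggressively but can underfit.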