• discourages large weights by adding a penalty term to the loss:
    • loss = loss + weight decay parameter * squared L2 norm of the weights (i.e. the sum of the squared weights)
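The formula above can be sketched in a few lines of plain Python. This is only an illustration, assuming the penalty is the squared L2 norm (the sum of squared weights), which is the common convention. The function names `l2_penalty` and `loss_with_weight_decay` and the numbers used are hypothetical, chosen for the example.

```python
def l2_penalty(weights):
    """Squared L2 norm: the sum of the squared weights (assumed convention)."""
    return sum(w * w for w in weights)


def loss_with_weight_decay(base_loss, weights, weight_decay):
    """Add the weight-decay penalty to an already-computed loss value."""
    return base_loss + weight_decay * l2_penalty(weights)


# Illustrative values only, not taken from any real model.
weights = [0.5, -1.0, 2.0]
base_loss = 0.3
total = loss_with_weight_decay(base_loss, weights, weight_decay=0.01)
print(total)
```

Larger weights increase the penalty quadratically, so minimizing the combined loss pushes the optimizer toward smaller weights. Setting the weight decay parameter to 0 recovers the original loss.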