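prevents weights from growing too large by adding a penalty term to the loss: loss = loss + weight_decay * ||w||^2, where weight_decay is the weight decay parameter and ||w||^2 is the squared L2 norm of the weights (the usual form of the penalty)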
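
A minimal sketch of the penalty in plain Python, assuming the common convention that the term is the squared L2 norm of the weights. The names `l2_penalty`, `penalized_loss`, and `sgd_step` are hypothetical helpers for illustration, not part of any library, and the weights are represented as nested lists of floats.

```python
def l2_penalty(weights):
    """Squared L2 norm: sum of squared entries over all weight vectors."""
    return sum(w * w for vec in weights for w in vec)


def penalized_loss(base_loss, weights, weight_decay):
    """loss = loss + weight_decay * (squared L2 norm of the weights)."""
    return base_loss + weight_decay * l2_penalty(weights)


def sgd_step(weights, grads, lr, weight_decay):
    """One plain SGD step on the penalized loss.

    The gradient of weight_decay * w^2 with respect to w is
    2 * weight_decay * w, so every weight is pulled toward zero
    in addition to following the gradient of the base loss.
    """
    return [
        [w - lr * (g + 2 * weight_decay * w) for w, g in zip(vec, gvec)]
        for vec, gvec in zip(weights, grads)
    ]


# Illustrative values only (assumed, not from the note).
weights = [[3.0, 4.0], [1.0, 2.0]]
zero_grads = [[0.0, 0.0], [0.0, 0.0]]

total = penalized_loss(base_loss=1.0, weights=weights, weight_decay=0.01)
shrunk = sgd_step(weights, zero_grads, lr=0.1, weight_decay=0.01)
```

With zero base-loss gradients, the step multiplies every weight by (1 - 2 * lr * weight_decay), which is where the name "decay" comes from: the penalty alone shrinks the weights a little on each update.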