we have

  • like ridge regression, the fit will have more bias than least squares (makes our prediction less sensitive to tiny datasets)
  • it also shrinks parameters (when compared to just using least squares)
  • Since we’re using absolute value, weights for dimensions with large slope latter less
    • so the main reason to use lasso regression is because it can exclude useless features from equations
      • since it cares about the slope less
    • This also means that: ridge regression does a bit better when most features are useful