- Adam adapts the learning rate for each parameter in your model (using running estimates of the gradient's first and second moments)
- Karpathy likes to use Adam with a learning rate of 3e-4 to set baselines (see the PyTorch sketch after this list)
- “In my experience Adam is much more forgiving to hyperparameters, including a bad learning rate.”
- Q: Doesn’t Adam set the learning rate for me? Why do I have to set one?
- A: The value you provide is the base (global) learning rate. Adam scales each parameter’s step around that value using its moment estimates, but it never chooses the base rate for you (see the update-rule sketch below)
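
A minimal PyTorch sketch of that kind of baseline setup. The `nn.Linear` model and random batch are placeholders (not from the notes); the point is just where the 3e-4 base learning rate gets passed to `torch.optim.Adam`.

```python
import torch
import torch.nn as nn

# Placeholder model and batch, just to have something to optimize.
model = nn.Linear(10, 1)
x, y = torch.randn(32, 10), torch.randn(32, 1)

# 3e-4 is the base learning rate; betas/eps are left at Adam's defaults.
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

loss = nn.functional.mse_loss(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()  # Adam adapts each parameter's step, starting from the base lr
```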
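To make the Q&A concrete, here is a sketch of a single Adam update for one parameter tensor, written from the standard Adam formulas rather than any library's internals; `adam_step` and its argument names are illustrative.

```python
import torch

def adam_step(param, grad, m, v, t, lr=3e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single parameter tensor (illustrative, not library code)."""
    # Running estimates of the gradient's first and second moments.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction for the zero-initialized estimates (t starts at 1).
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # lr is the base step size you chose; the per-parameter scaling
    # m_hat / (sqrt(v_hat) + eps) is what Adam works out on its own.
    new_param = param - lr * m_hat / (v_hat.sqrt() + eps)
    return new_param, m, v

# Example: one step on a single weight tensor.
w = torch.randn(3)
g = torch.randn(3)            # pretend gradient
m0 = torch.zeros_like(w)
v0 = torch.zeros_like(w)
w, m0, v0 = adam_step(w, g, m0, v0, t=1)
```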