🏖️ Kaggle Solutions

    decreasing learning rate

    Mar 01, 2025, 1 min read

    • The idea is that the model should take larger steps early in training and smaller steps later, so it can converge more precisely near a minimum.
    • Note: for the Adam optimizer, the learning rate you set is the maximum learning rate that Adam can use - “It’s true that the learning rates adapt themselves during training steps, but if you want to be sure that every update step doesn’t exceed lambda you can then lower lambda using exponential decay or whatever” - https://stackoverflow.com/questions/39517431/should-we-do-learning-rate-decay-for-adam-optimizer
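
    A minimal sketch of decaying the learning rate on top of Adam in PyTorch; the toy model, random data, and `gamma=0.95` decay factor are illustrative assumptions, not settings from these notes:

    ```python
    import torch
    import torch.nn as nn

    # Placeholder model and loss; substitute your own.
    model = nn.Linear(10, 1)
    criterion = nn.MSELoss()

    # The lr passed to Adam acts as the maximum step size;
    # ExponentialLR lowers it by a factor of gamma after each epoch.
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)

    for epoch in range(10):
        x = torch.randn(32, 10)   # dummy batch
        y = torch.randn(32, 1)

        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

        scheduler.step()  # decrease the learning rate once per epoch
        print(epoch, scheduler.get_last_lr())
    ```

    Other schedules (step decay, cosine annealing, warmup then decay) follow the same pattern: large steps first, smaller steps later.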

    Backlinks

    • MLB Player Digital Engagement Forecasting
