https://datascience.stackexchange.com/questions/37362/when-should-one-use-l1-l2-regularization-instead-of-dropout-layer-given-that-b

Dropout is more than regularization: by deactivating neurons at random, it has the same effect as if you had used a totally different (sub-)network for each forward/backward pass.
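A minimal sketch of this idea (not from the linked thread): inverted dropout samples a fresh random mask on every training pass, so each pass effectively trains a different sub-network, while survivors are rescaled so the expected activation is unchanged and inference needs no correction.

```python
import random

def dropout(xs, p_drop=0.5, training=True, rng=None):
    """Inverted dropout on a list of activations (illustrative sketch).

    During training, each activation is zeroed with probability p_drop
    and survivors are scaled by 1 / (1 - p_drop) so the expected value
    matches the input; at inference the input passes through unchanged.
    A new mask is drawn on every call, so each training pass sees a
    different sub-network.
    """
    if not training or p_drop == 0.0:
        return list(xs)
    rng = rng or random.Random()
    keep = 1.0 - p_drop
    return [x / keep if rng.random() < keep else 0.0 for x in xs]
```

Because the mask differs per call, two consecutive training passes over the same input generally produce different zero patterns, which is the "different network each pass" effect described above.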