Lasso (L1): penalizes the sum of the absolute values of the weights, which can drive some weights exactly to zero.
Ridge (L2): penalizes the sum of the squared weights, which shrinks weights toward zero without eliminating them.
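A minimal NumPy sketch of the two penalty terms (the weight vector and λ here are illustrative, not from the source):

```python
import numpy as np

def l1_penalty(weights, lam):
    # Lasso: lambda * sum of absolute values of the weights
    return lam * np.sum(np.abs(weights))

def l2_penalty(weights, lam):
    # Ridge: lambda * sum of squared weights
    return lam * np.sum(weights ** 2)

w = np.array([0.5, -1.0, 2.0])
print(l1_penalty(w, 0.1))  # 0.1 * (0.5 + 1.0 + 2.0) = 0.35
print(l2_penalty(w, 0.1))  # 0.1 * (0.25 + 1.0 + 4.0) = 0.525
```

Either penalty is simply added to the training loss before taking gradients.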
Dropout randomly sets outputs to 0 (disables them). This prevents neurons from co-adapting and forces them to learn individually useful features.
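A sketch of inverted dropout in NumPy (the drop probability and input are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p=0.5, training=True):
    # Inverted dropout: zero each unit with probability p, then rescale
    # the survivors by 1/(1-p) so the expected activation is unchanged.
    if not training:
        return activations  # at inference time, pass activations through
    mask = rng.random(activations.shape) >= p
    return activations * mask / (1.0 - p)

a = np.ones(10)
print(dropout(a, p=0.5))  # roughly half the entries zeroed, the rest scaled up
```

Because a fresh random mask is drawn on every forward pass, each neuron must be useful on its own rather than relying on specific partners.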
How do you select hyperparameters?
- Manual Search
- Grid Search
- Random Search
- Bayesian Optimization
Check this as well: Guidelines for selecting hyperparameters in Deep Learning
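Random search is the simplest of these to sketch. Below, the search space and the `evaluate` objective are hypothetical stand-ins; in practice `evaluate` would train the model and return a validation score:

```python
import random

random.seed(0)

# Hypothetical search space: learning rate sampled log-uniformly,
# batch size sampled from a fixed set.
space = {
    "learning_rate": lambda: 10 ** random.uniform(-4, -1),
    "batch_size": lambda: random.choice([32, 64, 128, 256]),
}

def evaluate(params):
    # Stand-in for training a model and returning a validation score.
    return 1.0 - abs(params["learning_rate"] - 0.01) - params["batch_size"] / 10000

best, best_score = None, float("-inf")
for _ in range(20):  # random search: sample, evaluate, keep the best
    params = {name: sample() for name, sample in space.items()}
    score = evaluate(params)
    if score > best_score:
        best, best_score = params, score

print(best, best_score)
```

Grid search would replace the random sampling with an exhaustive loop over a fixed grid; Bayesian optimization would replace it with a model that proposes the next point based on past evaluations.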
What is batch normalization?
It is the normalisation of the inputs of each layer so that they have a mean activation of zero and a standard deviation of one.
The idea is to apply the same normalisation to the hidden layers as you do to the input layer.
- Makes the network learn faster (converges more quickly, tolerates higher learning rates)
- Helps with weight-related problems (less sensitivity to weight initialization, mitigates saturated activation functions, has some regularization effect)
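A minimal NumPy sketch of the normalisation step at training time (the batch, γ, and β are illustrative; a real layer also tracks running statistics for inference):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Normalise each feature over the batch to zero mean and unit variance,
    # then apply the learned scale (gamma) and shift (beta).
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# Batch of 64 samples with 4 features, deliberately off-centre and spread out.
x = np.random.default_rng(0).normal(5.0, 3.0, size=(64, 4))
out = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0), out.std(axis=0))  # per-feature mean ~0, std ~1
```

With γ = 1 and β = 0 the output is purely standardized; during training the network learns γ and β, so it can undo the normalisation where that helps.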