Points to consider while building a Machine Learning model
19 Feb 2017
The objective of this post is to list some pointers to keep in mind while building a Machine Learning model.
- Always start with the simplest of models. You can increase the complexity if the performance of a simple model is inadequate.
- Understand your dataset first.
- Build a baseline model before building any prediction model. I will expand on this further in another post.
- Complex models tend to overfit and simpler models tend to underfit. It is your job to find a balance between the two.
- High bias and low variance - A property of simpler models. Suggests underfitting.
- High variance and low bias - A property of complex models. Suggests overfitting.
- In any Machine Learning model, if the number of parameters is greater than the number of training examples, beware. This often leads to overfitting. Consider a simpler model with fewer parameters, for example by reducing the number of hidden layers.
- Always normalize the inputs. Neural Networks generally train best when the input features are on a small, comparable scale, such as between 0 and 1. Large, unscaled inputs can produce very large gradients, so the weights get updated by large amounts and training becomes unstable.
- Regularization is very important. Consider using an XGBoost model instead of a Random Forest, since XGBoost includes regularization terms in its training objective.
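The point about normalizing inputs can be sketched in a few lines. This is a minimal illustration of min-max scaling in plain Python, not a definitive implementation: the function names `fit_min_max` and `apply_min_max` and the toy data are assumptions made up for this example. The minimum and maximum are computed on the training data only and then reused for the test data.

```python
def fit_min_max(rows):
    """Return per-column (min, max) pairs computed from the training rows."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]


def apply_min_max(rows, stats):
    """Scale each column using precomputed (min, max) pairs from training."""
    scaled = []
    for row in rows:
        scaled_row = []
        for value, (lo, hi) in zip(row, stats):
            span = hi - lo
            # A constant column has zero span; map it to 0.0 to avoid dividing by zero.
            scaled_row.append((value - lo) / span if span else 0.0)
        scaled.append(scaled_row)
    return scaled


# Hypothetical toy data: [age in years, income in dollars].
train = [[20, 30000.0], [40, 90000.0], [30, 60000.0]]
test = [[25, 45000.0]]

stats = fit_min_max(train)                 # statistics come from training data only
train_scaled = apply_min_max(train, stats)
test_scaled = apply_min_max(test, stats)   # test values may fall outside [0, 1]
```

In practice a library scaler, such as scikit-learn's `MinMaxScaler`, does the same job. The sketch is only meant to show that both features end up on the same scale regardless of their original units.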
Some terms to keep in mind:
- Stratified Sampling - When the training data is heavily skewed, the practice of picking samples such that the final training data has the distribution you need.
- Bootstrapping - Repeatedly resampling the dataset with replacement and training or evaluating the model on each resample, to estimate how much a result varies.
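Both terms can be illustrated with a small sketch using only Python's standard library. The helper names `stratified_sample` and `bootstrap_means`, the labels and the scores are all hypothetical. The stratified sampler assumes the distribution you want is the original class proportions, which is only one possible target.

```python
import random
from collections import defaultdict


def stratified_sample(labels, fraction, rng):
    """Pick indices so each class keeps roughly its original share."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    chosen = []
    for indices in by_class.values():
        # Take the same fraction from every class, but at least one example.
        k = max(1, round(len(indices) * fraction))
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)


def bootstrap_means(values, n_resamples, rng):
    """Mean of each resample drawn with replacement, same size as `values`."""
    n = len(values)
    return [sum(rng.choices(values, k=n)) / n for _ in range(n_resamples)]


rng = random.Random(0)  # fixed seed only so the sketch is repeatable

# Hypothetical skewed labels: 90 negatives and 10 positives.
labels = [0] * 90 + [1] * 10
sample = stratified_sample(labels, fraction=0.2, rng=rng)
positives = sum(labels[i] for i in sample)

# Hypothetical per-example scores (1 = correct prediction, 0 = wrong).
scores = [1] * 80 + [0] * 20
means = bootstrap_means(scores, n_resamples=200, rng=rng)
spread = max(means) - min(means)  # rough sense of how much accuracy varies
```

Here the stratified sample keeps the 9:1 class ratio of the full data, and the spread of the bootstrap means gives a rough idea of how much the measured accuracy could change with a different draw of the data.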