# Points to consider while building a Machine Learning model

19 Feb 2017
The objective of this post is to list some pointers to keep in mind while building a Machine Learning model.

- Always start with the simplest of models. You can increase the complexity if the performance of a simple model is inadequate.
- Understand your dataset first.
- Build a baseline model before building any prediction model. I will expand on this further in another post.
- Complex models tend to overfit and simpler models tend to underfit. It is your job to find a balance between the two.
**High bias, low variance** - a property of simpler models. Suggests underfitting.
**High variance, low bias** - a property of complex models. Suggests overfitting.
- In any Machine Learning model, beware if the number of parameters is greater than the number of training examples: the model can memorize the training data and overfit. Consider a simpler model with fewer parameters, reduce the number of hidden layers, or do anything else that shrinks the parameter count.
- Always normalize the inputs. Gradient-based training of Neural Networks works best when features are on a similar scale, e.g. between 0 and 1. Inputs with large magnitudes can cause exploding gradients, where weights are updated by very large amounts and training diverges.
- Regularization is very important. If you are choosing a tree ensemble, consider XGBoost, which has built-in L1/L2 regularization, over a plain Random Forest.
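The bias-variance points above can be sketched with NumPy: fit polynomials of different degrees to noisy data and compare training errors. The degrees and sample size here are arbitrary choices for illustration; a high-degree fit with nearly as many parameters as data points chases the noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples from a quadratic: y = x^2 + noise.
x = np.linspace(-1, 1, 20)
y = x**2 + rng.normal(scale=0.1, size=x.size)

def train_error(degree):
    """Mean squared training error of a polynomial fit of the given degree."""
    coeffs = np.polyfit(x, y, degree)
    pred = np.polyval(coeffs, x)
    return np.mean((pred - y) ** 2)

# A degree-1 line underfits (high bias); a degree-15 polynomial has almost as
# many parameters as training points and overfits (high variance).
err_simple = train_error(1)
err_complex = train_error(15)
```

The complex model's training error is lower, but that is exactly the trap: low training error with many parameters says nothing about held-out performance.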
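Min-max normalization, mentioned above, is a few lines of NumPy. This is a sketch; in practice you would compute the min and max on the training split only and reuse them at inference time:

```python
import numpy as np

def min_max_normalize(X, eps=1e-12):
    """Scale each feature (column) of X into the range [0, 1]."""
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    # eps guards against division by zero for constant columns.
    return (X - x_min) / (x_max - x_min + eps)

X = np.array([[10.0, 200.0],
              [20.0, 400.0],
              [30.0, 600.0]])
X_norm = min_max_normalize(X)
# Every value in X_norm now lies in [0, 1].
```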
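To make the regularization point concrete, here is L2 (ridge) regression in closed form with NumPy. The penalty `alpha` shrinks the weights toward zero; it is the same kind of knob that XGBoost exposes through its `lambda` and `alpha` parameters. The data below is made up purely for illustration:

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression: w = (X^T X + alpha * I)^{-1} X^T y."""
    n_features = X.shape[1]
    A = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=50)

w_weak = ridge_fit(X, y, alpha=0.01)   # close to ordinary least squares
w_strong = ridge_fit(X, y, alpha=100)  # heavily shrunk weights
# Stronger regularization produces a smaller weight norm, trading a little
# bias for lower variance.
```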

Some terms to keep in mind:

**Stratified Sampling** - when the training data is heavily skewed, the practice of picking samples from each class such that the final training data has the class distribution you need.
**Bootstrapping** - repeatedly resampling the dataset with replacement and re-evaluating the same model on each resample, to estimate the variability of its performance.
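A minimal sketch of stratified sampling in plain Python (the function name and `fraction` parameter are my own): taking the same share of every class preserves the label distribution in the sample.

```python
import random
from collections import defaultdict

def stratified_sample(examples, labels, fraction, seed=0):
    """Sample `fraction` of the examples from each class, preserving class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for example, label in zip(examples, labels):
        by_class[label].append(example)
    sample = []
    for label, items in by_class.items():
        k = max(1, round(fraction * len(items)))
        sample.extend((x, label) for x in rng.sample(items, k))
    return sample

# 90 negatives and 10 positives: a 20% stratified sample keeps the 9:1 ratio.
examples = list(range(100))
labels = [0] * 90 + [1] * 10
sample = stratified_sample(examples, labels, fraction=0.2)
```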
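Bootstrapping can likewise be sketched in a few lines: resample the data with replacement many times and look at the spread of a statistic across the resamples. Here the statistic is the mean of a toy sample; with a model you would resample the evaluation set and recompute a metric instead.

```python
import random
import statistics

def bootstrap_means(data, n_resamples=1000, seed=0):
    """Estimate the sampling distribution of the mean by resampling with replacement."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        resample = rng.choices(data, k=len(data))  # sample WITH replacement
        means.append(statistics.mean(resample))
    return means

data = [2.1, 2.5, 2.2, 2.8, 2.4, 2.6, 2.3, 2.7]
means = bootstrap_means(data)
# The standard deviation of the bootstrap means estimates the uncertainty
# of the original sample mean.
spread = statistics.stdev(means)
```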