Importance of a Baseline Model
23 Feb 2017
One of the most important steps in building a machine learning model is understanding the data first. Most of us skip this and jump right into modelling. Relatedly, we often forget to build a baseline model before building something complicated.
What is a Baseline Model and a Baseline Accuracy?
A baseline model, put simply, is the simplest model that you can build on the provided data. The accuracy achieved by a baseline model serves as the lower bound when evaluating your own model: a more complicated model is only worth its cost if it beats this number.
A baseline model usually does not involve any machine learning. Instead, it relies on heuristics, randomness, or simple statistics to come up with a prediction.
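As a rough illustration of such a simple-statistics baseline, the sketch below always predicts the most common training label and measures how often that is right. The function name `majority_class_baseline` and the toy labels are made up for this example and are not from any library.

```python
from collections import Counter


def majority_class_baseline(train_labels, test_labels):
    """Always predict the most common training label; return (label, accuracy)."""
    # Find the label that occurs most often in the training data.
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    # Accuracy is the fraction of test labels that equal that single prediction.
    correct = sum(1 for y in test_labels if y == majority_label)
    return majority_label, correct / len(test_labels)


# Hypothetical toy labels: 0 = negative class, 1 = positive class.
train_labels = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
test_labels = [0, 0, 0, 1, 0]

label, baseline_accuracy = majority_class_baseline(train_labels, test_labels)
print(label, baseline_accuracy)  # prints: 0 0.8
```

On this toy data, a model that reports 80% accuracy has learned nothing beyond the class imbalance, which is exactly what the baseline number makes visible.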
Scikit-learn (sklearn) supports baseline models in the form of dummy estimators (DummyClassifier); common strategies include:
- “stratified”: generates predictions by respecting the training set’s class distribution.
- “most_frequent”: always predicts the most frequent label in the training set.
- “prior”: always predicts the class that maximizes the class prior.
- “uniform”: generates predictions uniformly at random.
- “constant”: always predicts a constant label that is provided by the user. This is useful for metrics that evaluate a non-majority class.
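- Median or average: for regression problems, always predicts the median or the mean of the training targets (DummyRegressor in scikit-learn).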
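The strategies listed above can be tried out with a short sketch like the one below. It assumes scikit-learn and NumPy are installed; the toy arrays are invented for illustration, and scores are computed on the training data only to keep the example short.

```python
import numpy as np
from sklearn.dummy import DummyClassifier, DummyRegressor

# Hypothetical imbalanced toy data: 8 samples of class 0, 2 of class 1.
X = np.zeros((10, 1))  # dummy estimators ignore the feature values
y = np.array([0] * 8 + [1] * 2)

# Fit one dummy classifier per strategy and record its accuracy.
scores = {}
for strategy in ["most_frequent", "prior", "stratified", "uniform"]:
    clf = DummyClassifier(strategy=strategy, random_state=0)
    clf.fit(X, y)
    scores[strategy] = clf.score(X, y)
    print(strategy, scores[strategy])

# "constant" needs the label to predict; here we pick the minority class.
constant_clf = DummyClassifier(strategy="constant", constant=1).fit(X, y)
constant_preds = constant_clf.predict(X)

# Regression counterpart: always predict the mean or median of the targets.
X_reg = np.zeros((5, 1))
y_reg = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
mean_pred = DummyRegressor(strategy="mean").fit(X_reg, y_reg).predict(X_reg)
median_pred = DummyRegressor(strategy="median").fit(X_reg, y_reg).predict(X_reg)
print(mean_pred[0], median_pred[0])
```

In practice you would score the dummy estimator on the same held-out split as your real model, so that the two numbers are directly comparable.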
When a model has already been implemented for the problem, we can use its performance as the frame of reference; the existing model then becomes the baseline.