Importance of a Baseline Model

One of the important aspects of building a machine learning model is to understand the data first. Most of us forget this and jump right into modelling. Another corollary to this is that we often times forget to build a baseline model before building something complicated.

What is a Baseline Model and a Baseline Accuracy?

A baseline model, in simple words, is the most simple model that you can build over the provided data. The accuracy that is achieved by a baseline model is the lower bound for evaluating the performance of your model.

A baseline model usually does not include any machine learning approaches, rather a statistical approach. It also include heuristics, randomness or simple statistics in order to come up with a value.

Sklearn supports baseline models in the form of Dummy Classifiers:

In the case of regression, a baseline model could be any of the following:

Ideally, the performance of the machine learning model should be much greater than the statistical performance.

In case of models that are already implemented, we can use the performance of the existing models as a frame of reference and they become baseline models.