Neural Networks over Random Forest

Many works published in academia show that neural networks can capture underlying patterns more effectively than classical statistical models. Industry has embraced this as well, with more and more applications leveraging neural networks.

I am not going to reiterate the various reasons why a neural network is better than a statistical model, as I would not be able to do them justice. Instead, I am going to demonstrate, using a simple model, how a neural network can outperform a statistical model.

I recently came across a fairly simple problem on Kaggle: Kobe Bryant Shot Selection. It is one of the simplest problems for a beginner data scientist to cut their teeth on.

I went through a lot of solutions for this problem, which were unsurprisingly filled with Random Forest and XGBoost classifiers. This was expected: XGBoost models are proven winners in a large share of Kaggle competitions, and a decision tree is arguably the most intuitive way to model this particular problem.

I tried many variations of these and was able to climb up to rank 240 using XGBoost models. But those models relied heavily on extensive feature engineering. Coming from a non-sports background, I found it rather difficult to identify the features relevant to the problem. I had to read blogs and other solutions to learn that features like "Is it the last 5 minutes of the game?" or "Is it a home or away match?" are very important for predicting the outcome. This dependence on domain knowledge is a serious handicap for someone who lacks it.
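
To make this concrete, here is a rough pandas sketch of the two features mentioned above, assuming the competition's column names (period, minutes_remaining, matchup); the toy rows stand in for the real data.csv.

```python
import pandas as pd

# Toy rows standing in for the competition's data.csv.
df = pd.DataFrame({
    "period": [4, 1, 4],
    "minutes_remaining": [2, 10, 7],
    "matchup": ["LAL @ POR", "LAL vs. UTA", "LAL @ SAS"],
})

# "Is it the last 5 minutes of the game?" (final regulation period)
df["last_5_min"] = ((df["period"] == 4) & (df["minutes_remaining"] < 5)).astype(int)

# "Is it a home or away match?" -- away games are written with '@' in matchup
df["is_away"] = df["matchup"].str.contains("@").astype(int)

print(df[["last_5_min", "is_away"]])
```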

The next model I built was a simple feed-forward neural network with one hidden layer. The input dimension was 197, the hidden layer had 30 units, and the output was a single sigmoid neuron, optimized on binary cross entropy.
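
A minimal Keras sketch of this architecture might look as follows; only the layer sizes, the sigmoid output, and the binary cross-entropy loss are stated above, so the ReLU hidden activation and Adam optimizer are my assumptions.

```python
from keras.models import Sequential
from keras.layers import Dense

# 197 inputs -> 30 hidden units -> 1 sigmoid output, as described above.
# ReLU and Adam are assumptions; the article does not specify them.
model = Sequential()
model.add(Dense(30, input_dim=197, activation="relu"))
model.add(Dense(1, activation="sigmoid"))
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

# Training would then be something like:
# model.fit(X_train, y_train, epochs=20, batch_size=128, validation_split=0.1)

model.summary()  # prints 5,971 trainable parameters
```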

Such a simple model, with practically no feature engineering at all, was able to put me in 25th position on the public leaderboard.

This leads me to one of the main advantages of neural networks over statistical models: you need far less domain knowledge to build a competitive model. Although the winning solution implements an XGBoost model, I am sure it involves a lot of feature engineering, which is a time-consuming task.

Some of the points to keep in mind while building a neural network are:

  1. Always normalize the inputs. Features on very different scales can produce very large weight updates and unstable training (exploding gradients), so scale inputs to a small range such as [0, 1] or standardize them; a minimal scaling sketch follows this list.
  2. The number of parameters in the model should be significantly smaller than the number of training examples; a model with far more parameters than examples is prone to overfitting. Here, the network has 197 × 30 + 30 = 5,940 hidden-layer parameters plus 30 + 1 = 31 output parameters, 5,971 in total, comfortably below the number of labeled shots in the training data.
  3. Use an output neuron that aligns with the objective of the problem. For example, this competition is evaluated on log loss, so it makes sense to use a single sigmoid output trained on binary cross entropy.
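
For point 1, here is a minimal scaling sketch using scikit-learn's MinMaxScaler on toy arrays; the key detail is fitting the scaler on the training data only, so that test-set statistics never leak into preprocessing.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy feature matrices standing in for real training and test data.
X_train = np.array([[200.0, 3.0], [50.0, 1.0], [125.0, 2.0]])
X_test = np.array([[80.0, 2.5]])

scaler = MinMaxScaler()                       # maps each feature to [0, 1]
X_train_scaled = scaler.fit_transform(X_train)  # fit on training data only
X_test_scaled = scaler.transform(X_test)        # reuse training-set min/max

print(X_train_scaled)
print(X_test_scaled)
```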