
THE RANDOM FOREST ALGORITHM

Machine learning is a vast subject with numerous concepts, each serving a different purpose and each born from a different scenario. In short, it is a forest in itself. Just as every step into a forest reveals a different tree or animal, every time you explore ML you learn a new concept: sometimes a sub-species of an earlier concept, other times something entirely new. And while ML itself is a forest, some of its models are literally referred to as trees and forests. One such model is the Random Forest.
 



Before learning about the Random Forest model, there are a few concepts you should be familiar with so that the model can be understood more clearly and easily.

Things to know first

Supervised learning: A machine learning approach where the model is trained on labelled data. For example, a spam-detection model is fed numerous emails, each labelled as spam or not spam. The model learns the patterns in the data that connect the inputs to their labels, so its predictions are always drawn from the set of labels it was trained on.
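The spam example above can be sketched in a few lines. The tiny word-count feature vectors and labels below are made up purely for illustration; a naive Bayes classifier stands in for any supervised model:

```python
# Minimal sketch of supervised learning: labelled examples in, a fitted model out.
from sklearn.naive_bayes import MultinomialNB

# Each row holds made-up word counts for one email, e.g. "free", "meeting", "offer";
# labels mark spam (1) or not spam (0).
X = [[3, 0, 1],
     [0, 2, 0],
     [4, 0, 2],
     [0, 3, 0]]
y = [1, 0, 1, 0]

model = MultinomialNB()
model.fit(X, y)                    # the model learns the pattern linking data to labels
print(model.predict([[2, 0, 1]]))  # the result is always one of the trained labels
```

Note that the model can only ever answer with a label it has seen during training, which is exactly the point made above.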


Classification: A type of supervised learning where the model predicts the category a data point belongs to. Here the labels are qualitative, and only a finite set of values can occur. Examples of classification include spam detection, churn prediction, sentiment analysis, and dog-breed detection. The model groups the training data into their respective categories and predicts the category of new data given for prediction.

Regression: The second type of supervised learning, where the model predicts a numerical value based on previously observed data. Here the target is quantitative: it is continuous and can take infinitely many values. Examples of regression models include house price prediction, stock price prediction, and height-weight prediction.
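The contrast between the two tasks can be shown side by side. The toy numbers below are invented for illustration; the point is only the type of the output, a category versus a continuous value:

```python
# Classification vs regression on toy, made-up data.
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification: the target is qualitative, drawn from a finite set of labels.
clf = DecisionTreeClassifier().fit([[1], [2], [10], [11]], ["cat", "cat", "dog", "dog"])
print(clf.predict([[12]]))   # -> a category

# Regression: the target is quantitative and continuous.
reg = DecisionTreeRegressor().fit([[50], [60], [70], [80]], [150.0, 160.0, 170.0, 180.0])
print(reg.predict([[75]]))   # -> a number
```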

Decision trees:

As the name implies, they are trees built from decisions. Unlike ordinary trees, decision trees are drawn growing downwards. Each decision, 'True' or 'False', starts a new branch, and each of those branches can in turn split into further 'True' or 'False' branches.

Decision trees resemble flow charts, starting at the top and expanding as you go down. At every node a decision is made on some property of the data, and the data is split into branches based on the outcome of that decision.
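This top-down, flow-chart structure can be made visible by training a small tree and printing it. The [height, weight] data below is made up for illustration:

```python
# Train a tiny decision tree and print it as the top-down flow chart described above.
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data (made up): [height_cm, weight_kg] -> animal
X = [[30, 4], [35, 5], [80, 30], [90, 35]]
y = ["cat", "cat", "dog", "dog"]

tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# export_text renders each node as a True/False split, indented like branches.
print(export_text(tree, feature_names=["height_cm", "weight_kg"]))
```

Each indented line in the printout is one branch of the tree, and each leaf ends in a predicted class.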

What is the Random Forest model?

Random forest is a supervised, decision-tree-based machine learning model that can be used for both classification and regression. The classification variant is termed the Random Forest classifier and the regression variant the Random Forest regressor.

Decision tree models are very useful, but they tend to overfit: they give excellent results on the data they were trained on, yet show poor accuracy when given a real dataset other than the training one.

Hence, random forest uses ensemble learning: the output is chosen based on the votes cast by many individual models.


In layman's terms, a random forest uses a majority voting system in which every decision tree casts a vote for its own prediction. The final output is decided by the majority of the votes.
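The voting itself is simple enough to sketch directly. The five votes below are hypothetical outputs from five individual decision trees:

```python
# Sketch of majority voting: each tree votes, and the most common vote wins.
from collections import Counter

# Hypothetical predictions from five decision trees for one sample:
votes = ["spam", "not spam", "spam", "spam", "not spam"]

winner = Counter(votes).most_common(1)[0][0]
print(winner)  # -> 'spam' (3 votes against 2)
```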

How does the ensemble work?

Bootstrapping the data and aggregating the results to make a decision is known as bagging (bootstrap aggregating). In other words, bagging trains a number of individual models in parallel, each on a different subset of the data.

The ensemble uses bagging: the given dataset is split into smaller datasets, each with the same number of rows as the original but a different combination of columns. Because rows are sampled with replacement, the smaller sets can contain repeated rows.
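The row-sampling part of bagging can be sketched in plain Python. The row names below are placeholders; the key property is that each bootstrap set keeps the original size while duplicates are allowed:

```python
# Sketch of bootstrapping: sample rows with replacement, so some rows may repeat
# and each bootstrap set has the same number of rows as the original dataset.
import random

random.seed(0)  # fixed seed so the sketch is reproducible

rows = ["row1", "row2", "row3", "row4", "row5"]
bootstrap = [random.choice(rows) for _ in rows]

print(bootstrap)  # same length as `rows`; repeated rows are expected
```

A random forest additionally considers only a random subset of the columns at each split, which is what makes the individual trees differ from one another.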

Figure: in bootstrapped sets A) and B), different columns are considered.


Working of the Random Forest model

Step 1: Bootstrapping the dataset

The original dataset is bootstrapped into numerous smaller datasets by sampling rows with replacement and selecting random subsets of columns.

Step 2: Creating decision trees

A decision tree is trained on each bootstrapped dataset, so every tree sees a slightly different view of the data.
Step 3: Making the decision trees vote

Each trained tree makes its own prediction for the new data point, effectively casting a vote.
Step 4: Counting the votes and predicting the result

The votes are tallied: for classification the majority class becomes the final prediction, and for regression the average of the trees' outputs is used.
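The four steps above are exactly what scikit-learn's `RandomForestClassifier` carries out internally; the well-known Iris dataset is used here simply because it ships with the library:

```python
# All four steps at once: bootstrap the data, build many trees, let them vote,
# and take the majority as the prediction.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0)  # a forest of 100 trees
forest.fit(X_train, y_train)          # steps 1-2: bootstrap the data, build the trees
print(forest.predict(X_test[:3]))     # steps 3-4: the trees vote, majority wins
print(forest.score(X_test, y_test))   # accuracy on held-out data
```

Swapping in `RandomForestRegressor` gives the regression variant, where the trees' outputs are averaged instead of voted on.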

To understand the Random Forest regression model better and to check out the code, here is the GitHub link for a random forest model implemented on a dataset.

Github

  • https://github.com/PabbaAbhishek/Random_Forest.git

Madras Scientific  Research Foundation
