2024-01-15

Decision Tree for Tennis dataset

hands on machine learning

INTRODUCTION

In this blog, we will create and visualize a decision tree for a simple tennis dataset, predicting whether a person will play tennis based on the weather conditions.

Importing Necessary Packages:

import pandas as pd
import numpy as np

Reading and viewing the data:

tennis=pd.read_csv('PlayTennis.csv')
tennis

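A decision tree chooses its splits using measures such as entropy, Gini impurity and information gain. These measures can be computed on categories directly, but scikit-learn's DecisionTreeClassifier only accepts numeric input, so we first have to convert all the categorical features to numbers.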
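To make these measures concrete, here is a minimal sketch that computes the entropy and Gini impurity of the root node from the 5 No / 9 Yes counts that the tree plot reports, plus the information gain of splitting on outlook. The per-branch outlook counts are an assumption taken from the classic 14-row version of this dataset; your PlayTennis.csv may differ.

```python
import math

def entropy(counts):
    """Shannon entropy (in bits) of a node with the given class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def gini(counts):
    """Gini impurity of a node with the given class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    total = sum(parent)
    weighted = sum(sum(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

root = [5, 9]  # [No, Yes] counts at the root node

# Assumed [No, Yes] counts per outlook value (classic dataset): sunny, overcast, rainy
outlook_split = [[3, 2], [0, 4], [2, 3]]

print(round(entropy(root), 3))                          # 0.94
print(round(gini(root), 3))                             # 0.459
print(round(information_gain(root, outlook_split), 3))  # 0.247
```

With criterion='entropy', the tree picks, at each node, the split with the largest information gain.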

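For this we use LabelEncoder from the sklearn library. LabelEncoder maps each distinct category of a column to an integer. The integers are not random: they follow the sorted (alphabetical) order of the category names, so the outlook categories overcast, rainy and sunny become 0, 1 and 2. Beyond being unique to each category, the numbers carry no meaning. (scikit-learn intends LabelEncoder for target labels and provides OrdinalEncoder for feature columns, but applying LabelEncoder column by column works for this small example.)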
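A quick check of that ordering, using a hypothetical list of outlook values rather than the real CSV:

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
codes = le.fit_transform(["sunny", "overcast", "rainy", "sunny"])

print(le.classes_.tolist())                     # ['overcast', 'rainy', 'sunny']
print(codes.tolist())                           # [2, 0, 1, 2]
# inverse_transform maps integer codes back to the original categories
print(le.inverse_transform([0, 2]).tolist())    # ['overcast', 'sunny']
```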

Importing LabelEncoder from the sklearn library:

from sklearn.preprocessing import LabelEncoder
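# Create an object of the LabelEncoder class
LB=LabelEncoder()
# Label-encode every column (all features and the target "play").
# fit_transform refits LB on each column, so afterwards LB only
# remembers the mapping of the last column it encoded.
for i in tennis.columns:
    tennis[i]=LB.fit_transform(tennis[i])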

print(tennis)
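Because the loop above reuses a single encoder, the per-column mappings are lost. A minimal sketch of one way to keep them, storing one LabelEncoder per column in a dictionary so you can decode results and encode new days consistently. The DataFrame here is a small hypothetical stand-in, not the real PlayTennis.csv, and the names `df`, `encoders` and `new_day` are illustrative.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical rows in the same layout as PlayTennis.csv (not the real file)
df = pd.DataFrame({
    "outlook": ["sunny", "overcast", "rainy"],
    "windy": [False, True, False],
    "play": ["no", "yes", "yes"],
})

# Fit a separate encoder per column and keep it for later reuse
encoders = {}
for col in df.columns:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])

print(df)

# Decode the encoded target back to its original labels
print(encoders["play"].inverse_transform(df["play"]).tolist())  # ['no', 'yes', 'yes']

# Encode a new, unseen day with the same mappings that were used for training
new_day = {"outlook": "rainy", "windy": True}
encoded = [int(encoders[col].transform([value])[0]) for col, value in new_day.items()]
print(encoded)  # [1, 1]
```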

Splitting the data into X (features/input) and y (output/target):

x=tennis.drop('play',axis=1)
y=tennis['play']
print(x)

print(y)

Importing the decision tree and fitting/training the decision tree model:

from sklearn import tree
# Use entropy as the criterion for choosing splits.
clf=tree.DecisionTreeClassifier(criterion='entropy')
clf.fit(x,y)

DecisionTreeClassifier(criterion='entropy')

Visualising the decision tree:

import matplotlib.pyplot as plt
tree.plot_tree(clf)
plt.show()

[Figure: the fitted decision tree, drawn by tree.plot_tree(clf)]

In the graph above, X[0] is outlook, X[1] is temp, X[2] is humidity and X[3] is windy. In the root node, value = [5, 9] means that 5 rows have play = No and 9 rows have play = Yes.
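Instead of mapping X[0], X[1], ... by hand, you can pass the column names to scikit-learn. A minimal sketch using export_text on a hypothetical, already-encoded toy dataset with two columns; for the model in this post you would pass `feature_names=list(x.columns)`, and plot_tree accepts the same `feature_names` argument.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical, already-encoded toy data: columns are outlook and windy
X_toy = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_toy = [0, 0, 1, 1]

toy_clf = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X_toy, y_toy)

# feature_names replaces the generic feature indices with readable column names
text = export_text(toy_clf, feature_names=["outlook", "windy"])
print(text)
```

Here outlook alone separates the classes, so the printed tree contains a single split on outlook and never mentions windy.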

Since we used all 14 rows of the dataset for training, there is no separate test data. Therefore, we let the model predict on the training data itself.
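With only 14 rows, one way to still get a held-out estimate is leave-one-out cross-validation: train on 13 rows, test on the remaining one, and repeat for every row. A minimal sketch, assuming the classic 14-row version of the dataset (your PlayTennis.csv may differ), and using scikit-learn's OrdinalEncoder for the feature columns in place of the LabelEncoder loop above.

```python
import pandas as pd
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

# Assumption: the classic 14-row PlayTennis data; the CSV used above may differ
rows = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]
data = pd.DataFrame(rows, columns=["outlook", "temp", "humidity", "windy", "play"])

# OrdinalEncoder encodes all feature columns at once (sorted order per column)
X = OrdinalEncoder().fit_transform(data.drop(columns="play"))
y = data["play"]

# Leave-one-out: 14 fits, each scored on the single held-out row (0 or 1)
scores = cross_val_score(
    DecisionTreeClassifier(criterion="entropy", random_state=0),
    X, y, cv=LeaveOneOut(),
)
print(f"held-out accuracy: {scores.mean():.2f}")
```

The held-out accuracy is usually lower than the training accuracy, which is the point of measuring it.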

y_pred = clf.predict(x)

Evaluating model performance:

y_pred == y

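Every prediction matches the true label, so the model classifies the training data perfectly. Keep in mind that this is training accuracy: a fully grown decision tree can memorize a small dataset, so this result does not tell us how well the model would predict play for new days.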
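To summarise an element-wise comparison like y_pred == y as a single number, take its mean, or use accuracy_score; a confusion matrix shows which class the mistakes fall into. A minimal sketch on hypothetical encoded labels (0 = No, 1 = Yes) containing one deliberate mistake; for this post's model you would call `accuracy_score(y, y_pred)`.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical encoded labels (0 = No, 1 = Yes) with one wrong prediction
y_true = np.array([0, 1, 1, 0, 1])
y_pred_toy = np.array([0, 1, 0, 0, 1])

# The mean of the element-wise comparison is the accuracy
print((y_pred_toy == y_true).mean())        # 0.8
print(accuracy_score(y_true, y_pred_toy))   # 0.8

# Rows are true classes, columns are predicted classes
print(confusion_matrix(y_true, y_pred_toy))
```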