How the Naive Bayes classifier works

We will work through this using the Titanic as an example. While the Titanic was sinking, some categories of people were given priority over others when it came to being rescued. We have the following dataset (it is a Kaggle dataset):

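| Person category | Survival chance |
|-----------------|-----------------|
| Woman           | Yes             |
| Kid             | Yes             |
| Kid             | Yes             |
| Man             | No              |
| Woman           | Yes             |
| Woman           | Yes             |
| Man             | No              |
| Man             | Yes             |
| Kid             | Yes             |
| Woman           | No              |
| Kid             | No              |
| Woman           | No              |
| Man             | Yes             |
| Man             | No              |
| Woman           | Yes             |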

 

Now, let's prepare a likelihood table for the preceding information:

 

 

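| Category    | Survival chance: No | Survival chance: Yes | Grand Total | P(Category) |
|-------------|---------------------|----------------------|-------------|-------------|
| Kid         | 1                   | 3                    | 4           | 4/15 = 0.27 |
| Man         | 3                   | 2                    | 5           | 5/15 = 0.33 |
| Woman       | 2                   | 4                    | 6           | 6/15 = 0.40 |
| Grand Total | 6                   | 9                    | 15          |             |
| P(Survival) | 6/15 = 0.40         | 9/15 = 0.60          |             |             |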
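The likelihood table can be reproduced by counting the rows of the sample dataset. The following is a minimal sketch using only the Python standard library; the variable names (`rows`, `joint`, `by_category`, `by_outcome`) are illustrative and do not come from the original text.

```python
from collections import Counter

# The 15 (person category, survival) rows of the sample dataset.
rows = [
    ("Woman", "Yes"), ("Kid", "Yes"), ("Kid", "Yes"), ("Man", "No"),
    ("Woman", "Yes"), ("Woman", "Yes"), ("Man", "No"), ("Man", "Yes"),
    ("Kid", "Yes"), ("Woman", "No"), ("Kid", "No"), ("Woman", "No"),
    ("Man", "Yes"), ("Man", "No"), ("Woman", "Yes"),
]

joint = Counter(rows)                        # counts per (category, outcome) pair
by_category = Counter(c for c, _ in rows)    # row totals
by_outcome = Counter(o for _, o in rows)     # column totals
total = len(rows)

# Print one line per category: No count, Yes count, total, and P(category).
print(f"{'Category':<12}{'No':>4}{'Yes':>5}{'Total':>7}{'P(cat)':>8}")
for cat in sorted(by_category):
    print(f"{cat:<12}{joint[(cat, 'No')]:>4}{joint[(cat, 'Yes')]:>5}"
          f"{by_category[cat]:>7}{by_category[cat] / total:>8.2f}")
print(f"{'Total':<12}{by_outcome['No']:>4}{by_outcome['Yes']:>5}{total:>7}")
```

Counting with `Counter` keeps the joint counts and the marginal totals separate, which are the quantities Bayes' theorem needs in the next step.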

 

 

 

 

Let's find out which category of people had the maximum chance of survival:

Kid - P(Yes|Kid) = P(Kid|Yes) * P(Yes) / P(Kid)

P(Kid|Yes) = 3/9 = 0.33

P(Yes) = 9/15 = 0.6

P(Kid) = 4/15 = 0.27

P(Yes|Kid) = (3/9) * (9/15) / (4/15) = 3/4 = 0.75

Woman - P(Yes|Woman) = P(Woman|Yes) * P(Yes) / P(Woman)

P(Woman|Yes) = 4/9 = 0.44

P(Yes) = 9/15 = 0.6

P(Woman) = 6/15 = 0.4

P(Yes|Woman) = (4/9) * (9/15) / (6/15) = 4/6 = 0.67

Man - P(Yes|Man) = P(Man|Yes) * P(Yes) / P(Man)

P(Man|Yes) = 2/9 = 0.22

P(Yes) = 9/15 = 0.6

P(Man) = 5/15 = 0.33

P(Yes|Man) = (2/9) * (9/15) / (5/15) = 2/5 = 0.4

So, in this sample, a kid had the highest chance of survival and a man the lowest.
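The same Bayes' theorem calculation can be sketched in a few lines of Python. This is an illustrative sketch, not code from the original text: the `counts` dictionary and the `posterior_yes` helper are names introduced here, and the counts are taken from the likelihood table. Exact fractions are used so that rounding of the intermediate probabilities does not affect the result.

```python
from fractions import Fraction

# (No, Yes) counts per category, taken from the likelihood table.
counts = {"Kid": (1, 3), "Man": (3, 2), "Woman": (2, 4)}

total = sum(no + yes for no, yes in counts.values())
total_yes = sum(yes for _, yes in counts.values())


def posterior_yes(category):
    """Return P(Yes | category) via Bayes' theorem, as an exact fraction."""
    no, yes = counts[category]
    likelihood = Fraction(yes, total_yes)   # P(category | Yes)
    prior = Fraction(total_yes, total)      # P(Yes)
    evidence = Fraction(no + yes, total)    # P(category)
    return likelihood * prior / evidence


posteriors = {cat: posterior_yes(cat) for cat in counts}

# List the categories from highest to lowest survival probability.
for cat, p in sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True):
    print(f"P(Yes|{cat}) = {p} = {float(p):.2f}")
```

With a single feature, the posterior reduces to the fraction of survivors within each category (3/4 for Kid, 2/3 for Woman, 2/5 for Man), which is a quick way to check the arithmetic.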

Let's perform the sentiment classification with the help of Naive Bayes, and see whether the result is better or worse:

from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Split the features and labels into training and validation sets
xtraintf, xtesttf, ytraintf, ytesttf = train_test_split(tfidfV, Newdata['label'], random_state=42, test_size=0.3)
NB = MultinomialNB()
NB.fit(xtraintf, ytraintf)
prediction = NB.predict_proba(xtesttf)  # class probabilities for the validation set
prediction_int = prediction[:, 1] >= 0.3  # label 1 if P(class 1) >= 0.3, else 0
prediction_int = prediction_int.astype(int)  # np.int was removed in NumPy 1.24; use the built-in int
# Score against the labels returned by this split (ytesttf)
print("F1 Score-", f1_score(ytesttf, prediction_int))
print("Accuracy-", accuracy_score(ytesttf, prediction_int))

The output is as follows:

Here, we can see that our previous results were better than the Naive Bayes results.
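The Naive Bayes code labels a sample as positive when its predicted probability is at least 0.3, rather than using a 0.5 cut-off. The following sketch shows how such a threshold changes the predictions and the resulting scores. The probabilities and labels are hypothetical values made up for illustration, and `apply_threshold`, `f1`, and `accuracy` are helper names introduced here, not part of scikit-learn.

```python
def apply_threshold(proba, threshold):
    """Label a sample 1 when its positive-class probability reaches the threshold."""
    return [int(p >= threshold) for p in proba]


def f1(y_true, y_pred):
    """F1 score for the positive class, computed from TP, FP and FN counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical positive-class probabilities and true labels (illustration only).
proba = [0.10, 0.35, 0.45, 0.80, 0.20, 0.55, 0.32, 0.05]
y_true = [0, 1, 1, 1, 0, 1, 0, 0]

for threshold in (0.3, 0.5):
    y_pred = apply_threshold(proba, threshold)
    print(f"threshold={threshold}: F1={f1(y_true, y_pred):.2f}, "
          f"accuracy={accuracy(y_true, y_pred):.2f}")
```

Lowering the threshold generally trades precision for recall, so the F1 score and the accuracy can move in different directions; the best cut-off depends on the data and is usually chosen on a validation set.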