How the Naive Bayes classifier works

We will try to understand all of this by looking at the example of the Titanic. While the Titanic was sinking, a few of the categories had priority over others, in terms of being saved. We have the following dataset (it is a Kaggle dataset):

Person category	Survival chance
Woman	Yes
Kid	Yes
Kid	Yes
Man	No
Woman	Yes
Woman	Yes
Man	No
Man	Yes
Kid	Yes
Woman	No
Kid	No
Woman	No
Man	Yes
Man	No
Woman	Yes

Now, let's prepare a likelihood table for the preceding information:

		Survival chance
		No	Yes	Grand Total
Category	Kid	1	3	4	4/15=	0.27
Man	3	2	5	5/15=	0.33
Woman	2	4	6	6/15=	0.40
	Grand Total	6	9	15
		6/15	9/15
		0.40	0.6

Let's find out which category of people had the maximum chance of survival:

Kid - P(Yes|Kid)= P(Kid|Yes) * P(Yes)/P(Kid)

P(Kid|Yes) = 3/9= 0.3

P(Yes) = 9/15 =0.6

P(Kid)= 4/15 =0.27

P(Yes|kid) = 0.33 *0.6/0.27=0.73

Woman - P(Yes|Woman)= P(Woman|Yes) * P(Yes)/P(Woman)

P(Woman|Yes) = 4/9= 0.44

P(Yes) = 9/15 =0.6

P(Woman)= 6/15 =0.4

P(Yes|Woman) = 0.44 *0.6/0.4=0.66

Man - P(Yes|Man)= P(Man|Yes) * P(Yes)/P(Man)

P(Man|Yes) = 2/9= 0.22

P(Yes) = 9/15 =0.6

P(Man)= 6/15 =0.33

P(Yes|Man) = 0.22 *0.6/0.33=0.4

So, we can see that a child had the maximum chance of survival and a man the least chance.

Let's perform the sentiment classification with the help of Naive Bayes, and see whether the result is better or worse:

from sklearn.naive_bayes import MultinomialNB
# splitting data into training and validation set
xtraintf, xtesttf, ytraintf, ytesttf = train_test_split(tfidfV, Newdata['label'], random_state=42, test_size=0.3)
NB= MultinomialNB()
NB.fit(xtraintf, ytraintf)
prediction = NB.predict_proba(xtesttf) # predicting on the test set
prediction_int = prediction[:,1] >= 0.3 # if prediction is greater than or equal to 0.3 than 1 else 0
prediction_int = prediction_int.astype(np.int)
print("F1 Score-",f1_score(ytest, prediction_int))
print("Accuracy-",accuracy_score(ytest,prediction_int))

The output is as follows:

Here, we can see that our previous results were better than the Naive Bayes results.