Executing TF-IDF in Python

The following steps show how to compute TF-IDF in Python with scikit-learn:

  1. Import the TfidfVectorizer class from scikit-learn:
from sklearn.feature_extraction.text import TfidfVectorizer
  2. Create a corpus of four short documents:
corpus = ['First document', 'Second document', 'Third document', 'First and second document']
  3. Set up the vectorizer with its default settings:
vectorizer = TfidfVectorizer()
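  4. Fit the vectorizer to the corpus. This learns the vocabulary and the IDF weights and transforms the text into a TF-IDF matrix. Then print the vocabulary (the feature names) and the shape of the matrix:
X = vectorizer.fit_transform(corpus)
# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it
print(vectorizer.get_feature_names_out())
print(X.shape)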

The output is as follows:

  5. Finally, display the document-term matrix as a dense array. Each row corresponds to one document, and each column corresponds to one term in the vocabulary:
print(X.toarray())

We get the following output:
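To see what fit_transform computes, here is a minimal pure-Python sketch that reproduces TfidfVectorizer's default weighting. It assumes the defaults lowercase=True, token_pattern r"(?u)\b\w\w+\b", smooth_idf=True, sublinear_tf=False and norm='l2'. The helper names tokenize and tfidf are hypothetical and exist only for illustration. Each weight is the raw term count multiplied by idf(t) = ln((1 + n) / (1 + df(t))) + 1, where n is the number of documents and df(t) is the number of documents that contain t. Each document row is then scaled to unit length. Under those assumptions, the rows should agree with X.toarray() up to floating-point rounding.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase, then keep runs of 2+ word characters (assumed to match TfidfVectorizer's default token_pattern)
    return re.findall(r"(?u)\b\w\w+\b", text.lower())


def tfidf(corpus):
    docs = [tokenize(doc) for doc in corpus]
    # Vocabulary sorted alphabetically, matching the column order of the fitted vectorizer
    vocab = sorted({term for doc in docs for term in doc})
    n = len(docs)
    # Document frequency: the number of documents that contain each term
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (smooth_idf=True): ln((1 + n) / (1 + df)) + 1
    idf = {term: math.log((1 + n) / (1 + df[term])) + 1 for term in vocab}
    rows = []
    for doc in docs:
        counts = Counter(doc)
        # Raw term count times IDF (sublinear_tf=False)
        row = [counts[term] * idf[term] for term in vocab]
        # Scale each document vector to unit Euclidean length (norm='l2')
        length = math.sqrt(sum(v * v for v in row))
        rows.append([v / length if length else 0.0 for v in row])
    return vocab, idf, rows


corpus = ['First document', 'Second document', 'Third document', 'First and second document']
vocab, idf, rows = tfidf(corpus)
print(vocab)  # ['and', 'document', 'first', 'second', 'third']
for row in rows:
    print([round(v, 3) for v in row])
```

The term document appears in all four documents, so its IDF is exactly 1. As a result, it receives the smallest nonzero weight in every row, while rarer terms such as and and third are weighted more heavily.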