- Import the following packages:
from nltk.tokenize import RegexpTokenizer
from nltk.stem.snowball import SnowballStemmer
from gensim import models, corpora
from nltk.corpus import stopwords
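- If the NLTK stop word list is not already available on your machine, it needs to be downloaded once before the script will run (a one-time setup step, assumed here rather than shown in the original code):
import nltk
nltk.download('stopwords')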
- Define a function to load the input data:
def load_words(in_file):
    # Read the file and return one document string per line (newline stripped)
    element = []
    with open(in_file, 'r') as f:
        for line in f.readlines():
            element.append(line[:-1])
    return element
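- As a quick sanity check (the file name and sentences below are made up purely for illustration), load_words returns one string per line of the input file, with the trailing newline stripped:
with open('sample_docs.txt', 'w') as f:
    f.write("Cricket is a popular sport\nData analysis is easier with Python\n")
print(load_words('sample_docs.txt'))
# ['Cricket is a popular sport', 'Data analysis is easier with Python']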
- Define a class to pre-process the text (a quick usage check follows the class definition):
class Preprocedure(object):
    def __init__(self):
        # Create a regular expression tokenizer
        self.tokenizer = RegexpTokenizer(r'\w+')
- Obtain the list of English stop words; these common words will be filtered out of the text:
        self.english_stop_words = stopwords.words('english')
- Create a Snowball stemmer:
        self.snowball_stemmer = SnowballStemmer('english')
- Define a method to perform tokenization, stop word removal, and stemming:
    def procedure(self, in_data):
        # Tokenize the string
        token = self.tokenizer.tokenize(in_data.lower())
- Eliminate stop words from the text:
        tokenized_stopwords = [x for x in token if x not in self.english_stop_words]
- Apply stemming to the remaining tokens:
        token_stemming = [self.snowball_stemmer.stem(x) for x in tokenized_stopwords]
- Return the processed tokens:
        return token_stemming
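- Before wiring the class into the script, you can try it on a single sentence to see the combined effect of tokenization, stop word removal, and stemming (the sentence and the expected output are only illustrative):
demo = Preprocedure()
print(demo.procedure("The quick brown foxes were jumping"))
# Typically prints ['quick', 'brown', 'fox', 'jump']: the stop words
# 'the' and 'were' are dropped and the remaining words are stemmed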
- Load the input data in the main function:
if __name__ == '__main__':
    # File containing input data
    in_file = 'data_topic_modeling.txt'
    # Load words
    element = load_words(in_file)
- Create a pre-processor object:
    preprocedure = Preprocedure()
- Process the file and extract the tokens:
    processed_tokens = [preprocedure.procedure(x) for x in element]
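    # processed_tokens holds one token list per input line: lowercased,
    # stripped of stop words, and stemmed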
- Create a dictionary based on the tokenized documents:
    dict_tokens = corpora.Dictionary(processed_tokens)
    corpus = [dict_tokens.doc2bow(text) for text in processed_tokens]
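    # Each entry of corpus is a bag-of-words vector: a list of
    # (token_id, token_count) pairs such as [(0, 1), (3, 2), ...]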
- Develop the LDA model: define the required parameters and initialize the LdaModel object (a sketch for applying the trained model to new text appears at the end of this section):
    num_of_topics = 2
    num_of_words = 4
    ldamodel = models.ldamodel.LdaModel(corpus, num_topics=num_of_topics, id2word=dict_tokens, passes=25)
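    # passes=25 makes the training algorithm sweep the corpus 25 times;
    # more passes generally help on small datasets, at the cost of runtime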
print "Most contributing words to the topics:"
for item in ldamodel.print_topics(num_topics=num_of_topics, num_words=num_of_words):
print "nTopic", item[0], "==>", item[1]
- The result obtained when topic_modelling.py is executed is shown in the following screenshot:
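- Once the model is trained, it can also be applied to unseen text. The lines below are a minimal sketch (the sentence and the printed numbers are illustrative only) that reuses the pre-processor, dictionary, and model built above, so run them at the end of the __main__ block or in a session where those objects are still available:
new_doc = "A new piece of text to categorize"
new_bow = dict_tokens.doc2bow(preprocedure.procedure(new_doc))
print(ldamodel[new_bow])
# A list of (topic_id, probability) pairs, for example [(0, 0.81), (1, 0.19)]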