When we discussed language models, we showed how to generate text. Building a chatbot is similar, except that we are modeling an exchange. This can make our requirements more or less complex, depending on how we approach the problem.
In this chapter we will discuss some of the ways this can be modeled, and then we will build a program that uses a generative model to take input and generate responses. First, let’s talk about what discourse is.
Morphology and syntax tell us how morphemes are combined into words, and words into phrases and sentences. The combination of sentences into larger acts of language is not as easily modeled. Still, we have an intuition for when a combination of sentences is inappropriate. Let’s look at some examples:
I went to the doctor yesterday. It is just a sprained ankle.
I went to the doctor yesterday. Mosquitoes have 47 teeth.
In the first example, the second sentence is obviously related to the first. From these two sentences, combined with common knowledge, we can infer that the speaker went to the doctor for an ankle problem that turned out to be a sprain. The second example makes no sense. From a linguistics point of view, sentences are generated from concepts and then encoded into words and phrases. The concepts expressed by a sequence of sentences should therefore be connected. This holds whether a conversation has one speaker or several.
The pragmatics of a discourse is important to understanding how to model it. If we are modeling a customer-service exchange, the range of responses can be limited. These limited types of responses are often called intents. When building a customer-service chatbot, this greatly reduces the potential complexity. If we are modeling general conversation, this can become much more difficult. Language models learn what is likely to occur in a sequence, but they cannot learn to generate concepts. So our choice is to either build something that models the probable sequences or find a way to cheat.
We can cheat by building canned responses to unrecognized intents. For example, if the user makes a statement that our simple model is not expecting, we can have it respond with, “Sorry, I don’t understand.” If we are logging the conversations, we can use exchanges that use the canned responses to expand the intents we cover.
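As a hedged sketch, intent matching can be as simple as keyword lookup with a canned fallback; the intents and keywords below are made up for illustration.

INTENTS = {
    'billing': ['bill', 'charge', 'refund'],
    'shipping': ['ship', 'delivery', 'tracking'],
}
FALLBACK = "Sorry, I don't understand."

def respond(utterance):
    # Match the user's words against each intent's keyword list.
    tokens = utterance.lower().split()
    for intent, keywords in INTENTS.items():
        if any(k in tokens for k in keywords):
            return 'It sounds like you need help with {}.'.format(intent)
    # Unrecognized utterances get the canned response; logging them lets us
    # use these exchanges later to expand the intents we cover.
    return FALLBACK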
In the example we are covering, we will be building a program that purely models the full text of the discourse. Essentially, it is a language model. The difference will be in how we use it.
This chapter is different from previous ones in that it doesn’t make use of Spark. Spark is great for processing large amounts of data in batches, but it’s not great for interactive applications. Also, recurrent neural networks can take a long time to train on large amounts of data. So, in this chapter we are working with a small piece of data. If you have the right hardware, you can change the NLTK processing to use Spark NLP.
We will build a story-building tool. The idea is to help someone write an original story similar to one of the Grimm fairy tales. This model will be much more complex, in the sense of containing many more parameters, than the previous language model was. The program will be a script that asks for an input sentence and generates a new sentence. The user then takes that sentence, modifies and corrects it, and enters it.
What is the problem we are trying to solve?
We want a system that will recommend the next sentence in a story. We also must recognize the limitations of text generation techniques. We will need to have the user in the loop. So we need a model that can generate related text and a system that lets us review the output.
What constraints are there?
First, we need a model that has two notions of context—the previous sentence and the current sentence. We don’t need to worry about performance as much, since this will be interacting with a person. This might seem counterintuitive because most interactive systems require quite low latency. However, if you consider what this program is producing, it is not unreasonable to wait one to three seconds for a response.
How do we solve the problem with the constraints?
We will be building a neural network for generating text, specifically an RNN, as discussed in Chapters 4 and 8. We could learn the word embeddings in this model, but we can instead use a prebuilt embedding. This will help us train a model more quickly.
Most of the work on this project will be developing a model. Once we have a model, we will build a simple script that we can use to write our own Grimm-style fairy tale. Once we’ve developed this script, this model could potentially be used to power a Twitter bot or Slackbot.
In a real production setting for text generation, we would want to monitor the quality of generated text. This would allow us to improve the generated text over time by developing more targeted training data.
If you recall our language model, we used three layers.
We input windows of characters of a fixed size and predicted the following character. Now we need to find a way to take into account larger portions of text. There are a couple of options.
Many RNN architectures include a layer for learning an embedding for the words. This would merely require us to learn more parameters, so we will use a pretrained GloVe model instead. Also, we will be building our model on the token level, and not on the character level as before.
We could make the window size much larger than the average sentence. This has the benefit of keeping the same model architecture. The downside is that our LSTM layer will have to maintain information over quite long distances. Alternatively, we can use one of the architectures used for machine translation.
Let’s consider the concatenating approach.
The current inputs will be windows over sentences, so for each window of a given sentence we will use the same context vector. This approach has the benefit of being able to be extended to multiple sentences. The downside is that the model has to learn to balance the information from far away and from nearby.
Let’s consider the stateful approach.
This helps make training easier by reducing the influence of the previous sentence. This is a double-edged sword, however, because the context gives us less information. We will be using this approach.
Let’s start out by doing our imports. This chapter will rely on Keras.
from collections import Counter

import pickle as pkl

import nltk
import numpy as np
import pandas as pd

from keras.models import Model
from keras.layers import Input, Embedding, LSTM, Dense, CuDNNLSTM
from keras.layers.merge import Concatenate
import keras.utils as ku
import keras.preprocessing as kp

import tensorflow as tf
np.random.seed(1)
tf.set_random_seed(2)
Let’s also define some special tokens for the beginning and ending of sentences, as well as for unknown tokens.
START = '>'
END = '###'
UNK = '???'
Now, we can load the data. We will need to replace some of the special characters.
with open('grimms_fairytales.txt', encoding='UTF-8') as fp:
    text = fp.read()

text = text\
    .replace('\t', ' ')\
    .replace('“', '"')\
    .replace('”', '"')\
    .replace('‘', "'")\
    .replace('’', "'")
Now, we can process our text into tokenized sentences.
sentences = nltk.tokenize.sent_tokenize(text)
sentences = [s.strip() for s in sentences]
sentences = [
    [t.lower() for t in nltk.tokenize.wordpunct_tokenize(s)]
    for s in sentences
]

word_counts = Counter([t for s in sentences for t in s])
word_counts = pd.Series(word_counts)
vocab = [START, END, UNK] + list(sorted(word_counts.index))
We need to define some hyperparameters for our model.
dim is the size of the token embeddings.
w is the size of the windows we’ll use.
max_len is the sentence length that we use.
units is the size of the state vectors we’ll use for our LSTMs.

dim = 50
w = 10
max_len = int(np.quantile([len(s) for s in sentences], 0.95))
units = 200
Now, let’s load the GloVe embeddings.
glove = {}
with open('glove.6B/glove.6B.50d.txt', encoding='utf-8') as fp:
    for line in fp:
        token, embedding = line.split(maxsplit=1)
        if token in vocab:
            embedding = np.fromstring(embedding, 'f', sep=' ')
            glove[token] = embedding

vocab = list(sorted(glove.keys()))
vocab_size = len(vocab)
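As a quick sanity check, each retained token should now map to a 50-dimensional vector ('king' is assumed to be present, which is safe for this corpus):

glove['king'].shape
# (50,)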
We will also need to have a lookup for the one-hot–encoded output.
i2t = dict(enumerate(vocab))
t2i = {t: i for i, t in i2t.items()}

token_oh = ku.to_categorical(np.arange(vocab_size))
token_oh = {t: token_oh[i, :] for t, i in t2i.items()}
Now, we can define some utility functions.
We will need to pad the end of the sentences; otherwise, we will not learn from the last words in the sentences.
def pad_sentence(sentence, length):
    sentence = sentence[:length]
    if len(sentence) < length:
        sentence += [END] * (length - len(sentence))
    return sentence
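For example, padding a two-token sentence to length four fills the tail with END tokens:

pad_sentence(['the', 'king'], 4)
# ['the', 'king', '###', '###']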
We also need to convert sentences to matrices.
def sent2mat(sentence, embedding):
    mat = [embedding.get(t, embedding[UNK]) for t in sentence]
    return np.array(mat)
We need a function for converting sequences to a sequence of sliding windows.
def slide_seq(seq, w):
    # Pair each window of w tokens with the token that follows it.
    window = []
    target = []
    for i in range(len(seq) - w - 1):
        window.append(seq[i:i+w])
        target.append(seq[i+w])
    return window, target
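For example, with a window size of 2 (note that, as written, the loop stops one pair short of the end of the sequence):

windows, targets = slide_seq(['a', 'b', 'c', 'd', 'e'], 2)
# windows == [['a', 'b'], ['b', 'c']]
# targets == ['c', 'd']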
Now we can build our input matrices. We will have two input matrices. One is from the context, and one is from the current sentence.
Xc = []
Xi = []
Y = []
for i in range(len(sentences) - 1):
    context_sentence = pad_sentence(sentences[i], max_len)
    xc = sent2mat(context_sentence, glove)
    input_sentence = [START] * (w-1) + sentences[i+1] + [END] * (w-1)
    for window, target in zip(*slide_seq(input_sentence, w)):
        xi = sent2mat(window, glove)
        y = token_oh.get(target, token_oh[UNK])
        Xc.append(np.copy(xc))
        Xi.append(xi)
        Y.append(y)

Xc = np.array(Xc)
Xi = np.array(Xi)
Y = np.array(Y)
print('context sentence: ', xc.shape)
print('input sentence: ', xi.shape)
print('target sentence: ', y.shape)
context sentence:  (42, 50)
input sentence:  (10, 50)
target sentence:  (4407,)
Let’s build our model.
input_c = Input(shape=(max_len, dim), dtype='float32')
lstm_c, h, c = LSTM(units, return_state=True)(input_c)

input_i = Input(shape=(w, dim), dtype='float32')
lstm_i = LSTM(units)(input_i, initial_state=[h, c])

out = Dense(vocab_size, activation='softmax')(lstm_i)

model = Model(inputs=[input_c, input_i], outputs=[out])
print(model.summary())
Model: "model_1" __________________________________________________________________________ Layer (type) Output Shape Param # Connected to ========================================================================== input_1 (InputLayer) (None, 42, 50) 0 __________________________________________________________________________ input_2 (InputLayer) (None, 10, 50) 0 __________________________________________________________________________ lstm_1 (LSTM) [(None, 200), (None, 200800 input_1[0][0] __________________________________________________________________________ lstm_2 (LSTM) (None, 200) 200800 input_2[0][0] lstm_1[0][1] lstm_1[0][2] __________________________________________________________________________ dense_1 (Dense) (None, 4407) 885807 lstm_2[0][0] ========================================================================== Total params: 1,287,407 Trainable params: 1,287,407 Non-trainable params: 0 __________________________________________________________________________ None
model.compile(
    loss='categorical_crossentropy',
    optimizer='adam',
    metrics=['accuracy'])
Now we can train our model. Depending on your hardware, this can take around four minutes per epoch on CPU. This is our most complex model yet, with almost 1.3 million parameters.
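The training call itself is a one-liner; here is a minimal sketch, where the ten epochs match the log below but the batch size of 128 is an assumption:

model.fit([Xc, Xi], Y, epochs=10, batch_size=128)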
Epoch 1/10
145061/145061 [==============================] - 241s 2ms/step - loss: 3.7840 - accuracy: 0.3894
...
Epoch 10/10
145061/145061 [==============================] - 244s 2ms/step - loss: 1.8933 - accuracy: 0.5645
Once we have this model trained, we can try to generate some sentences. This function will need a context sentence and an input sentence; we can simply supply one word to begin. The function will append tokens to the input sentence until the END token is generated or we have hit the maximum allowed length.
def generate_sentence(context_sentence, input_sentence, max_len=100):
    context_sentence = [t.lower() for t in nltk.tokenize.wordpunct_tokenize(context_sentence)]
    context_sentence = pad_sentence(context_sentence, max_len)
    context_vector = sent2mat(context_sentence, glove)
    input_sentence = [t.lower() for t in nltk.tokenize.wordpunct_tokenize(input_sentence)]
    input_sentence = [START] * (w-1) + input_sentence
    input_sentence = input_sentence[:w]
    output_sentence = input_sentence
    input_vector = sent2mat(input_sentence, glove)
    predicted_vector = model.predict([[context_vector], [input_vector]])
    predicted_token = i2t[np.argmax(predicted_vector)]
    output_sentence.append(predicted_token)
    i = 0
    while predicted_token != END and i < max_len:
        input_sentence = input_sentence[1:w] + [predicted_token]
        input_vector = sent2mat(input_sentence, glove)
        predicted_vector = model.predict([[context_vector], [input_vector]])
        predicted_token = i2t[np.argmax(predicted_vector)]
        output_sentence.append(predicted_token)
        i += 1
    return output_sentence
Because we need to supply the first word of the new sentence, we can simply sample from the beginning tokens found in our corpus. Let’s save the distribution of first words that we will need as JSON.
first_words = Counter([s[0] for s in sentences])
first_words = pd.Series(first_words)
first_words = first_words / first_words.sum()
first_words.to_json('grimm-first-words.json')
with open('glove-dict.pkl', 'wb') as out:
    pkl.dump(glove, out)
with open('vocab.pkl', 'wb') as out:
    pkl.dump(i2t, out)
Let’s see what is generated without human intervention.
context_sentence = '''
In old times, when wishing was having, there lived a King whose daughters
were all beautiful, but the youngest was so beautiful that the sun itself,
which has seen so much, was astonished whenever it shone in her face.
'''.strip().replace('\n', ' ')
input_sentence = np.random.choice(first_words.index, p=first_words)

for _ in range(10):
    print(context_sentence, END)
    output_sentence = generate_sentence(context_sentence, input_sentence, max_len)
    output_sentence = ' '.join(output_sentence[w-1:-1])
    context_sentence = output_sentence
    input_sentence = np.random.choice(first_words.index, p=first_words)
print(output_sentence, END)
In old times, when wishing was having, there lived a King whose daughters were all beautiful, but the youngest was so beautiful that the sun itself, which has seen so much, was astonished whenever it shone in her face. ###
" what do you desire ??? ###
the king ' s son , however , was still beautiful , and a little chair there ' s blood and so that she is alive ??? ###
the king ' s son , however , was still beautiful , and the king ' s daughter was only of silver , and the king ' s son came to the forest , and the king ' s son seated himself on the leg , and said , " i will go to church , and you shall be have lost my life ??? ###
" what are you saying ??? ###
cannon - maiden , and the king ' s daughter was only a looker - boy . ###
but the king ' s daughter was humble , and said , " you are not afraid ??? ###
then the king said , " i will go with you ??? ###
" i will go with you ??? ###
he was now to go with a long time , and the bird threw in the path , and the strong of them were on their of candles and bale - plants . ###
then the king said , " i will go with you ??? ###
This model won’t be passing the Turing test any time soon. This is why we need to have a human in the loop. Let’s build our script. First, let’s save our model.
model.save('grimm-model')
Our script will need to have access to some of our utility functions, as well as to the hyperparameters (for example, dim and w).
%%writefile fairywriter.py
"""
This script helps you generate a fairytale.
"""

import pickle as pkl

import nltk
import numpy as np
import pandas as pd

from keras.models import load_model
import keras.utils as ku
import keras.preprocessing as kp

import tensorflow as tf

START = '>'
END = '###'
UNK = '???'

FINISH_CMDS = ['finish', 'f']
BACK_CMDS = ['back', 'b']
QUIT_CMDS = ['quit', 'q']
CMD_PROMPT = ' | '.join(','.join(c) for c in [FINISH_CMDS, BACK_CMDS, QUIT_CMDS])
QUIT_PROMPT = '"{}" to quit'.format('" or "'.join(QUIT_CMDS))
ENDING = ['THE END']

def pad_sentence(sentence, length):
    sentence = sentence[:length]
    if len(sentence) < length:
        sentence += [END] * (length - len(sentence))
    return sentence

def sent2mat(sentence, embedding):
    mat = [embedding.get(t, embedding[UNK]) for t in sentence]
    return np.array(mat)

def generate_sentence(context_sentence, input_sentence, vocab,
                      max_len=100, hparams=(42, 50, 10)):
    max_len, dim, w = hparams
    context_sentence = [t.lower() for t in nltk.tokenize.wordpunct_tokenize(context_sentence)]
    context_sentence = pad_sentence(context_sentence, max_len)
    context_vector = sent2mat(context_sentence, glove)
    input_sentence = [t.lower() for t in nltk.tokenize.wordpunct_tokenize(input_sentence)]
    input_sentence = [START] * (w-1) + input_sentence
    input_sentence = input_sentence[:w]
    output_sentence = input_sentence
    input_vector = sent2mat(input_sentence, glove)
    predicted_vector = model.predict([[context_vector], [input_vector]])
    predicted_token = vocab[np.argmax(predicted_vector)]
    output_sentence.append(predicted_token)
    i = 0
    while predicted_token != END and i < max_len:
        input_sentence = input_sentence[1:w] + [predicted_token]
        input_vector = sent2mat(input_sentence, glove)
        predicted_vector = model.predict([[context_vector], [input_vector]])
        predicted_token = vocab[np.argmax(predicted_vector)]
        output_sentence.append(predicted_token)
        i += 1
    return output_sentence

if __name__ == '__main__':
    model = load_model('grimm-model')
    (_, max_len, dim), (_, w, _) = model.get_input_shape_at(0)
    hparams = (max_len, dim, w)
    first_words = pd.read_json('grimm-first-words.json', typ='series')
    with open('glove-dict.pkl', 'rb') as fp:
        glove = pkl.load(fp)
    with open('vocab.pkl', 'rb') as fp:
        vocab = pkl.load(fp)
    print("Let's write a story!")
    title = input('Give me a title ({}) '.format(QUIT_PROMPT))
    story = [title]
    context_sentence = title
    if title.lower() in QUIT_CMDS:
        exit()
    print(CMD_PROMPT)
    while True:
        input_sentence = np.random.choice(first_words.index, p=first_words)
        generated = generate_sentence(context_sentence, input_sentence,
                                      vocab, hparams=hparams)
        generated = ' '.join(generated)
        ### the model creates a suggested sentence
        print('Suggestion:', generated)
        ### the user responds with the sentence they want to add
        ### the user can fix up the suggested sentence or write their own
        ### this is the sentence that will be used to make the next suggestion
        sentence = input('Sentence: ')
        if sentence.lower() in QUIT_CMDS:
            story = []
            break
        elif sentence.lower() in FINISH_CMDS:
            story.append(np.random.choice(ENDING))
            break
        elif sentence.lower() in BACK_CMDS:
            if len(story) == 1:
                print('You are at the beginning')
            else:
                story = story[:-1]
                context_sentence = story[-1]
            continue
        else:
            story.append(sentence)
            context_sentence = sentence
    print('\n'.join(story))
    print('exiting...')
Let’s give our script a run. I’ll read each suggestion and take elements from it to write the next line. A more complex model might produce sentences that can be edited and added directly, but this model isn’t quite there.
%run fairywriter.py
Let's write a story!
Give me a title ("quit" or "q" to quit) The Wolf Goes Home
finish,f | back,b | quit,q
Suggestion: > > > > > > > > > and when they had walked for the time , and the king ' s son seated himself on the leg , and said , " i will go to church , and you shall be have lost my life ??? ###
Sentence: There was once a prince who got lost in the woods on the way to a church.
Suggestion: > > > > > > > > > she was called hans , and as the king ' s daughter , who was so beautiful than the children , who was called clever elsie . ###
Sentence: The prince was called Hans, and he was more handsome than the boys.
Suggestion: > > > > > > > > > no one will do not know what to say , but i have been compelled to you ??? ###
Sentence: The Wolf came along and asked, "does no one know where are?"
Suggestion: > > > > > > > > > there was once a man who had a daughter who had three daughters , and he had a child and went , the king ' s daughter , and said , " you are growing and thou now , i will go and fetch
Sentence: The Wolf had three daughters, and he said to the prince, "I will help you return home if you take one of my daughters as your betrothed."
Suggestion: > > > > > > > > > but the king ' s daughter was humble , and said , " you are not afraid ??? ###
Sentence: The prince asked, "are you not afraid that she will be killed as soon as we return home?"
Suggestion: > > > > > > > > > i will go and fetch the golden horse ??? ###
Sentence: The Wolf said, "I will go and fetch a golden horse as dowry."
Suggestion: > > > > > > > > > one day , the king ' s daughter , who was a witch , and lived in a great forest , and the clouds of earth , and in the evening , came to the glass mountain , and the king ' s son
Sentence: The Wolf went to find the forest witch that she might conjure a golden horse.
Suggestion: > > > > > > > > > when the king ' s daughter , however , was sitting on a chair , and sang and reproached , and said , " you are not to be my wife , and i will take you to take care of your ??? ###
Sentence: The witch reproached the wolf saying, "you come and ask me such a favor with no gift yourself?"
Suggestion: > > > > > > > > > then the king said , " i will go with you ??? ###
Sentence: So the wolf said, "if you grant me this favor, I will be your servant."
Suggestion: > > > > > > > > > he was now to go with a long time , and the other will be polluted , and we will leave you ??? ###
Sentence: f
The Wolf Goes Home
There was once a prince who got lost in the woods on the way to a church.
The prince was called Hans, and he was more handsome than the boys.
The Wolf came along and asked, "does no one know where are?"
The Wolf had three daughters, and he said to the prince, "I will help you return home if you take one of my daughters as your betrothed."
The prince asked, "are you not afraid that she will be killed as soon as we return home?"
The Wolf said, "I will go and fetch a golden horse as dowry."
The Wolf went to find the forest witch that she might conjure a golden horse.
The witch reproached the wolf saying, "you come and ask me such a favor with no gift yourself?"
So the wolf said, "if you grant me this favor, I will be your servant."
THE END
exiting...
You can train for additional epochs to get better suggestions, but beware of overfitting. If you overfit this model, it will generate worse results when given contexts and inputs it doesn’t recognize.
Now that we have a model that we can interact with, the next step would be to integrate it with a chatbot system. Most systems require some server that will serve the model. The specifics will depend on your chatbot platform.
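As a hedged illustration of such a server, here is a minimal sketch assuming Flask and reusing the chapter’s generate_sentence, w, and max_len; the route and JSON field names are hypothetical, and the details will vary with your chatbot platform.

from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/suggest', methods=['POST'])
def suggest():
    # Expect JSON like {"context": "...", "first_word": "the"}.
    payload = request.get_json()
    context = payload['context']
    first_word = payload.get('first_word', 'the')
    output = generate_sentence(context, first_word, max_len)
    # Strip the START padding and the END token before returning.
    return jsonify({'suggestion': ' '.join(output[w-1:-1])})

if __name__ == '__main__':
    app.run(port=5000)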
How we measure a chatbot depends more on the end purpose of the product than it does for most applications. Let’s consider the different kinds of metrics we can use.
If you are building a chatbot to support customer service, then the business metrics will be centered around the customer experience. If you are building a chatbot for entertainment purposes, as is the case here, there are no obvious business metrics. However, if the entertaining chatbot is being used for marketing, you can use marketing metrics.
It’s difficult to measure live interactions in the same way we measure the model during training. In training, we know the “correct” response, but in a live interaction there is no single correct answer. To measure a live model, you will need to manually label conversations.
Now let’s talk about the infrastructure.
In this chapter, we learned how to build a model for an interactive application. There are many different kinds of chatbots. The example we see here is based on a language model, but we can also build a recommendation model. It all depends on what kind of interaction you are expecting. In our situation, we are entering and receiving full sentences. If your application has a constrained set of responses, then your task becomes easier.