Recurrent neural networks work well when it comes to short-term dependencies. In other words, if only a single statement needs to be processed, a vanilla RNN does fine. For example, given the sentence "India's capital is __", the network reliably produces the correct result, because this is a self-contained statement: there is no wider context to track, and no previous sentence to depend on.
Hence, the prediction would be India's capital is New Delhi.
However, a vanilla RNN does not understand the context behind an input. Let us see this with an example:
Staying in India meant that I gravitated towards cricket. But, after 10 years, I moved to the US for work.
The popular game in India is ___.
Notice that the first sentence establishes a context and the second one shifts it. The network, however, has to make its prediction on the basis of the first sentence. It is highly likely that the popular game in India is cricket, but to produce that answer the network must hold on to the earlier context even after the topic has moved on. A simple RNN fails at this.
That is where Long Short-Term Memory (LSTM) comes into the picture.
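To make the contrast concrete, here is a minimal sketch in PyTorch (not from the original discussion) that sets up a vanilla RNN and an LSTM side by side on the same toy token sequence. The vocabulary size, embedding size, hidden size, and sequence length are illustrative assumptions, not values from the text.

```python
import torch
import torch.nn as nn

# Illustrative sizes (assumptions for this sketch, not from the article).
vocab_size = 50   # hypothetical vocabulary size
embed_dim = 16    # hypothetical embedding size
hidden_dim = 32   # hypothetical hidden state size

embedding = nn.Embedding(vocab_size, embed_dim)

# Vanilla RNN: a single hidden state carries all past context, and in practice
# it tends to be dominated by the most recent inputs.
vanilla_rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)

# LSTM: an additional cell state plus input/forget/output gates lets the
# network decide what to keep from earlier in the sequence (e.g. "India")
# and what to discard (e.g. the later move to the US).
lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

# Toy input: a batch of one sequence of 12 token ids.
tokens = torch.randint(0, vocab_size, (1, 12))
x = embedding(tokens)

rnn_out, rnn_hidden = vanilla_rnn(x)   # rnn_hidden: (1, 1, hidden_dim)
lstm_out, (h_n, c_n) = lstm(x)         # h_n, c_n: (1, 1, hidden_dim)

print(rnn_out.shape, lstm_out.shape)   # both: torch.Size([1, 12, 32])
```

Both layers produce an output for every time step, but the LSTM's gated cell state is what allows it to preserve the early "India" context across the later, unrelated tokens. How those gates work is covered next.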