Trump speech generation using Markov chains

Moving from Trump speech analysis (see previous post) to speech generation, in this post I investigate the possibilities of automatically generating text that could plausibly have been said by Trump. For the generation, I experimented with Markov chains. Markov chains, named after Andrey Markov, are mathematical models that describe sequences of possible events, in which the probability of each event depends only on the state reached in the previous event. They are used in several real-world applications such as autocomplete suggestions, speech recognition, text identification, path recognition, and many other artificial intelligence tools.

A Markov chain gives the probability of transitioning from one state to any other state. In text generation, Markov chains learn the probability of the next word or character given the previous one, two, or more words or characters. In this post, I will not delve into the details and mathematical formulation of Markov chains. For a visual description of how Markov chains work, see Victor Powell’s site.
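To make the idea concrete, here is a tiny, hypothetical illustration (not part of the actual experiment) of how such transition probabilities can be estimated in Python simply by counting which word follows which in a toy corpus:

```python
from collections import Counter, defaultdict

# Toy corpus for illustration only: count which word follows each word.
corpus = "we will make america great again and we will win and we can win".split()

transitions = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    transitions[current_word][next_word] += 1

# Estimated P(next word | "we") from the counts above.
counts = transitions["we"]
total = sum(counts.values())
for word, count in counts.items():
    print(f"P({word!r} | 'we') = {count / total:.2f}")
```

On this toy corpus the word “we” is followed by “will” two times out of three and by “can” once, so the model estimates probabilities of roughly 0.67 and 0.33 for those continuations.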

As this was my first practical experiment with Markov chains, I began with Python code written by Nevo [1], which I then slightly modified for the purpose of generating Trump-like speeches.

I started by creating a collection of Trump’s speeches. The collection included his inauguration speech and his first and final presidential debates, for a total of 16,667 words. I first trained a Markov chain model on unigrams (one word), then on bigrams (two words) and trigrams (three words), progressively increasing the n-gram order. The model was then given a random word or words to begin from, and it generated the rest. I limited the length of the generated text to 150 words.
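The sketch below shows the general shape of such an n-gram Markov generator. It is not Nevo’s actual implementation but a minimal version of the same idea; the file name trump_speeches.txt and the parameter values are assumptions made for illustration.

```python
import random
from collections import defaultdict

def build_model(words, n=3):
    """Map each n-gram (tuple of n words) to the list of words observed after it."""
    model = defaultdict(list)
    for i in range(len(words) - n):
        key = tuple(words[i:i + n])
        model[key].append(words[i + n])
    return model

def generate(model, n=3, length=150):
    """Start from a random n-gram and repeatedly sample the next word."""
    state = random.choice(list(model.keys()))
    output = list(state)
    for _ in range(length - n):
        followers = model.get(state)
        if not followers:          # dead end: no observed continuation
            break
        output.append(random.choice(followers))
        state = tuple(output[-n:])
    return " ".join(output)

# Hypothetical corpus file; the real collection combined the inauguration
# speech and two debate transcripts (about 16,667 words in total).
with open("trump_speeches.txt") as f:
    words = f.read().split()

model = build_model(words, n=3)
print(generate(model, n=3, length=150))
```

Changing the n parameter reproduces the unigram, bigram, and trigram experiments described above.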

Below are the results. What do you think, do the following texts sound like a speech made by Trump?

A: Text generated from model learned with unigrams

“I am a brilliant community, you tried to happen. And what do not show that. It’s an airplane on to that. I mean, I told them $1,500, and peaceful transfer of state when you right around election time, including a small group came out about young and that’s true throughout the true way you find this? Is it was so bad. First of our country. And I am going to Aleppo? OK, so many different reality exists: mothers and I should have taken the war in every year. You haven’t even worse, and I can’t do damage to be obsolete, because of the likes of them to have led our country, whether a routine audit. And I’ll go forward to myself, “I can’t bring into the clips where so much tougher and look — now, and other country, you would be a disaster.”

B: Text generated from model learned with bigrams

“We share one heart, one home, and one glorious destiny. The oath of allegiance to the manufacturing industry. You go to Mexico and other countries, while allowing for the presidency of the people she wants to make America great again. Thank you. Today’s ceremony, however, has very special meaning because today, we are not keeping up with the nations of the greatest assets anywhere in the world, beyond the United States got along well, that would help you, I would like to… It’s not an accurate one at all. It’s not an accurate one at all. It’s not nice. And I think I’ve ever seen in any government whatsoever. Wrong. CLINTON: … it is… Lester, how much? Typical politician. All talk, no action. Sounds good, doesn’t work. Never going to create new deals.”

C: Text generated from model learned with trigrams

“We, the citizens of America, are now joined in a great national effort to rebuild our country and our economy generally is — look, we owe $20 trillion. We cannot do it any longer, Lester. HOLT: Back to the question, though. How do you bring back — specifically bring back jobs, American manufacturers? How do you miss $6 billion? You ran the State Department, $6 billion was missing. How do you bring back — specifically bring back jobs, American manufacturers? How do you miss $6 billion? You ran the State Department, $6 billion was missing. How do you make them bring the jobs back? Well, the first thing you do is don’t let the jobs leave. The companies are leaving. I could name, I mean, there are thousands of them. They’re leaving, and they’re leaving in bigger numbers than ever.”

As can be seen, the text generated by the unigram-trained model (A) is nonsensical. With the bigram-trained model (B), the text starts getting a bit better, but still does not make much sense. With the trigram-trained model (C), the text begins to sound like something Trump would say.

In my experiments, trigrams turned out to be the highest usable order: beyond that, the generated text consisted of direct quotations from the training material, so the models were no longer really generating text. I also tried generating longer texts of more than 150 words, but the results were nonsensical. This is because Markov chain models have no memory beyond the chosen n-gram, so new sentences may drift onto topics unrelated to the preceding ones.
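A rough way to see why higher orders start quoting the training text verbatim is to measure, for each n-gram order, how many states have only a single observed continuation; once almost every state has exactly one follower, generation collapses into replaying the source. The snippet below is a sketch of such a check, assuming the build_model helper and the words list from the earlier example.

```python
# Fraction of states with exactly one distinct continuation, per n-gram order.
for n in (1, 2, 3, 4, 5):
    model = build_model(words, n=n)
    single = sum(1 for followers in model.values() if len(set(followers)) == 1)
    print(f"n={n}: {single / len(model):.0%} of states have only one continuation")
```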

Based on this experiment, I would say that Markov chains are not the best approach for generating speeches of considerable length, and future experiments will involve identifying better approaches. For short texts, like tweets, I have seen good results achieved by recurrent neural networks. As an example, the Trump Twitter bot @DeepDrumpf produces quite Trump-sounding tweets, for instance the following: “America has never been more harmed by the vote. I made a lot of money on that. I am doing big jobs in places, now everything is Benghazi.” Thus, my next experiment will most likely involve recurrent neural networks for generating longer texts.

 

Source:

[1] Omer Nevo (2016). Poetry in Python: Using Markov Chains to generate texts. https://www.youtube.com/watch?v=-51qWZdA8zM
