Pythonism

joy in simplicity – platonic world of code and math

A Markov Text Bot in 71 lines of Python

with 21 comments

A naive chatbot program. No parsing, no cleverness, just a training file and output.

It first trains itself on a text and then uses the data from that training to generate responses to the interlocutor’s input. The training process creates a dictionary where each key is a word and the value is a list of all the words that directly follow that word anywhere in the training text. If a word appears more than once in this list, it is proportionally more likely to be chosen by the bot. There is no need for explicit probabilities; a plain list with repeats does the job.
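
To make the "just do it with a list" point concrete, here is a minimal sketch on a toy sentence. The corpus and the names `words`, `follow` and `probabilities` are made up for illustration, and the punctuation rule used by the real trainer is left out for brevity.

```python
import random
from collections import Counter

# Toy corpus, a hypothetical stand-in for the real training text.
words = "the cat sat on the mat and the cat slept".split()

# Map each word to the list of words that directly follow it, keeping repeats.
follow = {}
for current, successor in zip(words, words[1:]):
    follow.setdefault(current, []).append(successor)

# 'the' is followed by 'cat' twice and 'mat' once, so 'cat' is listed twice.
print(follow["the"])

# Drawing uniformly from the list is the same as drawing by relative frequency.
counts = Counter(follow["the"])
probabilities = {w: n / len(follow["the"]) for w, n in counts.items()}
print(probabilities)
print(random.choice(follow["the"]))
```

Because 'cat' fills two of the three slots in the list, `random.choice` returns it about two times in three without any explicit weights.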

The bot chooses a random word from your input and generates a response by choosing a random word that has been seen as a successor to the word it holds. It then repeats the process with that word in turn, carrying on iteratively until it thinks it has said enough. It reaches that conclusion when it picks a word that ends in a punctuation mark (comma, question mark, exclamation mark or full stop) in the training text. It then returns to input mode to let you respond, and so on.
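
The walk described above can be sketched as follows. This is an illustration under assumptions, not the bot itself: the tiny `lexicon` is hand-made, and the `max_words` cap and the handling of blank input are additions so that the sketch always terminates.

```python
import random

# Hypothetical hand-made lexicon standing in for the pickled training data.
lexicon = {
    "hello": ["there", "world."],
    "there": ["is", "friend,"],
    "is": ["hope."],
}

def next_word(word, lexicon, default="the"):
    """Pick a random recorded successor, or fall back to a default word."""
    successors = lexicon.get(word)
    return random.choice(successors) if successors else default

def respond(user_input, lexicon, max_words=30):
    """Walk the chain from a random input word until a word ends in punctuation."""
    words = user_input.split()
    current = random.choice(words) if words else "the"
    reply = []
    for _ in range(max_words):  # cap added in this sketch so the walk always ends
        current = next_word(current, lexicon)
        reply.append(current)
        if current[-1] in ",?!.":
            break
    return " ".join(reply)

print(respond("hello there", lexicon))
```

With this lexicon, an input of "hello" can only produce "world.", "there friend," or "there is hope.", which shows how the punctuation mark ends each reply.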

It isn’t very realistic, but I hereby challenge anyone to do better in 71 lines of code! This is a great challenge for any budding Pythonists, and I just wish I could open it to a wider audience than the small number of visitors I get to this blog. A bot that is always guaranteed to be grammatical would surely take closer to several hundred lines. I simplified hugely by looking for the simplest rule that gives the computer a mere stab at having something to say.

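Its responses are rather impressionistic, to say the least! An earlier version read input with input(), so you had to put what you say in single quotes (as in the sample output below); with raw_input(), as in the code here, the quotes are not needed.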

I used War and Peace for my “corpus”, which took a couple of hours for the training run. Use a shorter file if you are impatient…

here is the trainer

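#lukebot-trainer.py
import pickle
# read the corpus into a flat list of whitespace-separated words
b=open('war&peace.txt')
text=[]
for line in b:
    for word in line.split():
        text.append(word)
b.close()
textset=list(set(text))
# follow maps each distinct word to the list of words seen directly after it
follow={}
for l in range(len(textset)):
    working=[]
    check=textset[l]
    # scan the whole text for this word (slow: one full pass per distinct word)
    for w in range(len(text)-1):
        # words ending in punctuation close a phrase, so they get no successors
        if check==text[w] and text[w][-1] not in '(),.?!':
            working.append(str(text[w+1]))
    follow[check]=working
# save the lexicon for the bot (pickle protocol 2)
a=open('lexicon-luke','wb')
pickle.dump(follow,a,2)
a.close()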
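
The long training run comes from rescanning the whole text once per distinct word. As a hedged sketch (written for Python 3, with the hypothetical helper names `train_nested` and `train_single_pass` and a made-up sample sentence), the same dictionary can be built by visiting each adjacent pair of words once:

```python
from collections import defaultdict

STOPS = "(),.?!"  # same punctuation set the trainer checks

def train_nested(words):
    """One full scan of the text per distinct word, as in the trainer above."""
    follow = {}
    for check in set(words):
        follow[check] = [
            words[i + 1]
            for i in range(len(words) - 1)
            if words[i] == check and check[-1] not in STOPS
        ]
    return follow

def train_single_pass(words):
    """Visit each adjacent pair once and append to the right list directly."""
    follow = defaultdict(list)
    for current, successor in zip(words, words[1:]):
        if current[-1] not in STOPS:
            follow[current].append(successor)
    # Words that end in punctuation, or only end the text, get an empty list,
    # matching the nested version.
    for word in words:
        follow[word]  # touching a defaultdict key creates it
    return dict(follow)

sample = "the cat sat. the cat ran, and the dog sat.".split()
print(train_nested(sample) == train_single_pass(sample))
```

Both functions keep successors in text order, so the resulting dictionaries compare equal; the single-pass version does work proportional to the length of the text rather than to the length times the vocabulary size.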

here is the bot

#lukebot.py
import pickle,random
a=open('lexicon-luke','rb')
successorlist=pickle.load(a)
a.close()
def nextword(a):
    if successorlist.get(a):  # word is known and has at least one successor
        return random.choice(successorlist[a])
    else:
        return 'the'
speech=''
while speech!='quit':
    speech=raw_input('>')
    s=random.choice(speech.split() or ['the'])  # blank input falls back to 'the'
    response=''
    while True:
        neword=nextword(s)
        response+=' '+neword
        s=neword
        if neword[-1] in ',?!.':
            break
    print response

You tend to get an uncanny feeling when it says something that seems partially to make sense. I think this is a universal syndrome with chatbots and shows us a lot about our expectations and hopes for machine intelligence. The tendency to attribute some kind of surreal cosmic wisdom or irony to the responses is common.

Sample output (is this like talking to Tolstoy? Not really, but still…):


>>>
>'hello'
the human affairs he thought.
>'oh those affairs were great'
God,
>'what -surprised?'
do you value but she is what had the new country,
>'i do value but i do not blame the war on her'
so alarming them with those around with particular chair by all those who caressed me.
>'i never get anyone caressing me!'
allowed to talk about in a really so.
>'oh do not be prudish I am allowed to mention it'
should she suddenly heard of a duty for reason was handed him very hot.
>'you like these women a lot to talk about them so much'
advantage if your father?
>'my father approves of my romantic side'
love my son's having obeyed in the causes historic events that circle they didn't you know a force producing two Emperors with a certain prince.
>'oh now it is princes is it'
God,
>'do you believe in Him?'
the insistence of incongruities giving you married at his squadron--that is,
>' i don't understand you sir'
>'i do not understand you sir'
the bridge which,
>'i crossed'
himself,
>'ok ok... well 71 lines still is not bad'
the work at Smolensk?
>'yes'
or that of War.
>'no, war'
was gradually that millions of married in which is Dolokhov?

Written by Trip Technician

April 18, 2010 at 6:57 pm

21 Responses

  1. [...] July 8, 2010 by Trip Technician I am still working on ideas for enhancing my chatbot. One of the ideas I have had is that when the bot finds a word it doesn’t know it [...]

  2. Cool program man !
    I like the “simple” approach. It also has the advantage of being language-independent, as you can give it a text in any language.
    Of course, some improvements could make the answers a little more sensible, but to me it is a good approach: generic and simple.
    I read your 2 other posts on this bot, and I think you are right that context is of primary importance for getting more intelligent answers…
    I’ll work on this idea. Thanks for this nice post man !

    Lorico

    October 27, 2010 at 11:41 am

    • Hello again,

      I played with your code, and found it definitely funny: you never get the same answer.
      However, as you make a random choice at each word, you kind of “jump” out of the sentence context each time.
      So I thought it could be fun to try to keep at least the complete sentence itself, to keep the context (at least at the level of one sentence).
      So I slightly changed your trainer to do so:
      #lukebot-trainer.py
      import pickle
      b=open('sample_text.txt')
      text={}
      all = b.read()

      for line in all.split('.'):
          l = line.split()

          for word in l:
              if word in text:
                  t = text[word]
                  t.append(line)
                  text[word] = t
              else:
                  text[word] = [line]

      b.close()

      a=open('lexicon-dict2','wb')
      pickle.dump(text,a,2)
      a.close()
      print "done"

      In the other file, I only changed the way the input is read with raw_input (to avoid always having to put a ‘ around the words you type), and I also added a counter i to make sure we don’t end up in an endless loop. If you keep your version I think it works as well:
      #lukebot.py
      import pickle,random
      a=open('lexicon-dict2','rb')
      successorlist=pickle.load(a)
      a.close()

      def nextword(a):
          if a in successorlist:
              return random.choice(successorlist[a])
          else:
              return ' '
      speech=''

      while speech!='quit':
          speech=raw_input('> ')
          s=random.choice(speech.split())
          response=''
          i = 0
          while True:
              i = i + 1
              neword=nextword(s)
              if neword == s:
                  i=11
              response+=' '+neword
              s=neword
              if neword[-1] in ',?!.' or i > 10:
                  break
          print response

      This does not really improve on your version/idea, but I found it could give a better impression when using it (though it is less fun, of course, and the output sentences for a given word vary much less).

      Thanks again !

      Lorico

      October 27, 2010 at 6:33 pm

      • Great to find it aroused interest, Lorico! When I’ve got the best chatbot with all the new features I am talking about in other posts, I’ll share. I also wondered whether Markov text techniques could be used for automatic translation… Substitute a handful of looked-up words for each word in the input to be translated, then use Markov to select the most likely choices of words and the best word order. Just an idea.

        Trip Technician

        November 25, 2010 at 10:38 am

  3. Hey! Thanks a lot for sharing this. I am making really good use of an edited version of it in a chatroom bot and the results are utterly hilarious! Try feeding it a Shakespeare piece :D

    Blazer

    November 7, 2010 at 8:24 pm

    • thanks Blazer happy to find interest. Post again if you have improvements…

      Trip Technician

      November 25, 2010 at 10:38 am

  4. Is there some reason you’re looping over all of the text for each of the words?

    This code seems to accomplish the same thing and go MUCH faster:

    
    for w in range(len(text)-1):
        check=text[w]
        next_word=text[w+1]
        if check[-1] not in '(),.?!':
            if check in follow:
                follow[check].append(next_word)
            else:
                follow[check]=[next_word]
    
    

    L Zoel.

    January 17, 2011 at 5:03 am

  5. I’ve tried to run your code a few times using different words and keep getting the same error…

    Traceback (most recent call last):
      File "lukebot.py", line 13, in <module>
        speech=input('>')
      File "<string>", line 1, in <module>
    NameError: name 'hello' is not defined

    JakeJAMR

    April 5, 2011 at 2:01 pm

    • I think the training data you used does not contain the word ‘hello’.

      Trip Technician

      April 5, 2011 at 4:01 pm

    • you should replace the ‘input()’ function with raw_input(), then it should work. I had the same problem :|

      giodamelio

      June 9, 2011 at 3:43 am

  6. Thanks giodamelio! I’ll dig up the folder that I gave up on and try your fix. If it works… maybe they should change the code on this page to reflect that fix???

    JakeJAMR

    July 9, 2011 at 7:52 pm

  7. This looks really nice, and I look forward to enjoying it.

    The lukebot-trainer.py ran just fine, and created this file: lexicon-luke

    Then lukebot.py would not run, due to this error:

    Traceback (most recent call last):
      File "lukebot.py", line 3, in <module>
        a=open('lexicon-dict','rb')
    IOError: [Errno 2] No such file or directory: 'lexicon-dict'

    I am not familiar with this file. May I kindly ask about the file: lexicon-dict ?

    Thanks in advance!

    8pla.net

    September 6, 2011 at 8:02 pm

    • Hi,

      Sorry, there seems to have been an error: the file lexicon-dict should be lexicon-luke. I’ve changed the code of the output program so you can copy and paste the new version! Thanks for your interest.

      Trip Technician

      September 7, 2011 at 8:18 am

  8. uhmm whats the meaning of lexicon-luke

    krit

    July 12, 2012 at 8:23 am

  9. lexicon-luke is a pickled dictionary of word -> successor word mappings. It constitutes the data whereby sentences are generated. The bot takes a word for input, chooses a random successor word, adds it to the output string, then takes that successor and chooses another successor to that etc until a punctuation mark is reached.

    Trip Technician

    July 12, 2012 at 8:37 am

  10. working really great gonna study more about this

    Trip Technician – “I am still working on ideas for enhancing my chatbot. One of the ideas I have had is that when the bot finds a word it doesn’t know it “

    krit

    July 12, 2012 at 9:00 am

  11. the key idea is that of a “Markov Process”. This is a way of generating a sequence of words where each word is chosen based only on the previous one. If you are interested in chatbots then here is a good link to look at:

    http://pyaiml.sourceforge.net/

    AIML (Artificial Intelligence Markup Language) is the technology behind ALICE, which was a well-known bot. Also take a look around my blog; there are other bits about NLP (Natural Language Processing) with Python.

    Trip Technician

    July 12, 2012 at 9:15 am

  12. Hi there! I am new to Python. What is the extension of both files? Please. Thanks in advance.

    Danilo

    August 2, 2012 at 3:16 am

    • the files are made without extension, which is more common for text files on Linux. A file doesn’t have to have an extension…You can give the files extensions if you like by changing the instances of the filenames in the code. Some windows people might like to call them something like “lexicon-luke.dat”, dat being a general extension used for data, but it’s not essential.

      Trip Technician

      August 2, 2012 at 9:48 am

  13. […] For this I modified a very simple chatbot that I found here: https://pythonism.wordpress.com/2010/04/18/a-simple-chatbot-in-python/. […]

