Pythonism

code and the oracular

Pocketsphinx Voice Recognition with Python

with 9 comments

I installed pocketsphinx and the corresponding Python module with:

sudo apt-get install python-pocketsphinx pocketsphinx-hmm-wsj1 pocketsphinx-lm-wsj

and then downloaded PyAudio from http://people.csail.mit.edu/hubert/pyaudio/#downloads

Pocketsphinx needs a 16-bit mono WAV file sampled at 16 kHz; as you can see, I set this in the code.
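A quick way to confirm that a recording really has this format is to read its header back with Python's standard wave module. The sketch below is a minimal, hypothetical helper (check_wav_format is not part of pocketsphinx or PyAudio). It also makes the difference between sample rate and bit rate concrete: 16,000 samples per second × 16 bits × 1 channel works out to 256,000 bits per second.

```python
import wave

# Hypothetical helper: report whether a WAV file matches the 16-bit, mono,
# 16 kHz format pocketsphinx expects, plus the bit rate implied by its header.
def check_wav_format(path):
    wf = wave.open(path, 'rb')
    try:
        channels = wf.getnchannels()
        sampwidth = wf.getsampwidth()    # bytes per sample; 2 means 16-bit
        rate = wf.getframerate()         # sample rate in Hz
    finally:
        wf.close()
    bitrate = rate * sampwidth * 8 * channels   # bits per second
    matches = channels == 1 and sampwidth == 2 and rate == 16000
    return matches, bitrate
```

For one of the files the script below writes, such as o0.wav, this should return (True, 256000).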

This code lets you record a bit of speech and then reads it back to you, just to test the idea. It could be the beginning of a very useful speech recognition system, but for me the accuracy wasn't that good. If you want to train Sphinx to your voice, that means creating your own acoustic model, which takes some time and is detailed here: http://cmusphinx.sourceforge.net/wiki/tutorialam. I may do this later. The real dream is to have a talking chatbot.

import subprocess
import wave

import pyaudio

# Model paths installed by the pocketsphinx-hmm-wsj1 and pocketsphinx-lm-wsj packages
hmdir = "/usr/share/pocketsphinx/model/hmm/wsj1"
lmd   = "/usr/share/pocketsphinx/model/lm/wsj/wlist5o.3e-7.vp.tg.lm.DMP"
dictd = "/usr/share/pocketsphinx/model/lm/wsj/wlist5o.dic"

def decodeSpeech(hmmd, lmdir, dictp, wavfile):
    """Decode a 16 kHz, 16-bit mono WAV file and return the recognised text."""
    import pocketsphinx as ps
    import sphinxbase

    speechRec = ps.Decoder(hmm=hmmd, lm=lmdir, dict=dictp)
    with open(wavfile, 'rb') as wavFile:  # open() works in Python 2 and 3; file() is Python 2 only
        wavFile.seek(44)                  # skip the 44-byte header written by the wave module
        speechRec.decode_raw(wavFile)     # decode the remaining raw PCM samples
    result = speechRec.get_hyp()          # first element is the hypothesis text
    return result[0]

CHUNK = 1024                # frames read from the microphone per call
FORMAT = pyaudio.paInt16    # 16-bit samples
CHANNELS = 1                # mono
RATE = 16000                # 16 kHz sample rate
RECORD_SECONDS = 10

for x in range(10):
    fn = "o" + str(x) + ".wav"
    p = pyaudio.PyAudio()
    stream = p.open(format=FORMAT, channels=CHANNELS, rate=RATE, input=True, frames_per_buffer=CHUNK)
    print("* recording")
    frames = []
    for i in range(0, int(RATE * RECORD_SECONDS / CHUNK)):
        data = stream.read(CHUNK)
        frames.append(data)
    print("* done recording")
    stream.stop_stream()
    stream.close()
    p.terminate()
    wf = wave.open(fn, 'wb')                          # save the recording as a WAV file
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(pyaudio.get_sample_size(FORMAT))  # 2 bytes per sample
    wf.setframerate(RATE)
    wf.writeframes(b''.join(frames))
    wf.close()
    recognised = decodeSpeech(hmdir, lmd, dictd, fn)
    print(recognised)
    if recognised:
        # Read the text back aloud; a list argument avoids shell-quoting problems
        subprocess.call(['espeak', recognised])
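The decodeSpeech function skips a fixed 44 bytes, which is the header size Python's wave module writes, but WAV files from other tools can carry extra chunks (such as LIST metadata) before the audio. As a more defensive sketch, a hypothetical pcm_offset helper can walk the RIFF chunks and return the byte position where the data chunk's samples start:

```python
import struct

# Hypothetical helper: find where the PCM samples start in a RIFF/WAVE file,
# so the file can be seeked there instead of assuming a 44-byte header.
def pcm_offset(path):
    with open(path, 'rb') as f:
        riff, _size, wave_id = struct.unpack('<4sI4s', f.read(12))
        if riff != b'RIFF' or wave_id != b'WAVE':
            raise ValueError('not a RIFF/WAVE file')
        while True:
            header = f.read(8)
            if len(header) < 8:
                raise ValueError('no data chunk found')
            chunk_id, chunk_size = struct.unpack('<4sI', header)
            if chunk_id == b'data':
                return f.tell()  # samples begin right after this chunk header
            # Skip this chunk; RIFF chunk bodies are padded to an even length
            f.seek(chunk_size + (chunk_size & 1), 1)
```

You could then call wavFile.seek(pcm_offset(wavfile)) in place of wavFile.seek(44); wavFile is still an ordinary file object, so it is handed to decode_raw exactly as before.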

For a screencast example of using pocketsphinx for a talking chatbot, see here.


Written by Luke Dunn

June 6, 2013 at 9:12 am

9 Responses


  1. How was the output? You should consider putting up a video of the output! 🙂

    Arkapravo Bhaumik

    June 11, 2013 at 12:30 pm

    • This is really only a starter, because anyone who pastes this code may find they want to tweak it a lot before it performs well, and frankly I hope someone will, because I was disappointed by the accuracy. It was certainly a lot worse than the dictation app on my Android phone.

      Maybe my first step would be a better mic, but I wasn’t sure whether I was missing something, since PyAudio didn’t seem to want to set the sample rate, only the bit rate, and there may be something hooky going on between those two.

      Trip Technician

      October 10, 2013 at 1:49 pm

  2. Thank you very much for the code. I had been looking for something like that for a long time.

    Unfortunately, Python gives me an “invalid syntax” error on line 8, “import pocketsphinx as ps”, and I don’t understand why. I hope you can help me.

    J. Knopp

    August 3, 2014 at 8:54 pm

  3. I ran it on Anaconda with Python 3.6.1. It creates a WAV file in the same folder but doesn’t read it back or print the speech as text in the shell.

    BHAVYA SHETH

    August 3, 2017 at 6:34 am

    • I developed this using Python 2.7. Python 3+ makes “print” a function, not a statement. Try changing line 49 to

      print(recognised)

      If that doesn’t work, try debugging by getting the script to print out some variables so you can identify where the bug is. Step through each stage and make sure the code is doing what it should. If you are still stuck, run it with Python 2.7 and see if that fixes it. I can’t say much more than that without being there…

      Cheers 🙂

      Luke Dunn

      August 3, 2017 at 8:06 am

    • I tried it in Python 2.7, but now I am getting an error on the speechRec.decode_raw(wavefile) line: it says that the decoder object has no attribute named decode_raw.

      BHAVYA SHETH

      August 3, 2017 at 10:57 am

      • You are in a tricky situation, because I think the version of pocketsphinx you are using no longer supports the decode_raw method; it has been replaced by a different method. The best I can suggest is to import pocketsphinx in the interpreter and look at the bound methods and objects using

        dir(ps)

        or

        dir(speechRec)

        then try to find some documentation for the different functions to see what your version of Sphinx offers for decoding a recording. If you can’t find any online, you may have to read the docstrings and experiment.

        Luke Dunn

        August 3, 2017 at 11:25 am
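Following that advice, here is a sketch of what a decode step might look like on newer bindings. It rests on an assumption you should check with dir(speechRec) first: that decode_raw has been replaced by a start_utt / process_raw / end_utt sequence plus a hyp() method whose result carries the text in hypstr, which is how I understand the later pocketsphinx Python bindings to work. decode_pcm and its chunk size are hypothetical, and the audio is passed as raw 16-bit PCM bytes rather than as a file.

```python
# Hypothetical helper: decode raw 16-bit, 16 kHz mono PCM bytes, assuming the
# installed Decoder exposes start_utt / process_raw / end_utt / hyp().
def decode_pcm(decoder, pcm, chunk_bytes=4096):
    # List decoding-related attributes, the same idea as inspecting dir(speechRec)
    available = [name for name in dir(decoder)
                 if 'decode' in name or 'process' in name or 'utt' in name]
    if 'process_raw' not in available:
        raise AttributeError('no process_raw; available: %s' % available)
    decoder.start_utt()
    # Feed the audio in chunks; the two False flags are no_search and full_utt
    for start in range(0, len(pcm), chunk_bytes):
        decoder.process_raw(pcm[start:start + chunk_bytes], False, False)
    decoder.end_utt()
    hyp = decoder.hyp()  # assumed to be None when nothing was recognised
    return hyp.hypstr if hyp is not None else None
```

The PCM bytes could come from wf.readframes(wf.getnframes()) on a file opened with wave.open(fn, 'rb'), which avoids any hard-coded header offset. If process_raw is missing, the error message lists the decoding-related methods your version does offer.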
