Mining Project Gutenberg and using Graphviz to display word data

I downloaded the Project Gutenberg DVD from here:

I mounted the ISO and copied the files across to a folder, preserving structure.

I used this code to unpack the zip archives, ~32,000 in all, into a single flat folder to make an easily usable corpus.
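A minimal sketch of that unpacking step in Python, not necessarily the script I ran at the time. The function name `flatten_zips` and the folder names in the usage note are assumptions for illustration. It walks the copied folder tree, opens every zip archive it finds, and writes each member into one flat output folder, dropping any internal directory paths.

```python
import shutil
import zipfile
from pathlib import Path


def flatten_zips(source_root, dest_dir):
    """Extract every file from every zip under source_root into dest_dir.

    Folder structure inside and outside the archives is discarded, so
    dest_dir ends up as one flat folder. Returns the number of files written.
    (Hypothetical helper: the name and behaviour are assumptions.)
    """
    source_root = Path(source_root)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    # Match the extension case-insensitively in case some archives are .ZIP
    zip_paths = sorted(
        p for p in source_root.rglob("*")
        if p.is_file() and p.suffix.lower() == ".zip"
    )

    extracted = 0
    for zip_path in zip_paths:
        try:
            with zipfile.ZipFile(zip_path) as archive:
                for member in archive.infolist():
                    if member.is_dir():
                        continue
                    # Keep only the file name, not the path inside the archive
                    name = Path(member.filename).name
                    target = dest_dir / name
                    if target.exists():
                        # On a name clash, prefix with the archive's name;
                        # a second clash on the prefixed name would overwrite
                        target = dest_dir / f"{zip_path.stem}_{name}"
                    with archive.open(member) as src, open(target, "wb") as dst:
                        shutil.copyfileobj(src, dst)
                    extracted += 1
        except zipfile.BadZipFile:
            # Leave damaged archives alone rather than abort the whole run
            print(f"skipping unreadable archive: {zip_path}")
    return extracted
```

Calling something like `flatten_zips("gutenberg_dvd", "corpus")`, with both folder names being placeholders, would then leave all the texts side by side in `corpus`. Flattening loses the original folder layout, which is fine here because the goal is just a pile of text files to iterate over.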
Written by Luke Dunn

February 28, 2014 at 9:05 pm
