Pythonism

code and the oracular

Archive for the ‘data visualisation’ Category

Mining Project Gutenberg and using Graphviz to display word data


I downloaded the Project Gutenberg DVD from here: http://www.gutenberg.org/wiki/Gutenberg:The_CD_and_DVD_Project

I mounted the ISO and copied the files across to a folder, preserving structure.

I used this code to unpack the zip archives, ~32,000 in all, into a single flat folder to make an easily usable corpus.
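A minimal sketch of that unpacking step, not the original listing. It assumes the copied DVD tree contains ordinary `.zip` files at arbitrary depth, and the function name `flatten_zips` and its collision-renaming scheme are made up for illustration.

```python
import zipfile
from pathlib import Path


def flatten_zips(source_root, dest_dir):
    """Extract every file from every .zip under source_root into dest_dir.

    Directory structure, both on disk and inside the archives, is discarded
    so that all texts end up side by side in one folder.
    Returns the number of files written.
    """
    source_root = Path(source_root)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    written = 0
    for zip_path in sorted(source_root.rglob("*.zip")):
        try:
            with zipfile.ZipFile(zip_path) as archive:
                for member in archive.infolist():
                    if member.is_dir():
                        continue
                    # Keep only the final path component of the member name.
                    name = Path(member.filename).name
                    if not name:
                        continue
                    target = dest_dir / name
                    # Hypothetical collision rule: prefix with the archive
                    # name and a counter instead of overwriting.
                    n = 1
                    while target.exists():
                        target = dest_dir / f"{zip_path.stem}_{n}_{name}"
                        n += 1
                    target.write_bytes(archive.read(member))
                    written += 1
        except zipfile.BadZipFile:
            # A corrupt archive should not stop the whole run.
            print(f"skipping unreadable archive: {zip_path}")
    return written
```

Calling something like `flatten_zips("gutenberg_dvd", "corpus")` (both folder names are placeholders) would walk the copied tree and fill `corpus` with the extracted files. Reading each member fully into memory is fine for book-sized texts, though very large files would be better streamed.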


Written by Luke Dunn

February 28, 2014 at 9:05 pm

Natural Language Generation


Written by Luke Dunn

August 14, 2012 at 2:02 pm