Pythonism

code and the oracular

Simple Python Project: Markov Text


consider this sentence

“the cat sat on the mat”

we can see the following about it

the word “the” is followed by “cat” and “mat”
the word “cat” is followed by “sat”
the word “sat” is followed by “on”
the word “on” is followed by “the”

so from this sentence we can construct a dictionary like this

catsat = {"the":["cat","mat"],
          "cat":["sat",],
          "sat":["on",],
          "on":["the",],
          "mat":[]}

this dictionary summarises some of the characteristics of our sentence. Not all of them by a long chalk, but it does record which words can follow which, so it holds some information about the order of the words.
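As a minimal sketch of how the same dictionary could be derived mechanically rather than by hand, we can pair each word with the one after it. The name `successors` is only for illustration and is not used elsewhere in this post.

```python
sentence = "the cat sat on the mat"
words = sentence.split(' ')

# start every word with an empty list of successors
successors = {word: [] for word in words}

# pair each word with the word that follows it
for current, following in zip(words, words[1:]):
    successors[current].append(following)

print(successors)
```

The last word, "mat", never appears on the left of a pair, so its list stays empty.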


if you were given only the dictionary, not the original text, you could try to reconstruct the text. But what do you do when a word has more than one possible successor? You could take the first in the list, or the second, or you could pick one at random. With any of these you lose the certainty of a perfect reconstruction, but it is still worth a try. Let’s try the random method, starting from the seed word “cat”, and see what we get.

cat -> sat -> on -> the -> cat -> sat -> on -> the -> mat

how do you know when to end the sequence? Well, “mat” has no successors, so we can perhaps agree that if we land on a terminal word like “mat”, that word is the last term.
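The walk described above can be sketched roughly like this. The function name `walk` and the `max_words` cap are assumptions added for the example: because “the” can lead back to “cat”, the walk could in principle loop for a long time, so the cap is just a safety net.

```python
import random

catsat = {"the": ["cat", "mat"],
          "cat": ["sat"],
          "sat": ["on"],
          "on": ["the"],
          "mat": []}

def walk(chain, seed_word, max_words=50):
    """Follow random successors from seed_word until a terminal word or max_words."""
    sequence = [seed_word]
    # keep going while the current word has at least one successor
    while len(sequence) < max_words and chain[sequence[-1]]:
        sequence.append(random.choice(chain[sequence[-1]]))
    return sequence

print(" -> ".join(walk(catsat, "cat")))
```

Each run can print a different sequence, but every step in it is a pair that occurs in the original sentence.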

so what’s the point?

well, it can get more interesting if you use a larger text sample. Here’s some code that will do our trick quite fast on a text of almost any length:

a=""" some text here"""

to make our dictionary we can iterate through the text as a list of words, formed by splitting on spaces

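b=a.split(' ')

# start every distinct word with an empty list of successors
markov = {}
for word in set(b):
    markov[word] =[]

# record, for each word, the word that follows it in the text
for z in range(len(b)-1):
    markov[b[z]].append(b[z+1])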
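One thing worth noticing about this construction: a successor that occurs several times in the text is appended several times, so a later random pick is weighted by how often each pair occurred. A small sketch on a short sample sentence (the variable names `sample` and `words` are illustrative only):

```python
from collections import Counter

sample = "it was the best of times, it was the worst of times,"
words = sample.split(' ')

markov = {word: [] for word in set(words)}
for z in range(len(words) - 1):
    markov[words[z]].append(words[z + 1])

# duplicates are kept, so common successors are more likely to be chosen
print(markov["the"])          # successors of "the", in order of appearance
print(Counter(markov["of"]))  # how often each word follows "of"
```

Because punctuation is not stripped, "times," with its comma counts as a word in its own right.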

then to generate a random text 100 words long that “feels” a little like the original we can do

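import random

seed_word = random.choice(b)

for count in range(100):
    successors = markov[seed_word]
    if not successors:  # terminal word: nothing follows it, so stop
        break
    next_word = random.choice(successors)
    print(next_word, end=" ")
    seed_word = next_word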
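The two steps can also be wrapped into one function so that different texts can be fed in. This is only a sketch: the function name `markov_text` and its `length` and `rng` parameters are not from the code above, and it stops early if it reaches a word that has no successors.

```python
import random

def markov_text(text, length=100, rng=random):
    """Return up to `length` words generated from the successor table of `text`."""
    words = text.split(' ')
    markov = {word: [] for word in set(words)}
    for z in range(len(words) - 1):
        markov[words[z]].append(words[z + 1])

    seed_word = rng.choice(words)
    output = []
    for count in range(length):
        if not markov[seed_word]:  # dead end: nothing ever follows this word
            break
        seed_word = rng.choice(markov[seed_word])
        output.append(seed_word)
    return ' '.join(output)

print(markov_text("the cat sat on the mat", length=10))
```

Passing `rng=random.Random(0)` (or any other seed) makes a run repeatable, which is handy when you want to keep a result you liked.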

this can be funny too. Here I apply the process to the following text, the first few paragraphs of Dickens’ “A Tale of Two Cities”.

a="""It was the best of times,
it was the worst of times,
it was the age of wisdom,
it was the age of foolishness,
it was the epoch of belief,
it was the epoch of incredulity,
it was the season of Light,
it was the season of Darkness,
it was the spring of hope,
it was the winter of despair,
we had everything before us,
we had nothing before us,
we were all going direct to Heaven,
we were all going direct the other way–
in short, the period was so far like the present period, that some of
its noisiest authorities insisted on its being received, for good or for
evil, in the superlative degree of comparison only.

There were a king with a large jaw and a queen with a plain face, on the
throne of England; there were a king with a large jaw and a queen with
a fair face, on the throne of France. In both countries it was clearer
than crystal to the lords of the State preserves of loaves and fishes,
that things in general were settled for ever.

It was the year of Our Lord one thousand seven hundred and seventy-five.
Spiritual revelations were conceded to England at that favoured period,
as at this. Mrs. Southcott had recently attained her five-and-twentieth
blessed birthday, of whom a prophetic private in the Life Guards had
heralded the sublime appearance by announcing that arrangements were
made for the swallowing up of London and Westminster. Even the Cock-lane
ghost had been laid only a round dozen of years, after rapping out its
messages, as the spirits of this very year last past (supernaturally
deficient in originality) rapped out theirs. Mere messages in the
earthly order of events had lately come to the English Crown and People,
from a congress of British subjects in America: which, strange
to relate, have proved more important to the human race than any
communications yet received through any of the chickens of the Cock-lane
brood.

France, less favoured on the whole as to matters spiritual than her
sister of the shield and trident, rolled with exceeding smoothness down
hill, making paper money and spending it. Under the guidance of her
Christian pastors, she entertained herself, besides, with such humane
achievements as sentencing a youth to have his hands cut off, his tongue
torn out with pincers, and his body burned alive, because he had not
kneeled down in the rain to do honour to a dirty procession of monks
which passed within his view, at a distance of some fifty or sixty
yards. It is likely enough that, rooted in the woods of France and
Norway, there were growing trees, when that sufferer was put to death,
already marked by the Woodman, Fate, to come down and be sawn into
boards, to make a certain movable framework with a sack and a knife in
it, terrible in history. It is likely enough that in the rough outhouses
of some tillers of the heavy lands adjacent to Paris, there were
sheltered from the weather that very day, rude carts, bespattered with
rustic mire, snuffed about by pigs, and roosted in by poultry, which
the Farmer, Death, had already set apart to be his tumbrils of
the Revolution. But that Woodman and that Farmer, though they work
unceasingly, work silently, and no one heard them as they went about
with muffled tread: the rather, forasmuch as to entertain any suspicion
that they were awake, was to be atheistical and traitorous."""

and we get this

-------- output --------

>>>
the Life Guards had
heralded the Cock-lane
brood.

France, less favoured on the
throne of comparison only.

There were all going direct to matters spiritual than any
communications yet received through any suspicion
that they work
unceasingly, work silently, and fishes,
that things in the woods of events had not
kneeled down in the Woodman, Fate, to do honour to matters spiritual than her
sister of incredulity,
it was the human race than her
sister of years, after rapping out its
messages, as they were awake, was the rough outhouses
of some of
its noisiest authorities insisted on the English Crown and Westminster. Even the epoch of her
Christian pastors, she entertained herself, besides, with a dirty
>>>

Cool! Kind of weird, but cool…

The most fun thing I have found to do with this is to use it as a writing aid. If you have a diary, notes or any creative or factual work you have written, you can feed it to this code and it will create a kind of “parody”.


The writer William Seward Burroughs popularised a technique he called “the cut-up technique”, where he cut written pages into strips of words, jumbled them, and re-assembled them. Isn’t it amazing that our process is close to a computerised version of this! The difference is that ours re-orders text word by word, according to which word followed which, rather than shuffling whole strips. Burroughs felt that the technique can bring up new ideas and tropes, perhaps from the unconscious, although really the randomisation is simply chance based. But the way the mind interprets is not random, so you are feeding yourself text that may help you see in a different way, or even just get some new phrase combinations.
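For comparison, a literal cut-up can be sketched in a few lines: chop the text into strips of a few words, shuffle the strips, and join them back up. The strip length of four words and the function name `cut_up` are arbitrary choices made for this illustration, not anything Burroughs specified.

```python
import random

def cut_up(text, strip_length=4, rng=random):
    """Cut text into strips of strip_length words, shuffle them, and rejoin."""
    words = text.split()
    # slice the word list into consecutive strips
    strips = [words[i:i + strip_length] for i in range(0, len(words), strip_length)]
    rng.shuffle(strips)
    return ' '.join(' '.join(strip) for strip in strips)

print(cut_up("it was the best of times, it was the worst of times,"))
```

Unlike the Markov version, a cut-up uses every word of the source exactly once; only the order of the strips changes.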

for lots of ebooks to use, try Project Gutenberg: https://www.gutenberg.org/


Written by Luke Dunn

December 31, 2015 at 9:09 am

2 Responses


  1. That’s almost the way that ECTOR, the learning chatterbot, works: https://github.com/parmentf/pyector

    François Parmentier

    January 4, 2016 at 5:15 pm

    • Great! Thanks for that… random text has all sorts of interesting qualities…

      Luke Dunn

      January 4, 2016 at 5:18 pm

