July 8, 2012

The Shakespeare Generator


A fool thinks himself to be wise, but a wise man knows himself to be a fool.

I didn't write that. Shakespeare did. Wish I had though.

For those of us with an ounce less talent but a will to cut corners, I decided to make a Shakespeare generator.

So what is the Shakespeare algorithm?

I don't know, but I can show you my take on the subject.

The easiest way would of course be to just randomize the words. In the example above you could get stuff like:
"A think a to himself."
Or
"be knows a to a"

Definitely words from the Shakespeare vocabulary, but still crappy. You will not find any memorable quotes there unless you randomize a lot. A lot like in this.
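For the curious, here is a minimal Python sketch of that naive approach. It assumes the words are simply drawn at random, with repetition, from the quote above. The name random_words is made up for illustration.

```python
import random

QUOTE = "A fool thinks himself to be wise, but a wise man knows himself to be a fool."


def random_words(text, count, rng=random):
    """Pick `count` words at random from `text`, ignoring word order entirely."""
    # Strip the punctuation and lowercase to get the bare vocabulary.
    words = [word.strip(",.").lower() for word in text.split()]
    return " ".join(rng.choice(words) for _ in range(count))


print(random_words(QUOTE, 5))
```

Every run gives a different line, and nearly all of them are nonsense, because nothing ties a word to the one before it.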

If I were a linguist I would try the other end of the spectrum. I would analyze each word and implement the grammatical rules for when and how to inflect it.

For me that sounds too tiresome. I want quick results.

So I constructed a graph out of Macbeth.

The basic fact is that Macbeth is made out of sentences and the sentences are made out of words.
Every sentence has a start and an end, and the end is marked by an ending sign such as "?", "." or "!". To make it easier for me I decided that ",", ";" and ":" also end a sentence. After all, it is word groups rather than pure sentences that I am seeking.
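The splitting rule can be sketched in Python roughly like this. The regular expression and the name word_groups are my own assumptions for illustration. Words are lowercased, and apostrophes and hyphens are kept inside words so that something like "whey-face" stays in one piece.

```python
import re

# The six signs that end a word group, as described above.
ENDING_SIGNS = set(".?!,;:")


def word_groups(text):
    """Split text into word groups, each one ending with its ending sign."""
    # A token is either a word (letters, apostrophes, hyphens) or an ending sign.
    tokens = re.findall(r"[A-Za-z'-]+|[.?!,;:]", text.lower())
    groups, current = [], []
    for token in tokens:
        if token in ENDING_SIGNS:
            if current:  # skip stray signs, e.g. the second one in "?!"
                groups.append(current + [token])
            current = []
        else:
            current.append(token)
    return groups  # trailing words without an ending sign are dropped


QUOTE = "A fool thinks himself to be wise, but a wise man knows himself to be a fool."
for group in word_groups(QUOTE):
    print(group)
```

For the quote this gives two groups, one ending in "," and one ending in ".".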

So the graph starts with a start node, and that node is connected to all words that start a sentence. In the example above that is "a" and "but".


And then I just continued adding a node for each word and linking it to the word that follows, e.g. "A" -> "Fool", "Fool" -> "Thinks", etc.

When a word came to an ending sign, I added a link to a node of that sign, such as "Wise" -> "," and "Fool" -> "."

When a word came up that already had a node, I didn't add a new node, just a new edge, so the "Wise" node has incoming edges from both "Be" and "A".

When the same word pair came up a second time, such as the double connection between "A" and "Fool", I added one to the weight of the edge, ending up with a weighted graph.
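The construction steps above can be sketched in Python as a dictionary of dictionaries, where graph[word][next_word] holds the edge weight. This is only my guess at a reasonable data structure, not what the graph tool does internally. The name START is a made-up label for the start node, and the word groups are assumed to be already split and lowercased.

```python
from collections import defaultdict

START = "<start>"  # hypothetical label for the start node

# The example quote, already split into word groups that end with an ending sign.
GROUPS = [
    ["a", "fool", "thinks", "himself", "to", "be", "wise", ","],
    ["but", "a", "wise", "man", "knows", "himself", "to", "be", "a", "fool", "."],
]


def build_graph(groups):
    """Return graph[word][next_word] = number of times that pair occurred."""
    graph = defaultdict(lambda: defaultdict(int))
    for group in groups:
        previous = START
        for token in group:
            # A new pair creates the edge with weight 1, a repeated pair adds 1.
            graph[previous][token] += 1
            previous = token
    return graph


graph = build_graph(GROUPS)
print(dict(graph[START]))  # {'a': 1, 'but': 1}
print(dict(graph["a"]))    # {'fool': 2, 'wise': 1}
```

The ending signs only ever show up as targets, so they act as the end nodes of the graph.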

I did this for the complete text of Macbeth, which means that a word such as "A" is connected to 119 other words, with different weights depending on how often those words show up after it.

When my graph is ready I can randomize a trip through it from the start node to an end node, with each step randomized based on the weight of the edges. Wise huh? This weighted randomization also sees to it that words usually have the correct ending and that every word fits with the word before it.
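Here is a small Python sketch of such a weighted trip, using only the example quote instead of the whole of Macbeth. The names START, build_graph and generate are made up for illustration. It assumes every word group ends with an ending sign, so that every word has at least one follower.

```python
import random
from collections import defaultdict

START = "<start>"  # hypothetical label for the start node
ENDING_SIGNS = set(".?!,;:")

GROUPS = [
    ["a", "fool", "thinks", "himself", "to", "be", "wise", ","],
    ["but", "a", "wise", "man", "knows", "himself", "to", "be", "a", "fool", "."],
]


def build_graph(groups):
    """Return graph[word][next_word] = number of times that pair occurred."""
    graph = defaultdict(lambda: defaultdict(int))
    for group in groups:
        previous = START
        for token in group:
            graph[previous][token] += 1
            previous = token
    return graph


def generate(graph, rng=random):
    """Walk from the start node to an ending sign, choosing edges by weight."""
    words, node = [], START
    while node not in ENDING_SIGNS:
        followers = graph[node]
        # random.choices picks one follower with probability proportional to its weight.
        node = rng.choices(list(followers), weights=list(followers.values()))[0]
        words.append(node)
    # Glue the ending sign onto the last word and capitalize the first one.
    return (" ".join(words[:-1]) + words[-1]).capitalize()


graph = build_graph(GROUPS)
for _ in range(3):
    print(generate(graph))
```

With a graph this small the output mostly recombines the two halves of the quote. The variety comes from feeding it a whole play.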

So some generated sentences:


Mingle with you make us.
Come to the subject of our chimneys were a ring the gate make such welcome.
Whey-face?
Every man.
Madam, 
and blind-worm's sting, 
I' the honest men our offices and 'tis the instruments, 
at peace?


"Come to the subject of our chimneys" Poetry!

The next step will be to make a graph consisting of all of Shakespeare's works.

Am I wise or not? (or just a simple fool)

PS. The graph was made using this tool. Recommended!
