A markov chain is built by tracking the order data occurs in. We can then take this observation and create new data that will closely resemble the initially observed data. With enough data to analyze, this allows the creation of new data sets that closely resemble the original in style.
For example, we could perform an analysis of the sentence “The cat and the dog like cat food”
We go through and examine each word, and note the word that follows it
The -> Cat
Cat -> and
And -> the
The -> dog
Dog -> like
Like -> cat
Cat -> food
Then we can pile these up:
The -> Cat, Dog
Cat -> and, food
And -> the
Dog -> like
Like -> cat
Using these observations, we can build a probability distribution for each word:
“the” has a 50% chance to be followed with “cat”, and 50% to be followed with “dog”.
With these distributions, we can create a chain of words based on the first but not exactly it. Picking a starting word “cat”, flip a coin for the next word – I got “and”. Then we only have one option from “and”, so we go to “the”, flipping a coin, I once again got “Cat”, and one more flip yields “food”. We don’t have any words after food, so we are done.
Cat and the Cat Food
Markov chains can be more complicated than just one ancestor as well. We could observe the output after two words:
The Cat -> and
Cat and -> the
And the -> dog
The dog -> like
Etc.
This creates output more resembling the original input. It also requires more input data to produce interesting variety in a “new” sentence composed with its distribution.
This is a trivial example, but this technique can be very powerful. We could perform a markov analysis of every Bach composition, creating a distribution of notes that follow a given note. With that, we could create a new composition much in the style of Bach. This is exactly what Steve Larson did, creating an algorithm “EMI” pronounced “Emmy”.
Although powerful, there is nothing “intelligent” about markov sequences. They cannot really be creative, only mimick another’s creativity. With a large enough database, Markov chains could be used to diagnose sickness (what condition follows these symptoms most often?), detect fraud (what is the most likely tax return amount given this person’s previous tax filings?), and many others. But to inject true intelligence into algorithms, we need to look elsewhere.