Zipf’s Law and the Math of Reason

 

Here at Blog EMIS’ new Science and Technology section, we would like to invite our readers to dive into the unknown, and join us in exploring the unanswerable questions.

Linguistics logo.

First, a short point to ponder. Choose your favorite book, chapter, page, piece of writing and ask yourself the first question: “How do all these pieces relate?”

Now, it would be expected and laudable for you to guess “English” if not “Language” in general. No doubt, this is plain to see. But language holds mysteries yet unfolded, a riddle yet unsolved which will reveal much more, perhaps the nature of language itself.

Enter the perplexing immensity of Zipf’s Law.

Zipf’s law, popularized by the linguist George Kingsley Zipf, describes the curious pattern in language which we shall now discuss. In common terms, Zipf realized that if we measure how often each word is spoken or written, a striking pattern emerges.

George Kingsley Zipf, 1917

To understand the law, first we ask ourselves: what is the most common word in all our books, writings and speeches? According to “WordCount”, the most common English word is “the”. “WordCount” is a fabulous resource for those wishing to explore the vast vocabulary of the English language and is certainly worth a visit.

In fact, “the” constitutes just about 6% of everything we write and say. Second on the scale is “of”, which constitutes about 3% of all we say and write. Oddly enough, that is half as much as “the”.

A mere coincidence, or perhaps the origin of Zipf’s mystery? The next word, “and”, appears about 1/3 as often as “the”; the next word 1/4 as often, then 1/5, 1/6… all the way down to the word “students”, the 672nd most popular word in the English language. And true enough, “students” appears roughly 1/672 as often as “the”.
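For readers who would like to test this on a text of their own, here is a minimal Python sketch. It counts the words in a piece of writing, ranks them, and compares each count with the idealized prediction of "the top word's count divided by the rank". The function names `rank_frequencies` and `zipf_prediction` and the tiny sample sentence are invented for illustration; a real test would need a much longer text, and even then the fit is only approximate.

```python
from collections import Counter
import re


def rank_frequencies(text):
    """Return (word, count) pairs sorted from most to least frequent."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common()


def zipf_prediction(top_count, rank):
    """Idealized Zipf count for a given rank: top_count / rank."""
    return top_count / rank


# Hypothetical sample; far too short to show the law convincingly.
sample = "the cat and the dog and the bird saw the end of the day"
ranked = rank_frequencies(sample)
top_word, top_count = ranked[0]

# Print rank, word, observed count, and the idealized Zipf count.
for rank, (word, count) in enumerate(ranked[:3], start=1):
    print(rank, word, count, round(zipf_prediction(top_count, rank), 2))
```

Swapping `sample` for the full text of a favorite book is the more interesting experiment.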

Graph Zipf

Words throughout the English language follow this law, and while the fit is not exact, the mere suggestion that something as complex as language follows such a simple trend is confounding. For us, language is a personal reflection, our most powerful tool of communication. Not only is language vital for our social lives but also for the very act of thought. Just imagine what your thoughts would be like if you knew no spoken language at all.

And yet all of this, even the words we think to ourselves, follows the trend on the graph displayed here. Zipf’s law is part of our minds, part of the way we think and do… anything.

The implications of this law are striking. Writing, for many, may be an imaginative outlet, a creative and conscious construction which only we have the power to shape and manipulate. However, given grammatical norms, it seems this law is unavoidable, and even the most unexpected, most beautiful, most controversial writing will follow this simple arrangement.

If you still cannot grasp the reach of Zipf’s law, then consider the following: this law seems to apply to ALL languages. Arabic, English, Hebrew, ancient Babylonian, Lithuanian, Japanese, Swahili and perhaps even any language you invent in this instant.

Is it inescapable? If you start to type on a keyboard at random, mashing buttons without a conscious thought, or even allow your cat to do it for you, will the result also follow the law?

Nothing escapes the law. Given enough keystrokes, even this scenario will eventually lead to the slope of Zipf’s law.
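The cat-on-a-keyboard experiment is easy to try without a cat. The Python sketch below presses keys at random from a tiny made-up keyboard (the four letters and the space bar are an arbitrary choice, as are the function name `random_typing_ranks` and the seed), splits the result into "words" at the spaces, and ranks them by how often they occur. It is only a rough simulation: the counts fall off in steps by word length rather than along a perfectly smooth curve, but short "words" end up common and long ones rare, much as in real language.

```python
import random
from collections import Counter


def random_typing_ranks(n_chars=200_000, alphabet="abcd", seed=0):
    """Simulate random key presses and rank the resulting 'words'."""
    rng = random.Random(seed)          # fixed seed so runs are repeatable
    keys = alphabet + " "              # every key, including space, is equally likely
    text = "".join(rng.choices(keys, k=n_chars))
    return Counter(text.split()).most_common()


ranked = random_typing_ranks()

# Sample a few ranks to see how quickly the counts fall off.
for rank in (1, 5, 21, 85):
    word, count = ranked[rank - 1]
    print(rank, word, count)
```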

How? How can such simple mathematical predictions be made on human consciousness?

There is no clear answer. And in search of an explanation, the world around you may become even more predictable than Zipf’s law had suggested.

Zipf’s law can be read as the “Pareto Principle” applied to the distribution of words in a language… successfully.

The Pareto principle is a rather basic statistical rule of thumb which was discovered in the late 1890s. It observes that when something is distributed unevenly, roughly 20 percent of the sources tend to account for about 80 percent of the total. For example, of all rivers on earth, 20 percent of the rivers will carry about 80 percent of the water. It is not perfect, only approximate, but the Pareto Principle is so powerful that I can quite confidently guess that 20% of our students do 80% of the hand raising in class, or the note taking, or even the cleaning.

Pareto principle or eighty-twenty rule represented on a blackboard - white chalk drawing and sticky notes

This is where the principle truly begins to take shape, as mathematicians soon found out. It applies to nearly everything. People, cities, animals, nations, planets, snowcapped mountains, the sugar content of various cookies and even language. Zipf’s law is simply the Pareto Principle applied to our language and, sure enough, it fits. Roughly 20% of the words used in English, Arabic, Basque, Spanish, Mandarin or Belarusian make up about 80% of what any person will ever say or write. This trend will appear in every book, dictionary, textbook, or short story.
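The link between the two ideas can be checked with a little arithmetic. The Python sketch below assumes an idealized vocabulary in which the word of rank r is used exactly 1/r as often as the top word, then asks what share of all word use belongs to the most common 20% of words. The vocabulary size of 10,000 and the function name `top_share` are assumptions chosen for illustration; with that size the share lands in the neighborhood of 80 percent, and it drifts somewhat for smaller or larger vocabularies.

```python
def top_share(vocab_size, top_fraction=0.2):
    """Share of total word use covered by the most frequent words,
    assuming an idealized Zipf distribution (weight 1/rank)."""
    weights = [1.0 / rank for rank in range(1, vocab_size + 1)]
    cutoff = int(vocab_size * top_fraction)   # number of words in the top slice
    return sum(weights[:cutoff]) / sum(weights)


# Hypothetical vocabulary of 10,000 distinct words.
print(round(top_share(10_000), 3))
```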

Of course, it is not perfectly precise, but considering the close similarities in every language, the trend is remarkable if not… frightening.

And yet, as should now be clear, with the Pareto Principle applying to so many things, it is no surprise that Zipf’s law does so as well. Zipf’s law is everywhere.

However, for now let us focus on the application of the law in our language.

Here at the school of EMIS there is an irrevocable love of literature and language. It is merely necessary to ask a student what language he is learning or book he is reading to understand this. However, in Zipf’s law we can see the simplicity of language revealed mathematically. Language is predictable in an unexpected way.

There are numerous explanations for the law, many dealing with statistical and mathematical justifications. You are welcome to explore them all in an excellent video on the subject made by Vsauce, a fabulously well-cited YouTube channel.

The most thought-provoking answer was conceived by George Zipf himself, who proposed that the answer lay in the “Principle of Least Effort.” In his work, Zipf theorized that all human decisions could be explained by whichever path seemed easiest, or required the least effort. Language, as invented by human beings, would be no different.

Language would be easiest for the speaker if he needed only a tiny vocabulary, a handful of simple, general words to repeat for every purpose. However, the easiest language for a listener would have a copious vocabulary, one that had very specific words to define very specific elements, like “Sisyphean” or “Riparian”.

The product over the next thousand years would be languages Zipfian in nature, using a small number of short words in abundance and an extremely long list of complex words in rarity.

Perhaps some may resist the notion that something as beautiful as the Shakespearean Plays and something as degraded as a math textbook might share such common roots. The idea that the wonder of language is a product of contempt towards work seems undignified. But a similar perspective might look at Zipf and see a human connection.

Zipf’s law, among other things, might be a mathematical function that describes the way in which we humans think and work. Whether or not you believe in “The Principle of Least Effort”, language may be, in essence, our greatest invention. Not only because our Zipfian minds are inventive and do wondrous things with it, as we might guess. Perhaps we are not merely the masters of our language.

Instead, it is ours. Instead it describes humanity, no matter what language you might speak, what books you might like to read, what songs you might like to hear.

In essence we are still human, preposterously the same, no matter how you put it.


Obs.: In this article, the word “the” was used sixty-seven times, which corresponds to 5.3% of all the words (close to 6%). The word “of” was used forty-two times, which corresponds to 3.3% (close to 3%).

written by Carlos Sevilla

edited by Rodrigo Ferreira

copy-edited by Alisa Sophie Rasch

References:

http://io9.com/the-mysterious-law-that-governs-the-size-of-your-city-1479244159

http://betterexplained.com/articles/understanding-the-pareto-principle-the-8020-rule/

http://www.wordcount.org/main.php

https://www.youtube.com/watch?v=fCn8zs912OE

Image 1:  http://www.queensu.ca/llcu/sites/live.wp2.queensu.ca.llcuwww/files/Linguistics%20logo.jpg

Image 2:  https://upload.wikimedia.org/wikipedia/commons/c/c6/George_Kingsley_Zipf_1917.jpg 

Image 3: Taken from YouTube: https://www.youtube.com/watch?v=fCn8zs912OE

Image 4: http://zackkanter.com/wp-content/uploads/2012/03/istock_000012383788xsmall.jpg

 

 

 

 

 

 
