Zipf's Law
— Zipf, Statistics, Word Frequency — 1 min read
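I was reading a paper on Bloom filters in which the authors mention the Zipfian distribution. The Zipfian distribution, or Zipf's law, describes an inverse relationship between rank and frequency: an item's frequency is roughly proportional to 1/rank, so the second most common item shows up about half as often as the first, the third about a third as often, and so on. It looks nothing like the normal "bell-shaped" distribution. If we multiply the rank and frequency of any point on the distribution, we should land close to the same constant regardless of which point we choose.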
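To make the "rank times frequency is roughly constant" idea concrete, here is a minimal Python sketch of an idealized Zipfian distribution. The function name `zipf_frequencies` and the top frequency of 12000 are made up for illustration; real data only follows this approximately.

```python
def zipf_frequencies(top_frequency, n_ranks):
    """Ideal Zipfian frequencies: the item at rank r occurs top_frequency / r times."""
    return [top_frequency / rank for rank in range(1, n_ranks + 1)]


# Hypothetical numbers: the most common item appears 12000 times.
freqs = zipf_frequencies(top_frequency=12000, n_ranks=5)

# Multiply each frequency by its rank; for an ideal Zipfian
# distribution every product equals the top frequency.
products = [rank * freq for rank, freq in enumerate(freqs, start=1)]

for rank, (freq, product) in enumerate(zip(freqs, products), start=1):
    print(rank, freq, product)
```

In this idealized case every product comes out equal to the top frequency. With real text the products drift, but they should stay in the same ballpark.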
This distribution shows up in word frequencies across many natural languages. One proposed explanation is a tradeoff between short, one-syllable words (easier for humans to speak) and the chance of miscommunication.
Following this video, I took the text of The Count of Monte Cristo and ran a word frequency analysis on the corpus using AntConc. Then I uploaded the results to Google Sheets and created the chart below:
Notice how the corpus follows the same Zipfian distribution.
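I used AntConc and Google Sheets for the analysis, but the same rank and frequency table can be roughed out in a few lines of Python. This is only a sketch and not what AntConc does internally: the tokenizer is a simple regex that I am assuming is good enough for English text, and `rank_frequency` and the sample sentence are made up for illustration.

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, count, rank * count) rows, most frequent word first."""
    # Assumption: lowercase runs of letters and apostrophes count as words.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return [
        (rank, word, count, rank * count)
        for rank, (word, count) in enumerate(counts.most_common(), start=1)
    ]


# Tiny stand-in for a real corpus, just to show the shape of the output.
sample = "the count and the abbe and the sea"
rows = rank_frequency(sample)

for row in rows[:3]:
    print(row)
```

To try it on a full book, read a plain-text copy into a string and pass it in place of `sample`. On a large corpus the last column (rank times count) should hover around a roughly constant value for the top-ranked words.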
Roll up twenty Zig Zags out of one Zipf - Juicy J