Power Laws and the Pareto Principle - Powerful Ideas
Idea: Power Laws, Pareto Principle
Other names: Pareto Law, Pareto Distribution, Scale-free distribution, Matthew Effect
Summary of the idea: Many things in life have a disproportionate relationship between cause and effect.
Examples of the idea: 20% of the people own 80% of the land, Just 1.4 percent of tree species account for 50 percent of the trees in the Amazon, 77% of Wikipedia is written by 1% of its editors (vice), an example of The 90-9-1 Rule
This idea was invented by: Vilfredo Pareto
Why the idea is so powerful: power laws are hard for humans to intuitively recognize but they’re present in many facets of business and life
The Pareto Principle, also called the “80/20 rule”, was formulated by the Italian Vilfredo Federico Damaso Pareto (1848-1923) in 1906. It states that 80% of the outcome comes from 20% of the input.
Pareto did more than “just” discover a phenomenon that’s still very relevant today. He also drove the transformation of economics from a branch of philosophy into a data-driven field built on research and mathematical equations. In fact, the Pareto Index is still used today to measure income inequality.
As an economist, sociologist, and philosopher, Pareto was particularly interested in wealth distributions and how people came to power. That’s also how he found the famous principle:
“The gulf between rich and poor has always been part of the human condition, but Pareto resolved to measure it. He gathered reams of data on wealth and income through different centuries, through different countries: the tax records of Basel, Switzerland, from 1454 and from Augsburg, Germany, in 1471, 1498 and 1512; contemporary rental income from Paris; personal income from Britain, Prussia, Saxony, Ireland, Italy, Peru. What he found – or thought he found – was striking. When he plotted the data on graph paper, with income on one axis, and number of people with that income on the other, he saw the same picture nearly everywhere in every era. Society was not a "social pyramid" with the proportion of rich to poor sloping gently from one class to the next. Instead it was more of a "social arrow" – very fat on the bottom where the mass of men live, and very thin at the top where sit the wealthy elite.” Benoit Mandelbrot in "The Mystery of Cotton"
The term “Pareto Principle” was brought to the business world in 1940 by Joseph M. Juran, a quality engineer whose application of the principle to quality management helped spark the idea for the Six Sigma process.
Examples of Pareto Distributions
20% of the people own 80% of the land
20% of athletes win 80% of the time
20% of training exercises deliver 80% of the results
80% of revenue comes from 20% of customers
20% of keywords deliver 80% of traffic
20% of health care patients use 80% of healthcare resources
20% of products drive 80% of sales
20% of the population transmits 80% of STDs
20% of bugs cause 80% of crashes (according to Microsoft in 2002)
How the Pareto Principle applies in our lives
The Pareto Law is everywhere: investments, technology, time, efficiency, and risks. Once you see it, it cannot be unseen.
It’s the result of compounding effects, a concept best known from interest and described by Warren Buffett as the strongest force in the universe. The power of compounding effects makes long-term investments possible, whether in finance, marketing, or personal self-development.
The rule of 1 percent, also called “Preferential Attachment”, describes the idea of using small improvements to scale progress over time. Getting 1% better every week adds up to a whole lot over 10 years. A little advantage compounds over time. The rich get richer (that’s why it’s also called “Matthew Principle”).
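To put a number on that claim, here is a minimal sketch in Python. It assumes the improvement compounds multiplicatively, that there are 52 weeks in every year, and a hypothetical starting level of 1.0. None of these figures come from measured data.

```python
# Hypothetical illustration: compound a 1% weekly improvement over 10 years.
WEEKLY_GAIN = 0.01      # 1% better every week (figure taken from the text)
WEEKS_PER_YEAR = 52     # assumption: no weeks skipped
YEARS = 10


def compounded_level(weekly_gain: float, weeks: int, start: float = 1.0) -> float:
    """Return the level reached after compounding `weekly_gain` for `weeks` weeks."""
    return start * (1.0 + weekly_gain) ** weeks


def linear_level(weekly_gain: float, weeks: int, start: float = 1.0) -> float:
    """Return the level reached if every week only added a fixed amount."""
    return start * (1.0 + weekly_gain * weeks)


weeks = WEEKS_PER_YEAR * YEARS
compound = compounded_level(WEEKLY_GAIN, weeks)
linear = linear_level(WEEKLY_GAIN, weeks)

print(f"After {weeks} weeks, compounding gives {compound:.1f}x the starting level")
print(f"Without compounding, the same gains add up to {linear:.1f}x")
```

The gap between the two numbers is the whole point: the weekly gain is identical, only the compounding differs.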
It’s fascinating how scalable the Pareto Rule is. In essence, you can ask yourself “what are the 20% of the 20% of the 20%?”. If something has an 80/20 distribution, you can zoom in at the 20% and play the game again and again to find the core causality.
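The zooming game can be sketched in a few lines of Python. The sketch assumes that the same 80/20 split holds again inside the top group at every level, which is an idealization. The function name `zoom` is made up for this illustration.

```python
# Hypothetical illustration: apply the 80/20 split repeatedly ("20% of the 20%").
def zoom(levels: int, input_share: float = 0.2, output_share: float = 0.8):
    """Yield (level, share_of_input, share_of_output) for each zoom step.

    Assumes the same split repeats inside the top group at every level.
    """
    for level in range(1, levels + 1):
        yield level, input_share ** level, output_share ** level


for level, inputs, outputs in zoom(3):
    print(f"Level {level}: {inputs:.1%} of the input drives {outputs:.1%} of the output")
```

Under that assumption, three rounds of zooming leave you with less than 1% of the input still driving about half of the output.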
A key component of compounding effects and Pareto Distributions is time. Power Laws and Pareto Principles get stronger over time. Their impact scales. Some distributions grow quicker than others but they all grow. So, when assessing Pareto Distributions, keep in mind how long they have been in place and how they will look in the future.
Technology, or better said the Internet, accelerates compounding effects. In tech, we often talk about Network Effects and Feedback Loops - same thing. Moore’s Law (the number of transistors on a computer chip doubles roughly every two years) is a perfect example of that. We can use those effects to grow businesses, but the result is that most markets nowadays are winner-take-all markets. They hold one big incumbent, sometimes a decent second place, and many smaller companies that get the scraps. Innovation feeds Power Laws.
When you hear people saying “work smarter, not harder”, they actually mean looking for Pareto distributions. Efficiency is focusing on the things that bring the greatest results. According to the Pareto Law, working smarter is focusing on the 20% of input that drives 80% of the output.
The Pareto Rule is also a guide to finding the weakest link in the chain. There’s a saying that a team is only as strong as its weakest player, and if 20% of the input drives 80% of the output, you want to make sure that those 20% are not at risk. That can be applied to managing a team, risk assessments of product lines, or competitor analysis.
The Pareto Principle is a Power Law and, as such, a nonlinear function.
An introduction to Power Laws
“We are moving into the far more uneven distribution of 99/1 across many things that used to be 80/20.” Nassim Nicholas Taleb, “Antifragile”
In abstract terms, Power Laws say that a relative change in one quantity results in a proportional relative change in another. As the name indicates, a few things hold much power. Another word is “nonlinearity”: a disproportionate relationship between cause and effect.
Let’s take a quick dive into the math. Don’t cringe, it’s going to be super basic and interesting!
A power-law function looks like this: f(x) = ax^k
If its exponent, “k” in this case, equals 1, we speak of a linear function or a power law of the first order (not related to the First Order from Star Wars).
If the exponent of a power function equals 2 - in fact, if it’s higher than 1 - we have a “real” Power Law.
If the exponent is 3, the power law is scaled to the 3rd power. If it’s 4, the law is scaled to the 4th power, and so on.
If the exponent is below 1 and above 0, we have a function of diminishing returns (more below).
What happens if the exponent is negative? Then, we have a heavy-tailed distribution. The exciting part about power functions with a negative exponent close to zero is that they produce the long tail that occurs so often in marketing, e.g. long-tail vs. short-head keywords.
Power Laws have either a decaying or a growing effect, depending on whether the exponent is negative or positive. The exponent is the “power” in Power Laws. It decides in which direction the trend accelerates.
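The effect of the exponent is easiest to see by asking what happens to the output when the input doubles. The following Python sketch does that for a function of the form a * x**k. The exponents are picked purely for illustration and are not fitted to any data, and the helper names `power_law` and `doubling_factor` are made up for this example.

```python
# Minimal sketch: how the exponent k shapes f(x) = a * x**k.
def power_law(x: float, k: float, a: float = 1.0) -> float:
    """Evaluate f(x) = a * x**k for x > 0."""
    return a * x ** k


def doubling_factor(k: float, x: float = 10.0) -> float:
    """Return f(2x) / f(x); for a power law this equals 2**k at every x."""
    return power_law(2 * x, k) / power_law(x, k)


# Exponents chosen for illustration only (assumption, not fitted to any data).
REGIMES = {
    "linear (k = 1)": 1.0,
    "accelerating (k = 2)": 2.0,
    "diminishing returns (k = 0.5)": 0.5,
    "decaying long tail (k = -1)": -1.0,
}

for label, k in REGIMES.items():
    print(f"{label}: doubling x multiplies f(x) by {doubling_factor(k):.2f}")
```

Note that the doubling factor does not depend on where you start: that is the “proportional relative change” from the abstract definition, and the reason power laws look the same at every scale.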
The four types of power laws
1. Exponential growth
You have probably encountered f(x) = x^b either in math class or in economics, where accelerating growth of this kind is loosely called exponential growth. I didn’t like it either at the time but now understand how valuable it is.
All startups strive to grow their customer base and revenue by a multiple of what they put in*. Because most startups have low to zero marginal cost, that equation works out if they have product/market-fit. The way to get there is Flywheels and Network Effects. Both have a very similar effect that’s deeply rooted in this Power Law: increasing returns with constant input. In plain terms: more bang for your buck.
*Note that innovation grows in s-curves, not exponential functions.
2. Concavity
Concavity (the technically correct term is “negative Concavity”) is most often not a good deal. It’s the opposite of exponential growth: concavity has no upside, only downside, as opposed to convexity (exponential growth).
3. Diminishing Returns
Diminishing Returns are the stagnation of growth. This power law is an inverse exponential function and increasingly comes up in the Marketing world as the internet accelerates the decay of tactics. Andrew Chen called it “the law of shitty click-throughs”, a.k.a. less bang for your buck.
4. Long tail
Most marketers, especially SEOs, are familiar with the concept of “long-tail keywords”. In a nutshell, it reflects the idea that some networks are concentrated at a few nodes and thinly spread over many others.
Power Laws thrive under three conditions:
Network effects (word of mouth, for example) to amplify the differences between them
What is the difference between power laws and exponential functions?
The difference between power laws and exponential functions lies in where the variable sits. In a power function, the variable is the base and the exponent is fixed (x^3); in an exponential function, the base is fixed and the variable is the exponent (3^x). The two often go hand in hand and are used synonymously in everyday language, even though they are mathematically distinct.
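A quick way to feel the difference is to compute both side by side. The Python sketch below uses 3 as the fixed exponent and the fixed base, matching the example in the text. The function names are made up for this illustration.

```python
# Minimal sketch: a power function (variable in the base) versus an
# exponential function (variable in the exponent).
def power_function(x: float, k: float = 3.0) -> float:
    """x raised to a fixed power: x**3 by default."""
    return x ** k


def exponential_function(x: float, base: float = 3.0) -> float:
    """A fixed base raised to x: 3**x by default."""
    return base ** x


print(f"{'x':>3} {'x**3':>8} {'3**x':>12}")
for x in (1, 2, 3, 5, 10, 20):
    print(f"{x:>3} {power_function(x):>8.0f} {exponential_function(x):>12.0f}")

# Doubling x multiplies x**3 by the same factor (8) everywhere, while the
# factor for 3**x keeps growing with x.
for x in (2, 10):
    power_ratio = power_function(2 * x) / power_function(x)
    exp_ratio = exponential_function(2 * x) / exponential_function(x)
    print(f"x = {x}: x**3 grows {power_ratio:.0f}x, 3**x grows {exp_ratio:.0f}x")
```

The power function grows by a constant factor whenever the input doubles, while the exponential function pulls away faster and faster. That constant factor is what makes a power law scale-free.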
The most fascinating Power Laws
Over 100 power-law distributions have been identified in biology, physics, and social sciences. To list them all would be overwhelming and counter-productive, so I picked the most fascinating ones.
Sizes of moon craters and solar flares
Zipf’s law: the frequency of an item is inversely proportional to its frequency rank, i.e. the second most frequent item occurs half as often as the most frequent one, etc. (Zipf’s law and the Pareto Distribution are both power laws)
Foraging patterns of various species
Most played and sold music is dominated by the Billboard Top 40
Rhapsody, an online music provider, streamed more songs each month beyond its top 10,000 than it did its top 10,000 in 2007 (“The Whole Digital Library”, Diane Kresh)
50% of DVD Station’s rental revenue came from titles that were not new releases (2005)
“The S&P 500 rose 22% in 2017. But a quarter of that return came from 5 companies – Amazon, Apple, Facebook, Boeing, and Microsoft. Ten companies made up 35% of the return. Twenty-three accounted for half the return. Apple alone was responsible for more of the index’s total returns than the bottom 321 companies combined.” 
Book sales are concentrated around a small number of titles (Greco 1997)
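Certain words in most languages and certain family names occur with a much higher frequency than others
Kepler’s Third Law: the further a planet is away from the sun, the longer its orbit takes (the square of the orbital period is proportional to the cube of the orbit’s semi-major axis)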
The diffusion of innovation
Just 1.4 percent of tree species account for 50 percent of the trees in the Amazon
77% of Wikipedia is written by 1% of its editors (vice), an example of The 90-9-1 Rule
PageRank: “Some web pages, such as the Google and Yahoo home pages, were linked to vastly more often than others. When the researchers plotted a histogram of the nodes’ degrees, it appeared to follow the shape of a power law, meaning that the probability that a given node had degree k was proportional to 1/k raised to a power. (In the case of incoming links in the World Wide Web, this power was approximately 2, the team reported.)” 
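Zipf’s law from the list above is simple enough to sketch in Python. The numbers are hypothetical (10,000 keywords, the top one searched 50,000 times) and the frequencies are the idealized 1/rank values, not real search data. The helper names are made up for this illustration.

```python
# Minimal sketch of Zipf's law: frequency is proportional to 1 / rank.
def zipf_frequencies(top_frequency: float, n_items: int) -> list[float]:
    """Return idealized frequencies for ranks 1..n_items."""
    return [top_frequency / rank for rank in range(1, n_items + 1)]


def head_share(frequencies: list[float], head_fraction: float) -> float:
    """Share of the total held by the top `head_fraction` of items."""
    head_size = max(1, round(len(frequencies) * head_fraction))
    return sum(frequencies[:head_size]) / sum(frequencies)


# Hypothetical numbers: 10,000 keywords, the top one searched 50,000 times.
freqs = zipf_frequencies(top_frequency=50_000, n_items=10_000)

print(f"Rank 1: {freqs[0]:.0f}, rank 2: {freqs[1]:.0f}, rank 10: {freqs[9]:.0f}")
print(f"Top 20% of items hold {head_share(freqs, 0.2):.0%} of the total")
```

With these made-up numbers, the top fifth of the items ends up with a share of the total that lands in familiar 80/20 territory, which shows how closely Zipf’s law and the Pareto Principle are related.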
Gaussian vs. Paretian thinking
Contrary to popular belief, most things in life don’t have a meaningful average. This misconception is best described by Gaussian versus Paretian thinking. A Gaussian Distribution is also known as Normal Distribution or Bell Curve and has a clear average (mean), for example, the height of humans or IQ scores.
The opposite are Paretian distributions (Power Laws), which occur, for example, in the frequency of used words, the size of human settlements, the distribution of Internet traffic, or the intensity of earthquakes. They have long and fat tails, which lead to unstable means, infinite variance, and unstable confidence intervals.
If the variance of a distribution is too high, statistically significant results are hard to find. Power Laws often do not have a stable average or standard deviation to base confidence on, which makes them unpredictable and especially hard for humans to spot intuitively.
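The contrast shows up quickly in a small simulation. The Python sketch below draws one Gaussian and one Paretian sample with the standard library. Everything about the setup is an assumption made for illustration: the heights (mean 170 cm, standard deviation 10 cm), the wealth units, and the Pareto shape of 1.16, which is roughly the value that corresponds to an 80/20 split.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

N = 100_000
ALPHA = 1.16  # assumption: shape roughly matching an 80/20 split

# Gaussian sample: hypothetical human heights in cm.
heights = [random.gauss(170, 10) for _ in range(N)]
# Paretian sample: hypothetical wealth in arbitrary units.
wealth = [random.paretovariate(ALPHA) for _ in range(N)]


def top_share(values: list[float], fraction: float) -> float:
    """Share of the total held by the largest `fraction` of values."""
    ordered = sorted(values, reverse=True)
    head = ordered[: max(1, int(len(ordered) * fraction))]
    return sum(head) / sum(ordered)


for name, sample in (("heights (Gaussian)", heights), ("wealth (Paretian)", wealth)):
    print(
        f"{name}: mean {statistics.fmean(sample):.1f}, "
        f"median {statistics.median(sample):.1f}, "
        f"largest value {max(sample):.1f}, "
        f"top 1% share {top_share(sample, 0.01):.1%}"
    )
```

In the Gaussian sample, mean and median sit close together and the top 1% holds only slightly more than 1% of the total. In the Paretian sample, a handful of extreme values drags the mean far away from the median, and changing the seed moves the mean noticeably. That is what an “unstable mean” looks like in practice.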
Nassim Taleb refers to events that occur rarely but with great impact as Black Swans. They’re unpredictable. Good examples are catastrophes like a devastating meteor impact on Earth or “the Big One”, the earthquake that is expected to devastate California, though no one knows when it will strike.
Gaussian distributions tend to prevail when events are completely independent of each other. As soon as you introduce the assumption of interdependence across events, Paretian distributions tend to surface because positive feedback loops tend to amplify small initial events. 
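The role of feedback loops can be illustrated with a toy simulation in Python. It is a deliberately simple rich-get-richer rule with made-up numbers (50 products, 5,000 customers) and does not claim to reproduce a true power law. It only shows how interdependence concentrates outcomes compared with independent choices.

```python
import random

random.seed(7)  # fixed seed so the sketch is repeatable

N_PRODUCTS = 50       # hypothetical number of competing products
N_CUSTOMERS = 5_000   # hypothetical number of arriving customers


def simulate(feedback: bool) -> list[int]:
    """Assign customers to products, one at a time.

    Without feedback every product is equally likely to be picked.
    With feedback the chance of being picked is proportional to the
    customers a product already has (a simple rich-get-richer rule).
    """
    counts = [1] * N_PRODUCTS  # every product starts with one customer
    products = range(N_PRODUCTS)
    for _ in range(N_CUSTOMERS):
        weights = counts if feedback else None
        chosen = random.choices(products, weights=weights, k=1)[0]
        counts[chosen] += 1
    return counts


def top_share(counts: list[int], fraction: float = 0.2) -> float:
    """Share of all customers held by the top `fraction` of products."""
    ordered = sorted(counts, reverse=True)
    head = ordered[: max(1, int(len(ordered) * fraction))]
    return sum(head) / sum(ordered)


independent = simulate(feedback=False)
interdependent = simulate(feedback=True)

print(f"Independent choices: top 20% of products hold {top_share(independent):.0%}")
print(f"With feedback loop:  top 20% of products hold {top_share(interdependent):.0%}")
```

All products start out identical in both runs. The only difference is whether earlier choices influence later ones, and that alone is enough to turn small random head starts into a lasting lead.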
In a paper called “From Gaussian to Paretian Thinking: Causes and Implications of Power Laws in Organizations”, Andriani et al. found that the big challenges most companies and managers face are extremes, not averages.