For years, SEOs suspected that user behavior mattered to Google’s Search ranking systems, but Googlers either denied or belittled the point. Internal documents that surfaced in the 2020 US v. Google antitrust lawsuit prove the elephant’s existence.
We now have tracks, footprints, and photos of the elephant. The house owner doesn’t need to acknowledge it exists. But the biggest takeaway is that we had our mental model of how Search works upside down.
The source of Google’s magic
Google’s systems have likely evolved a lot, especially over the last 3 years, and the antitrust documents only go up to 2020. But at least until then, like a skilled magician, Google distracted us from the key ingredient of Search with documentation about content and technical SEO. User behavior, not content, is the source of Google’s magic. For a long time, Google has been logging events like clicks, hovers, scrolls, swipes, pauses, and query rewrites to understand what users want. (source)
“Two-way dialogue is the source of Google’s magic.”
An internal Google document states, “reliance on user feedback (clicks) in ranking has steadily increased over the past decade”. (source)
What turns our model of Search upside down is the acknowledgment that Google looks at documents but doesn’t actually understand them. Ranking systems use 3 groups of signals: content, user engagement, and backlinks. But based on internal documents, it seems Google looked primarily at metadata to understand content. (source)
“We don’t understand documents. We fake it.”
One of the documents puts it plainly: “Search works from induction.” Induction looks for patterns based on observation, while deductive reasoning starts with a theory. (source)
Induction = observation → pattern recognition → conclusion
Deduction = theory → data → analysis → conclusion
Google groups users based on their past behavior to predict what they want. Think about it like Amazon’s “other shoppers also bought”. Multiplied by hundreds of billions of searches, strong patterns emerge.
“…we simply use a user’s past actions to describe them and match users based on their behavioral similarity.” (source)
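Matching users “based on their behavioral similarity” can be sketched as a nearest-neighbor lookup over vectors of past actions. A minimal illustration, with invented users and event counts (nothing here reflects Google’s actual features or data):

```python
from math import sqrt

# Hypothetical behavior profiles: each user is described by counts of
# past actions (clicks, query rewrites) per topic. All data is invented.
users = {
    "user_a": {"running_shoes": 9, "marathon_training": 4},
    "user_b": {"running_shoes": 8, "marathon_training": 5},
    "user_c": {"tax_software": 7, "invoice_template": 3},
}

def cosine_similarity(u, v):
    """Cosine similarity between two sparse behavior vectors."""
    shared = set(u) & set(v)
    dot = sum(u[k] * v[k] for k in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def most_similar(target, users):
    """Return the behaviorally closest other user."""
    others = {name: vec for name, vec in users.items() if name != target}
    return max(others, key=lambda n: cosine_similarity(users[target], others[n]))

print(most_similar("user_a", users))  # user_b shares user_a's behavior pattern
```

At Google’s scale the same principle runs over billions of profiles with far richer features, but the core move is identical: describe users by their past actions, then predict intent from their nearest behavioral neighbors.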
The question is how to get into people’s minds early when they search. Since 15% of queries on Google are new, the key is to identify and go after new topics instead of established and competitive ones.
Most Google systems like RankBrain, RankEmbed BERT, or DeepRank wouldn’t work without user signals. We should assume that user behavior matters in all areas of SEO. For example, backlinks that are more likely to get clicks (as described in the user-sensitive PageRank patent) have higher value.
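The idea behind a user-sensitive PageRank can be sketched as a weighted random surfer: instead of splitting a page’s vote evenly across its outlinks, each link passes rank in proportion to how likely users are to click it. A toy illustration with an invented three-page graph and click shares (not the patent’s actual formulation):

```python
DAMPING = 0.85  # standard PageRank damping factor

# page -> {linked_page: observed click share of that link}; data invented
graph = {
    "a": {"b": 0.9, "c": 0.1},  # the link to b attracts far more clicks
    "b": {"c": 1.0},
    "c": {"a": 1.0},
}

def weighted_pagerank(graph, iterations=50):
    """Power iteration where each link's vote is scaled by its click share."""
    pages = list(graph)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1 - DAMPING) / len(pages) for p in pages}
        for page, links in graph.items():
            total = sum(links.values())
            for target, weight in links.items():
                # Pass rank along each link in proportion to its click share.
                new_rank[target] += DAMPING * rank[page] * weight / total
        rank = new_rank
    return rank

ranks = weighted_pagerank(graph)
# Page b ends up with more rank than it would under an equal 50/50 split
# of a's outlinks, because users actually click its backlink.
```

Swapping the click shares for equal weights recovers classic PageRank, which makes the difference easy to see: the heavily clicked backlink is simply worth more.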
Another example is the importance of optimizing product experiences for user intent, instead of focusing solely on content. A good example is the keyword “qr code scanner”, for which qrcodescan.in ranks #1.
The page has no content, just a way to use your computer’s camera to scan a QR code. It satisfies user intent, and since Google measures user behavior, ranking this page makes the most sense. But it also goes to show that content doesn’t matter in this case.
You would think that without filters like content and backlinks, user signals would overemphasize clickbait, spam, and porn. But the impact is not material. The most important ingredient in Google’s intent-prediction pie is “value judgments”, a.k.a. quality raters. Google has amassed over 16,000 quality raters who help fine-tune and validate new prediction outputs and models.
I assume these prediction systems were also implemented in other parts of Alphabet, like YouTube recommendations or Google Maps navigation. We often marvel at TikTok’s algorithm, but if Google really built Search mostly on user behavior and backlinks, its algorithm is at least as astonishing.
Circus show
Why would Google keep the use of user behavior signals a secret? You could argue it would get much better results by being open about the value of user interaction, because SEOs would focus more on satisfying users.
I can see 4 arguments:
1/ User behavior signals are (very) gameable, which means Google Search is more gameable than we thought. In my experience, traffic manipulation is very resource-heavy and not sustainable. But internal Google documents call out SEOs and competitors: “everything we leak will be used against us by SEOs, patent trolls, competitors, etc.” (Link)
2/ Google wants to maintain a competitive advantage. Using clicks for ranking was an open secret, but knowing exactly what events Google logs is much more valuable to competitors. However, Google does confirm that it gets a unique head start due to its data advantage, which is also one of the reasons it was able to open-source many (AI) models without fearing competition.
3/ Most Googlers truly didn’t know. In some ways, Google is a very siloed operation, designed to protect trade secrets. It’s possible that most people working at Google weren’t aware of the importance of user behavior signals and therefore spread the word that they weren’t very important.
4/ Google doesn’t want users to feel like they’re tracked. Look at all the logged events and how they are used to predict optimal results: Google even measures whether searchers unfold a knowledge panel or not. It’s reasonable to assume that public knowledge could have caused heavy brand damage, like Meta suffered in 2016.
Traffic as a moat
The more people search on Google, the better Google understands what users want - not just once but continuously.
In hindsight, Google’s playbook of developing free products to keep people searching becomes apparent. It built a moat of free and universal products: Chrome, Android, Gmail, YouTube, Maps, Calendar, Docs/Sheets/Slides, Translate, Flights/Hotels, Meet, News, Analytics, etc.
In 2013, Google rolled out a unified login for YouTube, Gmail, Search, etc., and built the clearest map of user behavior on the planet. In your Google account’s Activity controls, you can see logged events across almost all Google properties.
Add an $18B deal with Apple to be the default search engine, plus a 36% share of Safari search revenue, plus a few billion to Mozilla and Samsung, and you realize Google built several traffic moats around its castle that give it a unique competitive advantage in user insights. Google knows what you want, how your profile matches thousands or millions of other users, and uses that intel to train predictive algorithms for the searchers coming after you.
How well does Google really understand content?
Our model of what’s important in SEO has been wrong for a long time. We thought content was the base, backlinks the middle layer, and user signals sprinkled on top. It turns out user signals were the base, with backlinks as the middle layer and content understanding sprinkled on top.
Google might only now be getting better at truly understanding content quality. Beyond understanding word n-grams at first, and embeddings and vectors later, it seems Google was never able to understand what good content is without user signals.
Now, it makes sense why Google rolled Passage Ranking out in Feb 2021.
Martin Splitt about Passage Ranking:
“It’s just us getting better at more granularly understanding the content of a page, and being able to score different parts of a page independently.”
It sounds a lot like they’ve just developed a basic understanding of content in 2021:
“Passages is a ranking feature where we say like, this page covers these five different topics and one of the topics is this specific tomato kind, for instance… whereas the rest of the page talks about cucumbers and gardening in general.”
The latest evolution of content understanding is the Helpful Content Update (HCU). If Google had always been able to understand what good content is, why roll out Passage Ranking and the HCU over the last 3 years instead of early on? Google always knew what users preferred, but not necessarily why.
It’s an interesting time for internal pre-2020 documents to be revealed. LLMs have a serious chance of giving Google a run for its money. They are, in a sense, the opposite of search engines: they don’t have an index but understand language and meaning incredibly well, while search engines infer optimal results from past behavior by induction.
Now, search engines might be able to use LLMs to replace a bottleneck of user-preference prediction: quality raters.
We have found large language models can be effective, with accuracy as good as human labellers and similar capability to pick the hardest queries, best runs, and best groups. Systematic changes to the prompts make a difference in accuracy, but so too do simple paraphrases. To measure agreement with real searchers needs high-quality “gold” labels, but with these we find that models produce better labels than third-party workers, for a fraction of the cost, and these labels let us train notably better rankers. (Source)
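The validation step the study describes, checking model labels against human “gold” labels, can be sketched with a chance-corrected agreement score such as Cohen’s kappa. The labels below are invented for illustration; `llm_labels` stands in for whatever a real model would return:

```python
from collections import Counter

# Invented example data: human "gold" relevance labels vs. labels an LLM
# might assign to the same five query-document pairs.
gold_labels = ["relevant", "relevant", "not_relevant", "relevant", "not_relevant"]
llm_labels  = ["relevant", "not_relevant", "not_relevant", "relevant", "not_relevant"]

def cohens_kappa(a, b):
    """Agreement between two label sequences, corrected for chance."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    counts_a, counts_b = Counter(a), Counter(b)
    labels = set(a) | set(b)
    # Probability the two labelers agree by chance alone.
    expected = sum((counts_a[l] / n) * (counts_b[l] / n) for l in labels)
    return (observed - expected) / (1 - expected)

kappa = cohens_kappa(gold_labels, llm_labels)
# kappa near 1 means the LLM labels track the human judgments;
# near 0 means agreement is no better than chance.
```

If LLM labels reach human-level agreement at a fraction of the cost, the expensive step of scaling a 16,000-person rater pool turns into a prompt-engineering problem.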
The next elephant in the room?
3 more things
If you want to stay on top of Google updates, subscribe to Nick LeRoy’s SEO For Lunch.
For SEO rants you can dance to, subscribe to Sean Markey’s Ranktheory.