Could Google ever get piggybacked?
In cybersecurity, piggybacking is an attack in which an unauthorized user gains access to a system by exploiting the access of an authorized user. In growth, piggybacking means leveraging another platform's user base to feed your own.
Airbnb posted listings on Craigslist until they reached critical mass.
PayPal initially focused only on eBay transactions and then branched out.
YouTube created embeddable videos so users could post them on MySpace.
Now, Perplexity seems to be trying it with Google by letting the search engine index its answer pages (3,400 so far), which contain responses to user prompts.
Perplexity can generate endless answers to long-tail questions and flood Google's index with them. When visitors find these answers in Google Search, they might try Perplexity and eventually switch over from Google.
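To make the mechanic concrete, here's a hypothetical sketch of how a site could publish long-tail answers as individual indexable URLs and expose them to Google via a sitemap. The domain, URLs, and questions are made up for illustration; this is not Perplexity's actual implementation.

```python
# Hypothetical sketch of the piggybacking mechanic: publish each long-tail
# answer as its own indexable URL and list it in a sitemap so Google can
# crawl it. URLs and questions are invented; not Perplexity's code.
from urllib.parse import quote
from xml.sax.saxutils import escape

questions = [
    "how long do hiking boots last",
    "can you freeze sourdough starter",
    "what is retrieval augmented generation",
]

# One indexable answer page per question (hypothetical URL pattern)
urls = [f"https://example.com/search/{quote(q.replace(' ', '-'))}" for q in questions]

sitemap = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
    + "".join(f"  <url><loc>{escape(u)}</loc></url>\n" for u in urls)
    + "</urlset>"
)
print(sitemap)
```

Scale the list of questions up by a few orders of magnitude and you get the "flood Google with answers" effect described above.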
The erosion of Google's traffic moat
Piggybacking is a symptom of fading moats. eBay became the largest online third-party marketplace but never figured out a native online payment system. Craigslist got unbundled. MySpace got stuck in time. It seems Google was aware of the risk.
A document revealed in Google's lawsuit shows that Eric Lehman, a former web ranking engineer at Google, raised deep concerns in 2018 about deep ML systems outperforming Google.1
His reasoning: within 5 years, a machine learning system developed outside of Google could outperform Google in relevance assessment (how well a document matches a search query). His timing was almost perfect.
Lehman mentions that BERT was a step change in relevance that "abruptly subsumed essentially all preceding work." Translation was one of the first use cases where another company, DeepL, quickly caught up to Google, and it caught Google off guard:
For the web answers team, the tidal wave of deep ML that arrived in the last few weeks was a complete shock. With this warning, we should not allow ourselves to be caught off-guard again; rather, we should start thinking through the implications now.
Until then, Google didn't really understand the content of documents (“We don’t understand documents. We fake it.”), but compensated with user signals from hundreds of billions of searches.
From Elephant in the Room, where I wrote about all the mind-blowing reveals from the Google lawsuit:
Google groups users based on their past behavior to predict what they want. Think about it like Amazon’s “other shoppers also bought”. Multiplied by hundreds of billions of searches, strong patterns emerge.
AI threatened the advantage of that workaround.
Lehman again:
Huge amounts of user feedback can be largely replaced by unsupervised learning from raw text.
Whether Google developed BERT for that reason is unknown, but it came out a year later:
Today, BERT plays a critical role in almost every English query. This is because our BERT systems excel at two of the most important tasks in delivering relevant results — ranking and retrieving. Based on its complex language understanding, BERT can very quickly rank documents for relevance. We’ve also improved legacy systems with BERT training, making them more helpful in retrieving relevant documents for ranking.2
Relevance matching is a great example of a moat eroding due to technological progress. Google's traffic moat is not useless: hundreds of billions of searches still provide a massive advantage in understanding trends, training neural networks, and grouping users (e.g., Google's Topics API, which is replacing third-party cookies). But LLMs erode that moat in at least a few places.
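To illustrate what relevance matching without user signals looks like, here's a minimal sketch using an off-the-shelf BERT cross-encoder. The sentence-transformers library and the public MS MARCO checkpoint are assumptions for the example, not Google's actual stack; the point is simply that the model scores query-document pairs directly from raw text, no click data required.

```python
# Minimal sketch of LLM-style relevance matching: a pretrained BERT
# cross-encoder scores how well each document answers a query.
# Library and model name are assumptions, not Google's ranking system.
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "how do solar panels work"
documents = [
    "Solar panels convert sunlight into electricity using photovoltaic cells.",
    "Our bakery offers fresh sourdough bread every morning.",
]

# Higher score = more relevant to the query; no user feedback involved.
scores = model.predict([(query, doc) for doc in documents])
for doc, score in sorted(zip(documents, scores), key=lambda x: -x[1]):
    print(f"{score:.2f}  {doc}")
```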
If LLMs make relevance matching easier, what's the value of spending $50+ billion a year (Traffic Acquisition Cost, or TAC) to be the default search engine?
From Elephant in the Room again:
Add an $18b deal with Apple to be the default search on top, plus a 36% revenue share for Safari revenue, plus a few billion to Mozilla and Samsung, and you realize Google built several traffic moats around its castle that give it a unique competitive advantage of user insights. Google knows what you want, how your profile is identical to thousands or millions of other users, and uses that intel to train predictive algorithms for the searchers coming after you.
In his legendary book 7 Powers, Hamilton Helmer lays out 7 business moats:
Economies of Scale: do something more often
Network Effects: do things with exponential returns
Counter-Positioning: be the opposite
Switching Costs: be hard to leave
Brand: be known
Cornered Resource: be the only one with access
Process Power: do things better / faster
Google's TAC feeds several of these moats:
Brand awareness: people use googling and web searching synonymously.
Cornered Resources: no one else has access to that traffic.
Network Effects: Google uses the traffic to train many of its search systems.
Web ranking covers not just the order of classic results but all elements in the SERP. As Pandu Nayak testified in court, Google uses the NavBoost system to measure user engagement with web results and Glue for SERP features like People Also Ask boxes, map packs, or image carousels.3 Already back in 2019, Gary Illyes confirmed that Google uses clicks as a signal to decide which SERP features to display. So Google uses traffic to shape the layout of the entire SERP, not just to rank classic results. Keep in mind that ~15% of the queries Google sees every day are new, so it needs to figure out the optimal SERP layout quickly.
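As a toy illustration of engagement-based layout decisions, the sketch below aggregates hypothetical click logs into per-element click-through rates for a single query. NavBoost and Glue are vastly more sophisticated; the data and element names here are invented purely to show the basic idea.

```python
# Toy illustration of using click signals to order SERP elements.
# Data and logic are hypothetical -- this only shows the concept of
# engagement-based ranking, not Google's NavBoost/Glue systems.
from collections import defaultdict

# (query, serp_element, clicked) -- hypothetical logged interactions
logs = [
    ("best running shoes", "image_carousel", True),
    ("best running shoes", "image_carousel", False),
    ("best running shoes", "web_result_1", True),
    ("best running shoes", "map_pack", False),
    ("best running shoes", "web_result_1", True),
]

stats = defaultdict(lambda: [0, 0])  # element -> [clicks, impressions]
for _, element, clicked in logs:
    stats[element][1] += 1
    if clicked:
        stats[element][0] += 1

# Rank SERP elements for this query by click-through rate
ranked = sorted(stats.items(), key=lambda kv: kv[1][0] / kv[1][1], reverse=True)
for element, (clicks, imps) in ranked:
    print(f"{element}: CTR={clicks / imps:.0%}")
```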
The opportunity for Perplexity is that they could catch up to Google, at least in relevance matching, without many users. Perplexity has two more benefits:
Measuring the reaction to a single answer is easier than to many web results.
Users ask AI chatbots longer questions, which makes it clearer what they want.
Even quality raters seem to be replaceable with LLMs. Google reports over 16,000 external quality raters in its network. A startup would have to raise hundreds of millions of dollars to hire even half that number, and it would cover only one part of search. But LLMs might bring that cost close to zero - another moat erosion.4
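Here's a rough sketch of what an LLM-as-quality-rater setup could look like, assuming an OpenAI-compatible API and a rating scale borrowed loosely from Google's public rater guidelines. The prompt, model name, and labels are illustrative assumptions, not a documented pipeline.

```python
# Sketch: replacing a human quality rater with an LLM judgment.
# Prompt, model name, and labels are assumptions for illustration;
# Google's rater guidelines and pipeline are far richer.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def rate_result(query: str, page_excerpt: str) -> str:
    """Ask the model for a coarse quality rating of one search result."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "You rate how well a page satisfies a search query. "
                        "Answer with one label: Fails to Meet, Slightly Meets, "
                        "Moderately Meets, Highly Meets, or Fully Meets."},
            {"role": "user",
             "content": f"Query: {query}\n\nPage excerpt: {page_excerpt}"},
        ],
    )
    return response.choices[0].message.content.strip()

print(rate_result(
    "symptoms of vitamin d deficiency",
    "Vitamin D deficiency can cause fatigue, bone pain, and muscle weakness...",
))
```

Run that over a sample of queries and pages and you have a crude, near-zero-marginal-cost stand-in for part of a rater program.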
Meta vs. Google - round 2
In Google+ was born to die, I described the intense rivalry between Google and Facebook:
In September 2012, it announced that the service had 400 million registered users and 100 million active ones. Facebook hadn’t even quite reached a billion users yet, and it had taken the company four years to reach the milestone—100 million users—that Google had reached in one.
This contest had so rattled the search giant, intoxicated as they were with unfamiliar existential anxiety about the threat that Facebook posed, that they abandoned their usual sober objectivity around engineering staples like data and began faking their usage numbers to impress the outside world, and (no doubt) intimidate Facebook.
Back then, Facebook quickly grew its user base. As we now know, traffic was the key ingredient in Google's magic soup. No wonder Google panicked and spun up Google+.
In 2018, the rivalry eventually died down because Google realized that Facebook wasn't aiming to replace it. Both compete for ad dollars, but the targeting mechanics (intent-based vs. behavior-based) are fundamentally different. On top of that, Facebook got caught up in the Cambridge Analytica scandal. So, Google killed Google+.
Fast forward to 2024: the concept of aggregating the web to give the best search results doesn't seem as appealing as learning from the web to give a single answer. Meta and Google are competing more directly again.
Meta open-sources its LLMs to tilt the ecosystem in its favor. But if someone - it could even be Meta itself - were to develop a Google competitor with Meta LLMs, it would only hurt Google. Meta can easily drop a bomb in Google's garden because they have no horse in the search race.
And that's kind of what's happening: Perplexity's PPLX models are built (in part) on Meta's Llama 2 70B model. Meta also claims to have better data than the open web, which is an old moat. Google was afraid social network data would enable a better advertising model, but in 2023, Meta's total revenue ($134 billion) was only ~56% of Google's advertising revenue ($237 billion). It turns out the real danger in Meta's data lies in model training.
From Q4 2023 Earnings Super Bowl:
When people think about data, they typically think about the corpus that you might use to train a model upfront. And on Facebook and Instagram, there are hundreds of billions of publicly shared images and tens of billions of public videos, which we estimate is greater than the common crawl data set. And people share large numbers of public text posts and comments across our services as well.
Deposition
In geology, the opposite of erosion is deposition. Resources like chalk or coal are the result of organic material, like plankton or plants, being deposited and compressed under pressure.
New technology makes old things easy and new things hard. LLMs erode the advantage of volume and create pressure to produce content of intense value. The unlock for SEO is figuring out what type of content LLMs cannot create.
Any content that has a clear structure is not only easy for machines to create but will likely be answered by a bot in the future. But unstructured content without a predictable outline cannot be created by LLMs. Think of how good storytellers lead you to think a story is about one topic while it eventually turns out to be about something else entirely.
It's that type of serendipitous, surprising experience that algorithms cannot create. You piggyback on your audience's attention to get them from where they think they want to go to where you want them to be. The future of content marketing is storytelling, not ultimate guides. Relatability over length.
As marketers, we could talk to our audience and fish for answers to questions like:
What is my target audience trying to achieve but doesn't know how to do yet?
Where does my audience's understanding of the problem not match reality?
What is surprising about the way my product solves a problem?
Where is the gap between the actual problem and what my audience thinks the problem is?
How can I bring my audience from understanding the problem to solving it in a single piece of content?
How can I make content relatable?
"In his legendary book 7 Powers, Hamilton Helmer lays out 7 business moats:"
Just an FYI, the links in this sentence are broken. It looks like you're using Obsidian. How do you organize your notes there?