The impact of GPT-3 on Google Search, a complex-adaptive system
In this article, I explain what could happen if GPT-3 allowed us to automate content creation. A thought experiment helps us understand the connection to the idea of query-specific ranking signals.
Everybody's reaction to the many GPT-3 demos on Twitter has been the same: First, it's "oh my god, is this real?" Then, it's "People are gonna lose their jobs." The fear of GPT-3 replacing writers and designers is real. While I think some of the concerns are realistic, many are not. Let me explain.
Imagine high-quality content was automatable
Let's do a thought experiment ("Gedankenexperiment" in German - we love those): Imagine everybody could use natural language generation to produce vast amounts of content without much effort. What would happen?
There are two opinions.
One says that writers would become redundant and lose their jobs. They would be replaced by artificial intelligence, as Microsoft recently did with some of its staff.
Microsoft is laying off dozens of journalists and editorial workers at its Microsoft News and MSN organizations. The layoffs are part of a bigger push by Microsoft to rely on artificial intelligence to pick news and content that’s presented on MSN.com, inside Microsoft’s Edge browser, and in the company’s various Microsoft News apps. https://www.theverge.com/2020/5/30/21275524/microsoft-news-msn-layoffs-artificial-intelligence-ai-replacements
Content creation would be taken over by machines and the company with the mightiest algorithms would become unbeatable.
The other opinion is that NLG could take boring and dull tasks off of writers' shoulders and let them focus on the exciting and creative stuff. The art. Editing. Storytelling. The AI would be like an intern on steroids that can execute orders at blazing speed and vast scale but not "think for itself".
I hold the second opinion, at least with the information I have at hand right now.
What I didn't tell you in the quote above is that the writers Microsoft laid off were actually content curators (a job that algorithms can easily take on).
In recent weeks, we've seen many examples of GPT-3, an NLG technology that can create good content at scale. It instills both excitement and the fear of being replaced by machines. But I don't think it's that simple.
Complex-adaptive systems
Machine learning technology took Google Search from a simple to a complex adaptive system.
A complex adaptive system is a system in which a perfect understanding of the individual parts does not automatically convey a perfect understanding of the whole system's behavior. In complex adaptive systems, the whole is more complex than its parts, and more complicated and meaningful than the aggregate of its parts. https://en.wikipedia.org/wiki/Complex_adaptive_system
Back in the day (about 10 years ago), you could be successful with keyword stuffing or aggressive link building because the system worked in a linear way. The old-school understanding of "the ranking algo" is the idea of a fixed set of ranking factors.
But it's not 2005 anymore.
There are too many parts in Google's ranking algorithms ("the whole") for us to understand how big the impact of a single signal ("its parts") is. The modern understanding is a complex system of many algorithms and signals, differentiated by query and user intent.
Gary Illyes explained it many times, for example at the SF SEO Meetup in 2019:
There’s, of course, not just one algorithm. There are probably millions of little algorithms that work together in unison. One algorithm might endorse what the scientific community thinks [Gary notes that he’s citing the search quality rater guidelines here]. https://www.kevin-indig.com/my-notes-from-the-gary-illes-qa-bay-area-search/
Even further, depending on what search result mix shows positive results, e.g. satisfied user-intent or no query refinements by the user, new ranking signals could be added. The system learns and evolves based on user (and human quality rater) feedback.
We marketers contribute to a problem in this logic: sample bias. We look at the top search results to understand what Google wants to show, which results in the creation of more of the same (I do this, too). It makes sense. But it also creates content filter bubbles that can be difficult for complex adaptive systems to break.
How does that relate to NLG and automated content creation? It's simple: if everybody had great content, it wouldn't be a differentiator anymore. It would be table stakes. Just like HTTPS or mobile-friendliness. The impact of "scalable" content on rankings (in certain verticals) would be much lower than it is today.
Let me illustrate this with a second Gedankenexperiment: AI and the stock market.
A common fear is that we could invent an artificial intelligence that can predict the stock market and one person or company would rule the economy. But the stock market is also a complex adaptive system (so are the human brain and the immune system). In theory, the stock market would react to the investments made by the AI, which would change the circumstances and throw its predictions overboard.
The concept of query-specific ranking factors illustrates this even further.
Query-specific ranking factors
In a conversation in November 2019 (at Redmond Airport, when our flight was delayed), Jordan Koene and I talked about the idea of query-specific ranking signals.
I first heard about the concept of industry and niche ranking factors from Searchmetrics. In essence, the idea is that ranking signals are weighted differently per query. For life insurance-related queries, for example, HTTPS and E-A-T play a bigger role. For furniture e-commerce, it might be internal links and structured data.
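To make the idea of per-query weighting concrete, here is a minimal sketch in Python. The signal names, the weights, and the two example pages are all invented for illustration; nothing here reflects how Google actually scores pages. It only shows how the same two pages can swap places when the weights change with the query vertical.

```python
# Hypothetical signal weights per query vertical. The numbers are invented
# for illustration and are not real ranking weights.
WEIGHTS = {
    "life_insurance": {"https": 0.3, "eat": 0.4, "internal_links": 0.1, "structured_data": 0.2},
    "furniture_ecommerce": {"https": 0.1, "eat": 0.1, "internal_links": 0.4, "structured_data": 0.4},
}

# Hypothetical pages with signal values between 0 and 1.
PAGES = {
    "trusted_expert_site": {"https": 1.0, "eat": 0.9, "internal_links": 0.3, "structured_data": 0.2},
    "well_structured_shop": {"https": 1.0, "eat": 0.3, "internal_links": 0.9, "structured_data": 0.9},
}


def score(page_signals, vertical):
    """Weighted sum of a page's signal values for one query vertical."""
    weights = WEIGHTS[vertical]
    return sum(weights[name] * page_signals.get(name, 0.0) for name in weights)


def rank(pages, vertical):
    """Return page names ordered from highest to lowest score."""
    return sorted(pages, key=lambda name: score(pages[name], vertical), reverse=True)


print(rank(PAGES, "life_insurance"))
print(rank(PAGES, "furniture_ecommerce"))
```

With these made-up numbers, the expert site leads for the life insurance vertical and the shop leads for furniture e-commerce, even though neither page changed.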
Hyper-specialization is the result of a fine-tuned complex-adaptive system, which is why it fits so well. Complex-adaptive systems adapt to small differences and inputs over time to refine the outcome. Anything that could be valuable for a user in a certain context could make a difference: speed, accuracy, expertise, etc. Google learns from what users seek and makes small adjustments to the search results. The tweaks are then verified by quality raters and user behavior. Then, they're kept as a staple (or "rolled into the core ranking algorithm").
That should emphasize how access to high-quality content at scale would dilute content ranking signal(s).
And that brings me to GPT-3.
GPT-3
GPT-3 is a task-agnostic machine learning technology that requires minimal fine-tuning. You give it a few examples and it can extrapolate an output from that.
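"Giving it a few examples" usually means putting them directly into the prompt text. The sketch below only builds such a prompt as a string and does not call any model or API. The "Input:" / "Output:" layout, the function name, and the example pairs are assumptions made for illustration, not a required format.

```python
def build_few_shot_prompt(examples, new_input):
    """Join (input, output) example pairs and one new input into a prompt string."""
    lines = []
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")  # blank line between examples
    # The prompt ends with an open "Output:" for the model to complete.
    lines.append(f"Input: {new_input}")
    lines.append("Output:")
    return "\n".join(lines)


# Hypothetical examples: product category in, one line of category copy out.
EXAMPLES = [
    ("sofa", "Find a sofa that fits your living room."),
    ("desk", "Find a desk that fits your home office."),
]

prompt = build_few_shot_prompt(EXAMPLES, "bookshelf")
print(prompt)
```

A language model given this text would be expected to continue after the final "Output:" in the style of the two examples.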
There are many stunning examples of GPT-3's power on the web.
Style rewriting
Text completion and the combination of style rewriting and text completion. What else should I add to round out the writing tools? GPT-3 #gpt3 pic.twitter.com/XSc6n1hqE2
— Carlos E. Perez (@IntuitMachine) July 25, 2020
Basic search engines
I made a fully functioning search engine on top of GPT3.
For any arbitrary query, it returns the exact answer AND the corresponding URL.
Look at the entire video. It's MIND BLOWINGLY good.
cc: @gdb @npew @gwern pic.twitter.com/9ismj62w6l— Paras Chopra (@paraschopra) July 19, 2020
Memes (thank god)
Tired: Making your own memes
Wired: Asking @OpenAI's #gpt3 to make memes.
Amazed to see how much of cultural subtext and nuance language models can pick up on. cc: @gwern @gdb pic.twitter.com/eBrFAWiZhA— Mrinal Mohit (@wowitsmrinal) July 25, 2020
GPT-3 is a huge milestone in Natural Language Generation. It has 175 billion parameters, compared to 1.5 billion for its predecessor GPT-2, roughly a 117x increase.
However, GPT-3 is not yet ready to replace writers or designers. Mike King describes how to create category text with GPT-2 in a very detailed article, which also shows its limitations. One of them is that most machine-generated content is machine-detectable and there are already tools for that. And Will Critchlow points out that Google already has guidelines against low-quality machine-generated content.
The content apocalypse
The content apocalypse isn't happening yet, but GPT-3 is an intense reminder of what's about to come. This raises the question of what would happen if we had the power to create very high-quality content at scale; if GPT-3 were as good as we feared.
Here's what I think.
First, it wouldn't take long for every site to have it. In the beginning, it might be just a few companies that can act fast and set up the pipelines, but that window of advantage would be relatively short. Soon, it would be available at low cost and with little friction. A couple of SaaS companies - some new, some already existing - would provide cheap ways to create content at scale.
Content would then become a hard requirement - binary - similar to HTTPS. In other words, I think content would lose its impact on ranking, at least when we talk about generic and easy-to-produce content such as the text at the bottom of category pages. And because search engines would lower its impact, companies would evaluate its value for users more closely.
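The "table stakes" argument can be shown with a toy ranking model: once every page reaches the same content score, that signal no longer separates pages, and the order is decided by whatever still varies. This is a sketch under invented assumptions. The two signals, their weights, and the site scores are hypothetical.

```python
def rank(pages, weights):
    """Order page names by weighted signal score, best first."""
    def score(name):
        return sum(weights[signal] * pages[name][signal] for signal in weights)
    return sorted(pages, key=score, reverse=True)


# Hypothetical weights: content counts for more than links.
WEIGHTS = {"content": 0.6, "links": 0.4}

# Today: content quality varies between sites, so it drives the order.
today = {
    "site_a": {"content": 0.9, "links": 0.4},
    "site_b": {"content": 0.5, "links": 0.6},
}

# After cheap NLG: every site reaches the same content score.
saturated = {name: {**signals, "content": 1.0} for name, signals in today.items()}

print(rank(today, WEIGHTS))
print(rank(saturated, WEIGHTS))
```

In the saturated case the content term adds the same amount to every page, so the result is identical to ranking by links alone. The weight on content is still there, but it no longer changes the outcome.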
Human writers would indeed not be necessary anymore to create boilerplate-like text, i.e. the content you find on many category or product pages. I'd also argue that humans never "loved" creating that content in the first place and that this will free up resources to focus more on editing, creative writing, thought leadership, and "exciting" stuff.
I can also see content being customized and fine-tuned almost in real time, especially landing page copy. We could feed an NLG model like GPT-3 with real-time user behavior data to tweak copy and even automate A/B testing until the best-performing version is found.
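One common way to automate this kind of testing is a multi-armed bandit. The sketch below uses a simple epsilon-greedy strategy, which is my choice for illustration and not something GPT-3 provides. The click rates are simulated, and the function name, the parameters, and the three rates are all hypothetical.

```python
import random


def epsilon_greedy(true_rates, rounds, epsilon=0.1, seed=0):
    """Simulate showing copy variants to users; return (shows, clicks) per variant."""
    rng = random.Random(seed)
    shows = [0] * len(true_rates)
    clicks = [0] * len(true_rates)
    for _ in range(rounds):
        if rng.random() < epsilon or 0 in shows:
            # Explore: show a random variant. This branch is also used
            # until every variant has been shown at least once.
            variant = rng.randrange(len(true_rates))
        else:
            # Exploit: show the variant with the best observed click rate.
            variant = max(range(len(true_rates)), key=lambda i: clicks[i] / shows[i])
        shows[variant] += 1
        # Simulated user: clicks with the variant's hidden true rate.
        if rng.random() < true_rates[variant]:
            clicks[variant] += 1
    return shows, clicks


# Three hypothetical copy variants with made-up click rates.
shows, clicks = epsilon_greedy(true_rates=[0.02, 0.05, 0.03], rounds=5000)
print(shows, clicks)
```

In a real setup, the simulated click would be replaced by actual user behavior, and new variants generated by the NLG model could be added as further arms. Unlike a classic A/B test with a fixed split, this approach shifts traffic toward the better-performing copy while the test is still running.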
NLG models could also be fed search results data to refine content until a higher position is reached, the page appears in the Featured Snippet, or the FAQ snippet shows just enough of the answers for users to click through. We could build automated skyscraper techniques. The possibilities are endless.
John Mueller from Google even confirmed that it might not matter whether machines or humans create content, as long as it's valuable.
However, I also see a possible point of oversaturation at which more content is simply not helpful for users. There is a limit.
What do you think? Drop a comment!