Optimism among publisher CEOs is not high.
Just under half (47%) of our sample of editors, CEOs, and digital executives say they are confident about the prospects for journalism in the year ahead, with around one-tenth (12%) expressing low confidence. Stated concerns relate to rising costs, declining advertising revenue, and a slowing in subscription growth – as well as increasing legal and physical harassment.1
Publishers are getting squeezed from multiple sides:
Few are prepared for the death of 3rd party cookies.
Social networks send less traffic (Meta) or deteriorate in quality (X).
Most young people get news from TikTok.
SGE looms on the horizon.
From The cookie crumbles:
McKinsey estimated publishers will lose $10 billion in ad revenue.
NAB found that the broadcast radio and television industry would lose $2.1 billion in digital advertising revenue annually, representing 6.3% of the industry’s total advertising revenue, if third-party cookies were eliminated today with no privacy-preserving alternatives.
Another symptom of the free content monetization issue is that news aggregator Artifact shut down. Reason: The market opportunity is too small.
I really enjoyed the app. It was created by Kevin Systrom and Mike Krieger, the co-founders of Instagram, who might know a thing or two about building successful consumer apps. Artifact made news scanning easy, summarized key points with AI, and provided a seamless experience. But even a great offering is not enough when the market isn’t there.
Ads are not the only way to monetize “free” content. Affiliate is another business model, and it’s also under heavy pressure.
According to Axios, Red Ventures is exploring a sale of CNET2:
Red Ventures has been quietly approaching strategic buyers, mostly other large media holding firms, for several months to gauge their interest in CNET, but talks began to ramp up before the holidays, sources told Axios.
Red Ventures bought CNET in 2020 for $500 million. Over the last 3 years, total traffic to cnet.com has declined by over 70%.
That’s not to say externalities alone are to blame for CNET’s decline, but consider that some of the best SEOs and content marketers in the industry work at Red Ventures. The competition is tough. Google has become hardcore. Affiliate rates have been cut many times.
It’s rough out there. Soon, SGE could come on top of the end of 3rd-party cookies and shrinking traffic to the open web.
The SGE hammer
An article from the Wall Street Journal uncovered uncomfortable statistics about the impact of SGE3:
About 40% of the magazine’s web traffic comes from Google searches, which turn up links that users click on. A task force at the Atlantic modeled what could happen if Google integrated AI into search. It found that 75% of the time, the AI-powered search would likely provide a full answer to a user’s query and the Atlantic’s site would miss out on traffic it otherwise would have gotten.
While Google says the final shape of its AI product is far from set, publishers have seen enough to estimate that they will lose between 20% and 40% of their Google-generated traffic if anything resembling recent iterations rolls out widely.
20-40% less traffic would mean the end for many publishers if no alternative pops up.
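To make the math concrete, here is a back-of-the-envelope sketch in Python. Every input (monthly visits, Google share, the share of queries SGE answers fully, and the click-through drop on those queries) is my own assumption for illustration, not a figure from the WSJ piece or the Atlantic model.

```python
# Back-of-the-envelope model of SGE traffic impact.
# All inputs below are illustrative assumptions, not reported data.

monthly_visits = 1_000_000      # hypothetical total monthly visits
google_share = 0.80             # share of traffic coming from Google search
full_answer_rate = 0.75         # share of queries SGE answers fully (Atlantic-style scenario)
click_loss_when_answered = 0.5  # assumed drop in clicks when SGE answers the query itself

google_visits = monthly_visits * google_share
lost_visits = google_visits * full_answer_rate * click_loss_when_answered

print(f"Google visits today:     {google_visits:,.0f}")
print(f"Estimated visits lost:   {lost_visits:,.0f}")
print(f"Loss as share of Google: {lost_visits / google_visits:.0%}")  # ~38%, inside the 20-40% range
print(f"Loss as share of total:  {lost_visits / monthly_visits:.0%}")
```

Plug in your own site’s numbers; the higher your Google share, the uglier this gets.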
Many content sites get over 80% of traffic from Google.
Barry Diller, chairman of IAC and Expedia, said all major AI companies, including Google and rivals like OpenAI, have promised that they would continue to send traffic to publishers’ sites. “How they do it, they’ve been very clear to us and others, they don’t really know,” he said.
Many of IAC’s properties, like Brides, Investopedia and the Spruce, get more than 80% of their traffic from Google, according to SimilarWeb.
One complication is that Google can’t trace the sources behind LLM outputs. As a result, more publishers might turn to AI to create more content to make up for traffic gaps.
IAC senior executives met with Google senior executives at the Allen & Company conference in Sun Valley, Idaho, in July to discuss AI. Google told publishers at the meeting that it can’t directly trace the sources behind the outputs of AI systems, despite recent technological advances, said people familiar with the discussions.
Already today, you can buy services that create hyper-local AI-generated content.
The same WSJ article states that SGE is being tested with 10 million users, by the way. It’s such a good piece that I’m gifting you a free version.
The pressure on publishers and affiliates paired with newly available technology leads to more SEO spam.
LLMs and spam
A new research study from Germany has investigated whether “Google is getting worse”.4 To get an answer, the researchers gathered results on Google, Bing, and DuckDuckGo for 7,392 product review queries. Then, they compared those rankings with the research search engine ChatNoir, which retrieves content from the ClueWeb22 corpus with BM25* and reranks it with BERT.5
*(BM25 is a score calculated by considering how often the query terms appear in the document, adjusting for the document's length and the uniqueness of the query terms in the entire document collection. Documents with higher scores are considered more relevant to the query.)
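For the curious, here is a minimal BM25 scorer in Python over a toy corpus. It’s a textbook Okapi BM25 implementation with common default parameters (k1=1.5, b=0.75), not the exact configuration ChatNoir uses.

```python
import math
from collections import Counter

def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    docs = [doc.lower().split() for doc in documents]
    avgdl = sum(len(d) for d in docs) / len(docs)   # average document length
    n_docs = len(docs)

    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            df = sum(1 for d in docs if term in d)                # documents containing the term
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)  # rarer terms weigh more
            freq = tf[term]
            score += idf * (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    return scores

corpus = [
    "best running shoes 2024 review and buying guide",
    "how to clean running shoes at home",
    "best budget laptops 2024 review",
]
print(bm25_scores("best running shoes review", corpus))
```

BM25 is a cheap, purely lexical first pass; in the study’s setup, a BERT model then reranks the top candidates to add semantic matching.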
The study reveals a few interesting points:
First, it’s easy to recognize affiliate content in search engines. If the researchers can, Google can.
We also find strong correlations between search engine rankings and affiliate marketing, as well as a trend toward simplified, repetitive, and potentially AI-generated content.
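As a rough illustration of how detectable affiliate content is, even a naive heuristic goes surprisingly far: count outbound links that match well-known affiliate URL patterns. This is my own sketch, not the researchers’ feature set, and the pattern list is illustrative rather than exhaustive.

```python
import re

# Well-known affiliate URL patterns (illustrative, not exhaustive).
AFFILIATE_PATTERNS = [
    r"amazon\.[a-z.]+/.*[?&]tag=",   # Amazon Associates tracking tag
    r"amzn\.to/",                    # Amazon short links
    r"go\.skimresources\.com",       # Skimlinks redirect
    r"anrdoezrs\.net|dpbolvw\.net",  # CJ Affiliate redirect domains
    r"shareasale\.com/r\.cfm",       # ShareASale redirect
]

def affiliate_link_ratio(html: str) -> float:
    """Share of outbound links that look like affiliate links."""
    links = re.findall(r'href="([^"]+)"', html)
    if not links:
        return 0.0
    hits = sum(1 for url in links
               if any(re.search(p, url, re.IGNORECASE) for p in AFFILIATE_PATTERNS))
    return hits / len(links)

page = ('<a href="https://www.amazon.com/dp/B0TEST?tag=somesite-20">Buy</a> '
        '<a href="https://example.com/about">About</a>')
print(f"{affiliate_link_ratio(page):.0%} of links look like affiliate links")
```

If a few regexes and a ratio can flag this much, a search engine with full crawl data and click signals certainly can.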
Second, Google is getting better over time but still features a significant amount of SEO spam in the top results.
Third, it’s very hard to stay on top of spammers. It’s a trench war: spammers get hit by algorithm updates and slowly fight their way back.
The study results make it painfully obvious why Google fired a barrage of algorithm updates around reviews on the web over the last 3 years.
The amount of spam and over-optimized content on Google has been criticized a lot lately. The German study has also found a relationship between search optimization and lower perceived quality:
A recent study by Schultheiß et al. investigates the compatibility between SEO and content quality on medical websites with a user study. The study finds an inverse relationship between a page’s optimization level and its perceived expertise, indicating that SEO may hurt at least subjective page quality.
A big part of the “problem” is that SEO works, and companies use it to make money. Personally, I’ve seen many examples of content that is optimized and has a higher perceived quality as a result. But it’s not the default. The truth is, you can optimize content for search, make it rank and also make it very generic in the process. Publishers and affiliates who need to save revenue won’t be able to shy away from that for long.
From Content Goblins:
⛏️The underlying problem: These days, I cringe when people separate “content for Google” from “content for users”. And yet, not everything that’s good for readers is good for SEO. The winning playbook in SEO often leads to mediocre content, but it works. The harsh reality is Google hasn’t figured out how to reward the best content and filter out fluff. To some degree, you need to blame the game, not just the players.
The study shows not just review farms and pure spam sites but also many publishers who get away with lower quality:
The most common category of new pages that enter the top 30 for the first time seems to be magazines, which hints that having a separate low-quality review section to support a site’s primary content is a successful and lucrative business model.
Perplexity AI CEO Srinivas sees AI chatbots as the solution to the problem:
“With Perplexity, there’s no need to click on different links, compare answers or endlessly dig for information,” he said. “The era of sifting through SEO spam, sponsored links and multiple sources will be replaced by a more efficient model of knowledge acquisition and sharing, propelling society into a new era of accelerated learning and research.”6
Perplexity raised $73.6 million from investors like IVP, Jeff Bezos, Tobi Lütke and other big names. The goal: take on Google with an AI search engine.
While over 10 million people already use Perplexity, taking on Google is nearly impossible. The opportunity is juicy: 1% of Google’s revenue is $2.8 billion, and Perplexity doesn’t have the same constraints as Google. But many others, like Neeva, tried and failed despite massive funding and key executives from Google.
Perplexity throws a bone to the open web by diligently linking out to sources, which increases the chances of survival for publishers. However, publishers aren’t silently waiting for their death sentence.
Lawsuits
The New York Times (NYT) is suing OpenAI for copyright infringement. The strongest argument is that ChatGPT reproduces some of their articles verbatim, as the filing shows.7
The phenomenon of LLMs reproducing content verbatim from training data is called memorization, and it’s a bug, according to OpenAI’s response to the lawsuit.8
While there is something to be said about being compensated for training data, the NYT lawsuit might not be the obvious winner it initially seems.
For example, the prompts used in the lawsuit filing seem to have been set up in a way that left ChatGPT no choice but to return the content verbatim.
The prompts are similar to brand queries on Google. Of course, Google returns results from a site when its name is mentioned in the search. ChatGPT behaves no differently, and why should it?
The lawsuit argues OpenAI reproduced paywalled articles, but the article prompted in the screenshot above (“Snow Fall: The Avalanche at Tunnel Creek”) is not paywalled. On top of that, some NYT articles are quoted in full on web forums and other 3rd-party websites.
From OpenAI’s response (bolding mine):
Interestingly, the regurgitations The New York Times induced appear to be from years-old articles that have proliferated on multiple third-party websites. It seems they intentionally manipulated prompts, often including lengthy excerpts of articles, in order to get our model to regurgitate. Even when using such prompts, our models don’t typically behave the way The New York Times insinuates, which suggests they either instructed the model to regurgitate or cherry-picked their examples from many attempts.
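If you want to sanity-check regurgitation claims yourself, the mechanics are simple: feed a model a lengthy excerpt of an article as the prompt, then measure how much of its completion matches the original verbatim. Here’s a minimal sketch with Python’s standard library; the article and completion strings are placeholders, so plug in real text and output from whatever model you test.

```python
from difflib import SequenceMatcher

def longest_verbatim_run(original: str, completion: str) -> str:
    """Longest contiguous chunk of the completion that appears verbatim in the original."""
    m = SequenceMatcher(None, original, completion, autojunk=False)
    match = m.find_longest_match(0, len(original), 0, len(completion))
    return completion[match.b:match.b + match.size]

article = ("Placeholder article text. In a real test you would paste the full body "
           "of the article whose memorization you want to measure.")
prompt = article[:50]          # a lengthy excerpt, as described in OpenAI's response
completion = "article whose memorization you want to measure."  # placeholder model output

run = longest_verbatim_run(article, completion)
print(f"Longest verbatim run: {len(run)} characters -> {run!r}")
```

Long runs on short, generic prompts would be damning; long runs only when the prompt already contains most of the article support OpenAI’s manipulation argument.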
Many see the NYT lawsuit as a precedent for other publishers, but the NYT might actually benefit from AI content and chatbot usage. There are thousands of outlets for news, most of them replaceable. But the New York Times and a few other publishers have a highly differentiated brand.
The NYT is also set up for success, being the largest paywalled site on the web:9
70% of revenue comes from subscriptions (47% digital, 23% print)
20% of revenue comes from advertising
10% of revenue comes from affiliate (The Wirecutter), events, and more
The lawsuit raises several interesting questions for ad- and affiliate-monetized content on the web moving forward:
How can publishers and affiliates protect their content from being crawled by LLM developers or non-profits like Common Crawl? (A robots.txt sketch follows after this list.)
What does the primary monetization model for publishers and affiliates look like in an LLM-first world?
Where is the line between a non-profit vs a for-profit using copyrighted content? OpenAI seems to have used public indices of web content like Common Crawl when it was still a non-profit.
Are Featured Snippets less of a copyright infringement than LLM quotes or paraphrasings? Is it okay when forums or blogs copy/paste paywalled articles?
What are the responsibilities of search engines vs. AI chatbots when it comes to giving direct answers?
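On the first question (blocking LLM crawlers), the main lever publishers have today is robots.txt. GPTBot (OpenAI), CCBot (Common Crawl), and Google-Extended (Google’s AI-training token) are real user-agent tokens; everything else in this sketch, including the example URL, is illustrative. The snippet uses Python’s standard library to verify the directives behave as intended.

```python
from urllib.robotparser import RobotFileParser

# Directives a publisher might ship to opt out of AI-training crawls,
# while leaving regular search crawlers untouched.
robots_txt = """
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

url = "https://example.com/reviews/best-laptops"
for bot in ["GPTBot", "CCBot", "Google-Extended", "Googlebot"]:
    print(f"{bot:16} allowed: {parser.can_fetch(bot, url)}")
```

Two caveats: compliance is voluntary on the crawler’s side, and blocking future crawls doesn’t remove pages already sitting in older Common Crawl snapshots or training sets.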