We tracked 70 users across 8 tasks to deeply understand their behavior, thoughts, and emotions when engaging with Google's AI Overviews and other SERP features. The findings paint a new picture of SEO.
Masterclass of a post.
Seems like a must read for any growth marketer, thank you Kevin for this gold mine of insights.
I watched the video and skimmed through, but I’m definitely booking a good hour tomorrow to go through it in detail. More comments to follow.
Thanks, mate!
Thanks for this amazing post. It was cool to see GoodRx show up in the AIO in the health-related user video 😎
My pleasure! Good work, GoodRx ;-).
Damn, I love this. It’s highly relevant to the Google engineer’s comment on the DOJ case: “When search results are worse, people attempt fewer tasks. When they're better, they attempt more.”
To me, this highlights the need for a framework that differentiates task completion vs task expansion queries.
This approach can better align brand perception (EEAT), content type, format, site architecture, SEO best practices, UX, CRO, new flavor of Off-site / barnacle SEO (3rd party + social validation paths), and metrics (including brand sentiment / customer happiness scores).
It’s not a new concept, but the application and medium are evolving as user behavior shifts alongside AI-driven experiences.
Secondly, are we building a generation of answer-dependent users, or simply refining the art of finding truth?
Hopefully, it’s the latter. With thoughtful design and user-centric experiences, the ecosystem can still make the web a better place, even as the landscape shifts and traditional visibility declines. I love where SEO is going, thank you and the team for sharing this.
On point!
The Kevin Indig value!!! 🔥
And all for free?!?! He never stops 🎉
🙏🙏🙏
This is excellent. Thanks for putting it together. Loads of insights to shamelessly steal (I mean, strategically summarise) for clients.
If you’re open to it, I had a few follow-up questions:
1. In my experience, one challenge with remote usability testing is that participants often engage more deeply than they would in an organic setting (demand characteristics, and all that).
Do you think your testers might have paid more attention to AIOs than typical users would in the wild, possibly making the CTR drop-offs here lower than what might happen naturally? Or does this roughly align with the larger-scale traffic patterns you’ve seen?
2. How much variance did you notice in scroll depth or click-out rates across categories? Did user behaviour tend to cluster, or did it vary widely by query type? And did you categorise queries in any more granular ways that could be used to stratify the data, beyond health/DIY etc (e.g., by query structure, industry, etc.)?
Essentially, I’m wondering how likely it is for scroll depth to vary significantly across industries and different query types.
3. That section on users clicking through to social platforms to validate AIOs is fascinating—30% on desktop is higher than I’d have guessed. When you say a user clicked through to Reddit or other forums, was that specifically through the “Discussions and forums” SERP feature, or did it also include clicks on organic listings or search refinements?
Sorry if any of those are answered in the write-up and I missed it. Really appreciate the work you’ve done here.
Haha, my pleasure!
Regarding your questions:
1/ No, we accounted for that by mixing the questions with queries that don't return AIOs.
2/ User behavior varied by query type. For example, finding a coupon and comparing drug side effects were very different: for the former, users basically skim; for the latter, they really take their time. The variance is there for sure (see mean vs. median; a quick illustration of that gap is sketched below).
3/ It varied, but users sought out Reddit either way. I will say the 30% is also an average that varies by query stakes and context.
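(A rough illustration of that mean vs. median gap, using made-up dwell times rather than the study's data: a few high-engagement outliers pull the mean well above the median, and the standard deviations and coefficients of variation asked about in the follow-up below would quantify that spread.)

```python
# Hypothetical per-task dwell times in seconds -- made-up numbers, not the study's data.
from statistics import mean, median, stdev

dwell_times = [12, 15, 18, 20, 22, 25, 30, 95, 140, 210]

avg = mean(dwell_times)      # ~58.7s, pulled up by a few long sessions
mid = median(dwell_times)    # 23.5s, closer to the "typical" participant
sd = stdev(dwell_times)      # sample standard deviation
cv = sd / avg                # coefficient of variation: spread relative to the mean

print(f"mean={avg:.1f}s  median={mid:.1f}s  sd={sd:.1f}s  cv={cv:.2f}")
```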
All good! Happy to clarify, of course :).
Thanks for taking the time to get back to me. Really appreciate the response.
1) But there’s still a risk that the cohort given the AIOs engages with those AIOs differently than they would outside of test conditions. Did you get the sense they were behaving differently, and that in some cases engagement might be even lower in non-test environments, which would make being right at the top of AIOs even more important?
2) Yeah, that makes sense. Mean/median split definitely suggests some high-engagement outliers (could even be related to point 1).
But if you have any specific figures easily to hand on the data distribution – standard deviations, coefficients of variation, etc. – I’d love to read more about it.
3) Makes sense, thank you.
Thanks again for taking the time to answer – it’s excellent research.
Sure thing! My pleasure :)
1/ Yes, any study that isn't blinded (where participants have no idea they're being observed) will have some bias baked in. We accounted for that as well as possible in the design of the tasks, and I didn't get the feeling people were particularly biased. But I can't rule it out completely.
2/ Will check
Absolutely fascinating. Thanks for putting this together.
It's always good to see research back up your intuition. I so rarely read the full AI response, whether AIO or chatbot; I read the first few lines and maybe scan some more of the content.
It is also interesting to see how few clicks the AIO citations are getting, and that more often than not users were still clicking on organic results.
I realise this is qualitative research but I wonder how many of these metrics could be measured at scale?
Thanks! Check out my previous posts with lots of aggregated quant data.
You're right that people still click organic results, but I will also say that AIOs decrease clicks significantly. Both are true at the same time.
Finally some real user insight behind all the speculation around AIOs. That 2/3 drop in desktop clicks is wild. Question - with visibility becoming the new currency (instead of clicks), how should we start thinking about measuring content performance going forward?
Thanks!
I think the new measurement models still need to be built. My take is that we need to measure visibility, for example as share of voice, and find a way to track quality impressions. I'll publish more on that shortly :).
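(A minimal sketch of what share-of-voice measurement could look like, assuming a hypothetical keyword set and made-up visibility weights; it simply treats share of voice as a brand's weighted share of AIO citations and organic listings across tracked queries, not the model hinted at above.)

```python
# Hypothetical sketch of a share-of-voice calculation -- the keyword set, surfaces,
# and weights are made up for illustration, not the model referenced above.
from collections import defaultdict

# (keyword, brand, surface) observations from a hypothetical rank tracker
observations = [
    ("best month to buy a car", "brand-a.com", "aio_citation"),
    ("best month to buy a car", "brand-b.com", "organic_top3"),
    ("drug side effects",       "brand-a.com", "organic_top3"),
    ("drug side effects",       "brand-c.com", "aio_citation"),
    ("coupon code",             "brand-b.com", "organic_other"),
]

# Assumed weights: an AIO citation counts as a stronger "quality impression"
# than a lower organic listing.
weights = {"aio_citation": 1.0, "organic_top3": 0.7, "organic_other": 0.3}

scores = defaultdict(float)
for _keyword, brand, surface in observations:
    scores[brand] += weights[surface]

total = sum(scores.values())
for brand, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{brand}: share of voice = {score / total:.0%}")
```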
Well done, Kevin. Great work. Very important for folks working in-house and service providers to communicate the changes and impact of their work. Keep it up!
thanks so much!
I remember we chatted about these things a few months back, at the end of last year. Great to see what you've done here. This piece of work made me a paying subscriber to the Pro plan.
🙇🏻♂️🙇🏻♂️🙇🏻♂️
Great work, Kevin. This one made me a subscriber.
Aw, huge! Thanks, Ralf :)
Outstanding Kevin. Unparalleled insights. Insanely valuable for businesses and SEOs IMO
thank you so much!
Well this is an amazing analysis/write-up, Kevin. Upgrading to paid 👏
Thanks so much, Tyler!
So this is solid stuff, really. The findings here are (directionally) helpful for marketing leadership for sure.
But the very first video really led me to doubt the solidity, or veracity, of the findings. The participant was given a task (find the best month to purchase a car), did one search, and saw an answer. Task complete.
A research participant is motivated to finish a task, so they can do more tasks and therefore earn more $$.
That's the opposite motivation of someone truly looking to buy a car. They are motivated to find the genuinely best time to buy a car, so they will likely spend more than just a couple of minutes.
We're drawing a lot of conclusions here... We really need Similarweb (or other) data that can look into these questions en masse and in the real world...
Honestly though, the findings here are informative, but I wouldn't make a budget decision from them just yet...
Keep in mind that this was just one clip out of 70 participants who solved the task...
Sure, incentivized studies are never perfect, but creating unbiased conditions for Search isn't possible (without crossing serious ethical boundaries).
Yep 100%!
Looks like you did the best possible for user studies!
My, my, top-notch post! Many thanks for sharing!
Here is my link to subscribe. Subscribe and I’ll subscribe back. https://thebusinesscoach.substack.com/