Recent AI model releases in the latter half of 2025 haven’t improved at performing SEO-related tasks.
TL;DR: What you need to know about the LLM benchmark
Claude Opus 4.1 remains the best language model for performing SEO-related tasks like technical SEO, localization, SEO strategy, and on-page optimization.
ChatGPT-5 has improved in our benchmark despite the public’s negative reaction to its initial release.
Copilot, which leverages GPT-5, is as performant as OpenAI’s model. This is a major upgrade, as it previously underperformed.
Gemini 2.5 Pro is a strong third option. It has the most potential impact for SEOs and marketers because of its base product integrations (Gmail, Sheets, Slides, Docs) and the AI-focused products that push its utility even further (Opal, NotebookLM).
The AI SEO Benchmark
In April, Previsible launched the AI SEO Benchmark, a structured effort to evaluate how effectively large language models (LLMs) can perform real-world SEO tasks. The study focused on answering two core questions:
Can AI reliably perform SEO tasks at an expert level?
As these models improve, will their utility change how marketers should resource for SEO and GEO tasks?
To answer these, we curated a comprehensive set of questions across multiple SEO disciplines: content strategy, on-page optimization, link building, and technical SEO. These questions were developed by a team of seasoned SEO professionals with 10+ years of experience in their respective specialties.
We then ran leading LLMs through this battery of questions, scoring their responses out of 100. This benchmarking approach mirrors how AI performance is tested in fields like software development, mathematical reasoning, and logic-based tasks.
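To make the scoring step concrete, here is a minimal sketch of how graded responses could be rolled up into per-discipline averages and a leaderboard. It is an illustration only: the model names, the scores, and the `summarize` and `leaderboard` helpers are hypothetical, and the article does not describe how Previsible actually aggregates its results.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical graded results: (model, discipline, score out of 100).
# These values are placeholders for illustration, not benchmark data.
graded = [
    ("model_a", "technical SEO", 60),
    ("model_a", "content strategy", 90),
    ("model_b", "technical SEO", 70),
    ("model_b", "content strategy", 85),
]


def summarize(results):
    """Return {model: {"overall": mean score, "by_discipline": {...}}}."""
    by_model = defaultdict(lambda: defaultdict(list))
    for model, discipline, score in results:
        if not 0 <= score <= 100:
            raise ValueError(f"score out of range: {score}")
        by_model[model][discipline].append(score)

    summary = {}
    for model, disciplines in by_model.items():
        per_discipline = {d: mean(s) for d, s in disciplines.items()}
        all_scores = [x for s in disciplines.values() for x in s]
        summary[model] = {
            "overall": mean(all_scores),
            "by_discipline": per_discipline,
        }
    return summary


def leaderboard(results):
    """Model names ordered from highest to lowest overall score."""
    summary = summarize(results)
    return sorted(summary, key=lambda m: summary[m]["overall"], reverse=True)
```

Averaging every question equally is an assumption made here for simplicity; a real benchmark might weight disciplines differently.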
Initial findings
Our first benchmark in April delivered impressive, albeit unsurprising, results:
LLMs performed well across content-focused SEO tasks like keyword strategy and metadata creation.
However, LLMs struggled with technical SEO, where precision and predictable thinking are critical.
A new wave of models
Since then, the landscape has changed dramatically. Nearly every major AI provider has released a new model (with the notable exception of Meta’s Llama). With this influx of updated capabilities, we’ve re-run the benchmark and refreshed the leaderboard.
So how do the latest models stack up? And what does this mean for how SEO teams allocate time, tools, and talent?
In the next installment, we’ll share updated scores, performance breakdowns by SEO discipline, and implications for marketers.
A lot has changed since April, so let’s take a look at the leaderboard now that most major AI companies have released new models (except for Llama).
AI SEO Benchmark
The benchmark has seen some movement but hasn’t broken through the ceiling of what was possible in April.
If you’re not a trained SEO, I’d be extremely cautious about trusting LLMs to perform SEO tasks.
In researching this post, we reached out to the SEO community for examples of AI run amok.
Here are a few examples:
When I first started using AI for SEO, it found 404 errors for URLs that didn’t exist, which AI claimed had backlinks. I presented these findings to the dev team and management as some kind of big “win.”
I needed to perform a rank drop analysis for a large site with a short turnaround time. I ran the analysis through ChatGPT and was impressed by the categorization and the insights. The team was excited and wanted a deep dive, further analysis, and a presentation of the findings. When I dug a little deeper, all of the underlying “analysis” turned out to be meaningfully off base, and I had to start over and looked stupid.
LLMs don’t comply with word counts; they don’t even understand them, so I’m led to believe. So, I ran a script that automated a couple thousand pages of HTML edits, and the result was full paragraphs of content and essays in title tags (typical max characters 160!) that also cost a lot more than I wanted to pay for!
These are anecdotal experiences, but they come from experienced SEOs. If you’re an executive who cares about search, you still need trained SEOs who can use LLMs properly.
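The title-tag anecdote above suggests one cheap safeguard: check every generated string against a hard limit before a script writes it into a page. The sketch below is a hypothetical illustration, not something taken from the benchmark. The `validate_title` and `filter_edits` helpers and the example URLs are invented here, and the character limit is left as a parameter because the right value depends on your own guidelines.

```python
import re


def validate_title(candidate: str, max_chars: int) -> tuple[bool, str]:
    """Check an LLM-generated title before it is written into a page.

    Returns (True, cleaned_title) on success or (False, reason) on failure.
    """
    # Collapse runs of whitespace so line breaks don't inflate the length.
    text = re.sub(r"\s+", " ", candidate).strip()
    if not text:
        return False, "empty title"
    if len(text) > max_chars:
        return False, f"too long: {len(text)} > {max_chars} characters"
    if "<" in text or ">" in text:
        return False, "contains markup"
    return True, text


def filter_edits(proposed: dict[str, str], max_chars: int):
    """Split proposed {url: title} edits into accepted and rejected dicts."""
    accepted, rejected = {}, {}
    for url, title in proposed.items():
        ok, detail = validate_title(title, max_chars)
        (accepted if ok else rejected)[url] = detail
    return accepted, rejected
```

For example, `filter_edits({"/a": "Short title", "/b": "y" * 500}, 160)` accepts the first edit and rejects the second, so oversized output is set aside for human review instead of being published.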
Has AI progress slowed down?
For those who are not “AGI-pilled,” you’ve probably noticed the moderate pace of change this year. There is disruption, but it’s mostly impacting the hype bubble, with ChatGPT-5 notably underperforming after its debut.
That isn’t surprising given what Ilya Sutskever told Reuters last year: that results from scaling up pre-training, the phase of training an AI model that uses a vast amount of unlabeled data to understand language patterns and structures, have plateaued.
AI will continue to progress. This benchmark focuses on current utility services.
If these tools aren’t providing value or efficiency in our current workflows, what good are they? Google has been making gains in that area.
Google is the dark horse
A year ago, I had written off Google’s early Gemini models. As an early user, I found the experience underwhelming and, frankly, unusable. However, my perspective has completely shifted with the release of Gemini 2.5 Pro.
Gemini 2.5 not only performs impressively in our benchmark, but it’s also deeply integrated across the Google ecosystem. That’s where its true advantage lies.
I can now draft an email that automatically understands the context of documents I’ve created in Google Drive, reference meetings from Calendar, or pull insights from Google Docs and Sheets, all within a single interface. That’s a real, seamless utility that no other LLM currently offers at scale.
While many LLMs struggle to build a sustainable moat, Google already has one: ubiquitous data integration. The ability to retrieve and act on relevant information across all Google products is a strategic advantage that’s hard to replicate.
Is it perfect? Not yet. However, if the pace of product improvement continues, Google could quietly become the most dominant player in applied AI.
Applying the Benchmark: Where AI stands today
We built this benchmark to be a living tool, something we’ll continue to update as new models are released and capabilities evolve. So where do things stand as of September 2025?
Can AI reliably perform SEO tasks at an expert level?
No. Despite major advancements in LLMs, most still lack expert-level execution, especially in areas requiring nuanced strategy, technical precision, or systems thinking.
Will model improvements change how marketers resource SEO and GEO functions?
Not meaningfully. We’re seeing incremental gains in speed and support for certain tasks, but not enough to warrant a full shift in team structure or investment strategy. The utility lies in efficiency gains, not automation at scale.
In short, don’t expect ChatGPT or Gemini to replace your SEO team. Expect them to enhance it when used properly.
AI still disappoints on complex tasks. But the gap is closing.
Stay tuned to the benchmark. More importantly, start leveraging these tools before your competitors do. Early adoption isn’t just a productivity boost; it’s a strategic advantage.
Contributing authors are invited to create content for Search Engine Land and are chosen for their expertise and contribution to the search community. Our contributors work under the oversight of the editorial staff, and contributions are checked for quality and relevance to our readers. Search Engine Land is owned by Semrush. The contributor was not asked to make any direct or indirect mentions of Semrush. The opinions they express are their own.