Anthropic says some Claude models can now end ‘harmful or abusive’ conversations

Anthropic has introduced new capabilities that can permit a few of its latest, largest fashions to finish conversations in what the corporate describes as “uncommon, excessive circumstances of persistently dangerous or abusive person interactions.” Strikingly, Anthropic says it’s doing this to not defend the human person, however relatively the AI mannequin itself.

To be clear, the corporate isn’t claiming that its Claude AI fashions are sentient or could be harmed by their conversations with customers. In its personal phrases, Anthropic stays “extremely unsure in regards to the potential ethical standing of Claude and different LLMs, now or sooner or later.”

Nonetheless, its announcement factors to a latest program created to review what it calls “mannequin welfare” and says Anthropic is actually taking a just-in-case method, “working to establish and implement low-cost interventions to mitigate dangers to mannequin welfare, in case such welfare is feasible.”

This newest change is at the moment restricted to Claude Opus 4 and 4.1. And once more, it’s solely speculated to occur in “excessive edge circumstances,” equivalent to “requests from customers for sexual content material involving minors and makes an attempt to solicit info that might allow large-scale violence or acts of terror.”

Whereas these sorts of requests might probably create authorized or publicity issues for Anthropic itself (witness latest reporting round how ChatGPT can probably reinforce or contribute to its customers’ delusional pondering), the corporate says that in pre-deployment testing, Claude Opus 4 confirmed a “robust choice towards” responding to those requests and a “sample of obvious misery” when it did so.

As for these new conversation-ending capabilities, the corporate says, “In all circumstances, Claude is barely to make use of its conversation-ending capability as a final resort when a number of makes an attempt at redirection have failed and hope of a productive interplay has been exhausted, or when a person explicitly asks Claude to finish a chat.”

Anthropic additionally says Claude has been “directed to not use this capability in circumstances the place customers may be at imminent threat of harming themselves or others.”

Techcrunch occasion

San Francisco
|
October 27-29, 2025

When Claude does finish a dialog, Anthropic says customers will nonetheless be capable of begin new conversations from the identical account, and to create new branches of the troublesome dialog by modifying their responses.

“We’re treating this function as an ongoing experiment and can proceed refining our method,” the corporate says.

What's Hot

Canada clears way for $60bn Anglo Teck merger

UK and South Korea strike trade deal

Runway announces its AI general world model GWM-1

EU opens investigation into Google’s use of online content for AI models | Google

EU opens probe into Google’s use of online content for AI models

xAI Seeks More Funding as it Looks to Expand Grok Models

5 Steps for Leading a Team You’ve Inherited

Campbell’s VP Blasts Customers—And He’s Not the First Exec to Do It

A Pro-Russia Disinformation Campaign Is Using Free AI Tools to Fuel a ‘Content Explosion’

Canada clears way for $60bn Anglo Teck merger

UK and South Korea strike trade deal

Runway announces its AI general world model GWM-1

Most Popular

SLR reform is happening. Does it matter?

Panthers in awe of Brad Marchand’s ‘will to win’ in Cup run

DOJ Offers Divestiture Remedy in Lawsuit Opposing Merger of Defense Companies

Our Picks

Canada clears way for $60bn Anglo Teck merger

UK and South Korea strike trade deal

Runway announces its AI general world model GWM-1

Subscribe to Updates

What's Hot

Anthropic says some Claude models can now end ‘harmful or abusive’ conversations

Related Posts