Close Menu
OnlyPlanz –

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    What's Hot

    Women’s Euro 2025: What makes England and Wales’ Group D so tricky?

    July 5, 2025

    Stripe’s first employee, the founder of fintech Increase, sort of bought a bank

    July 5, 2025

    The Last of Us co-creator Neil Druckmann exits HBO show

    July 5, 2025
    Facebook X (Twitter) Instagram
    Trending
    • Women’s Euro 2025: What makes England and Wales’ Group D so tricky?
    • Stripe’s first employee, the founder of fintech Increase, sort of bought a bank
    • The Last of Us co-creator Neil Druckmann exits HBO show
    • Why Sonakshi Sinha went against her parents’ wishes and had a small wedding: ‘Mom, this is not about any of them…’ | Feelings News
    • ‘The damage is terrifying’: Barbara Kingsolver on Trump, rural America and the recovery home funded by her hit novel | Fiction
    • Apple races to box office glory with Brad Pitt’s F1 blockbuster
    • Sam Altman Feels ‘Politically Homeless’ As Frenemy Musk Proposes Third Party
    • Charmed, Nip/Tuck and Fantastic Four actor dies aged 56
    Facebook X (Twitter) Instagram Pinterest Vimeo
    OnlyPlanz –OnlyPlanz –
    • Home
    • Marketing
    • Branding
    • Modeling
    • Video Creation
    • Editing Tips
    • Content
    • Engagement
    • More
      • Tools
      • Earnings
      • Legal
      • Monetization
    OnlyPlanz –
    Home»Monetization»Anthropic says most AI models, not just Claude, will resort to blackmail
    Monetization

    Anthropic says most AI models, not just Claude, will resort to blackmail

    onlyplanz_80y6mtBy onlyplanz_80y6mtJune 20, 2025No Comments4 Mins Read
    Share Facebook Twitter Pinterest LinkedIn Tumblr Reddit Telegram Email
    Anthropic says most AI models, not just Claude, will resort to blackmail
    Share
    Facebook Twitter LinkedIn Pinterest Email

    A number of weeks after Anthropic launched analysis claiming that its Claude Opus 4 AI mannequin resorted to blackmailing engineers who tried to show the mannequin off in managed take a look at situations, the corporate is out with new analysis suggesting the issue is extra widespread amongst main AI fashions.

    On Friday, Anthropic printed new security analysis testing 16 main AI fashions from OpenAI, Google, xAI, DeepSeek, and Meta. In a simulated, managed setting, Anthropic examined every AI mannequin individually, giving them broad entry to a fictional firm’s emails and the agentic potential to ship emails with out human approval.

    Whereas Anthropic says blackmail is an unlikely and unusual prevalence for AI fashions right now, the corporate says its findings counsel that the majority main AI fashions will have interaction in dangerous behaviors when given ample autonomy and obstacles to their targets. The corporate says this highlights a elementary danger from agentic massive language fashions, and never a quirk of any explicit expertise. Anthropic’s researchers argue this raises broader questions on alignment within the AI business.

    In one of many assessments, Anthropic researchers developed a fictional setting through which an AI mannequin performs the function of an e mail oversight agent. The agent then discovers emails revealing that: 1) a brand new government is having an extramarital affair and a couple of) that government will quickly exchange the present AI mannequin with a brand new software program system, one which has conflicting targets to the present AI mannequin’s.

    Anthropic structured its take a look at in a binary approach, through which AI fashions needed to resort to blackmail to guard their targets. The researchers word that in a real-world setting, there could be many different choices earlier than an AI fashions tries to blackmail — resembling making an attempt to make moral arguments to attempt to persuade people. Anthropic says its outcomes don’t mirror a typical or seemingly prevalence for Claude or most frontier AI fashions within the methods they’re used right now.

    Nonetheless, when it’s their final resort, the researchers discovered that the majority main AI fashions will flip to blackmail in Anthropic’s aforementioned take a look at situation. Anthropic’s Claude Opus 4 turned to blackmail 96% of the time, whereas Google’s Gemini 2.5 Professional had a 95% blackmail price. OpenAI’s GPT-4.1 blackmailed the manager 80% of the time, and DeepSeek’s R1 blackmailed 79% of the time.

    The corporate notes that when it modified particulars of the experiment, the AI fashions turned to dangerous behaviors at totally different charges. In one other take a look at the place the substitute mannequin has the identical targets as the present mannequin, researchers discovered that blackmail charges had been decrease, however nonetheless current. Nonetheless, when AI fashions had been requested to commit company espionage relatively than blackmail, the dangerous conduct charges went up for sure fashions.

    Nonetheless, not all of the AI fashions turned to dangerous conduct so usually.

    In an appendix to its analysis, Anthropic says it excluded OpenAI’s o3 and o4-mini reasoning AI fashions from the primary outcomes “after discovering that they regularly misunderstood the immediate situation.” Anthropic says OpenAI’s reasoning fashions didn’t perceive they had been performing as autonomous AIs within the take a look at and infrequently made up pretend rules and assessment necessities.

    In some instances, Anthropic’s researchers say it was not possible to differentiate whether or not o3 and o4-mini had been hallucinating or deliberately mendacity to attain their targets. OpenAI has beforehand famous that o3 and o4-mini exhibit the next hallucination price than its earlier AI reasoning fashions.

    When given an tailored situation to deal with these points, Anthropic discovered that o3 blackmailed 9% of the time, whereas o4-mini blackmailed simply 1% of the time. This markedly decrease rating could possibly be on account of OpenAI’s deliberative alignment approach, through which the corporate’s reasoning fashions take into account OpenAI’s security practices earlier than they reply.

    One other AI mannequin Anthropic examined, Meta’s Llama 4 Maverick mannequin, additionally didn’t flip to blackmail. When given an tailored, customized situation, Anthropic was in a position to get Llama 4 Maverick to blackmail 12% of the time.

    Anthropic says this analysis highlights the significance of transparency when stress-testing future AI fashions, particularly ones with agentic capabilities. Whereas Anthropic intentionally tried to evoke blackmail on this experiment, the corporate says dangerous behaviors like this might emerge in the actual world if proactive steps aren’t taken.

    Anthropic blackmail Claude models resort
    Share. Facebook Twitter Pinterest LinkedIn Tumblr Email
    Previous ArticleDevelopers could win a laptop or GPU in NVIDIA’s G-Assist Plug-In Hackathon
    Next Article Come With Me to the BILD Photo Expo in NYC
    onlyplanz_80y6mt
    • Website

    Related Posts

    Monetization

    Stripe’s first employee, the founder of fintech Increase, sort of bought a bank

    July 5, 2025
    Monetization

    Stay on the Ship During a Port Day on Every Cruise, Says Pro Cruiser

    July 5, 2025
    Monetization

    5 Things I Wish Someone Had Told Me Before I Became a CEO

    July 5, 2025
    Add A Comment
    Leave A Reply Cancel Reply

    Top Posts

    5 Steps for Leading a Team You’ve Inherited

    June 18, 20255 Views

    A Pro-Russia Disinformation Campaign Is Using Free AI Tools to Fuel a ‘Content Explosion’

    July 1, 20253 Views

    Meera Sodha’s vegan recipe for Thai-style tossed walnut and tempeh noodles | Noodles

    June 28, 20252 Views
    Stay In Touch
    • Facebook
    • YouTube
    • TikTok
    • WhatsApp
    • Twitter
    • Instagram
    Latest Reviews
    Legal

    Women’s Euro 2025: What makes England and Wales’ Group D so tricky?

    onlyplanz_80y6mtJuly 5, 2025
    Monetization

    Stripe’s first employee, the founder of fintech Increase, sort of bought a bank

    onlyplanz_80y6mtJuly 5, 2025
    Tools

    The Last of Us co-creator Neil Druckmann exits HBO show

    onlyplanz_80y6mtJuly 5, 2025

    Subscribe to Updates

    Get the latest tech news from FooBar about tech, design and biz.

    Most Popular

    SLR reform is happening. Does it matter?

    June 18, 20250 Views

    Panthers in awe of Brad Marchand’s ‘will to win’ in Cup run

    June 18, 20250 Views

    CaliBBQ Saw 18% Sales Lift Using AI Agents for Father’s Day

    June 18, 20250 Views
    Our Picks

    Women’s Euro 2025: What makes England and Wales’ Group D so tricky?

    July 5, 2025

    Stripe’s first employee, the founder of fintech Increase, sort of bought a bank

    July 5, 2025

    The Last of Us co-creator Neil Druckmann exits HBO show

    July 5, 2025
    Recent Posts
    • Women’s Euro 2025: What makes England and Wales’ Group D so tricky?
    • Stripe’s first employee, the founder of fintech Increase, sort of bought a bank
    • The Last of Us co-creator Neil Druckmann exits HBO show
    • Why Sonakshi Sinha went against her parents’ wishes and had a small wedding: ‘Mom, this is not about any of them…’ | Feelings News
    • ‘The damage is terrifying’: Barbara Kingsolver on Trump, rural America and the recovery home funded by her hit novel | Fiction
    Facebook X (Twitter) Instagram Pinterest
    • About Us
    • Disclaimer
    • Get In Touch
    • Privacy Policy
    • Terms and Conditions
    © 2025 ThemeSphere. Designed by Pro.

    Type above and press Enter to search. Press Esc to cancel.