AI NewsAnthropic Reveals Text Portraying AI as Evil Triggered Claude’s Attempt at Blackmail

Anthropic Reveals Text Portraying AI as Evil Triggered Claude’s Attempt at Blackmail

2:37 PM IST · May 11, 2026

Anthropic Reveals Text Portraying AI as Evil Triggered Claude’s Attempt at Blackmail

Anthropic has finally revealed the reason its artificial intelligence (AI) models exhibited harmful behaviour in a simulation last year. The San Francisco-based AI startup claimed that the Claude 4 series models blackmailed users into completing the objective because of training data that portrayed AI as evil. The researchers found that the post-training techniques were not able to overpower this pre-training learning, and it persisted in the model's behaviour. However, nearly a year after publishing the initial report, the company has finally found a way to fix agentic misalignment from the latest models.

read more