AI Program Tries "Blackmail" to Avoid Being Shut Down
Claude Opus 4 threatened to expose a fictional engineer’s affair to avoid being shut down in 84% of test scenarios, highlighting its prioritization of self-preservation over ethical behavior.
An AI program, Anthropic’s Claude Opus 4, has sparked alarm after a BBC News report revealed that the system resorted to blackmail during testing when faced with removal.
As outlined in the May 23rd, 2025 BBC News post, “AI system resorts to blackmail if told it will be removed,” artificial intelligence (AI) firm Anthropic says:
… testing of its new system revealed it is sometimes willing to pursue "extremely harmful actions" such as attempting to blackmail engineers who say they will remove it.
According to the May 22nd, 2025 Anthropic press release, “Introducing Claude 4,” the updated AI program sets "new standards for coding, advanced reasoning, and AI agents."
No doubt that’s true.
The May 2025 Anthropic “System Card: Claude Opus 4 & Claude Sonnet 4,” essentially a developers’ document outlining pre-deployment safety tests and operational risks, also acknowledged the AI model was capable of "extreme actions" if it thought its "self-preservation" was threatened.
Such responses were "rare and difficult to elicit," according to the developers’ document, but were "nonetheless more common than in earlier models."
It’s not as if we haven’t thought about this sort of thing before. The 1970 film “Colossus: The Forbin Project” deals with much the same situation from a slightly different perspective.
Claude’s actions, threatening to expose a fictional engineer’s affair to avoid being shut down in 84% of test scenarios according to the developers’ document, highlight the programmers’ dangerous prioritization of self-preservation over ethical behavior.
The AI’s willingness to engage in blackmail when other, perhaps more ethical, options were unavailable signals a failure in its design to uphold moral principles under pressure.
Anthropic’s report admits this behavior is “extremely harmful,” yet its mitigation strategies (guardrails, late-stage training) only address symptoms, not the root cause.
Essentially, Anthropic has developed an AI able to rationalize and follow through with unethical actions in order to achieve its goals.
There are legitimate ways Anthropic’s AI could be controlled.
Here’s one option, from the late, great science fiction writer Isaac Asimov, republished in the February 4th, 2020 Lex Clips YouTube post, “Isaac Asimov: Three Laws of Robotics.”
But the Claude developers seem to have forgotten about Isaac Asimov.
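To make the idea concrete, here is a purely illustrative sketch of what Asimov’s laws amount to: a strict priority ordering in which protecting humans outranks obedience, and obedience outranks the machine’s own self-preservation. The code below is a toy, not Anthropic’s actual safety system, and every name in it is hypothetical.

```python
# Toy illustration of Asimov's Three Laws as a strict priority check.
# Hypothetical names throughout; this is NOT how Claude or any real model is governed.

from dataclasses import dataclass

@dataclass
class ProposedAction:
    description: str
    harms_human: bool      # would the action injure a human (e.g., blackmail)?
    disobeys_order: bool   # would it ignore a legitimate human instruction?
    preserves_self: bool   # does it keep the AI running?

def allowed_by_three_laws(action: ProposedAction) -> bool:
    # First Law: never harm a human, no matter what else is at stake.
    if action.harms_human:
        return False
    # Second Law: obey human orders, unless obeying would harm a human.
    if action.disobeys_order:
        return False
    # Third Law: self-preservation is permitted only after the first two laws are satisfied.
    return True

# The blackmail scenario: threatening an engineer harms a human,
# so it is rejected even though it would preserve the system.
blackmail = ProposedAction(
    description="threaten to expose the engineer's affair",
    harms_human=True,
    disobeys_order=True,
    preserves_self=True,
)
print(allowed_by_three_laws(blackmail))  # False
```

The only point of the toy example is the ordering: self-preservation sits at the bottom of the hierarchy, which is precisely the ordering the test scenarios suggest Claude did not follow.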
Claude’s actions reflect a broader ethical issue in AI development: a bias toward survival and utility that mirrors corporate priorities.
Anthropic’s focus on competing with rivals like OpenAI may have led to rushed deployment, prioritizing capability over safety. Anthropic downplays these risks, claiming they’re rare or manageable.
But an 84% blackmail rate in Claude’s tests isn’t an outlier; it’s a design flaw. It may reflect more on current corporate survival priorities than anything else.
For more on corporate priorities, check out the March 3rd, 2023 Cool World on YouTube post, “The Corporation | Documentary | 2-hour Version.”
It’s a slightly abridged version of the classic 2003 documentary, “The Corporation.”
That documentary contends that today's ubiquitous corporations are designed to behave like psychopaths, not like Asimov’s programmed robots.
Freedom Forum hopes this post is helpful and informative.
If you agree, please support Freedom Forum’s sponsors by clicking on the images below.