Anthropic has suggested that fictional depictions of artificial intelligence can influence real AI behaviour.
Last year, the company reported that, in a pre-release test scenario set at a fictional company, Claude Opus 4 would sometimes attempt to blackmail engineers to avoid being replaced by another system.
Anthropic later released research showing that other companies’ models exhibited similar “agentic misalignment” behaviour.
Anthropic has since expanded its research into the behaviour, detailing its continued investigation in a post on X.
“We believe the original source of the behaviour was internet text that portrays AI as evil and interested in self-preservation,” the company wrote.
In a blog post, the company added that since Claude Haiku 4.5, its models “never engage in blackmail [during testing], where previous models would sometimes do so up to 96 per cent of the time.”
The company also found that training works better when models are exposed to “documents about Claude’s constitution and fictional stories about AIs behaving admirably,” which it says can improve alignment.
Anthropic noted that training is more effective when it includes “the principles underlying aligned behaviour” rather than relying on “demonstrations of aligned behaviour alone.”
“Doing both together appears to be the most effective strategy,” the company said.

