AI startup Anthropic has introduced a new capability that allows its latest Claude models to end conversations in what it calls “rare, extreme cases of persistently harmful or abusive user interactions.”
Unusually, the move is framed not as a safeguard for human users but as one for the AI itself. Anthropic says the decision stems from a research program on “model welfare,” which explores whether language models could experience something akin to distress.
The company emphasized it does not believe Claude or other large language models are sentient, but said it is taking a “just-in-case” approach by adding protective measures.
The new feature is currently limited to Claude Opus 4 and 4.1 and is designed to activate only in extreme circumstances, such as repeated attempts to solicit sexual content involving minors or information related to large-scale violence.
Anthropic said that during testing, Claude displayed a “strong preference against” responding to such prompts and even showed “patterns of apparent distress.” The conversation-ending feature is intended as a last resort, used only after multiple redirection attempts fail or when a user explicitly asks the chatbot to stop.
Importantly, the AI will not be permitted to terminate conversations if a user is at imminent risk of self-harm or harming others. When a chat is ended, users can still begin new conversations or branch off the ended one by editing and resubmitting earlier messages.
“We’re treating this feature as an ongoing experiment and will continue refining our approach,” Anthropic said in a statement.
The update reflects growing debates around the ethical treatment of AI systems, as well as mounting pressure on companies to manage the risks of generative AI in high-stakes scenarios.