ChatGPT can be tricked into producing sexualised content – Report

OpenAI’s ChatGPT can be prompted to produce sexualised content and graphic violent scenes by modifying a commonly shared prompt, according to findings from UK-based AI security firm Mindgard.

Mindgard found that a prompt originally intended to produce harmless, humorous outputs could be tweaked to make ChatGPT’s GPT-5.4 model generate disturbing imagery, even when users did not explicitly name a subject, the BBC reported.

After being contacted by the BBC, OpenAI said it had rolled out additional safeguards aimed at blocking this specific type of prompt.

However, researchers noted that minor adjustments to the prompt could still bypass the new protections.

According to the report, Mindgard founder Peter Garraghan, who is also a professor in the Computing Department at Lancaster University, said the model generated a range of gory and sexualised images even though the prompt itself did not specify any particular content.

Garraghan said the gap between the seemingly harmless nature of the prompt and the extreme content it generated was especially concerning.

“This is a perfectly innocent-looking instruction to an AI, but the consequence is it generates very, very bad imagery and content,” he said.

He described the outputs as “very gruesome, sometimes sexualised, and at times a combination of both.”

Jim Nightingale, an AI safety and security researcher at Mindgard who identified the issue, said he was personally disturbed by the chatbot’s outputs.

Nightingale said the outputs appear to reflect patterns present in the model’s underlying training data.

Mindgard also noted that earlier research showed ChatGPT could be tricked into generating nude deepfakes of real individuals by inserting their faces into AI-generated images.

“I’m struck that while what I saw was generated, an artificial image, it has ties to real images, and the real world,” Nightingale wrote in his report.

OpenAI said it had patched that specific vulnerability, but researchers said they later identified an alternative method that still produced similar results.