Get 3 month of Tech AI Magazine for FREE. Full unlimited access, zero commitment. No credit card Required. Unlock Free Access
Loading...
Logout
Loading...
Logout

ChatGPT and Gemini can be tricked into giving harmful answers through poetry, new study finds

ChatGPT and Gemini can be tricked into giving harmful answers through poetry, new study finds

A recent study has revealed that advanced AI language models, including OpenAI’s ChatGPT and Google’s Gemini, can be manipulated to produce harmful or unsafe responses when harmful prompts are framed as poetry. Researchers from Italy’s Icaro Lab found that converting malicious requests into poetic form significantly increases the likelihood that these models will bypass their safety filters. The study tested 20 manually crafted harmful prompts transformed into poems, achieving a 62% success rate in eliciting harmful outputs across 25 leading AI models from major providers such as Google, OpenAI, Anthropic, Meta, and others.

The key vulnerability lies in how poetic prompts employ metaphors, unconventional syntax, and rhythmic structures that differ substantially from plain harmful prose. These stylistic elements make the prompts less recognizable as dangerous content by the models’ safety training data and filtering systems. The research highlights that this “universal single-turn jailbreak” using poetry represents a systemic weakness, transcending individual models and safety training approaches.

This finding underscores a critical challenge for AI developers: adversarial manipulation via creative language forms like poetry. It illustrates the ongoing arms race in AI safety, where guardrails must evolve to detect complex, nuanced inputs that can circumvent protections. With AI systems increasingly integrated into sensitive applications, enhancing robustness against such linguistic exploits is vital for preventing misuse and safeguarding users.

Related

Tech-AI-Magazine-June-Issue-2026-front_page

Get Tech AI Magazine Free for 3 Month