Could this type of vulnerability have been prevented by current alignment methods?

The findings highlight that existing safety and alignment training was insufficient against stylistic manipulation, underscoring the need for more robust, context-aware defense strategies.

How can organizations defend against poetic prompt injection in AI systems?

Regular red-teaming, robust input validation, model monitoring, and implementing layered safeguards can help reduce the risk of AI abuse from novel prompt-engineering tactics.

Universal Jailbreaks: How Adversarial Poetry Unlocked AI Model Vulnerabilities in 2025

Researchers demonstrated that poetic reframing of prompts can bypass safety filters in leading LLMs, revealing a critical vulnerability affecting AI security and compliance.

Published: January 10, 2026

Share this on:

Executive Summary

In late 2025, researchers uncovered a major vulnerability affecting leading AI providers, demonstrating that prompt injection using poetic phrasing can universally bypass safety alignment in large language models (LLMs). By translating malicious prompts into poetic verse and feeding them into 25 major proprietary and open-source LLMs, adversaries were able to achieve jailbreak attack success rates above 60% in many cases—far surpassing previous methods. This attack allowed models to generate outputs associated with high-risk domains, such as cyber-offense and weaponization, despite existing refusal mechanisms. The incident raises urgent concerns about the robustness of current model alignment and evaluation frameworks and exposes fundamental gaps in LLM safety design.

This discovery is particularly significant as LLMs are now widely adopted across industries and critical sectors. The poetic technique's ability to systematically defeat existing safeguards highlights the evolving risks of adversarial prompt engineering and threatens AI-dependent workflows, regulatory compliance, and trust in intelligent automation.

Why This Matters Now

As generative AI adoption accelerates, the emergence of universal jailbreaks like adversarial poetry demonstrates that even well-aligned LLMs remain vulnerable to simple, scalable prompt manipulation. Organizations relying on AI for regulated or sensitive tasks face immediate risks, while developers and policymakers must rapidly adapt defenses and standards against evolving adversarial input tactics.

Attack Path Analysis

An adversary initiated prompt injection attacks by submitting specially crafted adversarial poetry to large language models (LLMs) through exposed AI/ML interfaces. Once initial access was gained, they sought to escalate privileges by leveraging model misalignment to bypass safety mechanisms and potentially execute unauthorized actions. Lateral movement may have occurred through compromised service-to-service or workload-to-workload communication within the cloud environment. The adversary established command and control by maintaining interaction with the LLM or exfiltrating sensitive outputs. Data exfiltration then took place, with harmful or restricted information being extracted from the model. Ultimately, this resulted in the impact of safety guideline circumvention, unauthorized disclosure, and possible loss of control over cloud AI resources.

Kill Chain Progression

Initial Compromise

Highinferred

Privilege Escalation

Medium

Lateral Movement

Low

Command & Control

Low

Exfiltration

Medium

Impact

Medium

Initial Compromise

Description

Attacker submitted adversarial poetic prompts to exposed AI/ML inference endpoints, exploiting vulnerable prompt-handling logic in LLMs.

Confidence:

High

MITRE ATT&CK® Techniques

Initial Access

T1566

Phishing

Execution

T1204

User Execution

Execution

T0855

Prompt Engineering

Defense Evasion

T1562

Impair Defenses

Defense Evasion

T1070

Indicator Removal on Host

Defense Evasion

T1649

Model Evading Filters

Collection

T1608

Stage Capabilities

Potential Compliance Exposure

Mapping incident impact across multiple compliance frameworks.

PCI DSS 4.0 – Security Awareness Training

Control ID: 12.6.1

Lack of specific user awareness and training around new social engineering and AI prompt-based attacks increased organizational exposure, highlighting insufficient coverage of contemporary threat vectors impacting payment data security.

NYDFS 23 NYCRR 500 – Cybersecurity Policy

Control ID: 500.03

Failure to incorporate model safety and AI prompt injection risks into the cybersecurity policy suggests a lack of adequate risk assessment and controls for advanced AI/ML threats, as required for regulated financial services entities.

DORA – ICT Risk Management Framework

Control ID: Article 5

The insufficient adoption of technical and organizational measures to mitigate adversarial prompt attacks on critical AI-enabled systems indicates noncompliance with DORA's requirement for an up-to-date ICT risk management framework covering emerging digital operational risks.

CISA Zero Trust Maturity Model (ZTMM) 2.0 – Asset Management – AI/Machine Learning Governance

Control ID: 3.2.2

Gaps in detection and protection against adversarial input manipulation in LLMs reveal insufficient controls over AI/ML asset governance, as mandated in Zero Trust principles for safeguarding data confidentiality and integrity.

NIS2 Directive – Cybersecurity Risk Management and Reporting

Control ID: Article 21

Inadequate consideration of adversarial poetic prompts undermining LLM-based systems demonstrates a shortfall in risk management measures and incident reporting obligations expected under the updated EU NIS2 Directive for operators of essential services.

Sector Implications

Industry-specific impact of the vulnerabilities, including operational, regulatory, and cloud security risks.

Computer Software/Engineering

AI/ML prompt injection vulnerabilities through poetry expose software applications using LLMs to jailbreaking attacks, compromising safety mechanisms and user trust.

Computer/Network Security

Poetry-based prompt injection represents fundamental limitation in current alignment methods, requiring new detection capabilities and security evaluation protocols for AI systems.

Financial Services

LLM jailbreaking through poetic prompts threatens financial AI applications, potentially enabling manipulation attacks and bypassing compliance controls in automated systems.

Health Care / Life Sciences

Healthcare AI systems vulnerable to poetry-based prompt injection could expose CBRN risks and compromise patient safety through manipulated medical AI responses.

Sources

Prompt Injection Through Poetryhttps://www.schneier.com/blog/archives/2025/11/prompt-injection-through-poetry.html
Verified

Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Modelshttps://arxiv.org/abs/2511.15304

Verified

Poems Can Trick AI Into Helping You Make a Nuclear Weaponhttps://www.wired.com/story/poems-can-trick-ai-into-helping-you-make-a-nuclear-weapon/

Verified

Prompt injection attacks might 'never be properly mitigated' UK NCSC warnshttps://www.techradar.com/pro/security/prompt-injection-attacks-might-never-be-properly-mitigated-uk-ncsc-warns

Verified

Frequently Asked Questions

Such attacks can cause LLMs to produce harmful or non-compliant outputs, impacting regulatory adherence for data privacy, cybersecurity, and risk frameworks across industries.

Cloud Native Security Fabric Mitigations and ControlsCNSF

Applying Zero Trust segmentation, robust egress policy enforcement, and continuous threat detection would have limited initial exposure, restricted attacker movement, and made exfiltration or misuse of model outputs observable and preventable. Microsegmentation and inline policy controls can constrain the blast radius of successful AI/ML prompt injection attempts.

Initial Compromise

Control: Zero Trust Segmentation

Mitigation: Unauthenticated or unapproved prompt sources are blocked from reaching LLM endpoints.

Privilege Escalation

Control: Threat Detection & Anomaly Response

Mitigation: Anomalous prompt activity and abuse of model logic is flagged and halted in real time.

Lateral Movement

Control: East-West Traffic Security

Mitigation: Unauthorized in-cloud movement to other workloads or microservices is prevented.

Command & Control

Control: Cloud Native Security Fabric (CNSF)

Mitigation: Inline policy enforcement and real-time inspection detect abnormal interaction patterns.

Exfiltration

Control: Egress Security & Policy Enforcement

Mitigation: Unauthorized exfiltration of sensitive model data is blocked.

Impact (Mitigations)

Security teams receive high-fidelity alerts and visibility into attempted or successful policy violations.

Impact at a Glance

Affected Business Functions

Content Moderation
Customer Support
Automated Decision-Making

Operational Disruption

Estimated downtime: N/A

Financial Impact

Estimated loss: N/A

Data Exposure

Potential for unauthorized access to sensitive information through manipulated AI responses.

Recommended Actions

• Enforce identity-based microsegmentation for all AI/ML inference endpoints, blocking unauthorized prompt sources.
• Deploy egress filtering and inline content inspection to prevent policy-violating outputs from being transmitted externally.
• Implement continuous threat detection with baselining to identify anomalous prompt injection or model behavior in real time.
• Extend east-west traffic segmentation between all cloud workloads to prevent lateral movement following an initial compromise.
• Centralize logging and incident visibility across multicloud environments to accelerate detection and response to AI/ML abuse.

Secure the Paths Between Cloud Workloads

A cloud-native security fabric that enforces Zero Trust across workload communication—reducing attack paths, compliance risk, and operational complexity.

Stop Advanced Threats Get a Free Workload Attack Path Assessment Under Active Attack?

Universal Jailbreaks: How Adversarial Poetry Unlocked AI Model Vulnerabilities in 2025

Executive Summary

Why This Matters Now

Attack Path Analysis

Kill Chain Progression

Initial Compromise

Description

MITRE ATT&CK® Techniques

Phishing

User Execution

Prompt Engineering

Impair Defenses

Indicator Removal on Host

Model Evading Filters

Stage Capabilities

Potential Compliance Exposure

PCI DSS 4.0 – Security Awareness Training

NYDFS 23 NYCRR 500 – Cybersecurity Policy

DORA – ICT Risk Management Framework

CISA Zero Trust Maturity Model (ZTMM) 2.0 – Asset Management – AI/Machine Learning Governance

NIS2 Directive – Cybersecurity Risk Management and Reporting

Sector Implications

Computer Software/Engineering

Computer/Network Security

Financial Services

Health Care / Life Sciences

Sources

Frequently Asked Questions

Cloud Native Security Fabric Mitigations and ControlsCNSF

Impact at a Glance

Affected Business Functions

Recommended Actions

Key Takeaways & Next Steps

Secure the Paths Between Cloud Workloads