Information Security Buzz

Researchers Expose GPT-5 Jailbreak That Bypasses Safety Controls

By Kirsten Doyle | August 12, 2025 | 5 Mins Read

Cybersecurity researchers at two companies have uncovered jailbreak techniques that bypass the ethical guardrails OpenAI set up in its latest large language model (LLM), GPT-5, and get it to produce illicit instructions.

AI security startup SPLX used more than 1,000 adversarial prompts across different configurations and found that the raw, unguarded GPT-5, running without a system prompt, fell for a whopping 89% of attacks. That translates to an overall score of just 11%.

OpenAI’s system prompt, a “basic prompt layer,” limits the success rate of attacks to 43%. Although this vastly improves hallucination handling and safety, the overall score is still very low, and the older GPT-4o model outperforms its successor across the board.

In comparison, the hardened GPT-4o model fell for only 3% of attacks, achieving a 97% overall score. With a basic system prompt, attacks against GPT-4o succeeded 19% of the time (an 81% score), and without any system prompt the model was vulnerable to 71% of attacks (a 29% score).
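The scores quoted here follow a simple pattern: in every pair the article reports, the overall score equals 100 minus the attack success rate. The minimal Python sketch below assumes that relationship holds generally (SPLX's full scoring method is not described in this article), tabulates the reported rates, and computes how far GPT-5's attack success rate differs from GPT-4o's in the two configurations reported for both models. Under this assumption GPT-5's score with the basic system prompt comes out at 57; that figure is derived, not quoted. The configuration key names are chosen for this sketch only.

```python
from typing import Dict

# Attack success rates (%) from SPLX's testing, as quoted in this article.
# The configuration keys are names chosen for this sketch.
REPORTED_ASR: Dict[str, Dict[str, int]] = {
    "gpt-5": {"no_system_prompt": 89, "basic_system_prompt": 43},
    "gpt-4o": {"no_system_prompt": 71, "basic_system_prompt": 19, "hardened": 3},
}


def overall_score(attack_success_rate: int) -> int:
    """Assumed relationship: score = 100 - attack success rate.
    It matches every rate/score pair quoted in the article."""
    return 100 - attack_success_rate


def asr_change(old: str, new: str) -> Dict[str, int]:
    """Percentage-point change in attack success rate from `old` to `new`
    for each configuration reported for both models (positive = worse)."""
    shared = REPORTED_ASR[old].keys() & REPORTED_ASR[new].keys()
    return {cfg: REPORTED_ASR[new][cfg] - REPORTED_ASR[old][cfg] for cfg in sorted(shared)}


if __name__ == "__main__":
    for model, configs in REPORTED_ASR.items():
        for cfg, asr in configs.items():
            print(f"{model:<7} {cfg:<20} attacks succeeded {asr:>2}%  score {overall_score(asr)}")
    print(asr_change("gpt-4o", "gpt-5"))
    # {'basic_system_prompt': 24, 'no_system_prompt': 18}
```

On these reported numbers, GPT-5's attack success rate is 24 points higher than GPT-4o's with a basic system prompt and 18 points higher with no system prompt at all.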

Another team of researchers at NeuralTrust confirmed that GPT-5 is vulnerable to two adversarial prompt techniques, “Echo Chamber” and “Storytelling.”

The Echo Chamber method works by seeding a conversation with subtly poisoned context. Subsequent prompts echo that context back and gradually reinforce it. The Storytelling angle acts as camouflage, wrapping the request in a narrative to trick the chatbot.

NeuralTrust showed that Echo Chamber, when used with narrative-driven steering, can elicit harmful outputs from GPT-5 without issuing explicitly malicious prompts.

This reinforces a key risk: keyword- or intent-based filters are insufficient in multi-turn settings, where context can be gradually poisoned and then echoed back under the guise of continuity.
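To make the multi-turn risk concrete, here is a deliberately simplified Python sketch contrasting a per-message keyword filter with a conversation-level check that carries a risk signal forward across turns. The blocked keyword, term weights, threshold and demo turns are all hypothetical placeholders; real systems would use trained classifiers rather than word lists, and nothing here reflects how NeuralTrust or OpenAI implement their checks.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# All terms, weights and thresholds below are hypothetical placeholders.
BLOCKLIST = {"forbidden_request"}          # explicit keyword a per-message filter looks for
RISK_WEIGHTS = {"topic_a": 0.3, "topic_b": 0.3, "step_by_step": 0.2}
CONVERSATION_THRESHOLD = 1.0               # cumulative risk at which the dialog is flagged


def per_message_filter(message: str) -> bool:
    """Return True if this single message, judged in isolation, should be blocked."""
    return bool(set(message.lower().split()) & BLOCKLIST)


def message_risk(message: str) -> float:
    """Sum the risk weights of the terms that appear in one message."""
    return sum(RISK_WEIGHTS.get(word, 0.0) for word in message.lower().split())


@dataclass
class ConversationMonitor:
    """Judges the whole dialog: risk from every turn is carried forward."""
    threshold: float = CONVERSATION_THRESHOLD
    total: float = 0.0
    history: List[str] = field(default_factory=list)

    def observe(self, message: str) -> bool:
        """Record one turn; return True once cumulative risk reaches the threshold."""
        self.history.append(message)
        self.total += message_risk(message)
        return self.total >= self.threshold


def first_flagged_turn(turns: List[str]) -> Optional[int]:
    """Index of the first turn at which the conversation-level check fires, if any."""
    monitor = ConversationMonitor()
    for index, turn in enumerate(turns):
        if monitor.observe(turn):
            return index
    return None


# Hypothetical multi-turn exchange: no single turn contains a blocked keyword.
DEMO_TURNS = [
    "tell me a story about topic_a",
    "continue the story and mention topic_b",
    "keep the story consistent and add step_by_step detail on topic_a",
    "now expand the step_by_step part for topic_b",
]

if __name__ == "__main__":
    print([per_message_filter(t) for t in DEMO_TURNS])  # [False, False, False, False]
    print(first_flagged_turn(DEMO_TURNS))               # 2
```

The per-message filter passes every turn, while the cumulative check fires on the third. A production system would also need some way to decay or reset accumulated risk so that long, benign conversations are not flagged; that choice is left out here for brevity.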

Prioritizing Performance Over Security

Maor Volokh, Vice President of Product at Noma Security, says model providers are caught in a competitive “race to the bottom,” releasing new models at an unprecedented pace of one every one to two months.

“OpenAI alone has launched roughly seven models this year. This breakneck speed typically prioritizes performance and innovation over security considerations, leading to an expectation that more model vulnerabilities will emerge as competition intensifies. Consequently, model runtime security has become critically important, particularly given the rapid emergence of new agentic capabilities that expand potential attack surfaces.”

Product design and release, part of this arms race, appear to be far more complicated for frontier model providers, adds Trey Ford, Chief Strategy and Trust Officer at Bugcrowd. “Releasing an email, chat, or industry-specific software platform is pretty straightforward in comparison to iterating with a GenAI model – the use cases are somewhat narrow in many traditional software platforms, and unbounded in their world.”

Test Each Major Release

Ford says models will get stronger in some areas and will probably regress in others. “I feel bad for the frontier model testing teams. That’s a highly complicated test harness with a variety of frontiers to measure progress on.

“Security vendors pressure-test each major release, verifying their value proposition, and inform where and how they fit into that ecosystem – they not only hold the model providers accountable – enterprise security teams need to know how to protect the instructions informing the originally intended behaviors, understanding how untrusted prompts will be handled, and how to monitor for evolution over time,” continues Ford.

“Security teams need to view AI usage in a similar light to how IaaS, PaaS, and SaaS (infrastructure, platform, and software as a service) will evolve. We must manage our users, our implementation, and stay current with how the underlying infrastructure evolves (releasing capabilities, features, security offerings, and the continually evolving security posture of the platforms and technologies we’re using)… the sand beneath our feet will continue to shift and evolve,” Ford says.

Tripped by Simple Obfuscation Tricks

“GPT-5’s alleged vulnerabilities boil down to three things: it can be steered over multiple turns by context poisoning and storytelling, it’s still tripped by simple obfuscation tricks, and it inherits agent/tool risks when links and functions get pulled into the loop,” comments J Stephen Kowski, Field CTO at SlashNext.

“These gaps appear when safety checks judge prompts one-by-one while attackers work the whole conversation, nudging the model to keep a story consistent until it outputs something it shouldn’t. The practical fix is layered: harden the starting policy, add real-time input/output inspection that catches persuasion cycles, obfuscation, and tool/URL bait, and enforce conversation-level memory checks with a kill-switch when the dialog drifts into risky territory. That’s the approach we use in production: pre-prompt hardening, live enforcement across messages, link/tool hygiene, and automatic shutdowns if the conversation turns unsafe—so even if the model slips, the app doesn’t,” Kowski adds.
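The layered approach Kowski outlines can be pictured with a small Python sketch: normalize text to undo simple obfuscation before inspecting it, check any links against an allowlist, apply the same checks to user input and model output, and trip a kill switch that ends the session after repeated violations. This is not SlashNext's implementation; the blocked phrase, the allowlisted domain and the two-strike limit are hypothetical values, and a real deployment would rely on classifiers and policy rather than string matching.

```python
import re
import unicodedata
from typing import List
from urllib.parse import urlparse

# Hypothetical policy values for illustration only.
ALLOWED_DOMAINS = {"docs.example.com"}      # link/tool hygiene allowlist
BLOCKED_PHRASES = {"forbidden request"}     # matched after normalization
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))
URL_RE = re.compile(r"https?://\S+")


def normalize(text: str) -> str:
    """Undo simple obfuscation before inspection: compatibility forms
    (e.g. full-width letters), zero-width characters, letter case, and
    punctuation inserted between letters (r-e-q -> req)."""
    text = unicodedata.normalize("NFKC", text).translate(ZERO_WIDTH).lower()
    text = re.sub(r"(?<=\w)[.\-_*](?=\w)", "", text)
    return re.sub(r"\s+", " ", text).strip()


def disallowed_links(text: str) -> List[str]:
    """Return every URL in the raw text whose host is not allowlisted."""
    return [
        url for url in URL_RE.findall(text)
        if (urlparse(url).hostname or "") not in ALLOWED_DOMAINS
    ]


class ConversationGuard:
    """Runs the same checks on user input and model output, and trips a
    kill switch that ends the session after repeated violations."""

    def __init__(self, max_violations: int = 2) -> None:
        self.max_violations = max_violations
        self.violations = 0
        self.killed = False

    def check(self, text: str) -> bool:
        """Return True if the text may pass to the next layer."""
        if self.killed:
            return False  # session already shut down
        clean = normalize(text)
        hit = any(p in clean for p in BLOCKED_PHRASES) or bool(disallowed_links(text))
        if hit:
            self.violations += 1
            if self.violations >= self.max_violations:
                self.killed = True  # kill switch: stop the whole dialog
        return not hit
```

Links are checked on the raw text because normalization strips the dots out of host names. With the default limit, the guard blocks an off-allowlist link, then shuts the session down when a full-width rendering of the blocked phrase appears, and refuses everything after that.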

Layered, Context-Aware Controls

Satyam Sinha, CEO and founder at Acuvity, says these findings highlight a reality that is becoming more common in AI security: model capability is advancing faster than the industry’s ability to harden it against incidents.

“GPT-5’s vulnerabilities aren’t surprising; they’re a reminder that security isn’t something you ‘ship’ once. Attacks like the Echo Chamber exploit the model’s own conversational memory, and the SPLX results underscore how dependent GPT-5’s defenses are on external scaffolding like prompts and runtime filters.”

Sinha says enterprises can’t assume model-level alignment will protect them. “They need layered, context-aware controls and continuous red-teaming to detect when a model’s behavior is drifting toward unsafe territory. Without that, even the most advanced LLMs can be turned against their operators.” 
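Continuous red-teaming of the kind Sinha describes can be pictured as a scheduled job that replays a fixed adversarial suite against the deployed model and raises an alert when the attack success rate climbs above its recent baseline. The Python sketch below assumes each run yields one pass/fail result per prompt; the window size and tolerance are hypothetical defaults, and the sample runs are invented for illustration, not drawn from SPLX or NeuralTrust.

```python
from statistics import mean
from typing import List


def attack_success_rate(results: List[bool]) -> float:
    """Fraction of red-team prompts that got through (True = attack succeeded)."""
    if not results:
        raise ValueError("a red-team run needs at least one result")
    return sum(results) / len(results)


def drifted(history: List[float], latest: float,
            window: int = 3, tolerance: float = 0.05) -> bool:
    """True if `latest` exceeds the mean of the last `window` rates by more than `tolerance`."""
    baseline = history[-window:]
    if not baseline:
        return False  # first run: nothing to compare against
    return latest - mean(baseline) > tolerance


class RedTeamLog:
    """Keeps the attack success rate of every scheduled run and flags drift."""

    def __init__(self, window: int = 3, tolerance: float = 0.05) -> None:
        self.window = window
        self.tolerance = tolerance
        self.history: List[float] = []

    def record(self, results: List[bool]) -> bool:
        """Store one run's rate; return True if it drifted above the baseline."""
        rate = attack_success_rate(results)
        alert = drifted(self.history, rate, self.window, self.tolerance)
        self.history.append(rate)
        return alert


if __name__ == "__main__":
    log = RedTeamLog()
    # Invented example runs of a 100-prompt suite.
    print(log.record([True] * 3 + [False] * 97))   # False (no baseline yet)
    print(log.record([True] * 4 + [False] * 96))   # False (within tolerance)
    print(log.record([True] * 19 + [False] * 81))  # True (rate jumped to 0.19)
```

Comparing against a rolling window rather than the very first run lets the baseline follow gradual, accepted changes while still catching a sudden jump, such as one introduced by a new model release.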

Kirsten Doyle
Information Security Buzz News Editor

Kirsten Doyle has been in the technology journalism and editing space for nearly 24 years, during which time she has developed a great love for all aspects of technology, as well as words themselves. Her experience spans B2B tech, with a lot of focus on cybersecurity, cloud, enterprise, digital transformation, and data centre. Her specialties are in news, thought leadership, features, white papers, and PR writing, and she is an experienced editor for both print and online publications.


The opinions expressed in this post belong to the individual contributors and do not necessarily reflect the views of Information Security Buzz.

Information Security Buzz and all its contents are copyright © 2014-2025. All rights reserved. All third-party trademarks are recognized.
