A novel attack, dubbed ConfusedPilot, has been discovered, targeting widely used Retrieval Augmented Generation (RAG)-based AI systems such as Microsoft 365 Copilot.
This method allows malicious actors to manipulate AI-generated responses by introducing malicious content into documents referenced by these systems. The potential consequences include widespread misinformation and compromised decision-making across entities that rely on AI to help with critical tasks.
With 65% of Fortune 500 companies currently implementing or planning to adopt RAG-based AI systems, the implications of these attacks are significant.
The researchers from the University of Texas at Austin, led by Professor Mohit Tiwari, have highlighted the importance of understanding the attack, which was unveiled at DEF CON’s AI Village. The team has chosen to withhold specific exploit details to prevent further harm while outlining the attack’s methodology and potential mitigations.
How it Works
In a ConfusedPilot attack, an adversary would typically follow several key steps.
First, they would introduce a seemingly innocuous document containing specially crafted strings into the target’s environment. This can be done by anyone with access to upload or save documents in a system indexed by the AI copilot.
When a user makes a relevant query, the RAG system retrieves this document, and the AI interprets the embedded strings as instructions. These instructions can suppress legitimate content, generate misinformation, or falsely attribute responses to credible sources, increasing the perceived accuracy of the output.
Even after the malicious document is removed, the corrupted information may persist in the AI’s responses for some time. The ease of this attack is worth mentioning, as it requires only basic access and uses simple text strings that act as plain prompts for the AI. Anyone with access to the system’s data pool can execute it.
Who is at Risk?
Organizations that allow multiple users to contribute to data pools or employ AI systems for decision-making are particularly vulnerable. Examples of affected environments include:
- Enterprise knowledge management systems: Misinformation could spread across an organization, impacting critical business decisions.
- AI-assisted decision support systems: Injected malicious data may persist even after removal, leading to faulty strategic decisions.
- Customer-facing AI services: Attackers could compromise responses delivered to
Missed Opportunities, Lost Revenue
“One of the biggest risks to business leaders is making decisions based on inaccurate, draft, or incomplete data, which can lead to missed opportunities, lost revenue, and reputational damage,” comments Stephen Kowski, Field CTO SlashNext. “The ConfusedPilot attack highlights this risk by demonstrating how RAG systems can be manipulated by malicious or misleading content in documents not originally presented to the RAG system, causing AI-generated responses to be compromised.”
What’s interesting is the RAG taking instructions from the source documents themselves as if they were in the original prompt, similar to how a person would read a confidential document and say they can’t share certain pieces of information, Kowski adds. “This demonstrates the need for robust data validation, access controls, and transparency in AI-driven systems to prevent such manipulation.”
Ultimately, he says this can lead to a wide range of unintended outcomes, including but not limited to denial of access to data, presentation of inaccurate information, access to deleted items that should be inaccessible, and other potential attacks by chaining these vulnerabilities together.
Non-Human Identities
Malicious actors are increasingly looking at weaker parts of the perimeter, such as non-human identities (NHIs), which control machine-to-machine access and are increasingly critical in cloud environments, says Amit Zimerman, Co-Founder and Chief Product officer at Oasis Security. “NHIs now outnumber human identities in most organizations, and securing these non-human accounts is vital, especially in AI-heavy architectures like Retrieval-Augmented Generation (RAG) systems.”
To successfully integrate AI-enabled security tools and automation, organizations should start by evaluating the effectiveness of these tools in their specific contexts, Zimerman says. “Rather than being influenced by marketing claims, teams need to test tools against real-world data to ensure they provide actionable insights and surface previously unseen threats. Existing security frameworks may need to be updated, as older frameworks were designed for non-AI environments. A flexible approach that allows for the continuous evolution of security policies is vital.”
The Rush to AI
“As organizations adopt Gen AI, they want to train in corporate data, but often that is in dynamic repositories like Jira, SharePoint, or even trouble ticket systems,” adds John Bambenek, President at Bambenek Consulting. “Data may be safe at one point but can become dangerous when subtly edited by a malicious insider. AI systems see and parse everything, even data that humans might overlook, which makes the threat even more problematic.”
Bambenek says this is a reminder that the rush to implementing AI systems is far outstripping our ability to grasp, much less mitigate the risks.
Mitigation Strategies
To combat this vulnerability, cybersecurity experts recommend a multi-layered approach: Mitigating ConfusedPilot attacks requires a multi-faceted approach. Organizations should implement strict data access controls, ensuring that only authorized individuals can modify or upload data referenced by AI systems.
Regular data integrity audits are essential to detect any unauthorized changes to data repositories early. Sensitive data should be isolated through segmentation to prevent the spread of compromised information across AI outputs.
Also, AI-specific security tools like fact-checkers, anomaly detection systems, and prompt shields can help monitor for irregularities in AI responses. Finally, human oversight is key, particularly in decision-making contexts, to validate the accuracy of AI-generated content.
The opinions expressed in this post belongs to the individual contributors and do not necessarily reflect the views of Information Security Buzz.