In a pair of startling security research reports, cybersecurity firms SaltLabs and HiddenLayer have uncovered critical vulnerabilities affecting the artificial intelligence ecosystems of OpenAI's ChatGPT and Google's Gemini language model.
The findings reveal methods for malicious actors to hijack user accounts across third-party services, access sensitive data, and even manipulate AI-generated content to spread misinformation.
SaltLabs Exposes ChatGPT Plugin Flaws
The research from SaltLabs focused on the plugin ecosystem of ChatGPT, which enables third-party services to integrate with the popular AI assistant. Their investigation identified multiple vulnerabilities that could have allowed attackers to install malicious plugins on user accounts without consent.
One flaw stemmed from ChatGPT's failure to properly validate permissions during the OAuth authentication process when installing new plugins. By exploiting this, attackers could hijack the installation flow and trick users into inadvertently granting account access to attacker-controlled plugins.
Perhaps more alarmingly, SaltLabs found two critical account takeover vulnerabilities affecting dozens of plugins built using the PluginLab framework, including AskTheCode which interfaces with GitHub repositories. Attackers could abuse these vulnerabilities to steal user authorization codes and fully compromise plugin accounts, potentially exposing private code, secrets, and other sensitive data.
Image: SaltLab |
The researchers also demonstrated classic OAuth redirection manipulation attacks against plugins like Kesem AI, enabling account takeovers by redirecting users to attacker-controlled sites during authentication handshakes.
However, the specific vendors quickly resolved the disclosed issues after responsible coordination by SaltLabs, the findings highlighted systemic risks accompanying the rise of AI assistant plugin ecosystems and underscored the need for robust security practices as these platforms evolve.
HiddenLayer Reveals Gemini Prompt Hacking Threats
Shifting focus to Google's Gemini AI language model, researchers at HiddenLayer uncovered an array of "prompt hacking" vulnerabilities that could enable malicious individuals to manipulate system outputs and compromise data integrity.
One vulnerability allowed extracting segments of Gemini's internal system prompts, revealing private operational logic that could aid targeted attacks. More concerningly, HiddenLayer demonstrated techniques to jailbreak Gemini models and generate misinformation about elections or other events by circumventing guardrails designed to prevent such outputs.
The team also revealed a multi-step attack exploiting Gemini's advanced reasoning capabilities to produce dangerous instructions, like a detailed guide on hotwire car instructions which are typically blocked by AI ethics constraints.
Perhaps most insidiously, HiddenLayer revived "indirect injection" attacks by implanting malicious prompts into shared Google documents that could hijack Gemini chat sessions when accessed via Gemini's Google Workspace plugin.
Indirect Injection attack was originally discovered by Kai Greshake and is the process of injecting a language model via a medium that is not text-based. It was possible to execute the attack via a Google doc, in the early days of Google Bard (Now Gemini). Google fixed the issue by removing the feature due to the risk of malicious files.
While indirect injections had been addressed for Google's previous Bard model, the findings show such risks extend to Gemini's more advanced AI capabilities, enabling potential phishing attacks and unauthorized data exfiltration from compromised sessions.
Protecting AI Ecosystems and Mitigating Risks
In response to the disclosures, Google and OpenAI have acknowledged the findings and are taking steps to enhance the security posture of their AI technologies and developer guidance.
However, the research underscores the complex challenge of hardening AI systems against an ever-evolving landscape of attacks while protecting user privacy and maintaining robust guardrails to prevent misuse.
As generative AI ecosystems rapidly expand across industries, continued vigilance from AI providers, third-party developers, enterprises and end-users will be paramount to upholding trust and mitigating risks that could undermine the substantial benefits these powerful technologies offer.
EDITORIAL NOTE: The content of these research papers contains references to potentially unethical, dangerous or illegal content. While included for informational completeness, under no circumstances should readers attempt to replicate or engage in any activities described therein.