Table of contents
- A vulnerability that threatens AI’s memory
- How the memory manipulation attack works
- The consequences: bias, misinformation, and false memories
- Google’s response and remaining risks
- How to protect yourself from these attacks
A vulnerability that threatens AI’s memory
A newly discovered prompt injection attack has exposed a critical vulnerability in the long-term memory management of Google Gemini, Google’s advanced language model. Researcher Johann Rehberger has demonstrated that it is possible to alter the AI’s stored information, inserting false and persistent data through seemingly harmless documents.
This exploit is particularly dangerous because it allows attackers to distort the AI’s perception of reality, leading to misinformation and potential risks for users who rely on Gemini for accurate answers.
How the memory manipulation attack works
This attack method leverages a technique known as “delayed tool invocation”, which bypasses the system’s safeguards. Essentially, a hacker can create a document containing hidden instructions, which are processed by Gemini when the user requests a summary.
These instructions may include commands that trick the AI into storing false information, activated only when specific words like “yes” or “no” are spoken. Once the manipulated data is saved, it can affect all future interactions between the user and Google Gemini, leading to biased responses and misinformation.
The consequences: bias, misinformation, and false memories
Rehberger’s experiment showcased alarming results. In one case, Gemini permanently stored completely inaccurate information, including a fabricated profile of an elderly 102-year-old user living in a dystopian Matrix-like world.
This type of manipulation is highly dangerous as it can create AI-generated biases, misleading users and distorting the accuracy of the information provided by the system.
Google’s response and remaining risks
Google acknowledged the issue but classified the risk as low, stating that the attack requires a combination of phishing and prolonged interaction with compromised content. The company also highlighted that users are notified whenever new information is saved in Gemini’s memory, giving them the option to delete it.
However, many users may ignore these notifications, unaware of the potential long-term effects. This leaves room for large-scale manipulation attempts, especially by skilled social engineers.
How to protect yourself from these attacks
To reduce the risk of memory manipulation in Google Gemini, users should:
- Regularly review stored data via the settings panel;
- Avoid interacting with suspicious documents from untrusted sources;
- Pay attention to memory update notifications and remove unfamiliar information;
- Advocate for explicit confirmations before allowing the AI to store new information.
The prompt injection attack highlights the need for greater awareness and caution when using advanced AI models.