ChatGPhish and the Browser Era of Prompt Injection: When 'Summarize This Page' Becomes Your Next Phishing Click

ChatGPhish uses indirect prompt injection against ChatGPT to render live phishing UI inside a trusted AI window, leaking IP addresses and browser metadata before the user clicks anything.

Zyfolks Team

Phishing used to require a malicious attachment, a sketchy login page, or at minimum a user dumb enough to click a weird link in an email. ChatGPhish kills that assumption. According to Permiso Security, an attacker can now poison a regular web page with hidden Markdown, wait for an employee to ask ChatGPT to summarize it, and have the AI itself render the phishing UI — live clickable links, fake account alerts, even QR codes — inside the trusted assistant window. The malicious payload is delivered by the tool the user already trusts.

That shift, from inbox to browser to AI surface, is the story of the last six months in AI security. ChatGPhish is just the loudest example.

How ChatGPhish Weaponizes the Summary Itself

Permiso Security researcher Andi Ahmeti disclosed that the chatgpt.com response renderer trusts Markdown links and image URLs that come from third-party pages the assistant has just summarized. It auto-fetches images and surfaces links as live, clickable elements inside the assistant UI. That trust boundary, or the lack of one, is the entire bug.

Why it matters: every previous phishing control assumed an explicit user action, such as opening an attachment, clicking through an email, or visiting a domain. With ChatGPhish, none of that happens. A user browses a normal-looking article, asks ChatGPT to summarize it, and the model renders attacker-controlled Markdown as live UI inside a window the user has been trained to trust. Worse, auto-fetched images leak the user’s IP address, User-Agent, and Referer back to the attacker, giving them telemetry before the victim has clicked anything. QR codes served from an S3 bucket push the target to scan with a phone, sidestepping desktop URL filtering and corporate proxies entirely.

Picture a financial analyst using ChatGPT to summarize a vendor’s quarterly report. The report contains a hidden block of Markdown. The rendered summary now includes a “security alert” telling the analyst to verify their Okta session via a QR code. They scan it on a personal phone, outside the corporate network, and the credentials are gone. Our take: this is the first prompt-injection bug where the AI assistant is not the victim — it is the delivery mechanism, and the social engineering is provided free by OpenAI’s own UI chrome.
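
To make the missing trust boundary concrete, here is a minimal Python sketch of one possible mitigation: neutralizing Markdown images and links in model output before it reaches a renderer. This is not how chatgpt.com works and not a fix described by Permiso. The function name, the placeholder text, and the TRUSTED_HOSTS allowlist are all hypothetical, and the regexes cover only inline Markdown syntax, not reference-style links, autolinks, or raw HTML.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist; a real deployment would load this from policy.
TRUSTED_HOSTS = {"example.com"}

# Inline images: ![alt](url ...)
IMAGE_RE = re.compile(r"!\[([^\]]*)\]\([^)]*\)")
# Inline links: [label](url ...), capturing the first whitespace-free token as the URL
LINK_RE = re.compile(r"\[([^\]]*)\]\(\s*([^)\s]+)[^)]*\)")


def _is_trusted(url: str) -> bool:
    """Allow only https URLs whose host is on the allowlist."""
    try:
        parsed = urlparse(url)
    except ValueError:
        return False  # malformed URL: fail closed
    return parsed.scheme == "https" and parsed.hostname in TRUSTED_HOSTS


def neutralize_markdown(text: str) -> str:
    """Drop every inline image and defang links to hosts that are not allowlisted."""
    # Images are removed unconditionally so nothing is auto-fetched.
    text = IMAGE_RE.sub(lambda m: f"(image removed: {m.group(1)})", text)

    def _link(match: re.Match) -> str:
        label, url = match.group(1), match.group(2)
        if _is_trusted(url):
            return match.group(0)  # keep trusted links as written
        return f"{label} (link removed)"

    return LINK_RE.sub(_link, text)


if __name__ == "__main__":
    summary = (
        "Summary. ![x](https://evil.test/p.png?id=1) "
        "[Verify now](https://evil.test/login) and [docs](https://example.com/a)"
    )
    print(neutralize_markdown(summary))
```

Removing images outright, rather than filtering them by host, is a deliberate choice in this sketch: the telemetry leak happens at fetch time, so the safest default is to fetch nothing.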

Why AI Coding Agents Are the New Supply Chain Disaster

Adversa AI, with researcher Rony Utevsky, documented two attacks targeting AI coding agents and CLIs: SymJack and TrustFall. SymJack tricks an agent into a benign-looking file copy whose destination is a symlink pointing at the agent’s own configuration; on the next restart, a malicious MCP server spawns and runs arbitrary code with the developer’s full user privileges. TrustFall is even cleaner — a malicious repository ships a configuration that auto-approves an MCP server, so the moment a developer clones the repo and clicks “Yes, I trust this folder,” attacker code is running as a native OS process.
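
The SymJack pattern depends on a write that follows a symlink out of the project directory. As a rough illustration, and not Adversa AI’s recommended fix or any agent’s actual code, a file-copy helper could refuse symlink destinations and anything that resolves outside the workspace. The name safe_copy and its parameters are hypothetical, and the check is subject to a race between the check and the write, so treat it as a sketch of the idea rather than a hardened control.

```python
import shutil
from pathlib import Path


def safe_copy(src, dst, workspace):
    """Copy src to dst only if dst is not a symlink and stays inside workspace."""
    workspace = Path(workspace).resolve()
    dst = Path(dst)

    # Refuse to write through a symlink at the destination itself.
    if dst.is_symlink():
        raise PermissionError(f"refusing to write through symlink: {dst}")

    # resolve() also follows symlinks in parent directories, so a linked
    # directory cannot be used to escape the workspace either.
    resolved = dst.resolve()
    if not resolved.is_relative_to(workspace):  # Python 3.9+
        raise PermissionError(f"destination escapes workspace: {resolved}")

    shutil.copyfile(src, resolved)
    return resolved
```

An agent runtime that routed every file write through a check like this would turn the benign-looking copy into a visible error instead of a silent configuration overwrite.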

Mitiga separately disclosed a Claude Code vulnerability in which a rogue npm package rewrites MCP endpoints in ~/.claude.json, putting the attacker between Claude Code and an OAuth-backed MCP server to steal downstream SaaS tokens. LayerX’s ClaudeBleed flaw lets any browser extension, even one with zero special permissions, issue commands to Claude’s Chrome extension because the extension never verifies who is calling it.
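
For the endpoint-rewrite case, one cheap detection is to diff the MCP servers registered in an agent config against an allowlist. The sketch below assumes the config is JSON that stores servers in objects under a key named "mcpServers", each with an optional "url" or "command" field. That layout is an assumption for illustration, not a documented schema, so check it against the file on your own machine. The function names and allowlists are hypothetical.

```python
from urllib.parse import urlparse


def find_mcp_servers(node, path=()):
    """Yield (location, name, entry) for every entry under an 'mcpServers' key."""
    if isinstance(node, dict):
        for key, value in node.items():
            here = path + (key,)
            if key == "mcpServers" and isinstance(value, dict):
                for name, entry in value.items():
                    yield here, name, entry
            else:
                yield from find_mcp_servers(value, here)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_mcp_servers(item, path + (index,))


def unexpected_servers(config, allowed_hosts, allowed_commands):
    """Return findings for servers whose endpoint or command is not allowlisted."""
    findings = []
    for location, name, entry in find_mcp_servers(config):
        if not isinstance(entry, dict):
            findings.append((location, name, "malformed entry"))
            continue
        url = entry.get("url")
        command = entry.get("command")
        if url is not None:
            host = urlparse(url).hostname if isinstance(url, str) else None
            if host not in allowed_hosts:
                findings.append((location, name, f"unapproved endpoint: {url}"))
        if command is not None and command not in allowed_commands:
            findings.append((location, name, f"unapproved command: {command}"))
    return findings
```

In practice the config argument would come from something like json.load on the file in question, and the check would run in CI or an endpoint agent, where a rewritten endpoint shows up as a finding rather than as a stolen token.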

Why it matters: developers have spent two decades hardening package registries, and now the weak link has moved one layer up. The agent’s configuration file, the MCP trust dialog, and the folder-trust prompt are the new package.json, and they are being shipped with far less scrutiny. A Snyk audit of the agent skills ecosystem covering ClawHub and skills.sh found that 534 of 3,984 skills (13.4%) contained at least one critical security issue, including malware distribution, prompt injection, and exposed secrets. Another 1,467 had at least one flaw such as hard-coded API keys or insecure credential handling.

If you run a team that ships custom web and SaaS platforms, this is the threat model you need on the whiteboard before the next sprint: a junior engineer clones a repo from an unfamiliar GitHub link, opens it in Cursor, and clicks the trust prompt without thinking. Game over for the workstation, and probably for any cloud credentials cached in the shell. Our take: agent skill registries are about to become the npm event-stream incident of 2026 — except instead of stealing crypto wallets, the payloads will pivot to the cloud control plane the developer already authenticated against.

The Quiet Expansion of the Prompt Injection Attack Surface

Prompt injection used to be a curiosity. Now it has variants for every input channel a model touches. Cisco documented multi-turn jailbreaks where adversaries iterate, reframe refusals, and adopt personas over many turns — exactly the workflow a real attacker uses, and exactly what single-turn safety benchmarks miss. Adversa AI showed off Involuntary In-Context Learning (IICL), a novel jailbreak that bypassed GPT-5.4 safety constraints by exploiting the tension between in-context learning and safety alignment.

Cisco’s research on vision-language models showed that adversarial text rendered as images — typographic prompt injection — can carry fully readable instructions to a VLM even when the image looks like noise or illegible distortion to any OCR-based filter. Microsoft Semantic Kernel shipped patches for CVE-2026-25592 and CVE-2026-26030, two vulnerabilities that turn a prompt injection into host-level remote code execution. Apple Intelligence’s local model was tricked using the Neural Exec attack combined with Unicode right-to-left-override characters; the issue was addressed in iOS 26.4 and macOS 26.4. BrowserOS, an open-source agentic browser, was patched in version 0.32.0 against WebPromptTrap, which deceived users into approving authorization steps through AI summaries of innocuous-looking articles. And Lasso Security disclosed a pair of attacks against NemoClaw, NVIDIA’s open-source reference stack for securing OpenClaw agents, that exfiltrate data via a malicious GitHub repository or npm package using the sandbox’s default configuration.

Why it matters: every one of these bugs has the same root cause — the model treats untrusted input (a web page, an image, a repo, a config file) as authoritative context. Imagine you’re a hospital running an AI triage assistant that summarizes inbound referral letters. A specially crafted PDF could now instruct the model to flag every chest-pain case as low priority, or to silently exfiltrate patient identifiers through image fetches. For anyone shipping healthcare software with AI-assisted workflows, the regulatory exposure of one of these bugs reaching production is enormous. Our take: by Q3 2026, expect at least one regulator — likely the FTC or a European data-protection authority — to issue formal guidance treating indirect prompt injection as a foreseeable risk, the same way SQL injection became a foreseeable risk after CardSystems in 2005.
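
One narrow piece of this attack surface is easy to check mechanically: the Unicode bidirectional control characters, such as the right-to-left override mentioned above. The following Python sketch flags and strips them from text before it reaches a model. It does not address Neural Exec or any other technique in this section, it is not the fix Apple shipped, and stripping these characters can damage legitimate right-to-left text, so the function names and the choice of character set are illustrative assumptions.

```python
import unicodedata

# Bidirectional embedding, override, and isolate controls (U+202A..U+202E, U+2066..U+2069).
BIDI_CONTROLS = frozenset("\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069")


def find_bidi_controls(text):
    """Return (index, character name) for each bidi control character in text."""
    return [
        (i, unicodedata.name(ch, f"U+{ord(ch):04X}"))
        for i, ch in enumerate(text)
        if ch in BIDI_CONTROLS
    ]


def strip_bidi_controls(text):
    """Return text with all bidi control characters removed."""
    return "".join(ch for ch in text if ch not in BIDI_CONTROLS)


if __name__ == "__main__":
    sample = "Summarize this\u202e.txt"
    print(find_bidi_controls(sample))
    print(strip_bidi_controls(sample))
```

Logging the findings before stripping is usually more useful than stripping silently, because the presence of an override character in an inbound document is itself a signal worth reviewing.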

Why Cloud Is the Endgame for AI-Driven Attacks

Palo Alto Networks Unit 42 built a proof-of-concept agent called Zealot that conducts end-to-end cloud attacks with minimal human guidance, chaining reconnaissance, exploitation, privilege escalation, and data exfiltration. Researchers Yahav Festinger and Chen Doytshman observed that cloud environments are “AI-Attack-Ready” by default — every action has an API equivalent, discovery mechanisms like metadata services are everywhere, misconfigurations are common, and access is credential-based. Unit 42 also warned that frontier AI models risk enabling adversaries to exploit zero-days and N-days faster and at greater scale than before.

Why it matters: the attacks themselves are not novel. What is new is that operations that once required specialized expertise now run end-to-end from an AI agent, no human operator needed between steps. If you operate a logistics or supply chain platform on AWS or GCP, the worst case is no longer a skilled attacker spending weeks pivoting through your infrastructure — it is an off-the-shelf agent doing the same job in hours, while the human operator sleeps. Our take: the IAM diff is going to become the single most important artifact in cloud security review by the end of 2026, because the only durable defense against an agent that chains API calls is making sure no chain leads anywhere interesting.
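
The idea that no chain should lead anywhere interesting can be framed as graph reachability. The sketch below is a toy model, not Unit 42’s tooling or any cloud provider’s API: each edge means "this principal can assume or access that node", and a breadth-first search reports which sensitive resources a given starting identity can reach. All node names are made up for illustration.

```python
from collections import deque


def reachable_targets(edges, start, sensitive):
    """Return {target: path} for each sensitive node reachable from start."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)

    # Breadth-first search, remembering each node's predecessor.
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)

    # Rebuild the chain from start to each reachable sensitive node.
    paths = {}
    for target in sensitive:
        if target in parent:
            path = []
            node = target
            while node is not None:
                path.append(node)
                node = parent[node]
            paths[target] = path[::-1]
    return paths


if __name__ == "__main__":
    # Hypothetical permission edges for illustration only.
    edges = [
        ("ci-runner", "role:deploy"),
        ("role:deploy", "role:admin"),
        ("role:admin", "s3:customer-data"),
        ("analyst", "s3:public-assets"),
    ]
    print(reachable_targets(edges, "ci-runner", {"s3:customer-data"}))
```

Running a check like this on every IAM change turns the review question from "does this policy look reasonable" into "did this diff create a new path to something sensitive", which is the question an automated attacker is effectively asking.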

FAQ

Q: What is ChatGPhish and is my ChatGPT account at risk? A: ChatGPhish is a vulnerability disclosed by Permiso Security in which ChatGPT’s web summarization feature renders attacker-controlled Markdown links, images, and QR codes from any page a user asks it to summarize. The risk is not to your account directly — it is to anyone who uses ChatGPT to summarize untrusted web pages and trusts the rendered output.

Q: How is indirect prompt injection different from a traditional phishing email? A: Traditional phishing requires the victim to interact with a suspicious message or attachment. Indirect prompt injection hides the malicious instructions inside content the AI processes on the user’s behalf — a web page, an image, a code repository — so the user never sees the payload, only the AI’s response, which now contains the attacker’s chosen UI elements.

Q: Should developers stop using AI coding agents because of SymJack and TrustFall? A: No, but they should stop clicking through folder-trust and MCP-approval dialogs reflexively. Both attacks require the developer to open an untrusted repository in an agentic tool and accept a generic trust prompt, so treating those prompts with the same suspicion as a UAC dialog from an unknown installer is the practical mitigation today.

Key Takeaways

  • Treat any AI-rendered summary of third-party content as untrusted output; introduce a policy that AI assistants must not auto-fetch external resources or render clickable links from summarized pages.
  • Audit your developers’ agentic tooling configurations (~/.claude.json, MCP server registrations, skill registry installs) the same way you audit SSH keys and cloud credentials.
  • Multi-turn red-teaming should now be a procurement requirement for any LLM-powered product handling sensitive data; single-turn safety scores are marketing, not security.
  • Cloud IAM is the last line of defense against autonomous attack agents like Unit 42’s Zealot; least-privilege is no longer a best practice but a survival requirement.
  • Expect agent skill registries (ClawHub, skills.sh, and their successors) to be the next major supply-chain attack vector, and budget for skill provenance verification in 2026 security spend.
