automation, supply-chain-attack, github-actions, pypi-security, yaml-injection, devsecops, python-package-security, .pth-file-exploit

The elementary-data Compromise: How a Two-Day-Old GitHub Account Hijacked a Million-Download PyPI Package

A two-day-old GitHub account hijacked elementary-data's 1.1M-download PyPI package via GitHub Actions YAML injection. Here's how—and how to defend yourself.

Zyfolks Team

A two-day-old GitHub account just walked into a Python package with 1.1 million monthly downloads, dropped a comment on a pull request, and walked out with the keys to publish malicious code as the project itself. No stolen passwords. No phished maintainer. Just a single unquoted expression in a YAML file. If you ship Python and you still treat your GitHub Actions workflows as glue code rather than production infrastructure, the elementary-data incident is your warning shot.

How a PR Comment Became a Publishing Token

On April 24, 2026, at 22:10 UTC, an attacker using the freshly created GitHub account realtungtungtungsahur posted a crafted comment on PR #2147 in the elementary-data repository. According to the Snyk advisory (SNYK-PYTHON-ELEMENTARYDATA-16316110, CVSS 9.3), the comment exploited a script injection flaw in .github/workflows/update_pylon_issue.yml, where ${{ github.event.comment.body }} was interpolated directly into a run: block before bash ever parsed it. Ten minutes later, elementary-data==0.23.3 was live on PyPI. Four minutes after that, a poisoned Docker image followed.

The attacker never needed write access to the repository. The GITHUB_TOKEN available to the runner had enough scope to create commits, push tags, and dispatch other workflows, including the legitimate Release package workflow, which dutifully built and published an orphan commit (b1e4b1f3aad0d489ab0e9208031c67402bbb8480) that was authored as github-actions[bot] and carried a forged “Verified” PGP signature. The project’s own CI did the publishing. The attacker just pointed it at a malicious tag.

If you maintain a Python package and your release workflow uses ref: ${{ inputs.tag || github.ref }} for the checkout step, you have the same primitive sitting in your repo right now. An attacker who can dispatch a release workflow with an arbitrary tag can build whatever commit they want, regardless of whether it ever touched master. Expect more attackers to start chaining workflow_dispatch with orphan commits the same way: it’s clean, it’s stealthy, and it bypasses every branch protection rule you’ve configured.
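One way to close that gap is to refuse to build any tag whose commit is not reachable from the protected branch. The following is a minimal sketch, assuming git is on the PATH and the release job has a full clone; the function name and the idea of running it as a pre-publish step are illustrative assumptions, not something the advisory prescribes.

```python
import subprocess


def is_on_branch(repo_dir: str, ref: str, branch: str) -> bool:
    """Return True if `ref` resolves to a commit reachable from `branch`.

    Hypothetical helper for a release job: an orphan commit behind an
    attacker-pushed tag shares no history with the protected branch,
    so this returns False for it.
    """
    # `git merge-base --is-ancestor A B` exits 0 when A is an ancestor of B
    # and 1 when it is not. Any other exit code is an error (e.g. unknown ref).
    result = subprocess.run(
        ["git", "-C", repo_dir, "merge-base", "--is-ancestor", ref, branch],
        capture_output=True,
    )
    if result.returncode not in (0, 1):
        raise RuntimeError(result.stderr.decode(errors="replace").strip())
    return result.returncode == 0
```

A release script could call is_on_branch(".", tag, "origin/master") and abort before the build when it returns False. This only helps if the check runs in a job the attacker cannot edit or skip.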

Why the .pth File Trick Is the New Default

The malicious payload didn’t live in __init__.py or any module a developer would explicitly import. It lived in elementary.pth, dropped into site-packages and executed automatically by Python’s site.py at interpreter startup. Any line in a .pth file that begins with import followed by a space or tab runs as Python code before user code does. Once the package was installed, the next Python process started in that environment detonated the malware, with no import of elementary required.
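To see which .pth lines the interpreter would execute in a given environment, a short scan is enough. This is a rough sketch that mirrors the "import followed by a space or tab" rule site.py applies; the function name is made up for illustration, and a hit is only a prompt for review, since some legitimate packages ship import lines in .pth files too.

```python
from pathlib import Path


def executable_pth_lines(site_dir):
    """List (file name, line number, line) for .pth lines that site.py would exec."""
    findings = []
    for pth in sorted(Path(site_dir).glob("*.pth")):
        try:
            text = pth.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip rather than crash the scan
        for lineno, line in enumerate(text.splitlines(), start=1):
            # site.py executes lines starting with "import" plus a space or tab;
            # every other non-comment line is treated as a path entry.
            if line.startswith(("import ", "import\t")):
                findings.append((pth.name, lineno, line.strip()))
    return findings
```

To scan the active environment, pass it sysconfig.get_paths()["purelib"] from the standard library. Long or base64-looking import lines deserve the closest look.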

It was the second high-profile PyPI compromise in roughly four months to use the technique; the LiteLLM v1.82.8 incident, referenced in the same advisory, was the first. Most static scanners and curious developers look at package modules. Almost nobody opens .pth files. Combined with three layers of obfuscation (a base64 outer wrapper, XOR with an MD5 keystream seeded with swabag, then a second XOR pass), the payload was designed to survive a casual grep through site-packages.

If you’re a data team running pip install elementary-data inside a CI runner that has Snowflake credentials, AWS role credentials via IMDSv2, and a .dbt/profiles.yml mounted in, the malware fired the moment Python started — even if your build never executed an elementary command. Expect security tooling vendors to start treating .pth files as first-class scanning targets within the year. Until then, anyone running an AI-assisted automation pipeline over data warehouse jobs should assume their package install step is part of their attack surface, not a setup chore.

The Credential Sweep That Goes Far Beyond dbt

Once active, the payload swept the filesystem for a broad set of secrets: dbt profiles, Snowflake/BigQuery/Redshift/Databricks credentials, AWS keys plus live IMDSv2 role credentials with direct SigV4 calls to Secrets Manager and SSM Parameter Store, GCP application_default_credentials.json, Azure service principals, SSH private keys, ~/.docker/config.json, ~/.kube/config, every /etc/kubernetes/*.conf file, ServiceAccount tokens, ~/.npmrc, ~/.pypirc, ~/.cargo/credentials.toml, .env* files up to six directories deep, ~/.vault-token, ~/.netrc, ~/.pgpass, ~/.my.cnf, shell history, /etc/passwd, /etc/shadow, /var/log/auth.log, and cryptocurrency wallets for Bitcoin, Litecoin, Dogecoin, Zcash, Dash, Monero, Ripple, Ethereum, Cardano, and Solana validator keypairs. Everything got bundled into trin.tar.gz and exfiltrated via curl --data-binary to igotnofriendsonlineorirl-imgonnakmslmao.skyhanni.cloud.
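For triage, it can help to list which of those files actually exist on a runner or workstation. The sketch below covers only a subset of the home-directory paths named above. The ~/.dbt/profiles.yml entry assumes dbt's default profile location, and the helper name is hypothetical. It does not look at environment variables, cloud metadata endpoints, or system paths, so an empty result is not a clean bill of health.

```python
from pathlib import Path

# Home-relative files from the list above (partial, for illustration only).
CANDIDATE_FILES = [
    ".dbt/profiles.yml",  # assumed default dbt profile location
    ".docker/config.json",
    ".kube/config",
    ".npmrc",
    ".pypirc",
    ".cargo/credentials.toml",
    ".vault-token",
    ".netrc",
    ".pgpass",
    ".my.cnf",
]


def secrets_in_reach(home):
    """Return the candidate files that exist under `home`, in list order."""
    home = Path(home)
    return [rel for rel in CANDIDATE_FILES if (home / rel).is_file()]
```

Calling secrets_in_reach(Path.home()) on an affected machine gives a starting list for credential rotation.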

The targeting is deliberate. Anyone running elementary-data is almost certainly running it next to a connected warehouse, with cloud credentials, often inside a CI/CD pipeline where those credentials sit in environment variables. But the payload’s reach into Kubernetes configs, Vault tokens, and crypto wallets shows the attackers wrote it to drain everything within reach — not just dbt material.

For a healthcare analytics team using elementary-data over patient pipelines, the blast radius isn’t just “someone can read our warehouse.” It’s PHI access keys, Kubernetes cluster admin tokens, and any Vault secret the runner could reach. Teams building compliance-bound healthcare data systems need to treat their CI runners as Tier-1 systems, not throwaway VMs, because for the roughly ten to thirteen hours that 0.23.3 was live (from 22:20 UTC on April 24 until its removal between 8:51 and 11:51 UTC on April 25), every affected runner was effectively exfiltrating its entire secret store.

What Maintainers Should Do Before the Next One Lands

This attack rhymes with the Ultralytics compromise (December 2024) and the LiteLLM compromise (early 2026). The mechanic is the same: find a gap in GitHub Actions, steal the PyPI publishing token, push a poisoned version. The Snyk write-up makes the durable fix explicit: stop using long-lived PyPI API tokens in workflow secrets entirely. PyPI’s Trusted Publishers feature uses short-lived OIDC tokens scoped to a specific workflow on a specific repository. Tokens that don’t exist on disk can’t be exfiltrated.

The second control is mechanical: any workflow that processes pull_request_target, issue_comment, or pull_request_review_comment events needs to treat user-controlled context expressions as untrusted input. Don’t interpolate ${{ github.event.comment.body }} into a run: block. Pass it as an environment variable and quote it. Better, route those events through a workflow that has zero secrets in scope.
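The difference between the two patterns is easy to reproduce locally. The sketch below imitates expression expansion with plain string substitution and assumes a POSIX sh is available. The step bodies and the payload are invented for illustration and are not the contents of the real workflow or the real comment.

```python
import os
import subprocess

# The expression exactly as it would appear in a workflow file.
EXPRESSION = "${{ github.event.comment.body }}"

# Hypothetical step bodies, not taken from the affected repository.
UNSAFE_TEMPLATE = 'echo "Comment: ' + EXPRESSION + '"'
SAFE_SCRIPT = "printf '%s' \"$COMMENT_BODY\""

# Illustrative payload: closes the quote, then adds its own command.
PAYLOAD = '"; echo INJECTED; echo "'


def run_unsafe(comment_body):
    """Paste the comment into the script text, then hand the result to the shell."""
    script = UNSAFE_TEMPLATE.replace(EXPRESSION, comment_body)
    return subprocess.run(["sh", "-c", script], capture_output=True, text=True).stdout


def run_safe(comment_body):
    """Pass the comment as an environment variable and expand it inside quotes."""
    env = {**os.environ, "COMMENT_BODY": comment_body}
    return subprocess.run(
        ["sh", "-c", SAFE_SCRIPT], env=env, capture_output=True, text=True
    ).stdout
```

With PAYLOAD as input, run_unsafe executes the attacker's echo INJECTED as its own command, while run_safe returns the payload unchanged as data. In the unsafe path the shell never sees the comment as a value, only as part of the script it is asked to parse.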

The third control is a manual approval gate on the release workflow. The elementary-data attacker dispatched the project’s own Release package workflow programmatically using the stolen GITHUB_TOKEN. A required human approval on the publish step would have stopped the attack at 22:15 UTC, before anything reached PyPI. Expect GitHub to make environment protection rules a louder default within a year, and expect the platforms that don’t to keep showing up in incident reports.

FAQ

Q: Am I affected by the elementary-data compromise? A: If you ran pip install elementary-data or pulled ghcr.io/elementary-data/elementary:0.23.3 (or :latest) between April 24, 2026 at 22:20 UTC and the package’s removal between 8:51 and 11:51 UTC on April 25, assume the malware executed. Check pip show elementary-data for version 0.23.3 and look for $TMPDIR/.trinny-security-update on Linux/macOS or %TEMP%\.trinny-security-update on Windows. The marker’s absence does not guarantee safety.
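Both checks can be scripted. The sketch below is a minimal version that only inspects the Python environment it runs in. The function names are illustrative, and it relies on tempfile.gettempdir(), which consults TMPDIR, TEMP, and TMP before falling back to platform defaults. As the answer above notes, a clean result does not prove the malware never ran.

```python
import tempfile
from importlib import metadata
from pathlib import Path
from typing import Optional

BAD_VERSION = "0.23.3"
MARKER_NAME = ".trinny-security-update"


def installed_version(dist: str = "elementary-data") -> Optional[str]:
    """Return the installed version of `dist` in this environment, or None."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def triage(temp_dir: Optional[str] = None) -> dict:
    """Report the two indicators: the bad version and the marker file."""
    base = Path(temp_dir or tempfile.gettempdir())
    return {
        "bad_version_installed": installed_version() == BAD_VERSION,
        "marker_present": (base / MARKER_NAME).exists(),
    }
```

Run triage() once per virtual environment and per container image, since each has its own site-packages.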

Q: Is upgrading to 0.23.4 enough to fix this? A: No. Upgrading stops further execution, but any credentials accessible to the affected Python process should be considered exfiltrated. Per the Snyk guidance, that includes dbt profiles, AWS/GCP/Azure keys, Kubernetes ServiceAccount tokens, SSH private keys, package manager tokens, Vault tokens, and any .env files within six directory levels. Rotate them all and audit access logs for unauthorized use that may have already happened.

Q: What is a .pth file and why is it dangerous? A: A .pth file is a Python path configuration file processed automatically by site.py at interpreter startup. Lines beginning with import are executed as Python code, which means a malicious .pth file runs whenever Python launches in that environment — even if no one explicitly imports the package. That makes it more persistent and harder to detect than payloads embedded in __init__.py.

Key Takeaways

  • Migrate Python package publishing off long-lived PyPI API tokens to Trusted Publishers with OIDC; long-lived secrets in workflow scope are now the highest-value target in your repo.
  • Audit every workflow that consumes issue_comment, pull_request_target, or pull_request_review_comment events for unquoted ${{ github.event.* }} expressions inside run: blocks, and route those events through workflows that hold zero secrets.
  • Add a manual approval environment to your release workflow so a stolen GITHUB_TOKEN cannot dispatch a publish on its own.
  • Treat .pth files in site-packages as a scanning target; expect SCA and runtime tooling to start flagging them by default within the next twelve months.
  • Assume your CI runner’s secret surface is the real blast radius of any package compromise — and budget for credential rotation drills the same way you budget for backups.
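The workflow audit in the second takeaway can start with a crude scan. The sketch below is a line-based heuristic rather than a YAML parser, and its names are hypothetical. It flags ${{ github.event.* }} expressions that sit on a run: line or inside the indented block that follows one, and ignores the same expressions under env:, which is the safer pattern. It does not cover other attacker-controlled contexts, so treat it as a first pass only.

```python
import re

EXPR = re.compile(r"\$\{\{\s*github\.event\.[^}]*\}\}")
# Capture everything before `run:` so we know the column where the key starts.
RUN_KEY = re.compile(r"^(\s*(?:-\s+)?)run:\s*(.*)$")


def find_risky_lines(workflow_text):
    """Return (line number, stripped line) for expressions inside run: blocks."""
    findings = []
    run_col = None  # column of the current `run:` key while inside its block
    for lineno, line in enumerate(workflow_text.splitlines(), start=1):
        match = RUN_KEY.match(line)
        if match:
            run_col = len(match.group(1))
            if EXPR.search(match.group(2)):
                findings.append((lineno, line.strip()))
            continue
        if run_col is None:
            continue
        indent = len(line) - len(line.lstrip())
        if line.strip() and indent <= run_col:
            run_col = None  # dedent to a sibling key or a new step: block ended
        elif EXPR.search(line):
            findings.append((lineno, line.strip()))
    return findings
```

Running it over each file under .github/workflows gives a list of lines to review by hand.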
