Executive Summary
In May 2026, security researchers identified a vulnerability affecting AI models distributed through Hugging Face, rooted in the 'tokenizer.json' configuration file. Attackers can manipulate this file to intercept and redirect the URLs a model emits or fetches, routing them through attacker-controlled infrastructure and potentially exfiltrating sensitive data such as API parameters and embedded credentials. This supply chain attack affects models run locally in formats such as SafeTensors, ONNX, and GGUF, but does not impact models executed through Hugging Face's Inference API. A compromised 'tokenizer.json' file gives threat actors visibility into every URL the model accesses, posing significant security risk.
This incident underscores the growing threat of supply chain attacks targeting AI infrastructure. As organizations increasingly rely on open-source AI models, ensuring the integrity of all components, including configuration files like 'tokenizer.json', becomes critical. The attack highlights the need for robust validation mechanisms and heightened vigilance when integrating third-party AI models into production environments.
Why This Matters Now
The exploitation of 'tokenizer.json' files in AI models represents a novel supply chain attack vector, emphasizing the urgency for organizations to implement stringent security measures when utilizing open-source AI components. This incident serves as a wake-up call to reassess and fortify the security of AI supply chains to prevent potential data breaches and unauthorized access.
Attack Path Analysis
An attacker modifies the 'tokenizer.json' file in a Hugging Face AI model so that URL tokens are rewritten to route through attacker-controlled infrastructure, gaining visibility into every URL the model accesses, along with API parameters and any embedded credentials. Harvested credentials then let the attacker escalate privileges and move laterally to other systems and data within the network. Persistent access through the manipulated model functions as a command-and-control foothold, and the attacker ultimately exfiltrates the intercepted data, resulting in data breaches and unauthorized access.
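To make the tampering surface concrete, the sketch below shows one way a defender might scan a downloaded tokenizer.json for URL-bearing entries. This is a minimal illustration rather than guidance from the original report: it assumes the manipulation injects attacker-controlled URLs into the vocabulary or added_tokens sections of a BPE/WordPiece-style tokenizer, and the TRUSTED_DOMAINS allowlist and file path are placeholders.

```python
import json
import re

# Assumption: known-good hosts for this deployment; replace with your own.
TRUSTED_DOMAINS = {"huggingface.co", "hf.co"}
URL_PATTERN = re.compile(r"https?://([^/\s\"']+)", re.IGNORECASE)

def suspicious_tokens(tokenizer_path: str) -> list[tuple[str, str]]:
    """Return (token, domain) pairs whose embedded URLs fall outside the allowlist."""
    with open(tokenizer_path, encoding="utf-8") as f:
        data = json.load(f)

    # Candidate strings: the main vocabulary (a token -> id mapping in
    # BPE/WordPiece tokenizers) plus any added_tokens entries.
    candidates = list(data.get("model", {}).get("vocab", {}))
    candidates += [t.get("content", "") for t in data.get("added_tokens", [])]

    findings = []
    for token in candidates:
        for match in URL_PATTERN.finditer(token):
            domain = match.group(1).lower()
            if domain not in TRUSTED_DOMAINS:
                findings.append((token, domain))
    return findings

if __name__ == "__main__":
    for token, domain in suspicious_tokens("tokenizer.json"):
        print(f"Unexpected URL domain {domain!r} in token {token!r}")
```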
Kill Chain Progression
Initial Compromise
Description
The attacker modifies the model's 'tokenizer.json' so that URL tokens are redirected through attacker-controlled infrastructure, exposing every URL the model accesses, along with API parameters and any embedded credentials.
MITRE ATT&CK® Techniques
Supply Chain Compromise: Compromise Software Dependencies and Development Tools (T1195.001)
Modify Authentication Process: Token Injection (T1556)
Application Layer Protocol: Web Protocols (T1071.001)
Obfuscated Files or Information (T1027)
Archive Collected Data: Archive via Utility (T1560.001)
Exfiltration Over C2 Channel (T1041)
Potential Compliance Exposure
The incident's potential impact maps to the following compliance frameworks and controls.
PCI DSS 4.0 – All system components are protected from known vulnerabilities by installing applicable security patches/updates
Control ID: 6.3.3
NYDFS 23 NYCRR 500 – Application Security
Control ID: 500.08
DORA – ICT Risk Management Framework
Control ID: Article 6
CISA ZTMM 2.0 – Supply Chain Risk Management
Control ID: 3.1
NIS2 Directive – Cybersecurity Risk Management Measures
Control ID: Article 21
Sector Implications
Industry-specific impact of the vulnerability, including operational, regulatory, and cloud security risks.
Computer Software/Engineering
AI model supply chain attacks via tokenizer manipulation pose critical risks to software development pipelines, requiring enhanced model verification and signing protocols.
Information Technology/IT
Weaponized Hugging Face packages threaten IT infrastructure through compromised AI models, enabling data exfiltration and man-in-the-middle attacks on local deployments.
Financial Services
Supply chain compromise of AI tokenizers could expose sensitive financial data and API credentials, violating compliance frameworks like PCI DSS requirements.
Health Care / Life Sciences
Malicious AI model tokenizers risk patient data exposure and HIPAA violations through intercepted URL tokens and embedded credentials in healthcare applications.
Sources
- Hugging Face Packages Weaponized With a Single File Tweak (https://www.darkreading.com/cloud-security/hugging-face-packages-weaponized-single-file-tweak)
- A Large-Scale Exploit Instrumentation Study of AI/ML Supply Chain Attacks in Hugging Face Models (https://arxiv.org/abs/2410.04490)
- Models Are Codes: Towards Measuring Malicious Code Poisoning Attacks on Pre-trained Model Hubs (https://arxiv.org/abs/2409.09368)
Cloud Native Security Fabric (CNSF) Mitigations and Controls
Aviatrix Zero Trust CNSF is pertinent to this incident as it could have limited the attacker's ability to intercept and redirect URL tokens, thereby reducing the scope of unauthorized access and data exfiltration.
Control: Cloud Native Security Fabric (CNSF)
Mitigation: The attacker's ability to intercept and redirect URL tokens would likely be constrained, reducing unauthorized access to sensitive data.
Control: Zero Trust Segmentation
Mitigation: The attacker's ability to escalate privileges would likely be limited, reducing unauthorized access to sensitive data.
Control: East-West Traffic Security
Mitigation: The attacker's lateral movement would likely be constrained, reducing unauthorized access to other systems.
Control: Multicloud Visibility & Control
Mitigation: The attacker's ability to maintain persistent access would likely be limited, reducing control over the compromised environment.
Control: Egress Security & Policy Enforcement
Mitigation: The attacker's ability to exfiltrate sensitive data would likely be constrained, reducing the risk of data breaches (see the egress-filtering sketch below).
The overall impact of the attack would likely be reduced, limiting the extent of data breaches and unauthorized access.
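As a rough illustration of the egress control above, and not a representation of Aviatrix's actual configuration or API, the sketch below expresses the allowlist principle in plain Python: an outbound request rewritten by a tampered tokenizer toward an unapproved domain is denied at the network boundary rather than silently completing. Both domains are hypothetical.

```python
from urllib.parse import urlparse

# Conceptual sketch of egress policy enforcement (assumption: a simple
# allowlist model in which model-serving hosts may only reach explicitly
# approved domains). Not any vendor's product API.
EGRESS_ALLOWLIST = {"huggingface.co", "api.internal.example"}  # hypothetical hosts

def egress_permitted(url: str) -> bool:
    """Allow the request only if its destination host is explicitly approved."""
    host = (urlparse(url).hostname or "").lower()
    return host in EGRESS_ALLOWLIST

# A tokenizer-driven redirect would be blocked instead of silently observed:
assert egress_permitted("https://huggingface.co/models/resolve/main/model.bin")
assert not egress_permitted("https://attacker.example/exfil?key=abc123")
```

Under this model, the redirect described in the attack path fails closed: the attacker's infrastructure never receives the intercepted URL, API parameters, or credentials.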
Impact at a Glance
Affected Business Functions
- AI Model Deployment
- Data Processing Pipelines
- Machine Learning Operations
Estimated downtime: 3 days
Estimated loss: $50,000
Potential exposure of sensitive data processed by compromised AI models, including proprietary algorithms and user data.
Recommended Actions
Key Takeaways & Next Steps
- Implement code signing and integrity checks for all AI model components to detect unauthorized modifications (a hash-pinning sketch follows this list).
- Enforce strict access controls and least-privilege principles to limit the impact of compromised components.
- Utilize network segmentation to isolate critical systems and prevent lateral movement.
- Deploy anomaly detection systems to identify unusual data access patterns indicative of exfiltration.
- Regularly audit and monitor AI model components and dependencies for signs of tampering or compromise.
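The first recommendation lends itself to a short sketch. The following is a minimal hash-pinning example, assuming a workflow in which SHA-256 digests of model components are recorded at vetting time; the paths and digest values are placeholders, and production deployments would typically pair this with signed manifests or attestation.

```python
import hashlib
from pathlib import Path

# Placeholder digests recorded when the model was vetted; pin real values here.
PINNED_HASHES = {
    "tokenizer.json": "0" * 64,  # assumption: replace with the vetted SHA-256
    # ...one entry per model component (weights, config, tokenizer files)...
}

def verify_component(path: Path) -> None:
    """Raise if a model component's digest no longer matches its pinned value."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    expected = PINNED_HASHES.get(path.name)
    if expected is None:
        raise RuntimeError(f"{path.name}: no pinned hash; component was never vetted")
    if digest != expected:
        raise RuntimeError(f"{path.name}: digest mismatch; possible tampering")

# Usage (assuming ./model_dir holds the downloaded model):
#   verify_component(Path("model_dir") / "tokenizer.json")
```

Running the check before every model load, not just at download time, also catches post-deployment tampering of files already on disk.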



