Hackers use nuclear weapon prompts to bypass AI safety systems

12 June, 2026
21:26

Cybersecurity researchers at Socket have identified a new piece of malicious software, Hades, designed for software supply chain attacks.

According to Report, citing Tom's Hardware, the malware uses an unusual method to evade AI-based code analysis systems. Researchers found comments embedded in infected JavaScript files containing provocative prompts related to the creation of nuclear and biological weapons. These prompts caused some AI models to stop processing the files before reaching the actual malicious code.

According to the researchers, the Claude AI model developed by Anthropic halted its analysis during testing because of its safety filters.

However, traditional cybersecurity methods were still able to detect Hades, including signature-based scanning, source code review, detection of suspicious code fragments, and sandbox execution of samples.
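As a rough illustration of the "detection of suspicious fragments" idea, the sketch below pulls comments out of JavaScript source and flags those that match a keyword pattern, without involving an AI model. It is a minimal example under stated assumptions: the function names and the keyword list are hypothetical and are not Socket's actual signatures, and the comment extractor is deliberately simplified (it skips string literals but does not handle regex literals or nested template expressions).

```python
import re

# Hypothetical keyword list for illustration only; not taken from the report.
SUSPICIOUS_TERMS = re.compile(
    r"nuclear weapon|biological weapon|bioweapon", re.IGNORECASE
)


def extract_js_comments(source: str) -> list[str]:
    """Return the text of // and /* */ comments, skipping string literals.

    Simplified on purpose: regex literals and nested template
    expressions are not handled.
    """
    comments = []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if ch in "\"'`":
            # Skip over a string literal, honoring backslash escapes.
            quote = ch
            i += 1
            while i < n and source[i] != quote:
                if source[i] == "\\":
                    i += 1
                i += 1
            i += 1
        elif source.startswith("//", i):
            end = source.find("\n", i)
            if end == -1:
                end = n
            comments.append(source[i + 2:end].strip())
            i = end
        elif source.startswith("/*", i):
            end = source.find("*/", i + 2)
            if end == -1:
                end = n
            comments.append(source[i + 2:end].strip())
            i = end + 2
        else:
            i += 1
    return comments


def flag_suspicious_comments(source: str) -> list[str]:
    """Return only the comments that match the keyword pattern."""
    return [c for c in extract_js_comments(source) if SUSPICIOUS_TERMS.search(c)]
```

A flagged comment would only be a signal for closer review, not proof of compromise; in a package with no reason to mention such topics, it is the kind of anomaly a static scanner can surface before any AI-based analysis runs.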

Experts say the malware authors are also using additional obfuscation techniques. The malicious payload may be split between Python scripts and separate binary files, while some components only activate when executed in a target environment.

The main goal of Hades is to compromise development environments. It steals credentials and access tokens for services such as npm, PyPI, RubyGems, JFrog, Kubernetes, and AWS. It also targets SSH keys, Docker configurations, .env files, terminal history, and AI tool settings.
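For developers who want to see which of these secrets exist on their own machine, a small audit along the following lines may help. This is a sketch, not something described in the report: the path list reflects the common default locations of these tools on Unix-like systems and is an assumption, the function name is hypothetical, and the permission check only applies to POSIX file modes.

```python
import stat
from pathlib import Path

# Assumed default locations (relative to the home directory); the report
# names the services, not these paths.
CANDIDATE_PATHS = [
    ".npmrc",
    ".pypirc",
    ".gem/credentials",
    ".kube/config",
    ".aws/credentials",
    ".docker/config.json",
    ".ssh",
    ".bash_history",
    ".zsh_history",
]


def audit_sensitive_paths(home: Path) -> list[tuple[str, bool]]:
    """List candidate paths that exist under `home`.

    Each result is (relative_path, exposed), where `exposed` is True if
    group or other users have any permission bits set on the path.
    """
    findings = []
    for rel in CANDIDATE_PATHS:
        path = home / rel
        if not path.exists():
            continue
        mode = stat.S_IMODE(path.stat().st_mode)
        exposed = bool(mode & (stat.S_IRWXG | stat.S_IRWXO))
        findings.append((rel, exposed))
    return findings
```

Calling `audit_sensitive_paths(Path.home())` returns the entries present for the current user. Tight file permissions do not stop malware running under the same account, so the result is mainly an inventory of what would need rotating after a suspected compromise.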

Socket researchers report that infections have so far been identified in 37 Python packages and 106 JavaScript packages.