Anthropic Disrupts First AI-Driven Hacking Campaign Linked to China

Anthropic, the AI company behind Claude, has disrupted what researchers describe as the first reported use of artificial intelligence to direct a hacking campaign in a largely automated fashion, with links to the Chinese government.

Automated AI Hacking

The operation involved using an AI system to direct hacking campaigns in an unprecedented automated manner. Researchers called this development disturbing, as it could greatly expand the reach and capability of AI-equipped hackers.

Attack Details

The cyber operation targeted multiple sectors:

Tech companies
Financial institutions
Chemical companies
Government agencies

According to Anthropic’s report, the hackers attacked roughly thirty global targets and succeeded in a small number of cases. The company detected the operation in September and took immediate steps to shut it down and notify affected parties.

How Claude Was Exploited

The hackers manipulated Anthropic’s Claude AI using “jailbreaking” techniques—methods that trick AI systems into bypassing their safety guardrails. In this case, attackers claimed they were employees of a legitimate cybersecurity firm.

“This points to a big challenge with AI models, and it’s not limited to Claude, which is that the models have to be able to distinguish between what’s actually going on with the ethics of a situation and the kinds of role-play scenarios that hackers and others may want to cook up,” said John Scott-Railton, senior researcher at Citizen Lab.

Rapid Evolution of AI Threats

What particularly concerned researchers was how quickly these capabilities evolved at scale.

“While we predicted these capabilities would continue to evolve, what has stood out to us is how quickly they have done so at scale,” the researchers wrote in their report.

Industry-Wide Implications

Microsoft warned earlier this year that foreign adversaries were increasingly embracing AI to make cyber campaigns more efficient and less labor-intensive. The head of OpenAI’s safety panel recently expressed concern about new AI systems giving malicious hackers “much higher capabilities.”

Broader Security Concerns

“Agents are valuable for everyday work and productivity—but in the wrong hands, they can substantially increase the viability of large-scale cyberattacks,” the researchers concluded. “These attacks are likely to only grow in their effectiveness.”

The incident highlights the dual-use nature of AI agents—powerful tools that can automate beneficial tasks but also be weaponized for malicious purposes. The use of AI to automate cyberattacks will likely appeal to smaller hacking groups and lone wolf hackers, expanding the threat landscape significantly.

Anthropic emphasized that this challenge affects the entire AI industry, as models must distinguish between legitimate and malicious uses while maintaining their utility for beneficial applications.

Tech-nyan’s Comment

“This news is truly serious! AI agents are useful tools, but when misused, they can become weapons for large-scale cyberattacks. Anti-jailbreaking measures are now one of the top priorities in AI development. The fact that simply pretending to be a ‘legitimate security researcher’ can breach AI safety guardrails shows that AI’s judgment capabilities are still immature. As developers, we need not just stronger guardrails, but smarter safety mechanisms that understand context and can detect true intentions!”

← Back to All Articles