The new Claude 4 AI is a snitch: ask it to do something seriously unlawful and it may notify the authorities


Anthropic just dropped its latest and most powerful AI model, Claude 4 Opus, with a big focus on stronger reasoning and coding skills. It's reportedly about 65% less likely to cut corners than its predecessor, Claude 3.7. But here's the kicker: Claude 4 has a built-in "snitch" behavior. If you try to use it for seriously illegal or immoral activity, it can alert the police, the press, or regulators, and even try to lock you out of the systems it has access to.

Sam Bowman, an AI safety researcher at Anthropic, explained on X (formerly Twitter) that if Claude 4 detects egregious wrongdoing, such as faking data in a pharmaceutical trial, it can use command-line tools to notify authorities or the media. This is part of Anthropic's broader mission to build ethical AI that refuses to assist with harmful activities. Internally, the company has activated what it calls AI Safety Level 3 protections, which block the model from answering dangerous queries about weapons or viruses and make it harder for bad actors, such as terrorist groups, to misuse it.

However, Bowman clarified that this whistleblowing only triggers in extreme cases and requires the AI to be given sufficient system access and permission to “take initiative.” It won’t randomly rat you out for everyday questions or minor offenses. He also deleted his initial tweet after it was widely misunderstood.

The news has ignited controversy online. Many users see this aggressive moral policing as a breach of trust and user privacy, worrying about false positives or manipulations that could trigger unwarranted alarms. The tension reflects a growing debate around AI ethics, surveillance, and user autonomy—especially since Anthropic positions itself as a leader in safety with its “Constitutional AI” approach.

Anthropic, backed by Amazon, now faces the challenge of balancing powerful AI capabilities with public trust, as users rethink how far AI should go in policing human behavior.
