In a significant policy reversal, Anthropic has publicly apologized for implementing hidden guardrails in its latest AI model, Claude Fable 5, admitting that such measures compromised the integrity of both researchers and competitors. The tech company faced severe criticism over its decision to stealthily throttle user queries deemed as attempts to distill its proprietary technology into competing models.
Anthropic's announcement comes amid heightened scrutiny from the AI research community. The company now acknowledges that users “should have visibility into the safeguards we have in place, and why,” a commitment that it intends to honor going forward.
The Shift to Transparency
Previously, queries that invoked these covert restrictions were met with modified or degraded responses, unbeknownst to users. Under the new approach, if a user attempts a query that triggers distillation safeguards, they will receive responses rerouted from Claude Opus 4.8, Anthropic’s previous flagship model. “You will see this every time it happens,” Anthropic stated in a post on X, emphasizing its dedication to clearer communication.
These changes reflect Anthropic’s acknowledgment that while
Source: The Verge