Commentary

Ben Thompson's OpenAI critique: AI safety culture cost America the race

Jan 29, 2025

Key Points

  • Ben Thompson argues OpenAI's safety-focused culture and nonprofit governance have constrained product velocity and ceded competitive ground to DeepSeek, with an estimated $50 to $100 million a year spent on cultural guardrails rather than innovation.
  • OpenAI's decision to withhold GPT-2 weights and hide chain-of-thought reasoning in o1 lacked meaningful safety justification; DeepSeek's transparent R1 proved users preferred exposed reasoning, undermining the safety rationale for secrecy.
  • As model costs approach zero through optimization, distribution and user interface become the real differentiators; OpenAI maintains an edge over Google and Apple by communicating product progress clearly to users.

Summary

Ben Thompson's latest Stratechery update argues that OpenAI's safety-focused culture, rooted in nonprofit governance and Silicon Valley politics, has cost the company competitive ground to DeepSeek and constrained American AI innovation more broadly.

Thompson's core critique centers on a distinction between legitimate AI safety concerns and the cultural impositions built atop them. The safety question itself (concerns about deceptive, biased, or abusive language) is real but modest in scope. The problem, Thompson argues, is how that framing has been weaponized to justify a broader set of decisions that have nothing to do with existential risk. This is what he calls the "motte-and-bailey" of AI safety: a defensible claim (safety matters) used as cover for a far more contestable one (imposing specific cultural norms through product decisions).

The GPT-2 decision is Thompson's primary exhibit. OpenAI withheld the model's weights, training code, and datasets, citing concerns about deceptive content generation. That decision kicked off the company's pivot away from open-sourcing its work, a strategic move Thompson sees as having ripple effects across the entire organization. The constraint wasn't technical or safety-driven in any meaningful way (deceptive language at scale is not an extinction-level threat) but rather a proxy for internal political concerns.

More concretely, Thompson points to OpenAI's approach to hidden chain-of-thought reasoning in the o1 model. The company justified the opacity as necessary for model monitoring and to prevent competitors from stealing ideas. But when DeepSeek released R1 with exposed chain-of-thought, the market spoke decisively: users preferred seeing the model's reasoning. That transparency became a UI breakthrough: it teaches users better prompting, increases trust, and is simply more delightful to use. The safety rationale for secrecy dissolved the moment a competitor proved it wasn't necessary.
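
Thompson's point about exposed reasoning is easy to make concrete. Below is a minimal sketch of a client that surfaces the chain of thought alongside the final answer, assuming DeepSeek's OpenAI-compatible API and the separate reasoning_content field its deepseek-reasoner model returns (field name per DeepSeek's published docs at the time of writing; treat the endpoint and field as assumptions, not a definitive integration guide).

```python
# Minimal sketch: surfacing a reasoning model's chain of thought in the UI.
# Assumes DeepSeek's OpenAI-compatible endpoint, where deepseek-reasoner
# returns its reasoning in a separate `reasoning_content` field.
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.deepseek.com")

resp = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)

msg = resp.choices[0].message
print("--- model reasoning (exposed to the user) ---")
print(msg.reasoning_content)  # the chain of thought that o1 keeps hidden
print("--- final answer ---")
print(msg.content)
```

The product difference is exactly this: o1 returns only the equivalent of msg.content, while R1 hands the client both fields and lets the interface decide what to show.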

The broader problem Thompson identifies is that US tech policy has become defensive rather than competitive. Instead of out-innovating rivals, American companies have focused on blocking and regulatory capture. This matters because it shapes how companies allocate resources. If OpenAI spends $50 to $100 million annually on safety teams primarily tasked with implementing cultural guardrails (Thompson's rough estimate), that's capital and engineering cycles unavailable for product velocity. Meanwhile, DeepSeek moved fast, open-sourced aggressively, and in doing so became the calibration event for where American models actually stood.

Thompson reframes DeepSeek's release as a favor to the entire industry. From a purely geopolitical standpoint, a Chinese lab probably should have kept its progress secret. Instead, it published detailed papers and open-sourced the model, allowing researchers at Berkeley to duplicate its results within a day. This echoes Google's 2004 S-1, in which Google revealed to the world that it had built the largest supercomputer cluster in existence out of commodity hardware and distributed algorithms. That transparency, combined with ruthless efficiency, defined a generation of tech infrastructure. DeepSeek did something similar: it showed that efficiency, not scale, was the limiting factor.

The critique includes a shot at the contradictory standard that has dogged Sam Altman. Critics simultaneously claim he's nontechnical and deserves no credit for OpenAI's innovations, while also accusing him of recklessly releasing ChatGPT without board approval. Thompson notes the contradiction is worth naming: the decision to tweet out ChatGPT and reach 100 million users in weeks was clearly correct from a business standpoint. From a nonprofit safety org's perspective, it was reckless. Altman may have been fighting an internal culture war (a nonprofit structure insisting on slowness and caution) and simply chose speed. In Thompson's view, that was the right call.

Thompson closes with a reminder that the race is not over. Google's Gemini 2.0 Flash Thinking appears competitive with R1, potentially cheaper and with longer context. But Google's ability to ship that news is hampered by organizational dysfunction and marketing missteps. The company has spent years renaming models (Bard, Gemini, now Gemini Advanced), making it impossible for users to understand what they're accessing. It owns search and YouTube but cannot seem to communicate product progress to the world. That's a shipping and messaging problem, not a technical one.

The larger implication: if all models eventually become cheap through optimization—approaching zero marginal cost—then distribution, aggregation, and the entry point matter enormously. Apple, Google, and Facebook don't need to win the model race. They can drop inference into their platforms, make it seamless, and capture value by being the interface. But that only works if they can articulate what they're offering and get it to users. On that measure, OpenAI is still ahead.