Anthropic's Sholto Douglas on Claude Opus 4.6: longer reasoning, coworker-grade knowledge work, and the software-only singularity
Feb 5, 2026 with Sholto Douglas
Key Points
- Anthropic's Claude Opus 4.6 extends reasoning time on hard problems, moving the model toward coworker-grade capability across knowledge work beyond coding, with gains in digital task performance including PowerPoint and Excel.
- Claude Code commits on GitHub doubled from 2% to 4% in six weeks, a pace Anthropic describes as matching expectations from Situational Awareness predictions on scaling.
- Anthropic sees a "software-only singularity" phase where digital task capability races ahead of physical work, creating a bottleneck in biology and materials science, fields that require human-level lab dexterity.
Summary
Anthropic released Claude Opus 4.6, a model that trades speed for extended reasoning on difficult problems. Sholto Douglas, a Member of Technical Staff at Anthropic, describes this as a reversal of the company's earlier strategy: while OpenAI's models excelled at trying hard on difficult tasks, Anthropic optimized for speed. Now 4.6 lets the model expend significant compute on thinking through hard problems.
The model advances Anthropic's goal of building AI coworkers capable across knowledge work beyond coding. Douglas highlights gains in digital task performance. Claude can now use PowerPoint and has become "quite capable" at Excel, though he acknowledges it's "not perfect there yet." The jump from 4.5 to 4.6 was larger than Anthropic initially anticipated. When Douglas last discussed 4.5 on the show, even he hadn't fully grasped the leap.
The METR eval, which measures both intelligence and perseverance on tasks lasting six to eight hours, emerged as the most resonant benchmark. At a physics conference he attended with Google DeepMind, Douglas met physicists who found it conceptually graspable: models are now semi-reliably capable of work that would take a human that long. But he acknowledges a real wrinkle. Most knowledge workers never actually work on a single task for six hours straight; they context-switch constantly.
Orchestration remains a bottleneck. Douglas went from writing 10% of his own lines of code with 4.5 to 0% with 4.6, but he still has to manually multiplex across multiple Claude instances, switching between windows, giving guidance, and staying in the details. The long-term fix requires moving up levels of abstraction: one agent should synthesize feedback from other models and surface information as needed, rather than forcing the user into "Age of Empires or StarCraft" mode where they are bound by their actions per minute (APM).
Douglas built a game over the holidays to test the model's limits: an Age of Empires variant where you build solar panels and data centers, train AI models, and scale toward a Kardashev Type II civilization instead of farming and mining. He got 80% done before realizing "it's actually really hard to make a game that's fun." The goal was to capture the late-night strategy discussions happening in San Francisco about an economy reorienting around compute.
On world models and robotics, Douglas sees them as critical for training robots but not for the direct path to AGI, which he frames as coding, AI research, and general science. World models excite him for gaming. He calls DeepMind's Genie "truly mind blowing." They might unlock robot training through simulation rather than endless behavioral cloning via teleoperation.
Douglas introduces the concept of a "software-only singularity." This is a phase where models dramatically outpace humans at digital tasks but remain limited at physical ones. In this regime, information systems, software, chip design, and AI training all accelerate, making the broader economy more efficient through better messaging and information flow. But without robots providing physical abundance, progress in biology and materials science hits a bottleneck. You need human-level dexterity for lab work. This creates an asymmetry where digital progress races ahead while the physical world lags.
On what constitutes the singularity itself, Douglas acknowledges an event horizon he cannot predict: the moment there are as many or more digital intelligences as humans, equally or more capable. Beyond that point, forecasting breaks down. Kurzweil's formulation was precisely that: the singularity is where prediction becomes impossible.
The near-term constraint is chips. Douglas agrees with Sam Altman's earlier point on that constraint, but he flags a longer runway question: if the industry wants 100 gigawatts in two years and a terawatt in four, where does that capacity live? That is why Elon Musk is pursuing space-based data centers. The Atacama Desert, the Australian desert, or Texas might work, but space sidesteps geography.
On Claude's GitHub penetration, Douglas flagged a SemiAnalysis finding that Claude Code went from 2% to 4% of GitHub commits in six weeks. Doubling in six weeks is what he calls "ludicrous." It is impossible to feel viscerally; it reads as a number on a screen. But Anthropic has always bet on continued scaling and unabated progress. The company tracks against Situational Awareness predictions on power, energy, and flops, and says it "feels more like we're hitting each milestone as we expect."
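To give a feel for how steep a six-week doubling is, here is a minimal sketch of the compounding arithmetic. It assumes the doubling rate continues, which the interview does not claim; the only observed data point is the 2% to 4% move, and `projected_share` is a name chosen here for illustration.

```python
def projected_share(initial_share: float, weeks: int, doubling_weeks: float = 6.0) -> float:
    """Compound a commit share that doubles every `doubling_weeks`, capped at 100%.

    Hypothetical extrapolation: the source reports only one observed
    doubling (2% -> 4% over six weeks).
    """
    share = initial_share * 2 ** (weeks / doubling_weeks)
    return min(share, 100.0)

print(projected_share(2.0, 6))   # one doubling: 4.0
print(projected_share(2.0, 24))  # four doublings: 32.0
```

The cap matters: any sustained doubling of a market share saturates quickly, which is part of why the raw number "reads as a number on a screen" while implying an enormous near-term shift.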
On diffusion and forward-deployed engineers, Douglas sees value in onboarding. Organizations do not know how to use these models, and the pace of capability change is brutal. Three months ago, 4.5 did not exist. Businesses are supposed to adjust strategy over a holiday break because models suddenly can do things they could not before. Hiring forward-deployed engineers who can translate capability into organizational practice could unlock significant value and perhaps create new white-collar jobs.
On data-as-oil, Douglas splits the concept in two. One kind is internal analytics and artifacts like documents and operational records. The other is the actual work and expertise people applied to create those documents, which is not recorded. He argues the real asset is human expertise and the ability of models to learn from people, more like an intern absorbing knowledge from colleagues than from reading archived documents. Models will learn in a "quite human-like way" from their coworkers.
On UI design, Anthropic's framework is to build where humans fit in. Interfaces that interact like colleagues, using the same tools users already have rather than building bespoke software. He contrasts this with the software pitch. "We can get you the best executive in this function in the world" lands much harder than "This software solution can help." The former offers coworker-grade capability; the latter asks for adoption.
On mobile and hardware, Douglas notes that voice is higher bandwidth than typing for most people, but talking to devices in crowded spaces is annoying. He is interested in hardware that captures more context automatically: skin-movement sensing (technology Apple recently acquired), or ambient speakers throughout a home rather than phone-dependent interaction. On-device computation will happen as intelligence gets 10-50x cheaper annually, democratizing access. But scaling continues exponentially, so new use cases always pull computation back to the server.
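The cited 10-50x annual cost decline compounds dramatically over even a few years. The sketch below is illustrative only: the starting price is a made-up placeholder, and `cost_after_years` is a name introduced here, not anything from the interview.

```python
def cost_after_years(start_cost: float, annual_factor: float, years: int) -> float:
    """Price of a fixed task after compounding an annual cost-reduction factor.

    Illustrative arithmetic for the 10-50x/year decline Douglas cites;
    the $1.00 starting cost is a placeholder, not a real price.
    """
    return start_cost / annual_factor ** years

# Low end (10x/year): a $1.00 task costs a tenth of a cent after 3 years.
print(cost_after_years(1.00, 10, 3))  # 0.001
# High end (50x/year): the same task costs 1/125,000 of a dollar.
print(cost_after_years(1.00, 50, 3))
```

This is why on-device inference becomes plausible even as frontier workloads stay server-bound: the cost floor for yesterday's capability keeps falling while the ceiling keeps rising.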
Douglas largely dismissed concerns that Anthropic's legal tool release triggered recent market sell-offs, framing it as part of a continuing trend of AI-powered legal tools rather than a novel category risk.