Anthropic's Sholto Douglas on Claude Opus 4.6: longer reasoning, coworker-grade knowledge work, and the software-only singularity
Feb 5, 2026 · Full transcript · This transcript is auto-generated and may contain errors.
Featuring Sholto Douglas
organizations use it to keep their apps working. And without further ado, we have Sholto Douglas from Anthropic. He's a member of technical staff and he's here in the TV. Sholto, how are you doing?
Hey guys, how you doing?
I am doing great. Seems like you're doing better. Tell us the news. What'd you release? What happened?
All right. Well, once again, we've got a new model. It's really fantastic. Uh, the camera's a little bit laggy, a bit strange, but, you know, should be good.
This model is really, really fantastic. So, I think people have been comparing the previous generation of OpenAI and Anthropic models, and they've noticed there are some differences, right? The OpenAI models were a bit better at trying really, really hard on tough problems, but the Anthropic models were much faster and so forth. And so they worked on speed while we worked on making the models much, much better at really, really tough problems. And that's where I think the model really shines: it's able to think for a really, really long time. It's able to expend a lot of test-time compute on thinking hard about problems.
It's also a massive step towards the coworker that we've been working on. I think I mentioned last time that we're seeing this continuous progression towards models that are as capable at the rest of knowledge work as they are at coding, and I'm really, really excited about that as well. It's a lot better at computer use. We've got Claude for PowerPoint, and Claude is becoming actually quite capable at Excel. You know, it's not perfect there yet, but huge steps, huge progress.
Do you have internal benchmarks for things like PowerPoint now?
We have benchmarks for everything that we work on.
But what makes it to the model card?
What makes it to the model card? Oh, the model card is typically only publicly released benchmarks. Okay.
Um because for our internal benchmarks, we usually hold them out for internal testing.
Oh, sure, sure, sure. uh
What's the internal process for benchmarking a PowerPoint? Because it's not
I can't talk too much about benchmarking a PowerPoint, but, you know, I'm not sure if you know that back a couple of years ago I worked at McKinsey, so, you know, sometimes maybe they just need
so they show them to you and you say it's good or
Exactly. Good, bad.
Yeah, probably a lot of people involved. Yeah. Talk to us about what a difference between 4.5 and 4.6 means. You know, people might be familiar with pre-training and the different buzzwords that go into changing a model. What technically happened at Anthropic over the last few months to enable this?
Yes. So, we actually decorrelate the model release versioning from any specific technical details. It's more about the overall capability level that we're excited about. And actually, I've been asked by many people why we didn't just name 4.5 as 5, because it was such a step up. And I think maybe we under-anticipated it. I mean, when I was last on here, I don't think even I fully appreciated how much of a jump it was from 4 to 4.5.
Um
It was great, because by the time there was the 4.5 hype cycle, we could repost a bunch of clips of you talking about it [laughter], and people were like, wow, they covered it today. It's like, no, we actually had them on the day it launched, and so I'm sure we'll be doing the same thing with 4.6. Uh, talk about task horizon, the METR survey. Is that important? Obviously it's a public benchmark, and we've talked about it before, but is that something that is a different tradeoff to the other? Because I see it as, there's the IQ, and then there's how long something can stay focused, like locking in. Is there any tension between those? Do you ever get a model that's like, oh, it's really smart, but it can't stay coherent for that long, or vice versa, it does dumber but longer tasks?
I mean, I think the METR eval is possibly the best eval currently out there, right? But you're right in saying that different parts of the chart measure different things. Initially, it was just testing IQ, but as time goes on, it starts to test things where humans fail at tasks not because they aren't smart enough, but because actually persevering on a task for six, seven, eight hours is really hard. And you're right that these come apart. You should almost expect models to have superhuman perseverance, but at the same time, maybe context coherence or something like that fights against it. And that's a different axis from raw intelligence or knowledge in given domains.
So the METR eval is what I use when I'm referencing AI progress to people. Recently we held this physics conference, which we did with Google DeepMind, where we invited a whole bunch of physicists to try and convince them that AI was a really big deal. And that was the chart that resonated most deeply with people, and I think it's just quite conceptually graspable as, oh, okay, things that would take me X hours, the models are now capable of.
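For readers unfamiliar with the chart: METR's headline number is, roughly, the task length at which a model succeeds about 50% of the time, estimated by fitting a curve to success rates across many tasks of varying length. Here's a toy sketch of that idea with made-up data and a simple logistic fit on log task duration; this is an illustrative assumption, not METR's actual code, data, or exact methodology.

```python
import math

# Made-up results: (task duration in minutes, 1 = model succeeded).
# Short tasks mostly succeed, long tasks mostly fail.
data = [
    (1, 1), (2, 1), (4, 1), (8, 1), (15, 1), (15, 0),
    (30, 1), (30, 0), (60, 0), (120, 0), (240, 0),
]

def fit_logistic(points, lr=0.1, steps=20000):
    """Fit p(success) = sigmoid(a + b * log2(minutes)) by gradient descent."""
    a, b = 0.0, 0.0
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for minutes, y in points:
            x = math.log2(minutes)
            p = 1 / (1 + math.exp(-(a + b * x)))
            grad_a += (p - y)        # gradient of log-loss w.r.t. a
            grad_b += (p - y) * x    # gradient of log-loss w.r.t. b
        a -= lr * grad_a / len(points)
        b -= lr * grad_b / len(points)
    return a, b

a, b = fit_logistic(data)
horizon_minutes = 2 ** (-a / b)  # duration where p(success) crosses 0.5
print(f"50% time horizon = roughly {horizon_minutes:.0f} minutes")
```

The "time horizon" is just where that fitted curve crosses 50%; the interesting empirical claim is how fast that crossing point has been moving with each model generation.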
Semi-related, but interesting: it's an interesting benchmark considering that so many types of knowledge workers will never actually work on a task for seven hours straight. It's just like, oh, I worked on it for 20 minutes, then I jumped into a meeting, then I took a look at it again, then I had lunch, then I went for a walk, then I came back and worked on it, then somebody called and I had to work on something else. So it's very possible that a lot of people listening to this, maybe outside of some software engineers, have never, even in their career, worked on something for six hours straight.
The flow state,
I mean, you chain things together and so forth. But yes, I agree, it's interesting in that respect. And what does even a week-long task mean? Surely that's sort of inherently composable into, you know, a year-long work in progress. I mean, I certainly don't think of myself as doing a year-long task when I'm at work, right? I'm doing these day- or week-long time horizons, usually, which feed into a several-month-long time horizon.
So talk about orchestration. It feels like there's a moment happening. I really like the Gastown analogy and the world that he's built there. You see it all over: I have four Claude Code instances running at the same time. It feels like there needs to be a new layer of abstraction. How do you think that gets solved? Is now the time to start learning those things? What metaphors will be valuable going forward?
Yes. So, right now the agents are still at this point where you need to manually multiplex across them. Yeah.
You know, when I'm working, the stark thing about going from 4 to 4.5 and now 4.6 was that I went from maybe 10% of my lines of code being written by me to 0% of my lines of code being written by me. But the thing is, I need to actually sit there and constantly switch between these windows to make sure they're on track and give them guidance and feedback, and, you know, I'm multiplexing maybe five at once or something like that when it's going really well. But you still need to be in there in the details. And I think the long-term right way to think about it is constantly moving up levels of abstraction. So ultimately you only want to be talking to one agent that's maybe synthesizing the feedback from models as they come back and say, I got stuck, or I was unclear on this, and that model can do what I was doing before, and I can take a step back and act at an even higher level of abstraction. It doesn't feel like we ultimately want to be playing, you know, Age of Empires or StarCraft with the models, right? We shouldn't be APM-bound. Instead, the models should surface information as you need it, so that you constantly act on whatever the most important thing is and context switch appropriately.
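The "move up a level of abstraction" idea can be sketched in a few lines: instead of a human polling N agent windows, one supervisor loop collects only the events that need judgment. This is a toy illustration of the pattern being described, with hypothetical agent names and event strings, not any real Claude Code API.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    """A stand-in for one running coding agent (hypothetical)."""
    name: str
    events: list = field(default_factory=list)  # e.g. "stuck", "needs_review"

    def poll(self):
        """Return and clear any events needing attention."""
        out, self.events = self.events, []
        return out

def supervise(agents):
    """Surface only the items a human (or a higher-level agent) must see."""
    needs_attention = []
    for agent in agents:
        for event in agent.poll():
            needs_attention.append((agent.name, event))
    return needs_attention

agents = [Agent("refactor"), Agent("tests"), Agent("docs")]
agents[1].events.append("stuck: flaky integration test")
print(supervise(agents))  # only the blocked agent surfaces
```

The design point is the interrupt-driven inversion: the human stops round-robining over windows (the Age of Empires mode) and instead acts only on what `supervise` surfaces.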
Speaking of gaming, do you have an update for us? When is the game shipping? Tell us about the game. What inspired this? Give us the backstory and then tell us what's going on. Well, I think like many other people, I really wanted to test the limits of the models over the holidays, right? Um, I mean, we saw all of that from people saying, "I spent three, four days doing X." Um, and I think that's really where almost the hype cycle for the most recent generation of models started in some respect when people got a chance to properly test the limits.
Yeah.
Um,
Tells you a lot about adoption
and and how like diffusion takes time because you need to have the space.
We need more long weekends.
Seriously, that's the bottleneck. Long weekends are the bottleneck.
GDP.
Yeah. Good for GDP, I guess. Holidays. [laughter]
Play with coding models.
Yeah. Yeah. Yeah. Sorry. Anyway.
Yeah.
Um, anyway, so Dylan's been getting us all into Age of Empires with these five matches on the downstairs table. And I wanted to see if I could make an Age of Scaling, where, you know, you build solar panels and data centers, and you train AI models and drones and so forth, instead of farms and mining and all this. You start in the industrial era and go all the way up to Kardashev. I only got like 70-80% done, and all the mechanics work, but
it turns out it's actually really hard to make a game that's fun. Like actually just clicking make solar panel 100 times is not fun. Um, surprise surprise. Uh, and
It's fun for some people. There are some of those crazy clicker games where you do that, and it's just like a brain-rot game, and they're pretty successful. But I think you should stay away from that. Not even Factorio. These are like mobile games that are really, really bad. Anyway, I don't think that aligns with your mission.
There's an art to it.
Yeah.
But we talk about the grand geopolitics and strategy of this era that we're going into, right? Where the most valuable resources on Earth are, you know, well, basically, the economy is in some respects reorienting around compute.
And so I wanted to capture some of that dynamic, the late-night discussions of San Francisco, in a game.
Yeah, I love it. How are you processing the work that's being done in world modeling and generative worlds? I know that you're not generating images, but Claude can take in an image and process it, and that has value. You text a screenshot and have it implement it. But what do you think about world models? Is that going to play an important piece, a link in the chain of where you're going?
Yes. So, I mean, I think there are two different things here. There's the direct path to AGI, which I think is very much coding, and then AI research and general science and so forth. And I don't think that requires world models.
But I do think that world models are incredibly exciting for a number of other things. I think they're very exciting from a gaming perspective. I mean, we all saw those incredible demos with Genie. It's just truly mind-blowing.
And I also think they're probably the unlock for training robotics properly. It feels to me like there was this era of, oh, we're just going to need to do incredible amounts of behavioral cloning of people operating robots. And it feels like the scalable path is world models. But I don't think it's on the critical path to AGI, basically.
Yeah. Can you unpack the concept of a software-only singularity?
Yes. So in this world, the models are far better at digital tasks than they are at physical ones. And so we see rapid change in the digital world with relatively little change in the physical world. So information and software change dramatically. And this ends up having some pretty weird effects. It means that maybe the drivers of the last couple of decades of progress in the economy get changed very rapidly. And I think we'll see that flow into the physical world, but at a delay.
Um, so you get much better at doing chip design. You get much better at training AI models. AI models get a lot faster. Chips get a lot better. Uh,
the general economy gets a lot more efficient because the sort of information and message passing that is much of the rest of the economy ends up, you know, becoming much more efficient. Yeah.
But at the same time, you don't yet have robots providing limitless physical abundance. Science probably progresses really fast up to the degree that you need interaction with labs or larger particle colliders or something like that, and then you go, okay, well, I need to build the robots to
But at the same time, automated labs feel more near-term than unlimited robots in the real world manipulating, you know, the earth.
Yeah. I think maybe they actually arrive at similar-ish times. I think you need pretty competent robots for the labs. Or at least, no one's yet managed to figure out how to fully automate a lab without them; it turns out there are all these really weird little tasks that require a lot of manual human dexterity that are currently part of biological protocols. And so a lot of biologists will say, no, you actually need something that's capable of human-level dexterity.
Um maybe not for all experiments but for for enough that it becomes annoying.
And in this software-only singularity, how are you defining singularity? You know, the Kurzweil formulation of more computing power than human brains, that's one, you know, equation. There's also just this point beyond which we cannot see. How do you think about singularity in that context?
I think there is a pretty tricky event horizon there; at least, I haven't found anyone that's made incredibly strong or good predictions. It just feels like at that point you have as many digital intelligences or more, and they're as smart as or smarter than human intelligences. What does that even mean for the world? It's incredibly hard to predict. It's something we spend a lot of time trying to think about, trying to prepare the world for all of the eventualities. But it's, I think, difficult to make topline predictions of what exactly that looks like.
Yeah. In the original Kurzweil formulation, that was almost the definition of the singularity, that you can't make predictions beyond it. And so, you know, once the predictions break down, then you're there. Which is
I mean, I think in the near term, you know, I went down with Dwarkesh to his Elon interview in Austin, and I think a lot of the reflection of that in the physical world perhaps is what ends up happening: you do end up basically trying to climb the Kardashev scale and capture more of the energy of the sun and so forth. But I think that takes some time.
Is your current takeaway we're more chip constrained, energy constrained? What's the biggest bottleneck to AI progress?
I mean, right now I think I agree with Sam's earlier answer that we're more chip-constrained in AI progress. But I think it's sort of interesting to roll forward two years and be like, okay, well, if you want 100 gigawatts, or, you know, in four years you want a terawatt, what does that look like? Where do you put that? And this is why Elon's going after the data centers in space, right? He thinks it's going to be the easiest way to get a terawatt in 2030. And maybe it's space, maybe it's a giant desert somewhere. The Atacama desert, the Australian desert, somewhere in Texas. You know, maybe Texas has a lot of solar panels.
Yeah.
Hard to know.
Yeah. Jordy,
Uh, how are you guys thinking about free usage limits ahead of the Super Bowl? [laughter] It's really tough. I mean, one of the struggles of this is that usage, like compute, is so constrained.
Yeah.
Um and so uh
Yeah, I noticed you guys didn't say download the Claude app. There was no direct call to action.
Yeah. I mean, I think the purpose of the ad was very much one of provoking thought and discussion.
Um and
it was certainly successful already.
Yeah. provoke a lot of thought and discussion. [laughter]
And I mean, I don't think you need to do an explicit call to action for people to download things or to consider things.
Yeah, that makes sense.
But again, it sounds like the GPUs will be on fire Sunday, is the expectation.
The GPUs will be on fire. As has been the case for the entire industry for the last year. I mean, I think we've all had to make incredibly difficult trade-offs on exactly how compute is used.
Yeah. Can you zoom in on your experience? Because I feel like everyone's sort of seen rough growth curves for Anthropic on the revenue side or the tokens-generated side or whatever, and it feels exponential but smooth-ish. And then simultaneously [clears throat] you have this constant worrying about bottlenecks: is there enough capital, are there enough chips, are there enough data centers, is there enough energy, are we in an age of research, are we in a plateau? How have you balanced those two narratives, your experience of smooth but exponential growth, with the constant fear of something like a cloud hanging over the industry and maybe a slowdown?
I mean, to note on the smooth or exponential growth: I think one of the things that really blew me away was when SemiAnalysis did that analysis and found that we went from, I think, 2% a month or six weeks ago to 4% of GitHub commits.
Yeah.
Done by Claude Code. Yeah.
And for the amount of GitHub commits done by Claude Code to double over a few weeks is truly ludicrous, right?
And there's no real visceral way to feel that. It almost feels like a number on a screen, but you can't viscerally feel it.
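The figure cited here (roughly 2% to 4% of GitHub commits over about six weeks) implies a growth rate you can work out in a few lines, under the loose assumption of clean exponential growth, which real adoption curves won't follow for long:

```python
import math

# Implied growth from the figure cited above: ~2% -> ~4% of GitHub
# commits over ~6 weeks (assumes clean exponential growth).
start_share, end_share, weeks = 0.02, 0.04, 6

weekly_factor = (end_share / start_share) ** (1 / weeks)  # compounding per week
doubling_weeks = math.log(2) / math.log(weekly_factor)    # 6 by construction
weeks_to_all = math.log(1 / end_share) / math.log(weekly_factor)

print(f"weekly growth factor: {weekly_factor:.3f}")
print(f"doubling time: {doubling_weeks:.1f} weeks")
print(f"naive extrapolation hits 100% of commits in ~{weeks_to_all:.0f} weeks")
```

The last line is the point: naive extrapolation runs out of commits in about half a year, so the interesting question is where and why the curve bends.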
Totally.
Um,
For us, we've always had a very, very strong conviction. And in many respects, Anthropic is a bet on this being true, that scaling is continuing.
And that sort of progress continues unabated. So in many respects, the external numbers are only a reflection of the conviction that we've had internally for a long time about exactly how we expect all the trends to go. I mean, I think we're broadly on trend. If you look at Situational Awareness, I'm pretty sure the power and energy and FLOPs predictions are bang on. It kind of feels more like we're just hitting each milestone as we expect.
Mhm. And roughly what we expected to have happened has happened so far.
In terms of diffusion of the technology, do you think there's a role for forward-deployed engineers to go in and change organizations? We saw some news that OpenAI is hiring a bunch of folks. Obviously a lot of enterprises are using Claude now. But at the same time, there's that phenomenon of, if you don't have a free weekend or a long weekend, you might never get around to implementing it. And so having a little bit of extra horsepower and knowledge around the office might actually pull forward capability. Is that something you could see growing at Anthropic?
Yeah, I think it's a great idea. I think it's very clear that people don't know how to hold these things. And also, it's ludicrous: three months ago, we didn't have this most recent generation of models. We didn't have 4.5, and
like you're meant to adjust your business strategy
over the course of the holidays, basically, because the models are suddenly capable of doing things that they couldn't do right before you, you know, entered the end of the year.
Are you saying we could see new white-collar job creation? [laughter]
Maybe.
I mean, look, I think these are really valuable jobs. Yeah. I think there's clearly a hell of a lot of value to be unlocked with them.
We asked Sam this, but I'm interested in your take. Is data the new oil?
Or is it, you know, the fossil fuel, as I said?
Yeah. Yeah. But the more pointed question is just: are there organizations out there that have data that's locked away, and it requires a business development deal or an acquisition or some sort of, you know, AI leadership to add that capability? Because there are effectively secrets, sometimes trade secrets. I don't know if the Coca-Cola formula is in 4.6, but it will be if the Coca-Cola company calls you, right? So how do you think about that question and that phrase? Even though it's been beaten to death, maybe it's making a comeback.
I think there are maybe two kinds of data, and one of them is dramatically more useful than the other. There's the kind of data which is the analytics that a company might have collected in the past, or artifacts of that company's operation, internal documents and so forth. And there's the other kind, which is the actual work that people did to produce those documents, and that's not recorded. And so I'm not certain that data is the new oil so much as the expertise of people, and models being able to understand and learn from people. You almost want the models to be an intern in an organization and get coaching and feedback and learn how to do the job. Maybe what I'm analogizing to here is that as a human, when you join a company, you mostly learn the job from your colleagues
rather than from reading all the documents in the company.
It's just much more informative, and I broadly expect that to be true of models as well. I expect them to learn in a quite human-like way from their colleagues.
Have you thought about a scenario where a company's revenue maybe goes to zero, and all the remaining value is just in their historical data?
Hm.
Well, I guess in this analogy, it would be the people that I think the value would rest with.
Sure.
Yeah. Yeah, that makes sense. Um
When you guys are thinking about new opportunities, verticals, categories, what is the thought process around UI? There are a lot of people building AI-native tools today. They functionally look like traditional enterprise software. And it's hard for me to imagine Anthropic building out, you know, infinite surface area of traditional software to do things like orchestration. But what is the specific framework when thinking about interfaces for new products?
We mostly want to build something that fits in where the humans fit in, and that can help, you know, that interacts with you like a colleague. And so we want something that can fit in, and, yeah, rather than absorbing interfaces, join in and use the same interfaces that you do.
Uh, and stuff like that. I think that's how
And I think that's what companies want. If you ask any company in the world, hey, we can get you the best executive in this function in the entire world, they will usually say, we will pay almost any price for that, or we will pay an extreme premium for that. And yet if you sell them a software solution, hey, this software solution can help you do this other thing, maybe they want to use it, but it's not as enticing as something that can just do the work,
right? Exactly.
Yeah. Also, I mean, the UI evolution seems very underrated. I was processing the OpenClaw development and how important mobile was to that. Just this idea of, somebody shows up and says, you're a busy business executive, you're in meetings and phone calls all day, you're on flights. We got a great software engineer for you, but you've got to talk to them over the terminal on your desktop. It's like, I'm not going to give that many commands, because I'm on the road and I'm on my phone. And you see people walking off planes with MacBooks now because they need to get one more prompt in.
I have done that.
I am so guilty of that.
And so I imagine there's a lot of work that will be done on the mobile side. How do you think about that?
That's broadly true of all AI interfaces, right? I don't want to look at a screen. I don't think anyone wants a terminal window or a chat window. I just want to ask my computer to do something, or maybe sort of ask the world around me for something to happen, and for that to happen. Yeah.
And so I think that, rather than focusing too much on the interfaces of today, asking how you would interact with an incredibly competent colleague is the right thing. Like,
you know, I just want to text Claude and be like, hey, can you fix this up for me? Can you help me sort this out, or book this trip? Yeah.
Yeah. Everyone has the ability to text from a variety of devices. People have AirPods, Apple Watches, phones, laptops, tablets. Do you think we need any new hardware?
I think new hardware is pretty exciting and interesting. I like that people are placing bets on it. I mean, I feel like something which can capture a bit more of the context that you have in your day-to-day life, and that you can talk to. I mean, one thing is that voice is higher bit rate than typing for most people. Yeah. Right.
But at the same time, in most of our environments it's actually quite annoying to be talking to your devices, in a crowded public space or at work and so forth. So maybe something that captures that automatically would be great. Like, you know, you've seen those
I'm not sure if you've seen the YouTube videos where people subvocalize. That would be cool.
Yeah, Apple just bought a company that sort of looks at the [clears throat] skin movements and maybe can read them. There are a number of companies in the space. It seems like it's coming, and you'll just be able to plug into that. That seems cool. But also just my natural environment absorbing more of my context and being able to just talk to
you, like having speakers around my house or something like that, rather than having to carry my phone around with me. That would be cool.
Do you see any world where on-device computation becomes more important? I feel like Anthropic as a whole is so backend-heavy, like none of the computation is done on the device. But that could change in theory. I don't know what you think about it.
I mean, I think our general perspective is, well, we've seen this trend, right? Intelligence gets 10 to 50x cheaper for a given level of intelligence every year.
Sure.
That massively democratizes access to that level of intelligence. It's literally in my Twitter bio, intelligence too cheap to meter, because this is in large part one of the things I worked on at Google, and I've also worked on it here a little bit.
Yeah.
And one of the ways that happens is that that level of intelligence goes on device.
But then, because scaling keeps continuing, there is always this exponentially greater set of use cases which the models then get applied to. So it's totally plausible to me that, you know, some sub-swarm might exist on your computer, or you might get a laptop that has better memory bandwidth so it can have little models complete stuff. But at the same time, you also want the much greater intelligence out there, sort of planning and farming things out to swarms and really munching on the intensive and intellectually difficult work.
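To make the "10 to 50x cheaper per year" claim concrete, here's a back-of-envelope compounding sketch. The $10-per-task starting cost is a made-up number for illustration; the 10x and 50x factors are the range quoted above, and real pricing won't decline this smoothly.

```python
# Compound the "10-50x cheaper per year at fixed capability" claim
# for a hypothetical task that costs $10.00 today.
cost_today = 10.0  # dollars per task, made-up number

projected = {
    factor: [cost_today / factor**year for year in range(4)]
    for factor in (10, 50)
}

for factor, costs in projected.items():
    print(f"{factor}x/year:", " -> ".join(f"${c:.5f}" for c in costs))
```

Even at the conservative end of the range, a fixed capability level gets three orders of magnitude cheaper in three years, which is the mechanism behind both the on-device argument and the "exponentially greater set of use cases" point.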
Yeah. Jordy, anything else?
Have the market sell-offs due to various Anthropic releases been generally predictable, or have there been any that have been surprising?
I mean, honestly, I didn't even notice the last one because I was just heads-down on the launch. But look, in this case, I think it's a little bit much to ascribe it to an Anthropic launch. I mean, there have been legal tools released with AI before; we have many, many customers with legal tools that do, you know, work for lawyers. I don't think this is crazily different from any of those. And I think it's part of just a continuing trend that we've seen.
Yep.
Makes sense. Well, thank you so much for taking the time to come talk to us on launch day.
Great to get the update.
Can't wait to talk to you the next time. We'll see you.
Have a good rest of your day.
Let me tell you about Graphite, code review for the age of AI. Graphite helps teams on GitHub ship higher-quality software faster. And I'm also going to tell you about Vanta: automate compliance and security. Vanta is the leading AI trust management platform. And I'm very excited for our next guest. We have Dan Barkella from T1 Energy. He's the chairman and CEO. He's [music] in the Restream waiting room. Now he's in the TV.