Physical Intelligence's π0.5 robot can clean a home it has never seen — 50% of the time
Apr 23, 2025 · Full transcript · This transcript is auto-generated and may contain errors.
Featuring Karol Hausman & Lachy Groom
What's happening? I'm so happy. Welcome to the show. Yes. Yes. I mean, maybe that's a great way to start. We're huge fans. We were just singing big tech's praises, saying that we would do anything to support big technology, and yet you left big tech.
Why did you leave, and what are you building now? We are building Physical Intelligence. We want to build a model that can control any robot to do any task. Mhm. This is something I did explore in big tech before. Mhm. It's just much more fun to do it in a startup. Way more fun. Also a little quicker.
A little quicker. A little faster paced. Yeah. I mean, if anything's become clear in the last few weeks, it's that we need the robots now. Yes. We can't really wait 20 years. And if big tech was, you know, fully responsible, we probably would have to wait something like that. Yeah, for sure.
So can you walk me through the most recent announcement? I saw the video. Fantastic. But how would you break it down in terms of what the milestone represents? Yeah. So the biggest challenge in robotics so far hasn't really been agility or dexterity, what robots can do, but generalization.
Yeah, kind of similar to what we've seen in language before, where it was really, really hard to get it to do tasks where you just ask it to do something and see if it can work. And what we tried to do for the past six months or so is to get to the next level of generalization for these models for robots.
So the challenge we set for ourselves is to take a robot to a completely new home it's never seen before and ask it to do a complex, long-horizon task like clean a bedroom or clean a kitchen. And there are so many details that go into cleaning a kitchen that you need to understand when you're in a new home.
And you only start appreciating it when you try to do it with a robot, where everything is different. The countertop looks very different. You don't know where the drawers are. You don't know how to open them. You don't know where the objects are.
Even when I try to wash dishes in my house, I'm like, where's the sponge? I don't know where the soap is. Where do I put this? So, it's a mess. Yeah, really hard. And on top of knowing all of those things, then you also need to connect it to motion.
You actually need to get the robot to do the right thing. And it turns out that with π0.5, which we just released yesterday, we can do that. And it doesn't work all the time.
It's not that I can just give it to you and it will work in your kitchen every single time, but it works quite often quite well. So we bring it to a new home and it can do those things maybe like 50% of the time, sometimes 80% of the time.
A big increase from 0% of the time, but yeah, a big increase from what we've seen before, where the previous state-of-the-art was basically: if you want to show a robotic demo, you need to collect data in that specific environment for those specific tasks, and that's where you show it.
But now we can for the first time bring it somewhere else and it kind of works. It understands what it needs to do. Do you think that consumers will generally be more patient around reliability with robotics? Because if I, you know, let's say I have some type of robot in my home and I say, "Hey, do the dishes, right?
" And 50% of the time it does it perfectly and like, you know, the other 50% of the time I have to kind of interject. Whereas if I'm like booking a flight and only 50% of the time it like books the flight, it's like, well, I'm just going to do it myself, right?
Because it's only going to take me a minute, whereas the dishes could take 20 minutes, right? So how do you think about the sort of threshold of reliability in order to really deliver value for consumers?
Yeah, I think, like, we don't think right now about delivering value for consumers, and it's kind of why we structured the company the way we did. We're a research lab. We're trying to solve this problem of physical intelligence.
We really like these consumer-oriented tasks, and I think people tend to as well, like when they think about laundry being folded for them. I think there'll be a point at which it gets good enough that we can deploy it to consumers, but it's not going to be at 50%. It's going to be closer to 98, 99%.
And there I think we can draw on self-driving cars, where there will be a period of interventions, right? If it doesn't work, it's not that it will just stop and do nothing for a while. We could have a human teleoperator intervene and finish the task.
But I think the other cool thing about the home and consumer use case is there's so much that could also just happen overnight. While you're sleeping, your laundry is folded, your meals are cooked, prepped for the week ahead. Your house is tidied.
So consumers are still a little far away, but we're making a lot of progress. Can you talk about the path, in terms of the underlying technology, to go from a Roomba (we're pulling up the video here on the stream) to what you've actually built?
What were the foundational turning points in terms of the different models and different breakthroughs? I imagine the transformer was really important, but there's probably a ton of other developments that excited you, and now is the time, like, we're ready to go.
Yeah, there's been a lot of things that we are building on top of: things like transformers, things like vision language models, the concept of pre-training and post-training. Yeah, a lot of those things transfer to the robotics world, but they're not as well understood.
We are still in the process of figuring out what that recipe should look like. We kind of have to rediscover some of the steps that the language people had to take initially and see how we can map them onto the robotics world. We don't have the privilege of having an open internet full of data.
Yeah, we need to collect that data ourselves, which on one hand is a big challenge: the data isn't there, and you can't iterate nearly as fast. On the other hand, it also gives you more freedom in figuring out what kind of data is the most important and what data to collect. For this particular π0.5 advancement, what we had to do is, one, collect a very large, diverse data set that involves not only mobile manipulators in homes but also static robots in the office, or data from the internet.
And it turns out if you collect very diverse data across many different tasks from many different form factors, they all contribute to each other, and they contribute to a better understanding for the model of what is actually happening and how to utilize all of the data to figure out what to do.
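The diverse, cross-embodiment data mix described above can be sketched as sampling from a weighted mixture of sources. The source names and weights below are illustrative assumptions, not the actual recipe; the interview only says the home mobile-manipulator data is a small slice of a much larger, diverse mix:

```python
import random

# Illustrative sketch of co-training on a weighted mixture of data sources.
# Source names and weights are assumptions for illustration only.
SOURCES = {
    "mobile_manipulators_homes": 0.05,  # the small but critical slice
    "static_arms_office": 0.45,
    "web_vision_language": 0.50,
}

def sample_batch(batch_size: int, seed: int = 0) -> list[str]:
    """Sample source tags for one training batch, in proportion to weights."""
    rng = random.Random(seed)
    names = list(SOURCES)
    weights = [SOURCES[n] for n in names]
    return rng.choices(names, weights=weights, k=batch_size)

batch = sample_batch(1000)
```

In a real pipeline each sampled tag would pull an example from that source's dataset, so every gradient step sees a blend of embodiments rather than only the scarce in-home data.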
So that was a really, really big component. And then there's also a lot of architectural things, a lot of details that we need to get right to make sure that we take full advantage of that data. Interestingly, most of the data is actually not the mobile manipulators in many different homes.
It's a very, very small percentage of it. So it also gives us a lot of hope that we can leverage data from the internet or from other platforms where it's easier to collect, to get to that kind of generalization. Can you talk a little bit about simulation, like data generation through simulation?
I imagine it's very easy to procedurally generate a million different floor plans, or a trillion different floor plans, to try and navigate those in, you know, kind of two-dimensional space. But at the same time, when you get into manipulating a sheet, all of a sudden that's a physics calculation.
It's probably harder to simulate there. It's not like with self-driving cars, where there's, you know, Grand Theft Auto you can train on. Maybe there's a game where you clean up your house, but I certainly haven't played it. How effective has simulation been in generating data, and is it useful?
Is that a viable path here? Yeah, it's been so far really, really useful for locomotion, for robots walking around. Yep. And the reason for this is that for that kind of problem, the main difficulty is modeling your own body. Like how do you place your foot? How do you walk?
And that you can do once: you model your robot really, really well, and then it works. It works across many different terrains. You can easily, you know, randomize that and figure out the gait that is robust. It hasn't worked nearly as well for manipulating objects, for working with your hands.
And I think the reason for that is that the difficulty isn't about how you move your hands. It's more about the world that you're manipulating. And that is much harder to simulate. You don't just do it once.
Every object you interact with is different, and you have to model each one of those, and it's really, really hard to figure out all the different physical parameters to make it good.
And this is kind of the data that is the most important, the data of physical interactions, because this is the data that is not on the internet. This is the data that is not even described in language. This is something that comes really naturally to you. You just know how to do it.
You sometimes don't even know how to describe it in words. So I think it's kind of a really bad combination, where it's the hardest thing for sim to do, and it's the data that we would need sim for the most. And what we discovered so far, basically looking at the past successes of machine learning, of AI,
is that the best successes are where you take real-world data, a large diversity of that data, and learn directly on that. You don't try to find some kind of proxy or some kind of simulated environment that reflects what you actually want to do. You just go after the problem head-on.
And that's what we're doing here. So, we're collecting a ton of data ourselves in the real world. We can collect very diverse data this way. It's also very easy to collect it across many different scenes, many different objects. We don't need to, you know, create them in sim.
We can just buy them and bring them in and start interacting with them. And that so far has been actually easier than we had initially thought. Yeah. I mean, you say you're collecting a lot of data, but I imagine there's only so many of those robots in that demo video that you can manufacture.
There's only so many houses that are like, "Yeah, come try your 50% robot in my house.
" Um, is is is that a key uh is is that scaling as you'd like or or or is this more like you're going to build a physical uh you know demo like demonstration unit m like uh and then be manipulating it in a warehouse or is the plan to be more like let's roll this out and just uh have beta testers kind of dog food it for us?
All of the above. Um, okay. Basically, where we find there's benefits to scale at this stage, we'll scale it. We'll figure out a way.
And whether that's producing more robots and giving them to people, whether it's scaling up our operations team and the folks that teleoperate these robots ourselves, whether it's going out and commercially deploying these into environments where they're doing economically useful or viable tasks as training data set collection, we'll do it.
We also think there's so much to do, though, on just the algorithmic development that can make the data far more useful, that can reduce the necessities of scale. But we're structured such that we can go and pursue every avenue.
What you guys both mentioned there, that was one of the big questions before π0.5, where it kind of was unclear: do we have to visit a million homes? Do we have to visit hundreds of thousands? At some point it becomes kind of not feasible, or really, really hard, and maybe we need to find a different path. But so far we've been quite surprised by how few different environments you need to see to be able to generalize to a new one.
It's awesome. We actually got really reassured that this path could really work. Would you guys like to see way more early-stage robotics companies? It feels like there's, you know, the Optimus, the 1X, you've got Figure making noise.
But it feels like, you know, we just covered this, I think Monday, the Chinese humanoid marathon. I'm sure you guys followed that. They've got a lot of people working on this problem.
It seems like there's a tendency in venture to think that, okay, there's a bunch of heavily funded players now, I shouldn't go build in that space. But at the same time, when you look at some of these TAMs, maybe we should have 10 times the amount of early-stage robotics companies getting started. It's extremely early. We work with, I'd say, probably most of the new robotics companies starting in the US and abroad. If you're starting a robotics company, reach out to us; we'd love to work with you. You can build the body, we'll build the brain.
But yeah, we need to see a lot more robotics companies, particularly in the US. That's awesome. I want to talk. Did you ever interact with the Google arm farm? Yes. Yeah. One of my co-founders actually started that project. I had a feeling. Do you have your own version of an arm farm?
Or can you describe, for people that might not know, what was the genesis of the arm farm? What was the purpose? What was the takeaway? Does every robotics company need an arm farm, or is it just you? And what will that look like in the context of what you're building specifically?
Yeah, so back then, that was a few years ago, the idea was that for robots to learn, to acquire those kinds of skills to manipulate the world around them, you can't really prescribe it. You can't just code it all up. The world is too diverse.
You can't, you know, have a lot of if statements describing what you should do in every single situation. They need to learn it the same way as we do.
And the idea of the arm farm was to set up many different stations where you have static robots, static robotic arms, where they just practice and they learn from experience. So in that particular case, they were trying to learn how to grasp objects.
So they just had a bin in front of them with lots of different objects that were very diverse. And the arm was just going down and trying to figure out how to grasp them. And over a long period of time, it gathered enough experience to actually learn from it and become really, really good at grasping objects.
Remarkably, remarkably good, and way better than any kind of prescribed system that people designed by hand.
And on one hand it was a big success because of that, because it was clear that this learning approach is something that can truly work and understand the nature of grasping and truly nail that skill. On the other hand, it was also disappointing in that it took a really long time.
It needed a lot of data and especially a lot of data at the beginning was kind of just like the arm wandering around and not knowing what to do. So it seemed like a lot of that time was just kind of wasted with the arm figuring out the simplest things.
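The learning-from-experience dynamic described here, early trials mostly wasted on aimless wandering, later trials increasingly successful, can be sketched as a toy simulation. The learning rule below (bump a success probability after each success) is a crude stand-in assumption, not the original Google system's algorithm:

```python
import random

# Toy sketch of one arm-farm station: the arm attempts grasps, logs
# outcomes, and its skill improves as successful experience accumulates.
def run_station(num_trials: int, seed: int = 0) -> list[bool]:
    rng = random.Random(seed)
    skill = 0.05  # early on, the arm mostly wanders and fails
    outcomes = []
    for _ in range(num_trials):
        success = rng.random() < skill
        outcomes.append(success)
        if success:  # stand-in for learning from logged experience
            skill = min(0.9, skill + 0.01)
    return outcomes

log = run_station(5000)
early_rate = sum(log[:500]) / 500   # lots of wasted wandering
late_rate = sum(log[-500:]) / 500   # near the learned ceiling
```

The point of the sketch is the shape of the curve: the bulk of the early data is failures, which is exactly the inefficiency the interview says a π0.5-style prior now avoids.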
And one thing with π0.5 that we are really excited about is that we are now at the stage where the robots kind of get the sense of what they should be doing in the environment.
So they are no longer in this space where you just arrive in a new home and you start with just moving your arms around, not knowing what to do, hoping that you do something that is useful and then you learn from that. You start at a point where you kind of know, more or less, what you can do.
It works some of the time. You just now need to get it to work every time, and really, really well. Can you talk about the path to, or the importance of, end-to-end learning in the context of robotics?
My understanding is that teleoperation is great, and as long as it's economical, we should do it. And then having deterministic code, like, you know, a control system that's written in C++, that's also great as long as it works; it's sometimes more debuggable. But the reason that we want to get to end-to-end AI systems is that then you're on the scaling law, then you're just data-bound, limited by how many of the actual robots you can manufacture and produce.
You're on this flywheel, and you're now bound by actually getting the cars onto the roads, getting the robots into the world, and that will naturally create a flywheel. That's what everyone's hoping for in self-driving. But what does that path look like now?
And how ridiculous is it to claim that end-to-end robotics will be here by the end of the year or something like that? So end-to-end robotics is already here. Everything we've shown so far is fully end to end, where you take camera input and a few other sensors in and output actions directly. Wow.
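The end-to-end setup described here, raw sensors in, actions out, with no hand-written control logic in between, can be sketched roughly like this. The observation fields, the 7-DoF action space, and the zero-action placeholder policy are all illustrative assumptions standing in for a learned model:

```python
from dataclasses import dataclass

@dataclass
class Observation:
    camera_rgb: list[float]    # flattened pixels, stand-in for real images
    joint_angles: list[float]  # proprioception

def policy(obs: Observation, instruction: str) -> list[float]:
    """Placeholder for a learned vision-language-action model."""
    return [0.0] * len(obs.joint_angles)  # dummy: hold every joint still

def control_loop(steps: int) -> int:
    executed = 0
    for _ in range(steps):
        obs = Observation(camera_rgb=[0.5] * 12, joint_angles=[0.0] * 7)
        action = policy(obs, "clean the kitchen")
        assert len(action) == len(obs.joint_angles)  # one command per joint
        executed += 1  # a real robot would apply `action` here
    return executed

steps_run = control_loop(10)
```

The contrast with the "deterministic C++" approach is that the mapping from pixels and a language command to joint actions lives entirely inside the learned `policy` function, not in hand-written branching logic.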
I think there's another reason to do end-to-end learning, which is that this is, I think, the only thing that has a chance of working. Yeah.
Like, if there was a way to just pre-program your robot and write really good C++ code to get it to do all kinds of different things like folding laundry, we would have done it a long time ago. Totally. It's not for lack of trying. Many people have tried it for a very, very long time.
Yeah, the world is just too complex. There are too many things that you will never see, that you will never predict, and you can't really write that code. And I think the only way to get there is for AI to figure it out from experience.
This is similar to what we've seen in language or in vision, where people tried to write chatbots by writing different instructions and prescribing logical steps of how you should proceed, but it turned out that the intelligence you need is much more messy.
You just give it a lot of text and you let it figure out all the different patterns and analogies, and there are many, many more of them than the ones that we can express in code or in language. I think something very analogous is going to happen here, and that's what we're starting to see. The demonstrations that we've shown so far here at Physical Intelligence are of tasks that were not possible before, things like folding laundry. There is no program that I've ever seen that could do that. The same with arriving in a new home; there are just too many variables there to do it any other way.
Can you talk about um some of the experiences that you both have had in your careers?
Google and Stripe, in some ways big companies that maybe move slower than a small startup, but at the same time, for both of those organizations, I feel like the time from "hey, we're starting the company" to "we have a product" was short; it wasn't a research organization for years.
What have you learned from those organizations? What are you taking into this experience? Stripe really taught me everything I need to know about building a robotics research lab. It's just lessons galore. I'd never worked in a research environment, so I don't have priors.
I don't know what a research environment actually looks like. I just know what our research environment looks like, whatever that looks like. Maybe it operates exactly like startups. Like maybe grad programs are exactly like startups.
But it feels like every other company I've worked with that moves extremely quickly and has a clear set of goals and direction, and just has a bunch of people that work behind it and work extremely hard to solve whatever it is that we're setting our minds towards.
And there's so much I learned from Stripe that informed that, but a lot of it's just the obvious stuff, right? It's: hire exceptional people, set a really high bar for it, don't compromise, set a very clear set of goals for everyone, really align people.
Actually, I think that's one thing I really took from Stripe: you want an extremely aligned set of people. And I've never seen more alignment than I've seen at PI. We talk about the alignment tax.
Like when we're bringing someone on, how much work is there to align them around our mission, our way of seeing the world? And almost everyone that joins, there's like basically nothing.
Uh most of the people that work here have dedicated their lives towards robotics or robotic learning or AI in some form or another, hardware, whatever it might be. Uh and that just allows us to move so much quicker. It's like we need to communicate a fraction of what the average company needs to communicate to someone.
We don't really need to inspire people or motivate them because they're so inspired and so self motivated. Um I think that's probably one of the things that's worked best for us to date. How have you seen your customers react to all the news and chaos around the tariffs?
I think a lot of these companies are not in full commercial production yet.
So it's not like, hey, we're no longer making money. But how are they thinking kind of long-term, just given how much of the supply chain is based in Asia? And are there opportunities for, you know, new US subcontractors and manufacturing companies to sort of service this new industry?
Yeah, the good thing is, I mean, good and bad, it's also subscale right now. Most of the money is being spent on R&D rather than scaled production, and so it's not as if there's 100,000 robots that everyone's buying and it's now just twice as expensive.
I think the good thing is that, given it's subscale, there's a lot of time to build out US supply chains, and it's putting a lot of focus on figuring out: can we get US actuators? Can we start to create companies that are developing those and all the other critical elements of the supply chain? So it's actually just getting people into gear, and maybe it's the right time for it.
Can you talk a little bit about um what you're excited about on the data center side?
Is there a moment where you're really pulling for Stargate to come together, and we need the 100-gigawatt data center to crunch all of the data that you've collected in, you know, kind of a GPT-5-class training run? Or is that something that's so far out that you'll always be able to just tag along on the residual capability from the large language model labs?
Yeah, we are not there yet in terms of having a full scaling law the same way as we've seen for LLM companies, where you can just translate compute to progress or to capability. We are searching really, really heavily for that.
We're trying to figure out what is the recipe that would scale like this, but we are not there yet. We do, at the same time, generate a ton of data.
I think that's one thing that I realized since starting the company: robots generate a ton of data, and you don't need that many of them to generate data that is close to the levels that LLM companies use for their models. And there is no ceiling to it. Yeah. Right.
It's not that we run out of the data that robots can collect. It's not like the internet.
Yeah, so I think over time it's quite likely that the places are going to switch a little bit, where most of the models, including, you know, LLMs and VLMs, are going to be using real-world data collected through robots, because that's the data that has no ceiling, and it's very active, as opposed to just passive observations of what people wrote on the internet.
And I think at that point, probably the question about data centers and compute is going to be a big one, but for our models we are not there yet. We are not bottlenecked in the sense that, if only we had 100 times more compute, everything would work so much better. Yeah.
How do you guys think about uh demos long term?
We joked on the show recently, after seeing the Chinese humanoid marathon, like, I want to see humanoids doing big wave surfing, cliff jumping. You know, at what point is that even worth doing or exploring, just because of the amount of attention that it would bring to the industry?
What do you think we should demo? I think I'd like to see one of your robots surf Jaws. I was saying I want to see a robot do the 900 on a halfpipe, Tony Hawk style. That was a really foundational moment in, you know, my childhood and American skateboarding culture.
Really life or death stuff. Exactly. It's got to be high stakes. Yeah. What about with with hands? What about manipulation? Uh juggling for sure. Rubik's cube for sure. Uh juggling Rubik's cubes. You can do that. I can do the Rubik's Cube. Jordy can juggle.
We need to learn each other's skills so we can do both at the same time. Um but yeah, I mean these stunts uh when they're done right, they can draw a bunch of attention. Although you guys have plenty of attention.
I don't know if that's really what... No, but it's an interesting thing, where you have the attention of the entire industry, but then at some point, you know, you want to basically inspire the next however many thousand robotics companies.
I would love to know about, obviously with any AI project, there's always the public perception of job displacement, dystopia, AI doom, etc. But when I look at the demo that you just posted, I'm like, that thing is going to be fighting on my team in the singularity.
Like, this is a friendly robot that will be defending me. But how do you think about tuning the language interaction? Do you see a world where, yes, it's doing my laundry or making my bed, but if I happen to also ask it to tell me about the news, I can just have a chat with it?
Is that something that's even uh in your mind in terms of like human computer interaction? It's not a big focus of ours right now really.
We're so focused on manipulation and economically valuable tasks, and more so than that, the fundamental building blocks that we think get us from here to physical intelligence. I think it's inevitable that everyone has robots in their homes, their workplaces, just in their lives.
And I think they'll want robots that do more than just tasks, whether it's companionship or the best Amazon Alexa that can actually then go cook the recipe that you ask it for.
It's a place we'll focus, around the interaction, but right now it's more understanding the intent of "clean my kitchen" and then breaking that down into tasks. But it's pretty straightforward to go from "do the thing" to "tell me about the thing" to "let's have a conversation about the thing," so it's on the horizon, but not the greatest priority. And one thing I would say there is that the models we've been releasing are actually built on top of vision language models. So these are models that are truly what they call multimodal, where you can talk to it.
You can ask it what it sees in the image, and every now and then you can ask it to perform actions too. That's cool. And what we start to realize is that all of these different data sources contribute to each other. They give you a bigger picture of what the world is like, and better understanding.
And it just turns out that robot actions are just like yet another language that these models can speak; they just need to learn it and see enough examples of it. So the model that we have already is a model that you can talk to, and it works, you know, just as well as open-source VLMs.
But on top of that, it can also have that understanding be very embodied. It understands what it sees in front of it, and it's a much deeper understanding when it knows how to move its arm to actually accomplish a task.
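One common way to make "actions as another language" concrete, used by models like RT-2, is to discretize continuous actions into tokens that can share a vocabulary with text. The bin count, value range, and token naming below are illustrative assumptions, not this model's actual scheme:

```python
# Sketch: continuous joint commands become discrete tokens a VLM could
# emit alongside ordinary words. All constants here are assumptions.
NUM_BINS = 256
LO, HI = -1.0, 1.0  # assumed normalized action range

def action_to_tokens(action: list[float]) -> list[str]:
    """Map each continuous action dimension to a discrete token."""
    tokens = []
    for a in action:
        a = max(LO, min(HI, a))  # clamp into range
        bin_id = int((a - LO) / (HI - LO) * (NUM_BINS - 1))
        tokens.append(f"<act_{bin_id}>")
    return tokens

def tokens_to_action(tokens: list[str]) -> list[float]:
    """Invert the mapping back to approximate continuous values."""
    return [
        LO + int(t[len("<act_"):-1]) / (NUM_BINS - 1) * (HI - LO)
        for t in tokens
    ]

tokens = action_to_tokens([0.0, 0.5, -1.0])
recovered = tokens_to_action(tokens)
```

Because the action "words" live next to ordinary text tokens, the same model can answer a question in one turn and emit a motion command in the next, which is the chat-plus-act behavior described above.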
We were kind of joking before the show about the obvious comparison between robotics and self-driving cars. But can you explain to me, like I'm a five-year-old, or like a venture capitalist: why is that a bad analogy? Why don't you love that analogy? I feel like it's not that you don't love it.
It's just that it can put you down a bunch of wrong directions. There's a lot of parallels, but, I mean, even like the Waymo-Tesla thing, right?
Like, Tesla has this incredible advantage with how much data they're collecting passively, yet Waymo is so much better so far, and it has so many fewer cars on the roads.
There are useful things about it, but there are also aspects that don't transfer in the analogy. I think the reason why we were joking about it is that it's the number one question.
Yeah, we don't talk to many 5-year-olds, but investors and VCs ask us and so we have to go down this rabbit hole where we're breaking down all of the assumptions and correcting some of them and validating some of them.
I mean, so should VCs, if they're looking at the robotics market, just throw out that analogy entirely? Or should they be saying there's a set of robotics companies that are in the Waymo category and a set of robotics companies that are in the Tesla category, and that's a reasonable ontology to map to?
Yeah. No, there's a lot of very useful stuff in the analogy. I think one thing that's interesting is that there are all these self-driving companies that have died over the past 15 years. And one thing that we actually like to remind people is that this is not coming tomorrow.
You log on to Twitter and you'll see all of these crazy robotics demos. Most of them are teleoperated, or most of them are robots doing backflips, which is a much easier problem than a robot actually folding laundry.
And the thing we really try to remind everyone that is looking at investing in us, or is thinking about investing in us, is that this is not a problem we're going to solve tomorrow. There are fundamental research breakthroughs that we need to make.
And much like self-driving, which has had, what, like a 15-year arc at this point, there is a very high likelihood that robotics is the same way. We think our greatest competition is science itself. It's not this company or that company. It's just, maybe we can't pull it off in our lifetimes.
We think we'll be able to. It's looking more and more likely, but it's not a tomorrow thing. I have one last question. I know you enjoy food and cooking. What is the final eval, the Mount Rushmore, the Mount Everest of cooking, that you expect will be the last dish that a robot will be able to cook?
What's the hardest dish for a robot to cook? The Don Angie lasagna. Oh, yeah. Yeah. Okay. Very difficult. So when they cook that, AGI achieved. It's game over. That's amazing. Amazing. There was actually a recent AGI benchmark.
Someone shared a screenshot, and it was a very old definition of AGI. It said it'll be able to describe a sheep, tell you three things that are larger than a lobster, and AGI is here by that definition, but one of the things that it can't do is bake you a cake.
And we just thought it was funny that that was the last thing that the computer can't do. But maybe soon, maybe in the future. But thank you so much for coming on the show. This was a fantastic conversation.
Best of luck to you, and thank you so much for building this really important technology. Yeah, we're excited. Put a robot in the studio when you're ready. Send it over. It's a mess, I have clothes all over here that need to be folded. So we'd love to have one. Awesome. Live demo. Bye.