Matic Robotics co-founder explains vision-only home robot approach and path to multi-task humanoid
May 9, 2025 · Full transcript · This transcript is auto-generated and may contain errors.
Featuring Mehul Nariyawala
It's killing me. Let's go. Bring him in and play that effect again. Welcome to the stream. Congratulations on the Wired article. Congratulations on all the progress. Can you kick us off with a little introduction? Robots in homes. Robots in homes. Yes.
They said it couldn't be done. They said it was a 2035 thing. Most people, they ship a render. They ship a video. This guy ships a real robot. What a concept. Welcome to the stream. How are you doing? Thank you guys for having me. I appreciate it. I'm the president and co-founder at Matic.
We build home robots. My background is in computer vision and product, and prior to this we were at Nest. So that's the quick background. Oh, nice. Yeah. Okay.
So how many people thought you were crazy to go make, you know, another take on the next Roomba, or whatever the pitch was? I think they still do. It's still, why are you doing floor-cleaning robots?
That's the question we get quite a bit, and the answer really is that this is the only robot with scale. Yeah. The irony of what we're doing is that at a thousand robots shipped, we are already the second-largest American consumer robotics company. Wow.
So we talk about this point of view that there is this perception that navigation, manipulation, mapping, and localization in indoor environments is a solved problem.
But I tend to think of it as a Fermi paradox: if it is a solved problem, where are all the robots in our lives? Why aren't they at the airports? Why aren't they at the grocery stores? Why aren't we swarmed with them? And the answer is that, number one, it's actually quite a hard problem. Number two, economic viability: making it profitable and surviving as a business is itself a challenge, and, on the flip side, so is making it valuable for customers.
Yeah.
Talk about the evolution of the relevant breakthroughs in artificial intelligence, and which ones you are a true beneficiary of. ImageNet, obviously, very groundbreaking. The transformer architecture, I haven't heard about that having an impact, at least in the Amazon Roomba context. Are you using transformers? What about LLMs? What about different model uses? You use a vision-first, vision-only approach, which John has been nerding out about for months.
So I want to know, does this lead into the why now? How do you tell the story of the underlying technologies that have led to an improved experience here? Great question. We actually left Nest and Google in 2017 to start working on this because of two trends.
One is the self-supervised learning techniques that were emerging, which is essentially what LLMs are: they're learning on their own. And the second one was that my co-founder and CEO, Navneet, helped spec out the Google Coral TPU from the Nest perspective.
So that trend of AI chips coming out and compute skyrocketing was a trend we saw coming, and with those two things we thought it was possible to build edge-device robotics. And the reason we thought edge devices were critical is that we as humans don't have hive minds; latency is really critical, especially in the dynamic environments that we live in.
So we always thought that robots need the ability to make decisions really fast. And specifically for indoor robotics, we just felt that the indoor world was built by humans, for humans, for our visual perception system. So vision-only robotics was the only way to go; it needed the same perception system as ours.
If you're trying to build, let's say, level-five robots: level five for cars means that cars drive like humans. So inside a home, it means that they behave like humans, clean like humans, manipulate like humans.
So it should have the same perception system. Those were the trends that helped quite a bit. And now, if you look at our robot, what it does is build a Google Street View-like map on its own.
So the way we thought about it is, we as humans go into a new home, a new environment, a new indoor space, and we self-explore, self-map, and then remember exactly where we are; that is, we localize. Can a robot do that? Well, the answer is yes, it can. Our robot does that, but it still has the same abilities as cats and dogs.
Yeah, we can't tell our cats and dogs to go sit by the couch or go in a living room. They don't understand that yet.
And that's where VLMs and some of these open-source models that are being released, DINO and CLIP, are really useful, because now we can extract semantic embeddings and information at the image level and transfer that into our map at the voxel level.
So our maps are built using voxels, which are like 3D pixels, 1 cm by 1 cm by 1 cm.
So each voxel actually knows that it belongs to a chair, or a human leg, or a child, or a piece of furniture, and that's when you can start asking for and doing all kinds of things, like, hey, go clean by the bookcase in the living room, and it knows what you're talking about.
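The voxel-level semantics he describes can be sketched roughly as follows. This is a toy illustration, not Matic's actual code: the class and method names are invented, and plain vectors stand in for real DINO/CLIP embeddings.

```python
import numpy as np

VOXEL_SIZE = 0.01  # 1 cm voxels, as described in the interview

class SemanticVoxelMap:
    """Accumulates image-level embeddings into a 3D voxel grid."""

    def __init__(self, dim=512):
        self.dim = dim
        self.sums = {}    # voxel index -> running sum of observed embeddings
        self.counts = {}  # voxel index -> number of observations

    def _index(self, point):
        # Quantize a metric (x, y, z) point to its 1 cm voxel index.
        return tuple(np.floor(np.asarray(point) / VOXEL_SIZE).astype(int))

    def add_observation(self, point, embedding):
        # Each camera observation deposits its semantic embedding into
        # the voxel the observed 3D point falls inside.
        idx = self._index(point)
        self.sums[idx] = self.sums.get(idx, np.zeros(self.dim)) + embedding
        self.counts[idx] = self.counts.get(idx, 0) + 1

    def query(self, text_embedding):
        # Return the voxel whose mean embedding best matches a query
        # embedding (cosine similarity), e.g. the text embedding of
        # "bookcase" for "go clean by the bookcase".
        best_idx, best_score = None, -np.inf
        for idx, total in self.sums.items():
            mean = total / self.counts[idx]
            denom = np.linalg.norm(mean) * np.linalg.norm(text_embedding) + 1e-9
            score = float(mean @ text_embedding / denom)
            if score > best_score:
                best_idx, best_score = idx, score
        return best_idx, best_score
```

In a real system the query embedding would come from the text tower of a CLIP-style model, so the same map answers open-vocabulary commands without retraining.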
So that's really the next layer, which is turning it into much more of a natural-language interaction between the robot and a human. Talk about the actual training runs that go into your models. Is there a concept of iteration on the training runs, like GPT-1, GPT-2, 3, 4, 4.5?
He said they spent less than a million dollars on NVIDIA compute. Okay. Yeah. So, break that down.
How do you think about it? Obviously you're acquiring data constantly, but then is there this pace of, let's do another, bigger run? And is there capital at risk when you actually roll out a big training run? Great question. So I'll take a little bit of a higher view and come down, but there is this concept in self-driving cars, as well as robotics, that there would be one god model, and that god model would do everything. When you look at the practicality of deployment, there are almost always multiple models at a smaller level that you use.
So we've always taken the approach that we're not trying to do research; we're trying to build a product. So whatever is available to us, let's go use it. Our approach has been a combination of, obviously, neural nets and some of the work that's happening there, but also what is referred to as spatial AI.
So that's a term that Dr. Fei-Fei Li really popularized, and we use the human information bottleneck principle. Mhm.
So the way we do this: we have an image-to-voxel neural network, and then we combine that with long-term SLAM, using both classical and learned techniques, and build this world. And the physics of the world is permanent. Now, as a human being, I know that there's a wall here.
I know what gravity will do. So, based on that, for us it doesn't take 26,000 iterations to learn how to tie shoelaces. In the same way, once you know the physics, you can predict things from that physics: I know certain objects will topple over if I were to do that. So that's how we think about it.
So we use an information-bottleneck-like principle. For us, compute and data have been critical, but it's less of the traditional approach of a gigantic dataset with the robot learning everything on top of it.
So it's less compute-intensive, but there are obviously iterations. And is there something like a mixture-of-experts model that you could kind of pull from and design, to kind of scale up the model? Is that relevant at all? Absolutely.
So we have our own master-student models and such. Exactly.
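A minimal sketch of the master-student idea, assuming the standard distillation setup where a small on-device student model is trained to match the softened outputs of a larger master (teacher) model; the function names here are hypothetical, not Matic's:

```python
import numpy as np

def softmax(z, temperature=1.0):
    # Temperature-scaled softmax; higher temperature softens the distribution.
    z = np.asarray(z, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between the softened teacher and student distributions,
    the usual objective for training a small edge model to mimic a large one."""
    p = softmax(teacher_logits, temperature)  # soft targets from the master
    q = softmax(student_logits, temperature)
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))
```

Minimizing this loss over a dataset pushes the student's outputs toward the master's, which is one standard way a big offline model's knowledge ends up in a smaller model that fits on an edge device.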
And the way we use it is: for the 3D part, we use traditional spatial-AI techniques; then for adding the semantics, understanding, and context, our master-student model works very well. And even for the neural nets, sometimes there are master-student models we can use for precision as well, just to see the way humans see. So for the occupancy network that we have, a lot of it is inspired by the approach Tesla has taken over the years to build their full self-driving. Talk about simulation. Are you using a lot of simulated data?
I imagine that you could procedurally generate a million or a trillion households with different furniture legs and stuff pretty easily, drive virtual robots around them, and use that to generate data. We've heard about a lot of that in the humanoid context. Are you seeing luck with synthetic data for your product?
Absolutely. I think we started with self-supervised learning. Then we realized along the way that simulation, supervised simulation, actually works very well too. So we built our own simulation environment using Unreal Engine, with our own robot.
So we've customized it over the years and have a large set of environments, but what we've seen is that it takes us to about 0 to 80%; the final 20% always comes from real-world data.
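The 80/20 recipe he describes (pretrain on plentiful but slightly wrong simulated data, then fine-tune on scarce real-world data) can be illustrated with a toy linear model; everything in this sketch is invented for illustration, including the deliberate sim-to-real bias:

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_fit(w, X, y, lr, steps):
    # Plain full-batch gradient descent on squared error for y ~ X @ w.
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(X)
        w = w - lr * grad
    return w

# "Simulation" data: cheap and plentiful, but with systematically wrong
# physics, modeled here as a biased target.
w_true = np.array([1.0, -2.0])
X_sim = rng.normal(size=(1000, 2))
y_sim = X_sim @ (w_true + 0.3)  # the sim-to-real gap

# "Real-world" data: scarce but unbiased.
X_real = rng.normal(size=(50, 2))
y_real = X_real @ w_true

w = np.zeros(2)
w = sgd_fit(w, X_sim, y_sim, lr=0.1, steps=200)     # pretrain in sim (~80%)
w = sgd_fit(w, X_real, y_real, lr=0.05, steps=200)  # fine-tune on real data
```

Pretraining alone converges to the biased simulator optimum; the small real-world set is what pulls the model onto the true target, mirroring the "final 20%" he mentions.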
So we initially train in simulation to create, quote-unquote, the master model, but then the precision and fine-tuning almost always come from adding real-world data. Mhm. Switching gears a little bit to product strategy. I imagine you have ambitions well beyond this initial form factor. Walk us through. Yeah.
Walk us through: was this always the form factor that you were going to start with, and at what point do you look at kind of expanding from here? Great question. So the goal was always to go build Rosie the Robot.
All of us want that, something sort of like Alfred, that comes into a home and just takes care of everything. But we thought that the best way to do it is the way a human child grows, which is that in the first five years, a human child just learns how to navigate, from a perception perspective.
They're just trying to make sense of the 3D world, and they pick up an object and learn that it drops. So, in the same way, a floor-cleaning robot allows us to do that. But then we evolve and say, just like a 5-to-10-year-old child, can it start picking up shoes and moving them around? Can it just organize unbreakable items?
So for 5-to-10-year-olds, we don't give them knives and scissors and all the risky stuff.
So, in the same way, can it start with these tasks, and can we productize it along the way, start shipping, and then ultimately put it all together as a full-blown robot? We thought this approach was better because, as we did consumer research (and we always start with customers and work backwards), we realized that there is a lot of apprehension about robots and whether they can do things accurately.
Mhm. So even though, you know, robot vacuums have been around for 23 years now, they've only penetrated 13% of US households. 87% don't even have one yet. And the reason is that they're just not that good. They're not accurate. They're actually kind of dumb.
So for 2002 they were an amazing device, but they hadn't moved forward, and we thought that a purpose-built device that solves the problem to the nth level is the way to earn customers' trust, and then take it to the second and third tasks. And the way we built it: if you see our current robot, we have a black crown, a black border on the top. That's where the eyes and the brain sit, and it's very much like a human being. We always thought that we just have to build that once, and then it just grows up, just like a child; as the robot scales, it will scale.
So the beautiful part of what we've built is that (and we have demos of this) we can just raise the robot, maybe even put it at a six-foot level, and everything just works out of the box.
So you can put it on top of a humanoid, and it will map the entire space for the humanoid, with six degrees of freedom, at the same precision, at the 1-centimeter level. So are you thinking about adding, like, a robotic arm so it can pick up a shoe and put it back in the closet? Absolutely.
As we go into users' homes (I've been in about 200 homes now, real homes), parents always talk about, can you just give me a toy-cleaning robot? That's my biggest pain point. So that comes up again and again.
The second thing we've heard is a lot of kids talking about their parents who live on their own in Florida or Texas. They don't have time to go there, and the parents are not technologically savvy either. So they're like, "We'll take this robot, we'll actually control it, clean their home, but can you also send me a 10-second time lapse at the end of the day to confirm that they're okay?"
How do you think about privacy? I imagine you have to message, like, "Hey, we need the data, but we're going to anonymize it. You can trust us." But then there are data leaks that happen. I'm sure this is an important part of your messaging, but what are you saying to people? Huge, huge part of it. So, prior to this we worked at Nest; I was the product lead for Nest Cam, so I know privacy is a big deal.
So that's why we do the whole thing on the edge device. And then the way we do it is just opt-in.
And we always knew (this is from our prior startup, Flutter, where we did gesture detection with a webcam) that if you build trust with users, there will be a spectrum of users. On one end, there will be users who say, don't ever take even my telemetry data, I don't want to share anything. On the other end, there will be users who say, take everything you want, I don't really care. In between, there are lots and lots of users who would say, hey, these are the long-tail cases where your robot fails, and I actually want to help you get better, so we'll share the data. So we haven't even done it automatically, but there is a record button on our app, and users have already uploaded thousands of hours of data, with permission, on their own.
Do you have evals? Like, when you train a new model, is the final eval, can the Matic robot clean up after two toddlers after they've had dinner? I was gonna say a frat party. You know, throwing a frat party. Toddlers make a bigger mess. Spaghetti and meatballs.
A regular vacuum can't even handle it. No, no, no. Instant disaster. It needs a scoop and all sorts of stuff. Anyway, Jordi, last question. How do you expect the humanoid market to play out over the next few years? Right.
It's obviously an area that you guys will be competing in over time, but you clearly made some big decisions around how to get there.
There are a lot of companies that have raised so much money that, you know, once you've raised a couple billion dollars, people are going to want you to be shipping, or at least to have robots that are creating value in these settings.
But at the same time, like iRobot, you know, and we talked about this offline: iRobot is the biggest robotics company in the US, with 50 million units shipped. Amazon Robotics is second at 750K. And then Boston Dynamics has only shipped 1,500 robots in its entire lifetime.
But at the same time, we just had Sonya on from Sequoia Capital a little bit ago, and she's very humanoid-pilled. She was like, I think they're coming, you know, quickly. And so I'm curious, as somebody who's actually building and shipping robots now, kind of how you project out the next few years.
And I imagine you must be kind of entertained by it all. There's a lot of capital on the line, and you know how hard it is. But I'm curious how you think about it. That's a great question. There are two pieces of the puzzle there.
One, and I think you guys have touched on this in past interviews, is around accuracy, and how accurate robotics can get. The thing we talk about internally is that with AI today, we are collaborating: if it gets 90% right, we're pretty happy with it. With robotics, especially on a trivial task, we almost always want to delegate. We want to set it and forget it. We don't want to do that last percent, because, you know, I don't want to finish cleaning that last corner, or that one last plate. And that actually puts the bar much higher.
There's a corollary there as well, which is that we go to school for maybe four years, eight years, to learn how to code. So if AI gets coding 90% right, we're mesmerized. But we don't really go to school to learn how to navigate our home, or how to pick up a glass, or how to vacuum floors.
So the more trivial the task, the higher the accuracy expectations from customers, because if it makes mistakes just picking up a glass, they think of it as a dumb robot. Like, come on. Yeah. Thank you.
And the analogy is: imagine someone is helping you set a dining table for your dinner party, and one out of 100 times they break a glass or a set of plates. Yeah, there's a good chance you're going to fire them. So the bar for accuracy is much higher there.
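The bar he describes can be made concrete with back-of-the-envelope arithmetic: even a robot that succeeds 99% of the time at a daily chore fails several times a year, and almost never gets through a whole year flawlessly.

```python
# Back-of-the-envelope: why "the last percent" matters for set-and-forget chores.
p_success = 0.99          # per-task success rate
tasks_per_year = 365      # one chore per day

expected_failures = tasks_per_year * (1 - p_success)
prob_flawless_year = p_success ** tasks_per_year

print(round(expected_failures, 2))   # about 3.65 expected failures per year
print(round(prob_flawless_year, 3))  # roughly a 2.6% chance of a flawless year
```

So a "99% accurate" robot still breaks the dinner-party analogy's glass a few times a year, which is exactly why trivial, delegated tasks demand far higher reliability than collaborative AI tools.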
So that's one piece, and then the second thing is adoption. This is where General Magic is a good example in our mind. General Magic tried to build the iPhone in 1995. It didn't really work. Amazing team, all ex-Apple people.
Tony Fadell was there. And instead, what we got was purpose-built devices, from cell phones to PDAs to iPods to BlackBerrys, and then we combined everything into an iPhone.
So, in a similar way, we tend to believe that purpose-built robots will see the light of day first, and then they will get combined into multi-purpose devices. And the more you have a human form, the more expectations customers will have. So there is a home side of it and there is an enterprise side of it.
So we tend to think that in enterprises or factories, there is a good chance robots will be used in a few years. With homes, we think it's a little bit further away. Yeah, that makes sense. Yeah.
And if you're buying, let's say, a humanoid that comes in at even something comparable to what Unitree is selling right now, and it's $40,000, your expectations are going to be gigantic. So it's going to be an uphill battle. But we'll have to have you back on as there's... Yeah. One last thing.
What were we about to say? I was going to say, you were touching on a really great point. We actually talk about it internally that there is no ubiquitous consumer electronics device priced higher than $2,000. Yeah. TVs. Cars, right. Cars have been around for 100 years.
The utility is clear, and that's usually $10,000, $20,000, and even then it's a considered purchase. We just don't wake up and do it. So the utility and the value have to be proven, and then you have to convince customers to say, okay, it is worth spending $10,000, $20,000, and it will survive five years, ten years.
So there is a productization element to robotics that needs to be paid a little bit more attention. Well, thanks so much for stopping by. This was a great chat. We will talk to you soon. We're going to use our Matics at the new studio. Yeah, we're excited. We will let you know.
We'll talk to you soon. Bye. Up next, we have Fastino coming into the studio. Sounds like an Italian name. Fastino. Fastino.