Fei-Fei Li on World Labs' Marble: the first public 3D world generation model and the spatial intelligence frontier

Nov 12, 2025 · Full transcript · This transcript is auto-generated and may contain errors.

Speaker 1: We're doing fantastically. I have so many questions about what you're building. I'd like to start with why you're building it. Specifically, what was the moment where you decided to go and start the company? Because the big institutions haven't exactly given you the cold shoulder. You've been associated with Stanford and Google and on the board of Twitter at various times. You could be doing work inside of a hyperscaler, I'm sure. Why did you wanna build a company outside of the big labs?

Speaker 9: Well, because we had a vision. Mhmm. We had a vision we really believe in. We believe that AI as a civilizational technology will help humans and superpower humans in many aspects of the intelligence industry. Mhmm. One of them is spatial intelligence, and that is something that is, in my opinion, still in the budding stage. And our vision was ahead of most people, and this is something that my cofounders and I really feel passionate about, because without this, AI or AGI would not be complete. And we wanna pursue that vision and that mission.

Speaker 1: Talk about how you think about spatial intelligence. There are so many data primitives, from GIS data, GPS data. There's, you know, point clouds and all sorts of different ways, even just video footage like what we're generating right now. We're generating spatial data to some degree. Where did you think the initial data sources were, like, lacking? And then how do you think about building up the stack to kind of advance this and move it a step forward?

Speaker 9: Yeah. You're totally right. Spatial intelligence is actually as huge or horizontal as, say, linguistic intelligence or language intelligence. So there are many aspects of it. But fundamentally, I think it's deeply perceptive. Mhmm. You know, it involves seeing and understanding. It involves reasoning. It involves the ability to create in the mind's eye what worlds look like, and also the fundamental ability to interact with the worlds, whether physical or virtual, in a very profound way. And if you put all these together, we haven't had technology that can really do all of the above, and we're still on a journey to unlock that. But one of the things we have just done is putting out the world's first 3D world generation model that is available to everybody. And not only is it generative, it allows users to interact and create and edit. And your question about data is actually an important one because it's hard. Unlike language, unlike videos, unlike images, you don't have too much 3D data out there on the Internet. It's much more specialized. So as a group of technologists, we actually have to just work harder and, frankly, we use a hybrid approach in data. Some is large scale from the Internet. Some is large scale from simulation. Some is real world captures, and a lot of data also comes from algorithmically processed data. That's really important because it's just a much richer problem. 3D is a much richer problem.

Speaker 1: So, I mean, the product that we just saw, it looks a little bit like a video game. I can imagine a bunch of different ways that this gets built into something that feels like a video game. Or maybe there's this new trend of folks who are using AI tools almost as, they just enjoy generating songs on Suno, or they enjoy just creating for the sake of creating. It's like you sell a guitar and then someone plays the guitar, and they never commercialize their music. They just enjoy playing the guitar. And so I'm wondering if you have a view on, is it too early to tell what your customer base or what the user base will look like? Are you just hoping to get this out and then see who shows up? Maybe there's a robotics company that needs a bunch of data, and so they use your model to generate synthetic data. Or there's people that just have fun with it, and they wind up paying because they enjoy the service.

Speaker 2: Yeah. Wonder if people underrate world models because they look and feel today often like video games, and so they don't understand kind of the significance. And so, yeah, would love to understand. Yeah.

Speaker 9: You're so right. I think people underrate world models because we haven't had world models. So it's not people's problem. It's that the technology is really hard. It's the next frontier of AI. And like I said, this is the first time a generative 3D world model is available to everybody. And humanity has only known one world, which is the 3D physical world. But now suddenly, we are in a multiverse situation because of this technology. So who will use it? I agree with you, actually. You know, just the sheer joy of the immersive experience for the sake of enjoyment itself is really fun. I was just talking to a user earlier today about the VR button. You know? On our product, click on the VR button. If you have a headset, you can just purely enjoy the immersive experiences and, of course, add some favorite music of yours. That will be even better. But dialing all the way to professional use, there are a lot of professional use cases. If you go on our website, we have a page called Marble Labs Yeah. Where we show creators from VFX, from the game development world, from the design world, architecture, from robotic simulation, even clinical research, which I personally never thought about, are already finding use cases. So it's very horizontal. But I think creativity and simulation are going to be two very large areas that people will find this Mhmm. Really useful and superpowering.

Speaker 1: Okay. Help me understand tool use in the world model context. So there was a moment where the large language models, the transformer based GPT-3, GPT-4, those models were getting really good at math, and they were basically just memorizing math. And you could just ask them, what's 100 plus 50? And it would just tell you 150, and it was sort of learning math. And then at a certain point, it broke down, and it couldn't do really complex math just in the actual weights. And so the solution was to teach it to write Python and give it a Python REPL and let it write some code. And then you get the perfectly accurate math when you need that from not just the LLM weights, but the actual model, the experience. And so what I'm interested in is that I've seen evidence from Genie and your model and world models generally. There's this idea of persistence and the base model getting good enough that if I go up and I paint the wall and I come back later, the paint's there. The model remembers this. Maybe gravity. There's all these things that are simulated. But at a certain level, it feels like if I need an inventory in my game, maybe we should just use a database. And so I'm wondering where you see tool use coming in to layer on top, versus where you think you're on, like, a scaling law where if you just keep working on the base underlying model, you will get all of that functionality for free.

Speaker 9: Yeah. Great question. So I wanna answer your question in two different ways, because you ask a very important question technically, which is the scaling law of Yeah. These models. But then you also ask the really important question of the tool use. Right? On the scaling law side, 3D generative world models, or just world models in general, are actually an earlier-stage technology than language models.

Speaker 1: So Okay.

Speaker 9: We're seeing budding signs of scaling law, but this is an interplay between model architecture as well as data. So what I can say right now is we're absolutely seeing budding signs of it. Mhmm. But the curve hasn't gotten to the point where we see LLMs, you know, yet. Yeah. Which makes this technology really, really exciting, and being at the forefront of it, we really do have the best 3D generative world model, is exciting. Yeah. On the tool use side, this is where I find it fascinating. I think when it comes to the usage or the use cases of world models, there's a lot of professional use cases, whether we're talking about VFX, you know, for the movie industry, or game developers. And when it comes to professional use cases, I personally really believe that superpowering creators and developers is very important. That means it needs to give them the sense of control and agency. You know? If you try our model and product, there is an option of just prompting and getting a result out. And sometimes it's a slot machine, but it's getting to be a pretty good slot machine.

Speaker 1: I love a slot machine.

Speaker 9: Yeah. And that's great. Right?

Speaker 1: It's amazing.

Speaker 9: But as a creator, in your mind's eye, there's actually so much nuance. There's so much story in your mind, so much you wanna express, whether it's for a game or for a movie, for a story, even, like, trying to train the robot. So we actually put a lot of thought into using native AI capabilities to allow the users or the customers to engage in the editing and controllable creative loop. So if you're prompting something and you don't like the wall color here Yeah. We actually allow you to say, well, change the wall color from green to purple, or say, add this object Yeah. Or say, I wanna expand here, or allow you to stitch together the worlds that you see, or allow you to actually give a 3D layout

Speaker 1: Yeah.

Speaker 9: So that you can put the couch where it is. So this is very important, and this is the tooling side of it. Right? It gives the controllability and agency to the human collaborator.

Speaker 1: Yeah. You you can go.

Speaker 9: By the way, my team told me I have to put this on now.

Speaker 2: There we go. Oh.

Speaker 9: Yeah. Thank you. You're such a

Speaker 1: sweet truck.

Speaker 2: Love it.

Speaker 1: You're doing fantastic. Love it. That.

Speaker 2: Yeah. What kind of progress can we expect from World Labs in the next two, three years? Like, what are your kind of ambitions in terms of just general progress

Speaker 9: Oh my god. Model. Two, three years. I think two, three years is a long time. Things are moving so fast. Spatial intelligence is so horizontal, and it's a platform. I do envision that Marble, our product, is a platform that can empower creativity and simulation. And I think we're gonna see a sea change in multiple use cases, on the gaming side, on the VFX side

Speaker 4: Mhmm.

Speaker 9: On the metaverse side, on the robotic simulation side, on the design side. All these are really, I wouldn't say it's, it's almost ripe for changes, and we're seeing that level of excitement from many, many users now. What

Speaker 2: what game developers or platforms do you think will be most quick to adopt, world models? Because it feels like Feels like like Roblox I I'm sure you've chatted with them. Like, Roblox should be jumping on this because it could be something that any of their users could get a lot of value from immediately, speed up that creative process. But I can think of any and some of the visuals we've shown on the screen, I'm just like, wish that I had World Labs when I was playing first person shooter games as a kid because you could just generate infinite maps for the games that you're playing.

Speaker 1: You're watching this, and you're like, yeah, the environments look beautiful. I just wish I had an assault rifle in this game, and there were bad guys.

Speaker 9: Yes. And if you look at our page on Marble Labs, there are, you know, shooting Yeah.

Speaker 6: Shoot games.

Speaker 9: Examples already. So Yeah. I don't know. I think that the sky is the limit Mhmm. Literally. Of course, I think what's exciting in this Gen AI era is that a lot of just creators and technologists, many of them individual developers or indie, you know, studios and small teams, are really jumping in really fast. And generative AI models like ours really lower the bar of entry for many of these use cases. So we

Speaker 2: we're

Speaker 9: we're also, to be honest, talking to bigger teams. I'm actually very pleasantly surprised by the level of enthusiasm from bigger teams as well.

Speaker 1: The team's the team's going crazy. They lot of fun. They're actually playing

Speaker 2: Is it possible that we're in a World Labs model Simulation. Simulation at this very moment? Have you

Speaker 9: Sure. Yeah.

Speaker 1: I'm glad After this time travel.

Speaker 9: Generated this way.

Speaker 1: Getting back to some

Speaker 2: The model the model is getting really good.

Speaker 1: Yeah. I would love to know your thoughts on market structure and how things might play out, because I've been completely convinced of your vision for where we are on this technology, how early we are, and what's going to happen over the next few years is, like, I don't wanna call them easy wins, but, like, the logical playout: scale, and you get somewhere really exciting. But I'm interested in the market dynamics. So in chat, in LLMs, we saw somewhat of a monopoly, you know, emerge, a clear winner in consumer chat knowledge retrieval with OpenAI's ChatGPT, of course. Now other firms are obviously working on that and competing. Then in the enterprise, you have more of an oligopoly maybe emerging between Anthropic, Gemini, and OpenAI's API. There's obviously a long tail of other providers. Then you have the wrappers. And I'm wondering if there's gonna be something similar that happens in world models. Is it worth thinking very hard about being the winner in consumer specifically, and not partnering with Roblox, but being the next Roblox, being the platform that wins consumer even if it means pulling back from an API business in the short term? Or is it just too soon to tell, and it might be kind of totally different? I'm just wondering how you're thinking about the long term market structure for what feels like a distinct technology from LLMs.

Speaker 9: That's a great question, actually. By and large, I think it's a little bit early to tell. So right now, like I said, we just rolled out the first, you know, generally available, publicly available model. Yeah. And we're going to focus on being a model company for a while because the science is still early. Yeah. We wanna move fast, and there is a lot to be unlocked. Yeah. I also don't think a world model, in the way that we are defining it, especially in the deep spatial intelligence way, is as consumer as a chatbot. Mhmm. Because I think, you know, like, 3D is a representation. It's a medium that has its own characteristics.

Speaker 1: Mhmm.

Speaker 9: So I think that we're gonna see products that are going to be built around these models in different ways. Some are for creators, some are APIs, some are for possibly even different use cases. And I'm not saying there's no consumer. I actually think the market might surprise us, especially on the metaverse side.

Speaker 1: I think it's gonna surprise you. I mean, it's gonna be a huge consumer category. Yeah. There's a lot of work that needs to be done to actually make it a place where people wanna hang out for hours and hours and hours. But Yes. But I do think it's gonna happen, and I think that it's a bit of a race. But it's extremely exciting to even just be at the model layer and then, you know, experiment at the application layer because

Speaker 2: We've seen

Speaker 1: model companies

Speaker 2: get into the application layer.

Speaker 1: Yes. Yes. Course. Exactly. Of course. Right? Course.

Speaker 9: Yeah. No. We're absolutely, I mean, Marble itself is an application Of

Speaker 1: course. Today Of course.

Speaker 9: SaaS app.

Speaker 1: Yeah. Yeah. Yeah.

Speaker 9: But we are gonna focus on model. World Labs is a deep tech company, and I think focusing on model is the right thing to do right now.

Speaker 1: Yeah. How are you thinking about fungibility between formats? It feels like there's maybe a unique value in having a world model that can export Gaussian splats, but then also export geometry, also export just an MP4 for different stages of, like, a VFX pipeline, for example, something that Hollywood might wanna do. Do you think that you need to have, like, translational models, or is there, like, one model that becomes, like, multimodal that can handle all the different formats, whether I want an OBJ file or I want a 3D, you know, point cloud or an Alembic? Can I just get whatever I want from the same model, or do I need specific models to translate between them?

Speaker 9: I don't know if you have played with Marble yet. We are actually multimodal. All Yes. We are right now exporting, you know, Gaussian splats, mesh, mesh colliders, MP4s Yes. And images and panels.

Speaker 1: But is that all from one model that's trained, or is there, like, a translation layer between the outputs? Is there, like, one foundational truth? Like, ChatGPT is not specifically trained on, there's not a Spanish version and an English version and a French version. It just learned every language when they did the big pre-train. Is it the same thing, or is there, like, a translation layer that you need to do in a series of different models?

Speaker 9: Oh, what I can say now is we are a foundation model. Model is a

Speaker 1: foundation model. Okay. Yeah. Got it. Cool.

Speaker 2: More more broadly, where do you think, AI is underhyped, and where do you think it's overhyped? Oh.

Speaker 9: Oh, that's a spicy question. World model is underhyped.

Speaker 6: There we

Speaker 2: go. Yeah. Talk your book.

Speaker 1: Just talk your own book. I love it. I agree, though. I completely agree. I think this is a deeply underhyped category.

Speaker 10: Yeah.

Speaker 9: So to be honest, you know, one thing, I consider myself a scientist in my heart, and I actually really don't like hyping. Yeah. I think, you know, you probably have seen me not on the hype train

Speaker 6: Mhmm.

Speaker 9: In most of this discourse. Right? So as a scientist, I think world model is underappreciated because it's so new. And in terms of overheated, I do think that Silicon Valley as a whole tends to mistake clear vision for short distance. Mhmm. For example, 2006, the Stanford self driving car drove the first 140 miles in the Nevada Desert for mankind's self driving cars. Yeah. It took twenty years for Waymo to be barely on the road as an L4, and it's still so limited. Yeah. And this is a massive amount of effort, and it's also riding on the coattails of a very mature industry, especially in hardware and then distribution and use cases and all that. So this is an example to show you. There are clear visions. For example, you know, robots that can do all kinds of house chores. And as a mom, I'll tell you that that would be amazing. You know? Start with cleaning my bathroom.

Speaker 1: Mhmm. But

Speaker 9: but that clear vision is important, but the journey is gonna be long. You know, we have issues with hardware. We still need to figure out data. The world model brain of the robots also needs to improve, and all that. So that's just an example. And also creativity. Right? The creativity industry, which we are also engaging in. One thing, I don't call it a hype, I call it a misleading sentiment, which actually really bothers me, is that we don't wanna replace human creators. You know? Human creativity is precious. It tells the story of our species, of our culture, of our society, of our community, of each one of us. What we wanna do is to superpower and augment creators' capabilities. And sometimes the communication of technology is a little skewed towards, oh, the model can do all the work in creativity. And I personally really wanna highlight human creativity as something that's so fundamental to who we are, and AI is here to help and augment.

Speaker 1: Can you help me understand the relationship between the current AI hardware build out and your technology, your model, your business? I would imagine that you're a beneficiary because there's all these data centers. There might be some slack capacity in the future. But at the same time, I've been hearing that a lot of the LLM foundation model companies are partnering with NVIDIA to try and steer the development of the next generation of chips to be hyper optimized for LLM inference. And that might be okay for you, but it might actually take us, you know, away from something that's optimized for what you do. And so I'm wondering if you could just take me on a tour of what's exciting on the AI hardware build out from your perspective. Where are the risks?

Speaker 9: Yeah. So, actually, I was just in London receiving this award with both the algorithm folks as well as the hardware folks for AI. Congratulations. Yeah. So I think as as you Is that a AI generated?

Speaker 1: It's the queen, no. No. It's not AI generated. I believe it is a recording of an audience clapping, but Jordi triggered it from his soundboard. That is the Queen Elizabeth Prize for Engineering from King Charles himself. Congratulations. We had it here to mention to you, but I'm glad you brought it up.

Speaker 9: Yeah. Because you're talking about hardware. Yes. Literally in the room with my friends Bill and Jensen. So as you know, the most exciting thing for the birth of modern AI is the convergence of hardware, software, and data. So from the hardware side, I think it continues to evolve. For world modeling, especially since our technology relies on rendering Mhmm. We do think we will keep our side of the conversation going with the chip makers. Mhmm. Because some of our requirements, especially on the rendering side as well as on the model training side, will be somewhat different from LLMs. And, you know, I will, and I would love to, call for the chip industry to also work with us on this front. What is exciting? I think that's a great question, actually. I was in the Middle East visiting some of the construction of data centers, and that was pretty epic, seeing how large these Stargate data centers are being constructed. And it really feels that we're in a different phase of industrial revolution. You know? None of us lived in the days of the steam engines, the electricity, when the scale of industrialization in that time was just multiplying for humanity. I think we are now living in the AI industrialization era, and seeing that scaling up is quite exciting.

Speaker 1: Hypothetically, last question for me. Hypothetically, if I gave you unlimited access to a one gigawatt data center, the best and brightest, the hot off the presses, the freshly installed one gigawatt data center, you know, probably billions of dollars invested in this, would you do a training run? Would you run your model? Would you just scale up? Or would you say, hey, I would need more time to actually advance the way we think about training before we press Yeah.

Speaker 2: That that kind of goes

Speaker 1: the big run.

Speaker 2: Yeah. That that was gonna be my question is, like, at what point does World Labs become capital constrained? And Yeah. Are you capital constrained today?

Speaker 1: Yeah. Or is it just matter less and new ideas? There are different parts of the business that are bitter-lesson-pilled and others that aren't. And I'm wondering, like, what pieces are currently on the great scaling curves?

Speaker 9: Yeah. Like I said, the scaling curve, from a model architecture point of view, we're still early. So

Speaker 1: if I

Speaker 9: have, if I have one gigawatt and that many chips, I'm more likely gonna actually run some parallel models and experiments to really hone in the bet. So it doesn't mean we're not training large models. It's just not at the level of today's LLMs yet. So are we capital constrained? We are capital constrained. I'm just getting

Speaker 2: on start

Speaker 1: hiding more. Okay.

Speaker 2: I wanna see. We'll we'll we'll we'll we'll we'll the gong.

Speaker 1: We're gonna hit the gong.

Speaker 2: We hate

Speaker 1: capital constraints. How much have you actually raised? How much can you share?

Speaker 9: Publicly, it's known that we've raised more than 200, $40,000,000.

Speaker 2: Publicly. Let me

Speaker 9: explain why. Why did you hit the gong just now?

Speaker 2: Because we're increasing the hype.

Speaker 1: We hit the gong. We heard

Speaker 2: you're capital constrained. We

Speaker 1: hate that. Yes. Yes. It's a good sign.

Speaker 9: It's a mutual hype. I'm wearing your hat.

Speaker 1: Yes. We appreciate that. And we appreciate you taking the time to come talk to us. Thank you so much. This is extremely exciting, extremely enlightening.

Speaker 2: And And, yeah, we're very excited for you

Speaker 1: and your culture. Launch. And we'll talk to you soon.

Speaker 9: Yeah. Thank you, guys.

Speaker 2: Thank you. Cheers.

Speaker 1: Bye. See you soon. Before we bring in our next guest, let me tell you about Profound. Get your brand mentioned in ChatGPT. Reach millions of consumers who are using AI to discover new products and brands. Let me also tell you about Linear. Linear is a purpose built tool for planning and building products. Meet the system for modern software development.

Speaker 2: Ask Kari to add this sound effect when

Speaker 1: you Projects linear. Road maps. Imagine. I'm loving the new stingers. Look at that. That looks fantastic. Our next guest is Scott Sanders from Forterra. He's got some big news. How are you doing, Scott?

Speaker 2: What's going on?

Speaker 1: Good to see you.

Speaker 6: To see you too.

Speaker 1: First time on the show, introduce yourself. Let us know what you do, how you describe Forterra for those who don't know.
