
Li Feifei's "spatial intelligence" ambition: when AI begins to understand the three-dimensional world

Jan 23rd, 2026

Li Feifei, the computer vision pioneer whose ImageNet helped ignite the deep learning revolution, recently made a major move with little fanfare.

World Labs, the company she founded, launched a generative world model called Marble after raising $230 million in funding. Within a few minutes, it can turn a text description into a 3D environment that you can roam freely.

There is a key distinction to understand here: Marble does not produce a video or a sequence of frame-by-frame rendered images, but a genuinely stateful three-dimensional space. Technically, it builds on neural radiance fields (NeRF) and Gaussian splatting, and each scene consists of hundreds of thousands of "splats", each carrying position, color and transparency information. What does this mean? The world is persistent, editable and extensible. You can move objects in the scene with text instructions, stitch multiple worlds together, and export them to mainstream engines such as Unity and Unreal.
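To make the "persistent, editable, extensible" idea concrete, here is a minimal Python sketch of a scene stored as a list of points with position, color and opacity (the "splash points" above, usually called splats). It is an illustration only, not Marble's actual format or API: the `Splat` class, the `label` tag and the `translate` and `merge` helpers are all hypothetical, and real Gaussian splats also carry shape parameters that are left out here.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Splat:
    """One splat, reduced to the attributes named in the article."""
    position: Vec3      # centre of the splat in world coordinates
    color: Vec3         # RGB, each channel in [0, 1]
    opacity: float      # 0 = fully transparent, 1 = fully opaque
    label: str = ""     # hypothetical object tag used to select splats for editing


def shift(s: Splat, offset: Vec3) -> Splat:
    """Return a copy of the splat moved by `offset`."""
    x, y, z = s.position
    dx, dy, dz = offset
    return replace(s, position=(x + dx, y + dy, z + dz))


def translate(scene: List[Splat], label: str, offset: Vec3) -> List[Splat]:
    """Editable: return a new scene with every splat tagged `label` moved."""
    return [shift(s, offset) if s.label == label else s for s in scene]


def merge(a: List[Splat], b: List[Splat], b_offset: Vec3) -> List[Splat]:
    """Extensible: stitch two scenes by shifting scene `b` and concatenating."""
    return a + [shift(s, b_offset) for s in b]


room = [
    Splat((0.0, 0.0, 0.0), (0.8, 0.8, 0.8), 1.0, "floor"),
    Splat((1.0, 0.5, 0.0), (0.6, 0.3, 0.1), 0.9, "chair"),
]
hall = [Splat((0.0, 0.0, 0.0), (0.2, 0.2, 0.9), 0.7, "door")]

edited = translate(room, "chair", (2.0, 0.0, 0.0))   # move the chair 2 units along x
world = merge(edited, hall, (5.0, 0.0, 0.0))         # attach the hall 5 units away
```

Because the scene is plain data rather than rendered frames, an edit changes only the affected points, and the rest of the world stays exactly as it was.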

The community's reaction has been mixed. Critics argue that this is not a real "world model" because it does not understand physical laws or causality. Others point out that similar technologies already exist, and that NVIDIA's Lyra project has even released its source code. Many more put it bluntly: not open source, not interested.

However, the author of the original post raises an easily overlooked point: for robots, a spatial representation that can be edited deterministically may matter far more than pretty pictures. Imagine a robot in a warehouse. When the conveyor belt is moved, a single sentence is enough to update the robot's knowledge of the environment, with no need to rescan the whole space or rely on expensive lidar.
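The warehouse example can be sketched in a few lines of Python. This is a toy under stated assumptions, not how Marble or any robot stack works: objects are reduced to hypothetical axis-aligned floor footprints (`Box`), the text instruction is assumed to have already been parsed into "move the conveyor 3 m along y", and `blocked` is an illustrative helper.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Point = Tuple[float, float]


@dataclass
class Box:
    """Axis-aligned footprint of an object on the warehouse floor, in metres."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, p: Point) -> bool:
        return self.x_min <= p[0] <= self.x_max and self.y_min <= p[1] <= self.y_max

    def moved(self, dx: float, dy: float) -> "Box":
        return Box(self.x_min + dx, self.y_min + dy, self.x_max + dx, self.y_max + dy)


def blocked(world: Dict[str, Box], p: Point) -> bool:
    """True if any object footprint covers the point."""
    return any(box.contains(p) for box in world.values())


world = {"conveyor": Box(2.0, 0.0, 4.0, 1.0), "shelf": Box(6.0, 0.0, 7.0, 3.0)}
waypoint = (3.0, 0.5)

before = blocked(world, waypoint)   # the conveyor currently covers the waypoint

# The edit: the conveyor is moved 3 m along y. Only one entry changes;
# nothing is rescanned and the shelf is untouched.
world["conveyor"] = world["conveyor"].moved(0.0, 3.0)

after = blocked(world, waypoint)    # the waypoint is now free
```

The same edit always produces the same map, which is the kind of determinism a planner can rely on.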

This points to a deeper issue: our definition of "world model" may itself need to be layered. One layer is a causal model that understands physical laws; the other is a geometric understanding of spatial structure. The latter is less glamorous, but for robots that have to work in real environments, a lightweight, editable spatial representation that can quickly be synchronized with changes in reality may be more practical than a perfect simulation of physics.

Li Feifei has always kept a low profile, and this launch came without hype. The product is admittedly rough, with obvious jagged edges and a limited environment scale. But stretch the timeline to five years: once this technology is mature enough to generate in real time, expand seamlessly and reach centimeter-level accuracy, it may become one of the standard ways for robots to perceive the world.

Of course, the fact that it is not open source makes it hard for the open source community to buy in. After all, in this field, being closed means the work cannot be verified, improved or trusted. I hope Li Feifei's team can carry on the spirit in which ImageNet was opened up and give the core technology back to the community once the product matures.
