From scenes to simulations: The rise of world models


The discussion at Web Summit Lisbon 2025, featuring Cristóbal Valenzuela, Founder & CEO of Runway, and Katie Drummond, Global Editorial Director of WIRED, delved into the transformative concept of world models in generative AI. Mr. Valenzuela introduced world models as a “new kind of camera,” drawing parallels with the historical evolution of photography. This “camera” is moving beyond capturing reality to simulating it, initially for creative industries, but now poised for broader applications.

(This article was generated with AI and is based on an AI-generated transcription of a real talk on stage. While we strive for accuracy, we encourage readers to verify important information.)

Cristóbal Valenzuela, Katie Drummond

He explained that world models aim to understand physical reality, including 3D geometry and spatial dynamics, enabling AI to predict actions and consequences within simulated environments. Unlike language models, world models learn how the physical world behaves. This capability allows for a deeper, intuitive understanding of complex interactions, akin to human mental models of reality.
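The core interface described above, given a state, predict the consequence of the world's dynamics, can be sketched with a toy example. Everything here is hypothetical: a real world model learns its dynamics from data such as video, whereas this sketch hand-codes a single physical rule (gravity) purely to illustrate the state-in, predicted-state-out loop.

```python
from dataclasses import dataclass

@dataclass
class State:
    position: float  # height of a ball in meters
    velocity: float  # vertical velocity in m/s

def predict_next_state(state: State, dt: float = 0.1) -> State:
    """A hand-coded stand-in for a learned world model.

    Real world models infer dynamics like these from observation;
    here gravity is written out by hand only to show the interface:
    state in, predicted consequence out.
    """
    g = -9.81  # gravitational acceleration, m/s^2
    return State(
        position=state.position + state.velocity * dt,
        velocity=state.velocity + g * dt,
    )

# Roll the model forward to "imagine" a dropped ball's trajectory
# without touching the physical world.
state = State(position=10.0, velocity=0.0)
for _ in range(5):
    state = predict_next_state(state)
print(state.position, state.velocity)
```

The point of the sketch is the loop at the bottom: an agent can chain predictions to evaluate consequences before acting, which is the "mental model" analogy from the talk.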


Applications are extensive. In video games, world models could facilitate real-time rendering of interactive, nonlinear experiences. For robotics and self-driving cars, they offer a crucial solution to data collection challenges by generating vast simulated training data. This allows AI systems to learn efficiently from virtual environments, accelerating development and deployment across industries.
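The data-generation idea for robotics can be made concrete with a minimal sketch. The dynamics, the policy signature, and all names below are hypothetical stand-ins, not any real system's API; the point is only that a simulator can emit unlimited labeled (state, action, next state) transitions without real-world data collection.

```python
import random

def simulate_episode(policy, steps=50):
    """Roll a toy 1-D vehicle forward under hand-written dynamics.

    Stand-in for a learned world model acting as a data generator:
    each step records a (state, action, next_state) transition that
    could later train a control policy.
    """
    transitions = []
    position, velocity = 0.0, 0.0
    for _ in range(steps):
        state = (position, velocity)
        action = policy(state)        # e.g. an acceleration command
        velocity += action * 0.1      # toy dynamics update
        position += velocity * 0.1
        transitions.append((state, action, (position, velocity)))
    return transitions

# Even a random policy fills a training buffer cheaply.
random.seed(0)
dataset = [t for _ in range(100)
           for t in simulate_episode(lambda s: random.uniform(-1, 1))]
print(len(dataset))  # 5000 transitions, none collected in the real world
```

Scaling the same loop up, with learned dynamics instead of the toy update, is what makes simulation attractive for robotics and self-driving, where real-world data is slow and expensive to gather.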


Runway’s development strategy is “video-first,” leveraging video’s high information density. Mr. Valenzuela emphasized multimodal training, incorporating diverse datasets like videos, images, texts, scientific papers, and natural observations. This comprehensive input helps models understand the world without human biases, fostering a more objective and robust learning process for AI systems.


Key challenges include achieving real-time processing and securing sufficient compute capacity (GPUs). Mr. Valenzuela noted that the rapid normalization of AI advancements makes it hard for the public to grasp how much progress is still to come. Generating consistent, interactive, and realistic pixels in real time remains a hard technical problem, and robust moderation and safety protocols, along with significant cultural and social adjustments, will be needed as the technology spreads.


Regarding data for training, while multimodal inputs are diverse, the primary concern lies with protecting intellectual property in the *outputs*, such as characters owned by studios. Mr. Valenzuela asserted that AI models “understand” data rather than “photocopying” it, using this comprehension to reason about the world and create novel content, not replicate existing works. This distinction is key to addressing copyright concerns.


Looking 2-5 years ahead, Mr. Valenzuela envisioned a future where most screen-based experiences are generated. This includes personalized, real-time video content for learning, custom interactive games, or dynamically created movies. He believes this will democratize access to information and experiences, offering opportunities previously unavailable to many, fostering a net positive societal impact.


Mr. Valenzuela concluded that world models are currently in their “GPT-3 era,” with major advancements in consistency and inference speed anticipated within 12-18 months. He boldly predicted that by 2026, world models will supersede LLMs as the dominant AI topic, signaling a profound shift in the technological landscape and how we interact with digital realities.


Related

When AI becomes utility: Brand decides who survives
November 28, 2025 - 2 min read

Growth expo: Tomorrow's big players
November 26, 2025 - 3 min read