The Deep View: Nvidia solves key challenge in robotics. “At Nvidia GTC Taipei at Computex, the company unveiled Cosmos 3, a new generalist world foundation model that it calls a ‘fully open omnimodel,’ capable of reasoning and generation across text, video, images, ambient sound and action.”

Nvidia solves key challenge in robotics
As the AI industry looks beyond language models, Nvidia is betting big on the buzzy new technology powering physical AI: world models. 
At Nvidia GTC Taipei at Computex, the company unveiled Cosmos 3, a new generalist world foundation model that it calls a “fully open omnimodel,” capable of reasoning and generation across text, video, images, ambient sound and action. This iteration of the Cosmos world model family builds on previous generations by improving generalization, the lack of which is a major barrier to physical AI development and deployment.
“We wanted to build this Cosmos 3 model to help physical AI developers to build more generalizable physical AI models,” Ming-Yu Liu, Nvidia’s VP of Cosmos Labs, told The Deep View.
Cosmos 3 debuts a number of world model innovations, Liu said:
The model utilizes a new architecture called “mixture-of-transformers,” which combines the best aspects of two types of transformers: one for reasoning and one for generation. This enables it to understand object interactions, motion, and spatiotemporal relationships before generating video or action paths.
Cosmos 3 also doesn’t treat just one kind of data as a first-class citizen, said Liu. Instead, being omnimodal, it reasons with and generates “image, video, sound, and action, together with text,” he said.
Additionally, Cosmos 3 is trained on one of the largest multimodal datasets for physical AI, spanning 20 trillion tokens, 1 billion images and 400 million authentic and synthetic videos.
The model comes in several sizes: Super, the larger model for high-quality physics and accuracy, and Nano, for more efficient, quick generation needs, both of which are available now. Edge, which offers real-time inference for edge computing, will be available soon.
The models are also open-source, which Liu said offers developers more control and usability in physical AI development, a process that can be “challenging to do with API assets only.” Open availability allows enterprises to run the models locally, customize them for their needs, and better control data security.
Because the foundation models themselves are “just a starting point for physical AI developers,” the goal is to integrate these models into ecosystems to provide a foundation for solving critical problems, he said. 
Cosmos 3 is one step toward solving one of physical AI’s most pressing challenges. “We believe that the key problem to solve in physical AI is the generalization capability of the agent,” Liu said. “To be clear, [Cosmos] is not yet solving the problem, but I think this architecture provides a great foundation to solve what I think is the holy grail in robotics.”
With Cosmos, Nvidia is feeding the open model ecosystem, both for the ecosystem’s benefit and for its own. Beyond giving developers a foundation for pursuing what Liu calls robotics’ “holy grail,” feeding a market that will inevitably demand more compute is an opportunity for Nvidia to make money, and potentially to improve its own chips through extreme hardware co-design. And while the benefits would extend back to Nvidia, a rising tide lifts all boats. As the industry broadly embraces the promise of physical AI, Nvidia’s sharing of its resources and innovation will help stimulate further innovation.
