    NVIDIA Releases Cosmos-Reason1: A Suite of AI Models Advancing Physical Common Sense and Embodied Reasoning in Real-World Environments

    May 20, 2025

    AI has advanced in language processing, mathematics, and code generation, but extending these capabilities to physical environments remains challenging. Physical AI seeks to close this gap by developing systems that perceive, understand, and act in dynamic, real-world settings. Unlike conventional AI that processes text or symbols, Physical AI engages with sensory inputs, especially video, and generates responses grounded in real-world physics. These systems are designed for navigation, manipulation, and interaction, relying on common-sense reasoning and an embodied understanding of space, time, and physical laws. Applications span robotics, autonomous vehicles, and human-machine collaboration, where adaptability to real-time perception is crucial.

    A major limitation of current AI models is their weak connection to real-world physics. While they perform well on abstract tasks, they often fail to predict physical consequences or respond appropriately to sensory data. They do not intuitively understand concepts like gravity or spatial relationships, which makes them unreliable for embodied tasks. Training directly in the physical world is costly and risky, which hampers development and iteration. This lack of physical grounding and embodied understanding is a significant barrier to deploying AI effectively in real-world applications.

    Previously, tools for physical reasoning in AI were fragmented. Vision-language models linked visual and textual data but lacked depth in reasoning. Rule-based systems were rigid and failed in novel scenarios. Simulations and synthetic data often missed the nuances of real-world physics. Critically, there was no standardized framework to define or evaluate physical common sense or embodied reasoning, and inconsistent methodologies and benchmarks made progress difficult to quantify. Reinforcement learning approaches lacked task-specific reward structures, leading to models that struggled with cause-and-effect reasoning and physical feasibility.

    Researchers from NVIDIA introduced Cosmos-Reason1, a suite of multimodal large language models. The two models, Cosmos-Reason1-7B and Cosmos-Reason1-56B, were designed specifically for physical reasoning tasks. Each model is trained in two major phases: Physical AI Supervised Fine-Tuning (SFT) and Physical AI Reinforcement Learning (RL). What differentiates this approach is the introduction of a dual-ontology system. One hierarchical ontology organizes physical common sense into three main categories (Space, Time, and Fundamental Physics), divided further into 16 subcategories. The second ontology is two-dimensional and maps reasoning capabilities across five embodied agents, including humans, robot arms, humanoid robots, and autonomous vehicles. These ontologies serve both as training guides and as evaluation tools for benchmarking AI's physical reasoning.
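
    The dual ontology is essentially a structured taxonomy for tagging and scoring questions. Below is a minimal, hypothetical Python sketch of how such a taxonomy could be represented for benchmark bookkeeping; the top-level category names and agent types follow the article, while the subcategory examples and helper names are illustrative assumptions rather than the paper's exact schema.

```python
# Hypothetical sketch of the dual-ontology bookkeeping described above.
# Top-level categories (Space, Time, Fundamental Physics) and the agent
# types come from the article; subcategory names are illustrative
# placeholders, not the paper's actual 16 subcategories.
from dataclasses import dataclass
from typing import Optional

PHYSICAL_COMMON_SENSE = {
    "Space": ["spatial relationships", "relative position"],    # illustrative
    "Time": ["event ordering", "action duration"],               # illustrative
    "Fundamental Physics": ["gravity", "object permanence"],     # illustrative
}

# Four of the five embodied agents are named in the article.
EMBODIED_AGENTS = ["human", "robot arm", "humanoid robot", "autonomous vehicle"]

@dataclass
class BenchmarkQuestion:
    video_id: str
    question: str
    choices: list[str]
    answer_index: int
    category: str                  # e.g. "Space"
    subcategory: str               # e.g. "spatial relationships"
    agent: Optional[str] = None    # set only for embodied-reasoning questions

def validate(q: BenchmarkQuestion) -> None:
    """Reject questions whose tags fall outside the ontology."""
    if q.category not in PHYSICAL_COMMON_SENSE:
        raise ValueError(f"unknown category: {q.category}")
    if q.subcategory not in PHYSICAL_COMMON_SENSE[q.category]:
        raise ValueError(f"unknown subcategory: {q.subcategory}")
    if q.agent is not None and q.agent not in EMBODIED_AGENTS:
        raise ValueError(f"unknown agent: {q.agent}")
```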

    The architecture of Cosmos-Reason1 uses a decoder-only LLM augmented with a vision encoder. Videos are processed to extract visual features, which are then projected into a shared space with language tokens. This integration enables the model to reason over textual and visual data simultaneously. The researchers curated a dataset of around 4 million annotated video-text pairs for training, including action descriptions, multiple-choice questions, and long chain-of-thought reasoning traces. The reinforcement learning stage is driven by rule-based, verifiable rewards derived from human-labeled multiple-choice questions and self-supervised video tasks. These tasks include predicting the temporal direction of videos and solving puzzles with spatiotemporal patches, tying the training closely to real-world physical logic.
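
    To make the reward idea concrete, here is a minimal sketch of rule-based, verifiable rewards of the kind described above: a binary score that checks the model's parsed answer against a known label, whether that label is a human-annotated multiple-choice answer or the true playback direction of a clip. The answer-parsing heuristic and the function names are assumptions for illustration, not NVIDIA's training code.

```python
# Hedged sketch of rule-based, verifiable rewards: the reward is 1.0 when
# the model's parsed answer matches a known label, 0.0 otherwise. The
# answer-parsing heuristic and function names are illustrative assumptions.
import re
from typing import Optional

def parse_choice(model_output: str) -> Optional[str]:
    """Return the last standalone A-D letter in a chain-of-thought response."""
    letters = re.findall(r"\b([A-D])\b", model_output)
    return letters[-1] if letters else None

def multiple_choice_reward(model_output: str, correct_letter: str) -> float:
    """Verifiable reward from a human-labeled multiple-choice question."""
    return 1.0 if parse_choice(model_output) == correct_letter else 0.0

def temporal_direction_reward(model_output: str, played_in_reverse: bool) -> float:
    """Self-supervised reward: did the model judge the clip forward or reversed?"""
    text = model_output.lower()
    predicted_reverse = "backward" in text or "reverse" in text
    return 1.0 if predicted_reverse == played_in_reverse else 0.0
```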

    The team constructed three benchmarks for physical common sense (Space, Time, and Fundamental Physics) containing 604 questions from 426 videos, and six benchmarks for embodied reasoning with 610 questions from 600 videos, covering a wide range of tasks. The Cosmos-Reason1 models outperformed previous baselines, especially after the RL phase. Notably, they improved at verifying task completion, predicting the next plausible action, and assessing the physical feasibility of actions. These gains were observed in both model sizes, with Cosmos-Reason1-56B showing stronger performance across most metrics. This improvement underscores the effectiveness of using structured ontologies and multimodal data to enhance physical reasoning in AI.
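
    As a rough illustration of how such benchmarks are scored, the sketch below tallies accuracy per ontology category over a list of graded answers; the record format is a hypothetical convenience, not the paper's evaluation harness.

```python
# Illustrative per-category accuracy tally for benchmark results. The
# record format ({"category": ..., "correct": ...}) is an assumption.
from collections import defaultdict

def accuracy_by_category(results: list[dict]) -> dict[str, float]:
    totals: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    for record in results:
        totals[record["category"]] += 1
        hits[record["category"]] += int(record["correct"])
    return {cat: hits[cat] / totals[cat] for cat in totals}

# Example: grading the 604 physical common sense questions would yield one
# accuracy figure each for Space, Time, and Fundamental Physics.
print(accuracy_by_category([
    {"category": "Space", "correct": True},
    {"category": "Space", "correct": False},
    {"category": "Time", "correct": True},
]))  # {'Space': 0.5, 'Time': 1.0}
```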

    Several Key Takeaways from the Research on Cosmos-Reason1:

    • Two models introduced: Cosmos-Reason1-7B and Cosmos-Reason1-56B, trained specifically for physical reasoning tasks.
    • The models were trained in two phases: Physical AI Supervised Fine-Tuning (SFT) and Physical AI Reinforcement Learning (RL).
    • The training dataset includes approximately 4 million annotated video-text pairs curated for physical reasoning.
    • Reinforcement learning uses rule-based and verifiable rewards, derived from human annotations and video-based tasks.
    • The team relied on two ontologies: a hierarchical one with three categories and 16 subcategories, and a two-dimensional one mapping agent capabilities.
    • Benchmarks: 604 questions from 426 videos for physical common sense, and 610 from 600 videos for embodied reasoning.
    • Performance gains were observed across all benchmarks after RL training, particularly in predicting next actions and verifying task completion.
    • Real-world applicability for robots, vehicles, and other embodied agents across diverse environments.

    In conclusion, the Cosmos-Reason1 initiative demonstrates how AI can be better equipped for the physical world. It addresses key limitations in perception, reasoning, and decision-making that have hindered progress in deploying AI in embodied scenarios. The structured training pipeline, grounded in real-world data and ontological frameworks, ensures that the models are accurate and adaptable. These advancements signal a major step forward in bridging the gap between abstract AI reasoning and the needs of systems that must operate in unpredictable, real-world environments.


    Check out the Paper, Project Page, Models on Hugging Face and GitHub Page. All credit for this research goes to the researchers of this project.

    The post NVIDIA Releases Cosmos-Reason1: A Suite of AI Models Advancing Physical Common Sense and Embodied Reasoning in Real-World Environments appeared first on MarkTechPost.
