
    NVIDIA Releases Cosmos-Reason1: A Suite of AI Models Advancing Physical Common Sense and Embodied Reasoning in Real-World Environments

    May 20, 2025

    AI has advanced in language processing, mathematics, and code generation, but extending these capabilities to physical environments remains challenging. Physical AI seeks to close this gap by developing systems that perceive, understand, and act in dynamic, real-world settings. Unlike conventional AI that processes text or symbols, Physical AI engages with sensory inputs, especially video, and generates responses grounded in real-world physics. These systems are designed for navigation, manipulation, and interaction, relying on common-sense reasoning and an embodied understanding of space, time, and physical laws. Applications span robotics, autonomous vehicles, and human-machine collaboration, where adaptability to real-time perception is crucial.

    Current AI models' weak connection to real-world physics is a major limitation. While they perform well on abstract tasks, they often fail to predict physical consequences or respond appropriately to sensory data. They do not intuitively grasp concepts like gravity or spatial relationships, which makes them unreliable for embodied tasks. Training directly in the physical world is costly and risky, which hampers development and iteration. This lack of physical grounding and embodied understanding is a significant barrier to deploying AI effectively in real-world applications.

    Previously, tools for physical reasoning in AI were fragmented. Vision-language models linked visual and textual data but lacked depth in reasoning. Rule-based systems were rigid and failed in novel scenarios. Simulations and synthetic data often missed the nuances of real-world physics. Critically, there was no standardized framework to define or evaluate physical common sense or embodied reasoning, and inconsistent methodologies and benchmarks made progress difficult to quantify. Reinforcement learning approaches lacked task-specific reward structures, leading to models that struggled with cause-and-effect reasoning and physical feasibility.

    Researchers from NVIDIA introduced Cosmos-Reason1, a suite of multimodal large language models. These models, Cosmos-Reason1-7B and Cosmos-Reason1-56B, were designed specifically for physical reasoning tasks. Each model is trained in two major phases: Physical AI Supervised Fine-Tuning (SFT) and Physical AI Reinforcement Learning (RL). What differentiates this approach is the introduction of a dual-ontology system. One hierarchical ontology organizes physical common sense into three main categories, Space, Time, and Fundamental Physics, divided further into 16 subcategories. The second ontology is two-dimensional and maps reasoning capabilities across five types of embodied agents, including humans, robot arms, humanoid robots, and autonomous vehicles. These ontologies serve as both training guides and evaluation tools for benchmarking AI's physical reasoning.
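
    As a rough illustration of how a hierarchical ontology can double as a labeling scheme, the sketch below validates (category, subcategory) tags and counts annotated items per top-level category. The three category names come from the article; the subcategory labels, `validate_tag`, and `count_by_category` are hypothetical placeholders, since the article does not list the 16 real subcategories.

```python
from collections import Counter

# Top-level categories are taken from the article. The subcategory labels
# are hypothetical placeholders; the article does not name the real 16.
ONTOLOGY = {
    "Space": {"space_sub_1", "space_sub_2"},
    "Time": {"time_sub_1"},
    "Fundamental Physics": {"physics_sub_1"},
}


def validate_tag(category: str, subcategory: str) -> None:
    """Raise ValueError if the tag is not part of the ontology."""
    if category not in ONTOLOGY:
        raise ValueError(f"unknown category: {category}")
    if subcategory not in ONTOLOGY[category]:
        raise ValueError(f"unknown subcategory for {category}: {subcategory}")


def count_by_category(tags: list[tuple[str, str]]) -> Counter:
    """Count annotated items per top-level category, validating each tag."""
    counts = Counter()
    for category, subcategory in tags:
        validate_tag(category, subcategory)
        counts[category] += 1
    return counts


# Toy annotations, invented for this example.
tags = [
    ("Space", "space_sub_1"),
    ("Time", "time_sub_1"),
    ("Space", "space_sub_2"),
]
coverage = count_by_category(tags)
```

    The same tags could be used to check that training data and benchmark questions cover every category, which is one way an ontology can guide both training and evaluation.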

    The architecture of Cosmos-Reason1 uses a decoder-only LLM augmented with a vision encoder. Videos are processed to extract visual features, which are then projected into a shared space with language tokens. This integration enables the model to reason over textual and visual data simultaneously. For training, the researchers curated a dataset of around 4 million annotated video-text pairs. These include action descriptions, multiple-choice questions, and long chain-of-thought reasoning traces. The reinforcement learning stage is driven by rule-based, verifiable rewards derived from human-labeled multiple-choice questions and self-supervised video tasks. These tasks include predicting the temporal direction of videos and solving puzzles with spatiotemporal patches, which ties the training signal to real-world physical logic.
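
    The projection step described above can be sketched in a few lines. This is a toy illustration only, not NVIDIA's implementation: it assumes the projector is a single linear map and that visual tokens are placed before the text tokens, and the dimensions and the names `linear_project` and `build_input_sequence` are made up for the example.

```python
import random


def linear_project(features, weight):
    """Map each visual feature vector (length d_vis) to length d_model.

    `weight` is a d_vis x d_model matrix stored as a list of rows.
    """
    d_model = len(weight[0])
    return [
        [sum(f[i] * weight[i][j] for i in range(len(f))) for j in range(d_model)]
        for f in features
    ]


def build_input_sequence(visual_features, text_embeddings, weight):
    """Project visual features, then place them before the text embeddings.

    Putting visual tokens first is an assumption made for this sketch.
    """
    visual_tokens = linear_project(visual_features, weight)
    return visual_tokens + text_embeddings


random.seed(0)
D_VIS, D_MODEL = 4, 3  # toy sizes, not the real model dimensions

# Five visual feature vectors (e.g. video patches) and two text tokens.
visual_features = [[random.random() for _ in range(D_VIS)] for _ in range(5)]
text_embeddings = [[random.random() for _ in range(D_MODEL)] for _ in range(2)]
weight = [[random.random() for _ in range(D_MODEL)] for _ in range(D_VIS)]

sequence = build_input_sequence(visual_features, text_embeddings, weight)
```

    After projection, every element of `sequence` has the same width, so a decoder-only model can attend over visual and text positions in a single pass.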

    The team constructed three benchmarks for physical common sense (Space, Time, and Fundamental Physics), containing 604 questions from 426 videos. Six benchmarks were built for embodied reasoning, with 610 questions from 600 videos covering a wide range of tasks. The Cosmos-Reason1 models outperformed previous baselines, especially after the RL phase. Notably, they improved in verifying task completion, predicting the next plausible action, and assessing the physical feasibility of actions. These gains were observed in both model sizes, with Cosmos-Reason1-56B showing stronger performance across most metrics. This improvement underscores the effectiveness of using structured ontologies and multimodal data to enhance physical reasoning in AI.
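
    For multiple-choice benchmarks like these, scoring usually reduces to per-benchmark accuracy plus some aggregate. The sketch below shows one possible aggregation, an unweighted mean across benchmarks, and is not necessarily the one used in the paper. The predictions and answers are invented for the example; only the question and video counts come from the article.

```python
def accuracy(predictions, answers):
    """Fraction of multiple-choice predictions that match the answer key."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot score an empty benchmark")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


def macro_average(per_benchmark):
    """Unweighted mean of per-benchmark accuracies."""
    return sum(per_benchmark.values()) / len(per_benchmark)


# Invented toy predictions for two of the physical common sense benchmarks.
scores = {
    "Space": accuracy(["A", "B", "C", "D"], ["A", "B", "D", "D"]),
    "Time": accuracy(["C", "A"], ["C", "A"]),
}
overall = macro_average(scores)

# Question density implied by the counts reported in the article.
common_sense_questions_per_video = 604 / 426
embodied_questions_per_video = 610 / 600
```

    The counts imply roughly 1.4 questions per video for physical common sense and about one question per video for embodied reasoning.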

    Several Key Takeaways from the Research on Cosmos-Reason1:

    • Two models introduced: Cosmos-Reason1-7B and Cosmos-Reason1-56B, trained specifically for physical reasoning tasks.
    • The models were trained in two phases: Physical AI Supervised Fine-Tuning (SFT) and Physical AI Reinforcement Learning (RL).
    • The training dataset includes approximately 4 million annotated video-text pairs curated for physical reasoning.
    • Reinforcement learning uses rule-based and verifiable rewards, derived from human annotations and video-based tasks.
    • The team relied on two ontologies: a hierarchical one with three categories and 16 subcategories, and a two-dimensional one mapping agent capabilities.
    • Benchmarks: 604 questions from 426 videos for physical common sense, and 610 from 600 videos for embodied reasoning.
    • Performance gains were observed after RL training, particularly in predicting next actions and verifying task completion.
    • The models target real-world use in robots, vehicles, and other embodied agents across diverse environments.
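
    The rule-based, verifiable rewards listed in the takeaways can be illustrated with a small sketch. It assumes the model's final answer has already been extracted as a short string; the function names, the label strings "forward" and "backward", and the 50% reversal rate are choices made for this example rather than details from the paper.

```python
import random


def multiple_choice_reward(model_answer: str, correct_option: str) -> float:
    """Return 1.0 if the chosen option letter matches the label, else 0.0."""
    same = model_answer.strip().upper() == correct_option.strip().upper()
    return 1.0 if same else 0.0


def make_arrow_of_time_example(frames, rng):
    """Randomly reverse a clip and record the direction the model should predict.

    No human labeling is needed: the label follows from the transformation.
    """
    reverse = rng.random() < 0.5
    clip = list(reversed(frames)) if reverse else list(frames)
    label = "backward" if reverse else "forward"
    return clip, label


def arrow_of_time_reward(prediction: str, label: str) -> float:
    """Return 1.0 if the predicted playback direction matches the label."""
    return 1.0 if prediction.strip().lower() == label else 0.0


rng = random.Random(0)
frames = [0, 1, 2, 3]  # stand-ins for video frames
clip, label = make_arrow_of_time_example(frames, rng)
reward = arrow_of_time_reward(label, label)  # a correct prediction scores 1.0
```

    Because both rewards are computed by exact comparison against a known answer, they can be checked automatically, which is what makes them verifiable.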

    In conclusion, the Cosmos-Reason1 initiative demonstrates how AI can be better equipped for the physical world. It addresses key limitations in perception, reasoning, and decision-making that have hindered the deployment of AI in embodied scenarios. The structured training pipeline, grounded in real-world data and ontological frameworks, is designed to make the models more accurate and adaptable. These advancements are a step toward bridging the gap between abstract AI reasoning and the needs of systems that must operate in unpredictable, real-world environments.


    Check out the Paper, Project Page, Models on Hugging Face and GitHub Page. All credit for this research goes to the researchers of this project.

    The post NVIDIA Releases Cosmos-Reason1: A Suite of AI Models Advancing Physical Common Sense and Embodied Reasoning in Real-World Environments appeared first on MarkTechPost.
