
    StereoAnything: A Highly Practical AI Solution for Robust Stereo Matching

    November 27, 2024

    Foundation models are transforming computer vision: in object recognition, image segmentation, and monocular depth estimation, they show strong zero- and few-shot performance across various downstream tasks. Stereo matching, which recovers depth from a pair of images and supports 3D views of scenes, is crucial for fields like robotics, self-driving cars, and augmented reality. However, foundation models for stereo matching remain little explored because accurate disparity ground truth (GT) data is difficult to obtain. Many stereo datasets exist, but using them effectively for training is difficult, and even when combined, these annotated datasets are not enough to train an ideal foundation model.

    Stereo-from-mono is currently a leading effort to address these challenges by creating stereo image pairs and disparity maps directly from single images. However, this approach produced only 500,000 data samples, which is low relative to the scale required to train robust foundation models. While it is an important step toward reducing the dependency on expensive stereo data collection, the generated dataset is still insufficient for building large-scale models that generalize well to diverse real-world conditions.

    Early stereo-matching methods relied mainly on hand-crafted features. The field then shifted to CNN-based models like GCNet and PSMNet, which improved accuracy with techniques like 3D cost aggregation. Video stereo matching uses temporal data for consistency but struggles with generalization. Cross-domain methods address this by learning domain-invariant features using techniques like unsupervised adaptation and contrastive learning, as seen in models like RAFT-Stereo and FormerStereo.

    To overcome these issues, a group of researchers from the School of Computer Science at Wuhan University, the Institute of Artificial Intelligence and Robotics at Xi’an Jiaotong University, Waytous, the University of Bologna, Rock Universe, the Institute of Automation at the Chinese Academy of Sciences, and the University of California, Berkeley proposed StereoAnything. It is a foundation model for stereo matching, developed to produce high-quality disparity estimates for any pair of matching stereo images, no matter how complex the scene or how challenging the environmental conditions. It is designed to train a robust stereo network using large-scale mixed data. The stereo network mainly consists of four components: feature extraction, cost construction, cost aggregation, and disparity regression.
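    The four components named above can be illustrated with a deliberately simple, non-learned sketch. This is not the StereoAnything network: raw pixel intensities stand in for learned features, a box filter stands in for learned cost aggregation, and every function name and parameter here is an assumption made for illustration.

```python
import numpy as np

def build_cost_volume(left, right, max_disp):
    """Cost construction: absolute difference between each left pixel and
    the right pixel shifted by every candidate disparity d."""
    h, w = left.shape
    # High cost where the shifted right pixel would fall outside the image.
    cost = np.full((max_disp, h, w), 1e3, dtype=np.float64)
    for d in range(max_disp):
        # Left pixel at column x is compared with right pixel at column x - d.
        cost[d, :, d:] = np.abs(left[:, d:] - right[:, : w - d])
    return cost

def aggregate(cost, radius=1):
    """Cost aggregation: average each disparity slice over a small window."""
    k = 2 * radius + 1
    padded = np.pad(cost, ((0, 0), (radius, radius), (radius, radius)), mode="edge")
    out = np.zeros_like(cost)
    for dy in range(k):
        for dx in range(k):
            out += padded[:, dy : dy + cost.shape[1], dx : dx + cost.shape[2]]
    return out / (k * k)

def soft_argmin(cost, temperature=1.0):
    """Disparity regression: softmax over negated costs, then the expected
    disparity under that distribution."""
    logits = -cost / temperature
    logits -= logits.max(axis=0, keepdims=True)  # numerical stability
    prob = np.exp(logits)
    prob /= prob.sum(axis=0, keepdims=True)
    disps = np.arange(cost.shape[0]).reshape(-1, 1, 1)
    return (prob * disps).sum(axis=0)

def estimate_disparity(left, right, max_disp=8):
    """Feature extraction is skipped here: raw intensities act as features."""
    cost = aggregate(build_cost_volume(left, right, max_disp))
    return soft_argmin(cost, temperature=0.01)
```

    In a learned network, each of these stages is replaced by trainable layers, but the data flow (features, cost volume, aggregation, regression to a sub-pixel disparity) follows the same order.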

    To improve generalization, supervised stereo data was used without depth normalization, because stereo matching relies on scale information. Training began with a single dataset and then combined the top-ranked datasets to improve robustness. For single-image learning, monocular depth models predicted depth, which was converted into disparity maps and used to generate realistic stereo pairs via forward warping. Occlusions and gaps were filled using textures from other images in the dataset.
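    The single-image pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the standard pinhole relation disparity = focal length × baseline / depth, a nearest-pixel forward warp, and a hypothetical `filler` image to stand in for "textures from other images in the dataset". The focal length and baseline values are arbitrary.

```python
import numpy as np

def depth_to_disparity(depth, focal_px, baseline):
    """Pinhole stereo relation: disparity = focal * baseline / depth."""
    return focal_px * baseline / np.maximum(depth, 1e-6)

def forward_warp(left, disparity, filler):
    """Synthesize a right view by moving each left pixel from x to x - d.

    When several pixels land on the same target, the one with the larger
    disparity (closer to the camera) wins. Targets that receive no pixel
    are holes and are filled from `filler`, a different image of the same
    size. Returns the synthesized view and the boolean hole mask.
    """
    h, w = left.shape
    right = np.zeros_like(left)
    best = np.full((h, w), -1.0)            # largest disparity written so far
    filled = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            d = disparity[y, x]
            xr = int(round(x - d))          # nearest target column
            if 0 <= xr < w and d > best[y, xr]:
                right[y, xr] = left[y, x]
                best[y, xr] = d
                filled[y, xr] = True
    holes = ~filled
    right[holes] = filler[holes]
    return right, holes
```

    The original left image, the synthesized right image, and the disparity map together form one pseudo-labeled training sample.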

    The StereoAnything framework was evaluated using OpenStereo and NMRF-Stereo baselines, with a Swin Transformer for feature extraction. Training used the AdamW optimizer and OneCycleLR scheduling, with fine-tuning on labeled, mixed, and pseudo-labeled datasets and data augmentation. Testing on KITTI, Middlebury, ETH3D, and DrivingStereo showed that StereoAnything significantly reduced errors, with NMRF-Stereo-SwinT lowering the mean error from 18.11 to 5.01. Fine-tuning StereoCarla on more diverse datasets led to the best mean metric of 8.52%. This shows the importance of dataset diversity for stereo-matching performance.
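    The article does not say which error metric the reported numbers use. As an assumption, the sketch below shows two measures that are common in stereo benchmarks: the mean absolute disparity error (end-point error) and the percentage of pixels whose error exceeds a threshold. The function names and the 3-pixel default threshold are illustrative choices, not taken from the paper.

```python
import numpy as np

def end_point_error(pred, gt, valid=None):
    """Mean absolute disparity error over pixels with ground truth."""
    if valid is None:
        valid = np.isfinite(gt)             # NaN marks missing ground truth
    return float(np.abs(pred - gt)[valid].mean())

def bad_pixel_rate(pred, gt, threshold=3.0, valid=None):
    """Percentage of valid pixels whose error is larger than `threshold`."""
    if valid is None:
        valid = np.isfinite(gt)
    err = np.abs(pred - gt)[valid]
    return float(100.0 * (err > threshold).mean())
```

    Averaging such a per-dataset score over KITTI, Middlebury, ETH3D, and DrivingStereo would give a single cross-domain number of the kind quoted above.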

    In terms of results, StereoAnything showed strong robustness across domains in both indoor and outdoor scenes. It consistently produced disparity maps that were more accurate than those of the NMRF-Stereo-SwinT model. The approach thus shows strong generalization and performs better across domains with numerous visual and environmental differences.

    In conclusion, StereoAnything provides a highly practical solution for robust stereo matching. A new synthetic dataset called StereoCarla is used to generalize better across different scenarios and enhance performance. The effectiveness of labeled stereo datasets and of pseudo stereo datasets generated using monocular depth estimation models was also investigated. StereoAnything achieved competitive performance across various benchmarks and real-world scenarios. These results show the potential of hybrid training strategies that draw on diverse data sources to enhance stereo model robustness, and they can serve as a baseline for future research.


    Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.


    The post StereoAnything: A Highly Practical AI Solution for Robust Stereo Matching appeared first on MarkTechPost.
