    Light3R-SfM: A Scalable and Efficient Feed-Forward Approach to Structure-from-Motion

    February 1, 2025

    Structure-from-motion (SfM) recovers camera poses and 3D scene structure from multiple images. It underpins tasks such as 3D reconstruction and novel view synthesis. A major challenge is processing large image collections efficiently while maintaining accuracy. Many approaches rely on optimizing camera poses and scene geometry, which substantially increases computational cost, so scaling SfM to large datasets remains difficult: speed, accuracy, and memory consumption have to be balanced against one another.

    Current SfM methods follow two main approaches: incremental and global. Incremental methods build the 3D scene step by step, starting from two images, while global methods align all cameras at once before reconstruction. Both rely on feature detection, matching, 3D triangulation, and optimization, which leads to high computational cost and memory usage. Some learning-based methods improve accuracy but struggle when images have low visual overlap. Others reduce processing time by limiting pairwise comparisons, but their optimization-based alignment remains slow. Despite these advances, current techniques remain resource-intensive, making it difficult to scale SfM to large datasets or dynamic scenes.

    To address these issues, researchers from NVIDIA, Vector Institute, and the University of Toronto proposed Light3R-SfM, a fully learnable feed-forward SfM model that estimates globally aligned camera poses from unordered image collections without computationally expensive global optimization. Unlike conventional SfM techniques, it incorporates an implicit global alignment module in the latent space, enabling efficient multi-view feature sharing before pairwise 3D reconstruction. Light3R-SfM also differs from Spann3R, which uses an explicit memory bank for online reconstruction and can drift over time; Light3R-SfM instead targets offline reconstruction from unordered images. It employs a scalable attention mechanism for global information exchange, improving accuracy while reducing runtime. Light3R-SfM reconstructs a 200-image scene in 33 seconds, a 49× speedup over the 27-minute runtime of MASt3R-SfM.

    The framework consists of five stages: encoding images into feature tokens, performing latent global alignment through self- and cross-attention, constructing a scene graph using the shortest path tree (SPT) algorithm, decoding pairwise point maps, and merging them into a globally aligned 3D reconstruction without traditional global optimization. The method reduces redundant computation by filtering out low-overlap image pairs, and it aligns point maps using Procrustes alignment, which is computationally cheaper than conventional bundle adjustment.
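
    As a rough illustration of two of these stages, the sketch below builds a shortest path tree over a small image graph and aligns two point sets with a closed-form similarity Procrustes fit. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the edge cost (for example, one minus a pairwise similarity score), and the Umeyama-style solution are illustrative choices that the article does not specify.

```python
import heapq
import numpy as np


def shortest_path_tree(num_nodes, edge_costs, root=0):
    """Dijkstra from `root`; returns parent[i] for every node.

    `edge_costs` maps an undirected pair (i, j) to a non-negative cost.
    Using 1 - similarity as the cost is an assumption for illustration.
    The root and any unreachable node keep parent None.
    """
    adjacency = {i: [] for i in range(num_nodes)}
    for (i, j), cost in edge_costs.items():
        adjacency[i].append((j, cost))
        adjacency[j].append((i, cost))

    dist = [float("inf")] * num_nodes
    parent = [None] * num_nodes
    dist[root] = 0.0
    heap = [(0.0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry, a shorter path was already found
        for v, cost in adjacency[u]:
            if d + cost < dist[v]:
                dist[v] = d + cost
                parent[v] = u
                heapq.heappush(heap, (dist[v], v))
    return parent


def procrustes_align(src, dst):
    """Closed-form similarity fit: find s, R, t with dst ~ s * R @ src + t.

    `src` and `dst` are (N, 3) arrays of corresponding 3D points.
    """
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    cov = dst_c.T @ src_c / len(src)           # 3x3 cross-covariance
    U, S, Vt = np.linalg.svd(cov)
    sign = np.sign(np.linalg.det(U @ Vt))      # guard against reflections
    D = np.array([1.0, 1.0, sign])
    R = U @ np.diag(D) @ Vt
    scale = (S * D).sum() / (src_c ** 2).sum() * len(src)
    t = mu_dst - scale * R @ mu_src
    return scale, R, t


if __name__ == "__main__":
    # Four images; only pairs with a cost entry are considered connected.
    costs = {(0, 1): 0.1, (1, 2): 0.2, (0, 2): 0.5, (2, 3): 0.1}
    print(shortest_path_tree(4, costs))
```

    A tree over N images has N - 1 edges, so only N - 1 pairs need to be decoded rather than all N(N - 1)/2. Each tree edge can then be aligned with a single SVD, with no iterative solver involved.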

    The researchers evaluated multi-view pose estimation on the Tanks&Temples dataset, comparing Light3R-SfM with optimization-based (OPT) and feed-forward (FFD) approaches across different view settings. The metrics were relative rotation accuracy (RRA), relative translation accuracy (RTA), absolute translation error (ATE), registration rate, and runtime on an NVIDIA V100-32GB. Light3R-SfM significantly outperformed Spann3R, the only other FFD method, achieving 145% higher RRA and 84% higher RTA while running nearly twice as fast. Although OPT methods such as COLMAP and GLOMAP offered better accuracy through bundle adjustment, they required up to 43× more runtime, making them less scalable. Unlike Spann3R, which struggled with unordered images and suffered from high computational costs due to exhaustive pairwise comparisons, Light3R-SfM was both more efficient and more accurate, making it a more practical solution.
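
    The article does not define RRA and RTA, so the following is a minimal sketch of one common convention, stated here as an assumption: for every camera pair, compare the predicted and ground-truth relative pose, and report the fraction of pairs whose angular error falls under a threshold. The world-to-camera pose convention, the function names, and the 5-degree default threshold are illustrative, and the paper's exact evaluation protocol may differ.

```python
import itertools
import numpy as np


def angle_deg(cos_value):
    """Angle in degrees for a cosine, clipped for numerical safety."""
    return float(np.degrees(np.arccos(np.clip(cos_value, -1.0, 1.0))))


def relative_pose(pose_i, pose_j):
    """Relative pose from camera i to camera j.

    Each pose is (R, t) in an assumed world-to-camera convention.
    """
    (R_i, t_i), (R_j, t_j) = pose_i, pose_j
    R_ij = R_j @ R_i.T
    t_ij = t_j - R_ij @ t_i
    return R_ij, t_ij


def rra_rta(pred, gt, tau_deg=5.0):
    """Fraction of camera pairs with rotation / translation-direction
    error below `tau_deg` degrees."""
    rot_hits, trans_hits = [], []
    for i, j in itertools.combinations(range(len(gt)), 2):
        R_p, t_p = relative_pose(pred[i], pred[j])
        R_g, t_g = relative_pose(gt[i], gt[j])
        # Rotation error: angle of the residual rotation R_p @ R_g^T.
        rot_err = angle_deg((np.trace(R_p @ R_g.T) - 1.0) / 2.0)
        # Translation error: angle between the two translation directions.
        denom = np.linalg.norm(t_p) * np.linalg.norm(t_g) + 1e-12
        trans_err = angle_deg(float(t_p @ t_g) / denom)
        rot_hits.append(rot_err < tau_deg)
        trans_hits.append(trans_err < tau_deg)
    return float(np.mean(rot_hits)), float(np.mean(trans_hits))


if __name__ == "__main__":
    eye = np.eye(3)
    gt = [
        (eye, np.array([0.0, 0.0, 0.0])),
        (eye, np.array([1.0, 0.0, 0.0])),
        (eye, np.array([0.0, 1.0, 0.0])),
    ]
    print(rra_rta(gt, gt))  # identical poses pass every threshold
```

    Because only relative poses and translation directions are compared, these two metrics do not depend on the global coordinate frame or scale of the reconstruction. A metric such as ATE, by contrast, generally requires aligning the trajectories first.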

    In summary, the proposed method replaces traditional matching and global optimization with 3D foundation models and a scalable latent alignment module. This reduces runtime while maintaining competitive accuracy, offering a practical alternative to optimization-based methods. However, it remains limited in scalability to large image collections and in accuracy at tight thresholds, likely due to the low resolution of the input images. Despite these limitations, the method may serve as a foundation for future work on better scalability, higher accuracy, and more robust feature alignment.


    Check out the Paper. All credit for this research goes to the researchers of this project.

    The post Light3R-SfM: A Scalable and Efficient Feed-Forward Approach to Structure-from-Motion appeared first on MarkTechPost.
