
    Light3R-SfM: A Scalable and Efficient Feed-Forward Approach to Structure-from-Motion

    February 1, 2025

    Structure-from-motion (SfM) recovers camera poses and 3D scene structure from multiple images. It underpins tasks such as 3D reconstruction and novel view synthesis. A major challenge is processing large image collections efficiently while maintaining accuracy. Most approaches rely on optimizing camera poses and scene geometry, which substantially increases computational cost. Scaling SfM to large datasets therefore remains difficult, because speed, accuracy, and memory consumption must all be balanced against one another.

    Current SfM methods follow two main approaches: incremental and global. Incremental methods build the 3D scene step by step, starting from two images, while global methods align all cameras at once before reconstruction. Both rely on feature detection, matching, 3D triangulation, and optimization, which leads to high computational cost and memory usage. Some learning-based methods improve accuracy but struggle when images have low visual overlap. Others reduce processing time by limiting pairwise comparisons, but their optimization-based alignment remains slow. Despite these advances, current techniques remain resource-intensive, making it difficult to scale SfM to large datasets or dynamic scenes.

    To address these issues, researchers from NVIDIA, Vector Institute, and the University of Toronto proposed Light3R-SfM, a fully learnable feed-forward SfM model that estimates globally aligned camera poses from unordered image collections without computationally expensive global optimization. Unlike conventional SfM techniques, it incorporates an implicit global alignment module in the latent space, enabling efficient multi-view feature sharing before pairwise 3D reconstruction. Unlike Spann3R, which uses an explicit memory bank for online reconstruction and can drift over time, Light3R-SfM targets offline reconstruction from unordered images. It employs a scalable attention mechanism for global information exchange, improving accuracy while reducing runtime. Light3R-SfM reconstructs a 200-image scene in 33 seconds, a 49× speedup over the 27-minute runtime of MASt3R-SfM.
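    As a quick sanity check, the reported speedup follows directly from the two runtimes quoted above (no figures beyond those in the text are assumed):

```latex
\[
\text{speedup} \;=\; \frac{27\ \text{min} \times 60\ \text{s/min}}{33\ \text{s}}
\;=\; \frac{1620\ \text{s}}{33\ \text{s}} \;\approx\; 49.1
\]
```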

    The framework consists of five stages: encoding images into feature tokens, performing latent global alignment through self- and cross-attention, constructing a scene graph as a shortest path tree (SPT), decoding pairwise point maps, and merging them into a globally aligned 3D reconstruction without traditional global optimization. The method avoids redundant computation by filtering out low-overlap image pairs, and it aligns point maps using Procrustes alignment, which is far cheaper than conventional bundle adjustment.
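    The scene-graph stage can be illustrated with a small sketch. This is not the authors' implementation: the pairwise similarity matrix, the `min_similarity` threshold, the edge cost `1 - similarity`, and the choice of root image are all assumptions made here for illustration. The sketch drops low-overlap pairs and then builds a shortest path tree with Dijkstra's algorithm, so that each image is linked to the root through a chain of high-overlap pairs.

```python
import heapq


def build_scene_graph(similarity, min_similarity=0.3):
    """Keep only image pairs whose (hypothetical) similarity score passes the threshold.

    Returns undirected edges (i, j, cost) with cost = 1 - similarity (an assumption).
    """
    n = len(similarity)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            s = similarity[i][j]
            if s >= min_similarity:  # filter out low-overlap pairs
                edges.append((i, j, 1.0 - s))
    return edges


def shortest_path_tree(num_nodes, edges, root):
    """Dijkstra's algorithm; returns (parent, dist) describing the tree rooted at `root`."""
    adjacency = [[] for _ in range(num_nodes)]
    for u, v, cost in edges:
        adjacency[u].append((v, cost))
        adjacency[v].append((u, cost))

    dist = [float("inf")] * num_nodes
    parent = [None] * num_nodes
    dist[root] = 0.0
    heap = [(0.0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, cost in adjacency[u]:
            candidate = d + cost
            if candidate < dist[v]:
                dist[v] = candidate
                parent[v] = u
                heapq.heappush(heap, (candidate, v))
    return parent, dist


# Toy example: four images with made-up similarity scores.
similarity = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.8, 0.1],
    [0.2, 0.8, 1.0, 0.7],
    [0.1, 0.1, 0.7, 1.0],
]
edges = build_scene_graph(similarity)
parent, dist = shortest_path_tree(len(similarity), edges, root=0)
print(parent)
```

    Each tree edge (parent, child) would then correspond to one image pair passed to the pairwise point-map decoder, so the number of decoded pairs grows linearly with the number of images rather than quadratically.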

    The researchers evaluated multi-view pose estimation on the Tanks&Temples dataset, comparing Light3R-SfM with optimization-based (OPT) and feed-forward (FFD) approaches across different view settings. They reported relative rotation and translation accuracy (RRA, RTA), absolute translation error (ATE), registration rate, and runtime on an NVIDIA V100-32GB. Light3R-SfM significantly outperformed Spann3R, the only other FFD method, achieving 145% higher RRA and 84% higher RTA while running nearly twice as fast. OPT methods such as COLMAP and GLOMAP offered better accuracy through bundle adjustment, but they required up to 43× more runtime, making them less scalable. Unlike Spann3R, which struggled with unordered images and suffered from high computational costs due to exhaustive pairwise comparisons, Light3R-SfM was both more efficient and more accurate, making it the more practical of the two.
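    To make the RRA metric concrete, here is a minimal sketch of how relative rotation accuracy is commonly computed: for every image pair, compare the predicted and ground-truth relative rotations and count the pairs whose angular error falls below a threshold. The function names, the threshold values, and the toy rotations are assumptions for illustration; the article does not specify the exact evaluation code or thresholds used.

```python
import math
from itertools import combinations


def matmul(a, b):
    """3x3 matrix product on nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]


def rotation_angle_deg(r):
    """Rotation angle of a 3x3 rotation matrix, from its trace."""
    cos_theta = (r[0][0] + r[1][1] + r[2][2] - 1.0) / 2.0
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding error
    return math.degrees(math.acos(cos_theta))


def relative_rotation_accuracy(pred, gt, threshold_deg):
    """Fraction of image pairs whose relative-rotation error is below the threshold."""
    errors = []
    for i, j in combinations(range(len(gt)), 2):
        rel_pred = matmul(pred[j], transpose(pred[i]))
        rel_gt = matmul(gt[j], transpose(gt[i]))
        errors.append(rotation_angle_deg(matmul(transpose(rel_pred), rel_gt)))
    return sum(e < threshold_deg for e in errors) / len(errors)


def rot_z(deg):
    """Rotation about the z-axis, used only to build the toy example."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


# Toy example: the second predicted camera is off by 10 degrees.
gt_rotations = [rot_z(0), rot_z(90), rot_z(180)]
pred_rotations = [rot_z(0), rot_z(80), rot_z(180)]
rra_at_5 = relative_rotation_accuracy(pred_rotations, gt_rotations, 5.0)
rra_at_15 = relative_rotation_accuracy(pred_rotations, gt_rotations, 15.0)
```

    In this toy case two of the three pairs involve the misestimated camera, so only one pair passes the 5-degree threshold while all three pass at 15 degrees. RTA is defined analogously on the direction of the relative translation.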

    In summary, the proposed method replaces traditional matching and global optimization with 3D foundation models and a scalable latent alignment module. This reduces runtime while maintaining competitive accuracy, offering a practical alternative to optimization-based methods. However, the method still has limitations in scaling to large image collections and in accuracy at tight thresholds, the latter likely due to the low resolution of the input images. Despite these limitations, it may serve as a foundation for future work on better scalability, higher accuracy, and more robust feature alignment.


    Check out the Paper. All credit for this research goes to the researchers of this project.


    The post Light3R-SfM: A Scalable and Efficient Feed-Forward Approach to Structure-from-Motion appeared first on MarkTechPost.
