
    DELTA: A Novel AI Method that Efficiently (10x Faster) Tracks Every Pixel in 3D Space from Monocular Videos

    November 6, 2024

    Tracking dense 3D motion from monocular videos remains challenging, particularly when aiming for pixel-level precision over long sequences. Existing methods fall short of detailed 3D tracking because they often track only a few points, which is too sparse for full-scene understanding. They are also computationally expensive, making it difficult to handle long videos efficiently. Additionally, many of them struggle to maintain accuracy over extended sequences, as camera movement and object occlusion cause the model to lose track or introduce errors.

    Current methods take several approaches to estimating motion in video sequences, each with its own strengths and limitations. Optical flow techniques provide dense pixel-wise tracking but struggle with robustness in complex scenes, especially when extended to long sequences. Scene flow generalizes optical flow to estimate dense 3D motion, using either RGB-D data or point clouds, but it remains challenging to apply efficiently over long sequences. Point tracking captures motion trajectories by following specific points, with recent advancements incorporating spatial and temporal attention for smoother tracking. However, point-tracking methods still struggle to achieve dense tracking because of the high computational cost. Tracking-by-reconstruction methods use a deformation field to estimate motion, which makes them less practical for real-time applications.

    A team of researchers from UMass Amherst, MIT-IBM Watson AI Lab, and Snap Inc. has proposed DELTA (Dense Efficient Long-range 3D Tracking for Any video), the first method designed to efficiently track every pixel in 3D space across long video sequences. DELTA first tracks at reduced resolution using spatio-temporal attention, then applies an attention-based upsampler to recover high-resolution tracks. Key innovations include an upsampler for sharp motion boundaries, an efficient spatial attention architecture for dense tracking, and a log-depth representation that enhances tracking performance. DELTA achieves state-of-the-art results on the CVO and Kubric3D datasets, showing over 10% improvement in metrics like Average Jaccard (AJ) and Average Percentage of points within Delta in 3D (APD3D), and performs competitively on 3D point tracking benchmarks such as TAP-Vid3D and LSFOdyssey. Unlike existing methods, DELTA delivers dense 3D tracking at scale, running over 8x faster than previous methods while achieving state-of-the-art accuracy.
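
    The coarse-to-fine idea described above can be illustrated with a small sketch. It is not DELTA's actual upsampler: the 3x3 neighborhood, the scalar track values, and the `score_fn` callback (a stand-in for learned attention logits) are all assumptions made for illustration. Each fine pixel is computed as a softmax-weighted mix of nearby coarse cells, so peaked weights can preserve a motion boundary while uniform weights blur it.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def neighbors(r, c, rows, cols):
    """Coarse cells in the 3x3 window around (r, c), clipped to the grid."""
    return [(rr, cc)
            for rr in range(max(0, r - 1), min(rows, r + 2))
            for cc in range(max(0, c - 1), min(cols, c + 2))]

def upsample(coarse, score_fn, factor):
    """Upsample a coarse grid of scalar track values by `factor`.

    Each fine pixel is a softmax-weighted mix of the coarse cells around it.
    `score_fn(fine_rc, coarse_rc)` is a hypothetical stand-in for the logits
    a learned attention module would produce.
    """
    rows, cols = len(coarse), len(coarse[0])
    fine = []
    for fr in range(rows * factor):
        row = []
        for fc in range(cols * factor):
            cells = neighbors(fr // factor, fc // factor, rows, cols)
            weights = softmax([score_fn((fr, fc), cell) for cell in cells])
            row.append(sum(w * coarse[r][c] for w, (r, c) in zip(weights, cells)))
        fine.append(row)
    return fine

FACTOR = 2
coarse = [[0.0, 10.0]]  # two coarse cells with very different motion

def sharp(fine_rc, coarse_rc):
    """Peaked scores: strongly prefer the coarse cell the pixel falls in."""
    own = (fine_rc[0] // FACTOR, fine_rc[1] // FACTOR)
    return 0.0 if coarse_rc == own else -50.0

sharp_field = upsample(coarse, sharp, FACTOR)
smooth_field = upsample(coarse, lambda fine_rc, coarse_rc: 0.0, FACTOR)
```

    With the peaked scores, pixels on each side of the boundary keep the value of their own coarse cell; with uniform scores, every pixel collapses to the average of its neighbors. A learned module would sit somewhere between these extremes, choosing per pixel.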

    Experiments showed that DELTA excels in 3D tracking tasks, outperforming previous methods in both speed and accuracy. DELTA was trained on the Kubric dataset with over 5,600 videos, using a loss function that combines 2D coordinate, depth, and visibility losses.
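
    As a rough sketch of how such a combined loss could be assembled, the snippet below sums three terms with adjustable weights. The article does not give the exact form of each term, so the mean absolute error for coordinates and depth, the binary cross-entropy for visibility, the weights `w_xy`, `w_depth`, `w_vis`, and the dictionary layout are all assumptions rather than the authors' formulation.

```python
import math

def mean_abs_error(pred, target):
    """Mean absolute difference between two equal-length lists."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def binary_cross_entropy(probs, labels, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped for stability."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)

def tracking_loss(pred, target, w_xy=1.0, w_depth=1.0, w_vis=1.0):
    """Weighted sum of 2D coordinate, depth, and visibility losses.

    `pred` and `target` are dicts with keys "xy" (list of (x, y) pairs),
    "depth" (list of floats), and "vis" (probabilities in pred, 0/1 labels
    in target). This layout is hypothetical.
    """
    xy_pred = [v for point in pred["xy"] for v in point]
    xy_true = [v for point in target["xy"] for v in point]
    loss_xy = mean_abs_error(xy_pred, xy_true)
    loss_depth = mean_abs_error(pred["depth"], target["depth"])
    loss_vis = binary_cross_entropy(pred["vis"], target["vis"])
    return w_xy * loss_xy + w_depth * loss_depth + w_vis * loss_vis

pred = {"xy": [(1.0, 2.0), (3.0, 4.0)], "depth": [2.0, 4.0], "vis": [0.5, 0.5]}
target = {"xy": [(1.0, 2.0), (3.0, 6.0)], "depth": [2.0, 5.0], "vis": [1, 0]}
total = tracking_loss(pred, target)
```

    Keeping the three terms separate makes it easy to reweight them, for example to down-weight visibility early in training; whether DELTA does anything like this is not stated in the article.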

    In benchmarks, DELTA achieved top scores on CVO for long-range 2D tracking and on Kubric3D for dense 3D tracking, completing tasks much faster than other methods. DELTA’s design choices, including log-depth representation, spatial attention, and an attention-based upsampler, significantly enhance its accuracy and efficiency across diverse tracking scenarios.
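
    To make the log-depth design choice more concrete, the sketch below shows one general property of working in log space: the same relative depth error produces the same residual whether the point is near or far. This is a property of logarithms, not a statement of the authors' motivation, and the helper names are hypothetical.

```python
import math

def to_log_depth(depth):
    """Encode a positive depth value as its natural logarithm."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return math.log(depth)

def from_log_depth(log_depth):
    """Decode a log-depth value back to linear depth."""
    return math.exp(log_depth)

# A 10% overestimate at 1 unit and at 50 units of depth.
near_residual = to_log_depth(1.1) - to_log_depth(1.0)
far_residual = to_log_depth(55.0) - to_log_depth(50.0)

# In linear space the same two errors differ by a factor of 50.
near_linear = 1.1 - 1.0
far_linear = 55.0 - 50.0
```

    In linear depth, the far error would dominate a loss even though both predictions are off by the same proportion; in log depth the two residuals match, so nearby geometry is not drowned out by distant points.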

    In conclusion, DELTA is a highly efficient method for tracking every pixel across video frames, achieving high accuracy in dense 2D and 3D tracking with a faster runtime than existing methods. The model may struggle with points that remain occluded for extended periods, and it performs best on videos with fewer than a few hundred frames. It shares these limitations with earlier methods because it processes video in short temporal windows. Moreover, the method's 3D tracking accuracy relies on the precision and temporal stability of the monocular depth estimation used. Future improvements in monocular depth estimation will likely enhance the method's performance further.


    Check out the Paper and Project. All credit for this research goes to the researchers of this project.


    The post DELTA: A Novel AI Method that Efficiently (10x Faster) Tracks Every Pixel in 3D Space from Monocular Videos appeared first on MarkTechPost.
