DELTA: A Novel AI Method that Efficiently (10x Faster) Tracks Every Pixel in 3D Space from Monocular Videos

Tracking dense 3D motion from monocular videos remains challenging, particularly when aiming for pixel-level precision over long sequences. Existing methods often track only a sparse set of points, which lacks the detail needed for full-scene understanding. They also demand substantial computational power, making it difficult to handle long videos efficiently. Additionally, many of them struggle to maintain accuracy over extended sequences, as camera movement and object occlusion cause the model to lose track or introduce errors.

Current methods take several approaches to estimating motion in video sequences, each with its own strengths and limitations. Optical flow techniques provide dense pixel-wise tracking but struggle with robustness in complex scenes, especially when extended to long sequences. Scene flow generalizes optical flow to estimate dense 3D motion, using either RGB-D data or point clouds, but it remains difficult to apply efficiently over long sequences. Point tracking captures motion trajectories by following specific points, with recent advancements incorporating spatial and temporal attention for smoother tracking. However, point-tracking methods still struggle to achieve dense tracking because of the high computational cost. Tracking-by-reconstruction methods estimate motion through a deformation field, which makes them less practical for real-time applications.

A team of researchers from UMass Amherst, MIT-IBM Watson AI Lab, and Snap Inc. has proposed DELTA (Dense Efficient Long-range 3D Tracking for Any video), the first method designed to efficiently track every pixel in 3D space across long video sequences. DELTA starts with reduced-resolution tracking via spatio-temporal attention and then applies an attention-based upsampler to recover high-resolution accuracy. Key innovations include an upsampler that produces sharp motion boundaries, an efficient spatial attention architecture for dense tracking, and a log-depth representation that enhances tracking performance. DELTA achieves state-of-the-art results on the CVO and Kubric3D datasets, showing over 10% improvement in metrics like Average Jaccard (AJ) and Average Position Difference in 3D (APD3D), and performs competitively on 3D point tracking benchmarks such as TAP-Vid3D and LSFOdyssey. Unlike existing methods, DELTA delivers dense 3D tracking at scale, running over 8x faster than previous methods while achieving state-of-the-art accuracy.
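The coarse-to-fine idea described above can be illustrated with a toy NumPy sketch. This is not the authors' implementation: in DELTA the combination weights come from a learned attention module, whereas here, purely as a placeholder assumption, they are derived from the distance between each fine pixel and its 3x3 neighbourhood of coarse cells. The function names `softmax` and `upsample_tracks` and the `temperature` parameter are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def upsample_tracks(coarse, factor, temperature=1.0):
    """Upsample a coarse motion field (h, w, C) to (h*factor, w*factor, C).

    Each fine pixel is a softmax-weighted sum over the 3x3 coarse cells
    around it. The weights are a distance-based stand-in (assumption) for
    the learned attention weights a real model would predict.
    """
    h, w, c = coarse.shape
    H, W = h * factor, w * factor
    # Replicate the border so every cell has a full 3x3 neighbourhood.
    padded = np.pad(coarse, ((1, 1), (1, 1), (0, 0)), mode="edge")
    # Fine pixel centres expressed in coarse-grid coordinates.
    ys = (np.arange(H) + 0.5) / factor - 0.5
    xs = (np.arange(W) + 0.5) / factor - 0.5
    cy = np.clip(np.round(ys).astype(int), 0, h - 1)
    cx = np.clip(np.round(xs).astype(int), 0, w - 1)

    offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    logits = np.empty((H, W, len(offsets)))
    values = np.empty((H, W, len(offsets), c))
    for k, (dy, dx) in enumerate(offsets):
        ny, nx = cy + dy, cx + dx
        dist2 = (ys - ny)[:, None] ** 2 + (xs - nx)[None, :] ** 2
        logits[:, :, k] = -dist2 / temperature
        # +1 compensates for the one-cell padding.
        values[:, :, k] = padded[ny[:, None] + 1, nx[None, :] + 1]

    weights = softmax(logits, axis=-1)            # (H, W, 9), sums to 1
    return (weights[..., None] * values).sum(axis=2)

# Usage: a 6x8 coarse field of 3D motion vectors, upsampled 4x.
coarse_motion = np.random.default_rng(0).normal(size=(6, 8, 3))
fine_motion = upsample_tracks(coarse_motion, factor=4)
```

Because the weights form a convex combination, the upsampled values stay within the range of the coarse field. A learned upsampler can instead concentrate weight on a single neighbour, which is one way sharp motion boundaries could be preserved.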

Experiments show that DELTA excels in 3D tracking tasks, outperforming previous methods in both speed and accuracy. The model is trained on over 5,600 videos from the Kubric dataset, and its loss function combines 2D coordinate, depth, and visibility losses.
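A combined loss of this kind can be sketched as a weighted sum of three terms. The article does not give the exact form of each term or the weights, so everything below is an assumption for illustration: an L1 penalty on 2D coordinates, an L1 penalty on log-depth, binary cross-entropy on visibility logits, and the hypothetical weights `w_uv`, `w_depth`, and `w_vis`.

```python
import numpy as np

def combined_tracking_loss(pred_uv, gt_uv, pred_depth, gt_depth,
                           vis_logit, gt_vis,
                           w_uv=1.0, w_depth=1.0, w_vis=1.0):
    """Weighted sum of 2D coordinate, depth, and visibility losses.

    pred_uv, gt_uv:       (N, 2) pixel coordinates
    pred_depth, gt_depth: (N,)   positive depths
    vis_logit:            (N,)   raw visibility scores
    gt_vis:               (N,)   1.0 if visible, 0.0 if occluded
    The individual loss forms and weights are assumptions, not the
    authors' published recipe.
    """
    # L1 distance between predicted and true 2D positions.
    loss_uv = np.abs(pred_uv - gt_uv).sum(axis=-1).mean()
    # L1 distance in log-depth space.
    loss_depth = np.abs(np.log(pred_depth) - np.log(gt_depth)).mean()
    # Numerically stable binary cross-entropy with logits.
    loss_vis = (np.maximum(vis_logit, 0.0) - vis_logit * gt_vis
                + np.log1p(np.exp(-np.abs(vis_logit)))).mean()
    return w_uv * loss_uv + w_depth * loss_depth + w_vis * loss_vis

# Usage with three tracked points.
gt_uv = np.array([[10.0, 20.0], [30.0, 40.0], [50.0, 60.0]])
gt_depth = np.array([1.5, 4.0, 12.0])
gt_vis = np.array([1.0, 0.0, 1.0])
loss = combined_tracking_loss(gt_uv + 0.5, gt_uv, gt_depth * 1.1, gt_depth,
                              np.array([2.0, -1.0, 0.5]), gt_vis)
```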

In benchmarks, DELTA achieved top scores on CVO for long-range 2D tracking and on Kubric3D for dense 3D tracking, completing tasks much faster than other methods. DELTA's design choices, including log-depth representation, spatial attention, and an attention-based upsampler, significantly enhance its accuracy and efficiency across diverse tracking scenarios.
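One property of a log-depth representation can be checked with a few lines of arithmetic: the same relative depth error produces the same difference in log space regardless of how far the point is, whereas in linear depth the far point dominates. The sketch below only demonstrates this numerical property. Whether it is the reason log-depth helps DELTA is not stated in this article, and the helper names `to_log_depth` and `from_log_depth` are hypothetical.

```python
import numpy as np

def to_log_depth(depth, eps=1e-6):
    # Clamp to eps so zero or negative depths do not produce -inf or NaN.
    return np.log(np.maximum(depth, eps))

def from_log_depth(log_depth):
    return np.exp(log_depth)

near, far = 1.0, 50.0   # depths in arbitrary units
rel_err = 0.10          # a 10% over-estimate at both depths

# In linear depth, the far point's error is 50x larger.
lin_err_near = abs(near * (1 + rel_err) - near)
lin_err_far = abs(far * (1 + rel_err) - far)

# In log depth, both errors equal log(1.10).
log_err_near = abs(to_log_depth(near * (1 + rel_err)) - to_log_depth(near))
log_err_far = abs(to_log_depth(far * (1 + rel_err)) - to_log_depth(far))

print(f"linear error: near={lin_err_near:.3f}, far={lin_err_far:.3f}")
print(f"log error:    near={log_err_near:.4f}, far={log_err_far:.4f}")
```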

In conclusion, DELTA is an efficient method for tracking every pixel across video frames, achieving accurate dense 2D and 3D tracking with a faster runtime than existing methods. The model may struggle with points that remain occluded for extended periods, and it performs best on videos shorter than a few hundred frames. It shares these limitations with earlier methods because it processes video in short temporal windows. Moreover, the method's 3D tracking accuracy depends on the precision and temporal stability of the monocular depth estimation it uses. Anticipated improvements in monocular depth estimation research will likely enhance the method's performance further.

Check out the Paper and Project. All credit for this research goes to the researchers of this project.

The post DELTA: A Novel AI Method that Efficiently (10x Faster) Tracks Every Pixel in 3D Space from Monocular Videos appeared first on MarkTechPost.
