Close Menu
    DevStackTipsDevStackTips
    • Home
    • News & Updates
      1. Tech & Work
      2. View All

      Error’d: Pickup Sticklers

      September 27, 2025

      From Prompt To Partner: Designing Your Custom AI Assistant

      September 27, 2025

      Microsoft unveils reimagined Marketplace for cloud solutions, AI apps, and more

      September 27, 2025

      Design Dialects: Breaking the Rules, Not the System

      September 27, 2025

      Building personal apps with open source and AI

      September 12, 2025

      What Can We Actually Do With corner-shape?

      September 12, 2025

      Craft, Clarity, and Care: The Story and Work of Mengchu Yao

      September 12, 2025

      Cailabs secures €57M to accelerate growth and industrial scale-up

      September 12, 2025
    • Development
      1. Algorithms & Data Structures
      2. Artificial Intelligence
      3. Back-End Development
      4. Databases
      5. Front-End Development
      6. Libraries & Frameworks
      7. Machine Learning
      8. Security
      9. Software Engineering
      10. Tools & IDEs
      11. Web Design
      12. Web Development
      13. Web Security
      14. Programming Languages
        • PHP
        • JavaScript
      Featured

      Using phpinfo() to Debug Common and Not-so-Common PHP Errors and Warnings

      September 28, 2025
      Recent

      Using phpinfo() to Debug Common and Not-so-Common PHP Errors and Warnings

      September 28, 2025

      Mastering PHP File Uploads: A Guide to php.ini Settings and Code Examples

      September 28, 2025

      The first browser with JavaScript landed 30 years ago

      September 27, 2025
    • Operating Systems
      1. Windows
      2. Linux
      3. macOS
      Featured
      Recent
    • Learning Resources
      • Books
      • Cheatsheets
      • Tutorials & Guides
    Home»Development»Machine Learning»Offline Video-LLMs Can Now Understand Real-Time Streams: Apple Researchers Introduce StreamBridge to Enable Multi-Turn and Proactive Video Understanding

    Offline Video-LLMs Can Now Understand Real-Time Streams: Apple Researchers Introduce StreamBridge to Enable Multi-Turn and Proactive Video Understanding

    May 13, 2025

    Video-LLMs process whole pre-recorded videos at once. However, applications like robotics and autonomous driving need causal perception and interpretation of visual information online. This fundamental mismatch shows a limitation of current Video-LLMs, as they are not naturally designed to operate in streaming scenarios where timely understanding and responsiveness are paramount. The transition from offline to streaming video understanding presents two key challenges. First, multi-turn real-time understanding requires models to process the most recent video segment while maintaining historical visual and conversational context. Second, proactive response generation demands human-like behavior where the model actively monitors the visual stream and provides timely outputs based on unfolding content without explicit prompts.

    Video-LLMs have gained significant attention for video understanding, combining visual encoders, modality projectors, and LLMs to generate contextual responses from video content. Several approaches have emerged to address the challenge of streaming video understanding. VideoLLMOnline and Flash-VStream introduced specialized online objectives and memory architectures for handling sequential inputs. MMDuet and ViSpeak developed dedicated components for proactive response generation. Multiple benchmark suites have been used to evaluate streaming capabilities, including StreamingBench, StreamBench, SVBench, OmniMMI, and OVO-Bench.

    Researchers from Apple and Fudan University have proposed StreamBridge, a framework to transform offline Video-LLMs into streaming-capable models. It addresses two fundamental challenges in adapting existing models into online scenarios: limited capability for multi-turn real-time understanding and lack of proactive response mechanisms. StreamBridge combines a memory buffer with a round-decayed compression strategy, supporting long-context interactions. It also incorporates a decoupled, lightweight activation model that integrates seamlessly with existing Video-LLMs for proactive response generation. Further, researchers introduced Stream-IT, a large-scale dataset designed for streaming video understanding, featuring mixed videotext sequences and diverse instruction formats.

    StreamBridge framework is evaluated using mainstream offline Video-LLMs, LLaVA-OV-7B, Qwen2-VL-7B, and Oryx-1.5-7B. The Stream-IT dataset is added with approximately 600K samples from established datasets to maintain general video understanding capabilities, including LLaVA-178K, VCG-Plus, and ShareGPT4Video. OVO-Bench and StreamingBench are used for multi-turn real-time understanding, focusing on their real-time tasks. General video understanding is evaluated across seven benchmarks, including three short-video datasets (MVBench, PerceptionTest, TempCompass) and four long-video benchmarks (EgoSchema, LongVideoBench, MLVU, VideoMME).

    The evaluation results show that Qwen2-VL† improved with average scores increasing from 55.98 to 63.35 on OVO-Bench and 69.04 to 72.01 on Streaming-Bench. In contrast, LLaVA-OV† experiences slight performance decreases, dropping from 64.02 to 61.64 on OVO-Bench and from 71.12 to 68.39 on Streaming-Bench. Fine-tuning on the Stream-IT dataset yields substantial improvements across all models. Oryx-1.5† achieves gains of +11.92 on OVO-Bench and +4.2 on Streaming-Bench. Moreover, Qwen2-VL† reaches average scores of 71.30 on OVO-Bench and 77.04 on Streaming-Bench after Stream-IT fine-tuning, outperforming even proprietary models like GPT-4o and Gemini 1.5 Pro, showing the effectiveness of StreamBridge’s approach in enhancing streaming video understanding capabilities.

    In conclusion, researchers introduced StreamBridge, a method to transform offline Video-LLMs into effective streaming-capable models. Its dual innovations, a memory buffer with round-decayed compression strategy and a decoupled lightweight activation model, address the core challenges of streaming video understanding without compromising general performance. Further, the Stream-IT dataset is introduced for streaming video understanding, with specialized interleaved video-text sequences. As streaming video understanding becomes increasingly essential in robotics and autonomous driving, StreamBridge offers a generalizable solution that transforms static Video-LLMs into dynamic, responsive systems capable of meaningful interaction in continuously evolving visual environments.


    Check out the Paper. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 90k+ ML SubReddit.

    Here’s a brief overview of what we’re building at Marktechpost:

    • ML News Community – r/machinelearningnews (92k+ members)
    • Newsletter– airesearchinsights.com/(30k+ subscribers)
    • miniCON AI Events – minicon.marktechpost.com
    • AI Reports & Magazines – magazine.marktechpost.com
    • AI Dev & Research News – marktechpost.com (1M+ monthly readers)
    • Partner with us

    The post Offline Video-LLMs Can Now Understand Real-Time Streams: Apple Researchers Introduce StreamBridge to Enable Multi-Turn and Proactive Video Understanding appeared first on MarkTechPost.

    Source: Read More 

    Facebook Twitter Reddit Email Copy Link
    Previous ArticleA Step-by-Step Guide on Building, Customizing, and Publishing an AI-Focused Blogging Website with Lovable.dev and Seamless GitHub Integration
    Next Article Build an intelligent community agent to revolutionize IT support with Amazon Q Business

    Related Posts

    Machine Learning

    How to Evaluate Jailbreak Methods: A Case Study with the StrongREJECT Benchmark

    September 3, 2025
    Machine Learning

    Announcing the new cluster creation experience for Amazon SageMaker HyperPod

    September 3, 2025
    Leave A Reply Cancel Reply

    For security, use of Google's reCAPTCHA service is required which is subject to the Google Privacy Policy and Terms of Use.

    Continue Reading

    This month in security with Tony Anscombe – April 2025 edition

    Development

    UK Quantum computing is going universal through scaling

    News & Updates

    Facebook ‘Safety Check’ allows travelers to alert family

    Development

    Part 1 – Marketing Cloud Personalization and Mobile Apps: Functionality 101

    Development

    Highlights

    Play Ransomware Hacked 900 Organizations, CISA Released TTPs & IOCs

    June 5, 2025

    Play Ransomware Hacked 900 Organizations, CISA Released TTPs & IOCs

    Federal authorities have revealed that the notorious Play ransomware group has successfully breached approximately 900 organizations worldwide as of May 2025, marking a dramatic escalation in cybercri …
    Read more

    Published Date:
    Jun 05, 2025 (1 hour, 22 minutes ago)

    Vulnerabilities has been mentioned in this article.

    CVE-2024-57727

    Human Resource Process Outsourcing | HR Outsourcing Solutions Provider

    September 12, 2025

    I think the ergonomics of generators is growing on me.

    May 13, 2025

    Qodo launches CLI agent framework

    June 25, 2025
    © DevStackTips 2025. All rights reserved.
    • Contact
    • Privacy Policy

    Type above and press Enter to search. Press Esc to cancel.