
    Offline Video-LLMs Can Now Understand Real-Time Streams: Apple Researchers Introduce StreamBridge to Enable Multi-Turn and Proactive Video Understanding

    May 13, 2025

    Video-LLMs typically process an entire pre-recorded video at once. However, applications such as robotics and autonomous driving need causal perception and interpretation of visual information online, as it arrives. This mismatch exposes a limitation of current Video-LLMs: they are not designed to operate in streaming scenarios where timely understanding and responsiveness are paramount. The transition from offline to streaming video understanding presents two key challenges. First, multi-turn real-time understanding requires a model to process the most recent video segment while maintaining historical visual and conversational context. Second, proactive response generation demands human-like behavior, where the model actively monitors the visual stream and provides timely outputs based on unfolding content without explicit prompts.
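To make the two requirements above concrete, here is a minimal sketch of a causal, proactive streaming loop. It is an illustration only, not the method of any particular model: `proactive_stream`, `score_fn`, `respond_fn`, and `threshold` are hypothetical names, and the per-frame feature vectors are toy data. At each step the loop sees only the frames received so far, keeps them as history, and responds without being prompted whenever a lightweight score crosses the threshold.

```python
from typing import Callable, Iterable, List, Tuple

Frame = List[float]  # hypothetical per-frame feature vector


def proactive_stream(
    frames: Iterable[Frame],
    score_fn: Callable[[Frame], float],
    respond_fn: Callable[[List[Frame]], str],
    threshold: float = 0.5,
) -> List[Tuple[int, str]]:
    """Scan frames in arrival order and respond only when the score is high.

    The loop is causal: at step t it has seen frames 0..t and nothing later.
    """
    history: List[Frame] = []
    responses: List[Tuple[int, str]] = []
    for t, frame in enumerate(frames):
        history.append(frame)  # keep historical visual context
        if score_fn(frame) >= threshold:
            # No user prompt is needed; the score alone triggers a response.
            responses.append((t, respond_fn(history)))
    return responses


# Toy usage: respond whenever the mean frame value is at least 0.5.
frames = [[0.1, 0.2], [0.9, 0.8], [0.3, 0.1], [0.7, 0.9]]
events = proactive_stream(
    frames,
    score_fn=lambda f: sum(f) / len(f),
    respond_fn=lambda hist: f"seen {len(hist)} frames",
)
print(events)  # [(1, 'seen 2 frames'), (3, 'seen 4 frames')]
```

In a real system the scoring function would be a learned model and the response function a Video-LLM call; both are stand-ins here.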

    Video-LLMs have gained significant attention for video understanding, combining visual encoders, modality projectors, and LLMs to generate contextual responses from video content. Several approaches have emerged to address the challenge of streaming video understanding. VideoLLMOnline and Flash-VStream introduced specialized online objectives and memory architectures for handling sequential inputs. MMDuet and ViSpeak developed dedicated components for proactive response generation. Multiple benchmark suites have been used to evaluate streaming capabilities, including StreamingBench, StreamBench, SVBench, OmniMMI, and OVO-Bench.

    Researchers from Apple and Fudan University have proposed StreamBridge, a framework that transforms offline Video-LLMs into streaming-capable models. It addresses two fundamental challenges in adapting existing models to online scenarios: limited capability for multi-turn real-time understanding and the lack of a proactive response mechanism. StreamBridge combines a memory buffer with a round-decayed compression strategy, supporting long-context interactions. It also incorporates a decoupled, lightweight activation model that integrates with existing Video-LLMs for proactive response generation. The researchers also introduce Stream-IT, a large-scale dataset designed for streaming video understanding, featuring interleaved video-text sequences and diverse instruction formats.
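The article does not spell out how the round-decayed compression works, so the following is only a rough sketch under stated assumptions: visual tokens are stored per conversation round, a fixed token budget applies, and when the budget is exceeded the oldest rounds are collapsed first by mean pooling while the newest round is left intact. The class `RoundDecayedBuffer`, the `max_tokens` budget, and the pooling rule are all hypothetical choices made for illustration, not the paper's implementation.

```python
from dataclasses import dataclass, field
from typing import List

Token = List[float]  # one visual token embedding (hypothetical representation)


def mean_pool(tokens: List[Token]) -> Token:
    """Average a non-empty list of equally sized token vectors into one."""
    dim = len(tokens[0])
    return [sum(t[i] for t in tokens) / len(tokens) for i in range(dim)]


@dataclass
class RoundDecayedBuffer:
    """Keeps per-round visual tokens and compresses the oldest rounds first."""

    max_tokens: int  # hypothetical token budget
    rounds: List[List[Token]] = field(default_factory=list)

    def total_tokens(self) -> int:
        return sum(len(r) for r in self.rounds)

    def add_round(self, tokens: List[Token]) -> None:
        """Append the newest round, then compress older rounds if needed."""
        self.rounds.append(list(tokens))
        self._compress()

    def _compress(self) -> None:
        # Walk from the oldest round forward; never touch the latest round.
        for idx in range(len(self.rounds) - 1):
            if self.total_tokens() <= self.max_tokens:
                return
            if len(self.rounds[idx]) > 1:
                # Collapse the whole round into a single mean-pooled token.
                self.rounds[idx] = [mean_pool(self.rounds[idx])]


# Toy usage with 2-dimensional tokens and a budget of 6 tokens.
buf = RoundDecayedBuffer(max_tokens=6)
buf.add_round([[1.0, 1.0], [3.0, 3.0]])              # total: 2 tokens
buf.add_round([[0.0, 2.0], [2.0, 0.0], [4.0, 4.0]])  # total: 5 tokens
buf.add_round([[5.0, 5.0], [7.0, 7.0]])              # total: 7, over budget
print([len(r) for r in buf.rounds])  # [1, 3, 2]
```

In this sketch older rounds lose detail before recent ones do, which matches the intuition that the most recent segment matters most. If every older round is already a single token and the budget is still exceeded, the buffer simply stays over budget; a real system would need a further policy for that case.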

    The StreamBridge framework is evaluated on three mainstream offline Video-LLMs: LLaVA-OV-7B, Qwen2-VL-7B, and Oryx-1.5-7B. For training, Stream-IT is supplemented with approximately 600K samples from established datasets, including LLaVA-178K, VCG-Plus, and ShareGPT4Video, to maintain general video understanding capabilities. Multi-turn real-time understanding is evaluated on OVO-Bench and StreamingBench, focusing on their real-time tasks. General video understanding is evaluated across seven benchmarks: three short-video datasets (MVBench, PerceptionTest, TempCompass) and four long-video benchmarks (EgoSchema, LongVideoBench, MLVU, VideoMME).

    The evaluation results show that Qwen2-VL† improves, with average scores rising from 55.98 to 63.35 on OVO-Bench and from 69.04 to 72.01 on StreamingBench. In contrast, LLaVA-OV† shows a slight decrease, dropping from 64.02 to 61.64 on OVO-Bench and from 71.12 to 68.39 on StreamingBench. Fine-tuning on the Stream-IT dataset yields substantial improvements across all models. Oryx-1.5† achieves gains of +11.92 on OVO-Bench and +4.2 on StreamingBench. Moreover, Qwen2-VL† reaches average scores of 71.30 on OVO-Bench and 77.04 on StreamingBench after Stream-IT fine-tuning, outperforming even proprietary models such as GPT-4o and Gemini 1.5 Pro. These results indicate that StreamBridge is effective at enhancing streaming video understanding.
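For readers who want the size of each change at a glance, the short snippet below computes the differences between the before and after averages quoted in the paragraph above. The numbers are copied from the text; the dictionary layout and variable names are just a convenience for this illustration.

```python
# (before, after) average scores as quoted in the text above.
reported = {
    ("Qwen2-VL", "OVO-Bench"): (55.98, 63.35),
    ("Qwen2-VL", "StreamingBench"): (69.04, 72.01),
    ("LLaVA-OV", "OVO-Bench"): (64.02, 61.64),
    ("LLaVA-OV", "StreamingBench"): (71.12, 68.39),
}

# Round to two decimals to avoid floating-point noise in the differences.
deltas = {
    key: round(after - before, 2) for key, (before, after) in reported.items()
}

for (model, bench), delta in deltas.items():
    print(f"{model} on {bench}: {delta:+.2f}")
```

This gives +7.37 and +2.97 for Qwen2-VL, and -2.38 and -2.73 for LLaVA-OV, on OVO-Bench and StreamingBench respectively.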

    In conclusion, the researchers introduced StreamBridge, a method for transforming offline Video-LLMs into effective streaming-capable models. Its two innovations, a memory buffer with a round-decayed compression strategy and a decoupled lightweight activation model, address the core challenges of streaming video understanding without compromising general performance. They also introduced the Stream-IT dataset for streaming video understanding, built from specialized interleaved video-text sequences. As streaming video understanding becomes increasingly essential in robotics and autonomous driving, StreamBridge offers a generalizable way to turn static Video-LLMs into dynamic, responsive systems capable of meaningful interaction in continuously evolving visual environments.



    The post Offline Video-LLMs Can Now Understand Real-Time Streams: Apple Researchers Introduce StreamBridge to Enable Multi-Turn and Proactive Video Understanding appeared first on MarkTechPost.
