    Hybrid AI model crafts smooth, high-quality videos in seconds

    May 6, 2025

    What would a behind-the-scenes look at a video generated by an artificial intelligence model be like? You might think the process is similar to stop-motion animation, where many images are created and stitched together, but that’s not quite the case for “diffusion models” like OpenAI’s Sora and Google’s Veo 2.

    Instead of producing a video frame-by-frame (or “autoregressively”), these systems process the entire sequence at once. The resulting clip is often photorealistic, but the process is slow and doesn’t allow for on-the-fly changes. 

    Scientists from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and Adobe Research have now developed a hybrid approach, called “CausVid,” to create videos in seconds. Much like a quick-witted student learning from a well-versed teacher, a full-sequence diffusion model trains an autoregressive system to swiftly predict the next frame while ensuring high quality and consistency. CausVid’s student model can then generate clips from a simple text prompt, turn a photo into a moving scene, extend a video, or alter its creations with new inputs mid-generation.

    This dynamic tool enables fast, interactive content creation, cutting a 50-step process into just a few actions. It can craft many imaginative and artistic scenes, such as a paper airplane morphing into a swan, woolly mammoths venturing through snow, or a child jumping in a puddle. Users can also make an initial prompt, like “generate a man crossing the street,” and then make follow-up inputs to add new elements to the scene, like “he writes in his notebook when he gets to the opposite sidewalk.”
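
The interactive workflow described above depends on frames being produced one after another, so a prompt that arrives partway through can still shape the frames that have not been generated yet. The following is a minimal sketch of that control flow only, with no real model behind it. The function `stream_frames` and its `prompt_updates` argument are hypothetical names chosen for illustration and are not part of CausVid.

```python
from typing import Dict, Iterator


def stream_frames(num_frames: int, prompt_updates: Dict[int, str]) -> Iterator[str]:
    """Yield placeholder frames one at a time.

    prompt_updates maps a frame index to the prompt that takes effect
    from that frame onward (a hypothetical interface for illustration).
    """
    active_prompt = ""
    for index in range(num_frames):
        # A prompt arriving mid-generation only affects frames not yet produced.
        active_prompt = prompt_updates.get(index, active_prompt)
        yield f"frame {index}: {active_prompt}"


frames = list(
    stream_frames(
        num_frames=4,
        prompt_updates={
            0: "a man crossing the street",
            2: "he writes in his notebook when he gets to the opposite sidewalk",
        },
    )
)
for frame in frames:
    print(frame)
```

A full-sequence generator could not offer this, because none of its frames are finished until the whole clip is.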

    The CSAIL researchers say that the model could be used for different video editing tasks, like helping viewers understand a livestream in a different language by generating a video that syncs with an audio translation. It could also help render new content in a video game or quickly produce training simulations to teach robots new tasks.

    Tianwei Yin SM ’25, PhD ’25, a recently graduated student in electrical engineering and computer science and CSAIL affiliate, attributes the model’s strength to its mixed approach.

    “CausVid combines a pre-trained diffusion-based model with autoregressive architecture that’s typically found in text generation models,” says Yin, co-lead author of a new paper about the tool. “This AI-powered teacher model can envision future steps to train a frame-by-frame system to avoid making rendering errors.”

    Yin’s co-lead author, Qiang Zhang, is a research scientist at xAI and a former CSAIL visiting researcher. They worked on the project with Adobe Research scientists Richard Zhang, Eli Shechtman, and Xun Huang, and two CSAIL principal investigators: MIT professors Bill Freeman and Frédo Durand.

    Caus(Vid) and effect

    Many autoregressive models can create a video that’s initially smooth, but the quality tends to drop off later in the sequence. A clip of a person running might seem lifelike at first, but their legs begin to flail in unnatural directions, indicating frame-to-frame inconsistencies (also called “error accumulation”).
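
Error accumulation can be illustrated with a toy numeric rollout. This sketch assumes a one-dimensional "frame" whose true value grows by exactly 1.0 per step, and a predictor that overshoots by a small fixed amount each time. Both are invented for illustration and say nothing about how real video models err. Because each prediction becomes the input to the next, the small mistake compounds.

```python
from typing import List


def rollout(start: float, steps: int, per_step_error: float) -> List[float]:
    """Predict each value from the previous prediction.

    The true sequence grows by exactly 1.0 per step. The toy predictor
    overshoots by per_step_error each time, and that mistake is fed
    back in as the next input.
    """
    values = [start]
    for _ in range(steps):
        values.append(values[-1] + 1.0 + per_step_error)
    return values


truth = rollout(0.0, 100, 0.0)
predicted = rollout(0.0, 100, 0.05)

# Absolute gap between the prediction and the true sequence at every step.
drift = [abs(p - t) for p, t in zip(predicted, truth)]
print(f"drift after 1 step: {drift[1]:.2f}")
print(f"drift after 10 steps: {drift[10]:.2f}")
print(f"drift after 100 steps: {drift[100]:.2f}")
```

The early part of the sequence looks almost correct, while the later part is visibly off, which matches the pattern of a clip that starts smooth and degrades.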

    Error-prone video generation was common in prior causal approaches, which learned to predict frames one by one on their own. CausVid instead uses a high-powered diffusion model to teach a simpler system its general video expertise, enabling it to create smooth visuals, but much faster.
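
The teacher-student idea can be sketched in a few lines. This is a generic, regression-style distillation on a toy problem, not the training procedure from the CausVid paper. The `teacher` function, the one-parameter student, and all numeric settings are assumptions made for illustration. The teacher reaches its answer through 50 refinement steps, and the student learns to reproduce that answer in a single multiplication.

```python
from typing import List


def teacher(x: float) -> float:
    """Stand-in for a slow, high-quality model that refines its answer over 50 steps."""
    estimate = 0.0
    for _ in range(50):
        estimate += (3.0 * x - estimate) * 0.5  # each step moves halfway toward 3 * x
    return estimate


def train_student(inputs: List[float], learning_rate: float = 0.01, epochs: int = 200) -> float:
    """Fit a one-parameter student, y = w * x, to the teacher's outputs."""
    targets = [teacher(x) for x in inputs]  # the teacher supplies the training signal
    w = 0.0
    for _ in range(epochs):
        for x, y in zip(inputs, targets):
            grad = 2.0 * (w * x - y) * x  # derivative of the squared error with respect to w
            w -= learning_rate * grad
    return w


weight = train_student([0.5, 1.0, 1.5, 2.0])
print(f"student weight: {weight:.4f}")
print(f"teacher(4.0) = {teacher(4.0):.4f}, student(4.0) = {weight * 4.0:.4f}")
```

Once trained, the student answers with one operation where the teacher needs 50, which is the kind of speed-up the article describes, at toy scale.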

    CausVid displayed its video-making aptitude when researchers tested its ability to make high-resolution, 10-second-long videos. It outperformed baselines like “OpenSORA” and “MovieGen,” working up to 100 times faster than its competition while producing the most stable, high-quality clips.

    Then, Yin and his colleagues tested CausVid’s ability to put out stable 30-second videos, where it also topped comparable models on quality and consistency. These results indicate that CausVid may eventually produce stable, hours-long videos, or even videos of indefinite duration.

    A subsequent study revealed that users preferred the videos generated by CausVid’s student model over those of its diffusion-based teacher.

    “The speed of the autoregressive model really makes a difference,” says Yin. “Its videos look just as good as the teacher’s ones, but with less time to produce, the trade-off is that its visuals are less diverse.”

    CausVid also excelled when tested on over 900 prompts using a text-to-video dataset, receiving the top overall score of 84.27. It boasted the best metrics in categories like imaging quality and realistic human actions, eclipsing state-of-the-art video generation models like “Vchitect” and “Gen-3.”

    CausVid is an efficient step forward in AI video generation, and it may soon be able to design visuals even faster, perhaps instantly, with a smaller causal architecture. Yin says that if the model is trained on domain-specific datasets, it will likely create higher-quality clips for robotics and gaming.

    Experts say that this hybrid system is a promising upgrade from diffusion models, which are currently bogged down by processing speeds. “[Diffusion models] are way slower than LLMs [large language models] or generative image models,” says Carnegie Mellon University Assistant Professor Jun-Yan Zhu, who was not involved in the paper. “This new work changes that, making video generation much more efficient. That means better streaming speed, more interactive applications, and lower carbon footprints.”

    The team’s work was supported, in part, by the Amazon Science Hub, the Gwangju Institute of Science and Technology, Adobe, Google, the U.S. Air Force Research Laboratory, and the U.S. Air Force Artificial Intelligence Accelerator. CausVid will be presented at the Conference on Computer Vision and Pattern Recognition in June.
