
    Microsoft and Tsinghua University Researchers Introduce Distilled Decoding: A New Method for Accelerating Image Generation in Autoregressive Models without Quality Loss

    December 27, 2024

    Autoregressive (AR) models have changed the field of image generation, setting new benchmarks for producing high-quality visuals. These models break the image creation process into sequential steps, generating each token conditioned on the tokens before it, which yields outputs with exceptional realism and coherence. Researchers have widely adopted AR techniques in computer vision, gaming, and digital content creation. However, the potential of AR models is often constrained by their inherent inefficiencies, particularly a slow generation process that remains a significant hurdle in real-time applications.

    Chief among these concerns is speed. Token-by-token generation is inherently sequential: each new token must wait for its predecessor to complete. This limits scalability and causes high latency in image generation tasks. For instance, generating a 256×256 image with a traditional AR model such as LlamaGen requires 256 steps, which translates to roughly five seconds on modern GPUs. Such delays hinder deployment in applications that demand instantaneous results, and while AR models excel at maintaining output fidelity, they struggle to meet the growing demand for both speed and quality in large-scale implementations.
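
    To make the bottleneck concrete, here is a minimal sketch of token-by-token AR decoding in Python. The model object and its next_token_logits method are hypothetical stand-ins for illustration, not LlamaGen's actual API:

        import torch

        def generate_ar(model, num_tokens=256):
            # Each iteration must wait for the previous one to finish:
            # 256 serial steps for a 256x256 image's token grid.
            tokens = torch.empty(0, dtype=torch.long)
            for _ in range(num_tokens):
                logits = model.next_token_logits(tokens)  # conditioned on all prior tokens
                probs = torch.softmax(logits, dim=-1)
                next_token = torch.multinomial(probs, num_samples=1)
                tokens = torch.cat([tokens, next_token])
            return tokens  # mapped back to pixels by a VQ decoder

    The loop body cannot be parallelized across positions, which is exactly why latency grows linearly with the token count.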

    Efforts to accelerate AR models have yielded various methods, such as predicting multiple tokens simultaneously or adopting masking strategies during generation. These approaches aim to reduce the required steps but often compromise the quality of the generated images. For example, in multi-token generation techniques, the assumption of conditional independence among tokens introduces artifacts, undermining the cohesiveness of the output. Similarly, masking-based methods allow for faster generation by training models to predict specific tokens based on others, but their effectiveness diminishes when generation steps are drastically reduced. These limitations highlight the need for a new approach to enhance AR model efficiency.
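
    For contrast, here is a hedged sketch of the multi-token idea and where its independence assumption bites; block_logits is an assumed interface that returns per-position marginals for the next k tokens:

        import torch

        def generate_multi_token(model, num_tokens=256, k=8):
            # Sampling k tokens at once from per-position marginals treats them
            # as conditionally independent; the ignored joint structure is what
            # introduces artifacts in the output.
            tokens = torch.empty(0, dtype=torch.long)
            for _ in range(num_tokens // k):
                logits = model.block_logits(tokens, k)  # hypothetical: (k, vocab)
                probs = torch.softmax(logits, dim=-1)
                block = torch.multinomial(probs, num_samples=1).squeeze(-1)
                tokens = torch.cat([tokens, block])
            return tokens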

    Researchers at Tsinghua University and Microsoft Research have introduced a solution to these challenges: Distilled Decoding (DD). The method builds on flow matching, a deterministic mapping that connects Gaussian noise to the output distribution of a pre-trained AR model. Unlike conventional methods, DD does not require access to the original training data of the AR models, making it more practical for deployment. The research demonstrated that DD can cut the generation process from hundreds of steps to as few as one or two while preserving the quality of the output. For example, on ImageNet-256, DD achieved a speed-up of 6.3x for VAR models and an impressive 217.8x for LlamaGen, reducing generation steps from 256 to just one.

    The technical foundation of DD is its ability to create a deterministic trajectory for token generation. Using flow matching, DD maps noisy inputs to tokens so that their distribution aligns with that of the pre-trained AR model. During training, this mapping is distilled into a lightweight network that directly predicts the final token sequence from a noise input. The result is faster generation with the flexibility to balance speed and quality by allowing intermediate steps when needed. Unlike existing methods, DD eliminates the trade-off between speed and fidelity, enabling scalable implementations across diverse tasks.
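
    A conceptual sketch of the one-step generation this enables. Here dd_net stands in for the distilled network, and the shapes and noise_dim are illustrative assumptions, not the paper's exact architecture:

        import torch

        def generate_dd(dd_net, num_tokens=256, noise_dim=64):
            # One Gaussian sample per token slot; a single forward pass through
            # the distilled network replaces the 256-step AR loop.
            noise = torch.randn(num_tokens, noise_dim)
            logits = dd_net(noise)  # deterministic mapping learned via flow matching
            return logits.argmax(dim=-1)  # (num_tokens,) token ids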

    In experiments, DD demonstrates clear advantages over traditional methods. Using VAR-d16 models, DD achieved one-step generation with the FID score rising from 4.19 to 9.96, a modest quality degradation for a 6.3x speed-up. For LlamaGen models, reducing the steps from 256 to one yielded an FID of 11.35, compared to 4.11 for the original model, with a remarkable 217.8x speed improvement. DD showed similar efficiency in text-to-image tasks, cutting generation steps from 256 to two while maintaining a comparable FID of 28.95 versus 25.70. These results underline DD’s ability to drastically increase speed without a significant loss in image quality, a feat unmatched by the baseline methods.
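
    As a rough sanity check, combining the figures above (256 steps taking about five seconds for LlamaGen) and assuming wall-clock latency scales inversely with the reported speed-up:

        5 s / 217.8 ≈ 23 ms per image for one-step generation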


    Several key takeaways from the research on DD include:

    1. DD reduces generation steps by orders of magnitude, achieving up to 217.8x faster generation than traditional AR models.
    2. Despite the accelerated process, DD maintains acceptable quality levels, with FID score increases remaining within manageable ranges.
    3. DD demonstrated consistent performance across different AR models, including VAR and LlamaGen, regardless of their token sequence definitions or model sizes.
    4. The approach allows users to balance quality and speed by choosing one-step, two-step, or multi-step generation paths based on their requirements (see the sketch after this list).
    5. The method eliminates the need for the original AR model training data, making it feasible for practical applications in scenarios where such data is unavailable.
    6. Due to its efficient distillation approach, DD can potentially extend to other domains, such as text-to-image synthesis and language modeling.
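
    As noted in takeaway 4, the step count is a tunable knob. A hedged multi-step sketch, where dd_net and its to_tokens readout are assumed interfaces rather than the published implementation:

        import torch

        def generate_dd_multistep(dd_net, num_tokens=256, noise_dim=64, num_steps=2):
            # One step is fastest; a few intermediate steps trade some speed
            # back for fidelity.
            x = torch.randn(num_tokens, noise_dim)  # start from Gaussian noise
            for i in range(num_steps):
                t = i / num_steps   # position along the deterministic trajectory
                x = dd_net(x, t)    # hypothetical: returns the next point, same shape
            return dd_net.to_tokens(x)  # hypothetical readout to token ids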

    In conclusion, with the introduction of Distilled Decoding, researchers have addressed the longstanding speed-quality trade-off that has plagued AR generation by leveraging flow matching and deterministic mappings. The method accelerates image synthesis by drastically reducing generation steps while preserving output fidelity and scalability. With its robust performance, adaptability, and practical deployment advantages, Distilled Decoding opens new frontiers in real-time applications of AR models and sets the stage for further innovation in generative modeling.


    Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.


    The post Microsoft and Tsinghua University Researchers Introduce Distilled Decoding: A New Method for Accelerating Image Generation in Autoregressive Models without Quality Loss appeared first on MarkTechPost.
