
    National University of Singapore Researchers Introduce Dimple: A Discrete Diffusion Multimodal Language Model for Efficient and Controllable Text Generation

    May 29, 2025

    In recent months, there has been growing interest in applying diffusion models, originally designed for continuous data such as images, to natural language processing tasks. This has led to the development of Discrete Diffusion Language Models (DLMs), which treat text generation as a denoising process. Unlike traditional autoregressive models, DLMs decode tokens in parallel and give better control over structure: entire sequences can be initialized flexibly, the output format can be constrained explicitly, and infilling improves thanks to bidirectional attention. Their non-sequential nature also opens the door to faster generation. Despite these benefits, most current multimodal large language models (MLLMs), such as LLaVA, Qwen-VL, and InternVL, still rely solely on autoregressive methods.

    Work on diffusion-based language models has explored both continuous and discrete diffusion spaces. Continuous approaches, such as DiffuSeq and SED, use embedding or relaxed categorical spaces for smoother generation. In contrast, discrete models like SDDM and RDM tailor the diffusion process to linguistic structures. Training techniques vary, but commonly use masked language modeling losses or entropy-based score matching. Some hybrid models, such as AR-Diffusion and SSD-LM, combine autoregressive and diffusion strategies to leverage the strengths of both approaches. Meanwhile, open-source MLLMs such as LLaVA and InternVL have advanced through visual instruction tuning and joint pretraining, yet still follow an autoregressive generation scheme.

    Researchers at the National University of Singapore present Dimple, the first Discrete Diffusion Multimodal Large Language Model (DMLLM), which integrates a vision encoder with a discrete diffusion-based language model. To overcome the instability and performance issues of purely diffusion-based training, they introduce a two-phase training method, Autoregressive-then-Diffusion, that combines initial autoregressive alignment with subsequent diffusion-based masked language modeling. Dimple-7B surpasses LLaVA-NEXT by 3.9% on the reported benchmarks. The team also introduces Confident Decoding, which adjusts how many tokens are generated at each step, and explores Structure Priors for precise control over output. Together, these techniques improve inference efficiency, generation flexibility, and structural controllability without sacrificing performance.
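
To make the difference between the two training phases concrete, here is a minimal, framework-free sketch. It is an illustration under stated assumptions, not Dimple's actual training code: the function names, the `[MASK]` symbol, and the independent per-token masking rule are all hypothetical. It shows a causal attention mask (autoregressive phase), a full bidirectional mask (diffusion phase), and a corruption step after which only the masked positions would contribute to the loss.

```python
import random
from typing import List, Tuple

MASK = "[MASK]"  # hypothetical mask symbol, not taken from the paper


def causal_mask(n: int) -> List[List[bool]]:
    """Autoregressive phase: position i may attend only to positions j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n: int) -> List[List[bool]]:
    """Diffusion phase: every position may attend to every other position."""
    return [[True] * n for _ in range(n)]


def corrupt(
    tokens: List[str], mask_ratio: float, rng: random.Random
) -> Tuple[List[str], List[int]]:
    """Mask each token independently with probability `mask_ratio` (an assumed rule).

    Returns the corrupted sequence and the indices that were masked; in a
    masked-language-modeling loss only those indices would be supervised.
    """
    corrupted: List[str] = []
    supervised: List[int] = []
    for i, tok in enumerate(tokens):
        if rng.random() < mask_ratio:
            corrupted.append(MASK)
            supervised.append(i)
        else:
            corrupted.append(tok)
    return corrupted, supervised


# Usage: corrupt a toy answer; the result depends on the seeded generator.
answer = ["a", "dog", "on", "a", "beach"]
noisy, supervised = corrupt(answer, mask_ratio=0.4, rng=random.Random(0))
```

Because only the masked positions are supervised in the diffusion phase, each training sample yields fewer learning signals than next-token prediction, which supervises every position. This is one way to read the "sparse supervision" concern the article raises.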

    Architecturally, Dimple pairs a vision encoder with a diffusion-based language model. To address inefficiencies in diffusion training, such as sparse supervision and limited generation coverage, the model is trained in two phases: first with autoregressive training using a causal attention mask for vision-language alignment, then with diffusion training to restore generation capabilities. During inference, a dynamic “Confident Decoding” strategy adapts token updates based on prediction confidence. Despite using significantly fewer training samples, Dimple exhibits competitive performance on multiple benchmarks, outperforming similar-scale autoregressive models, although it trails behind larger-scale state-of-the-art systems.
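
The confidence-based decoding idea can be sketched as follows. This is a toy illustration under assumptions, not the authors' implementation: the threshold value, the fallback rule (commit the single most confident token when nothing clears the threshold), and the `toy_predict` stand-in for a model are all hypothetical.

```python
from typing import Callable, List, Optional, Tuple

MASK = None  # hypothetical stand-in for a [MASK] token
Prediction = Tuple[str, float]  # (predicted token, confidence in [0, 1])
Sequence = List[Optional[str]]


def confident_decode(
    seq: Sequence,
    predict: Callable[[Sequence], List[Prediction]],
    threshold: float = 0.9,
    max_steps: int = 64,
) -> Tuple[Sequence, int]:
    """Commit every masked position whose confidence reaches `threshold`.

    If no masked position qualifies, commit the single most confident one so
    each step makes progress. Returns the sequence and the steps used.
    """
    seq = list(seq)
    steps = 0
    while MASK in seq and steps < max_steps:
        preds = predict(seq)
        masked = [i for i, tok in enumerate(seq) if tok is MASK]
        chosen = [i for i in masked if preds[i][1] >= threshold]
        if not chosen:
            chosen = [max(masked, key=lambda i: preds[i][1])]
        for i in chosen:
            seq[i] = preds[i][0]
        steps += 1
    return seq, steps


TARGET = ["a", "dog", "runs", "fast"]


def toy_predict(seq: Sequence) -> List[Prediction]:
    """Toy model: even positions are always confident; odd positions become
    confident only once their left neighbour has been decoded."""
    preds = []
    for i, tok in enumerate(TARGET):
        sure = i % 2 == 0 or seq[i - 1] is not MASK
        preds.append((tok, 0.95 if sure else 0.5))
    return preds


decoded, steps = confident_decode([MASK] * 4, toy_predict)
# decoded == TARGET after 2 steps, versus 4 steps when decoding one token at a time
```

The number of tokens committed per step is not fixed in advance: easy positions are filled together, and harder ones wait until more context is available.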

    The experiments evaluate Dimple, a DMLLM, against autoregressive models on instruction-following tasks. Dimple, trained with a hybrid strategy that combines autoregressive and diffusion tuning, exhibits strong performance, surpassing models with similar training data on most benchmarks. Although it lags behind models trained on much larger datasets, Dimple benefits from a stronger base language model. Ablation studies reveal that combining autoregressive and diffusion tuning mitigates issues like length bias and improves consistency. Prefilling further boosts inference speed significantly, with only minor performance drops, making the model both efficient and competitive in multimodal understanding tasks. 

    In conclusion, Dimple, the first DMLLM, is designed to overcome the limitations of purely discrete diffusion training, such as instability and length bias. It employs a hybrid training approach that starts with autoregressive learning and follows with diffusion tuning, yielding the Dimple-7B model, which outperforms LLaVA-NEXT by 3.9%. Its decoding strategy, Confident Decoding, significantly reduces the number of inference steps, while prefilling improves speed with minimal performance trade-offs. Dimple also enables structured and controllable outputs through Structure Priors, offering fine-grained control over format and length, capabilities that autoregressive models struggle to provide.
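
One way to picture a structure prior is as a partially filled starting sequence: fixed tokens pin down the format, and the number of mask tokens in each slot pins down the length. The sketch below is a hedged illustration of that idea only; the template syntax, slot names, and helper functions are hypothetical and not taken from the Dimple codebase.

```python
from typing import Dict, List

MASK = "[MASK]"  # hypothetical mask symbol


def build_structure_prior(template: List[str], slot_lengths: Dict[str, int]) -> List[str]:
    """Expand named slots such as "{answer}" into runs of MASK tokens.

    All other template pieces are kept as fixed tokens.
    """
    seq: List[str] = []
    for piece in template:
        if piece.startswith("{") and piece.endswith("}"):
            seq.extend([MASK] * slot_lengths[piece[1:-1]])
        else:
            seq.append(piece)
    return seq


def fill_masks(seq: List[str], predictions: List[str]) -> List[str]:
    """Write predictions into masked positions only; fixed tokens are kept."""
    return [predictions[i] if tok == MASK else tok for i, tok in enumerate(seq)]


# Usage: force an "Answer: ... . Reason: ... ." layout with fixed slot lengths.
template = ["Answer", ":", "{answer}", ".", "Reason", ":", "{reason}", "."]
prior = build_structure_prior(template, {"answer": 1, "reason": 3})
filled = fill_masks(prior, ["x"] * len(prior))
```

A decoder that only ever updates masked positions cannot break the layout, which is harder to guarantee when tokens are generated strictly left to right.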


    Check out the Paper, the Model on Hugging Face, and the GitHub Page. All credit for this research goes to the researchers of this project.

    The post National University of Singapore Researchers Introduce Dimple: A Discrete Diffusion Multimodal Language Model for Efficient and Controllable Text Generation appeared first on MarkTechPost.
