    Google AI Proposes a Fundamental Framework for Inference-Time Scaling in Diffusion Models

    January 20, 2025

    Generative models have transformed fields such as language, vision, and biology through their ability to learn and sample from complex data distributions. These models benefit from training-time scaling through more data, more compute, and larger model sizes, but scaling them at inference time is harder. Diffusion models, which excel at generating continuous data such as images, audio, and video through a denoising process, show limited improvement when the number of function evaluations (NFE) is simply increased during inference. The traditional approach of adding more denoising steps yields little further improvement despite the additional computational investment.

    Various approaches have been explored to improve generative models at inference time. Test-time compute scaling has proven effective for LLMs through improved search algorithms, verification methods, and compute allocation strategies. For diffusion models, researchers have pursued fine-tuning, reinforcement learning, and direct preference optimization. Sample selection and optimization methods have also been developed using Random Search algorithms, VQA models, and human preference models. However, these methods focus either on training-time improvements or on limited test-time optimizations, leaving room for a more general approach to inference-time scaling.

    Researchers from NYU, MIT, and Google have proposed a fundamental framework for scaling diffusion models at inference time. Their approach moves beyond simply increasing denoising steps and introduces a search-based method that improves generation quality by finding better initial noise. The framework operates along two dimensions: verifiers, which provide feedback on candidate quality, and algorithms, which search for better noise candidates. This gives a structured way to spend additional compute during inference, and the two components can be combined differently to suit specific application scenarios.

    The framework’s initial experiments center on class-conditional ImageNet generation using a pre-trained SiT-XL model at 256 × 256 resolution with a second-order Heun sampler. The setup keeps the number of denoising steps fixed at 250 and spends additional NFEs on search. The core search mechanism is a Random Search algorithm that applies a Best-of-N strategy to select the best noise candidate. Two oracle verifiers are used: Inception Score (IS) and Fréchet Inception Distance (FID). IS selection picks the candidate with the highest classification probability from a pre-trained InceptionV3 model, while FID selection minimizes divergence from pre-calculated ImageNet Inception feature statistics.
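The Best-of-N random search described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `toy_generate` stands in for the fixed-step diffusion sampler, `toy_verifier` stands in for a verifier such as IS or FID, and all function names and parameters here are hypothetical.

```python
import math
import random
from typing import Callable, List, Tuple

Noise = List[float]
Sample = List[float]


def sample_noise(dim: int, rng: random.Random) -> Noise:
    """Draw one Gaussian noise candidate of length `dim`."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def toy_generate(noise: Noise) -> Sample:
    """Hypothetical stand-in for the sampler (a real one would denoise)."""
    return [math.tanh(x) for x in noise]


def toy_verifier(sample: Sample) -> float:
    """Hypothetical stand-in for a verifier; higher scores are better."""
    return -sum((x - 0.5) ** 2 for x in sample)


def best_of_n(
    n: int,
    dim: int,
    generate: Callable[[Noise], Sample],
    verifier: Callable[[Sample], float],
    seed: int = 0,
) -> Tuple[Noise, float]:
    """Random search: draw n noise candidates, keep the best-scoring one."""
    rng = random.Random(seed)
    best_noise: Noise = []
    best_score = float("-inf")
    for _ in range(n):
        noise = sample_noise(dim, rng)
        score = verifier(generate(noise))  # one full generation per candidate
        if score > best_score:
            best_noise, best_score = noise, score
    return best_noise, best_score


if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        _, score = best_of_n(n, dim=8, generate=toy_generate, verifier=toy_verifier)
        print(f"N={n:>2}  best verifier score = {score:.4f}")
```

Because every candidate requires a full generation pass plus a verifier call, the search budget grows linearly with N, which is the sense in which extra inference compute is spent on search rather than on more denoising steps.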

    The framework was also evaluated on text-to-image benchmarks. On DrawBench, which features diverse text prompts, evaluation with an LLM Grader shows that searching with any of the verifiers consistently improves sample quality, though the pattern of improvement differs across setups. ImageReward and the Verifier Ensemble perform well, improving all metrics thanks to their nuanced evaluation capabilities and alignment with human preferences. On T2I-CompBench, which emphasizes text-prompt accuracy rather than visual quality, the best configurations are different: ImageReward is the top performer, Aesthetic Score has minimal or even negative impact, and CLIP provides modest improvements.
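The article does not say how the Verifier Ensemble combines its members. One common way to merge verifiers whose scores live on different scales is to average their rankings, and the sketch below assumes that scheme. The function names are hypothetical and the scores are invented purely for illustration.

```python
from typing import Dict, List, Sequence


def rank_positions(scores: Sequence[float]) -> List[int]:
    """Return each candidate's rank under one verifier (0 = highest score)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for position, candidate in enumerate(order):
        ranks[candidate] = position
    return ranks


def ensemble_select(candidate_scores: Dict[str, Sequence[float]]) -> int:
    """Pick the candidate index with the lowest summed rank across verifiers.

    Assumption: rank aggregation, chosen here because raw scores from
    different verifiers are not directly comparable.
    """
    n = len(next(iter(candidate_scores.values())))
    totals = [0] * n
    for scores in candidate_scores.values():
        for candidate, rank in enumerate(rank_positions(scores)):
            totals[candidate] += rank
    return min(range(n), key=lambda i: totals[i])


if __name__ == "__main__":
    # Invented scores for three candidate images under three verifiers.
    scores = {
        "aesthetic": [0.9, 0.5, 0.7],
        "clip": [0.2, 0.3, 0.8],
        "image_reward": [0.1, 0.4, 0.6],
    }
    print("selected candidate:", ensemble_select(scores))
```

In this toy case the candidate favored by the aesthetic scorer alone is not the one the ensemble selects, which mirrors the article's point that individual verifiers carry different biases.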


    In conclusion, the researchers advance diffusion modeling by introducing a framework for inference-time scaling through search. The study shows that scaling compute via search can yield substantial performance improvements across different model sizes and generation tasks, with different computational budgets producing distinct scaling behaviors. While the approach is successful, it also exposes the inherent biases of different verifiers and underscores the importance of developing task-specific verification methods. This insight opens avenues for future research into more targeted and efficient verifiers for vision generation tasks.



    The post Google AI Proposes a Fundamental Framework for Inference-Time Scaling in Diffusion Models appeared first on MarkTechPost.
