Google AI Proposes a Fundamental Framework for Inference-Time Scaling in Diffusion Models

Generative models have revolutionized fields like language, vision, and biology through their ability to learn and sample from complex data distributions. While these models benefit from scaling up during training through increased data, computational resources, and model sizes, their inference-time scaling capabilities face significant challenges. Specifically, diffusion models, which excel in generating continuous data like images, audio, and videos through a denoising process, encounter limitations in performance improvement when simply increasing the number of function evaluations (NFE) during inference. The traditional approach of adding more denoising steps prevents these models from achieving better results despite additional computational investment.

Various approaches have been explored to enhance the performance of generative models during inference. Test-time compute scaling has proven effective for LLMs through improved search algorithms, verification methods, and compute allocation strategies. Researchers have pursued multiple directions in diffusion models including fine-tuning approaches, reinforcement learning techniques, and implementing direct preference optimization. Moreover, sample selection and optimization methods have been developed using Random Search algorithms, VQA models, and human preference models. However, these methods either focus on training-time improvements or limited test-time optimizations, leaving room for more detailed inference-time scaling solutions.

Researchers from NYU, MIT, and Google have proposed a fundamental framework for scaling diffusion models during inference time. Their approach moves beyond simply increasing denoising steps and introduces a novel search-based methodology for improving generation performance through better noise identification. The framework operates along two key dimensions: utilizing verifiers for feedback and implementing algorithms to discover superior noise candidates. This approach addresses the limitations of conventional scaling methods by introducing a structured way to use additional computational resources during inference. The framework’s flexibility allows component combinations to be tailored to specific application scenarios.

The framework’s implementation centers on class-conditional ImageNet generation using a pre-trained SiT-XL model with 256 × 256 resolution and a second-order Heun sampler. The architecture maintains a fixed 250 denoising steps while exploring additional NFEs dedicated to search operations. The core search mechanism employs a Random Search algorithm, implementing a Best-of-N strategy to select optimal noise candidates. The system utilizes two Oracle Verifiers for verification: Inception Score (IS) and Fréchet Inception Distance (FID). IS selection is based on the highest classification probability from a pre-trained InceptionV3 model, while FID selection minimizes divergence against pre-calculated ImageNet Inception feature statistics.

The framework’s effectiveness has been shown through comprehensive testing on different benchmarks. On DrawBench, which features diverse text prompts, the LLM Grader evaluation shows that searching with various verifiers consistently improves sample quality, though with different patterns across setups. ImageReward and Verifier Ensemble perform well, showing improvements across all metrics due to their nuanced evaluation capabilities and alignment with human preferences. The results reveal different optimal configurations on T2I-CompBench, focusing on text-prompt accuracy rather than visual quality. ImageReward emerges as the top performer, while Aesthetic Scores show minimal or negative impact, and CLIP provides modest improvements.

In conclusion, researchers establish a significant advancement in the diffusion models by introducing a framework for inference-time scaling through strategic search mechanisms. The study shows that computational scaling via search methods can achieve substantial performance improvements across different model sizes and generation tasks, with varying computational budgets yielding distinct scaling behaviors. The research concludes that while the approach proves successful, it also reveals the inherent biases in different verifiers and emphasizes the importance of developing task-specific verification methods. This insight opens new avenues for future research in developing more targeted and efficient verification systems for various vision generation tasks.

Check out the Paper and Project Page. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. Don’t Forget to join our 65k+ ML SubReddit.

The post Google AI Proposes a Fundamental Framework for Inference-Time Scaling in Diffusion Models appeared first on MarkTechPost.

Source: Read MoreÂ

Sunshine And March Vibes (2025 Wallpapers Edition)

The Case For Minimal WordPress Setups: A Contrarian View On Theme Frameworks

How To Fix Largest Contentful Paint Issues With Subpart Analysis

How To Prevent WordPress SQL Injection Attacks

Gemini can now watch Google Drive videos for you – including work meetings

LG is still giving away a free 27-inch gaming monitor, but you’ll have to hurry

Slow Roku TV? This 30-second fix made my system run like new again

Hume’s new EVI 3 model lets you customize AI voices – how to try it

Your Agentforce Readiness Assessment

Your Agentforce Readiness Assessment

Introducing N|Sentinel: Your AI-Powered Agent for Node.js Performance Optimization

FoalTS framework – version 5 is released

KB5058499 finally makes Windows 11 24H2 stable for gaming, and it wasn’t Nvidia’s fault

KB5058499 finally makes Windows 11 24H2 stable for gaming, and it wasn’t Nvidia’s fault

Transform Your Workflow With These 10 Essential Yet Overlooked Linux Tools You Need to Try

KNOPPIX is a bootable Live system

Google AI Proposes a Fundamental Framework for Inference-Time Scaling in Diffusion Models

How to Evaluate Jailbreak Methods: A Case Study with the StrongREJECT Benchmark

Real-world applications of Amazon Nova Canvas for interior design and product photography

What is GNUnet? A Complete Guide

AI Ethics and Privacy: The Human Role in Responsible Tech

Fortinet Patches CVE-2025-32756 Zero-Day RCE Flaw Exploited in FortiVoice Systems

GitHub introduces security campaigns to help developers reduce security debt

Windows 11 file sharing could be transformed by this hidden feature, especially if you have a Surface Pro

Using Manim For Making UI Animations

CVE-2024-57235 – NETGEAR RAX5 Command Injection Vulnerability

passfzf is a simple fzf wrapper for pass

Google AI Proposes a Fundamental Framework for Inference-Time Scaling in Diffusion Models

Related Posts