This AI Paper from Stanford University Evaluates the Performance of Multimodal Foundation Models Scaling from Few-Shot to Many-Shot In-Context Learning (ICL)

May 19, 2024

Incorporating demonstration examples into the prompt, a technique known as in-context learning (ICL), significantly enhances large language models (LLMs) and large multimodal models (LMMs) without requiring any parameter updates. Recent studies confirm the efficacy of few-shot multimodal ICL, particularly in improving LMM performance on out-of-domain tasks. With the longer context windows of advanced models like GPT-4o and Gemini 1.5 Pro, researchers can now investigate the impact of scaling up the number of demonstration examples, a factor previously constrained by context window limitations.
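To make the setup concrete, here is a minimal sketch of how a many-shot multimodal ICL request might be assembled in an OpenAI-style chat message format. The helper names, prompt wording, and data layout are illustrative assumptions, not the paper's released code:

```python
# Minimal sketch: build a many-shot multimodal ICL prompt by interleaving
# (image, label) demonstration pairs before the unlabeled test image.
# Hypothetical helpers; not taken from the paper's code.
import base64

def encode_image(path: str) -> str:
    """Base64-encode an image file for inline transmission in a request."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

def build_many_shot_messages(demos, query_image_path, class_names):
    """demos: list of (image_path, label) pairs; scale it into the hundreds."""
    content = [{
        "type": "text",
        "text": f"Classify each image into one of: {', '.join(class_names)}.",
    }]
    for image_path, label in demos:
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{encode_image(image_path)}"},
        })
        content.append({"type": "text", "text": f"Answer: {label}"})
    # The unlabeled test image comes last; the model completes the answer.
    content.append({
        "type": "image_url",
        "image_url": {"url": f"data:image/jpeg;base64,{encode_image(query_image_path)}"},
    })
    content.append({"type": "text", "text": "Answer:"})
    return [{"role": "user", "content": content}]
```

Because no parameters are updated, moving from few-shot to many-shot ICL is purely a matter of growing the `demos` list until the context window is exhausted.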

Earlier work observed that LLM performance improves with more in-context examples, albeit within the limits of the context window. Recent studies extended this exploration, demonstrating improvements with over 1,000 examples, though only on text-only benchmarks. Multimodal ICL research is still emerging, with studies showing benefits for models like GPT-4V and Gemini on out-of-domain tasks. Batch querying strategies offer efficiency gains at inference time, and recently proposed variants exploit the larger context windows of newer models to further optimize performance.

To examine the potential of advanced multimodal foundation models in many-shot ICL, researchers from Stanford conducted an extensive set of experiments assessing model efficacy across 10 datasets spanning various domains and image classification tasks, scaling the number of demonstration examples far beyond the few-shot regime.

The key findings of this study include:

1. Increasing the number of demonstration examples significantly enhances model performance, with Gemini 1.5 Pro showing consistent log-linear improvements, in contrast to GPT-4o.

2. Gemini 1.5 Pro demonstrates higher ICL data efficiency than GPT-4o across most datasets.

3. Combining multiple queries into a single request can deliver comparable or superior performance to individual queries in the many-shot regime, while significantly reducing per-example latency and offering a more cost-effective inference process (see the batching sketch after this list).

4. Batched querying notably enhances performance even in the zero-shot setting, an effect attributed to domain calibration, class calibration, and the self-generated demonstration examples that accumulate during autoregressive decoding.
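A plausible sketch of the batching idea from finding 3: send the (potentially very long) demonstration prefix once, followed by several numbered test images, and parse one answer per query from a single response. The prompt wording here is an assumption, not the paper's exact batching prompt; image inputs are passed as data-URL strings like those produced by the encode_image helper above:

```python
# Sketch of batching multiple test queries into one many-shot request.
# demo_pairs: list of (data_url, label); query_urls: list of data_url strings.
def build_batched_query_messages(demo_pairs, query_urls, class_names):
    content = [{
        "type": "text",
        "text": (f"Classify each numbered query image into one of: "
                 f"{', '.join(class_names)}. Reply as 'i: label', one per line."),
    }]
    for url, label in demo_pairs:  # shared demonstrations, sent once per batch
        content.append({"type": "image_url", "image_url": {"url": url}})
        content.append({"type": "text", "text": f"Answer: {label}"})
    for i, url in enumerate(query_urls, start=1):  # the batched test queries
        content.append({"type": "text", "text": f"Query image {i}:"})
        content.append({"type": "image_url", "image_url": {"url": url}})
    return [{"role": "user", "content": content}]
```

The latency and cost savings come from amortization: the demonstration prefix, which may contain on the order of a thousand examples, is processed once per batch rather than once per test image.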

Three advanced multimodal foundation models are employed: GPT-4o, GPT-4(V)-Turbo, and Gemini 1.5 Pro, with GPT-4o and Gemini 1.5 Pro emphasized due to their superior performance. Claude 3 Opus is excluded from the experiments because of its 20-image limit per request. Each model is accessed through its respective endpoint: OpenAI's API service for GPT-4o and GPT-4(V)-Turbo, and Google Cloud's Vertex AI for Gemini 1.5 Pro. Temperature is set to zero for all models, and a fixed random seed is used to encourage deterministic responses. Sampling strategies ensure class balance in the demonstration and test sets across the 10 datasets, with the number of demonstration examples scaled up while class balance is maintained for evaluation.
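A class-balanced demonstration set might be drawn as in the sketch below; the function and parameter names are assumptions for illustration, since the paper's exact sampling code is not reproduced here:

```python
# Sketch of class-balanced sampling for demonstration sets. Assumes the
# dataset is a list of (example, label) pairs; names are illustrative.
import random
from collections import defaultdict

def balanced_sample(dataset, num_per_class, seed=0):
    """Draw the same number of demonstration examples from every class."""
    rng = random.Random(seed)  # fixed seed, mirroring the deterministic setup
    by_class = defaultdict(list)
    for example, label in dataset:
        by_class[label].append((example, label))
    demos = []
    for label in sorted(by_class):  # requires >= num_per_class items per class
        demos.extend(rng.sample(by_class[label], num_per_class))
    rng.shuffle(demos)  # avoid presenting all examples of a class together
    return demos
```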

Gemini 1.5 Pro demonstrates consistent, significant performance gains across most datasets as the number of demonstration examples increases, with DrugOOD Assay as the exception. Particularly large improvements are observed on HAM10000 (+23% accuracy over zero-shot), FIVES (+29%), and EuroSAT (+38%). On 5 of the 10 datasets (FIVES, UCMerced, EuroSAT, Oxford Pets, and DTD), Gemini 1.5 Pro continues to improve up to the highest number of demonstration examples considered (~1,000). Conversely, GPT-4o improves on most datasets but less consistently, showing V-shaped scaling curves on many of them. On DrugOOD Assay, GPT-4o's performance also displays high variance, similar to Gemini 1.5 Pro's, peaking at 50 demonstration examples.
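As a note on terminology, "log-linear improvement" means accuracy grows roughly linearly in the logarithm of the number of demonstration examples; a quick way to check this on one's own results is a linear fit on a log scale. The numbers below are placeholders, not figures from the paper:

```python
# Fit accuracy as a linear function of ln(number of demonstration examples).
# Placeholder data for illustration only.
import numpy as np

shots = np.array([1, 5, 10, 50, 100, 500, 1000])
accuracy = np.array([0.42, 0.48, 0.51, 0.58, 0.61, 0.66, 0.69])  # hypothetical

slope, intercept = np.polyfit(np.log(shots), accuracy, deg=1)
print(f"accuracy ~ {intercept:.3f} + {slope:.3f} * ln(num_examples)")
# A consistently good fit of this form is what "log-linear" scaling refers
# to; a V-shaped curve like GPT-4o's would fit such a line poorly.
```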

To recapitulate, this study assesses many-shot ICL with state-of-the-art multimodal foundation models across 10 datasets, revealing consistent performance enhancements. Batching queries with many-shot ICL significantly reduces per-example latency and inference costs without sacrificing performance. These findings suggest that large numbers of demonstration examples can adapt models quickly to new tasks and domains, circumventing the need for traditional fine-tuning. Future research should investigate the comparative effectiveness and data efficiency of traditional fine-tuning versus many-shot ICL. Examining issues such as hallucinations and biases in the context of many-shot ICL and batched queries is also crucial for refining models and mitigating biases across diverse sub-groups.

Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.
Source: MarkTechPost