This AI Paper from Stanford University Evaluates the Performance of Multimodal Foundation Models Scaling from Few-Shot to Many-Shot In-Context Learning (ICL)

May 19, 2024

Incorporating demonstration examples into the prompt, a technique known as in-context learning (ICL), significantly enhances large language models (LLMs) and large multimodal models (LMMs) without requiring any parameter updates. Recent studies confirm the efficacy of few-shot multimodal ICL, particularly in improving LMM performance on out-of-domain tasks. With the longer context windows of advanced models like GPT-4o and Gemini 1.5 Pro, researchers can now investigate the impact of scaling up the number of demonstration examples, a factor previously constrained by context window limitations.
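To make the setup concrete, here is a minimal sketch of how a many-shot multimodal ICL request might be assembled in an OpenAI-style chat message format. The helper names, prompt wording, and data layout are illustrative assumptions, not the paper's released code:

```python
# Minimal sketch: build a many-shot multimodal ICL prompt by interleaving
# (image, label) demonstration pairs before the unlabeled test image.
# Hypothetical helpers; not taken from the paper's code.
import base64

def encode_image(path: str) -> str:
    """Base64-encode an image file for inline transmission in a request."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

def build_many_shot_messages(demos, query_image_path, class_names):
    """demos: list of (image_path, label) pairs; scale it into the hundreds."""
    content = [{
        "type": "text",
        "text": f"Classify each image into one of: {', '.join(class_names)}.",
    }]
    for image_path, label in demos:
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{encode_image(image_path)}"},
        })
        content.append({"type": "text", "text": f"Answer: {label}"})
    # The unlabeled test image comes last; the model completes the answer.
    content.append({
        "type": "image_url",
        "image_url": {"url": f"data:image/jpeg;base64,{encode_image(query_image_path)}"},
    })
    content.append({"type": "text", "text": "Answer:"})
    return [{"role": "user", "content": content}]
```

Because no parameters are updated, moving from few-shot to many-shot ICL is purely a matter of growing the `demos` list until the context window is exhausted.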

Earlier work observed that LLM performance improves with more in-context examples, albeit within the limits of the context window. Recent studies extended this exploration, demonstrating improvements with over 1,000 examples, though only on text-only benchmarks. Multimodal ICL research is still emerging, with studies showing benefits for models like GPT-4V and Gemini on out-of-domain tasks. Batch querying strategies offer efficiency gains at inference time, and recently proposed variants exploit the larger context windows of newer models to further optimize performance.

To examine the potential of advanced multimodal foundation models in many-shot ICL, researchers from Stanford conducted an extensive set of experiments assessing model efficacy across 10 datasets spanning various domains and image classification tasks, scaling the number of demonstration examples far beyond the few-shot regime.

The key findings of this study include:

1. Increasing the number of demonstration examples significantly enhances model performance, with Gemini 1.5 Pro showing consistent log-linear improvements, in contrast to GPT-4o.

2. Gemini 1.5 Pro demonstrates higher ICL data efficiency than GPT-4o across most datasets.

3. Combining multiple queries into a single request can deliver comparable or superior performance to individual queries in the many-shot regime, while significantly reducing per-example latency and offering a more cost-effective inference process (see the batching sketch after this list).

4. Batched querying notably enhances performance even in the zero-shot setting, an effect attributed to domain calibration, class calibration, and the self-generated demonstration examples that accumulate during autoregressive decoding.
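A plausible sketch of the batching idea from finding 3: send the (potentially very long) demonstration prefix once, followed by several numbered test images, and parse one answer per query from a single response. The prompt wording here is an assumption, not the paper's exact batching prompt; image inputs are passed as data-URL strings like those produced by the encode_image helper above:

```python
# Sketch of batching multiple test queries into one many-shot request.
# demo_pairs: list of (data_url, label); query_urls: list of data_url strings.
def build_batched_query_messages(demo_pairs, query_urls, class_names):
    content = [{
        "type": "text",
        "text": (f"Classify each numbered query image into one of: "
                 f"{', '.join(class_names)}. Reply as 'i: label', one per line."),
    }]
    for url, label in demo_pairs:  # shared demonstrations, sent once per batch
        content.append({"type": "image_url", "image_url": {"url": url}})
        content.append({"type": "text", "text": f"Answer: {label}"})
    for i, url in enumerate(query_urls, start=1):  # the batched test queries
        content.append({"type": "text", "text": f"Query image {i}:"})
        content.append({"type": "image_url", "image_url": {"url": url}})
    return [{"role": "user", "content": content}]
```

The latency and cost savings come from amortization: the demonstration prefix, which may contain on the order of a thousand examples, is processed once per batch rather than once per test image.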

Three advanced multimodal foundation models are employed: GPT-4o, GPT-4(V)-Turbo, and Gemini 1.5 Pro, with GPT-4o and Gemini 1.5 Pro emphasized due to their superior performance. Claude 3 Opus is excluded from the experiments because of its 20-image limit per request. Each model is accessed through its respective endpoint: OpenAI's API service for GPT-4o and GPT-4(V)-Turbo, and Google Cloud's Vertex AI for Gemini 1.5 Pro. Temperature is set to zero for all models, and a fixed random seed is used to encourage deterministic responses. Sampling strategies ensure class balance in the demonstration and test sets across the 10 datasets, with the number of demonstration examples scaled up while class balance is maintained for evaluation.
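A class-balanced demonstration set might be drawn as in the sketch below; the function and parameter names are assumptions for illustration, since the paper's exact sampling code is not reproduced here:

```python
# Sketch of class-balanced sampling for demonstration sets. Assumes the
# dataset is a list of (example, label) pairs; names are illustrative.
import random
from collections import defaultdict

def balanced_sample(dataset, num_per_class, seed=0):
    """Draw the same number of demonstration examples from every class."""
    rng = random.Random(seed)  # fixed seed, mirroring the deterministic setup
    by_class = defaultdict(list)
    for example, label in dataset:
        by_class[label].append((example, label))
    demos = []
    for label in sorted(by_class):  # requires >= num_per_class items per class
        demos.extend(rng.sample(by_class[label], num_per_class))
    rng.shuffle(demos)  # avoid presenting all examples of a class together
    return demos
```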

Gemini 1.5 Pro demonstrates consistent, significant performance gains across most datasets as the number of demonstration examples increases, with DrugOOD Assay as the exception. Particularly large improvements are observed on HAM10000 (+23% accuracy over zero-shot), FIVES (+29%), and EuroSAT (+38%). On 5 of the 10 datasets (FIVES, UCMerced, EuroSAT, Oxford Pets, and DTD), Gemini 1.5 Pro continues to improve up to the highest number of demonstration examples considered (~1,000). Conversely, GPT-4o improves on most datasets but less consistently, showing V-shaped scaling curves on many of them. On DrugOOD Assay, GPT-4o's performance also displays high variance, similar to Gemini 1.5 Pro's, peaking at 50 demonstration examples.
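As a note on terminology, "log-linear improvement" means accuracy grows roughly linearly in the logarithm of the number of demonstration examples; a quick way to check this on one's own results is a linear fit on a log scale. The numbers below are placeholders, not figures from the paper:

```python
# Fit accuracy as a linear function of ln(number of demonstration examples).
# Placeholder data for illustration only.
import numpy as np

shots = np.array([1, 5, 10, 50, 100, 500, 1000])
accuracy = np.array([0.42, 0.48, 0.51, 0.58, 0.61, 0.66, 0.69])  # hypothetical

slope, intercept = np.polyfit(np.log(shots), accuracy, deg=1)
print(f"accuracy ~ {intercept:.3f} + {slope:.3f} * ln(num_examples)")
# A consistently good fit of this form is what "log-linear" scaling refers
# to; a V-shaped curve like GPT-4o's would fit such a line poorly.
```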

To recapitulate, this study assesses many-shot ICL with state-of-the-art multimodal foundation models across 10 datasets, revealing consistent performance enhancements. Batching queries with many-shot ICL significantly reduces per-example latency and inference costs without sacrificing performance. These findings suggest that large numbers of demonstration examples can adapt models quickly to new tasks and domains, circumventing the need for traditional fine-tuning. Future research should investigate the comparative effectiveness and data efficiency of traditional fine-tuning versus many-shot ICL. Examining issues such as hallucinations and biases in the context of many-shot ICL and batched queries is also crucial for refining models and mitigating biases across diverse sub-groups.

Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.
Source: MarkTechPost