
    Multimodal AI Needs More Than Modality Support: Researchers Propose General-Level and General-Bench to Evaluate True Synergy in Generalist Models

    May 13, 2025

    Artificial intelligence has grown beyond language-focused systems, evolving into models capable of processing multiple input types, such as text, images, audio, and video. This area, known as multimodal learning, aims to replicate the natural human ability to integrate and interpret varied sensory data. Unlike conventional AI models that handle a single modality, multimodal generalists are designed to process and respond across formats. The goal is to move closer to creating systems that mimic human cognition by seamlessly combining different types of knowledge and perception.

    The central challenge in this field is enabling multimodal systems to generalize. While many models can process multiple inputs, they often fail to transfer what they learn across tasks or modalities. This missing cross-task enhancement, known as synergy, hinders progress toward more intelligent and adaptive systems. A model may excel at image classification and text generation separately, but without the ability to connect skills across both domains it cannot be considered a robust generalist. Achieving this synergy is essential for developing more capable, autonomous AI systems.

    Many current tools rely heavily on large language models (LLMs) at their core, supplemented with external components specialized for tasks such as image recognition or speech analysis. Vision-language models such as CLIP and Flamingo, for example, link language with vision but do not deeply fuse the two. Instead of functioning as a unified system, such models depend on loosely coupled modules that only approximate multimodal intelligence. Because they lack the internal architecture needed for meaningful cross-modal learning, they tend to perform tasks in isolation rather than with holistic understanding.

    Researchers from the National University of Singapore (NUS), Nanyang Technological University (NTU), Zhejiang University (ZJU), Peking University (PKU), and other institutions proposed an evaluation framework named General-Level and a benchmark called General-Bench. Both are built to measure and promote synergy across modalities and tasks. General-Level defines five levels that classify models by how well they integrate comprehension, generation, and language tasks. The framework is supported by General-Bench, a large dataset of over 700 tasks and 325,800 annotated examples drawn from text, images, audio, video, and 3D data.

    General-Level's evaluation method is built on the concept of synergy. Models are assessed not only on task performance but also on whether shared knowledge lets them exceed state-of-the-art (SoTA) specialist scores. The researchers define three types of synergy (task-to-task, comprehension-generation, and modality-modality), and each higher level demands greater capability. For example, a Level-2 model supports many modalities and tasks, while a Level-4 model must exhibit synergy between comprehension and generation. Scores are weighted to keep any single modality from dominating and to encourage models to support a balanced range of tasks.
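    To make the scoring idea concrete, the following Python sketch compares a generalist's per-task scores with specialist SoTA scores. It is a simplified illustration, not the General-Level formulas from the paper: the function name `synergy_summary`, the tuple layout, the assumption that scores are normalized so that higher is better, and the use of per-modality averaging as the weighting scheme are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def synergy_summary(results):
    """Summarize a generalist's results against specialist SoTA scores.

    `results` is a list of (task, modality, generalist_score, sota_score)
    tuples. Scores are assumed to be normalized, higher is better.
    Hypothetical simplification, NOT General-Level's exact formula.
    """
    by_modality = defaultdict(list)
    wins = 0
    for _task, modality, score, sota in results:
        by_modality[modality].append(score)
        if score > sota:  # beating the specialist is treated as evidence of synergy
            wins += 1
    # Average within each modality first, then across modalities, so a
    # modality with many tasks cannot dominate the overall score.
    balanced = mean(mean(scores) for scores in by_modality.values())
    return {"balanced_score": balanced, "beats_sota_fraction": wins / len(results)}


# Illustrative, made-up numbers (not results from General-Bench).
example = [
    ("captioning", "image", 80.0, 75.0),
    ("vqa", "image", 60.0, 70.0),
    ("segmentation", "image", 40.0, 50.0),
    ("asr", "audio", 90.0, 85.0),
]
print(synergy_summary(example))  # {'balanced_score': 75.0, 'beats_sota_fraction': 0.5}
```

    Here the image tasks average 60 and the audio task scores 90, so the modality-balanced score is 75 rather than the naive per-task mean of 67.5; averaging per modality first keeps the three image tasks from outweighing the single audio task.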

    The researchers tested 172 large models, including over 100 top-performing MLLMs, against General-Bench. Most models did not demonstrate the synergy needed to qualify as higher-level generalists. Even advanced models such as GPT-4V and GPT-4o did not reach Level 5, which requires models to use non-language inputs to improve language understanding. The highest-performing models managed only basic multimodal interactions, and none showed evidence of total synergy across tasks and modalities. Across the benchmark's 702 tasks, covering 145 skills, no model led in every area. With coverage spanning 29 disciplines and 58 evaluation metrics, General-Bench is presented by its authors as a new standard for comprehensiveness.
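    The level ladder can be sketched as a cumulative check. The article states the Level 2, Level 4, and Level 5 criteria; this sketch additionally assumes that Level 1 is a single-task specialist, that Level 3 corresponds to task-to-task synergy, and that each level requires all of the ones below it. The `SynergyEvidence` flags and the `general_level` function are hypothetical; the actual General-Level assigns levels from benchmark scores rather than yes/no flags.

```python
from dataclasses import dataclass


@dataclass
class SynergyEvidence:
    """Hypothetical flags summarizing what a model demonstrated."""
    multi_task: bool        # supports many tasks/modalities in one model (Level 2)
    task_synergy: bool      # task-to-task synergy (assumed Level 3)
    comp_gen_synergy: bool  # comprehension-generation synergy (Level 4)
    modality_synergy: bool  # non-language inputs improve language tasks (Level 5)


def general_level(e: SynergyEvidence) -> int:
    """Return the highest level whose requirement, and every lower one, is met."""
    requirements = [e.multi_task, e.task_synergy, e.comp_gen_synergy, e.modality_synergy]
    level = 1  # assumed Level 1: single-task specialist
    for met in requirements:
        if not met:
            break  # levels are assumed cumulative, so stop at the first gap
        level += 1
    return level


print(general_level(SynergyEvidence(True, True, True, False)))  # 4
```

    Under these assumptions, a model showing every kind of synergy except modality-modality stops at Level 4, mirroring the finding that no tested model reached Level 5.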

    This research clarifies the gap between current multimodal systems and the ideal generalist model. The researchers address a core issue in multimodal AI by introducing tools that prioritize integration over specialization. With General-Level and General-Bench, they offer a rigorous path for assessing and building models that not only handle varied inputs but also learn and reason across them. Their approach helps steer the field toward more intelligent systems with real-world flexibility and cross-modal understanding.


    Check out the Paper and Project Page. All credit for this research goes to the researchers of this project.

    The post Multimodal AI Needs More Than Modality Support: Researchers Propose General-Level and General-Bench to Evaluate True Synergy in Generalist Models appeared first on MarkTechPost.
