    This AI Paper Introduces XMODE: An Explainable Multi-Modal Data Exploration System Powered by LLMs for Enhanced Accuracy and Efficiency

    December 29, 2024

    Researchers are increasingly focused on building systems for multi-modal data exploration, which combines structured and unstructured data. Such systems analyze text, images, videos, and databases to answer complex queries. These capabilities matter in healthcare, where medical professionals work with patient records, medical imaging, and textual reports. Similarly, in art curation or research, multi-modal exploration helps users interpret databases that combine metadata, textual critiques, and images of artworks. Combining these data types seamlessly can improve decision-making and surface insights that would otherwise be hard to reach.

    One of the main challenges in this field is letting users query multi-modal data in natural language. Traditional systems struggle to interpret complex queries that span multiple data formats, such as a request to find trends in structured tables while also analyzing related image content. Moreover, the lack of tools that explain how a query result was produced makes it difficult for users to trust and validate the results. These limitations leave a gap between advanced data processing capabilities and real-world usability.

    Current solutions take two main approaches. The first integrates multiple modalities into a unified query language: NeuralSQL, for example, embeds vision-language functions directly into SQL commands. The second uses agentic workflows that coordinate separate tools, each specialized for one modality, as exemplified by CAESURA. Although both approaches have advanced the field, they fall short in optimizing task execution, ensuring explainability, and handling complex queries efficiently. These shortcomings point to the need for a system that adapts dynamically and explains its reasoning clearly.
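
    The general idea behind the first approach can be sketched with Python's standard-library `sqlite3` module, which lets a Python function be registered as an SQL function so a single query mixes a structured filter with an image predicate. This is not NeuralSQL's actual syntax or implementation: the `paintings` table, the `image_contains` function, and its stubbed "model" (a lookup of precomputed tags) are hypothetical stand-ins for a real vision-language model.

```python
import sqlite3

# Hypothetical stand-in for a vision-language model: a real system would run
# an image model here; this stub only looks up precomputed tags.
FAKE_IMAGE_TAGS = {
    "img_001.jpg": {"madonna", "child"},
    "img_002.jpg": {"landscape", "river"},
    "img_003.jpg": {"madonna", "angels"},
}

def image_contains(image_path: str, concept: str) -> int:
    """Return 1 if the (stubbed) model detects `concept` in the image, else 0."""
    return int(concept in FAKE_IMAGE_TAGS.get(image_path, set()))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE paintings (title TEXT, year INTEGER, image TEXT)")
conn.executemany(
    "INSERT INTO paintings VALUES (?, ?, ?)",
    [
        ("Painting A", 1480, "img_001.jpg"),
        ("Painting B", 1650, "img_002.jpg"),
        ("Painting C", 1510, "img_003.jpg"),
    ],
)
# Expose the Python function to SQL under the name image_contains(path, concept).
conn.create_function("image_contains", 2, image_contains)

# One query combines a structured condition (year) with an image condition.
rows = conn.execute(
    "SELECT title FROM paintings "
    "WHERE year < 1600 AND image_contains(image, 'madonna') = 1 "
    "ORDER BY year"
).fetchall()
print([r[0] for r in rows])
```

    Running the script prints `['Painting A', 'Painting C']`. In the second, agentic approach, the image check would instead be a separate tool call whose results are combined with the SQL output outside the query.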

    Researchers at the Zurich University of Applied Sciences have introduced XMODE, a system designed to address these issues. XMODE enables explainable multi-modal data exploration through an agentic framework based on a Large Language Model (LLM). The system interprets a user query and decomposes it into subtasks such as SQL generation and image analysis. By representing the resulting workflow as a Directed Acyclic Graph (DAG), XMODE optimizes the order in which tasks are executed. This approach improves efficiency and accuracy compared with state-of-the-art systems such as CAESURA and NeuralSQL. XMODE also supports task re-planning, which lets it adapt when specific components fail.
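
    A minimal sketch of why a DAG helps, assuming a hypothetical plan for the query "Compare how many 16th- and 17th-century paintings depict war": each subtask lists the subtasks whose outputs it needs, and Python's standard-library `graphlib.TopologicalSorter` groups them into stages whose members do not depend on each other. The task names and the `parallel_stages` helper are illustrative, not XMODE's own LLM-driven planner.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: each key is a subtask, each value is the set of subtasks
# whose outputs it needs (its predecessors in the DAG).
plan = {
    "sql_16th": set(),                     # fetch 16th-century paintings
    "sql_17th": set(),                     # fetch 17th-century paintings
    "vqa_16th": {"sql_16th"},              # image analysis on the 16th-century rows
    "vqa_17th": {"sql_17th"},              # image analysis on the 17th-century rows
    "compare": {"vqa_16th", "vqa_17th"},   # combine both counts
    "plot": {"compare"},                   # render the final chart
}

def parallel_stages(dag: dict[str, set[str]]) -> list[list[str]]:
    """Group subtasks into stages; tasks in the same stage have no mutual dependencies."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the plan is not acyclic
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every task whose inputs are available
        stages.append(ready)
        sorter.done(*ready)                 # mark them finished to unlock successors
    return stages

for i, stage in enumerate(parallel_stages(plan), start=1):
    print(i, stage)
```

    For this plan the helper returns four stages: the two SQL queries, then the two image-analysis tasks, then `compare`, then `plot`. Tasks within a stage are candidates for parallel execution, and `prepare()` raises `graphlib.CycleError` if a plan accidentally contains a cycle.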

    XMODE's architecture has five key components: planning and expert model allocation, execution and self-debugging, decision-making, expert tools, and a shared data repository. When a query arrives, the system constructs a detailed workflow of tasks and assigns each one to an appropriate tool, such as an SQL generation module or an image analysis model. Tasks are executed in parallel wherever their dependencies allow, reducing latency and computational cost. XMODE's self-debugging capability lets it identify and correct errors during task execution, improving reliability. This adaptability is critical for complex workflows that span diverse data modalities.
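
    The execution and self-debugging behavior can be sketched with `asyncio`: independent subtasks in a stage run concurrently via `asyncio.gather`, and a failed SQL subtask is repaired and retried. Everything here is hypothetical: `run_sql`, `analyze_images`, the simulated "no such column" error, and the `repair` rule are stand-ins, and in an LLM-based system such as XMODE the fix would presumably come from the model rather than from a string replacement.

```python
import asyncio

async def run_sql(query: str) -> list[str]:
    """Hypothetical SQL tool: returns image paths, or fails on a bad column name."""
    await asyncio.sleep(0.01)  # stand-in for database latency
    if "painting_year" in query:
        raise ValueError("no such column: painting_year")
    if "< 1600" in query:
        return ["img_001.jpg", "img_003.jpg"]
    return ["img_002.jpg"]

async def analyze_images(images: list[str]) -> int:
    """Hypothetical image tool: counts the images it was given."""
    await asyncio.sleep(0.01)  # stand-in for model inference
    return len(images)

def repair(query: str, error: Exception) -> str:
    """Hypothetical self-debugging rule: rewrite the query based on the error text."""
    if "painting_year" in str(error):
        return query.replace("painting_year", "year")
    raise error  # no known fix: give up and surface the original error

async def sql_with_self_debug(query: str, max_attempts: int = 3) -> list[str]:
    """Run a SQL subtask, repairing and retrying it after each failure."""
    for attempt in range(max_attempts):
        try:
            return await run_sql(query)
        except ValueError as err:
            if attempt == max_attempts - 1:
                raise
            query = repair(query, err)
    raise RuntimeError("max_attempts must be at least 1")

async def main() -> tuple[int, int]:
    # Stage 1: two independent SQL subtasks run concurrently; the first
    # contains a wrong column name and is repaired before its retry.
    rows_early, rows_late = await asyncio.gather(
        sql_with_self_debug("SELECT image FROM paintings WHERE painting_year < 1600"),
        sql_with_self_debug("SELECT image FROM paintings WHERE year >= 1600"),
    )
    # Stage 2: image analysis on each branch, also run concurrently.
    count_early, count_late = await asyncio.gather(
        analyze_images(rows_early), analyze_images(rows_late)
    )
    return count_early, count_late

print(asyncio.run(main()))
```

    Here `asyncio.run(main())` returns `(2, 1)`: the first query fails once on the misspelled column, is rewritten to use `year`, and succeeds on the retry while the second query runs alongside it.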

    XMODE outperformed the baseline systems in tests on two datasets. On an artwork dataset, it achieved 63.33% overall accuracy, compared with CAESURA's 33.33%. It excelled at tasks requiring complex outputs, such as plots and combined data structures, achieving 100% accuracy on plot-plot and plot-data structure outputs. Because it executes tasks in parallel, XMODE also reduced latency to 3,040 milliseconds, versus 5,821 milliseconds for CAESURA. These results highlight its efficiency in processing natural language queries over multi-modal datasets.

    On the electronic health records (EHR) dataset, XMODE achieved 51% overall accuracy. It outperformed NeuralSQL on multi-table queries, scoring 77.50% versus NeuralSQL's 47.50%, and on binary queries, scoring 74% versus 48%. Its ability to adapt and re-plan tasks contributed to this robust performance, making it particularly effective in scenarios that require detailed reasoning and cross-modal integration.
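
    The reported figures can be compared directly. The short script below uses only numbers quoted in this article and computes the percentage-point gap and ratio for each accuracy comparison, plus the latency speedup on the artwork dataset; the `compare` helper is purely illustrative.

```python
def compare(xmode: float, baseline: float) -> tuple[float, float]:
    """Return (percentage-point gap, ratio) of XMODE's score over a baseline."""
    return xmode - baseline, xmode / baseline

# Accuracy figures quoted in the article: (XMODE, baseline), in percent.
reported = {
    "artwork, overall (vs CAESURA)": (63.33, 33.33),
    "EHR, multi-table (vs NeuralSQL)": (77.50, 47.50),
    "EHR, binary queries (vs NeuralSQL)": (74.0, 48.0),
}
for name, (xmode, baseline) in reported.items():
    gap, ratio = compare(xmode, baseline)
    print(f"{name}: +{gap:.2f} points, {ratio:.2f}x")

# Latency on the artwork dataset, in milliseconds.
xmode_ms, caesura_ms = 3040, 5821
speedup = caesura_ms / xmode_ms
reduction = 1 - xmode_ms / caesura_ms
print(f"latency: {speedup:.2f}x faster, {reduction:.1%} lower")
```

    The latency figures work out to about a 1.91x speedup, or roughly 47.8% lower latency than CAESURA.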

    XMODE addresses key limitations of existing multi-modal data exploration systems by combining advanced planning, parallel task execution, and dynamic re-planning. Its approach lets users query complex datasets efficiently while keeping the process transparent and explainable. With its demonstrated improvements in accuracy, efficiency, and cost-effectiveness, XMODE represents a significant advance in the field, with practical applications in areas such as healthcare and art curation.


    Check out the Paper. All credit for this research goes to the researchers of this project.

    The post This AI Paper Introduces XMODE: An Explainable Multi-Modal Data Exploration System Powered by LLMs for Enhanced Accuracy and Efficiency appeared first on MarkTechPost.
