
    MedTrinity-25M: A Comprehensive Multimodal Medical Dataset with Advanced Annotations and Its Impact on Vision-Language Model Performance

    August 9, 2024

    Large-scale multimodal foundation models have achieved notable success in understanding complex visual patterns and natural language, generating interest in their application to medical vision-language tasks. Progress has been made by creating medical datasets with image-text pairs and fine-tuning general domain models on these datasets. However, these datasets have limitations. They lack multi-granular annotations that link local and global information within medical images, which is crucial for identifying specific lesions from regional details. Additionally, current methods for constructing these datasets rely heavily on pairing medical images with reports or captions, limiting their scalability.

Researchers from UC Santa Cruz, Harvard University, and Stanford University have introduced MedTrinity-25M, a large-scale multimodal medical dataset containing over 25 million images across ten modalities. This dataset includes detailed multi-granular annotations for more than 65 diseases, encompassing global information, such as disease type and modality, and local annotations, such as bounding boxes and segmentation masks for regions of interest (ROIs). Using an automated pipeline, the researchers generated these comprehensive annotations without relying on paired text descriptions, enabling advanced multimodal tasks and supporting large-scale pretraining of medical AI models.

Medical multimodal foundation models have seen growing interest due to their ability to understand complex visual and textual features, leading to advancements in medical vision-language tasks. Models like Med-Flamingo and Med-PaLM have been fine-tuned on medical datasets to enhance their performance. However, the scale of available training data often limits these models. To address this, researchers have focused on constructing large medical datasets. However, datasets like MIMIC-CXR and RadGenome-Chest CT are constrained by the labor-intensive process of pairing images with detailed textual descriptions. In contrast, the MedTrinity-25M dataset uses an automated pipeline to generate comprehensive multi-granular annotations for unpaired images, offering a significantly larger and more detailed dataset.

The MedTrinity-25M dataset features over 25 million images organized into triplets of {image, ROI, description}. Images span ten modalities and cover 65 diseases, sourced from repositories like TCIA and Kaggle. ROIs are highlighted with masks or bounding boxes, pinpointing abnormalities or key anatomical features. Multi-granular textual descriptions detail the image modality, disease, and ROI specifics. The dataset construction involves generating coarse captions, identifying ROIs with models like SAT and BA-Transformer, and leveraging medical knowledge for accurate descriptions. MedTrinity-25M stands out for its scale, diversity, and detailed annotations compared to other datasets.
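The {image, ROI, description} triplet described above can be pictured as a simple record combining global annotations (modality, disease) with local ones (bounding box, mask). The following sketch is purely illustrative: the field names, paths, and values are hypothetical, not the dataset's actual schema.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ROI:
    """Local annotation for a region of interest."""
    bbox: List[float]                # [x_min, y_min, x_max, y_max], normalized coordinates
    mask_path: Optional[str] = None  # optional segmentation mask file

@dataclass
class Triplet:
    """One {image, ROI, description} record, MedTrinity-25M-style."""
    image_path: str
    modality: str     # global annotation, e.g. "CT"
    disease: str      # global annotation, e.g. "lung nodule"
    roi: ROI          # local annotation linking regional detail to the image
    description: str  # multi-granular text covering modality, disease, and ROI

# Hypothetical example record
record = Triplet(
    image_path="images/ct_000123.png",
    modality="CT",
    disease="lung nodule",
    roi=ROI(bbox=[0.42, 0.31, 0.55, 0.44]),
    description="Axial chest CT; a nodule occupies the right upper lobe region.",
)
```

A structure like this makes the multi-granular linkage explicit: the text in `description` can reference both the whole image (via `modality`/`disease`) and the specific region delimited by `roi`.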

The study evaluated LLaVA-Med++ on biomedical Visual Question Answering (VQA) tasks using the VQA-RAD, SLAKE, and PathVQA datasets to assess the impact of pretraining on MedTrinity-25M. Initial pretraining followed LLaVA-Med’s methodology, with additional fine-tuning on the VQA datasets for three epochs. Results show that LLaVA-Med++ with MedTrinity-25M pretraining outperforms the baseline model by approximately 10.75% on VQA-RAD, 6.1% on SLAKE, and 13.25% on PathVQA. It achieves state-of-the-art results on two of the benchmarks and ranks third on the remaining one, demonstrating significant performance improvements from MedTrinity-25M pretraining.

The study presents MedTrinity-25M, a vast multimodal medical dataset with over 25 million image-ROI-description triplets from 90 sources, spanning ten modalities and covering over 65 diseases. Unlike previous methods reliant on paired image-text data, MedTrinity-25M is created using an automated pipeline that generates detailed annotations from unpaired images, leveraging expert models and advanced MLLMs. The dataset’s rich multi-granular annotations support a variety of tasks, including captioning, report generation, and classification. The model, pretrained on MedTrinity-25M, achieved state-of-the-art results in VQA tasks, highlighting its effectiveness for training multimodal medical AI models.

Check out the Paper and Project. All credit for this research goes to the researchers of this project.


    The post MedTrinity-25M: A Comprehensive Multimodal Medical Dataset with Advanced Annotations and Its Impact on Vision-Language Model Performance appeared first on MarkTechPost.
