
    BiomedGPT: A Versatile Transformer-Based Foundation Model for Biomedical AI with Enhanced Multimodal Capabilities and Performance

    August 11, 2024

    Traditional biomedical AI models are often narrowly specialized and lack flexibility, which makes them less effective for real-world applications that require integrating several data types. Generalist AI models, particularly those based on transformers, offer a more versatile alternative because they can handle both textual and visual data. Such models can streamline complex tasks like radiology interpretation and clinical summarization, overcoming the limitations of narrow, task-specific systems. Many existing biomedical models are cumbersome and closed-source; a generalist model can instead simplify deployment and management by consolidating multiple functions into a single system, improving efficiency and adaptability in medical settings.

    Researchers from Lehigh University and other institutions present BiomedGPT, an open-source, lightweight vision–language foundation model designed for various biomedical tasks. BiomedGPT achieved state-of-the-art results in 16 out of 25 experiments while maintaining a computing-friendly model scale. Human evaluations showed robust performance in radiology visual question answering, report generation, and summarization, with low error rates and competitive summarization ability. BiomedGPT, trained with diverse, cross-disciplinary data, demonstrates effective transfer and zero-shot learning capabilities. Despite its potential, further improvements are needed for clinical deployment, particularly in safety, equity, and bias considerations.

    BiomedGPT is a transformer-based model optimized for the biomedical field, combining concepts from Vision Transformers and language models. Its encoder-decoder architecture pairs a BERT-style encoder with a GPT-style decoder and supports multimodal tasks, with multi-head attention and added normalization aimed at improving convergence. The model comes in three sizes (BiomedGPT-S, BiomedGPT-M, and BiomedGPT-B) and processes inputs through a unified token vocabulary for text and image patches. It is pretrained on a mix of vision and text tasks and then fine-tuned on task-specific datasets. Performance is evaluated with accuracy, F1 score, and ROUGE-L. The authors also explore extending the model to 3D imaging and instruction-tuning it for zero-shot tasks.
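
To make the ROUGE-L metric mentioned above more concrete, here is a minimal sketch of how it can be computed from the longest common subsequence (LCS) of two texts. It is an illustration only, not the evaluation code used in the BiomedGPT study: the function names `lcs_length` and `rouge_l_f1` are hypothetical, tokenization is a plain lowercase whitespace split, and the score is reported as a balanced F1 of LCS precision and recall.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the best subsequence that ended before both tokens.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise carry over the better of skipping x or skipping y.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """Simplified ROUGE-L: F1 of LCS-based precision and recall.

    Assumption: lowercase whitespace tokenization, no stemming.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    generated = "no acute findings"
    reference = "no acute cardiopulmonary findings"
    print(round(rouge_l_f1(generated, reference), 3))  # 0.857
```

In the example, all three generated tokens appear in order in the four-token reference, so precision is 1.0, recall is 0.75, and the F1 is 6/7.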

    BiomedGPT uses masked modeling and supervised learning during its pretraining phase, drawing on 14 diverse datasets to build strong data representations. The model is available in three sizes: small (BiomedGPT-S), medium (BiomedGPT-M), and base (BiomedGPT-B). During fine-tuning, BiomedGPT was adapted to several biomedical applications, including medical image classification, text understanding, summarization, image captioning, and visual question answering (VQA). These applications aim to enhance disease diagnostics, clinical documentation, and healthcare chatbot development.
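
The idea behind masked modeling can be sketched in a few lines: hide part of the input and keep the hidden pieces as prediction targets. The snippet below is a simplified illustration under stated assumptions, not BiomedGPT's actual pretraining recipe. The `<mask>` symbol, the `mask_tokens` helper, the 30% ratio, and the choice to mask independent single tokens are all hypothetical.

```python
import random

MASK = "<mask>"  # hypothetical mask symbol, chosen for illustration


def mask_tokens(tokens, mask_ratio=0.3, seed=0):
    """Hide a random subset of tokens.

    Returns (corrupted, targets), where `corrupted` is a copy of `tokens`
    with some entries replaced by MASK, and `targets` maps each masked
    position to the original token a model would be trained to predict.
    """
    if not tokens:
        return [], {}
    rng = random.Random(seed)  # seeded so the example is reproducible
    n_mask = max(1, round(len(tokens) * mask_ratio))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    corrupted = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = corrupted[pos]
        corrupted[pos] = MASK
    return corrupted, targets


if __name__ == "__main__":
    sentence = "the chest x-ray shows no acute cardiopulmonary abnormality"
    corrupted, targets = mask_tokens(sentence.split())
    print(corrupted)  # input the model sees
    print(targets)    # what it must reconstruct
```

The same principle applies to images, where patches rather than words are hidden; the training signal in both cases comes from reconstructing what was removed.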

    In performance evaluations, BiomedGPT excelled across various multimodal tasks. It achieved 86.1% accuracy in VQA on the SLAKE dataset, surpassing the previous state-of-the-art. BiomedGPT outperformed previous models in medical image classification on seven out of nine MedMNIST-Raw datasets. For text understanding and summarization, BiomedGPT-B demonstrated superior results compared to BioGPT and LLaVA-Med. The model also showed effective zero-shot capabilities for biomedical VQA and report generation, though there is still potential for improvement. Human evaluations of BiomedGPT’s radiology task performance indicated high accuracy and competitive results in radiology report generation and summarization.

    The study demonstrates that BiomedGPT achieves strong transfer-learning performance across vision, language, and multimodal domains by integrating diverse biomedical data within a unified framework. However, challenges persist, such as the need for high-quality annotated biomedical data and the risk of negative transfer when expanding to new data types like 3D images. Evaluation of generated text remains difficult, with emerging metrics like the F1-RadGraph score helping to assess factual accuracy. While scaling improves performance, it also introduces efficiency and training challenges. BiomedGPT’s capabilities, particularly in zero-shot scenarios, are limited by current resources and training strategies, though fine-tuning shows promise.
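
Metrics such as F1-RadGraph assess factual accuracy by comparing clinical facts extracted from a generated report with those extracted from a reference report. The extraction step depends on a dedicated model and is not reproduced here. The sketch below only illustrates the final overlap computation, assuming the facts have already been extracted as hypothetical `(entity, status)` pairs; `set_f1` is an illustrative name, and scoring two empty sets as 1.0 is a convention chosen for this example.

```python
def set_f1(predicted, reference):
    """F1 overlap between two collections of already-extracted facts."""
    predicted, reference = set(predicted), set(reference)
    if not predicted and not reference:
        return 1.0  # assumed convention: nothing to find, nothing reported
    overlap = len(predicted & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical facts; a real pipeline would obtain these from an extractor.
    generated_facts = {("effusion", "absent"), ("cardiomegaly", "present")}
    reference_facts = {("effusion", "absent"), ("pneumothorax", "absent")}
    print(set_f1(generated_facts, reference_facts))  # 0.5
```

Unlike ROUGE-L, a score of this kind is insensitive to wording and rewards a report only for stating the same findings as the reference.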

    Check out the Paper. All credit for this research goes to the researchers of this project.

    The post BiomedGPT: A Versatile Transformer-Based Foundation Model for Biomedical AI with Enhanced Multimodal Capabilities and Performance appeared first on MarkTechPost.
