
    Researchers at Stanford Explore the Potential of Mid-Sized Language Models for Clinical QA (Question-Answering) Tasks

    May 3, 2024

Recently, large language models (LLMs) such as Med-PaLM 2 and GPT-4 have achieved remarkable performance on clinical question-answering (QA) tasks. For example, Med-PaLM 2 produced answers to consumer health questions that were competitive with those of human physicians, and a GPT-4-based system scored 90.2% on the MedQA task. These models have significant drawbacks, however. Because their parameter counts reach into the billions, they require dedicated computing clusters, making them costly to train and run and ecologically unsustainable. Researchers can only reach them through paid APIs, so they cannot inspect or analyze the models, and only those with access to the weights and architecture can research improvements.

A new and promising approach, known as on-device AI or edge AI, runs language models on local devices like phones or tablets. This technology holds immense potential in biomedicine, offering solutions such as disseminating medical information after catastrophic events or in areas with limited or no internet service. Given their size and closed nature, models like GPT-4 and Med-PaLM 2 are poor fits for on-device deployment, which opens up new avenues of research into smaller models for the field.

In a biomedical context, two types of models are applicable. Smaller domain-specific models (<3B parameters) like BioGPT-large and BioMedLM were trained exclusively on biomedical text from PubMed. Larger 7B-parameter models like LLaMA 2 and Mistral 7B are more powerful than their smaller counterparts, but they were trained on broad English text and lack a biomedical focus. How well these models work on clinical QA, and which type is best suited for it, remain open questions.

To ensure comprehensive and reliable findings, a team of researchers from Stanford University, University College London, and the University of Cambridge conducted a rigorous evaluation of all four models in the clinical QA domain. They used two popular tasks, MedQA (questions similar to those on the USMLE) and MultiMedQA Long Form Answering (open-ended responses to consumer health queries), which together assess the ability to understand and reason about medical scenarios and to write informative paragraphs in response to health questions.

The MedQA four-option task mirrors the USMLE: each question comes with four possible answers. It is a common test of a language model's ability to apply medical knowledge and reason about clinical situations. Some questions seek particular medical facts (such as the symptoms of schizophrenia), while others pose a clinical scenario and ask for the most likely diagnosis or best next step (for example, "A 27-year-old male presents…").
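As a rough illustration, a MedQA-style item can be rendered into a single prompt string that ends with the "Answer:" cue the models are trained to complete. The template below is an assumption for illustration; the paper's exact formatting may differ.

```python
def format_medqa_example(question, options, answer=None):
    """Render a four-option MedQA-style item as one prompt string.

    `options` maps letters to answer texts. When `answer` is given, the
    gold letter is appended after "Answer:", matching the fine-tuning
    target described in the article. The template itself is illustrative.
    """
    lines = [f"Question: {question}"]
    for letter in sorted(options):
        lines.append(f"({letter}) {options[letter]}")
    lines.append("Answer:" + (f" {answer}" if answer else ""))
    return "\n".join(lines)


prompt = format_medqa_example(
    "Loss of which neurotransmitter underlies the motor symptoms of Parkinson disease?",
    {"A": "Serotonin", "B": "Dopamine", "C": "GABA", "D": "Acetylcholine"},
    answer="B",
)
print(prompt)
```

At inference time the same function would be called with `answer=None`, so the prompt ends at "Answer:" and the model supplies the letter.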

The MedQA dataset contains 10,178 training examples, 1,272 development examples, and 1,273 test cases, each consisting of a prompt and an expected response. All four models were fine-tuned to consume the same prompt and emit the same response: the word "Answer:" followed by the letter of the correct choice, with all model parameters updated during fine-tuning. The researchers used the same prompt format, training data, and training code for every model to ensure a fair comparison, carrying out the fine-tuning with the Hugging Face library.
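Because every model emits the same "Answer: X" target, accuracy can be scored by extracting the predicted letter from each completion and comparing it to the gold label. A minimal scorer might look like the sketch below; the helper names are hypothetical, not from the paper.

```python
def extract_choice(generation):
    """Return the first A-D option letter found in a completion,
    e.g. "Answer: C" or "The answer is (B)."."""
    for token in generation.replace(":", " ").split():
        letter = token.strip("().,")
        if letter in {"A", "B", "C", "D"}:
            return letter
    return None


def medqa_accuracy(generations, gold_letters):
    """Fraction of completions whose extracted letter matches the gold letter."""
    hits = sum(extract_choice(g) == a for g, a in zip(generations, gold_letters))
    return hits / len(gold_letters)


preds = ["Answer: B", "Answer: (C).", "The answer is A"]
print(medqa_accuracy(preds, ["B", "C", "D"]))  # 2 of 3 correct
```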

To delve deeper into the capabilities of mid-sized models, the top-performing model (Mistral 7B) was further fine-tuned on the MedQA training data merged with the much larger MedMCQA training set, which contributes 182,822 additional examples; prior research has demonstrated that training on this data improves MedQA performance. At this stage they trained the model, using a somewhat more elaborate prompt, to produce both the correct letter and the complete text of the answer, and a comparable hyperparameter sweep was used to find the optimal values. Note that the primary goal of these experiments was to maximize Mistral 7B's performance rather than to provide a head-to-head evaluation of competing models.
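Concretely, the merged training set is just the concatenation of the two multiple-choice corpora, shuffled before fine-tuning. A toy sketch with stand-in data (the dict schema is assumed for illustration):

```python
import random


def merge_training_sets(medqa_train, medmcqa_train, seed=0):
    """Concatenate the MedQA and MedMCQA training examples and shuffle
    them deterministically so fine-tuning sees a mixed ordering."""
    combined = list(medqa_train) + list(medmcqa_train)
    random.Random(seed).shuffle(combined)
    return combined


# Toy stand-ins; the real sets have 10,178 and 182,822 examples.
medqa_train = [{"question": f"medqa-{i}", "answer": "A"} for i in range(4)]
medmcqa_train = [{"question": f"medmcqa-{i}", "answer": "B"} for i in range(6)]
merged = merge_training_sets(medqa_train, medmcqa_train)
print(len(merged))  # 10
```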

For the MultiMedQA Long Form Question Answering task, the researchers trained the model on health-related questions of the kind users often submit to search engines. Three datasets, LiveQA, MedicationQA, and HealthSearchQA, contribute the four thousand questions; LiveQA also includes answers to frequently asked questions. The system is expected to produce a detailed response of one or two paragraphs, similar to an entry on a health-related FAQ page. The questions span infectious diseases, chronic illnesses, dietary deficiencies, reproductive health, developmental issues, drug usage, pharmaceutical interactions, preventative measures, and a host of other consumer health subjects.
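Since the target answers are one or two paragraphs, a generation pipeline for this task would typically trim model output to that length. A small hypothetical helper (not from the paper) shows the idea:

```python
def truncate_to_paragraphs(text, max_paragraphs=2):
    """Keep at most the first `max_paragraphs` blank-line-separated
    paragraphs of a generated answer, mirroring the expected
    one-to-two-paragraph response format."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return "\n\n".join(paragraphs[:max_paragraphs])


raw = "Para one.\n\nPara two.\n\nPara three."
print(truncate_to_paragraphs(raw))
```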

These findings have practical implications for the field of biomedicine. Mistral 7B emerged as the top performer on both tasks, demonstrating its potential for clinical question-answering. BioMedLM, though much smaller than the 7B models, also showed respectable performance, and for those with the computational resources, BioGPT-large can provide satisfactory results. Notably, the domain-specific models performed worse on both tasks than the larger models trained on general English text, whose pretraining corpora may well have included PubMed. Whether a larger biomedical specialty model would significantly outperform Mistral 7B remains an open question, and the researchers stress that model outputs require expert medical review before any clinical application.


"For medicine, how do good, mid-sized, general LLMs (which may be partially trained on medical text) compare in performance to models built on medical resources like PubMed? We find that the general-purpose models now do better (Bolton, Xiong, et al. 2024)"

— Stanford NLP Group (@stanfordnlp), April 29, 2024

    The post Researchers at Stanford Explore the Potential of Mid-Sized Language Models for Clinical QA (Question-Answering) Tasks appeared first on MarkTechPost.

