    AgentClinic: Simulating Clinical Environments for Assessing Language Models in Healthcare

    May 17, 2024

    A central goal of AI is to build interactive systems that can solve diverse problems, including medical AI systems aimed at improving patient outcomes. Large language models (LLMs) have demonstrated strong problem-solving abilities, surpassing human scores on exams such as the USMLE. Although LLMs could make healthcare more accessible, they remain limited in real-world clinical settings. Clinical tasks are complex: they involve sequential decision-making, handling uncertainty, and compassionate patient care. Current evaluations mostly rely on static multiple-choice questions, which do not capture the dynamic, interactive nature of clinical work.

    The USMLE assesses medical students on foundational knowledge, clinical application, and readiness for independent practice. In contrast, the Objective Structured Clinical Examination (OSCE) evaluates practical clinical skills through simulated patient scenarios, allowing direct observation and a more comprehensive assessment. Language models in medicine are primarily evaluated with knowledge-based benchmarks such as MedQA, a collection of challenging medical question-answer pairs. Recent work aims to improve how language models are applied and evaluated in healthcare. Examples include red teaming and new benchmarks such as EquityMedQA, which targets biases. Simulations of clinical decision-making, such as AMIE, have also shown promise for improving diagnostic accuracy in medical AI.

    Researchers from Stanford University, Johns Hopkins University, and Hospital Israelita Albert Einstein present AgentClinic, an open-source benchmark that simulates clinical environments with patient, doctor, and measurement language agents. It extends previous simulations by letting the doctor request medical exams (e.g., temperature, blood pressure) and order medical images (e.g., MRI, X-ray) through dialogue. AgentClinic also supports 24 biases found in clinical settings.
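
To make the idea of a "supported bias" more concrete, here is a minimal sketch of one way a bias could be injected into an agent: a short bias sentence is appended to the agent's otherwise unbiased instructions. The dictionary `BIAS_TEMPLATES`, the function `build_system_prompt`, and every bias wording below are hypothetical and illustrative. They are not AgentClinic's actual bias list or prompts.

```python
from typing import Optional

# Hypothetical, illustrative bias templates. The labels are generic cognitive
# biases; neither the list nor the wording is AgentClinic's actual prompt set.
BIAS_TEMPLATES = {
    # doctor-side cognitive biases
    "recency": "You recently treated a patient with similar symptoms who turned out to have {dx}.",
    "confirmation": "You strongly suspect {dx} and tend to favor evidence that supports it.",
    # patient-side cognitive bias
    "self_diagnosis": "You read about your symptoms online and are convinced you have {dx}.",
}


def build_system_prompt(role_instructions: str,
                        bias: Optional[str] = None,
                        distractor_dx: str = "") -> str:
    """Return an agent's system prompt, optionally with one bias sentence appended.

    role_instructions: the unbiased role description (doctor, patient, ...).
    bias: a key of BIAS_TEMPLATES, or None for an unbiased agent.
    distractor_dx: the (usually incorrect) diagnosis the bias nudges toward.
    """
    if bias is None:
        return role_instructions
    if bias not in BIAS_TEMPLATES:
        raise KeyError(f"unknown bias: {bias!r}")
    return role_instructions + "\n" + BIAS_TEMPLATES[bias].format(dx=distractor_dx)


# Usage: the same doctor instructions, unbiased and with a recency bias.
doctor_base = "You are a doctor. Ask questions and order tests, then give a diagnosis."
print(build_system_prompt(doctor_base))
print(build_system_prompt(doctor_base, bias="recency", distractor_dx="migraine"))
```

Keeping the bias as a separable suffix makes it easy to run the same scenario with and without a bias and compare diagnostic accuracy. That comparison is the kind of measurement the article reports later.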

    AgentClinic introduces four language agents: patient, doctor, measurement, and moderator. Each agent has a specific role and its own private information. The patient agent describes its symptoms without knowing the diagnosis. The measurement agent provides medical readings and test results on request. The doctor agent questions the patient and orders tests. The moderator judges whether the doctor's final diagnosis is correct. The scenarios are built from curated medical questions from the USMLE and from NEJM case challenges, which are turned into structured scenarios for evaluating language models such as GPT-4.
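
The division of roles described above can be sketched as a simple turn-based loop. This is a minimal sketch under stated assumptions, not AgentClinic's implementation:

- The `Scenario` format, all class and function names, and the keyword-lookup and exact-match logic are hypothetical stand-ins for LLM-backed agents.
- The `max_turns` budget mirrors the interaction limit N discussed later in the article.
- Running out of turns counts as a failed diagnosis here, which may not match the benchmark's exact rule.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Action = Tuple[str, str]            # (kind, content); kind is "ask", "test" or "diagnose"
Transcript = List[Tuple[str, str]]  # (speaker, utterance) pairs


@dataclass
class Scenario:
    """Hypothetical scenario format: each agent sees only its own slice."""
    patient_facts: Dict[str, str]  # keyword -> what the patient says (no diagnosis)
    test_results: Dict[str, str]   # test name -> result, held by the measurement agent
    diagnosis: str                 # ground truth, visible only to the moderator


class PatientAgent:
    def __init__(self, facts: Dict[str, str]):
        self.facts = facts

    def reply(self, question: str) -> str:
        # Stub: keyword lookup stands in for an LLM prompted with the facts.
        for keyword, answer in self.facts.items():
            if keyword in question.lower():
                return answer
        return "I'm not sure."


class MeasurementAgent:
    def __init__(self, results: Dict[str, str]):
        self.results = results

    def reply(self, test_name: str) -> str:
        # Stub: return the stored result, or a default for tests not in the scenario.
        return self.results.get(test_name, "Normal readings.")


class ModeratorAgent:
    def __init__(self, diagnosis: str):
        self.diagnosis = diagnosis

    def is_correct(self, proposed: str) -> bool:
        # Stub: case-insensitive exact match; a real moderator would judge equivalence.
        return proposed.strip().lower() == self.diagnosis.strip().lower()


def run_episode(scenario: Scenario,
                doctor: Callable[[Transcript], Action],
                max_turns: int = 20) -> Tuple[bool, Transcript]:
    """Run one consultation; return (diagnosis_correct, transcript)."""
    patient = PatientAgent(scenario.patient_facts)
    lab = MeasurementAgent(scenario.test_results)
    moderator = ModeratorAgent(scenario.diagnosis)
    transcript: Transcript = []
    for _ in range(max_turns):
        kind, content = doctor(transcript)
        if kind == "diagnose":
            return moderator.is_correct(content), transcript
        if kind == "ask":
            speaker, answer = "patient", patient.reply(content)
        elif kind == "test":
            speaker, answer = "measurement", lab.reply(content)
        else:
            raise ValueError(f"unknown action: {kind!r}")
        transcript += [("doctor", content), (speaker, answer)]
    return False, transcript  # budget exhausted before a diagnosis


def scripted_doctor(actions: List[Action]) -> Callable[[Transcript], Action]:
    """Hypothetical stand-in for an LLM doctor: replays a fixed list of actions."""
    remaining = iter(actions)
    return lambda transcript: next(remaining, ("diagnose", "unknown"))


# Usage with a toy case (not taken from the benchmark).
case = Scenario(
    patient_facts={"pain": "My head throbs on one side.", "light": "Bright light makes it worse."},
    test_results={"CT head": "No acute findings."},
    diagnosis="Migraine",
)
plan = [("ask", "Where is the pain?"), ("test", "CT head"), ("diagnose", "migraine")]
print(run_episode(case, scripted_doctor(plan))[0])               # True
print(run_episode(case, scripted_doctor(plan), max_turns=2)[0])  # False: ran out of turns
```

Only the moderator holds the ground-truth diagnosis, so neither the patient nor the doctor can leak it. In a real setup, each stub would wrap a language model prompted with that agent's slice of the scenario.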

    The accuracy of several language models (GPT-4, Mixtral-8x7B, GPT-3.5, and Llama 2 70B-chat) is evaluated on AgentClinic-MedQA, where each model acts as the doctor agent and diagnoses patients through dialogue. GPT-4 achieved the highest accuracy at 52%, followed by GPT-3.5 at 38%, Mixtral-8x7B at 37%, and Llama 2 70B-chat at 9%. A model's MedQA accuracy turned out to be a weak predictor of its AgentClinic-MedQA accuracy. This echoes studies showing that medical residents' USMLE scores only weakly predict their clinical performance.
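
The article says MedQA accuracy weakly predicts AgentClinic-MedQA accuracy, but it does not say which statistic was used. A rank correlation across models is one common way to check whether one benchmark predicts another. The sketch below is a plain Spearman implementation of that idea. It is an assumption for illustration, not necessarily the authors' analysis.

```python
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[int]:
    """1-based ranks (1 = smallest value); assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation via 1 - 6*sum(d^2) / (n(n^2 - 1)), valid without ties."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    if len(set(x)) != n or len(set(y)) != n:
        raise ValueError("tied values are not handled by this simple formula")
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Toy values, not results from the paper: same ordering -> 1.0, reversed -> -1.0.
print(spearman_rho([0.6, 0.7, 0.8], [0.2, 0.3, 0.5]))  # 1.0
print(spearman_rho([0.6, 0.7, 0.8], [0.5, 0.3, 0.2]))  # -1.0
```

Each input list would hold one accuracy per model, with MedQA scores in one list and AgentClinic-MedQA scores in the other. With only a handful of models, the coefficient is very coarse. A library routine such as SciPy's `spearmanr` also handles ties.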

    To recapitulate, the researchers present AgentClinic, a benchmark for simulating clinical environments with 15 multimodal language agents and 107 unique language agents based on USMLE cases. These agents can exhibit 23 biases that affect diagnostic accuracy and patient-doctor interactions. For GPT-4, the highest-performing model, cognitive biases reduced accuracy by 1.7%-2% and implicit biases reduced it by 1.5%. The biases also affected patient agents' confidence and willingness to return for follow-up. Cross-communication between patient and doctor models improves accuracy. The interaction budget matters in both directions: limiting the dialogue to N=10 interactions reduced accuracy by 27%, while allowing more than 20 interactions (N>20) reduced it by 4%-9%. In a multimodal clinical environment based on NEJM cases, GPT-4V achieves around 27% accuracy.

    Check out the Paper and Project. All credit for this research goes to the researchers of this project.

    The post AgentClinic: Simulating Clinical Environments for Assessing Language Models in Healthcare appeared first on MarkTechPost.
