
    Google Researchers Advance Diagnostic AI: AMIE Now Matches or Outperforms Primary Care Physicians Using Multimodal Reasoning with Gemini 2.0 Flash

    May 4, 2025

    LLMs have shown impressive promise in conducting diagnostic conversations, particularly through text-based interactions. However, their evaluation and application have largely ignored the multimodal nature of real-world clinical settings, especially in remote care delivery, where images, lab reports, and other medical data are routinely shared through messaging platforms. While systems like the Articulate Medical Intelligence Explorer (AMIE) have matched or surpassed primary care physicians in text-only consultations, this format falls short of reflecting telemedicine environments. Multimodal communication is essential in modern care, as patients often share photographs, documents, and other visual artifacts that cannot be fully conveyed through text alone. Limiting AI systems to textual inputs risks omitting critical clinical information, increasing diagnostic errors, and creating accessibility barriers for patients with lower health or digital literacy. Despite the widespread use of multimedia messaging apps in global healthcare, there has been little research into how LLMs can reason over such diverse data during diagnostic interactions.

    Research in diagnostic conversational agents began with rule-based systems like MYCIN, but recent developments have focused on LLMs capable of emulating clinical reasoning. While multimodal AI systems, such as vision-language models, have demonstrated success in radiology and dermatology, integrating these capabilities into conversational diagnostics remains challenging. Effective AI-based diagnostic tools must handle the complexity of multimodal reasoning and uncertainty-driven information gathering, a step beyond merely answering isolated questions. Evaluation frameworks like OSCEs and platforms such as AgentClinic provide useful starting points, yet tailored metrics are still needed to assess performance in multimodal diagnostic contexts. Moreover, while messaging apps are increasingly used in low-resource settings for sharing clinical data, concerns about data privacy, integration with formal health systems, and policy compliance persist. 

Google DeepMind and Google Research have enhanced AMIE with multimodal capabilities for improved conversational diagnosis and management. Built on Gemini 2.0 Flash, AMIE employs a state-aware dialogue framework that adapts the conversation flow to the evolving patient state and diagnostic uncertainty, enabling strategic, structured history-taking with multimodal inputs such as skin images, ECGs, and documents. In a randomized OSCE-style study spanning 105 scenarios and 25 patient actors, AMIE matched or outperformed primary care physicians on 29 of 32 clinical metrics and 7 of 9 multimodal-specific criteria, demonstrating strong diagnostic accuracy, reasoning, communication, and empathy.
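    The paper does not publish AMIE's serving code, but the basic pattern of letting Gemini 2.0 Flash reason over a mixed text-and-image patient message can be sketched with the public google-genai Python SDK. The file name, prompt wording, and key handling below are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: sending a multimodal patient message (text + skin photo)
# to Gemini 2.0 Flash via the public google-genai SDK. Not AMIE's actual code.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # assumption: API-key auth

# The patient shares a photo alongside their chat message, mirroring the
# messaging-app consultations the study targets.
image_bytes = open("forearm_rash.jpg", "rb").read()  # illustrative file name

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
        "Patient message: 'This rash appeared on my forearm three days ago "
        "and keeps getting itchier.' As a diagnostic assistant, list the most "
        "likely differential diagnoses and the single next question you would ask.",
    ],
)
print(response.text)
```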

    The study enhances the AMIE diagnostic system by incorporating multimodal perception and a state-aware dialogue framework that guides conversations through phases of history taking, diagnosis, and follow-up. Gemini 2.0 Flash powers the system and dynamically adapts based on evolving patient data, including text, images, and clinical documents. A structured patient profile and differential diagnosis are updated throughout the interaction, with targeted questions and multimodal data requests guiding clinical reasoning. Evaluation includes automated perception tests on isolated artifacts, simulated dialogues rated by auto-evaluators, and expert OSCE-style assessments, ensuring robust diagnostic performance and clinical realism. 
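    The state-aware dialogue framework is described only at a high level in the paper; the following minimal Python sketch shows one way such a controller could be organized, with a running patient profile, a ranked differential diagnosis, and phase-dependent actions driven by residual uncertainty. Every name here (PatientState, query_llm, the 0.4 uncertainty threshold) is a hypothetical illustration, not the authors' implementation.

```python
# Minimal sketch of a state-aware diagnostic dialogue loop (illustrative only).
# query_llm is a placeholder for a call to Gemini 2.0 Flash or another LLM.
from dataclasses import dataclass, field

@dataclass
class PatientState:
    profile: dict = field(default_factory=dict)             # structured history so far
    differential: list[str] = field(default_factory=list)   # ranked candidate diagnoses
    artifacts: list[bytes] = field(default_factory=list)    # images, ECGs, documents
    uncertainty: float = 1.0                                 # 1.0 = no information yet

PHASES = ["history_taking", "diagnosis", "follow_up"]

def update_state(state: PatientState, patient_turn: str, query_llm) -> PatientState:
    """Ask the model to revise the profile, differential, and uncertainty estimate."""
    summary = query_llm(
        f"Current profile: {state.profile}\n"
        f"Current differential: {state.differential}\n"
        f"New patient message: {patient_turn}\n"
        "Return an updated profile, a ranked differential, and a 0-1 uncertainty score."
    )
    state.profile = summary["profile"]
    state.differential = summary["differential"]
    state.uncertainty = summary["uncertainty"]
    return state

def next_action(state: PatientState, phase: str, query_llm) -> str:
    """Pick the next question or multimodal data request given the current phase."""
    if phase == "history_taking" and state.uncertainty > 0.4:
        # High uncertainty: keep gathering history, or request the artifact
        # (photo, ECG, document) expected to reduce uncertainty the most.
        return query_llm(
            f"Given profile {state.profile} and differential {state.differential}, "
            "ask the single most informative question, or request a photo/ECG/document."
        )
    if phase == "diagnosis":
        return query_llm(f"Explain the leading diagnosis from {state.differential}.")
    return query_llm("Propose a management and follow-up plan.")
```

    In the system the study describes, these updates and action choices are produced by Gemini 2.0 Flash itself at inference time; the hand-written threshold above is only a stand-in for that uncertainty-driven behavior.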

    The results show that the multimodal AMIE system performs on par with or better than primary care physicians (PCPs) across multiple clinical tasks in simulated text-chat consultations. In OSCE-style assessments, AMIE consistently outperformed PCPs in diagnostic accuracy, especially when interpreting multimodal data such as images and clinical documents. It also demonstrated greater robustness when image quality was poor and showed fewer hallucinations. Patient actors rated AMIE’s communication skills highly, including empathy and trust. Automated evaluations confirmed that AMIE’s advanced reasoning framework, built on the Gemini 2.0 Flash model, significantly improved diagnosis and conversation quality, validating its design and effectiveness in realistic clinical scenarios.

    In conclusion, the study advances conversational diagnostic AI by enhancing AMIE to integrate multimodal reasoning within patient dialogues. Using a novel state-aware inference-time strategy with Gemini 2.0 Flash, AMIE can interpret and reason about medical artifacts like images or ECGs in real-time clinical conversations. Evaluated through a multimodal OSCE framework, AMIE outperformed or matched primary care physicians in diagnostic accuracy, empathy, and artifact interpretation, even in complex cases. Despite limitations tied to chat-based interfaces and the need for real-world testing, these findings highlight AMIE’s potential as a robust, context-aware diagnostic assistant for future telehealth applications. 


    Check out the Paper and technical details.

    The post Google Researchers Advance Diagnostic AI: AMIE Now Matches or Outperforms Primary Care Physicians Using Multimodal Reasoning with Gemini 2.0 Flash appeared first on MarkTechPost.
