    Google Researchers Advance Diagnostic AI: AMIE Now Matches or Outperforms Primary Care Physicians Using Multimodal Reasoning with Gemini 2.0 Flash

    May 4, 2025

    Large language models (LLMs) have shown impressive promise in conducting diagnostic conversations, particularly through text-based interactions. However, their evaluation and application have largely ignored the multimodal nature of real-world clinical settings, especially remote care, where images, lab reports, and other medical data are routinely shared through messaging platforms. Systems such as the Articulate Medical Intelligence Explorer (AMIE) have matched or surpassed primary care physicians in text-only consultations, but that format does not reflect how telemedicine actually works. Multimodal communication is essential in modern care, since patients often share photographs, documents, and other visual artifacts whose content cannot be fully conveyed in text. Limiting AI systems to textual input risks omitting critical clinical information, increasing diagnostic errors, and creating accessibility barriers for patients with lower health or digital literacy. Despite the widespread use of multimedia messaging apps in healthcare worldwide, there has been little research into how LLMs can reason over such diverse data during diagnostic interactions.

    Research on diagnostic conversational agents began with rule-based systems such as MYCIN, but recent work has focused on LLMs capable of emulating clinical reasoning. Multimodal AI systems, such as vision-language models, have succeeded in radiology and dermatology, yet integrating these capabilities into conversational diagnosis remains challenging. An effective diagnostic assistant must handle multimodal reasoning and gather information in response to its own uncertainty, a step beyond answering isolated questions. Evaluation frameworks such as Objective Structured Clinical Examinations (OSCEs) and platforms such as AgentClinic provide useful starting points, but tailored metrics are still needed to assess performance in multimodal diagnostic contexts. Moreover, although messaging apps are increasingly used in low-resource settings to share clinical data, concerns about data privacy, integration with formal health systems, and policy compliance persist.

    Google DeepMind and Google Research have extended AMIE with multimodal capabilities for conversational diagnosis and management. Built on Gemini 2.0 Flash, AMIE uses a state-aware dialogue framework that adapts the conversation flow to the patient's state and the current diagnostic uncertainty, enabling structured, strategic history-taking that draws on multimodal inputs such as skin images, ECGs, and clinical documents. In a randomized OSCE-style study of 105 scenarios with 25 patient actors, AMIE matched or outperformed primary care physicians on 29 of 32 clinical metrics and 7 of 9 multimodal-specific criteria, showing strong diagnostic accuracy, reasoning, communication, and empathy.

    The study extends AMIE with multimodal perception and a state-aware dialogue framework that guides each conversation through three phases: history taking, diagnosis, and follow-up. Gemini 2.0 Flash powers the system, which adapts dynamically as patient data (text, images, and clinical documents) accumulate. A structured patient profile and a differential diagnosis are updated throughout the interaction, and targeted questions and requests for multimodal data steer the clinical reasoning. Evaluation combines automated perception tests on isolated artifacts, simulated dialogues rated by auto-evaluators, and expert OSCE-style assessments, together designed to probe both diagnostic performance and clinical realism.
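    The article does not publish AMIE's implementation. As a rough illustration only, the phase-driven loop described above can be sketched in Python as a small state machine. Everything in it is hypothetical: the `Phase` and `DialogueState` names, the use of Shannon entropy over the differential as the "diagnostic uncertainty" signal, the Bayes-style reweighting, the thresholds, and the example likelihoods are assumptions chosen for clarity, not AMIE's actual method.

```python
import math
from dataclasses import dataclass, field
from enum import Enum, auto


class Phase(Enum):
    # Hypothetical labels for the three phases named in the article
    HISTORY_TAKING = auto()
    DIAGNOSIS = auto()
    FOLLOW_UP = auto()


@dataclass
class DialogueState:
    phase: Phase = Phase.HISTORY_TAKING
    profile: dict = field(default_factory=dict)       # structured patient profile
    differential: dict = field(default_factory=dict)  # condition -> probability
    turns: int = 0


def normalise(dist: dict) -> dict:
    """Rescale weights so they sum to 1 (returned unchanged if they sum to 0)."""
    total = sum(dist.values())
    return {k: v / total for k, v in dist.items()} if total else dict(dist)


def entropy(dist: dict) -> float:
    """Shannon entropy in bits; an assumed stand-in for 'diagnostic uncertainty'."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def update(state: DialogueState, findings: dict, likelihoods: dict) -> None:
    """Merge new findings into the profile and reweight the differential.

    `likelihoods` maps a condition to how strongly the new finding supports it;
    conditions not listed keep their current weight (factor 1.0).
    """
    state.profile.update(findings)
    state.differential = normalise(
        {dx: p * likelihoods.get(dx, 1.0) for dx, p in state.differential.items()}
    )
    state.turns += 1


def next_action(state: DialogueState, uncertainty_threshold: float = 1.0,
                max_history_turns: int = 10) -> str:
    """Choose the next move; the thresholds are illustrative, not from the paper."""
    if state.phase is Phase.HISTORY_TAKING:
        done = (entropy(state.differential) <= uncertainty_threshold
                or state.turns >= max_history_turns)
        if done:
            state.phase = Phase.DIAGNOSIS
            return "present differential and management plan"
        # Still uncertain: request a relevant artifact if it is missing
        if "rash" in state.profile.get("symptoms", []) and "skin_image" not in state.profile:
            return "request photo of the rash"
        return "ask targeted question about leading hypotheses"
    if state.phase is Phase.DIAGNOSIS:
        state.phase = Phase.FOLLOW_UP
        return "answer follow-up questions"
    return "close consultation"


# Example: a skin complaint where a photo sharpens the differential.
# The likelihood values below are made up for illustration.
state = DialogueState(differential=normalise(
    {"eczema": 1, "psoriasis": 1, "tinea": 1, "contact dermatitis": 1}))
update(state, {"symptoms": ["rash", "itch"]}, {})
print(next_action(state))  # request photo of the rash
update(state, {"skin_image": "rash.jpg"},
       {"eczema": 0.05, "psoriasis": 0.9, "tinea": 0.05, "contact dermatitis": 0.05})
print(next_action(state))  # present differential and management plan
```

    In the system the article describes, Gemini 2.0 Flash drives these decisions; the hand-written rules here only stand in for the model so that the control flow (profile and differential updated each turn, uncertainty deciding when to leave history taking, artifact requests while uncertain) is visible.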

    The results show that the multimodal AMIE system performs on par with or better than primary care physicians (PCPs) across multiple clinical tasks in simulated text-chat consultations. In the OSCE-style assessments, AMIE consistently outperformed PCPs in diagnostic accuracy, especially when interpreting multimodal data such as images and clinical documents. It was also more robust when image quality was poor and produced fewer hallucinations. Patient actors rated AMIE’s communication skills highly, including empathy and trust. Automated evaluations showed that AMIE’s state-aware reasoning framework, built on Gemini 2.0 Flash, significantly improved diagnostic and conversational quality, supporting its design in simulated clinical scenarios.

    In conclusion, the study advances conversational diagnostic AI by enhancing AMIE to integrate multimodal reasoning within patient dialogues. Using a novel state-aware inference-time strategy with Gemini 2.0 Flash, AMIE can interpret and reason about medical artifacts like images or ECGs in real-time clinical conversations. Evaluated through a multimodal OSCE framework, AMIE outperformed or matched primary care physicians in diagnostic accuracy, empathy, and artifact interpretation, even in complex cases. Despite limitations tied to chat-based interfaces and the need for real-world testing, these findings highlight AMIE’s potential as a robust, context-aware diagnostic assistant for future telehealth applications. 


    The post Google Researchers Advance Diagnostic AI: AMIE Now Matches or Outperforms Primary Care Physicians Using Multimodal Reasoning with Gemini 2.0 Flash appeared first on MarkTechPost.
