
    Accelerate digital pathology slide annotation workflows on AWS using H-optimus-0

    January 31, 2025

    Digital pathology is essential for the diagnosis and treatment of cancer, playing a critical role in healthcare delivery and pharmaceutical research and development. Pathology traditionally relies heavily on pathologist expertise and experience to conduct meticulous examination of tissue samples to identify abnormalities. However, the increasing complexity and volume of cases necessitate advanced tools to assist pathologists in making faster, more accurate diagnoses.

    The digitization of pathology slides, known as whole slide images (WSIs), gave rise to the new field of computational pathology. By applying AI to these digitized WSIs, researchers are working to unlock new insights and enhance current annotation workflows. A pivotal advancement in the field of computational pathology has been the emergence of large-scale deep neural network architectures, known as foundation models (FMs). These models are trained using self-supervised learning algorithms on expansive datasets, enabling them to capture a comprehensive repertoire of visual representations and patterns inherent within pathology images. The power of FMs lies in their ability to learn robust and generalizable data embeddings that can be effectively transferred and fine-tuned for a wide variety of downstream tasks, ranging from automated disease detection and tissue characterization to quantitative biomarker analysis and pathological subtyping.

    Recently, French startup Bioptimus announced the release of a new pathology vision FM: H-optimus-0, the world’s largest publicly available FM for pathology. With 1.1 billion parameters, H-optimus-0 was trained on a proprietary dataset of several hundred million images extracted from over 500,000 histopathology slides. This sets a new benchmark for state-of-the-art performance in critical medical diagnostic tasks, from identifying cancerous cells to detecting genetic abnormalities in tumors.

    The recent addition of H-optimus-0 to Amazon SageMaker JumpStart marks a significant milestone in making advanced AI capabilities accessible to healthcare organizations. This powerful FM, with its comprehensive training on over 500,000 histopathology slides, represents a valuable tool for organizations looking to enhance their digital pathology workflows.

    In this post, we demonstrate how to use H-optimus-0 for two common digital pathology tasks: patch-level analysis for detailed tissue examination, and slide-level analysis for broader diagnostic assessment. Through practical examples, we show you how to adapt this FM to these specific use cases while optimizing computational resources.

    Solution overview

    Our solution uses the AWS integrated ecosystem to create an efficient, scalable pipeline for digital pathology AI workflows. The architecture combines the following services:

    • Amazon Elastic File System (Amazon EFS) for scalable high-throughput data management of pathology slides
    • Amazon Elastic Container Registry (Amazon ECR) for managing custom training containers
    • Amazon Simple Storage Service (Amazon S3) for secure model artifact storage
    • Amazon SageMaker for end-to-end machine learning (ML) operations and efficient compute resource allocation

    The following diagram illustrates the solution architecture for training and deploying fine-tuned FMs using H-optimus-0.

    This diagram illustrates the solution architecture for training and deploying fine-tuned FMs using H-optimus-0

    This post provides example scripts and training notebooks in the following GitHub repository.

    Prerequisites

    We assume you have access to and are authenticated in an AWS account. The AWS CloudFormation template for this solution uses t3.medium instances to host the SageMaker notebook. Feature extraction uses g5.2xlarge instances, which are powered by NVIDIA A10G GPUs, and was tested in the us-west-2 AWS Region. Training jobs run on p3.2xlarge and g5.2xlarge instances. Check your AWS service quotas to make sure you have sufficient access to these instance types.

    Create the AWS infrastructure

    To get started with pathology AI workflows, we use AWS CloudFormation to automate the setup of our core infrastructure. The provided infra-stack.yml template creates a complete environment ready for model fine-tuning and training.

    Our CloudFormation stack configures a secure networking environment using Amazon Virtual Private Cloud (Amazon VPC), establishing both public and private subnets with appropriate gateways for internet connectivity. Within this network, it creates an EFS file system to efficiently store and serve large pathology slide images. The stack also provisions a SageMaker notebook instance that automatically connects to the EFS storage, providing seamless access to training data.

    The template handles all necessary security configurations, including AWS Identity and Access Management (IAM) roles. When deploying the stack, make note of the private subnet and security group identifiers; you will need them to make sure your training jobs can access the EFS data storage.
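
    To make the role of the subnet and security group identifiers concrete, the following is a minimal sketch of how they could be combined with an EFS data channel in a training job request. It is not taken from the repository. The helper name build_efs_training_config and every identifier value are hypothetical placeholders; the dictionary keys follow the request structure of the SageMaker CreateTrainingJob API (VpcConfig and FileSystemDataSource).

```python
def build_efs_training_config(file_system_id, directory_path,
                              subnet_ids, security_group_ids,
                              channel_name="training"):
    """Build the network and input fragments of a CreateTrainingJob request.

    Hypothetical helper: it only assembles dictionaries and makes no AWS calls.
    """
    if not subnet_ids or not security_group_ids:
        raise ValueError(
            "EFS access requires at least one subnet and one security group"
        )

    # The training containers must run inside the VPC that hosts the
    # EFS mount targets, so the job needs the subnet and security group.
    vpc_config = {
        "Subnets": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
    }

    # One input channel that reads the slides from EFS in read-only mode.
    input_data_config = [
        {
            "ChannelName": channel_name,
            "DataSource": {
                "FileSystemDataSource": {
                    "FileSystemId": file_system_id,
                    "FileSystemType": "EFS",
                    "FileSystemAccessMode": "ro",
                    "DirectoryPath": directory_path,
                }
            },
        }
    ]
    return {"VpcConfig": vpc_config, "InputDataConfig": input_data_config}


# Placeholder identifiers, to be replaced with the outputs of your own stack.
config = build_efs_training_config(
    file_system_id="fs-0123456789abcdef0",
    directory_path="/pathology",
    subnet_ids=["subnet-0123456789abcdef0"],
    security_group_ids=["sg-0123456789abcdef0"],
)
```

    The resulting fragments would be merged into a full training job request alongside the algorithm, role, and resource settings, which this sketch leaves out.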

    For detailed setup instructions and configuration options, refer to the README in our GitHub repository.

    Use FMs for patch-level prediction tasks

    Patch-level analysis is fundamental to digital pathology AI workflows. Instead of processing entire WSIs that can exceed several gigabytes, patch-level analysis focuses on specific tissue regions. This targeted approach enables efficient resource utilization and faster model development cycles. The following diagram illustrates the workflow of patch-level prediction tasks on a WSI.

    This diagram illustrates the workflow of patch-level prediction tasks on a WSI

    Classification task: MHIST dataset

    We demonstrate patch-level classification using the MHIST dataset, which contains colorectal polyp images. Early detection of potentially cancerous polyps directly impacts patient survival rates, making this a clinically relevant use case. By adding a simple classification head on top of H-optimus-0’s pretrained features and using linear probing, we achieve 83% accuracy. The implementation uses Amazon EFS for efficient data streaming and p3.2xlarge instances for optimal GPU utilization.
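
    Linear probing keeps the FM frozen and trains only a linear classifier on the embeddings it produces. The following is a minimal sketch of that idea, not the repository's training code. It assumes a two-class problem, uses small synthetic vectors as a stand-in for H-optimus-0 embeddings, and fits a logistic regression head with plain gradient descent; the function names and hyperparameters are illustrative.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def train_linear_probe(features, labels, lr=0.5, epochs=200):
    """Fit a logistic regression head on frozen features (batch gradient descent)."""
    dim = len(features[0])
    n = len(features)
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the cross-entropy loss w.r.t. the logit
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        # Only the head is updated; the features themselves never change.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    """Return the predicted class (0 or 1) for one feature vector."""
    logit = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(logit) >= 0.5 else 0


def make_split(rng, n, dim=4):
    """Synthetic stand-in for precomputed embeddings: two noisy clusters."""
    feats, labs = [], []
    for i in range(n):
        y = i % 2
        center = 1.0 if y == 1 else -1.0
        feats.append([center + rng.gauss(0.0, 0.3) for _ in range(dim)])
        labs.append(y)
    return feats, labs


rng = random.Random(0)
train_x, train_y = make_split(rng, 40)
test_x, test_y = make_split(rng, 20)

weights, bias = train_linear_probe(train_x, train_y)
accuracy = sum(
    predict(weights, bias, x) == y for x, y in zip(test_x, test_y)
) / len(test_y)
print(f"held-out accuracy on synthetic features: {accuracy:.2f}")
```

    Because the backbone is never updated, the embeddings can be computed once and cached, which keeps each training run cheap compared with full fine-tuning.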

    To access the MHIST dataset, submit a data request through their portal to obtain the annotations.csv file and images.zip file. Our repository includes a download_mhist.sh script that automatically downloads and organizes the data in your EFS storage.

    Segmentation task: Lizard dataset

    For our second patch-level task, we demonstrate nuclear segmentation using the Lizard dataset, which requires precise pixel-level predictions of nuclear boundaries in colon tissue. We adapt H-optimus-0 for segmentation by adding a Mask2Former ViT adapter head, allowing the model to generate detailed segmentation masks while using the FM’s powerful feature extraction capabilities.

    The Lizard dataset is available on Kaggle, and our repository includes scripts to automatically download and prepare the data for training. The segmentation implementation runs on g5.16xlarge instances to handle the computational demands of pixel-level predictions.

    Use FMs for WSI-level tasks

    Analyzing entire WSIs presents unique challenges due to their massive size, often exceeding 50,000 x 50,000 pixels. To address this, we implement multiple instance learning (MIL), which treats each WSI as a collection of smaller patches. Our attention-based MIL approach automatically learns which regions are most relevant for the final prediction. The following diagram illustrates the workflow for WSI-level prediction tasks using MIL.

    This diagram illustrates the workflow for WSI-level prediction tasks using MIL
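
    The attention step can be illustrated with a small sketch. It assumes the common attention pooling formulation, in which each patch embedding is scored through a tanh projection and the scores are normalized with a softmax; the post does not state which variant the repository uses. The projection matrix V and scoring vector w would normally be learned, so the fixed values here are purely illustrative.

```python
import math


def attention_pool(patch_features, V, w):
    """Pool patch embeddings into one slide-level embedding.

    patch_features: list of K patch vectors, each of length D
    V: projection matrix as a list of H rows, each of length D
    w: scoring vector of length H
    Returns (slide_embedding, attention_weights).
    """
    scores = []
    for h in patch_features:
        hidden = [
            math.tanh(sum(v_i * h_i for v_i, h_i in zip(row, h))) for row in V
        ]
        scores.append(sum(w_j * u_j for w_j, u_j in zip(w, hidden)))

    # Softmax over patches; subtracting the max keeps exp() stable.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    attention = [e / total for e in exps]

    # Slide embedding is the attention-weighted sum of patch embeddings.
    dim = len(patch_features[0])
    slide = [
        sum(a * h[d] for a, h in zip(attention, patch_features))
        for d in range(dim)
    ]
    return slide, attention


# Three toy patch embeddings; the second one has the strongest signal.
patches = [[0.1, 0.0], [2.0, 1.5], [0.0, 0.2]]
V = [[1.0, 0.0], [0.0, 1.0]]  # illustrative values, normally learned
w = [1.0, 1.0]                # illustrative values, normally learned

slide_embedding, attention = attention_pool(patches, V, w)
```

    The attention weights sum to one across patches, so they can also be inspected after training to see which regions of the slide drove a prediction.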

    WSI processing pipeline

    Our implementation optimizes WSI analysis through the following methods:

    • Intelligent patching – We use the GPU-accelerated CuCIM library to efficiently load WSIs and apply Canny edge detection to identify and extract only tissue-containing regions
    • Feature extraction – The selected patches are processed in parallel using GPU acceleration, with features stored in space-efficient HDF5 format for downstream analysis
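
    The patch selection idea can be sketched without a GPU. The following simplified example assumes a binary edge map has already been computed (for instance, by an edge detector applied to a low-resolution view of the slide) and keeps only the tiles whose share of edge pixels reaches a threshold. The function name, the threshold value, and the use of nested lists instead of CuCIM arrays are assumptions made for illustration.

```python
def select_tissue_patches(edge_map, patch_size, min_edge_fraction=0.05):
    """Return (top, left) coordinates of tiles that likely contain tissue.

    edge_map: 2D list of 0/1 values, where 1 marks an edge pixel
    patch_size: side length of the square, non-overlapping tiles
    min_edge_fraction: hypothetical threshold on the share of edge pixels
    """
    height = len(edge_map)
    width = len(edge_map[0]) if height else 0
    selected = []
    for top in range(0, height - patch_size + 1, patch_size):
        for left in range(0, width - patch_size + 1, patch_size):
            edge_pixels = sum(
                edge_map[r][c]
                for r in range(top, top + patch_size)
                for c in range(left, left + patch_size)
            )
            # Background tiles have few or no edges and are skipped.
            if edge_pixels / (patch_size * patch_size) >= min_edge_fraction:
                selected.append((top, left))
    return selected


# Toy 4 x 4 edge map split into four 2 x 2 tiles.
edge_map = [
    [0, 0, 1, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 1, 0, 0],
]

tissue_tiles = select_tissue_patches(edge_map, patch_size=2)
strict_tiles = select_tissue_patches(edge_map, patch_size=2, min_edge_fraction=0.5)
```

    Skipping background tiles before feature extraction reduces the number of patches sent to the FM, which is where most of the compute is spent.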

    MSI status prediction

    We demonstrate our WSI pipeline by predicting microsatellite instability (MSI) status, a crucial biomarker that guides immunotherapy decisions in cancer treatment. The TCGA-COAD dataset used for this task can be accessed through the GDC Data Portal, and our repository provides detailed instructions for downloading the WSIs and corresponding MSI labels.

    Clean up

    After you’ve finished, don’t forget to delete the associated resources (Amazon EFS storage and SageMaker notebook instances) to avoid unexpected costs.

    Conclusion

    In this post, we demonstrated how you can use AWS services to build scalable digital pathology AI workflows using the H-optimus-0 FM. Through practical examples of both patch-level tasks (MHIST classification and Lizard nuclear segmentation) and WSI analysis (MSI status prediction), we showed how to efficiently handle the unique challenges of computational pathology.

    Our implementation highlights the seamless integration between AWS services for handling large-scale pathology data processing. Although we used Amazon EFS for this demonstration to enable high-throughput training workflows, production deployments might consider AWS HealthImaging for long-term storage of medical imaging data.

    We hope this pipeline serves as a starting point for your own pathology AI initiatives. The provided GitHub repository contains the necessary components to help you begin building and scaling pathology workflows for your specific use cases. You can clone the repository and set up the infrastructure using the provided CloudFormation template. Then try fine-tuning H-optimus-0 on your own pathology datasets and downstream tasks and compare the results with your current methods.

    We’d love to hear about your experiences and insights. Reach out to us or contribute to the publicly available FMs to help advance the field of computational pathology.


    About the Authors

    Pierre de Malliard is a Senior AI/ML Solutions Architect at Amazon Web Services and supports customers in the healthcare and life sciences industry. In his free time, Pierre enjoys skiing and exploring the New York food scene.

    Christopher is a senior partner account manager at Amazon Web Services (AWS), helping independent software vendors (ISVs) innovate, build, and co-sell cloud-based healthcare software-as-a-service (SaaS) solutions in the public sector. Part of the Healthcare and Life Sciences Technical Field Community (TFC), Christopher aims to accelerate the digitization and utilization of healthcare data to drive improved outcomes and personalized care delivery.
