Automated information extraction from radiology reports is a significant challenge in medical informatics. Researchers aim to build systems that can accurately extract and interpret complex medical data from radiological reports, with a particular focus on tracking disease progression over time. The primary challenge lies in the limited availability of suitably labeled data that can capture the nuanced information contained in these reports. Current methodologies often struggle to represent the temporal aspects of patient conditions, especially comparisons with prior examinations, which are crucial for understanding a patient’s healthcare trajectory.
To overcome the limitations in capturing temporal changes in radiology reports, researchers have developed RadGraph2, an enhanced hierarchical schema for entities and relations. This new approach builds upon the original RadGraph schema, expanding its capabilities to represent various types of changes observed in patient conditions over time. RadGraph2 was developed through an iterative process, involving continuous feedback from medical practitioners to ensure its coverage, faithfulness, and reliability. The schema maintains the original design principles of maximizing clinically relevant information while preserving simplicity for efficient labeling. This method enables the capture of detailed information about findings and changes described in radiology reports, particularly focusing on comparisons with prior examinations.
The RadGraph2 method employs a Hierarchical Graph Information Extraction (HGIE) model to annotate radiology reports automatically. This approach utilizes the structured organization of labels to enhance information extraction performance. The core of the system is a Hierarchical Recognition (HR) component that utilizes an entity taxonomy, recognizing inherent relationships between various entities used in graph labeling. For instance, entities like CHAN-CON-WOR and CHAN-CON-AP are categorized under changes in patient conditions. The HR system uses a BERT-based model as its backbone, extracting 12 scalar outputs corresponding to entity categories. These outputs represent conditional probabilities of entities being true, given their parent’s truth in the entity hierarchy.
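To make the conditional-probability formulation concrete, the sketch below shows one way such a head could be wired on top of a BERT encoder. This is not the authors’ HGIE implementation: the 12-node layout, the parent-index table, and the class names are illustrative assumptions that only mirror the description above.

```python
# Minimal sketch of hierarchical entity recognition over a BERT backbone.
# Assumption: taxonomy nodes are ordered so that every parent precedes its children.
import torch
import torch.nn as nn
from transformers import AutoModel

class HierarchicalTagger(nn.Module):
    def __init__(self, encoder_name="bert-base-uncased", num_nodes=12):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        # One scalar per taxonomy node, read as P(node is true | its parent is true).
        self.node_head = nn.Linear(hidden, num_nodes)
        # Hypothetical parent table (-1 marks a root), e.g. a root such as CHAN
        # followed by CHAN-CON and leaves like CHAN-CON-AP / CHAN-CON-WOR.
        self.parent = [-1, 0, 1, 1, 0, -1, 5, 5, 5, -1, 9, 9]

    def forward(self, input_ids, attention_mask):
        h = self.encoder(input_ids=input_ids,
                         attention_mask=attention_mask).last_hidden_state
        cond = torch.sigmoid(self.node_head(h))  # per-token conditional probabilities
        # Marginal of each node = product of conditionals along its path to the root.
        margs = []
        for i, p in enumerate(self.parent):
            margs.append(cond[..., i] if p < 0 else cond[..., i] * margs[p])
        return torch.stack(margs, dim=-1)
```

Training such a head against the fine-grained labels lets closely related categories share signal, which is the stated motivation for exploiting the entity taxonomy.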
RadGraph2’s information schema defines three main entity types: “anatomy,” “observation,” and “change,” along with three relation types: “modify,” “located at,” and “suggestive of.” The entity types are further divided into subtypes, forming a hierarchical structure. Change entities (CHAN) are a key addition to the original RadGraph schema, encompassing subtypes such as No change (CHAN-NC), Change in medical condition (CHAN-CON), and Change in medical devices (CHAN-DEV). Each of these subtypes is further categorized to capture specific aspects of change, such as condition appearance, worsening, improvement, or resolution. Anatomy entities (ANAT) and Observation entities (OBS) are retained from the original schema, with OBS further divided into definitely present, uncertain, and absent subtypes. This hierarchical structure allows for a more nuanced representation of the information contained in radiology reports, particularly emphasizing the temporal aspects and changes in patient conditions.
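The taxonomy itself can be pictured as a small tree. In the sketch below, only the codes that appear in the text (ANAT-DP, CHAN-NC, CHAN-CON, CHAN-DEV, CHAN-CON-AP, CHAN-CON-WOR) are taken from the source; the remaining subtype codes follow the original RadGraph naming conventions and should be treated as assumptions.

```python
# Illustrative view of the hierarchical entity taxonomy described above.
ENTITY_TAXONOMY = {
    "ANAT": {"ANAT-DP": {}},                           # anatomy, definitely present
    "OBS": {"OBS-DP": {}, "OBS-U": {}, "OBS-DA": {}},  # present / uncertain / absent
    "CHAN": {
        "CHAN-NC": {},                                 # no change
        "CHAN-CON": {                                  # change in medical condition
            "CHAN-CON-AP": {},                         # condition appeared
            "CHAN-CON-WOR": {},                        # condition worsened
            "CHAN-CON-IMP": {},                        # condition improved (assumed code)
            "CHAN-CON-RES": {},                        # condition resolved (assumed code)
        },
        "CHAN-DEV": {},                                # change in medical devices
    },
}

def path_to_root(code, tree=ENTITY_TAXONOMY, prefix=()):
    """Return the chain of ancestor codes ending at the given code, if present."""
    for key, sub in tree.items():
        if key == code:
            return prefix + (key,)
        found = path_to_root(code, sub, prefix + (key,))
        if found:
            return found
    return None

print(path_to_root("CHAN-CON-WOR"))  # ('CHAN', 'CHAN-CON', 'CHAN-CON-WOR')
```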
RadGraph2’s schema defines three types of relations as directed edges between entities:
1. Modify relations (modify):
   • Indicate that the first entity modifies the second entity
   • Connect entity types: (OBS-*, OBS-*), (ANAT-DP, ANAT-DP), (CHAN-*, *), and (OBS-*, CHAN-*)
   • Example: “right” → “lung” in “right lung”
2. Located at relations (located_at):
   • Connect anatomy and observation entities
   • Indicate that observation is related to anatomy
   • Connect entity types: (OBS-*, ANAT-DP)
   • Example: “clear” → “lungs” in “lungs are clear”
3. Suggestive of relations (suggestive_of):
   • Indicate that the status of the second entity is derived from the first entity
   • Connect entity types: (OBS-*, OBS-*), (CHAN-*, OBS-*), and (OBS-*, CHAN-*)
   • Example: “opacity” → “pneumonia” in “The opacity may indicate pneumonia”
These relations enable RadGraph2 to capture the complex relationships between different entities in radiology reports, including modifications, anatomical associations, and diagnostic inferences. The schema’s relational structure allows for a more comprehensive representation of the information contained in the reports, facilitating a better understanding of the interconnections between observations, anatomical structures, and changes in patient conditions.
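The type constraints listed above lend themselves to a simple validity check when post-processing predicted graphs. The helper below is a hedged sketch of that idea, not part of the released schema code; the wildcard-matching convention is an assumption.

```python
# Allowed (head, tail) entity-type patterns per relation, as listed in the schema.
ALLOWED_RELATIONS = {
    "modify": [("OBS-*", "OBS-*"), ("ANAT-DP", "ANAT-DP"), ("CHAN-*", "*"), ("OBS-*", "CHAN-*")],
    "located_at": [("OBS-*", "ANAT-DP")],
    "suggestive_of": [("OBS-*", "OBS-*"), ("CHAN-*", "OBS-*"), ("OBS-*", "CHAN-*")],
}

def _matches(label: str, pattern: str) -> bool:
    # "*" matches anything; "OBS-*" matches any label with the "OBS-" prefix.
    return pattern == "*" or label == pattern or (
        pattern.endswith("*") and label.startswith(pattern[:-1])
    )

def relation_is_valid(relation: str, head_label: str, tail_label: str) -> bool:
    """True if (head_label, tail_label) is a permitted pair for this relation type."""
    return any(_matches(head_label, h) and _matches(tail_label, t)
               for h, t in ALLOWED_RELATIONS.get(relation, []))

assert relation_is_valid("located_at", "OBS-DP", "ANAT-DP")      # "clear" -> "lungs"
assert not relation_is_valid("located_at", "ANAT-DP", "OBS-DP")  # wrong direction
```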
RadGraph2’s dataset is organized into three main partitions:
1. Training set:
   • Contains 575 manually labeled reports
   • Used for model training and optimization
2. Development set:
   • Consists of 75 manually labeled reports
   • Used for model validation and hyperparameter tuning
3. Test set:
   • Comprises 150 manually labeled reports
   • Used for final model evaluation
Key characteristics of the dataset:
• Patient disjointness: Reports in each partition are from distinct sets of patients
• Consistency with original RadGraph: Maintains the report placement from the original dataset
• De-identification: All protected health information in the reports is removed
Additional dataset component:
• 220,000+ automatically labeled reports:
   – Annotated by the best-performing model (HGIE)
   – Provides a large-scale resource for further research and model development
This dataset structure ensures a robust evaluation framework for RadGraph2, maintaining data integrity and patient privacy while offering a substantial corpus for training and testing advanced information extraction models in the radiology domain.
RadGraph2 is released with a comprehensive set of files to support researchers and developers. The dataset package includes a README.md file providing a brief overview, along with train.json, dev.json, and test.json files containing labeled reports from MIMIC-CXR-JPG and CheXpert. In addition, two large inference files, inference-chexpert.json and inference-mimic.json, contain reports labeled by the benchmark model. The file format follows a structure similar to the original RadGraph dataset, using a JSON format with a hierarchical dictionary structure. Each report is identified by a unique key and contains metadata such as the full text, data split, data source, and a flag indicating whether it was part of the original RadGraph dataset. The “entities” key within each report’s dictionary encapsulates detailed information about entity and relation labels, including tokens, label types, token indices, and relations to other entities. This structured format allows for efficient data processing and analysis, enabling researchers to use the rich information in radiology reports for a range of natural language processing tasks and medical informatics applications.
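Based on that description, reading one of the released files might look like the sketch below. The exact key names (e.g. “entities”, “tokens”, “label”, “start_ix”, “end_ix”, “relations”) are assumed to follow the original RadGraph layout and should be verified against the released README.

```python
# Sketch of loading and inspecting one report from a RadGraph2-style JSON file.
import json

with open("train.json") as f:
    reports = json.load(f)  # dictionary keyed by report ID

for report_id, report in list(reports.items())[:1]:
    print(report_id, report.get("data_split"), report.get("data_source"))
    for ent_id, ent in report.get("entities", {}).items():
        print(
            ent_id,
            ent.get("tokens"),     # surface text of the entity
            ent.get("label"),      # e.g. OBS-DP, ANAT-DP, CHAN-CON-WOR
            ent.get("start_ix"),   # token indices within the report
            ent.get("end_ix"),
            ent.get("relations"),  # list of [relation_type, target_entity_id] pairs
        )
```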
RadGraph2 is an advanced approach to automated information extraction from radiology reports, addressing the challenges of tracking disease progression over time. Key aspects of RadGraph2 include:
1. Enhanced hierarchical schema: Built upon the original RadGraph, it introduces new entity types to represent various kinds of changes in patient conditions.
2. Hierarchical Graph Information Extraction model: Utilizes a structured organization of labels and a Hierarchical Recognition component with a BERT-based backbone.
3. Comprehensive entity types: Includes anatomy, observation, and change entities, with further subtypes to capture nuanced information.
4. Relation types: Defines modify, located_at, and suggestive_of relations to represent complex relationships between entities.
5. Dataset structure: Comprises training (575 reports), development (75 reports), and test (150 reports) sets, plus 220,000+ automatically labeled reports.
6. File format: Uses JSON structure with detailed metadata and entity information for each report.
RadGraph2 aims to provide a more comprehensive representation of temporal changes in radiology reports, enabling better tracking of disease progression and patient care trajectories. The dataset and schema offer researchers a robust framework for developing advanced natural language processing models in the medical domain.
Check out the Paper. All credit for this research goes to the researchers of this project.