Health care professionals often make critical decisions about medications, new technologies, and healthcare innovations using only a subset of available data, because patient information is frequently dispersed across multiple systems. To enhance the quality and impact of their decisions, they need a unified 360-degree patient view that combines health records, drug research, and medical conditions. This comprehensive data integration enables better decision-making and accelerates medical discoveries.
In this post, we explore how you can achieve a patient 360-degree view using Amazon Neptune and generative AI, and use it to strengthen your organization’s research and breakthroughs. Centralizing patient data for a 360-degree view is essential for delivering comprehensive and personalized healthcare. By consolidating information from multiple sources such as electronic health records (EHRs), lab reports, prescriptions, and medical histories into a single location, healthcare providers can gain a better understanding of a patient’s health. This centralized approach allows for more efficient care coordination, reduces the risk of errors, and enables more informed clinical decisions. Additionally, it enhances patient engagement because relevant data is accessible in one place, leading to a more seamless healthcare experience. A 360-degree patient view not only improves outcomes but also helps healthcare organizations meet regulatory standards while enforcing data security and privacy.
Master data management (MDM) plays a critical role in centralizing patient data for a 360-degree view, so healthcare organizations can integrate and manage data from various sources with consistency and accuracy. MDM consolidates patient information from disparate systems such as EHRs, lab systems, billing systems, and insurance providers into a unified repository. By resolving duplicates, providing high levels of data quality, and maintaining a unified single source of truth, MDM enables healthcare providers to access a more complete and accurate view of each patient’s health. This unified data view can be used to improve decision-making, streamline workflows, and personalize treatment plans.
Advantages of a patient 360-degree view: A data-driven approach to improving health outcomes
The following are key reasons to build a solution that integrates patient health data with external information to provide a 360-degree view of a patient record.
Note: Because health data is considered a type of personally identifiable information (PII), you should consider the local and regional data laws that govern how PII can be stored and used.
- Enabling personalized care – Predictive analytics can use data from the patient 360-degree view, such as lab results, biometrics, past claims, and social determinants like education, income, and housing, to generate a risk score for each patient. These scores help providers identify patients at greater risk for chronic conditions and offer timely interventions, such as additional services or wellness programs, to help prevent disease progression.
- Early detection of health issues – The patient 360-degree view can help identify potential health issues early on by analyzing patient data and flagging concerning trends or patterns. This can enable providers to intervene sooner and potentially prevent more serious health problems from developing.
- Improved patient engagement – When patients understand how lifestyle choices affect their health, they gain a stronger sense of ownership. By incorporating remote patient monitoring (RPM) data from devices tracking metrics like weight, blood pressure, and sleep, providers can analyze health trends and share these insights with patients. This creates an opportunity for meaningful discussions, empowering patients to take an active role in their health management.
- Identifying barriers to care – The patient 360-degree view includes factors like proximity to public transportation and supermarkets, childcare or eldercare responsibilities, and health insurance status. By giving providers a comprehensive view of these factors, they can better identify barriers to care and connect patients with community resources to overcome them.
- Supporting clinical decision-making – Though technology can’t replace physicians, it enhances decision-making by providing crucial data that reduces errors. For example, a system that tracks all of the medications a patient takes can alert providers to possible drug-to-drug interactions, helping avoid adverse effects and providing safer, more effective care.
Let’s walk through a real-world example of how to implement a 360-degree view of patients using MDM with Neptune and generative AI.
Solution overview
For our example use case, a healthcare organization is managing data from multiple systems, such as EHRs, lab results, prescription records, appointments, and insurance claims. The goal is to integrate this fragmented data into a unified view using MDM, and store and query the relationships using Neptune.
The following diagram outlines the architecture of the solution.
The diagram illustrates the bulk data ingestion process into Neptune using Amazon Simple Storage Service (Amazon S3) and a Virtual Private Cloud (VPC) endpoint within a VPC. The process follows these steps:
- An Amazon SageMaker notebook is used to run Neptune commands and graph queries. AWS Identity and Access Management (IAM) roles attached to the Neptune database cluster and the notebook grant access to the required resources.
- Amazon Neptune verifies the request and uses its IAM role to gain access to the S3 bucket where the data files are stored.
- The Bulk Load API reads files from the S3 bucket through a VPC endpoint, enabling secure access to resources outside the VPC and facilitating data ingestion into the Neptune database, where the graph data will be stored.
- The files in S3 are copied across the network to Neptune for ingestion.
- The file data is ingested into Amazon Neptune, completing the bulk load process.
This architecture ensures secure, scalable, and efficient data ingestion into Amazon Neptune while leveraging IAM roles for access control and VPC Endpoints for private connectivity.
The Neptune graph database service supports both labeled property graph (LPG) and Resource Description Framework (RDF) formats. In this example, we use an LPG model to represent the relationships between patients, doctors, treatments, and medical records. LPGs provide a flexible and customized approach to graph data modeling, with the ability to understand how objects in your graph are related by traversing across multiple connected paths. To learn more about building optimized graph data models, take our Data Modeling for Amazon Neptune course on Skill Builder.
A graph data model is made up of nodes, which represent the key objects or data points within the graph, and edges, which represent the relationships between the nodes. Labels are used to define the type or group that a node or edge belongs to; for example, the Patient label determines that all nodes with this label represent patients. Nodes and edges can also contain properties to provide additional information for filtering or visualization of the data, such as a patient’s name or date of birth.
The following diagram illustrates the complete data model. This graph structure gives a full 360-degree view of the patient’s healthcare journey, which can be used for more effective decision-making and patient care coordination.
This diagram represents a graph-based healthcare data model that captures relationships between patients, diagnoses, prescriptions, and treatments. The nodes represent key healthcare entities, while edges define their relationships.
As part of the data model, we identified the following nodes:
- Patient – Represents a patient, identified by a unique patient ID. A patient belongs to an age group and has appointments
- EHR – Represents an outcome from an appointment. Each appointment is linked to an EHR, which contains lab results and identified conditions
- LabResult – Represented by lab IDs
- Prescription – Represented by prescription IDs. A physician prescribes a prescription, which may include drugs that treat specific conditions. Drugs can also have side effects
- Appointment – Represented by appointment IDs
- Condition – Represents a specific condition or ailment, such as headache. A physician diagnoses a condition, which is recorded in the EHR
- Drug – Represents a specific drug
- AgeGroup – Represents an age group-based categorization of a patient
- Physician – Represents a medical professional
- Diagnosis – Represents a diagnosis of a condition by a medical professional
These nodes are interconnected as follows:
- LabResults help identify Conditions.
- Drugs are linked to Prescriptions and Conditions they treat.
- Physicians review prescriptions and diagnose conditions.
Edges define how entities are related to one another and provide the framework for navigating through the graph to derive insights. Edges have labels that describe the nature of the relationship and can also store properties that provide further detail to the relationship. In Neptune, edges are directed, meaning they go from one node to another in a specific direction, forming meaningful relationships between entities.
The following edge types have been included in our data model:
- in_age_group – Categorizes a patient with a specific age group.
- has_ehr – Connects a patient with one or multiple EHRs.
- has_appointment – Connects a patient with their appointment.
- has_prescription – Connects a patient directly with a prescription record (for fast traversal). It also connects a diagnosis with a prescription.
- has_condition – Connects an EHR record with a condition or affliction. It also connects a patient with a condition (for fast traversal).
- has_diagnosis – Connects a condition with a diagnosis.
- diagnosed_by – Connects a diagnosis with the physician that provided the diagnosis.
- prescribed_by – Connects a prescription to the physician who prescribed the drugs.
- has_prescribed_drug – Connects one or more drugs to a specific prescription.
- treats – Connects a drug to a condition.
- contains – Connects a drug to other drugs that it contains.
- has_side_effect – Connects a drug to a condition that it can cause as a side effect.
- has_lab_result – Connects an EHR with its lab results. It also connects a patient with lab results (for fast traversal).
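To make the idea of directed, labeled edges and the "fast traversal" shortcut edges concrete, the following is a minimal in-memory sketch in Python. It is not how Neptune stores data; the `Graph` class, the node IDs, and the sample values are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str  # ID of the node the edge starts from
    target: str  # ID of the node the edge points to
    label: str


class Graph:
    """Tiny illustrative property graph; hypothetical, not a Neptune API."""

    def __init__(self):
        self.nodes = {}
        self.edges = []

    def add_node(self, node_id, label, **properties):
        self.nodes[node_id] = Node(node_id, label, properties)

    def add_edge(self, source, label, target):
        self.edges.append(Edge(source, target, label))

    def out(self, node_id, label):
        """Follow outgoing edges with the given label from one node."""
        return [self.nodes[e.target] for e in self.edges
                if e.source == node_id and e.label == label]


g = Graph()
g.add_node("p1", "Patient", name="John Doe")
g.add_node("e1", "EHR")
g.add_node("c1", "Condition", name="Hypertension")
g.add_edge("p1", "has_ehr", "e1")
g.add_edge("e1", "has_condition", "c1")
g.add_edge("p1", "has_condition", "c1")  # shortcut edge for fast traversal

# Two hops through the EHR and one hop over the shortcut reach the same node.
via_ehr = [c for ehr in g.out("p1", "has_ehr") for c in g.out(ehr.id, "has_condition")]
direct = g.out("p1", "has_condition")
print([c.properties["name"] for c in direct])
```

The shortcut edge trades a little extra storage for a shorter path, which is the design choice the "for fast traversal" notes in the list above describe.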
Prerequisites
Before you can ingest data into Neptune, you must first create a Neptune database cluster. For instructions, refer to Creating an Amazon Neptune cluster. After you create your cluster, refer to Using Amazon Neptune with graph notebooks to create a Neptune notebook, which we use to demonstrate the power of graph queries over highly connected patient data.
Ingest data into Neptune
We created two Neptune bulk load files using the Gremlin CSV format in order to ingest some sample data based on the model previously described:
Download these files to your own S3 bucket located in the same AWS Region as your Neptune cluster. Follow the bulk load tutorial to make sure your cluster has the correct IAM permissions, and the necessary infrastructure is in place for it to access your S3 bucket.
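For readers unfamiliar with the Gremlin CSV load format, the sketch below shows roughly what a vertex file and an edge file look like. The `~id`, `~label`, `~from`, and `~to` column headers are part of the format; the IDs, the `name:String` property column, and the rows are illustrative assumptions, not the contents of the sample files.

```python
import csv
import io

# Vertex file: ~id and ~label are system columns; other columns are properties.
vertex_rows = [
    ["~id", "~label", "name:String"],
    ["patient-1", "Patient", "John Doe"],           # hypothetical row
    ["condition-1", "Condition", "Hypertension"],   # hypothetical row
]

# Edge file: ~from and ~to hold the IDs of the source and target vertices.
edge_rows = [
    ["~id", "~from", "~to", "~label"],
    ["edge-1", "patient-1", "condition-1", "has_condition"],
]


def to_csv(rows):
    """Serialize rows to CSV text in memory."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


vertices_csv = to_csv(vertex_rows)
edges_csv = to_csv(edge_rows)
print(vertices_csv)
print(edges_csv)
```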
Using the graph notebook, you can use line and cell magics to ingest these files hosted in our S3 bucket. In your Jupyter notebook, run the %load command to initiate a bulk load request, as shown in the following screenshot.
You can also use request parameters to the %load command to pass constructed variables to the request. Because the bulk loader is an API, you can also make a direct HTTPS request to the loader endpoint, for example, https://your-neptune-cluster.cluster-abc0de12fg3h.us-east-1.neptune.amazonaws.com/loader, passing the required configuration parameters.
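As a rough illustration of a direct loader request, the following Python sketch builds (but does not send) a POST request to the loader endpoint. The bucket, role ARN, and Region are placeholders, and the port 8182 in the URL is an assumption based on the Neptune default. If IAM database authentication is enabled on your cluster, the request would also need to be signed with Signature Version 4, which this sketch does not do.

```python
import json
import urllib.request

# All values below are placeholders; replace them with your own resources.
LOADER_URL = (
    "https://your-neptune-cluster.cluster-abc0de12fg3h"
    ".us-east-1.neptune.amazonaws.com:8182/loader"
)

payload = {
    "source": "s3://your-bucket/patient-360/",  # hypothetical S3 prefix
    "format": "csv",                             # Gremlin CSV load format
    "iamRoleArn": "arn:aws:iam::123456789012:role/YourNeptuneLoadRole",
    "region": "us-east-1",
    "failOnError": "FALSE",
    "parallelism": "MEDIUM",
}

request = urllib.request.Request(
    LOADER_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# The request is only constructed here. Sending it, for example with
# urllib.request.urlopen(request), requires network access to the cluster.
print(request.get_method(), request.full_url)
```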
When the bulk load request is complete, run the following openCypher query to check the data:
This query returns the path of nodes with an outgoing connection to Patient nodes. This results in the following visualization.
Get patient insights to fulfill common healthcare use cases
Now that you have some sample data, let’s see how you can address our previous use cases.
Use case 1: Enabling personalized care
In this scenario, a physician needs to understand the entire medical history of their patients, that is, the name of each patient and the different conditions that they’ve been diagnosed with. To get a patient’s full medical history, use the following query:
This results in the following output.
PatientName | MedicalHistory |
John Doe | ['Hypertension', 'Arthritis', 'Heart Disease', 'Migraine'] |
Jane Smith | ['Asthma', 'Diabetes', 'Back Pain'] |
Bob Johnson | ['Diabetes', 'Allergy'] |
Emily Davis | ['Hypertension', 'Arthritis', 'Back Pain'] |
Michael Brown | ['Asthma', 'Diabetes', 'Heart Disease', 'Migraine'] |
Linda White | ['Hypertension', 'Allergy', 'Back Pain'] |
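A query along the following lines could produce a table like the one above. This is a sketch rather than a definitive query: it uses the has_condition shortcut edge from the data model and assumes that Patient and Condition nodes expose a `name` property and that the edge points from the patient to the condition.

```python
# Sketch only: the `name` property and the edge direction are assumptions.
MEDICAL_HISTORY_QUERY = """
MATCH (p:Patient)-[:has_condition]->(c:Condition)
RETURN p.name AS PatientName,
       collect(DISTINCT c.name) AS MedicalHistory
"""

# In a Neptune notebook, the query text can be pasted into a %%oc cell.
print(MEDICAL_HISTORY_QUERY.strip())
```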
The physician might want to view this from a different perspective. For example, they may want to view the data in a visual way to identify people with connecting conditions to understand how a specific condition affects our patients. In this case, you can run a different query to visualize the data as a graph:
Having access to a patient’s complete medical history can help in delivering personalized care because it reveals critical risk factors, on-going health conditions, and prior treatments. These insights give healthcare providers the ability to detect potential issues early by identifying patterns or subtle symptoms that might otherwise be overlooked. By leveraging this holistic understanding, providers can create customized care plans and implement proactive measures, ultimately enhancing patient outcomes and overall health management.
Use case 2: Early detection of health issues
In this scenario, physicians, healthcare researchers, and companies want to understand which conditions are prevalent across specific age groups. From a physician’s perspective, this could help with the early detection of a specific age-related illness. For insurance companies, this could help when calculating insurance premiums. You can use the following query to find the health conditions commonly seen in individuals of specific age groups:
This results in the following output.
AgeGroup | HealthConditions |
18-30 | ['Migraine', 'Heart Disease', 'Arthritis', 'Hypertension'] |
31-40 | ['Back Pain', 'Diabetes', 'Asthma'] |
41-50 | ['Back Pain', 'Arthritis', 'Hypertension', 'Migraine', 'Heart Disease'] |
51-60 | ['Back Pain', 'Allergy', 'Hypertension'] |
Looking across the results, we can see that Back Pain is among the conditions that affect the most age groups in our data: it appears in three of the four age groups, as does Hypertension. This can help healthcare providers focus on building programs that educate people on how to manage and potentially mitigate back pain through healthier lifestyles.
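The reading of the output table above can be checked with a short tally. The sketch below copies the rows from the table and counts how many age groups each condition appears in; the variable names are illustrative.

```python
from collections import Counter

# Rows copied from the output table above.
conditions_by_age_group = {
    "18-30": ["Migraine", "Heart Disease", "Arthritis", "Hypertension"],
    "31-40": ["Back Pain", "Diabetes", "Asthma"],
    "41-50": ["Back Pain", "Arthritis", "Hypertension", "Migraine", "Heart Disease"],
    "51-60": ["Back Pain", "Allergy", "Hypertension"],
}

# Count each condition once per age group it appears in.
age_groups_per_condition = Counter(
    condition
    for conditions in conditions_by_age_group.values()
    for condition in set(conditions)
)

for condition, count in age_groups_per_condition.most_common():
    print(f"{condition}: {count} age group(s)")
```

On this sample, Back Pain and Hypertension each appear in three of the four age groups, more than any other condition.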
You can narrow this down by targeting a specific age group, as well as producing a graphical visualization using the following query:
The connected nature of the graph data simplifies the identification of current and potential future age group-related conditions. It also provides a way to identify similarities between conditions suffered by patients in the same age group when additional data points such as lifestyle habits, working environment, and hobbies are added to the graph.
Use case 3: Improved patient engagement
To list all diagnoses where the diagnosing physician is different from the prescribing physician, use the following code:
This results in the following output.
Diagnosis | DiagnosisPhysician | PrescribingPhysician |
Asthma Diagnosis | Dr. Adams | Dr. Smith |
Heart Disease Diagnosis | Dr. Lee | Dr. Johnson |
Obesity Diagnosis | Dr. Davis | Dr. Lee |
Allergy Diagnosis | Dr. Davis | Dr. Lee |
From the results, we can see that it’s more often the case that Dr. Lee will write a prescription as opposed to diagnosing a specific condition. We can use this information to identify common patterns in behavior, and subsequently perform analysis to determine where anomalies exist.
It also provides us with a view of how care plans can be improved. Requiring patients to see multiple physicians for the diagnosis and prescription of the same condition is not optimal, and can cause additional stress and anxiety. A healthcare researcher or insurance company can run the following query to identify specific occurrences of this, and take necessary action where required.
This results in the following output.
Patient | Diagnosis | DiagnosingPhysician | PrescribingPhysician |
Jane Smith | Heart Disease Diagnosis | Dr. Lee | Dr. Johnson |
Michael Brown | Heart Disease Diagnosis | Dr. Lee | Dr. Johnson |
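The comparison behind both tables in this use case can be sketched as a single pattern match over the diagnosed_by, has_prescription, and prescribed_by edges. This is an illustrative sketch, not a definitive query: the edge directions and the `name` property are assumptions inferred from the edge descriptions in the data model.

```python
# Sketch only: edge directions and the `name` property are assumptions.
PHYSICIAN_MISMATCH_QUERY = """
MATCH (d:Diagnosis)-[:diagnosed_by]->(dp:Physician),
      (d)-[:has_prescription]->(:Prescription)-[:prescribed_by]->(pp:Physician)
WHERE id(dp) <> id(pp)
RETURN d.name AS Diagnosis,
       dp.name AS DiagnosisPhysician,
       pp.name AS PrescribingPhysician
"""

# In a Neptune notebook, the query text can be pasted into a %%oc cell.
print(PHYSICIAN_MISMATCH_QUERY.strip())
```

Extending the pattern with a match on the Patient node would give a per-patient variant, under the same assumptions.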
Natural language querying using generative AI
With the introduction of generative AI, end-users no longer have to understand complex query languages to be able to interrogate their data sources. Instead, you can use the power of generative AI to generate a query for you based on your graph schema.
When using Neptune, you can use its integration with libraries such as LangChain and LlamaIndex to incorporate natural language querying features into your solution. End-users don’t need to learn how to write openCypher to query the graph. Instead, you use generative AI to generate an openCypher query from a natural language question, use an integration library to run it on your Neptune database, and then use the large language model (LLM) to summarize the results.
The following diagram describes the process of how Neptune integrates with an LLM using an integration library.
In this example, we use the LangChain NeptuneOpenCypherQAChain class to integrate with an LLM hosted on Amazon Bedrock.
First, you need to install the required libraries to work with LangChain. Run the following command in your Neptune notebook:
After the libraries are installed, you can start to import the required libraries into your code:
Next, you need to set up the variables and objects you need to create connections to your graph, LLM, and ultimately the chain that will run your natural language queries:
To connect to your LLM in Amazon Bedrock, you can use the ChatBedrock class. In this example, we use the anthropic.claude-v2 foundation model (FM):
After you create a connection to your LLM and graph, you now need to create the chain itself:
In the above code, the allow_dangerous_requests parameter is required to allow the chain to execute LLM-generated queries against the database. As such, it is important that the executing IAM role has narrowly-scoped privileges applied to it. For example, only associating the NeptuneReadonlyAccess managed IAM policy will make sure that no write or delete queries will execute successfully. For more information, read the Granting NeptuneReadonlyAccess to Amazon Neptune databases using AWS managed policy documentation.
Improving results
Retrieval Augmented Generation (RAG) involves an element of generation, which is precisely what we’re asking our Amazon Bedrock hosted LLM to do—generate an openCypher query based on our graph schema that will answer the given question. Although LLMs are good at this, there are occasions when you need to provide it with some additional information in order to improve the results. In this case, you can use the chain.extra_instructions property and provide it with additional contextual and graph schema information. See the following code:
Finally, for the sake of reducing the amount of code you need to write later on, we’ve written a function that will do the invocation and error handling for you:
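A minimal sketch of such a helper is shown below. It assumes the chain accepts a dictionary with a `query` key and returns a dictionary with a `result` key; the `ask` function name and the `EchoChain` stand-in are hypothetical and are used only so the sketch runs without a database or an LLM.

```python
def ask(chain, question):
    """Run a natural language question through the chain and return the answer text."""
    try:
        response = chain.invoke({"query": question})
    except Exception as error:  # generated queries can fail to parse or execute
        return f"Query failed: {error}"
    return response.get("result", "")


class EchoChain:
    """Hypothetical stand-in for the real chain, used only to exercise ask()."""

    def invoke(self, inputs):
        if not inputs["query"]:
            raise ValueError("empty question")
        return {"result": f"You asked: {inputs['query']}"}


print(ask(EchoChain(), "Which age group suffers most from back pain?"))
print(ask(EchoChain(), ""))
```

Returning the error as text, rather than raising it, keeps a notebook session running when a generated query fails.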
Perform natural language querying
Now that you have all the pieces together, you can start asking some natural language questions of our graph data. A good first example could be listing the age group of patients who most suffer from back pain:
For the above natural language query, the LLM generates the following openCypher statement:
The following are some additional examples of natural language questions:
Questions such as “which drugs have been most used to treat Hypertension?” can support clinical decision-making by leveraging comprehensive patient data to provide healthcare professionals with accurate insights and actionable recommendations. By integrating advanced analytics and AI-driven tools, clinicians can make informed decisions tailored to each patient’s unique needs, improving diagnosis, treatment, and overall care quality.
It’s important to note that the generated openCypher queries can sometimes fail to execute successfully. This is due to the non-deterministic way in which the queries are generated. When you add more information about the graph schema to the chain.extra_instructions property, the LLM understands more about your graph and is more likely to generate a correct query.
Visualize the LLM-generated query
For many use cases, employing an LLM to construct a valid openCypher query and then summarizing the results is the most natural way of understanding the graph. However, in some cases, it also makes sense to visualize the resulting data in a graphical format. If you use the query “Which drugs have been most used to treat hypertension?” and change the prompt slightly, you can make sure the LLM returns a valid openCypher query that expresses the path containing the nodes and edges, rather than a summarization:
In this example, you invoke the chain directly, and ask the LLM to return the paths rather than list an output. Because you’re accessing the invoke method directly, you can store the result, query, and intermediate steps returned by LangChain. Part of the intermediate steps is the query that the LLM has generated.
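Pulling the generated query out of the response can be sketched as follows. The sketch assumes that the response is a dictionary with an `intermediate_steps` list whose entries are dictionaries, one of which carries the generated statement under a `query` key; verify this shape against the version of the library you use. The sample response is hypothetical.

```python
def extract_generated_query(response):
    """Return the first generated query found in the intermediate steps, or None."""
    for step in response.get("intermediate_steps", []):
        if isinstance(step, dict) and "query" in step:
            return step["query"]
    return None


# Hypothetical response shaped like the assumption described above.
sample_response = {
    "result": "...",
    "intermediate_steps": [
        {"query": "MATCH p=(d:Drug)-[:treats]->(c:Condition) RETURN p"},
        {"context": []},
    ],
}

generated_query = extract_generated_query(sample_response)
print(generated_query)
```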
Now that you captured the generated query, you can use Neptune Workbench magics to run the query and visualize the results in a graphical format:
Here, we’ve demonstrated that using LLMs and integration libraries to generate and run graph queries on Neptune and then summarize the results can provide medical and healthcare professionals access to key information stored within the graph, without the need to learn graph queries.
Visualize the patient data using Graph Explorer
A key function of a physician is exploring a patient’s medical history to understand which conditions they’re affected by, which drugs they have taken or are currently taking, and which other physicians have treated them, should they need to follow up or perform additional investigations.
To do this type of graph exploration, you can use Graph Explorer, an open source, low-code visual exploration tool for graph data. It integrates seamlessly with Neptune, as well as other graph databases that support LPG or RDF graph models. It can be hosted as a Docker container on Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Compute Cloud (Amazon EC2), or can be accessed directly from a running Neptune notebook. For this example, we access it directly from our notebook from the Neptune console.
The following animation provides a walkthrough of how you can use Graph Explorer to explore the connected nature of the graph to identify common patterns, connections, as well as anomalies, without the need for writing complex graph queries.
Clean up
To make sure you don’t incur any unnecessary additional costs, delete the notebook and Neptune database cluster you created earlier in this post.
Conclusion
By building an MDM using Neptune, healthcare organizations can centralize patient data, alleviate silos, and improve care coordination and decision-making. This enables more efficient data management and analysis, ultimately leading to better patient outcomes. By organizing the data into a single connected structure, healthcare professionals, insurers, and other relevant parties can benefit from the following improvements in data access and governance:
- Improved decision-making – Healthcare providers can quickly access complete patient histories, helping them make informed treatment decisions
- Faster data access – Relationships between patients, doctors, diagnoses, and medications are efficiently managed using the graph-based architecture of Neptune
- Scalability – Neptune allows for scalability and real-time querying, which is vital for large healthcare organizations managing millions of records
- Data integrity – The MDM makes sure that data remains clean, accurate, and consistent across multiple systems, providing reliable data for analysis and patient care.
To get started with Neptune for this and other healthcare life sciences use cases, refer to Creating an Amazon Neptune cluster.
For a healthcare life sciences industry graph demonstration using Amazon Neptune Analytics, our fast, memory-optimized graph analytics database, refer to the Healthcare Life Science Search Demo, which showcases how to incorporate vector similarity search (VSS) with PubMed datasets to provide a fast, semantically relevant search engine for healthcare-related journals.
About the Authors
Santosh Bhupathi is a Senior Database Specialist Solutions Architect based in Philadelphia. With a focus on relational and NoSQL databases, he provides guidance and technical assistance to customers to help them design, deploy, and optimize database workloads on AWS.
Kevin Phillips is a Sr. Neptune Specialist Solutions Architect working in the UK at Amazon Web Services. He has spent the last 4 years helping customers across EMEA get started and accelerate with graph. He has over 20 years of development and solutions architecture experience, which he uses to help support and guide customers.