Amazon Neptune recently released the GraphRAG Toolkit, an open source Python library that makes it straightforward to build graph-enhanced Retrieval Augmented Generation (RAG) workflows. The toolkit provides a framework for automating the construction of a graph with vector embeddings from unstructured data, and composing question-answering strategies that query this graph to retrieve structurally relevant information when answering user questions.
In this post, we describe how you can get started with the toolkit. We begin by looking at the benefits of adding a graph to your RAG application. Then we show you how to set up a quick start environment and install the toolkit. Lastly, we discuss some of the design considerations that led to the toolkit’s graph model and its approach to content retrieval.
Why add a graph to your RAG application?
Consider the following fictitious narrative, assembled from different news articles, press releases, industry publications, and analyst reports over the course of several weeks, and fed, together with many other documents, into a RAG workflow:
- Example Corp, the US-based maker of the popular Widget personal gizmo, has recently extended its worldwide distribution channels by partnering with AnyCompany Logistics, an international shipping, storage, and last-mile distribution provider. Widget is an AI-augmented personal desktop pet whose conversational capabilities are powered by a new breed of generative AI technologies. Developed in the Austin-based Example Corp labs, the toys are manufactured in Taiwan.
- The UK’s Christmas top-10 toy predictions are already in, even though it’s only August, with industry analysts predicting huge demand for Example Corp’s Widget chatty desktop pet. Retailers in London, Manchester, and other major cities have already placed orders for over 1 million units, to the value of $15 million, and those numbers are only set to increase in the months leading up to Christmas.
- AnyCompany Logistics today announced that it is cutting shipping times from Taiwan to the UK from 2 weeks to just 3 days by directing all distribution through the recently opened Fictitious Canal. The Fictitious Canal is set to become one of the most heavily trafficked waterways over the next 5 years as manufacturers and distributors look to move from air to sea to cut energy spend and save costs.
- The Fictitious Canal continues to experience blockages and delays following landslips. Over 5 million tons of rock has now blocked the southern basin, trapping container vessels and disrupting international supply chains. Goods destined for Europe are expected to be delayed for several months, meaning that Christmas inventory for many retailers is under threat.
Imagine you’re a researcher. What do you want to know? How about: What are the sales prospects for Example Corp in the UK?
We asked a leading vector-based RAG application and got the following response:
Based on the provided context, the sales prospects for Example Corp’s Widget product in the UK appear to be very strong for the upcoming Christmas season. Industry analysts have predicted huge demand for the Widget chatty desktop pet, even ranking it among the top 10 toys for Christmas. Retailers in major UK cities like London and Manchester have already placed orders for over 1 million units worth $15 million, and these order numbers are expected to increase further as Christmas approaches. The context suggests Example Corp can anticipate very high sales volumes and revenues from the UK market for their Widget product during the critical holiday shopping period.
Then we asked an application built using the GraphRAG Toolkit and got the following response:
Based on the search results, the sales prospects for Example Corp’s Widget chatty desktop pet in the UK appear to be very promising initially, but there are potential supply chain issues that could hamper sales later on.
The search results indicate that retailers in major UK cities like London and Manchester have already placed orders for over 1 million units of the Widget product, anticipating huge demand. Industry analysts are also predicting high demand for this toy.
To meet this expected demand, Example Corp has partnered with AnyCompany Logistics to extend its worldwide distribution channels, including in the UK. AnyCompany Logistics had announced plans to significantly cut shipping times from Taiwan to the UK from 2 weeks down to just 3 days by routing shipments through the newly opened Fictitious Canal.
However, there are reports of a major blockage in the Fictitious Canal caused by landslips, with over 5 million tons of rock obstructing the southern basin and trapping container vessels. This could potentially disrupt international supply chains and delay goods destined for Europe, including the UK, for several months.
So while initial sales prospects for Example Corp’s Widget product in the UK look very promising based on pre-orders and anticipated demand, there is a risk that supply chain issues caused by the Fictitious Canal blockage could hamper the ability to meet that demand in the coming months if the situation is not resolved quickly.
Why the big difference? What does the graph approach offer that the primarily vector-based one doesn’t?
Vector search can only retrieve information that is semantically similar, or close in language, to the question being asked. Information that is dissimilar is structurally unavailable for retrieval. In our example, the fragments concerning AnyCompany Logistics’ use of the Fictitious Canal, and the current blockages troubling the Canal, are sufficiently dissimilar to the question being asked that they don’t get pulled into the context by the vector-based solution—even though they are crucial in formulating a more accurate and complete response.
Relevancy can be thought of in terms of relatedness: whatever is relevant to the question is somehow related, either directly or indirectly, to the question. Relatedness is a broader concept than similarity. Semantic similarity is just one way in which the things that are of interest to us can be related to one another; we might say, for example, that texts A and B are related because they are semantically similar. But there are lots of other ways in which things can be related: contiguity in time or space, cause and effect, parent-child, part-whole; or social, organizational, legal, taxonomic relations—the list is endless. The ways in which things are related, and the relative importance, strength, and quality of those relationships, will vary from domain to domain, but suffice to say, “is semantically similar to” is just one tool in your RAG retrieval toolbox.
By modeling our domain as a graph and using the edges in the graph to represent the different types of relationships that are important to us, we can provide access to information that is dissimilar to the question but nonetheless structurally relevant for creating an accurate and full response.
Similarity-based retrieval remains an important RAG strategy, and context that is semantically similar to the question will often comprise the foundation of a good answer. However, similarity-based retrieval alone is not always sufficient for generating a nuanced response. In many circumstances it will also be necessary to find and return information that can’t be found using vector similarity search, in order to present a question-answering process with a more differentiated context that it can use to develop comparisons, arguments, and summaries. The relationships in a graph provide a means by which a retrieval process can find this additional, relevant information.
The GraphRAG Toolkit
Every RAG application is built around two core capabilities: indexing and querying. The GraphRAG Toolkit is an open source Python library that you can use both to index your data into a graph and a vector store, and to build question-answering solutions that retrieve relevant content from this graph.
With the first version of the toolkit, the focus is on building graph-based RAG applications over unstructured and semi-structured textual content (such as webpages, PDFs, and JSON documents). See the Installing the GraphRAG Toolkit section later in this post for details on setting up and running the toolkit.
Indexing
Indexing content is just a few lines of code:
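Here's a minimal sketch, modeled on the toolkit's quick start examples. The graph and vector store endpoints are placeholders, and the module paths and connection string formats are assumptions you should verify against the toolkit's documentation:

```python
from graphrag_toolkit import LexicalGraphIndex
from graphrag_toolkit.storage import GraphStoreFactory, VectorStoreFactory
from llama_index.readers.web import SimpleWebPageReader

# Placeholder endpoints for a Neptune Database cluster and an
# OpenSearch Serverless collection (replace with your own)
graph_store = GraphStoreFactory.for_graph_store(
    'neptune-db://my-cluster.cluster-xxxxxxxx.us-east-1.neptune.amazonaws.com'
)
vector_store = VectorStoreFactory.for_vector_store(
    'aoss://https://xxxxxxxx.us-east-1.aoss.amazonaws.com'
)

# A LexicalGraphIndex is configured with a graph store and a vector store
graph_index = LexicalGraphIndex(graph_store, vector_store)

# Load several pages of Neptune documentation with a LlamaIndex reader
doc_urls = [
    'https://docs.aws.amazon.com/neptune/latest/userguide/intro.html',
    'https://docs.aws.amazon.com/neptune-analytics/latest/userguide/what-is-neptune-analytics.html',
]
docs = SimpleWebPageReader(html_to_text=True).load_data(doc_urls)

# Extract and build in a single continuous-ingest pass
graph_index.extract_and_build(docs, show_progress=True)
```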
The LexicalGraphIndex is the primary means of indexing content. You can use it, as shown in this example, in a continuous-ingest fashion, whereby content is pipelined through a set of extract and build stages, so that the graph soon starts to be populated with data that can then be queried even while the ingest continues. You can also use it to run separate extract and build stages—something you might do if you have a one-time-only job, or want to build and rebuild a graph multiple times from the same underlying extracted content.
A LexicalGraphIndex is configured with a graph store and a vector store. For this example, we’re using a Neptune Database graph store and an Amazon OpenSearch Serverless vector store. At the time of writing, the toolkit supports Neptune Database and Neptune Analytics, OpenSearch Serverless, and Amazon Bedrock for the foundation models (FMs) used to extract and embed content.
The content that is being indexed in the preceding example comprises several pages of Neptune documentation. We’re using a LlamaIndex SimpleWebPageReader to parse and load the data into the index. Depending on the type and location of your source data, you can use other LlamaIndex readers, including the SimpleDirectoryReader and the JSONReader, to load data into the index.
Querying
Querying, or question answering, is as straightforward as indexing:
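Here's a matching sketch for the query side, again with placeholder endpoints; the for_traversal_based_search factory method and module paths are assumptions to check against the toolkit's documentation:

```python
from graphrag_toolkit import LexicalGraphQueryEngine
from graphrag_toolkit.storage import GraphStoreFactory, VectorStoreFactory

# The query process is configured with the same graph and vector stores
# used at indexing time (placeholder endpoints shown here)
graph_store = GraphStoreFactory.for_graph_store(
    'neptune-db://my-cluster.cluster-xxxxxxxx.us-east-1.neptune.amazonaws.com'
)
vector_store = VectorStoreFactory.for_vector_store(
    'aoss://https://xxxxxxxx.us-east-1.aoss.amazonaws.com'
)

# Create a query engine backed by the traversal-based retriever
# (factory method name is an assumption based on the toolkit's documentation)
query_engine = LexicalGraphQueryEngine.for_traversal_based_search(
    graph_store,
    vector_store
)

# Retrieval and answer generation happen in a single call
response = query_engine.query('What are the sales prospects for Example Corp in the UK?')
print(response.response)
```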
Querying is actually a two-step process: it starts by retrieving relevant information from the underlying storage, and then supplies this information to the FM, which generates an answer. The LexicalGraphQueryEngine performs both steps on your behalf.
Again, we’re configuring the process with a graph store and vector store. At first glance, this looks a little redundant—after all, didn’t we already specify the graph and vector stores in the indexing stage? But remember, indexing and querying are two separate processes. These processes could be running in different environments, on different machines, and at different times. As such, each process needs to be configured with the location of its graph and vector stores.
Installing the GraphRAG Toolkit
You can get started with the GraphRAG Toolkit using the quick start AWS CloudFormation template from the project’s GitHub repository. This template creates a Neptune database and OpenSearch Serverless collection, and an Amazon SageMaker notebook instance with example code. The examples use FMs in Amazon Bedrock to extract and embed content, and generate responses.
Prerequisites
Before you run the template, make sure you have enabled access to the appropriate FMs in Amazon Bedrock. The default models are:
- anthropic.claude-3-sonnet-20240229-v1:0
- cohere.embed-english-v3
You can configure the toolkit to use other models besides those used in the quick start examples.
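For instance, assuming the toolkit's GraphRAGConfig settings object (the attribute names and model IDs below are illustrative assumptions; check the toolkit's documentation for the exact configuration API), an override might look like this:

```python
from graphrag_toolkit import GraphRAGConfig

# Hypothetical override of the default Amazon Bedrock models
# (attribute names and model IDs are illustrative assumptions)
GraphRAGConfig.extraction_llm = 'anthropic.claude-3-5-sonnet-20240620-v1:0'
GraphRAGConfig.response_llm = 'anthropic.claude-3-5-sonnet-20240620-v1:0'
GraphRAGConfig.embed_model = 'cohere.embed-english-v3'
```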
You must run the CloudFormation stack in an AWS Region containing these models, and enable access to the models before running the notebook examples.
Deploy the CloudFormation stack
The following screenshot shows the stack details for the CloudFormation template.
You need to supply a stack name. Most of the parameters have been populated with sensible defaults, but there are a couple you may want to change:
- ApplicationId – Use this to specify a unique identifier that will be used to name the resources in the deployment, including the Neptune cluster and instance, and the OpenSearch Serverless collection.
- IamPolicyArn – Use this to specify the Amazon Resource Name (ARN) of an additional AWS Identity and Access Management (IAM) policy to be attached to the SageMaker notebook instance. This custom policy can contain permissions to additional resources that you want to use, such as specific Amazon Simple Storage Service (Amazon S3) buckets, or additional Amazon Bedrock FMs.
The template creates the following resources:
- A virtual private cloud (VPC) with three private subnets, one public subnet, and an internet gateway
- A Neptune Database cluster with a single Neptune serverless instance
- An OpenSearch Serverless collection with a public endpoint
- A SageMaker notebook containing the GraphRAG Toolkit sample notebooks
When the stack deployment has completed, you can open the SageMaker sample notebooks (there’s a NeptuneSagemakerNotebook output parameter on the Outputs tab of the stack, with a link to the notebook instance), and start indexing and querying your content.
Run the notebooks
Notebook 01 – Combined-Extract-and-Build is a good place to start. The first cell in each notebook installs the toolkit from the GitHub repository. You only need to run this install one time per deployment, not for every notebook.
When the install has completed, you can run the second cell, which indexes the example content.
With the indexing complete, you can start querying the content. Notebook 04 – Querying allows you to experiment with the different query strategies contained in the toolkit.
Clean up
The deployed resources incur costs in your account (approximately $1.50 per hour in the US East (N. Virginia) AWS Region). Remember to delete the stack when you’ve finished with it so that you don’t incur any unnecessary charges.
Build your own applications
You don’t need to run the quick start CloudFormation template to use the toolkit. You can install the toolkit in your own environment, and build your own Python applications that compose the toolkit with other libraries and services (you will need to provision the necessary graph and vector store resources, and make sure you have access to the appropriate FMs beforehand).
You can install the toolkit and its dependencies using pip (the toolkit isn’t currently available on PyPI, but we make frequent releases to the project’s GitHub repository). Follow the installation instructions on the project’s homepage to install the latest version.
The project’s documentation contains many examples of configuring and running the indexing and querying processes. You can adapt these examples for use in your own applications. The examples in the documentation are written for running in a notebook environment. If you’re building an application with a main entry point, you should put the application logic inside a method, and add an if __name__ == '__main__' block:
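A minimal sketch of that structure (the function name and body here are illustrative):

```python
def main():
    # Application logic goes here: configure the graph and vector stores,
    # build the index, and run queries as in the earlier examples
    ...

if __name__ == '__main__':
    main()
```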
Graph model and query strategy design
When designing a RAG solution, it’s useful to adopt a working backwards approach in order to determine an appropriate set of retrieval and generation strategies, and an underlying indexing and storage scheme, capable of supporting your specific workload needs. What kinds of question-answering, end-user, or application data needs is your workflow intended to fulfil? What kinds of data must you therefore retrieve to satisfy those needs? What kinds of retrieval strategies will best furnish the context window with this data? And what kinds of indexing structures or data models will most efficiently facilitate such retrieval?
The GraphRAG Toolkit is designed to support question-answering workflows over unstructured and semi-structured textual content, and in particular workflows that require retrieving relevant information from multiple, potentially unrelated sources, or information that is structurally inaccessible to solely vector-based solutions. We might call these search-based workflows, as opposed to counting- or aggregation-based workflows, which would require computing a numerical result.
To satisfy the needs of a search-based workflow, the system should present the question-answering process—the FM—with pieces of relevant textual content: snippets of text, or lexical units, that the FM can use to generate a response. With this in mind, one of the first design decisions we had to address was, what size lexical unit should form the basis of the context supplied to the FM? For many RAG applications, the primary unit of context is the chunk: that is, the context window is formed of one or more chunks retrieved from the corpus. Different chunking strategies produce differently sized chunks—there’s no one-size-fits-all definition of a chunk—but a chunk is typically larger than an individual sentence but smaller than an entire document.
For the GraphRAG Toolkit, the primary unit of context is not the chunk, but the statement, which is a standalone assertion or proposition. Source documents are broken into chunks, and from these chunks are extracted statements. Statements are thematically grouped by topic, and supported by facts. At question-answering time, the toolkit retrieves sets of relevant statements grouped by topic, and presents them in the context window to the FM.
This requirement to supply lexical units in the form of statements to the FM led us to design a lexical graph model, and an extraction process that targets this model. This lexical graph has three tiers:
- Lineage – Sources, chunks, and the relations between them
- Summarization – Topics, statements, and the facts that support statements
- Entity-relationship – Individual entities and relations extracted from the underlying sources
The following diagram shows the overall lexical graph model.
You can read more about this graph model in the toolkit’s documentation. In this section, we dive deeper into the summarization tier.
When we design a graph model, we often think of this model in terms of its capacity to represent the things we’re interested in. An alternative, though complementary, viewpoint is to consider the role or responsibility of each model element in the context of the application and data needs the model is intended to support. In the context of the search-based workflow needs we’ve identified for the toolkit, the model should support retrieving discrete lexical units that are related directly or indirectly to the question. The way in which the model exhibits and applies this relatedness or connectedness will determine in large part the effectiveness of the retrieval strategies. If it simply links everything to everything else, it makes it difficult to extract relevant units of context from within a sea of irrelevancy. If, on the other hand, the model permits very few links between elements in the graph, it reduces the opportunities for discovering relevant but nonetheless semantically dissimilar information. A well-designed graph strikes a balance: it avoids overwhelming connections that dilute relevance while ensuring enough links to discover contextually important but non-obvious relationships.
The elements in the summarization tier fulfil several different responsibilities. In terms of retrieving lexical units, statements act as the primary unit of context returned to the FM. In terms of connectedness, the summarization tier distinguishes between local and global connectedness. Topics provide local thematic connectivity between statements derived from the same source. Facts provide global connectivity between statements derived from different sources. (Topics and facts also have secondary responsibilities: topics act to group statements; facts act to annotate or furnish statements with more detail.) This division between local and global connectivity responsibilities allows retrieval strategies to control their exploration of the graph: a retriever can choose to stay mostly local, while tentatively exploring more remote opportunities, or start broad, and then narrow in on the most promising topics.
When retrieving content from the graph, retrieval strategies must first find one or more suitable entry points, before then traversing to relevant statements. The vector store plays an important part here in finding entry points. In the current lexical graph implementation, both statements and chunks are embedded. Retrievers can therefore find entry points that are semantically similar to the question, either at the chunk or the statement level, and from there explore neighboring local statements as well as hop to more indirectly connected, remote statements. Retrievers can also perform keyword lookups against entities in the entity-relationship tier, and from there navigate to statements and topics—an approach that tends to yield a broader set of statements.
The toolkit currently contains two different high-level retrievers: a TraversalBasedRetriever and a SemanticGuidedRetriever. The TraversalBasedRetriever uses a combination of top-down search—finding chunks through vector similarity search, and then traversing from these chunks through topics to statements and facts—and bottom-up search, which performs keyword-based lookups of entities, and proceeds through facts to statements and topics. The SemanticGuidedRetriever blends vector-based semantic search with structured graph traversal. It identifies entry points through semantic and keyword searches, then intelligently explores the graph through beam search and path analysis, while employing reranking and diversity filtering to achieve quality results. This hybrid approach enables both precise matching and contextual exploration.
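As a rough sketch of how you might select between them, assuming the LexicalGraphQueryEngine exposes one factory method per retriever (the method names, module paths, and endpoints below are assumptions; the toolkit's documentation has the definitive API):

```python
from graphrag_toolkit import LexicalGraphQueryEngine
from graphrag_toolkit.storage import GraphStoreFactory, VectorStoreFactory

# Placeholder stores, configured as in the earlier examples
graph_store = GraphStoreFactory.for_graph_store(
    'neptune-db://my-cluster.cluster-xxxxxxxx.us-east-1.neptune.amazonaws.com'
)
vector_store = VectorStoreFactory.for_vector_store(
    'aoss://https://xxxxxxxx.us-east-1.aoss.amazonaws.com'
)

# Traversal-based retrieval: top-down chunk similarity search combined
# with bottom-up keyword lookups against entities
traversal_engine = LexicalGraphQueryEngine.for_traversal_based_search(
    graph_store, vector_store
)

# Semantic-guided retrieval: semantic and keyword entry points, then
# beam search and path analysis with reranking and diversity filtering
semantic_engine = LexicalGraphQueryEngine.for_semantic_guided_search(
    graph_store, vector_store
)

response = semantic_engine.query('What are the sales prospects for Example Corp in the UK?')
print(response.response)
```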
Conclusion
In this post, we discussed how you can get started with the GraphRAG Toolkit. This open source Python library can help you build RAG applications that use a graph to retrieve structurally relevant information.
Try out the toolkit for your own use case, and share your feedback in the comments.
About the Authors
Ian Robinson is a Principal Graph Architect with Amazon Neptune. He is a co-author of ‘Graph Databases’ and ‘REST in Practice’ (both from O’Reilly) and a contributor to ‘REST: From Research to Practice’ (Springer) and ‘Service Design Patterns’ (Addison-Wesley).
Abdellah Ghassel is a Machine Learning Engineer Intern with Amazon Neptune.