In this post, we describe some of the openCypher features released as part of the 1.4.2.0 engine update to Amazon Neptune. Neptune is a fast, reliable, and fully managed graph database service for building and running applications with highly connected datasets, such as knowledge graphs, fraud graphs, identity graphs, and security graphs. Neptune gives developers the choice of building their graph applications using three open graph query languages: openCypher, Apache TinkerPop Gremlin, and the World Wide Web Consortium’s (W3C) SPARQL 1.1. Engine release 1.4.2.0 became generally available on December 19, 2024. Starting with this release, you can benefit from a variety of new features and improvements, including custom functions and support for the CALL subquery, which we discuss further in this post. You can use the guide at the end of this post to try out the features described here.
Support for read-only CALL subqueries
For queries that require subquery support, such as executing a specific openCypher query on a node-by-node basis, support for the CALL subquery was added. Prior to this, if you wanted to execute additional MATCH statements against a collection of data, you had to split the code into multiple queries, passing the output of one query as the input to the next.
The following is an example of using CALL to run a second query once for each stopover result. The initial MATCH starts at the Austin Bergstrom International (AUS) airport and performs a single-hop traversal across the route edge to a connected stopover node. For each stopover, it then retrieves the first two airports connected to the stopover, ordered by the route distance property value in descending order.
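A minimal sketch of such a query, assuming the air-routes data model (an airport node label with a code property, and a route edge label with a dist property; these names are assumptions, not taken from the text above):

```cypher
// Start at AUS and hop once across a route edge to each stopover
MATCH (origin:airport {code: 'AUS'})-[:route]->(stopover:airport)
CALL {
  // Import the outer variable so the subquery runs once per stopover
  WITH stopover
  MATCH (stopover)-[r:route]->(destination:airport)
  // Keep only the two longest onward routes for this stopover
  RETURN destination
  ORDER BY r.dist DESC
  LIMIT 2
}
RETURN origin.code AS origin, stopover.code AS stopover, destination.code AS destination
```

Because the ORDER BY and LIMIT sit inside the subquery, they apply per stopover rather than to the whole result set.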
Prior to support for CALL, the preceding query would not have been possible in openCypher, because Neptune had no way to run a subquery on a per-object basis. For more information on how the CALL subquery works, how to write queries using it, and current limitations, see CALL subquery support in Neptune.
Support for Neptune openCypher custom functions
Neptune openCypher functions are Neptune-specific additions to its implementation of the openCypher specification that support customer requirements such as string matching, and collection and map sorting. The following functions are available in Neptune Database version 1.4.2.0 and later, as well as in Amazon Neptune Analytics.
textIndexOf(text :: STRING, lookup :: STRING, from = 0 :: INTEGER?, to = -1 :: INTEGER?) :: (INTEGER?)
This function returns the index of the first occurrence of lookup in the range of text starting at offset from (inclusive), through offset to (exclusive). If to is -1, the range continues to the end of text. Indexing is zero-based and is expressed in Unicode scalar values (non-surrogate code points). In the following example, we search for a specific expression ‘e’ in the value ‘Amazon Neptune’:
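A short sketch of this lookup, using only the signature shown above (the firstIndex and nextIndex aliases are illustrative):

```cypher
// Counting from zero, the first 'e' in 'Amazon Neptune' sits at index 8
RETURN textIndexOf('Amazon Neptune', 'e') AS firstIndex

// Passing a from offset starts the search later in the string;
// here the search begins at index 9, skipping the first 'e'
// RETURN textIndexOf('Amazon Neptune', 'e', 9) AS nextIndex
```

The second call is commented out so the block stays a single statement; run it separately to see how the from offset changes the result.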
collToSet(values :: LIST OF ANY?) :: (LIST? OF ANY?)
Previously, if you wanted to return a list containing only a unique set of values, you needed to combine COLLECT with DISTINCT to produce the results. With collToSet, you can deduplicate a list directly. For example, the following query produces a unique collection of names of airports that have connecting routes to airports located in the US:
collSubtract(first :: LIST OF ANY?, second :: LIST OF ANY?) :: (LIST? OF ANY?)
If you want to return a list that contains only values that are present in one list and not another, you can use the collSubtract function. This function returns a new list containing all the unique elements from the first list, excluding elements from the second list. The order of the list is maintained. For example, the following query produces a unique collection of names of airports in the US that have connecting routes to airports located in France, but don’t also connect to airports in the UK:
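A sketch of how this could look, assuming the air-routes data model and its two-letter country codes ('US', 'FR', and 'UK' are assumed values, as are the airport, route, and desc names):

```cypher
// US airports with at least one route into France
MATCH (a:airport {country: 'US'})-[:route]->(:airport {country: 'FR'})
WITH collect(a.desc) AS toFrance
// US airports with at least one route into the UK
MATCH (b:airport {country: 'US'})-[:route]->(:airport {country: 'UK'})
WITH toFrance, collect(b.desc) AS toUK
// Keep names from the first list that never appear in the second
RETURN collSubtract(toFrance, toUK) AS franceOnly
```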
collIntersection(first :: LIST? OF ANY?, second :: LIST? OF ANY?) :: (LIST? OF ANY?)
If you want to return a list that contains only the items that exist in two given lists, you can use collIntersection. This function returns a new unique list of items that are present in both of the given parameter lists. For example, the following query produces a unique collection of names of airports that have routes originating from both London Heathrow (LHR) and Seattle-Tacoma International (SEA) airport:
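The sketch below shows one possible shape for this query: it collects the destinations reachable from LHR and from SEA separately, then keeps the names found in both lists. The airport, route, code, and desc names are assumptions based on the air-routes data model:

```cypher
// Destinations served directly from London Heathrow
MATCH (:airport {code: 'LHR'})-[:route]->(a:airport)
WITH collect(a.desc) AS fromLHR
// Destinations served directly from Seattle-Tacoma
MATCH (:airport {code: 'SEA'})-[:route]->(b:airport)
WITH fromLHR, collect(b.desc) AS fromSEA
// Keep only the names present in both lists
RETURN collIntersection(fromLHR, fromSEA) AS servedFromBoth
```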
Sorting functions
Neptune sorting functions focus on improving readability and providing flexibility for use cases that involve sorting over complex data types such as single or multiple maps, as well as providing single or multiple sort keys. For each of the sorting functions, the default sorting order is ascending.
collSort(coll :: LIST OF ANY, config :: MAP?) :: (LIST? OF ANY?)
You can use the collSort function to sort a list of values. The function returns a new, sorted list containing the original list elements. By default, it sorts the values in ascending order, but you can modify this behavior by providing a configuration map, as in the following example. This query takes the names of the first 10 airports located in the US and sorts them in descending alphabetical order:
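A minimal sketch, assuming the configuration map accepts an order key with the values 'asc' or 'desc' (the key name is an assumption, as are the air-routes names airport, country, and desc):

```cypher
// Take 10 US airports, then sort their names Z to A
MATCH (a:airport {country: 'US'})
WITH a.desc AS name
LIMIT 10
RETURN collSort(collect(name), {order: 'desc'}) AS names
```

Without an ORDER BY before the LIMIT, which 10 airports are picked is up to the engine; only the final list ordering is controlled by collSort.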
collSortMaps(coll :: LIST OF MAP, config :: MAP) :: (LIST? OF ANY?)
If you have a list of map objects, as opposed to a list of single data type values, you can use collSortMaps to sort the list of maps based on a map property. To do this, you must provide a configuration map that specifies the property name to sort, along with the sort direction. For example, the following configuration specifies the desc map property as the sort key, and the order of the sorting to be in ascending order:
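As an illustration, the sketch below applies such a configuration to two literal maps. The key and order names in the configuration map are assumptions, and the sample maps are made up for this example:

```cypher
// Sort two hand-written maps by their desc value, ascending
RETURN collSortMaps(
  [
    {code: 'SEA', desc: 'Seattle-Tacoma'},
    {code: 'AUS', desc: 'Austin Bergstrom'}
  ],
  {key: 'desc', order: 'asc'}
) AS sorted
```

With an ascending sort on desc, the Austin entry should come before the Seattle entry.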
The following query builds a collection of map objects from the desc and code properties of all airports located in the US. It then sorts this collection by the code property in descending order, before outputting the top 10 results:
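One way this query could be written is sketched here, assuming the air-routes names (airport, country, code, desc) and a configuration map with key and order entries (assumed names). List slicing is used to keep the first 10 entries:

```cypher
// Build one map per US airport
MATCH (a:airport {country: 'US'})
WITH collect({code: a.code, desc: a.desc}) AS airports
// Sort by code, Z to A, and keep the first 10 maps
RETURN collSortMaps(airports, {key: 'code', order: 'desc'})[0..10] AS top10
```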
collSortMulti(coll :: LIST OF MAP?, configs = [] :: LIST OF MAP, limit = -1 :: INTEGER?, skip = 0 :: INTEGER?) :: (LIST? OF ANY?)
Extending the functionality of sorting lists of maps, collSortMulti enables you to sort on multiple map properties, and optionally provide limit and skip configurations. For example, the following configuration specifies that each map should be first sorted using the runways property in descending order, then by the desc property in ascending order (default), skipping the first 10 records, and limited to the next 20:
The following query returns a collection of map objects using the preceding configuration:
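A sketch of such a call, following the parameter order in the signature above (coll, configs, limit, skip). The key and order names inside each configuration map are assumptions, as are the air-routes property names:

```cypher
// Build one map per airport
MATCH (a:airport)
WITH collect({code: a.code, desc: a.desc, runways: a.runways}) AS airports
RETURN collSortMulti(
  airports,
  // Most runways first; ties broken by name in the default ascending order
  [{key: 'runways', order: 'desc'}, {key: 'desc'}],
  20,  // limit: return at most 20 maps
  10   // skip: ignore the first 10 sorted maps
) AS page
```

Passing limit and skip to the function pages through the sorted list without a separate slicing step.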
collSortNodes(coll :: LIST OF NODE, config :: MAP) :: (LIST? OF NODE?)
Some use cases require returning a sorted collection of nodes as opposed to maps of node values. For this, you can use collSortNodes, which sorts an input list of nodes based on the specified configuration, similar to collSortMaps. The following configuration defines the sort key as runways and the sort order as descending:
This configuration is demonstrated in the following query, which returns the top 10 airports in the world ordered by the number of runways, with the airport with the largest number of runways first, and those with fewer runways last:
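A minimal sketch of this query, assuming a configuration map with key and order entries (assumed names) and the air-routes airport label and runways property:

```cypher
// Collect every airport node, sort by runway count (highest first),
// and keep the first 10 nodes
MATCH (a:airport)
WITH collect(a) AS airports
RETURN collSortNodes(airports, {key: 'runways', order: 'desc'})[0..10] AS top10
```

The result is a list of full nodes, so every airport property remains available to the caller.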
Bulk load data using Neptune Database or Neptune Analytics
The preceding queries use the publicly available air-routes dataset that can be ingested into your Neptune Database cluster or Neptune Analytics graph automatically using the %seed Workbench magic command. Alternatively, you can bulk load the data into either Neptune Database or Neptune Analytics from a Neptune notebook, using the following commands.
For Neptune Database, you can use the bulk load API:
For more information about setting up IAM roles, and examples of initiating a Neptune Database bulk load using curl, see Example: Loading Data into a Neptune DB Instance.
For Neptune Analytics, you can use the neptune.load command with CALL:
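As a rough sketch, a load call could look like the following. The bucket path and Region are placeholders to replace with your own values, and the format, source, and region parameter names are stated here as assumptions about the neptune.load procedure rather than taken from the text above:

```cypher
// Load CSV files from Amazon S3 into a Neptune Analytics graph
CALL neptune.load({
  format: 'csv',
  // Placeholder location: point this at your copy of the air-routes data
  source: 's3://<your-bucket>/<your-prefix>/',
  // Placeholder Region of the S3 bucket
  region: '<your-region>'
})
```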
Note that you are responsible for any costs incurred while trying out these examples on your Neptune Database cluster or Neptune Analytics graph.
Conclusion
In this post, we described how Neptune has extended the openCypher graph query language with additional functionality, some of which wasn’t previously available. Without it, query logic had to be split between the graph and application code, which made for a more complex, highly coupled solution architecture.
You can find a complete list of improvements and fixes in the release notes. The following are a few ways to get started with this release:
- Create your first Neptune cluster as part of the AWS Free Tier
- Upgrade your existing Neptune cluster to 1.4.2.0 or later to take advantage of the latest features
- Use the open source graph-explorer application to quickly visualize and explore graphs on Neptune
- Run the open source graph-notebook library on Jupyter or JupyterLab notebooks to interactively query and build graph applications on Neptune
Leave your questions in the comments section.
About the authors
Kevin Phillips is a Sr. Neptune Specialist Solutions Architect working in the UK at Amazon Web Services, having spent the last 4 years working with customers across EMEA to get started and accelerate with graphs. He has over 20 years of development and solutions architectural experience, which he uses to help support and guide customers.