
    Faster distributed graph neural network training with GraphStorm v0.4

    February 11, 2025

    GraphStorm is a low-code enterprise graph machine learning (ML) framework that provides ML practitioners a simple way of building, training, and deploying graph ML solutions on industry-scale graph data. Although GraphStorm can run efficiently on single instances for small graphs, it truly shines when scaling to enterprise-level graphs in distributed mode using a cluster of Amazon Elastic Compute Cloud (Amazon EC2) instances or Amazon SageMaker.

Today, AWS AI released GraphStorm v0.4. This release introduces integration with DGL-GraphBolt, a new graph storage and sampling framework that uses a compact graph representation and pipelined sampling to reduce memory requirements and speed up graph neural network (GNN) training and inference. For the large-scale dataset examined in this post, inference is 3.6 times faster and per-epoch training is 1.4 times faster, with even larger speedups possible.

    To achieve this, GraphStorm v0.4 with DGL-GraphBolt addresses two crucial challenges of graph learning:

    • Memory constraints – GraphStorm v0.4 provides compact and distributed storage of graph structure and features, which may grow into the multi-TB range. For example, a graph with 1 billion nodes, 512 features per node, and 10 billion edges will require more than 4 TB of memory to store, which necessitates distributed computation (see the sketch after this list for a rough estimate).
    • Graph sampling – In multi-layer GNNs, you need to sample neighbors of each node to propagate their representations. This can lead to exponential growth in the number of nodes sampled, potentially visiting the entire graph for a single node’s representation. GraphStorm v0.4 provides efficient, pipelined graph sampling.
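
To put the memory constraint in perspective, the following back-of-the-envelope estimate reproduces the multi-TB figure above. It is a rough sketch that assumes 64-bit floating-point node features and 64-bit integer node IDs for the edge list, and it ignores edge features and training-time buffers.

    # Back-of-the-envelope memory estimate for the example graph above.
    # Assumptions (not from the GraphStorm docs): float64 node features,
    # int64 node IDs in the edge list, no edge features.
    NUM_NODES = 1_000_000_000       # 1 billion nodes
    NUM_EDGES = 10_000_000_000      # 10 billion edges
    FEATURE_DIM = 512
    BYTES_PER_FEATURE = 8           # 64-bit floats (assumption)
    BYTES_PER_NODE_ID = 8           # 64-bit integers (assumption)

    feature_bytes = NUM_NODES * FEATURE_DIM * BYTES_PER_FEATURE
    edge_bytes = NUM_EDGES * 2 * BYTES_PER_NODE_ID   # source and destination per edge

    print(f"Node features: {feature_bytes / 1e12:.2f} TB")   # ~4.10 TB
    print(f"Edge list:     {edge_bytes / 1e12:.2f} TB")      # ~0.16 TB
    print(f"Total:         {(feature_bytes + edge_bytes) / 1e12:.2f} TB")  # more than 4 TB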

    In this post, we demonstrate how GraphBolt enhances GraphStorm’s performance in distributed settings. We provide a hands-on example of using GraphStorm with GraphBolt on SageMaker for distributed training. Lastly, we share how to use Amazon SageMaker Pipelines with GraphStorm.

    GraphBolt: Pipeline-driven graph sampling

    GraphBolt is a new data loading and graph sampling framework developed by the DGL team. It streamlines the operations needed to sample efficiently from a heterogeneous graph and fetch the corresponding features. GraphBolt introduces a new, more compact graph structure representation for heterogeneous graphs, called fused Compressed Sparse Column (fCSC). This can reduce the memory cost of storing a heterogeneous graph by up to 56%, allowing users to fit larger graphs in memory and potentially use smaller, more cost-efficient instances for GNN model training.
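The fCSC format itself is internal to DGL-GraphBolt, but the compression idea is familiar from sparse matrix storage: a compressed sparse column layout replaces per-edge source/destination pairs with per-node offsets plus one index per edge. The following sketch illustrates that general idea on a toy homogeneous graph with SciPy; it shows plain CSC, not fCSC, and the exact savings for a heterogeneous graph will differ.

    # Illustrate the general CSC idea (plain CSC, not DGL-GraphBolt's fused CSC):
    # compare memory for a raw edge list (COO) versus a CSC layout.
    import numpy as np
    from scipy.sparse import coo_matrix

    num_nodes, num_edges = 1_000, 10_000
    rng = np.random.default_rng(0)
    src = rng.integers(0, num_nodes, size=num_edges)
    dst = rng.integers(0, num_nodes, size=num_edges)

    adj_coo = coo_matrix((np.ones(num_edges, dtype=np.int8), (src, dst)),
                         shape=(num_nodes, num_nodes))
    adj_csc = adj_coo.tocsc()

    coo_bytes = adj_coo.row.nbytes + adj_coo.col.nbytes          # two indices per edge
    csc_bytes = adj_csc.indptr.nbytes + adj_csc.indices.nbytes   # offsets + one index per edge
    print(f"COO edge list structure: {coo_bytes} bytes")
    print(f"CSC structure:           {csc_bytes} bytes")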

    GraphStorm v0.4 seamlessly integrates with GraphBolt, allowing users to take advantage of its performance improvements in their GNN workflows. The user just needs to provide the additional argument --use-graphbolt true when launching graph construction and training jobs.

    Solution overview

A common model development process is to perform model exploration locally on a subset of your full data, and when you’re satisfied with the results, train the full-scale model. This setup allows for cheaper exploration before training on the full dataset. GraphStorm and SageMaker Pipelines allow you to do that by creating a model pipeline that you can run locally to retrieve model metrics and, when you’re ready, run on the full data on SageMaker to produce models, predictions, and graph embeddings for use in downstream tasks. In the next section, we show how to set up such pipelines for GraphStorm.

    We demonstrate such a setup in the following diagram, where a user can perform model development and initial training on a single EC2 instance, and when they’re ready to train on their full data, hand off the heavy lifting to SageMaker for distributed training. Using SageMaker Pipelines to train models provides several benefits, like reduced costs, auditability, and lineage tracking.

GraphStorm SageMaker architecture diagram

    Prerequisites

To run this example, you will need an AWS account, an Amazon SageMaker Studio domain, and the necessary permissions to run bring-your-own-container (BYOC) SageMaker jobs.

    Set up the environment for SageMaker distributed training

    You will use the example code available in the GraphStorm repository to run through this example.

    Setting up your environment should take around 10 minutes. First, set up your Python environment to run the examples:

    conda init
    eval $SHELL
    # Create a new env for the post
    conda create --name gsf python=3.10
    conda activate gsf
    
    # Install dependencies for local scripts
    pip install torch==2.3.0 --index-url https://download.pytorch.org/whl/cpu
    pip install sagemaker boto3 ogb pyarrow
    # Verify installation, might take a few minutes for first run
    python -c "import sagemaker; import torch"
    
    # Clone the GraphStorm repository to access the example code
    git clone https://github.com/awslabs/graphstorm.git ~/graphstorm

    Build a GraphStorm SageMaker CPU image

    Next, build and push the GraphStorm PyTorch Docker image that you will use to run the graph construction, training, and inference jobs for smaller-scale data. Your role will need to be able to pull images from the Amazon ECR Public Gallery and create Amazon Elastic Container Registry (Amazon ECR) repositories and push images to your private ECR registry.

# Enter your account ID here
    ACCOUNT_ID=<aws-account-id>
    REGION=us-east-1
    
    cd ~/graphstorm
    bash docker/build_graphstorm_image.sh --environment sagemaker --device cpu
    bash docker/push_graphstorm_image.sh -e sagemaker -r $REGION -a $ACCOUNT_ID -d cpu
    # This will create an ECR repository and push an image to
    # ${ACCOUNT_ID}.dkr.ecr.us-east-1.amazonaws.com/graphstorm:sagemaker-cpu

    Download and prepare datasets

    In this post, we use two citation datasets to demonstrate the scalability of GraphStorm. The Open Graph Benchmark (OGB) project hosts a number of graph datasets that can be used to benchmark the performance of graph learning systems. For a small-scale demo, we use the ogbn-arxiv dataset, and for a demonstration of GraphStorm’s large-scale learning capabilities, we use the ogbn-papers100M dataset.

    Prepare the ogbn-arxiv dataset

    Download the smaller-scale ogbn-arxiv dataset to run a local test before launching larger-scale SageMaker jobs on AWS. This dataset has approximately 170,000 nodes and 1.2 million edges. Use the following code to download the data and prepare it for GraphStorm:

    # Provide the S3 bucket to use for output
    BUCKET_NAME=<your-s3-bucket>

    You use the following script to directly download, transform and upload the data to Amazon Simple Storage Service (Amazon S3):

    cd ~/graphstorm/examples/sagemaker-pipelines-graphbolt
python convert_arxiv_to_gconstruct.py \
    --output-s3-prefix s3://$BUCKET_NAME/ogb-arxiv-input

    This will create the tabular graph data in Amazon S3, which you can verify by running the following code:

    aws s3 ls s3://$BUCKET_NAME/ogb-arxiv-input/ 
    edges/
    nodes/
    splits/
    gconstruct_config_arxiv.json

    Finally, upload GraphStorm training configuration files for arxiv to use for training and inference:

# Upload the training configurations to S3
aws s3 cp ~/graphstorm/training_scripts/gsgnn_np/arxiv_nc.yaml \
    s3://$BUCKET_NAME/yaml/arxiv_nc_train.yaml
aws s3 cp ~/graphstorm/inference_scripts/np_infer/arxiv_nc.yaml \
    s3://$BUCKET_NAME/yaml/arxiv_nc_inference.yaml

    Prepare the ogbn-papers100M dataset on SageMaker

    The papers-100M dataset is a large-scale graph dataset, with 111 million nodes and 3.2 billion edges after adding reverse edges.

To download and preprocess the data as an Amazon SageMaker Processing step, use the following code. You can launch the job and let it run in the background while you proceed through the rest of the post, returning to this dataset later. The job should take approximately 45 minutes to run.

    # Navigate to the example code
    cd ~/graphstorm/examples/sagemaker-pipelines-graphbolt
    
    # Build and push a Docker image to download and process the papers100M data
    bash build_and_push_papers100M_image.sh -a $ACCOUNT_ID -r $REGION
    
    # This creates an ECR repository and pushes an image to
    # $ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com/papers100m-processor
    
    # Run a SageMaker job to do the processing and upload the output to S3
    SAGEMAKER_EXECUTION_ROLE_ARN=<your-sagemaker-execution-role-arn>
    aws configure set region $REGION
python sagemaker_convert_papers100m.py \
    --output-bucket $BUCKET_NAME \
    --execution-role-arn $SAGEMAKER_EXECUTION_ROLE_ARN \
    --region $REGION \
    --instance-type ml.m5.4xlarge \
    --image-uri $ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com/papers100m-processor

    This will produce the processed data in s3://$BUCKET_NAME/ogb-papers100M-input, which can then be used as input to GraphStorm. While this job is running, you can create the GraphStorm pipelines.

    Create a SageMaker pipeline

    Run the following command to create a SageMaker pipeline:

    # Navigate to the example code
    cd ~/graphstorm/examples/sagemaker-pipelines-graphbolt
    
    PIPELINE_NAME="ogbn-arxiv-gs-pipeline"
    
bash deploy_arxiv_pipeline.sh \
    --account $ACCOUNT_ID \
    --bucket-name $BUCKET_NAME --execution-role $SAGEMAKER_EXECUTION_ROLE_ARN \
    --pipeline-name $PIPELINE_NAME \
    --use-graphbolt false

    Inspect the pipeline

    Running the preceding code will create a SageMaker pipeline configured to run three SageMaker jobs in sequence:

    • A GConstruct job that converts the tabular file input to a binary partitioned graph on Amazon S3
    • A GraphStorm training job that trains a node classification model and saves the model to Amazon S3
    • A GraphStorm inference job that produces predictions for all nodes in the test set, and creates embeddings for all nodes

    To review the pipeline, navigate to SageMaker AI Studio, choose the domain and user profile you used to create the pipeline, then choose Open Studio.

    In the navigation pane, choose Pipelines. There should be a pipeline named ogbn-arxiv-gs-pipeline. Choose the pipeline, which will take you to the Executions tab for the pipeline. Choose Graph to view the pipeline steps.

    GraphStorm SageMaker Pipeline on SageMaker Studio
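
If you prefer to inspect the pipeline programmatically instead of through the Studio UI, a short boto3 snippet such as the following can print the step names from the deployed pipeline definition. This is a sketch; it assumes your credentials resolve to the same account and Region used above.

    # Print the step names of the deployed pipeline using boto3.
    import json
    import boto3

    sm = boto3.client("sagemaker", region_name="us-east-1")

    # The pipeline definition is returned as a JSON string
    pipeline = sm.describe_pipeline(PipelineName="ogbn-arxiv-gs-pipeline")
    definition = json.loads(pipeline["PipelineDefinition"])

    for step in definition["Steps"]:
        print(step["Name"], "-", step["Type"])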

    Run the SageMaker pipeline locally for ogbn-arxiv

    The ogbn-arxiv dataset is small enough that you can run the pipeline locally. Run the following command to start a local execution of the pipeline:

    # Allow the local containers to inherit AWS credentials
    export USE_SHORT_LIVED_CREDENTIALS=1
python ~/graphstorm/sagemaker/pipeline/execute_sm_pipeline.py \
    --pipeline-name ogbn-arxiv-gs-pipeline \
    --region us-east-1 \
    --local-execution | tee arxiv-local-logs.txt

The log output is saved to arxiv-local-logs.txt; you will use it later to analyze the training speed.

    Running the pipeline should take approximately 5 minutes. When the pipeline is complete, it will print a message like the following:

    Pipeline execution 655b9357-xxx-xxx-xxx-4fc691fcce94 SUCCEEDED

    You can inspect the mean epoch and evaluation time using the provided analyze_training_time.py script and the log file you created:

    python analyze_training_time.py --log-file arxiv-local-logs.txt
    
    Reading logs from file: arxiv-local-logs.txt
    
    === Training Epochs Summary ===
    Total epochs completed: 10
    Average epoch time: 4.70 seconds
    
    === Evaluation Summary ===
    Total evaluations: 11
    Average evaluation time: 1.90 seconds

    These numbers will vary depending on your instance type; in this case, these are values reported on an m6in.4xlarge instance.

    Create a GraphBolt pipeline

Now that you have established a performance baseline, you can create another pipeline that uses the GraphBolt graph representation and compare the two.

You can use the same pipeline creation script, changing only two values: provide a new pipeline name and set --use-graphbolt to "true":

# Deploy a GraphBolt-enabled pipeline
PIPELINE_NAME_GB="ogbn-arxiv-gs-graphbolt-pipeline"
bash deploy_arxiv_pipeline.sh \
    --account $ACCOUNT_ID \
    --bucket-name $BUCKET_NAME --execution-role $SAGEMAKER_EXECUTION_ROLE_ARN \
    --pipeline-name $PIPELINE_NAME_GB \
    --use-graphbolt true

# Execute the pipeline locally
python ~/graphstorm/sagemaker/pipeline/execute_sm_pipeline.py \
    --pipeline-name $PIPELINE_NAME_GB \
    --region us-east-1 \
    --local-execution | tee arxiv-local-gb-logs.txt

    Analyzing the training logs, you can see the per-epoch time has dropped somewhat:

    python analyze_training_time.py --log-file arxiv-local-gb-logs.txt
    
    Reading logs from file: arxiv-local-gb-logs.txt
    
    === Training Epochs Summary ===
    Total epochs completed: 10
    Average epoch time: 4.21 seconds
    
    === Evaluation Summary ===
    Total evaluations: 11
    Average evaluation time: 1.63 seconds

    For such a small graph, the performance gains are modest, around 13% per epoch time. With large data, the potential gains are much greater. In the next section, you will create a pipeline and train a model for papers-100M, a citation graph with 111 million nodes and 3.2 billion edges.

    Create a SageMaker pipeline for distributed training

    After the SageMaker processing job that prepares the papers-100M data has finished processing and the data is stored in Amazon S3, you can set up a pipeline to train a model on that dataset.

    Build the GraphStorm GPU image

    For this job, you will use large GPU instances, so you will build and push the GPU image this time:

    cd ~/graphstorm
    
    bash ./docker/build_graphstorm_image.sh --environment sagemaker --device gpu
    
    bash docker/push_graphstorm_image.sh -e sagemaker -r $REGION -a $ACCOUNT_ID -d gpu

    Deploy and run pipelines for papers-100M

    Before you deploy your new pipeline, upload the training YAML configuration for papers-100M to Amazon S3:

aws s3 cp \
    ~/graphstorm/training_scripts/gsgnn_np/papers100M_nc.yaml \
    s3://$BUCKET_NAME/yaml/

    Now you are ready to deploy your initial pipeline for papers-100M:

# Navigate to the example code
cd ~/graphstorm/examples/sagemaker-pipelines-graphbolt
PIPELINE_NAME="ogb-papers100M-pipeline"
bash deploy_papers100M_pipeline.sh \
    --account $ACCOUNT_ID \
    --bucket-name $BUCKET_NAME --execution-role $SAGEMAKER_EXECUTION_ROLE_ARN \
    --pipeline-name $PIPELINE_NAME \
    --use-graphbolt false

    Run the pipeline on SageMaker and let it run in the background:

# Execute the pipeline on SageMaker asynchronously
python ~/graphstorm/sagemaker/pipeline/execute_sm_pipeline.py \
    --pipeline-name $PIPELINE_NAME \
    --region us-east-1 \
    --async-execution

Your account needs to meet the required quotas for the requested instances. For this post, the defaults are set to four ml.g5.48xlarge instances for training jobs and one ml.r5.24xlarge instance for a processing job. To adjust your SageMaker service quotas, you can use the Service Quotas console. To run both pipelines in parallel (one without GraphBolt and one with it), you will need 8 x $TRAIN_GPU_INSTANCE and 2 x $GCONSTRUCT_INSTANCE.

    Next, you can deploy and run another pipeline, with GraphBolt enabled:

# Deploy the GraphBolt-enabled pipeline
PIPELINE_NAME_GB="ogb-papers100M-graphbolt-pipeline"
bash deploy_papers100M_pipeline.sh \
    --account $ACCOUNT_ID \
    --bucket-name $BUCKET_NAME --execution-role $SAGEMAKER_EXECUTION_ROLE_ARN \
    --pipeline-name $PIPELINE_NAME_GB \
    --use-graphbolt true

# Execute the GraphBolt pipeline on SageMaker
python ~/graphstorm/sagemaker/pipeline/execute_sm_pipeline.py \
    --pipeline-name $PIPELINE_NAME_GB \
    --region us-east-1 \
    --async-execution
    

    Compare performance for GraphBolt-enabled training

    After both pipelines are complete, which should take approximately 4 hours, you can compare the training times for both cases.

    On the Pipelines page of the SageMaker console, there should be two new pipelines named ogb-papers100M-pipeline and ogb-papers100M-graphbolt-pipeline. Choose ogb-papers100M-pipeline, which will take you to the Executions tab for the pipeline. Copy the name of the latest successful execution and use that to run the training analysis script:

python analyze_training_time.py \
    --pipeline-name $PIPELINE_NAME \
    --execution-name execution-1734404366941

    Your output will look like the following code:

=== Training Epochs Summary ===
    Total epochs completed: 15
    Average epoch time: 73.95 seconds
    
    === Evaluation Summary ===
    Total evaluations: 15
    Average evaluation time: 15.07 seconds
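
If you would rather not copy the execution name from the console, a boto3 call along these lines can list recent executions and pick the newest successful one. Treat it as a convenience sketch; the analysis script expects the execution display name shown in the console.

    # Find the display name of the most recent successful pipeline execution.
    import boto3

    sm = boto3.client("sagemaker", region_name="us-east-1")

    def latest_successful_execution(pipeline_name: str) -> str:
        summaries = sm.list_pipeline_executions(
            PipelineName=pipeline_name, SortOrder="Descending"
        )["PipelineExecutionSummaries"]
        for summary in summaries:
            if summary["PipelineExecutionStatus"] == "Succeeded":
                return summary["PipelineExecutionDisplayName"]
        raise RuntimeError(f"No successful executions found for {pipeline_name}")

    print(latest_successful_execution("ogb-papers100M-pipeline"))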

    Now do the same for the GraphBolt-enabled pipeline:

python analyze_training_time.py \
    --pipeline-name $PIPELINE_NAME_GB \
    --execution-name execution-1734463209078

    You will see the improved per-epoch and evaluation times:

=== Training Epochs Summary ===
    Total epochs completed: 15
    Average epoch time: 54.54 seconds
    
    === Evaluation Summary ===
    Total evaluations: 15
    Average evaluation time: 4.13 seconds

Without loss in accuracy, the latest version of GraphStorm trained approximately 1.4 times faster per epoch and evaluated 3.6 times faster. Depending on the dataset, the speedups can be even greater, as shown by the DGL team's benchmarking.
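
These figures follow directly from the measured averages above; the quick calculation below reproduces them.

    # Reproduce the reported speedups from the measured averages above.
    baseline_epoch_s, graphbolt_epoch_s = 73.95, 54.54   # average seconds per training epoch
    baseline_eval_s, graphbolt_eval_s = 15.07, 4.13      # average seconds per evaluation

    print(f"Training speedup:   {baseline_epoch_s / graphbolt_epoch_s:.1f}x")   # ~1.4x
    print(f"Evaluation speedup: {baseline_eval_s / graphbolt_eval_s:.1f}x")     # ~3.6x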

    Conclusion

This post showcased how GraphStorm 0.4, integrated with DGL-GraphBolt, significantly speeds up large-scale GNN workloads: training is 1.4 times faster and inference 3.6 times faster, as measured on the papers-100M dataset. As shown in the DGL benchmarks, even larger speedups are possible depending on the dataset.

    We encourage ML practitioners working with large graph data to try GraphStorm. Its low-code interface simplifies building, training, and deploying graph ML solutions on AWS, allowing you to focus on modeling rather than infrastructure.

    To get started, visit the GraphStorm documentation and GraphStorm GitHub repository.


About the authors

    Theodore Vasiloudis is a Senior Applied Scientist at Amazon Web Services, where he works on distributed machine learning systems and algorithms. He led the development of GraphStorm Processing, the distributed graph processing library for GraphStorm and is a core developer for GraphStorm. He received his PhD in Computer Science from the KTH Royal Institute of Technology, Stockholm, in 2019.

Xiang Song is a Senior Applied Scientist at Amazon Web Services, where he develops deep learning frameworks including GraphStorm, DGL, and DGL-KE. He led the development of Amazon Neptune ML, a new capability of Neptune that uses graph neural networks for graphs stored in a Neptune graph database. He is now leading the development of GraphStorm, an open source graph machine learning framework for enterprise use cases. He received his PhD in computer systems and architecture at Fudan University, Shanghai, in 2014.

Florian Saupe is a Principal Technical Product Manager at AWS AI/ML research, supporting science teams like the graph machine learning group and ML systems teams working on large-scale distributed training, inference, and fault resilience. Before joining AWS, Florian led technical product management for automated driving at Bosch, was a strategy consultant at McKinsey & Company, and worked as a control systems and robotics scientist, a field in which he holds a PhD.
