Have you ever tried connecting a database to AWS Glue using a JDBC SSL encryption connection? It can be quite a puzzle. A few months ago, I faced this exact challenge. I thought it would be easy, but I was wrong! When I searched for help online, I couldn’t find much useful guidance. So, I rolled up my sleeves and experimented until I finally figured it out.
Now, I am sharing my learnings with you. In this blog, I’ll break down the steps in a clear, easy-to-follow way. By the end, you’ll know exactly how to connect your database to AWS Glue with SSL encryption. Let’s make this complex task a little simpler together.
Before moving ahead let’s discuss briefly how SSL encryption works
The client sends a connection request (Client Hello).
The server responds, choosing encryption (Server Hello).
The client verifies the server’s identity using its certificate and root certificate.
Key exchange establishes a shared encryption key.
Encrypted data exchanged securely.
Client may authenticate with its certificate before encrypted data exchange.
Connection terminates upon session end or timeout.
Now you got basic understanding let’s continue to configure the Glue for SSL encryption.
The steps above are the basic steps of SSL encryption process. Let’s us now discuss how to configure the AWS Glue for SSL encryption.Before we start the configuration process we need the following Formatting below
1)Client Certificate
2 Root Certificate
3) Certificate Key
into DER format. This is the format suitable for AWS glue.
DER (Distinguished Encoding Rules) is a binary encoding format used in cryptographic protocols like SSL/TLS to represent and exchange data structures defined by ASN.1. It ensures unambiguous and minimal-size encoding of cryptographic data such as certificates.
Here’s how you can do it for each component:
1 .Client Certificate (PEM):
This certificate is used by the client (in this case, AWS Glue) to authenticate itself to the server (e.g., another Database) during the SSL handshake. It includes the public key of the client and is usually signed by a trusted Certificate Authority (CA) or an intermediate CA.
If your client certificate is not already in DER format, you can convert it using the OpenSSL command-line tool:
openssl x509 -in client_certificate.pem -outform der -out client_certificate.der
Replace client_certificate.pem with the filename of your client certificate in DER format, and client_certificate.der with the desired filename for the converted DER-encoded client certificate.
Â
2.Root Certificate (PEM):
The root certificate belongs to the Certificate Authority (CA) that signed the server’s certificate (in this case, Postgre Database). It’s used by the client to verify the authenticity of the server’s certificate during the SSL.
Convert the root certificate to DER format using the following command:
openssl x509 -in root_certificate.pem -outform der -out root_certificate.der
Replace root_certificate.pem with the filename of your root certificate in DER format, and root_certificate.der with the desired filename for the converted DER-encoded root certificate.
Â
3.Certificate Key (PKCS#8 PEM):
This is the private key corresponding to the client certificate. It’s used to prove the ownership of the client certificate during the SSL handshake.
Convert the certificate key to PKCS#8 PEM format using the OpenSSL command-line tool:
openssl pkcs8 -topk8 -inform PEM -outform DER -in certificate_key.pem -out certificate_key.pk8 -nocrypt
Replace certificate_key.pem with the filename of your certificate key in PEM format, and certificate_key.pk8 with the desired filename for the converted PKCS#8 PEM-encoded certificate key.
Â
Stored the above certificates and key to S3 bucket. We will need these certificates while configuring the AWS glue.
Â
Â
To connect AWS Glue to a PostgreSQL database over SSL using PySpark, you’ll need to provide the necessary SSL certificates and configure the connection properly. Here’s an example PySpark script demonstrating how to achieve this:
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from pyspark.sql import SparkSession
# Initialize Spark and Glue contexts
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
# Define PostgreSQL connection properties
jdbc_url = “jdbc:postgresql://your_postgresql_host:5432/your_database”
connection_properties = {
“user”: “your_username”,
“password”: “your_password”,
“ssl”: “true”,
“sslmode”: “verify-ca”, # SSL mode: verify-ca or verify-full
“sslrootcert”: “s3://etl-test-bucket1/root_certificate.der”, # S3 Path to root certificate
“sslcert”: “s3://etl-test-bucket1/client_certificate.der”, # S3 Path to client certificate
“sslkey”: “s3://etl-test-bucket1/certificate_key.pk8” # S3 Path to client certificate key
}
# Load data from PostgreSQL table
dataframe = spark.read.jdbc(url=jdbc_url, table=”your_table_name”, properties=connection_properties)
# Perform data processing or analysis
# For example:
dataframe.show()
# Stop Spark session
spark.stop()
Â
Now inside your glue job click on Job details page and scroll down until you see Dependent JARs path and Referenced files path options. Under Dependent JARs path put location of S3 path where you stored the jar file and in Referenced files path add the S3 path of converted Client,Root and Key certificate separated by comma “,â€
Â
Now Click on Save option and you are ready to Go
Â
This concludes the steps to configure secure JDBC Connection with DB in AWS Glue. To summarize, in this blog we:
1)Explained how to SSL encryption can be used for secure data exchange between AWS Glue and your database(here Postgresql)
2) The steps to configure SSL Encryption in AWS Glue to secure JDBC connection with a database
Â
You can read my other blogs here
Â
Â
Â
Â
Â
Â
Source: Read MoreÂ