Overview:
Databricks supports a wide range of compliance standards to meet the needs of highly regulated industries, including:
- HIPAA (Health Insurance Portability and Accountability Act)
- PCI-DSS (Payment Card Industry Data Security Standard)
- FedRAMP High & Moderate
- DoD IL5
- IRAP (Australia)
- GDPR (EU)
- CCPA (California)
However, I was surprised to read that Databricks Serverless workloads are not covered for PCI-DSS (Databricks PCI DSS Compliance | Databricks), and I got curious about the reason behind it. After some research, I found an explanation that convinced me, and I would like to share it here.
To begin with, let's look at the different Databricks SQL warehouse types and their capabilities:
Pro SQL Warehouse | Classic SQL Warehouse | Serverless SQL Warehouse
Databricks SQL (Classic/Pro):
- Databricks SQL (Classic/Pro) warehouses use compute resources in the customer's cloud account
- Workloads on Databricks SQL (Classic/Pro) are processed by compute resources that are managed by the customer
- Customers have more control over, and better monitoring of, the compute resources
- The data being processed stays within the network boundary of the customer's cloud account
Databricks SQL (Serverless):
- Databricks SQL (Serverless) warehouses use compute resources in the Databricks cloud account
- Serverless compute operates on a multi-tenant architecture, where compute resources are shared across different customers
- Compute resources are completely managed by Databricks, so customers have less control over, and less visibility into, the networking and compute resources
- Data from different workloads is processed on compute resources in the Databricks account
- Though customers have less control over the compute, they can still benefit greatly from the capabilities that Serverless warehouses offer
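To make the distinction concrete, here is a minimal sketch that maps a warehouse record to the account its compute runs in. The `Warehouse` class is hypothetical; its `warehouse_type` and `enable_serverless_compute` fields are modeled on what I understand the SQL Warehouses API to return, so treat the field names and values as assumptions rather than a definitive reference.

```python
from dataclasses import dataclass


@dataclass
class Warehouse:
    # Hypothetical record; field names are assumed to mirror the
    # SQL Warehouses API and are used here for illustration only.
    name: str
    warehouse_type: str              # assumed values: "CLASSIC" or "PRO"
    enable_serverless_compute: bool  # assumed True for a Serverless warehouse


def compute_location(wh: Warehouse) -> str:
    """Return which cloud account the warehouse's compute runs in."""
    if wh.enable_serverless_compute:
        # Serverless: shared, Databricks-managed compute.
        return "databricks-account"
    # Classic/Pro: compute lives inside the customer's cloud account.
    return "customer-account"


warehouses = [
    Warehouse("bi-classic", "CLASSIC", False),
    Warehouse("etl-pro", "PRO", False),
    Warehouse("adhoc-serverless", "PRO", True),
]

for wh in warehouses:
    print(f"{wh.name}: {compute_location(wh)}")
```

The check keys off the serverless flag rather than the type alone, on the assumption that a Serverless warehouse is reported as a Pro-type warehouse with serverless compute enabled.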
Final View:
- PCI-DSS requires strict isolation of environments handling cardholder data, which is difficult to guarantee in a shared setup
- It mandates restricted and monitored network access, especially for systems handling payment data
- It requires fine-grained control and auditing, which is more feasible in dedicated or customer-managed environments
- Databricks recommends using Classic or Pro warehouses with dedicated VPCs, private networking, and enhanced security controls for PCI-DSS-compliant workloads
- Additionally, Databricks is working to add more isolation boundaries within Serverless compute
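The points above suggest a simple guardrail: flag any Serverless warehouse that is marked as handling cardholder data. The sketch below assumes a hypothetical inventory of plain dictionaries and a hypothetical `data_classification` tag; neither comes from Databricks, and in practice the inventory would have to be built from your own workspace metadata.

```python
def pci_violations(warehouses, tag_key="data_classification", tag_value="pci"):
    """Return names of serverless warehouses tagged as handling cardholder data.

    The tag key/value and the dictionary layout are hypothetical conventions
    chosen for this example.
    """
    flagged = []
    for wh in warehouses:
        tags = wh.get("tags", {})
        is_pci = tags.get(tag_key, "").lower() == tag_value
        if is_pci and wh.get("enable_serverless_compute", False):
            flagged.append(wh["name"])
    return flagged


inventory = [
    {"name": "payments-pro", "enable_serverless_compute": False,
     "tags": {"data_classification": "PCI"}},
    {"name": "payments-adhoc", "enable_serverless_compute": True,
     "tags": {"data_classification": "PCI"}},
    {"name": "marketing-serverless", "enable_serverless_compute": True,
     "tags": {"data_classification": "internal"}},
]

print(pci_violations(inventory))
```

Only `payments-adhoc` is reported here, since it is the one entry that is both tagged as PCI and running on Serverless compute.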