How to monitor hundreds of Key Vaults across multiple subscriptions for roughly $17-33/month
The Challenge: Key Vault Sprawl in Enterprise Azure
If you’re managing Azure at enterprise scale, you’ve likely encountered this scenario: Key Vaults scattered across dozens of subscriptions, hundreds of certificates and secrets with different expiry dates, and the constant fear of unexpected outages due to expired certificates. Manual monitoring simply doesn’t scale when you’re dealing with:
- Multiple Azure subscriptions (often 10-50+ in large organizations)
- Hundreds of Key Vaults across different teams and environments
- Thousands of certificates with varying renewal cycles
- Critical secrets that applications depend on
- Different time zones and rotation schedules
The traditional approach of spreadsheets, manual checks, or basic Azure Monitor alerts breaks down quickly. You need something that scales automatically, costs practically nothing, and provides real-time visibility across your entire Azure estate.
The Solution: Event-Driven Monitoring Architecture
Single Function App, Unlimited Key Vaults
Instead of deploying monitoring resources per Key Vault (expensive and complex), we use a centralized architecture:
Management Group (100+ Key Vaults)
↓
Single Function App
↓
Action Group
↓
Notifications
This approach provides:
- Unlimited scalability: Monitor 1 or 1000+ Key Vaults with the same infrastructure
- Cross-subscription coverage: Works across your entire Azure estate
- Real-time alerts: Sub-5-minute notification delivery
- Cost optimization: roughly $17-33/month in total (not per Key Vault!)
How It Works: The Technical Deep Dive
1. Event Grid System Topics (The Sensors)
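Azure Key Vault automatically publishes Event Grid events when certificates and secrets approach expiry (the near-expiry events fire 30 days before the expiration date) and again when they expire. We create an Event Grid system topic for each Key Vault to capture these events: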
Event Types Monitored:
- Microsoft.KeyVault.CertificateNearExpiry
- Microsoft.KeyVault.CertificateExpired
- Microsoft.KeyVault.SecretNearExpiry
- Microsoft.KeyVault.SecretExpired
The beauty? These events are generated automatically by Azure – no polling, no manual checking, just real-time notifications when things are about to expire.
2. Centralized Processing (The Brain)
A single Azure Function App processes ALL events from across your organization:
// Simplified event processing flow
eventGridEvent → parseEvent() → extractMetadata() →
formatAlert() → sendToActionGroup()
Example Alert Generated:
{
  severity: "Sev1",
  alertTitle: "Certificate Expired in Key Vault",
  description: "Certificate 'prod-ssl-cert' has expired in Key Vault 'prod-keyvault'",
  keyVaultName: "prod-keyvault",
  objectType: "Certificate",
  expiryDate: "2024-01-15T00:00:00.000Z"
}
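The processing flow above could be implemented along these lines. This is a minimal sketch rather than the article's actual function code: `formatAlert` is a hypothetical helper, the `Sev2` level for near-expiry events is an assumption, and the `data` fields (`VaultName`, `ObjectType`, `ObjectName`, and `EXP` as seconds since the Unix epoch) are assumed to follow the Key Vault Event Grid event schema.

```javascript
// Hypothetical helper: turns one Key Vault Event Grid event into the alert shape shown above.
function formatAlert(eventGridEvent) {
  const { eventType = "", data = {} } = eventGridEvent;
  const expired = eventType.endsWith("Expired");

  // EXP is assumed to hold the expiry time in seconds since the Unix epoch; it may be absent.
  const expSeconds = Number(data.EXP);
  const hasExpiry = data.EXP != null && data.EXP !== "" && Number.isFinite(expSeconds);

  return {
    severity: expired ? "Sev1" : "Sev2", // Sev2 for near-expiry is an assumption
    alertTitle: `${data.ObjectType} ${expired ? "Expired" : "Near Expiry"} in Key Vault`,
    description:
      `${data.ObjectType} '${data.ObjectName}' ` +
      `${expired ? "has expired" : "is about to expire"} in Key Vault '${data.VaultName}'`,
    keyVaultName: data.VaultName,
    objectType: data.ObjectType,
    expiryDate: hasExpiry ? new Date(expSeconds * 1000).toISOString() : null,
  };
}

// Sample event with made-up values that mirror the example alert.
const sampleEvent = {
  eventType: "Microsoft.KeyVault.CertificateExpired",
  data: {
    VaultName: "prod-keyvault",
    ObjectType: "Certificate",
    ObjectName: "prod-ssl-cert",
    EXP: 1705276800,
  },
};

console.log(formatAlert(sampleEvent));
```

Keeping the formatter a pure function of the event makes it easy to unit test without deploying the Function App.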
3. Smart Notification Routing (The Messenger)
Azure Action Groups handle notification distribution with support for:
- Email notifications (unlimited recipients)
- SMS alerts for critical expiries
- Webhook integration with ITSM tools (ServiceNow, Jira, etc.)
- Voice calls for emergency situations
Implementation: Infrastructure as Code
The entire solution is deployed using Terraform, making it repeatable and version-controlled. Here’s the high-level infrastructure:
Resource Architecture
# Single monitoring resource group
resource "azurerm_resource_group" "monitoring" {
  name     = "rg-kv-monitoring-${var.timestamp}"
  location = var.primary_location
}

# Function App (handles ALL Key Vaults)
resource "azurerm_linux_function_app" "kv_processor" {
  name            = "func-kv-monitoring-${var.timestamp}"
  service_plan_id = azurerm_service_plan.function_plan.id
  # ... configuration
}

# Event Grid System Topics (one per Key Vault)
resource "azurerm_eventgrid_system_topic" "key_vault" {
  for_each               = { for kv in var.key_vaults : kv.name => kv }
  name                   = "evgt-${each.key}"
  source_arm_resource_id = "/subscriptions/${each.value.subscriptionId}/resourceGroups/${each.value.resourceGroup}/providers/Microsoft.KeyVault/vaults/${each.key}"
  topic_type             = "Microsoft.KeyVault.vaults"
  # ... resource group and location (omitted here)
}

# Event Subscriptions (route events to Function App)
resource "azurerm_eventgrid_event_subscription" "certificate_expiry" {
  for_each = { for kv in var.key_vaults : kv.name => kv }
  # ... name and scope (omitted here)

  azure_function_endpoint {
    function_id = "${azurerm_linux_function_app.kv_processor.id}/functions/EventGridTrigger"
  }

  # The secret event types listed earlier need the same treatment (not shown)
  included_event_types = [
    "Microsoft.KeyVault.CertificateNearExpiry",
    "Microsoft.KeyVault.CertificateExpired"
  ]
}
CI/CD Pipeline Integration
The solution includes an Azure DevOps pipeline that:
- Discovers Key Vaults across your management group automatically
- Generates Terraform variables with all discovered Key Vaults
- Deploys infrastructure using infrastructure as code
- Validates deployment to ensure everything works
# Simplified pipeline flow
stages:
  - stage: DiscoverKeyVaults
    # Scan management group for all Key Vaults
  - stage: DeployMonitoring
    # Deploy Function App and Event Grid subscriptions
  - stage: ValidateDeployment
    # Ensure monitoring is working correctly
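One way the discovery stage might hand its results to Terraform is sketched below. It assumes the discovery step yields a list of Key Vault resource IDs; `toKeyVaultVariable` is a hypothetical helper, and the output shape (`name`, `subscriptionId`, `resourceGroup`) simply mirrors the fields the `for_each` expressions above read from `var.key_vaults`. The subscription IDs are placeholders.

```javascript
// Hypothetical helper for the discovery stage: converts Key Vault resource IDs
// into the key_vaults variable shape used by the Terraform for_each expressions.
const VAULT_ID_PATTERN =
  /^\/subscriptions\/([^/]+)\/resourceGroups\/([^/]+)\/providers\/Microsoft\.KeyVault\/vaults\/([^/]+)$/i;

function toKeyVaultVariable(resourceIds) {
  const keyVaults = [];
  for (const id of resourceIds) {
    const match = VAULT_ID_PATTERN.exec(id);
    if (!match) continue; // skip anything that is not a Key Vault resource ID
    const [, subscriptionId, resourceGroup, name] = match;
    keyVaults.push({ name, subscriptionId, resourceGroup });
  }
  return { key_vaults: keyVaults };
}

// Placeholder IDs for illustration only.
const discoveredIds = [
  "/subscriptions/00000000-0000-0000-0000-000000000001/resourceGroups/rg-app/providers/Microsoft.KeyVault/vaults/prod-keyvault",
  "/subscriptions/00000000-0000-0000-0000-000000000002/resourceGroups/rg-data/providers/Microsoft.Storage/storageAccounts/notavault",
];

console.log(JSON.stringify(toKeyVaultVariable(discoveredIds), null, 2));
```

The JSON output could be written to a file ending in `.auto.tfvars.json`, which Terraform loads automatically. Keying the `for_each` map by vault name is safe because Key Vault names are globally unique.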
Cost Analysis: Why This Approach Wins
Traditional Approach (Per-Key Vault Monitoring)
100 Key Vaults × $20/month per KV = $2,000/month
Annual cost: $24,000
This Approach (Centralized Monitoring)
Base infrastructure (Function App, storage, logging): $16-31/month
Event Grid events: $0.50-2/month
Total: $17-33/month
Annual cost: $204-396
Savings: 98%+ reduction in monitoring costs
Detailed Cost Breakdown
| Component | Monthly Cost | Notes |
|---|---|---|
| Function App (Basic B1) | $13.14 | Handles unlimited Key Vaults |
| Storage Account | $1-3 | Function runtime storage |
| Log Analytics | $2-15 | Centralized logging |
| Event Grid | $0.50-2 | $0.60 per million operations |
| Action Group | $0 | Email notifications free |
| Total | $17-33 | Scales to unlimited Key Vaults |
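The totals in the breakdown can be checked with a few lines of arithmetic. The sketch below copies the per-component ranges from the table and uses the per-vault baseline from the comparison above (100 Key Vaults at $20/month each); the variable names are illustrative only.

```javascript
// Monthly cost ranges in USD copied from the breakdown table: [low, high].
const components = {
  functionApp: [13.14, 13.14],
  storageAccount: [1, 3],
  logAnalytics: [2, 15],
  eventGrid: [0.5, 2],
  actionGroup: [0, 0],
};

// Sum either the low (index 0) or high (index 1) end of every range.
const sumRange = (index) =>
  Object.values(components).reduce((total, range) => total + range[index], 0);

const monthlyLow = sumRange(0);
const monthlyHigh = sumRange(1);

// Per-vault baseline from the comparison above: 100 Key Vaults at $20/month each.
const perVaultMonthly = 100 * 20;
const worstCaseSavings = 1 - monthlyHigh / perVaultMonthly;

console.log(`Monthly: $${monthlyLow.toFixed(2)} to $${monthlyHigh.toFixed(2)}`);
console.log(`Annual: $${(monthlyLow * 12).toFixed(2)} to $${(monthlyHigh * 12).toFixed(2)}`);
console.log(`Savings vs per-vault monitoring: ${(worstCaseSavings * 100).toFixed(1)}%`);
```

The sums come to about $16.64 and $33.14 per month, which round to the $17-33 shown in the table. Even at the high end the saving against the $2,000/month baseline stays above 98%.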
Implementation Guide: Getting Started
Prerequisites
- Azure Management Group with Key Vaults to monitor
- Service Principal with appropriate permissions:
  - Reader on Management Group
  - Contributor on monitoring subscription
  - EventGrid Contributor on Key Vault subscriptions
- Azure DevOps or similar CI/CD platform
Step 1: Repository Setup
Create this folder structure:
keyvault-monitoring/
├── terraform/
│ ├── main.tf # Infrastructure definitions
│ ├── variables.tf # Configuration variables
│ ├── terraform.tfvars # Your specific settings
│ └── function_code/ # Function App source code
├── azure-pipelines.yml # CI/CD pipeline
└── docs/ # Documentation
Step 2: Configuration
Update terraform.tfvars with your settings:
# Required configuration
notification_emails = [
  "your-team@company.com",
  "security@company.com"
]
primary_location   = "East US"
log_retention_days = 90

# Optional: SMS for critical alerts
sms_notifications = [
  {
    country_code = "1"
    phone_number = "5551234567"
  }
]

# Optional: Webhook integration
webhook_url = "https://your-itsm-tool.com/api/alerts"
Step 3: Deployment
The pipeline automatically:
- Scans your management group for all Key Vaults
- Generates infrastructure code with discovered Key Vaults
- Deploys monitoring resources using Terraform
- Validates functionality with test events
Expected deployment time: 5-10 minutes
Step 4: Validation
Test the setup by creating a short-lived certificate:
# Create a test certificate with a 1-month validity period
az keyvault certificate create \
  --vault-name "your-test-keyvault" \
  --name "test-monitoring-cert" \
  --policy '{
    "issuerParameters": {"name": "Self"},
    "x509CertificateProperties": {
      "validityInMonths": 1,
      "subject": "CN=test-monitoring"
    }
  }'
# Once Key Vault emits the near-expiry event, the alert should arrive within 5 minutes
Operational Excellence
Monitoring the Monitor
The solution includes comprehensive observability:
// Function App performance dashboard
// Note: assumes your logs expose a DurationMs column; adjust to your schema if not
FunctionAppLogs
| where TimeGenerated > ago(24h)
| summarize
    ExecutionCount = count(),
    SuccessRate = (countif(Level != "Error") * 100.0) / count(),
    AvgDurationMs = avg(DurationMs)
| extend PerformanceScore = case(
    SuccessRate >= 99.5, "Excellent",
    SuccessRate >= 99.0, "Good",
    "Needs Attention"
)
Advanced Features and Customizations
1. Integration with ITSM Tools
The webhook capability enables integration with enterprise tools:
// ServiceNow integration example: build the incident payload from an alert
const buildServiceNowPayload = ({ objectType, objectName, keyVaultName, severity }) => ({
  short_description: `${objectType} '${objectName}' expiring in Key Vault '${keyVaultName}'`,
  urgency: severity === 'Sev1' ? '1' : '3',
  category: 'Security',
  subcategory: 'Certificate Management',
  caller_id: 'keyvault-monitoring-system'
});
2. Custom Alert Routing
Different Key Vaults can route to different teams:
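// Route alerts based on Key Vault naming convention.
// The patterns match 'prod-' or 'dev-' only at the start of the name or after a
// hyphen, so a vault such as 'nonprod-kv' is not routed to the production team.
const getNotificationGroup = (keyVaultName) => {
  if (/(^|-)prod-/.test(keyVaultName)) return 'production-team';
  if (/(^|-)dev-/.test(keyVaultName)) return 'development-team';
  return 'platform-team';
};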
3. Business Hours Filtering
Critical alerts can bypass business hours, while informational alerts respect working hours:
const shouldSendImmediately = (severity, currentTime) => {
  if (severity === 'Sev1') return true; // Always send critical alerts
  const businessHours = isBusinessHours(currentTime);
  return businessHours || isNearBusinessHours(currentTime, 2); // 2 hours before business hours
};
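The snippet above relies on two helpers it does not define. A minimal sketch of them follows, assuming business hours of 09:00 to 17:00, Monday to Friday, evaluated in UTC. Both the schedule and the helper bodies are assumptions for illustration; a real deployment would likely use the team's own time zone and calendar.

```javascript
// Assumed schedule: 09:00-17:00 UTC, Monday to Friday.
const BUSINESS_START_HOUR = 9;
const BUSINESS_END_HOUR = 17;

// True when the given Date falls inside the assumed business hours.
const isBusinessHours = (time) => {
  const day = time.getUTCDay(); // 0 = Sunday, 6 = Saturday
  const hour = time.getUTCHours();
  return day >= 1 && day <= 5 && hour >= BUSINESS_START_HOUR && hour < BUSINESS_END_HOUR;
};

// True when the moment `hoursBefore` hours after `time` falls inside business hours,
// which covers the lead-in window just before the working day starts.
const isNearBusinessHours = (time, hoursBefore) => {
  const shifted = new Date(time.getTime() + hoursBefore * 60 * 60 * 1000);
  return isBusinessHours(shifted);
};

// Same decision function as above, repeated so this sketch runs on its own.
const shouldSendImmediately = (severity, currentTime) => {
  if (severity === 'Sev1') return true; // Always send critical alerts
  return isBusinessHours(currentTime) || isNearBusinessHours(currentTime, 2);
};

// Monday 07:30 UTC is outside business hours but within the 2-hour lead-in.
console.log(shouldSendImmediately('Sev3', new Date('2024-01-15T07:30:00Z')));
// Saturday 10:00 UTC is neither, so a non-critical alert would be held back.
console.log(shouldSendImmediately('Sev3', new Date('2024-01-13T10:00:00Z')));
```

Alerts that are held back would still need a queue or a scheduled re-check so they go out when business hours begin; that part is not shown here.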
Troubleshooting Common Issues
Issue: No Alerts Received
Symptoms: Events are visible in Azure, but no notifications are arriving
Resolution Steps:
- Check the Action Group configuration in the Azure Portal
- Verify the Function App is running and healthy
- Review Function App logs for processing errors
- Validate Event Grid subscription is active
Issue: High Alert Volume
Symptoms: Too many notifications, alert fatigue
Resolution:
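// Implement intelligent batching: group alerts by Key Vault and object type
// so each group can be sent as a single notification.
// Note: timeWindow is not applied in this simplified version.
const batchAlerts = (alerts, timeWindow = '15m') => {
  return alerts.reduce((batches, alert) => {
    const key = `${alert.keyVaultName}-${alert.objectType}`;
    batches[key] = batches[key] || [];
    batches[key].push(alert);
    return batches;
  }, {});
};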
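The batching above groups by Key Vault and object type but never uses its `timeWindow` argument. One way to add the time dimension is sketched below. It assumes each alert carries a `receivedAt` timestamp, a hypothetical field that is not part of the alert example shown earlier, and it uses fixed 15-minute buckets rather than a sliding window.

```javascript
// Sketch: batch alerts per Key Vault, object type, and fixed time bucket.
// `receivedAt` is a hypothetical ISO timestamp field added for this example.
const batchAlertsByWindow = (alerts, windowMs = 15 * 60 * 1000) => {
  const batches = {};
  for (const alert of alerts) {
    // Alerts whose timestamps fall in the same fixed window share a bucket number.
    const bucket = Math.floor(new Date(alert.receivedAt).getTime() / windowMs);
    const key = `${alert.keyVaultName}-${alert.objectType}-${bucket}`;
    if (!batches[key]) batches[key] = [];
    batches[key].push(alert);
  }
  return batches;
};

// Made-up alerts: the first two land in the same 15-minute bucket, the third does not.
const sampleAlerts = [
  { keyVaultName: 'prod-keyvault', objectType: 'Certificate', receivedAt: '2024-01-15T00:01:00Z' },
  { keyVaultName: 'prod-keyvault', objectType: 'Certificate', receivedAt: '2024-01-15T00:06:00Z' },
  { keyVaultName: 'prod-keyvault', objectType: 'Certificate', receivedAt: '2024-01-15T00:20:00Z' },
];

console.log(Object.values(batchAlertsByWindow(sampleAlerts)).map((batch) => batch.length));
```

Fixed buckets are simple but can split two alerts that arrive a minute apart across a boundary; a sliding window avoids that at the cost of keeping more state.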
Issue: Missing Key Vaults
Symptoms: Some Key Vaults are not included in monitoring
Resolution:
- Re-run the discovery pipeline to pick up new Key Vaults
- Verify service principal has Reader access to all subscriptions
- Check for Key Vaults in subscriptions outside the management group
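A quick way to spot gaps is to compare the vault names found by discovery with the names that already have monitoring configured. The sketch below assumes both lists are available as arrays of names; `findUnmonitoredVaults` is a hypothetical helper, and the comparison ignores case because Key Vault names are case-insensitive.

```javascript
// Hypothetical helper: lists discovered Key Vaults that have no monitoring entry yet.
function findUnmonitoredVaults(discoveredNames, monitoredNames) {
  const monitored = new Set(monitoredNames.map((name) => name.toLowerCase()));
  return discoveredNames.filter((name) => !monitored.has(name.toLowerCase()));
}

// Made-up names for illustration.
const missing = findUnmonitoredVaults(
  ['prod-keyvault', 'dev-keyvault', 'Shared-KV'],
  ['prod-keyvault', 'shared-kv']
);

console.log(missing);
```

Any name this returns is a candidate for the checks listed above, starting with whether the vault's subscription sits inside the management group.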