Amazon Bedrock cross-Region inference capability that provides organizations with flexibility to access foundation models (FMs) across AWS Regions while maintaining optimal performance and availability. However, some enterprises implement strict Regional access controls through service control policies (SCPs) or AWS Control Tower to adhere to compliance requirements, inadvertently blocking cross-Region inference functionality in Amazon Bedrock. This creates a challenging situation where organizations must balance security controls with using AI capabilities.
In this post, we explore how to modify your Regional access controls to specifically allow Amazon Bedrock cross-Region inference while maintaining broader Regional restrictions for other AWS services. We provide practical examples for both SCP modifications and AWS Control Tower implementations.
Understanding cross-Region inference
When running model inference in on-demand mode, your requests might be restricted by service quotas or during peak usage times. Cross-Region inference enables you to seamlessly manage unplanned traffic bursts by utilizing compute across different Regions. With cross-Region inference, you can distribute traffic across multiple Regions, enabling higher throughput.
Many organizations implement Regional access controls through:
- SCPs in AWS Organizations
- AWS Control Tower controls
- Custom AWS Identity and Access Management (IAM) policies
These controls typically deny access to all services in specific Regions for security, compliance, or cost management reasons. However, these broad denials also prevent Amazon Bedrock from functioning properly when it needs to access models in those Regions through cross-Region inference.
How Cross-Region inference works and interacts with SCPs
Cross-Region inference in Amazon Bedrock is a powerful feature that enables automatic cross-Region routing for inference requests. This capability is particularly beneficial for developers using on-demand inference mode, because it provides a seamless solution for achieving higher throughput and performance while effectively managing incoming traffic spikes in applications powered by Amazon Bedrock.
With cross-Region inference, developers can alleviate the need to predict demand fluctuations manually. Instead, the system dynamically routes traffic across multiple Regions, maintaining optimal resource utilization and performance. Importantly, cross-Region inference prioritizes the connected Amazon Bedrock API source Region when possible, helping minimize latency and improve overall responsiveness. This intelligent routing enhances applications’ reliability, performance, and efficiency without requiring constant oversight from development teams.
At its core, cross-Region inference operates on two key concepts: the source Region and the fulfillment Region. The source Region, also known as the origination Region, is where the inference request is initially invoked by the client. In contrast, the fulfillment Region is the Region that actually services the large language model (LLM) invocation request.
Cross-Region inference employs a proprietary custom routing logic that Amazon continuously evolves to provide the best inference experience for customers. This routing mechanism is intentionally heuristics-based, with a primary focus on providing high availability. By default, the service attempts to fulfill requests from the source Region, when possible, but it can seamlessly route requests to other Regions as needed. This intelligent routing considers factors such as Regional capacity, latency, and availability to make optimal decisions.
Although cross-Region inference offers powerful flexibility, it requires access to models in all potential fulfillment Regions to function properly. This requirement is where SCPs can significantly impact cross-Region inference functionality.
Let’s examine a scenario that highlights the critical interaction between cross-Region inference and SCPs. As illustrated in the following figure, we use two Regions, us-east-1 and us-west-2, and have denied all other Regions using an SCP that could have been implemented using AWS Organizations or an AWS Control Tower control.
The workflow consists of the following steps:
- A user makes an inference request to the
us-east-1
Amazon Bedrock endpoint (source Region) using a cross-Region inference profile. - The Amazon Bedrock heuristics-based routing system evaluates available Regions for request fulfillment.
us-west-2
andus-east-1
are allowed for Amazon Bedrock service access through SCPs, butus-east-2
is denied using the SCP.- This single Regional restriction (
us-east-2
) causes the cross-Region inference call to fail. - Even though other Regions are available and allowed, the presence of one blocked Region (
us-east-2
) results in a failed request. - The client receives an error indicating they are not authorized to perform the action.
This behavior is by design; cross-Region inference service requires access to run inference in all potential fulfillment Regions to maintain its ability to optimally route requests. Attempts to use cross-Region inference will fail if any potential target Region is blocked by SCPs, regardless of other available Regions. To successfully implement cross-Region inference, organizations must make sure that their SCPs allow Amazon Bedrock api actions in all Regions where their target model is available. This means identifying all Regions where required models are hosted, modifying SCPs to allow minimal required Amazon Bedrock permissions in these Regions, and maintaining these permissions across all relevant Regions, even if some Regions are not primary operation zones. We will provide specific guidance on SCP modifications and AWS Control Tower implementations that enable cross-Region inference functionality in the following sections.
Use case
For our sample use case, we use Regions us-east-1
and us-west-2
. All other Regions are denied using the landing zone deny (GRREGIONDENY). The customer’s AWS accounts that are allowed to use Amazon Bedrock are under an Organizational Unit (OU) called Sandbox. We want to enable the accounts under the Sandbox OU to use Anthropic’s Claude 3.5 Sonnet v2 model using cross-Region inference. This model is available in us-east-1
, us-east-2
, and us-west-2
, as shown in the following screenshot.
In the current state, when the user tries to use Anthropic’s Claude 3.5 Sonnet v2 model using cross-Region inference, they get an error stating the SCP is denying the action.
Modify existing SCPs to allow Amazon Bedrock cross-Region inference
If you aren’t using AWS Control Tower to govern the multi-account AWS environment, you can create a new SCP or modify an existing SCP to allow Amazon Bedrock cross-Region inference.
The following code is an example of how to modify an existing SCP that denies access to all services in specific Regions while allowing Amazon Bedrock inference through cross-Region inference for Anthropic’s Claude 3.5 Sonnet V2 model:
This policy effectively blocks all actions in the us-east-2
Region except for the specified resources. This is a deny-based policy, which means it should be used in conjunction with allow policies to define a full set of permissions.
You should review and adapt this example to your organization’s specific needs and security requirements before implementing it in a production environment.
When implementing these policies, consider the following:
- Customize the Region and allowed resources to fit your specific requirements
- Test thoroughly in your environment to make sure that it doesn’t unintentionally block necessary services or actions
- Remember that SCPs affect the users and roles in the accounts they’re attached to, including the root user
- Service-linked roles are not affected by SCPs, allowing other AWS services to integrate with AWS Organizations
Implementation using AWS Control Tower
AWS Control Tower creates SCPs to manage permissions across your organization. Manually editing these SCPs is not recommended because it can cause drift in your AWS Control Tower environment. However, there are some approaches you can take to allow specific AWS services, which we discuss in the following sections.
Prerequisites
Make sure that you’re running the latest version of AWS Control Tower. If you’re using a version less than 3.x and have Regions denied through AWS Control Tower settings, you need to enable your AWS Control Tower version to update the Region deny settings. Refer to the following considerations related to AWS Control Tower upgrades from 2.x to 3.x.
Additionally, make sure that the Organization dashboard on AWS Control Tower doesn’t show policy drifts and that the OUs and accounts are in compliance.
Option 1: Extend existing Region deny SCPs for cross-Region inference
AWS Control Tower offers two primary Region deny controls to restrict access to AWS services based on Regions:
- GRREGIONDENY (landing zone Region deny control) – This control applies to the entire landing zone rather than specific OUs. When enabled, it disallows access to operations in global and Regional services outside of specified Regions, including all Regions where AWS Control Tower is not available and all Regions not selected for governance.
- MULTISERVICE.PV.1 (OU Region deny control) – This configurable control can be applied to specific OUs rather than the entire landing zone. It disallows access to unlisted operations in global and Regional AWS services outside of specified Regions for an organizational unit. This control is configurable. This control accepts one or more parameters, such as
AllowedRegions
,ExemptedPrincipalARNs
, andExemptedActions
, which describe operations that are allowed for accounts that are part of this OU:- AllowedRegions – Specifies the Regions selected, in which the OU is allowed to operate. This parameter is mandatory.
- ExemptedPrincipalARNs – Specifies the IAM principals that are exempt from this control, so that they are allowed to operate certain AWS services globally.
- ExemptedActions – Specifies actions that are exempt from this control, so that the actions are allowed.
We will use the CT.MULTISERVICE.PV.1 control and configure it for our scenario.
- Create an IAM role with an IAM policy that will allow Amazon Bedrock inference using cross-Region inference. Let’s name this IAM role Bedrock-Access-CRI. We will use this at a later step. This IAM role will be created in AWS accounts that are part of the Sandbox OU.
- Navigate to the Landing zone settings page and choose Modify settings.
- Enable the Region,
us-east-2
in our case, and leave the rest of the settings unchanged. - Choose Update landing zone to complete the changes.
The updates can take up to 60 minutes or more depending on the size of the Organization. This will update the landing zone Region deny settings (GRREGIONDENY
) to include the Region us-east-2 to govern the Region.
- When the landing zone setup is complete, review the Organization settings to make sure that there are no pending updates for AWS accounts across the OUs. If you see pending updates, complete updating them and make sure the status for the account status shows Enrolled.
- On the AWS Control Tower console, choose All controls under Controls library in the navigation pane to see a list of controls.
- Locate
MULTISERVICE.PV.1
and choose the policy to open the control.
- Choose Control actions followed by Enable to start the configuration.
- On the Select an OU page, select the OU you want to apply this control to. For our use case, we use the Sandbox OU.
- Choose Next.
- On the Specify Region access page, select the Regions to allow access for the OU. For our use case, we select
us-west-2
andus-east-1
.
We don’t select us-east-2
because we want to deny all services on us-east-2
and only allow Amazon Bedrock inference through cross-Region inference.
- Choose Next.
- On the Add service actions – optional page, add the Amazon Bedrock actions to the NotActions We add
bedrock:Invoke*
to allow Amazon Bedrock InvokeModel actions. - Choose Next.
- On the Specify configurations and tags – optional page, add the IAM role we created earlier under Exempted principals and choose Next.
- Review the configuration and choose Enable control.
After the control is enabled, you can review the configuration by choosing OUs enabled, Accounts, Artifacts, and the Regions tab.
This completes the configuration. You can test the Amazon Bedrock inference with Anthropic’s Sonnet 3.5 v2 using the Amazon Bedrock console or the API by assuming the custom IAM role mentioned in the previous step (Bedrock-Access-CRI
).
You will see that you can make Amazon Bedrock inference calls to only Anthropic’s Sonnet 3.5 v2 model using cross-Region inference from all of the three Regions (us-east-1
, us-east-2
, and us-west-2
). Attempts to access other services on us-east-2
are blocked due to the CT.MULTISERVICE.PV.1
control you configured earlier.
By following these approaches, you can safely extend the permissions managed by AWS Control Tower without causing drift or compromising your governance controls.
Option 2: Enable the denied Region using AWS Control Tower and conditionally block using an SCP
In this option, we enable the denied Region (us-east-2
) and create a new SCP to conditionally block us-east-2 while allowing Amazon Bedrock inference through cross-Region inference.
- Navigate to the Landing zone settings page and choose Modify settings.
- Enable the Region,
us-east-2
in our case, and leave the rest of the settings unchanged. - Choose Update landing zone to complete the changes.
The updates can take up to 60 minutes or more depending on the size of the Organization. You can monitor the status of this update on the console.
- When the landing zone setup is complete, review the Organization settings to make sure that there are no pending updates for AWS accounts across the OUs. If you see pending updates, complete updating them and make sure the status for the account status shows Enrolled.
- On the AWS Control Tower console, choose Service Control Policies under Policies in the navigation pane.
- Create a new SCP with the sample policy shown earlier. This SCP denies all actions for
us-east-2
while allowing Amazon Bedrock inference using a CRI profile ARN for Anthropic’s Claude Sonnet 3.5 v2. - Apply the SCP to the specific OU. In this scenario, we use the Sandbox OU.
Because you’re creating a new SCP and not modifying the existing SCPs created by AWS Control Tower, you will not see a drift in the AWS Control Tower state.
You can now test the update by running a few inference calls using the Amazon Bedrock console or the AWS Command Line Interface (AWS CLI). You will see that you can make Amazon Bedrock inference calls to only Anthropic’s Sonnet 3.5 v2 model using cross-Region inference from all three of the Regions (us-east-1
, us-east-2
, and us-west-2
). Access to other AWS services on us-east-2
will be denied.
Using Customizations for AWS Control Tower to deploy SCPs
The recommended way to add custom SCPs is through the Customizations for AWS Control Tower (CfCT) solution:
- Deploy the CfCT solution in your management account.
- Create a configuration package with your custom SCPs.
The following screenshot shows an example SCP that denies a specific Region while allowing calls to Amazon Bedrock using cross-Region inference for Anthropic’s Sonnet 3.5 v2 model.
- Prepare a
manifest.yaml
file that defines your policies.
The following screenshot shows an example manifest.yaml
that defines the resources targeting the Sandbox OU.
- Deploy your custom SCPs to specific OUs.
Summary
Amazon Bedrock cross-Region inference provides valuable flexibility for organizations looking to use FMs across Regions. By carefully modifying your service control policies or AWS Control Tower controls, you can enable this functionality while maintaining your broader Regional access restrictions.
This approach allows you to:
- Maintain compliance with Regional access requirements
- Take advantage of the full capabilities of Amazon Bedrock
- Simplify your application architecture by accessing models from your primary Region
There is no additional cost associated with cross-Region inference, including the failover capabilities provided by this feature. This includes management, data transfer, encryption, network usage, and potential differences in price per million token per model. You pay the same price per token of the individual models in your source Region.
As AI and machine learning capabilities continue to evolve, finding the right balance between security controls and innovation enablement will remain a key challenge for organizations. The approach outlined in this post provides a practical solution to this specific challenge.
For more information, refer to Increase throughput with cross-region inference.
About the Authors
Satveer Khurpa is a Sr. WW Specialist Solutions Architect, Amazon Bedrock at Amazon Web Services. In this role, he uses his expertise in cloud-based architectures to develop innovative generative AI solutions for clients across diverse industries. Satveer’s deep understanding of generative AI technologies allows him to design scalable, secure, and responsible applications that unlock new business opportunities and drive tangible value.
Ramesh Venkataraman is a Solutions Architect who enjoys working with customers to solve their technical challenges using AWS services. Outside of work, Ramesh enjoys following stack overflow questions and answers them in any way he can.
Dhawal Patel is a Principal Machine Learning Architect at AWS. He has worked with organizations ranging from large enterprises to mid-sized startups on problems related to distributed computing and artificial intelligence. He focuses on deep learning, including NLP and computer vision domains. He helps customers achieve high-performance model inference on Amazon SageMaker.
Sumit Kumar is a Principal Product Manager, Technical at AWS Bedrock team, based in Seattle. He has over 12 years of product management experience across a variety of domains and is passionate about AI/ML. Outside of work, Sumit loves to travel and enjoys playing cricket and lawn tennis.
Source: Read MoreÂ