
    Implementing on-demand deployment with customized Amazon Nova models on Amazon Bedrock

    July 17, 2025

    Amazon Bedrock offers model customization capabilities for customers to tailor versions of foundation models (FMs) to their specific needs through features such as fine-tuning and distillation. Today, we’re announcing the launch of on-demand deployment for customized models ready to be deployed on Amazon Bedrock.

    On-demand deployment for customized models provides an additional deployment option that scales with your usage patterns. This approach allows for invoking customized models only when needed, with requests processed in real time without requiring pre-provisioned compute resources.

    The on-demand deployment option includes a token-based pricing model that charges based on the number of tokens processed during inference. This pay-as-you-go approach complements the existing Provisioned Throughput option, giving users flexibility to choose the deployment method that best aligns with their specific workload requirements and cost objectives.
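To make the pricing trade-off concrete, here is a back-of-the-envelope comparison using purely hypothetical prices (real prices vary by model and Region; check the Amazon Bedrock pricing page before making any decision):

```python
# Hypothetical prices for illustration only -- not actual Amazon Bedrock rates.
ON_DEMAND_PRICE_PER_1K_TOKENS = 0.003  # pay-as-you-go, per 1,000 tokens
PROVISIONED_HOURLY_RATE = 20.0         # per model unit, per hour

def monthly_cost_on_demand(tokens_per_month):
    """Cost when you pay only for tokens actually processed."""
    return tokens_per_month / 1000 * ON_DEMAND_PRICE_PER_1K_TOKENS

def monthly_cost_provisioned(hours=730):
    """Cost when reserving one model unit around the clock (~730 h/month)."""
    return PROVISIONED_HOURLY_RATE * hours

# A spiky workload of 10M tokens/month strongly favors on-demand here,
# while sustained high throughput can make Provisioned Throughput cheaper.
print(f"on-demand:   ${monthly_cost_on_demand(10_000_000):,.2f}")
print(f"provisioned: ${monthly_cost_provisioned():,.2f}")
```

With these illustrative numbers, on-demand wins for intermittent traffic, and the break-even point moves toward Provisioned Throughput as sustained token volume grows.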

    In this post, we walk through the custom model on-demand deployment workflow for Amazon Bedrock and provide step-by-step implementation guides using both the AWS Management Console and APIs or AWS SDKs. We also discuss best practices and considerations for deploying customized Amazon Nova models on Amazon Bedrock.

    Understanding custom model on-demand deployment workflow

    The model customization lifecycle represents the end-to-end journey from conceptualization to deployment. This process begins with defining your specific use case, preparing and formatting appropriate data, and then performing model customization through features such as Amazon Bedrock fine-tuning or Amazon Bedrock Model Distillation. Each stage builds upon the previous one, creating a pathway toward deploying production-ready generative AI capabilities that you tailor to your requirements. The following diagram illustrates this workflow.

    After customizing your model, the evaluation and deployment phases determine how the model will be made available for inference. This is where custom model on-demand deployment becomes valuable, offering a deployment option that aligns with variable workloads and cost-conscious implementations. When using on-demand deployment, you can invoke your customized model through the AWS console or standard API operations using the model identifier, with compute resources automatically allocated only when needed. The on-demand deployment provides flexibility while maintaining performance expectations, so you can seamlessly integrate customized models into your applications with the same serverless experience offered by Amazon Bedrock—all compute resources are automatically managed for you, based on your actual usage. Because the workflow supports iterative improvements, you can refine your models based on evaluation results and evolving business needs.

    Prerequisites

This post assumes you have a customized Amazon Nova model before deploying it using on-demand deployment. On-demand deployment is supported only for Amazon Nova models customized after this launch; previously customized models aren’t compatible with this deployment option. For instructions on creating or customizing your Nova model through fine-tuning or distillation, refer to these resources:

    • Fine-tuning Amazon Nova models
    • A guide to Amazon Bedrock Model Distillation

    After you’ve successfully customized your Amazon Nova model, you can proceed with deploying it using the on-demand deployment option as detailed in the following sections.
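If you need to look up the ARN of your customized model programmatically, one possible sketch uses the ListCustomModels API (the model name, Region placeholder, and helper name here are illustrative; paginate for accounts with many models):

```python
def find_custom_model_arn(model_name, summaries=None):
    """Return the ARN of the first custom model matching model_name, or None.

    If summaries is not supplied, fetch the first page of model summaries
    from the Bedrock control-plane ListCustomModels API.
    """
    if summaries is None:
        import boto3
        bedrock_client = boto3.client("bedrock", region_name="<region-info>")
        summaries = bedrock_client.list_custom_models()["modelSummaries"]
    for summary in summaries:
        if summary["modelName"] == model_name:
            return summary["modelArn"]
    return None
```

You can then pass the returned ARN as the modelArn when creating the deployment in the steps that follow.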

    Implementation guide for on-demand deployment

    There are two main approaches to implementing on-demand deployment for your customized Amazon Nova models on Amazon Bedrock: using the Amazon Bedrock console or using the API or SDK. First, we explore how to deploy your model through the Amazon Bedrock console, which provides a user-friendly interface for setting up and managing your deployments.

    Step-by-step implementation using the Amazon Bedrock console

    To implement on-demand deployment for your customized Amazon Nova models on Amazon Bedrock using the console, follow these steps:

1. On the Amazon Bedrock console, select the customized model (fine-tuned or distilled) that you want to deploy. Choose Set up inference, then select Deploy for on-demand, as shown in the following screenshot.

2. Under Deployment details, enter a Name and a Description. You can optionally add Tags, as shown in the following screenshot. Choose Create to start the on-demand deployment of the customized model.

    Under Custom model deployments, the status of your deployment should be InProgress, Active, or Failed, as shown in the following screenshot.

    You can select a deployment to find Deployment ARN, Creation time, Last updated, and Status for the selected custom model.

The custom model is now deployed and ready to use with on-demand deployment. To try it out, go to the Chat/Text playground, choose Custom models under Categories, select your model, choose On demand under Inference, and then select the deployment by name, as shown in the following screenshot.

    Step-by-step implementation using API or SDK

After you have trained the model successfully, you can deploy it to evaluate response quality and latency, or use it as a production model for your use case. You use the CreateCustomModelDeployment API to create a model deployment for the trained model. The following steps show how to use the APIs for deploying and deleting a custom model deployment for on-demand inference.

import boto3
import json

# First, create and configure an Amazon Bedrock client:
bedrock_client = boto3.client(
    service_name="bedrock", region_name="<region-info>")

# Create the custom model deployment
response = bedrock_client.create_custom_model_deployment(
    modelDeploymentName="<model-deployment-name>",
    modelArn="<trained-model-arn>",
    description="<model-deployment-description>",
    tags=[
        {"key": "<your-key>", "value": "<your-value>"},
    ])

After you’ve successfully created a model deployment, you can check the status of the deployment by using the GetCustomModelDeployment API as follows:

response = bedrock_client.get_custom_model_deployment(
    customModelDeploymentIdentifier="<custom-deployment-arn>")
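Because a new deployment takes some time to leave the Creating state, it is convenient to poll until it is usable. A small helper sketch (the fetch_status callable stands in for the get_custom_model_deployment call above, assuming the response exposes the state under a status field; names and intervals are illustrative):

```python
import time

def wait_for_deployment(fetch_status, poll_seconds=30.0, max_polls=60):
    """Poll fetch_status() until it returns "Active" or "Failed".

    fetch_status is any zero-argument callable returning the deployment
    status string, for example:
        lambda: bedrock_client.get_custom_model_deployment(
            customModelDeploymentIdentifier="<custom-deployment-arn>")["status"]
    """
    for _ in range(max_polls):
        status = fetch_status()
        if status in ("Active", "Failed"):
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("deployment did not reach a terminal state in time")
```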

GetCustomModelDeployment supports three states: Creating, Active, and Failed. When the status in the response is Active, you can use the custom model through on-demand deployment with the InvokeModel or Converse API, as shown in the following example:

    # Define Runtime Client
    bedrock_runtime = boto3.client(service_name="bedrock-runtime", region_name="<region-info>") 
    # invoke a deployed custom model using Converse API
    response = bedrock_runtime.converse(
                        modelId="<custom-deployment-arn>",
                        messages=[
                            {
                                "role": "user",
                                "content": [
                                    {
                                        "text": "<your-prompt-for-custom-model>",
                                    }
                                ]
                            }
                        ]
                    )
    
    result = response.get('output')
    print(result)
    
    # invoke a deployed custom model using InvokeModel API
    request_body = {
        "schemaVersion": "messages-v1",
        "messages": [{"role": "user", 
                      "content": [{"text": "<your-prompt-for-custom-model>"}]}],
        "system": [{"text": "<system prompt>"}],
        "inferenceConfig": {"maxTokens": 500, 
                            "topP": 0.9, 
                            "temperature": 0.0
                            }
    }
    body = json.dumps(request_body)
    response = bedrock_runtime.invoke_model(
            modelId="<custom-deployment-arn>",
            body=body
        )
    
    # Extract and print the response text
    model_response = json.loads(response["body"].read())
    response_text = model_response["output"]["message"]["content"][0]["text"]
    print(response_text)

    By following these steps, you can deploy and use your customized model through Amazon Bedrock API and instantly use your efficient and high-performing model tailored to your use cases through on-demand deployment.

    Best practices and considerations

    Successful implementation of on-demand deployment with customized models depends on understanding several operational factors. These considerations—including latency, Regional availability, quota limitations, deployment option selections, and cost management strategies—directly impact your ability to deploy effective solutions while optimizing resource utilization. The following guidelines help you make informed decisions when implementing your inference strategy:

    • Cold start latency – When using on-demand deployment, you might experience initial cold start latencies, typically lasting several seconds, depending on the model size. This occurs when the deployment hasn’t received recent traffic and needs to reinitialize compute resources.
    • Regional availability – At launch, custom model deployment will be available in US East (N. Virginia) for Amazon Nova models.
    • Quota management – Each custom model deployment has specific quotas:
      • Tokens per minute (TPM)
      • Requests per minute (RPM)
      • Number of deployments in Creating status
      • Total on-demand deployments in a single account

    Each deployment operates independently within its assigned quota. If a deployment exceeds its TPM or RPM allocation, incoming requests will be throttled. You can request quota increases by submitting a ticket or contacting your AWS account team.

    • Choosing between custom model deployment and Provisioned Throughput – You can set up inference on a custom model by either creating a custom model deployment (for on-demand usage) or purchasing Provisioned Throughput. The choice depends on the supported Regions and models for each inference option, throughput requirement, and cost considerations. These two options operate independently and can be used simultaneously for the same custom model.
    • Cost management – On-demand deployment uses a pay-as-you-go pricing model based on the number of tokens processed during inference. You can use cost allocation tags on your on-demand deployments to track and manage inference costs, allowing better budget tracking and cost optimization through AWS Cost Explorer.
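Because requests beyond a deployment’s TPM or RPM quota are throttled, client-side retries with exponential backoff and jitter are worth building in. One possible sketch (the delay schedule and broad exception handling are illustrative, not an AWS-prescribed policy; in real code you would catch the throttling error specifically):

```python
import random
import time

def backoff_delays(max_retries=5, base=1.0, cap=30.0):
    """Exponential backoff schedule: base, 2*base, 4*base, ... capped at cap."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(max_retries)]

def invoke_with_retries(invoke, max_retries=5):
    """Call invoke(); on an error (e.g. throttling), back off and retry.

    invoke is any zero-argument callable wrapping your Converse or
    InvokeModel request.
    """
    last_error = None
    for delay in backoff_delays(max_retries):
        try:
            return invoke()
        except Exception as error:  # narrow this to throttling errors in real code
            last_error = error
            time.sleep(delay + random.uniform(0, delay * 0.1))  # add jitter
    raise last_error
```

Wrapping your bedrock_runtime call in a callable, for example invoke_with_retries(lambda: bedrock_runtime.converse(...)), smooths over brief bursts above the per-deployment quota.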

    Cleanup

    If you’ve been testing the on-demand deployment feature and don’t plan to continue using it, it’s important to clean up your resources to avoid incurring unnecessary costs. Here’s how to delete using the Amazon Bedrock Console:

    1. Navigate to your custom model deployment
    2. Select the deployment you want to remove
    3. Delete the deployment

    Here’s how to delete using the API or SDK:

To delete a custom model deployment, you can use the DeleteCustomModelDeployment API. The following example demonstrates how to delete your custom model deployment:

# Delete the custom model deployment
response = bedrock_client.delete_custom_model_deployment(
    customModelDeploymentIdentifier="<custom-deployment-arn>")

    Conclusion

    The introduction of on-demand deployment for customized models on Amazon Bedrock represents a significant advancement in making AI model deployment more accessible, cost-effective, and flexible for businesses of all sizes. On-demand deployment offers the following advantages:

    • Cost optimization – Pay-as-you-go pricing means you pay only for the compute resources you actually use
    • Operational simplicity – Automatic resource management eliminates the need for manual infrastructure provisioning
    • Scalability – Seamless handling of variable workloads without upfront capacity planning
    • Flexibility – Freedom to choose between on-demand and Provisioned Throughput based on your specific needs

    Getting started is straightforward. Begin by completing your model customization through fine-tuning or distillation, then choose on-demand deployment using the AWS Management Console or API. Configure your deployment details, validate model performance in a test environment, and seamlessly integrate into your production workflows.

    Start exploring on-demand deployment for customized models on Amazon Bedrock today! Visit the Amazon Bedrock documentation to begin your model customization journey and experience the benefits of flexible, cost-effective AI infrastructure. For hands-on implementation examples, check out our GitHub repository which contains detailed code samples for customizing Amazon Nova models and evaluating them using on-demand custom model deployment.


    About the Authors

    Yanyan Zhang is a Senior Generative AI Data Scientist at Amazon Web Services, where she has been working on cutting-edge AI/ML technologies as a Generative AI Specialist, helping customers use generative AI to achieve their desired outcomes. Yanyan graduated from Texas A&M University with a PhD in Electrical Engineering. Outside of work, she loves traveling, working out, and exploring new things.

    Sovik Kumar Nath is an AI/ML and Generative AI senior solution architect with AWS. He has extensive experience designing end-to-end machine learning and business analytics solutions in finance, operations, marketing, healthcare, supply chain management, and IoT. He has double masters degrees from the University of South Florida, University of Fribourg, Switzerland, and a bachelors degree from the Indian Institute of Technology, Kharagpur. Outside of work, Sovik enjoys traveling, taking ferry rides, and watching movies.

    Ishan Singh is a Sr. Generative AI Data Scientist at Amazon Web Services, where he helps customers build innovative and responsible generative AI solutions and products. With a strong background in AI/ML, Ishan specializes in building generative AI solutions that drive business value. Outside of work, he enjoys playing volleyball, exploring local bike trails, and spending time with his wife and dog, Beau.

    Koushik Mani is an associate solutions architect at AWS. He had worked as a Software Engineer for two years focusing on machine learning and cloud computing use cases at Telstra. He completed his masters in computer science from University of Southern California. He is passionate about machine learning and generative AI use cases and building solutions.

    Rishabh Agrawal is a Senior Software Engineer working on AI services at AWS. In his spare time, he enjoys hiking, traveling and reading.

    Shreeya Sharma is a Senior Technical Product Manager at AWS, where she has been working on leveraging the power of generative AI to deliver innovative and customer-centric products. Shreeya holds a master’s degree from Duke University. Outside of work, she loves traveling, dancing, and singing.
