    Fine-tune and deploy Meta Llama 3.2 Vision for generative AI-powered web automation using AWS DLCs, Amazon EKS, and Amazon Bedrock

    July 29, 2025

    Fine-tuning of large language models (LLMs) has emerged as a crucial technique for organizations seeking to adapt powerful foundation models (FMs) to their specific needs. Rather than training models from scratch—a process that can cost millions of dollars and require extensive computational resources—companies can customize existing models with domain-specific data at a fraction of the cost. This approach has become particularly valuable as organizations across healthcare, finance, and technology sectors look to use AI for specialized tasks while maintaining cost-efficiency. However, implementing a production-grade fine-tuning solution presents several significant challenges. Organizations must navigate complex infrastructure setup requirements, enforce robust security measures, optimize performance, and establish reliable model hosting solutions.

    In this post, we present a complete solution for fine-tuning and deploying the Llama-3.2-11B-Vision-Instruct model for web automation tasks. We demonstrate how to build a secure, scalable, and efficient infrastructure using AWS Deep Learning Containers (DLCs) on Amazon Elastic Kubernetes Service (Amazon EKS). By using AWS DLCs, you can gain access to well-tested environments that come with enhanced security features and pre-installed software packages, significantly simplifying the optimization of your fine-tuning process. This approach not only accelerates development, but also provides robust security and performance in production environments.

    Solution overview

    In this section, we explore the key components of our architecture for fine-tuning a Meta Llama model and using it for web task automation. We explore the benefits of different components and how they interact with each other, and how we can use them to build a production-grade fine-tuning pipeline.

    AWS DLCs for training and hosting AI/ML workloads

    At the core of our solution are AWS DLCs, which provide optimized environments for machine learning (ML) workloads. These containers come preconfigured with essential dependencies, including NVIDIA drivers, CUDA toolkit, and Elastic Fabric Adapter (EFA) support, along with preinstalled frameworks like PyTorch for model training and hosting. AWS DLCs tackle the complex challenge of packaging various software components to work harmoniously with training scripts, so you can use optimized hardware capabilities out of the box. Additionally, AWS DLCs implement unique patching algorithms and processes that continuously monitor, identify, and address security vulnerabilities, making sure the containers remain secure and up-to-date. Their pre-validated configurations significantly reduce setup time and reduce compatibility issues that often occur in ML infrastructure setup.

    AWS DLCs, Amazon EKS, and Amazon EC2 for seamless infrastructure management

    We deploy these DLCs on Amazon EKS, creating a robust and scalable infrastructure for model fine-tuning. Organizations can use this combination to build and manage their training infrastructure with unprecedented flexibility. Amazon EKS handles the complex container orchestration, so you can launch training jobs that run within DLCs on your desired Amazon Elastic Compute Cloud (Amazon EC2) instance, producing a production-grade environment that can scale based on training demands while maintaining consistent performance.

    AWS DLCs and EFA support for high-performance networking

    AWS DLCs come with pre-configured support for EFA, enabling high-throughput, low-latency communication between EC2 nodes. An EFA is a network device that you can attach to your EC2 instance to accelerate AI, ML, and high performance computing applications. DLCs are pre-installed with EFA software that is tested and compatible with the underlying EC2 instances, so you don’t have to go through the hassle of setting up the underlying components yourself. For this post, we use setup scripts to create EKS clusters and EC2 instances that will support EFA out of the box.

    AWS DLCs with FSDP for enhanced memory efficiency

    Our solution uses PyTorch’s built-in support for Fully Sharded Data Parallel (FSDP) training, a cutting-edge technique that dramatically reduces memory requirements during training. Unlike traditional distributed training approaches where each GPU must hold a complete model copy, FSDP shards model parameters, optimizer states, and gradients across workers. The optimized implementation of FSDP within AWS DLCs makes it possible to train larger models with limited GPU resources while maintaining training efficiency.
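
    To make the sharding idea concrete, the following is a minimal, generic PyTorch sketch of wrapping a model with FSDP. It is illustrative only and not the training script used in this post (that script comes from the llama-cookbook repository):

    ```python
    # Generic FSDP sketch: each rank keeps only a shard of the parameters,
    # gradients, and optimizer state instead of a full model copy.
    import torch
    import torch.distributed as dist
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    dist.init_process_group("nccl")  # processes are typically launched with torchrun
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

    model = torch.nn.Transformer(d_model=512, nhead=8).cuda()
    model = FSDP(model)  # shards the module across all ranks in the process group

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
    # The training loop then proceeds as usual: forward pass, loss.backward(), optimizer.step()
    ```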

    For more information, see Scale LLMs with PyTorch 2.0 FSDP on Amazon EKS – Part 2.

    Model deployment on Amazon Bedrock

    For model deployment, we use Amazon Bedrock, a fully managed service for FMs. Although we can use AWS DLCs for model hosting, we use Amazon Bedrock in this post to demonstrate a different hosting option.

    Web automation integration

    Finally, we implement the SeeAct agent, a sophisticated web automation tool, and demonstrate its integration with our hosted model on Amazon Bedrock. This combination creates a powerful system capable of understanding visual inputs and executing complex web tasks autonomously, showcasing the practical applications of our fine-tuned model.

    In the following sections, we demonstrate how to:

    1. Set up an EKS cluster for AI workloads.
    2. Use AWS DLCs to fine-tune Meta Llama 3.2 Vision using PyTorch FSDP.
    3. Deploy the fine-tuned model on Amazon Bedrock.
    4. Use the model with SeeAct for web task automation.

    Prerequisites

    You must have the following prerequisites:

    • An AWS account.
    • An AWS Identity and Access Management (IAM) role with appropriate policies. Because this post deals with creating clusters, nodes, and infrastructure, administrator-level permissions work well. However, if you need to use restricted permissions, you should at least have the following policies: AmazonEC2FullAccess, AmazonSageMakerFullAccess, AmazonBedrockFullAccess, AmazonS3FullAccess, AWSCloudFormationFullAccess, AmazonEC2ContainerRegistryFullAccess. For more information about other IAM policies needed, see Minimum IAM policies.
    • The necessary dependencies installed for Amazon EKS. For instructions, see Set up to use Amazon EKS.
    • Sufficient service quota for P5 instances, which we use in this post. To request a quota increase, see Requesting a quota increase.
    • An EC2 key pair. For instructions, see Create a key pair for your Amazon EC2 instance.

    Run export AWS_REGION=<region_name> in the shell from which you run the following commands.

    Set up the EKS cluster

    In this section, we walk through the steps to create your EKS cluster and install the necessary plugins, operators, and other dependencies.

    Create an EKS cluster

    The simplest way to create an EKS cluster is to use the cluster configuration YAML file. You can use the following sample configuration file as a base and customize it as needed. Provide the EC2 key pair created as a prerequisite. For more configuration options, see Using Config Files.

    ---
    apiVersion: eksctl.io/v1alpha5
    kind: ClusterConfig
    
    metadata:
      name: MyCluster
      region: us-west-2
    
    managedNodeGroups: 
      - name: p5
        instanceType: p5.48xlarge
        minSize: 0
        maxSize: 2
        desiredCapacity: 2
        availabilityZones: ["us-west-2a"]
        volumeSize: 1024
        ssh:
          publicKeyName: <your-ec2-key-pair>
        efaEnabled: true
        privateNetworking: true
        ## In case you have an On Demand Capacity Reservation (ODCR) and want to use it, uncomment the lines below.
        # capacityReservation:
        #   capacityReservationTarget:
        #     capacityReservationResourceGroupARN: arn:aws:resource-groups:us-west-2:897880167187:group/eks_blog_post_capacity_reservation_resource_group_p5

    Run the following command to create the EKS cluster:

    eksctl create cluster --config-file cluster.yaml

    The following is an example output:

    YYYY-MM-DD HH:mm:SS [ℹ] eksctl version x.yyy.z
    YYYY-MM-DD HH:mm:SS [ℹ] using region <region_name>
    ...
    YYYY-MM-DD HH:mm:SS [✔] EKS cluster "<cluster_name>" in "<region_name>" region is ready

    Cluster creation might take 15–30 minutes. After it’s created, your local ~/.kube/config file gets updated with connection information to your cluster.

    Run the following command line to verify that the cluster is accessible:

    kubectl get nodes

    Install plugins, operators, and other dependencies

    In this step, you install the necessary plugins, operators, and other dependencies on your EKS cluster. This is necessary to run the fine-tuning job on the correct nodes and save the model.

    1. Install the NVIDIA Kubernetes device plugin:
    kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.17.0/deployments/static/nvidia-device-plugin.yml
    2. Install the AWS EFA Kubernetes device plugin:
    helm repo add eks https://aws.github.io/eks-charts
    git clone -b v0.0.190 https://github.com/aws/eks-charts.git
    cd eks-charts/stable
    helm install efa ./aws-efa-k8s-device-plugin -n kube-system
    cd ../..
    3. Delete the aws-efa-k8s-device-plugin-daemonset by running the following command:
    kubectl delete daemonset aws-efa-k8s-device-plugin-daemonset -n kube-system
    4. Clone the code locally that will help with setup and fine-tuning:
    git clone https://github.com/aws-samples/aws-do-eks.git
    cd aws-do-eks
    git checkout f59007ee50117b547305f3b8475c8e1b4db5a1d5
    curl -L -o patch-aws-do-eks.tar.gz https://github.com/aws/deep-learning-containers/raw/refs/heads/master/examples/dlc-llama-3-finetuning-and-hosting-with-agent/patch-aws-do-eks.tar.gz
    tar -xzf patch-aws-do-eks.tar.gz
    cd patch-aws-do-eks/
    git am *.patch
    cd ../..
    5. Install etcd for running distributed training with PyTorch:
    kubectl apply -f aws-do-eks/Container-Root/eks/deployment/etcd/etcd-deployment.yaml
    6. Deploy the FSx CSI driver for saving the model after fine-tuning:
      1. Enter the fsx folder:
        cd aws-do-eks/Container-Root/eks/deployment/csi/fsx/
      2. Edit the fsx.conf file to set the CLUSTER_NAME, CLUSTER_REGION, and CLUSTER_ZONE values to your cluster-specific data:
        vi fsx.conf
      3. Deploy the FSx CSI driver:
        ./deploy.sh
    7. Deploy the Kubeflow Training Operator that will be used to run the fine-tuning job:
      1. Change the location to the following:
        cd aws-do-eks/Container-Root/eks/deployment/kubeflow/training-operator/
      2. Deploy the Kubeflow Training Operator:
        ./deploy.sh
    8. Deploy the Kubeflow MPI Operator for running NCCL tests:
      1. Run deploy.sh from the following GitHub repo.
      2. Change the location to the following:
        cd aws-do-eks/Container-Root/eks/deployment/kubeflow/mpi-operator/
      3. Deploy the Kubeflow MPI Operator:
        ./deploy.sh

    Fine-tune Meta Llama 3.2 Vision using DLCs on Amazon EKS

    This section outlines the process for fine-tuning the Meta Llama 3.2 Vision model using PyTorch FSDP on Amazon EKS. We use the DLCs as the base image to run our training jobs.

    Configure the setup needed for fine-tuning

    Complete the following steps to configure the setup for fine-tuning:

    1. Create a Hugging Face account and get a Hugging Face security token.
    2. Enter the fsdp folder:
    cd Container-Root/eks/deployment/distributed-training/pytorch/pytorchjob/fsdp
    3. Create a Persistent Volume Claim (PVC) that will use the underlying FSx CSI driver that you installed earlier:
    kubectl apply -f pvc.yaml

    Monitor kubectl get pvc fsx-claim and make sure the claim reaches the BOUND status. On the Amazon EKS console, you will also see a newly created volume without a name. You can let this happen in the background, but make sure the BOUND status is reached before you run the ./run.sh command that starts the fine-tuning job in a later step.

    4. To configure the environment, open the .env file and modify the following variables:
      1. HF_TOKEN: Add the Hugging Face token that you generated earlier.
      2. S3_LOCATION: Add the Amazon Simple Storage Service (Amazon S3) location where you want to store the fine-tuned model after the training is complete.
    5. Create the required resource YAMLs:
    ./deploy.sh

    This command uses the values in the .env file to generate new YAML files that will eventually be used for model deployment.

    6. Build and push the container image:
    ./login-dlc.sh
    ./build.sh
    ./push.sh

    Run the fine-tuning job

    In this step, we use the upstream DLCs and add the training scripts within the image for running the training.

    Make sure that you have requested access to the Meta Llama 3.2 Vision model on Hugging Face. Continue to the next step after permission has been granted.

    Execute the fine-tuning job:

    ./run.sh

    For our use case, the job took 1.5 hours to complete. The script uses the following PyTorch command that’s defined in the .env file within the fsdp folder:

    ```bash
    torchrun --nnodes 1 --nproc_per_node 8 \
      recipes/quickstart/finetuning/finetuning.py \
      --enable_fsdp --lr 1e-5 --num_epochs 5 \
      --batch_size_training 2 \
      --model_name meta-llama/Llama-3.2-11B-Vision-Instruct \
      --dist_checkpoint_root_folder ./finetuned_model \
      --dist_checkpoint_folder fine-tuned \
      --use_fast_kernels \
      --dataset "custom_dataset" --custom_dataset.test_split "test" \
      --custom_dataset.file "recipes/quickstart/finetuning/datasets/mind2web_dataset.py" \
      --run_validation False --batching_strategy padding
    ```

    You can use the ./logs.sh command to see the training logs in both FSDP workers.

    After a successful run, logs from fsdp-worker will look as follows:

    Sharded state checkpoint saved to /workspace/llama-recipes/finetuned_model_mind2web/fine-tuned-meta-llama/Llama-3.2-11B-Vision-Instruct
    Checkpoint Time = 85.3276
    
    Epoch 5: train_perplexity=1.0214, train_epoch_loss=0.0211, epoch time 706.1626197730075s
    training params are saved in /workspace/llama-recipes/finetuned_model_mind2web/fine-tuned-meta-llama/Llama-3.2-11B-Vision-Instruct/train_params.yaml
    Key: avg_train_prep, Value: 1.0532150745391846
    Key: avg_train_loss, Value: 0.05118955448269844
    Key: avg_epoch_time, Value: 716.0386156642023
    Key: avg_checkpoint_time, Value: 85.34336999000224
    fsdp-worker-1:78:5593 [0] NCCL INFO [Service thread] Connection closed by localRank 1
    fsdp-worker-1:81:5587 [0] NCCL INFO [Service thread] Connection closed by localRank 4
    fsdp-worker-1:85:5590 [0] NCCL INFO [Service thread] Connection closed by localRank 0
    I0305 19:37:56.173000 140632318404416 torch/distributed/elastic/agent/server/api.py:844] [default] worker group successfully finished. Waiting 300 seconds for other agents to finish.
    I0305 19:37:56.173000 140632318404416 torch/distributed/elastic/agent/server/api.py:889] Local worker group finished (WorkerState.SUCCEEDED). Waiting 300 seconds for other agents to finish
    I0305 19:37:56.177000 140632318404416 torch/distributed/elastic/agent/server/api.py:902] Done waiting for other agents. Elapsed: 0.0037238597869873047 seconds

    The logs from the other FSDP worker will look similar:

    [rank8]:W0305 19:37:46.754000 139970058049344 torch/distributed/distributed_c10d.py:2429] _tensor_to_object size: 2817680 hash value: 9260685783781206407
    fsdp-worker-0:84:5591 [0] NCCL INFO [Service thread] Connection closed by localRank 7
    I0305 19:37:56.124000 139944709084992 torch/distributed/elastic/agent/server/api.py:844] [default] worker group successfully finished. Waiting 300 seconds for other agents to finish.
    I0305 19:37:56.124000 139944709084992 torch/distributed/elastic/agent/server/api.py:889] Local worker group finished (WorkerState.SUCCEEDED). Waiting 300 seconds for other agents to finish
    I0305 19:37:56.177000 139944709084992 torch/distributed/elastic/agent/server/api.py:902] Done waiting for other agents. Elapsed: 0.05295562744140625 seconds

    Run the processing model and store output in Amazon S3

    After the jobs are complete, the fine-tuned model will exist in the FSx file system. The next step is to convert the model into Hugging Face format and save it in Amazon S3 so you can access and deploy the model in the upcoming steps:

    kubectl apply -f model-processor.yaml

    The preceding command deploys a pod on your instance that reads the model from FSx, converts it to the Hugging Face format, and pushes it to Amazon S3. It takes approximately 8–10 minutes for this pod to run. You can monitor its logs using ./logs.sh or kubectl logs -l app=model-processor.

    Get the location where your model has been stored in Amazon S3. This is the same Amazon S3 location that you specified in the .env file in an earlier step. Run the following command (provide your Amazon S3 location):

    aws s3 cp tokenizer_config.json <S3_LOCATION>/tokenizer_config.json

    This is the tokenizer config that is needed by Amazon Bedrock to import Meta Llama models so they work with the Amazon Bedrock Converse API. For more details, see Converse API code samples for custom model import.

    For this post, we use the Mind2Web dataset. We have adapted code from Mind2Web for fine-tuning; you can view the adapted dataset module with the following commands (a hypothetical sketch of the dataset interface follows the listing below):

    git clone https://github.com/meta-llama/llama-cookbook && 
    cd llama-cookbook && 
    git checkout a346e19df9dd1a9cddde416167732a3edd899d09 && 
    curl -L -o patch-llama-cookbook.tar.gz https://raw.githubusercontent.com/aws/deep-learning-containers/master/examples/dlc-llama-3-finetuning-and-hosting-with-agent/patch-llama-cookbook.tar.gz && 
    tar -xzf patch-llama-cookbook.tar.gz && 
    cd patch-llama-cookbook && 
    git config --global user.email "you@example.com" && 
    git am *.patch && 
    cd .. && 
    cat recipes/quickstart/finetuning/datasets/mind2web_dataset.py
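
    The torchrun command shown earlier points --custom_dataset.file at this module, and llama-cookbook loads such a file and calls its get_custom_dataset() function. The following hypothetical skeleton only illustrates that interface; it is not the Mind2Web implementation from the patch, and the file layout and field names are made up:

    ```python
    # Hypothetical skeleton of a llama-cookbook custom dataset module. The real
    # mind2web_dataset.py from the patch additionally handles screenshots and the
    # Mind2Web action format; the JSONL files and field names below are made up.
    from datasets import load_dataset

    def get_custom_dataset(dataset_config, tokenizer, split):
        # Made-up layout: one JSONL file per split with "prompt" and "action" fields.
        dataset = load_dataset("json", data_files={split: f"{split}.jsonl"})[split]

        def tokenize(sample):
            ids = tokenizer.encode(sample["prompt"] + sample["action"])
            return {"input_ids": ids, "attention_mask": [1] * len(ids), "labels": list(ids)}

        return dataset.map(tokenize, remove_columns=dataset.column_names)
    ```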

    Deploy the fine-tuned model on Amazon Bedrock

    After you fine-tune your Meta Llama 3.2 Vision model, you have several options for deployment. This section covers one deployment method using Amazon Bedrock. With Amazon Bedrock, you can import and use your custom trained models seamlessly. Make sure your fine-tuned model has been converted to Hugging Face format and uploaded to an S3 bucket. Complete the following steps to import your fine-tuned Meta Llama 3.2 Vision model:

    1. On the Amazon Bedrock console, under Foundation models in the navigation pane, choose Imported models.
    2. Choose Import model.
    3. For Model name, enter a name for the model.
    4. For Model import source, select Amazon S3 bucket.
    5. For S3 location, enter the location of the S3 bucket containing your fine-tuned model.
    6. Configure additional model settings as needed, then import your model.

    The import might take 10–15 minutes to complete, depending on the model size.

    After you import your custom model, you can invoke it using the same Amazon Bedrock API as the default Meta Llama 3.2 Vision model. Just replace the model name with your imported model’s Amazon Resource Name (ARN). For detailed instructions, refer to Amazon Bedrock Custom Model Import.
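
    For example, a minimal boto3 sketch of calling the imported model through the Converse API might look like the following (the model ARN is a placeholder; replace it with your imported model's ARN):

    ```python
    # Minimal sketch: invoke the imported model with the Bedrock Converse API.
    import boto3

    bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-west-2")

    response = bedrock_runtime.converse(
        modelId="arn:aws:bedrock:us-west-2:111122223333:imported-model/EXAMPLE",  # placeholder ARN
        messages=[{"role": "user", "content": [{"text": "What are the steps to build a docker image?"}]}],
        inferenceConfig={"maxTokens": 512, "temperature": 0.2},
    )
    print(response["output"]["message"]["content"][0]["text"])
    ```

    If you use the lower-level InvokeModel API instead, you typically need to format the prompt yourself, as in the example that follows.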

    You can follow the prompt formats mentioned in the following GitHub repo. For example:

    What are the steps to build a docker image?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

    Run the agent workload using the hosted Amazon Bedrock model

    Running the agent workload involves using the SeeAct framework and browser automation to start an interactive session with the AI agent and perform the browser operations. We recommend completing the steps in this section on a local machine for browser access.

    Clone the SeeAct repository

    Clone the SeeAct repository. A patch applied in the next section adds example code that works with Amazon Bedrock, along with a couple of test scripts:

    git clone https://github.com/OSU-NLP-Group/SeeAct.git

    Set up SeeAct in a local runtime environment

    Complete the following steps to set up SeeAct in a local runtime environment:

    1. Create a Python virtual environment for this demo. We use Python 3.11 in the example, but you can use another Python version.
    python3.11 -m venv seacct-python-3-11
    source seacct-python-3-11/bin/activate
    2. Apply a patch to add the code changes needed for this demo:
    cd SeeAct
    curl -O https://raw.githubusercontent.com/aws/deep-learning-containers/master/examples/dlc-llama-3-finetuning-and-hosting-with-agent/patch-seeact.patch
    git checkout 2fdbf373f58a1aa5f626f7c5931fe251afc69c0a
    git apply patch-seeact.patch
    3. Run the following commands to install the SeeAct package and dependencies:
    cd SeeAct/seeact_package
    pip install .
    pip install -r requirements.txt
    pip install -U boto3
    playwright install

    Make sure you’re using the latest version of Boto3 for these steps.

    Validate the browser automation tool used by SeeAct

    We added a small Python script to verify the functionality of Playwright, the browser automation tool used by SeeAct:

    cd SeeAct/src
    python test_playwright.py

    You should see a browser launched and closed after a few seconds. You should also see a screenshot being captured in SeeAct/src/example.png showing google.com.
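
    For reference, a minimal Playwright script that performs this kind of check might look like the following (illustrative only; not necessarily the exact contents of test_playwright.py):

    ```python
    # Illustrative Playwright check: launch a browser, open google.com,
    # save a screenshot to example.png, and close the browser.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        page = browser.new_page()
        page.goto("https://www.google.com")
        page.screenshot(path="example.png")
        browser.close()
    ```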

    Test Amazon Bedrock model availability

    Modify test_bedrock.py to update MODEL_ID with your hosted Amazon Bedrock model ARN, and set up the AWS credentials for the connection:

    export AWS_ACCESS_KEY_ID="replace with your aws credential"
    export AWS_SECRET_ACCESS_KEY="replace with your aws credential"
    export AWS_SESSION_TOKEN="replace with your aws credential"

    Run the test:

    cd SeeAct
    python test_bedrock.py

    After a successful invocation, you should see a log similar to the following in your terminal:

    The image shows a dog lying down inside a black pet carrier, with a leash attached to the dog's collar.

    If the botocore.errorfactory.ModelNotReadyException error occurs, retry the command in a few minutes.
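
    If you want to script the retry, a small sketch along the lines of the earlier Converse example could look like this (the model ARN is again a placeholder):

    ```python
    # Illustrative retry loop: a freshly imported model can return
    # ModelNotReadyException on its first invocations, so wait and try again.
    import time

    import boto3

    bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-west-2")
    MODEL_ARN = "arn:aws:bedrock:us-west-2:111122223333:imported-model/EXAMPLE"  # placeholder

    for attempt in range(5):
        try:
            response = bedrock_runtime.converse(
                modelId=MODEL_ARN,
                messages=[{"role": "user", "content": [{"text": "Hello"}]}],
            )
            print(response["output"]["message"]["content"][0]["text"])
            break
        except bedrock_runtime.exceptions.ModelNotReadyException:
            print(f"Model not ready yet (attempt {attempt + 1}); retrying in 60 seconds")
            time.sleep(60)
    ```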

    Run the agent workflow

    The patched branch already adds support for BedrockEngine and SGLang for running inference with the fine-tuned Meta Llama 3.2 Vision model. The default option uses Amazon Bedrock inference.

    To run the agent workflow, update self.model in src/demo_utils/inference_engine.py (line 229) to your Amazon Bedrock model ARN. Then run the following code:

    cd SeeAct/src
    python seeact.py -c config/demo_mode.toml 

    This launches a terminal prompt like the following, where you can input the task you want the agent to perform:

    Please input a task, and press Enter. 
    Or directly press Enter to use the default task: Find pdf of paper "GPT-4V(ision) is a Generalist Web Agent, if Grounded" from arXiv
    Task: 

    As an example, we asked the agent to search for the AWS DLCs website.

    Clean up

    Use the following commands to clean up the resources you created as part of this post:

    cd Container-Root/eks/deployment/distributed-training/pytorch/pytorchjob/fsdp
    kubectl delete -f ./fsdp.yaml ## Deletes the training fsdp job
    kubectl delete -f ./etcd.yaml ## Deletes etcd
    kubectl delete -f ./model-processor.yaml ## Deletes model processing YAML
    
    cd aws-do-eks/Container-Root/eks/deployment/kubeflow/mpi-operator/
    ./remove.sh
    
    cd aws-do-eks/Container-Root/eks/deployment/kubeflow/training-operator/
    ./remove.sh
    
    ## [VOLUME GETS DELETED] - If you want to delete the FSX volume
    kubectl delete -f ./pvc.yaml ## Deletes persistent volume claim, persistent volume and actual volume

    To stop the P5 nodes and release them, complete the following steps:

    1. On the Amazon EKS console, choose Clusters in the navigation pane.
    2. Choose the cluster that contains your node group.
    3. On the cluster details page, choose the Compute tab.
    4. In the Node groups section, select your node group, then choose Edit.
    5. Set the desired size to 0.

    Conclusion

    In this post, we presented an end-to-end workflow for fine-tuning and deploying the Meta Llama 3.2 Vision model using the production-grade infrastructure of AWS. By using AWS DLCs on Amazon EKS, you can create a robust, secure, and scalable environment for model fine-tuning. The integration of advanced technologies like EFA support and FSDP training enables efficient handling of LLMs while optimizing resource usage. The deployment through Amazon Bedrock provides a streamlined path to production, and the integration with SeeAct demonstrates practical applications in web automation tasks. This solution serves as a comprehensive reference point for engineers to develop their own specialized AI applications, adapt the demonstrated approaches, and implement similar solutions for web automation, content analysis, or other domain-specific tasks requiring vision-language capabilities.

    To get started with your own implementation, refer to our GitHub repo. To learn more about AWS DLCs, see the AWS Deep Learning Containers Developer Guide. For more details about Amazon Bedrock, see Getting started with Amazon Bedrock.

    For deeper insights into related topics, refer to the following resources:

    • Scale LLMs with PyTorch 2.0 FSDP on Amazon EKS – Part 2
    • Build high-performance ML models using PyTorch 2.0 on AWS – Part 1
    • Mind2Web dataset

    Need help or have questions? Join our AWS Machine Learning community on Discord or reach out to AWS Support. You can also stay updated with the latest developments by following the AWS Machine Learning Blog.


    About the Authors

    Shantanu Tripathi is a Software Development Engineer at AWS with over 4 years of experience in building and optimizing large-scale AI/ML solutions. His experience spans developing distributed AI training libraries, creating and launching DLCs and Deep Learning AMIs, designing scalable infrastructure for high-performance AI workloads, and working on generative AI solutions. He has contributed to AWS services like Amazon SageMaker HyperPod, AWS DLCs, and DLAMIs, along with driving innovations in AI security. Outside of work, he enjoys theater and swimming.

    Junpu Fan is a Senior Software Development Engineer at Amazon Web Services, specializing in AI/ML Infrastructure. With over 5 years of experience in the field, Junpu has developed extensive expertise across the full cycle of AI/ML workflows. His work focuses on building robust systems that power ML applications at scale, helping organizations transform their data into actionable insights.

    Harish Rao is a Senior Solutions Architect at AWS, specializing in large-scale distributed AI training and inference. He helps customers harness the power of AI to drive innovation and solve complex challenges. Outside of work, Harish embraces an active lifestyle, enjoying the tranquility of hiking, the intensity of racquetball, and the mental clarity of mindfulness practices.

    Arindam Paul is a Sr. Product Manager in the SageMaker AI team at AWS, responsible for Deep Learning workloads on SageMaker, EC2, EKS, and ECS. He is passionate about using AI to solve customer problems. In his spare time, he enjoys working out and gardening.
