
    Build a serverless audio summarization solution with Amazon Bedrock and Whisper

    June 6, 2025

    Recordings of business meetings, interviews, and customer interactions have become essential for preserving important information. However, transcribing and summarizing these recordings manually is often time-consuming and labor-intensive. With the progress in generative AI and automatic speech recognition (ASR), automated solutions have emerged to make this process faster and more efficient.

    Protecting personally identifiable information (PII) is a vital aspect of data security, driven by both ethical responsibilities and legal requirements. In this post, we demonstrate how to use the OpenAI Whisper foundation model (FM) Whisper Large V3 Turbo, available in Amazon Bedrock Marketplace (which offers access to over 140 models through a dedicated offering), to produce near real-time transcription. These transcriptions are then processed by Amazon Bedrock for summarization and redaction of sensitive information.

    Amazon Bedrock is a fully managed service that offers a choice of high-performing FMs from leading AI companies like AI21 Labs, Anthropic, Cohere, DeepSeek, Luma, Meta, Mistral AI, poolside (coming soon), Stability AI, and Amazon Nova through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. Additionally, you can use Amazon Bedrock Guardrails to automatically redact sensitive information, including PII, from the transcription summaries to support compliance and data protection needs.

    In this post, we walk through an end-to-end architecture that combines a React-based frontend with Amazon Bedrock, AWS Lambda, and AWS Step Functions to orchestrate the workflow, facilitating seamless integration and processing.

    Solution overview

    The solution highlights the power of integrating serverless technologies with generative AI to automate and scale content processing workflows. The user journey begins with uploading a recording through a React frontend application, hosted on Amazon CloudFront and backed by Amazon Simple Storage Service (Amazon S3) and Amazon API Gateway. When the file is uploaded, it triggers a Step Functions state machine that orchestrates the core processing steps, using AI models and Lambda functions for seamless data flow and transformation. The following diagram illustrates the solution architecture.

    AWS serverless architecture for audio processing: CloudFront to S3, EventBridge trigger, Lambda and Bedrock for transcription and summarization

    The workflow consists of the following steps:

    1. The React application is hosted in an S3 bucket and served to users through CloudFront for fast, global access. API Gateway handles interactions between the frontend and backend services.
    2. Users upload audio or video files directly from the app. These recordings are stored in a designated S3 bucket for processing.
    3. An Amazon EventBridge rule detects the S3 upload event and triggers the Step Functions state machine, initiating the AI-powered processing pipeline.
    4. The state machine performs audio transcription, summarization, and redaction by orchestrating multiple Amazon Bedrock models in sequence. It uses Whisper for transcription, Claude for summarization, and Guardrails to redact sensitive data.
    5. The redacted summary is returned to the frontend application and displayed to the user.
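
    To make step 3 concrete, the following is a minimal boto3 sketch of an EventBridge rule that routes S3 Object Created events to the state machine. The actual rule is created by the AWS CDK stack described later; the rule name, bucket name, and ARNs below are placeholders.

    import json
    import boto3

    events = boto3.client("events")

    # Placeholder values; the CDK stack creates the real rule, role, and state machine
    UPLOAD_BUCKET = "audio-upload-bucket"
    STATE_MACHINE_ARN = "arn:aws:states:us-east-1:111122223333:stateMachine:AudioSummarization"

    # Match S3 "Object Created" events for the upload bucket
    events.put_rule(
        Name="audio-upload-rule",
        EventPattern=json.dumps({
            "source": ["aws.s3"],
            "detail-type": ["Object Created"],
            "detail": {"bucket": {"name": [UPLOAD_BUCKET]}},
        }),
    )

    # Route matching events to the Step Functions state machine
    events.put_targets(
        Rule="audio-upload-rule",
        Targets=[{
            "Id": "StartAudioSummarization",
            "Arn": STATE_MACHINE_ARN,
            "RoleArn": "arn:aws:iam::111122223333:role/EventBridgeInvokeStepFunctionsRole",
        }],
    )

    Note that the upload bucket must have EventBridge notifications enabled for S3 object events to reach the rule.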

    The following diagram illustrates the state machine workflow.

    AWS Step Functions state machine for audio processing: Whisper transcription, speaker identification, and Bedrock summary tasks

    The Step Functions state machine orchestrates a series of tasks to transcribe, summarize, and redact sensitive information from uploaded audio/video recordings:

    1. A Lambda function is triggered to gather input details (for example, Amazon S3 object path, metadata) and prepare the payload for transcription.
    2. The payload is sent to the OpenAI Whisper Large V3 Turbo model through the Amazon Bedrock Marketplace to generate a near real-time transcription of the recording.
    3. The raw transcript is passed to Anthropic’s Claude 3.5 Sonnet through Amazon Bedrock, which produces a concise and coherent summary of the conversation or content.
    4. A second Lambda function validates and forwards the summary to the redaction step.
    5. The summary is processed through Amazon Bedrock Guardrails, which automatically redacts PII and other sensitive data.
    6. The redacted summary is stored or returned to the frontend application through an API, where it is displayed to the user.
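
    For testing, you can also start the state machine directly with the AWS SDK instead of uploading a file through the frontend. The following is a minimal sketch; the state machine ARN, bucket, and object key are placeholders for your own values.

    import json
    import boto3

    sfn = boto3.client("stepfunctions")

    # Start the audio summarization pipeline for an object already in the uploads bucket
    response = sfn.start_execution(
        stateMachineArn="arn:aws:states:us-east-1:111122223333:stateMachine:AudioSummarization",
        input=json.dumps({
            "bucket": "audio-upload-bucket",
            "key": "uploads/team-meeting.mp3",
        }),
    )
    print(response["executionArn"])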

    Prerequisites

    Before you start, make sure that you have the following prerequisites in place:

    • Before using Amazon Bedrock models, you must request access, which is a one-time setup step. For this solution, verify that access to Anthropic’s Claude 3.5 Sonnet model is enabled in your Amazon Bedrock account. For instructions, see Access Amazon Bedrock foundation models.
    • Set up a guardrail to enable PII redaction by configuring filters that block or mask sensitive information. For guidance on configuring filters for additional use cases, see Remove PII from conversations by using sensitive information filters.
    • Deploy the Whisper Large V3 Turbo model within the Amazon Bedrock Marketplace. This post also offers step-by-step guidance for the deployment.
    • The AWS Command Line Interface (AWS CLI) should be installed and configured. For instructions, see Installing or updating to the latest version of the AWS CLI.
    • Node.js 14.x or later should be installed.
    • The AWS CDK CLI should be installed.
    • Python 3.8 or later should be installed.
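
    As a quick sanity check for the first prerequisite, you can list the Anthropic models available in your Region with the AWS SDK; note that model access itself is still requested through the Amazon Bedrock console.

    import boto3

    bedrock = boto3.client("bedrock")

    # List Anthropic models available in this Region; access to Claude 3.5 Sonnet
    # must additionally be granted on the Model access page of the Bedrock console
    models = bedrock.list_foundation_models(byProvider="anthropic")
    for model in models["modelSummaries"]:
        print(model["modelId"])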

    Create a guardrail in the Amazon Bedrock console

    For instructions for creating guardrails in Amazon Bedrock, refer to Create a guardrail. For details on detecting and redacting PII, see Remove PII from conversations by using sensitive information filters. Configure your guardrail with the following key settings:

    • Enable PII detection and handling
    • Set PII action to Redact
    • Add the relevant PII types, such as:
      • Names and identities
      • Phone numbers
      • Email addresses
      • Physical addresses
      • Financial information
      • Other sensitive personal information

    After you deploy the guardrail, note its Amazon Resource Name (ARN); you will use this value when deploying the solution.
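
    If you prefer to script this step, guardrails can also be created with the AWS SDK. The following sketch configures PII entities with the ANONYMIZE (redact) action; the guardrail name, messages, and entity list are illustrative and should be adjusted to your use case.

    import boto3

    bedrock = boto3.client("bedrock")

    # Create a guardrail that redacts (anonymizes) common PII entity types
    response = bedrock.create_guardrail(
        name="audio-summary-pii-guardrail",
        blockedInputMessaging="Sorry, this input cannot be processed.",
        blockedOutputsMessaging="Sorry, this output cannot be returned.",
        sensitiveInformationPolicyConfig={
            "piiEntitiesConfig": [
                {"type": "NAME", "action": "ANONYMIZE"},
                {"type": "PHONE", "action": "ANONYMIZE"},
                {"type": "EMAIL", "action": "ANONYMIZE"},
                {"type": "ADDRESS", "action": "ANONYMIZE"},
            ]
        },
    )
    print(response["guardrailArn"])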

    Deploy the Whisper model

    Complete the following steps to deploy the Whisper Large V3 Turbo model:

    1. On the Amazon Bedrock console, choose Model catalog under Foundation models in the navigation pane.
    2. Search for and choose Whisper Large V3 Turbo.
    3. On the options menu (three dots), choose Deploy.

    Amazon Bedrock console displaying filtered model catalog with Whisper Large V3 Turbo speech recognition model and deployment option

    4. Modify the endpoint name, number of instances, and instance type to suit your specific use case. For this post, we use the default settings.
    5. Modify the Advanced settings section to suit your use case. For this post, we use the default settings.
    6. Choose Deploy.

    This creates a new AWS Identity and Access Management (IAM) role and deploys the model.

    You can choose Marketplace deployments in the navigation pane, and in the Managed deployments section, you can see the endpoint status as Creating. Wait for the deployment to finish and the status to change to In Service, then copy the endpoint name; you will use this value when deploying the solution infrastructure.

    Amazon Bedrock console: "How it works" overview, managed deployments table with Whisper model endpoint in service
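
    Because a Bedrock Marketplace deployment is backed by a SageMaker endpoint (the transcription code later in this post invokes it through the SageMaker runtime), you can also poll its status from the AWS SDK. The endpoint name below is a placeholder for your own deployment.

    import boto3

    sagemaker = boto3.client("sagemaker")

    # Check the status of the endpoint backing the Whisper Marketplace deployment
    status = sagemaker.describe_endpoint(
        EndpointName="whisper-large-v3-turbo-endpoint"
    )["EndpointStatus"]
    print(status)  # Expect "InService" once the deployment completes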

    Deploy the solution infrastructure

    In the GitHub repo, follow the instructions in the README file to clone the repository, then deploy the frontend and backend infrastructure.

    We use the AWS Cloud Development Kit (AWS CDK) to define and deploy the infrastructure. The AWS CDK code deploys the following resources:

    • React frontend application
    • Backend infrastructure
    • S3 buckets for storing uploads and processed results
    • Step Functions state machine with Lambda functions for audio processing and PII redaction
    • API Gateway endpoints for handling requests
    • IAM roles and policies for secure access
    • CloudFront distribution for hosting the frontend
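
    To illustrate how the orchestration is wired, the following is a simplified AWS CDK (Python) sketch of the state machine definition. The construct IDs, handlers, and asset paths are illustrative and do not necessarily match the repository.

    from aws_cdk import Stack, aws_lambda as _lambda, aws_stepfunctions as sfn, aws_stepfunctions_tasks as tasks
    from constructs import Construct

    class AudioSummarizationStack(Stack):
        def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
            super().__init__(scope, construct_id, **kwargs)

            # Placeholder Lambda functions for two stages of the pipeline
            transcribe_fn = _lambda.Function(
                self, "WhisperTranscriptionFunction",
                runtime=_lambda.Runtime.PYTHON_3_11,
                handler="transcribe.handler",
                code=_lambda.Code.from_asset("lambda/transcribe"),
            )
            summarize_fn = _lambda.Function(
                self, "BedrockSummaryFunction",
                runtime=_lambda.Runtime.PYTHON_3_11,
                handler="summarize.handler",
                code=_lambda.Code.from_asset("lambda/summarize"),
            )

            # Chain the Lambda invocations into a Step Functions state machine
            definition = tasks.LambdaInvoke(self, "TranscribeAudio", lambda_function=transcribe_fn).next(
                tasks.LambdaInvoke(self, "GenerateSummary", lambda_function=summarize_fn)
            )
            sfn.StateMachine(
                self, "AudioSummarizationStateMachine",
                definition_body=sfn.DefinitionBody.from_chainable(definition),
            )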

    Implementation deep dive

    The backend is composed of a sequence of Lambda functions, each handling a specific stage of the audio processing pipeline:

    • Upload handler – Receives audio files and stores them in Amazon S3
    • Transcription with Whisper – Converts speech to text using the Whisper model
    • Speaker detection – Differentiates and labels individual speakers within the audio
    • Summarization using Amazon Bedrock – Extracts and summarizes key points from the transcript
    • PII redaction – Uses Amazon Bedrock Guardrails to remove sensitive information for privacy compliance

    Let’s examine some of the key components:
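
    As an illustration of the upload path (the handler in the repository may differ), a common pattern is for an upload Lambda function to return a presigned URL that the React app uses to PUT the recording into the uploads bucket:

    import json
    import os
    import boto3

    s3 = boto3.client("s3")

    def handler(event, context):
        # File name supplied by the frontend; bucket name injected via an environment variable
        file_name = json.loads(event["body"])["fileName"]
        upload_url = s3.generate_presigned_url(
            "put_object",
            Params={"Bucket": os.environ["UPLOAD_BUCKET"], "Key": f"uploads/{file_name}"},
            ExpiresIn=300,
        )
        return {"statusCode": 200, "body": json.dumps({"uploadUrl": upload_url})}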

    The transcription Lambda function uses the Whisper model to convert audio files to text:

    import json
    import boto3

    # SageMaker runtime client used to invoke the Whisper endpoint deployed through Amazon Bedrock Marketplace
    sagemaker_runtime = boto3.client('sagemaker-runtime')

    def transcribe_with_whisper(audio_chunk, endpoint_name):
        # Convert audio to hex string format
        hex_audio = audio_chunk.hex()

        # Create payload for Whisper model
        payload = {
            "audio_input": hex_audio,
            "language": "english",
            "task": "transcribe",
            "top_p": 0.9
        }

        # Invoke the SageMaker endpoint running Whisper
        response = sagemaker_runtime.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType='application/json',
            Body=json.dumps(payload)
        )

        # Parse the transcription response
        response_body = json.loads(response['Body'].read().decode('utf-8'))
        transcription_text = response_body['text']

        return transcription_text
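
    The speaker detection function is not reproduced in this post; a minimal sketch of that stage, assuming the raw transcript is passed in and Claude 3.5 Sonnet labels the speaker turns, could look like the following:

    import json
    import boto3

    # bedrock-runtime client, as in the summarization function that follows
    bedrock_runtime = boto3.client('bedrock-runtime')

    def identify_speakers(transcription):
        # Ask Claude to label each turn in the raw transcript with a speaker tag
        prompt = (
            "Label each turn in the following transcript with a speaker "
            "(Speaker 1, Speaker 2, ...), keeping the wording unchanged:\n\n"
            f"{transcription}"
        )
        response = bedrock_runtime.invoke_model(
            modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
            body=json.dumps({
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": 4096,
                "messages": [{"role": "user", "content": prompt}]
            })
        )
        result = json.loads(response['body'].read())
        return result['content'][0]['text']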
    

    We use Amazon Bedrock to generate concise summaries from the transcriptions:

    import json
    import boto3

    # Amazon Bedrock runtime client used to invoke Claude 3.5 Sonnet
    bedrock_runtime = boto3.client('bedrock-runtime')

    def generate_summary(transcription):
        # Format the prompt with the transcription
        prompt = f"{transcription}\n\nGive me the summary, speakers, key discussions, and action items with owners"

        # Call Bedrock for summarization (Claude 3 models use the Anthropic Messages API format)
        response = bedrock_runtime.invoke_model(
            modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
            body=json.dumps({
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": 4096,
                "temperature": 0.7,
                "top_p": 0.9,
                "messages": [
                    {"role": "user", "content": prompt}
                ]
            })
        )

        # Extract and return the summary text
        result = json.loads(response['body'].read())
        return result['content'][0]['text']

    A critical component of our solution is the automatic redaction of PII. We implemented this using Amazon Bedrock Guardrails to support compliance with privacy regulations:

    def apply_guardrail(bedrock_runtime, content, guardrail_id):
        # Format content according to API requirements
        formatted_content = [{"text": {"text": content}}]

        # Call the guardrail API
        response = bedrock_runtime.apply_guardrail(
            guardrailIdentifier=guardrail_id,
            guardrailVersion="DRAFT",
            source="OUTPUT",  # Using OUTPUT parameter for proper flow
            content=formatted_content
        )

        # Extract redacted text from response
        if 'action' in response and response['action'] == 'GUARDRAIL_INTERVENED':
            if len(response['outputs']) > 0:
                output = response['outputs'][0]
                if 'text' in output and isinstance(output['text'], str):
                    return output['text']

        # Return original content if redaction fails
        return content

    When PII is detected, it’s replaced with type indicators (for example, {PHONE} or {EMAIL}), making sure that summaries remain informative while protecting sensitive data.

    To manage the complex processing pipeline, we use Step Functions to orchestrate the Lambda functions:

    {
      "Comment": "Audio Summarization Workflow",
      "StartAt": "TranscribeAudio",
      "States": {
        "TranscribeAudio": {
          "Type": "Task",
          "Resource": "arn:aws:states:::lambda:invoke",
          "Parameters": {
            "FunctionName": "WhisperTranscriptionFunction",
            "Payload": {
              "bucket.$": "$.bucket",
              "key.$": "$.key"
            }
          },
          "Next": "IdentifySpeakers"
        },
        "IdentifySpeakers": {
          "Type": "Task",
          "Resource": "arn:aws:states:::lambda:invoke",
          "Parameters": {
            "FunctionName": "SpeakerIdentificationFunction",
            "Payload": {
              "Transcription.$": "$.Payload"
            }
          },
          "Next": "GenerateSummary"
        },
        "GenerateSummary": {
          "Type": "Task",
          "Resource": "arn:aws:states:::lambda:invoke",
          "Parameters": {
            "FunctionName": "BedrockSummaryFunction",
            "Payload": {
              "SpeakerIdentification.$": "$.Payload"
            }
          },
          "End": true
        }
      }
    }

    This workflow makes sure each step completes successfully before proceeding to the next, and automatic error handling and retry logic can be built into each task state.
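
    For example, retry and catch logic can be attached to any of the task states. The snippet below is illustrative only; the HandleFailure state it references would be defined elsewhere in the state machine definition.

    "TranscribeAudio": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "WhisperTranscriptionFunction",
        "Payload": { "bucket.$": "$.bucket", "key.$": "$.key" }
      },
      "Retry": [
        {
          "ErrorEquals": ["Lambda.ServiceException", "Lambda.TooManyRequestsException"],
          "IntervalSeconds": 2,
          "MaxAttempts": 3,
          "BackoffRate": 2.0
        }
      ],
      "Catch": [
        { "ErrorEquals": ["States.ALL"], "Next": "HandleFailure" }
      ],
      "Next": "IdentifySpeakers"
    }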

    Test the solution

    After you have successfully completed the deployment, you can use the CloudFront URL to test the solution functionality.

    Audio/video upload and summary interface with completed file upload for team meeting recording analysis

    Security considerations

    Security is a critical aspect of this solution, and we’ve implemented several best practices to support data protection and compliance:

    • Sensitive data redaction – Automatically redact PII to protect user privacy.
    • Fine-grained IAM permissions – Apply the principle of least privilege across AWS services and resources.
    • Amazon S3 access controls – Use strict bucket policies to limit access to authorized users and roles.
    • API security – Secure API endpoints using Amazon Cognito for user authentication (optional but recommended).
    • CloudFront protection – Enforce HTTPS and apply modern TLS protocols to facilitate secure content delivery.
    • Amazon Bedrock data security – Amazon Bedrock (including Amazon Bedrock Marketplace) protects customer data and does not send data to providers or train using customer data. This makes sure your proprietary information remains secure when using AI capabilities.
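
    As one concrete example of these controls, the upload bucket can be locked down at creation time in the AWS CDK stack. This is a sketch; the construct ID is illustrative and self refers to the stack, as in the earlier CDK sketch.

    from aws_cdk import aws_s3 as s3

    uploads_bucket = s3.Bucket(
        self, "UploadsBucket",
        enforce_ssl=True,                                    # Deny non-HTTPS requests via bucket policy
        block_public_access=s3.BlockPublicAccess.BLOCK_ALL,  # Block all public access
        encryption=s3.BucketEncryption.S3_MANAGED,           # Server-side encryption at rest
    )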

    Clean up

    To prevent unnecessary charges, make sure to delete the resources provisioned for this solution when you’re done:

    1. Delete the Amazon Bedrock guardrail:
      1. On the Amazon Bedrock console, in the navigation menu, choose Guardrails.
      2. Choose your guardrail, then choose Delete.
    2. Delete the Whisper Large V3 Turbo model deployed through the Amazon Bedrock Marketplace:
      1. On the Amazon Bedrock console, choose Marketplace deployments in the navigation pane.
      2. In the Managed deployments section, select the deployed endpoint and choose Delete.
    3. Delete the AWS CDK stack by running the command cdk destroy, which deletes the AWS infrastructure.
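
    If you prefer to script the first two cleanup steps, the equivalent AWS SDK calls are roughly as follows; the identifiers are placeholders for your own resources, and the console steps above remain the authoritative path.

    import boto3

    # Delete the guardrail created earlier
    boto3.client("bedrock").delete_guardrail(guardrailIdentifier="audio-summary-pii-guardrail")

    # Delete the SageMaker endpoint backing the Whisper Marketplace deployment
    boto3.client("sagemaker").delete_endpoint(EndpointName="whisper-large-v3-turbo-endpoint")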

    Conclusion

    This serverless audio summarization solution demonstrates the benefits of combining AWS services to create a sophisticated, secure, and scalable application. By using Amazon Bedrock for AI capabilities, Lambda for serverless processing, and CloudFront for content delivery, we’ve built a solution that can handle large volumes of audio content efficiently while helping you align with security best practices.

    The automatic PII redaction feature supports compliance with privacy regulations, making this solution well-suited for regulated industries such as healthcare, finance, and legal services where data security is paramount. To get started, deploy this architecture within your AWS environment to accelerate your audio processing workflows.


    About the Authors

    Kaiyin Hu is a Senior Solutions Architect for Strategic Accounts at Amazon Web Services, with years of experience across enterprises, startups, and professional services. Currently, she helps customers build cloud solutions and drives GenAI adoption in the cloud. Previously, Kaiyin worked in the Smart Home domain, assisting customers in integrating voice and IoT technologies.

    Sid Vantair is a Solutions Architect with AWS covering Strategic accounts. He thrives on resolving complex technical issues to overcome customer hurdles. Outside of work, he cherishes spending time with his family and fostering inquisitiveness in his children.
