
    Automate the creation of handout notes using Amazon Bedrock Data Automation

    July 31, 2025

    Organizations across various sectors face significant challenges when converting meeting recordings or recorded presentations into structured documentation. The process of creating handouts from presentations requires considerable manual effort, such as reviewing recordings to identify slide transitions, transcribing spoken content, capturing and organizing screenshots, synchronizing visual elements with speaker notes, and formatting content. These challenges impact productivity and scalability, especially when dealing with multiple presentation recordings, conference sessions, training materials, and educational content.

    In this post, we show how you can build an automated, serverless solution to transform webinar recordings into comprehensive handouts using Amazon Bedrock Data Automation for video analysis. We walk you through the use of Amazon Bedrock Data Automation to transcribe the audio and detect slide changes, as well as the use of Amazon Bedrock foundation models (FMs) for transcription refinement, combined with custom AWS Lambda functions orchestrated by AWS Step Functions. Through implementation details, architectural patterns, and code, you will learn how to build a workflow that automates the handout creation process.

    Amazon Bedrock Data Automation

    Amazon Bedrock Data Automation uses generative AI to automate the transformation of multimodal data (such as documents, images, video, and audio) into customizable structured formats. Examples of structured formats include summaries of scenes in a video, flags for unsafe or explicit content in text and images, or content organized by advertisements or brands. The solution presented in this post uses Amazon Bedrock Data Automation to extract audio segments and detect the distinct shots in videos.

    Solution overview

    Our solution uses a serverless architecture orchestrated by Step Functions to process presentation recordings into comprehensive handouts. The workflow consists of the following steps:

    1. The workflow begins when a video is uploaded to Amazon Simple Storage Service (Amazon S3), which triggers an event notification through Amazon EventBridge rules that initiates our video processing workflow in Step Functions.
    2. After the workflow is triggered, Amazon Bedrock Data Automation initiates a video transformation job to identify the different shots in the video; in our case, a new shot corresponds to a change of slides. The workflow moves into a waiting state and checks the transformation job’s progress. If the job is still in progress, the workflow returns to the waiting state. When the job is complete, the workflow continues, having extracted both the visual shots and the spoken content.
    3. These visual shots and spoken content feed into a synchronization step. In this Lambda function, we use the output of the Amazon Bedrock Data Automation job to match the spoken content to the corresponding shots based on their timestamps.
    4. After the function has matched the spoken content to the visual shots, the workflow moves into a parallel state. One branch of this state generates screenshots: we use an FFmpeg-enabled Lambda function to create an image for each identified video shot.
    5. The other branch refines our transcriptions: Amazon Bedrock processes and improves each raw transcription section through a Map state. This helps us remove speech disfluencies and improve the sentence structure.
    6. Lastly, after the screenshots and refined transcript are created, the workflow uses a Lambda function to create the handouts. We use the python-pptx library, which generates the final presentation with synchronized content. The final handouts are stored in Amazon S3 for distribution.

    The following diagram illustrates this workflow.

    AWS Step Functions workflow diagram for data automation process

    If you want to try out this solution, we have created an AWS Cloud Development Kit (AWS CDK) stack available in the accompanying GitHub repo that you can deploy in your account. It deploys the Step Functions state machine to orchestrate the creation of handout notes from the presentation video recording. It also provides you with a sample video to test out the results.

    To deploy and test the solution in your own account, follow the instructions in the GitHub repository’s README file. The following sections describe in more detail the technical implementation details of this solution.

    Video upload and initial processing

    The workflow begins with Amazon S3, which serves as the entry point for our video processing pipeline. When a video is uploaded to a dedicated S3 bucket, it triggers an event notification that, through EventBridge rules, initiates our Step Functions workflow.
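    As a sketch, this trigger could be wired up in the AWS CDK (Python) roughly as follows; the construct names are illustrative, the code assumes it runs inside a Stack class with the state machine defined elsewhere, and the actual stack in the GitHub repository may differ:

    from aws_cdk import aws_s3 as s3, aws_events as events, aws_events_targets as targets

    # S3 bucket that emits object-level events to EventBridge
    video_bucket = s3.Bucket(self, "VideoUploadBucket", event_bridge_enabled=True)

    # Rule that starts the Step Functions state machine on each upload
    upload_rule = events.Rule(
        self, "VideoUploadedRule",
        event_pattern=events.EventPattern(
            source=["aws.s3"],
            detail_type=["Object Created"],
            detail={"bucket": {"name": [video_bucket.bucket_name]}},
        ),
    )
    upload_rule.add_target(targets.SfnStateMachine(handout_state_machine))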

    Shot detection and transcription using Amazon Bedrock Data Automation

    This step uses Amazon Bedrock Data Automation to detect slide transitions and create video transcriptions. To integrate this as part of the workflow, you must create an Amazon Bedrock Data Automation project. A project is a grouping of output configurations. Each project can contain standard output configurations as well as custom output blueprints for documents, images, video, and audio. The project has already been created as part of the AWS CDK stack. After you set up your project, you can process content using the InvokeDataAutomationAsync API. In our solution, we use the Step Functions service integration to execute this API call and start the asynchronous processing job. A job ID is returned for tracking the process.
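    The Step Functions service integration makes this call directly from the state machine, but the equivalent boto3 call looks roughly like the following sketch; the parameter names come from the bedrock-data-automation-runtime API, and the ARNs and S3 URIs are placeholders:

    import boto3

    bda_runtime = boto3.client("bedrock-data-automation-runtime")

    response = bda_runtime.invoke_data_automation_async(
        inputConfiguration={"s3Uri": "s3://input-bucket/webinar.mp4"},
        outputConfiguration={"s3Uri": "s3://output-bucket/bda-results/"},
        dataAutomationConfiguration={
            "dataAutomationProjectArn": "arn:aws:bedrock:<region>:<account>:data-automation-project/<id>",
        },
        dataAutomationProfileArn="arn:aws:bedrock:<region>:<account>:data-automation-profile/<name>",
    )

    # The invocation ARN identifies the asynchronous job for status polling
    invocation_arn = response["invocationArn"]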

    The workflow must now check the status of the processing job before continuing with the handout creation process. This is done by polling Amazon Bedrock Data Automation for the job status using the GetDataAutomationStatus API on a regular basis. Using a combination of the Step Functions Wait and Choice states, we can ask the workflow to poll the API on a fixed interval. This not only gives you the ability to customize the interval depending on your needs, but it also helps you control the workflow costs, because every state transition is billed in Standard workflows, which this solution uses.

    When the GetDataAutomationStatus API reports that the job has succeeded, the loop exits and the workflow continues to the next step, which matches the transcripts to the visual shots.
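    In the workflow itself, this loop is built from Wait and Choice states rather than code, but expressed in Python the status check looks roughly like this sketch (the status strings shown are assumptions; check the GetDataAutomationStatus documentation for the exact values):

    import time

    def wait_for_bda_job(bda_runtime, invocation_arn, poll_seconds=30):
        """Poll the job status at a fixed interval until it leaves the in-progress states."""
        while True:
            result = bda_runtime.get_data_automation_status(invocationArn=invocation_arn)
            if result["status"] not in ("Created", "InProgress"):
                # On success, the result also points to the output location in S3
                return result
            time.sleep(poll_seconds)  # mirrors the Step Functions Wait state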

    Matching audio segments with corresponding shots

    To create comprehensive handouts, you must establish a mapping between the visual shots and their corresponding audio segments. This mapping is crucial to make sure the final handouts accurately represent both the visual content and the spoken narrative of the presentation.

    A shot represents a series of interrelated consecutive frames captured during the presentation, typically indicating a distinct visual state. In our presentation context, a shot corresponds to either a new slide or a significant slide animation that adds or modifies content.

    An audio segment is a specific portion of an audio recording that contains uninterrupted spoken language, with minimal pauses or breaks. This segment captures a natural flow of speech. The Amazon Bedrock Data Automation output provides an audio_segments array, with each segment containing precise timing information such as the start and end time of each segment. This allows for accurate synchronization with the visual shots.

    The synchronization between shots and audio segments is critical for creating accurate handouts that preserve the presentation’s narrative flow. To achieve this, we implement a Lambda function that manages the matching process in three steps:

    1. The function retrieves the processing results from Amazon S3, which contains both the visual shots and audio segments.
    2. It creates structured JSON arrays from these components, preparing them for the matching algorithm.
    3. It executes a matching algorithm that analyzes the different timestamps of the audio segments and the shots, and matches them based on these timestamps. This algorithm also considers timestamp overlaps between shots and audio segments.

    For each shot, the function examines the audio segments and identifies those whose timestamps overlap with the shot’s duration, making sure the relevant spoken content is associated with its corresponding slide in the final handouts. The function returns the matched results directly to the Step Functions workflow, where they serve as input for the next stage, in which Amazon Bedrock refines the transcribed content and screenshots are generated in parallel.
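    A minimal sketch of this overlap matching, assuming the shots and audio segments carry millisecond start and end fields as in the Amazon Bedrock Data Automation video output (the exact key names may differ in your results):

    def match_audio_to_shots(shots, audio_segments):
        """Attach to each shot the audio segments whose time range overlaps it."""
        matched = []
        for shot in shots:
            overlapping = [
                seg["text"]
                for seg in audio_segments
                # Intervals overlap when each one starts before the other ends
                if seg["start_timestamp_millis"] < shot["end_timestamp_millis"]
                and shot["start_timestamp_millis"] < seg["end_timestamp_millis"]
            ]
            matched.append({
                "start_timestamp_millis": shot["start_timestamp_millis"],
                "end_timestamp_millis": shot["end_timestamp_millis"],
                "transcript": " ".join(overlapping),
            })
        return matched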

    Screenshot generation

    After you get the timestamps of each shot and associated audio segment, you can capture the slides of the presentation to create comprehensive handouts. Each detected shot from Amazon Bedrock Data Automation represents a distinct visual state in the presentation—typically a new slide or significant content change. By generating screenshots at these precise moments, we make sure our handouts accurately represent the visual flow of the original presentation.

    This is done with a Lambda function using the ffmpeg-python library. This library acts as a Python binding for the FFmpeg media framework, so you can run FFmpeg commands using Python methods. In our case, we can extract frames from the video at the specific timestamps identified by Amazon Bedrock Data Automation. The screenshots are stored in an S3 bucket to be used in creating the handouts, as shown in the following code. To use ffmpeg-python in Lambda, we created a Lambda ZIP deployment package containing the required dependencies. Instructions on how to create the ZIP file can be found in our GitHub repository.

    The following code shows how a screenshot is taken using ffmpeg-python. You can view the full Lambda code on GitHub.

    import ffmpeg

    # Take a screenshot at a specific timestamp
    ffmpeg.input(video_path, ss=timestamp).output(screenshot_path, vframes=1).run()
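    Building on that call, a simplified version of the screenshot step might loop over the detected shot start times and upload each frame to Amazon S3; the helper, bucket, and key names here are illustrative:

    import boto3
    import ffmpeg

    s3 = boto3.client("s3")

    def capture_shot_screenshots(video_path, shot_start_times, bucket):
        """Extract one frame at the start of each shot and upload it to S3."""
        keys = []
        for i, timestamp in enumerate(shot_start_times):
            screenshot_path = f"/tmp/shot_{i:03d}.png"
            # Seek to the shot's start time and grab a single frame
            ffmpeg.input(video_path, ss=timestamp).output(screenshot_path, vframes=1).run()
            key = f"screenshots/shot_{i:03d}.png"
            s3.upload_file(screenshot_path, bucket, key)
            keys.append(key)
        return keys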

    Transcript refinement with Amazon Bedrock

    In parallel with the screenshot generation, we refine the transcript using a large language model (LLM). We do this to improve the quality of the transcript and filter out errors and speech disfluencies. This process uses an Amazon Bedrock model to enhance the quality of the matched transcription segments while maintaining content accuracy. We use a Lambda function that integrates with Amazon Bedrock through the Python Boto3 client, using a prompt to guide the model’s refinement process. The function can then process each transcript segment, instructing the model to do the following:

    • Fix typos and grammatical errors
    • Remove speech disfluencies (such as “uh” and “um”)
    • Maintain the original meaning and technical accuracy
    • Preserve the context of the presentation

    In our solution, we used the following prompt with three example inputs and outputs:

    prompt = '''This is the result of a transcription. 
    I want you to look at this audio segment and fix the typos and mistakes present. 
    Feel free to use the context of the rest of the transcript to refine (but don't leave out any info). 
    Leave out parts where the speaker misspoke. 
    Make sure to also remove words like "uh" or "um". 
    Only make changes to the info or sentence structure when there are mistakes. 
    Only give back the refined transcript as output, don't add anything else or any context or title. 
    If there are no typos or mistakes, return the original object input. 
    Do not explain why you have or have not made any changes; I just want the JSON object. 
    
    These are examples: 
    Input: <an example-input> 
    Output: <an example-output>
    
    Input: <an example-input> 
    Output: <an example-output>
    
    Input: <an example-input> 
    Output: <an example-output>
    
    Here is the object: ''' + text

    The following is an example input and output:

    Input: Yeah. Um, so let's talk a little bit about recovering from a ransomware attack, right?
    
    Output: Yes, let's talk a little bit about recovering from a ransomware attack.
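    To give an idea of the integration, the following sketch sends the prompt to Amazon Bedrock using the boto3 Converse API; the model ID is a placeholder, and the actual solution may use a different model or the InvokeModel API:

    import boto3

    bedrock_runtime = boto3.client("bedrock-runtime")

    PROMPT_TEMPLATE = "This is the result of a transcription. ..."  # full prompt shown above

    def refine_segment(segment_text, model_id="anthropic.claude-3-haiku-20240307-v1:0"):
        """Ask the model to clean up one transcript segment."""
        response = bedrock_runtime.converse(
            modelId=model_id,
            messages=[{"role": "user", "content": [{"text": PROMPT_TEMPLATE + segment_text}]}],
            inferenceConfig={"maxTokens": 1024, "temperature": 0.2},
        )
        return response["output"]["message"]["content"][0]["text"]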

    To optimize processing speed while adhering to the maximum token limits of the Amazon Bedrock InvokeModel API, we use the Step Functions Map state. This enables parallel processing of multiple transcriptions, each corresponding to a separate video segment. Because these transcriptions must be handled individually, the Map state efficiently distributes the workload. Additionally, it reduces operational overhead by managing the integration: taking an array as input, passing each element to the Lambda function, and automatically reconstructing the array upon completion. The Map state returns the refined transcript directly to the Step Functions workflow, maintaining the structure of the matched segments while providing cleaner, more professional text content for the final handout generation.
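    In the AWS CDK (Python), the Map state wiring might look roughly like the following sketch; the state names, the items path, and the concurrency limit are illustrative, and the refinement Lambda function is assumed to be defined elsewhere in the stack:

    from aws_cdk import aws_stepfunctions as sfn, aws_stepfunctions_tasks as tasks

    # Fan out: run the refinement Lambda once per matched transcript segment
    refine_map = sfn.Map(
        self, "RefineTranscriptSegments",
        items_path="$.matched_segments",  # array produced by the matching step
        max_concurrency=5,                # bounds the parallel model invocations
    )
    refine_map.iterator(
        tasks.LambdaInvoke(self, "RefineSegment", lambda_function=refine_function)
    )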

    Handout generation

    The final step in our workflow involves creating the handouts using the python-pptx library. This step combines the refined transcripts with the generated screenshots to create a comprehensive presentation document.

    The Lambda function processes the matched segments sequentially, creating a new slide for each screenshot while adding the corresponding refined transcript as speaker notes. The implementation uses a custom Lambda layer containing the python-pptx package. To enable this functionality in Lambda, we created a custom layer using Docker. By using Docker to create our layer, we make sure the dependencies are compiled in an environment that matches the Lambda runtime. You can find the instructions to create this layer and the layer itself in our GitHub repository.

    The Lambda function implementation uses python-pptx to create the structured presentation. The following simplified version shows the core logic; the event fields and loop inputs are illustrative:

    import boto3
    from pptx import Presentation
    import os
    
    def lambda_handler(event, context):
        # Illustrative inputs: screenshot paths and matched transcript segments
        image_paths = event['image_paths']
        transcription_segments = event['transcription_segments']
        
        # Create a new presentation with 16:9 dimensions (values in EMUs)
        prs = Presentation()
        prs.slide_width = 12192000   # 13.33 inches
        prs.slide_height = 6858000   # 7.5 inches
        
        # Process each segment
        for i, image_path in enumerate(image_paths):
            # Add a new slide
            slide = prs.slides.add_slide(prs.slide_layouts[5])
            
            # Add the screenshot as a full-slide image
            slide.shapes.add_picture(image_path, 0, 0, width=prs.slide_width)
            
            # Add the refined transcript as speaker notes
            notes_slide = slide.notes_slide
            transcription_text = transcription_segments[i].get('transcript', '')
            notes_slide.notes_text_frame.text = transcription_text
        
        # Save the presentation to Lambda's writable /tmp directory
        pptx_path = os.path.join('/tmp', 'lecture_notes.pptx')
        prs.save(pptx_path)
        # The finished file is then uploaded to Amazon S3 (omitted here)
    

    The function processes segments sequentially, creating a presentation that combines visual shots with their corresponding audio segments, resulting in handouts ready for distribution.

    The following screenshot shows an example of a generated slide with notes. The full deck has been added as a file in the GitHub repository.

    Slide presentation showing an example output

    Conclusion

    In this post, we demonstrated how to build a serverless solution that automates the creation of handout notes from recorded slide presentations. By combining Amazon Bedrock Data Automation with custom Lambda functions, we’ve created a scalable pipeline that significantly reduces the manual effort required in creating handout materials. Our solution addresses several key challenges in content creation:

    • Automated detection of slide transitions, content changes, and accurate transcription of spoken content using the video modality capabilities of Amazon Bedrock Data Automation
    • Intelligent refinement of transcribed text using Amazon Bedrock
    • Synchronized visual and textual content with a custom matching algorithm
    • Handout generation using the ffmpeg-python and python-pptx libraries in Lambda

    The serverless architecture, orchestrated by Step Functions, provides reliable execution while maintaining cost-efficiency. By using Python packages for FFmpeg and a Lambda layer for python-pptx, we’ve overcome technical limitations and created a robust solution that can handle various presentation formats and lengths. This solution can be extended and customized for different use cases, from educational institutions to corporate training programs. Certain steps such as the transcript refinement can also be improved, for instance by adding translation capabilities to account for diverse audiences.

    To learn more about Amazon Bedrock Data Automation, refer to the following resources:

    • Transform unstructured data into meaningful insights using Amazon Bedrock Data Automation
    • New Amazon Bedrock capabilities enhance data processing and retrieval
    • Simplify multimodal generative AI with Amazon Bedrock Data Automation
    • Guidance for Multimodal Data Processing Using Amazon Bedrock Data Automation

    About the authors

    Laura Verghote is the GenAI Lead for PSI Europe at Amazon Web Services (AWS), driving Generative AI adoption across public sector organizations. She partners with customers throughout Europe to accelerate their GenAI initiatives through technical expertise and strategic planning, bridging complex requirements with innovative AI solutions.

    Elie Elmalem is a solutions architect at Amazon Web Services (AWS) and supports Education customers across the UK and EMEA. He works with customers to effectively use AWS services, providing architectural best practices, advice, and guidance. Outside of work, he enjoys spending time with family and friends and loves watching his favorite football team play.
