    Build real-time conversational AI experiences using Amazon Nova Sonic and LiveKit

    July 10, 2025

    The rapid growth of generative AI technology has been a catalyst for business productivity growth, creating new opportunities for greater efficiency, enhanced customer service experiences, and more successful customer outcomes. Today’s generative AI advances are helping existing technologies achieve their long-promised potential. For example, voice-first applications have been gaining traction across industries for years—from customer service to education to personal voice assistants and agents. But early versions of this technology struggled to interpret human speech or mimic real conversation. Building real-time, natural-sounding, low-latency voice AI has until recently remained complex, especially when working with streaming infrastructure and speech foundation models (FMs).

    The rapid progress of conversational AI technology has led to the development of powerful models that address the historical challenges of traditional voice-first applications. Amazon Nova Sonic is a state-of-the-art speech-to-speech FM designed to build real-time conversational AI applications in Amazon Bedrock. This model offers industry-leading price-performance and low latency. The Amazon Nova Sonic architecture unifies speech understanding and generation into a single model, to enable real, human-like voice conversations in AI applications.

    Amazon Nova Sonic accommodates the breadth and richness of human language. It can understand speech in different speaking styles and generate speech in expressive voices, including both masculine-sounding and feminine-sounding voices. Amazon Nova Sonic can also adapt the patterns of stress, intonation, and style of the generated speech response to align with the context and content of the speech input. Additionally, Amazon Nova Sonic supports function calling and knowledge grounding with enterprise data using Retrieval-Augmented Generation (RAG). To further simplify the process of getting the most from this technology, Amazon Nova Sonic is now integrated with LiveKit’s WebRTC framework, a widely used platform that enables developers to build real-time audio, video, and data communication applications. This integration makes it possible for developers to build conversational voice interfaces without needing to manage complex audio pipelines or signaling protocols. In this post, we explain how this integration works, how it addresses the historical challenges of voice-first applications, and some initial steps to start using this solution.

    Solution overview

    LiveKit is a popular open source WebRTC platform that provides scalable, multi‑user real‑time video, audio, and data communication. Designed as a full-stack solution, it offers a Selective Forwarding Unit (SFU) architecture; modern client SDKs across web, mobile, and server environments; and built‑in features such as speaker detection, bandwidth optimization, simulcast support, and seamless room management. You can deploy it as a self-hosted system or on AWS, so developers can focus on application logic without managing the underlying media infrastructure.

    Building real-time, voice-first AI applications requires developers to manage multiple layers of infrastructure—from handling audio capture and streaming protocols to coordinating signaling, routing, and event-driven state management. Working with bidirectional streaming models such as Amazon Nova Sonic often meant setting up custom pipelines, managing audio buffers, and working to maintain low-latency performance across diverse client environments. These tasks added development overhead and required specialized knowledge in networking and real-time systems, making it difficult to quickly prototype or scale production-ready voice AI solutions. To address this complexity, we implemented a real-time plugin for Amazon Nova Sonic in the LiveKit Agent SDK. This solution removes the need for developers to manage audio signaling, streaming protocols, or custom transport layers. LiveKit handles real-time audio routing and session management, and Amazon Nova Sonic powers speech understanding and generation. Together, LiveKit and Amazon Nova Sonic provide a streamlined, production-ready setup for building voice-first AI applications. Features such as full-duplex audio, voice activity detection, and noise suppression are available out of the box, so developers can focus on application logic rather than infrastructure orchestration.

    The following video shows Amazon Nova Sonic and LiveKit in action. You can find the code for this example in the LiveKit Examples GitHub repo.

    The following diagram illustrates the solution architecture of Amazon Nova Sonic deployed as a voice agent in the LiveKit framework on AWS.

    Diagram illustrates the solution architecture of Amazon Nova Sonic

    Prerequisites

    To implement the solution, you must have the following prerequisites:

    • Python version 3.12 or higher
    • An AWS account with appropriate Identity and Access Management (IAM) permissions for Amazon Bedrock
    • Access to Amazon Nova Sonic on Amazon Bedrock
    • A web browser (such as Google Chrome or Mozilla Firefox) with WebRTC support
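    A quick way to confirm the Python prerequisite before installing anything is a small version check. The following is a minimal sketch; the helper name meets_min_python is hypothetical and not part of LiveKit or AWS tooling.

```python
import sys

# Minimum interpreter version listed in the prerequisites above.
MIN_PYTHON = (3, 12)


def meets_min_python(version_info=sys.version_info, minimum=MIN_PYTHON) -> bool:
    """Return True if the (major, minor) part of version_info is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    status = "OK" if meets_min_python() else "too old"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
```

    This only checks the interpreter that runs the script; the uv venv command used later selects its own interpreter with --python 3.12.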

    Deploy the solution

    Complete the following steps to get started talking to Amazon Nova Sonic through LiveKit:

    1. Install the necessary dependencies:
    brew install livekit livekit-cli
    curl -LsSf https://astral.sh/uv/install.sh | sh

    uv is a fast, drop-in replacement for pip, used in the LiveKit Agents SDK (you can also choose to use pip).

    2. Set up a new local virtual environment:
    uv init sonic_demo
    cd sonic_demo
    uv venv --python 3.12
    uv add livekit-agents python-dotenv 'livekit-plugins-aws[realtime]'
    3. To run the LiveKit server locally, open a new terminal window and run the following command:
    livekit-server --dev

    You must keep the LiveKit server running for the entire duration that the Amazon Nova Sonic agent is running, because it’s responsible for proxying data between parties.

    4. Generate an access token using the following command. The default values for api-key and api-secret are devkey and secret, respectively. When creating an access token for permission to join a LiveKit room, you must specify the room name and user identity.
    lk token create \
     --api-key devkey --api-secret secret \
     --join --room my-first-room --identity user1 \
     --valid-for 24h
    5. Create environment variables. You must specify the AWS credentials:
    vim .env
    
    # contents of the .env file
    AWS_ACCESS_KEY_ID=<aws access key id>
    AWS_SECRET_ACCESS_KEY=<aws secret access key>
    
    # if using a permanent identity (e.g. IAM user)
    # then session token is optional
    AWS_SESSION_TOKEN=<aws session token>
    LIVEKIT_API_KEY=devkey
    LIVEKIT_API_SECRET=secret
    6. Create the main.py file:
    from dotenv import load_dotenv
    from livekit import agents
    from livekit.agents import AgentSession, Agent, AutoSubscribe
    from livekit.plugins.aws.experimental.realtime import RealtimeModel
    
    load_dotenv()
    
    async def entrypoint(ctx: agents.JobContext):
        # Connect to the LiveKit server
        await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)
        
        # Initialize the Amazon Nova Sonic agent
        agent = Agent(instructions="You are a helpful voice AI assistant.")
        session = AgentSession(llm=RealtimeModel())
        
        # Start the session in the specified room
        await session.start(
            room=ctx.room,
            agent=agent,
        )
    
    if __name__ == "__main__":
        agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
    7. Run the main.py file:
    uv run python main.py connect --room my-first-room

    Now you’re ready to connect to the agent frontend.

    1. Go to https://agents-playground.livekit.io/.
    2. Choose Manual.
    3. In the first text field, enter ws://localhost:7880.
    4. In the second text field, enter the access token you generated.
    5. Choose Connect.

    You should now be able to talk to Amazon Nova Sonic in real time.

    If you’re disconnected from the LiveKit room, you will have to restart the agent process (main.py) to talk to Amazon Nova Sonic again.
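    When a connection attempt fails, two things are worth ruling out first: the local LiveKit server is not listening, or a value is missing from the .env file. The following is a minimal preflight sketch using only the standard library. The function names are hypothetical, the port 7880 is taken from the ws://localhost:7880 address used above, and the .env parsing is deliberately simplistic (plain KEY=VALUE lines and # comments only).

```python
import os
import socket

# Variables the walkthrough places in .env (the session token is optional).
REQUIRED_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
)


def parse_env_text(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_env_vars(env=None, required=REQUIRED_VARS) -> list:
    """Return the required variable names that are absent or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


def port_is_open(host: str = "localhost", port: int = 7880, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    env = dict(os.environ)
    if os.path.exists(".env"):
        with open(".env", encoding="utf-8") as fh:
            env.update(parse_env_text(fh.read()))
    print("Missing variables:", missing_env_vars(env) or "none")
    print("LiveKit server reachable:", port_is_open())
```

    A successful TCP connection only shows that something is listening on the port; it does not confirm that the token or AWS credentials are valid.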

    Clean up

    This example runs locally, so no special teardown steps are required for cleanup. You can simply exit the agent and LiveKit server processes. The only costs incurred are for the calls made to Amazon Bedrock to talk to Amazon Nova Sonic. After you have disconnected from the LiveKit room, you will no longer incur charges and no AWS resources will remain in use.

    Conclusion

    Thanks to generative AI, the qualitative benefits long promised by voice-first applications can now be realized. By combining Amazon Nova Sonic with LiveKit’s WebRTC infrastructure, developers can build real-time, voice-first AI applications with less complexity and faster deployment. The integration reduces the need for custom audio pipelines, so teams can focus on building engaging conversational experiences.

    “Our goal with this integration is to simplify the development of real-time voice applications,” said Josh Wulf, CEO of LiveKit. “By combining LiveKit’s robust media routing and session management with Nova Sonic’s speech capabilities, we’re helping developers move faster—no need to manage low-level infrastructure, so they can focus on building the conversation.”

    To learn more about Amazon Nova Sonic, read the AWS News Blog, Amazon Nova Sonic product page, and Amazon Nova Sonic User Guide. To get started with Amazon Nova Sonic in Amazon Bedrock, visit the Amazon Bedrock console.


    About the authors

    Glen Ko is an AI developer at AWS Bedrock, where his focus is on enabling the proliferation of open source AI tooling and supporting open source innovation.

    Anuj Jauhari is a Senior Product Marketing Manager at Amazon Web Services, where he helps customers realize value from innovations in generative AI.

    Osman Ipek is a Solutions Architect on Amazon’s AGI team focusing on Nova foundation models. He guides teams to accelerate development through practical AI implementation strategies, with expertise spanning voice AI, NLP, and MLOps.
