
    Build real-time conversational AI experiences using Amazon Nova Sonic and LiveKit

    July 10, 2025

    The rapid growth of generative AI technology has been a catalyst for business productivity growth, creating new opportunities for greater efficiency, enhanced customer service experiences, and more successful customer outcomes. Today’s generative AI advances are helping existing technologies achieve their long-promised potential. For example, voice-first applications have been gaining traction across industries for years—from customer service to education to personal voice assistants and agents. But early versions of this technology struggled to interpret human speech or mimic real conversation. Building real-time, natural-sounding, low-latency voice AI has until recently remained complex, especially when working with streaming infrastructure and speech foundation models (FMs).

    The rapid progress of conversational AI technology has led to powerful models that address the historical challenges of traditional voice-first applications. Amazon Nova Sonic is a state-of-the-art speech-to-speech FM for building real-time conversational AI applications in Amazon Bedrock, offering industry-leading price-performance and low latency. The Amazon Nova Sonic architecture unifies speech understanding and generation in a single model, enabling natural, human-like voice conversations in AI applications.

    Amazon Nova Sonic accommodates the breadth and richness of human language. It can understand speech in different speaking styles and generate speech in expressive voices, including both masculine-sounding and feminine-sounding voices. Amazon Nova Sonic can also adapt the patterns of stress, intonation, and style of the generated speech response to align with the context and content of the speech input. Additionally, Amazon Nova Sonic supports function calling and knowledge grounding with enterprise data using Retrieval-Augmented Generation (RAG). To further simplify the process of getting the most from this technology, Amazon Nova Sonic is now integrated with LiveKit’s WebRTC framework, a widely used platform that enables developers to build real-time audio, video, and data communication applications. This integration makes it possible for developers to build conversational voice interfaces without needing to manage complex audio pipelines or signaling protocols. In this post, we explain how this integration works, how it addresses the historical challenges of voice-first applications, and some initial steps to start using this solution.
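
    Function calling lets the model invoke application code mid-conversation, for example to fetch data that grounds its spoken response. The following is a minimal sketch of how a tool can be declared with the LiveKit Agents SDK (introduced in the next section); the get_weather tool and its canned response are hypothetical placeholders, so verify the decorator's exact usage against the SDK documentation.

    from livekit.agents import Agent, RunContext, function_tool

    class AssistantAgent(Agent):
        def __init__(self) -> None:
            super().__init__(instructions="You are a helpful voice AI assistant.")

        @function_tool()
        async def get_weather(self, context: RunContext, location: str) -> str:
            """Return the current weather for the given location."""
            # Hypothetical stub: a real tool would call a weather API
            # or a RAG retrieval backend here.
            return f"It is sunny in {location} today."

    When the model decides a tool is needed, the SDK invokes the method and feeds the result back into the generated speech.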

    Solution overview

    LiveKit is a popular open source WebRTC platform that provides scalable, multi‑user real‑time video, audio, and data communication. Designed as a full-stack solution, it offers a Selective Forwarding Unit (SFU) architecture; modern client SDKs across web, mobile, and server environments; and built‑in features such as speaker detection, bandwidth optimization, simulcast support, and seamless room management. You can deploy it as a self-hosted system or on AWS, so developers can focus on application logic without managing the underlying media infrastructure.

    Building real-time, voice-first AI applications requires developers to manage multiple layers of infrastructure—from handling audio capture and streaming protocols to coordinating signaling, routing, and event-driven state management. Working with bidirectional streaming models such as Amazon Nova Sonic often meant setting up custom pipelines, managing audio buffers, and working to maintain low-latency performance across diverse client environments. These tasks added development overhead and required specialized knowledge in networking and real-time systems, making it difficult to quickly prototype or scale production-ready voice AI solutions. To address this complexity, we implemented a real-time plugin for Amazon Nova Sonic in the LiveKit Agent SDK. This solution removes the need for developers to manage audio signaling, streaming protocols, or custom transport layers. LiveKit handles real-time audio routing and session management, and Amazon Nova Sonic powers speech understanding and generation. Together, LiveKit and Amazon Nova Sonic provide a streamlined, production-ready setup for building voice-first AI applications. Features such as full-duplex audio, voice activity detection, and noise suppression are available out of the box, so developers can focus on application logic rather than infrastructure orchestration.

    The following video shows Amazon Nova Sonic and LiveKit in action. You can find the code for this example in the LiveKit Examples GitHub repo.

    The following diagram illustrates the solution architecture of Amazon Nova Sonic deployed as a voice agent in the LiveKit framework on AWS.

    Prerequisites

    To implement the solution, you must have the following prerequisites:

    • Python version 3.12 or higher
    • An AWS account with appropriate Identity and Access Management (IAM) permissions for Amazon Bedrock
    • Access to Amazon Nova Sonic on Amazon Bedrock (an optional verification sketch follows this list)
    • A web browser (such as Google Chrome or Mozilla Firefox) with WebRTC support
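
    Before deploying, you can optionally confirm that your credentials can see Amazon Nova Sonic in Amazon Bedrock. This is a hedged pre-flight sketch using boto3 (not part of the official walkthrough); the Region and the substring check against the model listing are assumptions, so verify model access in the Amazon Bedrock console.

    # Optional pre-flight check; assumes boto3 is installed and AWS
    # credentials are configured for a Region where Nova Sonic is offered.
    import boto3

    bedrock = boto3.client("bedrock", region_name="us-east-1")
    summaries = bedrock.list_foundation_models()["modelSummaries"]
    if any("nova-sonic" in s["modelId"] for s in summaries):
        print("Amazon Nova Sonic is listed for this account and Region.")
    else:
        print("Nova Sonic not listed; request model access in the console.")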

    Deploy the solution

    Complete the following steps to get started talking to Amazon Nova Sonic through LiveKit:

    1. Install the necessary dependencies:
    brew install livekit livekit-cli
    curl -LsSf https://astral.sh/uv/install.sh | sh

    uv is a fast, drop-in replacement for pip, used in the LiveKit Agents SDK (you can also choose to use pip).

    2. Set up a new local virtual environment:
    uv init sonic_demo
    cd sonic_demo
    uv venv --python 3.12
    uv add livekit-agents python-dotenv 'livekit-plugins-aws[realtime]'
    3. To run the LiveKit server locally, open a new terminal (for example, a new UNIX process) and run the following command:
    livekit-server --dev

    You must keep the LiveKit server running for the entire duration that the Amazon Nova Sonic agent is running, because it’s responsible for proxying data between parties.

    4. Generate an access token using the following command. The default values for api-key and api-secret are devkey and secret, respectively. When creating an access token for permission to join a LiveKit room, you must specify the room name and user identity (a programmatic alternative follows the command).
    lk token create \
     --api-key devkey --api-secret secret \
     --join --room my-first-room --identity user1 \
     --valid-for 24h
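
    If you prefer minting tokens from code rather than the CLI, the livekit-api Python package provides an AccessToken builder. The following is a hedged sketch that mirrors the CLI flags above; it assumes the livekit-api package is installed (for example, with uv add livekit-api).

    # Programmatic alternative to `lk token create`; assumes the
    # livekit-api package. Values mirror the CLI flags above.
    from datetime import timedelta

    from livekit import api

    token = (
        api.AccessToken(api_key="devkey", api_secret="secret")
        .with_identity("user1")
        .with_grants(api.VideoGrants(room_join=True, room="my-first-room"))
        .with_ttl(timedelta(hours=24))
        .to_jwt()
    )
    print(token)
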
    5. Create environment variables. You must specify your AWS credentials:
    vim .env
    
    # contents of the .env file
    AWS_ACCESS_KEY_ID=<aws access key id>
    AWS_SECRET_ACCESS_KEY=<aws secret access key>
    
    # if using a permanent identity (e.g., an IAM user),
    # then the session token is optional
    AWS_SESSION_TOKEN=<aws session token>
    LIVEKIT_API_KEY=devkey
    LIVEKIT_API_SECRET=secret
    6. Create the main.py file (a configuration note follows the listing):
    from dotenv import load_dotenv
    from livekit import agents
    from livekit.agents import AgentSession, Agent, AutoSubscribe
    from livekit.plugins.aws.experimental.realtime import RealtimeModel
    
    load_dotenv()
    
    async def entrypoint(ctx: agents.JobContext):
        # Connect to the LiveKit server
        await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)
        
        # Initialize the Amazon Nova Sonic agent
        agent = Agent(instructions="You are a helpful voice AI assistant.")
        session = AgentSession(llm=RealtimeModel())
        
        # Start the session in the specified room
        await session.start(
            room=ctx.room,
            agent=agent,
        )
    
    if __name__ == "__main__":
        agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
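
    The RealtimeModel constructor also accepts configuration options. As an illustrative variation on the session setup above, the voice parameter and the "tiffany" voice ID below are assumptions to verify against the plugin and Amazon Nova Sonic documentation.

    # Hedged variation on main.py: selecting an expressive voice.
    # The voice parameter and "tiffany" ID are assumptions to verify.
    from livekit.agents import AgentSession
    from livekit.plugins.aws.experimental.realtime import RealtimeModel

    session = AgentSession(llm=RealtimeModel(voice="tiffany"))
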
    7. Run the main.py file:
    uv run python main.py connect --room my-first-room

    Now you’re ready to connect to the agent frontend.

    1. Go to https://agents-playground.livekit.io/.
    2. Choose Manual.
    3. In the first text field, enter ws://localhost:7880.
    4. In the second text field, enter the access token you generated.
    5. Choose Connect.

    You should now be able to talk to Amazon Nova Sonic in real time.

    If you’re disconnected from the LiveKit room, you will have to restart the agent process (main.py) to talk to Amazon Nova Sonic again.

    Clean up

    This example runs locally, so no special teardown steps are required. Simply exit the agent and LiveKit server processes. The only costs incurred are the charges for calls to Amazon Bedrock while talking to Amazon Nova Sonic. After you disconnect from the LiveKit room, you will no longer incur charges, and no AWS resources will remain in use.

    Conclusion

    Thanks to generative AI, the qualitative benefits long promised by voice-first applications can now be realized. By combining Amazon Nova Sonic with LiveKit’s WebRTC infrastructure, developers can build real-time, voice-first AI applications with less complexity and faster deployment. The integration reduces the need for custom audio pipelines, so teams can focus on building engaging conversational experiences.

    “Our goal with this integration is to simplify the development of real-time voice applications,” said Josh Wulf, CEO of LiveKit. “By combining LiveKit’s robust media routing and session management with Nova Sonic’s speech capabilities, we’re helping developers move faster—no need to manage low-level infrastructure, so they can focus on building the conversation.”

    To learn more about Amazon Nova Sonic, read the AWS News Blog, Amazon Nova Sonic product page, and Amazon Nova Sonic User Guide. To get started with Amazon Nova Sonic in Amazon Bedrock, visit the Amazon Bedrock console.


    About the authors

    Glen Ko is an AI developer at AWS Bedrock, where his focus is on enabling the proliferation of open source AI tooling and supporting open source innovation.

    Anuj Jauhari is a Senior Product Marketing Manager at Amazon Web Services, where he helps customers realize value from innovations in generative AI.

    Osman Ipek is a Solutions Architect on Amazon’s AGI team focusing on Nova foundation models. He guides teams to accelerate development through practical AI implementation strategies, with expertise spanning voice AI, NLP, and MLOps.
