
    TRAMBA: A Novel Hybrid Transformer and Mamba-based Architecture for Speech Super Resolution and Enhancement for Mobile and Wearable Platforms

    May 8, 2024

Wearables have transformed how people interact with technology, enabling continuous health monitoring. The wearables market is projected to surge from 70 billion USD in 2023 to 230 billion USD by 2032, with head-worn devices such as earphones and glasses growing especially quickly (from 71 billion USD in 2023 to 172 billion USD by 2030), propelled by the rising importance of wearables, augmented reality (AR), and virtual reality (VR). Head-worn wearables are uniquely positioned to capture speech, which is traditionally collected by over-the-air (OTA) microphones placed near or on the head that convert air-pressure fluctuations into electrical signals. However, because OTA microphones typically sit near the mouth, they readily pick up background noise, which can degrade speech quality in noisy environments.

Various studies have tackled the problem of separating speech from background noise through denoising, sound-source separation, and speech-enhancement techniques. These approaches are limited, however, by a model's inability to anticipate every kind of background noise and by the sheer prevalence of noisy settings such as bustling cafeterias or construction sites. Unlike OTA microphones, bone conduction microphones (BCMs) placed directly on the head are resilient to ambient noise because they detect vibrations traveling through the skin and skull during speech. Although BCMs offer noise robustness, vibration-based methods suffer from high-frequency attenuation, which reduces speech intelligibility. Some research explores vibration- and bone-conduction super-resolution methods that reconstruct the missing higher frequencies to improve speech quality, yet practical, real-time deployment on wearables remains difficult: state-of-the-art speech super-resolution models such as generative adversarial networks (GANs) demand substantial memory and computation, while smaller-footprint methods leave a noticeable performance gap. Optimization choices, such as sampling rate and deployment strategy, remain crucial for real-time system efficiency.

Researchers from Northwestern University and Columbia University introduced TRAMBA, a hybrid transformer and Mamba architecture for enhancing acoustic and bone-conduction speech on mobile and wearable platforms. Adopting bone-conduction speech enhancement on such platforms has previously been hampered by labor-intensive data collection and by performance gaps between models. TRAMBA addresses this by pre-training on widely available audio speech datasets and fine-tuning on a small amount of bone-conduction data. It reconstructs intelligible speech from a single wearable accelerometer and generalizes across multiple acoustic modalities. Integrated into wearable and mobile platforms, TRAMBA enables real-time speech super-resolution and a significant reduction in power consumption. This is also the first study to sense intelligible speech using only a single head-worn accelerometer.
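
The pre-train-then-fine-tune recipe described above maps onto a standard two-stage training loop. The sketch below is a minimal, hypothetical PyTorch illustration: the stand-in model, placeholder data, and hyperparameters are assumptions for readability, not the authors' released code.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-in model: a single 1-D conv in place of the full TRAMBA network,
# used only to make the two-stage recipe concrete and runnable.
model = nn.Conv1d(1, 1, kernel_size=9, padding=4)

def make_loader(n_clips, batch_size):
    # Placeholder data: random (low-res input, full-band target) pairs of
    # 512 ms windows, assuming 16 kHz audio (8192 samples). Real training
    # would use paired OTA or bone-conduction recordings instead.
    x = torch.randn(n_clips, 1, 8192)
    y = torch.randn(n_clips, 1, 8192)
    return DataLoader(TensorDataset(x, y), batch_size=batch_size, shuffle=True)

def run_stage(loader, epochs, lr):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.L1Loss()  # simple reconstruction loss, for illustration only
    for _ in range(epochs):
        for lo_res, full_band in loader:
            opt.zero_grad()
            loss_fn(model(lo_res), full_band).backward()
            opt.step()

# Stage 1: pre-train on abundant over-the-air (OTA) speech.
run_stage(make_loader(n_clips=1024, batch_size=32), epochs=2, lr=1e-4)

# Stage 2: fine-tune on a small paired bone-conduction set at a lower learning rate.
run_stage(make_loader(n_clips=64, batch_size=8), epochs=2, lr=1e-5)
```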

At a macro level, the TRAMBA architecture integrates a modified U-Net with self-attention in the downsampling and upsampling layers and Mamba in the narrow bottleneck layer. TRAMBA operates on 512 ms windows of single-channel audio and preprocesses acceleration data from an accelerometer. Each downsampling block consists of a 1D convolutional layer with LeakyReLU activation, followed by a conditioning layer called Scale-only Attention-based Feature-wise Linear Modulation (SAFiLM), which uses multi-head attention to learn scaling factors that enhance the feature representations. The bottleneck employs Mamba, which offers sequence modeling comparable to transformer attention with much lower memory usage; because of vanishing-gradient issues, self-attention is retained only in the downsampling and upsampling blocks. Residual connections further facilitate gradient flow and make the deeper network easier to optimize.
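
To make the block structure concrete, here is a minimal PyTorch sketch of one downsampling block with a scale-only, attention-based FiLM layer. The layer sizes, head count, block length, and the exact way attention produces the scaling factors are assumptions for illustration, not the paper's implementation.

```python
import torch
from torch import nn

class SAFiLM(nn.Module):
    """Scale-only attention-based FiLM (illustrative): multi-head self-attention
    over per-block summaries produces scaling factors that modulate the features."""
    def __init__(self, channels, block_size=128, n_heads=4):
        super().__init__()
        self.block_size = block_size
        self.attn = nn.MultiheadAttention(channels, n_heads, batch_first=True)

    def forward(self, x):                                 # x: (batch, channels, time)
        b, c, t = x.shape                                 # assumes t divisible by block_size
        blocks = x.view(b, c, t // self.block_size, self.block_size)
        tokens = blocks.mean(dim=-1).transpose(1, 2)      # one token per block: (b, n_blocks, c)
        scale, _ = self.attn(tokens, tokens, tokens)      # learned scaling factors
        scale = scale.transpose(1, 2).unsqueeze(-1)       # (b, c, n_blocks, 1)
        return (blocks * scale).view(b, c, t)             # scale-only modulation

class DownBlock(nn.Module):
    """One U-Net downsampling block: strided 1-D conv + LeakyReLU + SAFiLM."""
    def __init__(self, in_ch, out_ch, stride=2):
        super().__init__()
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size=15, stride=stride, padding=7)
        self.act = nn.LeakyReLU(0.2)
        self.film = SAFiLM(out_ch)

    def forward(self, x):
        return self.film(self.act(self.conv(x)))

# Example: a 512 ms mono window at an assumed 16 kHz (8192 samples) through one block.
x = torch.randn(1, 1, 8192)
y = DownBlock(1, 32)(x)
print(y.shape)  # torch.Size([1, 32, 4096])
```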

TRAMBA outperforms other models, including U-Net architectures such as the top-performing TUNet, across a range of metrics and sampling rates. Although the Aero GAN slightly beats TRAMBA on log-spectral distance (LSD), TRAMBA excels on perceptual and noise metrics such as SNR, PESQ, and STOI, highlighting that integrating transformers and Mamba enhances local speech formants more effectively than traditional architectures. Transformer- and Mamba-based models also outperform state-of-the-art GANs while requiring significantly less memory and inference time. Notably, TRAMBA's inference is fast enough for real-time operation, whereas Aero GAN's inference time exceeds the audio window length, making it impractical for real-time applications.
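
For readers who want to run a similar evaluation, the snippet below computes SNR and log-spectral distance (LSD) using standard textbook definitions and notes where PESQ and STOI would come from; the formulas, sampling rate, and placeholder signals are generic assumptions, not taken from the paper.

```python
import numpy as np

FS = 16000  # assumed sampling rate

def snr_db(ref, est):
    """Signal-to-noise ratio of an estimate against the clean reference, in dB."""
    noise = ref - est
    return 10 * np.log10(np.sum(ref**2) / (np.sum(noise**2) + 1e-12))

def lsd(ref, est, n_fft=512, hop=256):
    """Log-spectral distance: RMS difference of log power spectra (lower is better)."""
    def log_power(x):
        frames = np.lib.stride_tricks.sliding_window_view(x, n_fft)[::hop]
        mag = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=-1))
        return np.log10(mag**2 + 1e-12)
    r, e = log_power(ref), log_power(est)
    return float(np.mean(np.sqrt(np.mean((r - e) ** 2, axis=-1))))

# Placeholder signals; in practice `est` would be the model's reconstructed speech.
ref = np.random.randn(FS)               # 1 s of "reference" audio
est = ref + 0.05 * np.random.randn(FS)  # noisy estimate

print("SNR (dB):", snr_db(ref, est))
print("LSD     :", lsd(ref, est))

# PESQ and STOI are perceptual metrics usually computed on real speech recordings
# with the third-party `pesq` and `pystoi` packages, e.g.:
#   from pesq import pesq;   pesq(FS, ref, est, "wb")
#   from pystoi import stoi; stoi(ref, est, FS)
```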

In conclusion, this study presents TRAMBA, a hybrid transformer and Mamba architecture for speech super-resolution and enhancement on mobile and wearable platforms. It surpasses existing methods across multiple acoustic modalities while keeping a compact memory footprint of only 19.7 MB, in contrast to GANs that require hundreds of MB or more. Integrated into real mobile and head-worn wearable systems, TRAMBA delivers better speech quality in noisy environments than traditional denoising approaches, and it extends battery life by up to 160% by reducing the resolution of audio that must be sampled and transmitted. TRAMBA represents a crucial step toward integrating speech enhancement into practical mobile and wearable platforms.
