Multi-Scale Neural Audio Codec (SNAC): An Extension of Residual Vector Quantization that Uses Quantizers Operating at Multiple Temporal Resolutions

October 23, 2024

Neural audio compression aims to represent audio efficiently while preserving quality, and it remains a hard problem in digital signal processing. Traditional audio codecs, despite their widespread use, struggle to reach lower bitrates without compromising fidelity. Recent neural compression methods reduce bitrates further, but they have difficulty capturing long-term audio structure. The main limitation is the high token granularity of existing audio tokenizers: each second of audio becomes many tokens, which creates a computational bottleneck when transformer architectures process extended sequences. The problem is most visible with complex signals such as speech and music, which contain multiple levels of abstraction, from local acoustic features to higher-level semantic structure. Representing these hierarchical structures while keeping computation manageable remains a fundamental challenge in audio processing systems.

Prior work on audio compression has centered on two main approaches: neural audio codecs and multi-scale modeling techniques. Vector quantization (VQ) became a fundamental tool, mapping high-dimensional audio data to discrete code vectors in VQ-VAE models. However, VQ becomes inefficient at higher bitrates because the required codebook size grows exponentially with the number of bits per frame. This led to Residual Vector Quantization (RVQ), a multi-stage process in which each stage quantizes the residual error left by the previous stages. In parallel, researchers explored multi-scale models with hierarchical decoders and separate VQ-VAE models at different temporal resolutions to capture long-term musical structure, though these approaches still struggled to balance compression efficiency with structural representation.
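To make the multi-stage idea concrete, here is a minimal sketch of residual vector quantization in plain Python. It is an illustration under stated assumptions, not code from the SNAC repository: the function names (`nearest`, `rvq_encode`) and the tiny hand-written codebooks are hypothetical, whereas real codecs learn their codebooks during training and quantize high-dimensional latent vectors.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest(codebook: Sequence[Vector], x: Vector) -> int:
    """Return the index of the code vector closest to x (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], x)),
    )


def rvq_encode(
    x: Vector, codebooks: Sequence[Sequence[Vector]]
) -> Tuple[List[int], List[float]]:
    """Quantize x stage by stage; each stage encodes what the previous stages missed."""
    residual = list(x)
    recon = [0.0] * len(x)
    indices = []
    for codebook in codebooks:
        i = nearest(codebook, residual)
        indices.append(i)
        # Add the chosen code vector to the reconstruction...
        recon = [r + c for r, c in zip(recon, codebook[i])]
        # ...and pass the remaining error on to the next stage.
        residual = [r - c for r, c in zip(residual, codebook[i])]
    return indices, recon


# Hypothetical toy codebooks: a coarse stage followed by a finer one.
codebooks = [
    [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0)],
    [(-0.25, -0.25), (-0.25, 0.25), (0.25, -0.25), (0.25, 0.25)],
]

indices, recon = rvq_encode((0.8, -1.3), codebooks)
print(indices)  # one index per stage
print(recon)    # sum of the selected code vectors
```

Two codebooks of four entries each cost 2 + 2 = 4 bits per vector, the same as a single 16-entry codebook, but only 8 code vectors have to be stored and searched. That is the efficiency argument for stacking stages instead of growing one codebook.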

Researchers from Papla Media and ETH Zurich present SNAC (Multi-Scale Neural Audio Codec), which extends residual vector quantization with quantizers that operate at multiple temporal resolutions. The method builds on the RVQGAN framework and adds noise blocks, depthwise convolutions, and local windowed attention. This design enables more efficient compression while maintaining high audio quality across different temporal scales.

SNAC's architecture extends RVQGAN with a multi-scale quantization scheme. The core structure is an encoder-decoder network with cascaded Residual Vector Quantization layers in the bottleneck. At each iteration, the residual is downsampled with average pooling, quantized by codebook lookup, and upsampled back with nearest-neighbor interpolation. The architecture adds three elements: noise blocks that inject input-dependent Gaussian noise for greater expressiveness, depthwise convolutions for efficient computation and training stability, and local windowed attention layers at the lowest temporal resolution to capture contextual relationships.
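The per-iteration steps described above (average pooling, codebook lookup, nearest-neighbor upsampling) can be sketched in a few lines of plain Python. This is a simplified illustration, not the SNAC implementation: it treats the latent sequence as one scalar per time step, and the strides `(4, 2, 1)`, the hand-picked codebooks, and all function names are assumptions chosen so the toy example reconstructs its input exactly.

```python
from typing import List, Sequence, Tuple


def avg_pool(x: Sequence[float], stride: int) -> List[float]:
    """Downsample by averaging non-overlapping windows of `stride` steps."""
    return [sum(x[i:i + stride]) / stride for i in range(0, len(x), stride)]


def upsample_nearest(x: Sequence[float], stride: int) -> List[float]:
    """Upsample by repeating every value `stride` times."""
    return [v for v in x for _ in range(stride)]


def quantize(x: Sequence[float], codebook: Sequence[float]) -> Tuple[List[int], List[float]]:
    """Replace every value with its nearest codebook entry."""
    idx = [min(range(len(codebook)), key=lambda k: (codebook[k] - v) ** 2) for v in x]
    return idx, [codebook[k] for k in idx]


def multiscale_rvq(
    latents: Sequence[float],
    codebooks: Sequence[Sequence[float]],
    strides: Sequence[int],
) -> Tuple[List[List[int]], List[float]]:
    """Residual quantization where each stage runs at its own temporal resolution."""
    assert all(len(latents) % s == 0 for s in strides), "length must divide by every stride"
    residual = list(latents)
    recon = [0.0] * len(latents)
    tokens = []
    for codebook, stride in zip(codebooks, strides):
        pooled = avg_pool(residual, stride)       # coarser time axis
        idx, quantized = quantize(pooled, codebook)
        up = upsample_nearest(quantized, stride)  # back to full resolution
        recon = [r + u for r, u in zip(recon, up)]
        residual = [r - u for r, u in zip(residual, up)]
        tokens.append(idx)
    return tokens, recon


# Hypothetical toy input: 8 time steps, quantized at strides 4, 2 and 1.
latents = [1.0, 1.5, 0.5, 1.0, -1.0, -0.5, -1.5, -1.0]
codebooks = [[-1.0, 0.0, 1.0], [-0.25, 0.0, 0.25], [-0.25, 0.25]]
tokens, recon = multiscale_rvq(latents, codebooks, strides=(4, 2, 1))

print([len(t) for t in tokens])  # tokens emitted per level
print(recon)
```

The coarse levels emit far fewer tokens than the finest one (here 2 and 4 versus 8), so a model that only needs long-range structure can attend to the short coarse sequences. A standard RVQ with three full-rate stages would emit 8 tokens at every level.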

Evaluation of SNAC shows improvements on both speech and music compression. In music compression, SNAC outperformed competing codecs such as Encodec and DAC at comparable bitrates, and even matched the quality of systems operating at twice its bitrate. The 32 kHz SNAC model performed similarly to its 44 kHz counterpart, suggesting that the lower sampling rate is the more efficient choice. In speech compression, SNAC maintained near-reference audio quality even at bitrates below 1 kbit/s. These results were validated with both objective metrics and MUSHRA listening tests conducted with audio experts, supporting SNAC's suitability for bandwidth-constrained applications.

SNAC advances neural audio compression through its multi-scale approach to Residual Vector Quantization. By operating at multiple temporal resolutions, the system adapts to the inherent structure of audio signals and achieves better compression efficiency. Evaluations with both objective metrics and subjective listening tests indicate that SNAC delivers higher audio quality at lower bitrates than existing state-of-the-art codecs.


The post Multi-Scale Neural Audio Codec (SNAC): An Extension of Residual Vector Quantization that Uses Quantizers Operating at Multiple Temporal Resolutions appeared first on MarkTechPost.


Highlights

Development

Java maven – IllegalArgumentException: Input must be set

April 21, 2024

I get the following error when I run the command below in Git Bash. However, when I run the test in IntelliJ, everything works fine and the test starts. Could you help?
mvn test -Dcucumber.filter.tags="@desktop"
java.lang.IllegalArgumentException: Input must be set
at org.openqa.selenium.internal.Require.nonNull(Require.java:59)
at org.openqa.selenium.support.ui.FluentWait.<init>(FluentWait.java:97)
at org.openqa.selenium.support.ui.WebDriverWait.<init>(WebDriverWait.java:77)
at org.openqa.selenium.support.ui.WebDriverWait.<init>(WebDriverWait.java:46)
at utils.Utils.waitForElement(Utils.java:28)
at pages.LandingPage.acceptConsent(LandingPage.java:21)

Feature file
@desktop
Scenario: Test landing page
Given Customer accepts cookie consent
And I click "OK" button at the bottom of the page

POM
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>

<groupId>org.example</groupId>
<artifactId>Cucumber</artifactId>
<version>1.0-SNAPSHOT</version>

<dependencies>
<!-- https://mvnrepository.com/artifact/org.seleniumhq.selenium/selenium-java -->
<dependency>
<groupId>org.seleniumhq.selenium</groupId>
<artifactId>selenium-java</artifactId>
<version>4.4.0</version>
</dependency>

<!-- https://mvnrepository.com/artifact/org.testng/testng -->
<dependency>
<groupId>org.testng</groupId>
<artifactId>testng</artifactId>
<version>7.7.1</version>
<scope>test</scope>
</dependency>

<!-- https://mvnrepository.com/artifact/io.cucumber/cucumber-java -->
<dependency>
<groupId>io.cucumber</groupId>
<artifactId>cucumber-java</artifactId>
<version>7.8.1</version>
</dependency>

<!-- https://mvnrepository.com/artifact/io.cucumber/cucumber-junit -->
<dependency>
<groupId>io.cucumber</groupId>
<artifactId>cucumber-junit</artifactId>
<version>7.8.1</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.12</version>
<scope>compile</scope>
</dependency>

<!-- https://mvnrepository.com/artifact/org.assertj/assertj-core -->
<dependency>
<groupId>org.assertj</groupId>
<artifactId>assertj-core</artifactId>
<version>3.23.1</version>
<scope>test</scope>
</dependency>

<!-- https://mvnrepository.com/artifact/io.github.bonigarcia/webdrivermanager -->
<dependency>
<groupId>io.github.bonigarcia</groupId>
<artifactId>webdrivermanager</artifactId>
<version>5.3.1</version>
</dependency>
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<version>RELEASE</version>
<scope>test</scope>
</dependency>

<dependency>
<groupId>org.reflections</groupId>
<artifactId>reflections</artifactId>
<version>0.9.11</version>
</dependency>

<!-- https://mvnrepository.com/artifact/com.google.guava/guava -->
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>31.1-jre</version>
</dependency>

<!-- https://mvnrepository.com/artifact/org.seleniumhq.selenium/selenium-java -->
<dependency>
<groupId>org.seleniumhq.selenium</groupId>
<artifactId>selenium-java</artifactId>
<version>4.8.3</version>
</dependency>
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<version>1.18.26</version>
<scope>compile</scope>
</dependency>

<!-- https://mvnrepository.com/artifact/org.slf4j/slf4j-simple -->
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-simple</artifactId>
<version>2.0.7</version>
<scope>test</scope>
</dependency>

<!-- https://mvnrepository.com/artifact/org.slf4j/slf4j-api -->
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<version>2.0.7</version>
</dependency>

<!-- https://mvnrepository.com/artifact/org.springframework/spring-core -->

<!-- https://mvnrepository.com/artifact/com.opencsv/opencsv -->
<dependency>
<groupId>com.opencsv</groupId>
<artifactId>opencsv</artifactId>
<version>5.7.1</version>
</dependency>

<!-- https://mvnrepository.com/artifact/io.cucumber/cucumber-picocontainer -->
<!-- https://mvnrepository.com/artifact/picocontainer/picocontainer -->
<dependency>
<groupId>picocontainer</groupId>
<artifactId>picocontainer</artifactId>
<version>1.2</version>
</dependency>

<!-- https://mvnrepository.com/artifact/de.monochromata.cucumber/reporting-plugin -->
<!-- https://mvnrepository.com/artifact/io.cucumber/cucumber-html -->
<dependency>
<groupId>io.cucumber</groupId>
<artifactId>cucumber-html</artifactId>
<version>0.2.7</version>
</dependency>

<!-- https://mvnrepository.com/artifact/org.apache.maven.plugins/maven-surefire-plugin -->
<dependency>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<version>3.0.0</version>
</dependency>

<!-- https://mvnrepository.com/artifact/io.cucumber/cucumber-java8 -->
<!-- https://mvnrepository.com/artifact/io.cucumber/cucumber-core -->
<dependency>
<groupId>io.cucumber</groupId>
<artifactId>cucumber-core</artifactId>
<version>7.11.2</version>
</dependency>

<!-- https://mvnrepository.com/artifact/net.masterthought/cucumber-reporting -->
<!-- https://mvnrepository.com/artifact/com.aventstack/extentreports-cucumber4-adapter -->
<!-- https://mvnrepository.com/artifact/com.aventstack/extentreports -->
<!-- https://mvnrepository.com/artifact/com.relevantcodes/extentreports -->
<!-- https://mvnrepository.com/artifact/net.masterthought/maven-cucumber-reporting -->
<dependency>
<groupId>net.masterthought</groupId>
<artifactId>maven-cucumber-reporting</artifactId>
<version>5.7.5</version>
</dependency>

</dependencies>

<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<version>3.0.0</version>
<configuration>
<includes>
<include>**/*TestRunner.java</include>
</includes>
</configuration>
</plugin>

<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<configuration>
<testFailureIgnore>true</testFailureIgnore>
</configuration>
</plugin>
<plugin>
<groupId>net.masterthought</groupId>
<artifactId>maven-cucumber-reporting</artifactId>
<version>5.7.5</version>
<executions>
<execution>
<id>execution</id>
<phase>verify</phase>
<goals>
<goal>generate</goal>
</goals>
<configuration>
<projectName>Cucumber</projectName>
<!-- optional, per documentation set this to "true" to bypass generation of Cucumber Reports entirely, defaults to false if not specified -->
<skip>false</skip>
<!-- output directory for the generated report -->
<outputDirectory>${project.build.directory}</outputDirectory>
<!-- optional, defaults to outputDirectory if not specified -->
<inputDirectory>${project.build.directory}/jsonReports</inputDirectory>
<jsonFiles>
<!-- supports wildcard or name pattern -->
<param>**/*.json</param>
</jsonFiles>
<!-- optional, defaults to outputDirectory if not specified -->

<!-- optional, set true to group features by its Ids -->
<mergeFeaturesById>false</mergeFeaturesById>
<!-- optional, set true to get a final report with latest results of the same test from different test runs -->
<mergeFeaturesWithRetest>false</mergeFeaturesWithRetest>
<!-- optional, set true to fail build on test failures -->
<checkBuildResult>false</checkBuildResult>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>

</project>

Intellij configuration where it works
