NVIDIA unveils new AI model for generating audio

NVIDIA has announced that its researchers have developed a new generative AI model capable of creating audio from text or audio prompts.

Fugatto, short for Foundational Generative Audio Transformer Opus 1, can create music from text prompts, add instruments to or remove them from existing audio, and even change the accent or emotion in a voice.

For instance, a promo video by NVIDIA shows a user prompting Fugatto to create “Deep, rumbling bass pulses paired with intermittent, high-pitched digital chirps, like the sound of a massive, sentient machine waking up.” In another example, the user provides an audio clip of a person saying a short sentence and asks Fugatto to change the tone from calm to angry.

According to NVIDIA, Fugatto builds on the research team’s previous work in areas like speech modeling, audio vocoding, and audio understanding.

It was developed by a diverse group of researchers from around the world, including India, Brazil, China, Jordan, and South Korea, which NVIDIA says strengthens the model’s multi-accent and multilingual capabilities. According to the team, one of the hardest challenges in building Fugatto was “generating a blended dataset that contains millions of audio samples used for training.” To achieve this, the team generated data and instructions that expanded the range of tasks the model could perform, which improves performance and also allows it to take on new tasks without needing additional data.

The team also meticulously studied existing datasets to try to uncover any potential new relationships among the data.

According to NVIDIA, during inference the model uses a technique called ComposableART, which allows it to combine instructions that were only seen separately during training. For instance, a prompt could ask for an audio snippet spoken in a sad tone with a French accent.

“I wanted to let users combine attributes in a subjective or artistic way, selecting how much emphasis they put on each one,” said Rohan Badlani, one of the AI researchers who built Fugatto.

The model can also generate sounds that change over time, such as a thunderstorm moving through an area. It can also generate soundscapes from sounds it hasn’t heard together during training, like a thunderstorm transitioning into birds singing in the morning.

“Fugatto is our first step toward a future where unsupervised multitask learning in audio synthesis and transformation emerges from data and model scale,” said Rafael Valle, manager of applied audio research at NVIDIA and another member of the research team that developed the model.

The post NVIDIA unveils new AI model for generating audio appeared first on SD Times.
