    YuE: An Open-Source Music Generation AI Model Family Capable of Creating Full-Length Songs with Coherent Vocals, Instrumental Harmony, and Multi-Genre Creativity

    January 30, 2025

    AI music generation has made significant progress on short-form instrumental compositions, but creating full songs with lyrics, vocals, and instrumental accompaniment remains challenging for existing models. Generating a full-length song from lyrics poses several difficulties: the piece runs for several minutes, so a model must maintain consistency and coherence over a long horizon; music involves intricate harmonic structures, instrumentation, and rhythmic patterns rather than just speech or sound effects; AI-generated vocals often become incoherent when merged with the musical elements; and paired lyrics-audio datasets for training such models are scarce.

    This is where YuE, an open-source foundation model family from the Multimodal Art Projection (M-A-P) team, comes in, rivaling systems such as Suno AI in song generation. The models are designed to create full-length songs lasting several minutes from lyrics, with the ability to vary the background music, genre, and lyrical content. The family includes variants with up to 7 billion parameters. Some of the models in the YuE series on Hugging Face are listed below (the variant naming is unpacked in the short sketch that follows the list):

    • YuE-s1-7B-anneal-en-cot 
    • YuE-s1-7B-anneal-en-icl 
    • YuE-s1-7B-anneal-jp-kr-cot
    • YuE-s1-7B-anneal-jp-kr-icl  
    • YuE-s1-7B-anneal-zh-cot  
    • YuE-s1-7B-anneal-zh-icl
    • YuE-s2-1B-general
    • YuE-upsampler
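
    The variant names encode both the training language (en for English, zh for Chinese, jp-kr for Japanese/Korean) and the prompting mode (cot for chain-of-thought lyric conditioning, icl for in-context learning from a reference track). The small helper below simply makes that naming convention explicit; the m-a-p Hugging Face organization and the helper itself are illustrative assumptions rather than part of the official tooling, so check the model cards for the authoritative repository IDs.

# Illustrative helper only: maps a language / prompting-mode choice to one of the
# stage-1 checkpoints listed above. The "m-a-p" namespace is an assumption;
# consult the Hugging Face model cards for the authoritative repository IDs.
STAGE1_VARIANTS = {
    ("en", "cot"): "m-a-p/YuE-s1-7B-anneal-en-cot",
    ("en", "icl"): "m-a-p/YuE-s1-7B-anneal-en-icl",
    ("jp-kr", "cot"): "m-a-p/YuE-s1-7B-anneal-jp-kr-cot",
    ("jp-kr", "icl"): "m-a-p/YuE-s1-7B-anneal-jp-kr-icl",
    ("zh", "cot"): "m-a-p/YuE-s1-7B-anneal-zh-cot",
    ("zh", "icl"): "m-a-p/YuE-s1-7B-anneal-zh-icl",
}

def stage1_checkpoint(language: str, mode: str) -> str:
    """Return the stage-1 checkpoint name for a language and prompting mode."""
    return STAGE1_VARIANTS[(language, mode)]

print(stage1_checkpoint("en", "cot"))  # m-a-p/YuE-s1-7B-anneal-en-cot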

    YuE employs several techniques to tackle the challenges of full-length song generation, building on the LLaMA family of language models for lyrics-to-song generation. A core advancement is its dual-token technique, which models vocals and instrumentals in a synchronized way without modifying the underlying LLaMA architecture, keeping the vocal and instrumental elements harmonious throughout the generated song. YuE also incorporates an efficient audio tokenizer that reduces training costs and accelerates convergence, so the generated audio maintains musical integrity while remaining computationally manageable.
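
    The article does not spell out the exact token layout, but a dual-track scheme can be pictured as interleaving one vocal token and one instrumental token per audio frame into a single sequence that an unmodified causal language model can consume. The sketch below is a conceptual illustration under that assumption, not YuE's actual implementation.

# Conceptual sketch of a dual-token layout (an assumption about how synchronized
# vocal and instrumental streams could share one causal-LM sequence; not YuE's code).
from typing import List, Tuple

def interleave_dual_tokens(vocal: List[int], instrumental: List[int]) -> List[int]:
    """Interleave per-frame vocal and instrumental codec tokens so the two
    tracks stay time-aligned inside a single token sequence."""
    assert len(vocal) == len(instrumental), "tracks must cover the same frames"
    merged: List[int] = []
    for v_tok, i_tok in zip(vocal, instrumental):
        merged.extend([v_tok, i_tok])  # frame t: vocal token, then instrumental token
    return merged

def split_dual_tokens(merged: List[int]) -> Tuple[List[int], List[int]]:
    """Undo the interleaving to recover the two aligned streams."""
    return merged[0::2], merged[1::2]

# Toy example: three frames of codec IDs per track.
seq = interleave_dual_tokens([101, 102, 103], [201, 202, 203])
print(seq)                     # [101, 201, 102, 202, 103, 203]
print(split_dual_tokens(seq))  # ([101, 102, 103], [201, 202, 203])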

    Another technique used in YuE is Lyrics-Chain-of-Thoughts (Lyrics-CoT), which lets the model work through the song progressively, section by section, so the lyrical content stays consistent and meaningful throughout. YuE also follows a structured three-stage training scheme that improves scalability, musicality, and lyric control: the model can generate songs of varying lengths and complexities, the music sounds more natural, and the generated audio aligns more closely with the lyrics and the overall song structure.
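
    As an illustration of what section-by-section conditioning might look like in practice, the snippet below builds a structured lyric sheet with explicit section tags and walks through it one segment at a time. The [verse]/[chorus] tags, the sample lyrics, and the splitting helper are assumptions chosen for clarity, not YuE's documented prompt format.

# Illustrative only: a structured lyric sheet split into sections, mirroring the
# idea of conditioning generation segment by segment. Tags, lyrics, and the helper
# are assumptions for clarity, not YuE's documented prompt format.
LYRICS = """\
[verse]
City lights are fading but the melody remains
Every step I'm taking echoes softly in the rain

[chorus]
Hold on to the chorus, let it carry us tonight
Every broken silence turns to harmony and light
"""

def split_sections(lyrics: str) -> list:
    """Split a structured lyric sheet into (tag, text) sections."""
    sections = []
    for block in lyrics.strip().split("\n\n"):
        tag, _, body = block.partition("\n")
        sections.append((tag.strip("[] "), body.strip()))
    return sections

for tag, text in split_sections(LYRICS):
    print(f"-- condition the next segment on: [{tag}] --")
    print(text)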

    YuE stands out from prior AI-based music generation models because it can generate full-length songs that combine vocal melodies with instrumental accompaniment. Unlike existing models that struggle with long-form compositions, YuE maintains musical coherence across an entire song. The generated vocals follow natural singing patterns and tonal shifts, which keeps the music engaging, while the instrumental elements stay carefully aligned with the vocal track, producing a natural and balanced song. The model family also supports multiple musical genres and languages.

    In terms of usage, the YuE models are designed to run on high-performance GPUs for seamless full-song generation. At least 80 GB of GPU memory (e.g., an NVIDIA A100) is recommended for best results, and depending on the GPU, generating a 30-second segment typically takes 150-360 seconds. Users can leverage the Hugging Face Transformers library to generate music with YuE, and the models also support Music In-Context Learning (ICL), which lets users provide a reference song so the AI can generate new music in a similar style.
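
    Since the article points to the Hugging Face Transformers library, a minimal loading sketch might look like the following. The m-a-p/YuE-s1-7B-anneal-en-cot repository ID, the prompt text, and the generation settings are assumptions, and the sketch stops at raw token IDs: turning them into audio requires the matching audio codec plus the stage-2 and upsampler models, for which the official inference scripts are the reference.

# Minimal sketch of loading a YuE stage-1 checkpoint with Hugging Face Transformers.
# Repository ID, prompt, and settings are assumptions; the generated IDs are
# audio/lyric tokens that still need the matching codec (and the stage-2 and
# upsampler models) to become a waveform, which is beyond this sketch.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "m-a-p/YuE-s1-7B-anneal-en-cot"  # assumed repo ID; check the model card

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # a large-memory GPU (~80 GB, e.g. A100) is recommended
    device_map="auto",
)

prompt = "[verse]\nCity lights are fading but the melody remains\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=512, do_sample=True, top_p=0.95)

print(output_ids.shape)  # token IDs only; decoding to audio needs the later stages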

    YuE is released under a Creative Commons Attribution-NonCommercial 4.0 license, which lets artists and content creators sample, modify, and incorporate its outputs into their work as long as the model is credited as YuE by HKUST/M-A-P. This opens the door to numerous applications for AI-generated music: assisting musicians and composers with song ideas and full-length compositions, creating soundtracks for films, video games, and virtual content, generating customized songs from user-provided lyrics or themes, and supporting music education by demonstrating AI-generated compositions across styles and languages.

    In conclusion, YuE represents a breakthrough in AI-powered music generation, addressing the long-standing challenges of lyrics-to-song conversion. With its advanced techniques, scalable architecture, and open-source approach, YuE is set to redefine the landscape of AI-driven music production. As further enhancements and community contributions emerge, YuE has the potential to become the leading foundation model for full-song generation.

