
    Qwen AI Releases Qwen2.5-7B-Instruct-1M and Qwen2.5-14B-Instruct-1M: Allowing Deployment with Context Length up to 1M Tokens

    January 27, 2025

    The advancements in large language models (LLMs) have significantly enhanced natural language processing (NLP), enabling capabilities like contextual understanding, code generation, and reasoning. However, a key limitation persists: the restricted context window size. Most LLMs can only process a fixed amount of text, typically up to 128K tokens, which limits their ability to handle tasks requiring extensive context, such as analyzing lengthy documents or debugging large codebases. These constraints often necessitate workarounds like text chunking, increasing computational complexity. Overcoming these challenges requires models that can extend context lengths efficiently without compromising performance.
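    To make the cost of the chunking workaround concrete, here is a minimal sketch of naive overlapping chunking; word counts stand in for real tokenization, which is an illustrative simplification.

```python
# Minimal sketch of the text-chunking workaround: split a long
# document into overlapping windows that each fit a fixed budget.
# Word counts approximate tokens here; a real pipeline would use
# the model's tokenizer.

def chunk_text(text: str, max_tokens: int = 128_000, overlap: int = 1_000):
    """Split text into overlapping chunks of at most max_tokens words."""
    words = text.split()
    step = max_tokens - overlap
    return [" ".join(words[i:i + max_tokens]) for i in range(0, len(words), step)]

# Each chunk needs its own forward pass, and any dependency spanning
# two chunks is lost -- the limitation that 1M-token models aim to remove.
```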

    Qwen AI’s Latest Release

    Qwen AI has introduced two new models, Qwen2.5-7B-Instruct-1M and Qwen2.5-14B-Instruct-1M, designed to support context lengths of up to 1 million tokens. Developed by the Qwen team at Alibaba Group, the models ship with an open-source inference framework optimized for handling long contexts, letting developers and researchers process far longer inputs in a single pass instead of resorting to chunking. The models also feature improvements in sparse attention mechanisms and kernel optimization, resulting in faster processing times for extended inputs.
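    As a quick illustration, the models load like any other chat model with Hugging Face transformers. Below is a minimal sketch, assuming the standard Qwen repo naming on the Hub; note that a default single-GPU setup cannot hold anywhere near a full million-token context.

```python
# Minimal sketch: load the 7B variant and run a short chat turn.
# Repo ID assumes standard Qwen naming on the Hugging Face Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct-1M"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Summarize the report below. ..."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```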

    Technical Details and Benefits

    The Qwen2.5-1M series retains a Transformer-based architecture, incorporating features like Grouped Query Attention (GQA), Rotary Positional Embeddings (RoPE), and RMSNorm for stability over long contexts. Training involved both natural and synthetic datasets, with tasks like Fill-in-the-Middle (FIM), paragraph reordering, and position-based retrieval strengthening the models' ability to handle long-range dependencies. Sparse attention methods such as Dual Chunk Attention (DCA) enable efficient inference by dividing sequences into manageable chunks. A progressive pre-training strategy that gradually scales context lengths from 4K to 1M tokens keeps computational demands under control. The models are fully compatible with vLLM's open-source inference framework, simplifying integration for developers.
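    Since the post highlights vLLM compatibility, a hedged serving sketch follows. The exact flags needed to reach the full 1M-token window, and whether Qwen's customized vLLM branch is required for the sparse-attention speedups, are assumptions here; the context length is deliberately capped to something a multi-GPU node can hold.

```python
# Hedged sketch: serve the model with vLLM. Flags for the full
# 1M-token window are assumptions; cap max_model_len to fit memory.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct-1M",
    max_model_len=262_144,      # raise toward 1M as memory allows
    tensor_parallel_size=4,     # shard across GPUs for long contexts
)

params = SamplingParams(temperature=0.7, max_tokens=512)
outputs = llm.generate(["<long document here>\n\nQuestion: ..."], params)
print(outputs[0].outputs[0].text)
```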

    Results and Insights

    Benchmark results demonstrate the capabilities of the Qwen2.5-1M models. In the Passkey Retrieval Test, the 7B and 14B variants successfully retrieved hidden information from contexts of 1 million tokens, showcasing their effectiveness in long-context scenarios. In other benchmarks, including RULER and Needle in a Haystack (NIAH), the 14B model outperformed alternatives like GPT-4o-mini and Llama-3. Sparse attention techniques reduced inference times, achieving speedups of up to 6.7x on NVIDIA H20 GPUs. These results highlight the models' ability to combine efficiency with high performance, making them suitable for real-world applications requiring extensive context.
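    For readers unfamiliar with passkey retrieval, the sketch below shows how such a test prompt is typically built: a short needle holding a random key is buried at an arbitrary depth in filler text, and the model passes if its reply contains the key. The filler and phrasing here are illustrative, not the benchmark's exact setup.

```python
# Illustrative passkey-retrieval prompt builder (not the exact
# benchmark): bury a needle at a random depth in filler text.
import random

def build_passkey_prompt(total_words: int = 100_000, passkey: str = "71432") -> str:
    filler = "The grass is green. The sky is blue. The sun is bright. "  # 12 words
    needle = f" The passkey is {passkey}. Remember it. "
    haystack = filler * (total_words // 12)
    insert_at = random.randrange(len(haystack))
    context = haystack[:insert_at] + needle + haystack[insert_at:]
    return context + "\n\nWhat is the passkey? Answer with the number only."

prompt = build_passkey_prompt()
# Feed `prompt` to the model; retrieval succeeds if the reply contains "71432".
```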

    Conclusion

    The Qwen2.5-1M series addresses critical limitations in NLP by significantly extending context lengths while maintaining efficiency and accessibility. By overcoming constraints that have long hindered LLMs, these models open new possibilities for applications ranging from analyzing large datasets to processing entire code repositories. With innovations in sparse attention, kernel optimization, and long-context pre-training, Qwen2.5-1M offers a practical and effective tool for tackling complex, context-heavy tasks.


    Check out the Paper, Models on Hugging Face and Technical Details. All credit for this research goes to the researchers of this project.


    The post Qwen AI Releases Qwen2.5-7B-Instruct-1M and Qwen2.5-14B-Instruct-1M: Allowing Deployment with Context Length up to 1M Tokens appeared first on MarkTechPost.
