
    Qwen AI Introduces Qwen2.5-Max: A large MoE LLM Pretrained on Massive Data and Post-Trained with Curated SFT and RLHF Recipes

    January 29, 2025

    The field of artificial intelligence is evolving rapidly, with increasing efforts to develop more capable and efficient language models. However, scaling these models comes with challenges, particularly regarding computational resources and the complexity of training. The research community is still exploring best practices for scaling extremely large models, whether they use a dense or Mixture-of-Experts (MoE) architecture. Until recently, many details about this process were not widely shared, making it difficult to refine and improve large-scale AI systems.

Qwen AI aims to address these challenges with Qwen2.5-Max, a large MoE model pretrained on over 20 trillion tokens and post-trained with Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF). This post-training aligns the model more closely with human expectations while keeping the scaling process efficient.
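To give a concrete sense of how such a model is used in practice, here is a minimal, hedged sketch of querying Qwen2.5-Max through an OpenAI-compatible chat endpoint. The `base_url` and model identifier below are assumptions based on Alibaba Cloud's DashScope service and may differ from the current official values; consult the technical details linked at the end of this article.

```python
# Hedged sketch: calling Qwen2.5-Max via an OpenAI-compatible endpoint.
# The base_url and model name are ASSUMPTIONS (DashScope-style values);
# verify them against the official Qwen documentation before use.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DASHSCOPE_API_KEY",  # placeholder; supply your own key
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

response = client.chat.completions.create(
    model="qwen-max-2025-01-25",  # assumed identifier for Qwen2.5-Max
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain Mixture-of-Experts in two sentences."},
    ],
)
print(response.choices[0].message.content)
```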

Technically, Qwen2.5-Max utilizes a Mixture-of-Experts architecture, allowing it to activate only a subset of its parameters for each token during inference. This optimizes computational efficiency while maintaining performance. The extensive pretraining phase provides a strong foundation of knowledge, while SFT and RLHF refine the model’s ability to generate coherent and relevant responses. These techniques help improve the model’s reasoning and usability across various applications.
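To illustrate the idea, here is a generic, simplified sketch of top-k expert routing in PyTorch. This is not Qwen's actual implementation; the layer sizes, expert count, and top-k value are arbitrary assumptions chosen for readability.

```python
# Simplified Mixture-of-Experts layer: each token is routed to its top-k
# experts, so only a fraction of the layer's parameters is active per token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model: int, d_ff: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, num_experts)  # router scores per expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        scores = F.softmax(self.gate(x), dim=-1)            # (tokens, experts)
        weights, indices = scores.topk(self.top_k, dim=-1)  # pick top-k experts
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, k] == e       # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

tokens = torch.randn(4, 512)
layer = MoELayer(d_model=512, d_ff=2048)
print(layer(tokens).shape)  # torch.Size([4, 512])
```

Because each token passes through only `top_k` of the `num_experts` expert networks, compute per token stays close to that of a much smaller dense model, which is the efficiency property described above.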

Qwen2.5-Max has been evaluated against leading models on benchmarks such as MMLU-Pro, LiveCodeBench, LiveBench, Arena-Hard, and GPQA-Diamond. The results suggest it performs competitively, surpassing DeepSeek V3 on Arena-Hard, LiveBench, LiveCodeBench, and GPQA-Diamond. Its performance on MMLU-Pro is also strong, highlighting its capabilities in knowledge retrieval, coding tasks, and broader AI applications.

    In summary, Qwen2.5-Max presents a thoughtful approach to scaling language models while maintaining efficiency and performance. By leveraging a MoE architecture and strategic post-training methods, it addresses key challenges in AI model development. As AI research progresses, models like Qwen2.5-Max demonstrate how thoughtful data use and training techniques can lead to more capable and reliable AI systems.


Check out the Demo on Hugging Face and the Technical Details. All credit for this research goes to the researchers of this project.


    The post Qwen AI Introduces Qwen2.5-Max: A large MoE LLM Pretrained on Massive Data and Post-Trained with Curated SFT and RLHF Recipes appeared first on MarkTechPost.

