Close Menu
    DevStackTipsDevStackTips
    • Home
    • News & Updates
      1. Tech & Work
      2. View All

      Sunshine And March Vibes (2025 Wallpapers Edition)

      May 31, 2025

      The Case For Minimal WordPress Setups: A Contrarian View On Theme Frameworks

      May 31, 2025

      How To Fix Largest Contentful Paint Issues With Subpart Analysis

      May 31, 2025

      How To Prevent WordPress SQL Injection Attacks

      May 31, 2025

      Windows 11 version 25H2: Everything you need to know about Microsoft’s next OS release

      May 31, 2025

      Elden Ring Nightreign already has a duos Seamless Co-op mod from the creator of the beloved original, and it’ll be “expanded on in the future”

      May 31, 2025

      I love Elden Ring Nightreign’s weirdest boss — he bargains with you, heals you, and throws tantrums if you ruin his meditation

      May 31, 2025

      How to install SteamOS on ROG Ally and Legion Go Windows gaming handhelds

      May 31, 2025
    • Development
      1. Algorithms & Data Structures
      2. Artificial Intelligence
      3. Back-End Development
      4. Databases
      5. Front-End Development
      6. Libraries & Frameworks
      7. Machine Learning
      8. Security
      9. Software Engineering
      10. Tools & IDEs
      11. Web Design
      12. Web Development
      13. Web Security
      14. Programming Languages
        • PHP
        • JavaScript
      Featured

      Oracle Fusion new Product Management Landing Page and AI (25B)

      May 31, 2025
      Recent

      Oracle Fusion new Product Management Landing Page and AI (25B)

      May 31, 2025

      Filament Is Now Running Natively on Mobile

      May 31, 2025

      How Remix is shaking things up

      May 30, 2025
    • Operating Systems
      1. Windows
      2. Linux
      3. macOS
      Featured

      Windows 11 version 25H2: Everything you need to know about Microsoft’s next OS release

      May 31, 2025
      Recent

      Windows 11 version 25H2: Everything you need to know about Microsoft’s next OS release

      May 31, 2025

      Elden Ring Nightreign already has a duos Seamless Co-op mod from the creator of the beloved original, and it’ll be “expanded on in the future”

      May 31, 2025

      I love Elden Ring Nightreign’s weirdest boss — he bargains with you, heals you, and throws tantrums if you ruin his meditation

      May 31, 2025
    • Learning Resources
      • Books
      • Cheatsheets
      • Tutorials & Guides
    Home»Development»Machine Learning»DeepSeek-AI Releases DeepSeek-R1-Zero and DeepSeek-R1: First-Generation Reasoning Models that Incentivize Reasoning Capability in LLMs via Reinforcement Learning

    DeepSeek-AI Releases DeepSeek-R1-Zero and DeepSeek-R1: First-Generation Reasoning Models that Incentivize Reasoning Capability in LLMs via Reinforcement Learning

    January 21, 2025

    Large Language Models (LLMs) have made significant progress in natural language processing, excelling in tasks like understanding, generation, and reasoning. However, challenges remain. Achieving robust reasoning often requires extensive supervised fine-tuning, which limits scalability and generalization. Furthermore, issues like poor readability and balancing computational efficiency with reasoning complexity persist, prompting researchers to explore new approaches.

    DeepSeek-R1: A New Approach to LLM Reasoning

    DeepSeek-AI’s recent work introduces DeepSeek-R1, a model designed to enhance reasoning capabilities through reinforcement learning (RL). This effort resulted in two models:

    • DeepSeek-R1-Zero, which is trained solely with RL and demonstrates emergent reasoning behaviors such as long Chain-of-Thought (CoT) reasoning.
    • DeepSeek-R1, which builds on its predecessor by incorporating a multi-stage training pipeline, addressing challenges like readability and language mixing while maintaining high reasoning performance.

    These models aim to overcome existing limitations, combining innovative RL techniques with structured training processes to achieve scalability and usability.

    Technical Innovations and Benefits

    1. Reinforcement Learning on Reasoning Tasks: DeepSeek-R1-Zero employs RL without relying on supervised data. Using Group Relative Policy Optimization (GRPO), it optimizes reasoning by evaluating multiple outputs, significantly improving benchmark performance. For example, its AIME 2024 pass@1 score rose from 15.6% to 71.0% during training.

    2. Multi-Stage Training in DeepSeek-R1: DeepSeek-R1 incorporates cold-start data—thousands of curated CoT examples—to fine-tune its base model before undergoing reasoning-focused RL. This process ensures outputs are both coherent and user-friendly by incorporating language consistency rewards.

    3. Distillation for Smaller Models: To address computational constraints, DeepSeek-AI distilled six smaller models (1.5B to 70B parameters) from DeepSeek-R1 using Qwen and Llama architectures. These models retain strong reasoning capabilities, with the 14B distilled model achieving a pass@1 score of 69.7% on AIME 2024, outperforming some larger models.

    Results: Performance Insights

    DeepSeek-R1’s performance is supported by benchmark results:

    • Reasoning Benchmarks:
      • AIME 2024: 79.8% pass@1, surpassing OpenAI’s o1-mini.
      • MATH-500: 97.3% pass@1, comparable to OpenAI-o1-1217.
      • GPQA Diamond: 71.5% pass@1, excelling in fact-based reasoning.
    • Coding and STEM Tasks:
      • Codeforces Elo rating: 2029, outperforming 96.3% of human participants.
      • SWE-Bench Verified: 49.2% resolution rate, competitive with other leading models.
    • General Capabilities:
      • Strong generalization was demonstrated on ArenaHard and AlpacaEval 2.0 benchmarks, achieving 92.3% and 87.6% win rates, respectively.

    Distilled Model Highlights: Smaller models like DeepSeek-R1-Distill-Qwen-32B show strong performance, with a pass@1 score of 72.6% on AIME 2024, demonstrating effective scalability and practicality.

    Conclusion: Refining Reasoning in AI

    DeepSeek-AI’s DeepSeek-R1 and DeepSeek-R1-Zero represent meaningful advancements in reasoning capabilities for LLMs. By leveraging RL, cold-start data, and distillation techniques, these models address critical limitations while promoting accessibility through open-source availability under the MIT License. The API (‘model=deepseek-reasoner’) further enhances usability for developers and researchers.

    Looking ahead, DeepSeek-AI plans to refine multilingual support, enhance software engineering capabilities, and improve prompt sensitivity. These efforts aim to further establish DeepSeek-R1 as a robust solution for reasoning-focused AI applications. By integrating thoughtful training paradigms, DeepSeek-R1 illustrates how AI can advance toward addressing increasingly complex challenges.


    Check out the Paper, DeepSeek R1 and DeepSeek R1 Zero. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. Don’t Forget to join our 65k+ ML SubReddit.

    🚨 [Recommended Read] Nebius AI Studio expands with vision models, new language models, embeddings and LoRA (Promoted)

    The post DeepSeek-AI Releases DeepSeek-R1-Zero and DeepSeek-R1: First-Generation Reasoning Models that Incentivize Reasoning Capability in LLMs via Reinforcement Learning appeared first on MarkTechPost.

    Source: Read More 

    Hostinger
    Facebook Twitter Reddit Email Copy Link
    Previous Articletimer-cli – countdown timer
    Next Article Generative AI versus Predictive AI

    Related Posts

    Machine Learning

    How to Evaluate Jailbreak Methods: A Case Study with the StrongREJECT Benchmark

    May 31, 2025
    Machine Learning

    Cisco’s Latest AI Agents Report Details the Transformative Impact of Agentic AI on Customer Experience

    May 31, 2025
    Leave A Reply Cancel Reply

    Continue Reading

    AI-Powered SaaS Security: Keeping Pace with an Expanding Attack Surface

    Development

    State Spies Exploited Cisco Zero-Days to Intrude Government Networks

    Development

    Someone made Elon’s ‘map tab’ and if it were real, Path of Exile 2 could make a lot of cash

    News & Updates

    Microsoft is rolling out “Think Deeper” to free Copilot, and results are insane

    Operating Systems

    Highlights

    Most UK Software Buyers Regret Their Purchases Because of Hidden Costs, Research Finds

    February 5, 2025

    Hidden fees, onboarding issues, and poor planning lead to software regret for UK buyers. Discover…

    Top 7 Business Benefits of ISO 20022 Adoption for Banks

    December 17, 2024

    How to investigate the online vs offline performance for DNN models

    December 20, 2024

    Neha Pasi Leads Global Development for Perficient’s Sitecore Team with Precision and Passion

    January 13, 2025
    © DevStackTips 2025. All rights reserved.
    • Contact
    • Privacy Policy

    Type above and press Enter to search. Press Esc to cancel.