
    DeepSeek-AI Releases DeepSeek-R1-Zero and DeepSeek-R1: First-Generation Reasoning Models that Incentivize Reasoning Capability in LLMs via Reinforcement Learning

    January 21, 2025

    Large Language Models (LLMs) have made significant progress in natural language processing, excelling in tasks like understanding, generation, and reasoning. However, challenges remain. Achieving robust reasoning often requires extensive supervised fine-tuning, which limits scalability and generalization. Furthermore, issues like poor readability and balancing computational efficiency with reasoning complexity persist, prompting researchers to explore new approaches.

    DeepSeek-R1: A New Approach to LLM Reasoning

    DeepSeek-AI’s recent work introduces DeepSeek-R1, a model designed to enhance reasoning capabilities through reinforcement learning (RL). This effort resulted in two models:

    • DeepSeek-R1-Zero, which is trained solely with RL and demonstrates emergent reasoning behaviors such as long Chain-of-Thought (CoT) reasoning.
    • DeepSeek-R1, which builds on its predecessor by incorporating a multi-stage training pipeline, addressing challenges like readability and language mixing while maintaining high reasoning performance.

    These models aim to overcome existing limitations, combining innovative RL techniques with structured training processes to achieve scalability and usability.

    Technical Innovations and Benefits

    1. Reinforcement Learning on Reasoning Tasks: DeepSeek-R1-Zero employs RL without relying on supervised data. Using Group Relative Policy Optimization (GRPO), it optimizes reasoning by evaluating multiple outputs, significantly improving benchmark performance. For example, its AIME 2024 pass@1 score rose from 15.6% to 71.0% during training.
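
    To make the group-relative idea concrete, here is a minimal sketch (in PyTorch) of how GRPO scores each sampled answer against the statistics of its own group, removing the need for a learned value network. Tensor shapes and the clipping constant are illustrative only; the full method also applies a per-token KL penalty against a reference policy, omitted here.

    ```python
    import torch

    def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
        """Group-relative advantages: each of the group_size outputs sampled
        for a prompt is normalized by its own group's mean and std, so no
        critic is needed. `rewards` has shape (num_prompts, group_size)."""
        mean = rewards.mean(dim=1, keepdim=True)
        std = rewards.std(dim=1, keepdim=True)
        return (rewards - mean) / (std + 1e-8)

    def grpo_loss(logp_new, logp_old, advantages, clip_eps=0.2):
        """PPO-style clipped surrogate over per-output log-probabilities,
        driven by the group-relative advantages above."""
        ratio = torch.exp(logp_new - logp_old)
        clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
        return -torch.min(ratio * advantages, clipped * advantages).mean()
    ```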

    2. Multi-Stage Training in DeepSeek-R1: DeepSeek-R1 incorporates cold-start data—thousands of curated CoT examples—to fine-tune its base model before undergoing reasoning-focused RL. This process ensures outputs are both coherent and user-friendly by incorporating language consistency rewards.
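
    A toy version of such a language consistency reward might simply score what fraction of the chain of thought stays in the target language. The actual reward is described as the proportion of target-language words in the CoT; the heuristic below (ASCII as a crude stand-in for English) only illustrates the shape of the signal:

    ```python
    import re

    def language_consistency_reward(cot_text: str) -> float:
        """Fraction of whitespace-separated tokens that are pure ASCII,
        used here as a rough stand-in for 'written in the target language'.
        Returns a value in [0, 1] so it can be blended with task rewards."""
        words = re.findall(r"\S+", cot_text)
        if not words:
            return 0.0
        in_target = sum(1 for w in words if w.isascii())
        return in_target / len(words)
    ```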

    3. Distillation for Smaller Models: To address computational constraints, DeepSeek-AI distilled six smaller models (1.5B to 70B parameters) from DeepSeek-R1 using Qwen and Llama architectures. These models retain strong reasoning capabilities, with the 14B distilled model achieving a pass@1 score of 69.7% on AIME 2024, outperforming some larger models.
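
    The distillation step here is plain supervised fine-tuning of a smaller student on reasoning traces generated by DeepSeek-R1, rather than logit matching. Below is a hedged sketch using Hugging Face transformers; the dataset file and hyperparameters are hypothetical, and Qwen/Qwen2.5-14B stands in for one of the student bases:

    ```python
    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    base = "Qwen/Qwen2.5-14B"  # illustrative student architecture
    tokenizer = AutoTokenizer.from_pretrained(base)
    model = AutoModelForCausalLM.from_pretrained(base)

    # Hypothetical JSONL of {"text": prompt + R1-generated CoT + answer}.
    ds = load_dataset("json", data_files="r1_traces.jsonl")["train"]
    ds = ds.map(lambda b: tokenizer(b["text"], truncation=True, max_length=4096),
                batched=True, remove_columns=["text"])

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="r1-distill", num_train_epochs=2,
                               per_device_train_batch_size=1, bf16=True),
        train_dataset=ds,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()
    ```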

    Results: Performance Insights

    DeepSeek-R1’s performance is supported by benchmark results:

    • Reasoning Benchmarks:
      • AIME 2024: 79.8% pass@1, surpassing OpenAI’s o1-mini.
      • MATH-500: 97.3% pass@1, comparable to OpenAI's o1-1217.
      • GPQA Diamond: 71.5% pass@1, excelling in fact-based reasoning.
    • Coding and STEM Tasks:
      • Codeforces Elo rating: 2029, outperforming 96.3% of human participants.
      • SWE-Bench Verified: 49.2% resolution rate, competitive with other leading models.
    • General Capabilities:
      • Strong generalization was demonstrated on ArenaHard and AlpacaEval 2.0 benchmarks, achieving 92.3% and 87.6% win rates, respectively.

    Distilled Model Highlights: Smaller models like DeepSeek-R1-Distill-Qwen-32B show strong performance, with a pass@1 score of 72.6% on AIME 2024, demonstrating effective scalability and practicality.
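
    Since pass@1 anchors most of these comparisons, here is a small sketch of the estimator as such reports typically compute it: sample several answers per problem at non-zero temperature, take the per-problem fraction that is correct, then average across problems. Treat this as an assumption about the evaluation protocol rather than a verbatim reproduction of it.

    ```python
    def pass_at_1(results: list[list[bool]]) -> float:
        """`results[i][j]` is True iff the j-th sampled answer to problem i
        is correct. pass@1 averages each problem's per-sample accuracy,
        then averages across problems."""
        per_problem = [sum(r) / len(r) for r in results]
        return sum(per_problem) / len(per_problem)

    # e.g. two problems, four samples each -> (0.75 + 0.25) / 2 = 0.5
    print(pass_at_1([[True, True, False, True], [False, False, True, False]]))
    ```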

    Conclusion: Refining Reasoning in AI

    DeepSeek-AI’s DeepSeek-R1 and DeepSeek-R1-Zero represent meaningful advancements in reasoning capabilities for LLMs. By leveraging RL, cold-start data, and distillation techniques, these models address critical limitations while promoting accessibility through open-source availability under the MIT License. The API, accessed by requesting model=deepseek-reasoner, further enhances usability for developers and researchers.
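
    As a sketch of that API usage: DeepSeek's endpoint is OpenAI-compatible, so the standard openai Python client can call the reasoner model. The base URL and the reasoning_content field follow DeepSeek's public documentation, but verify them against the current docs before relying on this:

    ```python
    from openai import OpenAI

    # API key is a placeholder; base_url per DeepSeek's published docs.
    client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY",
                    base_url="https://api.deepseek.com")

    resp = client.chat.completions.create(
        model="deepseek-reasoner",
        messages=[{"role": "user",
                   "content": "How many primes are there between 10 and 30?"}],
    )

    # The reasoner returns its chain of thought separately from the answer.
    print(resp.choices[0].message.reasoning_content)  # CoT (DeepSeek extension)
    print(resp.choices[0].message.content)            # final answer
    ```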

    Looking ahead, DeepSeek-AI plans to refine multilingual support, enhance software engineering capabilities, and reduce prompt sensitivity. These efforts aim to further establish DeepSeek-R1 as a robust solution for reasoning-focused AI applications, and they illustrate how carefully staged training pipelines can push LLMs toward increasingly complex reasoning challenges.


    Check out the Paper, DeepSeek R1 and DeepSeek R1 Zero. All credit for this research goes to the researchers of this project.
