Close Menu
    DevStackTipsDevStackTips
    • Home
    • News & Updates
      1. Tech & Work
      2. View All

      Sunshine And March Vibes (2025 Wallpapers Edition)

      May 14, 2025

      The Case For Minimal WordPress Setups: A Contrarian View On Theme Frameworks

      May 14, 2025

      How To Fix Largest Contentful Paint Issues With Subpart Analysis

      May 14, 2025

      How To Prevent WordPress SQL Injection Attacks

      May 14, 2025

      I test a lot of AI coding tools, and this stunning new OpenAI release just saved me days of work

      May 14, 2025

      How to use your Android phone as a webcam when your laptop’s default won’t cut it

      May 14, 2025

      The 5 most customizable Linux desktop environments – when you want it your way

      May 14, 2025

      Gen AI use at work saps our motivation even as it boosts productivity, new research shows

      May 14, 2025
    • Development
      1. Algorithms & Data Structures
      2. Artificial Intelligence
      3. Back-End Development
      4. Databases
      5. Front-End Development
      6. Libraries & Frameworks
      7. Machine Learning
      8. Security
      9. Software Engineering
      10. Tools & IDEs
      11. Web Design
      12. Web Development
      13. Web Security
      14. Programming Languages
        • PHP
        • JavaScript
      Featured

      Strategic Cloud Partner: Key to Business Success, Not Just Tech

      May 14, 2025
      Recent

      Strategic Cloud Partner: Key to Business Success, Not Just Tech

      May 14, 2025

      Perficient’s “What If? So What?” Podcast Wins Gold at the 2025 Hermes Creative Awards

      May 14, 2025

      PIM for Azure Resources

      May 14, 2025
    • Operating Systems
      1. Windows
      2. Linux
      3. macOS
      Featured

      Windows 11 24H2’s Settings now bundles FAQs section to tell you more about your system

      May 14, 2025
      Recent

      Windows 11 24H2’s Settings now bundles FAQs section to tell you more about your system

      May 14, 2025

      You can now share an app/browser window with Copilot Vision to help you with different tasks

      May 14, 2025

      Microsoft will gradually retire SharePoint Alerts over the next two years

      May 14, 2025
    • Learning Resources
      • Books
      • Cheatsheets
      • Tutorials & Guides
    Home»Development»DeepSeek-AI Open-Sources DeepSeek-Prover-V1.5: A Language Model with 7 Billion Parameters that Outperforms all Open-Source Models in Formal Theorem Proving in Lean 4

    DeepSeek-AI Open-Sources DeepSeek-Prover-V1.5: A Language Model with 7 Billion Parameters that Outperforms all Open-Source Models in Formal Theorem Proving in Lean 4

    August 17, 2024

    Large language models (LLMs) have made significant strides in mathematical reasoning and theorem proving, yet they face considerable challenges in formal theorem proving using systems like Lean and Isabelle. These systems demand rigorous derivations that adhere to strict formal specifications, posing difficulties even for advanced models such as GPT-4. The core challenge lies in the model’s need to simultaneously comprehend the syntax and semantics of formal systems while aligning abstract mathematical reasoning with precise formal representations. This complex task requires a deep understanding of coding intricacies and mathematical concepts, creating a significant hurdle for current AI systems in producing complex formal proofs.

    Researchers from DeepSeek-AI introduced DeepSeek-Prover-V1.5, a unified approach that combines the strengths of proof-step and whole-proof generation techniques through a robust truncate-and-resume mechanism. This method begins with whole-proof generation, where the language model produces complete proof code based on the theorem statement. The Lean prover then verifies this code. If an error is detected, the code is truncated at the first error message, and the successfully generated portion serves as a prompt for the next proof segment. The latest state from the Lean 4 prover is appended as a comment to the prompt to enhance accuracy. The truncate-and-resume mechanism is integrated into the Monte-Carlo tree search (MCTS), allowing for flexible truncation points determined by the tree search policy. Also, a reward-free exploration algorithm is proposed to address the reward sparsity issue in proof search, assigning intrinsic motivation to the tree search agent for extensive exploration of the tactic state space.

    This study presents the following contributions:

    • Pre-Training: Enhanced base model with further training on mathematics and code data, focusing on formal languages like Lean, Isabelle, and Metamath.

    • Supervised Fine-Tuning: Improved Lean 4 code completion dataset through two data augmentation techniques:

      1. Used DeepSeek-Coder V2 236B to add natural language chain-of-thought comments.

      2. Inserted intermediate tactic state information within Lean 4 proof code.

    • Reinforcement Learning: Employed GRPO algorithm for reinforcement learning from proof assistant feedback (RLPAF), using Lean prover verification results as rewards.

    • Monte-Carlo Tree Search: Advanced tree search method with:

     1. Truncate-and-resume mechanism as state-action abstraction.

     2. RMaxTS algorithm, utilizing RMax strategy for exploration in sparse-reward proof search.

     3. Assigned intrinsic rewards to encourage diverse planning paths and extensive proof space exploration.

    DeepSeek-Prover-V1.5 demonstrates significant advancements in formal theorem proving across multiple benchmarks. On the miniF2F-test dataset, DeepSeek-Prover-V1.5-RL achieved a 60.2% pass rate in a single-pass whole-proof generation, marking a 10.2 percentage point improvement over its predecessor. With a limited sampling budget of 128 attempts, it proved 51.6% of problems, outperforming other whole-proof generation methods and matching leading tree search methods. When enhanced with RMaxTS tree search, DeepSeek-Prover-V1.5-RL achieved a state-of-the-art 62.7% pass rate. Also, it surpassed the previous best result with significantly fewer samplings. On the ProofNet dataset, DeepSeek-Prover-V1.5-RL achieved pass rates of 22.6% and 25.3% in single-pass and RMaxTS-enhanced settings respectively, outperforming existing methods. These results demonstrate DeepSeek-Prover-V1.5’s superior performance across different theorem-proving tasks and methodologies.

    DeepSeek-Prover-V1.5, a 7 billion parameter language model, sets new benchmarks in formal theorem proving using Lean 4. Built on DeepSeek-Prover-V1.5-Base, it undergoes specialized pre-training, comprehensive supervised fine-tuning, and reinforcement learning via GRPO. The model incorporates RMaxTS, an innovative Monte-Carlo tree search variant, to enhance problem-solving through extensive exploration. This framework establishes an AlphaZero-like pipeline for formal theorem proving, utilizing expert iteration and synthetic data. While the current focus is on exploration, future developments may include a critic model for assessing incomplete proofs, addressing the exploitation aspect of reinforcement learning in theorem proving.

    Check out the Paper and GitHub. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. If you like our work, you will love our newsletter..

    Don’t Forget to join our 48k+ ML SubReddit

    Find Upcoming AI Webinars here

    The post DeepSeek-AI Open-Sources DeepSeek-Prover-V1.5: A Language Model with 7 Billion Parameters that Outperforms all Open-Source Models in Formal Theorem Proving in Lean 4 appeared first on MarkTechPost.

    Source: Read More 

    Facebook Twitter Reddit Email Copy Link
    Previous ArticleMomento Migrates Object Cache as a Service to Ampere Altra
    Next Article Marqo Releases Marqo-FashionCLIP and Marqo-FashionSigLIP: A Family of Embedding Models for E-Commerce and Retail

    Related Posts

    Machine Learning

    Georgia Tech and Stanford Researchers Introduce MLE-Dojo: A Gym-Style Framework Designed for Training, Evaluating, and Benchmarking Autonomous Machine Learning Engineering (MLE) Agents

    May 15, 2025
    Machine Learning

    A Step-by-Step Guide to Build an Automated Knowledge Graph Pipeline Using LangGraph and NetworkX

    May 15, 2025
    Leave A Reply Cancel Reply

    Continue Reading

    Mastering Buttons in CSS

    Web Development

    Make relevant movie recommendations using Amazon Neptune, Amazon Neptune Machine Learning, and Amazon OpenSearch Service

    Databases

    Build a contextual text and image search engine for product recommendations using Amazon Bedrock and Amazon OpenSearch Serverless

    Development

    I used ChatGPT to translate image text when Google’s tool failed me – and things got weird

    News & Updates

    Highlights

    The Perfect Colour Palette For Your Website

    November 10, 2024

    In a competitive online world, your website must boast a color palette that appeals to…

    How to Create a Gooey Search Interaction with Framer Motion and React

    November 18, 2024
    “It’s literally tens ofmillions of hours.” Xbox CEO Phil Spencer celebrates Xbox Cloud Gaming’s “dramatic growth,” now with per-device usage charts.

    “It’s literally tens ofmillions of hours.” Xbox CEO Phil Spencer celebrates Xbox Cloud Gaming’s “dramatic growth,” now with per-device usage charts.

    April 19, 2025

    Checkbox screen reader announcement

    November 24, 2024
    © DevStackTips 2025. All rights reserved.
    • Contact
    • Privacy Policy

    Type above and press Enter to search. Press Esc to cancel.