Close Menu
    DevStackTipsDevStackTips
    • Home
    • News & Updates
      1. Tech & Work
      2. View All

      Upwork Freelancers vs Dedicated React.js Teams: What’s Better for Your Project in 2025?

      August 1, 2025

      Is Agile dead in the age of AI?

      August 1, 2025

      Top 15 Enterprise Use Cases That Justify Hiring Node.js Developers in 2025

      July 31, 2025

      The Core Model: Start FROM The Answer, Not WITH The Solution

      July 31, 2025

      Anthropic beats OpenAI as the top LLM provider for business – and it’s not even close

      August 2, 2025

      I bought Samsung’s Galaxy Watch Ultra 2025 – here’s why I have buyer’s remorse

      August 2, 2025

      I can admit when I’m wrong — this 75% wireless gaming keyboard is way better than I thought it would be

      August 2, 2025

      This is Microsoft’s canceled Windows-based Surface Duo — the dual-screen Windows Phone from 2018 that we never got

      August 2, 2025
    • Development
      1. Algorithms & Data Structures
      2. Artificial Intelligence
      3. Back-End Development
      4. Databases
      5. Front-End Development
      6. Libraries & Frameworks
      7. Machine Learning
      8. Security
      9. Software Engineering
      10. Tools & IDEs
      11. Web Design
      12. Web Development
      13. Web Security
      14. Programming Languages
        • PHP
        • JavaScript
      Featured

      The details of TC39’s last meeting

      August 2, 2025
      Recent

      The details of TC39’s last meeting

      August 2, 2025

      Enhancing Laravel Queries with Reusable Scope Patterns

      August 1, 2025

      Everything We Know About Livewire 4

      August 1, 2025
    • Operating Systems
      1. Windows
      2. Linux
      3. macOS
      Featured

      I can admit when I’m wrong — this 75% wireless gaming keyboard is way better than I thought it would be

      August 2, 2025
      Recent

      I can admit when I’m wrong — this 75% wireless gaming keyboard is way better than I thought it would be

      August 2, 2025

      This is Microsoft’s canceled Windows-based Surface Duo — the dual-screen Windows Phone from 2018 that we never got

      August 2, 2025

      Looking for an Ubuntu Manual? Try This Book

      August 2, 2025
    • Learning Resources
      • Books
      • Cheatsheets
      • Tutorials & Guides
    Home»Development»Machine Learning»DeepSeek-AI Released DeepSeek-Prover-V2: An Open-Source Large Language Model Designed for Formal Theorem, Proving through Subgoal Decomposition and Reinforcement Learning

    DeepSeek-AI Released DeepSeek-Prover-V2: An Open-Source Large Language Model Designed for Formal Theorem, Proving through Subgoal Decomposition and Reinforcement Learning

    May 1, 2025

    Formal mathematical reasoning has evolved into a specialized subfield of artificial intelligence that requires strict logical consistency. Unlike informal problem solving, which allows for intuition and loosely defined heuristics, formal theorem proving relies on every step being fully described, precise, and verifiable by computational systems. Proof assistants, such as Lean, Coq, and Isabelle, serve as the structural frameworks within which these formal proofs are constructed. Their operation demands logical soundness with no space for omissions, approximations, or unstated assumptions. This makes the challenge particularly demanding for AI systems, especially large language models, which excel in producing coherent natural language responses but typically lack the rigor to produce verifiable formal proofs. However, the desire to blend these strengths, AI’s fluency in informal reasoning and the structure of formal verification, has led to new innovations at the interface of language modeling and formal logic automation.

    A major issue arises from the inability of current language models to bridge the conceptual divide between informal and formal reasoning. Language models typically excel at generating human-like explanations and solving math problems written in natural language. However, this reasoning is inherently informal and often lacks the structural precision required by formal logic systems. While humans can intuitively leap from one deductive step to another, proof assistants require a fully specified sequence of steps, free of ambiguity. Thus, the challenge is to guide AI models to produce logically coherent formal outputs from their otherwise informal and intuitive internal reasoning processes. This problem becomes increasingly complex when handling advanced theorems from domains such as number theory or geometry, where precision is crucial.

    Recent efforts have attempted to address this issue by guiding models first to generate natural language proof sketches, which are then manually or semi-automatically translated into formal proof steps. A known strategy includes decomposing a complex theorem into smaller subgoals. Each subgoal represents a lemma that can be tackled independently and later combined to form a complete proof. Frameworks like “Draft, Sketch, and Prove” have applied this idea, using language models to generate proof outlines that are then translated into formal language. Another method employs hierarchical reinforcement learning, breaking down complex mathematical problems into simpler layers. However, these models often struggle to produce fully verifiable outputs in Lean or Coq environments. Moreover, the training data for these models is usually limited, and proof attempts frequently fail to yield successful outcomes that provide useful learning signals.

    A team of researchers from DeepSeek-AI has introduced a new model, DeepSeek-Prover-V2, designed to generate formal mathematical proofs by leveraging subgoal decomposition and reinforcement learning. The core of their approach utilizes DeepSeek-V3 to break down a complex theorem into manageable subgoals, each of which is translated into a “have” statement in Lean 4 with a placeholder indicating that the proof is incomplete. These subgoals are then passed to a 7B-sized prover model that completes each proof step. Once all steps are resolved, they are synthesized into a complete Lean proof and paired with the original natural language reasoning generated by DeepSeek-V3. This forms a rich cold-start dataset for reinforcement learning. Importantly, the model’s training is entirely bootstrapped from synthetic data, with no human-annotated proof steps used.

    The cold-start pipeline begins by prompting DeepSeek-V3 to create proof sketches in natural language. These sketches are transformed into formal theorem statements with unresolved parts. A key innovation lies in recursively solving each subgoal using the 7B prover, reducing computation costs while maintaining formal rigor. Researchers constructed a curriculum learning framework that increased the complexity of training tasks over time. They also implemented two types of subgoal theorems, one incorporating preceding subgoals as premises, and one treating them independently. This dual structure was embedded into the model’s expert iteration stage to train it on progressively more challenging problem sets. The model’s capability was then reinforced through a consistency-based reward system during training, ensuring that all decomposed lemmas were correctly incorporated into the final formal proof.

    On the MiniF2F-test benchmark, the model achieved an 88.9% pass rate with high sampling (Pass@8192), compared to 82.0% by Kimina-Prover and 64.7% by Geodel-Prover. It also solved 49 out of 658 problems from PutnamBench, a platform featuring challenging mathematical tasks. On the newly introduced ProverBench dataset, comprising 325 formalized problems, the model addressed 6 out of 15 issues from the AIME (American Invitational Mathematics Examination) competitions for the years 2024 and 2025. These benchmarks highlight the model’s generalization ability across multiple formal reasoning tasks. Even when compared to DeepSeek-V3, which employs natural-language reasoning, the new model demonstrates competitive performance, solving a comparable number of AIME problems while ensuring formal verifiability.

    Several Key Takeaways from the Research on DeepSeek-Prover-V2:

    • DeepSeek-Prover-V2 achieved an 88.9% pass rate on the MiniF2F-test (Pass@8192), the highest reported among formal reasoning models so far. 
    • The model successfully solved 49 out of 658 problems from the PutnamBench dataset, which contains advanced mathematical challenges. 
    • It tackled 6 out of 15 problems from the recent AIME 2024–2025 competitions, showcasing real-world applicability.
    • A new benchmark, ProverBench, comprising 325 formal problems, has been introduced for evaluating formal reasoning models. 
    • The pipeline unifies natural language proof sketching and formal proof construction by combining DeepSeek-V3 and a 7B prover model.  
    • Two types of subgoal decompositions—one with and one without dependent premises—were used to train the model in a structured, curriculum-guided manner. 
    • Reinforcement learning with a consistency-based reward significantly improved proof accuracy by enforcing structural alignment between sketch and solution.  
    • The entire training strategy relies on synthetic cold-start data, eliminating dependence on manually labeled proofs.

    Check out the model on Paper and GitHub Page. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. Don’t Forget to join our 90k+ ML SubReddit.

    🔥 [Register Now] miniCON Virtual Conference on AGENTIC AI: FREE REGISTRATION + Certificate of Attendance + 4 Hour Short Event (May 21, 9 am- 1 pm PST) + Hands on Workshop

    The post DeepSeek-AI Released DeepSeek-Prover-V2: An Open-Source Large Language Model Designed for Formal Theorem, Proving through Subgoal Decomposition and Reinforcement Learning appeared first on MarkTechPost.

    Source: Read More 

    Facebook Twitter Reddit Email Copy Link
    Previous ArticleBest practices for Meta Llama 3.2 multimodal fine-tuning on Amazon Bedrock
    Next Article Salesforce AI Research Introduces New Benchmarks, Guardrails, and Model Architectures to Advance Trustworthy and Capable AI Agents

    Related Posts

    Machine Learning

    How to Evaluate Jailbreak Methods: A Case Study with the StrongREJECT Benchmark

    August 2, 2025
    Machine Learning

    Meet Trackio: The Free, Local-First, Open-Source Experiment Tracker Python Library that Simplifies and Enhances Machine Learning Workflows

    August 2, 2025
    Leave A Reply Cancel Reply

    For security, use of Google's reCAPTCHA service is required which is subject to the Google Privacy Policy and Terms of Use.

    Continue Reading

    Bandit – find security issues in Python code

    Linux

    Fedora 43 sancisce la fine di X11 su GNOME: Wayland prende il comando

    Linux

    Microsoft Edge Integrates On-Device AI APIs for Web Developers

    Operating Systems

    CVE-2025-6293 – Code-projects Hostel Management System SQL Injection

    Common Vulnerabilities and Exposures (CVEs)

    Highlights

    CVE-2024-48766 – NetAlertX HTTP File Disclosure

    May 13, 2025

    CVE ID : CVE-2024-48766

    Published : May 13, 2025, 4:15 p.m. | 14 minutes ago

    Description : NetAlertX 24.7.18 before 24.10.12 allows unauthenticated file reading because an HTTP client can ignore a redirect, and because of factors related to strpos and directory traversal, as exploited in the wild in May 2025. This is related to components/logs.php.

    Severity: 8.6 | HIGH

    Visit the link for more details, such as CVSS details, affected products, timeline, and more…

    CVE-2025-38349 – Linux Kernel Epoll Use-After-Free Vulnerability

    July 18, 2025

    CVE-2025-6166 – Frdel Agent-Zero Path Traversal Vulnerability

    June 17, 2025

    LightOn AI Released GTE-ModernColBERT-v1: A Scalable Token-Level Semantic Search Model for Long-Document Retrieval and Benchmark-Leading Performance

    May 11, 2025
    © DevStackTips 2025. All rights reserved.
    • Contact
    • Privacy Policy

    Type above and press Enter to search. Press Esc to cancel.