    Meta AI Releases ‘NATURAL REASONING’: A Multi-Domain Dataset with 2.8 Million Questions To Enhance LLMs’ Reasoning Capabilities

    February 22, 2025

    Large language models (LLMs) have shown remarkable advances in reasoning on complex tasks. While models like OpenAI’s o1 and DeepSeek’s R1 have significantly improved performance on challenging reasoning benchmarks such as competition math, competitive coding, and GPQA, critical limitations remain in evaluating their true reasoning potential. Current reasoning datasets focus on problem-solving tasks and fail to cover domains that require open-ended reasoning. Moreover, they offer limited diversity in both scale and difficulty, making it hard to evaluate and improve LLMs’ reasoning across different domains and complexity levels.

    Previous attempts to enhance LLM reasoning capabilities mostly fall into two approaches: synthetic data generation and unsupervised self-training. In synthetic data generation, methods such as STaR and MetaMath augment existing datasets with new chain-of-thought rationales and question variations, but they depend heavily on pre-existing high-quality datasets. Approaches like OpenMathInstruct-2, NuminaMath, and Xwin-Math generate new data from seed examples, yet they struggle to scale to novel domains. In unsupervised self-training, most methods rely on human-annotated final answers or external reward models, making them resource-intensive and costly, particularly for complex multi-step problems whose outputs require human evaluation.

    Researchers from Meta and New York University have proposed NATURALREASONING, a comprehensive dataset of 2.8 million reasoning questions extracted from pretraining corpora. The dataset spans diverse fields including Mathematics, Physics, Computer Science, and Economics & Business. Unlike synthetic datasets such as MetaMathQA and OpenMathInstruct-2, NATURALREASONING captures authentic real-world reasoning problems through backtranslation from pretraining corpora. It uniquely combines verifiable and open-ended questions, including theorem proving, making it valuable for developing algorithms that enhance LLMs’ reasoning abilities beyond simple verification tasks and for distilling knowledge from stronger to weaker models.
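The backtranslation step described above can be sketched as follows. Note that the prompt wording and the `generate` helper are hypothetical stand-ins for whatever LLM call the authors used; only the overall flow — passage in, question-answer pair out — reflects the description in the text.

```python
# Sketch of question backtranslation from pretraining-corpus passages.
# The prompt template and generate() stub are illustrative assumptions,
# not the authors' actual implementation.

PROMPT = (
    "Based on the following passage, write a challenging reasoning "
    "question that the passage can answer, then give a reference answer.\n\n"
    "Passage:\n{passage}\n"
)

def generate(prompt: str) -> str:
    # Stand-in for a call to an instruction-tuned LLM.
    return "Question: ...\nAnswer: ..."

def backtranslate(passages):
    """Turn raw corpus passages into (question, answer) pairs."""
    pairs = []
    for passage in passages:
        output = generate(PROMPT.format(passage=passage))
        # Split the model output into its question and answer parts.
        question, _, answer = output.partition("\nAnswer:")
        question = question.removeprefix("Question:").strip()
        pairs.append((question, answer.strip()))
    return pairs
```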

    NATURALREASONING is shown to enhance reasoning capabilities in two ways. First, knowledge distillation and supervised finetuning on it achieve steeper scaling trends than existing datasets. Second, it serves as a source for extracting domain-specific seed data. To target science reasoning benchmarks like GPQA, the method samples 250 benchmark questions and retrieves 1K similar, decontaminated questions from NATURALREASONING using cosine similarity between question embeddings; these questions are then deduplicated and clustered into 15K groups. The evaluation protocol uses zero-shot testing across benchmarks including MATH, GPQA, GPQA-Diamond, and MMLU-Pro, with greedy decoding for consistent performance measurement.
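The retrieval step can be sketched with plain cosine similarity over question embeddings. The random embeddings below are stand-ins (this summary does not name the embedding model the authors used), so only the retrieval mechanics are illustrated, not the actual pipeline.

```python
import numpy as np

def cosine_retrieve(seed_embs: np.ndarray, pool_embs: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k pool questions most similar to any seed question."""
    # Normalize rows so that dot products become cosine similarities.
    seed = seed_embs / np.linalg.norm(seed_embs, axis=1, keepdims=True)
    pool = pool_embs / np.linalg.norm(pool_embs, axis=1, keepdims=True)
    # For each pool question, keep its best similarity to any seed question.
    best_sim = (pool @ seed.T).max(axis=1)
    # Indices of the highest-similarity pool questions, best first.
    return np.argsort(-best_sim)[:k]

# Toy run: 250 seed questions, a pool of 10,000, retrieve the top 1,000.
rng = np.random.default_rng(0)
seed_embs = rng.normal(size=(250, 64))    # stand-in for real question embeddings
pool_embs = rng.normal(size=(10_000, 64))
top_k = cosine_retrieve(seed_embs, pool_embs, k=1_000)
```

The subsequent deduplication and clustering of the retrieved questions would run on the same embeddings, e.g. with any off-the-shelf clustering routine.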

    The evaluation results show that with just 1.5 million training examples, models trained on NATURALREASONING outperform Llama3.1-8B-Instruct, while other datasets such as OpenMathInstruct-2 and WebInstruct fail to reach comparable performance even with 2.8 million data points. Math-specific datasets like OpenMathInstruct-2 perform strongly on math benchmarks (improving from 50.83 to 59.25 on MATH) but struggle to generalize, with GPQA accuracy plateauing around 26-27% and inconsistent MMLU-Pro performance. Datasets like WebInstruct show diminishing returns, with GPQA performance peaking at 29.02% with 500K samples but declining to 26.12% at 2.8M samples.

    In conclusion, the researchers introduced NATURALREASONING, a dataset that represents a significant advancement in building comprehensive reasoning datasets for LLMs. Its 2.8 million questions span multiple domains, including mathematics, physics, computer science, economics, and social sciences. The results show that knowledge distillation with NATURALREASONING yields consistent improvements on reasoning benchmarks as data size increases. Its effectiveness extends to enabling unsupervised self-training of LLMs through external reward models and self-rewarding techniques, marking a step forward in enhancing LLMs’ reasoning capabilities across diverse domains.
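The knowledge distillation described above amounts to supervised finetuning on teacher-generated answers. A schematic sketch follows; the `teacher_answer` helper is a hypothetical placeholder for querying a stronger model, not the authors’ training code.

```python
def teacher_answer(question: str) -> str:
    # Placeholder for querying a stronger "teacher" LLM.
    return f"Reasoned answer to: {question}"

def build_distillation_set(questions):
    """Pair each dataset question with a teacher-generated response."""
    return [{"prompt": q, "response": teacher_answer(q)} for q in questions]

examples = build_distillation_set(["Why does ice float on water?"])
# A student model would then be supervised-finetuned on `examples`.
```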


    Check out the Paper and Dataset. All credit for this research goes to the researchers of this project.


    The post Meta AI Releases ‘NATURAL REASONING’: A Multi-Domain Dataset with 2.8 Million Questions To Enhance LLMs’ Reasoning Capabilities appeared first on MarkTechPost.
