
    From Explicit to Implicit: Stepwise Internalization Ushers in a New Era of Natural Language Processing Reasoning

    June 1, 2024

Natural language processing (NLP) is the field concerned with teaching computers to understand, interpret, and generate human language. Researchers in this field are particularly focused on improving the reasoning capabilities of language models so they can solve complex tasks effectively. This means enhancing models’ abilities to process and generate text that requires logical steps and coherent thought processes.

    A significant challenge in NLP is enabling language models to solve reasoning tasks accurately and efficiently. Traditional models often rely on generating explicit intermediate steps, which can be computationally expensive and inefficient. While improving accuracy, these intermediate steps require substantial computational resources and may not fully leverage the models’ potential. The central issue is finding a way to internalize these reasoning processes within the models to maintain high accuracy while reducing computational overhead.

Existing work includes explicit chain-of-thought (CoT) reasoning, which generates intermediate reasoning steps to improve accuracy but demands substantial compute at inference time. Implicit CoT via knowledge distillation (ICoT-KD) trains a student model to reason within its hidden states, distilling from a teacher that produces explicit steps. Methods like MathGLM solve multi-digit arithmetic without intermediate steps, though they achieve high accuracy only with large models. Another approach, Searchformer, trains transformers to carry out search with fewer search steps. These methods all aim to improve the efficiency and accuracy of reasoning in NLP tasks.
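To make the contrast concrete, here is a minimal sketch of how explicit-CoT and direct-answer training targets might differ for a multi-digit multiplication task. The formatting conventions and function names below are illustrative assumptions, not taken from any of these papers:

```python
# Contrast between explicit-CoT supervision (intermediate steps appear in the
# target) and direct-answer supervision (only the final answer appears).
# The exact target formatting here is hypothetical.

def make_explicit_cot_example(a: int, b: int) -> dict:
    """Explicit CoT target: spell out partial products, then the answer."""
    digits = [int(d) for d in reversed(str(b))]
    steps = [f"{a} * {d} * 10^{i} = {a * d * 10**i}" for i, d in enumerate(digits)]
    return {"input": f"{a} * {b} =",
            "target": " ; ".join(steps) + f" ; answer: {a * b}"}

def make_direct_example(a: int, b: int) -> dict:
    """Direct-answer target: the model must do all reasoning internally."""
    return {"input": f"{a} * {b} =", "target": f"answer: {a * b}"}
```

Supervision like the first format is what makes explicit-CoT models accurate but slow at inference, since every intermediate token must be generated; the second format is what implicit-CoT methods ultimately aim to answer from directly.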

    Researchers from the Allen Institute for Artificial Intelligence, the University of Waterloo, the University of Washington, and Harvard University have introduced Stepwise Internalization to solve this inefficiency. This innovative method starts with a model trained for explicit CoT reasoning and then gradually removes the intermediate steps while fine-tuning the model. This process helps the model internalize the reasoning steps, simplifying the reasoning process while preserving performance. The gradual removal of CoT tokens during training allows the model to internalize these steps within its hidden states, achieving implicit CoT reasoning without generating intermediate steps.

    Stepwise Internalization involves a meticulous training process. Initially, a language model is trained using explicit CoT reasoning, which generates intermediate steps to reach the final answer. As training progresses, these intermediate steps are incrementally removed. At each stage of the process, the model is fine-tuned to adapt to the absence of certain steps, which encourages it to internalize the reasoning process within its hidden states. The method uses a linear schedule to remove CoT tokens, ensuring the model gradually adapts to these changes. This systematic removal and fine-tuning process enables the model to handle complex reasoning tasks more efficiently.
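The linear schedule described above can be sketched as follows. This is a simplified illustration with names of our own choosing (not from the paper’s code); real training also involves details such as resetting optimizer state and removing tokens in discrete stages:

```python
# Illustrative sketch of Stepwise Internalization's linear removal schedule.
# CoT tokens are dropped from the front of the target as training progresses,
# until only the final answer remains as supervision.

def num_cot_tokens_removed(step: int, total_steps: int, n_cot: int) -> int:
    """Linear schedule: no CoT tokens removed at step 0, all removed at the end."""
    frac = min(step / total_steps, 1.0)
    return round(frac * n_cot)

def training_target(cot_tokens: list, answer_tokens: list,
                    step: int, total_steps: int) -> list:
    """Fine-tuning target at a given step: the surviving CoT suffix followed
    by the answer. By the final step only the answer remains, so the model
    must carry out the removed reasoning in its hidden states."""
    k = num_cot_tokens_removed(step, total_steps, len(cot_tokens))
    return cot_tokens[k:] + answer_tokens
```

Fine-tuning against targets produced this way, rather than deleting all CoT tokens at once, is what lets the model adapt gradually instead of collapsing to low accuracy.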

The proposed method has shown remarkable improvements across various tasks. For example, a GPT-2 Small model trained using Stepwise Internalization solved 9-by-9 multiplication problems with up to 99% accuracy, whereas models trained using standard methods struggle with tasks beyond 4-by-4 multiplication. The Mistral 7B model achieved over 50% accuracy on the GSM8K dataset of grade-school math problems without producing any explicit intermediate steps, surpassing the much larger GPT-4, which scored only 44% when prompted to generate the answer directly. Stepwise Internalization also brings significant computational savings: compared to explicit CoT reasoning, inference is up to 11 times faster while maintaining high accuracy.

To conclude, this research highlights a promising approach to enhancing the reasoning capabilities of language models. By internalizing CoT steps, Stepwise Internalization strikes a balance between accuracy and computational efficiency, and could transform how complex reasoning tasks are handled in NLP. Internalizing reasoning within hidden states could pave the way for more efficient and capable language models, making them more practical for various applications. The results suggest that further development and scaling could yield even more impressive gains.

Check out the Paper. All credit for this research goes to the researchers of this project.

    The post From Explicit to Implicit: Stepwise Internalization Ushers in a New Era of Natural Language Processing Reasoning appeared first on MarkTechPost.
