
    Salesforce AI Research Introduces LaTRO: A Self-Rewarding Framework for Enhancing Reasoning Capabilities in Large Language Models

    November 15, 2024

Large language models (LLMs), already useful for answering questions and generating content, are now being trained to handle tasks that demand advanced reasoning, such as complex problem-solving in mathematics, science, and logical deduction. Improving these reasoning capabilities is a core focus of AI research: models that can carry out sequential thinking could support more robust applications across diverse fields by working through complex reasoning tasks independently.

A persistent challenge in LLM development is optimizing reasoning ability without external feedback. Current LLMs perform well on relatively simple tasks but struggle with multi-step or sequential reasoning, where an answer must be derived through a series of connected logical steps. This limitation restricts their utility in tasks that require a logical progression of ideas, such as solving intricate mathematical problems or analyzing data in a structured way. Consequently, building self-sufficient reasoning capabilities into LLMs has become essential to expanding their functionality and effectiveness in tasks where reasoning is key.

Researchers have experimented with several inference-time methods to address these challenges. One prominent approach is Chain-of-Thought (CoT) prompting, which encourages the model to break a complex problem into manageable parts and work through each decision step by step. This structured approach to problem-solving makes models better suited for tasks requiring logic and precision. Other approaches, such as Tree-of-Thought and Program-of-Thought, let LLMs explore multiple reasoning paths, providing diverse routes to a solution. While effective, these methods focus on runtime improvements and do not fundamentally enhance reasoning ability during the model’s training phase.
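
To make the idea concrete, here is a minimal sketch of CoT prompting in Python. The exemplar, the question, and the model.generate call are illustrative assumptions, not the exact prompts or API used in the work discussed here.

```python
# Minimal Chain-of-Thought (CoT) prompting sketch.
# The worked exemplar nudges the model to emit intermediate
# reasoning steps before committing to a final answer.

COT_EXEMPLAR = (
    "Q: A farm has 12 cows and buys 7 more. How many cows are there?\n"
    "A: Let's think step by step. The farm starts with 12 cows. "
    "It buys 7 more, so 12 + 7 = 19. The answer is 19.\n\n"
)

def cot_prompt(question: str) -> str:
    """Prepend a worked example plus an explicit 'think step by step'
    cue so the model reasons before answering."""
    return f"{COT_EXEMPLAR}Q: {question}\nA: Let's think step by step."

# Usage (model.generate stands in for any text-completion client):
# completion = model.generate(cot_prompt("If 3 pens cost $4.50, what do 7 cost?"))
```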

Researchers from Salesforce AI Research have introduced a new framework called LaTent Reasoning Optimization (LaTRO), which transforms the reasoning process into a latent sampling problem and thereby enhances the model’s reasoning capabilities intrinsically. The framework lets LLMs refine their reasoning pathways through a self-rewarding mechanism, evaluating and improving their own responses without external reward models or supervised feedback. By applying this self-improvement strategy during training, LaTRO changes how models tackle complex tasks at a foundational level rather than patching behavior at inference time.

LaTRO’s methodology is grounded in sampling reasoning paths from a latent distribution and optimizing those paths with variational techniques. At its core is a self-rewarding mechanism: for a given question, the model samples multiple reasoning paths, scores each by its likelihood of producing the correct answer, and then adjusts its parameters to favor the higher-scoring paths. This iterative process lets the model improve simultaneously at generating good reasoning paths and at assessing their quality, creating a continual self-improvement cycle. Unlike conventional approaches, LaTRO does not depend on an external reward model, making it a more autonomous and adaptable framework for enhancing reasoning in LLMs. And because the optimization is shifted to the training phase, LaTRO reduces computational demands at inference time, making it a resource-efficient solution.
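
The loop just described can be sketched roughly as follows. This is a schematic based only on the article's high-level description: the three callables are placeholders for model-specific code, and the mean baseline and REINFORCE-style update are standard policy-gradient choices assumed here, not details confirmed by the paper.

```python
from typing import Callable, List

def latro_step(
    question: str,
    gold_answer: str,
    sample_rationale: Callable[[str], str],            # z ~ q_theta(. | x)
    answer_logprob: Callable[[str, str, str], float],  # log p_theta(y | x, z)
    apply_update: Callable[[str, str, float], None],   # advantage-weighted grad step
    num_paths: int = 8,
) -> None:
    """One self-rewarding step: sample several reasoning paths, score each
    by the log-likelihood the same model assigns to the gold answer given
    that path, and push probability mass toward above-average paths."""
    paths: List[str] = [sample_rationale(question) for _ in range(num_paths)]
    rewards = [answer_logprob(question, z, gold_answer) for z in paths]
    baseline = sum(rewards) / len(rewards)  # simple variance-reducing baseline
    for z, r in zip(paths, rewards):
        apply_update(question, z, r - baseline)  # the reward model is the LLM itself
```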

The performance of LaTRO has been rigorously tested across several datasets, with results underscoring its effectiveness. On the GSM8K dataset of math-based reasoning problems, LaTRO delivered a substantial 12.5% improvement over base models in zero-shot accuracy, a marked gain in reasoning ability without task-specific training, and it outperformed supervised fine-tuned models by 9.6%. On the ARC-Challenge dataset, which focuses on logical reasoning, LaTRO again surpassed both base and fine-tuned models. For Mistral-7B, one of the LLM architectures evaluated, zero-shot accuracy on GSM8K rose from 47.8% with the base model to 67.3% under LaTRO with greedy decoding. In self-consistency testing, where multiple sampled reasoning paths are aggregated, LaTRO gained a further boost, reaching a remarkable 90.5% accuracy with Phi-3.5 models on GSM8K.
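
Self-consistency decoding, referred to in those numbers, samples several reasoning paths and takes a majority vote over their final answers. A minimal sketch follows; sample_completion and extract_answer are assumed stand-ins for the model call and answer parsing.

```python
from collections import Counter
from typing import Callable, List

def self_consistency(
    question: str,
    sample_completion: Callable[[str], str],  # one stochastic CoT sample
    extract_answer: Callable[[str], str],     # parse the final answer out
    num_samples: int = 16,
) -> str:
    """Sample several reasoning paths and return the most common final
    answer across them (majority vote)."""
    answers: List[str] = [
        extract_answer(sample_completion(question)) for _ in range(num_samples)
    ]
    return Counter(answers).most_common(1)[0][0]
```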

Beyond the quantitative results, LaTRO’s self-rewarding mechanism yields clear qualitative improvements: it teaches LLMs to evaluate reasoning paths internally and to produce concise, logically coherent answers. The experimental analysis shows that LaTRO helps LLMs better exploit their latent reasoning potential, even in complex scenarios, reducing reliance on external evaluation frameworks. This has implications for many applications, especially in fields where logical coherence and structured reasoning are essential.

    In conclusion, LaTRO offers an innovative and effective solution to enhance LLM reasoning through self-rewarding optimization, setting a new standard for model self-improvement. This framework enables pre-trained LLMs to unlock their latent potential in reasoning tasks by focusing on training-time reasoning enhancement. This advancement by Salesforce AI Research highlights the potential for autonomous reasoning in AI models and demonstrates that LLMs can self-evolve into more effective problem-solvers. LaTRO represents a significant leap forward, bringing AI closer to achieving autonomous reasoning abilities across various domains.


Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.

    The post Salesforce AI Research Introduces LaTRO: A Self-Rewarding Framework for Enhancing Reasoning Capabilities in Large Language Models appeared first on MarkTechPost.
