
Q*: A Versatile Artificial Intelligence Approach to Improve LLM Performance in Reasoning Tasks

    June 28, 2024

    Large Language Models (LLMs) have demonstrated remarkable abilities in tackling various reasoning tasks expressed in natural language, including math word problems, code generation, and planning. However, as the complexity of reasoning tasks increases, even the most advanced LLMs struggle with errors, hallucinations, and inconsistencies due to their auto-regressive nature. This challenge is particularly evident in tasks requiring multiple reasoning steps, where LLMs’ “System 1” thinking—fast and instinctive but less accurate—falls short. The need for more deliberative, logical “System 2” thinking becomes crucial for solving complex reasoning problems accurately and consistently.

    Several attempts have been made to overcome the challenges faced by LLMs in complex reasoning tasks. Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) aim to align LLM outputs with human expectations. Direct Preference Optimization (DPO) and Aligner methods have also been developed to improve alignment. In the realm of enhancing LLMs with planning capabilities, Tree-of-Thoughts (ToT), A* search, and Monte Carlo Tree Search (MCTS) have been applied. For math reasoning and code generation, techniques such as prompt engineering, fine-tuning with task-specific corpora, and training reward models have been explored. However, these methods often require extensive expertise, significant computational resources, or task-specific modifications, limiting their generalizability and efficiency.

    Researchers from Skywork AI and Nanyang Technological University present Q*, a robust framework designed to enhance the multi-step reasoning capabilities of LLMs through deliberative planning. This approach formalizes LLM reasoning as a Markov Decision Process (MDP), where the state combines the input prompt and previous reasoning steps, the action represents the next reasoning step, and the reward measures task success. Q* introduces general methods for estimating optimal Q-values of state-action pairs, including offline reinforcement learning, best sequence selection from rollouts, and completion using stronger LLMs. By framing multi-step reasoning as a heuristic search problem, Q* employs plug-and-play Q-value models as heuristic functions within an A* search framework, guiding LLMs to select the most promising next steps efficiently.
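The MDP formulation described above can be sketched in a few lines; `State`, `transition`, and `reward` are illustrative names (not from the paper's code), and the reward is hedged as a simple end-of-trajectory correctness check:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """An MDP state: the input prompt plus the reasoning steps taken so far."""
    prompt: str
    steps: tuple = ()  # s_t = (prompt, a_1, ..., a_t)


def transition(state: State, action: str) -> State:
    """Deterministic transition: the next reasoning step is appended to the state."""
    return State(state.prompt, state.steps + (action,))


def reward(state: State, is_correct) -> float:
    """Terminal reward measuring task success, e.g. 1.0 if the final
    answer checks out against ground truth, else 0.0 (hypothetical)."""
    return 1.0 if is_correct(state.steps) else 0.0
```

Framing each reasoning step as an action in this MDP is what lets Q* attach a Q-value to every partial solution rather than only scoring complete answers.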

    The Q* framework employs a sophisticated architecture to enhance LLMs’ multi-step reasoning capabilities. It formalizes the process as a heuristic search problem, utilizing an A* search algorithm. The framework associates each state with an f-value, computed as a weighted sum of aggregated utility and a heuristic value. The aggregated utility is calculated using a process-based reward function, while the heuristic value is estimated using the optimal Q-value of the state. Q* introduces three methods for estimating optimal Q-values: offline reinforcement learning, learning from rollouts, and approximation using stronger LLMs. These methods enable the framework to learn from training data without task-specific modifications. The deliberative planning process follows an A* search algorithm. It maintains two sets of states: unvisited and visited. The algorithm iteratively selects the state with the highest f-value from the unvisited set, expands it using the LLM policy, and updates both sets accordingly. This process continues until a terminal state (complete trajectory) is reached, at which point the answer is extracted from the final state.
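The planning loop described above can be sketched roughly as follows, assuming hypothetical callables `utility` (the aggregated process-based reward g), `q_value` (the learned Q-value heuristic h), and `policy_expand` (the LLM policy proposing candidate next steps); `lam` is the weight in the f-value f(s) = g(s) + lam * h(s):

```python
import heapq


def q_star_search(initial_state, policy_expand, utility, q_value,
                  is_terminal, lam=1.0, max_iters=1000):
    """A*-style deliberative planning over reasoning states.

    f(s) = utility(s) + lam * q_value(s): the aggregated utility plus
    the Q-value heuristic, as in the weighted sum described above.
    """
    def f(state):
        return utility(state) + lam * q_value(state)

    # `unvisited` is a max-heap on f (negated for heapq); `visited` is the closed set.
    unvisited = [(-f(initial_state), 0, initial_state)]
    visited = set()
    tie = 1  # tie-breaker so heapq never compares states directly
    for _ in range(max_iters):
        if not unvisited:
            break
        _, _, state = heapq.heappop(unvisited)
        if state in visited:
            continue
        visited.add(state)
        if is_terminal(state):
            return state  # complete trajectory; the answer is read off this state
        for nxt in policy_expand(state):  # LLM proposes next reasoning steps
            if nxt not in visited:
                heapq.heappush(unvisited, (-f(nxt), tie, nxt))
                tie += 1
    return None  # search budget exhausted without reaching a terminal state
```

With `lam = 0` the search degenerates to best-first expansion on the process reward alone; the Q-value term is what injects the learned estimate of how promising a partial trajectory is.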

    Q* demonstrated significant performance improvements across various reasoning tasks. On the GSM8K dataset, it enhanced Llama-2-7b to achieve 80.8% accuracy, surpassing ChatGPT-turbo. For the MATH dataset, Q* improved Llama-2-7b and DeepSeekMath-7b, reaching 55.4% accuracy, outperforming models like Gemini Ultra (4-shot). In code generation, Q* boosted CodeQwen1.5-7b-Chat to 77.0% accuracy on the MBPP dataset. These results consistently show Q*’s effectiveness in enhancing LLM performance across math reasoning and code generation tasks, outperforming traditional methods and some closed-source models.

    Q* emerges as an effective method to overcome the challenge of multi-step reasoning in LLMs by introducing a robust deliberation framework. This approach enhances LLMs’ ability to solve complex problems that require in-depth, logical thinking beyond simple auto-regressive token generation. Unlike previous methods that rely on task-specific utility functions, Q* uses a versatile Q-value model trained solely on ground-truth data, making it easily adaptable to various reasoning tasks without modifications. The framework employs plug-and-play Q-value models as heuristic functions, guiding LLMs effectively without the need for task-specific fine-tuning, thus preserving performance across diverse tasks. Q*’s agility stems from its single-step consideration approach, contrasting with more computationally intensive methods like MCTS. Extensive experiments in math reasoning and code generation demonstrate Q*’s superior performance, highlighting its potential to improve LLMs’ complex problem-solving capabilities significantly.

Check out the Paper. All credit for this research goes to the researchers of this project.

The post Q*: A Versatile Artificial Intelligence Approach to Improve LLM Performance in Reasoning Tasks appeared first on MarkTechPost.
