Close Menu
    DevStackTipsDevStackTips
    • Home
    • News & Updates
      1. Tech & Work
      2. View All

      Sunshine And March Vibes (2025 Wallpapers Edition)

      June 3, 2025

      The Case For Minimal WordPress Setups: A Contrarian View On Theme Frameworks

      June 3, 2025

      How To Fix Largest Contentful Paint Issues With Subpart Analysis

      June 3, 2025

      How To Prevent WordPress SQL Injection Attacks

      June 3, 2025

      All the WWE 2K25 locker codes that are currently active

      June 3, 2025

      PSA: You don’t need to spend $400+ to upgrade your Xbox Series X|S storage

      June 3, 2025

      UK civil servants saved 24 minutes per day using Microsoft Copilot, saving two weeks each per year according to a new report

      June 3, 2025

      These solid-state fans will revolutionize cooling in our PCs and laptops

      June 3, 2025
    • Development
      1. Algorithms & Data Structures
      2. Artificial Intelligence
      3. Back-End Development
      4. Databases
      5. Front-End Development
      6. Libraries & Frameworks
      7. Machine Learning
      8. Security
      9. Software Engineering
      10. Tools & IDEs
      11. Web Design
      12. Web Development
      13. Web Security
      14. Programming Languages
        • PHP
        • JavaScript
      Featured

      Community News: Latest PECL Releases (06.03.2025)

      June 3, 2025
      Recent

      Community News: Latest PECL Releases (06.03.2025)

      June 3, 2025

      A Comprehensive Guide to Azure Firewall

      June 3, 2025

      Test Job Failures Precisely with Laravel’s assertFailedWith Method

      June 3, 2025
    • Operating Systems
      1. Windows
      2. Linux
      3. macOS
      Featured

      All the WWE 2K25 locker codes that are currently active

      June 3, 2025
      Recent

      All the WWE 2K25 locker codes that are currently active

      June 3, 2025

      PSA: You don’t need to spend $400+ to upgrade your Xbox Series X|S storage

      June 3, 2025

      UK civil servants saved 24 minutes per day using Microsoft Copilot, saving two weeks each per year according to a new report

      June 3, 2025
    • Learning Resources
      • Books
      • Cheatsheets
      • Tutorials & Guides
    Home»Development»This AI Paper Proposes TALE: An AI Framework that Reduces Token Redundancy in Chain-of-Thought (CoT) Reasoning by Incorporating Token Budget Awareness

    This AI Paper Proposes TALE: An AI Framework that Reduces Token Redundancy in Chain-of-Thought (CoT) Reasoning by Incorporating Token Budget Awareness

    December 29, 2024

    Large Language Models (LLMs) have shown significant potential in reasoning tasks, using methods like Chain-of-Thought (CoT) to break down complex problems into manageable steps. However, this capability comes with challenges. CoT prompts often increase token usage, leading to higher computational costs and energy consumption. This inefficiency is a concern for applications that require both precision and resource efficiency. Current LLMs tend to generate unnecessarily lengthy outputs, which do not always translate into better accuracy but incur additional costs. The key challenge is finding a balance between reasoning performance and resource efficiency.

    Researchers from Nanjing University, Rutgers University, and UMass Amherst have introduced a Token-Budget-Aware LLM Reasoning Framework. This framework dynamically estimates token budgets based on the complexity of a reasoning task and uses these estimates to guide the process. Known as TALE (Token-Budget-Aware LLM rEasoning), the approach seeks to reduce token usage without compromising the accuracy of responses. By integrating a token budget into CoT prompts, TALE provides a practical solution for enhancing cost-efficiency in LLMs while maintaining their performance.

    Technical Details and Benefits

    TALE operates in two main phases: budget estimation and token-budget-aware reasoning. Initially, it estimates an appropriate token budget for a problem using methods such as zero-shot prediction or regression-based estimators. This budget is then embedded in the prompt to encourage the LLM to generate concise yet accurate responses.

    A key innovation in TALE is the concept of “Token Elasticity,” which identifies an optimal range of token budgets that minimizes token usage while preserving accuracy. Using iterative search techniques like binary search, TALE determines the optimal budget for various tasks and LLM architectures. On average, the framework achieves a 68.64% reduction in token usage with less than a 5% decrease in accuracy, making it a practical and adaptable approach for token efficiency.

    Results and Insights

    Experiments demonstrate TALE’s effectiveness across benchmarks like GSM8K and MathBench. For instance, on the GSM8K dataset, TALE achieved 84.46% accuracy, surpassing the Vanilla CoT method while reducing token costs from 318.10 to 77.26 on average. On GSM8K-Zero, it reduced token costs by 91%, maintaining an accuracy of 98.72%.

    TALE also generalizes well across different LLMs, such as GPT-4o-mini and Yi-lightning. When applied to the MathBench-College dataset, TALE reduced token costs by up to 70% while maintaining competitive accuracy. Additionally, the framework significantly lowers operational expenses, cutting costs by 59% on average compared to Vanilla CoT. These results highlight TALE’s ability to enhance efficiency without sacrificing performance, making it suitable for a variety of applications.

    Conclusion

    The Token-Budget-Aware LLM Reasoning Framework addresses the inefficiency of token usage in reasoning tasks. By dynamically estimating and applying token budgets, TALE strikes a balance between accuracy and cost-effectiveness. This approach reduces computational expenses and broadens the accessibility of advanced LLM capabilities. As AI continues to evolve, frameworks like TALE offer a pathway to more efficient and sustainable use of LLMs in both academic and industrial contexts.


    Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. Don’t Forget to join our 60k+ ML SubReddit.

    🚨 Trending: LG AI Research Releases EXAONE 3.5: Three Open-Source Bilingual Frontier AI-level Models Delivering Unmatched Instruction Following and Long Context Understanding for Global Leadership in Generative AI Excellence….

    The post This AI Paper Proposes TALE: An AI Framework that Reduces Token Redundancy in Chain-of-Thought (CoT) Reasoning by Incorporating Token Budget Awareness appeared first on MarkTechPost.

    Source: Read More 

    Facebook Twitter Reddit Email Copy Link
    Previous ArticleLe notizie minori del mondo GNU/Linux e dintorni della settimana nr 52/2024
    Next Article Researchers from Tsinghua University Propose ReMoE: A Fully Differentiable MoE Architecture with ReLU Routing

    Related Posts

    Security

    BitoPro Silent on $11.5M Hack: Investigator Uncovers Massive Crypto Theft

    June 3, 2025
    Security

    New Linux Vulnerabilities

    June 3, 2025
    Leave A Reply Cancel Reply

    Continue Reading

    Why World of Warcraft’s Pandaria ‘Remix’ event is the PERFECT starting point for new players

    Development

    World of Warcraft: The War Within’s next big season, “Undermine(d),” gets a February 25 release date

    News & Updates

    CVE-2025-48927 – Apache TeleMessage Spring Boot Actuator Heap Dump Exposed Endpoint Vulnerability

    Common Vulnerabilities and Exposures (CVEs)

    [Podcast] What if You Ran Your Innovation Lab Like a Casino? An Interview With Eisaiah Engel

    Development

    Highlights

    CVE-2025-4075 – VMSMan Cross Site Scripting Vulnerability

    April 29, 2025

    CVE ID : CVE-2025-4075

    Published : April 29, 2025, 6:15 p.m. | 52 minutes ago

    Description : A vulnerability was found in VMSMan up to 20250416. It has been rated as problematic. Affected by this issue is some unknown functionality of the file /login.php. The manipulation of the argument Email with the input “> leads to cross site scripting. The attack may be launched remotely. The exploit has been disclosed to the public and may be used. The vendor was contacted early about this disclosure but did not respond in any way.

    Severity: 4.3 | MEDIUM

    Visit the link for more details, such as CVSS details, affected products, timeline, and more…

    Rilasciata CachyOS Febbraio 2025: Novità e Ottimizzazioni per gli Utenti GNU/Linux

    February 3, 2025

    JavaScript and Node.js Speech-to-Text

    November 25, 2024

    Gemini 2.5 Pro Preview: even better coding performance

    May 13, 2025
    © DevStackTips 2025. All rights reserved.
    • Contact
    • Privacy Policy

    Type above and press Enter to search. Press Esc to cancel.