
    TokenSkip: Optimizing Chain-of-Thought Reasoning in LLMs Through Controllable Token Compression

    February 23, 2025

    Large Language Models (LLMs) face significant challenges in complex reasoning tasks, despite the breakthrough advances achieved through Chain-of-Thought (CoT) prompting. The primary challenge is the computational overhead introduced by longer CoT sequences, which directly impacts inference latency and memory requirements. Because LLM decoding is autoregressive, every additional CoT token adds a decoding step, and the attention layers’ costs scale quadratically with sequence length. Balancing reasoning accuracy against computational efficiency has become a critical challenge, as attempts to reduce reasoning steps often compromise the model’s problem-solving capabilities.
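
    To make the scaling concrete, here is a minimal, purely illustrative cost model (the constants and per-layer details of real models differ): each decoded token attends to every token before it, so the total attention cost of generating a CoT grows roughly quadratically with its length.

```python
# Illustrative cost model for autoregressive CoT decoding.
# At decode step t, the new token attends to all prompt_len + t
# earlier tokens, so the summed cost grows quadratically in cot_len.

def decode_attention_cost(prompt_len: int, cot_len: int) -> int:
    return sum(prompt_len + t for t in range(cot_len))

short = decode_attention_cost(prompt_len=200, cot_len=500)
long_ = decode_attention_cost(prompt_len=200, cot_len=1000)
print(f"{long_ / short:.2f}x")  # ~3.11x the cost for 2x the CoT tokens
```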

    Various methodologies have been developed to address the computational challenges of CoT reasoning. Some approaches streamline the reasoning process by simplifying or skipping certain thinking steps, while others attempt to generate steps in parallel. A different strategy compresses reasoning steps into continuous latent representations, enabling LLMs to reason without generating explicit word tokens. Moreover, prompt compression techniques for handling complex instructions and long-context inputs more efficiently range from using lightweight language models to generate concise prompts, to employing implicit continuous tokens for task representation, to directly compressing inputs by filtering for highly informative tokens.

    Researchers from The Hong Kong Polytechnic University and the University of Science and Technology of China have proposed TokenSkip, an approach to optimizing CoT processing in LLMs. It enables models to skip less important tokens within CoT sequences while maintaining connections between critical reasoning tokens, with adjustable compression ratios. The system works by first constructing compressed CoT training data through token pruning, followed by supervised fine-tuning. Initial testing across multiple models, including LLaMA-3.1-8B-Instruct and the Qwen2.5-Instruct series, shows promising results, particularly in maintaining reasoning capabilities while significantly reducing computational overhead.
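
    The data-construction step can be pictured with a short sketch. This is not the authors’ code; the token-importance scores and all names below are hypothetical stand-ins for whatever importance measure TokenSkip actually uses:

```python
# Hypothetical sketch of TokenSkip-style CoT compression for building
# training data: keep the top fraction of tokens ranked by importance,
# then restore their original order so the pruned chain stays readable.

def compress_cot(tokens: list[str], importance: list[float],
                 ratio: float) -> list[str]:
    k = max(1, int(len(tokens) * ratio))      # number of tokens to keep
    top = sorted(range(len(tokens)),
                 key=lambda i: importance[i],
                 reverse=True)[:k]            # highest-importance indices
    return [tokens[i] for i in sorted(top)]   # back to original order

cot = ["First", ",", "compute", "3", "*", "4", "=", "12", "."]
scores = [0.2, 0.1, 0.8, 0.9, 0.9, 0.9, 0.7, 1.0, 0.1]
print(compress_cot(cot, scores, ratio=0.7))
# ['compute', '3', '*', '4', '=', '12']
```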

    TokenSkip’s architecture is built on the fundamental principle that different reasoning tokens contribute varying levels of importance to reaching the final answer. It consists of two main phases: training data preparation and inference. In the training phase, the system generates CoT trajectories with the target LLM, and each remaining trajectory is pruned at a randomly selected compression ratio, with the pruning guided by an “importance scoring” mechanism. During inference, TokenSkip retains standard autoregressive decoding but improves efficiency by enabling the LLM to skip less important tokens. The input is formatted so that the question and the compression ratio are separated by end-of-sequence tokens.
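
    The inference-time input format is easy to sketch. The exact template below is an assumption; the description above only states that the question and the compression ratio are separated by end-of-sequence tokens:

```python
# Illustrative prompt construction for TokenSkip inference. The EOS
# string and the precise layout are assumptions for this sketch; use
# whatever end-of-sequence token the target model defines.

EOS = "</s>"

def build_prompt(question: str, ratio: float) -> str:
    return f"{question}{EOS}{ratio}{EOS}"

print(build_prompt("If x + 3 = 7, what is x?", 0.6))
# If x + 3 = 7, what is x?</s>0.6</s>
```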

    The results show that larger language models are more adept at maintaining performance while achieving higher compression rates. The Qwen2.5-14B-Instruct model achieves remarkable results with only a 0.4% performance drop while reducing token usage by 40%. TokenSkip shows superior performance when compared with alternative approaches like prompt-based reduction and truncation. While prompt-based reduction fails to achieve target compression ratios and truncation leads to significant performance degradation, TokenSkip maintains the specified compression ratio while preserving reasoning capabilities. On the MATH-500 dataset, it achieves a 30% reduction in token usage with less than a 4% performance drop.

    In conclusion, TokenSkip represents a significant advancement in optimizing CoT processing for LLMs, introducing a controllable compression mechanism based on token importance. The method’s success lies in maintaining reasoning accuracy while significantly reducing computational overhead, achieved by selectively preserving critical tokens and skipping less important ones. The approach has proven effective across LLMs, showing minimal performance degradation even at substantial compression ratios. This research opens new possibilities for efficient reasoning in LLMs, establishing a foundation for future work that pairs computational efficiency with robust reasoning capabilities.


    Check out the Paper. All credit for this research goes to the researchers of this project.
