
    Advancing Test-Time Computing: Scaling System-2 Thinking for Robust and Cognitive AI

    January 8, 2025

    The o1 model’s impressive performance on complex reasoning highlights the potential of test-time computing scaling, which enhances System-2 thinking by allocating greater computational effort during inference. While deep learning’s scaling effects have driven advances in AI, particularly in LLMs like GPT, further scaling during training faces limitations from data scarcity and computational constraints. Moreover, current models often lack robustness and struggle with intricate tasks, relying primarily on fast, intuitive System-1 thinking. The o1 model, introduced by OpenAI in 2024, incorporates System-2 thinking and achieves superior performance on complex reasoning tasks through test-time computing scaling. This demonstrates that increasing computational effort during inference improves model accuracy, addressing some limitations of traditional training-phase scaling.

    System-1 and System-2 thinking, derived from cognitive psychology, are used in AI to describe different processing strategies. System-1 models rely on pattern recognition and fast, intuitive responses, lacking robustness and adaptability to distribution shifts. Earlier efforts to enhance robustness, such as test-time adaptation (TTA), focused on parameter updates or external input adjustments. However, these models were limited to weak System-2 capabilities. With the rise of LLMs, System-2 models have gained traction, allowing for incremental reasoning and the generation of intermediate steps, as seen in Chain-of-Thought (CoT) prompting. While this approach improves reasoning compared to direct output methods, it remains prone to cumulative errors. Retrieval-augmented generation (RAG) partially addresses factual inaccuracies, but its impact on reasoning abilities is limited, leaving CoT-enabled models at an early stage of System-2 thinking.
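
    As a minimal sketch of the Chain-of-Thought prompting described above: instead of asking for a direct answer, the prompt prepends a worked exemplar so the model imitates step-by-step reasoning. The exemplar and wording here are invented for illustration, not taken from any specific CoT paper.

```python
def build_cot_prompt(question):
    """Prepend a worked exemplar so the model produces intermediate
    reasoning steps instead of a direct answer (few-shot CoT)."""
    exemplar = (
        "Q: A shop sells pens at 3 for $2. How much do 12 pens cost?\n"
        "A: 12 pens is 12 / 3 = 4 groups. 4 groups * $2 = $8. "
        "The answer is 8.\n\n"
    )
    return exemplar + f"Q: {question}\nA: Let's think step by step."

prompt = build_cot_prompt(
    "If a train travels 60 km in 1.5 hours, what is its speed?"
)
```

    The prompt string would then be sent to an LLM; because each intermediate step conditions the next, an early mistake propagates, which is exactly the cumulative-error weakness noted above.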

    Researchers from Soochow University, the National University of Singapore, and Ant Group explored test-time computing, tracing its evolution from System-1 to System-2 models. Initially applied to System-1 models to address distribution shifts and enhance robustness through parameter updates, input modifications, and output calibration, test-time computing now strengthens reasoning in System-2 models using strategies like repeated sampling, self-correction, and tree search. These methods enable models to solve complex problems by simulating diverse thinking patterns, reflecting on errors, and improving reasoning depth. The survey highlights this progression and further discusses future research directions for developing robust, cognitively capable AI systems.
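
    The repeated-sampling strategy mentioned above can be sketched as self-consistency voting: draw several candidate answers and return the majority. The `sample_answer` stub below stands in for a stochastic LLM call and its outputs are purely illustrative.

```python
from collections import Counter

def sample_answer(question, seed):
    """Stub for a stochastic LLM call; a real system would sample a
    reasoning chain and extract the final answer."""
    # Illustrative: most samples reach the right answer, some do not.
    noisy = {0: "42", 1: "42", 2: "41", 3: "42", 4: "40"}
    return noisy[seed % 5]

def self_consistency(question, n_samples=5):
    """Repeated sampling: take the majority answer over n_samples draws."""
    answers = [sample_answer(question, s) for s in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

print(self_consistency("What is 6 * 7?"))  # majority answer: "42"
```

    Majority voting works because independent samples are unlikely to make the same mistake; tree search and self-correction extend this idea by steering or revising the samples rather than just aggregating them.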

    TTA fine-tunes models during inference using information from the test samples themselves. Key considerations include the learning signal, which parameters to update, and efficiency. Learning signals in Test-time Training (TTT) come from auxiliary tasks, while Fully Test-time Adaptation (FTTA) leverages internal feedback (e.g., entropy minimization) but requires safeguards against model collapse. Human feedback is also utilized for tasks like QA and cross-modal retrieval. To improve efficiency, parameter updates target specific layers (e.g., normalization layers or adapters). Techniques such as episodic TTA or exponential moving averages address catastrophic forgetting, and methods like FOA further refine adaptation by optimizing prompts without backpropagation.
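
    As a toy illustration of the entropy-minimization signal used in FTTA, the sketch below adapts a single temperature parameter so the softmax over a test sample's logits becomes more confident (lower entropy). Real FTTA methods such as Tent update normalization-layer affine parameters by backpropagation; the finite-difference gradient and scalar parameter here are simplifications for illustration only.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def adapt_temperature(logits, temp=1.0, lr=0.1, steps=50, eps=1e-4):
    """Minimize prediction entropy w.r.t. a temperature parameter
    using a finite-difference gradient (illustrative only)."""
    for _ in range(steps):
        def loss(t):
            return entropy(softmax([x / t for x in logits]))
        grad = (loss(temp + eps) - loss(temp - eps)) / (2 * eps)
        temp -= lr * grad
        temp = max(temp, 1e-3)  # keep temperature positive
    return temp

logits = [2.0, 1.5, 0.5]
before = entropy(softmax(logits))
temp = adapt_temperature(logits)
after = entropy(softmax([x / temp for x in logits]))
# entropy drops: the prediction sharpens on the test sample
```

    This also makes the collapse risk concrete: driven only by its own confidence, the objective is happy to push every prediction toward certainty, right or wrong, which is why FTTA needs safeguards.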

    Test-time reasoning involves leveraging extended inference time to identify human-like reasoning within the decoding search space. Its two core components are feedback modeling and search strategies. Feedback modeling evaluates outputs through score-based and verbal feedback. Score-based feedback uses verifiers to score outputs based on correctness or reasoning process quality, with outcome-based and process-based approaches. Verbal feedback provides interpretability and correction suggestions via natural language critiques, often utilizing LLMs like GPT-4. Search strategies include repeated sampling and self-correction, where diverse responses are generated and refined. Multi-agent debates and self-critiques enhance reasoning by leveraging external feedback or intrinsic evaluation mechanisms.
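
    Score-based feedback combined with a search strategy can be sketched as best-of-N selection: candidates are sampled, a verifier scores each (outcome-based here), and the highest-scoring output is returned. The candidate list and the arithmetic checker below are hypothetical stand-ins for model samples and a trained verifier.

```python
def outcome_verifier(question, answer):
    """Toy outcome-based verifier: score 1.0 if the arithmetic answer
    checks out, else 0.0. A real verifier would be a trained model
    scoring correctness or reasoning-process quality."""
    try:
        return 1.0 if int(answer) == eval(question) else 0.0
    except (ValueError, SyntaxError):
        return 0.0

def best_of_n(question, candidates):
    """Search strategy: rank sampled candidates by verifier score."""
    return max(candidates, key=lambda a: outcome_verifier(question, a))

samples = ["13", "15", "fifteen"]   # hypothetical model samples
print(best_of_n("7 + 8", samples))  # verifier selects "15"
```

    A process-based verifier would instead score each intermediate reasoning step, and verbal feedback would replace the scalar score with a natural-language critique used to revise the answer.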

    In conclusion, the future of test-time computing involves several key directions. First, extending the generalization of System-2 models beyond domain-specific tasks like math and code, toward scientific discovery and weak-to-strong generalization, is vital. Second, expanding multimodal reasoning by integrating modalities like speech and video and aligning reasoning processes with human cognition holds promise. Third, balancing efficiency and performance by optimizing resource allocation and integrating acceleration strategies is critical. Fourth, establishing universal scaling laws remains challenging due to the diversity of strategies and influencing factors. Lastly, combining multiple test-time strategies and adaptation methods can improve reasoning, advancing LLMs toward cognitive intelligence.


    Check out the Paper. All credit for this research goes to the researchers of this project.


    The post Advancing Test-Time Computing: Scaling System-2 Thinking for Robust and Cognitive AI appeared first on MarkTechPost.
