
    Why Do Task Vectors Exist in Pretrained LLMs? This AI Research from MIT and Improbable AI Uncovers How Transformers Form Internal Abstractions and the Mechanisms Behind in-Context Learning (ICL)

    December 24, 2024

Large Language Models (LLMs) have demonstrated a remarkable, human-like capacity to form abstractions and adapt to new situations. Just as humans have historically made sense of complex experiences through fundamental concepts like physics and mathematics, autoregressive transformers now show comparable capabilities through in-context learning (ICL). Recent research has highlighted how these models can adapt to challenging tasks without parameter updates, suggesting the formation of internal abstractions similar to human mental models. Studies have begun exploring the mechanistic aspects of how pretrained LLMs represent latent concepts as vectors in their representations. However, questions remain about why these task vectors exist and why their effectiveness varies across tasks.
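In this line of work, a task vector is typically extracted by comparing hidden states from prompts with and without in-context demonstrations. The sketch below illustrates the idea with a mock activation function rather than a real transformer (the dimensions, the extraction-by-averaging recipe shown here, and the mock model are all illustrative assumptions, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(2)
hidden_dim = 32

# Toy stand-in for a layer activation (illustrative only; the actual studies
# record hidden states from a real transformer layer).
def mock_hidden_state(task_id, with_demos):
    """A task-specific direction appears in the activation only when
    in-context demonstrations are present in the prompt."""
    noise = rng.normal(size=hidden_dim)
    task_dir = np.zeros(hidden_dim)
    task_dir[task_id] = 5.0
    return noise + (task_dir if with_demos else 0.0)

# Extract the task vector as the mean (ICL minus zero-shot) activation gap,
# averaged over many prompts so that prompt-specific noise cancels out.
task_vec = np.mean(
    [mock_hidden_state(3, True) - mock_hidden_state(3, False) for _ in range(50)],
    axis=0,
)
print(int(np.argmax(np.abs(task_vec))))  # the task-specific direction dominates
```

The averaging step is what makes such vectors usable for steering: the shared task component survives while per-prompt variation averages away.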

Researchers have proposed several theoretical frameworks to understand the mechanisms behind in-context learning in LLMs. One significant approach views ICL through a Bayesian framework, suggesting a two-stage algorithm that estimates posterior probability and likelihood. Parallel to this, studies have identified task-specific vectors in LLMs that can trigger desired ICL behaviors, while other research has revealed how these models encode concepts like truthfulness, time, and space as linearly separable representations. Through mechanistic interpretability techniques such as causal mediation analysis and activation patching, researchers have begun to uncover how these concepts emerge in LLM representations and influence downstream ICL task performance, demonstrating that transformers implement different algorithms based on inferred concepts.
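The two-stage Bayesian view — infer the latent concept from the demonstrations, then apply that concept's algorithm to the query — can be made concrete with a toy example (a minimal sketch with made-up concepts, not the framework's actual formulation):

```python
# Toy two-stage Bayesian ICL: stage 1 infers the latent concept from the
# demonstrations, stage 2 applies the concept-specific algorithm to the query.

CONCEPTS = {
    "add": lambda x, y: x + y,
    "mul": lambda x, y: x * y,
}

def log_likelihood(concept, demos, noise=0.1):
    """Gaussian log-likelihood of the demo outputs under one concept."""
    fn = CONCEPTS[concept]
    return sum(-((out - fn(a, b)) ** 2) / (2 * noise ** 2)
               for (a, b), out in demos)

def icl_predict(demos, query):
    # Stage 1: latent concept inference. With a uniform prior over concepts,
    # the posterior is proportional to the likelihood, so argmax suffices.
    best = max(CONCEPTS, key=lambda c: log_likelihood(c, demos))
    # Stage 2: selective algorithm application.
    return best, CONCEPTS[best](*query)

demos = [((2, 3), 6), ((4, 5), 20)]   # consistent with "mul"
concept, answer = icl_predict(demos, (3, 7))
print(concept, answer)  # mul 21
```

The point of the decomposition is that failure can occur at either stage: a model may misidentify the concept, or identify it correctly but apply its algorithm poorly.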

Researchers from the Massachusetts Institute of Technology and Improbable AI introduce the concept encoding-decoding mechanism, providing a compelling explanation for how transformers develop internal abstractions. Research on a small transformer trained on sparse linear regression tasks reveals that concept encoding emerges as the model learns to map different latent concepts into distinct, separable representation spaces. This process operates in tandem with the development of concept-specific ICL algorithms through concept decoding. Testing across various pretrained model families, including Llama-3.1 and Gemma-2 at different sizes, demonstrates that larger language models exhibit this concept encoding-decoding behavior when processing natural ICL tasks. The research introduces Concept Decodability as a geometric measure of internal abstraction formation, showing that earlier layers encode latent concepts while later layers condition algorithms on these inferred concepts, with both processes developing interdependently.

The theoretical framework for understanding in-context learning draws heavily from a Bayesian perspective, which proposes that transformers implicitly infer latent variables from demonstrations before generating answers. This process operates in two distinct stages: latent concept inference and selective algorithm application. Experimental evidence from synthetic tasks, particularly using sparse linear regression, demonstrates how this mechanism emerges during model training. When trained on multiple tasks with different underlying bases, models develop distinct representational spaces for different concepts while simultaneously learning to apply concept-specific algorithms. The research reveals that concepts that overlap or correlate tend to share representational subspaces, suggesting potential limitations in how models distinguish between related tasks in natural language processing.
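A minimal numpy sketch of this kind of synthetic task family (the supports, sizes, and least-squares "algorithm" below are illustrative assumptions, not the paper's setup): each latent concept fixes a sparse support, and inferring the concept amounts to finding the support that best explains the in-context demonstrations.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_demos = 8, 6

# Each latent "concept" is a sparse support: a task's weight vector lives
# only on that subset of coordinates.
concepts = {"A": [0, 1, 2], "B": [5, 6, 7]}

def sample_task(support):
    w = np.zeros(d)
    w[support] = rng.normal(size=len(support))
    X = rng.normal(size=(n_demos, d))
    return X, X @ w  # noiseless in-context demonstrations

def fit_error(X, y, support):
    """Least-squares fit restricted to one hypothesised support."""
    coef, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
    return float(np.linalg.norm(X[:, support] @ coef - y))

# "Concept inference" = picking the support that explains the demos best;
# the concept-specific algorithm is then least squares on that support.
X, y = sample_task(concepts["A"])
errs = {name: fit_error(X, y, sup) for name, sup in concepts.items()}
inferred = min(errs, key=errs.get)
print(inferred)  # A
```

With noiseless demonstrations the correct support fits exactly while the wrong one leaves a large residual, which is why separable concept representations translate directly into better algorithm selection.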

    The research provides compelling empirical validation of the concept encoding-decoding mechanism in pretrained Large Language Models across different families and scales, including Llama-3.1 and Gemma-2. Through experiments with part-of-speech tagging and bitwise arithmetic tasks, researchers demonstrated that models develop more distinct representational spaces for different concepts as the number of in-context examples increases. The study introduces Concept Decodability (CD) as a metric to quantify how well latent concepts can be inferred from representations, showing that higher CD scores correlate strongly with better task performance. Notably, concepts frequently encountered during pretraining, such as nouns and basic arithmetic operations, show clearer separation in representational space compared to more complex concepts. The research further demonstrates through finetuning experiments that early layers play a crucial role in concept encoding, with modifications to these layers yielding significantly better performance improvements than changes to later layers.
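One plausible operationalisation of a Concept Decodability-style metric (an assumption for illustration, not necessarily the paper's exact definition) is the accuracy of a simple leave-one-out nearest-centroid probe on layer representations: well-separated concept clusters score near 1, overlapping ones near chance.

```python
import numpy as np

rng = np.random.default_rng(1)

def concept_decodability(reps, labels):
    """Leave-one-out nearest-centroid probe accuracy over representations
    (one plausible stand-in for CD, not necessarily the paper's definition)."""
    reps, labels = np.asarray(reps), np.asarray(labels)
    correct = 0
    for i in range(len(reps)):
        mask = np.arange(len(reps)) != i          # hold out sample i
        cents = {c: reps[mask & (labels == c)].mean(axis=0)
                 for c in set(labels)}
        pred = min(cents, key=lambda c: np.linalg.norm(reps[i] - cents[c]))
        correct += pred == labels[i]
    return correct / len(reps)

# Well-separated concept clusters -> CD near 1; overlapping clusters -> lower.
sep = np.concatenate([rng.normal(0, 0.1, (20, 16)), rng.normal(3, 0.1, (20, 16))])
mix = np.concatenate([rng.normal(0, 2.0, (20, 16)), rng.normal(0.2, 2.0, (20, 16))])
labels = np.array([0] * 20 + [1] * 20)
print(concept_decodability(sep, labels), concept_decodability(mix, labels))
```

Under this reading, the reported correlation between CD and task performance is natural: if a probe cannot recover the concept from the representation, the model's later layers have no clean signal to condition their algorithm on.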

    The concept encoding-decoding mechanism provides valuable insights into several key questions about Large Language Models’ behavior and capabilities. The research addresses the varying success rates of LLMs across different in-context learning tasks, suggesting that performance bottlenecks can occur at both the concept inference and algorithm decoding stages. Models show stronger performance with concepts frequently encountered during pretraining, such as basic logical operators, but may struggle even with known algorithms if concept distinction remains unclear. The mechanism also explains why explicit modeling of latent variables doesn’t necessarily outperform implicit learning in transformers, as standard transformers naturally develop effective concept encoding capabilities. Also, this framework offers a theoretical foundation for understanding activation-based interventions in LLMs, suggesting that such methods work by directly influencing the encoded representations that guide the model’s generation process.


Check out the Paper. All credit for this research goes to the researchers of this project.


    The post Why Do Task Vectors Exist in Pretrained LLMs? This AI Research from MIT and Improbable AI Uncovers How Transformers Form Internal Abstractions and the Mechanisms Behind in-Context Learning (ICL) appeared first on MarkTechPost.

