
    Gaze-LLE: A New AI Model for Gaze Target Estimation Built on Top of a Frozen Visual Foundation Model

    December 17, 2024

    Accurately predicting where a person is looking in a scene, a task known as gaze target estimation, is a significant challenge in AI research. Inferring gaze direction requires integrating complex cues such as head orientation and scene context. Traditional methods for this problem use multi-branch architectures that process scene and head features separately before fusing them with auxiliary inputs such as depth and pose. These methods, however, are computationally intensive, hard to train, and often fail to generalize well across datasets. Overcoming these issues is a prerequisite for progress in applications spanning human behavior understanding, robotics, and assistive technologies.

    Existing gaze estimation methods depend heavily on multi-branch pipelines, in which separate encoders handle scene and head features and fusion modules combine the two streams. To improve accuracy, many of these models also incorporate additional signals such as pose and depth, each obtained from a dedicated auxiliary module. These approaches have several limitations. First, their high computational cost makes real-time deployment impractical. Second, they generally require large amounts of labeled training data, which is labor-intensive to collect and hard to scale. Finally, their reliance on specialized encoders and supplementary inputs limits how well the learned representations transfer to new environments and datasets.

    To address these issues, researchers from the Georgia Institute of Technology and the University of Illinois Urbana-Champaign introduced Gaze-LLE, a streamlined and efficient framework for gaze target estimation. Gaze-LLE eliminates the need for complex multi-branch architectures by pairing a frozen DINOv2 visual encoder with a minimalist decoder module. The framework uses a unified backbone for feature extraction and an innovative head positional prompting mechanism that conditions the gaze estimate on a specific individual in the scene. A primary contribution of this design is a drastic reduction in trainable parameters, translating into 95% fewer computations than traditional methods. Gaze-LLE also demonstrates that large-scale transformer-based encoders can be repurposed for gaze estimation without complex auxiliary models, maintaining strong performance with minimal adjustment across a range of datasets and tasks through a simple, scalable architecture, as sketched below.
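
    To make the frozen backbone and the head positional prompting concrete, here is a minimal PyTorch sketch of the encoding step. This is an illustrative reconstruction rather than the authors' code: the dinov2_vits14 torch.hub entry point and the forward_features patch-token interface come from the public DINOv2 release, and the decoder width of 256 is an assumed value.

    ```python
    import torch
    import torch.nn as nn

    # Load a DINOv2 backbone via torch.hub and freeze it, since Gaze-LLE keeps
    # the encoder frozen. Patch-token extraction below assumes the public
    # DINOv2 API (forward_features returning 'x_norm_patchtokens').
    backbone = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14')
    for p in backbone.parameters():
        p.requires_grad = False
    backbone.eval()

    d_model = 256  # decoder width; illustrative, smaller than the encoder dim
    proj = nn.Linear(backbone.embed_dim, d_model)     # downproject scene tokens
    head_prompt = nn.Parameter(torch.zeros(d_model))  # learned head-position embedding

    def encode_scene(image, head_mask):
        # image: (B, 3, H, W) with H, W divisible by the 14-pixel patch size;
        # head_mask: (B, h, w) binary map of the target person's head,
        # downsampled to the (h, w) patch grid.
        with torch.no_grad():
            tokens = backbone.forward_features(image)['x_norm_patchtokens']  # (B, h*w, C)
        tokens = proj(tokens)  # (B, h*w, d_model)
        # Head positional prompting: add the learned embedding only at the
        # patch positions covered by the selected person's head.
        return tokens + head_mask.flatten(1).unsqueeze(-1) * head_prompt
    ```

    Because the backbone never receives gradients, only the projection, the prompt embedding, and the decoder are trained, which is what keeps the trainable parameter count so small.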

    The architecture of Gaze-LLE comprises two main components. First, a frozen DINOv2 visual encoder extracts robust features from the input image; a linear layer then projects them into a lower-dimensional space for efficient processing. Second, a lightweight gaze decoder integrates these scene features with a head position embedding that encodes the location of the individual being observed, allowing the model to focus on the correct source of gaze. The decoder consists of three transformer layers for feature refinement and produces both a gaze heatmap indicating likely gaze targets and an in-frame classification that determines whether the gaze falls within the observable frame. The training objective is equally simple: a pixel-wise binary cross-entropy loss, which permits straightforward optimization without complex multi-task objectives. Evaluation used the benchmark datasets GazeFollow, VideoAttentionTarget, and ChildPlay.
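
    A matching sketch of the lightweight decoder and the training objective described above follows. The layer sizes, the 32x32 patch grid, and the 1x1-convolution heatmap head are assumptions chosen for illustration, not the paper's exact configuration.

    ```python
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GazeDecoder(nn.Module):
        """Lightweight gaze decoder in the spirit of Gaze-LLE (illustrative sizes)."""
        def __init__(self, d_model=256, n_layers=3, grid=(32, 32)):
            super().__init__()
            layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
            self.layers = nn.TransformerEncoder(layer, num_layers=n_layers)
            self.grid = grid
            self.heatmap_head = nn.Conv2d(d_model, 1, kernel_size=1)  # per-patch gaze logit
            self.inframe_head = nn.Linear(d_model, 1)                 # is the target in frame?

        def forward(self, tokens):                    # tokens: (B, h*w, d_model)
            x = self.layers(tokens)                   # three transformer layers
            h, w = self.grid
            fmap = x.transpose(1, 2).reshape(x.size(0), -1, h, w)
            heatmap_logits = self.heatmap_head(fmap)          # (B, 1, h, w)
            inframe_logit = self.inframe_head(x.mean(dim=1))  # (B, 1), pooled tokens
            return heatmap_logits, inframe_logit

    def gaze_loss(heatmap_logits, target_heatmap, inframe_logit, inframe_label):
        # Pixel-wise binary cross-entropy on the heatmap, as described above,
        # plus a BCE term for the in-frame classification.
        loss = F.binary_cross_entropy_with_logits(heatmap_logits, target_heatmap)
        return loss + F.binary_cross_entropy_with_logits(inframe_logit, inframe_label)
    ```

    In use, the tokens produced by the encoding step feed directly into GazeDecoder, and the single loss makes the whole pipeline trainable end to end with a standard optimizer.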

    Gaze-LLE achieves state-of-the-art performance across multiple benchmarks with significantly fewer parameters and faster training times. On the GazeFollow dataset, it yields an AUC of 0.958 and an average L2 error of 0.099, besting prior methods in both precision and computational efficiency. Training is remarkably efficient: the model converges in under 1.5 GPU hours, far faster than traditional multi-branch architectures. Gaze-LLE also exhibits strong generalization, retaining high performance on datasets such as ChildPlay and GOO-Real even without fine-tuning. These results show that a frozen foundation model inside an optimized architecture can support accurate and flexible gaze estimation.

    In summary, Gaze-LLE redefines gaze target estimation with a streamlined, effective framework built on a frozen foundational visual encoder and an innovative head positional prompting mechanism. Free of the intricacies of multi-branch architectures, it achieves higher accuracy, better efficiency, and greater scalability. Its ability to generalize across varied datasets also makes it promising for further research on human behavior and related fields, setting a new benchmark for the advancement of gaze estimation research.



    The post Gaze-LLE: A New AI Model for Gaze Target Estimation Built on Top of a Frozen Visual Foundation Model appeared first on MarkTechPost.
