
    Apple Researchers Present ReALM: An AI that Can ‘See’ and Understand Screen Context

    April 3, 2024

    Within natural language processing (NLP), reference resolution is a critical challenge as it involves determining the antecedent or referent of a word or phrase within a text, which is essential for understanding and successfully handling different types of context. Such contexts can range from previous dialogue turns in a conversation to non-conversational elements, like entities on a user’s screen or background processes.
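To make the task concrete, here is a toy sketch (not from the paper) of what reference resolution means: given a user query, the system must pick the referent from candidate entities drawn from different context types, such as on-screen elements or background processes. The entity data and the keyword-matching "resolver" are purely illustrative.

```python
# Illustrative sketch: reference resolution as selecting the referent
# of a phrase from candidate entities with different context sources.
from dataclasses import dataclass

@dataclass
class Entity:
    text: str    # surface text of the entity
    source: str  # "dialogue", "onscreen", or "background"

# Candidate entities a reference resolver must choose among.
candidates = [
    Entity("555-0132", "onscreen"),                # phone number on screen
    Entity("Joe's Pizza", "onscreen"),             # business name on screen
    Entity("timer for 10 minutes", "background"),  # background process
]

def resolve(query: str, candidates: list[Entity]) -> Entity:
    """Toy resolver: match the query against entity text by keyword."""
    if "number" in query:
        # "number" suggests a referent containing digits.
        return next(e for e in candidates if any(c.isdigit() for c in e.text))
    return candidates[0]

referent = resolve("call the business number", candidates)
print(referent.text)  # 555-0132
```

A real system must of course learn this mapping rather than hard-code it, which is exactly the problem ReALM frames as language modeling.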

The researchers tackle the core issue of enhancing the ability of large language models (LLMs) to resolve references, especially for non-conversational entities. Prior work includes models like MARRS, which focuses on multimodal reference resolution, particularly for on-screen content. Vision transformers and vision+text models have also contributed progress, although their heavy computational requirements limit their application.

Apple researchers propose Reference Resolution As Language Modeling (ReALM), which reconstructs the screen from parsed entities and their locations to generate a purely textual representation that preserves the visual layout of the screen content. The parts of the screen that are entities are tagged so that the LM has context about where entities appear and what text surrounds them (e.g., "call the business number"). To the best of their knowledge, this is the first work to use an LLM to encode context from a screen.

For fine-tuning, they used the FLAN-T5 model: they provided the parsed input to the model and fine-tuned it with the default fine-tuning parameters. Each data point, consisting of a user query and the corresponding entities, is converted to a sentence-wise format that can be fed to an LLM for training. The entities are shuffled before being sent to the model so that the model does not overfit to particular entity positions.
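The data-formatting step above can be sketched roughly as below. The prompt template, entity-listing format, and function names are assumptions for illustration, not the paper's actual template; only the shuffle-to-avoid-positional-overfitting idea comes from the text.

```python
# Illustrative sketch: convert a (query, candidate entities) data point
# into a single text sequence for LLM fine-tuning, shuffling entity
# order so the model does not overfit to particular entity positions.
import random

def to_training_text(query, entities, label_id, seed=None):
    rng = random.Random(seed)
    entities = entities[:]           # copy so the caller's list is untouched
    rng.shuffle(entities)            # randomize entity order per example
    listed = " ".join(f"{eid}: {text}" for eid, text in entities)
    prompt = f"Entities: {listed} Query: {query} Answer:"
    target = str(label_id)           # the id of the referent entity
    return prompt, target

prompt, target = to_training_text(
    "call the business number",
    [(1, "Joe's Pizza"), (2, "555-0132")],
    label_id=2,
    seed=0,
)
```

Because the label is an entity id rather than a position, the shuffled ordering changes the prompt but not the target, which is what makes the augmentation safe.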

ReALM outperforms the MARRS model on all types of datasets. It also outperforms GPT-3.5, which has several orders of magnitude more parameters than ReALM, and performs in the same ballpark as the latest GPT-4 despite being a much lighter (and faster) model. The researchers highlight the gains on on-screen datasets, where ReALM's textual encoding approach performs almost as well as GPT-4 even though the latter is provided with screenshots.

In conclusion, this research introduces ReALM, which uses LLMs to perform reference resolution by encoding entity candidates as natural text. The researchers demonstrate how on-screen entities can be passed to an LLM using a unique textual representation that effectively summarizes the user's screen while retaining the relative spatial positions of its entities. ReALM outperforms previous approaches and, despite having far fewer parameters and operating purely in the textual domain, performs roughly on par with today's state-of-the-art LLM, GPT-4, even for on-screen references. It also outperforms GPT-4 on domain-specific user utterances, making ReALM an ideal choice for a practical reference resolution system.

Check out the Paper. All credit for this research goes to the researchers of this project.

    The post Apple Researchers Present ReALM: An AI that Can ‘See’ and Understand Screen Context appeared first on MarkTechPost.
