
    Microsoft Researchers Propose Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models

    April 9, 2024

Large language models (LLMs) excel at language comprehension and verbal reasoning, yet their spatial reasoning, a vital aspect of human cognition, remains largely unexplored. Humans demonstrate remarkable skill in mental imagery, termed the Mind’s Eye, which lets them imagine parts of the world they cannot see. This capability is comparatively absent in LLMs, highlighting a gap in their understanding of spatial concepts and their inability to replicate human-like imagination.

Previous studies have highlighted the remarkable achievements of LLMs in language tasks while underscoring their underexplored spatial reasoning abilities. Whereas human cognition relies on spatial reasoning to interact with the environment, LLMs depend primarily on verbal reasoning. Humans augment spatial awareness through mental imagery, enabling tasks such as navigation and mental simulation, a concept studied extensively across neuroscience, philosophy, and cognitive science.

Microsoft researchers propose Visualization-of-Thought (VoT) prompting, which lets LLMs generate and manipulate mental images, much like the human mind’s eye, for spatial reasoning. Through VoT prompting, LLMs use a visuospatial sketchpad to visualise their reasoning steps, which enhances subsequent spatial reasoning. VoT is a zero-shot prompting method: it relies on the LLM’s ability to form mental images from text-based visual art rather than on few-shot demonstrations or text-to-image techniques such as CLIP.

VoT prompts LLMs to generate a visualisation after each reasoning step, forming interleaved reasoning traces. The visuospatial sketchpad tracks the visual state, represented by the partial solution at each step. This mechanism grounds the LLM’s reasoning in a visual context, improving its spatial reasoning in tasks such as navigation and tiling; a minimal sketch of the prompting pattern follows.
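To make this concrete, here is a minimal sketch of what a zero-shot VoT-style prompt could look like for a toy grid-navigation task. The prompt wording, the grid symbols, and the call_llm placeholder are illustrative assumptions, not the authors’ exact implementation.

```python
# Minimal sketch of Visualization-of-Thought (VoT) style zero-shot prompting
# for a toy grid-navigation task. Prompt wording and call_llm() are
# illustrative assumptions, not the paper's exact implementation.

VOT_SUFFIX = (
    "After each reasoning step, visualise the current state of the grid as "
    "text, marking your position with 'X', visited cells with '.', and the "
    "goal with 'G'. Then continue reasoning from that visualisation."
)

def build_vot_prompt(task: str) -> str:
    """Zero-shot VoT prompt: the task plus an instruction to interleave a
    text visualisation (the 'visuospatial sketchpad') after every step."""
    return f"{task}\n\n{VOT_SUFFIX}"

def call_llm(prompt: str) -> str:
    """Placeholder for a chat-model call (e.g. GPT-4); plug in your own client."""
    raise NotImplementedError("wire this up to the LLM API of your choice")

if __name__ == "__main__":
    task = ("You start at the top-left of a 3x3 grid and must reach the "
            "bottom-right corner. Allowed moves: up, down, left, right. "
            "List the moves you take.")
    prompt = build_vot_prompt(task)
    print(prompt)
    # response = call_llm(prompt)
    # A VoT response is an interleaved trace: reasoning step, grid sketch,
    # reasoning step, grid sketch, ..., final move sequence.
```

The interleaved grid sketches play the role of the visuospatial sketchpad: each records the partial solution so far and grounds the next reasoning step.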

    GPT-4 VoT surpasses other settings across all tasks and metrics, indicating the effectiveness of visual state tracking. Comparisons reveal significant performance gaps, highlighting VoT’s superiority. In the natural language navigation task, GPT-4 VoT outperforms GPT-4 w/o VoT by 27%. Notably, GPT-4 CoT lags behind GPT-4V CoT in visual tasks, suggesting the advantage of grounding LLMs with a 2D grid for spatial reasoning.

    The key contributions of this research are the following:

    • The paper explores LLMs’ mental imagery for spatial reasoning, analysing its nature and constraints and tracing its origin to code pre-training.

    • It introduces two tasks, “visual navigation” and “visual tiling,” accompanied by synthetic datasets. These offer diverse sensory inputs for LLMs and varying levels of complexity, providing a robust testbed for spatial reasoning research (a small data-generation sketch follows this list).

    • The researchers propose VoT prompting, which effectively elicits LLMs’ mental imagery for spatial reasoning and outperforms other prompting methods as well as existing multimodal large language models (MLLMs). Because this capability resembles the human mind’s eye, it may also prove useful for enhancing MLLMs.
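For a rough sense of what a synthetic visual-navigation instance could look like, the sketch below generates a small random grid world and renders it as text. The grid size, symbols, and generation logic are assumptions made for illustration; the paper’s actual datasets (which also cover visual tiling) are constructed differently.

```python
import random

def make_grid(size: int = 4, n_obstacles: int = 3, seed: int = 0) -> str:
    """Render a tiny random navigation grid as text: S = start, G = goal,
    # = obstacle, . = free cell. Purely illustrative; a real generator
    would also check that the goal stays reachable."""
    rng = random.Random(seed)
    cells = [["." for _ in range(size)] for _ in range(size)]
    cells[0][0] = "S"                  # start in the top-left corner
    cells[size - 1][size - 1] = "G"    # goal in the bottom-right corner
    free = [(r, c) for r in range(size) for c in range(size)
            if cells[r][c] == "."]
    for r, c in rng.sample(free, n_obstacles):
        cells[r][c] = "#"
    return "\n".join(" ".join(row) for row in cells)

if __name__ == "__main__":
    # Prints something along the lines of:
    # S . . #
    # . # . .
    # . . . .
    # # . . G
    print(make_grid())
```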

In conclusion, the research introduces VoT, which mirrors the human cognitive ability to visualise mental images. VoT enables LLMs to excel at multi-hop spatial reasoning, surpassing MLLMs on visual tasks. The findings underscore VoT’s efficacy in enhancing spatial reasoning in LLMs and suggest that the same mind’s-eye mechanism could also advance multimodal language models.

Check out the Paper. All credit for this research goes to the researchers of this project.

