
    Whiteboard-of-Thought (WoT) Prompting: A Simple AI Approach to Enhance the Visual Reasoning Abilities of MLLMs Across Modalities

    June 24, 2024

    Large language models (LLMs) have transformed natural language processing (NLP) by showing that scaling up parameters and training data improves performance on a range of reasoning tasks. One successful method, chain-of-thought (CoT) prompting, helps language models solve complex problems by writing out intermediate steps as text before giving the final answer, and it has mostly been applied to tasks like arithmetic and symbolic reasoning. This raises an important question: can LLMs tackle tasks that humans solve using visual thinking? Research shows that even the best LLMs perform poorly on tasks that require visual and spatial reasoning.

    The paper situates these shortcomings against three lines of existing work. The first is intermediate reasoning for language models, where the success of chain-of-thought (CoT) in arithmetic and symbolic reasoning tasks has attracted interest from the NLP community and beyond. The second is tool usage and code augmentation, which improves a language model with additional computation; the closest analogue to a whiteboard here is a text buffer trained on Python execution traces. The third is visual and spatial reasoning in LLMs and MLLMs, where the paper notes the limited success of these models on tasks requiring visual and spatial reasoning. Whether these models can connect knowledge from text to other domains, such as vision, is still debated.

    Researchers from Columbia University have proposed Whiteboard-of-Thought (WoT) prompting, a simple approach to enhance the visual reasoning abilities of MLLMs (multimodal large language models) across modalities. WoT prompting gives MLLMs a metaphorical ‘whiteboard’ on which they can draw out reasoning steps as images; these images are then returned to the model for further processing. The method needs no in-context examples or specialized modules. It relies on the models’ existing ability to write code with libraries like Matplotlib and Turtle. This simple method achieves state-of-the-art results on four difficult natural language tasks that require visual and spatial reasoning.
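
    The loop described above (query, then drawing code, then image, then answer) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `write_code` and `read_image` are hypothetical stand-ins for calls to a multimodal model, the `WOT_OUT` environment variable is an invented convention for telling the drawing code where to save its output, and the stub writes a tiny SVG with the standard library instead of calling Matplotlib or Turtle so that the sketch runs without extra packages.

```python
import os
import subprocess
import sys
import tempfile
from typing import Callable

# Hypothetical model interfaces (assumptions, not a real API):
#   write_code(query)             -> Python source that saves a drawing to os.environ["WOT_OUT"]
#   read_image(query, image_path) -> final answer after inspecting the drawing
CodeWriter = Callable[[str], str]
ImageReader = Callable[[str, str], str]


def render(code: str, out_path: str, timeout: float = 30.0) -> None:
    """Run model-written drawing code in a separate interpreter process."""
    env = dict(os.environ, WOT_OUT=out_path)
    subprocess.run([sys.executable, "-c", code], env=env, check=True, timeout=timeout)
    if not os.path.exists(out_path):
        raise RuntimeError("drawing code did not produce an image")


def whiteboard_of_thought(query: str, write_code: CodeWriter, read_image: ImageReader) -> str:
    """Zero-shot loop: query -> drawing code -> image -> answer."""
    code = write_code(query)
    with tempfile.TemporaryDirectory() as tmp:
        image_path = os.path.join(tmp, "whiteboard.svg")
        render(code, image_path)
        # The image is handed back to the model together with the original query.
        return read_image(query, image_path)


# Stand-ins so the sketch runs without a model or third-party packages.
def fake_write_code(query: str) -> str:
    return (
        "import os\n"
        "svg = '<svg xmlns=\"http://www.w3.org/2000/svg\" width=\"40\" height=\"40\">'\n"
        "svg += '<circle cx=\"20\" cy=\"20\" r=\"10\"/></svg>'\n"
        "with open(os.environ['WOT_OUT'], 'w') as f:\n"
        "    f.write(svg)\n"
    )


def fake_read_image(query: str, image_path: str) -> str:
    with open(image_path) as f:
        return "circle" if "<circle" in f.read() else "unknown"


answer = whiteboard_of_thought("What shape is drawn?", fake_write_code, fake_read_image)
```

    A subprocess is not a sandbox. Anything that executes model-written code in practice would need proper isolation, which this sketch leaves out.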

    The main aim of WoT is to let MLLMs create images and then process them visually in order to answer queries better. Current MLLMs usually cannot produce outputs in the visual domain, so the researchers showed how to create visuals with a model that generates only text: the model writes code, and running that code produces the image. The images used for visual reasoning are minimal, abstract, and symbolic, which makes code a natural way to produce them. The researchers also found several scenarios where GPT-4o fails badly with chain-of-thought, reaching 0% accuracy in some cases. In contrast, WoT can achieve up to 92% accuracy in the same scenarios.

    The experiments show that text-only LLMs perform best in a 2D grid setting but can perform poorly on other geometries. Two properties of grid settings may explain this:

    • Grids are easier to represent as coordinates in text, especially simple square grids.

    • More data is available online in this format, such as tabular data, city grids, and 2D maze coding problems.

    Humans often write about square grids and grid cells, and use them both to navigate physical spaces and to map conceptual spaces. This raises interesting questions about how spatial understanding differs between humans and LLMs. WoT performs consistently across geometries, which removes the dependence on textual knowledge specific to 2D grids and points to the general applicability of the approach.
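
    To make the first point concrete, the sketch below tracks a position through a sequence of moves on a square grid and on a hexagonal grid. It is only an illustration: the hexagonal grid is an assumed example of a non-square geometry, the axial (q, r) convention for pointy-top hexagons is one common choice among several, and none of the names come from the paper.

```python
from typing import Dict, Iterable, Tuple

Coord = Tuple[int, int]

# Square grid: each move changes exactly one of (x, y), which is easy to state in text.
SQUARE_STEPS: Dict[str, Coord] = {
    "up": (0, 1),
    "down": (0, -1),
    "left": (-1, 0),
    "right": (1, 0),
}

# Hexagonal grid in axial (q, r) coordinates, pointy-top cells (an assumed convention).
# Diagonal moves change both coordinates, so the bookkeeping is less obvious in text.
HEX_STEPS: Dict[str, Coord] = {
    "east": (1, 0),
    "west": (-1, 0),
    "north-east": (1, -1),
    "north-west": (0, -1),
    "south-east": (0, 1),
    "south-west": (-1, 1),
}


def walk(start: Coord, moves: Iterable[str], steps: Dict[str, Coord]) -> Coord:
    """Apply named moves one by one and return the final coordinate."""
    x, y = start
    for move in moves:
        dx, dy = steps[move]
        x, y = x + dx, y + dy
    return (x, y)


square_end = walk((0, 0), ["up", "right", "right"], SQUARE_STEPS)
hex_end = walk((0, 0), ["north-east", "south-east", "west"], HEX_STEPS)
```

    On the square grid the result can be read off almost directly from the move list. On the hexagonal grid the three moves form a closed loop back to the start, which is easy to see in a drawing but harder to infer from the text of the moves alone.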

    In conclusion, researchers from Columbia University have introduced WoT, a zero-shot method that enables visual reasoning across modalities in MLLMs. It works by generating code that creates a visual and then passing that visual to the model for further reasoning. The paper demonstrates WoT on multiple tasks that need visual and spatial understanding, tasks that have been difficult for current state-of-the-art models relying on text-only reasoning. However, WoT depends on accurate vision systems, so future research should aim to improve how well state-of-the-art MLLMs understand detailed geometric figures.

    All credit for this research goes to the researchers of this project.

    The post Whiteboard-of-Thought (WoT) Prompting: A Simple AI Approach to Enhance the Visual Reasoning Abilities of MLLMs Across Modalities appeared first on MarkTechPost.
