    Can LLMs Visualize Graphics? Assessing Symbolic Program Understanding in AI

    August 19, 2024

    Large language models (LLMs) can generate general-purpose computer programs, which suggests some grasp of program structure. Their true capabilities are hard to measure, however, especially on tasks they did not see during training. One open question is whether LLMs can truly “understand” symbolic graphics programs, which generate visual content when executed. The researchers behind this work define that understanding as the ability to grasp the semantic content of the rendered image from the raw text of the program alone. In practice, the model must answer questions about the image’s content without ever viewing it, a task that is easy with visual input but much harder when only the program’s text is available.
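
    To make the task concrete, here is a minimal sketch of what such a question might look like. The SVG program, the question, the answer options, and the `build_prompt` helper are all invented for illustration; they are not taken from the benchmark. Note that SVG's y axis points downward, so the circle (`cy="25"`) renders above the square (`y="55"`).

```python
# A tiny symbolic graphics program: SVG source text that draws a red circle
# above a blue square. Shapes, colors, and coordinates are illustrative only.
SVG_PROGRAM = """<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
  <circle cx="50" cy="25" r="15" fill="red"/>
  <rect x="35" y="55" width="30" height="30" fill="blue"/>
</svg>"""


def build_prompt(program: str, question: str, options: list[str]) -> str:
    """Format a multiple-choice question about a program's rendered output.

    The model receives only the program text, never the rendered image.
    (Hypothetical helper; the prompt wording is an assumption.)
    """
    letters = "ABCDEFGH"
    lines = [
        "Answer the question about the image this program renders.",
        "",
        program,
        "",
        f"Question: {question}",
    ]
    lines += [f"{letters[i]}. {opt}" for i, opt in enumerate(options)]
    lines.append("Answer with a single letter.")
    return "\n".join(lines)


prompt = build_prompt(
    SVG_PROGRAM,
    "Which shape is drawn above the other?",
    ["The square", "The circle", "They overlap completely", "Neither is drawn"],
)
print(prompt)
```

    A person who sees the rendered image answers at a glance; a model given only this prompt has to work out the layout from the coordinates.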

    Existing research on symbolic graphics programs has focused primarily on procedural modeling of 2D shapes and 3D geometry. Representations such as Constructive Solid Geometry (CSG), Computer-Aided Design (CAD), and Scalable Vector Graphics (SVG) describe visual content in a clear, interpretable form. LLMs have also been applied to programming tasks such as code retrieval, automated testing, and code generation. Understanding symbolic graphics programs is a different problem, however, because their semantic meaning is often defined visually. Existing LLM benchmarks focus on non-graphics program understanding, while vision-language models are evaluated on multimodal datasets for tasks like image captioning and visual question answering.

    Researchers from the Max Planck Institute for Intelligent Systems, Tübingen, the University of Cambridge, and MIT have proposed an approach to evaluate and improve LLMs’ understanding of symbolic graphics programs. They introduce SGP-Bench, a benchmark that tests LLMs’ semantic understanding and consistency when interpreting SVG (2D vector graphics) and CAD (2D/3D objects) programs. They also develop symbolic instruction tuning, a fine-tuning method built on a collected instruction-following dataset, to improve performance. Finally, a symbolic MNIST dataset (SGP-MNIST) created by the researchers reveals major differences between LLM and human understanding of symbolic graphics programs.

    The benchmark is built with a scalable, efficient pipeline. A powerful vision-language model (GPT-4o) generates semantic questions from rendered images of the symbolic programs, and human annotators then verify the quality and accuracy of the automatically generated question-answer pairs. This requires less manual effort than traditional data creation methods. SVG and 2D CAD programs are straightforward to handle because they directly produce 2D images; for 3D CAD programs, the 3D models are first converted into 2D images from multiple fixed camera positions.
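
    The pipeline described above can be sketched roughly as follows. This is a structural outline under stated assumptions, not the authors' implementation: `QAPair`, `fixed_camera_positions`, and `build_benchmark_items` are hypothetical names, the camera layout (four views on a circle) is a guess, and the renderer, the vision-language model, and the human check are passed in as plain callables and replaced by stubs so the example runs on its own.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass
class QAPair:
    question: str
    options: list[str]
    answer: str
    verified: bool = False  # set to True once a human annotator approves it


def fixed_camera_positions(n_views: int = 4, radius: float = 3.0,
                           height: float = 1.5) -> list[tuple[float, float, float]]:
    """Evenly spaced camera positions on a circle around the origin.

    View count, radius, and height are illustrative assumptions.
    """
    return [
        (radius * math.cos(2 * math.pi * k / n_views),
         radius * math.sin(2 * math.pi * k / n_views),
         height)
        for k in range(n_views)
    ]


def build_benchmark_items(
    program: str,
    render: Callable[[str], list[bytes]],
    ask_vlm: Callable[[list[bytes]], list[QAPair]],
    human_approves: Callable[[QAPair], bool],
) -> list[QAPair]:
    """Render a program, generate candidate questions, keep the approved ones."""
    images = render(program)       # one image for 2D, several views for 3D
    candidates = ask_vlm(images)   # the VLM sees images, not program text
    kept = []
    for qa in candidates:
        if human_approves(qa):     # human verification step
            qa.verified = True
            kept.append(qa)
    return kept


# Stub stages so the sketch runs without a renderer, a VLM, or annotators.
def fake_render(program: str) -> list[bytes]:
    return [f"view-{k}".encode() for k, _ in enumerate(fixed_camera_positions())]


def fake_ask_vlm(images: list[bytes]) -> list[QAPair]:
    return [
        QAPair("What shape is shown?", ["circle", "square"], "circle"),
        QAPair("What color is it?", ["red", "blue"], "green"),  # bad candidate
    ]


def fake_human(qa: QAPair) -> bool:
    # A stand-in check: reject pairs whose answer is not among the options.
    return qa.answer in qa.options


kept = build_benchmark_items("<svg/>", fake_render, fake_ask_vlm, fake_human)
print(len(kept), "of 2 candidate questions kept")
```

    Passing the stages in as callables keeps the outline independent of any particular renderer or model API, which the article does not specify.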

    One evaluation uses the SGP-MNIST dataset, which consists of 1,000 SVG programs that render MNIST-like digit images, with 100 programs per digit (0-9). Humans easily recognize the rendered images, but LLMs find the symbolic programs extremely challenging to interpret. Even the advanced GPT-4o model performed only slightly better than random guessing, which yields 10% accuracy on this balanced ten-class task. This stark contrast between human and LLM performance highlights a significant gap in how machines and humans process symbolic representations of visual information.
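
    The random-guessing baseline follows directly from the dataset layout. The short sketch below rebuilds only the label structure described above (10 digits, 100 programs each) and scores a few trivial guessers; the `accuracy` helper and the guessers are illustrative, and no model results are reproduced here.

```python
import random

DIGITS = list(range(10))
PROGRAMS_PER_DIGIT = 100

# Ground-truth labels for a balanced set: 100 programs for each digit 0-9.
labels = [d for d in DIGITS for _ in range(PROGRAMS_PER_DIGIT)]


def accuracy(predictions: list[int], targets: list[int]) -> float:
    """Fraction of predictions that match the targets."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)


# With ten equally likely classes, uniform guessing is right 1 time in 10.
chance = 1 / len(DIGITS)

# Always answering the same digit also lands exactly on chance for balanced data.
constant_guesses = [0] * len(labels)

# A seeded uniform guesser should land near chance, up to sampling noise.
rng = random.Random(0)
random_guesses = [rng.choice(DIGITS) for _ in labels]

print(f"dataset size:      {len(labels)}")
print(f"chance accuracy:   {chance:.2f}")
print(f"constant guesser:  {accuracy(constant_guesses, labels):.2f}")
print(f"random guesser:    {accuracy(random_guesses, labels):.3f}")
```

    Any reported accuracy on SGP-MNIST is best read against this 10% floor rather than against zero.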

    In conclusion, the researchers present a new way to evaluate LLMs: testing whether they can understand images directly from symbolic graphics programs, without visual input. They created SGP-Bench, a benchmark that measures how well LLMs perform on this task, and introduced Symbolic Instruction Tuning (SIT) to improve LLMs’ ability to interpret graphics programs. This research gives a clearer picture of LLM capabilities and encourages more varied evaluation tasks. Future work includes investigating how LLMs understand semantics in this setting and developing more advanced methods to improve their performance on these tasks.

    The post Can LLMs Visualize Graphics? Assessing Symbolic Program Understanding in AI appeared first on MarkTechPost.
