    Microsoft Research Introduces Data Formulator: An AI Application that Leverages LLMs to Transform Data and Create Rich Visualizations

    February 15, 2025

    Most modern visualization authoring tools, such as Charticulator, Data Illustrator, and Lyra, and libraries such as ggplot2 and Vega-Lite expect tidy data, where every variable to be visualized is a column and each observation is a row. When the input data is already tidy, authors only need to bind data columns to visual channels. Otherwise, they must first prepare the data, even if the original data is clean and contains all the necessary information. This preparation typically means transforming the data with specialized libraries like tidyverse or pandas, or with separate tools like Wrangler, before any visualization can be created. The requirement poses two major challenges: the need for programming expertise or specialized tool knowledge, and an inefficient workflow of constantly switching between data transformation and visualization steps.
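
    To make the tidy-data requirement concrete, here is a minimal sketch of the kind of reshaping an author would otherwise do by hand. The table, the city names, and the `to_tidy` helper are all hypothetical and are not taken from the paper; the sketch uses only the Python standard library, though in practice a call such as pandas `DataFrame.melt` would do the same job.

```python
# Hypothetical wide table: one row per date, one column per city.
wide_rows = [
    {"date": "2020-01-01", "Seattle": 51, "Atlanta": 45},
    {"date": "2020-01-02", "Seattle": 45, "Atlanta": 47},
]


def to_tidy(rows, id_key, var_name, value_name):
    """Unpivot wide rows into tidy rows: one observation per row."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column == id_key:
                continue  # the identifier is repeated on every output row
            tidy.append({id_key: row[id_key], var_name: column, value_name: value})
    return tidy


tidy_rows = to_tidy(wide_rows, "date", "city", "temperature")
for r in tidy_rows:
    print(r)
```

    After reshaping, `city` and `temperature` are ordinary columns that can be bound directly to color and y channels, which is what the tools above expect.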

    Various approaches have emerged to simplify visualization creation, starting with the grammar of graphics, which established the foundation for mapping data to visual elements. High-level grammar-based tools like ggplot2, Vega-Lite, and Altair have gained popularity for their concise syntax and abstraction of complex implementation details. More advanced approaches include visualization-by-demonstration tools like Lyra 2 and VbD, which allow users to specify visualizations through direct manipulation. Natural language interfaces, such as NCNet and VisQA, have also been developed to make visualization creation more intuitive. However, these solutions either require tidy data input or, as with Falx, introduce new complexity by focusing on low-level specifications.

    A team from Microsoft Research has proposed Data Formulator, an innovative visualization authoring tool built around a new paradigm called concept binding. It allows users to express their visualization intent by binding data concepts to visual channels, where data concepts can either come from existing columns or be created on demand. The tool supports two methods for creating new concepts: natural language prompts for data derivation and example-based input for data reshaping. When users select a chart type and map their desired concepts, Data Formulator’s AI backend infers the necessary data transformations and generates candidate visualizations. The system provides explanatory feedback for multiple candidates, enabling users to inspect, refine, and iterate on their visualizations through an intuitive interface.
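
    One way to picture concept binding is as a mapping from visual channels to named concepts, some of which do not exist in the table yet. The sketch below is an illustrative assumption, not Data Formulator's actual data model: the `Concept` class, the `missing_concepts` helper, and the example column names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Concept:
    """Hypothetical data concept: a current or future table column."""
    name: str
    prompt: Optional[str] = None  # natural-language description, if derived


def missing_concepts(encoding, table_columns):
    """Return bound concepts that are not yet columns and so need a transformation."""
    return [c for c in encoding.values() if c.name not in table_columns]


# Columns of a hypothetical wide input table.
columns = {"date", "Seattle", "Atlanta"}

# The author binds concepts to channels, whether or not they exist yet.
encoding = {
    "x": Concept("date"),
    "y": Concept("temperature"),
    "color": Concept("city"),
}

todo = missing_concepts(encoding, columns)
print([c.name for c in todo])
```

    In this toy model, the concepts left in `todo` are the ones a backend would have to produce by deriving or reshaping data before the chart can be drawn.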

    Data Formulator’s architecture is built around the core concept of treating data concepts as first-class objects that serve as abstractions of existing and potential future table columns. This design fundamentally differs from traditional approaches by focusing on concept-level transformations rather than table-level operators, making it more intuitive for users to communicate with the AI agent and verify results. The natural language component of the tool utilizes LLMs’ ability to understand high-level intent and natural concepts, while the programming-by-example component offers precise, unambiguous reshaping operations through demonstration. This hybrid architecture allows users to work with familiar shelf-configuration tools while accessing powerful transformation capabilities.
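
    The difference between a concept-level transformation and a table-level operator can be sketched as follows: the author asks for one new column, and the system supplies a function over existing columns. The `derive_column` helper and the lambda standing in for generated code are assumptions made for illustration and do not reflect the tool's real implementation.

```python
def derive_column(rows, new_name, source_columns, fn):
    """Add one derived column computed row by row from the named source columns."""
    return [
        {**row, new_name: fn(*(row[c] for c in source_columns))}
        for row in rows
    ]


rows = [
    {"date": "2020-01-01", "Seattle": 51, "Atlanta": 45},
    {"date": "2020-01-02", "Seattle": 45, "Atlanta": 47},
]

# The lambda stands in for code that might be generated from a prompt such as
# "difference between Seattle and Atlanta"; it is written by hand here.
derived = derive_column(rows, "difference", ["Seattle", "Atlanta"], lambda s, a: s - a)
print([r["difference"] for r in derived])
```

    Because the result is a single named column, the author can check it by reading a few rows rather than reasoning about a chain of table operators.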

    Data Formulator’s evaluation through user testing revealed promising results in task completion and usability. Participants completed all assigned visualization tasks within an average time of 20 minutes, with Task 6 requiring the most time due to its complexity involving 7-day moving average calculations. The system’s dual-interaction approach proved effective, though some participants needed occasional hints regarding concept type selection and data type management. For derived concepts, users averaged 1.62 prompt attempts with relatively concise descriptions (average of 7.28 words), and the system generated approximately 1.94 candidates per prompt. Most challenges encountered were minor and related to interface familiarization rather than fundamental usability issues.
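
    For a sense of what a task like the 7-day moving average involves, here is a minimal standard-library sketch. It assumes a trailing window that averages over however many values are available at the start of the series; the study's exact window definition is not given here, and the sample values are made up.

```python
from collections import deque


def moving_average(values, window=7):
    """Trailing moving average; early entries average over the values seen so far."""
    recent = deque(maxlen=window)  # automatically drops values older than the window
    averages = []
    for v in values:
        recent.append(v)
        averages.append(sum(recent) / len(recent))
    return averages


# Hypothetical daily counts, one value per day.
daily_counts = [10, 20, 30, 40, 50, 60, 70, 80]
result = moving_average(daily_counts)
print(result)
```

    A derived concept of this kind depends on neighboring rows rather than on a single row, which may be part of why it took participants longer to specify.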

    In conclusion, Data Formulator represents a significant advance in visualization authoring, addressing the persistent challenge of data transformation through its concept-driven approach. The tool's combination of AI assistance and user interaction enables authors to create complex visualizations without handling data transformations directly. The user study supports the tool's effectiveness, showing that even users facing complex data transformation requirements can successfully create their desired visualizations. Looking forward, this concept-driven approach shows promise for influencing the next generation of visual data exploration and authoring tools, potentially lowering the long-standing barrier of data transformation in visualization creation.


    Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.

    The post Microsoft Research Introduces Data Formulator: An AI Application that Leverages LLMs to Transform Data and Create Rich Visualizations appeared first on MarkTechPost.
