
    Garbage in, garbage out: The importance of data quality when training AI models

    June 2, 2025

    As every company moves to implement AI in some form, data is king. Without quality data to train on, a model likely won’t deliver the results people are looking for, and the investment in training it won’t pay off as intended.

    “If you’re training your AI model on poor quality data, you’re likely to get bad results,” explained Robert Stanley, senior director of special projects at Melissa. 

    According to Stanley, there are a number of data quality best practices to stick to when it comes to training data. “You need to have data that is of good quality, which means it’s properly typed, it’s fielded correctly, it’s deduplicated, and it’s rich. It’s accurate, complete and augmented or well-defined with lots of useful metadata, so that there’s context for the AI model to work off of,” he said. 

    If the training data does not meet those standards, it’s likely that the outputs of the AI model won’t be reliable, Stanley explained. For instance, if data has the wrong fields, then the model might start giving strange and unexpected outputs. “It thinks it’s giving you a noun, but it’s really a verb. Or it thinks it’s giving you a number, but it’s really a string because it’s fielded incorrectly,” he said. 
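The checks Stanley lists (properly typed, correctly fielded, deduplicated, complete) can be sketched in a few lines of Python. This is only an illustrative sketch, not Melissa's tooling: the `SCHEMA`, the field names, and the sample records are all hypothetical, and real pipelines would need fuzzier duplicate matching than the exact comparison used here.

```python
# Hypothetical schema (an assumption for illustration): field name -> expected type.
SCHEMA = {"name": str, "postal_code": str, "age": int}


def check_record(record: dict) -> list[str]:
    """Return the problems found in one record (empty list if it is clean)."""
    problems = []
    for field, expected in SCHEMA.items():
        value = record.get(field)
        if value is None or value == "":
            problems.append(f"{field}: missing")
        elif not isinstance(value, expected):
            problems.append(
                f"{field}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return problems


def clean(records: list[dict]) -> tuple[list[dict], list[tuple[dict, list[str]]]]:
    """Split records into (kept, rejected), rejecting invalid rows and duplicates."""
    kept, rejected, seen = [], [], set()
    for record in records:
        problems = check_record(record)
        if problems:
            rejected.append((record, problems))
            continue
        # Duplicates are detected on a case- and whitespace-insensitive key.
        key = tuple(str(record[f]).strip().lower() for f in SCHEMA)
        if key in seen:
            rejected.append((record, ["duplicate"]))
            continue
        seen.add(key)
        kept.append(record)
    return kept, rejected


# Made-up sample data, one row per failure mode.
records = [
    {"name": "Ada Lovelace", "postal_code": "92688", "age": 36},
    {"name": "ada lovelace ", "postal_code": "92688", "age": 36},  # duplicate
    {"name": "Alan Turing", "postal_code": "SK9", "age": "41"},    # number stored as string
    {"name": "Grace Hopper", "postal_code": "", "age": 85},        # incomplete
]
kept, rejected = clean(records)
for record, problems in rejected:
    print(record["name"], "->", problems)
```

The "number that is really a string" case from the quote above is the third record: the value looks fine to a human reader but fails the type check before it ever reaches training.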

    It’s also important to ensure that you have the right kind of data that is appropriate to the model you are trying to build, whether that be business data or contact data or health care data. 

    “I would just sort of be going down these data quality steps that would be recommended before you even start your AI project,” he said. Melissa’s “Gold Standard” for any business-critical data is to use data that comes from at least three different sources and is dynamically updated.

    According to Stanley, large language models (LLMs) tend to want to please their users, which sometimes means giving answers that look compelling and correct but are actually wrong.

    This is why the data quality process doesn’t stop after training; it’s important to continue testing the model’s outputs to ensure that its responses are what you’d expect to see. 

    “You can ask questions of the model and then check the answers by comparing it back to the reference data and making sure it’s matching your expectations, like they’re not mixing up names and addresses or anything like that,” Stanley explained.
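A minimal sketch of that kind of spot check is shown here, assuming the model is reachable through some callable that takes a question and returns text. The `fake_model` function, the `REFERENCE` questions, and the answers are hypothetical stand-ins, and exact-match comparison after normalization is a simplifying assumption; it is not the ontological reasoning approach Melissa describes.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(text.lower().split())


def audit_answers(ask, reference: dict[str, str]) -> list[tuple[str, str, str]]:
    """Return (question, expected, actual) for each answer that differs from reference."""
    mismatches = []
    for question, expected in reference.items():
        actual = ask(question)
        if normalize(actual) != normalize(expected):
            mismatches.append((question, expected, actual))
    return mismatches


# Hypothetical reference data: question -> answer known to be correct.
REFERENCE = {
    "What city is Example Corp located in?": "Springfield",
    "What is the postal code of Example Corp?": "00000",
}


def fake_model(question: str) -> str:
    """Stand-in for a real model call; it mixes up a name and an address field."""
    canned = {
        "What city is Example Corp located in?": "  springfield ",
        "What is the postal code of Example Corp?": "Example Corp",
    }
    return canned.get(question, "")


for question, expected, actual in audit_answers(fake_model, REFERENCE):
    print(f"MISMATCH: {question!r} expected {expected!r}, got {actual!r}")
```

Running a check like this on a schedule, rather than once after training, matches the point that data quality work continues after the model is deployed.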

    For instance, Melissa has curated reference datasets that cover geographic, business, identification, and other domains, and its informatics division uses ontological reasoning, built on formal semantic technologies, to compare AI results with the results expected from real-world models.

    The post Garbage in, garbage out: The importance of data quality when training AI models appeared first on SD Times.
