
    Garbage in, garbage out: The importance of data quality when training AI models

    June 2, 2025

    As companies move to implement AI in one form or another, data is king. Without quality data to train on, a model is unlikely to deliver the results people are looking for, and the investment made in training it won’t pay off as intended.

    “If you’re training your AI model on poor quality data, you’re likely to get bad results,” explained Robert Stanley, senior director of special projects at Melissa. 

    According to Stanley, there are a number of data quality best practices to stick to when it comes to training data. “You need to have data that is of good quality, which means it’s properly typed, it’s fielded correctly, it’s deduplicated, and it’s rich. It’s accurate, complete and augmented or well-defined with lots of useful metadata, so that there’s context for the AI model to work off of,” he said. 

    If the training data does not meet those standards, it’s likely that the outputs of the AI model won’t be reliable, Stanley explained. For instance, if data has the wrong fields, then the model might start giving strange and unexpected outputs. “It thinks it’s giving you a noun, but it’s really a verb. Or it thinks it’s giving you a number, but it’s really a string because it’s fielded incorrectly,” he said. 
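
The checks Stanley lists (correct typing, correct fielding, completeness, deduplication) can be sketched in a few lines of Python. This is a minimal illustration, not Melissa's tooling: the record layout in `REQUIRED_FIELDS` and the helper names `check_record` and `deduplicate` are assumptions made up for the example.

```python
# Hypothetical record layout: field name -> expected Python type.
# These fields are illustrative assumptions, not taken from the article.
REQUIRED_FIELDS = {"name": str, "postal_code": str, "age": int}


def check_record(record: dict) -> list[str]:
    """Return the data quality problems found in a single record."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        value = record.get(field)
        if value is None or value == "":
            # Completeness: the field is absent or empty.
            problems.append(f"{field}: missing")
        elif not isinstance(value, expected_type):
            # Typing/fielding: e.g. a number stored as a string.
            problems.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
    return problems


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first copy of each record, ignoring case and extra whitespace."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(
            str(record.get(field, "")).strip().lower() for field in REQUIRED_FIELDS
        )
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


records = [
    {"name": "Ada Lovelace", "postal_code": "92688", "age": 36},
    {"name": "ada lovelace ", "postal_code": "92688", "age": 36},  # duplicate
    {"name": "Alan Turing", "postal_code": "", "age": "41"},  # empty field, wrong type
]

unique_records = deduplicate(records)
for record in unique_records:
    print(record["name"], check_record(record))
```

Running checks like these before training makes the "number that is really a string" case visible as an explicit report instead of as a strange model output later.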

    It’s also important to use data that is appropriate to the model you are trying to build, whether that is business data, contact data, or health care data.

    “I would just sort of be going down these data quality steps that would be recommended before you even start your AI project,” he said. Melissa’s “Gold Standard” for any business-critical data is to use data that comes from at least three different sources and is dynamically updated.

    According to Stanley, large language models (LLMs) tend to try to please their users, which sometimes means giving answers that look compellingly right but are actually incorrect.

    This is why the data quality process doesn’t stop after training; it’s important to continue testing the model’s outputs to ensure that its responses are what you’d expect to see. 

    “You can ask questions of the model and then check the answers by comparing it back to the reference data and making sure it’s matching your expectations, like they’re not mixing up names and addresses or anything like that,” Stanley explained.
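
The kind of check Stanley describes, asking the model questions and comparing its answers with reference data, might look like the sketch below. Everything named here is hypothetical: `check_answers`, the `reference` questions, and `fake_model`, which stands in for a real model call so the example runs on its own.

```python
from typing import Callable


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def check_answers(
    ask_model: Callable[[str], str], reference: dict[str, str]
) -> list[tuple[str, str, str]]:
    """Return (question, expected, actual) for every answer that misses the reference."""
    mismatches = []
    for question, expected in reference.items():
        actual = ask_model(question)
        if normalize(actual) != normalize(expected):
            mismatches.append((question, expected, actual))
    return mismatches


# Hypothetical reference data: question -> answer known to be correct.
reference = {
    "customer 1 name": "Ada Lovelace",
    "customer 1 city": "London",
}


def fake_model(question: str) -> str:
    """Stand-in for a real model; it deliberately returns an address for the city."""
    canned = {
        "customer 1 name": "ada  lovelace",
        "customer 1 city": "12 Main Street",
    }
    return canned[question]


mismatches = check_answers(fake_model, reference)
for question, expected, actual in mismatches:
    print(f"{question}: expected {expected!r}, got {actual!r}")
```

Exact matching after normalization is the simplest possible comparison; real reference checks would likely need fuzzier matching for names and addresses.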

    For instance, Melissa has curated reference datasets covering geographic, business, identification, and other domains, and its informatics division uses ontological reasoning, built on formal semantic technologies, to compare AI results with the results expected from real-world models.

    The post Garbage in, garbage out: The importance of data quality when training AI models appeared first on SD Times.
