    Garbage in, garbage out: The importance of data quality when training AI models

    June 2, 2025

    As every company moves to implement AI in some form or another, data is king. Without quality data to train on, the AI likely won’t deliver the results people are looking for, and any investment made in training the model won’t pay off as intended.

    “If you’re training your AI model on poor quality data, you’re likely to get bad results,” explained Robert Stanley, senior director of special projects at Melissa. 

    According to Stanley, there are a number of data quality best practices to stick to when it comes to training data. “You need to have data that is of good quality, which means it’s properly typed, it’s fielded correctly, it’s deduplicated, and it’s rich. It’s accurate, complete and augmented or well-defined with lots of useful metadata, so that there’s context for the AI model to work off of,” he said. 

    If the training data does not meet those standards, it’s likely that the outputs of the AI model won’t be reliable, Stanley explained. For instance, if data has the wrong fields, then the model might start giving strange and unexpected outputs. “It thinks it’s giving you a noun, but it’s really a verb. Or it thinks it’s giving you a number, but it’s really a string because it’s fielded incorrectly,” he said. 
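To make the criteria above concrete, here is a minimal sketch of a pre-training check for typing, fielding, completeness, and duplicates. The schema, the field names (`name`, `postal_code`, `age`), and the duplicate key are assumptions chosen for illustration; they are not taken from Melissa's tooling.

```python
# Hypothetical schema (an assumption for this sketch): field name -> expected type.
EXPECTED_TYPES = {"name": str, "postal_code": str, "age": int}


def find_issues(record: dict) -> list[str]:
    """Return human-readable problems found in one record."""
    issues = []
    for field, expected in EXPECTED_TYPES.items():
        value = record.get(field)
        if value is None or value == "":
            # Completeness: the field is absent or empty.
            issues.append(f"{field}: missing")
        elif not isinstance(value, expected):
            # Typing/fielding: e.g. a number stored as a string.
            issues.append(
                f"{field}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return issues


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first record per (name, postal_code), ignoring case and spacing."""
    seen = set()
    unique = []
    for record in records:
        key = (
            str(record.get("name", "")).strip().lower(),
            str(record.get("postal_code", "")).strip(),
        )
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

A record whose `age` arrives as the string `"41"` would be flagged here, which is the "it thinks it's a number, but it's really a string" case Stanley describes. Real pipelines would likely use fuzzier duplicate matching than an exact normalized key.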

    It’s also important to ensure that the data is appropriate to the model you are trying to build, whether that is business data, contact data, or health care data.

    “I would just sort of be going down these data quality steps that would be recommended before you even start your AI project,” he said. Melissa’s “Gold Standard” for any business-critical data is to use data that comes from at least three different sources and that is dynamically updated.
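The article does not say how values from multiple sources are reconciled. As one possible illustration only, the sketch below accepts a value when at least three sources supply it and a strict majority agree. The function name `reconcile` and the majority rule are assumptions, not Melissa's method.

```python
from collections import Counter
from typing import Optional


def reconcile(values: list[str], min_sources: int = 3) -> Optional[str]:
    """Return the normalized value most sources agree on, or None.

    None is returned when fewer than `min_sources` non-empty values are
    supplied, or when no value is backed by a strict majority.
    """
    present = [v.strip().lower() for v in values if v and v.strip()]
    if len(present) < min_sources:
        return None  # not enough independent sources to decide
    value, count = Counter(present).most_common(1)[0]
    return value if count > len(present) / 2 else None
```

Returning `None` rather than guessing keeps unresolved fields visible, so they can be reviewed before they reach the training set.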

    According to Stanley, large language models (LLMs) tend to want to please their users, which sometimes means giving answers that look compellingly right but are actually incorrect.

    This is why the data quality process doesn’t stop after training; it’s important to continue testing the model’s outputs to ensure that its responses are what you’d expect to see. 

    “You can ask questions of the model and then check the answers by comparing it back to the reference data and making sure it’s matching your expectations, like they’re not mixing up names and addresses or anything like that,” Stanley explained.
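The check Stanley describes can be sketched as a small comparison loop. Here `ask` is a hypothetical callable standing in for whatever interface queries the model, and `reference` is an assumed mapping from question to expected answer. Exact matching after normalization is a simplification; real answers would usually need looser comparison.

```python
from typing import Callable


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    return " ".join(text.lower().split())


def check_answers(
    ask: Callable[[str], str], reference: dict[str, str]
) -> list[tuple[str, str, str]]:
    """Ask every reference question; return (question, expected, actual) mismatches."""
    mismatches = []
    for question, expected in reference.items():
        actual = ask(question)
        if normalize(actual) != normalize(expected):
            mismatches.append((question, expected, actual))
    return mismatches
```

An empty result means every answer matched the reference data. Any returned tuple points at a specific question where the model diverged, such as a mixed-up name or address.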

    For instance, Melissa has curated reference datasets covering geographic, business, identification, and other domains, and its informatics division uses ontological reasoning built on formal semantic technologies to compare AI results with the results expected from real-world models.

    The post Garbage in, garbage out: The importance of data quality when training AI models appeared first on SD Times.
