    Garbage in, garbage out: The importance of data quality when training AI models

    June 2, 2025

    As every company moves to implement AI in some form, data is king. Without quality data to train on, a model likely won’t deliver the results people are looking for, and any investment in training it won’t pay off as intended.

    “If you’re training your AI model on poor quality data, you’re likely to get bad results,” explained Robert Stanley, senior director of special projects at Melissa. 

    According to Stanley, there are a number of data quality best practices to stick to when it comes to training data. “You need to have data that is of good quality, which means it’s properly typed, it’s fielded correctly, it’s deduplicated, and it’s rich. It’s accurate, complete and augmented or well-defined with lots of useful metadata, so that there’s context for the AI model to work off of,” he said. 
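
    As a rough illustration of two of the checks Stanley lists (completeness and deduplication), here is a minimal Python sketch. The field names in `REQUIRED_FIELDS` and the normalization rule are assumptions made for the example, not anything Melissa describes.

```python
from typing import Iterable

# Hypothetical required fields; a real schema would come from your own data model.
REQUIRED_FIELDS = ("name", "email", "postal_code")


def normalize(record: dict) -> dict:
    """Trim whitespace and lowercase string values so near-duplicates compare equal."""
    return {
        key: value.strip().lower() if isinstance(value, str) else value
        for key, value in record.items()
    }


def clean_records(records: Iterable[dict]) -> tuple[list[dict], list[dict]]:
    """Split records into (kept, rejected).

    A record is rejected if a required field is missing or empty, or if a
    record with the same normalized required-field values was already kept.
    """
    kept, rejected, seen = [], [], set()
    for raw in records:
        record = normalize(raw)
        incomplete = any(not record.get(field) for field in REQUIRED_FIELDS)
        fingerprint = tuple(record.get(field) for field in REQUIRED_FIELDS)
        if incomplete or fingerprint in seen:
            rejected.append(raw)
            continue
        seen.add(fingerprint)
        kept.append(record)
    return kept, rejected
```

    Keeping the rejected records, rather than silently dropping them, makes it possible to review what was filtered out before training begins.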

    If the training data does not meet those standards, it’s likely that the outputs of the AI model won’t be reliable, Stanley explained. For instance, if data has the wrong fields, then the model might start giving strange and unexpected outputs. “It thinks it’s giving you a noun, but it’s really a verb. Or it thinks it’s giving you a number, but it’s really a string because it’s fielded incorrectly,” he said. 
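
    The number-versus-string problem Stanley describes can be caught before training with a simple type check per field. The sketch below assumes a hypothetical `SCHEMA` and is deliberately strict: an integer in a float field is reported as an error.

```python
# Hypothetical schema mapping field names to the Python type each should hold.
SCHEMA = {"customer_id": int, "name": str, "order_total": float}


def find_type_errors(record: dict, schema: dict = SCHEMA) -> list[str]:
    """Return one message per field that is missing or holds the wrong type."""
    errors = []
    for field, expected in schema.items():
        if field not in record:
            errors.append(f"{field}: missing")
            continue
        value = record[field]
        # bool is a subclass of int in Python, so reject it unless bool is expected.
        is_stray_bool = isinstance(value, bool) and expected is not bool
        if is_stray_bool or not isinstance(value, expected):
            errors.append(
                f"{field}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return errors


# A customer_id that arrived as a string is flagged instead of passed through.
print(find_type_errors({"customer_id": "7", "name": "Ada", "order_total": 19.99}))
```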

    It’s also important to ensure that the data is appropriate to the model you are trying to build, whether that is business data, contact data, or health care data.

    “I would just sort of be going down these data quality steps that would be recommended before you even start your AI project,” he said. Melissa’s “Gold Standard” for any business-critical data is to use data that comes from at least three different sources and is dynamically updated.
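
    The article does not say how values from multiple sources are reconciled. One possible reading, sketched below, is to accept a field only when at least three sources supply it and a strict majority agree. The majority rule and the function name are assumptions for illustration, not Melissa’s method.

```python
from collections import Counter
from typing import Optional


def consensus_value(values: list[Optional[str]], min_sources: int = 3) -> Optional[str]:
    """Return the value a strict majority of sources agree on, else None.

    `values` holds one entry per source; None or "" means that source had no value.
    Values are compared after trimming whitespace and lowercasing.
    """
    present = [v.strip().lower() for v in values if v]
    if len(present) < min_sources:
        return None  # too few independent sources to trust the field
    value, count = Counter(present).most_common(1)[0]
    return value if count > len(present) / 2 else None


# Two of three sources agree after normalization, so that value is accepted.
print(consensus_value(["10 Main St", "10 main st", "12 Main St"]))
```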

    According to Stanley, large language models (LLMs) tend to try to please their users, which sometimes means giving answers that look compelling and correct but are actually wrong.

    This is why the data quality process doesn’t stop after training; it’s important to continue testing the model’s outputs to ensure that its responses are what you’d expect to see. 

    “You can ask questions of the model and then check the answers by comparing it back to the reference data and making sure it’s matching your expectations, like they’re not mixing up names and addresses or anything like that,” Stanley explained.
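
    The check Stanley describes can be sketched as a small evaluation loop: ask each reference question, compare the answer with the expected value, and collect the mismatches for review. In this sketch `ask_model` is a placeholder for whatever call reaches your model, and the exact-match comparison (after trimming and lowercasing) is an assumption; real answers often need fuzzier matching.

```python
from typing import Callable


def evaluate_against_reference(
    ask_model: Callable[[str], str],
    reference: dict[str, str],
) -> tuple[float, list[tuple[str, str, str]]]:
    """Ask each reference question and compare the answer to the expected value.

    Returns (accuracy, mismatches), where each mismatch is
    (question, expected, actual).
    """
    mismatches = []
    for question, expected in reference.items():
        actual = ask_model(question)
        if actual.strip().lower() != expected.strip().lower():
            mismatches.append((question, expected, actual))
    total = len(reference)
    accuracy = (total - len(mismatches)) / total if total else 0.0
    return accuracy, mismatches


# Usage with a stand-in "model": a dict lookup playing the role of a real model call.
# The questions and answers are made up for the example.
reference = {
    "City for customer 1001?": "Springfield",
    "City for customer 1002?": "Riverton",
}
canned_answers = {
    "City for customer 1001?": "springfield",
    "City for customer 1002?": "Lakeside",  # deliberately wrong
}
accuracy, mismatches = evaluate_against_reference(canned_answers.__getitem__, reference)
print(f"accuracy={accuracy:.0%}, mismatches={mismatches}")
```

    Running a loop like this on a schedule, not just once after training, matches the point that data quality work continues after the model is built.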

    For instance, Melissa has curated reference datasets covering geographic, business, identification, and other domains. Its informatics division uses ontological reasoning, built on formal semantic technologies, to compare AI results against expected results based on real-world models.

    The post Garbage in, garbage out: The importance of data quality when training AI models appeared first on SD Times.
