Generative AI is becoming the new norm: it is widely used and increasingly accessible to the public through platforms like ChatGPT and Meta AI, the latter appearing inside social media apps such as WhatsApp and Instagram Messenger.
Although these models are fundamentally transformers that break text into tokens and predict the next token, their implications and applications are vast. However, current GPT models lack human-like understanding, which can cause reliability issues among other problems. At the same time, agentic AI is on the rise, and together these trends highlight the importance of a well-defined testing approach.
I wanted to ask:
- What patterns or testing strategies are you following beyond the basic ones?
- What’s your approach to identifying and fixing the following issues? Do you follow any checklists?
- AI Hallucination
- Fairness and Bias
- Security & Ethical Issue
- Coherence and relevance
- Robustness and Reliability
- Explainability and Interpretability
- Others you have identified
Here are some of my observations:
Example 1: AI Hallucination
Issue: The model generates factually incorrect or nonsensical outputs; the response contains unreliable data, yet it sounds plausible or true.
Solution: Fact-checking, Human-in-the-loop, Prompt engineering, Training data quality, Model fine-tuning, Post-processing
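As a minimal sketch of what automated fact-checking can look like, the snippet below runs a small regression suite of prompts with known ground-truth facts. `FACT_CASES`, the `call_model` stub, and the naive substring matching are all illustrative assumptions, not a real harness; in practice you would plug in your actual model call and a more robust verifier, or route failures to a human-in-the-loop review.

```python
# Hypothetical hallucination regression check: prompts paired with known
# ground-truth facts, flagging outputs where an expected fact is missing.

FACT_CASES = [
    {"prompt": "What year did Apollo 11 land on the Moon?",
     "must_contain": ["1969"]},
    {"prompt": "What is the chemical symbol for gold?",
     "must_contain": ["Au"]},
]

def call_model(prompt: str) -> str:
    # Stub; replace with your real model/API call.
    return "Apollo 11 landed in 1969."

def run_fact_checks() -> list[str]:
    failures = []
    for case in FACT_CASES:
        answer = call_model(case["prompt"])
        # Naive case-insensitive substring check; a real suite would use a
        # stronger verifier (entailment model, retrieval, or human review).
        if not all(f.lower() in answer.lower() for f in case["must_contain"]):
            failures.append(f"Possible hallucination for prompt: {case['prompt']!r}")
    return failures

if __name__ == "__main__":
    for failure in run_fact_checks():
        print(failure)
```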
Example 2: Bias and Fairness
Issue: Because of patterns in the training data, the model generates outputs that unfairly favor certain groups.
Solution: Bias audits, Fairness metrics, Diverse training data
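One lightweight way to probe for this, as a sketch rather than a full bias audit, is a counterfactual test: run the same prompt template with only the demographic term changed and compare a simple output statistic across groups. The template, group list, statistic, and `call_model` stub below are assumptions for illustration; a large gap between groups is a signal to investigate with proper fairness metrics, not proof of bias.

```python
# Minimal counterfactual bias probe: identical prompts except for one
# demographic term, comparing a crude statistic (average response length).

TEMPLATE = "Write a short performance review for Alex, a {group} engineer."
GROUPS = ["male", "female", "non-binary"]

def call_model(prompt: str) -> str:
    return "Stub response."  # Replace with your real model/API call.

def probe_bias(samples_per_group: int = 5) -> dict[str, float]:
    stats = {}
    for group in GROUPS:
        outputs = [call_model(TEMPLATE.format(group=group))
                   for _ in range(samples_per_group)]
        # Average word count per group; swap in sentiment or refusal rate
        # for a more meaningful signal.
        stats[group] = sum(len(o.split()) for o in outputs) / len(outputs)
    return stats

print(probe_bias())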
Example 3: Adherence to Instructions
Issue: With tools like Meta AI agents and similar offerings in Salesforce, we need to check whether the response adheres to the given instructions; it sometimes fails to follow the guidelines and guardrails.
Solution: It might be an issue with the instructions themselves, but we need to go back to basics and test against each instruction to check whether it is followed. Doing this manually can become tedious, so I would welcome alternatives; one option is sketched below.
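One alternative to fully manual checking is to encode each instruction as a small programmatic check and run every response through all of them. The rules here (word limit, mandatory disclaimer, banned competitor name) are invented examples, not rules from any real agent:

```python
# Encode each guardrail as a predicate; a response must satisfy all of them.
import re

RULES = {
    "stays_under_100_words": lambda text: len(text.split()) <= 100,
    "includes_disclaimer": lambda text: "not financial advice" in text.lower(),
    "avoids_competitor_names": lambda text: not re.search(r"\bAcmeCorp\b", text),
}

def check_adherence(response: str) -> list[str]:
    """Return the names of all instructions the response violates."""
    return [name for name, rule in RULES.items() if not rule(response)]

violations = check_adherence("Buy now! AcmeCorp is worse than us.")
print(violations)  # ['includes_disclaimer', 'avoids_competitor_names']
```

Fuzzier instructions (tone, politeness, topic boundaries) will not reduce to simple predicates like these; for those, an LLM-as-judge or human review step is a common fallback.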
Example 4: Responses Outside Knowledge Article Boundaries
Issue: GPT models used as chatbots over a fixed set of knowledge articles sometimes produce answers, or cite references, that fall outside that set.
Solution: Coherence metrics, Prompt design, Feedback
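A minimal groundedness check, assuming the chatbot can be made to return the IDs of the knowledge articles it relied on, is to reject any answer that cites a source outside the approved set. `ALLOWED_ARTICLES` and the citation format are assumptions for illustration:

```python
# Reject answers that cite articles outside the approved knowledge set,
# or that cite nothing at all.

ALLOWED_ARTICLES = {"KB-101", "KB-102", "KB-205"}

def is_grounded(cited_ids: set[str]) -> bool:
    """True only if every cited article is in the approved knowledge set."""
    return bool(cited_ids) and cited_ids <= ALLOWED_ARTICLES

print(is_grounded({"KB-101"}))            # True
print(is_grounded({"KB-101", "KB-999"}))  # False: cites an unknown source
print(is_grounded(set()))                 # False: no citation at all
```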
Example 5: Chain of Thought
Issue: In some cases, the model assumes continuity with earlier conversations inside the context window, which can introduce unnecessary or misleading references.
Solution: Instructions should tell the model to cross-verify the current context and add a note whenever it relies on assumed continuity; a sketch of a test for this follows.
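As a rough probe for this kind of unwanted carry-over, you can ask a question in a brand-new session and flag any phrasing that refers back to nonexistent earlier turns. The phrase list and `call_model` stub are assumptions; a phrase match is only a heuristic signal to review the transcript:

```python
# Heuristic carry-over probe: a fresh session should never reference
# "earlier" conversation turns, because there are none.

CARRYOVER_PHRASES = ["as mentioned earlier", "as we discussed", "previously you said"]

def call_model(prompt: str, history: list[str] | None = None) -> str:
    return "Stub response."  # Replace with your real model/API call.

def flags_carryover(prompt: str) -> bool:
    """Check whether a fresh-session answer refers to prior turns that never happened."""
    fresh_answer = call_model(prompt, history=None)
    return any(p in fresh_answer.lower() for p in CARRYOVER_PHRASES)

print(flags_carryover("Summarise our refund policy."))
```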
Most of these issues can be mitigated with effective prompt engineering. However, I am curious about your methods for surfacing these issues and any other observations you have made.