While 55% of organizations are experimenting with generative AI, only 10% have implemented it in production, according to a recent Gartner poll. A major obstacle to moving LLMs into production is their tendency to generate erroneous outputs, termed hallucinations, which rules them out of applications that require correct results. Incidents like Air Canada’s chatbot misinforming customers about refund policies and a law firm’s use of ChatGPT to produce a brief filled with fabricated citations illustrate the risks of deploying unreliable LLMs. Similarly, New York City’s “MyCity” chatbot has given incorrect answers to questions about local laws, underscoring how difficult it is to guarantee accurate outputs from LLMs.
Cleanlab presents the Trustworthy Language Model (TLM), addressing the primary challenge hindering enterprise adoption of LLMs: unreliable outputs and hallucinations. TLM attaches a trustworthiness score to every LLM response, letting users detect and control erroneous outputs and deploy generative AI in scenarios that were previously off-limits. Extensive benchmarking shows that TLM outperforms existing LLMs in accuracy while producing better-calibrated trustworthiness scores, translating into greater cost and time efficiency than alternative methods for managing LLM uncertainty.
TLM addresses the inevitable presence of hallucinations in LLMs by assigning a trustworthiness score to each output, enabling users to identify instances of hallucination. TLM prioritizes minimizing false negatives, ensuring that the trustworthiness score is low when hallucinations occur, thereby facilitating the reliable deployment of LLM-based applications.
The TLM API serves multiple purposes. It can act as a drop-in replacement for existing LLMs, offering a .prompt() method that returns both a response and a trustworthiness score, enabling new applications. Internally, TLM also improves accuracy by generating multiple candidate responses and returning the one with the highest trustworthiness score. Through its .get_trustworthiness_score() method, TLM can additionally score outputs from existing LLMs or human-generated data. TLM works by layering trust on top of existing LLMs: users can choose popular base models such as GPT-3.5 and GPT-4, or augment any LLM to which they have only black-box API access, as sketched below. For enterprise needs, such as adding trustworthiness to custom fine-tuned LLMs, users can engage with Cleanlab directly.
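A minimal sketch of how these two entry points might be used, assuming the cleanlab_studio Python client; the package name, Studio initialization, and return format are assumptions, while .prompt() and .get_trustworthiness_score() are the methods named above:

```python
from cleanlab_studio import Studio

# Initialization details are assumptions; consult Cleanlab's documentation.
studio = Studio("<CLEANLAB_API_KEY>")  # placeholder API key
tlm = studio.TLM()  # trust layer over a base model such as GPT-3.5/GPT-4

# Drop-in LLM replacement: get a response plus its trustworthiness score.
out = tlm.prompt("What year was the Eiffel Tower completed?")
print(out["response"], out["trustworthiness_score"])

# Score an answer produced elsewhere (another LLM, or human-generated data).
score = tlm.get_trustworthiness_score(
    "What year was the Eiffel Tower completed?", response="1889"
)
print(score)  # in [0, 1]; lower scores indicate a likely hallucination
```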
The evaluation compares Cleanlab’s TLM to OpenAI’s GPT-4 on response accuracy and on cost/time savings. TLM’s trustworthiness score builds trust in LLM outputs by detecting errors efficiently. Unlike self-evaluation or token-probability-based methods, TLM’s assessment also accounts for epistemic uncertainty, yielding more reliable scores. TLM optimizes resource allocation by flagging low-scoring outputs for human review (see the sketch below), ensuring robust decision-making. Berkeley Research Group (BRG) has already seen significant cost savings from leveraging TLM, according to Steven Gawthorpe, PhD, Associate Director and Senior Data Scientist at BRG.
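The flag-for-review workflow described here reduces to a simple threshold rule. Below is a hedged Python sketch of that escalation pattern; the 0.7 cutoff and the send_to_human_review() helper are illustrative assumptions, not values or functions from Cleanlab:

```python
TRUST_THRESHOLD = 0.7  # illustrative cutoff; tune on your own validation data

def answer_or_escalate(tlm, prompt: str) -> str:
    """Serve high-trust answers automatically; route the rest to humans."""
    out = tlm.prompt(prompt)
    if out["trustworthiness_score"] >= TRUST_THRESHOLD:
        return out["response"]  # trusted: use the LLM answer directly
    # Low score: flag for human review instead of risking a hallucination.
    return send_to_human_review(prompt, out)

def send_to_human_review(prompt: str, out: dict) -> str:
    # Hypothetical stub: in practice, push to a ticketing or labeling queue.
    print(f"Flagged (score={out['trustworthiness_score']:.2f}): {prompt}")
    return "This answer is pending human verification."
```

In this pattern, only the low-scoring minority of outputs consumes human attention, which is where the cost and time savings over reviewing every response come from.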
In conclusion, Cleanlab’s Trustworthy Language Model (TLM) is a comprehensive solution to the challenges organizations face in deploying LLM applications. By addressing the reliability issues associated with hallucinations through trustworthiness scores, TLM enables more accurate and dependable outputs. With its ability to augment existing LLMs and enhance trust across applications, TLM represents a significant advance in the deployment of generative AI, paving the way for broader adoption and utilization in enterprise settings.