Internal Coherence Maximization (ICM): A Label-Free, Unsupervised Training Framework for LLMs

Post-training methods for pre-trained language models (LMs) depend on human supervision through demonstrations or preference feedback to specify desired behaviors. However, this approach faces critical limitations as tasks and model behaviors become very complex. Human supervision is unreliable in these scenarios as LMs learn to mimic mistakes in demonstrations or exploit inherent flaws in feedback systems. The core challenge lies in training LMs for tasks that exceed human capability in reliability in demonstrations or evaluations. Recent research has identified diverse failure modes, including reward-hacking of human-designed supervision signals or real humans themselves.

Limitations of Human Supervision in LLM Post-Training

Researchers have explored several approaches to scale beyond human supervision. One standard method utilizes high-quality verifiable rewards, such as matching model outputs with ground-truth solutions in mathematical domains. Despite evidence that pre-trained base models have strong latent capabilities for downstream tasks, with post-training adding minimal improvements, effective elicitation remains challenging. The Contrast Consistent Search (CCS) method is an unsupervised elicitation approach that uses logical consistency to find latent knowledge without supervision. However, CCS underperforms supervised approaches and often fails to identify knowledge due to other prominent features satisfying consistency properties.

Introducing Internal Coherence Maximization (ICM)

Researchers from Anthropic, Schmidt Sciences, Independent, Constellation, New York University, and George Washington University have proposed Internal Coherence Maximization (ICM), which fine-tunes pre-trained models on their own generated labels without using any provided labels. ICM solves this by searching for label sets that are both logically consistent and mutually predictable according to the pre-trained model. Since optimal label set identification remains computationally infeasible, ICM uses a simulated annealing-inspired search algorithm to approximate the maximum objective. Moreover, this method matches the performance of training on golden labels on TruthfulQA and GSM8K, and outperforms training on crowdsourced human labels on Alpaca.

How the ICM Algorithm Works

The ICM algorithm follows an iterative three-step process: (a) the system samples a new unlabeled example from the dataset for potential inclusion, (b) it determines the optimal label for this example while simultaneously resolving any logical inconsistencies, and (c) the algorithm evaluates whether to accept this new labeled example based on the scoring function. ICM is evaluated across three datasets: TruthfulQA for truthfulness assessment, GSM8K-verification for mathematical correctness, and Alpaca for helpfulness and harmlessness. Researchers used four baselines in their experiments: Zero-shot, Zero-shot (Chat), Golden Label, and Human Label. Moreover, Experiments used two open-weight models, Llama 3.1 8B and 70B, and two proprietary models: Claude 3 Haiku and Claude 3.5 Haiku.

Benchmark Performance and Model Comparisons

In superhuman capability elicitation tasks, ICM matches golden supervision accuracy at 80%, outperforming the estimated human accuracy of 60%. Using ICM-generated reward models, researchers successfully trained an assistant chatbot without human supervision. The unsupervised reward model achieves 75.0% accuracy on RewardBench, compared to 72.2% for human-supervised alternatives trained on production data. Moreover, using both the unsupervised and human-supervised RM, two policies are trained with RL to create helpful, harmless, and honest assistants. The policy trained with the unsupervised RM achieves a 60% win rate. However, these policies still lag behind the publicly released Claude 3.5 Haiku, which achieves 92% win rates.

Conclusion and Future Outlook

This paper introduces Internal Coherence Maximization (ICM), an advancement in unsupervised LM for fine-tuning pre-trained models on self-generated labels. The method consistently matches golden supervision performance and surpasses crowdsourced human supervision across GSM8K-verification, TruthfulQA, and Alpaca reward modeling tasks. However, ICM’s limitations include dependency on concept salience within pre-trained models and ineffectiveness with long inputs due to context window constraints. As LMs advance beyond human evaluation capabilities, ICM offers promising alternatives to traditional RLHF, ensuring model alignment with human intent without human supervision boundaries.

Check out the Paper. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter.

The post Internal Coherence Maximization (ICM): A Label-Free, Unsupervised Training Framework for LLMs appeared first on MarkTechPost.

Source: Read MoreÂ

From Data To Decisions: UX Strategies For Real-Time Dashboards

Honeycomb launches AI observability suite for developers

Low-Code vs No-Code Platforms for Node.js: What CTOs Must Know Before Investing

ServiceNow unveils Zurich AI platform

Building personal apps with open source and AI

What Can We Actually Do With corner-shape?

Craft, Clarity, and Care: The Story and Work of Mengchu Yao

Distribution Release: Q4OS 6.1

Learning from PHP Log to File Example

Learning from PHP Log to File Example

Online EMI Calculator using PHP – Calculate Loan EMI, Interest, and Amortization Schedule

Package efficiency and dependency hygiene

Dmitry — The Deep Magic

Dmitry — The Deep Magic

Right way to record and share our Terminal sessions

Windows 11 Powers Up WSL: How GPU Acceleration & Kernel Upgrades Change the Game

Internal Coherence Maximization (ICM): A Label-Free, Unsupervised Training Framework for LLMs

Limitations of Human Supervision in LLM Post-Training

Introducing Internal Coherence Maximization (ICM)

How the ICM Algorithm Works

Benchmark Performance and Model Comparisons

Conclusion and Future Outlook

How to Evaluate Jailbreak Methods: A Case Study with the StrongREJECT Benchmark

Announcing the new cluster creation experience for Amazon SageMaker HyperPod

CVE-2025-20337 – Cisco ISE/Cisco ISE-PIC Remote Code Execution Vulnerability

Agentic AI using Azure AI Foundry and Power Platform

Texas A&M Researchers Introduce a Two-Phase Machine Learning Method Named ‘ShockCast’ for High-Speed Flow Simulation with Neural Temporal Re-Meshing

VidsYouTube – Free Online Video Downloader for YouTube, TikTok, Instagram & More

Google Spoofed in Sophisticated DKIM Replay Attack Exploiting Email Trust Mechanisms

CVE-2025-3520 – “WordPress Avatar Plugin File Deletion Vulnerability”

CVE-2025-46338 – Audiobookshelf Reflected Cross-Site Scripting (XSS) Vulnerability

Microsoft says Copilot can use location to change Outlook’s UI on Android

Internal Coherence Maximization (ICM): A Label-Free, Unsupervised Training Framework for LLMs

Limitations of Human Supervision in LLM Post-Training

Introducing Internal Coherence Maximization (ICM)

How the ICM Algorithm Works

Benchmark Performance and Model Comparisons

Conclusion and Future Outlook

Related Posts