MORCELA: A New AI Approach to Linking Language Model (LM) Scores with Human Acceptability Judgments

In natural language processing (NLP), a central question is how well the probabilities assigned by language models (LMs) align with human linguistic behavior. This alignment is often assessed by comparing LM scores with human acceptability judgments, which measure how natural or well-formed a sentence seems to speakers. Previous work has tried to bridge the two with linking theories such as SLOR (Syntactic Log-Odds Ratio), but significant issues remain. SLOR applies the same fixed correction for sequence length and unigram frequency to every model, which can lead to inaccuracies. A more flexible method is needed, one that adapts to differences between models and to the complexities of human language processing.
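For reference, SLOR is usually defined as a sentence's LM log probability minus its unigram log probability, divided by its length in tokens: SLOR(S) = (log p_M(S) - log p_u(S)) / |S|. The Python sketch below computes it from a precomputed LM log probability and a toy unigram table; the function names and all numbers are hypothetical, chosen only to illustrate the fixed correction SLOR applies.

```python
import math
from typing import Dict, List


def unigram_logprob(tokens: List[str], unigram_probs: Dict[str, float]) -> float:
    """log p_u(S): sum of the log unigram probabilities of the tokens."""
    return sum(math.log(unigram_probs[t]) for t in tokens)


def slor(lm_logprob: float, tokens: List[str], unigram_probs: Dict[str, float]) -> float:
    """SLOR(S) = (log p_M(S) - log p_u(S)) / |S|.

    Every model gets the same fixed correction: the unigram term is
    subtracted with weight 1 and the result is divided by the token count.
    """
    return (lm_logprob - unigram_logprob(tokens, unigram_probs)) / len(tokens)


# Toy inputs (hypothetical values, not taken from any real model or corpus).
unigram_probs = {"the": 0.05, "cat": 0.001, "slept": 0.0005}
tokens = ["the", "cat", "slept"]
lm_logprob = -12.0  # stand-in for log p_M(S) returned by some LM

print(round(slor(lm_logprob, tokens, unigram_probs), 3))  # → 1.835
```

Rare words make log p_u(S) more negative, so subtracting it raises the score: SLOR compensates for word rarity, but always with the same weight of 1.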

MORCELA: A New Linking Theory

A team of researchers from NYU and CMU proposes MORCELA (Magnitude-Optimized Regression for Controlling Effects on Linguistic Acceptability), a new linking theory that addresses these challenges. Unlike SLOR, which applies static adjustments for length and unigram frequency, MORCELA estimates the optimal degree of adjustment from data, using learned parameters specific to these effects. With two parameters, β for unigram frequency and γ for sentence length, MORCELA adjusts LM scores so that they correlate better with human judgments. This approach better accounts for how word rarity and sentence length affect LM scores compared with human ratings. The core idea behind MORCELA is that not all language models should receive the same correction, since models differ in how well they predict linguistic acceptability.
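One way to picture β and γ is as knobs on the SLOR formula. The sketch below assumes the form score(S) = (log p_M(S) - β · log p_u(S) + γ) / |S|, which reduces to SLOR when β = 1 and γ = 0. This parameterization is an illustrative assumption, and the exact form used in the MORCELA paper may differ; `morcela_style_score` is a hypothetical helper name.

```python
import math
from typing import Dict, List


def morcela_style_score(
    lm_logprob: float,
    tokens: List[str],
    unigram_probs: Dict[str, float],
    beta: float,
    gamma: float,
) -> float:
    """Assumed MORCELA-style score: (log p_M(S) - beta * log p_u(S) + gamma) / |S|.

    ASSUMPTION: this form is for illustration only. beta scales the
    unigram-frequency correction; gamma adds a term that, after division
    by |S|, shifts short sentences more than long ones.
    """
    log_pu = sum(math.log(unigram_probs[t]) for t in tokens)
    return (lm_logprob - beta * log_pu + gamma) / len(tokens)


unigram_probs = {"the": 0.05, "cat": 0.001, "slept": 0.0005}  # toy values
tokens = ["the", "cat", "slept"]
lm_logprob = -12.0  # hypothetical log p_M(S)

# beta=1, gamma=0 reproduces SLOR; a smaller beta applies a weaker frequency correction.
print(round(morcela_style_score(lm_logprob, tokens, unigram_probs, 1.0, 0.0), 3))  # → 1.835
print(round(morcela_style_score(lm_logprob, tokens, unigram_probs, 0.5, 0.0), 3))  # → -1.083
```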

Technical Overview

MORCELA's parameters are fit to human acceptability judgments. They control how much correction is applied to LM log probabilities, making MORCELA more adaptable than predecessors such as SLOR. Specifically, the learned parameter β scales the correction for unigram frequency, while γ controls the correction for sentence length. This flexibility allows MORCELA to match human acceptability ratings more closely, especially for larger models. For example, larger models often require less adjustment for unigram frequency because they are better at predicting less common words in context.
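Because β and γ are fit to human judgments, the fitting step can be sketched as a search for the parameter values whose scores correlate best with human ratings. The sketch below runs a plain grid search over Pearson correlation on synthetic data, reusing the assumed score form (log p_M - β · log p_u + γ) / |S|. Both the search strategy and the score form are stand-ins rather than the paper's regression procedure, and all helper names and numbers are hypothetical.

```python
import itertools
import math
from statistics import fmean
from typing import List, Sequence, Tuple

Item = Tuple[float, float, int]  # (log p_M(S), log p_u(S), |S|)


def score(item: Item, beta: float, gamma: float) -> float:
    """Assumed MORCELA-style score: (log p_M - beta * log p_u + gamma) / |S|."""
    lm_logprob, log_pu, length = item
    return (lm_logprob - beta * log_pu + gamma) / length


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) * (a - mx) for a in x)
    syy = sum((b - my) * (b - my) for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_beta_gamma(items, human, betas, gammas):
    """Grid search: return the (beta, gamma) pair whose scores correlate best with human ratings."""
    best, best_r = None, -math.inf
    for beta, gamma in itertools.product(betas, gammas):
        r = pearson([score(it, beta, gamma) for it in items], human)
        if r > best_r:
            best, best_r = (beta, gamma), r
    return best, best_r


# Synthetic items (made-up numbers, not real LM output).
items: List[Item] = [(-20.0, -35.0, 5), (-30.0, -48.0, 8), (-15.0, -22.0, 4),
                     (-42.0, -60.0, 10), (-25.0, -45.0, 6), (-18.0, -24.0, 3)]
# Synthetic "human" ratings generated from known parameters, so the search has a target to recover.
human = [score(it, 0.6, 2.0) for it in items]

betas = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0
gammas = [float(g) for g in range(5)]  # 0.0, 1.0, ..., 4.0
(best_beta, best_gamma), r = fit_beta_gamma(items, human, betas, gammas)
print(best_beta, best_gamma)  # → 0.6 2.0
```

On real data, the ratings would come from an acceptability-judgment dataset and the log probabilities from an actual LM; a finer grid or a gradient-based optimizer would usually replace this coarse search.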

Performance and Significance

The significance of MORCELA becomes evident when considering its performance across LM sizes. MORCELA outperformed SLOR at predicting human acceptability judgments for models from two well-known families, Pythia and OPT. As models grew larger, MORCELA's correlation with human judgments improved. The optimal parameter values estimated by MORCELA showed that larger LMs are more robust to frequency and length effects and therefore require less correction. This suggests that larger LMs make better use of linguistic context, allowing them to assign more accurate probabilities to rare words in context and reducing the impact of unigram frequency as a confounding factor. Overall, MORCELA improved the correlation between LM scores and human judgments by up to 46% compared to SLOR, reflecting its more precisely calibrated corrections.
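A comparison like this comes down to correlating each linking function's scores with the same human ratings. The sketch below does so on synthetic data for SLOR (β = 1, γ = 0) and for the assumed MORCELA-style score with given parameter values, then reports the relative change in correlation. Reading a figure such as "up to 46%" as a relative gain of this kind is an assumption made here only for illustration, and the helper names and numbers are hypothetical.

```python
import math
from statistics import fmean
from typing import List, Sequence, Tuple

Item = Tuple[float, float, int]  # (log p_M(S), log p_u(S), |S|)


def score(item: Item, beta: float, gamma: float) -> float:
    """Assumed form (log p_M - beta * log p_u + gamma) / |S|; beta=1, gamma=0 gives SLOR."""
    lm_logprob, log_pu, length = item
    return (lm_logprob - beta * log_pu + gamma) / length


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) * (a - mx) for a in x)
    syy = sum((b - my) * (b - my) for b in y)
    return sxy / math.sqrt(sxx * syy)


def relative_gain(r_new: float, r_old: float) -> float:
    """Relative change in correlation: 0.10 means r_new is 10% higher than r_old."""
    return (r_new - r_old) / abs(r_old)


# Synthetic items and ratings (made-up numbers, not results from the paper).
items: List[Item] = [(-20.0, -35.0, 5), (-30.0, -48.0, 8), (-15.0, -22.0, 4),
                     (-42.0, -60.0, 10), (-25.0, -45.0, 6), (-18.0, -24.0, 3)]
human = [0.6, 0.1, 0.0, -0.4, 0.7, -0.5]

r_slor = pearson([score(it, 1.0, 0.0) for it in items], human)
# Hypothetical parameter values; in practice they would be fit on separate data.
r_fitted = pearson([score(it, 0.6, 2.0) for it in items], human)
print(f"SLOR r={r_slor:.3f}  MORCELA-style r={r_fitted:.3f}  "
      f"relative gain={relative_gain(r_fitted, r_slor):.1%}")
```

Fitting β and γ on the same ratings used for evaluation would inflate the gain, so a fair comparison keeps the fitting and evaluation data separate.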

This advancement is important for several reasons. First, it suggests that current LMs may reflect human language processing better than previously thought, provided the right corrections are applied. Second, MORCELA's insights can be valuable in psycholinguistic studies that use LMs as proxies for human language comprehension. By providing a more accurate linking theory, MORCELA helps ensure that LMs are evaluated in a way that aligns more closely with human linguistic intuition. For instance, larger LMs relied less on unigram frequency corrections, indicating that these models handle less frequent, context-specific words better. This characteristic could significantly affect how we interpret LMs in tasks involving rare or domain-specific language.

Conclusion

MORCELA represents an important development in aligning language models with human acceptability judgments. By learning how much to adjust for length and frequency, it addresses critical limitations of previous approaches such as SLOR. The results show that, with proper adjustment, LM scores can better reflect human linguistic intuition, particularly as models scale in size. Future work could explore further adjustments or new parameters that bring LMs even closer to human-like language understanding. Beyond improving how LMs are evaluated, MORCELA offers insight into how these models process language, narrowing the gap between machine-generated probabilities and human language behavior.

Check out the Paper. All credit for this research goes to the researchers of this project.

The post MORCELA: A New AI Approach to Linking Language Models LM Scores with Human Acceptability Judgments appeared first on MarkTechPost.
