Cut Your Losses in Large-Vocabulary Language Models

February 7, 2025

As language models grow ever larger, so do their vocabularies. This has shifted the memory footprint of LLMs during training disproportionately toward a single layer: the cross-entropy in the loss computation. Cross-entropy builds up a logit matrix with one entry for each pair of input token and vocabulary item and, for small models, consumes an order of magnitude more memory than the rest of the LLM combined. We propose Cut Cross-Entropy (CCE), a method that computes the cross-entropy loss without materializing the logits for all tokens in global memory. Rather, CCE only computes the logit…
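To make the memory argument concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the excerpt above is truncated, and the actual method targets GPU memory. The sketch only illustrates the general idea that the loss can be obtained without ever holding the full tokens-by-vocabulary logit matrix, by processing the vocabulary in blocks and keeping a running log-sum-exp. The names `E` (token embeddings), `C` (classifier weights), `targets`, and `chunk` are assumptions chosen for illustration.

```python
import numpy as np

def full_cross_entropy(E, C, targets):
    """Reference version: materializes the full (N, V) logit matrix."""
    logits = E @ C.T                                   # shape (N, V)
    m = logits.max(axis=1, keepdims=True)
    lse = m[:, 0] + np.log(np.exp(logits - m).sum(axis=1))
    correct = logits[np.arange(len(targets)), targets]
    return float((lse - correct).mean())

def chunked_cross_entropy(E, C, targets, chunk=1024):
    """Same loss, but only an (N, chunk) block of logits exists at a time.

    Illustrative sketch only; block-wise streaming over the vocabulary
    is an assumption of this example, not a claim about the paper.
    """
    N, V = E.shape[0], C.shape[0]
    # Logit of the correct token: one dot product per input token.
    correct = np.einsum("nd,nd->n", E, C[targets])
    running_max = np.full(N, -np.inf)
    running_sum = np.zeros(N)
    for start in range(0, V, chunk):
        block = E @ C[start:start + chunk].T           # shape (N, <=chunk)
        new_max = np.maximum(running_max, block.max(axis=1))
        # Rescale the accumulated sum to the new maximum, then add this block.
        running_sum = (running_sum * np.exp(running_max - new_max)
                       + np.exp(block - new_max[:, None]).sum(axis=1))
        running_max = new_max
    lse = running_max + np.log(running_sum)
    return float((lse - correct).mean())
```

Both functions return the same mean loss up to floating-point error; the difference is peak memory, which scales with `N * V` in the first and with `N * chunk` in the second.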

