Soup-of-Experts: Pretraining Specialist Models via Parameters Averaging

July 4, 2025

Large-scale models are routinely trained on a mixture of different data sources.
Different data mixtures can yield very different downstream performance.
We propose a novel architecture that can instantiate one model for each data mixture without having to re-train the model.
Our architecture consists of a bank of expert weights, which are linearly combined to instantiate one model.
We learn the linear combination coefficients as a function of the input histogram.
To train this architecture, we sample random histograms, instantiate the corresponding model, and backpropagate through one batch of data…
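To make the mechanism concrete, here is a minimal NumPy sketch of the idea as described above, not the authors' implementation. It assumes a toy linear regression model, reads the "histogram" as mixture weights over data sources, uses a plain linear map as the coefficient function, and draws histograms from a Dirichlet distribution. All names and sizes (`experts`, `coef_map`, `N_DOMAINS`, and so on) are hypothetical. How the training batch is chosen is left to the caller, since the summary above is truncated at that point.

```python
import numpy as np

rng = np.random.default_rng(0)

N_DOMAINS = 4   # number of data sources (hypothetical)
N_EXPERTS = 3   # size of the expert bank (hypothetical)
DIM = 5         # parameter count of the toy linear model (hypothetical)

# Bank of expert weights: one parameter vector per expert.
experts = rng.normal(scale=0.1, size=(N_EXPERTS, DIM))
# Assumed coefficient function: a linear map from histogram to coefficients.
coef_map = rng.normal(scale=0.1, size=(N_EXPERTS, N_DOMAINS))


def sample_histogram(rng):
    """Draw random mixture weights over domains (Dirichlet is an assumption)."""
    return rng.dirichlet(np.ones(N_DOMAINS))


def instantiate(experts, coef_map, hist):
    """Return the combination coefficients and model weights for one histogram."""
    alpha = coef_map @ hist      # shape (N_EXPERTS,)
    theta = alpha @ experts      # linear combination of experts, shape (DIM,)
    return alpha, theta


def train_step(experts, coef_map, hist, x, y, lr=0.1):
    """One gradient step on batch (x, y) for the model instantiated from hist."""
    alpha, theta = instantiate(experts, coef_map, hist)
    residual = x @ theta - y
    loss = float(np.mean(residual ** 2))
    grad_theta = 2.0 * x.T @ residual / len(y)   # dL/dtheta
    grad_experts = np.outer(alpha, grad_theta)   # dL/dE_i = alpha_i * dL/dtheta
    grad_alpha = experts @ grad_theta            # dL/dalpha_i = E_i . dL/dtheta
    grad_coef_map = np.outer(grad_alpha, hist)   # chain rule through alpha = W h
    new_experts = experts - lr * grad_experts
    new_coef_map = coef_map - lr * grad_coef_map
    return new_experts, new_coef_map, loss
```

A training loop would repeatedly call `sample_histogram`, then `train_step` on one batch, updating both the expert bank and the coefficient map. At deployment, a single call to `instantiate` with the target histogram yields one ordinary set of model weights, with no further training.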

