
    In-Context Learning Capabilities of Multi-Layer Perceptrons (MLPs): A Comparative Study with Transformers

    May 31, 2024

    Recent years have seen significant advances in neural language models, particularly Large Language Models (LLMs) enabled by the Transformer architecture and increased scale. LLMs exhibit exceptional skills in generating grammatical text, answering questions, summarising content, creating imaginative outputs, and solving complex puzzles. A key capability is in-context learning (ICL), where the model uses novel task exemplars presented during inference to respond accurately without weight updates. ICL is typically attributed to Transformers and their attention-based mechanisms.

    ICL has been demonstrated for linear regression tasks with Transformers, which can generalize to new input/label pairs presented in-context. Transformers may achieve this by implicitly implementing gradient descent or replicating least-squares regression in their forward pass. They also interpolate between in-weight learning (IWL) and ICL, with more diverse training datasets strengthening ICL. While most studies focus on Transformers, some research explores recurrent neural networks (RNNs) and LSTMs, with mixed results, and recent findings show that various causal sequence models and state space models also achieve ICL. However, MLPs’ potential for ICL remains underexplored despite their resurgence on complex tasks, prompted by the introduction of the MLP-Mixer model.
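    To make the least-squares hypothesis concrete, the estimator that such Transformers are thought to approximate in-context has a simple closed form. The following numpy sketch is purely illustrative (dimensions, noise scale, and the ridge term are placeholder choices, not the paper's setup):

        import numpy as np

        def ridge_predict(X_ctx, y_ctx, x_query, lam=1e-3):
            """Closed-form (ridge-regularized) least-squares fit on the context,
            evaluated at the query -- the solution an in-context learner is
            hypothesized to approximate."""
            d = X_ctx.shape[1]
            beta_hat = np.linalg.solve(X_ctx.T @ X_ctx + lam * np.eye(d), X_ctx.T @ y_ctx)
            return x_query @ beta_hat

        # Toy usage: 16 context exemplars in 8 dimensions sharing one weight vector.
        rng = np.random.default_rng(0)
        beta = rng.normal(size=8)
        X_ctx = rng.normal(size=(16, 8))
        y_ctx = X_ctx @ beta + 0.1 * rng.normal(size=16)
        x_q = rng.normal(size=8)
        print(ridge_predict(X_ctx, y_ctx, x_q), x_q @ beta)  # prediction vs. noiseless target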

    In this study, researchers from Harvard demonstrate that multi-layer perceptrons (MLPs) can effectively learn in-context. MLP and MLP-Mixer models perform competitively with Transformers on ICL tasks within the same compute budget. Notably, MLPs outperform Transformers on relational reasoning ICL tasks, challenging the belief that ICL is unique to Transformers. This success suggests looking beyond attention-based architectures and indicates that Transformers, constrained by self-attention and positional encodings, may be biased away from certain task structures relative to MLPs.

    The study investigates MLPs’ behavior in ICL through two tasks: in-context regression and in-context classification. For ICL regression, the input is a sequence of linearly related value pairs (x_i, y_i), generated with a freshly sampled weight vector β and added noise, followed by a query x_q. The model predicts the corresponding y_q by inferring β from the context exemplars. For ICL classification, the input is a sequence of exemplars (x_i, y_i) drawn from a Gaussian mixture model, followed by a query x_q. The model predicts the correct label for x_q by referencing the context exemplars, with data diversity and burstiness (the number of repeats per cluster in the context) as the key experimental factors.
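    As a rough illustration of how such episodes could be constructed, here is a hedged numpy sketch of one regression episode and one classification episode (dimensions, noise scales, and cluster counts are placeholder values, not the paper's settings):

        import numpy as np

        rng = np.random.default_rng(0)

        def regression_episode(n_ctx=16, dim=8, noise=0.1):
            """One in-context regression episode: context pairs (x_i, y_i) sharing a
            freshly sampled weight vector beta, plus a held-out query (x_q, y_q)."""
            beta = rng.normal(size=dim)
            X = rng.normal(size=(n_ctx + 1, dim))
            y = X @ beta + noise * rng.normal(size=n_ctx + 1)
            return (X[:-1], y[:-1]), (X[-1], y[-1])  # context, query

        def classification_episode(n_ctx=8, dim=8, n_clusters=4, burstiness=2):
            """One in-context classification episode: exemplars drawn from a Gaussian
            mixture, with each sampled cluster repeated `burstiness` times."""
            centers = rng.normal(size=(n_clusters, dim))
            labels = np.repeat(rng.choice(n_clusters, size=n_ctx // burstiness), burstiness)
            X = centers[labels] + 0.1 * rng.normal(size=(n_ctx, dim))
            q_label = rng.choice(labels)  # the query belongs to one of the context clusters
            x_q = centers[q_label] + 0.1 * rng.normal(size=dim)
            return (X, labels), (x_q, q_label)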

    MLPs and Transformers were compared on the in-context regression and classification tasks. All architectures, including MLP-Mixers, achieved near-optimal mean squared error (MSE) given sufficient compute, although Transformers slightly outperformed MLPs at smaller compute budgets. For longer context lengths, vanilla MLPs performed worse, while MLP-Mixers maintained optimal MSE. As data diversity increased, all models transitioned from IWL to ICL, with Transformers making the transition more quickly. On in-context classification, MLPs performed comparably to Transformers, maintaining relatively flat loss across context lengths and likewise transitioning from IWL to ICL as data diversity increased.
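    The summary does not spell out the architectures used, but one standard way a vanilla MLP can consume such an episode is to flatten the context and the query into a single input vector. The PyTorch-style sketch below is an assumption-laden illustration of that setup (layer sizes and widths are invented, not taken from the paper):

        import torch
        import torch.nn as nn

        class FlattenedMLP(nn.Module):
            """Vanilla MLP for in-context regression: the whole episode
            [(x_1, y_1), ..., (x_n, y_n), x_q] is flattened into one input vector,
            so the input width is tied to the context length."""
            def __init__(self, n_ctx=16, dim=8, hidden=256):
                super().__init__()
                in_features = n_ctx * (dim + 1) + dim  # context pairs plus the query
                self.net = nn.Sequential(
                    nn.Linear(in_features, hidden), nn.ReLU(),
                    nn.Linear(hidden, hidden), nn.ReLU(),
                    nn.Linear(hidden, 1),  # predicted y_q
                )

            def forward(self, X_ctx, y_ctx, x_q):
                # X_ctx: (B, n_ctx, dim), y_ctx: (B, n_ctx), x_q: (B, dim)
                flat = torch.cat([X_ctx.flatten(1), y_ctx, x_q], dim=-1)
                return self.net(flat).squeeze(-1)

    Because the input width grows with the context and no weights are shared across positions, this reading offers one plausible explanation for why vanilla MLPs degrade on longer contexts, whereas MLP-Mixers, which mix along the sequence dimension with shared weights, do not.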

    In summary, these results indicate that ICL is not exclusive to attention-based architectures: given sufficient compute and data diversity, MLPs and MLP-Mixers match Transformers on in-context regression and classification, and can even surpass them on relational reasoning variants of these tasks.

    Check out the Paper. All credit for this research goes to the researchers of this project.
