Recent years have seen significant advances in neural language models, particularly Large Language Models (LLMs) enabled by the Transformer architecture and increased scale. LLMs exhibit exceptional skills in generating grammatical text, answering questions, summarising content, creating imaginative outputs, and solving complex puzzles. A key capability is in-context learning (ICL), where the model uses novel task exemplars presented during inference to respond accurately without weight updates. ICL is typically attributed to Transformers and their attention-based mechanisms.
ICL has been demonstrated for linear regression tasks with Transformers, which can generalize to new input/label pairs presented in-context, possibly by implicitly implementing gradient descent or replicating least-squares regression. Transformers also interpolate between in-weight learning (IWL) and ICL, with more diverse training data enhancing ICL capabilities. While most studies focus on Transformers, some research explores recurrent neural networks (RNNs) and LSTMs, with mixed results, and recent findings show that various causal sequence models and state-space models also achieve ICL. However, the potential of MLPs for ICL remains underexplored, despite their resurgence on complex tasks following the introduction of the MLP-Mixer model.
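To make the least-squares baseline mentioned above concrete, here is a minimal NumPy sketch of the closed-form predictor that prior work hypothesizes Transformers approximate when solving linear regression in-context. The function name, regularization term, and dimensions are illustrative choices for this article, not taken from the paper.

```python
import numpy as np

def least_squares_icl_prediction(xs, ys, x_query, reg=1e-6):
    """Predict the query target from context pairs via (lightly regularized) least squares.

    xs: (n, d) context inputs, ys: (n,) context targets, x_query: (d,) query input.
    This closed-form estimator is the reference solution that Transformers are
    hypothesized to approximate when they learn linear regression in-context.
    """
    d = xs.shape[1]
    # Normal equations with a small ridge term for numerical stability:
    # beta_hat = (X^T X + reg * I)^{-1} X^T y
    beta_hat = np.linalg.solve(xs.T @ xs + reg * np.eye(d), xs.T @ ys)
    return x_query @ beta_hat

# Toy usage on a synthetic prompt: y = x @ beta + noise
rng = np.random.default_rng(0)
beta = rng.normal(size=8)
xs = rng.normal(size=(16, 8))
ys = xs @ beta + 0.1 * rng.normal(size=16)
x_query = rng.normal(size=8)
print(least_squares_icl_prediction(xs, ys, x_query), x_query @ beta)
```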
In this study, researchers from Harvard demonstrate that multi-layer perceptrons (MLPs) can learn in-context effectively. MLPs and MLP-Mixer models perform competitively with Transformers on ICL tasks at the same compute budget. Notably, MLPs outperform Transformers on relational-reasoning ICL tasks, challenging the belief that ICL is unique to attention-based architectures. This success motivates exploration beyond attention and suggests that Transformers, constrained by self-attention and positional encodings, may be biased away from certain task structures relative to MLPs.
The study investigates MLPs’ behavior in ICL through two tasks: in-context regression and in-context classification. For ICL regression, the input is a sequence of linearly related value pairs (x_i, y_i), generated with a weight vector β that varies per sequence plus added noise, followed by a query x_q. The model predicts the corresponding y_q by inferring β from the context exemplars. For ICL classification, the input is a sequence of exemplars (x_i, y_i) sampled from a Gaussian mixture model, followed by a query x_q. The model predicts the correct label for x_q by referencing the context exemplars; the setup varies data diversity and burstiness (the number of repeated exemplars per cluster in the context).
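The sketch below shows one plausible way such prompts could be generated for both tasks, following the description above. The function names, dimensions, noise level, and default burstiness are illustrative assumptions rather than the paper's exact settings.

```python
import numpy as np

def sample_icl_regression_prompt(rng, n_context=16, d=8, noise_std=0.1):
    """Sample one in-context regression prompt: context pairs (x_i, y_i) plus a query.

    A fresh weight vector beta is drawn per prompt, so the model can only solve
    the task by inferring beta from the context exemplars.
    """
    beta = rng.normal(size=d)
    xs = rng.normal(size=(n_context, d))
    ys = xs @ beta + noise_std * rng.normal(size=n_context)
    x_q = rng.normal(size=d)
    y_q = x_q @ beta                      # regression target for the query
    return xs, ys, x_q, y_q

def sample_icl_classification_prompt(rng, centers, n_context=16, burstiness=4, noise_std=0.1):
    """Sample one in-context classification prompt from a Gaussian mixture.

    `centers` has shape (n_clusters, d); `burstiness` controls how many times
    each sampled cluster repeats in the context, as described above.
    """
    n_clusters, d = centers.shape
    assert n_context % burstiness == 0
    cluster_ids = rng.choice(n_clusters, size=n_context // burstiness)
    labels = np.repeat(cluster_ids, burstiness)
    xs = centers[labels] + noise_std * rng.normal(size=(n_context, d))
    # The query is drawn from one of the clusters already present in the context.
    q_label = rng.choice(cluster_ids)
    x_q = centers[q_label] + noise_std * rng.normal(size=d)
    return xs, labels, x_q, q_label
```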
MLPs and Transformers were compared on in-context regression and classification tasks. All architectures, including MLP-Mixers, achieved near-optimal mean squared error (MSE) with sufficient compute, although Transformers slightly outperformed MLPs at smaller compute budgets. At longer context lengths, vanilla MLPs degraded, while MLP-Mixers maintained optimal MSE. As data diversity increased, all models transitioned from IWL to ICL, with Transformers making the transition more quickly. In in-context classification, MLPs performed comparably to Transformers, maintaining relatively flat loss across context lengths and likewise transitioning from IWL to ICL as data diversity increased.
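For readers wondering how a vanilla MLP consumes a sequential ICL prompt at all, one common approach is to flatten the context pairs and the query into a single fixed-size input vector. The PyTorch sketch below illustrates this idea; the class name, layer sizes, and input encoding are assumptions for illustration, not necessarily the paper's exact architecture (note that fixing the input size is one reason a vanilla MLP can struggle at longer context lengths).

```python
import torch
import torch.nn as nn

class FlattenedContextMLP(nn.Module):
    """A vanilla MLP for ICL regression that sees the whole prompt as one flat vector.

    The context pairs (x_i, y_i) and the query x_q are concatenated into a single
    input of fixed size, so the context length is baked into the architecture.
    """
    def __init__(self, n_context=16, d=8, hidden=256):
        super().__init__()
        in_dim = n_context * (d + 1) + d   # context inputs + context targets + query
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),          # predicted y_q
        )

    def forward(self, xs, ys, x_q):
        # xs: (batch, n_context, d), ys: (batch, n_context), x_q: (batch, d)
        flat = torch.cat([xs.flatten(1), ys, x_q], dim=-1)
        return self.net(flat).squeeze(-1)
```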
Check out the Paper. All credit for this research goes to the researchers of this project.