Self-attention mechanisms can capture associations across an entire sequence, which makes them excellent at processing long contexts. However, they come at a high computational cost, namely quadratic complexity: the time and memory required grow quadratically with sequence length. Recurrent Neural Networks (RNNs), on the other hand, have linear complexity, which makes them far more computationally efficient. However, because their hidden state must compress all of the context into a fixed-size representation, RNNs perform poorly on long contexts.
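To make this contrast concrete, here is a minimal, illustrative sketch (not from the paper) of why attention cost grows quadratically with sequence length while a simple RNN's cost grows linearly but its memory stays fixed:

```python
import numpy as np

# Illustration of the complexity gap described above (not from the paper).
T, d = 1024, 64
X = np.random.default_rng(0).normal(size=(T, d))

# Self-attention: every token attends to every other token,
# so the score matrix alone is T x T (quadratic in sequence length).
scores = X @ X.T                     # shape (1024, 1024)

# A simple RNN: one pass over the sequence with a fixed-size hidden state,
# so cost grows linearly in T but all context is squeezed into `h`.
Wh, Wx = np.eye(d) * 0.9, np.eye(d)
h = np.zeros(d)
for x in X:
    h = np.tanh(Wh @ h + Wx @ x)     # hidden state stays d-dimensional

print(scores.shape, h.shape)         # (1024, 1024) (64,)
```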
To overcome these limitations, a team of researchers from Stanford University, UC San Diego, UC Berkeley, and Meta AI has proposed a new class of sequence modeling layers that combines the linear complexity of RNNs with a more expressive hidden state. The key idea is to make the hidden state itself a machine learning model and to use a step of self-supervised learning as the update rule. In other words, the hidden state is updated by effectively training on the input sequence, even at test time. These layers are therefore called Test-Time Training (TTT) layers.
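As an illustration of this idea, the following is a minimal sketch of a TTT-style layer whose hidden state is a linear model. It assumes a simple reconstruction loss on the raw token; the paper instead uses learned projections for the self-supervised task, so the function name, loss, and learning rate here are hypothetical:

```python
import numpy as np

def ttt_linear_forward(tokens, lr=0.1):
    """Minimal sketch of a TTT-style layer with a linear hidden state.

    tokens: array of shape (seq_len, d), the input sequence.
    The hidden state is a weight matrix W (a linear model). At each step we
    take one gradient-descent step on a self-supervised reconstruction loss
    ||W @ x_t - x_t||^2, then emit z_t = W @ x_t with the updated model.
    """
    seq_len, d = tokens.shape
    W = np.eye(d)                      # hidden state: a linear model
    outputs = np.zeros_like(tokens)
    for t, x in enumerate(tokens):
        # update rule: gradient of ||W x - x||^2 with respect to W
        grad = 2.0 * np.outer(W @ x - x, x)
        W = W - lr * grad              # "training" happens at test time
        outputs[t] = W @ x             # output rule: apply the updated model
    return outputs, W

# usage: process a random sequence of 16 tokens with dimension 8
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq = rng.normal(size=(16, 8))
    out, final_state = ttt_linear_forward(seq)
    print(out.shape, final_state.shape)   # (16, 8) (8, 8)
```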
Two instantiations of TTT layers have been introduced: TTT-Linear and TTT-MLP. In TTT-Linear the hidden state is a linear model, while in TTT-MLP it is a two-layer Multilayer Perceptron (MLP). The team has evaluated these TTT layers against a strong Transformer baseline and Mamba, a modern RNN, across models ranging from 125 million to 1.3 billion parameters.
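Conceptually, the two variants differ only in how the hidden state parameterizes a model. A hedged sketch of the two output rules, which would plug into the update loop above (a plain ReLU is used for simplicity, and these helper names are illustrative rather than the paper's code):

```python
import numpy as np

def f_linear(W, x):
    # TTT-Linear: the hidden state is a single weight matrix, i.e. a linear model
    return W @ x

def f_mlp(params, x):
    # TTT-MLP: the hidden state is the parameters of a two-layer MLP
    W1, W2 = params
    h = np.maximum(0.0, W1 @ x)   # simple ReLU nonlinearity for illustration
    return W2 @ h
```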
According to the evaluations, both TTT-Linear and TTT-MLP match or beat the baselines. Like the Transformer, TTT layers keep reducing perplexity, a metric of how well a model predicts a sequence, as they condition on more tokens, whereas Mamba stops improving beyond 16,000 tokens. This is a significant advantage because it shows that TTT layers make effective use of long contexts.
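For reference, perplexity is the exponential of the average negative log-likelihood per token; a quick illustrative helper (not tied to the paper's code):

```python
import numpy as np

def perplexity(token_log_probs):
    """Perplexity = exp(average negative log-likelihood per token).

    token_log_probs: log p(x_t | x_<t) for each token in the sequence.
    Lower perplexity means the model predicts the sequence better.
    """
    return float(np.exp(-np.mean(token_log_probs)))

# usage: a model assigning probability 0.5 to every token has perplexity ~2
print(perplexity(np.log([0.5, 0.5, 0.5, 0.5])))   # 2.0
```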
After preliminary systems optimizations, TTT-Linear matched Mamba in wall-clock time, that is, the actual elapsed processing time, and was faster than the Transformer at a context length of 8,000 tokens. TTT-MLP shows even greater potential for long contexts but still faces challenges with memory input/output operations.
The team has summarized their primary contributions as follows:
A novel class of sequence modeling layers, Test-Time Training (TTT) layers, has been introduced, in which the hidden state is itself a model updated via self-supervised learning. This perspective opens a new avenue for sequence modeling research by integrating a training loop into a layer’s forward pass.
A simple instantiation of TTT layers called TTT-Linear has been introduced, and the team has shown that it outperforms both Transformers and Mamba in evaluations at model sizes ranging from 125 million to 1.3 billion parameters, suggesting that TTT layers can improve the performance of sequence models.
The team has also developed mini-batch TTT and the dual form to improve the hardware efficiency of TTT layers, making TTT-Linear a practical building block for large language models. These optimizations make it more feasible to integrate TTT layers into real applications; a sketch of the mini-batch idea appears below.
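Here is a rough, illustrative sketch of mini-batch TTT under the same simplified reconstruction loss as above: within each mini-batch, every gradient is taken with respect to the state at the start of the batch, so the gradients can be computed in parallel rather than strictly one token at a time. The function name, loss, and hyperparameters are assumptions, not the paper's implementation.

```python
import numpy as np

def ttt_linear_minibatch(tokens, lr=0.1, batch_size=4):
    """Sketch of mini-batch TTT with a linear hidden state (simplified loss).

    Within each mini-batch of b tokens, every gradient is computed at the
    state from the start of the batch, so the b gradients are independent
    of one another and could be evaluated in parallel on hardware.
    """
    seq_len, d = tokens.shape
    W = np.eye(d)
    outputs = np.zeros_like(tokens)
    for start in range(0, seq_len, batch_size):
        X = tokens[start:start + batch_size]           # current mini-batch (b, d)
        base = W                                       # shared base point
        # all gradients use `base`, so they are parallelizable
        grads = [2.0 * np.outer(base @ x - x, x) for x in X]
        for i, (x, g) in enumerate(zip(X, grads)):
            W = W - lr * g                             # accumulate the updates
            outputs[start + i] = W @ x                 # output with updated state
    return outputs

# usage
out = ttt_linear_minibatch(np.random.default_rng(0).normal(size=(16, 8)))
print(out.shape)   # (16, 8)
```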
Check out the Paper. All credit for this research goes to the researchers of this project.