How ‘Chain of Thought’ Makes Transformers Smarter

Large Language Models (LLMs) like GPT-3 and ChatGPT exhibit exceptional capabilities in complex reasoning tasks such as mathematical problem-solving and code generation, far surpassing standard supervised machine learning techniques. The key to unlocking these advanced reasoning abilities lies in the chain of thought (CoT), which refers to the ability of the model to generate intermediate reasoning steps before arriving at the final answer, kind of like how we humans break down a complex problem into smaller steps in our head. This can be achieved through methods like training the model on examples enriched with intermediate reasoning steps or using few-shot prompting to instruct the model to generate a CoT.
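To make the few-shot prompting route concrete, here is a minimal sketch of how such a prompt could be assembled. The helper name `build_cot_prompt`, the `Q:`/`A:` layout, and the sample questions are assumptions made for illustration; they do not come from the study.

```python
# Hypothetical helper: builds a few-shot prompt whose worked examples
# include intermediate reasoning, nudging the model to do the same.
def build_cot_prompt(examples, question):
    """examples: list of (question, reasoning, answer) tuples."""
    parts = []
    for q, reasoning, answer in examples:
        parts.append(f"Q: {q}\nA: {reasoning} The answer is {answer}.")
    # Leave the last answer open so the model continues with its own steps.
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


# Made-up worked example, used only to show the prompt layout.
examples = [
    (
        "A shelf holds 3 boxes with 4 books each. How many books are there?",
        "There are 3 boxes. Each box has 4 books. 3 * 4 = 12.",
        "12",
    ),
]
prompt = build_cot_prompt(
    examples, "A car has 4 wheels. How many wheels do 5 cars have?"
)
print(prompt)
```

The resulting string would then be sent to whichever model is being used; that call is left out because it depends on the provider.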

Now, you might think that the content of these intermediate steps is what allows the model to reason better. But interestingly, in this study, the researchers found that even if the intermediate steps are incorrect or completely random, just the act of generating them still helps the model a lot. It’s like the model is being told “Okay, think this through step-by-step” and that alone improves its reasoning ability drastically.

So the researchers wanted to understand why this “chain of thought” approach is so powerful for transformers (the type of model used in GPT-3, etc.). They used concepts from circuit complexity theory and adopted the language of computational complexity classes like NC, AC, and TC to analyze this problem.

Essentially, they found that without the chain of thought, transformers are limited to efficiently performing only parallel computations, meaning they can solve problems that can be broken down into independent sub-tasks that can be computed simultaneously.

However, many complex reasoning tasks require inherently serial computations, where one step follows from the previous step. And this is where the chain of thought helps transformers a lot. By generating step-by-step reasoning, the model can perform many more serial computations than it could without CoT.
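The difference between the two kinds of computation can be seen in a small sketch. Summing a list can be organized into rounds of independent additions, while repeated squaring is a chain in which every step needs the previous result. Both functions are illustrative choices rather than tasks taken from the study, and repeated squaring is widely believed, though not proven, to resist parallel speedup.

```python
def tree_sum(values):
    """Sum by pairwise reduction; returns (total, number_of_rounds).

    Every addition inside a round is independent of the others, so a
    parallel machine could perform the whole round at once.
    """
    level, rounds = list(values), 0
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # an odd element carries over unchanged
            nxt.append(level[-1])
        level, rounds = nxt, rounds + 1
    return (level[0] if level else 0), rounds


def iterated_square(x, k, modulus):
    """Square x k times modulo modulus; returns (result, steps).

    Each squaring needs the previous result, so the steps form a chain.
    """
    steps = 0
    for _ in range(k):
        x = (x * x) % modulus
        steps += 1
    return x, steps


print(tree_sum(range(1, 9)))        # (36, 3): 8 numbers need only 3 rounds
print(iterated_square(3, 8, 1000))  # 8 squarings take 8 dependent steps
```

For n numbers the tree sum needs about log2(n) rounds, whereas k squarings need k dependent steps. That gap is the kind of depth a model has to make up by writing out intermediate results.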

The researchers proved theoretically that a basic transformer without CoT can only solve problems up to a certain complexity level, whereas allowing a polynomial number of CoT steps makes it expressive enough, at least in theory, to solve a much broader class of problems, including ones that require inherently serial computation.
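One way to build intuition for why extra steps add power is a toy scratchpad loop that evaluates a Boolean circuit one gate per step. Each step does a small, fixed amount of work, but because it can read everything written so far, a sequence of T steps can evaluate a circuit with T gates. This is a simplified illustration, not the construction used in the proof; the gate encoding and function names are assumptions.

```python
# Illustrative only: a toy "one gate per step" loop.
# Each gate is (op, i, j): apply op to the values at positions i and j of
# the growing sequence. The circuit inputs occupy the first positions.
OPS = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
}


def step(sequence, gate):
    """A fixed, shallow computation: read two earlier values, emit one."""
    op, i, j = gate
    return OPS[op](sequence[i], sequence[j])


def evaluate_with_scratchpad(inputs, gates):
    """Evaluate the circuit by appending one gate value per step."""
    sequence = list(inputs)
    for gate in gates:  # T gates -> T appended values
        sequence.append(step(sequence, gate))
    return sequence


# Circuit for (x0 XOR x1) AND (x1 OR x2)
gates = [("XOR", 0, 1), ("OR", 1, 2), ("AND", 3, 4)]
trace = evaluate_with_scratchpad([1, 0, 1], gates)
print(trace)      # the inputs followed by one value per gate
print(trace[-1])  # the final answer is the last value written
```

Here the appended values play the role of the chain of thought: without them, the final gate would have nothing to read from.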

To back up their theory, they also did some experiments on different arithmetic tasks: ones that can be parallelized and ones that inherently require sequential computations. Sure enough, they found that transformers struggled on the sequential tasks without CoT, but enabling CoT drastically boosted their performance, especially when the transformer model was relatively small/shallow.
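To picture what "with CoT" and "without CoT" could mean for a sequential task, the sketch below formats one example both ways. Composing a list of permutations is used as the task, and the text layout (`;` between inputs, `->` between steps) is a hypothetical format chosen for illustration, not the one used in the experiments.

```python
def compose(p, q):
    """Return the permutation 'apply p, then q' (tuples of indices)."""
    return tuple(q[p[i]] for i in range(len(p)))


def make_example(perms, with_cot):
    """Format one example from two or more permutations.

    With with_cot=True the target lists every partial composition;
    otherwise it contains only the final result.
    """
    current = perms[0]
    partials = []
    for p in perms[1:]:
        current = compose(current, p)
        partials.append(current)
    prompt = " ; ".join(" ".join(map(str, p)) for p in perms)
    steps = partials if with_cot else partials[-1:]
    target = " -> ".join(" ".join(map(str, s)) for s in steps)
    return prompt, target


perms = [(1, 2, 0), (0, 2, 1), (2, 1, 0)]
print(make_example(perms, with_cot=False))  # target holds only the final result
print(make_example(perms, with_cot=True))   # target also holds partial results
```

In the CoT version the model only has to carry out one composition per written step, instead of producing the whole product in a single pass.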

In essence, the chain of thought is a simple but powerful trick that vastly increases the reasoning capabilities of transformer models like GPT-3. It allows them to tackle complex tasks requiring sequential logic that parallel models would fail at.

Check out the Paper. All credit for this research goes to the researchers of this project.

The post How ‘Chain of Thought’ Makes Transformers Smarter appeared first on MarkTechPost.
