LLMs are really bad at solving simple river crossing puzzles

Large language models like GPT-4o can perform incredibly complex tasks, but even the top models struggle with some basic reasoning challenges that children can solve.

In an interview with CBS, the ‘godfather of AI’, Geoffrey Hinton, said that AI systems might be more intelligent than we know and there’s a chance the machines could take over.

When asked about the level of current AI technology, Hinton said, “I think we’re moving into a period when for the first time ever we may have things more intelligent than us.”

Meta’s chief AI scientist, Yann LeCun, would have us believe that we’re a long way off from seeing AI achieve even “dog-level” intelligence.

So which is it?

This week, users on X posted examples of the incredible coding ability Anthropic’s new Claude model exhibits. Others ran experiments to highlight how AI models still struggle with very basic reasoning.

River crossing puzzle

The classic river crossing puzzle has multiple variations, but Wikipedia’s version sums it up like this:

A farmer with a wolf, a goat, and a cabbage must cross a river by boat. The boat can carry only the farmer and a single item. If left unattended together, the wolf would eat the goat, or the goat would eat the cabbage. How can they cross the river without anything being eaten?

Finding the solution requires some basic planning and reasoning through different scenarios, but it’s not a particularly difficult problem to solve. If you’re human.
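To show how little planning the puzzle actually needs, here is a minimal sketch of a breadth-first search over the possible states of the river banks. The names `solve`, `conflicts`, and `capacity` are made up for this illustration and do not come from any of the experiments discussed here; the rule encoded is simply the one in the Wikipedia wording, that a conflicting pair may never share a bank without the farmer.

```python
from collections import deque
from itertools import combinations


def solve(items, conflicts, capacity=1):
    """Return the shortest plan as a list of crossings.

    Each crossing is a tuple of the items the farmer carries on that trip.
    `conflicts` lists pairs that cannot be left together without the farmer.
    All names here are illustrative assumptions, not an established API.
    """
    everything = frozenset(items)

    def safe(bank):
        # A bank without the farmer is safe if it holds no conflicting pair.
        return not any(a in bank and b in bank for a, b in conflicts)

    # State: (items still on the starting bank, is the farmer on that bank?)
    start = (everything, True)
    queue = deque([(start, [])])
    seen = {start}

    while queue:
        (near, farmer_near), path = queue.popleft()
        if not near and not farmer_near:
            return path  # everything, farmer included, is on the far bank

        here = near if farmer_near else everything - near
        for k in range(capacity + 1):  # the farmer may also cross empty-handed
            for cargo in combinations(sorted(here), k):
                moved = frozenset(cargo)
                new_near = near - moved if farmer_near else near | moved
                # The bank the farmer just left must stay safe.
                left_behind = new_near if farmer_near else everything - new_near
                if not safe(left_behind):
                    continue
                state = (new_near, not farmer_near)
                if state not in seen:
                    seen.add(state)
                    queue.append((state, path + [cargo]))
    return None  # no safe plan exists


plan = solve(
    ["wolf", "goat", "cabbage"],
    conflicts=[("wolf", "goat"), ("goat", "cabbage")],
)
for number, cargo in enumerate(plan, start=1):
    print(number, cargo if cargo else "(farmer alone)")
```

For the classic puzzle the search returns a seven-crossing plan that begins and ends with the goat. Because breadth-first search explores shorter plans first, the same function called with a single item and no conflicts returns a one-crossing plan.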

Can GPT-4o solve it? If you copy and paste the puzzle into ChatGPT it gives you the right answer, but that Wikipedia page was almost certainly in its training data.

What if we made the puzzle a lot simpler and changed it slightly so the LLM couldn’t rely on its training data?

British Mathematics Professor Sir William Timothy Gowers showed how the inability of LLMs to apply logic is easily exposed.

ChatGPT’s failed attempt at solving a simplified river crossing puzzle. Source: X @wtgowers

The correct answer to the puzzle is that only one trip is required. But it seems like ChatGPT is trying to remember an answer rather than simply reasoning through the puzzle.

Is Claude 3.5 Sonnet any better?

Meta Data Scientist Colin Fraser’s experiment confirms that even the leading AI model currently available can’t solve this simple puzzle.

Claude still can’t solve the impossible one farmer one sheep one boat problem pic.twitter.com/TU13wermLZ

Colin Fraser (@colin_fraser), June 20, 2024

It may have been a little disingenuous for a data scientist from Meta not to show his results using Llama 3.

I asked Meta AI the same question and it also got it completely wrong.

Meta AI powered by Llama 3 also gets the river puzzle answer wrong. Source: Meta

Yann LeCun explained the reason behind these results, saying, “The issue is that LLMs have no common sense, no understanding of the world, and no ability to plan (and reason).”

Is that true, or is something else at play?

What these interactions might reveal is not a lack of reasoning ability, but rather how much the output of an LLM is influenced by its training data. Meta AI’s response calling this a “classic puzzle” hints that this might be what’s happening.

The river crossing puzzle variations often reference the number of “trips” required. When you pose the puzzle without using that word, the LLM solves it.

Indeed. When there’s no prompt for “trips”, which brings memories of the previous solutions of so many similar problems, but the prompt “fastest way possible” along with COT, it answers correctly pic.twitter.com/E27vBv2y2R

AnKo (@anko_979), June 21, 2024

These experiments were interesting, but they don’t definitively settle the argument over whether AI models are truly intelligent or simply next-token predictive machines.

However, the results do highlight how susceptible LLMs are to training data. When GPT-4o aces the LSAT exams, is it “thinking” to find the answers to the problems or remembering them?

Until the engineers understand what goes on inside the AI black boxes they created, the arguments on X will continue unresolved.

The post LLMs are really bad at solving simple river crossing puzzles appeared first on DailyAI.
