
    LLMs are really bad at solving simple river crossing puzzles

    June 25, 2024

    Large language models like GPT-4o can perform incredibly complex tasks, but even the top models struggle with some basic reasoning challenges that children can solve.

    In an interview with CBS, the ‘godfather of AI’, Geoffrey Hinton, said that AI systems might be more intelligent than we know and that there’s a chance the machines could take over.

    When asked about the current level of AI technology, Hinton said, “I think we’re moving into a period when for the first time ever we may have things more intelligent than us.”

    Meta’s chief AI scientist, Yann LeCun, would have us believe that we’re a long way off from seeing AI achieve even “dog-level” intelligence.

    So which is it?

    This week, users on X posted examples of the incredible coding ability Anthropic’s new Claude model exhibits. Others ran experiments to highlight how AI models still struggle with very basic reasoning.

    River crossing puzzle

    The classic river crossing puzzle has multiple variations, but Wikipedia’s version sums it up like this:

    A farmer with a wolf, a goat, and a cabbage must cross a river by boat. The boat can carry only the farmer and a single item. If left unattended together, the wolf would eat the goat, or the goat would eat the cabbage. How can they cross the river without anything being eaten?

    Finding the solution requires some basic planning and reasoning about different scenarios, but it’s not a particularly difficult problem to solve. If you’re human.
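    The “planning and reasoning on different scenarios” a person does in their head can be made explicit as a search over puzzle states. Below is a minimal Python sketch of a breadth-first search; the function names, the state encoding and the `capacity` parameter are our own assumptions for illustration, not something taken from the article or its sources. A state records which items are still on the starting bank and which bank the farmer is on, and a crossing is only allowed if the bank the farmer leaves behind holds no conflicting pair.

```python
from collections import deque
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical encoding: (items still on the starting bank, farmer's side: 0 = start, 1 = far bank).
State = Tuple[FrozenSet[str], int]


def is_safe(unattended: FrozenSet[str], conflicts: List[Tuple[str, str]]) -> bool:
    """A bank without the farmer is safe if no conflicting pair is left on it."""
    return not any(a in unattended and b in unattended for a, b in conflicts)


def solve(
    items: List[str], conflicts: List[Tuple[str, str]], capacity: int = 1
) -> Optional[List[str]]:
    """Breadth-first search for a shortest sequence of crossings.

    `capacity` (an assumed parameter) is how many items fit in the boat besides the farmer.
    Returns the moves in order, or None if the puzzle has no solution.
    """
    everything = frozenset(items)
    start: State = (everything, 0)
    goal: State = (frozenset(), 1)
    # Maps each visited state to (previous state, move taken), or None for the start.
    parents: Dict[State, Optional[Tuple[State, str]]] = {start: None}
    queue = deque([start])

    while queue:
        state = queue.popleft()
        if state == goal:
            # Walk the parent links back to the start to rebuild the route.
            moves: List[str] = []
            link = parents[state]
            while link is not None:
                prev, move = link
                moves.append(move)
                link = parents[prev]
            return moves[::-1]

        near, side = state
        here = near if side == 0 else everything - near
        # The farmer crosses alone or with up to `capacity` items from his current bank.
        for k in range(capacity + 1):
            for cargo in combinations(sorted(here), k):
                carried = frozenset(cargo)
                new_near = near - carried if side == 0 else near | carried
                # The bank the farmer is leaving must not contain a conflicting pair.
                left_behind = new_near if side == 0 else everything - new_near
                if not is_safe(left_behind, conflicts):
                    continue
                nxt: State = (new_near, 1 - side)
                if nxt not in parents:
                    direction = "to far bank" if side == 0 else "back"
                    parents[nxt] = (state, f"{direction} with {', '.join(cargo) or 'nothing'}")
                    queue.append(nxt)
    return None


if __name__ == "__main__":
    classic = solve(["wolf", "goat", "cabbage"], [("wolf", "goat"), ("goat", "cabbage")])
    print(len(classic), classic)
    print(solve(["sheep"], []))
```

    With the Wikipedia constraints, the search returns a seven-crossing route that begins by ferrying the goat; with a single animal and no conflicts, it returns a single crossing. Because breadth-first search explores routes in order of length, the first route that reaches the goal is a shortest one.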

    Can GPT-4o solve it? If you copy and paste the puzzle into ChatGPT it gives you the right answer, but that Wikipedia page was almost certainly in its training data.

    What if we made the puzzle a lot simpler and changed it slightly so the LLM couldn’t rely on its training data?

    British mathematics professor Sir William Timothy Gowers showed how easily the inability of LLMs to apply logic can be exposed.

    ChatGPT’s failed attempt at solving a simplified river crossing puzzle. Source: X @wtgowers

    The correct answer to the puzzle is that only one trip is required. But it seems like ChatGPT is trying to remember an answer rather than simply reasoning through the puzzle.

    Is Claude 3.5 Sonnet any better?

    Meta data scientist Colin Fraser’s experiment confirms that even the leading AI model currently available can’t solve this simple puzzle.

    Claude still can’t solve the impossible one farmer one sheep one boat problem pic.twitter.com/TU13wermLZ

    — Colin Fraser (@colin_fraser) June 20, 2024

    It may have been a little disingenuous for a data scientist from Meta not to show his results using Llama 3.

    I asked Meta AI the same question, and it also got it completely wrong.

    Meta AI powered by Llama 3 also gets the river puzzle answer wrong. Source: Meta

    Yann LeCun explained the reason behind these results saying, “The issue is that LLMs have no common sense, no understanding of the world, and no ability to plan (and reason).”

    Is that true, or is something else at play?

    What these interactions might reveal is not a lack of reasoning ability, but rather how much the output of an LLM is influenced by its training data. Meta AI’s response calling this a “classic puzzle” hints that this might be what’s happening.

    The river crossing puzzle variations often reference the amount of “trips” required. When you pose the puzzle without using that word, the LLM solves it.

    Indeed. When there’s no prompt for “trips”, which brings memories of the previous solutions of so many similar problems, but the prompt “fastest way possible” along with COT, it answers correctly pic.twitter.com/E27vBv2y2R

    — AnKo (@anko_979) June 21, 2024

    These experiments were interesting, but they don’t definitively answer the argument over whether AI models are truly intelligent or simply next-token predictive machines.

    However, the results do highlight how strongly LLMs are influenced by their training data. When GPT-4o aces the LSAT, is it “thinking” to find the answers to the problems or remembering them?

    Until the engineers understand what goes on inside the AI black boxes they created, the arguments on X will continue unresolved.

    The post LLMs are really bad at solving simple river crossing puzzles appeared first on DailyAI.
