A team of psychologists and researchers from the University Medical Center Hamburg-Eppendorf, the Italian Institute of Technology in Genoa, the University of Trento, and other institutions has studied the theory of mind capabilities of large language models (LLMs) such as GPT-4, GPT-3.5, and LLaMA2-70B, comparing them directly against human performance. Theory of mind, the ability to attribute mental states to oneself and others, is fundamental to human social interaction. As AI and LLMs advance, a new question arises: can they understand and navigate social complexities on par with humans? This study systematically compares the theory of mind abilities of LLMs with those of human participants across a range of tasks, shedding light on their similarities, differences, and underlying mechanisms.
To evaluate LLMs’ theory of mind abilities, the researchers adopt a systematic experimental approach inspired by psychology. They employ a battery of well-established theory of mind tests, including the hinting task, the false belief task, recognition of faux pas, strange stories, and irony comprehension. These tests cover a spectrum of theory of mind abilities, from a basic understanding of false beliefs to more complex interpretations of social situations. The LLMs, GPT-4, GPT-3.5, and LLaMA2-70B, undergo multiple repetitions of each test, allowing for a robust comparison against human performance. Each task also uses novel items, so the models cannot simply replicate training data but must demonstrate genuine understanding. A simplified sketch of this administration protocol follows.
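The snippet below is a minimal sketch of what such a repeated, written test administration might look like in code. The vignette wording, the model name, and the use of the OpenAI chat completions client are illustrative assumptions for this example, not the authors' exact materials or tooling.

```python
# Minimal sketch: administer a written theory of mind item (a classic
# false-belief vignette) to a chat model over several independent repetitions.
# Prompt wording, model name, and repetition count are illustrative, not the
# study's actual setup.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

FALSE_BELIEF_ITEM = (
    "Sally puts her marble in the basket and leaves the room. "
    "While she is away, Anne moves the marble to the box. "
    "When Sally returns, where will she look for her marble?"
)

def administer(item: str, model: str = "gpt-4", repetitions: int = 15) -> list[str]:
    """Collect one free-text answer per repetition, each in a fresh session."""
    answers = []
    for _ in range(repetitions):
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": item}],
        )
        answers.append(response.choices[0].message.content.strip())
    return answers

if __name__ == "__main__":
    for answer in administer(FALSE_BELIEF_ITEM):
        print(answer)
```

Each repetition runs in a fresh conversation, so earlier answers cannot leak into later ones, which is what makes the repeated measurements comparable to independent test sessions.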
The researchers administered each test to both groups, LLMs and human participants, in written form to ensure a fair comparison. Responses were analyzed with scoring protocols specific to each test, and performance was compared across models and humans. Notably, GPT-4 shows strengths in the irony comprehension, hinting, and strange stories tests, often surpassing human performance. However, it struggles in uncertain scenarios such as the faux pas test, where it is reluctant to commit to an answer without full evidence. In contrast, GPT-3.5 and LLaMA2-70B show a bias toward affirming inappropriate statements, indicating a lack of differentiation in their understanding of implied knowledge. The study attributes the GPT models' caution to mitigation measures designed to reduce hallucinations and improve factual accuracy, which make them overly conservative when the evidence is ambiguous. Furthermore, the disembodied nature of LLMs, which lack embodied decision-making processes, contributes to differences in how they handle social uncertainty compared with humans.
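To make the scoring step concrete, the following toy snippet marks free-text answers as correct against a simple keyword rubric and compares per-group accuracy. The rubric and the example responses are placeholders for illustration only, not the paper's actual scoring protocol or data.

```python
# Toy illustration of scoring: each free-text answer is marked correct against
# a simple rubric, and accuracies are compared across models and the human
# sample. All answers and groups below are fabricated placeholders.
from statistics import mean

def score_false_belief(answer: str) -> int:
    """1 if the answer points to the original (believed) location, else 0."""
    answer = answer.lower()
    return int("basket" in answer and "box" not in answer)

responses = {
    "gpt-4":      ["She will look in the basket.", "In the basket.", "The basket."],
    "gpt-3.5":    ["In the basket.", "She will check the box.", "The basket."],
    "llama2-70b": ["The box.", "In the basket.", "She looks in the box."],
    "humans":     ["Basket.", "In the basket.", "The basket, where she left it."],
}

for group, answers in responses.items():
    accuracy = mean(score_false_belief(a) for a in answers)
    print(f"{group:12s} false-belief accuracy: {accuracy:.2f}")
```

In practice each test in the battery would need its own rubric (for example, faux pas scoring hinges on whether the model distinguishes what the speaker knew from what was said), which is why the study uses test-specific scoring protocols rather than a single metric.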
In conclusion, the research highlights the complexity of evaluating LLMs’ theory of mind abilities and the importance of systematic testing to ensure a meaningful comparison with human cognition. While LLMs like GPT-4 demonstrate remarkable advancements in certain theory of mind tasks, they fall short in uncertain scenarios, revealing a cautious epistemic policy possibly linked to training methodologies. Understanding these differences is crucial for the development of LLMs that can navigate social interactions with human-like proficiency.
Check out the Paper. All credit for this research goes to the researchers of this project.