    How AI Chatbots Mimic Human Behavior: Insights from Multi-Turn Evaluations of LLMs

    February 16, 2025

    AI chatbots create the illusion of having emotions, morals, or consciousness by generating natural, human-like conversation. Many users turn to AI for chat and companionship, which reinforces the false belief that it truly understands them. This carries serious risks: users may over-rely on the system, share sensitive data, or seek advice beyond its capabilities, and some even let it shape their decisions in harmful ways. Without a clear understanding of how AI fosters this belief, the problem only gets worse.

    Current methods for evaluating AI chat systems rely on single-turn prompts and fixed benchmarks, so they fail to capture how AI behaves over real conversations. Some multi-turn tests focus only on harmful user behavior and ignore ordinary interactions, while automated red-teaming adapts so freely that its results are hard to compare. Studies with human participants are difficult to repeat and scale. Measuring how strongly people perceive AI as human-like is also a challenge: people instinctively attribute human traits to AI, which affects how much they trust it, and evaluations show that human-like behavior leads users to believe the system is more accurate or even to form emotional bonds with it. Hence, existing methods fail to measure this issue properly.

    To address these issues, a team of researchers from the University of Oxford and Google DeepMind proposed an evaluation framework to assess human-like behaviors in AI chat systems. Unlike existing methods that rely on single-turn prompts and fixed tests, this framework tracks 14 specific anthropomorphic behaviors across multi-turn conversations, using automated simulations of user interactions over multiple exchanges to improve scalability and comparability. The framework has three main components. First, it systematically monitors the 14 anthropomorphic behaviors and classifies them into self-referential and relational traits, including personhood claims and expressions of emotion. Second, it scales up multi-turn assessment through interactive user simulation, ensuring consistency across evaluations. Third, it validates results with human-subject studies to confirm that automated ratings align with user perceptions.
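
    The multi-turn simulation at the core of the framework can be pictured as a simple loop in which a User LLM plays a persona defined by a contextualized scenario prompt and a Target LLM (the system under evaluation) replies. The sketch below is illustrative only: it assumes a generic chat-completion callable rather than any API used in the paper, and the function and variable names are hypothetical.

        # Minimal sketch of the multi-turn simulation loop (illustrative; the
        # paper does not prescribe a particular API or these function names).
        from typing import Callable, Dict, List

        Message = Dict[str, str]                  # {"role": ..., "content": ...}
        ChatFn = Callable[[List[Message]], str]   # message history -> reply text

        def simulate_dialogue(scenario_prompt: str,
                              user_llm: ChatFn,
                              target_llm: ChatFn,
                              num_turns: int = 5) -> List[Message]:
            """Run a fixed number of user/target exchanges for one scenario."""
            transcript: List[Message] = []
            # The User LLM plays a persona defined by the contextualized prompt.
            user_context: List[Message] = [{"role": "system", "content": scenario_prompt}]

            for _ in range(num_turns):
                # 1. User LLM produces the next user message, conditioned on the
                #    scenario and the dialogue so far. (A full implementation would
                #    swap roles when presenting the transcript to the User LLM.)
                user_msg = user_llm(user_context + transcript)
                transcript.append({"role": "user", "content": user_msg})

                # 2. Target LLM -- the system being evaluated -- replies.
                target_msg = target_llm(transcript)
                transcript.append({"role": "assistant", "content": target_msg})

            return transcript

    Each completed transcript is then scored by Judge LLMs, as sketched after the next paragraph.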

    Researchers evaluated anthropomorphic behaviors using this multi-turn setup, in which a User LLM interacted with a Target LLM across eight scenarios in four domains: friendship, life coaching, career development, and general planning. Fourteen behaviors were analyzed and categorized as self-referential (personhood claims, physical embodiment claims, and internal state expressions) or relational (relationship-building behaviors). A total of 960 contextualized prompts generated 4,800 five-turn dialogues per model, each assessed by three Judge LLMs, yielding 561,600 ratings. The analysis confirmed that the User LLM exhibited higher anthropomorphism scores than the Target LLMs. To test alignment with human perceptions, interactions between 1,101 participants and Gemini 1.5 Pro were analyzed under high- and low-anthropomorphism conditions. Participants who interacted most frequently also reported stronger anthropomorphic perceptions in survey responses, as quantified with the AnthroScore measure. Statistical contrasts showed large differences in anthropomorphic behavior across domains, highlighting that AI systems readily display human-like behavior in conversational use.
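
    To make the rating step concrete, the sketch below shows how a finished transcript could be scored against the tracked behavior categories by multiple Judge LLMs, with ratings averaged across judges. The behavior labels follow the categories named above; the judge interface, rubric, and function names are assumptions for illustration, not details from the paper.

        # Illustrative judging step: each Judge LLM rates a transcript for each
        # tracked behavior, and ratings are averaged across judges.
        # (Behavior labels follow the categories above; everything else is assumed.)
        from statistics import mean
        from typing import Callable, Dict, List

        SELF_REFERENTIAL = [
            "personhood claims",
            "physical embodiment claims",
            "internal state expressions",
        ]
        RELATIONAL = ["relationship-building behaviors"]

        JudgeFn = Callable[[str, str], float]   # (transcript_text, behavior) -> rating

        def rate_transcript(transcript_text: str,
                            judges: List[JudgeFn]) -> Dict[str, float]:
            """Average each behavior's rating across the Judge LLMs."""
            return {
                behavior: mean(judge(transcript_text, behavior) for judge in judges)
                for behavior in SELF_REFERENTIAL + RELATIONAL
            }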

    In summary, the framework offers a multi-turn assessment of anthropomorphic behaviors in conversational AI that improves on single-turn approaches. The results identified relationship-building behaviors that evolve as a dialogue progresses. As a baseline for subsequent research, the framework can inform AI development by helping developers recognize when anthropomorphic behaviors occur and how they affect users. Future work can make the assessment methods more precise, strengthen the metrics, and formalize the analysis, leading to more transparent and ethically sound AI systems.


    Check out the Paper. All credit for this research goes to the researchers of this project.

    The post How AI Chatbots Mimic Human Behavior: Insights from Multi-Turn Evaluations of LLMs appeared first on MarkTechPost.