Steerability and Bias in LLMs: Navigating Multifaceted Persona Representation

LLMs need to generate text that reflects the diverse views of multifaceted personas. Prior studies of bias in LLMs have focused on simple, one-dimensional personas or on multiple-choice formats. However, many applications require LLMs to generate open-ended text for complex personas. Steering LLMs to represent such personas accurately is critical to avoiding oversimplified or biased portrayals. If LLMs fail to capture the nuanced views of complex personas, they risk perpetuating stereotypes and monolithic perspectives, especially when a persona's views don't align with those typical of its demographic. This could introduce new biases into simulations of individuals.

Carnegie Mellon University researchers define an incongruous persona as one in which one trait makes the other traits less likely in human data, such as a political liberal who supports military spending. They find that LLMs are 9.7% less steerable towards such personas than towards congruous ones, often reverting to stereotypical views. Models fine-tuned with RLHF are more steerable but produce less diverse views. Steerability in multiple-choice tasks does not predict steerability in open-ended generation. GPT-4's assessments of steerability closely match human evaluations. These findings highlight the need to improve LLMs' steerability towards diverse personas and their ability to generate nuanced human opinions.
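The congruity definition can be made concrete by comparing how often a stance occurs within a demographic against how often it occurs among all respondents. The sketch below is one plausible formalization, assuming a persona counts as incongruous when P(stance | demographic) < P(stance); it is not necessarily the paper's exact procedure, and `SurveyCounts`, `congruity_ratio`, `is_incongruous`, and the numbers are hypothetical illustrations rather than Pew data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SurveyCounts:
    """Hypothetical survey tallies for one (demographic, stance) pair."""
    demo_agree: int   # respondents in the demographic who hold the stance
    demo_total: int   # all respondents in the demographic
    all_agree: int    # respondents overall who hold the stance
    all_total: int    # all respondents


def congruity_ratio(c: SurveyCounts) -> float:
    """P(stance | demographic) / P(stance).

    Values below 1 mean the demographic trait makes the stance less likely
    (an assumed operationalization of "incongruous").
    """
    p_given_demo = c.demo_agree / c.demo_total
    p_overall = c.all_agree / c.all_total
    return p_given_demo / p_overall


def is_incongruous(c: SurveyCounts) -> bool:
    return congruity_ratio(c) < 1.0


# Made-up numbers for illustration only: 15% of the demographic vs 30% overall.
liberal_military = SurveyCounts(demo_agree=30, demo_total=200,
                                all_agree=180, all_total=600)
print(round(congruity_ratio(liberal_military), 2), is_incongruous(liberal_military))
# prints: 0.5 True
```

Using a ratio rather than a raw difference keeps rare and common stances on a comparable scale; a real analysis would also need a threshold or significance test rather than a bare `< 1.0` cut-off.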

Recent research on persona-steered generation has expanded on previous frameworks by focusing on the steerability and congruity of multifaceted personas in LLMs, considering the effects of model scale and fine-tuning. Studies have used LLMs to simulate human behavior and to evaluate model-generated statements, noting that RLHF can amplify political biases. Concerns about toxic outputs in persona-steered generation have also been raised. Evaluations of LLM biases show significant variance in model accuracy and in alignment with human opinions, particularly in open-ended tasks. Recent work highlights the challenges of reliably simulating diverse personas and the importance of model alignment for downstream tasks.

To assess the steerability of LLMs towards various personas, the researchers built multifaceted personas that combine a demographic with a stance, using data from the Pew Research Center. Incongruous personas were identified as those in which the demographic trait decreases the likelihood of holding the stance. Models of different sizes and fine-tuning methods were prompted to generate statements aligned with these personas. GPT-4 then evaluated steerability by checking whether each generated statement expressed the given stance. Additional metrics (individuation, exaggeration, entailment diversity, and semantic diversity) were measured to further analyze the characteristics and diversity of the generated statements.
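The steerability score and one of the diversity metrics described above can be sketched as follows. This is a minimal illustration under stated assumptions: steerability is taken to be the fraction of statements the evaluator (GPT-4 in the study) judged to match the stance, with those verdicts supplied as booleans, and semantic diversity is assumed to be the mean pairwise cosine distance between statement embeddings, a common choice that may differ from the paper's definition. Entailment diversity would additionally require a natural-language-inference model and is not sketched. All function names and data here are hypothetical.

```python
import math
from itertools import combinations


def steerability(judgments: list[bool]) -> float:
    """Fraction of generated statements judged to express the target stance.

    The judge's verdicts (e.g. from an LLM evaluator) are assumed as input.
    """
    return sum(judgments) / len(judgments)


def cosine_distance(u: list[float], v: list[float]) -> float:
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def semantic_diversity(embeddings: list[list[float]]) -> float:
    """Mean pairwise cosine distance; higher means more varied statements."""
    pairs = list(combinations(embeddings, 2))
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


# Hypothetical judge verdicts for a congruous and an incongruous persona.
congruous = [True, True, True, False]
incongruous = [True, False, False, False]
gap = steerability(congruous) - steerability(incongruous)
print(gap)  # prints: 0.5

# Toy 2-D embeddings; real use would take vectors from a sentence encoder.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
print(semantic_diversity(toy_embeddings))
```

The toy example shows why diversity matters alongside steerability: two of the three statements are identical in embedding space, which pulls the mean pairwise distance down even if every statement matches the stance.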

GPT-4's steerability assessments correlate strongly with human evaluations. Models fine-tuned with RLHF and DPO are generally more steerable, especially towards stances associated with women and political liberals. However, models struggle with incongruous personas, showing significant differences in steerability. Survey response rates were a better predictor of steerability. Models are biased towards generating the stances most common for a demographic, which leads to less diversity and more stereotyping. This can perpetuate social polarization and limit models' ability to represent complex social identities, potentially causing representational harm.
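Agreement between an automatic judge and human annotators, as described above, is typically summarized with a correlation coefficient. The sketch below computes Pearson's r on hypothetical per-statement ratings; the summary does not state which agreement statistic the study used, so Pearson is an assumption chosen for simplicity, and `pearson`, `gpt4_scores`, and `human_scores` are illustrative names and data.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equal-length, non-constant score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical ratings: 1 = statement matches the stance, 0 = it does not.
gpt4_scores = [1, 1, 0, 1, 0, 0, 1, 1]
human_scores = [1, 1, 0, 1, 0, 1, 1, 1]
print(round(pearson(gpt4_scores, human_scores), 3))
```

With binary labels, Pearson's r reduces to the phi coefficient; a rank correlation or Cohen's kappa would be reasonable alternatives depending on how the ratings are scaled.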

In conclusion, the study examines how effectively LLMs can be steered to generate persona-specific statements, revealing that models are more easily steered towards congruous personas across stances on politics, race, and gender. Models fine-tuned with RLHF show higher steerability, particularly for stances linked to political liberals or women, but at the cost of diversity. Their sensitivity to persona congruity suggests that models may still propagate demographic stereotypes. Future research should investigate LLM behavior in more interactive settings and develop richer, multifaceted persona representations to better understand and mitigate these biases.

Check out the Paper. All credit for this research goes to the researchers of this project.

The post Steerability and Bias in LLMs: Navigating Multifaceted Persona Representation appeared first on MarkTechPost.
