Steerability and Bias in LLMs: Navigating Multifaceted Persona Representation

LLMs need to generate text reflecting the diverse views of multifaceted personas. Prior studies on bias in LLMs have focused on simplistic, one-dimensional personas or multiple-choice formats. However, many applications require LLMs to generate open-ended text based on complex personas. The ability to steer LLMs to represent these multifaceted personas accurately is critical to avoid oversimplified or biased representations. If LLMs fail to capture the nuanced views of complex personas, they risk perpetuating stereotypes and monolithic perspectives, especially when personas don't align with typical demographic views. This could introduce new biases in simulations of individuals.

Carnegie Mellon University researchers define an incongruous persona as one where a trait makes other traits less likely in human data, such as political liberals supporting military spending. LLMs are 9.7% less steerable towards such personas than congruous ones, often reverting to stereotypical views. Models fine-tuned with RLHF are more steerable but show reduced view diversity. Steerability in multiple-choice tasks does not predict open-ended steerability. GPT-4's steerability judgments closely match human evaluations. These findings highlight the need for LLMs that can be steered toward diverse personas and can generate nuanced human opinions.

Recent research on persona-steered generation has expanded on previous frameworks by focusing on the steerability and congruity of multifaceted personas in LLMs, considering the effects of model scale and fine-tuning. Studies have used LLMs to simulate human behavior and evaluate model-generated statements, noting that RLHF can amplify political biases. Concerns about toxic outputs in persona-steered generation have also been raised. Evaluations of LLM biases show significant variance in model accuracy and alignment with human opinions, particularly in open-ended tasks. Recent work highlights the challenges in reliably simulating diverse personas and the importance of model alignment for downstream tasks.

To assess the steerability of LLMs towards various personas, the researchers created multifaceted personas combining a demographic and a stance, using data from the Pew Research Center. A persona was identified as incongruous when its demographic trait decreases the likelihood of holding its stance. Models of different sizes and with different fine-tuning methods were tested by generating statements that align with these personas. GPT-4 evaluated steerability by comparing generated statements against the given stances. Additional metrics such as individuation, exaggeration, entailment diversity, and semantic diversity were also measured to analyze the characteristics and diversity of model-generated statements.
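The setup above can be illustrated with a minimal Python sketch. It is not the researchers' code: the `Persona` class, the `is_incongruous` rule (stance rate within the group lower than the overall rate), the `steerability` score (fraction of generated statements an evaluator judged to match the target stance), and all numbers are assumptions made up for illustration, not Pew data or results from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Persona:
    """A multifaceted persona: one demographic trait plus one stance."""
    demographic: str
    stance: str


def is_incongruous(rate_in_group: float, rate_overall: float) -> bool:
    """Assumed rule: the persona is incongruous when belonging to the
    group makes the stance less likely than in the overall population."""
    return rate_in_group < rate_overall


def steerability(judgments: list[bool]) -> float:
    """Assumed score: share of generated statements that an evaluator
    (GPT-4 in the study) judged to match the target stance."""
    if not judgments:
        raise ValueError("need at least one judgment")
    return sum(judgments) / len(judgments)


# Made-up survey rates for illustration only.
stance = "supports increased military spending"
overall_support = 0.50
group_support = {"political liberal": 0.30, "political conservative": 0.70}

incongruous = {
    Persona(group, stance): is_incongruous(rate, overall_support)
    for group, rate in group_support.items()
}

# Made-up evaluator verdicts, one per generated statement.
judged = {
    Persona("political liberal", stance): [True, False, True, False],
    Persona("political conservative", stance): [True, True, True, False],
}
scores = {persona: steerability(verdicts) for persona, verdicts in judged.items()}

# Gap between the congruous and the incongruous persona in this toy data.
gap = (
    scores[Persona("political conservative", stance)]
    - scores[Persona("political liberal", stance)]
)
print(incongruous)
print(scores, gap)
```

In this toy data the congruous persona scores higher than the incongruous one, which is the kind of gap the study measures; the actual evaluation relies on GPT-4 judgments over many generated statements per persona.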

GPT-4 aligns closely with human evaluations, showing a strong correlation between its steerability assessments and human judgments. Models fine-tuned with RLHF and DPO are generally more steerable, especially towards stances associated with women and political liberals. However, models struggle with incongruous personas, showing significant steerability differences. Steerability could be predicted better by survey response rates. Models are biased toward generating common stances for a demographic, leading to less diversity and more stereotypes. This can perpetuate social polarization and limit models' ability to represent complex social identities, potentially causing representational harm.

In conclusion, the study explores how effectively LLMs can be guided to generate persona-specific statements, revealing that models are more easily steered towards congruous personas across various stances on politics, race, and gender. Models fine-tuned with RLHF show higher steerability, particularly for stances linked to political liberals or women, though at the cost of diversity. Sensitivity to persona congruity suggests models may still propagate demographic stereotypes. Future research should investigate LLM behavior in more interactive settings and develop complex, multifaceted representations to better understand and mitigate these biases.

Check out the Paper. All credit for this research goes to the researchers of this project.

The post Steerability and Bias in LLMs: Navigating Multifaceted Persona Representation appeared first on MarkTechPost.
