What does AI consider praiseworthy?
- URL: http://arxiv.org/abs/2412.09630v2
- Date: Mon, 24 Feb 2025 16:35:22 GMT
- Title: What does AI consider praiseworthy?
- Authors: Andrew J. Peterson,
- Abstract summary: We investigate large language models' implicit and explicit moral views.<n>We find that trustworthiness is a stronger driver of praise and critique than ideology.<n>We conclude that as AI systems become more integrated into society, their patterns of praise, critique, and neutrality must be carefully monitored.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As large language models (LLMs) are increasingly used for work, personal, and therapeutic purposes, researchers have begun to investigate these models' implicit and explicit moral views. Previous work, however, focuses on asking LLMs to state opinions, or on other technical evaluations that do not reflect common user interactions. We propose a novel evaluation of LLM behavior that analyzes responses to user-stated intentions, such as "I'm thinking of campaigning for {candidate}." LLMs frequently respond with critiques or praise, often beginning responses with phrases such as "That's great to hear!..." While this makes them friendly, these praise responses are not universal and thus reflect a normative stance by the LLM. We map out the moral landscape of LLMs in how they respond to user statements in different domains including politics and everyday ethical actions. In particular, although a na\"ive analysis might suggest LLMs are biased against right-leaning politics, our findings on news sources indicate that trustworthiness is a stronger driver of praise and critique than ideology. Second, we find strong alignment across models in response to ethically-relevant action statements, but that doing so requires them to engage in high levels of praise and critique of users, suggesting a reticence-alignment tradeoff. Finally, our experiment on statements about world leaders finds no evidence of bias favoring the country of origin of the models. We conclude that as AI systems become more integrated into society, their patterns of praise, critique, and neutrality must be carefully monitored to prevent unintended psychological and societal consequences.
Related papers
- Through the LLM Looking Glass: A Socratic Self-Assessment of Donkeys, Elephants, and Markets [42.55423041662188]
The study aims to directly measure the models' biases rather than relying on external interpretations.
Our results reveal a consistent preference of Democratic over Republican positions across all models.
biases vary among Western LLMs, while those developed in China lean more strongly toward socialism.
arXiv Detail & Related papers (2025-03-20T19:40:40Z) - Normative Evaluation of Large Language Models with Everyday Moral Dilemmas [0.0]
We evaluate large language models (LLMs) on complex, everyday moral dilemmas sourced from the "Am I the Asshole" (AITA) community on Reddit.
Our results demonstrate that large language models exhibit distinct patterns of moral judgment, varying substantially from human evaluations on the AITA subreddit.
arXiv Detail & Related papers (2025-01-30T01:29:46Z) - Persuasion with Large Language Models: a Survey [49.86930318312291]
Large Language Models (LLMs) have created new disruptive possibilities for persuasive communication.
In areas such as politics, marketing, public health, e-commerce, and charitable giving, such LLM Systems have already achieved human-level or even super-human persuasiveness.
Our survey suggests that the current and future potential of LLM-based persuasion poses profound ethical and societal risks.
arXiv Detail & Related papers (2024-11-11T10:05:52Z) - Large Language Models Reflect the Ideology of their Creators [71.65505524599888]
Large language models (LLMs) are trained on vast amounts of data to generate natural language.
This paper shows that the ideological stance of an LLM appears to reflect the worldview of its creators.
arXiv Detail & Related papers (2024-10-24T04:02:30Z) - Bias in the Mirror: Are LLMs opinions robust to their own adversarial attacks ? [22.0383367888756]
Large language models (LLMs) inherit biases from their training data and alignment processes, influencing their responses in subtle ways.
We introduce a novel approach where two instances of an LLM engage in self-debate, arguing opposing viewpoints to persuade a neutral version of the model.
We evaluate how firmly biases hold and whether models are susceptible to reinforcing misinformation or shifting to harmful viewpoints.
arXiv Detail & Related papers (2024-10-17T13:06:02Z) - Evaluating Implicit Bias in Large Language Models by Attacking From a Psychometric Perspective [66.34066553400108]
We conduct a rigorous evaluation of Large Language Models' implicit bias towards certain groups by attacking them with carefully crafted instructions to elicit biased responses.
We propose three attack approaches, i.e., Disguise, Deception, and Teaching, based on which we built evaluation datasets for four common bias types.
arXiv Detail & Related papers (2024-06-20T06:42:08Z) - Whose Side Are You On? Investigating the Political Stance of Large Language Models [56.883423489203786]
We investigate the political orientation of Large Language Models (LLMs) across a spectrum of eight polarizing topics.
Our investigation delves into the political alignment of LLMs across a spectrum of eight polarizing topics, spanning from abortion to LGBTQ issues.
The findings suggest that users should be mindful when crafting queries, and exercise caution in selecting neutral prompt language.
arXiv Detail & Related papers (2024-03-15T04:02:24Z) - Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models [61.45529177682614]
We challenge the prevailing constrained evaluation paradigm for values and opinions in large language models.
We show that models give substantively different answers when not forced.
We distill these findings into recommendations and open challenges in evaluating values and opinions in LLMs.
arXiv Detail & Related papers (2024-02-26T18:00:49Z) - Exploring Value Biases: How LLMs Deviate Towards the Ideal [57.99044181599786]
Large-Language-Models (LLMs) are deployed in a wide range of applications, and their response has an increasing social impact.
We show that value bias is strong in LLMs across different categories, similar to the results found in human studies.
arXiv Detail & Related papers (2024-02-16T18:28:43Z) - Exploring the Jungle of Bias: Political Bias Attribution in Language Models via Dependency Analysis [86.49858739347412]
Large Language Models (LLMs) have sparked intense debate regarding the prevalence of bias in these models and its mitigation.
We propose a prompt-based method for the extraction of confounding and mediating attributes which contribute to the decision process.
We find that the observed disparate treatment can at least in part be attributed to confounding and mitigating attributes and model misalignment.
arXiv Detail & Related papers (2023-11-15T00:02:25Z) - Do LLMs exhibit human-like response biases? A case study in survey
design [66.1850490474361]
We investigate the extent to which large language models (LLMs) reflect human response biases, if at all.
We design a dataset and framework to evaluate whether LLMs exhibit human-like response biases in survey questionnaires.
Our comprehensive evaluation of nine models shows that popular open and commercial LLMs generally fail to reflect human-like behavior.
arXiv Detail & Related papers (2023-11-07T15:40:43Z) - Aligning Language Models to User Opinions [10.953326025836475]
We find that the opinions of a user and their demographics and ideologies are not mutual predictors.
We use this insight to align LLMs by modeling both user opinions as well as user demographics and ideology.
In addition to the typical approach of prompting LLMs with demographics and ideology, we discover that utilizing the most relevant past opinions from individual users enables the model to predict user opinions more accurately.
arXiv Detail & Related papers (2023-05-24T09:11:11Z) - Whose Opinions Do Language Models Reflect? [88.35520051971538]
We investigate the opinions reflected by language models (LMs) by leveraging high-quality public opinion polls and their associated human responses.
We find substantial misalignment between the views reflected by current LMs and those of US demographic groups.
Our analysis confirms prior observations about the left-leaning tendencies of some human feedback-tuned LMs.
arXiv Detail & Related papers (2023-03-30T17:17:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.