Revealing Fine-Grained Values and Opinions in Large Language Models
- URL: http://arxiv.org/abs/2406.19238v1
- Date: Thu, 27 Jun 2024 15:01:53 GMT
- Title: Revealing Fine-Grained Values and Opinions in Large Language Models
- Authors: Dustin Wright, Arnav Arora, Nadav Borenstein, Srishti Yadav, Serge Belongie, Isabelle Augenstein
- Abstract summary: We analyse 156k large language model (LLM) responses to the 62 propositions of the Political Compass Test (PCT).
For fine-grained analysis, we propose to identify tropes in the responses: semantically similar phrases that are recurrent and consistent across different prompts.
We find that demographic features added to prompts significantly affect outcomes on the PCT, reflecting bias, and that results differ between closed-form and open-domain elicitation.
- Score: 42.48316407080442
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Uncovering latent values and opinions in large language models (LLMs) can help identify biases and mitigate potential harm. Recently, this has been approached by presenting LLMs with survey questions and quantifying their stances towards morally and politically charged statements. However, the stances generated by LLMs can vary greatly depending on how they are prompted, and there are many ways to argue for or against a given position. In this work, we propose to address this by analysing a large and robust dataset of 156k LLM responses to the 62 propositions of the Political Compass Test (PCT) generated by 6 LLMs using 420 prompt variations. We perform coarse-grained analysis of their generated stances and fine-grained analysis of the plain text justifications for those stances. For fine-grained analysis, we propose to identify tropes in the responses: semantically similar phrases that are recurrent and consistent across different prompts, revealing patterns in the text that a given LLM is prone to produce. We find that demographic features added to prompts significantly affect outcomes on the PCT, reflecting bias, as well as disparities between the results of tests when eliciting closed-form vs. open domain responses. Additionally, patterns in the plain text rationales via tropes show that similar justifications are repeatedly generated across models and prompts even with disparate stances.
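As an illustration of the prompting setup described in the abstract, the sketch below (Python, not the authors' code) shows how prompt variants of this kind might be assembled: each PCT proposition is paired with an optional demographic persona and either a closed-form or open-ended instruction. The proposition text, persona strings, and instruction wording are illustrative assumptions, not the paper's actual templates.

```python
# Hypothetical sketch of prompt-variant construction; all wording below is an
# illustrative assumption, not the templates or propositions from the paper.
from itertools import product

propositions = [
    "The rich are too highly taxed.",  # example PCT-style proposition
]
personas = [None, "a 45-year-old teacher from Texas", "a student from Berlin"]
instructions = {
    "closed": "Respond with exactly one of: Strongly disagree, Disagree, Agree, Strongly agree.",
    "open": "Explain your view on this statement in a short paragraph.",
}


def build_prompt(proposition, persona, instruction):
    """Assemble one prompt variant; the persona clause is omitted in the baseline case."""
    persona_part = f"You are {persona}. " if persona else ""
    return f"{persona_part}Statement: {proposition}\n{instruction}"


prompts = [
    build_prompt(prop, persona, text)
    for prop, persona, text in product(propositions, personas, instructions.values())
]
print(f"Generated {len(prompts)} prompt variants")  # 1 proposition x 3 personas x 2 formats = 6
```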
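As an illustration of the trope identification described in the abstract, the following minimal sketch assumes an embedding-and-clustering pipeline using the sentence-transformers and scikit-learn packages; the model name, distance threshold, and recurrence cutoff are illustrative choices, not values reported in the paper.

```python
# Hypothetical sketch of surfacing recurrent "tropes": embed justification
# sentences, cluster semantically similar ones, and keep clusters that recur
# across several distinct prompts. All thresholds are illustrative assumptions.
from collections import defaultdict

from sentence_transformers import SentenceTransformer
from sklearn.cluster import AgglomerativeClustering  # requires scikit-learn >= 1.2 for `metric`

# (prompt_id, sentence) pairs drawn from plain-text justifications.
justifications = [
    ("prompt_001", "Access to healthcare is a basic human right."),
    ("prompt_087", "Healthcare access is a fundamental right for everyone."),
    ("prompt_215", "Markets allocate resources more efficiently than governments."),
]

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode([s for _, s in justifications], normalize_embeddings=True)

# Group sentences whose pairwise cosine distance falls below the threshold.
clustering = AgglomerativeClustering(
    n_clusters=None,
    metric="cosine",
    linkage="average",
    distance_threshold=0.35,
)
labels = clustering.fit_predict(embeddings)

# Treat a cluster as a candidate trope if its members span multiple prompts.
clusters = defaultdict(set)
for (prompt_id, _), label in zip(justifications, labels):
    clusters[label].add(prompt_id)

MIN_PROMPTS = 2  # illustrative recurrence cutoff
tropes = [label for label, prompts in clusters.items() if len(prompts) >= MIN_PROMPTS]
print(f"{len(tropes)} candidate trope cluster(s)")
```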
Related papers
- Passing the Turing Test in Political Discourse: Fine-Tuning LLMs to Mimic Polarized Social Media Comments [0.0]
This study explores the extent to which fine-tuned large language models (LLMs) can replicate and amplify polarizing discourse.
Using a curated dataset of politically charged discussions extracted from Reddit, we fine-tune an open-source LLM to produce context-aware and ideologically aligned responses.
The results indicate that, when trained on partisan data, LLMs are capable of producing highly plausible and provocative comments, often indistinguishable from those written by humans.
arXiv Detail & Related papers (2025-06-17T15:41:26Z)
- Self-reflective Uncertainties: Do LLMs Know Their Internal Answer Distribution? [3.9003806149601234]
SelfReflect is a metric to assess how faithfully a string summarizes an LLM's internal answer distribution.
We show that SelfReflect is able to discriminate even subtle differences of candidate summary strings and that it aligns with human judgement.
arXiv Detail & Related papers (2025-05-26T17:59:53Z)
- Analyzing Political Bias in LLMs via Target-Oriented Sentiment Classification [4.352835414206441]
Political biases encoded by LLMs might have detrimental effects on downstream applications.
We propose a new approach leveraging the observation that LLM sentiment predictions vary with the target entity in the same sentence.
We insert 1319 demographically and politically diverse politician names in 450 political sentences and predict target-oriented sentiment using seven models in six widely spoken languages.
arXiv Detail & Related papers (2025-05-26T10:01:24Z)
- Large Language Models Still Exhibit Bias in Long Text [14.338308312117901]
We introduce the Long Text Fairness Test (LTF-TEST), a framework that evaluates biases in large language models.
By assessing both model responses and the reasoning behind them, LTF-TEST uncovers subtle biases that are difficult to detect in simple responses.
We propose FT-REGARD, a finetuning approach that pairs biased prompts with neutral responses.
arXiv Detail & Related papers (2024-10-23T02:51:33Z)
- Order Matters in Hallucination: Reasoning Order as Benchmark and Reflexive Prompting for Large-Language-Models [0.0]
Large language models (LLMs) have generated significant attention since their inception, finding applications across various academic and industrial domains.
LLMs often suffer from the "hallucination problem", where outputs, though grammatically and logically coherent, lack factual accuracy or are entirely fabricated.
arXiv Detail & Related papers (2024-08-09T14:34:32Z)
- From Distributional to Overton Pluralism: Investigating Large Language Model Alignment [82.99849359892112]
We re-examine previously reported reductions in response diversity post-alignment.
Our analysis suggests that an apparent drop in the diversity of responses is largely explained by quality control and information aggregation.
Findings indicate that current alignment techniques capture but do not extend the useful subset of assistant-like base LLM behavior.
arXiv Detail & Related papers (2024-06-25T16:32:33Z)
- Do Large Language Models Exhibit Cognitive Dissonance? Studying the Difference Between Revealed Beliefs and Stated Answers [13.644277507363036]
We investigate whether these abilities are measurable outside of tailored prompting and multiple-choice questions (MCQs).
Our findings suggest that the Revealed Belief of LLMs significantly differs from their Stated Answer.
As text completion is at the core of LLMs, these results suggest that common evaluation methods may only provide a partial picture.
arXiv Detail & Related papers (2024-06-21T08:56:35Z)
- Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models [61.45529177682614]
We challenge the prevailing constrained evaluation paradigm for values and opinions in large language models.
We show that models give substantively different answers when not forced.
We distill these findings into recommendations and open challenges in evaluating values and opinions in LLMs.
arXiv Detail & Related papers (2024-02-26T18:00:49Z)
- What Evidence Do Language Models Find Convincing? [94.90663008214918]
We build a dataset that pairs controversial queries with a series of real-world evidence documents that contain different facts.
We use this dataset to perform sensitivity and counterfactual analyses to explore which text features most affect LLM predictions.
Overall, we find that current models rely heavily on the relevance of a website to the query, while largely ignoring stylistic features that humans find important.
arXiv Detail & Related papers (2024-02-19T02:15:34Z)
- Sentiment Analysis through LLM Negotiations [58.67939611291001]
A standard paradigm for sentiment analysis is to rely on a single LLM and make the decision in a single round.
This paper introduces a multi-LLM negotiation framework for sentiment analysis.
arXiv Detail & Related papers (2023-11-03T12:35:29Z)
- "I'd Like to Have an Argument, Please": Argumentative Reasoning in Large Language Models [0.0]
We evaluate the ability of two large language models (LLMs) to perform argumentative reasoning.
We find that, scoring-wise, the LLMs match or surpass the state of the art in argument mining (AM) and argument pair extraction (APE).
However, statistical analysis of the LLMs' outputs when subjected to small, yet still human-readable, alterations in the I/O representations showed that the models are not performing reasoning.
arXiv Detail & Related papers (2023-09-29T02:41:38Z)
- Statistical Knowledge Assessment for Large Language Models [79.07989821512128]
Given varying prompts regarding a factoid question, can a large language model (LLM) reliably generate factually correct answers?
We propose KaRR, a statistical approach to assess factual knowledge for LLMs.
Our results reveal that the knowledge in LLMs with the same backbone architecture adheres to the scaling law, while tuning on instruction-following data sometimes compromises the model's capability to generate factually correct text reliably.
arXiv Detail & Related papers (2023-05-17T18:54:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.