How do Large Language Models Navigate Conflicts between Honesty and
Helpfulness?
- URL: http://arxiv.org/abs/2402.07282v2
- Date: Tue, 13 Feb 2024 14:21:02 GMT
- Title: How do Large Language Models Navigate Conflicts between Honesty and
Helpfulness?
- Authors: Ryan Liu, Theodore R. Sumers, Ishita Dasgupta, Thomas L. Griffiths
- Abstract summary: We use psychological models and experiments designed to characterize human behavior to analyze large language models.
We find that reinforcement learning from human feedback improves both honesty and helpfulness.
GPT-4 Turbo demonstrates human-like response patterns including sensitivity to the conversational framing and listener's decision context.
- Score: 14.706111954807021
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In day-to-day communication, people often approximate the truth - for
example, rounding the time or omitting details - in order to be maximally
helpful to the listener. How do large language models (LLMs) handle such
nuanced trade-offs? To address this question, we use psychological models and
experiments designed to characterize human behavior to analyze LLMs. We test a
range of LLMs and explore how optimization for human preferences or
inference-time reasoning affects these trade-offs. We find that reinforcement
learning from human feedback improves both honesty and helpfulness, while
chain-of-thought prompting skews LLMs towards helpfulness over honesty.
Finally, GPT-4 Turbo demonstrates human-like response patterns including
sensitivity to the conversational framing and listener's decision context. Our
findings reveal the conversational values internalized by LLMs and suggest that
even these abstract values can, to a degree, be steered by zero-shot prompting.
Related papers
- Rel-A.I.: An Interaction-Centered Approach To Measuring Human-LM Reliance [73.19687314438133]
We study how reliance is affected by contextual features of an interaction.
We find that contextual characteristics significantly affect human reliance behavior.
Our results show that calibration and language quality alone are insufficient in evaluating the risks of human-LM interactions.
arXiv Detail & Related papers (2024-07-10T18:00:05Z) - Can LLMs Understand the Implication of Emphasized Sentences in Dialogue? [64.72966061510375]
Emphasis is a crucial component in human communication, which indicates the speaker's intention and implication beyond pure text in dialogue.
This paper introduces Emphasized-Talk, a benchmark with emphasis-annotated dialogue samples capturing the implications of emphasis.
We evaluate various Large Language Models (LLMs), both open-source and commercial, to measure their performance in understanding emphasis.
arXiv Detail & Related papers (2024-06-16T20:41:44Z) - Large Language Models are as persuasive as humans, but how? About the cognitive effort and moral-emotional language of LLM arguments [0.0]
Large Language Models (LLMs) are already as persuasive as humans.
This paper investigates the persuasion strategies of LLMs, comparing them with human-generated arguments.
arXiv Detail & Related papers (2024-04-14T19:01:20Z) - DRESS: Instructing Large Vision-Language Models to Align and Interact with Humans via Natural Language Feedback [61.28463542324576]
We present DRESS, a large vision language model (LVLM) that innovatively exploits Natural Language feedback (NLF) from Large Language Models.
We propose a novel categorization of the NLF into two key types: critique and refinement.
Our experimental results demonstrate that DRESS can generate more helpful (9.76%), honest (11.52%), and harmless (21.03%) responses.
arXiv Detail & Related papers (2023-11-16T18:37:29Z) - Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations [70.7884839812069]
Large language models (LLMs) have emerged as powerful and general solutions to many natural language tasks.
However, many of the most important applications of language generation are interactive, where an agent has to talk to a person to reach a desired outcome.
In this work, we explore a new method for adapting LLMs with RL for such goal-directed dialogue.
arXiv Detail & Related papers (2023-11-09T18:45:16Z) - Do LLMs exhibit human-like response biases? A case study in survey
design [66.1850490474361]
We investigate the extent to which large language models (LLMs) reflect human response biases, if at all.
We design a dataset and framework to evaluate whether LLMs exhibit human-like response biases in survey questionnaires.
Our comprehensive evaluation of nine models shows that popular open and commercial LLMs generally fail to reflect human-like behavior.
arXiv Detail & Related papers (2023-11-07T15:40:43Z) - Verbosity Bias in Preference Labeling by Large Language Models [10.242500241407466]
We examine the biases that come along with evaluating Large Language Models (LLMs)
We take a closer look into verbosity bias -- a bias where LLMs sometimes prefer more verbose answers even if they have similar qualities.
arXiv Detail & Related papers (2023-10-16T05:19:02Z) - Are Large Language Models Really Robust to Word-Level Perturbations? [68.60618778027694]
We propose a novel rational evaluation approach that leverages pre-trained reward models as diagnostic tools.
Longer conversations manifest the comprehensive grasp of language models in terms of their proficiency in understanding questions.
Our results demonstrate that LLMs frequently exhibit vulnerability to word-level perturbations that are commonplace in daily language usage.
arXiv Detail & Related papers (2023-09-20T09:23:46Z) - The Goldilocks of Pragmatic Understanding: Fine-Tuning Strategy Matters
for Implicature Resolution by LLMs [26.118193748582197]
We evaluate four categories of widely used state-of-the-art models.
We find that, despite only evaluating on utterances that require a binary inference, models in three of these categories perform close to random.
These results suggest that certain fine-tuning strategies are far better at inducing pragmatic understanding in models.
arXiv Detail & Related papers (2022-10-26T19:04:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.