DecipherPref: Analyzing Influential Factors in Human Preference
Judgments via GPT-4
- URL: http://arxiv.org/abs/2305.14702v3
- Date: Sat, 28 Oct 2023 01:03:15 GMT
- Title: DecipherPref: Analyzing Influential Factors in Human Preference
Judgments via GPT-4
- Authors: Yebowen Hu, Kaiqiang Song, Sangwoo Cho, Xiaoyang Wang, Hassan Foroosh,
Fei Liu
- Abstract summary: We conduct an in-depth examination of a collection of pairwise human judgments released by OpenAI.
We find that the most favored factors vary across tasks and genres, whereas the least favored factors tend to be consistent.
Our findings have implications for the construction of balanced datasets in human preference evaluations.
- Score: 28.661237196238996
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human preference judgments are pivotal in guiding large language models
(LLMs) to produce outputs that align with human values. Human evaluations are
also used in summarization tasks to compare outputs from various systems,
complementing existing automatic metrics. Despite their significance, however,
there has been limited research probing these pairwise or $k$-wise comparisons.
The collective impact and relative importance of factors such as output length,
informativeness, fluency, and factual consistency are still not well
understood. It is also unclear if there are other hidden factors influencing
human judgments. In this paper, we conduct an in-depth examination of a
collection of pairwise human judgments released by OpenAI. Utilizing the
Bradley-Terry-Luce (BTL) model, we reveal the inherent preferences embedded in
these human judgments. We find that the most favored factors vary across tasks
and genres, whereas the least favored factors tend to be consistent, e.g.,
outputs are too brief, contain excessive off-focus content or hallucinated
facts. Our findings have implications for the construction of balanced datasets
in human preference evaluations, which is a crucial step in shaping the
behaviors of future LLMs.
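The Bradley-Terry-Luce model referenced above assigns each compared item (or contributing factor) a latent utility $\theta_i$ and models the probability that $i$ is preferred over $j$ as $\exp(\theta_i) / (\exp(\theta_i) + \exp(\theta_j))$. The sketch below is not the authors' code; the factor labels and comparison outcomes are made up for illustration, and it only shows how such utilities can be recovered from pairwise judgments by maximum likelihood.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical factor labels and toy pairwise outcomes; the actual study fits
# the model to factors annotated on OpenAI's released human judgments.
factors = ["informativeness", "fluency", "factual consistency",
           "brevity", "off-focus content"]
# Each tuple is (winner_index, loser_index) for one pairwise judgment.
comparisons = [(0, 3), (0, 4), (1, 3), (2, 4), (2, 3), (1, 4), (0, 4)]

def neg_log_likelihood(theta):
    # BTL: P(i preferred over j) = exp(theta_i) / (exp(theta_i) + exp(theta_j))
    nll = 0.0
    for w, l in comparisons:
        nll -= theta[w] - np.logaddexp(theta[w], theta[l])
    # A tiny L2 penalty pins down the arbitrary additive offset of theta.
    return nll + 1e-3 * np.sum(theta ** 2)

result = minimize(neg_log_likelihood, np.zeros(len(factors)), method="BFGS")
for name, score in sorted(zip(factors, result.x), key=lambda t: -t[1]):
    print(f"{name}: {score:.3f}")
```

Ranking the fitted utilities gives the relative strength of each factor implied by the judgments, which is the kind of analysis the paper performs at scale.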
Related papers
- Practical Guide for Causal Pathways and Sub-group Disparity Analysis [1.8974791957167259]
We use causal disparity analysis to quantify and examine the causal interplay between sensitive attributes and outcomes.
Our two-step investigation focuses on datasets where race serves as the sensitive attribute.
We demonstrate that the sub-groups identified by our approach as being most affected by disparities are also the ones with the largest ML classification errors.
arXiv Detail & Related papers (2024-07-02T22:51:01Z)
- VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models [57.43276586087863]
Large Vision-Language Models (LVLMs) suffer from hallucination issues, wherein the models generate plausible-sounding but factually incorrect outputs.
Existing benchmarks are often limited in scope, focusing mainly on object hallucinations.
We introduce a multi-dimensional benchmark covering objects, attributes, and relations, with challenging images selected based on associative biases.
arXiv Detail & Related papers (2024-04-22T04:49:22Z)
- Decoding Susceptibility: Modeling Misbelief to Misinformation Through a Computational Approach [63.67533153887132]
Susceptibility to misinformation describes the degree of belief in unverifiable claims, a quantity that is not directly observable.
Existing susceptibility studies heavily rely on self-reported beliefs.
We propose a computational approach to model users' latent susceptibility levels.
arXiv Detail & Related papers (2023-11-16T07:22:56Z)
- Exploring the Jungle of Bias: Political Bias Attribution in Language Models via Dependency Analysis [86.49858739347412]
Large Language Models (LLMs) have sparked intense debate regarding the prevalence of bias in these models and its mitigation.
We propose a prompt-based method for the extraction of confounding and mediating attributes which contribute to the decision process.
We find that the observed disparate treatment can at least in part be attributed to confounding and mediating attributes and model misalignment.
arXiv Detail & Related papers (2023-11-15T00:02:25Z)
- AlignDiff: Aligning Diverse Human Preferences via Behavior-Customisable Diffusion Model [69.12623428463573]
AlignDiff is a novel framework that quantifies human preferences, including their abstractness, and uses them to guide diffusion planning.
It can accurately match user-customized behaviors and efficiently switch from one to another.
We demonstrate its superior performance on preference matching, switching, and covering compared to other baselines.
arXiv Detail & Related papers (2023-10-03T13:53:08Z)
- Human Feedback is not Gold Standard [28.63384327791185]
We critically analyse the use of human feedback for both training and evaluation.
We find that while preference scores have fairly good coverage, they under-represent important aspects like factuality.
arXiv Detail & Related papers (2023-09-28T11:18:20Z)
- Using Natural Language Explanations to Rescale Human Judgments [81.66697572357477]
We propose a method to rescale ordinal annotations and explanations using large language models (LLMs).
We feed annotators' Likert ratings and corresponding explanations into an LLM and prompt it to produce a numeric score anchored in a scoring rubric.
Our method rescales the raw judgments without impacting agreement and brings the scores closer to human judgments grounded in the same scoring rubric (a minimal prompt sketch appears after this list).
arXiv Detail & Related papers (2023-05-24T06:19:14Z)
- Perspectives on Large Language Models for Relevance Judgment [56.935731584323996]
Large language models (LLMs) claim that they can assist with relevance judgments.
It is not clear whether automated judgments can reliably be used in evaluations of retrieval systems.
arXiv Detail & Related papers (2023-04-13T13:08:38Z)
- Generalizing Fairness: Discovery and Mitigation of Unknown Sensitive Attributes [5.665283675533071]
This paper investigates methods that separate out individual semantic sensitive factors from a given dataset to conduct this characterization.
We also broaden fairness remediation, which normally addresses only socially relevant factors, to cover the desensitization of AI more generally.
In experiments using the road sign (GTSRB) and facial imagery (CelebA) datasets, we show the promise of using this scheme.
arXiv Detail & Related papers (2021-07-28T20:18:08Z)
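As a concrete illustration of the rescaling approach summarized under "Using Natural Language Explanations to Rescale Human Judgments" above, the sketch below shows one way to prompt an LLM with a Likert rating, its free-text explanation, and a scoring rubric. The rubric text, model name, and API call are illustrative assumptions, not the paper's released code.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical rubric for a single quality dimension (factual consistency).
RUBRIC = """Score 0-100 for factual consistency:
90-100: fully supported by the source; 50-89: minor unsupported details;
0-49: major hallucinations or contradictions."""

def rescale(likert_rating: int, explanation: str, model: str = "gpt-4") -> str:
    # Combine the annotator's ordinal rating and explanation with the rubric,
    # and ask the LLM for a rubric-anchored numeric score.
    prompt = (
        f"Rubric:\n{RUBRIC}\n\n"
        f"An annotator gave a Likert rating of {likert_rating}/5 and explained: "
        f"\"{explanation}\"\n"
        "Return a single integer score between 0 and 100 anchored in the rubric."
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content.strip()
```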
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.