Verbosity Bias in Preference Labeling by Large Language Models
- URL: http://arxiv.org/abs/2310.10076v1
- Date: Mon, 16 Oct 2023 05:19:02 GMT
- Title: Verbosity Bias in Preference Labeling by Large Language Models
- Authors: Keita Saito, Akifumi Wachi, Koki Wataoka, Youhei Akimoto
- Abstract summary: We examine the biases that come along with evaluating Large Language Models (LLMs)
We take a closer look into verbosity bias -- a bias where LLMs sometimes prefer more verbose answers even if they have similar qualities.
- Score: 10.242500241407466
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, Large Language Models (LLMs) have witnessed a remarkable
surge in prevalence, altering the landscape of natural language processing and
machine learning. One key factor in improving the performance of LLMs is
alignment with humans achieved with Reinforcement Learning from Human Feedback
(RLHF), as for many LLMs such as GPT-4, Bard, etc. In addition, recent studies
are investigating the replacement of human feedback with feedback from other
LLMs named Reinforcement Learning from AI Feedback (RLAIF). We examine the
biases that come along with evaluating LLMs with other LLMs and take a closer
look into verbosity bias -- a bias where LLMs sometimes prefer more verbose
answers even if they have similar qualities. We see that in our problem
setting, GPT-4 prefers longer answers more than humans. We also propose a
metric to measure this bias.
Related papers
- Reinforcement Learning from Multi-role Debates as Feedback for Bias Mitigation in LLMs [6.090496490133132]
We find that involving LLMs in role-playing scenario boosts their ability to recognize and mitigate biases.
We propose Reinforcement Learning from Multi-role Debates as Feedback (RLDF), a novel approach for bias mitigation replacing human feedback.
arXiv Detail & Related papers (2024-04-15T22:18:50Z) - Pride and Prejudice: LLM Amplifies Self-Bias in Self-Refinement [75.7148545929689]
Large language models (LLMs) improve their performance through self-feedback on certain tasks while degrade on others.
We formally define LLM's self-bias - the tendency to favor its own generation.
We analyze six LLMs on translation, constrained text generation, and mathematical reasoning tasks.
arXiv Detail & Related papers (2024-02-18T03:10:39Z) - I Learn Better If You Speak My Language: Understanding the Superior Performance of Fine-Tuning Large Language Models with LLM-Generated Responses [23.053791342294268]
fine-tuning a large language model (LLM) with responses generated by a LLM often yields better results than using responses generated by humans.
Training with LLM-generated responses not only enhances performance but also helps maintain the model's capabilities in other tasks after fine-tuning on a specific task.
arXiv Detail & Related papers (2024-02-17T05:05:31Z) - Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering.
The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored.
We propose a framework that enhances the reliability of LLMs as it: 1) generalizes out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
arXiv Detail & Related papers (2023-12-26T07:24:46Z) - The ART of LLM Refinement: Ask, Refine, and Trust [85.75059530612882]
We propose a reasoning with refinement objective called ART: Ask, Refine, and Trust.
It asks necessary questions to decide when an LLM should refine its output.
It achieves a performance gain of +5 points over self-refinement baselines.
arXiv Detail & Related papers (2023-11-14T07:26:32Z) - Psychometric Predictive Power of Large Language Models [32.31556074470733]
We find that instruction tuning does not always make large language models human-like from a cognitive modeling perspective.
Next-word probabilities estimated by instruction-tuned LLMs are often worse at simulating human reading behavior than those estimated by base LLMs.
arXiv Detail & Related papers (2023-11-13T17:19:14Z) - Large Language Models are biased to overestimate profoundness [0.0]
This study evaluates GPT-4 and various other large language models (LLMs) in judging the profoundness of mundane, motivational, and pseudo-profound statements.
We found a significant statement-to-statement correlation between the LLMs and humans, irrespective of the type of statements and the prompting technique used.
arXiv Detail & Related papers (2023-10-22T21:33:50Z) - Evaluating Large Language Models at Evaluating Instruction Following [54.49567482594617]
We introduce a challenging meta-evaluation benchmark, LLMBar, designed to test the ability of an LLM evaluator in discerning instruction-following outputs.
We discover that different evaluators exhibit distinct performance on LLMBar and even the highest-scoring ones have substantial room for improvement.
arXiv Detail & Related papers (2023-10-11T16:38:11Z) - Are Large Language Models Really Robust to Word-Level Perturbations? [68.60618778027694]
We propose a novel rational evaluation approach that leverages pre-trained reward models as diagnostic tools.
Longer conversations manifest the comprehensive grasp of language models in terms of their proficiency in understanding questions.
Our results demonstrate that LLMs frequently exhibit vulnerability to word-level perturbations that are commonplace in daily language usage.
arXiv Detail & Related papers (2023-09-20T09:23:46Z) - Self-Refine: Iterative Refinement with Self-Feedback [62.78755306241981]
Self-Refine is an approach for improving initial outputs from large language models (LLMs) through iterative feedback and refinement.
We evaluate Self-Refine across 7 diverse tasks, ranging from dialog response generation to mathematical reasoning, using state-of-the-art (GPT-3.5, ChatGPT, and GPT-4) LLMs.
Our work demonstrates that even state-of-the-art LLMs like GPT-4 can be further improved at test time using our simple, standalone approach.
arXiv Detail & Related papers (2023-03-30T18:30:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.