Varying Shades of Wrong: Aligning LLMs with Wrong Answers Only
- URL: http://arxiv.org/abs/2410.11055v1
- Date: Mon, 14 Oct 2024 20:01:52 GMT
- Title: Varying Shades of Wrong: Aligning LLMs with Wrong Answers Only
- Authors: Jihan Yao, Wenxuan Ding, Shangbin Feng, Lucy Lu Wang, Yulia Tsvetkov
- Abstract summary: We use methods based on self-consistency, token probabilities, and LLM-as-a-judge to elicit wrong-over-wrong preferences.
Experiments show that LLMs do have a preliminary capability for distinguishing various shades of wrong, achieving up to 20.9% higher performance than random guessing.
- Score: 37.36302216137465
- Abstract: In the absence of abundant reliable annotations for challenging tasks and contexts, how can we expand the frontier of LLM capabilities with potentially wrong answers? We focus on two research questions: (1) Can LLMs generate reliable preferences among wrong options? And if so, (2) Would alignment with such wrong-over-wrong preferences be helpful? We employ methods based on self-consistency, token probabilities, and LLM-as-a-judge to elicit wrong-over-wrong preferences, and fine-tune language models with preference optimization approaches using these synthesized preferences. Extensive experiments with seven LLMs and eight datasets demonstrate that (1) LLMs do have preliminary capability in distinguishing various shades of wrong, achieving up to 20.9% higher performance than random guessing; (2) Alignment with wrong-over-wrong preferences helps LLMs to produce less wrong and sometimes even outright correct answers, while overall improving model calibration.
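A minimal sketch of how such wrong-over-wrong preferences might be elicited with self-consistency, assuming a hypothetical `generate(prompt)` sampling call; exact-match answer comparison is a simplification of the paper's actual setup:

```python
# Sketch: self-consistency-based wrong-over-wrong preference elicitation.
# `generate(prompt)` is a hypothetical function returning one sampled answer.
from collections import Counter

def self_consistency_preference(question, wrong_a, wrong_b, generate, n_samples=20):
    """Prefer whichever wrong answer the model converges to more often,
    treating sample frequency as a proxy for 'less wrong'."""
    counts = Counter(generate(question) for _ in range(n_samples))
    score_a, score_b = counts[wrong_a], counts[wrong_b]
    if score_a == score_b:
        return None  # no reliable preference; skip this pair
    chosen, rejected = (wrong_a, wrong_b) if score_a > score_b else (wrong_b, wrong_a)
    # The resulting pair can be fed to a preference-optimization trainer
    # (e.g., DPO) as one synthesized wrong-over-wrong training example.
    return {"prompt": question, "chosen": chosen, "rejected": rejected}
```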
Related papers
- LLMs can implicitly learn from mistakes in-context [15.818061010632249]
We investigate whether Large Language Models (LLMs) can learn from mistakes in mathematical reasoning tasks when explanations are not provided.
Surprisingly, we find that LLMs perform better, on average, when rationales are eliminated from the context.
This approach also substantially outperforms chain-of-thought prompting in our evaluations.
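A hypothetical sketch of the prompting setup described here: in-context examples carry only questions and incorrect answers, with rationales deliberately omitted (the example pair below is illustrative, not from the paper):

```python
# Sketch: few-shot prompt showing mistakes without explanations.
def build_mistake_prompt(examples, query):
    """examples: list of (question, wrong_answer) pairs; no rationales included."""
    parts = [f"Question: {q}\nIncorrect answer: {wrong}" for q, wrong in examples]
    parts.append(f"Question: {query}\nCorrect answer:")
    return "\n\n".join(parts)

prompt = build_mistake_prompt(
    [("What is 12 * 13?", "146")],  # 156 is correct; 146 is the shown mistake
    "What is 14 * 15?",
)
```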
arXiv Detail & Related papers (2025-02-12T16:31:21Z)
- Understanding the Dark Side of LLMs' Intrinsic Self-Correction [55.51468462722138]
Intrinsic self-correction was proposed to improve LLMs' responses via feedback prompts solely based on their inherent capability.
Recent works show that LLMs' intrinsic self-correction fails without oracle labels as feedback prompts.
We identify that intrinsic self-correction can cause LLMs to waver over both intermediate and final answers, and can introduce prompt bias on simple factual questions.
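A minimal intrinsic self-correction loop of the kind studied here, assuming a hypothetical `query_llm(prompt)` chat call and a generic feedback prompt; no oracle label is used, and comparing the returned answers across rounds exposes the wavering behavior:

```python
# Sketch: intrinsic self-correction without oracle feedback.
FEEDBACK = "Review your previous answer and find problems with it. Then answer again."

def intrinsic_self_correct(question, query_llm, rounds=2):
    answers = [query_llm(question)]
    for _ in range(rounds):
        followup = f"{question}\nYour answer: {answers[-1]}\n{FEEDBACK}"
        answers.append(query_llm(followup))
    return answers  # inspect the list for wavering across rounds
```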
arXiv Detail & Related papers (2024-12-19T15:39:31Z)
- Style Outweighs Substance: Failure Modes of LLM Judges in Alignment Benchmarking [56.275521022148794]
Post-training methods claim superior alignment by virtue of better correspondence with human pairwise preferences.
We attempt to answer the question -- do LLM-judge preferences translate to progress on other, more concrete metrics for alignment, and if not, why not?
We find that (1) LLM-judge preferences do not correlate with concrete measures of safety, world knowledge, and instruction following; (2) LLM judges have powerful implicit biases, prioritizing style over factuality and safety; and (3) the supervised fine-tuning stage of post-training, and not the preference optimization (PO) stage, has the greatest impact on alignment.
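As an illustration of this style of analysis (the data below is made up, not from the paper), one might correlate LLM-judge win rates with a concrete alignment metric across models:

```python
# Sketch: does the judge's ranking track a concrete benchmark? Illustrative data.
from scipy.stats import spearmanr

judge_win_rate = [0.62, 0.55, 0.71, 0.48]   # pairwise LLM-judge preferences per model
safety_score   = [0.70, 0.74, 0.52, 0.69]   # concrete safety benchmark per model

rho, p = spearmanr(judge_win_rate, safety_score)
print(f"Spearman rho={rho:.2f} (p={p:.2f})")  # weak or negative rho => judge diverges from the metric
```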
arXiv Detail & Related papers (2024-09-23T17:58:07Z)
- Order Matters in Hallucination: Reasoning Order as Benchmark and Reflexive Prompting for Large-Language-Models [0.0]
Large language models (LLMs) have attracted significant attention since their inception, finding applications across various academic and industrial domains.
LLMs often suffer from the "hallucination problem", where outputs, though grammatically and logically coherent, lack factual accuracy or are entirely fabricated.
arXiv Detail & Related papers (2024-08-09T14:34:32Z)
- SH2: Self-Highlighted Hesitation Helps You Decode More Truthfully [9.796854466591942]
We propose an inference-time method, Self-Highlighted Hesitation (SH2), to help large language models decode more truthfully.
Experimental results demonstrate that our SH2 can effectively help LLMs elicit factual knowledge and distinguish hallucinated contexts.
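A rough sketch of the idea as stated in the abstract, assuming a hypothetical `token_logprobs(text)` helper that returns (token, logprob) pairs from the model; the real SH2 additionally uses contrastive decoding, which is omitted here:

```python
# Sketch: surface the least-confident ("hesitation") tokens back to the model.
def highlight_hesitations(text, token_logprobs, k=5):
    scored = token_logprobs(text)
    hardest = sorted(scored, key=lambda t: t[1])[:k]  # k lowest-probability tokens
    keys = " ".join(tok for tok, _ in hardest)
    # Prepend the hesitation tokens so decoding attends to them again.
    return f"Key tokens: {keys}\n{text}"
```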
arXiv Detail & Related papers (2024-01-11T14:09:09Z)
- LLMRefine: Pinpointing and Refining Large Language Models via Fine-Grained Actionable Feedback [65.84061725174269]
Recent large language models (LLMs) leverage human feedback to improve their generation quality.
We propose LLMRefine, an inference-time optimization method to refine an LLM's output.
We conduct experiments on three text generation tasks, including machine translation, long-form question answering (QA), and topical summarization.
LLMRefine consistently outperforms all baseline approaches, achieving improvements of up to 1.7 MetricX points on translation tasks, 8.1 ROUGE-L on ASQA, and 2.2 ROUGE-L on topical summarization.
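A minimal refinement loop in the spirit of LLMRefine, with hypothetical `get_feedback` and `revise` model calls; the paper's simulated-annealing acceptance step is omitted for brevity:

```python
# Sketch: iterative output refinement driven by fine-grained feedback.
def refine(initial_output, get_feedback, revise, max_iters=3):
    output = initial_output
    for _ in range(max_iters):
        feedback = get_feedback(output)  # e.g., located error spans with severities
        if not feedback:                 # no remaining errors: stop early
            break
        output = revise(output, feedback)
    return output
```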
arXiv Detail & Related papers (2023-11-15T19:52:11Z)
- LLMs cannot find reasoning errors, but can correct them given the error location [0.9017736137562115]
Poor self-correction performance stems from LLMs' inability to find logical mistakes, rather than from any inability to correct a known mistake.
We benchmark several state-of-the-art LLMs on their mistake-finding ability and demonstrate that they generally struggle with the task.
We show that it is possible to obtain mistake location information without ground truth labels or in-domain training data.
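A sketch of correction given a known error location, assuming a hypothetical `regenerate(question, prefix)` call that continues a chain of thought from the given prefix and returns the new steps as a list:

```python
# Sketch: backtrack to the first wrong step and regenerate from there.
def correct_from_location(question, steps, mistake_idx, regenerate):
    prefix = steps[:mistake_idx]          # keep only the steps before the mistake
    return prefix + regenerate(question, prefix)
```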
arXiv Detail & Related papers (2023-11-14T20:12:38Z)
- Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves [57.974103113675795]
We present a method named 'Rephrase and Respond' (RaR), which allows Large Language Models to rephrase and expand questions posed by humans.
RaR serves as a simple yet effective prompting method for improving performance.
We show that RaR is complementary to the popular Chain-of-Thought (CoT) methods, both theoretically and empirically.
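The one-step variant can be sketched as a plain prompt wrapper; `query_llm` is a hypothetical chat call, and the suffix below follows the instruction published in the paper:

```python
# Sketch: one-step RaR prompting.
RAR_SUFFIX = "Rephrase and expand the question, and respond."

def rephrase_and_respond(question, query_llm):
    return query_llm(f'"{question}"\n{RAR_SUFFIX}')
```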
arXiv Detail & Related papers (2023-11-07T18:43:34Z)
- Learning From Mistakes Makes LLM Better Reasoner [106.48571828587728]
Large language models (LLMs) recently exhibited remarkable reasoning capabilities in solving math problems.
This work explores whether LLMs can LEarn from MistAkes (LEMA), akin to the human learning process.
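A hedged sketch of LEMA-style data construction, assuming hypothetical `solve`, `corrector`, and `is_correct` callables: collect a model's wrong reasoning paths, have a stronger model correct them, and keep the pairs for fine-tuning:

```python
# Sketch: build mistake-correction pairs for fine-tuning.
def build_lema_pairs(problems, solve, corrector, is_correct):
    pairs = []
    for problem in problems:
        attempt = solve(problem)
        if is_correct(problem, attempt):
            continue  # only mistaken attempts are useful here
        fixed = corrector(problem, attempt)  # identify, explain, and correct the error
        pairs.append({"prompt": problem, "mistake": attempt, "correction": fixed})
    return pairs  # fine-tune the base model on these correction traces
```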
arXiv Detail & Related papers (2023-10-31T17:52:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.