Varying Shades of Wrong: Aligning LLMs with Wrong Answers Only
- URL: http://arxiv.org/abs/2410.11055v1
- Date: Mon, 14 Oct 2024 20:01:52 GMT
- Title: Varying Shades of Wrong: Aligning LLMs with Wrong Answers Only
- Authors: Jihan Yao, Wenxuan Ding, Shangbin Feng, Lucy Lu Wang, Yulia Tsvetkov
- Abstract summary: We use methods based on self-consistency, token probabilities, and LLM-as-a-judge to elicit wrong-over-wrong preferences.
Experiments show that LLMs do have a preliminary capability for distinguishing various shades of wrong, achieving up to 20.9% higher performance than random guessing.
- Score: 37.36302216137465
- Abstract: In the absence of abundant reliable annotations for challenging tasks and contexts, how can we expand the frontier of LLM capabilities with potentially wrong answers? We focus on two research questions: (1) Can LLMs generate reliable preferences among wrong options? And if so, (2) Would alignment with such wrong-over-wrong preferences be helpful? We employ methods based on self-consistency, token probabilities, and LLM-as-a-judge to elicit wrong-over-wrong preferences, and fine-tune language models with preference optimization approaches using these synthesized preferences. Extensive experiments with seven LLMs and eight datasets demonstrate that (1) LLMs do have preliminary capability in distinguishing various shades of wrong, achieving up to 20.9% higher performance than random guessing; (2) Alignment with wrong-over-wrong preferences helps LLMs to produce less wrong and sometimes even outright correct answers, while overall improving model calibration.
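A minimal sketch of how such wrong-over-wrong preferences might be elicited with self-consistency, assuming a hypothetical `generate(prompt)` sampling call; exact-match answer comparison is a simplification of the paper's actual setup:

```python
# Sketch: self-consistency-based wrong-over-wrong preference elicitation.
# `generate(prompt)` is a hypothetical function returning one sampled answer.
from collections import Counter

def self_consistency_preference(question, wrong_a, wrong_b, generate, n_samples=20):
    """Prefer whichever wrong answer the model converges to more often,
    treating sample frequency as a proxy for 'less wrong'."""
    counts = Counter(generate(question) for _ in range(n_samples))
    score_a, score_b = counts[wrong_a], counts[wrong_b]
    if score_a == score_b:
        return None  # no reliable preference; skip this pair
    chosen, rejected = (wrong_a, wrong_b) if score_a > score_b else (wrong_b, wrong_a)
    # The resulting pair can be fed to a preference-optimization trainer
    # (e.g., DPO) as one synthesized wrong-over-wrong training example.
    return {"prompt": question, "chosen": chosen, "rejected": rejected}
```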
Related papers
- LLMs can implicitly learn from mistakes in-context [15.818061010632249]
We investigate whether Large Language Models (LLMs) can learn from mistakes in mathematical reasoning tasks when explanations are not provided.
Surprisingly, we find that LLMs perform better, on average, when rationales are eliminated from the context.
This approach also substantially outperforms chain-of-thought prompting in our evaluations.
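A hypothetical sketch of the prompting setup described here: in-context examples carry only questions and incorrect answers, with rationales deliberately omitted (the example pair below is illustrative, not from the paper):

```python
# Sketch: few-shot prompt showing mistakes without explanations.
def build_mistake_prompt(examples, query):
    """examples: list of (question, wrong_answer) pairs; no rationales included."""
    parts = [f"Question: {q}\nIncorrect answer: {wrong}" for q, wrong in examples]
    parts.append(f"Question: {query}\nCorrect answer:")
    return "\n\n".join(parts)

prompt = build_mistake_prompt(
    [("What is 12 * 13?", "146")],  # 156 is correct; 146 is the shown mistake
    "What is 14 * 15?",
)
```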
arXiv Detail & Related papers (2025-02-12T16:31:21Z)
- Understanding the Dark Side of LLMs' Intrinsic Self-Correction [55.51468462722138]
Intrinsic self-correction was proposed to improve LLMs' responses via feedback prompts solely based on their inherent capability.
Recent works show that LLMs' intrinsic self-correction fails without oracle labels as feedback prompts.
We identify that intrinsic self-correction can cause LLMs to waver over both intermediate and final answers, and can introduce prompt bias on simple factual questions.
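A minimal intrinsic self-correction loop of the kind studied here, assuming a hypothetical `query_llm(prompt)` chat call and a generic feedback prompt; no oracle label is used, and comparing the returned answers across rounds exposes the wavering behavior:

```python
# Sketch: intrinsic self-correction without oracle feedback.
FEEDBACK = "Review your previous answer and find problems with it. Then answer again."

def intrinsic_self_correct(question, query_llm, rounds=2):
    answers = [query_llm(question)]
    for _ in range(rounds):
        followup = f"{question}\nYour answer: {answers[-1]}\n{FEEDBACK}"
        answers.append(query_llm(followup))
    return answers  # inspect the list for wavering across rounds
```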
arXiv Detail & Related papers (2024-12-19T15:39:31Z)
- Style Outweighs Substance: Failure Modes of LLM Judges in Alignment Benchmarking [56.275521022148794]
Post-training methods claim superior alignment by virtue of better correspondence with human pairwise preferences.
We attempt to answer the question -- do LLM-judge preferences translate to progress on other, more concrete metrics for alignment, and if not, why not?
We find that (1) LLM-judge preferences do not correlate with concrete measures of safety, world knowledge, and instruction following; (2) LLM judges have powerful implicit biases, prioritizing style over factuality and safety; and (3) the supervised fine-tuning stage of post-training, and not the preference optimization (PO) stage, has the greatest impact on alignment.
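As an illustration of this style of analysis (the data below is made up, not from the paper), one might correlate LLM-judge win rates with a concrete alignment metric across models:

```python
# Sketch: does the judge's ranking track a concrete benchmark? Illustrative data.
from scipy.stats import spearmanr

judge_win_rate = [0.62, 0.55, 0.71, 0.48]   # pairwise LLM-judge preferences per model
safety_score   = [0.70, 0.74, 0.52, 0.69]   # concrete safety benchmark per model

rho, p = spearmanr(judge_win_rate, safety_score)
print(f"Spearman rho={rho:.2f} (p={p:.2f})")  # weak or negative rho => judge diverges from the metric
```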
arXiv Detail & Related papers (2024-09-23T17:58:07Z)
- Order Matters in Hallucination: Reasoning Order as Benchmark and Reflexive Prompting for Large-Language-Models [0.0]
Large language models (LLMs) have attracted significant attention since their inception, finding applications across various academic and industrial domains.
LLMs often suffer from the "hallucination problem", where outputs, though grammatically and logically coherent, lack factual accuracy or are entirely fabricated.
arXiv Detail & Related papers (2024-08-09T14:34:32Z)
- SH2: Self-Highlighted Hesitation Helps You Decode More Truthfully [9.796854466591942]
We propose an inference-time method, Self-Highlighted Hesitation (SH2), to help large language models decode more truthfully.
Experimental results demonstrate that our SH2 can effectively help LLMs elicit factual knowledge and distinguish hallucinated contexts.
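A rough sketch of the idea as stated in the abstract, assuming a hypothetical `token_logprobs(text)` helper that returns (token, logprob) pairs from the model; the real SH2 additionally uses contrastive decoding, which is omitted here:

```python
# Sketch: surface the least-confident ("hesitation") tokens back to the model.
def highlight_hesitations(text, token_logprobs, k=5):
    scored = token_logprobs(text)
    hardest = sorted(scored, key=lambda t: t[1])[:k]  # k lowest-probability tokens
    keys = " ".join(tok for tok, _ in hardest)
    # Prepend the hesitation tokens so decoding attends to them again.
    return f"Key tokens: {keys}\n{text}"
```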
arXiv Detail & Related papers (2024-01-11T14:09:09Z)
- LLMRefine: Pinpointing and Refining Large Language Models via Fine-Grained Actionable Feedback [65.84061725174269]
Recent large language models (LLMs) leverage human feedback to improve their generation quality.
We propose LLMRefine, an inference-time optimization method to refine an LLM's output.
We conduct experiments on three text generation tasks, including machine translation, long-form question answering (QA), and topical summarization.
LLMRefine consistently outperforms all baseline approaches, achieving improvements of up to 1.7 MetricX points on translation tasks, 8.1 ROUGE-L on ASQA, and 2.2 ROUGE-L on topical summarization.
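A minimal refinement loop in the spirit of LLMRefine, with hypothetical `get_feedback` and `revise` model calls; the paper's simulated-annealing acceptance step is omitted for brevity:

```python
# Sketch: iterative output refinement driven by fine-grained feedback.
def refine(initial_output, get_feedback, revise, max_iters=3):
    output = initial_output
    for _ in range(max_iters):
        feedback = get_feedback(output)  # e.g., located error spans with severities
        if not feedback:                 # no remaining errors: stop early
            break
        output = revise(output, feedback)
    return output
```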
arXiv Detail & Related papers (2023-11-15T19:52:11Z)
- LLMs cannot find reasoning errors, but can correct them given the error location [0.9017736137562115]
Poor self-correction performance stems from LLMs' inability to find logical mistakes, rather than from any inability to correct a known mistake.
We benchmark several state-of-the-art LLMs on their mistake-finding ability and demonstrate that they generally struggle with the task.
We show that it is possible to obtain mistake location information without ground truth labels or in-domain training data.
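A sketch of correction given a known error location, assuming a hypothetical `regenerate(question, prefix)` call that continues a chain of thought from the given prefix and returns the new steps as a list:

```python
# Sketch: backtrack to the first wrong step and regenerate from there.
def correct_from_location(question, steps, mistake_idx, regenerate):
    prefix = steps[:mistake_idx]          # keep only the steps before the mistake
    return prefix + regenerate(question, prefix)
```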
arXiv Detail & Related papers (2023-11-14T20:12:38Z)
- Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves [57.974103113675795]
We present a method named 'Rephrase and Respond' (RaR), which allows Large Language Models to rephrase and expand questions posed by humans.
RaR serves as a simple yet effective prompting method for improving performance.
We show that RaR is complementary to the popular Chain-of-Thought (CoT) methods, both theoretically and empirically.
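The one-step variant can be sketched as a plain prompt wrapper; `query_llm` is a hypothetical chat call, and the suffix below follows the instruction published in the paper:

```python
# Sketch: one-step RaR prompting.
RAR_SUFFIX = "Rephrase and expand the question, and respond."

def rephrase_and_respond(question, query_llm):
    return query_llm(f'"{question}"\n{RAR_SUFFIX}')
```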
arXiv Detail & Related papers (2023-11-07T18:43:34Z)
- Learning From Mistakes Makes LLM Better Reasoner [106.48571828587728]
Large language models (LLMs) recently exhibited remarkable reasoning capabilities in solving math problems.
This work explores whether LLMs can LEarn from MistAkes (LEMA), akin to the human learning process.
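A hedged sketch of LEMA-style data construction, assuming hypothetical `solve`, `corrector`, and `is_correct` callables: collect a model's wrong reasoning paths, have a stronger model correct them, and keep the pairs for fine-tuning:

```python
# Sketch: build mistake-correction pairs for fine-tuning.
def build_lema_pairs(problems, solve, corrector, is_correct):
    pairs = []
    for problem in problems:
        attempt = solve(problem)
        if is_correct(problem, attempt):
            continue  # only mistaken attempts are useful here
        fixed = corrector(problem, attempt)  # identify, explain, and correct the error
        pairs.append({"prompt": problem, "mistake": attempt, "correction": fixed})
    return pairs  # fine-tune the base model on these correction traces
```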
arXiv Detail & Related papers (2023-10-31T17:52:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.