An Analysis and Mitigation of the Reversal Curse
- URL: http://arxiv.org/abs/2311.07468v3
- Date: Sun, 10 Nov 2024 10:24:33 GMT
- Title: An Analysis and Mitigation of the Reversal Curse
- Authors: Ang Lv, Kaiyi Zhang, Shufang Xie, Quan Tu, Yuhan Chen, Ji-Rong Wen, Rui Yan
- Abstract summary: Recent research observed a noteworthy phenomenon in large language models (LLMs), referred to as the ``reversal curse.''
The reversal curse is that when dealing with two entities, $a$ and $b$, connected by a relation $R$ and its inverse $R^{-1}$, LLMs excel in handling sequences of the form ``$aRb$,'' but encounter challenges when processing ``$bR^{-1}a$.''
- Score: 70.13419502543915
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent research observed a noteworthy phenomenon in large language models (LLMs), referred to as the ``reversal curse.'' The reversal curse is that when dealing with two entities, denoted as $a$ and $b$, connected by their relation $R$ and its inverse $R^{-1}$, LLMs excel in handling sequences in the form of ``$aRb$,'' but encounter challenges when processing ``$bR^{-1}a$,'' whether in generation or comprehension. For instance, GPT-4 can accurately respond to the query ``Tom Cruise's mother is?'' with ``Mary Lee Pfeiffer,'' but it struggles to provide a satisfactory answer when asked ``Mary Lee Pfeiffer's son is?'' In this paper, we undertake the first-ever study of how the reversal curse happens in LLMs. Our investigations reveal that the reversal curse can stem from the specific training objectives, which become particularly evident in the widespread use of next-token prediction within most causal language models. We hope this initial investigation can draw more attention to the reversal curse, as well as other underlying limitations in current LLMs.
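To make the failure mode concrete, below is a minimal sketch (a toy illustration, not the paper's code): a count-based model stands in for a causal LLM trained with next-token prediction, and the tokens mother_is and son_is are hypothetical. Training on the forward fact supervises $p(b \mid aR)$ but never the reversed query $p(a \mid bR^{-1})$.

```python
from collections import defaultdict

class CountCausalLM:
    """Toy causal LM: estimates P(next token | prefix) from raw counts."""
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, corpus):
        for sentence in corpus:
            tokens = sentence.split()
            for i in range(1, len(tokens)):
                # Next-token prediction conditions only on the LEFT context.
                self.counts[tuple(tokens[:i])][tokens[i]] += 1

    def predict(self, prompt):
        successors = self.counts.get(tuple(prompt.split()))
        return max(successors, key=successors.get) if successors else "<no answer>"

lm = CountCausalLM()
lm.train(["TomCruise mother_is MaryLeePfeiffer"])   # forward direction only

print(lm.predict("TomCruise mother_is"))            # -> MaryLeePfeiffer
print(lm.predict("MaryLeePfeiffer son_is"))         # -> <no answer>
```

No statistic for the reversed direction is ever collected, mirroring how the next-token loss provides no gradient signal for ``$bR^{-1}a$.''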
Related papers
- Enough Coin Flips Can Make LLMs Act Bayesian [71.79085204454039]
Large language models (LLMs) exhibit the ability to generalize from few-shot examples in their input prompt, an emergent capability known as in-context learning (ICL).
We investigate whether LLMs utilize ICL to perform structured reasoning in ways that are consistent with a Bayesian framework or rely on pattern matching.
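For intuition, here is a minimal sketch (my illustration, not the paper's code) of the Bayesian reference point such an analysis compares against: a Beta-Bernoulli posterior over a coin's bias, updated from in-context flips.

```python
def beta_posterior_mean(flips, alpha=1.0, beta=1.0):
    """Posterior mean of P(heads) under a Beta(alpha, beta) prior;
    flips is a list of 0/1 outcomes observed in context."""
    heads = sum(flips)
    tails = len(flips) - heads
    return (alpha + heads) / (alpha + beta + heads + tails)

# 7 heads in 10 flips: (1 + 7) / (2 + 10) = 0.666...
print(beta_posterior_mean([1, 1, 1, 0, 1, 1, 0, 1, 0, 1]))
```

An LLM whose ICL behavior is Bayesian should track this posterior as flips accumulate; a pure pattern matcher need not.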
arXiv Detail & Related papers (2025-03-06T18:59:23Z)
- Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval [55.63711219190506]
Large language models (LLMs) often struggle to pose the right search queries.
We introduce $\underline{Le}$arning to $\underline{Re}$trieve by $\underline{T}$rying (LeReT).
LeReT can improve the absolute retrieval accuracy by up to 29% and the downstream generator evaluations by 17%.
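In sketch form (stand-in functions, not the authors' code), the try-then-learn loop the summary describes: sample diverse candidate queries, grade each by retrieval quality, and keep preference pairs for optimization (e.g., DPO).

```python
import random

def sample_queries(question, n=4):
    """Stand-in for few-shot LLM sampling of diverse query candidates."""
    return [f"{question} (variant {i})" for i in range(n)]

def retrieval_reward(query):
    """Stand-in for a grader, e.g., recall of gold passages among results."""
    return random.random()

def preference_pairs(question):
    """Rank the tried queries by reward; pair best vs. worst for tuning."""
    ranked = sorted(sample_queries(question), key=retrieval_reward)
    return [(ranked[-1], ranked[0])]  # (preferred, rejected)

print(preference_pairs("Who was Tom Cruise's mother?"))
```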
arXiv Detail & Related papers (2024-10-30T17:02:54Z)
- Utilize the Flow before Stepping into the Same River Twice: Certainty Represented Knowledge Flow for Refusal-Aware Instruction Tuning [68.57166425493283]
Refusal-Aware Instruction Tuning (RAIT) enables Large Language Models (LLMs) to refuse to answer unknown questions.
RAIT modifies training samples based on the correctness of the initial LLM's response.
This crude approach can cause LLMs to excessively refuse to answer questions they could have answered correctly.
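A rough sketch of that correctness-based relabeling (my reading of the summary, not the authors' code): when the initial model answers a question incorrectly, its supervised target becomes a refusal, which is exactly what makes the crude version over-refuse borderline questions.

```python
REFUSAL = "I don't know."

def build_rait_dataset(samples, model_answer):
    """samples: list of (question, gold_answer); model_answer: callable
    returning the initial LLM's answer to a question."""
    dataset = []
    for question, gold in samples:
        prediction = model_answer(question)
        target = gold if prediction == gold else REFUSAL  # relabel on error
        dataset.append((question, target))
    return dataset

samples = [("Who wrote Hamlet?", "Shakespeare")]
print(build_rait_dataset(samples, model_answer=lambda q: "Marlowe"))
# -> [('Who wrote Hamlet?', "I don't know.")]
```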
arXiv Detail & Related papers (2024-10-09T14:12:51Z)
- Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics [45.69328374321502]
Auto-regressive large language models (LLMs) show an impressive capacity to solve many complex reasoning tasks.
Yet, trained on a sentence of the form ``$A \to B$,'' LLMs fail to conclude ``$B \gets A$'' during inference, even though the two sentences are semantically identical.
We theoretically analyze the reversal curse via the training dynamics of gradient descent for two auto-regressive models.
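In outline (a standard way to phrase the asymmetry, not the paper's derivation), the objective itself is one-directional:

```latex
% Next-token training on the forward sentence minimizes only
\mathcal{L}(\theta) = -\log p_\theta(B \mid A, R),
% so gradient descent never touches the reversed conditional
% p_\theta(A \mid B, R^{-1}), which stays near its initialization.
```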
arXiv Detail & Related papers (2024-05-07T21:03:51Z)
- Mitigating Reversal Curse in Large Language Models via Semantic-aware Permutation Training [57.771940716189114]
We show that large language models (LLMs) suffer from the ``reversal curse.''
The root cause of the reversal curse lies in the different word order between the training and inference stages.
We propose Semantic-aware Permutation Training (SPT) to address this issue.
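A rough sketch of the idea (my simplification: the paper segments with an assistant LLM, here replaced by a naive split on `` is ''): emit permuted segment orders so reversed word orders are seen by the next-token loss.

```python
from itertools import permutations

def permuted_training_samples(sentence, sep=" is "):
    """Split a sentence into coarse segments and return all orderings."""
    subject, obj = sentence.split(sep, 1)   # naive two-segment split
    segments = [subject, sep.strip(), obj]
    return [" ".join(p) for p in permutations(segments)]

for sample in permuted_training_samples("Mary Lee Pfeiffer is Tom Cruise's mother"):
    print(sample)
# Some permutations place the object first, exposing the reversed order.
```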
arXiv Detail & Related papers (2024-03-01T18:55:20Z)
- Deficiency of Large Language Models in Finance: An Empirical Examination of Hallucination [7.627664978437055]
Hallucination is recognized as a fundamental deficiency of large language models (LLMs).
This paper empirically investigates LLMs' ability to explain financial concepts and terminology.
We evaluate the efficacy of four practical methods: few-shot learning, Decoding by Contrasting Layers (DoLa), Retrieval-Augmented Generation (RAG), and prompt-based tool learning, in which the model generates a query command for an external function.
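As one concrete example, a minimal RAG sketch (my illustration with a hypothetical glossary, not the paper's setup): retrieve relevant definitions and prepend them to the prompt, so the answer is grounded in the context rather than recalled from parametric memory.

```python
GLOSSARY = {  # hypothetical retrieval corpus of financial definitions
    "EBITDA": "Earnings before interest, taxes, depreciation, and amortization.",
    "duration": "A measure of a bond price's sensitivity to interest-rate changes.",
}

def retrieve(query, corpus):
    """Toy retriever via keyword match; real systems use dense embeddings."""
    return [text for term, text in corpus.items() if term.lower() in query.lower()]

def rag_prompt(question):
    context = "\n".join(retrieve(question, GLOSSARY))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

print(rag_prompt("What does EBITDA mean?"))  # pass this prompt to any LLM
```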
arXiv Detail & Related papers (2023-11-27T05:27:13Z)
- LLMRefine: Pinpointing and Refining Large Language Models via Fine-Grained Actionable Feedback [65.84061725174269]
Recent large language models (LLMs) leverage human feedback to improve their generation quality.
We propose LLMRefine, an inference-time optimization method to refine an LLM's output.
We conduct experiments on three text generation tasks, including machine translation, long-form question answering (QA), and topical summarization.
LLMRefine consistently outperforms all baseline approaches, achieving improvements of up to 1.7 MetricX points on translation tasks, 8.1 ROUGE-L on ASQA, and 2.2 ROUGE-L on topical summarization.
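Schematically (stand-in propose and score functions, not the paper's components; the paper describes an inference-time search with simulated annealing guided by fine-grained feedback), the refine loop might look like:

```python
import math
import random

def refine(initial, propose, score, steps=20, temp=1.0, decay=0.9):
    """Iteratively edit an output; occasionally accept a worse candidate
    with annealed probability so the search can escape local optima."""
    current = best = initial
    for _ in range(steps):
        candidate = propose(current)               # e.g., LLM edit from feedback
        delta = score(candidate) - score(current)  # fine-grained quality signal
        if delta > 0 or random.random() < math.exp(delta / max(temp, 1e-9)):
            current = candidate
        if score(current) > score(best):
            best = current
        temp *= decay                              # cool down over iterations
    return best
```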
arXiv Detail & Related papers (2023-11-15T19:52:11Z)
- Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis [103.89753784762445]
Large language models (LLMs) have demonstrated remarkable potential in handling multilingual machine translation (MMT).
This paper systematically investigates the advantages and challenges of LLMs for MMT.
We thoroughly evaluate eight popular LLMs, including ChatGPT and GPT-4.
arXiv Detail & Related papers (2023-04-10T15:51:30Z)