Are We Falling in a Middle-Intelligence Trap? An Analysis and Mitigation
of the Reversal Curse
- URL: http://arxiv.org/abs/2311.07468v2
- Date: Thu, 16 Nov 2023 08:35:05 GMT
- Title: Are We Falling in a Middle-Intelligence Trap? An Analysis and Mitigation
of the Reversal Curse
- Authors: Ang Lv and Kaiyi Zhang and Shufang Xie and Quan Tu and Yuhan Chen and
Ji-Rong Wen and Rui Yan
- Abstract summary: Recent studies have highlighted a phenomenon in large language models known as "the reversal curse"
We contend that the reversal curse is partially a result of specific model training objectives.
We propose a novel training method, BIdirectional Causal language modeling Optimization (BICO), designed to mitigate the reversal curse.
- Score: 73.65112477688353
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent studies have highlighted a phenomenon in large language models (LLMs)
known as "the reversal curse," in which the order of knowledge entities in the
training data biases the models' comprehension. For example, if a model is
trained on sentences where entity A consistently appears before entity B, it
can respond to queries about A by providing B as the answer. However, it may
encounter confusion when presented with questions concerning B. We contend that
the reversal curse is partially a result of specific model training objectives,
particularly evident in the prevalent use of the next-token prediction within
most causal language models. For the next-token prediction, models solely focus
on a token's preceding context, resulting in a restricted comprehension of the
input. In contrast, we illustrate that the GLM, trained using the
autoregressive blank infilling objective where tokens to be predicted have
access to the entire context, exhibits better resilience against the reversal
curse. We propose a novel training method, BIdirectional Causal language
modeling Optimization (BICO), designed to mitigate the reversal curse when
fine-tuning pretrained causal language models on new data. BICO modifies the
causal attention mechanism to function bidirectionally and employs a mask
denoising optimization. In the task designed to assess the reversal curse, our
approach improves Llama's accuracy from the original 0% to around 70%. We hope
that more attention can be focused on exploring and addressing these inherent
weaknesses of the current LLMs, in order to achieve a higher level of
intelligence.
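The contrast the abstract draws between next-token prediction and bidirectional objectives comes down to the attention mask. The snippet below is a minimal illustrative sketch, not the paper's BICO implementation: it builds the standard causal mask used in next-token prediction (each position attends only to itself and earlier positions) alongside a fully bidirectional mask of the kind a blank-infilling or BICO-style objective would permit, where predicted tokens can access the entire context.

```python
import numpy as np

def causal_mask(n: int) -> np.ndarray:
    # Standard causal LM mask: True where attention is allowed.
    # Position i may attend only to positions j <= i, so each
    # prediction sees nothing but its preceding context.
    return np.tril(np.ones((n, n), dtype=bool))

def bidirectional_mask(n: int) -> np.ndarray:
    # Fully bidirectional mask: every position attends to the
    # whole sequence, as in blank-infilling-style training
    # (illustrative only; BICO additionally uses mask denoising).
    return np.ones((n, n), dtype=bool)

if __name__ == "__main__":
    n = 4
    print("causal:\n", causal_mask(n).astype(int))
    print("bidirectional:\n", bidirectional_mask(n).astype(int))
```

Under a causal mask, a fact stated as "A ... B" never lets the representation of A condition on B during training, which is one way to see why the reversed query can fail.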
Related papers
- RECALL: Library-Like Behavior In Language Models is Enhanced by Self-Referencing Causal Cycles [18.140067201462884]
We introduce the concept of the self-referencing causal cycle (abbreviated RECALL)
It enables large language models to bypass the limitations of unidirectional causality.
We find that RECALL is driven by what we designate as cycle tokens.
arXiv Detail & Related papers (2025-01-23T09:14:07Z)
- Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval [55.63711219190506]
Large language models (LLMs) often struggle with posing the right search queries.
We introduce Learning to Retrieve by Trying (LeReT).
LeReT can improve the absolute retrieval accuracy by up to 29% and the downstream generator evaluations by 17%.
arXiv Detail & Related papers (2024-10-30T17:02:54Z)
- Utilize the Flow before Stepping into the Same River Twice: Certainty Represented Knowledge Flow for Refusal-Aware Instruction Tuning [68.57166425493283]
Refusal-Aware Instruction Tuning (RAIT) enables Large Language Models (LLMs) to refuse to answer unknown questions.
However, this crude approach can cause LLMs to excessively refuse to answer questions they could have answered correctly.
We introduce Certainty Represented Knowledge Flow for Refusal-Aware Instruction Tuning (CRaFT) to address this issue.
arXiv Detail & Related papers (2024-10-09T14:12:51Z)
- Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics [45.69328374321502]
Auto-regressive large language models (LLMs) show impressive capacities to solve many complex reasoning tasks.
LLMs fail to conclude '$B \gets A$' during inference even if the two sentences are semantically identical.
We theoretically analyze the reversal curse via the training dynamics of gradient descent for two auto-regressive models.
arXiv Detail & Related papers (2024-05-07T21:03:51Z)
- Mitigating Reversal Curse in Large Language Models via Semantic-aware Permutation Training [57.771940716189114]
We show that large language models (LLMs) suffer from the "reversal curse"
The root cause of the reversal curse lies in the different word order between the training and inference stages.
We propose Semantic-aware Permutation Training (SPT) to address this issue.
arXiv Detail & Related papers (2024-03-01T18:55:20Z)
- Deficiency of Large Language Models in Finance: An Empirical Examination of Hallucination [7.627664978437055]
Hallucination is recognized as a fundamental deficiency of large language models (LLMs).
This paper empirically investigates LLM models' ability of explaining financial concepts and terminologies.
We evaluate the efficacy of four practical methods: few-shot learning, Decoding by Contrasting Layers (DoLa), Retrieval-Augmented Generation (RAG), and a prompt-based tool-learning method for a function to generate a query command.
arXiv Detail & Related papers (2023-11-27T05:27:13Z)
- LLMRefine: Pinpointing and Refining Large Language Models via Fine-Grained Actionable Feedback [65.84061725174269]
Recent large language models (LLMs) leverage human feedback to improve their generation quality.
We propose LLMRefine, an inference time optimization method to refine LLM's output.
We conduct experiments on three text generation tasks, including machine translation, long-form question answering (QA), and topical summarization.
LLMRefine consistently outperforms all baseline approaches, achieving improvements of up to 1.7 MetricX points on translation tasks, 8.1 ROUGE-L on ASQA, and 2.2 ROUGE-L on topical summarization.
arXiv Detail & Related papers (2023-11-15T19:52:11Z)
- Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis [103.89753784762445]
Large language models (LLMs) have demonstrated remarkable potential in handling multilingual machine translation (MMT)
This paper systematically investigates the advantages and challenges of LLMs for MMT.
We thoroughly evaluate eight popular LLMs, including ChatGPT and GPT-4.
arXiv Detail & Related papers (2023-04-10T15:51:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.