Are We Falling in a Middle-Intelligence Trap? An Analysis and Mitigation
of the Reversal Curse
- URL: http://arxiv.org/abs/2311.07468v2
- Date: Thu, 16 Nov 2023 08:35:05 GMT
- Title: Are We Falling in a Middle-Intelligence Trap? An Analysis and Mitigation
of the Reversal Curse
- Authors: Ang Lv and Kaiyi Zhang and Shufang Xie and Quan Tu and Yuhan Chen and
Ji-Rong Wen and Rui Yan
- Abstract summary: Recent studies have highlighted a phenomenon in large language models known as "the reversal curse".
We contend that the reversal curse is partially a result of specific model training objectives.
We propose a novel training method, BIdirectional Causal language modeling Optimization (BICO), designed to mitigate the reversal curse.
- Score: 73.65112477688353
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent studies have highlighted a phenomenon in large language models (LLMs)
known as "the reversal curse," in which the order of knowledge entities in the
training data biases the models' comprehension. For example, if a model is
trained on sentences where entity A consistently appears before entity B, it
can respond to queries about A by providing B as the answer. However, it may
encounter confusion when presented with questions concerning B. We contend that
the reversal curse is partially a result of specific model training objectives,
particularly evident in the prevalent use of next-token prediction in most
causal language models. Under next-token prediction, a model attends only to a
token's preceding context, which restricts its comprehension of the input. In
contrast, we illustrate that GLM, trained using the
autoregressive blank infilling objective where tokens to be predicted have
access to the entire context, exhibits better resilience against the reversal
curse. We propose a novel training method, BIdirectional Causal language
modeling Optimization (BICO), designed to mitigate the reversal curse when
fine-tuning pretrained causal language models on new data. BICO modifies the
causal attention mechanism to function bidirectionally and employs a mask
denoising optimization. In the task designed to assess the reversal curse, our
approach improves Llama's accuracy from the original 0% to around 70%. We hope
that more attention can be focused on exploring and addressing these inherent
weaknesses of current LLMs, in order to achieve a higher level of
intelligence.
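Since the abstract only names BICO's two ingredients, the following is a minimal sketch of how they might compose during fine-tuning: the causal attention mask is lifted so every position sees the full context, and the objective becomes denoising of randomly masked tokens. The toy model, the names (`ToyBidirectionalLM`, `mask_denoising_loss`), and the mask ratio are illustrative assumptions, not the authors' released implementation.

```python
# Hedged sketch of the two ideas the abstract names: bidirectional attention
# plus a mask-denoising objective (illustrative, not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, MASK_ID = 1000, 0

class ToyBidirectionalLM(nn.Module):
    """Stand-in for a causal LM whose attention mask has been lifted."""
    def __init__(self, d=64, heads=4, layers=2):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, d)
        block = nn.TransformerEncoderLayer(d, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(block, layers)
        self.head = nn.Linear(d, VOCAB)

    def forward(self, ids):
        # No causal mask is passed: every position attends to the entire
        # context, mirroring the bidirectional-attention change.
        return self.head(self.encoder(self.emb(ids)))

def mask_denoising_loss(model, ids, mask_ratio=0.15):
    """Corrupt a random subset of tokens and score the model only there."""
    corrupt = torch.rand(ids.shape) < mask_ratio
    noisy = ids.masked_fill(corrupt, MASK_ID)
    labels = ids.masked_fill(~corrupt, -100)   # ignore intact positions
    logits = model(noisy)                      # (batch, seq, vocab)
    return F.cross_entropy(logits.transpose(1, 2), labels, ignore_index=-100)

model = ToyBidirectionalLM()
batch = torch.randint(2, VOCAB, (4, 32))       # toy token ids
loss = mask_denoising_loss(model, batch)
loss.backward()                                # an optimizer step would follow
print(f"mask-denoising loss: {loss.item():.3f}")
```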
Related papers
- The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More [27.731438642876114]
We study the reversal curse, where models cannot recall information when probed in an order different from the one encountered during training.
We find that the reversal curse is an instance of a broader "factorization curse": an inherent failure of the next-token prediction objective used in popular large language models.
Our results uncover a promising path forward: factorization-agnostic objectives can significantly mitigate the reversal curse and hint at improved knowledge storage and planning capabilities.
arXiv Detail & Related papers (2024-06-07T18:00:37Z)
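A hedged sketch of what a factorization-agnostic objective can look like (my illustration, not the paper's code): sampling a fresh masking rate per batch and predicting the masked subset from the remainder means no single left-to-right factorization is baked into training.

```python
# Illustrative factorization-agnostic target construction: each batch sees a
# different random "factorization" of which tokens condition on which.
import torch

def factorization_agnostic_targets(ids, mask_id=0):
    """Return (inputs, labels) for one uniformly sampled factorization."""
    rate = torch.rand(())                       # masking rate ~ U(0, 1)
    predict = torch.rand(ids.shape) < rate      # positions to predict
    inputs = ids.masked_fill(predict, mask_id)  # condition on the rest
    labels = ids.masked_fill(~predict, -100)    # score only predicted slots
    return inputs, labels

ids = torch.randint(2, 1000, (2, 8))
inputs, labels = factorization_agnostic_targets(ids)
print(inputs)
print(labels)
```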
- Mitigating Reversal Curse in Large Language Models via Semantic-aware Permutation Training [57.771940716189114]
We show that large language models (LLMs) suffer from the "reversal curse".
The root cause of the reversal curse lies in the different word order between the training and inference stages.
We propose Semantic-aware Permutation Training (SPT) to address this issue.
arXiv Detail & Related papers (2024-03-01T18:55:20Z)
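As a toy illustration of permutation training: the segmentation heuristic below (splitting on punctuation) is a placeholder assumption standing in for the paper's semantic-aware segmentation.

```python
# Toy permutation-training data augmentation: shuffle coherent segments so the
# model sees the same fact under several word orders (illustrative only).
import random
import re

def permuted_variants(sentence: str, k: int = 3) -> list[str]:
    """Yield k training variants with segment order shuffled."""
    segments = [s.strip() for s in re.split(r"[,;]", sentence) if s.strip()]
    variants = []
    for _ in range(k):
        order = segments[:]
        random.shuffle(order)          # expose the model to many orders
        variants.append(", ".join(order))
    return variants

for v in permuted_variants("Valentina Tereshkova was born in 1937, "
                           "she was the first woman in space"):
    print(v)
```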
- Fine-tuning Language Models for Factuality [96.5203774943198]
Large pre-trained language models (LLMs) have come into widespread use, sometimes even as a replacement for traditional search engines.
Yet language models are prone to making convincing but factually inaccurate claims, often referred to as 'hallucinations'.
In this work, we fine-tune language models to be more factual, without human labeling.
arXiv Detail & Related papers (2023-11-14T18:59:15Z)
- Anti-LM Decoding for Zero-shot In-context Machine Translation [59.26037416204157]
This work introduces an Anti-Language Model objective with a decay factor designed to address the weaknesses of in-context machine translation.
We conduct experiments across 3 model types and sizes, 3 language directions, and for both greedy decoding and beam search.
arXiv Detail & Related papers (2023-11-14T17:09:43Z)
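One plausible reading of an anti-LM objective with a decay factor, sketched as an assumption rather than the paper's exact formulation: at decoding step t, the logits the model assigns without the source sentence are subtracted with a geometrically decaying weight, steering early steps away from source-ignoring continuations.

```python
# Hedged sketch of anti-LM decoding: subtract decayed unconditional logits.
import torch

def anti_lm_logits(cond_logits, uncond_logits, step, gamma=0.9):
    """Combine conditional and unconditional logits for one decoding step."""
    return cond_logits - (gamma ** step) * uncond_logits

cond = torch.log_softmax(torch.randn(5), dim=-1)    # log p(y | source, prefix)
uncond = torch.log_softmax(torch.randn(5), dim=-1)  # log p(y | prefix) alone
for t in range(3):
    print(t, anti_lm_logits(cond, uncond, t).argmax().item())
```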
- Improving Input-label Mapping with Demonstration Replay for In-context Learning [67.57288926736923]
In-context learning (ICL) is an emerging capability of large autoregressive language models.
We propose a novel ICL method called Repeated Demonstration with Sliding Causal Attention (RdSca).
We show that our method significantly improves the input-label mapping in ICL demonstrations.
arXiv Detail & Related papers (2023-10-30T14:29:41Z)
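A loose, illustrative reading of "sliding causal attention" (an assumption, not the paper's published design): attention stays causal but is confined to a sliding window, which can be expressed as a boolean mask.

```python
# Illustrative sliding causal mask: causal AND within a recent-token window.
import torch

def sliding_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    """True where attention is allowed."""
    i = torch.arange(seq_len).unsqueeze(1)   # query positions
    j = torch.arange(seq_len).unsqueeze(0)   # key positions
    return (j <= i) & (j > i - window)       # causal, limited lookback

print(sliding_causal_mask(6, 3).int())
```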
- MiLe Loss: a New Loss for Mitigating the Bias of Learning Difficulties in Generative Language Models [40.992566245706996]
We propose a MiLe Loss function for mitigating the bias arising from differing learning difficulties across tokens.
We train generative language models at different scales of 468M, 1.2B, and 6.7B parameters.
Experiments reveal that models incorporating the proposed MiLe Loss can gain consistent performance improvement on downstream benchmarks.
arXiv Detail & Related papers (2023-10-30T13:33:21Z)
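The summary does not spell out the loss, so the sketch below is an assumption in its spirit: token-level cross-entropy reweighted by how difficult the model finds each token, with predictive entropy standing in as the difficulty signal.

```python
# Hedged sketch of a difficulty-aware token loss (my reading, not MiLe's
# published definition): weight each token's CE by predictive entropy.
import torch
import torch.nn.functional as F

def difficulty_weighted_ce(logits, labels, alpha=1.0):
    """Cross-entropy with per-token weights from predictive entropy."""
    log_probs = F.log_softmax(logits, dim=-1)
    ce = F.nll_loss(log_probs.transpose(1, 2), labels, reduction="none")
    entropy = -(log_probs.exp() * log_probs).sum(-1)     # per-token entropy
    weights = (entropy / entropy.mean()).pow(alpha).detach()
    return (weights * ce).mean()

logits = torch.randn(2, 8, 1000, requires_grad=True)
labels = torch.randint(0, 1000, (2, 8))
print(difficulty_weighted_ce(logits, labels).item())
```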
- GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator [114.8954615026781]
We propose a GAN-style model for encoder-decoder pre-training by introducing an auxiliary discriminator.
GanLM is trained with two pre-training objectives: replaced token detection and replaced token denoising.
Experiments on language generation benchmarks show that GanLM, with its powerful language understanding capability, outperforms various strong pre-trained language models.
arXiv Detail & Related papers (2022-12-20T12:51:11Z)
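Replaced token detection can be sketched ELECTRA-style. The target construction below is illustrative: random samples stand in for a real generator's proposals.

```python
# Toy replaced-token-detection targets: the discriminator labels each position
# as original (0) vs. replaced (1). Illustrative, not GanLM's actual code.
import torch

def rtd_targets(ids, generator_samples, masked):
    """Build discriminator inputs/labels from generator proposals."""
    corrupted = torch.where(masked, generator_samples, ids)
    labels = (corrupted != ids).long()     # 1 = replaced, 0 = original
    return corrupted, labels

ids = torch.randint(2, 1000, (1, 8))
masked = torch.rand(1, 8) < 0.3
fake = torch.randint(2, 1000, (1, 8))      # stand-in generator output
print(rtd_targets(ids, fake, masked))
```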
- Cold-start Active Learning through Self-supervised Language Modeling [15.551710499866239]
Active learning aims to reduce annotation costs by choosing the most critical examples to label.
With BERT, we develop a simple strategy based on the masked language modeling loss.
Compared to other baselines, our approach reaches higher accuracy within fewer sampling iterations and less time.
arXiv Detail & Related papers (2020-10-19T14:09:17Z)
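A minimal sketch of MLM-loss-driven cold-start selection, simplified to ranking unlabeled texts by their masked-LM loss and labeling the hardest first; the model choice, mask ratio, and ranking rule are assumptions, as the paper's actual selection strategy is more involved.

```python
# Hedged sketch: score unlabeled texts by BERT's masked-LM loss, label the
# most surprising ones first (a simplification of the summarized strategy).
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def mlm_surprisal(text: str, mask_ratio: float = 0.15) -> float:
    """Average masked-LM loss of `text` under one random masking."""
    enc = tok(text, return_tensors="pt")
    ids = enc["input_ids"]
    masked = torch.rand(ids.shape) < mask_ratio
    masked[0, 0] = masked[0, -1] = False   # keep [CLS] and [SEP] intact
    masked[0, 1] = True                    # guarantee one prediction slot
    labels = ids.masked_fill(~masked, -100)
    enc["input_ids"] = ids.masked_fill(masked, tok.mask_token_id)
    with torch.no_grad():
        return mlm(**enc, labels=labels).loss.item()

pool = ["The cat sat on the mat.",
        "Nucleophilic acyl substitution proceeds via a tetrahedral intermediate."]
for text in sorted(pool, key=mlm_surprisal, reverse=True):
    print(text)                            # hardest-to-denoise first
```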