Understanding In-Context Learning from Repetitions
- URL: http://arxiv.org/abs/2310.00297v3
- Date: Wed, 21 Feb 2024 09:21:52 GMT
- Title: Understanding In-Context Learning from Repetitions
- Authors: Jianhao Yan, Jin Xu, Chiyu Song, Chenming Wu, Yafu Li, Yue Zhang
- Abstract summary: This paper explores the elusive mechanism underpinning in-context learning in Large Language Models (LLMs).
We quantitatively investigate the role of surface features in text generation, and empirically establish the existence of *token co-occurrence reinforcement*.
By investigating the dual impacts of these features, our research illuminates the internal workings of in-context learning and expounds on the reasons for its failures.
- Score: 21.28694573253979
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper explores the elusive mechanism underpinning in-context learning in Large Language Models (LLMs). Our work provides a novel perspective by examining in-context learning through the lens of surface repetitions. We quantitatively investigate the role of surface features in text generation, and empirically establish the existence of *token co-occurrence reinforcement*, a principle that strengthens the relationship between two tokens based on their contextual co-occurrences. By investigating the dual impacts of these features, our research illuminates the internal workings of in-context learning and expounds on the reasons for its failures. This paper makes an essential contribution to the understanding of in-context learning and its potential limitations, offering a fresh perspective on this capability.
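To make the reinforcement effect concrete, here is a minimal probe sketch. It assumes access to a Hugging Face causal LM (`gpt2` chosen purely for illustration) and an arbitrary token pair; this is not the paper's exact measurement protocol.

```python
# Minimal probe of token co-occurrence reinforcement: as a (head, tail) pair
# repeats in the context, does P(tail | ..., head) increase?
# Assumes Hugging Face transformers; gpt2 and the pair are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def tail_probability(context: str, head: str, tail: str) -> float:
    """P(tail) at the position immediately after `context + head`."""
    ids = tokenizer(context + head, return_tensors="pt").input_ids
    with torch.no_grad():
        next_token_logits = model(ids).logits[0, -1]
    tail_id = tokenizer(tail, add_special_tokens=False).input_ids[0]
    return torch.softmax(next_token_logits, dim=-1)[tail_id].item()

head, tail = " lemon", " tree"  # arbitrary single-token pair
for n in [0, 1, 2, 4, 8]:
    context = " I saw a lemon tree." * n  # n prior co-occurrences of the pair
    print(f"{n} co-occurrences -> P(tail | head) = "
          f"{tail_probability(context, head, tail):.4f}")
```

Under the co-occurrence reinforcement principle, the printed probability should grow with the number of in-context repetitions.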
Related papers
- The broader spectrum of in-context learning [13.111927028942329]
We provide a perspective that situates standard supervised few-shot learning within a much broader spectrum of meta-learned in-context learning.
We suggest that any distribution of sequences in which context non-trivially decreases loss on subsequent predictions can elicit in-context learning.
We close by suggesting that research on in-context learning should consider this broader spectrum of in-context capabilities and types of generalization.
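This loss-based framing suggests a direct check: score a target continuation with and without a helpful context and compare. A minimal sketch, assuming `gpt2` and a toy few-shot pattern of our own choosing:

```python
# Sketch of the loss-based criterion above: context "counts" as eliciting
# in-context learning when it lowers loss on subsequent predictions.
# gpt2 and the toy few-shot pattern below are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def mean_nll(prefix: str, target: str) -> float:
    """Mean negative log-likelihood of `target` tokens after reading `prefix`."""
    pre = tokenizer(prefix, return_tensors="pt").input_ids
    tgt = tokenizer(target, return_tensors="pt").input_ids
    ids = torch.cat([pre, tgt], dim=1)
    with torch.no_grad():
        logits = model(ids).logits[0]
    # logits at position i predict the token at position i + 1
    pred = torch.log_softmax(logits[pre.shape[1] - 1 : ids.shape[1] - 1], dim=-1)
    return -pred[torch.arange(tgt.shape[1]), tgt[0]].mean().item()

target = " oiseau -> bird"
no_ctx = mean_nll("<|endoftext|>", target)
few_shot = mean_nll("<|endoftext|> chien -> dog, chat -> cat,", target)
print(f"loss without context:      {no_ctx:.3f}")
print(f"loss with few-shot context: {few_shot:.3f}  (lower => context helps)")
```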
arXiv Detail & Related papers (2024-12-05T00:05:11Z) - Mitigating Knowledge Conflicts in Language Model-Driven Question Answering [15.29366851382021]
Two fundamental knowledge sources play crucial roles in document-based question answering and document summarization systems.
Recent studies reveal a significant challenge: when the model's inherent knowledge conflicts with the ground-truth answers in the training data, the system may exhibit problematic behaviors during inference.
Our investigation proposes a strategy to minimize hallucination by building an explicit connection between source inputs and generated outputs.
arXiv Detail & Related papers (2024-11-18T07:33:10Z) - Investigating Expert-in-the-Loop LLM Discourse Patterns for Ancient Intertextual Analysis [0.0]
The study demonstrates that large language models can detect direct quotations, allusions, and echoes between texts.
The model struggles with long query passages and the inclusion of false intertextual dependencies.
The expert-in-the-loop methodology presented offers a scalable approach for intertextual research.
arXiv Detail & Related papers (2024-09-03T13:23:11Z) - Identifying Semantic Induction Heads to Understand In-Context Learning [103.00463655766066]
We investigate whether attention heads encode two types of relationships between tokens present in natural languages.
We find that certain attention heads exhibit a pattern where, when attending to head tokens, they recall tail tokens and increase the output logits of those tail tokens.
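This recall behavior is usually probed with the standard repeated-sequence induction-head test: on a sequence repeated twice, such a head, while reading the second copy of a token, attends to the token that followed the first copy. The sketch below uses that common heuristic with `gpt2`; it is an approximation, not the paper's methodology.

```python
# Rough induction-head probe: on a repeated random sequence, score each head
# by how much attention the second half pays to the "induction target"
# (the token after the earlier copy). gpt2 and the heuristic are assumptions.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2", output_attentions=True).eval()

torch.manual_seed(0)
first_half = torch.randint(1000, 5000, (1, 32))   # random token ids
ids = torch.cat([first_half, first_half], dim=1)  # sequence repeated twice
with torch.no_grad():
    attentions = model(ids).attentions            # per layer: [1, heads, L, L]

L = ids.shape[1]
half = L // 2
scores = []
for layer, attn in enumerate(attentions):
    for head in range(attn.shape[1]):
        # For a query at position t in the second half, the induction target
        # is t - half + 1: the token right after the earlier copy of token t.
        q = torch.arange(half, L - 1)
        scores.append((attn[0, head, q, q - half + 1].mean().item(), layer, head))

for score, layer, head in sorted(scores, reverse=True)[:5]:
    print(f"layer {layer:2d} head {head:2d}: induction score {score:.3f}")
```

Heads with high scores are candidates for the recall pattern the paper describes; its semantic variant additionally looks at whether the recalled tail tokens' output logits rise.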
arXiv Detail & Related papers (2024-02-20T14:43:39Z) - How Well Do Text Embedding Models Understand Syntax? [50.440590035493074]
The ability of text embedding models to generalize across a wide range of syntactic contexts remains under-explored.
Our findings reveal that existing text embedding models have not sufficiently addressed these syntactic understanding challenges.
We propose strategies to augment the generalization ability of text embedding models in diverse syntactic scenarios.
arXiv Detail & Related papers (2023-11-14T08:51:00Z) - The Mystery of In-Context Learning: A Comprehensive Survey on Interpretation and Analysis [20.142154624977582]
The in-context learning (ICL) capability enables large language models to achieve remarkable proficiency from demonstration examples.
In this paper, we present a thorough survey on the interpretation and analysis of in-context learning.
We believe that our work establishes the basis for further exploration into the interpretation of in-context learning.
arXiv Detail & Related papers (2023-11-01T02:40:42Z) - Negation, Coordination, and Quantifiers in Contextualized Language Models [4.46783454797272]
We explore whether the semantic constraints of function words are learned and how the surrounding context impacts their embeddings.
We create suitable datasets, provide new insights into the inner workings of LMs vis-à-vis function words, and implement a visual web interface to assist qualitative analysis.
arXiv Detail & Related papers (2022-09-16T10:01:11Z) - Learning to Express in Knowledge-Grounded Conversation [62.338124154016825]
We consider two aspects of knowledge expression, namely the structure of the response and style of the content in each part.
We propose a segmentation-based generation model and optimize the model by a variational approach to discover the underlying pattern of knowledge expression in a response.
arXiv Detail & Related papers (2022-04-12T13:43:47Z) - Reciprocal Feature Learning via Explicit and Implicit Tasks in Scene Text Recognition [60.36540008537054]
In this work, we exploit an implicit task, character counting, within traditional text recognition, at no additional annotation cost.
We design a two-branch reciprocal feature learning framework to adequately utilize the features from both tasks.
Experiments on 7 benchmarks show the advantages of the proposed method in both text recognition and the newly introduced character counting task.
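For structure only, a minimal sketch of the two-branch layout described above: a shared backbone feeds an explicit recognition head and an implicit character-counting head. All modules, sizes, and shapes are invented for illustration and are not the paper's actual architecture.

```python
# Illustrative two-branch reciprocal-feature layout: shared backbone,
# recognition branch (per-step character logits) + counting branch (regression).
# Every size and shape here is an assumption made for the sketch.
import torch
import torch.nn as nn

class TwoBranchRecognizer(nn.Module):
    def __init__(self, vocab_size: int = 37, feat_dim: int = 256):
        super().__init__()
        self.backbone = nn.Sequential(               # stand-in for a CNN encoder
            nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, 16)),           # collapse to 16 time steps
        )
        self.recognition_head = nn.Linear(feat_dim, vocab_size)
        self.counting_head = nn.Sequential(          # regress the character count
            nn.Flatten(), nn.Linear(feat_dim * 16, 1),
        )

    def forward(self, images: torch.Tensor):
        feats = self.backbone(images)                # [B, C, 1, 16]
        seq = feats.squeeze(2).transpose(1, 2)       # [B, 16, C]
        char_logits = self.recognition_head(seq)     # explicit recognition branch
        count = self.counting_head(feats)            # implicit counting branch
        return char_logits, count

model = TwoBranchRecognizer()
chars, count = model(torch.randn(2, 3, 32, 128))
print(chars.shape, count.shape)  # torch.Size([2, 16, 37]) torch.Size([2, 1])
```

In the paper's framing, the two heads are trained jointly so that counting supervision (derivable from existing transcripts for free) reinforces the recognition features.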
arXiv Detail & Related papers (2021-05-13T12:27:35Z) - Positioning yourself in the maze of Neural Text Generation: A Task-Agnostic Survey [54.34370423151014]
This paper surveys the components of modeling approaches, and how task requirements shape them, across various generation tasks such as storytelling, summarization, and translation.
We present an abstraction of the key techniques with respect to learning paradigms, pretraining, modeling approaches, and decoding, along with the outstanding challenges in each.
arXiv Detail & Related papers (2020-10-14T17:54:42Z) - Summarizing Text on Any Aspects: A Knowledge-Informed Weakly-Supervised Approach [89.56158561087209]
We study summarization on arbitrary aspects relevant to a document.
Due to the lack of supervision data, we develop a new weak-supervision construction method and an aspect modeling scheme.
Experiments show our approach achieves performance boosts on summarizing both real and synthetic documents.
arXiv Detail & Related papers (2020-10-14T03:20:46Z)