Understanding In-Context Learning from Repetitions
- URL: http://arxiv.org/abs/2310.00297v3
- Date: Wed, 21 Feb 2024 09:21:52 GMT
- Title: Understanding In-Context Learning from Repetitions
- Authors: Jianhao Yan, Jin Xu, Chiyu Song, Chenming Wu, Yafu Li, Yue Zhang
- Abstract summary: This paper explores the elusive mechanism underpinning in-context learning in Large Language Models (LLMs).
We quantitatively investigate the role of surface features in text generation, and empirically establish the existence of token co-occurrence reinforcement.
By investigating the dual impacts of these features, our research illuminates the internal workings of in-context learning and expounds on the reasons for its failures.
- Score: 21.28694573253979
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper explores the elusive mechanism underpinning in-context learning in
Large Language Models (LLMs). Our work provides a novel perspective by
examining in-context learning via the lens of surface repetitions. We
quantitatively investigate the role of surface features in text generation, and
empirically establish the existence of \emph{token co-occurrence
reinforcement}, a principle that strengthens the relationship between two
tokens based on their contextual co-occurrences. By investigating the dual
impacts of these features, our research illuminates the internal workings of
in-context learning and expounds on the reasons for its failures. This paper
provides an essential contribution to the understanding of in-context learning
and its potential limitations, offering a fresh perspective on this exciting capability.
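To make the token co-occurrence reinforcement claim concrete, the following is a minimal probing sketch, not the paper's code: it checks whether repeating a head/tail token pair in the context raises the probability the model assigns to the tail token after the head token. The model (GPT-2 via Hugging Face transformers) and the "blue"/"sky" pair are illustrative assumptions, not the paper's experimental setup.

```python
# Minimal sketch of probing token co-occurrence reinforcement (illustrative,
# not the paper's code): does repeating a token pair in the context raise the
# model's probability of the tail token following the head token?
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # illustrative model choice, not from the paper
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def prob_of_next(prompt: str, target: str) -> float:
    """Probability the model assigns to `target` as the next token of `prompt`."""
    input_ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(input_ids).logits[0, -1]   # next-token logits at the last position
    target_id = tok.encode(" " + target)[0]       # leading space for GPT-2 BPE
    return torch.softmax(logits, dim=-1)[target_id].item()

# Hypothetical head/tail pair; repeat the co-occurrence and watch P(tail | head).
for n_repeats in (0, 1, 2, 4, 8):
    context = "blue sky. " * n_repeats + "blue"
    p = prob_of_next(context, "sky")
    print(f"repetitions={n_repeats:2d}  P(' sky' | ... 'blue') = {p:.4f}")
```

If token co-occurrence reinforcement holds, the printed probability should rise with the number of in-context repetitions; the paper's experiments are of course broader and more carefully controlled than this sketch.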
Related papers
- Argumentation and Machine Learning [4.064849471241967]
This chapter provides an overview of research works that present approaches with some degree of cross-fertilisation between Computational Argumentation and Machine Learning.
Two broad themes representing the purpose of the interaction between these two areas were identified.
We evaluate the spectrum of works across various dimensions, including the type of learning and the form of argumentation framework used.
arXiv Detail & Related papers (2024-10-31T08:19:58Z) - Investigating Expert-in-the-Loop LLM Discourse Patterns for Ancient Intertextual Analysis [0.0]
The study demonstrates that large language models can detect direct quotations, allusions, and echoes between texts.
The model struggles with long query passages and the inclusion of false intertextual dependencies.
The expert-in-the-loop methodology presented offers a scalable approach for intertextual research.
arXiv Detail & Related papers (2024-09-03T13:23:11Z) - Exploring Continual Learning of Compositional Generalization in NLI [24.683598294766774]
We introduce the Continual Compositional Generalization in Inference (C2Gen NLI) challenge.
A model continuously acquires knowledge of constituting primitive inference tasks as a basis for compositional inferences.
Our analyses show that by learning subtasks continuously while observing their dependencies and increasing degrees of difficulty, continual learning can enhance compositional generalization ability.
arXiv Detail & Related papers (2024-03-07T10:54:27Z) - Identifying Semantic Induction Heads to Understand In-Context Learning [103.00463655766066]
We investigate whether attention heads encode two types of relationships between tokens present in natural languages.
We find that certain attention heads exhibit a pattern where, when attending to head tokens, they recall tail tokens and increase the output logits of those tail tokens.
arXiv Detail & Related papers (2024-02-20T14:43:39Z) - How Well Do Text Embedding Models Understand Syntax? [50.440590035493074]
The ability of text embedding models to generalize across a wide range of syntactic contexts remains under-explored.
Our findings reveal that existing text embedding models have not sufficiently addressed these syntactic understanding challenges.
We propose strategies to augment the generalization ability of text embedding models in diverse syntactic scenarios.
arXiv Detail & Related papers (2023-11-14T08:51:00Z) - The Mystery of In-Context Learning: A Comprehensive Survey on Interpretation and Analysis [20.142154624977582]
In-context learning (ICL) capability enables large language models to excel in proficiency through demonstration examples.
In this paper, we present a thorough survey on the interpretation and analysis of in-context learning.
We believe that our work establishes the basis for further exploration into the interpretation of in-context learning.
arXiv Detail & Related papers (2023-11-01T02:40:42Z) - Negation, Coordination, and Quantifiers in Contextualized Language Models [4.46783454797272]
We explore whether the semantic constraints of function words are learned and how the surrounding context impacts their embeddings.
We create suitable datasets, provide new insights into the inner workings of LMs vis-a-vis function words and implement an assisting visual web interface for qualitative analysis.
arXiv Detail & Related papers (2022-09-16T10:01:11Z) - Learning to Express in Knowledge-Grounded Conversation [62.338124154016825]
We consider two aspects of knowledge expression, namely the structure of the response and style of the content in each part.
We propose a segmentation-based generation model and optimize the model by a variational approach to discover the underlying pattern of knowledge expression in a response.
arXiv Detail & Related papers (2022-04-12T13:43:47Z) - Reciprocal Feature Learning via Explicit and Implicit Tasks in Scene Text Recognition [60.36540008537054]
In this work, we excavate the implicit task of character counting within traditional text recognition, without additional annotation cost.
We design a two-branch reciprocal feature learning framework to adequately utilize the features from both tasks.
Experiments on 7 benchmarks show the advantages of the proposed methods in both text recognition and the newly built character counting task.
arXiv Detail & Related papers (2021-05-13T12:27:35Z) - Positioning yourself in the maze of Neural Text Generation: A Task-Agnostic Survey [54.34370423151014]
This paper surveys the components of modeling approaches relaying task impacts across various generation tasks such as storytelling, summarization, translation etc.
We present an abstraction of the imperative techniques with respect to learning paradigms, pretraining, modeling approaches, decoding and the key challenges outstanding in the field in each of them.
arXiv Detail & Related papers (2020-10-14T17:54:42Z) - Summarizing Text on Any Aspects: A Knowledge-Informed Weakly-Supervised Approach [89.56158561087209]
We study summarizing on arbitrary aspects relevant to the document.
Due to the lack of supervision data, we develop a new weak supervision construction method and an aspect modeling scheme.
Experiments show our approach achieves performance boosts on summarizing both real and synthetic documents.
arXiv Detail & Related papers (2020-10-14T03:20:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.