Concept-aware Training Improves In-context Learning Ability of Language
Models
- URL: http://arxiv.org/abs/2305.13775v1
- Date: Tue, 23 May 2023 07:44:52 GMT
- Title: Concept-aware Training Improves In-context Learning Ability of Language
Models
- Authors: Michal Štefánik and Marek Kadlčík
- Abstract summary: Many recent language models (LMs) of the Transformer family exhibit a so-called in-context learning (ICL) ability.
We propose a method to create LMs that are better able to utilize in-context information.
We find that the data sampling of Concept-aware Training consistently improves models' reasoning ability.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many recent language models (LMs) of the Transformer family exhibit a
so-called in-context learning (ICL) ability, manifested in the LMs' capacity to
modulate their function based on a task described in natural language input. Previous work
curating these models assumes that ICL emerges from vast over-parametrization
or the scale of multi-task training. However, a complementary branch of recent
theoretical work attributes ICL emergence to specific properties of training
data and creates functional in-context learners in small-scale, synthetic
settings.
Inspired by recent findings on data properties driving the emergence of ICL,
we propose a method to create LMs that better utilize in-context information by
constructing training scenarios where it is beneficial for the LM to capture the
analogical reasoning concepts from demonstrations. We find that the data sampling
of Concept-aware Training (CoAT) consistently improves models' reasoning
ability. As a result, the in-context learners trained with CoAT on only two
datasets of a single (QA) task perform comparably to larger models trained on
1600+ tasks.
Related papers
- Evaluating Linguistic Capabilities of Multimodal LLMs in the Lens of Few-Shot Learning [15.919493497867567]
This study aims to evaluate the performance of Multimodal Large Language Models (MLLMs) on the VALSE benchmark.
We conducted a comprehensive assessment of state-of-the-art MLLMs, varying in model size and pretraining datasets.
arXiv Detail & Related papers (2024-07-17T11:26:47Z)
- How does Multi-Task Training Affect Transformer In-Context Capabilities? Investigations with Function Classes [6.652837942112205]
Large language models (LLMs) have recently shown the extraordinary ability to perform unseen tasks based on few-shot examples provided as text.
We propose several effective curriculum learning strategies that allow ICL models to achieve higher data efficiency and more stable convergence.
Our experiments reveal that ICL models can effectively learn difficult tasks by training on progressively harder tasks while mixing in prior tasks, denoted as a mixed curriculum in this work (a minimal sketch of such a sampling scheme appears after this list).
arXiv Detail & Related papers (2024-04-04T16:15:23Z)
- Concept-aware Data Construction Improves In-context Learning of Language Models [2.4715271879679395]
We show that concept-aware in-context learning is more effective for a majority of new tasks when compared to traditional instruction tuning.
We propose Concept-aware Training (CoAT), a framework for constructing training scenarios that make it beneficial for the LM to learn to utilize the analogical reasoning concepts from demonstrations.
arXiv Detail & Related papers (2024-03-08T19:07:47Z)
- Transformer-based Causal Language Models Perform Clustering [20.430255724239448]
We introduce a simplified instruction-following task and use synthetic datasets to analyze a Transformer-based causal language model.
Our findings suggest that the model learns task-specific information by clustering data within its hidden space, with this clustering process evolving dynamically during learning.
arXiv Detail & Related papers (2024-02-19T14:02:31Z)
- In-Context Language Learning: Architectures and Algorithms [73.93205821154605]
We study ICL through the lens of a new family of model problems we term in-context language learning (ICLL).
We evaluate a diverse set of neural sequence models on regular ICLL tasks.
arXiv Detail & Related papers (2024-01-23T18:59:21Z)
- LLM Augmented LLMs: Expanding Capabilities through Composition [56.40953749310957]
CALM -- Composition to Augment Language Models -- introduces cross-attention between models to compose their representations and enable new capabilities.
We illustrate that augmenting PaLM2-S with a smaller model trained on low-resource languages results in an absolute improvement of up to 13% on tasks like translation into English.
When PaLM2-S is augmented with a code-specific model, we see a relative improvement of 40% over the base model for code generation and explanation tasks.
arXiv Detail & Related papers (2024-01-04T18:53:01Z)
- SINC: Self-Supervised In-Context Learning for Vision-Language Tasks [64.44336003123102]
We propose a framework to enable in-context learning in large language models.
A meta-model can learn on self-supervised prompts consisting of tailored demonstrations.
Experiments show that SINC outperforms gradient-based methods in various vision-language tasks.
arXiv Detail & Related papers (2023-07-15T08:33:08Z)
- Pre-Training to Learn in Context [138.0745138788142]
The ability of in-context learning is not fully exploited because language models are not explicitly trained to learn in context.
We propose PICL (Pre-training for In-Context Learning), a framework to enhance the language models' in-context learning ability.
Our experiments show that PICL is more effective and task-generalizable than a range of baselines, outperforming larger language models with nearly 4x the parameters.
arXiv Detail & Related papers (2023-05-16T03:38:06Z)
- Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
- Feeding What You Need by Understanding What You Learned [54.400455868448695]
Machine Reading Comprehension (MRC) requires the ability to understand a given text passage and answer questions based on it.
Existing research in MRC relies heavily on large models and corpora to improve performance as evaluated by metrics such as Exact Match.
We argue that a deep understanding of model capabilities and data properties can help us feed a model with appropriate training data.
arXiv Detail & Related papers (2022-03-05T14:15:59Z)
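As referenced in the multi-task training entry above, the mixed-curriculum idea can be illustrated with a short sampling sketch: harder task pools are introduced stage by stage while examples from earlier stages keep being replayed. The function name (mixed_curriculum_stream), stage length, and mixing ratio below are assumed placeholders, not details taken from that paper.

```python
import random
from typing import Dict, List, Sequence


def mixed_curriculum_stream(
    tasks_by_difficulty: Sequence[List[Dict]],  # task pools ordered from easy to hard
    steps_per_stage: int = 1000,
    prior_task_ratio: float = 0.3,
) -> List[Dict]:
    """Produce a training stream that introduces progressively harder task
    pools while continuing to replay examples from earlier stages.

    Illustrative sketch only: stage lengths and the mixing ratio are
    assumptions, not taken from the paper."""
    stream: List[Dict] = []
    for stage, current_pool in enumerate(tasks_by_difficulty):
        # All examples seen in earlier (easier) stages.
        prior_pool = [ex for pool in tasks_by_difficulty[:stage] for ex in pool]
        for _ in range(steps_per_stage):
            if prior_pool and random.random() < prior_task_ratio:
                stream.append(random.choice(prior_pool))    # replay an earlier, easier task
            else:
                stream.append(random.choice(current_pool))  # sample the current difficulty
    return stream
```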
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.