Concept-aware Training Improves In-context Learning Ability of Language
Models
- URL: http://arxiv.org/abs/2305.13775v1
- Date: Tue, 23 May 2023 07:44:52 GMT
- Title: Concept-aware Training Improves In-context Learning Ability of Language
Models
- Authors: Michal \v{S}tef\'anik and Marek Kadl\v{c}\'ik
- Abstract summary: Many recent language models (LMs) of Transformers family exhibit so-called in-context learning (ICL) ability.
We propose a method to create LMs able to better utilize the in-context information.
We measure that data sampling of Concept-aware Training consistently improves models' reasoning ability.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many recent language models (LMs) of Transformers family exhibit so-called
in-context learning (ICL) ability, manifested in the LMs' ability to modulate
their function by a task described in a natural language input. Previous work
curating these models assumes that ICL emerges from vast over-parametrization
or the scale of multi-task training. However, a complementary branch of recent
theoretical work attributes ICL emergence to specific properties of training
data and creates functional in-context learners in small-scale, synthetic
settings.
Inspired by recent findings on data properties driving the emergence of ICL,
we propose a method to create LMs able to better utilize the in-context
information, by constructing training scenarios where it is beneficial for the
LM to capture the analogical reasoning concepts. We measure that data sampling
of Concept-aware Training (CoAT) consistently improves models' reasoning
ability. As a result, the in-context learners trained with CoAT on only two
datasets of a single (QA) task perform comparably to larger models trained on
1600+ tasks.
Related papers
- The Inherent Limits of Pretrained LLMs: The Unexpected Convergence of Instruction Tuning and In-Context Learning Capabilities [51.594836904623534]
We investigate whether instruction-tuned models possess fundamentally different capabilities from base models that are prompted using in-context examples.
We show that the performance of instruction-tuned models is significantly correlated with the in-context performance of their base counterparts.
Specifically, we extend this understanding to instruction-tuned models, suggesting that their pretraining data similarly sets a limiting boundary on the tasks they can solve.
arXiv Detail & Related papers (2025-01-15T10:57:55Z) - Dynamic Skill Adaptation for Large Language Models [78.31322532135272]
We present Dynamic Skill Adaptation (DSA), an adaptive and dynamic framework to adapt novel and complex skills to Large Language Models (LLMs)
For every skill, we utilize LLMs to generate both textbook-like data which contains detailed descriptions of skills for pre-training and exercise-like data which targets at explicitly utilizing the skills to solve problems for instruction-tuning.
Experiments on large language models such as LLAMA and Mistral demonstrate the effectiveness of our proposed methods in adapting math reasoning skills and social study skills.
arXiv Detail & Related papers (2024-12-26T22:04:23Z) - Evaluating Linguistic Capabilities of Multimodal LLMs in the Lens of Few-Shot Learning [15.919493497867567]
This study aims to evaluate the performance of Multimodal Large Language Models (MLLMs) on the VALSE benchmark.
We conducted a comprehensive assessment of state-of-the-art MLLMs, varying in model size and pretraining datasets.
arXiv Detail & Related papers (2024-07-17T11:26:47Z) - How does Multi-Task Training Affect Transformer In-Context Capabilities? Investigations with Function Classes [6.652837942112205]
Large language models (LLM) have recently shown the extraordinary ability to perform unseen tasks based on few-shot examples provided as text.
We propose several effective curriculum learning strategies that allow ICL models to achieve higher data efficiency and more stable convergence.
Our experiments reveal that ICL models can effectively learn difficult tasks by training on progressively harder tasks while mixing in prior tasks, denoted as mixed curriculum in this work.
arXiv Detail & Related papers (2024-04-04T16:15:23Z) - Concept-aware Data Construction Improves In-context Learning of Language Models [2.4715271879679395]
We show that concept-aware in-context learning is more effective for a majority of new tasks when compared to traditional instruction tuning.
We propose Concept-aware Training (CoAT), a framework for constructing training scenarios that make it beneficial for the LM to learn to utilize the analogical reasoning concepts from demonstrations.
arXiv Detail & Related papers (2024-03-08T19:07:47Z) - Transformer-based Causal Language Models Perform Clustering [20.430255724239448]
We introduce a simplified instruction-following task and use synthetic datasets to analyze a Transformer-based causal language model.
Our findings suggest that the model learns task-specific information by clustering data within its hidden space, with this clustering process evolving dynamically during learning.
arXiv Detail & Related papers (2024-02-19T14:02:31Z) - LLM Augmented LLMs: Expanding Capabilities through Composition [56.40953749310957]
CALM -- Composition to Augment Language Models -- introduces cross-attention between models to compose their representations and enable new capabilities.
We illustrate that augmenting PaLM2-S with a smaller model trained on low-resource languages results in an absolute improvement of up to 13% on tasks like translation into English.
When PaLM2-S is augmented with a code-specific model, we see a relative improvement of 40% over the base model for code generation and explanation tasks.
arXiv Detail & Related papers (2024-01-04T18:53:01Z) - SINC: Self-Supervised In-Context Learning for Vision-Language Tasks [64.44336003123102]
We propose a framework to enable in-context learning in large language models.
A meta-model can learn on self-supervised prompts consisting of tailored demonstrations.
Experiments show that SINC outperforms gradient-based methods in various vision-language tasks.
arXiv Detail & Related papers (2023-07-15T08:33:08Z) - Pre-Training to Learn in Context [138.0745138788142]
The ability of in-context learning is not fully exploited because language models are not explicitly trained to learn in context.
We propose PICL (Pre-training for In-Context Learning), a framework to enhance the language models' in-context learning ability.
Our experiments show that PICL is more effective and task-generalizable than a range of baselines, outperforming larger language models with nearly 4x parameters.
arXiv Detail & Related papers (2023-05-16T03:38:06Z) - Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP)
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains under explored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.