Related papers: Meta-Learning Transformers to Improve In-Context Generalization

Meta-Learning Transformers to Improve In-Context Generalization

URL: http://arxiv.org/abs/2507.05019v1
Date: Mon, 07 Jul 2025 14:02:22 GMT
Title: Meta-Learning Transformers to Improve In-Context Generalization
Authors: Lorenzo Braccaioli, Anna Vettoruzzo, Prabhant Singh, Joaquin Vanschoren, Mohamed-Rafik Bouguelia, Nicola Conci,
Abstract summary: In-context learning enables transformer models to generalize to new tasks based solely on input prompts.<n>Existing training paradigms typically rely on large, unstructured datasets that are costly to store.<n>We propose an alternative training strategy where we leverage a collection of multiple, small-scale, and domain-specific datasets.
Score: 8.694999451321571
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In-context learning enables transformer models to generalize to new tasks based solely on input prompts, without any need for weight updates. However, existing training paradigms typically rely on large, unstructured datasets that are costly to store, difficult to evaluate for quality and balance, and pose privacy and ethical concerns due to the inclusion of sensitive information. Motivated by these limitations and risks, we propose an alternative training strategy where we leverage a collection of multiple, small-scale, and domain-specific datasets. We empirically demonstrate that the increased quality and diversity of such data improve the generalization abilities of in-context learners beyond their training domain, while achieving comparable performance with models trained on a single large-scale dataset. We investigate this paradigm by leveraging meta-learning to train an in-context learner on the Meta-Album collection under several settings. Firstly, we show the performance in a controlled environment, where the test domain is completely excluded from the training knowledge. Secondly, we explore the robustness of these models to forgetting in a continual scenario where the information is accessible for a limited time. Finally, we explore the more challenging unsupervised scenario. Our findings demonstrate that transformers still generalize for in-context prediction when trained on a curated dataset collection while offering advantages in modularity and replaceability.

Related papers

ToReMi: Topic-Aware Data Reweighting for Dynamic Pre-Training Data Selection [28.75333303894706]
ToReMi is a novel framework that adjusts training sample weights according to their topical associations and observed learning patterns.<n>Our experiments reveal that ToReMi variants consistently achieve superior performance over conventional pre-training approaches.
arXiv Detail & Related papers (2025-04-01T12:06:42Z)
In-context learning of evolving data streams with tabular foundational models [42.13420474990124]
This work bridges advancements from both areas, highlighting how transformers' implicit meta-learning abilities, pre-training on drifting natural data, and reliance on context optimization directly address the core challenges of adaptive learning in dynamic environments.<n> Exploring real-time model adaptation, this research demonstrates that TabPFN, coupled with a simple sliding memory strategy, consistently outperforms ensembles of Hoeffding trees across all non-stationary benchmarks.
arXiv Detail & Related papers (2025-02-24T04:52:35Z)
A Practitioner's Guide to Continual Multimodal Pretraining [83.63894495064855]
Multimodal foundation models serve numerous applications at the intersection of vision and language.<n>To keep models updated, research into continual pretraining mainly explores scenarios with either infrequent, indiscriminate updates on large-scale new data, or frequent, sample-level updates.<n>We introduce FoMo-in-Flux, a continual multimodal pretraining benchmark with realistic compute constraints and practical deployment requirements.
arXiv Detail & Related papers (2024-08-26T17:59:01Z)
Unsupervised Meta-Learning via In-Context Learning [3.4165401459803335]
We propose a novel approach to unsupervised meta-learning that leverages the generalization abilities of in-supervised learning.<n>Our method reframes meta-learning as a sequence modeling problem, enabling the transformer encoder to learn task context from support images.
arXiv Detail & Related papers (2024-05-25T08:29:46Z)
TrACT: A Training Dynamics Aware Contrastive Learning Framework for Long-tail Trajectory Prediction [7.3292387742640415]
We propose to incorporate richer training dynamics information into a prototypical contrastive learning framework. We conduct empirical evaluations of our approach using two large-scale naturalistic datasets.
arXiv Detail & Related papers (2024-04-18T23:12:46Z)
Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model [74.62272538148245]
We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other. We investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
arXiv Detail & Related papers (2023-10-26T17:59:46Z)
ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP) ALP incorporates action information into representation learning through a combination of optimizing a reinforcement learning policy and an inverse dynamics prediction objective. We show that ALP outperforms existing baselines in several downstream perception tasks.
arXiv Detail & Related papers (2023-06-16T21:51:04Z)
Bilevel Fast Scene Adaptation for Low-Light Image Enhancement [50.639332885989255]
Enhancing images in low-light scenes is a challenging but widely concerned task in the computer vision. Main obstacle lies in the modeling conundrum from distribution discrepancy across different scenes. We introduce the bilevel paradigm to model the above latent correspondence. A bilevel learning framework is constructed to endow the scene-irrelevant generality of the encoder towards diverse scenes.
arXiv Detail & Related papers (2023-06-02T08:16:21Z)
Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning [54.67880602409801]
In this paper, we study the problem of pre-training world models with abundant in-the-wild videos for efficient learning of visual control tasks. We introduce Contextualized World Models (ContextWM) that explicitly separate context and dynamics modeling. Our experiments show that in-the-wild video pre-training equipped with ContextWM can significantly improve the sample efficiency of model-based reinforcement learning.
arXiv Detail & Related papers (2023-05-29T14:29:12Z)
Improving filling level classification with adversarial training [90.01594595780928]
We investigate the problem of classifying - from a single image - the level of content in a cup or a drinking glass. We use adversarial training in a generic source dataset and then refine the training with a task-specific dataset. We show that transfer learning with adversarial training in the source domain consistently improves the classification accuracy on the test set.
arXiv Detail & Related papers (2021-02-08T08:32:56Z)

This list is automatically generated from the titles and abstracts of the papers in this site.