Related papers: How to distribute data across tasks for meta-learning?

How to distribute data across tasks for meta-learning?

URL: http://arxiv.org/abs/2103.08463v1
Date: Mon, 15 Mar 2021 15:38:47 GMT
Title: How to distribute data across tasks for meta-learning?
Authors: Alexandru Cioba, Michael Bromberg, Qian Wang, Ritwik Niyogi, Georgios Batzolis, Da-shan Shiu, Alberto Bernacchia
Abstract summary: We show that the optimal number of data points per task depends on the budget, but it converges to a unique constant value for large budgets. Our results suggest a simple and efficient procedure for data collection.
Score: 59.608652082495624
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Meta-learning models transfer the knowledge acquired from previous tasks to quickly learn new ones. They are tested on benchmarks with a fixed number of data points per training task. This number is usually arbitrary and it is unknown how it affects the performance. Since labelling of data is expensive, finding the optimal allocation of labels across training tasks may reduce costs: given a fixed budget of labels, should we use a small number of highly labelled tasks, or many tasks with few labels each? We show that: 1) The optimal number of data points per task depends on the budget, but it converges to a unique constant value for large budgets; 2) Convergence occurs around the interpolation threshold of the model. We prove our results mathematically on mixed linear regression, and we show empirically that the same results hold for nonlinear regression and few-shot image classification on CIFAR-FS and mini-ImageNet. Our results suggest a simple and efficient procedure for data collection: the optimal allocation of data can be computed at low cost, by using relatively small data, and collection of additional data can be optimized by the knowledge of the optimal allocation.

Related papers

Data Selection for ERMs [67.57726352698933]
We study how well can $mathcalA$ perform when trained on at most $nll N$ data points selected from a population of $N$ points. Our results include optimal data-selection bounds for mean estimation, linear classification, and linear regression.
arXiv Detail & Related papers (2025-04-20T11:26:01Z)
Swift Cross-Dataset Pruning: Enhancing Fine-Tuning Efficiency in Natural Language Understanding [2.379669478864599]
Current cross-dataset pruning techniques for fine-tuning often rely on computationally expensive sample ranking processes. We propose Swift Cross-Dataset Pruning (SCDP), which uses TF-IDF embeddings with geometric median to rapidly evaluate sample importance. Experimental results on six diverse datasets demonstrate the effectiveness of our method, spanning various tasks and scales.
arXiv Detail & Related papers (2025-01-05T03:52:04Z)
Adapt-$\infty$: Scalable Lifelong Multimodal Instruction Tuning via Dynamic Data Selection [89.42023974249122]
Adapt-$infty$ is a new multi-way and adaptive data selection approach for Lifelong Instruction Tuning. We construct pseudo-skill clusters by grouping gradient-based sample vectors. We select the best-performing data selector for each skill cluster from a pool of selector experts.
arXiv Detail & Related papers (2024-10-14T15:48:09Z)
Data curation via joint example selection further accelerates multimodal learning [3.329535792151987]
We show that jointly selecting batches of data is more effective for learning than selecting examples independently. We derive a simple and tractable algorithm for selecting such batches, which significantly accelerate training beyond individually-prioritized data points.
arXiv Detail & Related papers (2024-06-25T16:52:37Z)
Get more for less: Principled Data Selection for Warming Up Fine-Tuning in LLMs [18.242110417706]
This work focuses on leveraging and selecting from vast, unlabeled, open data to pre-fine-tune a pre-trained language model. We show the optimality of this approach for fine-tuning tasks under certain conditions. Our proposed method is significantly faster than existing techniques, scaling to millions of samples within a single GPU hour.
arXiv Detail & Related papers (2024-05-05T00:08:00Z)
Variance Alignment Score: A Simple But Tough-to-Beat Data Selection Method for Multimodal Contrastive Learning [17.40655778450583]
We propose a principled metric named Variance Alignment Score (VAS), which has the form $langle Sigma_texttest, Sigma_irangle$. We show that applying VAS and CLIP scores together can outperform baselines by a margin of $1.3%$ on 38 evaluation sets for noisy dataset DataComp and $2.5%$ on VTAB for high-quality dataset CC12M.
arXiv Detail & Related papers (2024-02-03T06:29:04Z)
DiffusAL: Coupling Active Learning with Graph Diffusion for Label-Efficient Node Classification [1.0602247913671219]
We introduce a novel active graph learning approach called DiffusAL, showing significant robustness in diverse settings. Most of our calculations for acquisition and training can be pre-processed, making DiffusAL more efficient compared to approaches combining diverse selection criteria. Our experiments on various benchmark datasets show that, unlike previous methods, our approach significantly outperforms random selection in 100% of all datasets and labeling budgets tested.
arXiv Detail & Related papers (2023-07-31T20:30:13Z)
Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching. Our simple yet effective method outperforms most previous optimization-oriented methods with much fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z)
USB: A Unified Summarization Benchmark Across Tasks and Domains [68.82726887802856]
We introduce a Wikipedia-derived benchmark, complemented by a rich set of crowd-sourced annotations, that supports $8$ interrelated tasks. We compare various methods on this benchmark and discover that on multiple tasks, moderately-sized fine-tuned models consistently outperform much larger few-shot prompted language models.
arXiv Detail & Related papers (2023-05-23T17:39:54Z)
Project and Probe: Sample-Efficient Domain Adaptation by Interpolating Orthogonal Features [119.22672589020394]
We propose a lightweight, sample-efficient approach that learns a diverse set of features and adapts to a target distribution by interpolating these features. Our experiments on four datasets, with multiple distribution shift settings for each, show that Pro$2$ improves performance by 5-15% when given limited target data.
arXiv Detail & Related papers (2023-02-10T18:58:03Z)
Budget-aware Few-shot Learning via Graph Convolutional Network [56.41899553037247]
This paper tackles the problem of few-shot learning, which aims to learn new visual concepts from a few examples. A common problem setting in few-shot classification assumes random sampling strategy in acquiring data labels. We introduce a new budget-aware few-shot learning problem that aims to learn novel object categories.
arXiv Detail & Related papers (2022-01-07T02:46:35Z)
Active clustering for labeling training data [0.8029049649310211]
We propose a setting for training data gathering where the human experts perform the comparatively cheap task of answering pairwise queries. We analyze the algorithms that minimize the average number of queries required to cluster the items and analyze their complexity.
arXiv Detail & Related papers (2021-10-27T15:35:58Z)

This list is automatically generated from the titles and abstracts of the papers in this site.