How to distribute data across tasks for meta-learning?
- URL: http://arxiv.org/abs/2103.08463v1
- Date: Mon, 15 Mar 2021 15:38:47 GMT
- Title: How to distribute data across tasks for meta-learning?
- Authors: Alexandru Cioba, Michael Bromberg, Qian Wang, Ritwik Niyogi, Georgios Batzolis, Da-shan Shiu, Alberto Bernacchia
- Abstract summary: We show that the optimal number of data points per task depends on the budget, but it converges to a unique constant value for large budgets.
Our results suggest a simple and efficient procedure for data collection.
- Score: 59.608652082495624
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Meta-learning models transfer the knowledge acquired from previous tasks to
quickly learn new ones. They are tested on benchmarks with a fixed number of
data points per training task. This number is usually arbitrary and it is
unknown how it affects the performance. Since labelling of data is expensive,
finding the optimal allocation of labels across training tasks may reduce
costs: given a fixed budget of labels, should we use a small number of highly
labelled tasks, or many tasks with few labels each? We show that: 1) The
optimal number of data points per task depends on the budget, but it converges
to a unique constant value for large budgets; 2) Convergence occurs around the
interpolation threshold of the model. We prove our results mathematically on
mixed linear regression, and we show empirically that the same results hold for
nonlinear regression and few-shot image classification on CIFAR-FS and
mini-ImageNet. Our results suggest a simple and efficient procedure for data
collection: the optimal allocation of data can be computed at low cost from a
relatively small dataset, and the collection of additional data can then be
guided by knowledge of that optimal allocation.
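The suggested procedure (sweep the number of points per task at a small pilot budget, pick the best value, and keep that allocation when scaling up collection) can be illustrated on mixed linear regression, the setting the paper analyzes. The sketch below is a toy simulation under assumed settings: the averaging meta-learner, all parameter values, and the function names are illustrative, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_task(d, k, w_mean, tau, sigma):
    """Sample one linear-regression task: w ~ N(w_mean, tau^2 I), y = Xw + noise."""
    w = w_mean + tau * rng.standard_normal(d)
    X = rng.standard_normal((k, d))
    y = X @ w + sigma * rng.standard_normal(k)
    return X, y

def meta_train(budget, k, d, w_mean, tau, sigma):
    """Spend the label budget on budget // k tasks with k points each,
    then meta-learn by averaging the per-task least-squares estimates."""
    estimates = []
    for _ in range(budget // k):
        X, y = make_task(d, k, w_mean, tau, sigma)
        w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # min-norm solution when k < d
        estimates.append(w_hat)
    return np.mean(estimates, axis=0)

def test_mse(w_meta, d, w_mean, tau, sigma, n_tasks=500, k_test=50):
    """Average squared prediction error of the meta-estimate on fresh tasks."""
    errs = []
    for _ in range(n_tasks):
        X, y = make_task(d, k_test, w_mean, tau, sigma)
        errs.append(np.mean((X @ w_meta - y) ** 2))
    return float(np.mean(errs))

d, tau, sigma, budget = 10, 0.3, 0.5, 2000
w_mean = np.ones(d)
for k in [2, 5, 8, 10, 12, 15, 25, 50, 100]:  # points per task under a fixed budget
    w_meta = meta_train(budget, k, d, w_mean, tau, sigma)
    print(f"k={k:3d}  tasks={budget // k:4d}  "
          f"test MSE={test_mse(w_meta, d, w_mean, tau, sigma):.3f}")
```

Where the minimum of this sweep lands depends on the assumed tau, sigma, and budget; the paper's claim is that for large budgets it settles near the model's interpolation threshold (k around d in this toy setup), so a cheap pilot sweep suffices to fix the allocation before collecting more data.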
Related papers
- Adapt-$\infty$: Scalable Lifelong Multimodal Instruction Tuning via Dynamic Data Selection [89.42023974249122]
Adapt-$\infty$ is a new multi-way and adaptive data selection approach for Lifelong Instruction Tuning.
We construct pseudo-skill clusters by grouping gradient-based sample vectors.
We select the best-performing data selector for each skill cluster from a pool of selector experts.
arXiv Detail & Related papers (2024-10-14T15:48:09Z)
- Data curation via joint example selection further accelerates multimodal learning [3.329535792151987]
We show that jointly selecting batches of data is more effective for learning than selecting examples independently.
We derive a simple and tractable algorithm for selecting such batches, which significantly accelerates training beyond individually prioritized data points.
arXiv Detail & Related papers (2024-06-25T16:52:37Z)
- Get more for less: Principled Data Selection for Warming Up Fine-Tuning in LLMs [18.242110417706]
This work focuses on leveraging and selecting from vast, unlabeled, open data to pre-fine-tune a pre-trained language model.
We show the optimality of this approach for fine-tuning tasks under certain conditions.
Our proposed method is significantly faster than existing techniques, scaling to millions of samples within a single GPU hour.
arXiv Detail & Related papers (2024-05-05T00:08:00Z)
- Variance Alignment Score: A Simple But Tough-to-Beat Data Selection Method for Multimodal Contrastive Learning [17.40655778450583]
We propose a principled metric named Variance Alignment Score (VAS), which has the form $\langle \Sigma_{\text{test}}, \Sigma_i \rangle$; a toy implementation is sketched after this list.
We show that applying VAS and CLIP scores together can outperform baselines by a margin of $1.3\%$ on 38 evaluation sets for the noisy DataComp dataset and $2.5\%$ on VTAB for the high-quality CC12M dataset.
arXiv Detail & Related papers (2024-02-03T06:29:04Z)
- DiffusAL: Coupling Active Learning with Graph Diffusion for Label-Efficient Node Classification [1.0602247913671219]
We introduce a novel active graph learning approach called DiffusAL, showing significant robustness in diverse settings.
Most of our calculations for acquisition and training can be pre-computed, making DiffusAL more efficient than approaches that combine diverse selection criteria.
Our experiments on various benchmark datasets show that, unlike previous methods, our approach significantly outperforms random selection on every dataset and labeling budget tested.
arXiv Detail & Related papers (2023-07-31T20:30:13Z)
- Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods with much fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z)
- USB: A Unified Summarization Benchmark Across Tasks and Domains [68.82726887802856]
We introduce a Wikipedia-derived benchmark, complemented by a rich set of crowd-sourced annotations, that supports $8$ interrelated tasks.
We compare various methods on this benchmark and discover that on multiple tasks, moderately-sized fine-tuned models consistently outperform much larger few-shot prompted language models.
arXiv Detail & Related papers (2023-05-23T17:39:54Z)
- Project and Probe: Sample-Efficient Domain Adaptation by Interpolating Orthogonal Features [119.22672589020394]
We propose a lightweight, sample-efficient approach that learns a diverse set of features and adapts to a target distribution by interpolating these features.
Our experiments on four datasets, with multiple distribution shift settings for each, show that Pro$^2$ improves performance by 5-15% when given limited target data.
arXiv Detail & Related papers (2023-02-10T18:58:03Z)
- Budget-aware Few-shot Learning via Graph Convolutional Network [56.41899553037247]
This paper tackles the problem of few-shot learning, which aims to learn new visual concepts from a few examples.
A common problem setting in few-shot classification assumes a random sampling strategy for acquiring data labels.
We introduce a new budget-aware few-shot learning problem that aims to learn novel object categories under a limited annotation budget.
arXiv Detail & Related papers (2022-01-07T02:46:35Z)
- Active clustering for labeling training data [0.8029049649310211]
We propose a setting for gathering training data in which human experts perform the comparatively cheap task of answering pairwise queries; a toy version is sketched after this list.
We analyze algorithms that minimize the average number of queries required to cluster the items, and we characterize their complexity.
arXiv Detail & Related papers (2021-10-27T15:35:58Z)
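For the Variance Alignment Score entry above: reading the inner product $\langle \Sigma_{\text{test}}, \Sigma_i \rangle$ with $\Sigma_i = x_i x_i^\top$ gives a per-sample score $x_i^\top \Sigma_{\text{test}} x_i$. The sketch below follows that assumed reading; the function name, the L2 normalization, and the synthetic data are illustrative, not the paper's code.

```python
import numpy as np

def variance_alignment_score(X, X_test):
    """Sketch of VAS: score_i = <Sigma_test, x_i x_i^T> = x_i^T Sigma_test x_i.

    X:      (n, d) candidate-pool embeddings
    X_test: (m, d) embeddings from the target/test distribution
    """
    sigma_test = X_test.T @ X_test / len(X_test)       # (d, d) test covariance
    return np.einsum("nd,de,ne->n", X, sigma_test, X)  # one score per candidate

rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 64))
X_test = rng.standard_normal((500, 64))
# L2-normalize, as is common for CLIP-style embeddings (an assumption here)
X /= np.linalg.norm(X, axis=1, keepdims=True)
X_test /= np.linalg.norm(X_test, axis=1, keepdims=True)

scores = variance_alignment_score(X, X_test)
keep = np.argsort(scores)[::-1][:100]  # retain the 100 best-aligned samples
```

The summary above combines VAS with CLIP scores, but the mixing rule is not given there, so it is omitted from the sketch.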
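For the active-clustering entry: a toy version of the setting, in which a simulated expert answers "same cluster?" queries and each new item is compared against one representative per existing cluster. This greedy scheme only illustrates the query-cost trade-off; it is not one of the algorithms analyzed in the paper.

```python
def cluster_by_pairwise_queries(items, same_cluster):
    """Greedily cluster items using a pairwise 'same cluster?' oracle.

    same_cluster(a, b) -> bool stands in for the human expert; the number of
    queries issued is the labeling cost we want to keep low.
    """
    clusters, queries = [], 0
    for item in items:
        for members in clusters:
            queries += 1
            if same_cluster(members[0], item):  # one representative per cluster
                members.append(item)
                break
        else:  # no existing cluster matched: open a new one
            clusters.append([item])
    return clusters, queries

# Toy oracle backed by hidden ground-truth labels
labels = {"a": 0, "b": 1, "c": 0, "d": 2, "e": 1}
clusters, n_queries = cluster_by_pairwise_queries(
    list(labels), lambda x, y: labels[x] == labels[y]
)
print(clusters, n_queries)  # [['a', 'c'], ['b', 'e'], ['d']] 6
```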
This list is automatically generated from the titles and abstracts of the papers on this site.