How to distribute data across tasks for meta-learning?
- URL: http://arxiv.org/abs/2103.08463v1
- Date: Mon, 15 Mar 2021 15:38:47 GMT
- Title: How to distribute data across tasks for meta-learning?
- Authors: Alexandru Cioba, Michael Bromberg, Qian Wang, Ritwik Niyogi, Georgios
Batzolis, Da-shan Shiu, Alberto Bernacchia
- Abstract summary: We show that the optimal number of data points per task depends on the budget, but it converges to a unique constant value for large budgets.
Our results suggest a simple and efficient procedure for data collection.
- Score: 59.608652082495624
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Meta-learning models transfer the knowledge acquired from previous tasks to
quickly learn new ones. They are tested on benchmarks with a fixed number of
data points per training task. This number is usually arbitrary and it is
unknown how it affects the performance. Since labelling of data is expensive,
finding the optimal allocation of labels across training tasks may reduce
costs: given a fixed budget of labels, should we use a small number of highly
labelled tasks, or many tasks with few labels each? We show that: 1) The
optimal number of data points per task depends on the budget, but it converges
to a unique constant value for large budgets; 2) Convergence occurs around the
interpolation threshold of the model. We prove our results mathematically on
mixed linear regression, and we show empirically that the same results hold for
nonlinear regression and few-shot image classification on CIFAR-FS and
mini-ImageNet. Our results suggest a simple and efficient procedure for data
collection: the optimal allocation of data can be computed at low cost, by
using relatively small data, and collection of additional data can be optimized
by the knowledge of the optimal allocation.
Related papers
- Swift Cross-Dataset Pruning: Enhancing Fine-Tuning Efficiency in Natural Language Understanding [2.379669478864599]
Current cross-dataset pruning techniques for fine-tuning often rely on computationally expensive sample ranking processes.
We propose Swift Cross-Dataset Pruning (SCDP), which uses TF-IDF embeddings with geometric median to rapidly evaluate sample importance.
Experimental results on six diverse datasets demonstrate the effectiveness of our method, spanning various tasks and scales.
arXiv Detail & Related papers (2025-01-05T03:52:04Z) - Data Pruning Can Do More: A Comprehensive Data Pruning Approach for Object Re-identification [13.732596789612362]
This work is the first to explore the feasibility of data pruning methods applied to object re-identification tasks.
By fully leveraging the logit history during training, our approach offers a more accurate and comprehensive metric for quantifying sample importance.
Our approach is highly efficient, reducing the cost of importance score estimation by 10 times compared to existing methods.
arXiv Detail & Related papers (2024-12-13T12:27:47Z) - Adapt-$\infty$: Scalable Lifelong Multimodal Instruction Tuning via Dynamic Data Selection [89.42023974249122]
Adapt-$infty$ is a new multi-way and adaptive data selection approach for Lifelong Instruction Tuning.
We construct pseudo-skill clusters by grouping gradient-based sample vectors.
We select the best-performing data selector for each skill cluster from a pool of selector experts.
arXiv Detail & Related papers (2024-10-14T15:48:09Z) - Data curation via joint example selection further accelerates multimodal learning [3.329535792151987]
We show that jointly selecting batches of data is more effective for learning than selecting examples independently.
We derive a simple and tractable algorithm for selecting such batches, which significantly accelerate training beyond individually-prioritized data points.
arXiv Detail & Related papers (2024-06-25T16:52:37Z) - Variance Alignment Score: A Simple But Tough-to-Beat Data Selection
Method for Multimodal Contrastive Learning [17.40655778450583]
We propose a principled metric named Variance Alignment Score (VAS), which has the form $langle Sigma_texttest, Sigma_irangle$.
We show that applying VAS and CLIP scores together can outperform baselines by a margin of $1.3%$ on 38 evaluation sets for noisy dataset DataComp and $2.5%$ on VTAB for high-quality dataset CC12M.
arXiv Detail & Related papers (2024-02-03T06:29:04Z) - DiffusAL: Coupling Active Learning with Graph Diffusion for
Label-Efficient Node Classification [1.0602247913671219]
We introduce a novel active graph learning approach called DiffusAL, showing significant robustness in diverse settings.
Most of our calculations for acquisition and training can be pre-processed, making DiffusAL more efficient compared to approaches combining diverse selection criteria.
Our experiments on various benchmark datasets show that, unlike previous methods, our approach significantly outperforms random selection in 100% of all datasets and labeling budgets tested.
arXiv Detail & Related papers (2023-07-31T20:30:13Z) - Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods with much fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z) - USB: A Unified Summarization Benchmark Across Tasks and Domains [68.82726887802856]
We introduce a Wikipedia-derived benchmark, complemented by a rich set of crowd-sourced annotations, that supports $8$ interrelated tasks.
We compare various methods on this benchmark and discover that on multiple tasks, moderately-sized fine-tuned models consistently outperform much larger few-shot prompted language models.
arXiv Detail & Related papers (2023-05-23T17:39:54Z) - Project and Probe: Sample-Efficient Domain Adaptation by Interpolating
Orthogonal Features [119.22672589020394]
We propose a lightweight, sample-efficient approach that learns a diverse set of features and adapts to a target distribution by interpolating these features.
Our experiments on four datasets, with multiple distribution shift settings for each, show that Pro$2$ improves performance by 5-15% when given limited target data.
arXiv Detail & Related papers (2023-02-10T18:58:03Z) - Budget-aware Few-shot Learning via Graph Convolutional Network [56.41899553037247]
This paper tackles the problem of few-shot learning, which aims to learn new visual concepts from a few examples.
A common problem setting in few-shot classification assumes random sampling strategy in acquiring data labels.
We introduce a new budget-aware few-shot learning problem that aims to learn novel object categories.
arXiv Detail & Related papers (2022-01-07T02:46:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.