Improving Few-Shot Generalization by Exploring and Exploiting Auxiliary
Data
- URL: http://arxiv.org/abs/2302.00674v4
- Date: Tue, 3 Oct 2023 15:50:30 GMT
- Title: Improving Few-Shot Generalization by Exploring and Exploiting Auxiliary
Data
- Authors: Alon Albalak, Colin Raffel, William Yang Wang
- Abstract summary: We focus on Few-shot Learning with Auxiliary Data (FLAD)
FLAD assumes access to auxiliary data during few-shot learning in hopes of improving generalization.
We propose two algorithms -- EXP3-FLAD and UCB1-FLAD -- and compare them with prior FLAD methods that either explore or exploit.
- Score: 100.33096338195723
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Few-shot learning is valuable in many real-world applications, but learning a
generalizable model without overfitting to the few labeled datapoints is
challenging. In this work, we focus on Few-shot Learning with Auxiliary Data
(FLAD), a training paradigm that assumes access to auxiliary data during
few-shot learning in hopes of improving generalization. Previous works have
proposed automated methods for mixing auxiliary and target data, but these
methods typically scale linearly (or worse) with the number of auxiliary
datasets, limiting their practicality. In this work we relate FLAD to the
explore-exploit dilemma that is central to the multi-armed bandit setting and
derive algorithms whose computational complexity is independent of the number
of auxiliary datasets, allowing us to scale to 100x more auxiliary datasets
than prior methods. We propose two algorithms -- EXP3-FLAD and UCB1-FLAD -- and
compare them with prior FLAD methods that either explore or exploit, finding
that the combination of exploration and exploitation is crucial. Through
extensive experimentation we find that our methods outperform all pre-existing
FLAD methods by 4% and lead to the first 3 billion parameter language models
that outperform the 175 billion parameter GPT-3. Overall, our work suggests
that the discovery of better, more efficient mixing strategies for FLAD may
provide a viable path towards substantially improving generalization in
few-shot learning.
Related papers
- A Weighted K-Center Algorithm for Data Subset Selection [70.49696246526199]
Subset selection is a fundamental problem that can play a key role in identifying smaller portions of the training data.
We develop a novel factor 3-approximation algorithm to compute subsets based on the weighted sum of both k-center and uncertainty sampling objective functions.
arXiv Detail & Related papers (2023-12-17T04:41:07Z) - Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding [9.112203072394648]
Power-law scaling indicates that large-scale training with uniform sampling is prohibitively slow.
Active learning methods aim to increase data efficiency by prioritizing learning on the most relevant examples.
arXiv Detail & Related papers (2023-12-08T19:26:13Z) - Back to Basics: A Simple Recipe for Improving Out-of-Domain Retrieval in
Dense Encoders [63.28408887247742]
We study whether training procedures can be improved to yield better generalization capabilities in the resulting models.
We recommend a simple recipe for training dense encoders: Train on MSMARCO with parameter-efficient methods, such as LoRA, and opt for using in-batch negatives unless given well-constructed hard negatives.
arXiv Detail & Related papers (2023-11-16T10:42:58Z) - Distributed Gradient Descent for Functional Learning [9.81463654618448]
We propose a novel distributed gradient descent functional learning (DGDFL) algorithm to tackle functional data across numerous local machines (processors) in the framework of reproducing kernel Hilbert space.
Under mild conditions, confidence-based optimal learning rates of DGDFL are obtained without the saturation boundary on the regularity index suffered in previous works in functional regression.
arXiv Detail & Related papers (2023-05-12T12:15:42Z) - Cross-Modal Fine-Tuning: Align then Refine [83.37294254884446]
ORCA is a cross-modal fine-tuning framework that extends the applicability of a single large-scale pretrained model to diverse modalities.
We show that ORCA obtains state-of-the-art results on 3 benchmarks containing over 60 datasets from 12 modalities.
arXiv Detail & Related papers (2023-02-11T16:32:28Z) - Deep Active Ensemble Sampling For Image Classification [8.31483061185317]
Active learning frameworks aim to reduce the cost of data annotation by actively requesting the labeling for the most informative data points.
Some proposed approaches include uncertainty-based techniques, geometric methods, implicit combination of uncertainty-based and geometric approaches.
We present an innovative integration of recent progress in both uncertainty-based and geometric frameworks to enable an efficient exploration/exploitation trade-off in sample selection strategy.
Our framework provides two advantages: (1) accurate posterior estimation, and (2) tune-able trade-off between computational overhead and higher accuracy.
arXiv Detail & Related papers (2022-10-11T20:20:20Z) - Explored An Effective Methodology for Fine-Grained Snake Recognition [8.908667065576632]
We design a strong multimodal backbone to utilize various meta-information to assist in fine-grained identification.
In order to take full advantage of unlabeled datasets, we use self-supervised learning and supervised learning joint training.
Our method can achieve a macro f1 score 92.7% and 89.4% on private and public dataset, respectively, which is the 1st place among the participators on private leaderboard.
arXiv Detail & Related papers (2022-07-24T02:19:15Z) - Bi-level Alignment for Cross-Domain Crowd Counting [113.78303285148041]
Current methods rely on external data for training an auxiliary task or apply an expensive coarse-to-fine estimation.
We develop a new adversarial learning based method, which is simple and efficient to apply.
We evaluate our approach on five real-world crowd counting benchmarks, where we outperform existing approaches by a large margin.
arXiv Detail & Related papers (2022-05-12T02:23:25Z) - Local Learning Matters: Rethinking Data Heterogeneity in Federated
Learning [61.488646649045215]
Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices)
arXiv Detail & Related papers (2021-11-28T19:03:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.