Related papers: Improving Few-Shot Generalization by Exploring and Exploiting Auxiliary Data

Improving Few-Shot Generalization by Exploring and Exploiting Auxiliary Data

URL: http://arxiv.org/abs/2302.00674v4
Date: Tue, 3 Oct 2023 15:50:30 GMT
Title: Improving Few-Shot Generalization by Exploring and Exploiting Auxiliary Data
Authors: Alon Albalak, Colin Raffel, William Yang Wang
Abstract summary: We focus on Few-shot Learning with Auxiliary Data (FLAD) FLAD assumes access to auxiliary data during few-shot learning in hopes of improving generalization. We propose two algorithms -- EXP3-FLAD and UCB1-FLAD -- and compare them with prior FLAD methods that either explore or exploit.
Score: 100.33096338195723
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Few-shot learning is valuable in many real-world applications, but learning a generalizable model without overfitting to the few labeled datapoints is challenging. In this work, we focus on Few-shot Learning with Auxiliary Data (FLAD), a training paradigm that assumes access to auxiliary data during few-shot learning in hopes of improving generalization. Previous works have proposed automated methods for mixing auxiliary and target data, but these methods typically scale linearly (or worse) with the number of auxiliary datasets, limiting their practicality. In this work we relate FLAD to the explore-exploit dilemma that is central to the multi-armed bandit setting and derive algorithms whose computational complexity is independent of the number of auxiliary datasets, allowing us to scale to 100x more auxiliary datasets than prior methods. We propose two algorithms -- EXP3-FLAD and UCB1-FLAD -- and compare them with prior FLAD methods that either explore or exploit, finding that the combination of exploration and exploitation is crucial. Through extensive experimentation we find that our methods outperform all pre-existing FLAD methods by 4% and lead to the first 3 billion parameter language models that outperform the 175 billion parameter GPT-3. Overall, our work suggests that the discovery of better, more efficient mixing strategies for FLAD may provide a viable path towards substantially improving generalization in few-shot learning.

Related papers

Deepfake Detection that Generalizes Across Benchmarks [48.85953407706351]
The generalization of deepfake detectors to unseen manipulation techniques remains a challenge for practical deployment.<n>This work demonstrates that robust generalization is achievable through a parameter-efficient adaptation of one of the foundational pre-trained vision encoders.<n>The proposed method achieves state-of-the-art performance, outperforming more complex, recent approaches in average cross-dataset AUROC.
arXiv Detail & Related papers (2025-08-08T12:03:56Z)
A Weighted K-Center Algorithm for Data Subset Selection [70.49696246526199]
Subset selection is a fundamental problem that can play a key role in identifying smaller portions of the training data. We develop a novel factor 3-approximation algorithm to compute subsets based on the weighted sum of both k-center and uncertainty sampling objective functions.
arXiv Detail & Related papers (2023-12-17T04:41:07Z)
Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding [9.112203072394648]
Power-law scaling indicates that large-scale training with uniform sampling is prohibitively slow. Active learning methods aim to increase data efficiency by prioritizing learning on the most relevant examples.
arXiv Detail & Related papers (2023-12-08T19:26:13Z)
Back to Basics: A Simple Recipe for Improving Out-of-Domain Retrieval in Dense Encoders [63.28408887247742]
We study whether training procedures can be improved to yield better generalization capabilities in the resulting models. We recommend a simple recipe for training dense encoders: Train on MSMARCO with parameter-efficient methods, such as LoRA, and opt for using in-batch negatives unless given well-constructed hard negatives.
arXiv Detail & Related papers (2023-11-16T10:42:58Z)
Distributed Gradient Descent for Functional Learning [9.81463654618448]
We propose a novel distributed gradient descent functional learning (DGDFL) algorithm to tackle functional data across numerous local machines (processors) in the framework of reproducing kernel Hilbert space. Under mild conditions, confidence-based optimal learning rates of DGDFL are obtained without the saturation boundary on the regularity index suffered in previous works in functional regression.
arXiv Detail & Related papers (2023-05-12T12:15:42Z)
Generative Meta-Learning for Zero-Shot Relation Triplet Extraction [20.556880137419064]
Zero-shot Relation Triplet Extraction (ZeroRTE) aims to extract relation triplets from texts containing unseen relation types. Existing approaches typically leverage the knowledge embedded in pre-trained language models to accomplish the generalization process. We propose a generative meta-learning framework which exploits the learning-to-learn' ability of meta-learning to boost the generalization capability of generative models.
arXiv Detail & Related papers (2023-05-03T06:34:39Z)
Cross-Modal Fine-Tuning: Align then Refine [83.37294254884446]
ORCA is a cross-modal fine-tuning framework that extends the applicability of a single large-scale pretrained model to diverse modalities. We show that ORCA obtains state-of-the-art results on 3 benchmarks containing over 60 datasets from 12 modalities.
arXiv Detail & Related papers (2023-02-11T16:32:28Z)
Deep Active Ensemble Sampling For Image Classification [8.31483061185317]
Active learning frameworks aim to reduce the cost of data annotation by actively requesting the labeling for the most informative data points. Some proposed approaches include uncertainty-based techniques, geometric methods, implicit combination of uncertainty-based and geometric approaches. We present an innovative integration of recent progress in both uncertainty-based and geometric frameworks to enable an efficient exploration/exploitation trade-off in sample selection strategy. Our framework provides two advantages: (1) accurate posterior estimation, and (2) tune-able trade-off between computational overhead and higher accuracy.
arXiv Detail & Related papers (2022-10-11T20:20:20Z)
Explored An Effective Methodology for Fine-Grained Snake Recognition [8.908667065576632]
We design a strong multimodal backbone to utilize various meta-information to assist in fine-grained identification. In order to take full advantage of unlabeled datasets, we use self-supervised learning and supervised learning joint training. Our method can achieve a macro f1 score 92.7% and 89.4% on private and public dataset, respectively, which is the 1st place among the participators on private leaderboard.
arXiv Detail & Related papers (2022-07-24T02:19:15Z)
Bi-level Alignment for Cross-Domain Crowd Counting [113.78303285148041]
Current methods rely on external data for training an auxiliary task or apply an expensive coarse-to-fine estimation. We develop a new adversarial learning based method, which is simple and efficient to apply. We evaluate our approach on five real-world crowd counting benchmarks, where we outperform existing approaches by a large margin.
arXiv Detail & Related papers (2022-05-12T02:23:25Z)
Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning [61.488646649045215]
Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices)
arXiv Detail & Related papers (2021-11-28T19:03:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.