Project and Probe: Sample-Efficient Domain Adaptation by Interpolating
Orthogonal Features
- URL: http://arxiv.org/abs/2302.05441v2
- Date: Thu, 25 May 2023 04:12:52 GMT
- Title: Project and Probe: Sample-Efficient Domain Adaptation by Interpolating
Orthogonal Features
- Authors: Annie S. Chen, Yoonho Lee, Amrith Setlur, Sergey Levine, Chelsea Finn
- Abstract summary: We propose a lightweight, sample-efficient approach that learns a diverse set of features and adapts to a target distribution by interpolating these features.
Our experiments on four datasets, with multiple distribution shift settings for each, show that Pro$^2$ improves performance by 5-15% when given limited target data.
- Score: 119.22672589020394
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transfer learning with a small amount of target data is an effective and
common approach to adapting a pre-trained model to distribution shifts. In some
situations, target data labels may be expensive to obtain, so we may only have
access to a limited number of target data points. To make the most of a very
small target dataset, we propose a lightweight, sample-efficient approach that
learns a diverse set of features and adapts to a target distribution by
interpolating these features. Our approach, Project and Probe (Pro$^2$), first
learns a linear projection that maps a pre-trained embedding onto orthogonal
directions while being predictive of labels in the source dataset. The goal of
this step is to learn a variety of predictive features, so that at least some
of them remain useful after distribution shift. Pro$^2$ then learns a linear
classifier on top of these projected features using a small target dataset.
Theoretically, we find that Pro$^2$ results in more sample-efficient
generalization by inducing a favorable bias-variance tradeoff. Our experiments
on four datasets, with multiple distribution shift settings for each, show that
Pro$^2$ improves performance by 5-15% when given limited target data compared
to prior methods such as standard linear probing.
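As a concrete illustration of the two stages described in the abstract, here is a minimal numpy/scikit-learn sketch, assuming binary labels and frozen pre-trained embeddings; the sequential Gram-Schmidt scheme used to enforce orthogonality is one plausible implementation, not the authors' released code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def project(src_X, src_y, num_dirs=8, lr=0.1, steps=500):
    """Stage 1 (Project): learn orthogonal directions that predict source labels."""
    d = src_X.shape[1]
    dirs, rng = [], np.random.default_rng(0)
    for _ in range(num_dirs):
        w = rng.normal(size=d) / np.sqrt(d)
        for _ in range(steps):
            p = 1.0 / (1.0 + np.exp(-(src_X @ w)))        # logistic predictions
            w -= lr * src_X.T @ (p - src_y) / len(src_y)  # gradient step on source loss
            for u in dirs:                                 # Gram-Schmidt: stay orthogonal
                w -= (w @ u) * u                           # to previously found directions
        dirs.append(w / np.linalg.norm(w))
    return np.stack(dirs)                                  # (num_dirs, d) projection

def probe(P, tgt_X, tgt_y):
    """Stage 2 (Probe): linear classifier on projected target features."""
    return LogisticRegression().fit(tgt_X @ P.T, tgt_y)
```

With `num_dirs` much smaller than the embedding dimension, the probe has only a handful of weights to fit from the few target labels, which is where the favorable bias-variance tradeoff claimed in the abstract comes from.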
Related papers
- Get more for less: Principled Data Selection for Warming Up Fine-Tuning in LLMs [18.242110417706]
This work focuses on leveraging and selecting from vast, unlabeled, open data to pre-fine-tune a pre-trained language model.
We show the optimality of this approach for fine-tuning tasks under certain conditions.
Our proposed method is significantly faster than existing techniques, scaling to millions of samples within a single GPU hour.
arXiv Detail & Related papers (2024-05-05T00:08:00Z) - Probabilistic Contrastive Learning for Long-Tailed Visual Recognition [78.70453964041718]
Long-tailed distributions frequently emerge in real-world data, where a large number of minority categories contain a limited number of samples.
Recent investigations have revealed that supervised contrastive learning exhibits promising potential in alleviating the data imbalance.
We propose a novel probabilistic contrastive (ProCo) learning algorithm that estimates the data distribution of the samples from each class in the feature space.
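A hedged sketch of the per-class distribution estimation this summary describes: model each class's L2-normalized features with a von Mises-Fisher distribution. The concentration formula below is the standard Banerjee et al. approximation, assumed here rather than taken from the ProCo paper's code.

```python
import numpy as np

def fit_vmf(class_feats):
    """Estimate a von Mises-Fisher distribution from one class's
    L2-normalized features of shape (n, d)."""
    mean = class_feats.mean(axis=0)
    r_bar = np.linalg.norm(mean)                     # mean resultant length
    mu = mean / r_bar                                # mean direction on the sphere
    d = class_feats.shape[1]
    kappa = r_bar * (d - r_bar**2) / (1 - r_bar**2)  # concentration (approximation)
    return mu, kappa
```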
arXiv Detail & Related papers (2024-03-11T13:44:49Z) - Variance Alignment Score: A Simple But Tough-to-Beat Data Selection
Method for Multimodal Contrastive Learning [17.40655778450583]
We propose a principled metric named Variance Alignment Score (VAS), which has the form $\langle \Sigma_{\text{test}}, \Sigma_i \rangle$.
We show that applying VAS and CLIP scores together can outperform baselines by a margin of $1.3\%$ on 38 evaluation sets for the noisy dataset DataComp and $2.5\%$ on VTAB for the high-quality dataset CC12M.
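Reading $\Sigma_i$ as the outer product $x_i x_i^\top$ of a single candidate's embedding (an assumption on my part), the inner product reduces to $x_i^\top \Sigma_{\text{test}} x_i$ and can be scored in one batched expression:

```python
import numpy as np

def vas_scores(cand_emb, test_emb):
    """cand_emb: (n, d) candidate embeddings; test_emb: (m, d) test proxy set."""
    sigma_test = test_emb.T @ test_emb / len(test_emb)   # (d, d) second moment
    # x_i^T Sigma_test x_i for every candidate at once
    return np.einsum("nd,de,ne->n", cand_emb, sigma_test, cand_emb)
```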
arXiv Detail & Related papers (2024-02-03T06:29:04Z) - Few-shot Adaptation to Distribution Shifts By Mixing Source and Target Embeddings [16.009815290729904]
MixPro is a lightweight and highly data-efficient approach for few-shot adaptation.
We show that MixPro can outperform baselines by up to 7%, with only 2-4 target examples.
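The summary gives no implementation detail, so the following is only a guess at the mixing step: average each target embedding with a random source embedding before fitting a linear probe. The mixing weight `alpha` and the choice to keep the target label for mixed points are hypothetical, not values from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def mix_probe(src_X, tgt_X, tgt_y, alpha=0.5, seed=0):
    """Fit a probe on convex combinations of target and source embeddings."""
    rng = np.random.default_rng(seed)
    pairs = rng.integers(len(src_X), size=len(tgt_X))    # random source partner
    mixed = alpha * tgt_X + (1.0 - alpha) * src_X[pairs]
    # train on both the mixed and the raw target embeddings, target labels kept
    X = np.concatenate([mixed, tgt_X])
    y = np.concatenate([tgt_y, tgt_y])
    return LogisticRegression().fit(X, y)
```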
arXiv Detail & Related papers (2023-05-23T20:49:45Z) - Deep Active Learning with Contrastive Learning Under Realistic Data Pool
Assumptions [2.578242050187029]
Active learning aims to identify the most informative data from an unlabeled data pool that enables a model to reach the desired accuracy rapidly.
Most existing active learning methods have been evaluated in an ideal setting where only samples relevant to the target task exist in an unlabeled data pool.
We introduce new active learning benchmarks that include ambiguous, task-irrelevant out-of-distribution samples as well as in-distribution samples.
arXiv Detail & Related papers (2023-03-25T10:46:10Z) - Data Selection for Language Models via Importance Resampling [90.9263039747723]
We formalize the problem of selecting a subset of a large raw unlabeled dataset to match a desired target distribution.
We extend the classic importance resampling approach used in low-dimensional settings to LM data selection (DSIR).
We instantiate the DSIR framework with hashed n-gram features for efficiency, enabling the selection of 100M documents in 4.5 hours.
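A minimal sketch of the hashed n-gram recipe described above: estimate bucket distributions for the target and raw pools, weight each raw document by the likelihood ratio, and resample with the Gumbel top-k trick. The bucket count, smoothing, and sampling details are assumptions, not values from the paper.

```python
import numpy as np

NUM_BUCKETS = 10_000  # assumed; the paper hashes n-grams into a fixed table

def buckets(text, n=2):
    toks = text.lower().split()
    grams = toks + [" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)]
    return [hash(g) % NUM_BUCKETS for g in grams]

def bucket_logprobs(texts):
    counts = np.ones(NUM_BUCKETS)                      # add-one smoothing
    for t in texts:
        for b in buckets(t):
            counts[b] += 1
    return np.log(counts / counts.sum())

def dsir_select(raw_texts, target_texts, k, seed=0):
    log_p = bucket_logprobs(target_texts)              # target feature dist.
    log_q = bucket_logprobs(raw_texts)                 # raw-pool feature dist.
    log_w = np.array([sum(log_p[b] - log_q[b] for b in buckets(t))
                      for t in raw_texts])             # log importance weights
    rng = np.random.default_rng(seed)
    gumbel = -np.log(-np.log(rng.random(len(raw_texts))))
    return np.argsort(log_w + gumbel)[-k:]             # Gumbel top-k resampling
```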
arXiv Detail & Related papers (2023-02-06T23:57:56Z) - Lightweight Conditional Model Extrapolation for Streaming Data under
Class-Prior Shift [27.806085423595334]
We introduce LIMES, a new method for learning with non-stationary streaming data.
We learn a single set of model parameters from which a classifier for any specific data distribution can be derived.
Experiments on a set of exemplary tasks using Twitter data show that LIMES achieves higher accuracy than alternative approaches.
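LIMES's exact update rule is not in the summary, but the standard correction for class-prior shift, the general idea behind deriving a per-distribution classifier from one parameter set, looks like this (a sketch, not the paper's method):

```python
import numpy as np

def adapt_to_prior(logits, src_prior, new_prior):
    """Rescale class probabilities from a model trained under src_prior
    so they are calibrated for new_prior (Bayes-rule reweighting).

    logits: (n, C) scores; src_prior, new_prior: (C,) class frequencies."""
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)                # p(y|x) under source prior
    p_adj = p * (new_prior / src_prior)              # reweight by prior ratio
    return p_adj / p_adj.sum(axis=1, keepdims=True)
```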
arXiv Detail & Related papers (2022-06-10T15:19:52Z) - Deep learning model solves change point detection for multiple change
types [69.77452691994712]
Change point detection aims to catch abrupt shifts in a data distribution.
We propose an approach that works in the multiple-distributions scenario.
arXiv Detail & Related papers (2022-04-15T09:44:21Z) - Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold.
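A minimal sketch of the ATC estimate, using max-softmax confidence as the score (the paper also considers scores such as negative entropy): choose the threshold so that the fraction of labeled source points above it matches source accuracy, then report the fraction of unlabeled target points above that threshold.

```python
import numpy as np

def atc_estimate(src_conf, src_correct, tgt_conf):
    """src_conf/tgt_conf: per-example confidence scores;
    src_correct: 0/1 correctness on labeled source data."""
    acc = src_correct.mean()
    t = np.quantile(src_conf, 1.0 - acc)   # P(conf > t) on source == acc
    return (tgt_conf > t).mean()           # predicted target accuracy
```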
arXiv Detail & Related papers (2022-01-11T23:01:12Z) - How to distribute data across tasks for meta-learning? [59.608652082495624]
We show that the optimal number of data points per task depends on the budget, but it converges to a unique constant value for large budgets.
Our results suggest a simple and efficient procedure for data collection.
arXiv Detail & Related papers (2021-03-15T15:38:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.