Few-shot Adaptation to Distribution Shifts By Mixing Source and Target Embeddings
- URL: http://arxiv.org/abs/2305.14521v3
- Date: Wed, 29 May 2024 22:38:13 GMT
- Title: Few-shot Adaptation to Distribution Shifts By Mixing Source and Target Embeddings
- Authors: Yihao Xue, Ali Payani, Yu Yang, Baharan Mirzasoleiman
- Abstract summary: MixPro is a lightweight and highly data-efficient approach for few-shot adaptation.
We show that MixPro can outperform baselines by up to 7%, with only 2-4 target examples.
- Score: 16.009815290729904
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pretrained machine learning models need to be adapted to distribution shifts when deployed in new target environments. When obtaining labeled data from the target distribution is expensive, few-shot adaptation with only a few examples from the target distribution becomes essential. In this work, we propose MixPro, a lightweight and highly data-efficient approach for few-shot adaptation. MixPro first generates a relatively large dataset by mixing (linearly combining) pre-trained embeddings of large source data with those of the few target examples. This process preserves important features of both source and target distributions, while mitigating the specific noise in the small target data. Then, it trains a linear classifier on the mixed embeddings to effectively adapt the model to the target distribution without overfitting to the small target data. Theoretically, we demonstrate the advantages of MixPro over previous methods. Our experiments, conducted across various model architectures on 8 datasets featuring different types of distribution shifts, reveal that MixPro can outperform baselines by up to 7%, with only 2-4 target examples.
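The abstract describes a two-step recipe: mix pre-trained source embeddings with the few target embeddings, then fit a linear classifier on the mixed set. Below is a minimal sketch of that recipe; the Beta mixing distribution, the same-class pairing rule, and all function and variable names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def mixpro_sketch(src_emb, src_y, tgt_emb, tgt_y, n_mixed=5000, alpha=0.5, seed=0):
    """Minimal sketch of the MixPro recipe described in the abstract:
    linearly combine pre-trained source embeddings with the few target
    embeddings, then train a linear classifier on the mixed embeddings.
    The Beta(alpha, alpha) mixing weights and same-class pairing are
    illustrative assumptions, not the paper's exact construction."""
    rng = np.random.default_rng(seed)
    mixed_x, mixed_y = [], []
    for _ in range(n_mixed):
        t = rng.integers(len(tgt_emb))                # one of the 2-4 target examples
        same = np.flatnonzero(src_y == tgt_y[t])      # pair it with a same-class source example
        s = rng.choice(same) if len(same) else rng.integers(len(src_emb))
        lam = rng.beta(alpha, alpha)                  # mixing weight in [0, 1]
        mixed_x.append(lam * src_emb[s] + (1.0 - lam) * tgt_emb[t])
        mixed_y.append(tgt_y[t])
    return LogisticRegression(max_iter=1000).fit(np.asarray(mixed_x), np.asarray(mixed_y))
```

Because the classifier only ever sees mixed embeddings, the target examples contribute their distribution-specific features while the large source side dampens the noise of having only a handful of target points, matching the intuition stated in the abstract.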
Related papers
- Distributionally Robust Safe Sample Elimination under Covariate Shift [16.85444622474742]
We consider a machine learning setup where one training dataset is used to train multiple models across slightly different data distributions.
We propose the DRSSS method, which combines distributionally robust (DR) optimization and safe sample screening (SSS).
The key benefit of this method is that models trained on the reduced dataset will perform the same as those trained on the full dataset, across all possible environments.
arXiv Detail & Related papers (2024-06-10T01:46:42Z)
- Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance [55.872926690722714]
We study how predictable model performance is as a function of the data mixture proportions.
We propose nested use of the scaling laws of training steps, model sizes, and our data mixing law.
Our method effectively optimizes the training mixture of a 1B model trained for 100B tokens on RedPajama.
arXiv Detail & Related papers (2024-03-25T17:14:00Z)
- Task-customized Masked AutoEncoder via Mixture of Cluster-conditional Experts [104.9871176044644]
Masked Autoencoder (MAE) is a prevailing self-supervised learning method that achieves promising results in model pre-training.
We propose a novel MAE-based pre-training paradigm, Mixture of Cluster-conditional Experts (MoCE).
MoCE trains each expert only with semantically relevant images by using cluster-conditional gates.
arXiv Detail & Related papers (2024-02-08T03:46:32Z)
- Restricted Generative Projection for One-Class Classification and Anomaly Detection [31.173234437065464]
We learn a mapping that transforms the unknown distribution of the training (normal) data into a known target distribution chosen to be simple, compact, and informative.
Simplicity ensures that the target distribution is easy to sample from.
Compactness ensures a clear decision boundary between normal and abnormal data.
Informativeness ensures that the transformed data preserve the important information of the original data.
arXiv Detail & Related papers (2023-07-09T04:59:10Z)
- Project and Probe: Sample-Efficient Domain Adaptation by Interpolating Orthogonal Features [119.22672589020394]
We propose a lightweight, sample-efficient approach that learns a diverse set of features and adapts to a target distribution by interpolating these features.
Our experiments on four datasets, with multiple distribution shift settings for each, show that Pro² improves performance by 5-15% when given limited target data.
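The summary above describes a project-then-probe recipe. The sketch below illustrates one plausible instantiation: obtain a small orthogonal basis from source embeddings (here via an SVD, an assumption for illustration; the paper learns its own diverse feature set), project the few target embeddings onto it, and fit a linear probe.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def project_and_probe_sketch(src_emb, tgt_emb, tgt_y, n_dirs=16):
    """Hedged sketch of the project-then-probe idea summarized above.
    Taking the top right-singular vectors of the source embeddings as the
    orthogonal directions is an illustrative stand-in for the diverse
    features the paper actually learns."""
    # Project: build a small orthonormal basis from the source embeddings.
    centered = src_emb - src_emb.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_dirs].T                             # shape (emb_dim, n_dirs)
    # Probe: fit a linear classifier on the few projected target examples.
    probe = LogisticRegression(max_iter=1000).fit(tgt_emb @ basis, tgt_y)
    return basis, probe
```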
arXiv Detail & Related papers (2023-02-10T18:58:03Z)
- Lightweight Conditional Model Extrapolation for Streaming Data under Class-Prior Shift [27.806085423595334]
We introduce LIMES, a new method for learning with non-stationary streaming data.
We learn a single set of model parameters from which a classifier for any specific data distribution can be derived.
Experiments on a set of exemplary tasks using Twitter data show that LIMES achieves higher accuracy than alternative approaches.
arXiv Detail & Related papers (2022-06-10T15:19:52Z)
- A Data Cartography based MixUp for Pre-trained Language Models [47.90235939359225]
MixUp is a data augmentation strategy where additional samples are generated during training by combining random pairs of training samples and their labels.
We propose TDMixUp, a novel MixUp strategy that leverages Training Dynamics and allows more informative samples to be combined for generating new data samples.
We empirically validate that our method achieves competitive performance using a smaller subset of the training data compared with strong baselines, and that it also yields lower expected calibration error for the pre-trained language model BERT, in both in-domain and out-of-domain settings across a wide range of NLP tasks.
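Since vanilla MixUp is the building block being extended here, a minimal sketch of it follows; the Training-Dynamics-based selection of which samples get combined, which is TDMixUp's contribution, is not reproduced.

```python
import numpy as np

def mixup_batch(x, y_onehot, alpha=0.2, rng=None):
    """Vanilla MixUp: convexly combine random pairs of training examples and
    their (one-hot) labels. TDMixUp additionally restricts which pairs are
    combined using training dynamics; that selection step is omitted here."""
    rng = rng or np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)                      # one mixing weight per batch
    perm = rng.permutation(len(x))                    # random pairing within the batch
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mix, y_mix
```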
arXiv Detail & Related papers (2022-05-06T17:59:19Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts target accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold.
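The ATC recipe summarized above can be sketched in a few lines: calibrate a confidence threshold on held-out labeled source data so that the fraction of source examples above it matches source accuracy, then report the fraction of unlabeled target examples above that same threshold. Using the maximum softmax probability as the confidence score is one possible choice; the names below are illustrative.

```python
import numpy as np

def atc_predicted_accuracy(src_probs, src_labels, tgt_probs):
    """Sketch of Average Thresholded Confidence (ATC) with a max-softmax score.
    Pick a threshold so that the share of source examples above it equals the
    source accuracy, then predict target accuracy as the share of unlabeled
    target examples whose confidence clears the same threshold."""
    src_conf = src_probs.max(axis=1)
    src_acc = (src_probs.argmax(axis=1) == src_labels).mean()
    threshold = np.quantile(src_conf, 1.0 - src_acc)  # fraction above ~= src_acc
    return float((tgt_probs.max(axis=1) > threshold).mean())
```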
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- On-target Adaptation [82.77980951331854]
Domain adaptation seeks to mitigate the shift between training on the source domain and testing on the target domain.
Most adaptation methods rely on the source data, jointly optimizing over source and target data.
We show significant improvement by on-target adaptation, which learns the representation purely from target data.
arXiv Detail & Related papers (2021-09-02T17:04:18Z)