Unsupervised Domain Adaptation Via Data Pruning
- URL: http://arxiv.org/abs/2409.12076v1
- Date: Wed, 18 Sep 2024 15:48:59 GMT
- Title: Unsupervised Domain Adaptation Via Data Pruning
- Authors: Andrea Napoli, Paul White
- Abstract summary: We consider the problem from the perspective of unsupervised domain adaptation (UDA).
We propose AdaPrune, a method for UDA whereby training examples are removed to attempt to align the training distribution to that of the target data.
As a method for UDA, we show that AdaPrune outperforms related techniques, and is complementary to other UDA algorithms such as CORAL.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The removal of carefully-selected examples from training data has recently emerged as an effective way of improving the robustness of machine learning models. However, the best way to select these examples remains an open question. In this paper, we consider the problem from the perspective of unsupervised domain adaptation (UDA). We propose AdaPrune, a method for UDA whereby training examples are removed to attempt to align the training distribution to that of the target data. By adopting the maximum mean discrepancy (MMD) as the criterion for alignment, the problem can be neatly formulated and solved as an integer quadratic program. We evaluate our approach on a real-world domain shift task of bioacoustic event detection. As a method for UDA, we show that AdaPrune outperforms related techniques, and is complementary to other UDA algorithms such as CORAL. Our analysis of the relationship between the MMD and model accuracy, along with t-SNE plots, validate the proposed method as a principled and well-founded way of performing data pruning.
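The abstract describes the mechanism but not the exact program. Under the standard biased MMD² estimator, selecting a fixed-size subset of the n source points via a binary vector z plausibly yields the following objective (this reconstruction is an assumption, not taken from the paper):

$$\min_{z \in \{0,1\}^n,\; \mathbf{1}^\top z = k} \;\; \frac{z^\top K_{ss}\, z}{k^2} \;-\; \frac{2\, z^\top K_{st}\, \mathbf{1}}{k\, m},$$

where $(K_{ss})_{ij} = \kappa(s_i, s_j)$ and $(K_{st})_{ij} = \kappa(s_i, t_j)$ for a kernel $\kappa$, $m$ is the number of target points, and the $z$-independent target-target term is dropped; with $k$ fixed, this is an integer quadratic program. Below is a minimal NumPy sketch of the same idea that replaces an exact IQP solver with a greedy backward-elimination heuristic; `rbf_kernel`, `prune_by_mmd`, `gamma`, and `n_keep` are illustrative names, not the authors' API.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise RBF kernel: k(x, y) = exp(-gamma * ||x - y||^2).
    sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def prune_by_mmd(source, target, n_keep, gamma=1.0):
    """Greedily remove source rows to shrink the biased MMD^2 between
    the retained source set and the target set (heuristic stand-in
    for the integer quadratic program described in the paper)."""
    K_ss = rbf_kernel(source, source, gamma)   # source-source kernel
    K_st = rbf_kernel(source, target, gamma)   # source-target kernel
    m = len(target)
    keep = list(range(len(source)))
    while len(keep) > n_keep:
        best_i, best_val = None, np.inf
        for i in keep:                          # try dropping each point
            cand = [j for j in keep if j != i]
            k = len(cand)
            # Subset-dependent part of MMD^2(retained source, target):
            val = (K_ss[np.ix_(cand, cand)].sum() / k ** 2
                   - 2.0 * K_st[cand].sum() / (k * m))
            if val < best_val:
                best_i, best_val = i, val
        keep.remove(best_i)
    return np.array(keep)

# Toy usage: half the source is shifted away from the target,
# so pruning should retain mostly the overlapping half.
rng = np.random.default_rng(0)
source = np.vstack([rng.normal(0, 1, (40, 2)), rng.normal(4, 1, (40, 2))])
target = rng.normal(0, 1, (60, 2))
kept = prune_by_mmd(source, target, n_keep=40)
```

The greedy loop is O(n²) per removal and is only meant to make the objective concrete; the paper's exact IQP formulation would be handed to a dedicated solver instead.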
Related papers
- Informative Data Mining for One-Shot Cross-Domain Semantic Segmentation [84.82153655786183]
We propose a novel framework called Informative Data Mining (IDM) to enable efficient one-shot domain adaptation for semantic segmentation.
IDM provides an uncertainty-based selection criterion to identify the most informative samples, which facilitates quick adaptation and reduces redundant training.
Our approach outperforms existing methods and achieves a new state-of-the-art one-shot performance of 56.7%/55.4% on the GTA5/SYNTHIA to Cityscapes adaptation tasks.
arXiv Detail & Related papers (2023-09-25T15:56:01Z) - Can We Evaluate Domain Adaptation Models Without Target-Domain Labels? [36.05871459064825]
Unsupervised domain adaptation (UDA) involves adapting a model trained on a label-rich source domain to an unlabeled target domain.
In real-world scenarios, the absence of target-domain labels makes it challenging to evaluate the performance of UDA models.
We propose a novel metric called the Transfer Score to address these issues.
arXiv Detail & Related papers (2023-05-30T03:36:40Z) - MAPS: A Noise-Robust Progressive Learning Approach for Source-Free
Domain Adaptive Keypoint Detection [76.97324120775475]
Cross-domain keypoint detection methods always require accessing the source data during adaptation.
This paper considers source-free domain adaptive keypoint detection, where only the well-trained source model is provided to the target domain.
arXiv Detail & Related papers (2023-02-09T12:06:08Z) - SRoUDA: Meta Self-training for Robust Unsupervised Domain Adaptation [25.939292305808934]
Unsupervised domain adaptation (UDA) can transfer knowledge learned from a rich-label dataset to an unlabeled target dataset.
In this paper, we present a new meta self-training pipeline, named SRoUDA, for improving adversarial robustness of UDA models.
arXiv Detail & Related papers (2022-12-12T14:25:40Z) - LAMA-Net: Unsupervised Domain Adaptation via Latent Alignment and
Manifold Learning for RUL Prediction [0.0]
We propose LAMA-Net, a Transformer-based encoder-decoder model with an induced bottleneck, latent alignment using Maximum Mean Discrepancy (MMD), and manifold learning.
The proposed method offers a promising approach to perform domain adaptation in RUL prediction.
arXiv Detail & Related papers (2022-08-17T16:28:20Z) - Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold (a minimal sketch of this idea follows the list).
arXiv Detail & Related papers (2022-01-11T23:01:12Z) - UDALM: Unsupervised Domain Adaptation through Language Modeling [79.73916345178415]
We introduce UDALM, a fine-tuning procedure using a mixed classification and Masked Language Model loss.
Our experiments show that the performance of models trained with the mixed loss scales with the amount of available target data, and that the mixed loss can be effectively used as a stopping criterion.
Our method is evaluated on twelve domain pairs of the Amazon Reviews Sentiment dataset, yielding 91.74% accuracy, which is a 1.11% absolute improvement over the state-of-the-art.
arXiv Detail & Related papers (2021-04-14T19:05:01Z) - DEALIO: Data-Efficient Adversarial Learning for Imitation from
Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z) - Robust Optimal Transport with Applications in Generative Modeling and
Domain Adaptation [120.69747175899421]
Optimal Transport (OT) distances such as Wasserstein have been used in several areas such as GANs and domain adaptation.
We propose a computationally-efficient dual form of the robust OT optimization that is amenable to modern deep learning applications.
Our approach can train state-of-the-art GAN models on noisy datasets corrupted with outlier distributions.
arXiv Detail & Related papers (2020-10-12T17:13:40Z) - Instance Adaptive Self-Training for Unsupervised Domain Adaptation [19.44504738538047]
We propose an instance adaptive self-training framework for UDA on the task of semantic segmentation.
To effectively improve the quality of pseudo-labels, we develop a novel pseudo-label generation strategy with an instance adaptive selector.
Our method is concise and efficient, and can easily be generalized to other unsupervised domain adaptation methods.
arXiv Detail & Related papers (2020-08-27T15:50:27Z)
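Regarding the ATC entry above: the paper admits different confidence scores (e.g., max softmax probability or negative entropy); the sketch below assumes max softmax confidence and uses labeled source validation data to fit the threshold. The function names (`fit_atc_threshold`, `estimate_target_accuracy`) are illustrative, not the authors' API.

```python
import numpy as np

def fit_atc_threshold(val_conf, val_correct):
    # ATC's calibration step: pick t so that the fraction of source
    # validation points with confidence above t matches the validation
    # accuracy. That fraction is monotone in t, so t is a quantile.
    target_acc = val_correct.mean()
    return np.quantile(val_conf, 1.0 - target_acc)

def estimate_target_accuracy(target_conf, t):
    # ATC estimate: fraction of unlabeled target examples whose
    # confidence exceeds the learned threshold.
    return (target_conf > t).mean()

# Toy usage with synthetic confidences.
rng = np.random.default_rng(0)
val_conf = rng.uniform(0.5, 1.0, 1000)
val_correct = rng.uniform(0, 1, 1000) < val_conf  # accuracy tracks confidence
t = fit_atc_threshold(val_conf, val_correct)
print(estimate_target_accuracy(rng.uniform(0.4, 1.0, 500), t))
```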
This list is automatically generated from the titles and abstracts of the papers in this site.