Clustering-Based Validation Splits for Domain Generalisation
- URL: http://arxiv.org/abs/2405.19461v1
- Date: Wed, 29 May 2024 19:21:17 GMT
- Title: Clustering-Based Validation Splits for Domain Generalisation
- Authors: Andrea Napoli, Paul White,
- Abstract summary: It is proposed that a high maximum mean discrepancy (MMD) between the training and validation sets increases the generalisability of selected models.
A data splitting algorithm based on kernel k-means clustering is presented.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper considers the problem of model selection under domain shift. In this setting, it is proposed that a high maximum mean discrepancy (MMD) between the training and validation sets increases the generalisability of selected models. A data splitting algorithm based on kernel k-means clustering, which maximises this objective, is presented. The algorithm leverages linear programming to control the size, label, and (optionally) group distributions of the splits, and comes with convergence guarantees. The technique consistently outperforms alternative splitting strategies across a range of datasets and training algorithms, for both domain generalisation (DG) and unsupervised domain adaptation (UDA) tasks. Analysis also shows the MMD between the training and validation sets to be strongly rank-correlated ($\rho=0.63$) with test domain accuracy, further substantiating the validity of this approach.
Related papers
- Distributed Linear Regression with Compositional Covariates [5.085889377571319]
We focus on the distributed sparse penalized linear log-contrast model in massive compositional data.
Two distributed optimization techniques are proposed for solving the two different constrained convex optimization problems.
In the decentralized topology, we introduce a distributed coordinate-wise descent algorithm for obtaining a communication-efficient regularized estimation.
arXiv Detail & Related papers (2023-10-21T11:09:37Z) - AdaptDHM: Adaptive Distribution Hierarchical Model for Multi-Domain CTR
Prediction [4.299153274884263]
We propose an elegant and flexible multi-distribution modeling paradigm, named Adaptive Distribution Hierarchical Model (AdaptDHM)
Our model achieves impressive prediction accuracy and its time cost during the training stage is more than 50% less than that of other models.
arXiv Detail & Related papers (2022-11-22T09:10:37Z) - Semi-supervised Domain Adaptive Structure Learning [72.01544419893628]
Semi-supervised domain adaptation (SSDA) is a challenging problem requiring methods to overcome both 1) overfitting towards poorly annotated data and 2) distribution shift across domains.
We introduce an adaptive structure learning method to regularize the cooperation of SSL and DA.
arXiv Detail & Related papers (2021-12-12T06:11:16Z) - Instance Level Affinity-Based Transfer for Unsupervised Domain
Adaptation [74.71931918541748]
We propose an instance affinity based criterion for source to target transfer during adaptation, called ILA-DA.
We first propose a reliable and efficient method to extract similar and dissimilar samples across source and target, and utilize a multi-sample contrastive loss to drive the domain alignment process.
We verify the effectiveness of ILA-DA by observing consistent improvements in accuracy over popular domain adaptation approaches on a variety of benchmark datasets.
arXiv Detail & Related papers (2021-04-03T01:33:14Z) - Model-Based Domain Generalization [96.84818110323518]
We propose a novel approach for the domain generalization problem called Model-Based Domain Generalization.
Our algorithms beat the current state-of-the-art methods on the very-recently-proposed WILDS benchmark by up to 20 percentage points.
arXiv Detail & Related papers (2021-02-23T00:59:02Z) - Cross-Domain Grouping and Alignment for Domain Adaptive Semantic
Segmentation [74.3349233035632]
Existing techniques to adapt semantic segmentation networks across the source and target domains within deep convolutional neural networks (CNNs) do not consider an inter-class variation within the target domain itself or estimated category.
We introduce a learnable clustering module, and a novel domain adaptation framework called cross-domain grouping and alignment.
Our method consistently boosts the adaptation performance in semantic segmentation, outperforming the state-of-the-arts on various domain adaptation settings.
arXiv Detail & Related papers (2020-12-15T11:36:21Z) - Towards Uncovering the Intrinsic Data Structures for Unsupervised Domain
Adaptation using Structurally Regularized Deep Clustering [119.88565565454378]
Unsupervised domain adaptation (UDA) is to learn classification models that make predictions for unlabeled data on a target domain.
We propose a hybrid model of Structurally Regularized Deep Clustering, which integrates the regularized discriminative clustering of target data with a generative one.
Our proposed H-SRDC outperforms all the existing methods under both the inductive and transductive settings.
arXiv Detail & Related papers (2020-12-08T08:52:00Z) - Learning Invariant Representations and Risks for Semi-supervised Domain
Adaptation [109.73983088432364]
We propose the first method that aims to simultaneously learn invariant representations and risks under the setting of semi-supervised domain adaptation (Semi-DA)
We introduce the LIRR algorithm for jointly textbfLearning textbfInvariant textbfRepresentations and textbfRisks.
arXiv Detail & Related papers (2020-10-09T15:42:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.