Understanding Gradual Domain Adaptation: Improved Analysis, Optimal Path
and Beyond
- URL: http://arxiv.org/abs/2204.08200v1
- Date: Mon, 18 Apr 2022 07:39:23 GMT
- Title: Understanding Gradual Domain Adaptation: Improved Analysis, Optimal Path
and Beyond
- Authors: Haoxiang Wang, Bo Li, Han Zhao
- Abstract summary: Gradual domain adaptation (GDA) assumes a path of $(T-1)$ unlabeled intermediate domains bridging the source and target.
We prove a significantly improved generalization bound as $\widetilde{O}\left(\varepsilon_0 + T\Delta + T/\sqrt{n} + 1/\sqrt{nT}\right)$, where $\Delta$ is the average distributional distance between consecutive domains.
- Score: 20.518134448156744
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The vast majority of existing algorithms for unsupervised domain adaptation
(UDA) focus on adapting from a labeled source domain to an unlabeled target
domain directly in a one-off way. Gradual domain adaptation (GDA), on the other
hand, assumes a path of $(T-1)$ unlabeled intermediate domains bridging the
source and target, and aims to provide better generalization in the target
domain by leveraging the intermediate ones. Under certain assumptions, Kumar et
al. (2020) proposed a simple algorithm, Gradual Self-Training, along with a
generalization bound in the order of $e^{O(T)}\left(\varepsilon_0+O\left(\sqrt{\log(T)/n}\right)\right)$
for the target domain
error, where $\varepsilon_0$ is the source domain error and $n$ is the data
size of each domain. Due to the exponential factor, this upper bound becomes
vacuous when $T$ is only moderately large. In this work, we analyze gradual
self-training under more general and relaxed assumptions, and prove a
significantly improved generalization bound as
$\widetilde{O}\left(\varepsilon_0 + T\Delta + T/\sqrt{n} + 1/\sqrt{nT}\right)$,
where $\Delta$ is the average distributional distance between consecutive
domains. Compared with the existing bound with an exponential dependency on $T$
as a multiplicative factor, our bound only depends on $T$ linearly and
additively. Perhaps more interestingly, our result implies the existence of an
optimal choice of $T$ that minimizes the generalization error, and it also
naturally suggests an optimal way to construct the path of intermediate domains
so as to minimize the accumulative path length $T\Delta$ between the source and
target. To corroborate the implications of our theory, we examine gradual
self-training on multiple semi-synthetic and real datasets, which confirms our
findings. We believe our insights provide a path forward toward the design of
future GDA algorithms.
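To make the object of the analysis concrete, below is a minimal sketch of gradual self-training in Python with scikit-learn, assuming each domain arrives as a NumPy array and using plain logistic regression as the base learner. It follows the generic recipe of Kumar et al. (2020) named in the abstract; it is an illustration rather than the authors' experimental code, and it omits practical details such as regularization choices and filtering of low-confidence pseudo-labels.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def gradual_self_train(x_source, y_source, intermediate_domains, x_target):
    """Minimal sketch of gradual self-training (Kumar et al., 2020).

    x_source, y_source   : labeled source data
    intermediate_domains : list of T-1 unlabeled arrays, ordered from
                           source-like to target-like
    x_target             : unlabeled target data
    Returns the final classifier after pseudo-labeling each domain in turn.
    """
    # 1. Fit an initial model on the labeled source domain.
    model = LogisticRegression(max_iter=1000).fit(x_source, y_source)

    # 2. Walk along the path: pseudo-label each unlabeled domain with the
    #    current model, then retrain on those pseudo-labels only.
    for x_domain in list(intermediate_domains) + [x_target]:
        pseudo_labels = model.predict(x_domain)
        model = LogisticRegression(max_iter=1000).fit(x_domain, pseudo_labels)

    return model

# Toy usage on synthetic data whose class means shift gradually.
rng = np.random.default_rng(0)
n, T = 500, 10

def make_domain(shift):
    x0 = rng.normal(loc=-1 + shift, scale=1.0, size=(n, 2))
    x1 = rng.normal(loc=+1 + shift, scale=1.0, size=(n, 2))
    return np.vstack([x0, x1]), np.array([0] * n + [1] * n)

x_src, y_src = make_domain(0.0)
inter = [make_domain(2.0 * t / T)[0] for t in range(1, T)]
x_tgt, y_tgt = make_domain(2.0)
clf = gradual_self_train(x_src, y_src, inter, x_tgt)
print("target accuracy:", clf.score(x_tgt, y_tgt))
```

In this sketch, $T$ is simply the number of pseudo-labeling rounds. One reading of the bound above is that each round costs roughly $\Delta + 1/\sqrt{n}$ in additional error while the $1/\sqrt{nT}$ term rewards longer paths, and this tension is what can make an intermediate $T$ optimal. As a purely generic back-of-the-envelope calculation, not the paper's formal derivation, any bound whose $T$-dependent part has the shape $aT + b/\sqrt{T}$ is minimized at an interior point:

```latex
% Generic trade-off behind an optimal number of steps T; identifying a and b
% with the constants in the paper's bound is illustrative only.
g(T) = aT + \frac{b}{\sqrt{T}}, \qquad
g'(T) = a - \frac{b}{2\,T^{3/2}} = 0
\;\;\Longrightarrow\;\;
T^{*} = \left(\frac{b}{2a}\right)^{2/3}.
```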
Related papers
- MGDA Converges under Generalized Smoothness, Provably [27.87166415148172]
Multi-objective optimization (MOO) is receiving more attention in various fields such as multi-task learning.
Recent works provide some effective algorithms with theoretical analysis but they are limited by the standard $L$-smooth or bounded-gradient assumptions.
We study a more general and realistic class of generalized $\ell$-smooth loss functions, where $\ell$ is a general non-decreasing function of gradient norm.
arXiv Detail & Related papers (2024-05-29T18:36:59Z) - Learning Conditional Invariances through Non-Commutativity [4.820252317855078]
We show that a provably optimal and sample-efficient way of learning conditional invariances is by relaxing the invariance criterion to be non-commutatively directed towards the target domain.
We prove that non-commutativity steers the optimization towards $\Phi^*_\tau$ instead of $\varphi^*$, bringing the $\mathcal{H}$-divergence between domains down to zero.
arXiv Detail & Related papers (2024-02-18T19:12:18Z) - Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound of the iteration cost function or an upper bound of the expected length for the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - Gradual Domain Adaptation: Theory and Algorithms [15.278170387810409]
Unsupervised domain adaptation (UDA) adapts a model from a labeled source domain to an unlabeled target domain in a one-off way.
In this work, we first theoretically analyze gradual self-training, a popular GDA algorithm, and provide a significantly improved generalization bound.
We propose $\textbf{G}$enerative Gradual D$\textbf{O}$main $\textbf{A}$daptation with Optimal $\textbf{T}$ransport (GOAT). A toy sketch of generating intermediate domains along such a path appears after this list.
arXiv Detail & Related papers (2023-10-20T23:02:08Z) - Beyond Moments: Robustly Learning Affine Transformations with
Asymptotically Optimal Error [8.615625517708324]
We present a polynomial-time algorithm for learning an unknown affine transformation of the standard hypercube from samples.
Our algorithm is based on a new method that iteratively improves an estimate of the unknown affine transformation whenever the requirements of the certificate are not met.
arXiv Detail & Related papers (2023-02-23T19:13:30Z) - Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best policy identification in discounted linear Markov Decision Processes in the fixed-confidence setting under a generative model.
The lower bound, obtained as the solution of an intricate non-convex optimization program, can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z) - Domain-shift adaptation via linear transformations [11.541238742226199]
A predictor $f_A$ learned with data from a source domain (A) might not be accurate on a target domain (B) when their distributions are different.
We propose an approach to project the source and target domains into a lower-dimensional, common space.
We show the effectiveness of our approach in simulated data and in binary digit classification tasks, obtaining improvements up to 48% accuracy when correcting for the domain shift in the data.
arXiv Detail & Related papers (2022-01-14T02:49:03Z) - Efficient Hierarchical Domain Adaptation for Pretrained Language Models [77.02962815423658]
Generative language models are trained on diverse, general domain corpora.
We introduce a method to scale domain adaptation to many diverse domains using a computationally efficient adapter approach.
arXiv Detail & Related papers (2021-12-16T11:09:29Z) - Private Stochastic Convex Optimization: Optimal Rates in $\ell_1$
Geometry [69.24618367447101]
Up to logarithmic factors, the optimal excess population loss of any $(\varepsilon,\delta)$-differentially private algorithm is $\sqrt{\log(d)/n} + \sqrt{d}/(\varepsilon n)$.
We show that when the loss functions satisfy additional smoothness assumptions, the excess loss is upper bounded (up to logarithmic factors) by $\sqrt{\log(d)/n} + (\log(d)/(\varepsilon n))^{2/3}$.
arXiv Detail & Related papers (2021-03-02T06:53:44Z) - Learning Domain-invariant Graph for Adaptive Semi-supervised Domain
Adaptation with Few Labeled Source Samples [65.55521019202557]
Domain adaptation aims to generalize a model from a source domain to tackle tasks in a related but different target domain.
Traditional domain adaptation algorithms assume that enough labeled data, which are treated as prior knowledge, are available in the source domain.
We propose a Domain-invariant Graph Learning (DGL) approach for domain adaptation with only a few labeled source samples.
arXiv Detail & Related papers (2020-08-21T08:13:25Z) - Rethinking Distributional Matching Based Domain Adaptation [111.15106414932413]
Domain adaptation (DA) is a technique that transfers predictive models trained on a labeled source domain to an unlabeled target domain.
Most popular DA algorithms are based on distributional matching (DM).
In this paper, we first systematically analyze the limitations of DM based methods, and then build new benchmarks with more realistic domain shifts.
arXiv Detail & Related papers (2020-06-23T21:55:14Z)
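The main abstract's remark about constructing the path of intermediate domains so as to minimize the accumulated length $T\Delta$, and the GOAT entry above, both concern how intermediate domains are obtained when they are not observed. The toy sketch below manufactures evenly spaced intermediate inputs by linearly interpolating between randomly paired source and target points; it only illustrates the idea of placing domains along a short path and is not the GOAT algorithm or any construction from these papers (realistic constructions, e.g. ones based on optimal transport couplings, pair points far more carefully).

```python
import numpy as np

def interpolated_path(x_source, x_target, num_intermediate, seed=0):
    """Toy path construction: evenly spaced linear interpolation between
    randomly paired source and target inputs (illustration only)."""
    rng = np.random.default_rng(seed)
    m = min(len(x_source), len(x_target))
    # Randomly pair source and target points (a crude stand-in for a coupling).
    src = x_source[rng.permutation(len(x_source))[:m]]
    tgt = x_target[rng.permutation(len(x_target))[:m]]
    path = []
    for t in range(1, num_intermediate + 1):
        alpha = t / (num_intermediate + 1)   # evenly spaced in (0, 1)
        path.append((1.0 - alpha) * src + alpha * tgt)
    return path  # list of num_intermediate unlabeled arrays
```

A list produced this way can be passed as `intermediate_domains` to the gradual self-training sketch given after the main abstract.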