Super-model ecosystem: A domain-adaptation perspective
- URL: http://arxiv.org/abs/2208.14092v1
- Date: Tue, 30 Aug 2022 09:09:43 GMT
- Title: Super-model ecosystem: A domain-adaptation perspective
- Authors: Fengxiang He, Dacheng Tao
- Abstract summary: This paper attempts to establish the theoretical foundation for the emerging super-model paradigm via domain adaptation.
Super-model paradigms help reduce computational and data costs and carbon emissions, which is critical to the AI industry.
- Score: 101.76769818069072
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper attempts to establish the theoretical foundation for the emerging
super-model paradigm via domain adaptation, where one first trains a very
large-scale model, i.e., a super model (called a foundation model in some
other papers), on a large amount of data and then adapts it to various
specific domains. Super-model paradigms help reduce computational and data
costs and carbon emissions, which is critical to the AI industry, especially
for the enormous number of small and medium-sized enterprises. We model the
super-model paradigm as a two-stage
diffusion process: (1) in the pre-training stage, the model parameter diffuses
from random initials and converges to a steady distribution; and (2) in the
fine-tuning stage, the model parameter is transported to another steady
distribution. Both training stages can be mathematically modeled by
Ornstein-Uhlenbeck processes, which converge to two Maxwell-Boltzmann
distributions, respectively, each of which characterizes the corresponding
convergent model. An $\mathcal O(1/\sqrt{N})$ generalization bound is then
established via the PAC-Bayesian framework. The theory finds that the
generalization error of the fine-tuning stage is dominant in domain adaptation.
In addition, our theory suggests that the generalization is determined by a new
measure that characterizes the domain discrepancy between the source domain and
target domain, based on the covariance matrices and the shift of the converged
local minimum.
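To make the two-stage picture concrete, the following minimal sketch (illustrative only; the parameter values are assumptions, and the one-dimensional stationary law $N(\mu, \sigma^2/(2\theta))$ stands in for the Maxwell-Boltzmann distribution) simulates both stages as Euler-Maruyama discretizations of an Ornstein-Uhlenbeck process:

```python
import numpy as np

def ou_stage(x, mu, theta=1.0, sigma=0.5, dt=1e-2, steps=5_000, rng=None):
    """Euler-Maruyama simulation of dX = -theta (X - mu) dt + sigma dW.
    The stationary law is N(mu, sigma^2 / (2 * theta))."""
    rng = np.random.default_rng() if rng is None else rng
    for _ in range(steps):
        x = x - theta * (x - mu) * dt + sigma * np.sqrt(dt) * rng.standard_normal(x.shape)
    return x

rng = np.random.default_rng(0)
# Stage 1 (pre-training): diffuse from random initials to a steady distribution.
params = ou_stage(rng.standard_normal(10_000), mu=0.0, rng=rng)
# Stage 2 (fine-tuning): transport to another steady distribution, whose shifted
# mean stands in for the target-domain local minimum.
params = ou_stage(params, mu=2.0, rng=rng)
print(params.mean(), params.std())  # ~2.0 and ~sigma / sqrt(2 * theta) ~= 0.35
```

The shift of the mean and the (co)variance of the two stationary laws are exactly the ingredients from which the paper's domain-discrepancy measure is built.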
Related papers
- Flow matching achieves almost minimax optimal convergence [50.38891696297888]
Flow matching (FM) has gained significant attention as a simulation-free generative model.
This paper discusses the convergence properties of FM for large sample sizes under the $p$-Wasserstein distance.
We establish that FM can achieve an almost minimax optimal convergence rate for $1 \leq p \leq 2$, presenting the first theoretical evidence that FM can reach convergence rates comparable to those of diffusion models.
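For orientation, here is a minimal sketch of a simulation-free FM training objective with linear interpolation paths (a generic conditional flow matching loss, not necessarily the exact construction analyzed in the paper; `v` is a placeholder velocity network):

```python
import torch

def flow_matching_loss(v, x1):
    """Simulation-free (linear-path) conditional flow matching loss: regress the
    network v(x_t, t) onto the constant velocity x1 - x0 of the straight path
    x_t = (1 - t) x0 + t x1."""
    x0 = torch.randn_like(x1)                  # base samples ~ N(0, I)
    t = torch.rand(x1.shape[0], 1)             # uniform times in [0, 1]
    xt = (1 - t) * x0 + t * x1                 # point on the interpolating path
    return ((v(xt, t) - (x1 - x0)) ** 2).mean()
```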
arXiv Detail & Related papers (2024-05-31T14:54:51Z) - Transfer Learning for Diffusion Models [43.10840361752551]
Diffusion models consistently produce high-quality synthetic samples, but the
data they require can be impractical to collect in real-world applications due
to high collection costs or associated risks.
This paper introduces the Transfer Guided Diffusion Process (TGDP), a novel approach distinct from conventional finetuning and regularization methods.
arXiv Detail & Related papers (2024-05-27T06:48:58Z) - Adapting to Unknown Low-Dimensional Structures in Score-Based Diffusion Models [6.76974373198208]
We find that the dependency of the error incurred within each denoising step
on the ambient dimension $d$ is in general unavoidable; nevertheless, this work
provides the first theoretical demonstration that the DDPM sampler can adapt to
unknown low-dimensional structures in the target distribution.
arXiv Detail & Related papers (2024-05-23T17:59:10Z) - Reflected Schrödinger Bridge for Constrained Generative Modeling [16.72888494254555]
Reflected diffusion models have become the go-to method for large-scale generative modeling in real-world applications.
We introduce the Reflected Schrödinger Bridge algorithm: an entropy-regularized optimal transport approach tailored to generating data within diverse bounded domains.
Our algorithm yields robust generative modeling in diverse domains, and its scalability is demonstrated in real-world constrained generative modeling through standard image benchmarks.
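The blurb does not spell out the algorithm, but the basic primitive behind any reflected diffusion is easy to illustrate: an Euler-Maruyama step followed by mirror reflection at the domain boundary. A minimal sketch on the unit box (an assumption for illustration; the paper's entropy-regularized bridge construction is not reproduced here):

```python
import numpy as np

def reflect_into_unit_box(x):
    """Fold coordinates back into [0, 1] by mirror reflection at the walls."""
    x = np.abs(x)                          # reflect at 0
    x = np.mod(x, 2.0)                     # period-2 sawtooth ...
    return np.where(x > 1.0, 2.0 - x, x)   # ... reflected at 1

def reflected_em_step(x, drift, sigma, dt, rng):
    """One Euler-Maruyama step of dX = drift(X) dt + sigma dW, reflected into [0,1]^d."""
    x_new = x + drift(x) * dt + sigma * np.sqrt(dt) * rng.standard_normal(x.shape)
    return reflect_into_unit_box(x_new)

rng = np.random.default_rng(0)
x = rng.uniform(size=(1000, 2))
for _ in range(1000):
    x = reflected_em_step(x, drift=lambda x: np.zeros_like(x), sigma=1.0, dt=1e-3, rng=rng)
assert (x >= 0).all() and (x <= 1).all()  # samples stay in the bounded domain
```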
arXiv Detail & Related papers (2024-01-06T14:39:58Z) - Domain Generalisation via Domain Adaptation: An Adversarial Fourier
Amplitude Approach [13.642506915023871]
We adversarially synthesise the worst-case target domain and adapt a model to that worst-case domain.
On the DomainBed benchmark, the proposed approach yields significantly improved domain generalisation performance.
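The adversarial synthesis requires a trained model, but the Fourier amplitude manipulation at the heart of the approach is simple to demonstrate: alter an image's FFT amplitude (which carries domain/style statistics) while keeping its phase (which carries semantic layout). A hedged numpy sketch with random, rather than adversarial, amplitude scaling:

```python
import numpy as np

def perturb_fourier_amplitude(img, strength=0.3, rng=None):
    """Randomly rescale the FFT amplitude of an image while keeping its phase.
    Amplitude carries domain/style statistics; phase retains semantic layout."""
    rng = np.random.default_rng() if rng is None else rng
    spectrum = np.fft.fft2(img, axes=(0, 1))
    amplitude, phase = np.abs(spectrum), np.angle(spectrum)
    noise = 1.0 + strength * rng.uniform(-1.0, 1.0, size=amplitude.shape)
    perturbed = (amplitude * noise) * np.exp(1j * phase)
    return np.real(np.fft.ifft2(perturbed, axes=(0, 1)))

img = np.random.default_rng(0).uniform(size=(32, 32))
aug = perturb_fourier_amplitude(img)  # same layout, shifted "style" statistics
```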
arXiv Detail & Related papers (2023-02-23T14:19:07Z) - DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained
Diffusion [66.21290235237808]
We introduce an energy constrained diffusion model which encodes a batch of instances from a dataset into evolutionary states.
We provide rigorous theory that implies closed-form optimal estimates for the pairwise diffusion strength among arbitrary instance pairs.
Experiments highlight the wide applicability of our model as a general-purpose encoder backbone with superior performance in various tasks.
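As a rough sketch of what one diffusion layer over a batch of instances can look like (the softmax similarity used here is an illustrative choice, not the paper's closed-form optimal estimate of the pairwise diffusion strength):

```python
import numpy as np

def diffusion_layer(z, step=0.5, temp=1.0):
    """One attention-style diffusion step over N instance states: each state
    moves toward a weighted average of all states, with pairwise diffusion
    strengths given by a row-normalized softmax of dot-product similarities."""
    sim = z @ z.T / np.sqrt(z.shape[1])                      # (N, N) similarities
    a = np.exp((sim - sim.max(axis=1, keepdims=True)) / temp)
    a /= a.sum(axis=1, keepdims=True)                        # diffusion strengths
    return z + step * (a @ z - z)                            # diffuse toward neighbors

states = np.random.default_rng(0).standard_normal((8, 16))
states = diffusion_layer(states)
```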
arXiv Detail & Related papers (2023-01-23T15:18:54Z) - Bayesian Neural Network Inference via Implicit Models and the Posterior
Predictive Distribution [0.8122270502556371]
We propose a novel approach to perform approximate Bayesian inference in complex models such as Bayesian neural networks.
The approach is more scalable to large data than Markov Chain Monte Carlo.
We see this being useful in applications such as surrogate and physics-based models.
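However the posterior samples are produced, the posterior predictive is approximated the same way; a minimal Monte Carlo sketch (with `predict` and `weight_samples` as hypothetical placeholders for a network forward pass and samples from the implicit approximate posterior):

```python
import numpy as np

def posterior_predictive(predict, weight_samples, x):
    """Monte Carlo posterior predictive: p(y | x, D) ~= (1/S) sum_s p(y | x, theta_s),
    averaging the likelihood over S posterior weight samples theta_s."""
    return np.mean([predict(theta, x) for theta in weight_samples], axis=0)

# Hypothetical usage: average class probabilities over sampled weight vectors.
# preds = posterior_predictive(my_network_forward, sampled_weights, x_test)
```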
arXiv Detail & Related papers (2022-09-06T02:43:19Z) - Domain-Specific Risk Minimization for Out-of-Distribution Generalization [104.17683265084757]
We first establish a generalization bound that explicitly considers the adaptivity gap.
We propose effective gap estimation methods for guiding the selection of a
better hypothesis for the target; a second method minimizes the gap directly by
adapting model parameters using online target samples.
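A minimal sketch of the online-adaptation flavor of this idea, using entropy minimization on unlabeled target batches as a generic stand-in objective (the paper's actual gap estimator is not reproduced here; `model` and `optimizer` are placeholders):

```python
import torch

def online_adaptation_step(model, optimizer, x_target):
    """One online test-time adaptation step: minimize prediction entropy on an
    unlabeled target batch, updating the model parameters in place."""
    probs = torch.softmax(model(x_target), dim=-1)
    entropy = -(probs * torch.log(probs.clamp_min(1e-8))).sum(dim=-1).mean()
    optimizer.zero_grad()
    entropy.backward()
    optimizer.step()
    return entropy.item()
```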
arXiv Detail & Related papers (2022-08-18T06:42:49Z) - Understanding Overparameterization in Generative Adversarial Networks [56.57403335510056]
Training Generative Adversarial Networks (GANs) requires solving non-concave min-max optimization problems.
Prior theory has shown the importance of gradient descent-ascent (GDA) for reaching globally optimal solutions.
We show that in an overparameterized GAN with a one-hidden-layer neural network generator and a linear discriminator, GDA converges to a global saddle point of the underlying non-concave min-max problem.
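For readers unfamiliar with GDA, the sketch below runs simultaneous gradient descent-ascent on a toy saddle problem with a known solution at the origin; this is a generic illustration of the dynamics, not the GAN setting analyzed in the paper:

```python
# Gradient descent-ascent (GDA) on the toy saddle problem
#   min_x max_y  f(x, y) = 0.5 x^2 - 0.5 y^2 + x y,
# whose unique saddle point is (0, 0).
def gda(x=1.0, y=1.0, lr=0.1, steps=200):
    for _ in range(steps):
        grad_x = x + y        # df/dx: the minimizing player descends
        grad_y = x - y        # df/dy: the maximizing player ascends
        x, y = x - lr * grad_x, y + lr * grad_y
    return x, y

print(gda())  # converges to (approximately) the saddle point (0, 0)
```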
arXiv Detail & Related papers (2021-04-12T16:23:37Z) - Few-shot Domain Adaptation by Causal Mechanism Transfer [107.08605582020866]
We study few-shot supervised domain adaptation (DA) for regression problems, where only a few labeled target domain data and many labeled source domain data are available.
Many of the current DA methods base their transfer assumptions on either parametrized distribution shift or apparent distribution similarities.
We propose mechanism transfer, a meta-distributional scenario in which a data generating mechanism is invariant among domains.
arXiv Detail & Related papers (2020-02-10T02:16:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.