On Statistical Rates of Conditional Diffusion Transformers: Approximation, Estimation and Minimax Optimality
- URL: http://arxiv.org/abs/2411.17522v1
- Date: Tue, 26 Nov 2024 15:30:48 GMT
- Title: On Statistical Rates of Conditional Diffusion Transformers: Approximation, Estimation and Minimax Optimality
- Authors: Jerry Yao-Chieh Hu, Weimin Wu, Yi-Chen Lee, Yu-Chao Huang, Minshuo Chen, Han Liu
- Abstract summary: We show that both conditional DiTs and their latent variants lead to the minimax optimality of unconditional DiTs under identified settings.
Our findings establish statistical limits for conditional and unconditional DiTs, and offer practical guidance toward developing more efficient and accurate DiT models.
- Score: 15.889816082916722
- License:
- Abstract: We investigate the approximation and estimation rates of conditional diffusion transformers (DiTs) with classifier-free guidance. We present a comprehensive analysis for ``in-context'' conditional DiTs under four common data assumptions. We show that both conditional DiTs and their latent variants lead to the minimax optimality of unconditional DiTs under identified settings. Specifically, we discretize the input domains into infinitesimal grids and then perform a term-by-term Taylor expansion on the conditional diffusion score function under H\"older smooth data assumption. This enables fine-grained use of transformers' universal approximation through a more detailed piecewise constant approximation and hence obtains tighter bounds. Additionally, we extend our analysis to the latent setting under the linear latent subspace assumption. We not only show that latent conditional DiTs achieve lower bounds than conditional DiTs both in approximation and estimation, but also show the minimax optimality of latent unconditional DiTs. Our findings establish statistical limits for conditional and unconditional DiTs, and offer practical guidance toward developing more efficient and accurate DiT models.
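The abstract's core device, discretizing the input domain into a grid and Taylor-expanding the score function cell by cell, can be illustrated with a minimal sketch (not from the paper): a zeroth-order, piecewise constant approximant of a smooth one-dimensional function. Here `math.sin` stands in for a score function; for a Lipschitz (1-Hölder) target the sup-norm error shrinks linearly in the grid width, roughly halving each time the grid is refined.

```python
import math

def piecewise_constant(f, a, b, n):
    """Piecewise-constant approximant of f on [a, b] with n grid cells,
    using the cell-midpoint value (a zeroth-order Taylor expansion per cell)."""
    h = (b - a) / n
    values = [f(a + (i + 0.5) * h) for i in range(n)]
    def g(x):
        i = min(int((x - a) / h), n - 1)  # clamp the right endpoint into the last cell
        return values[i]
    return g

# Sup-norm error on a fine probe grid shrinks like the cell width h = (b - a) / n.
f = math.sin
for n in (8, 16, 32):
    g = piecewise_constant(f, 0.0, math.pi, n)
    err = max(abs(f(x) - g(x)) for x in [math.pi * k / 1000 for k in range(1001)])
    print(n, round(err, 4))
```

The paper's analysis refines this idea: using a higher-order Taylor expansion per cell under a Hölder smoothness assumption tightens the approximation error that transformers must match.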
Related papers
- Probabilistic Conformal Prediction with Approximate Conditional Validity [81.30551968980143]
We develop a new method for generating prediction sets that combines the flexibility of conformal methods with an estimate of the conditional distribution.
Our method consistently outperforms existing approaches in terms of conditional coverage.
arXiv Detail & Related papers (2024-07-01T20:44:48Z)
- On Statistical Rates and Provably Efficient Criteria of Latent Diffusion Transformers (DiTs) [12.810268045479992]
We study the universal approximation and sample complexity of the DiTs score function.
We show that latent DiTs have the potential to bypass the challenges associated with the high dimensionality of initial data.
arXiv Detail & Related papers (2024-07-01T08:34:40Z)
- Flow matching achieves almost minimax optimal convergence [50.38891696297888]
Flow matching (FM) has gained significant attention as a simulation-free generative model.
This paper discusses the convergence properties of FM for large sample size under the $p$-Wasserstein distance.
We establish that FM can achieve an almost minimax optimal convergence rate for $1 \leq p \leq 2$, presenting the first theoretical evidence that FM can reach convergence rates comparable to those of diffusion models.
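For intuition on the $p$-Wasserstein metric in this entry, a small illustrative sketch (not from the paper): in one dimension the optimal transport plan between two equal-size empirical measures matches sorted samples order-to-order, so the empirical $W_1$ distance is simply the mean absolute difference of order statistics.

```python
import random

def wasserstein1_empirical(xs, ys):
    """Empirical 1-Wasserstein distance between two equal-size 1-D samples:
    in 1-D the optimal coupling pairs sorted samples position by position."""
    assert len(xs) == len(ys)
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)

random.seed(0)
a = [random.gauss(0.0, 1.0) for _ in range(5000)]
b = [random.gauss(0.5, 1.0) for _ in range(5000)]
# For two Gaussians with equal variance, W1 equals the mean shift (0.5 here);
# the empirical estimate approaches it as the sample size grows.
print(wasserstein1_empirical(a, b))
```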
arXiv Detail & Related papers (2024-05-31T14:54:51Z)
- Unveil Conditional Diffusion Models with Classifier-free Guidance: A Sharp Statistical Theory [87.00653989457834]
Conditional diffusion models serve as the foundation of modern image synthesis and find extensive application in fields like computational biology and reinforcement learning.
Despite the empirical success, theory of conditional diffusion models is largely missing.
This paper bridges the gap by presenting a sharp statistical theory of distribution estimation using conditional diffusion models.
arXiv Detail & Related papers (2024-03-18T17:08:24Z)
- Consistent Optimal Transport with Empirical Conditional Measures [0.6562256987706128]
We consider the problem of Optimal Transportation (OT) between two joint distributions when conditioned on a common variable.
We use kernelized-least-squares terms computed over the joint samples, which implicitly match the transport plan's conditional objective.
Our methodology improves upon state-of-the-art methods when employed in applications like prompt learning for few-shot classification and conditional-generation in the context of predicting cell responses to treatment.
arXiv Detail & Related papers (2023-05-25T10:01:57Z)
- Efficient CDF Approximations for Normalizing Flows [64.60846767084877]
We build upon the diffeomorphic properties of normalizing flows to estimate the cumulative distribution function (CDF) over a closed region.
Our experiments on popular flow architectures and UCI datasets show a marked improvement in sample efficiency as compared to traditional estimators.
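As a simplified illustration of the quantity this entry targets (not the paper's estimator, which exploits the flow's diffeomorphic structure for better sample efficiency), the CDF mass of a closed region under a flow-defined density can be estimated by pushing base samples through the flow and counting hits. A toy affine bijection stands in for a trained normalizing flow so the result can be checked against a closed form.

```python
import math
import random

# Toy "flow": an affine bijection pushing a standard normal base to N(1, 2^2).
def flow(z):
    return 1.0 + 2.0 * z

random.seed(0)
n = 20000
samples = [flow(random.gauss(0.0, 1.0)) for _ in range(n)]

# Monte Carlo estimate of the probability mass of the closed region [0, 3].
lo, hi = 0.0, 3.0
estimate = sum(lo <= x <= hi for x in samples) / n

# Closed form for comparison: Phi((3 - 1) / 2) - Phi((0 - 1) / 2).
phi = lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
exact = phi(1.0) - phi(-0.5)
print(round(estimate, 3), round(exact, 3))
```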
arXiv Detail & Related papers (2022-02-23T06:11:49Z)
- Near-optimal inference in adaptive linear regression [60.08422051718195]
Even simple methods like least squares can exhibit non-normal behavior when data is collected in an adaptive manner.
We propose a family of online debiasing estimators to correct these distributional anomalies in least squares estimation.
We demonstrate the usefulness of our theory via applications to multi-armed bandit, autoregressive time series estimation, and active learning with exploration.
arXiv Detail & Related papers (2021-07-05T21:05:11Z)
- Conditional Distributional Treatment Effect with Kernel Conditional Mean Embeddings and U-Statistic Regression [20.544239209511982]
We study the conditional distributional treatment effect (CoDiTE).
CoDiTE encodes a treatment's distributional aspects beyond the mean.
Experiments on synthetic, semi-synthetic and real datasets demonstrate the merits of our approach.
arXiv Detail & Related papers (2021-02-16T15:09:23Z)
- Comparing Probability Distributions with Conditional Transport [63.11403041984197]
We propose conditional transport (CT) as a new divergence and approximate it with the amortized CT (ACT) cost.
ACT amortizes the computation of its conditional transport plans and comes with unbiased sample gradients that are straightforward to compute.
On a wide variety of benchmark datasets for generative modeling, substituting the default statistical distance of an existing generative adversarial network with ACT consistently improves performance.
arXiv Detail & Related papers (2020-12-28T05:14:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences.