Bounds on $L_p$ Errors in Density Ratio Estimation via $f$-Divergence Loss Functions
- URL: http://arxiv.org/abs/2410.01516v1
- Date: Wed, 2 Oct 2024 13:05:09 GMT
- Title: Bounds on $L_p$ Errors in Density Ratio Estimation via $f$-Divergence Loss Functions
- Authors: Yoshiaki Kitazawa,
- Abstract summary: Density ratio estimation (DRE) is a fundamental machine learning technique for identifying relationships between two probability distributions.
$f$-divergence loss functions, derived from variational representations of $f$-divergence, are commonly employed in DRE to achieve state-of-the-art results.
This study presents a novel perspective on DRE using $f$-divergence loss functions by deriving the upper and lower bounds on $L_p$ errors.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Density ratio estimation (DRE) is a fundamental machine learning technique for identifying relationships between two probability distributions. $f$-divergence loss functions, derived from variational representations of $f$-divergence, are commonly employed in DRE to achieve state-of-the-art results. This study presents a novel perspective on DRE using $f$-divergence loss functions by deriving the upper and lower bounds on $L_p$ errors. These bounds apply to any estimator within a class of Lipschitz continuous estimators, irrespective of the specific $f$-divergence loss functions utilized. The bounds are formulated as a product of terms that include the data dimension and the expected value of the density ratio raised to the power of $p$. Notably, the lower bound incorporates an exponential term dependent on the Kullback--Leibler divergence, indicating that the $L_p$ error significantly increases with the Kullback--Leibler divergence for $p > 1$, and this increase becomes more pronounced as $p$ increases. Furthermore, these theoretical findings are substantiated through numerical experiments.
Related papers
- A Unified Analysis for Finite Weight Averaging [50.75116992029417]
Averaging iterations of Gradient Descent (SGD) have achieved empirical success in training deep learning models, such as Weight Averaging (SWA), Exponential Moving Average (EMA), and LAtest Weight Averaging (LAWA)
In this paper, we generalize LAWA as Finite Weight Averaging (FWA) and explain their advantages compared to SGD from the perspective of optimization and generalization.
arXiv Detail & Related papers (2024-11-20T10:08:22Z) - Convergence Rate Analysis of LION [54.28350823319057]
LION converges iterations of $cal(sqrtdK-)$ measured by gradient Karush-Kuhn-T (sqrtdK-)$.
We show that LION can achieve lower loss and higher performance compared to standard SGD.
arXiv Detail & Related papers (2024-11-12T11:30:53Z) - Convergence Analysis of Probability Flow ODE for Score-based Generative Models [5.939858158928473]
We study the convergence properties of deterministic samplers based on probability flow ODEs from both theoretical and numerical perspectives.
We prove the total variation between the target and the generated data distributions can be bounded above by $mathcalO(d3/4delta1/2)$ in the continuous time level.
arXiv Detail & Related papers (2024-04-15T12:29:28Z) - $α$-Divergence Loss Function for Neural Density Ratio Estimation [0.0]
Density ratio estimation (DRE) is a fundamental machine learning technique for capturing relationships between two probability distributions.
Existing methods face optimization challenges, such as overfitting due to lower-unbounded loss functions, biased mini-batch gradients, vanishing training loss gradients, and high sample requirements for Kullback-Leibler (KL) divergence loss functions.
We propose a novel loss function for DRE, the $alpha$-divergence loss function ($alpha$-Div), which is concise but offers stable and effective optimization for DRE.
arXiv Detail & Related papers (2024-02-03T05:33:01Z) - Effective Minkowski Dimension of Deep Nonparametric Regression: Function
Approximation and Statistical Theories [70.90012822736988]
Existing theories on deep nonparametric regression have shown that when the input data lie on a low-dimensional manifold, deep neural networks can adapt to intrinsic data structures.
This paper introduces a relaxed assumption that input data are concentrated around a subset of $mathbbRd$ denoted by $mathcalS$, and the intrinsic dimension $mathcalS$ can be characterized by a new complexity notation -- effective Minkowski dimension.
arXiv Detail & Related papers (2023-06-26T17:13:31Z) - $\alpha$-GAN: Convergence and Estimation Guarantees [7.493779672689531]
We prove a correspondence between the min-max optimization of general CPE loss function GANs and the minimization of associated $f$-divergences.
We then focus on $alpha$-GAN, defined via the $alpha$-loss, which interpolates several GANs and corresponds to the minimization of the Arimoto divergence.
arXiv Detail & Related papers (2022-05-12T23:26:51Z) - Faster Convergence of Local SGD for Over-Parameterized Models [1.5504102675587357]
Modern machine learning architectures are often highly expressive.
We analyze the convergence of Local SGD (or FedAvg) for such over-parameterized functions in heterogeneous data setting.
For general convex loss functions, we establish an error bound $O(K/T)$ otherwise.
For non-loss functions, we prove an error bound $O(K/T)$ in both cases.
We complete our results by providing problem instances in which our established convergence rates are tight to a constant factor with a reasonably small stepsize.
arXiv Detail & Related papers (2022-01-30T04:05:56Z) - Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z) - Understanding the Under-Coverage Bias in Uncertainty Estimation [58.03725169462616]
quantile regression tends to emphunder-cover than the desired coverage level in reality.
We prove that quantile regression suffers from an inherent under-coverage bias.
Our theory reveals that this under-coverage bias stems from a certain high-dimensional parameter estimation error.
arXiv Detail & Related papers (2021-06-10T06:11:55Z) - Fast Rates for the Regret of Offline Reinforcement Learning [69.23654172273085]
We study the regret of reinforcement learning from offline data generated by a fixed behavior policy in an infinitehorizon discounted decision process (MDP)
We show that given any estimate for the optimal quality function $Q*$, the regret of the policy it defines converges at a rate given by the exponentiation of the $Q*$-estimate's pointwise convergence rate.
arXiv Detail & Related papers (2021-01-31T16:17:56Z) - Rates of convergence for density estimation with generative adversarial
networks [19.71040653379663]
We prove an oracle inequality for the Jensen-Shannon (JS) divergence between the underlying density $mathsfp*$ and the GAN estimate.
We show that the JS-divergence between the GAN estimate and $mathsfp*$ decays as fast as $(logn/n)2beta/ (2beta + d)$.
arXiv Detail & Related papers (2021-01-30T09:59:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.