$\alpha$-Divergence Loss Function for Neural Density Ratio Estimation
- URL: http://arxiv.org/abs/2402.02041v2
- Date: Sun, 18 Feb 2024 10:53:18 GMT
- Title: $\alpha$-Divergence Loss Function for Neural Density Ratio Estimation
- Authors: Yoshiaki Kitazawa
- Abstract summary: An $\alpha$-divergence loss function ($\alpha$-Div) that offers concise implementation and stable optimization is proposed in this paper.
The stability of the proposed loss function is demonstrated empirically, and its estimation accuracy on DRE tasks is investigated.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, neural networks have produced state-of-the-art results for
density-ratio estimation (DRE), a fundamental technique in machine learning.
However, existing methods suffer from optimization issues that arise from the
loss functions used in DRE: the large sample requirement of the
Kullback--Leibler (KL) divergence, vanishing training-loss gradients, and
biased gradients of the loss functions. Thus, an $\alpha$-divergence loss
function ($\alpha$-Div) that offers concise implementation and stable
optimization is proposed in this paper. Furthermore, technical justifications
for the proposed loss function are presented. The stability of the proposed
loss function is demonstrated empirically, and its estimation accuracy on DRE
tasks is investigated. Additionally, this study presents a sample requirement
for DRE using the proposed loss function in terms of an upper bound on the
$L_1$ error, which relates to the curse of dimensionality, a common problem in
high-dimensional DRE tasks.
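The paper's exact $\alpha$-Div formulation is not reproduced in this abstract, but a generic $\alpha$-divergence loss for DRE can be sketched from the standard Bregman-divergence framework for density-ratio matching. The functional form below is an assumption derived from the usual $\alpha$-divergence generator, not the paper's own definition, and the function names are illustrative.

```python
import numpy as np

def alpha_dre_loss(r_p, r_q, alpha=0.5):
    """Generic alpha-divergence loss for density-ratio estimation.

    r_p:   model ratio values r(x) on samples drawn from p (numerator)
    r_q:   model ratio values r(x) on samples drawn from q (denominator)
    alpha: divergence order, with alpha not in {0, 1}

    The population loss (1/alpha) E_q[r^alpha]
    - (1/(alpha-1)) E_p[r^(alpha-1)] is minimized, up to an additive
    constant, by the true ratio r*(x) = p(x)/q(x).
    """
    term_q = np.mean(r_q ** alpha) / alpha
    term_p = np.mean(r_p ** (alpha - 1.0)) / (alpha - 1.0)
    return term_q - term_p

# Sanity check on a discrete toy problem where expectations are exact:
# p = (0.5, 0.3, 0.2) and q = (0.2, 0.3, 0.5) over three support points.
true_ratio = np.array([2.5, 1.0, 0.4])      # p_i / q_i
r_p = np.repeat(true_ratio, [5, 3, 2])      # "samples" weighted by p
r_q = np.repeat(true_ratio, [2, 3, 5])      # "samples" weighted by q
ones_p, ones_q = np.ones(10), np.ones(10)   # misspecified constant ratio

loss_true = alpha_dre_loss(r_p, r_q)
loss_flat = alpha_dre_loss(ones_p, ones_q)
# The true ratio attains a strictly lower loss than the flat model.
```

In this construction the loss is a sum of simple power means over the two sample sets, which is one concrete sense in which an $\alpha$-divergence loss admits a concise implementation.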
Related papers
- A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparameterized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by a magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z) - On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss also allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z) - RoBoSS: A Robust, Bounded, Sparse, and Smooth Loss Function for
Supervised Learning [0.0]
We propose a novel robust, bounded, sparse, and smooth (RoBoSS) loss function for supervised learning.
We introduce a new robust algorithm named $\mathcal{L}_{rbss}$-SVM to generalize well to unseen data.
We evaluate the proposed $\mathcal{L}_{rbss}$-SVM on $88$ real-world UCI and KEEL datasets from diverse domains.
arXiv Detail & Related papers (2023-09-05T13:59:50Z) - Leaving the Nest: Going Beyond Local Loss Functions for
Predict-Then-Optimize [57.22851616806617]
We show that our method achieves state-of-the-art results in four domains from the literature.
Our approach outperforms the best existing method by nearly 200% when the localness assumption is broken.
arXiv Detail & Related papers (2023-05-26T11:17:45Z) - Implicit Stochastic Gradient Descent for Training Physics-informed
Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have been demonstrated to be effective in solving forward and inverse differential equation problems.
However, PINNs can fail to train when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs, improving the stability of the training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z) - Xtreme Margin: A Tunable Loss Function for Binary Classification
Problems [0.0]
We provide an overview of a novel loss function, the Xtreme Margin loss function.
Unlike the binary cross-entropy and hinge loss functions, this loss function provides researchers and practitioners with flexibility in their training process.
arXiv Detail & Related papers (2022-10-31T22:39:32Z) - Real order total variation with applications to the loss functions in
learning schemes [5.8868325478050165]
We propose a loss function consisting of $r$-order (an)isotropic total variation semi-norms $\mathrm{TV}^{r}$, $r \in \mathbb{R}_{+}$.
We focus on studying key theoretical properties, such as the lower semi-continuity and compactness with respect to both the function and the order of derivative $r$, of such loss functions.
arXiv Detail & Related papers (2022-04-10T02:44:04Z) - Do Lessons from Metric Learning Generalize to Image-Caption Retrieval? [67.45267657995748]
The triplet loss with semi-hard negatives has become the de facto choice for image-caption retrieval (ICR) methods that are optimized from scratch.
Recent progress in metric learning has given rise to new loss functions that outperform the triplet loss on tasks such as image retrieval and representation learning.
We ask whether these findings generalize to the setting of ICR by comparing three loss functions on two ICR methods.
arXiv Detail & Related papers (2022-02-14T15:18:00Z) - Instance-optimality in optimal value estimation: Adaptivity via
variance-reduced Q-learning [99.34907092347733]
We analyze the problem of estimating optimal $Q$-value functions for a discounted Markov decision process with discrete states and actions.
Using a local minimax framework, we show that this functional arises in lower bounds on the accuracy of any estimation procedure.
In the other direction, we establish the sharpness of our lower bounds, up to factors logarithmic in the state and action spaces, by analyzing a variance-reduced version of $Q$-learning.
arXiv Detail & Related papers (2021-06-28T00:38:54Z) - A surrogate loss function for optimization of $F_\beta$ score in binary
classification with imbalanced data [0.0]
The gradient paths of the proposed surrogate $F_\beta$ loss function approximate the gradient paths of the large-sample limit of the $F_\beta$ score.
It is demonstrated that the proposed surrogate $F_\beta$ loss function is effective for optimizing $F_\beta$ scores under class imbalances.
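The cited paper's exact surrogate is not given in this summary. A common differentiable surrogate consistent with the description replaces the hard true/false-positive counts in the $F_\beta$ score with probabilistic "soft" counts; the sketch below uses that standard construction, with illustrative function names of my own.

```python
import numpy as np

def soft_fbeta_loss(y_true, y_prob, beta=1.0):
    """Differentiable surrogate for 1 - F_beta using soft counts.

    y_true: binary labels in {0, 1}
    y_prob: predicted probabilities in [0, 1]

    True/false positives and false negatives are replaced by their
    probabilistic counts, so the score is differentiable in y_prob
    and approaches the exact F_beta as predictions harden to 0/1.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    tp = np.sum(y_true * y_prob)            # soft true positives
    fp = np.sum((1.0 - y_true) * y_prob)    # soft false positives
    fn = np.sum(y_true * (1.0 - y_prob))    # soft false negatives
    b2 = beta ** 2
    soft_fbeta = (1.0 + b2) * tp / ((1.0 + b2) * tp + b2 * fn + fp)
    return 1.0 - soft_fbeta

# Perfect hard predictions drive the surrogate loss to zero, while
# miscalibrated probabilities are penalized.
perfect = soft_fbeta_loss([1, 0, 1], [1.0, 0.0, 1.0], beta=2.0)
noisy = soft_fbeta_loss([1, 0, 1], [0.7, 0.4, 0.6], beta=2.0)
```

Because the soft counts are weighted sums of probabilities, the gradient with respect to `y_prob` is well defined everywhere, which is what makes direct optimization of an $F_\beta$-style objective feasible under class imbalance.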
arXiv Detail & Related papers (2021-04-03T18:36:23Z) - $\sigma^2$R Loss: a Weighted Loss by Multiplicative Factors using
Sigmoidal Functions [0.9569316316728905]
We introduce a new loss function called the squared reduction loss ($\sigma^2$R loss), which is regulated by a sigmoid function to inflate/deflate the error per instance.
Our loss has a clear intuition and geometric interpretation; we demonstrate the effectiveness of our proposal through experiments.
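The summary describes a sigmoid factor that inflates or deflates the per-instance error, but not its exact form. One plausible reading is sketched below; the weighting function, the threshold `tau`, and the steepness `k` are hypothetical illustrative choices, not taken from the paper.

```python
import numpy as np

def sigmoid_weighted_sq_loss(errors, k=4.0, tau=1.0):
    """Per-instance squared error re-weighted by a sigmoid factor.

    Hypothetical sketch of a sigma^2R-style loss: errors whose
    magnitude exceeds the threshold tau get a weight approaching 2
    (inflated), while smaller errors get a weight approaching 0
    (deflated). k controls how sharply the transition happens.
    """
    errors = np.asarray(errors, dtype=float)
    weights = 2.0 / (1.0 + np.exp(-k * (np.abs(errors) - tau)))
    return weights * errors ** 2

plain = np.asarray([0.01, 5.0]) ** 2            # unweighted squared errors
weighted = sigmoid_weighted_sq_loss([0.01, 5.0])
# The small error is deflated relative to the plain squared error,
# while the large error is inflated toward twice its plain value.
```

The effect is a smooth, per-instance interpolation between ignoring easy examples and emphasizing hard ones, which matches the inflate/deflate behavior the summary attributes to the $\sigma^2$R loss.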
arXiv Detail & Related papers (2020-09-18T12:34:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.