$\alpha$-Divergence Loss Function for Neural Density Ratio Estimation
- URL: http://arxiv.org/abs/2402.02041v2
- Date: Sun, 18 Feb 2024 10:53:18 GMT
- Title: $\alpha$-Divergence Loss Function for Neural Density Ratio Estimation
- Authors: Yoshiaki Kitazawa
- Abstract summary: An $\alpha$-divergence loss function ($\alpha$-Div) that offers concise implementation and stable optimization is proposed in this paper.
The stability of the proposed loss function is demonstrated empirically, and its estimation accuracy on DRE tasks is investigated.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, neural networks have produced state-of-the-art results for
density-ratio estimation (DRE), a fundamental technique in machine learning.
However, existing methods suffer from optimization issues that arise from the
loss functions used in DRE: the large sample requirement of the
Kullback--Leibler (KL) divergence, vanishing training-loss gradients, and
biased gradients of the loss functions. Thus, an $\alpha$-divergence loss
function ($\alpha$-Div) that offers concise implementation and stable
optimization is proposed in this paper. Furthermore, technical justifications
for the proposed loss function are presented. The stability of the proposed
loss function is demonstrated empirically, and its estimation accuracy on DRE
tasks is investigated. Additionally, this study presents a sample requirement
for DRE using the proposed loss function in terms of an upper bound on the
$L_1$ error, which relates to the curse of dimensionality, a common problem in
high-dimensional DRE tasks.
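The paper's exact $\alpha$-Div formulation is not reproduced in this abstract, but a generic $\alpha$-divergence loss for DRE can be sketched from the standard Bregman-divergence framework for density-ratio matching. The functional form below is an assumption derived from the usual $\alpha$-divergence generator, not the paper's own definition, and the function names are illustrative.

```python
import numpy as np

def alpha_dre_loss(r_p, r_q, alpha=0.5):
    """Generic alpha-divergence loss for density-ratio estimation.

    r_p:   model ratio values r(x) on samples drawn from p (numerator)
    r_q:   model ratio values r(x) on samples drawn from q (denominator)
    alpha: divergence order, with alpha not in {0, 1}

    The population loss (1/alpha) E_q[r^alpha]
    - (1/(alpha-1)) E_p[r^(alpha-1)] is minimized, up to an additive
    constant, by the true ratio r*(x) = p(x)/q(x).
    """
    term_q = np.mean(r_q ** alpha) / alpha
    term_p = np.mean(r_p ** (alpha - 1.0)) / (alpha - 1.0)
    return term_q - term_p

# Sanity check on a discrete toy problem where expectations are exact:
# p = (0.5, 0.3, 0.2) and q = (0.2, 0.3, 0.5) over three support points.
true_ratio = np.array([2.5, 1.0, 0.4])      # p_i / q_i
r_p = np.repeat(true_ratio, [5, 3, 2])      # "samples" weighted by p
r_q = np.repeat(true_ratio, [2, 3, 5])      # "samples" weighted by q
ones_p, ones_q = np.ones(10), np.ones(10)   # misspecified constant ratio

loss_true = alpha_dre_loss(r_p, r_q)
loss_flat = alpha_dre_loss(ones_p, ones_q)
# The true ratio attains a strictly lower loss than the flat model.
```

In this construction the loss is a sum of simple power means over the two sample sets, which is one concrete sense in which an $\alpha$-divergence loss admits a concise implementation.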
Related papers
- A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparameterized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by a magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z) - On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss also allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z) - RoBoSS: A Robust, Bounded, Sparse, and Smooth Loss Function for
Supervised Learning [0.0]
We propose a novel robust, bounded, sparse, and smooth (RoBoSS) loss function for supervised learning.
We introduce a new robust algorithm named $\mathcal{L}_{rbss}$-SVM to generalize well to unseen data.
We evaluate the proposed $\mathcal{L}_{rbss}$-SVM on $88$ real-world UCI and KEEL datasets from diverse domains.
arXiv Detail & Related papers (2023-09-05T13:59:50Z) - Leaving the Nest: Going Beyond Local Loss Functions for
Predict-Then-Optimize [57.22851616806617]
We show that our method achieves state-of-the-art results in four domains from the literature.
Our approach outperforms the best existing method by nearly 200% when the localness assumption is broken.
arXiv Detail & Related papers (2023-05-26T11:17:45Z) - Implicit Stochastic Gradient Descent for Training Physics-informed
Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have been demonstrated to be effective in solving forward and inverse differential equation problems.
However, PINNs can fail to train when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs, improving the stability of the training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z) - Xtreme Margin: A Tunable Loss Function for Binary Classification
Problems [0.0]
We provide an overview of a novel loss function, the Xtreme Margin loss function.
Unlike the binary cross-entropy and hinge loss functions, this loss function provides researchers and practitioners with flexibility in their training process.
arXiv Detail & Related papers (2022-10-31T22:39:32Z) - Real order total variation with applications to the loss functions in
learning schemes [5.8868325478050165]
We propose a loss function consisting of $r$-order (an)isotropic total variation semi-norms $\mathrm{TV}^{r}$, $r \in \mathbb{R}_{+}$.
We focus on studying key theoretical properties, such as the lower semi-continuity and compactness with respect to both the function and the order of derivative $r$, of such loss functions.
arXiv Detail & Related papers (2022-04-10T02:44:04Z) - Do Lessons from Metric Learning Generalize to Image-Caption Retrieval? [67.45267657995748]
The triplet loss with semi-hard negatives has become the de facto choice for image-caption retrieval (ICR) methods that are optimized from scratch.
Recent progress in metric learning has given rise to new loss functions that outperform the triplet loss on tasks such as image retrieval and representation learning.
We ask whether these findings generalize to the setting of ICR by comparing three loss functions on two ICR methods.
arXiv Detail & Related papers (2022-02-14T15:18:00Z) - Instance-optimality in optimal value estimation: Adaptivity via
variance-reduced Q-learning [99.34907092347733]
We analyze the problem of estimating optimal $Q$-value functions for a discounted Markov decision process with discrete states and actions.
Using a local minimax framework, we show that this functional arises in lower bounds on the accuracy of any estimation procedure.
In the other direction, we establish the sharpness of our lower bounds, up to factors logarithmic in the state and action spaces, by analyzing a variance-reduced version of $Q$-learning.
arXiv Detail & Related papers (2021-06-28T00:38:54Z) - A surrogate loss function for optimization of $F_\beta$ score in binary
classification with imbalanced data [0.0]
The gradient paths of the proposed surrogate $F_\beta$ loss function approximate the gradient paths of the large-sample limit of the $F_\beta$ score.
It is demonstrated that the proposed surrogate $F_\beta$ loss function is effective for optimizing $F_\beta$ scores under class imbalances.
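The cited paper's exact surrogate is not given in this summary. A common differentiable surrogate consistent with the description replaces the hard true/false-positive counts in the $F_\beta$ score with probabilistic "soft" counts; the sketch below uses that standard construction, with illustrative function names of my own.

```python
import numpy as np

def soft_fbeta_loss(y_true, y_prob, beta=1.0):
    """Differentiable surrogate for 1 - F_beta using soft counts.

    y_true: binary labels in {0, 1}
    y_prob: predicted probabilities in [0, 1]

    True/false positives and false negatives are replaced by their
    probabilistic counts, so the score is differentiable in y_prob
    and approaches the exact F_beta as predictions harden to 0/1.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    tp = np.sum(y_true * y_prob)            # soft true positives
    fp = np.sum((1.0 - y_true) * y_prob)    # soft false positives
    fn = np.sum(y_true * (1.0 - y_prob))    # soft false negatives
    b2 = beta ** 2
    soft_fbeta = (1.0 + b2) * tp / ((1.0 + b2) * tp + b2 * fn + fp)
    return 1.0 - soft_fbeta

# Perfect hard predictions drive the surrogate loss to zero, while
# miscalibrated probabilities are penalized.
perfect = soft_fbeta_loss([1, 0, 1], [1.0, 0.0, 1.0], beta=2.0)
noisy = soft_fbeta_loss([1, 0, 1], [0.7, 0.4, 0.6], beta=2.0)
```

Because the soft counts are weighted sums of probabilities, the gradient with respect to `y_prob` is well defined everywhere, which is what makes direct optimization of an $F_\beta$-style objective feasible under class imbalance.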
arXiv Detail & Related papers (2021-04-03T18:36:23Z) - $\sigma^2$R Loss: a Weighted Loss by Multiplicative Factors using
Sigmoidal Functions [0.9569316316728905]
We introduce a new loss function called the squared reduction loss ($\sigma^2$R loss), which is regulated by a sigmoid function to inflate/deflate the error per instance.
Our loss has a clear intuition and geometric interpretation; we demonstrate the effectiveness of our proposal through experiments.
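The summary describes a sigmoid factor that inflates or deflates the per-instance error, but not its exact form. One plausible reading is sketched below; the weighting function, the threshold `tau`, and the steepness `k` are hypothetical illustrative choices, not taken from the paper.

```python
import numpy as np

def sigmoid_weighted_sq_loss(errors, k=4.0, tau=1.0):
    """Per-instance squared error re-weighted by a sigmoid factor.

    Hypothetical sketch of a sigma^2R-style loss: errors whose
    magnitude exceeds the threshold tau get a weight approaching 2
    (inflated), while smaller errors get a weight approaching 0
    (deflated). k controls how sharply the transition happens.
    """
    errors = np.asarray(errors, dtype=float)
    weights = 2.0 / (1.0 + np.exp(-k * (np.abs(errors) - tau)))
    return weights * errors ** 2

plain = np.asarray([0.01, 5.0]) ** 2            # unweighted squared errors
weighted = sigmoid_weighted_sq_loss([0.01, 5.0])
# The small error is deflated relative to the plain squared error,
# while the large error is inflated toward twice its plain value.
```

The effect is a smooth, per-instance interpolation between ignoring easy examples and emphasizing hard ones, which matches the inflate/deflate behavior the summary attributes to the $\sigma^2$R loss.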
arXiv Detail & Related papers (2020-09-18T12:34:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.