Non-Negative Bregman Divergence Minimization for Deep Direct Density
Ratio Estimation
- URL: http://arxiv.org/abs/2006.06979v3
- Date: Sat, 17 Jul 2021 09:25:37 GMT
- Authors: Masahiro Kato, Takeshi Teshima
- Abstract summary: We propose a non-negative correction for empirical BD estimators to mitigate train-loss hacking.
Experiments show that the proposed methods perform favorably in inlier-based outlier detection.
- Score: 18.782750537161615
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Density ratio estimation (DRE) is at the core of various machine learning
tasks such as anomaly detection and domain adaptation. In existing studies on
DRE, methods based on Bregman divergence (BD) minimization have been
extensively studied. However, BD minimization when applied with highly flexible
models, such as deep neural networks, tends to suffer from what we call
train-loss hacking, which is a source of overfitting caused by a typical
characteristic of empirical BD estimators. In this paper, to mitigate
train-loss hacking, we propose a non-negative correction for empirical BD
estimators. Theoretically, we confirm the soundness of the proposed method
through a generalization error bound. Through our experiments, the proposed
methods show a favorable performance in inlier-based outlier detection.
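To make the idea of a non-negative correction concrete, here is a minimal sketch for one least-squares instance of BD minimization. The abstract gives no formulas, so everything below is an assumption made for illustration: the function names `lsif_risk` and `nn_lsif_risk`, the hyperparameter `C` (taken to lie in (0, 1] and to act as an inverse upper bound on the density ratio), and the particular split of the risk into a clipped part and a remainder. It should not be read as the authors' exact estimator.

```python
from typing import Sequence


def _mean(xs: Sequence[float]) -> float:
    """Arithmetic mean of a non-empty sequence."""
    return sum(xs) / len(xs)


def lsif_risk(r_nu: Sequence[float], r_de: Sequence[float]) -> float:
    """Plain empirical least-squares risk (assumed form).

    r_nu: model outputs r(x) on samples from the numerator density.
    r_de: model outputs r(x) on samples from the denominator density.
    The second term is unbounded below, which a flexible model can exploit.
    """
    return 0.5 * _mean([r * r for r in r_de]) - _mean(r_nu)


def nn_lsif_risk(r_nu: Sequence[float], r_de: Sequence[float], C: float) -> float:
    """Sketch of a non-negative-corrected risk (hypothetical decomposition).

    The term 0.5*E_de[r^2] - 0.5*C*E_nu[r^2] is non-negative in expectation
    whenever C * p_nu(x) <= p_de(x), so its empirical version is clipped at 0.
    """
    sq_nu = _mean([r * r for r in r_nu])
    sq_de = _mean([r * r for r in r_de])
    bounded_part = 0.5 * sq_de - 0.5 * C * sq_nu
    remainder = -_mean(r_nu) + 0.5 * C * sq_nu
    return max(bounded_part, 0.0) + remainder


if __name__ == "__main__":
    # A degenerate "hacked" model: huge on numerator samples, zero elsewhere.
    hacked_nu, hacked_de = [1000.0, 1000.0], [0.0, 0.0]
    print(lsif_risk(hacked_nu, hacked_de))
    print(nn_lsif_risk(hacked_nu, hacked_de, C=0.5))
```

When the clipped term is already non-negative, the two risks coincide, so the correction only changes the objective in the regime where the plain estimator could be driven toward minus infinity. In the degenerate example above, the plain risk is -1000.0 while the corrected risk is 249000.0, so the hacked solution is no longer rewarded.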
Related papers
- On Training Implicit Meta-Learning With Applications to Inductive
Weighing in Consistency Regularization [0.0]
Implicit meta-learning (IML) requires computing second-order gradients, particularly the Hessian.
Various approximations of the Hessian have been proposed, but a systematic comparison of their compute cost, stability, generalization of the solution found, and estimation accuracy has been largely overlooked.
We show how training a "Confidence Network" to extract domain-specific features can learn to up-weight useful images and down-weight out-of-distribution samples.
arXiv Detail & Related papers (2023-10-28T15:50:03Z)
- Adaptive learning of density ratios in RKHS [3.047411947074805]
Estimating the ratio of two probability densities from finitely many observations is a central problem in machine learning and statistics.
We analyze a large class of density ratio estimation methods that minimize a regularized Bregman divergence between the true density ratio and a model in a reproducing kernel Hilbert space.
arXiv Detail & Related papers (2023-07-30T08:18:39Z)
- Convergence of uncertainty estimates in Ensemble and Bayesian sparse
model discovery [4.446017969073817]
We show empirical success in terms of accuracy and robustness to noise with a bootstrapping-based sequential thresholding least-squares estimator.
We show that this bootstrapping-based ensembling technique can perform a provably correct variable selection procedure with an exponential convergence rate of the error rate.
arXiv Detail & Related papers (2023-01-30T04:07:59Z)
- On double-descent in uncertainty quantification in overparametrized
models [24.073221004661427]
Uncertainty quantification is a central challenge in reliable and trustworthy machine learning.
We show a trade-off between classification accuracy and calibration, unveiling a double-descent-like behavior in the calibration curve of optimally regularized estimators.
This is in contrast with the empirical Bayes method, which we show to be well calibrated in our setting despite the higher generalization error and overparametrization.
arXiv Detail & Related papers (2022-10-23T16:01:08Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
- Doubly Robust Collaborative Targeted Learning for Recommendation on Data
Missing Not at Random [6.563595953273317]
In recommender systems, the feedback data received is always missing not at random (MNAR).
We propose DR-TMLE, which effectively captures the merits of both error imputation-based (EIB) and doubly robust (DR) methods.
We also propose a novel RCT-free collaborative targeted learning algorithm for DR-TMLE, called DR-TMLE-TL.
arXiv Detail & Related papers (2022-03-19T06:48:50Z)
- Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is a linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to nonlinear settings via deep learning with bias constraints.
A second motivation for BCE is in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z)
- Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the importance-guided stochastic gradient descent (IGSGD) method to train models that perform inference directly on inputs containing missing values, without imputation.
We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation.
Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z)
- On the Practicality of Deterministic Epistemic Uncertainty [106.06571981780591]
Deterministic uncertainty methods (DUMs) achieve strong performance on detecting out-of-distribution data.
It remains unclear whether DUMs are well calibrated and can seamlessly scale to real-world applications.
arXiv Detail & Related papers (2021-07-01T17:59:07Z)
- Counterfactual Maximum Likelihood Estimation for Training Deep Networks [83.44219640437657]
Deep learning models are prone to learning spurious correlations that should not be learned as predictive clues.
We propose a causality-based training framework to reduce the spurious correlations caused by observable confounders.
We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning.
arXiv Detail & Related papers (2021-06-07T17:47:16Z)
- Adversarial Distributional Training for Robust Deep Learning [53.300984501078126]
Adversarial training (AT) is among the most effective techniques to improve model robustness by augmenting training data with adversarial examples.
Most existing AT methods adopt a specific attack to craft adversarial examples, leading to unreliable robustness against other unseen attacks.
In this paper, we introduce adversarial distributional training (ADT), a novel framework for learning robust models.
arXiv Detail & Related papers (2020-02-14T12:36:59Z)