$f$-Divergence Based Classification: Beyond the Use of Cross-Entropy
- URL: http://arxiv.org/abs/2401.01268v2
- Date: Thu, 16 May 2024 14:46:49 GMT
- Title: $f$-Divergence Based Classification: Beyond the Use of Cross-Entropy
- Authors: Nicola Novello, Andrea M. Tonello,
- Abstract summary: We adopt a Bayesian perspective and formulate the classification task as a maximum a posteriori probability problem.
We propose a class of objective functions based on the variational representation of the $f$-divergence.
We theoretically analyze the objective functions proposed and numerically test them in three application scenarios.
- Score: 4.550290285002704
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In deep learning, classification tasks are formalized as optimization problems often solved via the minimization of the cross-entropy. However, recent advancements in the design of objective functions allow the usage of the $f$-divergence to generalize the formulation of the optimization problem for classification. We adopt a Bayesian perspective and formulate the classification task as a maximum a posteriori probability problem. We propose a class of objective functions based on the variational representation of the $f$-divergence. Furthermore, driven by the challenge of improving the state-of-the-art approach, we propose a bottom-up method that leads us to the formulation of an objective function corresponding to a novel $f$-divergence referred to as shifted log (SL). We theoretically analyze the objective functions proposed and numerically test them in three application scenarios: toy examples, image datasets, and signal detection/decoding problems. The analyzed scenarios demonstrate the effectiveness of the proposed approach and that the SL divergence achieves the highest classification accuracy in almost all the considered cases.
Related papers
- Scalable Bayesian Meta-Learning through Generalized Implicit Gradients [64.21628447579772]
Implicit Bayesian meta-learning (iBaML) method broadens the scope of learnable priors, but also quantifies the associated uncertainty.
Analytical error bounds are established to demonstrate the precision and efficiency of the generalized implicit gradient over the explicit one.
arXiv Detail & Related papers (2023-03-31T02:10:30Z) - Generalizing Bayesian Optimization with Decision-theoretic Entropies [102.82152945324381]
We consider a generalization of Shannon entropy from work in statistical decision theory.
We first show that special cases of this entropy lead to popular acquisition functions used in BO procedures.
We then show how alternative choices for the loss yield a flexible family of acquisition functions.
arXiv Detail & Related papers (2022-10-04T04:43:58Z) - Fine-grained Retrieval Prompt Tuning [149.9071858259279]
Fine-grained Retrieval Prompt Tuning steers a frozen pre-trained model to perform the fine-grained retrieval task from the perspectives of sample prompt and feature adaptation.
Our FRPT with fewer learnable parameters achieves the state-of-the-art performance on three widely-used fine-grained datasets.
arXiv Detail & Related papers (2022-07-29T04:10:04Z) - Efficient Neural Network Analysis with Sum-of-Infeasibilities [64.31536828511021]
Inspired by sum-of-infeasibilities methods in convex optimization, we propose a novel procedure for analyzing verification queries on networks with extensive branching functions.
An extension to a canonical case-analysis-based complete search procedure can be achieved by replacing the convex procedure executed at each search state with DeepSoI.
arXiv Detail & Related papers (2022-03-19T15:05:09Z) - Counterfactual Explanations for Arbitrary Regression Models [8.633492031855655]
We present a new method for counterfactual explanations (CFEs) based on Bayesian optimisation.
Our method is a globally convergent search algorithm with support for arbitrary regression models and constraints like feature sparsity and actionable recourse.
arXiv Detail & Related papers (2021-06-29T09:53:53Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work is on the iteration of zero-th-order (ZO) optimization which does not require first-order information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity as well as as function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - High Dimensional Level Set Estimation with Bayesian Neural Network [58.684954492439424]
This paper proposes novel methods to solve the high dimensional Level Set Estimation problems using Bayesian Neural Networks.
For each problem, we derive the corresponding theoretic information based acquisition function to sample the data points.
Numerical experiments on both synthetic and real-world datasets show that our proposed method can achieve better results compared to existing state-of-the-art approaches.
arXiv Detail & Related papers (2020-12-17T23:21:53Z) - Stochastic Compositional Gradient Descent under Compositional
constraints [13.170519806372075]
We study constrained optimization problems where the objective and constraint functions are convex and expressed as compositions of functions.
The problem arises in fair classification/regression and in the design of queuing systems.
We prove that the proposed algorithm is guaranteed to find the optimal and feasible solution almost surely.
arXiv Detail & Related papers (2020-12-17T05:38:37Z) - Bayesian Optimization of Risk Measures [7.799648230758491]
We consider Bayesian optimization of objective functions of the form $rho[ F(x, W) ]$, where $F$ is a black-box expensive-to-evaluate function.
We propose a family of novel Bayesian optimization algorithms that exploit the structure of the objective function to substantially improve sampling efficiency.
arXiv Detail & Related papers (2020-07-10T18:20:46Z) - Objective-Sensitive Principal Component Analysis for High-Dimensional
Inverse Problems [0.0]
We present a novel approach for adaptive, differentiable parameterization of large-scale random fields.
The developed technique is based on principal component analysis (PCA) but modifies a purely data-driven basis of principal components considering objective function behavior.
Three algorithms for optimal parameter decomposition are presented and applied to an objective of 2D synthetic history matching.
arXiv Detail & Related papers (2020-06-02T18:51:17Z) - Finding Optimal Points for Expensive Functions Using Adaptive RBF-Based
Surrogate Model Via Uncertainty Quantification [11.486221800371919]
We propose a novel global optimization framework using adaptive Radial Basis Functions (RBF) based surrogate model via uncertainty quantification.
It first employs an RBF-based Bayesian surrogate model to approximate the true function, where the parameters of the RBFs can be adaptively estimated and updated each time a new point is explored.
It then utilizes a model-guided selection criterion to identify a new point from a candidate set for function evaluation.
arXiv Detail & Related papers (2020-01-19T16:15:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.