Generalized Maximum Entropy for Supervised Classification
- URL: http://arxiv.org/abs/2007.05447v3
- Date: Wed, 15 Dec 2021 17:48:51 GMT
- Title: Generalized Maximum Entropy for Supervised Classification
- Authors: Santiago Mazuelas, Yuan Shen, and Aritz Pérez
- Abstract summary: The maximum entropy principle advocates evaluating events' probabilities using a distribution that maximizes entropy.
This paper establishes a framework for supervised classification based on the generalized maximum entropy principle.
- Score: 26.53901315716557
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The maximum entropy principle advocates evaluating events'
probabilities using a distribution that maximizes entropy among those that
satisfy certain expectation constraints. This principle can be generalized to
arbitrary decision problems, where it corresponds to minimax approaches. This paper
establishes a framework for supervised classification based on the generalized
maximum entropy principle that leads to minimax risk classifiers (MRCs). We
develop learning techniques that determine MRCs for general entropy functions
and provide performance guarantees by means of convex optimization. In
addition, we describe the relationship of the presented techniques to
existing classification methods, and quantify the performance of MRCs in
comparison with the proposed bounds and conventional methods.
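The classical principle the abstract builds on, picking the entropy-maximizing distribution subject to expectation constraints, can be sketched for the simplest categorical case. The snippet below is an illustrative sketch only (the function name and the bisection approach are assumptions for exposition, not the paper's MRC learning techniques): by Lagrangian duality the solution is a Gibbs distribution p_i ∝ exp(λ·x_i), so a single mean constraint reduces to a one-dimensional root find on λ.

```python
import math

def maxent_categorical(values, target_mean, tol=1e-10):
    """Maximum-entropy distribution over `values` with a prescribed mean.

    The dual solution is exponential-family: p_i ∝ exp(lam * x_i).
    Since the induced mean is strictly increasing in lam (its derivative
    is the distribution's variance), we can bisect on lam.
    """
    def mean_for(lam):
        w = [math.exp(lam * x) for x in values]
        z = sum(w)
        return sum(x * wi for x, wi in zip(values, w)) / z

    lo, hi = -50.0, 50.0  # bracket wide enough for any interior target mean
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean_for(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    w = [math.exp(lam * x) for x in values]
    z = sum(w)
    return [wi / z for wi in w]

# With a symmetric support and the midpoint as target mean, lam = 0
# and the solution is uniform, as maximum entropy predicts.
p = maxent_categorical([0, 1, 2, 3], 1.5)
# ≈ [0.25, 0.25, 0.25, 0.25]
```

The generalized version studied in the paper replaces Shannon entropy with other entropy functions and yields minimax risk classifiers via convex optimization; the same dual structure (an exponential-family form determined by the constraints) is what makes the classical case tractable.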
Related papers
- A Unified Theory of Stochastic Proximal Point Methods without Smoothness [52.30944052987393]
Proximal point methods have attracted considerable interest owing to their numerical stability and robustness against imperfect tuning.
This paper presents a comprehensive analysis of a broad range of variants of the stochastic proximal point method (SPPM).
arXiv Detail & Related papers (2024-05-24T21:09:19Z)
- Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z)
- A Unified Convergence Theorem for Stochastic Optimization Methods [4.94128206910124]
We provide a fundamental unified convergence theorem for deriving convergence results for a series of stochastic optimization methods.
As a direct application, we recover almost sure convergence results under general settings.
arXiv Detail & Related papers (2022-06-08T14:01:42Z)
- Categorical Distributions of Maximum Entropy under Marginal Constraints [0.0]
Estimation of categorical distributions under marginal constraints is key to many machine-learning and data-driven approaches.
We provide a parameter-agnostic theoretical framework that ensures that a categorical distribution of Maximum Entropy under marginal constraints always exists.
arXiv Detail & Related papers (2022-04-07T12:42:58Z)
- Optimal variance-reduced stochastic approximation in Banach spaces [114.8734960258221]
We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space.
We establish non-asymptotic bounds for both the operator defect and the estimation error.
arXiv Detail & Related papers (2022-01-21T02:46:57Z)
- Notes on Generalizing the Maximum Entropy Principle to Uncertain Data [0.0]
We generalize the principle of maximum entropy for computing a distribution with the least amount of information possible.
We show that our technique generalizes the principle of maximum entropy and latent maximum entropy.
We discuss a generally applicable regularization technique for adding error terms to feature expectation constraints in the event of limited data.
arXiv Detail & Related papers (2021-09-09T19:43:28Z)
- Community Detection in the Stochastic Block Model by Mixed Integer Programming [3.8073142980733]
The Degree-Corrected Stochastic Block Model (DCSBM) is a popular model for generating random graphs with community structure given an expected degree sequence.
The standard approach to community detection based on the DCSBM is to search, via maximum likelihood estimation (MLE), for the model parameters most likely to have produced the observed network data.
We present mathematical programming formulations and exact solution methods that can provably find the model parameters and community assignments of maximum likelihood given an observed graph.
arXiv Detail & Related papers (2021-01-26T22:04:40Z)
- Amortized Conditional Normalized Maximum Likelihood: Reliable Out of Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation.
Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle.
We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
arXiv Detail & Related papers (2020-11-05T08:04:34Z)
- Selective Classification via One-Sided Prediction [54.05407231648068]
A one-sided prediction (OSP) based relaxation yields a selective classification (SC) scheme that attains near-optimal coverage in the practically relevant high-target-accuracy regime.
We theoretically derive generalization bounds for SC and OSP, and empirically show that our scheme strongly outperforms state-of-the-art methods in coverage at small error levels.
arXiv Detail & Related papers (2020-10-15T16:14:27Z)
- A maximum-entropy approach to off-policy evaluation in average-reward MDPs [54.967872716145656]
This work focuses on off-policy evaluation (OPE) with function approximation in infinite-horizon undiscounted Markov decision processes (MDPs).
We provide the first finite-sample OPE error bound, extending existing results beyond the episodic and discounted cases.
We show that this results in an exponential-family distribution whose sufficient statistics are the features, paralleling maximum-entropy approaches in supervised learning.
arXiv Detail & Related papers (2020-06-17T18:13:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of the information provided and is not responsible for any consequences of its use.