Maximum Entropy competes with Maximum Likelihood
- URL: http://arxiv.org/abs/2012.09430v1
- Date: Thu, 17 Dec 2020 07:44:22 GMT
- Title: Maximum Entropy competes with Maximum Likelihood
- Authors: A.E. Allahverdyan and N.H. Martirosyan
- Abstract summary: The maximum entropy (MAXENT) method has a large number of applications in theoretical and applied machine learning.
We show that MAXENT applies in sparse data regimes, but needs specific types of prior information.
In particular, MAXENT can outperform the optimally regularized ML provided that there are prior rank correlations between the estimated random quantity and its probabilities.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The maximum entropy (MAXENT) method has a large number of applications in
theoretical and applied machine learning, since it provides a convenient
non-parametric tool for estimating unknown probabilities. The method is a major
contribution of statistical physics to probabilistic inference. However, a
systematic approach towards its validity limits is currently missing. Here we
study MAXENT in a Bayesian decision theory set-up, i.e. assuming that there
exists a well-defined prior Dirichlet density for unknown probabilities, and
that the average Kullback-Leibler (KL) distance can be employed for deciding on
the quality and applicability of various estimators. These allow us to evaluate
the relevance of various MAXENT constraints, check its general applicability,
and compare MAXENT with estimators having various degrees of dependence on the
prior, viz. the regularized maximum likelihood (ML) and the Bayesian
estimators. We show that MAXENT applies in sparse data regimes, but needs
specific types of prior information. In particular, MAXENT can outperform the
optimally regularized ML provided that there are prior rank correlations
between the estimated random quantity and its probabilities.
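The comparison described in the abstract is easy to prototype. Below is a minimal sketch, not the paper's experimental setup: it assumes a symmetric Dirichlet prior, uses a single illustrative MAXENT constraint (matching the empirical mean of the outcome index), and scores each estimator by the average KL distance from the true distribution. The constants q, N, alpha, and beta are arbitrary choices for the example.

```python
# Minimal sketch (not the paper's setup): compare MAXENT, regularized ML, and
# the Bayesian posterior mean under a symmetric Dirichlet prior, scoring each
# estimator by the average KL distance from the true distribution.
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(0)
q, N, trials = 10, 5, 2000      # alphabet size, sample size (sparse), repeats
alpha, beta = 1.0, 0.5          # assumed prior strength and ML smoothing

x = np.arange(q, dtype=float)   # outcome values entering the MAXENT constraint

def maxent_mean(m):
    """Max-entropy distribution on {0,...,q-1} whose mean equals m."""
    m = float(np.clip(m, 1e-6, q - 1 - 1e-6))  # keep the constraint feasible
    def mean_gap(lam):
        w = np.exp(lam * x - np.max(lam * x))  # stabilized Gibbs weights
        w /= w.sum()
        return float(w @ x) - m
    lam = brentq(mean_gap, -50.0, 50.0)  # solve for the Lagrange multiplier
    w = np.exp(lam * x - np.max(lam * x))
    return w / w.sum()

def kl(p, p_hat):
    return float(np.sum(p * np.log(p / p_hat)))

scores = {"MAXENT": [], "reg. ML": [], "Bayes": []}
for _ in range(trials):
    p = rng.dirichlet(np.full(q, alpha))   # true probabilities from the prior
    counts = rng.multinomial(N, p)         # sparse data: N << q
    estimates = {
        "MAXENT": maxent_mean(counts @ x / N),        # entropy max s.t. mean
        "reg. ML": (counts + beta) / (N + q * beta),  # additive smoothing
        "Bayes": (counts + alpha) / (N + q * alpha),  # posterior mean, true prior
    }
    for name, p_hat in estimates.items():
        scores[name].append(kl(p, p_hat))

for name, vals in scores.items():
    print(f"{name:8s} average KL = {np.mean(vals):.4f}")
```

The Bayesian posterior mean, which knows the true prior, serves as the reference against which the prior-free MAXENT and the smoothed ML can be compared, in the spirit of the paper's set-up.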
Related papers
- An Upper Confidence Bound Approach to Estimating the Maximum Mean [0.0]
We study estimation of the maximum mean using an upper confidence bound (UCB) approach.
We establish statistical guarantees, including strong consistency, mean squared errors, and central limit theorems (CLTs) for both estimators.
arXiv Detail & Related papers (2024-08-08T02:53:09Z)
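As a loose illustration of the UCB idea for the maximum mean (a sketch only; the paper's estimators, confidence bonus, and guarantees differ), one can inflate each sample mean by a sub-Gaussian confidence radius before taking the maximum. All constants below are assumed for the example.

```python
# Loose illustration of a UCB estimate of the maximum mean; the confidence
# radius is a generic sub-Gaussian bonus, assumed for the example.
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([0.10, 0.40, 0.50, 0.45])  # unknown arm means (truth: max = 0.5)
n_per_arm, delta = 50, 0.05              # samples per arm, confidence level

means = np.array([rng.normal(m, 1.0, n_per_arm).mean() for m in mu])
radius = np.sqrt(2.0 * np.log(len(mu) / delta) / n_per_arm)

plug_in = means.max()         # plain maximum of sample means
ucb = (means + radius).max()  # UCB estimate: upper-bounds the maximum mean
print(f"truth {mu.max():.3f}  plug-in {plug_in:.3f}  UCB {ucb:.3f}")
```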
- On the Consistency of Maximum Likelihood Estimation of Probabilistic Principal Component Analysis [1.0528389538549636]
PPCA has a broad spectrum of applications ranging from science and engineering to quantitative finance.
Despite this wide applicability in various fields, hardly any theoretical guarantees exist to justify the soundness of the maximum likelihood (ML) solution for this model.
We propose a novel approach using quotient topological spaces and, in particular, show that the maximum likelihood solution is consistent in an appropriate quotient Euclidean space.
arXiv Detail & Related papers (2023-11-08T22:40:45Z)
- Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z)
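The likelihood-ratio construction can be illustrated for a Bernoulli mean: keep every parameter whose likelihood is within a factor of 1/alpha of the maximum. This fixed-sample sketch is only illustrative; the paper's anytime-valid confidence sequences rest on a martingale argument that this toy version omits.

```python
# Fixed-sample illustration only: invert a likelihood-ratio test for a
# Bernoulli mean by keeping every p within a 1/alpha likelihood factor of the
# maximum.  Anytime-valid confidence sequences need a martingale construction.
import numpy as np

rng = np.random.default_rng(2)
xs = rng.binomial(1, 0.3, size=100)      # observations with true mean 0.3
k, n, alpha = int(xs.sum()), len(xs), 0.05

grid = np.linspace(1e-3, 1 - 1e-3, 999)  # candidate parameters
loglik = k * np.log(grid) + (n - k) * np.log(1 - grid)
keep = loglik.max() - loglik <= np.log(1 / alpha)  # likelihood-ratio cut
print(f"LR confidence set ~ [{grid[keep].min():.3f}, {grid[keep].max():.3f}]")
```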
- In Defense of Softmax Parametrization for Calibrated and Consistent Learning to Defer [27.025808709031864]
It has been theoretically shown that popular estimators for learning to defer parameterized with softmax provide unbounded estimates for the likelihood of deferring.
We show that the miscalibration and unboundedness of the estimators in prior literature are due to the symmetric nature of the surrogate losses used, not to the softmax parametrization.
We propose a novel statistically consistent asymmetric softmax-based surrogate loss that can produce valid estimates without the issue of unboundedness.
arXiv Detail & Related papers (2023-11-02T09:15:52Z) - Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is the linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to non-linear settings via deep learning with bias constraints.
A second motivation for the bias-constrained estimator (BCE) comes from applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z)
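A rough sketch of the bias-constraint idea: add a squared-average-error penalty to the usual MSE so the learned estimator stays nearly unbiased and its estimates average well across runs. The linear model, data, and penalty weight lam are illustrative assumptions, not the paper's network or training recipe.

```python
# Rough sketch of a bias-constrained loss: MSE plus a squared-average-error
# penalty.  The linear model, data, and weight lam are illustrative only.
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(200, 3))        # invented design matrix
theta = np.array([1.0, -2.0, 0.5])   # ground-truth parameters
y = A @ theta + 0.1 * rng.normal(size=200)

lam, lr, w = 10.0, 0.01, np.zeros(3)
for _ in range(2000):
    err = A @ w - y
    grad_mse = 2.0 * A.T @ err / len(y)
    grad_bias = 2.0 * lam * err.mean() * A.mean(axis=0)  # grad of lam*mean(err)**2
    w -= lr * (grad_mse + grad_bias)

print(f"estimate: {w.round(3)}  mean residual: {(A @ w - y).mean():.2e}")
```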
- Amortized Conditional Normalized Maximum Likelihood: Reliable Out of Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation.
Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle.
We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
arXiv Detail & Related papers (2020-11-05T08:04:34Z)
- Distributionally Robust Parametric Maximum Likelihood Estimation [13.09499764232737]
We propose a distributionally robust maximum likelihood estimator that minimizes the worst-case expected log-loss uniformly over a parametric nominal distribution.
Our novel robust estimator also enjoys statistical consistency and delivers promising empirical results in both regression and classification tasks.
arXiv Detail & Related papers (2020-10-11T19:05:49Z)
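The flavor of distributionally robust MLE can be sketched with a KL ambiguity ball around the empirical distribution, whose worst-case expected log-loss admits the dual form min over t > 0 of t*log E[exp(loss/t)] + t*rho. Note that the paper's ambiguity set is parametric rather than this KL ball, so the sketch below is an assumption-laden stand-in, not the paper's estimator.

```python
# Assumption-laden stand-in: robust Gaussian MLE over a KL ball of radius rho
# around the empirical distribution, using the dual
#   sup_{KL(Q||P)<=rho} E_Q[loss] = min_{t>0} t*log E_P[exp(loss/t)] + t*rho.
# The paper's ambiguity set is parametric; this KL version is for flavor only.
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

rng = np.random.default_rng(4)
data = rng.normal(1.0, 2.0, size=300)
rho = 0.1                            # assumed ambiguity radius

def robust_nll(params):
    mu, log_sigma, log_t = params    # log parametrization keeps sigma, t > 0
    sigma, t = np.exp(log_sigma), np.exp(log_t)
    nll = 0.5 * np.log(2 * np.pi) + np.log(sigma) + (data - mu) ** 2 / (2 * sigma**2)
    return t * (logsumexp(nll / t) - np.log(data.size)) + t * rho

res = minimize(robust_nll, x0=np.zeros(3))
print(f"robust MLE: mu={res.x[0]:.3f}, sigma={np.exp(res.x[1]):.3f}")
```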
- A Discriminative Technique for Multiple-Source Adaptation [55.5865665284915]
We present a new discriminative technique for the multiple-source adaptation (MSA) problem.
Our solution only requires conditional probabilities that can easily be accurately estimated from unlabeled data from the source domains.
Our experiments with real-world applications further demonstrate that our new discriminative MSA algorithm outperforms the previous generative solution.
arXiv Detail & Related papers (2020-08-25T14:06:15Z)
- $\gamma$-ABC: Outlier-Robust Approximate Bayesian Computation Based on a Robust Divergence Estimator [95.71091446753414]
We propose to use a nearest-neighbor-based $\gamma$-divergence estimator as a data discrepancy measure.
Our method achieves significantly higher robustness than existing discrepancy measures.
arXiv Detail & Related papers (2020-06-13T06:09:27Z)
- Machine learning for causal inference: on the use of cross-fit estimators [77.34726150561087]
Doubly-robust cross-fit estimators have been proposed to yield better statistical properties.
We conducted a simulation study to assess the performance of several estimators of the average causal effect (ACE).
When used with machine learning, the doubly-robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.
arXiv Detail & Related papers (2020-04-21T23:09:55Z)
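A compact sketch of a doubly-robust, 2-fold cross-fit estimator of the ACE in AIPW form: nuisance models are fit on one fold and evaluated on the other. Plain logistic/linear models stand in for the machine-learning learners, and the data-generating process is invented for the example.

```python
# Compact sketch of a 2-fold cross-fit AIPW (doubly-robust) estimator of the
# ACE; logistic/linear models stand in for ML learners, data is invented.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(5)
n = 4000
X = rng.normal(size=(n, 2))
T = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] - 0.5 * X[:, 1]))))  # treatment
Y = 2.0 * T + X[:, 0] + rng.normal(size=n)                         # true ACE = 2

fold = rng.permutation(n) % 2
psi = np.empty(n)
for k in (0, 1):
    tr, te = fold != k, fold == k  # fit nuisances on tr, score on te
    e = LogisticRegression().fit(X[tr], T[tr]).predict_proba(X[te])[:, 1]
    m1 = LinearRegression().fit(X[tr & (T == 1)], Y[tr & (T == 1)]).predict(X[te])
    m0 = LinearRegression().fit(X[tr & (T == 0)], Y[tr & (T == 0)]).predict(X[te])
    t, y = T[te], Y[te]
    psi[te] = m1 - m0 + t * (y - m1) / e - (1 - t) * (y - m0) / (1 - e)

print(f"cross-fit AIPW estimate of the ACE: {psi.mean():.3f} (truth 2.0)")
```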
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.