Rate-Distortion Analysis of Minimum Excess Risk in Bayesian Learning
- URL: http://arxiv.org/abs/2105.04180v1
- Date: Mon, 10 May 2021 08:14:10 GMT
- Title: Rate-Distortion Analysis of Minimum Excess Risk in Bayesian Learning
- Authors: Hassan Hafez-Kolahi, Behrad Moniri, Shohreh Kasaei, Mahdieh Soleymani Baghshah
- Abstract summary: Minimum Excess Risk (MER) in Bayesian learning is defined as the difference between the minimum expected loss achievable when learning from data and the minimum expected loss that could be achieved if the underlying parameter $W$ were observed.
We derive information-theoretic bounds on the difference between these upper and lower bounds and show that they can provide order-wise tight rates for MER.
- Score: 15.544041797200045
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Minimum Excess Risk (MER) in Bayesian learning is defined as the difference
between the minimum expected loss achievable when learning from data and the
minimum expected loss that could be achieved if the underlying parameter $W$
were observed. In this paper, we build upon and extend the recent results of (Xu
& Raginsky, 2020) to analyze the MER in Bayesian learning and derive
information-theoretic bounds on it. We formulate the problem as a (constrained)
rate-distortion optimization and show how the solution can be bounded above and
below by two other rate-distortion functions that are easier to study. The
lower bound represents the minimum possible excess risk achievable by
\emph{any} process using $R$ bits of information from the parameter $W$. For
the upper bound, the optimization is further constrained to use $R$ bits from
the training set, a setting which relates MER to information-theoretic bounds
on the generalization gap in frequentist learning. We derive
information-theoretic bounds on the difference between these upper and lower
bounds and show that they can provide order-wise tight rates for MER. This
analysis gives more insight into the information-theoretic nature of Bayesian
learning as well as providing novel bounds.
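Concretely, the quantity described in the abstract can be written as follows (this formalization follows the general setup of Xu & Raginsky, 2020, but the exact notation and measurability conditions here are a sketch, not taken verbatim from the paper):

```latex
\mathrm{MER}
  \;=\; \inf_{f}\,\mathbb{E}\big[\ell\big(Y, f(X, Z^n)\big)\big]
  \;-\; \mathbb{E}\Big[\inf_{g}\,\mathbb{E}\big[\ell\big(Y, g(X)\big)\,\big|\,W\big]\Big],
```

where $Z^n$ is the training set, $(X, Y)$ is a fresh test pair, and $\ell$ is the loss. The first term is the best expected loss achievable by any learner that sees only the data; the second is the Bayes-optimal loss attainable when the generating parameter $W$ is known, so MER measures exactly the gap the abstract describes.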
Related papers
- Minimax Linear Regression under the Quantile Risk [31.277788690403522]
We study the problem of designing minimax procedures in linear regression under the quantile risk.
We prove a matching upper bound on the worst-case quantile risk of a variant of the recently proposed min-max regression procedure.
arXiv Detail & Related papers (2024-06-17T23:24:14Z) - Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound of the iteration cost function or an upper bound of the expected length for the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - Taylor Learning [0.0]
Empirical risk minimization stands behind most optimization in supervised machine learning.
We introduce a learning algorithm to construct models for real analytic functions using neither gradient descent nor empirical risk minimization.
arXiv Detail & Related papers (2023-05-24T01:10:58Z) - On the Generalization for Transfer Learning: An Information-Theoretic Analysis [8.102199960821165]
We give an information-theoretic analysis of the generalization error and excess risk of transfer learning algorithms.
Our results suggest, perhaps as expected, that the Kullback-Leibler divergence $D(\mu\|\mu')$ plays an important role in the characterizations.
We then generalize the mutual information bound with other divergences such as the $\phi$-divergence and the Wasserstein distance.
arXiv Detail & Related papers (2022-07-12T08:20:41Z) - Information-Theoretic Bayes Risk Lower Bounds for Realizable Models [11.203656874463697]
We derive information-theoretic lower bounds on the Bayes risk and generalization error of realizable machine learning models.
For realizable models, we show that both the rate distortion function and mutual information admit expressions that are convenient for analysis.
arXiv Detail & Related papers (2021-11-08T15:42:35Z) - Under-bagging Nearest Neighbors for Imbalanced Classification [63.026765294759876]
We propose an ensemble learning algorithm called \emph{under-bagging} $k$-NN for imbalanced classification problems.
arXiv Detail & Related papers (2021-09-01T14:10:38Z) - Analytic Study of Families of Spurious Minima in Two-Layer ReLU Neural
Networks [15.711517003382484]
We show that the Hessian spectrum concentrates near positives, with the exception of $\Theta(d)$ eigenvalues which grow linearly with $d$.
This makes it possible to study the creation and annihilation of minima using powerful tools from bifurcation theory.
arXiv Detail & Related papers (2021-07-21T22:05:48Z) - Instance-optimality in optimal value estimation: Adaptivity via
variance-reduced Q-learning [99.34907092347733]
We analyze the problem of estimating optimal $Q$-value functions for a discounted Markov decision process with discrete states and actions.
Using a local minimax framework, we show that this functional arises in lower bounds on the accuracy of any estimation procedure.
In the other direction, we establish the sharpness of our lower bounds, up to factors logarithmic in the state and action spaces, by analyzing a variance-reduced version of $Q$-learning.
arXiv Detail & Related papers (2021-06-28T00:38:54Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - Fast Rates for the Regret of Offline Reinforcement Learning [69.23654172273085]
We study the regret of reinforcement learning from offline data generated by a fixed behavior policy in an infinite-horizon discounted Markov decision process (MDP).
We show that given any estimate for the optimal quality function $Q*$, the regret of the policy it defines converges at a rate given by the exponentiation of the $Q*$-estimate's pointwise convergence rate.
arXiv Detail & Related papers (2021-01-31T16:17:56Z) - CLUB: A Contrastive Log-ratio Upper Bound of Mutual Information [105.73798100327667]
We propose a novel Contrastive Log-ratio Upper Bound (CLUB) of mutual information.
We provide a theoretical analysis of the properties of CLUB and its variational approximation.
Based on this upper bound, we introduce a MI minimization training scheme and further accelerate it with a negative sampling strategy.
arXiv Detail & Related papers (2020-06-22T05:36:16Z)
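The CLUB bound in the last entry has a simple sampled form: given a variational conditional $q(y|x)$, average the log-density over the $N$ paired samples and subtract the average over all $N^2$ cross pairs. A minimal NumPy sketch of that estimator (the function name `club_estimate` and the use of a precomputed log-density matrix are our own conventions, not from the paper):

```python
import numpy as np

def club_estimate(log_q: np.ndarray) -> float:
    """Sampled CLUB upper bound on mutual information.

    log_q[i, j] holds log q(y_j | x_i) under a variational
    conditional density q, evaluated on N samples (x_i, y_i).
    """
    positive = np.mean(np.diag(log_q))  # mean over the N paired samples
    negative = np.mean(log_q)           # mean over all N*N cross pairs
    return float(positive - negative)

# Toy 2-sample example: paired log-densities 0.0, unpaired -2.0.
log_q = np.array([[0.0, -2.0],
                  [-2.0, 0.0]])
print(club_estimate(log_q))  # positive mean 0.0, overall mean -1.0 -> 1.0
```

The diagonal/off-diagonal split is what makes the estimator cheap: one forward pass of $q$ over all pairs yields both expectations, and minimizing the returned value drives the MI upper bound down, which is the training scheme the summary refers to.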
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.