Lower Bounds on the Bayesian Risk via Information Measures
- URL: http://arxiv.org/abs/2303.12497v3
- Date: Fri, 24 Mar 2023 07:58:13 GMT
- Title: Lower Bounds on the Bayesian Risk via Information Measures
- Authors: Amedeo Roberto Esposito, Adrien Vandenbroucque, Michael Gastpar
- Abstract summary: We show that one can lower bound the risk with any information measure by upper bounding its dual via Markov's inequality.
The behaviour of the lower bound as a function of the number of samples is influenced by the choice of the information measure.
If the observations are subject to privatisation, stronger impossibility results can be obtained via Strong Data-Processing Inequalities.
- Score: 17.698319441265223
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper focuses on parameter estimation and introduces a new method for
lower bounding the Bayesian risk. The method allows for the use of virtually
\emph{any} information measure, including R\'enyi's $\alpha$,
$\varphi$-Divergences, and Sibson's $\alpha$-Mutual Information. The approach
considers divergences as functionals of measures and exploits the duality
between spaces of measures and spaces of functions. In particular, we show that
one can lower bound the risk with any information measure by upper bounding its
dual via Markov's inequality. We are thus able to provide estimator-independent
impossibility results thanks to the Data-Processing Inequalities that
divergences satisfy. The results are then applied to settings of interest
involving both discrete and continuous parameters, including the
``Hide-and-Seek'' problem, and compared to the state-of-the-art techniques. An
important observation is that the behaviour of the lower bound as a function of
the number of samples is influenced by the choice of the information measure. We leverage
this by introducing a new divergence inspired by the ``Hockey-Stick''
Divergence, which is demonstrated empirically to provide the largest
lower bound across all considered settings. If the observations are subject to
privatisation, stronger impossibility results can be obtained via Strong
Data-Processing Inequalities. The paper also discusses some generalisations and
alternative directions.
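For intuition (this is a classical special case, not the paper's general construction, which extends to Rényi's α, φ-divergences, and Sibson's α-mutual information), Fano's inequality lower-bounds the error probability of any estimator of a uniformly distributed parameter via the mutual information. A minimal Python sketch on a hypothetical discrete channel:

```python
import numpy as np

def mutual_information(joint):
    """I(Theta; X) in nats for a discrete joint pmf (rows: theta, cols: x)."""
    p_theta = joint.sum(axis=1, keepdims=True)
    p_x = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float((joint[mask] * np.log(joint[mask] / (p_theta * p_x)[mask])).sum())

def fano_lower_bound(joint):
    """Fano's inequality: if theta is uniform over M values, every estimator's
    error probability satisfies Pe >= 1 - (I(Theta; X) + ln 2) / ln M."""
    M = joint.shape[0]
    return max(0.0, 1.0 - (mutual_information(joint) + np.log(2)) / np.log(M))

# Hypothetical example: theta uniform over 4 values, seen through a noisy channel.
M = 4
channel = 0.7 * np.eye(M) + 0.3 / M   # rows: p(x | theta)
joint = channel / M                   # uniform prior on theta
print(f"Fano lower bound on the error probability: {fano_lower_bound(joint):.3f}")
```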
Related papers
- Understanding Probe Behaviors through Variational Bounds of Mutual Information [53.520525292756005]
We provide guidelines for linear probing by constructing a novel mathematical framework leveraging information theory.
First, we connect probing with the variational bounds of mutual information (MI) to relax the probe design, equating linear probing with fine-tuning.
We show that intermediate representations can yield the largest MI estimates because of a tradeoff between better separability and decreasing MI.
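The summary does not say which variational bounds are used; one standard example is the Donsker-Varadhan lower bound, which is tight when the critic T equals the pointwise log density ratio. A minimal numpy sketch on a made-up discrete joint:

```python
import numpy as np

rng = np.random.default_rng(0)
joint = rng.random((3, 4))
joint /= joint.sum()                 # joint pmf p(x, y)
px = joint.sum(1, keepdims=True)     # marginal p(x)
py = joint.sum(0, keepdims=True)     # marginal p(y)
prod = px * py                       # product of marginals

def dv_bound(T):
    """Donsker-Varadhan lower bound: E_p[T] - log E_{p x q}[exp(T)]."""
    return (joint * T).sum() - np.log((prod * np.exp(T)).sum())

mi = (joint * np.log(joint / prod)).sum()               # exact mutual information
T_opt = np.log(joint / prod)                            # optimal critic
T_bad = T_opt + 0.5 * rng.standard_normal(joint.shape)  # perturbed critic

print(f"exact MI          : {mi:.4f}")
print(f"DV, optimal critic: {dv_bound(T_opt):.4f}")     # equals MI
print(f"DV, noisy critic  : {dv_bound(T_bad):.4f}")     # strictly smaller
```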
arXiv Detail & Related papers (2023-12-15T18:38:18Z)
- Mutual Wasserstein Discrepancy Minimization for Sequential Recommendation [82.0801585843835]
We propose a novel self-supervised learning framework based on Mutual Wasserstein discrepancy minimization (MStein) for sequential recommendation.
We also propose a novel contrastive learning loss based on Wasserstein Discrepancy Measurement.
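As background (this is the generic quantity, not the paper's mutual Wasserstein construction), the 1-Wasserstein distance between two equal-size one-dimensional samples reduces to an average over sorted pairs. A minimal sketch:

```python
import numpy as np

def wasserstein_1d(x, y):
    """1-Wasserstein distance between equal-size 1-D empirical samples:
    under the optimal monotone coupling it is the mean absolute
    difference of the order statistics."""
    x, y = np.sort(np.asarray(x)), np.sort(np.asarray(y))
    assert x.shape == y.shape
    return float(np.mean(np.abs(x - y)))

rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, 1000)
b = rng.normal(0.5, 1.2, 1000)
print(f"W1(a, b) = {wasserstein_1d(a, b):.3f}")
```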
arXiv Detail & Related papers (2023-01-28T13:38:48Z)
- Information Processing Equalities and the Information-Risk Bridge [10.451984251615512]
We introduce two new classes of measures of information for statistical experiments.
We derive a simple geometrical relationship between measures of information and the Bayes risk of a statistical decision problem.
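The relationship itself is not given in the summary; a textbook instance of such an information-risk bridge is binary hypothesis testing under a uniform prior, where the Bayes risk equals (1 - TV(P, Q)) / 2. A small numeric check:

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])   # hypothesis H0
q = np.array([0.2, 0.3, 0.5])   # hypothesis H1

tv = 0.5 * np.abs(p - q).sum()  # total variation distance

# The Bayes-optimal test with a uniform prior picks the likelier hypothesis
# per outcome; its error probability is exactly (1 - TV) / 2.
bayes_risk = 0.5 * np.minimum(p, q).sum()
print(f"TV = {tv:.2f}, Bayes risk = {bayes_risk:.2f}, (1 - TV)/2 = {(1 - tv) / 2:.2f}")
```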
arXiv Detail & Related papers (2022-07-25T08:54:36Z)
- Optimal variance-reduced stochastic approximation in Banach spaces [114.8734960258221]
We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space.
We establish non-asymptotic bounds for both the operator defect and the estimation error.
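A minimal sketch of plain stochastic approximation with Polyak-Ruppert averaging for a scalar contractive operator (the operator and step sizes are made-up illustrations; the paper's variance-reduced scheme and Banach-space setting are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(2)

def F(x):
    """A contractive operator on R with fixed point x* = 2 (illustrative)."""
    return 0.5 * x + 1.0

x, x_bar = 0.0, 0.0
for k in range(1, 10_001):
    noisy_Fx = F(x) + 0.5 * rng.standard_normal()  # noisy operator evaluation
    x += k ** -0.75 * (noisy_Fx - x)               # stochastic approximation step
    x_bar += (x - x_bar) / k                       # Polyak-Ruppert average

print(f"averaged iterate = {x_bar:.3f} (fixed point is 2.0)")
```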
arXiv Detail & Related papers (2022-01-21T02:46:57Z)
- Keep it Tighter -- A Story on Analytical Mean Embeddings [0.6445605125467574]
Kernel techniques are among the most popular and flexible approaches in data science.
The mean embedding gives rise to a divergence measure referred to as the maximum mean discrepancy (MMD).
In this paper we focus on the problem of MMD estimation when the mean embedding of one of the underlying distributions is available analytically.
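For reference, the standard unbiased sample-based estimator of squared MMD with a Gaussian kernel; the paper studies the variant where one of the two empirical mean embeddings is replaced by its closed-form analytical counterpart, which is not reproduced here:

```python
import numpy as np

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between rows of x and rows of y."""
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth**2))

def mmd2_unbiased(x, y, bandwidth=1.0):
    """Unbiased estimate of MMD^2(P, Q) from samples x ~ P, y ~ Q."""
    kxx = gaussian_kernel(x, x, bandwidth)
    kyy = gaussian_kernel(y, y, bandwidth)
    kxy = gaussian_kernel(x, y, bandwidth)
    n, m = len(x), len(y)
    term_x = (kxx.sum() - np.trace(kxx)) / (n * (n - 1))  # drop the diagonal
    term_y = (kyy.sum() - np.trace(kyy)) / (m * (m - 1))
    return term_x + term_y - 2 * kxy.mean()

rng = np.random.default_rng(3)
x = rng.normal(0.0, 1.0, (500, 2))
y = rng.normal(0.3, 1.0, (500, 2))
print(f"MMD^2 estimate: {mmd2_unbiased(x, y):.4f}")
```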
arXiv Detail & Related papers (2021-10-15T21:29:27Z)
- A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits [91.3755431537592]
This paper unifies the analysis of risk-averse Thompson sampling algorithms for the multi-armed bandit problem.
Using the contraction principle in the theory of large deviations, we prove novel concentration bounds for continuous risk functionals.
We show that a wide class of risk functionals as well as "nice" functions of them satisfy the continuity condition.
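For context, standard risk-neutral Thompson sampling for Bernoulli bandits; the paper's risk-averse variants replace the posterior-mean comparison with continuous risk functionals, which are not reproduced in this sketch:

```python
import numpy as np

rng = np.random.default_rng(4)
true_means = [0.3, 0.5, 0.7]       # unknown Bernoulli arm means
K = len(true_means)
alpha = np.ones(K)                 # Beta posterior: successes + 1
beta = np.ones(K)                  # Beta posterior: failures + 1

for t in range(2000):
    theta = rng.beta(alpha, beta)  # one posterior sample per arm
    arm = int(np.argmax(theta))    # play the arm that looks best
    reward = rng.random() < true_means[arm]
    alpha[arm] += reward           # Bayesian update of the played arm
    beta[arm] += 1 - reward

print("posterior means:", np.round(alpha / (alpha + beta), 2))
```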
arXiv Detail & Related papers (2021-08-25T17:09:01Z)
- Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes.
The agent sequentially selects actions and, from the resulting system trajectory, aims at identifying the best policy as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z)
- DEMI: Discriminative Estimator of Mutual Information [5.248805627195347]
Estimating mutual information between continuous random variables is often intractable and challenging for high-dimensional data.
Recent progress has leveraged neural networks to optimize variational lower bounds on mutual information.
Our approach is based on training a classifier that provides the probability that a data sample pair is drawn from the joint distribution.
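The classifier-based estimate rests on the density-ratio trick: with a balanced prior over joint and product-of-marginals pairs, log(d / (1 - d)) of the classifier output d recovers the pointwise log density ratio, whose mean under the joint is the MI. A sketch with an oracle classifier on a made-up discrete joint (the paper trains a neural classifier instead):

```python
import numpy as np

rng = np.random.default_rng(5)
joint = rng.random((3, 3))
joint /= joint.sum()                                   # p(x, y)
prod = joint.sum(1, keepdims=True) * joint.sum(0, keepdims=True)

# Oracle classifier: probability that a pair (x, y) came from the joint
# rather than the product of marginals, under a 50/50 prior.
d = joint / (joint + prod)

# Density-ratio trick: log(d / (1 - d)) = log(p(x,y) / p(x)p(y)),
# so averaging it under the joint recovers the mutual information.
mi_estimate = (joint * np.log(d / (1 - d))).sum()
mi_exact = (joint * np.log(joint / prod)).sum()
print(f"estimate {mi_estimate:.4f} vs exact {mi_exact:.4f}")
```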
arXiv Detail & Related papers (2020-10-05T04:19:27Z)
- Doubly Robust Semiparametric Difference-in-Differences Estimators with High-Dimensional Data [15.27393561231633]
We propose a doubly robust two-stage semiparametric difference-in-differences estimator for estimating heterogeneous treatment effects.
The first stage allows a general set of machine learning methods to be used to estimate the propensity score.
In the second stage, we derive the rates of convergence for both the parametric parameter and the unknown function.
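A simplified numerical sketch of this two-stage recipe on simulated panel data, using the standard doubly robust DiD combination of a propensity model and a control-group outcome-change regression; the data-generating process is an illustrative assumption, and the paper's high-dimensional analysis is not reproduced:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(6)
n = 5000
X = rng.normal(size=(n, 2))
p = 1 / (1 + np.exp(-(X[:, 0] - 0.5 * X[:, 1])))      # true propensity
D = rng.random(n) < p                                  # treatment indicator
tau = 1.0                                              # true ATT
dY = X @ [0.8, -0.4] + tau * D + rng.normal(size=n)    # outcome change Y1 - Y0

# Stage 1: estimate the propensity score (any ML method could be swapped in).
e_hat = LogisticRegression().fit(X, D).predict_proba(X)[:, 1]
# Stage 2: regress the outcome change on covariates among controls.
mu_hat = LinearRegression().fit(X[~D], dY[~D]).predict(X)

# Doubly robust combination: reweight controls by the odds of treatment.
w1 = D / D.mean()
odds = e_hat * ~D / (1 - e_hat)
w0 = odds / odds.mean()
att = np.mean((w1 - w0) * (dY - mu_hat))
print(f"doubly robust ATT estimate: {att:.3f} (truth {tau})")
```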
arXiv Detail & Related papers (2020-09-07T15:14:29Z)
- The Simulator: Understanding Adaptive Sampling in the Moderate-Confidence Regime [52.38455827779212]
We propose a novel technique for analyzing adaptive sampling called the Simulator.
We prove the first instance-based lower bounds for the top-k problem that incorporate the appropriate log-factors.
Our new analysis inspires a simple and near-optimal algorithm for best-arm and top-k identification, the first practical algorithm of its kind for the latter problem.
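The near-optimal algorithm itself is not reproduced here; as a baseline for the problems these lower bounds address, a minimal successive-elimination sketch for best-arm identification:

```python
import numpy as np

rng = np.random.default_rng(7)
means = np.array([0.3, 0.5, 0.55, 0.7])  # unknown Bernoulli arm means
delta = 0.05                             # target failure probability

active = np.arange(len(means))
sums = np.zeros(len(means))
t = 0
while len(active) > 1:
    t += 1
    sums[active] += rng.random(len(active)) < means[active]  # pull each survivor
    mu = sums[active] / t
    # Anytime confidence radius (union bound over arms and rounds).
    radius = np.sqrt(np.log(4 * len(means) * t**2 / delta) / (2 * t))
    active = active[mu + radius >= (mu - radius).max()]      # drop dominated arms

print(f"identified arm {active[0]} after {t} elimination rounds")
```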
arXiv Detail & Related papers (2017-02-16T23:42:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.