Active learning for distributionally robust level-set estimation
- URL: http://arxiv.org/abs/2102.04000v1
- Date: Mon, 8 Feb 2021 04:43:31 GMT
- Title: Active learning for distributionally robust level-set estimation
- Authors: Yu Inatsu, Shogo Iwazaki, Ichiro Takeuchi
- Abstract summary: We consider a black-box function $f$ with high evaluation cost that depends on two types of variables $\bm x$ and $\bm w$.
A natural measure of robustness is the probability that $f(\bm x, \bm w)$ exceeds a given threshold $h$.
We propose a theoretically grounded and computationally efficient active learning method for this problem.
- Score: 16.869069414080847
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many cases exist in which a black-box function $f$ with high evaluation cost
depends on two types of variables $\bm x$ and $\bm w$, where $\bm x$ is a
controllable \emph{design} variable and $\bm w$ are uncontrollable
\emph{environmental} variables that have random variation following a certain
distribution $P$. In such cases, an important task is to find the range of
design variables $\bm x$ such that the function $f(\bm x, \bm w)$ has the
desired properties by incorporating the random variation of the environmental
variables $\bm w$. A natural measure of robustness is the probability that
$f(\bm x, \bm w)$ exceeds a given threshold $h$, which is known as the
\emph{probability threshold robustness} (PTR) measure in the literature on
robust optimization. However, this robustness measure cannot be correctly
evaluated when the distribution $P$ is unknown. In this study, we addressed
this problem by considering the \textit{distributionally robust PTR} (DRPTR)
measure, which considers the worst-case PTR within given candidate
distributions. Specifically, we studied the problem of efficiently identifying
a reliable set $H$, defined as the region in which the DRPTR measure exceeds a
certain desired probability $\alpha$; this task can be interpreted as a level
set estimation (LSE) problem for DRPTR. We propose a theoretically grounded and
computationally efficient active learning method for this problem. We show that
the proposed method has theoretical guarantees on convergence and accuracy, and
we confirm through numerical experiments that it outperforms existing methods.
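In symbols, writing $\mathcal{A}$ for the given set of candidate distributions (this notation is illustrative and not fixed by the abstract), the quantities described above are
$$\mathrm{PTR}(\bm x) = \Pr_{\bm w \sim P}\left[f(\bm x, \bm w) > h\right], \qquad \mathrm{DRPTR}(\bm x) = \inf_{Q \in \mathcal{A}} \Pr_{\bm w \sim Q}\left[f(\bm x, \bm w) > h\right],$$
and the reliable set to be identified is the level set $H = \{\bm x : \mathrm{DRPTR}(\bm x) \geq \alpha\}$.
As a toy sketch of this level-set idea (not the authors' active learning method; the finite candidate family, the example function, and every name below are hypothetical), the worst-case PTR can be estimated by Monte Carlo over a finite set of candidate distributions:

    import numpy as np

    def drptr(f, x, candidate_samplers, h, n_mc=10_000, rng=None):
        """Worst-case (distributionally robust) probability that f(x, w) > h,
        taken over a finite family of candidate distributions for w."""
        rng = np.random.default_rng(rng)
        worst = 1.0
        for sample_w in candidate_samplers:
            w = sample_w(rng, n_mc)        # draw environmental variables under candidate Q
            ptr = np.mean(f(x, w) > h)     # Monte Carlo estimate of PTR under Q
            worst = min(worst, ptr)        # DRPTR is the worst case over candidates
        return worst

    # Hypothetical example: f(x, w) = x - w, candidates = normals with shifted means.
    f = lambda x, w: x - w
    candidates = [lambda rng, n, m=m: rng.normal(loc=m, scale=1.0, size=n)
                  for m in (-0.2, 0.0, 0.2)]
    h, alpha = 0.0, 0.9
    xs = np.linspace(-1.0, 3.0, 41)
    H_est = [x for x in xs if drptr(f, x, candidates, h) >= alpha]  # estimated reliable set

Such a brute-force estimate requires many evaluations of $f$ at every candidate $\bm x$, which is exactly what the paper's active learning approach is designed to avoid when each evaluation is expensive.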
Related papers
- Computing Optimal Manipulations in Cryptographic Self-Selection Proof-of-Stake Protocols [3.3139913966251227]
We develop methods to provably nail $f(\alpha,\beta)$ for any desired $(\alpha,\beta)$ up to arbitrary precision.
One technical challenge is that natural sampling-based estimates of the mean of our target distribution are \emph{not} unbiased estimators.
arXiv Detail & Related papers (2024-06-21T16:20:39Z) - Robust Distribution Learning with Local and Global Adversarial Corruptions [17.22168727622332]
We develop an efficient finite-sample algorithm with error bounded by $\sqrt{\varepsilon k} + \rho + \tilde{O}(d\sqrt{k}\,n^{-1/(k \lor 2)})$ when $P$ has bounded covariance.
Our efficient procedure relies on a novel trace norm approximation of an ideal yet intractable 2-Wasserstein projection estimator.
arXiv Detail & Related papers (2024-06-10T17:48:36Z) - Testing Dependency of Unlabeled Databases [5.384630221560811]
We test whether two random databases $\mathsf{X} \in \mathcal{X}^{n \times d}$ and $\mathsf{Y} \in \mathcal{Y}^{n \times d}$ are statistically dependent or not.
We characterize the thresholds at which optimal testing is information-theoretically impossible and possible.
arXiv Detail & Related papers (2023-11-10T05:17:03Z) - Corruption-Robust Offline Reinforcement Learning with General Function
Approximation [60.91257031278004]
We investigate the problem of corruption in offline reinforcement learning (RL) with general function approximation.
Our goal is to find a policy that is robust to such corruption and minimizes the suboptimality gap with respect to the optimal policy for the uncorrupted Markov decision processes (MDPs).
arXiv Detail & Related papers (2023-10-23T04:07:26Z) - Restarted Bayesian Online Change-point Detection for Non-Stationary
Markov Decision Processes [12.229154524476405]
We introduce a variant of the Restarted Bayesian Online Change-Point Detection algorithm (R-BOCPD).
We propose an improved version of the UCRL2 algorithm for MDPs with state transition kernel sampled from a multinomial distribution.
We show that R-BOCPD-UCRL2 enjoys a favorable regret bound of $O\left(D O \sqrt{A T K_T \log\left(\frac{T}{\delta}\right)} + \frac{K_T}{\delta} \min\limits_{\ell} \ldots\right)$.
arXiv Detail & Related papers (2023-04-01T05:26:41Z) - Contextual Combinatorial Bandits with Probabilistically Triggered Arms [45.305256056479045]
We study contextual combinatorial bandits with probabilistically triggered arms (C$^2$MAB-T) under a variety of smoothness conditions.
Under the triggering probability modulated (TPM) condition, we devise the C$^2$-UC-T algorithm and derive a regret bound of $\tilde{O}(d\sqrt{T})$.
arXiv Detail & Related papers (2023-03-30T02:51:00Z) - Estimating Optimal Policy Value in General Linear Contextual Bandits [50.008542459050155]
In many bandit problems, the maximal reward achievable by a policy is often unknown in advance.
We consider the problem of estimating the optimal policy value in the sublinear data regime before the optimal policy is even learnable.
We present a more practical, computationally efficient algorithm that estimates a problem-dependent upper bound on $V^*$.
arXiv Detail & Related papers (2023-02-19T01:09:24Z) - Stochastic Approximation Approaches to Group Distributionally Robust
Optimization [96.26317627118912]
We study group distributionally robust optimization (GDRO).
We use online learning techniques to reduce the number of samples required in each round from $m$ to $1$, keeping the same sample complexity.
We also give a novel formulation of weighted GDRO, which allows us to derive distribution-dependent convergence rates.
arXiv Detail & Related papers (2023-02-18T09:24:15Z) - Near Sample-Optimal Reduction-based Policy Learning for Average Reward
MDP [58.13930707612128]
This work considers the sample complexity of obtaining an $\varepsilon$-optimal policy in an average reward Markov Decision Process (AMDP).
We prove an upper bound of $\widetilde{O}(H \varepsilon^{-3} \ln \frac{1}{\delta})$ samples per state-action pair, where $H := \mathrm{sp}(h^*)$ is the span of the bias of any optimal policy, $\varepsilon$ is the accuracy, and $\delta$ is the failure probability.
arXiv Detail & Related papers (2022-12-01T15:57:58Z) - Approximate Function Evaluation via Multi-Armed Bandits [51.146684847667125]
We study the problem of estimating the value of a known smooth function $f$ at an unknown point $\boldsymbol{\mu} \in \mathbb{R}^n$, where each component $\mu_i$ can be sampled via a noisy oracle.
We design an instance-adaptive algorithm that learns to sample according to the importance of each coordinate, and with probability at least $1-\delta$ returns an $\epsilon$-accurate estimate of $f(\boldsymbol{\mu})$.
arXiv Detail & Related papers (2022-03-18T18:50:52Z) - List-Decodable Mean Estimation in Nearly-PCA Time [50.79691056481693]
We study the fundamental task of list-decodable mean estimation in high dimensions.
Our algorithm runs in time $\widetilde{O}(ndk)$ for all $k = O(\sqrt{d}) \cup \Omega(d)$, where $n$ is the size of the dataset.
A variant of our algorithm has runtime $\widetilde{O}(ndk)$ for all $k$, at the expense of an $O(\sqrt{\log k})$ factor in the recovery guarantee.
arXiv Detail & Related papers (2020-11-19T17:21:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.