Effective Dimension in Bandit Problems under Censorship
- URL: http://arxiv.org/abs/2302.06916v1
- Date: Tue, 14 Feb 2023 09:03:35 GMT
- Title: Effective Dimension in Bandit Problems under Censorship
- Authors: Gauthier Guinet, Saurabh Amin, Patrick Jaillet
- Abstract summary: We study both multi-armed and contextual bandit problems in censored environments.
Our goal is to estimate the performance loss due to censorship in the context of classical algorithms designed for uncensored environments.
- Score: 22.269565708490468
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we study both multi-armed and contextual bandit problems in
censored environments. Our goal is to estimate the performance loss due to
censorship in the context of classical algorithms designed for uncensored
environments. Our main contributions include the introduction of a broad class
of censorship models and their analysis in terms of the effective dimension of
the problem -- a natural measure of its underlying statistical complexity and
main driver of the regret bound. In particular, the effective dimension allows
us to maintain the structure of the original problem at first order, while
embedding it in a bigger space, and thus naturally leads to results analogous
to uncensored settings. Our analysis involves a continuous generalization of
the Elliptical Potential Inequality, which we believe is of independent
interest. We also discover an interesting property of decision-making under
censorship: a transient phase during which initial misspecification of
censorship is self-corrected at an extra cost, followed by a stationary phase
that reflects the inherent slowdown of learning governed by the effective
dimension. Our results are useful for applications of sequential
decision-making models where the feedback received depends on strategic
uncertainty (e.g., agents' willingness to follow a recommendation) and/or
random uncertainty (e.g., loss or delay in arrival of information).
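For context, a standard discrete form of the Elliptical Potential Lemma from the linear bandit literature is sketched below (a reference statement in conventional notation, not necessarily the paper's; the paper proves a continuous generalization of this type of bound). With \(x_1, \dots, x_T \in \mathbb{R}^d\), \(\|x_t\|_2 \le L\), and \(V_t = \lambda I + \sum_{s=1}^{t} x_s x_s^\top\) for some \(\lambda > 0\),
\[
\sum_{t=1}^{T} \min\!\left(1,\ \|x_t\|_{V_{t-1}^{-1}}^{2}\right) \;\le\; 2 \log \frac{\det V_T}{\det V_0} \;\le\; 2 d \log\!\left(1 + \frac{T L^{2}}{d \lambda}\right).
\]
The ambient dimension \(d\) on the right-hand side is the quantity that, per the abstract, is replaced by an effective dimension when the censored problem is embedded in a bigger space.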
Related papers
- Criticality and Safety Margins for Reinforcement Learning [53.10194953873209]
We seek to define a criticality framework with both a quantifiable ground truth and a clear significance to users.
We introduce true criticality as the expected drop in reward when an agent deviates from its policy for n consecutive random actions.
We also introduce the concept of proxy criticality, a low-overhead metric that has a statistically monotonic relationship to true criticality.
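A plausible formalization of true criticality, in hypothetical notation of ours rather than the paper's (using a discounted return for concreteness): for a policy \(\pi\) with value function \(V^{\pi}\),
\[
\mathrm{Crit}_n(s) \;=\; V^{\pi}(s) \;-\; \mathbb{E}\!\left[\sum_{t \ge 0} \gamma^{t} r_t \,\middle|\, s_0 = s,\ a_0, \dots, a_{n-1} \sim \mathrm{Unif}(\mathcal{A}),\ a_t \sim \pi \text{ for } t \ge n\right],
\]
i.e., the expected drop in return from taking \(n\) uniformly random actions before resuming \(\pi\).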
arXiv Detail & Related papers (2024-09-26T21:00:45Z)
- Generalization Error Bounds for Learning under Censored Feedback [15.367801388932145]
Generalization error bounds from learning theory provide statistical guarantees on how well an algorithm will perform on previously unseen data.
We characterize the impacts of data non-IIDness due to censored feedback on such bounds.
We show that existing generalization error bounds fail to correctly capture the model's generalization guarantees.
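For contrast, the classical guarantee that censored feedback undermines is the standard Hoeffding bound (a textbook statement, included here only for context): for a fixed hypothesis \(h\) with loss in \([0,1]\) and \(n\) IID samples, with probability at least \(1 - \delta\),
\[
\left| R(h) - \widehat{R}_n(h) \right| \;\le\; \sqrt{\frac{\log(2/\delta)}{2n}}.
\]
Censored feedback makes the observed samples non-IID, which is precisely the assumption this guarantee relies on.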
arXiv Detail & Related papers (2024-04-14T13:17:32Z)
- dugMatting: Decomposed-Uncertainty-Guided Matting [83.71273621169404]
We propose a decomposed-uncertainty-guided matting algorithm, which explores the explicitly decomposed uncertainties to efficiently and effectively improve the results.
The proposed matting framework relieves users of the requirement to determine the interaction areas, using simple and efficient labeling.
arXiv Detail & Related papers (2023-06-02T11:19:50Z)
- Algorithmic Censoring in Dynamic Learning Systems [6.2952076725399975]
We formalize censoring, demonstrate how it can arise, and highlight difficulties in detection.
We consider safeguards against censoring: recourse and randomized exploration.
The resulting techniques allow examples from censored groups to enter the training data and correct the model.
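A minimal sketch of the randomized-exploration safeguard in a loan-approval-style setting (the function names, the epsilon parameter, and the feedback loop are our illustrative assumptions, not the paper's specification):

```python
import random

def decide(model_score, threshold=0.5, epsilon=0.05):
    """Accept/reject with randomized exploration.

    Rejected individuals never produce labels, so the model is censored on
    exactly the region it rejects. With probability epsilon we accept
    regardless of the score, letting examples from otherwise-censored
    groups enter the training data and correct the model.
    """
    if random.random() < epsilon:
        return True  # exploratory accept: outcome will be observed
    return model_score >= threshold  # standard thresholded decision

def collect_labels(applicants, score_fn, observe_outcome):
    """One feedback round: only accepted applicants yield labels."""
    labeled = []
    for x in applicants:
        if decide(score_fn(x)):
            labeled.append((x, observe_outcome(x)))
        # rejected applicants are censored: no label is collected
    return labeled
```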
arXiv Detail & Related papers (2023-05-15T21:42:22Z)
- From Contextual Data to Newsvendor Decisions: On the Actual Performance of Data-Driven Algorithms [2.9603743540540357]
We study how the relevance and quantity of past data affects the performance of a data-driven policy.
We consider a setting in which past demands observed under "close by" contexts come from "close by" distributions.
arXiv Detail & Related papers (2023-02-16T17:03:39Z)
- CertainNet: Sampling-free Uncertainty Estimation for Object Detection [65.28989536741658]
Estimating the uncertainty of a neural network plays a fundamental role in safety-critical settings.
In this work, we propose a novel sampling-free uncertainty estimation method for object detection.
We call it CertainNet, and it is the first to provide separate uncertainties for each output signal: objectness, class, location and size.
arXiv Detail & Related papers (2021-10-04T17:59:31Z)
- A fuzzy-rough uncertainty measure to discover bias encoded explicitly or implicitly in features of structured pattern classification datasets [0.0]
We study the existence of bias encoded implicitly in non-protected features as defined by the correlation between protected and unprotected attributes.
We conduct a sensitivity analysis to determine the fuzzy operators and distance function that best capture change in the boundary regions.
arXiv Detail & Related papers (2021-08-20T10:27:32Z)
- Embracing Uncertainty: Decoupling and De-bias for Robust Temporal Grounding [23.571580627202405]
Temporal grounding aims to localize temporal boundaries within untrimmed videos by language queries.
It faces the challenge of two types of inevitable human uncertainties: query uncertainty and label uncertainty.
We propose a novel DeNet (Decoupling and De-bias) to embrace human uncertainty.
arXiv Detail & Related papers (2021-03-31T07:00:56Z)
- Confidence-Budget Matching for Sequential Budgeted Learning [69.77435313099366]
We formalize decision-making problems with a querying budget.
We consider multi-armed bandits, linear bandits, and reinforcement learning problems.
We show that CBM-based algorithms perform well in the presence of adversity.
arXiv Detail & Related papers (2021-02-05T19:56:31Z)
- Robustness Threats of Differential Privacy [70.818129585404]
We experimentally demonstrate that networks trained with differential privacy can, in some settings, be even more vulnerable than their non-private counterparts.
We study how the main ingredients of differentially private neural networks training, such as gradient clipping and noise addition, affect the robustness of the model.
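For reference, a minimal NumPy sketch of the two ingredients this entry names, per-example gradient clipping and Gaussian noise addition, in the style of DP-SGD (hyperparameter values are illustrative, not taken from the paper):

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0,
                noise_multiplier=1.0):
    """One DP-SGD update: clip each example's gradient, average, add noise."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))  # clipping
    mean_grad = np.mean(clipped, axis=0)
    # Gaussian noise calibrated to the clipping bound (the sensitivity)
    noise = np.random.normal(0.0, noise_multiplier * clip_norm / len(clipped),
                             size=mean_grad.shape)
    return params - lr * (mean_grad + noise)
```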
arXiv Detail & Related papers (2020-12-14T18:59:24Z)
- Inverse Active Sensing: Modeling and Understanding Timely Decision-Making [111.07204912245841]
We develop a framework for the general setting of evidence-based decision-making under endogenous, context-dependent time pressure.
We demonstrate how it enables modeling intuitive notions of surprise, suspense, and optimality in decision strategies.
arXiv Detail & Related papers (2020-06-25T02:30:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.