Online Learning with Costly Features in Non-stationary Environments
- URL: http://arxiv.org/abs/2307.09388v1
- Date: Tue, 18 Jul 2023 16:13:35 GMT
- Title: Online Learning with Costly Features in Non-stationary Environments
- Authors: Saeed Ghoorchian, Evgenii Kortukov, Setareh Maghsudi
- Abstract summary: In sequential decision-making problems, maximizing long-term rewards is the primary goal.
In real-world problems, collecting beneficial information is often costly.
We develop an algorithm that guarantees sublinear regret in time.
- Score: 6.009759445555003
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Maximizing long-term rewards is the primary goal in sequential
decision-making problems. The majority of existing methods assume that side
information is freely available, enabling the learning agent to observe all
features' states before making a decision. In real-world problems, however,
collecting beneficial information is often costly. Thus, besides the
individual arms' rewards, the agent must also learn which features' states are
worth observing in order to improve its decision-making strategy. The problem
is aggravated in
a non-stationary environment where reward and cost distributions undergo abrupt
changes over time. To address the aforementioned dual learning problem, we
extend the contextual bandit setting and allow the agent to observe subsets of
features' states. The objective is to maximize the long-term average gain,
which is the difference between the accumulated rewards and the paid costs on
average. Therefore, the agent faces a trade-off between minimizing the cost of
information acquisition and possibly improving the decision-making process
using the obtained information. To this end, we develop an algorithm that
guarantees sublinear regret in time. Numerical results demonstrate the
superiority of our proposed policy in a real-world scenario.
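As a rough illustration of the gain objective above: the sketch below, assuming a hypothetical linear reward model and fixed per-feature costs (neither taken from the paper), sweeps the fraction of features the agent pays to observe. It shows only the reward-versus-cost trade-off in a stationary environment, not the authors' algorithm or the non-stationary setting they target.

```python
import numpy as np

# Minimal sketch of the costly-feature bandit trade-off (illustrative
# assumptions throughout; this is not the paper's algorithm).
rng = np.random.default_rng(0)

n_arms, n_features, T = 3, 4, 5000
costs = np.array([0.05, 0.10, 0.02, 0.20])   # hypothetical per-feature costs
W = rng.normal(size=(n_arms, n_features))    # hypothetical linear reward weights

def average_gain(observe_prob):
    """Average gain (reward minus paid observation cost) when each
    feature's state is purchased independently with fixed probability."""
    gain = 0.0
    for _ in range(T):
        state = rng.normal(size=n_features)             # features' states
        observed = rng.random(n_features) < observe_prob
        paid = costs[observed].sum()                    # cost of information
        # Estimate each arm's reward from observed features only;
        # unobserved entries are imputed with their mean (zero).
        estimates = W @ np.where(observed, state, 0.0)
        arm = int(estimates.argmax())                   # greedy arm choice
        reward = float(W[arm] @ state) + rng.normal(scale=0.1)
        gain += reward - paid
    return gain / T

for p in (0.0, 0.5, 1.0):
    print(f"observe_prob={p:.1f}  average gain={average_gain(p):+.3f}")
```

Observing more features improves the greedy arm choice but raises the paid cost, so the best observation rate depends on how the two balance; a learning agent must discover that balance online, and in the non-stationary regime it must track it as reward and cost distributions shift.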
Related papers
- Active Learning for Fair and Stable Online Allocations [6.23798328186465]
We consider feedback from a select subset of agents at each epoch of the online resource allocation process.
Our algorithms provide regret bounds that are sub-linear in the number of time periods for various performance measures.
We show that efficient decision-making does not require extensive feedback and produces efficient outcomes for a variety of problem classes.
arXiv Detail & Related papers (2024-06-20T23:23:23Z) - Learning to Cover: Online Learning and Optimization with Irreversible Decisions [50.5775508521174]
We find that regret grows sub-linearly at a rate $\Theta\left(m^{\frac{1}{2}\cdot\frac{1}{1-2^{-T}}}\right)$, thus converging exponentially fast to $\Theta(\sqrt{m})$ (a numeric check of this exponent appears after this list).
These findings underscore the value of limited online learning and optimization: even a few rounds can yield significant gains over a no-learning baseline.
arXiv Detail & Related papers (2024-06-20T23:00:25Z) - Learning to Schedule Online Tasks with Bandit Feedback [7.671139712158846]
Online task scheduling serves an integral role for task-intensive applications in cloud computing and crowdsourcing.
We propose a double-optimistic learning based Robbins-Monro (DOL-RM) algorithm.
DOL-RM integrates a learning module that performs optimistic estimation of the reward-to-cost ratio and a decision module.
arXiv Detail & Related papers (2024-02-26T10:11:28Z) - Online Decision Mediation [72.80902932543474]
Consider learning a decision support assistant to serve as an intermediary between (oracle) expert behavior and (imperfect) human behavior.
In clinical diagnosis, fully autonomous machine behavior is often beyond ethical affordances.
arXiv Detail & Related papers (2023-10-28T05:59:43Z) - Bayesian Inverse Transition Learning for Offline Settings [30.10905852013852]
Reinforcement learning is commonly used for sequential decision-making in domains such as healthcare and education.
We propose a new constraint-based approach that captures our desiderata for reliably learning a posterior distribution of the transition dynamics $T$.
Our results demonstrate that by using our constraints, we learn a high-performing policy, while considerably reducing the policy's variance over different datasets.
arXiv Detail & Related papers (2023-08-09T17:08:29Z) - Learning to Incentivize Information Acquisition: Proper Scoring Rules
Meet Principal-Agent Model [64.94131130042275]
We study the incentivized information acquisition problem, where a principal hires an agent to gather information on her behalf.
We design a provably sample efficient algorithm that tailors the UCB algorithm to our model.
Our algorithm features a delicate estimation procedure for the optimal profit of the principal, and a conservative correction scheme that ensures the agent's desired actions are incentivized.
arXiv Detail & Related papers (2023-03-15T13:40:16Z) - Information-Gathering in Latent Bandits [79.6953033727455]
We present a method for information-gathering in latent bandits.
We show that always choosing the best arm given the agent's current beliefs about the states can incur higher regret than policies that also gather information about the latent state.
We also show that by choosing arms carefully, we obtain an improved estimation of the state distribution.
arXiv Detail & Related papers (2022-07-08T01:15:12Z) - Reinforcement Learning with Efficient Active Feature Acquisition [59.91808801541007]
In real life, information acquisition might correspond to performing a medical test on a patient.
We propose a model-based reinforcement learning framework that learns an active feature acquisition policy.
Key to the success is a novel sequential variational auto-encoder that learns high-quality representations from partially observed states.
arXiv Detail & Related papers (2020-11-02T08:46:27Z) - Cost-Sensitive Portfolio Selection via Deep Reinforcement Learning [100.73223416589596]
We propose a cost-sensitive portfolio selection method with deep reinforcement learning.
Specifically, a novel two-stream portfolio policy network is devised to extract both price series patterns and asset correlations.
A new cost-sensitive reward function is developed to maximize the accumulated return and constrain both costs via reinforcement learning.
arXiv Detail & Related papers (2020-03-06T06:28:17Z)
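As a quick sanity check on the regret rate quoted in the "Learning to Cover" entry above: the snippet below evaluates the exponent $\frac{1}{2}\cdot\frac{1}{1-2^{-T}}$, as reconstructed from the garbled abstract text (an assumption, though it is consistent with the claimed exponentially fast convergence to $\Theta(\sqrt{m})$).

```python
# Evaluate the reconstructed regret exponent (1/2) * 1/(1 - 2**(-T));
# it decays toward 1/2 exponentially fast in T, matching the claimed
# convergence of the regret rate to Theta(sqrt(m)).
for T in (1, 2, 3, 5, 10):
    exponent = 0.5 / (1.0 - 2.0 ** -T)
    print(f"T={T:2d}  exponent={exponent:.6f}")
# T=10 already puts the exponent within 5e-4 of the limiting value 1/2.
```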