Strategic Decision-Making in the Presence of Information Asymmetry: Provably Efficient RL with Algorithmic Instruments
- URL: http://arxiv.org/abs/2208.11040v1
- Date: Tue, 23 Aug 2022 15:32:44 GMT
- Title: Strategic Decision-Making in the Presence of Information Asymmetry: Provably Efficient RL with Algorithmic Instruments
- Authors: Mengxin Yu, Zhuoran Yang, Jianqing Fan
- Abstract summary: We study offline reinforcement learning under a novel model called strategic MDP.
We propose a novel algorithm, Pessimistic policy Learning with Algorithmic iNstruments (PLAN).
- Score: 55.41685740015095
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study offline reinforcement learning under a novel model called strategic
MDP, which characterizes the strategic interactions between a principal and a
sequence of myopic agents with private types. Due to the bilevel structure and
private types, strategic MDP involves information asymmetry between the
principal and the agents. We focus on the offline RL problem, where the goal is
to learn the optimal policy of the principal concerning a target population of
agents based on a pre-collected dataset that consists of historical
interactions. The unobserved private types confound such a dataset as they
affect both the rewards and observations received by the principal. We propose
a novel algorithm, Pessimistic policy Learning with Algorithmic iNstruments
(PLAN), which leverages the ideas of instrumental variable regression and the
pessimism principle to learn a near-optimal principal's policy in the context
of general function approximation. Our algorithm is based on the critical
observation that the principal's actions serve as valid instrumental variables.
In particular, under a partial coverage assumption on the offline dataset, we
prove that PLAN outputs a $1 / \sqrt{K}$-optimal policy with $K$ being the
number of collected trajectories. We further apply our framework to some
special cases of strategic MDP, including strategic regression, strategic
bandit, and noncompliance in recommendation systems.
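To make the instrumental-variable idea concrete, below is a minimal sketch of two-stage least squares, the linear special case of the IV regression that PLAN builds on; PLAN itself works with general function approximation and adds the pessimism principle, so this shows only the core de-confounding step. The variable names and toy data-generating process are illustrative, not from the paper.

```python
import numpy as np

def two_stage_least_squares(Z, X, Y):
    """Generic two-stage least squares (2SLS): the linear special case of
    the instrumental-variable regression that PLAN builds on.

    Z: (n, d_z) instruments (here playing the role of the principal's actions)
    X: (n, d_x) treatments confounded by the agents' private types
    Y: (n,)     outcomes observed by the principal
    """
    # Stage 1: project the confounded treatment onto the instruments.
    W1, *_ = np.linalg.lstsq(Z, X, rcond=None)
    X_hat = Z @ W1
    # Stage 2: regress the outcome on the de-confounded treatment.
    beta, *_ = np.linalg.lstsq(X_hat, Y, rcond=None)
    return beta

# Toy data: u is an unobserved confounder affecting both X and Y.
rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=(n, 1))        # instrument, independent of u
u = rng.normal(size=n)             # unobserved "private type"
x = 2.0 * z[:, 0] + u + rng.normal(size=n)
y = 3.0 * x + 2.0 * u + rng.normal(size=n)

print(two_stage_least_squares(z, x[:, None], y))  # approx. 3.0; naive OLS is biased
```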
Related papers
- Non-linear Welfare-Aware Strategic Learning [10.448052192725168]
This paper studies algorithmic decision-making in the presence of strategic individual behaviors.
We first generalize the agent best response model in previous works to the non-linear setting.
We show that the three welfare objectives can attain the optimum simultaneously only under restrictive conditions.
arXiv Detail & Related papers (2024-05-03T01:50:03Z)
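A minimal sketch of a non-linear agent best-response model of the kind this paper generalizes: the agent shifts its features to raise a (possibly non-linear) decision score, net of a manipulation cost. The quadratic cost and logistic score below are assumptions for illustration, not the paper's exact specification.

```python
import numpy as np
from scipy.optimize import minimize

def best_response(x0, score_fn, cost=1.0):
    """Agent best response: modify features x to maximize the decision
    score minus a quadratic manipulation cost (an illustrative choice)."""
    objective = lambda x: -(score_fn(x) - 0.5 * cost * np.sum((x - x0) ** 2))
    return minimize(objective, x0).x

# A non-linear (logistic) scoring rule, matching the non-linear setting.
w = np.array([1.0, -0.5])
score = lambda x: 1.0 / (1.0 + np.exp(-x @ w))
x_original = np.array([0.2, 0.4])
print(best_response(x_original, score, cost=2.0))  # features shift along w
```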
- Differentially Private Deep Model-Based Reinforcement Learning [47.651861502104715]
We introduce PriMORL, a model-based RL algorithm with formal differential privacy guarantees.
PriMORL learns an ensemble of trajectory-level DP models of the environment from offline data.
arXiv Detail & Related papers (2024-02-08T10:05:11Z)
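For intuition on how formal DP guarantees enter model training, here is the standard clip-and-noise (Gaussian mechanism) gradient aggregation step. Note that PriMORL applies privacy at the trajectory level to an ensemble of models, so this per-example sketch is only the underlying template.

```python
import numpy as np

def dp_gradient_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """Clip-and-noise (Gaussian mechanism) gradient aggregation, the common
    template for differentially private model training. Illustration only;
    PriMORL's trajectory-level mechanism differs in its granularity."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = per_example_grads.shape
    # Bound each example's contribution by clipping its gradient norm.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads * np.minimum(1.0, clip_norm / (norms + 1e-12))
    # Add Gaussian noise calibrated to the clipping bound.
    noise = rng.normal(scale=noise_multiplier * clip_norm, size=d)
    return (clipped.sum(axis=0) + noise) / n  # privatized average gradient
```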
- Statistically Efficient Variance Reduction with Double Policy Estimation for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning [53.97273491846883]
We propose DPE: an RL algorithm that blends offline sequence modeling and offline reinforcement learning with Double Policy Estimation.
We validate our method on multiple OpenAI Gym tasks from the D4RL benchmarks.
arXiv Detail & Related papers (2023-08-28T20:46:07Z)
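DPE's double policy estimation sits in the doubly robust family of off-policy evaluation. As background, here is the textbook doubly robust estimator in its bandit form, combining a model-based baseline with an importance-weighted correction; the paper's sequence-modeling estimator is more elaborate than this sketch.

```python
import numpy as np

def doubly_robust_ope(rewards, actions, behavior_probs, target_probs, q_hat):
    """Standard doubly robust off-policy value estimate (bandit form).

    rewards:        (n,) observed rewards
    actions:        (n,) logged actions (integer indices)
    behavior_probs: (n,) behavior-policy probabilities of the logged actions
    target_probs:   (n, A) target-policy probabilities over all actions
    q_hat:          (n, A) estimated action values
    """
    n = len(rewards)
    # Model-based baseline: expected q_hat under the target policy.
    baseline = (target_probs * q_hat).sum(axis=1)
    # Importance-weighted correction on the logged actions.
    rho = target_probs[np.arange(n), actions] / behavior_probs
    correction = rho * (rewards - q_hat[np.arange(n), actions])
    return np.mean(baseline + correction)
```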
- Provable Offline Preference-Based Reinforcement Learning [95.00042541409901]
We investigate the problem of offline Preference-based Reinforcement Learning (PbRL) with human feedback.
We consider the general reward setting where the reward can be defined over the whole trajectory.
We introduce a new single-policy concentrability coefficient, which can be upper bounded by the per-trajectory concentrability.
arXiv Detail & Related papers (2023-05-24T07:11:26Z)
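One plausible rendering of the two coefficients, assuming standard density-ratio definitions (the paper's exact normalization may differ):

```latex
% Per-trajectory concentrability: worst-case density ratio between the
% target policy's trajectory distribution and the offline distribution mu.
C_{\mathrm{traj}}(\pi) \;=\; \sup_{\tau}\, \frac{\mathbb{P}^{\pi}(\tau)}{\mu(\tau)}
% The single-policy coefficient requires coverage of pi alone (not of all
% candidate policies) and is upper bounded by the per-trajectory version:
C(\pi) \;\le\; C_{\mathrm{traj}}(\pi)
```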
- Importance Weighted Actor-Critic for Optimal Conservative Offline Reinforcement Learning [23.222448307481073]
We propose a new practical algorithm for offline reinforcement learning (RL) in complex environments with insufficient data coverage.
Our algorithm combines the marginalized importance sampling framework with the actor-critic paradigm.
We provide both theoretical analysis and experimental results to validate the effectiveness of our proposed algorithm.
arXiv Detail & Related papers (2023-01-30T07:53:53Z)
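Schematically, the combination looks like the following actor update: marginalized importance weights reweight offline samples toward the target policy's occupancy, and the actor ascends the weighted advantage. This is a sketch of the general recipe, not the paper's exact saddle-point objective.

```python
import numpy as np

def weighted_actor_update(w, adv, logp_grad, lr=1e-2):
    """One illustrative actor step combining marginalized importance
    sampling with the actor-critic paradigm.

    w:         (n,) estimated occupancy-ratio weights d^pi / d^mu
    adv:       (n,) advantage estimates from the critic
    logp_grad: (n, d) per-sample gradients of log pi(a|s) w.r.t. theta
    """
    # Occupancy-weighted policy gradient over the offline batch.
    grad = (w[:, None] * adv[:, None] * logp_grad).mean(axis=0)
    return lr * grad  # parameter increment for theta
```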
- Offline Reinforcement Learning with Instrumental Variables in Confounded Markov Decision Processes [93.61202366677526]
We study offline reinforcement learning (RL) in the face of unmeasured confounders.
We propose several policy learning methods with finite-sample suboptimality guarantees for finding the optimal in-class policy.
arXiv Detail & Related papers (2022-09-18T22:03:55Z)
- Model-Based Offline Meta-Reinforcement Learning with Regularization [63.35040401948943]
Offline meta-RL is emerging as a promising approach for learning across tasks without costly online data collection.
MerPO learns a meta-model for efficient task structure inference and an informative meta-policy.
We show that MerPO offers guaranteed improvement over both the behavior policy and the meta-policy.
arXiv Detail & Related papers (2022-02-07T04:15:20Z)
- Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety Constraints in Finite MDPs [71.47895794305883]
We study the problem of Safe Policy Improvement (SPI) under constraints in the offline Reinforcement Learning setting.
We present an SPI algorithm for this setting that takes into account the user's preferences when handling the trade-offs between different reward signals.
arXiv Detail & Related papers (2021-05-31T21:04:21Z)
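The single-objective SPIBB template behind this line of work, sketched for one state: the improved policy copies the baseline wherever state-action counts are too small to trust and optimizes only where the data supports it. The multi-objective trade-off handling is omitted, and the threshold n_min is an illustrative parameter.

```python
import numpy as np

def spibb_policy(q_hat, pi_baseline, counts, n_min=10):
    """Schematic SPIBB-style safe improvement for a single state.

    q_hat:       (A,) estimated action values in this state
    pi_baseline: (A,) baseline policy probabilities
    counts:      (A,) number of times each action was logged in this state
    """
    # Keep the baseline's mass on poorly estimated (bootstrapped) actions.
    uncertain = counts < n_min
    pi = np.where(uncertain, pi_baseline, 0.0)
    # Reallocate the remaining mass greedily among well-estimated actions.
    free_mass = 1.0 - pi.sum()
    if free_mass > 0 and (~uncertain).any():
        trusted = np.where(~uncertain)[0]
        best = trusted[np.argmax(q_hat[trusted])]
        pi[best] += free_mass
    return pi
```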
- Robust Batch Policy Learning in Markov Decision Processes [0.0]
We study the offline, data-driven sequential decision-making problem in the framework of Markov decision processes (MDPs).
We propose to evaluate each policy by a set of average rewards with respect to distributions centered at the policy-induced stationary distribution.
arXiv Detail & Related papers (2020-11-09T04:41:21Z)
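One plausible way to write the resulting robust criterion, assuming a divergence ball of radius epsilon around the policy-induced stationary distribution (notation is ours, not the paper's):

```latex
% Worst-case average reward over a divergence ball of radius epsilon
% around the stationary distribution d^{\pi} induced by the policy:
J_{\mathrm{rob}}(\pi) \;=\;
  \min_{d:\, D(d,\, d^{\pi}) \le \epsilon}\;
  \mathbb{E}_{s \sim d}\!\left[ r\!\left(s, \pi(s)\right) \right]
```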