SIDE: I Infer the State I Want to Learn
- URL: http://arxiv.org/abs/2105.06228v1
- Date: Thu, 13 May 2021 12:26:02 GMT
- Title: SIDE: I Infer the State I Want to Learn
- Authors: Zhiwei Xu, Yunpeng Bai, Dapeng Li, Bin Zhang, Guoliang Fan
- Abstract summary: We propose a novel value decomposition framework, named State Inference for value DEcomposition (SIDE), which eliminates the need to know the true state.
SIDE can be extended to any value decomposition method, as well as other types of multi-agent algorithms in the case of Dec-POMDP.
- Score: 17.993973801986677
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As one of the solutions to the Dec-POMDP problem, the value decomposition
method has achieved good results recently. However, most value decomposition
methods require the global state during training, which is not feasible in
scenarios where the global state cannot be obtained. Therefore, we propose
a novel value decomposition framework, named State Inference for value
DEcomposition (SIDE), which eliminates the need to know the true state by
simultaneously seeking solutions to the two problems of optimal control and
state inference. SIDE can be extended to any value decomposition method, as
well as other types of multi-agent algorithms in the case of Dec-POMDP. Based
on the performance of different algorithms in StarCraft II micromanagement
tasks, we verify that SIDE can construct, from past local observations, a
current state that benefits the reinforcement learning process.
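
To make the framework concrete, here is a minimal sketch of how such a state-inference module could plug into a QMIX-style value decomposition pipeline. It is an illustration under assumptions, not the authors' implementation: the recurrent encoder, the concatenation pooling, and all module names are placeholders, and SIDE itself trains state inference jointly with the control objective rather than as a separate stage.

```python
# Minimal sketch of the SIDE idea (illustrative only, not the authors' code):
# each agent summarizes its local observation history with a GRU, the summaries
# are pooled into an inferred global state, and that inferred state stands in
# for the true state in a QMIX-style monotonic mixing network.
import torch
import torch.nn as nn


class StateInference(nn.Module):
    def __init__(self, obs_dim, hidden_dim, state_dim, n_agents):
        super().__init__()
        self.encoder = nn.GRU(obs_dim, hidden_dim, batch_first=True)
        self.to_state = nn.Linear(n_agents * hidden_dim, state_dim)

    def forward(self, obs_seq):
        # obs_seq: (batch, n_agents, time, obs_dim), past local observations only
        b, n, t, d = obs_seq.shape
        _, h = self.encoder(obs_seq.reshape(b * n, t, d))  # h: (1, b*n, hidden_dim)
        h = h.squeeze(0).reshape(b, -1)                    # concatenate per-agent summaries
        return self.to_state(h)                            # inferred state


class MonotonicMixer(nn.Module):
    """QMIX-style mixer conditioned on the (inferred) state."""

    def __init__(self, n_agents, state_dim, embed_dim=32):
        super().__init__()
        self.embed_dim = embed_dim
        self.w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.b1 = nn.Linear(state_dim, embed_dim)
        self.w2 = nn.Linear(state_dim, embed_dim)
        self.b2 = nn.Linear(state_dim, 1)

    def forward(self, agent_qs, state):
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        b = agent_qs.size(0)
        w1 = torch.abs(self.w1(state)).view(b, -1, self.embed_dim)
        hidden = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1) + self.b1(state).unsqueeze(1))
        w2 = torch.abs(self.w2(state)).view(b, self.embed_dim, 1)
        return (torch.bmm(hidden, w2) + self.b2(state).unsqueeze(1)).view(b)  # Q_tot
```

During training, the inferred state would be passed to the mixer wherever the true global state is normally required, so a single TD loss can drive both the value decomposition and the state-inference module, in line with the abstract's claim of solving optimal control and state inference simultaneously.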
Related papers
- STAT: Towards Generalizable Temporal Action Localization [56.634561073746056]
Weakly-supervised temporal action localization (WTAL) aims to recognize and localize action instances with only video-level labels.
Existing methods suffer from severe performance degradation when transferring to different distributions.
We propose GTAL, which focuses on improving the generalizability of action localization methods.
arXiv Detail & Related papers (2024-04-20T07:56:21Z)
- Online POMDP Planning with Anytime Deterministic Guarantees [11.157761902108692]
Planning under uncertainty can be mathematically formalized using partially observable Markov decision processes (POMDPs).
Finding an optimal plan for POMDPs can be computationally expensive and is feasible only for small tasks.
We derive a deterministic relationship between a simplified solution that is easier to obtain and the theoretically optimal one.
arXiv Detail & Related papers (2023-10-03T04:40:38Z)
- Intermittently Observable Markov Decision Processes [26.118176084782842]
We consider a scenario where the controller perceives the state information of the process via an unreliable communication channel.
The transmissions of state information over the whole time horizon are modeled as a Bernoulli lossy process.
We develop two finite-state approximations to the tree MDP to find near-optimal policies efficiently.
arXiv Detail & Related papers (2023-02-23T03:38:03Z)
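
A hedged reading of the setting in the entry above, as a tiny sketch: with a Bernoulli-lossy channel the controller effectively conditions on the last received state and the number of steps since it arrived, and a finite-state approximation truncates that counter at some K. The helper names and the augmented-state Q-table are assumptions for illustration, not the paper's construction.

```python
# Hypothetical illustration of control over a Bernoulli-lossy state channel.
import numpy as np

def belief_after_k_steps(P, last_state, k):
    """Belief over the true state after k unobserved transitions from last_state.

    P is an (S, S) transition matrix under the behavior followed meanwhile."""
    b = np.zeros(P.shape[0])
    b[last_state] = 1.0
    return b @ np.linalg.matrix_power(P, k)

def greedy_action(Q, last_state, steps_since, K):
    """Q is a table over the truncated augmented state (last_state, min(steps, K))."""
    return int(np.argmax(Q[last_state, min(steps_since, K)]))
```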
- GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP, and Beyond [101.5329678997916]
We study sample efficient reinforcement learning (RL) under the general framework of interactive decision making.
We propose a novel complexity measure, generalized eluder coefficient (GEC), which characterizes the fundamental tradeoff between exploration and exploitation.
We show that RL problems with low GEC form a remarkably rich class, which subsumes low Bellman eluder dimension problems, bilinear class, low witness rank problems, PO-bilinear class, and generalized regular PSR.
arXiv Detail & Related papers (2022-11-03T16:42:40Z)
- Nearly Optimal Latent State Decoding in Block MDPs [74.51224067640717]
In episodic Block MDPs, the decision maker has access to rich observations or contexts generated from a small number of latent states.
We are first interested in estimating the latent state decoding function based on data generated under a fixed behavior policy.
We then study the problem of learning near-optimal policies in the reward-free framework.
arXiv Detail & Related papers (2022-08-17T18:49:53Z)
- Value-Consistent Representation Learning for Data-Efficient Reinforcement Learning [105.70602423944148]
We propose a novel method, called value-consistent representation learning (VCR), to learn representations that are directly related to decision-making.
Instead of aligning the imagined state with the real state returned by the environment, VCR applies a $Q$-value head to both states and obtains two distributions of action values (see the sketch after this entry).
It has been demonstrated that our methods achieve new state-of-the-art performance for search-free RL algorithms.
arXiv Detail & Related papers (2022-06-25T03:02:25Z)
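
Because the VCR summary above hinges on one concrete step, here is a hedged sketch of it (the KL form, the temperature, and the stop-gradient are assumptions for illustration, not the VCR authors' code): a shared $Q$-value head scores both the imagined state and the encoded real state, and the two induced action-value distributions are pulled together.

```python
# Hedged sketch of the value-consistency step (illustrative names, not VCR's code).
import torch
import torch.nn.functional as F

def value_consistency_loss(q_head, imagined_state, real_state, temperature=1.0):
    """q_head maps a state embedding to a vector of action values."""
    q_imagined = q_head(imagined_state)      # (batch, n_actions)
    q_real = q_head(real_state).detach()     # treat the real state's values as the target
    log_p = F.log_softmax(q_imagined / temperature, dim=-1)
    target = F.softmax(q_real / temperature, dim=-1)
    # Small when the imagined state induces the same action preferences as the
    # state actually returned by the environment.
    return F.kl_div(log_p, target, reduction="batchmean")
```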
- Embed to Control Partially Observed Systems: Representation Learning with Provable Sample Efficiency [105.17746223041954]
Reinforcement learning in partially observed Markov decision processes (POMDPs) faces two challenges.
It often takes the full history to predict the future, which induces a sample complexity that scales exponentially with the horizon.
We propose a reinforcement learning algorithm named Embed to Control (ETC), which learns the representation at two levels while optimizing the policy.
arXiv Detail & Related papers (2022-05-26T16:34:46Z)
- An Adaptive State Aggregation Algorithm for Markov Decision Processes [10.494611365482028]
We propose an intuitive algorithm for solving MDPs that reduces the cost of value iteration updates by dynamically grouping together states with similar cost-to-go values.
Our algorithm converges almost surely to within $2\varepsilon / (1 - \gamma)$ of the true optimal value in the $\ell_\infty$ norm, where $\gamma$ is the discount factor and aggregated states differ by at most $\varepsilon$.
arXiv Detail & Related papers (2021-07-23T07:19:43Z)
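
For intuition, a rough sketch of $\varepsilon$-bucketed value iteration consistent with the summary above; the grouping rule, stopping test, and array layout are illustrative assumptions rather than the paper's exact algorithm, and as written the code shows the grouping, not the cost savings.

```python
# Illustrative epsilon-bucketed value iteration (hypothetical layout):
# P has shape (A, S, S), R has shape (S, A); states whose current value
# estimates fall in the same epsilon-wide bucket share one aggregated value,
# matching the "aggregated states differ by at most epsilon" condition above.
import numpy as np

def aggregated_value_iteration(P, R, gamma, eps, iters=1000, tol=1e-8):
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    for _ in range(iters):
        buckets = np.floor(V / eps).astype(int)                       # dynamic grouping
        V_agg = np.array([V[buckets == b].mean() for b in buckets])   # shared value per group
        Q = R + gamma * np.einsum("asj,j->sa", P, V_agg)              # one-step lookahead
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V
```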
- Combining Deep Learning and Optimization for Security-Constrained Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling of APR within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z)
- State Action Separable Reinforcement Learning [11.04892417160547]
We propose a new learning paradigm, State Action Separable Reinforcement Learning (sasRL), wherein the action space is decoupled from the value-function learning process for higher efficiency.
Experiments on several gaming scenarios show that sasRL outperforms state-of-the-art MDP-based RL algorithms by up to $75\%$.
arXiv Detail & Related papers (2020-06-05T22:02:57Z)
- A State Aggregation Approach for Solving Knapsack Problem with Deep Reinforcement Learning [3.614984020677526]
This paper proposes a Deep Reinforcement Learning (DRL) approach for solving the knapsack problem.
The state aggregation policy is applied to each problem instance of the knapsack problem.
The proposed model with the state aggregation strategy not only gives better solutions but also learns in fewer timesteps than the one without state aggregation.
arXiv Detail & Related papers (2020-04-25T11:52:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.