Extrinsicaly Rewarded Soft Q Imitation Learning with Discriminator
- URL: http://arxiv.org/abs/2401.16772v1
- Date: Tue, 30 Jan 2024 06:22:19 GMT
- Title: Extrinsicaly Rewarded Soft Q Imitation Learning with Discriminator
- Authors: Ryoma Furuyama, Daiki Kuyoshi and Satoshi Yamane
- Abstract summary: Supervised learning methods such as Behavioral Cloning do not require sampling data, but usually suffer from distribution shift.
Soft Q imitation learning (SQIL) addresses these problems and has been shown to learn efficiently by combining Behavioral Cloning and soft Q-learning with constant rewards.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Imitation learning is often used alongside reinforcement learning in
environments where reward design is difficult or where the reward is sparse,
but it is hard to imitate well in unknown states from a small amount of expert
data and sampling data. Supervised learning methods such as Behavioral Cloning
do not require sampling data, but usually suffer from distribution shift.
Methods based on reinforcement learning, such as inverse reinforcement learning
and Generative Adversarial Imitation Learning (GAIL), can learn from only a
small amount of expert data; however, they often require interaction with the
environment. Soft Q imitation learning (SQIL) addressed these problems and was
shown to learn efficiently by combining Behavioral Cloning and soft Q-learning
with constant rewards. To make this algorithm more robust to distribution
shift, we propose a more efficient and robust algorithm that augments SQIL with
a reward function based on adversarial inverse reinforcement learning, which
rewards the agent for taking actions in states similar to the demonstrations.
We call this algorithm Discriminator Soft Q Imitation Learning (DSQIL). We
evaluated it on MuJoCo environments.
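To make the idea in the abstract concrete, below is a minimal sketch (not taken from the paper) of the two reward schemes it contrasts: SQIL's constant rewards and a discriminator-based reward in the spirit of adversarial inverse reinforcement learning. The network sizes, the AIRL-style reward form log D - log(1 - D), and the helper names are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only (not the authors' code). SQIL labels expert transitions
# with reward 1 and agent transitions with reward 0; a discriminator-based variant
# in the spirit of DSQIL would instead score transitions by how expert-like the
# (state, action) pair looks. The AIRL-style form below is an assumption.
import torch
import torch.nn as nn


class Discriminator(nn.Module):
    """Classifies (state, action) pairs as expert (label 1) vs. agent (label 0)."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # outputs a logit
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1))


def sqil_reward(is_expert: torch.Tensor) -> torch.Tensor:
    """Plain SQIL: constant reward 1 for expert transitions, 0 for agent transitions."""
    return is_expert.float()


def discriminator_reward(disc: Discriminator, obs: torch.Tensor,
                         act: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Hypothetical AIRL/GAIL-style reward: large when (s, a) resembles the demos."""
    with torch.no_grad():
        d = torch.sigmoid(disc(obs, act))
        return torch.log(d + eps) - torch.log(1.0 - d + eps)


if __name__ == "__main__":
    disc = Discriminator(obs_dim=11, act_dim=3)  # e.g. Hopper observation/action sizes
    obs, act = torch.randn(4, 11), torch.randn(4, 3)
    print(discriminator_reward(disc, obs, act).squeeze(-1))
```

In a SQIL-style training loop, such rewards would label transitions in a replay buffer mixing expert and agent data before the soft Q-learning update, and the discriminator itself would be trained with a binary cross-entropy loss to separate the two sources.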
Related papers
- Machine Unlearning in Forgettability Sequence [22.497699136603877]
We identify key factors affecting unlearning difficulty and the performance of unlearning algorithms.
We propose a general unlearning framework, dubbed RSU, which consists of a Ranking module and a SeqUnlearn module.
arXiv Detail & Related papers (2024-10-09T01:12:07Z)
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over the potentially suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- When Do Curricula Work in Federated Learning? [56.88941905240137]
We find that curriculum learning largely alleviates non-IIDness.
The more disparate the data distributions across clients, the more they benefit from curriculum learning.
We propose a novel client selection technique that benefits from the real-world disparity in the clients.
arXiv Detail & Related papers (2022-12-24T11:02:35Z)
- Learning Fast Sample Re-weighting Without Reward Data [41.92662851886547]
This paper presents a novel learning-based fast sample re-weighting (FSR) method that does not require additional reward data.
Our experiments show the proposed method achieves competitive results compared to the state of the art on label noise and long-tailed recognition.
arXiv Detail & Related papers (2021-09-07T17:30:56Z)
- IQ-Learn: Inverse soft-Q Learning for Imitation [95.06031307730245]
Imitation learning from a small amount of expert data can be challenging in high-dimensional environments with complex dynamics.
Behavioral cloning is widely used due to its simplicity of implementation and stable convergence.
We introduce a method for dynamics-aware IL which avoids adversarial training by learning a single Q-function.
arXiv Detail & Related papers (2021-06-23T03:43:10Z)
- Low-Regret Active Learning [64.36270166907788]
We develop an online learning algorithm for identifying unlabeled data points that are most informative for training.
At the core of our work is an efficient algorithm for sleeping experts that is tailored to achieve low regret on predictable (easy) instances.
arXiv Detail & Related papers (2021-04-06T22:53:45Z)
- Bridging the Imitation Gap by Adaptive Insubordination [88.35564081175642]
We show that when the teaching agent makes decisions with access to privileged information, this information is marginalized during imitation learning.
We propose 'Adaptive Insubordination' (ADVISOR) to address this gap.
ADVISOR dynamically weights imitation and reward-based reinforcement learning losses during training, enabling on-the-fly switching between imitation and exploration.
arXiv Detail & Related papers (2020-07-23T17:59:57Z)
- DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction [96.90215318875859]
We show that bootstrapping-based Q-learning algorithms do not necessarily benefit from corrective feedback.
We propose a new algorithm, DisCor, which computes an approximation to this optimal distribution and uses it to re-weight the transitions used for training.
arXiv Detail & Related papers (2020-03-16T16:18:52Z)
- Discriminator Soft Actor Critic without Extrinsic Rewards [0.30586855806896046]
It is difficult to imitate well in unknown states from a small amount of expert data and sampling data.
We propose Discriminator Soft Actor Critic (DSAC) to make this algorithm more robust to distribution shift.
arXiv Detail & Related papers (2020-01-19T10:45:35Z)