Discriminator Soft Actor Critic without Extrinsic Rewards
- URL: http://arxiv.org/abs/2001.06808v3
- Date: Fri, 31 Jan 2020 12:39:52 GMT
- Title: Discriminator Soft Actor Critic without Extrinsic Rewards
- Authors: Daichi Nishio, Daiki Kuyoshi, Toi Tsuneda and Satoshi Yamane
- Abstract summary: It is difficult to imitate well in unknown states from a small amount of expert data and sampled data.
We propose Discriminator Soft Actor Critic (DSAC), which makes soft Q imitation learning more robust to distribution shift.
- Score: 0.30586855806896046
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is difficult to imitate well in unknown states from a small
amount of expert data and sampled data. Supervised learning methods such as
Behavioral Cloning do not require sampling data from the environment, but they
usually suffer from distribution shift. Methods based on reinforcement
learning, such as inverse reinforcement learning and generative adversarial
imitation learning (GAIL), can learn from only a small amount of expert data,
but they often require interaction with the environment. Soft Q imitation
learning (SQIL) addressed these problems and was shown to learn efficiently by
combining Behavioral Cloning and soft Q-learning with constant rewards. To
make this algorithm more robust to distribution shift, we propose
Discriminator Soft Actor Critic (DSAC). It uses a reward function based on
adversarial inverse reinforcement learning instead of constant rewards. We
evaluated it on PyBullet environments with only four expert trajectories.
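The key change relative to SQIL is the reward assigned to transitions in the replay buffer. Below is a minimal, illustrative sketch (not the authors' code) of that idea: it assumes a learned discriminator D(s, a) in the style of adversarial inverse reinforcement learning that estimates the probability a transition comes from the expert, and contrasts SQIL's constant rewards with a discriminator-based reward of the usual AIRL/GAIL form log D - log(1 - D).

```python
import torch

def sqil_reward(is_expert: torch.Tensor) -> torch.Tensor:
    """SQIL-style constant rewards: 1 for expert transitions, 0 for sampled ones."""
    return is_expert.float()

def dsac_reward(discriminator, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
    """Discriminator-based reward in the spirit of AIRL (illustrative sketch only).

    `discriminator` is assumed to map (state, action) to D(s, a) in (0, 1),
    the estimated probability that the transition comes from the expert.
    The reward log D - log(1 - D) is positive for expert-like transitions
    and negative otherwise, replacing SQIL's constant rewards.
    """
    d = discriminator(state, action).clamp(1e-6, 1.0 - 1e-6)
    return torch.log(d) - torch.log(1.0 - d)

if __name__ == "__main__":
    # Toy check with a stand-in "discriminator" (a fixed function, not a trained network).
    fake_discriminator = lambda s, a: torch.sigmoid((s * a).sum(dim=-1, keepdim=True))
    states, actions = torch.randn(4, 3), torch.randn(4, 3)
    print(sqil_reward(torch.tensor([1, 0, 1, 0])))
    print(dsac_reward(fake_discriminator, states, actions))
```

Either reward would then be fed to the soft actor-critic (or soft Q-learning) update in place of an environment reward; the exact discriminator architecture and reward form used in the paper may differ from this sketch.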
Related papers
- Machine Unlearning in Forgettability Sequence [22.497699136603877]
We identify key factors affecting unlearning difficulty and the performance of unlearning algorithms.
We propose a general unlearning framework, dubbed RSU, which consists of a Ranking module and a SeqUnlearn module.
arXiv Detail & Related papers (2024-10-09T01:12:07Z)
- A Dual Approach to Imitation Learning from Observations with Offline Datasets [19.856363985916644]
Demonstrations are an effective alternative to task specification for learning agents in settings where designing a reward function is difficult.
We derive DILO, an algorithm that can leverage arbitrary suboptimal data to learn imitation policies without requiring expert actions.
arXiv Detail & Related papers (2024-06-13T04:39:42Z)
- Extrinsicaly Rewarded Soft Q Imitation Learning with Discriminator [0.0]
Supervised learning methods such as Behavioral Cloning do not require sampling data, but usually suffer from distribution shift.
Soft Q imitation learning (SQIL) addressed these problems and was shown to learn efficiently by combining Behavioral Cloning and soft Q-learning with constant rewards.
arXiv Detail & Related papers (2024-01-30T06:22:19Z)
- Enhancing Consistency and Mitigating Bias: A Data Replay Approach for Incremental Learning [100.7407460674153]
Deep learning systems are prone to catastrophic forgetting when learning from a sequence of tasks.
To mitigate the problem, a line of methods proposes to replay data from previously learned tasks when learning new ones.
However, storing such data is often impractical because of memory constraints or data privacy issues.
As an alternative, data-free data replay methods have been proposed that invert samples from the classification model.
arXiv Detail & Related papers (2024-01-12T12:51:12Z)
- When Do Curricula Work in Federated Learning? [56.88941905240137]
We find that curriculum learning largely alleviates non-IIDness.
The more disparate the data distributions across clients, the more they benefit from curriculum learning.
We propose a novel client selection technique that benefits from the real-world disparity in the clients.
arXiv Detail & Related papers (2022-12-24T11:02:35Z)
- Bayesian Q-learning With Imperfect Expert Demonstrations [56.55609745121237]
We propose a novel algorithm to speed up Q-learning with the help of a limited amount of imperfect expert demonstrations.
We evaluate our approach on a sparse-reward chain environment and six more complicated Atari games with delayed rewards.
arXiv Detail & Related papers (2022-10-01T17:38:19Z)
- IQ-Learn: Inverse soft-Q Learning for Imitation [95.06031307730245]
Imitation learning from a small amount of expert data can be challenging in high-dimensional environments with complex dynamics.
Behavioral cloning is widely used because of its simplicity of implementation and stable convergence.
We introduce a method for dynamics-aware IL which avoids adversarial training by learning a single Q-function.
arXiv Detail & Related papers (2021-06-23T03:43:10Z)
- Low-Regret Active learning [64.36270166907788]
We develop an online learning algorithm for identifying unlabeled data points that are most informative for training.
At the core of our work is an efficient algorithm for sleeping experts that is tailored to achieve low regret on predictable (easy) instances.
arXiv Detail & Related papers (2021-04-06T22:53:45Z)
- DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction [96.90215318875859]
We show that bootstrapping-based Q-learning algorithms do not necessarily benefit from corrective feedback.
We propose a new algorithm, DisCor, which computes an approximation to this optimal distribution and uses it to re-weight the transitions used for training.
arXiv Detail & Related papers (2020-03-16T16:18:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.