Open Problems and Modern Solutions for Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2302.02298v1
- Date: Sun, 5 Feb 2023 04:42:42 GMT
- Title: Open Problems and Modern Solutions for Deep Reinforcement Learning
- Authors: Weiqin Chen
- Abstract summary: We review two publications that investigate common criticisms of DRL and propose effective solutions.
One designs the reward for human-robot collaboration by combining a manually designed extrinsic reward with a parameterized intrinsic reward function.
The other applies selective attention and particle filtering to rapidly and flexibly select crucial pre-learned features for DRL via approximate inference instead of backpropagation.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Reinforcement Learning (DRL) has achieved great success in solving
complicated decision-making problems. Despite these successes, DRL is frequently
criticized for several reasons, e.g., data inefficiency, inflexibility, and intractable
reward design. In this paper, we review two publications that investigate these
issues of DRL and propose effective solutions. One designs the reward
for human-robot collaboration by combining the manually designed extrinsic
reward with a parameterized intrinsic reward function via the deterministic
policy gradient, which improves task performance and guarantees stronger
obstacle avoidance. The other applies selective attention and particle
filters to rapidly and flexibly attend to and select crucial pre-learned
features for DRL using approximate inference instead of backpropagation,
thereby improving the efficiency and flexibility of DRL. Potential avenues for
future work in both domains are discussed in this paper.
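As a rough illustration of the first idea (combining a hand-designed extrinsic reward with a parameterized intrinsic reward), here is a minimal Python sketch. The linear feature-based intrinsic reward, the additive combination, and the generic gradient-ascent update of the intrinsic-reward parameters are assumptions made for illustration; they are not the authors' exact formulation, which tunes the intrinsic reward via the deterministic policy gradient.

```python
import numpy as np

# Minimal sketch (assumed form, not the paper's exact method): the agent's
# training reward is the hand-designed extrinsic reward plus a parameterized
# intrinsic reward, and the intrinsic-reward parameters are adjusted to
# improve an outer objective such as task return.

def extrinsic_reward(state, goal=np.array([1.0, 1.0])):
    # Hand-designed task reward, e.g., negative distance to a goal
    # (hypothetical form for a human-robot collaboration task).
    return -float(np.linalg.norm(state[:2] - goal))

def intrinsic_reward(state, action, w):
    # Parameterized intrinsic reward: a linear function of simple
    # state-action features (assumed form).
    features = np.concatenate([state, action])
    return float(w @ features)

def total_reward(state, action, w):
    return extrinsic_reward(state) + intrinsic_reward(state, action, w)

def update_intrinsic_params(w, grad_outer_objective, lr=1e-3):
    # The reviewed work updates the intrinsic reward with a deterministic
    # policy-gradient-style rule; here a generic gradient-ascent step on an
    # outer objective stands in as a placeholder.
    return w + lr * grad_outer_objective
```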
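For the second idea (selecting important pre-learned features with selective attention and particle filtering rather than backpropagation), the following sketch runs a generic particle filter over feature-importance weights. The Gaussian perturbation, the score-based reweighting, and the resampling scheme are standard particle-filter choices assumed for illustration, not the paper's exact approximate-inference procedure; `score_fn` is a hypothetical task-score function supplied by the user.

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_feature_selection(score_fn, n_features, n_particles=100,
                                      n_steps=50, noise=0.05):
    """Maintain particles over feature-importance vectors and reweight them by
    how well the attended features support the task (score_fn). Generic sketch,
    not the reviewed paper's exact algorithm."""
    particles = rng.random((n_particles, n_features))  # importances in [0, 1]
    for _ in range(n_steps):
        # Perturb particles (transition model assumed to be Gaussian noise).
        particles = np.clip(particles + noise * rng.standard_normal(particles.shape), 0.0, 1.0)
        # Weight each particle by the task score of the features it attends to.
        scores = np.array([score_fn(p) for p in particles])
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        # Resample in proportion to the weights: approximate inference,
        # no gradients or backpropagation involved.
        idx = rng.choice(n_particles, size=n_particles, p=weights)
        particles = particles[idx]
    return particles.mean(axis=0)  # averaged importance per pre-learned feature

# Hypothetical usage: score an importance vector by, e.g., the return of a
# policy that only sees the highly weighted pre-learned features.
# importance = particle_filter_feature_selection(my_score_fn, n_features=32)
```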
Related papers
- REBEL: A Regularization-Based Solution for Reward Overoptimization in Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and user intentions, values, or social norms can be catastrophic in the real world.
Current methods to mitigate this misalignment work by learning reward functions from human preferences.
We propose a novel concept of reward regularization within the robotic RLHF framework.
arXiv Detail & Related papers (2023-12-22T04:56:37Z)
- Efficient Diffusion Policies for Offline Reinforcement Learning [85.73757789282212]
Diffusion-QL significantly boosts the performance of offline RL by representing a policy with a diffusion model, but running the sampling chain during training is costly.
We propose the efficient diffusion policy (EDP) to overcome this challenge.
EDP constructs actions from corrupted ones at training time to avoid running the full sampling chain.
arXiv Detail & Related papers (2023-05-31T17:55:21Z)
- Reinforcement Learning from Diverse Human Preferences [68.4294547285359]
This paper develops a method for crowd-sourcing preference labels and learning from diverse human preferences.
The proposed method is tested on a variety of tasks in DMControl and Meta-World.
It shows consistent and significant improvements over existing preference-based RL algorithms when learning from diverse feedback.
arXiv Detail & Related papers (2023-01-27T15:18:54Z)
- A Transferable and Automatic Tuning of Deep Reinforcement Learning for Cost Effective Phishing Detection [21.481974148873807]
Many challenging real-world problems require the deployment of ensembles of multiple complementary learning models.
Deep Reinforcement Learning (DRL) offers a cost-effective alternative, where detectors are dynamically chosen based on the output of their predecessors.
arXiv Detail & Related papers (2022-09-19T14:09:07Z)
- Using Deep Reinforcement Learning to solve Optimal Power Flow problem with generator failures [0.0]
Two classical algorithms have been presented to solve the Optimal Power Flow (OPF) problem.
The drawbacks of the vanilla DRL application are discussed, and an algorithm is suggested to improve the performance.
A reward function for the OPF problem is presented that enables inherent issues in DRL to be resolved.
arXiv Detail & Related papers (2022-05-04T15:09:50Z)
- Pessimistic Model Selection for Offline Deep Reinforcement Learning [56.282483586473816]
Deep Reinforcement Learning (DRL) has demonstrated great potential in solving sequential decision-making problems in many applications.
One main barrier is the overfitting issue that leads to poor generalizability of the policy learned by DRL.
We propose a pessimistic model selection (PMS) approach for offline DRL with a theoretical guarantee.
arXiv Detail & Related papers (2021-11-29T06:29:49Z)
- False Correlation Reduction for Offline Reinforcement Learning [115.11954432080749]
We propose falSe COrrelation REduction (SCORE) for offline RL, a practically effective and theoretically provable algorithm.
We empirically show that SCORE achieves SoTA performance with 3.1x acceleration on various tasks in a standard benchmark (D4RL).
arXiv Detail & Related papers (2021-10-24T15:34:03Z)
- Combining Pessimism with Optimism for Robust and Efficient Model-Based Deep Reinforcement Learning [56.17667147101263]
In real-world tasks, reinforcement learning agents encounter situations that are not present during training time.
To ensure reliable performance, the RL agents need to exhibit robustness against worst-case situations.
We propose the Robust Hallucinated Upper-Confidence RL (RH-UCRL) algorithm to provably solve this problem.
arXiv Detail & Related papers (2021-03-18T16:50:17Z)