Towards a Reward-Free Reinforcement Learning Framework for Vehicle Control
- URL: http://arxiv.org/abs/2502.15262v2
- Date: Fri, 18 Apr 2025 09:12:56 GMT
- Title: Towards a Reward-Free Reinforcement Learning Framework for Vehicle Control
- Authors: Jielong Yang, Daoyuan Huang
- Abstract summary: Reinforcement learning plays a crucial role in vehicle control by guiding agents to learn optimal control strategies. In vehicle control applications, rewards typically need to be manually designed while considering multiple implicit factors. We propose a reward-free reinforcement learning framework (RFRLF) to address these issues.
- Score: 1.5883812630616523
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Reinforcement learning plays a crucial role in vehicle control by guiding agents to learn optimal control strategies through designing or learning appropriate reward signals. However, in vehicle control applications, rewards typically need to be manually designed while accounting for multiple implicit factors, which easily introduces human biases. Although imitation learning methods do not rely on explicit reward signals, they require high-quality expert actions, which are often difficult to acquire. To address these issues, we propose a reward-free reinforcement learning framework (RFRLF). This framework directly learns target states to optimize agent behavior through a target state prediction network (TSPN) and a reward-free state-guided policy network (RFSGPN), avoiding dependence on manually designed reward signals. Specifically, the policy network is learned by minimizing the differences between the predicted state and the expert state. Experimental results demonstrate the effectiveness of the proposed RFRLF in controlling vehicle driving, showing its advantages in improving learning efficiency and adapting to reward-free environments.
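The paper's implementation is not shown on this page; purely as a rough illustration of the two-network idea, a minimal PyTorch sketch follows, in which all dimensions, architectures, and the training scheme are assumptions rather than the authors' design:

```python
# Hypothetical sketch of the RFRLF idea: a target state prediction network
# (TSPN) models how actions change the vehicle state, and the policy
# (RFSGPN) is trained so the state it would reach matches the expert's
# state, with no reward signal. All names and sizes are assumptions.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 8, 2  # assumed vehicle state/control sizes

tspn = nn.Sequential(  # predicts next state from (state, action)
    nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(), nn.Linear(64, STATE_DIM))
policy = nn.Sequential(  # reward-free state-guided policy network
    nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, ACTION_DIM), nn.Tanh())

opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def policy_loss(state, expert_next_state):
    action = policy(state)
    predicted_next = tspn(torch.cat([state, action], dim=-1))
    # Match the state the policy would induce to the expert's state.
    return nn.functional.mse_loss(predicted_next, expert_next_state)

# One illustrative update on dummy data (TSPN assumed pre-trained on transitions).
state = torch.randn(32, STATE_DIM)
expert_next = torch.randn(32, STATE_DIM)
loss = policy_loss(state, expert_next)
opt.zero_grad(); loss.backward(); opt.step()
```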
Related papers
- ROAD: Responsibility-Oriented Reward Design for Reinforcement Learning in Autonomous Driving [6.713954449470747]
This study introduces a responsibility-oriented reward function that explicitly incorporates traffic regulations into the reinforcement learning framework. We introduce a Traffic Regulation Knowledge Graph and leverage Vision-Language Models alongside Retrieval-Augmented Generation techniques to automate reward assignment.
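As a hedged illustration (not the paper's code), a regulation-penalized reward might look like the sketch below; in ROAD the violation weights come from the knowledge-graph/VLM pipeline, whereas here they are hard-coded placeholders:

```python
# Hypothetical responsibility-oriented reward: a base driving reward minus
# weighted penalties for traffic-regulation violations. Rule names and
# weights are invented placeholders.
REGULATION_WEIGHTS = {"ran_red_light": 5.0, "speeding": 2.0, "unsafe_gap": 1.5}

def responsibility_reward(progress: float, violations: dict) -> float:
    penalty = sum(REGULATION_WEIGHTS[v] * severity
                  for v, severity in violations.items())
    return progress - penalty

print(responsibility_reward(1.0, {"speeding": 0.4}))  # 1.0 - 2.0*0.4 = 0.2
```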
arXiv Detail & Related papers (2025-05-30T08:00:51Z) - Can Large Reasoning Models Self-Train? [58.953117118687096]
Scaling the performance of large language models increasingly depends on methods that reduce reliance on human supervision. We propose an online self-training reinforcement learning algorithm that leverages the model's self-consistency to infer correctness signals and train without any ground-truth supervision.
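A minimal sketch of the self-consistency signal, assuming majority voting over sampled answers stands in for ground truth (the sampling model and answer extraction are omitted):

```python
# Hypothetical self-consistency reward: sample several answers, treat the
# majority answer as pseudo-ground-truth, and reward agreement with it.
from collections import Counter

def self_consistency_rewards(sampled_answers: list[str]) -> list[float]:
    majority, _ = Counter(sampled_answers).most_common(1)[0]
    return [1.0 if a == majority else 0.0 for a in sampled_answers]

print(self_consistency_rewards(["42", "42", "41", "42"]))  # [1.0, 1.0, 0.0, 1.0]
```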
arXiv Detail & Related papers (2025-05-27T17:16:00Z) - Learning to Reason without External Rewards [100.27210579418562]
Training large language models (LLMs) for complex reasoning via Reinforcement Learning with Verifiable Rewards (RLVR) is effective but limited by reliance on costly, domain-specific supervision. We explore Reinforcement Learning from Internal Feedback (RLIF), a framework that enables LLMs to learn from intrinsic signals without external rewards or labeled data. We propose Intuitor, an RLIF method that uses a model's own confidence, termed self-certainty, as its sole reward signal.
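One plausible formulation of a self-certainty score, measuring how far the model's next-token distributions are from uniform (a peaked distribution is "more certain"); the exact normalization Intuitor uses may differ from this assumption:

```python
# Hypothetical self-certainty: mean KL divergence of next-token
# distributions from uniform, via KL(p || U) = log(V) - H(p).
import torch

def self_certainty(logits: torch.Tensor) -> torch.Tensor:
    # logits: (seq_len, vocab_size) for the generated tokens
    log_probs = torch.log_softmax(logits, dim=-1)
    vocab = logits.shape[-1]
    entropy = -(log_probs.exp() * log_probs).sum(-1)
    return (torch.log(torch.tensor(float(vocab))) - entropy).mean()

print(self_certainty(torch.randn(5, 100)))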
arXiv Detail & Related papers (2025-05-26T07:01:06Z) - RILe: Reinforced Imitation Learning [60.63173816209543]
RILe (Reinforced Imitation Learning) is a framework that combines the strengths of imitation learning and inverse reinforcement learning to learn a dense reward function efficiently.
Our framework produces high-performing policies in high-dimensional tasks where direct imitation fails to replicate complex behaviors.
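As a hedged sketch of the adversarial ingredient behind dense-reward imitation, a discriminator can score agent behavior against expert data and be read as a reward; RILe additionally learns a separate "trainer" that shapes this signal via RL, which is omitted here:

```python
# Hypothetical dense reward from a discriminator over (state, action)
# pairs: -log(1 - D) grows as behavior is mistaken for the expert's.
# Architecture and dimensions are placeholders.
import torch
import torch.nn as nn

disc = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))

def dense_reward(state_action: torch.Tensor) -> torch.Tensor:
    d = torch.sigmoid(disc(state_action))
    return -torch.log(1.0 - d + 1e-8)

print(dense_reward(torch.randn(4, 10)).shape)  # torch.Size([4, 1])
```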
arXiv Detail & Related papers (2024-06-12T17:56:31Z) - Informed Reinforcement Learning for Situation-Aware Traffic Rule Exceptions [22.305075467333673]
In this work, we introduce Informed Reinforcement Learning, where a structured rulebook is integrated as a knowledge source.
We learn trajectories and assess them with a situation-aware reward design, yielding a dynamic reward that allows the agent to learn situations requiring controlled traffic rule exceptions.
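A toy sketch of the situation-aware idea, assuming a rulebook penalty that is waived when the situation licenses a controlled exception; rule names and weights are invented:

```python
# Hypothetical situation-aware reward: rulebook penalties apply unless the
# current situation grants an exception (e.g. crossing a solid line to
# pass a blocked lane).
def situation_aware_reward(progress, rule_violations, situation):
    penalty = 0.0
    for rule, weight in {"crossed_solid_line": 2.0, "stopped_on_road": 1.0}.items():
        if rule in rule_violations and not situation.get(f"exception_{rule}", False):
            penalty += weight
    return progress - penalty

# The violation is forgiven because the situation permits the exception:
print(situation_aware_reward(1.0, {"crossed_solid_line"},
                             {"exception_crossed_solid_line": True}))  # 1.0
```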
arXiv Detail & Related papers (2024-02-06T17:24:06Z) - REBEL: Reward Regularization-Based Approach for Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and human preferences can lead to catastrophic outcomes in the real world. Recent methods aim to mitigate misalignment by learning reward functions from human preferences. We propose a novel concept of reward regularization within the robotic RLHF framework.
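For intuition only, a sketch of regularized preference-based reward learning; the Bradley-Terry loss is standard, while the L2 regularizer here is a generic stand-in for the paper's robotics-specific term:

```python
# Hypothetical regularized reward learning from preferences: a
# Bradley-Terry preference loss plus a regularizer that discourages the
# learned reward from drifting to extreme magnitudes.
import torch
import torch.nn as nn

reward_net = nn.Sequential(nn.Linear(6, 32), nn.ReLU(), nn.Linear(32, 1))

def preference_loss(seg_preferred, seg_other, reg_coef=0.01):
    r_pref = reward_net(seg_preferred).sum()
    r_other = reward_net(seg_other).sum()
    bt = -torch.log(torch.sigmoid(r_pref - r_other) + 1e-8)
    reg = reg_coef * (r_pref ** 2 + r_other ** 2)
    return bt + reg

print(preference_loss(torch.randn(10, 6), torch.randn(10, 6)))
```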
arXiv Detail & Related papers (2023-12-22T04:56:37Z) - Risk-Aware Reward Shaping of Reinforcement Learning Agents for Autonomous Driving [6.613838702441967]
This paper investigates how to use risk-aware reward shaping to improve the training and test performance of RL agents in autonomous driving.
We propose additional reshaped reward terms that encourage exploration and penalize risky driving behaviors.
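A minimal sketch of such shaping, with all terms and weights as placeholder assumptions rather than the paper's exact formulation:

```python
# Hypothetical risk-aware reward shaping: the environment reward is
# augmented with an exploration bonus and a risk penalty.
def shaped_reward(env_reward, state_novelty, risk_score,
                  w_explore=0.1, w_risk=1.0):
    return env_reward + w_explore * state_novelty - w_risk * risk_score

# e.g. tailgating (high risk) wipes out most of the base reward:
print(shaped_reward(1.0, state_novelty=0.2, risk_score=0.8))  # 0.22
```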
arXiv Detail & Related papers (2023-06-05T20:10:36Z) - Developing Driving Strategies Efficiently: A Skill-Based Hierarchical Reinforcement Learning Approach [0.7373617024876725]
Reinforcement learning is a common tool to model driver policies.
We propose "skill-based" hierarchical driving strategies, where motion primitives are designed and used as high-level actions.
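A toy sketch of the hierarchy, assuming hand-designed primitives that expand into short low-level control sequences; the primitive definitions are invented:

```python
# Hypothetical skill-based hierarchy: a high-level policy picks a motion
# primitive; each primitive is a short (steer, accel) sequence.
import random

PRIMITIVES = {
    "keep_lane":   [(0.0, 0.3)] * 10,
    "change_left": [(-0.2, 0.1)] * 5 + [(0.2, 0.1)] * 5,
    "brake":       [(0.0, -0.5)] * 10,
}

def high_level_policy(observation) -> str:
    return random.choice(list(PRIMITIVES))  # stand-in for a learned policy

skill = high_level_policy(None)
for steer, accel in PRIMITIVES[skill]:
    pass  # low-level controls would be sent to the vehicle here
print(skill, len(PRIMITIVES[skill]))
```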
arXiv Detail & Related papers (2023-02-04T15:09:51Z) - Tackling Real-World Autonomous Driving using Deep Reinforcement Learning [63.3756530844707]
In this work, we propose a model-free Deep Reinforcement Learning Planner: a neural network trained to predict acceleration and steering angle.
In order to deploy the system on board the real self-driving car, we also develop a module represented by a tiny neural network.
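As an interface sketch only, a small network mapping an observation to the two controls; the sizes are assumptions, and the paper's on-board module is a "tiny" network in this spirit:

```python
# Hypothetical planner head: observation in, (acceleration, steering) out.
import torch
import torch.nn as nn

planner = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2))

obs = torch.randn(1, 32)
accel, steer = planner(obs).tanh().squeeze(0)  # both squashed to [-1, 1]
print(float(accel), float(steer))
```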
arXiv Detail & Related papers (2022-07-05T16:33:20Z) - Learning energy-efficient driving behaviors by imitating experts [75.12960180185105]
This paper examines the role of imitation learning in bridging the gap between control strategies and realistic limitations in communication and sensing.
We show that imitation learning can succeed in deriving policies that, if adopted by 5% of vehicles, may boost the energy-efficiency of networks with varying traffic conditions by 15% using only local observations.
arXiv Detail & Related papers (2022-06-28T17:08:31Z) - Automatically Learning Fallback Strategies with Model-Free Reinforcement Learning in Safety-Critical Driving Scenarios [9.761912672523977]
We present a principled approach for a model-free Reinforcement Learning (RL) agent to capture multiple modes of behaviour in an environment.
We introduce an extra pseudo-reward term in the reward model to encourage exploration of regions of the state space different from those favored by the optimal policy.
We show that we are able to learn useful policies that would otherwise have been missed during training and would be unavailable when executing the control algorithm.
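A hedged sketch of such a pseudo-reward, assuming a simple nearest-neighbor distance to states visited by the (near-)optimal policy; the metric and coefficient are placeholders:

```python
# Hypothetical pseudo-reward: bonus for states far from those the optimal
# policy visits, so training discovers distinct fallback modes.
import numpy as np

def augmented_reward(env_reward, state, optimal_states, beta=0.5):
    dists = np.linalg.norm(optimal_states - state, axis=1)
    return env_reward + beta * float(dists.min())

optimal_states = np.zeros((100, 4))  # states the optimal policy visits
print(augmented_reward(1.0, np.ones(4), optimal_states))  # 1.0 + 0.5*2.0 = 2.0
```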
arXiv Detail & Related papers (2022-04-11T15:34:49Z) - Residual Reinforcement Learning from Demonstrations [51.56457466788513]
Residual reinforcement learning (RL) has been proposed as a way to solve challenging robotic tasks by adapting control actions from a conventional feedback controller to maximize a reward signal.
We extend the residual formulation to learn from visual inputs and sparse rewards using demonstrations.
Our experimental evaluation on simulated manipulation tasks on a 6-DoF UR5 arm and a 28-DoF dexterous hand demonstrates that residual RL from demonstrations is able to generalize to unseen environment conditions more flexibly than either behavioral cloning or RL fine-tuning.
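The residual decomposition itself is simple; a minimal sketch, with the controller and residual as toy stand-ins:

```python
# Hypothetical residual RL action: executed control is the sum of a
# conventional feedback controller and a learned residual correction.
import numpy as np

def feedback_controller(state):  # e.g. a hand-tuned P-controller
    return -0.5 * state

def learned_residual(state):     # stand-in for the trained policy
    return 0.1 * np.tanh(state)

def act(state):
    return feedback_controller(state) + learned_residual(state)

print(act(np.array([0.4, -0.2])))
```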
arXiv Detail & Related papers (2021-06-15T11:16:49Z) - Generative Adversarial Reward Learning for Generalized Behavior Tendency Inference [71.11416263370823]
We propose a generative inverse reinforcement learning approach for user behavioral preference modelling.
Our model can automatically learn rewards from users' actions based on a discriminative actor-critic network and a Wasserstein GAN.
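As a rough sketch of the Wasserstein-critic ingredient, the critic's score of a (state, action) pair can be read as the inferred reward; the architecture and dimensions are placeholders:

```python
# Hypothetical learned reward from a Wasserstein critic: the critic's
# unbounded score of a user (state, action) pair acts as the reward.
import torch
import torch.nn as nn

critic = nn.Sequential(nn.Linear(12, 64), nn.ReLU(), nn.Linear(64, 1))

def inferred_reward(state_action: torch.Tensor) -> torch.Tensor:
    return critic(state_action)

print(inferred_reward(torch.randn(3, 12)).shape)  # torch.Size([3, 1])
```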
arXiv Detail & Related papers (2021-05-03T13:14:25Z) - Falsification-Based Robust Adversarial Reinforcement Learning [13.467693018395863]
Falsification-based RARL (FRARL) is the first generic framework for integrating temporal logic falsification into adversarial learning to improve policy robustness.
Our experimental results demonstrate that policies trained with a falsification-based adversary generalize better and show less violation of the safety specification in test scenarios.
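A toy sketch of the falsification idea, assuming a trivial robustness function for the property "always |y| <= 1"; real STL robustness semantics are far richer:

```python
# Hypothetical falsification-driven adversary: it is rewarded for pushing a
# trace toward violating a temporal-logic safety property.
def robustness(trace):          # > 0 means the property holds on the trace
    return min(1.0 - abs(y) for y in trace)

def adversary_reward(trace):
    return -robustness(trace)   # adversary profits from near-violations

print(adversary_reward([0.1, 0.8, 0.95]))  # -0.05: close to falsification
```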
arXiv Detail & Related papers (2020-07-01T18:32:05Z) - Off-Policy Adversarial Inverse Reinforcement Learning [0.0]
Adversarial Imitation Learning (AIL) is a class of algorithms in reinforcement learning (RL).
We propose an Off-Policy Adversarial Inverse Reinforcement Learning (Off-policy-AIRL) algorithm which is sample-efficient and achieves good imitation performance.
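The AIRL discriminator structure is standard and worth a worked sketch: D = exp(f) / (exp(f) + pi), so log D - log(1 - D) recovers f - log pi as a reward; the off-policy training from replay-buffer samples is not shown:

```python
# AIRL-style discriminator: the learned f doubles as a reward.
import math

def airl_discriminator(f_value: float, policy_prob: float) -> float:
    return math.exp(f_value) / (math.exp(f_value) + policy_prob)

def recovered_reward(f_value: float, policy_prob: float) -> float:
    d = airl_discriminator(f_value, policy_prob)
    return math.log(d) - math.log(1.0 - d)  # equals f - log(policy_prob)

print(recovered_reward(0.5, 0.2), 0.5 - math.log(0.2))  # both ~2.109
```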
arXiv Detail & Related papers (2020-05-03T16:51:40Z)