Amortized Safe Active Learning for Real-Time Decision-Making: Pretrained Neural Policies from Simulated Nonparametric Functions
- URL: http://arxiv.org/abs/2501.15458v1
- Date: Sun, 26 Jan 2025 09:05:52 GMT
- Title: Amortized Safe Active Learning for Real-Time Decision-Making: Pretrained Neural Policies from Simulated Nonparametric Functions
- Authors: Cen-You Li, Marc Toussaint, Barbara Rakitsch, Christoph Zimmer,
- Abstract summary: Active Learning (AL) is a sequential learning approach aiming at selecting the most informative data for model training.
Key challenges of AL are the repeated model training and acquisition optimization required for data selection.
By leveraging a pretrained neural network policy, our method eliminates the need for repeated model training and acquisition optimization.
- Score: 23.406516455945653
- License:
- Abstract: Active Learning (AL) is a sequential learning approach aiming at selecting the most informative data for model training. In many systems, safety constraints appear during data evaluation, requiring the development of safe AL methods. Key challenges of AL are the repeated model training and acquisition optimization required for data selection, which become particularly restrictive under safety constraints. This repeated effort often creates a bottleneck, especially in physical systems requiring real-time decision-making. In this paper, we propose a novel amortized safe AL framework. By leveraging a pretrained neural network policy, our method eliminates the need for repeated model training and acquisition optimization, achieving substantial speed improvements while maintaining competitive learning outcomes and safety awareness. The policy is trained entirely on synthetic data utilizing a novel safe AL objective. The resulting policy is highly versatile and adapts to a wide range of systems, as we demonstrate in our experiments. Furthermore, our framework is modular and we empirically show that we also achieve superior performance for unconstrained time-sensitive AL tasks if we omit the safety requirement.
Related papers
- Learning Objective-Specific Active Learning Strategies with Attentive
Neural Processes [72.75421975804132]
Learning Active Learning (LAL) suggests to learn the active learning strategy itself, allowing it to adapt to the given setting.
We propose a novel LAL method for classification that exploits symmetry and independence properties of the active learning problem.
Our approach is based on learning from a myopic oracle, which gives our model the ability to adapt to non-standard objectives.
arXiv Detail & Related papers (2023-09-11T14:16:37Z) - Safety Correction from Baseline: Towards the Risk-aware Policy in
Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z) - Evaluating Model-free Reinforcement Learning toward Safety-critical
Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
arXiv Detail & Related papers (2022-12-12T06:30:17Z) - Safe Policy Improvement in Constrained Markov Decision Processes [10.518340300810504]
We present a solution to the synthesis problem by solving its two main challenges: reward-shaping from a set of formal requirements and safe policy update.
For the former, we propose an automatic reward-shaping procedure, defining a scalar reward signal compliant with the task specification.
For the latter, we introduce an algorithm ensuring that the policy is improved in a safe fashion with high-confidence guarantees.
arXiv Detail & Related papers (2022-10-20T13:29:32Z) - Model-based Safe Deep Reinforcement Learning via a Constrained Proximal
Policy Optimization Algorithm [4.128216503196621]
We propose an On-policy Model-based Safe Deep RL algorithm in which we learn the transition dynamics of the environment in an online manner.
We show that our algorithm is more sample efficient and results in lower cumulative hazard violations as compared to constrained model-free approaches.
arXiv Detail & Related papers (2022-10-14T06:53:02Z) - Log Barriers for Safe Black-box Optimization with Application to Safe
Reinforcement Learning [72.97229770329214]
We introduce a general approach for seeking high dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach called LBSGD is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing violation in policy tasks in safe reinforcement learning.
arXiv Detail & Related papers (2022-07-21T11:14:47Z) - Safety and Liveness Guarantees through Reach-Avoid Reinforcement
Learning [24.56889192688925]
Reach-avoid optimal control problems are central to safety and liveness assurance for autonomous robotic systems.
Recent successes in reinforcement learning methods to approximately solve optimal control problems with performance objectives make their application to certification problems attractive.
Recent work has shown promise in extending the reinforcement learning machinery to handle safety-type problems, whose objective is not a sum, but a minimum (or maximum) over time.
arXiv Detail & Related papers (2021-12-23T00:44:38Z) - Safe Reinforcement Learning Using Advantage-Based Intervention [45.79740561754542]
Many sequential decision problems involve finding a policy that maximizes total reward while obeying safety constraints.
We propose a new algorithm, SAILR, that uses an intervention mechanism based on advantage functions to keep the agent safe throughout training.
Our method comes with strong guarantees on safety during both training and deployment.
arXiv Detail & Related papers (2021-06-16T20:28:56Z) - MUSBO: Model-based Uncertainty Regularized and Sample Efficient Batch
Optimization for Deployment Constrained Reinforcement Learning [108.79676336281211]
Continuous deployment of new policies for data collection and online learning is either cost ineffective or impractical.
We propose a new algorithmic learning framework called Model-based Uncertainty regularized and Sample Efficient Batch Optimization.
Our framework discovers novel and high quality samples for each deployment to enable efficient data collection.
arXiv Detail & Related papers (2021-02-23T01:30:55Z) - Chance-Constrained Trajectory Optimization for Safe Exploration and
Learning of Nonlinear Systems [81.7983463275447]
Learning-based control algorithms require data collection with abundant supervision for training.
We present a new approach for optimal motion planning with safe exploration that integrates chance-constrained optimal control with dynamics learning and feedback control.
arXiv Detail & Related papers (2020-05-09T05:57:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.