When to Localize? A Risk-Constrained Reinforcement Learning Approach
- URL: http://arxiv.org/abs/2411.02788v1
- Date: Tue, 05 Nov 2024 03:54:00 GMT
- Title: When to Localize? A Risk-Constrained Reinforcement Learning Approach
- Authors: Chak Lam Shek, Kasra Torshizi, Troi Williams, Pratap Tokekar
- Abstract summary: In some scenarios, a robot needs to selectively localize when it is expensive to obtain observations.
RiskRL is a constrained Reinforcement Learning framework that addresses this problem without requiring full knowledge of the robot's transition and observation models.
- Score: 13.853127103435012
- Abstract: In a standard navigation pipeline, a robot localizes at every time step to lower navigational errors. However, in some scenarios, a robot needs to selectively localize when it is expensive to obtain observations. For example, an underwater robot surfacing to localize too often hinders it from searching for critical items underwater, such as black boxes from crashed aircraft. On the other hand, if the robot never localizes, poor state estimates cause failure to find the items due to inadvertently leaving the search area or entering hazardous, restricted areas. Motivated by these scenarios, we investigate approaches to help a robot determine "when to localize?" We formulate this as a bi-criteria optimization problem: minimize the number of localization actions while ensuring the probability of failure (due to collision or not reaching a desired goal) remains bounded. In recent work, we showed how to formulate this active localization problem as a constrained Partially Observable Markov Decision Process (POMDP), which was solved using an online POMDP solver. However, this approach is too slow and requires full knowledge of the robot transition and observation models. In this paper, we present RiskRL, a constrained Reinforcement Learning (RL) framework that overcomes these limitations. RiskRL uses particle filtering and a recurrent Soft Actor-Critic network to learn a policy that minimizes the number of localizations while ensuring the failure-probability constraint is met. Our numerical experiments show that RiskRL learns a robust policy that outperforms the baseline by at least 13% while also generalizing to unseen environments.
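The bi-criteria problem above is naturally handled with a Lagrangian relaxation: treat each localization as a unit cost and adapt a multiplier on the failure-probability constraint by dual ascent. The Python sketch below is a toy illustration of that mechanism only; the failure model, step sizes, and variable names are assumptions, not the paper's implementation.

```python
# Toy sketch of a risk-constrained objective (illustrative, not RiskRL's code):
# minimize localization count subject to P(failure) <= FAIL_BUDGET, with the
# constraint enforced through dual ascent on a Lagrange multiplier.
import numpy as np

FAIL_BUDGET = 0.05                 # allowed probability of failure
rng = np.random.default_rng(0)

def rollout(p_localize, n_steps=100):
    """Simulate one episode of a policy that localizes with prob. p_localize."""
    n_loc = int((rng.random(n_steps) < p_localize).sum())
    p_fail = min(1.0, 0.002 * (n_steps - n_loc))   # fewer fixes -> more risk
    return n_loc, float(rng.random() < p_fail)

lam, p_localize = 0.0, 0.5
for _ in range(2000):
    n_loc, failed = rollout(p_localize)
    # Primal step (toy): each localization costs 1; failures cost lam.
    p_localize = float(np.clip(p_localize - 0.001 * (1.0 - 0.2 * lam), 0.05, 1.0))
    # Dual step: raise lam while empirical failures exceed the budget.
    lam = max(0.0, lam + 0.01 * (failed - FAIL_BUDGET))
print(f"localize with prob {p_localize:.2f}, multiplier {lam:.2f}")
```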
Related papers
- Disentangling Uncertainty for Safe Social Navigation using Deep Reinforcement Learning [0.4218593777811082]
This work introduces a novel approach that integrates aleatoric, epistemic, and predictive uncertainty estimation into a DRL-based navigation framework.
In uncertain decision-making situations, we propose to change the robot's social behavior to conservative collision avoidance.
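As a rough illustration of how such an uncertainty-aware switch can work (a sketch under assumed thresholds and a small ensemble, not this paper's model): epistemic uncertainty is read off ensemble disagreement, aleatoric uncertainty from a predicted variance head, and either one can trigger the conservative behavior.

```python
# Illustrative sketch (not the paper's model): epistemic uncertainty from
# ensemble disagreement, aleatoric from a predicted variance head, and a
# switch to conservative collision avoidance when either is large.
import torch
import torch.nn as nn

class PolicyHead(nn.Module):
    def __init__(self, obs_dim=10, act_dim=2):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU())
        self.mean = nn.Linear(64, act_dim)
        self.log_var = nn.Linear(64, act_dim)   # aleatoric (data) noise

    def forward(self, obs):
        h = self.body(obs)
        return self.mean(h), torch.exp(self.log_var(h))

ensemble = [PolicyHead() for _ in range(5)]
obs = torch.randn(1, 10)
means, variances = zip(*[m(obs) for m in ensemble])
epistemic = torch.stack(means).var(dim=0).mean()     # model disagreement
aleatoric = torch.stack(variances).mean()            # predicted data noise
# Thresholds are assumptions; a real system would calibrate them.
conservative = bool((epistemic > 0.5).item() or (aleatoric > 1.5).item())
```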
arXiv Detail & Related papers (2024-09-16T18:49:38Z)
- LDP: A Local Diffusion Planner for Efficient Robot Navigation and Collision Avoidance [16.81917489473445]
Conditional diffusion models have been demonstrated to be efficient tools for learning robot policies.
The intricate nature of real-world scenarios, characterized by dynamic obstacles and maze-like structures, underscores the complexity of robot local navigation decision-making.
arXiv Detail & Related papers (2024-07-02T04:53:35Z)
- Model Checking for Closed-Loop Robot Reactive Planning [0.0]
We show how model checking can be used to create multistep plans for a differential drive wheeled robot so that it can avoid immediate danger.
Using a small, purpose-built model-checking algorithm in situ, we generate plans in real time in a way that reflects the egocentric reactive response of simple biological agents.
arXiv Detail & Related papers (2023-11-16T11:02:29Z)
- Distributional Instance Segmentation: Modeling Uncertainty and High Confidence Predictions with Latent-MaskRCNN [77.0623472106488]
In this paper, we explore a class of distributional instance segmentation models using latent codes.
For robotic picking applications, we propose a confidence mask method to achieve the high precision necessary.
We show that our method can significantly reduce critical errors in robotic systems, including our newly released dataset of ambiguous scenes.
arXiv Detail & Related papers (2023-05-03T05:57:29Z)
- A Multiplicative Value Function for Safe and Efficient Reinforcement Learning [131.96501469927733]
We propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic.
The safety critic predicts the probability of constraint violation and discounts the reward critic that only estimates constraint-free returns.
We evaluate our method in four safety-focused environments, including classical RL benchmarks augmented with safety constraints and robot navigation tasks with images and raw Lidar scans as observations.
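A minimal sketch of that multiplicative decomposition, assuming standard PyTorch; the layer sizes and the sigmoid parameterization of the violation probability are illustrative choices, not the paper's architecture.

```python
# Sketch of a multiplicative value function: a safety critic estimating the
# probability of constraint violation discounts a reward critic that models
# only constraint-free returns. Sizes and layers are illustrative.
import torch
import torch.nn as nn

class MultiplicativeCritic(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        def mlp():
            return nn.Sequential(
                nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, 1))
        self.reward_critic = mlp()   # constraint-free return estimate
        self.safety_critic = mlp()   # logit of violation probability

    def forward(self, obs, act):
        x = torch.cat([obs, act], dim=-1)
        q_reward = self.reward_critic(x)
        p_violate = torch.sigmoid(self.safety_critic(x))
        # Returns are discounted by the probability of staying safe.
        return (1.0 - p_violate) * q_reward

critic = MultiplicativeCritic(obs_dim=8, act_dim=2)
q = critic(torch.randn(4, 8), torch.randn(4, 2))   # batch of 4 state-actions
```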
arXiv Detail & Related papers (2023-03-07T18:29:15Z)
- Generalization in Deep Reinforcement Learning for Robotic Navigation by Reward Shaping [0.1588748438612071]
We study the application of DRL algorithms in the context of local navigation problems.
Collision avoidance policies based on DRL present some advantages, but they are quite susceptible to local minima.
We propose a novel reward function that incorporates map information gained in the training stage, increasing the agent's capacity to deliberate about the best course of action.
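One standard way to fold map information into the reward, in the spirit of this entry, is potential-based shaping over a precomputed distance-to-goal field, which provably preserves the optimal policy. The grid, BFS potential, and constants below are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative map-informed reward shaping (not the paper's exact reward):
# a BFS distance-to-goal field serves as a potential function, and the
# shaping term gamma * phi(s') - phi(s) preserves the optimal policy.
from collections import deque
import numpy as np

def distance_potential(free, goal):
    """Negative BFS distance-to-goal over a boolean free-space grid."""
    dist = np.full(free.shape, np.inf)
    dist[goal] = 0.0
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r+1, c), (r-1, c), (r, c+1), (r, c-1)):
            if (0 <= nr < free.shape[0] and 0 <= nc < free.shape[1]
                    and free[nr, nc] and np.isinf(dist[nr, nc])):
                dist[nr, nc] = dist[r, c] + 1
                queue.append((nr, nc))
    return -dist                      # higher potential nearer the goal

def shaped_reward(r_env, phi, s, s_next, gamma=0.99):
    return r_env + gamma * phi[s_next] - phi[s]

phi = distance_potential(np.ones((5, 5), dtype=bool), goal=(4, 4))
print(shaped_reward(-0.01, phi, (0, 0), (0, 1)))   # moving toward goal helps
```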
arXiv Detail & Related papers (2022-09-28T17:34:48Z)
- Verifying Learning-Based Robotic Navigation Systems [61.01217374879221]
We show how modern verification engines can be used for effective model selection.
Specifically, we use verification to detect and rule out policies that may demonstrate suboptimal behavior.
Our work is the first to demonstrate the use of verification backends for recognizing suboptimal DRL policies in real-world robots.
arXiv Detail & Related papers (2022-05-26T17:56:43Z)
- Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization [63.75188254377202]
Deep reinforcement learning algorithms can perform poorly in real-world tasks due to the discrepancy between source and target environments.
We propose a novel model-free actor-critic algorithm to learn robust policies without modeling the disturbance in advance.
Experiments in several robot control tasks demonstrate that SCPO learns robust policies against the disturbance in transition dynamics.
arXiv Detail & Related papers (2021-12-20T13:13:05Z)
- SABER: Data-Driven Motion Planner for Autonomously Navigating Heterogeneous Robots [112.2491765424719]
We present an end-to-end online motion planning framework that uses a data-driven approach to navigate a heterogeneous robot team towards a global goal.
We use stochastic model predictive control (SMPC) to calculate control inputs that satisfy robot dynamics, and we consider uncertainty during obstacle avoidance with chance constraints.
Recurrent neural networks are used to provide a quick estimate of the future state uncertainty considered in the SMPC finite-time horizon solution.
A Deep Q-learning agent is employed to serve as a high-level path planner, providing the SMPC with target positions that move the robots towards a desired global goal.
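The division of labor described above can be sketched end to end with every component reduced to a stand-in: a greedy waypoint picker in place of the DQN, a fixed-form uncertainty estimate in place of the RNN, and a one-step chance-constrained move in place of the full SMPC. Everything below is an assumption for illustration, not SABER's implementation.

```python
# Schematic of the hierarchy (all three components are toy stand-ins):
# high-level planner -> uncertainty estimate -> chance-constrained
# low-level control toward the waypoint.
import numpy as np

def q_planner(state, goal):
    """Stand-in for the DQN: propose a waypoint one unit toward the goal."""
    return state + np.clip(goal - state, -1.0, 1.0)

def rnn_uncertainty(history):
    """Stand-in for the RNN: predicted std grows with elapsed motion."""
    return 0.1 + 0.01 * len(history)

def smpc_step(state, waypoint, sigma, obstacle, margin=0.5, z95=1.645):
    """One chance-constrained step: keep distance > margin + z95 * sigma
    so the obstacle is avoided with ~95% confidence under Gaussian error."""
    direction = waypoint - state
    step = 0.2 * direction / (np.linalg.norm(direction) + 1e-9)
    if np.linalg.norm(state + step - obstacle) < margin + z95 * sigma:
        step = -step                  # back off instead of risking collision
    return step

state, goal, obstacle = np.zeros(2), np.array([5.0, 5.0]), np.array([2.5, 2.5])
history = []
for _ in range(60):
    waypoint = q_planner(state, goal)
    state = state + smpc_step(state, waypoint, rnn_uncertainty(history), obstacle)
    history.append(state.copy())
```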
arXiv Detail & Related papers (2021-08-03T02:56:21Z)
- Learning Risk-aware Costmaps for Traversability in Challenging Environments [16.88528967313285]
We introduce a neural network architecture for robustly learning the distribution of traversability costs.
Because we are motivated by preserving the life of the robot, we tackle this learning problem from the perspective of learning tail-risks.
We show that this approach reliably learns the expected tail risk given a desired probability risk threshold between 0 and 1.
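Tail-risk objectives of this kind are commonly trained with quantile (pinball) regression, whose minimizer is exactly the quantile at the chosen risk threshold; the expected tail risk (CVaR) can then be read off the losses above that quantile. The toy below, with an assumed exponential cost distribution, shows the mechanism rather than the paper's network.

```python
# Toy quantile-regression sketch (illustrative, not the paper's architecture):
# minimizing the pinball loss drives the prediction to the alpha-quantile of
# the traversability-cost distribution.
import torch

def pinball_loss(pred, target, alpha):
    err = target - pred
    return torch.mean(torch.maximum(alpha * err, (alpha - 1.0) * err))

torch.manual_seed(0)
costs = torch.distributions.Exponential(1.0).sample((4096,))  # heavy tail
pred = torch.zeros(1, requires_grad=True)
optim = torch.optim.SGD([pred], lr=0.05)
for _ in range(500):
    optim.zero_grad()
    pinball_loss(pred.expand_as(costs), costs, alpha=0.9).backward()
    optim.step()
# pred approaches the 0.9-quantile of Exp(1), i.e. -ln(0.1) ~= 2.30
print(float(pred))
```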
arXiv Detail & Related papers (2021-07-25T04:12:03Z)
- Risk-Averse MPC via Visual-Inertial Input and Recurrent Networks for Online Collision Avoidance [95.86944752753564]
We propose an online path planning architecture that extends the model predictive control (MPC) formulation to consider future location uncertainties.
Our algorithm combines an object detection pipeline with a recurrent neural network (RNN) which infers the covariance of state estimates.
The robustness of our methods is validated on complex quadruped robot dynamics and can be generally applied to most robotic platforms.
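A minimal sketch of the inference step described above, assuming PyTorch: a GRU maps a history of detections to predicted state variances, which then inflate a proximity cost of the kind an MPC objective would carry. Shapes, sizes, and the cost form are all illustrative.

```python
# Illustrative covariance-inference sketch (not the paper's network): a GRU
# consumes a detection history and emits per-axis variances, which scale an
# MPC-style proximity penalty so uncertain estimates keep larger margins.
import torch
import torch.nn as nn

class CovarianceRNN(nn.Module):
    def __init__(self, obs_dim=4, hidden=32):
        super().__init__()
        self.rnn = nn.GRU(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)          # log-variance of (x, y)

    def forward(self, detections):                # (batch, time, obs_dim)
        out, _ = self.rnn(detections)
        return torch.exp(self.head(out[:, -1]))   # positive variances

def risk_cost(dist_to_obstacle, variance, weight=10.0):
    """Penalize proximity more strongly when the state estimate is uncertain."""
    return weight * variance.sum(-1) / (dist_to_obstacle ** 2 + 1e-6)

net = CovarianceRNN()
variance = net(torch.randn(1, 20, 4))             # 20 frames of detections
print(risk_cost(torch.tensor(1.5), variance))
```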
arXiv Detail & Related papers (2020-07-28T07:34:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.