Composable Model-Free RL for Navigation with Input-Affine Systems
- URL: http://arxiv.org/abs/2602.12492v1
- Date: Fri, 13 Feb 2026 00:19:35 GMT
- Title: Composable Model-Free RL for Navigation with Input-Affine Systems
- Authors: Xinhuan Sang, Abdelrahman Abdelgawad, Roberto Tron
- Abstract summary: As autonomous robots move into complex, dynamic real-world environments, they must learn to navigate safely in real time. We propose a composable, model-free reinforcement learning method that learns a value function and an optimal policy for each individual environment element.
- Score: 3.2917282915992883
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As autonomous robots move into complex, dynamic real-world environments, they must learn to navigate safely in real time, yet anticipating all possible behaviors is infeasible. We propose a composable, model-free reinforcement learning method that learns a value function and an optimal policy for each individual environment element (e.g., goal or obstacle) and composes them online to achieve goal reaching and collision avoidance. Assuming unknown nonlinear dynamics that evolve in continuous time and are input-affine, we derive a continuous-time Hamilton-Jacobi-Bellman (HJB) equation for the value function and show that the corresponding advantage function is quadratic in the action and optimal policy. Based on this structure, we introduce a model-free actor-critic algorithm that learns policies and value functions for static or moving obstacles using gradient descent. We then compose multiple reach/avoid models via a quadratically constrained quadratic program (QCQP), yielding formal obstacle-avoidance guarantees in terms of value-function level sets, providing a model-free alternative to CLF/CBF-based controllers. Simulations demonstrate improved performance over a PPO baseline applied to a discrete-time approximation.
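The abstract pins down enough structure to sketch the composition step: under input-affine dynamics the advantage is quadratic in the action, so each learned element reduces to a per-element optimal action plus a curvature term, and online composition becomes a small QCQP. The sketch below is a hypothetical illustration, not the paper's implementation: the names (`compose_actions`, the `W` matrices, the `margin` thresholds) are invented, and SciPy's general SLSQP solver stands in for a dedicated QCQP solver with the paper's level-set guarantees.

```python
import numpy as np
from scipy.optimize import minimize

def compose_actions(u_goal, obstacles):
    """Compose one goal element with several obstacle elements via a QCQP.

    u_goal    : optimal action of the goal element at the current state
    obstacles : list of (u_obs, W, margin) triples; each quadratic
                constraint keeps that obstacle's quadratic-in-action
                advantage outside a margin, standing in for the paper's
                value-function level-set condition.
    """
    # Objective: stay as close as possible to the goal element's action.
    objective = lambda u: 0.5 * np.dot(u - u_goal, u - u_goal)

    # One quadratic inequality constraint per obstacle element.
    constraints = [
        {"type": "ineq",
         "fun": lambda u, uo=uo, W=W, m=m: 0.5 * (u - uo) @ W @ (u - uo) - m}
        for (uo, W, m) in obstacles
    ]
    res = minimize(objective, x0=np.asarray(u_goal, dtype=float),
                   method="SLSQP", constraints=constraints)
    return res.x

# Toy usage: track the goal action while staying outside a quadratic
# level set centered on one obstacle element's action.
u = compose_actions(u_goal=np.array([1.0, 0.0]),
                    obstacles=[(np.array([0.9, 0.1]), np.eye(2), 0.5)])
```

Note that the avoid constraints make the program nonconvex in general, which is why the paper's dedicated QCQP formulation and level-set analysis matter beyond this naive solve.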
Related papers
- Unifying Model-Free Efficiency and Model-Based Representations via Latent Dynamics [6.208369829942616]
We present Unified Latent Dynamics (ULD), a novel reinforcement learning algorithm. ULD unifies the efficiency of model-free methods with the representational strengths of model-based approaches. It is evaluated on 80 environments spanning Gym locomotion, DeepMind Control (proprioceptive and visual), and Atari.
arXiv Detail & Related papers (2026-02-13T06:06:56Z)
- Latent Spherical Flow Policy for Reinforcement Learning with Combinatorial Actions [31.697208397735395]
Existing approaches embed task-specific value functions into constrained optimization programs or learn deterministic structured policies, sacrificing generality and policy expressiveness. We propose a solver-induced latent spherical flow policy that brings the expressiveness of modern generative policies to RL with combinatorial actions while guaranteeing feasibility by design. Our approach outperforms state-of-the-art baselines by an average of 20.6% across a range of challenging RL tasks.
arXiv Detail & Related papers (2026-01-29T18:49:07Z)
- Efficient Inference for Inverse Reinforcement Learning and Dynamic Discrete Choice Models [35.877107409163784]
Inverse reinforcement learning (IRL) and dynamic discrete choice (DDC) models explain sequential decision-making by recovering reward functions that rationalize observed behavior. We develop a semiparametric framework for debiased inverse reinforcement learning that yields statistically efficient inference for a broad class of reward-dependent functionals.
arXiv Detail & Related papers (2025-12-30T18:41:05Z)
- Operator Models for Continuous-Time Offline Reinforcement Learning [4.808981008878068]
Direct interaction with the environment is often unsafe or impractical, motivating offline reinforcement learning from historical data. We address this by linking reinforcement learning to the Hamilton-Jacobi-Bellman equation and proposing an operator-theoretic algorithm. Specifically, we represent our world model in terms of the infinitesimal generator of controlled diffusion processes learned in a reproducing kernel Hilbert space.
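The operator named here is concrete: the infinitesimal generator L of a diffusion acts on a test function phi as (L phi)(x) = lim_{dt -> 0} E[phi(x_{t+dt}) - phi(x_t) | x_t = x] / dt. Below is a minimal kernel-ridge sketch of estimating that action from snapshot pairs; the RBF kernel, bandwidth, and regularizer are assumptions, and the paper's operator model is considerably richer than this stand-in.

```python
import numpy as np

def rbf(X, Y, bandwidth=0.5):
    """Gaussian (RBF) kernel matrix between row-stacked points."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def fit_generator_action(X, X_next, dt, phi, lam=1e-3):
    """Estimate x -> (L phi)(x) by kernel ridge regression of the
    finite-difference targets (phi(x') - phi(x)) / dt onto x.

    X, X_next : (n, d) arrays of snapshot pairs from trajectories
    phi       : vectorized test function mapping (n, d) -> (n,)
    """
    y = (phi(X_next) - phi(X)) / dt
    K = rbf(X, X)
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)
    return lambda Z: rbf(Z, X) @ alpha
```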
arXiv Detail & Related papers (2025-11-13T14:58:30Z)
- Constrained Decoding for Robotics Foundation Models [12.916330118607918]
We introduce SafeDec, a constrained decoding framework for autoregressive robot foundation models. Task-specific safety rules are expressed as Signal Temporal Logic (STL) formulas and are enforced at inference time with minimal overhead. Our method ensures that generated actions provably satisfy STL specifications under assumed dynamics at runtime without retraining.
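The summary does not say how enforcement works internally, so the following is only a rejection-style illustration of decoding against an STL spec: candidate action sequences are scored by the robustness of a simple "always keep clearance" formula under an assumed dynamics model, and the first satisfying candidate is returned. Every name is hypothetical, and a real constrained decoder would intervene during generation rather than resampling whole sequences.

```python
import numpy as np

def robustness_always_clear(traj, obstacle, d_min):
    """Robustness of G(||x_t - obstacle|| >= d_min): the worst-case
    clearance margin along the trajectory (>= 0 means the spec holds)."""
    dists = np.linalg.norm(traj - obstacle, axis=1)
    return float(np.min(dists - d_min))

def safe_decode(sample_action_seq, rollout, obstacle, d_min, n_candidates=32):
    """Sample candidate action sequences from the autoregressive model and
    keep the first whose predicted rollout satisfies the spec; fall back
    to the least-violating candidate if none does."""
    best, best_rho = None, -np.inf
    for _ in range(n_candidates):
        actions = sample_action_seq()   # sample from the foundation model
        traj = rollout(actions)         # rollout under assumed dynamics
        rho = robustness_always_clear(traj, obstacle, d_min)
        if rho >= 0.0:
            return actions              # satisfies the STL formula
        if rho > best_rho:
            best, best_rho = actions, rho
    return best
```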
arXiv Detail & Related papers (2025-09-01T19:17:40Z)
- Action Flow Matching for Continual Robot Learning [54.10050120844738]
Continual learning in robotics seeks systems that can constantly adapt to changing environments and tasks. We introduce a generative framework leveraging flow matching for online robot dynamics model alignment. We find that by transforming the actions themselves rather than exploring with a misaligned model, the robot collects informative data more efficiently.
arXiv Detail & Related papers (2025-04-25T16:26:15Z) - Towards Continual Learning Desiderata via HSIC-Bottleneck
Orthogonalization and Equiangular Embedding [55.107555305760954]
We propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion.
Our method achieves competitive accuracy while using no exemplar buffer and a model only 1.02x the size of the base model.
arXiv Detail & Related papers (2024-01-17T09:01:29Z)
- Value function estimation using conditional diffusion models for control [62.27184818047923]
We propose a simple algorithm called Diffused Value Function (DVF).
It learns a joint multi-step model of the environment-robot interaction dynamics using a diffusion model.
We show how DVF can be used to efficiently capture the state visitation measure for multiple controllers.
arXiv Detail & Related papers (2023-06-09T18:40:55Z)
- Robust Value Iteration for Continuous Control Tasks [99.00362538261972]
When transferring a control policy from simulation to a physical system, the policy needs to be robust to variations in the dynamics to perform well.
We present Robust Fitted Value Iteration, which uses dynamic programming to compute the optimal value function on the compact state domain.
We show that robust value iteration is more robust than deep reinforcement learning algorithms and the non-robust version of the algorithm.
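As an illustration of the dynamic-programming core only (not the paper's fitted function class or benchmark systems), here is robust value iteration on an invented one-dimensional system: the Bellman backup maximizes over actions while taking the worst case over an uncertain dynamics gain.

```python
import numpy as np

xs = np.linspace(-1.0, 1.0, 101)     # compact state grid (toy)
acts = np.linspace(-1.0, 1.0, 21)    # discretized actions
thetas = [0.8, 1.0, 1.2]             # uncertain dynamics gain
gamma, dt = 0.99, 0.05
V = np.zeros_like(xs)

def V_hat(x):                        # "fitted" value via interpolation
    return np.interp(x, xs, V)

for _ in range(200):
    V_new = np.empty_like(V)
    for i, x in enumerate(xs):
        # max over actions of the worst case (min) over dynamics params
        V_new[i] = max(
            min(-(x ** 2 + 0.1 * a ** 2) * dt + gamma * V_hat(x + th * a * dt)
                for th in thetas)
            for a in acts)
    V = V_new
```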
arXiv Detail & Related papers (2021-05-25T19:48:35Z)
- Value Iteration in Continuous Actions, States and Time [99.00362538261972]
We propose a continuous fitted value iteration (cFVI) algorithm for continuous states and actions.
The optimal policy can be derived for non-linear control-affine dynamics.
Videos of the physical system are available at https://sites.google.com/view/value-iteration.
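The control-affine structure that cFVI exploits, and that the headline paper above also relies on, gives the greedy policy in closed form: with dynamics dx/dt = a(x) + B(x)u and reward q(x) - (1/2)u^T R u, the HJB maximizer is u* = R^{-1} B(x)^T dV/dx. A tiny sketch, with a hypothetical quadratic value function in the usage example:

```python
import numpy as np

def greedy_policy(x, grad_V, B, R_inv):
    """Closed-form HJB-greedy action for control-affine dynamics:
    u* = R^{-1} B(x)^T dV/dx evaluated at the current state x."""
    return R_inv @ B(x).T @ grad_V(x)

# Toy usage with an assumed value function V(x) = -x^T P x:
P = np.diag([2.0, 1.0])
u = greedy_policy(np.array([0.3, -0.1]),
                  grad_V=lambda x: -2.0 * P @ x,   # dV/dx
                  B=lambda x: np.array([[0.0], [1.0]]),
                  R_inv=np.array([[1.0]]))
```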
arXiv Detail & Related papers (2021-05-10T21:40:56Z)
- Modular Deep Reinforcement Learning for Continuous Motion Planning with Temporal Logic [59.94347858883343]
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDPs).
The novelty is to design an embedded product MDP (EP-MDP) between a limit-deterministic generalized Büchi automaton (LDGBA) and the MDP.
The proposed LDGBA-based reward shaping and discounting schemes for model-free reinforcement learning (RL) depend only on the EP-MDP states.
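As a stripped-down illustration of the product construction (the actual EP-MDP pairs the MDP with an LDGBA and additionally tracks a frontier of unvisited accepting sets, which this sketch omits), here is one product step with invented names:

```python
def product_step(mdp_step, label_of, aut_delta, accepting, s, q, action):
    """One transition of a labeled product MDP: the joint state is
    (MDP state s, automaton state q); the automaton advances on the
    label of the successor MDP state, and reward is shaped from
    automaton progress (illustrative shaping, not the paper's scheme)."""
    s_next = mdp_step(s, action)               # underlying MDP transition
    q_next = aut_delta(q, label_of(s_next))    # automaton reads the label
    reward = 1.0 if q_next in accepting else 0.0
    return (s_next, q_next), reward
```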
arXiv Detail & Related papers (2021-02-24T01:11:25Z)
- Learning Off-Policy with Online Planning [18.63424441772675]
We investigate a novel instantiation of H-step lookahead with a learned model and a terminal value function, which we call Learning Off-Policy with Online Planning (LOOP). We demonstrate the flexibility of LOOP to incorporate safety constraints during deployment on a set of navigation environments.
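The mechanism named here is easy to sketch: score sampled H-step action plans by their model-based return plus a discounted terminal value estimate, then execute the best first action. Random shooting below stands in for LOOP's actual plan optimizer, and every name is illustrative.

```python
import numpy as np

def h_step_lookahead(x0, model, reward, V_terminal, sample_plan,
                     horizon=5, n_plans=64, gamma=0.99):
    """Return the first action of the best-scoring sampled H-step plan:
    model-based return over the horizon plus a terminal value estimate."""
    best_action, best_ret = None, -np.inf
    for _ in range(n_plans):
        plan = sample_plan(horizon)       # e.g. drawn around the actor
        x, ret = x0, 0.0
        for t, a in enumerate(plan):
            ret += (gamma ** t) * reward(x, a)
            x = model(x, a)               # learned dynamics model
        ret += (gamma ** horizon) * V_terminal(x)
        if ret > best_ret:
            best_action, best_ret = plan[0], ret
    return best_action
```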
arXiv Detail & Related papers (2020-08-23T16:18:44Z)