Adversarial Fine-tuning in Offline-to-Online Reinforcement Learning for Robust Robot Control
- URL: http://arxiv.org/abs/2510.13358v1
- Date: Wed, 15 Oct 2025 09:45:24 GMT
- Title: Adversarial Fine-tuning in Offline-to-Online Reinforcement Learning for Robust Robot Control
- Authors: Shingo Ayabe, Hiroshi Kera, Kazuhiko Kawamoto
- Abstract summary: This study introduces an offline-to-online framework that trains policies on clean data and then performs adversarial fine-tuning. A performance-aware curriculum adjusts the perturbation probability during training via an exponential-moving-average signal. Experiments on continuous-control locomotion tasks demonstrate that the proposed method consistently improves robustness over offline-only baselines.
- Score: 12.961180148172199
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline reinforcement learning enables sample-efficient policy acquisition without risky online interaction, yet policies trained on static datasets remain brittle under action-space perturbations such as actuator faults. This study introduces an offline-to-online framework that trains policies on clean data and then performs adversarial fine-tuning, where perturbations are injected into executed actions to induce compensatory behavior and improve resilience. A performance-aware curriculum further adjusts the perturbation probability during training via an exponential-moving-average signal, balancing robustness and stability throughout the learning process. Experiments on continuous-control locomotion tasks demonstrate that the proposed method consistently improves robustness over offline-only baselines and converges faster than training from scratch. Matching the fine-tuning and evaluation conditions yields the strongest robustness to action-space perturbations, while the adaptive curriculum strategy mitigates the degradation of nominal performance observed with the linear curriculum strategy. Overall, the results show that adversarial fine-tuning enables adaptive and robust control under uncertain environments, bridging the gap between offline efficiency and online adaptability.
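The abstract is specific enough to sketch the curriculum mechanism: executed actions are perturbed with some probability, and that probability is scheduled by an exponential moving average (EMA) of a performance signal. Below is a minimal Python sketch of one plausible reading; the class name, update rule, and hyperparameters (`alpha`, `p_min`, `p_max`, `step`) are illustrative assumptions, not the authors' implementation.

```python
import random

class PerformanceAwareCurriculum:
    """Sketch of an EMA-driven perturbation curriculum (assumed design)."""

    def __init__(self, alpha=0.05, p_min=0.0, p_max=0.5, step=0.01):
        self.alpha = alpha              # EMA smoothing factor
        self.p = p_min                  # current perturbation probability
        self.p_min, self.p_max = p_min, p_max
        self.step = step
        self.ema_return = None          # EMA of episode returns

    def update(self, episode_return):
        if self.ema_return is None:
            self.ema_return = episode_return
            return
        prev = self.ema_return
        self.ema_return = (1 - self.alpha) * prev + self.alpha * episode_return
        # Ramp perturbations up while smoothed performance holds or improves;
        # back off when it degrades, trading robustness against stability.
        if self.ema_return >= prev:
            self.p = min(self.p_max, self.p + self.step)
        else:
            self.p = max(self.p_min, self.p - self.step)

    def perturb(self, action, attack_fn):
        # With probability p, inject a perturbation into the executed action
        # (e.g., simulating an actuator fault).
        if random.random() < self.p:
            return attack_fn(action)
        return action
```

Ramping the perturbation probability up only while the EMA improves is one way to balance robustness against training stability, which is the trade-off the abstract describes.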
Related papers
- Balance Equation-based Distributionally Robust Offline Imitation Learning [8.607736795429638]
Imitation Learning (IL) has proven highly effective for robotic and control tasks where manually designing reward functions or explicit controllers is infeasible. Standard IL methods implicitly assume that the environment dynamics remain fixed between training and deployment. We address this challenge through Balance Equation-based Distributionally Robust Offline Imitation Learning. We formulate the problem as a distributionally robust optimization over an uncertainty set of transition models, seeking a policy that minimizes the imitation loss under the worst-case transition distribution.
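The worst-case formulation in this summary has the shape of a standard distributionally robust objective; the following is a hedged LaTeX reconstruction, where the uncertainty set \(\mathcal{P}\), divergence \(D\), radius \(\epsilon\), and expert policy \(\pi_E\) are assumed notation rather than the paper's own.

```latex
\min_{\pi} \; \max_{P \in \mathcal{P}} \;
  \mathbb{E}_{s \sim d^{\pi}_{P}}
  \big[ \ell\big(\pi(\cdot \mid s), \pi_{E}(\cdot \mid s)\big) \big],
\qquad
\mathcal{P} = \big\{ P : D\big(P \,\Vert\, \hat{P}\big) \le \epsilon \big\}
```

Here \(\hat{P}\) is the nominal transition model estimated from data, and the inner maximization selects the worst-case dynamics within the divergence ball.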
arXiv Detail & Related papers (2025-11-11T07:48:09Z) - Behavior-Adaptive Q-Learning: A Unifying Framework for Offline-to-Online RL [3.2883573376133555]
We introduce Behavior-Adaptive Q-Learning (BAQ), a framework designed to enable a smooth transition from offline to online RL. BAQ incorporates a dual-objective loss that (i) aligns the online policy toward the offline behavior when uncertainty is high, and (ii) gradually relaxes this constraint as more confident online experience is accumulated. Across standard benchmarks, BAQ consistently outperforms prior offline-to-online RL approaches, achieving faster recovery, improved robustness, and higher overall performance.
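The dual-objective loss described here reads as an uncertainty-gated behavior-cloning regularizer added to a TD objective. A minimal sketch under that assumption (the gate, names, and scales below are illustrative, not BAQ's actual loss):

```python
import torch

def baq_style_loss(td_loss, bc_loss, uncertainty, lam_max=1.0):
    """Illustrative dual-objective loss: a TD term plus an uncertainty-gated
    behavior-cloning term. The clamp gate is an assumption; the summary only
    states that the constraint relaxes as online confidence grows."""
    lam = lam_max * torch.clamp(uncertainty, 0.0, 1.0)
    return td_loss + lam * bc_loss

# Example: high uncertainty keeps the online policy near offline behavior.
loss = baq_style_loss(torch.tensor(0.3), torch.tensor(0.8),
                      uncertainty=torch.tensor(0.9))
```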
arXiv Detail & Related papers (2025-11-05T18:20:23Z) - Human-in-the-loop Online Rejection Sampling for Robotic Manipulation [55.99788088622936]
Hi-ORS stabilizes value estimation by filtering out negatively rewarded samples during online fine-tuning. Hi-ORS fine-tunes a pi-base policy to master contact-rich manipulation in just 1.5 hours of real-world training.
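As summarized, the stabilizing step is a rejection filter that drops negatively rewarded samples before the online update. A minimal sketch, where the transition format and zero threshold are assumptions:

```python
def rejection_filter(transitions, threshold=0.0):
    """Drop negatively rewarded samples so value estimation is never
    trained on them during online fine-tuning."""
    return [t for t in transitions if t["reward"] > threshold]

# Usage: filter the freshly collected online batch before each update.
batch = [{"obs": 0, "action": 1, "reward": -1.0},
         {"obs": 1, "action": 0, "reward": 0.7}]
clean_batch = rejection_filter(batch)   # keeps only the second transition
```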
arXiv Detail & Related papers (2025-10-30T11:53:08Z) - The Three Regimes of Offline-to-Online Reinforcement Learning [22.777667142224587]
Offline-to-online reinforcement learning (RL) has emerged as a practical paradigm that leverages offline datasets for pretraining and online interactions for fine-tuning. We propose a stability-plasticity principle that can explain this inconsistency. This work provides a principled framework for guiding design choices in offline-to-online RL based on the relative performance of the offline dataset and the pretrained policy.
arXiv Detail & Related papers (2025-10-01T20:58:14Z) - Safe Deployment of Offline Reinforcement Learning via Input Convex Action Correction [9.509828265491064]
Offline reinforcement learning (offline RL) offers a promising framework for developing control strategies in chemical process systems. This work investigates the application of offline RL to the safe and efficient control of an exothermic polymerisation continuous stirred-tank reactor.
arXiv Detail & Related papers (2025-07-30T12:58:02Z) - Robust Deep Reinforcement Learning with Adaptive Adversarial Perturbations in Action Space [3.639580365066386]
We propose an adaptive adversarial coefficient framework to adjust the effect of the adversarial perturbation during training.
The appealing feature of our method is that it is simple to deploy in real-world applications and does not require accessing the simulator in advance.
The experiments in MuJoCo show that our method can improve the training stability and learn a robust policy when migrated to different test environments.
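An adaptive adversarial coefficient of the kind described can be sketched as a perturbation scale adjusted from training feedback; the ratio-based update and noise-based perturbation below are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def adapt_coefficient(eps, perturbed_return, clean_return,
                      target_ratio=0.9, step=0.01, eps_min=0.0, eps_max=0.3):
    """Grow the perturbation scale while the policy keeps most of its clean
    performance under attack; shrink it when performance collapses."""
    ratio = perturbed_return / max(clean_return, 1e-8)
    eps = eps + step if ratio >= target_ratio else eps - step
    return float(np.clip(eps, eps_min, eps_max))

def perturb_action(action, eps, rng=None):
    # Adversarial direction replaced by Gaussian noise for brevity; the
    # paper's method adapts the perturbation during training.
    if rng is None:
        rng = np.random.default_rng()
    return action + eps * rng.standard_normal(np.shape(action))
```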
arXiv Detail & Related papers (2024-05-20T12:31:11Z) - Small Dataset, Big Gains: Enhancing Reinforcement Learning by Offline Pre-Training with Model Based Augmentation [59.899714450049494]
Offline pre-training can produce sub-optimal policies and lead to degraded online reinforcement learning performance.
We propose a model-based data augmentation strategy to maximize the benefits of offline reinforcement learning pre-training and reduce the scale of data needed to be effective.
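Model-based augmentation of this kind typically rolls a learned dynamics model forward from dataset states to synthesize additional transitions. The generic pattern, with the model interface (`model.predict`), policy call, and rollout horizon all assumed:

```python
def augment_dataset(dataset, model, policy, horizon=5):
    """Append short synthetic rollouts, generated by a learned dynamics
    model from real dataset states, to the offline buffer."""
    synthetic = []
    for transition in dataset:
        state = transition["obs"]
        for _ in range(horizon):
            action = policy(state)                             # assumed call
            next_state, reward = model.predict(state, action)  # assumed API
            synthetic.append({"obs": state, "action": action,
                              "reward": reward, "next_obs": next_state})
            state = next_state
    return dataset + synthetic
```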
arXiv Detail & Related papers (2023-12-15T14:49:41Z) - Statistically Efficient Variance Reduction with Double Policy Estimation for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning [53.97273491846883]
We propose DPE: an RL algorithm that blends offline sequence modeling and offline reinforcement learning with Double Policy Estimation.
We validate our method in multiple tasks of OpenAI Gym with D4RL benchmarks.
arXiv Detail & Related papers (2023-08-28T20:46:07Z) - ENOTO: Improving Offline-to-Online Reinforcement Learning with Q-Ensembles [52.34951901588738]
We propose a novel framework called ENsemble-based Offline-To-Online (ENOTO) RL.
By increasing the number of Q-networks, we seamlessly bridge offline pre-training and online fine-tuning without degrading performance.
Experimental results demonstrate that ENOTO can substantially improve the training stability, learning efficiency, and final performance of existing offline RL methods.
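Ensemble-based methods like ENOTO compute targets from multiple Q-networks, and taking a conservative statistic over the ensemble is the usual pattern. The minimum used below is a common convention and an assumption about this paper, not its exact rule:

```python
import torch

def ensemble_q_target(q_networks, next_obs, next_action):
    """Conservative bootstrap target from an ensemble of Q-networks:
    evaluate every head and take the elementwise minimum, a standard
    way to curb value overestimation."""
    qs = torch.stack([q(next_obs, next_action) for q in q_networks], dim=0)
    return qs.min(dim=0).values
```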
arXiv Detail & Related papers (2023-06-12T05:10:10Z) - Adaptive Behavior Cloning Regularization for Stable Offline-to-Online Reinforcement Learning [80.25648265273155]
Offline reinforcement learning, by learning from a fixed dataset, makes it possible to learn agent behaviors without interacting with the environment.
During online fine-tuning, the performance of the pre-trained agent may collapse quickly due to the sudden distribution shift from offline to online data.
We propose to adaptively weigh the behavior cloning loss during online fine-tuning based on the agent's performance and training stability.
Experiments show that the proposed method yields state-of-the-art offline-to-online reinforcement learning performance on the popular D4RL benchmark.
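The adaptive weighting described here can be sketched as a behavior-cloning coefficient driven by performance and stability signals; the signal choices and update rule below are illustrative assumptions:

```python
def adapt_bc_weight(weight, ema_return, prev_ema_return, td_error_var,
                    step=0.05, w_min=0.0, w_max=1.0, var_threshold=1.0):
    """Lower the behavior-cloning weight when returns improve and training
    is stable; raise it when returns drop or TD errors grow volatile."""
    improving = ema_return >= prev_ema_return
    stable = td_error_var < var_threshold
    weight += -step if (improving and stable) else step
    return min(max(weight, w_min), w_max)
```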
arXiv Detail & Related papers (2022-10-25T09:08:26Z) - A Regularized Implicit Policy for Offline Reinforcement Learning [54.7427227775581]
Offline reinforcement learning enables learning from a fixed dataset, without further interactions with the environment.
We propose a framework that supports learning a flexible yet well-regularized fully-implicit policy.
Experiments and ablation study on the D4RL dataset validate our framework and the effectiveness of our algorithmic designs.
arXiv Detail & Related papers (2022-02-19T20:22:04Z)