Reinfier and Reintrainer: Verification and Interpretation-Driven Safe Deep Reinforcement Learning Frameworks
- URL: http://arxiv.org/abs/2410.15127v2
- Date: Tue, 04 Feb 2025 14:01:02 GMT
- Title: Reinfier and Reintrainer: Verification and Interpretation-Driven Safe Deep Reinforcement Learning Frameworks
- Authors: Zixuan Yang, Jiaqi Zheng, Guihai Chen
- Abstract summary: We propose a verification-driven interpretation-in-the-loop framework Reintrainer to develop trustworthy DRL models. In each iteration, this framework measures the gap between the on-training model and predefined properties using formal verification. Reinfier features breakpoint searching and verification-driven interpretation, associated with a concise constraint-encoding language DRLP.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Ensuring verifiable and interpretable safety of deep reinforcement learning (DRL) is crucial for its deployment in real-world applications. Existing approaches such as verification-in-the-loop training, however, face challenges including difficult deployment, inefficient training, lack of interpretability, and suboptimal results in both property satisfaction and reward. In this work, we propose Reintrainer, a novel verification-driven interpretation-in-the-loop framework for developing trustworthy DRL models that are guaranteed to meet the expected constraint properties. Specifically, in each iteration, the framework measures the gap between the on-training model and the predefined properties using formal verification, interprets the contribution of each input feature to the model's output, and then generates a training strategy derived from the on-the-fly measurement results, until all predefined properties are proven. Additionally, the low reusability of existing verifiers and interpreters motivates us to develop Reinfier, a general and fundamental tool within Reintrainer for DRL verification and interpretation. Reinfier features breakpoint searching and verification-driven interpretation, associated with a concise constraint-encoding language, DRLP. Evaluations demonstrate that Reintrainer outperforms the state of the art on six public benchmarks in both performance and property guarantees. Our framework can be accessed at https://github.com/Kurayuri/Reinfier.
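As a rough illustration of the loop the abstract describes (not the authors' actual implementation; the verifier, interpreter, and DRLP-encoded properties are assumed here as opaque callables), one iteration of verification-driven, interpretation-in-the-loop training might look like this:

```python
# Hypothetical sketch of a Reintrainer-style training loop. `verify`,
# `interpret`, and `train_step` stand in for the framework's components.

def verification_driven_training(model, properties, env, train_step,
                                 verify, interpret, max_iters=100):
    """Train until every predefined property is formally proven.

    verify(model, prop)    -> (proven: bool, gap: float)   # assumed verifier
    interpret(model, prop) -> {feature: contribution}      # assumed interpreter
    train_step(model, env, weights)                        # one shaped round
    """
    for _ in range(max_iters):
        unproven = []
        for prop in properties:
            proven, gap = verify(model, prop)   # measure model/property gap
            if not proven:
                unproven.append((prop, gap))
        if not unproven:                        # all properties proven: done
            return model
        # Attribute each violation to input features and derive a training
        # strategy (here, simple feature reweighting) from the on-the-fly results.
        weights = {}
        for prop, gap in unproven:
            for feature, contrib in interpret(model, prop).items():
                weights[feature] = weights.get(feature, 0.0) + gap * abs(contrib)
        train_step(model, env, weights)
    raise RuntimeError("properties not proven within the iteration budget")
```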
Related papers
- ReVeal: Self-Evolving Code Agents via Iterative Generation-Verification [6.983144806500892]
ReVeal is a multi-turn reinforcement learning framework that interleaves code generation with explicit self-verification and tool-based evaluation. It fosters the co-evolution of a model's generation and verification capabilities through RL training, expanding the reasoning boundaries of the base model. It also enables test-time scaling into deeper inference regimes, with code consistently evolving as the number of turns increases during inference.
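A minimal sketch of such a generate-verify loop is shown below; the model interface, test runner, and stopping rule are placeholders, not the paper's method:

```python
# Hedged sketch of interleaving generation, tool-based evaluation, and
# self-verification. `model` and `run_tests` are assumed callables.

def generate_with_verification(model, task, run_tests, max_turns=4):
    code = model(f"Write code for: {task}")
    for _ in range(max_turns):
        ok, report = run_tests(code)            # tool-based evaluation
        if ok:                                  # all tests pass: stop early
            break
        # Self-verification: the model critiques its own failing output.
        critique = model(f"Explain why these tests fail:\n{code}\n{report}")
        code = model(f"Revise the code for: {task}\n{code}\n"
                     f"Self-verification: {critique}")
    return code
```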
arXiv Detail & Related papers (2025-06-13T03:41:04Z) - VerIF: Verification Engineering for Reinforcement Learning in Instruction Following [55.60192044049083]
Reinforcement learning with verifiable rewards (RLVR) has become a key technique for enhancing large language models (LLMs). We propose VerIF, a verification method that combines rule-based code verification with LLM-based verification from a large reasoning model. We apply RL training with VerIF to two models, achieving significant improvements across several representative instruction-following benchmarks.
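One plausible way to combine the two verifier types into a single reward, sketched under stated assumptions (the constraint format and `llm_judge` are illustrative, not VerIF's actual interface):

```python
# Sketch of a hybrid verifiable reward: hard constraints checked by code,
# soft constraints delegated to a reasoning model acting as a judge.

def verif_reward(instruction, response, hard_constraints, llm_judge):
    # Hard constraints (length, keywords, format) are rule-based predicates.
    if not all(rule(response) for rule in hard_constraints):
        return 0.0
    # Soft constraints (style, content fidelity) go to the LLM judge.
    verdict = llm_judge(f"Does this response follow the instruction?\n"
                        f"Instruction: {instruction}\nResponse: {response}")
    return 1.0 if verdict.strip().lower() == "yes" else 0.0
```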
arXiv Detail & Related papers (2025-06-11T17:10:36Z) - Intra-Trajectory Consistency for Reward Modeling [67.84522106537274]
We develop an intra-trajectory consistency regularization to enforce that adjacent processes with higher next-token generation probability maintain more consistent rewards. We show that the reward model trained with the proposed regularization induces better DPO-aligned policies and achieves better best-of-N (BoN) inference-time verification results.
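One plausible reading of that regularizer, written as a sketch rather than the paper's exact formulation (per-step rewards and linking probabilities are assumed given):

```python
import torch

def intra_trajectory_consistency(rewards, next_token_probs):
    """Hypothetical regularizer: adjacent per-step rewards along a single
    trajectory are pulled together, weighted by the generation probability
    of the token connecting them.

    rewards:          (T,)   per-step rewards from the reward model
    next_token_probs: (T-1,) probability linking step t to step t+1
    """
    diffs = rewards[1:] - rewards[:-1]               # adjacent reward gaps
    return (next_token_probs * diffs.pow(2)).mean()  # high-prob links -> more consistent
```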
arXiv Detail & Related papers (2025-06-10T12:59:14Z) - Writing-Zero: Bridge the Gap Between Non-verifiable Tasks and Verifiable Rewards [11.149294285483782]
We propose a unified RLVR-based training paradigm that bridges the gap between non-verifiable tasks and verifiable rewards. We introduce a writing-principle-based pairwise Generative Reward Model (GenRM) and a novel Bootstrapped Relative Policy Optimization (BRPO) algorithm. Our approach empowers LLMs to develop robust writing capabilities without supervised fine-tuning.
arXiv Detail & Related papers (2025-05-30T14:34:57Z) - VerifyBench: Benchmarking Reference-based Reward Systems for Large Language Models [55.39064621869925]
OpenAI o1 and DeepSeek-R1 have achieved remarkable performance in the domain of reasoning. A key component of their training is the incorporation of verifiable rewards within reinforcement learning. Existing reward benchmarks do not evaluate reference-based reward systems.
arXiv Detail & Related papers (2025-05-21T17:54:43Z) - Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards [67.86091419220816]
Large Language Models (LLMs) show great promise in complex reasoning. A prevalent issue is "superficial self-reflection", where models fail to robustly verify their own outputs. We introduce RISE (Reinforcing Reasoning with Self-Verification), a novel online RL framework designed to tackle this issue.
arXiv Detail & Related papers (2025-05-19T17:59:31Z) - Advancing Embodied Agent Security: From Safety Benchmarks to Input Moderation [52.83870601473094]
Embodied agents exhibit immense potential across a multitude of domains.
Existing research predominantly concentrates on the security of general large language models.
This paper introduces a novel input moderation framework, meticulously designed to safeguard embodied agents.
arXiv Detail & Related papers (2025-04-22T08:34:35Z) - VERA: Validation and Enhancement for Retrieval Augmented systems [0.0]
We propose VERA (Validation and Enhancement for Retrieval Augmented systems), a system designed to evaluate and enhance the retrieved context before response generation.
VERA employs an evaluator-cum-enhancer LLM that first checks if external retrieval is necessary, evaluates the relevance and redundancy of the retrieved context, and refines it to eliminate non-essential information.
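A minimal sketch of that evaluate-then-enhance flow, assuming a generic `llm` callable and retriever (the prompts and control flow here are illustrative, not VERA's implementation):

```python
# Sketch of a VERA-style pipeline: decide whether to retrieve, filter the
# retrieved context for relevance, condense it, then answer.

def vera_pipeline(query, retriever, llm):
    # 1. Check whether external retrieval is necessary at all.
    if llm(f"Can this be answered without retrieval? {query}").strip() == "yes":
        return llm(query)
    # 2. Retrieve, then judge the relevance of each chunk.
    chunks = retriever(query)
    kept = [c for c in chunks
            if llm(f"Is this passage relevant to '{query}'?\n{c}").strip() == "yes"]
    # 3. Refine the surviving context to strip redundant, non-essential text.
    context = llm("Condense the following passages, removing redundancy:\n"
                  + "\n---\n".join(kept))
    return llm(f"Context:\n{context}\n\nQuestion: {query}")
```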
arXiv Detail & Related papers (2024-09-18T16:10:47Z) - InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales [14.655518998487237]
We propose InstructRAG, where LMs explicitly learn the denoising process through self-synthesized rationales.
InstructRAG requires no additional supervision and allows for easier verification of the predicted answers.
Experiments show InstructRAG consistently outperforms existing RAG methods in both training-free and trainable scenarios.
arXiv Detail & Related papers (2024-06-19T15:25:29Z) - RLeXplore: Accelerating Research in Intrinsically-Motivated Reinforcement Learning [50.55776190278426]
Extrinsic rewards can effectively guide reinforcement learning (RL) agents in specific tasks.
We introduce RLeXplore, a unified, highly modularized, and plug-and-play framework offering reliable implementations of eight state-of-the-art intrinsic reward algorithms.
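In the plug-and-play spirit described above, intrinsic rewards can be exposed behind one common interface and mixed into the extrinsic signal; the sketch below is a generic illustration, not the actual RLeXplore API:

```python
# Generic sketch: intrinsic-reward modules share one interface and are
# combined with the task reward. Names and the toy bonus are illustrative.

class IntrinsicReward:
    def compute(self, obs, next_obs, action) -> float:
        raise NotImplementedError

class CountBonus(IntrinsicReward):
    """Toy count-based exploration bonus: 1/sqrt(visit count).
    Assumes observations are hashable once converted to tuples."""
    def __init__(self):
        self.counts = {}
    def compute(self, obs, next_obs, action):
        key = tuple(next_obs)
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key] ** -0.5

def shaped_reward(extrinsic, modules, obs, next_obs, action, beta=0.1):
    # Sum the bonuses from all plugged-in modules, scaled by beta.
    bonus = sum(m.compute(obs, next_obs, action) for m in modules)
    return extrinsic + beta * bonus
```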
arXiv Detail & Related papers (2024-05-29T22:23:20Z) - Towards Faithful Explanations for Text Classification with Robustness Improvement and Explanation Guided Training [30.626080706755822]
Feature attribution methods highlight the important input tokens as explanations to model predictions.
Recent works show that explanations provided by these methods face challenges of being faithful and robust.
We propose a method with Robustness improvement and Explanation Guided training towards more faithful EXplanations (REGEX) for text classification.
arXiv Detail & Related papers (2023-12-29T13:07:07Z) - Augmenting Unsupervised Reinforcement Learning with Self-Reference [63.68018737038331]
Humans possess the ability to draw on past experiences explicitly when learning new tasks.
We propose the Self-Reference (SR) approach, an add-on module explicitly designed to leverage historical information.
Our approach achieves state-of-the-art results in terms of Interquartile Mean (IQM) performance and Optimality Gap reduction on the Unsupervised Reinforcement Learning Benchmark.
arXiv Detail & Related papers (2023-11-16T09:07:34Z) - Preserving Knowledge Invariance: Rethinking Robustness Evaluation of Open Information Extraction [50.62245481416744]
We present the first benchmark that simulates the evaluation of open information extraction models in the real world.
We design and annotate a large-scale testbed in which each example is a knowledge-invariant clique.
Under this elaborated robustness metric, a model is judged robust only if its performance is consistently accurate across all examples within each clique.
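A small sketch of that clique-level judgment (names are illustrative; the benchmark's actual metric may differ):

```python
# Sketch: a model counts as robust on a clique only if it is accurate on
# every knowledge-invariant variant in that clique.

def clique_robustness(model, cliques):
    """cliques: list of cliques; each clique is a list of (input, gold)
    pairs expressing the same underlying knowledge."""
    robust = sum(
        all(model(x) == gold for x, gold in clique)  # consistent across variants
        for clique in cliques
    )
    return robust / len(cliques)                     # fraction of robust cliques
```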
arXiv Detail & Related papers (2023-05-23T12:05:09Z) - Online Safety Property Collection and Refinement for Safe Deep Reinforcement Learning in Mapless Navigation [79.89605349842569]
We introduce the Collection and Refinement of Online Properties (CROP) framework to design properties at training time.
CROP employs a cost signal to identify unsafe interactions and uses them to shape safety properties.
We evaluate our approach in several robotic mapless navigation tasks and demonstrate that the violation metric computed with CROP enables higher returns and lower violations than previous Safe DRL approaches.
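A minimal sketch of such online property collection, assuming a Gym-style environment whose `info` dict exposes a safety cost; the property format and refinement step here are illustrative, not CROP's:

```python
# Sketch: a cost signal flags unsafe interactions; their states become
# candidate safety properties that are naively refined by deduplication.

def collect_properties(env, policy, episodes=100, eps=0.05):
    properties = []
    for _ in range(episodes):
        obs, done = env.reset(), False           # assumed classic Gym API
        while not done:
            action = policy(obs)
            next_obs, reward, done, info = env.step(action)
            if info.get("cost", 0) > 0:          # unsafe interaction observed
                # Candidate property: "within eps of this state, avoid action".
                properties.append((tuple(obs), eps, action))
            obs = next_obs
    # Naive refinement: deduplicate overlapping state regions per action.
    seen = {}
    for state, radius, action in properties:
        key = (tuple(round(s, 1) for s in state), action)
        seen[key] = (state, radius, action)
    return list(seen.values())
```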
arXiv Detail & Related papers (2023-02-13T21:19:36Z) - Holistic Deep Learning [3.718942345103135]
This paper presents a novel holistic deep learning framework that addresses the challenges of vulnerability to input perturbations, overparametrization, and performance instability.
The proposed framework holistically improves accuracy, robustness, sparsity, and stability over standard deep learning models.
arXiv Detail & Related papers (2021-10-29T14:46:32Z) - BERT as a Teacher: Contextual Embeddings for Sequence-Level Reward [23.176481887478634]
We show that the underlying operations, counting words and comparing counts, can be lifted to embedding words and comparing embeddings.
An in-depth analysis of BERT embeddings shows empirically that contextual embeddings can be employed to capture the required dependencies.
We cast unconditional generation as a reinforcement learning problem and show that our reward function indeed provides a more effective learning signal than n-gram reward in this challenging setting.
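The lifting from counting to embedding can be sketched as follows; obtaining the token embeddings (e.g., from BERT) is assumed, and this is an illustration of the idea rather than the paper's exact reward:

```python
import torch

def embedding_reward(candidate_emb, reference_emb):
    """Score a generated sequence by cosine similarity between mean-pooled
    contextual embeddings of candidate and reference, in place of n-gram
    count comparison. `*_emb` are (T, d) token-embedding matrices."""
    c = candidate_emb.mean(dim=0)   # mean-pool tokens -> sequence vector
    r = reference_emb.mean(dim=0)
    return torch.nn.functional.cosine_similarity(c, r, dim=0)
```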
arXiv Detail & Related papers (2020-03-05T16:06:37Z) - Target-Embedding Autoencoders for Supervised Representation Learning [111.07204912245841]
This paper analyzes a framework for improving generalization in a purely supervised setting, where the target space is high-dimensional.
We motivate and formalize the general framework of target-embedding autoencoders (TEA) for supervised prediction, learning intermediate latent representations jointly optimized to be both predictable from features as well as predictive of targets.
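A minimal TEA sketch under stated assumptions (layer sizes, losses, and the joint objective below are illustrative, not the paper's exact architecture):

```python
import torch
import torch.nn as nn

class TEA(nn.Module):
    """Target-embedding autoencoder sketch: the latent code is jointly
    optimized to reconstruct the high-dimensional target and to be
    predictable from the features."""
    def __init__(self, x_dim, y_dim, z_dim=32):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(y_dim, 64), nn.ReLU(), nn.Linear(64, z_dim))
        self.decode = nn.Sequential(nn.Linear(z_dim, 64), nn.ReLU(), nn.Linear(64, y_dim))
        self.predict = nn.Sequential(nn.Linear(x_dim, 64), nn.ReLU(), nn.Linear(64, z_dim))

    def loss(self, x, y, lam=1.0):
        z = self.encode(y)                                 # embed the targets
        recon = nn.functional.mse_loss(self.decode(z), y)  # z must rebuild y
        pred = nn.functional.mse_loss(self.predict(x), z)  # z predictable from x
        return recon + lam * pred                          # jointly optimized

    def forward(self, x):
        # At test time targets are unavailable: predict the latent, then decode.
        return self.decode(self.predict(x))
```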
arXiv Detail & Related papers (2020-01-23T02:37:10Z)