From Text to Trajectory: Exploring Complex Constraint Representation and Decomposition in Safe Reinforcement Learning
- URL: http://arxiv.org/abs/2412.08920v1
- Date: Thu, 12 Dec 2024 04:06:54 GMT
- Title: From Text to Trajectory: Exploring Complex Constraint Representation and Decomposition in Safe Reinforcement Learning
- Authors: Pusen Dong, Tianchen Zhu, Yue Qiu, Haoyi Zhou, Jianxin Li
- Abstract summary: We introduce the Trajectory-level Textual Constraints Translator (TTCT) to replace the manually designed cost function.
Our empirical results demonstrate that TTCT effectively comprehends textual constraints and trajectories, and that policies trained with TTCT achieve a lower violation rate than those trained with a standard cost function.
- Score: 11.862238338875578
- License:
- Abstract: Safe reinforcement learning (RL) requires an agent to finish a given task while obeying specific constraints. Giving constraints in natural language form has great potential for practical scenarios due to its flexible transferability and accessibility. Previous safe RL methods with natural language constraints typically need to design a cost function manually for each constraint, which requires domain expertise and lacks flexibility. In this paper, we harness the dual role of text in this task, using it not only to specify constraints but also as a training signal. We introduce the Trajectory-level Textual Constraints Translator (TTCT) to replace the manually designed cost function. Our empirical results demonstrate that TTCT effectively comprehends textual constraints and trajectories, and that policies trained by TTCT can achieve a lower violation rate than those trained with a standard cost function. Additional studies demonstrate that TTCT has zero-shot transfer capability and adapts to constraint-shift environments.
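The core idea above, replacing a hand-designed cost function with a score computed from the constraint text itself, can be sketched as follows. This is a toy stand-in rather than the paper's architecture: the bag-of-words `embed` plays the role of TTCT's learned text/trajectory encoders, and `text_cost` the role of the predicted cost signal; the hashing scheme and threshold are assumptions for illustration.

```python
import math
import zlib

def embed(tokens, dim=8):
    # Toy stand-in for a learned encoder: hash each token into a
    # fixed-size bag-of-words vector, then L2-normalize.
    v = [0.0] * dim
    for t in tokens:
        v[zlib.crc32(t.encode()) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def text_cost(trajectory_tokens, constraint_tokens, threshold=0.5):
    # Cost signal for safe RL: 1.0 when the trajectory's embedding is
    # close to the constraint's embedding (i.e. the described unsafe
    # behavior occurred), 0.0 otherwise. This score stands in for a
    # hand-written per-constraint cost function.
    sim = sum(a * b for a, b in
              zip(embed(trajectory_tokens), embed(constraint_tokens)))
    return 1.0 if sim > threshold else 0.0
```

A constrained policy optimizer would then consume `text_cost` exactly like an environment-provided cost function, with no per-constraint engineering.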
Related papers
- Safe Multi-agent Reinforcement Learning with Natural Language Constraints [49.01100552946231]
The role of natural language constraints in Safe Multi-agent Reinforcement Learning (MARL) is crucial, yet often overlooked.
We propose a novel approach named Safe Multi-agent Reinforcement Learning with Natural Language constraints (SMALL)
Our method leverages fine-tuned language models to interpret and process free-form textual constraints, converting them into semantic embeddings.
These embeddings are then integrated into the multi-agent policy learning process, enabling agents to learn policies that minimize constraint violations while optimizing rewards.
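A minimal view of that last step, folding constraint-violation scores into each agent's learning signal, might look like the sketch below. The scores themselves would come from SMALL's embedding similarity; here they are plain inputs, and `lam` is an assumed penalty weight, not a parameter from the paper.

```python
def shaped_rewards(rewards, violation_scores, lam=1.0):
    # Penalize each agent's task reward by its constraint-violation
    # score, so the policy update trades off reward against violations.
    if len(rewards) != len(violation_scores):
        raise ValueError("one violation score per agent is required")
    return [r - lam * v for r, v in zip(rewards, violation_scores)]
```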
arXiv Detail & Related papers (2024-05-30T12:57:35Z)
- Prefix Text as a Yarn: Eliciting Non-English Alignment in Foundation Language Model [50.339632513018934]
Supervised fine-tuning (SFT) has been a straightforward approach for tailoring the output of a foundation large language model (LLM) to specific preferences.
We critically examine this hypothesis within the scope of cross-lingual generation tasks.
We introduce a novel training-free alignment method named PreTTY, which employs minimal task-related prior tokens.
arXiv Detail & Related papers (2024-04-25T17:19:36Z)
- From Instructions to Constraints: Language Model Alignment with Automatic Constraint Verification [70.08146540745877]
We investigate common constraints in NLP tasks and categorize them into three classes based on the types of their arguments.
We propose a unified framework, ACT (Aligning to ConsTraints), to automatically produce supervision signals for user alignment with constraints.
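As a rough illustration of what automatically produced supervision signals could mean here, the sketch below verifies outputs against a few toy constraint classes and emits accept/reject labels. The three constraint kinds are hypothetical, loosely mirroring ACT's idea of categorizing constraints by argument type; ACT's actual taxonomy and checkers differ.

```python
def verify(output, constraint):
    # Check one model output against a structured (kind, argument)
    # constraint. The three kinds here are illustrative only.
    kind, arg = constraint
    if kind == "max_words":
        return len(output.split()) <= arg
    if kind == "must_include":
        return arg in output
    if kind == "must_exclude":
        return arg not in output
    raise ValueError(f"unknown constraint kind: {kind}")

def supervision_signals(outputs, constraint):
    # Turn verification results into (output, label) pairs usable as
    # automatic supervision for alignment training.
    return [(o, 1 if verify(o, constraint) else 0) for o in outputs]
```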
arXiv Detail & Related papers (2024-03-10T22:14:54Z)
- Safe Reinforcement Learning with Free-form Natural Language Constraints and Pre-Trained Language Models [36.44404825103045]
Safe reinforcement learning (RL) agents accomplish given tasks while adhering to specific constraints.
We propose to use pre-trained language models (LM) to facilitate RL agents' comprehension of natural language constraints.
Our method enhances safe policy learning under a diverse set of human-derived free-form natural language constraints.
arXiv Detail & Related papers (2024-01-15T09:37:03Z)
- Disambiguated Lexically Constrained Neural Machine Translation [20.338107081523212]
Current approaches to LCNMT assume that the pre-specified lexical constraints are contextually appropriate.
We propose disambiguated LCNMT (D-LCNMT) to solve the problem.
D-LCNMT is a robust and effective two-stage framework that disambiguates the constraints based on contexts at first, then integrates the disambiguated constraints into LCNMT.
arXiv Detail & Related papers (2023-05-27T03:15:10Z)
- Controlled Text Generation with Natural Language Instructions [74.88938055638636]
InstructCTG is a controlled text generation framework that incorporates different constraints.
We first extract the underlying constraints of natural texts through a combination of off-the-shelf NLP tools and simple verbalizers.
By prepending natural language descriptions of the constraints and a few demonstrations, we fine-tune a pre-trained language model to incorporate various types of constraints.
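The prompt layout described above, a constraint description followed by a few demonstrations and then the new input, can be assembled as sketched below. The field names and separators are illustrative assumptions, not InstructCTG's exact format.

```python
def build_example(constraint_desc, demos, source):
    # Assemble one training example: natural-language constraint
    # description, few-shot demonstrations, then the input to complete.
    parts = [f"Constraint: {constraint_desc}"]
    for inp, out in demos:
        parts.append(f"Input: {inp}\nOutput: {out}")
    parts.append(f"Input: {source}\nOutput:")
    return "\n\n".join(parts)
```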
arXiv Detail & Related papers (2023-04-27T15:56:34Z)
- SaFormer: A Conditional Sequence Modeling Approach to Offline Safe Reinforcement Learning [64.33956692265419]
Offline safe RL is of great practical relevance for deploying agents in real-world applications.
We present a novel offline safe RL approach referred to as SaFormer.
arXiv Detail & Related papers (2023-01-28T13:57:01Z)
- COLD Decoding: Energy-based Constrained Text Generation with Langevin Dynamics [69.8062252611486]
COLD decoding is a flexible framework that can be applied directly to off-the-shelf left-to-right language models.
Our experiments on constrained generation tasks point to the effectiveness of our approach in terms of both automatic and human evaluation.
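At its core, COLD samples from an energy function with Langevin dynamics: repeated noisy gradient steps on a continuous relaxation of the output. The sketch below runs that update on a plain vector with a quadratic energy; in COLD itself the variable would be a sequence of soft token logits and the energy would encode fluency plus the constraints. Step size and noise scale here are arbitrary illustrative values.

```python
import random

def langevin_step(y, grad, step=0.1, noise=0.01, rng=random):
    # One Langevin update: a gradient-descent step on the energy,
    # perturbed by Gaussian noise so the dynamics sample rather than
    # merely minimize.
    return [yi - step * gi + noise * rng.gauss(0.0, 1.0)
            for yi, gi in zip(y, grad)]

def sample(energy_grad, y0, steps=200, rng=None):
    # Run the chain from y0; energy_grad returns the gradient of the
    # energy at the current point.
    rng = rng or random.Random(0)
    y = list(y0)
    for _ in range(steps):
        y = langevin_step(y, energy_grad(y), rng=rng)
    return y
```

With the quadratic energy E(y) = ||y - t||^2 / 2 (gradient y - t), the chain settles near the target t, jittering within the noise scale.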
arXiv Detail & Related papers (2022-02-23T18:59:27Z)
- Safe Reinforcement Learning with Natural Language Constraints [39.70152978025088]
We propose learning to interpret natural language constraints for safe RL.
HazardWorld is a new multi-task benchmark that requires an agent to optimize reward while not violating constraints specified in free-form text.
We show that our method achieves higher rewards (up to 11x) and fewer constraint violations (by 1.8x) compared to existing approaches.
arXiv Detail & Related papers (2020-10-11T03:41:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.