MORL-Prompt: An Empirical Analysis of Multi-Objective Reinforcement
Learning for Discrete Prompt Optimization
- URL: http://arxiv.org/abs/2402.11711v1
- Date: Sun, 18 Feb 2024 21:25:09 GMT
- Authors: Yasaman Jafari, Dheeraj Mekala, Rose Yu, Taylor Berg-Kirkpatrick
- Abstract summary: RL-based techniques can be used to search for prompts that maximize a set of user-specified reward functions.
Current techniques focus on maximizing the average of reward functions, which does not necessarily lead to prompts that achieve balance across rewards.
In this paper, we adapt several techniques for multi-objective optimization to RL-based discrete prompt optimization.
- Score: 49.60729578316884
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: RL-based techniques can be used to search for prompts that when fed into a
target language model maximize a set of user-specified reward functions.
However, in many target applications, the natural reward functions are in
tension with one another -- for example, content preservation vs. style
matching in style transfer tasks. Current techniques focus on maximizing the
average of reward functions, which does not necessarily lead to prompts that
achieve balance across rewards -- an issue that has been well-studied in the
multi-objective and robust optimization literature. In this paper, we adapt
several techniques for multi-objective optimization to RL-based discrete prompt
optimization -- two that consider volume of the Pareto reward surface, and
another that chooses an update direction that benefits all rewards
simultaneously. We conduct an empirical analysis of these methods on two NLP
tasks: style transfer and machine translation, each using three competing
reward functions. Our experiments demonstrate that multi-objective methods that
directly optimize volume perform better and achieve a better balance of all
rewards than those that attempt to find monotonic update directions.
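The contrast between averaging rewards and optimizing volume can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes rewards scaled to [0, 1], uses the plain product of the rewards as a stand-in for a volume-style objective, and the function names and the two reward vectors are hypothetical.

```python
from math import prod


def average_reward(rewards):
    """Arithmetic mean of the rewards (the baseline objective described above)."""
    return sum(rewards) / len(rewards)


def volume_reward(rewards):
    """Product of the rewards: the volume of the box spanned from the origin
    to the reward vector. Assumption: rewards lie in [0, 1]."""
    return prod(rewards)


# Hypothetical reward vectors for two candidate prompts, three rewards each.
balanced = [0.6, 0.6, 0.6]    # moderate on every reward
skewed = [0.95, 0.95, 0.05]   # strong on two rewards, nearly fails the third

for name, rewards in [("balanced", balanced), ("skewed", skewed)]:
    print(f"{name}: average={average_reward(rewards):.3f}, "
          f"volume={volume_reward(rewards):.3f}")
```

Under these assumptions the average ranks the skewed prompt higher, while the product ranks the balanced prompt higher, because a single near-zero reward collapses the volume.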
Related papers
- Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct
Preference Optimization [78.50294936259026]
We present Multi-Objective Direct Preference Optimization (MODPO) for multiple alignment objectives with minimal overheads.
MODPO folds language modeling directly into reward modeling, training LMs as implicit collective reward models (cRMs) that combine all objectives with specific weightings.
While theoretically guaranteed to produce the same optimal solutions as MORLHF, MODPO is practically more stable and computationally efficient.
arXiv Detail & Related papers (2023-10-05T17:35:26Z)
- Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
arXiv Detail & Related papers (2023-07-20T09:05:46Z)
- Leveraging Trust for Joint Multi-Objective and Multi-Fidelity
Optimization [0.0]
This paper investigates a novel approach to Bayesian multi-objective and multi-fidelity (MOMF) optimization.
We suggest the innovative use of a trust metric to support simultaneous optimization of multiple objectives and data sources.
Our methods offer broad applicability in solving simulation problems in fields such as plasma physics and fluid dynamics.
arXiv Detail & Related papers (2021-12-27T20:55:26Z)
- Choosing the Best of Both Worlds: Diverse and Novel Recommendations
through Multi-Objective Reinforcement Learning [68.45370492516531]
We introduce Scalarized Multi-Objective Reinforcement Learning (SMORL) for the Recommender Systems (RS) setting.
The SMORL agent augments standard recommendation models with additional RL layers that drive it to simultaneously satisfy three principal objectives: accuracy, diversity, and novelty of recommendations.
Our experimental results on two real-world datasets reveal a substantial increase in aggregate diversity, a moderate increase in accuracy, reduced repetitiveness of recommendations, and demonstrate the importance of reinforcing diversity and novelty as complementary objectives.
arXiv Detail & Related papers (2021-10-28T13:22:45Z)
- From STL Rulebooks to Rewards [4.859570041295978]
We propose a principled approach to shaping rewards for reinforcement learning from multiple objectives.
We first equip STL with a novel quantitative semantics that allows individual requirements to be evaluated automatically.
We then develop a method for systematically combining evaluations of multiple requirements into a single reward.
arXiv Detail & Related papers (2021-10-06T14:16:59Z)
- Reinforcement Learning Agent Training with Goals for Real World Tasks [3.747737951407512]
Reinforcement Learning (RL) is a promising approach for solving various control, optimization, and sequential decision making tasks.
We propose a specification language (Inkling Goal Specification) for complex control and optimization tasks.
We include a set of experiments showing that the proposed method provides great ease of use to specify a wide range of real world tasks.
arXiv Detail & Related papers (2021-07-21T23:21:16Z)
- Information Directed Reward Learning for Reinforcement Learning [64.33774245655401]
We learn a model of the reward function that allows standard RL algorithms to achieve high expected return with as few expert queries as possible.
In contrast to prior active reward learning methods designed for specific types of queries, IDRL naturally accommodates different query types.
We support our findings with extensive evaluations in multiple environments and with different types of queries.
arXiv Detail & Related papers (2021-02-24T18:46:42Z)
- Provable Multi-Objective Reinforcement Learning with Generative Models [98.19879408649848]
We study the problem of single policy MORL, which learns an optimal policy given the preference of objectives.
Existing methods require strong assumptions such as exact knowledge of the multi-objective decision process.
We propose a new algorithm called model-based envelop value iteration (EVI), which generalizes the enveloped multi-objective $Q$-learning algorithm.
arXiv Detail & Related papers (2020-11-19T22:35:31Z)
- Multi-Fidelity Multi-Objective Bayesian Optimization: An Output Space
Entropy Search Approach [44.25245545568633]
We study the novel problem of blackbox optimization of multiple objectives via multi-fidelity function evaluations.
Our experiments on several synthetic and real-world benchmark problems show that MF-OSEMO, with both approximations, significantly improves over the state-of-the-art single-fidelity algorithms.
arXiv Detail & Related papers (2020-11-02T06:59:04Z)
- Resource Aware Multifidelity Active Learning for Efficient Optimization [0.8717253904965373]
This paper introduces the Resource Aware Active Learning (RAAL) strategy to accelerate the optimization of black box functions.
The RAAL strategy optimally seeds multiple points at each iteration, allowing for a major speed-up of the optimization task.
arXiv Detail & Related papers (2020-07-09T10:01:32Z)