Related papers: MORL-Prompt: An Empirical Analysis of Multi-Objective Reinforcement Learning for Discrete Prompt Optimization

MORL-Prompt: An Empirical Analysis of Multi-Objective Reinforcement Learning for Discrete Prompt Optimization

URL: http://arxiv.org/abs/2402.11711v1
Date: Sun, 18 Feb 2024 21:25:09 GMT
Title: MORL-Prompt: An Empirical Analysis of Multi-Objective Reinforcement Learning for Discrete Prompt Optimization
Authors: Yasaman Jafari, Dheeraj Mekala, Rose Yu, Taylor Berg-Kirkpatrick
Abstract summary: RL-based techniques can be used to search for prompts that maximize a set of user-specified reward functions. Current techniques focus on maximizing the average of reward functions, which does not necessarily lead to prompts that achieve balance across rewards. In this paper, we adapt several techniques for multi-objective optimization to RL-based discrete prompt optimization.
Score: 49.60729578316884
License: http://creativecommons.org/licenses/by/4.0/
Abstract: RL-based techniques can be used to search for prompts that when fed into a target language model maximize a set of user-specified reward functions. However, in many target applications, the natural reward functions are in tension with one another -- for example, content preservation vs. style matching in style transfer tasks. Current techniques focus on maximizing the average of reward functions, which does not necessarily lead to prompts that achieve balance across rewards -- an issue that has been well-studied in the multi-objective and robust optimization literature. In this paper, we adapt several techniques for multi-objective optimization to RL-based discrete prompt optimization -- two that consider volume of the Pareto reward surface, and another that chooses an update direction that benefits all rewards simultaneously. We conduct an empirical analysis of these methods on two NLP tasks: style transfer and machine translation, each using three competing reward functions. Our experiments demonstrate that multi-objective methods that directly optimize volume perform better and achieve a better balance of all rewards than those that attempt to find monotonic update directions.

Related papers

Preference-based Multi-Objective Reinforcement Learning [5.031225669460861]
This paper introduces preference-based MORL (Pb-MORL), which formalizes the integration of preferences into the MORL framework.<n>To guide policy optimization using preferences, our method constructs a multi-objective reward model that aligns with the given preferences.<n>Experiments in benchmark multi-objective tasks, a multi-energy management task, and an autonomous driving task on a multi-line highway show that our method performs competitively.
arXiv Detail & Related papers (2025-07-18T16:43:04Z)
Reward-Agnostic Prompt Optimization for Text-to-Image Diffusion Models [13.428939931403473]
We introduce RATTPO, a flexible test-time optimization method applicable across various reward scenarios without modification.<n>RATTPO searches for optimized prompts by querying large language models (LLMs) textitwithout requiring reward-specific task descriptions.<n> Empirical results demonstrate the versatility of RATTPO, effectively enhancing user prompts across diverse reward setups.
arXiv Detail & Related papers (2025-06-20T09:02:05Z)
Unlearning as multi-task optimization: A normalized gradient difference approach with an adaptive learning rate [105.86576388991713]
We introduce a normalized gradient difference (NGDiff) algorithm, enabling us to have better control over the trade-off between the objectives. We provide a theoretical analysis and empirically demonstrate the superior performance of NGDiff among state-of-the-art unlearning methods on the TOFU and MUSE datasets.
arXiv Detail & Related papers (2024-10-29T14:41:44Z)
Large Language Models Prompting With Episodic Memory [53.8690170372303]
We propose PrOmpting with Episodic Memory (POEM), a novel prompt optimization technique that is simple, efficient, and demonstrates strong generalization capabilities. In the testing phase, we optimize the sequence of examples for each test query by selecting the sequence that yields the highest total rewards from the top-k most similar training examples in the episodic memory. Our results show that POEM outperforms recent techniques like TEMPERA and RLPrompt by over 5.3% in various text classification tasks.
arXiv Detail & Related papers (2024-08-14T11:19:28Z)
Deep Pareto Reinforcement Learning for Multi-Objective Recommender Systems [60.91599969408029]
optimizing multiple objectives simultaneously is an important task for recommendation platforms. Existing multi-objective recommender systems do not systematically consider such dynamic relationships.
arXiv Detail & Related papers (2024-07-04T02:19:49Z)
Combining Automated Optimisation of Hyperparameters and Reward Shape [7.407166175374958]
We propose a methodology for the combined optimisation of hyperparameters and the reward function. We conducted extensive experiments using Proximal Policy optimisation and Soft Actor-Critic. Our results show that combined optimisation significantly improves over baseline performance in half of the environments and achieves competitive performance in the others.
arXiv Detail & Related papers (2024-06-26T12:23:54Z)
Leveraging Trust for Joint Multi-Objective and Multi-Fidelity Optimization [0.0]
This paper investigates a novel approach to Bayesian multi-objective and multi-fidelity (MOMF) optimization. We suggest the innovative use of a trust metric to support simultaneous optimization of multiple objectives and data sources. Our methods offer broad applicability in solving simulation problems in fields such as plasma physics and fluid dynamics.
arXiv Detail & Related papers (2021-12-27T20:55:26Z)
Batch Multi-Fidelity Bayesian Optimization with Deep Auto-Regressive Networks [17.370056935194786]
We propose Batch Multi-fidelity Bayesian Optimization with Deep Auto-Regressive Networks (BMBO-DARN) We use a set of Bayesian neural networks to construct a fully auto-regressive model, which is expressive enough to capture strong yet complex relationships across all fidelities. We develop a simple yet efficient batch querying method, without any search over fidelities.
arXiv Detail & Related papers (2021-06-18T02:55:48Z)
Information Directed Reward Learning for Reinforcement Learning [64.33774245655401]
We learn a model of the reward function that allows standard RL algorithms to achieve high expected return with as few expert queries as possible. In contrast to prior active reward learning methods designed for specific types of queries, IDRL naturally accommodates different query types. We support our findings with extensive evaluations in multiple environments and with different types of queries.
arXiv Detail & Related papers (2021-02-24T18:46:42Z)
DORB: Dynamically Optimizing Multiple Rewards with Bandits [101.68525259222164]
Policy-based reinforcement learning has proven to be a promising approach for optimizing non-differentiable evaluation metrics for language generation tasks. We use the Exp3 algorithm for bandits and formulate two approaches for bandit rewards: (1) Single Multi-reward Bandit (SM-Bandit); (2) Hierarchical Multi-reward Bandit (HM-Bandit) We empirically show the effectiveness of our approaches via various automatic metrics and human evaluation on two important NLG tasks.
arXiv Detail & Related papers (2020-11-15T21:57:47Z)
Multi-Fidelity Multi-Objective Bayesian Optimization: An Output Space Entropy Search Approach [44.25245545568633]
We study the novel problem of blackbox optimization of multiple objectives via multi-fidelity function evaluations. Our experiments on several synthetic and real-world benchmark problems show that MF-OSEMO, with both approximations, significantly improves over the state-of-the-art single-fidelity algorithms.
arXiv Detail & Related papers (2020-11-02T06:59:04Z)
Resource Aware Multifidelity Active Learning for Efficient Optimization [0.8717253904965373]
This paper introduces the Resource Aware Active Learning (RAAL) strategy to accelerate the optimization of black box functions. The RAAL strategy optimally seeds multiple points at each allowing for a major speed up of the optimization task.
arXiv Detail & Related papers (2020-07-09T10:01:32Z)

This list is automatically generated from the titles and abstracts of the papers in this site.