Related papers: Style Transfer with Multi-iteration Preference Optimization

URL: http://arxiv.org/abs/2406.11581v2
Date: Sun, 28 Jul 2024 04:33:57 GMT
Title: Style Transfer with Multi-iteration Preference Optimization
Authors: Shuai Liu, Jonathan May
Abstract summary: We consider the relationship between recent reinforcement learning and preference optimization approaches to text style transfer and the 'tuning' techniques developed for statistical machine translation. Inspired by these techniques from the past, we improve upon established preference optimization approaches. We evaluate our model on two commonly used text style transfer datasets.
Score: 27.5647739554034
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Numerous recent techniques for text style transfer characterize their approaches as variants of reinforcement learning and preference optimization. In this work, we consider the relationship between these approaches and a class of optimization approaches developed primarily for (non-neural) statistical machine translation, formerly known as 'tuning'. Inspired by these techniques from the past, we improve upon established preference optimization approaches, incorporating multiple iterations of exploration and optimization, and choosing contrastive examples by following a 'hope' vs 'fear' sampling strategy. Cognizant of the difference between machine translation and style transfer, however, we further tailor our framework with a new pseudo-parallel generation method and a dynamic weighted reward aggregation method to tackle the lack of parallel data and the need for a multi-objective reward. We evaluate our model on two commonly used text style transfer datasets. Through automatic and human evaluation results we show the effectiveness and the superiority of our model compared to state-of-the-art baselines.
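Illustration: the abstract does not spell out how 'hope' and 'fear' examples are chosen, so the following is only a minimal sketch of the classic formulation from statistical machine translation tuning, where the hope candidate maximizes model score plus reward and the fear candidate maximizes model score minus reward. The `Candidate` class, the `hope_fear_pair` function, and the toy scores are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    model_score: float  # assumed: log-probability under the current model
    reward: float       # assumed: scalar task reward, higher is better


def hope_fear_pair(candidates):
    """Pick a contrastive pair in the style of SMT tuning (an assumption here).

    hope: the model likes it and the reward is high.
    fear: the model likes it but the reward is low.
    """
    hope = max(candidates, key=lambda c: c.model_score + c.reward)
    fear = max(candidates, key=lambda c: c.model_score - c.reward)
    return hope, fear


# Toy candidate pool with made-up scores.
candidates = [
    Candidate("a", model_score=-1.0, reward=0.9),
    Candidate("b", model_score=-0.5, reward=0.2),
    Candidate("c", model_score=-2.0, reward=0.95),
]
hope, fear = hope_fear_pair(candidates)
```

In a preference optimization setting, the hope candidate would plausibly serve as the preferred example and the fear candidate as the rejected one; whether the paper combines the scores in exactly this way is not stated in the abstract.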

Related papers

Preference-Guided Diffusion for Multi-Objective Offline Optimization [64.08326521234228]
We propose a preference-guided diffusion model for offline multi-objective optimization. Our guidance is a preference model trained to predict the probability that one design dominates another. Our results highlight the effectiveness of classifier-guided diffusion models in generating diverse and high-quality solutions.
arXiv Detail & Related papers (2025-03-21T16:49:38Z)
Model Fusion through Bayesian Optimization in Language Model Fine-Tuning [16.86812534268461]
Fine-tuning pre-trained models for downstream tasks is a widely adopted technique known for its adaptability and reliability across various domains. We introduce a novel model fusion technique that optimizes both the desired metric and loss through multi-objective Bayesian optimization. Experiments across various downstream tasks show considerable performance improvements using our Bayesian optimization-guided method.
arXiv Detail & Related papers (2024-11-11T04:36:58Z)
Preference Optimization with Multi-Sample Comparisons [53.02717574375549]
We introduce a novel approach that extends post-training to include multi-sample comparisons. Existing single-sample comparison approaches fail to capture critical characteristics such as generative diversity and bias. We demonstrate that multi-sample comparison is more effective in optimizing collective characteristics than single-sample comparison.
arXiv Detail & Related papers (2024-10-16T00:59:19Z)
MotionRL: Align Text-to-Motion Generation to Human Preferences with Multi-Reward Reinforcement Learning [99.09906827676748]
We introduce MotionRL, the first approach to utilize Multi-Reward Reinforcement Learning (RL) for optimizing text-to-motion generation tasks. Our novel approach uses reinforcement learning to fine-tune the motion generator based on human preference prior knowledge from a human perception model. In addition, MotionRL introduces a novel multi-objective optimization strategy to approximate optimality between text adherence, motion quality, and human preferences.
arXiv Detail & Related papers (2024-10-09T03:27:14Z)
WarpAdam: A new Adam optimizer based on Meta-Learning approach [0.0]
This study introduces an innovative approach that merges the 'warped gradient descent' concept from meta-learning with the Adam optimizer. By introducing a learnable distortion matrix P within the adaptation matrix P, we aim to enhance the model's capability across diverse data distributions. Our research showcases the potential of this novel approach through theoretical insights and empirical evaluations.
arXiv Detail & Related papers (2024-09-06T12:51:10Z)
Unleashing the Potential of Large Language Models as Prompt Optimizers: An Analogical Analysis with Gradient-based Model Optimizers [108.72225067368592]
We propose a novel perspective to investigate the design of large language model (LLM)-based prompt optimizers. We identify two pivotal factors in model parameter learning: update direction and update method. In particular, we borrow the theoretical framework and learning methods from gradient-based optimization to design improved strategies.
arXiv Detail & Related papers (2024-02-27T15:05:32Z)
Efficient Inverse Design Optimization through Multi-fidelity Simulations, Machine Learning, and Search Space Reduction Strategies [0.8646443773218541]
This paper introduces a methodology designed to augment the inverse design optimization process in scenarios constrained by limited compute. The proposed methodology is analyzed on two distinct engineering inverse design problems: airfoil inverse design and the scalar field reconstruction problem. Notably, this method is adaptable across any inverse design application, facilitating a synergy between a representative low-fidelity ML model and high-fidelity simulation, and can be seamlessly applied across any variety of population-based optimization algorithms.
arXiv Detail & Related papers (2023-12-06T18:20:46Z)
Transfer Learning with Gaussian Processes for Bayesian Optimization [9.933956770453438]
We provide a unified view on hierarchical GP models for transfer learning, which allows us to analyze the relationship between methods. We develop a novel closed-form boosted GP transfer model that fits between existing approaches in terms of complexity. We evaluate the performance of the different approaches in large-scale experiments and highlight strengths and weaknesses of the different transfer-learning methods.
arXiv Detail & Related papers (2021-11-22T14:09:45Z)
Local and Global Context-Based Pairwise Models for Sentence Ordering [0.0]
In this paper, we put forward a set of robust local and global context-based pairwise ordering strategies. Our proposed encoding method utilizes the paragraph's rich global contextual information to predict the pairwise order. Analysis of the two proposed decoding strategies helps better explain error propagation in pairwise models.
arXiv Detail & Related papers (2021-10-08T17:57:59Z)
Meta Back-translation [111.87397401837286]
We propose a novel method to generate pseudo-parallel data from a pre-trained back-translation model. Our method is a meta-learning algorithm which adapts a pre-trained back-translation model so that the pseudo-parallel data it generates would train a forward-translation model to do well on a validation set.
arXiv Detail & Related papers (2021-02-15T20:58:32Z)
On Learning Text Style Transfer with Direct Rewards [101.97136885111037]
Lack of parallel corpora makes it impossible to directly train supervised models for the text style transfer task. We leverage semantic similarity metrics originally used for fine-tuning neural machine translation models. Our model provides significant gains in both automatic and human evaluation over strong baselines.
arXiv Detail & Related papers (2020-10-24T04:30:02Z)
Dynamic Data Selection and Weighting for Iterative Back-Translation [116.14378571769045]
We propose a curriculum learning strategy for iterative back-translation models. We evaluate our models on domain adaptation, low-resource, and high-resource MT settings. Experimental results demonstrate that our methods achieve improvements of up to 1.8 BLEU points over competitive baselines.
arXiv Detail & Related papers (2020-04-07T19:49:58Z)

This list is automatically generated from the titles and abstracts of the papers on this site.