MotionRL: Align Text-to-Motion Generation to Human Preferences with Multi-Reward Reinforcement Learning
- URL: http://arxiv.org/abs/2410.06513v1
- Date: Wed, 9 Oct 2024 03:27:14 GMT
- Title: MotionRL: Align Text-to-Motion Generation to Human Preferences with Multi-Reward Reinforcement Learning
- Authors: Xiaoyang Liu, Yunyao Mao, Wengang Zhou, Houqiang Li
- Abstract summary: We introduce MotionRL, the first approach to utilize Multi-Reward Reinforcement Learning (RL) for optimizing text-to-motion generation tasks.
Our novel approach uses reinforcement learning to fine-tune the motion generator based on human preference priors captured by a human perception model.
In addition, MotionRL introduces a novel multi-objective optimization strategy to approximate Pareto optimality among text adherence, motion quality, and human preferences.
- Score: 99.09906827676748
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce MotionRL, the first approach to utilize Multi-Reward Reinforcement Learning (RL) for optimizing text-to-motion generation tasks and aligning them with human preferences. Previous works focused on improving numerical performance metrics on the given datasets, often neglecting the variability and subjectivity of human feedback. In contrast, our novel approach uses reinforcement learning to fine-tune the motion generator based on human preference priors captured by a human perception model, allowing it to generate motions that better align with human preferences. In addition, MotionRL introduces a novel multi-objective optimization strategy to approximate Pareto optimality among text adherence, motion quality, and human preferences. Extensive experiments and user studies demonstrate that MotionRL not only allows control over the generated results across different objectives but also significantly enhances performance across these metrics compared to other algorithms.
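The sketch below illustrates the general idea the abstract describes: a policy-gradient fine-tuning step that combines three per-sample rewards (text adherence, motion quality, human preference) and, by sweeping the combination weights, approximates the trade-off surface among the objectives. This is a minimal, hedged example and not the authors' released code; `generator.sample`, `reward_text`, `reward_quality`, and `reward_pref` are hypothetical interfaces standing in for the motion generator and the three reward models.

```python
# Hedged sketch of multi-reward policy-gradient fine-tuning (assumed interfaces,
# not the MotionRL implementation).
import torch

def multi_reward_step(generator, prompts, reward_text, reward_quality, reward_pref,
                      optimizer, weights=(1.0, 1.0, 1.0)):
    """One REINFORCE-style update on a batch of text prompts."""
    # Hypothetical generator interface: returns sampled motions and their log-probs.
    motions, log_probs = generator.sample(prompts)

    # Three reward signals, stacked to shape (3, batch).
    rewards = torch.stack([
        reward_text(prompts, motions),   # text adherence
        reward_quality(motions),         # motion quality
        reward_pref(prompts, motions),   # human preference model
    ])

    # Standardize each objective so no single reward dominates the update.
    rewards = (rewards - rewards.mean(dim=1, keepdim=True)) / \
              (rewards.std(dim=1, keepdim=True) + 1e-8)

    # Linear scalarization with tunable weights; sweeping `weights` traces an
    # approximation of the Pareto front among the three objectives.
    w = torch.tensor(weights).view(-1, 1)
    scalar_reward = (w * rewards).sum(dim=0)              # shape (batch,)

    # Policy-gradient surrogate loss.
    loss = -(scalar_reward.detach() * log_probs).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Fixing the weights yields a single point on the trade-off surface; MotionRL's actual multi-objective optimization strategy is described in the paper and may differ from this simple scalarization.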
Related papers
- TG-LLaVA: Text Guided LLaVA via Learnable Latent Embeddings [61.9257731511557]
We propose Text Guided LLaVA (TG-LLaVA) to optimize vision-language models (VLMs).
We use learnable latent embeddings as a bridge to analyze textual instruction and add the analysis results to the vision encoder as guidance.
With the guidance of text, the vision encoder can extract text-related features, similar to how humans focus on the most relevant parts of an image when considering a question.
arXiv Detail & Related papers (2024-09-15T00:38:34Z) - Style Transfer with Multi-iteration Preference Optimization [27.5647739554034]
We consider the relationship between reinforcement learning and preference optimization.
Inspired by these techniques from the past, we improve upon established preference optimization approaches.
We evaluate our model on two commonly used text style transfer datasets.
arXiv Detail & Related papers (2024-06-17T14:20:53Z) - Learning Generalizable Human Motion Generator with Reinforcement Learning [95.62084727984808]
Text-driven human motion generation is one of the vital tasks in computer-aided content creation.
Existing methods often overfit specific motion expressions in the training data, hindering their ability to generalize.
We present InstructMotion, which incorporates the trial-and-error paradigm of reinforcement learning for generalizable human motion generation.
arXiv Detail & Related papers (2024-05-24T13:29:12Z) - Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation [87.50120181861362]
VisionPrefer is a high-quality and fine-grained preference dataset that captures multiple preference aspects.
We train a reward model, VP-Score, over VisionPrefer to guide the training of text-to-image generative models; its preference prediction accuracy is comparable to that of human annotators.
arXiv Detail & Related papers (2024-04-23T14:53:15Z) - MaxMin-RLHF: Towards Equitable Alignment of Large Language Models with Diverse Human Preferences [101.57443597426374]
Reinforcement Learning from Human Feedback (RLHF) aligns language models to human preferences by employing a singular reward model derived from preference data.
We learn a mixture of preference distributions via an expectation-maximization algorithm to better represent diverse human preferences.
Our algorithm achieves an average improvement of more than 16% in win-rates over conventional RLHF algorithms.
arXiv Detail & Related papers (2024-02-14T03:56:27Z) - TLControl: Trajectory and Language Control for Human Motion Synthesis [68.09806223962323]
We present TLControl, a novel method for realistic human motion synthesis.
It incorporates both low-level Trajectory and high-level Language semantics controls.
It is practical for interactive and high-quality animation generation.
arXiv Detail & Related papers (2023-11-28T18:54:16Z) - Improving Human Motion Prediction Through Continual Learning [2.720960618356385]
Human motion prediction is an essential component for enabling closer human-robot collaboration.
The problem is compounded by the variability of human motion, both at a skeletal level due to the varying size of humans and at a motion level due to individual movement idiosyncrasies.
We propose a modular sequence learning approach that allows end-to-end training while also having the flexibility of being fine-tuned.
arXiv Detail & Related papers (2021-07-01T15:34:41Z) - Multi-grained Trajectory Graph Convolutional Networks for Habit-unrelated Human Motion Prediction [4.070072825448614]
A lightweight framework based on multi-grained graph convolutional networks is proposed for habit-unrelated human motion prediction.
A new motion generation method is proposed to generate left-handed motions, better modeling motion with less bias toward human habit.
Experimental results on challenging datasets, including Human3.6M and CMU Mocap, show that the proposed model outperforms the state of the art with less than 0.12 times the parameters.
arXiv Detail & Related papers (2020-12-23T09:41:50Z) - Contextual Latent-Movements Off-Policy Optimization for Robotic Manipulation Skills [41.140532647789456]
We propose a novel view on handling the demonstrated trajectories for acquiring low-dimensional, non-linear latent dynamics.
We introduce a new contextual off-policy RL algorithm, named LAtent-Movements Policy Optimization (LAMPO).
LAMPO provides sample-efficient policies compared to common approaches in the literature.
arXiv Detail & Related papers (2020-10-26T17:53:30Z)