Neural Policy Style Transfer
- URL: http://arxiv.org/abs/2402.00677v1
- Date: Thu, 1 Feb 2024 15:37:42 GMT
- Title: Neural Policy Style Transfer
- Authors: Raul Fernandez-Fernandez, Juan G. Victores, Jennifer J. Gago, David Estevez, Carlos Balaguer
- Abstract summary: Style Transfer has been proposed in a number of fields: fine arts, natural language processing, and fixed trajectories.
We scale this concept up to control policies within a Deep Reinforcement Learning infrastructure.
The expressive power of deep neural networks enables encoding a secondary task, which can be described as the style.
- Score: 3.1158660854608824
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Style Transfer has been proposed in a number of fields: fine arts, natural
language processing, and fixed trajectories. We scale this concept up to
control policies within a Deep Reinforcement Learning infrastructure. Each
network is trained to maximize the expected reward, which typically encodes the
goal of an action, and can be described as the content. The expressive power of
deep neural networks enables encoding a secondary task, which can be described
as the style. The Neural Policy Style Transfer (NPST) algorithm is proposed to
transfer the style of one policy to another, while maintaining the content of
the latter. Different policies are defined via Deep Q-Network architectures.
These models are trained using demonstrations through Inverse Reinforcement
Learning. Two different sets of user demonstrations are performed, one for
content and the other for style. Different styles are encoded as defined by user
demonstrations. The generated policy is the result of feeding a content policy
and a style policy to the NPST algorithm. Experiments are performed in a
catch-ball game inspired by the classic Atari games used in Deep Reinforcement
Learning, and in a real-world painting scenario with a full-sized humanoid robot,
based on previous works of the authors. The implementation of three different
Q-Network architectures (Shallow, Deep and Deep Recurrent Q-Network) to encode
the policies within the NPST framework is proposed and the results obtained in
the experiments with each of these architectures are compared.
Related papers
- Dense Policy: Bidirectional Autoregressive Learning of Actions [51.60428100831717]
This paper introduces a bidirectionally expanded learning approach, termed Dense Policy, to establish a new paradigm for autoregressive policies in action prediction.
It employs a lightweight encoder-only architecture to iteratively unfold the action sequence from an initial single frame into the target sequence in a coarse-to-fine manner.
Experiments validate that our dense policy has superior autoregressive learning capabilities and can surpass existing holistic generative policies.
arXiv Detail & Related papers (2025-03-17T14:28:08Z)
- AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation [65.01527698201956]
Non-autoregressive Transformers (NATs) are able to synthesize images with decent quality in a small number of steps.
We propose AdaNAT, a learnable approach that automatically configures a suitable policy tailored for every sample to be generated.
arXiv Detail & Related papers (2024-08-31T03:53:57Z)
- SpawnNet: Learning Generalizable Visuomotor Skills from Pre-trained Networks [52.766795949716986]
We present a study of the generalization capabilities of the pre-trained visual representations at the categorical level.
We propose SpawnNet, a novel two-stream architecture that learns to fuse pre-trained multi-layer representations into a separate network to learn a robust policy.
arXiv Detail & Related papers (2023-07-07T13:01:29Z)
- Residual Q-Learning: Offline and Online Policy Customization without Value [53.47311900133564]
Imitation Learning (IL) is a widely used framework for learning imitative behavior from demonstrations.
We formulate a new problem setting called policy customization.
We propose a novel framework, Residual Q-learning, which can solve the formulated MDP by leveraging the prior policy.
arXiv Detail & Related papers (2023-06-15T22:01:19Z)
- Hierarchical Neural Dynamic Policies [50.969565411919376]
We tackle the problem of generalization to unseen configurations for dynamic tasks in the real world while learning from high-dimensional image input.
We use a hierarchical deep policy learning framework called Hierarchical Neural Dynamical Policies (H-NDPs).
H-NDPs form a curriculum by learning local dynamical system-based policies on small regions in state-space.
We show that H-NDPs are easily integrated with both imitation as well as reinforcement learning setups and achieve state-of-the-art results.
arXiv Detail & Related papers (2021-07-12T17:59:58Z)
- Neural Dynamic Policies for End-to-End Sensorimotor Learning [51.24542903398335]
The current dominant paradigm in sensorimotor control, whether imitation or reinforcement learning, is to train policies directly in raw action spaces.
We propose Neural Dynamic Policies (NDPs) that make predictions in trajectory distribution space.
NDPs outperform the prior state-of-the-art in terms of either efficiency or performance across several robotic control tasks.
arXiv Detail & Related papers (2020-12-04T18:59:32Z)
- Continuous Action Reinforcement Learning from a Mixture of Interpretable Experts [35.80418547105711]
We propose a policy scheme that retains a complex function approximator for its internal value predictions but constrains the policy to have a concise, hierarchical, and human-readable structure.
The main technical contribution of the paper is to address the challenges introduced by this non-differentiable state selection procedure.
arXiv Detail & Related papers (2020-06-10T16:02:08Z)
- Randomized Policy Learning for Continuous State and Action MDPs [8.109579454896128]
We present RANDPOL, a generalized policy iteration algorithm for MDPs with continuous state and action spaces.
We show the numerical performance on challenging environments and compare them with deep neural network based algorithms.
arXiv Detail & Related papers (2020-06-08T02:49:47Z)
- Tree-Structured Policy based Progressive Reinforcement Learning for Temporally Language Grounding in Video [128.08590291947544]
Temporally language grounding in untrimmed videos is a newly-raised task in video understanding.
Inspired by human's coarse-to-fine decision-making paradigm, we formulate a novel Tree-Structured Policy based Progressive Reinforcement Learning framework.
arXiv Detail & Related papers (2020-01-18T15:08:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.