RLSS: A Deep Reinforcement Learning Algorithm for Sequential Scene
Generation
- URL: http://arxiv.org/abs/2206.02544v1
- Date: Wed, 1 Jun 2022 08:39:33 GMT
- Title: RLSS: A Deep Reinforcement Learning Algorithm for Sequential Scene
Generation
- Authors: Azimkhon Ostonov, Peter Wonka, Dominik L. Michels
- Abstract summary: We present RLSS: a reinforcement learning algorithm for sequential scene generation.
We consider how to effectively reduce the action space by including a greedy search algorithm in the learning process.
We demonstrate the capability of our method to generate plausible and diverse scenes efficiently by solving indoor planning problems and generating Angry Birds levels.
- Score: 44.8048196322934
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present RLSS: a reinforcement learning algorithm for sequential scene
generation. It is based on employing the proximal policy optimization (PPO)
algorithm for generative problems. In particular, we consider how to
effectively reduce the action space by including a greedy search algorithm in
the learning process. Our experiments demonstrate that our method converges for
a relatively large number of actions and learns to generate scenes with
predefined design objectives. The approach places objects iteratively in the
virtual scene: in each step, the network chooses which objects to place and
selects positions that result in maximal reward. A high reward is assigned if
the last action resulted in desired properties, whereas violations of
constraints are penalized. We demonstrate the capability of our method to
generate plausible and diverse scenes efficiently by solving indoor planning
problems and generating Angry Birds levels.
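To make the loop concrete, here is a minimal Python sketch of the sequential placement process the abstract describes: objects are placed one at a time, a greedy heuristic prunes the candidate positions to shrink the action space, and the reward favors desired properties while penalizing constraint violations. The grid scene, object vocabulary, heuristic, and reward terms are illustrative assumptions, and the PPO policy is replaced by a random stand-in; this is not the authors' implementation.

```python
import numpy as np

GRID = 8
OBJECTS = ["bed", "desk", "chair"]  # hypothetical object vocabulary


class SceneEnv:
    def __init__(self):
        self.occupancy = np.zeros((GRID, GRID), dtype=bool)
        self.placed = []

    def candidate_positions(self, obj, k=5):
        """Greedy action-space reduction: score every free cell with a
        cheap heuristic and keep only the top-k candidates, so the
        policy chooses among k positions instead of GRID*GRID."""
        free = np.argwhere(~self.occupancy)

        def score(cell):
            if not self.placed:
                return 0.0
            # Heuristic (assumed): prefer cells far from placed objects.
            return min(np.linalg.norm(cell - np.array(p))
                       for _, p in self.placed)

        ranked = sorted(free, key=score, reverse=True)
        return [tuple(c) for c in ranked[:k]]

    def step(self, obj, pos):
        """Place one object and return the shaped reward the abstract
        describes: high reward for desired properties, a penalty for
        constraint violations."""
        if self.occupancy[pos]:               # overlap constraint violated
            return -1.0
        self.occupancy[pos] = True
        self.placed.append((obj, pos))
        # Desired property (assumed): objects should spread across rows.
        spread = len({p[0] for _, p in self.placed})
        return 1.0 + 0.1 * spread


env = SceneEnv()
rng = np.random.default_rng(0)
for t in range(5):
    obj = OBJECTS[t % len(OBJECTS)]
    candidates = env.candidate_positions(obj)
    pos = candidates[rng.integers(len(candidates))]  # stand-in for the PPO policy
    print(t, obj, pos, round(env.step(obj, pos), 2))
```

In the paper, the choice among the pruned candidates would come from the trained PPO policy network rather than the random stand-in used here.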
Related papers
- Sample Efficient Reinforcement Learning by Automatically Learning to
Compose Subtasks [3.1594865504808944]
We propose an RL algorithm that automatically structures the reward function for sample efficiency, given a set of labels that signify subtasks.
We evaluate our algorithm in a variety of sparse-reward environments.
arXiv Detail & Related papers (2024-01-25T15:06:40Z)
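As a hypothetical illustration of the subtask-reward idea in the entry above: one simple way to turn subtask labels into a structured reward is to pay a dense bonus for each newly completed subtask and reserve the sparse task reward for full completion. The paper's actual structuring mechanism may differ.

```python
def composed_reward(prev_done: set, labels: dict, terminal_bonus=10.0):
    """labels maps subtask name -> bool (did the label fire this step).
    Reward each newly completed subtask once; pay the sparse task
    reward only when all subtasks are done."""
    done = prev_done | {name for name, fired in labels.items() if fired}
    newly = done - prev_done
    reward = float(len(newly))                       # dense shaping term
    if done == set(labels):                          # all subtasks complete
        reward += terminal_bonus
    return reward, done

state = set()
for step_labels in [{"key": True, "door": False},
                    {"key": False, "door": True}]:
    r, state = composed_reward(state, step_labels)
    print(r, sorted(state))
```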
Clustering-based Domain-Incremental Learning [4.835091081509403]
A key challenge in continual learning is the so-called "catastrophic forgetting" problem.
We propose an online clustering-based approach on a dynamically updated finite pool of samples or gradients.
We demonstrate the effectiveness of the proposed strategy and its promising performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-09-21T13:49:05Z)
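One plausible reading of the "dynamically updated finite pool" in the entry above, sketched with assumed details: an online k-means-style memory with a fixed centroid budget, so older domains stay represented as the stream moves on. The clustering rule and budget are assumptions, not the paper's method.

```python
import numpy as np

class ClusterPool:
    def __init__(self, budget=4, dim=2):
        self.centroids = np.empty((0, dim))
        self.counts = np.empty(0, dtype=int)
        self.budget = budget

    def add(self, x):
        x = np.asarray(x, dtype=float)
        if len(self.centroids) < self.budget:
            self.centroids = np.vstack([self.centroids, x])
            self.counts = np.append(self.counts, 1)
            return
        # Assign to the nearest centroid and update its running mean.
        i = np.argmin(np.linalg.norm(self.centroids - x, axis=1))
        self.counts[i] += 1
        self.centroids[i] += (x - self.centroids[i]) / self.counts[i]

pool = ClusterPool()
for x in [(0, 0), (0, 1), (9, 9), (8, 9), (0, 0.5), (9, 8)]:
    pool.add(x)
print(pool.centroids)
```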
Contextual Exploration Using a Linear Approximation Method Based on Satisficing [0.0]
The amount of exploration required for learning is often quite large.
Deep reinforcement learning is also super-human in the sense that no human being could carry out such amounts of exploration.
We propose Linear RS (LinRS), which is a type of satisficing algorithm and a linear extension of risk-sensitive satisficing (RS).
arXiv Detail & Related papers (2021-12-13T07:14:01Z)
C-Planning: An Automatic Curriculum for Learning Goal-Reaching Tasks [133.40619754674066]
Goal-conditioned reinforcement learning can solve tasks in a wide range of domains, including navigation and manipulation.
We propose to solve distant goal-reaching tasks by using search at training time to automatically generate intermediate states.
The E-step corresponds to planning an optimal sequence of waypoints using graph search, while the M-step aims to learn a goal-conditioned policy to reach those waypoints.
arXiv Detail & Related papers (2021-10-22T22:05:31Z)
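A simplified sketch of the E-step/M-step split described in the C-Planning entry above: graph search over previously visited states yields waypoints, and the policy is then trained toward each waypoint instead of the distant goal. The replay graph, edge costs, and the elided policy update are illustrative assumptions.

```python
import networkx as nx

# E-step: plan waypoints with graph search over visited states.
G = nx.Graph()
states = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]   # hypothetical replay states
for u, v in zip(states, states[1:]):
    G.add_edge(u, v, weight=1.0)                     # assumed distance estimate
start, goal = (0, 0), (2, 2)
waypoints = nx.shortest_path(G, start, goal, weight="weight")

# M-step: train a goal-conditioned policy to reach each intermediate
# waypoint rather than the distant final goal (update rule elided).
for s, w in zip(waypoints, waypoints[1:]):
    print(f"train pi(a | state={s}, goal={w})")
```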
The Information Geometry of Unsupervised Reinforcement Learning [133.20816939521941]
Unsupervised skill discovery is a class of algorithms that learn a set of policies without access to a reward function.
We show that unsupervised skill discovery algorithms do not learn skills that are optimal for every possible reward function.
arXiv Detail & Related papers (2021-10-06T13:08:36Z)
MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
arXiv Detail & Related papers (2021-07-15T08:19:57Z)
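A sketch of the conditional NML computation the MURAL entry above points to: refit a classifier once per candidate label of a query point and normalize the resulting likelihoods, which stays uncertain away from the data. The toy data and logistic model are assumptions; the paper's contribution of meta-learning to amortize this otherwise expensive computation is not shown here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.0], [1.0], [5.0], [6.0]])
y = np.array([0, 0, 1, 1])                     # 1 = "success" outcomes

def cnml_probs(x_query):
    likes = []
    for label in (0, 1):
        # Append the query with this candidate label and refit.
        Xa = np.vstack([X, [x_query]])
        ya = np.append(y, label)
        clf = LogisticRegression().fit(Xa, ya)
        likes.append(clf.predict_proba([[x_query]])[0, label])
    likes = np.array(likes)
    return likes / likes.sum()                 # NML normalization

# Near the decision boundary the distribution stays uncertain, which is
# what makes it useful as an exploration-aware reward signal.
for q in (0.5, 3.0, 5.5):
    print(q, np.round(cnml_probs(q), 2))
```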
An Efficient Algorithm for Deep Stochastic Contextual Bandits [10.298368632706817]
In contextual bandit problems, an agent selects an action based on the observed context to maximize the cumulative reward over iterations.
Recently, a few studies have used a deep neural network (DNN), trained by a gradient-based method, to predict the expected reward of an action.
arXiv Detail & Related papers (2021-04-12T16:34:43Z)
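A generic sketch of the setting described in the bandit entry above: an epsilon-greedy agent with a gradient-trained reward predictor. A linear model stands in for the DNN, and the paper's specific algorithm and efficiency guarantees are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(1)
n_arms, dim, lr, eps = 3, 4, 0.1, 0.1
W = np.zeros((n_arms, dim))                    # per-arm reward predictor
true_W = rng.normal(size=(n_arms, dim))        # unknown environment

for t in range(2000):
    ctx = rng.normal(size=dim)                 # observed context
    preds = W @ ctx
    # Explore with probability eps, otherwise exploit the prediction.
    arm = rng.integers(n_arms) if rng.random() < eps else int(np.argmax(preds))
    reward = true_W[arm] @ ctx + 0.1 * rng.normal()
    # One SGD step on squared prediction error for the chosen arm.
    W[arm] += lr * (reward - preds[arm]) * ctx

print("learned vs true (arm 0):", np.round(W[0], 2), np.round(true_W[0], 2))
```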
Evolving Reinforcement Learning Algorithms [186.62294652057062]
We propose a method for meta-learning reinforcement learning algorithms.
The learned algorithms are domain-agnostic and can generalize to new environments not seen during training.
We highlight two learned algorithms which obtain good generalization performance over other classical control tasks, gridworld type tasks, and Atari games.
arXiv Detail & Related papers (2021-01-08T18:55:07Z)
GRAC: Self-Guided and Self-Regularized Actor-Critic [24.268453994605512]
We propose a self-regularized TD-learning method to address divergence without requiring a target network.
We also propose a self-guided policy improvement method by combining policy-gradient with zero-order optimization.
This makes learning more robust to local noise in the Q function approximation and guides the updates of our actor network.
We evaluate GRAC on the suite of OpenAI Gym tasks, matching or outperforming the state of the art in every environment tested.
arXiv Detail & Related papers (2020-09-18T17:58:29Z)
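The GRAC entry above mentions combining policy gradients with zero-order optimization. As a hedged sketch of the zero-order half only: cross-entropy-method refinement of the policy's action under the Q estimate. The quadratic stand-in critic, the hyperparameters, and the way GRAC combines the two updates are assumptions here.

```python
import numpy as np

rng = np.random.default_rng(0)

def q_value(action):                           # stand-in critic
    return -np.sum((action - 0.7) ** 2)

def cem_refine(policy_action, iters=3, pop=32, elite=4, sigma=0.5):
    """Sample perturbations around the policy's action and keep the
    elite mean under the Q estimate (zero-order improvement)."""
    mean = np.array(policy_action, dtype=float)
    for _ in range(iters):
        samples = mean + sigma * rng.normal(size=(pop, mean.size))
        scores = np.array([q_value(s) for s in samples])
        elites = samples[np.argsort(scores)[-elite:]]
        mean, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-3
    return mean

raw = np.array([0.0, 0.0])                     # hypothetical policy output
refined = cem_refine(raw)
print(q_value(raw), "->", q_value(refined))    # refined action scores higher
```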
Learning to Stop While Learning to Predict [85.7136203122784]
Many algorithm-inspired deep models are restricted to a "fixed depth" for all inputs.
Similar to algorithms, the optimal depth of a deep architecture may be different for different input instances.
In this paper, we tackle this varying depth problem using a steerable architecture.
We show that the learned deep model along with the stopping policy improves the performances on a diverse set of tasks.
arXiv Detail & Related papers (2020-06-09T07:22:01Z)
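As a toy illustration of the input-dependent depth described in the entry above, here is a halting loop in the spirit of adaptive computation time: each layer emits a stopping score, and the forward pass exits once the accumulated score crosses a threshold. The layers, halting scorer, and threshold are assumptions; the paper's actual stopping policy and training procedure are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)
layers = [rng.normal(scale=0.5, size=(4, 4)) for _ in range(8)]
w_halt = rng.normal(size=4)                    # toy stopping-policy weights

def forward(x, threshold=0.9):
    halt_mass = 0.0
    for depth, W in enumerate(layers, start=1):
        x = np.tanh(W @ x)                     # one "layer" of the deep model
        halt_mass += 1 / (1 + np.exp(-w_halt @ x))   # stopping-policy score
        if halt_mass >= threshold:             # easy inputs exit earlier
            return x, depth
    return x, len(layers)

for trial in range(3):
    _, depth = forward(rng.normal(size=4))
    print("stopped at depth", depth)
```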
This list is automatically generated from the titles and abstracts of the papers on this site.