Related papers: Learning Efficient Exploration through Human Seeded Rapidly-exploring Random Trees


URL: http://arxiv.org/abs/2203.12774v1
Date: Wed, 23 Mar 2022 23:53:39 GMT
Title: Learning Efficient Exploration through Human Seeded Rapidly-exploring Random Trees
Authors: Max Zuo and Logan Schick and Matthew Gombolay and Nakul Gopalan
Abstract summary: We introduce human-seeded RRT (HS-RRT) and behavior-cloning-assisted RRT (CA-RRT), evaluated by the number of game states searched and the time taken to explore those states. We find that HS-RRT and CA-RRT both explore more game states in fewer tree expansions/iterations than the existing baseline. In our tested environments, CA-RRT reached the same number of states as RRT using more than 5000 fewer iterations on average, almost a 50% reduction.
Score: 1.2993951779393873
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Modern-day computer games have extremely large state and action spaces. To detect bugs in these games' models, human testers play the games repeatedly to explore them and find errors. Such gameplay is exhausting and time-consuming. Moreover, since robotics simulators depend on similar methods of model specification and debugging, finding errors in the model is also of interest to the robotics community, to ensure that robot behaviors and interactions are consistent in simulators. Previous methods have used reinforcement learning and search-based methods, including Rapidly-exploring Random Trees (RRT), to explore a game's state-action space to find bugs. However, such search- and exploration-based methods are not efficient at exploring the state-action space without a predefined heuristic. In this work, we attempt to combine a human tester's expertise in solving games with the exhaustiveness of RRT to search a game's state space efficiently and with high coverage. This paper introduces human-seeded RRT (HS-RRT) and behavior-cloning-assisted RRT (CA-RRT), evaluated by the number of game states searched and the time taken to explore those states. We compare our methods to an existing weighted RRT baseline previously studied for game exploration testing. We find that HS-RRT and CA-RRT both explore more game states in fewer tree expansions/iterations than the existing baseline. In each test, CA-RRT reached more states on average than RRT in the same number of iterations. In our tested environments, CA-RRT reached the same number of states as RRT using more than 5000 fewer iterations on average, almost a 50% reduction.
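The abstract does not spell out how the human seed enters the tree. The following is a minimal sketch under stated assumptions, not the authors' implementation: the "game" is a hypothetical 20x20 grid with a wall, each extension is one greedy step toward a random target using Manhattan distance, and seeding is assumed to mean inserting a human trajectory's states as initial tree nodes. CA-RRT's behavior-cloned policy and the weighted-RRT baseline are not modeled.

```python
import random

# Hypothetical toy "game": a 20x20 grid with a wall at x == 10 and a gap at y >= 18.
WIDTH, HEIGHT = 20, 20
WALLS = {(10, y) for y in range(18)}
ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def step(state, action):
    """Deterministic transition; moves into walls or off the grid leave the state unchanged."""
    nx, ny = state[0] + action[0], state[1] + action[1]
    if 0 <= nx < WIDTH and 0 <= ny < HEIGHT and (nx, ny) not in WALLS:
        return (nx, ny)
    return state


def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def rrt_explore(start, iterations, seed_trajectory=(), rng=None):
    """Grow a discrete RRT from `start`, optionally seeded with a human trajectory.

    Returns a dict mapping each reached state to its parent (None for the root).
    """
    rng = rng or random.Random(0)
    tree = {start: None}
    # Assumed seeding scheme: chain the demonstrated states onto the root.
    prev = start
    for s in seed_trajectory:
        if s not in tree:
            tree[s] = prev
        prev = s
    for _ in range(iterations):
        # 1. Sample a random target state within the grid bounds.
        target = (rng.randrange(WIDTH), rng.randrange(HEIGHT))
        # 2. Find the tree node nearest to the target.
        nearest = min(tree, key=lambda s: manhattan(s, target))
        # 3. Extend by the single action that lands closest to the target.
        new = min((step(nearest, a) for a in ACTIONS), key=lambda s: manhattan(s, target))
        if new not in tree:
            tree[new] = nearest
    return tree


# Hypothetical human demonstration: walk right along y == 0, then up to the gap.
demo = [(x, 0) for x in range(1, 10)] + [(9, y) for y in range(1, 19)]
plain = rrt_explore((0, 0), 300)
seeded = rrt_explore((0, 0), 300, seed_trajectory=demo)
print(len(plain), len(seeded))  # distinct states reached by each tree
```

Comparing `len(plain)` and `len(seeded)` at equal iteration budgets mirrors the paper's states-versus-iterations measure only in spirit; numbers from this toy grid say nothing about the paper's reported results.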

Related papers

Game-TARS: Pretrained Foundation Models for Scalable Generalist Multimodal Game Agents [56.25101378553328]
We present Game-TARS, a generalist game agent trained with a unified, scalable action space anchored to human-aligned keyboard-mouse inputs. Game-TARS is pre-trained on over 500B tokens with diverse trajectories and multimodal data. Experiments show that Game-TARS achieves about twice the success rate of the previous state-of-the-art (SOTA) model on open-world Minecraft tasks.
arXiv Detail & Related papers (2025-10-27T17:43:51Z)
Random Latent Exploration for Deep Reinforcement Learning [71.88709402926415]
We introduce Random Latent Exploration (RLE), a simple yet effective exploration strategy in reinforcement learning (RL). On average, RLE outperforms noise-based methods, which perturb the agent's actions, and bonus-based exploration, which rewards the agent for attempting novel behaviors. RLE is as simple as noise-based methods, as it avoids complex bonus calculations but retains the deep exploration benefits of bonus-based methods.
arXiv Detail & Related papers (2024-07-18T17:55:22Z)
SPOC: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World [46.02807945490169]
We show that imitating shortest-path planners in simulation produces agents that can proficiently navigate, explore, and manipulate objects both in simulation and in the real world using only RGB sensors (no depth map or GPS coordinates). This surprising result is enabled by our end-to-end, transformer-based SPOC architecture and powerful visual encoders paired with extensive image augmentation.
arXiv Detail & Related papers (2023-12-05T18:59:45Z)
Go-Explore Complex 3D Game Environments for Automated Reachability Testing [4.322647881761983]
We propose an approach specifically targeted at reachability bugs in simulated 3D environments based on the powerful exploration algorithm Go-Explore. Go-Explore saves unique checkpoints across the map and then identifies promising ones to explore from. Our algorithm can fully cover a vast 1.5 km × 1.5 km game world within 10 hours on a single machine.
arXiv Detail & Related papers (2022-09-01T16:31:37Z)
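The checkpoint-and-explore loop summarized above can be sketched as follows. This is an illustration under assumptions, not the paper's agent: cells are hypothetical 10 m grid buckets of an (x, y) position, the game's dynamics are replaced by a random walk, and cells are selected with weight 1/sqrt(times_chosen + 1), a count-based heuristic in the spirit of Go-Explore. The original algorithm also replaces a cell's checkpoint when a better trajectory reaches it, which is omitted here.

```python
import math
import random

WORLD_SIZE = 1500.0  # metres, matching the 1.5 km x 1.5 km map in the abstract


def cell_of(pos, cell_size=10.0):
    """Hypothetical cell representation: bucket a continuous (x, y) position."""
    return (int(pos[0] // cell_size), int(pos[1] // cell_size))


def random_walk_step(pos, rng, speed=5.0):
    """Stand-in for the game's dynamics: move `speed` metres in a random direction."""
    angle = rng.uniform(0.0, 2.0 * math.pi)
    x = min(max(pos[0] + speed * math.cos(angle), 0.0), WORLD_SIZE - 1e-6)
    y = min(max(pos[1] + speed * math.sin(angle), 0.0), WORLD_SIZE - 1e-6)
    return (x, y)


def go_explore(start, step_fn, iterations, explore_steps=20, rng=None):
    rng = rng or random.Random(0)
    # Archive: cell -> [checkpointed state, number of times the cell was chosen]
    archive = {cell_of(start): [start, 0]}
    for _ in range(iterations):
        # Prefer rarely chosen cells: weight 1 / sqrt(times_chosen + 1).
        cells = list(archive)
        weights = [1.0 / math.sqrt(archive[c][1] + 1) for c in cells]
        cell = rng.choices(cells, weights=weights, k=1)[0]
        archive[cell][1] += 1
        state = archive[cell][0]  # "Go": restore the saved checkpoint.
        for _ in range(explore_steps):  # "Explore": act randomly from there.
            state = step_fn(state, rng)
            c = cell_of(state)
            if c not in archive:
                archive[c] = [state, 0]
    return archive


archive = go_explore((750.0, 750.0), random_walk_step, iterations=200)
print(len(archive), "cells discovered")
```

Returning to archived checkpoints instead of restarting from the spawn point is what lets coverage grow outward across a large map; the cell size and selection weights are the main knobs in this sketch.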
Long-Term Exploration in Persistent MDPs [68.8204255655161]
In this paper, we propose an exploration method called Rollback-Explore (RbExplore), which utilizes the concept of the persistent Markov decision process. We test our algorithm in the hard-exploration Prince of Persia game, without rewards or domain knowledge.
arXiv Detail & Related papers (2021-09-21T13:47:04Z)
BeBold: Exploration Beyond the Boundary of Explored Regions [66.88415950549556]
In this paper, we propose the regulated difference of inverse visitation counts as a simple but effective criterion for intrinsic reward (IR). The criterion helps the agent explore Beyond the Boundary of explored regions and mitigates common issues in count-based methods, such as short-sightedness and detachment. The resulting method, BeBold, solves the 12 most challenging procedurally-generated tasks in MiniGrid with just 120M environment steps, without any curriculum learning.
arXiv Detail & Related papers (2020-12-15T21:26:54Z)
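As we read the BeBold paper, the intrinsic reward for a transition s -> s' is max(1/N(s') - 1/N(s), 0), paid only on the first visit to s' within the current episode, where N counts visits over all of training. A minimal tabular sketch, assuming hashable states and that counts are updated before the reward is computed (call `reset_episode` at the start of every episode):

```python
from collections import Counter


class BeBoldReward:
    """Tabular sketch of BeBold's regulated difference of inverse visitation counts."""

    def __init__(self):
        self.lifelong = Counter()  # N(s): visits across all episodes
        self.episodic = Counter()  # N_e(s): visits within the current episode

    def reset_episode(self, s0):
        self.episodic.clear()
        self.lifelong[s0] += 1
        self.episodic[s0] += 1

    def __call__(self, s, s_next):
        # Count the new visit first, so N(s_next) >= 1 and N_e(s_next) >= 1.
        self.lifelong[s_next] += 1
        self.episodic[s_next] += 1
        diff = 1.0 / self.lifelong[s_next] - 1.0 / self.lifelong[s]
        # Reward only crossing into less-visited territory, once per episode.
        return max(diff, 0.0) if self.episodic[s_next] == 1 else 0.0


reward = BeBoldReward()
reward.reset_episode("a")
print(reward("a", "b"))  # 0.0: both states equally (un)explored
reward.reset_episode("a")
print(reward("a", "b"))  # 0.0: "b" is now as familiar as "a"
print(reward("b", "d"))  # 0.5: stepping from a familiar state into a new one
print(reward("d", "b"))  # 0.0: "b" was already visited this episode
```

The max(., 0) clip is what keeps the agent from being rewarded for retreating into explored regions, and the episodic first-visit gate stops it from oscillating across a frontier to farm reward.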
The NetHack Learning Environment [79.06395964379107]
We present the NetHack Learning Environment (NLE), a procedurally generated rogue-like environment for Reinforcement Learning research. We argue that NetHack is sufficiently complex to drive long-term research on problems such as exploration, planning, skill acquisition, and language-conditioned RL. We demonstrate empirical success for early stages of the game using a distributed Deep RL baseline and Random Network Distillation exploration.
arXiv Detail & Related papers (2020-06-24T14:12:56Z)
AutoOD: Automated Outlier Detection via Curiosity-guided Search and Self-imitation Learning [72.99415402575886]
Outlier detection is an important data mining task with numerous practical applications. We propose AutoOD, an automated outlier detection framework, which aims to search for an optimal neural network model. Experimental results on various real-world benchmark datasets demonstrate that the deep model identified by AutoOD achieves the best performance.
arXiv Detail & Related papers (2020-06-19T18:57:51Z)
Smooth Exploration for Robotic Reinforcement Learning [11.215352918313577]
Reinforcement learning (RL) enables robots to learn skills from interactions with the real world. In practice, the unstructured step-based exploration used in Deep RL leads to jerky motion patterns on real robots. We address these issues by adapting state-dependent exploration (SDE) to current Deep RL algorithms.
arXiv Detail & Related papers (2020-05-12T12:28:25Z)
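State-dependent exploration replaces independent per-step Gaussian action noise with noise that is a function of the state: the exploration term is theta_eps @ s, where the matrix theta_eps is drawn from a Gaussian and held fixed for several steps, so the same state yields the same perturbation and nearby states yield similar ones. A minimal sketch; the resampling interval and sigma values are assumptions for illustration, and the paper's generalized variant uses learned policy features rather than the raw state used here. The executed action would be mu(s) + noise(s) for some policy mean mu, which is not shown.

```python
import random


class StateDependentNoise:
    """Exploration noise eps(s) = theta_eps @ s, with theta_eps resampled every n steps."""

    def __init__(self, state_dim, action_dim, sigma=0.5, resample_every=8, rng=None):
        self.state_dim, self.action_dim = state_dim, action_dim
        self.sigma, self.resample_every = sigma, resample_every
        self.rng = rng or random.Random(0)
        self.theta = None
        self.steps = 0

    def _resample(self):
        # One row of Gaussian weights per action dimension.
        self.theta = [[self.rng.gauss(0.0, self.sigma) for _ in range(self.state_dim)]
                      for _ in range(self.action_dim)]

    def __call__(self, state):
        if self.steps % self.resample_every == 0:
            self._resample()
        self.steps += 1
        # One noise value per action dimension: a linear function of the state.
        return [sum(w * x for w, x in zip(row, state)) for row in self.theta]


noise = StateDependentNoise(state_dim=3, action_dim=2, resample_every=4)
a = noise([1.0, 0.0, 0.5])
b = noise([1.0, 0.0, 0.5])
print(a == b)  # True: same state within a sampling window gives the same noise
```

Because the noise is a smooth function of the state within each window, consecutive actions on a real robot change gradually instead of jittering step to step.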
Never Give Up: Learning Directed Exploration Strategies [63.19616370038824]
We propose a reinforcement learning agent to solve hard exploration games by learning a range of directed exploratory policies. We construct an episodic memory-based intrinsic reward using k-nearest neighbors over the agent's recent experience to train the directed exploratory policies. A self-supervised inverse dynamics model is used to train the embeddings of the nearest neighbour lookup, biasing the novelty signal towards what the agent can control.
arXiv Detail & Related papers (2020-02-14T13:57:22Z)
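The episodic novelty signal described above can be sketched as follows, based on our reading of the NGU paper: the k nearest neighbours of the current embedding in episodic memory contribute kernel similarities K = eps / (d^2 / d_m^2 + eps), where d_m^2 is a running mean of squared k-NN distances, and the reward is 1 / (sqrt(sum K) + c). Simplifications (assumptions): embeddings are passed in as plain lists rather than produced by the learned inverse-dynamics model, the eps and c defaults are chosen for illustration, and NGU's distance-clustering threshold and its combination with a lifelong novelty modulator are omitted.

```python
import math


class EpisodicNovelty:
    """Simplified NGU-style episodic intrinsic reward over a k-NN memory."""

    def __init__(self, k=10, eps=1e-3, c=1e-3):
        self.k, self.eps, self.c = k, eps, c
        self.memory = []      # embeddings seen so far in this episode
        self.dist_sum = 0.0   # running sum of k-NN squared distances (for d_m^2)
        self.dist_count = 0

    def reset_episode(self):
        self.memory.clear()

    def __call__(self, embedding):
        # Squared Euclidean distances to the k nearest stored embeddings.
        d2 = sorted(sum((a - b) ** 2 for a, b in zip(embedding, m)) for m in self.memory)[: self.k]
        self.dist_sum += sum(d2)
        self.dist_count += len(d2)
        mean_d2 = self.dist_sum / self.dist_count if self.dist_count else 0.0
        similarity = 0.0
        for d in d2:
            normalized = d / mean_d2 if mean_d2 > 0 else 0.0
            similarity += self.eps / (normalized + self.eps)  # kernel K in (0, 1]
        self.memory.append(list(embedding))
        # Many close neighbours -> large similarity -> small reward.
        return 1.0 / (math.sqrt(similarity) + self.c)


novelty = EpisodicNovelty(k=3)
rewards = [novelty([0.0, 0.0]) for _ in range(3)]  # revisit the same embedding
far = novelty([10.0, 10.0])
print(rewards, far)
```

Revisiting the same embedding drives the reward down within an episode, while a distant embedding earns a large one. With an empty memory this sketch returns 1/c; how the paper handles that edge case may differ.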
Model-Based Reinforcement Learning for Atari [89.3039240303797]
We show how video prediction models can enable agents to solve Atari games with fewer interactions than model-free methods. Our experiments evaluate SimPLe on a range of Atari games in a low-data regime of 100k interactions between the agent and the environment.
arXiv Detail & Related papers (2019-03-01T15:40:19Z)

This list is automatically generated from the titles and abstracts of the papers on this site.