Visual Encoders for Data-Efficient Imitation Learning in Modern Video Games
- URL: http://arxiv.org/abs/2312.02312v2
- Date: Wed, 30 Apr 2025 17:44:55 GMT
- Title: Visual Encoders for Data-Efficient Imitation Learning in Modern Video Games
- Authors: Lukas Schäfer, Logan Jones, Anssi Kanervisto, Yuhan Cao, Tabish Rashid, Raluca Georgescu, Dave Bignell, Siddhartha Sen, Andrea Treviño Gavito, Sam Devlin
- Abstract summary: We show that end-to-end training can be effective with comparably low-resolution images and only minutes of demonstrations. In addition to enabling effective decision making, pre-trained visual encoders can make decision-making research in video games more accessible by significantly reducing the cost of training.
- Score: 14.500523121809907
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video games have served as useful benchmarks for the decision-making community, but going beyond Atari games towards modern games has been prohibitively expensive for the vast majority of the research community. Prior work in modern video games typically relied on game-specific integration to obtain game features and enable online training, or on existing large datasets. An alternative approach is to train agents using imitation learning to play video games purely from images. However, this setting poses a fundamental question: which visual encoders obtain representations that retain information critical for decision making? To answer this question, we conduct a systematic study of imitation learning with publicly available pre-trained visual encoders compared to the typical task-specific end-to-end training approach in Minecraft, Counter-Strike: Global Offensive, and Minecraft Dungeons. Our results show that end-to-end training can be effective with comparably low-resolution images and only minutes of demonstrations, but significant improvements can be gained by utilising pre-trained encoders such as DINOv2 depending on the game. In addition to enabling effective decision making, we show that pre-trained encoders can make decision-making research in video games more accessible by significantly reducing the cost of training.
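A minimal sketch of the imitation-learning setup described in the abstract, assuming a frozen pre-trained DINOv2 backbone (loaded via torch.hub) with a small trainable action head; the discrete action space, image resolution, and head architecture are illustrative assumptions, not the paper's exact configuration:

```python
# Hedged sketch: behavioural cloning from images with a frozen pre-trained encoder.
# Assumptions (not from the paper): discrete actions, 224x224 frames, MLP head.
import torch
import torch.nn as nn

class BCPolicy(nn.Module):
    def __init__(self, num_actions: int, hidden_dim: int = 512):
        super().__init__()
        # Pre-trained DINOv2 ViT-S/14 backbone, kept frozen during training.
        self.encoder = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
        for p in self.encoder.parameters():
            p.requires_grad = False
        # Small trainable head mapping image features to action logits.
        self.head = nn.Sequential(
            nn.Linear(384, hidden_dim),  # 384 = ViT-S/14 embedding size
            nn.ReLU(),
            nn.Linear(hidden_dim, num_actions),
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, 3, H, W) with H and W divisible by 14, e.g. 224x224.
        with torch.no_grad():
            feats = self.encoder(frames)  # (batch, 384) image-level features
        return self.head(feats)

def train_step(policy: BCPolicy, optimizer: torch.optim.Optimizer,
               frames: torch.Tensor, actions: torch.Tensor) -> float:
    # Standard behavioural cloning: cross-entropy against demonstrated actions.
    logits = policy(frames)
    loss = nn.functional.cross_entropy(logits, actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Swapping the frozen backbone for a small trainable CNN turns the same loop into the end-to-end baseline the paper compares against.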
Related papers
- Playing Non-Embedded Card-Based Games with Reinforcement Learning [18.971623378904503]
We propose a non-embedded offline reinforcement learning training strategy to achieve real-time autonomous gameplay in the RTS game Clash Royale.
We extract features using state-of-the-art object detection and optical character recognition models.
Our method enables real-time image acquisition, perception feature fusion, decision-making, and control on mobile devices, successfully defeating built-in AI opponents.
arXiv Detail & Related papers (2025-04-07T07:26:02Z) - Across-Game Engagement Modelling via Few-Shot Learning [1.7969777786551424]
Domain generalisation involves learning AI models that can maintain high performance across diverse domains.
Video games present unique challenges and opportunities for the analysis of user experience.
We introduce a framework that decomposes the general domain-agnostic modelling of user experience into several domain-specific and game-dependent tasks.
arXiv Detail & Related papers (2024-09-19T16:21:21Z) - Serious Games in Digital Gaming: A Comprehensive Review of Applications,
Game Engines and Advancements [55.2480439325792]
In recent years, serious games have become increasingly popular due to their ability to simultaneously educate and entertain users.
In this review, we provide a comprehensive overview of the different types of digital games and expand on the serious games genre.
We present the most widely used game engines in the game development industry and elaborate on the advantages of the Unity game engine.
arXiv Detail & Related papers (2023-11-03T09:17:09Z) - Towards General Game Representations: Decomposing Games Pixels into
Content and Style [2.570570340104555]
Learning pixel representations of games can benefit artificial intelligence across several downstream tasks.
This paper explores how generalizable pre-trained computer vision encoders can be for such tasks.
We employ a pre-trained Vision Transformer encoder and a decomposition technique based on game genres to obtain separate content and style embeddings.
arXiv Detail & Related papers (2023-07-20T17:53:04Z) - Technical Challenges of Deploying Reinforcement Learning Agents for Game
Testing in AAA Games [58.720142291102135]
We describe an effort to add an experimental reinforcement learning system to an existing automated game testing solution based on scripted bots.
We show a use-case of leveraging reinforcement learning in game production and cover some of the largest time sinks anyone who wants to make the same journey for their game may encounter.
We propose a few research directions that we believe will be valuable and necessary for making machine learning, and especially reinforcement learning, an effective tool in game production.
arXiv Detail & Related papers (2023-07-19T18:19:23Z) - Promptable Game Models: Text-Guided Game Simulation via Masked Diffusion
Models [68.85478477006178]
We present a Promptable Game Model (PGM) for neural video game simulators.
It allows a user to play the game by prompting it with high- and low-level action sequences.
Most captivatingly, our PGM unlocks the director's mode, where the game is played by specifying goals for the agents in the form of a prompt.
Our method significantly outperforms existing neural video game simulators in terms of rendering quality and unlocks applications beyond the capabilities of the current state of the art.
arXiv Detail & Related papers (2023-03-23T17:43:17Z) - Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online
Videos [16.858980871368175]
We extend the internet-scale pretraining paradigm to sequential decision domains through semi-supervised imitation learning.
We show that this behavioral prior has nontrivial zero-shot capabilities and that it can be fine-tuned with both imitation learning and reinforcement learning.
For many tasks our models exhibit human-level performance, and we are the first to report computer agents that can craft diamond tools.
arXiv Detail & Related papers (2022-06-23T16:01:11Z) - Playful Interactions for Representation Learning [82.59215739257104]
We propose to use playful interactions in a self-supervised manner to learn visual representations for downstream tasks.
We collect 2 hours of playful data in 19 diverse environments and use self-predictive learning to extract visual representations.
Our representations generalize better than standard behavior cloning and can achieve similar performance with only half the number of required demonstrations.
arXiv Detail & Related papers (2021-07-19T17:54:48Z) - Unsupervised Visual Representation Learning by Tracking Patches in Video [88.56860674483752]
We propose to use tracking as a proxy task for a computer vision system to learn the visual representations.
Modelled on the Catch game played by children, we design a Catch-the-Patch (CtP) game for a 3D-CNN model to learn visual representations.
arXiv Detail & Related papers (2021-05-06T09:46:42Z) - Designing a mobile game to generate player data -- lessons learned [2.695466667982714]
We developed a mobile game without the guidance of similar projects.
Research into game balancing and system simulation required an experimental case study.
In creating RPGLite, we learned a series of lessons about effective amateur game development for research purposes.
arXiv Detail & Related papers (2021-01-18T16:16:58Z) - Deep Learning Techniques for Super-Resolution in Video Games [91.3755431537592]
Computer scientists need to develop new ways to improve the performance of graphical processing hardware.
Deep learning techniques for video super-resolution can enable video games to have high quality graphics whilst offsetting much of the computational cost.
arXiv Detail & Related papers (2020-12-17T18:22:05Z) - DeepCrawl: Deep Reinforcement Learning for Turn-based Strategy Games [137.86426963572214]
We introduce DeepCrawl, a fully-playable Roguelike prototype for iOS and Android in which all agents are controlled by policy networks trained using Deep Reinforcement Learning (DRL).
Our aim is to understand whether recent advances in DRL can be used to develop convincing behavioral models for non-player characters in videogames.
arXiv Detail & Related papers (2020-12-03T13:53:29Z) - Generating Gameplay-Relevant Art Assets with Transfer Learning [0.8164433158925593]
We propose a Convolutional Variational Autoencoder (CVAE) system to modify and generate new game visuals based on gameplay relevance.
Our experimental results indicate that adopting a transfer learning approach can help to improve visual quality and stability over unseen data.
arXiv Detail & Related papers (2020-10-04T20:58:40Z) - "It's Unwieldy and It Takes a Lot of Time." Challenges and Opportunities
for Creating Agents in Commercial Games [20.63320049616144]
Game agents such as opponents, non-player characters, and teammates are central to player experiences in many modern games.
As the landscape of AI techniques used in the games industry evolves to adopt machine learning (ML) more widely, it is vital that the research community learn from the best practices cultivated within the industry over decades creating agents.
We interviewed seventeen game agent creators from AAA studios, indie studios, and industrial research labs about the challenges they experienced in their professional practice.
arXiv Detail & Related papers (2020-09-01T16:21:19Z) - Benchmarking End-to-End Behavioural Cloning on Video Games [5.863352129133669]
We study the general applicability of behavioural cloning on twelve video games, including six modern video games (published after 2010).
Our results show that these agents cannot match humans in raw performance but do learn basic dynamics and rules.
We also demonstrate how the quality of the data matters, and how recording data from humans is subject to a state-action mismatch, due to human reflexes.
arXiv Detail & Related papers (2020-04-02T13:31:51Z) - Disentangling Controllable Object through Video Prediction Improves
Visual Reinforcement Learning [82.25034245150582]
In many vision-based reinforcement learning problems, the agent controls a movable object in its visual field.
We propose an end-to-end learning framework to disentangle the controllable object from the observation signal.
The disentangled representation is shown to be useful for RL as additional observation channels to the agent.
arXiv Detail & Related papers (2020-02-21T05:43:34Z) - Neural MMO v1.3: A Massively Multiagent Game Environment for Training
and Evaluating Neural Networks [48.5733173329785]
We present Neural MMO, a massively multiagent game environment inspired by MMOs.
We discuss our progress on two more general challenges in multiagent systems engineering for AI research: distributed infrastructure and game IO.
arXiv Detail & Related papers (2020-01-31T18:50:02Z)