Learning Task-Independent Game State Representations from Unlabeled Images
- URL: http://arxiv.org/abs/2206.06490v1
- Date: Mon, 13 Jun 2022 21:37:58 GMT
- Title: Learning Task-Independent Game State Representations from Unlabeled Images
- Authors: Chintan Trivedi, Konstantinos Makantasis, Antonios Liapis, Georgios N. Yannakakis
- Abstract summary: Self-supervised learning (SSL) techniques have been widely used to learn compact and informative representations from complex data.
This paper investigates whether SSL methods can be leveraged for the task of learning accurate state representations of games.
- Score: 2.570570340104555
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-supervised learning (SSL) techniques have been widely used to learn
compact and informative representations from high-dimensional complex data. In
many computer vision tasks, such as image classification, such methods achieve
state-of-the-art results that surpass supervised learning approaches. In this
paper, we investigate whether SSL methods can be leveraged for the task of
learning accurate state representations of games, and if so, to what extent.
For this purpose, we collect game footage frames and corresponding sequences of
games' internal state from three different 3D games: VizDoom, the CARLA racing
simulator and the Google Research Football Environment. We train an image
encoder with three widely used SSL algorithms using solely the raw frames, and
then attempt to recover the internal state variables from the learned
representations. Our results across all three games showcase significantly
higher correlation between SSL representations and the game's internal state
compared to pre-trained baseline models such as ImageNet. Such findings suggest
that SSL-based visual encoders can yield general -- not tailored to a specific
task -- yet informative game representations solely from game pixel
information. Such representations can, in turn, form the basis for boosting the
performance of downstream learning tasks in games, including gameplaying,
content generation and player modeling.
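The evaluation protocol the abstract describes, training an SSL image encoder on raw frames and then testing how well the frozen features predict the game's internal state variables, can be sketched as follows. This is a minimal illustrative sketch on synthetic data, not the paper's code: the function name `probe_correlation`, the closed-form ridge probe, and the toy dimensions are all assumptions made for the example.

```python
import numpy as np

def probe_correlation(embeddings, states, l2=1e-3):
    """Fit a closed-form ridge-regression probe from frozen image
    embeddings to game-state variables, then return the Pearson
    correlation between predictions and ground truth for each
    variable on a held-out split."""
    n, d = embeddings.shape
    split = n // 2
    X_tr, X_te = embeddings[:split], embeddings[split:]
    Y_tr, Y_te = states[:split], states[split:]
    # ridge solution: W = (X^T X + l2 I)^{-1} X^T Y
    W = np.linalg.solve(X_tr.T @ X_tr + l2 * np.eye(d), X_tr.T @ Y_tr)
    pred = X_te @ W
    return np.array([np.corrcoef(pred[:, i], Y_te[:, i])[0, 1]
                     for i in range(states.shape[1])])

# Toy demonstration: synthetic "embeddings" that linearly encode two
# hidden state variables (e.g. player health, position) plus noise.
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 16))     # stand-in for frozen SSL features
S = Z @ rng.normal(size=(16, 2)) + 0.1 * rng.normal(size=(200, 2))
print(probe_correlation(Z, S))     # near 1.0 when state is linearly decodable
```

A high per-variable correlation here means the representation encodes that state variable without ever seeing a label, which is the property the paper measures for its SSL encoders against ImageNet-pretrained baselines.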
Related papers
- PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm [114.47216525866435]
We introduce a novel universal 3D pre-training framework designed to facilitate the acquisition of efficient 3D representations.
PonderV2 achieves state-of-the-art performance on 11 indoor and outdoor benchmarks for the first time, demonstrating its effectiveness.
arXiv Detail & Related papers (2023-10-12T17:59:57Z)
- CSP: Self-Supervised Contrastive Spatial Pre-Training for Geospatial-Visual Representations [90.50864830038202]
We present Contrastive Spatial Pre-Training (CSP), a self-supervised learning framework for geo-tagged images.
We use a dual-encoder to separately encode the images and their corresponding geo-locations, and use contrastive objectives to learn effective location representations from images.
CSP significantly boosts the model performance with 10-34% relative improvement with various labeled training data sampling ratios.
arXiv Detail & Related papers (2023-05-01T23:11:18Z)
- Do SSL Models Have Déjà Vu? A Case of Unintended Memorization in Self-supervised Learning [47.46863155263094]
Self-supervised learning (SSL) algorithms can produce useful image representations by learning to associate different parts of natural images with one another.
SSL models can unintentionally memorize specific parts of individual training samples rather than learning semantically meaningful associations.
We show that given the trained model and a crop of a training image containing only the background, it is possible to infer the foreground object with high accuracy.
arXiv Detail & Related papers (2023-04-26T22:29:49Z)
- CLIP$^2$: Contrastive Language-Image-Point Pretraining from Real-World Point Cloud Data [80.42480679542697]
We propose Contrastive Language-Image-Point Cloud Pretraining (CLIP$^2$) to learn transferable 3D point cloud representations in realistic scenarios.
Specifically, we exploit naturally occurring correspondences between 2D and 3D scenarios, and build well-aligned, instance-based text-image-point proxies from those complex scenarios.
arXiv Detail & Related papers (2023-03-22T09:32:45Z)
- Game State Learning via Game Scene Augmentation [2.570570340104555]
We introduce a new game scene augmentation technique -- named GameCLR -- that takes advantage of the game engine to define and synthesize specific, highly controlled renderings of different game states.
Our results suggest that GameCLR can infer the game's state information from game footage more accurately compared to the baseline.
arXiv Detail & Related papers (2022-07-04T09:40:14Z)
- Self Supervised Learning for Few Shot Hyperspectral Image Classification [57.2348804884321]
We propose to leverage Self Supervised Learning (SSL) for HSI classification.
We show that by pre-training an encoder on unlabeled pixels using Barlow-Twins, a state-of-the-art SSL algorithm, we can obtain accurate models with a handful of labels.
arXiv Detail & Related papers (2022-06-24T07:21:53Z)
- Bag of Image Patch Embedding Behind the Success of Self-Supervised Learning [12.480529556920974]
This work shows that joint-embedding SSL approaches learn a representation of image patches, which reflects their co-occurrence.
We empirically show that learning a representation for fixed-scale patches and aggregating local patch representations as the image representation achieves similar or even better results than the baseline methods.
arXiv Detail & Related papers (2022-06-17T18:11:23Z)
- Self-supervised Learning for Sonar Image Classification [6.1947705963945845]
Self-supervised learning has proved to be a powerful approach to learning image representations without the need for large labeled datasets.
We present pre-training and transfer learning results on real-life sonar image datasets.
arXiv Detail & Related papers (2022-04-20T08:58:35Z)
- Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision [57.031588264841]
We leverage a noisy dataset of over one billion image alt-text pairs, obtained without expensive filtering or post-processing steps.
A simple dual-encoder architecture learns to align visual and language representations of the image and text pairs using a contrastive loss.
We show that the scale of our corpus can make up for its noise and leads to state-of-the-art representations even with such a simple learning scheme.
arXiv Detail & Related papers (2021-02-11T10:08:12Z)
- PGL: Prior-Guided Local Self-supervised Learning for 3D Medical Image Segmentation [87.50205728818601]
We propose a Prior-Guided Local (PGL) self-supervised model that learns region-wise local consistency in the latent feature space.
Our PGL model learns the distinctive representations of local regions, and hence is able to retain structural information.
arXiv Detail & Related papers (2020-11-25T11:03:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.