Learning Task-Independent Game State Representations from Unlabeled Images
- URL: http://arxiv.org/abs/2206.06490v1
- Date: Mon, 13 Jun 2022 21:37:58 GMT
- Title: Learning Task-Independent Game State Representations from Unlabeled Images
- Authors: Chintan Trivedi, Konstantinos Makantasis, Antonios Liapis, Georgios N. Yannakakis
- Abstract summary: Self-supervised learning (SSL) techniques have been widely used to learn compact and informative representations from complex data.
This paper investigates whether SSL methods can be leveraged for the task of learning accurate state representations of games.
- Score: 2.570570340104555
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-supervised learning (SSL) techniques have been widely used to learn
compact and informative representations from high-dimensional complex data. In
many computer vision tasks, such as image classification, such methods achieve
state-of-the-art results that surpass supervised learning approaches. In this
paper, we investigate whether SSL methods can be leveraged for the task of
learning accurate state representations of games, and if so, to what extent.
For this purpose, we collect game footage frames and corresponding sequences of
games' internal state from three different 3D games: VizDoom, the CARLA racing
simulator and the Google Research Football Environment. We train an image
encoder with three widely used SSL algorithms using solely the raw frames, and
then attempt to recover the internal state variables from the learned
representations. Our results across all three games showcase significantly
higher correlation between SSL representations and the game's internal state
compared to pre-trained baseline models such as ImageNet. Such findings suggest
that SSL-based visual encoders can yield general -- not tailored to a specific
task -- yet informative game representations solely from game pixel
information. Such representations can, in turn, form the basis for boosting the
performance of downstream learning tasks in games, including gameplaying,
content generation and player modeling.
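The evaluation protocol the abstract describes, training an SSL image encoder on raw frames and then testing how well the frozen features predict the game's internal state variables, can be sketched as follows. This is a minimal illustrative sketch on synthetic data, not the paper's code: the function name `probe_correlation`, the closed-form ridge probe, and the toy dimensions are all assumptions made for the example.

```python
import numpy as np

def probe_correlation(embeddings, states, l2=1e-3):
    """Fit a closed-form ridge-regression probe from frozen image
    embeddings to game-state variables, then return the Pearson
    correlation between predictions and ground truth for each
    variable on a held-out split."""
    n, d = embeddings.shape
    split = n // 2
    X_tr, X_te = embeddings[:split], embeddings[split:]
    Y_tr, Y_te = states[:split], states[split:]
    # ridge solution: W = (X^T X + l2 I)^{-1} X^T Y
    W = np.linalg.solve(X_tr.T @ X_tr + l2 * np.eye(d), X_tr.T @ Y_tr)
    pred = X_te @ W
    return np.array([np.corrcoef(pred[:, i], Y_te[:, i])[0, 1]
                     for i in range(states.shape[1])])

# Toy demonstration: synthetic "embeddings" that linearly encode two
# hidden state variables (e.g. player health, position) plus noise.
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 16))     # stand-in for frozen SSL features
S = Z @ rng.normal(size=(16, 2)) + 0.1 * rng.normal(size=(200, 2))
print(probe_correlation(Z, S))     # near 1.0 when state is linearly decodable
```

A high per-variable correlation here means the representation encodes that state variable without ever seeing a label, which is the property the paper measures for its SSL encoders against ImageNet-pretrained baselines.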
Related papers
- PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm [114.47216525866435]
We introduce a novel universal 3D pre-training framework designed to facilitate the acquisition of efficient 3D representations.
PonderV2 achieves state-of-the-art performance on 11 indoor and outdoor benchmarks for the first time, demonstrating its effectiveness.
arXiv Detail & Related papers (2023-10-12T17:59:57Z)
- CSP: Self-Supervised Contrastive Spatial Pre-Training for Geospatial-Visual Representations [90.50864830038202]
We present Contrastive Spatial Pre-Training (CSP), a self-supervised learning framework for geo-tagged images.
We use a dual-encoder to separately encode the images and their corresponding geo-locations, and use contrastive objectives to learn effective location representations from images.
CSP significantly boosts the model performance with 10-34% relative improvement with various labeled training data sampling ratios.
arXiv Detail & Related papers (2023-05-01T23:11:18Z)
- Do SSL Models Have Déjà Vu? A Case of Unintended Memorization in Self-supervised Learning [47.46863155263094]
Self-supervised learning (SSL) algorithms can produce useful image representations by learning to associate different parts of natural images with one another.
SSL models can unintentionally memorize specific parts of individual training samples rather than learning semantically meaningful associations.
We show that given the trained model and a crop of a training image containing only the background, it is possible to infer the foreground object with high accuracy.
arXiv Detail & Related papers (2023-04-26T22:29:49Z)
- CLIP$^2$: Contrastive Language-Image-Point Pretraining from Real-World Point Cloud Data [80.42480679542697]
We propose Contrastive Language-Image-Point Cloud Pretraining (CLIP$^2$) to learn transferable 3D point cloud representations in realistic scenarios.
Specifically, we exploit naturally occurring correspondences between 2D and 3D scenarios, and build well-aligned, instance-based text-image-point proxies from those complex scenarios.
arXiv Detail & Related papers (2023-03-22T09:32:45Z)
- Game State Learning via Game Scene Augmentation [2.570570340104555]
We introduce a new game scene augmentation technique -- named GameCLR -- that takes advantage of the game engine to define and synthesize specific, highly controlled renderings of different game states.
Our results suggest that GameCLR can infer the game's state information from game footage more accurately compared to the baseline.
arXiv Detail & Related papers (2022-07-04T09:40:14Z)
- Self Supervised Learning for Few Shot Hyperspectral Image Classification [57.2348804884321]
We propose to leverage Self Supervised Learning (SSL) for HSI classification.
We show that by pre-training an encoder on unlabeled pixels using Barlow-Twins, a state-of-the-art SSL algorithm, we can obtain accurate models with a handful of labels.
arXiv Detail & Related papers (2022-06-24T07:21:53Z)
- Bag of Image Patch Embedding Behind the Success of Self-Supervised Learning [12.480529556920974]
This work shows that joint-embedding SSL approaches learn a representation of image patches, which reflects their co-occurrence.
We empirically show that learning a representation for fixed-scale patches and aggregating local patch representations as the image representation achieves similar or even better results than the baseline methods.
arXiv Detail & Related papers (2022-06-17T18:11:23Z)
- Self-supervised Learning for Sonar Image Classification [6.1947705963945845]
Self-supervised learning has proved to be a powerful approach to learning image representations without the need for large labeled datasets.
We present pre-training and transfer learning results on real-life sonar image datasets.
arXiv Detail & Related papers (2022-04-20T08:58:35Z)
- Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision [57.031588264841]
We leverage a noisy dataset of over one billion image alt-text pairs, obtained without expensive filtering or post-processing steps.
A simple dual-encoder architecture learns to align visual and language representations of the image and text pairs using a contrastive loss.
We show that the scale of our corpus can make up for its noise and leads to state-of-the-art representations even with such a simple learning scheme.
arXiv Detail & Related papers (2021-02-11T10:08:12Z)
- PGL: Prior-Guided Local Self-supervised Learning for 3D Medical Image Segmentation [87.50205728818601]
We propose a Prior-Guided Local (PGL) self-supervised model that learns region-wise local consistency in the latent feature space.
Our PGL model learns the distinctive representations of local regions, and hence is able to retain structural information.
arXiv Detail & Related papers (2020-11-25T11:03:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.