Joint Level Generation and Translation Using Gameplay Videos
- URL: http://arxiv.org/abs/2306.16662v1
- Date: Thu, 29 Jun 2023 03:46:44 GMT
- Title: Joint Level Generation and Translation Using Gameplay Videos
- Authors: Negar Mirgati and Matthew Guzdial
- Abstract summary: Procedural Content Generation via Machine Learning (PCGML) faces a significant hurdle that sets it apart from other fields such as image or text generation: limited annotated data.
Many existing methods for procedural level generation via machine learning require a secondary representation besides level images.
We develop a novel multi-tail framework that learns to perform simultaneous level translation and generation.
- Score: 0.9645196221785693
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Procedural Content Generation via Machine Learning (PCGML) faces a
significant hurdle that sets it apart from other fields such as image or text
generation: limited annotated data. Many existing methods for
procedural level generation via machine learning require a secondary
representation besides level images. However, the current methods for obtaining
such representations are laborious and time-consuming, which contributes to
this problem. In this work, we aim to address this problem by utilizing
gameplay videos of two human-annotated games to develop a novel multi-tail
framework that learns to perform simultaneous level translation and generation.
The translation tail of our framework can convert gameplay video frames to an
equivalent secondary representation, while its generation tail can produce
novel level segments. Evaluation results and comparisons between our framework
and baselines suggest that combining the level generation and translation tasks
can lead to an overall improved performance regarding both tasks. This
represents a possible solution to limited annotated level data, and we
demonstrate the potential for future versions to generalize to unseen games.
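The abstract describes one shared model with two output heads, so a compact sketch may help make the architecture concrete. Below is a minimal PyTorch sketch of a multi-tail network, assuming a shared convolutional encoder over gameplay-video frames, a translation tail that emits per-tile logits for the secondary representation, and a generation tail that decodes a novel level segment. The tile vocabulary size, segment dimensions, and layer sizes are illustrative assumptions, not the authors' configuration.
```python
# Minimal multi-tail sketch: a shared encoder with two task-specific heads.
import torch
import torch.nn as nn

NUM_TILE_TYPES = 16            # assumed secondary-representation vocabulary size
SEGMENT_H, SEGMENT_W = 14, 16  # assumed tile dimensions of one level segment

class MultiTailNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared encoder: maps an RGB frame to a latent feature map.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
        )
        # Translation tail: per-tile classification logits for the input frame.
        self.translate = nn.Sequential(
            nn.Conv2d(128, NUM_TILE_TYPES, 1),
            nn.AdaptiveAvgPool2d((SEGMENT_H, SEGMENT_W)),
        )
        # Generation tail: decodes the pooled latent into a novel segment.
        self.generate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, NUM_TILE_TYPES * SEGMENT_H * SEGMENT_W),
        )

    def forward(self, frame):
        z = self.encoder(frame)
        trans_logits = self.translate(z)  # (B, tiles, H, W) per-tile logits
        gen_logits = self.generate(z).view(
            -1, NUM_TILE_TYPES, SEGMENT_H, SEGMENT_W)
        return trans_logits, gen_logits

# Both tails share the encoder, so a joint loss (translation cross-entropy
# plus a generation objective) lets gradients from each task shape the same
# features -- the intuition behind combining the two tasks.
model = MultiTailNet()
trans, gen = model(torch.randn(2, 3, 112, 128))
print(trans.shape, gen.shape)  # torch.Size([2, 16, 14, 16]) for both
```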
Related papers
- Plug, Play, and Fuse: Zero-Shot Joint Decoding via Word-Level Re-ranking Across Diverse Vocabularies [12.843274390224853]
Real-world tasks, like multimodal translation, often require combining the strengths of different models, such as handling both translation and image processing.
We propose a novel zero-shot ensembling strategy that allows for the integration of different models during the decoding phase without the need for additional training.
Our approach re-ranks beams during decoding by combining scores at the word level, using multimodal models to predict when a word is completed.
arXiv Detail & Related papers (2024-08-21T04:20:55Z)
- Masked Generative Story Transformer with Character Guidance and Caption Augmentation [2.1392064955842023]
Story visualization is a challenging generative vision task that requires both visual quality and consistency between different frames in generated image sequences.
Previous approaches either employ some kind of memory mechanism to maintain context throughout an auto-regressive generation of the image sequence, or model the generation of the characters and their background separately.
We propose a completely parallel transformer-based approach, relying on Cross-Attention with past and future captions to achieve consistency.
arXiv Detail & Related papers (2024-03-13T13:10:20Z)
- Story Visualization by Online Text Augmentation with Context Memory [64.86944645907771]
We propose a novel memory architecture for the Bi-directional Transformer framework with an online text augmentation.
The proposed method significantly outperforms the state of the art on various metrics, including FID, character F1, frame accuracy, BLEU-2/3, and R-precision.
arXiv Detail & Related papers (2023-08-15T05:08:12Z)
- On Advances in Text Generation from Images Beyond Captioning: A Case Study in Self-Rationalization [89.94078728495423]
We show that recent advances in each modality (CLIP image representations and the scaling of language models) do not consistently improve self-rationalization on tasks with multimodal inputs.
Our findings call for a backbone modelling approach that can be built on to advance text generation from images and text beyond image captioning.
arXiv Detail & Related papers (2022-05-24T00:52:40Z)
- Unsupervised Image-to-Image Translation with Generative Prior [103.54337984566877]
Unsupervised image-to-image translation aims to learn the translation between two visual domains without paired data.
We present a novel framework, Generative Prior-guided UNsupervised Image-to-image Translation (GP-UNIT), to improve the overall quality and applicability of the translation algorithm.
arXiv Detail & Related papers (2022-04-07T17:59:23Z)
- End-to-end Generative Pretraining for Multimodal Video Captioning [82.79187814057313]
We present Multimodal Video Generative Pretraining (MV-GPT), a new pretraining framework for learning from unlabelled videos.
Unlike recent video-language pretraining frameworks, our framework trains both a multimodal video encoder and a sentence decoder jointly.
Our model achieves state-of-the-art performance for multimodal video captioning on four standard benchmarks.
arXiv Detail & Related papers (2022-01-20T16:16:21Z)
- Video Generation from Text Employing Latent Path Construction for Temporal Modeling [70.06508219998778]
Video generation is one of the most challenging tasks in the fields of Machine Learning and Computer Vision.
In this paper, we tackle the text to video generation problem, which is a conditional form of video generation.
We believe that video generation from natural language sentences will have an important impact on Artificial Intelligence.
arXiv Detail & Related papers (2021-07-29T06:28:20Z)
- Improving Generation and Evaluation of Visual Stories via Semantic Consistency [72.00815192668193]
Given a series of natural language captions, an agent must generate a sequence of images that correspond to the captions.
Prior work has introduced recurrent generative models which outperform text-to-image synthesis models on this task.
We present a number of improvements to prior modeling approaches, including the addition of a dual learning framework.
arXiv Detail & Related papers (2021-05-20T20:42:42Z)
- Video Game Level Repair via Mixed Integer Linear Programming [20.815591392882716]
The proposed framework constructs levels using a generative adversarial network (GAN) trained with human-authored examples and repairs them using a mixed-integer linear program (MIP) with playability constraints.
Results show that the proposed framework generates a diverse range of playable levels that capture the spatial relationships between objects exhibited in the human-authored levels; a toy sketch of such a repair MIP appears after this list.
arXiv Detail & Related papers (2020-10-13T18:37:58Z)
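To make the repair step in the last entry concrete, here is a toy mixed-integer program in Python using the open-source PuLP library. It illustrates the general idea (minimal edits subject to playability constraints), not the formulation used in the paper; the 1-D level, the gap-length playability rule, and the MAX_JUMP parameter are all invented for the example.
```python
# Toy level repair as a MIP: flip as few tiles as possible so that no gap in a
# 1-D level exceeds the player's assumed maximum jump length.
import pulp

level = [1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1]  # generated level; gaps too wide
MAX_JUMP = 3  # assumed maximum gap width the player can clear

prob = pulp.LpProblem("level_repair", pulp.LpMinimize)
solid = [pulp.LpVariable(f"solid_{c}", cat="Binary") for c in range(len(level))]

# Objective: count columns whose tile differs from the generated level.
prob += pulp.lpSum((1 - s) if orig else s for s, orig in zip(solid, level))

# Playability: every window of MAX_JUMP + 1 consecutive columns needs at least
# one ground tile, so no run of gaps is longer than MAX_JUMP.
for c in range(len(level) - MAX_JUMP):
    prob += pulp.lpSum(solid[c : c + MAX_JUMP + 1]) >= 1

prob.solve(pulp.PULP_CBC_CMD(msg=0))
repaired = [int(s.value()) for s in solid]
print("generated:", level)
print("repaired: ", repaired)  # minimal edits that make every gap jumpable
```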