Deep learning for video game genre classification
- URL: http://arxiv.org/abs/2011.12143v1
- Date: Sat, 21 Nov 2020 22:31:43 GMT
- Title: Deep learning for video game genre classification
- Authors: Yuhang Jiang, Lukun Zheng
- Abstract summary: This paper proposes a multi-modal deep learning framework to solve this problem.
We compile a large dataset of 50,000 video games from 21 genres, consisting of cover images, description text, title text, and genre labels.
Results show that the multi-modal framework outperforms the current state-of-the-art image-based or text-based models.
- Score: 2.66512000865131
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video game genre classification based on cover images and textual
descriptions would be highly beneficial to many modern identification,
collocation, and retrieval systems. At the same time, it is an extremely
challenging task for the following reasons: First, there exists a wide variety
of video game genres, many of which are not concretely defined. Second, video
game covers vary in many ways, such as colors, styles, and textual
information, even for games of the same genre. Third, cover designs and
textual descriptions may vary due to many external factors such as country,
culture, and target audience. With the growing competitiveness in the video
game industry, cover designers and typographers push cover designs to their
limits in the hope of attracting sales. Computer-based automatic video game
genre classification has therefore become a particularly exciting research
topic in recent years. In this paper, we propose a multi-modal deep learning
framework to solve this problem. The contribution of this paper is four-fold.
First, we compile a large dataset of 50,000 video games from 21 genres,
consisting of cover images, description text, title text, and genre labels.
Second, state-of-the-art image-based and text-based models are evaluated
thoroughly for the task of genre classification for video games. Third, we
develop an efficient and scalable multi-modal framework based on both images
and texts. Fourth, a thorough analysis of the experimental results is given
and future work to improve performance is suggested. The results show that
the multi-modal framework outperforms current state-of-the-art image-based
and text-based models. Several challenges are outlined for this task; more
effort and resources are needed for this classification task to reach a
satisfactory level of performance.
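
To make the fused image-and-text design concrete, here is a minimal sketch of the kind of two-branch classifier the abstract describes: a small CNN encodes the cover image, averaged word embeddings encode the description and title text, and the concatenated features feed a classification head over the 21 genres. The layer sizes, vocabulary size, and concatenation-based fusion are illustrative assumptions, not the authors' exact architecture.

```python
# Minimal sketch of a multi-modal genre classifier of the kind the abstract
# describes. All dimensions and the concatenation fusion are assumptions.
import torch
import torch.nn as nn

NUM_GENRES = 21      # from the paper's dataset
VOCAB_SIZE = 20_000  # assumed vocabulary size for the text branch

class MultiModalGenreClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        # Image branch: a small CNN over 224x224 RGB cover images.
        self.image_branch = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, 128),
        )
        # Text branch: mean-pooled word embeddings of description + title.
        self.embed = nn.EmbeddingBag(VOCAB_SIZE, 128, mode="mean")
        # Fusion head: concatenate the two 128-d features, then classify.
        self.head = nn.Sequential(
            nn.Linear(128 + 128, 256), nn.ReLU(),
            nn.Linear(256, NUM_GENRES),
        )

    def forward(self, cover, token_ids, offsets):
        img = self.image_branch(cover)        # (B, 128)
        txt = self.embed(token_ids, offsets)  # (B, 128)
        return self.head(torch.cat([img, txt], dim=1))  # (B, 21) logits

# Smoke test on random inputs.
model = MultiModalGenreClassifier()
cover = torch.randn(2, 3, 224, 224)
token_ids = torch.tensor([1, 5, 9, 2, 7])  # two variable-length texts, packed
offsets = torch.tensor([0, 3])             # start index of each text
logits = model(cover, token_ids, offsets)
print(logits.shape)  # torch.Size([2, 21])
```

Concatenation is the simplest fusion choice; gated or attention-based fusion are common alternatives when one modality is noisier than the other.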
Related papers
- StoryImager: A Unified and Efficient Framework for Coherent Story Visualization and Completion [78.1014542102578]
Story visualization aims to generate realistic and coherent images based on a storyline.
Current models adopt a frame-by-frame architecture, adapting a pre-trained text-to-image model in an auto-regressive manner.
We propose a bidirectional, unified, and efficient framework, namely StoryImager.
arXiv Detail & Related papers (2024-04-09T03:22:36Z)
- Panel Transitions for Genre Analysis in Visual Narratives [1.320904960556043]
We present a novel approach to multi-modal genre analysis of comics and manga-style visual narratives.
We highlight some of the limitations and challenges of our existing computational approaches in modeling subjective labels.
arXiv Detail & Related papers (2023-12-14T08:05:09Z)
- Intelligent Generation of Graphical Game Assets: A Conceptual Framework and Systematic Review of the State of the Art [1.534667887016089]
Procedural content generation can be applied to a wide variety of tasks in games, from narratives, levels and sounds, to trees and weapons.
This paper explores state-of-the-art approaches to graphical asset generation, examining research from a wide range of applications, inside and outside of games.
arXiv Detail & Related papers (2023-11-16T18:36:16Z)
- Towards General Game Representations: Decomposing Games Pixels into Content and Style [2.570570340104555]
Learning pixel representations of games can benefit artificial intelligence across several downstream tasks.
This paper explores how generalizable pre-trained computer vision encoders can be for such tasks.
We employ a pre-trained Vision Transformer encoder and a decomposition technique based on game genres to obtain separate content and style embeddings.
arXiv Detail & Related papers (2023-07-20T17:53:04Z)
- Vision-Language Pre-training: Basics, Recent Advances, and Future Trends [158.34830433299268]
Vision-language pre-training methods for multimodal intelligence have been developed in the last few years.
For each category, we present a comprehensive review of state-of-the-art methods, and discuss the progress that has been made and challenges still being faced.
In addition, we discuss advanced topics being actively explored in the research community, such as big foundation models, unified modeling, in-context few-shot learning, knowledge, robustness, and computer vision in the wild, to name a few.
arXiv Detail & Related papers (2022-10-17T17:11:36Z)
- Generating Diverse and Competitive Play-Styles for Strategy Games [58.896302717975445]
We propose Portfolio Monte Carlo Tree Search with Progressive Unpruning for playing a turn-based strategy game (Tribes).
We show how it can be parameterized so a quality-diversity algorithm (MAP-Elites) is used to achieve different play-styles while keeping a competitive level of play.
Our results show that this algorithm is capable of achieving these goals even for an extensive collection of game levels beyond those used for training.
arXiv Detail & Related papers (2021-04-17T20:33:24Z)
- Deep multi-modal networks for book genre classification based on its cover [0.0]
We propose a multi-modal deep learning framework to solve the cover-based book classification problem.
Our method adds an extra modality by extracting texts automatically from the book covers.
Results show that the multi-modal framework significantly outperforms the current state-of-the-art image-based models.
arXiv Detail & Related papers (2020-11-15T23:27:43Z)
- A Unified Framework for Shot Type Classification Based on Subject Centric Lens [89.26211834443558]
We propose a learning framework for shot type recognition using a Subject Guidance Network (SGNet).
SGNet separates the subject and background of a shot into two streams, serving as separate guidance maps for scale and movement type classification respectively.
We build a large-scale dataset MovieShots, which contains 46K shots from 7K movie trailers with annotations of their scale and movement types.
arXiv Detail & Related papers (2020-08-08T15:49:40Z)
- Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text [93.08109196909763]
We propose a novel VQA approach, the Multi-Modal Graph Neural Network (MM-GNN).
It first represents an image as a graph consisting of three sub-graphs, depicting visual, semantic, and numeric modalities respectively.
It then introduces three aggregators which guide the message passing from one graph to another to utilize the contexts in various modalities; a hedged sketch of this cross-modal aggregation idea appears after this list.
arXiv Detail & Related papers (2020-03-31T05:56:59Z)
- Learning Dynamic Belief Graphs to Generalize on Text-Based Games [55.59741414135887]
Playing text-based games requires skills in processing natural language and sequential decision making.
In this work, we investigate how an agent can plan and generalize in text-based games using graph-structured representations learned end-to-end from raw text.
arXiv Detail & Related papers (2020-02-21T04:38:37Z)
- Fine-grained Image Classification and Retrieval by Combining Visual and Locally Pooled Textual Features [8.317191999275536]
In particular, the mere presence of text provides strong guiding content that should be employed to tackle a diversity of computer vision tasks.
In this paper, we address the problem of fine-grained classification and image retrieval by leveraging textual information along with visual cues to comprehend the existing intrinsic relation between the two modalities.
arXiv Detail & Related papers (2020-01-14T12:06:12Z)
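
The MM-GNN entry above describes aggregators that pass messages between modality sub-graphs. The sketch below (referenced from that entry) illustrates one plausible attention-style cross-modal aggregator; the dimensions, the single-head attention form, and the update rule are assumptions made for illustration, not the paper's actual operators.

```python
# Illustrative cross-modal aggregator: nodes of one modality attend over
# nodes of another, and the aggregated messages update the target nodes.
import torch
import torch.nn as nn

class CrossModalAggregator(nn.Module):
    """Passes messages from `source` modality nodes to `target` nodes."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)  # queries from target nodes
        self.k = nn.Linear(dim, dim)  # keys from source nodes
        self.v = nn.Linear(dim, dim)  # values (messages) from source nodes
        self.update = nn.Linear(2 * dim, dim)

    def forward(self, target, source):
        # target: (B, Nt, D); source: (B, Ns, D)
        scores = self.q(target) @ self.k(source).transpose(1, 2)
        attn = torch.softmax(scores / target.size(-1) ** 0.5, dim=-1)
        messages = attn @ self.v(source)  # (B, Nt, D)
        return self.update(torch.cat([target, messages], dim=-1))

# Two of the three modality sub-graphs mentioned in the blurb above.
B, D = 2, 64
visual = torch.randn(B, 10, D)    # visual object nodes
semantic = torch.randn(B, 5, D)   # semantic (scene-text) nodes
agg = CrossModalAggregator(D)
visual = agg(visual, semantic)    # semantic nodes refine visual nodes
print(visual.shape)               # torch.Size([2, 10, 64])
```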