Game-MUG: Multimodal Oriented Game Situation Understanding and Commentary Generation Dataset
- URL: http://arxiv.org/abs/2404.19175v1
- Date: Tue, 30 Apr 2024 00:39:26 GMT
- Title: Game-MUG: Multimodal Oriented Game Situation Understanding and Commentary Generation Dataset
- Authors: Zhihao Zhang, Feiqi Cao, Yingbin Mo, Yiran Zhang, Josiah Poon, Caren Han
- Abstract summary: This paper introduces GAME-MUG, a new multimodal game situation understanding and audience-engaged commentary generation dataset.
Our dataset is collected from 2020-2022 LOL game live streams on YouTube and Twitch, and includes multimodal esports game information comprising text, audio, and time-series event logs.
In addition, we also propose a new audience conversation augmented commentary dataset covering both game situation and audience conversation understanding.
- Score: 8.837048597513059
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The dynamic nature of esports makes the game situation relatively complicated for average viewers. Esports broadcasting involves expert game casters, but caster-dependent commentary alone is not enough to fully understand the game situation; the commentary becomes richer when it incorporates diverse multimodal esports information, including audiences' talk and emotions, game audio, and game match event information. This paper introduces GAME-MUG, a new multimodal game situation understanding and audience-engaged commentary generation dataset, together with a strong baseline. Our dataset is collected from 2020-2022 LOL game live streams on YouTube and Twitch and includes multimodal esports game information (text, audio, and time-series event logs) for detecting the game situation. In addition, we propose an audience-conversation-augmented commentary dataset that covers both game situation and audience conversation understanding, and we introduce a robust joint multimodal dual learning model as a baseline. We examine the model's game situation/event understanding ability and commentary generation capability to show the effectiveness of the multimodal coverage and the joint integration learning approach.
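The baseline described in the abstract fuses text, audio, and event-log signals. As a rough illustration of what such multimodal fusion can look like, here is a minimal late-fusion sketch in PyTorch. This is not the paper's joint dual learning model: the module names, feature dimensions, and number of situation classes are assumptions made only for the example.

```python
# Illustrative sketch only: a minimal late-fusion classifier over the three
# modalities GAME-MUG provides (caster/chat text, game audio, event logs).
# Dimensions and class count are assumptions, not values from the paper.
import torch
import torch.nn as nn

class LateFusionSituationClassifier(nn.Module):
    def __init__(self, text_dim=768, audio_dim=128, event_dim=64,
                 hidden_dim=256, num_situations=5):
        super().__init__()
        # One projection head per modality, mapping into a shared space.
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.audio_proj = nn.Linear(audio_dim, hidden_dim)
        self.event_proj = nn.Linear(event_dim, hidden_dim)
        # Classifier over the concatenated modality representations.
        self.classifier = nn.Sequential(
            nn.ReLU(),
            nn.Linear(3 * hidden_dim, num_situations),
        )

    def forward(self, text_feat, audio_feat, event_feat):
        fused = torch.cat([
            self.text_proj(text_feat),
            self.audio_proj(audio_feat),
            self.event_proj(event_feat),
        ], dim=-1)
        return self.classifier(fused)

# Example: one batch of pre-extracted per-interval features.
model = LateFusionSituationClassifier()
logits = model(torch.randn(4, 768), torch.randn(4, 128), torch.randn(4, 64))
print(logits.shape)  # torch.Size([4, 5])
```

A real system would replace the random tensors with per-interval features extracted from the chat/caster text, the game audio, and the parsed event logs, and would likely add temporal modelling across intervals before classification.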
Related papers
- MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions [69.9122231800796]
We present MMTrail, a large-scale multi-modality video-language dataset incorporating more than 20M trailer clips with visual captions.
We propose a systemic captioning framework, achieving various modality annotations with more than 27.1k hours of trailer videos.
Our dataset potentially paves the way for fine-grained large multimodal-language model training.
arXiv Detail & Related papers (2024-07-30T16:43:24Z)
- MatchTime: Towards Automatic Soccer Game Commentary Generation [52.431010585268865]
We consider constructing an automatic soccer game commentary model to improve the audiences' viewing experience.
First, observing the prevalent video-text misalignment in existing datasets, we manually annotate timestamps for 49 matches.
Second, we propose a multi-modal temporal alignment pipeline to automatically correct and filter the existing dataset at scale.
Third, based on our curated dataset, we train an automatic commentary generation model, named MatchVoice.
arXiv Detail & Related papers (2024-06-26T17:57:25Z)
- GameVibe: A Multimodal Affective Game Corpus [4.846739905880406]
We present GameVibe, a novel affect corpus which consists of multimodal audiovisual stimuli.
The corpus consists of videos from a diverse set of publicly available gameplay sessions across 30 games.
arXiv Detail & Related papers (2024-06-17T10:52:52Z)
- LiveChat: Video Comment Generation from Audio-Visual Multimodal Contexts [8.070778830276275]
We create a large-scale audio-visual multimodal dialogue dataset to facilitate the development of live commenting technologies.
The data is collected from Twitch, with 11 different categories and 575 streamers for a total of 438 hours of video and 3.2 million comments.
We propose a novel multimodal generation model capable of generating live comments that align with the temporal and spatial events within the video.
arXiv Detail & Related papers (2023-10-01T02:35:58Z)
- CS-lol: a Dataset of Viewer Comment with Scene in E-sports Live-streaming [0.5735035463793008]
Billions of live-streaming viewers share their opinions on scenes they are watching in real-time and interact with the event.
We develop CS-lol, a dataset containing comments from viewers paired with descriptions of game scenes in E-sports live-streaming.
We propose a task, namely viewer comment retrieval, to retrieve the viewer comments for the scene of the live-streaming event.
arXiv Detail & Related papers (2023-01-17T13:34:06Z)
- Commentary Generation from Data Records of Multiplayer Strategy Esports Game [21.133690853111133]
We build large-scale datasets that pair structured data and commentaries from a popular esports game, League of Legends.
We then evaluate Transformer-based models to generate game commentaries from structured data records.
We will release our dataset to boost potential research in the data-to-text generation community.
arXiv Detail & Related papers (2022-12-21T11:23:31Z)
- A Survey on Video Action Recognition in Sports: Datasets, Methods and Applications [60.3327085463545]
We present a survey on video action recognition for sports analytics.
We introduce more than ten types of sports, including team sports such as football, basketball, volleyball, and hockey, and individual sports such as figure skating, gymnastics, table tennis, diving, and badminton.
We develop a toolbox using PaddlePaddle, which supports football, basketball, table tennis and figure skating action recognition.
arXiv Detail & Related papers (2022-06-02T13:19:36Z)
- A Multi-stage deep architecture for summary generation of soccer videos [11.41978608521222]
We propose a method to generate the summary of a soccer match exploiting both the audio and the event metadata.
The results show that our method can detect the actions of the match, identify which of these actions should belong to the summary and then propose multiple candidate summaries.
arXiv Detail & Related papers (2022-05-02T07:26:35Z)
- MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration [46.19536568693307]
Multimodal video-audio-text understanding and generation can benefit from datasets that are narrow but rich.
We present a large-scale video-audio-text dataset MUGEN, collected using the open-sourced platform game CoinRun.
We sample 375K video clips (3.2s each) and collect text descriptions from human annotators.
arXiv Detail & Related papers (2022-04-17T17:59:09Z)
- Spoken Moments: Learning Joint Audio-Visual Representations from Video Descriptions [75.77044856100349]
We present the Spoken Moments dataset of 500k spoken captions each attributed to a unique short video depicting a broad range of different events.
We show that our AMM approach consistently improves our results and that models trained on our Spoken Moments dataset generalize better than those trained on other video-caption datasets.
arXiv Detail & Related papers (2021-05-10T16:30:46Z)
- Temporally-Aware Feature Pooling for Action Spotting in Soccer Broadcasts [86.56462654572813]
We focus our analysis on action spotting in soccer broadcast, which consists in temporally localizing the main actions in a soccer game.
We propose a novel feature pooling method based on NetVLAD, dubbed NetVLAD++, that embeds temporally-aware knowledge (a simplified sketch of the idea follows this entry).
We train and evaluate our methodology on the recent large-scale dataset SoccerNet-v2, reaching 53.4% Average-mAP for action spotting.
arXiv Detail & Related papers (2021-04-14T11:09:03Z)
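The core idea behind the NetVLAD++ entry above is to pool frame features from before and after a candidate action separately, so the pooled representation keeps coarse temporal order. The sketch below is a simplified reading of that idea, not the authors' implementation; the feature dimension, cluster count, and the 17-class output head are illustrative assumptions.

```python
# Simplified, assumption-laden sketch of temporally-aware NetVLAD pooling.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NetVLAD(nn.Module):
    """Standard NetVLAD pooling over a set of frame features."""
    def __init__(self, feat_dim, num_clusters):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_clusters, feat_dim))
        self.assign = nn.Linear(feat_dim, num_clusters)

    def forward(self, x):                       # x: (B, T, D)
        a = F.softmax(self.assign(x), dim=-1)   # (B, T, K) soft assignments
        # Residuals of each frame feature to each cluster center: (B, T, K, D)
        resid = x.unsqueeze(2) - self.centers.unsqueeze(0).unsqueeze(0)
        vlad = (a.unsqueeze(-1) * resid).sum(dim=1)     # (B, K, D)
        vlad = F.normalize(vlad, dim=-1)                # intra-normalization
        return F.normalize(vlad.flatten(1), dim=-1)     # (B, K*D)

class TemporallyAwarePooling(nn.Module):
    """Split the clip at its center into 'before' and 'after' halves and
    pool each half with its own NetVLAD layer (the NetVLAD++ intuition)."""
    def __init__(self, feat_dim=512, num_clusters=8, num_classes=17):
        super().__init__()
        self.pool_before = NetVLAD(feat_dim, num_clusters)
        self.pool_after = NetVLAD(feat_dim, num_clusters)
        self.head = nn.Linear(2 * num_clusters * feat_dim, num_classes)

    def forward(self, frames):                  # frames: (B, T, D)
        mid = frames.shape[1] // 2
        pooled = torch.cat([self.pool_before(frames[:, :mid]),
                            self.pool_after(frames[:, mid:])], dim=-1)
        return self.head(pooled)

model = TemporallyAwarePooling()
scores = model(torch.randn(2, 30, 512))         # 2 clips of 30 frame features
print(scores.shape)                             # torch.Size([2, 17])
```

Keeping two separate pooling layers lets the model learn different context vocabularies for pre-action and post-action frames, which is what gives the pooling its temporal awareness compared with pooling the whole window at once.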
This list is automatically generated from the titles and abstracts of the papers in this site.