CLIP meets GamePhysics: Towards bug identification in gameplay videos using zero-shot transfer learning
- URL: http://arxiv.org/abs/2203.11096v2
- Date: Tue, 22 Mar 2022 23:37:49 GMT
- Title: CLIP meets GamePhysics: Towards bug identification in gameplay videos using zero-shot transfer learning
- Authors: Mohammad Reza Taesiri, Finlay Macklon, Cor-Paul Bezemer
- Abstract summary: We propose a search method that accepts any English text query as input to retrieve relevant gameplay videos.
Our approach does not rely on any external information (such as video metadata).
An example application of our approach is as a gameplay video search engine to aid in reproducing video game bugs.
- Score: 4.168157981135698
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gameplay videos contain rich information about how players interact with the
game and how the game responds. Sharing gameplay videos on social media
platforms, such as Reddit, has become a common practice for many players.
Often, players will share gameplay videos that showcase video game bugs. Such
gameplay videos are software artifacts that can be utilized for game testing,
as they provide insight for bug analysis. Although large repositories of
gameplay videos exist, parsing and mining them in an effective and structured
fashion remains a major challenge. In this paper, we propose a search
method that accepts any English text query as input to retrieve relevant videos
from large repositories of gameplay videos. Our approach does not rely on any
external information (such as video metadata); it works solely based on the
content of the video. By leveraging the zero-shot transfer capabilities of the
Contrastive Language-Image Pre-Training (CLIP) model, our approach does not
require any data labeling or training. To evaluate our approach, we present the
$\texttt{GamePhysics}$ dataset consisting of 26,954 videos from 1,873 games,
that were collected from the GamePhysics section on the Reddit website. Our
approach shows promising results in our extensive analysis of simple queries,
compound queries, and bug queries, indicating that our approach is useful for
object and event detection in gameplay videos. An example application of our
approach is as a gameplay video search engine to aid in reproducing video game
bugs. Please visit the following link for the code and the data:
https://asgaardlab.github.io/CLIPxGamePhysics/
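As a rough illustration of this retrieval setup, the following is a minimal sketch using the open-source CLIP package, assuming each video has already been decoded into a folder of sampled frames. The frame sampling, the mean-pooling of frame embeddings, and the file layout are simplifying assumptions for illustration, not necessarily the authors' exact pipeline.

import glob
import torch
import clip  # https://github.com/openai/CLIP
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def encode_video(frame_dir):
    # Encode the sampled frames of one video and pool them into a single vector.
    frame_paths = sorted(glob.glob(f"{frame_dir}/*.jpg"))
    images = torch.stack([preprocess(Image.open(p)) for p in frame_paths]).to(device)
    with torch.no_grad():
        feats = model.encode_image(images)
    feats = feats / feats.norm(dim=-1, keepdim=True)
    return feats.mean(dim=0)  # mean pooling is one possible aggregation choice

def rank_videos(query, video_dirs):
    # Rank videos by cosine similarity between the text query and pooled frame embeddings.
    tokens = clip.tokenize([query]).to(device)
    with torch.no_grad():
        text_feat = model.encode_text(tokens)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    scored = [(d, float(text_feat @ encode_video(d))) for d in video_dirs]
    return sorted(scored, key=lambda x: x[1], reverse=True)

# Example bug query against a small set of hypothetical clip folders:
# rank_videos("a horse flying in the air", ["clips/video_001", "clips/video_002"])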
Related papers
- VideoGameBunny: Towards vision assistants for video games [4.652236080354487]
This paper describes the development of VideoGameBunny, a LLaVA-style model based on Bunny, specifically tailored for understanding images from video games.
We release intermediate checkpoints, training logs, and an extensive dataset comprising 185,259 video game images from 413 titles.
Our experiments show that our high quality game-related data has the potential to make a relatively small model outperform the much larger state-of-the-art model LLaVa-1.6-34b.
arXiv Detail & Related papers (2024-07-21T23:31:57Z) - Finding the Needle in a Haystack: Detecting Bug Occurrences in Gameplay
Videos [10.127506928281413]
We present an automated approach that uses machine learning to predict whether a segment of a gameplay video contains a depiction of a bug.
We analyzed 4,412 segments of 198 gameplay videos to predict whether a segment contains an instance of a bug.
Our approach is effective at detecting segments of a video that contain bugs, achieving a high F1 score of 0.88, outperforming the current state-of-the-art technique for bug classification.
arXiv Detail & Related papers (2023-11-18T01:14:18Z) - Harvest Video Foundation Models via Efficient Post-Pretraining [67.30842563833185]
We propose an efficient framework to harvest video foundation models from image ones.
Our method is intuitively simple: we randomly drop input video patches and mask out input text during the post-pretraining procedure.
Our method achieves state-of-the-art performance, comparable to heavily pretrained video foundation models.
arXiv Detail & Related papers (2023-10-30T14:06:16Z) - Using Gameplay Videos for Detecting Issues in Video Games [14.41863992598613]
Streamers may encounter several problems (such as bugs, glitches, or performance issues) while they play.
The identified problems may negatively impact the user's gaming experience and, in turn, can harm the reputation of the game and of the producer.
We propose and empirically evaluate GELID, an approach for automatically extracting relevant information from gameplay videos.
arXiv Detail & Related papers (2023-07-27T10:16:04Z) - TG-VQA: Ternary Game of Video Question Answering [33.180788803602084]
Video question answering aims to answer a question about video content by reasoning over the alignment between visual and linguistic semantics.
In this work, we innovatively resort to game theory, which can simulate complicated relationships among multiple players with specific interaction strategies.
Specifically, we carefully design an interaction strategy tailored to the characteristics of VideoQA, which can mathematically generate fine-grained visual-linguistic alignment labels without label-intensive annotation effort.
arXiv Detail & Related papers (2023-05-17T08:42:53Z) - GOAL: A Challenging Knowledge-grounded Video Captioning Benchmark for
Real-time Soccer Commentary Generation [75.60413443783953]
We present GOAL, a benchmark of over 8.9k soccer video clips, 22k sentences, and 42k knowledge triples, proposing a challenging new task setting: Knowledge-grounded Video Captioning (KGVC).
Our data and code are available at https://github.com/THU-KEG/goal.
arXiv Detail & Related papers (2023-03-26T08:43:36Z) - Subjective and Objective Analysis of Streamed Gaming Videos [60.32100758447269]
We study subjective and objective Video Quality Assessment (VQA) models on gaming videos.
We created a novel gaming video resource, the LIVE-YouTube Gaming video quality (LIVE-YT-Gaming) database, comprising 600 real gaming videos.
We conducted a subjective human study on this data, yielding 18,600 human quality ratings recorded by 61 human subjects.
arXiv Detail & Related papers (2022-03-24T03:02:57Z) - Few-Shot Learning for Video Object Detection in a Transfer-Learning
Scheme [70.45901040613015]
We study the new problem of few-shot learning for video object detection.
We employ a transfer-learning framework to effectively train the video object detector on a large number of base-class objects and a few video clips of novel-class objects.
arXiv Detail & Related papers (2021-03-26T20:37:55Z) - What is More Likely to Happen Next? Video-and-Language Future Event
Prediction [111.93601253692165]
Given a video with aligned dialogue, people can often infer what is more likely to happen next.
In this work, we explore whether AI models are able to learn to make such multimodal commonsense next-event predictions.
We collect a new dataset, named Video-and-Language Event Prediction, with 28,726 future event prediction examples.
arXiv Detail & Related papers (2020-10-15T19:56:47Z) - Enhancing Unsupervised Video Representation Learning by Decoupling the
Scene and the Motion [86.56202610716504]
Action categories are highly correlated with the scenes in which the actions happen, causing models to degrade to a solution where only scene information is encoded.
We propose to decouple the scene and the motion (DSM) with two simple operations, so that the model pays greater attention to motion information.
arXiv Detail & Related papers (2020-09-12T09:54:11Z)