Building a Manga Dataset "Manga109" with Annotations for Multimedia Applications
- URL: http://arxiv.org/abs/2005.04425v2
- Date: Tue, 12 May 2020 14:07:55 GMT
- Title: Building a Manga Dataset "Manga109" with Annotations for Multimedia Applications
- Authors: Kiyoharu Aizawa, Azuma Fujimoto, Atsushi Otsubo, Toru Ogawa, Yusuke
Matsui, Koki Tsubota, Hikaru Ikuta
- Abstract summary: Manga109 is a dataset consisting of 109 Japanese comic books (94 authors and 21,142 pages).
This dataset provides numerous manga images and annotations, which will be beneficial for use in machine learning algorithms.
In this article, we describe the details of the dataset and present a few examples of multimedia processing applications.
- Score: 33.45306086398143
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Manga, or comics, a type of multimodal artwork, have been left
behind in the recent trend of deep learning applications because of the lack of
a proper dataset. Hence, we built Manga109, a dataset consisting of a variety
of 109 Japanese comic books (94 authors and 21,142 pages) and made it publicly
available by obtaining author permissions for academic use. We carefully
annotated the frames, speech texts, character faces, and character bodies; the
total number of annotations exceeds 500k. This dataset provides numerous manga
images and annotations, which will be beneficial for use in machine learning
algorithms and their evaluation. In addition to academic use, we obtained
further permission for a subset of the dataset for industrial use. In this
article, we describe the details of the dataset and present a few examples of
multimedia processing applications (detection, retrieval, and generation) that
apply existing deep learning methods and are made possible by the dataset.
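The abstract describes four annotation types (frames, speech texts, character faces, and character bodies), each given as a bounding box on a page. As a minimal sketch of how such annotations might be consumed, the snippet below parses a small hypothetical XML excerpt written in the spirit of the Manga109 annotation files; the element names, attribute names, and sample values here are assumptions for illustration, not the official schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt in the style of a Manga109 annotation file: each
# page lists frame, text, face, and body annotations as bounding boxes.
# Element and attribute names are assumed for illustration only.
SAMPLE_XML = """
<book title="ExampleTitle">
  <pages>
    <page index="0" width="1654" height="1170">
      <frame id="a1" xmin="10" ymin="20" xmax="800" ymax="1100"/>
      <text id="a2" xmin="50" ymin="60" xmax="200" ymax="160">Hello!</text>
      <face id="a3" xmin="300" ymin="400" xmax="380" ymax="480"/>
      <body id="a4" xmin="280" ymin="380" xmax="420" ymax="700"/>
    </page>
  </pages>
</book>
"""

def count_annotations(xml_string):
    """Count annotation boxes per type across all pages."""
    root = ET.fromstring(xml_string)
    counts = {}
    for page in root.iter("page"):
        for kind in ("frame", "text", "face", "body"):
            counts[kind] = counts.get(kind, 0) + len(page.findall(kind))
    return counts

print(count_annotations(SAMPLE_XML))
# → {'frame': 1, 'text': 1, 'face': 1, 'body': 1}
```

For the real dataset, the same per-type counting over all 109 books is what yields the 500k+ total annotations mentioned above.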
Related papers
- How Panel Layouts Define Manga: Insights from Visual Ablation Experiments [24.408092528259424]
This paper aims to analyze the visual characteristics of manga works, with a particular focus on panel layout features.
As a research method, we used facing page images of manga as input to train a deep learning model for predicting manga titles.
Specifically, we conducted ablation studies by limiting page image information to panel frames to analyze the characteristics of panel layouts.
arXiv Detail & Related papers (2024-12-26T09:53:37Z)
- A Library Perspective on Supervised Text Processing in Digital Libraries: An Investigation in the Biomedical Domain [3.9519587827662397]
We focus on relation extraction and text classification, using the showcase of eight biomedical benchmarks.
We consider trade-offs between accuracy and application costs, dive into training data generation through distant supervision and large language models such as ChatGPT, LLama, and Olmo, and discuss how to design final pipelines.
arXiv Detail & Related papers (2024-11-06T07:54:10Z)
- Unlocking Comics: The AI4VA Dataset for Visual Understanding [62.345344799258804]
This paper presents a novel dataset comprising Franco-Belgian comics from the 1950s annotated for tasks including depth estimation, semantic segmentation, saliency detection, and character identification.
It consists of two distinct and consistent styles and incorporates object concepts and labels taken from natural images.
By including such diverse information across styles, this dataset not only holds promise for computational creativity but also offers avenues for the digitization of art and storytelling innovation.
arXiv Detail & Related papers (2024-10-27T14:27:05Z)
- The Manga Whisperer: Automatically Generating Transcriptions for Comics [55.544015596503726]
We present a unified model, Magi, that is able to detect panels, text boxes and character boxes.
We propose a novel approach that is able to sort the detected text boxes in their reading order and generate a dialogue transcript.
arXiv Detail & Related papers (2024-01-18T18:59:09Z)
- Manga109Dialog: A Large-scale Dialogue Dataset for Comics Speaker Detection [37.083051419659135]
Manga109Dialog is the world's largest comics speaker annotation dataset, containing 132,692 speaker-to-text pairs.
Unlike existing methods mainly based on distances, we propose a deep learning-based method using scene graph generation models.
Experimental results demonstrate that our scene-graph-based approach outperforms existing methods, achieving a prediction accuracy of over 75%.
arXiv Detail & Related papers (2023-06-30T08:34:08Z)
- DifferSketching: How Differently Do People Sketch 3D Objects? [78.44544977215918]
Multiple sketch datasets have been proposed to understand how people draw 3D objects.
These datasets are often of small scale and cover a small set of objects or categories.
We analyze the collected data at three levels, i.e., sketch-level, stroke-level, and pixel-level, under both spatial and temporal characteristics.
arXiv Detail & Related papers (2022-09-19T06:52:18Z)
- I Know What You Draw: Learning Grasp Detection Conditioned on a Few Freehand Sketches [74.63313641583602]
We propose a method to generate a potential grasp configuration relevant to the sketch-depicted objects.
Our model is trained and tested end to end, which makes it easy to implement in real-world applications.
arXiv Detail & Related papers (2022-05-09T04:23:36Z)
- Multi-Class Zero-Shot Learning for Artistic Material Recognition [68.8204255655161]
Zero-Shot Learning (ZSL) is an extreme form of transfer learning, where no labelled examples of the data to be classified are provided during the training stage.
Here we outline a model to identify the materials with which a work of art was created, by learning the relationship between English descriptions of the subject of a piece and its composite materials.
We produce a model which is capable of correctly identifying the materials used on pieces from an entirely distinct museum dataset.
arXiv Detail & Related papers (2020-10-26T19:04:50Z)
- KaoKore: A Pre-modern Japanese Art Facial Expression Dataset [8.987910033541239]
We propose a new dataset, KaoKore, which consists of faces extracted from pre-modern Japanese artwork.
We demonstrate its value as both a dataset for image classification as well as a creative and artistic dataset, which we explore using generative models.
arXiv Detail & Related papers (2020-02-20T07:22:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.