Manga109Dialog: A Large-scale Dialogue Dataset for Comics Speaker Detection
- URL: http://arxiv.org/abs/2306.17469v2
- Date: Mon, 22 Apr 2024 10:40:50 GMT
- Title: Manga109Dialog: A Large-scale Dialogue Dataset for Comics Speaker Detection
- Authors: Yingxuan Li, Kiyoharu Aizawa, Yusuke Matsui
- Abstract summary: Manga109Dialog is the world's largest comics speaker annotation dataset, containing 132,692 speaker-to-text pairs.
Unlike existing methods, which are mainly based on distances, we propose a deep learning-based method using scene graph generation models.
Experimental results demonstrate that our scene-graph-based approach outperforms existing methods, achieving a prediction accuracy of over 75%.
- Score: 37.083051419659135
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The expanding market for e-comics has spurred interest in developing automated methods to analyze comics. For a deeper understanding of comics, an automated approach is needed to link text in comics to the characters speaking the words. Comics speaker detection research has practical applications, such as automatic character assignment for audiobooks, automatic translation that reflects characters' personalities, and inference of character relationships and stories. To address the problem of insufficient speaker-to-text annotations, we created a new annotation dataset, Manga109Dialog, based on Manga109. Manga109Dialog is the world's largest comics speaker annotation dataset, containing 132,692 speaker-to-text pairs. We further divided our dataset into levels of prediction difficulty to evaluate speaker detection methods more appropriately. Unlike existing methods, which are mainly based on distances, we propose a deep learning-based method using scene graph generation models. Because of the unique features of comics, we enhance the performance of our proposed model by considering the frame reading order. We conducted experiments using Manga109Dialog and other datasets. Experimental results demonstrate that our scene-graph-based approach outperforms existing methods, achieving a prediction accuracy of over 75%.
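The distance-based baseline that the abstract contrasts against can be illustrated with a minimal sketch: assign each text balloon to the nearest detected character by bounding-box center distance. Everything below (the box format, the function names, and the toy data) is an illustrative assumption, not the paper's actual pipeline or the Manga109 annotation format.

```python
import math

def center(box):
    """Center point of an (x1, y1, x2, y2) bounding box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def nearest_character(text_box, character_boxes):
    """Index of the character box whose center is closest to the text box's center."""
    tx, ty = center(text_box)
    return min(
        range(len(character_boxes)),
        key=lambda i: math.dist((tx, ty), center(character_boxes[i])),
    )

# Toy example: two characters on a page, one speech balloon near the second.
characters = [(0, 0, 50, 100), (200, 0, 250, 100)]
balloon = (180, 10, 220, 40)
print(nearest_character(balloon, characters))  # prints 1 (the closer character)
```

Heuristics like this ignore panel boundaries and reading order, which is precisely the failure mode the scene-graph-based method with frame reading order is meant to address.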
Related papers
- Identifying Speakers in Dialogue Transcripts: A Text-based Approach Using Pretrained Language Models [83.7506131809624]
We introduce an approach to identifying speaker names in dialogue transcripts, a crucial task for enhancing content accessibility and searchability in digital media archives.
We present a novel, large-scale dataset derived from the MediaSum corpus, encompassing transcripts from a wide range of media sources.
We propose novel transformer-based models tailored for SpeakerID, leveraging contextual cues within dialogues to accurately attribute speaker names.
arXiv Detail & Related papers (2024-07-16T18:03:58Z)
- Comics Datasets Framework: Mix of Comics datasets for detection benchmarking [11.457653763760792]
Comics as a medium uniquely combine text and images in styles often distinct from real-world visuals.
Computational research on comics has evolved from basic object detection to more sophisticated tasks.
We aim to standardize annotations across datasets, introduce a variety of comic styles into the datasets, and establish benchmark results with clear, replicable settings.
arXiv Detail & Related papers (2024-07-03T23:07:57Z)
- Zero-Shot Character Identification and Speaker Prediction in Comics via Iterative Multimodal Fusion [35.25298023240529]
We propose a novel zero-shot approach to identify characters and predict speaker names based solely on unannotated comic images.
Our method requires no training data or annotations, so it can be applied as-is to any comic series.
arXiv Detail & Related papers (2024-04-22T08:59:35Z)
- The Manga Whisperer: Automatically Generating Transcriptions for Comics [55.544015596503726]
We present a unified model, Magi, that is able to detect panels, text boxes and character boxes.
We propose a novel approach that is able to sort the detected text boxes in their reading order and generate a dialogue transcript.
arXiv Detail & Related papers (2024-01-18T18:59:09Z)
- Multi-turn Dialogue Comprehension from a Topic-aware Perspective [70.37126956655985]
This paper proposes to model multi-turn dialogues from a topic-aware perspective.
We use a dialogue segmentation algorithm to split a dialogue passage into topic-concentrated fragments in an unsupervised way.
We also present a novel model, Topic-Aware Dual-Attention Matching (TADAM) Network, which takes topic segments as processing elements.
arXiv Detail & Related papers (2023-09-18T11:03:55Z)
- A Benchmark for Understanding and Generating Dialogue between Characters in Stories [75.29466820496913]
We present the first study to explore whether machines can understand and generate dialogue in stories.
We propose two new tasks including Masked Dialogue Generation and Dialogue Speaker Recognition.
We show the difficulty of the proposed tasks by testing existing models with automatic and manual evaluation on DialStory.
arXiv Detail & Related papers (2022-09-18T10:19:04Z)
- Automatic Comic Generation with Stylistic Multi-page Layouts and Emotion-driven Text Balloon Generation [57.10363557465713]
We propose a fully automatic system for generating comic books from videos without any human intervention.
Given an input video along with its subtitles, our approach first extracts informative frames by analyzing the subtitles.
Then, we propose a novel automatic multi-page framework layout, which can allocate the images across multiple pages.
arXiv Detail & Related papers (2021-01-26T22:15:15Z)
- Unconstrained Text Detection in Manga: a New Dataset and Baseline [3.04585143845864]
This work aims to binarize text in a comic genre with highly sophisticated text styles: Japanese manga.
To overcome the lack of a manga dataset with text annotations at a pixel level, we create our own.
Using these resources, we designed and evaluated a deep network model, outperforming current methods for text binarization in manga in most metrics.
arXiv Detail & Related papers (2020-09-09T00:16:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.