The Manga Whisperer: Automatically Generating Transcriptions for Comics
- URL: http://arxiv.org/abs/2401.10224v3
- Date: Thu, 1 Aug 2024 05:18:48 GMT
- Title: The Manga Whisperer: Automatically Generating Transcriptions for Comics
- Authors: Ragav Sachdeva, Andrew Zisserman
- Abstract summary: We present a unified model, Magi, that is able to detect panels, text boxes and character boxes.
We propose a novel approach that is able to sort the detected text boxes in their reading order and generate a dialogue transcript.
- Score: 55.544015596503726
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the past few decades, Japanese comics, commonly referred to as Manga, have transcended both cultural and linguistic boundaries to become a true worldwide sensation. Yet, the inherent reliance on visual cues and illustration within manga renders it largely inaccessible to individuals with visual impairments. In this work, we seek to address this substantial barrier, with the aim of ensuring that manga can be appreciated and actively engaged with by everyone. Specifically, we tackle the problem of diarisation, i.e., generating a transcription of who said what and when, in a fully automatic way. To this end, we make the following contributions: (1) we present a unified model, Magi, that is able to (a) detect panels, text boxes and character boxes, (b) cluster characters by identity (without knowing the number of clusters a priori), and (c) associate dialogues to their speakers; (2) we propose a novel approach that is able to sort the detected text boxes in their reading order and generate a dialogue transcript; (3) we annotate an evaluation benchmark for this task using publicly available [English] manga pages. The code, evaluation datasets and the pre-trained model can be found at: https://github.com/ragavsachdeva/magi.
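The abstract names two sub-problems that are easy to illustrate in isolation: sorting detected text boxes into manga reading order, and clustering characters when the number of identities is not known a priori. The Python sketch below is a minimal illustration of those two ideas only, not Magi's actual method (Magi is a unified model; see the linked repository for the real code). The (x1, y1, x2, y2) box format, the single-panel scope, and the off-the-shelf character embeddings are all assumptions.

```python
# Illustrative only -- NOT Magi's implementation.
# Assumes boxes are (x1, y1, x2, y2) tuples and that character crops
# have already been embedded into feature vectors.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def naive_reading_order(text_boxes):
    """Approximate manga reading order within one panel: top-to-bottom,
    and right-to-left among boxes at a similar height."""
    def key(box):
        x1, y1, x2, y2 = box
        return (y1, -(x1 + x2) / 2)  # top edge first, then rightmost first
    return sorted(text_boxes, key=key)

def cluster_characters(embeddings, distance_threshold=0.5):
    """Cluster character embeddings WITHOUT fixing the number of clusters:
    cut an agglomerative dendrogram at a distance threshold instead of
    choosing k up front."""
    Z = linkage(embeddings, method="average", metric="cosine")
    return fcluster(Z, t=distance_threshold, criterion="distance")

# Two near-identical embeddings merge; the third stays its own identity.
emb = np.array([[1.0, 0.0], [0.99, 0.01], [0.0, 1.0]])
print(cluster_characters(emb, 0.3))  # e.g. [1 1 2]
```

A full pipeline would order panels first and then order text boxes within each panel; the distance threshold here stands in for whatever similarity criterion a trained model provides.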
Related papers
- Tails Tell Tales: Chapter-Wide Manga Transcriptions with Character Names [53.24414727354768]
This paper aims to generate a dialogue transcript of a complete manga chapter, entirely automatically.
It involves identifying (i) what is being said, i.e., detecting the texts on each page and classifying them as essential vs. non-essential.
It also ensures the same characters are named consistently throughout the chapter.
arXiv Detail & Related papers (2024-08-01T05:47:04Z)
- M2C: Towards Automatic Multimodal Manga Complement [40.01354682367365]
Multimodal manga analysis focuses on enhancing manga understanding with visual and textual features.
Currently, most comics are hand-drawn and prone to problems such as missing pages, text contamination, and aging.
We first propose the Multimodal Manga Complement task by establishing a new M2C benchmark dataset covering two languages.
arXiv Detail & Related papers (2023-10-26T04:10:16Z)
- Dense Multitask Learning to Reconfigure Comics [63.367664789203936]
We develop a MultiTask Learning (MTL) model to achieve dense predictions for comics panels (the generic shared-encoder pattern is sketched after this entry).
Our method can successfully identify the semantic units as well as the notion of 3D in comic panels.
arXiv Detail & Related papers (2023-07-16T15:10:34Z)
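For readers unfamiliar with the pattern, dense multitask models typically share one encoder across several per-pixel prediction heads. The PyTorch sketch below shows only that generic pattern; the paper's actual architecture, task set, and losses are not given here, and the layer sizes and task names are assumptions.

```python
# Generic dense-MTL pattern: shared encoder, one per-pixel head per task.
# A schematic under assumed task names -- not the paper's architecture.
import torch
import torch.nn as nn

class DenseMTL(nn.Module):
    def __init__(self, task_channels):
        super().__init__()
        self.encoder = nn.Sequential(              # shared features
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.heads = nn.ModuleDict({               # one 1x1 conv head per task
            name: nn.Conv2d(64, ch, 1) for name, ch in task_channels.items()
        })

    def forward(self, x):
        feats = self.encoder(x)
        return {name: head(feats) for name, head in self.heads.items()}

# Example: per-pixel outputs for a batch of comic panels.
model = DenseMTL({"segmentation": 5, "depth": 1})
outputs = model(torch.randn(2, 3, 128, 128))
```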
- Manga109Dialog: A Large-scale Dialogue Dataset for Comics Speaker Detection [37.083051419659135]
Manga109Dialog is the world's largest comics speaker annotation dataset, containing 132,692 speaker-to-text pairs.
Unlike existing methods, which are mainly based on distances (a minimal distance baseline is sketched after this entry), we propose a deep learning-based method using scene graph generation models.
Experimental results demonstrate that our scene-graph-based approach outperforms existing methods, achieving a prediction accuracy of over 75%.
arXiv Detail & Related papers (2023-06-30T08:34:08Z)
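The distance-based baselines mentioned above reduce to a few lines: attach each text box to the character whose box center is nearest. A hypothetical sketch (the box format and function names are assumptions, not from the paper):

```python
# Naive distance baseline for speaker detection -- the kind of method
# the scene-graph approach above is compared against.
# Boxes are assumed to be (x1, y1, x2, y2) tuples.

def center(box):
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def assign_speakers(text_boxes, character_boxes):
    """Return, for each text box, the index of the nearest character box."""
    assignments = []
    for t in text_boxes:
        tx, ty = center(t)
        d2 = [(tx - cx) ** 2 + (ty - cy) ** 2
              for cx, cy in map(center, character_boxes)]
        assignments.append(min(range(len(d2)), key=d2.__getitem__))
    return assignments

texts = [(10, 10, 60, 30)]
chars = [(0, 40, 30, 90), (200, 10, 240, 60)]
print(assign_speakers(texts, chars))  # [0]: the nearer character is chosen
```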
- Talk-to-Edit: Fine-Grained Facial Editing via Dialog [79.8726256912376]
Talk-to-Edit is an interactive facial editing framework that performs fine-grained attribute manipulation through dialog between the user and the system.
Our key insight is to model a continual "semantic field" in the GAN latent space.
Our system generates language feedback by considering both the user request and the current state of the semantic field.
arXiv Detail & Related papers (2021-09-09T17:17:59Z)
- AniGAN: Style-Guided Generative Adversarial Networks for Unsupervised Anime Face Generation [84.52819242283852]
We propose a novel framework to translate a portrait photo-face into an anime appearance.
Our aim is to synthesize anime-faces which are style-consistent with a given reference anime-face.
Existing methods often fail to transfer the styles of reference anime-faces, or introduce noticeable artifacts/distortions in the local shapes of their generated faces.
arXiv Detail & Related papers (2021-02-24T22:47:38Z)
- Towards Fully Automated Manga Translation [8.45043706496877]
We tackle the problem of machine translation of manga, Japanese comics.
Obtaining context from the image is essential for manga translation.
First, we propose a multimodal context-aware translation framework.
Second, to train the model, we propose an approach for automatic corpus construction from pairs of original manga and their translations.
Third, we created a new benchmark to evaluate manga translation.
arXiv Detail & Related papers (2020-12-28T15:20:52Z)
- Unconstrained Text Detection in Manga: a New Dataset and Baseline [3.04585143845864]
This work aims to binarize text in a comic genre with highly sophisticated text styles: Japanese manga.
To overcome the lack of a manga dataset with text annotations at a pixel level, we create our own.
Using these resources, we designed and evaluated a deep network model, outperforming current methods for text binarization in manga in most metrics.
arXiv Detail & Related papers (2020-09-09T00:16:51Z)