How Panel Layouts Define Manga: Insights from Visual Ablation Experiments
- URL: http://arxiv.org/abs/2412.19141v1
- Date: Thu, 26 Dec 2024 09:53:37 GMT
- Title: How Panel Layouts Define Manga: Insights from Visual Ablation Experiments
- Authors: Siyuan Feng, Teruya Yoshinaga, Katsuhiko Hayashi, Koki Washio, Hidetaka Kamigaito
- Abstract summary: This paper aims to analyze the visual characteristics of manga works, with a particular focus on panel layout features.
As a research method, we used facing-page images of manga as input to train a deep learning model to predict manga titles.
Specifically, we conducted ablation studies that limit page-image information to the panel frames, isolating the characteristics of panel layouts.
- Score: 24.408092528259424
- Abstract: Today, manga has gained worldwide popularity. However, the question of how various elements of manga, such as characters, text, and panel layouts, reflect the uniqueness of a particular work, or even define it, remains unexplored. In this paper, we quantitatively and qualitatively analyze the visual characteristics of manga works, with a particular focus on panel-layout features. As a research method, we used facing-page images of manga as input to train a deep learning model to predict manga titles, examining classification accuracy to quantitatively analyze these features. Specifically, we conducted ablation studies that limit page-image information to the panel frames in order to isolate the characteristics of panel layouts. Through a series of quantitative experiments using all 104 works, 12 genres, and 10,122 facing-page images from the Manga109 dataset, as well as qualitative analysis using Grad-CAM, our study demonstrates that the uniqueness of manga works is strongly reflected in their panel layouts.
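As a rough illustration of the method described in the abstract, the sketch below fine-tunes an ImageNet-pretrained CNN to predict the manga title from a page image and computes a Grad-CAM heatmap for qualitative analysis. This is not the authors' code: the backbone (ResNet-50), the input size, and the training details are assumptions, since the abstract does not specify them.

```python
# Minimal sketch (not the authors' code): 104-way title classification from
# page images, plus a standard Grad-CAM heatmap. Backbone and input size are
# illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import models, transforms

NUM_TITLES = 104  # all works in Manga109

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, NUM_TITLES)  # replace classifier head

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),  # facing pages are wide; square resize is a simplification
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def gradcam(model: nn.Module, x: torch.Tensor, target_class: int) -> torch.Tensor:
    """Grad-CAM over the last conv stage (the standard recipe, not paper-specific)."""
    feats, grads = {}, {}
    h1 = model.layer4.register_forward_hook(
        lambda m, inp, out: feats.update(a=out))
    h2 = model.layer4.register_full_backward_hook(
        lambda m, gin, gout: grads.update(a=gout[0]))
    logits = model(x)                     # x: (1, 3, 224, 224)
    model.zero_grad()
    logits[0, target_class].backward()
    h1.remove(); h2.remove()
    w = grads["a"].mean(dim=(2, 3), keepdim=True)   # channel importance weights
    cam = torch.relu((w * feats["a"]).sum(dim=1))   # weighted feature sum + ReLU
    return cam / (cam.max() + 1e-8)                 # normalize to [0, 1]
```

Training would minimize cross-entropy over title labels; the panel-frame ablation would feed the same network page images in which everything except the panel frames has been blanked out.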
Related papers
- Large-image Object Detection for Fine-grained Recognition of Punches Patterns in Medieval Panel Painting [6.762125776126245]
We train a machine learning pipeline to perform object detection on the punches contained in the images.
Our results indicate how art historians working in the field can reliably use our method for the identification and extraction of punches.
arXiv Detail & Related papers (2025-01-21T20:30:51Z)
- Manga Generation via Layout-controllable Diffusion [21.080054070512023]
This paper presents the manga generation task and constructs the Manga109Story dataset for studying manga generation solely from plain text.
We propose MangaDiffusion to facilitate the intra-panel and inter-panel information interaction during the manga generation process.
arXiv Detail & Related papers (2024-12-26T17:52:19Z)
- MangaUB: A Manga Understanding Benchmark for Large Multimodal Models [25.63892470012361]
Manga is a popular medium that combines stylized drawings and text to convey stories.
Recently, the adaptability of modern large multimodal models (LMMs) has opened up possibilities for more general approaches.
MangaUB is designed to assess the recognition and understanding of content shown in a single panel as well as conveyed across multiple panels.
arXiv Detail & Related papers (2024-07-26T18:21:30Z)
- Semantic-Based Active Perception for Humanoid Visual Tasks with Foveal Sensors [49.99728312519117]
The aim of this work is to establish how accurately a recent semantic-based active perception model is able to complete visual tasks that are regularly performed by humans.
This model exploits the ability of current object detectors to localize and classify a large number of object classes and to update a semantic description of a scene across multiple fixations.
In the task of scene exploration, the semantic-based method demonstrates superior performance compared to the traditional saliency-based model.
arXiv Detail & Related papers (2024-04-16T18:15:57Z)
- The Manga Whisperer: Automatically Generating Transcriptions for Comics [55.544015596503726]
We present a unified model, Magi, that is able to detect panels, text boxes and character boxes.
We propose a novel approach that sorts the detected text boxes into reading order and generates a dialogue transcript.
arXiv Detail & Related papers (2024-01-18T18:59:09Z)
- Scenimefy: Learning to Craft Anime Scene via Semi-Supervised Image-to-Image Translation [75.91455714614966]
We propose Scenimefy, a novel semi-supervised image-to-image translation framework.
Our approach guides the learning with structure-consistent pseudo paired data.
A patch-wise contrastive style loss is introduced to improve stylization and fine details.
arXiv Detail & Related papers (2023-08-24T17:59:50Z)
- DisPositioNet: Disentangled Pose and Identity in Semantic Image Manipulation [83.51882381294357]
DisPositioNet is a model that learns a disentangled representation for each object for the task of image manipulation using scene graphs.
Our framework enables the disentanglement of the variational latent embeddings as well as the feature representation in the graph.
arXiv Detail & Related papers (2022-11-10T11:47:37Z)
- A domain adaptive deep learning solution for scanpath prediction of paintings [66.46953851227454]
This paper focuses on the eye-movement analysis of viewers during the visual experience of a certain number of paintings.
We introduce a new approach to predicting human visual attention, a process that underpins several human cognitive functions.
The proposed new architecture ingests images and returns scanpaths, a sequence of points featuring a high likelihood of catching viewers' attention.
arXiv Detail & Related papers (2022-09-22T22:27:08Z)
- Unsupervised Manga Character Re-identification via Face-body and Spatial-temporal Associated Clustering [21.696847342192072]
The artistic expression and stylistic limitations of manga pose many challenges for character re-identification.
Inspired by the idea that some content-related features may help clustering, we propose a Face-body and Spatial-temporal Associated Clustering method.
In the face-body combination module, a face-body graph is constructed to solve problems such as exaggeration and deformation in artistic creation.
In the spatial-temporal relationship correction module, we analyze the appearance features of characters and design a temporal-spatial-related triplet loss to fine-tune the clustering.
arXiv Detail & Related papers (2022-04-10T07:28:41Z)
- Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization [81.26077816854449]
We first explore the use of constituency parse trees for encoding structured input.
Second, we augment the structured input with commonsense information and study the impact of this external knowledge on the generation of visual stories.
Third, we incorporate visual structure via bounding boxes and dense captioning to provide feedback about the characters/objects in generated images.
arXiv Detail & Related papers (2021-10-21T00:16:02Z)
- Building a Manga Dataset "Manga109" with Annotations for Multimedia Applications [33.45306086398143]
Manga109 is a dataset consisting of 109 Japanese comic books (94 authors and 21,142 pages).
This dataset provides numerous manga images and annotations that are useful for machine learning research.
In this article, we describe the details of the dataset and present a few examples of multimedia processing applications; a minimal annotation-loading sketch follows this list.
arXiv Detail & Related papers (2020-05-09T12:26:58Z)
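To make the panel-frame ablation concrete, here is a hedged sketch that reads a Manga109 per-book annotation XML and renders a page containing only its panel-frame rectangles. It assumes the published Manga109 annotation schema (page elements with width/height attributes and frame children carrying xmin/ymin/xmax/ymax coordinates); the file path in the usage comment is a placeholder.

```python
# Hedged sketch: render a Manga109 page as panel-frame outlines only,
# assuming the published Manga109 XML annotation schema. Paths are placeholders.
import xml.etree.ElementTree as ET
from PIL import Image, ImageDraw

def frames_only_page(xml_path: str, page_index: int) -> Image.Image:
    root = ET.parse(xml_path).getroot()
    page = root.find(f".//page[@index='{page_index}']")
    w, h = int(page.get("width")), int(page.get("height"))
    canvas = Image.new("L", (w, h), color=255)   # blank white page
    draw = ImageDraw.Draw(canvas)
    for frame in page.findall("frame"):
        box = [int(frame.get(k)) for k in ("xmin", "ymin", "xmax", "ymax")]
        draw.rectangle(box, outline=0, width=3)  # black panel-frame outline
    return canvas

# e.g. frames_only_page("Manga109/annotations/ARMS.xml", 6).save("frames.png")
```

Feeding such frame-only renderings to a title classifier like the one sketched earlier is the spirit of the ablation reported in the main paper.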
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences arising from its use.