Comics Datasets Framework: Mix of Comics datasets for detection benchmarking
- URL: http://arxiv.org/abs/2407.03540v1
- Date: Wed, 3 Jul 2024 23:07:57 GMT
- Title: Comics Datasets Framework: Mix of Comics datasets for detection benchmarking
- Authors: Emanuele Vivoli, Irene Campaioli, Mariateresa Nardoni, Niccolò Biondi, Marco Bertini, Dimosthenis Karatzas,
- Abstract summary: Comics as a medium uniquely combine text and images in styles often distinct from real-world visuals.
computational research on comics has evolved from basic object detection to more sophisticated tasks.
We aim to standardize annotations across datasets, introduce a variety of comic styles into the datasets, and establish benchmark results with clear, replicable settings.
- Score: 11.457653763760792
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Comics, as a medium, uniquely combine text and images in styles often distinct from real-world visuals. For the past three decades, computational research on comics has evolved from basic object detection to more sophisticated tasks. However, the field faces persistent challenges such as small datasets, inconsistent annotations, inaccessible model weights, and results that cannot be directly compared due to varying train/test splits and metrics. To address these issues, we aim to standardize annotations across datasets, introduce a variety of comic styles into the datasets, and establish benchmark results with clear, replicable settings. Our proposed Comics Datasets Framework standardizes dataset annotations into a common format and addresses the overrepresentation of manga by introducing Comics100, a curated collection of 100 books from the Digital Comics Museum, annotated for detection in our uniform format. We have benchmarked a variety of detection architectures using the Comics Datasets Framework. All related code, model weights, and detailed evaluation processes are available at https://github.com/emanuelevivoli/cdf, ensuring transparency and facilitating replication. This initiative is a significant advancement towards improving object detection in comics, laying the groundwork for more complex computational tasks dependent on precise object recognition.
Related papers
- CoMix: A Comprehensive Benchmark for Multi-Task Comic Understanding [14.22900011952181]
We introduce a novel benchmark, CoMix, designed to evaluate the multi-task capabilities of models in comic analysis.
Our benchmark comprises three existing datasets with expanded annotations to support multi-task evaluation.
To mitigate the over-representation of manga-style data, we have incorporated a new dataset of carefully selected American comic-style books.
arXiv Detail & Related papers (2024-07-04T00:07:50Z) - Multimodal Transformer for Comics Text-Cloze [8.616858272810084]
Text-cloze refers to the task of selecting the correct text to use in a comic panel, given its neighboring panels.
Traditional methods based on recurrent neural networks have struggled with this task due to limited OCR accuracy and inherent model limitations.
We introduce a novel Multimodal Large Language Model (Multimodal-LLM) architecture, specifically designed for Text-cloze, achieving a 10% improvement over existing state-of-the-art models in both its easy and hard variants.
arXiv Detail & Related papers (2024-03-06T14:11:45Z) - Dense Multitask Learning to Reconfigure Comics [63.367664789203936]
We develop a MultiTask Learning (MTL) model to achieve dense predictions for comics panels.
Our method can successfully identify the semantic units as well as the notion of 3D in comic panels.
arXiv Detail & Related papers (2023-07-16T15:10:34Z) - Manga109Dialog: A Large-scale Dialogue Dataset for Comics Speaker Detection [37.083051419659135]
Manga109Dialog is the world's largest comics speaker annotation dataset, containing 132,692 speaker-to-text pairs.
Unlike existing methods mainly based on distances, we propose a deep learning-based method using scene graph generation models.
Experimental results demonstrate that our scene-graph-based approach outperforms existing methods, achieving a prediction accuracy of over 75%.
arXiv Detail & Related papers (2023-06-30T08:34:08Z) - Modeling Entities as Semantic Points for Visual Information Extraction
in the Wild [55.91783742370978]
We propose an alternative approach to precisely and robustly extract key information from document images.
We explicitly model entities as semantic points, i.e., center points of entities are enriched with semantic information describing the attributes and relationships of different entities.
The proposed method can achieve significantly enhanced performance on entity labeling and linking, compared with previous state-of-the-art models.
arXiv Detail & Related papers (2023-03-23T08:21:16Z) - Learning 3D Human Pose Estimation from Dozens of Datasets using a
Geometry-Aware Autoencoder to Bridge Between Skeleton Formats [80.12253291709673]
We propose a novel affine-combining autoencoder (ACAE) method to perform dimensionality reduction on the number of landmarks.
Our approach scales to an extreme multi-dataset regime, where we use 28 3D human pose datasets to supervise one model.
arXiv Detail & Related papers (2022-12-29T22:22:49Z) - A Comprehensive Gold Standard and Benchmark for Comics Text Detection
and Recognition [2.1485350418225244]
This study focuses on improving the optical character recognition (OCR) data for panels in the COMICS dataset.
We created the first text detection and recognition datasets for western comics, called "COMICS Text+: Detection" and "COMICS Text+: Recognition"
We evaluated the performance of state-of-the-art text detection and recognition models on these datasets and found significant improvement in word accuracy and normalized edit distance compared to the text in COMICS.
arXiv Detail & Related papers (2022-12-27T12:05:23Z) - MetaGraspNet: A Large-Scale Benchmark Dataset for Scene-Aware
Ambidextrous Bin Picking via Physics-based Metaverse Synthesis [72.85526892440251]
We introduce MetaGraspNet, a large-scale photo-realistic bin picking dataset constructed via physics-based metaverse synthesis.
The proposed dataset contains 217k RGBD images across 82 different article types, with full annotations for object detection, amodal perception, keypoint detection, manipulation order and ambidextrous grasp labels for a parallel-jaw and vacuum gripper.
We also provide a real dataset consisting of over 2.3k fully annotated high-quality RGBD images, divided into 5 levels of difficulties and an unseen object set to evaluate different object and layout properties.
arXiv Detail & Related papers (2022-08-08T08:15:34Z) - Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based
Action Recognition [88.34182299496074]
Action labels are only available on a source dataset, but unavailable on a target dataset in the training stage.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z) - ClevrTex: A Texture-Rich Benchmark for Unsupervised Multi-Object
Segmentation [23.767094632640763]
We present ClevrTex, designed as the next challenge to compare, evaluate and analyze algorithms.
ClarTex features synthetic scenes with diverse shapes, textures and photo-mapped materials, created using physically based rendering techniques.
We benchmark a large set of recent unsupervised multi-object segmentation models on ClevrTex and find all state-of-the-art approaches fail to learn good representations in the textured setting.
arXiv Detail & Related papers (2021-11-19T15:11:25Z) - Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.