Related papers: Unlocking Comics: The AI4VA Dataset for Visual Understanding

Unlocking Comics: The AI4VA Dataset for Visual Understanding

URL: http://arxiv.org/abs/2410.20459v1
Date: Sun, 27 Oct 2024 14:27:05 GMT
Title: Unlocking Comics: The AI4VA Dataset for Visual Understanding
Authors: Peter Grönquist, Deblina Bhattacharjee, Bahar Aydemir, Baran Ozaydin, Tong Zhang, Mathieu Salzmann, Sabine Süsstrunk,
Abstract summary: This paper presents a novel dataset comprising Franco-Belgian comics from the 1950s annotated for tasks including depth estimation, semantic segmentation, saliency detection, and character identification. It consists of two distinct and consistent styles and incorporates object concepts and labels taken from natural images. By including such diverse information across styles, this dataset not only holds promise for computational creativity but also offers avenues for the digitization of art and storytelling innovation.
Score: 62.345344799258804
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: In the evolving landscape of deep learning, there is a pressing need for more comprehensive datasets capable of training models across multiple modalities. Concurrently, in digital humanities, there is a growing demand to leverage technology for diverse media adaptation and creation, yet limited by sparse datasets due to copyright and stylistic constraints. Addressing this gap, our paper presents a novel dataset comprising Franco-Belgian comics from the 1950s annotated for tasks including depth estimation, semantic segmentation, saliency detection, and character identification. It consists of two distinct and consistent styles and incorporates object concepts and labels taken from natural images. By including such diverse information across styles, this dataset not only holds promise for computational creativity but also offers avenues for the digitization of art and storytelling innovation. This dataset is a crucial component of the AI4VA Workshop Challenges~\url{https://sites.google.com/view/ai4vaeccv2024}, where we specifically explore depth and saliency. Dataset details at \url{https://github.com/IVRL/AI4VA}.

Related papers

Bridging Cognitive Gap: Hierarchical Description Learning for Artistic Image Aesthetics Assessment [51.40989269202702]
aesthetic quality assessment task is crucial for developing a human-aligned quantitative evaluation system for AIGC.<n>We propose ArtQuant, an aesthetics assessment framework for artistic images which couples isolated aesthetic dimensions through description generation.<n>Our approach achieves epoch state-of-the-art performance on several datasets while requiring only 33% of conventional trainings.
arXiv Detail & Related papers (2025-12-29T12:18:26Z)
Few-shot multi-token DreamBooth with LoRa for style-consistent character generation [3.4405653742416145]
The audiovisual industry is undergoing a profound transformation as it is integrating AI developments to inspire new forms of art.<n>This paper addresses the problem of producing a virtually unlimited number of novel characters that preserve the artistic style and shared visual traits of a small set of human-designed reference characters.<n>We propose a multi-token strategy, using clustering to assign separate tokens to individual characters and their collective style, combined with LoRA-based parameter-efficient fine-tuning.
arXiv Detail & Related papers (2025-10-10T15:28:55Z)
LoRaLay: A Multilingual and Multimodal Dataset for Long Range and Layout-Aware Summarization [19.301567079372436]
Text Summarization is a popular task and an active area of research for the Natural Language Processing community. All publicly available summarization datasets only provide plain text content. We present LoRaLay, a collection of datasets for long-range summarization with accompanying visual/Lay information.
arXiv Detail & Related papers (2023-01-26T18:50:54Z)
HandsOff: Labeled Dataset Generation With No Additional Human Annotations [13.11411442720668]
We introduce the HandsOff framework, a technique capable of producing an unlimited number of synthetic images and corresponding labels. Our framework avoids the practical drawbacks of prior work by unifying the field of GAN inversion with dataset generation. We generate datasets with rich pixel-wise labels in multiple challenging domains such as faces, cars, full-body human poses, and urban driving scenes.
arXiv Detail & Related papers (2022-12-24T03:37:02Z)
Unsupervised Neural Stylistic Text Generation using Transfer learning and Adapters [66.17039929803933]
We propose a novel transfer learning framework which updates only $0.3%$ of model parameters to learn style specific attributes for response generation. We learn style specific attributes from the PERSONALITY-CAPTIONS dataset.
arXiv Detail & Related papers (2022-10-07T00:09:22Z)
Generating More Pertinent Captions by Leveraging Semantics and Style on Multi-Source Datasets [56.018551958004814]
This paper addresses the task of generating fluent descriptions by training on a non-uniform combination of data sources. Large-scale datasets with noisy image-text pairs provide a sub-optimal source of supervision. We propose to leverage and separate semantics and descriptive style through the incorporation of a style token and keywords extracted through a retrieval component.
arXiv Detail & Related papers (2021-11-24T19:00:05Z)
Learning Co-segmentation by Segment Swapping for Retrieval and Discovery [67.6609943904996]
The goal of this work is to efficiently identify visually similar patterns from a pair of images. We generate synthetic training pairs by selecting object segments in an image and copy-pasting them into another image. We show our approach provides clear improvements for artwork details retrieval on the Brueghel dataset.
arXiv Detail & Related papers (2021-10-29T16:51:16Z)
Text-Based Person Search with Limited Data [66.26504077270356]
Text-based person search (TBPS) aims at retrieving a target person from an image gallery with a descriptive text query. We present a framework with two novel components to handle the problems brought by limited data.
arXiv Detail & Related papers (2021-10-20T22:20:47Z)
REGRAD: A Large-Scale Relational Grasp Dataset for Safe and Object-Specific Robotic Grasping in Clutter [52.117388513480435]
We present a new dataset named regrad to sustain the modeling of relationships among objects and grasps. Our dataset is collected in both forms of 2D images and 3D point clouds. Users are free to import their own object models for the generation of as many data as they want.
arXiv Detail & Related papers (2021-04-29T05:31:21Z)
KaoKore: A Pre-modern Japanese Art Facial Expression Dataset [8.987910033541239]
We propose a new dataset KaoKore which consists of faces extracted from pre-modern Japanese artwork. We demonstrate its value as both a dataset for image classification as well as a creative and artistic dataset, which we explore using generative models.
arXiv Detail & Related papers (2020-02-20T07:22:13Z)

This list is automatically generated from the titles and abstracts of the papers in this site.