Unlocking Comics: The AI4VA Dataset for Visual Understanding
- URL: http://arxiv.org/abs/2410.20459v1
- Date: Sun, 27 Oct 2024 14:27:05 GMT
- Title: Unlocking Comics: The AI4VA Dataset for Visual Understanding
- Authors: Peter Grönquist, Deblina Bhattacharjee, Bahar Aydemir, Baran Ozaydin, Tong Zhang, Mathieu Salzmann, Sabine Süsstrunk
- Abstract summary: This paper presents a novel dataset comprising Franco-Belgian comics from the 1950s annotated for tasks including depth estimation, semantic segmentation, saliency detection, and character identification.
It consists of two distinct and consistent styles and incorporates object concepts and labels taken from natural images.
By including such diverse information across styles, this dataset not only holds promise for computational creativity but also offers avenues for the digitization of art and storytelling innovation.
- Score: 62.345344799258804
- License:
- Abstract: In the evolving landscape of deep learning, there is a pressing need for more comprehensive datasets capable of training models across multiple modalities. Concurrently, in digital humanities, there is a growing demand to leverage technology for diverse media adaptation and creation, yet limited by sparse datasets due to copyright and stylistic constraints. Addressing this gap, our paper presents a novel dataset comprising Franco-Belgian comics from the 1950s annotated for tasks including depth estimation, semantic segmentation, saliency detection, and character identification. It consists of two distinct and consistent styles and incorporates object concepts and labels taken from natural images. By including such diverse information across styles, this dataset not only holds promise for computational creativity but also offers avenues for the digitization of art and storytelling innovation. This dataset is a crucial component of the AI4VA Workshop Challenges (https://sites.google.com/view/ai4vaeccv2024), where we specifically explore depth and saliency. Dataset details at https://github.com/IVRL/AI4VA.
Related papers
- LoRaLay: A Multilingual and Multimodal Dataset for Long Range and Layout-Aware Summarization [19.301567079372436]
Text Summarization is a popular task and an active area of research for the Natural Language Processing community.
All publicly available summarization datasets only provide plain text content.
We present LoRaLay, a collection of datasets for long-range summarization with accompanying visual/layout information.
arXiv Detail & Related papers (2023-01-26T18:50:54Z)
- HandsOff: Labeled Dataset Generation With No Additional Human Annotations [13.11411442720668]
We introduce the HandsOff framework, a technique capable of producing an unlimited number of synthetic images and corresponding labels.
Our framework avoids the practical drawbacks of prior work by unifying the field of GAN inversion with dataset generation.
We generate datasets with rich pixel-wise labels in multiple challenging domains such as faces, cars, full-body human poses, and urban driving scenes.
arXiv Detail & Related papers (2022-12-24T03:37:02Z)
- Unsupervised Neural Stylistic Text Generation using Transfer learning and Adapters [66.17039929803933]
We propose a novel transfer learning framework which updates only 0.3% of model parameters to learn style specific attributes for response generation.
We learn style specific attributes from the PERSONALITY-CAPTIONS dataset.
arXiv Detail & Related papers (2022-10-07T00:09:22Z)
- Generating More Pertinent Captions by Leveraging Semantics and Style on Multi-Source Datasets [56.018551958004814]
This paper addresses the task of generating fluent descriptions by training on a non-uniform combination of data sources.
Large-scale datasets with noisy image-text pairs provide a sub-optimal source of supervision.
We propose to leverage and separate semantics and descriptive style through the incorporation of a style token and keywords extracted through a retrieval component.
arXiv Detail & Related papers (2021-11-24T19:00:05Z)
- Learning Co-segmentation by Segment Swapping for Retrieval and Discovery [67.6609943904996]
The goal of this work is to efficiently identify visually similar patterns from a pair of images.
We generate synthetic training pairs by selecting object segments in an image and copy-pasting them into another image.
We show our approach provides clear improvements for artwork details retrieval on the Brueghel dataset.
arXiv Detail & Related papers (2021-10-29T16:51:16Z)
- Text-Based Person Search with Limited Data [66.26504077270356]
Text-based person search (TBPS) aims at retrieving a target person from an image gallery with a descriptive text query.
We present a framework with two novel components to handle the problems brought by limited data.
arXiv Detail & Related papers (2021-10-20T22:20:47Z)
- REGRAD: A Large-Scale Relational Grasp Dataset for Safe and Object-Specific Robotic Grasping in Clutter [52.117388513480435]
We present a new dataset named REGRAD to sustain the modeling of relationships among objects and grasps.
Our dataset is collected in both forms of 2D images and 3D point clouds.
Users are free to import their own object models to generate as much data as they want.
arXiv Detail & Related papers (2021-04-29T05:31:21Z)
- KaoKore: A Pre-modern Japanese Art Facial Expression Dataset [8.987910033541239]
We propose a new dataset KaoKore which consists of faces extracted from pre-modern Japanese artwork.
We demonstrate its value both as a dataset for image classification and as a creative and artistic dataset, which we explore using generative models.
arXiv Detail & Related papers (2020-02-20T07:22:13Z)