ICC++: Explainable Image Retrieval for Art Historical Corpora using
Image Composition Canvas
- URL: http://arxiv.org/abs/2206.11115v1
- Date: Wed, 22 Jun 2022 14:06:29 GMT
- Authors: Prathmesh Madhu, Tilman Marquart, Ronak Kosti, Dirk Suckow, Peter
Bell, Andreas Maier, Vincent Christlein
- Abstract summary: We present a novel approach called Image Composition Canvas (ICC++) to compare and retrieve images having similar compositional elements.
ICC++ is an improvement over ICC, specializing in generating low- and high-level features (compositional elements) motivated by Max Imdahl's work.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image compositions are helpful in the study of image structures and assist in
discovering the semantics of the underlying scene portrayed across art forms
and styles. With the digitization of artworks in recent years, thousands of
images of a particular scene or narrative could potentially be linked together.
However, manually linking this data with consistent objectiveness can be a
highly challenging and time-consuming task. In this work, we present a novel
approach called Image Composition Canvas (ICC++) to compare and retrieve images
having similar compositional elements. ICC++ is an improvement over ICC,
specializing in generating low- and high-level features (compositional elements)
motivated by Max Imdahl's work. To this end, we present a rigorous quantitative
and qualitative comparison of our approach with traditional and
state-of-the-art (SOTA) methods showing that our proposed method outperforms
all of them. In combination with deep features, our method outperforms the best
deep learning-based method, opening the research direction for explainable
machine learning for digital humanities. We will release the code and the data
post-publication.
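The code is not yet released, but the retrieval idea described above, ranking gallery images by the similarity of compositional feature vectors, optionally fused with deep features, can be sketched as follows. This is a minimal illustration only: the feature extractors, the fusion weight `alpha`, and the function names are hypothetical placeholders, not the authors' implementation.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def combined_features(comp: np.ndarray, deep: np.ndarray,
                      alpha: float = 0.5) -> np.ndarray:
    """Concatenate L2-normalized compositional and deep features,
    weighted by alpha (a hypothetical fusion scheme)."""
    comp = comp / (np.linalg.norm(comp) + 1e-12)
    deep = deep / (np.linalg.norm(deep) + 1e-12)
    return np.concatenate([alpha * comp, (1 - alpha) * deep])

def retrieve(query: np.ndarray, gallery: list, top_k: int = 5) -> list:
    """Return indices of the top_k gallery images most similar to the query."""
    scores = [cosine_similarity(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: -scores[i])[:top_k]
```

With precomputed feature vectors, `retrieve(combined_features(q_comp, q_deep), gallery)` would rank images by compositional similarity; the actual ICC++ features and fusion are described in the paper.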
Related papers
- Compositional Entailment Learning for Hyperbolic Vision-Language Models [54.41927525264365]
We show how to fully leverage the innate hierarchical nature of hyperbolic embeddings by looking beyond individual image-text pairs.
We propose Compositional Entailment Learning for hyperbolic vision-language models.
Empirical evaluation on a hyperbolic vision-language model trained with millions of image-text pairs shows that the proposed compositional learning approach outperforms conventional Euclidean CLIP learning.
arXiv Detail & Related papers (2024-10-09T14:12:50Z)
- Enhancing Historical Image Retrieval with Compositional Cues [3.2276097734075426]
We introduce a crucial factor from computational aesthetics, namely image composition, into this topic.
By explicitly integrating composition-related information extracted by CNN into the designed retrieval model, our method considers both the image's composition rules and semantic information.
arXiv Detail & Related papers (2024-03-21T10:51:19Z)
- Decoupled Textual Embeddings for Customized Image Generation [62.98933630971543]
Customized text-to-image generation aims to learn user-specified concepts with a few images.
Existing methods usually suffer from overfitting issues and entangle the subject-unrelated information with the learned concept.
We propose the DETEX, a novel approach that learns the disentangled concept embedding for flexible customized text-to-image generation.
arXiv Detail & Related papers (2023-12-19T03:32:10Z)
- Exploiting CLIP-based Multi-modal Approach for Artwork Classification and Retrieval [29.419743866789187]
We perform exhaustive experiments on the NoisyArt dataset which is a dataset of artwork images crawled from public resources on the web.
On this dataset, CLIP achieves impressive results in (zero-shot) classification and promising results in both the artwork-to-artwork and description-to-artwork domains.
arXiv Detail & Related papers (2023-09-21T14:29:44Z)
- Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality [50.48859793121308]
Contrastively trained vision-language models have achieved remarkable progress in vision and language representation learning.
Recent research has highlighted severe limitations in their ability to perform compositional reasoning over objects, attributes, and relations.
arXiv Detail & Related papers (2023-05-23T08:28:38Z)
- Recent Advances in Scene Image Representation and Classification [1.8369974607582584]
We review the existing scene image representation methods that are being used widely for image classification.
We compare their performance both qualitatively (e.g., output quality, pros and cons) and quantitatively (e.g., accuracy).
Overall, this survey provides in-depth insights and applications of recent scene image representation methods for traditional Computer Vision (CV)-based methods, Deep Learning (DL)-based methods, and Search Engine (SE)-based methods.
arXiv Detail & Related papers (2022-06-15T07:12:23Z)
- Learning Co-segmentation by Segment Swapping for Retrieval and Discovery [67.6609943904996]
The goal of this work is to efficiently identify visually similar patterns from a pair of images.
We generate synthetic training pairs by selecting object segments in an image and copy-pasting them into another image.
We show our approach provides clear improvements for artwork details retrieval on the Brueghel dataset.
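The copy-paste pair generation described above can be sketched in a few lines. This is a toy illustration with a binary mask; the actual method selects object segments and applies further augmentation, and the function name is hypothetical.

```python
import numpy as np

def copy_paste_pair(src: np.ndarray, dst: np.ndarray, mask: np.ndarray):
    """Create a synthetic training pair by pasting the masked region
    of `src` into `dst`. Images are H x W x C arrays; mask is H x W bool.
    The composite shares the masked pattern with `src`, so the mask
    serves as a free correspondence label for co-segmentation training."""
    assert src.shape == dst.shape and mask.shape == src.shape[:2]
    composite = dst.copy()
    composite[mask] = src[mask]  # boolean-mask indexing copies the segment
    return composite, mask
```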
arXiv Detail & Related papers (2021-10-29T16:51:16Z)
- RTIC: Residual Learning for Text and Image Composition using Graph Convolutional Network [19.017377597937617]
We study the compositional learning of images and texts for image retrieval.
We introduce a novel method that combines the graph convolutional network (GCN) with existing composition methods.
arXiv Detail & Related papers (2021-04-07T09:41:52Z)
- Bridging Composite and Real: Towards End-to-end Deep Image Matting [88.79857806542006]
We study the roles of semantics and details for image matting.
We propose a novel Glance and Focus Matting network (GFM), which employs a shared encoder and two separate decoders.
Comprehensive empirical studies have demonstrated that GFM outperforms state-of-the-art methods.
arXiv Detail & Related papers (2020-10-30T10:57:13Z)
- Understanding Compositional Structures in Art Historical Images using Pose and Gaze Priors [20.98603643788824]
Image compositions are useful in analyzing the interactions in an image to study artists and their artworks.
In this work, we attempt to automate this process using existing state-of-the-art machine learning techniques.
Our approach focuses on two central themes of image composition: (a) detection of action regions and action lines of the artwork; and (b) pose-based segmentation of foreground and background.
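One way to picture the action-line idea is to fit a dominant line through the pose keypoints of the depicted figures. The sketch below uses PCA (via SVD) as a simplified stand-in; it is not the paper's actual estimation procedure, and the function name is hypothetical.

```python
import numpy as np

def action_line(keypoints: np.ndarray):
    """Fit a dominant 'action line' through 2D pose keypoints
    (a PCA-based stand-in for action-line estimation).
    keypoints: N x 2 array of (x, y) coordinates pooled over figures.
    Returns (centroid, direction): a point on the line and a unit vector."""
    centroid = keypoints.mean(axis=0)
    centered = keypoints - centroid
    # Principal direction = right singular vector with the largest singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    direction = vt[0]
    return centroid, direction / np.linalg.norm(direction)
```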
arXiv Detail & Related papers (2020-09-08T15:01:56Z)
- Cross-Modal Hierarchical Modelling for Fine-Grained Sketch Based Image Retrieval [147.24102408745247]
We study a further trait of sketches that has been overlooked to date, that is, they are hierarchical in terms of the levels of detail.
In this paper, we design a novel network that is capable of cultivating sketch-specific hierarchies and exploiting them to match sketch with photo at corresponding hierarchical levels.
arXiv Detail & Related papers (2020-07-29T20:50:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.