Object Retrieval and Localization in Large Art Collections using Deep Multi-Style Feature Fusion and Iterative Voting
- URL: http://arxiv.org/abs/2107.06935v1
- Date: Wed, 14 Jul 2021 18:40:49 GMT
- Title: Object Retrieval and Localization in Large Art Collections using Deep Multi-Style Feature Fusion and Iterative Voting
- Authors: Nikolai Ufer, Sabine Lang, Björn Ommer
- Abstract summary: We introduce an algorithm that allows users to search for image regions containing specific motifs or objects.
Our region-based voting with GPU-accelerated approximate nearest-neighbour search allows us to find and localize even small motifs within an extensive dataset in a few seconds.
- Score: 10.807131260367298
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The search for specific objects or motifs is essential to art history as both
assist in decoding the meaning of artworks. Digitization has produced large art
collections, but manual methods prove to be insufficient to analyze them. In
the following, we introduce an algorithm that allows users to search for image
regions containing specific motifs or objects and find similar regions in an
extensive dataset, helping art historians to analyze large digitized art
collections. Computer vision has presented efficient methods for visual
instance retrieval across photographs. However, applied to art collections,
they reveal severe deficiencies because of diverse motifs and massive domain
shifts induced by differences in techniques, materials, and styles. In this
paper, we present a multi-style feature fusion approach that successfully
reduces the domain gap and improves retrieval results without labelled data or
curated image collections. Our region-based voting with GPU-accelerated
approximate nearest-neighbour search allows us to find and localize even small
motifs within an extensive dataset in a few seconds. We obtain state-of-the-art
results on the Brueghel dataset and demonstrate its generalization to
inhomogeneous collections with a large number of distractors.
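The retrieval pipeline described in the abstract can be sketched in miniature. This is a hedged, illustrative stand-in, not the paper's implementation: the GPU-accelerated approximate nearest-neighbour search is replaced by an exact dot-product ranking, the data is synthetic, and `fuse_multi_style` reflects one plausible reading of the fusion step (averaging L2-normalized descriptors of the same region under several styles). All names and parameters here are assumptions for illustration.

```python
import numpy as np

def fuse_multi_style(descriptors):
    # Average L2-normalized descriptors of the same region rendered in
    # several styles, then re-normalize -- one plausible reading of the
    # multi-style feature fusion step (illustrative, not the paper's code).
    fused = np.mean(descriptors, axis=0)
    return fused / np.linalg.norm(fused)

def retrieve_regions(query, region_feats, region_to_image, top_k=5):
    # Exact dot-product ranking standing in for the GPU-accelerated
    # approximate nearest-neighbour search; the top region matches then
    # vote for the images that contain them.
    sims = region_feats @ query
    votes = {}
    for idx in np.argsort(-sims)[:top_k]:
        img = int(region_to_image[idx])
        votes[img] = votes.get(img, 0.0) + float(sims[idx])
    return sorted(votes.items(), key=lambda kv: -kv[1])

rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 64)).astype(np.float32)
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
region_to_image = np.arange(100) // 10  # toy index: 10 regions per image

# Two noisy "style" views of region 37, fused into a single query descriptor.
views = []
for _ in range(2):
    v = feats[37] + 0.05 * rng.normal(size=64).astype(np.float32)
    views.append(v / np.linalg.norm(v))
query = fuse_multi_style(views)

ranking = retrieve_regions(query, feats, region_to_image)
print(ranking[0][0])  # image 3, which contains region 37
```

In a realistic setting the exact ranking would be swapped for an approximate index (e.g. a FAISS inner-product index) so that even small motifs can be located across millions of regions in seconds, as the abstract describes.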
Related papers
- Explainable Search and Discovery of Visual Cultural Heritage Collections with Multimodal Large Language Models [0.0]
We introduce a method for using state-of-the-art multimodal large language models (LLMs) to enable an open-ended, explainable search and discovery interface for visual collections.
We show how our approach can create novel clustering and recommendation systems that avoid common pitfalls of methods based directly on visual embeddings.
arXiv Detail & Related papers (2024-11-07T12:48:39Z)
- Retrieval Robust to Object Motion Blur [54.34823913494456]
We propose a method for object retrieval in images that are affected by motion blur.
We present the first large-scale datasets for blurred object retrieval.
Our method outperforms state-of-the-art retrieval methods on the new blur-retrieval datasets.
arXiv Detail & Related papers (2024-04-27T23:22:39Z)
- Leveraging Open-Vocabulary Diffusion to Camouflaged Instance Segmentation [59.78520153338878]
Text-to-image diffusion techniques have shown exceptional capability of producing high-quality images from text descriptions.
We propose a method built upon a state-of-the-art diffusion model, empowered by open-vocabulary to learn multi-scale textual-visual features for camouflaged object representations.
arXiv Detail & Related papers (2023-12-29T07:59:07Z)
- Leveraging Computer Vision Application in Visual Arts: A Case Study on the Use of Residual Neural Network to Classify and Analyze Baroque Paintings [0.0]
In this case study, we focus on the classification of a selected painting 'Portrait of the Painter Charles Bruni' by Johann Kupetzky.
We show that the features extracted during residual network training can be useful for image retrieval within search systems in online art collections.
arXiv Detail & Related papers (2022-10-27T10:15:36Z)
- VizWiz-FewShot: Locating Objects in Images Taken by People With Visual Impairments [74.72656607288185]
We introduce a few-shot localization dataset originating from photographers who were authentically trying to learn about the visual content in the images they took.
It includes nearly 10,000 segmentations of 100 categories in over 4,500 images that were taken by people with visual impairments.
Compared to existing few-shot object detection and instance segmentation datasets, our dataset is the first to locate holes in objects.
arXiv Detail & Related papers (2022-07-24T20:44:51Z)
- ObjectFormer for Image Manipulation Detection and Localization [118.89882740099137]
We propose ObjectFormer to detect and localize image manipulations.
We extract high-frequency features of the images and combine them with RGB features as multimodal patch embeddings.
We conduct extensive experiments on various datasets and the results verify the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-03-28T12:27:34Z)
- A deep learning approach to clustering visual arts [7.363576598794859]
We propose DELIUS: a DEep learning approach to cLustering vIsUal artS.
The method uses a pre-trained convolutional network to extract features and then feeds these features into a deep embedded clustering model.
The task of mapping the raw input data to a latent space is optimized jointly with the task of finding a set of cluster centroids in this latent space.
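The joint optimization in DELIUS (learning the latent mapping together with the cluster centroids) needs a trainable network and is not reproduced here; as a minimal sketch of the centroid-finding half only, two-means clustering on a fixed feature matrix looks like this (synthetic data, all names illustrative):

```python
import numpy as np

def two_means(feats, iters=10):
    # Farthest-point init for two centroids, then standard Lloyd
    # iterations. Only the centroid-finding step of deep embedded
    # clustering; DELIUS additionally optimizes the latent mapping.
    c0 = feats[0]
    c1 = feats[np.argmax(np.linalg.norm(feats - c0, axis=1))]
    centroids = np.stack([c0, c1])
    for _ in range(iters):
        dists = np.linalg.norm(feats[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in (0, 1):
            centroids[j] = feats[labels == j].mean(axis=0)
    return labels

rng = np.random.default_rng(1)
# Two well-separated synthetic "painting feature" groups in 32 dimensions,
# standing in for features extracted by a pre-trained convolutional network.
feats = np.vstack([rng.normal(0.0, 0.1, size=(30, 32)),
                   rng.normal(1.0, 0.1, size=(30, 32))]).astype(np.float32)
labels = two_means(feats)
print(labels[0], labels[30])  # the two groups land in different clusters
```

In the full method, the assignment step would also backpropagate a clustering loss into the encoder, so the latent space itself is reshaped around the centroids rather than held fixed as here.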
arXiv Detail & Related papers (2021-06-11T08:35:26Z)
- PhraseCut: Language-based Image Segmentation in the Wild [62.643450401286]
We consider the problem of segmenting image regions given a natural language phrase.
Our dataset is collected on top of the Visual Genome dataset.
Our experiments show that the scale and diversity of concepts in our dataset pose significant challenges to the existing state-of-the-art.
arXiv Detail & Related papers (2020-08-03T20:58:53Z)
- Deep convolutional embedding for digitized painting clustering [14.228308494671703]
We propose a deep convolutional embedding model for digitized painting clustering.
The model is capable of outperforming other state-of-the-art deep clustering approaches to the same problem.
The proposed method can be useful for several art-related tasks, in particular visual link retrieval and historical knowledge discovery in painting datasets.
arXiv Detail & Related papers (2020-03-19T06:49:38Z)
- Object-Centric Image Generation from Layouts [93.10217725729468]
We develop a layout-to-image-generation method to generate complex scenes with multiple objects.
Our method learns representations of the spatial relationships between objects in the scene, which lead to our model's improved layout-fidelity.
We introduce SceneFID, an object-centric adaptation of the popular Fréchet Inception Distance metric, which is better suited for multi-object images.
arXiv Detail & Related papers (2020-03-16T21:40:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.