Related papers: Enhancing Historical Image Retrieval with Compositional Cues

URL: http://arxiv.org/abs/2403.14287v1
Date: Thu, 21 Mar 2024 10:51:19 GMT
Title: Enhancing Historical Image Retrieval with Compositional Cues
Authors: Tingyu Lin, Robert Sablatnig
Abstract summary: We introduce a crucial factor from computational aesthetics, namely image composition, into this topic. By explicitly integrating composition-related information extracted by CNN into the designed retrieval model, our method considers both the image's composition rules and semantic information.
Score: 3.2276097734075426
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: In analyzing vast amounts of digitally stored historical image data, existing content-based retrieval methods often overlook significant non-semantic information, limiting their effectiveness for flexible exploration across varied themes. To broaden the applicability of image retrieval methods for diverse purposes and uncover more general patterns, we innovatively introduce a crucial factor from computational aesthetics, namely image composition, into this topic. By explicitly integrating composition-related information extracted by CNN into the designed retrieval model, our method considers both the image's composition rules and semantic information. Qualitative and quantitative experiments demonstrate that the image retrieval network guided by composition information outperforms those relying solely on content information, facilitating the identification of images in databases closer to the target image in human perception. Please visit https://github.com/linty5/CCBIR to try our codes.

Related papers

MatchAnything: Universal Cross-Modality Image Matching with Large-Scale Pre-Training [62.843316348659165]
Deep learning-based image matching algorithms have dramatically outperformed humans in rapidly and accurately finding large amounts of correspondences. We propose a large-scale pre-training framework that utilizes synthetic cross-modal training signals to train models to recognize and match fundamental structures across images. Our key finding is that the matching model trained with our framework achieves remarkable generalizability across more than eight unseen cross-modality registration tasks.
arXiv Detail & Related papers (2025-01-13T18:37:36Z)
Enrich the content of the image Using Context-Aware Copy Paste [1.450405446885067]
We propose a context-aware approach that integrates Bi Latent Information Propagation (BLIP) for content extraction from source images. By matching extracted content information with category information, our method ensures cohesive integration of target objects using Segment Anything Model (SAM) and You Only Look Once (YOLO). Experimental evaluations across diverse datasets demonstrate the effectiveness of our method in enhancing data diversity and generating high-quality pseudo-images.
arXiv Detail & Related papers (2024-07-11T03:07:28Z)
Where Does the Performance Improvement Come From? - A Reproducibility Concern about Image-Text Retrieval [85.03655458677295]
Image-text retrieval has gradually become a major research direction in the field of information retrieval. We first examine the related concerns and why the focus is on image-text retrieval tasks. We analyze various aspects of the reproduction of pretrained and nonpretrained retrieval models.
arXiv Detail & Related papers (2022-03-08T05:01:43Z)
Contextual Similarity Aggregation with Self-attention for Visual Re-ranking [96.55393026011811]
We propose a visual re-ranking method by contextual similarity aggregation with self-attention. We conduct comprehensive experiments on four benchmark datasets to demonstrate the generality and effectiveness of our proposed visual re-ranking method.
arXiv Detail & Related papers (2021-10-26T06:20:31Z)
Boosting Entity-aware Image Captioning with Multi-modal Knowledge Graph [96.95815946327079]
It is difficult to learn the association between named entities and visual cues due to the long-tail distribution of named entities. We propose a novel approach that constructs a multi-modal knowledge graph to associate the visual objects with named entities.
arXiv Detail & Related papers (2021-07-26T05:50:41Z)
Compositional Sketch Search [91.84489055347585]
We present an algorithm for searching image collections using free-hand sketches. We exploit drawings as a concise and intuitive representation for specifying entire scene compositions.
arXiv Detail & Related papers (2021-06-15T09:38:09Z)
Multi-Modal Retrieval using Graph Neural Networks [1.8911962184174562]
We learn a joint vision and concept embedding in the same high-dimensional space. We model the visual and concept relationships as a graph structure. We also introduce a novel inference time control, based on selective neighborhood connectivity.
arXiv Detail & Related papers (2020-10-04T19:34:20Z)
Webpage Segmentation for Extracting Images and Their Surrounding Contextual Information [0.0]
We propose a webpage segmentation algorithm targeting the extraction of web images and their contextual information based on their characteristics as they appear on webpages. We conducted a user study to obtain a human-labeled dataset to validate the effectiveness of our method and experiments demonstrated that our method can achieve better results than an existing segmentation algorithm.
arXiv Detail & Related papers (2020-05-18T19:00:03Z)
Unsupervised Learning of Landmarks based on Inter-Intra Subject Consistencies [72.67344725725961]
We present a novel unsupervised learning approach to image landmark discovery by incorporating the inter-subject landmark consistencies on facial images. This is achieved via an inter-subject mapping module that transforms original subject landmarks based on an auxiliary subject-related structure. To recover from the transformed images back to the original subject, the landmark detector is forced to learn spatial locations that contain the consistent semantic meanings both for the paired intra-subject images and between the paired inter-subject images.
arXiv Detail & Related papers (2020-04-16T20:38:16Z)
Pathological Retinal Region Segmentation From OCT Images Using Geometric Relation Based Augmentation [84.7571086566595]
We propose improvements over previous GAN-based medical image synthesis methods by jointly encoding the intrinsic relationship of geometry and shape. The proposed method outperforms state-of-the-art segmentation methods on the public RETOUCH dataset having images captured from different acquisition procedures.
arXiv Detail & Related papers (2020-03-31T11:50:43Z)
Fine-grained Image Classification and Retrieval by Combining Visual and Locally Pooled Textual Features [8.317191999275536]
The mere presence of text provides strong guiding content that should be employed to tackle a diversity of computer vision tasks. In this paper, we address the problem of fine-grained classification and image retrieval by leveraging textual information along with visual cues to comprehend the existing intrinsic relation between the two modalities.
arXiv Detail & Related papers (2020-01-14T12:06:12Z)
Learning Transformation-Aware Embeddings for Image Forensics [15.484408315588569]
Image Provenance Analysis aims at discovering relationships among different manipulated image versions that share content. One of the main sub-problems for provenance analysis that has not yet been addressed directly is the edit ordering of images that share full content or are near-duplicates. This paper introduces a novel deep learning-based approach to provide a plausible ordering to images that have been generated from a single image through transformations.
arXiv Detail & Related papers (2020-01-13T22:01:24Z)

This list is automatically generated from the titles and abstracts of the papers on this site.