Contextual Similarity Aggregation with Self-attention for Visual Re-ranking
- URL: http://arxiv.org/abs/2110.13430v1
- Date: Tue, 26 Oct 2021 06:20:31 GMT
- Title: Contextual Similarity Aggregation with Self-attention for Visual Re-ranking
- Authors: Jianbo Ouyang, Hui Wu, Min Wang, Wengang Zhou, Houqiang Li
- Abstract summary: We propose a visual re-ranking method by contextual similarity aggregation with self-attention.
We conduct comprehensive experiments on four benchmark datasets to demonstrate the generality and effectiveness of our proposed visual re-ranking method.
- Score: 96.55393026011811
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In content-based image retrieval, the first-round retrieval result obtained by simple visual feature comparison may be unsatisfactory; it can be refined by visual re-ranking techniques. In image retrieval, it is observed that the contextual similarity among the top-ranked images is an important clue for distinguishing semantic relevance. Inspired by this observation, in this paper we propose a visual re-ranking method based on contextual similarity aggregation with self-attention. In our approach, each image in the top-K ranking list is represented as an affinity feature vector by comparing it with a set of anchor images. Then, the affinity features of the top-K images are refined by aggregating contextual information with a transformer encoder. Finally, the refined affinity features are used to recalculate the similarity scores between the query and the top-K images, which are re-ranked accordingly. To further improve the robustness of the re-ranking model and enhance its performance, a new data augmentation scheme is designed. Since the re-ranking model does not directly depend on the visual features used in the initial retrieval, it can readily be applied to retrieval result lists produced by various retrieval algorithms. We conduct comprehensive experiments on four benchmark datasets to demonstrate the generality and effectiveness of the proposed visual re-ranking method.
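A minimal sketch may help make the pipeline concrete: each image in the top-K list is encoded as an affinity vector against a set of anchors, the affinity vectors are refined by a transformer encoder, and query-candidate similarities are recomputed from the refined features. The PyTorch sketch below is hypothetical and rests on assumed choices (cosine similarity as the affinity measure, the top-K images doubling as anchors, a stock TransformerEncoder, illustrative dimensions); it is not the authors' exact implementation.

```python
# Hypothetical sketch of contextual similarity aggregation re-ranking.
# Assumptions (not from the paper): cosine similarity as the affinity
# measure, the top-K images themselves serving as anchors, and a stock
# PyTorch TransformerEncoder as the aggregation module.
import torch
import torch.nn.functional as F
from torch import nn

class CSAReRanker(nn.Module):
    def __init__(self, num_anchors: int, d_model: int = 256,
                 nhead: int = 4, num_layers: int = 2):
        super().__init__()
        # Project anchor-dimensional affinity vectors into the encoder width.
        self.proj = nn.Linear(num_anchors, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, query_feat: torch.Tensor,
                topk_feats: torch.Tensor,
                anchor_feats: torch.Tensor) -> torch.Tensor:
        # Affinity feature: similarity of each image (query included)
        # to every anchor image.
        feats = torch.cat([query_feat.unsqueeze(0), topk_feats], dim=0)
        affinity = F.normalize(feats, dim=1) @ F.normalize(anchor_feats, dim=1).T
        # Refine the affinity features with self-attention over the list.
        refined = self.encoder(self.proj(affinity).unsqueeze(0)).squeeze(0)
        # Recompute query-to-candidate similarity from refined features.
        q, cands = refined[0], refined[1:]
        return F.cosine_similarity(q.unsqueeze(0), cands, dim=1)

# Usage (assumed): anchors taken to be the top-K images themselves.
# model = CSAReRanker(num_anchors=100)
# scores = model(q_feat, topk_feats, topk_feats)
# reranked = scores.argsort(descending=True)
```

Because the re-ranker consumes only similarity values rather than the raw descriptors, it can sit on top of ranked lists from different first-round retrieval systems, which is the portability property the abstract emphasizes.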
Related papers
- Texture image retrieval using a classification and contourlet-based features [0.10241134756773226]
We propose a new framework for improving Content Based Image Retrieval (CBIR) for texture images.
This is achieved by using a new image representation based on the RCT-Plus transform.
We have achieved significant improvements in the retrieval rates compared to previous CBIR schemes.
arXiv Detail & Related papers (2024-03-10T00:07:47Z) - Attribute-Aware Deep Hashing with Self-Consistency for Large-Scale
Fine-Grained Image Retrieval [65.43522019468976]
We propose attribute-aware hashing networks with self-consistency for generating attribute-aware hash codes.
We develop an encoder-decoder network trained on a reconstruction task to distill high-level attribute-specific vectors in an unsupervised manner.
Our models impose a feature decorrelation constraint on these attribute vectors to strengthen their representational ability, as sketched below.
arXiv Detail & Related papers (2023-11-21T08:20:38Z) - Integrating Visual and Semantic Similarity Using Hierarchies for Image
- Integrating Visual and Semantic Similarity Using Hierarchies for Image Retrieval [0.46040036610482665]
We propose a method for CBIR that captures both visual and semantic similarity using a visual hierarchy.
The hierarchy is constructed by merging classes with overlapping features in the latent space of a deep neural network trained for classification.
Our method achieves superior performance compared to existing methods on image retrieval.
arXiv Detail & Related papers (2023-08-16T15:23:14Z) - Graph Convolution Based Efficient Re-Ranking for Visual Retrieval [29.804582207550478]
We present an efficient re-ranking method which refines initial retrieval results by updating features.
Specifically, we reformulate re-ranking based on Graph Convolution Networks (GCN) and propose a novel Graph Convolution based Re-ranking (GCR) for visual retrieval tasks via feature propagation.
In particular, the plain GCR is extended to cross-camera retrieval, with an improved feature propagation formulation that leverages affinity relationships across different cameras; a sketch of the feature propagation idea follows this entry.
arXiv Detail & Related papers (2023-06-15T00:28:08Z) - Summarize and Search: Learning Consensus-aware Dynamic Convolution for
- Summarize and Search: Learning Consensus-aware Dynamic Convolution for Co-Saliency Detection [139.10628924049476]
Humans perform co-saliency detection by first summarizing the consensus knowledge in the whole group and then searching corresponding objects in each image.
Previous methods usually lack robustness, scalability, or stability for the first process and simply fuse consensus features with image features for the second process.
We propose a novel consensus-aware dynamic convolution model to explicitly and effectively perform the "summarize and search" process.
arXiv Detail & Related papers (2021-10-01T12:06:42Z) - Cross-Modal Retrieval Augmentation for Multi-Modal Classification [61.5253261560224]
We explore the use of unstructured external knowledge sources of images and their corresponding captions for improving visual question answering.
First, we train a novel alignment model for embedding images and captions in the same space, which achieves substantial improvement on image-caption retrieval.
Second, we show that retrieval-augmented multi-modal transformers using the trained alignment model improve results on VQA over strong baselines.
arXiv Detail & Related papers (2021-04-16T13:27:45Z) - Scene Graph Embeddings Using Relative Similarity Supervision [4.137464623395376]
We employ a graph convolutional network to exploit structure in scene graphs and produce image embeddings useful for semantic image retrieval.
We propose a novel loss function that operates on pairs of similar and dissimilar images and imposes relative ordering between them in embedding space.
We demonstrate that this Ranking loss, coupled with an intuitive triple sampling strategy, leads to robust representations that outperform well-known contrastive losses on the retrieval task.
arXiv Detail & Related papers (2021-04-06T09:13:05Z) - Learning to Compose Hypercolumns for Visual Correspondence [57.93635236871264]
We introduce a novel approach to visual correspondence that dynamically composes effective features by leveraging relevant layers conditioned on the images to match.
The proposed method, dubbed Dynamic Hyperpixel Flow, learns to compose hypercolumn features on the fly by selecting a small number of relevant layers from a deep convolutional neural network.
arXiv Detail & Related papers (2020-07-21T04:03:22Z) - Image Matching across Wide Baselines: From Paper to Practice [80.9424750998559]
We introduce a comprehensive benchmark for local features and robust estimation algorithms.
Our pipeline's modular structure allows easy integration, configuration, and combination of different methods.
We show that with proper settings, classical solutions may still outperform the perceived state of the art.
arXiv Detail & Related papers (2020-03-03T15:20:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.