A Triplet-loss Dilated Residual Network for High-Resolution
Representation Learning in Image Retrieval
- URL: http://arxiv.org/abs/2303.08398v1
- Date: Wed, 15 Mar 2023 07:01:44 GMT
- Title: A Triplet-loss Dilated Residual Network for High-Resolution
Representation Learning in Image Retrieval
- Authors: Saeideh Yousefzadeh, Hamidreza Pourreza, Hamidreza Mahyar
- Abstract summary: In some applications, such as localization, image retrieval is employed as the initial step.
The current paper introduces a simple yet efficient image retrieval system with fewer trainable parameters.
The proposed method benefits from a dilated residual convolutional neural network with triplet loss.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Content-based image retrieval is the process of retrieving a subset of images
from an extensive image gallery based on visual content such as color, shape,
texture, or spatial relations. In some applications, such as localization,
image retrieval is employed as the initial step. In such cases, the accuracy of
the top-retrieved images significantly affects the overall system accuracy. The
current paper introduces a simple yet efficient image retrieval system with
fewer trainable parameters, which offers acceptable accuracy in top-retrieved
images. The proposed method benefits from a dilated residual convolutional
neural network with triplet loss. Experimental evaluations show that this model
can extract richer information (i.e., high-resolution representations) by
enlarging the receptive field, thus improving image retrieval accuracy without
increasing the depth or complexity of the model. To enhance the extracted
representations' robustness, the current research obtains candidate regions of
interest from each feature map and applies Generalized-Mean pooling to the
regions. As the choice of triplets in a triplet-based network affects the model
training, we employ a triplet online mining method. We test the performance of
the proposed method under various configurations on two challenging
image-retrieval datasets, namely Revisited Paris6k (RPar) and UKBench. The
experimental results show a mean precision at rank 10 of 94.54 and 80.23 on
the RPar medium and hard modes, respectively, and a recall at rank 4 of 3.86
on the UKBench dataset.
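The method combines three standard building blocks: dilated residual convolutions that enlarge the receptive field, Generalized-Mean (GeM) pooling of the resulting feature maps, and online (batch-hard) triplet mining. The PyTorch sketch below illustrates these components in a generic form; the block layout, the GeM hyper-parameters, and the mining rule are illustrative assumptions rather than the authors' exact configuration, which additionally applies GeM to candidate regions of interest within each feature map.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DilatedResidualBlock(nn.Module):
    """Residual block with dilated 3x3 convolutions: the dilation enlarges the
    receptive field without adding depth or reducing spatial resolution."""
    def __init__(self, channels, dilation=2):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=dilation,
                               dilation=dilation, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=dilation,
                               dilation=dilation, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)                      # identity shortcut

class GeM(nn.Module):
    """Generalized-Mean pooling: p = 1 is average pooling, p -> inf approaches
    max pooling; p is learned here."""
    def __init__(self, p=3.0, eps=1e-6):
        super().__init__()
        self.p = nn.Parameter(torch.tensor(p))
        self.eps = eps

    def forward(self, x):                           # x: (B, C, H, W) feature map or cropped region
        x = x.clamp(min=self.eps).pow(self.p)
        x = F.avg_pool2d(x, (x.size(-2), x.size(-1)))
        return x.pow(1.0 / self.p).flatten(1)       # (B, C) descriptor

def batch_hard_triplet_loss(emb, labels, margin=0.1):
    """Online ('batch-hard') triplet mining: for every anchor in the batch,
    use its hardest positive and hardest negative."""
    emb = F.normalize(emb, dim=1)
    dist = torch.cdist(emb, emb)                          # (B, B) pairwise L2 distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)     # True where labels match (incl. diagonal)
    hardest_pos = (dist * same.float()).max(dim=1).values
    hardest_neg = dist.masked_fill(same, float("inf")).min(dim=1).values
    return F.relu(hardest_pos - hardest_neg + margin).mean()
```

In a pipeline of this kind, the dilated backbone produces a feature map, GeM turns it (or each candidate region of it) into a compact descriptor, and the batch-hard loss is computed over the descriptors of each training batch.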
Related papers
- Leveraging Neural Radiance Fields for Uncertainty-Aware Visual
Localization [56.95046107046027]
We propose to leverage Neural Radiance Fields (NeRF) to generate training samples for scene coordinate regression.
Despite NeRF's efficiency in rendering, much of the rendered data is polluted by artifacts or contains only minimal information gain.
arXiv Detail & Related papers (2023-10-10T20:11:13Z) - Influence of image noise on crack detection performance of deep
convolutional neural networks [0.0]
Much research has been conducted on classifying cracks from image data using deep convolutional neural networks.
This paper investigates the influence of image noise on network accuracy.
AlexNet was selected as the most efficient model based on the proposed index.
arXiv Detail & Related papers (2021-11-03T09:08:54Z) - Contextual Similarity Aggregation with Self-attention for Visual
Re-ranking [96.55393026011811]
We propose a visual re-ranking method by contextual similarity aggregation with self-attention.
We conduct comprehensive experiments on four benchmark datasets to demonstrate the generality and effectiveness of our proposed visual re-ranking method.
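The summary above names the core mechanism (aggregating contextual similarity among the top-retrieved images with self-attention); a hypothetical minimal version of that idea could be sketched in PyTorch as follows. The projection, dimensions, and scoring head are assumptions, not the cited paper's architecture.

```python
import torch
import torch.nn as nn

class SimilarityReRanker(nn.Module):
    """Generic sketch: refine the pairwise similarity vectors of the top-k
    retrieved images with one self-attention layer, then re-score them."""
    def __init__(self, k=100, dim=256, heads=4):
        super().__init__()
        self.proj = nn.Linear(k, dim)            # embed each image's k-dim similarity vector
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.score = nn.Linear(dim, 1)

    def forward(self, sim):                      # sim: (B, k, k) similarities among the top-k results
        h = self.proj(sim)                       # (B, k, dim)
        h, _ = self.attn(h, h, h)                # contextual aggregation across the ranked list
        return self.score(h).squeeze(-1)         # (B, k) refined ranking scores
```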
arXiv Detail & Related papers (2021-10-26T06:20:31Z) - Adversarial Domain Feature Adaptation for Bronchoscopic Depth Estimation [111.89519571205778]
In this work, we propose an alternative domain-adaptive approach to depth estimation.
Our novel two-step structure first trains a depth estimation network with labeled synthetic images in a supervised manner.
The results of our experiments show that the proposed method improves the network's performance on real images by a considerable margin.
arXiv Detail & Related papers (2021-09-24T08:11:34Z) - A Novel Triplet Sampling Method for Multi-Label Remote Sensing Image
Search and Retrieval [1.123376893295777]
A common approach for learning the metric space relies on the selection of triplets of similar (positive) and dissimilar (negative) images.
We propose a novel triplet sampling method in the framework of deep neural networks (DNNs) defined for multi-label RS CBIR problems.
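In the multi-label setting, "similar" is commonly defined by label overlap. As a point of reference only (not the sampling strategy proposed in the cited paper), a naive random triplet sampler under that definition can be written as:

```python
import torch

def sample_multilabel_triplets(labels):
    """Generic sketch: images sharing at least one label with the anchor are
    positives, images sharing none are negatives; one random triplet per anchor."""
    # labels: (N, L) binary multi-label matrix
    share = (labels.float() @ labels.float().t()) > 0         # True if any label in common
    triplets = []
    for a in range(labels.size(0)):
        pos = torch.nonzero(share[a], as_tuple=False).flatten()
        pos = pos[pos != a]                                    # exclude the anchor itself
        neg = torch.nonzero(~share[a], as_tuple=False).flatten()
        if len(pos) == 0 or len(neg) == 0:
            continue
        p = pos[torch.randint(len(pos), (1,))].item()
        n = neg[torch.randint(len(neg), (1,))].item()
        triplets.append((a, p, n))
    return triplets
```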
arXiv Detail & Related papers (2021-05-08T09:16:09Z) - DenserNet: Weakly Supervised Visual Localization Using Multi-scale
Feature Aggregation [7.2531609092488445]
First, we develop a convolutional neural network architecture that aggregates feature maps at different semantic levels for image representations.
Second, our model is trained end-to-end without pixel-level annotation other than positive and negative GPS-tagged image pairs.
Third, our method is computationally efficient as our architecture has shared features and parameters during computation.
arXiv Detail & Related papers (2020-12-04T02:16:47Z) - Image Retrieval for Structure-from-Motion via Graph Convolutional
Network [13.040952255039702]
We present a novel retrieval method based on Graph Convolutional Network (GCN) to generate accurate pairwise matches without costly redundancy.
By constructing a subgraph surrounding the query image as input data, we adopt a learnable GCN to predict whether nodes in the subgraph have regions overlapping with the query photograph.
Experiments demonstrate that our method performs remarkably well on the challenging dataset of highly ambiguous and duplicated scenes.
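A minimal, generic sketch of this idea (a GCN over the subgraph of candidate images that predicts, per node, whether it overlaps with the query) is given below in plain PyTorch; the two-layer design and the dimensions are assumptions, not the cited paper's network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OverlapGCN(nn.Module):
    """Generic sketch: two graph-convolution layers followed by a per-node
    classifier that outputs the probability of overlap with the query image."""
    def __init__(self, in_dim=512, hid_dim=128):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hid_dim)
        self.w2 = nn.Linear(hid_dim, hid_dim)
        self.cls = nn.Linear(hid_dim, 1)

    def forward(self, feats, adj):
        # feats: (N, in_dim) node descriptors, adj: (N, N) adjacency of the subgraph
        a = adj + torch.eye(adj.size(0), device=adj.device)        # add self-loops
        d = a.sum(dim=1).clamp(min=1e-6).pow(-0.5)
        a = d.unsqueeze(1) * a * d.unsqueeze(0)                    # symmetric normalization
        h = F.relu(self.w1(a @ feats))
        h = F.relu(self.w2(a @ h))
        return torch.sigmoid(self.cls(h)).squeeze(-1)              # overlap probability per node
```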
arXiv Detail & Related papers (2020-09-17T04:03:51Z) - Learning Condition Invariant Features for Retrieval-Based Localization
from 1M Images [85.81073893916414]
We develop a novel method for learning more accurate and better generalizing localization features.
On the challenging Oxford RobotCar night condition, our method outperforms the well-known triplet loss by 24.4% in localization accuracy within 5m.
arXiv Detail & Related papers (2020-08-27T14:46:22Z) - Intrinsic Autoencoders for Joint Neural Rendering and Intrinsic Image
Decomposition [67.9464567157846]
We propose an autoencoder for joint generation of realistic images from synthetic 3D models while simultaneously decomposing real images into their intrinsic shape and appearance properties.
Our experiments confirm that a joint treatment of rendering and decomposition is indeed beneficial and that our approach outperforms state-of-the-art image-to-image translation baselines both qualitatively and quantitatively.
arXiv Detail & Related papers (2020-06-29T12:53:58Z) - Image Retrieval using Multi-scale CNN Features Pooling [26.811290793232313]
We present an end-to-end trainable network architecture that exploits a novel multi-scale local pooling based on NetVLAD and a triplet mining procedure based on sample difficulty to obtain an effective image representation.
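As a generic illustration of pooling CNN features at multiple scales (not the NetVLAD-based local pooling of the cited paper), one can pool the same feature map over grids of several sizes and concatenate the results:

```python
import torch
import torch.nn.functional as F

def multi_scale_pool(fmap, scales=(1, 2, 4)):
    """Generic sketch: pool a convolutional feature map over grids of several
    sizes and concatenate the pooled vectors into one multi-scale descriptor."""
    # fmap: (B, C, H, W) convolutional feature map
    parts = [F.adaptive_max_pool2d(fmap, s).flatten(1) for s in scales]  # (B, C*s*s) per scale
    desc = torch.cat(parts, dim=1)
    return F.normalize(desc, dim=1)                                      # L2-normalize the descriptor
```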
arXiv Detail & Related papers (2020-04-21T00:57:52Z) - Geometrically Mappable Image Features [85.81073893916414]
Vision-based localization of an agent in a map is an important problem in robotics and computer vision.
We propose a method that learns image features targeted for image-retrieval-based localization.
arXiv Detail & Related papers (2020-03-21T15:36:38Z)