A Novel Image Descriptor with Aggregated Semantic Skeleton Representation for Long-term Visual Place Recognition
- URL: http://arxiv.org/abs/2202.03677v1
- Date: Tue, 8 Feb 2022 06:49:38 GMT
- Title: A Novel Image Descriptor with Aggregated Semantic Skeleton Representation for Long-term Visual Place Recognition
- Authors: Nie Jiwei, Feng Joe-Mei, Xue Dingyu, Pan Feng, Liu Wei, Hu Jun, Cheng Shuai
- Abstract summary: We propose a novel image descriptor with aggregated semantic skeleton representation (SSR), dubbed SSR-VLAD.
The SSR-VLAD of an image aggregates the semantic skeleton features of each category and encodes the spatial-temporal distribution of the image's semantic information.
We conduct a series of experiments on three public datasets of challenging urban scenes.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In a Simultaneous Localization and Mapping (SLAM) system, loop closure can
eliminate accumulated errors. It is accomplished by Visual Place Recognition (VPR),
the task of retrieving the current scene from a set of pre-stored sequential images
by matching scene descriptors. In urban scenes, appearance variation caused by
seasons and illumination poses a major challenge to the robustness of scene
descriptors. Semantic segmentation images deliver not only the shapes of objects
but also their categories and spatial relations, which are unaffected by the
appearance variation of the scene. Inspired by the Vector of Locally Aggregated
Descriptors (VLAD), in this paper we propose a novel image descriptor with
aggregated semantic skeleton representation (SSR), dubbed SSR-VLAD, for VPR under
drastic appearance variation. The SSR-VLAD of an image aggregates the semantic
skeleton features of each category and encodes the spatial-temporal distribution
of the image's semantic information. We conduct a series of experiments on three
public datasets of challenging urban scenes. Compared with four state-of-the-art
VPR methods (CoHOG, NetVLAD, LOST-X, and Region-VLAD), VPR by matching SSR-VLAD
outperforms those methods while maintaining competitive real-time performance.
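The listing carries no code, so the following is only a rough NumPy sketch of the VLAD-style, per-category aggregation the abstract describes. The function name, input shapes, hard assignment, and normalization choices are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def ssr_vlad(skeleton_feats, centers):
    """VLAD-style aggregation sketch (hypothetical shapes, not the authors' code).

    skeleton_feats: dict mapping semantic category -> (N_c, D) array of
        skeleton-point features extracted from one image's segmentation.
    centers: dict mapping category -> (K, D) array of per-category cluster
        centers (e.g., learned offline with k-means).
    Returns one L2-normalized descriptor of length sum over categories of K*D.
    """
    blocks = []
    for cat, feats in sorted(skeleton_feats.items()):
        C = centers[cat]                                          # (K, D)
        # Hard-assign each skeleton feature to its nearest center.
        d2 = ((feats[:, None, :] - C[None, :, :]) ** 2).sum(-1)   # (N, K)
        assign = d2.argmin(axis=1)
        # Accumulate residuals per center: the core VLAD operation.
        V = np.zeros_like(C)
        for k in range(C.shape[0]):
            members = feats[assign == k]
            if len(members):
                V[k] = (members - C[k]).sum(axis=0)
        # Intra-normalize each center's residual, then flatten.
        V /= np.linalg.norm(V, axis=1, keepdims=True) + 1e-12
        blocks.append(V.ravel())
    v = np.concatenate(blocks)
    return v / (np.linalg.norm(v) + 1e-12)
```

Two images would then be compared by the Euclidean or cosine distance between their descriptors, which is what makes the representation usable for retrieval-style VPR.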
Related papers
- InvSeg: Test-Time Prompt Inversion for Semantic Segmentation [33.60580908728705]
InvSeg is a test-time prompt inversion method for semantic segmentation.
We introduce Contrastive Soft Clustering to align masks with the image's structural information.
InvSeg learns context-rich text prompts in embedding space and achieves accurate semantic alignment across modalities.
arXiv Detail & Related papers (2024-10-15T10:20:31Z)
- VDNA-PR: Using General Dataset Representations for Robust Sequential Visual Place Recognition [17.393105901701098]
This paper adapts a general dataset representation technique to produce robust Visual Place Recognition (VPR) descriptors.
Our experiments show that our representation is more robust than current solutions to severe domain shifts away from the training data distribution.
arXiv Detail & Related papers (2024-03-14T01:30:28Z)
- Image2Sentence based Asymmetrical Zero-shot Composed Image Retrieval [92.13664084464514]
The task of composed image retrieval (CIR) aims to retrieve images based on the query image and the text describing the users' intent.
Existing methods have made great progress using advanced large vision-language (VL) models for the CIR task; however, they generally suffer from two main issues: a lack of labeled triplets for model training and difficulty of deployment in resource-constrained environments.
We propose Image2Sentence based Asymmetric zero-shot composed image retrieval (ISA), which takes advantage of the VL model and only relies on unlabeled images for composition learning.
arXiv Detail & Related papers (2024-03-03T07:58:03Z)
- CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition [73.51329037954866]
We propose a robust global representation method with cross-image correlation awareness for visual place recognition.
Our method uses the attention mechanism to correlate multiple images within a batch.
Our method outperforms state-of-the-art methods by a large margin with significantly less training time.
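As a rough illustration of the batch-level correlation idea, here is a minimal PyTorch sketch in which the descriptors of a batch are treated as a sequence and refined with standard self-attention. The module name, dimensions, and residual design are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class CrossImageCorrelation(nn.Module):
    """Hypothetical sketch: let each image descriptor in a batch attend to
    the others, so shared cues across the batch refine every descriptor."""
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, desc):                      # desc: (B, dim) per-image features
        x = desc.unsqueeze(0)                     # (1, B, dim): batch as one sequence
        out, _ = self.attn(x, x, x)               # each image attends to the others
        return self.norm(desc + out.squeeze(0))   # residual + norm, (B, dim)

refined = CrossImageCorrelation()(torch.randn(8, 512))  # e.g. a batch of 8 images
```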
arXiv Detail & Related papers (2024-02-29T15:05:11Z)
- Deep Semantic-Visual Alignment for Zero-Shot Remote Sensing Image Scene Classification [26.340737217001497]
Zero-shot learning (ZSL) allows for identifying novel classes that are not seen during training.
Previous ZSL models mainly depend on manually-labeled attributes or word embeddings extracted from language models to transfer knowledge from seen classes to novel classes.
We propose to collect visually detectable attributes automatically and predict attributes for each class by measuring the semantic-visual similarity between attributes and images (sketched below).
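A minimal sketch of scoring unseen classes by semantic-visual similarity, assuming a visual-to-attribute projection learned on seen classes; all names and shapes here are hypothetical.

```python
import numpy as np

def zsl_scores(img_feat, class_attrs, W):
    """Score unseen classes by cosine similarity in attribute space.

    img_feat:    (D,) visual feature of a test image.
    class_attrs: (C, A) attribute vectors, one row per unseen class.
    W:           (D, A) visual-to-attribute projection, assumed learned
                 on seen classes (illustrative, not the paper's model).
    """
    proj = img_feat @ W                              # (A,) predicted attributes
    proj = proj / (np.linalg.norm(proj) + 1e-12)
    attrs = class_attrs / (np.linalg.norm(class_attrs, axis=1, keepdims=True) + 1e-12)
    return attrs @ proj                              # (C,) similarity per class
```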
arXiv Detail & Related papers (2024-02-03T09:18:49Z)
- Optimal Transport Aggregation for Visual Place Recognition [9.192660643226372]
We introduce SALAD, which reformulates NetVLAD's soft-assignment of local features to clusters as an optimal transport problem.
In SALAD, we consider both feature-to-cluster and cluster-to-feature relations and we also introduce a 'dustbin' cluster, designed to selectively discard features deemed non-informative.
Our single-stage method not only surpasses single-stage baselines on public VPR datasets, but also surpasses two-stage methods that add re-ranking at significantly higher cost.
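The core idea, soft assignment as optimal transport with a dustbin, can be sketched with a few Sinkhorn-style normalization steps. This is an illustrative approximation, not SALAD's exact formulation.

```python
import numpy as np

def sinkhorn_assign(scores, n_iters=5):
    """Soft feature-to-cluster assignment with a dustbin column (sketch).

    scores: (N, K) feature-to-cluster similarities for N local features
    and K clusters. A dustbin column is appended so non-informative
    features can route their mass there instead of polluting real clusters.
    """
    aug = np.concatenate([scores, np.zeros((scores.shape[0], 1))], axis=1)
    P = np.exp(aug)
    for _ in range(n_iters):                # alternate row/column normalization
        P /= P.sum(axis=1, keepdims=True)   # each feature distributes mass 1
        P /= P.sum(axis=0, keepdims=True)   # balance mass across clusters
    return P[:, :-1]                        # drop the dustbin before aggregation
```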
arXiv Detail & Related papers (2023-11-27T15:46:19Z)
- IFSeg: Image-free Semantic Segmentation via Vision-Language Model [67.62922228676273]
We introduce a novel image-free segmentation task where the goal is to perform semantic segmentation given only a set of the target semantic categories.
We construct this artificial training data by creating a 2D map of random semantic categories and another map of their corresponding word tokens.
Our model not only establishes an effective baseline for this novel task but also demonstrates strong performance compared to existing methods.
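A minimal sketch of the artificial-data construction described above, assuming a list of category word tokens; the function and its shapes are hypothetical.

```python
import numpy as np

def make_image_free_sample(category_tokens, h=16, w=16, n_cats=4, seed=0):
    """Build one artificial training pair: a 2D map of random semantic
    categories and the matching map of their word tokens (IFSeg-style
    idea, illustrative shapes only)."""
    rng = np.random.default_rng(seed)
    cats = rng.choice(len(category_tokens), size=n_cats, replace=False)
    cat_map = rng.choice(cats, size=(h, w))          # random category layout
    token_map = np.array(category_tokens)[cat_map]   # corresponding word tokens
    return cat_map, token_map

cat_map, tok_map = make_image_free_sample(["road", "sky", "tree", "car", "building"])
```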
arXiv Detail & Related papers (2023-03-25T08:19:31Z)
- Fast and Efficient Scene Categorization for Autonomous Driving using VAEs [2.694218293356451]
Scene categorization is a useful precursor task that provides prior knowledge for advanced computer vision tasks.
We propose to generate a global descriptor that captures coarse features from the image and use a classification head to map the descriptors to 3 scene categories: Rural, Urban and Suburban.
The proposed global descriptor is very compact, with an embedding length of 128, significantly faster to compute, and robust to seasonal and illumination changes.
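A minimal PyTorch sketch of the described setup, assuming the 128-dimensional descriptor comes from a VAE encoder; the head's layer sizes are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class SceneHead(nn.Module):
    """Map a compact 128-d global descriptor to 3 scene classes
    (Rural, Urban, Suburban). Hidden size is an assumption."""
    def __init__(self, embed_dim=128, n_classes=3):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(embed_dim, 64), nn.ReLU(),
            nn.Linear(64, n_classes),
        )

    def forward(self, z):        # z: (B, 128) descriptor from the VAE encoder
        return self.fc(z)        # (B, 3) class logits

logits = SceneHead()(torch.randn(4, 128))   # e.g. a batch of 4 descriptors
```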
arXiv Detail & Related papers (2022-10-26T18:50:15Z)
- Sparse Visual Counterfactual Explanations in Image Space [50.768119964318494]
We present a novel model for visual counterfactual explanations in image space.
We show that it can be used to detect undesired behavior of ImageNet classifiers due to spurious features in the ImageNet dataset.
arXiv Detail & Related papers (2022-05-16T20:23:11Z)
- Weakly-supervised segmentation of referring expressions [81.73850439141374]
Text-grounded semantic SEGmentation (TSEG) learns segmentation masks directly from image-level referring expressions, without pixel-level annotations.
Our approach demonstrates promising results for weakly-supervised referring expression segmentation on the PhraseCut and RefCOCO datasets.
arXiv Detail & Related papers (2022-05-10T07:52:24Z)
- Semantically Tied Paired Cycle Consistency for Any-Shot Sketch-based Image Retrieval [55.29233996427243]
Low-shot sketch-based image retrieval is an emerging task in computer vision.
In this paper, we address any-shot, i.e. zero-shot and few-shot, sketch-based image retrieval (SBIR) tasks.
For solving these tasks, we propose a semantically aligned cycle-consistent generative adversarial network (SEM-PCYC).
Our results demonstrate a significant boost in any-shot performance over the state-of-the-art on the extended version of the Sketchy, TU-Berlin and QuickDraw datasets.
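SEM-PCYC's full objective is more involved, but the cycle-consistency term this family of models builds on can be sketched in a few lines of PyTorch; the generator names are placeholders, not the paper's code.

```python
import torch
import torch.nn.functional as F

def cycle_consistency_loss(x, G_ab, G_ba):
    """Generic cycle-consistency term: mapping a sketch/image feature x
    into the semantic space (G_ab) and back (G_ba) should reconstruct x.
    Illustrative sketch only; G_ab and G_ba are placeholder networks."""
    return F.l1_loss(G_ba(G_ab(x)), x)
```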
arXiv Detail & Related papers (2020-06-20T22:43:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.