SFD2: Semantic-guided Feature Detection and Description
- URL: http://arxiv.org/abs/2304.14845v2
- Date: Sun, 11 Jun 2023 16:35:08 GMT
- Title: SFD2: Semantic-guided Feature Detection and Description
- Authors: Fei Xue and Ignas Budvytis and Roberto Cipolla
- Abstract summary: We propose to extract globally reliable features by implicitly embedding high-level semantics into both the detection and description processes.
Specifically, our semantic-aware detector is able to detect keypoints from reliable regions and suppress unreliable areas.
This boosts the accuracy of keypoint matching by reducing the number of features sensitive to appearance changes.
- Score: 34.36397639248686
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual localization is a fundamental task for various applications including
autonomous driving and robotics. Prior methods focus on extracting large
numbers of often redundant, locally reliable features, resulting in limited
efficiency and accuracy, especially in large-scale environments under
challenging conditions. Instead, we propose to extract globally reliable
features by implicitly embedding high-level semantics into both the detection
and description processes. Specifically, our semantic-aware detector is able to
detect keypoints from reliable regions (e.g. buildings, traffic lanes) and
suppress unreliable areas (e.g. sky, cars) implicitly, rather than relying on
explicit semantic labels. This boosts the accuracy of keypoint matching by
reducing the number of features sensitive to appearance changes and avoiding
the need for additional segmentation networks at test time. Moreover, our
descriptors are augmented with semantics and have stronger discriminative
ability, providing more inliers at test time. In particular, experiments on the
long-term, large-scale visual localization datasets Aachen Day-Night and
RobotCar-Seasons demonstrate that our model outperforms previous local features
and achieves accuracy competitive with advanced matchers while being about 2
and 3 times faster when using 2k and 4k keypoints, respectively.
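To make the detection mechanism concrete, here is a minimal, hypothetical sketch of semantic-guided keypoint selection: a dense detection score map is down-weighted by a per-pixel reliability map before top-k selection. Note that in SFD2 itself the semantic knowledge is distilled into the detector during training rather than supplied as a separate map at test time, so the reliability input below is purely illustrative.

```python
import torch

def semantic_guided_keypoints(score_map: torch.Tensor,
                              reliability_map: torch.Tensor,
                              top_k: int = 2000):
    """Hypothetical sketch: suppress detections in unreliable regions.

    score_map       -- (H, W) raw keypoint detection scores
    reliability_map -- (H, W) semantic reliability in [0, 1], e.g. high
                       on buildings, low on sky/cars; in SFD2 this
                       knowledge lives inside the detector itself.
    """
    h, w = score_map.shape
    # Down-weight responses in regions whose appearance is unstable.
    weighted = score_map * reliability_map
    # Keep the top-k strongest semantically reliable responses.
    scores, idx = torch.topk(weighted.flatten(), k=min(top_k, h * w))
    ys, xs = idx // w, idx % w
    return torch.stack([xs, ys], dim=1), scores  # (k, 2) keypoints, (k,) scores

# Dummy usage: random maps stand in for network outputs.
kpts, conf = semantic_guided_keypoints(torch.rand(480, 640),
                                       torch.rand(480, 640))
```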
Related papers
- A Hitchhiker's Guide to Fine-Grained Face Forgery Detection Using Common Sense Reasoning [9.786907179872815] (2024-10-01)
The potential of vision and language remains underexplored in face forgery detection.
There is a need for a methodology that recasts face forgery detection as a Visual Question Answering (VQA) task.
To address this gap, we propose a multi-staged approach that diverges from the traditional binary decision paradigm.
- Learning to Make Keypoints Sub-Pixel Accurate [80.55676599677824] (2024-07-16)
This work addresses the challenge of sub-pixel accuracy in detecting 2D local features.
We propose a novel network that enhances any detector with sub-pixel precision by learning an offset vector for detected features.
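A minimal sketch of the offset idea, assuming a small regression head on top of per-keypoint features; this illustrates the general technique, not the paper's actual network.

```python
import torch
import torch.nn as nn

class SubPixelOffsetHead(nn.Module):
    """Hypothetical sketch: regress a fractional (dx, dy) per keypoint
    from a local feature and add it to the integer detection."""

    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(),
            nn.Linear(64, 2), nn.Tanh(),  # raw offsets bounded to (-1, 1)
        )

    def forward(self, keypoints: torch.Tensor, feats: torch.Tensor):
        # keypoints: (N, 2) integer pixel coordinates
        # feats:     (N, feat_dim) features sampled at those keypoints
        offsets = 0.5 * self.mlp(feats)      # limit to +/- 0.5 px
        return keypoints.float() + offsets   # sub-pixel coordinates

# Dummy usage:
head = SubPixelOffsetHead()
refined = head(torch.randint(0, 640, (100, 2)), torch.randn(100, 128))
```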
- Simultaneous Clutter Detection and Semantic Segmentation of Moving Objects for Automotive Radar Data [12.96486891333286] (2023-11-13)
Radar sensors are an important part of the environment perception system of autonomous vehicles.
One of the first steps during the processing of radar point clouds is often the detection of clutter.
Another common objective is the semantic segmentation of moving road users.
We show that our setup is highly effective and outperforms every existing network for semantic segmentation on the RadarScenes dataset.
- Open-Vocabulary Animal Keypoint Detection with Semantic-feature Matching [74.75284453828017] (2023-10-08)
The Open-Vocabulary Keypoint Detection (OVKD) task is designed to use text prompts to identify arbitrary keypoints across any species.
We develop a novel framework named Open-Vocabulary Keypoint Detection with Semantic-feature Matching (KDSM).
This framework combines vision and language models, creating an interplay between language features and local keypoint visual features.
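The language-to-vision matching underlying such a framework can be sketched as a similarity search: embed a text prompt and find the image location whose visual feature is most similar. The code below is an illustrative simplification, not KDSM itself; the input tensors stand in for real text and image encoders.

```python
import torch
import torch.nn.functional as F

def match_keypoint_from_prompt(text_emb: torch.Tensor,
                               feat_map: torch.Tensor):
    """Illustrative sketch of semantic-feature matching (not KDSM).

    text_emb -- (D,) embedding of a prompt such as "left eye"
    feat_map -- (D, H, W) dense visual features from an image encoder
    Returns the (x, y) location with the highest cosine similarity.
    """
    d, h, w = feat_map.shape
    feats = F.normalize(feat_map.reshape(d, -1), dim=0)  # (D, H*W)
    query = F.normalize(text_emb, dim=0)                 # (D,)
    sim = query @ feats                                  # (H*W,)
    idx = int(sim.argmax())
    return idx % w, idx // w, sim[idx].item()

# Dummy usage with random stand-ins for the encoders:
x, y, score = match_keypoint_from_prompt(torch.randn(512),
                                         torch.randn(512, 60, 80))
```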
- Semantics Meets Temporal Correspondence: Self-supervised Object-centric Learning in Videos [63.94040814459116] (2023-08-19)
Self-supervised methods have shown remarkable progress in learning high-level semantics and low-level temporal correspondence.
We propose a novel semantic-aware masked slot attention on top of the fused semantic features and correspondence maps.
We adopt semantic- and instance-level temporal consistency as self-supervision to encourage temporally coherent object-centric representations.
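A heavily simplified, hypothetical rendering of masked slot attention: learned slots compete for input locations via attention, and a semantic mask zeroes out locations the slots should ignore. The real module adds learned projections, GRU-based slot updates, and the fused semantic/correspondence features described above.

```python
import torch

def masked_slot_attention(feats, slots, mask, iters: int = 3):
    """Hypothetical sketch of semantic-aware masked slot attention.

    feats -- (N, D) flattened per-location features
    slots -- (K, D) initial slot vectors
    mask  -- (N,) 0/1 mask of locations slots may attend to
    """
    d = feats.shape[1]
    for _ in range(iters):
        logits = slots @ feats.t() / d ** 0.5      # (K, N)
        attn = logits.softmax(dim=0) * mask[None]  # slots compete per location
        attn = attn / attn.sum(dim=1, keepdim=True).clamp(min=1e-8)
        slots = attn @ feats                       # weighted-mean slot update
    return slots

# Dummy usage: 5 object slots over a 60x80 feature grid.
out = masked_slot_attention(torch.randn(60 * 80, 64),
                            torch.randn(5, 64),
                            (torch.rand(60 * 80) > 0.3).float())
```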
- Cognitive TransFuser: Semantics-guided Transformer-based Sensor Fusion for Improved Waypoint Prediction [38.971222477695214] (2023-08-04)
Our RGB-LIDAR-based multi-task feature fusion network, coined Cognitive TransFuser, exceeds the baseline network by a significant margin, enabling safer and more complete road navigation.
We validate the proposed network on the Town05 Short and Town05 Long Benchmark through extensive experiments, achieving up to 44.2 FPS real-time inference time.
- Object Segmentation by Mining Cross-Modal Semantics [68.88086621181628] (2023-05-17)
We propose a novel approach by mining the Cross-Modal Semantics to guide the fusion and decoding of multimodal features.
Specifically, we propose a novel network, termed XMSNet, consisting of (1) all-round attentive fusion (AF), (2) coarse-to-fine decoder (CFD), and (3) cross-layer self-supervision.
- Few-Shot Keypoint Detection as Task Adaptation via Latent Embeddings [17.04471874483516] (2021-12-09)
Existing approaches either compute dense keypoint embeddings in a single forward pass, or allocate their full capacity to a sparse set of points.
In this paper, we explore a middle ground based on the observation that the number of relevant points at a given time is typically small.
Our main contribution is a novel architecture, inspired by few-shot task adaptation, which allows a sparse-style network to condition on a keypoint embedding.
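One plausible reading of conditioning a sparse-style network on a keypoint embedding is feature-wise (FiLM-style) modulation: the embedding predicts a per-channel scale and shift applied to shared features, so a single network localizes whichever keypoint the embedding describes. This sketch is a hypothetical illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class KeypointConditionedHead(nn.Module):
    """Hypothetical FiLM-style sketch: a latent keypoint embedding
    modulates a shared feature map before heatmap prediction."""

    def __init__(self, feat_dim: int = 64, emb_dim: int = 32):
        super().__init__()
        self.film = nn.Linear(emb_dim, 2 * feat_dim)  # -> scale, shift
        self.head = nn.Conv2d(feat_dim, 1, kernel_size=1)

    def forward(self, feat_map, kp_emb):
        # feat_map: (B, C, H, W) backbone features
        # kp_emb:   (B, emb_dim) embedding of the target keypoint
        scale, shift = self.film(kp_emb).chunk(2, dim=-1)
        x = feat_map * scale[..., None, None] + shift[..., None, None]
        return self.head(x)  # (B, 1, H, W) keypoint heatmap

# Dummy usage:
model = KeypointConditionedHead()
heatmap = model(torch.randn(2, 64, 60, 80), torch.randn(2, 32))
```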
- Efficient Person Search: An Anchor-Free Approach [86.45858994806471] (2021-09-01)
Person search aims to simultaneously localize and identify a query person from realistic, uncropped images.
To achieve this goal, state-of-the-art models typically add a re-id branch upon two-stage detectors like Faster R-CNN.
In this work, we present an anchor-free approach to efficiently tackling this challenging task, by introducing the following dedicated designs.
- SegVoxelNet: Exploring Semantic Context and Depth-aware Features for 3D Vehicle Detection from Point Cloud [39.99118618229583] (2020-02-13)
We propose a unified model, SegVoxelNet, to address these two problems.
A semantic context encoder is proposed to leverage the free-of-charge semantic segmentation masks in the bird's eye view.
A novel depth-aware head is designed to explicitly model the distribution differences and each part of the depth-aware head is made to focus on its own target detection range.
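The depth-aware head idea, specializing predictors by distance band, can be sketched as routing candidate features to different heads by their range. This is a hypothetical illustration, not SegVoxelNet's implementation, and the band boundaries are made-up values.

```python
import torch
import torch.nn as nn

class DepthAwareHeads(nn.Module):
    """Hypothetical sketch: one detection head per depth band, so each
    head specializes in its own target range (near/mid/far)."""

    def __init__(self, feat_dim: int = 64, num_outputs: int = 7,
                 bands=((0.0, 20.0), (20.0, 40.0), (40.0, 70.0))):
        super().__init__()
        self.bands = bands  # made-up range boundaries in meters
        self.heads = nn.ModuleList(
            nn.Linear(feat_dim, num_outputs) for _ in bands)

    def forward(self, feats, depths):
        # feats:  (N, feat_dim) per-candidate features
        # depths: (N,) distance of each candidate from the sensor
        out = feats.new_zeros(feats.shape[0], self.heads[0].out_features)
        for (lo, hi), head in zip(self.bands, self.heads):
            sel = (depths >= lo) & (depths < hi)
            if sel.any():
                out[sel] = head(feats[sel])  # only this band's head fires
        return out

# Dummy usage: 100 candidates with depths in [0, 70) m.
model = DepthAwareHeads()
preds = model(torch.randn(100, 64), torch.rand(100) * 70)
```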
This list is automatically generated from the titles and abstracts of the papers on this site.