Independently Keypoint Learning for Small Object Semantic Correspondence
- URL: http://arxiv.org/abs/2404.02678v1
- Date: Wed, 3 Apr 2024 12:21:41 GMT
- Title: Independently Keypoint Learning for Small Object Semantic Correspondence
- Authors: Hailong Jin, Huiying Li,
- Abstract summary: Keypoint Bounding box-centered Cropping method proposed for small object semantic correspondence.
KBCNet comprises a Cross-Scale Feature Alignment (CSFA) module and an efficient 4D convolutional decoder.
Our method demonstrates a substantial performance improvement of 7.5% on the SPair-71k dataset.
- Score: 7.3866687886529805
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semantic correspondence remains a challenging task for establishing correspondences between a pair of images with the same category or similar scenes due to the large intra-class appearance. In this paper, we introduce a novel problem called 'Small Object Semantic Correspondence (SOSC).' This problem is challenging due to the close proximity of keypoints associated with small objects, which results in the fusion of these respective features. It is difficult to identify the corresponding key points of the fused features, and it is also difficult to be recognized. To address this challenge, we propose the Keypoint Bounding box-centered Cropping (KBC) method, which aims to increase the spatial separation between keypoints of small objects, thereby facilitating independent learning of these keypoints. The KBC method is seamlessly integrated into our proposed inference pipeline and can be easily incorporated into other methodologies, resulting in significant performance enhancements. Additionally, we introduce a novel framework, named KBCNet, which serves as our baseline model. KBCNet comprises a Cross-Scale Feature Alignment (CSFA) module and an efficient 4D convolutional decoder. The CSFA module is designed to align multi-scale features, enriching keypoint representations by integrating fine-grained features and deep semantic features. Meanwhile, the 4D convolutional decoder, based on efficient 4D convolution, ensures efficiency and rapid convergence. To empirically validate the effectiveness of our proposed methodology, extensive experiments are conducted on three widely used benchmarks: PF-PASCAL, PF-WILLOW, and SPair-71k. Our KBC method demonstrates a substantial performance improvement of 7.5\% on the SPair-71K dataset, providing compelling evidence of its efficacy.
Related papers
- Learning Feature Matching via Matchable Keypoint-Assisted Graph Neural
Network [52.29330138835208]
Accurately matching local features between a pair of images is a challenging computer vision task.
Previous studies typically use attention based graph neural networks (GNNs) with fully-connected graphs over keypoints within/across images.
We propose MaKeGNN, a sparse attention-based GNN architecture which bypasses non-repeatable keypoints and leverages matchable ones to guide message passing.
arXiv Detail & Related papers (2023-07-04T02:50:44Z) - UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation [93.88170217725805]
We propose a 3D medical image segmentation approach, named UNETR++, that offers both high-quality segmentation masks as well as efficiency in terms of parameters, compute cost, and inference speed.
The core of our design is the introduction of a novel efficient paired attention (EPA) block that efficiently learns spatial and channel-wise discriminative features.
Our evaluations on five benchmarks, Synapse, BTCV, ACDC, BRaTs, and Decathlon-Lung, reveal the effectiveness of our contributions in terms of both efficiency and accuracy.
arXiv Detail & Related papers (2022-12-08T18:59:57Z) - PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency [38.93610732090426]
We present PointDSC, a novel deep neural network that explicitly incorporates spatial consistency for pruning outlier correspondences.
Our method outperforms the state-of-the-art hand-crafted and learning-based outlier rejection approaches on several real-world datasets.
arXiv Detail & Related papers (2021-03-09T14:56:08Z) - Decoupled and Memory-Reinforced Networks: Towards Effective Feature
Learning for One-Step Person Search [65.51181219410763]
One-step methods have been developed to handle pedestrian detection and identification sub-tasks using a single network.
There are two major challenges in the current one-step approaches.
We propose a decoupled and memory-reinforced network (DMRNet) to overcome these problems.
arXiv Detail & Related papers (2021-02-22T06:19:45Z) - A Holistically-Guided Decoder for Deep Representation Learning with
Applications to Semantic Segmentation and Object Detection [74.88284082187462]
One common strategy is to adopt dilated convolutions in the backbone networks to extract high-resolution feature maps.
We propose one novel holistically-guided decoder which is introduced to obtain the high-resolution semantic-rich feature maps.
arXiv Detail & Related papers (2020-12-18T10:51:49Z) - CoKe: Localized Contrastive Learning for Robust Keypoint Detection [24.167397429511915]
We show that keypoint kernels can be chosen to optimize three types of distances in the feature space.
We formulate this optimization process within a framework, which includes supervised contrastive learning.
CoKe achieves state-of-the-art results compared to approaches that jointly represent all keypoints holistically.
arXiv Detail & Related papers (2020-09-29T16:00:43Z) - Multi-scale Interactive Network for Salient Object Detection [91.43066633305662]
We propose the aggregate interaction modules to integrate the features from adjacent levels.
To obtain more efficient multi-scale features, the self-interaction modules are embedded in each decoder unit.
Experimental results on five benchmark datasets demonstrate that the proposed method without any post-processing performs favorably against 23 state-of-the-art approaches.
arXiv Detail & Related papers (2020-07-17T15:41:37Z) - Correspondence Networks with Adaptive Neighbourhood Consensus [22.013820169455812]
We propose a convolutional neural network architecture, called adaptive neighbourhood consensus network (ANC-Net)
ANC-Net can be trained end-to-end with sparse key-point annotations to handle this challenge.
We thoroughly evaluate the effectiveness of our method on various benchmarks, where it substantially outperforms state-of-the-art methods.
arXiv Detail & Related papers (2020-03-26T17:58:09Z) - Towards High Performance Human Keypoint Detection [87.1034745775229]
We find that context information plays an important role in reasoning human body configuration and invisible keypoints.
Inspired by this, we propose a cascaded context mixer ( CCM) which efficiently integrates spatial and channel context information.
To maximize CCM's representation capability, we develop a hard-negative person detection mining strategy and a joint-training strategy.
We present several sub-pixel refinement techniques for postprocessing keypoint predictions to improve detection accuracy.
arXiv Detail & Related papers (2020-02-03T02:24:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.