LMPNet for Weakly-supervised Keypoint Discovery
- URL: http://arxiv.org/abs/2507.02308v1
- Date: Thu, 03 Jul 2025 04:36:03 GMT
- Title: LMPNet for Weakly-supervised Keypoint Discovery
- Authors: Pei Guo, Ryan Farrell
- Abstract summary: We explore the task of semantic object keypoint discovery weakly-supervised by only category labels. This is achieved by transforming discriminatively-trained intermediate layer filters into keypoint detectors. Experiments show that LMPNet can (i) automatically discover semantic keypoints that are robust to object pose and (ii) achieve strong prediction accuracy comparable to a supervised pose estimation model.
- Score: 2.033434950296318
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we explore the task of semantic object keypoint discovery weakly-supervised by only category labels. This is achieved by transforming discriminatively-trained intermediate layer filters into keypoint detectors. We begin by identifying three preferred characteristics of keypoint detectors: (i) spatially sparse activations, (ii) consistency and (iii) diversity. Instead of relying on hand-crafted loss terms, a novel computationally-efficient leaky max pooling (LMP) layer is proposed to explicitly encourage final conv-layer filters to learn "non-repeatable local patterns" that are well aligned with object keypoints. Informed by visualizations, a simple yet effective selection strategy is proposed to ensure consistent filter activations, and attention mask-out is then applied to force the network to distribute its attention to the whole object instead of just the most discriminative region. For the final keypoint prediction, a learnable clustering layer is proposed to group keypoint proposals into keypoint predictions. The final model, named LMPNet, is highly interpretable in that it directly manipulates network filters to detect predefined concepts. Our experiments show that LMPNet can (i) automatically discover semantic keypoints that are robust to object pose and (ii) achieve strong prediction accuracy comparable to a supervised pose estimation model.
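The abstract names a leaky max pooling (LMP) layer but does not give its formula. As a rough illustration only, the sketch below assumes one plausible reading: the strongest response of a filter passes through at full weight, while every other spatial location is scaled by a small leak factor. The function name `leaky_max_pool` and the parameter `leak` are hypothetical and are not taken from the paper.

```python
import numpy as np


def leaky_max_pool(feature_map, leak=0.1):
    """Hypothetical leaky global max pooling over one filter's 2-D response map.

    Assumed form (not confirmed by the abstract): the maximum response is
    weighted by 1 and every other location is weighted by `leak`.
    leak=0 reduces to global max pooling; leak=1 reduces to sum pooling.
    """
    fmap = np.asarray(feature_map, dtype=float)
    # Start with the leak weight everywhere.
    weights = np.full(fmap.shape, leak, dtype=float)
    # Give the single strongest location full weight.
    peak = np.unravel_index(np.argmax(fmap), fmap.shape)
    weights[peak] = 1.0
    return float((weights * fmap).sum())


if __name__ == "__main__":
    response = [[1.0, 2.0], [3.0, 4.0]]
    print(leaky_max_pool(response, leak=0.0))
    print(leaky_max_pool(response, leak=0.5))
```

Under this assumed form, the leak factor interpolates between max pooling and sum pooling, so it controls how much responses away from the peak contribute to the pooled value.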
Related papers
- Purifying, Labeling, and Utilizing: A High-Quality Pipeline for Small Object Detection [83.90563802153707]
PLUSNet is a high-quality small object detection framework. It comprises three components: the Hierarchical Feature (HFP) framework for purifying upstream features, the Multiple Criteria Label Assignment (MCLA) for improving the quality of midstream training samples, and the Frequency Decoupled Head (FDHead) for more effectively exploiting information to accomplish downstream tasks.
arXiv Detail & Related papers (2025-04-29T10:11:03Z)
- CS-Net: Contribution-based Sampling Network for Point Cloud Simplification [50.55658910053004]
Point cloud sampling plays a crucial role in reducing computation costs and storage requirements for various vision tasks. Traditional sampling methods, such as farthest point sampling, lack task-specific information. We propose a contribution-based sampling network (CS-Net), where the sampling operation is formulated as a Top-k operation.
arXiv Detail & Related papers (2025-01-18T14:56:09Z)
- Bridge the Points: Graph-based Few-shot Segment Anything Semantically [79.1519244940518]
Recent advancements in pre-training techniques have enhanced the capabilities of vision foundation models.
Recent studies extend SAM to few-shot semantic segmentation (FSS).
We propose a simple yet effective approach based on graph analysis.
arXiv Detail & Related papers (2024-10-09T15:02:28Z)
- SC3K: Self-supervised and Coherent 3D Keypoints Estimation from Rotated, Noisy, and Decimated Point Cloud Data [17.471342278936365]
We propose a new method to infer keypoints from arbitrary object categories in practical scenarios where point cloud data (PCD) are noisy, down-sampled and arbitrarily rotated.
We achieve these desiderata by proposing a new self-supervised training strategy for keypoints estimation.
We compare the keypoints estimated by the proposed approach with those of the state-of-the-art unsupervised approaches.
arXiv Detail & Related papers (2023-08-10T08:10:01Z)
- Monte Carlo Linear Clustering with Single-Point Supervision is Enough for Infrared Small Target Detection [48.707233614642796]
Single-frame infrared small target (SIRST) detection aims at separating small targets from clutter backgrounds on infrared images.
Deep learning based methods have achieved promising performance on SIRST detection, but at the cost of a large amount of training data.
We propose the first method to achieve SIRST detection with single-point supervision.
arXiv Detail & Related papers (2023-04-10T08:04:05Z)
- Stochastic Deep Networks with Linear Competing Units for Model-Agnostic Meta-Learning [4.97235247328373]
This work addresses meta-learning (ML) by considering deep networks with local winner-takes-all (LWTA) activations.
This type of network unit results in sparse representations from each model layer, as the units are organized into blocks where only one unit generates a non-zero output.
Our approach produces state-of-the-art predictive accuracy on few-shot image classification and regression experiments, as well as reduced predictive error on an active learning setting.
arXiv Detail & Related papers (2022-08-02T16:19:54Z)
- GKNet: grasp keypoint network for grasp candidates detection [15.214390498300101]
This paper presents a different approach to grasp detection by treating it as keypoint detection.
The deep network detects each grasp candidate as a pair of keypoints, convertible to the grasp representation $g = \{x, y, w, \theta\}^T$, rather than a triplet or quartet of corner points.
arXiv Detail & Related papers (2021-06-16T00:34:55Z)
- Aligning Pretraining for Detection via Object-Level Contrastive Learning [57.845286545603415]
Image-level contrastive representation learning has proven to be highly effective as a generic model for transfer learning.
We argue that this could be sub-optimal and thus advocate a design principle which encourages alignment between the self-supervised pretext task and the downstream task.
Our method, called Selective Object COntrastive learning (SoCo), achieves state-of-the-art results for transfer performance on COCO detection.
arXiv Detail & Related papers (2021-06-04T17:59:52Z)
- Point-Level Temporal Action Localization: Bridging Fully-supervised Proposals to Weakly-supervised Losses [84.2964408497058]
Point-level temporal action localization (PTAL) aims to localize actions in untrimmed videos with only one timestamp annotation for each action instance.
Existing methods adopt the frame-level prediction paradigm to learn from the sparse single-frame labels.
This paper attempts to explore the proposal-based prediction paradigm for point-level annotations.
arXiv Detail & Related papers (2020-12-15T12:11:48Z)
- Conditional Link Prediction of Category-Implicit Keypoint Detection [26.400925420154245]
We propose an end-to-end category-implicit Keypoint and Link Prediction Network (KLPNet).
In our KLPNet, a novel Conditional Link Prediction Graph is proposed for link prediction among keypoints that are contingent on a predefined category.
Experiments conducted on three publicly available benchmarks demonstrate that our KLPNet consistently outperforms all other state-of-the-art approaches.
arXiv Detail & Related papers (2020-11-29T23:00:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.