Unsupervised Part Discovery via Descriptor-Based Masked Image Restoration with Optimized Constraints
- URL: http://arxiv.org/abs/2507.11985v1
- Date: Wed, 16 Jul 2025 07:33:14 GMT
- Title: Unsupervised Part Discovery via Descriptor-Based Masked Image Restoration with Optimized Constraints
- Authors: Jiahao Xia, Yike Wu, Wenjian Huang, Jianguo Zhang, Jian Zhang,
- Abstract summary: Part-level features are crucial for image understanding, but few studies focus on them because of the lack of fine-grained labels.<n>We present a more effective paradigm for unsupervised part discovery, named Masked Part Autoencoder (MPAE)<n>MPAE learns part descriptors as well as a feature map from the inputs and produces patch features from a masked version of the original images.<n>It robustly discovers meaningful parts that closely match the actual object shapes, even in complex scenarios.
- Score: 14.712229958844333
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Part-level features are crucial for image understanding, but few studies focus on them because of the lack of fine-grained labels. Although unsupervised part discovery can eliminate the reliance on labels, most of them cannot maintain robustness across various categories and scenarios, which restricts their application range. To overcome this limitation, we present a more effective paradigm for unsupervised part discovery, named Masked Part Autoencoder (MPAE). It first learns part descriptors as well as a feature map from the inputs and produces patch features from a masked version of the original images. Then, the masked regions are filled with the learned part descriptors based on the similarity between the local features and descriptors. By restoring these masked patches using the part descriptors, they become better aligned with their part shapes, guided by appearance features from unmasked patches. Finally, MPAE robustly discovers meaningful parts that closely match the actual object shapes, even in complex scenarios. Moreover, several looser yet more effective constraints are proposed to enable MPAE to identify the presence of parts across various scenarios and categories in an unsupervised manner. This provides the foundation for addressing challenges posed by occlusion and for exploring part similarity across multiple categories. Extensive experiments demonstrate that our method robustly discovers meaningful parts across various categories and scenarios. The code is available at the project https://github.com/Jiahao-UTS/MPAE.
Related papers
- DQFormer: Towards Unified LiDAR Panoptic Segmentation with Decoupled Queries [14.435906383301555]
We propose a novel framework dubbed DQFormer to implement semantic and instance segmentation in a unified workflow.
Specifically, we design a decoupled query generator to propose informative queries with semantics by localizing things/stuff positions.
We also introduce a query-oriented mask decoder to decode corresponding segmentation masks.
arXiv Detail & Related papers (2024-08-28T14:14:33Z) - Masked Multi-Query Slot Attention for Unsupervised Object Discovery [7.613552182035413]
In this work, we consider an object-centric approach in which DINO ViT features are reconstructed via a set of representations queried called slots.
We propose a masking scheme on input features that disregards the background regions, inducing our model to focus more on salient objects during the reconstruction phase.
Our experimental results and ablations on the PASCAL-VOC 2012 dataset show the importance of each component and highlight how their combination consistently improves object localization.
arXiv Detail & Related papers (2024-04-30T15:51:05Z) - Open-Vocabulary Segmentation with Unpaired Mask-Text Supervision [87.15580604023555]
Unpair-Seg is a novel weakly-supervised open-vocabulary segmentation framework.
It learns from unpaired image-mask and image-text pairs, which can be independently and efficiently collected.
It achieves 14.6% and 19.5% mIoU on the ADE-847 and PASCAL Context-459 datasets.
arXiv Detail & Related papers (2024-02-14T06:01:44Z) - PDiscoNet: Semantically consistent part discovery for fine-grained
recognition [62.12602920807109]
We propose PDiscoNet to discover object parts by using only image-level class labels along with priors encouraging the parts to be.
Our results on CUB, CelebA, and PartImageNet show that the proposed method provides substantially better part discovery performance than previous methods.
arXiv Detail & Related papers (2023-09-06T17:19:29Z) - Multi-spectral Class Center Network for Face Manipulation Detection and Localization [52.569170436393165]
We propose a novel Multi-Spectral Class Center Network (MSCCNet) for face manipulation detection and localization.
Based on the features of different frequency bands, the MSCC module collects multi-spectral class centers and computes pixel-to-class relations.
Applying multi-spectral class-level representations suppresses the semantic information of the visual concepts which is insensitive to manipulated regions of forgery images.
arXiv Detail & Related papers (2023-05-18T08:09:20Z) - Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual
Mask Annotations [86.47908754383198]
Open-Vocabulary (OV) methods leverage large-scale image-caption pairs and vision-language models to learn novel categories.
Our method generates pseudo-mask annotations by leveraging the localization ability of a pre-trained vision-language model for objects present in image-caption pairs.
Our method trained with just pseudo-masks significantly improves the mAP scores on the MS-COCO dataset and OpenImages dataset.
arXiv Detail & Related papers (2023-03-29T17:58:39Z) - SIM: Semantic-aware Instance Mask Generation for Box-Supervised Instance
Segmentation [22.930296667684125]
We propose a new box-supervised instance segmentation approach by developing a Semantic-aware Instance Mask (SIM) generation paradigm.
Considering that the semantic-aware prototypes cannot distinguish different instances of the same semantics, we propose a self-correction mechanism.
Extensive experimental results demonstrate the superiority of our proposed SIM approach over other state-of-the-art methods.
arXiv Detail & Related papers (2023-03-14T05:59:25Z) - Dynamic Prototype Mask for Occluded Person Re-Identification [88.7782299372656]
Existing methods mainly address this issue by employing body clues provided by an extra network to distinguish the visible part.
We propose a novel Dynamic Prototype Mask (DPM) based on two self-evident prior knowledge.
Under this condition, the occluded representation could be well aligned in a selected subspace spontaneously.
arXiv Detail & Related papers (2022-07-19T03:31:13Z) - Beyond the Prototype: Divide-and-conquer Proxies for Few-shot
Segmentation [63.910211095033596]
Few-shot segmentation aims to segment unseen-class objects given only a handful of densely labeled samples.
We propose a simple yet versatile framework in the spirit of divide-and-conquer.
Our proposed approach, named divide-and-conquer proxies (DCP), allows for the development of appropriate and reliable information.
arXiv Detail & Related papers (2022-04-21T06:21:14Z) - Image Inpainting by End-to-End Cascaded Refinement with Mask Awareness [66.55719330810547]
Inpainting arbitrary missing regions is challenging because learning valid features for various masked regions is nontrivial.
We propose a novel mask-aware inpainting solution that learns multi-scale features for missing regions in the encoding phase.
Our framework is validated both quantitatively and qualitatively via extensive experiments on three public datasets.
arXiv Detail & Related papers (2021-04-28T13:17:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.