Point-Level Region Contrast for Object Detection Pre-Training
- URL: http://arxiv.org/abs/2202.04639v1
- Date: Wed, 9 Feb 2022 18:56:41 GMT
- Title: Point-Level Region Contrast for Object Detection Pre-Training
- Authors: Yutong Bai, Xinlei Chen, Alexander Kirillov, Alan Yuille, Alexander C. Berg
- Abstract summary: We present point-level region contrast, a self-supervised pre-training approach for the task of object detection.
Our approach performs contrastive learning by directly sampling individual point pairs from different regions.
Compared to an aggregated representation per region, our approach is more robust to the change in input region quality.
- Score: 147.47349344401806
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work we present point-level region contrast, a self-supervised
pre-training approach for the task of object detection. This approach is
motivated by the two key factors in detection: localization and recognition.
While accurate localization favors models that operate at the pixel- or
point-level, correct recognition typically relies on a more holistic,
region-level view of objects. Incorporating this perspective in pre-training,
our approach performs contrastive learning by directly sampling individual
point pairs from different regions. Compared to an aggregated representation
per region, our approach is more robust to the change in input region quality,
and further enables us to implicitly improve initial region assignments via
online knowledge distillation during training. Both advantages are important
when dealing with imperfect regions encountered in the unsupervised setting.
Experiments show point-level region contrast improves on state-of-the-art
pre-training methods for object detection and segmentation across multiple
tasks and datasets, and we provide extensive ablation studies and
visualizations to aid understanding. Code will be made available.
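The core idea in the abstract, contrasting individual points sampled from regions instead of one pooled vector per region, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `sample_points` and `point_region_contrast_loss`, the temperature of 0.2, the averaged InfoNCE-style loss, and the toy quadrant "regions" are all hypothetical choices made for this example. The online knowledge distillation step mentioned in the abstract is not covered.

```python
import numpy as np


def sample_points(region_map, num_points, rng):
    """Uniformly sample point indices from a flattened region map (hypothetical helper)."""
    return rng.choice(region_map.size, size=num_points, replace=False)


def point_region_contrast_loss(feat_a, feat_b, regions_a, regions_b, temperature=0.2):
    """InfoNCE-style loss over point pairs (illustrative form, assumed here).

    feat_a, feat_b: (P, C) features of points sampled from two views.
    regions_a, regions_b: (P,) integer region id of each sampled point.
    A pair (i, j) counts as positive when both points share a region id,
    and as negative otherwise.
    """
    a = feat_a / np.linalg.norm(feat_a, axis=1, keepdims=True)
    b = feat_b / np.linalg.norm(feat_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    # Numerically stable log-softmax over all points of the second view.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    pos = regions_a[:, None] == regions_b[None, :]
    has_pos = pos.any(axis=1)
    if not has_pos.any():
        return 0.0  # no anchor has a positive partner
    # Average the negative log-probability over each anchor's positives.
    per_anchor = -(log_prob * pos).sum(axis=1)[has_pos] / pos.sum(axis=1)[has_pos]
    return float(per_anchor.mean())


# Toy setup: an 8x8 grid split into four quadrant "regions".
rng = np.random.default_rng(0)
region_map = np.kron(np.arange(4).reshape(2, 2), np.ones((4, 4), dtype=int)).ravel()
# Simplification: both views share one dense feature map (no augmentation).
dense_feat = np.eye(4)[region_map] + 0.01 * rng.standard_normal((64, 4))

idx_a = sample_points(region_map, 32, rng)
idx_b = sample_points(region_map, 32, rng)

loss_aligned = point_region_contrast_loss(
    dense_feat[idx_a], dense_feat[idx_b], region_map[idx_a], region_map[idx_b]
)
# Shuffling region ids of the second view mimics poor region assignments.
loss_shuffled = point_region_contrast_loss(
    dense_feat[idx_a], dense_feat[idx_b],
    region_map[idx_a], rng.permutation(region_map[idx_b]),
)
print(loss_aligned, loss_shuffled)
```

In this toy setting the loss is lower when the region ids agree with the feature structure than when they are shuffled, which is the signal a pre-training objective of this kind would exploit. Because every sampled point contributes its own term, no per-region pooling step is needed.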
Related papers
- Image Copy-Move Forgery Detection via Deep PatchMatch and Pairwise Ranking Learning [39.85737063875394]
This study develops a novel end-to-end CMFD framework that integrates the strengths of conventional and deep learning methods.
Unlike existing deep models, our approach utilizes features extracted from high-resolution scales to seek explicit and reliable point-to-point matching.
By leveraging the strong prior of point-to-point matches, the framework can identify subtle differences and effectively discriminate between source and target regions.
arXiv Detail & Related papers (2024-04-26T10:38:17Z)
- Mapping High-level Semantic Regions in Indoor Environments without Object Recognition [50.624970503498226]
The present work proposes a method for semantic region mapping via embodied navigation in indoor environments.
To enable region identification, the method uses a vision-to-language model to provide scene information for mapping.
By projecting egocentric scene understanding into the global frame, the proposed method generates a semantic map as a distribution over possible region labels at each location.
arXiv Detail & Related papers (2024-03-11T18:09:50Z)
- From Global to Local: Multi-scale Out-of-distribution Detection [129.37607313927458]
Out-of-distribution (OOD) detection aims to detect "unknown" data whose labels have not been seen during the in-distribution (ID) training process.
Recent progress in representation learning gives rise to distance-based OOD detection.
We propose Multi-scale OOD DEtection (MODE), a first framework leveraging both global visual information and local region details.
arXiv Detail & Related papers (2023-08-20T11:56:25Z)
- ROIFormer: Semantic-Aware Region of Interest Transformer for Efficient Self-Supervised Monocular Depth Estimation [6.923035780685481]
We propose an efficient local adaptive attention method for geometric aware representation enhancement.
We leverage geometric cues from semantic information to learn local adaptive bounding boxes to guide unsupervised feature aggregation.
Our proposed method establishes a new state of the art on the self-supervised monocular depth estimation task.
arXiv Detail & Related papers (2022-12-12T06:38:35Z)
- LEAD: Self-Supervised Landmark Estimation by Aligning Distributions of Feature Similarity [49.84167231111667]
Existing works in self-supervised landmark detection are based on learning dense (pixel-level) feature representations from an image.
We introduce an approach to enhance the learning of dense equivariant representations in a self-supervised fashion.
We show that having such a prior in the feature extractor helps in landmark detection, even under a drastically limited number of annotations.
arXiv Detail & Related papers (2022-04-06T17:48:18Z)
- Cross-Image Region Mining with Region Prototypical Network for Weakly Supervised Segmentation [45.39679291105364]
We propose a region network RPNet to explore the cross-image object diversity of the training set.
Similar object parts across images are identified via region feature comparison.
Experiments show that the proposed method generates more complete and accurate pseudo object masks.
arXiv Detail & Related papers (2021-08-17T02:51:02Z)
- Align Yourself: Self-supervised Pre-training for Fine-grained Recognition via Saliency Alignment [34.38172454910976]
Cross-view Saliency Alignment (CVSA) is a contrastive learning framework that first crops and swaps saliency regions of images as a novel view generation and then guides the model to localize on the foreground object via a cross-view alignment loss.
Experiments on four popular fine-grained classification benchmarks show that CVSA significantly improves the learned representation.
arXiv Detail & Related papers (2021-06-30T02:56:26Z)
- PGL: Prior-Guided Local Self-supervised Learning for 3D Medical Image Segmentation [87.50205728818601]
We propose a Prior-Guided Local (PGL) self-supervised model that learns the region-wise local consistency in the latent feature space.
Our PGL model learns the distinctive representations of local regions, and hence is able to retain structural information.
arXiv Detail & Related papers (2020-11-25T11:03:11Z)
- Interpretable and Accurate Fine-grained Recognition via Region Grouping [14.28113520947247]
We present an interpretable deep model for fine-grained visual recognition.
At the core of our method lies the integration of region-based part discovery and attribution within a deep neural network.
Our results compare favorably to state-of-the-art methods on classification tasks.
arXiv Detail & Related papers (2020-05-21T01:18:26Z)
- Cross-domain Object Detection through Coarse-to-Fine Feature Adaptation [62.29076080124199]
This paper proposes a novel coarse-to-fine feature adaptation approach to cross-domain object detection.
At the coarse-grained stage, foreground regions are extracted by adopting the attention mechanism, and aligned according to their marginal distributions.
At the fine-grained stage, we conduct conditional distribution alignment of foregrounds by minimizing the distance of global prototypes with the same category but from different domains.
arXiv Detail & Related papers (2020-03-23T13:40:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.