Patch-Level Contrasting without Patch Correspondence for Accurate and Dense Contrastive Representation Learning
- URL: http://arxiv.org/abs/2306.13337v1
- Date: Fri, 23 Jun 2023 07:38:09 GMT
- Title: Patch-Level Contrasting without Patch Correspondence for Accurate and Dense Contrastive Representation Learning
- Authors: Shaofeng Zhang, Feng Zhu, Rui Zhao, Junchi Yan
- Abstract summary: ADCLR is a self-supervised learning framework for learning accurate and dense vision representation.
Our approach achieves new state-of-the-art performance for contrastive methods.
- Score: 79.43940012723539
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose ADCLR: Accurate and Dense Contrastive Representation Learning, a
novel self-supervised learning framework for learning accurate and dense vision
representation. To extract spatial-sensitive information, ADCLR introduces
query patches for contrasting in addition to global contrasting. Compared
with previous dense contrasting methods, ADCLR enjoys three main merits: i)
it achieves both globally discriminative and spatially sensitive representations,
ii) it is model-efficient (no extra parameters beyond the global contrasting
baseline), and iii) it is correspondence-free and thus simpler to implement. Our
approach achieves new state-of-the-art performance for contrastive methods. On
classification tasks with ViT-S, ADCLR achieves 77.5% top-1 accuracy on
ImageNet with linear probing, outperforming our baseline DINO (i.e., the same
model without our devised techniques plugged in) by 0.5%. With ViT-B, ADCLR
achieves 79.8% and 84.0% accuracy on ImageNet by linear probing and
fine-tuning, outperforming iBOT by 0.3% and 0.2%, respectively. For dense
tasks on MS-COCO, ADCLR achieves significant improvements of 44.3% AP on
object detection and 39.7% AP on instance segmentation, outperforming the
previous SOTA method SelfPatch by 2.2% and 1.2%, respectively. On ADE20K,
ADCLR outperforms SelfPatch by 1.0% mIoU and 1.2% mAcc on the segmentation
task.
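To make the mechanism concrete, here is a minimal PyTorch-style sketch of correspondence-free patch-level contrasting on top of a global objective. It is an illustration under stated assumptions, not the released code: the encoder interface, the function names, and the use of InfoNCE (the actual method builds on DINO's softmax cross-entropy objective) are all assumptions.

```python
# Illustrative sketch of ADCLR-style, correspondence-free patch contrasting.
# Encoder interface and InfoNCE objective are assumptions, not the official code.
import torch
import torch.nn.functional as F

def info_nce(q, k, tau=0.2):
    """Standard InfoNCE: the i-th row of q should match the i-th row of k."""
    q = F.normalize(q, dim=-1)
    k = F.normalize(k, dim=-1)
    logits = q @ k.t() / tau
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, labels)

def adclr_style_loss(student, teacher, view1, view2, query_patches, tau=0.2):
    """Global [CLS] contrast plus query-patch contrast.

    `student`/`teacher` are assumed to be ViT encoders returning
    (cls_embedding, query_embeddings) when extra query-patch tokens are
    appended to the input sequence.
    """
    s_cls, s_query = student(view1, query_patches)
    with torch.no_grad():               # teacher is EMA-updated, no gradients
        t_cls, t_query = teacher(view2, query_patches)

    loss_global = info_nce(s_cls, t_cls, tau)
    # The same query patches attend to both augmented views, so the i-th
    # query token in one branch directly matches the i-th in the other:
    # no patch correspondence between views has to be computed.
    loss_patch = info_nce(s_query.flatten(0, 1), t_query.flatten(0, 1), tau)
    return loss_global + loss_patch
```

Because the positive pair is defined by sharing the query token rather than by locating the same image region in both views, the extra term adds no parameters beyond the global contrasting baseline.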
Related papers
- Sebica: Lightweight Spatial and Efficient Bidirectional Channel Attention Super Resolution Network [0.0]
Single Image Super-Resolution (SISR) is a vital technique for improving the visual quality of low-resolution images.
We present Sebica, a lightweight network that incorporates spatial and efficient bidirectional channel attention mechanisms.
Sebica significantly reduces computational costs while maintaining high reconstruction quality.
arXiv Detail & Related papers (2024-10-27T18:27:07Z)
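As a rough illustration of what an efficient channel-attention block looks like, here is an ECA-style module in PyTorch; Sebica's exact spatial and bidirectional variants are not reproduced here, and the class name is hypothetical.

```python
# ECA-style efficient channel attention (illustrative; not Sebica's exact block).
import torch
import torch.nn as nn

class EfficientChannelAttention(nn.Module):
    def __init__(self, kernel_size: int = 3):
        super().__init__()
        # A cheap 1D convolution over the pooled channel descriptor replaces
        # the fully connected layers of SE-style attention, keeping cost low.
        self.conv = nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map from the super-resolution backbone
        y = x.mean(dim=(2, 3))                    # global average pool -> (B, C)
        y = self.conv(y.unsqueeze(1)).squeeze(1)  # local cross-channel interaction
        return x * self.sigmoid(y)[:, :, None, None]
```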
- DAPONet: A Dual Attention and Partially Overparameterized Network for Real-Time Road Damage Detection [4.185368042845483]
We propose DAPONet to enhance real-time road damage detection using street-view image data (the SVRDD dataset).
DAPONet achieves a mAP50 of 70.1% on the SVRDD dataset, outperforming YOLOv10n by 10.4%, while reducing parameters to 1.6M and FLOPs to 1.7G, representing reductions of 41% and 80%, respectively.
On the MS COCO 2017 val dataset, DAPONet achieves an mAP50-95 of 33.4%, 0.8% higher than EfficientDet-D1, with a 74% reduction in both parameters and FLOPs.
arXiv Detail & Related papers (2024-09-03T04:53:32Z)
- KAN-RCBEVDepth: A multi-modal fusion algorithm in object detection for autonomous driving [2.382388777981433]
This paper introduces the KAN-RCBEVDepth method to enhance 3D object detection in autonomous driving.
Our unique Bird's Eye View-based approach significantly improves detection accuracy and efficiency.
The code will be released at https://www.laitiamo.com/laitiamo/RCBEVDepth-KAN.
arXiv Detail & Related papers (2024-08-04T16:54:49Z)
- Boosting Visual-Language Models by Exploiting Hard Samples [126.35125029639168]
HELIP is a cost-effective strategy tailored to enhance the performance of existing CLIP models.
Our method allows for effortless integration with existing models' training pipelines.
On comprehensive benchmarks, HELIP consistently boosts existing models to achieve leading performance.
arXiv Detail & Related papers (2023-05-09T07:00:17Z)
- (Certified!!) Adversarial Robustness for Free! [116.6052628829344]
We certify 71% accuracy on ImageNet under adversarial perturbations constrained to be within a 2-norm of 0.5.
We obtain these results using only pretrained diffusion models and image classifiers, without requiring any fine-tuning or retraining of model parameters.
arXiv Detail & Related papers (2022-06-21T17:27:27Z)
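The recipe behind this result (diffusion denoised smoothing) is simple enough to sketch: add Gaussian noise to the input, denoise it with an off-the-shelf diffusion model, classify, and aggregate votes. The sketch below shows only the prediction step; `denoiser` and `classifier` are assumed pretrained models, and a formal certificate additionally needs the binomial confidence bounds of randomized smoothing (Cohen et al., 2019).

```python
# Illustrative sketch of diffusion denoised smoothing (prediction step only).
import torch

@torch.no_grad()
def smoothed_predict(x, denoiser, classifier, num_classes, sigma=0.5, n=100):
    """Majority-vote prediction of the smoothed classifier g(x)."""
    votes = torch.zeros(num_classes, dtype=torch.long)
    for _ in range(n):
        noisy = x + sigma * torch.randn_like(x)  # smoothing noise (sigma = 0.5)
        clean = denoiser(noisy, sigma)           # one-shot diffusion denoising
        votes[classifier(clean).argmax(dim=-1)] += 1
    return votes.argmax().item()
```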
- To be Critical: Self-Calibrated Weakly Supervised Learning for Salient Object Detection [95.21700830273221]
Weakly-supervised salient object detection (WSOD) aims to develop saliency models using image-level annotations.
We propose a self-calibrated training strategy by explicitly establishing a mutual calibration loop between pseudo labels and network predictions.
We demonstrate that even a much smaller dataset with well-matched annotations can help models achieve better performance as well as generalizability.
arXiv Detail & Related papers (2021-09-04T02:45:22Z)
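A heavily simplified sketch of what a mutual calibration loop between pseudo labels and predictions might look like follows; the blending rule, the `momentum` value, and the sigmoid-output assumption are all illustrative, not the authors' exact scheme.

```python
# Simplified mutual-calibration step (illustrative, not the paper's scheme).
import torch
import torch.nn.functional as F

def self_calibrated_step(model, images, pseudo_labels, optimizer, momentum=0.7):
    preds = model(images)  # saliency maps in [0, 1] (sigmoid head assumed)
    loss = F.binary_cross_entropy(preds, pseudo_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    with torch.no_grad():
        # Calibrate the labels toward the (detached) predictions, so noisy
        # pseudo labels and network predictions can correct each other.
        new_labels = momentum * pseudo_labels + (1 - momentum) * preds.detach()
    return loss.item(), new_labels
```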
- Dense Contrastive Learning for Self-Supervised Visual Pre-Training [102.15325936477362]
We present dense contrastive learning, which implements self-supervised learning by optimizing a pairwise contrastive (dis)similarity loss at the pixel level between two views of input images.
Compared to the baseline method MoCo-v2, our method introduces negligible computation overhead (only 1% slower).
arXiv Detail & Related papers (2020-11-18T08:42:32Z)
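A compact sketch of a DenseCL-style pixel-level loss is given below; shapes, the memory-bank interface, and the function name are assumptions. The key step is matching each spatial position in one view to its most similar position in the other view, which serves as its positive key.

```python
# DenseCL-style dense contrastive loss (illustrative sketch).
import torch
import torch.nn.functional as F

def dense_contrastive_loss(f1, f2, queue, tau=0.2):
    """f1, f2: (B, C, S) dense projections of two views (S spatial positions).
    queue: (K, C) memory bank of L2-normalized negative dense features."""
    f1 = F.normalize(f1, dim=1)
    f2 = F.normalize(f2, dim=1)
    # Positive key for each position in view 1: its most similar position
    # in view 2 (DenseCL's correspondence rule).
    sim = torch.einsum('bcs,bct->bst', f1, f2)             # (B, S, S)
    idx = sim.argmax(dim=2)                                # (B, S)
    pos = torch.gather(f2, 2, idx.unsqueeze(1).expand(-1, f2.size(1), -1))
    l_pos = (f1 * pos).sum(dim=1).unsqueeze(-1)            # (B, S, 1)
    l_neg = torch.einsum('bcs,kc->bsk', f1, queue)         # (B, S, K)
    logits = torch.cat([l_pos, l_neg], dim=-1) / tau
    labels = torch.zeros(logits.shape[:2], dtype=torch.long, device=f1.device)
    return F.cross_entropy(logits.flatten(0, 1), labels.flatten())
```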
- ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning [91.13797346047984]
We introduce ADAHESSIAN, a second order optimization algorithm which dynamically incorporates the curvature of the loss function via ADAptive estimates.
We show that ADAHESSIAN achieves new state-of-the-art results by a large margin as compared to other adaptive optimization methods.
arXiv Detail & Related papers (2020-06-01T05:00:51Z)
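The curvature estimate ADAHESSIAN relies on can be sketched with Hutchinson's method, which needs only Hessian-vector products. The snippet below is a minimal illustration; the real optimizer additionally applies spatial averaging, momentum, and Adam-style bias correction to these estimates.

```python
# Hutchinson estimate of the Hessian diagonal, diag(H) ~ E[z * (H z)],
# with Rademacher z. Minimal illustration of ADAHESSIAN's key ingredient.
import torch

def hessian_diag_estimate(loss, params):
    grads = torch.autograd.grad(loss, params, create_graph=True)
    zs = [torch.randint_like(p, high=2) * 2.0 - 1.0 for p in params]   # +/- 1
    hvps = torch.autograd.grad(grads, params, grad_outputs=zs)  # H z per param
    return [z * hvp for z, hvp in zip(zs, hvps)]
```

The resulting per-parameter diagonal then stands in for the squared-gradient second moment of an Adam-like update.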