Improving Pixel-Level Contrastive Learning by Leveraging Exogenous Depth Information
- URL: http://arxiv.org/abs/2211.10177v1
- Date: Fri, 18 Nov 2022 11:45:39 GMT
- Title: Improving Pixel-Level Contrastive Learning by Leveraging Exogenous Depth Information
- Authors: Ahmed Ben Saad, Kristina Prokopetc, Josselin Kherroubi, Axel Davy,
Adrien Courtois, Gabriele Facciolo
- Abstract summary: Self-supervised representation learning based on Contrastive Learning (CL) has been the subject of much attention in recent years.
In this paper we focus on depth information, which can be obtained with a depth estimation network or measured from available data.
We show that using this depth information in the contrastive loss leads to improved results and that the learned representations better follow the shapes of objects.
- Score: 7.561849435043042
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-supervised representation learning based on Contrastive Learning (CL)
has been the subject of much attention in recent years. This is due to the
excellent results obtained on a variety of subsequent tasks (in particular
classification), without requiring a large amount of labeled samples. However,
most reference CL algorithms (such as SimCLR and MoCo, but also BYOL and Barlow
Twins) are not adapted to pixel-level downstream tasks. One existing solution
known as PixPro proposes a pixel-level approach that is based on filtering of
pairs of positive/negative image crops of the same image using the distance
between the crops in the whole image. We argue that this idea can be further
enhanced by incorporating semantic information provided by exogenous data as an
additional selection filter, which can be used (at training time) to improve
the selection of the pixel-level positive/negative samples. In this paper we
will focus on the depth information, which can be obtained by using a depth
estimation network or measured from available data (stereovision, parallax
motion, LiDAR, etc.). Scene depth can provide meaningful cues to distinguish
pixels belonging to different objects based on their depth. We show that using
this exogenous information in the contrastive loss leads to improved results
and that the learned representations better follow the shapes of objects. In
addition, we introduce a multi-scale loss that alleviates the issue of finding
the training parameters adapted to different object sizes. We demonstrate the
effectiveness of our ideas on the Breakout Segmentation on Borehole Images
where we achieve an improvement of 1.9% over PixPro and nearly 5% over the
supervised baseline. We further validate our technique on the indoor scene
segmentation tasks with ScanNet and outdoor scenes with CityScapes (1.6% and
1.1% improvement over PixPro, respectively).
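The core idea of the abstract can be sketched in code: a PixPro-style pixel-level contrastive loss in which a pair of pixels only counts as positive if the pixels are both spatially close in the original image and at a similar scene depth. The sketch below is an illustration under assumed conventions, not the authors' implementation; the function name, the thresholds (`dist_thresh`, `depth_thresh`), and the temperature value are hypothetical placeholders.

```python
import numpy as np

def depth_filtered_pixel_contrast(feat_a, feat_b, coords_a, coords_b,
                                  depth_a, depth_b,
                                  dist_thresh=0.7, depth_thresh=0.1,
                                  temperature=0.3):
    """Toy pixel-level contrastive loss with an exogenous depth filter.

    feat_a, feat_b   : (N, D) pixel features from two augmented views
    coords_a, coords_b: (N, 2) pixel locations in the full-image frame
    depth_a, depth_b : (N,) scene depth per pixel (from a depth network
                       or from stereo / LiDAR measurements)
    """
    # L2-normalize features so the dot product is a cosine similarity
    fa = feat_a / np.linalg.norm(feat_a, axis=1, keepdims=True)
    fb = feat_b / np.linalg.norm(feat_b, axis=1, keepdims=True)
    sim = fa @ fb.T / temperature

    # PixPro-style spatial filter: distance between pixels in the whole image
    d_xy = np.linalg.norm(coords_a[:, None, :] - coords_b[None, :, :], axis=-1)
    # Exogenous depth filter: positives must also lie at a similar depth,
    # which helps separate pixels belonging to different objects
    d_depth = np.abs(depth_a[:, None] - depth_b[None, :])
    positive = (d_xy < dist_thresh) & (d_depth < depth_thresh)

    # InfoNCE-style average over pixels of view A with at least one positive
    exp_sim = np.exp(sim)
    losses = []
    for i in range(fa.shape[0]):
        if positive[i].any():
            losses.append(-np.log(exp_sim[i][positive[i]].sum()
                                  / exp_sim[i].sum()))
    return float(np.mean(losses)) if losses else 0.0
```

The depth term acts purely as an additional selection filter at training time, so it adds no cost at inference; the multi-scale loss mentioned in the abstract would apply this same selection at several feature-map resolutions.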
Related papers
- Temporal Lidar Depth Completion [0.08192907805418582]
We show how a state-of-the-art method PENet can be modified to benefit from recurrency.
Our algorithm achieves state-of-the-art results on the KITTI depth completion dataset.
arXiv Detail & Related papers (2024-06-17T08:25:31Z)
- Probabilistic Deep Metric Learning for Hyperspectral Image Classification [91.5747859691553]
This paper proposes a probabilistic deep metric learning framework for hyperspectral image classification.
It aims to predict the category of each pixel for an image captured by hyperspectral sensors.
Our framework can be readily applied to existing hyperspectral image classification methods.
arXiv Detail & Related papers (2022-11-15T17:57:12Z)
- PDC: Piecewise Depth Completion utilizing Superpixels [0.0]
Current approaches often rely on CNN-based methods with several known drawbacks.
We propose our novel Piecewise Depth Completion (PDC), which works completely without deep learning.
In our evaluation, we can show both the influence of the individual proposed processing steps and the overall performance of our method on the challenging KITTI dataset.
arXiv Detail & Related papers (2021-07-14T13:58:39Z)
- Dual Pixel Exploration: Simultaneous Depth Estimation and Image Restoration [77.1056200937214]
We study the formation of the DP pair which links the blur and the depth information.
We propose an end-to-end DDDNet (DP-based Depth and Deblur Network) to jointly estimate the depth and restore the image.
arXiv Detail & Related papers (2020-12-01T06:53:57Z)
- Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning [60.75687261314962]
We introduce pixel-level pretext tasks for learning dense feature representations.
A pixel-to-propagation consistency task produces better results than state-of-the-art approaches.
Results demonstrate the strong potential of defining pretext tasks at the pixel level.
arXiv Detail & Related papers (2020-11-19T18:59:45Z)
- Deep Photo Cropper and Enhancer [65.11910918427296]
We propose a new type of image enhancement problem: to crop an image which is embedded within a photo.
We split our proposed approach into two deep networks: deep photo cropper and deep image enhancer.
In the photo cropper network, we employ a spatial transformer to extract the embedded image.
In the photo enhancer, we employ super-resolution to increase the number of pixels in the embedded image.
arXiv Detail & Related papers (2020-08-03T03:50:20Z)
- Distilling Localization for Self-Supervised Representation Learning [82.79808902674282]
Contrastive learning has revolutionized unsupervised representation learning.
Current contrastive models are ineffective at localizing the foreground object.
We propose a data-driven approach for learning invariance to backgrounds.
arXiv Detail & Related papers (2020-04-14T16:29:42Z)
- Rethinking Data Augmentation for Image Super-resolution: A Comprehensive Analysis and a New Strategy [21.89072742618842]
We provide a comprehensive analysis of the existing augmentation methods applied to the super-resolution task.
We propose CutBlur that cuts a low-resolution patch and pastes it to the corresponding high-resolution image region and vice versa.
Our method consistently and significantly improves the performance across various scenarios.
arXiv Detail & Related papers (2020-04-01T13:49:38Z)
- Single Image Depth Estimation Trained via Depth from Defocus Cues [105.67073923825842]
Estimating depth from a single RGB image is a fundamental task in computer vision.
In this work, we rely, instead of different views, on depth from focus cues.
We present results that are on par with supervised methods on KITTI and Make3D datasets and outperform unsupervised learning approaches.
arXiv Detail & Related papers (2020-01-14T20:22:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.