CYBORGS: Contrastively Bootstrapping Object Representations by Grounding in Segmentation
- URL: http://arxiv.org/abs/2203.09343v1
- Date: Thu, 17 Mar 2022 14:20:05 GMT
- Title: CYBORGS: Contrastively Bootstrapping Object Representations by Grounding in Segmentation
- Authors: Renhao Wang, Hang Zhao, Yang Gao
- Abstract summary: We propose a framework that accomplishes end-to-end pretraining on complex scenes via joint learning of representations and segmentation.
By iterating between these two components, we ground the contrastive updates in segmentation information, and simultaneously improve segmentation throughout pretraining.
- Score: 22.89327564484357
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many recent approaches in contrastive learning have worked to close the gap
between pretraining on iconic images like ImageNet and pretraining on complex
scenes like COCO. This gap exists largely because commonly used random crop
augmentations yield semantically inconsistent content in crowded scene images
of diverse objects. Previous works use preprocessing pipelines to localize
salient objects for improved cropping, but an end-to-end solution is still
elusive. In this work, we propose a framework which accomplishes this goal via
joint learning of representations and segmentation. We leverage segmentation
masks to train a model with a mask-dependent contrastive loss, and use the
partially trained model to bootstrap better masks. By iterating between these
two components, we ground the contrastive updates in segmentation information,
and simultaneously improve segmentation throughout pretraining. Experiments
show our representations transfer robustly to downstream tasks in
classification, detection and segmentation.
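The abstract's alternation between mask-grounded contrastive updates and mask re-estimation can be summarized in a short sketch. The snippet below is a minimal PyTorch illustration, not the released implementation: `estimate_masks` is a hypothetical stand-in for the paper's mask-bootstrapping step (e.g. clustering the encoder's dense features), the loss is a generic region-level InfoNCE, and only photometric augmentations are used so that one set of masks is valid for both views.

```python
import torch
import torch.nn.functional as F
from torchvision.transforms import ColorJitter

jitter = ColorJitter(0.4, 0.4, 0.4, 0.1)  # photometric augmentation only

def region_embeddings(feats, masks):
    """Average-pool dense features inside each mask region.
    feats: (B, C, H, W); masks: (B, K, H, W) -> (B, K, C)."""
    w = masks / (masks.sum(dim=(2, 3), keepdim=True) + 1e-6)
    return torch.einsum("bchw,bkhw->bkc", feats, w)

def region_infonce(e1, e2, tau=0.1):
    """Region-level InfoNCE: the same mask region under two views is the
    positive pair; every other region in the batch is a negative."""
    z1 = F.normalize(e1.flatten(0, 1), dim=-1)  # (B*K, C)
    z2 = F.normalize(e2.flatten(0, 1), dim=-1)
    logits = z1 @ z2.t() / tau
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)

def cyborgs_round(encoder, loader, estimate_masks, opt):
    """One bootstrap round: contrastive updates grounded in the current
    masks, then mask re-estimation with the improved encoder."""
    for imgs in loader:
        # Hypothetical mask source, e.g. clustering encoder features.
        with torch.no_grad():
            masks = estimate_masks(encoder, imgs)
        v1, v2 = jitter(imgs), jitter(imgs)  # two photometric views
        # encoder is assumed to return a dense (B, C, H, W) feature map.
        loss = region_infonce(region_embeddings(encoder(v1), masks),
                              region_embeddings(encoder(v2), masks))
        opt.zero_grad(); loss.backward(); opt.step()
```

Iterating `cyborgs_round` realizes the bootstrapping: better features give better masks, which in turn ground better contrastive updates.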
Related papers
- IFSENet: Harnessing Sparse Iterations for Interactive Few-shot Segmentation Excellence [2.822194296769473]
Few-shot segmentation techniques reduce the number of images required to learn to segment a new class.
Interactive segmentation techniques, in contrast, focus on incrementally improving the segmentation of one object at a time.
We combine the two concepts to drastically reduce the effort required to train segmentation models for novel classes.
arXiv Detail & Related papers (2024-03-22T10:15:53Z)
- Foreground-Background Separation through Concept Distillation from Generative Image Foundation Models [6.408114351192012]
We present a novel method that enables the generation of general foreground-background segmentation models from simple textual descriptions.
We show results on the task of segmenting four different objects (humans, dogs, cars, birds) and a use case scenario in medical image analysis.
arXiv Detail & Related papers (2022-12-29T13:51:54Z)
- Location-Aware Self-Supervised Transformers [74.76585889813207]
We propose to pretrain networks for semantic segmentation by predicting the relative location of image parts.
We control the difficulty of the task by masking a subset of the reference patch features visible to those of the query.
Our experiments show that this location-aware pretraining leads to representations that transfer competitively to several challenging semantic segmentation benchmarks.
arXiv Detail & Related papers (2022-12-05T16:24:29Z)
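As a concrete illustration of this pretext task, here is a toy PyTorch sketch that casts relative location as grid-offset classification; the patch `encoder`, the random feature masking, and the linear head are simplifying assumptions, and the paper's transformer-based formulation is more elaborate.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelativeLocationTask(nn.Module):
    """Toy relative-location pretext task: given a query patch and a
    partially masked reference patch, classify the query's grid offset
    relative to the reference."""

    def __init__(self, encoder, dim, grid=7):
        super().__init__()
        self.encoder = encoder                       # any patch encoder
        self.head = nn.Linear(2 * dim, grid * grid)  # one class per offset

    def forward(self, ref_patches, query_patches, offsets, mask_ratio=0.5):
        ref = self.encoder(ref_patches)    # (B, dim) reference features
        qry = self.encoder(query_patches)  # (B, dim) query features
        # Difficulty is controlled by hiding a random subset of the
        # reference features, mirroring the masking described above.
        keep = (torch.rand_like(ref) > mask_ratio).float()
        logits = self.head(torch.cat([ref * keep, qry], dim=-1))
        return F.cross_entropy(logits, offsets)      # offsets: (B,) long
```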
- Distilling Ensemble of Explanations for Weakly-Supervised Pre-Training of Image Segmentation Models [54.49581189337848]
We propose a method to enable the end-to-end pre-training for image segmentation models based on classification datasets.
The proposed method leverages a weighted segmentation learning procedure to pre-train the segmentation network en masse.
Experiments show that, with ImageNet accompanied by PSSL as the source dataset, the proposed end-to-end pre-training strategy successfully boosts the performance of various segmentation models.
arXiv Detail & Related papers (2022-07-04T13:02:32Z)
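The entry above is terse, so a hedged sketch may help: the snippet below distills an ensemble of per-pixel explanation maps (e.g. CAMs from several classifiers) into weighted segmentation supervision. The map source, the agreement weighting, and the 0.5 threshold are illustrative assumptions, not the actual PSSL pipeline.

```python
import torch
import torch.nn.functional as F

def explanation_distillation_loss(seg_logits, explanation_maps):
    """Train a segmentation net against pseudo-masks distilled from an
    ensemble of explanation maps, down-weighting pixels the ensemble
    disagrees on.

    seg_logits:       (B, 2, H, W) foreground/background predictions
    explanation_maps: (B, E, H, W) saliency in [0, 1], one per member
    """
    consensus = explanation_maps.mean(dim=1)       # ensemble average
    pseudo = (consensus > 0.5).long()              # hard pseudo-mask
    agreement = 1.0 - explanation_maps.std(dim=1)  # high = members agree
    loss = F.cross_entropy(seg_logits, pseudo, reduction="none")
    return (agreement * loss).mean()
```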
- Learning To Segment Dominant Object Motion From Watching Videos [72.57852930273256]
We envision a simple framework for dominant moving object segmentation that neither requires annotated data to train nor relies on saliency priors or pre-trained optical flow maps.
Inspired by a layered image representation, we introduce a technique to group pixel regions according to their affine parametric motion.
This enables our network to learn segmentation of the dominant foreground object using only RGB image pairs as input for both training and inference.
arXiv Detail & Related papers (2021-11-28T14:51:00Z)
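To make "grouping pixels by affine parametric motion" concrete, here is a minimal sketch that fits a single affine motion model by least squares and treats its outliers as the moving object. It assumes per-pixel displacements are already available, which the paper explicitly avoids (it learns from raw RGB pairs); the snippet illustrates only the affine-grouping idea.

```python
import torch

def dominant_motion_mask(coords, disp, thresh=1.0):
    """Fit one affine motion model to all pixels; the background motion
    explains most pixels, and the outliers form the object mask.

    coords: (N, 2) pixel coordinates (x, y)
    disp:   (N, 2) per-pixel displacements between the two frames
    """
    A = torch.cat([coords, torch.ones(len(coords), 1)], dim=1)  # (N, 3)
    # Least-squares fit of a 3x2 affine parameter matrix: A @ theta = disp.
    theta = torch.linalg.lstsq(A, disp).solution
    residual = (A @ theta - disp).norm(dim=1)
    return residual > thresh      # True where motion breaks the fit
```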
- Half-Real Half-Fake Distillation for Class-Incremental Semantic Segmentation [84.1985497426083]
Convolutional neural networks are ill-equipped for incremental learning: new classes become available, but the initial training data is not retained.
We try to address this issue by "inverting" the trained segmentation network to synthesize input images starting from random noise.
arXiv Detail & Related papers (2021-04-02T03:47:16Z)
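The "inversion" step admits a compact sketch: freeze the segmentation network and optimize an input image from random noise until the network predicts a chosen target mask. The total-variation prior and hyperparameters below are illustrative assumptions; the paper's procedure adds further machinery.

```python
import torch
import torch.nn.functional as F

def invert_segmenter(seg_net, target_mask, steps=500, lr=0.1):
    """Synthesize an input image that makes a frozen segmentation
    network predict `target_mask`, starting from random noise.

    seg_net:     frozen model, (1, 3, H, W) -> (1, C, H, W) logits
    target_mask: (1, H, W) long tensor of desired class labels
    """
    h, w = target_mask.shape[-2:]
    x = torch.randn(1, 3, h, w, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        ce = F.cross_entropy(seg_net(x), target_mask)
        # Total-variation prior keeps the synthesized image smooth.
        tv = (x[..., 1:, :] - x[..., :-1, :]).abs().mean() \
           + (x[..., :, 1:] - x[..., :, :-1]).abs().mean()
        opt.zero_grad(); (ce + 1e-3 * tv).backward(); opt.step()
    return x.detach()
```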
- Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation [49.90178055521207]
This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation.
We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths.
In particular, we devise a graph neural network (GNN) for group-wise semantic mining, wherein input images are represented as graph nodes.
arXiv Detail & Related papers (2020-12-09T12:40:13Z)
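A toy message-passing layer conveys the "images as graph nodes" idea: each node holds one image's feature vector and aggregates from similar images in the group, so that shared semantics reinforce each other. The similarity-softmax adjacency and single linear update are simplifying assumptions, not the paper's GNN.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupMiningLayer(nn.Module):
    """One message-passing step over a group of images, where edge
    weights are pairwise feature similarities."""

    def __init__(self, dim):
        super().__init__()
        self.update = nn.Linear(2 * dim, dim)

    def forward(self, nodes):                       # (G, dim), G images
        adj = F.softmax(nodes @ nodes.t(), dim=-1)  # soft adjacency
        messages = adj @ nodes                      # aggregate neighbors
        return F.relu(self.update(torch.cat([nodes, messages], dim=-1)))
```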
- CRNet: Cross-Reference Networks for Few-Shot Segmentation [59.85183776573642]
Few-shot segmentation aims to learn a segmentation model that can be generalized to novel classes with only a few training images.
With a cross-reference mechanism, our network can better find the co-occurrent objects in the two images.
Experiments on the PASCAL VOC 2012 dataset show that our network achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-03-24T04:55:43Z)
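The cross-reference mechanism above can be sketched as mutual channel gating: each image's feature map is re-weighted by a descriptor of the other image, so channels active in both images (the co-occurrent object) are emphasized. This is a hedged approximation of CRNet's module, with the pooling and gating layers as assumptions.

```python
import torch
import torch.nn as nn

class CrossReference(nn.Module):
    """Mutually gate two feature maps so that channels firing in both
    images are emphasized."""

    def __init__(self, dim):
        super().__init__()
        self.gate_a = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())
        self.gate_b = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, feat_a, feat_b):             # (B, C, H, W) each
        ga = self.gate_a(feat_a.mean(dim=(2, 3)))  # (B, C) descriptor of A
        gb = self.gate_b(feat_b.mean(dim=(2, 3)))  # (B, C) descriptor of B
        # Cross-gating: A is re-weighted by B's descriptor, and vice versa.
        return feat_a * gb[..., None, None], feat_b * ga[..., None, None]
```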
This list is automatically generated from the titles and abstracts of the papers on this site.