Dense Siamese Network
- URL: http://arxiv.org/abs/2203.11075v1
- Date: Mon, 21 Mar 2022 15:55:23 GMT
- Title: Dense Siamese Network
- Authors: Wenwei Zhang, Jiangmiao Pang, Kai Chen, Chen Change Loy
- Abstract summary: We present Dense Siamese Network (DenseSiam), a simple unsupervised learning framework for dense prediction tasks.
It learns visual representations by maximizing the similarity between two views of one image with two types of consistency, i.e., pixel consistency and region consistency.
It surpasses state-of-the-art segmentation methods by 2.1 mIoU while using only 28% of their training cost.
- Score: 86.23741104851383
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper presents Dense Siamese Network (DenseSiam), a simple unsupervised
learning framework for dense prediction tasks. It learns visual representations
by maximizing the similarity between two views of one image with two types of
consistency, i.e., pixel consistency and region consistency. Concretely,
DenseSiam first maximizes the pixel level spatial consistency according to the
exact location correspondence in the overlapped area. It also extracts a batch
of region embeddings that correspond to some sub-regions in the overlapped area
to be contrasted for region consistency. In contrast to previous methods that
require negative pixel pairs, momentum encoders, or heuristic masks, DenseSiam
benefits from the simple Siamese network and optimizes the consistency of
different granularities. It also shows that the simple location correspondence
and interacted region embeddings are effective enough to learn the similarity.
We apply DenseSiam on ImageNet and obtain competitive improvements on various
downstream tasks. We also show that only with some extra task-specific losses,
the simple framework can directly conduct dense prediction tasks. On an
existing unsupervised semantic segmentation benchmark, it surpasses
state-of-the-art segmentation methods by 2.1 mIoU while using only 28% of their training cost.
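The two consistency objectives described in the abstract can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the feature alignment by location correspondence, the tensor shapes, and the InfoNCE-style form of the region loss are all assumptions of this sketch.

```python
import numpy as np

def cosine_sim(a, b, eps=1e-8):
    """Cosine similarity along the channel axis (last dim)."""
    a = a / (np.linalg.norm(a, axis=-1, keepdims=True) + eps)
    b = b / (np.linalg.norm(b, axis=-1, keepdims=True) + eps)
    return (a * b).sum(axis=-1)

def pixel_consistency_loss(feat1, feat2):
    """Negative cosine similarity between exactly corresponding pixel
    embeddings from the two views' overlapped area.
    feat1, feat2: (H, W, C) dense embeddings assumed to be already
    aligned by location correspondence."""
    return -cosine_sim(feat1, feat2).mean()

def region_consistency_loss(regions1, regions2, temperature=0.1):
    """Contrast a batch of region embeddings: region i of view 1 should
    match region i of view 2 (positives on the diagonal) rather than
    other regions. An InfoNCE-style stand-in for the region loss.
    regions1, regions2: (N, C) region embeddings."""
    r1 = regions1 / np.linalg.norm(regions1, axis=-1, keepdims=True)
    r2 = regions2 / np.linalg.norm(regions2, axis=-1, keepdims=True)
    logits = r1 @ r2.T / temperature             # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.diag(log_prob).mean()
```

In this reading, identical views drive the pixel loss toward its minimum of -1, and matched region embeddings yield a lower contrastive loss than mismatched ones.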
Related papers
- Weakly-Supervised Semantic Segmentation of Circular-Scan, Synthetic-Aperture-Sonar Imagery [3.5534342430133514]
We propose a weakly-supervised framework for the semantic segmentation of circular-scan synthetic-aperture-sonar (CSAS) imagery.
We show that our framework performs comparably to nine fully-supervised deep networks.
We achieve state-of-the-art performance when pre-training on natural imagery.
arXiv Detail & Related papers (2024-01-20T19:55:36Z)
- Associating Spatially-Consistent Grouping with Text-supervised Semantic Segmentation [117.36746226803993]
We introduce self-supervised spatially-consistent grouping with text-supervised semantic segmentation.
Considering the part-like grouped results, we further adapt a text-supervised model from image-level to region-level recognition.
Our method achieves 59.2% mIoU and 32.4% mIoU on the Pascal VOC and Pascal Context benchmarks, respectively.
arXiv Detail & Related papers (2023-04-03T16:24:39Z)
- MKANet: A Lightweight Network with Sobel Boundary Loss for Efficient Land-cover Classification of Satellite Remote Sensing Imagery [15.614937709070203]
Land cover classification is a multi-class task that assigns each pixel to a natural or man-made category of the Earth's surface.
We present an efficient lightweight semantic segmentation network termed MKANet.
We show that MKANet achieves state-of-the-art accuracy on two land-cover classification datasets and infers 2x faster than other competitive lightweight networks.
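The title mentions a Sobel boundary loss, but the summary gives no formulation. One plausible reading is a per-pixel loss up-weighted on class boundaries detected with Sobel filters. The NumPy sketch below, with hypothetical names `sobel_boundary_map` and `boundary_weighted_ce`, only illustrates that idea; it is not MKANet's actual loss.

```python
import numpy as np

SOBEL_X = np.array([[-1., 0., 1.],
                    [-2., 0., 2.],
                    [-1., 0., 1.]])
SOBEL_Y = SOBEL_X.T

def filter2d_same(img, k):
    """Naive 3x3 'same' cross-correlation with zero padding."""
    p = np.pad(img, 1)
    out = np.zeros_like(img, dtype=float)
    for i in range(3):
        for j in range(3):
            out += k[i, j] * p[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def sobel_boundary_map(labels):
    """Binary map of class boundaries: the Sobel gradient magnitude of
    the integer label map is nonzero where neighboring classes differ."""
    gx = filter2d_same(labels.astype(float), SOBEL_X)
    gy = filter2d_same(labels.astype(float), SOBEL_Y)
    return (np.hypot(gx, gy) > 0).astype(float)

def boundary_weighted_ce(probs, labels, boundary_weight=2.0, eps=1e-8):
    """Per-pixel cross-entropy, up-weighted on class boundaries.
    probs: (H, W, C) softmax outputs; labels: (H, W) integer classes."""
    h, w = labels.shape
    ce = -np.log(probs[np.arange(h)[:, None], np.arange(w)[None, :], labels] + eps)
    weights = 1.0 + (boundary_weight - 1.0) * sobel_boundary_map(labels)
    return (weights * ce).mean()
```

Note the zero padding: pixels whose class differs from the padding value also register as boundary, which a real implementation would likely handle with edge-replicating padding.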
arXiv Detail & Related papers (2022-07-28T03:29:08Z)
- Unsupervised Image Segmentation by Mutual Information Maximization and Adversarial Regularization [7.165364364478119]
We propose a novel fully unsupervised semantic segmentation method, called Information Maximization and Adversarial Regularization (InMARS).
Inspired by human perception, which parses a scene into perceptual groups, our proposed approach first partitions an input image into meaningful regions (also known as superpixels).
Next, it utilizes Mutual-Information-Maximization followed by an adversarial training strategy to cluster these regions into semantically meaningful classes.
Our experiments demonstrate that our method achieves the state-of-the-art performance on two commonly used unsupervised semantic segmentation datasets.
arXiv Detail & Related papers (2021-07-01T18:36:27Z)
- Region Similarity Representation Learning [94.88055458257081]
Region Similarity Representation Learning (ReSim) is a new approach to self-supervised representation learning for localization-based tasks.
ReSim learns both regional representations for localization as well as semantic image-level representations.
We show how ReSim learns representations which significantly improve the localization and classification performance compared to a competitive MoCo-v2 baseline.
arXiv Detail & Related papers (2021-03-24T00:42:37Z)
- SOSD-Net: Joint Semantic Object Segmentation and Depth Estimation from Monocular Images [94.36401543589523]
We introduce the concept of semantic objectness to exploit the geometric relationship of these two tasks.
We then propose a Semantic Object and Depth Estimation Network (SOSD-Net) based on the objectness assumption.
To the best of our knowledge, SOSD-Net is the first network that exploits the geometry constraint for simultaneous monocular depth estimation and semantic segmentation.
arXiv Detail & Related papers (2021-01-19T02:41:03Z)
- Weakly-Supervised Semantic Segmentation by Iterative Affinity Learning [86.45526827323954]
Weakly-supervised semantic segmentation is a challenging task as no pixel-wise label information is provided for training.
We propose an iterative algorithm to learn such pairwise relations.
We show that the proposed algorithm performs favorably against the state-of-the-art methods.
arXiv Detail & Related papers (2020-02-19T10:32:03Z)
- An End-to-End Network for Co-Saliency Detection in One Single Image [47.35448093528382]
Co-saliency detection within a single image is a common vision problem that has not yet been well addressed.
This study proposes a novel end-to-end trainable network comprising a backbone net and two branch nets.
We construct a new dataset of 2,019 natural images with co-saliency in each image to evaluate the proposed method.
arXiv Detail & Related papers (2019-10-25T16:00:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.