CAR: Class-aware Regularizations for Semantic Segmentation
- URL: http://arxiv.org/abs/2203.07160v1
- Date: Mon, 14 Mar 2022 15:02:48 GMT
- Title: CAR: Class-aware Regularizations for Semantic Segmentation
- Authors: Ye Huang, Di Kang, Liang Chen, Xuefei Zhe, Wenjing Jia, Xiangjian He,
Linchao Bao
- Abstract summary: We propose a universal Class-Aware Regularization (CAR) approach to optimize the intra-class variance and inter-class distance during feature learning.
Our method can be easily applied to most existing segmentation models during training, including OCR and CPNet.
- Score: 20.947897583427192
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent segmentation methods, such as OCR and CPNet, utilizing "class level"
information in addition to pixel features, have achieved notable success for
boosting the accuracy of existing network modules. However, the extracted
class-level information was simply concatenated to pixel features, without
explicitly being exploited for better pixel representation learning. Moreover,
these approaches learn soft class centers based on coarse mask prediction,
which is prone to error accumulation. In this paper, aiming to use class level
information more effectively, we propose a universal Class-Aware Regularization
(CAR) approach to optimize the intra-class variance and inter-class distance
during feature learning, motivated by the fact that humans can recognize an
object by itself no matter which other objects it appears with. Three novel
loss functions are proposed. The first loss function encourages more compact
class representations within each class, the second directly maximizes the
distance between different class centers, and the third further pushes the
distance between inter-class centers and pixels. Furthermore, the class center
in our approach is directly generated from ground truth instead of from the
error-prone coarse prediction. Our method can be easily applied to most
existing segmentation models during training, including OCR and CPNet, and can
largely improve their accuracy at no additional inference overhead. Extensive
experiments and ablation studies conducted on multiple benchmark datasets
demonstrate that the proposed CAR can boost the accuracy of all baseline models
by up to 2.23% mIOU with superior generalization ability. The complete code is
available at https://github.com/edwardyehuang/CAR.
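The three regularization terms described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation (see the linked repository for that): the class centers are averaged from ground-truth labels as the paper specifies, but the Euclidean metric, the hinge form, and the `margin` value are assumptions made here for concreteness.

```python
import numpy as np

def car_losses(features, labels, num_classes, margin=1.0):
    """Sketch of the three Class-Aware Regularization terms.

    features: (N, D) array of pixel features
    labels:   (N,)   array of ground-truth class ids in [0, num_classes)
    """
    # Class centers computed from ground truth, not from coarse predictions
    centers = np.stack([features[labels == c].mean(axis=0)
                        for c in range(num_classes)])            # (C, D)

    # 1) Intra-class: pull each pixel toward its own class center
    intra = np.mean(np.sum((features - centers[labels]) ** 2, axis=1))

    # 2) Inter-center: push different class centers at least `margin` apart
    diff = centers[:, None, :] - centers[None, :, :]             # (C, C, D)
    center_dist = np.sqrt(np.sum(diff ** 2, axis=-1))            # (C, C)
    off_diag = ~np.eye(num_classes, dtype=bool)
    inter_center = np.mean(np.maximum(0.0, margin - center_dist[off_diag]))

    # 3) Center-to-pixel: push pixels away from other classes' centers
    pix_dist = np.sqrt(np.sum(
        (features[:, None, :] - centers[None, :, :]) ** 2, axis=-1))  # (N, C)
    other = np.ones_like(pix_dist, dtype=bool)
    other[np.arange(len(labels)), labels] = False
    inter_pixel = np.mean(np.maximum(0.0, margin - pix_dist[other]))

    return intra, inter_center, inter_pixel
```

Because the centers depend only on ground truth, these terms add no inference-time cost: they shape the feature space during training and are simply dropped at test time.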
Related papers
- Class-Imbalanced Semi-Supervised Learning for Large-Scale Point Cloud
Semantic Segmentation via Decoupling Optimization [64.36097398869774]
Semi-supervised learning (SSL) has been an active research topic for large-scale 3D scene understanding.
The existing SSL-based methods suffer from severe training bias due to class imbalance and long-tail distributions of the point cloud data.
We introduce a new decoupling optimization framework, which disentangles feature representation learning and the classifier in an alternating optimization scheme to shift the biased decision boundary effectively.
arXiv Detail & Related papers (2024-01-13T04:16:40Z)
- Transferring CLIP's Knowledge into Zero-Shot Point Cloud Semantic Segmentation [17.914290294935427]
Traditional 3D segmentation methods can only recognize a fixed range of classes that appear in the training set.
Large-scale visual-language pre-trained models, such as CLIP, have shown their generalization ability in the zero-shot 2D vision tasks.
We propose a simple yet effective baseline for transferring the visual-linguistic knowledge embedded in CLIP to a point cloud encoder.
arXiv Detail & Related papers (2023-12-12T12:35:59Z)
- Unicom: Universal and Compact Representation Learning for Image Retrieval [65.96296089560421]
We cluster the large-scale LAION400M into one million pseudo classes based on the joint textual and visual features extracted by the CLIP model.
To alleviate the conflict among these pseudo classes, we randomly select partial inter-class prototypes to construct a margin-based softmax loss.
Our method significantly outperforms state-of-the-art unsupervised and supervised image retrieval approaches on multiple benchmarks.
arXiv Detail & Related papers (2023-04-12T14:25:52Z)
- DiGeo: Discriminative Geometry-Aware Learning for Generalized Few-Shot Object Detection [39.937724871284665]
Generalized few-shot object detection aims to achieve precise detection on both base classes with abundant annotations and novel classes with limited training data.
Existing approaches enhance few-shot generalization at the cost of base-class performance.
We propose a new training framework, DiGeo, to learn Geometry-aware features of inter-class separation and intra-class compactness.
arXiv Detail & Related papers (2023-03-16T22:37:09Z)
- CARD: Semantic Segmentation with Efficient Class-Aware Regularized Decoder [31.223271128719603]
We propose a universal Class-Aware Regularization (CAR) approach to optimize the intra-class variance and inter-class distance during feature learning.
CAR can be directly applied to most existing segmentation models during training, and can largely improve their accuracy at no additional inference overhead.
arXiv Detail & Related papers (2023-01-11T01:41:37Z)
- Visual Recognition with Deep Nearest Centroids [57.35144702563746]
We devise deep nearest centroids (DNC), a conceptually elegant yet surprisingly effective network for large-scale visual recognition.
Compared with parametric counterparts, DNC performs better on image classification (CIFAR-10, ImageNet) and greatly boosts pixel recognition (ADE20K, Cityscapes).
arXiv Detail & Related papers (2022-09-15T15:47:31Z)
- Masked Unsupervised Self-training for Zero-shot Image Classification [98.23094305347709]
Masked Unsupervised Self-Training (MUST) is a new approach which leverages two different and complementary sources of supervision: pseudo-labels and raw images.
MUST improves upon CLIP by a large margin and narrows the performance gap between unsupervised and supervised classification.
arXiv Detail & Related papers (2022-06-07T02:03:06Z)
- Calibrating Class Activation Maps for Long-Tailed Visual Recognition [60.77124328049557]
We present two effective modifications of CNNs to improve network learning from long-tailed distributions.
First, we present a Class Activation Map Calibration (CAMC) module to improve the learning and prediction of network classifiers.
Second, we investigate the use of normalized classifiers for representation learning in long-tailed problems.
arXiv Detail & Related papers (2021-08-29T05:45:03Z)
- Dense Contrastive Learning for Self-Supervised Visual Pre-Training [102.15325936477362]
We present dense contrastive learning, which implements self-supervised learning by optimizing a pairwise contrastive (dis)similarity loss at the pixel level between two views of input images.
Compared to the baseline method MoCo-v2, our method introduces negligible computation overhead (only 1% slower).
arXiv Detail & Related papers (2020-11-18T08:42:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.