CARD: Semantic Segmentation with Efficient Class-Aware Regularized
Decoder
- URL: http://arxiv.org/abs/2301.04258v1
- Date: Wed, 11 Jan 2023 01:41:37 GMT
- Title: CARD: Semantic Segmentation with Efficient Class-Aware Regularized
Decoder
- Authors: Ye Huang, Di Kang, Liang Chen, Wenjing Jia, Xiangjian He, Lixin Duan,
Xuefei Zhe, Linchao Bao
- Abstract summary: We propose a universal Class-Aware Regularization (CAR) approach to optimize the intra-class variance and inter-class distance during feature learning.
CAR can be directly applied to most existing segmentation models during training, and can largely improve their accuracy at no additional inference overhead.
- Score: 31.223271128719603
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Semantic segmentation has recently achieved notable advances by exploiting
"class-level" contextual information during learning. However, these approaches
simply concatenate class-level information to pixel features to boost the pixel
representation learning, which cannot fully utilize intra-class and inter-class
contextual information. Moreover, these approaches learn soft class centers
based on coarse mask prediction, which is prone to error accumulation. To
better exploit class level information, we propose a universal Class-Aware
Regularization (CAR) approach to optimize the intra-class variance and
inter-class distance during feature learning, motivated by the fact that humans
can recognize an object by itself no matter which other objects it appears
with. Moreover, we design a dedicated decoder for CAR (CARD), which consists of
a novel spatial token mixer and an upsampling module, to maximize its gain for
existing baselines while being highly efficient in terms of computational cost.
Specifically, CAR consists of three novel loss functions. The first loss
function encourages more compact class representations within each class, the
second directly maximizes the distance between different class centers, and the
third further pushes the distance between inter-class centers and pixels.
Furthermore, the class center in our approach is directly generated from ground
truth instead of from the error-prone coarse prediction. CAR can be directly
applied to most existing segmentation models during training, and can largely
improve their accuracy at no additional inference overhead. Extensive
experiments and ablation studies conducted on multiple benchmark datasets
demonstrate that the proposed CAR can boost the accuracy of all baseline models
by up to 2.23% mIOU with superior generalization ability. CARD outperforms SOTA
approaches on multiple benchmarks with a highly efficient architecture.
Related papers
- Class-Imbalanced Semi-Supervised Learning for Large-Scale Point Cloud
Semantic Segmentation via Decoupling Optimization [64.36097398869774]
Semi-supervised learning (SSL) has been an active research topic for large-scale 3D scene understanding.
The existing SSL-based methods suffer from severe training bias due to class imbalance and long-tail distributions of the point cloud data.
We introduce a new decoupling optimization framework, which disentangles feature representation learning and classifier in an alternative optimization manner to shift the bias decision boundary effectively.
arXiv Detail & Related papers (2024-01-13T04:16:40Z) - SEER-ZSL: Semantic Encoder-Enhanced Representations for Generalized
Zero-Shot Learning [0.7420433640907689]
Generalized Zero-Shot Learning (GZSL) recognizes unseen classes by transferring knowledge from the seen classes.
This paper introduces a dual strategy to address the generalization gap.
arXiv Detail & Related papers (2023-12-20T15:18:51Z) - Transferring CLIP's Knowledge into Zero-Shot Point Cloud Semantic
Segmentation [17.914290294935427]
Traditional 3D segmentation methods can only recognize a fixed range of classes that appear in the training set.
Large-scale visual-language pre-trained models, such as CLIP, have shown their generalization ability in the zero-shot 2D vision tasks.
We propose a simple yet effective baseline to transfer the visual-linguistic knowledge implied in CLIP to point cloud encoder.
arXiv Detail & Related papers (2023-12-12T12:35:59Z) - Learning from Mistakes: Self-Regularizing Hierarchical Representations
in Point Cloud Semantic Segmentation [15.353256018248103]
LiDAR semantic segmentation has gained attention to accomplish fine-grained scene understanding.
We present a coarse-to-fine setup that LEArns from classification mistaKes (LEAK) derived from a standard model.
Our LEAK approach is very general and can be seamlessly applied on top of any segmentation architecture.
arXiv Detail & Related papers (2023-01-26T14:52:30Z) - CAR: Class-aware Regularizations for Semantic Segmentation [20.947897583427192]
We propose a universal Class-Aware Regularization (CAR) approach to optimize the intra-class variance and inter-class distance during feature learning.
Our method can be easily applied to most existing segmentation models during training, including OCR and CPNet.
arXiv Detail & Related papers (2022-03-14T15:02:48Z) - Self-Supervised Class Incremental Learning [51.62542103481908]
Existing Class Incremental Learning (CIL) methods are based on a supervised classification framework sensitive to data labels.
When updating them based on the new class data, they suffer from catastrophic forgetting: the model cannot discern old class data clearly from the new.
In this paper, we explore the performance of Self-Supervised representation learning in Class Incremental Learning (SSCIL) for the first time.
arXiv Detail & Related papers (2021-11-18T06:58:19Z) - Dual Prototypical Contrastive Learning for Few-shot Semantic
Segmentation [55.339405417090084]
We propose a dual prototypical contrastive learning approach tailored to the few-shot semantic segmentation (FSS) task.
The main idea is to encourage the prototypes more discriminative by increasing inter-class distance while reducing intra-class distance in prototype feature space.
We demonstrate that the proposed dual contrastive learning approach outperforms state-of-the-art FSS methods on PASCAL-5i and COCO-20i datasets.
arXiv Detail & Related papers (2021-11-09T08:14:50Z) - Few-Shot Incremental Learning with Continually Evolved Classifiers [46.278573301326276]
Few-shot class-incremental learning (FSCIL) aims to design machine learning algorithms that can continually learn new concepts from a few data points.
The difficulty lies in that limited data from new classes not only lead to significant overfitting issues but also exacerbate the notorious catastrophic forgetting problems.
We propose a Continually Evolved CIF ( CEC) that employs a graph model to propagate context information between classifiers for adaptation.
arXiv Detail & Related papers (2021-04-07T10:54:51Z) - Dense Contrastive Learning for Self-Supervised Visual Pre-Training [102.15325936477362]
We present dense contrastive learning, which implements self-supervised learning by optimizing a pairwise contrastive (dis)similarity loss at the pixel level between two views of input images.
Compared to the baseline method MoCo-v2, our method introduces negligible computation overhead (only 1% slower)
arXiv Detail & Related papers (2020-11-18T08:42:32Z) - Fine-Grained Visual Classification with Efficient End-to-end
Localization [49.9887676289364]
We present an efficient localization module that can be fused with a classification network in an end-to-end setup.
We evaluate the new model on the three benchmark datasets CUB200-2011, Stanford Cars and FGVC-Aircraft.
arXiv Detail & Related papers (2020-05-11T14:07:06Z) - Generalized Zero-Shot Learning Via Over-Complete Distribution [79.5140590952889]
We propose to generate an Over-Complete Distribution (OCD) using Conditional Variational Autoencoder (CVAE) of both seen and unseen classes.
The effectiveness of the framework is evaluated using both Zero-Shot Learning and Generalized Zero-Shot Learning protocols.
arXiv Detail & Related papers (2020-04-01T19:05:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.