Effects of Architectures on Continual Semantic Segmentation
- URL: http://arxiv.org/abs/2302.10718v1
- Date: Tue, 21 Feb 2023 15:12:01 GMT
- Title: Effects of Architectures on Continual Semantic Segmentation
- Authors: Tobias Kalb, Niket Ahuja, Jingxing Zhou, Jürgen Beyerer
- Abstract summary: We study how the choice of neural network architecture affects catastrophic forgetting in class- and domain-incremental semantic segmentation.
We find that traditional CNNs like ResNet have high plasticity but low stability, while transformer architectures are much more stable.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Research in the field of Continual Semantic Segmentation is mainly
investigating novel learning algorithms to overcome catastrophic forgetting of
neural networks. Most recent publications have focused on improving learning
algorithms without distinguishing the effects caused by the choice of neural
architecture. Therefore, we study how the choice of neural network architecture
affects catastrophic forgetting in class- and domain-incremental semantic
segmentation. Specifically, we compare the well-researched CNNs to recently
proposed Transformers and Hybrid architectures, as well as the impact of the
choice of novel normalization layers and different decoder heads. We find that
traditional CNNs like ResNet have high plasticity but low stability, while
transformer architectures are much more stable. When the inductive biases of
CNN architectures are combined with transformers in hybrid architectures, the
combination yields both higher plasticity and stability. The stability of these models can be
explained by their ability to learn general features that are robust against
distribution shifts. Experiments with different normalization layers show that
Continual Normalization achieves the best trade-off in terms of adaptability
and stability of the model. In the class-incremental setting, the choice of the
normalization layer has much less impact. Our experiments suggest that the
right choice of architecture can significantly reduce forgetting even with
naive fine-tuning and confirm that for real-world applications, the
architecture is an important factor in designing a continual learning model.
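As a rough illustration of the experimental protocol described above, the sketch below implements naive sequential fine-tuning over a list of domain-incremental tasks and records a task-accuracy matrix from which plasticity (performance on the just-learned task) and stability (retained performance on earlier tasks) can be read off. It also includes a Continual-Normalization-style layer (group normalization followed by batch normalization, following the general idea of Pham et al., 2022) that could be swapped into a backbone in place of batch normalization. This is a hedged sketch, not the paper's code: the model, data loaders, mIoU computation, and hyperparameters are placeholders, and the model is assumed to return raw logits of shape (N, C, H, W).

```python
# Illustrative sketch (not the authors' code) of naive sequential fine-tuning
# in a domain-incremental semantic segmentation setting.
# Assumptions: the model maps (N, 3, H, W) images to raw logits (N, C, H, W),
# labels are long tensors with 255 as the ignore index, and each task provides
# a (train_loader, val_loader) pair.
import torch
import torch.nn as nn


class ContinualNorm2d(nn.Module):
    """Continual-Normalization-style layer: spatial group normalization
    (without affine parameters) followed by batch normalization.
    num_groups must divide num_channels."""

    def __init__(self, num_channels, num_groups=32):
        super().__init__()
        self.gn = nn.GroupNorm(num_groups, num_channels, affine=False)
        self.bn = nn.BatchNorm2d(num_channels)

    def forward(self, x):
        return self.bn(self.gn(x))


@torch.no_grad()
def evaluate_miou(model, loader, num_classes, device):
    """Placeholder mIoU over a validation loader (void label 255 is ignored)."""
    model.eval()
    inter = torch.zeros(num_classes, device=device)
    union = torch.zeros(num_classes, device=device)
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        preds = model(images).argmax(dim=1)
        valid = labels != 255
        for c in range(num_classes):
            p, t = (preds == c) & valid, (labels == c) & valid
            inter[c] += (p & t).sum()
            union[c] += (p | t).sum()
    present = union > 0
    return (inter[present] / union[present]).mean().item()


def sequential_finetune(model, tasks, num_classes, epochs=1, lr=1e-4, device="cuda"):
    """Naive fine-tuning over a sequence of tasks (no continual-learning method).

    Returns R with R[i][j] = mIoU on task j after training on tasks 0..i:
    the diagonal reflects plasticity, the drop below the diagonal reflects
    forgetting (lack of stability)."""
    model.to(device)
    criterion = nn.CrossEntropyLoss(ignore_index=255)
    results = []
    for train_loader, _ in tasks:
        optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
        model.train()
        for _ in range(epochs):
            for images, labels in train_loader:
                images, labels = images.to(device), labels.to(device)
                loss = criterion(model(images), labels)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
        # evaluate on every task (seen and unseen) after finishing this one
        results.append([evaluate_miou(model, val_loader, num_classes, device)
                        for _, val_loader in tasks])
    return results
```

Swapping the backbone (for example, a ResNet-based CNN versus a transformer or hybrid encoder) or replacing its normalization layers with ContinualNorm2d before running sequential_finetune reproduces, at a much smaller scale, the kind of comparison the paper reports.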
Related papers
- What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis [8.008567379796666]
The Transformer architecture has inarguably revolutionized deep learning.
At its core, the attention block differs in form and functionality from most other architectural components in deep learning.
The root causes behind these outward manifestations, and the precise mechanisms that govern them, remain poorly understood.
arXiv Detail & Related papers (2024-10-14T18:15:02Z) - Principled Architecture-aware Scaling of Hyperparameters [69.98414153320894]
Training a high-quality deep neural network requires choosing suitable hyperparameters, which is a non-trivial and expensive process.
In this work, we precisely characterize the dependence of initializations and maximal learning rates on the network architecture.
We demonstrate that network rankings in benchmarks can easily change when the networks are simply trained better.
arXiv Detail & Related papers (2024-02-27T11:52:49Z) - Slimmable Domain Adaptation [112.19652651687402]
We introduce a simple framework, Slimmable Domain Adaptation, to improve cross-domain generalization with a weight-sharing model bank.
Our framework surpasses other competing approaches by a very large margin on multiple benchmarks.
arXiv Detail & Related papers (2022-06-14T06:28:04Z) - Rethinking Architecture Design for Tackling Data Heterogeneity in
Federated Learning [53.73083199055093]
We show that attention-based architectures (e.g., Transformers) are fairly robust to distribution shifts.
Our experiments show that replacing convolutional networks with Transformers can greatly reduce catastrophic forgetting of previous devices.
arXiv Detail & Related papers (2021-06-10T21:04:18Z) - GradInit: Learning to Initialize Neural Networks for Stable and
Efficient Training [59.160154997555956]
We present GradInit, an automated and architecture-agnostic method for initializing neural networks.
It is based on a simple heuristic: the variance of each network layer is adjusted so that a single step of SGD or Adam results in the smallest possible loss value (a simplified sketch of this heuristic appears after this list).
It also enables training the original Post-LN Transformer for machine translation without learning rate warmup.
arXiv Detail & Related papers (2021-02-16T11:45:35Z) - Adaptive Signal Variances: CNN Initialization Through Modern
Architectures [0.7646713951724012]
Deep convolutional neural networks (CNNs) have achieved unwavering confidence in their performance on image processing tasks.
CNN practitioners widely understand that the stability of learning depends on how the model parameters in each layer are initialized.
arXiv Detail & Related papers (2020-08-16T11:26:29Z) - The Heterogeneity Hypothesis: Finding Layer-Wise Differentiated Network
Architectures [179.66117325866585]
We investigate a design space that is usually overlooked, i.e. adjusting the channel configurations of predefined networks.
We find that this adjustment can be achieved by shrinking widened baseline networks and leads to superior performance.
Experiments are conducted on various networks and datasets for image classification, visual tracking and image restoration.
arXiv Detail & Related papers (2020-06-29T17:59:26Z) - A Semi-Supervised Assessor of Neural Architectures [157.76189339451565]
We employ an auto-encoder to discover meaningful representations of neural architectures.
A graph convolutional neural network is introduced to predict the performance of architectures.
arXiv Detail & Related papers (2020-05-14T09:02:33Z) - Inferring Convolutional Neural Networks' accuracies from their
architectural characterizations [0.0]
We study the relationships between a CNN's architecture and its performance.
We show that the attributes can be predictive of the networks' performance in two specific computer vision-based physics problems.
We use machine learning models to predict whether a network can perform better than a certain threshold accuracy before training.
arXiv Detail & Related papers (2020-01-07T16:41:58Z)
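The GradInit heuristic summarized in the related-papers list above lends itself to a compact illustration. The following is a minimal, simplified sketch of that idea, assuming plain SGD for the simulated inner step and one learnable scale per parameter tensor; it is not the reference implementation, which also supports Adam-style inner steps and includes safeguards (such as a constraint on the gradient norm) omitted here. The function name and hyperparameters are illustrative.

```python
# Simplified, hypothetical sketch of a GradInit-style initialization loop
# (not the reference implementation): learn one positive scale per parameter
# tensor so that the loss after a single simulated inner SGD step is minimized.
# Requires PyTorch >= 2.0 for torch.func.functional_call.
import torch
from torch.func import functional_call


def gradinit_sketch(model, loss_fn, data_iter, inner_lr=0.1, outer_lr=1e-2, iters=100):
    names = [n for n, _ in model.named_parameters()]
    base = {n: p.detach() for n, p in model.named_parameters()}
    # one learnable positive scale per parameter tensor
    scales = {n: torch.ones((), device=p.device, requires_grad=True)
              for n, p in model.named_parameters()}
    opt = torch.optim.Adam(list(scales.values()), lr=outer_lr)

    for _ in range(iters):
        x, y = next(data_iter)
        # forward pass with rescaled weights
        scaled = {n: scales[n] * base[n] for n in names}
        loss = loss_fn(functional_call(model, scaled, (x,)), y)
        # simulate one inner SGD step on the rescaled weights
        grads = torch.autograd.grad(loss, list(scaled.values()), create_graph=True)
        stepped = {n: w - inner_lr * g for (n, w), g in zip(scaled.items(), grads)}
        # loss after the simulated step; backpropagate into the scales
        post_loss = loss_fn(functional_call(model, stepped, (x,)), y)
        opt.zero_grad()
        post_loss.backward()
        opt.step()
        with torch.no_grad():
            for s in scales.values():
                s.clamp_(min=1e-3)  # keep scales positive

    # bake the learned scales into the model's parameters
    with torch.no_grad():
        for n, p in model.named_parameters():
            p.mul_(scales[n])
    return {n: s.item() for n, s in scales.items()}
```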
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.