Related papers: A Re-Parameterized Vision Transformer (ReVT) for Domain-Generalized Semantic Segmentation

URL: http://arxiv.org/abs/2308.13331v1
Date: Fri, 25 Aug 2023 12:06:00 GMT
Title: A Re-Parameterized Vision Transformer (ReVT) for Domain-Generalized Semantic Segmentation
Authors: Jan-Aike Termöhlen, Timo Bartels, Tim Fingscheidt
Abstract summary: We present a new augmentation-driven approach to domain generalization for semantic segmentation. We achieve state-of-the-art mIoU performance of 47.3% (prior art: 46.3%) for small models and of 50.1% (prior art: 47.8%) for midsized models on commonly used benchmark datasets.
Score: 24.8695123473653
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The task of semantic segmentation requires a model to assign semantic labels to each pixel of an image. However, the performance of such models degrades when deployed in an unseen domain with different data distributions compared to the training domain. We present a new augmentation-driven approach to domain generalization for semantic segmentation using a re-parameterized vision transformer (ReVT) with weight averaging of multiple models after training. We evaluate our approach on several benchmark datasets and achieve state-of-the-art mIoU performance of 47.3% (prior art: 46.3%) for small models and of 50.1% (prior art: 47.8%) for midsized models on commonly used benchmark datasets. At the same time, our method requires fewer parameters and reaches a higher frame rate than the best prior art. It is also easy to implement and, unlike network ensembles, does not add any computational complexity during inference.
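The weight averaging mentioned in the abstract can be illustrated with a minimal sketch. It assumes each trained model is represented as a plain dictionary mapping parameter names to flat lists of floats; the `Params` type and the `average_weights` helper are hypothetical names introduced here, not taken from the paper. The abstract does not say which parts of the network are averaged or how the individual models are trained, so this shows only the generic element-wise mean.

```python
from typing import Dict, List

# Hypothetical stand-in for a model's parameters: name -> flat list of values.
Params = Dict[str, List[float]]


def average_weights(models: List[Params]) -> Params:
    """Return the element-wise mean of several parameter sets with identical layout."""
    if not models:
        raise ValueError("need at least one model")
    keys = models[0].keys()
    for m in models[1:]:
        if m.keys() != keys:
            raise ValueError("all models must share the same parameter names")
    averaged: Params = {}
    for name in keys:
        tensors = [m[name] for m in models]
        # Every model must store the same number of values for this parameter.
        if len({len(t) for t in tensors}) != 1:
            raise ValueError(f"size mismatch for parameter {name!r}")
        averaged[name] = [sum(vals) / len(models) for vals in zip(*tensors)]
    return averaged


# Usage: two toy "models" with one parameter each.
merged = average_weights([{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}])
```

Because the result is a single parameter set of the same size as each input, inference runs one model rather than several, which is consistent with the abstract's remark that the method, unlike network ensembles, adds no computational complexity at inference time.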

Related papers

No time to train! Training-Free Reference-Based Instance Segmentation [15.061599989448867]
This work investigates the task of object segmentation when provided with only a small set of reference images. Our key insight is to leverage strong semantic priors, as learned by foundation models, to identify corresponding regions between a reference and a target image. We find that correspondences enable automatic generation of instance-level segmentation masks for downstream tasks and instantiate our ideas via a multi-stage, training-free method.
arXiv Detail & Related papers (2025-07-03T16:59:01Z)
Dynamic Pre-training: Towards Efficient and Scalable All-in-One Image Restoration [100.54419875604721]
All-in-one image restoration tackles different types of degradations with a unified model instead of having task-specific, non-generic models for each degradation. We propose DyNet, a dynamic family of networks designed in an encoder-decoder style for all-in-one image restoration tasks. Our DyNet can seamlessly switch between its bulkier and lightweight variants, thereby offering flexibility for efficient model deployment.
arXiv Detail & Related papers (2024-04-02T17:58:49Z)
Early Fusion of Features for Semantic Segmentation [10.362589129094975]
This paper introduces a novel segmentation framework that integrates a classifier network with a reverse HRNet architecture for efficient image segmentation. Our methodology is rigorously tested across several benchmark datasets including Mapillary Vistas, Cityscapes, CamVid, COCO, and PASCAL-VOC2012. The results demonstrate the effectiveness of our proposed model in achieving high segmentation accuracy, indicating its potential for various applications in image analysis.
arXiv Detail & Related papers (2024-02-08T22:58:06Z)
CNNs with Multi-Level Attention for Domain Generalization [3.1372269816123994]
Deep convolutional neural networks have achieved significant success in image classification and ranking. However, they suffer from performance degradation when tested on out-of-distribution scenarios. We propose an alternative neural network architecture for robust, out-of-distribution image classification.
arXiv Detail & Related papers (2023-04-02T10:34:40Z)
MSeg: A Composite Dataset for Multi-domain Semantic Segmentation [100.17755160696939]
We present MSeg, a composite dataset that unifies semantic segmentation datasets from different domains. We reconcile the taxonomies and bring the pixel-level annotations into alignment by relabeling more than 220,000 object masks in more than 80,000 images. A model trained on MSeg ranks first on the WildDash-v1 leaderboard for robust semantic segmentation, with no exposure to WildDash data during training.
arXiv Detail & Related papers (2021-12-27T16:16:35Z)
Multi-dataset Pretraining: A Unified Model for Semantic Segmentation [97.61605021985062]
We propose a unified framework, termed as Multi-Dataset Pretraining, to take full advantage of the fragmented annotations of different datasets. This is achieved by first pretraining the network via the proposed pixel-to-prototype contrastive loss over multiple datasets. In order to better model the relationship among images and classes from different datasets, we extend the pixel level embeddings via cross dataset mixing.
arXiv Detail & Related papers (2021-06-08T06:13:11Z)
Pre-Trained Models for Heterogeneous Information Networks [57.78194356302626]
We propose a self-supervised pre-training and fine-tuning framework, PF-HIN, to capture the features of a heterogeneous information network. PF-HIN consistently and significantly outperforms state-of-the-art alternatives on each of these tasks, on four datasets.
arXiv Detail & Related papers (2020-07-07T03:36:28Z)
Feature Transformation Ensemble Model with Batch Spectral Regularization for Cross-Domain Few-Shot Classification [66.91839845347604]
We propose an ensemble prediction model by performing diverse feature transformations after a feature extraction network. We use a batch spectral regularization term to suppress the singular values of the feature matrix during pre-training to improve the generalization ability of the model. The proposed model can then be fine tuned in the target domain to address few-shot classification.
arXiv Detail & Related papers (2020-05-18T05:31:04Z)
FDA: Fourier Domain Adaptation for Semantic Segmentation [82.4963423086097]
We describe a simple method for unsupervised domain adaptation, whereby the discrepancy between the source and target distributions is reduced by swapping the low-frequency spectrum of one with the other. We illustrate the method in semantic segmentation, where densely annotated images are aplenty in one domain, but difficult to obtain in another. Our results indicate that even simple procedures can discount nuisance variability in the data that more sophisticated methods struggle to learn away.
arXiv Detail & Related papers (2020-04-11T22:20:48Z)
Objectness-Aware Few-Shot Semantic Segmentation [31.13009111054977]
We show how to increase overall model capacity to achieve improved performance. We introduce objectness, which is class-agnostic and so not prone to overfitting. Given only one annotated example of an unseen category, experiments show that our method outperforms state-of-the-art methods with respect to mIoU.
arXiv Detail & Related papers (2020-04-06T19:12:08Z)
Generalizable Model-agnostic Semantic Segmentation via Target-specific Normalization [24.14272032117714]
We propose a novel domain generalization framework for the generalizable semantic segmentation task. We exploit the model-agnostic learning to simulate the domain shift problem. Considering the data-distribution discrepancy between seen source and unseen target domains, we develop the target-specific normalization scheme.
arXiv Detail & Related papers (2020-03-27T09:25:19Z)

This list is automatically generated from the titles and abstracts of the papers on this site.