L-MAE: Masked Autoencoders are Semantic Segmentation Datasets Augmenter
- URL: http://arxiv.org/abs/2211.11242v2
- Date: Sun, 1 Oct 2023 07:25:31 GMT
- Title: L-MAE: Masked Autoencoders are Semantic Segmentation Datasets Augmenter
- Authors: Jiaru Jia, Mingzhe Liu, Jiake Xie, Xin Chen, Hong Zhang, Feixiang
Zhao, Aiqing Yang
- Abstract summary: This paper proposes a simple and effective label pixel-level completion method, Label Mask AutoEncoder (L-MAE)
The proposed model is the first to apply the Mask Auto-Encoder to downstream tasks.
Experiments demonstrate a performance enhancement of 13.5% in the model trained with the L-MAE-enhanced dataset.
- Score: 8.183553437724603
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generating semantic segmentation datasets has consistently been laborious and
time-consuming, particularly in the context of large models or specialized
domains (e.g., medical imaging or remote sensing). Specifically, large models
necessitate a substantial volume of data, while datasets in professional
domains frequently require the involvement of domain experts. Both scenarios
are susceptible to inaccurate data labeling, which can significantly affect the
ultimate performance of the trained model. This paper proposes a simple and
effective label pixel-level completion method, \textbf{Label Mask AutoEncoder}
(L-MAE), which fully uses the existing information in the label to generate the
complete label. The proposed model is the first to apply the Mask Auto-Encoder
to downstream tasks. In detail, L-MAE adopts a fusion strategy that stacks
the label and the corresponding image, termed the fuse map. Moreover, since some of
the image information is lost when masking the fuse map, direct reconstruction
may lead to poor performance. We propose an Image Patch Supplement algorithm to
supplement the missing information during the mask-reconstruct process, and
empirically find that it improves mIoU by 4.1\% on average.
We conducted an experiment to evaluate how effectively L-MAE completes a
dataset. For the initial set of experiments, we trained an identical
conventional semantic segmentation model on a degraded Pascal VOC dataset and
on the same degraded dataset enhanced by L-MAE. The model trained on the
L-MAE-enhanced dataset outperforms the model trained on the unenhanced
dataset by 13.5\%.
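The abstract describes the pipeline only at a high level, so the following is a minimal PyTorch sketch of how the fuse-map construction, MAE-style patch masking, and an Image Patch Supplement step could fit together. The function names, tensor shapes, patch size, mask ratio, and the supplement rule (copying original image pixels back into masked patches) are all illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F


def build_fuse_map(image: torch.Tensor, label: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Stack an RGB image and a one-hot label map along the channel axis
    to form the 'fuse map' described in the abstract.
    image: (3, H, W) float tensor; label: (H, W) long tensor of class ids."""
    one_hot = F.one_hot(label, num_classes).permute(2, 0, 1).float()  # (C, H, W)
    return torch.cat([image, one_hot], dim=0)                         # (3 + C, H, W)


def random_patch_mask(fuse_map: torch.Tensor, patch: int = 16, mask_ratio: float = 0.75):
    """Zero out a random subset of non-overlapping patches, MAE-style.
    Returns the masked fuse map and a (H/patch, W/patch) boolean keep grid."""
    _, h, w = fuse_map.shape
    keep = torch.rand(h // patch, w // patch) > mask_ratio            # True = visible
    pixel_keep = keep.repeat_interleave(patch, 0).repeat_interleave(patch, 1)
    return fuse_map * pixel_keep, keep


def image_patch_supplement(masked: torch.Tensor, image: torch.Tensor,
                           keep: torch.Tensor, patch: int = 16) -> torch.Tensor:
    """Hypothetical Image Patch Supplement step: restore the original image
    channels inside the masked patches, so that only the label channels
    remain for the decoder to reconstruct."""
    hidden = (~keep).repeat_interleave(patch, 0).repeat_interleave(patch, 1)
    out = masked.clone()
    out[:3][:, hidden] = image[:, hidden]  # refill image channels only
    return out


# Illustrative usage (21 classes matches Pascal VOC, as used in the paper):
# fuse = build_fuse_map(image, label, num_classes=21)
# masked, keep = random_patch_mask(fuse)
# model_input = image_patch_supplement(masked, image, keep)
```

In this reading, an MAE-style encoder-decoder would then take the supplemented fuse map and reconstruct the label channels of the masked patches; the abstract does not specify these details, so treat the sketch as one plausible instantiation rather than the paper's method.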
Related papers
- Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach [56.55633052479446]
Web-scale visual entity recognition presents significant challenges due to the lack of clean, large-scale training data.
We propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation.
Experiments demonstrate that models trained on this automatically curated data achieve state-of-the-art performance on web-scale visual entity recognition tasks.
arXiv Detail & Related papers (2024-10-31T06:55:24Z)
- Towards Natural Image Matting in the Wild via Real-Scenario Prior [69.96414467916863]
We propose a new matting dataset based on the COCO dataset, namely COCO-Matting.
The resulting COCO-Matting comprises an extensive collection of 38,251 human instance-level alpha mattes in complex natural scenarios.
For network architecture, the proposed feature-aligned transformer learns to extract fine-grained edge and transparency features.
The proposed matte-aligned decoder aims to segment matting-specific objects and convert coarse masks into high-precision mattes.
arXiv Detail & Related papers (2024-10-09T06:43:19Z)
- Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning [116.75939193785143]
Contrastive learning (CL) for Vision Transformers (ViTs) in image domains has achieved performance comparable to CL for traditional convolutional backbones.
In 3D point cloud pretraining with ViTs, masked autoencoder (MAE) modeling remains dominant.
arXiv Detail & Related papers (2024-07-08T12:28:56Z)
- Self-supervised Scene Text Segmentation with Object-centric Layered Representations Augmented by Text Regions [22.090074821554754]
We propose a self-supervised scene text segmentation algorithm with layered decoupling of representations, derived in an object-centric manner, to segment images into text and background.
On several public scene text datasets, our method outperforms the state-of-the-art unsupervised segmentation algorithms.
arXiv Detail & Related papers (2023-08-25T05:00:05Z)
- DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models [61.906934570771256]
We present a generic dataset generation model that can produce diverse synthetic images and perception annotations.
Our method builds upon the pre-trained diffusion model and extends text-guided image synthesis to perception data generation.
We show that the rich latent code of the diffusion model can be effectively decoded as accurate perception annotations using a decoder module.
arXiv Detail & Related papers (2023-08-11T14:38:11Z)
- RADiff: Controllable Diffusion Models for Radio Astronomical Maps Generation [6.128112213696457]
RADiff is a generative approach based on conditional diffusion models trained over an annotated radio dataset.
We show that it is possible to generate fully-synthetic image-annotation pairs to automatically augment any annotated dataset.
arXiv Detail & Related papers (2023-07-05T16:04:44Z)
- Contrastive Tuning: A Little Help to Make Masked Autoencoders Forget [10.290956481715387]
Masked Autoencoder Contrastive Tuning (MAE-CT) is a sequential approach that tunes the rich features such that they form semantic clusters of objects without using any labels.
MAE-CT does not rely on hand-crafted augmentations and frequently achieves its best performance while using only minimal augmentations (crop & flip).
MAE-CT excels over previous self-supervised methods trained on ImageNet in linear probing, k-NN, and low-shot classification accuracy, as well as in unsupervised clustering accuracy.
arXiv Detail & Related papers (2023-04-20T17:51:09Z)
- Improving Semi-Supervised and Domain-Adaptive Semantic Segmentation with Self-Supervised Depth Estimation [94.16816278191477]
We present a framework for semi-supervised and domain-adaptive semantic segmentation.
It is enhanced by self-supervised monocular depth estimation trained only on unlabeled image sequences.
We validate the proposed model on the Cityscapes dataset.
arXiv Detail & Related papers (2021-08-28T01:33:38Z)
- Boosting Few-shot Semantic Segmentation with Transformers [81.43459055197435]
We propose a TRansformer-based Few-shot Semantic segmentation method (TRFS).
Our model consists of two modules: a Global Enhancement Module (GEM) and a Local Enhancement Module (LEM).
arXiv Detail & Related papers (2021-08-04T20:09:21Z)