Decoder Denoising Pretraining for Semantic Segmentation
- URL: http://arxiv.org/abs/2205.11423v1
- Date: Mon, 23 May 2022 16:08:31 GMT
- Title: Decoder Denoising Pretraining for Semantic Segmentation
- Authors: Emmanuel Brempong Asiedu, Simon Kornblith, Ting Chen, Niki Parmar,
Matthias Minderer and Mohammad Norouzi
- Abstract summary: We propose a decoder pretraining approach based on denoising.
We find that decoder denoising pretraining on the ImageNet dataset strongly outperforms encoder-only supervised pretraining.
- Score: 46.23441959230505
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semantic segmentation labels are expensive and time consuming to acquire.
Hence, pretraining is commonly used to improve the label-efficiency of
segmentation models. Typically, the encoder of a segmentation model is
pretrained as a classifier and the decoder is randomly initialized. Here, we
argue that random initialization of the decoder can be suboptimal, especially
when few labeled examples are available. We propose a decoder pretraining
approach based on denoising, which can be combined with supervised pretraining
of the encoder. We find that decoder denoising pretraining on the ImageNet
dataset strongly outperforms encoder-only supervised pretraining. Despite its
simplicity, decoder denoising pretraining achieves state-of-the-art results on
label-efficient semantic segmentation and offers considerable gains on the
Cityscapes, Pascal Context, and ADE20K datasets.
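To make the pretraining recipe concrete, below is a minimal PyTorch-style sketch of one decoder denoising pretraining step. It assumes additive Gaussian noise at a fixed level and a noise-prediction (MSE) target, with the encoder initialized from supervised classification pretraining; the paper's exact noise formulation, target, and schedule may differ, and all names here are illustrative.

```python
import torch
import torch.nn.functional as F

def decoder_denoising_step(encoder, decoder, images, optimizer, sigma=0.8):
    """One decoder denoising pretraining step on a batch of unlabeled images.

    Assumptions: `encoder` starts from supervised (e.g. ImageNet) weights and
    may be kept frozen; `optimizer` covers the parameters being pretrained
    (typically the decoder); `sigma` is an illustrative noise level.
    """
    noise = torch.randn_like(images)
    noisy_images = images + sigma * noise      # corrupt the images with Gaussian noise
    features = encoder(noisy_images)           # encode the noisy input
    pred_noise = decoder(features)             # decoder is trained to predict the noise
    loss = F.mse_loss(pred_noise, noise)       # denoising objective

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

After this pretraining stage, the decoder's output layer would be swapped for a segmentation head and the whole encoder-decoder model fine-tuned on the labeled segmentation data.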
Related papers
- Should we pre-train a decoder in contrastive learning for dense prediction tasks? [0.7237068561453082]
We propose a framework-agnostic adaptation to convert an encoder-only self-supervised learning (SSL) contrastive approach to an efficient encoder-decoder framework.
We first update the existing architecture to accommodate a decoder and its respective contrastive loss.
We then introduce a weighted encoder-decoder contrastive loss with non-competing objectives that facilitates joint pre-training of the encoder-decoder architecture.
arXiv Detail & Related papers (2025-03-21T20:19:13Z)
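For the entry above, the following is a hypothetical sketch of what a weighted encoder-decoder contrastive loss could look like: one InfoNCE term on encoder-level embeddings of two augmented views, another on decoder-level embeddings, combined with a weight. The pooling, temperature, and weighting are assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE between two batches of view embeddings (positives on the diagonal)."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return F.cross_entropy(logits, targets)

def encoder_decoder_contrastive_loss(enc_view1, enc_view2, dec_view1, dec_view2, w_dec=0.5):
    """Weighted sum of an encoder-level and a decoder-level contrastive term.

    Inputs are pooled embeddings of two augmented views taken at the encoder
    output and at the decoder output, respectively (illustrative interface).
    """
    loss_enc = info_nce(enc_view1, enc_view2)
    loss_dec = info_nce(dec_view1, dec_view2)
    return (1.0 - w_dec) * loss_enc + w_dec * loss_dec
```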
- $ε$-VAE: Denoising as Visual Decoding [61.29255979767292]
In generative modeling, tokenization simplifies complex data into compact, structured representations, creating a more efficient, learnable space.
Current visual tokenization methods rely on a traditional autoencoder framework, where the encoder compresses data into latent representations, and the decoder reconstructs the original input.
We propose denoising as decoding, shifting from single-step reconstruction to iterative refinement. Specifically, we replace the decoder with a diffusion process that iteratively refines noise to recover the original image, guided by the latents provided by the encoder.
We evaluate our approach by assessing both reconstruction (rFID) and generation quality.
arXiv Detail & Related papers (2024-10-05T08:27:53Z)
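To make the "denoising as decoding" idea above concrete, here is an illustrative sketch of an iterative refinement loop in which a latent-conditioned denoiser, starting from pure noise, progressively recovers an image. The step count, schedule, and the denoiser's interface are assumptions for illustration rather than the paper's exact diffusion formulation.

```python
import torch

@torch.no_grad()
def iterative_decode(denoiser, latent, image_shape, num_steps=50):
    """Decode an encoder latent by iterative denoising instead of one decoder pass.

    `denoiser(x, t, latent)` is a hypothetical network call that returns a
    slightly less noisy estimate of the image, conditioned on the latent.
    """
    x = torch.randn(image_shape)                 # start from pure Gaussian noise
    for step in reversed(range(num_steps)):
        t = torch.full((image_shape[0],), step)  # current refinement step index
        x = denoiser(x, t, latent)               # refine toward the image, guided by the latent
    return x
```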
- UNFUSED: UNsupervised Finetuning Using SElf supervised Distillation [53.06337011259031]
We introduce UnFuSeD, a novel approach to leverage self-supervised learning for audio classification.
We use the encoder to generate pseudo-labels for unsupervised fine-tuning before the actual fine-tuning step.
UnFuSeD achieves state-of-the-art results on the LAPE Benchmark, significantly outperforming all our baselines.
arXiv Detail & Related papers (2023-03-10T02:43:36Z)
- Transfer Learning for Segmentation Problems: Choose the Right Encoder and Skip the Decoder [0.0]
It is common practice to reuse models initially trained on different data to increase downstream task performance.
In this work, we investigate the impact of transfer learning for segmentation problems, i.e., pixel-wise classification problems.
We find that transfer learning the decoder does not help downstream segmentation tasks, while transfer learning the encoder is truly beneficial.
arXiv Detail & Related papers (2022-07-29T07:02:05Z)
- ED2LM: Encoder-Decoder to Language Model for Faster Document Re-ranking Inference [70.36083572306839]
This paper proposes a new training and inference paradigm for re-ranking.
We finetune a pretrained encoder-decoder model on the task of document-to-query generation.
We show that this encoder-decoder architecture can be decomposed into a decoder-only language model during inference.
arXiv Detail & Related papers (2022-04-25T06:26:29Z)
- Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data [145.95460945321253]
We introduce two pre-training tasks for the encoder-decoder network using acoustic units, i.e., pseudo codes.
The proposed Speech2C reduces the word error rate (WER) by a relative 19.2% over the method without decoder pre-training.
arXiv Detail & Related papers (2022-03-31T15:33:56Z)
- EncoderMI: Membership Inference against Pre-trained Encoders in Contrastive Learning [27.54202989524394]
We propose EncoderMI, the first membership inference method against image encoders pre-trained by contrastive learning.
We evaluate EncoderMI on image encoders pre-trained on multiple datasets by ourselves as well as the Contrastive Language-Image Pre-training (CLIP) image encoder, which is pre-trained on 400 million (image, text) pairs collected from the Internet and released by OpenAI.
arXiv Detail & Related papers (2021-08-25T03:00:45Z)
- Dynamic Neural Representational Decoders for High-Resolution Semantic Segmentation [98.05643473345474]
We propose a novel decoder, termed dynamic neural representational decoder (NRD).
As each location on the encoder's output corresponds to a local patch of the semantic labels, in this work, we represent these local patches of labels with compact neural networks.
This neural representation enables our decoder to leverage the smoothness prior in the semantic label space, and thus makes our decoder more efficient.
arXiv Detail & Related papers (2021-07-30T04:50:56Z)
- Stacked Acoustic-and-Textual Encoding: Integrating the Pre-trained Models into Speech Translation Encoders [30.160261563657947]
Speech-to-translation data is scarce; pre-training is promising in end-to-end Speech Translation.
We propose a Stacked Acoustic-and-Textual Encoding (SATE) method for speech translation.
Our encoder begins by processing the acoustic sequence as usual, but later behaves more like an MT encoder, producing a global representation of the input sequence.
arXiv Detail & Related papers (2021-05-12T16:09:53Z)
- Cross-Thought for Sentence Encoder Pre-training [89.32270059777025]
Cross-Thought is a novel approach to pre-training a sequence encoder.
We train a Transformer-based sequence encoder over a large set of short sequences.
Experiments on question answering and textual entailment tasks demonstrate that our pre-trained encoder can outperform state-of-the-art encoders.
arXiv Detail & Related papers (2020-10-07T21:02:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.