OLED: One-Class Learned Encoder-Decoder Network with Adversarial Context
Masking for Novelty Detection
- URL: http://arxiv.org/abs/2103.14953v1
- Date: Sat, 27 Mar 2021 17:59:40 GMT
- Title: OLED: One-Class Learned Encoder-Decoder Network with Adversarial Context
Masking for Novelty Detection
- Authors: John Taylor Jewell, Vahid Reza Khazaie, Yalda Mohsenzadeh
- Abstract summary: Novelty detection is the task of recognizing samples that do not belong to the distribution of the target class.
Deep autoencoders have been widely used as a base of many unsupervised novelty detection methods.
We have designed a framework consisting of two competing networks, a Mask Module and a Reconstructor.
- Score: 1.933681537640272
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Novelty detection is the task of recognizing samples that do not belong to
the distribution of the target class. During training, the novelty class is
absent, preventing the use of traditional classification approaches. Deep
autoencoders have been widely used as a base of many unsupervised novelty
detection methods. In particular, context autoencoders have been successful in
the novelty detection task because of the more effective representations they
learn by reconstructing original images from randomly masked images. However, a
significant drawback of context autoencoders is that random masking fails to
consistently cover important structures of the input image, leading to
suboptimal representations, especially for the novelty detection task. In this
paper, to optimize input masking, we have designed a framework consisting of
two competing networks, a Mask Module and a Reconstructor. The Mask Module is a
convolutional autoencoder that learns to generate optimal masks that cover the
most important parts of images. In contrast, the Reconstructor is a
convolutional encoder-decoder that aims to reconstruct unperturbed images from
masked images. The networks are trained in an adversarial manner in which the
Mask Module generates masks that are applied to images given to the
Reconstructor. In this way, the Mask Module seeks to maximize the
reconstruction error that the Reconstructor is minimizing. When applied to
novelty detection, the proposed approach learns semantically richer
representations than context autoencoders and enhances novelty detection at
test time through better-targeted masking. Novelty detection experiments on the
MNIST and CIFAR-10 image datasets demonstrate the proposed approach's
superiority over state-of-the-art methods. In a further experiment on the UCSD
video dataset for novelty detection, the proposed approach achieves
state-of-the-art results.
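To make the training dynamic concrete, here is a minimal PyTorch sketch of the two-network setup described above. It is an illustration, not the authors' code: the architectures, optimizer settings, the soft sigmoid masking, and the use of MSE for both the training loss and the test-time novelty score are all assumptions.

```python
# Minimal sketch of the adversarial context-masking setup from the abstract.
# Architectures, hyperparameters, and the soft sigmoid mask are illustrative
# assumptions, not the paper's actual implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

def small_encoder_decoder(channels: int = 1) -> nn.Sequential:
    # Toy convolutional encoder-decoder reused for both networks.
    return nn.Sequential(
        nn.Conv2d(channels, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
        nn.ConvTranspose2d(32, channels, 4, stride=2, padding=1),
    )

mask_module = small_encoder_decoder()    # learns where to mask
reconstructor = small_encoder_decoder()  # learns to undo the masking
opt_mask = torch.optim.Adam(mask_module.parameters(), lr=1e-4)
opt_rec = torch.optim.Adam(reconstructor.parameters(), lr=1e-4)

def train_step(images: torch.Tensor) -> None:
    # 1) Reconstructor update: minimize reconstruction error on inputs
    #    masked by a frozen Mask Module (mask == 1 means "covered").
    with torch.no_grad():
        mask = torch.sigmoid(mask_module(images))
    recon = reconstructor(images * (1.0 - mask))
    rec_loss = F.mse_loss(recon, images)
    opt_rec.zero_grad()
    rec_loss.backward()
    opt_rec.step()

    # 2) Mask Module update: maximize the same error (descent on the negated
    #    loss), pushing masks toward the most informative image regions.
    mask = torch.sigmoid(mask_module(images))
    recon = reconstructor(images * (1.0 - mask))
    adv_loss = -F.mse_loss(recon, images)
    opt_mask.zero_grad()
    adv_loss.backward()
    opt_mask.step()

def novelty_score(image: torch.Tensor) -> float:
    # At test time, a higher reconstruction error under the learned masking
    # suggests novelty: the Reconstructor only learned to inpaint the target class.
    with torch.no_grad():
        mask = torch.sigmoid(mask_module(image))
        recon = reconstructor(image * (1.0 - mask))
        return F.mse_loss(recon, image).item()

# Usage, e.g. on one MNIST-sized grayscale image:
score = novelty_score(torch.rand(1, 1, 28, 28))
```

Using the masked-input reconstruction error as the score mirrors the abstract's claim that adversarially learned masks widen the gap between in-distribution and novel samples; thresholding or ranking the score (e.g. for AUROC) then yields the detection decision.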
Related papers
- ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders [53.3185750528969]
Masked AutoEncoders (MAE) have emerged as a robust self-supervised framework.
We introduce a data-independent method, termed ColorMAE, which generates different binary mask patterns by filtering random noise.
We demonstrate our strategy's superiority in downstream tasks compared to random masking.
arXiv Detail & Related papers (2024-07-17T22:04:00Z)
- MaskCD: A Remote Sensing Change Detection Network Based on Mask Classification [29.15203530375882]
Change detection (CD) from remote sensing (RS) images using deep learning has been widely investigated in the literature.
We propose MaskCD to detect changed areas by adaptively generating categorized masks from input image pairs.
It reconstructs the desired changed objects by decoding the pixel-wise representations into learnable mask proposals.
arXiv Detail & Related papers (2024-04-18T11:05:15Z)
- On Mask-based Image Set Desensitization with Recognition Support [46.51027529020668]
We propose a mask-based image desensitization approach while supporting recognition.
We exploit an interpretation algorithm to maintain critical information for the recognition task.
In addition, we propose a feature selection masknet as the model adjustment method to improve the performance based on the masked images.
arXiv Detail & Related papers (2023-12-14T14:26:42Z)
- Neural Image Compression Using Masked Sparse Visual Representation [17.229601298529825]
We study neural image compression based on the Sparse Visual Representation (SVR), where images are embedded into a discrete latent space spanned by learned visual codebooks.
By sharing codebooks with the decoder, the encoder transfers codeword indices that are efficient and cross-platform robust.
We propose a Masked Adaptive Codebook learning (M-AdaCode) method that applies masks to the latent feature subspace to balance bitrate and reconstruction quality.
arXiv Detail & Related papers (2023-09-20T21:59:23Z)
- Not All Image Regions Matter: Masked Vector Quantization for Autoregressive Image Generation [78.13793505707952]
Existing autoregressive models follow the two-stage generation paradigm that first learns a codebook in the latent space for image reconstruction and then completes the image generation autoregressively based on the learned codebook.
We propose a novel two-stage framework, which consists of a Masked Quantization VAE (MQ-VAE) and a Stackformer, relieving the model from modeling redundancy.
arXiv Detail & Related papers (2023-05-23T02:15:53Z)
- Improving Masked Autoencoders by Learning Where to Mask [65.89510231743692]
Masked image modeling is a promising self-supervised learning method for visual data.
We present AutoMAE, a framework that uses Gumbel-Softmax to interlink an adversarially-trained mask generator and a mask-guided image modeling process.
In our experiments, AutoMAE is shown to provide effective pretraining models on standard self-supervised benchmarks and downstream tasks. (A toy Gumbel-Softmax sketch follows this list.)
arXiv Detail & Related papers (2023-03-12T05:28:55Z)
- MaskSketch: Unpaired Structure-guided Masked Image Generation [56.88038469743742]
MaskSketch is an image generation method that allows spatial conditioning of the generation result using a guiding sketch as an extra conditioning signal during sampling.
We show that intermediate self-attention maps of a masked generative transformer encode important structural information of the input image.
Our results show that MaskSketch achieves high image realism and fidelity to the guiding structure.
arXiv Detail & Related papers (2023-02-10T20:27:02Z)
- Context Autoencoder for Self-Supervised Representation Learning [64.63908944426224]
We pretrain an encoder by making predictions in the encoded representation space.
The network is an encoder-regressor-decoder architecture.
We demonstrate the effectiveness of our CAE through superior transfer performance in downstream tasks.
arXiv Detail & Related papers (2022-02-07T09:33:45Z)
- Contrastive Attention Network with Dense Field Estimation for Face Completion [11.631559190975034]
We propose a self-supervised Siamese inference network to improve the generalization and robustness of encoders.
To deal with geometric variations of face images, a dense correspondence field is integrated into the network.
This multi-scale architecture helps the decoder map discriminative representations learned by the encoders into images.
arXiv Detail & Related papers (2021-12-20T02:54:38Z)
- Adaptive Shrink-Mask for Text Detection [91.34459257409104]
Existing real-time text detectors reconstruct text contours directly from shrink-masks.
The dependence on predicted shrink-masks leads to unstable detection results.
A Super-pixel Window (SPW) is designed to supervise the network.
arXiv Detail & Related papers (2021-11-18T07:38:57Z)
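Since the AutoMAE entry above leans on the Gumbel-Softmax trick, here is a toy PyTorch illustration of why it matters for adversarially trained mask generators: it keeps a hard, per-patch keep/mask decision differentiable. This is a generic sketch of the trick, not AutoMAE's code; the patch count and the loss are placeholders.

```python
# Toy Gumbel-Softmax demo: discrete per-patch masking with usable gradients.
import torch
import torch.nn.functional as F

num_patches = 196                                          # e.g. a 14x14 ViT patch grid
logits = torch.randn(num_patches, 2, requires_grad=True)   # per-patch [keep, mask] scores

# hard=True emits one-hot samples in the forward pass while the backward pass
# uses the soft probabilities (straight-through estimator).
sample = F.gumbel_softmax(logits, tau=1.0, hard=True, dim=-1)
mask = sample[:, 1]                                        # 1.0 = patch masked, 0.0 = kept

loss = mask.sum()                                          # placeholder for a mask-guided modeling loss
loss.backward()
assert logits.grad is not None                             # gradients flow through the discrete choice
```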
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.