How Much Training Data is Memorized in Overparameterized Autoencoders? An Inverse Problem Perspective on Memorization Evaluation
- URL: http://arxiv.org/abs/2310.02897v2
- Date: Thu, 13 Jun 2024 15:13:09 GMT
- Title: How Much Training Data is Memorized in Overparameterized Autoencoders? An Inverse Problem Perspective on Memorization Evaluation
- Authors: Koren Abitbul, Yehuda Dar
- Abstract summary: We propose an inverse problem perspective for the study of memorization.
We use the trained autoencoder to implicitly define a regularizer for the particular training dataset from which we aim to retrieve images.
We show that our method significantly outperforms previous memorization-evaluation methods that recover training data from autoencoders.
- Score: 1.573034584191491
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Overparameterized autoencoder models often memorize their training data. For image data, memorization is often examined by using the trained autoencoder to recover missing regions in its training images (which were used only in their complete forms during training). In this paper, we propose an inverse problem perspective for the study of memorization. Given a degraded training image, we define the recovery of the original training image as an inverse problem and formulate it as an optimization task. In our inverse problem, we use the trained autoencoder to implicitly define a regularizer for the particular training dataset from which we aim to retrieve images. We develop this intricate optimization task into a practical method that iteratively applies the trained autoencoder together with relatively simple computations that estimate and address the unknown degradation operator. We evaluate our method on blind inpainting, where the goal is to recover training images degraded by many missing pixels in an unknown pattern. We examine various deep autoencoder architectures, such as fully connected and U-Net (with various nonlinearities and at diverse training loss values), and show that our method significantly outperforms previous memorization-evaluation methods that recover training data from autoencoders. Importantly, our method also greatly improves recovery performance in settings that were previously considered highly challenging, and even impractical, for such recovery and memorization evaluation.
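To make the iterative recovery concrete, here is a minimal plug-and-play-style sketch of the idea: the trained autoencoder supplies the training-set-specific regularization step, while a simple thresholding rule stands in for the estimation of the unknown degradation mask. The `autoencoder` interface, the mask-estimation rule, and all hyperparameters are illustrative assumptions, not the authors' exact algorithm.

```python
import torch

def blind_inpaint(y, autoencoder, n_iters=200, threshold=0.1):
    """Illustrative recovery loop for blind inpainting: the trained
    autoencoder acts as an implicit, training-set-specific regularizer,
    and the unknown missing-pixel mask is re-estimated every iteration."""
    x = y.clone()                                 # start from the degraded image
    for _ in range(n_iters):
        with torch.no_grad():
            x_reg = autoencoder(x)                # regularization: apply the trained AE
        # crude degradation estimate: pixels where the observation agrees
        # with the AE reconstruction are treated as observed (mask = 1)
        mask = (torch.abs(y - x_reg) < threshold).float()
        # data consistency: keep observed pixels from y, fill the rest
        # with the autoencoder's output
        x = mask * y + (1.0 - mask) * x_reg
    return x
```

In this sketch, pixels where the observation agrees with the autoencoder's reconstruction are kept as observed, and the rest are filled in from the autoencoder output; this is what lets the loop operate without knowing the degradation pattern in advance.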
Related papers
- Data Attribution for Text-to-Image Models by Unlearning Synthesized Images [71.23012718682634]
The goal of data attribution for text-to-image models is to identify the training images that most influence the generation of a new image.
We propose a new approach that efficiently identifies highly-influential images.
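The unlearning idea in the title can be pictured as follows: briefly push the model's loss up on the synthesized image, then rank training images by how much that update hurt them. Below is a heavily hedged toy sketch of that intuition, with a hypothetical per-example `loss_fn`; it is not the paper's text-to-image algorithm.

```python
import torch
import torch.nn.functional as F

def loss_fn(model, x):
    # hypothetical per-example reconstruction loss for a generative model
    return F.mse_loss(model(x.unsqueeze(0)), x.unsqueeze(0))

def attribute_by_unlearning(model, synthesized, train_images, steps=10, lr=1e-4):
    """Rank training images by how much a brief 'unlearning' of the
    synthesized image (gradient ascent on its loss) degrades them."""
    with torch.no_grad():
        before = torch.stack([loss_fn(model, x) for x in train_images])
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):
        loss = -loss_fn(model, synthesized)       # ascent: forget the sample
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        after = torch.stack([loss_fn(model, x) for x in train_images])
    return (after - before).argsort(descending=True)  # most-affected first
```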
arXiv Detail & Related papers (2024-06-13T17:59:44Z)
- Autoencoder Based Face Verification System [0.0]
The primary objective of this work is to present an alternative approach aimed at reducing the dependency on labeled data.
Our proposed method uses autoencoder pre-training within a face image recognition task, in a two-step process.
Experimental results demonstrate that initializing the deep neural network with pre-trained autoencoder parameters achieves results comparable to state-of-the-art methods.
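A minimal sketch of such a two-step pipeline, assuming a toy fully connected architecture and random stand-in data (the paper's actual architecture and datasets are not specified here):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sizes and data; the paper's architecture is not given here.
IMG_DIM, CODE_DIM, NUM_IDS = 64 * 64, 256, 100

# Step 1: unsupervised autoencoder pre-training on unlabeled face images.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(IMG_DIM, CODE_DIM), nn.ReLU())
decoder = nn.Sequential(nn.Linear(CODE_DIM, IMG_DIM), nn.Sigmoid())
autoencoder = nn.Sequential(encoder, decoder)
opt = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)

unlabeled = torch.rand(32, 1, 64, 64)              # stand-in for face images
for _ in range(10):
    loss = F.mse_loss(autoencoder(unlabeled), unlabeled.flatten(1))
    opt.zero_grad(); loss.backward(); opt.step()

# Step 2: initialize the recognition network with the pre-trained encoder,
# then fine-tune on the (smaller) labeled set.
classifier = nn.Sequential(encoder, nn.Linear(CODE_DIM, NUM_IDS))
opt = torch.optim.Adam(classifier.parameters(), lr=1e-4)
labeled = torch.rand(32, 1, 64, 64)                # stand-in labeled images
labels = torch.randint(0, NUM_IDS, (32,))
for _ in range(10):
    loss = F.cross_entropy(classifier(labeled), labels)
    opt.zero_grad(); loss.backward(); opt.step()
```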
arXiv Detail & Related papers (2023-12-21T21:18:53Z)
- Accelerating Multiframe Blind Deconvolution via Deep Learning [0.0]
Ground-based solar image restoration is a computationally expensive procedure.
We propose a new method to accelerate the restoration based on algorithm unrolling.
We show that both methods significantly reduce the restoration time compared to the standard optimization procedure.
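Algorithm unrolling turns a fixed number of optimizer iterations into trainable network stages. Below is a hedged sketch of an unrolled gradient-descent deconvolution, with learned step sizes and a small refinement convolution per stage; the design is an assumption for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UnrolledDeconv(nn.Module):
    """Illustrative unrolled gradient descent for deconvolution: each stage
    mirrors one iteration of the original optimizer, with a learned step
    size and a small learned refinement replacing a hand-crafted prior."""
    def __init__(self, n_stages=5):
        super().__init__()
        self.steps = nn.Parameter(torch.full((n_stages,), 0.5))
        self.refine = nn.ModuleList(
            nn.Conv2d(1, 1, 3, padding=1) for _ in range(n_stages)
        )

    def forward(self, y, kernel):
        # y: blurred observation (B, 1, H, W); kernel: blur kernel (1, 1, k, k)
        x = y.clone()
        pad = kernel.shape[-1] // 2
        for step, refine in zip(self.steps, self.refine):
            residual = F.conv2d(x, kernel, padding=pad) - y   # data-fit residual
            grad = F.conv_transpose2d(residual, kernel, padding=pad)
            x = x - step * grad          # gradient step on the data term
            x = refine(x) + x            # learned residual refinement
        return x
```

Training then fits the step sizes and refinement filters end-to-end on pairs of blurred and sharp images, which is what buys the speedup over running the original optimizer to convergence.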
arXiv Detail & Related papers (2023-06-21T07:53:00Z)
- Noise-Robust Dense Retrieval via Contrastive Alignment Post Training [89.29256833403167]
Contrastive Alignment POst Training (CAPOT) is a highly efficient finetuning method that improves model robustness without requiring index regeneration.
CAPOT enables robust retrieval by freezing the document encoder while the query encoder learns to align noisy queries with their unaltered root.
We evaluate CAPOT on noisy variants of MSMARCO, Natural Questions, and Trivia QA passage retrieval, finding CAPOT has a similar impact to data augmentation with none of its overhead.
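A sketch of what such alignment training might look like, with in-batch negatives and a temperature as assumed details; because the document encoder never updates, the pre-built document index remains valid, which is why no regeneration is needed:

```python
import torch
import torch.nn.functional as F

def alignment_loss(query_encoder, clean_queries, noisy_queries, temperature=0.05):
    """Pull each noisy query's embedding toward its unaltered ('root')
    query, using the other clean queries in the batch as negatives.
    Hypothetical formulation; details may differ from CAPOT's."""
    with torch.no_grad():
        roots = query_encoder(clean_queries)   # clean targets, no gradient
    noisy = query_encoder(noisy_queries)
    sims = F.cosine_similarity(noisy.unsqueeze(1), roots.unsqueeze(0), dim=-1)
    labels = torch.arange(sims.size(0))        # i-th noisy matches i-th root
    return F.cross_entropy(sims / temperature, labels)
```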
arXiv Detail & Related papers (2023-04-06T22:16:53Z)
- Attentive Symmetric Autoencoder for Brain MRI Segmentation [56.02577247523737]
We propose a novel Attentive Symmetric Auto-encoder based on Vision Transformer (ViT) for 3D brain MRI segmentation tasks.
In the pre-training stage, the proposed auto-encoder pays more attention to reconstructing the informative patches, selected according to gradient metrics.
Experimental results show that our proposed attentive symmetric auto-encoder outperforms the state-of-the-art self-supervised learning methods and medical image segmentation models.
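One plausible reading of "gradient metrics" is a per-patch image-gradient score used to weight reconstruction. The helper below computes such patch weights; it is an assumption about the mechanism, not the paper's exact definition.

```python
import torch
import torch.nn.functional as F

def patch_informativeness(images, patch=16):
    """One plausible 'gradient metric': mean image-gradient magnitude per
    patch, normalized so the weights sum to 1 per image; informative (high
    gradient) patches get larger reconstruction weights."""
    dx = images[..., :, 1:] - images[..., :, :-1]   # horizontal differences
    dy = images[..., 1:, :] - images[..., :-1, :]   # vertical differences
    mag = F.pad(dx.abs(), (0, 1)) + F.pad(dy.abs(), (0, 0, 0, 1))
    weights = F.avg_pool2d(mag, patch)              # one score per patch
    return weights / weights.sum(dim=(-2, -1), keepdim=True)
```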
arXiv Detail & Related papers (2022-09-19T09:43:19Z)
- Is Deep Image Prior in Need of a Good Education? [57.3399060347311]
Deep image prior was introduced as an effective prior for image reconstruction.
Despite its impressive reconstructive properties, the approach is slow when compared to learned or traditional reconstruction techniques.
We develop a two-stage learning paradigm to address the computational challenge.
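For context, the baseline whose cost the paper attacks looks roughly like this: fit an untrained CNN, driven by a fixed random input, to the single degraded observation, which typically takes thousands of iterations. The network below is a toy stand-in with assumed details.

```python
import torch
import torch.nn as nn

def deep_image_prior(y, n_iters=2000):
    """Baseline deep-image-prior loop: fit an untrained CNN, driven by a
    fixed random input, to the degraded observation y; the network's
    structure alone acts as the prior (toy architecture for illustration)."""
    net = nn.Sequential(
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, y.shape[1], 3, padding=1),
    )
    z = torch.randn(1, 32, y.shape[-2], y.shape[-1])  # fixed random code
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)
    for _ in range(n_iters):
        loss = nn.functional.mse_loss(net(z), y)
        opt.zero_grad(); loss.backward(); opt.step()
    return net(z).detach()
```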
arXiv Detail & Related papers (2021-11-23T15:08:26Z)
- Training Stacked Denoising Autoencoders for Representation Learning [0.0]
We implement stacked autoencoders, a class of neural networks capable of learning powerful representations of high-dimensional data.
We describe gradient descent for unsupervised training of autoencoders, as well as a novel genetic algorithm based approach that makes use of gradient information.
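Greedy layer-wise pre-training, the standard recipe behind such stacks, can be sketched as follows; the sizes, noise level, and activation are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def pretrain_stack(data, layer_sizes, noise=0.3, epochs=50):
    """Greedy layer-wise pre-training of a stacked denoising autoencoder:
    each layer learns to reconstruct its clean input from a corrupted
    version, then its codes become the next layer's input."""
    encoders, x = [], data
    for d_in, d_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        enc, dec = nn.Linear(d_in, d_out), nn.Linear(d_out, d_in)
        opt = torch.optim.Adam([*enc.parameters(), *dec.parameters()], lr=1e-3)
        for _ in range(epochs):
            noisy = x + noise * torch.randn_like(x)   # corrupt the input
            recon = dec(torch.sigmoid(enc(noisy)))
            loss = F.mse_loss(recon, x)               # reconstruct the clean input
            opt.zero_grad(); loss.backward(); opt.step()
        encoders.append(enc)
        with torch.no_grad():
            x = torch.sigmoid(enc(x))                 # codes feed the next layer
    return encoders

# e.g. pretrain_stack(torch.rand(128, 784), [784, 256, 64])
```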
arXiv Detail & Related papers (2021-02-16T08:18:22Z)
- EEC: Learning to Encode and Regenerate Images for Continual Learning [9.89901717499058]
We train autoencoders with Neural Style Transfer to encode and store images.
Reconstructed images from encoded episodes are replayed to avoid catastrophic forgetting.
Our approach increases classification accuracy by 13-17% over state-of-the-art methods on benchmark datasets.
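The replay mechanism shared by this paper and the next entry can be sketched as a rehearsal step that decodes stored codes back into pseudo-images; the Neural Style Transfer component is omitted and the `autoencoder.decoder` interface is a simplifying assumption.

```python
import torch
import torch.nn.functional as F

def replay_step(classifier, optimizer, new_images, new_labels,
                autoencoder, stored_codes, stored_labels):
    """Rehearsal step: decode stored episode codes into pseudo-images and
    train on them together with the new task's batch, so earlier classes
    are revisited without keeping raw images around."""
    with torch.no_grad():
        replayed = autoencoder.decoder(stored_codes)   # regenerate old data
    images = torch.cat([new_images, replayed])
    labels = torch.cat([new_labels, stored_labels])
    loss = F.cross_entropy(classifier(images), labels)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```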
arXiv Detail & Related papers (2021-01-13T06:43:10Z)
- Storing Encoded Episodes as Concepts for Continual Learning [22.387008072671005]
Two main challenges faced by continual learning approaches are catastrophic forgetting and memory limitations on the storage of data.
We propose a cognitively-inspired approach which trains autoencoders with Neural Style Transfer to encode and store images.
Our approach increases classification accuracy by 13-17% over state-of-the-art methods on benchmark datasets, while requiring 78% less storage space.
arXiv Detail & Related papers (2020-06-26T04:15:56Z)
- Auto-Rectify Network for Unsupervised Indoor Depth Estimation [119.82412041164372]
We establish that the complex ego-motions exhibited in handheld settings are a critical obstacle for learning depth.
We propose a data pre-processing method that rectifies training images by removing their relative rotations for effective learning.
Our method outperforms the previous unsupervised SOTA method by a large margin on the challenging NYUv2 dataset.
arXiv Detail & Related papers (2020-06-04T08:59:17Z)
- Encoding-based Memory Modules for Recurrent Neural Networks [79.42778415729475]
We study the memorization subtask from the point of view of the design and training of recurrent neural networks.
We propose a new model, the Linear Memory Network, which features an encoding-based memorization component built with a linear autoencoder for sequences.
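A simplified stand-in for such a linear sequence autoencoder: the memory evolves by a purely linear recurrence, and a linear readout reconstructs inputs from the final state. The dimensions and the readout target are assumptions, not the paper's exact component.

```python
import torch
import torch.nn as nn

class LinearSeqAutoencoder(nn.Module):
    """Simplified linear autoencoder for sequences: the memory follows the
    purely linear recurrence m_t = A x_t + B m_{t-1}, and a linear readout
    reconstructs an input from the final memory state."""
    def __init__(self, input_dim, mem_dim):
        super().__init__()
        self.A = nn.Linear(input_dim, mem_dim, bias=False)
        self.B = nn.Linear(mem_dim, mem_dim, bias=False)
        self.readout = nn.Linear(mem_dim, input_dim, bias=False)

    def forward(self, xs):                    # xs: (T, batch, input_dim)
        m = xs.new_zeros(xs.shape[1], self.B.in_features)
        for x in xs:                          # linear encoding, step by step
            m = self.A(x) + self.B(m)
        return self.readout(m)                # e.g. reconstruct the last input
```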
arXiv Detail & Related papers (2020-01-31T11:14:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.