Data Impressions: Mining Deep Models to Extract Samples for Data-free
Applications
- URL: http://arxiv.org/abs/2101.06069v1
- Date: Fri, 15 Jan 2021 11:37:29 GMT
- Title: Data Impressions: Mining Deep Models to Extract Samples for Data-free
Applications
- Authors: Gaurav Kumar Nayak, Konda Reddy Mopuri, Saksham Jain, Anirban
Chakraborty
- Abstract summary: "Data Impressions" act as proxy to the training data and can be used to realize a variety of tasks.
We show the applicability of data impressions in solving several computer vision tasks.
- Score: 26.48630545028405
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pretrained deep models hold their learnt knowledge in the form of the model
parameters. These parameters act as memory for the trained models and help them
generalize well on unseen data. However, in the absence of training data, the
utility of a trained model is limited to either inference or serving as a better
initialization for a target task. In this paper, we go further and extract
synthetic data by leveraging the learnt model parameters. We dub them "Data
Impressions", which act as a proxy for the training data and can be used to
realize a variety of tasks. These are useful in scenarios where only the
pretrained models are available and the training data is not shared (e.g., due
to privacy or sensitivity concerns). We show the applicability of data
impressions in solving several computer vision tasks such as unsupervised
domain adaptation, continual learning, and knowledge distillation. We
also study the adversarial robustness of the lightweight models trained via
knowledge distillation using these data impressions. Further, we demonstrate
the efficacy of data impressions in generating universal adversarial perturbations (UAPs) with better fooling rates.
Extensive experiments performed on several benchmark datasets demonstrate
competitive performance achieved using data impressions in the absence of the
original training data.
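To make the idea concrete, the following is a minimal, hypothetical sketch (in PyTorch) of how a single data impression could be synthesized from a pretrained classifier: a soft-label target is sampled from a Dirichlet distribution shaped by the class similarities encoded in the final-layer weights, and a randomly initialized input is optimized until the teacher's softmax output matches that target. The attribute name `teacher.fc`, the scaling factor `beta`, and the optimization hyperparameters are illustrative assumptions, not values taken from the paper.

import torch
import torch.nn.functional as F

def synthesize_data_impression(teacher, target_class, num_steps=500, lr=0.05,
                               image_shape=(3, 32, 32), beta=1.0):
    """Optimize a random input so the teacher predicts a sampled soft label."""
    teacher.eval()

    # Class-similarity matrix from the final linear layer's weights
    # (cosine similarity between class weight vectors); `fc` is an assumed name.
    w = F.normalize(teacher.fc.weight.detach(), dim=1)   # (num_classes, feat_dim)
    similarity = w @ w.t()                               # (num_classes, num_classes)

    # Dirichlet concentration derived from the target class's row of the
    # similarity matrix; shifted to be positive, scaled by the assumed `beta`.
    concentration = beta * (similarity[target_class] - similarity.min() + 1e-3)
    target = torch.distributions.Dirichlet(concentration).sample()  # soft label

    # Start from noise and optimize the input so the teacher's softmax output
    # matches the sampled soft label (soft-target cross-entropy).
    x = torch.randn(1, *image_shape, requires_grad=True)
    optimizer = torch.optim.Adam([x], lr=lr)
    for _ in range(num_steps):
        optimizer.zero_grad()
        log_probs = F.log_softmax(teacher(x), dim=1)
        loss = -(target.unsqueeze(0) * log_probs).sum()
        loss.backward()
        optimizer.step()
    return x.detach(), target

Repeating this procedure across classes and sampled targets yields a bank of impressions on which, for example, a lightweight student could be distilled by matching the teacher's soft outputs, mirroring the data-free knowledge distillation setting described in the abstract.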
Related papers
- Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration [90.41908331897639]
Large language models (LLMs) have significantly benefited from training on diverse, high-quality task-specific data.
We present a novel approach, ReverseGen, designed to automatically generate effective training samples.
arXiv Detail & Related papers (2024-10-22T06:43:28Z) - Releasing Malevolence from Benevolence: The Menace of Benign Data on Machine Unlearning [28.35038726318893]
Machine learning models trained on vast amounts of real or synthetic data often achieve outstanding predictive performance across various domains.
To address privacy concerns, machine unlearning has been proposed to erase specific data samples from models.
We introduce the Unlearning Usability Attack to distill data distribution information into a small set of benign data.
arXiv Detail & Related papers (2024-07-06T15:42:28Z) - Data Shapley in One Training Run [88.59484417202454]
Data Shapley provides a principled framework for attributing data's contribution within machine learning contexts.
Existing approaches require re-training models on different data subsets, which is computationally intensive.
This paper introduces In-Run Data Shapley, which addresses these limitations by offering scalable data attribution for a target model of interest.
arXiv Detail & Related papers (2024-06-16T17:09:24Z) - Zero-shot Retrieval: Augmenting Pre-trained Models with Search Engines [83.65380507372483]
Large pre-trained models can dramatically reduce the amount of task-specific data required to solve a problem, but they often fail to capture domain-specific nuances out of the box.
This paper shows how to leverage recent advances in NLP and multi-modal learning to augment a pre-trained model with search engine retrieval.
arXiv Detail & Related papers (2023-11-29T05:33:28Z) - Learning Defect Prediction from Unrealistic Data [57.53586547895278]
Pretrained models of code have become popular choices for code understanding and generation tasks.
Such models tend to be large and require commensurate volumes of training data.
It has become popular to train models with far larger but less realistic datasets, such as functions with artificially injected bugs.
Models trained on such data tend to perform well only on similar data, while underperforming on real-world programs.
arXiv Detail & Related papers (2023-11-02T01:51:43Z) - From Zero to Hero: Detecting Leaked Data through Synthetic Data Injection and Model Querying [10.919336198760808]
We introduce a novel methodology to detect leaked data that are used to train classification models.
LDSS involves injecting a small volume of synthetic data, characterized by local shifts in class distribution, into the owner's dataset.
This enables the effective identification of models trained on leaked data through model querying alone.
arXiv Detail & Related papers (2023-10-06T10:36:28Z) - Approximate, Adapt, Anonymize (3A): a Framework for Privacy Preserving
Training Data Release for Machine Learning [3.29354893777827]
We introduce a data release framework, 3A (Approximate, Adapt, Anonymize), to maximize data utility for machine learning.
We present experimental evidence showing minimal discrepancy between performance metrics of models trained on real versus privatized datasets.
arXiv Detail & Related papers (2023-07-04T18:37:11Z) - Synthetic Model Combination: An Instance-wise Approach to Unsupervised
Ensemble Learning [92.89846887298852]
Consider making a prediction over new test data without any opportunity to learn from a training set of labelled data.
Instead, one is given access to a set of expert models and their predictions, alongside some limited information about the dataset used to train them.
arXiv Detail & Related papers (2022-10-11T10:20:31Z) - Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
However, the enlarged dataset also increases the training cost; to tackle this, we propose to apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)