DeGAN : Data-Enriching GAN for Retrieving Representative Samples from a
Trained Classifier
- URL: http://arxiv.org/abs/1912.11960v1
- Date: Fri, 27 Dec 2019 02:05:45 GMT
- Title: DeGAN : Data-Enriching GAN for Retrieving Representative Samples from a
Trained Classifier
- Authors: Sravanti Addepalli, Gaurav Kumar Nayak, Anirban Chakraborty, R.
Venkatesh Babu
- Abstract summary: We bridge the gap between the abundance of available data and lack of relevant data, for the future learning tasks of a trained network.
We use the available data, that may be an imbalanced subset of the original training dataset, or a related domain dataset, to retrieve representative samples.
We demonstrate that data from a related domain can be leveraged to achieve state-of-the-art performance.
- Score: 58.979104709647295
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this era of digital information explosion, an abundance of data from
numerous modalities is being generated as well as archived everyday. However,
most problems associated with training Deep Neural Networks still revolve
around lack of data that is rich enough for a given task. Data is required not
only for training an initial model, but also for future learning tasks such as
Model Compression and Incremental Learning. A diverse dataset may be used for
training an initial model, but it may not be feasible to store it throughout
the product life cycle due to data privacy issues or memory constraints. We
propose to bridge the gap between the abundance of available data and lack of
relevant data, for the future learning tasks of a given trained network. We use
the available data, that may be an imbalanced subset of the original training
dataset, or a related domain dataset, to retrieve representative samples from a
trained classifier, using a novel Data-enriching GAN (DeGAN) framework. We
demonstrate that data from a related domain can be leveraged to achieve
state-of-the-art performance for the tasks of Data-free Knowledge Distillation
and Incremental Learning on benchmark datasets. We further demonstrate that our
proposed framework can enrich any data, even from unrelated domains, to make it
more useful for the future learning tasks of a given network.
Related papers
- SSE: Multimodal Semantic Data Selection and Enrichment for Industrial-scale Data Assimilation [29.454948190814765]
In recent years, the data collected for artificial intelligence has grown to an unmanageable amount.
We propose a framework to select the most semantically diverse and important dataset portion.
We further semantically enrich it by discovering meaningful new data from a massive unlabeled data pool.
arXiv Detail & Related papers (2024-09-20T19:17:52Z) - Retrieval-Augmented Data Augmentation for Low-Resource Domain Tasks [66.87070857705994]
In low-resource settings, the amount of seed data samples to use for data augmentation is very small.
We propose a novel method that augments training data by incorporating a wealth of examples from other datasets.
This approach can ensure that the generated data is not only relevant but also more diverse than what could be achieved using the limited seed data alone.
arXiv Detail & Related papers (2024-02-21T02:45:46Z) - Data Filtering Networks [67.827994353269]
We study the problem of learning a data filtering network (DFN) for this second step of filtering a large uncurated dataset.
Our key finding is that the quality of a network for filtering is distinct from its performance on downstream tasks.
Based on our insights, we construct new data filtering networks that induce state-of-the-art image-text datasets.
arXiv Detail & Related papers (2023-09-29T17:37:29Z) - Building Manufacturing Deep Learning Models with Minimal and Imbalanced
Training Data Using Domain Adaptation and Data Augmentation [15.333573151694576]
We propose a novel domain adaptation (DA) approach to address the problem of labeled training data scarcity for a target learning task.
Our approach works for scenarios where the source dataset and the dataset available for the target learning task have same or different feature spaces.
We evaluate our combined approach using image data for wafer defect prediction.
arXiv Detail & Related papers (2023-05-31T21:45:34Z) - Scalable Neural Data Server: A Data Recommender for Transfer Learning [70.06289658553675]
Transfer learning is a popular strategy for leveraging additional data to improve the downstream performance.
Nerve Data Server (NDS), a search engine that recommends relevant data for a given downstream task, has been previously proposed to address this problem.
NDS uses a mixture of experts trained on data sources to estimate similarity between each source and the downstream task.
SNDS represents both data sources and downstream tasks by their proximity to the intermediary datasets.
arXiv Detail & Related papers (2022-06-19T12:07:32Z) - On the Pitfalls of Learning with Limited Data: A Facial Expression
Recognition Case Study [0.5249805590164901]
We focus on the problem of Facial Expression Recognition from videos.
We performed an extensive study with four databases at a different complexity and nine deep-learning architectures for video classification.
We found that complex training sets translate better to more stable test sets when trained with transfer learning and synthetically generated data.
arXiv Detail & Related papers (2021-04-02T18:53:41Z) - Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
To tackle this, we propose to apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z) - Overcoming Noisy and Irrelevant Data in Federated Learning [13.963024590508038]
Federated learning is an effective way of training a machine learning model in a distributed manner from local data collected by client devices.
We propose a method for distributedly selecting relevant data, where we use a benchmark model trained on a small benchmark dataset.
The effectiveness of our proposed approach is evaluated on multiple real-world image datasets in a simulated system with a large number of clients.
arXiv Detail & Related papers (2020-01-22T22:28:47Z) - Neural Data Server: A Large-Scale Search Engine for Transfer Learning
Data [78.74367441804183]
We introduce Neural Data Server (NDS), a large-scale search engine for finding the most useful transfer learning data to the target domain.
NDS consists of a dataserver which indexes several large popular image datasets, and aims to recommend data to a client.
We show the effectiveness of NDS in various transfer learning scenarios, demonstrating state-of-the-art performance on several target datasets.
arXiv Detail & Related papers (2020-01-09T01:21:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.