Neural Data Server: A Large-Scale Search Engine for Transfer Learning Data
- URL: http://arxiv.org/abs/2001.02799v3
- Date: Wed, 1 Apr 2020 00:43:03 GMT
- Title: Neural Data Server: A Large-Scale Search Engine for Transfer Learning Data
- Authors: Xi Yan, David Acuna, Sanja Fidler
- Abstract summary: We introduce Neural Data Server (NDS), a large-scale search engine for finding the most useful transfer learning data for the target domain.
NDS consists of a dataserver which indexes several large popular image datasets, and aims to recommend data to a client.
We show the effectiveness of NDS in various transfer learning scenarios, demonstrating state-of-the-art performance on several target datasets.
- Score: 78.74367441804183
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transfer learning has proven to be a successful technique for
training deep learning models in domains where little training data is available. The
dominant approach is to pretrain a model on a large generic dataset such as
ImageNet and finetune its weights on the target domain. However, in the new era
of an ever-increasing number of massive datasets, selecting the relevant data
for pretraining is a critical issue. We introduce Neural Data Server (NDS), a
large-scale search engine for finding the most useful transfer learning data for
the target domain. NDS consists of a dataserver which indexes several large
popular image datasets, and aims to recommend data to a client, an end-user
with a target application with its own small labeled dataset. The dataserver
represents large datasets with a much more compact mixture-of-experts model,
and employs it to perform data search in a series of dataserver-client
transactions at a low computational cost. We show the effectiveness of NDS in
various transfer learning scenarios, demonstrating state-of-the-art performance
on several target datasets and tasks such as image classification, object
detection and instance segmentation. Neural Data Server is available as a
web-service at http://aidemos.cs.toronto.edu/nds/.
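To make the transaction pattern concrete, here is a minimal sketch, assuming stubbed-out experts and a softmax weighting (names like expert_score are hypothetical; the paper defines the actual expert models and weighting scheme):

```python
import numpy as np

# Hypothetical sketch of an NDS-style dataserver-client transaction.
# Each "expert" stands in for a compact model trained on one partition of a
# large indexed dataset; the real experts and weights are defined in the paper.

def expert_score(expert_id: int, client_features: np.ndarray) -> float:
    """Stand-in for an expert evaluating how well it models the client's
    small labeled set (e.g. a proxy-task accuracy). Higher = better match."""
    proto = np.random.default_rng(expert_id).normal(size=client_features.shape[1])
    return float(np.mean(client_features @ proto))

rng = np.random.default_rng(0)
client_features = rng.normal(size=(100, 64))  # client's small dataset, as features

# Dataserver side: every expert scores the client data.
n_experts = 8
scores = np.array([expert_score(e, client_features) for e in range(n_experts)])

# Softmax the scores into sampling weights over the indexed source partitions.
weights = np.exp(scores - scores.max())
weights /= weights.sum()

# Recommend pretraining data by sampling partitions proportionally to weight.
recommended = rng.choice(n_experts, size=1000, p=weights)
print("samples recommended per partition:", np.bincount(recommended, minlength=n_experts))
```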
Related papers
- Data Filtering Networks [67.827994353269]
We study the problem of learning a data filtering network (DFN) for the second stage of dataset curation: filtering a large uncurated dataset.
Our key finding is that the quality of a network for filtering is distinct from its performance on downstream tasks.
Based on our insights, we construct new data filtering networks that induce state-of-the-art image-text datasets.
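A minimal sketch of the score-and-threshold filtering pattern summarized above, assuming a placeholder scorer rather than a trained DFN (filter_score is hypothetical):

```python
import numpy as np

# Minimal sketch of score-and-threshold dataset filtering. filter_score is a
# placeholder; an actual DFN is a trained network scoring image-text pairs.
rng = np.random.default_rng(1)

def filter_score(samples: np.ndarray) -> np.ndarray:
    """Stand-in for a filtering network: one quality score per sample."""
    return samples.mean(axis=1)  # placeholder for a learned alignment score

pool = rng.normal(size=(10_000, 32))       # uncurated candidate pool (features)
scores = filter_score(pool)
keep = scores >= np.quantile(scores, 0.7)  # keep the top 30% by score
curated = pool[keep]
print(f"kept {curated.shape[0]} of {pool.shape[0]} samples")
```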
arXiv Detail & Related papers (2023-09-29T17:37:29Z)
- AutoSynth: Learning to Generate 3D Training Data for Object Point Cloud Registration [69.21282992341007]
AutoSynth automatically generates 3D training data for point cloud registration.
We replace the point cloud registration network with a much smaller surrogate network, leading to a 4056.43× speedup.
Our results on TUD-L, LINEMOD and Occluded-LINEMOD show that a neural network trained on our searched dataset consistently outperforms the same network trained on the widely used ModelNet40 dataset.
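A rough sketch of surrogate-guided dataset search, assuming a toy fitness function in place of the small surrogate network (surrogate_fitness and the random-search loop are hypothetical, not the paper's procedure):

```python
import numpy as np

# Sketch of surrogate-guided dataset search: instead of training the full
# registration network on every candidate synthetic dataset, a much cheaper
# surrogate estimates each candidate's usefulness.
rng = np.random.default_rng(2)

def surrogate_fitness(gen_params: np.ndarray) -> float:
    """Cheap stand-in for 'train a small surrogate, measure val accuracy'."""
    target = np.linspace(0.0, 1.0, gen_params.size)  # fictitious optimum
    return -float(np.sum((gen_params - target) ** 2))

# Random search over the parameters that generate the synthetic data.
best_params, best_fit = None, -np.inf
for _ in range(500):
    candidate = rng.uniform(0.0, 1.0, size=8)
    fit = surrogate_fitness(candidate)
    if fit > best_fit:
        best_params, best_fit = candidate, fit

print("best surrogate fitness:", round(best_fit, 4))
```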
arXiv Detail & Related papers (2023-09-20T09:29:44Z)
- Dataset Quantization [72.61936019738076]
We present dataset quantization (DQ), a new framework to compress large-scale datasets into small subsets.
DQ is the first method that can successfully distill large-scale datasets such as ImageNet-1k with a state-of-the-art compression ratio.
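As a hedged illustration of compressing a dataset into a small representative subset, here is a nearest-to-centroid selection sketch; DQ's actual recursive binning procedure differs and is described in the paper:

```python
import numpy as np

# Illustrative subset compression via k-means + nearest-to-centroid
# selection. This is a generic baseline, not DQ's algorithm.
rng = np.random.default_rng(3)
features = rng.normal(size=(5000, 16))
k = 50  # size of the compressed subset

# Crude k-means to find k representative regions of the dataset.
centroids = features[rng.choice(len(features), k, replace=False)]
for _ in range(10):
    assign = np.argmin(((features[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
    for c in range(k):
        members = features[assign == c]
        if len(members):
            centroids[c] = members.mean(axis=0)

# Keep the sample nearest each centroid as the quantized subset.
dists = ((features[:, None] - centroids[None]) ** 2).sum(-1)
subset_idx = np.unique(dists.argmin(axis=0))
print(f"compressed {len(features)} samples to {len(subset_idx)}")
```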
arXiv Detail & Related papers (2023-08-21T07:24:29Z)
- Scalable Neural Data Server: A Data Recommender for Transfer Learning [70.06289658553675]
Transfer learning is a popular strategy for leveraging additional data to improve downstream performance.
Neural Data Server (NDS), a search engine that recommends relevant data for a given downstream task, has been previously proposed to address this problem.
NDS uses a mixture of experts trained on data sources to estimate similarity between each source and the downstream task.
The proposed Scalable Neural Data Server (SNDS) represents both data sources and downstream tasks by their proximity to a fixed set of intermediary datasets.
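A minimal sketch of this indirection, assuming a toy proximity metric (the proximity function and dataset names are hypothetical; SNDS defines its own similarity estimates):

```python
import numpy as np

# Sketch of SNDS-style indirection: sources and downstream tasks never meet
# directly; both are embedded as vectors of proximity to fixed intermediaries.
rng = np.random.default_rng(4)

def proximity(data: np.ndarray, intermediaries: list[np.ndarray]) -> np.ndarray:
    """One similarity score per intermediary dataset (stand-in metric)."""
    mean = data.mean(axis=0)
    return np.array([-np.linalg.norm(mean - inter.mean(axis=0))
                     for inter in intermediaries])

intermediaries = [rng.normal(loc=i, size=(200, 8)) for i in range(5)]
sources = {f"source_{j}": rng.normal(loc=j, size=(300, 8)) for j in range(3)}
client_task = rng.normal(loc=1.2, size=(50, 8))

# Compare sources to the client task in the shared proximity space.
client_vec = proximity(client_task, intermediaries)
for name, src in sources.items():
    src_vec = proximity(src, intermediaries)
    cos = src_vec @ client_vec / (np.linalg.norm(src_vec) * np.linalg.norm(client_vec) + 1e-9)
    print(name, "similarity to client task:", round(float(cos), 3))
```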
arXiv Detail & Related papers (2022-06-19T12:07:32Z)
- Overcoming Noisy and Irrelevant Data in Federated Learning [13.963024590508038]
Federated learning is an effective way of training a machine learning model in a distributed manner from local data collected by client devices.
We propose a method for selecting relevant data in a distributed manner, using a benchmark model trained on a small benchmark dataset.
The effectiveness of our proposed approach is evaluated on multiple real-world image datasets in a simulated system with a large number of clients.
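A minimal sketch of benchmark-model-based selection, assuming a placeholder relevance score (benchmark_relevance and the threshold are hypothetical):

```python
import numpy as np

# Sketch: each client keeps only the local samples that a model trained on a
# small clean benchmark dataset deems relevant; only those join training.
rng = np.random.default_rng(5)

def benchmark_relevance(x: np.ndarray) -> np.ndarray:
    """Stand-in for the benchmark model: per-sample relevance in (0, 1)."""
    w = np.ones(x.shape[1]) / x.shape[1]
    return 1.0 / (1.0 + np.exp(-(x @ w)))  # sigmoid score

threshold = 0.5
for client_id in range(3):
    local = rng.normal(loc=rng.uniform(-1, 1), size=(400, 10))  # noisy local data
    scores = benchmark_relevance(local)
    selected = local[scores > threshold]  # data admitted to federated training
    print(f"client {client_id}: selected {len(selected)}/{len(local)} samples")
```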
arXiv Detail & Related papers (2020-01-22T22:28:47Z)
- DeGAN: Data-Enriching GAN for Retrieving Representative Samples from a Trained Classifier [58.979104709647295]
We bridge the gap between the abundance of available data and the lack of relevant data for the future learning tasks of a trained network.
We use the available data, which may be an imbalanced subset of the original training dataset or a related-domain dataset, to retrieve representative samples.
We demonstrate that data from a related domain can be leveraged to achieve state-of-the-art performance.
arXiv Detail & Related papers (2019-12-27T02:05:45Z)
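A minimal sketch of the retrieval idea behind the entry above, assuming a stub classifier and omitting the GAN itself: proxy samples from a related domain are kept when the frozen classifier labels them confidently (classifier_probs is hypothetical):

```python
import numpy as np

# Sketch: score related-domain proxy data with a frozen trained classifier
# and keep confidently classified samples per class as "representatives".
# DeGAN instead trains a GAN against the classifier; that step is omitted.
rng = np.random.default_rng(6)
n_classes = 4

def classifier_probs(x: np.ndarray) -> np.ndarray:
    """Stand-in for the frozen trained classifier (softmax over classes)."""
    w = np.random.default_rng(42).normal(size=(x.shape[1], n_classes))
    logits = x @ w
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

related_domain = rng.normal(size=(2000, 12))  # available proxy data (features)
probs = classifier_probs(related_domain)
conf, pred = probs.max(axis=1), probs.argmax(axis=1)

for c in range(n_classes):
    idx = np.where((pred == c) & (conf > 0.6))[0]
    print(f"class {c}: retrieved {len(idx)} representative samples")
```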
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.