Learning-Based Data Storage [Vision] (Technical Report)
- URL: http://arxiv.org/abs/2206.05778v1
- Date: Sun, 12 Jun 2022 16:14:16 GMT
- Title: Learning-Based Data Storage [Vision] (Technical Report)
- Authors: Xiang Lian, Xiaofei Zhang
- Abstract summary: We envision a new paradigm of data storage, "DNN-as-a-Database", where data are encoded in well-trained machine learning models.
In this paper, we propose this novel concept of learning-based data storage, which utilizes a learning structure called the learning-based memory unit (LMU) to store, organize, and retrieve data.
Our preliminary experimental results show the feasibility of learning-based data storage by achieving 100% accuracy of the DNN storage.
- Score: 9.882820980833698
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Deep neural networks (DNNs) and their variants have been extensively
used for a wide spectrum of real applications such as image classification,
face/speech recognition, and fraud detection. Beyond these important machine
learning tasks, DNNs, as artificial networks that emulate the way brain cells
function, can also store non-linear relationships between input and output
data, which suggests the potential of storing data via DNNs. We
envision a new paradigm of data storage, "DNN-as-a-Database", where data are
encoded in well-trained machine learning models. Compared with conventional
data storage that directly records data in raw formats, learning-based
structures (e.g., DNN) can implicitly encode data pairs of inputs and outputs
and compute/materialize actual output data of different resolutions only when
input data are provided. This new paradigm can greatly enhance data
security by allowing flexible data-privacy settings at different levels,
achieve low space consumption and fast computation with the acceleration of new
hardware (e.g., Diffractive Neural Network and AI chips), and can be
generalized to distributed DNN-based storage/computing. In this paper, we
propose this novel concept of learning-based data storage, which utilizes a
learning structure called learning-based memory unit (LMU), to store, organize,
and retrieve data. As a case study, we use DNNs as the engine in the LMU, and
study the data capacity and accuracy of the DNN-based data storage. Our
preliminary experimental results show the feasibility of the learning-based
data storage by achieving 100% accuracy of the DNN storage. We explore
and design effective solutions to utilize the DNN-based data storage to manage
and query relational tables. We discuss how to generalize our solutions to
other data types (e.g., graphs) and environments such as distributed DNN
storage/computing.
Related papers
- Deep-and-Wide Learning: Enhancing Data-Driven Inference via Synergistic Learning of Inter- and Intra-Data Representations [8.013386998355966]
Current deep neural network (DNN) models face several challenges, such as the requirements of extensive amounts of data and computational resources.
Here, we introduce a new learning scheme, referred to as deep-and-wide learning (DWL), to systematically capture features.
We show that DWL surpasses state-of-the-art DNNs in accuracy by a substantial margin with limited training data.
arXiv Detail & Related papers (2025-01-28T23:47:34Z) - DCP: Learning Accelerator Dataflow for Neural Network via Propagation [52.06154296196845]
This work proposes an efficient data-centric approach, named Dataflow Code Propagation (DCP), to automatically find the optimal dataflow for DNN layers in seconds without human effort.
DCP learns a neural predictor to efficiently update the dataflow codes towards the desired gradient directions to minimize various optimization objectives.
For example, without using additional training data, DCP surpasses the GAMMA method that performs a full search using thousands of samples.
arXiv Detail & Related papers (2024-10-09T05:16:44Z) - A Tale of Two Cities: Data and Configuration Variances in Robust Deep
Learning [27.498927971861068]
Deep neural networks (DNNs) are widely used in many industries such as image recognition, supply chain, medical diagnosis, and autonomous driving.
Prior work has shown that the high accuracy of a DNN model does not imply high robustness, because the input data and external environment are constantly changing.
arXiv Detail & Related papers (2022-11-18T03:32:53Z) - Neural Attentive Circuits [93.95502541529115]
We introduce a general purpose, yet modular neural architecture called Neural Attentive Circuits (NACs)
NACs learn the parameterization and a sparse connectivity of neural modules without using domain knowledge.
NACs achieve an 8x speedup at inference time while losing less than 3% performance.
arXiv Detail & Related papers (2022-10-14T18:00:07Z) - Rank-R FNN: A Tensor-Based Learning Model for High-Order Data
Classification [69.26747803963907]
Rank-R Feedforward Neural Network (FNN) is a tensor-based nonlinear learning model that imposes Canonical/Polyadic decomposition on its parameters.
First, it handles inputs as multilinear arrays, bypassing the need for vectorization, and can thus fully exploit the structural information along every data dimension.
We establish the universal approximation and learnability properties of Rank-R FNN, and we validate its performance on real-world hyperspectral datasets.
arXiv Detail & Related papers (2021-04-11T16:37:32Z) - NN-EMD: Efficiently Training Neural Networks using Encrypted
Multi-Sourced Datasets [7.067870969078555]
Training a machine learning model over an encrypted dataset is an existing promising approach to address the privacy-preserving machine learning task.
We propose a novel framework, NN-EMD, to train a deep neural network (DNN) model over multiple datasets collected from multiple sources.
We evaluate the performance of our framework with regard to training time and model accuracy on the MNIST datasets.
arXiv Detail & Related papers (2020-12-18T23:01:20Z) - Analyzing and Mitigating Data Stalls in DNN Training [7.444113272493349]
We present the first comprehensive analysis of how the input data pipeline affects the training time of Deep Neural Networks (DNNs)
We find that in many cases, DNN training time is dominated by data stall time: time spent waiting for data to be fetched and preprocessed.
We implement three simple but effective techniques in a data-loading library, CoorDL, to mitigate data stalls.
arXiv Detail & Related papers (2020-07-14T02:16:56Z) - Architecture Disentanglement for Deep Neural Networks [174.16176919145377]
We introduce neural architecture disentanglement (NAD) to explain the inner workings of deep neural networks (DNNs)
NAD learns to disentangle a pre-trained DNN into sub-architectures according to independent tasks, forming information flows that describe the inference processes.
Results show that misclassified images have a high probability of being assigned to task sub-architectures similar to the correct ones.
arXiv Detail & Related papers (2020-03-30T08:34:33Z) - Constructing Deep Neural Networks with a Priori Knowledge of Wireless
Tasks [37.060397377445504]
Two kinds of permutation invariant properties widely existed in wireless tasks can be harnessed to reduce the number of model parameters.
We find special architecture of DNNs whose input-output relationships satisfy the properties, called permutation invariant DNN (PINN)
We take predictive resource allocation and interference coordination as examples to show how the PINNs can be employed for learning the optimal policy with unsupervised and supervised learning.
arXiv Detail & Related papers (2020-01-29T08:54:42Z) - Neural Data Server: A Large-Scale Search Engine for Transfer Learning
Data [78.74367441804183]
We introduce Neural Data Server (NDS), a large-scale search engine for finding the most useful transfer learning data to the target domain.
NDS consists of a dataserver which indexes several large popular image datasets, and aims to recommend data to a client.
We show the effectiveness of NDS in various transfer learning scenarios, demonstrating state-of-the-art performance on several target datasets.
arXiv Detail & Related papers (2020-01-09T01:21:30Z) - DeGAN : Data-Enriching GAN for Retrieving Representative Samples from a
Trained Classifier [58.979104709647295]
We bridge the gap between the abundance of available data and lack of relevant data, for the future learning tasks of a trained network.
We use the available data, that may be an imbalanced subset of the original training dataset, or a related domain dataset, to retrieve representative samples.
We demonstrate that data from a related domain can be leveraged to achieve state-of-the-art performance.
arXiv Detail & Related papers (2019-12-27T02:05:45Z)