Learning-Based Data Storage [Vision] (Technical Report)
- URL: http://arxiv.org/abs/2206.05778v1
- Date: Sun, 12 Jun 2022 16:14:16 GMT
- Title: Learning-Based Data Storage [Vision] (Technical Report)
- Authors: Xiang Lian, Xiaofei Zhang
- Abstract summary: We envision a new paradigm of data storage, "DNN-as-a-Database", where data are encoded in well-trained machine learning models.
In this paper, we propose this novel concept of learning-based data storage, which utilizes a learning structure called the learning-based memory unit (LMU) to store, organize, and retrieve data.
Our preliminary experimental results show the feasibility of learning-based data storage by achieving 100% accuracy of the DNN storage.
- Score: 9.882820980833698
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Deep neural networks (DNNs) and their variants have been extensively
used for a wide spectrum of real applications such as image classification,
face/speech recognition, and fraud detection. Beyond these important machine
learning tasks, DNNs, as artificial networks that emulate the way brain cells
function, can also store non-linear relationships between input and output
data, which suggests the potential of storing data via DNNs. We
envision a new paradigm of data storage, "DNN-as-a-Database", where data are
encoded in well-trained machine learning models. Compared with conventional
data storage that directly records data in raw formats, learning-based
structures (e.g., DNN) can implicitly encode data pairs of inputs and outputs
and compute/materialize actual output data of different resolutions only when
input data are provided. This new paradigm can greatly enhance data
security by allowing flexible data-privacy settings at different levels,
achieve low space consumption and fast computation with the acceleration of new
hardware (e.g., Diffractive Neural Network and AI chips), and can be
generalized to distributed DNN-based storage/computing. In this paper, we
propose this novel concept of learning-based data storage, which utilizes a
learning structure called learning-based memory unit (LMU), to store, organize,
and retrieve data. As a case study, we use DNNs as the engine in the LMU, and
study the data capacity and accuracy of the DNN-based data storage. Our
preliminary experimental results show the feasibility of the learning-based
data storage by achieving 100% accuracy of the DNN storage. We explore
and design effective solutions to utilize the DNN-based data storage to manage
and query relational tables. We discuss how to generalize our solutions to
other data types (e.g., graphs) and environments such as distributed DNN
storage/computing.
Related papers
- Deep-and-Wide Learning: Enhancing Data-Driven Inference via Synergistic Learning of Inter- and Intra-Data Representations [8.013386998355966]
Current deep neural network (DNN) models face several challenges, such as the requirements of extensive amounts of data and computational resources.
Here, we introduce a new learning scheme, referred to as deep-and-wide learning (DWL), to systematically capture features.
We show that DWL surpasses state-of-the-art DNNs in accuracy by a substantial margin with limited training data.
arXiv Detail & Related papers (2025-01-28T23:47:34Z) - DCP: Learning Accelerator Dataflow for Neural Network via Propagation [52.06154296196845]
This work proposes an efficient data-centric approach, named Dataflow Code Propagation (DCP), to automatically find the optimal dataflow for DNN layers in seconds without human effort.
DCP learns a neural predictor to efficiently update the dataflow codes towards the desired gradient directions to minimize various optimization objectives.
For example, without using additional training data, DCP surpasses the GAMMA method that performs a full search using thousands of samples.
arXiv Detail & Related papers (2024-10-09T05:16:44Z) - A Tale of Two Cities: Data and Configuration Variances in Robust Deep
Learning [27.498927971861068]
Deep neural networks (DNNs) are widely used in many industries such as image recognition, supply chain, medical diagnosis, and autonomous driving.
Prior work has shown that the high accuracy of a DNN model does not imply high robustness, because the input data and external environment are constantly changing.
arXiv Detail & Related papers (2022-11-18T03:32:53Z) - Neural Attentive Circuits [93.95502541529115]
We introduce a general purpose, yet modular neural architecture called Neural Attentive Circuits (NACs)
NACs learn the parameterization and a sparse connectivity of neural modules without using domain knowledge.
NACs achieve an 8x speedup at inference time while losing less than 3% performance.
arXiv Detail & Related papers (2022-10-14T18:00:07Z) - Rank-R FNN: A Tensor-Based Learning Model for High-Order Data
Classification [69.26747803963907]
Rank-R Feedforward Neural Network (FNN) is a tensor-based nonlinear learning model that imposes Canonical/Polyadic decomposition on its parameters.
First, it handles inputs as multilinear arrays, bypassing the need for vectorization, and can thus fully exploit the structural information along every data dimension.
We establish the universal approximation and learnability properties of Rank-R FNN, and we validate its performance on real-world hyperspectral datasets.
arXiv Detail & Related papers (2021-04-11T16:37:32Z) - NN-EMD: Efficiently Training Neural Networks using Encrypted
Multi-Sourced Datasets [7.067870969078555]
Training a machine learning model over an encrypted dataset is an existing promising approach to address the privacy-preserving machine learning task.
We propose a novel framework, NN-EMD, to train a deep neural network (DNN) model over multiple datasets collected from multiple sources.
We evaluate the performance of our framework with regard to training time and model accuracy on the MNIST datasets.
arXiv Detail & Related papers (2020-12-18T23:01:20Z) - Analyzing and Mitigating Data Stalls in DNN Training [7.444113272493349]
We present the first comprehensive analysis of how the input data pipeline affects the training time of Deep Neural Networks (DNNs)
We find that in many cases, DNN training time is dominated by data stall time: time spent waiting for data to be fetched and preprocessed.
We implement three simple but effective techniques in a data-loading library, CoorDL, to mitigate data stalls.
arXiv Detail & Related papers (2020-07-14T02:16:56Z) - Architecture Disentanglement for Deep Neural Networks [174.16176919145377]
We introduce neural architecture disentanglement (NAD) to explain the inner workings of deep neural networks (DNNs)
NAD learns to disentangle a pre-trained DNN into sub-architectures according to independent tasks, forming information flows that describe the inference processes.
Results show that misclassified images have a high probability of being assigned to task sub-architectures similar to the correct ones.
arXiv Detail & Related papers (2020-03-30T08:34:33Z) - Constructing Deep Neural Networks with a Priori Knowledge of Wireless
Tasks [37.060397377445504]
Two kinds of permutation invariant properties widely existed in wireless tasks can be harnessed to reduce the number of model parameters.
We find special architecture of DNNs whose input-output relationships satisfy the properties, called permutation invariant DNN (PINN)
We take predictive resource allocation and interference coordination as examples to show how the PINNs can be employed for learning the optimal policy with unsupervised and supervised learning.
arXiv Detail & Related papers (2020-01-29T08:54:42Z) - Neural Data Server: A Large-Scale Search Engine for Transfer Learning
Data [78.74367441804183]
We introduce Neural Data Server (NDS), a large-scale search engine for finding the most useful transfer learning data to the target domain.
NDS consists of a dataserver which indexes several large popular image datasets, and aims to recommend data to a client.
We show the effectiveness of NDS in various transfer learning scenarios, demonstrating state-of-the-art performance on several target datasets.
arXiv Detail & Related papers (2020-01-09T01:21:30Z) - DeGAN : Data-Enriching GAN for Retrieving Representative Samples from a
Trained Classifier [58.979104709647295]
We bridge the gap between the abundance of available data and lack of relevant data, for the future learning tasks of a trained network.
We use the available data, that may be an imbalanced subset of the original training dataset, or a related domain dataset, to retrieve representative samples.
We demonstrate that data from a related domain can be leveraged to achieve state-of-the-art performance.
arXiv Detail & Related papers (2019-12-27T02:05:45Z)