STORM: Foundations of End-to-End Empirical Risk Minimization on the Edge
- URL: http://arxiv.org/abs/2006.14554v1
- Date: Thu, 25 Jun 2020 16:56:23 GMT
- Title: STORM: Foundations of End-to-End Empirical Risk Minimization on the Edge
- Authors: Benjamin Coleman, Gaurav Gupta, John Chen, Anshumali Shrivastava
- Abstract summary: Empirical risk minimization is perhaps the most influential idea in statistical learning.
We propose STORM, an online sketch for empirical risk minimization.
- Score: 42.94785994216686
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Empirical risk minimization is perhaps the most influential idea in
statistical learning, with applications to nearly all scientific and technical
domains in the form of regression and classification models. To analyze massive
streaming datasets in distributed computing environments, practitioners
increasingly prefer to deploy regression models at the edge rather than in the
cloud. By keeping data on edge devices, we minimize the energy, communication,
and data-security risks associated with the model. Although it is equally
advantageous to train models at the edge, a common assumption is that the model
was originally trained in the cloud, since training typically requires
substantial computation and memory. To this end, we propose STORM, an online
sketch for empirical risk minimization. STORM compresses a data stream into a
tiny array of integer counters. This sketch is sufficient to estimate a variety
of surrogate losses over the original dataset. We provide rigorous theoretical
analysis and show that STORM can estimate a carefully chosen surrogate loss for
the least-squares objective. In an exhaustive experimental comparison for
linear regression models on real-world datasets, we find that STORM allows
accurate regression models to be trained.
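The abstract describes compressing a stream into a tiny array of integer counters from which surrogate losses can be estimated. The paper's exact construction is not reproduced here; as a hedged illustration of the general idea, the following RACE-style sketch (an illustrative simplification, with hypothetical names) uses SimHash buckets with integer counters and estimates how much of the compressed stream collides with a query point:

```python
import random

class CounterSketch:
    """Illustrative LSH-counter sketch: NOT the authors' exact STORM
    construction, only a sketch of the 'stream -> integer counters' idea."""

    def __init__(self, dim, reps=50, bits=4, seed=0):
        rng = random.Random(seed)
        # One set of `bits` random hyperplanes per repetition (SimHash).
        self.planes = [[[rng.gauss(0, 1) for _ in range(dim)]
                        for _ in range(bits)] for _ in range(reps)]
        self.counts = [[0] * (2 ** bits) for _ in range(reps)]  # integer counters
        self.n = 0

    def _bucket(self, x, r):
        # Pack the sign pattern of <plane, x> into a bucket id.
        b = 0
        for i, plane in enumerate(self.planes[r]):
            if sum(p * xi for p, xi in zip(plane, x)) >= 0:
                b |= 1 << i
        return b

    def add(self, x):
        # Streaming update: one counter increment per repetition.
        for r in range(len(self.counts)):
            self.counts[r][self._bucket(x, r)] += 1
        self.n += 1

    def similarity(self, q):
        # Fraction of stream points sharing q's bucket, averaged over reps.
        hits = sum(self.counts[r][self._bucket(q, r)]
                   for r in range(len(self.counts)))
        return hits / (len(self.counts) * self.n)
```

Points close to the bulk of the stream land in the same buckets more often, so the counter array supports similarity (and, in STORM's setting, surrogate-loss) estimates while storing only integers, never the raw data.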
Related papers
- Bounding the Excess Risk for Linear Models Trained on Marginal-Preserving, Differentially-Private, Synthetic Data [14.51185186237899]
We consider using differentially-private (DP), synthetic training data instead of real training data to train an ML model.
A key desirable property of synthetic data is its ability to preserve the low-order marginals of the original distribution.
Our main contribution comprises novel upper and lower bounds on the excess empirical risk of linear models trained on such synthetic data.
arXiv Detail & Related papers (2024-02-06T20:24:07Z)
- Learning Defect Prediction from Unrealistic Data [57.53586547895278]
Pretrained models of code have become popular choices for code understanding and generation tasks.
Such models tend to be large and require commensurate volumes of training data.
It has become popular to train models with far larger but less realistic datasets, such as functions with artificially injected bugs.
Models trained on such data tend to perform well only on similar data, while underperforming on real-world programs.
arXiv Detail & Related papers (2023-11-02T01:51:43Z)
- Synthetic data, real errors: how (not) to publish and use synthetic data [86.65594304109567]
We show how the generative process affects the downstream ML task.
We introduce Deep Generative Ensemble (DGE) to approximate the posterior distribution over the generative process model parameters.
arXiv Detail & Related papers (2023-05-16T07:30:29Z)
- Deep Regression Unlearning [6.884272840652062]
We introduce deep regression unlearning methods that generalize well and are robust to privacy attacks.
We conduct regression unlearning experiments for computer vision, natural language processing and forecasting applications.
arXiv Detail & Related papers (2022-10-15T05:00:20Z)
- Federated Latent Class Regression for Hierarchical Data [5.110894308882439]
Federated Learning (FL) allows a number of agents to participate in training a global machine learning model without disclosing locally stored data.
We propose a novel probabilistic model, Hierarchical Latent Class Regression (HLCR), and its extension to Federated Learning, FEDHLCR.
Our inference algorithm, derived from Bayesian theory, provides strong convergence guarantees and good robustness to overfitting. Experimental results show that FEDHLCR converges quickly even on non-IID datasets.
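The abstract does not detail FEDHLCR's Bayesian inference procedure; as a generic illustration of how federated training aggregates locally trained parameters without disclosing raw data, a FedAvg-style weighted average (illustrative only, not this paper's method) can be sketched as:

```python
def fedavg(client_params, client_sizes):
    """Weighted average of client parameter vectors, each client weighted
    by its local dataset size; raw data never leaves the clients.
    (Generic FedAvg illustration, not the FEDHLCR inference algorithm.)"""
    total = float(sum(client_sizes))
    dim = len(client_params[0])
    return [sum(n * p[j] for n, p in zip(client_sizes, client_params)) / total
            for j in range(dim)]
```

Each round, clients train locally, send only parameters, and the server returns this size-weighted average as the new global model.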
arXiv Detail & Related papers (2022-06-22T00:33:04Z)
- X-model: Improving Data Efficiency in Deep Learning with A Minimax Model [78.55482897452417]
We aim at improving data efficiency for both classification and regression setups in deep learning.
To combine the strengths of both worlds, we propose a novel X-model.
X-model plays a minimax game between the feature extractor and task-specific heads.
arXiv Detail & Related papers (2021-10-09T13:56:48Z)
- Contrastive Model Inversion for Data-Free Knowledge Distillation [60.08025054715192]
We propose Contrastive Model Inversion, where the data diversity is explicitly modeled as an optimizable objective.
Our main observation is that, under the constraint of the same amount of data, higher data diversity usually indicates stronger instance discrimination.
Experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet demonstrate that CMI achieves significantly superior performance when the generated data are used for knowledge distillation.
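CMI's contrastive objective itself is not reproduced in the abstract; as background on how generated data is consumed downstream, the standard temperature-scaled knowledge-distillation loss (a well-known formulation from Hinton et al., not specific to this paper) looks like:

```python
import math

def softmax(logits, T=1.0):
    """Temperature-softened softmax over a list of logits."""
    z = [l / T for l in logits]
    m = max(z)                                 # shift for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """KL(teacher || student) on temperature-softened class probabilities,
    scaled by T^2 as is conventional."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))
    return T * T * kl
```

The student is trained to minimize this loss on the generated inputs, which is why the diversity of those inputs matters for the quality of the distilled model.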
arXiv Detail & Related papers (2021-05-18T15:13:00Z)
- Transfer learning suppresses simulation bias in predictive models built from sparse, multi-modal data [15.587831925516957]
Many problems in science, engineering, and business require making predictions based on very few observations.
To build a robust predictive model, these sparse data may need to be augmented with simulated data, especially when the design space is multidimensional.
We combine recent developments in deep learning to build more robust predictive models from multimodal data.
arXiv Detail & Related papers (2021-04-19T23:28:32Z)
- Variational Bayesian Unlearning [54.26984662139516]
We study the problem of approximately unlearning a Bayesian model from a small subset of the training data to be erased.
We show that it is equivalent to minimizing an evidence upper bound, which trades off fully unlearning the erased data against not entirely forgetting the posterior belief.
In model training with VI, only an approximate (instead of exact) posterior belief given the full data can be obtained, which makes unlearning even more challenging.
arXiv Detail & Related papers (2020-10-24T11:53:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.