dpart: Differentially Private Autoregressive Tabular, a General
Framework for Synthetic Data Generation
- URL: http://arxiv.org/abs/2207.05810v1
- Date: Tue, 12 Jul 2022 19:55:21 GMT
- Title: dpart: Differentially Private Autoregressive Tabular, a General
Framework for Synthetic Data Generation
- Authors: Sofiane Mahiou, Kai Xu, Georgi Ganev
- Abstract summary: dpart is an open source Python library for differentially private synthetic data generation.
The library has been created to serve as a quick and accessible baseline.
Specific instances of dpart include Independent, an optimized version of PrivBayes, and a newly proposed model, dp-synthpop.
- Score: 8.115937653695884
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a general, flexible, and scalable framework dpart, an open source
Python library for differentially private synthetic data generation. Central to
the approach is autoregressive modelling -- breaking the joint data
distribution into a sequence of lower-dimensional conditional distributions,
captured by various methods such as machine learning models (logistic/linear
regression, decision trees, etc.), simple histogram counts, or custom
techniques. The library has been created to serve as a quick and accessible
baseline as well as to accommodate a wide audience of users, from those taking
their first steps in synthetic data generation to more experienced
practitioners with domain expertise who can configure different aspects of the
modelling and contribute new methods/mechanisms. Specific instances of
dpart include Independent, an optimized version of PrivBayes, and a newly
proposed model, dp-synthpop.
Code: https://github.com/hazy/dpart
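To make the autoregressive idea concrete, below is a minimal sketch that models each column conditionally on the columns before it, using simple Laplace-noised histogram counts (one of the conditional methods named in the abstract). This is not dpart's actual API: the function names, the even privacy-budget split, and the restriction to categorical columns are simplifying assumptions for illustration only.

```python
# Illustrative sketch of the autoregressive decomposition behind dpart-style
# generation. NOT dpart's real interface: names and the even budget split are
# assumptions; only categorical columns are handled, for brevity.
import numpy as np
import pandas as pd

def fit_dp_conditionals(df, epsilon, rng):
    """For each column, store noisy counts of its values keyed by the tuple of
    values taken by the preceding columns (the autoregressive decomposition)."""
    eps_per_col = epsilon / df.shape[1]          # naive even budget split (assumption)
    tables = {}
    for i, col in enumerate(df.columns):
        parents = list(df.columns[:i])
        table = {}
        for _, row in df.iterrows():
            key = tuple(row[p] for p in parents)
            table.setdefault(key, {})
            table[key][row[col]] = table[key].get(row[col], 0.0) + 1.0
        for counts in table.values():            # Laplace noise, clipped at zero
            for v in counts:
                counts[v] = max(counts[v] + rng.laplace(scale=1.0 / eps_per_col), 0.0)
        tables[col] = (parents, table)
    return tables

def sample_synthetic(tables, n, rng):
    """Draw rows one column at a time, conditioning on the values sampled so far."""
    rows = []
    for _ in range(n):
        row = {}
        for col, (parents, table) in tables.items():
            key = tuple(row[p] for p in parents)
            counts = table.get(key) or next(iter(table.values()))  # unseen-context fallback
            values = list(counts)
            weights = np.array([counts[v] for v in values])
            probs = weights / weights.sum() if weights.sum() > 0 else np.full(len(values), 1.0 / len(values))
            row[col] = rng.choice(values, p=probs)
        rows.append(row)
    return pd.DataFrame(rows)

rng = np.random.default_rng(0)
real = pd.DataFrame({"sex": rng.choice(["F", "M"], size=500),
                     "smoker": rng.choice(["yes", "no"], size=500, p=[0.3, 0.7])})
synthetic = sample_synthetic(fit_dp_conditionals(real, epsilon=1.0, rng=rng), n=5, rng=rng)
print(synthetic)
```

In dpart itself, the per-column conditionals can instead be captured by machine learning models such as logistic/linear regression or decision trees, and these modelling choices are exactly the aspects a more experienced user can configure; the repository linked above documents the real interface.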
Related papers
- Federating Dynamic Models using Early-Exit Architectures for Automatic Speech Recognition on Heterogeneous Clients [12.008071873475169]
Federated learning is a technique that collaboratively learns a shared prediction model while keeping the data local on different clients.
We propose using dynamic architectures which, by employing early-exit solutions, can adapt their processing depending on the input and on the operating conditions.
This solution falls in the realm of partial training methods and brings two benefits: a single model is used on a variety of devices; federating the models after local training is straightforward.
arXiv Detail & Related papers (2024-05-27T17:32:37Z) - Distribution-Aware Data Expansion with Diffusion Models [55.979857976023695]
We propose DistDiff, a training-free data expansion framework based on a distribution-aware diffusion model.
DistDiff consistently enhances accuracy across a diverse range of datasets compared to models trained solely on original data.
arXiv Detail & Related papers (2024-03-11T14:07:53Z) - Mixture-Models: a one-stop Python Library for Model-based Clustering
using various Mixture Models [4.60168321737677]
Mixture-Models is an open-source Python library for fitting Gaussian Mixture Models (GMM) and their variants.
It streamlines the implementation and analysis of these models using various first- and second-order optimization routines.
The library provides user-friendly model evaluation tools, such as BIC, AIC, and log-likelihood estimation (see the illustrative GMM sketch after this list).
arXiv Detail & Related papers (2024-02-08T19:34:24Z) - A Federated Data Fusion-Based Prognostic Model for Applications with Multi-Stream Incomplete Signals [1.2277343096128712]
This article proposes a federated prognostic model that allows multiple users to jointly construct a failure time prediction model.
Numerical studies indicate that the performance of the proposed model is the same as that of classic non-federated prognostic models.
arXiv Detail & Related papers (2023-11-13T17:08:34Z) - TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series [61.436361263605114]
Time series data are often scarce or highly sensitive, which precludes the sharing of data between researchers and industrial organizations.
We introduce Time Series Generative Modeling (TSGM), an open-source framework for the generative modeling of synthetic time series.
arXiv Detail & Related papers (2023-05-19T10:11:21Z) - DINOv2: Learning Robust Visual Features without Supervision [75.42921276202522]
This work shows that existing pretraining methods, especially self-supervised methods, can produce such general-purpose visual features if trained on enough curated data from diverse sources.
Most of the technical contributions aim at accelerating and stabilizing the training at scale.
In terms of data, we propose an automatic pipeline to build a dedicated, diverse, and curated image dataset instead of uncurated data, as typically done in the self-supervised literature.
arXiv Detail & Related papers (2023-04-14T15:12:19Z) - Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z) - VertiBayes: Learning Bayesian network parameters from vertically partitioned data with missing values [2.9707233220536313]
Federated learning makes it possible to train a machine learning model on decentralized data.
We propose a novel method called VertiBayes to train Bayesian networks on vertically partitioned data.
We experimentally show our approach produces models comparable to those learnt using traditional algorithms.
arXiv Detail & Related papers (2022-10-31T11:13:35Z) - Data Summarization via Bilevel Optimization [48.89977988203108]
A simple yet powerful approach is to operate on small subsets of data.
In this work, we propose a generic coreset framework that formulates the coreset selection as a cardinality-constrained bilevel optimization problem.
arXiv Detail & Related papers (2021-09-26T09:08:38Z) - GRAFFL: Gradient-free Federated Learning of a Bayesian Generative Model [8.87104231451079]
This paper presents the first gradient-free federated learning framework called GRAFFL.
It uses implicit information derived from each participating institution to learn posterior distributions of parameters.
We propose the GRAFFL-based Bayesian mixture model to serve as a proof-of-concept of the framework.
arXiv Detail & Related papers (2020-08-29T07:19:44Z) - Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
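As referenced in the Mixture-Models entry above, the following is a minimal sketch of GMM fitting with BIC/AIC-based model selection. It uses scikit-learn's GaussianMixture rather than the Mixture-Models library itself, whose exact API is not reproduced here; the data and component range are illustrative assumptions.

```python
# Fit GMMs with an increasing number of components and compare BIC, AIC, and
# average log-likelihood. Uses scikit-learn as a stand-in for the
# Mixture-Models library (an assumption for illustration only).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two synthetic Gaussian clusters in 2-D.
X = np.vstack([rng.normal(loc=-2.0, scale=0.5, size=(200, 2)),
               rng.normal(loc=+2.0, scale=0.7, size=(200, 2))])

for k in range(1, 6):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(X)
    print(f"k={k}  BIC={gmm.bic(X):.1f}  AIC={gmm.aic(X):.1f}  "
          f"avg log-likelihood={gmm.score(X):.3f}")
# The component count with the lowest BIC (here, likely k=2) would typically be selected.
```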