dpart: Differentially Private Autoregressive Tabular, a General
Framework for Synthetic Data Generation
- URL: http://arxiv.org/abs/2207.05810v1
- Date: Tue, 12 Jul 2022 19:55:21 GMT
- Title: dpart: Differentially Private Autoregressive Tabular, a General
Framework for Synthetic Data Generation
- Authors: Sofiane Mahiou, Kai Xu, Georgi Ganev
- Abstract summary: dpart is an open source Python library for differentially private synthetic data generation.
The library has been created to serve as a quick and accessible baseline.
Specific instances of dpart include Independent, an optimized version of PrivBayes, and a newly proposed model, dp-synthpop.
- Score: 8.115937653695884
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a general, flexible, and scalable framework dpart, an open source
Python library for differentially private synthetic data generation. Central to
the approach is autoregressive modelling -- breaking the joint data
distribution into a sequence of lower-dimensional conditional distributions,
captured by various methods such as machine learning models (logistic/linear
regression, decision trees, etc.), simple histogram counts, or custom
techniques. The library has been created to serve as a quick and accessible
baseline as well as to accommodate a wide audience of users, from those taking
their first steps in synthetic data generation to more experienced
practitioners with domain expertise who can configure different aspects of the
modelling and contribute new methods/mechanisms. Specific instances of
dpart include Independent, an optimized version of PrivBayes, and a newly
proposed model, dp-synthpop.
Code: https://github.com/hazy/dpart
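To make the autoregressive idea concrete, below is a minimal sketch that models each column conditionally on the columns before it, using simple Laplace-noised histogram counts (one of the conditional methods named in the abstract). This is not dpart's actual API: the function names, the even privacy-budget split, and the restriction to categorical columns are simplifying assumptions for illustration only.

```python
# Illustrative sketch of the autoregressive decomposition behind dpart-style
# generation. NOT dpart's real interface: names and the even budget split are
# assumptions; only categorical columns are handled, for brevity.
import numpy as np
import pandas as pd

def fit_dp_conditionals(df, epsilon, rng):
    """For each column, store noisy counts of its values keyed by the tuple of
    values taken by the preceding columns (the autoregressive decomposition)."""
    eps_per_col = epsilon / df.shape[1]          # naive even budget split (assumption)
    tables = {}
    for i, col in enumerate(df.columns):
        parents = list(df.columns[:i])
        table = {}
        for _, row in df.iterrows():
            key = tuple(row[p] for p in parents)
            table.setdefault(key, {})
            table[key][row[col]] = table[key].get(row[col], 0.0) + 1.0
        for counts in table.values():            # Laplace noise, clipped at zero
            for v in counts:
                counts[v] = max(counts[v] + rng.laplace(scale=1.0 / eps_per_col), 0.0)
        tables[col] = (parents, table)
    return tables

def sample_synthetic(tables, n, rng):
    """Draw rows one column at a time, conditioning on the values sampled so far."""
    rows = []
    for _ in range(n):
        row = {}
        for col, (parents, table) in tables.items():
            key = tuple(row[p] for p in parents)
            counts = table.get(key) or next(iter(table.values()))  # unseen-context fallback
            values = list(counts)
            weights = np.array([counts[v] for v in values])
            probs = weights / weights.sum() if weights.sum() > 0 else np.full(len(values), 1.0 / len(values))
            row[col] = rng.choice(values, p=probs)
        rows.append(row)
    return pd.DataFrame(rows)

rng = np.random.default_rng(0)
real = pd.DataFrame({"sex": rng.choice(["F", "M"], size=500),
                     "smoker": rng.choice(["yes", "no"], size=500, p=[0.3, 0.7])})
synthetic = sample_synthetic(fit_dp_conditionals(real, epsilon=1.0, rng=rng), n=5, rng=rng)
print(synthetic)
```

In dpart itself, the per-column conditionals can instead be captured by machine learning models such as logistic/linear regression or decision trees, and these modelling choices are exactly the aspects a more experienced user can configure; the repository linked above documents the real interface.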
Related papers
- Federating Dynamic Models using Early-Exit Architectures for Automatic Speech Recognition on Heterogeneous Clients [12.008071873475169]
Federated learning is a technique that collaboratively learns a shared prediction model while keeping the data local on different clients.
We propose using dynamic architectures which, by employing early-exit solutions, can adapt their processing depending on the input and on the operating conditions.
This solution falls in the realm of partial training methods and brings two benefits: a single model is used on a variety of devices; federating the models after local training is straightforward.
arXiv Detail & Related papers (2024-05-27T17:32:37Z) - Distribution-Aware Data Expansion with Diffusion Models [55.979857976023695]
We propose DistDiff, a training-free data expansion framework based on a distribution-aware diffusion model.
DistDiff consistently enhances accuracy across a diverse range of datasets compared to models trained solely on original data.
arXiv Detail & Related papers (2024-03-11T14:07:53Z) - Mixture-Models: a one-stop Python Library for Model-based Clustering
using various Mixture Models [4.60168321737677]
Mixture-Models is an open-source Python library for fitting Gaussian Mixture Models (GMM) and their variants.
It streamlines the implementation and analysis of these models using various first- and second-order optimization routines.
The library provides user-friendly model evaluation tools, such as BIC, AIC, and log-likelihood estimation (see the illustrative GMM sketch after this list).
arXiv Detail & Related papers (2024-02-08T19:34:24Z) - A Federated Data Fusion-Based Prognostic Model for Applications with Multi-Stream Incomplete Signals [1.2277343096128712]
This article proposes a federated prognostic model that allows multiple users to jointly construct a failure time prediction model.
Numerical studies indicate that the performance of the proposed model is the same as that of classic non-federated prognostic models.
arXiv Detail & Related papers (2023-11-13T17:08:34Z) - TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series [61.436361263605114]
Time series data are often scarce or highly sensitive, which precludes the sharing of data between researchers and industrial organizations.
We introduce Time Series Generative Modeling (TSGM), an open-source framework for the generative modeling of synthetic time series.
arXiv Detail & Related papers (2023-05-19T10:11:21Z) - DINOv2: Learning Robust Visual Features without Supervision [75.42921276202522]
This work shows that existing pretraining methods, especially self-supervised methods, can produce such general-purpose visual features if trained on enough curated data from diverse sources.
Most of the technical contributions aim at accelerating and stabilizing the training at scale.
In terms of data, we propose an automatic pipeline to build a dedicated, diverse, and curated image dataset instead of uncurated data, as typically done in the self-supervised literature.
arXiv Detail & Related papers (2023-04-14T15:12:19Z) - Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z) - VertiBayes: Learning Bayesian network parameters from vertically partitioned data with missing values [2.9707233220536313]
Federated learning makes it possible to train a machine learning model on decentralized data.
We propose a novel method called VertiBayes to train Bayesian networks on vertically partitioned data.
We experimentally show our approach produces models comparable to those learnt using traditional algorithms.
arXiv Detail & Related papers (2022-10-31T11:13:35Z) - Data Summarization via Bilevel Optimization [48.89977988203108]
A simple yet powerful approach is to operate on small subsets of data.
In this work, we propose a generic coreset framework that formulates the coreset selection as a cardinality-constrained bilevel optimization problem.
arXiv Detail & Related papers (2021-09-26T09:08:38Z) - GRAFFL: Gradient-free Federated Learning of a Bayesian Generative Model [8.87104231451079]
This paper presents the first gradient-free federated learning framework called GRAFFL.
It uses implicit information derived from each participating institution to learn posterior distributions of parameters.
We propose the GRAFFL-based Bayesian mixture model to serve as a proof-of-concept of the framework.
arXiv Detail & Related papers (2020-08-29T07:19:44Z) - Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
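As referenced in the Mixture-Models entry above, the following is a minimal sketch of GMM fitting with BIC/AIC-based model selection. It uses scikit-learn's GaussianMixture rather than the Mixture-Models library itself, whose exact API is not reproduced here; the data and component range are illustrative assumptions.

```python
# Fit GMMs with an increasing number of components and compare BIC, AIC, and
# average log-likelihood. Uses scikit-learn as a stand-in for the
# Mixture-Models library (an assumption for illustration only).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two synthetic Gaussian clusters in 2-D.
X = np.vstack([rng.normal(loc=-2.0, scale=0.5, size=(200, 2)),
               rng.normal(loc=+2.0, scale=0.7, size=(200, 2))])

for k in range(1, 6):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(X)
    print(f"k={k}  BIC={gmm.bic(X):.1f}  AIC={gmm.aic(X):.1f}  "
          f"avg log-likelihood={gmm.score(X):.3f}")
# The component count with the lowest BIC (here, likely k=2) would typically be selected.
```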