Learning from aggregated data with a maximum entropy model
- URL: http://arxiv.org/abs/2210.02450v1
- Date: Wed, 5 Oct 2022 09:17:27 GMT
- Title: Learning from aggregated data with a maximum entropy model
- Authors: Alexandre Gilotte, Ahmed Ben Yahmed, David Rohde
- Abstract summary: We show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
- Score: 73.63512438583375
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Aggregating a dataset, then injecting some noise, is a simple and common way
to release differentially private data.However, aggregated data -- even without
noise -- is not an appropriate input for machine learning classifiers.In this
work, we show how a new model, similar to a logistic regression, may be learned
from aggregated data only by approximating the unobserved feature distribution
with a maximum entropy hypothesis. The resulting model is a Markov Random Field
(MRF), and we detail how to apply, modify and scale a MRF training algorithm to
our setting. Finally we present empirical evidence on several public datasets
that the model learned this way can achieve performances comparable to those of
a logistic model trained with the full unaggregated data.
Related papers
- When to Trust Your Data: Enhancing Dyna-Style Model-Based Reinforcement Learning With Data Filter [7.886307329450978]
Dyna-style algorithms combine two approaches by using simulated data from an estimated environmental model to accelerate model-free training.
Previous works address this issue by using model ensembles or pretraining the estimated model with data collected from the real environment.
We introduce an out-of-distribution data filter that removes simulated data from the estimated model that significantly diverges from data collected in the real environment.
arXiv Detail & Related papers (2024-10-16T01:49:03Z) - Learning Augmentation Policies from A Model Zoo for Time Series Forecasting [58.66211334969299]
We introduce AutoTSAug, a learnable data augmentation method based on reinforcement learning.
By augmenting the marginal samples with a learnable policy, AutoTSAug substantially improves forecasting performance.
arXiv Detail & Related papers (2024-09-10T07:34:19Z) - A Federated Data Fusion-Based Prognostic Model for Applications with Multi-Stream Incomplete Signals [1.2277343096128712]
This article proposes a federated prognostic model that allows multiple users to jointly construct a failure time prediction model.
Numerical studies indicate that the performance of the proposed model is the same as that of classic non-federated prognostic models.
arXiv Detail & Related papers (2023-11-13T17:08:34Z) - Diffusion Random Feature Model [0.0]
We present a diffusion model-inspired deep random feature model that is interpretable.
We derive generalization bounds between the distribution of sampled data and the true distribution using properties of score matching.
We validate our findings by generating samples on the fashion MNIST dataset and instrumental audio data.
arXiv Detail & Related papers (2023-10-06T17:59:05Z) - Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z) - Learning Summary Statistics for Bayesian Inference with Autoencoders [58.720142291102135]
We use the inner dimension of deep neural network based Autoencoders as summary statistics.
To create an incentive for the encoder to encode all the parameter-related information but not the noise, we give the decoder access to explicit or implicit information that has been used to generate the training data.
arXiv Detail & Related papers (2022-01-28T12:00:31Z) - Model-based Clustering with Missing Not At Random Data [0.8777702580252754]
We propose model-based clustering algorithms designed to handle very general types of missing data, including MNAR data.
Several MNAR models are discussed, for which the cause of the missingness can depend on both the values of the missing variable themselves and on the class membership.
We focus on a specific MNAR model, called MNARz, for which the missingness only depends on the class membership.
arXiv Detail & Related papers (2021-12-20T09:52:12Z) - Model-based Policy Optimization with Unsupervised Model Adaptation [37.09948645461043]
We investigate how to bridge the gap between real and simulated data due to inaccurate model estimation for better policy optimization.
We propose a novel model-based reinforcement learning framework AMPO, which introduces unsupervised model adaptation.
Our approach achieves state-of-the-art performance in terms of sample efficiency on a range of continuous control benchmark tasks.
arXiv Detail & Related papers (2020-10-19T14:19:42Z) - Robust Finite Mixture Regression for Heterogeneous Targets [70.19798470463378]
We propose an FMR model that finds sample clusters and jointly models multiple incomplete mixed-type targets simultaneously.
We provide non-asymptotic oracle performance bounds for our model under a high-dimensional learning framework.
The results show that our model can achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-10-12T03:27:07Z) - Data from Model: Extracting Data from Non-robust and Robust Models [83.60161052867534]
This work explores the reverse process of generating data from a model, attempting to reveal the relationship between the data and the model.
We repeat the process of Data to Model (DtM) and Data from Model (DfM) in sequence and explore the loss of feature mapping information.
Our results show that the accuracy drop is limited even after multiple sequences of DtM and DfM, especially for robust models.
arXiv Detail & Related papers (2020-07-13T05:27:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.