Related papers: Learning from aggregated data with a maximum entropy model

Learning from aggregated data with a maximum entropy model

URL: http://arxiv.org/abs/2210.02450v1
Date: Wed, 5 Oct 2022 09:17:27 GMT
Title: Learning from aggregated data with a maximum entropy model
Authors: Alexandre Gilotte, Ahmed Ben Yahmed, David Rohde
Abstract summary: We show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis. We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
Score: 73.63512438583375
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Aggregating a dataset, then injecting some noise, is a simple and common way to release differentially private data.However, aggregated data -- even without noise -- is not an appropriate input for machine learning classifiers.In this work, we show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis. The resulting model is a Markov Random Field (MRF), and we detail how to apply, modify and scale a MRF training algorithm to our setting. Finally we present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.

Related papers

Federated Learning for Diffusion Models [12.46092849473786]
Diffusion models are powerful generative models that can produce highly realistic samples for various tasks. We propose FedDDPM-Federated Learning with Denoising Diffusion Probabilistic Models. We provide a rigorous convergence analysis of FedDDPM and propose an enhanced algorithm, FedDDPM+, to reduce training overheads.
arXiv Detail & Related papers (2025-03-09T03:41:10Z)
Generalization is not a universal guarantee: Estimating similarity to training data with an ensemble out-of-distribution metric [0.09363323206192666]
Failure of machine learning models to generalize to new data is a core problem limiting the reliability of AI systems. We propose a standardized approach for assessing data similarity by constructing a supervised autoencoder for generalizability estimation (SAGE) We show that out-of-the-box model performance increases after SAGE score filtering, even when applied to data from the model's own training and test datasets.
arXiv Detail & Related papers (2025-02-22T19:21:50Z)
When to Trust Your Data: Enhancing Dyna-Style Model-Based Reinforcement Learning With Data Filter [7.886307329450978]
Dyna-style algorithms combine two approaches by using simulated data from an estimated environmental model to accelerate model-free training. Previous works address this issue by using model ensembles or pretraining the estimated model with data collected from the real environment. We introduce an out-of-distribution data filter that removes simulated data from the estimated model that significantly diverges from data collected in the real environment.
arXiv Detail & Related papers (2024-10-16T01:49:03Z)
Learning Augmentation Policies from A Model Zoo for Time Series Forecasting [58.66211334969299]
We introduce AutoTSAug, a learnable data augmentation method based on reinforcement learning. By augmenting the marginal samples with a learnable policy, AutoTSAug substantially improves forecasting performance.
arXiv Detail & Related papers (2024-09-10T07:34:19Z)
A Federated Data Fusion-Based Prognostic Model for Applications with Multi-Stream Incomplete Signals [1.2277343096128712]
This article proposes a federated prognostic model that allows multiple users to jointly construct a failure time prediction model. Numerical studies indicate that the performance of the proposed model is the same as that of classic non-federated prognostic models.
arXiv Detail & Related papers (2023-11-13T17:08:34Z)
Diffusion Random Feature Model [0.0]
We present a diffusion model-inspired deep random feature model that is interpretable. We derive generalization bounds between the distribution of sampled data and the true distribution using properties of score matching. We validate our findings by generating samples on the fashion MNIST dataset and instrumental audio data.
arXiv Detail & Related papers (2023-10-06T17:59:05Z)
Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models. This creates a barrier to fusing knowledge across individual models to yield a better single model. We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
Learning Summary Statistics for Bayesian Inference with Autoencoders [58.720142291102135]
We use the inner dimension of deep neural network based Autoencoders as summary statistics. To create an incentive for the encoder to encode all the parameter-related information but not the noise, we give the decoder access to explicit or implicit information that has been used to generate the training data.
arXiv Detail & Related papers (2022-01-28T12:00:31Z)
Model-based Clustering with Missing Not At Random Data [0.8777702580252754]
We propose model-based clustering algorithms designed to handle very general types of missing data, including MNAR data. Several MNAR models are discussed, for which the cause of the missingness can depend on both the values of the missing variable themselves and on the class membership. We focus on a specific MNAR model, called MNARz, for which the missingness only depends on the class membership.
arXiv Detail & Related papers (2021-12-20T09:52:12Z)
Model-based Policy Optimization with Unsupervised Model Adaptation [37.09948645461043]
We investigate how to bridge the gap between real and simulated data due to inaccurate model estimation for better policy optimization. We propose a novel model-based reinforcement learning framework AMPO, which introduces unsupervised model adaptation. Our approach achieves state-of-the-art performance in terms of sample efficiency on a range of continuous control benchmark tasks.
arXiv Detail & Related papers (2020-10-19T14:19:42Z)
Robust Finite Mixture Regression for Heterogeneous Targets [70.19798470463378]
We propose an FMR model that finds sample clusters and jointly models multiple incomplete mixed-type targets simultaneously. We provide non-asymptotic oracle performance bounds for our model under a high-dimensional learning framework. The results show that our model can achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-10-12T03:27:07Z)
Data from Model: Extracting Data from Non-robust and Robust Models [83.60161052867534]
This work explores the reverse process of generating data from a model, attempting to reveal the relationship between the data and the model. We repeat the process of Data to Model (DtM) and Data from Model (DfM) in sequence and explore the loss of feature mapping information. Our results show that the accuracy drop is limited even after multiple sequences of DtM and DfM, especially for robust models.
arXiv Detail & Related papers (2020-07-13T05:27:48Z)
Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers. We find that test errors tend to concentrate around a small typical value $varepsilon*$, which deviates substantially from the test error of worst-case interpolating model. Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)

This list is automatically generated from the titles and abstracts of the papers in this site.