Learning from missing data with the Latent Block Model
- URL: http://arxiv.org/abs/2010.12222v1
- Date: Fri, 23 Oct 2020 08:11:43 GMT
- Title: Learning from missing data with the Latent Block Model
- Authors: Gabriel Frisch (Heudiasyc), Jean-Benoist Léger (Heudiasyc), Yves
Grandvalet (Heudiasyc)
- Abstract summary: We propose a co-clustering model, based on the Latent Block Model, that aims to take advantage of Missing Not At Random data.
A variational expectation-maximization algorithm is derived to perform inference and a model selection criterion is presented.
- Score: 0.5735035463793007
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Missing data can be informative. Ignoring this information can lead to
misleading conclusions when the data model does not allow information to be
extracted from the missing data. We propose a co-clustering model, based on the
Latent Block Model, that aims to take advantage of these nonignorable
nonresponses, also known as Missing Not At Random (MNAR) data. A variational
expectation-maximization algorithm is derived to perform inference, and a model
selection criterion is presented. We assess the proposed approach in a
simulation study before using our model on the voting records from the lower
house of the French Parliament, where our analysis brings out relevant groups
of MPs and texts, together with a sensible interpretation of the behavior of
non-voters.
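For intuition, the sketch below shows a bare-bones variational EM loop for a Bernoulli Latent Block Model in which the probability that a cell is missing depends on its (row, column) block, one simple way to make missingness informative. This block-level parameterization is an illustrative assumption, not the paper's exact MNAR mechanism, and all names are hypothetical.

```python
# A minimal sketch (not the authors' code) of variational EM for a Bernoulli
# Latent Block Model where P(cell missing) depends on the cell's block.
import numpy as np

def lbm_mnar_vem(X, obs, K, L, n_iter=50, seed=0, eps=1e-10):
    """X: (n, m) binary data (any value where unobserved);
    obs: (n, m) with 1 where a cell is observed, 0 where missing."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    Xo = np.where(obs == 1, X, 0)               # zero out unobserved cells
    tau_r = rng.dirichlet(np.ones(K), size=n)   # soft row-cluster memberships
    tau_c = rng.dirichlet(np.ones(L), size=m)   # soft column-cluster memberships
    for _ in range(n_iter):
        # M-step: block parameters from current soft memberships.
        w = tau_r.T @ obs @ tau_c                           # observed mass per block
        tot = np.outer(tau_r.sum(0), tau_c.sum(0))          # total mass per block
        pi = np.clip((tau_r.T @ Xo @ tau_c + eps) / (w + eps), eps, 1 - eps)
        rho = np.clip(1.0 - w / (tot + eps), eps, 1 - eps)  # P(missing | block)
        alpha, beta = tau_r.mean(0), tau_c.mean(0)          # cluster proportions
        # Expected per-cell log-likelihood terms, each of shape (K, L).
        lp1 = np.log(pi) + np.log1p(-rho)     # observed cell with X = 1
        lp0 = np.log1p(-pi) + np.log1p(-rho)  # observed cell with X = 0
        lpm = np.log(rho)                     # missing cell
        S1, S0, Sm = obs * Xo, obs * (1 - Xo), 1 - obs
        # VE-step: mean-field updates for rows, then columns.
        log_r = np.log(alpha) + S1 @ tau_c @ lp1.T + S0 @ tau_c @ lp0.T + Sm @ tau_c @ lpm.T
        tau_r = np.exp(log_r - log_r.max(1, keepdims=True))
        tau_r /= tau_r.sum(1, keepdims=True)
        log_c = np.log(beta) + S1.T @ tau_r @ lp1 + S0.T @ tau_r @ lp0 + Sm.T @ tau_r @ lpm
        tau_c = np.exp(log_c - log_c.max(1, keepdims=True))
        tau_c /= tau_c.sum(1, keepdims=True)
    return tau_r, tau_c, pi, rho
```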
Related papers
- Diffusion posterior sampling for simulation-based inference in tall data settings [53.17563688225137]
Simulation-based inference (SBI) is capable of approximating the posterior distribution that relates input parameters to a given observation.
In this work, we consider a tall data extension in which multiple observations are available to better infer the parameters of the model.
We compare our method to recently proposed competing approaches on various numerical experiments and demonstrate its superiority in terms of numerical stability and computational cost.
arXiv Detail & Related papers (2024-04-11T09:23:36Z)
- Querying Easily Flip-flopped Samples for Deep Active Learning [63.62397322172216]
Active learning is a machine learning paradigm that aims to improve the performance of a model by strategically selecting and querying unlabeled data.
One effective selection strategy is to base it on the model's predictive uncertainty, which can be interpreted as a measure of how informative a sample is.
This paper proposes the least disagree metric (LDM), defined as the smallest probability of disagreement with the predicted label.
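As a concrete (and deliberately simpler) instance of the uncertainty-based selection described above, the sketch below implements plain entropy sampling; the paper's LDM criterion is more involved and is not reproduced here.

```python
# A minimal sketch of uncertainty-based query selection (plain entropy
# sampling), the generic strategy the summary describes -- not the LDM itself.
import numpy as np

def select_queries(proba, budget):
    """proba: (n_unlabeled, n_classes) predicted class probabilities.
    Returns indices of the `budget` most uncertain samples."""
    entropy = -np.sum(proba * np.log(proba + 1e-12), axis=1)
    return np.argsort(entropy)[-budget:]  # take the highest-entropy samples
```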
arXiv Detail & Related papers (2024-01-18T08:12:23Z)
- Identification and Estimation for Nonignorable Missing Data: A Data Fusion Approach [16.57879794516524]
We consider the task of identifying and estimating a parameter of interest in settings where data is missing not at random (MNAR).
In this paper, we take an alternative approach, where information in an MNAR dataset is augmented by information in an auxiliary dataset subject to missingness at random (MAR).
We derive an inverse probability weighted (IPW) estimator for identified parameters, and evaluate the performance of our estimation strategies via simulation studies and a data application.
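For context, the sketch below illustrates the generic IPW construction the summary refers to: reweight each observed outcome by the inverse of an estimated probability of being observed. This is a textbook illustration, not the paper's data-fusion estimator, and the names are hypothetical.

```python
# A minimal sketch of inverse probability weighting (IPW): estimate E[Y]
# when Y is sometimes missing, by reweighting observed outcomes with the
# inverse of an estimated observation propensity.
import numpy as np
from sklearn.linear_model import LogisticRegression

def ipw_mean(X, y, observed):
    """X: (n, d) covariates; y: (n,) outcomes (arbitrary where unobserved);
    observed: (n,) 0/1 indicator that y was recorded."""
    # Estimate the propensity P(observed = 1 | X) with a logistic model.
    propensity = LogisticRegression().fit(X, observed).predict_proba(X)[:, 1]
    weights = observed / np.clip(propensity, 1e-3, None)
    return np.sum(weights * np.where(observed == 1, y, 0.0)) / len(y)
```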
arXiv Detail & Related papers (2023-11-15T14:57:20Z)
- Assessing Privacy Risks in Language Models: A Case Study on Summarization Tasks [65.21536453075275]
We focus on the summarization task and investigate the membership inference (MI) attack.
We exploit text similarity and the model's resistance to document modifications as potential MI signals.
We discuss several safeguards for training summarization models to protect against MI attacks, as well as the inherent trade-off between privacy and utility.
arXiv Detail & Related papers (2023-10-20T05:44:39Z)
- ALUM: Adversarial Data Uncertainty Modeling from Latent Model Uncertainty Compensation [25.67258563807856]
We propose a novel method called ALUM to handle model uncertainty and data uncertainty in a unified scheme.
Our proposed ALUM is model-agnostic and can be easily implemented into any existing deep model with little extra overhead.
arXiv Detail & Related papers (2023-03-29T17:24:12Z)
- Membership Inference Attacks against Synthetic Data through Overfitting Detection [84.02632160692995]
We argue for a realistic MIA setting that assumes the attacker has some knowledge of the underlying data distribution.
We propose DOMIAS, a density-based MIA model that aims to infer membership by targeting local overfitting of the generative model.
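As a rough illustration of a density-based membership score (a simplified analogue of the idea behind DOMIAS, not its exact estimator), one can compare the synthetic-data density around a candidate record with the density of a reference sample; all names below are hypothetical.

```python
# A minimal sketch of a density-ratio membership score: a candidate record is
# suspicious when the synthetic data is much denser around it than a reference
# sample from the underlying distribution, hinting at local overfitting.
import numpy as np
from sklearn.neighbors import KernelDensity

def membership_scores(candidates, synthetic, reference, bandwidth=0.5):
    kde_syn = KernelDensity(bandwidth=bandwidth).fit(synthetic)
    kde_ref = KernelDensity(bandwidth=bandwidth).fit(reference)
    # High p_synthetic / p_reference suggests the generator overfit the
    # neighborhood of the candidate, i.e. it may have been a training record.
    return np.exp(kde_syn.score_samples(candidates)
                  - kde_ref.score_samples(candidates))
```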
arXiv Detail & Related papers (2023-02-24T11:27:39Z)
- Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performance comparable to that of a logistic model trained with the full unaggregated data.
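To make the maximum entropy hypothesis concrete, the sketch below computes the maximum-entropy distribution over a finite candidate support subject to matching observed aggregate feature means, via the convex dual. The support grid and solver choice are illustrative assumptions; the paper couples this idea with a logistic-style model.

```python
# A minimal sketch of the maximum entropy step: among all distributions on a
# finite support whose mean matches the observed aggregates, the maxent
# solution is exponential-family, p_i proportional to exp(lam . x_i), with
# lam found by minimizing the convex dual (log-partition minus lam . means).
import numpy as np
from scipy.optimize import minimize

def maxent_weights(support, target_means):
    """support: (K, d) candidate feature vectors; target_means: (d,) observed
    aggregate means. Returns maxent probabilities matching the means."""
    def dual(lam):
        logits = support @ lam
        lse = logits.max() + np.log(np.sum(np.exp(logits - logits.max())))
        return lse - lam @ target_means
    lam = minimize(dual, np.zeros(support.shape[1])).x
    logits = support @ lam
    p = np.exp(logits - logits.max())
    return p / p.sum()
```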
arXiv Detail & Related papers (2022-10-05T09:17:27Z)
- Model-based Clustering with Missing Not At Random Data [0.8777702580252754]
We propose model-based clustering algorithms designed to handle very general types of missing data, including MNAR data.
Several MNAR models are discussed, for which the cause of the missingness can depend both on the values of the missing variables themselves and on the class membership.
We focus on a specific MNAR model, called MNARz, for which the missingness only depends on the class membership.
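The sketch below simulates data under the MNARz assumption just described, where the missingness probability depends only on the latent class and not on the entry's value; all parameter values are illustrative.

```python
# A minimal simulation of MNARz missingness: each sample's class determines
# its per-entry probability of being missing. Values are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n, d, K = 500, 4, 2
means = np.array([[0.0] * d, [3.0] * d])   # class-conditional means
rho = np.array([0.05, 0.40])               # P(entry missing | class k): MNARz

z = rng.integers(K, size=n)                # latent class memberships
X = rng.normal(means[z], 1.0)              # Gaussian mixture data
mask = rng.random((n, d)) < rho[z, None]   # missingness driven only by z
X_obs = np.where(mask, np.nan, X)          # observed matrix with NaNs
```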
arXiv Detail & Related papers (2021-12-20T09:52:12Z)
- MINIMALIST: Mutual INformatIon Maximization for Amortized Likelihood Inference from Sampled Trajectories [61.3299263929289]
Simulation-based inference enables learning the parameters of a model even when its likelihood cannot be computed in practice.
One class of methods uses data simulated with different parameters to infer an amortized estimator for the likelihood-to-evidence ratio.
We show that this approach can be formulated in terms of mutual information between model parameters and simulated data.
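As background for the ratio estimation mentioned above, the sketch below uses the standard classifier trick: a binary classifier trained to separate dependent (theta, x) pairs from shuffled pairs recovers the log likelihood-to-evidence ratio through its logit. This is a generic construction, not the paper's mutual-information objective, and the names are hypothetical.

```python
# A minimal sketch of amortized likelihood-to-evidence ratio estimation:
# the optimal classifier between joint pairs (theta, x) and shuffled pairs
# has logit equal to log p(x | theta) / p(x).
import numpy as np
from sklearn.neural_network import MLPClassifier

def fit_ratio_estimator(theta, x, seed=0):
    """theta: (n, p) parameters; x: (n, q) data simulated at those parameters."""
    rng = np.random.default_rng(seed)
    joint = np.hstack([theta, x])                              # label 1: dependent
    marginal = np.hstack([theta, x[rng.permutation(len(x))]])  # label 0: shuffled
    Z = np.vstack([joint, marginal])
    y = np.r_[np.ones(len(joint)), np.zeros(len(marginal))]
    clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500).fit(Z, y)
    def log_ratio(t, xx):
        p = clf.predict_proba(np.hstack([t, xx]))[:, 1]
        p = np.clip(p, 1e-6, 1 - 1e-6)
        return np.log(p / (1 - p))  # classifier logit = log-likelihood ratio
    return log_ratio
```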
arXiv Detail & Related papers (2021-06-03T12:59:16Z)
- Deep Generative Pattern-Set Mixture Models for Nonignorable Missingness [0.0]
We propose a variational autoencoder architecture to model both ignorable and nonignorable missing data.
Our model explicitly learns to cluster the missing data into missingness pattern sets based on the observed data and missingness masks.
Our setup trades off the characteristics of ignorable and nonignorable missingness and can thus be applied to data of both types.
arXiv Detail & Related papers (2021-03-05T08:21:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.