EPEM: Efficient Parameter Estimation for Multiple Class Monotone Missing Data
- URL: http://arxiv.org/abs/2009.11360v1
- Date: Wed, 23 Sep 2020 20:07:53 GMT
- Authors: Thu Nguyen, Duy H. M. Nguyen, Huy Nguyen, Binh T. Nguyen, Bruce A. Wade
- Abstract summary: We propose a novel algorithm to compute the maximum likelihood estimators (MLEs) of a multiple-class, monotone missing dataset.
Because the computation is exact, our EPEM algorithm, unlike other imputation approaches, does not require multiple iterations through the data.
- Score: 3.801859210248944
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The problem of monotone missing data has been broadly studied during the last
two decades and has many applications in fields such as bioinformatics and
statistics. Commonly used imputation techniques require multiple iterations
through the data before converging. Moreover, these approaches may introduce
extra noise and bias into the subsequent modeling. In this work, we derive exact
formulas and propose a novel algorithm, EPEM, to compute the maximum likelihood
estimators (MLEs) of a multiple-class, monotone missing dataset when the
covariance matrices of all classes are assumed to be equal. We then illustrate
an application of the proposed method to Linear Discriminant Analysis (LDA).
Because the computation is exact, EPEM, unlike other imputation approaches, does
not require multiple iterations through the data, and thus promises to be much
less time-consuming than other methods. Empirical results validated this
effectiveness: EPEM reduced error rates significantly and required a short
computation time compared with several imputation-based approaches. We also
release all code and data from our experiments in a GitHub repository to support
the research community working on this problem.
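The abstract does not state EPEM's formulas. Purely as an illustration of why exact, non-iterative MLEs are possible for monotone data with a shared covariance, here is a minimal sketch of the classical factored-likelihood idea in the simplest case. It makes several assumptions. There are two variables: `x` is always observed and `y` is sometimes missing, encoded as `None`. Each class has its own mean, and all classes share one covariance matrix. Missingness is at random, and every class has at least one complete row. The resulting estimates are then plugged into an LDA rule. The names `monotone_mle` and `lda_predict` are hypothetical, and this is not the authors' implementation.

```python
import math
from collections import defaultdict


def monotone_mle(x, y, labels):
    """Closed-form MLEs for bivariate normal data with a shared covariance.

    Hypothetical helper. x is always observed; y holds floats or None
    (missing), so the pattern is monotone. Assumes missingness at random and
    at least one complete (x, y) row per class.
    Returns ({class: (mu_x, mu_y)}, 2x2 covariance as nested lists).
    """
    n = len(x)
    groups = defaultdict(list)
    for i, g in enumerate(labels):
        groups[g].append(i)

    # Factor 1: the marginal of x uses all n rows -> class means, pooled variance.
    mu_x = {g: sum(x[i] for i in idx) / len(idx) for g, idx in groups.items()}
    s_xx = sum((x[i] - mu_x[labels[i]]) ** 2 for i in range(n)) / n

    # Factor 2: y | x uses only complete rows; regression with a common slope
    # (implied by the shared covariance) and class-specific intercepts.
    comp = {g: [i for i in idx if y[i] is not None] for g, idx in groups.items()}
    m = sum(len(rows) for rows in comp.values())
    xc = {g: sum(x[i] for i in rows) / len(rows) for g, rows in comp.items()}
    yc = {g: sum(y[i] for i in rows) / len(rows) for g, rows in comp.items()}
    pairs = [(g, i) for g, rows in comp.items() for i in rows]
    sxx = sum((x[i] - xc[g]) ** 2 for g, i in pairs)
    sxy = sum((x[i] - xc[g]) * (y[i] - yc[g]) for g, i in pairs)
    syy = sum((y[i] - yc[g]) ** 2 for g, i in pairs)
    beta = sxy / sxx
    resid_var = (syy - beta * sxy) / m  # MLE of Var(y | x)

    # Map the conditional parameters back to the joint means and covariance.
    mu_y = {g: yc[g] + beta * (mu_x[g] - xc[g]) for g in groups}
    s_xy = beta * s_xx
    s_yy = resid_var + beta ** 2 * s_xx
    means = {g: (mu_x[g], mu_y[g]) for g in groups}
    return means, [[s_xx, s_xy], [s_xy, s_yy]]


def lda_predict(point, means, cov, priors):
    """Hypothetical LDA rule: pick the class maximising
    point' S^-1 mu_g - mu_g' S^-1 mu_g / 2 + log(prior_g) for a shared 2x2 S."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]

    def score(g):
        mu = means[g]
        w = [inv[0][0] * mu[0] + inv[0][1] * mu[1],
             inv[1][0] * mu[0] + inv[1][1] * mu[1]]  # S^-1 mu_g
        return (point[0] * w[0] + point[1] * w[1]
                - 0.5 * (mu[0] * w[0] + mu[1] * w[1]) + math.log(priors[g]))

    return max(means, key=score)
```

The log-likelihood splits into a term for `x` (all rows) and a term for `y | x` (complete rows only), and the two terms have distinct parameters. Each term therefore has a closed-form maximiser, so the sketch makes a single pass with no iteration. When nothing is missing, it reduces to the ordinary class means and the pooled covariance divided by `n`.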
Related papers
- An Efficient Algorithm for Clustered Multi-Task Compressive Sensing [60.70532293880842]
Clustered multi-task compressive sensing is a hierarchical model that solves multiple compressive sensing tasks.
The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions.
We propose a new algorithm that substantially accelerates model inference by avoiding the need to explicitly compute these covariance matrices.
(2023-09-30T15:57:14Z)
- IRTCI: Item Response Theory for Categorical Imputation [5.9952530228468754]
Several imputation techniques have been designed to replace missing data with stand-in values.
The work showcased here offers a novel means for categorical imputation based on item response theory (IRT).
Analyses comparing these techniques were performed on three different datasets.
(2023-02-08T16:17:20Z)
- Learning to Bound Counterfactual Inference in Structural Causal Models from Observational and Randomised Data [64.96984404868411]
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm.
The new algorithm learns to approximate the (unidentifiability) region of model parameters from such mixed data sources.
It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
(2022-12-06T12:42:11Z)
- Active Learning with Expected Error Reduction [4.506537904404427]
Expected Error Reduction (EER) has been shown to be an effective method for active learning.
EER requires the model to be retrained for every candidate sample.
In this paper we reformulate EER through the lens of Bayesian active learning.
(2022-11-17T01:02:12Z)
- An Application of a Multivariate Estimation of Distribution Algorithm to Cancer Chemotherapy [59.40521061783166]
Chemotherapy treatment for cancer is a complex optimisation problem with a large number of interacting variables and constraints.
We show that the more sophisticated algorithm would yield better performance on a complex problem like this.
We hypothesise that this is caused by the more sophisticated algorithm being impeded by the large number of interactions in the problem.
(2022-05-17T15:28:46Z)
- Information-Theoretic Generalization Bounds for Iterative Semi-Supervised Learning [81.1071978288003]
In particular, we seek to understand the behaviour of the generalization error of iterative SSL algorithms using information-theoretic principles.
Our theoretical results suggest that when the class conditional variances are not too large, the upper bound on the generalization error decreases monotonically with the number of iterations, but quickly saturates.
(2021-10-03T05:38:49Z)
- Memory-Based Optimization Methods for Model-Agnostic Meta-Learning and Personalized Federated Learning [56.17603785248675]
Model-agnostic meta-learning (MAML) has become a popular research area.
Existing MAML algorithms rely on the 'episode' idea, sampling a few tasks and data points to update the meta-model at each iteration.
This paper proposes memory-based algorithms for MAML that converge with vanishing error.
(2021-06-09T08:47:58Z)
- DPER: Efficient Parameter Estimation for Randomly Missing Data [0.24466725954625884]
We propose novel algorithms to find the maximum likelihood estimates (MLEs) for a one-class/multiple-class randomly missing data set.
Our algorithms do not require multiple iterations through the data, thus promising to be less time-consuming than other methods.
(2021-06-06T16:37:48Z)
- A Method for Handling Multi-class Imbalanced Data by Geometry based Information Sampling and Class Prioritized Synthetic Data Generation (GICaPS) [15.433936272310952]
This paper looks into the problem of handling imbalanced data in a multi-label classification problem.
Two novel methods are proposed that exploit the geometric relationship between the feature vectors.
The efficacy of the proposed methods is analyzed by solving a generic multi-class recognition problem.
(2020-10-11T04:04:26Z)
- A Robust Functional EM Algorithm for Incomplete Panel Count Data [66.07942227228014]
We propose a functional EM algorithm to estimate the counting process mean function under a missing completely at random assumption (MCAR).
The proposed algorithm wraps several popular panel count inference methods, seamlessly deals with incomplete counts and is robust to misspecification of the Poisson process assumption.
We illustrate the utility of the proposed algorithm through numerical experiments and an analysis of smoking cessation data.
(2020-03-02T20:04:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.