DPER: Efficient Parameter Estimation for Randomly Missing Data
- URL: http://arxiv.org/abs/2106.05190v1
- Date: Sun, 6 Jun 2021 16:37:48 GMT
- Title: DPER: Efficient Parameter Estimation for Randomly Missing Data
- Authors: Thu Nguyen, Khoi Minh Nguyen-Duy, Duy Ho Minh Nguyen, Binh T. Nguyen,
and Bruce Alan Wade
- Abstract summary: We propose novel algorithms to find the maximum likelihood estimates (MLEs) for a one-class/multiple-class randomly missing data set.
Our algorithms do not require multiple iterations through the data, thus promising to be less time-consuming than other methods.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The missing data problem has been broadly studied in the last few decades and
has various applications in different areas such as statistics or
bioinformatics. Even though many methods have been developed to tackle this
challenge, most of those are imputation techniques that require multiple
iterations through the data before reaching convergence. In addition, such
approaches may introduce extra bias and noise into the estimated parameters.
In this work, we propose novel algorithms to find the maximum likelihood
estimates (MLEs) for a one-class/multiple-class randomly missing data set under
some mild assumptions. As the computation is direct without any imputation, our
algorithms do not require multiple iterations through the data, thus promising
to be less time-consuming than other methods while maintaining superior
estimation performance. We validate these claims with empirical results on
data sets of various sizes and release all code in a GitHub repository as a
contribution to the research community working on this problem.
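The abstract's central idea, direct single-pass parameter estimation with no imputation step, can be illustrated with a minimal sketch. This is not the DPER algorithm itself: the function name `direct_estimates` and the use of available-case means with pairwise-complete covariances are assumptions made for illustration, standing in for the paper's closed-form MLE formulas.

```python
import numpy as np

def direct_estimates(X):
    """Estimate mean and covariance of data with NaN entries in a single
    pass, without imputation. Illustrative only: uses available-case means
    and pairwise-complete covariances, not the exact DPER MLE formulas."""
    n, p = X.shape
    mask = ~np.isnan(X)                      # True where an entry is observed
    mu = np.nanmean(X, axis=0)               # per-feature mean over observed entries
    sigma = np.empty((p, p))
    for i in range(p):
        for j in range(i, p):
            both = mask[:, i] & mask[:, j]   # rows where both features are observed
            xi = X[both, i] - mu[i]
            xj = X[both, j] - mu[j]
            sigma[i, j] = sigma[j, i] = (xi * xj).sum() / both.sum()
    return mu, sigma

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
X[rng.random(X.shape) < 0.2] = np.nan        # ~20% missing completely at random
mu, sigma = direct_estimates(X)
```

Each covariance entry is computed once from the rows where both features are observed, so the whole estimate takes a single sweep per feature pair rather than repeated iterations to convergence, which is the property the abstract emphasizes.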
Related papers
- Stochastic Amortization: A Unified Approach to Accelerate Feature and Data Attribution [62.71425232332837]
We show that training amortized models with noisy labels is inexpensive and surprisingly effective.
This approach significantly accelerates several feature attribution and data valuation methods, often yielding an order of magnitude speedup over existing approaches.
arXiv Detail & Related papers (2024-01-29T03:42:37Z)
- Numerical Data Imputation for Multimodal Data Sets: A Probabilistic Nearest-Neighbor Kernel Density Approach [2.750124853532831]
We introduce a data imputation method combining nearest-neighbor estimation ($k$NN) and density estimation with Gaussian kernels (KDE).
We show that our method can cope with complex original data structure, yields lower data imputation errors, and provides probabilistic estimates with higher likelihood than current methods.
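The $k$NN-plus-KDE combination described above can be sketched in a few lines. This is a simplified, hypothetical variant, not the cited paper's probabilistic method: the name `knn_kde_impute`, the complete-row donor selection, and the Gaussian-kernel weighting are illustrative assumptions.

```python
import numpy as np

def knn_kde_impute(X, k=5, bandwidth=1.0):
    """Simplified sketch of nearest-neighbor, kernel-weighted imputation.
    Not the cited paper's exact method: distances use only the features
    observed in the target row, and each missing entry is filled with a
    Gaussian-kernel-weighted mean over the k nearest complete rows."""
    X = X.copy()
    complete = X[~np.isnan(X).any(axis=1)]   # donor rows with no missing values
    for row in X:
        miss = np.isnan(row)
        if not miss.any():
            continue
        obs = ~miss
        # Euclidean distance on the observed coordinates only.
        d = np.linalg.norm(complete[:, obs] - row[obs], axis=1)
        nn = np.argsort(d)[:k]
        w = np.exp(-0.5 * (d[nn] / bandwidth) ** 2)  # Gaussian kernel weights
        row[miss] = (w @ complete[nn][:, miss]) / w.sum()
    return X

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
X[0, 2] = np.nan
X[5, 0] = np.nan
X_imp = knn_kde_impute(X, k=5)
```

Weighting donors by a Gaussian kernel of their distance, rather than averaging them uniformly, is what lets closer neighbors dominate the fill-in value; the full method in the paper additionally yields probabilistic estimates rather than point imputations.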
arXiv Detail & Related papers (2023-06-29T12:55:58Z)
- Learning to Bound Counterfactual Inference in Structural Causal Models from Observational and Randomised Data [64.96984404868411]
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm.
The new algorithm learns to approximate the (unidentifiability) region of model parameters from such mixed data sources.
It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
arXiv Detail & Related papers (2022-12-06T12:42:11Z)
- An Application of a Multivariate Estimation of Distribution Algorithm to Cancer Chemotherapy [59.40521061783166]
Chemotherapy treatment for cancer is a complex optimisation problem with a large number of interacting variables and constraints.
We expected that the more sophisticated algorithm would yield better performance on a complex problem like this, but found otherwise.
We hypothesise that this is caused by the more sophisticated algorithm being impeded by the large number of interactions in the problem.
arXiv Detail & Related papers (2022-05-17T15:28:46Z)
- Learning Mixtures of Linear Dynamical Systems [94.49754087817931]
We develop a two-stage meta-algorithm to efficiently recover each ground-truth LDS model up to error $\tilde{O}(\sqrt{d/T})$.
We validate our theoretical studies with numerical experiments, confirming the efficacy of the proposed algorithm.
arXiv Detail & Related papers (2022-01-26T22:26:01Z)
- Multilevel Stochastic Optimization for Imputation in Massive Medical Data Records [6.711824170437793]
We apply a recently developed multi-level computational optimization approach to the problem of imputation in massive medical records.
Results show that the multi-level method significantly outperforms current approaches and is numerically robust.
arXiv Detail & Related papers (2021-10-19T01:14:08Z)
- EPEM: Efficient Parameter Estimation for Multiple Class Monotone Missing Data [3.801859210248944]
We propose a novel algorithm to compute the maximum likelihood estimators (MLEs) of a multiple class, monotone missing dataset.
As the computation is exact, our EPEM algorithm does not require multiple iterations through the data, as other imputation approaches do.
arXiv Detail & Related papers (2020-09-23T20:07:53Z)
- The Integrity of Machine Learning Algorithms against Software Defect Prediction [0.0]
This report analyses the performance of the Online Sequential Extreme Learning Machine (OS-ELM) proposed by Liang et al.
OS-ELM trains faster than conventional deep neural networks and always converges to the globally optimal solution.
The analysis is carried out on three NASA projects: KC1, PC4, and PC3.
arXiv Detail & Related papers (2020-09-05T17:26:56Z)
- Optimization for Supervised Machine Learning: Randomized Algorithms for Data and Parameters [10.279748604797911]
Key problems in machine learning and data science are routinely modeled as optimization problems and solved via optimization algorithms.
With the increase of the volume of data and the size and complexity of the statistical models used to formulate these often ill-conditioned optimization tasks, there is a need for new efficient algorithms able to cope with these challenges.
In this thesis, we deal with each of these sources of difficulty in a different way. To efficiently address the big data issue, we develop new methods which in each iteration examine a small random subset of the training data only.
To handle the big model issue, we develop methods which in each iteration update a random subset of the model parameters only.
arXiv Detail & Related papers (2020-08-26T21:15:18Z)
- Ambiguity in Sequential Data: Predicting Uncertain Futures with Recurrent Models [110.82452096672182]
We propose an extension of the Multiple Hypothesis Prediction (MHP) model to handle ambiguous predictions with sequential data.
We also introduce a novel metric for ambiguous problems, which is better suited to account for uncertainties.
arXiv Detail & Related papers (2020-03-10T09:15:42Z)
- Improving a State-of-the-Art Heuristic for the Minimum Latency Problem with Data Mining [69.00394670035747]
Hybrid metaheuristics have become a trend in operations research.
A successful example combines the Greedy Randomized Adaptive Search Procedures (GRASP) and data mining techniques.
arXiv Detail & Related papers (2019-08-28T13:12:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.