A computational study on imputation methods for missing environmental data
- URL: http://arxiv.org/abs/2108.09500v1
- Date: Sat, 21 Aug 2021 12:19:42 GMT
- Title: A computational study on imputation methods for missing environmental data
- Authors: Paul Dixneuf and Fausto Errico and Mathias Glaus
- Abstract summary: This paper focuses on databases collecting information related to the natural environment.
It investigates the performance of several missing data imputation methods and their application to missing environmental data.
We believe that the present study demonstrates the pertinence of using MF as an imputation method when dealing with missing environmental data.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Data acquisition and recording in the form of databases are routine
operations. The process of collecting data, however, may experience
irregularities, resulting in databases with missing data. Missing entries can
reduce the efficiency of subsequent analyses and, consequently, affect the
associated decision-making process. This paper focuses on databases collecting
information related to the natural environment. Given the broad spectrum of
recorded activities, these databases are typically of mixed type, combining
continuous and categorical variables. It is therefore relevant to evaluate the
performance of missing data processing methods with this characteristic in
mind. In this paper we investigate the performance of several missing data
imputation methods and their application to missing environmental data. A
computational study was performed to compare the method
missForest (MF) with two other imputation methods, namely Multivariate
Imputation by Chained Equations (MICE) and K-Nearest Neighbors (KNN). Tests
were performed on 10 preprocessed datasets of various types. Results revealed
that MF generally outperformed MICE and KNN in terms of imputation error, with a
more pronounced performance gap for mixed-type databases, where MF reduced the
imputation error by up to 150% compared to the other methods. KNN was
usually the fastest method. MF was then successfully applied to a case study on
performance monitoring of Quebec wastewater treatment plants. We believe that
the present study demonstrates the pertinence of using MF as an imputation
method when dealing with missing environmental data.
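The abstract compares MF, MICE and KNN on mixed-type data but gives no implementation details. As a rough illustration of one of the compared methods, the sketch below performs KNN imputation on mixed-type records in plain Python: distances use a Gower-style measure (range-scaled absolute difference for continuous columns, 0/1 mismatch for categorical ones), continuous gaps take the donor mean and categorical gaps the donor mode. It also includes NRMSE and PFC (proportion of falsely classified entries), the error measures used in the original missForest work; whether this study scored imputations with exactly these measures is an assumption. The distance, the donor rule, the value of `k` and all function names (`gower_distance`, `knn_impute`, `nrmse`, `pfc`) are hypothetical illustrative choices, not the authors' code.

```python
import math
from collections import Counter

# Hypothetical record layout (assumption): each row is a list of values,
# None marks a missing entry, and `kinds` labels each column "num" or "cat".


def gower_distance(a, b, kinds, ranges):
    """Gower-style distance averaged over columns observed in both rows."""
    total, used = 0.0, 0
    for x, y, kind, rng in zip(a, b, kinds, ranges):
        if x is None or y is None:
            continue  # ignore columns missing in either row
        if kind == "num":
            total += abs(x - y) / rng if rng > 0 else 0.0  # range-scaled gap
        else:
            total += 0.0 if x == y else 1.0  # simple mismatch for categories
        used += 1
    return total / used if used else math.inf  # no shared columns: never near


def knn_impute(rows, kinds, k=3):
    """Return a copy of rows with gaps filled from the k nearest donors."""
    ranges = []
    for j, kind in enumerate(kinds):
        vals = [r[j] for r in rows if r[j] is not None]
        ranges.append(max(vals) - min(vals) if kind == "num" and vals else 0.0)
    out = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: every other row that observes column j.
            donors = sorted(
                ((gower_distance(row, other, kinds, ranges), other[j])
                 for m, other in enumerate(rows)
                 if m != i and other[j] is not None),
                key=lambda d: d[0],
            )
            nearest = [v for _, v in donors[:k]]
            if not nearest:
                continue  # no donor available; leave the gap as None
            if kinds[j] == "num":
                out[i][j] = sum(nearest) / len(nearest)  # donor mean
            else:
                out[i][j] = Counter(nearest).most_common(1)[0][0]  # donor mode
    return out


def nrmse(true_vals, imputed_vals):
    """Normalized RMSE over imputed continuous entries (population variance)."""
    n = len(true_vals)
    mu = sum(true_vals) / n
    var = sum((t - mu) ** 2 for t in true_vals) / n
    mse = sum((t - p) ** 2 for t, p in zip(true_vals, imputed_vals)) / n
    # Fall back to plain RMSE when the true values are constant.
    return math.sqrt(mse / var) if var > 0 else math.sqrt(mse)


def pfc(true_vals, imputed_vals):
    """Proportion of falsely classified imputed categorical entries."""
    wrong = sum(t != p for t, p in zip(true_vals, imputed_vals))
    return wrong / len(true_vals)
```

For example, with `rows = [[1.0, "a"], [2.0, "a"], [None, "a"], [10.0, "b"], [12.0, "b"], [11.0, None]]` and `k=2`, the missing number is filled with 1.5 (the mean of the two rows sharing category "a") and the missing category with "b" (taken from the two numerically closest rows). Imputations are computed from the original rows only, so one filled value never feeds into another.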
Related papers
- On the Performance of Imputation Techniques for Missing Values on Healthcare Datasets
Missing values are a common characteristic of real-world datasets, especially healthcare data.
This study compares the performance of seven imputation techniques, namely mean imputation, median imputation, Last Observation Carried Forward (LOCF) imputation, K-Nearest Neighbor (KNN) imputation, interpolation imputation, missForest imputation, and Multiple Imputation by Chained Equations (MICE).
The results show that missForest imputation performs best, followed by MICE imputation.
Date: 2024-03-13T18:07:17Z
- Physics-informed and Unsupervised Riemannian Domain Adaptation for Machine Learning on Heterogeneous EEG Datasets
We propose an unsupervised approach leveraging EEG signal physics.
We map EEG channels to fixed positions using field, source-free domain adaptation.
Our method demonstrates robust performance in brain-computer interface (BCI) tasks and potential biomarker applications.
Date: 2024-03-07T16:17:33Z
- In-Database Data Imputation
Missing data is a widespread problem in many domains, creating challenges in data analysis and decision making.
Traditional techniques for dealing with missing data, such as excluding incomplete records or imputing simple estimates, are computationally efficient but may introduce bias and disrupt variable relationships.
Model-based imputation techniques offer a more robust solution that preserves the variability and relationships in the data, but they demand significantly more computation time.
This work enables efficient, high-quality, and scalable data imputation within a database system using the widely used MICE method.
Date: 2024-01-07T01:57:41Z
- Deep Ensembles Meets Quantile Regression: Uncertainty-aware Imputation for Time Series
Time series data often contain many missing values, and filling them in is the task of time series imputation.
Previous deep learning methods have been shown to be effective for time series imputation.
We propose a non-generative time series imputation method that produces accurate imputations with inherent uncertainty.
Date: 2023-12-03T05:52:30Z
- IRTCI: Item Response Theory for Categorical Imputation
Several imputation techniques have been designed to replace missing data with stand-in values.
The work showcased here offers a novel means of categorical imputation based on item response theory (IRT).
Analyses comparing these techniques were performed on three different datasets.
Date: 2023-02-08T16:17:20Z
- Multiple Imputation with Neural Network Gaussian Process for High-dimensional Incomplete Data
Imputation is arguably the most popular method for handling missing data, though existing methods have a number of limitations.
We propose two NNGP-based multiple imputation (MI) methods, collectively called MI-NNGP, that draw multiple imputations for missing values from a joint (posterior predictive) distribution.
The MI-NNGP methods are shown to significantly outperform existing state-of-the-art methods on synthetic and real datasets.
Date: 2022-11-23T20:54:26Z
- MissDAG: Causal Discovery in the Presence of Missing Data with Continuous Additive Noise Models
We develop a general method, which we call MissDAG, to perform causal discovery from data with incomplete observations.
MissDAG maximizes the expected likelihood of the visible part of observations under the expectation-maximization framework.
We demonstrate the flexibility of MissDAG for incorporating various causal discovery algorithms and its efficacy through extensive simulations and real data experiments.
Date: 2022-05-27T09:59:46Z
- To Impute or not to Impute? -- Missing Data in Treatment Effect Estimation
We identify a new missingness mechanism, which we term mixed confounded missingness (MCM), where some missingness determines treatment selection and other missingness is determined by treatment selection.
We show that naively imputing all data leads to poor performing treatment effects models, as the act of imputation effectively removes information necessary to provide unbiased estimates.
Our solution is selective imputation, where we use insights from MCM to inform precisely which variables should be imputed and which should not.
Date: 2022-02-04T12:08:31Z
- MIRACLE: Causally-Aware Imputation via Learning Missing Data Mechanisms
We propose a causally-aware imputation algorithm (MIRACLE) for missing data.
MIRACLE iteratively refines the imputation of a baseline by simultaneously modeling the missingness generating mechanism.
We conduct extensive experiments on synthetic and a variety of publicly available datasets to show that MIRACLE is able to consistently improve imputation.
Date: 2021-11-04T22:38:18Z
- Self-Trained One-class Classification for Unsupervised Anomaly Detection
Anomaly detection (AD) has various applications across domains, from manufacturing to healthcare.
In this work, we focus on unsupervised AD problems whose entire training data are unlabeled and may contain both normal and anomalous samples.
To tackle this problem, we build a robust one-class classification framework via data refinement.
We show that our method outperforms the state-of-the-art one-class classification method by 6.3 AUC and 12.5 average precision.
Date: 2021-06-11T01:36:08Z
This list is automatically generated from the titles and abstracts of the papers on this site.
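The first related entry above names several simpler baselines alongside KNN, missForest and MICE. For a single numeric series, four of them (mean, median, LOCF and interpolation imputation) can be sketched in a few lines of plain Python. This is an illustrative sketch, not code from any of the listed papers: the function names are hypothetical, "interpolation" is assumed to mean linear interpolation, and leaving leading LOCF gaps and edge interpolation gaps unfilled is an assumed convention.

```python
from statistics import mean, median

# Assumption: a series is a list of floats with None marking a missing value.


def mean_impute(xs):
    """Replace every gap with the mean of the observed values."""
    m = mean(v for v in xs if v is not None)
    return [m if v is None else v for v in xs]


def median_impute(xs):
    """Replace every gap with the median of the observed values."""
    m = median(v for v in xs if v is not None)
    return [m if v is None else v for v in xs]


def locf_impute(xs):
    """Last observation carried forward; leading gaps stay None."""
    out, last = [], None
    for v in xs:
        if v is not None:
            last = v  # remember the most recent observation
        out.append(last if v is None else v)
    return out


def linear_interpolate(xs):
    """Fill interior gaps linearly between neighbouring observations."""
    out = list(xs)
    known = [i for i, v in enumerate(xs) if v is not None]
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            t = (i - a) / (b - a)  # fractional position inside the gap
            out[i] = xs[a] + t * (xs[b] - xs[a])
    return out  # leading and trailing gaps are left as None
```

Unlike KNN, MICE or missForest, these baselines use only the series itself and ignore the other variables in a record, so they cannot exploit relationships between variables.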