Denoising ESG: quantifying data uncertainty from missing data with Machine Learning and prediction intervals
- URL: http://arxiv.org/abs/2407.20047v1
- Date: Mon, 29 Jul 2024 14:31:44 GMT
- Title: Denoising ESG: quantifying data uncertainty from missing data with Machine Learning and prediction intervals
- Authors: Sergio Caprioli, Jacopo Foschi, Riccardo Crupi, Alessandro Sabatino
- Abstract summary: This paper explores the application of established machine learning techniques for imputing missing data in a real-world ESG dataset.
By employing multiple imputation strategies, this study assesses the robustness of imputation methods and quantifies the uncertainty associated with missing data.
- Score: 43.5280757929682
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Environmental, Social, and Governance (ESG) datasets are frequently plagued by significant data gaps, leading to inconsistencies in ESG ratings due to varying imputation methods. This paper explores the application of established machine learning techniques for imputing missing data in a real-world ESG dataset, emphasizing the quantification of uncertainty through prediction intervals. By employing multiple imputation strategies, this study assesses the robustness of imputation methods and quantifies the uncertainty associated with missing data. The findings highlight the importance of probabilistic machine learning models in providing better understanding of ESG scores, thereby addressing the inherent risks of wrong ratings due to incomplete data. This approach improves imputation practices to enhance the reliability of ESG ratings.
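A minimal illustrative sketch of the idea described in the abstract: run the imputation several times and read a prediction interval off the spread of the imputed values. It assumes a small synthetic table with made-up ESG-style column names and uses scikit-learn's IterativeImputer; it is not the paper's actual pipeline.

```python
# Illustrative only: multiple imputation of a synthetic ESG-style table and
# a simple 95% interval taken from the spread of the imputed values.
# Column names and data are made up, not taken from the paper.
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "co2_intensity": rng.normal(50, 10, 200),
    "board_independence": rng.uniform(0.2, 0.9, 200),
    "employee_turnover": rng.normal(0.15, 0.05, 200),
})
mask = rng.random(X.shape) < 0.2           # knock out ~20% of the cells
X_missing = X.mask(mask)

# Multiple imputation: each run with sample_posterior=True draws a different
# plausible completion of the dataset.
imputations = [
    IterativeImputer(sample_posterior=True, random_state=seed).fit_transform(X_missing)
    for seed in range(20)
]
stack = np.stack(imputations)              # (n_imputations, n_rows, n_cols)

point = stack.mean(axis=0)                 # pooled point estimate
lower = np.percentile(stack, 2.5, axis=0)  # 95% interval from the spread
upper = np.percentile(stack, 97.5, axis=0)
print("mean 95% interval width on the missing cells:",
      float((upper - lower)[mask].mean()))
```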
Related papers
- An Investigation on Machine Learning Predictive Accuracy Improvement and Uncertainty Reduction using VAE-based Data Augmentation [2.517043342442487]
Deep generative learning uses certain ML models to learn the underlying distribution of existing data and generate synthetic samples that resemble the real data.
In this study, our objective is to evaluate the effectiveness of data augmentation using variational autoencoder (VAE)-based deep generative models.
We investigated whether the data augmentation leads to improved accuracy in the predictions of a deep neural network (DNN) model trained using the augmented data.
arXiv Detail & Related papers (2024-10-24T18:15:48Z)
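A toy sketch of the VAE-based augmentation workflow the entry above describes: fit a small VAE on synthetic data, then decode draws from the prior as extra training samples. Architecture, sizes and data are assumptions for the sketch, not the paper's configuration (PyTorch is assumed to be available).

```python
# Illustrative only: a tiny VAE fitted to toy 2-D data, then sampled to
# produce synthetic (augmented) training points.
import torch
import torch.nn as nn

torch.manual_seed(0)
real = torch.randn(512, 2) @ torch.tensor([[1.0, 0.5], [0.0, 0.8]])  # toy data

class TinyVAE(nn.Module):
    def __init__(self, d=2, h=16, z=2):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d, h), nn.ReLU())
        self.mu = nn.Linear(h, z)
        self.logvar = nn.Linear(h, z)
        self.dec = nn.Sequential(nn.Linear(z, h), nn.ReLU(), nn.Linear(h, d))

    def forward(self, x):
        hid = self.enc(x)
        mu, logvar = self.mu(hid), self.logvar(hid)
        zs = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return self.dec(zs), mu, logvar

vae = TinyVAE()
opt = torch.optim.Adam(vae.parameters(), lr=1e-2)
for _ in range(500):
    recon, mu, logvar = vae(real)
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1).mean()
    loss = ((recon - real) ** 2).sum(dim=1).mean() + kl
    opt.zero_grad(); loss.backward(); opt.step()

# "Augmentation": decode prior samples into synthetic points that can be
# mixed into the training set of a downstream predictive model.
synthetic = vae.dec(torch.randn(256, 2)).detach()
print("synthetic batch shape:", tuple(synthetic.shape))
```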
- Uncertainty for Active Learning on Graphs [70.44714133412592]
Uncertainty Sampling is an Active Learning strategy that aims to improve the data efficiency of machine learning models.
We benchmark Uncertainty Sampling beyond predictive uncertainty and highlight a significant performance gap to other Active Learning strategies.
We develop ground-truth Bayesian uncertainty estimates in terms of the data generating process and prove their effectiveness in guiding Uncertainty Sampling toward optimal queries.
arXiv Detail & Related papers (2024-05-02T16:50:47Z)
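A generic sketch of the uncertainty-sampling loop the entry above benchmarks, using maximum predictive entropy on a plain tabular toy problem; the graph setting and the ground-truth Bayesian uncertainties of the paper are not reproduced here.

```python
# Illustrative only: entropy-based uncertainty sampling on a tabular toy task.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
labeled = list(range(10))                       # small seed label set
pool = [i for i in range(len(X)) if i not in labeled]

for _ in range(20):                             # 20 acquisition rounds
    clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
    proba = clf.predict_proba(X[pool])
    entropy = -(proba * np.log(proba + 1e-12)).sum(axis=1)
    query = pool[int(entropy.argmax())]         # most uncertain unlabeled point
    labeled.append(query)                       # "oracle" reveals its label
    pool.remove(query)

print("accuracy after active learning:", round(clf.score(X, y), 3))
```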
- On the Validation of Gibbs Algorithms: Training Datasets, Test Datasets and their Aggregation [70.540936204654]
The dependence of the Gibbs algorithm (GA) on the training data is analytically characterized.
This description enables the development of explicit expressions involving the training errors and test errors of GAs trained with different datasets.
arXiv Detail & Related papers (2023-06-21T16:51:50Z)
- Machine Learning Based Missing Values Imputation in Categorical Datasets [2.5611256859404983]
This research investigated the use of machine learning algorithms to impute missing values in categorical datasets.
The emphasis was on ensemble models constructed using the Error-Correcting Output Codes (ECOC) framework.
Despite these encouraging results, deep learning for missing data imputation still faces obstacles, including the requirement for large amounts of labeled data.
arXiv Detail & Related papers (2023-06-10T03:29:48Z)
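A small sketch of imputing one categorical column with an Error-Correcting Output Codes ensemble, in the spirit of the entry above; scikit-learn's OutputCodeClassifier with a random forest base learner stands in for the paper's models, and the data are synthetic.

```python
# Illustrative only: filling missing values of a categorical column with an
# Error-Correcting Output Codes (ECOC) ensemble.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.multiclass import OutputCodeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))                    # fully observed features
y = rng.integers(0, 4, size=300)                 # categorical column, 4 levels
missing = rng.random(300) < 0.2                  # 20% of its values are missing

ecoc = OutputCodeClassifier(
    estimator=RandomForestClassifier(n_estimators=100, random_state=0),
    code_size=2.0,
    random_state=0,
)
ecoc.fit(X[~missing], y[~missing])               # learn from complete rows
y_imputed = y.copy()
y_imputed[missing] = ecoc.predict(X[missing])    # fill the gaps
print("imputed", int(missing.sum()), "categorical values")
```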
- Lightweight, Uncertainty-Aware Conformalized Visual Odometry [2.429910016019183]
Data-driven visual odometry (VO) is a critical subroutine for autonomous edge robotics.
Emerging edge robotics devices like insect-scale drones and surgical robots lack a computationally efficient framework to estimate VO's predictive uncertainties.
This paper presents a novel, lightweight, and statistically robust framework that leverages conformal inference (CI) to extract VO's uncertainty bands.
arXiv Detail & Related papers (2023-03-03T20:37:55Z)
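A minimal sketch of split conformal prediction intervals for a generic regression target, to show the conformal-inference mechanism the entry above relies on; it is not the paper's visual-odometry pipeline.

```python
# Illustrative only: split conformal prediction intervals for a generic
# regression target (a stand-in for a pose-error signal).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 6))
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.3, size=1000)

X_fit, X_cal, y_fit, y_cal = train_test_split(X, y, test_size=0.3, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_fit, y_fit)

# Calibration: the (1 - alpha) quantile of held-out absolute residuals widens
# point predictions into finite-sample-valid intervals.
alpha = 0.1
residuals = np.abs(y_cal - model.predict(X_cal))
level = np.ceil((1 - alpha) * (len(residuals) + 1)) / len(residuals)
q = np.quantile(residuals, level)

x_new = rng.normal(size=(1, 6))
pred = model.predict(x_new)[0]
print(f"90% conformal interval: [{pred - q:.2f}, {pred + q:.2f}]")
```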
- Uncertainty Estimation by Fisher Information-based Evidential Deep Learning [61.94125052118442]
Uncertainty estimation is a key factor that makes deep learning reliable in practical applications.
We propose a novel method, Fisher Information-based Evidential Deep Learning ($\mathcal{I}$-EDL).
In particular, we introduce Fisher Information Matrix (FIM) to measure the informativeness of evidence carried by each sample, according to which we can dynamically reweight the objective loss terms to make the network more focused on the representation learning of uncertain classes.
arXiv Detail & Related papers (2023-03-03T16:12:59Z)
- Self-Supervised Learning for Data Scarcity in a Fatigue Damage Prognostic Problem [0.0]
Self-Supervised Learning is a sub-category of unsupervised learning approaches.
This paper investigates whether pre-training DL models in a self-supervised way on unlabelled sensor data can be useful for Remaining Useful Life (RUL) estimation.
Results show that the self-supervised pre-trained models significantly outperform the non-pre-trained models on the downstream RUL prediction task.
arXiv Detail & Related papers (2023-01-20T06:45:32Z)
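A toy sketch of the workflow in the entry above: pre-train an encoder with a self-supervised reconstruction objective on unlabelled sensor windows, then fine-tune a small regression head on the few labelled Remaining Useful Life examples. Data, sizes and architecture are assumptions for the sketch.

```python
# Illustrative only: self-supervised pre-training (reconstruction of
# unlabelled sensor windows) followed by fine-tuning an RUL regression head.
import torch
import torch.nn as nn

torch.manual_seed(0)
unlabelled = torch.randn(2000, 32)        # many unlabelled sensor windows
labelled_x = torch.randn(100, 32)         # few windows with known RUL
labelled_y = torch.rand(100, 1) * 300.0   # RUL targets (e.g. cycles)

encoder = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 8))
decoder = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 32))

# 1) Self-supervised stage: learn to reconstruct the unlabelled windows.
opt = torch.optim.Adam([*encoder.parameters(), *decoder.parameters()], lr=1e-3)
for _ in range(200):
    loss = ((decoder(encoder(unlabelled)) - unlabelled) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# 2) Supervised stage: keep the pre-trained encoder, train an RUL head.
head = nn.Linear(8, 1)
opt = torch.optim.Adam([*encoder.parameters(), *head.parameters()], lr=1e-3)
for _ in range(200):
    loss = ((head(encoder(labelled_x)) - labelled_y) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

print("fine-tuned RUL training loss:", round(float(loss), 3))
```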
- Doing Great at Estimating CATE? On the Neglected Assumptions in Benchmark Comparisons of Treatment Effect Estimators [91.3755431537592]
We show that even in arguably the simplest setting, estimation under ignorability assumptions can be misleading.
We consider two popular machine learning benchmark datasets for evaluation of heterogeneous treatment effect estimators.
We highlight that the inherent characteristics of the benchmark datasets favor some algorithms over others.
arXiv Detail & Related papers (2021-07-28T13:21:27Z)
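A compact sketch of one estimator family such benchmarks compare, a T-learner on synthetic data with a known heterogeneous effect; it is only meant to make the CATE estimation task concrete, not to reproduce the paper's analysis.

```python
# Illustrative only: a T-learner for CATE on synthetic data with a known
# heterogeneous treatment effect.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))
t = rng.integers(0, 2, size=n)                    # randomized treatment flag
tau = 1.0 + X[:, 0]                               # true heterogeneous effect
y = X[:, 1] + t * tau + rng.normal(scale=0.5, size=n)

# T-learner: one outcome model per treatment arm, CATE = difference.
m1 = RandomForestRegressor(random_state=0).fit(X[t == 1], y[t == 1])
m0 = RandomForestRegressor(random_state=0).fit(X[t == 0], y[t == 0])
cate_hat = m1.predict(X) - m0.predict(X)
print("mean absolute CATE error:", round(float(np.abs(cate_hat - tau).mean()), 3))
```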
- Stochastic Approximation for High-frequency Observations in Data Assimilation [0.0]
High-frequency sensors offer opportunities for higher statistical accuracy of downstream estimates, but their frequency results in a plethora of computational problems in data assimilation tasks.
We adapt stochastic approximation methods to address the unique challenges of high-frequency observations in data assimilation.
As a result, we are able to produce estimates that leverage all of the observations in a manner that avoids the aforementioned computational problems and preserves the statistical accuracy of the estimates.
arXiv Detail & Related papers (2020-11-05T06:02:27Z)
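A bare-bones sketch of the stochastic-approximation idea behind the entry above: a Robbins-Monro style streaming update absorbs each high-frequency observation without storing the whole stream. It is a generic illustration, not the paper's assimilation scheme.

```python
# Illustrative only: a Robbins-Monro style streaming update that absorbs
# high-frequency observations one at a time, without storing the stream.
import numpy as np

rng = np.random.default_rng(0)
true_state = 3.0
estimate = 0.0
for k, obs in enumerate(true_state + rng.normal(scale=0.5, size=10_000), start=1):
    step = 1.0 / k                          # diminishing step size
    estimate += step * (obs - estimate)     # move toward each new observation

print("streaming estimate of the state:", round(estimate, 3))
```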
- Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z)