Related papers: The Significance of Data Abstraction Methods in Machine Learning Classification Processes for Critical Decision-Making

URL: http://arxiv.org/abs/2401.11044v1
Date: Fri, 19 Jan 2024 22:11:54 GMT
Title: The Significance of Data Abstraction Methods in Machine Learning Classification Processes for Critical Decision-Making
Authors: Karol Capała, Paulina Tworek, Jose Sousa
Abstract summary: The Small and Incomplete Dataset Analyser (SaNDA) has been proposed to enhance the ability to perform classification in domains such as healthcare, behavioural sciences, and finance, where accountability takes priority. This paper focuses on column-wise data transformations called abstractions, which are crucial for SaNDA's classification process. SaNDA consistently maintains high accuracy even when half of the dataset is missing, unlike Random Forest, which experiences a significant decline in accuracy under similar conditions.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The applicability of widely adopted machine learning (ML) methods to classification is circumscribed by the imperatives of explicability and uncertainty, particularly evident in domains such as healthcare, behavioural sciences, and finance, wherein accountability assumes priority. Recently, the Small and Incomplete Dataset Analyser (SaNDA) has been proposed to enhance the ability to perform classification in such domains by developing a data abstraction protocol using a ROC curve-based method. This paper focuses on column-wise data transformations called abstractions, which are crucial for SaNDA's classification process, and explores alternative abstraction protocols, such as constant binning and quantiles. The best-performing methods have been compared against Random Forest as a baseline for explainable methods. The results suggest that SaNDA can be a viable substitute for Random Forest when data is incomplete, even with minimal missing values. It consistently maintains high accuracy even when half of the dataset is missing, unlike Random Forest, which experiences a significant decline in accuracy under similar conditions.
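The two alternative abstraction protocols named in the abstract, constant binning and quantiles, can be illustrated with a minimal sketch. This is not SaNDA's implementation and does not cover its ROC curve-based protocol; the function names `constant_bins` and `quantile_bins`, the use of `None` for missing entries, and the nearest-rank cut points are all assumptions made for illustration.

```python
from bisect import bisect_right


def constant_bins(values, n_bins):
    """Equal-width binning of one column; missing entries (None) stay None."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    # Fall back to width 1.0 when every observed value is identical.
    width = (hi - lo) / n_bins or 1.0
    return [
        None if v is None else min(int((v - lo) / width), n_bins - 1)
        for v in values
    ]


def quantile_bins(values, n_bins):
    """Quantile binning of one column; missing entries (None) stay None."""
    present = sorted(v for v in values if v is not None)
    # Cut points taken at the k/n_bins positions of the sorted observed values.
    cuts = [present[(len(present) * k) // n_bins] for k in range(1, n_bins)]
    return [None if v is None else bisect_right(cuts, v) for v in values]


# Hypothetical column with one missing value and one outlier.
column = [1.0, 2.0, None, 3.0, 4.0, 100.0, 5.0, 6.0, 7.0]
print(constant_bins(column, 4))  # the outlier pushes most values into bin 0
print(quantile_bins(column, 4))  # bins are spread more evenly over the values
```

In this sketch the missing entry is simply carried through rather than imputed, so each column is abstracted from its observed values only. How SaNDA itself treats missing entries is not specified in the abstract.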

Related papers

Aligning Learning and Endogenous Decision-Making [5.84228364962637]
We introduce an end-to-end method under endogenous uncertainty to train ML models to be aware of their downstream. We also introduce a robust optimization variant that accounts for uncertainty in ML models. We prove guarantees that this robust approach can capture near-optimal decisions with high probability as a function of data.
arXiv Detail & Related papers (2025-07-01T15:22:56Z)
Sufficient Decision Proxies for Decision-Focused Learning [2.7143637678944454]
Decision-focused learning aims at learning a predictive model such that decision quality, instead of prediction accuracy, is maximized. This paper investigates for the first time problem properties that justify using either assumption. We show the effectiveness of presented approaches in experiments on problems with continuous and discrete variables, as well as uncertainty in the objective function and in the constraints.
arXiv Detail & Related papers (2025-05-06T20:10:17Z)
A Hybrid Framework for Statistical Feature Selection and Image-Based Noise-Defect Detection [55.2480439325792]
This paper presents a hybrid framework that integrates both statistical feature selection and classification techniques to improve defect detection accuracy. We present around 55 distinguished features that are extracted from industrial images, which are then analyzed using statistical methods. By integrating these methods with flexible machine learning applications, the proposed framework improves detection accuracy and reduces false positives and misclassifications.
arXiv Detail & Related papers (2024-12-11T22:12:21Z)
Decision-Focused Uncertainty Quantification [32.93992587758183]
We develop a framework based on conformal prediction to produce prediction sets that account for a downstream decision loss function. We present a real-world use case in healthcare diagnosis, where our method effectively incorporates the hierarchical structure of dermatological diseases.
arXiv Detail & Related papers (2024-10-02T17:22:09Z)
Enhancing Feature Selection and Interpretability in AI Regression Tasks Through Feature Attribution [38.53065398127086]
This study investigates the potential of feature attribution methods to filter out uninformative features in input data for regression problems. We introduce a feature selection pipeline that combines Integrated Gradients with k-means clustering to select an optimal set of variables from the initial data space. To validate the effectiveness of this approach, we apply it to a real-world industrial problem - blade vibration analysis in the development process of turbo machinery.
arXiv Detail & Related papers (2024-09-25T09:50:51Z)
Explainable Data-Driven Optimization: From Context to Decision and Back Again [76.84947521482631]
Data-driven optimization uses contextual information and machine learning algorithms to find solutions to decision problems with uncertain parameters. We introduce a counterfactual explanation methodology tailored to explain solutions to data-driven problems. We demonstrate our approach by explaining key problems in operations management such as inventory management and routing.
arXiv Detail & Related papers (2023-01-24T15:25:16Z)
RISE: Robust Individualized Decision Learning with Sensitive Variables [1.5293427903448025]
A naive baseline is to ignore sensitive variables in learning decision rules, leading to significant uncertainty and bias. We propose a decision learning framework to incorporate sensitive variables during offline training but not include them in the input of the learned decision rule during model deployment.
arXiv Detail & Related papers (2022-11-12T04:31:38Z)
Causal Fairness Analysis [68.12191782657437]
We introduce a framework for understanding, modeling, and possibly solving issues of fairness in decision-making settings. The main insight of our approach will be to link the quantification of the disparities present on the observed data with the underlying, and often unobserved, collection of causal mechanisms. Our effort culminates in the Fairness Map, which is the first systematic attempt to organize and explain the relationship between different criteria found in the literature.
arXiv Detail & Related papers (2022-07-23T01:06:34Z)
Learning from Heterogeneous Data Based on Social Interactions over Graphs [58.34060409467834]
This work proposes a decentralized architecture, where individual agents aim at solving a classification problem while observing streaming features of different dimensions. We show that the strategy enables the agents to learn consistently under this highly-heterogeneous setting.
arXiv Detail & Related papers (2021-12-17T12:47:18Z)
Targeted Active Learning for Bayesian Decision-Making [15.491942513739676]
We argue that when acquiring samples sequentially, separating learning and decision-making is sub-optimal. We introduce a novel active learning strategy which takes the down-the-line decision problem into account. Specifically, we introduce a novel active learning criterion which maximizes the expected information gain on the posterior distribution of the optimal decision.
arXiv Detail & Related papers (2021-06-08T09:05:43Z)
Information Theoretic Measures for Fairness-aware Feature Selection [27.06618125828978]
We develop a framework for fairness-aware feature selection, based on information theoretic measures for the accuracy and discriminatory impacts of features. Specifically, our goal is to design a fairness utility score for each feature which quantifies how this feature influences accurate as well as nondiscriminatory decisions.
arXiv Detail & Related papers (2021-06-01T20:11:54Z)
Leveraging Expert Consistency to Improve Algorithmic Decision Support [62.61153549123407]
We explore the use of historical expert decisions as a rich source of information that can be combined with observed outcomes to narrow the construct gap. We propose an influence function-based methodology to estimate expert consistency indirectly when each case in the data is assessed by a single expert. Our empirical evaluation, using simulations in a clinical setting and real-world data from the child welfare domain, indicates that the proposed approach successfully narrows the construct gap.
arXiv Detail & Related papers (2021-01-24T05:40:29Z)
Feature Selection Using Reinforcement Learning [0.0]
The space of variables or features that can be used to characterize a particular predictor of interest continues to grow exponentially. Identifying the most characterizing features that minimize the variance without jeopardizing the bias of our models is critical to successfully training a machine learning model.
arXiv Detail & Related papers (2021-01-23T09:24:37Z)
Estimating Structural Target Functions using Machine Learning and Influence Functions [103.47897241856603]
We propose a new framework for statistical machine learning of target functions arising as identifiable functionals from statistical models. This framework is problem- and model-agnostic and can be used to estimate a broad variety of target parameters of interest in applied statistics. We put particular focus on so-called coarsening at random/doubly robust problems with partially unobserved information.
arXiv Detail & Related papers (2020-08-14T16:48:29Z)
Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management. We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)

This list is automatically generated from the titles and abstracts of the papers on this site.