An Audit Framework for Technical Assessment of Binary Classifiers
- URL: http://arxiv.org/abs/2211.09500v1
- Date: Thu, 17 Nov 2022 12:48:11 GMT
- Title: An Audit Framework for Technical Assessment of Binary Classifiers
- Authors: Debarati Bhaumik and Diptish Dey
- Abstract summary: Multilevel models using logistic regression (MLogRM) and random forest models (RFM) are increasingly deployed in industry for the purpose of binary classification.
The European Commission's proposed Artificial Intelligence Act (AIA) necessitates, under certain conditions, that application of such models is fair, transparent, and ethical.
This paper proposes and demonstrates an audit framework for technical assessment of RFMs and MLogRMs by focussing on model-, discrimination-, and transparency & explainability-related aspects.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Multilevel models using logistic regression (MLogRM) and random forest models
(RFM) are increasingly deployed in industry for the purpose of binary
classification. The European Commission's proposed Artificial Intelligence Act
(AIA) necessitates, under certain conditions, that application of such models
is fair, transparent, and ethical, which consequently implies technical
assessment of these models. This paper proposes and demonstrates an audit
framework for technical assessment of RFMs and MLogRMs by focussing on model-,
discrimination-, and transparency & explainability-related aspects. To measure
these aspects, 20 KPIs are proposed, which are paired with a traffic light risk
assessment method. An open-source dataset is used to train an RFM and an MLogRM
model, and these KPIs are computed and compared with the traffic lights. The
performance of popular explainability methods such as kernel- and tree-SHAP is
assessed. The framework is expected to assist regulatory bodies in performing
conformity assessments of binary classifiers and also to benefit providers and
users deploying such AI systems in complying with the AIA.
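As a rough illustration of the workflow described above, the sketch below trains a random forest binary classifier, computes one example discrimination KPI (demographic parity difference), maps it onto a traffic-light rating, and produces tree- and kernel-SHAP attributions for comparison. The synthetic dataset, protected attribute, KPI choice, and thresholds are placeholder assumptions for demonstration; they are not the paper's actual 20 KPIs or traffic-light boundaries.

```python
# Minimal sketch of an audit-style check for a binary classifier. The dataset,
# protected attribute, KPI, and thresholds below are illustrative assumptions,
# not the paper's own KPIs or traffic-light boundaries.
import numpy as np
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for an open-source dataset; feature 0 is binarised into a
# hypothetical protected attribute.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
protected = (X[:, 0] > 0).astype(int)

X_train, X_test, y_train, y_test, a_train, a_test = train_test_split(
    X, y, protected, test_size=0.3, random_state=0
)

rfm = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
y_hat = rfm.predict(X_test)

# Example discrimination KPI: demographic parity difference,
# |P(y_hat = 1 | A = 0) - P(y_hat = 1 | A = 1)|.
dpd = abs(y_hat[a_test == 0].mean() - y_hat[a_test == 1].mean())

def traffic_light(value, green=0.05, amber=0.10):
    """Map a KPI value to a traffic light using illustrative thresholds."""
    return "green" if value < green else ("amber" if value < amber else "red")

print(f"demographic parity difference = {dpd:.3f} -> {traffic_light(dpd)}")

# Explainability checks: tree-SHAP is exact and fast for tree ensembles;
# kernel-SHAP is model-agnostic but approximate and slower, so it is run on a
# small slice of the test set only.
tree_sv = shap.TreeExplainer(rfm).shap_values(X_test)

background = X_train[:50]
kernel_expl = shap.KernelExplainer(lambda d: rfm.predict_proba(d)[:, 1], background)
kernel_sv = kernel_expl.shap_values(X_test[:20], nsamples=100)
print(np.shape(tree_sv), np.shape(kernel_sv))
```

Comparing the two attribution sets, their runtimes, and their agreement with each other is one simple way to operationalise the kind of kernel- vs tree-SHAP assessment the abstract mentions.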
Related papers
- Case-based Explainability for Random Forest: Prototypes, Critics, Counter-factuals and Semi-factuals [1.0485739694839669]
Explainable Case-Based Reasoning (XCBR) stands out as a pragmatic approach that elucidates the output of a model by referencing actual examples.
Until recently, XCBR has been relatively underexplored for many algorithms, including tree-based models.
arXiv Detail & Related papers (2024-08-13T07:08:54Z)
- Benchmarks as Microscopes: A Call for Model Metrology [76.64402390208576]
Modern language models (LMs) pose a new challenge in capability assessment.
To be confident in our metrics, we need a new discipline of model metrology.
arXiv Detail & Related papers (2024-07-22T17:52:12Z)
- Top-K Pairwise Ranking: Bridging the Gap Among Ranking-Based Measures for Multi-Label Classification [120.37051160567277]
This paper proposes a novel measure named Top-K Pairwise Ranking (TKPR).
A series of analyses show that TKPR is compatible with existing ranking-based measures.
On the other hand, we establish a sharp generalization bound for the proposed framework based on a novel technique named data-dependent contraction.
arXiv Detail & Related papers (2024-07-09T09:36:37Z)
- MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making.
We present a process-based benchmark MR-Ben that demands a meta-reasoning skill.
Our meta-reasoning paradigm is especially suited for system-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z)
- How fair are we? From conceptualization to automated assessment of fairness definitions [6.741000368514124]
MODNESS is a model-driven approach for user-defined fairness concepts in software systems.
It generates the source code to implement fairness assessment based on these custom definitions.
Our findings reveal that most of the current approaches do not support user-defined fairness concepts.
arXiv Detail & Related papers (2024-04-15T16:46:17Z)
- Incorporating Domain Knowledge in Deep Neural Networks for Discrete Choice Models [0.5801044612920815]
This paper proposes a framework that expands the potential of data-driven approaches for DCM.
It includes pseudo data samples that represent required relationships and a loss function that measures their fulfillment.
A case study demonstrates the potential of this framework for discrete choice analysis.
arXiv Detail & Related papers (2023-05-30T12:53:55Z)
- Exploring validation metrics for offline model-based optimisation with diffusion models [50.404829846182764]
In model-based optimisation (MBO) we are interested in using machine learning to design candidates that maximise some measure of reward with respect to a black box function called the (ground truth) oracle.
While an approximation to the ground-truth oracle can be trained and used in its place during model validation to measure the mean reward over generated candidates, the evaluation is approximate and vulnerable to adversarial examples.
This is encapsulated under our proposed evaluation framework which is also designed to measure extrapolation.
arXiv Detail & Related papers (2022-11-19T16:57:37Z)
- Neural Causal Models for Counterfactual Identification and Estimation [62.30444687707919]
We study the evaluation of counterfactual statements through neural models.
First, we show that neural causal models (NCMs) are expressive enough for counterfactual reasoning.
Second, we develop an algorithm for simultaneously identifying and estimating counterfactual distributions.
arXiv Detail & Related papers (2022-09-30T18:29:09Z)
- A Framework for Auditing Multilevel Models using Explainability Methods [2.578242050187029]
An audit framework for technical assessment of regressions is proposed.
The focus is on three aspects: model, discrimination, and transparency & explainability.
It is demonstrated that popular explainability methods, such as SHAP and LIME, underperform in accuracy when interpreting these models.
arXiv Detail & Related papers (2022-07-04T17:53:21Z)
- Towards a multi-stakeholder value-based assessment framework for algorithmic systems [76.79703106646967]
We develop a value-based assessment framework that visualizes closeness and tensions between values.
We give guidelines on how to operationalize them, while opening up the evaluation and deliberation process to a wide range of stakeholders.
arXiv Detail & Related papers (2022-05-09T19:28:32Z)
- PermuteAttack: Counterfactual Explanation of Machine Learning Credit Scorecards [0.0]
This paper is a note on new directions and methodologies for validation and explanation of Machine Learning (ML) models employed for retail credit scoring in finance.
Our proposed framework draws motivation from the field of Artificial Intelligence (AI) security and adversarial ML.
arXiv Detail & Related papers (2020-08-24T00:05:13Z)