An Audit Framework for Technical Assessment of Binary Classifiers
- URL: http://arxiv.org/abs/2211.09500v1
- Date: Thu, 17 Nov 2022 12:48:11 GMT
- Title: An Audit Framework for Technical Assessment of Binary Classifiers
- Authors: Debarati Bhaumik and Diptish Dey
- Abstract summary: Multilevel models using logistic regression (MLogRM) and random forest models (RFM) are increasingly deployed in industry for the purpose of binary classification.
The European Commission's proposed Artificial Intelligence Act (AIA) necessitates, under certain conditions, that application of such models is fair, transparent, and ethical.
This paper proposes and demonstrates an audit framework for technical assessment of RFMs and MLogRMs by focussing on model-, discrimination-, and transparency & explainability-related aspects.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Multilevel models using logistic regression (MLogRM) and random forest models
(RFM) are increasingly deployed in industry for the purpose of binary
classification. The European Commission's proposed Artificial Intelligence Act
(AIA) necessitates, under certain conditions, that application of such models
is fair, transparent, and ethical, which consequently implies technical
assessment of these models. This paper proposes and demonstrates an audit
framework for technical assessment of RFMs and MLogRMs by focussing on model-,
discrimination-, and transparency & explainability-related aspects. To measure
these aspects, 20 KPIs are proposed, which are paired with a traffic light risk
assessment method. An open-source dataset is used to train an RFM and an MLogRM
model, and these KPIs are computed and compared with the traffic lights. The
performance of popular explainability methods such as kernel- and tree-SHAP is
assessed. The framework is expected to assist regulatory bodies in performing
conformity assessments of binary classifiers and also to benefit providers and
users deploying such AI systems in complying with the AIA.
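As a rough illustration of the workflow described above, the sketch below trains a random forest binary classifier, computes one example discrimination KPI (demographic parity difference), maps it onto a traffic-light rating, and produces tree- and kernel-SHAP attributions for comparison. The synthetic dataset, protected attribute, KPI choice, and thresholds are placeholder assumptions for demonstration; they are not the paper's actual 20 KPIs or traffic-light boundaries.

```python
# Minimal sketch of an audit-style check for a binary classifier. The dataset,
# protected attribute, KPI, and thresholds below are illustrative assumptions,
# not the paper's own KPIs or traffic-light boundaries.
import numpy as np
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for an open-source dataset; feature 0 is binarised into a
# hypothetical protected attribute.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
protected = (X[:, 0] > 0).astype(int)

X_train, X_test, y_train, y_test, a_train, a_test = train_test_split(
    X, y, protected, test_size=0.3, random_state=0
)

rfm = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
y_hat = rfm.predict(X_test)

# Example discrimination KPI: demographic parity difference,
# |P(y_hat = 1 | A = 0) - P(y_hat = 1 | A = 1)|.
dpd = abs(y_hat[a_test == 0].mean() - y_hat[a_test == 1].mean())

def traffic_light(value, green=0.05, amber=0.10):
    """Map a KPI value to a traffic light using illustrative thresholds."""
    return "green" if value < green else ("amber" if value < amber else "red")

print(f"demographic parity difference = {dpd:.3f} -> {traffic_light(dpd)}")

# Explainability checks: tree-SHAP is exact and fast for tree ensembles;
# kernel-SHAP is model-agnostic but approximate and slower, so it is run on a
# small slice of the test set only.
tree_sv = shap.TreeExplainer(rfm).shap_values(X_test)

background = X_train[:50]
kernel_expl = shap.KernelExplainer(lambda d: rfm.predict_proba(d)[:, 1], background)
kernel_sv = kernel_expl.shap_values(X_test[:20], nsamples=100)
print(np.shape(tree_sv), np.shape(kernel_sv))
```

Comparing the two attribution sets, their runtimes, and their agreement with each other is one simple way to operationalise the kind of kernel- vs tree-SHAP assessment the abstract mentions.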
Related papers
- Case-based Explainability for Random Forest: Prototypes, Critics, Counter-factuals and Semi-factuals [1.0485739694839669]
Explainable Case-Based Reasoning (XCBR) stands out as a pragmatic approach that elucidates the output of a model by referencing actual examples.
Until recently, XCBR has been relatively underexplored for many algorithms, including tree-based models.
arXiv Detail & Related papers (2024-08-13T07:08:54Z)
- Benchmarks as Microscopes: A Call for Model Metrology [76.64402390208576]
Modern language models (LMs) pose a new challenge in capability assessment.
To be confident in our metrics, we need a new discipline of model metrology.
arXiv Detail & Related papers (2024-07-22T17:52:12Z)
- Top-K Pairwise Ranking: Bridging the Gap Among Ranking-Based Measures for Multi-Label Classification [120.37051160567277]
This paper proposes a novel measure named Top-K Pairwise Ranking (TKPR).
A series of analyses show that TKPR is compatible with existing ranking-based measures.
On the other hand, we establish a sharp generalization bound for the proposed framework based on a novel technique named data-dependent contraction.
arXiv Detail & Related papers (2024-07-09T09:36:37Z)
- MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making.
We present a process-based benchmark MR-Ben that demands a meta-reasoning skill.
Our meta-reasoning paradigm is especially suited for system-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z)
- How fair are we? From conceptualization to automated assessment of fairness definitions [6.741000368514124]
MODNESS is a model-driven approach for user-defined fairness concepts in software systems.
It generates the source code to implement fairness assessment based on these custom definitions.
Our findings reveal that most of the current approaches do not support user-defined fairness concepts.
arXiv Detail & Related papers (2024-04-15T16:46:17Z)
- Incorporating Domain Knowledge in Deep Neural Networks for Discrete Choice Models [0.5801044612920815]
This paper proposes a framework that expands the potential of data-driven approaches for DCM.
It includes pseudo data samples that represent required relationships and a loss function that measures their fulfillment.
A case study demonstrates the potential of this framework for discrete choice analysis.
arXiv Detail & Related papers (2023-05-30T12:53:55Z)
- Exploring validation metrics for offline model-based optimisation with diffusion models [50.404829846182764]
In model-based optimisation (MBO) we are interested in using machine learning to design candidates that maximise some measure of reward with respect to a black box function called the (ground truth) oracle.
While an approximation to the ground-truth oracle can be trained and used in its place during model validation to measure the mean reward over generated candidates, the evaluation is approximate and vulnerable to adversarial examples.
This is encapsulated under our proposed evaluation framework which is also designed to measure extrapolation.
arXiv Detail & Related papers (2022-11-19T16:57:37Z)
- Neural Causal Models for Counterfactual Identification and Estimation [62.30444687707919]
We study the evaluation of counterfactual statements through neural models.
First, we show that neural causal models (NCMs) are expressive enough for counterfactual reasoning.
Second, we develop an algorithm for simultaneously identifying and estimating counterfactual distributions.
arXiv Detail & Related papers (2022-09-30T18:29:09Z)
- A Framework for Auditing Multilevel Models using Explainability Methods [2.578242050187029]
An audit framework for technical assessment of regressions is proposed.
The focus is on three aspects: model, discrimination, and transparency & explainability.
It is demonstrated that popular explainability methods, such as SHAP and LIME, underperform in accuracy when interpreting these models.
arXiv Detail & Related papers (2022-07-04T17:53:21Z)
- Towards a multi-stakeholder value-based assessment framework for algorithmic systems [76.79703106646967]
We develop a value-based assessment framework that visualizes closeness and tensions between values.
We give guidelines on how to operationalize them, while opening up the evaluation and deliberation process to a wide range of stakeholders.
arXiv Detail & Related papers (2022-05-09T19:28:32Z)
- PermuteAttack: Counterfactual Explanation of Machine Learning Credit Scorecards [0.0]
This paper is a note on new directions and methodologies for validation and explanation of Machine Learning (ML) models employed for retail credit scoring in finance.
Our proposed framework draws motivation from the field of Artificial Intelligence (AI) security and adversarial ML.
arXiv Detail & Related papers (2020-08-24T00:05:13Z)