A Framework for Auditing Multilevel Models using Explainability Methods
- URL: http://arxiv.org/abs/2207.01611v2
- Date: Fri, 15 Jul 2022 09:38:23 GMT
- Title: A Framework for Auditing Multilevel Models using Explainability Methods
- Authors: Debarati Bhaumik, Diptish Dey, Subhradeep Kayal
- Abstract summary: An audit framework for technical assessment of regressions is proposed.
The focus is on three aspects, model, discrimination, and transparency and explainability.
It is demonstrated that popular explainability methods, such as SHAP and LIME, underperform in accuracy when interpreting these models.
- Score: 2.578242050187029
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Applications of multilevel models usually result in binary classification
within groups or hierarchies based on a set of input features. For transparent
and ethical applications of such models, sound audit frameworks need to be
developed. In this paper, an audit framework for technical assessment of
regression MLMs is proposed. The focus is on three aspects, model,
discrimination, and transparency and explainability. These aspects are
subsequently divided into sub aspects. Contributors, such as inter MLM group
fairness, feature contribution order, and aggregated feature contribution, are
identified for each of these sub aspects. To measure the performance of the
contributors, the framework proposes a shortlist of KPIs. A traffic light risk
assessment method is furthermore coupled to these KPIs. For assessing
transparency and explainability, different explainability methods (SHAP and
LIME) are used, which are compared with a model intrinsic method using
quantitative methods and machine learning modelling. Using an open source
dataset, a model is trained and tested and the KPIs are computed. It is
demonstrated that popular explainability methods, such as SHAP and LIME,
underperform in accuracy when interpreting these models. They fail to predict
the order of feature importance, the magnitudes, and occasionally even the
nature of the feature contribution. For other contributors, such as group
fairness and their associated KPIs, similar analysis and calculations have been
performed with the aim of adding profundity to the proposed audit framework.
The framework is expected to assist regulatory bodies in performing conformity
assessments of AI systems using multilevel binomial classification models at
businesses. It will also benefit businesses deploying MLMs to be future proof
and aligned with the European Commission proposed Regulation on Artificial
Intelligence.
Related papers
- MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs [97.94579295913606]
Multimodal Large Language Models (MLLMs) have garnered increased attention from both industry and academia.
In the development process, evaluation is critical since it provides intuitive feedback and guidance on improving models.
This work aims to offer researchers an easy grasp of how to effectively evaluate MLLMs according to different needs and to inspire better evaluation methods.
arXiv Detail & Related papers (2024-11-22T18:59:54Z) - Benchmarks as Microscopes: A Call for Model Metrology [76.64402390208576]
Modern language models (LMs) pose a new challenge in capability assessment.
To be confident in our metrics, we need a new discipline of model metrology.
arXiv Detail & Related papers (2024-07-22T17:52:12Z) - Top-K Pairwise Ranking: Bridging the Gap Among Ranking-Based Measures for Multi-Label Classification [120.37051160567277]
This paper proposes a novel measure named Top-K Pairwise Ranking (TKPR)
A series of analyses show that TKPR is compatible with existing ranking-based measures.
On the other hand, we establish a sharp generalization bound for the proposed framework based on a novel technique named data-dependent contraction.
arXiv Detail & Related papers (2024-07-09T09:36:37Z) - Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
arXiv Detail & Related papers (2024-06-05T16:35:30Z) - Faithful Explanations of Black-box NLP Models Using LLM-generated
Counterfactuals [67.64770842323966]
Causal explanations of predictions of NLP systems are essential to ensure safety and establish trust.
Existing methods often fall short of explaining model predictions effectively or efficiently.
We propose two approaches for counterfactual (CF) approximation.
arXiv Detail & Related papers (2023-10-01T07:31:04Z) - Incorporating Domain Knowledge in Deep Neural Networks for Discrete
Choice Models [0.5801044612920815]
This paper proposes a framework that expands the potential of data-driven approaches for DCM.
It includes pseudo data samples that represent required relationships and a loss function that measures their fulfillment.
A case study demonstrates the potential of this framework for discrete choice analysis.
arXiv Detail & Related papers (2023-05-30T12:53:55Z) - An Audit Framework for Technical Assessment of Binary Classifiers [0.0]
Multilevel models using logistic regression (MLogRM) and random forest models (RFM) are increasingly deployed in industry for the purpose of binary classification.
The European Commission's proposed Artificial Intelligence Act (AIA) necessitates, under certain conditions, that application of such models is fair, transparent, and ethical.
This paper proposes and demonstrates an audit framework for technical assessment of RFMs and MLogRMs by focussing on model-, discrimination, and transparency & explainability-related aspects.
arXiv Detail & Related papers (2022-11-17T12:48:11Z) - Evaluation Gaps in Machine Learning Practice [13.963766987258161]
In practice, evaluations of machine learning models frequently focus on a narrow range of decontextualized predictive behaviours.
We examine the evaluation gaps between the idealized breadth of evaluation concerns and the observed narrow focus of actual evaluations.
By studying these properties, we demonstrate the machine learning discipline's implicit assumption of a range of commitments which have normative impacts.
arXiv Detail & Related papers (2022-05-11T04:00:44Z) - Towards a multi-stakeholder value-based assessment framework for
algorithmic systems [76.79703106646967]
We develop a value-based assessment framework that visualizes closeness and tensions between values.
We give guidelines on how to operationalize them, while opening up the evaluation and deliberation process to a wide range of stakeholders.
arXiv Detail & Related papers (2022-05-09T19:28:32Z) - PermuteAttack: Counterfactual Explanation of Machine Learning Credit
Scorecards [0.0]
This paper is a note on new directions and methodologies for validation and explanation of Machine Learning (ML) models employed for retail credit scoring in finance.
Our proposed framework draws motivation from the field of Artificial Intelligence (AI) security and adversarial ML.
arXiv Detail & Related papers (2020-08-24T00:05:13Z) - Fairness by Explicability and Adversarial SHAP Learning [0.0]
We propose a new definition of fairness that emphasises the role of an external auditor and model explicability.
We develop a framework for mitigating model bias using regularizations constructed from the SHAP values of an adversarial surrogate model.
We demonstrate our approaches using gradient and adaptive boosting on: a synthetic dataset, the UCI Adult (Census) dataset and a real-world credit scoring dataset.
arXiv Detail & Related papers (2020-03-11T14:36:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.