Towards a multi-stakeholder value-based assessment framework for
algorithmic systems
- URL: http://arxiv.org/abs/2205.04525v1
- Date: Mon, 9 May 2022 19:28:32 GMT
- Title: Towards a multi-stakeholder value-based assessment framework for
algorithmic systems
- Authors: Mireia Yurrita, Dave Murray-Rust, Agathe Balayn, Alessandro Bozzon
- Abstract summary: We develop a value-based assessment framework that visualizes closeness and tensions between values.
We give guidelines on how to operationalize them, while opening up the evaluation and deliberation process to a wide range of stakeholders.
- Score: 76.79703106646967
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In an effort to regulate Machine Learning (ML)-driven systems, current
auditing processes mostly focus on detecting harmful algorithmic biases. While
these strategies have proven to be impactful, some values outlined in documents
dealing with ethics in ML-driven systems are still underrepresented in auditing
processes. Such unaddressed values mainly deal with contextual factors that
cannot be easily quantified. In this paper, we develop a value-based assessment
framework that is not limited to bias auditing and that covers prominent
ethical principles for algorithmic systems. Our framework presents a circular
arrangement of values with two bipolar dimensions that make common motivations
and potential tensions explicit. In order to operationalize these high-level
principles, values are then broken down into specific criteria and their
manifestations. However, some of these value-specific criteria are mutually
exclusive and require negotiation. As opposed to some other auditing frameworks
that merely rely on ML researchers' and practitioners' input, we argue that it
is necessary to include stakeholders that present diverse standpoints to
systematically negotiate and consolidate value and criteria tensions. To that
end, we map stakeholders with different insight needs, and assign tailored
means for communicating value manifestations to them. We therefore contribute
to current ML auditing practices with an assessment framework that visualizes
closeness and tensions between values and we give guidelines on how to
operationalize them, while opening up the evaluation and deliberation process
to a wide range of stakeholders.
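The "circular arrangement of values with two bipolar dimensions" described in the abstract can be loosely illustrated with a minimal sketch. The value names, their angular placements, and the axis interpretations below are illustrative assumptions, not the paper's actual layout: values placed close together on the circle share motivations, while values near opposite poles are in tension.

```python
import math

# Hypothetical values placed on a circle defined by two bipolar
# dimensions (e.g. individual vs. societal concerns on the x-axis,
# technical vs. contextual concerns on the y-axis). The angles are
# invented for illustration only.
VALUE_ANGLES = {
    "transparency": 0.0,
    "accountability": 45.0,
    "fairness": 90.0,
    "privacy": 180.0,
    "autonomy": 225.0,
    "efficiency": 270.0,
}

def position(value):
    """(x, y) coordinates of a value on the unit circle."""
    a = math.radians(VALUE_ANGLES[value])
    return (math.cos(a), math.sin(a))

def closeness(v1, v2):
    """Cosine of the angular distance between two values:
    +1 = shared motivation, 0 = orthogonal, -1 = maximal tension."""
    a1 = math.radians(VALUE_ANGLES[v1])
    a2 = math.radians(VALUE_ANGLES[v2])
    return math.cos(a1 - a2)

def tensions(threshold=-0.5):
    """Pairs of values whose closeness falls below the threshold,
    i.e. candidates for explicit stakeholder negotiation."""
    names = list(VALUE_ANGLES)
    return [(a, b) for i, a in enumerate(names)
            for b in names[i + 1:] if closeness(a, b) < threshold]
```

Under this toy geometry, diametrically opposed values such as `transparency` and `privacy` surface as tensions to be negotiated, which mirrors the framework's goal of making such conflicts explicit rather than hiding them behind a single audit score.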
Related papers
- Top-K Pairwise Ranking: Bridging the Gap Among Ranking-Based Measures for Multi-Label Classification [120.37051160567277]
This paper proposes a novel measure named Top-K Pairwise Ranking (TKPR).
A series of analyses show that TKPR is compatible with existing ranking-based measures.
On the other hand, we establish a sharp generalization bound for the proposed framework based on a novel technique named data-dependent contraction.
arXiv Detail & Related papers (2024-07-09T09:36:37Z) - Pragmatic auditing: a pilot-driven approach for auditing Machine Learning systems [5.26895401335509]
We present a respective procedure that extends the AI-HLEG guidelines published by the European Commission.
Our audit procedure is based on an ML lifecycle model that explicitly focuses on documentation, accountability, and quality assurance.
We describe two pilots conducted on real-world use cases from two different organisations.
arXiv Detail & Related papers (2024-05-21T20:40:37Z) - Heterogeneous Value Alignment Evaluation for Large Language Models [91.96728871418]
The emergence of Large Language Models (LLMs) has made it crucial to align their values with those of humans.
We propose a Heterogeneous Value Alignment Evaluation (HVAE) system to assess the success of aligning LLMs with heterogeneous values.
arXiv Detail & Related papers (2023-05-26T02:34:20Z) - Perspectives on Large Language Models for Relevance Judgment [56.935731584323996]
It has been claimed that large language models (LLMs) can assist with relevance judgments.
It is not clear whether automated judgments can reliably be used in evaluations of retrieval systems.
arXiv Detail & Related papers (2023-04-13T13:08:38Z) - Fairness in Contextual Resource Allocation Systems: Metrics and
Incompatibility Results [7.705334602362225]
We study systems that allocate scarce resources to satisfy basic needs, such as homeless services that provide housing.
These systems often support communities disproportionately affected by systemic racial, gender, or other injustices.
We propose a framework for evaluating fairness in contextual resource allocation systems inspired by fairness metrics in machine learning.
arXiv Detail & Related papers (2022-12-04T02:30:58Z) - How to Evaluate Explainability? -- A Case for Three Criteria [0.0]
We will provide a multidisciplinary motivation for three quality criteria concerning the information that systems should provide.
Our aim is to fuel the discussion regarding these criteria so that adequate evaluation methods for them can be conceived.
arXiv Detail & Related papers (2022-09-01T11:22:50Z) - A Framework for Auditing Multilevel Models using Explainability Methods [2.578242050187029]
An audit framework for the technical assessment of regression models is proposed.
The focus is on three aspects: the model, discrimination, and transparency/explainability.
It is demonstrated that popular explainability methods, such as SHAP and LIME, underperform in accuracy when interpreting these models.
arXiv Detail & Related papers (2022-07-04T17:53:21Z) - Evaluation Gaps in Machine Learning Practice [13.963766987258161]
In practice, evaluations of machine learning models frequently focus on a narrow range of decontextualized predictive behaviours.
We examine the evaluation gaps between the idealized breadth of evaluation concerns and the observed narrow focus of actual evaluations.
By studying these properties, we demonstrate the machine learning discipline's implicit assumption of a range of commitments which have normative impacts.
arXiv Detail & Related papers (2022-05-11T04:00:44Z) - Towards Quantifiable Dialogue Coherence Evaluation [126.55560816209756]
Quantifiable Dialogue Coherence Evaluation (QuantiDCE) is a novel framework aiming to train a quantifiable dialogue coherence metric.
QuantiDCE includes two training stages, Multi-Level Ranking (MLR) pre-training and Knowledge Distillation (KD) fine-tuning.
Experimental results show that the model trained by QuantiDCE presents stronger correlations with human judgements than the other state-of-the-art metrics.
arXiv Detail & Related papers (2021-06-01T14:11:17Z) - Uncertainty-aware Score Distribution Learning for Action Quality
Assessment [91.05846506274881]
We propose an uncertainty-aware score distribution learning (USDL) approach for action quality assessment (AQA)
Specifically, we regard an action as an instance associated with a score distribution, which describes the probability of different evaluated scores.
When fine-grained score labels are available, we devise a multi-path uncertainty-aware score distribution learning (MUSDL) method to explore the disentangled components of a score.
arXiv Detail & Related papers (2020-06-13T15:41:29Z)
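The score-distribution idea in the last entry can be illustrated with a minimal sketch. The score bins and probabilities below are invented for illustration; a real USDL-style model would predict the distribution from input features. The point is that the model outputs a probability over discrete scores rather than a single regressed value, and the quality estimate is the distribution's expectation.

```python
# Minimal sketch of quality assessment from a predicted score
# distribution. Bins and probabilities are illustrative assumptions.
SCORE_BINS = [0, 1, 2, 3, 4, 5]  # hypothetical judge scores

def expected_score(probs, bins=SCORE_BINS):
    """Expectation of the predicted score distribution."""
    assert abs(sum(probs) - 1.0) < 1e-6, "probabilities must sum to 1"
    return sum(p * b for p, b in zip(probs, bins))

# A confident prediction vs. a more uncertain one with the same mode:
confident = [0.0, 0.0, 0.05, 0.9, 0.05, 0.0]
uncertain = [0.05, 0.1, 0.2, 0.3, 0.2, 0.15]
```

The spread of the distribution (not just its expectation) carries the uncertainty information that a single-score regression discards.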
This list is automatically generated from the titles and abstracts of the papers in this site.