Towards a multi-stakeholder value-based assessment framework for
algorithmic systems
- URL: http://arxiv.org/abs/2205.04525v1
- Date: Mon, 9 May 2022 19:28:32 GMT
- Title: Towards a multi-stakeholder value-based assessment framework for
algorithmic systems
- Authors: Mireia Yurrita, Dave Murray-Rust, Agathe Balayn, Alessandro Bozzon
- Abstract summary: We develop a value-based assessment framework that visualizes closeness and tensions between values.
We give guidelines on how to operationalize them, while opening up the evaluation and deliberation process to a wide range of stakeholders.
- Score: 76.79703106646967
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In an effort to regulate Machine Learning-driven (ML) systems, current
auditing processes mostly focus on detecting harmful algorithmic biases. While
these strategies have proven to be impactful, some values outlined in documents
dealing with ethics in ML-driven systems are still underrepresented in auditing
processes. Such unaddressed values mainly deal with contextual factors that
cannot be easily quantified. In this paper, we develop a value-based assessment
framework that is not limited to bias auditing and that covers prominent
ethical principles for algorithmic systems. Our framework presents a circular
arrangement of values with two bipolar dimensions that make common motivations
and potential tensions explicit. In order to operationalize these high-level
principles, values are then broken down into specific criteria and their
manifestations. However, some of these value-specific criteria are mutually
exclusive and require negotiation. As opposed to some other auditing frameworks
that merely rely on ML researchers' and practitioners' input, we argue that it
is necessary to include stakeholders that present diverse standpoints to
systematically negotiate and consolidate value and criteria tensions. To that
end, we map stakeholders with different insight needs, and assign tailored
means for communicating value manifestations to them. We, therefore, contribute
to current ML auditing practices with an assessment framework that visualizes
closeness and tensions between values and we give guidelines on how to
operationalize them, while opening up the evaluation and deliberation process
to a wide range of stakeholders.
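As a rough illustration of the structure the abstract describes, here is a minimal, hypothetical Python sketch of a circular value arrangement over two bipolar dimensions, with values broken down into criteria and stakeholders mapped to tailored communication means. Every name and field below is an illustrative assumption, not the authors' implementation.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the abstract's framework: values sit on a circle
# defined by two bipolar dimensions, are broken down into criteria with
# concrete manifestations, and stakeholders are mapped to tailored means
# of communication. All names and fields are illustrative assumptions.

@dataclass
class Criterion:
    name: str
    manifestation: str            # how the criterion shows up in the system

@dataclass
class Value:
    name: str
    dim1: float                   # position on the first bipolar dimension, in [-1, 1]
    dim2: float                   # position on the second bipolar dimension, in [-1, 1]
    criteria: list = field(default_factory=list)

@dataclass
class Stakeholder:
    role: str
    insight_needs: str
    communication_means: str      # tailored means for conveying value manifestations

def tension(a: Value, b: Value) -> float:
    """Heuristic proxy for the circular arrangement: values placed on
    opposite poles are more likely to be in tension."""
    return ((a.dim1 - b.dim1) ** 2 + (a.dim2 - b.dim2) ** 2) ** 0.5

fairness = Value("fairness", dim1=-0.8, dim2=0.3,
                 criteria=[Criterion("group parity", "comparable error rates across groups")])
efficiency = Value("efficiency", dim1=0.9, dim2=-0.2,
                   criteria=[Criterion("throughput", "decisions processed per hour")])
auditor = Stakeholder("external auditor", "evidence of bias", "quantitative report")

print(f"tension(fairness, efficiency) = {tension(fairness, efficiency):.2f}")
```

Under these assumptions, a high tension score would flag a value pair as a candidate for the stakeholder negotiation step the abstract calls for.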
Related papers
- A Conceptual Framework for Ethical Evaluation of Machine Learning Systems [12.887834116390358]
Ethical implications arise when designing evaluations of machine learning systems.
We present a utility framework, characterizing the key trade-off in ethical evaluation as balancing information gain against potential ethical harms.
Our analysis underscores the critical need for development teams to deliberately assess and manage ethical complexities.
arXiv Detail & Related papers (2024-08-05T01:06:49Z)
- Top-K Pairwise Ranking: Bridging the Gap Among Ranking-Based Measures for Multi-Label Classification [120.37051160567277]
This paper proposes a novel measure named Top-K Pairwise Ranking (TKPR).
A series of analyses shows that TKPR is compatible with existing ranking-based measures.
In addition, we establish a sharp generalization bound for the proposed framework based on a novel technique named data-dependent contraction.
arXiv Detail & Related papers (2024-07-09T09:36:37Z)
- MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making.
We present MR-Ben, a process-based benchmark that demands meta-reasoning skill.
Our meta-reasoning paradigm is especially suited for system-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z)
- Pragmatic auditing: a pilot-driven approach for auditing Machine Learning systems [5.26895401335509]
We present an audit procedure that extends the AI-HLEG guidelines published by the European Commission.
Our audit procedure is based on an ML lifecycle model that explicitly focuses on documentation, accountability, and quality assurance.
We describe two pilots conducted on real-world use cases from two different organisations.
arXiv Detail & Related papers (2024-05-21T20:40:37Z)
- Heterogeneous Value Alignment Evaluation for Large Language Models [91.96728871418]
The rise of Large Language Models (LLMs) has made it crucial to align their values with those of humans.
We propose a Heterogeneous Value Alignment Evaluation (HVAE) system to assess the success of aligning LLMs with heterogeneous values.
arXiv Detail & Related papers (2023-05-26T02:34:20Z)
- Perspectives on Large Language Models for Relevance Judgment [56.935731584323996]
It has been claimed that large language models (LLMs) can assist with relevance judgments.
It is not clear whether automated judgments can reliably be used in evaluations of retrieval systems.
arXiv Detail & Related papers (2023-04-13T13:08:38Z)
- Fairness in Contextual Resource Allocation Systems: Metrics and Incompatibility Results [7.705334602362225]
We study systems that allocate scarce resources to satisfy basic needs, such as homeless services that provide housing.
These systems often support communities disproportionately affected by systemic racial, gender, or other injustices.
We propose a framework for evaluating fairness in contextual resource allocation systems inspired by fairness metrics in machine learning.
arXiv Detail & Related papers (2022-12-04T02:30:58Z)
- A Framework for Auditing Multilevel Models using Explainability Methods [2.578242050187029]
An audit framework for the technical assessment of regression models is proposed.
The focus is on three aspects: the model, discrimination, and transparency and explainability.
It is demonstrated that popular explainability methods, such as SHAP and LIME, underperform in accuracy when interpreting these models.
arXiv Detail & Related papers (2022-07-04T17:53:21Z)
- Evaluation Gaps in Machine Learning Practice [13.963766987258161]
In practice, evaluations of machine learning models frequently focus on a narrow range of decontextualized predictive behaviours.
We examine the evaluation gaps between the idealized breadth of evaluation concerns and the observed narrow focus of actual evaluations.
By studying the properties of these gaps, we demonstrate the machine learning discipline's implicit assumption of a range of commitments which have normative impacts.
arXiv Detail & Related papers (2022-05-11T04:00:44Z)
- Towards Quantifiable Dialogue Coherence Evaluation [126.55560816209756]
Quantifiable Dialogue Coherence Evaluation (QuantiDCE) is a novel framework aiming to train a quantifiable dialogue coherence metric.
QuantiDCE includes two training stages, Multi-Level Ranking (MLR) pre-training and Knowledge Distillation (KD) fine-tuning.
Experimental results show that the model trained with QuantiDCE correlates more strongly with human judgements than other state-of-the-art metrics.
arXiv Detail & Related papers (2021-06-01T14:11:17Z)
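As a loose illustration of the QuantiDCE entry above, the following sketch shows one plausible margin-based formulation of a multi-level ranking (MLR) objective. The data layout (scores grouped by annotated coherence level), the margin value, and the hinge formulation are all assumptions; the paper's actual MLR loss may differ.

```python
# Plausible margin-based multi-level ranking (MLR) objective, loosely
# inspired by the QuantiDCE entry above; the grouping of scores by
# annotated coherence level and the hinge formulation are assumptions.

def mlr_loss(scores_by_level, margin=0.1):
    """scores_by_level[i] holds metric scores for responses annotated at
    coherence level i (a higher index means more coherent). Each pair in
    which the lower-level response is not outscored by at least `margin`
    contributes a hinge penalty."""
    total, pairs = 0.0, 0
    for lo in range(len(scores_by_level)):
        for hi in range(lo + 1, len(scores_by_level)):
            for s_lo in scores_by_level[lo]:
                for s_hi in scores_by_level[hi]:
                    total += max(0.0, margin + s_lo - s_hi)
                    pairs += 1
    return total / max(pairs, 1)

# Example: three coherence levels, mostly well ordered, so the loss is small.
print(mlr_loss([[0.1, 0.2], [0.4], [0.7, 0.9]]))
```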
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.