Toward a Perspectivist Turn in Ground Truthing for Predictive Computing
- URL: http://arxiv.org/abs/2109.04270v3
- Date: Thu, 29 Jun 2023 11:56:59 GMT
- Title: Toward a Perspectivist Turn in Ground Truthing for Predictive Computing
- Authors: Valerio Basile, Federico Cabitza, Andrea Campagner, Michael Fell
- Abstract summary: We describe and advocate for a paradigm we call data perspectivism, which moves away from traditional gold standard datasets, towards the adoption of methods that integrate the opinions and perspectives of the human subjects involved in the knowledge representation step of machine learning processes.
We present the main advantages of adopting a perspectivist stance in ML, as well as possible disadvantages, and various ways in which such a stance can be implemented in practice.
- Score: 1.3985293623849522
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Most Artificial Intelligence applications are based on supervised machine
learning (ML), which is ultimately grounded in manually annotated data. The
annotation process is often performed in terms of a majority vote, and this has
often proved problematic, as highlighted by recent studies on the
evaluation of ML models. In this article we describe and advocate for a
different paradigm, which we call data perspectivism: it moves away from
traditional gold standard datasets, towards the adoption of methods that
integrate the opinions and perspectives of the human subjects involved in the
knowledge representation step of ML processes. Drawing on previous works that
inspired our proposal, we describe its potential not only for
the more subjective tasks (e.g. those related to human language) but also for
tasks commonly understood as objective (e.g. medical decision making), and
present the main advantages of adopting a perspectivist stance in ML, as well
as possible disadvantages, and various ways in which such a stance can be
implemented in practice. Finally, we share a set of recommendations and outline
a research agenda to advance the perspectivist stance in ML.
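The contrast the abstract draws between majority-vote gold labels and a perspectivist treatment of annotations can be pictured with a minimal sketch. The toy annotations, label names, and helper functions below are hypothetical and are not taken from the paper; keeping a per-item label distribution is only one possible way to retain annotator perspectives, not the authors' prescribed method.

```python
from collections import Counter

# Hypothetical toy data: three annotators label each item (not from the paper).
annotations = {
    "item_1": ["offensive", "offensive", "not_offensive"],
    "item_2": ["not_offensive", "not_offensive", "not_offensive"],
}


def majority_vote(labels):
    """Collapse all annotations into one gold label (the traditional approach)."""
    return Counter(labels).most_common(1)[0][0]


def label_distribution(labels):
    """Keep every annotator's view as a relative-frequency (soft) label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


# Gold standard: disagreement on item_1 is discarded.
gold = {item: majority_vote(labels) for item, labels in annotations.items()}

# Perspectivist alternative (one possible form): disagreement is preserved.
soft = {item: label_distribution(labels) for item, labels in annotations.items()}

print(gold)
print(soft)
```

In this sketch the majority vote hides the fact that one annotator disagreed on `item_1`, while the soft label keeps that minority view available to downstream training and evaluation.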
Related papers
- MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs [97.94579295913606]
Multimodal Large Language Models (MLLMs) have garnered increased attention from both industry and academia.
In the development process, evaluation is critical since it provides intuitive feedback and guidance on improving models.
This work aims to offer researchers an easy grasp of how to effectively evaluate MLLMs according to different needs and to inspire better evaluation methods.
arXiv Detail & Related papers (2024-11-22T18:59:54Z)
- Prompt and Prejudice [29.35618753825668]
This paper investigates the impact of using first names in Large Language Models (LLMs) and Vision Language Models (VLMs).
We propose an approach that appends first names to ethically annotated text scenarios to reveal demographic biases in model outputs.
arXiv Detail & Related papers (2024-08-07T14:11:33Z)
- MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making.
We present a process-based benchmark MR-Ben that demands a meta-reasoning skill.
Our meta-reasoning paradigm is especially suited for system-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z)
- A Survey on Human Preference Learning for Large Language Models [81.41868485811625]
The recent surge of versatile large language models (LLMs) largely depends on aligning increasingly capable foundation models with human intentions by preference learning.
This survey covers the sources and formats of preference feedback, the modeling and usage of preference signals, as well as the evaluation of the aligned LLMs.
arXiv Detail & Related papers (2024-06-17T03:52:51Z)
- Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning [68.83624133567213]
We show that most prevalent MLLMs can be easily fooled by the introduction of a presupposition into the question.
We also propose a simple yet effective method, Active Deduction (AD), to encourage the model to actively perform composite deduction.
arXiv Detail & Related papers (2024-04-19T15:53:27Z)
- From Understanding to Utilization: A Survey on Explainability for Large Language Models [27.295767173801426]
This survey underscores the imperative for increased explainability in Large Language Models (LLMs).
Our focus is primarily on pre-trained Transformer-based LLMs, which pose distinctive interpretability challenges due to their scale and complexity.
When considering the utilization of explainability, we explore several compelling methods that concentrate on model editing, control generation, and model enhancement.
arXiv Detail & Related papers (2024-01-23T16:09:53Z)
- InfiMM-Eval: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models [50.03163753638256]
Multi-modal Large Language Models (MLLMs) are increasingly prominent in the field of artificial intelligence.
Our benchmark comprises three key reasoning categories: deductive, abductive, and analogical reasoning.
We evaluate a selection of representative MLLMs using this rigorously developed open-ended multi-step elaborate reasoning benchmark.
arXiv Detail & Related papers (2023-11-20T07:06:31Z)
- Toward Operationalizing Pipeline-aware ML Fairness: A Research Agenda for Developing Practical Guidelines and Tools [18.513353100744823]
Recent work has called on the ML community to take a more holistic approach to tackle fairness issues.
We first demonstrate that without clear guidelines and toolkits, even individuals with specialized ML knowledge find it challenging to hypothesize how various design choices influence model behavior.
We then consult the fair-ML literature to understand the progress to date toward operationalizing the pipeline-aware approach.
arXiv Detail & Related papers (2023-09-29T15:48:26Z)
- Towards Fair and Explainable AI using a Human-Centered AI Approach [5.888646114353372]
We present 5 research projects that aim to enhance explainability and fairness in classification systems and word embeddings.
The first project explores the utility/downsides of introducing local model explanations as interfaces for machine teachers.
The second project presents D-BIAS, a causality-based human-in-the-loop visual tool for identifying and mitigating social biases in datasets.
The third project presents WordBias, a visual interactive tool that helps audit pre-trained static word embeddings for biases against groups.
The fourth project presents DramatVis Personae, a visual analytics tool that helps identify social
arXiv Detail & Related papers (2023-06-12T21:08:55Z)
- Forecast Evaluation for Data Scientists: Common Pitfalls and Best Practices [4.2951168699706646]
We provide a tutorial-like compilation of the details of one of the most important steps in the overall forecasting process, namely the evaluation.
We elaborate on the different problematic characteristics of time series such as non-normalities and non-stationarities.
Best practices in forecast evaluation are outlined with respect to the different steps such as data partitioning, error calculation, statistical testing, and others.
arXiv Detail & Related papers (2022-03-21T03:24:46Z)
- The Benchmark Lottery [114.43978017484893]
"A benchmark lottery" describes the overall fragility of the machine learning benchmarking process.
We show that the relative performance of algorithms may be altered significantly simply by choosing different benchmark tasks.
arXiv Detail & Related papers (2021-07-14T21:08:30Z)