Can Interpretability Layouts Influence Human Perception of Offensive Sentences?
- URL: http://arxiv.org/abs/2403.05581v1
- Date: Fri, 1 Mar 2024 13:25:54 GMT
- Title: Can Interpretability Layouts Influence Human Perception of Offensive Sentences?
- Authors: Thiago Freitas dos Santos, Nardine Osman, Marco Schorlemmer
- Abstract summary: This paper conducts a user study to assess whether three machine learning (ML) interpretability layouts can influence participants' views when evaluating sentences containing hate speech.
- Score: 1.474723404975345
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper conducts a user study to assess whether three machine learning (ML) interpretability layouts can influence participants' views when evaluating sentences containing hate speech, focusing on the "Misogyny" and "Racism" classes. Given the existence of divergent conclusions in the literature, we provide empirical evidence on using ML interpretability in online communities through statistical and qualitative analyses of questionnaire responses. The Generalized Additive Model estimates participants' ratings, incorporating within-subject and between-subject designs. While our statistical analysis indicates that none of the interpretability layouts significantly influences participants' views, our qualitative analysis demonstrates the advantages of ML interpretability: 1) triggering participants to provide corrective feedback in case of discrepancies between their views and the model, and 2) providing insights to evaluate a model's behavior beyond traditional performance metrics.
Related papers
- Categorical Syllogisms Revisited: A Review of the Logical Reasoning Abilities of LLMs for Analyzing Categorical Syllogism [62.571419297164645]
This paper provides a systematic overview of prior works on the logical reasoning ability of large language models for analyzing categorical syllogisms.
We first investigate all the possible variations for the categorical syllogisms from a purely logical perspective.
We then examine the underlying configurations (i.e., mood and figure) tested by the existing datasets.
arXiv Detail & Related papers (2024-06-26T21:17:20Z) - FairBelief - Assessing Harmful Beliefs in Language Models [25.032952666134157]
Language Models (LMs) have been shown to inherit undesired biases that might hurt minorities and underrepresented groups if such systems were integrated into real-world applications without careful fairness auditing.
This paper proposes FairBelief, an analytical approach to capture and assess beliefs, i.e., propositions that an LM may embed with different degrees of confidence and that covertly influence its predictions.
arXiv Detail & Related papers (2024-02-27T10:31:00Z) - Decoding Susceptibility: Modeling Misbelief to Misinformation Through a Computational Approach [63.67533153887132]
Susceptibility to misinformation describes the degree of belief in unverifiable claims, which is not directly observable.
Existing susceptibility studies heavily rely on self-reported beliefs.
We propose a computational approach to model users' latent susceptibility levels.
arXiv Detail & Related papers (2023-11-16T07:22:56Z) - Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs).
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive taxonomies: two for bias evaluation and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z) - Towards Fair and Explainable AI using a Human-Centered AI Approach [5.888646114353372]
We present 5 research projects that aim to enhance explainability and fairness in classification systems and word embeddings.
The first project explores the utility/downsides of introducing local model explanations as interfaces for machine teachers.
The second project presents D-BIAS, a causality-based human-in-the-loop visual tool for identifying and mitigating social biases in datasets.
The third project presents WordBias, a visual interactive tool that helps audit pre-trained static word embeddings for biases against groups.
The fourth project presents DramatVis Personae, a visual analytics tool that helps identify social
arXiv Detail & Related papers (2023-06-12T21:08:55Z) - Gender Biases in Automatic Evaluation Metrics for Image Captioning [87.15170977240643]
We conduct a systematic study of gender biases in model-based evaluation metrics for image captioning tasks.
We demonstrate the negative consequences of using these biased metrics, including the inability to differentiate between biased and unbiased generations.
We present a simple and effective way to mitigate the metric bias without hurting the correlations with human judgments.
arXiv Detail & Related papers (2023-05-24T04:27:40Z) - Perspectives on Large Language Models for Relevance Judgment [56.935731584323996]
It has been claimed that large language models (LLMs) can assist with relevance judgments.
However, it is not clear whether such automated judgments can reliably be used in evaluations of retrieval systems.
arXiv Detail & Related papers (2023-04-13T13:08:38Z) - Evaluation Gaps in Machine Learning Practice [13.963766987258161]
In practice, evaluations of machine learning models frequently focus on a narrow range of decontextualized predictive behaviours.
We examine the evaluation gaps between the idealized breadth of evaluation concerns and the observed narrow focus of actual evaluations.
By studying these properties, we demonstrate the machine learning discipline's implicit assumption of a range of commitments which have normative impacts.
arXiv Detail & Related papers (2022-05-11T04:00:44Z) - Under the Microscope: Interpreting Readability Assessment Models for Filipino [0.0]
We dissect machine learning-based readability assessment models in Filipino by performing global and local model interpretation.
Results show that using a model trained with top features from global interpretation obtained higher performance than the ones using features selected by Spearman correlation.
arXiv Detail & Related papers (2021-10-01T01:27:10Z) - On the Faithfulness Measurements for Model Interpretations [100.2730234575114]
Post-hoc interpretations aim to uncover how natural language processing (NLP) models make predictions.
To assess the faithfulness of such interpretations, we start with three criteria: the removal-based criterion, the sensitivity of interpretations, and the stability of interpretations.
Motivated by the desideratum of these faithfulness notions, we introduce a new class of interpretation methods that adopt techniques from the adversarial domain.
arXiv Detail & Related papers (2021-04-18T09:19:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.