Related papers: Bi-Attention HateXplain : Taking into account the sequential aspect of data during explainability in a multi-task context

URL: http://arxiv.org/abs/2601.13018v1
Date: Mon, 19 Jan 2026 12:52:18 GMT
Title: Bi-Attention HateXplain : Taking into account the sequential aspect of data during explainability in a multi-task context
Authors: Ghislain Dorian Tchuente Mondjo
Abstract summary: We propose a BiAtt-BiRNN-HateXplain (Bidirectional Attention BiRNN HateXplain) model which is easier to explain compared to LLMs. The model could classify better and commit fewer unintentional bias errors related to communities.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Technological advances in the Internet and online social networks have brought many benefits to humanity. At the same time, this growth has led to an increase in hate speech, the main global threat. To improve the reliability of black-box models used for hate speech detection, post-hoc approaches such as LIME, SHAP, and LRP provide the explanation after training the classification model. In contrast, multi-task approaches based on the HateXplain benchmark learn to explain and classify simultaneously. However, results from HateXplain-based algorithms show that predicted attention varies considerably when it should be constant. This attention variability can lead to inconsistent interpretations, instability of predictions, and learning difficulties. To solve this problem, we propose the BiAtt-BiRNN-HateXplain (Bidirectional Attention BiRNN HateXplain) model which is easier to explain compared to LLMs which are more complex in view of the need for transparency, and will take into account the sequential aspect of the input data during explainability thanks to a BiRNN layer. Thus, if the explanation is correctly estimated, thanks to multi-task learning (explainability and classification task), the model could classify better and commit fewer unintentional bias errors related to communities. The experimental results on HateXplain data show a clear improvement in detection performance, explainability and a reduction in unintentional bias.

Related papers

Unveiling Reasoning Thresholds in Language Models: Scaling, Fine-Tuning, and Interpretability through Attention Maps [3.8936716676293917]
This study investigates the in-context learning capabilities of various decoder-only transformer-based language models with different model sizes and training data. We identify a critical parameter threshold (1.6 billion), beyond which reasoning performance improves significantly in tasks such as commonsense reasoning in multiple-choice question answering and deductive reasoning.
arXiv Detail & Related papers (2025-02-21T00:48:32Z)
CLOSER: Towards Better Representation Learning for Few-Shot Class-Incremental Learning [52.63674911541416]
Few-shot class-incremental learning (FSCIL) faces several challenges, such as overfitting and forgetting. Our primary focus is representation learning on base classes to tackle the unique challenge of FSCIL. We find that trying to secure the spread of features within a more confined feature space enables the learned representation to strike a better balance between transferability and discriminability.
arXiv Detail & Related papers (2024-10-08T02:23:16Z)
LLMExplainer: Large Language Model based Bayesian Inference for Graph Explanation Generation [20.234100409015507]
Recent studies seek to provide Graph Neural Network (GNN) interpretability via multiple unsupervised learning models. Due to the scarcity of datasets, current methods easily suffer from learning bias. We embed a Large Language Model (LLM) as knowledge into the GNN explanation network to avoid the learning bias problem.
arXiv Detail & Related papers (2024-07-22T03:36:38Z)
TVE: Learning Meta-attribution for Transferable Vision Explainer [76.68234965262761]
We introduce a Transferable Vision Explainer (TVE) that can effectively explain various vision models in downstream tasks. TVE is realized through a pre-training process on large-scale datasets towards learning the meta-attribution. This meta-attribution leverages the versatility of generic backbone encoders to comprehensively encode the attribution knowledge for the input instance, which enables TVE to seamlessly transfer to explain various downstream tasks.
arXiv Detail & Related papers (2023-12-23T21:49:23Z)
Regressor-Segmenter Mutual Prompt Learning for Crowd Counting [70.49246560246736]
We propose mutual prompt learning (mPrompt) to solve bias and inaccuracy caused by annotation variance. Experiments show that mPrompt significantly reduces the Mean Average Error (MAE).
arXiv Detail & Related papers (2023-12-04T07:53:59Z)
Understanding and Mitigating Classification Errors Through Interpretable Token Patterns [58.91023283103762]
Characterizing errors in easily interpretable terms gives insight into whether a classifier is prone to making systematic errors. We propose to discover those patterns of tokens that distinguish correct and erroneous predictions. We show that our method, Premise, performs well in practice.
arXiv Detail & Related papers (2023-11-18T00:24:26Z)
Ignorance is Bliss: Robust Control via Information Gating [60.17644038829572]
Informational parsimony provides a useful inductive bias for learning representations that achieve better generalization by being robust to noise and spurious correlations. We propose information gating as a way to learn parsimonious representations that identify the minimal information required for a task.
arXiv Detail & Related papers (2023-03-10T18:31:50Z)
Parallel Sentence-Level Explanation Generation for Real-World Low-Resource Scenarios [18.5713713816771]
This paper is the first to explore the problem smoothly from weak-supervised learning to unsupervised learning. We propose a non-autoregressive interpretable model to facilitate parallel explanation generation and simultaneous prediction.
arXiv Detail & Related papers (2023-02-21T14:52:21Z)
Exploring Hate Speech Detection with HateXplain and BERT [2.673732496490253]
Hate Speech takes many forms to target communities with derogatory comments, and takes humanity a step back in societal progress. HateXplain is a recently published and first dataset to use annotated spans in the form of rationales, along with speech classification categories and targeted communities. We tune BERT to perform this task in the form of rationales and class prediction, and compare our performance on different metrics spanning across accuracy, explainability and bias.
arXiv Detail & Related papers (2022-08-09T01:32:44Z)
ToKen: Task Decomposition and Knowledge Infusion for Few-Shot Hate Speech Detection [85.68684067031909]
We frame this problem as a few-shot learning task, and show significant gains with decomposing the task into its "constituent" parts. In addition, we see that infusing knowledge from reasoning datasets (e.g. Atomic 2020) improves the performance even further.
arXiv Detail & Related papers (2022-05-25T05:10:08Z)
Leveraging Multi-domain, Heterogeneous Data using Deep Multitask Learning for Hate Speech Detection [21.410160004193916]
We propose Convolutional Neural Network based multi-task learning models (MTLs) to leverage information from multiple sources. Empirical analysis performed on three benchmark datasets shows the efficacy of the proposed approach.
arXiv Detail & Related papers (2021-03-23T09:31:01Z)
DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction [96.90215318875859]
We show that bootstrapping-based Q-learning algorithms do not necessarily benefit from corrective feedback. We propose a new algorithm, DisCor, which computes an approximation to this optimal distribution and uses it to re-weight the transitions used for training.
arXiv Detail & Related papers (2020-03-16T16:18:52Z)

This list is automatically generated from the titles and abstracts of the papers on this site.