Related papers: xList-Hate: A Checklist-Based Framework for Interpretable and Generalizable Hate Speech Detection

URL: http://arxiv.org/abs/2602.05874v1
Date: Thu, 05 Feb 2026 16:51:56 GMT
Title: xList-Hate: A Checklist-Based Framework for Interpretable and Generalizable Hate Speech Detection
Authors: Adrián Girón, Pablo Miralles, Javier Huertas-Tato, Sergio D'Antonio, David Camacho
Abstract summary: We introduce xList-Hate, a diagnostic framework that decomposes hate speech detection into a checklist of explicit, concept-level questions. The diagnostic signals are aggregated by a lightweight, fully interpretable decision tree, yielding transparent and auditable predictions. Our results suggest that reframing hate speech detection as a diagnostic reasoning task, rather than a monolithic classification problem, provides a robust, explainable, and extensible alternative for content moderation.
Score: 2.647843453311735
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Hate speech detection is commonly framed as a direct binary classification problem despite being a composite concept defined through multiple interacting factors that vary across legal frameworks, platform policies, and annotation guidelines. As a result, supervised models often overfit dataset-specific definitions and exhibit limited robustness under domain shift and annotation noise. We introduce xList-Hate, a diagnostic framework that decomposes hate speech detection into a checklist of explicit, concept-level questions grounded in widely shared normative criteria. Each question is independently answered by a large language model (LLM), producing a binary diagnostic representation that captures hateful content features without directly predicting the final label. These diagnostic signals are then aggregated by a lightweight, fully interpretable decision tree, yielding transparent and auditable predictions. We evaluate it across multiple hate speech benchmarks and model families, comparing it against zero-shot LLM classification and in-domain supervised fine-tuning. While supervised methods typically maximize in-domain performance, xList-Hate consistently improves cross-dataset robustness and relative performance under domain shift. In addition, qualitative analysis of disagreement cases provides evidence that the framework can be less sensitive to certain forms of annotation inconsistency and contextual ambiguity. Crucially, the approach enables fine-grained interpretability through explicit decision paths and factor-level analysis. Our results suggest that reframing hate speech detection as a diagnostic reasoning task, rather than a monolithic classification problem, provides a robust, explainable, and extensible alternative for content moderation.
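
The pipeline described in the abstract (checklist questions answered independently, a binary diagnostic vector, then an interpretable tree) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the checklist questions, the keyword-based `stub_answerer` standing in for an LLM call, the placeholder token `group_x`, and the hand-written `TREE` (used here in place of a decision tree learned from data) are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple, Union

# Hypothetical checklist; the abstract does not list the actual questions.
CHECKLIST: List[str] = [
    "targets_protected_group",
    "contains_dehumanizing_language",
    "incites_harm",
]


def stub_answerer(text: str, question: str) -> bool:
    """Stand-in for an LLM call: plain keyword matching (assumption)."""
    keywords = {
        "targets_protected_group": ["group_x"],
        "contains_dehumanizing_language": ["vermin"],
        "incites_harm": ["attack"],
    }
    lowered = text.lower()
    return any(word in lowered for word in keywords[question])


def diagnose(text: str, answer: Callable[[str, str], bool]) -> Dict[str, bool]:
    """Answer each checklist question independently, yielding binary features."""
    return {question: answer(text, question) for question in CHECKLIST}


# A node is either a leaf label (str) or (question, subtree_if_no, subtree_if_yes).
Node = Union[str, tuple]

# Hand-written tree for illustration; the paper aggregates with a decision
# tree, which would normally be fitted on labelled diagnostic vectors.
TREE: Node = (
    "targets_protected_group",
    "not_hate",
    (
        "contains_dehumanizing_language",
        ("incites_harm", "not_hate", "hate"),
        "hate",
    ),
)


def predict(features: Dict[str, bool], tree: Node = TREE) -> Tuple[str, List[str]]:
    """Walk the tree and return the label together with the decision path."""
    path: List[str] = []
    node = tree
    while not isinstance(node, str):
        question, if_no, if_yes = node
        answer = features[question]
        path.append(f"{question}={'yes' if answer else 'no'}")
        node = if_yes if answer else if_no
    return node, path


if __name__ == "__main__":
    features = diagnose("Those group_x people are vermin", stub_answerer)
    label, path = predict(features)
    print(label, path)
```

The returned path is what makes the prediction auditable in this sketch: each entry records one checklist answer that led to the final label, so a reviewer can see which factor drove the decision.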

Related papers

Can Unified Generation and Understanding Models Maintain Semantic Equivalence Across Different Output Modalities? [61.533560295383786]
Unified Multimodal Large Language Models (U-MLLMs) integrate understanding and generation within a single architecture. We observe that U-MLLMs fail to maintain semantic equivalence when required to render the same results in the image modality. We introduce VGUBench, a framework to decouple reasoning logic from generation fidelity.
arXiv Detail & Related papers (2026-02-27T06:23:56Z)
AQAScore: Evaluating Semantic Alignment in Text-to-Audio Generation via Audio Question Answering [97.52852990265136]
We introduce AQAScore, a backbone-agnostic evaluation framework that leverages the reasoning capabilities of audio-aware large language models. We evaluate AQAScore across multiple benchmarks, including human-rated relevance, pairwise comparison, and compositional reasoning tasks.
arXiv Detail & Related papers (2026-01-21T07:35:36Z)
CLASH: A Benchmark for Cross-Modal Contradiction Detection [15.134491772506196]
CLASH is a novel benchmark for multimodal contradiction detection. It features COCO images paired with contradictory captions containing controlled object-level or attribute-level contradictions.
arXiv Detail & Related papers (2025-11-24T15:09:07Z)
Multi-Rationale Explainable Object Recognition via Contrastive Conditional Inference [1.2309843977641421]
We introduce a multi-rationale explainable object recognition benchmark comprising datasets in which each image is annotated with multiple ground-truth rationales. We propose a contrastive conditional inference framework that explicitly models the probabilistic relationships among image embeddings, category labels, and rationales. Our approach achieves state-of-the-art results on the multi-rationale explainable object recognition benchmark, including strong zero-shot performance.
arXiv Detail & Related papers (2025-08-19T21:28:12Z)
CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward [50.97588334916863]
We develop CompassVerifier, an accurate and robust lightweight verifier model for evaluation and outcome reward. It demonstrates multi-domain competency spanning math, knowledge, and diverse reasoning tasks, with the capability to process various answer types. We introduce VerifierBench benchmark comprising model outputs collected from multiple data sources, augmented through manual analysis of metaerror patterns to enhance CompassVerifier.
arXiv Detail & Related papers (2025-08-05T17:55:24Z)
AGENT-X: Adaptive Guideline-based Expert Network for Threshold-free AI-generated teXt detection [44.66668435489055]
AGENT-X is a zero-shot multi-agent framework for AI-generated text detection. We organize detection guidelines into semantic, stylistic, and structural dimensions, each independently evaluated by specialized linguistic agents. A meta agent integrates these assessments through confidence-aware aggregation, enabling threshold-free, interpretable classification. Experiments on diverse datasets demonstrate that AGENT-X substantially surpasses state-of-the-art supervised and zero-shot approaches in accuracy, interpretability, and generalization.
arXiv Detail & Related papers (2025-05-21T08:39:18Z)
Retrieval is Not Enough: Enhancing RAG Reasoning through Test-Time Critique and Optimization [58.390885294401066]
Retrieval-augmented generation (RAG) has become a widely adopted paradigm for enabling knowledge-grounded large language models (LLMs). RAG pipelines often fail to ensure that model reasoning remains consistent with the evidence retrieved, leading to factual inconsistencies or unsupported conclusions. We propose AlignRAG, a novel iterative framework grounded in Critique-Driven Alignment (CDA). We introduce AlignRAG-auto, an autonomous variant that dynamically terminates refinement, removing the need to pre-specify the number of critique iterations.
arXiv Detail & Related papers (2025-04-21T04:56:47Z)
Improving Hate Speech Classification with Cross-Taxonomy Dataset Integration [0.0]
The work introduces a universal taxonomy and a hate speech classifier capable of detecting a wide range of definitions within a single framework. Our approach is validated by combining two widely used but differently annotated datasets. This work highlights the potential of dataset and taxonomy integration in advancing hate speech detection, increasing efficiency, and ensuring broader applicability across contexts.
arXiv Detail & Related papers (2025-03-07T12:01:02Z)
SEOE: A Scalable and Reliable Semantic Evaluation Framework for Open Domain Event Detection [70.23196257213829]
We propose a scalable and reliable Semantic-level Evaluation framework for Open domain Event detection. Our proposed framework first constructs a scalable evaluation benchmark that currently includes 564 event types covering 7 major domains. We then leverage large language models (LLMs) as automatic evaluation agents to compute a semantic F1-score, incorporating fine-grained definitions of semantically similar labels.
arXiv Detail & Related papers (2025-03-05T09:37:05Z)
Subjective Logic Encodings [20.458601113219697]
Data perspectivism seeks to leverage inter-annotator disagreement to learn models. Subjective Logic Encodings (SLEs) is a framework for constructing classification targets that explicitly encodes annotations as opinions of the annotators.
arXiv Detail & Related papers (2025-02-17T15:14:10Z)
Interpretable Automatic Fine-grained Inconsistency Detection in Text Summarization [56.94741578760294]
We propose the task of fine-grained inconsistency detection, the goal of which is to predict the fine-grained types of factual errors in a summary. Motivated by how humans inspect factual inconsistency in summaries, we propose an interpretable fine-grained inconsistency detection model, FineGrainFact.
arXiv Detail & Related papers (2023-05-23T22:11:47Z)
Resolving label uncertainty with implicit posterior models [71.62113762278963]
We propose a method for jointly inferring labels across a collection of data samples. By implicitly assuming the existence of a generative model for which a differentiable predictor is the posterior, we derive a training objective that allows learning under weak beliefs.
arXiv Detail & Related papers (2022-02-28T18:09:44Z)

This list is automatically generated from the titles and abstracts of the papers on this site.