Re-Examining Human Annotations for Interpretable NLP
- URL: http://arxiv.org/abs/2204.04580v1
- Date: Sun, 10 Apr 2022 02:27:30 GMT
- Title: Re-Examining Human Annotations for Interpretable NLP
- Authors: Cheng-Han Chiang and Hung-yi Lee
- Abstract summary: We conduct controlled experiments using crowd-sourced websites on two widely used datasets in Interpretable NLP.
We compare the annotation results obtained from recruiting workers satisfying different levels of qualification.
Our results reveal that annotation quality depends heavily on the workers' qualifications, and that the instructions can guide workers toward particular annotations.
- Score: 80.81532239566992
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Explanation methods in Interpretable NLP often explain the model's decision
by extracting evidence (rationale) from the input texts supporting the
decision. Benchmark datasets for rationales have been released to evaluate the
quality of extracted rationales. The ground truth rationales in these datasets are often
human annotations obtained via crowd-sourced websites. Valuable as these
datasets are, the details on how those human annotations are obtained are often
not clearly specified. We conduct comprehensive controlled experiments using
crowd-sourced websites on two widely used datasets in Interpretable NLP to
understand how those unstated details can affect the annotation results.
Specifically, we compare the annotation results obtained from recruiting
workers satisfying different levels of qualification. We also provide
high-quality workers with different instructions for completing the same
underlying tasks. Our results reveal that annotation quality depends heavily on
the workers' qualifications, and that the instructions can guide workers toward
particular annotations. We further show that specific
explanation methods perform better when evaluated using the ground truth
rationales obtained by particular instructions. Based on these observations, we
highlight the importance of providing complete details of the annotation
process and call for careful interpretation of any experiment results obtained
using those annotations.
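As a concrete illustration of the kind of evaluation the paper re-examines, the sketch below scores a rationale extracted by an explanation method against a human-annotated rationale with token-level F1, a common agreement measure for highlight-style rationales. The set-of-token-indices representation and the example numbers are illustrative assumptions, not details taken from the abstract; they simply show how the same extracted rationale can score differently against annotations collected under different instructions.

```python
# Illustrative sketch (not the paper's released code): token-level F1 between a
# rationale extracted by an explanation method and a human-annotated rationale.
# Representing rationales as sets of token indices is an assumption made for
# this example, not a detail stated in the abstract.

def token_f1(predicted: set[int], annotated: set[int]) -> float:
    """Token-level F1 between two rationales given as sets of token indices."""
    if not predicted and not annotated:
        return 1.0  # both empty: treat as perfect agreement
    if not predicted or not annotated:
        return 0.0
    overlap = len(predicted & annotated)
    precision = overlap / len(predicted)
    recall = overlap / len(annotated)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# The same extracted rationale scored against annotations collected under two
# different instruction settings can receive noticeably different scores, which
# is the kind of sensitivity the paper highlights.
predicted = {3, 4, 5, 9}
annotation_under_instruction_a = {3, 4, 5}             # hypothetical "mark only key phrases"
annotation_under_instruction_b = {3, 4, 5, 9, 10, 11}  # hypothetical "mark all supporting text"
print(token_f1(predicted, annotation_under_instruction_a))  # ~0.86
print(token_f1(predicted, annotation_under_instruction_b))  # 0.80
```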
Related papers
- On the Biased Assessment of Expert Finding Systems [11.083396379885478]
In large organisations, identifying experts on a given topic is crucial in leveraging the internal knowledge spread across teams and departments.
This case study provides an analysis of how these recommendations can impact the evaluation of expert finding systems.
We show that system-validated annotations lead to overestimated performance of traditional term-based retrieval models.
We also augment knowledge areas with synonyms to uncover a strong bias towards literal mentions of their constituent words.
arXiv Detail & Related papers (2024-10-07T13:19:08Z)
- Annotator in the Loop: A Case Study of In-Depth Rater Engagement to Create a Bridging Benchmark Dataset [1.825224193230824]
We describe a novel, collaborative, and iterative annotator-in-the-loop methodology for annotation.
Our findings indicate that collaborative engagement with annotators can enhance annotation methods.
arXiv Detail & Related papers (2024-08-01T19:11:08Z)
- Evaluating Human Alignment and Model Faithfulness of LLM Rationale [66.75309523854476]
We study how well large language models (LLMs) explain their generations through rationales.
We show that prompting-based methods are less "faithful" than attribution-based explanations.
arXiv Detail & Related papers (2024-06-28T20:06:30Z)
- Selective Annotation via Data Allocation: These Data Should Be Triaged to Experts for Annotation Rather Than the Model [42.70608373297776]
We propose a selective annotation framework called SANT.
It effectively takes advantage of both triage-to-human and triage-to-model data through the proposed error-aware triage and bi-weighting mechanisms.
Experimental results show that SANT consistently outperforms other baselines, leading to higher-quality annotation through its proper allocation of data to both expert and model workers.
arXiv Detail & Related papers (2024-05-20T14:52:05Z)
- XAL: EXplainable Active Learning Makes Classifiers Better Low-resource Learners [71.8257151788923]
We propose a novel Explainable Active Learning framework (XAL) for low-resource text classification.
XAL encourages classifiers to justify their inferences and delve into unlabeled data for which they cannot provide reasonable explanations.
Experiments on six datasets show that XAL achieves consistent improvement over 9 strong baselines.
arXiv Detail & Related papers (2023-10-09T08:07:04Z)
- DecompEval: Evaluating Generated Texts as Unsupervised Decomposed Question Answering [95.89707479748161]
Existing evaluation metrics for natural language generation (NLG) tasks face challenges in generalization ability and interpretability.
We propose a metric called DecompEval that formulates NLG evaluation as an instruction-style question answering task.
We decompose this instruction-style question about the quality of generated texts into subquestions that measure the quality of each sentence.
The subquestions with their answers generated by PLMs are then recomposed as evidence to obtain the evaluation result.
arXiv Detail & Related papers (2023-07-13T16:16:51Z)
- Using Natural Language Explanations to Rescale Human Judgments [81.66697572357477]
We propose a method to rescale ordinal annotations using their natural language explanations and large language models (LLMs).
We feed annotators' Likert ratings and corresponding explanations into an LLM and prompt it to produce a numeric score anchored in a scoring rubric.
Our method rescales the raw judgments without impacting agreement and brings the scores closer to human judgments grounded in the same scoring rubric (a minimal sketch of this recipe appears after this list).
arXiv Detail & Related papers (2023-05-24T06:19:14Z)
- On Releasing Annotator-Level Labels and Information in Datasets [6.546195629698355]
We show that label aggregation may introduce representational biases of individual and group perspectives.
We propose recommendations for increased utility and transparency of datasets for downstream use cases.
arXiv Detail & Related papers (2021-10-12T02:35:45Z)
- Teach Me to Explain: A Review of Datasets for Explainable NLP [6.256505195819595]
Explainable NLP (ExNLP) has increasingly focused on collecting human-annotated explanations.
These explanations are used downstream in three ways: as data augmentation to improve performance on a predictive task, as a loss signal to train models to produce explanations for their predictions, and as a means to evaluate the quality of model-generated explanations.
In this review, we identify three predominant classes of explanations (highlights, free-text, and structured), organize the literature on annotating each type, point to what has been learned to date, and give recommendations for collecting ExNLP datasets in the future.
arXiv Detail & Related papers (2021-02-24T04:25:01Z)
- Syntactic Structure Distillation Pretraining For Bidirectional Encoders [49.483357228441434]
We introduce a knowledge distillation strategy for injecting syntactic biases into BERT pretraining.
We distill the approximate marginal distribution over words in context from the syntactic LM.
Our findings demonstrate the benefits of syntactic biases, even in representation learners that exploit large amounts of data (a brief sketch of the distillation objective appears after this list).
arXiv Detail & Related papers (2020-05-27T16:44:01Z)
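For the syntactic structure distillation entry above, the following is a minimal sketch of a word-level distillation objective consistent with that summary: the student's masked-LM predictions are pulled toward the syntactic LM's word distributions with a KL term added to the usual masked-LM loss. The tensor shapes, the mixing weight `alpha`, the assumption that teacher distributions are precomputed, and the function name are all illustrative choices, not the paper's released code.

```python
# Hedged sketch of word-level distribution distillation into a masked LM.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_probs, mlm_labels, alpha=0.5):
    """Combine the usual masked-LM loss with a KL term that pulls the student's
    predicted word distributions toward the syntactic LM's distributions.

    student_logits: (batch, seq, vocab) raw scores from the BERT MLM head
    teacher_probs:  (batch, seq, vocab) marginal word distributions from the syntactic LM
    mlm_labels:     (batch, seq) target token ids, with -100 at unmasked positions
    """
    # Standard masked-LM cross-entropy on masked positions only.
    mlm_loss = F.cross_entropy(
        student_logits.transpose(1, 2), mlm_labels, ignore_index=-100
    )
    # KL(teacher || student) per position, summed over the vocabulary.
    log_student = F.log_softmax(student_logits, dim=-1)
    kl = F.kl_div(log_student, teacher_probs, reduction="none").sum(-1)
    mask = (mlm_labels != -100).float()
    kl_loss = (kl * mask).sum() / mask.sum().clamp(min=1.0)
    return alpha * mlm_loss + (1.0 - alpha) * kl_loss
```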
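For the "Using Natural Language Explanations to Rescale Human Judgments" entry above, here is a hedged sketch of the recipe its summary describes: a Likert rating and its explanation are passed to an LLM together with a scoring rubric, and a finer-grained numeric score is parsed from the reply. The rubric text, prompt wording, and `call_llm` placeholder are illustrative assumptions rather than the paper's actual prompts or API.

```python
# Hypothetical sketch of rescaling a Likert rating with its explanation and an LLM.
import re

RUBRIC = """Score the example from 0 to 100 using this rubric:
0-25: no support, 26-50: weak support, 51-75: partial support, 76-100: full support."""

def call_llm(prompt: str) -> str:
    """Placeholder for any chat/completions API; returns a canned reply here."""
    return "Rescaled score: 72"

def rescale(likert_rating: int, explanation: str) -> int:
    prompt = (
        f"{RUBRIC}\n\n"
        f"An annotator gave a Likert rating of {likert_rating} on a 1-5 scale and explained:\n"
        f'"{explanation}"\n'
        "Reply with a single line in the form 'Rescaled score: <number>'."
    )
    reply = call_llm(prompt)
    match = re.search(r"Rescaled score:\s*(\d+)", reply)
    return int(match.group(1)) if match else -1  # -1 signals an unparsable reply

print(rescale(4, "The answer covers most key points but misses one detail."))  # 72
```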
This list is automatically generated from the titles and abstracts of the papers on this site.