Negation-Instance Based Evaluation of End-to-End Negation Resolution
- URL: http://arxiv.org/abs/2109.10013v1
- Date: Tue, 21 Sep 2021 07:49:41 GMT
- Title: Negation-Instance Based Evaluation of End-to-End Negation Resolution
- Authors: Elizaveta Sineva, Stefan Grünewald, Annemarie Friedrich, Jonas Kuhn
- Abstract summary: We argue for a negation-instance based approach to evaluating negation resolution.
Our proposed metrics correspond to expectations over per-instance scores and hence are intuitively interpretable.
We provide results for a set of current state-of-the-art systems for negation resolution on three English corpora.
- Score: 10.56502771201411
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we revisit the task of negation resolution, which includes the
subtasks of cue detection (e.g. "not", "never") and scope resolution. In the
context of previous shared tasks, a variety of evaluation metrics have been
proposed. Subsequent works usually use different subsets of these, including
variations and custom implementations, rendering meaningful comparisons between
systems difficult. Examining the problem both from a linguistic perspective and
from a downstream viewpoint, we here argue for a negation-instance based
approach to evaluating negation resolution. Our proposed metrics correspond to
expectations over per-instance scores and hence are intuitively interpretable.
To render research comparable and to foster future work, we provide results for
a set of current state-of-the-art systems for negation resolution on three
English corpora, and make our implementation of the evaluation scripts publicly
available.
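As a concrete illustration of metrics that are expectations over per-instance scores, consider the following minimal Python sketch: it computes scope token F1 per gold negation instance and averages over instances. The function names and exact scoring choices are our illustrative assumptions, not the paper's released evaluation scripts.

    # Illustrative sketch (not the paper's evaluation scripts): a
    # negation-instance based metric as an average of per-instance scores.

    def scope_token_f1(pred_scope: set, gold_scope: set) -> float:
        """Token-level F1 between predicted and gold scope (token indices)."""
        if not pred_scope and not gold_scope:
            return 1.0  # correctly predicted empty scope
        overlap = len(pred_scope & gold_scope)
        if overlap == 0:
            return 0.0
        precision = overlap / len(pred_scope)
        recall = overlap / len(gold_scope)
        return 2 * precision * recall / (precision + recall)

    def instance_based_score(instances) -> float:
        """Expectation over per-instance scores: `instances` holds one
        (pred_scope, gold_scope) pair per gold negation instance."""
        if not instances:
            return 0.0
        return sum(scope_token_f1(p, g) for p, g in instances) / len(instances)

    # Two negation instances: one partially resolved scope, one missed.
    print(instance_based_score([({1, 2, 3}, {2, 3, 4}), (set(), {5})]))  # 0.333...

Because the score is a plain average over instances, each negation instance contributes equally, which is what makes the metric intuitively interpretable.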
Related papers
- Negation Triplet Extraction with Syntactic Dependency and Semantic Consistency [37.99421732397288]
SSENE is built on a generative pretrained language model (PLM) with an Encoder-Decoder architecture and a multi-task learning framework.
We also construct NegComment, a high-quality Chinese dataset based on user reviews from the real-world Meituan platform.
arXiv Detail & Related papers (2024-04-15T14:28:33Z)
- End-to-End Evaluation for Low-Latency Simultaneous Speech Translation [55.525125193856084]
We propose the first framework to perform and evaluate the various aspects of low-latency speech translation under realistic conditions.
This includes the segmentation of the audio as well as the run-time of the different components.
We also compare different approaches to low-latency speech translation using this framework.
arXiv Detail & Related papers (2023-08-07T09:06:20Z)
- Not another Negation Benchmark: The NaN-NLI Test Suite for Sub-clausal Negation [59.307534363825816]
Negation is poorly captured by current language models, although the extent of this problem is not widely understood.
We introduce a natural language inference (NLI) test suite for probing how well NLP methods handle sub-clausal negation.
arXiv Detail & Related papers (2022-10-06T23:39:01Z)
- Improving negation detection with negation-focused pre-training [58.32362243122714]
Negation is a common linguistic feature that is crucial in many language understanding tasks.
Recent work has shown that state-of-the-art NLP models underperform on samples containing negation.
We propose a new negation-focused pre-training strategy, involving targeted data augmentation and negation masking (a sketch of the masking idea follows this list).
arXiv Detail & Related papers (2022-05-09T02:41:11Z)
- Probing as Quantifying the Inductive Bias of Pre-trained Representations [99.93552997506438]
We present a novel framework for probing where the goal is to evaluate the inductive bias of representations for a particular task.
We apply our framework to a series of token-, arc-, and sentence-level tasks.
arXiv Detail & Related papers (2021-10-15T22:01:16Z)
- Discrete representations in neural models of spoken language [56.29049879393466]
We compare the merits of four commonly used metrics in the context of weakly supervised models of spoken language.
We find that the different evaluation metrics can give inconsistent results.
arXiv Detail & Related papers (2021-05-12T11:02:02Z)
- On the Faithfulness Measurements for Model Interpretations [100.2730234575114]
Post-hoc interpretations aim to uncover how natural language processing (NLP) models make predictions.
To assess whether such interpretations are faithful to the model, we start with three criteria: the removal-based criterion, the sensitivity of interpretations, and the stability of interpretations (a sketch of the removal-based criterion follows this list).
Motivated by the desideratum of these faithfulness notions, we introduce a new class of interpretation methods that adopt techniques from the adversarial domain.
arXiv Detail & Related papers (2021-04-18T09:19:44Z)
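For the negation-focused pre-training entry above, a minimal sketch of the negation-masking idea could look like the following. The cue list and masking rates are hypothetical assumptions of ours, not the authors' implementation.

    import random

    # Hypothetical cue list and rates (illustrative assumptions):
    # preferentially mask negation cues so an MLM must predict them
    # from the surrounding context.
    NEGATION_CUES = {"not", "n't", "never", "no", "none", "nothing", "without"}

    def negation_mask(tokens, mask_token="[MASK]", cue_rate=0.8, base_rate=0.15):
        """Mask negation cues with high probability, other tokens at the
        usual MLM rate."""
        return [mask_token
                if random.random() < (cue_rate if t.lower() in NEGATION_CUES
                                      else base_rate)
                else t
                for t in tokens]

    print(negation_mask("I did n't say that , never .".split()))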
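And for the faithfulness entry, a sketch of the removal-based criterion, often called comprehensiveness; `predict_proba` here is a placeholder for any classifier that maps a token list to a probability for its original prediction.

    # Illustrative removal-based faithfulness check: delete the k tokens the
    # interpretation ranks highest and measure the drop in the model's
    # probability for its original prediction. A larger drop suggests the
    # interpretation identified tokens the model actually relies on.

    def comprehensiveness(predict_proba, tokens, attributions, k=3):
        original = predict_proba(tokens)
        top_k = set(sorted(range(len(tokens)),
                           key=lambda i: attributions[i], reverse=True)[:k])
        reduced = [t for i, t in enumerate(tokens) if i not in top_k]
        return original - predict_proba(reduced)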