PriMeSRL-Eval: A Practical Quality Metric for Semantic Role Labeling
Systems Evaluation
- URL: http://arxiv.org/abs/2210.06408v1
- Date: Wed, 12 Oct 2022 17:04:28 GMT
- Title: PriMeSRL-Eval: A Practical Quality Metric for Semantic Role Labeling
Systems Evaluation
- Authors: Ishan Jindal, Alexandre Rademaker, Khoi-Nguyen Tran, Huaiyu Zhu,
Hiroshi Kanayama, Marina Danilevsky, Yunyao Li
- Abstract summary: We propose a stricter SRL evaluation metric, PriMeSRL.
We show that under PriMeSRL, the evaluated quality of all SoTA SRL models drops significantly.
We also show that PriMeSRL successfully penalizes actual failures in SoTA SRL models.
- Score: 66.79238445033795
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Semantic role labeling (SRL) identifies the predicate-argument structure in a
sentence. This task is usually accomplished in four steps: predicate
identification, predicate sense disambiguation, argument identification, and
argument classification. Errors introduced at one step propagate to later
steps. Unfortunately, the existing SRL evaluation scripts do not consider the
full effect of this error propagation. They either evaluate arguments
independently of predicate sense (CoNLL09) or do not evaluate predicate sense at
all (CoNLL05), yielding an inaccurate measure of SRL model performance on the
argument classification task. In this paper, we address key practical issues with
existing evaluation scripts and propose a stricter SRL evaluation metric,
PriMeSRL. We observe that under PriMeSRL, the evaluated quality of all
SoTA SRL models drops significantly, and their relative rankings also change.
We also show that PriMeSRL successfully penalizes actual failures in SoTA SRL
models.
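To make the distinction concrete, below is a minimal, hypothetical scoring sketch (not the official PriMeSRL or CoNLL scorers; the data layout and function name are invented for illustration). A lenient scorer credits a predicted argument whenever its span and role match the gold annotation, regardless of the predicted predicate sense, while a PriMeSRL-style strict scorer withholds credit for all arguments of a predicate whose sense is wrong.

```python
# Minimal, hypothetical sketch contrasting lenient (CoNLL09-style) and strict
# (PriMeSRL-style) argument scoring; data layout and names are illustrative,
# not the official scorers.

def score_arguments(gold, pred, require_correct_sense=False):
    """gold/pred: predicate position -> (sense, set of (span, role) pairs)."""
    total_gold = sum(len(args) for _, args in gold.values())
    total_pred = sum(len(args) for _, args in pred.values())
    correct = 0
    for pos, (gold_sense, gold_args) in gold.items():
        if pos not in pred:
            continue
        pred_sense, pred_args = pred[pos]
        # Strict mode: arguments of a predicate earn credit only if the
        # predicate sense itself was predicted correctly.
        if require_correct_sense and pred_sense != gold_sense:
            continue
        correct += len(gold_args & pred_args)
    p = correct / total_pred if total_pred else 0.0
    r = correct / total_gold if total_gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Toy example: all arguments correct, but the predicate sense is wrong.
gold = {3: ("make.01", {((0, 1), "A0"), ((5, 6), "A1")})}
pred = {3: ("make.02", {((0, 1), "A0"), ((5, 6), "A1")})}
print(score_arguments(gold, pred))                              # lenient: (1.0, 1.0, 1.0)
print(score_arguments(gold, pred, require_correct_sense=True))  # strict:  (0.0, 0.0, 0.0)
```

In the toy example, a model that labels every argument correctly but picks the wrong predicate sense still receives a perfect F1 under the lenient setting and zero under the strict one, which is the kind of failure the abstract argues existing scripts hide.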
Related papers
- DAHRS: Divergence-Aware Hallucination-Remediated SRL Projection [0.7922558880545527]
Divergence-Aware Hallucination-Remediated SRL projection (DAHRS)
We implement DAHRS, leveraging linguistically-informed alignment remediation followed by greedy First-Come First-Assign (FCFA) SRL projection.
We achieve a higher word-level F1 over XSRL: 87.6% vs. 77.3% (EN-FR) and 89.0% vs. 82.7% (EN-ES).
arXiv Detail & Related papers (2024-07-12T14:13:59Z) - Evaluating Generative Language Models in Information Extraction as Subjective Question Correction [49.729908337372436]
Inspired by the principles of subjective question correction, we propose a new evaluation method, SQC-Score.
Results on three information extraction tasks show that human annotators prefer SQC-Score over the baseline metrics.
arXiv Detail & Related papers (2024-04-04T15:36:53Z) - Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint [104.53687944498155]
Reinforcement learning (RL) has been widely used in training large language models (LLMs).
We propose a new RL method named RLMEC that incorporates a generative model as the reward model.
Based on the generative reward model, we design a token-level RL objective for training and an imitation-based regularization for stabilizing the RL process.
arXiv Detail & Related papers (2024-01-11T17:58:41Z) - CASA: Causality-driven Argument Sufficiency Assessment [79.13496878681309]
We propose CASA, a zero-shot causality-driven argument sufficiency assessment framework.
PS (probability of sufficiency) measures how likely introducing the premise event would lead to the conclusion when both the premise and conclusion events are absent.
Experiments on two logical fallacy detection datasets demonstrate that CASA accurately identifies insufficient arguments.
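For reference, the sentence above paraphrases the standard counterfactual probability of sufficiency from causal inference (Pearl): $PS = P(Y_{X=1} = 1 \mid X = 0, Y = 0)$, i.e., the probability that forcing the premise event $X$ to occur would bring about the conclusion event $Y$, given that neither has occurred. How CASA estimates this quantity in the zero-shot setting is not detailed in this summary.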
arXiv Detail & Related papers (2024-01-10T16:21:18Z) - Using Think-Aloud Data to Understand Relations between Self-Regulation
Cycle Characteristics and Student Performance in Intelligent Tutoring Systems [15.239133633467672]
The present study investigates SRL (self-regulated learning) behaviors in relation to learners' moment-by-moment performance.
We demonstrate the feasibility of labeling SRL behaviors based on AI-generated think-aloud transcripts.
Students' actions during earlier, process-heavy stages of SRL cycles exhibited lower moment-by-moment correctness during problem-solving than actions during later SRL cycle stages.
arXiv Detail & Related papers (2023-12-09T20:36:58Z) - Persian Semantic Role Labeling Using Transfer Learning and BERT-Based
Models [5.592292907237565]
We present an end-to-end SRL method that not only eliminates the need for feature extraction but also outperforms existing methods when facing new samples.
The proposed method does not employ any auxiliary features and achieves an accuracy of 83.16%, more than 16 percentage points above previous methods in similar circumstances.
arXiv Detail & Related papers (2023-06-17T12:50:09Z) - Semantic Role Labeling Meets Definition Modeling: Using Natural Language
to Describe Predicate-Argument Structures [104.32063681736349]
We present an approach to describe predicate-argument structures using natural language definitions instead of discrete labels.
Our experiments and analyses on PropBank-style and FrameNet-style, dependency-based and span-based SRL also demonstrate that a flexible model with an interpretable output does not necessarily come at the expense of performance.
arXiv Detail & Related papers (2022-12-02T11:19:16Z) - LCRL: Certified Policy Synthesis via Logically-Constrained Reinforcement
Learning [78.2286146954051]
LCRL implements model-free Reinforcement Learning (RL) algorithms over unknown Markov Decision Processes (MDPs).
We present case studies to demonstrate the applicability, ease of use, scalability, and performance of LCRL.
arXiv Detail & Related papers (2022-09-21T13:21:00Z) - RL with KL penalties is better viewed as Bayesian inference [4.473139775790299]
We analyze challenges associated with treating a language model as a Reinforcement Learning (RL) policy.
We show how avoiding those challenges requires moving beyond the RL paradigm.
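As background for the title's claim (a standard result, not quoted from this summary): maximizing $E_{x \sim \pi}[r(x)] - \beta \, KL(\pi \| \pi_0)$ over policies $\pi$, with reward $r$, KL coefficient $\beta$, and pretrained language model $\pi_0$, has the closed-form optimum $\pi^*(x) \propto \pi_0(x) \exp(r(x)/\beta)$, i.e., a Bayesian posterior with prior $\pi_0$ and likelihood proportional to $\exp(r(x)/\beta)$. This is the sense in which KL-penalized RL can be read as (variational) Bayesian inference.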
arXiv Detail & Related papers (2022-05-23T12:47:13Z) - Transition-based Semantic Role Labeling with Pointer Networks [0.40611352512781856]
We propose the first transition-based SRL approach that is capable of completely processing an input sentence in a single left-to-right pass.
Thanks to our implementation based on Pointer Networks, full SRL can be accurately and efficiently done in $O(n^2)$, achieving the best performance to date on the majority of languages from the CoNLL-2009 shared task.
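One way to read the $O(n^2)$ bound (an interpretation, not spelled out in this summary): a single left-to-right pass visits each of the $n$ tokens once, and at each step the pointer network scores at most $n$ candidate positions, so decoding performs on the order of $n \times n = n^2$ pointer-score computations.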
arXiv Detail & Related papers (2022-05-20T08:38:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.