DAHRS: Divergence-Aware Hallucination-Remediated SRL Projection
- URL: http://arxiv.org/abs/2407.09283v1
- Date: Fri, 12 Jul 2024 14:13:59 GMT
- Title: DAHRS: Divergence-Aware Hallucination-Remediated SRL Projection
- Authors: Sangpil Youm, Brodie Mather, Chathuri Jayaweera, Juliana Prada, Bonnie Dorr
- Abstract summary: We implement Divergence-Aware Hallucination-Remediated SRL projection (DAHRS), leveraging linguistically-informed alignment remediation followed by greedy First-Come First-Assign (FCFA) SRL projection. We achieve a higher word-level F1 over XSRL: 87.6% vs. 77.3% (EN-FR) and 89.0% vs. 82.7% (EN-ES).
- Score: 0.7922558880545527
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semantic role labeling (SRL) enriches many downstream applications, e.g., machine translation, question answering, summarization, and stance/belief detection. However, building multilingual SRL models is challenging due to the scarcity of semantically annotated corpora for multiple languages. Moreover, state-of-the-art SRL projection (XSRL) based on large language models (LLMs) yields output that is riddled with spurious role labels. Remediation of such hallucinations is not straightforward due to the lack of explainability of LLMs. We show that hallucinated role labels are related to naturally occurring divergence types that interfere with initial alignments. We implement Divergence-Aware Hallucination-Remediated SRL projection (DAHRS), leveraging linguistically-informed alignment remediation followed by greedy First-Come First-Assign (FCFA) SRL projection. DAHRS improves the accuracy of SRL projection without additional transformer-based machinery, beating XSRL in both human and automatic comparisons, and advancing beyond headwords to accommodate phrase-level SRL projection (e.g., EN-FR, EN-ES). Using CoNLL-2009 as our ground truth, we achieve a higher word-level F1 over XSRL: 87.6% vs. 77.3% (EN-FR) and 89.0% vs. 82.7% (EN-ES). Human phrase-level assessments yield 89.1% (EN-FR) and 91.0% (EN-ES). We also define a divergence metric to adapt our approach to other language pairs (e.g., English-Tagalog).
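To make the projection step concrete, here is a minimal sketch of greedy First-Come First-Assign (FCFA) projection, assuming remediated word alignments are given as (source, target) index pairs and source roles as an index-to-label map; the function name and data layout are illustrative, not taken from the paper.

```python
def fcfa_project(src_roles, alignments):
    """Greedy First-Come First-Assign (FCFA) projection sketch.

    src_roles:  dict mapping source token index -> SRL label (e.g., {0: "A0"})
    alignments: iterable of (src_idx, tgt_idx) word-alignment pairs,
                assumed already remediated for divergences.
    Returns a dict mapping target token index -> projected label.
    """
    tgt_roles = {}
    for src_idx, tgt_idx in alignments:           # first come ...
        label = src_roles.get(src_idx)
        if label is not None and tgt_idx not in tgt_roles:
            tgt_roles[tgt_idx] = label            # ... first assign
    return tgt_roles

# Toy EN->FR example: "She bought bread" -> "Elle a acheté du pain"
src_roles = {0: "A0", 1: "V", 2: "A1"}
alignments = [(0, 0), (1, 2), (2, 4)]
print(fcfa_project(src_roles, alignments))        # {0: 'A0', 2: 'V', 4: 'A1'}
```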
Related papers
- Machine Translation Hallucination Detection for Low and High Resource Languages using Large Language Models [12.447489454369636]
This paper evaluates sentence-level hallucination detection approaches using Large Language Models (LLMs) and semantic similarity within massively multilingual embeddings.
LLMs can achieve performance comparable to, or even better than, previously proposed models, despite not being explicitly trained for any machine translation task.
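A minimal sketch of the embedding-similarity baseline, assuming a multilingual sentence encoder from sentence-transformers; the model name and threshold below are illustrative, not the paper's.

```python
from sentence_transformers import SentenceTransformer, util

# Any massively multilingual sentence encoder works here; this model
# name is an assumption, not the one used in the paper.
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

def looks_hallucinated(source: str, translation: str, threshold: float = 0.4) -> bool:
    """Flag a translation whose embedding drifts too far from the source."""
    src_emb, tgt_emb = model.encode([source, translation], convert_to_tensor=True)
    similarity = util.cos_sim(src_emb, tgt_emb).item()
    return similarity < threshold  # low cross-lingual similarity => suspect

print(looks_hallucinated("The cat sat on the mat.",
                         "Le chat est assis sur le tapis."))
```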
arXiv Detail & Related papers (2024-07-23T13:40:54Z)
- FLAME: Factuality-Aware Alignment for Large Language Models [86.76336610282401]
The conventional alignment process fails to enhance the factual accuracy of large language models (LLMs).
We identify factors that lead to hallucination in both alignment steps: supervised fine-tuning (SFT) and reinforcement learning (RL).
We propose factuality-aware alignment, consisting of factuality-aware SFT and factuality-aware RL through direct preference optimization.
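For context, the RL step builds on direct preference optimization (DPO); below is the standard DPO loss in PyTorch, a generic sketch in which the "chosen" response would be the more factual one under factuality-aware alignment (not FLAME's actual code).

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta: float = 0.1):
    """Standard DPO objective over per-sequence log-probabilities.

    Under factuality-aware alignment, the 'chosen' response in each
    preference pair would be the more factual one.
    """
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy per-sequence log-probabilities
loss = dpo_loss(torch.tensor([-5.0]), torch.tensor([-6.0]),
                torch.tensor([-5.2]), torch.tensor([-5.8]))
print(loss.item())
```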
arXiv Detail & Related papers (2024-05-02T17:54:54Z)
- NoMIRACL: Knowing When You Don't Know for Robust Multilingual Retrieval-Augmented Generation [92.5132418788568]
Retrieval-augmented generation (RAG) grounds large language model (LLM) output by leveraging external knowledge sources to reduce factual hallucinations.
NoMIRACL is a human-annotated dataset for evaluating LLM robustness in RAG across 18 typologically diverse languages.
We measure robustness using two metrics: (i) hallucination rate, the model's tendency to hallucinate an answer when no answer is present in the passages of the non-relevant subset, and (ii) error rate, the model's failure to recognize relevant passages in the relevant subset.
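In code, the two metrics reduce to simple rates over the two subsets; the field names below are hypothetical, not NoMIRACL's schema.

```python
def hallucination_rate(non_relevant_results):
    """Fraction of non-relevant-subset queries where the model answered
    anyway instead of declining (i.e., it hallucinated an answer)."""
    answered = sum(1 for r in non_relevant_results if r["model_answered"])
    return answered / len(non_relevant_results)

def error_rate(relevant_results):
    """Fraction of relevant-subset queries where the model failed to
    recognize the relevant passage."""
    missed = sum(1 for r in relevant_results if not r["recognized_relevant"])
    return missed / len(relevant_results)

non_relevant = [{"model_answered": True}, {"model_answered": False}]
relevant = [{"recognized_relevant": True}, {"recognized_relevant": True}]
print(hallucination_rate(non_relevant), error_rate(relevant))  # 0.5 0.0
```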
arXiv Detail & Related papers (2023-12-18T17:18:04Z)
- Persian Semantic Role Labeling Using Transfer Learning and BERT-Based Models [5.592292907237565]
We present an end-to-end SRL method that eliminates the need for feature extraction and outperforms existing methods on new samples.
The proposed method employs no auxiliary features and achieves 83.16% accuracy, an improvement of more than 16 percentage points over previous methods under similar circumstances.
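As a rough picture of such an end-to-end, feature-free setup, SRL can be cast as token classification over a pretrained encoder. This is a generic sketch, not the paper's Persian model; the encoder name and label set are assumptions.

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer

# Illustrative label set and encoder; the paper's Persian setup differs.
LABELS = ["O", "B-ARG0", "I-ARG0", "B-ARG1", "I-ARG1", "B-V"]
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=len(LABELS))

inputs = tokenizer("He opened the door", return_tensors="pt")
logits = model(**inputs).logits        # (1, seq_len, num_labels)
pred = logits.argmax(-1)[0]            # one role label per subword
# The classification head is untrained here, so labels are random
# until the model is fine-tuned on SRL data.
print([LABELS[int(i)] for i in pred])
```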
arXiv Detail & Related papers (2023-06-17T12:50:09Z)
- CharSpan: Utilizing Lexical Similarity to Enable Zero-Shot Machine Translation for Extremely Low-resource Languages [22.51558549091902]
We address the task of machine translation (MT) from an extremely low-resource language (ELRL) to English by leveraging cross-lingual transfer from a 'closely-related' high-resource language (HRL).
Many ELRLs share lexical similarities with some HRLs, which presents a novel modeling opportunity.
Existing subword-based neural MT models do not explicitly harness this lexical similarity, as they only implicitly align the HRL and ELRL latent embedding spaces.
We propose CharSpan, a novel approach based on 'character-span noise augmentation' of the HRL training data.
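The idea is easy to sketch: corrupt random character spans in HRL training words so the model stops relying on exact lexical matches. Span lengths and noise operations below are assumptions, not the paper's exact recipe.

```python
import random

def char_span_noise(word: str, max_span: int = 3) -> str:
    """Delete or replace a random character span inside one word."""
    if len(word) < 2:
        return word
    span = random.randint(1, min(max_span, len(word) - 1))
    start = random.randint(0, len(word) - span)
    if random.random() < 0.5:                     # deletion
        return word[:start] + word[start + span:]
    noise = "".join(random.choice("abcdefghijklmnopqrstuvwxyz")
                    for _ in range(span))
    return word[:start] + noise + word[start + span:]  # replacement

random.seed(0)
print([char_span_noise(w) for w in "the cathedral stands tall".split()])
```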
arXiv Detail & Related papers (2023-05-09T07:23:01Z)
- PriMeSRL-Eval: A Practical Quality Metric for Semantic Role Labeling Systems Evaluation [66.79238445033795]
We propose a stricter SRL evaluation metric, PriMeSRL.
We show that PriMeSRL significantly lowers the evaluated quality of all SoTA SRL models.
We also show that PriMeSRL successfully penalizes actual failures in SoTA SRL models.
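As an illustration of what "stricter" can mean in SRL scoring, the sketch below only credits an argument when its predicate's sense is also correct. This is a generic example, not PriMeSRL's actual criterion.

```python
def strict_f1(gold, pred):
    """Credit an argument only if its predicate's sense is also correct;
    one way to make SRL scoring stricter (illustrative only).

    gold/pred: dict mapping predicate -> (sense, {(span, role), ...})
    """
    tp = fp = fn = 0
    for pr, (gold_sense, gold_args) in gold.items():
        pred_sense, pred_args = pred.get(pr, (None, set()))
        correct = gold_args & pred_args if pred_sense == gold_sense else set()
        tp += len(correct)
        fn += len(gold_args - correct)
        fp += len(pred_args - correct)
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

gold = {"open.01": ("01", {((1, 1), "A0"), ((3, 4), "A1")})}
pred = {"open.01": ("02", {((1, 1), "A0"), ((3, 4), "A1")})}
print(strict_f1(gold, pred))   # 0.0: right arguments, wrong sense
```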
arXiv Detail & Related papers (2022-10-12T17:04:28Z)
- Is Reinforcement Learning (Not) for Natural Language Processing?: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization [73.74371798168642]
We introduce an open-source modular library, RL4LMs, for optimizing language generators with reinforcement learning.
Next, we present the GRUE benchmark, a set of 6 language generation tasks which are supervised not by target strings, but by reward functions.
Finally, we introduce an easy-to-use, performant RL algorithm, NLPO, that learns to effectively reduce the action space in language generation.
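NLPO's action-space reduction is commonly described via top-p (nucleus) masking of the vocabulary at each generation step. The sketch below shows that masking in isolation, a simplification of the full algorithm, which RL4LMs implements with a periodically updated masking policy.

```python
import torch

def top_p_mask(logits: torch.Tensor, p: float = 0.9) -> torch.Tensor:
    """Keep only the smallest token set whose probability mass >= p;
    mask the rest so the RL policy's action space shrinks."""
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_idx = probs.sort(descending=True)
    # Cumulative mass *before* each token; the top token is always kept.
    keep = sorted_probs.cumsum(-1) - sorted_probs < p
    masked = torch.full_like(logits, float("-inf"))
    masked[sorted_idx[keep]] = logits[sorted_idx[keep]]
    return masked

logits = torch.tensor([2.0, 1.0, 0.5, -1.0, -3.0])
print(top_p_mask(logits, p=0.8))
```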
arXiv Detail & Related papers (2022-10-03T21:38:29Z)
- Overlap-based Vocabulary Generation Improves Cross-lingual Transfer Among Related Languages [18.862296065737347]
We argue that lexical overlap among related languages within a language family can be leveraged to overcome some of the corpus limitations of LRLs.
We propose Overlap BPE, a simple yet effective modification to the BPE vocabulary generation algorithm which enhances overlap across related languages.
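The intuition can be rendered as a toy merge-selection rule that favors symbol pairs attested in multiple related languages; Overlap BPE's actual scoring may differ.

```python
from collections import Counter

def pick_merge(pair_counts_per_lang):
    """Choose the next BPE merge, favoring symbol pairs that occur in
    more of the related languages (toy version of overlap-aware BPE).

    pair_counts_per_lang: list of Counter({(sym_a, sym_b): freq}),
                          one Counter per language.
    """
    total = Counter()
    langs_with_pair = Counter()
    for counts in pair_counts_per_lang:
        total.update(counts)
        langs_with_pair.update(counts.keys())
    # Rank by (number of languages sharing the pair, total frequency).
    return max(total, key=lambda pair: (langs_with_pair[pair], total[pair]))

hi = Counter({("t", "h"): 50, ("a", "n"): 30})
mr = Counter({("a", "n"): 25, ("k", "a"): 40})
print(pick_merge([hi, mr]))   # ('a', 'n') -- shared across both languages
```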
arXiv Detail & Related papers (2022-03-03T19:35:24Z)
- Syntax Role for Neural Semantic Role Labeling [77.5166510071142]
Semantic role labeling (SRL) is dedicated to recognizing the semantic predicate-argument structure of a sentence.
Previous studies of traditional models have shown that syntactic information can contribute substantially to SRL performance.
Recent neural SRL studies show that syntactic information is much less important for neural semantic role labeling.
arXiv Detail & Related papers (2020-09-12T07:01:12Z)
- Cross-lingual Semantic Role Labeling with Model Transfer [49.85316125365497]
Cross-lingual semantic role labeling can be achieved by model transfer with the help of universal features.
We propose an end-to-end SRL model that incorporates a variety of universal features and transfer methods.
arXiv Detail & Related papers (2020-08-24T09:37:45Z)