Verb Knowledge Injection for Multilingual Event Processing
- URL: http://arxiv.org/abs/2012.15421v1
- Date: Thu, 31 Dec 2020 03:24:34 GMT
- Title: Verb Knowledge Injection for Multilingual Event Processing
- Authors: Olga Majewska, Ivan Vulić, Goran Glavaš, Edoardo M. Ponti, Anna Korhonen
- Abstract summary: We investigate whether injecting explicit information on verbs' semantic-syntactic behaviour improves the performance of LM-pretrained Transformers.
We first demonstrate that injecting verb knowledge leads to performance gains in English event extraction.
We then explore the utility of verb adapters for event extraction in other languages.
- Score: 50.27826310460763
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In parallel to their overwhelming success across NLP tasks, the language ability
of deep Transformer networks pretrained via language modeling (LM) objectives
has undergone extensive scrutiny. While probing has revealed that these models
encode a range of syntactic and semantic properties of a language, they are
still prone to falling back on superficial cues and simple heuristics to solve
downstream tasks, rather than leveraging deeper linguistic knowledge. In this
paper, we target one such area of their deficiency, verbal reasoning. We
investigate whether injecting explicit information on verbs' semantic-syntactic
behaviour improves the performance of LM-pretrained Transformers in event
extraction tasks -- downstream tasks for which accurate verb processing is
paramount. Concretely, we inject verb knowledge from curated lexical
resources into dedicated adapter modules (dubbed verb adapters), allowing it to
complement, in downstream tasks, the language knowledge obtained during
LM-pretraining. We first demonstrate that injecting verb knowledge leads to
performance gains in English event extraction. We then explore the utility of
verb adapters for event extraction in other languages: we investigate (1)
zero-shot language transfer with multilingual Transformers as well as (2)
transfer via (noisy automatic) translation of English verb-based lexical
constraints. Our results show that the benefits of verb knowledge injection
indeed extend to other languages, even when verb adapters are trained on
noisily translated constraints.
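The abstract does not spell out what a verb adapter looks like. As a rough illustration, the sketch below shows a generic bottleneck adapter of the kind typically inserted into a frozen pretrained Transformer, where only the adapter parameters are updated when training on verb-class constraints from the lexical resource. The hidden and bottleneck sizes, and the placement inside the Transformer block, are assumptions rather than the paper's exact configuration.

```python
# Minimal sketch of a "verb adapter": a bottleneck module added to a frozen
# pretrained Transformer layer. Dimensions and placement are illustrative
# assumptions, not the paper's exact setup.
import torch
import torch.nn as nn

class VerbAdapter(nn.Module):
    def __init__(self, hidden_size: int = 768, bottleneck_size: int = 64):
        super().__init__()
        self.down_proj = nn.Linear(hidden_size, bottleneck_size)
        self.up_proj = nn.Linear(bottleneck_size, hidden_size)
        self.activation = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual connection preserves the pretrained representation;
        # only these adapter weights are trained on verb-class constraints.
        return hidden_states + self.up_proj(
            self.activation(self.down_proj(hidden_states))
        )

# Typical usage: freeze the backbone and train only the adapter parameters.
# for p in pretrained_encoder.parameters():
#     p.requires_grad = False
```

Because the adapter is residual, the pretrained representations pass through unchanged when the adapter weights are near zero, which is what lets the injected verb knowledge complement, rather than overwrite, the knowledge acquired during LM-pretraining.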
Related papers
- Adapters for Altering LLM Vocabularies: What Languages Benefit the Most? [23.83290627671739]
We propose a novel method for vocabulary adaptation using adapter modules that are trained to learn the optimal linear combination of existing embeddings.
VocADT offers a flexible and scalable solution without requiring external resources or language constraints.
We find that Latin-script languages and highly fragmented languages benefit the most from vocabulary adaptation.
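As a rough, hedged sketch of the idea summarised above: each new-vocabulary embedding can be parameterised as a learned, normalised mixture of the frozen original embedding rows. The softmax normalisation, initialisation, and sizes below are assumptions for illustration, not VocADT's exact formulation.

```python
# Illustrative sketch: new vocabulary embeddings as a learned linear
# combination of frozen original embeddings (assumed formulation).
import torch
import torch.nn as nn

class VocabMixAdapter(nn.Module):
    def __init__(self, old_embeddings: torch.Tensor, new_vocab_size: int):
        super().__init__()
        self.register_buffer("old_embeddings", old_embeddings)  # kept frozen
        old_vocab_size = old_embeddings.size(0)
        # One row of mixing coefficients per new-vocabulary token.
        self.mix_logits = nn.Parameter(torch.zeros(new_vocab_size, old_vocab_size))

    def forward(self) -> torch.Tensor:
        # Each new embedding is a softmax-weighted mixture of the old rows.
        weights = torch.softmax(self.mix_logits, dim=-1)
        return weights @ self.old_embeddings  # (new_vocab_size, hidden_size)

# Toy usage:
# adapter = VocabMixAdapter(torch.randn(1000, 64), new_vocab_size=1500)
# new_embedding_matrix = adapter()
```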
arXiv Detail & Related papers (2024-10-12T20:45:24Z)
- The Impact of Language Adapters in Cross-Lingual Transfer for NLU [0.8702432681310401]
We study the effect of including a target-language adapter in detailed ablation studies with two multilingual models and three multilingual datasets.
Our results show that the effect of target-language adapters is highly inconsistent across tasks, languages and models.
Removing the language adapter after training has only a weak negative effect, indicating that the language adapters do not have a strong impact on the predictions.
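To make that ablation concrete, the sketch below shows MAD-X-style stacking of a language adapter and a task adapter, with a flag that bypasses the language adapter so it can be "removed" after training. The bottleneck sizes and placement are illustrative assumptions, not the paper's exact setup.

```python
# Illustrative sketch of stacked language + task adapters; the language
# adapter can be bypassed at inference to mimic removing it after training.
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    def __init__(self, hidden: int = 768, reduced: int = 48):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden, reduced), nn.ReLU(), nn.Linear(reduced, hidden)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.net(x)

class StackedAdapters(nn.Module):
    def __init__(self, hidden: int = 768):
        super().__init__()
        self.language_adapter = Bottleneck(hidden)
        self.task_adapter = Bottleneck(hidden)

    def forward(self, hidden_states: torch.Tensor,
                use_language_adapter: bool = True) -> torch.Tensor:
        if use_language_adapter:
            hidden_states = self.language_adapter(hidden_states)
        return self.task_adapter(hidden_states)
```

Calling the module with use_language_adapter=False corresponds to the ablation in which the language adapter is dropped after training.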
arXiv Detail & Related papers (2024-01-31T20:07:43Z)
- Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z)
- Exposing Cross-Lingual Lexical Knowledge from Multilingual Sentence Encoders [85.80950708769923]
We probe multilingual language models for the amount of cross-lingual lexical knowledge stored in their parameters.
We also devise a novel method to expose this knowledge by additionally fine-tuning multilingual models, and compare the fine-tuned variants against the original multilingual LMs.
We report substantial gains on standard benchmarks.
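A common way to quantify such cross-lingual lexical knowledge is bilingual lexicon induction: embed words in two languages with the same encoder and retrieve translations by cosine similarity. The sketch below uses toy random vectors in place of real encoder outputs; the paper's actual probing protocol may differ.

```python
# Illustrative bilingual-lexicon-induction probe: for each source word vector,
# retrieve the nearest target word vector by cosine similarity.
import torch
import torch.nn.functional as F

def retrieve_translations(src_vecs: torch.Tensor, tgt_vecs: torch.Tensor) -> torch.Tensor:
    src = F.normalize(src_vecs, dim=-1)
    tgt = F.normalize(tgt_vecs, dim=-1)
    similarity = src @ tgt.T          # (n_src, n_tgt) cosine scores
    return similarity.argmax(dim=-1)  # index of best target word per source word

# Toy example with random vectors standing in for word representations:
src = torch.randn(100, 768)   # e.g. source-language word embeddings
tgt = torch.randn(120, 768)   # e.g. target-language word embeddings
predicted_indices = retrieve_translations(src, tgt)
```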
arXiv Detail & Related papers (2022-04-30T13:23:16Z)
- From Masked Language Modeling to Translation: Non-English Auxiliary Tasks Improve Zero-shot Spoken Language Understanding [24.149299722716155]
We introduce xSID, a new benchmark for cross-lingual Slot and Intent Detection in 13 languages from 6 language families, including a very low-resource dialect.
We propose a joint learning approach, with English SLU training data and non-English auxiliary tasks from raw text, syntax and translation for transfer.
Our results show that jointly learning the main tasks with masked language modeling is effective for slots, while machine translation transfer works best for intent classification.
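The joint-learning recipe boils down to optimising a weighted sum of the main slot and intent losses plus the auxiliary objectives. A minimal sketch, with the weighting scheme assumed rather than taken from the paper:

```python
# Minimal sketch of joint multi-task training: main slot/intent losses plus
# auxiliary objectives (e.g. masked LM or translation), with assumed weights.
import torch

def joint_loss(slot_loss: torch.Tensor,
               intent_loss: torch.Tensor,
               aux_losses: dict,
               aux_weights: dict) -> torch.Tensor:
    total = slot_loss + intent_loss
    for name, loss in aux_losses.items():
        total = total + aux_weights.get(name, 1.0) * loss
    return total

# Example (hypothetical weights):
# total = joint_loss(slot_loss, intent_loss,
#                    {"mlm": mlm_loss, "mt": translation_loss},
#                    {"mlm": 0.5, "mt": 0.5})
```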
arXiv Detail & Related papers (2021-05-15T23:51:11Z)
- Multitasking Inhibits Semantic Drift [46.71462510028727]
We study the dynamics of learning in latent language policies (LLPs).
LLPs can solve challenging long-horizon reinforcement learning problems.
Previous work has found that LLP training is prone to semantic drift.
arXiv Detail & Related papers (2021-04-15T03:42:17Z)
- VECO: Variable and Flexible Cross-lingual Pre-training for Language Understanding and Generation [77.82373082024934]
We plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages.
It can effectively avoid the degeneration of predicting masked words only conditioned on the context in its own language.
The proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark.
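The plugged-in module can be sketched as a standard multi-head cross-attention sub-layer whose queries come from one language's hidden states and whose keys and values come from the parallel sentence in the other language. This is a simplified stand-in, not VECO's exact architecture.

```python
# Simplified sketch of a cross-attention sub-layer between parallel sentences:
# queries from language A, keys/values from language B.
import torch
import torch.nn as nn

class CrossLingualAttention(nn.Module):
    def __init__(self, hidden_size: int = 768, num_heads: int = 12):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(hidden_size)

    def forward(self, states_a: torch.Tensor, states_b: torch.Tensor) -> torch.Tensor:
        # Each position in sentence A attends to all positions in sentence B,
        # so masked words in A are not predicted from A's own context alone.
        attended, _ = self.attn(states_a, states_b, states_b)
        return self.norm(states_a + attended)
```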
arXiv Detail & Related papers (2020-10-30T03:41:38Z)
- FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding [85.29270319872597]
We propose an enhanced fusion method that takes cross-lingual data as input for XLM finetuning.
During inference, the model makes predictions based on the text input in the target language and its translation in the source language.
We further propose an additional KL-divergence self-teaching loss for model training, based on auto-generated soft pseudo-labels for translated text in the target language.
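The self-teaching term amounts to a KL divergence that pulls the model's predictions on translated target-language text towards the auto-generated soft pseudo-labels. A minimal sketch, with the source of the pseudo-labels and the reduction assumed:

```python
# Minimal sketch of a KL-divergence self-teaching loss: the model's predictions
# on target-language text are pulled towards soft pseudo-labels.
import torch
import torch.nn.functional as F

def self_teaching_loss(student_logits: torch.Tensor,
                       soft_pseudo_labels: torch.Tensor) -> torch.Tensor:
    # student_logits: (batch, num_classes) raw scores on translated target text
    # soft_pseudo_labels: (batch, num_classes) probabilities, assumed here to be
    # auto-generated by the model itself (an assumption for this sketch)
    log_probs = F.log_softmax(student_logits, dim=-1)
    return F.kl_div(log_probs, soft_pseudo_labels, reduction="batchmean")
```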
arXiv Detail & Related papers (2020-09-10T22:42:15Z)
- On the Importance of Word Order Information in Cross-lingual Sequence Labeling [80.65425412067464]
Cross-lingual models fitted to the word order of the source language might fail to handle target languages with a different word order.
We investigate whether making models insensitive to the word order of the source language can improve the adaptation performance in target languages.
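Purely as an illustration of order-insensitive training (not necessarily the technique used in the paper), one can randomly permute the tokens of source-language training sentences together with their labels, so the model cannot rely on source word order:

```python
# Illustrative word-order augmentation for sequence labeling: permute tokens
# and labels with the same random permutation. Not the paper's exact method.
import torch

def shuffle_word_order(token_ids: torch.Tensor, labels: torch.Tensor):
    # token_ids, labels: 1-D tensors of equal length for one sentence
    perm = torch.randperm(token_ids.size(0))
    return token_ids[perm], labels[perm]

# Toy example:
# tokens = torch.tensor([101, 2023, 2003, 1037, 3231, 102])
# tags   = torch.tensor([0, 1, 2, 0, 3, 0])
# shuffled_tokens, shuffled_tags = shuffle_word_order(tokens, tags)
```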
arXiv Detail & Related papers (2020-01-30T03:35:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.