Multilingual Word Sense Disambiguation with Unified Sense Representation
- URL: http://arxiv.org/abs/2210.07447v1
- Date: Fri, 14 Oct 2022 01:24:03 GMT
- Title: Multilingual Word Sense Disambiguation with Unified Sense Representation
- Authors: Ying Su, Hongming Zhang, Yangqiu Song, Tong Zhang
- Abstract summary: We propose building knowledge- and supervision-based Multilingual Word Sense Disambiguation (MWSD) systems.
We build unified sense representations for multiple languages and address the annotation scarcity problem for MWSD by transferring annotations from resource-rich languages to resource-poor ones.
Evaluations on the SemEval-13 and SemEval-15 datasets demonstrate the effectiveness of our methodology.
- Score: 55.3061179361177
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As a key natural language processing (NLP) task, word sense disambiguation
(WSD) evaluates how well NLP models can understand the lexical semantics of
words in specific contexts. Benefiting from large-scale annotation,
current WSD systems have achieved impressive performance in English by
combining supervised learning with lexical knowledge. However, such success is
hard to replicate in other languages, where only limited annotations are
available. In this paper, based on the multilingual lexicon BabelNet, which
describes the same set of concepts across languages, we propose building
knowledge- and supervision-based Multilingual Word Sense Disambiguation (MWSD)
systems. We build unified sense representations for multiple languages and
address the annotation scarcity problem for MWSD by transferring annotations
from resource-rich languages to resource-poor ones. With the unified sense
representations, annotations from multiple languages can be used for joint
training to benefit the MWSD tasks. Evaluations on the SemEval-13 and
SemEval-15 datasets demonstrate the effectiveness of our methodology.
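At its core, the approach keys every sense to a language-independent BabelNet synset ID, so a training example from any language supervises the same sense representation. The sketch below illustrates that mechanism with toy data; the lexicon entries, synset IDs, and random sense vectors are illustrative assumptions, not the paper's actual resources or implementation.

```python
import numpy as np

# Toy multilingual lexicon in the spirit of BabelNet: (lemma, language)
# pairs map to candidate synset IDs shared across languages. These entries
# and IDs are hypothetical placeholders, not real BabelNet data.
LEXICON = {
    ("bank", "en"): ["bn:001n", "bn:002n"],   # financial institution vs. riverside
    ("banca", "it"): ["bn:001n"],             # Italian 'banca' -> institution
    ("riva", "it"): ["bn:002n"],              # Italian 'riva' -> riverside
}

# One unified representation per synset, shared by every language. Random
# stand-ins here; the paper learns them jointly from multilingual
# annotations and lexical knowledge.
rng = np.random.default_rng(0)
ALL_SYNSETS = {sid for ids in LEXICON.values() for sid in ids}
SENSE_VECS = {sid: rng.normal(size=8) for sid in ALL_SYNSETS}

def disambiguate(lemma: str, lang: str, context_vec: np.ndarray) -> str:
    """Score each candidate synset's unified vector against the target
    word's contextual embedding and return the best match."""
    candidates = LEXICON[(lemma, lang)]
    return max(candidates, key=lambda sid: float(context_vec @ SENSE_VECS[sid]))

# Because training targets are language-independent synset IDs, an English
# example annotated with bn:001n supervises exactly the representation an
# Italian example for 'banca' would use -- this is what makes transferring
# annotations from resource-rich to resource-poor languages possible.
context_vec = rng.normal(size=8)  # stand-in for a contextual encoder output
print(disambiguate("banca", "it", context_vec))
```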
Related papers
- Improving Multilingual Neural Machine Translation by Utilizing Semantic and Linguistic Features [18.76505158652759]
We propose to exploit both semantic and linguistic features across multiple languages to enhance multilingual translation.
On the encoder side, we introduce a disentangling learning task that aligns encoder representations by disentangling semantic and linguistic features.
On the decoder side, we leverage a linguistic encoder to integrate low-level linguistic features to assist in the target language generation.
arXiv Detail & Related papers (2024-08-02T17:10:12Z)
- Teaching a Multilingual Large Language Model to Understand Multilingual Speech via Multi-Instructional Training [29.47243668154796]
BLOOMZMMS is a novel model that integrates a multilingual LLM with a multilingual speech encoder.
We demonstrate the transferability of linguistic knowledge from the text to the speech modality.
Our zero-shot evaluation results confirm the robustness of our approach across multiple tasks.
arXiv Detail & Related papers (2024-04-16T21:45:59Z)
- Exploring Multilingual Concepts of Human Value in Large Language Models: Is Value Alignment Consistent, Transferable and Controllable across Languages? [34.38469832305664]
This paper focuses on human values-related concepts (i.e., value concepts) due to their significance for AI safety.
We first empirically confirm the presence of value concepts within LLMs in a multilingual format.
Further analysis of the cross-lingual characteristics of these concepts reveals three traits arising from language-resource disparities.
arXiv Detail & Related papers (2024-02-28T07:18:39Z)
- Semantic enrichment towards efficient speech representations [9.30840529284715]
This study investigates a specific in-domain semantic enrichment of the SAMU-XLSR model.
We show the benefits of using same-domain French and Italian benchmarks for low-resource language portability.
arXiv Detail & Related papers (2023-07-03T19:52:56Z)
- Translate to Disambiguate: Zero-shot Multilingual Word Sense Disambiguation with Pretrained Language Models [67.19567060894563]
Pretrained Language Models (PLMs) learn rich cross-lingual knowledge and can be finetuned to perform well on diverse tasks.
We present a new study investigating how well PLMs capture cross-lingual word sense with Contextual Word-Level Translation (C-WLT).
We find that as the model size increases, PLMs encode more cross-lingual word sense knowledge and better use context to improve WLT performance.
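The C-WLT probe can be pictured as translate-then-look-up: if the model renders "bank" as "riva" in an Italian translation, the riverside sense is implied. A minimal sketch of that idea follows, assuming a hypothetical lm_generate completion stub, an invented prompt wording, and a toy sense-to-translation table, none of which are the paper's exact setup.

```python
# Toy mapping from each candidate sense to target-language translations it
# licenses; illustrative entries, not the paper's resource.
SENSE_TRANSLATIONS = {
    "bank.institution": {"banca"},
    "bank.riverside": {"riva", "sponda"},
}

def lm_generate(prompt: str) -> str:
    """Stand-in for a pretrained LM's completion call; returns a canned
    output here so the sketch runs end to end."""
    return "riva"

def cwlt_disambiguate(sentence: str, word: str, tgt_lang: str = "Italian") -> str:
    """Ask the (stand-in) PLM to translate the target word in context,
    then pick the sense whose known translations contain the output."""
    prompt = (
        f"Sentence: {sentence}\n"
        f"Translate the word '{word}' into {tgt_lang}:"
    )
    translation = lm_generate(prompt).strip().lower()
    for sense, options in SENSE_TRANSLATIONS.items():
        if translation in options:
            return sense
    return "unknown"

print(cwlt_disambiguate("He sat on the bank of the river.", "bank"))
```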
arXiv Detail & Related papers (2023-04-26T19:55:52Z)
- Multi-level Contrastive Learning for Cross-lingual Spoken Language Understanding [90.87454350016121]
We develop novel code-switching schemes to generate hard negative examples for contrastive learning at all levels.
We develop a label-aware joint model to leverage label semantics for cross-lingual knowledge transfer.
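The code-switching idea above can be pictured as producing utterances that overlap lexically with a positive pair but mix languages, making them hard negatives for a contrastive objective. A minimal sketch under that reading, with a toy dictionary and a hypothetical swap ratio:

```python
import random

# Toy bilingual dictionary for illustration; the paper's code-switching
# schemes and resources are more elaborate than this stand-in.
EN_TO_ZH = {"play": "播放", "music": "音乐", "weather": "天气"}

def code_switch(utterance: str, ratio: float = 0.5, seed: int = 0) -> str:
    """Build a hard negative by swapping a fraction of words for their
    translations, yielding a code-switched utterance that is lexically
    close to, yet distributionally unlike, the original."""
    rng = random.Random(seed)
    return " ".join(
        EN_TO_ZH[w] if w in EN_TO_ZH and rng.random() < ratio else w
        for w in utterance.split()
    )

# In a contrastive objective, the original utterance and its translation
# form a positive pair, while code-switched variants serve as hard negatives.
print(code_switch("play some music"))
```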
arXiv Detail & Related papers (2022-05-07T13:44:28Z)
- Exposing Cross-Lingual Lexical Knowledge from Multilingual Sentence Encoders [85.80950708769923]
We probe multilingual language models for the amount of cross-lingual lexical knowledge stored in their parameters.
We also devise a novel method to expose this knowledge by additionally fine-tuning multilingual models, and compare the resulting models against the original multilingual LMs.
We report substantial gains on standard benchmarks.
arXiv Detail & Related papers (2022-04-30T13:23:16Z)
- AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z)
- Looking for Clues of Language in Multilingual BERT to Improve Cross-lingual Generalization [56.87201892585477]
Token embeddings in multilingual BERT (m-BERT) contain both language and semantic information.
We control the output languages of multilingual BERT by manipulating the token embeddings.
arXiv Detail & Related papers (2020-10-20T05:41:35Z)
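One way to read "manipulating the token embeddings" is as shifting embeddings along a mean-difference direction between two languages. The sketch below shows that operation on random stand-in arrays rather than real m-BERT weights; the mean-difference construction and the scale parameter are assumptions about the general idea, not the paper's exact procedure.

```python
import numpy as np

# Random stand-ins for the token embeddings of two languages' vocabularies;
# in the paper's setting these would come from multilingual BERT.
rng = np.random.default_rng(0)
en_token_embs = rng.normal(size=(1000, 768))  # English token embeddings
fr_token_embs = rng.normal(size=(1000, 768))  # French token embeddings

# Per-language mean embeddings approximate the "language information"
# component that is mixed into each token embedding alongside semantics.
en_mean = en_token_embs.mean(axis=0)
fr_mean = fr_token_embs.mean(axis=0)

def shift_language(token_emb: np.ndarray, scale: float = 1.0) -> np.ndarray:
    """Move a token embedding from the English region of the space toward
    the French region while (ideally) preserving its semantic component."""
    return token_emb + scale * (fr_mean - en_mean)

shifted = shift_language(en_token_embs[0])
print(shifted.shape)  # (768,)
```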