Sememe Prediction for BabelNet Synsets using Multilingual and Multimodal
Information
- URL: http://arxiv.org/abs/2203.07426v1
- Date: Mon, 14 Mar 2022 18:37:09 GMT
- Title: Sememe Prediction for BabelNet Synsets using Multilingual and Multimodal
Information
- Authors: Fanchao Qi, Chuancheng Lv, Zhiyuan Liu, Xiaojun Meng, Maosong Sun,
Hai-Tao Zheng
- Abstract summary: Sememe knowledge bases (KBs) are built by manually annotating words with sememes.
Existing sememe KBs only cover a few languages, which hinders the wide utilization of sememes.
This paper aims to build a multilingual sememe KB based on BabelNet, a multilingual encyclopedia dictionary.
- Score: 89.24684041258747
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In linguistics, a sememe is defined as the minimum semantic unit of
languages. Sememe knowledge bases (KBs), which are built by manually annotating
words with sememes, have been successfully applied to various NLP tasks.
However, existing sememe KBs only cover a few languages, which hinders the wide
utilization of sememes. To address this issue, the task of sememe prediction
for BabelNet synsets (SPBS) is presented, aiming to build a multilingual sememe
KB based on BabelNet, a multilingual encyclopedia dictionary. By automatically
predicting sememes for a BabelNet synset, the words in many languages in the
synset would obtain sememe annotations simultaneously. However, previous SPBS
methods have not taken full advantage of the abundant information in BabelNet.
In this paper, we utilize the multilingual synonyms, multilingual glosses and
images in BabelNet for SPBS. We design a multimodal information fusion model to
encode and combine this information for sememe prediction. Experimental results
show that our model substantially outperforms previous methods (by about 10
points in both MAP and F1). All the code and data of this paper can be obtained at
https://github.com/thunlp/MSGI.
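The abstract reports results in MAP and F1. As a rough illustration of how these two metrics are commonly computed for a multi-label task such as sememe prediction, here is a minimal sketch. It is not taken from the MSGI repository: the function names, the synset ID, the example scores, and the 0.5 decision threshold are all hypothetical assumptions made for illustration, and the paper's exact evaluation protocol may differ.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], gold: Set[str]) -> float:
    """Average precision of one ranked sememe list against its gold set."""
    hits = 0
    precision_sum = 0.0
    for rank, sememe in enumerate(ranked, start=1):
        if sememe in gold:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / len(gold) if gold else 0.0


def f1_score(predicted: Set[str], gold: Set[str]) -> float:
    """Set-based F1 between predicted and gold sememes."""
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def evaluate(
    scores: Dict[str, Dict[str, float]],
    gold: Dict[str, Set[str]],
    threshold: float = 0.5,  # assumed cutoff, not from the paper
) -> Dict[str, float]:
    """Mean AP and mean F1 over all synsets in `scores`."""
    if not scores:
        raise ValueError("no synsets to evaluate")
    ap_values, f1_values = [], []
    for synset_id, sememe_scores in scores.items():
        # Rank sememes by descending score for AP.
        ranked = sorted(sememe_scores, key=sememe_scores.get, reverse=True)
        # Keep sememes at or above the threshold for F1.
        predicted = {s for s, p in sememe_scores.items() if p >= threshold}
        ap_values.append(average_precision(ranked, gold[synset_id]))
        f1_values.append(f1_score(predicted, gold[synset_id]))
    n = len(ap_values)
    return {"MAP": sum(ap_values) / n, "F1": sum(f1_values) / n}


# Hypothetical synset ID, sememes, and scores, for illustration only.
example_scores = {
    "bn:example": {"human": 0.9, "education": 0.7, "animal": 0.6, "occupation": 0.2}
}
example_gold = {"bn:example": {"human", "education"}}
result = evaluate(example_scores, example_gold)
```

In this toy case both gold sememes are ranked first, so the average precision is 1.0, while the extra above-threshold prediction ("animal") lowers precision to 2/3 and F1 to 0.8.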
Related papers
- MaiBaam: A Multi-Dialectal Bavarian Universal Dependency Treebank [56.810282574817414]
We present the first multi-dialect Bavarian treebank (MaiBaam) manually annotated with part-of-speech and syntactic dependency information in Universal Dependencies (UD).
We highlight the morphosyntactic differences between the closely-related Bavarian and German and showcase the rich variability of speakers' orthographies.
Our corpus includes 15k tokens, covering dialects from all Bavarian-speaking areas spanning three countries.
arXiv Detail & Related papers (2024-03-15T13:33:10Z)
- Babel-ImageNet: Massively Multilingual Evaluation of Vision-and-Language Representations [53.89380284760555]
We introduce Babel-ImageNet, a massively multilingual benchmark that offers partial translations of ImageNet labels to 100 languages.
We evaluate 11 public multilingual CLIP models on our benchmark, demonstrating a significant gap between English ImageNet performance and that of high-resource languages.
We show that the performance of multilingual CLIP can be drastically improved for low-resource languages with parameter-efficient language-specific training.
arXiv Detail & Related papers (2023-06-14T17:53:06Z)
- Machine-Created Universal Language for Cross-lingual Transfer [73.44138687502294]
We propose a new Machine-created Universal Language (MUL) as an alternative intermediate language.
MUL comprises a set of discrete symbols forming a universal vocabulary and a natural language to MUL translator.
MUL unifies shared concepts from various languages into a single universal word, enhancing cross-language transfer.
arXiv Detail & Related papers (2023-05-22T14:41:09Z)
- Investigating the Translation Performance of a Large Multilingual Language Model: the Case of BLOOM [8.858671209228536]
We focus on BLOOM's multilingual ability by evaluating its machine translation performance across several datasets.
We study several aspects including prompt design, model sizes, cross-lingual transfer and the use of discursive context.
arXiv Detail & Related papers (2023-03-03T13:23:42Z)
- Prix-LM: Pretraining for Multilingual Knowledge Base Construction [59.02868906044296]
We propose a unified framework, Prix-LM, for multilingual knowledge construction and completion.
We leverage two types of knowledge, monolingual triples and cross-lingual links, extracted from existing multilingual KBs.
Experiments on standard entity-related tasks, such as link prediction in multiple languages, cross-lingual entity linking and bilingual lexicon induction, demonstrate its effectiveness.
arXiv Detail & Related papers (2021-10-16T02:08:46Z)
- Examining Cross-lingual Contextual Embeddings with Orthogonal Structural Probes [0.2538209532048867]
A novel Orthogonal Structural Probe (Limisiewicz and Mareček, 2021) allows us to answer this question for specific linguistic features.
We evaluate syntactic (UD) and lexical (WordNet) structural information encoded in mBERT's contextual representations for nine diverse languages.
We successfully apply our findings to zero-shot and few-shot cross-lingual parsing.
arXiv Detail & Related papers (2021-09-10T15:03:11Z)
- Discovering Representation Sprachbund For Multilingual Pre-Training [139.05668687865688]
We generate language representation from multilingual pre-trained models and conduct linguistic analysis.
We cluster all the target languages into multiple groups and name each group as a representation sprachbund.
Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
arXiv Detail & Related papers (2021-09-01T09:32:06Z)
- Transferring Knowledge Distillation for Multilingual Social Event Detection [42.663309895263666]
Recently published graph neural networks (GNNs) show promising performance at social event detection tasks.
We present a GNN that incorporates cross-lingual word embeddings for detecting events in multilingual data streams.
Experiments on both synthetic and real-world datasets show the framework to be highly effective at detection in both multilingual data and in languages where training samples are scarce.
arXiv Detail & Related papers (2021-08-06T12:38:42Z)