Prix-LM: Pretraining for Multilingual Knowledge Base Construction
- URL: http://arxiv.org/abs/2110.08443v1
- Date: Sat, 16 Oct 2021 02:08:46 GMT
- Title: Prix-LM: Pretraining for Multilingual Knowledge Base Construction
- Authors: Wenxuan Zhou, Fangyu Liu, Ivan Vulić, Nigel Collier, Muhao Chen
- Abstract summary: We propose a unified framework, Prix-LM, for multilingual knowledge base construction and completion.
We leverage two types of knowledge, monolingual triples and cross-lingual links, extracted from existing multilingual KBs.
Experiments on standard entity-related tasks, such as link prediction in multiple languages, cross-lingual entity linking and bilingual lexicon induction, demonstrate its effectiveness.
- Score: 59.02868906044296
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge bases (KBs) contain plenty of structured world and commonsense
knowledge. As such, they often complement distributional text-based information
and facilitate various downstream tasks. Since their manual construction is
resource- and time-intensive, recent efforts have tried leveraging large
pretrained language models (PLMs) to generate additional monolingual knowledge
facts for KBs. However, such methods have not been attempted for building and
enriching multilingual KBs. Besides wider application, such multilingual KBs
can provide richer combined knowledge than monolingual (e.g., English) KBs.
Knowledge expressed in different languages may be complementary and unequally
distributed: this implies that the knowledge available in high-resource
languages can be transferred to low-resource ones. To achieve this, it is
crucial to represent multilingual knowledge in a shared/unified space. To this
end, we propose a unified framework, Prix-LM, for multilingual KB construction
and completion. We leverage two types of knowledge, monolingual triples and
cross-lingual links, extracted from existing multilingual KBs, and tune a
multilingual language encoder XLM-R via a causal language modeling objective.
Prix-LM integrates useful multilingual and KB-based factual knowledge into a
single model. Experiments on standard entity-related tasks, such as link
prediction in multiple languages, cross-lingual entity linking and bilingual
lexicon induction, demonstrate its effectiveness, with gains reported over
strong task-specialised baselines.
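To make the training recipe above concrete, the sketch below shows one way to linearise monolingual triples and cross-lingual links into plain text and tune XLM-R with a causal language-modelling loss via Hugging Face Transformers. The separator template, the example triples, and the hyperparameters are illustrative assumptions rather than the paper's exact configuration.

```python
# A minimal sketch (assumed format, not Prix-LM's exact recipe): linearise KB
# triples into text and fine-tune XLM-R with a causal language-modelling loss.
import torch
from transformers import AutoTokenizer, XLMRobertaConfig, XLMRobertaForCausalLM

MODEL = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(MODEL)

config = XLMRobertaConfig.from_pretrained(MODEL)
config.is_decoder = True  # use a causal (left-to-right) attention mask
model = XLMRobertaForCausalLM.from_pretrained(MODEL, config=config)


def linearise(subject: str, relation: str, obj: str) -> str:
    """Turn a (subject, relation, object) triple into a text sequence.
    The separator template here is a hypothetical choice for illustration."""
    return f"{subject} </s> {relation} </s> {obj}"


# One monolingual triple and one cross-lingual link, both treated as text.
examples = [
    linearise("Tokio", "Hauptstadt von", "Japan"),  # German KB triple
    linearise("Tokio", "sameAs", "東京"),            # cross-lingual entity link
]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for text in examples:
    batch = tokenizer(text, return_tensors="pt")
    # Causal LM objective: predict each token from its left-hand context.
    outputs = model(**batch, labels=batch["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

Setting is_decoder=True switches XLM-R's attention from the bidirectional masked-LM setup to a left-to-right mask, which is what lets the encoder be trained autoregressively over the linearised sequences.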
Related papers
- Adapting Multilingual LLMs to Low-Resource Languages with Knowledge Graphs via Adapters [3.7273829129985305]
This paper explores the integration of graph knowledge from linguistic ontologies into multilingual Large Language Models (LLMs).
We employ language-specific adapters to improve performance for low-resource languages (LRLs) in sentiment analysis (SA) and named entity recognition (NER).
We assess how structured graph knowledge affects the performance of multilingual LLMs for LRLs in SA and NER.
arXiv Detail & Related papers (2024-07-01T15:56:24Z)
- Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora.
But can these models relate corresponding concepts across languages, effectively being crosslingual?
This study evaluates six state-of-the-art LLMs on inherently crosslingual tasks.
arXiv Detail & Related papers (2024-06-23T15:15:17Z)
- Cross-Lingual Question Answering over Knowledge Base as Reading Comprehension [61.079852289005025]
Cross-lingual question answering over knowledge base (xKBQA) aims to answer questions in languages different from that of the provided knowledge base.
One of the major challenges facing xKBQA is the high cost of data annotation.
We propose a novel approach for xKBQA in a reading comprehension paradigm.
arXiv Detail & Related papers (2023-02-26T05:52:52Z)
- Adapters for Enhanced Modeling of Multilingual Knowledge and Text [54.02078328453149]
Language models have been extended to multilingual language models (MLLMs).
Knowledge graphs contain facts in an explicit triple format, which require careful curation and are only available in a few high-resource languages.
We propose to enhance MLLMs with knowledge from multilingual knowledge graphs (MLKGs) so as to tackle language and knowledge graph tasks across many languages.
arXiv Detail & Related papers (2022-10-24T21:33:42Z)
- Knowledge Based Multilingual Language Model [44.70205282863062]
We present a novel framework to pretrain knowledge-based multilingual language models (KMLMs).
We generate a large amount of code-switched synthetic sentences and reasoning-based multilingual training data using the Wikidata knowledge graphs (a minimal illustrative sketch of the code-switching idea appears after this list).
Based on the intra- and inter-sentence structures of the generated data, we design pretraining tasks to facilitate knowledge learning.
arXiv Detail & Related papers (2021-11-22T02:56:04Z)
- A Multilingual Modeling Method for Span-Extraction Reading Comprehension [2.4905424368103444]
We propose a multilingual extractive reading comprehension approach called XLRC.
We show that our model outperforms the state-of-the-art baseline (i.e., RoBERTa_Large) on the CMRC 2018 task.
arXiv Detail & Related papers (2021-05-31T11:05:30Z)
- UNKs Everywhere: Adapting Multilingual Language Models to New Scripts [103.79021395138423]
Massively multilingual language models such as multilingual BERT (mBERT) and XLM-R offer state-of-the-art cross-lingual transfer performance on a range of NLP tasks.
Due to their limited capacity and large differences in pretraining data, there is a profound performance gap between resource-rich and resource-poor target languages.
We propose novel data-efficient methods that enable quick and effective adaptation of pretrained multilingual models to such low-resource languages and unseen scripts.
arXiv Detail & Related papers (2020-12-31T11:37:28Z)
- Cross-lingual Machine Reading Comprehension with Language Branch Knowledge Distillation [105.41167108465085]
Cross-lingual Machine Reading Comprehension (CLMRC) remains a challenging problem due to the lack of large-scale datasets in low-resource languages.
We propose a novel augmentation approach named Language Branch Machine Reading Comprehension (LBMRC).
LBMRC trains multiple machine reading comprehension (MRC) models, each proficient in an individual language.
We devise a multilingual distillation approach to amalgamate knowledge from multiple language branch models to a single model for all target languages.
arXiv Detail & Related papers (2020-10-27T13:12:17Z)
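To illustrate the code-switched data generation mentioned in the Knowledge Based Multilingual Language Model entry above, here is a small, self-contained sketch: it takes a Wikidata-style triple with multilingual labels and emits sentences whose subject and object surface forms come from different languages. The label dictionary, the QIDs/PIDs, and the lexicalisation template are hypothetical examples, not that paper's actual pipeline.

```python
# A hedged sketch of code-switched sentence generation from KG triples.
# The labels and the lexicalisation template are invented for illustration;
# a real pipeline would read entity labels from a Wikidata dump.
from itertools import permutations

# Hypothetical multilingual labels for two entities and one relation.
labels = {
    "Q1490": {"en": "Tokyo", "de": "Tokio", "ja": "東京"},  # Tokyo
    "Q17":   {"en": "Japan", "de": "Japan", "ja": "日本"},  # Japan
    "P1376": {"en": "is the capital of"},                   # relation, pivot language only
}


def code_switch(subj_qid: str, rel_pid: str, obj_qid: str):
    """Yield sentences whose subject and object labels come from different languages."""
    relation = labels[rel_pid]["en"]  # keep the relation in one pivot language
    langs = sorted(set(labels[subj_qid]) | set(labels[obj_qid]))
    for subj_lang, obj_lang in permutations(langs, 2):  # subj_lang != obj_lang
        if subj_lang in labels[subj_qid] and obj_lang in labels[obj_qid]:
            yield f"{labels[subj_qid][subj_lang]} {relation} {labels[obj_qid][obj_lang]}."


for sentence in code_switch("Q1490", "P1376", "Q17"):
    print(sentence)  # e.g. "Tokio is the capital of 日本."
```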