Automatic Construction of Sememe Knowledge Bases via Dictionaries
- URL: http://arxiv.org/abs/2105.12585v1
- Date: Wed, 26 May 2021 14:41:01 GMT
- Title: Automatic Construction of Sememe Knowledge Bases via Dictionaries
- Authors: Fanchao Qi, Yangyi Chen, Fengyu Wang, Zhiyuan Liu, Xiao Chen, Maosong Sun
- Abstract summary: Sememe knowledge bases (SKBs) enable sememes to be applied to natural language processing.
Most languages have no SKBs, and manual construction of SKBs is time-consuming and labor-intensive.
We propose a simple and fully automatic method of building an SKB via an existing dictionary.
- Score: 53.8700954466358
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: A sememe is defined as the minimum semantic unit in linguistics. Sememe
knowledge bases (SKBs), which comprise words annotated with sememes, enable
sememes to be applied to natural language processing. So far a large body of
research has showcased the unique advantages and effectiveness of SKBs in
various tasks. However, most languages have no SKBs, and manual construction of
SKBs is time-consuming and labor-intensive. To tackle this challenge, we
propose a simple and fully automatic method of building an SKB via an existing
dictionary. We use this method to build an English SKB and a French SKB, and
conduct comprehensive evaluations from both intrinsic and extrinsic
perspectives. Experimental results demonstrate that the automatically built
English SKB is even superior to HowNet, the most widely used SKB, which took
decades to build manually. Both the English and French SKBs also bring
clear performance improvements in multiple downstream tasks. All the code and
data of this paper (except the copyrighted dictionaries) can be obtained at
https://github.com/thunlp/DictSKB.
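As a rough illustration of what "words annotated with sememes" could look like when derived from a dictionary, the sketch below builds a toy SKB from definitions. It assumes, hypothetically, that sememes come from a fixed vocabulary and that a sense's sememes are the vocabulary words occurring in its definition. The abstract does not specify the paper's actual procedure, and `SEMEME_VOCAB`, `build_skb`, and the toy dictionary are invented for this example.

```python
import re
from collections import defaultdict

# Hypothetical closed vocabulary of candidate sememes (an assumption of this sketch).
SEMEME_VOCAB = {"animal", "water", "live", "person", "teach", "school"}


def build_skb(dictionary, sememe_vocab):
    """Map each word to a list of senses, each sense annotated with a set of sememes.

    `dictionary` maps a word to a list of definition strings (one per sense).
    A sense's sememes are the definition tokens that appear in `sememe_vocab`.
    """
    skb = defaultdict(list)
    for word, definitions in dictionary.items():
        for definition in definitions:
            # Lowercase and keep alphabetic tokens only; no lemmatization is done.
            tokens = re.findall(r"[a-z]+", definition.lower())
            sememes = {token for token in tokens if token in sememe_vocab}
            skb[word].append(sememes)
    return dict(skb)


# Toy dictionary invented for illustration; not taken from the paper's data.
toy_dictionary = {
    "fish": ["An animal that lives in water."],
    "teacher": ["A person who teaches in a school."],
}

skb = build_skb(toy_dictionary, SEMEME_VOCAB)
```

Because this sketch does no lemmatization, inflected forms such as "lives" or "teaches" do not match the vocabulary entries "live" and "teach". A realistic pipeline would need to normalize word forms before matching.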
Related papers
- Translate Meanings, Not Just Words: IdiomKB's Role in Optimizing Idiomatic Translation with Language Models [57.60487455727155]
Idioms, with their non-compositional nature, pose particular challenges for Transformer-based systems.
Traditional methods, which replace idioms using existing knowledge bases (KBs), often lack scale and context awareness.
We introduce a multilingual idiom KB (IdiomKB) developed using large LMs to address this.
This KB facilitates better translation by smaller models, such as BLOOMZ (7.1B), Alpaca (7B), and InstructGPT (6.7B).
(2023-08-26T21:38:31Z)
- Cross-Lingual Question Answering over Knowledge Base as Reading Comprehension [61.079852289005025]
Cross-lingual question answering over knowledge base (xKBQA) aims to answer questions in languages different from that of the provided knowledge base.
One of the major challenges facing xKBQA is the high cost of data annotation.
We propose a novel approach for xKBQA in a reading comprehension paradigm.
(2023-02-26T05:52:52Z)
- mOKB6: A Multilingual Open Knowledge Base Completion Benchmark [38.91023041725193]
We construct the first multilingual Open KBC dataset, called mOKB6, containing facts from Wikipedia in six languages (including English).
We experiment with several models for the task and observe a consistent benefit of combining languages with the help of shared embedding space as well as translations of facts.
(2022-11-13T17:10:49Z)
- The Analysis about Building Cross-lingual Sememe Knowledge Base Based on Deep Clustering Network [0.7310043452300736]
Sememe knowledge bases (KBs) contain words annotated with sememes.
We propose an unsupervised method based on a deep clustering network (DCN) to build a sememe KB.
(2022-08-10T17:40:45Z)
- Sememe Prediction for BabelNet Synsets using Multilingual and Multimodal Information [89.24684041258747]
Sememe knowledge bases (KBs) are built by manually annotating words with sememes.
Existing sememe KBs only cover a few languages, which hinders the wide utilization of sememes.
This paper aims to build a multilingual sememe KB based on BabelNet, a multilingual encyclopedia dictionary.
(2022-03-14T18:37:09Z)
- Prix-LM: Pretraining for Multilingual Knowledge Base Construction [59.02868906044296]
We propose a unified framework, Prix-LM, for multilingual knowledge construction and completion.
We leverage two types of knowledge, monolingual triples and cross-lingual links, extracted from existing multilingual KBs.
Experiments on standard entity-related tasks, such as link prediction in multiple languages, cross-lingual entity linking and bilingual lexicon induction, demonstrate its effectiveness.
(2021-10-16T02:08:46Z)
- Reasoning Over Virtual Knowledge Bases With Open Predicate Relations [85.19305347984515]
We present the Open Predicate Query Language (OPQL).
OPQL is a method for constructing a virtual Knowledge Base (VKB) trained entirely from text.
We demonstrate that OPQL outperforms prior VKB methods on two different KB reasoning tasks.
(2021-02-14T01:29:54Z)