Related papers: Automatic Construction of Sememe Knowledge Bases via Dictionaries

URL: http://arxiv.org/abs/2105.12585v1
Date: Wed, 26 May 2021 14:41:01 GMT
Title: Automatic Construction of Sememe Knowledge Bases via Dictionaries
Authors: Fanchao Qi, Yangyi Chen, Fengyu Wang, Zhiyuan Liu, Xiao Chen, Maosong Sun
Abstract summary: Sememe knowledge bases (SKBs) enable sememes to be applied to natural language processing. Most languages have no SKBs, and manual construction of SKBs is time-consuming and labor-intensive. We propose a simple and fully automatic method of building an SKB via an existing dictionary.
Score: 53.8700954466358
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: A sememe is defined as the minimum semantic unit in linguistics. Sememe knowledge bases (SKBs), which comprise words annotated with sememes, enable sememes to be applied to natural language processing. So far a large body of research has showcased the unique advantages and effectiveness of SKBs in various tasks. However, most languages have no SKBs, and manual construction of SKBs is time-consuming and labor-intensive. To tackle this challenge, we propose a simple and fully automatic method of building an SKB via an existing dictionary. We use this method to build an English SKB and a French SKB, and conduct comprehensive evaluations from both intrinsic and extrinsic perspectives. Experimental results demonstrate that the automatically built English SKB is even superior to HowNet, the most widely used SKB that takes decades to build manually. And both the English and French SKBs can bring obvious performance enhancement in multiple downstream tasks. All the code and data of this paper (except the copyrighted dictionaries) can be obtained at https://github.com/thunlp/DictSKB.

Related papers

Translate Meanings, Not Just Words: IdiomKB's Role in Optimizing Idiomatic Translation with Language Models [57.60487455727155]
Idioms, with their non-compositional nature, pose particular challenges for Transformer-based systems. Traditional methods, which replace idioms using existing knowledge bases (KBs), often lack scale and context awareness. We introduce a multilingual idiom KB (IdiomKB) developed using large LMs to address this. This KB facilitates better translation by smaller models, such as BLOOMZ (7.1B), Alpaca (7B), and InstructGPT (6.7B).
arXiv Detail & Related papers (2023-08-26T21:38:31Z)
Cross-Lingual Question Answering over Knowledge Base as Reading Comprehension [61.079852289005025]
Cross-lingual question answering over knowledge base (xKBQA) aims to answer questions in languages different from that of the provided knowledge base. One of the major challenges facing xKBQA is the high cost of data annotation. We propose a novel approach for xKBQA in a reading comprehension paradigm.
arXiv Detail & Related papers (2023-02-26T05:52:52Z)
mOKB6: A Multilingual Open Knowledge Base Completion Benchmark [38.91023041725193]
We construct the first multilingual Open KBC dataset, called mOKB6, containing facts from Wikipedia in six languages (including English). We experiment with several models for the task and observe a consistent benefit of combining languages with the help of a shared embedding space as well as translations of facts.
arXiv Detail & Related papers (2022-11-13T17:10:49Z)
The Analysis about Building Cross-lingual Sememe Knowledge Base Based on Deep Clustering Network [0.7310043452300736]
Sememe knowledge bases (KBs) contain words annotated with sememes. We propose an unsupervised method based on a deep clustering network (DCN) to build a sememe KB.
arXiv Detail & Related papers (2022-08-10T17:40:45Z)
Sememe Prediction for BabelNet Synsets using Multilingual and Multimodal Information [89.24684041258747]
Sememe knowledge bases (KBs) are built by manually annotating words with sememes. Existing sememe KBs only cover a few languages, which hinders the wide utilization of sememes. This paper aims to build a multilingual sememe KB based on BabelNet, a multilingual encyclopedia dictionary.
arXiv Detail & Related papers (2022-03-14T18:37:09Z)
Prix-LM: Pretraining for Multilingual Knowledge Base Construction [59.02868906044296]
We propose a unified framework, Prix-LM, for multilingual knowledge construction and completion. We leverage two types of knowledge, monolingual triples and cross-lingual links, extracted from existing multilingual KBs. Experiments on standard entity-related tasks, such as link prediction in multiple languages, cross-lingual entity linking and bilingual lexicon induction, demonstrate its effectiveness.
arXiv Detail & Related papers (2021-10-16T02:08:46Z)
Reasoning Over Virtual Knowledge Bases With Open Predicate Relations [85.19305347984515]
We present the Open Predicate Query Language (OPQL). OPQL is a method for constructing a virtual Knowledge Base (VKB) trained entirely from text. We demonstrate that OPQL outperforms prior VKB methods on two different KB reasoning tasks.
arXiv Detail & Related papers (2021-02-14T01:29:54Z)

This list is automatically generated from the titles and abstracts of the papers on this site.
