Studying Taxonomy Enrichment on Diachronic WordNet Versions
- URL: http://arxiv.org/abs/2011.11536v1
- Date: Mon, 23 Nov 2020 16:49:37 GMT
- Title: Studying Taxonomy Enrichment on Diachronic WordNet Versions
- Authors: Irina Nikishina, Alexander Panchenko, Varvara Logacheva, Natalia Loukachevitch
- Abstract summary: We explore the possibilities of taxonomy extension in a resource-poor setting and present methods which are applicable to a large number of languages.
We create novel English and Russian datasets for training and evaluating taxonomy enrichment models and describe a technique of creating such datasets for other languages.
- Score: 70.27072729280528
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Ontologies, taxonomies, and thesauri are used in many NLP tasks. However,
most studies are focused on the creation of these lexical resources rather than
the maintenance of the existing ones. Thus, we address the problem of taxonomy
enrichment. We explore the possibilities of taxonomy extension in a
resource-poor setting and present methods which are applicable to a large
number of languages. We create novel English and Russian datasets for training
and evaluating taxonomy enrichment models and describe a technique of creating
such datasets for other languages.
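As a rough illustration of how an enrichment dataset could be derived from two versions of a taxonomy (an assumption suggested by the paper's title, not a procedure spelled out in this summary), the sketch below treats words that appear only in the newer version as queries and keeps those whose hypernyms already exist in the older version. The function name `diachronic_dataset`, the dictionary format, and the toy data are all hypothetical.

```python
def diachronic_dataset(old_taxonomy, new_taxonomy):
    """Return {new_word: sorted gold hypernyms} for words added in the newer version.

    Both arguments map a word to the set of its direct hypernyms
    (a hypothetical format chosen for this sketch).
    """
    dataset = {}
    for word, hypernyms in new_taxonomy.items():
        if word in old_taxonomy:
            continue  # already present, so there is nothing to enrich
        # Keep only hypernyms that a model could attach to in the old version.
        known = sorted(h for h in hypernyms if h in old_taxonomy)
        if known:
            dataset[word] = known
    return dataset


# Toy data, invented for illustration only.
old = {"animal": set(), "dog": {"animal"}, "cat": {"animal"}}
new = {
    "animal": set(),
    "dog": {"animal"},
    "cat": {"animal"},
    "puppy": {"dog"},
    "kitten": {"cat"},
    "gadget": {"device"},  # its parent "device" is missing from the old version
}
print(diachronic_dataset(old, new))
```

Because the procedure only compares two snapshots of the same resource, it could in principle be repeated for any language that has more than one released version of a taxonomy.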
Related papers
- FLAME: Self-Supervised Low-Resource Taxonomy Expansion using Large Language Models [19.863010475923414]
Taxonomies find utility in various real-world applications, such as e-commerce search engines and recommendation systems.
Traditional supervised taxonomy expansion approaches encounter difficulties stemming from limited resources.
We propose FLAME, a novel approach for taxonomy expansion in low-resource environments by harnessing the capabilities of large language models.
arXiv Detail & Related papers (2024-02-21T08:50:40Z)
- Towards Visual Taxonomy Expansion [50.462998483087915]
We propose Visual Taxonomy Expansion (VTE), introducing visual features into the taxonomy expansion task.
We propose a textual hypernymy learning task and a visual prototype learning task to cluster textual and visual semantics.
Our method is evaluated on two datasets, where we obtain compelling results.
arXiv Detail & Related papers (2023-09-12T10:17:28Z)
- Taxonomy Enrichment with Text and Graph Vector Representations [61.814256012166794]
We address the problem of taxonomy enrichment which aims at adding new words to the existing taxonomy.
We present a new method that achieves high results on this task with little effort.
We achieve state-of-the-art results across different datasets and provide an in-depth error analysis.
arXiv Detail & Related papers (2022-01-21T09:01:12Z)
- QA Dataset Explosion: A Taxonomy of NLP Resources for Question Answering and Reading Comprehension [41.6087902739702]
This study is the largest survey of the field to date.
We provide an overview of the various formats and domains of the current resources, highlighting the current lacunae for future work.
We also discuss the implications of over-focusing on English, and survey the current monolingual resources for other languages and multilingual resources.
arXiv Detail & Related papers (2021-07-27T10:09:13Z)
- Octet: Online Catalog Taxonomy Enrichment with Self-Supervision [67.26804972901952]
We present a self-supervised end-to-end framework, Octet, for Online Catalog Taxonomy EnrichmenT.
We propose to train a sequence labeling model for term extraction and employ graph neural networks (GNNs) to capture the taxonomy structure.
In the open-world evaluation, Octet enriches an online catalog in production to twice its original size.
arXiv Detail & Related papers (2020-06-18T04:53:07Z)
- Linguistic Typology Features from Text: Inferring the Sparse Features of World Atlas of Language Structures [73.06435180872293]
We construct a recurrent neural network predictor based on byte embeddings and convolutional layers.
We show that some features from various linguistic types can be predicted reliably.
arXiv Detail & Related papers (2020-04-30T21:00:53Z)
- Low resource language dataset creation, curation and classification: Setswana and Sepedi -- Extended Abstract [2.3801001093799115]
We create datasets that are focused on news headlines for Setswana and Sepedi.
We propose baselines for classification and investigate an approach to data augmentation better suited to low-resource languages.
arXiv Detail & Related papers (2020-03-30T18:03:15Z)
- Investigating an approach for low resource language dataset creation, curation and classification: Setswana and Sepedi [2.3801001093799115]
We create datasets that are focused on news headlines for Setswana and Sepedi.
We also create a news topic classification task.
We investigate an approach to data augmentation better suited to low-resource languages.
arXiv Detail & Related papers (2020-02-18T13:58:06Z)
- TaxoExpan: Self-supervised Taxonomy Expansion with Position-Enhanced Graph Neural Network [62.12557274257303]
Taxonomies consist of machine-interpretable semantics and provide valuable knowledge for many web applications.
We propose a novel self-supervised framework, named TaxoExpan, which automatically generates a set of <query concept, anchor concept> pairs from the existing taxonomy as training data.
We develop two innovative techniques in TaxoExpan: (1) a position-enhanced graph neural network that encodes the local structure of an anchor concept in the existing taxonomy, and (2) a noise-robust training objective that enables the learned model to be insensitive to the label noise in the self-supervision data.
arXiv Detail & Related papers (2020-01-26T21:30:21Z)
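The self-supervision idea in the TaxoExpan entry, generating <query concept, anchor concept> pairs from an existing taxonomy, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: each child is treated as a query and each of its parents as a positive anchor, and the random negative sampling, the function name `make_training_pairs`, its parameters, and the toy taxonomy are assumptions made for this example.

```python
import random


def make_training_pairs(children_of, num_negatives=2, seed=0):
    """Build (query, anchor, label) triples from a parent -> children mapping.

    label 1: anchor is a true parent of query in the existing taxonomy.
    label 0: anchor is a randomly sampled non-parent (assumed negative sampling).
    """
    rng = random.Random(seed)
    nodes = sorted(set(children_of) | {c for cs in children_of.values() for c in cs})

    # Invert the mapping so each concept knows its parents.
    parents_of = {}
    for parent, children in children_of.items():
        for child in children:
            parents_of.setdefault(child, set()).add(parent)

    examples = []
    for query, parents in sorted(parents_of.items()):
        for anchor in sorted(parents):
            examples.append((query, anchor, 1))
        candidates = [n for n in nodes if n != query and n not in parents]
        for anchor in rng.sample(candidates, min(num_negatives, len(candidates))):
            examples.append((query, anchor, 0))
    return examples


# Toy taxonomy, invented for illustration only.
taxonomy = {"animal": ["dog", "cat"], "dog": ["puppy"]}
pairs = make_training_pairs(taxonomy)
for query, anchor, label in pairs:
    print(query, anchor, label)
```

No manual labels are needed here: every training example comes from edges already present in the taxonomy, which is what makes the setup self-supervised.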