DepNeCTI: Dependency-based Nested Compound Type Identification for Sanskrit
- URL: http://arxiv.org/abs/2310.09501v1
- Date: Sat, 14 Oct 2023 06:11:53 GMT
- Title: DepNeCTI: Dependency-based Nested Compound Type Identification for Sanskrit
- Authors: Jivnesh Sandhan, Yaswanth Narsupalli, Sreevatsa Muppirala, Sriram Krishnan, Pavankumar Satuluri, Amba Kulkarni and Pawan Goyal
- Abstract summary: This work introduces the novel task of nested compound type identification (NeCTI).
It aims to identify nested spans of a multi-component compound and decode the implicit semantic relations between them.
To the best of our knowledge, this is the first attempt in the field of lexical semantics to propose this task.
- Score: 7.04795623262177
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-component compounding is a prevalent phenomenon in Sanskrit, and
understanding the implicit structure of a compound's components is crucial for
deciphering its meaning. Earlier approaches in Sanskrit have focused on binary
compounds and neglected the multi-component compound setting. This work
introduces the novel task of nested compound type identification (NeCTI), which
aims to identify nested spans of a multi-component compound and decode the
implicit semantic relations between them. To the best of our knowledge, this is
the first attempt in the field of lexical semantics to propose this task.
We present two newly annotated datasets for this task, including an out-of-domain
dataset. We also benchmark these datasets by exploring the efficacy of standard
problem formulations such as nested named entity recognition, constituency
parsing, and seq2seq. We present a novel framework, DepNeCTI (Dependency-based
Nested Compound Type Identifier), that surpasses the best baseline by an average
absolute improvement of 13.1 F1 points in Labeled Span Score (LSS) and achieves a
5-fold improvement in inference efficiency. Consistent with previous findings on
the binary Sanskrit compound identification task, context also benefits the NeCTI
task. The codebase and datasets are publicly available at:
https://github.com/yaswanth-iitkgp/DepNeCTI
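The NeCTI setting and the Labeled Span Score can be illustrated with a small Python sketch. It assumes a hypothetical encoding in which a multi-component compound is a binary tree over component indices, with each internal node carrying a semantic-relation label; `spans`, `head`, `to_arcs`, and `lss` are illustrative names, not part of the DepNeCTI codebase, and the labels "T" and "K" are placeholders. The right-headed conversion of nested spans into labeled (dependent, head, label) arcs is also an assumption chosen for illustration and may differ from the paper's actual dependency encoding. LSS is computed here as F1 over exact (start, end, label) matches, a common definition of labeled span scoring; the paper's precise definition should be checked.

```python
from typing import List, Set, Tuple, Union

# Hypothetical encoding: a leaf is a component index; an internal node is
# (label, left_subtree, right_subtree) for a binary nested compound.
Tree = Union[int, Tuple[str, "Tree", "Tree"]]
Span = Tuple[int, int, str]  # (first, last) inclusive component indices + label


def spans(tree: Tree) -> Tuple[int, int, Set[Span]]:
    """Return (first index, last index, all labeled spans) for a subtree."""
    if isinstance(tree, int):
        return tree, tree, set()
    label, left, right = tree
    l_start, _, l_spans = spans(left)
    _, r_end, r_spans = spans(right)
    return l_start, r_end, l_spans | r_spans | {(l_start, r_end, label)}


def head(tree: Tree) -> int:
    """Assumed right-headed rule: a node's head is the head of its right child."""
    return tree if isinstance(tree, int) else head(tree[2])


def to_arcs(tree: Tree) -> List[Tuple[int, int, str]]:
    """Turn each internal node into one arc (dependent, head, label)."""
    if isinstance(tree, int):
        return []
    label, left, right = tree
    return to_arcs(left) + to_arcs(right) + [(head(left), head(right), label)]


def lss(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Labeled span precision, recall and F1 over exact (start, end, label) matches."""
    correct = len(gold & pred)
    p = correct / len(pred) if pred else 0.0
    r = correct / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


# Four components grouped as ((0 1) (2 3)); the prediction mislabels (0 1).
gold_tree = ("T", ("K", 0, 1), ("T", 2, 3))
pred_tree = ("T", ("T", 0, 1), ("T", 2, 3))
_, _, gold = spans(gold_tree)
_, _, pred = spans(pred_tree)
print(to_arcs(gold_tree))  # [(0, 1, 'K'), (2, 3, 'T'), (1, 3, 'T')]
print(lss(gold, pred))
```

In this example the prediction recovers the bracketing correctly but mislabels the left sub-compound, so two of the three labeled spans match and precision, recall, and F1 are all 2/3; an unlabeled span score would be perfect here, which is why a labeled metric is the stricter test of decoding the semantic relations.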
Related papers
- SemRel2024: A Collection of Semantic Textual Relatedness Datasets for 13 Languages [44.017657230247934]
We present SemRel, a new semantic relatedness dataset collection annotated by native speakers across 13 languages.
These languages originate from five distinct language families and are predominantly spoken in Africa and Asia.
Each instance in the SemRel datasets is a sentence pair associated with a score that represents the degree of semantic textual relatedness between the two sentences.
(2024-02-13T18:04:53Z)
- mCL-NER: Cross-Lingual Named Entity Recognition via Multi-view Contrastive Learning [54.523172171533645]
Cross-lingual named entity recognition (CrossNER) faces challenges stemming from uneven performance due to the scarcity of multilingual corpora.
We propose Multi-view Contrastive Learning for Cross-lingual Named Entity Recognition (mCL-NER).
Our experiments on the XTREME benchmark, spanning 40 languages, demonstrate the superiority of mCL-NER over prior data-driven and model-based approaches.
(2023-08-17T16:02:29Z)
- CompoundPiece: Evaluating and Improving Decompounding Performance of Language Models [77.45934004406283]
We systematically study decompounding, the task of splitting compound words into their constituents.
We introduce a dataset of 255k compound and non-compound words across 56 diverse languages obtained from Wiktionary.
We introduce a novel methodology to train dedicated models for decompounding.
(2023-05-23T16:32:27Z)
- HIORE: Leveraging High-order Interactions for Unified Entity Relation Extraction [85.80317530027212]
We propose HIORE, a new method for unified entity relation extraction.
The key insight is to leverage the complex association among word pairs, which contains richer information than the first-order word-by-word interactions.
Experiments show that HIORE achieves state-of-the-art performance on relation extraction and an improvement of 1.1-1.8 F1 points over the prior best unified model.
(2023-05-07T14:57:42Z)
- A Novel Multi-Task Learning Approach for Context-Sensitive Compound Type Identification in Sanskrit [13.742271198030998]
We propose a novel multi-task learning architecture which incorporates the contextual information and enriches the complementary syntactic information.
Experiments on the benchmark datasets for SaCTI show absolute gains of 6.1 points (accuracy) and 7.7 points (F1-score) over the state-of-the-art system.
(2022-08-22T13:41:51Z)
- Multi-Modal Association based Grouping for Form Structure Extraction [14.134131448981295]
We present a novel multi-modal approach for form structure extraction.
We extract higher-order structures such as TextBlocks, Text Fields, Choice Fields, and Choice Groups.
Our approach achieves a recall of 90.29%, 73.80%, 83.12%, and 52.72% for the above structures, respectively.
(2021-07-09T12:49:34Z)
- R²-Net: Relation of Relation Learning Network for Sentence Semantic Matching [58.72111690643359]
We propose a Relation of Relation Learning Network (R2-Net) for sentence semantic matching.
We first employ BERT to encode the input sentences from a global perspective.
Then a CNN-based encoder is designed to capture keywords and phrase information from a local perspective.
To fully leverage labels for better relation information extraction, we introduce a self-supervised relation of relation classification task.
(2020-12-16T13:11:30Z)
- Multilingual Irony Detection with Dependency Syntax and Neural Models [61.32653485523036]
This work focuses on the contribution of syntactic knowledge, exploiting linguistic resources in which syntax is annotated according to the Universal Dependencies scheme.
The results suggest that fine-grained dependency-based syntactic information is informative for the detection of irony.
(2020-11-11T11:22:05Z)
- Local Additivity Based Data Augmentation for Semi-supervised NER [59.90773003737093]
Named Entity Recognition (NER) is one of the first stages in deep language understanding.
Current NER models heavily rely on human-annotated data.
We propose a Local Additivity based Data Augmentation (LADA) method for semi-supervised NER.
(2020-10-04T20:46:26Z)
- Nominal Compound Chain Extraction: A New Task for Semantic-enriched Lexical Chain [34.352862428120126]
We introduce a novel task, Nominal Compound Chain Extraction (NCCE), extracting and clustering all the nominal compounds that share identical semantic topics.
In addition, we model the task as a two-stage prediction (i.e., compound extraction and chain detection), which is handled via a proposed joint framework.
The experiments are based on our manually annotated corpus, and the results prove the necessity of the NCCE task.
(2020-09-19T06:20:37Z)
- Fine-Grained Named Entity Typing over Distantly Supervised Data Based on Refined Representations [16.30478830298353]
Fine-Grained Named Entity Typing (FG-NET) is a key component in Natural Language Processing (NLP).
We propose an edge-weighted attentive graph convolution network that refines the noisy mention representations by attending over corpus-level contextual clues prior to the end classification.
Experimental evaluation shows that the proposed model outperforms existing work by a relative margin of up to 10.2% in macro-F1 and 8.3% in micro-F1.
(2020-04-07T17:26:36Z)