SMDDH: Singleton Mention detection using Deep Learning in Hindi Text
- URL: http://arxiv.org/abs/2301.09361v1
- Date: Mon, 23 Jan 2023 10:58:18 GMT
- Title: SMDDH: Singleton Mention detection using Deep Learning in Hindi Text
- Authors: Kusum Lata, Pardeep Singh, and Kamlesh Dutta
- Abstract summary: This paper proposes a singleton mention detection module for Hindi text based on a fully connected network and a convolutional neural network.
In terms of Precision, Recall, and F-measure, the experimental findings obtained are excellent.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mention detection is an important component of a coreference resolution
system, in which mentions such as names, nominals, and pronominals are identified.
These mentions can be either coreferential mentions or singleton mentions
(non-coreferential mentions). Coreferential mentions are mentions in a
text that refer to the same entity in the real world, whereas singleton
mentions occur only once in the text and do not participate in
coreference because they are not mentioned again in the following text. Filtering
out these singleton mentions can substantially improve the performance of a
coreference resolution process. This paper proposes a singleton mention
detection module for Hindi text based on a fully connected network and a
convolutional neural network. The model uses a few hand-crafted features,
context information, and word embeddings. A coreference-annotated
Hindi dataset comprising 3.6K sentences and 78K tokens is used for the
task. In terms of Precision, Recall, and F-measure, the experimental findings
obtained are excellent.
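The abstract's definition of a singleton (an entity mentioned only once) and its evaluation in terms of Precision, Recall, and F-measure can be illustrated with a minimal sketch. This is not the authors' model or data: the mention ids, entity ids, predicted labels, and the helper names `singleton_labels` and `precision_recall_f1` are all hypothetical, and the predictions simply stand in for the output of some singleton detector.

```python
from collections import Counter


def singleton_labels(mention_to_entity):
    """Label each mention True if its entity is mentioned only once."""
    counts = Counter(mention_to_entity.values())
    return {m: counts[e] == 1 for m, e in mention_to_entity.items()}


def precision_recall_f1(gold, predicted):
    """Score predicted singleton labels against gold labels.

    The positive class is "singleton". Mentions missing from
    `predicted` are treated as predicted non-singletons.
    """
    tp = sum(1 for m in gold if gold[m] and predicted.get(m, False))
    fp = sum(1 for m in gold if not gold[m] and predicted.get(m, False))
    fn = sum(1 for m in gold if gold[m] and not predicted.get(m, False))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy annotation: mention id -> entity id.
# Entity e1 is mentioned twice (coreferential); e2, e3, e4 once each.
gold_entities = {"m1": "e1", "m2": "e1", "m3": "e2", "m4": "e3", "m5": "e4"}
gold = singleton_labels(gold_entities)

# Hypothetical detector output (True = predicted singleton).
predicted = {"m1": False, "m2": True, "m3": True, "m4": True, "m5": False}

precision, recall, f1 = precision_recall_f1(gold, predicted)

# Mentions that survive filtering and would be passed on to
# a downstream coreference resolver.
kept_mentions = [m for m in gold_entities if not predicted[m]]
```

In this toy case the detector wrongly discards `m2` (a coreferential mention) and wrongly keeps `m5` (a singleton), which shows why both precision and recall matter when singleton filtering is used as a preprocessing step.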
Related papers
- Fine-Grained Named Entities for Corona News [0.0]
This study proposes a data annotation pipeline to generate training data from corona news articles.
Named entity recognition models are trained on this annotated corpus and then evaluated on test sentences manually annotated by domain experts.
arXiv Detail & Related papers (2024-04-20T18:22:49Z) - SPLICE: A Singleton-Enhanced PipeLIne for Coreference REsolution [11.062090350704617]
Singleton mentions, i.e., entities mentioned only once in a text, are important to how humans understand discourse from a theoretical perspective.
Previous attempts to incorporate their detection in end-to-end neural coreference resolution for English have been hampered by the lack of singleton mention spans in the OntoNotes benchmark.
This paper addresses this limitation by combining predicted mentions from existing nested NER systems and features derived from OntoNotes syntax trees.
arXiv Detail & Related papers (2024-03-25T22:46:16Z) - Incorporating Singletons and Mention-based Features in Coreference
Resolution via Multi-task Learning for Better Generalization [12.084539012992412]
This paper presents a coreference model that learns singletons as well as features such as entity type and information status.
This approach achieves new state-of-the-art scores on the OntoGUM benchmark.
arXiv Detail & Related papers (2023-09-20T18:44:24Z) - Part-of-Speech Tagging of Odia Language Using statistical and Deep
Learning-Based Approaches [0.0]
This work presents conditional random field (CRF) and deep learning-based approaches (CNN and Bi-LSTM) for developing an Odia part-of-speech tagger.
The Bi-LSTM model with character sequence features and pre-trained word vectors was observed to achieve a state-of-the-art result.
arXiv Detail & Related papers (2022-07-07T12:15:23Z) - Automatic Dialect Density Estimation for African American English [74.44807604000967]
We explore automatic prediction of dialect density of the African American English (AAE) dialect.
Dialect density is defined as the percentage of words in an utterance that contain characteristics of the non-standard dialect.
We show a significant correlation between our predicted and ground truth dialect density measures for AAE speech in this database.
arXiv Detail & Related papers (2022-04-03T01:34:48Z) - UCPhrase: Unsupervised Context-aware Quality Phrase Tagging [63.86606855524567]
UCPhrase is a novel unsupervised context-aware quality phrase tagger.
We induce high-quality phrase spans as silver labels from consistently co-occurring word sequences.
We show that our design is superior to state-of-the-art pre-trained, unsupervised, and distantly supervised methods.
arXiv Detail & Related papers (2021-05-28T19:44:24Z) - Accelerating Text Mining Using Domain-Specific Stop Word Lists [57.76576681191192]
We present a novel approach for the automatic extraction of domain-specific words called the hyperplane-based approach.
The hyperplane-based approach can significantly reduce text dimensionality by eliminating irrelevant features.
Results indicate that the hyperplane-based approach can reduce the dimensionality of the corpus by 90% and outperforms mutual information.
arXiv Detail & Related papers (2020-11-18T17:42:32Z) - Coreference Resolution System for Indonesian Text with Mention Pair
Method and Singleton Exclusion using Convolutional Neural Network [0.0]
We propose a new coreference resolution system for Indonesian text with mention pair method.
In addition to lexical and syntactic features, we use word embeddings and feed them to a CNN in order to learn representations of the mention words and their context.
Our proposed system outperforms the state-of-the-art system.
arXiv Detail & Related papers (2020-09-11T22:21:19Z) - Active Learning for Coreference Resolution using Discrete Annotation [76.36423696634584]
We improve upon pairwise annotation for active learning in coreference resolution.
We ask annotators to identify mention antecedents if a presented mention pair is deemed not coreferent.
In experiments with existing benchmark coreference datasets, we show that the signal from this additional question leads to significant performance gains per human-annotation hour.
arXiv Detail & Related papers (2020-04-28T17:17:11Z) - Continuous speech separation: dataset and analysis [52.10378896407332]
In natural conversations, a speech signal is continuous, containing both overlapped and overlap-free components.
This paper describes a dataset and protocols for evaluating continuous speech separation algorithms.
arXiv Detail & Related papers (2020-01-30T18:01:31Z) - Lexical Sememe Prediction using Dictionary Definitions by Capturing
Local Semantic Correspondence [94.79912471702782]
Sememes, defined as the minimum semantic units of human languages, have been proven useful in many NLP tasks.
We propose a Sememe Correspondence Pooling (SCorP) model, which is able to capture this kind of matching to predict sememes.
We evaluate our model and baseline methods on HowNet, a well-known sememe knowledge base, and find that our model achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-01-16T17:30:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.