Neural Modeling for Named Entities and Morphology (NEMO^2)
- URL: http://arxiv.org/abs/2007.15620v2
- Date: Mon, 10 May 2021 07:31:56 GMT
- Title: Neural Modeling for Named Entities and Morphology (NEMO^2)
- Authors: Dan Bareket and Reut Tsarfaty
- Abstract summary: We develop a novel NER benchmark for Modern Hebrew, a morphologically rich-and-ambiguous language.
We show that explicitly modeling morphological boundaries leads to improved NER performance.
A novel hybrid architecture, in which NER precedes and prunes morphological decomposition, greatly outperforms the standard pipeline.
- Score: 9.092452284460283
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Named Entity Recognition (NER) is a fundamental NLP task, commonly formulated
as classification over a sequence of tokens. Morphologically-Rich Languages
(MRLs) pose a challenge to this basic formulation, as the boundaries of Named
Entities do not necessarily coincide with token boundaries; rather, they
respect morphological boundaries. To address NER in MRLs we thus need to answer
two fundamental questions, namely, what are the basic units to be labeled, and
how can these units be detected and classified in realistic settings, i.e.,
where no gold morphology is available. We empirically investigate these
questions on a novel NER benchmark, with parallel token-level and morpheme-level
NER annotations, which we develop for Modern Hebrew, a morphologically
rich-and-ambiguous language. Our results show that explicitly modeling
morphological boundaries leads to improved NER performance, and that a novel
hybrid architecture, in which NER precedes and prunes morphological
decomposition, greatly outperforms the standard pipeline, where morphological
decomposition strictly precedes NER, setting a new performance bar for both
Hebrew NER and Hebrew morphological decomposition tasks.
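The core problem the abstract describes can be made concrete with a small sketch (this is illustrative only, not the paper's code; the example token, its segmentation, and the labels are my own): in Hebrew, a prepositional prefix attaches to the following word, so an entity boundary can fall inside a space-delimited token, and collapsing morpheme-level labels to token-level labels loses that boundary.

```python
# Illustrative sketch (not the paper's code): why morpheme-level NER labels
# can differ from token-level ones in a morphologically rich language.

def project_to_token_labels(morpheme_labels):
    """Collapse per-morpheme BIO labels of one token to a single token label.

    If an entity starts mid-token, the token-level label loses the boundary:
    we keep the first non-O label and discard the finer segmentation.
    """
    for lab in morpheme_labels:
        if lab != "O":
            return lab
    return "O"

# The token "לתל-אביב" ("to Tel-Aviv") segments into the preposition "ל" (O)
# and the entity "תל-אביב" (B-GPE): the entity boundary is inside the token.
morphemes = ["ל", "תל-אביב"]
morpheme_labels = ["O", "B-GPE"]

token_label = project_to_token_labels(morpheme_labels)
print(token_label)  # B-GPE: the whole token is labeled as entity,
                    # losing the fact that the prefix "ל" is outside it
```

This is exactly why the paper asks what the basic units to be labeled are: morpheme-level labels preserve the boundary that the token-level projection above throws away.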
Related papers
- HeQ: a Large and Diverse Hebrew Reading Comprehension Benchmark [54.73504952691398]
We set out to deliver a Hebrew Machine Reading Comprehension dataset as extractive Question Answering.
The morphologically rich nature of Hebrew poses a challenge to this endeavor.
We devise a novel set of guidelines, a controlled crowdsourcing protocol, and revised evaluation metrics.
arXiv Detail & Related papers (2025-08-03T15:53:01Z)
- Recent advancements in computational morphology: A comprehensive survey [0.11606731918609076]
Computational morphology handles language processing at the word level.
It covers tasks such as morpheme boundary detection, lemmatization, morphological feature tagging, and morphological reinflection.
arXiv Detail & Related papers (2024-06-08T10:07:33Z)
- A Truly Joint Neural Architecture for Segmentation and Parsing [15.866519123942457]
Parsing performance for Morphologically Rich Languages (MRLs) is lower than for other languages.
Due to high morphological complexity and ambiguity of the space-delimited input tokens, the linguistic units that act as nodes in the tree are not known in advance.
We introduce a joint neural architecture where a lattice-based representation preserving all morphological ambiguity of the input is provided to an arc-factored model, which then solves the morphological and syntactic parsing tasks at once.
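The lattice idea above can be sketched minimally (the token, its two analyses, and the toy scorer below are all invented for illustration; in the paper the path score comes from the arc-factored parser, not from a length heuristic):

```python
# Minimal sketch (assumptions mine, not the paper's API): a morphological
# lattice keeps every segmentation of an ambiguous token, and a joint model
# scores whole paths instead of committing to one segmentation up front.

# Hypothetical Hebrew token with two competing analyses.
lattice = {
    "הספר": [
        ["ה", "ספר"],  # "the book": determiner + noun
        ["הספר"],       # single-unit reading, kept as one segment
    ]
}

def best_path(token, score):
    """Pick the highest-scoring segmentation; `score` stands in for the
    arc-factored model's joint morphological-syntactic path score."""
    return max(lattice[token], key=score)

# Toy scorer preferring shorter paths, purely for illustration.
print(best_path("הספר", score=lambda path: -len(path)))  # ['הספר']
```

The point of the joint architecture is that `score` sees the whole sentence's parse, so segmentation and parsing disambiguate each other rather than the pipeline committing to one segmentation first.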
arXiv Detail & Related papers (2024-02-04T16:56:08Z)
- Incorporating Class-based Language Model for Named Entity Recognition in Factorized Neural Transducer [50.572974726351504]
We propose C-FNT, a novel E2E model that incorporates class-based LMs into FNT.
In C-FNT, the LM score of named entities can be associated with the name class instead of its surface form.
The experimental results show that our proposed C-FNT significantly reduces error in named entities without hurting performance in general word recognition.
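The class-based scoring idea in this summary can be approximated with a toy back-off (this is a loose analogy, not the C-FNT model: the lexicon, the probabilities, and the back-off rule are all invented):

```python
# Hedged sketch of the class-based idea summarized above (lexicon and
# probabilities are invented): score a named entity through its name class,
# so rare surface forms inherit the class's language-model probability.

name_class = {"Haifa": "CITY", "Reut": "PERSON"}  # hypothetical lexicon

# Toy class-level LM probabilities for the slot after the word "in".
lm_prob_after_in = {"CITY": 0.30, "PERSON": 0.05, "Haifa": 0.0001}

def score(word):
    """Back off from the surface form to its name class when one exists."""
    cls = name_class.get(word)
    return lm_prob_after_in.get(cls, lm_prob_after_in.get(word, 1e-6))

print(score("Haifa"))  # 0.3, scored via the CITY class rather than
                       # its tiny surface-form probability
```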
arXiv Detail & Related papers (2023-09-14T12:14:49Z)
- Gaussian Prior Reinforcement Learning for Nested Named Entity Recognition [52.46740830977898]
We propose a novel seq2seq model named GPRL, which formulates the nested NER task as an entity triplet sequence generation process.
Experiments on three nested NER datasets demonstrate that GPRL outperforms previous nested NER models.
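The triplet formulation mentioned above can be sketched as follows (the field order `(start, end, type)` and the example sentence are my assumptions; the actual GPRL decoder is a learned seq2seq model, not this grouping step):

```python
# Sketch of the entity-triplet formulation: nested NER as generation of
# (start, end, type) triplets, so overlapping spans pose no problem.

def decode_triplets(flat):
    """Group a flat generated sequence into (start, end, type) triplets."""
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]

# Hypothetical output for "The University of Haifa": the GPE span is
# nested inside the ORG span, which a single BIO sequence cannot express.
generated = [1, 4, "ORG", 4, 4, "GPE"]
print(decode_triplets(generated))  # [(1, 4, 'ORG'), (4, 4, 'GPE')]
```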
arXiv Detail & Related papers (2023-05-12T05:55:34Z)
- IXA/Cogcomp at SemEval-2023 Task 2: Context-enriched Multilingual Named Entity Recognition using Knowledge Bases [53.054598423181844]
We present a novel NER cascade approach comprising three steps.
We empirically demonstrate the significance of external knowledge bases in accurately classifying fine-grained and emerging entities.
Our system exhibits robust performance in the MultiCoNER2 shared task, even in the low-resource language setting.
arXiv Detail & Related papers (2023-04-20T20:30:34Z)
- Nested Named Entity Recognition as Holistic Structure Parsing [92.8397338250383]
This work models all nested NEs in a sentence as a holistic structure and proposes a holistic structure parsing algorithm to disclose the entire set of NEs at once.
Experiments show that the model yields promising results on widely used benchmarks, approaching or even achieving the state of the art.
arXiv Detail & Related papers (2022-04-17T12:48:20Z)
- Universal approximation property of invertible neural networks [76.95927093274392]
Invertible neural networks (INNs) are neural network architectures with invertibility by design.
Thanks to their invertibility and the tractability of their Jacobians, INNs have various machine learning applications such as probabilistic modeling, generative modeling, and representation learning.
arXiv Detail & Related papers (2022-04-15T10:45:26Z)
- Modeling Target-Side Morphology in Neural Machine Translation: A Comparison of Strategies [72.56158036639707]
Morphologically rich languages pose difficulties to machine translation.
A large number of differently inflected surface forms entails a larger vocabulary.
Some inflected forms of infrequent terms typically do not appear in the training corpus.
Linguistic agreement requires the system to correctly match the grammatical categories between inflected word forms in the output sentence.
arXiv Detail & Related papers (2022-03-25T10:13:20Z)
- Unified Named Entity Recognition as Word-Word Relation Classification [25.801945832005504]
We present a novel alternative by modeling the unified NER as word-word relation classification, namely W2NER.
The architecture resolves the kernel bottleneck of unified NER by effectively modeling the neighboring relations between entity words.
Based on the W2NER scheme we develop a neural framework, in which the unified NER is modeled as a 2D grid of word pairs.
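The 2D-grid scheme described above can be sketched as follows (the relation names `NNW` and `THW-*` follow the scheme as summarized here, but the hand-filled grid and the simplified decoder are mine; in W2NER the grid cells are predicted by the neural model):

```python
# Rough sketch of the W2NER idea: entities are read off a 2D grid of
# word-pair relations instead of a flat tag sequence, so nested and
# discontinuous entities fit the same representation.

words = ["New", "York", "City"]
n = len(words)

# grid[i][j] holds the relation from word i to word j.
grid = [["NONE"] * n for _ in range(n)]
grid[0][1] = "NNW"      # "New" -> "York": next word inside the same entity
grid[1][2] = "NNW"      # "York" -> "City"
grid[2][0] = "THW-LOC"  # tail "City" points back to head "New", typed LOC

def decode(grid, words):
    """Follow NNW links from each head flagged by a THW-* cell."""
    entities = []
    for tail in range(len(words)):
        for head in range(len(words)):
            if grid[tail][head].startswith("THW-"):
                span, i = [head], head
                while i != tail:  # walk NNW links until the tail is reached
                    i = next(j for j in range(len(words))
                             if grid[i][j] == "NNW")
                    span.append(i)
                entities.append((" ".join(words[k] for k in span),
                                 grid[tail][head][4:]))
    return entities

print(decode(grid, words))  # [('New York City', 'LOC')]
```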
arXiv Detail & Related papers (2021-12-19T06:11:07Z)
- Evaluation of Morphological Embeddings for the Russian Language [0.0]
Morphology-based embeddings trained with the Skipgram objective do not outperform the existing embedding model, FastText.
A more complex but morphology-unaware model, BERT, achieves significantly greater performance on tasks that presumably require understanding of a word's morphology.
arXiv Detail & Related papers (2021-03-11T11:59:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.