Exhaustive Entity Recognition for Coptic: Challenges and Solutions
- URL: http://arxiv.org/abs/2011.02068v1
- Date: Tue, 3 Nov 2020 23:49:42 GMT
- Title: Exhaustive Entity Recognition for Coptic: Challenges and Solutions
- Authors: Amir Zeldes, Lance Martin and Sichang Tu
- Abstract summary: In this paper we present entity recognition for Coptic, the language of Hellenistic era Egypt.
We evaluate NLP approaches to the task and lay out difficulties in applying them to a low-resource, morphologically complex language.
We present solutions for named and non-named nested entity recognition and semi-automatic entity linking to Wikipedia, relying on robust dependency parsing, feature-based CRF models, and hand-crafted knowledge base resources.
- Score: 8.980876474818153
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Entity recognition provides semantic access to ancient materials in the
Digital Humanities: it exposes people and places of interest in texts that
cannot be read exhaustively, facilitates linking resources and can provide a
window into text contents, even for texts with no translations. In this paper we
present entity recognition for Coptic, the language of Hellenistic era Egypt.
We evaluate NLP approaches to the task and lay out difficulties in applying them
to a low-resource, morphologically complex language. We present solutions for
named and non-named nested entity recognition and semi-automatic entity
linking to Wikipedia, relying on robust dependency parsing, feature-based CRF
models, and hand-crafted knowledge base resources, enabling high-accuracy NER
with orders of magnitude less data than those used for high-resource
languages. The results suggest avenues for research on other languages in
similar settings.
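As a rough, hypothetical illustration of the feature-based CRF component mentioned in the abstract, the sketch below tags flat BIO entity labels with the sklearn-crfsuite toolkit. The toolkit choice, feature set, and toy romanized example are assumptions rather than the authors' actual pipeline, and flat BIO tagging omits the nested-entity handling the paper describes.

```python
# Minimal sketch of a feature-based CRF entity tagger, using sklearn-crfsuite
# as a stand-in for whichever CRF implementation the paper actually uses.
# The feature set (surface form, affixes, POS, dependency label) is illustrative.
import sklearn_crfsuite

def token_features(sent, i):
    """Features for token i of a sentence given as (form, pos, deprel) triples."""
    form, pos, deprel = sent[i]
    feats = {
        "form": form.lower(),
        "prefix2": form[:2],
        "suffix2": form[-2:],
        "pos": pos,
        "deprel": deprel,          # from a dependency parser, as in the paper
        "is_first": i == 0,
    }
    if i > 0:
        feats["prev_pos"] = sent[i - 1][1]
    return feats

def sent_features(sent):
    return [token_features(sent, i) for i in range(len(sent))]

# Toy training data in a hypothetical (form, pos, deprel) format with BIO labels.
train_sents = [
    [("apa", "N", "appos"), ("shenoute", "NPROP", "nmod"), ("said", "V", "root")],
]
train_labels = [["B-person", "I-person", "O"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=100)
crf.fit([sent_features(s) for s in train_sents], train_labels)
print(crf.predict([sent_features(train_sents[0])]))
```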
Related papers
- Neurosymbolic AI approach to Attribution in Large Language Models [5.3454230926797734]
Neurosymbolic AI (NesyAI) combines the strengths of neural networks with structured symbolic reasoning.
This paper explores how NesyAI frameworks can enhance existing attribution models, offering more reliable, interpretable, and adaptable systems.
arXiv Detail & Related papers (2024-09-30T02:20:36Z)
- Knowledge Graph-Enhanced Large Language Models via Path Selection [58.228392005755026]
Large Language Models (LLMs) have shown unprecedented performance in various real-world applications.
LLMs are known to generate factually inaccurate outputs, a.k.a. the hallucination problem.
We propose a principled three-stage framework, KELP, to address these problems.
arXiv Detail & Related papers (2024-06-19T21:45:20Z)
- Low-Resource Named Entity Recognition with Cross-Lingual, Character-Level Neural Conditional Random Fields [68.17213992395041]
Low-resource named entity recognition is still an open problem in NLP.
We present a transfer learning scheme, whereby we train character-level neural CRFs to predict named entities for both high-resource and low-resource languages jointly.
arXiv Detail & Related papers (2024-04-14T23:44:49Z)
- DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text [73.68051228972024]
Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying on their internal knowledge.
Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge.
arXiv Detail & Related papers (2023-10-31T04:37:57Z)
- Improving Chinese Named Entity Recognition by Search Engine Augmentation [2.971423962840551]
We propose a neural-based approach to perform semantic augmentation using external knowledge from a search engine for Chinese NER.
In particular, a multi-channel semantic fusion model is adopted to generate the augmented input representations, which aggregates external related texts retrieved from the search engine.
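As a loose sketch of how retrieved texts might be fused with the input (not this paper's actual multi-channel model), the snippet below combines a query vector with search-result vectors via softmax attention; all shapes, the scoring, and the final concatenation are assumptions.

```python
# Illustrative fusion of an input representation with representations of
# texts retrieved from a search engine, via softmax attention.
import torch
import torch.nn.functional as F

def fuse_with_retrieved(query_vec, retrieved_vecs):
    """query_vec: (d,), retrieved_vecs: (k, d) -> fused (2d,) representation."""
    scores = retrieved_vecs @ query_vec      # (k,) relevance of each retrieved text
    weights = F.softmax(scores, dim=0)       # normalize to attention weights
    context = weights @ retrieved_vecs       # (d,) weighted sum of retrieved texts
    return torch.cat([query_vec, context])   # concatenate as the augmented input

query = torch.randn(128)
retrieved = torch.randn(5, 128)   # e.g. 5 snippets returned by the search engine
print(fuse_with_retrieved(query, retrieved).shape)  # torch.Size([256])
```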
arXiv Detail & Related papers (2022-10-23T08:42:05Z)
- Reinforced Iterative Knowledge Distillation for Cross-Lingual Named Entity Recognition [54.92161571089808]
Cross-lingual NER transfers knowledge from rich-resource languages to languages with low resources.
Existing cross-lingual NER methods do not make good use of rich unlabeled data in target languages.
We develop a novel approach based on the ideas of semi-supervised learning and reinforcement learning.
arXiv Detail & Related papers (2021-06-01T05:46:22Z)
- Knowledge Graph Based Synthetic Corpus Generation for Knowledge-Enhanced Language Model Pre-training [22.534866015730664]
We verbalize the entire English Wikidata KG.
We show that verbalizing a comprehensive, encyclopedic KG like Wikidata can be used to integrate structured KGs and natural language corpora.
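A minimal, hypothetical sketch of the underlying idea of verbalizing KG triples into text is shown below; the actual work trains a verbalizer over all of Wikidata, whereas the template and example triples here are only illustrative.

```python
# Toy verbalization of (subject, relation, object) triples into sentences,
# grouped by subject. Templates and triples are illustrative only.
from collections import defaultdict

triples = [
    ("Shenoute", "occupation", "abbot"),
    ("Shenoute", "place of birth", "Upper Egypt"),
]

def verbalize(triples):
    by_subject = defaultdict(list)
    for subj, rel, obj in triples:
        by_subject[subj].append(f"{rel} is {obj}")
    # One sentence per subject, concatenating its relation-object pairs.
    return [f"{subj}: " + "; ".join(facts) + "." for subj, facts in by_subject.items()]

for sentence in verbalize(triples):
    print(sentence)
# Shenoute: occupation is abbot; place of birth is Upper Egypt.
```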
arXiv Detail & Related papers (2020-10-23T22:14:50Z)
- Soft Gazetteers for Low-Resource Named Entity Recognition [78.00856159473393]
We propose a method of "soft gazetteers" that incorporates ubiquitously available information from English knowledge bases into neural named entity recognition models.
Our experiments on four low-resource languages show an average improvement of 4 points in F1 score.
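The snippet below is a rough sketch of the soft-gazetteer idea: rather than a binary gazetteer-match flag, each candidate span receives a vector of per-entity-type scores derived from knowledge-base candidate lists. The lookup table and scoring here are hypothetical stand-ins, not the paper's actual linking pipeline.

```python
# Soft gazetteer features: per-type scores for a span, derived from a
# (hypothetical) KB lookup that returns typed candidates with scores.
from collections import Counter

KB_CANDIDATES = {
    "cairo": [("LOC", 0.9), ("ORG", 0.1)],
    "mark": [("PER", 0.7), ("ORG", 0.2)],
}
ENTITY_TYPES = ["PER", "LOC", "ORG"]

def soft_gazetteer_features(span_text):
    """Return one soft score per entity type for the span, normalized to sum to 1."""
    totals = Counter()
    for etype, score in KB_CANDIDATES.get(span_text.lower(), []):
        totals[etype] += score
    norm = sum(totals.values()) or 1.0
    return [totals[t] / norm for t in ENTITY_TYPES]

print(soft_gazetteer_features("Cairo"))  # [0.0, 0.9, 0.1]
```

In the described approach, such per-type scores are then fed as additional features into the neural NER model; the sketch only computes them.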
arXiv Detail & Related papers (2020-05-04T21:58:02Z)
- Probing Linguistic Features of Sentence-Level Representations in Neural Relation Extraction [80.38130122127882]
We introduce 14 probing tasks targeting linguistic properties relevant to neural relation extraction (RE).
We use them to study representations learned by more than 40 different combinations of encoder architectures and linguistic features, trained on two datasets.
We find that the bias induced by the architecture and the inclusion of linguistic features are clearly expressed in the probing task performance.
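As a generic illustration of how a probing task works (not this paper's specific setup), the sketch below trains a simple classifier on frozen sentence representations to predict a property label; random vectors stand in for real encoder outputs.

```python
# Probing sketch: fit a linear classifier on fixed representations and use
# its held-out accuracy as a measure of how strongly a property is encoded.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))        # placeholder for frozen encoder representations
y = rng.integers(0, 2, size=200)      # placeholder linguistic property label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("probe accuracy:", probe.score(X_te, y_te))  # near chance on random vectors
```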
arXiv Detail & Related papers (2020-04-17T09:17:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.