Entity-Switched Datasets: An Approach to Auditing the In-Domain Robustness of Named Entity Recognition Models
- URL: http://arxiv.org/abs/2004.04123v2
- Date: Wed, 13 Jan 2021 18:50:37 GMT
- Authors: Oshin Agarwal, Yinfei Yang, Byron C. Wallace, Ani Nenkova
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Named entity recognition systems perform well on standard datasets comprising
English news. But given the paucity of data, it is difficult to draw
conclusions about the robustness of systems with respect to recognizing a
diverse set of entities. We propose a method for auditing the in-domain
robustness of systems, focusing specifically on differences in performance due
to the national origin of entities. We create entity-switched datasets, in
which named entities in the original texts are replaced by plausible named
entities of the same type but of different national origin. We find that
state-of-the-art systems' performance varies widely even in-domain: in the same
context, entities from certain origins are more reliably recognized than
entities from elsewhere. Systems perform best on American and Indian entities,
and worst on Vietnamese and Indonesian entities. This auditing approach can
facilitate the development of more robust named entity recognition systems, and
will allow research in this area to consider fairness criteria that have
received heightened attention in other predictive technology work.
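The abstract describes entity switching and the per-origin audit only at a high level. The following is a minimal sketch of one way to realize the idea, not the authors' code. It assumes BIO-tagged token sequences, small hypothetical name pools (`NAME_POOLS`), entity-level exact-match recall as the metric, and a hypothetical memorizing tagger (`toy_tagger`) standing in for a real NER model. The paper's name lists, sampling procedure, and metrics may differ.

```python
import random

# Hypothetical name pools keyed by (entity type, national origin).
# Illustrative assumptions only, not the paper's actual lists.
NAME_POOLS = {
    ("PER", "American"): [["John", "Smith"], ["Mary", "Johnson"]],
    ("PER", "Vietnamese"): [["Nguyen", "Van", "An"], ["Tran", "Thi", "Mai"]],
    ("LOC", "American"): [["Denver"], ["Ohio"]],
    ("LOC", "Vietnamese"): [["Hanoi"], ["Da", "Nang"]],
}


def entity_spans(tags):
    """Return (start, end, type) spans from a BIO tag list; end is exclusive."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):
        # An I- tag of the same type extends the currently open span.
        if tag.startswith("I-") and start is not None and tag[2:] == etype:
            continue
        if start is not None:
            spans.append((start, i, etype))
            start, etype = None, None
        if tag != "O":  # B-X, or an I-X that cannot continue the open span
            start, etype = i, tag[2:]
    return spans


def switch_entities(tokens, tags, origin, rng):
    """Replace each entity with a same-type name of `origin`, keeping the context.

    Entities whose type has no pool for `origin` keep their original tokens.
    Tags are regenerated so that replacement names of any length stay aligned.
    """
    new_tokens, new_tags, prev = [], [], 0
    for start, end, etype in entity_spans(tags):
        new_tokens.extend(tokens[prev:start])
        new_tags.extend(tags[prev:start])
        pool = NAME_POOLS.get((etype, origin))
        name = rng.choice(pool) if pool else tokens[start:end]
        new_tokens.extend(name)
        new_tags.extend(["B-" + etype] + ["I-" + etype] * (len(name) - 1))
        prev = end
    new_tokens.extend(tokens[prev:])
    new_tags.extend(tags[prev:])
    return new_tokens, new_tags


def recall_by_origin(sentences, origins, predict, seed=0):
    """Entity-level exact-match recall of `predict` on each origin's switched copy."""
    rng = random.Random(seed)
    scores = {}
    for origin in origins:
        found = total = 0
        for tokens, tags in sentences:
            sw_tokens, sw_tags = switch_entities(tokens, tags, origin, rng)
            gold = set(entity_spans(sw_tags))
            pred = set(entity_spans(predict(sw_tokens)))
            found += len(gold & pred)
            total += len(gold)
        scores[origin] = found / total if total else 0.0
    return scores


# Hypothetical stand-in for an NER model: it only tags names it has memorized.
KNOWN = {"John": "PER", "Smith": "PER", "Mary": "PER", "Johnson": "PER",
         "Denver": "LOC", "Ohio": "LOC"}


def toy_tagger(tokens):
    tags, prev = [], None
    for tok in tokens:
        etype = KNOWN.get(tok)
        tags.append("O" if etype is None else ("I-" if prev == etype else "B-") + etype)
        prev = etype
    return tags


if __name__ == "__main__":
    sent = (["Alice", "Brown", "flew", "to", "Boston", "."],
            ["B-PER", "I-PER", "O", "O", "B-LOC", "O"])
    print(recall_by_origin([sent], ["American", "Vietnamese"], toy_tagger))
    # → {'American': 1.0, 'Vietnamese': 0.0}
```

Because the context tokens are held fixed and only the entity names change, any gap between origins can be attributed to the names themselves. The toy tagger exaggerates this: it recognizes only memorized American names, so its recall collapses on the Vietnamese copy of the same sentence.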
Related papers
- LLM-DER: A Named Entity Recognition Method Based on Large Language Models for Chinese Coal Chemical Domain (2024-09-16)
  We propose LLM-DER, a framework based on Large Language Models (LLMs), for the domain-specific entity recognition problem in Chinese.
  LLM-DER uses LLMs to generate a list of relationships containing entity types, and applies a plausibility and consistency evaluation method to remove misrecognized entities.
  Experimental results on the Resume dataset and on Coal, a self-constructed coal chemical dataset, show that LLM-DER achieves outstanding performance in domain-specific entity recognition.
- Entity Disambiguation via Fusion Entity Decoding (2024-04-02)
  We propose an encoder-decoder model that disambiguates entities using more detailed entity descriptions.
  We observe a +1.5% improvement in end-to-end entity linking on the GERBIL benchmark compared with EntQA.
- Entity6K: A Large Open-Domain Evaluation Dataset for Real-World Entity Recognition (2024-03-19)
  We introduce Entity6K, a comprehensive dataset for real-world entity recognition.
  It features 5,700 entities across 26 categories, each supported by 5 human-verified images with annotations.
- IXA/Cogcomp at SemEval-2023 Task 2: Context-enriched Multilingual Named Entity Recognition using Knowledge Bases (2023-04-20)
  We present a novel NER cascade approach comprising three steps.
  We empirically demonstrate the importance of external knowledge bases for accurately classifying fine-grained and emerging entities.
  Our system performs robustly in the MultiCoNER2 shared task, even in the low-resource language setting.
- Transformer-Based Named Entity Recognition for French Using Adversarial Adaptation to Similar Domain Corpora (2022-12-05)
  We propose a transformer-based NER approach for French that uses adversarial adaptation to similar-domain or general corpora.
  We evaluate our approach on three labelled datasets and show that our adaptation framework outperforms the corresponding non-adaptive models.
- Using Domain Knowledge for Low Resource Named Entity Recognition (2022-03-28)
  We propose using domain knowledge to improve named entity recognition performance in low-resource areas.
  The proposed model avoids large-scale data adjustments across domains while handling low-resource named entity recognition.
- Interpretable Multi-dataset Evaluation for Named Entity Recognition (2020-11-13)
  We present a general methodology for interpretable evaluation of the named entity recognition (NER) task.
  The proposed evaluation method enables us to interpret the differences between models and datasets, as well as the interplay between them.
  By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
- Interpretability Analysis for Named Entity Recognition to Understand System Predictions and How They Can Improve (2020-04-09)
  We examine the performance of several variants of LSTM-CRF architectures for named entity recognition.
  We find that context representations do contribute to system performance, but that the main factor driving high performance is learning the name tokens themselves.
  We enlist human annotators to evaluate whether entity types can be inferred from context alone, and find that, although people also cannot infer the entity type for the majority of the errors made by the context-only system, there is some room for improvement.