A Rigorous Study on Named Entity Recognition: Can Fine-tuning Pretrained
Model Lead to the Promised Land?
- URL: http://arxiv.org/abs/2004.12126v2
- Date: Fri, 23 Oct 2020 07:06:06 GMT
- Title: A Rigorous Study on Named Entity Recognition: Can Fine-tuning Pretrained
Model Lead to the Promised Land?
- Authors: Hongyu Lin, Yaojie Lu, Jialong Tang, Xianpei Han, Le Sun, Zhicheng
Wei, Nicholas Jing Yuan
- Abstract summary: Fine-tuning pretrained models has achieved promising performance on standard NER benchmarks.
Unfortunately, when scaling NER to open situations, these advantages may no longer exist.
This paper proposes to conduct randomization tests on standard benchmarks.
- Score: 44.87003366511073
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fine-tuning pretrained models has achieved promising performance on standard
NER benchmarks. Generally, these benchmarks are blessed with strong name
regularity, high mention coverage and sufficient context diversity.
Unfortunately, when scaling NER to open situations, these advantages may no longer exist. This raises a critical question: can previously credible approaches still work well when facing these challenges? As there is no currently available dataset to investigate this problem, this paper proposes to conduct randomization tests on standard benchmarks. Specifically, we
erase name regularity, mention coverage and context diversity respectively from
the benchmarks, in order to explore their impact on the generalization ability
of models. To further verify our conclusions, we also construct a new open NER dataset that focuses on entity types with weaker name regularity and lower mention coverage. From both the randomization tests and the empirical experiments, we draw three conclusions: 1) name regularity is critical for models to generalize to unseen mentions; 2) high mention coverage may undermine a model's generalization ability; and 3) context patterns may not require enormous amounts of data to capture when using pretrained encoders.
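The randomization test is the core device here; as a rough, hypothetical sketch (not the authors' released code), the snippet below shows one way the three properties could be ablated on a CoNLL-style list of (token, BIO-tag) sentences. All function names and the sub-sampling heuristic used for context diversity are illustrative assumptions.

```python
import random
import string

# A sentence is a list of (token, bio_tag) pairs, e.g. [("John", "B-PER"), ("runs", "O")].

def random_token(length=6):
    return "".join(random.choices(string.ascii_lowercase, k=length))

def erase_name_regularity(sentences):
    """Replace every token inside a mention with a random string, so a model
    cannot exploit the surface form (the 'name regularity') of mentions."""
    return [[(random_token() if tag != "O" else tok, tag) for tok, tag in sent]
            for sent in sentences]

def mention_strings(sentences):
    """Collect the set of mention surface strings in a BIO-tagged corpus."""
    found, current = set(), []
    for sent in sentences:
        for tok, tag in sent:
            if tag.startswith("B-"):
                if current:
                    found.add(" ".join(current))
                current = [tok]
            elif tag.startswith("I-"):
                current.append(tok)
            else:
                if current:
                    found.add(" ".join(current))
                current = []
        if current:
            found.add(" ".join(current))
            current = []
    return found

def erase_mention_coverage(train_sentences, test_sentences):
    """Drop training sentences containing any mention that also appears in the
    test set, so every test mention becomes unseen at training time."""
    seen = mention_strings(test_sentences)
    return [s for s in train_sentences if not (mention_strings([s]) & seen)]

def erase_context_diversity(train_sentences, max_sentences=1000):
    """Crudely reduce context diversity by sub-sampling the training sentences."""
    return random.sample(train_sentences, min(max_sentences, len(train_sentences)))
```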
Related papers
- Detecting Multimodal Situations with Insufficient Context and Abstaining from Baseless Predictions [75.45274978665684]
Vision-Language Understanding (VLU) benchmarks contain samples where answers rely on assumptions unsupported by the provided context.
We collect contextual data for each sample whenever available and train a context selection module to facilitate evidence-based model predictions.
We develop a general-purpose Context-AwaRe Abstention detector to identify samples lacking sufficient context and enhance model accuracy.
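As a hedged illustration of the abstention idea (the scorer, threshold, and function names below are assumptions, not the paper's actual Context-AwaRe Abstention detector), a prediction could be withheld whenever no available context clears a relevance threshold:

```python
from typing import Callable, List, Optional

def predict_or_abstain(
    question: str,
    contexts: List[str],
    relevance_scorer: Callable[[str, str], float],  # hypothetical: higher = stronger supporting evidence
    answerer: Callable[[str, List[str]], str],      # hypothetical underlying VLU / QA model
    threshold: float = 0.5,
) -> Optional[str]:
    """Answer only when some context provides sufficient evidence; otherwise return None (abstain)."""
    supported = [c for c in contexts if relevance_scorer(question, c) >= threshold]
    if not supported:
        return None  # abstain rather than make a baseless prediction
    return answerer(question, supported)
```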
arXiv Detail & Related papers (2024-05-18T02:21:32Z)
- Rethinking Benchmark and Contamination for Language Models with Rephrased Samples [49.18977581962162]
Large language models are increasingly trained on all the data ever produced by humans.
Many have raised concerns about the trustworthiness of public benchmarks due to potential contamination in pre-training or fine-tuning datasets.
arXiv Detail & Related papers (2023-11-08T17:35:20Z)
- The Decaying Missing-at-Random Framework: Doubly Robust Causal Inference with Partially Labeled Data [10.021381302215062]
In real-world scenarios, data collection limitations often result in partially labeled datasets, leading to difficulties in drawing reliable causal inferences.
Traditional approaches in the semi-supervised (SS) and missing-data literature may not adequately handle these complexities, leading to biased estimates.
The proposed decaying missing-at-random (MAR) framework tackles missing outcomes in high-dimensional settings and accounts for selection bias.
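For readers unfamiliar with the setup, the snippet below is a generic doubly robust (AIPW-style) estimator of an outcome mean under partial labeling; it only illustrates the flavor of such estimators under a standard missing-at-random assumption and is not the decaying-MAR estimator proposed in the paper.

```python
import numpy as np

def doubly_robust_mean(y, labeled, outcome_pred, label_prob):
    """Generic AIPW estimator of E[Y] with partially labeled data.

    y            : outcomes (arbitrary placeholder values where labeled == 0)
    labeled      : 1 if the outcome was observed, else 0
    outcome_pred : outcome-model predictions m(X) for every unit
    label_prob   : estimated labeling propensities P(labeled = 1 | X)

    The estimate is consistent if either the outcome model or the propensity
    model is correctly specified (the "doubly robust" property).
    """
    y = np.asarray(y, dtype=float)
    labeled = np.asarray(labeled, dtype=float)
    m = np.asarray(outcome_pred, dtype=float)
    pi = np.clip(np.asarray(label_prob, dtype=float), 1e-3, 1.0)
    return np.mean(m + labeled * (y - m) / pi)
```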
arXiv Detail & Related papers (2023-05-22T07:37:12Z)
- Can NLP Models Correctly Reason Over Contexts that Break the Common Assumptions? [14.991565484636745]
We investigate the ability of NLP models to correctly reason over contexts that break the common assumptions.
We show that while the models do fairly well on contexts that follow common assumptions, they struggle to reason correctly over contexts that break those assumptions.
Specifically, the performance gap is as large as 20 absolute percentage points.
arXiv Detail & Related papers (2023-05-20T05:20:37Z)
- Testing for Overfitting [0.0]
We discuss the overfitting problem and explain why standard concentration results do not hold when models are evaluated on their own training data.
We introduce and argue for a hypothesis test by means of which model performance may be evaluated using training data.
arXiv Detail & Related papers (2023-05-09T22:49:55Z)
- Entity Disambiguation with Entity Definitions [50.01142092276296]
Local models have recently attained astounding performance in Entity Disambiguation (ED).
Previous works limited their studies to using, as the textual representation of each candidate, only its Wikipedia title.
In this paper, we address this limitation and investigate to what extent more expressive textual representations can mitigate it.
We report a new state of the art on 2 out of 6 benchmarks we consider and strongly improve the generalization capability over unseen patterns.
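A minimal sketch of the idea, assuming a hypothetical pair scorer (e.g., a cross-encoder) rather than the authors' actual model: each candidate is represented by its definition text instead of only its Wikipedia title, and the best-scoring candidate is returned.

```python
from typing import Callable, Dict

def disambiguate(
    context_with_mention: str,
    candidate_definitions: Dict[str, str],       # entity id -> definition text
    pair_scorer: Callable[[str, str], float],    # hypothetical (context, definition) scorer
) -> str:
    """Pick the candidate whose definition best matches the mention in context."""
    return max(
        candidate_definitions,
        key=lambda ent: pair_scorer(context_with_mention, candidate_definitions[ent]),
    )

# Usage with a toy overlap-based scorer (illustrative only):
def toy_scorer(a: str, b: str) -> float:
    return float(len(set(a.lower().split()) & set(b.lower().split())))

best = disambiguate(
    "He moved to [Paris] to study philosophy.",
    {
        "Paris_(city)": "Paris is the capital and most populous city of France.",
        "Paris_(mythology)": "Paris is a figure in Greek mythology, son of King Priam of Troy.",
    },
    toy_scorer,
)
```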
arXiv Detail & Related papers (2022-10-11T17:46:28Z)
- Context-aware Adversarial Training for Name Regularity Bias in Named Entity Recognition [8.344476599818826]
We introduce NRB, a new testbed designed to diagnose Name Regularity Bias of NER models.
Our results indicate that all state-of-the-art models we tested show such a bias.
We propose a novel model-agnostic training method that adds learnable adversarial noise to some entity mentions.
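The abstract only states that learnable adversarial noise is added to some entity mentions, so the PyTorch sketch below is one plausible reading under stated assumptions: a trainable, norm-bounded noise vector is added to the embeddings at mention positions; in adversarial training it would be optimized to increase the NER loss while the encoder is optimized to decrease it. Module and argument names are hypothetical, not the paper's exact method.

```python
import torch
import torch.nn as nn

class MentionNoise(nn.Module):
    """Adds a learnable, norm-bounded noise vector to embeddings at mention positions."""

    def __init__(self, hidden_size: int, max_norm: float = 0.1):
        super().__init__()
        self.noise = nn.Parameter(torch.zeros(hidden_size))
        self.max_norm = max_norm

    def forward(self, embeddings: torch.Tensor, mention_mask: torch.Tensor) -> torch.Tensor:
        # embeddings: (batch, seq_len, hidden); mention_mask: (batch, seq_len), 1 at mention tokens.
        norm = self.noise.norm().clamp(min=1e-12)
        bounded = self.noise * (self.max_norm / norm).clamp(max=1.0)  # rescale only if too large
        return embeddings + mention_mask.unsqueeze(-1).float() * bounded
```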
arXiv Detail & Related papers (2021-07-24T13:55:35Z)
- RATT: Leveraging Unlabeled Data to Guarantee Generalization [96.08979093738024]
We introduce a method that leverages unlabeled data to produce generalization bounds.
We prove that our bound is valid for 0-1 empirical risk minimization.
This work provides practitioners with an option for certifying the generalization of deep nets even when unseen labeled data is unavailable.
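A heavily hedged sketch of one plausible mechanism behind such a bound: unlabeled examples are given uniformly random labels and mixed into training, and the bound is then computed from the empirical errors on the clean and on the randomly labeled portions (the exact combination is given in the paper and is not reproduced here). All names below are illustrative.

```python
import random

def build_mixed_training_set(labeled, unlabeled, classes):
    """Mix clean labeled data with unlabeled data that receives uniformly random labels."""
    randomly_labeled = [(x, random.choice(classes)) for x in unlabeled]
    return labeled + randomly_labeled, randomly_labeled

def empirical_errors(model_predict, labeled, randomly_labeled):
    """Return the two quantities a RATT-style bound is built from:
    error on the clean training data and error on the randomly labeled data."""
    def err(data):
        return sum(model_predict(x) != y for x, y in data) / max(len(data), 1)
    return err(labeled), err(randomly_labeled)
```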
arXiv Detail & Related papers (2021-05-01T17:05:29Z)
- Regularizing Models via Pointwise Mutual Information for Named Entity Recognition [17.767466724342064]
We propose a Pointwise Mutual Information (PMI) based regularizer to enhance generalization ability without sacrificing in-domain performance.
Our approach debiases highly correlated words and labels in the benchmark datasets.
For entities with long names or complex structure, our method can still predict them correctly by debiasing on conjunctions or special characters.
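For context on the word-label correlations such a regularizer targets, the snippet below computes pointwise mutual information, PMI(w, y) = log [ p(w, y) / (p(w) p(y)) ], between tokens and entity labels in a tagged corpus; pairs with very high PMI are the surface-form shortcuts a PMI-based debiasing method would aim to suppress. This is background math, not the paper's training procedure.

```python
import math
from collections import Counter

def word_label_pmi(tagged_sentences):
    """PMI(w, y) = log [ p(w, y) / (p(w) p(y)) ] over observed (token, label) pairs."""
    pair_counts, word_counts, label_counts = Counter(), Counter(), Counter()
    total = 0
    for sent in tagged_sentences:
        for word, label in sent:
            pair_counts[(word, label)] += 1
            word_counts[word] += 1
            label_counts[label] += 1
            total += 1
    return {
        (w, y): math.log((c / total) / ((word_counts[w] / total) * (label_counts[y] / total)))
        for (w, y), c in pair_counts.items()
    }

# Tokens whose PMI with a specific entity label is very high are strong
# surface-form shortcuts a model may latch onto instead of using context.
```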
arXiv Detail & Related papers (2021-04-15T05:47:27Z)
- A Brief Survey and Comparative Study of Recent Development of Pronoun Coreference Resolution [55.39835612617972]
Pronoun Coreference Resolution (PCR) is the task of resolving pronominal expressions to all mentions they refer to.
As one important natural language understanding (NLU) component, pronoun resolution is crucial for many downstream tasks and still challenging for existing models.
We conduct extensive experiments to show that even though current models are achieving good performance on the standard evaluation set, they are still not ready to be used in real applications.
arXiv Detail & Related papers (2020-09-27T01:40:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.