Rethinking Generalization of Neural Models: A Named Entity Recognition
Case Study
- URL: http://arxiv.org/abs/2001.03844v1
- Date: Sun, 12 Jan 2020 04:33:53 GMT
- Authors: Jinlan Fu, Pengfei Liu, Qi Zhang, Xuanjing Huang
- Abstract summary: We take the NER task as a testbed to analyze the generalization behavior of existing models from different perspectives.
Experiments with in-depth analyses diagnose the bottleneck of existing neural NER models.
As a by-product of this paper, we have open-sourced a project that involves a comprehensive summary of recent NER papers.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While neural network-based models have achieved impressive performance on a
large body of NLP tasks, the generalization behavior of different models
remains poorly understood: does this excellent performance imply perfect
generalization, or are there still limitations? In this paper, we
take the NER task as a testbed to analyze the generalization behavior of
existing models from different perspectives and characterize the differences of
their generalization abilities through the lens of our proposed measures, which
guides us to better design models and training methods. Experiments with
in-depth analyses diagnose the bottleneck of existing neural NER models in
terms of breakdown performance analysis, annotation errors, dataset bias, and
category relationships, which suggest directions for improvement. We have
released two datasets (ReCoNLL and PLONER) for future research at our project
page: http://pfliu.com/InterpretNER/. As a by-product of this paper, we have
open-sourced a project that involves a comprehensive summary of recent NER
papers and classifies them into different research topics:
https://github.com/pfliu-nlp/Named-Entity-Recognition-NER-Papers.
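The NER task the paper uses as its testbed can be illustrated with a short, self-contained sketch: recovering typed entity spans from a BIO-tagged token sequence, the representation most neural NER models are trained and evaluated on. The function, tokens, and tags below are illustrative examples, not artifacts from the paper.

```python
def spans_from_bio(tokens, tags):
    """Extract (entity_type, text) spans from a BIO tag sequence."""
    spans, current, etype = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag starts a new entity; close any open one first.
            if current:
                spans.append((etype, " ".join(current)))
            current, etype = [tok], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == etype:
            # An I- tag of the same type continues the open entity.
            current.append(tok)
        else:
            # O tags (or inconsistent I- tags) close the open entity.
            if current:
                spans.append((etype, " ".join(current)))
            current, etype = [], None
    if current:
        spans.append((etype, " ".join(current)))
    return spans

tokens = ["Jinlan", "Fu", "works", "at", "Fudan", "University"]
tags = ["B-PER", "I-PER", "O", "O", "B-ORG", "I-ORG"]
print(spans_from_bio(tokens, tags))
# [('PER', 'Jinlan Fu'), ('ORG', 'Fudan University')]
```

Breakdown analyses like the paper's typically score these predicted spans against gold spans per category, which is what exposes category-level and dataset-bias effects.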
Related papers
- Axiomatic Causal Interventions for Reverse Engineering Relevance Computation in Neural Retrieval Models [20.29451537633895]
We propose the use of causal interventions to reverse engineer neural rankers.
We demonstrate how mechanistic interpretability methods can be used to isolate components satisfying term-frequency axioms.
arXiv Detail & Related papers (2024-05-03T22:30:15Z)
- A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime.
We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by, e.g., the combination of model and parameter-initialization scheme.
arXiv Detail & Related papers (2023-11-13T01:48:08Z)
- Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model [74.62272538148245]
We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other.
We investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
arXiv Detail & Related papers (2023-10-26T17:59:46Z)
- Robust Graph Representation Learning via Predictive Coding [46.22695915912123]
Predictive coding is a message-passing framework initially developed to model information processing in the brain.
In this work, we build models that rely on the message-passing rule of predictive coding.
We show that the proposed models are comparable to standard ones in terms of performance in both inductive and transductive tasks.
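The message-passing rule mentioned above can be sketched generically: each latent state is nudged to reduce the prediction error it induces at the layer below, balanced against a pull toward its prior. This is a minimal one-layer instance of standard predictive coding inference, not the paper's graph models; the dimensions and learning rate are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(3, 2)) * 0.5  # generative weights: latent x -> prediction of y
x = np.zeros(2)                    # latent state, inferred by iterative updates
y = rng.normal(size=3)             # observed data

lr = 0.1
for _ in range(200):
    eps = y - W @ x                # prediction error at the data layer
    x += lr * (W.T @ eps - x)      # message: back-projected error minus prior pull

# The iteration converges to the ridge-regression fixed point
# (W^T W + I) x = W^T y, i.e. inference as energy minimization.
```

The same local update, repeated over the edges of a graph, is what lets predictive coding act as a message-passing framework comparable to standard GNN layers.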
arXiv Detail & Related papers (2022-12-09T03:58:22Z)
- SimSCOOD: Systematic Analysis of Out-of-Distribution Generalization in Fine-tuned Source Code Models [58.78043959556283]
We study the behaviors of models under different fine-tuning methodologies, including full fine-tuning and Low-Rank Adaptation (LoRA) fine-tuning methods.
Our analysis uncovers that LoRA fine-tuning consistently exhibits significantly better OOD generalization performance than full fine-tuning across various scenarios.
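The LoRA method compared above can be summarized in a few lines. This is a hedged, dependency-free sketch of the core idea, not the paper's experimental setup: instead of updating a full weight matrix W, training touches only a low-rank pair B and A, so the effective weight is W + (alpha / r) * B @ A. The dimensions and names below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 16, 2, 4

W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable, small random init
B = np.zeros((d_out, r))               # trainable, zero init

def lora_forward(x):
    # With B = 0 at initialization, the adapted model exactly
    # matches the base model; training moves only A and B.
    return (W + (alpha / r) * B @ A) @ x

x = rng.normal(size=d_in)
assert np.allclose(lora_forward(x), W @ x)  # identical before training
```

Because only the r * (d_in + d_out) low-rank parameters move, the pretrained weights act as a strong anchor, which is one intuition for the better OOD behavior the paper reports.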
arXiv Detail & Related papers (2022-10-10T16:07:24Z)
- On the Generalization and Adaption Performance of Causal Models [99.64022680811281]
Differentiable causal discovery proposes to factorize the data-generating process into a set of modules.
We study the generalization and adaption performance of such modular neural causal models.
Our analysis shows that modular neural causal models outperform other models on both zero- and few-shot adaptation in low-data regimes.
arXiv Detail & Related papers (2022-06-09T17:12:32Z)
- Which Model To Trust: Assessing the Influence of Models on the Performance of Reinforcement Learning Algorithms for Continuous Control Tasks [0.0]
It is not clear how much of the recent progress is due to improved algorithms or due to improved models.
A set of commonly adopted models is established for the purpose of model comparison.
Results reveal that significant differences in model performance do exist.
arXiv Detail & Related papers (2021-10-25T16:17:26Z)
- Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z)
- Small Data, Big Decisions: Model Selection in the Small-Data Regime [11.817454285986225]
We study the generalization performance as the size of the training set varies over multiple orders of magnitude.
Our experiments furthermore allow us to estimate Minimum Description Lengths for common datasets given modern neural network architectures.
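One standard way to estimate a description length, prequential (online) coding, can be sketched as follows: encode the data by predicting each block with a model fit only on the preceding blocks, and sum the resulting code lengths -log2 p(block | past). This toy Bernoulli example is purely illustrative, not the paper's procedure.

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.integers(0, 2, size=1000)  # toy binary sequence, p(1) ~ 0.5

def bernoulli_codelen(block, p):
    """Bits needed to encode a 0/1 block under a Bernoulli(p) model."""
    p = min(max(p, 1e-6), 1 - 1e-6)   # guard against log(0)
    ones = block.sum()
    return -(ones * np.log2(p) + (len(block) - ones) * np.log2(1 - p))

total_bits, block = 0.0, 100
for start in range(0, len(data), block):
    past = data[:start]
    p_hat = (past.sum() + 1) / (len(past) + 2)  # Laplace-smoothed "model"
    total_bits += bernoulli_codelen(data[start : start + block], p_hat)

print(f"prequential description length: {total_bits:.1f} bits")
```

For a fair-coin sequence the total lands near 1000 bits (1 bit per symbol); a model that generalizes well from small data compresses early blocks better and hence yields a shorter description length.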
arXiv Detail & Related papers (2020-09-26T12:52:56Z)
- Critically Examining the Claimed Value of Convolutions over User-Item Embedding Maps for Recommender Systems [14.414055798999764]
In recent years, algorithm research in the area of recommender systems has shifted from matrix factorization techniques to neural approaches.
We show through analytical considerations and empirical evaluations that the claimed gains reported in the literature cannot be attributed to the ability of CNNs to model embedding correlations.
arXiv Detail & Related papers (2020-07-23T10:03:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.