Reconciliation of Pre-trained Models and Prototypical Neural Networks in
Few-shot Named Entity Recognition
- URL: http://arxiv.org/abs/2211.03270v1
- Date: Mon, 7 Nov 2022 02:33:45 GMT
- Title: Reconciliation of Pre-trained Models and Prototypical Neural Networks in
Few-shot Named Entity Recognition
- Authors: Youcheng Huang, Wenqiang Lei, Jie Fu and Jiancheng Lv
- Abstract summary: We propose a one-line-code normalization method to reconcile such a mismatch with empirical and theoretical grounds.
Our work also provides an analytical viewpoint for addressing the general problems in few-shot name entity recognition.
- Score: 35.34238362639678
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Incorporating large-scale pre-trained models with the prototypical neural
networks is a de-facto paradigm in few-shot named entity recognition. Existing
methods, unfortunately, are not aware of the fact that embeddings from
pre-trained models contain a prominently large amount of information regarding
word frequencies, biasing prototypical neural networks against learning word
entities. This discrepancy constrains the two models' synergy. Thus, we propose
a one-line-code normalization method to reconcile such a mismatch with
empirical and theoretical grounds. Our experiments based on nine benchmark
datasets show the superiority of our method over the counterpart models and are
comparable to the state-of-the-art methods. In addition to the model
enhancement, our work also provides an analytical viewpoint for addressing the
general problems in few-shot name entity recognition or other tasks that rely
on pre-trained models or prototypical neural networks.
Related papers
- BEND: Bagging Deep Learning Training Based on Efficient Neural Network Diffusion [56.9358325168226]
We propose a Bagging deep learning training algorithm based on Efficient Neural network Diffusion (BEND)
Our approach is simple but effective, first using multiple trained model weights and biases as inputs to train autoencoder and latent diffusion model.
Our proposed BEND algorithm can consistently outperform the mean and median accuracies of both the original trained model and the diffused model.
arXiv Detail & Related papers (2024-03-23T08:40:38Z) - MENTOR: Human Perception-Guided Pretraining for Increased Generalization [5.596752018167751]
We introduce MENTOR (huMan pErceptioN-guided preTraining fOr increased geneRalization)
We train an autoencoder to learn human saliency maps given an input image, without class labels.
We remove the decoder part, add a classification layer on top of the encoder, and fine-tune this new model conventionally.
arXiv Detail & Related papers (2023-10-30T13:50:44Z) - Unleashing the power of Neural Collapse for Transferability Estimation [42.09673383041276]
Well-trained models exhibit the phenomenon of Neural Collapse.
We propose a novel method termed Fair Collapse (FaCe) for transferability estimation.
FaCe yields state-of-the-art performance on different tasks including image classification, semantic segmentation, and text classification.
arXiv Detail & Related papers (2023-10-09T14:30:10Z) - On Modifying a Neural Network's Perception [3.42658286826597]
We propose a method which allows one to modify what an artificial neural network is perceiving regarding specific human-defined concepts.
We test the proposed method on different models, assessing whether the performed manipulations are well interpreted by the models, and analyzing how they react to them.
arXiv Detail & Related papers (2023-03-05T12:09:37Z) - Generalization Properties of Retrieval-based Models [50.35325326050263]
Retrieval-based machine learning methods have enjoyed success on a wide range of problems.
Despite growing literature showcasing the promise of these models, the theoretical underpinning for such models remains underexplored.
We present a formal treatment of retrieval-based models to characterize their generalization ability.
arXiv Detail & Related papers (2022-10-06T00:33:01Z) - With Greater Distance Comes Worse Performance: On the Perspective of
Layer Utilization and Model Generalization [3.6321778403619285]
Generalization of deep neural networks remains one of the main open problems in machine learning.
Early layers generally learn representations relevant to performance on both training data and testing data.
Deeper layers only minimize training risks and fail to generalize well with testing or mislabeled data.
arXiv Detail & Related papers (2022-01-28T05:26:32Z) - Anomaly Detection on Attributed Networks via Contrastive Self-Supervised
Learning [50.24174211654775]
We present a novel contrastive self-supervised learning framework for anomaly detection on attributed networks.
Our framework fully exploits the local information from network data by sampling a novel type of contrastive instance pair.
A graph neural network-based contrastive learning model is proposed to learn informative embedding from high-dimensional attributes and local structure.
arXiv Detail & Related papers (2021-02-27T03:17:20Z) - Firearm Detection via Convolutional Neural Networks: Comparing a
Semantic Segmentation Model Against End-to-End Solutions [68.8204255655161]
Threat detection of weapons and aggressive behavior from live video can be used for rapid detection and prevention of potentially deadly incidents.
One way for achieving this is through the use of artificial intelligence and, in particular, machine learning for image analysis.
We compare a traditional monolithic end-to-end deep learning model and a previously proposed model based on an ensemble of simpler neural networks detecting fire-weapons via semantic segmentation.
arXiv Detail & Related papers (2020-12-17T15:19:29Z) - On the Transferability of Adversarial Attacksagainst Neural Text
Classifier [121.6758865857686]
We investigate the transferability of adversarial examples for text classification models.
We propose a genetic algorithm to find an ensemble of models that can induce adversarial examples to fool almost all existing models.
We derive word replacement rules that can be used for model diagnostics from these adversarial examples.
arXiv Detail & Related papers (2020-11-17T10:45:05Z) - Amortized Bayesian Inference for Models of Cognition [0.1529342790344802]
Recent advances in simulation-based inference using specialized neural network architectures circumvent many previous problems of approximate Bayesian computation.
We provide a general introduction to amortized Bayesian parameter estimation and model comparison.
arXiv Detail & Related papers (2020-05-08T08:12:15Z) - Rethinking Generalization of Neural Models: A Named Entity Recognition
Case Study [81.11161697133095]
We take the NER task as a testbed to analyze the generalization behavior of existing models from different perspectives.
Experiments with in-depth analyses diagnose the bottleneck of existing neural NER models.
As a by-product of this paper, we have open-sourced a project that involves a comprehensive summary of recent NER papers.
arXiv Detail & Related papers (2020-01-12T04:33:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.