Related papers: Diagnosing Representation Dynamics in NER Model Extension

Diagnosing Representation Dynamics in NER Model Extension

URL: http://arxiv.org/abs/2510.17930v2
Date: Thu, 23 Oct 2025 09:17:55 GMT
Title: Diagnosing Representation Dynamics in NER Model Extension
Authors: Xirui Zhang, Philippe de La Chevasnerie, Benoit Fabre,
Abstract summary: We find that fine-tuning a BERT model on standard semantic entities and new pattern-based PII results in minimal degradation for original classes.<n>This work provides a mechanistic diagnosis of NER model adaptation, highlighting feature independence, representation overlap, and 'O' tag plasticity.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Extending Named Entity Recognition (NER) models to new PII entities in noisy spoken-language data is a common need. We find that jointly fine-tuning a BERT model on standard semantic entities (PER, LOC, ORG) and new pattern-based PII (EMAIL, PHONE) results in minimal degradation for original classes. We investigate this "peaceful coexistence," hypothesizing that the model uses independent semantic vs. morphological feature mechanisms. Using an incremental learning setup as a diagnostic tool, we measure semantic drift and find two key insights. First, the LOC (location) entity is uniquely vulnerable due to a representation overlap with new PII, as it shares pattern-like features (e.g., postal codes). Second, we identify a "reverse O-tag representation drift." The model, initially trained to map PII patterns to 'O', blocks new learning. This is resolved only by unfreezing the 'O' tag's classifier, allowing the background class to adapt and "release" these patterns. This work provides a mechanistic diagnosis of NER model adaptation, highlighting feature independence, representation overlap, and 'O' tag plasticity. Work done based on data gathered by https://www.papernest.com

Related papers

Any-Way Meta Learning [27.16222034423108]
We introduce the any-way" learning paradigm, an innovative model training approach that liberates model from fixed cardinality constraints. Surprisingly, this model not only matches but often outperforms traditional fixed-way models in terms of performance, convergence speed, and stability.
arXiv Detail & Related papers (2024-01-10T12:00:53Z)
Exploiting Contextual Target Attributes for Target Sentiment Classification [53.30511968323911]
Existing PTLM-based models for TSC can be categorized into two groups: 1) fine-tuning-based models that adopt PTLM as the context encoder; 2) prompting-based models that transfer the classification task to the text/word generation task. We present a new perspective of leveraging PTLM for TSC: simultaneously leveraging the merits of both language modeling and explicit target-context interactions via contextual target attributes.
arXiv Detail & Related papers (2023-12-21T11:45:28Z)
FLIP: Fine-grained Alignment between ID-based Models and Pretrained Language Models for CTR Prediction [49.510163437116645]
Click-through rate (CTR) prediction plays as a core function module in personalized online services. Traditional ID-based models for CTR prediction take as inputs the one-hot encoded ID features of tabular modality. Pretrained Language Models(PLMs) has given rise to another paradigm, which takes as inputs the sentences of textual modality. We propose to conduct Fine-grained feature-level ALignment between ID-based Models and Pretrained Language Models(FLIP) for CTR prediction.
arXiv Detail & Related papers (2023-10-30T11:25:03Z)
ProtoNER: Few shot Incremental Learning for Named Entity Recognition using Prototypical Networks [7.317342506617286]
Prototypical Network based end-to-end KVP extraction model is presented. No dependency on dataset used for initial training of the model. No intermediate synthetic data generation which tends to add noise and results in model's performance degradation.
arXiv Detail & Related papers (2023-10-03T18:52:19Z)
Prototype-based Embedding Network for Scene Graph Generation [105.97836135784794]
Current Scene Graph Generation (SGG) methods explore contextual information to predict relationships among entity pairs. Due to the diverse visual appearance of numerous possible subject-object combinations, there is a large intra-class variation within each predicate category. Prototype-based Embedding Network (PE-Net) models entities/predicates with prototype-aligned compact and distinctive representations. PL is introduced to help PE-Net efficiently learn such entitypredicate matching, and Prototype Regularization (PR) is devised to relieve the ambiguous entity-predicate matching.
arXiv Detail & Related papers (2023-03-13T13:30:59Z)
Span-based Named Entity Recognition by Generating and Compressing Information [23.444967843156086]
We propose to combine the two types of IB models into one system to enhance Named Entity Recognition. Experiments on five different corpora indicate that jointly training both generative and information compression models can enhance the performance of the baseline span-based NER system.
arXiv Detail & Related papers (2023-02-10T17:40:51Z)
Nested Named Entity Recognition from Medical Texts: An Adaptive Shared Network Architecture with Attentive CRF [53.55504611255664]
We propose a novel method, referred to as ASAC, to solve the dilemma caused by the nested phenomenon. The proposed method contains two key modules: the adaptive shared (AS) part and the attentive conditional random field (ACRF) module. Our model could learn better entity representations by capturing the implicit distinctions and relationships between different categories of entities.
arXiv Detail & Related papers (2022-11-09T09:23:56Z)
SpanProto: A Two-stage Span-based Prototypical Network for Few-shot Named Entity Recognition [45.012327072558975]
Few-shot Named Entity Recognition (NER) aims to identify named entities with very little annotated data. We propose a seminal span-based prototypical network (SpanProto) that tackles few-shot NER via a two-stage approach. In the span extraction stage, we transform the sequential tags into a global boundary matrix, enabling the model to focus on the explicit boundary information. For mention classification, we leverage prototypical learning to capture the semantic representations for each labeled span and make the model better adapt to novel-class entities.
arXiv Detail & Related papers (2022-10-17T12:59:33Z)
Decoupled Multi-task Learning with Cyclical Self-Regulation for Face Parsing [71.19528222206088]
We propose a novel Decoupled Multi-task Learning with Cyclical Self-Regulation for face parsing. Specifically, DML-CSR designs a multi-task model which comprises face parsing, binary edge, and category edge detection. Our method achieves the new state-of-the-art performance on the Helen, CelebA-HQ, and LapaMask datasets.
arXiv Detail & Related papers (2022-03-28T02:12:30Z)
Rethinking the Two-Stage Framework for Grounded Situation Recognition [61.93345308377144]
Grounded Situation Recognition is an essential step towards "human-like" event understanding. Existing GSR methods resort to a two-stage framework: predicting the verb in the first stage and detecting the semantic roles in the second stage. We propose a novel SituFormer for GSR which consists of a Coarse-to-Fine Verb Model (CFVM) and a Transformer-based Noun Model (TNM)
arXiv Detail & Related papers (2021-12-10T08:10:56Z)
A Sequence-to-Set Network for Nested Named Entity Recognition [38.05786148160635]
We propose a novel sequence-to-set neural network for nested NER. We use a non-autoregressive decoder to predict the final set of entities in one pass. Experimental results show that our proposed model achieves state-of-the-art on three nested NER corpora.
arXiv Detail & Related papers (2021-05-19T03:10:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.