Few-Shot Domain Adaptation for Named-Entity Recognition via Joint Constrained k-Means and Subspace Selection
- URL: http://arxiv.org/abs/2412.00426v2
- Date: Thu, 12 Dec 2024 16:19:14 GMT
- Title: Few-Shot Domain Adaptation for Named-Entity Recognition via Joint Constrained k-Means and Subspace Selection
- Authors: Ayoub Hammal, Benno Uthayasooriyar, Caio Corro
- Abstract summary: We propose a weakly supervised algorithm that combines small labeled datasets with large amounts of unlabeled data.
This framework achieves state-of-the-art results in few-shot NER on several English datasets.
- Score: 6.390468088226495
- Abstract: Named-entity recognition (NER) is a task that typically requires large annotated datasets, which limits its applicability across domains with varying entity definitions. This paper addresses few-shot NER, aiming to transfer knowledge to new domains with minimal supervision. Unlike previous approaches that rely solely on limited annotated data, we propose a weakly supervised algorithm that combines small labeled datasets with large amounts of unlabeled data. Our method extends the k-means algorithm with label supervision, cluster size constraints and domain-specific discriminative subspace selection. This unified framework achieves state-of-the-art results in few-shot NER on several English datasets.
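The label-supervision ingredient mentioned in the abstract can be illustrated with a minimal sketch of seeded k-means, in which labeled points stay clamped to the cluster of their class while unlabeled points are assigned to the nearest centroid. This is not the authors' algorithm: the cluster size constraints and the subspace selection are omitted, and the function name `seeded_kmeans`, its parameters, and the toy data are hypothetical choices made for illustration only.

```python
import numpy as np


def seeded_kmeans(X, labeled_idx, labeled_y, n_clusters, n_iter=20):
    """k-means in which labeled points are clamped to their class cluster.

    Hypothetical helper for illustration; assumes every class has at least
    one labeled example, so no cluster can become empty.
    """
    X = np.asarray(X, dtype=float)
    labeled_idx = np.asarray(labeled_idx)
    labeled_y = np.asarray(labeled_y)

    # Initialise each centroid as the mean of the labeled points of its class.
    centroids = np.stack(
        [X[labeled_idx[labeled_y == k]].mean(axis=0) for k in range(n_clusters)]
    )
    assign = np.zeros(len(X), dtype=int)

    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        # Label supervision: labeled points keep the cluster of their label.
        assign[labeled_idx] = labeled_y
        # Update step: recompute each centroid as the mean of its members.
        new_centroids = np.stack(
            [X[assign == k].mean(axis=0) for k in range(n_clusters)]
        )
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

    return assign, centroids


# Toy usage: two well-separated 2-D blobs with one labeled point per class.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)), rng.normal(5.0, 0.5, (50, 2))])
assign, centroids = seeded_kmeans(
    X, labeled_idx=[0, 50], labeled_y=[0, 1], n_clusters=2
)
```

Clamping the labeled points is the simplest way to inject supervision into k-means; a fuller treatment in the spirit of the abstract would additionally bound cluster sizes and learn a projection before computing distances.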
Related papers
- Label Alignment and Reassignment with Generalist Large Language Model for Enhanced Cross-Domain Named Entity Recognition [0.0]
Cross-domain named entity recognition still poses a challenge for most NER methods.
We introduce a label alignment and reassignment approach, namely LAR, to address this issue.
We conduct an extensive range of experiments on NER datasets involving both supervised and zero-shot scenarios.
arXiv Detail & Related papers (2024-07-24T15:13:12Z) - Adaptive Betweenness Clustering for Semi-Supervised Domain Adaptation [108.40945109477886]
We propose a novel SSDA approach named Graph-based Adaptive Betweenness Clustering (G-ABC) for achieving categorical domain alignment.
Our method outperforms previous state-of-the-art SSDA approaches, demonstrating the superiority of the proposed G-ABC algorithm.
arXiv Detail & Related papers (2024-01-21T09:57:56Z) - A Boundary Offset Prediction Network for Named Entity Recognition [9.885278527023532]
Named entity recognition (NER) is a fundamental task in natural language processing that aims to identify and classify named entities in text.
We propose a novel approach for NER, named the Boundary Offset Prediction Network (BOPN), which predicts the boundary offsets between candidate spans and their nearest entity spans.
Our method integrates entity type and span representations to generate type-aware boundary offsets instead of using entity types as detection targets.
arXiv Detail & Related papers (2023-10-23T05:04:07Z) - Cross-domain Contrastive Learning for Unsupervised Domain Adaptation [108.63914324182984]
Unsupervised domain adaptation (UDA) aims to transfer knowledge learned from a fully-labeled source domain to a different unlabeled target domain.
We build upon contrastive self-supervised learning to align features so as to reduce the domain discrepancy between training and testing sets.
arXiv Detail & Related papers (2021-06-10T06:32:30Z) - Locate and Label: A Two-stage Identifier for Nested Named Entity Recognition [9.809157050048375]
We propose a two-stage entity identifier for named entity recognition.
First, we generate span proposals by filtering and boundary regression on the seed spans to locate the entities, and then label the boundary-adjusted span proposals with the corresponding categories.
Our method effectively utilizes the boundary information of entities and partially matched spans during training.
arXiv Detail & Related papers (2021-05-14T12:52:34Z) - Select, Label, and Mix: Learning Discriminative Invariant Feature Representations for Partial Domain Adaptation [55.73722120043086]
We develop a "Select, Label, and Mix" (SLM) framework to learn discriminative invariant feature representations for partial domain adaptation.
First, we present a simple yet efficient "select" module that automatically filters out outlier source samples to avoid negative transfer.
Second, the "label" module iteratively trains the classifier using both the labeled source domain data and the generated pseudo-labels for the target domain to enhance the discriminability of the latent space.
arXiv Detail & Related papers (2020-12-06T19:29:32Z) - Discriminative Cross-Domain Feature Learning for Partial Domain Adaptation [70.45936509510528]
Partial domain adaptation aims to adapt knowledge from a larger and more diverse source domain to a smaller target domain with fewer classes.
Recent practice on domain adaptation manages to extract effective features by incorporating the pseudo labels for the target domain.
It is essential to align target data with only a small set of source data.
arXiv Detail & Related papers (2020-08-26T03:18:53Z) - Domain Adaptation with Auxiliary Target Domain-Oriented Classifier [115.39091109079622]
Domain adaptation aims to transfer knowledge from a label-rich but heterogeneous domain to a label-scarce domain.
One of the most popular semi-supervised learning (SSL) techniques is pseudo-labeling, which assigns a pseudo label to each unlabeled sample.
We propose a new pseudo-labeling framework called Auxiliary Target Domain-Oriented Classifier (ATDOC).
ATDOC alleviates the bias by introducing an auxiliary classifier for target data only, to improve the quality of pseudo labels.
arXiv Detail & Related papers (2020-07-08T15:01:35Z) - Inductive Unsupervised Domain Adaptation for Few-Shot Classification via Clustering [16.39667909141402]
Few-shot classification tends to struggle when it needs to adapt to diverse domains.
We introduce a framework, DaFeC, to improve Domain adaptation performance for Few-shot classification via Clustering.
Our approach outperforms previous work with absolute gains (in classification accuracy) of 4.95%, 9.55%, 3.99% and 11.62%, respectively.
arXiv Detail & Related papers (2020-06-23T08:17:48Z) - Low-Budget Label Query through Domain Alignment Enforcement [48.06803561387064]
We tackle a new problem named low-budget label query.
We first improve an Unsupervised Domain Adaptation (UDA) method to better align source and target domains.
We then propose a simple yet effective selection method based on uniform sampling of the prediction consistency distribution.
arXiv Detail & Related papers (2020-01-01T16:52:44Z)