Related papers: Selecting and Merging: Towards Adaptable and Scalable Named Entity Recognition with Large Language Models

URL: http://arxiv.org/abs/2506.22813v1
Date: Sat, 28 Jun 2025 08:28:52 GMT
Title: Selecting and Merging: Towards Adaptable and Scalable Named Entity Recognition with Large Language Models
Authors: Zhuojun Ding, Wei Wei, Chenghao Fan,
Abstract summary: Supervised fine-tuning (SFT) is widely used to align large language models (LLMs) with information extraction (IE) tasks, such as named entity recognition (NER). We propose the SaM framework, which dynamically Selects and Merges expert models at inference time.
Score: 5.466962214217334
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Supervised fine-tuning (SFT) is widely used to align large language models (LLMs) with information extraction (IE) tasks, such as named entity recognition (NER). However, annotating such fine-grained labels and training domain-specific models is costly. Existing works typically train a unified model across multiple domains, but such approaches lack adaptation and scalability since not all training data benefits target domains and scaling trained models remains challenging. We propose the SaM framework, which dynamically Selects and Merges expert models at inference time. Specifically, for a target domain, we select domain-specific experts pre-trained on existing domains based on (i) domain similarity to the target domain and (ii) performance on sampled instances, respectively. The experts are then merged to create task-specific models optimized for the target domain. By dynamically merging experts beneficial to target domains, we improve generalization across various domains without extra training. Additionally, experts can be added or removed conveniently, leading to great scalability. Extensive experiments on multiple benchmarks demonstrate our framework's effectiveness, which outperforms the unified model by an average of 10%. We further provide insights into potential improvements, practical experience, and extensions of our framework.
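
The select-then-merge idea in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say how domain similarity is measured or how experts are merged, so cosine similarity over hypothetical domain embeddings and uniform parameter averaging are swapped in. Only selection criterion (i), domain similarity, is covered; criterion (ii), performance on sampled instances, is omitted. The names `select_experts`, `merge_experts`, and the toy data are hypothetical and not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_experts(target_emb, expert_embs, k):
    """Return the names of the k experts most similar to the target domain.

    Assumption: each expert is described by one domain embedding.
    """
    ranked = sorted(
        expert_embs,
        key=lambda name: cosine(target_emb, expert_embs[name]),
        reverse=True,
    )
    return ranked[:k]


def merge_experts(expert_params, names):
    """Uniformly average the parameters of the chosen experts.

    Assumption: plain averaging stands in for whatever merging
    method the paper actually uses.
    """
    merged = {}
    for key in expert_params[names[0]]:
        columns = zip(*(expert_params[n][key] for n in names))
        merged[key] = [sum(col) / len(names) for col in columns]
    return merged


# Toy data (hypothetical): three experts with 2-d domain embeddings
# and a single two-element parameter vector each.
expert_embs = {"news": [1.0, 0.0], "biomed": [0.0, 1.0], "social": [0.8, 0.6]}
expert_params = {
    "news": {"w": [1.0, 2.0]},
    "biomed": {"w": [5.0, 6.0]},
    "social": {"w": [3.0, 4.0]},
}

chosen = select_experts([1.0, 0.2], expert_embs, k=2)
merged = merge_experts(expert_params, chosen)
print(chosen, merged)
```

Because selection and merging both happen at inference time in this sketch, adding or removing an expert only means adding or removing an entry in the two dictionaries; nothing is retrained.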

Related papers

MoE-MLoRA for Multi-Domain CTR Prediction: Efficient Adaptation with Expert Specialization [0.0]
MoE-MLoRA is a mixture-of-experts framework where each expert is first trained independently to specialize in its domain. We evaluate MoE-MLoRA across eight CTR models on Movielens and Taobao.
arXiv Detail & Related papers (2025-06-09T09:03:05Z)
LFME: A Simple Framework for Learning from Multiple Experts in Domain Generalization [61.16890890570814]
Domain generalization (DG) methods aim to maintain good performance in an unseen target domain by using training data from multiple source domains. This work introduces a simple yet effective framework, dubbed learning from multiple experts (LFME), that aims to make the target model an expert in all source domains to improve DG.
arXiv Detail & Related papers (2024-10-22T13:44:10Z)
Learning to Generalize Unseen Domains via Multi-Source Meta Learning for Text Classification [71.08024880298613]
We study the multi-source Domain Generalization of text classification. We propose a framework to use multiple seen domains to train a model that can achieve high accuracy in an unseen domain.
arXiv Detail & Related papers (2024-09-20T07:46:21Z)
Boosting Large Language Models with Continual Learning for Aspect-based Sentiment Analysis [33.86086075084374]
Aspect-based sentiment analysis (ABSA) is an important subtask of sentiment analysis. We propose a Large Language Model-based Continual Learning (LLM-CL) model for ABSA.
arXiv Detail & Related papers (2024-05-09T02:00:07Z)
Large-Scale Multi-Domain Recommendation: an Automatic Domain Feature Extraction and Personalized Integration Framework [30.46152832695426]
We propose an Automatic Domain Feature Extraction and Personalized Integration (DFEI) framework for the large-scale multi-domain recommendation. The framework automatically transforms the behavior of each individual user into an aggregation of all user behaviors within the domain, which serves as the domain features. Experimental results on both public and industrial datasets, consisting of over 20 domains, clearly demonstrate that the proposed framework achieves significantly better performance compared with SOTA baselines.
arXiv Detail & Related papers (2024-04-12T09:57:17Z)
Meta-DMoE: Adapting to Domain Shift by Meta-Distillation from Mixture-of-Experts [33.21435044949033]
Most existing methods perform training on multiple source domains using a single model. We propose a novel framework for unsupervised test-time adaptation, which is formulated as a knowledge distillation process.
arXiv Detail & Related papers (2022-10-08T02:28:10Z)
META: Mimicking Embedding via oThers' Aggregation for Generalizable Person Re-identification [68.39849081353704]
Domain generalizable (DG) person re-identification (ReID) aims to test across unseen domains without access to the target domain data at training time. This paper presents a new approach called Mimicking Embedding via oThers' Aggregation (META) for DG ReID.
arXiv Detail & Related papers (2021-12-16T08:06:50Z)
TAL: Two-stream Adaptive Learning for Generalizable Person Re-identification [115.31432027711202]
We argue that both domain-specific and domain-invariant features are crucial for improving the generalization ability of re-id models. We propose two-stream adaptive learning (TAL) to model these two kinds of information simultaneously. Our framework can be applied to both single-source and multi-source domain generalization tasks.
arXiv Detail & Related papers (2021-11-29T01:27:42Z)
Batch Normalization Embeddings for Deep Domain Generalization [50.51405390150066]
Domain generalization aims at training machine learning models to perform robustly across different and unseen domains. We show a significant increase in classification accuracy over current state-of-the-art techniques on popular domain generalization benchmarks.
arXiv Detail & Related papers (2020-11-25T12:02:57Z)
Zero-Resource Cross-Domain Named Entity Recognition [68.83177074227598]
Existing models for cross-domain named entity recognition rely on large unlabeled corpora or labeled NER training data in target domains. We propose a cross-domain NER model that does not use any external resources.
arXiv Detail & Related papers (2020-02-14T09:04:18Z)

This list is automatically generated from the titles and abstracts of the papers in this site.
