A Unified Biomedical Named Entity Recognition Framework with Large Language Models
- URL: http://arxiv.org/abs/2510.08902v1
- Date: Fri, 10 Oct 2025 01:33:54 GMT
- Title: A Unified Biomedical Named Entity Recognition Framework with Large Language Models
- Authors: Tengxiao Lv, Ling Luo, Juntao Li, Yanhua Wang, Yuchen Pan, Chao Liu, Yanan Wang, Yan Jiang, Huiyi Lv, Yuanyuan Sun, Jian Wang, Hongfei Lin
- Abstract summary: We propose a unified Biomedical Named Entity Recognition (BioNER) framework based on Large Language Models (LLMs). We first reformulate BioNER as a text generation task and design a symbolic tagging strategy to jointly handle both flat and nested entities. We perform bilingual joint fine-tuning across multiple Chinese and English datasets.
- Score: 44.92744341698289
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accurate recognition of biomedical named entities is critical for medical information extraction and knowledge discovery. However, existing methods often struggle with nested entities, entity boundary ambiguity, and cross-lingual generalization. In this paper, we propose a unified Biomedical Named Entity Recognition (BioNER) framework based on Large Language Models (LLMs). We first reformulate BioNER as a text generation task and design a symbolic tagging strategy to jointly handle both flat and nested entities with explicit boundary annotation. To enhance multilingual and multi-task generalization, we perform bilingual joint fine-tuning across multiple Chinese and English datasets. Additionally, we introduce a contrastive learning-based entity selector that filters incorrect or spurious predictions by leveraging boundary-sensitive positive and negative samples. Experimental results on four benchmark datasets and two unseen corpora show that our method achieves state-of-the-art performance and robust zero-shot generalization across languages. The source codes are freely available at https://github.com/dreamer-tx/LLMNER.
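The abstract does not specify the symbolic tag vocabulary, so as a rough illustration only, the sketch below assumes a hypothetical inline format `<TYPE>…</TYPE>` with nesting allowed, and shows how such a generated string could be decoded into flat and nested entity spans with explicit boundaries. The tag names (`dis`, `chem`) and the parsing logic are assumptions, not the paper's actual scheme.

```python
import re

# Hypothetical symbolic tag format: entities wrapped as <TYPE>...</TYPE>,
# with nested tags permitted (the paper's exact symbols are not given here).
TAG = re.compile(r"</?\w+>")

def parse_tagged(text):
    """Decode a tagged generation into (start, end, type) spans over the
    tag-free text, supporting nested entities via a stack."""
    plain = []   # chunks of tag-free text
    spans = []   # recovered (start, end, type) entity spans
    stack = []   # currently open (type, start) pairs
    pos = 0      # character position in the tag-free text
    i = 0
    for m in TAG.finditer(text):
        chunk = text[i:m.start()]
        plain.append(chunk)
        pos += len(chunk)
        tag = m.group()
        if tag.startswith("</"):
            etype, start = stack.pop()   # close innermost open entity
            spans.append((start, pos, etype))
        else:
            stack.append((tag[1:-1], pos))
        i = m.end()
    plain.append(text[i:])
    return "".join(plain), spans

text = "<dis>acute <dis>pancreatitis</dis></dis> treated with <chem>aspirin</chem>"
plain, spans = parse_tagged(text)
# plain == "acute pancreatitis treated with aspirin"
# spans contain both the nested entity (6, 18, 'dis') and the
# enclosing flat entity (0, 18, 'dis'), plus (32, 39, 'chem').
```

A stack-based decoder like this is one natural fit for nested NER, since the innermost open tag always closes first; a production system would additionally need to handle malformed generations (unbalanced tags), which the contrastive entity selector described in the abstract could then filter.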
Related papers
- Multilingual Text-to-Image Person Retrieval via Bidirectional Relation Reasoning and Aligning [81.43257201833154]
We propose Bi-IRRA: a Bidirectional Implicit Relation Reasoning and Aligning framework to learn alignment across languages and modalities. Within Bi-IRRA, a bidirectional implicit relation reasoning module enables bidirectional prediction of masked image and text. The proposed method achieves new state-of-the-art results on all multilingual TIPR datasets.
arXiv Detail & Related papers (2025-10-20T16:01:11Z) - BIBERT-Pipe on Biomedical Nested Named Entity Linking at BioASQ 2025 [5.329747408496098]
We present our system for the BioNNE 2025 Multilingual Biomedical Nested Named Entity Linking shared task (English & Russian). We leverage the same base encoder model in both stages: the retrieval stage uses the original pre-trained model, while the ranking stage applies domain-specific fine-tuning. On the BioNNE 2025 leaderboard, our two-stage system, bilingual BERT (BIBERT-Pipe), ranks third in the multilingual track, demonstrating the effectiveness and competitiveness of these minimal yet principled modifications.
arXiv Detail & Related papers (2025-09-10T09:14:25Z) - mCL-NER: Cross-Lingual Named Entity Recognition via Multi-view Contrastive Learning [54.523172171533645]
Cross-lingual named entity recognition (CrossNER) faces challenges stemming from uneven performance due to the scarcity of multilingual corpora.
We propose Multi-view Contrastive Learning for Cross-lingual Named Entity Recognition (mCL-NER).
Our experiments on the XTREME benchmark, spanning 40 languages, demonstrate the superiority of mCL-NER over prior data-driven and model-based approaches.
arXiv Detail & Related papers (2023-08-17T16:02:29Z) - DualNER: A Dual-Teaching framework for Zero-shot Cross-lingual Named Entity Recognition [27.245171237640502]
DualNER is a framework that makes full use of both an annotated source-language corpus and unlabeled target-language text.
We combine two complementary learning paradigms of NER, i.e., sequence labeling and span prediction, into a unified multi-task framework.
arXiv Detail & Related papers (2022-11-15T12:50:59Z) - CROP: Zero-shot Cross-lingual Named Entity Recognition with Multilingual Labeled Sequence Translation [113.99145386490639]
Cross-lingual NER can transfer knowledge between languages via aligned cross-lingual representations or machine translation results.
We propose a Cross-lingual Entity Projection framework (CROP) to enable zero-shot cross-lingual NER.
We adopt a multilingual labeled sequence translation model to project the tagged sequence back to the target language and label the target raw sentence.
arXiv Detail & Related papers (2022-10-13T13:32:36Z) - A Dual-Contrastive Framework for Low-Resource Cross-Lingual Named Entity Recognition [5.030581940990434]
Cross-lingual Named Entity Recognition (NER) has recently become a research hotspot because it can alleviate the data-hungry problem for low-resource languages.
In this paper, we describe our novel dual-contrastive framework ConCNER for cross-lingual NER under the scenario of limited source-language labeled data.
arXiv Detail & Related papers (2022-04-02T07:59:13Z) - Learning Domain-Specialised Representations for Cross-Lingual Biomedical Entity Linking [66.76141128555099]
We propose a novel cross-lingual biomedical entity linking task (XL-BEL).
We first investigate the ability of standard knowledge-agnostic as well as knowledge-enhanced monolingual and multilingual LMs beyond the standard monolingual English BEL task.
We then address the challenge of transferring domain-specific knowledge in resource-rich languages to resource-poor ones.
arXiv Detail & Related papers (2021-05-30T00:50:00Z) - Improving Biomedical Pretrained Language Models with Knowledge [22.61591249168801]
We propose KeBioLM, a biomedical pretrained language model that explicitly leverages knowledge from the UMLS knowledge bases.
Specifically, we extract entities from PubMed abstracts and link them to UMLS.
We then train a knowledge-aware language model that first applies a text-only encoding layer to learn entity representations, then applies a text-entity fusion encoding to aggregate them.
arXiv Detail & Related papers (2021-04-21T03:57:26Z) - AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z) - Cross-lingual Entity Alignment with Adversarial Kernel Embedding and
Adversarial Knowledge Translation [35.77482102674059]
Cross-lingual entity alignment often faces challenges ranging from feature inconsistency to sequence-context unawareness.
This paper presents a dual adversarial learning framework for cross-lingual entity alignment, DAEA, with two original contributions.
arXiv Detail & Related papers (2021-04-16T00:57:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.