Enhancing Biomedical Named Entity Recognition using GLiNER-BioMed with Targeted Dictionary-Based Post-processing for BioASQ 2025 task 6
- URL: http://arxiv.org/abs/2510.08588v1
- Date: Fri, 03 Oct 2025 18:35:04 GMT
- Title: Enhancing Biomedical Named Entity Recognition using GLiNER-BioMed with Targeted Dictionary-Based Post-processing for BioASQ 2025 task 6
- Authors: Ritesh Mehta
- Abstract summary: This study evaluates the GLiNER-BioMed model on a BioASQ dataset. We introduce a targeted dictionary-based post-processing strategy to address common misclassifications. This work highlights the potential of dictionary-based refinement for pre-trained BioNER models but underscores the critical challenge of overfitting to development data.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Biomedical Named Entity Recognition (BioNER), Task 6 in BioASQ (a challenge in large-scale biomedical semantic indexing and question answering), is crucial for extracting information from scientific literature but faces hurdles such as distinguishing between similar entity types like genes and chemicals. This study evaluates the GLiNER-BioMed model on a BioASQ dataset and introduces a targeted dictionary-based post-processing strategy to address common misclassifications. While this post-processing approach demonstrated notable improvement on our development set, increasing the micro F1-score from a baseline of 0.79 to 0.83, this enhancement did not generalize to the blind test set, where the post-processed model achieved a micro F1-score of 0.77 compared to the baseline's 0.79. We also discuss insights gained from exploring alternative methodologies, including Conditional Random Fields. This work highlights the potential of dictionary-based refinement for pre-trained BioNER models but underscores the critical challenge of overfitting to development data and the necessity of ensuring robust generalization for real-world applicability.
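The dictionary-based post-processing and micro F1 evaluation described in the abstract can be illustrated with a minimal sketch. This is not the author's implementation. The dictionary entries, the label names, the helper names `apply_dictionary` and `micro_f1`, and the entity format (dicts with `start`, `end`, `text`, `label`) are all assumptions made for illustration.

```python
# Hypothetical override dictionary: lowercased surface form -> preferred label.
# Entries and label names are illustrative only, not taken from the paper.
OVERRIDES = {
    "glucose": "chemical",
    "tp53": "gene",
}


def apply_dictionary(entities, overrides=OVERRIDES):
    """Relabel predicted entities whose text appears in the dictionary."""
    fixed = []
    for ent in entities:
        label = overrides.get(ent["text"].lower(), ent["label"])
        fixed.append({**ent, "label": label})
    return fixed


def micro_f1(gold, pred):
    """Micro-averaged F1 over exact (start, end, label) matches.

    Assumes all entities come from one document; for several documents,
    add a document id to each tuple before pooling.
    """
    gold_set = {(e["start"], e["end"], e["label"]) for e in gold}
    pred_set = {(e["start"], e["end"], e["label"]) for e in pred}
    tp = len(gold_set & pred_set)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_set)
    recall = tp / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Toy example: the model confuses a gene with a chemical.
gold = [
    {"start": 0, "end": 4, "text": "TP53", "label": "gene"},
    {"start": 14, "end": 21, "text": "glucose", "label": "chemical"},
]
pred = [
    {"start": 0, "end": 4, "text": "TP53", "label": "chemical"},
    {"start": 14, "end": 21, "text": "glucose", "label": "chemical"},
]

print(micro_f1(gold, pred))                    # 0.5
print(micro_f1(gold, apply_dictionary(pred)))  # 1.0
```

A dictionary built from errors observed on the development set can raise the development score while hurting unseen data, which is consistent with the drop from 0.79 to 0.77 on the blind test set reported above.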
Related papers
- Investigating Data Pruning for Pretraining Biological Foundation Models at Scale [47.09153330837959]
We propose a post-hoc influence-guided data pruning framework tailored to biological domains. Our framework consistently outperforms random selection baselines under an extreme pruning rate of over 99 percent. These findings underscore the potential of influence-guided data pruning to substantially reduce the computational cost of BioFM pretraining.
arXiv Detail & Related papers (2025-12-15T02:42:52Z) - Learning Cell-Aware Hierarchical Multi-Modal Representations for Robust Molecular Modeling [74.25438319700929]
We propose CHMR (Cell-aware Hierarchical Multi-modal Representations), a robust framework that models local-global dependencies between molecules and cellular responses. Evaluated on nine public benchmarks spanning 728 tasks, CHMR outperforms state-of-the-art baselines. Results demonstrate the advantage of hierarchy-aware, multimodal learning for reliable and biologically grounded molecular representations.
arXiv Detail & Related papers (2025-11-26T07:15:00Z) - GLiNER-BioMed: A Suite of Efficient Models for Open Biomedical Named Entity Recognition [0.06554326244334868]
We introduce GLiNER-BioMed, a domain-adapted suite of Generalist and Lightweight Model for NER (GLiNER) models specifically tailored for biomedicine. In contrast to conventional approaches, GLiNER uses natural language labels to infer arbitrary entity types, enabling zero-shot recognition. Experiments on several biomedical datasets demonstrate that GLiNER-BioMed outperforms the state of the art in zero-shot scenarios.
arXiv Detail & Related papers (2025-04-01T11:40:50Z) - KU AIGEN ICL EDI@BC8 Track 3: Advancing Phenotype Named Entity Recognition and Normalization for Dysmorphology Physical Examination Reports [20.19611327520341]
The objective of BioCreative8 Track 3 is to extract phenotypic key medical findings embedded within EHR texts and normalize these findings to Human Phenotype Ontology terms. The presence of diverse surface forms in phenotypic findings makes it challenging to accurately normalize them to the correct HPO terms. Our pipeline resulted in an exact extraction and normalization F1 score 2.6% higher than the mean score of all submissions received in response to the challenge.
arXiv Detail & Related papers (2025-01-16T18:53:32Z) - Augmenting Biomedical Named Entity Recognition with General-domain Resources [47.24727904076347]
Training a neural network-based biomedical named entity recognition (BioNER) model usually requires extensive and costly human annotations. We propose GERBERA, a simple-yet-effective method that utilized general-domain NER datasets for training. We systematically evaluated GERBERA on five datasets of eight entity types, collectively consisting of 81,410 instances.
arXiv Detail & Related papers (2024-06-15T15:28:02Z) - BioDiscoveryAgent: An AI Agent for Designing Genetic Perturbation Experiments [112.25067497985447]
We introduce BioDiscoveryAgent, an agent that designs new experiments, reasons about their outcomes, and efficiently navigates the hypothesis space to reach desired solutions. BioDiscoveryAgent can uniquely design new experiments without the need to train a machine learning model. It achieves an average of 21% improvement in predicting relevant genetic perturbations across six datasets.
arXiv Detail & Related papers (2024-05-27T19:57:17Z) - Multi-level biomedical NER through multi-granularity embeddings and enhanced labeling [3.8599767910528917]
This paper proposes a hybrid approach that integrates the strengths of multiple models.
BERT provides contextualized word embeddings, a pre-trained multi-channel CNN captures character-level information, and a BiLSTM + CRF follows for sequence labelling and for modelling dependencies between the words in the text.
We evaluate our model on the benchmark i2b2/2010 dataset, achieving an F1-score of 90.11.
arXiv Detail & Related papers (2023-12-24T21:45:36Z) - Improving Biomedical Entity Linking with Retrieval-enhanced Learning [53.24726622142558]
$k$NN-BioEL provides a BioEL model with the ability to reference similar instances from the entire training corpus as clues for prediction.
We show that $k$NN-BioEL outperforms state-of-the-art baselines on several datasets.
arXiv Detail & Related papers (2023-12-15T14:04:23Z) - BiomedGPT: A Generalist Vision-Language Foundation Model for Diverse Biomedical Tasks [68.39821375903591]
Generalist AI holds the potential to address limitations due to its versatility in interpreting different data types.
Here, we propose BiomedGPT, the first open-source and lightweight vision-language foundation model.
arXiv Detail & Related papers (2023-05-26T17:14:43Z) - BioADAPT-MRC: Adversarial Learning-based Domain Adaptation Improves Biomedical Machine Reading Comprehension Task [4.837365865245979]
We present an adversarial learning-based domain adaptation framework for the biomedical machine reading comprehension task.
BioADAPT-MRC is a neural network-based method to address the discrepancies in the marginal distributions between the general and biomedical domain datasets.
arXiv Detail & Related papers (2022-02-26T16:14:27Z)