HunFlair2 in a cross-corpus evaluation of biomedical named entity
recognition and normalization tools
- URL: http://arxiv.org/abs/2402.12372v2
- Date: Tue, 20 Feb 2024 13:10:27 GMT
- Title: HunFlair2 in a cross-corpus evaluation of biomedical named entity
recognition and normalization tools
- Authors: Mario Sänger, Samuele Garda, Xing David Wang, Leon Weber-Genzel, Pia
Droop, Benedikt Fuchs, Alan Akbik, Ulf Leser
- Abstract summary: We report on the results of a cross-corpus benchmark for named entity extraction using biomedical text mining tools.
Our results indicate that users of BTM tools should expect diminished performance when applying them in the wild compared to the original publications.
- Score: 4.882266258243112
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the exponential growth of the life science literature, biomedical text
mining (BTM) has become an essential technology for accelerating the extraction
of insights from publications. Identifying named entities (e.g., diseases,
drugs, or genes) in texts and their linkage to reference knowledge bases are
crucial steps in BTM pipelines to enable information aggregation from different
documents. However, tools for these two steps are rarely applied in the same
context in which they were developed. Instead, they are applied in the wild,
i.e., on application-dependent text collections different from those used for
the tools' training, varying, e.g., in focus, genre, style, and text type. This
raises the question of whether the reported performance of BTM tools can be
trusted for downstream applications. Here, we report on the results of a
carefully designed cross-corpus benchmark for named entity extraction, where
tools were applied systematically to corpora not used during their training.
Based on a survey of 28 published systems, we selected five for an in-depth
analysis on three publicly available corpora encompassing four different entity
types. Comparing the tools yields a mixed picture and shows that, in a
cross-corpus setting, performance is significantly lower than that reported in
an in-corpus setting. HunFlair2 showed the best performance on
average, being closely followed by PubTator. Our results indicate that users of
BTM tools should expect diminished performance when applying them in the wild
compared to the original publications, and show that further research is
necessary to make BTM tools more robust.
Related papers
- Tool Graph Retriever: Exploring Dependency Graph-based Tool Retrieval for Large Language Models [43.50789219459378]
We propose Tool Graph Retriever (TGR), which exploits the dependencies among tools to learn better tool representations for retrieval.
First, we construct a dataset termed TDI300K to train a discriminator for identifying tool dependencies.
Then, we represent all candidate tools as a tool dependency graph and use graph convolution to integrate the dependencies into their representations.
arXiv Detail & Related papers (2025-08-07T08:36:26Z)
- Zero-Shot Document-Level Biomedical Relation Extraction via Scenario-based Prompt Design in Two-Stage with LLM [7.808231572590279]
We propose a novel approach that achieves the same results from unannotated full documents using general large language models (LLMs) with lower hardware and labor costs.
Our approach combines two major stages: named entity recognition (NER) and relation extraction (RE).
To enhance the effectiveness of prompts, we propose a five-part template structure and scenario-based prompt design principles.
arXiv Detail & Related papers (2025-05-02T07:33:20Z)
- Pre-training data selection for biomedical domain adaptation using journal impact metrics [0.0]
We employ two straightforward journal impact metrics and conduct experiments by continually pre-training BERT on various subsets of the complete PubMed training set.
Our results show that pruning using journal impact metrics is not efficient. However, we also show that pre-training on fewer abstracts (but with the same number of training steps) does not necessarily decrease the resulting model's performance.
arXiv Detail & Related papers (2024-09-04T13:59:48Z)
- Towards Completeness-Oriented Tool Retrieval for Large Language Models [60.733557487886635]
Real-world systems often incorporate a wide array of tools, making it impractical to input all tools into Large Language Models.
Existing tool retrieval methods primarily focus on semantic matching between user queries and tool descriptions.
We propose COLT, a novel model-agnostic COllaborative Learning-based Tool Retrieval approach, which captures not only the semantic similarities between user queries and tool descriptions but also takes into account the collaborative information of tools.
arXiv Detail & Related papers (2024-05-25T06:41:23Z)
- What Are Tools Anyway? A Survey from the Language Model Perspective [67.18843218893416]
Language models (LMs) are powerful, yet used mostly for text generation tasks.
We provide a unified definition of tools as external programs used by LMs.
We empirically study the efficiency of various tooling methods.
arXiv Detail & Related papers (2024-03-18T17:20:07Z)
- TRIAD: Automated Traceability Recovery based on Biterm-enhanced Deduction of Transitive Links among Artifacts [53.92293118080274]
Traceability allows stakeholders to extract and comprehend the trace links among software artifacts introduced across the software life cycle.
Most approaches rely on textual similarities among software artifacts, such as those based on Information Retrieval (IR).
arXiv Detail & Related papers (2023-12-28T06:44:24Z)
- Pseudo Label-Guided Data Fusion and Output Consistency for Semi-Supervised Medical Image Segmentation [9.93871075239635]
We propose the PLGDF framework, which builds upon the mean teacher network for segmenting medical images with less annotation.
We propose a novel pseudo-label utilization scheme, which combines labeled and unlabeled data to augment the dataset effectively.
Our framework yields superior performance compared to six state-of-the-art semi-supervised learning methods.
arXiv Detail & Related papers (2023-11-17T06:36:43Z)
- Integrating curation into scientific publishing to train AI models [1.6982459897303823]
We have embedded multimodal data curation into the academic publishing process to annotate segmented figure panels and captions.
The dataset, SourceData-NLP, contains more than 620,000 annotated biomedical entities.
We evaluate the utility of the dataset to train AI models using named-entity recognition, segmentation of figure captions into their constituent panels, and a novel context-dependent semantic task.
arXiv Detail & Related papers (2023-10-31T13:22:38Z)
- Investigating Large Language Models and Control Mechanisms to Improve Text Readability of Biomedical Abstracts [16.05119302860606]
We investigate the ability of state-of-the-art large language models (LLMs) on the task of biomedical abstract simplification.
The methods applied include domain fine-tuning and prompt-based learning (PBL).
We used a range of automatic evaluation metrics, including BLEU, ROUGE, SARI, and BERTscore, and also conducted human evaluations.
arXiv Detail & Related papers (2023-09-22T22:47:32Z)
- BioBLP: A Modular Framework for Learning on Multimodal Biomedical Knowledge Graphs [3.780924717521521]
We propose a modular framework for learning embeddings in knowledge graphs.
It allows encoding attribute data of different modalities while also supporting entities with missing attributes.
We train models using a biomedical KG containing approximately 2 million triples.
arXiv Detail & Related papers (2023-06-06T11:49:38Z)
- LLM-based Interaction for Content Generation: A Case Study on the Perception of Employees in an IT department [85.1523466539595]
This paper presents a questionnaire survey to identify the intention to use generative tools by employees of an IT company.
Our results indicate a rather average acceptability of generative tools, although the more useful the tool is perceived to be, the higher the intention seems to be.
Our analyses suggest that the frequency of use of generative tools is likely to be a key factor in understanding how employees perceive these tools in the context of their work.
arXiv Detail & Related papers (2023-04-18T15:35:43Z)
- SAIS: Supervising and Augmenting Intermediate Steps for Document-Level Relation Extraction [51.27558374091491]
We propose to explicitly teach the model to capture relevant contexts and entity types by supervising and augmenting intermediate steps (SAIS) for relation extraction.
Based on a broad spectrum of carefully designed tasks, our proposed SAIS method not only extracts relations of better quality due to more effective supervision, but also retrieves the corresponding supporting evidence more accurately.
arXiv Detail & Related papers (2021-09-24T17:37:35Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We achieve new state-of-the-art results in both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.