Related papers: A Unified Framework of Medical Information Annotation and Extraction for Chinese Clinical Text

URL: http://arxiv.org/abs/2203.03823v1
Date: Tue, 8 Mar 2022 03:19:16 GMT
Title: A Unified Framework of Medical Information Annotation and Extraction for Chinese Clinical Text
Authors: Enwei Zhu, Qilin Sheng, Huanwan Yang, Jinpeng Li
Abstract summary: Current state-of-the-art (SOTA) NLP models are highly integrated with deep learning techniques. This study presents an engineering framework of medical entity recognition, relation extraction and attribute extraction.
Score: 1.4841452489515765
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Medical information extraction consists of a group of natural language processing (NLP) tasks, which collaboratively convert clinical text to pre-defined structured formats. Current state-of-the-art (SOTA) NLP models are highly integrated with deep learning techniques and thus require massive annotated linguistic data. This study presents an engineering framework of medical entity recognition, relation extraction and attribute extraction, which are unified in annotation, modeling and evaluation. Specifically, the annotation scheme is comprehensive and compatible between tasks, especially for the medical relations. The resulting annotated corpus includes 1,200 full medical records (or 18,039 broken-down documents), and achieves inter-annotator agreements (IAAs) of 94.53%, 73.73% and 91.98% F1 scores for the three tasks. Three task-specific neural network models are developed within a shared structure, and enhanced by SOTA NLP techniques, i.e., pre-trained language models. Experimental results show that the system can retrieve medical entities, relations and attributes with F1 scores of 93.47%, 67.14% and 90.89%, respectively. This study, in addition to our publicly released annotation scheme and code, provides solid and practical engineering experience of developing an integrated medical information extraction system.

Related papers

Pre-trained Language Models and Few-shot Learning for Medical Entity Extraction [2.9687381456164004]
This study proposes a medical entity extraction method based on the Transformer. Considering the professionalism and complexity of medical texts, we compare the performance of different pre-trained language models. Few-shot learning can enhance the accuracy of medical entity extraction.
arXiv Detail & Related papers (2025-04-06T06:36:33Z)
GAMedX: Generative AI-based Medical Entity Data Extractor Using Large Language Models [1.123722364748134]
This paper introduces GAMedX, a Named Entity Recognition (NER) approach utilizing Large Language Models (LLMs). The methodology integrates open-source LLMs for NER, utilizing chained prompts and Pydantic schemas for structured output to navigate the complexities of specialized medical jargon. The findings reveal a significant ROUGE F1 score on one of the evaluation datasets, with an accuracy of 98%.
arXiv Detail & Related papers (2024-05-31T02:53:22Z)
Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed. In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset. We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
arXiv Detail & Related papers (2024-04-27T05:03:42Z)
High-throughput Biomedical Relation Extraction for Semi-Structured Web Articles Empowered by Large Language Models [1.9665865095034865]
We formulate the relation extraction task as binary classifications for large language models. We designate the main title as the tail entity and explicitly incorporate it into the context. Longer contents are sliced into text chunks, embedded, and retrieved with additional embedding models.
arXiv Detail & Related papers (2023-12-13T16:43:41Z)
Advancing Italian Biomedical Information Extraction with Transformers-based Models: Methodological Insights and Multicenter Practical Application [0.27027468002793437]
Information Extraction can help clinical practitioners overcome the limitation by using automated text-mining pipelines. We created the first Italian neuropsychiatric Named Entity Recognition dataset, PsyNIT, and used it to develop a Transformers-based model. The lessons learned are: (i) the crucial role of a consistent annotation process and (ii) a fine-tuning strategy that combines classical methods with a "low-resource" approach.
arXiv Detail & Related papers (2023-06-08T16:15:46Z)
Detecting automatically the layout of clinical documents to enhance the performances of downstream natural language processing [53.797797404164946]
We designed an algorithm to process clinical PDF documents and extract only clinically relevant text. The algorithm consists of several steps: initial text extraction using a PDF parser, followed by classification into such categories as body text, left notes, and footers. Medical performance was evaluated by examining the extraction of medical concepts of interest from the text in their respective sections.
arXiv Detail & Related papers (2023-05-23T08:38:33Z)
PMC-LLaMA: Towards Building Open-source Language Models for Medicine [62.39105735933138]
Large Language Models (LLMs) have showcased remarkable capabilities in natural language understanding. LLMs struggle in domains that require precision, such as medical applications, due to their lack of domain-specific knowledge. We describe the procedure for building a powerful, open-source language model specifically designed for medical applications, termed PMC-LLaMA.
arXiv Detail & Related papers (2023-04-27T18:29:05Z)
Development and validation of a natural language processing algorithm to pseudonymize documents in the context of a clinical data warehouse [53.797797404164946]
The study highlights the difficulties faced in sharing tools and resources in this domain. We annotated a corpus of clinical documents according to 12 types of identifying entities. We build a hybrid system, merging the results of a deep learning model as well as manual rules.
arXiv Detail & Related papers (2023-03-23T17:17:46Z)
Automated tabulation of clinical trial results: A joint entity and relation extraction approach with transformer-based language representations [5.825190876052148]
This paper investigates automating evidence table generation by decomposing the problem across two language processing tasks. We focus on the automatic tabulation of sentences from published RCT abstracts that report the practice outcomes. To train and test these models, a new gold-standard corpus was developed, comprising almost 600 result sentences from six disease areas.
arXiv Detail & Related papers (2021-12-10T15:26:43Z)
CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark [51.38557174322772]
We present the first Chinese Biomedical Language Understanding Evaluation benchmark. It is a collection of natural language understanding tasks including named entity recognition, information extraction, clinical diagnosis normalization, and single-sentence/sentence-pair classification. We report empirical results with the current 11 pre-trained Chinese models, and experimental results show that state-of-the-art neural models perform far worse than the human ceiling.
arXiv Detail & Related papers (2021-06-15T12:25:30Z)
Benchmarking Automated Clinical Language Simplification: Dataset, Algorithm, and Evaluation [48.87254340298189]
We construct a new dataset named MedLane to support the development and evaluation of automated clinical language simplification approaches. We propose a new model called DECLARE that follows the human annotation procedure and achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-12-04T06:09:02Z)

This list is automatically generated from the titles and abstracts of the papers on this site.