Relation Extraction Using Large Language Models: A Case Study on Acupuncture Point Locations
- URL: http://arxiv.org/abs/2404.05415v2
- Date: Mon, 15 Apr 2024 00:45:32 GMT
- Title: Relation Extraction Using Large Language Models: A Case Study on Acupuncture Point Locations
- Authors: Yiming Li, Xueqing Peng, Jianfu Li, Xu Zuo, Suyuan Peng, Donghong Pei, Cui Tao, Hua Xu, Na Hong
- Abstract summary: Generative Pre-trained Transformers (GPT) present a significant opportunity for extracting relations related to acupoint locations.
This study compares the performance of GPT with traditional deep learning models: Long Short-Term Memory (LSTM) and Bidirectional Encoder Representations from Transformers for Biomedical Text Mining (BioBERT).
Fine-tuned GPT-3.5 consistently outperformed other models in F1 scores across all relation types.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In acupuncture therapy, the accurate location of acupoints is essential for its effectiveness. The advanced language understanding capabilities of large language models (LLMs) like Generative Pre-trained Transformers (GPT) present a significant opportunity for extracting relations related to acupoint locations from textual knowledge sources. This study aims to compare the performance of GPT with traditional deep learning models (Long Short-Term Memory (LSTM) and Bidirectional Encoder Representations from Transformers for Biomedical Text Mining (BioBERT)) in extracting acupoint-related location relations and to assess the impact of pretraining and fine-tuning on GPT's performance. We utilized the World Health Organization Standard Acupuncture Point Locations in the Western Pacific Region (WHO Standard) as our corpus, which consists of descriptions of 361 acupoints. Five types of relations ('direction_of,' 'distance_of,' 'part_of,' 'near_acupoint,' and 'located_near') (n = 3,174) between acupoints were annotated. Five models were compared: BioBERT, LSTM, pre-trained GPT-3.5, fine-tuned GPT-3.5, and pre-trained GPT-4. Performance metrics included micro-average exact match precision, recall, and F1 scores. Our results demonstrate that fine-tuned GPT-3.5 consistently outperformed the other models in F1 scores across all relation types. Overall, it achieved the highest micro-average F1 score of 0.92. This study underscores the effectiveness of LLMs like GPT in extracting relations related to acupoint locations, with implications for accurately modeling acupuncture knowledge and promoting standard implementation in acupuncture training and practice. The findings also contribute to advancing informatics applications in traditional and complementary medicine, showcasing the potential of LLMs in natural language processing.
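To make the evaluation concrete, the following is a minimal sketch of micro-average exact match precision, recall, and F1 over extracted relation triples: a prediction counts only if the (head, relation type, tail) triple matches a gold annotation exactly, and counts are pooled across all relation types. The acupoint triples and tuple layout below are illustrative assumptions, not the authors' released code or data.

```python
from typing import List, Tuple

# A relation is an exact-match triple: (head entity, relation type, tail entity).
# Relation types mirror the five annotated in the paper; the tuples are toy examples.
Relation = Tuple[str, str, str]

def micro_prf(gold: List[Relation], pred: List[Relation]) -> Tuple[float, float, float]:
    """Micro-averaged exact-match precision, recall, and F1 over all relation types."""
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)  # exact matches only
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold = [("LI4", "located_near", "second metacarpal bone"),
        ("ST36", "distance_of", "3 B-cun"),
        ("ST36", "direction_of", "inferior to Dubi (ST35)")]
pred = [("LI4", "located_near", "second metacarpal bone"),
        ("ST36", "distance_of", "3 B-cun"),
        ("ST36", "part_of", "anterior aspect of the leg")]

p, r, f1 = micro_prf(gold, pred)
print(f"P={p:.2f} R={r:.2f} F1={f1:.2f}")  # P=0.67 R=0.67 F1=0.67
```

Because counts are pooled over all relation types before the ratios are computed, frequent relations dominate the score; this is what distinguishes micro-averaging from macro-averaging.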
Related papers
- Improving Entity Recognition Using Ensembles of Deep Learning and Fine-tuned Large Language Models: A Case Study on Adverse Event Extraction from Multiple Sources [13.750202656564907]
Adverse event (AE) extraction is crucial for monitoring and analyzing the safety profiles of immunizations.
This study aims to evaluate the effectiveness of large language models (LLMs) and traditional deep learning models in AE extraction.
arXiv Detail & Related papers (2024-06-26T03:56:21Z) - Edinburgh Clinical NLP at SemEval-2024 Task 2: Fine-tune your model unless you have access to GPT-4 [10.01547158445743]
We evaluate various Large Language Models (LLMs) with multiple strategies, including Chain-of-Thought, In-Context Learning, and Parameter-Efficient Fine-Tuning (PEFT).
We found that the two PEFT adapters improve the F1 score (+0.0346) and consistency (+0.152) of the LLMs.
Averaging the three metrics, GPT-4 ranks joint-first in the competition with 0.8328.
arXiv Detail & Related papers (2024-03-30T22:27:21Z)
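A minimal sketch of the kind of PEFT adapter this entry refers to, using LoRA from the Hugging Face peft library; the gpt2 base model and every hyperparameter are placeholders rather than the paper's actual configuration.

```python
# Minimal LoRA sketch with the Hugging Face peft library; the gpt2 base
# model and all hyperparameters here are placeholders, not the paper's setup.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")
config = LoraConfig(
    r=16,                        # rank of the low-rank adapter matrices
    lora_alpha=32,               # scaling applied to the adapter update
    target_modules=["c_attn"],   # GPT-2's fused attention projection
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only adapter weights remain trainable
```

Only the small adapter matrices are trained, which is why PEFT makes fine-tuning feasible when full-model updates are too costly.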
arXiv Detail & Related papers (2024-03-30T22:27:21Z) - Reshaping Free-Text Radiology Notes Into Structured Reports With Generative Transformers [0.29530625605275984]
Structured reporting (SR) has been recommended by various medical societies.
We propose a pipeline to extract information from free-text reports.
Our work aims to leverage the potential of Natural Language Processing (NLP) and Transformer-based models.
arXiv Detail & Related papers (2024-03-27T18:38:39Z) - Optimal path for Biomedical Text Summarization Using Pointer GPT [21.919661430250798]
GPT models have a tendency to generate factual errors, lack context, and oversimplify words.
To address these limitations, we replaced the attention mechanism in the GPT model with a pointer network.
The effectiveness of the Pointer-GPT model was evaluated using the ROUGE score.
arXiv Detail & Related papers (2024-03-22T02:13:23Z)
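ROUGE evaluation of the kind mentioned here can be sketched with the rouge-score package (pip install rouge-score); the reference and candidate summaries below are toy inputs, not from the paper.

```python
# Minimal ROUGE evaluation with the rouge-score package;
# the reference and candidate summaries are toy inputs.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
reference = "The drug reduced tumor growth in mice."
candidate = "The drug reduced tumor growth in a mouse model."
scores = scorer.score(reference, candidate)
for name, s in scores.items():
    print(f"{name}: P={s.precision:.2f} R={s.recall:.2f} F={s.fmeasure:.2f}")
```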
arXiv Detail & Related papers (2024-03-22T02:13:23Z) - HuatuoGPT-II, One-stage Training for Medical Adaption of LLMs [61.41790586411816]
HuatuoGPT-II has shown state-of-the-art performance in Chinese medicine domain on a number of benchmarks.
It even outperforms proprietary models like ChatGPT and GPT-4 in some aspects, especially in Traditional Chinese Medicine.
arXiv Detail & Related papers (2023-11-16T10:56:24Z) - Exploring the Boundaries of GPT-4 in Radiology [46.30976153809968]
GPT-4 has a sufficient level of radiology knowledge with only occasional errors in complex context.
For findings summarisation, GPT-4 outputs are found to be overall comparable with existing manually-written impressions.
arXiv Detail & Related papers (2023-10-23T05:13:03Z) - TractGeoNet: A geometric deep learning framework for pointwise analysis
of tract microstructure to predict language assessment performance [66.43360974979386]
We propose a geometric deep-learning-based framework, TractGeoNet, for performing regression using diffusion magnetic resonance imaging (dMRI) tractography.
To improve regression performance, we propose a novel loss function, the Paired-Siamese Regression loss.
We evaluate the effectiveness of the proposed method by predicting individual performance on two neuropsychological assessments of language.
arXiv Detail & Related papers (2023-07-08T14:10:37Z)
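One plausible reading of the Paired-Siamese Regression loss is that it supervises both each subject's predicted score and the difference between paired predictions; the PyTorch sketch below follows that reading, and the equal weighting of the two terms is an assumption, not the paper's exact formula.

```python
# Sketch of a paired regression loss in PyTorch: supervise each subject's
# prediction and the *difference* between paired predictions. The weighting
# of the two terms is an assumption, not the paper's exact formulation.
import torch
import torch.nn.functional as F

def paired_siamese_regression_loss(pred_a: torch.Tensor,
                                   pred_b: torch.Tensor,
                                   target_a: torch.Tensor,
                                   target_b: torch.Tensor,
                                   lam: float = 1.0) -> torch.Tensor:
    pointwise = F.mse_loss(pred_a, target_a) + F.mse_loss(pred_b, target_b)
    pairwise = F.mse_loss(pred_a - pred_b, target_a - target_b)
    return pointwise + lam * pairwise

# Toy usage with scalar scores for two paired subjects.
pa, pb = torch.tensor([0.8]), torch.tensor([0.3])
ta, tb = torch.tensor([0.9]), torch.tensor([0.2])
print(paired_siamese_regression_loss(pa, pb, ta, tb))
```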
arXiv Detail & Related papers (2023-07-08T14:10:37Z) - Improving Large Language Models for Clinical Named Entity Recognition
via Prompt Engineering [20.534197056683695]
This study quantifies the capabilities of GPT-3.5 and GPT-4 for clinical named entity recognition (NER) tasks.
We developed a task-specific prompt framework that includes baseline prompts, annotation guideline-based prompts, error analysis-based instructions, and annotated samples.
We assessed each prompt's effectiveness and compared the models to BioClinicalBERT.
arXiv Detail & Related papers (2023-03-29T02:46:18Z)
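The layered prompt framework described in this entry (baseline task description, guideline excerpts, error-analysis instructions, annotated samples) can be sketched as plain string assembly; every text fragment below is an illustrative placeholder, not the study's actual prompt.

```python
# Sketch of a layered clinical NER prompt; each fragment is an
# illustrative placeholder, not the study's actual prompt text.
BASELINE = "Extract all clinical named entities (problem, treatment, test) from the note."
GUIDELINES = "Per the annotation guideline: include negated problems; exclude family history."
ERROR_NOTES = "Common errors to avoid: do not tag medication doses as treatments."
EXAMPLES = [
    ("Patient denies chest pain.", "[problem: chest pain (negated)]"),
]

def build_prompt(note: str) -> str:
    shots = "\n".join(f"Note: {x}\nEntities: {y}" for x, y in EXAMPLES)
    return f"{BASELINE}\n{GUIDELINES}\n{ERROR_NOTES}\n\n{shots}\n\nNote: {note}\nEntities:"

print(build_prompt("Started metformin 500 mg for type 2 diabetes."))
```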
arXiv Detail & Related papers (2023-03-29T02:46:18Z) - Ontology-aware Learning and Evaluation for Audio Tagging [56.59107110017436]
The mean average precision (mAP) metric treats different kinds of sound as independent classes without considering their relations.
Ontology-aware mean average precision (OmAP) addresses the weaknesses of mAP by utilizing the AudioSet ontology information during the evaluation.
We conduct human evaluations and demonstrate that OmAP is more consistent with human perception than mAP.
arXiv Detail & Related papers (2022-11-22T11:35:14Z)
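A toy sketch of the idea behind ontology-aware evaluation: confusions between ontologically close sound classes are penalized less than distant ones. The two-level ontology and the distance-based weighting below are assumptions for illustration, not the OmAP formula.

```python
# Toy sketch of ontology-aware scoring: mistakes between ontologically close
# classes cost less than distant ones. The small graph and the weighting
# scheme are illustrative assumptions about how OmAP uses the ontology.
import networkx as nx

ontology = nx.Graph([
    ("Sound", "Music"), ("Music", "Guitar"), ("Music", "Piano"),
    ("Sound", "Animal"), ("Animal", "Dog bark"), ("Animal", "Cat meow"),
])

def error_weight(pred: str, true: str) -> float:
    """Scale an error by graph distance: near misses cost less."""
    d = nx.shortest_path_length(ontology, pred, true)
    return d / 4.0  # normalize by the toy graph's maximum distance

print(error_weight("Guitar", "Piano"))     # 0.5 -> sibling classes, mild penalty
print(error_weight("Guitar", "Dog bark"))  # 1.0 -> distant classes, full penalty
```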
arXiv Detail & Related papers (2022-11-22T11:35:14Z) - News Summarization and Evaluation in the Era of GPT-3 [73.48220043216087]
We study how GPT-3 compares against fine-tuned models trained on large summarization datasets.
We show that not only do humans overwhelmingly prefer GPT-3 summaries, prompted using only a task description, but these also do not suffer from common dataset-specific issues such as poor factuality.
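Summarization "prompted using only a task description" can be sketched with the OpenAI Python client; the model name is a stand-in for the GPT-3 engines used in the study, and the article text is a placeholder.

```python
# Minimal zero-shot summarization sketch with the OpenAI Python client;
# the model name and article text are placeholders, not the study's setup.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
article = "..."    # the news article to summarize

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": f"Summarize the article in two or three sentences:\n\n{article}",
    }],
)
print(response.choices[0].message.content)
```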
arXiv Detail & Related papers (2022-09-26T01:04:52Z) - White Matter Tracts are Point Clouds: Neuropsychological Score
Prediction and Critical Region Localization via Geometric Deep Learning [68.5548609642999]
We propose a deep-learning-based framework for neuropsychological score prediction using white matter tract data.
We represent the arcuate fasciculus (AF) as a point cloud with microstructure measurements at each point.
We improve prediction performance with the proposed Paired-Siamese Loss that utilizes information about differences between continuous neuropsychological scores.
arXiv Detail & Related papers (2022-07-06T02:03:28Z)