BeeManc at the PLABA Track of TAC-2023: Investigating LLMs and Controllable Attributes for Improving Biomedical Text Readability
- URL: http://arxiv.org/abs/2408.03871v1
- Date: Wed, 7 Aug 2024 16:21:41 GMT
- Title: BeeManc at the PLABA Track of TAC-2023: Investigating LLMs and Controllable Attributes for Improving Biomedical Text Readability
- Authors: Zihao Li, Samuel Belkadi, Nicolo Micheletti, Lifeng Han, Matthew Shardlow, Goran Nenadic
- Abstract summary: We describe the models and methods we used for our participation in the PLABA2023 task on biomedical abstract simplification.
The system outputs we submitted come from the following three categories: 1) domain fine-tuned T5-like models including Biomedical-T5 and Lay-SciFive; 2) a fine-tuned BART-Large model with controllable attributes (via tokens), BART-w-CTs; 3) ChatGPT prompting.
- Score: 16.05119302860606
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this system report, we describe the models and methods we used for our participation in the PLABA2023 task on biomedical abstract simplification, part of the TAC 2023 tracks. The system outputs we submitted come from the following three categories: 1) domain fine-tuned T5-like models including Biomedical-T5 and Lay-SciFive; 2) a fine-tuned BART-Large model with controllable attributes (via tokens), BART-w-CTs; 3) ChatGPT prompting. We also present the work we carried out for this task on BioGPT fine-tuning. In the official automatic evaluation using SARI scores, BeeManc ranks 2nd among all teams and our model Lay-SciFive ranks 3rd among all 13 evaluated systems. In the official human evaluation, our model BART-w-CTs ranks 2nd on Sentence-Simplicity (score 92.84) and 3rd on Term-Simplicity (score 82.33) among all 7 evaluated systems; it also produced a high Fluency score of 91.57, close to the highest score of 93.53. In the second round of submissions, our team using ChatGPT prompting ranks 2nd in several categories, including simplified-term accuracy (score 92.26) and completeness (score 96.58), with a faithfulness score of 95.3 that is very close to the re-evaluated PLABA-base-1 (95.73) in human evaluations. Our code, fine-tuned models, prompts, and data splits from the system development stage are available at https://github.com/HECTA-UoM/PLABA-MU
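The control-token idea behind BART-w-CTs can be sketched in a few lines: discrete attribute tokens are prepended to the source text so the model can condition its output on a requested simplicity level. The `<SENT_i>`/`<TERM_i>` token names below are illustrative placeholders, not the report's actual vocabulary.

```python
def add_control_tokens(source: str, sent_level: int, term_level: int) -> str:
    """Prepend controllable-attribute tokens to a source sentence.

    sent_level / term_level are discrete buckets (e.g. 0-4) for the
    requested sentence- and term-simplicity; the <SENT_i>/<TERM_i>
    token names are hypothetical, chosen here for illustration only.
    """
    return f"<SENT_{sent_level}> <TERM_{term_level}> {source}"


# During fine-tuning, each training pair would be tagged with the tokens
# describing its target; at inference time, the levels are chosen freely.
tagged = add_control_tokens("Myocardial infarction is a leading cause of death.", 4, 4)
```

In practice the new tokens would also be registered with the tokenizer so they map to single embeddings rather than being split into subwords.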
Related papers
- Preference Optimization for Reasoning with Pseudo Feedback [100.62603571434167]
We introduce a novel approach to generate pseudo feedback for reasoning tasks by framing the labeling of solutions as an evaluation against associated test cases.
We conduct experiments on both mathematical reasoning and coding tasks using pseudo feedback for preference optimization, and observe improvements across both tasks.
arXiv Detail & Related papers (2024-11-25T12:44:02Z) - BeeManc at the PLABA Track of TAC-2024: RoBERTa for task 1 -- LLaMA3.1 and GPT-4o for task 2 [11.380751114611368]
This report contains two sections corresponding to the two sub-tasks in PLABA 2024.
In task one, we applied fine-tuned RoBERTa-Base models to identify and classify difficult terms, jargon, and acronyms in the biomedical abstracts and reported the F1 score.
In task two, we leveraged LLaMA3.1-70B-Instruct and GPT-4o with one-shot prompts to complete the abstract adaptation and reported the scores in BLEU, SARI, BERTScore, LENS, and SALSA.
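The one-shot setup described above can be sketched as simple prompt assembly: a single worked source/target pair is placed before the query abstract. The instruction wording and layout here are illustrative assumptions, not the report's actual prompts.

```python
def build_one_shot_prompt(example_src: str, example_tgt: str, query_src: str) -> str:
    """Assemble a one-shot prompt for biomedical abstract adaptation.

    The instruction text and "Abstract:/Plain-language version:" layout
    are hypothetical; the actual prompts are released in the authors'
    repository.
    """
    return (
        "Rewrite the biomedical abstract below in plain language "
        "for a general audience.\n\n"
        f"Abstract: {example_src}\n"
        f"Plain-language version: {example_tgt}\n\n"
        f"Abstract: {query_src}\n"
        "Plain-language version:"
    )
```

The resulting string would be sent as a single user message to the chat model; leaving the final field empty cues the model to complete it.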
arXiv Detail & Related papers (2024-11-11T21:32:06Z) - ThangDLU at #SMM4H 2024: Encoder-decoder models for classifying text data on social disorders in children and adolescents [49.00494558898933]
This paper describes our participation in Task 3 and Task 5 of the #SMM4H (Social Media Mining for Health) 2024 Workshop.
Task 3 is a multi-class classification task centered on tweets discussing the impact of outdoor environments on symptoms of social anxiety.
Task 5 involves a binary classification task focusing on tweets reporting medical disorders in children.
We applied transfer learning from pre-trained encoder-decoder models such as BART-base and T5-small to identify the labels of a set of given tweets.
arXiv Detail & Related papers (2024-04-30T17:06:20Z) - Investigating Large Language Models and Control Mechanisms to Improve Text Readability of Biomedical Abstracts [16.05119302860606]
We investigate the ability of state-of-the-art large language models (LLMs) on the task of biomedical abstract simplification.
The methods applied include domain fine-tuning and prompt-based learning (PBL).
We used a range of automatic evaluation metrics, including BLEU, ROUGE, SARI, and BERTScore, and also conducted human evaluations.
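Among the metrics above, SARI is the one specific to simplification: it rewards the add, keep, and delete edits that the system makes relative to the source, judged against references. The sketch below is a deliberately simplified, unigram-only illustration of that idea; the real SARI uses n-grams up to order 4, multiple references, and a mix of precision and F1 per operation.

```python
def simple_sari(source: str, prediction: str, reference: str) -> float:
    """A simplified, unigram-only illustration of the SARI idea:
    the average precision of the add, keep, and delete operations
    against a single reference (not the full SARI definition)."""
    src, pred, ref = (set(s.lower().split()) for s in (source, prediction, reference))

    added = pred - src      # words the system introduced
    kept = pred & src       # words the system preserved
    deleted = src - pred    # words the system removed

    def precision(ops: set, good: set) -> float:
        # Fraction of performed operations that the reference agrees with;
        # vacuously 1.0 when no operation of that kind was performed.
        return len(ops & good) / len(ops) if ops else 1.0

    p_add = precision(added, ref - src)    # good adds also appear in the reference
    p_keep = precision(kept, ref)          # good keeps are kept by the reference
    p_del = precision(deleted, src - ref)  # good deletes are absent from the reference
    return (p_add + p_keep + p_del) / 3
```

For production use, the `sari` metric in the Hugging Face `evaluate` library implements the full definition.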
arXiv Detail & Related papers (2023-09-22T22:47:32Z) - Text Classification via Large Language Models [63.1874290788797]
We introduce Clue And Reasoning Prompting (CARP) to address complex linguistic phenomena involved in text classification.
Remarkably, CARP yields new SOTA performances on 4 out of 5 widely-used text-classification benchmarks.
More importantly, we find that CARP delivers impressive abilities on low-resource and domain-adaptation setups.
arXiv Detail & Related papers (2023-05-15T06:24:45Z) - ZooD: Exploiting Model Zoo for Out-of-Distribution Generalization [65.58562481279023]
We propose ZooD, a paradigm for PTMs ranking and ensemble with feature selection.
We evaluate our paradigm on a diverse model zoo consisting of 35 models for various Out-of-Distribution (OoD) tasks.
arXiv Detail & Related papers (2022-10-17T16:31:57Z) - BCH-NLP at BioCreative VII Track 3: medications detection in tweets using transformer networks and multi-task learning [9.176393163624002]
We implement a multi-task learning model that is jointly trained on text classification and sequence labelling.
Our best system run achieved a strict F1 of 80.4, ranking first and more than 10 points higher than the average score of all participants.
arXiv Detail & Related papers (2021-11-26T19:22:51Z) - Benchmarking for Biomedical Natural Language Processing Tasks with a Domain Specific ALBERT [9.8215089151757]
We present BioALBERT, a domain-specific adaptation of ALBERT (A Lite BERT).
It is trained on biomedical, PubMed Central, and clinical corpora and fine-tuned for 6 different tasks across 20 benchmark datasets.
It represents a new state of the art in 17 out of 20 benchmark datasets.
arXiv Detail & Related papers (2021-07-09T11:47:13Z) - Multilabel 12-Lead Electrocardiogram Classification Using Gradient Boosting Tree Ensemble [64.29529357862955]
We build an algorithm using gradient boosted tree ensembles fitted on morphology and signal processing features to classify ECG diagnoses.
For each lead, we derive features from heart rate variability, PQRST template shape, and the full signal waveform.
We join the features of all 12 leads to fit an ensemble of gradient boosting decision trees to predict probabilities of ECG instances belonging to each class.
arXiv Detail & Related papers (2020-10-21T18:11:36Z) - aschern at SemEval-2020 Task 11: It Takes Three to Tango: RoBERTa, CRF, and Transfer Learning [22.90521056447551]
We describe our system for SemEval-2020 Task 11 on Detection of Propaganda Techniques in News Articles.
We developed ensemble models using RoBERTa-based neural architectures, additional CRF layers, transfer learning between the two subtasks, and advanced post-processing to handle the multi-label nature of the task.
arXiv Detail & Related papers (2020-08-06T18:45:25Z) - DeBERTa: Decoding-enhanced BERT with Disentangled Attention [119.77305080520718]
We propose a new model architecture DeBERTa that improves the BERT and RoBERTa models using two novel techniques.
We show that these techniques significantly improve the efficiency of model pre-training and the performance of both natural language understanding (NLU) and natural language generation (NLG) downstream tasks.
arXiv Detail & Related papers (2020-06-05T19:54:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.