MaLei at the PLABA Track of TREC 2024: RoBERTa for Term Replacement -- LLaMA3.1 and GPT-4o for Complete Abstract Adaptation
- URL: http://arxiv.org/abs/2411.07381v4
- Date: Mon, 17 Feb 2025 18:54:59 GMT
- Title: MaLei at the PLABA Track of TREC 2024: RoBERTa for Term Replacement -- LLaMA3.1 and GPT-4o for Complete Abstract Adaptation
- Authors: Zhidong Ling, Zihao Li, Pablo Romero, Lifeng Han, Goran Nenadic
- Abstract summary: This report is the system description of the MaLei team for the shared task Plain Language Adaptation of Biomedical Abstracts (PLABA) 2024.
In task one (term replacement), we applied fine-tuned RoBERTa-Base models to identify and classify difficult terms, jargon, and acronyms in the biomedical abstracts.
In task two (complete abstract adaptation), we leveraged LLaMA-3.1-70B-Instruct and GPT-4o with one-shot prompts to complete the abstract adaptation and reported scores in BLEU, SARI, BERTScore, LENS, and SALSA.
- Score: 11.380751114611368
- License:
- Abstract: This report is the system description of the MaLei team (Manchester and Leiden) for the shared task Plain Language Adaptation of Biomedical Abstracts (PLABA) 2024 (the team participated under the name BeeManc last year), affiliated with TREC 2024 (33rd Text REtrieval Conference, https://ir.nist.gov/evalbase/conf/trec-2024). This report contains two sections corresponding to the two sub-tasks in PLABA-2024. In task one (term replacement), we applied fine-tuned RoBERTa-Base models to identify and classify difficult terms, jargon, and acronyms in the biomedical abstracts and reported F1 scores (Tasks 1A and 1B). In task two (complete abstract adaptation), we leveraged LLaMA-3.1-70B-Instruct and GPT-4o with one-shot prompts to complete the abstract adaptation and reported scores in BLEU, SARI, BERTScore, LENS, and SALSA. In the official PLABA-2024 evaluation of Tasks 1A and 1B, our much smaller fine-tuned RoBERTa-Base model ranked 3rd and 2nd respectively on the two sub-tasks, and 1st on F1 averaged across the two tasks, among 9 evaluated systems. Our LLaMA-3.1-70B-Instruct model achieved the highest Completeness score for Task 2. We share our source code, fine-tuned models, and related resources at https://github.com/HECTA-UoM/PLABA2024
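The two sub-tasks are methodologically distinct, so two brief sketches follow. First, Task 1 treats term identification as token classification; the snippet below is a minimal, hypothetical inference loop with Hugging Face Transformers. The label schema and the `roberta-base` placeholder checkpoint are illustrative assumptions; the team's actual fine-tuned weights are in the GitHub repository linked above.

```python
# Hypothetical Task 1-style inference: tag each word of a biomedical abstract
# with a difficulty/jargon label using a token-classification head on RoBERTa.
# NOTE: "roberta-base" is a placeholder; without the team's fine-tuned weights
# the predictions are meaningless.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

model_name = "roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name, add_prefix_space=True)
model = AutoModelForTokenClassification.from_pretrained(model_name, num_labels=5)
model.eval()

abstract = "Patients with myocardial infarction received percutaneous coronary intervention."
words = abstract.split()
enc = tokenizer(words, is_split_into_words=True, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**enc).logits            # (1, seq_len, num_labels)
pred_ids = logits.argmax(dim=-1)[0].tolist()

# Map sub-token predictions back to whole words (first sub-token wins).
seen = set()
for tok_idx, w_idx in enumerate(enc.word_ids(batch_index=0)):
    if w_idx is None or w_idx in seen:
        continue
    seen.add(w_idx)
    print(words[w_idx], "->", model.config.id2label[pred_ids[tok_idx]])
```

Second, Task 2 relies on one-shot prompting of instruction-tuned LLMs. The sketch below shows one plausible way to build such a prompt and call GPT-4o through the OpenAI client; the prompt wording and the in-context example pair are invented for illustration and are not the team's exact setup. SARI, BLEU, and the other reported metrics can then be computed against reference adaptations.

```python
# Hypothetical Task 2-style one-shot prompting for plain-language adaptation.
# Prompt text, the in-context example, and the model name are assumptions.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

EXAMPLE_SRC = "We conducted a randomized controlled trial of drug X in 120 adults."
EXAMPLE_TGT = "We tested drug X in a carefully planned study with 120 adults."

def adapt_abstract(abstract: str, model: str = "gpt-4o") -> str:
    prompt = (
        "Rewrite the biomedical abstract below in plain language that a general "
        "reader can understand, keeping all key facts.\n\n"
        f"Example abstract:\n{EXAMPLE_SRC}\n"
        f"Example plain-language version:\n{EXAMPLE_TGT}\n\n"
        f"Abstract:\n{abstract}\n"
        "Plain-language version:"
    )
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
    )
    return resp.choices[0].message.content

print(adapt_abstract("Hypertension was managed with ACE inhibitors over 12 weeks."))
```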
Related papers
- BeeManc at the PLABA Track of TAC-2023: Investigating LLMs and Controllable Attributes for Improving Biomedical Text Readability [16.05119302860606]
We describe the models and methods we used for our participation in the PLABA2023 task on biomedical abstract simplification.
The system outputs we submitted came from the following three categories: 1) domain fine-tuned T5-like models, including Biomedical-T5 and Lay-SciFive; 2) a fine-tuned BART-Large model with controllable attributes (via tokens), BART-w-CTs; 3) ChatGPT prompting.
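A rough illustration of the "controllable attributes via tokens" idea: a control token is prepended to the source text so the model can condition generation on it. The token names, checkpoint, and decoding settings below are assumptions for illustration, not the configuration used for BART-w-CTs.

```python
# Illustrative control-token conditioning with BART. The tokens only acquire
# meaning after fine-tuning on data where they are paired with the desired
# attribute; out of the box this just runs vanilla BART generation.
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

# Hypothetical control tokens added to the vocabulary before fine-tuning.
tokenizer.add_tokens(["<SIMPLE>", "<TECHNICAL>"])
model.resize_token_embeddings(len(tokenizer))

source = "<SIMPLE> Percutaneous coronary intervention was performed after myocardial infarction."
inputs = tokenizer(source, return_tensors="pt")
output_ids = model.generate(**inputs, num_beams=4, max_new_tokens=60)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```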
arXiv Detail & Related papers (2024-08-07T16:21:41Z)
- Branch-Solve-Merge Improves Large Language Model Evaluation and Generation [136.7876524839751]
Large Language Models (LLMs) are frequently used for multi-faceted language generation and evaluation tasks.
We propose Branch-Solve-Merge (BSM), a Large Language Model program (Schlag et al., 2023) for tackling such challenging natural language tasks.
BSM improves evaluation correctness and consistency for each LLM, enhancing human-LLM agreement by up to 26%.
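A schematic sketch of the branch-solve-merge decomposition as applied to LLM-based evaluation: the model first proposes instance-specific criteria (branch), judges each criterion independently (solve), and then combines the partial judgements (merge). `call_llm` is a placeholder for any chat-model API, and the prompts are illustrative rather than the paper's.

```python
# Branch-Solve-Merge, schematically. Plug a real LLM client into call_llm().
from typing import List

def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire up your LLM client here")

def branch(question: str, answer: str) -> List[str]:
    # Ask for instance-specific evaluation criteria, one per line.
    plan = call_llm(
        f"List 3 short criteria for judging this answer, one per line.\nQ: {question}\nA: {answer}"
    )
    return [line.strip("- ").strip() for line in plan.splitlines() if line.strip()]

def solve(question: str, answer: str, criterion: str) -> str:
    return call_llm(
        f"Judge the answer only on '{criterion}'. Give a 1-5 score and one sentence.\n"
        f"Q: {question}\nA: {answer}"
    )

def merge(judgements: List[str]) -> str:
    joined = "\n".join(judgements)
    return call_llm(f"Combine these per-criterion judgements into one overall verdict:\n{joined}")

def evaluate_answer(question: str, answer: str) -> str:
    criteria = branch(question, answer)
    return merge([solve(question, answer, c) for c in criteria])
```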
arXiv Detail & Related papers (2023-10-23T17:29:48Z)
- Overview of the BioLaySumm 2023 Shared Task on Lay Summarization of Biomedical Research Articles [47.04555835353173]
This paper presents the results of the shared task on Lay Summarisation of Biomedical Research Articles (BioLaySumm) hosted at the BioNLP Workshop at ACL 2023.
The goal of this shared task is to develop abstractive summarisation models capable of generating "lay summaries".
In addition to overall results, we report on the setup and insights from the BioLaySumm shared task, which attracted a total of 20 participating teams across both subtasks.
arXiv Detail & Related papers (2023-09-29T15:43:42Z)
- Generating EDU Extracts for Plan-Guided Summary Re-Ranking [77.7752504102925]
Two-step approaches, in which summary candidates are generated and then re-ranked to return a single summary, can improve ROUGE scores over the standard single-step approach.
We design a novel method to generate candidates for re-ranking that addresses these issues.
We show large relevance improvements over previously published methods on widely used single document news article corpora.
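A minimal generate-then-rerank sketch: sample several candidate summaries, then keep the one a scoring function prefers. The model name and the trivial length-based scorer below are stand-ins; the paper's EDU-plan-guided candidate generation and learned re-ranking are not reproduced here.

```python
# Generate-then-rerank, minimally. Replace score_fn with a learned re-ranker
# or a reference-free metric to get something closer to published systems.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "facebook/bart-large-cnn"   # placeholder summarizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

def generate_candidates(article: str, n: int = 4):
    inputs = tokenizer(article, return_tensors="pt", truncation=True)
    outputs = model.generate(
        **inputs,
        num_beams=n,
        num_return_sequences=n,   # one candidate per beam
        max_new_tokens=80,
    )
    return [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]

def rerank(candidates, score_fn=len):
    return max(candidates, key=score_fn)

article = "The city council approved a new transit plan on Tuesday after months of debate."
print(rerank(generate_candidates(article)))
```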
arXiv Detail & Related papers (2023-05-28T17:22:04Z)
- GersteinLab at MEDIQA-Chat 2023: Clinical Note Summarization from Doctor-Patient Conversations through Fine-tuning and In-context Learning [4.2570830892708225]
This paper presents our contribution to the MEDIQA-2023 Dialogue2Note shared task, encompassing both subtask A and subtask B.
We approach the task as a dialogue summarization problem and implement two distinct pipelines: (a) fine-tuning of a pre-trained dialogue summarization model and GPT-3, and (b) few-shot in-context learning (ICL) using a large language model, GPT-4.
Both methods achieve excellent results in terms of ROUGE-1 F1, BERTScore F1 (deberta-xlarge-mnli), and BLEURT.
arXiv Detail & Related papers (2023-05-08T19:16:26Z)
- Evaluating the Factual Consistency of Large Language Models Through News Summarization [97.04685401448499]
We propose a new benchmark called FIB (Factual Inconsistency Benchmark) that focuses on the task of summarization.
For factually consistent summaries, we use human-written reference summaries that we manually verify as factually consistent.
For factually inconsistent summaries, we generate summaries from a suite of summarization models that we have manually annotated as factually inconsistent.
arXiv Detail & Related papers (2022-11-15T18:50:34Z)
- Tencent AI Lab - Shanghai Jiao Tong University Low-Resource Translation System for the WMT22 Translation Task [49.916963624249355]
This paper describes Tencent AI Lab - Shanghai Jiao Tong University (TAL-SJTU) Low-Resource Translation systems for the WMT22 shared task.
We participate in the general translation task on English$\Leftrightarrow$Livonian.
Our system is based on M2M100 with novel techniques that adapt it to the target language pair.
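For orientation, the sketch below shows plain M2M100 inference with the standard Transformers API; the language pair used is only an example, and the paper's techniques for adapting the model to Livonian are not reproduced here.

```python
# Baseline M2M100 inference (not the adapted WMT22 system).
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")
model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")

tokenizer.src_lang = "en"
encoded = tokenizer("The weather is nice today.", return_tensors="pt")
generated = model.generate(
    **encoded,
    forced_bos_token_id=tokenizer.get_lang_id("et"),  # Estonian, a covered related language
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```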
arXiv Detail & Related papers (2022-10-17T04:34:09Z)
- BERT based Transformers lead the way in Extraction of Health Information from Social Media [0.0]
We participated in two tasks: (1) Classification, extraction and normalization of adverse drug effect (ADE) mentions in English tweets (Task-1) and (2) Classification of COVID-19 tweets containing symptoms (Task-6).
Our system ranked first among all the submissions for subtask-1(a) with an F1-score of 61%.
For subtask-1(b), our system obtained an F1-score of 50% with improvements up to +8% F1 over the score averaged across all submissions.
The BERTweet model achieved an F1 score of 94% on SMM4H 2021 Task-6.
arXiv Detail & Related papers (2021-04-15T10:50:21Z)
- ReCAM@IITK at SemEval-2021 Task 4: BERT and ALBERT based Ensemble for Abstract Word Prediction [2.482368922343792]
We fine-tuned the pre-trained masked language models BERT and ALBERT.
We tried multiple approaches and found that a Masked Language Modeling (MLM) based approach works the best.
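A minimal sketch of the MLM-style idea: mask the missing abstract word and score each candidate option with BERT's masked-LM head. The passage and options are invented, and the teams' actual preprocessing, multi-token handling, and ensembling are not shown.

```python
# Score candidate words at a masked position with BERT's MLM head.
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

passage = f"The committee reached a final {tokenizer.mask_token} after a long debate."
options = ["decision", "banana", "velocity", "silence", "membrane"]

inputs = tokenizer(passage, return_tensors="pt")
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]

with torch.no_grad():
    logits = model(**inputs).logits[0, mask_pos]   # (1, vocab_size)
log_probs = logits.log_softmax(dim=-1)

# Rough score: log-probability of each option's first sub-token at the mask.
scores = {
    opt: log_probs[0, tokenizer.convert_tokens_to_ids(tokenizer.tokenize(opt))[0]].item()
    for opt in options
}
print(max(scores, key=scores.get))
```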
arXiv Detail & Related papers (2021-04-04T08:22:19Z)
- Dimsum @LaySumm 20: BART-based Approach for Scientific Document Summarization [50.939885303186195]
We build a lay summary generation system based on the BART model.
We leverage sentence labels as extra supervision signals to improve the performance of lay summarization.
arXiv Detail & Related papers (2020-10-19T06:36:11Z)
- aschern at SemEval-2020 Task 11: It Takes Three to Tango: RoBERTa, CRF, and Transfer Learning [22.90521056447551]
We describe our system for SemEval-2020 Task 11 on Detection of Propaganda Techniques in News Articles.
We developed ensemble models using RoBERTa-based neural architectures, additional CRF layers, transfer learning between the two subtasks, and advanced post-processing to handle the multi-label nature of the task.
arXiv Detail & Related papers (2020-08-06T18:45:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.