SemEval-2023 Task 7: Multi-Evidence Natural Language Inference for
Clinical Trial Data
- URL: http://arxiv.org/abs/2305.02993v2
- Date: Thu, 11 May 2023 09:10:06 GMT
- Title: SemEval-2023 Task 7: Multi-Evidence Natural Language Inference for
Clinical Trial Data
- Authors: Ma\"el Jullien, Marco Valentino, Hannah Frost, Paul O'Regan, Donal
Landers, Andr\'e Freitas
- Abstract summary: This paper describes the results of SemEval 2023 task 7 -- Multi-Evidence Natural Language Inference for Clinical Trial Data.
The proposed challenges require multi-hop biomedical and numerical reasoning.
We observe significantly better performance on the evidence selection task than on the entailment task.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper describes the results of SemEval 2023 task 7 -- Multi-Evidence
Natural Language Inference for Clinical Trial Data (NLI4CT) -- consisting of
two tasks: a Natural Language Inference (NLI) task and an evidence selection
task on clinical trial data. The proposed challenges require multi-hop
biomedical and numerical reasoning, which are of significant importance to the
development of systems capable of large-scale interpretation and retrieval of
medical evidence, in order to provide personalized evidence-based care.
Task 1, the entailment task, received 643 submissions from 40 participants,
and Task 2, the evidence selection task, received 364 submissions from 23
participants. The tasks proved challenging: most submitted systems failed to
significantly outperform the majority-class baseline on the entailment task,
and performance on the evidence selection task was significantly better than
on the entailment task. Increasing the number of model parameters leads to a
direct increase in performance, far greater than the effect of biomedical
pre-training. Future work could explore the limitations of large models for
generalization and numerical inference, and investigate methods to augment
clinical datasets to allow for more rigorous testing and to facilitate
fine-tuning.
We envisage that the dataset, models, and results of this task will be useful
to the biomedical NLI and evidence retrieval communities. The dataset,
competition leaderboard, and website are publicly available.
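As a concrete illustration of why the majority-class baseline is a meaningful reference point on the entailment task, the following minimal sketch (not the official task scorer) compares a majority-class predictor against gold labels using macro-F1. The label names, the toy gold labels, and the choice of macro-F1 are illustrative assumptions rather than details taken from the task.

    # Minimal sketch (not the official NLI4CT scorer): score a majority-class
    # baseline with macro-F1. Label names and toy gold labels are illustrative.
    from collections import Counter

    def macro_f1(gold, pred, labels=("Entailment", "Contradiction")):
        """Macro-averaged F1 over the given label set."""
        scores = []
        for label in labels:
            tp = sum(g == label and p == label for g, p in zip(gold, pred))
            fp = sum(g != label and p == label for g, p in zip(gold, pred))
            fn = sum(g == label and p != label for g, p in zip(gold, pred))
            precision = tp / (tp + fp) if tp + fp else 0.0
            recall = tp / (tp + fn) if tp + fn else 0.0
            scores.append(2 * precision * recall / (precision + recall)
                          if precision + recall else 0.0)
        return sum(scores) / len(scores)

    # Toy gold labels, invented purely for this example.
    gold = ["Entailment", "Contradiction", "Entailment",
            "Contradiction", "Entailment"]

    # Majority-class baseline: always predict the most frequent label.
    majority_label, _ = Counter(gold).most_common(1)[0]
    baseline_pred = [majority_label] * len(gold)

    print("majority-class baseline macro-F1:",
          round(macro_f1(gold, baseline_pred), 3))

Even on this toy set, always predicting the majority class yields a non-trivial macro-F1, which is why submitted systems are compared against this baseline rather than against chance.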
Related papers
- Towards Evaluating and Building Versatile Large Language Models for Medicine [57.49547766838095]
We present MedS-Bench, a benchmark designed to evaluate the performance of large language models (LLMs) in clinical contexts.
MedS-Bench spans 11 high-level clinical tasks, including clinical report summarization, treatment recommendations, diagnosis, named entity recognition, and medical concept explanation.
The accompanying instruction-tuning collection, MedS-Ins, comprises 58 medically oriented language corpora, totaling 13.5 million samples across 122 tasks.
arXiv Detail & Related papers (2024-08-22T17:01:34Z) - DKE-Research at SemEval-2024 Task 2: Incorporating Data Augmentation with Generative Models and Biomedical Knowledge to Enhance Inference Robustness [27.14794371879541]
This paper presents a novel data augmentation technique to improve model robustness for biomedical natural language inference in clinical trials.
By generating synthetic examples through semantic perturbations and domain-specific vocabulary replacement, we introduce greater diversity and reduce shortcut learning (a toy illustration of vocabulary replacement appears after this list).
Our approach, combined with multi-task learning and the DeBERTa architecture, achieved significant performance gains on the NLI4CT 2024 benchmark.
arXiv Detail & Related papers (2024-04-14T10:02:47Z) - SemEval-2024 Task 2: Safe Biomedical Natural Language Inference for Clinical Trials [13.59675117792588]
We present SemEval-2024 Task 2: Safe Biomedical Natural Language Inference for Clinical Trials.
Our contributions include the refined NLI4CT-P dataset (i.e., Natural Language Inference for Clinical Trials - Perturbed).
A total of 106 participants registered for the task, contributing over 1200 individual submissions and 25 system overview papers.
This initiative aims to advance the robustness and applicability of NLI models in healthcare, ensuring safer and more dependable AI assistance in clinical decision-making.
arXiv Detail & Related papers (2024-04-07T13:58:41Z) - SemEval-2024 Shared Task 6: SHROOM, a Shared-task on Hallucinations and Related Observable Overgeneration Mistakes [48.83290963506378]
This paper presents the results of SHROOM, a shared task focused on detecting hallucinations.
We observe a number of key trends in how this problem was tackled.
While a majority of the teams did outperform our proposed baseline system, the performance of top-scoring systems is still consistent with random handling of the more challenging items.
arXiv Detail & Related papers (2024-03-12T15:06:22Z) - Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in Large Language Models [73.79091519226026]
Uncertainty of Thoughts (UoT) is an algorithm to augment large language models with the ability to actively seek information by asking effective questions.
In experiments on medical diagnosis, troubleshooting, and the 20 Questions game, UoT achieves an average performance improvement of 38.1% in the rate of successful task completion.
arXiv Detail & Related papers (2024-02-05T18:28:44Z) - Taiyi: A Bilingual Fine-Tuned Large Language Model for Diverse
Biomedical Tasks [19.091278630792615]
Most existing biomedical large language models (LLMs) focus on enhancing performance in monolingual biomedical question answering and conversation tasks.
We present Taiyi, a bilingual fine-tuned LLM for diverse biomedical tasks.
arXiv Detail & Related papers (2023-11-20T08:51:30Z) - Clairvoyance: A Pipeline Toolkit for Medical Time Series [95.22483029602921]
Time-series learning is the bread and butter of data-driven clinical decision support.
Clairvoyance proposes a unified, end-to-end, autoML-friendly pipeline that serves as a software toolkit.
Clairvoyance is the first to demonstrate viability of a comprehensive and automatable pipeline for clinical time-series ML.
arXiv Detail & Related papers (2023-10-28T12:08:03Z) - BiomedGPT: A Generalist Vision-Language Foundation Model for Diverse Biomedical Tasks [68.39821375903591]
Generalist AI holds the potential to address the limitations of task-specific models, owing to its versatility in interpreting different data types.
Here, we propose BiomedGPT, the first open-source and lightweight vision-language foundation model.
arXiv Detail & Related papers (2023-05-26T17:14:43Z) - Sebis at SemEval-2023 Task 7: A Joint System for Natural Language
Inference and Evidence Retrieval from Clinical Trial Reports [0.799536002595393]
The goal of SemEval-2023 Task 7 was to develop an NLP system for two tasks: evidence retrieval and natural language inference from clinical trial data.
Our final submission ranked 3rd out of 40 participants.
arXiv Detail & Related papers (2023-04-25T22:22:42Z) - Specialty-Oriented Generalist Medical AI for Chest CT Screening [14.31187762890342]
We propose the first-of-its-kind medical multimodal-multitask foundation model (M3FM) with application in lung cancer screening and related tasks.
M3FM consistently outperforms the state-of-the-art single-modal task-specific models.
As a specialty-oriented generalist medical AI model, M3FM paves the way for similar breakthroughs in other areas of medicine.
arXiv Detail & Related papers (2023-04-03T20:19:56Z) - Does Synthetic Data Generation of LLMs Help Clinical Text Mining? [51.205078179427645]
We investigate the potential of OpenAI's ChatGPT to aid in clinical text mining.
We propose a new training paradigm that involves generating a vast quantity of high-quality synthetic data.
Our method has resulted in significant improvements in the performance of downstream tasks.
arXiv Detail & Related papers (2023-03-08T03:56:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.