Clinical Language Understanding Evaluation (CLUE)
- URL: http://arxiv.org/abs/2209.14377v1
- Date: Wed, 28 Sep 2022 19:14:08 GMT
- Title: Clinical Language Understanding Evaluation (CLUE)
- Authors: Travis R. Goodwin, and Dina Demner-Fushman
- Abstract summary: We present the Clinical Language Understanding Evaluation (CLUE) benchmark with a set of four clinical language understanding tasks and standard training, development, validation, and testing sets derived from MIMIC data.
It is our hope that these data will enable direct comparison between approaches, improve reproducibility, and reduce the barrier to entry for developing novel models or methods for these clinical language understanding tasks.
- Score: 17.254884920876695
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Clinical language processing has received a lot of attention in recent years,
resulting in new models or methods for disease phenotyping, mortality
prediction, and other tasks. Unfortunately, many of these approaches are tested
under different experimental settings (e.g., data sources, training and testing
splits, metrics, evaluation criteria, etc.) making it difficult to compare
approaches and determine state-of-the-art. To address these issues and
facilitate reproducibility and comparison, we present the Clinical Language
Understanding Evaluation (CLUE) benchmark with a set of four clinical language
understanding tasks, standard training, development, validation and testing
sets derived from MIMIC data, as well as a software toolkit. It is our hope
that these data will enable direct comparison between approaches, improve
reproducibility, and reduce the barrier to entry for developing novel models or
methods for these clinical language understanding tasks.
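Since CLUE fixes the tasks, the splits, and the metrics, a benchmark run reduces to a loop over tasks with one shared evaluation per task. The sketch below illustrates that fixed-split pattern with a generic scikit-learn baseline; the directory layout, task names, and column names are hypothetical stand-ins, not the actual CLUE toolkit API (and MIMIC-derived data requires credentialed access).

```python
# Minimal fixed-split evaluation loop in the spirit of the CLUE benchmark.
# Paths, task names, and columns are hypothetical; MIMIC data is access-controlled.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

TASKS = ["phenotyping", "mortality_prediction"]  # placeholder task names

for task in TASKS:
    train = pd.read_csv(f"clue/{task}/train.csv")  # standard training split
    test = pd.read_csv(f"clue/{task}/test.csv")    # held-out testing split
    baseline = make_pipeline(TfidfVectorizer(max_features=50_000),
                             LogisticRegression(max_iter=1000))
    baseline.fit(train["text"], train["label"])
    preds = baseline.predict(test["text"])
    # Shared splits plus a shared metric are what make systems directly comparable.
    print(task, "macro-F1:", round(f1_score(test["label"], preds, average="macro"), 3))
```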
Related papers
- Meta-Learners for Partially-Identified Treatment Effects Across Multiple Environments [67.80453452949303]
Estimating the conditional average treatment effect (CATE) from observational data is relevant for many applications such as personalized medicine.
Here, we focus on the widespread setting where the observational data come from multiple environments.
We propose model-agnostic learners (so-called meta-learners) that estimate bounds on the CATE and can be used in combination with arbitrary machine learning models.
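The learners are model-agnostic in the same plug-in sense as classical CATE meta-learners. As a point of reference only, here is a minimal two-model (T-learner) sketch on synthetic data; the paper's learners target bounds under partial identification across environments, which this sketch does not implement.

```python
# T-learner: the plug-in pattern that model-agnostic CATE meta-learners build on.
# Synthetic data; any regressor can be swapped in as the base model.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def t_learner_cate(X, t, y, base=GradientBoostingRegressor):
    """Fit separate outcome models for treated/control; CATE = prediction gap."""
    m1 = base().fit(X[t == 1], y[t == 1])  # outcome model under treatment
    m0 = base().fit(X[t == 0], y[t == 0])  # outcome model under control
    return m1.predict(X) - m0.predict(X)   # per-unit effect estimate

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
t = rng.integers(0, 2, size=500)
y = X[:, 0] + t * (1 + X[:, 1]) + rng.normal(scale=0.1, size=500)
print(t_learner_cate(X, t, y)[:5])  # true effect is 1 + X[:, 1], varying per unit
```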
arXiv Detail & Related papers (2024-06-04T16:31:43Z)
- Preserving the knowledge of long clinical texts using aggregated ensembles of large language models [0.0]
Clinical texts contain rich and valuable information that can be used for various clinical outcome prediction tasks.
Applying large language models, such as BERT-based models, to clinical texts poses two major challenges.
This paper proposes a novel method to preserve the knowledge of long clinical texts using aggregated ensembles of large language models.
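A standard skeleton for this is to split a note that exceeds the encoder's 512-token limit into overlapping windows, score each window, and aggregate the window-level predictions; the paper's aggregated ensembles refine this idea. A minimal sketch, assuming a generic BERT checkpoint (the classification head below is untrained, so the numbers are placeholders):

```python
# Chunk-and-aggregate inference for clinical notes longer than 512 tokens.
# "bert-base-uncased" stands in for a clinical checkpoint; the head is untrained.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "bert-base-uncased"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2).eval()

def predict_long_note(note: str, stride: int = 256) -> torch.Tensor:
    # Overlapping 512-token windows so no part of the note is silently dropped.
    enc = tok(note, truncation=True, max_length=512, stride=stride,
              return_overflowing_tokens=True, padding=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(input_ids=enc["input_ids"],
                       attention_mask=enc["attention_mask"]).logits
    # Naive aggregation: mean of window probabilities (the paper ensembles more carefully).
    return torch.softmax(logits, dim=-1).mean(dim=0)

print(predict_long_note("Patient admitted with worsening dyspnea. " * 200))
```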
arXiv Detail & Related papers (2023-11-02T19:50:02Z)
- ClinicalGPT: Large Language Models Finetuned with Diverse Medical Data and Comprehensive Evaluation [5.690250818139763]
Large language models have exhibited exceptional performance on various Natural Language Processing (NLP) tasks.
Despite these advances, their effectiveness in medical applications is limited by challenges such as factual inaccuracies, weak reasoning abilities, and a lack of grounding in real-world experience.
We present ClinicalGPT, a language model explicitly designed and optimized for clinical scenarios.
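At a high level the recipe is to take a general-purpose language model and continue training it on medical text with the standard next-token objective. A minimal sketch of that supervised fine-tuning step, using GPT-2 and two toy records as stand-ins for ClinicalGPT's actual base model and diverse training data:

```python
# One supervised fine-tuning pass with the language-modeling objective.
# GPT-2 and the two toy records are placeholders, not ClinicalGPT's setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
opt = torch.optim.AdamW(model.parameters(), lr=5e-5)

records = [
    "Question: What does metformin treat? Answer: Type 2 diabetes.",
    "Question: First-line treatment for strep throat? Answer: Penicillin.",
]
model.train()
for text in records:
    batch = tok(text, return_tensors="pt")
    out = model(**batch, labels=batch["input_ids"])  # labels = inputs: next-token loss
    out.loss.backward()
    opt.step()
    opt.zero_grad()
    print(f"loss: {out.loss.item():.3f}")
```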
arXiv Detail & Related papers (2023-06-16T16:56:32Z)
- Process Knowledge-infused Learning for Clinician-friendly Explanations [14.405002816231477]
Language models can assess mental health using social media data.
However, they do not compare posts against clinicians' diagnostic processes.
This makes it challenging to explain language model outputs using concepts that clinicians can understand.
arXiv Detail & Related papers (2023-06-16T13:08:17Z)
- Language Models are Few-shot Learners for Prognostic Prediction [0.4254099382808599]
We explore the use of transformers and language models in prognostic prediction for immunotherapy using real-world patients' clinical data and molecular profiles.
The study benchmarks the efficacy of baselines and language models on prognostic prediction across multiple cancer types and investigates the impact of different pretrained language models under few-shot regimes.
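In the few-shot regime the patient's features are serialized into a prompt containing a handful of labeled examples, and a pretrained model completes the label for a new case. A sketch of that prompt construction with invented feature strings and labels; the study's real inputs are clinical data and molecular profiles, and a small model like GPT-2 is only a mechanical stand-in:

```python
# Few-shot prompt construction for a binary prognosis label (illustrative only).
from transformers import pipeline

shots = [
    ("age 54, PD-L1 high, TMB 12", "responder"),
    ("age 71, PD-L1 low, TMB 3", "non-responder"),
]
query = "age 62, PD-L1 high, TMB 9"

prompt = "".join(f"Patient: {x}\nPrognosis: {y}\n" for x, y in shots)
prompt += f"Patient: {query}\nPrognosis:"

# Any causal LM can complete the prompt; few-shot quality scales with model size.
generator = pipeline("text-generation", model="gpt2")
print(generator(prompt, max_new_tokens=3)[0]["generated_text"])
```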
arXiv Detail & Related papers (2023-02-24T15:35:36Z)
- Benchmarking Heterogeneous Treatment Effect Models through the Lens of Interpretability [82.29775890542967]
Estimating personalized effects of treatments is a complex, yet pervasive problem.
Recent developments in the machine learning literature on heterogeneous treatment effect estimation gave rise to many sophisticated, but opaque, tools.
We use post-hoc feature importance methods to identify features that influence the model's predictions.
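The probe is post hoc: fit an opaque effect estimator first, then ask a generic feature-importance method which covariates drive its predicted effects. A synthetic-data sketch using permutation importance on a surrogate model (the benchmark's exact learners and protocol differ):

```python
# Post-hoc permutation importance applied to heterogeneous-effect estimates.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 4))
t = rng.integers(0, 2, size=1000)
y = X[:, 0] + t * (2 * X[:, 1]) + rng.normal(scale=0.1, size=1000)  # effect driven by feature 1

# Two-model effect estimate, standing in for any opaque HTE learner.
m1 = RandomForestRegressor(random_state=0).fit(X[t == 1], y[t == 1])
m0 = RandomForestRegressor(random_state=0).fit(X[t == 0], y[t == 0])
cate = m1.predict(X) - m0.predict(X)

# Surrogate fit to the effect estimates; importances reveal which features
# determine who benefits from treatment.
surrogate = RandomForestRegressor(random_state=0).fit(X, cate)
imp = permutation_importance(surrogate, X, cate, n_repeats=5, random_state=0)
print(imp.importances_mean.round(3))  # feature 1 should dominate
```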
arXiv Detail & Related papers (2022-06-16T17:59:05Z)
- LifeLonger: A Benchmark for Continual Disease Classification [59.13735398630546]
We introduce LifeLonger, a benchmark for continual disease classification on the MedMNIST collection.
Task- and class-incremental learning of diseases addresses the problem of classifying new samples without retraining the model from scratch.
Cross-domain incremental learning addresses the problem of handling datasets originating from different institutions while retaining previously acquired knowledge.
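In the class-incremental setting noted above, disease classes arrive in stages and the model is updated in place rather than retrained. A naive sketch of that loop on synthetic stand-in images; without a continual-learning method (rehearsal, regularization, etc.) this forgets the first stage, which is precisely the failure mode such a benchmark measures:

```python
# Naive class-incremental training loop (synthetic stand-ins for MedMNIST images).
import torch
from torch import nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(),
                      nn.Linear(128, 10))  # head sized for all eventual classes
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for lo, hi in [(0, 5), (5, 10)]:            # class ranges revealed per stage
    x = torch.randn(256, 1, 28, 28)         # placeholder 28x28 grayscale images
    y = torch.randint(lo, hi, (256,))       # labels from the current stage only
    for _ in range(20):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    print(f"trained classes {lo}-{hi - 1}, final loss {loss.item():.3f}")
```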
arXiv Detail & Related papers (2022-04-12T12:25:05Z)
- CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark [51.38557174322772]
We present the first Chinese Biomedical Language Understanding Evaluation benchmark.
It is a collection of natural language understanding tasks including named entity recognition, information extraction, clinical diagnosis normalization, and single-sentence/sentence-pair classification.
We report empirical results for 11 current pre-trained Chinese models; the experiments show that state-of-the-art neural models still perform far worse than the human ceiling.
arXiv Detail & Related papers (2021-06-15T12:25:30Z)
- Benchmarking Automated Clinical Language Simplification: Dataset, Algorithm, and Evaluation [48.87254340298189]
We construct a new dataset named MedLane to support the development and evaluation of automated clinical language simplification approaches.
We propose a new model called DECLARE that follows the human annotation procedure and achieves state-of-the-art performance.
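Systems on such a benchmark are usually reported against simple baselines. Below is a trivial lexical-substitution baseline with an illustrative three-entry term map; DECLARE itself is a trained model that follows the human annotation procedure, not a lookup table:

```python
# Trivial lay-term substitution: a floor baseline for clinical simplification.
LAY_TERMS = {
    "myocardial infarction": "heart attack",  # map longest phrases first
    "hypertension": "high blood pressure",
    "afebrile": "without fever",
}

def simplify(sentence: str) -> str:
    out = sentence.lower()
    for term, lay in LAY_TERMS.items():
        out = out.replace(term, lay)
    return out

print(simplify("Patient is afebrile with a history of myocardial infarction."))
# -> "patient is without fever with a history of heart attack."
```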
arXiv Detail & Related papers (2020-12-04T06:09:02Z)
- Predicting Clinical Diagnosis from Patients Electronic Health Records Using BERT-based Neural Networks [62.9447303059342]
We show the importance of this problem for the medical community.
We present a modification of the Bidirectional Encoder Representations from Transformers (BERT) model for sequence classification.
We use a large-scale Russian EHR dataset consisting of about 4 million unique patient visits.
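The task reduces to standard sequence classification: encode the visit text and classify into a diagnosis code. A sketch of one training step, where the multilingual checkpoint, label count, and single example are placeholders for the paper's model and roughly 4 million visits:

```python
# One fine-tuning step of BERT-style diagnosis classification from visit text.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "bert-base-multilingual-cased"  # placeholder checkpoint covering Russian
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=300)

texts = ["Жалобы на боль в груди и одышку при нагрузке."]  # "chest pain, dyspnea on exertion"
batch = tok(texts, return_tensors="pt", truncation=True, padding=True)
labels = torch.tensor([42])  # hypothetical diagnosis-class index

out = model(**batch, labels=labels)  # cross-entropy over diagnosis classes
out.loss.backward()                  # optimizer step omitted for brevity
print(f"loss: {out.loss.item():.3f}, logits: {tuple(out.logits.shape)}")
```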
arXiv Detail & Related papers (2020-07-15T09:22:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.