ACI-BENCH: a Novel Ambient Clinical Intelligence Dataset for
Benchmarking Automatic Visit Note Generation
- URL: http://arxiv.org/abs/2306.02022v1
- Date: Sat, 3 Jun 2023 06:42:17 GMT
- Title: ACI-BENCH: a Novel Ambient Clinical Intelligence Dataset for
Benchmarking Automatic Visit Note Generation
- Authors: Wen-wai Yim, Yujuan Fu, Asma Ben Abacha, Neal Snider, Thomas Lin, and
Meliha Yetisgen
- Abstract summary: We present the largest dataset to date tackling the problem of AI-assisted note generation from visit dialogue.
We also present the benchmark performances of several common state-of-the-art approaches.
- Score: 4.1331432182859436
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent immense breakthroughs in generative models such as GPT-4 have
precipitated re-imagined, ubiquitous usage of these models across all
applications. One area that can benefit from improvements in artificial
intelligence (AI) is healthcare. The note generation task from doctor-patient
encounters, and its associated electronic medical record documentation, is one
of the most arduous and time-consuming tasks for physicians. It is also a
natural prime beneficiary of advances in generative models. With such advances,
however, benchmarking is more critical than ever. Whether studying model
weaknesses or developing new evaluation metrics, shared open datasets are an
imperative part of understanding the current state of the art. Unfortunately,
because clinic encounter conversations are not routinely recorded and are
difficult to share ethically due to patient confidentiality, there are no
sufficiently large clinic dialogue-note datasets for benchmarking this task.
Here we present the Ambient Clinical Intelligence Benchmark (ACI-BENCH) corpus,
the largest dataset to date tackling the problem of AI-assisted note generation
from visit dialogue. We also present the benchmark performances of several
common state-of-the-art approaches.
Related papers
- Generative AI for Synthetic Data Across Multiple Medical Modalities: A Systematic Review of Recent Developments and Challenges [2.1835659964186087]
This paper presents a systematic review of generative models used to synthesize various medical data types.
Our study encompasses a broad array of medical data modalities and explores various generative models.
arXiv Detail & Related papers (2024-06-27T14:00:11Z)
- Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
Training open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology.
For training, we assemble a large dataset of over 697 thousand radiology image-text pairs.
For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation.
Inference with LLaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
arXiv Detail & Related papers (2024-03-12T18:12:02Z)
- AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator [69.51568871044454]
We introduce AI Hospital, a framework simulating dynamic medical interactions between a Doctor (as the player) and NPCs.
This setup allows for realistic assessments of LLMs in clinical scenarios.
We develop the Multi-View Medical Evaluation benchmark, utilizing high-quality Chinese medical records and NPCs.
arXiv Detail & Related papers (2024-02-15T06:46:48Z)
- README: Bridging Medical Jargon and Lay Understanding for Patient Education through Data-Centric NLP [9.432205523734707]
We introduce a new task of automatically generating lay definitions, aiming to simplify medical terms into patient-friendly lay language.
We first created the dataset: an extensive collection of over 50,000 unique (medical term, lay definition) pairs and 300,000 mentions.
We have also engineered a data-centric Human-AI pipeline that synergizes data filtering, augmentation, and selection to improve data quality.
arXiv Detail & Related papers (2023-12-24T23:01:00Z)
- Explainable AI for clinical and remote health applications: a survey on tabular and time series data [3.655021726150368]
It is worth noting that XAI has not gathered the same attention across different research areas and data types, especially in healthcare.
This paper provides a review of the literature in the last 5 years, illustrating the type of generated explanations and the efforts provided to evaluate their relevance and quality.
arXiv Detail & Related papers (2022-09-14T10:01:29Z)
- ICDBigBird: A Contextual Embedding Model for ICD Code Classification [71.58299917476195]
Contextual word embedding models have achieved state-of-the-art results in multiple NLP tasks.
ICDBigBird is a BigBird-based model that can integrate a Graph Convolutional Network (GCN).
Our experiments on a real-world clinical dataset demonstrate the effectiveness of our BigBird-based model on the ICD classification task.
arXiv Detail & Related papers (2022-04-21T20:59:56Z)
- Human Evaluation and Correlation with Automatic Metrics in Consultation Note Generation [56.25869366777579]
In recent years, machine learning models have rapidly become better at generating clinical consultation notes.
We present an extensive human evaluation study where 5 clinicians listen to 57 mock consultations, write their own notes, post-edit a number of automatically generated notes, and extract all the errors.
We find that a simple, character-based Levenshtein distance metric performs on par with, if not better than, common model-based metrics like BERTScore.
arXiv Detail & Related papers (2022-04-01T14:04:16Z)
- Benchmark datasets driving artificial intelligence development fail to capture the needs of medical professionals [4.799783526620609]
We released a catalogue of datasets and benchmarks pertaining to the broad domain of clinical and biomedical natural language processing (NLP)
A total of 450 NLP datasets were manually systematized and annotated with rich metadata.
Our analysis indicates that AI benchmarks of direct clinical relevance are scarce and fail to cover most work activities that clinicians want to see addressed.
arXiv Detail & Related papers (2022-01-18T15:05:28Z)
- Towards more patient friendly clinical notes through language models and ontologies [57.51898902864543]
We present a novel approach to automated medical text simplification based on word-level simplification and language modelling.
We use a new dataset of pairs of publicly available medical sentences and versions of them simplified by clinicians.
Our method, based on a language model trained on medical forum data, generates simpler sentences while preserving both grammar and the original meaning.
arXiv Detail & Related papers (2021-12-23T16:11:19Z)
- CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark [51.38557174322772]
We present the first Chinese Biomedical Language Understanding Evaluation benchmark.
It is a collection of natural language understanding tasks including named entity recognition, information extraction, clinical diagnosis normalization, and single-sentence/sentence-pair classification.
We report empirical results with 11 current pre-trained Chinese models; the experiments show that state-of-the-art neural models still perform far worse than the human ceiling.
arXiv Detail & Related papers (2021-06-15T12:25:30Z)
- Biomedical Concept Relatedness -- A large EHR-based benchmark [10.133874724214984]
A promising application of AI to healthcare is the retrieval of information from electronic health records.
The suitability of AI methods for such applications is tested by predicting the relatedness of concepts with known relatedness scores.
All existing biomedical concept relatedness datasets are notoriously small and consist of hand-picked concept pairs.
We open-source a novel concept relatedness benchmark overcoming these issues.
arXiv Detail & Related papers (2020-10-30T12:20:18Z)
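The consultation-note evaluation entry above reports that a simple character-based Levenshtein distance metric rivals model-based metrics such as BERTScore. A minimal sketch of such a metric is shown below; this is not the study's own implementation, and the `levenshtein_similarity` normalization is one illustrative choice among several.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level Levenshtein (edit) distance via dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string in the outer loop
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # cost of deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion from a
                curr[j - 1] + 1,           # insertion into a
                prev[j - 1] + (ca != cb),  # substitution (free on match)
            ))
        prev = curr
    return prev[-1]


def levenshtein_similarity(hypothesis: str, reference: str) -> float:
    """Normalize edit distance into a [0, 1] similarity score
    (hypothetical wrapper: 1 - distance / longer length)."""
    if not hypothesis and not reference:
        return 1.0
    return 1.0 - levenshtein(hypothesis, reference) / max(
        len(hypothesis), len(reference)
    )
```

Used as a generation metric, a system note is scored against a clinician-written reference, e.g. `levenshtein_similarity(generated_note, reference_note)`; higher values indicate fewer character edits needed to reach the reference.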