DR.BENCH: Diagnostic Reasoning Benchmark for Clinical Natural Language
Processing
- URL: http://arxiv.org/abs/2209.14901v1
- Date: Thu, 29 Sep 2022 16:05:53 GMT
- Title: DR.BENCH: Diagnostic Reasoning Benchmark for Clinical Natural Language
Processing
- Authors: Yanjun Gao, Dmitriy Dligach, Timothy Miller, John Caskey, Brihat
Sharma, Matthew M Churpek, Majid Afshar
- Abstract summary: Diagnostic Reasoning Benchmarks, DR.BENCH, is a new benchmark for developing and evaluating cNLP models with clinical diagnostic reasoning ability.
DR.BENCH is the first clinical suite of tasks designed to be a natural language generation framework to evaluate pre-trained language models.
- Score: 5.022185333260402
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The meaningful use of electronic health records (EHR) continues to progress
in the digital era with clinical decision support systems augmented by
artificial intelligence. A priority in improving provider experience is to
overcome information overload and reduce the cognitive burden so fewer medical
errors and cognitive biases are introduced during patient care. One major type
of medical error is diagnostic error due to systematic or predictable errors in
judgment that rely on heuristics. The potential for clinical natural language
processing (cNLP) to model diagnostic reasoning in humans with forward
reasoning from data to diagnosis and potentially reduce the cognitive burden
and medical error has not been investigated. Existing tasks to advance the
science in cNLP have largely focused on information extraction and named entity
recognition through classification tasks. We introduce a novel suite of tasks,
coined Diagnostic Reasoning Benchmarks (DR.BENCH), as a new benchmark for
developing and evaluating cNLP models with clinical diagnostic reasoning
ability. The suite includes six tasks from ten publicly available datasets
addressing clinical text understanding, medical knowledge reasoning, and
diagnosis generation. DR.BENCH is the first clinical suite of tasks designed to
be a natural language generation framework to evaluate pre-trained language
models. Experiments with state-of-the-art pre-trained generative language
models, using both large general-domain models and models continually trained
on a medical corpus, demonstrate opportunities for improvement when evaluated
on DR.BENCH. We share DR.BENCH as a publicly available GitLab repository with a
systematic approach to load and evaluate models for the cNLP community.
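The abstract describes DR.BENCH as a natural language generation framework with a systematic way to load and evaluate pre-trained models, but it does not spell out the API. The sketch below is therefore only an illustration, not the official DR.BENCH loader: it evaluates a Hugging Face seq2seq model on a hypothetical diagnosis-generation file (diagnosis_task.jsonl with "note"/"diagnosis" fields); the model name, prompt, file path, and exact-match metric are all assumptions.

```python
# Minimal sketch, NOT the official DR.BENCH loader: scores a generative LM on a
# hypothetical diagnosis-generation task stored as JSONL lines of the form
# {"note": "...", "diagnosis": "..."}. Model name, prompt, file path, and the
# exact-match metric are placeholder assumptions.
import json

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_NAME = "t5-small"              # stand-in for a general- or medical-domain LM
DATA_PATH = "diagnosis_task.jsonl"   # hypothetical task file

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)


def generate_diagnosis(note: str) -> str:
    """Generate a free-text diagnosis from a clinical note."""
    inputs = tokenizer(
        "generate diagnosis: " + note,
        return_tensors="pt",
        truncation=True,
        max_length=512,
    )
    output_ids = model.generate(**inputs, max_new_tokens=32, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


def exact_match_accuracy(path: str) -> float:
    """Crude exact-match score; the benchmark itself may use other NLG metrics."""
    correct = total = 0
    with open(path) as f:
        for line in f:
            example = json.loads(line)
            prediction = generate_diagnosis(example["note"])
            correct += int(prediction.strip().lower() == example["diagnosis"].strip().lower())
            total += 1
    return correct / max(total, 1)


if __name__ == "__main__":
    print(f"exact-match accuracy: {exact_match_accuracy(DATA_PATH):.3f}")
```

In practice, the repository's own evaluation scripts and generation metrics should be preferred over this exact-match stand-in.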
Related papers
- Diagnostic Reasoning in Natural Language: Computational Model and Application [68.47402386668846]
We investigate diagnostic abductive reasoning (DAR) in the context of language-grounded tasks (NL-DAR).
We propose a novel modeling framework for NL-DAR based on Pearl's structural causal models.
We use the resulting dataset to investigate the human decision-making process in NL-DAR.
arXiv Detail & Related papers (2024-09-09T06:55:37Z) - Leveraging Large Language Models through Natural Language Processing to provide interpretable Machine Learning predictions of mental deterioration in real time [5.635300481123079]
Based on official estimates, 50 million people worldwide are affected by dementia, and this number increases by 10 million new patients every year.
To this end, Artificial Intelligence and computational linguistics can be exploited for natural language analysis, personalized assessment, monitoring, and treatment.
In this work, we contribute an affordable, flexible, non-invasive, personalized diagnostic system.
arXiv Detail & Related papers (2024-09-05T09:27:05Z) - A Survey of Models for Cognitive Diagnosis: New Developments and Future Directions [66.40362209055023]
This paper aims to provide a survey of current models for cognitive diagnosis, with more attention on new developments using machine learning-based methods.
By comparing the model structures, parameter estimation algorithms, model evaluation methods and applications, we provide a relatively comprehensive review of the recent trends in cognitive diagnosis models.
arXiv Detail & Related papers (2024-07-07T18:02:00Z) - A Survey of Artificial Intelligence in Gait-Based Neurodegenerative Disease Diagnosis [51.07114445705692]
Neurodegenerative diseases (NDs) traditionally require extensive healthcare resources and human effort for medical diagnosis and monitoring.
As a crucial disease-related motor symptom, human gait can be exploited to characterize different NDs.
Current advances in artificial intelligence (AI) models enable automatic gait analysis for ND identification and classification.
arXiv Detail & Related papers (2024-05-21T06:44:40Z) - Assertion Detection Large Language Model In-context Learning LoRA
Fine-tuning [2.401755243180179]
We introduce a novel methodology that utilizes Large Language Models (LLMs) pre-trained on a vast array of medical data for assertion detection.
Our approach achieved an F-1 of 0.74, which is 0.31 higher than the previous method.
arXiv Detail & Related papers (2024-01-31T05:11:00Z) - Leveraging A Medical Knowledge Graph into Large Language Models for
Diagnosis Prediction [7.5569033426158585]
We propose an innovative approach for augmenting the proficiency of Large Language Models (LLMs) in automated diagnosis generation.
We derive the KG from the National Library of Medicine's Unified Medical Language System (UMLS), a robust repository of biomedical knowledge.
Our approach offers an explainable diagnostic pathway, edging us closer to the realization of AI-augmented diagnostic decision support systems.
arXiv Detail & Related papers (2023-08-28T06:05:18Z) - Rationale production to support clinical decision-making [31.66739991129112]
We apply InfoCal to the task of predicting hospital readmission using hospital discharge notes.
We find that each presented model, combined with selected interpretability or feature importance methods, yields varying results.
arXiv Detail & Related papers (2021-11-15T09:02:10Z) - CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark [51.38557174322772]
We present the first Chinese Biomedical Language Understanding Evaluation benchmark.
It is a collection of natural language understanding tasks including named entity recognition, information extraction, clinical diagnosis normalization, and single-sentence/sentence-pair classification.
We report empirical results for 11 current pre-trained Chinese models; the experimental results show that state-of-the-art neural models still perform far worse than the human ceiling.
arXiv Detail & Related papers (2021-06-15T12:25:30Z) - Clinical Outcome Prediction from Admission Notes using Self-Supervised
Knowledge Integration [55.88616573143478]
Outcome prediction from clinical text can prevent doctors from overlooking possible risks.
Diagnoses at discharge, procedures performed, in-hospital mortality and length-of-stay prediction are four common outcome prediction targets.
We propose clinical outcome pre-training to integrate knowledge about patient outcomes from multiple public sources.
arXiv Detail & Related papers (2021-02-08T10:26:44Z) - Natural Language Processing to Detect Cognitive Concerns in Electronic
Health Records Using Deep Learning [0.970914263240787]
Dementia is under-recognized in the community, under-diagnosed by healthcare professionals, and under-coded in claims data.
Information on cognitive dysfunction is often found in unstructured clinician notes within medical records but manual review by experts is time consuming and often prone to errors.
In order to identify patients with cognitive concerns in electronic medical records, we applied natural language processing (NLP) algorithms.
arXiv Detail & Related papers (2020-11-12T16:59:56Z) - Hemogram Data as a Tool for Decision-making in COVID-19 Management:
Applications to Resource Scarcity Scenarios [62.997667081978825]
The COVID-19 pandemic has challenged emergency response systems worldwide, with widespread reports of essential-service breakdowns and the collapse of health care structures.
This work describes a machine learning model derived from hemogram exam data performed in symptomatic patients.
Proposed models can predict COVID-19 qRT-PCR results in symptomatic individuals with high accuracy, sensitivity and specificity.
arXiv Detail & Related papers (2020-05-10T01:45:03Z)