Text2MDT: Extracting Medical Decision Trees from Medical Texts
- URL: http://arxiv.org/abs/2401.02034v1
- Date: Thu, 4 Jan 2024 02:33:38 GMT
- Title: Text2MDT: Extracting Medical Decision Trees from Medical Texts
- Authors: Wei Zhu and Wenfeng Li and Xing Tian and Pengfei Wang and Xiaoling
Wang and Jin Chen and Yuanbin Wu and Yuan Ni and Guotong Xie
- Abstract summary: We propose a novel task, Text2MDT, to explore the automatic extraction of medical decision trees (MDTs) from medical texts.
We normalize the form of the MDT and create an annotated Text-to-MDT dataset in Chinese with the participation of medical experts.
- Score: 33.58610255918941
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Knowledge of the medical decision process, which can be modeled as medical
decision trees (MDTs), is critical to build clinical decision support systems.
However, the current MDT construction methods rely heavily on time-consuming
and laborious manual annotation. In this work, we propose a novel task,
Text2MDT, to explore the automatic extraction of MDTs from medical texts such
as medical guidelines and textbooks. We normalize the form of the MDT and
create an annotated Text-to-MDT dataset in Chinese with the participation of
medical experts. We investigate two different methods for the Text2MDT task:
(a) an end-to-end framework which relies solely on instruction tuning of a
GPT-style large language model (LLM) to generate all the node information and
tree structures; (b) a pipeline framework which decomposes the Text2MDT task
into three subtasks. Experiments on our Text2MDT dataset demonstrate that: (a)
the end-to-end methods based on LLMs (7B parameters or larger) show promising
results and outperform the pipeline methods; (b) the chain-of-thought (COT)
prompting method \cite{Wei2022ChainOT} can improve the performance of the
fine-tuned LLMs on the Text2MDT test set; (c) the lightweight pipeline method
based on encoder-based pretrained models performs comparably to LLMs with two
orders of magnitude less model complexity.
Our Text2MDT dataset is open-sourced at
\url{https://tianchi.aliyun.com/dataset/95414}, and the source codes are
open-sourced at \url{https://github.com/michael-wzhu/text2dt}.
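The abstract describes normalizing medical decision trees into a fixed form so an LLM can generate them as a linear sequence. The sketch below illustrates one plausible such normalization: condition and decision nodes holding extracted triples, flattened in preorder for sequence generation. The node schema, field names, and the `to_preorder` helper are illustrative assumptions, not the paper's or the dataset's actual format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical MDT node: "condition" nodes branch on extracted facts,
# "decision" nodes hold recommended actions. Field names are assumptions.
@dataclass
class MDTNode:
    role: str                                   # "condition" or "decision"
    triples: List[Tuple[str, str, str]]         # (subject, relation, object)
    logical_rel: str = "and"                    # how multiple triples combine
    children: List["MDTNode"] = field(default_factory=list)

def to_preorder(node: MDTNode) -> List[dict]:
    """Flatten the tree in preorder so a sequence model can emit it node by node."""
    out = [{"role": node.role,
            "triples": node.triples,
            "logical_rel": node.logical_rel}]
    for child in node.children:
        out.extend(to_preorder(child))
    return out

# Toy guideline fragment: "If the patient is allergic to penicillin,
# use a macrolide; otherwise use amoxicillin."
tree = MDTNode(
    role="condition",
    triples=[("patient", "allergic_to", "penicillin")],
    children=[
        MDTNode(role="decision", triples=[("treatment", "use", "macrolide")]),
        MDTNode(role="decision", triples=[("treatment", "use", "amoxicillin")]),
    ],
)
flat = to_preorder(tree)
```

A flattening like this is what lets both the end-to-end LLM approach (generate the whole sequence) and the pipeline approach (extract triples, then structure, then assemble) target the same output form.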
Related papers
- Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard
Parameter Sharing [72.56219471145232]
We propose a ST/MT multi-tasking framework with hard parameter sharing.
Our method reduces the speech-text modality gap via a pre-processing stage.
We show that our framework improves attentional encoder-decoder, Connectionist Temporal Classification (CTC), transducer, and joint CTC/attention models by an average of +0.5 BLEU.
arXiv Detail & Related papers (2023-09-27T17:48:14Z)
- Interpretable Medical Diagnostics with Structured Data Extraction by
Large Language Models [59.89454513692417]
Tabular data is often hidden in text, particularly in medical diagnostic reports.
We propose a novel, simple, and effective methodology for extracting structured tabular data from textual medical reports, called TEMED-LLM.
We demonstrate that our approach significantly outperforms state-of-the-art text classification models in medical diagnostics.
arXiv Detail & Related papers (2023-06-08T09:12:28Z)
- LLMDet: A Third Party Large Language Models Generated Text Detection
Tool [119.0952092533317]
Text generated by large language models (LLMs) is remarkably close to high-quality human-authored text.
Existing detection tools can only differentiate between machine-generated and human-authored text.
We propose LLMDet, a model-specific, secure, efficient, and extendable detection tool.
arXiv Detail & Related papers (2023-05-24T10:45:16Z)
- Demonstrate-Search-Predict: Composing retrieval and language models for
knowledge-intensive NLP [77.817293104436]
We propose a framework that relies on passing natural language texts in sophisticated pipelines between an LM and an RM.
We have written novel DSP programs for answering questions in open-domain, multi-hop, and conversational settings.
arXiv Detail & Related papers (2022-12-28T18:52:44Z)
- Text2Struct: A Machine Learning Pipeline for Mining Structured Data from
Text [4.709764624933227]
This paper presents an end-to-end machine learning pipeline, Text2Struct.
It includes a text annotation scheme, training data processing, and machine learning implementation.
The pipeline is anticipated to improve further by expanding the dataset and investigating other machine learning models.
arXiv Detail & Related papers (2022-12-18T09:31:36Z)
- MURMUR: Modular Multi-Step Reasoning for Semi-Structured Data-to-Text
Generation [102.20036684996248]
We propose MURMUR, a neuro-symbolic modular approach to text generation from semi-structured data with multi-step reasoning.
We conduct experiments on two data-to-text generation tasks, WebNLG and LogicNLG.
arXiv Detail & Related papers (2022-12-16T17:36:23Z)
- Extend and Explain: Interpreting Very Long Language Models [0.0]
We introduce a novel Masked Sampling Procedure (MSP) to identify the text blocks that contribute to a prediction.
MSP identifies 1.7x more clinically informative text blocks than the previous state-of-the-art, runs up to 100x faster, and is tractable for generating important phrase pairs.
arXiv Detail & Related papers (2022-09-02T17:15:43Z)
- What Makes Data-to-Text Generation Hard for Pretrained Language Models? [17.07349898176898]
Expressing natural language descriptions of structured facts or relations -- data-to-text generation (D2T) -- increases the accessibility of structured knowledge repositories.
Previous work shows that pre-trained language models (PLMs) perform remarkably well on this task after fine-tuning on a significant amount of task-specific training data.
We conduct an empirical study of both fine-tuned and auto-regressive PLMs on the DART multi-domain D2T dataset.
arXiv Detail & Related papers (2022-05-23T17:58:39Z)
- Neural Pipeline for Zero-Shot Data-to-Text Generation [3.42658286826597]
We propose to generate text by transforming single-item descriptions with a sequence of modules trained on general-domain text-based operations.
Our experiments on two major triple-to-text datasets -- WebNLG and E2E -- show that our approach enables D2T generation from RDF triples in zero-shot settings.
arXiv Detail & Related papers (2022-03-30T13:14:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.