Ascle: A Python Natural Language Processing Toolkit for Medical Text Generation
- URL: http://arxiv.org/abs/2311.16588v2
- Date: Sat, 9 Dec 2023 09:14:24 GMT
- Title: Ascle: A Python Natural Language Processing Toolkit for Medical Text Generation
- Authors: Rui Yang, Qingcheng Zeng, Keen You, Yujie Qiao, Lucas Huang, Chia-Chun
Hsieh, Benjamin Rosand, Jeremy Goldwasser, Amisha D Dave, Tiarnan D.L.
Keenan, Emily Y Chew, Dragomir Radev, Zhiyong Lu, Hua Xu, Qingyu Chen, Irene
Li
- Abstract summary: Ascle is a pioneering natural language processing (NLP) toolkit designed for medical text generation.
Ascle is tailored for biomedical researchers and healthcare professionals, offering an easy-to-use, all-in-one solution.
- Score: 30.883733024137506
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: This study introduces Ascle, a pioneering natural language processing (NLP)
toolkit designed for medical text generation. Ascle is tailored for biomedical
researchers and healthcare professionals, offering an easy-to-use, all-in-one
solution that requires minimal programming expertise. For the first time, Ascle
evaluates and provides interfaces for the latest pre-trained language models,
encompassing four advanced and challenging generative functions:
question-answering, text summarization, text simplification, and machine
translation. In addition, Ascle integrates 12 essential NLP functions, along
with query and search capabilities for clinical databases. The toolkit, its
models, and associated data are publicly available via
https://github.com/Yale-LILY/MedGen.
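As a rough illustration of the kind of generative functionality described above (text summarization over a clinical note with a pre-trained language model), the sketch below uses the Hugging Face transformers pipeline as a stand-in. The model choice and the example note are illustrative assumptions; this is not Ascle's actual interface, for which see the repository linked above.

```python
# Illustrative sketch only: a general-purpose Hugging Face summarization
# pipeline standing in for the kind of pre-trained generative models a
# medical text-generation toolkit wraps. This is NOT Ascle's API.
from transformers import pipeline

# General-domain checkpoint; a biomedical checkpoint could be substituted.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

# Hypothetical clinical note used purely as example input.
clinical_note = (
    "The patient is a 67-year-old male with a history of type 2 diabetes "
    "and hypertension, admitted with chest pain. Troponin levels were "
    "elevated and an ECG showed ST-segment depression. He was started on "
    "dual antiplatelet therapy and underwent coronary angiography."
)

# Generate a short abstractive summary of the note.
result = summarizer(clinical_note, max_length=40, min_length=10, do_sample=False)
print(result[0]["summary_text"])
```

The same pipeline pattern extends to the other generative functions the abstract lists (question answering, simplification, machine translation) by swapping the task name and model checkpoint.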
Related papers
- A Novel Cartography-Based Curriculum Learning Method Applied on RoNLI: The First Romanian Natural Language Inference Corpus [71.77214818319054]
Natural language inference is a proxy for natural language understanding.
There is no publicly available NLI corpus for the Romanian language.
We introduce the first Romanian NLI corpus (RoNLI) comprising 58K training sentence pairs.
arXiv Detail & Related papers (2024-05-20T08:41:15Z)
- LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day [85.19963303642427]
We propose a cost-efficient approach for training a vision-language conversational assistant that can answer open-ended research questions of biomedical images.
The model first learns to align biomedical vocabulary using the figure-caption pairs as is, then learns to master open-ended conversational semantics.
This enables us to train a Large Language and Vision Assistant for BioMedicine in less than 15 hours (with eight A100s).
arXiv Detail & Related papers (2023-06-01T16:50:07Z)
- Multilingual Simplification of Medical Texts [49.469685530201716]
We introduce MultiCochrane, the first sentence-aligned multilingual text simplification dataset for the medical domain in four languages.
We evaluate fine-tuned and zero-shot models across these languages, with extensive human assessments and analyses.
Although models can now generate viable simplified texts, we identify outstanding challenges that this dataset might be used to address.
arXiv Detail & Related papers (2023-05-21T18:25:07Z)
- Annotated Dataset Creation through General Purpose Language Models for non-English Medical NLP [0.5482532589225552]
In our work, we propose leveraging pretrained language models for training data acquisition.
We create a custom dataset and use it to train GPTNERMED, a medical NER model for German texts.
arXiv Detail & Related papers (2022-08-30T18:42:55Z)
- A Medical Information Extraction Workbench to Process German Clinical Text [5.519657218427976]
We introduce a workbench: a collection of German clinical text processing models.
The models are trained on a de-identified corpus of German nephrology reports.
Our workbench is made publicly available so it can be used out of the box, serve as a benchmark, or be transferred to related problems.
arXiv Detail & Related papers (2022-07-08T13:19:19Z)
- BigBIO: A Framework for Data-Centric Biomedical Natural Language Processing [13.30221348538759]
We introduce BigBIO, a community library of 126+ biomedical NLP datasets.
BigBIO facilitates reproducible meta-dataset curation via programmatic access to datasets and their metadata.
We discuss our process for task schemas, data auditing, and contribution guidelines, and outline two illustrative use cases.
arXiv Detail & Related papers (2022-06-30T07:15:45Z)
- EHRKit: A Python Natural Language Processing Toolkit for Electronic Health Record Texts [12.10507006658038]
We create a Python library for clinical texts, EHRKit.
This library contains two main parts: MIMIC-III-specific functions and task-specific functions.
The first part introduces a list of interfaces for accessing MIMIC-III NOTEEVENTS data, including basic search, information retrieval, and information extraction.
The second part integrates many third-party libraries for up to 12 off-the-shelf NLP tasks, such as named entity recognition, summarization, and machine translation.
arXiv Detail & Related papers (2022-04-13T18:51:01Z)
- HealthPrompt: A Zero-shot Learning Paradigm for Clinical Natural Language Processing [3.762895631262445]
We developed a novel prompt-based clinical NLP framework called HealthPrompt.
We performed an in-depth analysis of HealthPrompt on six different PLMs in a no-data setting.
Our experiments show that prompts effectively capture the context of clinical texts and perform remarkably well without any training data.
arXiv Detail & Related papers (2022-03-09T21:44:28Z)
- SCROLLS: Standardized CompaRison Over Long Language Sequences [62.574959194373264]
We introduce SCROLLS, a suite of tasks that require reasoning over long texts.
SCROLLS contains summarization, question answering, and natural language inference tasks.
We make all datasets available in a unified text-to-text format and host a live leaderboard to facilitate research on model architecture and pretraining methods.
arXiv Detail & Related papers (2022-01-10T18:47:15Z)
- Benchmarking Automated Clinical Language Simplification: Dataset, Algorithm, and Evaluation [48.87254340298189]
We construct a new dataset named MedLane to support the development and evaluation of automated clinical language simplification approaches.
We propose a new model called DECLARE that follows the human annotation procedure and achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-12-04T06:09:02Z)
- Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing [73.37262264915739]
We show that for domains with abundant unlabeled text, such as biomedicine, pretraining language models from scratch results in substantial gains.
Our experiments show that domain-specific pretraining serves as a solid foundation for a wide range of biomedical NLP tasks.
arXiv Detail & Related papers (2020-07-31T00:04:15Z)