Ascle: A Python Natural Language Processing Toolkit for Medical Text Generation
- URL: http://arxiv.org/abs/2311.16588v2
- Date: Sat, 9 Dec 2023 09:14:24 GMT
- Title: Ascle: A Python Natural Language Processing Toolkit for Medical Text Generation
- Authors: Rui Yang, Qingcheng Zeng, Keen You, Yujie Qiao, Lucas Huang, Chia-Chun
Hsieh, Benjamin Rosand, Jeremy Goldwasser, Amisha D Dave, Tiarnan D.L.
Keenan, Emily Y Chew, Dragomir Radev, Zhiyong Lu, Hua Xu, Qingyu Chen, Irene
Li
- Abstract summary: Ascle is a pioneering natural language processing (NLP) toolkit designed for medical text generation.
Ascle is tailored for biomedical researchers and healthcare professionals, offering an easy-to-use, all-in-one solution.
- Score: 30.883733024137506
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: This study introduces Ascle, a pioneering natural language processing (NLP)
toolkit designed for medical text generation. Ascle is tailored for biomedical
researchers and healthcare professionals, offering an easy-to-use, all-in-one
solution that requires minimal programming expertise. For the first time, Ascle
evaluates and provides interfaces for the latest pre-trained language models,
encompassing four advanced and challenging generative functions:
question-answering, text summarization, text simplification, and machine
translation. In addition, Ascle integrates 12 essential NLP functions, along
with query and search capabilities for clinical databases. The toolkit, its
models, and associated data are publicly available via
https://github.com/Yale-LILY/MedGen.
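As a rough illustration of the kind of generative functionality described above (text summarization over a clinical note with a pre-trained language model), the sketch below uses the Hugging Face transformers pipeline as a stand-in. The model choice and the example note are illustrative assumptions; this is not Ascle's actual interface, for which see the repository linked above.

```python
# Illustrative sketch only: a general-purpose Hugging Face summarization
# pipeline standing in for the kind of pre-trained generative models a
# medical text-generation toolkit wraps. This is NOT Ascle's API.
from transformers import pipeline

# General-domain checkpoint; a biomedical checkpoint could be substituted.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

# Hypothetical clinical note used purely as example input.
clinical_note = (
    "The patient is a 67-year-old male with a history of type 2 diabetes "
    "and hypertension, admitted with chest pain. Troponin levels were "
    "elevated and an ECG showed ST-segment depression. He was started on "
    "dual antiplatelet therapy and underwent coronary angiography."
)

# Generate a short abstractive summary of the note.
result = summarizer(clinical_note, max_length=40, min_length=10, do_sample=False)
print(result[0]["summary_text"])
```

The same pipeline pattern extends to the other generative functions the abstract lists (question answering, simplification, machine translation) by swapping the task name and model checkpoint.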
Related papers
- A Novel Cartography-Based Curriculum Learning Method Applied on RoNLI: The First Romanian Natural Language Inference Corpus [71.77214818319054]
Natural language inference is a proxy for natural language understanding.
There is no publicly available NLI corpus for the Romanian language.
We introduce the first Romanian NLI corpus (RoNLI) comprising 58K training sentence pairs.
arXiv Detail & Related papers (2024-05-20T08:41:15Z)
- LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day [85.19963303642427]
We propose a cost-efficient approach for training a vision-language conversational assistant that can answer open-ended research questions of biomedical images.
The model first learns to align biomedical vocabulary using the figure-caption pairs as is, then learns to master open-ended conversational semantics.
This enables us to train a Large Language and Vision Assistant for BioMedicine in less than 15 hours (with eight A100s).
arXiv Detail & Related papers (2023-06-01T16:50:07Z)
- Multilingual Simplification of Medical Texts [49.469685530201716]
We introduce MultiCochrane, the first sentence-aligned multilingual text simplification dataset for the medical domain in four languages.
We evaluate fine-tuned and zero-shot models across these languages, with extensive human assessments and analyses.
Although models can now generate viable simplified texts, we identify outstanding challenges that this dataset might be used to address.
arXiv Detail & Related papers (2023-05-21T18:25:07Z)
- Annotated Dataset Creation through General Purpose Language Models for non-English Medical NLP [0.5482532589225552]
In our work, we propose leveraging pretrained language models for training data acquisition.
We create a custom dataset and use it to train GPTNERMED, a medical NER model for German texts.
arXiv Detail & Related papers (2022-08-30T18:42:55Z)
- A Medical Information Extraction Workbench to Process German Clinical Text [5.519657218427976]
We introduce a workbench: a collection of German clinical text processing models.
The models are trained on a de-identified corpus of German nephrology reports.
Our workbench is made publicly available so it can be used out of the box, serve as a benchmark, or be transferred to related problems.
arXiv Detail & Related papers (2022-07-08T13:19:19Z)
- BigBIO: A Framework for Data-Centric Biomedical Natural Language Processing [13.30221348538759]
We introduce BigBIO, a community library of 126+ biomedical NLP datasets.
BigBIO facilitates reproducible meta-dataset curation via programmatic access to datasets and their metadata.
We discuss our process for task schemas, data auditing, and contribution guidelines, and outline two illustrative use cases.
arXiv Detail & Related papers (2022-06-30T07:15:45Z)
- EHRKit: A Python Natural Language Processing Toolkit for Electronic Health Record Texts [12.10507006658038]
We create a Python library for clinical texts, EHRKit.
This library contains two main parts: MIMIC-III-specific functions and task-specific functions.
The first part introduces a list of interfaces for accessing MIMIC-III NOTEEVENTS data, including basic search, information retrieval, and information extraction.
The second part integrates many third-party libraries for up to 12 off-the-shelf NLP tasks, such as named entity recognition, summarization, and machine translation.
arXiv Detail & Related papers (2022-04-13T18:51:01Z)
- HealthPrompt: A Zero-shot Learning Paradigm for Clinical Natural Language Processing [3.762895631262445]
We developed a novel prompt-based clinical NLP framework called HealthPrompt.
We performed an in-depth analysis of HealthPrompt on six different PLMs in a no-data setting.
Our experiments show that prompts effectively capture the context of clinical texts and perform remarkably well without any training data.
arXiv Detail & Related papers (2022-03-09T21:44:28Z)
- SCROLLS: Standardized CompaRison Over Long Language Sequences [62.574959194373264]
We introduce SCROLLS, a suite of tasks that require reasoning over long texts.
SCROLLS contains summarization, question answering, and natural language inference tasks.
We make all datasets available in a unified text-to-text format and host a live leaderboard to facilitate research on model architecture and pretraining methods.
arXiv Detail & Related papers (2022-01-10T18:47:15Z)
- Benchmarking Automated Clinical Language Simplification: Dataset, Algorithm, and Evaluation [48.87254340298189]
We construct a new dataset named MedLane to support the development and evaluation of automated clinical language simplification approaches.
We propose a new model called DECLARE that follows the human annotation procedure and achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-12-04T06:09:02Z)
- Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing [73.37262264915739]
We show that for domains with abundant unlabeled text, such as biomedicine, pretraining language models from scratch results in substantial gains.
Our experiments show that domain-specific pretraining serves as a solid foundation for a wide range of biomedical NLP tasks.
arXiv Detail & Related papers (2020-07-31T00:04:15Z)