Multi-Label Clinical Text Eligibility Classification and Summarization System
- URL: http://arxiv.org/abs/2510.13115v1
- Date: Wed, 15 Oct 2025 03:21:43 GMT
- Title: Multi-Label Clinical Text Eligibility Classification and Summarization System
- Authors: Surya Tejaswi Yerramsetty, Almas Fathimah,
- Abstract summary: We propose a system that leverages Natural Language Processing (NLP) and Large Language Models (LLMs) to automate clinical text eligibility classification and summarization. The system combines feature extraction methods such as word embeddings (Word2Vec) and named entity recognition to identify relevant medical concepts.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Clinical trials are central to medical progress because they help improve understanding of human health and the healthcare system. They play a key role in discovering new ways to detect, prevent, or treat diseases, and it is essential that clinical trials include participants with appropriate and diverse medical backgrounds. In this paper, we propose a system that leverages Natural Language Processing (NLP) and Large Language Models (LLMs) to automate multi-label clinical text eligibility classification and summarization. The system combines feature extraction methods such as word embeddings (Word2Vec) and named entity recognition to identify relevant medical concepts, along with traditional vectorization techniques such as count vectorization and TF-IDF (Term Frequency-Inverse Document Frequency). We further explore weighted TF-IDF word embeddings that integrate both count-based and embedding-based strengths to capture term importance effectively. Multi-label classification using Random Forest and SVM models is applied to categorize documents based on eligibility criteria. Summarization techniques including TextRank, Luhn, and GPT-3 are evaluated to concisely summarize eligibility requirements. Evaluation with ROUGE scores demonstrates the effectiveness of the proposed methods. This system shows potential for automating clinical trial eligibility assessment using data-driven approaches, thereby improving research efficiency.
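The abstract mentions weighted TF-IDF word embeddings that combine count-based term importance with embedding-based semantics. A minimal sketch of one common way to do this follows, assuming each document vector is the TF-IDF-weighted average of its word vectors; the exact weighting used in the paper is not stated here. The names `TOY_VECTORS`, `idf_table`, and `weighted_embedding` are hypothetical, and the 3-dimensional vectors are toy values standing in for a trained Word2Vec model. The IDF smoothing mirrors the formula scikit-learn uses by default.

```python
import math
from collections import Counter

# Hypothetical toy word vectors standing in for a trained Word2Vec model.
TOY_VECTORS = {
    "patients": [0.1, 0.3, 0.5],
    "with": [0.0, 0.1, 0.0],
    "diabetes": [0.9, 0.2, 0.1],
    "hypertension": [0.7, 0.4, 0.2],
    "excluded": [0.2, 0.8, 0.3],
}


def idf_table(docs):
    """Smoothed inverse document frequency: ln((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def weighted_embedding(doc, idf, vectors):
    """Average a tokenized document's word vectors, weighting each term by TF * IDF."""
    tf = Counter(doc)
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for term, count in tf.items():
        if term not in vectors or term not in idf:
            continue  # skip terms with no embedding or no IDF entry
        weight = (count / len(doc)) * idf[term]
        for i, value in enumerate(vectors[term]):
            total[i] += weight * value
        weight_sum += weight
    if weight_sum == 0.0:
        return total  # zero vector when no term could be embedded
    return [value / weight_sum for value in total]


# Usage: tokenized eligibility snippets (illustrative only).
docs = [
    ["patients", "with", "diabetes"],
    ["patients", "with", "hypertension", "excluded"],
]
idf = idf_table(docs)
features = [weighted_embedding(d, idf, TOY_VECTORS) for d in docs]
```

Rare terms such as "diabetes" receive a higher IDF, so they pull the document vector toward their own embedding more than common words like "with" do. Dense vectors of this kind could then be fed to a multi-label classifier, for example one Random Forest or SVM per eligibility label.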
Related papers
- Medical Reasoning in the Era of LLMs: A Systematic Review of Enhancement Techniques and Applications [59.721265428780946]
Large Language Models (LLMs) in medicine have enabled impressive capabilities, yet a critical gap remains in their ability to perform systematic, transparent, and verifiable reasoning. This paper provides the first systematic review of this emerging field. We propose a taxonomy of reasoning enhancement techniques, categorized into training-time strategies and test-time mechanisms.
(2025-08-01T14:41:31Z)
- Large Language Models for Healthcare Text Classification: A Systematic Review [4.8342038441006805]
Large Language Models (LLMs) have fundamentally transformed approaches to Natural Language Processing (NLP). In healthcare, accurate and cost-efficient text classification is crucial, whether for clinical note analysis, diagnosis coding, or any other task. Numerous studies have been conducted to leverage LLMs for automated healthcare text classification.
(2025-03-03T04:16:13Z)
- Systematic Literature Review on Clinical Trial Eligibility Matching [0.24554686192257422]
The review highlights how explainable AI and standardized ontologies can bolster clinician trust and broaden adoption. Further research into advanced semantic and temporal representations, expanded data integration, and rigorous prospective evaluations is necessary to fully realize the transformative potential of NLP in clinical trial recruitment.
(2025-03-02T11:45:50Z)
- SNOBERT: A Benchmark for clinical notes entity linking in the SNOMED CT clinical terminology [43.89160296332471]
We propose a method for linking text spans in clinical notes to specific concepts in the SNOMED CT using BERT-based models.
The method consists of two stages: candidate selection and candidate matching. The models were trained on one of the largest publicly available datasets of labeled clinical notes.
(2024-05-25T08:00:44Z)
- Multi-task Explainable Skin Lesion Classification [54.76511683427566]
We propose a few-shot-based approach for skin lesions that generalizes well with few labelled data.
The proposed approach fuses a segmentation network, which acts as an attention module, with a classification network.
(2023-10-11T05:49:47Z)
- Development and validation of a natural language processing algorithm to pseudonymize documents in the context of a clinical data warehouse [53.797797404164946]
The study highlights the difficulties faced in sharing tools and resources in this domain.
We annotated a corpus of clinical documents according to 12 types of identifying entities.
We built a hybrid system that merges the results of a deep learning model with manual rules.
(2023-03-23T17:17:46Z)
- Applying unsupervised keyphrase methods on concepts extracted from discharge sheets [7.102620843620572]
Extracting meaning from clinical texts requires identifying the section in which each piece of content is recorded, as well as the key concepts it contains.
In this study, these challenges have been addressed by using clinical natural language processing techniques.
A set of popular unsupervised keyphrase extraction methods was verified and evaluated.
(2023-03-15T20:55:25Z)
- User-Driven Research of Medical Note Generation Software [49.85146209418244]
We present three rounds of user studies carried out in the context of developing a medical note generation system.
We discuss the participating clinicians' impressions and views of how the system ought to be adapted to be of value to them.
We describe a three-week test run of the system in a live telehealth clinical practice.
(2022-05-05T10:18:06Z)
- Semantic Search for Large Scale Clinical Ontologies [63.71950996116403]
We present a deep learning approach to build a search system for large clinical vocabularies.
We propose a Triplet-BERT model and a method that generates training data based on semantic training data.
The model is evaluated on five real benchmark datasets, and the results show that the approach performs well on both free-text-to-concept and concept-to-concept search over concept vocabularies.
(2022-01-01T05:15:42Z)
- Automated Coding of Under-Studied Medical Concept Domains: Linking Physical Activity Reports to the International Classification of Functioning, Disability, and Health [22.196642357767338]
Many domains of medical concepts lack well-developed terminologies that can support effective coding of medical text.
We present a framework for developing natural language processing (NLP) technologies for automated coding of under-studied types of medical information.
(2020-11-27T20:02:59Z)
- Semi-supervised Medical Image Classification with Relation-driven Self-ensembling Model [71.80319052891817]
We present a relation-driven semi-supervised framework for medical image classification.
It exploits unlabeled data by encouraging consistent predictions for a given input under perturbations.
Our method outperforms many state-of-the-art semi-supervised learning methods on both single-label and multi-label image classification scenarios.
(2020-05-15T06:57:54Z)