TACL: Threshold-Adaptive Curriculum Learning Strategy for Enhancing Medical Text Understanding
- URL: http://arxiv.org/abs/2510.15269v1
- Date: Fri, 17 Oct 2025 03:16:51 GMT
- Title: TACL: Threshold-Adaptive Curriculum Learning Strategy for Enhancing Medical Text Understanding
- Authors: Mucheng Ren, Yucheng Yan, He Chen, Danqing Hu, Jun Xu, Xian Zeng
- Abstract summary: We present TACL (Threshold-Adaptive Curriculum Learning), a novel framework designed to rethink how models interact with medical texts during training. By categorizing data into difficulty levels and prioritizing simpler cases early in training, TACL builds a strong foundation before tackling more complex records. We observe significant improvements across diverse clinical tasks, including automatic ICD coding, readmission prediction, and TCM syndrome differentiation.
- Score: 8.188646882370792
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Medical texts, particularly electronic medical records (EMRs), are a cornerstone of modern healthcare, capturing critical information about patient care, diagnoses, and treatments. These texts hold immense potential for advancing clinical decision-making and healthcare analytics. However, their unstructured nature, domain-specific language, and variability across contexts make automated understanding an intricate challenge. Despite the advancements in natural language processing, existing methods often treat all data as equally challenging, ignoring the inherent differences in complexity across clinical records. This oversight limits the ability of models to effectively generalize and perform well on rare or complex cases. In this paper, we present TACL (Threshold-Adaptive Curriculum Learning), a novel framework designed to address these challenges by rethinking how models interact with medical texts during training. Inspired by the principle of progressive learning, TACL dynamically adjusts the training process based on the complexity of individual samples. By categorizing data into difficulty levels and prioritizing simpler cases early in training, the model builds a strong foundation before tackling more complex records. By applying TACL to multilingual medical data, including English and Chinese clinical records, we observe significant improvements across diverse clinical tasks, including automatic ICD coding, readmission prediction and TCM syndrome differentiation. TACL not only enhances the performance of automated systems but also demonstrates the potential to unify approaches across disparate medical domains, paving the way for more accurate, scalable, and globally applicable medical text understanding solutions.
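The abstract describes the core idea of threshold-adaptive curriculum learning: score each sample's difficulty, then admit progressively harder samples as training proceeds. The paper does not specify its scheduling details here, so the following is only a minimal illustrative sketch, assuming a per-sample difficulty score in [0, 1] and a linear threshold schedule (both assumptions, not the authors' algorithm):

```python
def curriculum_batches(samples, difficulties, epochs, start=0.3, end=1.0):
    """Yield, per epoch, the subset of samples admitted by the threshold.

    samples:      list of training examples
    difficulties: assumed-given difficulty score in [0, 1] per sample
    start, end:   threshold at the first and last epoch
    """
    for epoch in range(epochs):
        # Linear threshold schedule: easy records enter first,
        # harder records are admitted as the threshold rises.
        t = start + (end - start) * epoch / max(epochs - 1, 1)
        admitted = [s for s, d in zip(samples, difficulties) if d <= t]
        yield epoch, t, admitted
```

With four samples of difficulty 0.1, 0.4, 0.7, and 0.95 over three epochs, the first epoch trains only on the easiest record and the final epoch on all of them; an adaptive variant would adjust `t` from validation signals rather than a fixed schedule.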
Related papers
- MedForget: Hierarchy-Aware Multimodal Unlearning Testbed for Medical AI [66.0701326117134]
MedForget is a hierarchy-aware multimodal unlearning testbed for building compliant medical AI systems. We show that existing methods struggle to achieve complete, hierarchy-aware forgetting without reducing diagnostic performance. We introduce a reconstruction attack that progressively adds hierarchical level context to prompts.
arXiv Detail & Related papers (2025-12-10T17:55:06Z) - ClinStructor: AI-Powered Structuring of Unstructured Clinical Texts [3.073796943975155]
We present ClinStructor, a pipeline that leverages large language models (LLMs) to convert clinical free-text into structured, task-specific question-answer pairs prior to predictive modeling. Our method substantially enhances transparency and controllability and only leads to a modest reduction in predictive performance.
arXiv Detail & Related papers (2025-11-14T21:21:16Z) - A Comprehensive Review of Datasets for Clinical Mental Health AI Systems [55.67299586253951]
We present the first comprehensive survey of clinical mental health datasets relevant to the training and development of AI-powered clinical assistants. Our survey identifies critical gaps such as a lack of longitudinal data, limited cultural and linguistic representation, inconsistent collection and annotation standards, and a lack of modalities in synthetic data.
arXiv Detail & Related papers (2025-08-13T13:42:35Z) - Medical Reasoning in the Era of LLMs: A Systematic Review of Enhancement Techniques and Applications [59.721265428780946]
Large Language Models (LLMs) in medicine have enabled impressive capabilities, yet a critical gap remains in their ability to perform systematic, transparent, and verifiable reasoning. This paper provides the first systematic review of this emerging field. We propose a taxonomy of reasoning enhancement techniques, categorized into training-time strategies and test-time mechanisms.
arXiv Detail & Related papers (2025-08-01T14:41:31Z) - One Patient, Many Contexts: Scaling Medical AI with Contextual Intelligence [16.450764388516244]
Context switching enables medical AI to adapt across specialties, populations, and geographies. It requires advances in data design, model architectures, and evaluation frameworks.
arXiv Detail & Related papers (2025-06-11T20:23:57Z) - Bringing CLIP to the Clinic: Dynamic Soft Labels and Negation-Aware Learning for Medical Analysis [0.9944647907864256]
We propose a novel approach that integrates clinically-enhanced dynamic soft labels and medical graphical alignment. Our approach is easily integrated into the medical CLIP training pipeline and achieves state-of-the-art performance.
arXiv Detail & Related papers (2025-05-28T08:00:18Z) - Systematic Literature Review on Clinical Trial Eligibility Matching [0.24554686192257422]
This review highlights how explainable AI and standardized ontologies can bolster clinician trust and broaden adoption. Further research into advanced semantic and temporal representations, expanded data integration, and rigorous prospective evaluations is necessary to fully realize the transformative potential of NLP in clinical trial recruitment.
arXiv Detail & Related papers (2025-03-02T11:45:50Z) - A Multi-View Joint Learning Framework for Embedding Clinical Codes and Text Using Graph Neural Networks [23.06795121693656]
We propose a framework that learns from codes and text to combine the availability and forward-looking nature of text and better performance of ICD codes.
Our approach uses a Graph Neural Network (GNN) to process ICD codes, and Bi-LSTM to process text.
In experiments using planned surgical procedure text, our model outperforms BERT models fine-tuned to clinical data.
arXiv Detail & Related papers (2023-01-27T09:19:03Z) - Towards more patient friendly clinical notes through language models and ontologies [57.51898902864543]
We present a novel approach to automated medical text based on word simplification and language modelling.
We use a new dataset of pairs of publicly available medical sentences and versions of them simplified by clinicians.
Our method based on a language model trained on medical forum data generates simpler sentences while preserving both grammar and the original meaning.
arXiv Detail & Related papers (2021-12-23T16:11:19Z) - Self-supervised Answer Retrieval on Clinical Notes [68.87777592015402]
We introduce CAPR, a rule-based self-supervision objective for training Transformer language models for domain-specific passage matching.
We apply our objective in four Transformer-based architectures: Contextual Document Vectors, Bi-, Poly- and Cross-encoders.
We report that CAPR outperforms strong baselines in the retrieval of domain-specific passages and effectively generalizes across rule-based and human-labeled passages.
arXiv Detail & Related papers (2021-08-02T10:42:52Z) - Benchmarking Automated Clinical Language Simplification: Dataset, Algorithm, and Evaluation [48.87254340298189]
We construct a new dataset named MedLane to support the development and evaluation of automated clinical language simplification approaches.
We propose a new model called DECLARE that follows the human annotation procedure and achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-12-04T06:09:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.