Small Language Models for Privacy-Preserving Clinical Information Extraction in Low-Resource Languages
- URL: http://arxiv.org/abs/2602.21374v1
- Date: Tue, 24 Feb 2026 21:10:29 GMT
- Title: Small Language Models for Privacy-Preserving Clinical Information Extraction in Low-Resource Languages
- Authors: Mohammadreza Ghaffarzadeh-Esfahani, Nahid Yousefian, Ebrahim Heidari-Farsani, Ali Akbar Omidvarian, Sepehr Ghahraei, Atena Farangi, AmirBahador Boroumand,
- Abstract summary: This study evaluates a two-step pipeline combining Aya-expanse-8B as a Persian-to-English translation model with five open-source small language models (SLMs). Models were assessed on macro-averaged F1-score, Matthews Correlation Coefficient (MCC), sensitivity, and specificity to account for class imbalance. A bilingual analysis of Aya-expanse-8B revealed that translating Persian transcripts to English improved sensitivity, reduced missing outputs, and boosted metrics robust to class imbalance.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Extracting clinical information from medical transcripts in low-resource languages remains a significant challenge in healthcare natural language processing (NLP). This study evaluates a two-step pipeline combining Aya-expanse-8B as a Persian-to-English translation model with five open-source small language models (SLMs) -- Qwen2.5-7B-Instruct, Llama-3.1-8B-Instruct, Llama-3.2-3B-Instruct, Qwen2.5-1.5B-Instruct, and Gemma-3-1B-it -- for binary extraction of 13 clinical features from 1,221 anonymized Persian transcripts collected at a cancer palliative care call center. Using a few-shot prompting strategy without fine-tuning, models were assessed on macro-averaged F1-score, Matthews Correlation Coefficient (MCC), sensitivity, and specificity to account for class imbalance. Qwen2.5-7B-Instruct achieved the highest overall performance (median macro-F1: 0.899; MCC: 0.797), while Gemma-3-1B-it showed the weakest results. Larger models (7B--8B parameters) consistently outperformed smaller counterparts in sensitivity and MCC. A bilingual analysis of Aya-expanse-8B revealed that translating Persian transcripts to English improved sensitivity, reduced missing outputs, and boosted metrics robust to class imbalance, though at the cost of slightly lower specificity and precision. Feature-level results showed reliable extraction of physiological symptoms across most models, whereas psychological complaints, administrative requests, and complex somatic features remained challenging. These findings establish a practical, privacy-preserving blueprint for deploying open-source SLMs in multilingual clinical NLP settings with limited infrastructure and annotation resources, and highlight the importance of jointly optimizing model scale and input language strategy for sensitive healthcare applications.
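The four evaluation metrics named in the abstract can all be derived from a binary confusion matrix. As a minimal, hypothetical sketch (not the authors' code), the following computes macro-averaged F1, MCC, sensitivity, and specificity for one clinical feature using only the standard library:

```python
# Hypothetical sketch: compute macro-F1, MCC, sensitivity, and specificity
# for a single binary clinical feature from gold and predicted labels.
import math

def binary_metrics(y_true, y_pred):
    # Confusion-matrix counts for the positive (1) and negative (0) classes.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    sens = tp / (tp + fn) if tp + fn else 0.0  # recall on the positive class
    spec = tn / (tn + fp) if tn + fp else 0.0  # recall on the negative class
    prec = tp / (tp + fp) if tp + fp else 0.0
    npv = tn / (tn + fn) if tn + fn else 0.0   # precision on the negative class

    # Macro-F1 averages the per-class F1 scores, which is what makes it
    # robust to class imbalance.
    f1_pos = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
    f1_neg = 2 * npv * spec / (npv + spec) if npv + spec else 0.0
    macro_f1 = (f1_pos + f1_neg) / 2

    # MCC ranges from -1 to 1 and is 0 for chance-level prediction.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"macro_f1": macro_f1, "mcc": mcc,
            "sensitivity": sens, "specificity": spec}
```

In the paper's setting this would be run once per clinical feature (13 times per model) and the per-feature results aggregated, e.g. as the median macro-F1 reported for Qwen2.5-7B-Instruct.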
Related papers
- Prompt Sensitivity and Answer Consistency of Small Open-Source Large Language Models on Clinical Question Answering: Implications for Low-Resource Healthcare Deployment
Small open-source language models are gaining attention for healthcare applications in low-resource settings. We evaluate five open-source models (Gemma 2 2B, Phi-3 Mini 3.8B, Llama 3.2 3B, Mistral 7B, and Meditron-7B) across three clinical question answering datasets.
arXiv Detail & Related papers (2026-03-01T04:37:48Z) - MedPT: A Massive Medical Question Answering Dataset for Brazilian-Portuguese Speakers
We introduce MedPT, the first large-scale, real-world corpus for Brazilian Portuguese. It comprises 384,095 authentic question-answer pairs from patient-doctor interactions. Our analysis reveals its thematic breadth (3,200 topics) and unique linguistic properties, like the natural asymmetry in patient-doctor communication.
arXiv Detail & Related papers (2025-11-14T21:13:28Z) - Multilingual Lexical Feature Analysis of Spoken Language for Predicting Major Depression Symptom Severity
We describe an initial exploratory analysis of longitudinal speech data and PHQ-8 assessments from 5,836 recordings of 586 participants in the UK, Netherlands, and Spain. We sought to identify interpretable lexical features associated with MDD symptom severity with linear mixed-effects modelling. In English, MDD symptom severity was associated with 7 features, including lexical diversity measures and absolutist language. In Dutch, associations were observed with words per sentence and positive word frequency; no associations were observed in recordings collected in Spain.
arXiv Detail & Related papers (2025-11-10T12:03:16Z) - Arabic Large Language Models for Medical Text Generation
This study proposes an approach that fine-tunes large language models (LLMs) for Arabic medical text generation. The system is designed to assist patients by providing accurate medical advice, diagnoses, drug recommendations, and treatment plans based on user input.
arXiv Detail & Related papers (2025-09-12T09:37:26Z) - Relic: Enhancing Reward Model Generalization for Low-Resource Indic Languages with Few-Shot Examples
Most open-source multilingual reward models are primarily trained on preference datasets in high-resource languages. We propose RELIC, a novel in-context learning framework for reward modeling in low-resource Indic languages.
arXiv Detail & Related papers (2025-06-19T17:56:16Z) - A Comprehensive Evaluation of Large Language Models on Mental Illnesses in Arabic Context
Mental health disorders pose a growing public health concern in the Arab world. This study comprehensively evaluates 8 large language models (LLMs) on diverse mental health datasets.
arXiv Detail & Related papers (2025-01-12T16:17:25Z) - SLAM: Towards Efficient Multilingual Reasoning via Selective Language Alignment
We propose an efficient multilingual reasoning alignment approach that precisely identifies and fine-tunes the layers responsible for handling multilingualism. Experimental results show that our method, SLAM, tunes only the feed-forward sub-layers of 6 layers, comprising 6.5-8% of all parameters within 7B and 13B LLMs.
arXiv Detail & Related papers (2025-01-07T10:29:43Z) - LlaMADRS: Prompting Large Language Models for Interview-Based Depression Assessment
This study introduces LlaMADRS, a novel framework leveraging open-source Large Language Models (LLMs) to automate depression severity assessment. We employ a zero-shot prompting strategy with carefully designed cues to guide the model in interpreting and scoring transcribed clinical interviews. Our approach, tested on 236 real-world interviews, demonstrates strong correlations with clinician assessments.
arXiv Detail & Related papers (2025-01-07T08:49:04Z) - Improving Multilingual ASR in the Wild Using Simple N-best Re-ranking
We present a simple and effective N-best re-ranking approach to improve multilingual ASR accuracy.
Our results show spoken language identification accuracy improvements of 8.7% and 6.1% on two benchmarks, respectively, and word error rates that are 3.3% and 2.0% lower on the same benchmarks.
arXiv Detail & Related papers (2024-09-27T03:31:32Z) - Evaluating Large Language Models for Radiology Natural Language
Processing [68.98847776913381]
The rise of large language models (LLMs) has marked a pivotal shift in the field of natural language processing (NLP). This study seeks to bridge this gap by critically evaluating thirty-two LLMs in interpreting radiology reports.
arXiv Detail & Related papers (2023-07-25T17:57:18Z) - Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of
Code-Mixed Clinical Texts [56.72488923420374]
Pre-trained language models (LMs) have shown great potential for cross-lingual transfer in low-resource settings.
We show the few-shot cross-lingual transfer property of LMs for named entity recognition (NER) and apply it to a low-resource, real-world challenge: de-identification of code-mixed (Spanish-Catalan) clinical notes in the stroke domain.
arXiv Detail & Related papers (2022-04-10T21:46:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences of their use.