Local Obfuscation by GLINER for Impartial Context-Aware Lineage: Development and Evaluation of a PII Removal System
- URL: http://arxiv.org/abs/2510.19346v1
- Date: Wed, 22 Oct 2025 08:12:07 GMT
- Title: Local Obfuscation by GLINER for Impartial Context-Aware Lineage: Development and Evaluation of a PII Removal System
- Authors: Prakrithi Shivaprakash, Lekhansh Shukla, Animesh Mukherjee, Prabhat Chand, Pratima Murthy
- Abstract summary: LOGICAL is an efficient, locally deployable PII removal system built on a fine-tuned GLiNER model. The fine-tuned GLiNER model achieved superior performance, with an overall micro-average F1-score of 0.980. LOGICAL correctly sanitised 95% of documents completely, compared to 64% for the next-best solution.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Removing Personally Identifiable Information (PII) from clinical notes in Electronic Health Records (EHRs) is essential for research and AI development. While Large Language Models (LLMs) are powerful, their high computational costs and the data privacy risks of API-based services limit their use, especially in low-resource settings. To address this, we developed LOGICAL (Local Obfuscation by GLINER for Impartial Context-Aware Lineage), an efficient, locally deployable PII removal system built on a fine-tuned Generalist and Lightweight Named Entity Recognition (GLiNER) model. We used 1515 clinical documents from a psychiatric hospital's EHR system. We defined nine PII categories for removal. A modern-gliner-bi-large-v1.0 model was fine-tuned on 2849 text instances and evaluated on a test set of 376 instances using character-level precision, recall, and F1-score. We compared its performance against Microsoft Azure NER, Microsoft Presidio, and zero-shot prompting with Gemini-Pro-2.5 and Llama-3.3-70B-Instruct. The fine-tuned GLiNER model achieved superior performance, with an overall micro-average F1-score of 0.980, significantly outperforming Gemini-Pro-2.5 (F1-score: 0.845). LOGICAL correctly sanitised 95% of documents completely, compared to 64% for the next-best solution. The model operated efficiently on a standard laptop without a dedicated GPU. However, a 2% entity-level false negative rate underscores the need for human-in-the-loop validation across all tested systems. Fine-tuned, specialised transformer models like GLiNER offer an accurate, computationally efficient, and secure solution for PII removal from clinical notes. This "sanitisation at the source" approach is a practical alternative to resource-intensive LLMs, enabling the creation of de-identified datasets for research and AI development while preserving data privacy, particularly in resource-constrained environments.
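The abstract describes two concrete mechanics: replacing detected PII spans in a note, and scoring systems with character-level precision, recall, and F1 so that partial overlaps earn partial credit. The sketch below is illustrative only, not the authors' code; entity spans would in practice come from a NER model such as GLiNER, and the function and label names are hypothetical.

```python
def redact(text, entities):
    """Replace each detected PII span with a [CATEGORY] placeholder.

    `entities` is a list of (start, end, label) character spans
    (end exclusive), e.g. as produced by a NER model. Spans are
    applied right-to-left so earlier offsets stay valid.
    """
    for start, end, label in sorted(entities, reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text


def char_level_scores(gold_spans, pred_spans):
    """Character-level precision, recall, and F1 for one document.

    Each span is a (start, end) pair of character offsets (end
    exclusive); every character inside a span counts as one
    positive instance, so a prediction that covers part of a
    gold entity is partially rewarded rather than fully wrong.
    """
    gold = {i for s, e in gold_spans for i in range(s, e)}
    pred = {i for s, e in pred_spans for i in range(s, e)}
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 1.0
    recall = tp / len(gold) if gold else 1.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, `redact("Seen by Dr. Smith on 01/02/2020", [(12, 17, "NAME"), (21, 31, "DATE")])` yields `"Seen by Dr. [NAME] on [DATE]"`; micro-averaging across a corpus would pool the character counts over all documents before computing the scores.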
Related papers
- Identifying Imaging Follow-Up in Radiology Reports: A Comparative Analysis of Traditional ML and LLM Approaches [8.864020712680976]
We introduce an annotated corpus of 6,393 radiology reports from 586 patients, each labeled for follow-up imaging status. We compare traditional machine-learning classifiers, including logistic regression (LR), support vector machines (SVM), Longformer, and a fully fine-tuned Llama3-8B-Instruct. To evaluate generative LLMs, we tested GPT-4o and the open-source GPT-OSS-20B under two configurations.
arXiv Detail & Related papers (2025-11-14T20:55:44Z)
- U-Mamba2-SSL for Semi-Supervised Tooth and Pulp Segmentation in CBCT [44.3806898357896]
We propose U-Mamba2-SSL, a novel semi-supervised learning framework that builds on the U-Mamba2 model and employs a multi-stage training strategy. U-Mamba2-SSL achieved an average score of 0.789 and a DSC of 0.917 on the hidden test set, taking first place in Task 1 of the STSR 2025 challenge.
arXiv Detail & Related papers (2025-09-24T14:19:33Z)
- Enhanced Predictive Modeling for Hazardous Near-Earth Object Detection: A Comparative Analysis of Advanced Resampling Strategies and Machine Learning Algorithms in Planetary Risk Assessment [0.0]
This study evaluates the performance of several machine learning models for predicting hazardous near-Earth objects (NEOs) through a binary classification framework. RFC and GBC performed best, with F2-scores of 0.987 and 0.896, respectively.
arXiv Detail & Related papers (2025-08-20T22:50:00Z)
- QiMeng-CodeV-R1: Reasoning-Enhanced Verilog Generation [51.393569044134445]
Large language models (LLMs) trained via reinforcement learning with verifiable reward (RLVR) have achieved breakthroughs on tasks with explicit, automatable verification. Extending RLVR to the automatic generation of hardware description languages (HDLs) like Verilog from natural-language (NL) specifications, however, poses three key challenges. We introduce CodeV-R1, an RLVR framework for training Verilog generation LLMs.
arXiv Detail & Related papers (2025-05-30T03:51:06Z)
- Accelerating Clinical NLP at Scale with a Hybrid Framework with Reduced GPU Demands: A Case Study in Dementia Identification [0.12369842801624054]
We propose a hybrid NLP framework that integrates rule-based filtering, a Support Vector Machine (SVM) classifier, and a BERT-based model. We applied this framework in a dementia identification case study involving 4.9 million veterans with incident hypertension, analyzing 2.1 billion clinical notes.
arXiv Detail & Related papers (2025-04-16T21:24:38Z)
- VAE-based Feature Disentanglement for Data Augmentation and Compression in Generalized GNSS Interference Classification [42.14439854721613]
We propose variational autoencoders (VAEs) for disentanglement to extract essential latent features that enable accurate classification of interferences. Our proposed VAE achieves a data compression rate ranging from 512 to 8,192 and an accuracy of up to 99.92%.
arXiv Detail & Related papers (2025-04-14T13:38:00Z)
- Beyond Negation Detection: Comprehensive Assertion Detection Models for Clinical NLP [5.297964922424743]
We develop state-of-the-art assertion detection models. We evaluate these models against cloud-based commercial API solutions, the legacy rule-based NegEx approach, and GPT-4o.
arXiv Detail & Related papers (2025-03-21T10:18:47Z)
- Process-Supervised Reward Models for Verifying Clinical Note Generation: A Scalable Approach Guided by Domain Expertise [14.052630186550628]
Process-supervised reward models (PRMs) excel at providing step-by-step verification for large language model (LLM) outputs in domains like mathematics and coding. We introduce a novel framework for training PRMs to deliver step-level reward signals for LLM-generated clinical notes.
arXiv Detail & Related papers (2024-12-17T06:24:34Z)
- Zero-Shot ATC Coding with Large Language Models for Clinical Assessments [40.72273945475456]
Manual assignment of Anatomical Therapeutic Chemical codes to prescription records is a significant bottleneck. We develop a practical approach using locally deployable large language models (LLMs). We evaluate our approach using GPT-4o as an accuracy ceiling and focus development on open-source Llama models suitable for privacy-sensitive deployment.
arXiv Detail & Related papers (2024-12-10T18:43:02Z)
- Phikon-v2, A large and public feature extractor for biomarker prediction [42.52549987351643]
We train a vision transformer using DINOv2 and publicly release one iteration of this model for further experimentation, coined Phikon-v2.
While trained on publicly available histology slides, Phikon-v2 surpasses our previously released model (Phikon) and performs on par with other histopathology foundation models (FM) trained on proprietary data.
arXiv Detail & Related papers (2024-09-13T20:12:29Z)
- Machine Learning for ALSFRS-R Score Prediction: Making Sense of the Sensor Data [44.99833362998488]
Amyotrophic Lateral Sclerosis (ALS) is a rapidly progressive neurodegenerative disease that presents individuals with limited treatment options.
The present investigation, spearheaded by the iDPP@CLEF 2024 challenge, focuses on utilizing sensor-derived data obtained through an app.
arXiv Detail & Related papers (2024-07-10T19:17:23Z)
- Exploring the Value of Pre-trained Language Models for Clinical Named Entity Recognition [6.917786124918387]
We compare Transformer models that are trained from scratch to fine-tuned BERT-based LLMs.
We examine the impact of an additional CRF layer on such models to encourage contextual learning.
arXiv Detail & Related papers (2022-10-23T16:27:31Z)