Anonpsy: A Graph-Based Framework for Structure-Preserving De-identification of Psychiatric Narratives
- URL: http://arxiv.org/abs/2601.13503v1
- Date: Tue, 20 Jan 2026 01:37:44 GMT
- Title: Anonpsy: A Graph-Based Framework for Structure-Preserving De-identification of Psychiatric Narratives
- Authors: Kyung Ho Lim, Byung-Hoon Kim
- Abstract summary: We introduce Anonpsy, a de-identification framework that reformulates the task as graph-guided semantic rewriting. Anonpsy converts each narrative into a semantic graph encoding clinical entities, temporal anchors, and typed relations. It preserves diagnostic fidelity while achieving consistently low re-identification risk under expert, semantic, and GPT-5-based evaluations.
- Score: 1.4652274443334974
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Psychiatric narratives encode patient identity not only through explicit identifiers but also through idiosyncratic life events embedded in their clinical structure. Existing de-identification approaches, including PHI masking and LLM-based synthetic rewriting, operate at the text level and offer limited control over which semantic elements are preserved or altered. We introduce Anonpsy, a de-identification framework that reformulates the task as graph-guided semantic rewriting. Anonpsy (1) converts each narrative into a semantic graph encoding clinical entities, temporal anchors, and typed relations; (2) applies graph-constrained perturbations that modify identifying context while preserving clinically essential structure; and (3) regenerates text via graph-conditioned LLM generation. Evaluated on 90 clinician-authored psychiatric case narratives, Anonpsy preserves diagnostic fidelity while achieving consistently low re-identification risk under expert, semantic, and GPT-5-based evaluations. Compared with a strong LLM-only rewriting baseline, Anonpsy yields substantially lower semantic similarity and identifiability. These results demonstrate that explicit structural representations combined with constrained generation provide an effective approach to de-identification for psychiatric narratives.
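The three stages described in the abstract can be illustrated with a minimal sketch. Everything below is a hypothetical toy, not the authors' implementation: the graph schema, the entity types, the surrogate pools, and the prompt-style linearization standing in for graph-conditioned LLM generation are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass, field

@dataclass
class SemanticGraph:
    # nodes: entity id -> {"type": ..., "text": ...}
    nodes: dict = field(default_factory=dict)
    # edges: (source id, typed relation, target id) triples
    edges: list = field(default_factory=list)

def extract_graph(facts):
    """Stage 1: encode clinical entities and typed relations as a graph."""
    g = SemanticGraph()
    for ent_id, ent_type, text in facts["entities"]:
        g.nodes[ent_id] = {"type": ent_type, "text": text}
    g.edges = list(facts["relations"])
    return g

def perturb(graph, rng, clinical_types=frozenset({"diagnosis", "symptom", "treatment"})):
    """Stage 2: graph-constrained perturbation - replace identifying
    context nodes with surrogates, leave clinical nodes untouched."""
    surrogates = {
        "occupation": ["a teacher", "an office clerk", "an engineer"],
        "location": ["a mid-sized city", "a rural town"],
    }
    for node in graph.nodes.values():
        if node["type"] in clinical_types:
            continue  # preserve clinically essential structure
        pool = surrogates.get(node["type"])
        if pool:
            node["text"] = rng.choice(pool)
    return graph

def regenerate(graph):
    """Stage 3: stand-in for graph-conditioned LLM generation - here we
    just linearize the perturbed triples into a rewriting prompt."""
    lines = [
        f"{graph.nodes[s]['text']} --{rel}--> {graph.nodes[t]['text']}"
        for s, rel, t in graph.edges
    ]
    return "Rewrite as a clinical narrative:\n" + "\n".join(lines)

facts = {
    "entities": [
        ("e1", "occupation", "a violinist in the city orchestra"),
        ("e2", "diagnosis", "major depressive disorder"),
        ("e3", "location", "Vienna"),
    ],
    "relations": [("e1", "has_diagnosis", "e2"), ("e1", "lives_in", "e3")],
}
prompt = regenerate(perturb(extract_graph(facts), random.Random(0)))
print(prompt)
```

Note how the idiosyncratic, re-identifying details (occupation, location) are rewritten while the diagnostic node survives verbatim: that separation is the point of operating on an explicit graph rather than on raw text.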
Related papers
- Structure Observation Driven Image-Text Contrastive Learning for Computed Tomography Report Generation [51.509572354327986]
This work introduces a novel two-stage (structure- and report-learning) framework tailored for Computed Tomography Report Generation (CTRG). In the first stage, a set of learnable structure-specific visual queries observe corresponding structures in a CT image. The resulting observation tokens are contrasted with structure-specific textual features extracted from the accompanying radiology report with a structure-wise image-text contrastive loss. In the second stage, the visual structure queries are frozen and used to select the critical image patch embeddings depicting each anatomical structure, minimizing distractions from irrelevant areas while reducing memory consumption.
arXiv Detail & Related papers (2026-03-05T07:07:07Z) - AgenticSum: An Agentic Inference-Time Framework for Faithful Clinical Text Summarization [6.99563009617414]
We present AgenticSum, an inference-time framework that separates context selection, generation, verification, and targeted correction to reduce hallucinated content. We evaluate AgenticSum on two public datasets, using reference-based metrics, LLM-as-a-judge assessment, and human evaluation. Our results indicate that structured, agentic design with targeted correction offers an effective inference-time solution to improve clinical note summarization.
arXiv Detail & Related papers (2026-02-23T16:49:37Z) - Beyond surface form: A pipeline for semantic analysis in Alzheimer's Disease detection from spontaneous speech [4.447462467582385]
Alzheimer's Disease (AD) is a progressive neurodegenerative condition that adversely affects cognitive abilities. Language models show promise as a basis for screening tools for AD, but their limited interpretability poses a challenge. We introduce a novel approach in which texts' surface forms are transformed by altering syntax and vocabulary while preserving semantic content.
arXiv Detail & Related papers (2025-12-15T18:59:49Z) - Self-Supervised Anatomical Consistency Learning for Vision-Grounded Medical Report Generation [61.350584471060756]
Vision-grounded medical report generation aims to produce clinically accurate descriptions of medical images. We propose Self-Supervised Anatomical Consistency Learning (SS-ACL) to align generated reports with corresponding anatomical regions. SS-ACL constructs a hierarchical anatomical graph inspired by the invariant top-down inclusion structure of human anatomy.
arXiv Detail & Related papers (2025-09-30T08:59:06Z) - CLI-RAG: A Retrieval-Augmented Framework for Clinically Structured and Context Aware Text Generation with LLMs [0.1578515540930834]
We introduce CLI-RAG (Clinically Informed Retrieval-Augmented Generation), a domain-specific framework for structured and clinically grounded text generation. It incorporates a novel hierarchical chunking strategy that respects clinical document structure and introduces a task-specific dual-stage retrieval mechanism. We apply the system to generate structured progress notes for individual hospital visits using 15 clinical note types from the MIMIC-III dataset.
arXiv Detail & Related papers (2025-07-09T10:13:38Z) - DeIDClinic: A Multi-Layered Framework for De-identification of Clinical Free-text Data [6.473402241020136]
This work enhances the MASK framework by integrating ClinicalBERT, a deep learning model specifically fine-tuned on clinical texts.
The system effectively identifies and either redacts or replaces sensitive identifiable entities within clinical documents.
A risk assessment feature has also been developed, which analyses the uniqueness of context within documents to classify them into risk levels.
arXiv Detail & Related papers (2024-10-02T15:16:02Z) - Attribute Structuring Improves LLM-Based Evaluation of Clinical Text Summaries [56.31117605097345]
Large language models (LLMs) have shown the potential to generate accurate clinical text summaries, but still struggle with issues regarding grounding and evaluation. Here, we explore a general mitigation framework using Attribute Structuring (AS), which structures the summary evaluation process. AS consistently improves the correspondence between human annotations and automated metrics in clinical text summarization.
arXiv Detail & Related papers (2024-03-01T21:59:03Z) - Pyclipse, a library for deidentification of free-text clinical notes [0.40329768057075643]
We propose the pyclipse framework to streamline the comparison of deidentification algorithms.
Pyclipse serves as a single interface for running open-source deidentification algorithms on local clinical data.
We find that algorithm performance consistently falls short of the results reported in the original papers, even when evaluated on the same benchmark dataset.
arXiv Detail & Related papers (2023-11-05T19:56:58Z) - De-identification of Unstructured Clinical Texts from Sequence to Sequence Perspective [8.615499133294097]
We formulate the de-identification problem as a sequence-to-sequence learning problem instead of a token-classification problem.
Early experiments with our proposed approach achieved a 98.91% recall rate on the i2b2 dataset.
arXiv Detail & Related papers (2021-08-18T04:48:58Z) - Self-supervised Answer Retrieval on Clinical Notes [68.87777592015402]
We introduce CAPR, a rule-based self-supervision objective for training Transformer language models for domain-specific passage matching.
We apply our objective in four Transformer-based architectures: Contextual Document Vectors, Bi-, Poly- and Cross-encoders.
We report that CAPR outperforms strong baselines in the retrieval of domain-specific passages and effectively generalizes across rule-based and human-labeled passages.
arXiv Detail & Related papers (2021-08-02T10:42:52Z) - Clinical Named Entity Recognition using Contextualized Token Representations [49.036805795072645]
This paper introduces the technique of contextualized word embedding to better capture the semantic meaning of each word based on its context.
We pre-train two deep contextualized language models, Clinical Embeddings from Language Model (C-ELMo) and Clinical Contextual String Embeddings (C-Flair).
Experiments show that our models achieve dramatic improvements compared with both static word embeddings and domain-generic language models.
arXiv Detail & Related papers (2021-06-23T18:12:58Z) - An Interpretable End-to-end Fine-tuning Approach for Long Clinical Text [72.62848911347466]
Unstructured clinical text in EHRs contains crucial information for applications including decision support, trial matching, and retrospective research.
Recent work has applied BERT-based models to clinical information extraction and text classification, given these models' state-of-the-art performance in other NLP domains.
In this work, we propose a novel fine-tuning approach called SnipBERT. Instead of using entire notes, SnipBERT identifies crucial snippets and feeds them into a truncated BERT-based model in a hierarchical manner.
arXiv Detail & Related papers (2020-11-12T17:14:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.