Related papers: SkinFlow: Efficient Information Transmission for Open Dermatological Diagnosis via Dynamic Visual Encoding and Staged RL

SkinFlow: Efficient Information Transmission for Open Dermatological Diagnosis via Dynamic Visual Encoding and Staged RL

URL: http://arxiv.org/abs/2601.09136v1
Date: Wed, 14 Jan 2026 04:21:07 GMT
Title: SkinFlow: Efficient Information Transmission for Open Dermatological Diagnosis via Dynamic Visual Encoding and Staged RL
Authors: Lijun Liu, Linwei Chen, Zhishou Zhang, Meng Tian, Hengfu Cui, Ruiyang Li, Zhaocheng Liu, Qiang Ju, Qianxi Li, Hong-Yu Zhou,
Abstract summary: General-purpose Large Vision-Language Models (LVLMs) often falter in dermatology due to "diffuse attention"<n>We introduce SkinFlow, a framework that treats diagnosis as an optimization of visual information transmission efficiency.
Score: 26.10211846938172
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: General-purpose Large Vision-Language Models (LVLMs), despite their massive scale, often falter in dermatology due to "diffuse attention" - the inability to disentangle subtle pathological lesions from background noise. In this paper, we challenge the assumption that parameter scaling is the only path to medical precision. We introduce SkinFlow, a framework that treats diagnosis as an optimization of visual information transmission efficiency. Our approach utilizes a Virtual-Width Dynamic Vision Encoder (DVE) to "unfold" complex pathological manifolds without physical parameter expansion, coupled with a two-stage Reinforcement Learning strategy. This strategy sequentially aligns explicit medical descriptions (Stage I) and reconstructs implicit diagnostic textures (Stage II) within a constrained semantic space. Furthermore, we propose a clinically grounded evaluation protocol that prioritizes diagnostic safety and hierarchical relevance over rigid label matching. Empirical results are compelling: our 7B model establishes a new state-of-the-art on the Fitzpatrick17k benchmark, achieving a +12.06% gain in Top-1 accuracy and a +28.57% boost in Top-6 accuracy over the massive general-purpose models (e.g., Qwen3VL-235B and GPT-5.2). These findings demonstrate that optimizing geometric capacity and information flow yields superior diagnostic reasoning compared to raw parameter scaling.

Related papers

Leveraging Synthetic Priors for Monocular Depth Estimation in Specular Surgical Environments [0.0]
Existing self-supervised methods often suffer from boundary collapse on thin surgical tools and transparent surfaces.<n>In this work, we address this by leveraging the high-fidelity synthetic priors of the Depth Anything V2 architecture.<n>Our approach establishes a new state-of-the-art, achieving an accuracy ( 1.25) of 98.1% and reducing Squared Relative Error by over 17% compared to established baselines.
arXiv Detail & Related papers (2025-12-29T17:29:42Z)
A Semantically Enhanced Generative Foundation Model Improves Pathological Image Synthesis [82.01597026329158]
We introduce a Correlation-Regulated Alignment Framework for Tissue Synthesis (CRAFTS) for pathology-specific text-to-image synthesis.<n>CRAFTS incorporates a novel alignment mechanism that suppresses semantic drift to ensure biological accuracy.<n>This model generates diverse pathological images spanning 30 cancer types, with quality rigorously validated by objective metrics and pathologist evaluations.
arXiv Detail & Related papers (2025-12-15T10:22:43Z)
MM-UNet: Morph Mamba U-shaped Convolutional Networks for Retinal Vessel Segmentation [21.90972169495466]
MM-UNet is a novel architecture tailored for efficient retinal vessel segmentation.<n>It incorporates Morph Mamba Convolution layers, which replace pointwise convolutions to enhance branching topological perception.<n>It achieves F1-score gains of 1.64 % on DRIVE and 1.25 % on STARE, demonstrating its effectiveness and advancement.
arXiv Detail & Related papers (2025-11-04T02:18:25Z)
Towards Explainable Skin Cancer Classification: A Dual-Network Attention Model with Lesion Segmentation and Clinical Metadata Fusion [1.503974529275767]
We propose a dual-encoder attention-based framework to enhance skin lesion classification in terms of both accuracy and interpretability.<n>A novel Deep-UNet architecture with Dual Attention Gates (DAG) and Atrous Spatial Pyramid Pooling (ASPP) is first employed to segment lesions.<n>We evaluate our approach on the HAM10000 dataset and the ISIC 2018 and 2019 challenges.
arXiv Detail & Related papers (2025-10-20T17:33:51Z)
A Novel Multi-branch ConvNeXt Architecture for Identifying Subtle Pathological Features in CT Scans [1.2461503242570642]
This paper introduces a novel multi-branch ConvNeXt architecture designed specifically for the nuanced challenges of medical image analysis.<n>The proposed model incorporates a rigorous end-to-end pipeline, from meticulous data preprocessing to augmentation to a disciplined two-phase training strategy.<n> Experimental results demonstrate a superior performance on the validation set, achieving a final ROC-AUC of 0.9937, a validation accuracy of 0.9757, and an F1-score of 0.9825 for COVID-19 cases.
arXiv Detail & Related papers (2025-10-10T08:00:46Z)
UGPL: Uncertainty-Guided Progressive Learning for Evidence-Based Classification in Computed Tomography [0.0]
Current approaches typically process images uniformly, limiting their ability to detect localized abnormalities.<n>We introduce UGPL, an uncertainty-guided progressive learning framework that performs a global-to-local analysis.<n> Experiments across three CT datasets demonstrate that UGPL consistently outperforms state-of-the-art methods.
arXiv Detail & Related papers (2025-07-18T17:30:56Z)
ClipGS: Clippable Gaussian Splatting for Interactive Cinematic Visualization of Volumetric Medical Data [51.095474325541794]
We introduce ClipGS, an innovative Gaussian splatting framework with the clipping plane supported, for interactive cinematic visualization of medical data.<n>We validate our method on five volumetric medical data, and reach an average 36.635 PSNR rendering quality with 156 FPS and 16.1 MB model size.
arXiv Detail & Related papers (2025-07-09T08:24:28Z)
GS-TransUNet: Integrated 2D Gaussian Splatting and Transformer UNet for Accurate Skin Lesion Analysis [44.99833362998488]
We present a novel approach that combines 2D Gaussian splatting with the Transformer UNet architecture for automated skin cancer diagnosis.<n>Our findings illustrate significant advancements in the precision of segmentation and classification.<n>This integration sets new benchmarks in the field and highlights the potential for further research into multi-task medical image analysis methodologies.
arXiv Detail & Related papers (2025-02-23T23:28:47Z)
KaLDeX: Kalman Filter based Linear Deformable Cross Attention for Retina Vessel Segmentation [46.57880203321858]
We propose a novel network (KaLDeX) for vascular segmentation leveraging a Kalman filter based linear deformable cross attention (LDCA) module. Our approach is based on two key components: Kalman filter (KF) based linear deformable convolution (LD) and cross-attention (CA) modules. The proposed method is evaluated on retinal fundus image datasets (DRIVE, CHASE_BD1, and STARE) as well as the 3mm and 6mm of the OCTA-500 dataset.
arXiv Detail & Related papers (2024-10-28T16:00:42Z)
Towards a Benchmark for Colorectal Cancer Segmentation in Endorectal Ultrasound Videos: Dataset and Model Development [59.74920439478643]
In this paper, we collect and annotated the first benchmark dataset that covers diverse ERUS scenarios. Our ERUS-10K dataset comprises 77 videos and 10,000 high-resolution annotated frames. We introduce a benchmark model for colorectal cancer segmentation, named the Adaptive Sparse-context TRansformer (ASTR)
arXiv Detail & Related papers (2024-08-19T15:04:42Z)
Memory-efficient High-resolution OCT Volume Synthesis with Cascaded Amortized Latent Diffusion Models [48.87160158792048]
We introduce a cascaded amortized latent diffusion model (CA-LDM) that can synthesis high-resolution OCT volumes in a memory-efficient way. Experiments on a public high-resolution OCT dataset show that our synthetic data have realistic high-resolution and global features, surpassing the capabilities of existing methods.
arXiv Detail & Related papers (2024-05-26T10:58:22Z)
Improving Classification Model Performance on Chest X-Rays through Lung Segmentation [63.45024974079371]
We propose a deep learning approach to enhance abnormal chest x-ray (CXR) identification performance through segmentations. Our approach is designed in a cascaded manner and incorporates two modules: a deep neural network with criss-cross attention modules (XLSor) for localizing lung region in CXR images and a CXR classification model with a backbone of a self-supervised momentum contrast (MoCo) model pre-trained on large-scale CXR data sets.
arXiv Detail & Related papers (2022-02-22T15:24:06Z)

This list is automatically generated from the titles and abstracts of the papers in this site.