HiLWS: A Human-in-the-Loop Weak Supervision Framework for Curating Clinical and Home Video Data for Neurological Assessment
- URL: http://arxiv.org/abs/2509.10557v1
- Date: Tue, 09 Sep 2025 22:30:25 GMT
- Title: HiLWS: A Human-in-the-Loop Weak Supervision Framework for Curating Clinical and Home Video Data for Neurological Assessment
- Authors: Atefeh Irani, Maryam S. Mirian, Alex Lassooij, Reshad Hosseini, Hadi Moradi, Martin J. McKeown
- Abstract summary: We present HiLWS, a cascaded human-in-the-loop weak supervision framework for curating and annotating hand motor task videos. HiLWS employs a novel cascaded approach: it first applies weak supervision to aggregate expert-provided annotations into probabilistic labels. The complete pipeline includes quality filtering, optimized pose estimation, and task-specific segment extraction.
- Score: 3.920493604448087
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Video-based assessment of motor symptoms in conditions such as Parkinson's disease (PD) offers a scalable alternative to in-clinic evaluations, but home-recorded videos introduce significant challenges, including visual degradation, inconsistent task execution, annotation noise, and domain shifts. We present HiLWS, a cascaded human-in-the-loop weak supervision framework for curating and annotating hand motor task videos from both clinical and home settings. Unlike conventional single-stage weak supervision methods, HiLWS employs a novel cascaded approach: it first applies weak supervision to aggregate expert-provided annotations into probabilistic labels, which are then used to train machine learning models. Model predictions, combined with expert input, are subsequently refined through a second stage of weak supervision. The complete pipeline includes quality filtering, optimized pose estimation, and task-specific segment extraction, complemented by context-sensitive evaluation metrics that assess both visual fidelity and clinical relevance by prioritizing ambiguous cases for expert review. Our findings reveal key failure modes in home-recorded data and emphasize the importance of context-sensitive curation strategies for robust medical video analysis.
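The cascaded aggregation described above — combining expert annotations into probabilistic labels, then re-aggregating with model predictions in a second stage — can be sketched as follows. This is a minimal illustration of the general idea using a weighted-vote label model; the function name, the weighting scheme, and the treatment of the model as a down-weighted extra annotator are assumptions for illustration, not the authors' actual implementation (which the abstract does not specify).

```python
import numpy as np

def aggregate_soft_labels(votes, n_classes, weights=None):
    """Weighted vote over annotator labels; -1 marks an abstention.

    votes: (n_samples, n_annotators) integer array of class labels or -1.
    Returns an (n_samples, n_classes) matrix of probabilistic labels.
    """
    votes = np.asarray(votes)
    n_samples, n_annotators = votes.shape
    if weights is None:
        weights = np.ones(n_annotators)
    probs = np.zeros((n_samples, n_classes))
    for j in range(n_annotators):
        voted = votes[:, j] >= 0
        probs[voted, votes[voted, j]] += weights[j]
    # Fall back to a uniform label where every annotator abstained.
    empty = probs.sum(axis=1) == 0
    probs[empty] = 1.0 / n_classes
    return probs / probs.sum(axis=1, keepdims=True)

# Stage 1: aggregate expert annotations into probabilistic labels
# (these would then supervise a downstream model, per the abstract).
expert_votes = np.array([[0, 0, 1],
                         [1, -1, 1],
                         [-1, -1, -1]])
stage1 = aggregate_soft_labels(expert_votes, n_classes=2)

# Stage 2 (illustrative): treat the trained model's predictions as one
# more, down-weighted labeling source and re-aggregate with the experts.
model_preds = np.array([0, 1, 0]).reshape(-1, 1)
stage2 = aggregate_soft_labels(
    np.hstack([expert_votes, model_preds]),
    n_classes=2,
    weights=np.array([1.0, 1.0, 1.0, 0.5]),
)
```

In this toy run, a sample with no expert votes (row 3) gets a uniform label in stage 1, but inherits the model's prediction in stage 2 — the same mechanism by which the second weak-supervision pass can resolve cases the experts left ambiguous.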
Related papers
- Benchmarking Egocentric Clinical Intent Understanding Capability for Medical Multimodal Large Language Models [48.95516224614331]
We introduce MedGaze-Bench, the first benchmark leveraging clinician gaze as a Cognitive Cursor to assess intent understanding across surgery, emergency simulation, and diagnostic interpretation. Our benchmark addresses three fundamental challenges: visual homogeneity of anatomical structures, strict temporal-causal dependencies in clinical procedures, and implicit adherence to safety protocols.
arXiv Detail & Related papers (2026-01-11T02:20:40Z) - When Tracking Fails: Analyzing Failure Modes of SAM2 for Point-Based Tracking in Surgical Videos [0.43981305860983705]
We systematically analyze the failure modes of point-based tracking in laparoscopic cholecystectomy videos. Our results show that point-based tracking is competitive for surgical tools but consistently underperforms for anatomical targets.
arXiv Detail & Related papers (2025-10-02T15:06:49Z) - Benchmarking and Mitigate Sycophancy in Medical Vision-Language Models [21.353225217216252]
Vision-language models often exhibit sycophantic behavior, prioritizing alignment with user phrasing, social cues, or perceived authority over evidence-based reasoning. This study evaluates clinical sycophancy in medical visual question answering through a novel, clinically grounded benchmark.
arXiv Detail & Related papers (2025-09-26T07:02:22Z) - STROKEVISION-BENCH: A Multimodal Video And 2D Pose Benchmark For Tracking Stroke Recovery [41.140934816875806]
We introduce StrokeVision-Bench, the first-ever dedicated dataset of stroke patients performing clinically structured block transfer tasks. StrokeVision-Bench comprises 1,000 annotated videos categorized into four clinically meaningful action classes. We benchmark several state-of-the-art video action recognition and skeleton-based action classification methods to establish performance baselines.
arXiv Detail & Related papers (2025-09-02T18:48:37Z) - Uncertainty-Driven Expert Control: Enhancing the Reliability of Medical Vision-Language Models [52.2001050216955]
Existing methods aim to enhance the performance of Medical Vision-Language Models (MedVLMs) by adjusting model structure, fine-tuning with high-quality data, or through preference fine-tuning. We propose an expert-in-the-loop framework named Expert-Controlled Classifier-Free Guidance (Expert-CFG) to align MedVLMs with clinical expertise without additional training.
arXiv Detail & Related papers (2025-07-12T09:03:30Z) - Interpretable and Granular Video-Based Quantification of Motor Characteristics from the Finger Tapping Test in Parkinson Disease [1.001970681951346]
This paper introduces a computer vision-based method for quantifying PD motor characteristics from video recordings. Four sets of clinically relevant features are proposed to characterize hypokinesia, bradykinesia, sequence effect, and hesitation-halts. We have used these features to train machine learning classifiers to estimate the Movement Disorder Society Unified Parkinson Disease Rating Scale (MDS-UPDRS) finger-tapping score.
arXiv Detail & Related papers (2025-06-19T12:49:06Z) - AGIR: Assessing 3D Gait Impairment with Reasoning based on LLMs [0.0]
Gait impairment plays an important role in early diagnosis, disease monitoring, and treatment evaluation for neurodegenerative diseases. Recent deep learning-based approaches have consistently improved classification accuracies, but they often lack interpretability. We introduce AGIR, a novel pipeline consisting of a pre-trained VQ-VAE motion tokenizer and a Large Language Model (LLM) fine-tuned over pairs of motion tokens.
arXiv Detail & Related papers (2025-03-23T17:12:16Z) - Generalizable automated ischaemic stroke lesion segmentation with vision transformers [0.7400397057238803]
Diffusion-weighted imaging (DWI) provides the highest expressivity in ischemic stroke. Current U-Net-based models therefore underperform, a problem accentuated by inadequate evaluation metrics. Here, we present a high-performance DWI lesion segmentation tool addressing these challenges.
arXiv Detail & Related papers (2025-02-10T19:00:00Z) - LlaMADRS: Prompting Large Language Models for Interview-Based Depression Assessment [75.44934940580112]
This study introduces LlaMADRS, a novel framework leveraging open-source Large Language Models (LLMs) to automate depression severity assessment. We employ a zero-shot prompting strategy with carefully designed cues to guide the model in interpreting and scoring transcribed clinical interviews. Our approach, tested on 236 real-world interviews, demonstrates strong correlations with clinician assessments.
arXiv Detail & Related papers (2025-01-07T08:49:04Z) - Towards Stroke Patients' Upper-limb Automatic Motor Assessment Using Smartwatches [5.132618393976799]
We aim to design an upper-limb assessment pipeline for stroke patients using smartwatches.
Our main target is to automatically detect and recognize four key movements inspired by the Fugl-Meyer assessment scale.
arXiv Detail & Related papers (2022-12-09T14:00:49Z) - Benchmarking Heterogeneous Treatment Effect Models through the Lens of Interpretability [82.29775890542967]
Estimating personalized effects of treatments is a complex, yet pervasive problem.
Recent developments in the machine learning literature on heterogeneous treatment effect estimation gave rise to many sophisticated, but opaque, tools.
We use post-hoc feature importance methods to identify features that influence the model's predictions.
arXiv Detail & Related papers (2022-06-16T17:59:05Z) - Factored Attention and Embedding for Unstructured-view Topic-related Ultrasound Report Generation [70.7778938191405]
We propose a novel factored attention and embedding model (termed FAE-Gen) for the unstructured-view topic-related ultrasound report generation.
The proposed FAE-Gen mainly consists of two modules, i.e., view-guided factored attention and topic-oriented factored embedding, which capture the homogeneous and heterogeneous morphological characteristics across different views.
arXiv Detail & Related papers (2022-03-12T15:24:03Z) - Unsupervised deep learning techniques for powdery mildew recognition based on multispectral imaging [63.62764375279861]
This paper presents a deep learning approach to automatically recognize powdery mildew on cucumber leaves.
We focus on unsupervised deep learning techniques applied to multispectral imaging data.
We propose the use of autoencoder architectures to investigate two strategies for disease detection.
arXiv Detail & Related papers (2021-12-20T13:29:13Z) - Clinical Outcome Prediction from Admission Notes using Self-Supervised Knowledge Integration [55.88616573143478]
Outcome prediction from clinical text can prevent doctors from overlooking possible risks.
Diagnoses at discharge, procedures performed, in-hospital mortality and length-of-stay prediction are four common outcome prediction targets.
We propose clinical outcome pre-training to integrate knowledge about patient outcomes from multiple public sources.
arXiv Detail & Related papers (2021-02-08T10:26:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.