DSTED: Decoupling Temporal Stabilization and Discriminative Enhancement for Surgical Workflow Recognition
- URL: http://arxiv.org/abs/2512.19387v1
- Date: Mon, 22 Dec 2025 13:36:26 GMT
- Title: DSTED: Decoupling Temporal Stabilization and Discriminative Enhancement for Surgical Workflow Recognition
- Authors: Yueyao Chen, Kai-Ni Wang, Dario Tayupo, Arnaud Huaulm'e, Krystel Nyangoh Timoh, Pierre Jannin, Qi Dou,
- Abstract summary: Surgical workflow recognition enables context-aware assistance and skill assessment in computer-assisted interventions.<n>Current methods suffer from two critical challenges: prediction jitter across consecutive frames and poor discrimination of ambiguous phases.<n>This paper aims to develop a stable framework by selectively propagating reliable historical information and explicitly modeling uncertainty for hard sample enhancement.
- Score: 13.734575975699963
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Purpose: Surgical workflow recognition enables context-aware assistance and skill assessment in computer-assisted interventions. Despite recent advances, current methods suffer from two critical challenges: prediction jitter across consecutive frames and poor discrimination of ambiguous phases. This paper aims to develop a stable framework by selectively propagating reliable historical information and explicitly modeling uncertainty for hard sample enhancement. Methods: We propose a dual-pathway framework DSTED with Reliable Memory Propagation (RMP) and Uncertainty-Aware Prototype Retrieval (UPR). RMP maintains temporal coherence by filtering and fusing high-confidence historical features through multi-criteria reliability assessment. UPR constructs learnable class-specific prototypes from high-uncertainty samples and performs adaptive prototype matching to refine ambiguous frame representations. Finally, a confidence-driven gate dynamically balances both pathways based on prediction certainty. Results: Our method achieves state-of-the-art performance on AutoLaparo-hysterectomy with 84.36% accuracy and 65.51% F1-score, surpassing the second-best method by 3.51% and 4.88% respectively. Ablations reveal complementary gains from RMP (2.19%) and UPR (1.93%), with synergistic effects when combined. Extensive analysis confirms substantial reduction in temporal jitter and marked improvement on challenging phase transitions. Conclusion: Our dual-pathway design introduces a novel paradigm for stable workflow recognition, demonstrating that decoupling the modeling of temporal consistency and phase ambiguity yields superior performance and clinical applicability.
Related papers
- Robust Uncertainty Estimation under Distribution Shift via Difference Reconstruction [2.920726125254978]
Estimating uncertainty in deep learning models is critical for reliable decision-making in high-stakes applications such as medical imaging.<n>Prior research has established that the difference between an input sample and its reconstructed version can serve as a useful proxy for uncertainty.<n>We propose Difference Reconstruction Uncertainty Estimation (DRUE), a method that mitigates this limitation by reconstructing inputs from two intermediate layers and measuring the discrepancy between their outputs as the uncertainty score.
arXiv Detail & Related papers (2026-01-27T08:23:00Z) - Agentic Uncertainty Quantification [76.94013626702183]
We propose a unified Dual-Process Agentic UQ (AUQ) framework that transforms verbalized uncertainty into active, bi-directional control signals.<n>Our architecture comprises two complementary mechanisms: System 1 (Uncertainty-Aware Memory, UAM), which implicitly propagates verbalized confidence and semantic explanations to prevent blind decision-making; and System 2 (Uncertainty-Aware Reflection, UAR), which utilizes these explanations as rational cues to trigger targeted inference-time resolution only when necessary.
arXiv Detail & Related papers (2026-01-22T07:16:26Z) - Open-World Deepfake Attribution via Confidence-Aware Asymmetric Learning [78.92934995292113]
We propose a Confidence-Aware Asymmetric Learning (CAL) framework, which balances confidence across known and novel forgery types.<n>CAL consistently outperforms previous methods, achieving new state-of-the-art performance on both known and novel forgery attribution.
arXiv Detail & Related papers (2025-12-14T12:31:28Z) - Beyond Confidence: Adaptive and Coherent Decoding for Diffusion Language Models [64.92045568376705]
Coherent Contextual Decoding (CCD) is a novel inference framework built upon two core innovations.<n>CCD employs a trajectory rectification mechanism that leverages historical context to enhance sequence coherence.<n>Instead of rigid allocations based on diffusion steps, we introduce an adaptive sampling strategy that dynamically adjusts the unmasking budget for each step.
arXiv Detail & Related papers (2025-11-26T09:49:48Z) - Transparent Early ICU Mortality Prediction with Clinical Transformer and Per-Case Modality Attribution [42.85462513661566]
We present a lightweight, transparent multimodal ensemble that fuses physiological time-series measurements with unstructured clinical notes from the first 48 hours of an ICU stay.<n>A logistic regression model combines predictions from two modality-specific models: a bidirectional LSTM for vitals and a finetuned ClinicalModernBERT transformer for notes.<n>On the MIMIC-III benchmark, our late-fusion ensemble improves discrimination over the best single model while maintaining well-calibrated predictions.
arXiv Detail & Related papers (2025-11-19T20:11:49Z) - Efficient Thought Space Exploration through Strategic Intervention [54.35208611253168]
We propose a novel Hint-Practice Reasoning (HPR) framework that operationalizes this insight through two synergistic components.<n>The framework's core innovation lies in Distributional Inconsistency Reduction (DIR), which dynamically identifies intervention points.<n> Experiments across arithmetic and commonsense reasoning benchmarks demonstrate HPR's state-of-the-art efficiency-accuracy tradeoffs.
arXiv Detail & Related papers (2025-11-13T07:26:01Z) - MaP: A Unified Framework for Reliable Evaluation of Pre-training Dynamics [72.00014675808228]
Instability in Large Language Models evaluation process obscures true learning dynamics.<n>We introduce textbfMaP, a framework that integrates underlineMerging underlineand the underlinePass@k metric.<n>Experiments show that MaP yields significantly smoother performance curves, reduces inter-run variance, and ensures more consistent rankings.
arXiv Detail & Related papers (2025-10-10T11:40:27Z) - Confidence-Aware Routing for Large Language Model Reliability Enhancement: A Multi-Signal Approach to Pre-Generation Hallucination Mitigation [0.0]
Large Language Models suffer from hallucination, generating plausible yet factually incorrect content.<n>Current mitigation strategies focus on post-generation correction, which is computationally expensive and fails to prevent unreliable content generation.<n>We propose a confidence-aware routing system that proactively assesses model uncertainty before generation and redirects queries based on estimated reliability.
arXiv Detail & Related papers (2025-09-23T18:34:20Z) - Advancing Reliable Test-Time Adaptation of Vision-Language Models under Visual Variations [67.35596444651037]
Vision-language models (VLMs) exhibit remarkable zero-shot capabilities but struggle with distribution shifts in downstream tasks when labeled data is unavailable.<n>We propose a Reliable Test-time Adaptation (ReTA) method that enhances reliability from two perspectives.
arXiv Detail & Related papers (2025-07-13T05:37:33Z) - Effective Probabilistic Time Series Forecasting with Fourier Adaptive Noise-Separated Diffusion [13.640501360968782]
FALDA is a novel probabilistic framework for time series forecasting.<n>It incorporates a component-specific architecture, enabling tailored modeling of individual temporal components.<n>Experiments on six real-world benchmarks demonstrate that FALDA consistently outperforms existing probabilistic forecasting approaches.<n>FALDA also achieves superior overall performance compared to state-of-the-art (SOTA) point forecasting approaches, with improvements of up to 9%.
arXiv Detail & Related papers (2025-05-16T14:32:34Z) - HER2 Expression Prediction with Flexible Multi-Modal Inputs via Dynamic Bidirectional Reconstruction [25.739068829471297]
We propose an adaptive bimodal prediction framework that flexibly supports single- or dual-modality inputs.<n>Design dramatically improves H&E-only accuracy from 71.44% to 94.25%, 95.09% with full dual-modality inputs, and maintains 90.28% reliability under single-modality conditions.
arXiv Detail & Related papers (2025-04-12T11:24:06Z) - CADRef: Robust Out-of-Distribution Detection via Class-Aware Decoupled Relative Feature Leveraging [5.356623181327855]
Class-Aware Relative Feature-based method (CARef) and Class-Aware Decoupled Relative Feature-based method (CADRef) are proposed.<n>We show that both proposed methods exhibit effectiveness and robustness in OOD detection compared to state-of-the-art methods.
arXiv Detail & Related papers (2025-03-01T03:23:10Z) - Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level
Stability and High-Level Behavior [51.60683890503293]
We propose a theoretical framework for studying behavior cloning of complex expert demonstrations using generative modeling.
We show that pure supervised cloning can generate trajectories matching the per-time step distribution of arbitrary expert trajectories.
arXiv Detail & Related papers (2023-07-27T04:27:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.