Self-Supervised Uncalibrated Multi-View Video Anonymization in the Operating Room
- URL: http://arxiv.org/abs/2602.02850v2
- Date: Mon, 09 Feb 2026 17:27:39 GMT
- Title: Self-Supervised Uncalibrated Multi-View Video Anonymization in the Operating Room
- Authors: Keqi Chen, Vinkle Srivastav, Armine Vardazaryan, Cindy Rolland, Didier Mutter, Nicolas Padoy,
- Abstract summary: We propose a novel multi-view video anonymization framework consisting of whole-body person detection and whole-body pose estimation.<n>Our core strategy is to enhance the single-view detector by "retrieving" false negatives using temporal and multi-view context.<n>Experiments on the 4D-OR dataset of simulated surgeries and our dataset of real surgeries show the effectiveness of our approach achieving over 97% recall.
- Score: 11.757426920071175
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Privacy preservation is a prerequisite for using video data in Operating Room (OR) research. Effective anonymization relies on the exhaustive localization of every individual; even a single missed detection necessitates extensive manual correction. However, existing approaches face two critical scalability bottlenecks: (1) they usually require manual annotations of each new clinical site for high accuracy; (2) while multi-camera setups have been widely adopted to address single-view ambiguity, camera calibration is typically required whenever cameras are repositioned. To address these problems, we propose a novel self-supervised multi-view video anonymization framework consisting of whole-body person detection and whole-body pose estimation, without annotation or camera calibration. Our core strategy is to enhance the single-view detector by "retrieving" false negatives using temporal and multi-view context, and conducting self-supervised domain adaptation. We first run an off-the-shelf whole-body person detector in each view with a low-score threshold to gather candidate detections. Then, we retrieve the low-score false negatives that exhibit consistency with the high-score detections via tracking and self-supervised uncalibrated multi-view association. These recovered detections serve as pseudo labels to iteratively fine-tune the whole-body detector. Finally, we apply whole-body pose estimation on each detected person, and fine-tune the pose model using its own high-score predictions. Experiments on the 4D-OR dataset of simulated surgeries and our dataset of real surgeries show the effectiveness of our approach achieving over 97% recall. Moreover, we train a real-time whole-body detector using our pseudo labels, achieving comparable performance and highlighting our method's practical applicability. Code will be available at https://github.com/CAMMA-public/OR_anonymization.
Related papers
- ExposeAnyone: Personalized Audio-to-Expression Diffusion Models Are Robust Zero-Shot Face Forgery Detectors [58.45131932883374]
We propose a fully self-supervised approach to detect deepfakes in videos.<n>Our model computes the identity distances between suspected videos and personalized subjects via diffusion reconstruction errors.<n>Our method is highly robust to corruptions such as blur and compression, highlighting the applicability in real-world face forgery detection.
arXiv Detail & Related papers (2026-01-05T18:59:54Z) - Privacy Beyond Pixels: Latent Anonymization for Privacy-Preserving Video Understanding [56.369026347458835]
We introduce a novel formulation of visual privacy preservation for video foundation models that operates entirely in the latent space.<n>Current privacy preservation methods on input-pixel-level anonymization require retraining the entire utility video model.<n>A lightweight Anonym Adapter Module (AAM) removes private information from video features while retaining general task utility.
arXiv Detail & Related papers (2025-11-11T18:56:27Z) - No Identity, no problem: Motion through detection for people tracking [48.708733485434394]
We propose exploiting motion clues while providing supervision only for the detections.
Our algorithm predicts detection heatmaps at two different times, along with a 2D motion estimate between the two images.
We show that our approach delivers state-of-the-art results for single- and multi-view multi-target tracking on the MOT17 and WILDTRACK datasets.
arXiv Detail & Related papers (2024-11-25T15:13:17Z) - Practical Video Object Detection via Feature Selection and Aggregation [18.15061460125668]
Video object detection (VOD) needs to concern the high across-frame variation in object appearance, and the diverse deterioration in some frames.
Most of contemporary aggregation methods are tailored for two-stage detectors, suffering from high computational costs.
This study invents a very simple yet potent strategy of feature selection and aggregation, gaining significant accuracy at marginal computational expense.
arXiv Detail & Related papers (2024-07-29T02:12:11Z) - Reducing False Alarms in Video Surveillance by Deep Feature Statistical
Modeling [16.311150636417256]
We develop a method-a weakly supervised a-contrario validation process, based on high dimensional statistical modeling of deep features.
Experimental results reveal that the proposed a-contrario validation is able to largely reduce the number of false alarms at both pixel and object levels.
arXiv Detail & Related papers (2023-07-09T12:37:17Z) - Enhancing Multi-Camera People Tracking with Anchor-Guided Clustering and
Spatio-Temporal Consistency ID Re-Assignment [22.531044994763487]
We propose a novel multi-camera multiple people tracking method that uses anchor clustering-guided for cross-camera reassigning.
Our approach aims to improve accuracy of tracking by identifying key features that are unique to every individual.
The method has demonstrated robustness and effectiveness in handling both synthetic and real-world data.
arXiv Detail & Related papers (2023-04-19T07:38:15Z) - Self-Supervised Person Detection in 2D Range Data using a Calibrated
Camera [83.31666463259849]
We propose a method to automatically generate training labels (called pseudo-labels) for 2D LiDAR-based person detectors.
We show that self-supervised detectors, trained or fine-tuned with pseudo-labels, outperform detectors trained using manual annotations.
Our method is an effective way to improve person detectors during deployment without any additional labeling effort.
arXiv Detail & Related papers (2020-12-16T12:10:04Z) - Self-supervised Human Detection and Segmentation via Multi-view
Consensus [116.92405645348185]
We propose a multi-camera framework in which geometric constraints are embedded in the form of multi-view consistency during training.
We show that our approach outperforms state-of-the-art self-supervised person detection and segmentation techniques on images that visually depart from those of standard benchmarks.
arXiv Detail & Related papers (2020-12-09T15:47:21Z) - Self-trained Deep Ordinal Regression for End-to-End Video Anomaly
Detection [114.9714355807607]
We show that applying self-trained deep ordinal regression to video anomaly detection overcomes two key limitations of existing methods.
We devise an end-to-end trainable video anomaly detection approach that enables joint representation learning and anomaly scoring without manually labeled normal/abnormal data.
arXiv Detail & Related papers (2020-03-15T08:44:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.