RocSync: Millisecond-Accurate Temporal Synchronization for Heterogeneous Camera Systems
- URL: http://arxiv.org/abs/2511.14948v1
- Date: Tue, 18 Nov 2025 22:13:06 GMT
- Title: RocSync: Millisecond-Accurate Temporal Synchronization for Heterogeneous Camera Systems
- Authors: Jaro Meyer, Frédéric Giraud, Joschua Wüthrich, Marc Pollefeys, Philipp Fürnstahl, Lilian Calvet
- Abstract summary: We present a low-cost, general-purpose synchronization method that achieves millisecond-level temporal alignment across diverse camera systems. The proposed solution employs a custom-built LED Clock that encodes time through red and infrared LEDs, allowing visual decoding of the exposure window. We validate the system in large-scale surgical recordings involving over 25 heterogeneous cameras spanning both IR and RGB modalities.
- Score: 38.099313678683224
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accurate spatiotemporal alignment of multi-view video streams is essential for a wide range of dynamic-scene applications such as multi-view 3D reconstruction, pose estimation, and scene understanding. However, synchronizing multiple cameras remains a significant challenge, especially in heterogeneous setups combining professional and consumer-grade devices, visible and infrared sensors, or systems with and without audio, where common hardware synchronization capabilities are often unavailable. This limitation is particularly evident in real-world environments, where controlled capture conditions are not feasible. In this work, we present a low-cost, general-purpose synchronization method that achieves millisecond-level temporal alignment across diverse camera systems while supporting both visible (RGB) and infrared (IR) modalities. The proposed solution employs a custom-built LED Clock that encodes time through red and infrared LEDs, allowing visual decoding of the exposure window (start and end times) from recorded frames for millisecond-level synchronization. We benchmark our method against hardware synchronization and achieve a residual error of 1.34 ms RMSE across multiple recordings. In further experiments, our method outperforms light-, audio-, and timecode-based synchronization approaches and directly improves downstream computer vision tasks, including multi-view pose estimation and 3D reconstruction. Finally, we validate the system in large-scale surgical recordings involving over 25 heterogeneous cameras spanning both IR and RGB modalities. This solution simplifies and streamlines the synchronization pipeline and expands access to advanced vision-based sensing in unconstrained environments, including industrial and clinical applications.
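The abstract describes decoding each frame's exposure window (start and end times) from the LED Clock and reports a residual error as an RMSE. The following is a minimal sketch of how such decoded windows could be turned into a per-camera time model, assuming the windows have already been decoded into milliseconds on the shared clock. The function names (`mid_exposure`, `fit_frame_clock`, `rmse`), the linear frame-index model, and the sample values are illustrative assumptions, not the authors' implementation.

```python
from math import sqrt


def mid_exposure(window):
    """Midpoint of a decoded (start_ms, end_ms) exposure window."""
    start_ms, end_ms = window
    return 0.5 * (start_ms + end_ms)


def fit_frame_clock(frame_indices, windows):
    """Least-squares fit of t = offset + period * frame_index.

    Assumption: the camera runs at a constant frame rate, so a straight
    line through the mid-exposure times of a few decoded frames gives a
    timestamp for every frame of that camera.
    """
    times = [mid_exposure(w) for w in windows]
    n = len(times)
    mean_i = sum(frame_indices) / n
    mean_t = sum(times) / n
    var_i = sum((i - mean_i) ** 2 for i in frame_indices)
    cov = sum((i - mean_i) * (t - mean_t) for i, t in zip(frame_indices, times))
    period = cov / var_i
    offset = mean_t - period * mean_i
    return offset, period


def rmse(estimated, reference):
    """Root-mean-square error between two equally long timestamp lists."""
    squared = [(e - r) ** 2 for e, r in zip(estimated, reference)]
    return sqrt(sum(squared) / len(squared))


# Hypothetical decoded windows (ms) for frames 0, 30 and 60 of one camera.
offset_ms, period_ms = fit_frame_clock(
    [0, 30, 60],
    [(100.0, 110.0), (1100.0, 1110.0), (2100.0, 2110.0)],
)
```

With two cameras fitted this way, the difference between their `offset_ms` values gives their relative shift on the shared clock, and `rmse` can compare the fitted timestamps against a hardware-synchronized reference when one is available.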
Related papers
- UCM: Unifying Camera Control and Memory with Time-aware Positional Encoding Warping for World Models [54.564740558030245]
We present UCM, a novel framework that unifies long-term memory and precise camera control via a time-aware positional encoding warping mechanism.
We also introduce a scalable data curation strategy utilizing point-cloud-based rendering to simulate scene revisiting.
arXiv Detail & Related papers (2026-02-26T12:54:46Z)
- Lumosaic: Hyperspectral Video via Active Illumination and Coded-Exposure Pixels [19.00390495006801]
Lumosaic is a compact active hyperspectral video system designed for real-time capture of dynamic scenes.
Our approach combines a narrowband LED array with a coded-exposure-pixel camera capable of high-speed, per-pixel exposure control.
arXiv Detail & Related papers (2026-02-25T17:42:44Z)
- Multi-modal Multi-platform Person Re-Identification: Benchmark and Method [58.59888754340054]
MP-ReID is a novel dataset designed specifically for multi-modality and multi-platform ReID.
This benchmark compiles data from 1,930 identities across diverse modalities, including RGB, infrared, and thermal imaging.
We introduce Uni-Prompt ReID, a framework with specifically designed prompts, tailored for cross-modality and cross-platform scenarios.
arXiv Detail & Related papers (2025-03-21T12:27:49Z)
- An Asynchronous Linear Filter Architecture for Hybrid Event-Frame Cameras [9.69495347826584]
We present an asynchronous linear filter architecture, fusing event and frame camera data, for HDR video reconstruction and spatial convolution.
The proposed AKF pipeline outperforms other state-of-the-art methods in both absolute intensity error (69.4% reduction) and image similarity indexes (average 35.5% improvement).
arXiv Detail & Related papers (2023-09-03T12:37:59Z)
- Video Frame Interpolation with Stereo Event and Intensity Camera [40.07341828127157]
We propose a novel Stereo Event-based VFI network (SE-VFI-Net) to generate high-quality intermediate frames.
We exploit the fused features to accomplish accurate optical flow and disparity estimation.
Our proposed SEVFI-Net outperforms state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2023-07-17T04:02:00Z)
- Self-Supervised Intensity-Event Stereo Matching [24.851819610561517]
Event cameras are novel bio-inspired vision sensors that output pixel-level intensity changes with microsecond accuracy.
Event cameras cannot be directly applied to computational imaging tasks due to the inability to obtain high-quality intensity and events simultaneously.
This paper aims to connect a standalone event camera and a modern intensity camera so that applications can take advantage of both sensors.
arXiv Detail & Related papers (2022-11-01T14:52:25Z)
- Rolling Shutter Inversion: Bring Rolling Shutter Images to High Framerate Global Shutter Video [111.08121952640766]
This paper presents a novel deep-learning based solution to the RS temporal super-resolution problem.
By leveraging the multi-view geometry relationship of the RS imaging process, our framework successfully achieves high framerate GS generation.
Our method can produce high-quality GS image sequences with rich details, outperforming the state-of-the-art methods.
arXiv Detail & Related papers (2022-10-06T16:47:12Z)
- Synchronized Smartphone Video Recording System of Depth and RGB Image Frames with Sub-millisecond Precision [2.1286051580524523]
We propose a recording system with high time synchronization (sync) precision.
It consists of heterogeneous sensors such as a smartphone, a depth camera, and an IMU.
arXiv Detail & Related papers (2021-11-05T15:16:54Z)
- Combining Events and Frames using Recurrent Asynchronous Multimodal Networks for Monocular Depth Prediction [51.072733683919246]
We introduce Recurrent Asynchronous Multimodal (RAM) networks to handle asynchronous and irregular data from multiple sensors.
Inspired by traditional RNNs, RAM networks maintain a hidden state that is updated asynchronously and can be queried at any time to generate a prediction.
We show an improvement over state-of-the-art methods by up to 30% in terms of mean depth absolute error.
arXiv Detail & Related papers (2021-02-18T13:24:35Z)
- Event-based Stereo Visual Odometry [42.77238738150496]
We present a solution to the problem of visual odometry from the data acquired by a stereo event-based camera rig.
We seek to maximize the temporal consistency of stereo event-based data while using a simple and efficient representation.
arXiv Detail & Related papers (2020-07-30T15:53:28Z)
- Single-Frame based Deep View Synchronization for Unsynchronized Multi-Camera Surveillance [56.964614522968226]
Multi-camera surveillance has been an active research topic for understanding and modeling scenes.
It is usually assumed that the cameras are all temporally synchronized when designing models for these multi-camera based tasks.
Our view synchronization models are applied to different DNN-based multi-camera vision tasks under the unsynchronized setting.
arXiv Detail & Related papers (2020-07-08T04:39:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.