Stitching the Story: Creating Panoramic Incident Summaries from Body-Worn Footage
- URL: http://arxiv.org/abs/2509.04370v1
- Date: Thu, 04 Sep 2025 16:27:53 GMT
- Title: Stitching the Story: Creating Panoramic Incident Summaries from Body-Worn Footage
- Authors: Dor Cohen, Inga Efrosman, Yehudit Aperstein, Alexander Apartsin
- Abstract summary: First responders widely adopt body-worn cameras to document incident scenes and support post-event analysis. This work presents a computer vision pipeline that transforms body-camera footage into informative panoramic images summarizing the incident scene.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: First responders widely adopt body-worn cameras to document incident scenes and support post-event analysis. However, reviewing lengthy video footage is impractical in time-critical situations. Effective situational awareness demands a concise visual summary that can be quickly interpreted. This work presents a computer vision pipeline that transforms body-camera footage into informative panoramic images summarizing the incident scene. Our method leverages monocular Simultaneous Localization and Mapping (SLAM) to estimate camera trajectories and reconstruct the spatial layout of the environment. Key viewpoints are identified by clustering camera poses along the trajectory, and representative frames from each cluster are selected. These frames are fused into spatially coherent panoramic images using multi-frame stitching techniques. The resulting summaries enable rapid understanding of complex environments and facilitate efficient decision-making and incident review.
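The abstract's pipeline has three algorithmic steps: cluster the SLAM-estimated camera poses along the trajectory, pick one representative frame per cluster, and stitch those frames into panoramas. A minimal sketch of the first two steps is below, with k-means over synthetic 3-D camera positions standing in for the paper's (unspecified) clustering method; the seeding scheme, cluster count, and trajectory are all illustrative assumptions.

```python
import numpy as np

def cluster_poses(positions, k, iters=20):
    """Naive k-means over 3-D camera positions with deterministic
    farthest-point seeding (a stand-in for the paper's trajectory
    clustering; the exact method is not specified here)."""
    seeds = [0]
    for _ in range(k - 1):
        d = np.min(np.linalg.norm(
            positions[:, None] - positions[seeds][None], axis=2), axis=1)
        seeds.append(int(np.argmax(d)))
    centroids = positions[seeds].copy()
    labels = np.zeros(len(positions), dtype=int)
    for _ in range(iters):
        labels = np.argmin(np.linalg.norm(
            positions[:, None] - centroids[None], axis=2), axis=1)
        for j in range(k):
            members = positions[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return labels, centroids

def select_keyframes(positions, labels, centroids):
    """One representative frame per cluster: the pose nearest the centroid."""
    keyframes = []
    for j in range(len(centroids)):
        idx = np.where(labels == j)[0]
        if len(idx):
            d = np.linalg.norm(positions[idx] - centroids[j], axis=1)
            keyframes.append(int(idx[np.argmin(d)]))
    return sorted(keyframes)

# Synthetic trajectory: the camera lingers at three spots in a scene.
rng = np.random.default_rng(1)
spots = [np.array(c, float) for c in ([0, 0, 0], [5, 0, 0], [5, 4, 0])]
traj = np.concatenate([c + 0.1 * rng.standard_normal((40, 3)) for c in spots])

labels, centroids = cluster_poses(traj, k=3)
keyframes = select_keyframes(traj, labels, centroids)
print(keyframes)  # one frame index per viewpoint cluster
```

The selected frame indices would then be handed to a multi-frame stitcher (for instance OpenCV's `cv2.Stitcher`) to produce the panoramic summaries; that final fusion step is omitted here.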
Related papers
- DAGE: Dual-Stream Architecture for Efficient and Fine-Grained Geometry Estimation [72.89376712495464]
DAGE is a dual-stream transformer that disentangles global coherence from fine detail. A low-resolution stream operates on aggressively downsampled frames with alternating frame/global attention to build a view-consistent representation. A high-resolution stream processes the original images per-frame to preserve sharp boundaries and small structures. This design scales resolution and clip length independently, supports inputs up to 2K, and maintains practical inference cost.
arXiv Detail & Related papers (2026-03-04T05:29:29Z) - Rethinking Camera Choice: An Empirical Study on Fisheye Camera Properties in Robotic Manipulation [53.27191803311681]
We rigorously analyze the properties of wrist-mounted fisheye cameras for imitation learning. Fisheye-trained policies unlock superior scene generalization when trained with sufficient environmental diversity. Our findings provide concrete, actionable guidance for the large-scale collection and effective use of fisheye datasets in robotic learning.
arXiv Detail & Related papers (2026-03-02T18:00:37Z) - IntelliCap: Intelligent Guidance for Consistent View Sampling [14.791526418738218]
High-quality view synthesis requires uniform and dense view sampling. Existing approaches to guide humans during image acquisition concentrate on single objects. We propose a novel situated visualization technique for scanning at multiple scales.
arXiv Detail & Related papers (2025-08-18T16:00:31Z) - KRONC: Keypoint-based Robust Camera Optimization for 3D Car Reconstruction [58.04846444985808]
This paper introduces KRONC, a novel approach aimed at inferring view poses by leveraging prior knowledge about the object to reconstruct and its representation through semantic keypoints.
With a focus on vehicle scenes, KRONC is able to estimate the position of the views as a solution to a light optimization problem targeting the convergence of keypoints' back-projections to a singular point.
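The KRONC summary describes recovering view poses by driving the back-projections of keypoints toward a single point. A closely related textbook subproblem, finding the point that best explains a bundle of rays, has a closed-form least-squares solution; the sketch below illustrates that convergence objective and is not KRONC's actual formulation.

```python
import numpy as np

def ray_convergence_point(origins, dirs):
    """Least-squares 3-D point minimizing the summed squared distance
    to a bundle of rays (an illustrative stand-in for the idea of
    back-projections converging to a singular point)."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for o, d in zip(origins, dirs):
        d = d / np.linalg.norm(d)
        P = np.eye(3) - np.outer(d, d)  # projector onto the ray's normal plane
        A += P
        b += P @ o
    return np.linalg.solve(A, b)

# Rays from six random origins, all aimed exactly at a known point q.
q = np.array([1.0, 2.0, 3.0])
rng = np.random.default_rng(0)
origins = rng.standard_normal((6, 3))
dirs = q - origins
p = ray_convergence_point(origins, dirs)
print(np.allclose(p, q))  # True
```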
arXiv Detail & Related papers (2024-09-09T08:08:05Z) - MultiViPerFrOG: A Globally Optimized Multi-Viewpoint Perception Framework for Camera Motion and Tissue Deformation [18.261678529996104]
We propose a framework that can flexibly integrate the output of low-level perception modules with kinematic and scene-modeling priors.
Overall, our method shows robustness to combined noisy input measures and can process hundreds of points in a few milliseconds.
arXiv Detail & Related papers (2024-08-08T10:55:55Z) - Erasing the Ephemeral: Joint Camera Refinement and Transient Object Removal for Street View Synthesis [44.90761677737313]
We introduce a method that tackles challenges on view synthesis for outdoor scenarios.
We employ a neural point light field scene representation and strategically detect and mask out dynamic objects to reconstruct novel scenes without artifacts.
We demonstrate state-of-the-art results in synthesizing novel views of urban scenes.
arXiv Detail & Related papers (2023-11-29T13:51:12Z) - Scene Summarization: Clustering Scene Videos into Spatially Diverse Frames [23.229623379422303]
Scene summarization is the task of condensing long, continuous scene videos into a compact set of spatially diverse frames that facilitate global spatial reasoning. We propose SceneSum, a two-stage self-supervised pipeline that first clusters video frames using visual place recognition to promote spatial diversity, then selects representatives from each cluster under resource constraints. Experiments on real and simulated indoor datasets show that SceneSum produces more spatially informative summaries and outperforms existing video summarization baselines.
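SceneSum's second stage, selecting representatives under a resource budget, can be illustrated with a simple greedy farthest-point heuristic: repeatedly add the frame farthest from everything chosen so far. This is only a generic illustration of budget-constrained diverse selection, not SceneSum's actual selection criterion.

```python
import numpy as np

def greedy_diverse_subset(poses, budget):
    """Greedily pick up to `budget` frames maximizing spatial spread:
    each step adds the frame farthest from its nearest chosen frame."""
    chosen = [0]                        # seed with the first frame
    while len(chosen) < budget:
        # distance from every frame to its nearest already-chosen frame
        d = np.min(np.linalg.norm(
            poses[:, None] - poses[chosen][None], axis=2), axis=1)
        nxt = int(np.argmax(d))
        if d[nxt] == 0.0:               # remaining frames add no diversity
            break
        chosen.append(nxt)
    return chosen

# Five 2-D camera positions: two nearly coincident, three far apart.
poses = np.array([[0.0, 0.0], [0.0, 0.1],
                  [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
print(greedy_diverse_subset(poses, budget=3))  # picks mutually distant frames
```

The near-duplicate frame at index 1 is never selected, which is the behavior a spatially diverse summary wants under a tight budget.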
arXiv Detail & Related papers (2023-11-28T22:18:26Z) - DynPoint: Dynamic Neural Point For View Synthesis [43.27110788061267]
We propose DynPoint, an algorithm designed to facilitate the rapid synthesis of novel views for unconstrained monocular videos. DynPoint concentrates on predicting the explicit 3D correspondence between neighboring frames to realize information aggregation. Our method exhibits strong robustness in handling long-duration videos without learning a canonical representation of video content.
arXiv Detail & Related papers (2023-10-29T12:55:53Z) - Total-Recon: Deformable Scene Reconstruction for Embodied View Synthesis [76.72505510632904]
We present Total-Recon, the first method to reconstruct deformable scenes from long monocular RGBD videos.
Our method hierarchically decomposes the scene into the background and objects, whose motion is decomposed into root-body motion and local articulations.
arXiv Detail & Related papers (2023-04-24T17:59:52Z) - Deep Learning for Event-based Vision: A Comprehensive Survey and Benchmarks [55.81577205593956]
Event cameras are bio-inspired sensors that capture the per-pixel intensity changes asynchronously.
Deep learning (DL) has been brought to this emerging field and inspired active research endeavors in mining its potential.
arXiv Detail & Related papers (2023-02-17T14:19:28Z) - Crowdsampling the Plenoptic Function [56.10020793913216]
We present a new approach to novel view synthesis under time-varying illumination from such data.
We introduce a new DeepMPI representation, motivated by observations on the sparsity structure of the plenoptic function.
Our method can synthesize the same compelling parallax and view-dependent effects as previous MPI methods, while simultaneously interpolating along changes in reflectance and illumination with time.
arXiv Detail & Related papers (2020-07-30T02:52:10Z) - Perspective Plane Program Induction from a Single Image [85.28956922100305]
We study the inverse graphics problem of inferring a holistic representation for natural images.
We formulate this problem as jointly finding the camera pose and scene structure that best describe the input image.
Our proposed framework, Perspective Plane Program Induction (P3I), combines search-based and gradient-based algorithms to efficiently solve the problem.
arXiv Detail & Related papers (2020-06-25T21:18:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and is not responsible for any consequences of its use.