Motion-Compensated Latent Semantic Canvases for Visual Situational Awareness on Edge
- URL: http://arxiv.org/abs/2601.00854v1
- Date: Mon, 29 Dec 2025 20:25:02 GMT
- Title: Motion-Compensated Latent Semantic Canvases for Visual Situational Awareness on Edge
- Authors: Igor Lodin, Sergii Filatov, Vira Filatova, Dmytro Filatov,
- Abstract summary: We propose Motion-Compensated Latent Semantic Canvases for visual situational awareness on resource-constrained edge devices.<n>The core idea is to maintain persistent semantic metadata in two latent canvases defined in a baseline coordinate frame stabilized from the video stream.<n>On prerecorded 480p clips, our prototype reduces segmentation calls by >30x and lowers mean end-to-end processing time by >20x compared to naive per-frame segmentation.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We propose Motion-Compensated Latent Semantic Canvases (MCLSC) for visual situational awareness on resource-constrained edge devices. The core idea is to maintain persistent semantic metadata in two latent canvases - a slowly accumulating static layer and a rapidly updating dynamic layer - defined in a baseline coordinate frame stabilized from the video stream. Expensive panoptic segmentation (Mask2Former) runs asynchronously and is motion-gated: inference is triggered only when motion indicates new information, while stabilization/motion compensation preserves a consistent coordinate system for latent semantic memory. On prerecorded 480p clips, our prototype reduces segmentation calls by >30x and lowers mean end-to-end processing time by >20x compared to naive per-frame segmentation, while maintaining coherent static/dynamic semantic overlays.
Related papers
- Knot Forcing: Taming Autoregressive Video Diffusion Models for Real-time Infinite Interactive Portrait Animation [16.692450893925148]
We present a novel streaming framework named Knot Forcing for real-time portrait animation.<n>K Knot Forcing enables high-fidelity, temporally consistent, and interactive portrait animation over infinite sequences.
arXiv Detail & Related papers (2025-12-25T16:34:56Z) - 3D Scene Prompting for Scene-Consistent Camera-Controllable Video Generation [55.29423122177883]
3DScenePrompt is a framework that generates the next chunk from arbitrary-length input.<n>It enables camera control and preserving scene consistency.<n>Our framework significantly outperforms existing methods in scene consistency, camera controllability, and generation quality.
arXiv Detail & Related papers (2025-10-16T17:55:25Z) - STANCE: Motion Coherent Video Generation Via Sparse-to-Dense Anchored Encoding [31.38893861328115]
Video generation has recently made striking visual progress, but maintaining coherent object motion and interactions remains difficult.<n>We present STANCE, an image-to-video framework that addresses both issues with two simple components.
arXiv Detail & Related papers (2025-10-16T11:50:38Z) - DiViD: Disentangled Video Diffusion for Static-Dynamic Factorization [2.0032531485183345]
We introduce DiViD, the first end-to-end video diffusion framework for explicit static-dynamic factorization.<n>DiViD extracts a global static token from the first frame and per-frame dynamic tokens, explicitly removing static content from the motion code.<n>We evaluate DiViD on real-world benchmarks using swap-based accuracy and cross-leakage metrics.
arXiv Detail & Related papers (2025-07-18T14:09:18Z) - Identity-Preserving Text-to-Video Generation Guided by Simple yet Effective Spatial-Temporal Decoupled Representations [131.33758144860988]
Identity-preserving text-to-video (IPT2V) generation aims to create high-fidelity videos with consistent human identity.<n>Current end-to-end frameworks suffer a critical spatial-temporal trade-off.<n>We propose a simple yet effective spatial-temporal decoupled framework that decomposes representations into spatial features for layouts and temporal features for motion dynamics.
arXiv Detail & Related papers (2025-07-07T06:54:44Z) - Motion-Aware Concept Alignment for Consistent Video Editing [57.08108545219043]
We introduce MoCA-Video (Motion-Aware Concept Alignment in Video), a training-free framework bridging the gap between image-domain semantic mixing and video.<n>Given a generated video and a user-provided reference image, MoCA-Video injects the semantic features of the reference image into a specific object within the video.<n>We evaluate MoCA's performance using the standard SSIM, image-level LPIPS, temporal LPIPS, and introduce a novel metric CASS (Conceptual Alignment Shift Score) to evaluate the consistency and effectiveness of the visual shifts between the source prompt and the modified video frames
arXiv Detail & Related papers (2025-06-01T13:28:04Z) - Tracktention: Leveraging Point Tracking to Attend Videos Faster and Better [61.381599921020175]
Temporal consistency is critical in video prediction to ensure that outputs are coherent and free of artifacts.<n>Traditional methods, such as temporal attention and 3D convolution, may struggle with significant object motion.<n>We propose the Tracktention Layer, a novel architectural component that explicitly integrates motion information using point tracks.
arXiv Detail & Related papers (2025-03-25T17:58:48Z) - Motion-state Alignment for Video Semantic Segmentation [4.375012768093524]
We propose a novel motion-state alignment framework for video semantic segmentation.
The proposed method picks up dynamic and static semantics in a targeted way.
Experiments on Cityscapes and CamVid datasets show that the proposed approach outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-04-18T08:34:46Z) - Motion-inductive Self-supervised Object Discovery in Videos [99.35664705038728]
We propose a model for processing consecutive RGB frames, and infer the optical flow between any pair of frames using a layered representation.
We demonstrate superior performance over previous state-of-the-art methods on three public video segmentation datasets.
arXiv Detail & Related papers (2022-10-01T08:38:28Z) - EA-Net: Edge-Aware Network for Flow-based Video Frame Interpolation [101.75999290175412]
We propose to reduce the image blur and get the clear shape of objects by preserving the edges in the interpolated frames.
The proposed Edge-Aware Network (EANet) integrates the edge information into the frame task.
Three edge-aware mechanisms are developed to emphasize the frame edges in estimating flow maps.
arXiv Detail & Related papers (2021-05-17T08:44:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.