Related papers: Motion-Compensated Latent Semantic Canvases for Visual Situational Awareness on Edge

URL: http://arxiv.org/abs/2601.00854v1
Date: Mon, 29 Dec 2025 20:25:02 GMT
Title: Motion-Compensated Latent Semantic Canvases for Visual Situational Awareness on Edge
Authors: Igor Lodin, Sergii Filatov, Vira Filatova, Dmytro Filatov
Abstract summary: We propose Motion-Compensated Latent Semantic Canvases for visual situational awareness on resource-constrained edge devices. The core idea is to maintain persistent semantic metadata in two latent canvases defined in a baseline coordinate frame stabilized from the video stream. On prerecorded 480p clips, our prototype reduces segmentation calls by >30x and lowers mean end-to-end processing time by >20x compared to naive per-frame segmentation.
Score: 0.0
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: We propose Motion-Compensated Latent Semantic Canvases (MCLSC) for visual situational awareness on resource-constrained edge devices. The core idea is to maintain persistent semantic metadata in two latent canvases - a slowly accumulating static layer and a rapidly updating dynamic layer - defined in a baseline coordinate frame stabilized from the video stream. Expensive panoptic segmentation (Mask2Former) runs asynchronously and is motion-gated: inference is triggered only when motion indicates new information, while stabilization/motion compensation preserves a consistent coordinate system for latent semantic memory. On prerecorded 480p clips, our prototype reduces segmentation calls by >30x and lowers mean end-to-end processing time by >20x compared to naive per-frame segmentation, while maintaining coherent static/dynamic semantic overlays.
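
The motion-gating idea in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the class name `CanvasSketch`, the mean absolute frame difference used as the motion signal, the thresholds, and the exponential moving average used for the static layer are all hypothetical. Stabilization and asynchronous execution are omitted (frames are assumed to be already aligned to the baseline coordinate frame), and the segmenter is any callable standing in for Mask2Former.

```python
import numpy as np


class CanvasSketch:
    """Toy two-layer semantic memory with motion-gated updates (illustrative only)."""

    def __init__(self, shape, num_classes, motion_thresh=0.05, static_rate=0.1):
        h, w = shape
        # Slowly accumulating static layer and rapidly updating dynamic layer.
        self.static = np.zeros((h, w, num_classes), dtype=np.float32)
        self.dynamic = np.zeros((h, w, num_classes), dtype=np.float32)
        self.motion_thresh = motion_thresh  # hypothetical gate threshold
        self.static_rate = static_rate      # hypothetical accumulation rate
        self.last_key = None                # last frame that was segmented
        self.calls = 0                      # number of segmenter invocations

    def step(self, frame, segment):
        """Process one frame; return True if the segmenter was invoked.

        frame:   float array in [0, 1] with shape (h, w), assumed stabilized.
        segment: callable mapping a frame to an (h, w) integer label map.
        """
        if self.last_key is not None:
            # Assumed motion signal: mean absolute difference to the last key frame.
            motion = float(np.mean(np.abs(frame - self.last_key)))
            if motion < self.motion_thresh:
                return False  # gate closed: keep the existing canvases

        labels = segment(frame)
        one_hot = np.eye(self.static.shape[-1], dtype=np.float32)[labels]
        # Static layer: exponential moving average, so it changes slowly.
        self.static += self.static_rate * (one_hot - self.static)
        # Dynamic layer: overwritten with the latest result.
        self.dynamic = one_hot
        self.last_key = frame
        self.calls += 1
        return True


if __name__ == "__main__":
    def stub_segmenter(f):
        # Stand-in for an expensive model: threshold into two classes.
        return (f > 0.5).astype(np.int64)

    canvas = CanvasSketch((4, 4), num_classes=2)
    still = np.zeros((4, 4), dtype=np.float32)
    for _ in range(5):
        canvas.step(still, stub_segmenter)      # only the first frame triggers
    canvas.step(np.ones((4, 4), dtype=np.float32), stub_segmenter)
    print("segmenter calls:", canvas.calls)
```

In this sketch, a static scene triggers the segmenter once, and later frames reuse the stored canvases until the frame difference exceeds the threshold. The reported >30x reduction in segmentation calls depends on the paper's actual gating criterion and test clips, which this toy does not reproduce.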

Related papers

Knot Forcing: Taming Autoregressive Video Diffusion Models for Real-time Infinite Interactive Portrait Animation [16.692450893925148]
We present a novel streaming framework named Knot Forcing for real-time portrait animation. Knot Forcing enables high-fidelity, temporally consistent, and interactive portrait animation over infinite sequences.
arXiv Detail & Related papers (2025-12-25T16:34:56Z)
3D Scene Prompting for Scene-Consistent Camera-Controllable Video Generation [55.29423122177883]
3DScenePrompt is a framework that generates the next chunk from arbitrary-length input. It enables camera control while preserving scene consistency. Our framework significantly outperforms existing methods in scene consistency, camera controllability, and generation quality.
arXiv Detail & Related papers (2025-10-16T17:55:25Z)
STANCE: Motion Coherent Video Generation Via Sparse-to-Dense Anchored Encoding [31.38893861328115]
Video generation has recently made striking visual progress, but maintaining coherent object motion and interactions remains difficult. We present STANCE, an image-to-video framework that addresses both issues with two simple components.
arXiv Detail & Related papers (2025-10-16T11:50:38Z)
DiViD: Disentangled Video Diffusion for Static-Dynamic Factorization [2.0032531485183345]
We introduce DiViD, the first end-to-end video diffusion framework for explicit static-dynamic factorization. DiViD extracts a global static token from the first frame and per-frame dynamic tokens, explicitly removing static content from the motion code. We evaluate DiViD on real-world benchmarks using swap-based accuracy and cross-leakage metrics.
arXiv Detail & Related papers (2025-07-18T14:09:18Z)
Identity-Preserving Text-to-Video Generation Guided by Simple yet Effective Spatial-Temporal Decoupled Representations [131.33758144860988]
Identity-preserving text-to-video (IPT2V) generation aims to create high-fidelity videos with consistent human identity. Current end-to-end frameworks suffer a critical spatial-temporal trade-off. We propose a simple yet effective spatial-temporal decoupled framework that decomposes representations into spatial features for layouts and temporal features for motion dynamics.
arXiv Detail & Related papers (2025-07-07T06:54:44Z)
Motion-Aware Concept Alignment for Consistent Video Editing [57.08108545219043]
We introduce MoCA-Video (Motion-Aware Concept Alignment in Video), a training-free framework bridging the gap between image-domain semantic mixing and video. Given a generated video and a user-provided reference image, MoCA-Video injects the semantic features of the reference image into a specific object within the video. We evaluate MoCA's performance using the standard SSIM, image-level LPIPS, and temporal LPIPS, and introduce a novel metric, CASS (Conceptual Alignment Shift Score), to evaluate the consistency and effectiveness of the visual shifts between the source prompt and the modified video frames.
arXiv Detail & Related papers (2025-06-01T13:28:04Z)
Tracktention: Leveraging Point Tracking to Attend Videos Faster and Better [61.381599921020175]
Temporal consistency is critical in video prediction to ensure that outputs are coherent and free of artifacts. Traditional methods, such as temporal attention and 3D convolution, may struggle with significant object motion. We propose the Tracktention Layer, a novel architectural component that explicitly integrates motion information using point tracks.
arXiv Detail & Related papers (2025-03-25T17:58:48Z)
Motion-state Alignment for Video Semantic Segmentation [4.375012768093524]
We propose a novel motion-state alignment framework for video semantic segmentation. The proposed method picks up dynamic and static semantics in a targeted way. Experiments on Cityscapes and CamVid datasets show that the proposed approach outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-04-18T08:34:46Z)
Motion-inductive Self-supervised Object Discovery in Videos [99.35664705038728]
We propose a model for processing consecutive RGB frames that infers the optical flow between any pair of frames using a layered representation. We demonstrate superior performance over previous state-of-the-art methods on three public video segmentation datasets.
arXiv Detail & Related papers (2022-10-01T08:38:28Z)
EA-Net: Edge-Aware Network for Flow-based Video Frame Interpolation [101.75999290175412]
We propose to reduce image blur and recover clear object shapes by preserving the edges in the interpolated frames. The proposed Edge-Aware Network (EA-Net) integrates edge information into the frame interpolation task. Three edge-aware mechanisms are developed to emphasize the frame edges in estimating flow maps.
arXiv Detail & Related papers (2021-05-17T08:44:34Z)

This list is automatically generated from the titles and abstracts of the papers on this site.