Multi-view data capture using edge-synchronised mobiles
- URL: http://arxiv.org/abs/2005.03286v1
- Date: Thu, 7 May 2020 07:13:20 GMT
- Title: Multi-view data capture using edge-synchronised mobiles
- Authors: Matteo Bortolon, Paul Chippendale, Stefano Messelodi and Fabio Poiesi
- Abstract summary: New-generation network architectures (e.g. 5G) promise lower latency and larger bandwidth connections supported by powerful edge computing.
We propose a novel and scalable data capture architecture that exploits edge resources to synchronise and harvest frame captures.
We empirically show the benefits of our edge computing unit by analysing latencies and show the quality of 3D reconstruction outputs against an alternative and popular centralised solution.
- Score: 0.17205106391379021
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-view data capture permits free-viewpoint video (FVV) content creation.
To this end, several users must capture video streams, calibrated in both time
and pose, framing the same object/scene, from different viewpoints.
New-generation network architectures (e.g. 5G) promise lower latency and larger
bandwidth connections supported by powerful edge computing, properties that
seem ideal for reliable FVV capture. We have explored this possibility, aiming
to remove the need for bespoke synchronisation hardware when capturing a scene
from multiple viewpoints, making it possible through off-the-shelf mobiles. We
propose a novel and scalable data capture architecture that exploits edge
resources to synchronise and harvest frame captures. We have designed an edge
computing unit that supervises the relaying of timing triggers to and from
multiple mobiles, in addition to synchronising frame harvesting. We
empirically show the benefits of our edge computing unit by analysing latencies
and by comparing the quality of 3D reconstruction outputs against an
alternative and popular centralised solution based on Unity3D.
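The listing does not include a reference implementation; as a rough illustration of the trigger-relay idea, the sketch below has an edge unit broadcast a capture deadline scheduled slightly in the future so that every mobile fires at the same wall-clock instant. The UDP transport, port number, and JSON fields (`session_id`, `capture_at`) are assumptions for illustration, not the paper's actual protocol.

```python
# Minimal sketch of an edge-side trigger relay (assumed message format;
# the paper's actual protocol and field names are not published).
import json
import socket
import time

TRIGGER_PORT = 5005          # hypothetical port
CAPTURE_LEAD_TIME_S = 0.200  # schedule captures slightly in the future

def broadcast_capture_trigger(mobile_addrs, session_id):
    """Tell every registered mobile to capture a frame at the same wall-clock
    instant. Scheduling the capture slightly in the future absorbs the
    per-device network latency, so frames are harvested near-synchronously."""
    capture_at = time.time() + CAPTURE_LEAD_TIME_S
    msg = json.dumps({"session_id": session_id,
                      "capture_at": capture_at}).encode("utf-8")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for addr in mobile_addrs:           # (host, port) of each mobile
            sock.sendto(msg, addr)
    finally:
        sock.close()
    return capture_at

if __name__ == "__main__":
    mobiles = [("192.168.1.10", TRIGGER_PORT), ("192.168.1.11", TRIGGER_PORT)]
    t = broadcast_capture_trigger(mobiles, session_id="fvv-demo")
    print(f"trigger sent, capture scheduled for t={t:.3f}")
```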
Related papers
- StableDPT: Temporal Stable Monocular Video Depth Estimation [14.453483279783908]
We propose a novel approach that adapts any state-of-the-art image-based depth estimation model for video processing. Our architecture builds upon an off-the-shelf Vision Transformer (ViT) encoder and enhances the Dense Prediction Transformer (DPT) head. Evaluations on multiple benchmark datasets demonstrate improved temporal consistency, competitive state-of-the-art performance, and 2x faster processing in real-world scenarios.
arXiv Detail & Related papers (2026-01-06T08:02:14Z) - Video Object Recognition in Mobile Edge Networks: Local Tracking or Edge Detection? [57.000348519630286]
Recent advances in mobile edge computing have made it possible to offload compute-intensive object detection to edge servers equipped with high-accuracy neural networks. This hybrid approach offers a promising solution but introduces a new challenge: deciding when to perform edge detection versus local tracking. We propose LTED-Ada, a deep reinforcement learning-based algorithm that adaptively selects between local tracking and edge detection in the single-device setting.
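As a toy illustration of the decision problem LTED-Ada addresses, the sketch below learns a tabular epsilon-greedy policy over two actions, local tracking versus edge detection; the state, reward shaping, and dynamics are invented stand-ins, not the paper's deep RL formulation.

```python
# Toy epsilon-greedy Q-learning sketch of the "local tracking vs edge
# detection" decision. The real LTED-Ada uses a deep RL policy; the state,
# reward, and dynamics below are simplified stand-ins.
import random

ACTIONS = ("track", "detect")
alpha, gamma, eps = 0.1, 0.9, 0.1
q = {}  # Q-table keyed by (frames_since_detection, action)

def choose(state):
    if random.random() < eps:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))

def update(state, action, reward, next_state):
    best_next = max(q.get((next_state, a), 0.0) for a in ACTIONS)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)

state = 0  # frames since the last edge detection (capped for a small table)
for frame in range(5000):
    action = choose(state)
    if action == "detect":
        reward = 1.0 - 0.5               # accurate, but pay offloading latency
        next_state = 0
    else:
        reward = max(0.0, 1.0 - state * 0.08)  # tracker accuracy drifts
        next_state = min(state + 1, 10)
    update(state, action, reward, next_state)
    state = next_state

policy = {s: max(ACTIONS, key=lambda a: q.get((s, a), 0.0)) for s in range(11)}
print(policy)  # typically: track while drift is low, detect once it grows
```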
arXiv Detail & Related papers (2025-11-25T04:54:51Z) - MiVID: Multi-Strategic Self-Supervision for Video Frame Interpolation using Diffusion Model [2.9795035162522194]
This article introduces MiVID, a lightweight, self-supervised, diffusion-based framework for video rendering.<n>Our model eliminates the need for explicit motion estimation by combining a 3D U-Net backbone with transformer-style temporal attention.<n>We show that MiVID achieves optimal results just 50 epochs, competitive with several supervised baselines.
arXiv Detail & Related papers (2025-11-08T14:10:04Z) - DIFFVSGG: Diffusion-Driven Online Video Scene Graph Generation [61.59996525424585]
DIFFVSGG is an online VSGG solution that frames this task as an iterative scene graph update problem.
We unify three tasks, namely object classification, bounding box regression, and graph generation, by decoding them with one shared feature embedding.
DIFFVSGG further facilitates continuous temporal reasoning, where predictions for subsequent frames leverage results of past frames as the conditional inputs of LDMs.
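A rough sketch of that temporal-conditioning loop: the graph decoded for one frame becomes the conditioning input when decoding the next. The `denoise_step` below is a numerical stub standing in for DIFFVSGG's latent diffusion model, and all shapes are illustrative.

```python
# Sketch of iterative, temporally conditioned decoding: frame t's prediction
# is conditioned on the result for frame t-1 (stub in place of a real LDM).
import numpy as np

rng = np.random.default_rng(3)

def denoise_step(latent, frame_feats, condition):
    # stand-in: one step pulling the latent toward a conditioned target
    target = 0.5 * frame_feats + 0.5 * condition
    return latent + 0.3 * (target - latent)

def decode_video(frame_features, steps=10):
    cond = np.zeros_like(frame_features[0])   # empty prior for the first frame
    graphs = []
    for feats in frame_features:
        latent = rng.normal(size=feats.shape)  # start from noise each frame
        for _ in range(steps):
            latent = denoise_step(latent, feats, cond)
        graphs.append(latent)
        cond = latent            # past prediction conditions the next frame
    return graphs

video = [rng.normal(size=(8,)) for _ in range(5)]  # toy per-frame features
print(len(decode_video(video)), "frame-level graph latents decoded")
```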
arXiv Detail & Related papers (2025-03-18T06:49:51Z) - Sync from the Sea: Retrieving Alignable Videos from Large-Scale Datasets [62.280729345770936]
We introduce the task of Alignable Video Retrieval (AVR).
Given a query video, our approach can identify well-alignable videos from a large collection of clips and temporally synchronize them to the query.
Our experiments on 3 datasets, including large-scale Kinetics700, demonstrate the effectiveness of our approach.
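As a simplified stand-in for the learned alignment, the sketch below synchronises two clips by sliding one sequence of per-frame embeddings over the other and scoring each overlap with cosine similarity; AVR's actual retrieval and alignment model is learned, not a raw correlation.

```python
# Minimal temporal-synchronisation sketch via embedding cross-correlation
# (illustrative only; not the AVR paper's learned model).
import numpy as np

def best_offset(query_feats, candidate_feats):
    """Return the frame shift of candidate_feats that best matches
    query_feats, scored by mean cosine similarity over the overlap."""
    def norm(x):
        return x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)
    q, c = norm(query_feats), norm(candidate_feats)
    best, best_score = 0, -np.inf
    for shift in range(-(len(c) - 1), len(q)):
        qs, cs = max(0, shift), max(0, -shift)
        n = min(len(q) - qs, len(c) - cs)
        if n <= 0:
            continue
        score = float((q[qs:qs + n] * c[cs:cs + n]).sum()) / n
        if score > best_score:
            best, best_score = shift, score
    return best, best_score

rng = np.random.default_rng(0)
q = rng.normal(size=(60, 128))   # 60 query frames, 128-D features
c = q[7:]                        # same clip, starting 7 frames later
print(best_offset(q, c))         # recovers shift 7 with score ~1.0
```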
arXiv Detail & Related papers (2024-09-02T20:00:49Z) - Spatio-temporal Prompting Network for Robust Video Feature Extraction [74.54597668310707]
Frame quality deterioration is one of the main challenges in the field of video understanding.
Recent approaches exploit transformer-based integration modules to obtain spatio-temporal information.
We present a neat and unified framework called Spatio-Temporal Prompting Network (STPN).
It can efficiently extract video features by adjusting the input features in the network backbone.
arXiv Detail & Related papers (2024-02-04T17:52:04Z) - ViFiT: Reconstructing Vision Trajectories from IMU and Wi-Fi Fine Time Measurements [6.632056181867312]
We propose ViFiT, a transformer-based model that reconstructs vision bounding box trajectories from phone data (IMU and Fine Time Measurements).
ViFiT achieves an MRFR of 0.65, outperforming the state-of-the-art LSTM-Decoder approach for cross-modal reconstruction.
arXiv Detail & Related papers (2023-10-04T20:05:40Z) - You Can Ground Earlier than See: An Effective and Efficient Pipeline for Temporal Sentence Grounding in Compressed Videos [56.676761067861236]
Given an untrimmed video, temporal sentence grounding aims to locate a target moment semantically according to a sentence query.
Previous works have achieved decent success, but they only focus on high-level visual features extracted from decoded frames.
We propose a new setting, compressed-domain TSG, which directly utilizes compressed videos rather than fully-decompressed frames as the visual input.
arXiv Detail & Related papers (2023-03-14T12:53:27Z) - Towards Smooth Video Composition [59.134911550142455]
Video generation requires consistent and persistent frames with dynamic content over time.
This work investigates modeling the temporal relations for composing video with arbitrary length, from a few frames to even infinite, using generative adversarial networks (GANs).
We show that the alias-free operation for single image generation, together with adequately pre-learned knowledge, brings a smooth frame transition without compromising the per-frame quality.
arXiv Detail & Related papers (2022-12-14T18:54:13Z) - Task-Oriented Communication for Edge Video Analytics [11.03999024164301]
This paper proposes a task-oriented communication framework for edge video analytics.
Multiple devices collect visual sensory data and transmit the informative features to an edge server for processing.
We show that the proposed framework effectively encodes task-relevant information of video data and achieves a better rate-performance tradeoff than existing methods.
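To make the rate-performance trade-off concrete, the sketch below transmits a small quantised feature vector instead of a raw frame; the random projection and 8-bit quantiser are illustrative stand-ins for the paper's learned task-oriented encoder.

```python
# Sketch of the rate/performance idea: send a compact, quantised feature
# instead of raw pixels. Projection and quantiser are illustrative stand-ins.
import numpy as np

rng = np.random.default_rng(1)
frame = rng.normal(size=(3, 96, 96)).astype(np.float32)  # raw: ~110 KB

# random projection standing in for a learned task-oriented feature extractor
W = rng.normal(size=(128, frame.size)).astype(np.float32) / frame.size ** 0.5

def encode(x):
    z = W @ x.ravel()                       # 128-D task feature
    lo, hi = z.min(), z.max()
    q = np.round((z - lo) / (hi - lo) * 255).astype(np.uint8)
    return q, (lo, hi)                      # ~128 bytes + 2 floats on the wire

def decode(q, scale):
    lo, hi = scale
    return q.astype(np.float32) / 255 * (hi - lo) + lo

q, scale = encode(frame)
z_hat = decode(q, scale)                    # edge server recovers the feature
print(f"payload: {q.nbytes} bytes vs raw {frame.nbytes} bytes")
```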
arXiv Detail & Related papers (2022-11-25T12:09:12Z) - Graph Neural Network and Spatiotemporal Transformer Attention for 3D Video Object Detection from Point Clouds [94.21415132135951]
We propose to detect 3D objects by exploiting temporal information in multiple frames.
We implement our algorithm based on prevalent anchor-based and anchor-free detectors.
arXiv Detail & Related papers (2022-07-26T05:16:28Z) - Distortion-Aware Network Pruning and Feature Reuse for Real-time Video Segmentation [49.17930380106643]
We propose a novel framework to speed up any architecture with skip-connections for real-time vision tasks.
Specifically, at the arrival of each frame, we transform the features from the previous frame to reuse them at specific spatial bins.
We then perform partial computation of the backbone network on the regions of the current frame that capture temporal differences between the current and previous frames.
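A minimal sketch of that reuse pattern, assuming a per-bin feature extractor: bins whose pixels barely changed keep the previous frame's features, and only changed bins are recomputed. The bin size, threshold, and `backbone` stub are illustrative, not the paper's pruning scheme.

```python
# Sketch of feature reuse: recompute backbone features only for spatial bins
# that changed between frames; reuse the rest (all parameters illustrative).
import numpy as np

BIN = 32           # spatial bin size in pixels
THRESH = 4.0       # mean absolute pixel difference that forces recompute

def backbone(patch):
    return patch.mean(axis=(0, 1))  # stand-in for an expensive per-bin feature

def update_features(prev_frame, cur_frame, prev_feats):
    H, W, _ = cur_frame.shape
    feats, recomputed = prev_feats.copy(), 0
    for by in range(0, H, BIN):
        for bx in range(0, W, BIN):
            cur = cur_frame[by:by + BIN, bx:bx + BIN].astype(np.float32)
            prev = prev_frame[by:by + BIN, bx:bx + BIN].astype(np.float32)
            if np.abs(cur - prev).mean() > THRESH:
                feats[by // BIN, bx // BIN] = backbone(cur)
                recomputed += 1
            # else: temporal reuse, the previous bin feature is kept
    return feats, recomputed

rng = np.random.default_rng(2)
f0 = rng.integers(0, 255, size=(224, 224, 3), dtype=np.uint8)
f1 = f0.copy()
f1[64:128, 64:128] = rng.integers(0, 255, size=(64, 64, 3), dtype=np.uint8)
feats0 = np.array([[backbone(f0[y:y + BIN, x:x + BIN].astype(np.float32))
                    for x in range(0, 224, BIN)]
                   for y in range(0, 224, BIN)])
feats1, n = update_features(f0, f1, feats0)
print(f"recomputed {n} of {feats0.shape[0] * feats0.shape[1]} bins")
```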
arXiv Detail & Related papers (2022-06-20T07:20:02Z) - Multi-view data capture for dynamic object reconstruction using handheld augmented reality mobiles [0.0]
We propose a system to capture nearly-synchronous frame streams from multiple and moving handheld mobiles.
Each mobile executes Simultaneous Localisation and Mapping on-board to estimate its pose, and uses a wireless communication channel to send or receive synchronisation triggers.
We show the effectiveness of our system by employing it for 3D skeleton and volumetric reconstructions.
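The mobile-side counterpart of such a system might look like the sketch below, which pairs each triggered capture with the current SLAM pose. The message fields mirror the server sketch earlier in this listing, and both the camera call and the pose source are stubbed assumptions (the paper's mobiles use native capture and on-board SLAM, e.g. ARCore/ARKit-style tracking).

```python
# Client-side sketch: wait for a trigger, capture at the scheduled instant,
# and tag the frame with the on-board SLAM pose (capture and pose stubbed).
import json
import socket
import time

def get_slam_pose():
    # stand-in for the on-board SLAM estimate (translation + quaternion)
    return {"t": [0.0, 0.0, 0.0], "q": [0.0, 0.0, 0.0, 1.0]}

def capture_frame():
    return b"<jpeg bytes>"  # stand-in for the native camera capture call

def run_client(port=5005):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", port))
    while True:
        data, _ = sock.recvfrom(4096)
        trigger = json.loads(data)
        delay = trigger["capture_at"] - time.time()
        if delay > 0:
            time.sleep(delay)   # fire as close to the shared deadline as possible
        frame, pose = capture_frame(), get_slam_pose()
        print(f"captured {len(frame)} bytes for "
              f"{trigger['session_id']} at pose {pose}")

if __name__ == "__main__":
    run_client()  # blocks, waiting for edge triggers
```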
arXiv Detail & Related papers (2021-03-14T10:26:50Z) - ApproxDet: Content and Contention-Aware Approximate Object Detection for Mobiles [19.41234144545467]
We introduce ApproxDet, an adaptive video object detection framework for mobile devices to meet accuracy-latency requirements.
We evaluate ApproxDet on a large benchmark video dataset and compare quantitatively to AdaScale and YOLOv3.
We find that ApproxDet is able to adapt to a wide variety of contention and content characteristics and outshines all baselines.
arXiv Detail & Related papers (2020-10-21T04:11:05Z)
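As a rough illustration of content- and contention-aware adaptation, the sketch below picks the most accurate detection configuration whose online latency estimate still fits the frame budget; the branch list and all numbers are invented, not ApproxDet's measured profiles.

```python
# Sketch of adaptive branch selection: choose the most accurate configuration
# whose (online-estimated) latency fits the budget. Numbers are illustrative.
BRANCHES = [
    # (input size, frames tracked per detection, offline accuracy estimate)
    (608, 1, 0.78), (608, 4, 0.74), (416, 4, 0.70), (320, 8, 0.61),
]
lat_est = dict(zip(BRANCHES, [95.0, 48.0, 31.0, 18.0]))  # ms, profiled offline
EMA = 0.3

def pick_branch(budget_ms):
    ok = [b for b in BRANCHES if lat_est[b] <= budget_ms]
    if not ok:                            # nothing fits: take the fastest
        return min(BRANCHES, key=lambda b: lat_est[b])
    return max(ok, key=lambda b: b[2])    # most accurate branch that fits

def record_latency(branch, measured_ms):
    # EMA keeps the estimate tracking CPU/GPU contention as it changes
    lat_est[branch] = (1 - EMA) * lat_est[branch] + EMA * measured_ms

print(pick_branch(33.3))               # (416, 4, 0.7) fits a 30 fps budget
record_latency((416, 4, 0.70), 45.0)   # contention slows this branch down
print(pick_branch(33.3))               # now falls back to (320, 8, 0.61)
```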