A Multi-user Oriented Live Free-viewpoint Video Streaming System Based
On View Interpolation
- URL: http://arxiv.org/abs/2112.10603v2
- Date: Wed, 22 Dec 2021 06:43:47 GMT
- Title: A Multi-user Oriented Live Free-viewpoint Video Streaming System Based
On View Interpolation
- Authors: Jingchuan Hu, Shuai Guo, Kai Zhou, Yu Dong, Jun Xu and Li Song
- Abstract summary: We introduce a CNN-based view interpolation algorithm to synthesize dense virtual views in real time.
We also build an end-to-end live free-viewpoint system with a multi-user oriented streaming strategy.
- Score: 15.575219833681635
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As an important application form of immersive multimedia services,
free-viewpoint video (FVV) provides users with a highly immersive experience
through strong interaction. However, the computational complexity of virtual
view synthesis algorithms poses a significant challenge to the real-time
performance of an FVV system. Furthermore, the individuality of user
interaction makes it difficult for a system with a conventional architecture
to serve multiple users simultaneously. In this paper, we introduce a novel
CNN-based view interpolation algorithm to synthesize dense virtual views in
real time. Building on this, we also construct an end-to-end live
free-viewpoint system with a multi-user oriented streaming strategy. Our
system can use a single edge server to serve multiple users at the same time
without imposing a heavy view synthesis load on the client side. We analyze
the whole system and show that our approaches give users a pleasant immersive
experience, in terms of both visual quality and latency.
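The abstract does not spell out the interpolation network, but the common formulation it alludes to, blending two neighboring reference views with a learned per-pixel mask, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's architecture: `BlendNet`, its layer sizes, and the assumption that the reference views are already warped toward the target viewpoint are all hypothetical.

```python
# Minimal sketch of CNN-based view interpolation: a small network predicts a
# per-pixel blending mask from two reference views. The paper's exact model
# is not reproduced here; all layer choices below are illustrative.
import torch
import torch.nn as nn

class BlendNet(nn.Module):
    """Blends two reference views into one virtual view (hypothetical)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid(),  # mask in [0, 1]
        )

    def forward(self, left, right):
        # left/right: (B, 3, H, W) views, assumed pre-warped toward the
        # target viewpoint (e.g., by depth-based reprojection).
        mask = self.net(torch.cat([left, right], dim=1))
        return mask * left + (1.0 - mask) * right  # interpolated virtual view

# Usage: synthesize one virtual view between two camera feeds.
model = BlendNet().eval()
with torch.no_grad():
    virtual = model(torch.rand(1, 3, 720, 1280), torch.rand(1, 3, 720, 1280))
```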
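The multi-user streaming strategy rests on a simple observation: if the edge server synthesizes a dense set of virtual views once per frame, serving an additional user becomes a lookup rather than another synthesis pass. Below is a hedged sketch of that idea; the names (`ViewPool`, `nearest_view`) and the 2-degree viewpoint grid are assumptions, not the paper's actual protocol.

```python
# Sketch of a shared view pool on the edge server: one synthesis pass per
# frame populates a dense viewpoint grid, and each user is mapped to the
# nearest precomputed view, keeping per-user cost flat.
from bisect import bisect_left

class ViewPool:
    def __init__(self, angles):
        self.angles = sorted(angles)   # viewpoint angles with synthesized views
        self.frames = {}               # angle -> latest encoded frame

    def update(self, angle, frame):
        self.frames[angle] = frame     # one synthesis result serves all users

    def nearest_view(self, requested):
        i = bisect_left(self.angles, requested)
        candidates = self.angles[max(0, i - 1):i + 1]
        return min(candidates, key=lambda a: abs(a - requested))

pool = ViewPool(angles=[a * 2.0 for a in range(90)])   # hypothetical 2° grid
for a in pool.angles:
    pool.update(a, frame=b"...encoded view...")        # placeholder frames
# Two users at different viewpoints share the same synthesis results.
print(pool.nearest_view(37.3), pool.nearest_view(120.9))  # -> 38.0, 120.0
```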
Related papers
- AVS-Mamba: Exploring Temporal and Multi-modal Mamba for Audio-Visual Segmentation [62.682428307810525]
We introduce AVS-Mamba, a selective state space model to address the audio-visual segmentation task.
Our framework incorporates two key components for video understanding and cross-modal learning.
Our approach achieves new state-of-the-art results on the AVSBench-object and AVS-semantic datasets.
arXiv Detail & Related papers (2025-01-14T03:20:20Z)
- Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction [69.57190742976091]
We introduce Aguvis, a unified vision-based framework for autonomous GUI agents.
Our approach leverages image-based observations, and grounding instructions in natural language to visual elements.
To address the limitations of previous work, we integrate explicit planning and reasoning within the model.
arXiv Detail & Related papers (2024-12-05T18:58:26Z)
- A Monocular SLAM-based Multi-User Positioning System with Image Occlusion in Augmented Reality [2.8155732302036176]
We propose a multi-user localization system based on ORB-SLAM2 using monocular RGB images as a development platform based on the Unity 3D game engine.
This system not only performs user localization but also places a common virtual object on a planar surface so that every user holds a proper perspective view of the object.
The positioning information is passed among every user's AR devices via a central server, based on which the relative position and movement of other users in the space of a specific user are presented.
arXiv Detail & Related papers (2024-11-17T02:39:30Z)
- DeepInteraction++: Multi-Modality Interaction for Autonomous Driving [80.8837864849534]
We introduce a novel modality interaction strategy that allows individual per-modality representations to be learned and maintained throughout.
DeepInteraction++ is a multi-modal interaction framework characterized by a multi-modal representational interaction encoder and a multi-modal predictive interaction decoder.
Experiments demonstrate the superior performance of the proposed framework on both 3D object detection and end-to-end autonomous driving tasks.
arXiv Detail & Related papers (2024-08-09T14:04:21Z)
- Generalized User Representations for Transfer Learning [6.953653891411339]
We present a novel framework for user representation in large-scale recommender systems.
Our approach employs a two-stage methodology combining representation learning and transfer learning.
We show how the proposed framework can significantly reduce infrastructure costs compared to alternative approaches.
arXiv Detail & Related papers (2024-03-01T15:05:21Z)
- IDPro: Flexible Interactive Video Object Segmentation by ID-queried Concurrent Propagation [66.94214242968967]
We propose a framework that can accept multiple frames simultaneously and explore synergistic interaction across frames (SIAF).
Our SwinB-SIAF achieves new state-of-the-art performance on DAVIS 2017 (89.6%, J&F@60).
Our R50-SIAF is more than 3× faster than the state-of-the-art competitor under challenging multi-object scenarios.
arXiv Detail & Related papers (2024-01-23T04:19:15Z)
- AEGIS: A real-time multimodal augmented reality computer vision based system to assist facial expression recognition for individuals with autism spectrum disorder [93.0013343535411]
This paper presents the development of a multimodal augmented reality (AR) system which combines the use of computer vision and deep convolutional neural networks (CNNs).
The proposed system, which we call AEGIS, is an assistive technology deployable on a variety of user devices including tablets, smartphones, video conference systems, or smartglasses.
We leverage both spatial and temporal information in order to provide an accurate expression prediction, which is then converted into its corresponding visualization and drawn on top of the original video frame.
arXiv Detail & Related papers (2020-10-22T17:20:38Z)
- FVV Live: A real-time free-viewpoint video system with consumer electronics hardware [1.1403672224109256]
FVV Live is a novel end-to-end free-viewpoint video system, designed for low cost and real-time operation.
The system has been designed to yield high-quality free-viewpoint video using consumer-grade cameras and hardware.
arXiv Detail & Related papers (2020-07-01T15:40:28Z)
- Scene-Adaptive Video Frame Interpolation via Meta-Learning [54.87696619177496]
We propose to adapt the model to each video by making use of additional information that is readily available at test time.
We obtain significant performance gains with only a single gradient update without any additional parameters.
arXiv Detail & Related papers (2020-04-02T02:46:44Z)
- Using CNNs For Users Segmentation In Video See-Through Augmented Virtuality [0.0]
We present preliminary results on the use of deep learning techniques to integrate the user's self-body and other participants into a head-mounted video see-through augmented virtuality scenario.
We propose to use a convolutional neural network for real-time semantic segmentation of users' bodies in the stereoscopic RGB video streams acquired from the perspective of the user.
arXiv Detail & Related papers (2020-01-02T15:22:36Z)