A Multi-user Oriented Live Free-viewpoint Video Streaming System Based
On View Interpolation
- URL: http://arxiv.org/abs/2112.10603v2
- Date: Wed, 22 Dec 2021 06:43:47 GMT
- Title: A Multi-user Oriented Live Free-viewpoint Video Streaming System Based
On View Interpolation
- Authors: Jingchuan Hu, Shuai Guo, Kai Zhou, Yu Dong, Jun Xu and Li Song
- Abstract summary: We introduce a CNN-based view interpolation algorithm to synthesize dense virtual views in real time.
We also build an end-to-end live free-viewpoint system with a multi-user oriented streaming strategy.
- Score: 15.575219833681635
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As an important application form of immersive multimedia services,
free-viewpoint video (FVV) provides users with a highly immersive experience
through strong interaction. However, the computational complexity of virtual
view synthesis algorithms poses a significant challenge to the real-time
performance of an FVV system. Furthermore, the individuality of user
interaction makes it difficult for a system with a conventional architecture to
serve multiple users simultaneously. In this paper, we introduce a novel
CNN-based view interpolation algorithm to synthesize dense virtual views in
real time. Building on this, we also construct an end-to-end live
free-viewpoint system with a multi-user oriented streaming strategy. Our system
can use a single edge server to serve multiple users at the same time without
imposing a heavy view synthesis load on the client side. We analyze the whole
system and show that our approach gives users a pleasant immersive experience
in terms of both visual quality and latency.
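The abstract only sketches the pipeline at a high level. As a rough illustration of its two main ideas, real-time CNN-based interpolation between reference camera views and serving many users from one precomputed pool of dense virtual views on an edge server, the following minimal PyTorch sketch may help. The network architecture and all names (InterpNet, synthesize_dense_views, serve_user) are illustrative assumptions, not the authors' actual implementation.

```python
# Minimal sketch: CNN view interpolation between two reference cameras, plus a
# nearest-view lookup for serving multiple users from one precomputed view pool.
# The architecture and names are illustrative assumptions, not the paper's model.
import torch
import torch.nn as nn

class InterpNet(nn.Module):
    """Toy CNN mapping (left view, right view, position map) to a virtual view."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + 3 + 1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, kernel_size=3, padding=1), nn.Sigmoid(),
        )

    def forward(self, left, right, alpha):
        # alpha in [0, 1]: virtual camera position between the two reference views.
        pos = torch.full_like(left[:, :1], alpha)
        return self.net(torch.cat([left, right, pos], dim=1))

@torch.no_grad()
def synthesize_dense_views(model, left, right, n_views=9):
    """Precompute a dense set of virtual views once per frame on the edge server."""
    alphas = torch.linspace(0.0, 1.0, n_views).tolist()
    return {a: model(left, right, a) for a in alphas}

def serve_user(view_pool, requested_alpha):
    """Return the precomputed view closest to the viewpoint a user requests."""
    nearest = min(view_pool, key=lambda a: abs(a - requested_alpha))
    return view_pool[nearest]

if __name__ == "__main__":
    model = InterpNet().eval()
    left = torch.rand(1, 3, 64, 96)   # stand-ins for two synchronized camera frames
    right = torch.rand(1, 3, 64, 96)
    pool = synthesize_dense_views(model, left, right)
    # Two users at different viewpoints are served from the same precomputed pool.
    frame_a = serve_user(pool, 0.30)
    frame_b = serve_user(pool, 0.85)
    print(frame_a.shape, frame_b.shape)
```

The multi-user oriented strategy is visible in the sketch: the dense view pool is synthesized once per frame on the edge server, so adding another user costs only a nearest-view lookup rather than a fresh synthesis on the client side.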
Related papers
- DeepInteraction++: Multi-Modality Interaction for Autonomous Driving [80.8837864849534]
We introduce a novel modality interaction strategy that allows individual per-modality representations to be learned and maintained throughout.
DeepInteraction++ is a multi-modal interaction framework characterized by a multi-modal representational interaction encoder and a multi-modal predictive interaction decoder.
Experiments demonstrate the superior performance of the proposed framework on both 3D object detection and end-to-end autonomous driving tasks.
arXiv Detail & Related papers (2024-08-09T14:04:21Z) - Generalized User Representations for Transfer Learning [6.953653891411339]
We present a novel framework for user representation in large-scale recommender systems.
Our approach employs a two-stage methodology combining representation learning and transfer learning.
We show how the proposed framework can significantly reduce infrastructure costs compared to alternative approaches.
arXiv Detail & Related papers (2024-03-01T15:05:21Z) - Explore Synergistic Interaction Across Frames for Interactive Video
Object Segmentation [70.93295323156876]
We propose a framework that can accept multiple frames simultaneously and explore synergistic interaction across frames (SIAF).
Our SwinB-SIAF achieves new state-of-the-art performance on DAVIS 2017 (89.6%, J&F@60).
Our R50-SIAF is more than 3 times faster than the state-of-the-art competitor under challenging multi-object scenarios.
arXiv Detail & Related papers (2024-01-23T04:19:15Z) - Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding [55.65727739645824]
Chat-UniVi is a Unified Vision-language model capable of comprehending and engaging in conversations involving images and videos.
We employ a set of dynamic visual tokens to uniformly represent images and videos.
We leverage a multi-scale representation, enabling the model to perceive both high-level semantic concepts and low-level visual details.
arXiv Detail & Related papers (2023-11-14T10:11:36Z) - Virtual Avatar Stream: a cost-down approach to the Metaverse experience [0.0]
This project aims to provide an accessible entry point to the immersive Metaverse experience by leveraging web technologies.
The platform developed allows users to engage with rendered avatars using only a web browser, microphone, and webcam.
arXiv Detail & Related papers (2023-04-04T01:34:23Z) - You Only Train Once: Multi-Identity Free-Viewpoint Neural Human
Rendering from Monocular Videos [10.795522875068073]
You Only Train Once (YOTO) is a dynamic human generation framework, which performs free-viewpoint rendering of different human identities with distinct motions.
In this paper, we propose a set of learnable identity codes to expand the capability of the framework for multi-identity free-viewpoint rendering.
YOTO shows state-of-the-art performance on all evaluation metrics while showing significant benefits in training and inference efficiency as well as rendering quality.
arXiv Detail & Related papers (2023-03-10T10:23:17Z) - AEGIS: A real-time multimodal augmented reality computer vision based
system to assist facial expression recognition for individuals with autism
spectrum disorder [93.0013343535411]
This paper presents the development of a multimodal augmented reality (AR) system which combines the use of computer vision and deep convolutional neural networks (CNNs).
The proposed system, which we call AEGIS, is an assistive technology deployable on a variety of user devices including tablets, smartphones, video conference systems, or smartglasses.
We leverage both spatial and temporal information in order to provide an accurate expression prediction, which is then converted into its corresponding visualization and drawn on top of the original video frame.
arXiv Detail & Related papers (2020-10-22T17:20:38Z) - FVV Live: A real-time free-viewpoint video system with consumer
electronics hardware [1.1403672224109256]
FVV Live is a novel end-to-end free-viewpoint video system, designed for low cost and real-time operation.
The system has been designed to yield high-quality free-viewpoint video using consumer-grade cameras and hardware.
arXiv Detail & Related papers (2020-07-01T15:40:28Z) - Self-Supervised MultiModal Versatile Networks [76.19886740072808]
We learn representations using self-supervision by leveraging three modalities naturally present in videos: visual, audio and language streams.
We demonstrate how such networks trained on large collections of unlabelled video data can be applied on video, video-text, image and audio tasks.
arXiv Detail & Related papers (2020-06-29T17:50:23Z) - Scene-Adaptive Video Frame Interpolation via Meta-Learning [54.87696619177496]
We propose to adapt the model to each video by making use of additional information that is readily available at test time.
We obtain significant performance gains with only a single gradient update without any additional parameters.
arXiv Detail & Related papers (2020-04-02T02:46:44Z) - Using CNNs For Users Segmentation In Video See-Through Augmented
Virtuality [0.0]
We present preliminary results on the use of deep learning techniques to integrate the user's self-body and other participants into a head-mounted video see-through augmented virtuality scenario.
We propose to use a convolutional neural network for real-time semantic segmentation of users' bodies in the stereoscopic RGB video streams acquired from the perspective of the user.
arXiv Detail & Related papers (2020-01-02T15:22:36Z)