A Multi-user Oriented Live Free-viewpoint Video Streaming System Based
On View Interpolation
- URL: http://arxiv.org/abs/2112.10603v2
- Date: Wed, 22 Dec 2021 06:43:47 GMT
- Title: A Multi-user Oriented Live Free-viewpoint Video Streaming System Based
On View Interpolation
- Authors: Jingchuan Hu, Shuai Guo, Kai Zhou, Yu Dong, Jun Xu and Li Song
- Abstract summary: We introduce a CNN-based view interpolation algorithm to synthesize dense virtual views in real time.
We also build an end-to-end live free-viewpoint system with a multi-user oriented streaming strategy.
- Score: 15.575219833681635
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As an important application form of immersive multimedia services,
free-viewpoint video (FVV) provides users with a highly immersive experience
through strong interaction. However, the computational complexity of virtual
view synthesis algorithms poses a significant challenge to the real-time
performance of an FVV system. Furthermore, the individuality of user
interaction makes it difficult for a system with a conventional architecture
to serve multiple users simultaneously. In this paper, we introduce a novel
CNN-based view interpolation algorithm to synthesize dense virtual views in
real time. Building on this, we also construct an end-to-end live
free-viewpoint system with a multi-user oriented streaming strategy. Our
system can use a single edge server to serve multiple users at the same time
without imposing a heavy view synthesis load on the client side. We analyze
the whole system and show that our approaches give users a pleasant immersive
experience, in terms of both visual quality and latency.
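The abstract does not spell out the interpolation network, but the common formulation it alludes to, blending two neighboring reference views with a learned per-pixel mask, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's architecture: `BlendNet`, its layer sizes, and the assumption that the reference views are already warped toward the target viewpoint are all hypothetical.

```python
# Minimal sketch of CNN-based view interpolation: a small network predicts a
# per-pixel blending mask from two reference views. The paper's exact model
# is not reproduced here; all layer choices below are illustrative.
import torch
import torch.nn as nn

class BlendNet(nn.Module):
    """Blends two reference views into one virtual view (hypothetical)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid(),  # mask in [0, 1]
        )

    def forward(self, left, right):
        # left/right: (B, 3, H, W) views, assumed pre-warped toward the
        # target viewpoint (e.g., by depth-based reprojection).
        mask = self.net(torch.cat([left, right], dim=1))
        return mask * left + (1.0 - mask) * right  # interpolated virtual view

# Usage: synthesize one virtual view between two camera feeds.
model = BlendNet().eval()
with torch.no_grad():
    virtual = model(torch.rand(1, 3, 720, 1280), torch.rand(1, 3, 720, 1280))
```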
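The multi-user streaming strategy rests on a simple observation: if the edge server synthesizes a dense set of virtual views once per frame, serving an additional user becomes a lookup rather than another synthesis pass. Below is a hedged sketch of that idea; the names (`ViewPool`, `nearest_view`) and the 2-degree viewpoint grid are assumptions, not the paper's actual protocol.

```python
# Sketch of a shared view pool on the edge server: one synthesis pass per
# frame populates a dense viewpoint grid, and each user is mapped to the
# nearest precomputed view, keeping per-user cost flat.
from bisect import bisect_left

class ViewPool:
    def __init__(self, angles):
        self.angles = sorted(angles)   # viewpoint angles with synthesized views
        self.frames = {}               # angle -> latest encoded frame

    def update(self, angle, frame):
        self.frames[angle] = frame     # one synthesis result serves all users

    def nearest_view(self, requested):
        i = bisect_left(self.angles, requested)
        candidates = self.angles[max(0, i - 1):i + 1]
        return min(candidates, key=lambda a: abs(a - requested))

pool = ViewPool(angles=[a * 2.0 for a in range(90)])   # hypothetical 2° grid
for a in pool.angles:
    pool.update(a, frame=b"...encoded view...")        # placeholder frames
# Two users at different viewpoints share the same synthesis results.
print(pool.nearest_view(37.3), pool.nearest_view(120.9))  # -> 38.0, 120.0
```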
Related papers
- AVS-Mamba: Exploring Temporal and Multi-modal Mamba for Audio-Visual Segmentation [62.682428307810525]
We introduce AVS-Mamba, a selective state space model to address the audio-visual segmentation task.
Our framework incorporates two key components for video understanding and cross-modal learning.
Our approach achieves new state-of-the-art results on the AVSBench-object and AVS-semantic datasets.
arXiv Detail & Related papers (2025-01-14T03:20:20Z)
- Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction [69.57190742976091]
We introduce Aguvis, a unified vision-based framework for autonomous GUI agents.
Our approach leverages image-based observations, and grounding instructions in natural language to visual elements.
To address the limitations of previous work, we integrate explicit planning and reasoning within the model.
arXiv Detail & Related papers (2024-12-05T18:58:26Z)
- A Monocular SLAM-based Multi-User Positioning System with Image Occlusion in Augmented Reality [2.8155732302036176]
We propose a multi-user localization system based on ORB-SLAM2 using monocular RGB images as a development platform based on the Unity 3D game engine.
This system not only performs user localization but also places a common virtual object on a planar surface so that every user holds a proper perspective view of the object.
The positioning information is passed among every user's AR devices via a central server, based on which the relative position and movement of other users in the space of a specific user are presented.
arXiv Detail & Related papers (2024-11-17T02:39:30Z)
- DeepInteraction++: Multi-Modality Interaction for Autonomous Driving [80.8837864849534]
We introduce a novel modality interaction strategy that allows individual per-modality representations to be learned and maintained throughout.
DeepInteraction++ is a multi-modal interaction framework characterized by a multi-modal representational interaction encoder and a multi-modal predictive interaction decoder.
Experiments demonstrate the superior performance of the proposed framework on both 3D object detection and end-to-end autonomous driving tasks.
arXiv Detail & Related papers (2024-08-09T14:04:21Z)
- Generalized User Representations for Transfer Learning [6.953653891411339]
We present a novel framework for user representation in large-scale recommender systems.
Our approach employs a two-stage methodology combining representation learning and transfer learning.
We show how the proposed framework can significantly reduce infrastructure costs compared to alternative approaches.
arXiv Detail & Related papers (2024-03-01T15:05:21Z)
- IDPro: Flexible Interactive Video Object Segmentation by ID-queried Concurrent Propagation [66.94214242968967]
We propose a framework that can accept multiple frames simultaneously and explore synergistic interaction across frames (SIAF).
Our SwinB-SIAF achieves new state-of-the-art performance on DAVIS 2017 (89.6%, J&F@60).
Our R50-SIAF is more than 3× faster than the state-of-the-art competitor under challenging multi-object scenarios.
arXiv Detail & Related papers (2024-01-23T04:19:15Z)
- AEGIS: A real-time multimodal augmented reality computer vision based system to assist facial expression recognition for individuals with autism spectrum disorder [93.0013343535411]
This paper presents the development of a multimodal augmented reality (AR) system which combines the use of computer vision and deep convolutional neural networks (CNNs).
The proposed system, which we call AEGIS, is an assistive technology deployable on a variety of user devices including tablets, smartphones, video conference systems, or smartglasses.
We leverage both spatial and temporal information in order to provide an accurate expression prediction, which is then converted into its corresponding visualization and drawn on top of the original video frame.
arXiv Detail & Related papers (2020-10-22T17:20:38Z)
- FVV Live: A real-time free-viewpoint video system with consumer electronics hardware [1.1403672224109256]
FVV Live is a novel end-to-end free-viewpoint video system, designed for low cost and real-time operation.
The system has been designed to yield high-quality free-viewpoint video using consumer-grade cameras and hardware.
arXiv Detail & Related papers (2020-07-01T15:40:28Z)
- Scene-Adaptive Video Frame Interpolation via Meta-Learning [54.87696619177496]
We propose to adapt the model to each video by making use of additional information that is readily available at test time.
We obtain significant performance gains with only a single gradient update without any additional parameters.
arXiv Detail & Related papers (2020-04-02T02:46:44Z)
- Using CNNs For Users Segmentation In Video See-Through Augmented Virtuality [0.0]
We present preliminary results on the use of deep learning techniques to integrate the user's self-body and other participants into a head-mounted video see-through augmented virtuality scenario.
We propose to use a convolutional neural network for real-time semantic segmentation of users' bodies in the stereoscopic RGB video streams acquired from the perspective of the user.
arXiv Detail & Related papers (2020-01-02T15:22:36Z)