Distributed Radiance Fields for Edge Video Compression and Metaverse
Integration in Autonomous Driving
- URL: http://arxiv.org/abs/2402.14642v1
- Date: Thu, 22 Feb 2024 15:39:58 GMT
- Title: Distributed Radiance Fields for Edge Video Compression and Metaverse
Integration in Autonomous Driving
- Authors: Eugen Šlapak, Matúš Dopiriak, Mohammad Abdullah Al Faruque,
Juraj Gazda, Marco Levorato
- Abstract summary: The metaverse is a virtual space that combines physical and digital elements, creating immersive and connected digital worlds.
Digital twins (DTs) offer virtual prototyping, prediction, and more.
DTs can be created with 3D scene reconstruction methods that capture the real world's geometry, appearance, and dynamics.
- Score: 13.536641570721798
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The metaverse is a virtual space that combines physical and digital elements,
creating immersive and connected digital worlds. For autonomous mobility, it
enables new possibilities with edge computing and digital twins (DTs) that
offer virtual prototyping, prediction, and more. DTs can be created with 3D
scene reconstruction methods that capture the real world's geometry,
appearance, and dynamics. However, sending data for real-time DT updates in the
metaverse, such as camera images and videos from connected autonomous vehicles
(CAVs) to edge servers, can increase network congestion, costs, and latency,
affecting metaverse services. Herein, a new method is proposed based on
distributed radiance fields (RFs) over a multi-access edge computing (MEC)
network for video compression and metaverse DT updates. An RF-based encoder
and decoder are used to create and restore representations of camera images.
The method is
evaluated on a dataset of camera images from the CARLA simulator. Data savings
of up to 80% were achieved for H.264 I-frame/P-frame pairs by using RFs
instead of I-frames, while maintaining high peak signal-to-noise ratio (PSNR)
and structural similarity index measure (SSIM) quality metrics for the
reconstructed images. Possible uses and challenges for the metaverse and
autonomous mobility are also discussed.
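As a rough illustration of the pipeline described in the abstract (not the authors' implementation), the sketch below scores a single H.264 I-frame/P-frame pair when the I-frame is replaced by an image rendered from the shared radiance field: PSNR and SSIM measure how faithfully the RF-based decoder restores the original camera image, and the byte counts give the resulting data savings. The function name, the pose_message_bytes default, and the use of scikit-image for the metrics are assumptions made for illustration.

```python
# Minimal sketch (not the authors' code) of the bookkeeping implied by the
# abstract: the RF-based decoder at the edge restores the I-frame from the
# shared radiance field, so only a compact camera-pose message needs to be
# transmitted in place of the encoded I-frame bytes.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity


def evaluate_pair(original_iframe: np.ndarray,
                  rf_reconstruction: np.ndarray,
                  iframe_bytes: int,
                  pframe_bytes: int,
                  pose_message_bytes: int = 72) -> tuple[float, float, float]:
    """Score one H.264 I-frame/P-frame pair when the I-frame is replaced by
    an image rendered from the radiance field (`rf_reconstruction`).

    `pose_message_bytes` is a placeholder for whatever compact representation
    (camera pose, timestamp, etc.) the RF-based encoder actually transmits.
    """
    # Reconstruction quality of the RF-rendered frame vs. the original image
    # (both assumed to be HxWx3 uint8 arrays).
    psnr = peak_signal_noise_ratio(original_iframe, rf_reconstruction,
                                   data_range=255)
    ssim = structural_similarity(original_iframe, rf_reconstruction,
                                 channel_axis=-1, data_range=255)

    # Data savings for the pair: the P-frame is still sent, the I-frame is not.
    baseline_bytes = iframe_bytes + pframe_bytes
    proposed_bytes = pose_message_bytes + pframe_bytes
    savings = 1.0 - proposed_bytes / baseline_bytes

    return psnr, ssim, savings
```

Note that the achievable savings depend on how much of the pair's size the encoded I-frame accounts for, which is consistent with the paper reporting savings of up to, rather than exactly, 80%.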
Related papers
- Federated Multi-View Synthesizing for Metaverse [52.59476179535153]
The metaverse is expected to provide immersive entertainment, education, and business applications.
Virtual reality (VR) transmission over wireless networks is data- and computation-intensive.
We have developed a novel multi-view synthesizing framework that can efficiently provide synthesis, storage, and communication resources for wireless content delivery in the metaverse.
arXiv Detail & Related papers (2023-12-18T13:51:56Z)
- ViFiT: Reconstructing Vision Trajectories from IMU and Wi-Fi Fine Time Measurements [6.632056181867312]
We propose ViFiT, a transformer-based model that reconstructs vision bounding box trajectories from phone data (IMU and Fine Time Measurements).
ViFiT achieves an MRFR of 0.65, outperforming the state-of-the-art LSTM-Decoder approach for cross-modal reconstruction.
arXiv Detail & Related papers (2023-10-04T20:05:40Z)
- AutoDecoding Latent 3D Diffusion Models [95.7279510847827]
We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core.
The 3D autodecoder framework embeds properties learned from the target dataset in the latent space.
We then identify the appropriate intermediate volumetric latent space, and introduce robust normalization and de-normalization operations.
arXiv Detail & Related papers (2023-07-07T17:59:14Z)
- Alignment-free HDR Deghosting with Semantics Consistent Transformer [76.91669741684173]
High dynamic range imaging aims to retrieve information from multiple low-dynamic range inputs to generate realistic output.
Existing methods often focus on the spatial misalignment across input frames caused by the foreground and/or camera motion.
We propose a novel alignment-free network with a Semantics Consistent Transformer (SCTNet) with both spatial and channel attention modules.
arXiv Detail & Related papers (2023-05-29T15:03:23Z)
- Spatiotemporal Attention-based Semantic Compression for Real-time Video Recognition [117.98023585449808]
We propose a spatiotemporal attention-based autoencoder (STAE) architecture to evaluate the importance of frames and pixels in each frame.
We develop a lightweight decoder that leverages a combined 3D-2D CNN architecture to reconstruct missing information.
Experimental results show that ViT_STAE can compress the video dataset HMDB51 by 104x with only 5% accuracy loss.
arXiv Detail & Related papers (2023-05-22T07:47:27Z)
- Self-Supervised Scene Dynamic Recovery from Rolling Shutter Images and Events [63.984927609545856]
An Event-based Inter/intra-frame Compensator (E-IC) is proposed to predict the per-pixel dynamics between arbitrary time intervals.
We show that the proposed method achieves state-of-the-art results and shows remarkable performance for event-based RS2GS inversion in real-world scenarios.
arXiv Detail & Related papers (2023-04-14T05:30:02Z)
- VoloGAN: Adversarial Domain Adaptation for Synthetic Depth Data [0.0]
We present VoloGAN, an adversarial domain adaptation network that translates synthetic RGB-D images of a high-quality 3D model of a person into RGB-D images that could be generated with a consumer depth sensor.
This system is especially useful for generating large amounts of training data for single-view 3D reconstruction algorithms that replicate real-world capture conditions.
arXiv Detail & Related papers (2022-07-19T11:30:41Z)
- Simple and Effective Synthesis of Indoor 3D Scenes [78.95697556834536]
We study the problem of synthesizing immersive 3D indoor scenes from one or more images.
Our aim is to generate high-resolution images and videos from novel viewpoints.
We propose an image-to-image GAN that maps directly from reprojections of incomplete point clouds to full high-resolution RGB-D images.
arXiv Detail & Related papers (2022-04-06T17:54:46Z)
- Motion-aware Dynamic Graph Neural Network for Video Compressive Sensing [14.67994875448175]
Video snapshot compressive imaging (SCI) utilizes a 2D detector to capture sequential video frames and compresses them into a single measurement.
Most existing reconstruction methods are incapable of efficiently capturing long-range spatial and temporal dependencies.
We propose a flexible and robust approach based on the graph neural network (GNN) to efficiently model non-local interactions between pixels in space and time regardless of the distance.
arXiv Detail & Related papers (2022-03-01T12:13:46Z)
- Deep Learning for Robust Motion Segmentation with Non-Static Cameras [0.0]
This paper proposes MOSNET, a new end-to-end DCNN-based approach for motion segmentation, especially for scenes captured with non-static cameras.
While other approaches focus on spatial or temporal context, the proposed approach uses 3D convolutions as a key technology to factor in temporal features in video frames.
The network is able to perform well on scenes captured with non-static cameras where the image content changes significantly during the scene.
arXiv Detail & Related papers (2021-02-22T11:58:41Z)