FreeVS: Generative View Synthesis on Free Driving Trajectory
- URL: http://arxiv.org/abs/2410.18079v1
- Date: Wed, 23 Oct 2024 17:59:11 GMT
- Title: FreeVS: Generative View Synthesis on Free Driving Trajectory
- Authors: Qitai Wang, Lue Fan, Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang
- Abstract summary: FreeVS is a novel fully generative approach that can synthesize camera views on free new trajectories in real driving scenes.
FreeVS can be applied to any validation sequence without a reconstruction process and can synthesize views on novel trajectories.
- Score: 55.49370963413221
- Abstract: Existing reconstruction-based novel view synthesis methods for driving scenes focus on synthesizing camera views along the recorded trajectory of the ego vehicle. Their image rendering performance degrades severely on viewpoints falling outside the recorded trajectory, where camera rays are untrained. We propose FreeVS, a novel fully generative approach that can synthesize camera views on free new trajectories in real driving scenes. To keep the generated results 3D-consistent with the real scenes and accurate in viewpoint pose, we propose a pseudo-image representation of view priors to control the generation process. Viewpoint transformation simulation is applied to pseudo-images to simulate camera movement in each direction. Once trained, FreeVS can be applied to any validation sequence without a reconstruction process and can synthesize views on novel trajectories. Moreover, we propose two new challenging benchmarks tailored to driving scenes, novel camera synthesis and novel trajectory synthesis, emphasizing the freedom of viewpoints. Given that no ground-truth images are available on novel trajectories, we also propose to evaluate the consistency of images synthesized on novel trajectories with 3D perception models. Experiments on the Waymo Open Dataset show that FreeVS achieves strong image synthesis performance on both recorded and novel trajectories. Project Page: https://freevs24.github.io/
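The pseudo-image idea in the abstract can be pictured with a minimal sketch (all names, shapes, and thresholds here are illustrative assumptions, not the paper's implementation): colored 3D points, e.g. from LiDAR, are projected into a virtual camera placed at the novel viewpoint, yielding a sparse image-like prior that encodes the scene as seen from the new pose.

```python
import numpy as np

def render_pseudo_image(points_xyz, colors, K, T_world_to_cam, hw=(128, 192)):
    """Project colored 3D points into a virtual camera to form a sparse
    'pseudo-image' prior (a hypothetical stand-in for the paper's view prior).

    points_xyz: (N, 3) world-frame points (e.g. from LiDAR)
    colors:     (N, 3) per-point colors in [0, 1]
    K:          (3, 3) camera intrinsics
    T_world_to_cam: (4, 4) rigid transform to the *novel* viewpoint
    """
    h, w = hw
    # Transform points into the novel camera frame.
    pts_h = np.hstack([points_xyz, np.ones((len(points_xyz), 1))])
    cam = (T_world_to_cam @ pts_h.T).T[:, :3]
    in_front = cam[:, 2] > 0.1          # keep only points in front of the camera
    cam, colors = cam[in_front], colors[in_front]
    # Perspective projection to pixel coordinates.
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]
    # Rasterize with a z-buffer so nearer points win.
    img = np.zeros((h, w, 3))
    depth = np.full((h, w), np.inf)
    u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    for ui, vi, zi, ci in zip(u[valid], v[valid], cam[valid, 2], colors[valid]):
        if zi < depth[vi, ui]:
            depth[vi, ui] = zi
            img[vi, ui] = ci
    return img
```

Shifting `T_world_to_cam` left, right, forward, or backward produces the "viewpoint transformation simulation" effect described above: the same point set re-rendered from displaced poses.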
Related papers
- DreamDrive: Generative 4D Scene Modeling from Street View Images [55.45852373799639]
We present DreamDrive, a 4D spatial-temporal scene generation approach that combines the merits of generation and reconstruction.
Specifically, we leverage the generative power of video diffusion models to synthesize a sequence of visual references.
We then render 3D-consistent driving videos via Gaussian splatting.
arXiv Detail & Related papers (2024-12-31T18:59:57Z) - StreetCrafter: Street View Synthesis with Controllable Video Diffusion Models [59.55232046525733]
We introduce StreetCrafter, a controllable video diffusion model that utilizes LiDAR point cloud renderings as pixel-level conditions.
In addition, the utilization of pixel-level LiDAR conditions allows us to make accurate pixel-level edits to target scenes.
Our model enables flexible control over viewpoint changes, enlarging the regions where satisfactory rendering is possible.
arXiv Detail & Related papers (2024-12-17T18:58:55Z) - Driving Scene Synthesis on Free-form Trajectories with Generative Prior [39.24591650300784]
We propose a novel free-form driving view synthesis approach, dubbed DriveX.
Our resulting model can produce high-fidelity virtual driving environments outside the recorded trajectory.
Beyond real driving scenes, DriveX can also be utilized to simulate virtual driving worlds from AI-generated videos.
arXiv Detail & Related papers (2024-12-02T17:07:53Z) - Learning autonomous driving from aerial imagery [67.06858775696453]
Photogrammetric simulators allow the synthesis of novel views by transforming pre-generated assets.
We use a Neural Radiance Field (NeRF) as an intermediate representation to synthesize novel views from the point of view of a ground vehicle.
arXiv Detail & Related papers (2024-10-18T05:09:07Z) - TC4D: Trajectory-Conditioned Text-to-4D Generation [94.90700997568158]
We propose TC4D: trajectory-conditioned text-to-4D generation, which factors motion into global and local components.
We learn local deformations that conform to the global trajectory using supervision from a text-to-video model.
Our approach enables the synthesis of scenes animated along arbitrary trajectories, compositional scene generation, and significant improvements to the realism and amount of generated motion.
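The global/local motion factoring described above can be illustrated with a toy sketch (purely illustrative; the waypoint interpolation and sinusoidal offset are assumptions standing in for TC4D's learned components): a point's position is a global path sampled from control points plus a small local deformation.

```python
import numpy as np

def factored_position(t, waypoints, amp=0.05, freq=4.0):
    """Toy illustration of factoring motion into global and local parts.

    t:         scalar time in [0, 1]
    waypoints: (K, 3) control points of the global trajectory
    The global component linearly interpolates the waypoints; the local
    component is a small periodic offset standing in for a learned
    deformation field conditioned on the global path.
    """
    k = len(waypoints) - 1
    s = np.clip(t, 0.0, 1.0) * k
    i = min(int(s), k - 1)
    f = s - i
    # Global rigid motion along the trajectory.
    global_pos = (1.0 - f) * waypoints[i] + f * waypoints[i + 1]
    # Local deformation: a small oscillation about the global path.
    local_offset = amp * np.sin(2.0 * np.pi * freq * t) * np.array([0.0, 1.0, 0.0])
    return global_pos + local_offset
```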
arXiv Detail & Related papers (2024-03-26T17:55:11Z) - ReShader: View-Dependent Highlights for Single Image View-Synthesis [5.736642774848791]
We propose to split the view synthesis process into two independent tasks of pixel reshading and relocation.
During the reshading process, we take the single image as the input and adjust its shading based on the novel camera.
This reshaded image is then used as the input to an existing view synthesis method to relocate the pixels and produce the final novel view image.
arXiv Detail & Related papers (2023-09-19T15:23:52Z) - Novel View Synthesis of Humans using Differentiable Rendering [50.57718384229912]
We present a new approach for synthesizing novel views of people in new poses.
Our synthesis makes use of diffuse Gaussian primitives that represent the underlying skeletal structure of a human.
Rendering these primitives results in a high-dimensional latent image, which is then transformed into an RGB image by a decoder network.
arXiv Detail & Related papers (2023-03-28T10:48:33Z) - Remote Sensing Novel View Synthesis with Implicit Multiplane Representations [26.33490094119609]
We propose a novel remote sensing view synthesis method by leveraging the recent advances in implicit neural representations.
Considering the overhead and far depth imaging of remote sensing images, we represent the 3D space by combining implicit multiplane images (MPI) representation and deep neural networks.
Images from any novel views can be freely rendered on the basis of the reconstructed model.
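The multiplane-image (MPI) representation mentioned above renders a view by alpha-compositing a stack of fronto-parallel RGBA planes. A generic sketch of the standard back-to-front "over" compositing step (array shapes are assumptions, not this paper's implementation):

```python
import numpy as np

def composite_mpi(rgb_planes, alpha_planes):
    """Back-to-front alpha compositing of a multiplane image (MPI).

    rgb_planes:   (D, H, W, 3) color at D fronto-parallel depth planes,
                  ordered nearest-first
    alpha_planes: (D, H, W, 1) per-plane opacity in [0, 1]
    Returns the composited (H, W, 3) image via the standard "over" operator.
    """
    out = np.zeros(rgb_planes.shape[1:])
    # Iterate from the farthest plane to the nearest, applying "over":
    # each nearer plane occludes what has been accumulated behind it.
    for rgb, a in zip(rgb_planes[::-1], alpha_planes[::-1]):
        out = rgb * a + out * (1.0 - a)
    return out
```

Novel views are obtained by warping each plane with a homography before compositing, which is what lets an MPI be re-rendered from nearby viewpoints.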
arXiv Detail & Related papers (2022-05-18T13:03:55Z) - Continuous Object Representation Networks: Novel View Synthesis without Target View Supervision [26.885846254261626]
Continuous Object Representation Networks (CORN) is a conditional architecture that encodes an input image's geometry and appearance into a 3D-consistent scene representation.
CORN performs well on challenging tasks such as novel view synthesis and single-view 3D reconstruction, achieving performance comparable to state-of-the-art approaches that use direct supervision.
arXiv Detail & Related papers (2020-07-30T17:49:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.