MuVieCAST: Multi-View Consistent Artistic Style Transfer
- URL: http://arxiv.org/abs/2312.05046v1
- Date: Fri, 8 Dec 2023 14:01:03 GMT
- Title: MuVieCAST: Multi-View Consistent Artistic Style Transfer
- Authors: Nail Ibrahimli, Julian F. P. Kooij, Liangliang Nan
- Abstract summary: We introduce MuVieCAST, a modular multi-view consistent style transfer network architecture.
MuVieCAST supports both sparse and dense views, making it versatile enough to handle a wide range of multi-view image datasets.
- Score: 6.767885381740952
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce MuVieCAST, a modular multi-view consistent style transfer
network architecture that enables consistent style transfer between multiple
viewpoints of the same scene. This network architecture supports both sparse
and dense views, making it versatile enough to handle a wide range of
multi-view image datasets. The approach consists of three modules that perform
specific tasks related to style transfer, namely content preservation, image
transformation, and multi-view consistency enforcement. We extensively evaluate
our approach across multiple application domains including depth-map-based
point cloud fusion, mesh reconstruction, and novel-view synthesis. Our
experiments reveal that the proposed framework achieves an exceptional
generation of stylized images, exhibiting consistent outcomes across
perspectives. A user study focusing on novel-view synthesis further confirms
these results: in approximately 68% of cases, participants preferred our generated
outputs over those of a recent state-of-the-art method. Our modular framework is
extensible and can easily be integrated with
various backbone architectures, making it a flexible solution for multi-view
style transfer. More results are demonstrated on our project page:
muviecast.github.io
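The abstract describes a pipeline of three modules: content preservation, image transformation, and multi-view consistency enforcement. Below is a minimal, hypothetical PyTorch sketch of how such a modular multi-view style transfer pipeline could be wired together; all class names, the toy losses, and the placeholder cross-view warp are illustrative assumptions, not the authors' actual implementation. In practice the transformation module would be a full style-transfer backbone and the warp would come from estimated depth or correspondences; this sketch only shows how the three roles plug together.

```python
# Illustrative sketch of a three-module multi-view style transfer pipeline.
# All names are hypothetical; this is not the MuVieCAST codebase.
import torch
import torch.nn as nn


class ContentEncoder(nn.Module):
    """Extracts content features used to preserve scene structure (assumed role)."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)


class StyleTransformer(nn.Module):
    """Maps content features to a stylized image (placeholder backbone)."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.decode = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, feats):
        return torch.tanh(self.decode(feats))


def multiview_consistency_loss(stylized_a, stylized_b_warped, mask):
    """Penalize differences between one stylized view and another view
    warped into its frame. The warp (e.g. via depth or flow) is assumed
    to be computed elsewhere and is represented here by a placeholder."""
    return ((stylized_a - stylized_b_warped).abs() * mask).mean()


# Usage sketch: stylize two overlapping views and couple them with the
# consistency term, in addition to the usual content/style objectives.
encoder, transformer = ContentEncoder(), StyleTransformer()
view_a = torch.rand(1, 3, 256, 256)
view_b_warped_to_a = torch.rand(1, 3, 256, 256)   # placeholder for a warped second view
valid_mask = torch.ones(1, 1, 256, 256)           # placeholder visibility mask

stylized_a = transformer(encoder(view_a))
stylized_b = transformer(encoder(view_b_warped_to_a))
loss_consistency = multiview_consistency_loss(stylized_a, stylized_b, valid_mask)
```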
Related papers
- RCNet: Deep Recurrent Collaborative Network for Multi-View Low-Light Image Enhancement [19.751696790765635]
We make the first attempt to investigate multi-view low-light image enhancement.
We propose a deep multi-view enhancement framework based on the Recurrent Collaborative Network (RCNet).
Experimental results demonstrate that our RCNet significantly outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-09-06T15:49:49Z)
- Mono-ViFI: A Unified Learning Framework for Self-supervised Single- and Multi-frame Monocular Depth Estimation [11.611045114232187]
Recent methods only conduct view synthesis between existing camera views, leading to insufficient guidance.
We attempt to synthesize more virtual camera views via flow-based video frame interpolation (VFI).
For multi-frame inference, to sidestep the problem of dynamic objects encountered by explicit geometry-based methods like ManyDepth, we return to the feature fusion paradigm.
We construct a unified self-supervised learning framework, named Mono-ViFI, to bilaterally connect single- and multi-frame depth.
arXiv Detail & Related papers (2024-07-19T08:51:51Z)
- CountFormer: Multi-View Crowd Counting Transformer [43.92763885594129]
We propose a 3D multi-view counting (MVC) framework called CountFormer to elevate multi-view image-level features to a scene-level volume representation.
By incorporating a camera encoding strategy, CountFormer successfully embeds camera parameters into the volume query and image-level features.
The proposed method performs favorably against the state-of-the-art approaches across various widely used datasets.
arXiv Detail & Related papers (2024-07-02T08:19:48Z)
- Many-to-many Image Generation with Auto-regressive Diffusion Models [59.5041405824704]
This paper introduces a domain-general framework for many-to-many image generation, capable of producing interrelated image series from a given set of images.
We present MIS, a novel large-scale multi-image dataset, containing 12M synthetic multi-image samples, each with 25 interconnected images.
We learn M2M, an autoregressive model for many-to-many generation, where each image is modeled within a diffusion framework.
arXiv Detail & Related papers (2024-04-03T23:20:40Z)
- Consolidating Attention Features for Multi-view Image Editing [126.19731971010475]
We focus on spatial control-based geometric manipulations and introduce a method to consolidate the editing process across various views.
We introduce QNeRF, a neural radiance field trained on the internal query features of the edited images.
We refine the process through a progressive, iterative method that better consolidates queries across the diffusion timesteps.
arXiv Detail & Related papers (2024-02-22T18:50:18Z)
- ConsistNet: Enforcing 3D Consistency for Multi-view Images Diffusion [61.37481051263816]
Given a single image of a 3D object, this paper proposes a method (named ConsistNet) that is able to generate multiple images of the same object.
Our method effectively learns 3D consistency over a frozen Zero123 backbone and can generate 16 surrounding views of the object within 40 seconds on a single A100 GPU.
arXiv Detail & Related papers (2023-10-16T12:29:29Z)
- Multi-Spectral Image Stitching via Spatial Graph Reasoning [52.27796682972484]
We propose a spatial graph reasoning based multi-spectral image stitching method.
We embed multi-scale complementary features from the same view position into a set of nodes.
By introducing long-range coherence along spatial and channel dimensions, the complementarity of pixel relations and channel interdependencies aids in the reconstruction of aligned multi-view features.
arXiv Detail & Related papers (2023-07-31T15:04:52Z)
- TediGAN: Text-Guided Diverse Face Image Generation and Manipulation [52.83401421019309]
TediGAN is a framework for multi-modal image generation and manipulation with textual descriptions.
The StyleGAN inversion module maps real images to the latent space of a well-trained StyleGAN.
Visual-linguistic similarity learning performs text-image matching by mapping images and text into a common embedding space.
Instance-level optimization is used to preserve identity during manipulation.
arXiv Detail & Related papers (2020-12-06T16:20:19Z)
- Distribution Aligned Multimodal and Multi-Domain Image Stylization [76.74823384524814]
We propose a unified framework for multimodal and multi-domain style transfer.
The key component of our method is a novel style distribution alignment module.
We validate our proposed framework on painting style transfer with a variety of different artistic styles and genres.
arXiv Detail & Related papers (2020-06-02T07:25:53Z)
This list is automatically generated from the titles and abstracts of the papers listed on this site.