FROMAT: Multiview Material Appearance Transfer via Few-Shot Self-Attention Adaptation
- URL: http://arxiv.org/abs/2512.09617v1
- Date: Wed, 10 Dec 2025 13:06:40 GMT
- Title: FROMAT: Multiview Material Appearance Transfer via Few-Shot Self-Attention Adaptation
- Authors: Hubert Kompanowski, Varun Jampani, Aaryaman Vasishta, Binh-Son Hua
- Abstract summary: We present a lightweight adaptation technique for appearance transfer in multiview diffusion models. Our method learns to combine object identity from an input image with appearance cues rendered in a separate reference image, producing multi-view-consistent output.
- Score: 49.74776147964999
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Multiview diffusion models have rapidly emerged as a powerful tool for content creation with spatial consistency across viewpoints, offering rich visual realism without requiring explicit geometry and appearance representation. However, compared to meshes or radiance fields, existing multiview diffusion models offer limited appearance manipulation, particularly in terms of material, texture, or style. In this paper, we present a lightweight adaptation technique for appearance transfer in multiview diffusion models. Our method learns to combine object identity from an input image with appearance cues rendered in a separate reference image, producing multi-view-consistent output that reflects the desired materials, textures, or styles. This allows explicit specification of appearance parameters at generation time while preserving the underlying object geometry and view coherence. We leverage three diffusion denoising processes responsible for generating the original object, the reference, and the target images, and perform reverse sampling to aggregate a small subset of layer-wise self-attention features from the object and the reference to influence the target generation. Our method requires only a few training examples to introduce appearance awareness to pretrained multiview models. The experiments show that our method provides a simple yet effective way toward multiview generation with diverse appearance, advocating the adoption of implicit generative 3D representations in practice.
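To make the attention-aggregation idea concrete, here is a minimal PyTorch sketch of how a target branch's self-attention can attend to keys and values drawn from the parallel object and reference denoising branches. This is an illustrative reconstruction, not the authors' code: the module structure, the blend weight `alpha`, and all tensor shapes are assumptions.

```python
# Minimal sketch (not the paper's implementation) of the core idea: the
# target branch's self-attention attends to keys/values aggregated from the
# object and reference denoising branches. Module names, shapes, and the
# blend weight `alpha` are illustrative assumptions; assumes PyTorch >= 2.0.
from typing import Optional

import torch
import torch.nn.functional as F
from torch import nn


class SharedSelfAttention(nn.Module):
    """Self-attention whose key/value set can be extended with tokens from other branches."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.heads = heads
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
        self.to_out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, extra_kv: Optional[torch.Tensor] = None) -> torch.Tensor:
        # x: (batch, tokens, dim); extra_kv: tokens from the object/reference branches.
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        if extra_kv is not None:
            _, ek, ev = self.to_qkv(extra_kv).chunk(3, dim=-1)
            k = torch.cat([k, ek], dim=1)  # target queries can now attend to
            v = torch.cat([v, ev], dim=1)  # object + reference tokens as well

        def split(t: torch.Tensor) -> torch.Tensor:
            b, n, d = t.shape
            return t.view(b, n, self.heads, d // self.heads).transpose(1, 2)

        out = F.scaled_dot_product_attention(split(q), split(k), split(v))
        b, h, n, d = out.shape
        return self.to_out(out.transpose(1, 2).reshape(b, n, h * d))


# One step with stand-in features for the three branches. In the paper this
# sharing happens only in a small subset of self-attention layers.
dim, tokens = 64, 16
attn = SharedSelfAttention(dim)
obj_feats = torch.randn(1, tokens, dim)  # object branch (identity/geometry)
ref_feats = torch.randn(1, tokens, dim)  # reference branch (material/style)
tgt_feats = torch.randn(1, tokens, dim)  # target branch being generated

alpha = 0.5  # hypothetical blend weight between object and reference cues
extra = torch.cat([alpha * obj_feats, (1 - alpha) * ref_feats], dim=1)
print(attn(tgt_feats, extra_kv=extra).shape)  # torch.Size([1, 16, 64])
```

Because only a small subset of layers would share features this way, adapting a pretrained multiview model along these lines touches few parameters, which is consistent with the abstract's claim that only a few training examples are needed.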
Related papers
- ViewMask-1-to-3: Multi-View Consistent Image Generation via Multimodal Diffusion Models [70.28556518166037]
We introduce ViewMask-1-to-3, a pioneering approach that applies discrete diffusion models to multi-view image generation. By unifying language and vision through masked token prediction, our approach enables progressive generation of multiple viewpoints. Our approach ranks first on average across the GSO and 3D-FUTURE datasets in terms of PSNR, SSIM, and LPIPS.
arXiv Detail & Related papers (2025-12-16T05:15:07Z)
- UniView: Enhancing Novel View Synthesis From A Single Image By Unifying Reference Features [8.962212671008201]
We propose a novel model dubbed UniView, which can leverage reference images from a similar object to provide strong prior information during view synthesis. Our UniView significantly improves novel view synthesis performance and outperforms state-of-the-art methods on challenging datasets.
arXiv Detail & Related papers (2025-09-05T08:54:57Z)
- MEAT: Multiview Diffusion Model for Human Generation on Megapixels with Mesh Attention [83.56588173102594]
We introduce a solution called mesh attention to enable training at 1024x1024 resolution. This approach significantly reduces the complexity of multiview attention while maintaining cross-view consistency. Building on this foundation, we devise a mesh attention block and combine it with keypoint conditioning to create our human-specific multiview diffusion model, MEAT.
arXiv Detail & Related papers (2025-03-11T17:50:59Z)
- Zero-Shot Novel View and Depth Synthesis with Multi-View Geometric Diffusion [27.836518920611557]
We introduce MVGD, a diffusion-based architecture capable of direct pixel-level generation of images and depth maps from novel viewpoints. We train this model on a collection of more than 60 million multi-view samples from publicly available datasets. We report state-of-the-art results in multiple novel view synthesis benchmarks, as well as multi-view stereo and video depth estimation.
arXiv Detail & Related papers (2025-01-30T23:43:06Z)
- IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations [64.07859467542664]
Capturing geometric and material information from images remains a fundamental challenge in computer vision and graphics. Traditional optimization-based methods often require hours of computational time to reconstruct geometry, material properties, and environmental lighting from dense multi-view inputs. We introduce IDArb, a diffusion-based model designed to perform intrinsic decomposition on an arbitrary number of images under varying illuminations.
arXiv Detail & Related papers (2024-12-16T18:52:56Z)
- UpFusion: Novel View Diffusion from Unposed Sparse View Observations [66.36092764694502]
UpFusion can perform novel view synthesis and infer 3D representations for an object given a sparse set of reference images.
We show that this mechanism allows generating high-fidelity novel views while improving the synthesis quality given additional (unposed) images.
arXiv Detail & Related papers (2023-12-11T18:59:55Z)
- DeepMultiCap: Performance Capture of Multiple Characters Using Sparse Multiview Cameras [63.186486240525554]
DeepMultiCap is a novel method for multi-person performance capture using sparse multi-view cameras.
Our method can capture time-varying surface details without the need for pre-scanned template models.
arXiv Detail & Related papers (2021-05-01T14:32:13Z)
- Generating Annotated High-Fidelity Images Containing Multiple Coherent Objects [10.783993190686132]
We propose a multi-object generation framework that can synthesize images with multiple objects without explicitly requiring contextual information.
We demonstrate how coherency and fidelity are preserved with our method through experiments on the Multi-MNIST and CLEVR datasets.
arXiv Detail & Related papers (2020-06-22T11:33:55Z)