SeqTex: Generate Mesh Textures in Video Sequence
- URL: http://arxiv.org/abs/2507.04285v1
- Date: Sun, 06 Jul 2025 07:58:36 GMT
- Title: SeqTex: Generate Mesh Textures in Video Sequence
- Authors: Ze Yuan, Xin Yu, Yangtian Sun, Yuan-Chen Guo, Yan-Pei Cao, Ding Liang, Xiaojuan Qi
- Abstract summary: We introduce SeqTex, a novel end-to-end framework for training 3D texture generative models. We show that SeqTex achieves state-of-the-art performance on both image-conditioned and text-conditioned 3D texture generation tasks.
- Score: 62.766839821764144
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Training native 3D texture generative models remains a fundamental yet challenging problem, largely due to the limited availability of large-scale, high-quality 3D texture datasets. This scarcity hinders generalization to real-world scenarios. To address this, most existing methods finetune foundation image generative models to exploit their learned visual priors. However, these approaches typically generate only multi-view images and rely on post-processing to produce UV texture maps -- an essential representation in modern graphics pipelines. Such two-stage pipelines often suffer from error accumulation and spatial inconsistencies across the 3D surface. In this paper, we introduce SeqTex, a novel end-to-end framework that leverages the visual knowledge encoded in pretrained video foundation models to directly generate complete UV texture maps. Unlike previous methods that model the distribution of UV textures in isolation, SeqTex reformulates the task as a sequence generation problem, enabling the model to learn the joint distribution of multi-view renderings and UV textures. This design effectively transfers the consistent image-space priors from video foundation models into the UV domain. To further enhance performance, we propose several architectural innovations: a decoupled multi-view and UV branch design, geometry-informed attention to guide cross-domain feature alignment, and adaptive token resolution to preserve fine texture details while maintaining computational efficiency. Together, these components allow SeqTex to fully utilize pretrained video priors and synthesize high-fidelity UV texture maps without the need for post-processing. Extensive experiments show that SeqTex achieves state-of-the-art performance on both image-conditioned and text-conditioned 3D texture generation tasks, with superior 3D consistency, texture-geometry alignment, and real-world generalization.
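The listing carries no code, so the following is a minimal PyTorch sketch of the sequence formulation the abstract describes: multi-view tokens and UV tokens denoised jointly in one transformer block, with an additive attention mask standing in for geometry-informed attention. The block structure, tensor shapes, and the `geo_bias` argument are illustrative assumptions, not the authors' implementation (which additionally uses decoupled multi-view/UV branches and adaptive token resolution).

```python
# Sketch only: joint attention over a [multi-view | UV] token sequence,
# so image-space priors can propagate into the UV domain.
import torch
import torch.nn as nn

class JointMVUVBlock(nn.Module):
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, mv_tokens, uv_tokens, geo_bias=None):
        # Concatenate view and UV tokens so the model learns their joint
        # distribution instead of modeling UV textures in isolation.
        x = torch.cat([mv_tokens, uv_tokens], dim=1)
        h = self.norm1(x)
        # geo_bias (optional) would encode 3D correspondences, e.g. which
        # UV texels are visible in which view, as an additive attention
        # mask; a stand-in for the paper's geometry-informed attention.
        h, _ = self.attn(h, h, h, attn_mask=geo_bias)
        x = x + h
        x = x + self.mlp(self.norm2(x))
        n_mv = mv_tokens.shape[1]
        return x[:, :n_mv], x[:, n_mv:]

# Toy usage: 4 views of 256 tokens each plus 1024 UV tokens, width 512.
block = JointMVUVBlock(512)
mv = torch.randn(1, 4 * 256, 512)
uv = torch.randn(1, 1024, 512)
mv_out, uv_out = block(mv, uv)
```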
Related papers
- UniTEX: Universal High Fidelity Generative Texturing for 3D Shapes [35.667175445637604]
We present UniTEX, a novel two-stage 3D texture generation framework. UniTEX achieves superior visual quality and texture integrity compared to existing approaches.
arXiv Detail & Related papers (2025-05-29T08:58:41Z)
- RomanTex: Decoupling 3D-aware Rotary Positional Embedded Multi-Attention Network for Texture Synthesis [10.350576861948952]
RomanTex is a multiview-based texture generation framework that integrates a multi-attention network with an underlying 3D representation. Our method achieves state-of-the-art results in texture quality and consistency.
arXiv Detail & Related papers (2025-03-24T17:56:11Z)
- TriTex: Learning Texture from a Single Mesh via Triplane Semantic Features [78.13246375582906]
We present a novel approach that learns a volumetric texture field from a single textured mesh by mapping semantic features to surface target colors. Our approach achieves superior texture quality across 3D models in applications like game development.
arXiv Detail & Related papers (2025-03-20T18:35:03Z)
- TEXGen: a Generative Diffusion Model for Mesh Textures [63.43159148394021]
We focus on the fundamental problem of learning in the UV texture space itself.
We propose a scalable network architecture that interleaves convolutions on UV maps with attention layers on point clouds.
We train a 700 million parameter diffusion model that can generate UV texture maps guided by text prompts and single-view images.
arXiv Detail & Related papers (2024-11-22T05:22:11Z)
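The TEXGen entry above names a concrete hybrid design: convolutions on the UV map interleaved with attention over surface points. Below is a hypothetical sketch of one such block; the gather/scatter scheme, shapes, and names are assumptions for illustration, not TEXGen's actual architecture.

```python
# Sketch only: one hybrid block in the spirit of TEXGen's description.
import torch
import torch.nn as nn

class UVConvPointAttnBlock(nn.Module):
    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.GELU(),
            nn.Conv2d(channels, channels, 3, padding=1))
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, uv_feat, point_uv_idx):
        # uv_feat: (B, C, H, W) features living on the UV texture map.
        # point_uv_idx: (B, N) flattened UV indices of surface sample points.
        B, C, H, W = uv_feat.shape
        x = uv_feat + self.conv(uv_feat)          # local mixing in UV space
        flat = x.flatten(2).transpose(1, 2)       # (B, H*W, C)
        idx = point_uv_idx.unsqueeze(-1).expand(-1, -1, C)
        pts = torch.gather(flat, 1, idx)          # features at surface points
        h = self.norm(pts)
        h, _ = self.attn(h, h, h)                 # global mixing over points
        flat = flat.scatter(1, idx, pts + h)      # write back to UV locations
        return flat.transpose(1, 2).reshape(B, C, H, W)

# Toy usage: a 32x32 UV feature map and 128 surface sample points.
block = UVConvPointAttnBlock(64)
out = block(torch.randn(2, 64, 32, 32), torch.randint(0, 32 * 32, (2, 128)))
```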
- MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D [63.9188712646076]
Texturing is a crucial step in the 3D asset production workflow, enhancing the visual appeal and diversity of 3D assets.
Despite recent advancements, existing methods often yield subpar results, primarily due to local discontinuities.
We propose a novel framework called MVPaint, which can generate high-resolution, seamless textures while ensuring multi-view consistency.
arXiv Detail & Related papers (2024-11-04T17:59:39Z)
- TexFusion: Synthesizing 3D Textures with Text-Guided Image Diffusion Models [77.85129451435704]
We present a new method to synthesize textures for given 3D geometries, using large-scale text-guided image diffusion models.
Specifically, we leverage latent diffusion models, apply the denoising model on a set of 2D renders of the 3D object, and aggregate the predictions on a shared latent texture map.
arXiv Detail & Related papers (2023-10-20T19:15:29Z)
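The aggregation step that the TexFusion summary compresses can be pictured as visibility-weighted averaging of per-view denoiser outputs into a single latent texture. The sketch below is a guess at that operation with placeholder projection data (the per-pixel texel coordinates and weights); TexFusion's actual scheme is more involved.

```python
# Sketch only: visibility-weighted aggregation of per-view denoiser
# outputs into one shared latent texture map.
import torch

def aggregate_views_to_texture(view_preds, uv_coords, vis_weights, tex_hw):
    """view_preds:  list of (C, H, W) denoised latents, one per view
    uv_coords:   list of (H, W, 2) integer texel coords for each pixel
    vis_weights: list of (H, W) visibility/confidence weights
    Returns a (C, tex_hw, tex_hw) latent texture map."""
    C = view_preds[0].shape[0]
    tex = torch.zeros(C, tex_hw * tex_hw)
    wsum = torch.zeros(1, tex_hw * tex_hw)
    for pred, uv, w in zip(view_preds, uv_coords, vis_weights):
        # Flatten (u, v) texel coordinates into linear indices.
        idx = (uv[..., 1].long() * tex_hw + uv[..., 0].long()).flatten()
        # index_add_ accumulates correctly when several view pixels land
        # on the same texel.
        tex.index_add_(1, idx, pred.flatten(1) * w.flatten())
        wsum.index_add_(1, idx, w.flatten().unsqueeze(0))
    return (tex / wsum.clamp(min=1e-8)).view(C, tex_hw, tex_hw)
```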
- FFHQ-UV: Normalized Facial UV-Texture Dataset for 3D Face Reconstruction [46.3392612457273]
This dataset contains over 50,000 high-quality texture UV-maps with even illuminations, neutral expressions, and cleaned facial regions.
Our pipeline utilizes the recent advances in StyleGAN-based facial image editing approaches.
Experiments show that our method improves the reconstruction accuracy over state-of-the-art approaches.
arXiv Detail & Related papers (2022-11-25T03:21:05Z)
- AUV-Net: Learning Aligned UV Maps for Texture Transfer and Synthesis [78.17671694498185]
We propose AUV-Net, which learns to embed 3D surfaces into a 2D aligned UV space.
As a result, textures are aligned across objects, and can thus be easily synthesized by generative models of images.
The learned UV mapping and aligned texture representations enable a variety of applications including texture transfer, texture synthesis, and textured single view 3D reconstruction.
arXiv Detail & Related papers (2022-04-06T21:39:24Z)