CLIPXPlore: Coupled CLIP and Shape Spaces for 3D Shape Exploration
- URL: http://arxiv.org/abs/2306.08226v1
- Date: Wed, 14 Jun 2023 03:39:32 GMT
- Title: CLIPXPlore: Coupled CLIP and Shape Spaces for 3D Shape Exploration
- Authors: Jingyu Hu, Ka-Hei Hui, Zhengzhe Liu, Hao Zhang, and Chi-Wing Fu
- Abstract summary: This paper presents a new framework that leverages a vision-language model to guide the exploration of the 3D shape space.
We propose to leverage CLIP, a powerful pre-trained vision-language model, to aid the shape-space exploration.
We design three exploration modes, binary-attribute-guided, text-guided, and sketch-guided, to locate suitable exploration trajectories in shape space and induce meaningful changes to the shape.
- Score: 53.623649386871016
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper presents CLIPXPlore, a new framework that leverages a
vision-language model to guide the exploration of the 3D shape space. Many
recent methods have been developed to encode 3D shapes into a learned latent
shape space to enable generative design and modeling. Yet, existing methods
lack effective mechanisms for exploring these spaces, despite the rich information they encode. To this
end, we propose to leverage CLIP, a powerful pre-trained vision-language model,
to aid the shape-space exploration. Our idea is threefold. First, we couple the
CLIP and shape spaces by generating paired CLIP and shape codes through sketch
images and training a mapper network to connect the two spaces. Second, to
explore the space around a given shape, we formulate a co-optimization strategy
to search for the CLIP code that better matches the geometry of the shape.
Third, we design three exploration modes, binary-attribute-guided, text-guided,
and sketch-guided, to locate suitable exploration trajectories in shape space
and induce meaningful changes to the shape. We perform a series of experiments
to quantitatively and visually compare CLIPXPlore with different baselines in
each of the three exploration modes, showing that CLIPXPlore can produce many
meaningful exploration results that cannot be achieved by the existing
solutions.
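The threefold idea above (pair the codes, train a mapper, then traverse) lends itself to a short sketch. Below is a minimal, hypothetical PyTorch illustration, not the authors' released code: a small MLP maps CLIP codes to codes in a pre-trained shape latent space, trained on paired (CLIP code, shape code) data, and a text-derived direction in CLIP space is re-mapped to shape space to nudge a given shape. All dimensions, layer sizes, and function names are assumptions.

```python
import torch
import torch.nn as nn

CLIP_DIM, SHAPE_DIM = 512, 256   # assumed embedding sizes

class CLIPToShapeMapper(nn.Module):
    """Hypothetical mapper that translates a CLIP code into a code in a
    pre-trained shape latent space (the 'coupling' step)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(CLIP_DIM, 1024), nn.ReLU(),
            nn.Linear(1024, 1024), nn.ReLU(),
            nn.Linear(1024, SHAPE_DIM),
        )

    def forward(self, clip_code: torch.Tensor) -> torch.Tensor:
        return self.net(clip_code)

def training_step(mapper, clip_codes, shape_codes, opt):
    """One supervised step on paired (CLIP code, shape code) data, e.g.
    pairs obtained by encoding sketch renderings of the same shapes."""
    opt.zero_grad()
    loss = nn.functional.mse_loss(mapper(clip_codes), shape_codes)
    loss.backward()
    opt.step()
    return loss.item()

@torch.no_grad()
def text_guided_step(mapper, clip_code, text_direction, step_size=0.1):
    """Move a shape's CLIP code along a text-derived direction and re-map
    it to shape space, yielding an edited shape code."""
    moved = clip_code + step_size * text_direction / text_direction.norm()
    return mapper(moved)

if __name__ == "__main__":
    mapper = CLIPToShapeMapper()
    opt = torch.optim.Adam(mapper.parameters(), lr=1e-4)
    clip_codes = torch.randn(8, CLIP_DIM)    # random stand-ins for paired codes
    shape_codes = torch.randn(8, SHAPE_DIM)
    print("loss:", training_step(mapper, clip_codes, shape_codes, opt))
    direction = torch.randn(CLIP_DIM)        # stand-in for a CLIP text direction
    print("edited code:", text_guided_step(mapper, clip_codes[0], direction).shape)
```

In the paper's setting, the paired codes would come from a frozen CLIP encoder and a frozen shape encoder applied to sketch images of the same shapes, and `text_direction` could be a difference of CLIP text embeddings (e.g., "a long table" minus "a table"); those specifics are assumptions here.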
Related papers
- PEGAsus: 3D Personalization of Geometry and Appearance [84.10611282310562]
PEGAsus is a new framework capable of generating Personalized 3D shapes by learning shape concepts at both Geometry and Appearance levels.
We formulate 3D shape personalization as extracting reusable, category-agnostic geometric and appearance attributes from reference shapes.
We extend our approach to region-wise concept learning, enabling flexible concept extraction, with context-aware and context-free losses.
arXiv Detail & Related papers (2026-02-09T01:41:27Z)
- Can Shape-Infused Joint Embeddings Improve Image-Conditioned 3D Diffusion? [5.0243930429558885]
This study introduces CISP (Contrastive Image Shape Pre-training), designed to enhance 3D shape synthesis guided by 2D images.
CISP aims to enrich the CLIP framework by aligning 2D images with 3D shapes in a shared embedding space.
We find that, while matching CLIP in generation quality and diversity, CISP substantially improves coherence with input images.
arXiv Detail & Related papers (2024-02-02T09:09:23Z)
- Unsupervised Representation Learning for Diverse Deformable Shape Collections [30.271818994854353]
We introduce a novel learning-based method for encoding and manipulating 3D surface meshes.
Our method is specifically designed to create an interpretable embedding space for deformable shape collections.
arXiv Detail & Related papers (2023-10-27T13:45:30Z)
- Explorable Mesh Deformation Subspaces from Unstructured Generative Models [53.23510438769862]
Deep generative models of 3D shapes often feature continuous latent spaces that can be used to explore potential variations.
We present a method to explore variations among a given set of landmark shapes by constructing a mapping from an easily-navigable 2D exploration space to a subspace of a pre-trained generative model.
arXiv Detail & Related papers (2023-10-11T18:53:57Z)
- ShapeClipper: Scalable 3D Shape Learning from Single-View Images via Geometric and CLIP-based Consistency [39.7058456335011]
We present ShapeClipper, a novel method that reconstructs 3D object shapes from real-world single-view RGB images.
ShapeClipper learns shape reconstruction from a set of single-view segmented images.
We evaluate our method over three challenging real-world datasets.
arXiv Detail & Related papers (2023-04-13T03:53:12Z)
- DreamStone: Image as Stepping Stone for Text-Guided 3D Shape Generation [105.97545053660619]
We present DreamStone, a new text-guided 3D shape generation approach.
It uses images as a stepping stone to bridge the gap between text and shape modalities for generating 3D shapes without requiring paired text and 3D data.
Our approach is generic, flexible, and scalable, and it can be easily integrated with various SVR models to expand the generative space and improve the generative fidelity.
arXiv Detail & Related papers (2023-03-24T03:56:23Z)
- CLIP-Sculptor: Zero-Shot Generation of High-Fidelity and Diverse Shapes from Natural Language [21.727938353786218]
We introduce CLIP-Sculptor, a method to produce high-fidelity and diverse 3D shapes without the need for (text, shape) pairs during training.
For improved shape diversity, we use a discrete latent space which is modeled using a transformer conditioned on CLIP's image-text embedding space.
arXiv Detail & Related papers (2022-11-02T18:50:25Z)
- CLIP2Point: Transfer CLIP to Point Cloud Classification with Image-Depth Pre-training [121.46758260964114]
Pre-training across 3D vision and language remains under development because of limited training data.
Recent works attempt to transfer vision-language pre-training models to 3D vision.
PointCLIP converts point cloud data to multi-view depth maps, adopting CLIP for shape classification.
We propose CLIP2Point, an image-depth pre-training method that uses contrastive learning to transfer CLIP to the 3D domain (a toy sketch of this depth-projection and contrastive-alignment idea appears after this list).
arXiv Detail & Related papers (2022-10-03T16:13:14Z)
- ISS: Image as Stepping Stone for Text-Guided 3D Shape Generation [91.37036638939622]
This paper presents Image as Stepping Stone (ISS), a new framework for text-guided 3D shape generation that introduces a 2D image as a stepping stone to connect the two modalities.
Our key contribution is a two-stage feature-space-alignment approach that maps CLIP features to shapes.
We formulate a text-guided shape stylization module to dress up the output shapes with novel textures.
arXiv Detail & Related papers (2022-09-09T06:54:21Z)
- Latent Partition Implicit with Surface Codes for 3D Representation [54.966603013209685]
We introduce a novel implicit representation to represent a single 3D shape as a set of parts in the latent space.
We name our method Latent Partition Implicit (LPI) for its ability to cast global shape modeling into multiple local part-modeling problems.
arXiv Detail & Related papers (2022-07-18T14:24:46Z)
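To ground the depth-projection and contrastive-transfer ideas from the PointCLIP/CLIP2Point entry above, here is a toy, hypothetical sketch, not the papers' implementations: an orthographic single-view depth rendering of a point cloud, of the kind a frozen CLIP image encoder could consume, plus a standard InfoNCE loss for contrastively aligning two sets of embeddings. The resolution, the single fixed view, and the loss form are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def point_cloud_to_depth_map(points: torch.Tensor, res: int = 64) -> torch.Tensor:
    """Toy orthographic front-view depth projection of an (N, 3) point cloud.
    Real pipelines render multiple views; this sketch uses one fixed view."""
    pts = points - points.min(dim=0).values       # shift to the origin
    pts = pts / (pts.max() + 1e-8)                # normalize into [0, 1]
    xy = (pts[:, :2] * (res - 1)).long()          # pixel coordinates per point
    depth = torch.zeros(res, res)
    for (x, y), z in zip(xy.tolist(), pts[:, 2].tolist()):
        depth[y, x] = max(depth[y, x].item(), 1.0 - z)  # keep the nearest point
    return depth

def info_nce(a: torch.Tensor, b: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Standard InfoNCE between two aligned embedding batches, as used for
    contrastive image-depth (or image-text) alignment."""
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature              # pairwise cosine similarities
    labels = torch.arange(len(a))                 # matching pairs on the diagonal
    return F.cross_entropy(logits, labels)

if __name__ == "__main__":
    cloud = torch.rand(1024, 3)                   # random stand-in point cloud
    print("depth map:", point_cloud_to_depth_map(cloud).shape)  # (64, 64)
    img_emb, depth_emb = torch.randn(8, 512), torch.randn(8, 512)
    print("InfoNCE:", info_nce(img_emb, depth_emb).item())
```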