3D Highlighter: Localizing Regions on 3D Shapes via Text Descriptions
- URL: http://arxiv.org/abs/2212.11263v1
- Date: Wed, 21 Dec 2022 18:54:47 GMT
- Title: 3D Highlighter: Localizing Regions on 3D Shapes via Text Descriptions
- Authors: Dale Decatur, Itai Lang, Rana Hanocka
- Abstract summary: 3D Highlighter is a technique for localizing semantic regions on a mesh using text as input.
Our system demonstrates the ability to reason about where to place non-obviously related concepts on an input 3D shape.
- Score: 14.65300898522962
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present 3D Highlighter, a technique for localizing semantic regions on a
mesh using text as input. A key feature of our system is the ability to
interpret "out-of-domain" localizations. Our system demonstrates the ability to
reason about where to place non-obviously related concepts on an input 3D
shape, such as adding clothing to a bare 3D animal model. Our method
contextualizes the text description using a neural field and colors the
corresponding region of the shape using a probability-weighted blend. Our
neural optimization is guided by a pre-trained CLIP encoder, which bypasses the
need for any 3D datasets or 3D annotations. Thus, 3D Highlighter is highly
flexible, general, and capable of producing localizations on a myriad of input
shapes. Our code is publicly available at
https://github.com/threedle/3DHighlighter.
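The mechanism the abstract describes, a neural field that assigns each surface point a probability of belonging to the text-described region, with vertex colors set by a probability-weighted blend, can be illustrated compactly. Below is a minimal PyTorch sketch; the MLP architecture, layer sizes, and the two colors (`highlight`, `base`) are illustrative assumptions rather than the paper's exact configuration, which lives in the linked repository.

```python
import torch
import torch.nn as nn

class HighlightField(nn.Module):
    """Maps 3D vertex coordinates to a highlight probability in [0, 1]."""
    def __init__(self, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, xyz: torch.Tensor) -> torch.Tensor:
        # xyz: (V, 3) vertex positions -> (V, 1) highlight probabilities
        return torch.sigmoid(self.mlp(xyz))

def blend_vertex_colors(xyz: torch.Tensor, field: HighlightField,
                        highlight=(1.0, 1.0, 0.0), base=(0.6, 0.6, 0.6)):
    """Probability-weighted blend: p * highlight_color + (1 - p) * base_color.

    The specific colors here are placeholders, not the paper's choices.
    """
    p = field(xyz)  # (V, 1)
    c_hl = torch.tensor(highlight, dtype=xyz.dtype, device=xyz.device)
    c_bg = torch.tensor(base, dtype=xyz.dtype, device=xyz.device)
    return p * c_hl + (1.0 - p) * c_bg  # (V, 3) per-vertex colors
```

In the full system, the blended mesh would be rendered differentiably and scored against the text prompt with the pre-trained CLIP encoder (sketched after the related-papers list below), so gradients flow from the text-image similarity back into the field's weights.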
Related papers
- OpenGaussian: Towards Point-Level 3D Gaussian-based Open Vocabulary Understanding [54.981605111365056]
This paper introduces OpenGaussian, a method based on 3D Gaussian Splatting (3DGS) capable of 3D point-level open vocabulary understanding.
Our primary motivation stems from observing that existing 3DGS-based open vocabulary methods mainly focus on 2D pixel-level parsing.
arXiv Detail & Related papers (2024-06-04T07:42:33Z)
- BerfScene: Bev-conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation [96.58789785954409]
We propose a practical and efficient 3D representation that incorporates an equivariant radiance field with the guidance of a bird's-eye view map.
We produce large-scale, even infinite-scale, 3D scenes by synthesizing local scenes and then stitching them together with smooth consistency.
arXiv Detail & Related papers (2023-12-04T18:56:10Z)
- WildFusion: Learning 3D-Aware Latent Diffusion Models in View Space [77.92350895927922]
We propose WildFusion, a new approach to 3D-aware image synthesis based on latent diffusion models (LDMs).
Our 3D-aware LDM is trained without any direct supervision from multiview images or 3D geometry.
This opens up promising research avenues for scalable 3D-aware image synthesis and 3D content creation from in-the-wild image data.
arXiv Detail & Related papers (2023-11-22T18:25:51Z) - 3DStyle-Diffusion: Pursuing Fine-grained Text-driven 3D Stylization with
2D Diffusion Models [102.75875255071246]
3D content creation via text-driven stylization poses a fundamental challenge for the multimedia and graphics community.
We propose a new 3DStyle-Diffusion model that triggers fine-grained stylization of 3D meshes with additional controllable appearance and geometric guidance from 2D Diffusion models.
arXiv Detail & Related papers (2023-11-09T15:51:27Z)
- High-Fidelity 3D Face Generation from Natural Language Descriptions [12.22081892575208]
We argue that the major obstacles are (1) the lack of high-quality 3D face data with descriptive text annotations, and (2) the complex mapping between the descriptive language space and the shape/appearance space.
We build the Describe3D dataset, the first large-scale dataset with fine-grained text descriptions for the text-to-3D face generation task.
We propose a two-stage framework that first generates a 3D face matching the concrete descriptions, then optimizes the parameters in the 3D shape and texture space using the abstract description to refine the 3D face model.
arXiv Detail & Related papers (2023-05-05T06:10:15Z)
- Text2Mesh: Text-Driven Neural Stylization for Meshes [18.435567297462416]
Our framework, Text2Mesh, stylizes a 3D mesh by predicting color and local geometric details which conform to a target text prompt.
We consider a disentangled representation of a 3D object: a fixed mesh input (content) coupled with a learned neural network, which we term the neural style field network.
To modify style, we obtain a similarity score between a text prompt (describing style) and a stylized mesh by harnessing the representational power of CLIP (a minimal sketch of this scoring appears after this list).
arXiv Detail & Related papers (2021-12-06T18:23:29Z)
- NeSF: Neural Semantic Fields for Generalizable Semantic Segmentation of 3D Scenes [25.26518805603798]
NeSF is a method for producing 3D semantic fields from posed RGB images alone.
Our method is the first to offer truly dense 3D scene segmentations requiring only 2D supervision for training.
arXiv Detail & Related papers (2021-11-25T21:44:54Z)
- Tracking People with 3D Representations [78.97070307547283]
We present a novel approach for tracking multiple people in video.
Unlike past approaches, which employ 2D representations, we employ 3D representations of people located in three-dimensional space.
We find that 3D representations are more effective than 2D representations for tracking in these settings.
arXiv Detail & Related papers (2021-11-15T16:15:21Z)
- Parameter-Efficient Person Re-identification in the 3D Space [51.092669618679615]
We project 2D images to a 3D space and introduce a novel parameter-efficient Omni-scale Graph Network (OG-Net) to learn the pedestrian representation directly from 3D point clouds.
OG-Net effectively exploits the local information provided by sparse 3D points and takes advantage of the structure and appearance information in a coherent manner.
Ours is among the first attempts to conduct person re-identification in 3D space.
arXiv Detail & Related papers (2020-06-08T13:20:33Z)
- Local Implicit Grid Representations for 3D Scenes [24.331110387905962]
We introduce Local Implicit Grid Representations, a new 3D shape representation designed for scalability and generality.
We train an autoencoder to learn an embedding of local crops of 3D shapes at a fixed part size.
Then, we use the decoder as a component in a shape optimization that solves for a set of latent codes on a regular grid of overlapping crops.
arXiv Detail & Related papers (2020-03-19T18:58:13Z)
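Several of the works above (Text2Mesh, and 3D Highlighter itself) share the same supervision signal: a frozen, pre-trained CLIP model embeds both rendered views of the shape and the text prompt, and the cosine similarity between the two embeddings drives a gradient-based optimization, removing any need for 3D datasets or annotations. A minimal sketch follows, assuming the openai `clip` package and a batch of renders already resized and normalized for CLIP; the rendering pipeline itself is out of scope here.

```python
import torch
import clip  # OpenAI CLIP: https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _preprocess = clip.load("ViT-B/32", device=device)

def clip_score(rendered_views: torch.Tensor, prompt: str) -> torch.Tensor:
    """Mean cosine similarity between rendered views and a text prompt.

    rendered_views: (N, 3, 224, 224) batch of renders, already preprocessed
    the way CLIP's image encoder expects.
    """
    image_emb = model.encode_image(rendered_views.to(device))
    text_emb = model.encode_text(clip.tokenize([prompt]).to(device))
    image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    return (image_emb @ text_emb.T).mean()
```

During optimization, one would minimize the negated score (e.g. `-clip_score(renders, "a cow wearing shoes")`) so that gradients from the frozen CLIP encoders flow back through a differentiable renderer into the network being trained.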
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.