Re-imagine the Negative Prompt Algorithm: Transform 2D Diffusion into
3D, alleviate Janus problem and Beyond
- URL: http://arxiv.org/abs/2304.04968v3
- Date: Wed, 26 Apr 2023 13:20:56 GMT
- Title: Re-imagine the Negative Prompt Algorithm: Transform 2D Diffusion into
3D, alleviate Janus problem and Beyond
- Authors: Mohammadreza Armandpour, Ali Sadeghian, Huangjie Zheng, Amir
Sadeghian, Mingyuan Zhou
- Abstract summary: We propose Perp-Neg, a new algorithm that leverages the geometrical properties of the score space to address the shortcomings of the current negative prompts algorithm.
Perp-Neg does not require any training or fine-tuning of the model.
We demonstrate that Perp-Neg provides greater flexibility in generating images by enabling users to edit out unwanted concepts.
- Score: 49.94798429552442
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although text-to-image diffusion models have made significant strides in
generating images from text, they are sometimes more inclined to generate
images like the data on which the model was trained rather than the provided
text. This limitation has hindered their usage in both 2D and 3D applications.
To address this problem, we explored the use of negative prompts but found that
the current implementation fails to produce desired results, particularly when
there is an overlap between the main and negative prompts. To overcome this
issue, we propose Perp-Neg, a new algorithm that leverages the geometrical
properties of the score space to address the shortcomings of the current
negative prompts algorithm. Perp-Neg does not require any training or
fine-tuning of the model. Moreover, we experimentally demonstrate that Perp-Neg
provides greater flexibility in generating images by enabling users to edit out
unwanted concepts from the initially generated images in 2D cases. Furthermore,
to extend the application of Perp-Neg to 3D, we conducted a thorough
exploration of how Perp-Neg can be used in 2D to condition the diffusion model
to generate desired views, rather than being biased toward the canonical views.
Finally, we applied our 2D intuition to integrate Perp-Neg with the
state-of-the-art text-to-3D (DreamFusion) method, effectively addressing its
Janus (multi-head) problem. Our project page is available at
https://Perp-Neg.github.io/
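The abstract describes the algorithm only at a high level, but the geometric idea it points to, handling the overlap between the main and negative prompts in score space, can be sketched roughly as below. This is a hedged illustration rather than the authors' reference implementation; the function name, the per-sample flattening, and the weighting scheme are assumptions.

```python
import torch

def perp_neg_guidance(eps_uncond, eps_pos, eps_neg, guidance_scale=7.5, neg_weight=1.0):
    # Directions of the positive and negative prompts relative to the
    # unconditional noise prediction (standard classifier-free guidance terms).
    d_pos = eps_pos - eps_uncond
    d_neg = eps_neg - eps_uncond

    # Project the negative direction onto the positive one and keep only the
    # perpendicular residual, so the negative prompt cannot cancel the part
    # of the positive direction it overlaps with.
    d_pos_flat = d_pos.flatten(1)
    d_neg_flat = d_neg.flatten(1)
    coeff = (d_neg_flat * d_pos_flat).sum(dim=1, keepdim=True) / (
        d_pos_flat.pow(2).sum(dim=1, keepdim=True) + 1e-8
    )
    d_neg_perp = (d_neg_flat - coeff * d_pos_flat).view_as(d_neg)

    # Guide toward the positive prompt and away from the orthogonalized
    # negative prompt.
    return eps_uncond + guidance_scale * (d_pos - neg_weight * d_neg_perp)
```

In the standard negative-prompt formulation, the full direction d_neg is subtracted, so any component it shares with d_pos is removed as well; that shared component is the overlap failure mode the abstract describes.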
Related papers
- SPAR3D: Stable Point-Aware Reconstruction of 3D Objects from Single Images [49.7344030427291]
We study the problem of single-image 3D object reconstruction.
Recent works have diverged into two directions: regression-based modeling and generative modeling.
We present SPAR3D, a novel two-stage approach aiming to take the best of both directions.
arXiv Detail & Related papers (2025-01-08T18:52:03Z)
- Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior [57.986512832738704]
We present a new framework Sculpt3D that equips the current pipeline with explicit injection of 3D priors from retrieved reference objects without re-training the 2D diffusion model.
Specifically, we demonstrate that high-quality and diverse 3D geometry can be guaranteed by keypoints supervision through a sparse ray sampling approach.
These two decoupled designs effectively harness 3D information from reference objects to generate 3D objects while preserving the generation quality of the 2D diffusion model.
arXiv Detail & Related papers (2024-03-14T07:39:59Z)
- Learning Naturally Aggregated Appearance for Efficient 3D Editing [94.47518916521065]
We propose to replace the color field with an explicit 2D appearance aggregation, also called canonical image.
To avoid the distortion effect and facilitate convenient editing, we complement the canonical image with a projection field that maps 3D points onto 2D pixels for texture lookup.
Our representation, dubbed AGAP, well supports various ways of 3D editing (e.g., stylization, interactive drawing, and content extraction) with no need of re-optimization.
arXiv Detail & Related papers (2023-12-11T18:59:31Z)
- EfficientDreamer: High-Fidelity and Robust 3D Creation via Orthogonal-view Diffusion Prior [59.25950280610409]
We propose a robust high-quality 3D content generation pipeline by exploiting orthogonal-view image guidance.
In this paper, we introduce a novel 2D diffusion model that generates an image consisting of four sub-images based on the given text prompt.
We also present a 3D synthesis network that can further improve the details of the generated 3D contents.
arXiv Detail & Related papers (2023-08-25T07:39:26Z)
- DreamSparse: Escaping from Plato's Cave with 2D Frozen Diffusion Model Given Sparse Views [20.685453627120832]
Existing methods often struggle with producing high-quality results or necessitate per-object optimization in such few-view settings.
DreamSparse is capable of synthesizing high-quality novel views for both object and scene-level images.
arXiv Detail & Related papers (2023-06-06T05:26:26Z)
- TextMesh: Generation of Realistic 3D Meshes From Text Prompts [56.2832907275291]
We propose a novel method for generation of highly realistic-looking 3D meshes.
To this end, we extend NeRF to employ an SDF backbone, leading to improved 3D mesh extraction.
arXiv Detail & Related papers (2023-04-24T20:29:41Z)
- Debiasing Scores and Prompts of 2D Diffusion for View-consistent Text-to-3D Generation [38.032010026146146]
One of the most notable issues in text-to-3D generation is the Janus problem, where the most canonical view of an object appears in other views.
We propose two approaches to debias the score-distillation frameworks for view-consistent text-to-3D generation.
Our methods improve the realism of the generated 3D objects by significantly reducing artifacts and achieve a good trade-off between faithfulness to the 2D diffusion models and 3D consistency with little overhead.
arXiv Detail & Related papers (2023-03-27T17:31:13Z)
- DreamFusion: Text-to-3D using 2D Diffusion [52.52529213936283]
Recent breakthroughs in text-to-image synthesis have been driven by diffusion models trained on billions of image-text pairs, but adapting them to 3D would require large-scale labeled 3D data that does not yet exist.
In this work, we circumvent these limitations by using a pretrained 2D text-to-image diffusion model to perform text-to-3D synthesis.
Our approach requires no 3D training data and no modifications to the image diffusion model, demonstrating the effectiveness of pretrained image diffusion models as priors.
arXiv Detail & Related papers (2022-09-29T17:50:40Z)
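For context on the DreamFusion baseline that Perp-Neg integrates with, the score distillation idea of driving a differentiable 3D render with a frozen 2D noise predictor can be sketched as follows. This is an illustrative sketch only; `diffusion_model`, the timestep range, and the weighting w(t) = 1 - alpha_bar_t are assumptions, not the paper's exact configuration.

```python
import torch

def sds_loss(rendered_images, diffusion_model, text_embeddings, alphas_cumprod):
    # rendered_images: differentiable renders of the 3D scene, shape (B, C, H, W).
    b = rendered_images.shape[0]
    t = torch.randint(20, 980, (b,), device=rendered_images.device)
    noise = torch.randn_like(rendered_images)
    alpha_bar = alphas_cumprod[t].view(b, 1, 1, 1)

    # Diffuse the render and ask the frozen 2D model to predict the noise.
    x_t = alpha_bar.sqrt() * rendered_images + (1.0 - alpha_bar).sqrt() * noise
    with torch.no_grad():
        eps_pred = diffusion_model(x_t, t, text_embeddings)

    # Score distillation: treat w(t) * (eps_pred - noise) as a fixed gradient
    # on the rendered pixels and backpropagate it into the 3D parameters.
    grad = (1.0 - alpha_bar) * (eps_pred - noise)
    return (grad.detach() * rendered_images).sum() / b
```

Only the 3D scene parameters receive gradients; the image diffusion model stays frozen and no 3D training data is needed, which is what lets the 2D prior serve as the supervision signal.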
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented here and is not responsible for any consequences arising from its use.