VIDES: Virtual Interior Design via Natural Language and Visual Guidance
- URL: http://arxiv.org/abs/2308.13795v1
- Date: Sat, 26 Aug 2023 07:41:42 GMT
- Title: VIDES: Virtual Interior Design via Natural Language and Visual Guidance
- Authors: Minh-Hien Le, Chi-Bien Chu, Khanh-Duy Le, Tam V. Nguyen, Minh-Triet Tran, and Trung-Nghia Le
- Abstract summary: We propose the Virtual Interior DESign (VIDES) system to reduce the time and expertise needed to develop and edit interior design concepts.
Leveraging cutting-edge technology in generative AI, our system can assist users in generating and editing indoor scene concepts.
- Score: 16.35842298296878
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Interior design is crucial in creating aesthetically pleasing and functional
indoor spaces. However, developing and editing interior design concepts
requires significant time and expertise. We propose the Virtual Interior DESign
(VIDES) system in response to this challenge. Leveraging cutting-edge
technology in generative AI, our system can assist users in generating and
editing indoor scene concepts quickly, given a user's text description and visual
guidance. Using both visual guidance and language as the conditional inputs
significantly enhances the accuracy and coherence of the generated scenes,
resulting in visually appealing designs. Through extensive experimentation, we
demonstrate the effectiveness of VIDES in developing new indoor concepts,
changing indoor styles, and replacing and removing interior objects. The system
successfully captures the essence of users' descriptions while providing
flexibility for customization. Consequently, this system can potentially reduce
the entry barrier for indoor design, making it more accessible to users with
limited technical skills and reducing the time required to create high-quality
images. Individuals who have a background in design can now easily communicate
their ideas visually and effectively present their design concepts.
https://sites.google.com/view/ltnghia/research/VIDES
Related papers
- DiffDesign: Controllable Diffusion with Meta Prior for Efficient Interior Design Generation [25.532400438564334]
We propose DiffDesign, a controllable diffusion model with meta priors for efficient interior design generation.
Specifically, we utilize the generative priors of a 2D diffusion model pre-trained on a large image dataset as our rendering backbone.
We further guide the denoising process by disentangling cross-attention control over design attributes, such as appearance, pose, and size, and introduce an optimal transfer-based alignment module to enforce view consistency.
arXiv Detail & Related papers (2024-11-25T11:36:34Z)
- MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models [51.1034358143232]
We introduce component-controllable personalization, a novel task that pushes the boundaries of text-to-image (T2I) models.
To overcome these challenges, we design MagicTailor, an innovative framework that leverages Dynamic Masked Degradation (DM-Deg) to dynamically perturb undesired visual semantics.
arXiv Detail & Related papers (2024-10-17T09:22:53Z)
- Inspired by AI? A Novel Generative AI System To Assist Conceptual Automotive Design [6.001793288867721]
Design inspiration is crucial for establishing the direction of a design as well as evoking feelings and conveying meanings during the conceptual design process.
Many practicing designers use text-based searches on platforms like Pinterest to gather image ideas, followed by sketching on paper or using digital tools to develop concepts.
Emerging generative AI techniques, such as diffusion models, offer a promising avenue to streamline these processes by swiftly generating design concepts based on text and image inspiration inputs.
arXiv Detail & Related papers (2024-06-06T17:04:14Z)
- I-Design: Personalized LLM Interior Designer [57.00412237555167]
I-Design is a personalized interior designer that allows users to generate and visualize their design goals through natural language communication.
I-Design starts with a team of large language model agents that engage in dialogues and logical reasoning with one another.
The final design is then constructed in 3D by retrieving and integrating assets from an existing object database.
arXiv Detail & Related papers (2024-04-03T16:17:53Z)
- MyVLM: Personalizing VLMs for User-Specific Queries [78.33252556805931]
We take a first step toward the personalization of vision-language models, enabling them to learn and reason over user-provided concepts.
To effectively recognize a variety of user-specific concepts, we augment the VLM with external concept heads that function as toggles for the model.
Having recognized the concept, we learn a new concept embedding in the intermediate feature space of the VLM.
This embedding is tasked with guiding the language model to naturally integrate the target concept in its generated response.
arXiv Detail & Related papers (2024-03-21T17:51:01Z)
- DressCode: Autoregressively Sewing and Generating Garments from Text Guidance [61.48120090970027]
DressCode aims to democratize design for novices and offer immense potential in fashion design, virtual try-on, and digital human creation.
We first introduce SewingGPT, a GPT-based architecture integrating cross-attention with text-conditioned embedding to generate sewing patterns.
We then tailor a pre-trained Stable Diffusion to generate tile-based Physically-based Rendering (PBR) textures for the garments.
arXiv Detail & Related papers (2024-01-29T16:24:21Z)
- VISHIEN-MAAT: Scrollytelling visualization design for explaining Siamese Neural Network concept to non-technical users [8.939421900877742]
This work proposes a novel visualization design for creating a scrollytelling that can effectively explain an AI concept to non-technical users.
Our approach helps create a visualization valuable for a short-timeline situation like a sales pitch.
arXiv Detail & Related papers (2023-04-04T08:26:54Z)
- Towards Counterfactual Image Manipulation via CLIP [106.94502632502194]
Existing methods can achieve realistic editing of different visual attributes such as age and gender of facial images.
We investigate this problem in a text-driven manner with Contrastive-Language-Image-Pretraining (CLIP).
We design a novel contrastive loss that exploits predefined CLIP-space directions to guide the editing toward desired directions from different perspectives.
arXiv Detail & Related papers (2022-07-06T17:02:25Z)
- Learning-based pose edition for efficient and interactive design [55.41644538483948]
In computer-aided animation, artists define the key poses of a character by manipulating its skeleton.
Character pose must respect many ill-defined constraints, and so the resulting realism greatly depends on the animator's skill and knowledge.
We describe an efficient tool for pose design, allowing users to intuitively manipulate a pose to create character animations.
arXiv Detail & Related papers (2021-07-01T12:15:02Z)
- DeFINE: Delayed Feedback based Immersive Navigation Environment for Studying Goal-Directed Human Navigation [10.7197371210731]
Delayed Feedback based Immersive Navigation Environment (DeFINE) is a framework that allows for easy creation and administration of navigation tasks.
DeFINE has a built-in capability to provide performance feedback to participants during an experiment.
arXiv Detail & Related papers (2020-03-06T11:00:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.