Fashion Matrix: Editing Photos by Just Talking
- URL: http://arxiv.org/abs/2307.13240v1
- Date: Tue, 25 Jul 2023 04:06:25 GMT
- Title: Fashion Matrix: Editing Photos by Just Talking
- Authors: Zheng Chong, Xujie Zhang, Fuwei Zhao, Zhenyu Xie and Xiaodan Liang
- Abstract summary: We develop a hierarchical AI system called Fashion Matrix dedicated to editing photos by just talking.
Fashion Matrix employs Large Language Models (LLMs) as its foundational support and engages in iterative interactions with users.
Visual Foundation Models are leveraged to generate edited images from text prompts and masks, thereby facilitating the automation of fashion editing processes.
- Score: 66.83502497764698
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The utilization of Large Language Models (LLMs) for the construction of AI
systems has garnered significant attention across diverse fields. The extension
of LLMs to the domain of fashion holds substantial commercial potential but
also poses inherent challenges due to the intricate semantic interactions in
fashion-related generation. To address this issue, we developed a hierarchical
AI system called Fashion Matrix dedicated to editing photos by just talking.
This system facilitates diverse prompt-driven tasks, encompassing garment or
accessory replacement, recoloring, addition, and removal. Specifically, Fashion
Matrix employs LLMs as its foundational support and engages in iterative
interactions with users. It employs a range of Semantic Segmentation Models
(e.g., Grounded-SAM, MattingAnything, etc.) to delineate the specific editing
masks based on user instructions. Subsequently, Visual Foundation Models (e.g.,
Stable Diffusion, ControlNet, etc.) are leveraged to generate edited images
from text prompts and masks, thereby facilitating the automation of fashion
editing processes. Experiments demonstrate the outstanding ability of Fashion
Matrix to explore the collaborative potential of functionally diverse
pre-trained models in the domain of fashion editing.
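For concreteness, below is a minimal Python sketch of the mask-then-inpaint flow the abstract describes. The `parse_instruction` and `segment_region` helpers are hypothetical placeholders for the LLM and segmentation (e.g., Grounded-SAM) stages, whose interfaces the abstract does not specify; the final stage uses the Hugging Face diffusers inpainting pipeline as a stand-in for the paper's Visual Foundation Models, not the paper's actual implementation.

```python
# Minimal sketch of the talk-to-edit flow (assumptions, not the paper's code).
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline


def parse_instruction(instruction: str) -> tuple[str, str]:
    """Hypothetical stand-in for the LLM stage: map a user request to
    (region to edit, text prompt for the edited content)."""
    # e.g. "replace the jacket with a red leather jacket"
    #   -> region "jacket", prompt "a red leather jacket"
    region, _, prompt = instruction.partition(" with ")
    return region.removeprefix("replace the").strip(), (prompt.strip() or instruction)


def segment_region(image: Image.Image, region: str) -> Image.Image:
    """Hypothetical stand-in for the segmentation stage (Grounded-SAM /
    MattingAnything in the paper). Should return a binary mask image in which
    white marks the area to be regenerated."""
    raise NotImplementedError("plug an open-vocabulary segmenter in here")


def edit_photo(image_path: str, instruction: str) -> Image.Image:
    image = Image.open(image_path).convert("RGB").resize((512, 512))
    region, prompt = parse_instruction(instruction)
    mask = segment_region(image, region)

    # Visual Foundation Model stage: a standard text+mask inpainting pipeline.
    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-inpainting", torch_dtype=torch.float16
    ).to("cuda")
    return pipe(prompt=prompt, image=image, mask_image=mask).images[0]
```

In the full system, the LLM also drives iterative multi-turn refinement with the user and additional conditioning via ControlNet, which this sketch omits.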
Related papers
- DPDEdit: Detail-Preserved Diffusion Models for Multimodal Fashion Image Editing [26.090574235851083]
We introduce a new fashion image editing architecture based on latent diffusion models, called Detail-Preserved Diffusion Models (DPDEdit)
DPDEdit guides the fashion image generation of diffusion models by integrating text prompts, region masks, human pose images, and garment texture images.
To transfer the detail of the given garment texture into the target fashion image, we propose a texture injection and refinement mechanism.
arXiv Detail & Related papers (2024-09-02T09:15:26Z)
- AnyDesign: Versatile Area Fashion Editing via Mask-Free Diffusion [25.61572702219732]
Fashion image editing aims to modify a person's appearance based on a given instruction.
Current methods require auxiliary tools like segmenters and keypoint extractors, lacking a flexible and unified framework.
We propose AnyDesign, a diffusion-based method that enables mask-free editing on versatile areas.
arXiv Detail & Related papers (2024-08-21T12:04:32Z)
- A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models [117.77807994397784]
Image editing aims to edit the given synthetic or real image to meet the specific requirements from users.
Recent significant advancement in this field is based on the development of text-to-image (T2I) diffusion models.
T2I-based image editing methods significantly enhance editing performance and offer a user-friendly interface for modifying content guided by multimodal inputs.
arXiv Detail & Related papers (2024-06-20T17:58:52Z)
- SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models [91.22477798288003]
This paper introduces SmartEdit, a novel approach to instruction-based image editing.
It exploits Multimodal Large Language Models (MLLMs) to enhance its understanding and reasoning capabilities.
We show that a small amount of complex instruction editing data can effectively stimulate SmartEdit's editing capabilities for more complex instructions.
arXiv Detail & Related papers (2023-12-11T17:54:11Z)
- Guiding Instruction-based Image Editing via Multimodal Large Language Models [102.82211398699644]
Multimodal large language models (MLLMs) show promising capabilities in cross-modal understanding and visual-aware response generation.
We investigate how MLLMs facilitate edit instructions and present MLLM-Guided Image Editing (MGIE)
MGIE learns to derive expressive instructions and provides explicit guidance.
arXiv Detail & Related papers (2023-09-29T10:01:50Z)
- Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing [40.70752781891058]
We propose the task of multimodal-conditioned fashion image editing, guiding the generation of human-centric fashion images.
We tackle this problem by proposing a new architecture based on latent diffusion models.
Given the lack of existing datasets suitable for the task, we also extend two existing fashion datasets.
arXiv Detail & Related papers (2023-04-04T18:03:04Z)
- FICE: Text-Conditioned Fashion Image Editing With Guided GAN Inversion [16.583537785874604]
We propose a novel text-conditioned editing model, called FICE, capable of handling a wide variety of diverse text descriptions.
FICE generates highly realistic fashion images and leads to stronger editing performance than existing competing approaches.
arXiv Detail & Related papers (2023-01-05T15:33:23Z)
- DiffEdit: Diffusion-based semantic image editing with mask guidance [64.555930158319]
DiffEdit is a method to take advantage of text-conditioned diffusion models for the task of semantic image editing.
Our main contribution is the ability to automatically generate a mask highlighting the regions of the input image that need to be edited.
arXiv Detail & Related papers (2022-10-20T17:16:37Z)
- SpaceEdit: Learning a Unified Editing Space for Open-Domain Image Editing [94.31103255204933]
We propose a unified model for open-domain image editing focusing on color and tone adjustment of open-domain images.
Our model learns a unified editing space that is more semantic, intuitive, and easy to manipulate.
We show that by inverting image pairs into latent codes of the learned editing space, our model can be leveraged for various downstream editing tasks.
arXiv Detail & Related papers (2021-11-30T23:53:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.