Jigsaw: Supporting Designers to Prototype Multimodal Applications by Chaining AI Foundation Models
- URL: http://arxiv.org/abs/2310.08574v2
- Date: Tue, 25 Jun 2024 12:50:34 GMT
- Title: Jigsaw: Supporting Designers to Prototype Multimodal Applications by Chaining AI Foundation Models
- Authors: David Chuan-En Lin, Nikolas Martelaro
- Abstract summary: Jigsaw is a prototype system that employs puzzle pieces as metaphors to represent foundation models.
Designers can combine different foundation model capabilities across various modalities by assembling compatible puzzle pieces.
- Score: 4.435190193476497
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advancements in AI foundation models have made it possible for them to be utilized off-the-shelf for creative tasks, including ideating design concepts or generating visual prototypes. However, integrating these models into the creative process can be challenging as they often exist as standalone applications tailored to specific tasks. To address this challenge, we introduce Jigsaw, a prototype system that employs puzzle pieces as metaphors to represent foundation models. Jigsaw allows designers to combine different foundation model capabilities across various modalities by assembling compatible puzzle pieces. To inform the design of Jigsaw, we interviewed ten designers and distilled design goals. In a user study, we showed that Jigsaw enhanced designers' understanding of available foundation model capabilities, provided guidance on combining capabilities across different modalities and tasks, and served as a canvas to support design exploration, prototyping, and documentation.
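The idea of assembling compatible puzzle pieces can be illustrated with a minimal sketch. This is not Jigsaw's implementation: the `Piece` class, the modality labels, and the stub functions standing in for foundation models are all hypothetical, and compatibility is assumed here to mean only that one piece's output modality equals the next piece's input modality.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Piece:
    """Hypothetical puzzle piece: one model capability with a typed input and output."""
    name: str
    input_modality: str
    output_modality: str
    run: Callable[[str], str]  # stub standing in for a real foundation model call


def compatible(left: Piece, right: Piece) -> bool:
    """Assumed rule: two pieces fit if the left output modality matches the right input."""
    return left.output_modality == right.input_modality


def assemble(pieces: List[Piece]) -> Callable[[str], str]:
    """Check every adjacent pair, then return a function that runs the chain in order."""
    for left, right in zip(pieces, pieces[1:]):
        if not compatible(left, right):
            raise ValueError(
                f"{left.name} outputs {left.output_modality}, "
                f"but {right.name} expects {right.input_modality}"
            )

    def chain(value: str) -> str:
        for piece in pieces:
            value = piece.run(value)
        return value

    return chain


# Stub pieces; strings stand in for real text and image data.
ideate = Piece("ideate", "text", "text", lambda s: f"concept({s})")
text_to_image = Piece("text_to_image", "text", "image", lambda s: f"image[{s}]")
caption = Piece("caption", "image", "text", lambda s: f"caption of {s}")

pipeline = assemble([ideate, text_to_image, caption])
result = pipeline("chair")
```

Checking compatibility at assembly time, rather than when the chain runs, mirrors the way mismatched puzzle pieces simply do not fit together; in this sketch `assemble([text_to_image, ideate])` raises a `ValueError` because an image output cannot feed a text input.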
Related papers
- Bridging Design Gaps: A Parametric Data Completion Approach With Graph Guided Diffusion Models [9.900586490845694]
This study introduces a generative imputation model leveraging graph attention networks and tabular diffusion models for completing missing parametric data in engineering designs.
We demonstrate that our model significantly outperforms existing classical methods, such as MissForest, hotDeck, PPCA, and TabCSDI, in both the accuracy and diversity of imputation options.
The graph model helps accurately capture and impute complex parametric interdependencies from an assembly graph, which is key for design problems.
arXiv Detail & Related papers (2024-06-17T16:03:17Z)
- Automatic Layout Planning for Visually-Rich Documents with Instruction-Following Models [81.6240188672294]
In graphic design, non-professional users often struggle to create visually appealing layouts due to limited skills and resources.
We introduce a novel multimodal instruction-following framework for layout planning, allowing users to easily arrange visual elements into tailored layouts.
Our method not only simplifies the design process for non-professionals but also surpasses the performance of few-shot GPT-4V models, with mIoU higher by 12% on Crello.
arXiv Detail & Related papers (2024-04-23T17:58:33Z)
- Compositional Generative Inverse Design [69.22782875567547]
Inverse design, where we seek to design input variables in order to optimize an underlying objective function, is an important problem.
We show that by instead optimizing over the learned energy function captured by the diffusion model, we can avoid such adversarial examples.
In an N-body interaction task and a challenging 2D multi-airfoil design task, we demonstrate that by composing the learned diffusion model at test time, our method allows us to design initial states and boundary shapes.
arXiv Detail & Related papers (2024-01-24T01:33:39Z)
- Visual Instruction Tuning towards General-Purpose Multimodal Model: A Survey [59.95153883166705]
Traditional computer vision generally solves each single task independently by a dedicated model with the task instruction implicitly designed in the model architecture.
Visual Instruction Tuning (VIT), which finetunes a large vision model with language as task instructions, has been intensively studied recently.
This work aims to provide a systematic review of visual instruction tuning, covering (1) the background that presents computer vision task paradigms and the development of VIT; (2) the foundations of VIT that introduce commonly used network architectures, visual instruction tuning frameworks and objectives, and evaluation setups and tasks; and (3) the commonly used datasets in visual instruction tuning and evaluation.
arXiv Detail & Related papers (2023-12-27T14:54:37Z)
- DesignGPT: Multi-Agent Collaboration in Design [4.6272626111555955]
DesignGPT uses artificial intelligence agents to simulate the roles of different positions in a design company and allows human designers to collaborate with them in natural language.
Experimental results show that compared with separate AI tools, DesignGPT improves the performance of designers.
arXiv Detail & Related papers (2023-11-20T08:05:52Z)
- ZhiJian: A Unifying and Rapidly Deployable Toolbox for Pre-trained Model Reuse [59.500060790983994]
This paper introduces ZhiJian, a comprehensive and user-friendly toolbox for model reuse, utilizing the PyTorch backend.
ZhiJian presents a novel paradigm that unifies diverse perspectives on model reuse, encompassing target architecture construction with PTM, tuning target model with PTM, and PTM-based inference.
arXiv Detail & Related papers (2023-08-17T19:12:13Z)
- LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer [80.61492265221817]
Graphic layout designs play an essential role in visual communication.
Yet handcrafting layout designs is skill-demanding, time-consuming, and non-scalable to batch production.
Generative models have emerged to make design automation scalable, but it remains non-trivial to produce designs that comply with designers' desires.
arXiv Detail & Related papers (2022-12-19T21:57:35Z)
- AI Art in Architecture [0.6853165736531939]
Recent diffusion-based AI art platforms are able to create impressive images from simple text descriptions.
This is also true for early stages of architectural design with multiple stages of ideation, sketching and modelling.
We research the applicability of the platforms Midjourney, DALL-E 2 and StableDiffusion to a series of common use cases in architectural design.
arXiv Detail & Related papers (2022-12-19T12:24:14Z)
- Model LEGO: Creating Models Like Disassembling and Assembling Building Blocks [53.09649785009528]
In this paper, we explore a paradigm that does not require training to obtain new models.
Similar to the birth of CNN inspired by receptive fields in the biological visual system, we propose Model Disassembling and Assembling.
For model assembling, we present the alignment padding strategy and parameter scaling strategy to construct a new model tailored for a specific task.
arXiv Detail & Related papers (2022-03-25T05:27:28Z)
- CreativeGAN: Editing Generative Adversarial Networks for Creative Design Synthesis [1.933681537640272]
This paper proposes an automated method, named CreativeGAN, for generating novel designs.
It does so by identifying components that make a design unique and modifying a GAN model such that it becomes more likely to generate designs with identified unique components.
Using a dataset of bicycle designs, we demonstrate that the method can create new bicycle designs with unique frames and handles, and generalize rare novelties to a broad set of designs.
arXiv Detail & Related papers (2021-03-10T18:22:35Z)
- Designing Machine Learning Toolboxes: Concepts, Principles and Patterns [0.0]
We provide an overview of key patterns in the design of AI modeling toolboxes.
Our analysis can not only explain the design of existing toolboxes, but also guide the development of new ones.
arXiv Detail & Related papers (2021-01-13T08:55:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.