SYNTHIA: Novel Concept Design with Affordance Composition
- URL: http://arxiv.org/abs/2502.17793v2
- Date: Thu, 10 Apr 2025 18:37:28 GMT
- Title: SYNTHIA: Novel Concept Design with Affordance Composition
- Authors: Hyeonjeong Ha, Xiaomeng Jin, Jeonghwan Kim, Jiateng Liu, Zhenhailong Wang, Khanh Duy Nguyen, Ansel Blume, Nanyun Peng, Kai-Wei Chang, Heng Ji,
- Abstract summary: We introduce SYNTHIA, a framework for generating novel, functionally coherent designs based on desired affordances.<n>We develop a curriculum learning scheme based on our ontology that contrast fine-tunes T2I models to progressively learn affordance composition.<n> Experimental results show that SYNTHIA outperforms state-of-the-art T2I models.
- Score: 114.19366716161655
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-to-image (T2I) models enable rapid concept design, making them widely used in AI-driven design. While recent studies focus on generating semantic and stylistic variations of given design concepts, functional coherence--the integration of multiple affordances into a single coherent concept--remains largely overlooked. In this paper, we introduce SYNTHIA, a framework for generating novel, functionally coherent designs based on desired affordances. Our approach leverages a hierarchical concept ontology that decomposes concepts into parts and affordances, serving as a crucial building block for functionally coherent design. We also develop a curriculum learning scheme based on our ontology that contrastively fine-tunes T2I models to progressively learn affordance composition while maintaining visual novelty. To elaborate, we (i) gradually increase affordance distance, guiding models from basic concept-affordance association to complex affordance compositions that integrate parts of distinct affordances into a single, coherent form, and (ii) enforce visual novelty by employing contrastive objectives to push learned representations away from existing concepts. Experimental results show that SYNTHIA outperforms state-of-the-art T2I models, demonstrating absolute gains of 25.1% and 14.7% for novelty and functional coherence in human evaluation, respectively.
Related papers
- A Concept-Centric Approach to Multi-Modality Learning [3.828996378105142]
We introduce a new multi-modality learning framework to create a more efficient AI system.
Our framework achieves on par with benchmark models while demonstrating more efficient learning curves.
arXiv Detail & Related papers (2024-12-18T13:40:21Z) - LLM-based Hierarchical Concept Decomposition for Interpretable Fine-Grained Image Classification [5.8754760054410955]
We introduce textttHi-CoDecomposition, a novel framework designed to enhance model interpretability through structured concept analysis.
Our approach not only aligns with the performance of state-of-the-art models but also advances transparency by providing clear insights into the decision-making process.
arXiv Detail & Related papers (2024-05-29T00:36:56Z) - Advancing Ante-Hoc Explainable Models through Generative Adversarial Networks [24.45212348373868]
This paper presents a novel concept learning framework for enhancing model interpretability and performance in visual classification tasks.
Our approach appends an unsupervised explanation generator to the primary classifier network and makes use of adversarial training.
This work presents a significant step towards building inherently interpretable deep vision models with task-aligned concept representations.
arXiv Detail & Related papers (2024-01-09T16:16:16Z) - Unifying Correspondence, Pose and NeRF for Pose-Free Novel View Synthesis from Stereo Pairs [57.492124844326206]
This work delves into the task of pose-free novel view synthesis from stereo pairs, a challenging and pioneering task in 3D vision.
Our innovative framework, unlike any before, seamlessly integrates 2D correspondence matching, camera pose estimation, and NeRF rendering, fostering a synergistic enhancement of these tasks.
arXiv Detail & Related papers (2023-12-12T13:22:44Z) - Coarse-to-Fine Concept Bottleneck Models [9.910980079138206]
This work targets ante hoc interpretability, and specifically Concept Bottleneck Models (CBMs)
Our goal is to design a framework that admits a highly interpretable decision making process with respect to human understandable concepts, on two levels of granularity.
Within this framework, concept information does not solely rely on the similarity between the whole image and general unstructured concepts; instead, we introduce the notion of concept hierarchy to uncover and exploit more granular concept information residing in patch-specific regions of the image scene.
arXiv Detail & Related papers (2023-10-03T14:57:31Z) - ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image
Diffusion Models [79.10890337599166]
We introduce ConceptBed, a large-scale dataset that consists of 284 unique visual concepts and 33K composite text prompts.
We evaluate visual concepts that are either objects, attributes, or styles, and also evaluate four dimensions of compositionality: counting, attributes, relations, and actions.
Our results point to a trade-off between learning the concepts and preserving the compositionality which existing approaches struggle to overcome.
arXiv Detail & Related papers (2023-06-07T18:00:38Z) - FALCON: Fast Visual Concept Learning by Integrating Images, Linguistic
descriptions, and Conceptual Relations [99.54048050189971]
We present a framework for learning new visual concepts quickly, guided by multiple naturally occurring data streams.
The learned concepts support downstream applications, such as answering questions by reasoning about unseen images.
We demonstrate the effectiveness of our model on both synthetic and real-world datasets.
arXiv Detail & Related papers (2022-03-30T19:45:00Z) - Interpretable Visual Reasoning via Induced Symbolic Space [75.95241948390472]
We study the problem of concept induction in visual reasoning, i.e., identifying concepts and their hierarchical relationships from question-answer pairs associated with images.
We first design a new framework named object-centric compositional attention model (OCCAM) to perform the visual reasoning task with object-level visual features.
We then come up with a method to induce concepts of objects and relations using clues from the attention patterns between objects' visual features and question words.
arXiv Detail & Related papers (2020-11-23T18:21:49Z) - Learning the Compositional Visual Coherence for Complementary
Recommendations [62.60648815930101]
Complementary recommendations aim at providing users product suggestions that are supplementary and compatible with their obtained items.
We propose a novel Content Attentive Neural Network (CANN) to model the comprehensive compositional coherence on both global contents and semantic contents.
arXiv Detail & Related papers (2020-06-08T06:57:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.