Related papers: SYNTHIA: Novel Concept Design with Affordance Composition

URL: http://arxiv.org/abs/2502.17793v2
Date: Thu, 10 Apr 2025 18:37:28 GMT
Title: SYNTHIA: Novel Concept Design with Affordance Composition
Authors: Hyeonjeong Ha, Xiaomeng Jin, Jeonghwan Kim, Jiateng Liu, Zhenhailong Wang, Khanh Duy Nguyen, Ansel Blume, Nanyun Peng, Kai-Wei Chang, Heng Ji
Abstract summary: We introduce SYNTHIA, a framework for generating novel, functionally coherent designs based on desired affordances. We develop a curriculum learning scheme based on our ontology that contrastively fine-tunes T2I models to progressively learn affordance composition. Experimental results show that SYNTHIA outperforms state-of-the-art T2I models.
Score: 114.19366716161655
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Text-to-image (T2I) models enable rapid concept design, making them widely used in AI-driven design. While recent studies focus on generating semantic and stylistic variations of given design concepts, functional coherence--the integration of multiple affordances into a single coherent concept--remains largely overlooked. In this paper, we introduce SYNTHIA, a framework for generating novel, functionally coherent designs based on desired affordances. Our approach leverages a hierarchical concept ontology that decomposes concepts into parts and affordances, serving as a crucial building block for functionally coherent design. We also develop a curriculum learning scheme based on our ontology that contrastively fine-tunes T2I models to progressively learn affordance composition while maintaining visual novelty. To elaborate, we (i) gradually increase affordance distance, guiding models from basic concept-affordance association to complex affordance compositions that integrate parts of distinct affordances into a single, coherent form, and (ii) enforce visual novelty by employing contrastive objectives to push learned representations away from existing concepts. Experimental results show that SYNTHIA outperforms state-of-the-art T2I models, demonstrating absolute gains of 25.1% and 14.7% for novelty and functional coherence in human evaluation, respectively.
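The two training ideas named in the abstract, a curriculum that gradually increases affordance distance and a contrastive objective that pushes representations away from existing concepts, can be illustrated with a minimal sketch. The abstract does not give the actual loss or schedule, so everything here is an assumption made for illustration: the function names (`curriculum_stages`, `novelty_loss`), the precomputed distance values, the toy 2-D embeddings, and the triplet-style hinge loss, which stands in for whatever contrastive objective the paper uses.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def curriculum_stages(pairs, num_stages):
    """Split (affordance_a, affordance_b, distance) triples into stages.

    Pairs are sorted by distance so that early stages hold closely related
    affordances and later stages hold distant ones (hypothetical schedule).
    """
    ordered = sorted(pairs, key=lambda p: p[2])
    size = math.ceil(len(ordered) / num_stages)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def novelty_loss(generated, target, existing, margin=0.2):
    """Triplet-style hinge loss (an assumed stand-in, not the paper's loss).

    Pulls the generated embedding toward the affordance target and pushes it
    away from the most similar existing-concept embedding.
    """
    pos = cosine(generated, target)
    neg = max(cosine(generated, e) for e in existing)
    return max(0.0, margin + neg - pos)


# Toy data: distances and embeddings are made up for the example.
pairs = [
    ("sit", "store", 0.9),
    ("cut", "slice", 0.1),
    ("sit", "recline", 0.3),
    ("fly", "pour", 0.7),
]
stages = curriculum_stages(pairs, num_stages=2)
loss_far = novelty_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
loss_near = novelty_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

In this toy setup the first stage contains the two closest affordance pairs and the second stage the two most distant ones. The loss is zero when the generated embedding matches the target and is orthogonal to every existing concept, and it grows when the embedding coincides with an existing concept instead.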

Related papers

Bridging Cognitive Gap: Hierarchical Description Learning for Artistic Image Aesthetics Assessment [51.40989269202702]
Aesthetic quality assessment is crucial for developing a human-aligned quantitative evaluation system for AIGC. We propose ArtQuant, an aesthetics assessment framework for artistic images that couples isolated aesthetic dimensions through description generation. Our approach achieves state-of-the-art performance on several datasets while requiring only 33% of conventional training epochs.
arXiv Detail & Related papers (2025-12-29T12:18:26Z)
FaCT: Faithful Concept Traces for Explaining Neural Network Decisions [56.796533084868884]
Deep networks have shown remarkable performance across a wide range of tasks, yet getting a global concept-level understanding of how they function remains a key challenge. We put emphasis on the faithfulness of concept-based explanations and propose a new model with model-inherent mechanistic concept-explanations. Our concepts are shared across classes and, from any layer, their contribution to the logit and their input-visualization can be faithfully traced.
arXiv Detail & Related papers (2025-10-29T13:35:46Z)
EchoDistill: Bidirectional Concept Distillation for One-Step Diffusion Personalization [30.814807961528572]
We propose a bidirectional concept distillation framework, EchoDistill, to enable one-step diffusion personalization. Our approach involves an end-to-end training process where a multi-step diffusion model (teacher) and a one-step diffusion model (student) are trained simultaneously. Our experiments demonstrate that this collaborative framework significantly outperforms existing personalization methods over the 1-SDP setup.
arXiv Detail & Related papers (2025-10-23T12:56:33Z)
Real-Time Intuitive AI Drawing System for Collaboration: Enhancing Human Creativity through Formal and Contextual Intent Integration [26.920087528015205]
This paper presents a real-time generative drawing system that interprets and integrates both formal intent and contextual intent. The system achieves low-latency, two-stage transformation while supporting multi-user collaboration on shared canvases.
arXiv Detail & Related papers (2025-08-12T01:34:23Z)
A Concept-Centric Approach to Multi-Modality Learning [3.828996378105142]
We introduce a new multi-modality learning framework to create a more efficient AI system. Our framework performs on par with benchmark models while demonstrating more efficient learning curves.
arXiv Detail & Related papers (2024-12-18T13:40:21Z)
LLM-based Hierarchical Concept Decomposition for Interpretable Fine-Grained Image Classification [5.8754760054410955]
We introduce Hi-CoDecomposition, a novel framework designed to enhance model interpretability through structured concept analysis. Our approach not only aligns with the performance of state-of-the-art models but also advances transparency by providing clear insights into the decision-making process.
arXiv Detail & Related papers (2024-05-29T00:36:56Z)
Advancing Ante-Hoc Explainable Models through Generative Adversarial Networks [24.45212348373868]
This paper presents a novel concept learning framework for enhancing model interpretability and performance in visual classification tasks. Our approach appends an unsupervised explanation generator to the primary classifier network and makes use of adversarial training. This work presents a significant step towards building inherently interpretable deep vision models with task-aligned concept representations.
arXiv Detail & Related papers (2024-01-09T16:16:16Z)
Unifying Correspondence, Pose and NeRF for Pose-Free Novel View Synthesis from Stereo Pairs [57.492124844326206]
This work delves into the task of pose-free novel view synthesis from stereo pairs, a challenging and pioneering task in 3D vision. Our innovative framework, unlike any before, seamlessly integrates 2D correspondence matching, camera pose estimation, and NeRF rendering, fostering a synergistic enhancement of these tasks.
arXiv Detail & Related papers (2023-12-12T13:22:44Z)
Coarse-to-Fine Concept Bottleneck Models [9.910980079138206]
This work targets ante hoc interpretability, and specifically Concept Bottleneck Models (CBMs). Our goal is to design a framework that admits a highly interpretable decision making process with respect to human understandable concepts, on two levels of granularity. Within this framework, concept information does not solely rely on the similarity between the whole image and general unstructured concepts; instead, we introduce the notion of concept hierarchy to uncover and exploit more granular concept information residing in patch-specific regions of the image scene.
arXiv Detail & Related papers (2023-10-03T14:57:31Z)
ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models [79.10890337599166]
We introduce ConceptBed, a large-scale dataset that consists of 284 unique visual concepts and 33K composite text prompts. We evaluate visual concepts that are either objects, attributes, or styles, and also evaluate four dimensions of compositionality: counting, attributes, relations, and actions. Our results point to a trade-off between learning the concepts and preserving the compositionality which existing approaches struggle to overcome.
arXiv Detail & Related papers (2023-06-07T18:00:38Z)
FALCON: Fast Visual Concept Learning by Integrating Images, Linguistic descriptions, and Conceptual Relations [99.54048050189971]
We present a framework for learning new visual concepts quickly, guided by multiple naturally occurring data streams. The learned concepts support downstream applications, such as answering questions by reasoning about unseen images. We demonstrate the effectiveness of our model on both synthetic and real-world datasets.
arXiv Detail & Related papers (2022-03-30T19:45:00Z)
Interpretable Visual Reasoning via Induced Symbolic Space [75.95241948390472]
We study the problem of concept induction in visual reasoning, i.e., identifying concepts and their hierarchical relationships from question-answer pairs associated with images. We first design a new framework named object-centric compositional attention model (OCCAM) to perform the visual reasoning task with object-level visual features. We then come up with a method to induce concepts of objects and relations using clues from the attention patterns between objects' visual features and question words.
arXiv Detail & Related papers (2020-11-23T18:21:49Z)
Learning the Compositional Visual Coherence for Complementary Recommendations [62.60648815930101]
Complementary recommendations aim at providing users product suggestions that are supplementary and compatible with their obtained items. We propose a novel Content Attentive Neural Network (CANN) to model the comprehensive compositional coherence on both global contents and semantic contents.
arXiv Detail & Related papers (2020-06-08T06:57:18Z)

This list is automatically generated from the titles and abstracts of the papers on this site.