Related papers: Fully Distributed, Flexible Compositional Visual Representations via Soft Tensor Products

URL: http://arxiv.org/abs/2412.04671v3
Date: Thu, 23 Jan 2025 01:05:05 GMT
Title: Fully Distributed, Flexible Compositional Visual Representations via Soft Tensor Products
Authors: Bethia Sun, Maurice Pagnucco, Yang Song
Abstract summary: We introduce Soft TPR, a representational form that encodes compositional structure in an inherently distributed, flexible manner. We show that Soft TPR consistently outperforms conventional disentanglement alternatives. These findings highlight the promise of a distributed and flexible approach to representing compositional structure.
Score: 13.306125510884563
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Since the inception of the classicalist vs. connectionist debate, it has been argued that the ability to systematically combine symbol-like entities into compositional representations is crucial for human intelligence. In connectionist systems, the field of disentanglement has gained prominence for its ability to produce explicitly compositional representations; however, it relies on a fundamentally symbolic, concatenative representation of compositional structure that clashes with the continuous, distributed foundations of deep learning. To resolve this tension, we extend Smolensky's Tensor Product Representation (TPR) and introduce Soft TPR, a representational form that encodes compositional structure in an inherently distributed, flexible manner, along with Soft TPR Autoencoder, a theoretically-principled architecture designed specifically to learn Soft TPRs. Comprehensive evaluations in the visual representation learning domain demonstrate that the Soft TPR framework consistently outperforms conventional disentanglement alternatives -- achieving state-of-the-art disentanglement, boosting representation learner convergence, and delivering superior sample efficiency and low-sample regime performance in downstream tasks. These findings highlight the promise of a distributed and flexible approach to representing compositional structure by potentially enhancing alignment with the core principles of deep learning over the conventional symbolic approach.
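
Example (not from the paper): the abstract builds on Smolensky's Tensor Product Representation, in which each filler vector is bound to a role vector by an outer product and the bindings are summed. The following is a minimal sketch of that classical TPR only, assuming orthonormal role vectors so that unbinding is exact. The helper names (`outer`, `add`, `bind`, `unbind`) and the toy vectors are hypothetical, and the Soft TPR relaxation and the Soft TPR Autoencoder are not shown.

```python
# Classical TPR sketch: bind fillers to roles with outer products, then sum.
# All names and values are hypothetical toy choices, not taken from the paper.

def outer(f, r):
    """Outer product of filler f and role r as a len(f) x len(r) nested list."""
    return [[fi * rj for rj in r] for fi in f]

def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def bind(fillers, roles):
    """Build the TPR as the sum over i of outer(filler_i, role_i)."""
    tpr = [[0.0] * len(roles[0]) for _ in fillers[0]]
    for f, r in zip(fillers, roles):
        tpr = add(tpr, outer(f, r))
    return tpr

def unbind(tpr, role):
    """Recover the filler bound to `role`; exact only if roles are orthonormal."""
    return [sum(t * r for t, r in zip(row, role)) for row in tpr]

# Two orthonormal roles (say "shape" and "colour") and two toy fillers.
roles = [[1.0, 0.0], [0.0, 1.0]]
fillers = [[0.5, -1.0, 2.0], [3.0, 0.0, 1.0]]

tpr = bind(fillers, roles)
recovered = [unbind(tpr, r) for r in roles]
print(recovered)
```

In this sketch every filler contributes to the whole matrix instead of occupying its own slot, which is the distributed alternative to the concatenative representation that the abstract criticizes.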

Related papers

Sparsification and Reconstruction from the Perspective of Representation Geometry [10.834177456685538]
Sparse Autoencoders (SAEs) have emerged as a predominant tool in mechanistic interpretability. This study explains the principles of sparsity from the perspective of representational geometry. It specifically emphasizes the necessity of understanding representations and incorporating representational constraints.
arXiv Detail & Related papers (2025-05-28T15:54:33Z)
Distribution-Conditional Generation: From Class Distribution to Creative Generation [39.93527514513576]
DisTok is an encoder-decoder framework that maps class distributions into a latent space and decodes them into tokens of creative concept. DisTok achieves state-of-the-art performance with superior text-image alignment and human preference scores.
arXiv Detail & Related papers (2025-05-06T16:07:12Z)
"Principal Components" Enable A New Language of Images [79.45806370905775]
We introduce a novel visual tokenization framework that embeds a provable PCA-like structure into the latent token space. Our approach achieves state-of-the-art reconstruction performance and enables better interpretability to align with the human vision system.
arXiv Detail & Related papers (2025-03-11T17:59:41Z)
SYNTHIA: Novel Concept Design with Affordance Composition [114.19366716161655]
We introduce SYNTHIA, a framework for generating novel, functionally coherent designs based on desired affordances. We develop a curriculum learning scheme based on our ontology that contrast fine-tunes T2I models to progressively learn affordance composition. Experimental results show that SYNTHIA outperforms state-of-the-art T2I models.
arXiv Detail & Related papers (2025-02-25T02:54:11Z)
Systematic Abductive Reasoning via Diverse Relation Representations in Vector-symbolic Architecture [10.27696004820717]
We propose a Systematic Abductive Reasoning model with diverse relation representations (Rel-SAR) in Vector-symbolic Architecture (VSA). To derive representations with symbolic reasoning potential, we introduce not only various types of atomic vectors that represent numeric, periodic and logical semantics, but also the structured high-dimensional representation (S). For systematic reasoning, we propose novel numerical and logical functions and perform rule abduction and generalization execution in a unified framework that integrates these relation representations.
arXiv Detail & Related papers (2025-01-21T05:17:08Z)
NeSyCoCo: A Neuro-Symbolic Concept Composer for Compositional Generalization [17.49136753589057]
NeSyCoCo is a neuro-symbolic framework that generates symbolic representations and maps them to differentiable neural computations. Our framework achieves state-of-the-art results on the ReaSCAN and CLEVR-CoGenT compositional generalization benchmarks.
arXiv Detail & Related papers (2024-12-20T05:48:58Z)
Learning Visual-Semantic Subspace Representations [49.17165360280794]
We introduce a nuclear norm-based loss function, grounded in the same information theoretic principles that have proved effective in self-supervised learning. We present a theoretical characterization of this loss, demonstrating that, in addition to promoting class separability, it encodes the spectral geometry of the data within a subspace lattice.
arXiv Detail & Related papers (2024-05-25T12:51:38Z)
Generalized Holographic Reduced Representations [6.161066669674775]
Generalized Holographic Reduced Representations (GHRR) is an extension of Fourier Holographic Reduced Representations (FHRR). GHRR introduces a flexible, non-commutative binding operation, enabling improved encoding of complex data structures.
arXiv Detail & Related papers (2024-05-15T20:37:48Z)
Discovering Abstract Symbolic Relations by Learning Unitary Group Representations [7.303827428956944]
We investigate a principled approach for symbolic operation completion (SOC). SOC poses a unique challenge in modeling abstract relationships between discrete symbols. We demonstrate that SOC can be efficiently solved by a minimal model - a bilinear map - with a novel factorized architecture.
arXiv Detail & Related papers (2024-02-26T20:18:43Z)
Synergistic Anchored Contrastive Pre-training for Few-Shot Relation Extraction [4.7220779071424985]
Few-shot Relation Extraction (FSRE) aims to extract facts from a sparse set of labeled corpora. Recent studies have shown promising results in FSRE by employing Pre-trained Language Models. We introduce a novel synergistic anchored contrastive pre-training framework.
arXiv Detail & Related papers (2023-12-19T10:16:24Z)
Discrete, compositional, and symbolic representations through attractor dynamics [51.20712945239422]
We introduce a novel neural systems model that integrates attractor dynamics with symbolic representations to model cognitive processes akin to the probabilistic language of thought (PLoT). Our model segments the continuous representational space into discrete basins, with attractor states corresponding to symbolic sequences, that reflect the semanticity and compositionality characteristic of symbolic systems through unsupervised learning, rather than relying on pre-defined primitives. This approach establishes a unified framework that integrates both symbolic and sub-symbolic processing through neural dynamics, a neuroplausible substrate with proven expressivity in AI, offering a more comprehensive model that mirrors the complex duality of cognitive operations.
arXiv Detail & Related papers (2023-10-03T05:40:56Z)
Flow Factorized Representation Learning [109.51947536586677]
We introduce a generative model which specifies a distinct set of latent probability paths that define different input transformations. We show that our model achieves higher likelihoods on standard representation learning benchmarks while simultaneously being closer to approximately equivariant models.
arXiv Detail & Related papers (2023-09-22T20:15:37Z)
DiffCloth: Diffusion Based Garment Synthesis and Manipulation via Structural Cross-modal Semantic Alignment [124.57488600605822]
Cross-modal garment synthesis and manipulation will significantly benefit the way fashion designers generate garments. We introduce DiffCloth, a diffusion-based pipeline for cross-modal garment synthesis and manipulation. Experiments on the CM-Fashion benchmark demonstrate that DiffCloth yields state-of-the-art garment synthesis results.
arXiv Detail & Related papers (2023-08-22T05:43:33Z)
Im-Promptu: In-Context Composition from Image Prompts [10.079743487034762]
We investigate whether analogical reasoning can enable in-context composition over composable elements of visual stimuli. We use Im-Promptu to train agents with different levels of compositionality, including vector representations, patch representations, and object slots. Our experiments reveal tradeoffs between extrapolation abilities and the degree of compositionality, with non-compositional representations extending learned composition rules to unseen domains but performing poorly on tasks.
arXiv Detail & Related papers (2023-05-26T21:10:11Z)
Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations [70.41385310930846]
We present an end-to-end framework, Structure-CLIP, to enhance multi-modal structured representations. We use scene graphs to guide the construction of semantic negative examples, which results in an increased emphasis on learning structured representations. A Knowledge-Enhanced Encoder (KEE) is proposed to leverage scene graph knowledge (SGK) as input to further enhance structured representations.
arXiv Detail & Related papers (2023-05-06T03:57:05Z)
Decomposed Soft Prompt Guided Fusion Enhancing for Compositional Zero-Shot Learning [15.406125901927004]
We propose a novel framework termed Decomposed Fusion with Soft Prompt (DFSP), which uses vision-language models (VLMs) for unseen composition recognition. Specifically, DFSP constructs a vector combination of learnable soft prompts with state and object to establish their joint representation. In addition, a cross-modal fusion module is designed between the language and image branches, which decomposes state and object among language features instead of image features.
arXiv Detail & Related papers (2022-11-19T12:29:12Z)
Image Synthesis via Semantic Composition [74.68191130898805]
We present a novel approach to synthesize realistic images based on their semantic layouts. It hypothesizes that objects with similar appearance share similar representations. Our method establishes dependencies between regions according to their appearance correlation, yielding both spatially variant and associated representations.
arXiv Detail & Related papers (2021-09-15T02:26:07Z)
Enriching Transformers with Structured Tensor-Product Representations for Abstractive Summarization [131.23966358405767]
We adapt TP-TRANSFORMER with the explicitly compositional Tensor Product Representation (TPR) for the task of abstractive summarization. A key feature of our model is a structural bias that we introduce by encoding two separate representations for each token. We show that our TP-TRANSFORMER outperforms the Transformer and the original TP-TRANSFORMER significantly on several abstractive summarization datasets.
arXiv Detail & Related papers (2021-06-02T17:32:33Z)

This list is automatically generated from the titles and abstracts of the papers on this site.