Related papers: Learning from Pattern Completion: Self-supervised Controllable Generation

URL: http://arxiv.org/abs/2409.18694v2
Date: Thu, 7 Nov 2024 08:27:16 GMT
Title: Learning from Pattern Completion: Self-supervised Controllable Generation
Authors: Zhiqiang Chen, Guofan Fan, Jinying Gao, Lei Ma, Bo Lei, Tiejun Huang, Shan Yu
Abstract summary: We propose a self-supervised controllable generation (SCG) framework, inspired by the neural mechanisms that may contribute to the brain's associative power. Experimental results demonstrate that the proposed modular autoencoder effectively achieves functional specialization. Our proposed approach not only demonstrates superior robustness in more challenging high-noise scenarios but also possesses more promising scalability potential due to its self-supervised manner.
Score: 31.694486524155593
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: The human brain exhibits a strong ability to spontaneously associate different visual attributes of the same or similar visual scene, such as associating sketches and graffiti with real-world visual objects, usually without supervising information. In contrast, in the field of artificial intelligence, controllable generation methods like ControlNet heavily rely on annotated training datasets such as depth maps, semantic segmentation maps, and poses, which limits the method's scalability. Inspired by the neural mechanisms that may contribute to the brain's associative power, specifically the cortical modularization and hippocampal pattern completion, here we propose a self-supervised controllable generation (SCG) framework. Firstly, we introduce an equivariant constraint to promote inter-module independence and intra-module correlation in a modular autoencoder network, thereby achieving functional specialization. Subsequently, based on these specialized modules, we employ a self-supervised pattern completion approach for controllable generation training. Experimental results demonstrate that the proposed modular autoencoder effectively achieves functional specialization, including the modular processing of color, brightness, and edge detection, and exhibits brain-like features including orientation selectivity, color antagonism, and center-surround receptive fields. Through self-supervised training, associative generation capabilities spontaneously emerge in SCG, demonstrating excellent generalization ability to various tasks such as associative generation on painting, sketches, and ancient graffiti. Compared to the previous representative method ControlNet, our proposed approach not only demonstrates superior robustness in more challenging high-noise scenarios but also possesses more promising scalability potential due to its self-supervised manner. Code is released on GitHub and Gitee.

Related papers

Meta-Representational Predictive Coding: Biomimetic Self-Supervised Learning [51.22185316175418]
We present a new form of predictive coding that we call meta-representational predictive coding (MPC). MPC sidesteps the need for learning a generative model of sensory input by learning to predict representations of sensory input across parallel streams.
arXiv Detail & Related papers (2025-03-22T22:13:14Z)
Without Paired Labeled Data: An End-to-End Self-Supervised Paradigm for UAV-View Geo-Localization [2.733505168507872]
UAV-View Geo-Localization (UVGL) aims to achieve accurate localization of unmanned aerial vehicles (UAVs) by retrieving the most relevant GPS-tagged satellite images. Existing methods heavily rely on pre-paired UAV-satellite images for supervised learning. We propose an end-to-end self-supervised UVGL method to overcome these limitations.
arXiv Detail & Related papers (2025-02-17T02:53:08Z)
Exploring Latent Pathways: Enhancing the Interpretability of Autonomous Driving with a Variational Autoencoder [79.70947339175572]
A bio-inspired neural circuit policy model has emerged as an innovative control module. We take a leap forward by integrating a variational autoencoder with the neural circuit policy controller. In addition to the architectural shift toward a variational autoencoder, this study introduces the automatic latent perturbation tool.
arXiv Detail & Related papers (2024-04-02T09:05:47Z)
Self-supervised Semi-implicit Graph Variational Auto-encoders with Masking [18.950919307926824]
We propose the SeeGera model, which is based on the family of self-supervised variational graph auto-encoders (VGAE). SeeGera co-embeds both nodes and features in the encoder and reconstructs both links and features in the decoder. We conduct extensive experiments comparing SeeGera with 9 other state-of-the-art competitors.
arXiv Detail & Related papers (2023-01-29T15:00:43Z)
Robust and Controllable Object-Centric Learning through Energy-based Models [95.68748828339059]
Ours is a conceptually simple and general approach to learning object-centric representations through an energy-based model. We show that ours can be easily integrated into existing architectures and can effectively extract high-quality object-centric representations.
arXiv Detail & Related papers (2022-10-11T15:11:15Z)
Learning What and Where -- Unsupervised Disentangling Location and Identity Tracking [0.44040106718326594]
We introduce an unsupervised LOCation and Identity tracking system (Loci). Inspired by the dorsal-ventral pathways in the brain, Loci tackles the what-and-where binding problem by means of a self-supervised segregation mechanism. Loci may set the stage for deeper, explanation-oriented video processing.
arXiv Detail & Related papers (2022-05-26T13:30:14Z)
Stochastic Coherence Over Attention Trajectory For Continuous Learning In Video Streams [64.82800502603138]
This paper proposes a novel neural-network-based approach to progressively and autonomously develop pixel-wise representations in a video stream. The proposed method is based on a human-like attention mechanism that allows the agent to learn by observing what is moving in the attended locations. Our experiments leverage 3D virtual environments and they show that the proposed agents can learn to distinguish objects just by observing the video stream.
arXiv Detail & Related papers (2022-04-26T09:52:31Z)
Is Disentanglement enough? On Latent Representations for Controllable Music Generation [78.8942067357231]
In the absence of a strong generative decoder, disentanglement does not necessarily imply controllability. The structure of the latent space with respect to the VAE-decoder plays an important role in boosting the ability of a generative model to manipulate different attributes.
arXiv Detail & Related papers (2021-08-01T18:37:43Z)
Combining Probabilistic Logic and Deep Learning for Self-Supervised Learning [10.47937328610174]
Self-supervised learning has emerged as a promising direction to alleviate the supervision bottleneck. We present deep probabilistic logic (DPL), which offers a unifying framework for task-specific self-supervision. Next, we present self-supervised self-supervision (S4), which adds to DPL the capability to learn new self-supervision automatically.
arXiv Detail & Related papers (2021-07-27T04:25:56Z)
Unsupervised Controllable Generation with Self-Training [90.04287577605723]
Controllable generation with GANs remains a challenging research problem. We propose an unsupervised framework to learn a distribution of latent codes that control the generator through self-training. Our framework exhibits better disentanglement compared to other variants such as the variational autoencoder.
arXiv Detail & Related papers (2020-07-17T21:50:35Z)
A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference. Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
arXiv Detail & Related papers (2020-06-22T08:35:58Z)
Network Bending: Expressive Manipulation of Deep Generative Models [0.2062593640149624]
We introduce a new framework for manipulating and interacting with deep generative models that we call network bending. We show how it allows for the direct manipulation of semantically meaningful aspects of the generative process as well as allowing for a broad range of expressive outcomes.
arXiv Detail & Related papers (2020-05-25T21:48:45Z)

This list is automatically generated from the titles and abstracts of the papers on this site.
