Neural-Driven Image Editing
- URL: http://arxiv.org/abs/2507.05397v1
- Date: Mon, 07 Jul 2025 18:31:50 GMT
- Title: Neural-Driven Image Editing
- Authors: Pengfei Zhou, Jie Xia, Xiaopeng Peng, Wangbo Zhao, Zilong Ye, Zekai Li, Suorong Yang, Jiadong Pan, Yuanxiang Chen, Ziqiao Wang, Kai Wang, Qian Zheng, Xiaojun Chang, Gang Pan, Shurong Dong, Kaipeng Zhang, Yang You
- Abstract summary: Traditional image editing relies on manual prompting, making it labor-intensive and inaccessible to individuals with limited motor control or language abilities. We propose LoongX, a hands-free image editing approach driven by neurophysiological signals. LoongX utilizes state-of-the-art diffusion models trained on a comprehensive dataset of 23,928 image editing pairs.
- Score: 51.11173675034121
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Traditional image editing typically relies on manual prompting, making it labor-intensive and inaccessible to individuals with limited motor control or language abilities. Leveraging recent advances in brain-computer interfaces (BCIs) and generative models, we propose LoongX, a hands-free image editing approach driven by multimodal neurophysiological signals. LoongX utilizes state-of-the-art diffusion models trained on a comprehensive dataset of 23,928 image editing pairs, each paired with synchronized electroencephalography (EEG), functional near-infrared spectroscopy (fNIRS), photoplethysmography (PPG), and head motion signals that capture user intent. To effectively address the heterogeneity of these signals, LoongX integrates two key modules. The cross-scale state space (CS3) module encodes informative modality-specific features. The dynamic gated fusion (DGF) module further aggregates these features into a unified latent space, which is then aligned with edit semantics via fine-tuning on a diffusion transformer (DiT). Additionally, we pre-train the encoders using contrastive learning to align cognitive states with semantic intentions from embedded natural language. Extensive experiments demonstrate that LoongX achieves performance comparable to text-driven methods (CLIP-I: 0.6605 vs. 0.6558; DINO: 0.4812 vs. 0.4636) and outperforms them when neural signals are combined with speech (CLIP-T: 0.2588 vs. 0.2549). These results highlight the promise of neural-driven generative models in enabling accessible, intuitive image editing and open new directions for cognitive-driven creative technologies. Datasets and code will be released to support future work and foster progress in this emerging area.
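The abstract names the CS3 encoder and the DGF fusion module but not their internals. As a rough illustration only, a dynamic gated fusion over the four signal modalities (EEG, fNIRS, PPG, head motion) could look like the sketch below; the layer sizes, softmax gating, and per-modality projections are assumptions for this sketch, not details taken from the paper.

```python
# Hypothetical sketch of dynamic gated fusion over per-modality features.
import torch
import torch.nn as nn

class DynamicGatedFusion(nn.Module):
    """Fuse per-modality embeddings into one latent.

    A gate network scores each modality from the concatenated features;
    the output is the gate-weighted sum of projected modality features.
    """
    def __init__(self, dim: int, num_modalities: int = 4):
        super().__init__()
        self.proj = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_modalities))
        self.gate = nn.Sequential(
            nn.Linear(dim * num_modalities, num_modalities),
            nn.Softmax(dim=-1),
        )

    def forward(self, feats: list[torch.Tensor]) -> torch.Tensor:
        # feats: one (batch, dim) tensor per modality
        gates = self.gate(torch.cat(feats, dim=-1))           # (batch, M)
        projected = torch.stack([p(f) for p, f in zip(self.proj, feats)], dim=1)
        return (gates.unsqueeze(-1) * projected).sum(dim=1)   # (batch, dim)

# Example: fuse four modality embeddings of width 512.
fusion = DynamicGatedFusion(dim=512)
latent = fusion([torch.randn(8, 512) for _ in range(4)])      # (8, 512)
```

Per the abstract, a fused latent like this would then be aligned with edit semantics by fine-tuning the diffusion transformer on it in place of a text embedding.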
Related papers
- Neuro2Semantic: A Transfer Learning Framework for Semantic Reconstruction of Continuous Language from Human Intracranial EEG [11.531598524209969]
We introduce Neuro2Semantic, a novel framework that reconstructs the semantic content of perceived speech from intracranial EEG (iEEG) recordings. Our approach consists of two phases: first, an LSTM-based adapter aligns neural signals with pre-trained text embeddings; second, a corrector module generates continuous, natural text directly from these aligned embeddings. Neuro2Semantic achieves strong performance with as little as 30 minutes of neural data, outperforming a recent state-of-the-art method in low-data settings.
arXiv Detail & Related papers (2025-05-31T04:17:19Z)
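As a minimal sketch of the two-phase design summarized above, the first phase (the LSTM-based adapter) might look like the following; the input shapes and the cosine alignment objective are assumptions for illustration, not the paper's exact recipe.

```python
# Hypothetical sketch: an LSTM adapter mapping an iEEG window into the
# embedding space of a frozen, pre-trained text encoder (phase one).
import torch
import torch.nn as nn
import torch.nn.functional as F

class IEEGAdapter(nn.Module):
    def __init__(self, n_channels: int, hidden: int, text_dim: int):
        super().__init__()
        self.lstm = nn.LSTM(n_channels, hidden, batch_first=True)
        self.head = nn.Linear(hidden, text_dim)

    def forward(self, ieeg: torch.Tensor) -> torch.Tensor:
        # ieeg: (batch, time, channels); summarize with the final hidden state
        _, (h, _) = self.lstm(ieeg)
        return self.head(h[-1])                  # (batch, text_dim)

def alignment_loss(neural_emb, text_emb):
    # Pull each neural embedding toward its paired text embedding.
    return 1 - F.cosine_similarity(neural_emb, text_emb, dim=-1).mean()
```

The second phase would then decode continuous text from the aligned embeddings, which this sketch does not cover.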
- Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models [92.18057318458528]
Token-Shuffle is a novel method that reduces the number of image tokens in Transformers. Our strategy requires no additional pretrained text encoder and enables MLLMs to support extremely high-resolution image synthesis. On the GenAI benchmark, our 2.7B model achieves a 0.77 overall score on hard prompts, outperforming the AR model LlamaGen by 0.18 and the diffusion model LDM by 0.15.
arXiv Detail & Related papers (2025-04-24T17:59:56Z)
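One plausible reading of the token-reduction idea above is folding each s x s window of neighboring image tokens into a single wider token before the Transformer, and inverting the fold afterwards. The sketch below illustrates that operation; the window size and dimensions are assumptions, not the paper's configuration.

```python
# Hypothetical sketch: merge s x s neighboring image tokens along the channel
# dimension so the model attends over (grid/s)^2 tokens instead of grid^2.
import torch

def token_shuffle(tokens: torch.Tensor, grid: int, s: int = 2) -> torch.Tensor:
    # tokens: (batch, grid*grid, dim) -> (batch, (grid//s)**2, dim*s*s)
    b, _, d = tokens.shape
    x = tokens.view(b, grid, grid, d)
    x = x.view(b, grid // s, s, grid // s, s, d).permute(0, 1, 3, 2, 4, 5)
    return x.reshape(b, (grid // s) ** 2, d * s * s)

def token_unshuffle(tokens: torch.Tensor, grid: int, s: int = 2, dim: int = 256) -> torch.Tensor:
    # Inverse fold, restoring the original token layout.
    b = tokens.shape[0]
    x = tokens.reshape(b, grid // s, grid // s, s, s, dim).permute(0, 1, 3, 2, 4, 5)
    return x.reshape(b, grid * grid, dim)

x = torch.randn(2, 32 * 32, 256)      # 1024 tokens of width 256
y = token_shuffle(x, grid=32)         # (2, 256, 1024): 4x fewer tokens
assert torch.equal(token_unshuffle(y, grid=32), x)
```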
- CoSimGen: Controllable Diffusion Model for Simultaneous Image and Mask Generation [1.9393128408121891]
Existing generative models fail to address the need for high-quality, simultaneous image-mask generation. We propose CoSimGen, a diffusion-based framework for controllable simultaneous image and mask generation. CoSimGen achieves state-of-the-art performance across all datasets, with the lowest KID of 0.11 and LPIPS of 0.53.
arXiv Detail & Related papers (2025-03-25T13:48:22Z)
- Gen-SIS: Generative Self-augmentation Improves Self-supervised Learning [52.170253590364545]
Gen-SIS is a diffusion-based augmentation technique trained exclusively on unlabeled image data. We show that these 'self-augmentations', i.e., generative augmentations based on the vanilla SSL encoder embeddings, facilitate the training of a stronger SSL encoder.
arXiv Detail & Related papers (2024-12-02T16:20:59Z)
- Natively neuromorphic LMU architecture for encoding-free SNN-based HAR on commercial edge devices [4.79664336929499]
Neuromorphic models take inspiration from the human brain by adopting bio-plausible neuron models.
We present the L2MU, a native neuromorphic Legendre Memory Unit (LMU) that relies entirely on Leaky Integrate-and-Fire (LIF) neurons.
We benchmarked our L2MU on smartwatch signals from hand-oriented activities, deploying it, including compressed variants, on three different commercial edge devices.
arXiv Detail & Related papers (2024-07-04T17:35:12Z)
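The L2MU entry above is built entirely from Leaky Integrate-and-Fire neurons. For readers unfamiliar with them, a minimal discrete-time LIF update (leak, integrate, spike, reset) looks roughly like this; the leak factor and threshold are arbitrary illustrative values, not the paper's settings.

```python
# Hypothetical sketch of one discrete-time Leaky Integrate-and-Fire step.
import torch

def lif_step(v: torch.Tensor, x: torch.Tensor, beta: float = 0.9, threshold: float = 1.0):
    # v: membrane potential (batch, units); x: input current at this step.
    v = beta * v + x                      # leak, then integrate the input
    spikes = (v >= threshold).float()     # emit a spike where threshold is crossed
    v = v - spikes * threshold            # soft reset by subtraction
    return v, spikes

# Drive a toy layer of 4 LIF neurons for 10 steps.
v = torch.zeros(1, 4)
for t in range(10):
    v, s = lif_step(v, torch.rand(1, 4) * 0.5)
```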
- Mind's Eye: Image Recognition by EEG via Multimodal Similarity-Keeping Contrastive Learning [2.087148326341881]
This paper introduces a MUltimodal Similarity-keeping contrastivE learning framework for zero-shot EEG-based image classification.
We develop a series of multivariate time-series encoders tailored for EEG signals and assess the efficacy of regularized contrastive EEG-Image pretraining.
Our method achieves state-of-the-art performance, with a top-1 accuracy of 19.3% and a top-5 accuracy of 48.8% in 200-way zero-shot image classification.
arXiv Detail & Related papers (2024-06-05T16:42:23Z)
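The similarity-keeping contrastive pretraining above belongs to the CLIP family of objectives. A generic symmetric InfoNCE loss over paired EEG and image embeddings, shown as an illustration rather than the paper's exact loss, would be:

```python
# Hypothetical sketch of CLIP-style contrastive EEG-image pretraining.
import torch
import torch.nn.functional as F

def eeg_image_contrastive_loss(eeg_emb, img_emb, temperature: float = 0.07):
    # eeg_emb, img_emb: (batch, dim) outputs of the EEG and image encoders
    eeg = F.normalize(eeg_emb, dim=-1)
    img = F.normalize(img_emb, dim=-1)
    logits = eeg @ img.t() / temperature                          # (batch, batch)
    labels = torch.arange(logits.shape[0], device=logits.device)  # matched pairs on the diagonal
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.t(), labels)) / 2
```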
- MindFormer: Semantic Alignment of Multi-Subject fMRI for Brain Decoding [50.55024115943266]
We introduce MindFormer, a novel semantic alignment method for multi-subject fMRI signals.
This model is specifically designed to generate fMRI-conditioned feature vectors that can be used to condition the Stable Diffusion model for fMRI-to-image generation or a large language model (LLM) for fMRI-to-text generation.
Our experimental results demonstrate that MindFormer generates semantically consistent images and text across different subjects.
arXiv Detail & Related papers (2024-05-28T00:36:25Z)
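The MindFormer summary above mentions fMRI-conditioned feature vectors that condition Stable Diffusion across subjects. A speculative sketch of such a conditioner follows; the 77 x 768 token shape is borrowed from CLIP text conditioning, and the additive subject embedding is an assumption, not the paper's documented design.

```python
# Hypothetical sketch: project an fMRI vector into a token sequence shaped
# like Stable Diffusion's text conditioning, with a per-subject offset.
import torch
import torch.nn as nn

class FMRIConditioner(nn.Module):
    def __init__(self, fmri_dim: int, n_subjects: int, n_tokens: int = 77, token_dim: int = 768):
        super().__init__()
        self.subject_emb = nn.Embedding(n_subjects, fmri_dim)  # align subjects additively
        self.proj = nn.Linear(fmri_dim, n_tokens * token_dim)
        self.n_tokens, self.token_dim = n_tokens, token_dim

    def forward(self, fmri: torch.Tensor, subject_id: torch.Tensor) -> torch.Tensor:
        x = fmri + self.subject_emb(subject_id)
        return self.proj(x).view(-1, self.n_tokens, self.token_dim)

cond = FMRIConditioner(fmri_dim=4096, n_subjects=8)
tokens = cond(torch.randn(2, 4096), torch.tensor([0, 3]))      # (2, 77, 768)
```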
- NeuroPictor: Refining fMRI-to-Image Reconstruction via Multi-individual Pretraining and Multi-level Modulation [55.51412454263856]
This paper proposes to directly modulate the generation process of diffusion models using fMRI signals.
By training with about 67,000 fMRI-image pairs from various individuals, our model enjoys superior fMRI-to-image decoding capacity.
arXiv Detail & Related papers (2024-03-27T02:42:52Z)
- Learning Multimodal Volumetric Features for Large-Scale Neuron Tracing [72.45257414889478]
We aim to reduce human workload by predicting connectivity between over-segmented neuron pieces.
We first construct a dataset, named FlyTracing, that contains millions of pairwise connections of segments spanning the whole fly brain.
We propose a novel connectivity-aware contrastive learning method to generate dense volumetric EM image embedding.
arXiv Detail & Related papers (2024-01-05T19:45:12Z)
- Joint fMRI Decoding and Encoding with Latent Embedding Alignment [77.66508125297754]
We introduce a unified framework that addresses both fMRI decoding and encoding.
Our model concurrently recovers visual stimuli from fMRI signals and predicts brain activity from images within a unified framework.
arXiv Detail & Related papers (2023-03-26T14:14:58Z)
- EEG to fMRI Synthesis: Is Deep Learning a candidate? [0.913755431537592]
This work provides the first comprehensive study of how to use state-of-the-art principles from Neural Processing to synthesize fMRI data from electroencephalographic (EEG) data.
A comparison of state-of-the-art synthesis approaches, including Autoencoders, Generative Adversarial Networks and Pairwise Learning, is undertaken.
Results highlight the feasibility of EEG to fMRI brain image mappings, pinpointing the role of current advances in Machine Learning and showing the relevance of upcoming contributions to further improve performance.
arXiv Detail & Related papers (2020-09-29T16:29:20Z)
- Neural Cellular Automata Manifold [84.08170531451006]
We show that the neural network architecture of the Neural Cellular Automata can be encapsulated in a larger NN.
This allows us to propose a new model that encodes a manifold of NCA, each of them capable of generating a distinct image.
In biological terms, our approach would play the role of the transcription factors, modulating the mapping of genes into specific proteins that drive cellular differentiation.
arXiv Detail & Related papers (2020-06-22T11:41:57Z)
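For context on the Neural Cellular Automata entry above, a single NCA update in the style of the standard growing-NCA formulation applies fixed perception filters and a small learned per-cell update, identically at every location; all sizes below are conventional choices, not values from this paper.

```python
# Hypothetical sketch of one Neural Cellular Automata update step.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NCAStep(nn.Module):
    def __init__(self, channels: int = 16):
        super().__init__()
        sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        ident = torch.zeros(3, 3)
        ident[1, 1] = 1.0
        kernels = torch.stack([ident, sobel_x, sobel_x.t()])   # identity + gradients
        # One (identity, grad-x, grad-y) filter triple per state channel.
        self.register_buffer("kernels", kernels.repeat(channels, 1, 1).unsqueeze(1))
        self.update = nn.Sequential(
            nn.Conv2d(channels * 3, 128, 1), nn.ReLU(),
            nn.Conv2d(128, channels, 1),
        )
        self.channels = channels

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # state: (batch, channels, H, W); depthwise perception, then residual update
        perception = F.conv2d(state, self.kernels, padding=1, groups=self.channels)
        return state + self.update(perception)

step = NCAStep()
state = torch.rand(1, 16, 64, 64)
for _ in range(8):           # iterate the local rule to evolve the grid
    state = step(state)
```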
This list is automatically generated from the titles and abstracts of the papers on this site.