Supervised sparse auto-encoders as unconstrained feature models for semantic composition
- URL: http://arxiv.org/abs/2602.00924v1
- Date: Sat, 31 Jan 2026 22:47:54 GMT
- Title: Supervised sparse auto-encoders as unconstrained feature models for semantic composition
- Authors: Ouns El Harzli, Hugo Wallner, Yoonsoo Nam, Haixuan Xavier Tao,
- Abstract summary: Sparse auto-encoders (SAEs) have re-emerged as a prominent method for mechanistic interpretability. In this paper, we address limitations of SAEs by adapting unconstrained feature models. We supervise (decoder-only) SAEs to reconstruct feature vectors by jointly learning sparse concept embeddings and decoder weights.
- Score: 4.753990617760439
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sparse auto-encoders (SAEs) have re-emerged as a prominent method for mechanistic interpretability, yet they face two significant challenges: the non-smoothness of the $L_1$ penalty, which hinders reconstruction and scalability, and a lack of alignment between learned features and human semantics. In this paper, we address these limitations by adapting unconstrained feature models, a mathematical framework from neural collapse theory, and by supervising the task. We supervise (decoder-only) SAEs to reconstruct feature vectors by jointly learning sparse concept embeddings and decoder weights. Validated on Stable Diffusion 3.5, our approach demonstrates compositional generalization, successfully reconstructing images with concept combinations unseen during training, and enabling feature-level intervention for semantic image editing without prompt modification.
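The supervised, decoder-only setup described in the abstract can be sketched in a few lines. The following is a minimal toy illustration, not the paper's implementation: all names, dimensions, and hyperparameters are invented, the "feature vectors" are synthetic rather than Stable Diffusion 3.5 activations, and the non-smooth sparsity penalty is handled with a proximal soft-threshold step instead of a differentiated $L_1$ term. Each concept gets a learnable embedding, and a shared decoder maps the sum of a sample's concept embeddings back to its target feature vector.

```python
import numpy as np

rng = np.random.default_rng(0)
n_concepts, d_code, d_feat, n_samples = 6, 8, 12, 32

# Synthetic supervision: each sample is labelled with two concepts and
# paired with a target feature vector (stand-in for diffusion features).
Z_true = rng.normal(scale=0.5, size=(n_concepts, d_code))
W_true = rng.normal(scale=0.5, size=(d_feat, d_code))
labels = [rng.choice(n_concepts, size=2, replace=False) for _ in range(n_samples)]
feats = np.stack([W_true @ Z_true[ls].sum(axis=0) for ls in labels])

# Learnable sparse concept embeddings Z and shared decoder weights W.
Z = rng.normal(scale=0.1, size=(n_concepts, d_code))
W = rng.normal(scale=0.1, size=(d_feat, d_code))

def mse():
    recon = np.stack([W @ Z[ls].sum(axis=0) for ls in labels])
    return float(np.mean((recon - feats) ** 2))

mse0 = mse()
lr, lam = 0.02, 1e-3
for _ in range(300):
    for ls, f in zip(labels, feats):
        code = Z[ls].sum(axis=0)        # compose the sample's concepts
        err = W @ code - f              # reconstruction error
        Z[ls] -= lr * (W.T @ err)       # gradient step on the two embeddings
        W -= lr * np.outer(err, code)   # gradient step on the decoder
    # Proximal soft-threshold keeps embeddings sparse without ever
    # differentiating a non-smooth L1 penalty.
    Z = np.sign(Z) * np.maximum(np.abs(Z) - lam, 0.0)
mse1 = mse()
```

Because reconstruction is linear in the composed code, swapping or adding concept embeddings at inference time directly edits the reconstructed feature, which is the intuition behind the feature-level intervention the abstract mentions.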
Related papers
- Understanding Degradation with Vision Language Model [56.09241449206817]
Understanding visual degradations is a critical yet challenging problem in computer vision. We introduce DU-VLM, a multimodal chain-of-thought model trained with supervised fine-tuning and reinforcement learning. We also introduce DU-110k, a large-scale dataset comprising 110,000 clean-degraded pairs with grounded physical annotations.
arXiv Detail & Related papers (2026-02-04T13:51:15Z) - Both Semantics and Reconstruction Matter: Making Representation Encoders Ready for Text-to-Image Generation and Editing [62.94394079771687]
A burgeoning trend is to adopt high-dimensional features from representation encoders as generative latents. We propose a systematic framework to adapt understanding-oriented encoder features for generative tasks. We show that our approach achieves state-of-the-art reconstruction, faster convergence, and substantial performance gains in both Text-to-Image (T2I) and image editing tasks.
arXiv Detail & Related papers (2025-12-19T18:59:57Z) - VQRAE: Representation Quantization Autoencoders for Multimodal Understanding, Generation and Reconstruction [83.50898344094153]
VQRAE produces continuous semantic features for image understanding and discrete tokens for visual generation within a unified tokenizer. The design sacrifices negligible semantic information, maintaining multimodal understanding ability while enabling discrete tokens. VQRAE presents competitive performance on several benchmarks of visual understanding, generation and reconstruction.
arXiv Detail & Related papers (2025-11-28T17:26:34Z) - Physically Interpretable Multi-Degradation Image Restoration via Deep Unfolding and Explainable Convolution [45.571542528079114]
We propose a novel interpretability-driven approach for multi-degradation image restoration. We employ an improved second-order semi-smooth Newton algorithm to ensure that each module maintains clear physical interpretability. To further enhance interpretability and adaptability, we design an explainable convolution module inspired by the human brain's flexible information processing.
arXiv Detail & Related papers (2025-11-13T10:27:41Z) - Rethinking Hebbian Principle: Low-Dimensional Structural Projection for Unsupervised Learning [17.299267108673277]
Hebbian learning is a biological principle that intuitively describes how neurons adapt their connections through repeated stimuli. We introduce the Structural Projection Hebbian Representation (SPHeRe), a novel unsupervised learning method. Experimental results show that SPHeRe achieves SOTA performance among unsupervised synaptic plasticity approaches.
arXiv Detail & Related papers (2025-10-16T15:47:29Z) - "Principal Components" Enable A New Language of Images [79.45806370905775]
We introduce a novel visual tokenization framework that embeds a provable PCA-like structure into the latent token space. Our approach achieves state-of-the-art reconstruction performance and enables better interpretability to align with the human vision system.
arXiv Detail & Related papers (2025-03-11T17:59:41Z) - Interpretable Representation Learning of Cardiac MRI via Attribute Regularization [2.0221870195041087]
Interpretability is essential in medical imaging to ensure that clinicians can comprehend and trust artificial intelligence models.
We propose an Attribute-regularized Soft Introspective Variational Autoencoder that incorporates attribute regularization of the latent space within the framework of an adversarially trained variational autoencoder.
arXiv Detail & Related papers (2024-06-12T14:47:51Z) - Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z) - Self-Supervised Generative-Contrastive Learning of Multi-Modal Euclidean Input for 3D Shape Latent Representations: A Dynamic Switching Approach [53.376029341079054]
We propose a combined generative and contrastive neural architecture for learning latent representations of 3D shapes. The architecture uses two encoder branches for voxel grids and multi-view images from the same underlying shape.
arXiv Detail & Related papers (2023-01-11T18:14:24Z) - Sparse aNETT for Solving Inverse Problems with Deep Learning [2.5234156040689237]
We propose a sparse reconstruction framework (aNETT) for solving inverse problems.
We train an autoencoder network $D \circ E$ with $E$ acting as a nonlinear sparsifying transform.
Numerical results are presented for sparse view CT.
arXiv Detail & Related papers (2020-04-20T18:43:13Z)
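The aNETT idea above, reconstructing by penalizing the sparsity of a learned transform's codes, can be illustrated with a toy proximal-gradient (ISTA-style) loop. This is a sketch under strong simplifying assumptions: the paper's $E$ and $D$ are trained networks, whereas here the decoder is a fixed identity matrix so the $L_1$ proximal step reduces to a plain soft-threshold; the forward operator, dimensions, and step sizes are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 20, 40
A = rng.normal(size=(m, n)) / np.sqrt(m)  # forward operator (stand-in for CT)
D = np.eye(n)                             # identity "decoder" for this sketch

# Ground-truth sparse code and noiseless measurements y = A D z.
z_true = np.zeros(n)
support = rng.choice(n, size=5, replace=False)
z_true[support] = rng.normal(size=5)
y = A @ D @ z_true

B = A @ D
L = np.linalg.norm(B, 2) ** 2             # Lipschitz constant of the data term
alpha = 0.01                              # sparsity weight
z = np.zeros(n)
for _ in range(300):
    z = z - B.T @ (B @ z - y) / L         # gradient step on ||ADz - y||^2 / 2
    z = np.sign(z) * np.maximum(np.abs(z) - alpha / L, 0.0)  # soft-threshold
x_hat = D @ z
residual = float(np.linalg.norm(A @ x_hat - y))
```

With a trained nonlinear $E$, the soft-threshold would be replaced by a proximal step on the learned regularizer, but the alternation between a data-fit gradient step and a sparsity-promoting step is the same.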