From Directions to Regions: Decomposing Activations in Language Models via Local Geometry
- URL: http://arxiv.org/abs/2602.02464v1
- Date: Mon, 02 Feb 2026 18:49:05 GMT
- Title: From Directions to Regions: Decomposing Activations in Language Models via Local Geometry
- Authors: Or Shafran, Shaked Ronen, Omri Fahn, Shauli Ravfogel, Atticus Geiger, Mor Geva
- Abstract summary: We leverage Mixture of Factor Analyzers (MFA) as a scalable, unsupervised alternative that models the activation space. MFA decomposes activations into two compositional geometric objects: the region's centroid in activation space, and the local variation from the centroid. We train large-scale MFAs for Llama-3.1-8B and Gemma-2-2B, and show they capture complex, nonlinear structures in activation space.
- Score: 37.50120706345745
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Activation decomposition methods in language models are tightly coupled to geometric assumptions on how concepts are realized in activation space. Existing approaches search for individual global directions, implicitly assuming linear separability, which overlooks concepts with nonlinear or multi-dimensional structure. In this work, we leverage Mixture of Factor Analyzers (MFA) as a scalable, unsupervised alternative that models the activation space as a collection of Gaussian regions with their local covariance structure. MFA decomposes activations into two compositional geometric objects: the region's centroid in activation space, and the local variation from the centroid. We train large-scale MFAs for Llama-3.1-8B and Gemma-2-2B, and show they capture complex, nonlinear structures in activation space. Moreover, evaluations on localization and steering benchmarks show that MFA outperforms unsupervised baselines, is competitive with supervised localization methods, and often achieves stronger steering performance than sparse autoencoders. Together, our findings position local geometry, expressed through subspaces, as a promising unit of analysis for scalable concept discovery and model control, accounting for complex structures that isolated directions fail to capture.
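The centroid-plus-local-variation decomposition described in the abstract can be approximated with off-the-shelf tools. The sketch below is an illustrative stand-in, not the paper's training procedure: it uses k-means to carve activation space into regions and per-region PCA as a stand-in for each region's low-rank factor subspace. All function names (`fit_regions`, `decompose`) and parameter choices are hypothetical.

```python
# Toy MFA-style decomposition: cluster activations into regions, then
# model each region's local variation with a low-rank (PCA-derived)
# subspace. Illustrative approximation only.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

def fit_regions(acts, n_regions=4, n_factors=2, seed=0):
    """Assign activations to regions; fit a local subspace per region."""
    km = KMeans(n_clusters=n_regions, n_init=10, random_state=seed).fit(acts)
    regions = []
    for k in range(n_regions):
        pts = acts[km.labels_ == k]
        pca = PCA(n_components=min(n_factors, len(pts) - 1)).fit(pts)
        regions.append({"centroid": pts.mean(axis=0), "basis": pca.components_})
    return km, regions

def decompose(x, km, regions):
    """Split x into (region centroid, in-subspace local variation)."""
    k = km.predict(x[None, :])[0]
    c, W = regions[k]["centroid"], regions[k]["basis"]
    local = W.T @ (W @ (x - c))  # project the offset onto the local factors
    return c, local

rng = np.random.default_rng(0)
acts = rng.normal(size=(400, 8))   # stand-in for real model activations
km, regions = fit_regions(acts)
c, local = decompose(acts[0], km, regions)
recon = c + local                  # centroid + local variation
print(recon.shape)                 # (8,)
```

Here `recon` reconstructs the activation up to the residual outside the region's local subspace; in an actual MFA the subspaces and assignments would be fit jointly by maximum likelihood rather than in two stages.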
Related papers
- Geometry-Preserving Aggregation for Mixture-of-Experts Embedding Models [4.125187280299246]
Mixture-of-Experts (MoE) embedding models combine expert outputs using weighted linear summation, implicitly assuming a linear subspace structure in the embedding space. Geometric analysis of a modern MoE embedding model reveals that expert outputs lie on a shared hyperspherical manifold characterized by tightly concentrated norms and substantial angular separation. Spherical Barycentric Aggregation (SBA) is introduced as a geometry-preserving aggregation operator that separates radial and angular components to maintain hyperspherical structure while remaining fully compatible with existing routing mechanisms.
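The radial/angular separation summarized above can be sketched in a few lines. This is a simplified toy (a normalized weighted mean of unit directions, not an exact spherical barycenter) and `spherical_aggregate` is a hypothetical name, not the paper's API.

```python
# Toy geometry-preserving aggregation in the spirit of SBA: aggregate
# expert-output norms and directions separately so the result stays on
# the experts' hypersphere, unlike a plain weighted linear sum.
import numpy as np

def spherical_aggregate(outputs, weights):
    outputs = np.asarray(outputs, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    norms = np.linalg.norm(outputs, axis=1)
    dirs = outputs / norms[:, None]
    radial = w @ norms                  # weighted mean of the norms
    angular = w @ dirs
    angular /= np.linalg.norm(angular)  # renormalize back to the sphere
    return radial * angular

experts = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
agg = spherical_aggregate(experts, [0.5, 0.5])
# The aggregate keeps unit norm; the linear mean would shrink it to ~0.707.
print(round(float(np.linalg.norm(agg)), 6))
```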
arXiv Detail & Related papers (2026-02-15T08:00:56Z) - TangramPuzzle: Evaluating Multimodal Large Language Models with Compositional Spatial Reasoning [104.66714520975837]
We introduce a geometry-grounded benchmark designed to evaluate compositional spatial reasoning through the lens of the classic Tangram game. We propose the Tangram Construction Expression (TCE), a symbolic geometric framework that grounds tangram assemblies in exact, machine-verifiable coordinate specifications. We conduct extensive evaluation experiments on advanced open-source and proprietary models, revealing an interesting insight: MLLMs tend to prioritize matching the target silhouette while neglecting geometric constraints.
arXiv Detail & Related papers (2026-01-23T07:35:05Z) - Mixture-of-Experts as Soft Clustering: A Dual Jacobian-PCA Spectral Geometry Perspective [0.5414847001704249]
Mixture-of-Experts (MoE) architectures are commonly motivated by efficiency and conditional computation. We study MoEs through a geometric lens, interpreting routing as a form of soft partitioning of the representation space into overlapping local charts.
arXiv Detail & Related papers (2026-01-09T23:07:14Z) - Visualizing LLM Latent Space Geometry Through Dimensionality Reduction [0.0]
We extract, process, and visualize latent state geometries in Transformer-based language models through dimensionality reduction. We demonstrate experiments on GPT-2 and LLaMa models, where we uncover interesting geometric patterns in latent space.
arXiv Detail & Related papers (2025-11-26T17:11:39Z) - GeoGNN: Quantifying and Mitigating Semantic Drift in Text-Attributed Graphs [59.61242815508687]
Graph neural networks (GNNs) on text-attributed graphs (TAGs) encode node texts using pretrained language models (PLMs) and propagate these embeddings through linear neighborhood aggregation. This work introduces a local PCA-based metric that measures the degree of semantic drift and provides the first quantitative framework to analyze how different aggregation mechanisms affect manifold structure.
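A local PCA-based drift measure of the kind summarized above could look as follows. This is an illustrative sketch only; the paper's exact metric, neighborhood definition, and `local_drift` name are assumptions.

```python
# Toy local-PCA drift measure: how far an aggregated node embedding
# falls outside the principal subspace of its neighborhood embeddings.
# 0 = offset fully within the local subspace, 1 = fully off-manifold.
import numpy as np
from sklearn.decomposition import PCA

def local_drift(neighbors, aggregated, n_components=2):
    pca = PCA(n_components=n_components).fit(neighbors)
    offset = aggregated - pca.mean_
    in_plane = pca.components_.T @ (pca.components_ @ offset)
    resid = offset - in_plane          # component outside the local subspace
    return np.linalg.norm(resid) / (np.linalg.norm(offset) + 1e-12)

rng = np.random.default_rng(1)
# Neighborhood embeddings lying close to a 2D plane in 3D space.
nbrs = rng.normal(size=(50, 3)) * np.array([3.0, 1.0, 0.05])
on_manifold = nbrs.mean(axis=0) + np.array([1.0, 0.5, 0.0])
off_manifold = nbrs.mean(axis=0) + np.array([0.0, 0.0, 5.0])
print(local_drift(nbrs, on_manifold) < local_drift(nbrs, off_manifold))  # True
```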
arXiv Detail & Related papers (2025-11-12T06:48:43Z) - Scalable Context-Preserving Model-Aware Deep Clustering for Hyperspectral Images [51.95768218975529]
Subspace clustering has become widely adopted for the unsupervised analysis of hyperspectral images (HSIs). Recent model-aware deep subspace clustering methods often use a two-stage framework, involving the calculation of a self-representation matrix with complexity of O(n^2), followed by spectral clustering. We propose a scalable, context-preserving deep clustering method based on basis representation, which jointly captures local and non-local structures for efficient HSI clustering.
arXiv Detail & Related papers (2025-06-12T16:43:09Z) - HierRelTriple: Guiding Indoor Layout Generation with Hierarchical Relationship Triplet Losses [52.70183252341687]
We present a hierarchical triplet-based indoor relationship learning method, coined HierRelTriple, with a focus on spatial relationship learning. HierRelTriple is a hierarchical relational triplets modeling framework that first partitions functional regions and then automatically extracts three levels of spatial relationships. Experiments on unconditional layout synthesis, floorplan-conditioned layout generation, and scene rearrangement demonstrate that HierRelTriple improves spatial-relation metrics by over 15%.
arXiv Detail & Related papers (2025-03-26T07:31:52Z) - Unraveling the Localized Latents: Learning Stratified Manifold Structures in LLM Embedding Space with Sparse Mixture-of-Experts [3.9426000822656224]
We conjecture that in large language models, the embeddings live in a local manifold structure with different dimensions depending on the perplexities and domains of the input data. By incorporating an attention-based soft-gating network, we verify that our model learns specialized sub-manifolds for an ensemble of input data sources.
arXiv Detail & Related papers (2025-02-19T09:33:16Z) - Analyzing the Latent Space of GAN through Local Dimension Estimation [4.688163910878411]
The success of style-based GANs (StyleGANs) in high-fidelity image synthesis has motivated research to understand the semantic properties of their latent spaces.
We propose a local dimension estimation algorithm for arbitrary intermediate layers in a pre-trained GAN model.
Our proposed metric, called Distortion, measures an inconsistency of intrinsic space on the learned latent space.
arXiv Detail & Related papers (2022-05-26T06:36:06Z) - Unveiling the Potential of Structure-Preserving for Weakly Supervised Object Localization [71.79436685992128]
We propose a two-stage approach, termed structure-preserving activation (SPA), towards fully leveraging the structure information incorporated in convolutional features for WSOL.
In the first stage, a restricted activation module (RAM) is designed to alleviate the structure-missing issue caused by the classification network.
In the second stage, we propose a post-process approach, termed self-correlation map generating (SCG) module to obtain structure-preserving localization maps.
arXiv Detail & Related papers (2021-03-08T03:04:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.