The Condition Number as a Scale-Invariant Proxy for Information Encoding in Neural Units
- URL: http://arxiv.org/abs/2506.16289v1
- Date: Thu, 19 Jun 2025 13:06:16 GMT
- Title: The Condition Number as a Scale-Invariant Proxy for Information Encoding in Neural Units
- Authors: Oswaldo Ludwig
- Abstract summary: We argue that a high condition number, though not sufficient for effective knowledge encoding, may indicate that the unit has learned to selectively amplify and compress information. We present a practical case study where these principles are applied to guide selective fine-tuning of a multimodal Large Language Model.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper explores the relationship between the condition number of a neural network's weight tensor and the extent of information encoded by the associated processing unit, viewed through the lens of information theory. We argue that a high condition number, though not sufficient for effective knowledge encoding, may indicate that the unit has learned to selectively amplify and compress information. We formalize this intuition, particularly for linear units with Gaussian inputs, linking the condition number and the transformation's log-volume scaling factor to the characteristics of the output entropy and the geometric properties of the learned transformation. Our analysis demonstrates that for a fixed weight norm, a concentrated distribution of singular values (high condition number) corresponds to reduced overall information transfer, indicating a specialized and efficient encoding strategy. Furthermore, we present a practical case study where these principles are applied to guide selective fine-tuning of a multimodal Large Language Model, aiming to mitigate catastrophic forgetting during cross-modal adaptation. Unlike many existing catastrophic forgetting mitigation methods that rely on access to pre-training statistics, which are often unavailable, our selective fine-tuning approach offers a way to bypass this common requirement.
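To make the abstract's central quantities concrete, here is a minimal sketch, assuming a square linear unit y = Wx with standard Gaussian input; the function name `unit_statistics` and the spectrum-concentration experiment are illustrative choices, not the paper's released code. It computes the condition number, the log-volume scaling factor (the sum of log singular values), and the Gaussian output entropy from the singular values of W, and compares an isotropic weight matrix against one with the same Frobenius norm but a concentrated singular spectrum:

```python
# Illustrative sketch (an assumption about the setup, not the paper's code):
# for y = W x with x ~ N(0, I), the condition number, log-volume scaling
# factor, and output differential entropy all follow from the singular
# values of W.
import numpy as np

def unit_statistics(W: np.ndarray):
    """Condition number, log-volume scaling, and differential entropy
    of y = W x for x ~ N(0, I), from the singular values of W."""
    s = np.linalg.svd(W, compute_uv=False)    # singular values, descending
    cond = s[0] / s[-1]                       # condition number kappa(W)
    log_volume = np.sum(np.log(s))            # log|det W| for square W
    # y is Gaussian with covariance W W^T, so
    # h(y) = (n/2) log(2*pi*e) + sum_i log s_i.
    n = W.shape[0]
    entropy = 0.5 * n * np.log(2 * np.pi * np.e) + log_volume
    return cond, log_volume, entropy

rng = np.random.default_rng(0)
n = 64
W_iso = rng.standard_normal((n, n)) / np.sqrt(n)  # roughly isotropic spectrum

# Concentrate the singular values while keeping ||W||_F fixed, mimicking the
# "fixed weight norm, high condition number" regime described in the abstract.
U, s, Vt = np.linalg.svd(W_iso)
s_conc = np.geomspace(1.0, 1e-4, n)
s_conc *= np.linalg.norm(s) / np.linalg.norm(s_conc)  # match Frobenius norm
W_conc = U @ np.diag(s_conc) @ Vt

for name, W in [("isotropic", W_iso), ("concentrated", W_conc)]:
    cond, lv, h = unit_statistics(W)
    print(f"{name:12s} kappa = {cond:10.1f}  log-volume = {lv:8.2f}  h(y) = {h:8.2f}")
```

Under this construction the concentrated unit shows a much larger condition number together with a markedly lower log-volume and output entropy, which is the "reduced overall information transfer" at fixed weight norm that the paper's analysis describes.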
Related papers
- Heterogeneous quantization regularizes spiking neural network activity
We present a data-blind neuromorphic signal conditioning strategy whereby analog data are normalized and quantized into spike phase representations.
We extend this mechanism by adding a data-aware calibration step whereby the range and density of the quantization weights adapt to accumulated input statistics.
arXiv Detail & Related papers (2024-09-27T02:25:44Z)
- PseudoNeg-MAE: Self-Supervised Point Cloud Learning using Conditional Pseudo-Negative Embeddings
PseudoNeg-MAE enhances global feature representation of point cloud masked autoencoders by making them both discriminative and sensitive to transformations. We propose a novel loss that explicitly penalizes invariant collapse, enabling the network to capture richer transformation cues while preserving discriminative representations.
arXiv Detail & Related papers (2024-09-24T07:57:21Z)
- Localized Gaussians as Self-Attention Weights for Point Clouds Correspondence
We investigate semantically meaningful patterns in the attention heads of an encoder-only Transformer architecture.
We find that fixing the attention weights not only accelerates the training process but also enhances the stability of the optimization.
arXiv Detail & Related papers (2024-09-20T07:41:47Z)
- Disentanglement via Latent Quantization
In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space.
We demonstrate the broad applicability of this approach by adding it to both basic data-reconstructing (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models.
arXiv Detail & Related papers (2023-05-28T06:30:29Z)
- Gacs-Korner Common Information Variational Autoencoder
We propose a notion of common information that allows one to quantify and separate the information that is shared between two random variables.
We demonstrate that our formulation allows us to learn semantically meaningful common and unique factors of variation even on high-dimensional data such as images and videos.
arXiv Detail & Related papers (2022-05-24T17:47:26Z)
- Adaptive Discrete Communication Bottlenecks with Dynamic Vector Quantization
We propose learning to dynamically select discretization tightness conditioned on inputs.
We show that dynamically varying tightness in communication bottlenecks can improve model performance on visual reasoning and reinforcement learning tasks.
arXiv Detail & Related papers (2022-02-02T23:54:26Z)
- Learning Optical Flow from a Few Matches
We show that the dense correlation volume representation is redundant and accurate flow estimation can be achieved with only a fraction of elements in it.
Experiments show that our method can reduce computational cost and memory use significantly, while maintaining high accuracy.
arXiv Detail & Related papers (2021-04-05T21:44:00Z)
- Scalable Vector Gaussian Information Bottleneck
We study a variation of the problem, called scalable information bottleneck, in which the encoder outputs multiple descriptions of the observation.
We derive a variational inference-type algorithm for general sources with unknown distribution and show how to parametrize it using neural networks.
arXiv Detail & Related papers (2021-02-15T12:51:26Z)
- On the Relevance-Complexity Region of Scalable Information Bottleneck
We study a variation of the problem, called scalable information bottleneck, where the encoder outputs multiple descriptions of the observation.
The problem at hand is motivated by some application scenarios that require varying levels of accuracy depending on the allowed level of generalization.
arXiv Detail & Related papers (2020-11-02T22:25:28Z)
- Information Theory Measures via Multidimensional Gaussianization
Information theory is an outstanding framework to measure uncertainty, dependence and relevance in data and systems.
It has several desirable properties for real world applications.
However, obtaining information from multidimensional data is a challenging problem due to the curse of dimensionality.
arXiv Detail & Related papers (2020-10-08T07:22:16Z)
- The Variational Bandwidth Bottleneck: Stochastic Evaluation on an Information Budget
In many applications, it is desirable to extract only the relevant information from complex input data.
The information bottleneck method formalizes this as an information-theoretic optimization problem.
We propose the variational bandwidth bottleneck, which estimates, for each example, the value of the privileged information before accessing it and stochastically decides whether that access is worthwhile.
arXiv Detail & Related papers (2020-04-24T18:29:31Z)
- Disentanglement by Nonlinear ICA with General Incompressible-flow Networks (GIN)
A central question of representation learning asks under which conditions it is possible to reconstruct the true latent variables of an arbitrarily complex generative process.
Recent breakthrough work by Khemakhem et al. on nonlinear ICA has answered this question for a broad class of conditional generative processes.
We extend this important result in a direction relevant for application to real-world data.
arXiv Detail & Related papers (2020-01-14T16:25:08Z)