Category learning in deep neural networks: Information content and geometry of internal representations
- URL: http://arxiv.org/abs/2510.19021v1
- Date: Tue, 21 Oct 2025 19:02:51 GMT
- Title: Category learning in deep neural networks: Information content and geometry of internal representations
- Authors: Laurent Bonnasse-Gahot, Jean-Pierre Nadal
- Abstract summary: In animals, category learning enhances discrimination between stimuli close to the category boundary. This phenomenon, called categorical perception, was also empirically observed in artificial neural networks trained on classification tasks. We show that minimizing the Bayes cost (mean of the cross-entropy loss) implies maximizing the mutual information between the set of categories and the neural activities prior to the decision layer.
- Score: 2.1485350418225244
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In animals, category learning enhances discrimination between stimuli close to the category boundary. This phenomenon, called categorical perception, was also empirically observed in artificial neural networks trained on classification tasks. In previous modeling works based on neuroscience data, we show that this expansion/compression is a necessary outcome of efficient learning. Here we extend our theoretical framework to artificial networks. We show that minimizing the Bayes cost (mean of the cross-entropy loss) implies maximizing the mutual information between the set of categories and the neural activities prior to the decision layer. Considering structured data with an underlying feature space of small dimension, we show that maximizing the mutual information implies (i) finding an appropriate projection space, and, (ii) building a neural representation with the appropriate metric. The latter is based on a Fisher information matrix measuring the sensitivity of the neural activity to changes in the projection space. Optimal learning makes this neural Fisher information follow a category-specific Fisher information, measuring the sensitivity of the category membership. Category learning thus induces an expansion of neural space near decision boundaries. We characterize the properties of the categorical Fisher information, showing that its eigenvectors give the most discriminant directions at each point of the projection space. We find that, unexpectedly, its maxima are in general not exactly at, but near, the class boundaries. Considering toy models and the MNIST dataset, we numerically illustrate how after learning the two Fisher information matrices match, and essentially align with the category boundaries. Finally, we relate our approach to the Information Bottleneck one, and we exhibit a bias-variance decomposition of the Bayes cost, of interest on its own.
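As an illustration of the categorical Fisher information mentioned in the abstract, here is a minimal sketch for a one-dimensional feature x and two Gaussian categories. It assumes the standard definition of the Fisher information of the category label with respect to x, F_cat(x) = sum over categories of P'(cat|x)^2 / P(cat|x); the paper's exact formulation in the projection space may differ. The function names, the toy parameters, and the finite-difference step are hypothetical choices made for this example, not taken from the paper.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def posterior(x, means, stds, priors):
    """Posterior P(category | x) for a mixture of 1D Gaussian categories."""
    joint = [q * gaussian_pdf(x, m, s) for m, s, q in zip(means, stds, priors)]
    total = sum(joint)
    return [j / total for j in joint]

def categorical_fisher(x, means, stds, priors, eps=1e-4):
    """Assumed definition: sum_c (dP(c|x)/dx)^2 / P(c|x).

    The derivative is approximated with a central finite difference.
    """
    p_plus = posterior(x + eps, means, stds, priors)
    p_minus = posterior(x - eps, means, stds, priors)
    p_mid = posterior(x, means, stds, priors)
    fisher = 0.0
    for pm, pp, p0 in zip(p_minus, p_plus, p_mid):
        dp = (pp - pm) / (2.0 * eps)
        fisher += dp * dp / p0
    return fisher

# Hypothetical toy setup: two equiprobable categories with equal variance,
# so the category boundary sits at x = 0.
MEANS, STDS, PRIORS = (-1.0, 1.0), (1.0, 1.0), (0.5, 0.5)

grid = [i / 100.0 for i in range(-400, 401)]
fisher_values = [categorical_fisher(x, MEANS, STDS, PRIORS) for x in grid]
x_peak = grid[max(range(len(grid)), key=fisher_values.__getitem__)]

print("location of the maximum:", x_peak)
print("F_cat at the boundary:", categorical_fisher(0.0, MEANS, STDS, PRIORS))
print("F_cat far from the boundary:", categorical_fisher(2.0, MEANS, STDS, PRIORS))
```

In this symmetric toy case the posterior is a sigmoid of 2x, so the closed form is F_cat(x) = 4 P(1|x) P(2|x), which peaks at the boundary with value 1 and decays away from it. This is a special case: the abstract states that in general the maxima lie near, but not exactly at, the class boundaries.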
Related papers
- Trade-Offs of Diagonal Fisher Information Matrix Estimators [53.35448232352667]
The Fisher information matrix can be used to characterize the local geometry of the parameter space of neural networks.
We examine two popular estimators whose accuracy and sample complexity depend on their associated variances.
We derive bounds of the variances and instantiate them in neural networks for regression and classification.
arXiv Detail & Related papers (2024-02-08T03:29:10Z)
- Information theoretic study of the neural geometry induced by category learning [0.0]
We take an information theoretic approach to assess the efficiency of the representations induced by category learning.
One main consequence is that category learning induces an expansion of neural space near decision boundaries.
arXiv Detail & Related papers (2023-11-27T10:16:22Z)
- Invariant Representations with Stochastically Quantized Neural Networks [5.7923858184309385]
We propose a methodology for direct computation of the mutual information between a neural layer and a sensitive attribute.
We show that this method compares favorably with the state of the art in fair representation learning.
arXiv Detail & Related papers (2022-08-04T13:36:06Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- Category-orthogonal object features guide information processing in recurrent neural networks trained for object categorization [0.12891210250935145]
Recurrent neural networks (RNNs) have been shown to perform better than feedforward architectures in visual object categorization tasks.
We test the hypothesis that recurrence iteratively aids object categorization via the communication of category-orthogonal auxiliary variables.
arXiv Detail & Related papers (2021-11-15T16:52:07Z)
- Learning Debiased and Disentangled Representations for Semantic Segmentation [52.35766945827972]
We propose a model-agnostic training scheme for semantic segmentation.
By randomly eliminating certain class information in each training iteration, we effectively reduce feature dependencies among classes.
Models trained with our approach demonstrate strong results on multiple semantic segmentation benchmarks.
arXiv Detail & Related papers (2021-10-31T16:15:09Z)
- Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is end-to-end learned.
We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferrable to a new task in a sample efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z)
- Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z)
- Malicious Network Traffic Detection via Deep Learning: An Information Theoretic View [0.0]
We study how homeomorphism affects the learned representation of a malware traffic dataset.
Our results suggest that although the details of learned representations and the specific coordinate system defined over the manifold of all parameters differ slightly, the functional approximations are the same.
arXiv Detail & Related papers (2020-09-16T15:37:44Z)
- Hold me tight! Influence of discriminative features on deep network boundaries [63.627760598441796]
We propose a new perspective that relates dataset features to the distance of samples to the decision boundary.
This enables us to carefully tweak the position of the training samples and measure the induced changes on the boundaries of CNNs trained on large-scale vision datasets.
arXiv Detail & Related papers (2020-02-15T09:29:36Z)