Structured Contrastive Learning for Interpretable Latent Representations
- URL: http://arxiv.org/abs/2511.14920v1
- Date: Tue, 18 Nov 2025 21:18:20 GMT
- Title: Structured Contrastive Learning for Interpretable Latent Representations
- Authors: Zhengyang Shen, Hua Tu, Mayue Shi
- Abstract summary: We propose Structured Contrastive Learning (SCL), a framework that partitions latent space representations into three semantic groups. Experiments on ECG phase invariance and IMU rotation demonstrate superior performance. This work represents a paradigm shift from reactive data augmentation to proactive structural learning, enabling interpretable latent representations in neural networks.
- Score: 2.8870482999983094
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural networks exhibit severe brittleness to semantically irrelevant transformations. A mere 75ms electrocardiogram (ECG) phase shift degrades latent cosine similarity from 1.0 to 0.2, while sensor rotations collapse activity recognition performance with inertial measurement units (IMUs). We identify the root cause as "laissez-faire" representation learning, where latent spaces evolve unconstrained provided task performance is satisfied. We propose Structured Contrastive Learning (SCL), a framework that partitions latent space representations into three semantic groups: invariant features that remain consistent under given transformations (e.g., phase shifts or rotations), variant features that actively differentiate transformations via a novel variant mechanism, and free features that preserve task flexibility. This creates controllable push-pull dynamics where different latent dimensions serve distinct, interpretable purposes. The variant mechanism enhances contrastive learning by encouraging variant features to differentiate within positive pairs, enabling simultaneous robustness and interpretability. Our approach requires no architectural modifications and integrates seamlessly into existing training pipelines. Experiments on ECG phase invariance and IMU rotation robustness demonstrate superior performance: ECG similarity improves from 0.25 to 0.91 under phase shifts, while WISDM activity recognition achieves 86.65% accuracy with 95.38% rotation consistency, consistently outperforming traditional data augmentation. This work represents a paradigm shift from reactive data augmentation to proactive structural learning, enabling interpretable latent representations in neural networks.
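The push-pull dynamic described in the abstract can be sketched as a toy loss over one positive pair. The group sizes, the cosine-margin hinge for the variant group, and the equal weighting of the two terms are illustrative assumptions, not the paper's exact objective:

```python
import numpy as np

def structured_contrastive_loss(z_a, z_b, n_inv, n_var, margin=0.5):
    """Toy sketch of the SCL idea (function name, hinge form, and
    weighting are assumptions, not the paper's formulation).

    z_a, z_b: latent vectors for a positive pair (e.g. an ECG segment
    and its phase-shifted copy). The latent is split into three groups:
      invariant  z[:n_inv]             -> pulled together across the pair
      variant    z[n_inv:n_inv+n_var]  -> pushed apart within the pair
      free       z[n_inv+n_var:]       -> left unconstrained for the task
    """
    inv_a, inv_b = z_a[:n_inv], z_b[:n_inv]
    var_a = z_a[n_inv:n_inv + n_var]
    var_b = z_b[n_inv:n_inv + n_var]

    def cos(u, v):
        return float(np.dot(u, v) /
                     (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

    # Pull: invariant features should agree under the transformation.
    pull = 1.0 - cos(inv_a, inv_b)
    # Push: variant features should differentiate within the positive
    # pair, so penalize similarity above a margin (hinge on cosine).
    push = max(0.0, cos(var_a, var_b) - margin)
    # Free features contribute nothing here; the task loss shapes them.
    return pull + push
```

A pair whose invariant dimensions match and whose variant dimensions are orthogonal incurs (near-)zero loss; the opposite configuration is penalized by both terms, which is the controllable push-pull dynamic the abstract describes.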
Related papers
- Structural Action Transformer for 3D Dexterous Manipulation [80.07649565189035]
Cross-embodiment skill transfer is a challenge for high-DoF robotic hands. Existing methods, often relying on 2D observations and temporal-centric action representations, struggle to capture 3D spatial relations. This paper proposes a new 3D dexterous manipulation policy that challenges this paradigm by introducing a structural-centric perspective.
arXiv Detail & Related papers (2026-03-04T11:38:12Z) - NaVIDA: Vision-Language Navigation with Inverse Dynamics Augmentation [50.027425808733994]
NaVIDA is a unified VLN framework that couples policy learning with action-grounded visual dynamics and adaptive execution. NaVIDA augments training with chunk-based inverse-dynamics supervision to learn the causal relationship between visual changes and corresponding actions. Experiments show that NaVIDA achieves superior navigation performance compared to state-of-the-art methods with fewer parameters.
arXiv Detail & Related papers (2026-01-26T06:16:17Z) - Deep Delta Learning [91.75868893250662]
We introduce Deep Delta Learning (DDL), a novel architecture that generalizes the standard residual connection. We provide a spectral analysis of this operator, demonstrating that the gate (a function of the input $\mathbf{X}$) enables dynamic interpolation between identity mapping, projection, and geometric reflection. This unification empowers the network to explicitly control the spectrum of its layer-wise transition operator, enabling the modeling of complex, non-monotonic dynamics.
arXiv Detail & Related papers (2026-01-01T18:11:38Z) - Efficient Neural Networks with Discrete Cosine Transform Activations [0.6933076588916188]
Expressive Neural Network (ENN) is a multilayer perceptron with adaptive activation functions parametrized using the Discrete Cosine Transform (DCT). We show that ENNs achieve state-of-the-art accuracy while maintaining a low number of parameters.
arXiv Detail & Related papers (2025-11-05T15:02:58Z) - Learning with Category-Equivariant Representations for Human Activity Recognition [0.0]
We introduce a categorical symmetry-aware learning framework that captures how signals vary over time, scale, and sensor hierarchy. We build these factors into the structure of feature representations, yielding models that automatically preserve the relationships between sensors. On the UCI Human Activity Recognition benchmark, this categorical symmetry-driven design improves out-of-distribution accuracy by approx. 46 percentage points.
arXiv Detail & Related papers (2025-11-02T11:37:36Z) - Latent Diffusion Model without Variational Autoencoder [78.34722551463223]
SVG is a novel latent diffusion model without variational autoencoders for visual generation. It constructs a feature space with clear semantic discriminability by leveraging frozen DINO features. It enables accelerated diffusion training, supports few-step sampling, and improves generative quality.
arXiv Detail & Related papers (2025-10-17T04:17:44Z) - Learning by Steering the Neural Dynamics: A Statistical Mechanics Perspective [0.0]
We study how neural dynamics can support fully local, distributed learning. We propose a biologically plausible algorithm for supervised learning with any binary recurrent network.
arXiv Detail & Related papers (2025-10-13T22:28:34Z) - Distribution Shift Aware Neural Tabular Learning [40.14597657016167]
Tabular learning transforms raw features into optimized spaces for downstream tasks, but its effectiveness deteriorates under distribution shifts between training and testing data. We propose a novel Shift-Aware Feature Transformation framework to address this.
arXiv Detail & Related papers (2025-08-27T00:14:08Z) - Adapting to Fragmented and Evolving Data: A Fisher Information Perspective [0.0]
FADE is a lightweight framework for robust learning under dynamic environments. It employs a shift-aware regularization mechanism anchored in Fisher information geometry. FADE operates online with fixed memory and no access to target labels.
arXiv Detail & Related papers (2025-07-25T06:50:09Z) - The Butterfly Effect: Neural Network Training Trajectories Are Highly Sensitive to Initial Conditions [51.68215326304272]
We show that even small perturbations reliably cause otherwise identical training trajectories to diverge, an effect that diminishes rapidly over training time. Our findings provide insights into neural network training stability, with practical implications for fine-tuning, model merging, and diversity of model ensembles.
arXiv Detail & Related papers (2025-06-16T08:35:16Z) - PseudoNeg-MAE: Self-Supervised Point Cloud Learning using Conditional Pseudo-Negative Embeddings [55.55445978692678]
PseudoNeg-MAE enhances the global feature representation of point cloud masked autoencoders by making them both discriminative and sensitive to transformations. We propose a novel loss that explicitly penalizes invariant collapse, enabling the network to capture richer transformation cues while preserving discriminative representations.
arXiv Detail & Related papers (2024-09-24T07:57:21Z) - A Generic Shared Attention Mechanism for Various Backbone Neural Networks [53.36677373145012]
Self-attention modules (SAMs) produce strongly correlated attention maps across different layers.
Dense-and-Implicit Attention (DIA) shares SAMs across layers and employs a long short-term memory module.
Our simple yet effective DIA can consistently enhance various network backbones.
arXiv Detail & Related papers (2022-10-27T13:24:08Z) - Topographic VAEs learn Equivariant Capsules [84.33745072274942]
We introduce the Topographic VAE: a novel method for efficiently training deep generative models with topographically organized latent variables.
We show that such a model indeed learns to organize its activations according to salient characteristics such as digit class, width, and style on MNIST.
We demonstrate approximate equivariance to complex transformations, expanding upon the capabilities of existing group equivariant neural networks.
arXiv Detail & Related papers (2021-09-03T09:25:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.