Spectral Disentanglement and Enhancement: A Dual-domain Contrastive Framework for Representation Learning
- URL: http://arxiv.org/abs/2602.09066v1
- Date: Mon, 09 Feb 2026 07:29:43 GMT
- Title: Spectral Disentanglement and Enhancement: A Dual-domain Contrastive Framework for Representation Learning
- Authors: Jinjin Guo, Yexin Li, Zhichao Huang, Jun Fang, Zhiyuan Liu, Chao Liu, Pengzhang Liu, Qixia Jiang
- Abstract summary: Spectral Disentanglement and Enhancement (SDE) is a novel framework that bridges the gap between the geometry of the embedded spaces and their spectral properties. SDE consistently improves representation robustness and generalization, outperforming state-of-the-art methods.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large-scale multimodal contrastive learning has recently achieved impressive success in learning rich and transferable representations, yet it remains fundamentally limited by the uniform treatment of feature dimensions and the neglect of the intrinsic spectral structure of the learned features. Empirical evidence indicates that high-dimensional embeddings tend to collapse into narrow cones, concentrating task-relevant semantics in a small subspace, while the majority of dimensions remain occupied by noise and spurious correlations. Such spectral imbalance and entanglement undermine model generalization. We propose Spectral Disentanglement and Enhancement (SDE), a novel framework that bridges the gap between the geometry of the embedded spaces and their spectral properties. Our approach leverages singular value decomposition to adaptively partition feature dimensions into strong signals that capture task-critical semantics, weak signals that reflect ancillary correlations, and noise representing irrelevant perturbations. A curriculum-based spectral enhancement strategy is then applied, selectively amplifying informative components with theoretical guarantees on training stability. Building upon the enhanced features, we further introduce a dual-domain contrastive loss that jointly optimizes alignment in both the feature and spectral spaces, effectively integrating spectral regularization into the training process and encouraging richer, more robust representations. Extensive experiments on large-scale multimodal benchmarks demonstrate that SDE consistently improves representation robustness and generalization, outperforming state-of-the-art methods. SDE integrates seamlessly with existing contrastive pipelines, offering an effective solution for multimodal representation learning.
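The spectral partitioning and enhancement step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `spectral_enhance`, the energy-fraction thresholds `strong_frac` and `noise_frac`, and the gain `gamma` are all assumed placeholders (the paper instead uses an adaptive partition and a curriculum schedule for the amplification).

```python
import numpy as np

def spectral_enhance(Z, strong_frac=0.7, noise_frac=0.05, gamma=1.5):
    """Partition the embedding spectrum into strong / weak / noise bands via
    SVD and amplify the weak band, in the spirit of SDE.
    All thresholds and the gain `gamma` are illustrative placeholders."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    energy = np.cumsum(s**2) / np.sum(s**2)           # cumulative spectral energy
    k_strong = int(np.searchsorted(energy, strong_frac))       # end of strong band
    k_noise = int(np.searchsorted(energy, 1.0 - noise_frac))   # start of noise tail
    s_new = s.copy()
    s_new[k_strong:k_noise] *= gamma                  # amplify weak (ancillary) signals
    s_new[k_noise:] = 0.0                             # suppress noise directions
    return (U * s_new) @ Vt                           # re-synthesize embeddings

rng = np.random.default_rng(0)
Z = rng.standard_normal((256, 64))                    # batch of 64-d embeddings
Z_enh = spectral_enhance(Z)
```

This sketch covers only the partition-and-enhance step; in the full framework the enhanced features additionally feed a dual-domain contrastive loss that aligns representations in both the feature space and the spectral space.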
Related papers
- Spectral Discrepancy and Cross-modal Semantic Consistency Learning for Object Detection in Hyperspectral Image [40.38555448650773]
Hyperspectral images with high spectral resolution provide new insights into recognizing subtle differences in similar substances. Object detection in hyperspectral images faces significant challenges from intra- and inter-class similarity due to spatial differences across hyperspectral bands. We propose a novel network termed Spectral Discrepancy and Cross-Modal semantic consistency learning (SDCM). Our proposed method achieves state-of-the-art performance compared with existing approaches.
arXiv Detail & Related papers (2025-12-20T07:03:09Z) - Spectral Representation-based Reinforcement Learning [42.78610854620513]
We introduce the perspective of spectral representations as a solution to difficulties in reinforcement learning. We show how to construct spectral representations for transition operators that possess latent variable structures or energy-based structures. We also provably extend this spectral view to partially observable MDPs.
arXiv Detail & Related papers (2025-12-17T02:54:42Z) - Implicit Counterfactual Learning for Audio-Visual Segmentation [50.69377287012591]
We propose the implicit counterfactual framework (ICF) to achieve unbiased cross-modal understanding. Because heterogeneous representations lack shared semantics, they may lead to erroneous matches. We therefore introduce multi-granularity implicit text (MIT), spanning the video, segment, and frame levels, as a bridge to establish a modality-shared space.
arXiv Detail & Related papers (2025-07-28T11:46:35Z) - Equal is Not Always Fair: A New Perspective on Hyperspectral Representation Non-Uniformity [42.8098014428052]
Hyperspectral image (HSI) representation is fundamentally challenged by pervasive non-uniformity. We propose FairHyp, a fairness-directed framework that disentangles and resolves the threefold non-uniformity. Our findings redefine fairness as a structural necessity in HSI modeling and offer a new paradigm for balancing adaptability, efficiency, and fidelity.
arXiv Detail & Related papers (2025-05-16T14:00:11Z) - Revisiting Self-Supervised Heterogeneous Graph Learning from Spectral Clustering Perspective [52.662463893268225]
Self-supervised heterogeneous graph learning (SHGL) has shown promising potential in diverse scenarios. Existing SHGL methods encounter two significant limitations. We introduce a novel framework enhanced by rank and dual consistency constraints.
arXiv Detail & Related papers (2024-12-01T09:33:20Z) - Fast Disentangled Slim Tensor Learning for Multi-view Clustering [28.950845031752927]
We propose a new approach termed fast Disentangled Slim Tensor Learning (DSTL) for multi-view clustering.
To alleviate the negative influence of feature redundancy, inspired by robust PCA, DSTL disentangles the latent low-dimensional representation into a semantic-unrelated part and a semantic-related part for each view.
Our proposed model is computationally efficient and can be solved effectively.
arXiv Detail & Related papers (2024-11-12T09:57:53Z) - Enhancing Multimodal Unified Representations for Cross Modal Generalization [52.16653133604068]
We propose Training-free Optimization of Codebook (TOC) and Fine and Coarse cross-modal Information Disentangling (FCID). These methods refine the unified discrete representations from pretraining and perform fine- and coarse-grained information disentanglement tailored to the specific characteristics of each modality.
arXiv Detail & Related papers (2024-03-08T09:16:47Z) - Learning Exhaustive Correlation for Spectral Super-Resolution: Where Spatial-Spectral Attention Meets Linear Dependence [26.1694389791047]
Spectral super-resolution aims to recover a hyperspectral image (HSI) from an easily obtainable RGB image.
Two types of bottlenecks in existing Transformers limit performance improvement and practical applications.
We propose a novel Exhaustive Correlation Transformer (ECT) for spectral super-resolution.
arXiv Detail & Related papers (2023-12-20T08:30:07Z) - Deep Diversity-Enhanced Feature Representation of Hyperspectral Images [87.47202258194719]
We rectify 3D convolution by modifying its topology to enhance the rank upper-bound.
We also propose a novel diversity-aware regularization (DA-Reg) term that acts on the feature maps to maximize independence among elements.
To demonstrate the superiority of the proposed Re$3$-ConvSet and DA-Reg, we apply them to various HS image processing and analysis tasks.
arXiv Detail & Related papers (2023-01-15T16:19:18Z) - Spectral Decomposition Representation for Reinforcement Learning [100.0424588013549]
We propose an alternative spectral method, Spectral Decomposition Representation (SPEDER), that extracts a state-action abstraction from the dynamics without inducing spurious dependence on the data collection policy.
A theoretical analysis establishes the sample efficiency of the proposed algorithm in both the online and offline settings.
An experimental investigation demonstrates superior performance over current state-of-the-art algorithms across several benchmarks.
arXiv Detail & Related papers (2022-08-19T19:01:30Z) - Dynamic Dual-Attentive Aggregation Learning for Visible-Infrared Person Re-Identification [208.1227090864602]
Visible-infrared person re-identification (VI-ReID) is a challenging cross-modality pedestrian retrieval problem.
Existing VI-ReID methods tend to learn global representations, which have limited discriminability and weak robustness to noisy images.
We propose a novel dynamic dual-attentive aggregation (DDAG) learning method by mining both intra-modality part-level and cross-modality graph-level contextual cues for VI-ReID.
arXiv Detail & Related papers (2020-07-18T03:08:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.