mHC: Manifold-Constrained Hyper-Connections
- URL: http://arxiv.org/abs/2512.24880v2
- Date: Mon, 05 Jan 2026 16:51:18 GMT
- Title: mHC: Manifold-Constrained Hyper-Connections
- Authors: Zhenda Xie, Yixuan Wei, Huanqi Cao, Chenggang Zhao, Chengqi Deng, Jiashi Li, Damai Dai, Huazuo Gao, Jiang Chang, Kuai Yu, Liang Zhao, Shangyan Zhou, Zhean Xu, Zhengyan Zhang, Wangding Zeng, Shengding Hu, Yuqing Wang, Jingyang Yuan, Lean Wang, Wenfeng Liang
- Abstract summary: Hyper-Connections (HC) have extended the ubiquitous residual connection paradigm by expanding the residual stream width and diversifying connectivity patterns. We propose Manifold-Constrained Hyper-Connections (mHC) to restore the identity mapping property intrinsic to the residual connection. mHC is effective for training at scale, offering tangible performance improvements and superior scalability.
- Score: 43.69451283828811
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, studies exemplified by Hyper-Connections (HC) have extended the ubiquitous residual connection paradigm established over the past decade by expanding the residual stream width and diversifying connectivity patterns. While yielding substantial performance gains, this diversification fundamentally compromises the identity mapping property intrinsic to the residual connection, which causes severe training instability and restricted scalability, and additionally incurs notable memory access overhead. To address these challenges, we propose Manifold-Constrained Hyper-Connections (mHC), a general framework that projects the residual connection space of HC onto a specific manifold to restore the identity mapping property, while incorporating rigorous infrastructure optimization to ensure efficiency. Empirical experiments demonstrate that mHC is effective for training at scale, offering tangible performance improvements and superior scalability. We anticipate that mHC, as a flexible and practical extension of HC, will contribute to a deeper understanding of topological architecture design and suggest promising directions for the evolution of foundational models.
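To make the idea concrete, here is a minimal NumPy sketch of one hyper-connection step with a manifold constraint on the stream-mixing matrix. Everything here is an illustrative assumption, not the paper's actual construction: the function name `hyper_connection_step`, the choice of the row-stochastic (softmax) manifold as the projection target, and the mean aggregation of streams into the layer input are all placeholders for whatever projection and routing mHC actually uses.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def hyper_connection_step(streams, H, layer):
    """One widened-residual step with a constrained mixing matrix.

    streams: (n, d) array of n parallel residual streams.
    H:       (n, n) unconstrained mixing weights.
    layer:   the sublayer function, mapping a (d,) input to a (d,) output.
    """
    # Project the raw mixing weights onto a manifold of row-stochastic
    # matrices: each output stream is a convex combination of input streams,
    # so signal magnitudes cannot blow up through the mixing alone.
    M = softmax(H, axis=-1)
    mixed = M @ streams
    x = mixed.mean(axis=0)          # aggregate streams into the layer input
    return mixed + layer(x)         # broadcast the layer output onto streams

rng = np.random.default_rng(0)
n, d = 4, 8
streams = rng.normal(size=(n, d))
H = rng.normal(size=(n, n))
zero_layer = lambda x: np.zeros_like(x)  # a "do-nothing" sublayer
out = hyper_connection_step(streams, H, zero_layer)
# With a zero sublayer, each output stream stays a convex combination of the
# input streams, illustrating the restored identity-mapping-like behavior.
```

With an unconstrained `H`, by contrast, `M @ streams` could amplify or cancel the residual signal arbitrarily, which is the instability the abstract attributes to unconstrained HC.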
Related papers
- mHC-HSI: Clustering-Guided Hyper-Connection Mamba for Hyperspectral Image Classification [2.3379613890114395]
This paper presents a clustering-guided mHC Mamba model (mHC-HSI) for enhanced HSI classification. The proposed approach is tested on benchmark datasets against state-of-the-art methods.
arXiv Detail & Related papers (2026-03-03T18:56:40Z)
- JPmHC: Dynamical Isometry via Orthogonal Hyper-Connections [2.4311915994390403]
JPmHC is a framework that replaces identity skips with a trainable linear mixer acting on n parallel streams. It prevents gradient pathologies and enhances stability. It achieves faster convergence, higher accuracy, and lower computational cost compared to bistochastic baselines.
arXiv Detail & Related papers (2026-02-20T16:06:01Z)
- Short Chains, Deep Thoughts: Balancing Reasoning Efficiency and Intra-Segment Capability via Split-Merge Optimization [68.89915707647138]
Large Reasoning Models (LRMs) have demonstrated impressive capabilities in solving complex tasks through the generation of long reasoning chains. We propose CoSMo (Split-Merge Optimization), a framework designed to eliminate structural redundancy rather than indiscriminately restricting token volume.
arXiv Detail & Related papers (2026-02-03T05:54:28Z)
- Improving Multi-step RAG with Hypergraph-based Memory for Long-Context Complex Relational Modeling [83.29209853451697]
Multi-step retrieval-augmented generation (RAG) has become a widely adopted strategy for enhancing large language models (LLMs). We introduce HGMem, a hypergraph-based memory mechanism that extends the concept of memory into a dynamic, expressive structure for complex reasoning and global understanding. In our approach, memory is represented as a hypergraph whose hyperedges correspond to distinct memory units, enabling the progressive formation of higher-order interactions within memory.
arXiv Detail & Related papers (2025-12-30T03:13:10Z)
- Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models [71.9060068259379]
We propose cascaded domain-wise reinforcement learning to build general-purpose reasoning models. Our 14B model, after RL, outperforms its SFT teacher, DeepSeek-R1-0528, on LiveCodeBench v5/v6 Pro, and achieves silver-medal performance in the 2025 International Olympiad in Informatics (IOI).
arXiv Detail & Related papers (2025-12-15T18:02:35Z)
- HBridge: H-Shape Bridging of Heterogeneous Experts for Unified Multimodal Understanding and Generation [72.69742127579508]
Recent unified models integrate understanding experts (e.g., LLMs) with generative experts (e.g., diffusion models). In this work, we propose HBridge, an asymmetric H-shaped architecture that enables heterogeneous experts to optimally leverage pretrained priors. Extensive experiments across multiple benchmarks demonstrate the effectiveness and superior performance of HBridge.
arXiv Detail & Related papers (2025-11-25T17:23:38Z)
- Adapformer: Adaptive Channel Management for Multivariate Time Series Forecasting [49.40321003932633]
Adapformer is an advanced Transformer-based framework that merges the benefits of channel-independent (CI) and channel-dependent (CD) methodologies through effective channel management. Adapformer achieves superior performance over existing models, enhancing both predictive accuracy and computational efficiency.
arXiv Detail & Related papers (2025-11-18T16:24:05Z)
- Flow-Matching Guided Deep Unfolding for Hyperspectral Image Reconstruction [53.26903617819014]
The Flow-Matching-guided Unfolding network (FMU) is the first to integrate flow matching into HSI reconstruction. To further strengthen the learned dynamics, we introduce a mean velocity loss. Experiments on both simulated and real datasets show that FMU significantly outperforms existing approaches in reconstruction quality.
arXiv Detail & Related papers (2025-10-02T11:32:00Z)
- CCF: A Context Compression Framework for Efficient Long-Sequence Language Modeling [52.05149789178508]
CCF is a novel context compression framework designed to enable efficient long-context modeling. CCF integrates segment-wise semantic aggregation with key-value memory encoding, forming compact representations. Empirical results on multiple long-context language modeling benchmarks demonstrate that CCF achieves competitive perplexity under high compression ratios.
arXiv Detail & Related papers (2025-09-11T07:13:49Z)
- Scalable fluxonium qubit architecture with tunable interactions between non-computational levels [21.16783987031157]
We introduce a scalable fluxonium architecture that enables decoupling of qubit states while maintaining tunable couplings between non-computational states. We demonstrate that the issue can be mitigated by implementing tunable couplings for fluxonium plasmon transitions, while enabling fast, high-fidelity gates with passive ZZ suppression.
arXiv Detail & Related papers (2025-04-14T05:31:47Z)
- Retraining-Free Merging of Sparse MoE via Hierarchical Clustering [24.28646376876676]
This paper introduces Hierarchical Clustering for Sparsely activated Mixture of Experts (HC-SMoE). HC-SMoE is a task-agnostic expert merging framework for parameter reduction without retraining. We provide theoretical analysis and evaluations across multiple zero-shot language tasks to demonstrate HC-SMoE's effectiveness in state-of-the-art models including Qwen and Mixtral.
arXiv Detail & Related papers (2024-10-11T07:36:14Z)
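As a rough illustration of the retraining-free merging idea in the HC-SMoE entry above, the sketch below greedily agglomerates toy expert weight matrices by centroid distance and averages each resulting cluster. The function name `merge_experts`, the centroid-distance merge criterion, and the synthetic two-group data are all assumptions for illustration; HC-SMoE's actual clustering features and linkage may differ.

```python
import numpy as np

def merge_experts(expert_weights, k):
    """Greedily merge expert weight matrices down to k clusters.

    Repeatedly fuses the two clusters whose centroids (flattened weights)
    are closest, then returns the per-cluster averaged weights. A toy
    agglomerative sketch, not HC-SMoE's exact procedure.
    """
    clusters = [[i] for i in range(len(expert_weights))]
    flat = [w.ravel() for w in expert_weights]
    centroids = [f.copy() for f in flat]
    while len(clusters) > k:
        best, pair = np.inf, None
        for a in range(len(centroids)):
            for b in range(a + 1, len(centroids)):
                d = np.linalg.norm(centroids[a] - centroids[b])
                if d < best:
                    best, pair = d, (a, b)
        a, b = pair                      # a < b, so popping b is safe
        clusters[a] += clusters.pop(b)
        centroids[a] = np.mean([flat[i] for i in clusters[a]], axis=0)
        centroids.pop(b)
    merged = [np.mean([expert_weights[i] for i in c], axis=0)
              for c in clusters]
    return merged, clusters

rng = np.random.default_rng(1)
# Eight toy "experts": two well-separated groups of similar weight matrices.
experts = ([rng.normal(loc=0.0, scale=0.05, size=(4, 4)) for _ in range(4)]
           + [rng.normal(loc=5.0, scale=0.05, size=(4, 4)) for _ in range(4)])
merged, clusters = merge_experts(experts, k=2)
```

Because merging only averages existing weights, no gradient updates are needed afterward, which is what makes this family of approaches retraining-free.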
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.