Towards a deeper GCN: Alleviate over-smoothing with iterative training and fine-tuning
- URL: http://arxiv.org/abs/2506.17576v2
- Date: Tue, 22 Jul 2025 14:07:33 GMT
- Title: Towards a deeper GCN: Alleviate over-smoothing with iterative training and fine-tuning
- Authors: Furong Peng, Jinzhen Gao, Xuan Lu, Kang Liu, Yifan Huo, Sheng Wang,
- Abstract summary: Graph Convolutional Networks (GCNs) suffer from severe performance degradation in deep architectures due to over-smoothing. We propose Layer-wise Gradual Training (LGT), a novel training strategy that progressively builds deep GCNs while preserving their expressive capacity. LGT achieves state-of-the-art performance on benchmark datasets, significantly improving accuracy even in 32-layer settings.
- Score: 7.841760459191837
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graph Convolutional Networks (GCNs) suffer from severe performance degradation in deep architectures due to over-smoothing. While existing studies primarily attribute the over-smoothing to repeated applications of graph Laplacian operators, our empirical analysis reveals a critical yet overlooked factor: trainable linear transformations in GCNs significantly exacerbate feature collapse, even at moderate depths (e.g., 8 layers). In contrast, Simplified Graph Convolution (SGC), which removes these transformations, maintains stable feature diversity up to 32 layers, highlighting linear transformations' dual role in facilitating expressive power and inducing over-smoothing. However, completely removing linear transformations weakens the model's expressive capacity. To address this trade-off, we propose Layer-wise Gradual Training (LGT), a novel training strategy that progressively builds deep GCNs while preserving their expressiveness. LGT integrates three complementary components: (1) layer-wise training to stabilize optimization from shallow to deep layers, (2) low-rank adaptation to fine-tune shallow layers and accelerate training, and (3) identity initialization to ensure smooth integration of new layers and accelerate convergence. Extensive experiments on benchmark datasets demonstrate that LGT achieves state-of-the-art performance on vanilla GCN, significantly improving accuracy even in 32-layer settings. Moreover, as a training method, LGT can be seamlessly combined with existing methods such as PairNorm and ContraNorm, further enhancing their performance in deeper networks. LGT offers a general, architecture-agnostic training framework for scalable deep GCNs. The code is available at [https://github.com/jfklasdfj/LGT_GCN].
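To make the three components concrete, the sketch below shows one way that layer-wise growth, low-rank adaptation of the shallow layers, and identity initialization could fit together for a dense GCN operating on a normalized adjacency matrix. All class and function names, the LoRA rank, and the per-stage training schedule are illustrative assumptions rather than the authors' implementation; see the repository linked above for the reference code.

```python
# Minimal, hypothetical sketch of Layer-wise Gradual Training (LGT).
# Assumptions: dense normalized adjacency a_hat (n x n), node features x (n x f),
# integer labels y (n,), boolean train_mask (n,). Names and hyperparameters are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRAGCNLayer(nn.Module):
    """Dense GCN layer h -> relu(a_hat @ h @ (W + B @ A)) with a low-rank adapter."""

    def __init__(self, dim: int, rank: int = 8):
        super().__init__()
        self.weight = nn.Parameter(torch.eye(dim))          # (3) identity initialization of the new layer
        self.lora_b = nn.Parameter(torch.zeros(dim, rank))  # adapter starts as a zero update
        self.lora_a = nn.Parameter(torch.randn(rank, dim) * 0.01)

    def forward(self, a_hat: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        w = self.weight + self.lora_b @ self.lora_a          # base weight + low-rank update
        return F.relu(a_hat @ h @ w)

    def freeze_base(self) -> None:
        self.weight.requires_grad_(False)                    # (2) shallow layers: LoRA-only fine-tuning


def layerwise_gradual_training(a_hat, x, y, train_mask,
                               num_layers=32, dim=64, epochs_per_stage=50):
    proj = nn.Linear(x.size(1), dim)                         # map raw features to the hidden width
    head = nn.Linear(dim, int(y.max()) + 1)
    layers = nn.ModuleList()
    for _ in range(num_layers):                              # (1) grow the network one layer at a time
        for old in layers:
            old.freeze_base()                                # earlier layers keep only their adapters trainable
        layers.append(LoRAGCNLayer(dim))
        params = [p for p in (*proj.parameters(), *layers.parameters(), *head.parameters())
                  if p.requires_grad]
        opt = torch.optim.Adam(params, lr=1e-2, weight_decay=5e-4)
        for _ in range(epochs_per_stage):
            opt.zero_grad()
            h = proj(x)
            for layer in layers:
                h = layer(a_hat, h)
            loss = F.cross_entropy(head(h)[train_mask], y[train_mask])
            loss.backward()
            opt.step()
    return proj, layers, head
```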
Related papers
- LiteGS: A High-performance Framework to Train 3DGS in Subminutes via System and Algorithm Codesign [9.937895857852029]
3D Gaussian Splatting (3DGS) has emerged as a promising alternative in 3D representation, but it still suffers from high training cost. This paper introduces LiteGS, a high-performance computation framework that systematically optimizes the 3DGS training pipeline. Experimental results demonstrate that LiteGS accelerates the original 3DGS training by up to 13.4x with comparable or superior quality.
arXiv Detail & Related papers (2025-03-03T05:52:02Z)
- Fast and Slow Gradient Approximation for Binary Neural Network Optimization [11.064044986709733]
Hypernetwork-based methods utilize neural networks to learn the gradients of non-differentiable quantization functions. We propose a Historical Gradient Storage (HGS) module, which models the historical gradient sequence to generate the first-order momentum required for optimization. We also introduce Layer Recognition Embeddings (LRE) into the hypernetwork, facilitating the generation of layer-specific fine gradients.
arXiv Detail & Related papers (2024-12-16T13:48:40Z)
- GAQAT: Gradient-Adaptive Quantization-Aware Training for Domain Generalization [54.31450550793485]
We propose a novel Gradient-Adaptive Quantization-Aware Training (GAQAT) framework for domain generalization (DG). Our approach begins by identifying the scale-gradient conflict problem in low-precision quantization. Extensive experiments validate the effectiveness of the proposed GAQAT framework.
arXiv Detail & Related papers (2024-12-07T06:07:21Z)
- ZNorm: Z-Score Gradient Normalization Accelerating Skip-Connected Network Training without Architectural Modification [0.0]
Z-Score Normalization for Gradient Descent (ZNorm) is a technique that adjusts only the gradients, without modifying the network architecture, to accelerate training and improve model performance. ZNorm normalizes the overall gradients, providing consistent gradient scaling across layers and effectively reducing the risks of vanishing and exploding gradients. In medical imaging applications, ZNorm significantly enhances tumor prediction and segmentation accuracy, underscoring its practical utility.
arXiv Detail & Related papers (2024-08-02T12:04:19Z)
- Adaptive Gradient Regularization: A Faster and Generalizable Optimization Technique for Deep Neural Networks [5.507301894089302]
This paper is the first attempt to study a new optimization technique for deep neural networks that uses the sum normalization of a gradient vector as coefficients.
The proposed technique is hence named adaptive gradient regularization (AGR).
arXiv Detail & Related papers (2024-07-24T02:23:18Z)
- Efficient Real-world Image Super-Resolution Via Adaptive Directional Gradient Convolution [80.85121353651554]
We introduce kernel-wise differential operations within the convolutional kernel and develop several learnable directional gradient convolutions.
These convolutions are integrated in parallel with a novel linear weighting mechanism to form an Adaptive Directional Gradient Convolution (DGConv).
We further devise an Adaptive Information Interaction Block (AIIBlock) to adeptly balance the enhancement of texture and contrast while meticulously investigating the interdependencies, culminating in the creation of a DGPNet for Real-SR through simple stacking.
arXiv Detail & Related papers (2024-05-11T14:21:40Z)
- Old can be Gold: Better Gradient Flow can Make Vanilla-GCNs Great Again [96.4999517230259]
We provide a new perspective on gradient flow to understand the substandard performance of deep GCNs.
We propose gradient-guided dynamic rewiring of vanilla GCNs with skip connections.
Our methods significantly boost their performance to comfortably compete with and outperform many state-of-the-art methods.
arXiv Detail & Related papers (2022-10-14T21:30:25Z)
- Orthogonal Graph Neural Networks [53.466187667936026]
Graph neural networks (GNNs) have received tremendous attention due to their superiority in learning node representations.
However, stacking more convolutional layers significantly decreases the performance of GNNs.
We propose a novel Ortho-GConv, which can generally augment existing GNN backbones to stabilize model training and improve the model's generalization performance.
arXiv Detail & Related papers (2021-09-23T12:39:01Z)
- Gradient Centralization: A New Optimization Technique for Deep Neural Networks [74.935141515523]
Gradient centralization (GC) operates directly on gradients by centralizing the gradient vectors to have zero mean.
GC can be viewed as a projected gradient descent method with a constrained loss function.
GC is very simple to implement and can be easily embedded into existing gradient-based DNNs with only one line of code (a minimal sketch of the operation follows after this list).
arXiv Detail & Related papers (2020-04-03T10:25:00Z)
- Dynamic Hierarchical Mimicking Towards Consistent Optimization Objectives [73.15276998621582]
We propose a generic feature learning mechanism to advance CNN training with enhanced generalization ability.
Partially inspired by DSN, we fork delicately designed side branches from the intermediate layers of a given neural network.
Experiments on both category and instance recognition tasks demonstrate the substantial improvements of our proposed method.
arXiv Detail & Related papers (2020-03-24T09:56:13Z)
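Several of the entries above operate directly on gradients. As a point of reference, below is a minimal sketch of gradient centralization as summarized for the Gradient Centralization entry: each multi-dimensional weight gradient is shifted to zero mean between the backward pass and the optimizer step. The helper name and the surrounding training snippet are illustrative assumptions, not the paper's reference implementation.

```python
# Minimal, hypothetical sketch of gradient centralization (GC).
import torch
import torch.nn as nn
import torch.nn.functional as F


def centralize_gradients(model: nn.Module) -> None:
    """Shift every weight gradient (>= 2 dims) to zero mean over its non-output dimensions."""
    for p in model.parameters():
        if p.grad is None or p.grad.dim() < 2:
            continue                                  # skip biases and other 1-D parameters
        g = p.grad
        p.grad = g - g.mean(dim=tuple(range(1, g.dim())), keepdim=True)


# Usage: the extra call goes between loss.backward() and optimizer.step().
model = nn.Linear(16, 4)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 16), torch.randint(0, 4, (8,))
loss = F.cross_entropy(model(x), y)
loss.backward()
centralize_gradients(model)
opt.step()
```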