Fugu-MT Paper Translation (Abstract): GraphSculptor: Sculpting Pre-training Coreset for Graph Self-supervised Learning

Paper summary: GraphSculptor: Sculpting Pre-training Coreset for Graph Self-supervised Learning

arxiv url: http://arxiv.org/abs/2605.01310v1
Date: Sat, 02 May 2026 07:54:52 GMT
Status: Translation complete
Last updated in system: 2026-05-05 20:33:49.698534
Title: GraphSculptor: Sculpting Pre-training Coreset for Graph Self-supervised Learning
Title (reference translation): GraphSculptor: Extracting a Pre-training Coreset for Graph Self-supervised Learning
Authors: Chuang Liu, Zelin Yao, Xueqi Ma, Luzhi Wang, Mukun Chen, Pinghua Xu, Wenbin Hu
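Abstract summary: Graph self-supervised learning typically relies on large-scale unlabeled datasets. We introduce GraphSculptor for pre-training coreset construction. A 10% coreset achieves 99.6% of full-data performance while reducing pre-training time by nearly 90%.
Reference score (independently computed attention score): 8.07575845153502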
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Graph self-supervised learning typically relies on large-scale unlabeled datasets, heavily inflating computational costs. However, empirical evidence suggests that these datasets contain substantial redundancy: our analysis reveals that uniformly subsampling 50% of graphs retains over 96% of downstream performance. To exploit this redundancy, we introduce GraphSculptor for pre-training coreset construction. Unlike methods dependent on additional training-time signals or limited solely to topological statistics, GraphSculptor provides a label-free solution that constructs coresets via two complementary perspectives: intrinsic structure and contextual semantics. Concretely, structural diversity is quantified using intrinsic graph statistics, yielding a structural feature vector for each graph, while semantic diversity is captured by utilizing a pre-trained language model to encode descriptions generated via graph-to-text. GraphSculptor integrates these signals into a unified metric space and performs cluster-aware selection to preserve joint structural-semantic diversity. We further derive a theoretical bound on the loss gap between coreset and full-data pre-training, offering theoretical motivation for our selection formulation. Extensive experiments demonstrate that GraphSculptor effectively sculpts the dataset: a 10% coreset achieves 99.6% of full-data performance while reducing pre-training time by nearly 90%, offering a scalable solution for data-efficient graph pre-training.
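Abstract (reference translation): Graph self-supervised learning typically relies on large-scale unlabeled datasets, which inflates computational costs. However, empirical evidence suggests that these datasets contain substantial redundancy: the analysis shows that uniformly subsampling 50% of the graphs retains over 96% of downstream performance. To exploit this redundancy, we introduce GraphSculptor for pre-training coreset construction. Unlike methods that depend on additional training-time signals or that are limited to topological statistics, GraphSculptor provides a label-free solution that constructs coresets from two complementary perspectives: intrinsic structure and contextual semantics. Specifically, structural diversity is quantified using intrinsic graph statistics, producing a structural feature vector for each graph, while semantic diversity is captured by using a pre-trained language model to encode descriptions generated via graph-to-text. GraphSculptor integrates these signals into a unified metric space and performs cluster-aware selection to preserve joint structural-semantic diversity. We further derive a theoretical bound on the loss gap between coreset and full-data pre-training, which gives theoretical motivation for the selection formulation. A 10% coreset achieves 99.6% of full-data performance while reducing pre-training time by nearly 90%, offering a scalable solution for data-efficient graph pre-training.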
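
The selection pipeline described in the abstract (a structural feature vector per graph, a semantic embedding per graph, a joint metric space, then cluster-aware selection) can be illustrated with a minimal sketch. Everything specific here is an assumption and not the authors' implementation: the four statistics, the z-score concatenation, the plain k-means, and the nearest-to-centroid rule are illustrative choices, all function names are hypothetical, and `semantic_vecs` is a placeholder for the language-model embeddings of graph-to-text descriptions.

```python
import math
import random

def structural_features(num_nodes, edges):
    """Assumed statistics: node count, edge count, mean degree, density."""
    m = len(edges)
    mean_deg = 2.0 * m / num_nodes if num_nodes else 0.0
    max_edges = num_nodes * (num_nodes - 1) / 2.0
    density = m / max_edges if max_edges else 0.0
    return [float(num_nodes), float(m), mean_deg, density]

def standardize(rows):
    """Z-score each column so both feature blocks share a comparable scale."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - mu) ** 2 for x in c) / len(c)) or 1.0
            for c, mu in zip(cols, means)]
    return [[(x - mu) / sd for x, mu, sd in zip(r, means, stds)] for r in rows]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means with randomly sampled initial centers."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centers

def select_coreset(graphs, semantic_vecs, ratio=0.1, k=3, seed=0):
    """Return indices of graphs chosen from every cluster of the joint space."""
    struct = standardize([structural_features(n, e) for n, e in graphs])
    sem = standardize(semantic_vecs)
    joint = [s + t for s, t in zip(struct, sem)]  # concatenated feature vector
    assign, centers = kmeans(joint, k, seed=seed)
    budget = max(1, round(ratio * len(graphs)))
    chosen = []
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        if not members:
            continue
        # Quota proportional to cluster size; every non-empty cluster keeps
        # at least one graph, so the result can slightly exceed the budget.
        quota = max(1, round(budget * len(members) / len(graphs)))
        members.sort(key=lambda i: sq_dist(joint[i], centers[c]))
        chosen.extend(members[:quota])
    return sorted(chosen)

# Toy data: cycles and complete graphs of growing size, random "semantics".
rng = random.Random(0)
graphs = []
for n in range(4, 24):
    cycle = [(i, (i + 1) % n) for i in range(n)]
    complete = [(i, j) for i in range(n) for j in range(i + 1, n)]
    graphs.append((n, cycle if n % 2 else complete))
semantic_vecs = [[rng.random(), rng.random()] for _ in graphs]
coreset = select_coreset(graphs, semantic_vecs, ratio=0.1, k=3)
```

In this sketch, giving each non-empty cluster at least one slot is what makes the selection cluster-aware: small clusters cannot be dropped entirely, which uniform subsampling does not guarantee.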

