FuguReport

Large-scale semi-supervised learning with online spectral graph sparsification

Authors Daniele Calandriello, Alessandro Lazaric, Michal Valko
Affiliations INRIA
Categories Method / Semi-Supervised Learning / Scalable SSL algorithm design, Method / Graph Algorithms / Spectral sparsification techniques, Evaluation / Algorithm Efficiency / Time and space complexity analysis
License CC BY 4.0

Abstract Overview

The paper introduces Sparse-HFS, a scalable graph-based semi-supervised learning algorithm that operates under stringent memory and computation constraints where storing the full similarity graph is infeasible. It combines online spectral graph sparsification (Kelner and Levin, 2013) with SDD matrix solvers to incrementally maintain a spectral sparsifier from an insertion-only edge stream, achieving O(n polylog(n)) space and O(m polylog(n)) time complexity. The paper provides a theoretical generalization bound via algorithmic stability analysis that quantifies how sparsification accuracy ε affects learning performance. Experiments on a synthetic dataset demonstrate that the approximate method achieves accuracy close to exact stable-HFS while using substantially fewer edges.

Novelty

The main novelty is the integration of online spectral graph sparsification into graph-based semi-supervised learning, yielding a semi-streaming variant of stable-HFS that never requires storing the full graph in memory. The work is also distinctive in providing a stability-based generalization analysis that formally quantifies the effect of the sparsification approximation on learning performance, showing the convergence rate is preserved for fixed ε.

Results

Theorem 1 proves that for fixed sparsification accuracy ε, Sparse-HFS retains the same asymptotic order of generalization-error convergence as exact stable-HFS, with an additional approximation term scaling as ε²/l² that is dominated by other bound terms. In experiments on a synthetic dataset with 12,100 points and dense k-NN graphs (ε=0.8, γ=1), at k=4500 where accuracy is near optimal, the sparsifier retains only about 10% of the original graph's edges while achieving similar accuracy to stable-HFS.

Key Points

  1. Sparse-HFS maintains an online (1±ε) spectral sparsifier so that graph-based SSL can be performed in O(n polylog(n)) space and O(m polylog(n)) time, processing edges from an insertion-only stream without ever storing the full graph.
  2. Theoretical analysis via algorithmic stability shows that sparsification adds a controlled approximation term (scaling as ε²/l²) to the generalization bound, while preserving the same asymptotic convergence order as exact stable-HFS for fixed ε.
  3. On a synthetic benchmark with 12,100 points, Sparse-HFS closely tracks stable-HFS performance near the best operating region (k≈4500) while retaining only about 10% of the original graph's edges.

References

This page was created using generative AI such as GPT-5, Claude Opus 4, Gemini 3, Gemini 3.1 Flash Image, and their higher-end successor versions. No guarantee can be made regarding its contents.