MEGA: Second-Order Gradient Alignment for Catastrophic Forgetting Mitigation in GFSCIL
- URL: http://arxiv.org/abs/2504.13691v3
- Date: Wed, 20 Aug 2025 11:45:29 GMT
- Title: MEGA: Second-Order Gradient Alignment for Catastrophic Forgetting Mitigation in GFSCIL
- Authors: Jinhui Pang, Changqing Lin, Hao Lin, Zhihui Zhang, Weiping Ding, Yu Liu, Xiaoshuai Hao
- Abstract summary: Graph Few-Shot Class-Incremental Learning (GFSCIL) enables models to continually learn from limited samples of novel tasks after initial training on a large base dataset. Existing GFSCIL approaches typically utilize Prototypical Networks (PNs) for metric-based class representations and fine-tune the model during the incremental learning stage. We introduce Model-Agnostic Meta Graph Continual Learning (MEGA), aimed at effectively alleviating catastrophic forgetting for GFSCIL.
- Score: 9.557104125817668
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Graph Few-Shot Class-Incremental Learning (GFSCIL) enables models to continually learn from limited samples of novel tasks after initial training on a large base dataset. Existing GFSCIL approaches typically utilize Prototypical Networks (PNs) for metric-based class representations and fine-tune the model during the incremental learning stage. However, these PN-based methods oversimplify learning via novel query set fine-tuning and fail to integrate Graph Continual Learning (GCL) techniques due to architectural constraints. To address these challenges, we propose a more rigorous and practical setting for GFSCIL that excludes query sets during the incremental training phase. Building on this foundation, we introduce Model-Agnostic Meta Graph Continual Learning (MEGA), aimed at effectively alleviating catastrophic forgetting for GFSCIL. Specifically, by calculating the incremental second-order gradient during the meta-training stage, we enable the model to learn high-quality priors that enhance incremental learning by aligning its behavior across the meta-training and incremental learning stages. Extensive experiments on four mainstream graph datasets demonstrate that MEGA achieves state-of-the-art results and enhances the effectiveness of various GCL methods in GFSCIL. We believe that MEGA serves as a model-agnostic GFSCIL paradigm, paving the way for future research.
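The alignment idea above can be illustrated with a MAML-style sketch: adapt on a few-shot pseudo-incremental task with a differentiable inner update, then backpropagate an outer loss through that update so second-order gradients shape the learned prior. This is a minimal illustration with placeholder model, batches, and loss, not the authors' released implementation.

```python
# Minimal sketch of second-order meta-training for incremental alignment.
# Model, batches, loss, and inner_lr are placeholders, not the authors' code.
import torch
from torch import nn
from torch.func import functional_call


def meta_train_step(model: nn.Module, base_batch, pseudo_incremental_batch,
                    loss_fn, inner_lr: float = 0.01) -> float:
    """Adapt on a few-shot pseudo-incremental task with a differentiable inner
    update, then backpropagate the outer loss through that update."""
    params = dict(model.named_parameters())

    # Inner step on the pseudo-incremental support set.
    x_inc, y_inc = pseudo_incremental_batch
    inner_loss = loss_fn(functional_call(model, params, (x_inc,)), y_inc)
    grads = torch.autograd.grad(inner_loss, list(params.values()), create_graph=True)
    adapted = {n: p - inner_lr * g for (n, p), g in zip(params.items(), grads)}

    # Outer loss: the adapted model should still behave well on base data, so the
    # prior learned in meta-training aligns with the incremental-learning stage.
    x_base, y_base = base_batch
    outer_loss = loss_fn(functional_call(model, adapted, (x_base,)), y_base)
    outer_loss.backward()  # second-order: gradients flow through the inner update
    return outer_loss.item()
```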
Related papers
- Evolutionary Strategies lead to Catastrophic Forgetting in LLMs [51.91763220981834]
Evolutionary Strategies (ES) have recently re-emerged as a gradient-free alternative to traditional learning algorithms. ES is able to reach performance close to GRPO on math and reasoning tasks with a comparable compute budget. ES is accompanied by significant forgetting of prior abilities, limiting its applicability for training models online.
arXiv Detail & Related papers (2026-01-28T18:59:34Z) - GILT: An LLM-Free, Tuning-Free Graph Foundational Model for In-Context Learning [50.40400074353263]
Graph Neural Networks (GNNs) are powerful tools for processing relational data but often struggle to generalize to unseen graphs. We introduce the Graph In-context Learning Transformer (GILT), a framework built on an LLM-free and tuning-free architecture.
arXiv Detail & Related papers (2025-10-06T08:09:15Z) - Unbiased Online Curvature Approximation for Regularized Graph Continual Learning [9.70311578832594]
Graph continual learning (GCL) aims to learn from a continuous sequence of graph-based tasks. Regularization methods are vital for preventing catastrophic forgetting in GCL. We propose a new unbiased online curvature approximation of the full Fisher information matrix (FIM) based on the model's current learning state.
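Regularization-based GCL methods of this kind typically penalize parameter drift weighted by a Fisher estimate. The following sketch maintains a simple online diagonal Fisher from squared gradients; it illustrates the general idea only, not the paper's unbiased estimator, and the decay and penalty weight are assumed values.

```python
# Sketch of an online diagonal Fisher estimate used as a drift penalty.
# Decay and penalty strength are illustrative, not the paper's estimator.
import torch
from torch import nn


class OnlineFisherRegularizer:
    def __init__(self, model: nn.Module, decay: float = 0.95, strength: float = 100.0):
        self.decay, self.strength = decay, strength
        self.fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
        self.anchor = {n: p.detach().clone() for n, p in model.named_parameters()}

    def update(self, model: nn.Module, loss: torch.Tensor) -> None:
        """Refresh the running Fisher estimate from the current batch's gradients."""
        names, params = zip(*model.named_parameters())
        grads = torch.autograd.grad(loss, params, retain_graph=True, allow_unused=True)
        for name, g in zip(names, grads):
            if g is not None:
                self.fisher[name] = (self.decay * self.fisher[name]
                                     + (1 - self.decay) * g.detach() ** 2)

    def penalty(self, model: nn.Module) -> torch.Tensor:
        """Quadratic drift penalty away from the anchor weights, scaled by Fisher."""
        return self.strength * sum(
            (self.fisher[n] * (p - self.anchor[n]) ** 2).sum()
            for n, p in model.named_parameters())
```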
arXiv Detail & Related papers (2025-09-16T06:35:13Z) - A Graph Laplacian Eigenvector-based Pre-training Method for Graph Neural Networks [7.359145401513628]
Structure-based pre-training methods are under-explored yet crucial for downstream applications which rely on underlying graph structure. We propose the Laplacian Eigenvector Learning Module (LELM), a novel pre-training module for graph neural networks (GNNs) based on predicting the low-frequency eigenvectors of the graph Laplacian. LELM introduces a novel architecture that overcomes oversmoothing, allowing the GNN model to learn long-range interdependencies.
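A minimal way to set up such pre-training targets is to compute the low-frequency eigenvectors of the normalized graph Laplacian and regress a GNN's node outputs onto them. The sketch below shows only the target computation and an assumed MSE objective; it does not reproduce the LELM architecture.

```python
# Sketch: low-frequency eigenvectors of the normalized Laplacian as pre-training
# targets for a GNN. The regression objective is assumed; LELM itself is not shown.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh


def low_frequency_targets(adj: sp.csr_matrix, k: int = 8) -> np.ndarray:
    """Return the k eigenvectors with the smallest eigenvalues (defined up to sign)."""
    deg = np.asarray(adj.sum(axis=1)).ravel()
    d_inv_sqrt = sp.diags(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    laplacian = sp.eye(adj.shape[0]) - d_inv_sqrt @ adj @ d_inv_sqrt
    _, eigvecs = eigsh(laplacian, k=k, which="SM")  # smallest-magnitude eigenvalues
    return eigvecs  # shape [num_nodes, k]


# Assumed pre-training objective: regress GNN node outputs onto these targets,
# e.g. loss = ((gnn(features, adj) - torch.from_numpy(targets)) ** 2).mean()
```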
arXiv Detail & Related papers (2025-09-02T20:07:20Z) - Learning Causal Graphs at Scale: A Foundation Model Approach [28.966180222166766]
We propose Attention-DAG (ADAG), a novel attention-mechanism-based architecture for learning multiple linear Structural Equation Models (SEMs). ADAG learns the mapping from observed data to both graph structure and parameters via a nonlinear attention-based kernel. We evaluate our proposed approach on benchmark synthetic datasets and find that ADAG achieves substantial improvements in both DAG learning accuracy and zero-shot inference efficiency.
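How an attention-based mapping from observations to graph structure might look is sketched below; the layer sizes, pairwise MLP scorer, and input format are illustrative assumptions, not the ADAG architecture or its training objective.

```python
# Rough sketch of an attention-based mapping from observed samples to adjacency
# logits. Layer sizes, the pairwise MLP scorer, and input format are assumptions.
import torch
from torch import nn


class AttentionGraphLearner(nn.Module):
    def __init__(self, n_samples: int, hidden: int = 64, heads: int = 4):
        super().__init__()
        self.embed = nn.Linear(n_samples, hidden)   # one embedding per variable
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.edge_mlp = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, data: torch.Tensor) -> torch.Tensor:
        # data: [batch, n_samples, n_vars] observations from a linear SEM
        h = self.embed(data.transpose(1, 2))        # [batch, n_vars, hidden]
        h, _ = self.attn(h, h, h)
        n = h.size(1)
        src = h.unsqueeze(2).expand(-1, -1, n, -1)  # broadcast to all pairs
        dst = h.unsqueeze(1).expand(-1, n, -1, -1)
        pair = torch.cat([src, dst], dim=-1)        # [batch, n, n, 2*hidden]
        return self.edge_mlp(pair).squeeze(-1)      # adjacency logits [batch, n, n]
```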
arXiv Detail & Related papers (2025-06-23T04:41:02Z) - Can LLMs Alleviate Catastrophic Forgetting in Graph Continual Learning? A Systematic Study [35.60356938705585]
Real-world data, including graph-structured data, often arrives in a streaming manner, which means that learning systems need to continuously acquire new knowledge. We propose a simple-yet-effective method, Simple Graph Continual Learning (SimGCL), that surpasses the previous state-of-the-art GNN-based baseline by around 20%.
arXiv Detail & Related papers (2025-05-24T13:43:29Z) - Instance-Prototype Affinity Learning for Non-Exemplar Continual Graph Learning [7.821213342456415]
Graph Neural Networks suffer from catastrophic forgetting, undermining their capacity to preserve previously acquired knowledge. We propose Instance-Prototype Affinity Learning (IPAL), a novel paradigm for Non-Exemplar Continual Graph Learning (NECGL). We embed a Decision Boundary Perception mechanism within PCL, fostering greater inter-class discriminability.
arXiv Detail & Related papers (2025-05-15T07:35:27Z) - Transfer Learning with Foundational Models for Time Series Forecasting using Low-Rank Adaptations [0.0]
This study proposes LLIAM, a straightforward adaptation of one kind of foundation model, Large Language Models, to the time series forecasting task.
A comparison was made between the performance of LLIAM and different state-of-the-art DL algorithms, including Recurrent Neural Networks and Temporal Convolutional Networks, as well as an LLM-based method, TimeLLM.
The outcomes of this investigation demonstrate the efficacy of LLIAM, highlighting that this straightforward and general approach attains competitive results without requiring complex modifications.
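The low-rank adaptation ingredient can be sketched independently of any specific LLM. The wrapper below is a generic LoRA linear layer with assumed rank and scaling; LLIAM's backbone, prompting, and forecasting head are not shown.

```python
# Generic low-rank adaptation (LoRA) wrapper around a frozen linear layer.
# Rank and scaling are illustrative; LLIAM's backbone and prompting are omitted.
import torch
from torch import nn


class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():          # freeze the pretrained weights
            p.requires_grad = False
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus the trainable low-rank update (B @ A) x
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)
```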
arXiv Detail & Related papers (2024-10-15T12:14:01Z) - How to Make LLMs Strong Node Classifiers? [70.14063765424012]
Language Models (LMs) are challenging the dominance of domain-specific models, such as Graph Neural Networks (GNNs) and Graph Transformers (GTs). We propose a novel approach that empowers off-the-shelf LMs to achieve performance comparable to state-of-the-art (SOTA) GNNs on node classification tasks.
arXiv Detail & Related papers (2024-10-03T08:27:54Z) - Language Models as Zero-shot Lossless Gradient Compressors: Towards General Neural Parameter Prior Models [56.00251589760559]
Large language models (LLMs) can act as gradient priors in a zero-shot setting. We introduce LM-GC, a novel method that integrates LLMs with arithmetic coding. Experiments indicate that LM-GC surpasses existing state-of-the-art lossless compression methods.
arXiv Detail & Related papers (2024-09-26T13:38:33Z) - Train-Attention: Meta-Learning Where to Focus in Continual Knowledge Learning [15.475427498268393]
The Train-Attention-Augmented Language Model (TAALM) enhances learning efficiency by dynamically predicting and applying weights to tokens based on their usefulness. We show that TAALM achieves state-of-the-art performance over the baselines and is synergistically compatible with previous CKL approaches.
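The core mechanism, weighting the language-modeling loss per token by predicted usefulness, can be sketched as follows; the source of the weights and their normalization are placeholders rather than TAALM's actual training scheme.

```python
# Sketch: per-token weights applied to a language-modeling loss.
# The weights' source and normalization are placeholders, not TAALM's scheme.
import torch
import torch.nn.functional as F


def weighted_lm_loss(logits: torch.Tensor, targets: torch.Tensor,
                     token_weights: torch.Tensor) -> torch.Tensor:
    """logits: [batch, seq, vocab]; targets and token_weights: [batch, seq]."""
    per_token = F.cross_entropy(logits.transpose(1, 2), targets, reduction="none")
    weights = torch.softmax(token_weights, dim=-1) * targets.size(1)  # mean weight ~ 1
    return (weights * per_token).mean()
```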
arXiv Detail & Related papers (2024-07-24T01:04:34Z) - Benchmarking Sensitivity of Continual Graph Learning for Skeleton-Based Action Recognition [6.14431765787048]
Continual learning (CL) aims to build machine learning models that can accumulate knowledge continuously over different tasks without retraining from scratch.
Previous studies have shown that pre-training graph neural networks (GNN) may lead to negative transfer after fine-tuning.
We propose the first benchmark for the continual graph learning setting.
arXiv Detail & Related papers (2024-01-31T18:20:42Z) - Constructing Sample-to-Class Graph for Few-Shot Class-Incremental Learning [10.111587226277647]
Few-shot class-incremental learning (FSCIL) aims to build machine learning models that can continually learn new concepts from a few data samples.
In this paper, we propose a Sample-to-Class (S2C) graph learning method for FSCIL.
arXiv Detail & Related papers (2023-10-31T08:38:14Z) - SimTeG: A Frustratingly Simple Approach Improves Textual Graph Learning [131.04781590452308]
We present SimTeG, a frustratingly Simple approach for Textual Graph learning.
We first perform supervised parameter-efficient fine-tuning (PEFT) on a pre-trained LM on the downstream task.
We then generate node embeddings using the last hidden states of the fine-tuned LM.
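The two-stage recipe is simple enough to sketch: encode each node's text with the (fine-tuned) LM and pool the last hidden states into node features for a downstream GNN. The encoder name and mean pooling below are illustrative choices, not necessarily the paper's configuration.

```python
# Sketch of SimTeG-style node features: encode each node's text with a
# (fine-tuned) LM and mean-pool its last hidden states. The encoder name and
# pooling are illustrative, not necessarily the paper's configuration.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"  # placeholder encoder
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME).eval()


@torch.no_grad()
def node_embeddings(texts: list) -> torch.Tensor:
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state            # [n_nodes, seq, dim]
    mask = batch["attention_mask"].unsqueeze(-1).float()
    return (hidden * mask).sum(1) / mask.sum(1)             # pooled node features

# These embeddings then replace raw node features in any GNN node classifier.
```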
arXiv Detail & Related papers (2023-08-03T07:00:04Z) - Universal Domain Adaptation from Foundation Models: A Baseline Study [58.51162198585434]
We conduct empirical studies of state-of-the-art UniDA methods using foundation models.
We introduce CLIP distillation, a parameter-free method specifically designed to distill target knowledge from CLIP models.
Although simple, our method outperforms previous approaches in most benchmark tasks.
arXiv Detail & Related papers (2023-05-18T16:28:29Z) - Model-Agnostic Multitask Fine-tuning for Few-shot Vision-Language Transfer Learning [59.38343286807997]
We propose Model-Agnostic Multitask Fine-tuning (MAMF) for vision-language models on unseen tasks.
Compared with model-agnostic meta-learning (MAML), MAMF discards the bi-level optimization and uses only first-order gradients.
We show that MAMF consistently outperforms the classical fine-tuning method for few-shot transfer learning on five benchmark datasets.
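The contrast with MAML is easiest to see in code: no differentiation through an inner adaptation loop, only first-order gradients accumulated over sampled tasks. The task batch, model, and loss below are placeholders, not the authors' implementation.

```python
# Sketch of first-order multitask fine-tuning: accumulate plain gradients over
# sampled tasks instead of differentiating through an inner adaptation loop.
# The task batch, model, and loss are placeholders, not the MAMF implementation.
import torch
from torch import nn


def multitask_finetune_step(model: nn.Module, tasks, loss_fn, optimizer) -> None:
    optimizer.zero_grad()
    for inputs, labels in tasks:         # a small batch of sampled few-shot tasks
        loss = loss_fn(model(inputs), labels)
        loss.backward()                  # first-order only: no create_graph
    for p in model.parameters():         # average the accumulated gradients
        if p.grad is not None:
            p.grad /= len(tasks)
    optimizer.step()
```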
arXiv Detail & Related papers (2022-03-09T17:26:53Z) - Regularizing Generative Adversarial Networks under Limited Data [88.57330330305535]
This work proposes a regularization approach for training robust GAN models on limited data.
We show a connection between the regularized loss and an f-divergence called LeCam-divergence, which we find is more robust under limited training data.
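The regularizer can be sketched compactly: the discriminator's outputs are pulled toward exponential moving averages of its outputs on the opposite sample type. The decay and weight below are illustrative, and the exact form should be taken from the paper.

```python
# Sketch of a LeCam-style regularizer: pull D's outputs toward exponential moving
# averages of its outputs on the opposite sample type. Decay and weight are
# illustrative; the exact form should be taken from the paper.
import torch


class LeCamRegularizer:
    def __init__(self, decay: float = 0.99, weight: float = 0.3):
        self.decay, self.weight = decay, weight
        self.ema_real, self.ema_fake = 0.0, 0.0

    def __call__(self, d_real: torch.Tensor, d_fake: torch.Tensor) -> torch.Tensor:
        # Track EMA anchors of the discriminator's mean outputs.
        self.ema_real = self.decay * self.ema_real + (1 - self.decay) * d_real.mean().item()
        self.ema_fake = self.decay * self.ema_fake + (1 - self.decay) * d_fake.mean().item()
        # Penalize outputs that stray past the opposite anchor.
        reg = (torch.relu(d_real - self.ema_fake) ** 2).mean() + \
              (torch.relu(self.ema_real - d_fake) ** 2).mean()
        return self.weight * reg
```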
arXiv Detail & Related papers (2021-04-07T17:59:06Z) - B-SMALL: A Bayesian Neural Network approach to Sparse Model-Agnostic Meta-Learning [2.9189409618561966]
We propose a Bayesian neural network based MAML algorithm, which we refer to as the B-SMALL algorithm.
We demonstrate the performance of B-SMALL using classification and regression tasks, and highlight that training a sparsifying BNN using MAML indeed improves the parameter footprint of the model.
arXiv Detail & Related papers (2021-01-01T09:19:48Z)