FGGM: Fisher-Guided Gradient Masking for Continual Learning
- URL: http://arxiv.org/abs/2601.18261v1
- Date: Mon, 26 Jan 2026 08:35:34 GMT
- Title: FGGM: Fisher-Guided Gradient Masking for Continual Learning
- Authors: Chao-Hong Tan, Qian Chen, Wen Wang, Yukun Ma, Chong Zhang, Chong Deng, Qinglin Zhang, Xiangang Li, Jieping Ye
- Abstract summary: Catastrophic forgetting impairs the continual learning of large language models. We propose Fisher-Guided Gradient Masking (FGGM), a framework that mitigates this by strategically selecting parameters for updates using diagonal Fisher Information.
- Score: 57.56585138260662
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Catastrophic forgetting impairs the continual learning of large language models. We propose Fisher-Guided Gradient Masking (FGGM), a framework that mitigates this by strategically selecting parameters for updates using diagonal Fisher Information. FGGM dynamically generates binary masks with adaptive thresholds, preserving critical parameters to balance stability and plasticity without requiring historical data. Unlike magnitude-based methods such as MIGU, our approach offers a mathematically principled estimate of parameter importance. On the TRACE benchmark, FGGM shows a 9.6% relative improvement over supervised fine-tuning (SFT) in retaining general capabilities and a 4.4% improvement over MIGU on TRACE tasks. Additional analysis on code generation tasks confirms FGGM's superior performance and reduced forgetting, establishing it as an effective solution.
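Below is a minimal PyTorch sketch of the mechanism the abstract describes: approximate the diagonal Fisher Information by squared gradients, derive binary masks from an adaptive threshold, and let only low-importance parameters receive updates. The quantile-based threshold, function names, and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import torch

def fisher_diagonal(model, data_loader, loss_fn, n_batches=10):
    """Approximate the diagonal Fisher Information as E[grad^2]
    over data representing previously acquired capabilities."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    seen = 0
    for x, y in data_loader:
        if seen >= n_batches:
            break
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
        seen += 1
    return {n: f / max(seen, 1) for n, f in fisher.items()}

def fisher_masks(fisher, tau=0.7):
    """Binary masks via an adaptive per-tensor threshold (here the
    tau-quantile, an assumption). 1 = low importance, free to update;
    0 = high importance, protected against forgetting."""
    return {n: (f <= torch.quantile(f.flatten().float(), tau)).to(f.dtype)
            for n, f in fisher.items()}

def masked_step(model, masks, optimizer):
    """Zero the gradients of protected parameters, then take the step."""
    for n, p in model.named_parameters():
        if p.grad is not None and n in masks:
            p.grad.mul_(masks[n])
    optimizer.step()
```

In this reading, stability comes from freezing high-Fisher coordinates and plasticity from leaving the rest trainable; no earlier-task data needs to be replayed, matching the abstract's "without requiring historical data" claim.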
Related papers
- Robust Unscented Kalman Filtering via Recurrent Meta-Adaptation of Sigma-Point Weights [0.0]
This work introduces the Meta-Adaptive UKF (MA-UKF), a framework that reformulates sigma-point weighting as a hyperparameter optimization problem. Unlike standard adaptive filters that rely on instantaneous corrections, our approach employs a Recurrent Context to compress the history of measurement innovations into a compact latent embedding. Numerical benchmarks on maneuvering targets demonstrate that the MA-UKF significantly outperforms standard baselines.
arXiv Detail & Related papers (2026-03-04T18:27:59Z)
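A heavily simplified sketch of the stated idea: a recurrent network compresses the innovation history into a latent context that emits sigma-point weights for the unscented transform. The GRU, the softmax weight parameterization, and all dimensions are assumptions for illustration.

```python
import torch
import torch.nn as nn

class WeightGenerator(nn.Module):
    """Recurrent context: innovation history -> sigma-point weights."""
    def __init__(self, n_sigma, hidden=16):
        super().__init__()
        self.gru = nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_sigma)

    def forward(self, innovations):                     # (batch, T, 1)
        _, h = self.gru(innovations)
        return torch.softmax(self.head(h[-1]), dim=-1)  # weights sum to 1

def unscented_transform(mu, cov, f, weights):
    """Propagate 2n+1 sigma points through f; recombine with learned weights."""
    n = mu.shape[0]
    L = torch.linalg.cholesky((n + 0.5) * cov)  # spread factor 0.5 is assumed
    pts = [mu] + [mu + L[:, i] for i in range(n)] + [mu - L[:, i] for i in range(n)]
    ys = torch.stack([f(p) for p in pts])
    return (weights.unsqueeze(-1) * ys).sum(dim=0)

# Usage: weights inferred from the last 20 scalar innovations.
gen = WeightGenerator(n_sigma=5)            # 2n+1 = 5 for a 2-D state
w = gen(torch.randn(1, 20, 1))[0]
mu_pred = unscented_transform(torch.zeros(2), torch.eye(2), torch.sin, w)
```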
- GaLLoP: Gradient-based Sparse Learning on Low-Magnitude Parameters [20.34415141254838]
We introduce a novel sparse fine-tuning technique named GaLLoP: Gradient-based Sparse Learning on Low-Magnitude Parameters.
arXiv Detail & Related papers (2025-10-22T17:11:49Z)
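The title suggests updating parameters that are both low in magnitude and gradient-relevant; the sketch below encodes one plausible version of that rule. The quantile cutoffs and the plain SGD step are assumptions, not the paper's procedure.

```python
import torch

def gallop_mask(param, grad, weight_q=0.5, grad_q=0.9):
    """Select parameters with small pre-trained magnitude AND large gradient."""
    low_mag = param.abs() <= torch.quantile(param.abs().flatten().float(), weight_q)
    big_grad = grad.abs() >= torch.quantile(grad.abs().flatten().float(), grad_q)
    return (low_mag & big_grad).to(param.dtype)

def sparse_sgd_step(model, lr=1e-4):
    """Masked SGD step: everything outside the mask stays frozen."""
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is not None:
                p -= lr * gallop_mask(p, p.grad) * p.grad
```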
- When Punctuation Matters: A Large-Scale Comparison of Prompt Robustness Methods for LLMs [55.20230501807337]
We present the first systematic evaluation of 5 methods for improving prompt robustness within a unified experimental framework. We benchmark these techniques on 8 models from the Llama, Qwen, and Gemma families across 52 tasks from the Natural Instructions dataset.
arXiv Detail & Related papers (2025-08-15T10:32:50Z)
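This entry describes an evaluation rather than an algorithm; the sketch below shows the kind of probe such a study needs: surface-level formatting variants of a prompt and a consistency score across them. The specific perturbations and the `ask_model` callable are hypothetical stand-ins.

```python
def variants(prompt: str) -> list[str]:
    """Punctuation/casing/whitespace perturbations that leave meaning intact."""
    base = prompt.rstrip(". \n")
    return [prompt, base + ".", base + "!", base + "\n",
            prompt.replace(", ", " , "), prompt.upper()]

def robustness(prompt: str, ask_model) -> float:
    """Fraction of formatting variants whose answer matches the original's."""
    answers = [ask_model(v) for v in variants(prompt)]
    return sum(a == answers[0] for a in answers) / len(answers)
```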
- LLM Unlearning using Gradient Ratio-Based Influence Estimation and Noise Injection [0.0]
Existing empirical methods often yield incomplete forgetting or unintended degradation of unrelated knowledge due to poor localization. GRIN introduces a novel gradient-ratio-based metric to identify parameters most responsible for memorizing forget data. We then perform selective noise injection into these parameters prior to fine-tuning, which improves unlearning performance while maintaining model utility.
arXiv Detail & Related papers (2025-08-08T17:15:32Z)
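A hedged sketch of the stated recipe: rank parameters by the ratio of gradient magnitude on the forget set to that on the retain set, then inject Gaussian noise into the top-ranked coordinates before fine-tuning. The quantile cutoff and noise scale are illustrative assumptions.

```python
import torch

def grad_magnitudes(model, batch, loss_fn):
    model.zero_grad()
    x, y = batch
    loss_fn(model(x), y).backward()
    return {n: p.grad.detach().abs() for n, p in model.named_parameters()
            if p.grad is not None}

def grin_noise_injection(model, forget_batch, retain_batch, loss_fn,
                         top_q=0.99, sigma=0.01, eps=1e-8):
    g_forget = grad_magnitudes(model, forget_batch, loss_fn)
    g_retain = grad_magnitudes(model, retain_batch, loss_fn)
    with torch.no_grad():
        for n, p in model.named_parameters():
            if n not in g_forget:
                continue
            ratio = g_forget[n] / (g_retain[n] + eps)  # high => forget-specific
            cut = torch.quantile(ratio.flatten().float(), top_q)
            p += (ratio >= cut).to(p.dtype) * sigma * torch.randn_like(p)
```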
- An Efficient Machine Learning Framework for Forest Height Estimation from Multi-Polarimetric Multi-Baseline SAR data [2.395410408500006]
This paper introduces FGump, a forest height estimation framework based on gradient boosting that uses multi-channel SAR processing with LiDAR profiles as ground truth (GT). It ensures a strong balance between accuracy and computational efficiency, using a limited set of hand-designed features and avoiding heavy preprocessing (e.g., calibration and/or quantization). Experimental results confirm that FGump outperforms state-of-the-art (SOTA) AI-based and classical methods, achieving higher accuracy and significantly lower training and inference times.
arXiv Detail & Related papers (2025-07-28T13:07:23Z)
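A minimal sketch of the described pipeline shape: gradient boosting regressed from a small set of hand-designed per-pixel SAR features onto LiDAR heights. The synthetic features and every hyperparameter here are placeholders, not the paper's configuration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 6))          # stand-ins for per-channel SAR features
y = 20 + 5 * X[:, 0] - 3 * X[:, 1] + rng.normal(scale=1.0, size=n)  # LiDAR GT (m)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(n_estimators=200, max_depth=3)
model.fit(X_tr, y_tr)
print("test RMSE (m):", np.sqrt(np.mean((model.predict(X_te) - y_te) ** 2)))
```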
- FisherTune: Fisher-Guided Robust Tuning of Vision Foundation Models for Domain Generalized Segmentation [65.93276461982093]
Existing approaches either selectively fine-tune parameters or freeze the VFMs and update only the adapters. We propose FisherTune, a robust fine-tuning method guided by the Domain-Related Fisher Information Matrix (DR-FIM). DR-FIM measures parameter sensitivity across tasks and domains, enabling selective updates that preserve generalization and enhance DGSS adaptability.
arXiv Detail & Related papers (2025-03-23T04:47:15Z)
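A coarse sketch of the selective-update idea; here the "domain-related" signal is approximated per tensor as the gap between diagonal Fisher estimates on source and domain-shifted data, which is an assumption on my part; the paper's DR-FIM estimator is more involved.

```python
import torch

def diag_fisher(model, batches, loss_fn):
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for x, y in batches:                 # batches: list of (inputs, labels)
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2 / len(batches)
    return fisher

def select_domain_sensitive(model, src_batches, shifted_batches, loss_fn,
                            keep_frac=0.2):
    """Unfreeze only the tensors whose Fisher values move most under shift."""
    f_src = diag_fisher(model, src_batches, loss_fn)
    f_sft = diag_fisher(model, shifted_batches, loss_fn)
    gap = {n: (f_sft[n] - f_src[n]).abs().mean().item() for n in f_src}
    k = max(1, int(len(gap) * keep_frac))
    cutoff = sorted(gap.values(), reverse=True)[k - 1]
    for n, p in model.named_parameters():
        p.requires_grad_(gap[n] >= cutoff)
```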
- UncertaintyRAG: Span-Level Uncertainty Enhanced Long-Context Modeling for Retrieval-Augmented Generation [93.38604803625294]
We present UncertaintyRAG, a novel approach for long-context Retrieval-Augmented Generation (RAG). We use Signal-to-Noise Ratio (SNR)-based span uncertainty to estimate similarity between text chunks. UncertaintyRAG outperforms baselines by 2.03% on LLaMA-2-7B, achieving state-of-the-art results.
arXiv Detail & Related papers (2024-10-03T17:39:38Z)
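The summary leaves the estimator underspecified; one plausible reading, sketched below, treats the mean-to-deviation ratio of a chunk's token log-probabilities as an SNR-style confidence score and blends it into retrieval ranking. Every formula and parameter here is a guess for illustration, not the paper's method.

```python
import numpy as np

def span_snr(token_logprobs: np.ndarray) -> float:
    """SNR-style score: how stable a span's token log-probabilities are."""
    return float(np.abs(token_logprobs.mean()) / (token_logprobs.std() + 1e-8))

def rerank(query_emb, chunk_embs, chunk_logprobs, alpha=0.5):
    """Blend cosine similarity with normalized chunk-level SNR."""
    sims = chunk_embs @ query_emb / (
        np.linalg.norm(chunk_embs, axis=1) * np.linalg.norm(query_emb) + 1e-8)
    snr = np.array([span_snr(lp) for lp in chunk_logprobs])
    snr = snr / (snr.max() + 1e-8)
    return np.argsort(-(alpha * sims + (1 - alpha) * snr))
```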
- Stay on topic with Classifier-Free Guidance [57.28934343207042]
We show that CFG can be used broadly as an inference-time technique in pure language modeling. We show that CFG improves the performance of Pythia, GPT-2 and LLaMA-family models across an array of tasks.
arXiv Detail & Related papers (2023-06-30T17:07:02Z)
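The core update is the standard guidance rule, logits_guided = logits_uncond + gamma * (logits_cond - logits_uncond), applied at inference time. The sketch below runs one guided next-token step on GPT-2; the model choice, the stripped "unconditional" prompt, and gamma are illustrative, not the paper's exact recipe.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

prompt = "Q: What is the capital of France?\nA:"
uncond = "A:"          # same prompt with the conditioning context removed
gamma = 1.5            # gamma > 1 sharpens adherence to the context

with torch.no_grad():
    cond_logits = model(**tok(prompt, return_tensors="pt")).logits[0, -1]
    uncond_logits = model(**tok(uncond, return_tensors="pt")).logits[0, -1]

guided = uncond_logits + gamma * (cond_logits - uncond_logits)
print(tok.decode([int(guided.argmax())]))
```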
- Cauchy-Schwarz Regularized Autoencoder [68.80569889599434]
Variational autoencoders (VAE) are a powerful and widely-used class of generative models. We introduce a new constrained objective based on the Cauchy-Schwarz divergence, which can be computed analytically for GMMs. Our objective improves upon variational auto-encoding models in density estimation, unsupervised clustering, semi-supervised learning, and face analysis.
arXiv Detail & Related papers (2021-01-06T17:36:26Z)
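The claim that the Cauchy-Schwarz divergence is analytic for GMMs rests on the identity that the integral of a product of two Gaussians is itself a Gaussian density evaluated at one mean. The sketch below computes D_CS(p, q) = -log(<p,q> / sqrt(<p,p><q,q>)) for 1-D mixtures; the specific mixtures are arbitrary examples, not from the paper.

```python
import numpy as np
from scipy.stats import norm

def gmm_inner(w1, mu1, s1, w2, mu2, s2):
    """<p, q> = integral p(x) q(x) dx; analytic because
    integral N(x; m1, s1^2) N(x; m2, s2^2) dx = N(m1; m2, s1^2 + s2^2)."""
    total = 0.0
    for wi, mi, si in zip(w1, mu1, s1):
        for wj, mj, sj in zip(w2, mu2, s2):
            total += wi * wj * norm.pdf(mi, loc=mj, scale=np.sqrt(si**2 + sj**2))
    return total

def cs_divergence(p, q):
    """D_CS >= 0, with equality iff the two mixtures coincide."""
    return -np.log(gmm_inner(*p, *q) /
                   np.sqrt(gmm_inner(*p, *p) * gmm_inner(*q, *q)))

p = ([0.5, 0.5], [-1.0, 1.0], [0.5, 0.5])   # weights, means, std-devs
q = ([1.0], [0.0], [1.0])
print(cs_divergence(p, q))
```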