Proxy-FDA: Proxy-based Feature Distribution Alignment for Fine-tuning Vision Foundation Models without Forgetting
- URL: http://arxiv.org/abs/2505.24088v1
- Date: Fri, 30 May 2025 00:16:53 GMT
- Title: Proxy-FDA: Proxy-based Feature Distribution Alignment for Fine-tuning Vision Foundation Models without Forgetting
- Authors: Chen Huang, Skyler Seto, Hadi Pouransari, Mehrdad Farajtabar, Raviteja Vemulapalli, Fartash Faghri, Oncel Tuzel, Barry-John Theobald, Josh Susskind
- Abstract summary: Vision foundation models pre-trained on massive data encode rich representations of real-world concepts. Recent methods of robust fine-tuning aim to mitigate forgetting of prior knowledge without affecting the fine-tuning performance. We propose a novel regularization method, Proxy-FDA, that explicitly preserves the structural knowledge in feature space.
- Score: 24.10386630735279
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision foundation models pre-trained on massive data encode rich representations of real-world concepts, which can be adapted to downstream tasks by fine-tuning. However, fine-tuning foundation models on one task often leads to the issue of concept forgetting on other tasks. Recent methods of robust fine-tuning aim to mitigate forgetting of prior knowledge without affecting the fine-tuning performance. Knowledge is often preserved by matching the original and fine-tuned model weights or feature pairs. However, such point-wise matching can be too strong, without explicit awareness of the feature neighborhood structures that encode rich knowledge as well. We propose a novel regularization method, Proxy-FDA, that explicitly preserves the structural knowledge in feature space. Proxy-FDA performs Feature Distribution Alignment (using nearest neighbor graphs) between the pre-trained and fine-tuned feature spaces, and the alignment is further improved by informative proxies that are generated dynamically to increase data diversity. Experiments show that Proxy-FDA significantly reduces concept forgetting during fine-tuning, and we find that forgetting correlates much more strongly with a distributional distance metric than with point-wise L2 distance. We further demonstrate Proxy-FDA's benefits in various fine-tuning settings (end-to-end, few-shot and continual tuning) and across different tasks like image classification, captioning and VQA.
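To make the alignment idea concrete, below is a minimal PyTorch sketch of what a kNN-graph feature distribution alignment loss could look like. The function names, the temperature `tau`, and the use of a KL divergence over shared neighbor edges are illustrative assumptions; the paper's exact objective and its dynamic proxy generation are not reproduced here.

```python
import torch
import torch.nn.functional as F

def knn_logits(feats, idx=None, k=10, tau=0.1):
    """Cosine-similarity logits over each sample's k nearest neighbors."""
    feats = F.normalize(feats, dim=1)
    sim = feats @ feats.t()
    sim.fill_diagonal_(float('-inf'))          # drop trivial self-matches
    if idx is None:
        _, idx = sim.topk(k, dim=1)            # edges of the kNN graph
    return sim.gather(1, idx) / tau, idx

def fda_loss(pre_feats, ft_feats, k=10, tau=0.1):
    """Match the fine-tuned neighborhood distribution to the pre-trained one."""
    with torch.no_grad():                      # pre-trained features are fixed
        p_logits, idx = knn_logits(pre_feats, k=k, tau=tau)
        p = F.softmax(p_logits, dim=1)
    # Score the same neighbor pairs in the fine-tuned space, then compare.
    q_logits, _ = knn_logits(ft_feats, idx=idx, k=k, tau=tau)
    return F.kl_div(F.log_softmax(q_logits, dim=1), p, reduction='batchmean')
```

In use, such a term would simply be added to the task loss during fine-tuning, e.g. `loss = task_loss + lam * fda_loss(pre_feats, ft_feats)`, with `pre_feats` coming from a frozen copy of the pre-trained model.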
Related papers
- Enhancing Training Data Attribution with Representational Optimization [57.61977909113113]
Training data attribution (TDA) methods aim to measure how training data impacts a model's predictions. We propose AirRep, a representation-based approach that closes this gap by learning task-specific and model-aligned representations explicitly for TDA. AirRep introduces two key innovations: a trainable encoder tuned for attribution quality, and an attention-based pooling mechanism that enables accurate estimation of group-wise influence.
arXiv Detail & Related papers (2025-05-24T05:17:53Z)
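As a rough illustration of the pooling ingredient above, attention-based pooling over per-example embeddings could be sketched as follows; the module and its sizes are hypothetical, and AirRep's actual encoder and pooling design may differ.

```python
import torch
import torch.nn as nn

class AttentionPool(nn.Module):
    """Pool a group of example embeddings into one vector via learned attention."""
    def __init__(self, dim):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)   # trainable per-example relevance score

    def forward(self, embs):              # embs: (group_size, dim)
        w = torch.softmax(self.scorer(embs), dim=0)
        return (w * embs).sum(dim=0)      # weighted group-level embedding
```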
- Chasing Fairness in Graphs: A GNN Architecture Perspective [73.43111851492593]
We propose Fair Message Passing (FMP), designed within a unified optimization framework for graph neural networks (GNNs).
In FMP, aggregation is first applied to utilize neighbors' information, and then the bias mitigation step explicitly pushes demographic group node representation centers together.
Experiments on node classification tasks demonstrate that the proposed FMP outperforms several baselines in terms of fairness and accuracy on three real-world datasets.
arXiv Detail & Related papers (2023-12-19T18:00:15Z)
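The bias-mitigation step above can be pictured as a penalty that pulls demographic-group representation centers together. The snippet below is only a toy version of that idea, not FMP's actual fair message-passing operator.

```python
import torch

def group_center_gap(h, group):
    """Spread of demographic-group centers; zero when all centers coincide.

    Assumes at least two groups are present in the batch.
    """
    centers = torch.stack([h[group == g].mean(dim=0) for g in group.unique()])
    return centers.var(dim=0).sum()
```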
- Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many predictive signals in the data can rather stem from biases in data acquisition.
We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training.
We propose an autoencoder-based training scheme to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv Detail & Related papers (2023-03-24T16:03:21Z)
- A Non-isotropic Probabilistic Take on Proxy-based Deep Metric Learning [49.999268109518255]
Proxy-based Deep Metric Learning (DML) learns by embedding images close to their class representatives (proxies).
In addition, proxy-based DML struggles to learn class-internal structures.
We introduce non-isotropic probabilistic proxy-based DML to address both issues.
arXiv Detail & Related papers (2022-07-08T09:34:57Z)
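For context, the standard point-estimate proxy objective that this line of work builds on, and which the paper generalizes with non-isotropic probabilistic proxies, looks roughly like the sketch below (the temperature and normalization choices are assumptions).

```python
import torch
import torch.nn.functional as F

def proxy_loss(emb, labels, proxies, tau=0.1):
    """Pull embeddings toward their class proxy, push them from other proxies."""
    emb = F.normalize(emb, dim=1)
    proxies = F.normalize(proxies, dim=1)   # one learnable proxy per class
    logits = emb @ proxies.t() / tau        # (batch, num_classes) similarities
    return F.cross_entropy(logits, labels)
```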
- FedAvg with Fine Tuning: Local Updates Lead to Representation Learning [54.65133770989836]
The Federated Averaging (FedAvg) algorithm consists of alternating between a few local gradient updates at client nodes, followed by a model averaging update at the server.
We show that the reason behind the generalizability of FedAvg's output is its power in learning the common data representation among the clients' tasks.
We also provide empirical evidence demonstrating FedAvg's representation learning ability in federated image classification with heterogeneous data.
arXiv Detail & Related papers (2022-05-27T00:55:24Z)
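A minimal sketch of one FedAvg round, assuming full client participation and plain SGD (all names and hyperparameters here are illustrative):

```python
import copy
import torch
import torch.nn.functional as F

def fedavg_round(global_model, client_loaders, local_steps=10, lr=0.01):
    """A few local gradient updates per client, then weight averaging at the server."""
    states = []
    for loader in client_loaders:
        model = copy.deepcopy(global_model)   # each client starts from global weights
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        for _, (x, y) in zip(range(local_steps), loader):
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()
            opt.step()
        states.append(model.state_dict())
    # Server-side model averaging update
    avg = {k: torch.stack([s[k].float() for s in states]).mean(dim=0)
           for k in states[0]}
    global_model.load_state_dict(avg)
    return global_model
```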
- Boosting the Generalization Capability in Cross-Domain Few-shot Learning via Noise-enhanced Supervised Autoencoder [23.860842627883187]
We teach the model to capture broader variations of the feature distributions with a novel noise-enhanced supervised autoencoder (NSAE).
NSAE trains the model by jointly reconstructing inputs and predicting the labels of inputs as well as their reconstructed pairs.
We also take advantage of the NSAE structure and propose a two-step fine-tuning procedure that achieves better adaptation and improves classification performance in the target domain.
arXiv Detail & Related papers (2021-08-11T04:45:56Z)
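That joint objective can be sketched as below; the noise-injection part of NSAE is omitted, and the encoder/decoder/classifier components and loss weighting are assumptions.

```python
import torch.nn.functional as F

def nsae_loss(encoder, decoder, classifier, x, y):
    """Reconstruct inputs and classify both the inputs and their reconstructions."""
    z = encoder(x)
    x_rec = decoder(z)                              # reconstructed pair of x
    loss = F.mse_loss(x_rec, x)                     # reconstruction term
    loss = loss + F.cross_entropy(classifier(z), y)                # label of input
    loss = loss + F.cross_entropy(classifier(encoder(x_rec)), y)   # label of reconstruction
    return loss
```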
- Crowd Counting via Perspective-Guided Fractional-Dilation Convolution [75.36662947203192]
This paper proposes a novel convolutional neural network-based crowd counting method, termed Perspective-guided Fractional-Dilation Network (PFDNet).
By modeling the continuous scale variations, the proposed PFDNet is able to select the proper fractional dilation kernels for adapting to different spatial locations.
This significantly improves upon the flexibility of state-of-the-art methods that only consider discrete representative scales.
arXiv Detail & Related papers (2021-07-08T07:57:00Z)
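One plausible way to realize a fractional dilation rate, offered here as an assumption rather than PFDNet's exact operator, is to blend the two nearest integer-dilated convolutions:

```python
import math
import torch.nn.functional as F

def fractional_dilated_conv(x, weight, rate):
    """Approximate a fractional dilation by mixing the two nearest integer rates.

    Assumes a 3x3 kernel so that padding == dilation preserves spatial size.
    """
    lo, hi = math.floor(rate), math.ceil(rate)
    y_lo = F.conv2d(x, weight, dilation=lo, padding=lo)
    if hi == lo:
        return y_lo
    y_hi = F.conv2d(x, weight, dilation=hi, padding=hi)
    a = rate - lo                     # fractional part drives the blend
    return (1 - a) * y_lo + a * y_hi
```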
- Understanding Clipping for Federated Learning: Convergence and Client-Level Differential Privacy [67.4471689755097]
This paper empirically demonstrates that clipped FedAvg can perform surprisingly well even with substantial data heterogeneity.
We provide the convergence analysis of a differentially private (DP) FedAvg algorithm and highlight the relationship between clipping bias and the distribution of the clients' updates.
arXiv Detail & Related papers (2021-06-25T14:47:19Z)
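The clipping-plus-noise step analyzed there is standard for client-level DP and can be sketched as follows, assuming each client update arrives as a flat tensor:

```python
import torch

def clipped_dp_average(updates, clip_norm=1.0, noise_mult=1.0):
    """Clip each client update to L2 norm clip_norm, average, add Gaussian noise."""
    clipped = [u * torch.clamp(clip_norm / (u.norm() + 1e-12), max=1.0)
               for u in updates]
    avg = torch.stack(clipped).mean(dim=0)
    sigma = noise_mult * clip_norm / len(updates)   # noise scale after averaging
    return avg + sigma * torch.randn_like(avg)
```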
- FjORD: Fair and Accurate Federated Learning under heterogeneous targets with Ordered Dropout [16.250862114257277]
We introduce Ordered Dropout, a mechanism that achieves an ordered, nested representation of knowledge in Neural Networks.
We employ this technique, along with a self-distillation methodology, in the realm of Federated Learning in a framework called FjORD.
FjORD consistently leads to significant performance gains over state-of-the-art baselines, while maintaining its nested structure.
arXiv Detail & Related papers (2021-02-26T13:07:43Z)
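A toy version of Ordered Dropout on a channel dimension might look like the following; the width levels and sampling scheme are assumptions, and FjORD's exact formulation may differ.

```python
import math
import random

def ordered_dropout(x, p_levels=(0.25, 0.5, 0.75, 1.0)):
    """Keep only the first ceil(p*C) channels, so sub-models are nested by width."""
    p = random.choice(p_levels)            # width multiplier sampled per forward pass
    c = max(1, math.ceil(p * x.shape[1]))  # number of leading channels to keep
    out = x.clone()
    out[:, c:] = 0.0                       # zero trailing channels, never leading ones
    return out
```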
This list is automatically generated from the titles and abstracts of the papers on this site.