Curvature Tuning: Provable Training-free Model Steering From a Single Parameter
- URL: http://arxiv.org/abs/2502.07783v1
- Date: Tue, 11 Feb 2025 18:59:57 GMT
- Title: Curvature Tuning: Provable Training-free Model Steering From a Single Parameter
- Authors: Leyang Hu, Randall Balestriero
- Abstract summary: We show how a single parameter can be used to modulate the curvature of a model's decision boundary.
This makes CT both more efficient and interpretable than conventional fine-tuning methods.
We empirically validate its effectiveness in improving generalization and robustness of pretrained models.
- Score: 13.412573082645096
- License:
- Abstract: The scaling of model size and data size has reshaped the paradigm of AI. As a result, the common protocol for leveraging the latest models is to steer them towards a specific downstream task of interest through fine-tuning. Despite its importance, the main methods for fine-tuning remain limited to full fine-tuning or low-rank adapters, which involve countless hyper-parameters and lack interpretability. In this paper, we take a step back and demonstrate how novel and explainable post-training steering solutions can be derived theoretically from spline operators, a rich mathematical framing of Deep Networks that was recently developed. Our method, coined Curvature Tuning (CT), has a single parameter that provably modulates the curvature of the model's decision boundary, hence allowing training-free steering. This makes CT both more efficient and interpretable than conventional fine-tuning methods. We empirically validate its effectiveness in improving generalization and robustness of pretrained models. For example, CT improves out-of-distribution transfer performance of ResNet-18/50 by 2.57%/1.74% across seventeen downstream datasets, and improves RobustBench robust accuracy by 11.76%/348.44%. Additionally, we apply CT to ReLU-based Swin-T/S, improving their generalization on nine downstream datasets by 2.43%/3.33%. Our code is available at https://github.com/Leon-Leyang/curvature-tuning.
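The paper derives the exact CT operator from spline theory; the snippet below is only a minimal sketch of the interface, assuming the steering amounts to swapping every ReLU of a pretrained model for a single-parameter smooth surrogate. The specific Swish/Softplus blend, the `CurvatureActivation` module, and the `steer` helper are illustrative assumptions, not the published formula.

```python
# Minimal sketch of single-parameter, training-free activation steering.
# The exact CT activation comes from the paper's spline derivation; here a
# beta-blended Swish/Softplus surrogate stands in for it (an assumption).
import torch
import torch.nn as nn
from torchvision.models import resnet18


class CurvatureActivation(nn.Module):
    """Smooth ReLU surrogate: beta=1 recovers ReLU, smaller beta adds curvature smoothing."""

    def __init__(self, beta: float = 0.9):
        super().__init__()
        self.beta = beta

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.beta >= 1.0:
            return torch.relu(x)
        # Temperature grows as beta -> 0, smoothing the kink at the origin.
        tau = (1.0 - self.beta) / self.beta
        return 0.5 * (torch.sigmoid(x / tau) * x                 # Swish-like term
                      + tau * nn.functional.softplus(x / tau))   # Softplus-like term


def steer(model: nn.Module, beta: float) -> nn.Module:
    """Swap every ReLU in-place for the beta-parameterized surrogate (no training)."""
    for name, child in model.named_children():
        if isinstance(child, nn.ReLU):
            setattr(model, name, CurvatureActivation(beta))
        else:
            steer(child, beta)
    return model


if __name__ == "__main__":
    model = steer(resnet18(weights=None).eval(), beta=0.9)
    print(model(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 1000])
```

Because the swap touches only activations, no weights are modified and the single value of beta can be selected by a cheap sweep on held-out data.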
Related papers
- PACE: Marrying generalization in PArameter-efficient fine-tuning with Consistency rEgularization [35.922096876707975]
PACE marries PArameter-efficient fine-tuning with Consistency rEgularization to improve generalization.
It implicitly regularizes gradients for enhanced generalization and also implicitly aligns the fine-tuned and pre-trained models to retain knowledge.
It also improves LoRA on text classification (GLUE) and mathematical reasoning tasks.
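A hedged sketch of how such a consistency regularizer can be wired around a LoRA-style adapter is shown below: two independently perturbed forward passes are pulled together, which implicitly smooths gradients and keeps the adapted model near the pretrained one. The noise model, the loss weight `lam`, and the `LoRALinear`/`pace_style_loss` helpers are illustrative assumptions, not PACE's actual algorithm.

```python
# Hedged sketch of consistency regularization around a LoRA-style adapter.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, noise_std: float = 0.1):
        super().__init__()
        self.base = base.requires_grad_(False)           # frozen pretrained layer
        self.lora_a = nn.Parameter(torch.randn(base.in_features, rank) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(rank, base.out_features))
        self.noise_std = noise_std

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        delta = self.lora_a @ self.lora_b
        if self.training and self.noise_std > 0:          # multiplicative noise on the adapter
            delta = delta * (1 + self.noise_std * torch.randn_like(delta))
        return self.base(x) + x @ delta


def pace_style_loss(model, x, y, lam=1.0):
    """Task loss + consistency between two independently perturbed passes."""
    out1, out2 = model(x), model(x)                       # different noise each pass
    task = nn.functional.cross_entropy(out1, y)
    consistency = (out1 - out2).pow(2).mean()
    return task + lam * consistency


if __name__ == "__main__":
    model = LoRALinear(nn.Linear(32, 5)).train()
    loss = pace_style_loss(model, torch.randn(4, 32), torch.randint(0, 5, (4,)))
    loss.backward()
    print(float(loss))
```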
arXiv Detail & Related papers (2024-09-25T17:56:00Z) - Test-time adaptation for geospatial point cloud semantic segmentation with distinct domain shifts [6.80671668491958]
Test-time adaptation (TTA) allows direct adaptation of a pre-trained model to unlabeled data during inference stage without access to source data or additional training.
We propose three domain shift paradigms: photogrammetric to airborne LiDAR, airborne to mobile LiDAR, and synthetic to mobile laser scanning.
Experimental results show our method improves segmentation accuracy by up to 20% mIoU, outperforming other methods.
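The summary does not spell out the adaptation mechanism, so as a point of reference the sketch below shows a generic TENT-style test-time adaptation loop: entropy minimization over only the normalization-layer affine parameters during inference. It is a stand-in for the class of methods discussed, not this paper's approach to LiDAR domain shifts.

```python
# Hedged sketch of a generic test-time adaptation loop (TENT-style entropy
# minimization on normalization-layer affine parameters).
import torch
import torch.nn as nn


def configure_for_tta(model: nn.Module):
    """Freeze everything, then re-enable only BatchNorm affine parameters."""
    model.eval()
    model.requires_grad_(False)
    params = []
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d)):
            m.train()                              # re-estimate stats from the test batch
            m.requires_grad_(True)
            params += [p for p in (m.weight, m.bias) if p is not None]
    return params


def tta_step(model, batch, optimizer):
    logits = model(batch)                          # (B, num_classes) per-point logits
    probs = logits.softmax(dim=-1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(-1).mean()
    optimizer.zero_grad()
    entropy.backward()
    optimizer.step()
    return entropy.item()


if __name__ == "__main__":
    net = nn.Sequential(nn.Linear(16, 64), nn.BatchNorm1d(64), nn.ReLU(), nn.Linear(64, 8))
    opt = torch.optim.SGD(configure_for_tta(net), lr=1e-3)
    print(tta_step(net, torch.randn(32, 16), opt))
```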
arXiv Detail & Related papers (2024-07-08T15:40:28Z) - Towards Robust and Efficient Cloud-Edge Elastic Model Adaptation via Selective Entropy Distillation [56.79064699832383]
We establish a Cloud-Edge Elastic Model Adaptation (CEMA) paradigm in which the edge models only need to perform forward propagation.
In our CEMA, to reduce the communication burden, we devise two criteria to exclude unnecessary samples from uploading to the cloud.
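A hedged sketch of the edge-side filtering idea: only samples whose prediction entropy falls inside a band are uploaded to the cloud, skipping both confidently predicted samples and unreliably noisy ones. The entropy criterion and both thresholds are illustrative assumptions; the paper's two exclusion criteria are not reproduced here.

```python
# Hedged sketch of edge-side sample filtering before uploading to the cloud.
# Thresholds are made-up illustrative values.
import torch


def select_for_upload(logits: torch.Tensor,
                      low: float = 0.2,
                      high: float = 2.0) -> torch.Tensor:
    """Return a boolean mask over the batch: True = upload this sample."""
    probs = logits.softmax(dim=-1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1)
    return (entropy > low) & (entropy < high)


if __name__ == "__main__":
    logits = torch.randn(8, 10) * 3
    mask = select_for_upload(logits)
    print(f"uploading {int(mask.sum())}/{len(mask)} samples")
```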
arXiv Detail & Related papers (2024-02-27T08:47:19Z) - Trainable Projected Gradient Method for Robust Fine-tuning [36.470333094917436]
We propose Trainable Projected Gradient Method (TPGM) to automatically learn the constraint imposed for each layer for a fine-grained fine-tuning regularization.
This is motivated by formulating fine-tuning as a bi-level constrained optimization problem.
We show that TPGM outperforms existing fine-tuning methods in OOD performance while matching the best in-distribution (ID) performance.
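A hedged sketch of the projection step this builds on: after each fine-tuning update, every layer's weights are pulled back onto a ball around the pretrained weights. In TPGM the per-layer radii are themselves learned through the bi-level formulation; the fixed `radius` below is an illustrative placeholder.

```python
# Hedged sketch of per-layer projection onto a ball around pretrained weights.
import torch
import torch.nn as nn


@torch.no_grad()
def project_to_pretrained(model: nn.Module, pretrained: dict, radius: float = 0.5):
    for name, p in model.named_parameters():
        anchor = pretrained[name]
        diff = p - anchor
        norm = diff.norm()
        if norm > radius:                          # pull back onto the epsilon-ball
            p.copy_(anchor + diff * (radius / norm))


if __name__ == "__main__":
    model = nn.Linear(16, 4)
    anchors = {k: v.detach().clone() for k, v in model.named_parameters()}
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for _ in range(10):
        loss = model(torch.randn(8, 16)).pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
        project_to_pretrained(model, anchors)      # constrain drift after every step
    drift = (model.weight - anchors["weight"]).norm()
    print(f"weight drift: {float(drift):.3f} (<= 0.5)")
```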
arXiv Detail & Related papers (2023-03-19T17:30:44Z) - Online Hyperparameter Optimization for Class-Incremental Learning [99.70569355681174]
Class-incremental learning (CIL) aims to train a classification model while the number of classes increases phase-by-phase.
An inherent challenge of CIL is the stability-plasticity tradeoff, i.e., CIL models should stay stable to retain old knowledge and stay plastic to absorb new knowledge.
We propose an online learning method that can adaptively optimize the tradeoff without knowing the setting a priori.
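The stability-plasticity tradeoff is commonly instantiated as a weighted sum of a new-task loss and a distillation loss against the previous-phase model; the sketch below shows such a loss with a fixed weight `lam`, whereas the paper's contribution is adapting the tradeoff online. Everything here is an illustrative assumption, not the paper's method.

```python
# Hedged sketch of the stability-plasticity tradeoff in class-incremental
# learning: cross-entropy on new data (plasticity) plus distillation from the
# previous-phase model (stability). The weight `lam` is fixed here only for
# illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


def cil_loss(student, teacher, x, y, lam=0.5, temperature=2.0):
    new_logits = student(x)
    plasticity = F.cross_entropy(new_logits, y)
    with torch.no_grad():
        old_logits = teacher(x)                  # previous-phase model, frozen
    stability = F.kl_div(
        F.log_softmax(new_logits / temperature, dim=-1),
        F.softmax(old_logits / temperature, dim=-1),
        reduction="batchmean",
    )
    return (1 - lam) * plasticity + lam * stability


if __name__ == "__main__":
    student, teacher = nn.Linear(32, 10), nn.Linear(32, 10).eval()
    loss = cil_loss(student, teacher, torch.randn(4, 32), torch.randint(0, 10, (4,)))
    print(float(loss))
```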
arXiv Detail & Related papers (2023-01-11T17:58:51Z) - Prompt Tuning for Parameter-efficient Medical Image Segmentation [79.09285179181225]
We propose and investigate several contributions to achieve a parameter-efficient but effective adaptation for semantic segmentation on two medical imaging datasets.
We pre-train this architecture with a dedicated dense self-supervision scheme based on assignments to online generated prototypes.
We demonstrate that the resulting neural network model is able to attenuate the gap between fully fine-tuned and parameter-efficiently adapted models.
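A hedged sketch of the general prompt-tuning recipe the summary alludes to: the pretrained encoder is frozen and only a handful of prompt tokens (plus a small head) are trained. The toy transformer and its dimensions are placeholders, not the paper's medical-segmentation architecture or its prototype-based pre-training.

```python
# Hedged sketch of prompt tuning: a frozen transformer encoder with a few
# learnable prompt tokens and a small head as the only trainable parameters.
import torch
import torch.nn as nn


class PromptTunedEncoder(nn.Module):
    def __init__(self, dim=64, num_prompts=8, num_classes=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2).requires_grad_(False)
        self.prompts = nn.Parameter(torch.randn(1, num_prompts, dim) * 0.02)  # trainable
        self.head = nn.Linear(dim, num_classes)                               # trainable

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        prompts = self.prompts.expand(tokens.size(0), -1, -1)
        z = self.encoder(torch.cat([prompts, tokens], dim=1))
        return self.head(z[:, self.prompts.size(1):])     # per-token class logits


if __name__ == "__main__":
    model = PromptTunedEncoder()
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"trainable {trainable}/{total} parameters")
    print(model(torch.randn(2, 16, 64)).shape)             # (2, 16, num_classes)
```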
arXiv Detail & Related papers (2022-11-16T21:55:05Z) - Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning [126.84770886628833]
Existing finetuning methods either tune all parameters of the pretrained model (full finetuning) or only tune the last linear layer (linear probing).
We propose a new parameter-efficient finetuning method termed SSF, indicating that one only needs to Scale and Shift the deep Features extracted by a pre-trained model to match the performance of full finetuning.
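The mechanism is simple enough to sketch directly: a per-channel scale and shift are inserted after a frozen pretrained layer and are the only trainable parameters. Initializing the scale to 1 and the shift to 0 (an identity start) is an assumption of this sketch.

```python
# Sketch of SSF-style adaptation: the pretrained layer is frozen and only a
# per-channel scale (gamma) and shift (beta) on its output features train.
import torch
import torch.nn as nn


class ScaleShift(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(dim))     # identity initialization
        self.beta = nn.Parameter(torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.gamma + self.beta              # broadcast over the channel dim


class SSFBlock(nn.Module):
    def __init__(self, base: nn.Linear):
        super().__init__()
        self.base = base.requires_grad_(False)         # frozen pretrained layer
        self.ssf = ScaleShift(base.out_features)       # only these parameters train

    def forward(self, x):
        return self.ssf(self.base(x))


if __name__ == "__main__":
    block = SSFBlock(nn.Linear(128, 256))
    trainable = sum(p.numel() for p in block.parameters() if p.requires_grad)
    print(trainable)                                   # 512 = 2 * 256
    print(block(torch.randn(4, 128)).shape)
```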
arXiv Detail & Related papers (2022-10-17T08:14:49Z) - DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models [152.29364079385635]
As pre-trained models grow bigger, the fine-tuning process can be time-consuming and computationally expensive.
We propose a framework for resource- and parameter-efficient fine-tuning by leveraging the sparsity prior in both weight updates and the final model weights.
Our proposed framework, dubbed Dually Sparsity-Embedded Efficient Tuning (DSEE), aims to achieve two key objectives: (i) parameter-efficient fine-tuning and (ii) resource-efficient inference.
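A hedged sketch of a "dually sparse" update: the fine-tuning delta on a frozen weight is the sum of a low-rank term and a sparse term. The fixed random mask, rank, and sparsity level are illustrative choices, not DSEE's actual decomposition or pruning procedure.

```python
# Hedged sketch of a sparse + low-rank fine-tuning delta on a frozen weight:
# W_eff = W_pretrained + A @ B (low-rank) + M * S (sparse).
import torch
import torch.nn as nn


class SparseLowRankLinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 4, sparsity: float = 0.99):
        super().__init__()
        self.base = base.requires_grad_(False)
        out_f, in_f = base.weight.shape
        self.a = nn.Parameter(torch.randn(out_f, rank) * 0.01)
        self.b = nn.Parameter(torch.zeros(rank, in_f))
        mask = (torch.rand(out_f, in_f) > sparsity).float()   # ~1% trainable entries
        self.register_buffer("mask", mask)
        self.sparse = nn.Parameter(torch.zeros(out_f, in_f))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        delta = self.a @ self.b + self.mask * self.sparse
        return nn.functional.linear(x, self.base.weight + delta, self.base.bias)


if __name__ == "__main__":
    layer = SparseLowRankLinear(nn.Linear(64, 64))
    print(layer(torch.randn(2, 64)).shape)
```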
arXiv Detail & Related papers (2021-10-30T03:29:47Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of such extrapolation variants can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
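One classical member of the extrapolation family is the extragradient update, where the gradient is evaluated at a look-ahead point and then applied from the original iterate; the toy sketch below illustrates that two-step rule on a quadratic. This is a hedged illustration of the general idea only; the paper's unified scheme and its ResNet/LSTM/Transformer experiments are not reproduced here.

```python
# Hedged sketch of an extragradient-style extrapolation step: evaluate the
# gradient at a look-ahead point, then apply it starting from the original
# iterate. Shown on a toy quadratic objective.
import torch


def extragradient_step(params, loss_fn, lr=0.1):
    originals = [p.detach().clone() for p in params]
    # 1) look-ahead: take a provisional gradient step
    loss_fn().backward()
    with torch.no_grad():
        for p in params:
            p -= lr * p.grad
            p.grad = None
    # 2) evaluate the gradient at the extrapolated point...
    loss_fn().backward()
    # 3) ...and apply it from the original iterate
    with torch.no_grad():
        for p, orig in zip(params, originals):
            p.copy_(orig - lr * p.grad)
            p.grad = None


if __name__ == "__main__":
    w = torch.nn.Parameter(torch.tensor([3.0, -2.0]))
    for _ in range(50):
        extragradient_step([w], lambda: (w ** 2).sum())
    print(w.detach())  # approaches [0., 0.]
```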
arXiv Detail & Related papers (2020-06-10T08:22:41Z)