Tuning Pre-trained Model via Moment Probing
- URL: http://arxiv.org/abs/2307.11342v3
- Date: Mon, 2 Oct 2023 14:57:11 GMT
- Title: Tuning Pre-trained Model via Moment Probing
- Authors: Mingze Gao and Qilong Wang and Zhenyi Lin and Pengfei Zhu and Qinghua
Hu and Jingbo Zhou
- Abstract summary: We propose a novel Moment Probing (MP) method to explore the potential of LP.
Whereas LP builds a linear classification head on the mean of final features, MP performs linear classification on the feature distribution.
Our MP significantly outperforms LP and is competitive with counterparts at lower training cost.
- Score: 62.445281364055795
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, efficient fine-tuning of large-scale pre-trained models has
attracted increasing research interest, where linear probing (LP) serves as a
fundamental module for exploiting the final representations for task-dependent
classification. However, most existing methods focus on how to effectively
introduce a few learnable parameters, and little work pays attention to the
commonly used LP module. In this paper, we propose a novel Moment Probing (MP)
method to further explore the potential of LP. Distinguished from LP, which
builds a linear classification head on the mean of final features (e.g., word
tokens for ViT) or the classification token, our MP performs linear
classification on the feature distribution, which provides stronger
representation ability by exploiting the richer statistical information
inherent in features. Specifically, we represent the feature distribution by
its characteristic function, which is efficiently approximated using the
first- and second-order moments of the features. Furthermore, we propose a
multi-head convolutional cross-covariance (MHC$^3$) to compute second-order
moments in an efficient and effective manner. Considering that MP could affect
feature learning, we introduce a partially shared module to learn two
recalibrating parameters (PSRP) for backbones based on MP, namely MP$_{+}$.
Extensive experiments on ten benchmarks using various models show that our MP
significantly outperforms LP and is competitive with counterparts at lower
training cost, while our MP$_{+}$ achieves state-of-the-art performance.
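To make the moment idea concrete: the characteristic function of the feature
distribution, $\phi(t) = \mathbb{E}[e^{i t^\top x}]$, admits the standard
second-order (Gaussian-style) approximation
$\phi(t) \approx \exp(i t^\top \mu - \frac{1}{2} t^\top \Sigma t)$, so the mean
$\mu$ and covariance $\Sigma$ summarize the distribution. Below is a minimal
PyTorch sketch of a probing head on both moments. The class name
`MomentProbing`, the head count, and the plain per-head covariance (a
stand-in for the paper's convolutional cross-covariance MHC$^3$) are
illustrative assumptions, not the authors' implementation.
```python
import torch
import torch.nn as nn


class MomentProbing(nn.Module):
    """Linear classification on first- and second-order feature moments
    (a sketch of the idea, not the paper's exact head)."""

    def __init__(self, dim: int, num_classes: int, n_heads: int = 12):
        super().__init__()
        assert dim % n_heads == 0, "channels must split evenly across heads"
        self.n_heads = n_heads
        self.head_dim = dim // n_heads
        # One linear branch per moment; their logits are summed.
        self.fc_mean = nn.Linear(dim, num_classes)  # first-order moment
        cov_dim = n_heads * self.head_dim ** 2      # second-order moment
        self.fc_cov = nn.Linear(cov_dim, num_classes)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, D) final features, e.g. ViT word tokens.
        B, N, D = tokens.shape
        mean = tokens.mean(dim=1)                   # (B, D)
        centered = tokens - mean.unsqueeze(1)
        # Split channels into heads and form a small covariance per head:
        # a plain stand-in for the paper's multi-head convolutional
        # cross-covariance (MHC^3).
        h = centered.view(B, N, self.n_heads, self.head_dim)
        h = h.permute(0, 2, 3, 1)                   # (B, H, d, N)
        cov = h @ h.transpose(-1, -2) / max(N - 1, 1)  # (B, H, d, d)
        return self.fc_mean(mean) + self.fc_cov(cov.flatten(1))


# Usage on frozen backbone features: 2 images, 197 tokens, 768 channels.
feats = torch.randn(2, 197, 768)
head = MomentProbing(dim=768, num_classes=1000)
print(head(feats).shape)  # torch.Size([2, 1000])
```
Splitting the channels into heads keeps the second-order representation
compact ($H \cdot (D/H)^2$ entries instead of $D^2$), which mirrors the
efficiency motivation the abstract gives for MHC$^3$.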
Related papers
- Denoising Pre-Training and Customized Prompt Learning for Efficient Multi-Behavior Sequential Recommendation [69.60321475454843]
We propose DPCPL, the first pre-training and prompt-tuning paradigm tailored for Multi-Behavior Sequential Recommendation.
In the pre-training stage, we propose a novel Efficient Behavior Miner (EBM) to filter out the noise at multiple time scales.
Subsequently, we propose to tune the pre-trained model in a highly efficient manner with the proposed Customized Prompt Learning (CPL) module.
arXiv Detail & Related papers (2024-08-21T06:48:38Z) - Few-Shot Medical Image Segmentation with Large Kernel Attention [5.630842216128902]
We propose a few-shot medical segmentation model that acquires comprehensive feature representation capabilities.
Our model comprises four key modules: a dual-path feature extractor, an attention module, an adaptive prototype prediction module, and a multi-scale prediction fusion module.
The results demonstrate that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-07-27T02:28:30Z) - Sample Complexity Characterization for Linear Contextual MDPs [67.79455646673762]
Contextual Markov decision processes (CMDPs) describe a class of reinforcement learning problems in which the transition kernels and reward functions can change over time with different MDPs indexed by a context variable.
CMDPs serve as an important framework to model many real-world applications with time-varying environments.
We study CMDPs under two linear function approximation models: Model I with context-varying representations and common linear weights for all contexts; and Model II with common representations for all contexts and context-varying linear weights.
arXiv Detail & Related papers (2024-02-05T03:25:04Z) - Convolutional autoencoder-based multimodal one-class classification [80.52334952912808]
One-class classification refers to approaches of learning using data from a single class only.
We propose a deep learning one-class classification method suitable for multimodal data.
arXiv Detail & Related papers (2023-09-25T12:31:18Z) - MAP: A Model-agnostic Pretraining Framework for Click-through Rate
Prediction [39.48740397029264]
We propose a Model-agnostic pretraining (MAP) framework that applies feature corruption and recovery on multi-field categorical data.
We derive two practical algorithms: masked feature prediction (MFP) and replaced feature detection (RFD).
arXiv Detail & Related papers (2023-08-03T12:55:55Z) - Provably Efficient Representation Learning with Tractable Planning in
Low-Rank POMDP [81.00800920928621]
We study representation learning in partially observable Markov decision processes (POMDPs).
We first present an algorithm for decodable POMDPs that combines maximum likelihood estimation (MLE) and optimism in the face of uncertainty (OFU).
We then show how to adapt this algorithm to also work in the broader class of $\gamma$-observable POMDPs.
arXiv Detail & Related papers (2023-06-21T16:04:03Z) - Provable General Function Class Representation Learning in Multitask
Bandits and MDPs [58.624124220900306]
Multitask representation learning is a popular approach in reinforcement learning to boost sample efficiency.
In this work, we extend the analysis to general function class representations.
We theoretically validate the benefit of multitask representation learning within general function class for bandits and linear MDP.
arXiv Detail & Related papers (2022-05-31T11:36:42Z) - CAD: Co-Adapting Discriminative Features for Improved Few-Shot
Classification [11.894289991529496]
Few-shot classification is a challenging problem that aims to learn a model that can adapt to unseen classes given a few labeled samples.
Recent approaches pre-train a feature extractor, and then fine-tune for episodic meta-learning.
We propose a strategy to cross-attend and re-weight discriminative features for few-shot classification.
arXiv Detail & Related papers (2022-03-25T06:14:51Z) - Beyond Simple Meta-Learning: Multi-Purpose Models for Multi-Domain,
Active and Continual Few-Shot Learning [41.07029317930986]
We propose a variance-sensitive class of models that operates in a low-label regime.
The first method, Simple CNAPS, employs a hierarchically regularized Mahalanobis-distance based classifier.
We further extend this approach to a transductive learning setting, proposing Transductive CNAPS.
arXiv Detail & Related papers (2022-01-13T18:59:02Z) - Towards Better Object Detection in Scale Variation with Adaptive Feature
Selection [3.5352273012717044]
We propose a novel adaptive feature selection module (AFSM) to automatically learn the way to fuse multi-level representations in the channel dimension.
It significantly improves the performance of the detectors that have a feature pyramid structure.
A class-aware sampling mechanism (CASM) is proposed to tackle the class imbalance problem.
arXiv Detail & Related papers (2020-12-06T13:41:20Z)