Finetune-Informed Pretraining Boosts Downstream Performance
- URL: http://arxiv.org/abs/2601.20884v1
- Date: Tue, 27 Jan 2026 20:26:44 GMT
- Title: Finetune-Informed Pretraining Boosts Downstream Performance
- Authors: Atik Faysal, Mohammad Rostami, Reihaneh Gh. Roshan, Nikhil Muralidhar, Huaxia Wang,
- Abstract summary: Finetune-Informed Pretraining (FIP) is a model-agnostic method that biases representation learning toward a designated target modality. FIP consistently improves downstream fine-tuned performance with no extra data or compute.
- Score: 13.807896870065706
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multimodal pretraining is effective for building general-purpose representations, but in many practical deployments, only one modality is heavily used during downstream fine-tuning. Standard pretraining strategies treat all modalities uniformly, which can lead to under-optimized representations for the modality that actually matters. We propose Finetune-Informed Pretraining (FIP), a model-agnostic method that biases representation learning toward a designated target modality needed at fine-tuning time. FIP combines higher masking difficulty, stronger loss weighting, and increased decoder capacity for the target modality, without modifying the shared encoder or requiring additional supervision. When applied to masked modeling on constellation diagrams for wireless signals, FIP consistently improves downstream fine-tuned performance with no extra data or compute. FIP is simple to implement, architecture-compatible, and broadly applicable across multimodal masked modeling pipelines.
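The abstract describes FIP as combining three per-modality knobs: higher masking difficulty, stronger loss weighting, and increased decoder capacity for the target modality. A minimal sketch of how such a weighted masked-modeling loss might be assembled is below; the modality names, configuration values, and function names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical FIP-style per-modality configuration (values are illustrative):
# the target modality gets a higher mask ratio, a larger loss weight, and a
# deeper decoder than the auxiliary modality.
FIP_CONFIG = {
    "constellation": {"mask_ratio": 0.75, "loss_weight": 2.0, "decoder_layers": 4},  # target
    "spectrogram":   {"mask_ratio": 0.50, "loss_weight": 1.0, "decoder_layers": 2},  # auxiliary
}

def masked_recon_loss(tokens, recon, mask):
    """Mean squared reconstruction error over masked positions only."""
    diff = (tokens - recon) ** 2
    return float(diff[mask].mean())

def fip_pretrain_loss(batch, reconstructions, rng):
    """Sum per-modality masked-modeling losses under FIP-style weighting."""
    total = 0.0
    for name, cfg in FIP_CONFIG.items():
        tokens = batch[name]
        # Sample a random token mask at the modality-specific ratio.
        mask = rng.random(tokens.shape[0]) < cfg["mask_ratio"]
        if not mask.any():
            continue
        total += cfg["loss_weight"] * masked_recon_loss(
            tokens, reconstructions[name], mask
        )
    return total
```

In this sketch the shared encoder is untouched; only the masking schedule, loss weights, and (in a full model) the decoder depth differ per modality, matching the abstract's claim of no extra data or supervision.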
Related papers
- FAIL: Flow Matching Adversarial Imitation Learning for Image Generation [52.643484089126844]
Post-training of flow matching models, aligning the output distribution with a high-quality target, is mathematically equivalent to imitation learning. We propose Flow Matching Adversarial Imitation Learning (FAIL), which minimizes policy-expert divergence through adversarial training without explicit rewards or pairwise comparisons.
arXiv Detail & Related papers (2026-02-12T16:36:33Z)
- Elastic ViTs from Pretrained Models without Retraining [74.5386166956142]
Vision foundation models achieve remarkable performance but are only available in a limited set of pre-determined sizes. We introduce SnapViT: Single-shot network approximation for pruned Vision Transformers. Our approach efficiently combines gradient information with cross-network structure correlations, approximated via an evolutionary algorithm.
arXiv Detail & Related papers (2025-10-20T16:15:03Z)
- Exploring Cross-Modal Flows for Few-Shot Learning [9.866094371902372]
We propose a model-agnostic multi-step adjustment approach by learning a cross-modal velocity field: Flow Matching Alignment (FMA). Results demonstrate that FMA can consistently yield significant performance gains across various benchmarks and backbones.
arXiv Detail & Related papers (2025-10-16T10:32:48Z)
- POME: Post Optimization Model Edit via Muon-style Projection [74.73326657229347]
Post-Optimization Model Edit (POME) enhances the performance of fine-tuned large language models. It applies a muon-style projection to $\Delta W$, the difference between the fine-tuned and pretrained weights. As a simple post-processing step, POME is completely decoupled from the training pipeline.
arXiv Detail & Related papers (2025-10-08T04:20:11Z)
- Towards Efficient General Feature Prediction in Masked Skeleton Modeling [59.46799426434277]
We propose a novel General Feature Prediction (GFP) framework for efficient masked skeleton modeling. Our key innovation is replacing conventional low-level reconstruction with high-level feature prediction that spans from local motion patterns to global semantic representations.
arXiv Detail & Related papers (2025-09-03T18:05:02Z)
- Improving Progressive Generation with Decomposable Flow Matching [50.63174319509629]
Decomposable Flow Matching (DFM) is a simple and effective framework for the progressive generation of visual media. On ImageNet-1k 512px, DFM achieves a 35.2% improvement in FDD scores over the base architecture and 26.4% over the best-performing baseline.
arXiv Detail & Related papers (2025-06-24T17:58:02Z)
- Mixture of Physical Priors Adapter for Parameter-Efficient Fine-Tuning [41.19870454097444]
We propose a novel approach that models network weights by leveraging a combination of physical priors. We use three foundational equations (heat diffusion, wave propagation, and Poisson's steady-state equation), each contributing distinctive modeling properties. MoPPA improves PEFT accuracy by up to 2.1% on VTAB-1K image classification with a comparable number of trainable parameters.
arXiv Detail & Related papers (2024-12-03T19:00:34Z)
- FedNAR: Federated Optimization with Normalized Annealing Regularization [54.42032094044368]
We explore the choice of weight decay and identify that the weight decay value appreciably influences the convergence of existing FL algorithms.
We develop Federated optimization with Normalized Annealing Regularization (FedNAR), a plug-in that can be seamlessly integrated into any existing FL algorithm.
arXiv Detail & Related papers (2023-10-04T21:11:40Z)
- DR-Tune: Improving Fine-tuning of Pretrained Visual Models by Distribution Regularization with Semantic Calibration [38.4461170690033]
We propose a novel fine-tuning framework, namely distribution regularization with semantic calibration (DR-Tune).
DR-Tune employs distribution regularization by enforcing the downstream task head to decrease its classification error on the pretrained feature distribution.
To alleviate the interference by semantic drift, we develop the semantic calibration (SC) module.
arXiv Detail & Related papers (2023-08-23T10:59:20Z)
- Frame Flexible Network [52.623337134518835]
Existing video recognition algorithms typically use separate training pipelines for inputs with different frame numbers.
When a model is evaluated at frame numbers not seen during training, its performance drops significantly.
We propose a general framework, named Frame Flexible Network (FFN), which enables the model to be evaluated at different frame numbers and to adjust its computation accordingly.
arXiv Detail & Related papers (2023-03-26T20:51:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.