MaCP: Minimal yet Mighty Adaptation via Hierarchical Cosine Projection
- URL: http://arxiv.org/abs/2505.23870v2
- Date: Fri, 08 Aug 2025 18:25:36 GMT
- Title: MaCP: Minimal yet Mighty Adaptation via Hierarchical Cosine Projection
- Authors: Yixian Shen, Qi Bi, Jia-Hong Huang, Hongyi Zhu, Andy D. Pimentel, Anuj Pathania
- Abstract summary: MaCP, Minimal yet Mighty adaptive Cosine Projection, achieves exceptional performance while requiring minimal parameters and memory. It consistently delivers superior accuracy, significantly reduced computational complexity, and lower memory requirements compared to existing alternatives.
- Score: 10.300935899853748
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present MaCP (Minimal yet Mighty adaptive Cosine Projection), a new adaptation method that achieves exceptional performance while requiring minimal parameters and memory for fine-tuning large foundation models. Its general idea is to exploit the superior energy compaction and decorrelation properties of the cosine projection to improve both model efficiency and accuracy. Specifically, it projects the weight change from the low-rank adaptation into the discrete cosine space. Then, the weight change is partitioned over different levels of the discrete cosine spectrum, and each partition's most critical frequency components are selected. Extensive experiments demonstrate the effectiveness of MaCP across a wide range of single-modality tasks, including natural language understanding, natural language generation, and text summarization, as well as multi-modality tasks such as image classification and video understanding. MaCP consistently delivers superior accuracy, significantly reduced computational complexity, and lower memory requirements compared to existing alternatives.
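As an illustration of the project-partition-select pipeline the abstract describes, here is a minimal numpy/scipy sketch. The radial band partition, the keep ratio, and the function name `compress_update` are assumptions made for this example, not the paper's implementation.

```python
# Illustrative sketch (not the authors' code): project a weight update
# into the 2-D DCT domain, split the spectrum into hierarchical
# frequency bands, and keep only the largest coefficients per band.
import numpy as np
from scipy.fft import dctn, idctn

def compress_update(delta_w: np.ndarray, n_levels: int = 3,
                    keep_ratio: float = 0.05) -> np.ndarray:
    """Keep the top coefficients of each DCT frequency band."""
    coeffs = dctn(delta_w, norm="ortho")       # project into cosine space
    h, w = coeffs.shape
    # Normalized radial frequency of each coefficient: 0 (DC) .. sqrt(2).
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    radius = np.sqrt((yy / h) ** 2 + (xx / w) ** 2)
    mask = np.zeros(coeffs.shape, dtype=bool)
    for level in range(n_levels):              # hierarchical bands
        lo = np.sqrt(2) * level / n_levels
        hi = np.sqrt(2) * (level + 1) / n_levels
        band = (radius >= lo) & (radius < hi)
        k = max(1, int(keep_ratio * band.sum()))
        scores = np.where(band, np.abs(coeffs), -np.inf)
        idx = np.argpartition(scores.ravel(), -k)[-k:]
        mask.flat[idx] = True                  # keep band's top-k entries
    return idctn(np.where(mask, coeffs, 0.0), norm="ortho")

# Example: compress a low-rank update B @ A for a 256x256 weight.
rng = np.random.default_rng(0)
B, A = rng.normal(size=(256, 8)), rng.normal(size=(8, 256))
delta = compress_update(B @ A)
```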
Related papers
- CoSA: Compressed Sensing-Based Adaptation of Large Language Models [21.688889188355645]
CoSA (Compressed Sensing-Based Adaptation) is a new PEFT method extended from compressed sensing theory. We show that CoSA provides a principled perspective for efficient and expressive multi-scale model adaptation. We evaluate CoSA on 10 diverse tasks, including natural language understanding and generation, employing 5 models of different scales from the RoBERTa, Llama, and Qwen families.
arXiv Detail & Related papers (2026-02-05T00:11:43Z)
- High-Rank Structured Modulation for Parameter-Efficient Fine-Tuning [57.85676271833619]
Low-rank Adaptation (LoRA) uses a low-rank update method to simulate full-parameter fine-tuning. We present SMoA, a high-rank Structured MOdulation Adapter that uses fewer trainable parameters while maintaining a higher rank.
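To make the LoRA baseline referenced here concrete, below is a minimal numpy sketch of the standard low-rank update (SMoA's own structured high-rank modulation is not reproduced; shapes and the scaling convention are the usual LoRA ones).

```python
# Standard LoRA update: the frozen weight W is adapted by a trainable
# low-rank product B @ A, scaled by alpha / r.
import numpy as np

d_out, d_in, r, alpha = 768, 768, 8, 16
rng = np.random.default_rng(0)

W = rng.normal(size=(d_out, d_in))           # frozen pretrained weight
A = rng.normal(scale=0.01, size=(r, d_in))   # trainable, small init
B = np.zeros((d_out, r))                     # trainable, zero init: delta starts at 0

def forward(x: np.ndarray) -> np.ndarray:
    return x @ (W + (alpha / r) * (B @ A)).T

x = rng.normal(size=(4, d_in))
y = forward(x)                               # equals x @ W.T until B is trained
```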
arXiv Detail & Related papers (2026-01-12T13:06:17Z)
- TuckA: Hierarchical Compact Tensor Experts for Efficient Fine-Tuning [83.93651411533533]
We introduce Tucker Adaptation (TuckA), a method with four key properties. We develop an efficient batch-level routing mechanism, which reduces the router's parameter size by a factor of $L$. Experiments on benchmarks in natural language understanding, image classification, and mathematical reasoning speak to the efficacy of TuckA.
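The sketch below illustrates the general idea of storing a bank of expert updates as a Tucker factorization; the ranks, expert count, and reconstruction scheme are assumptions of this illustration, not TuckA's exact construction.

```python
# Tucker-factorized adapter bank: E expert updates of shape
# (d_out, d_in) are stored as a small core tensor plus three factor
# matrices; one expert's update is reconstructed on demand.
import numpy as np

d_out, d_in, E = 512, 512, 8          # expert count E (assumed)
r1, r2, r3 = 16, 16, 4                # Tucker ranks (assumed)
rng = np.random.default_rng(0)

core = rng.normal(size=(r1, r2, r3))  # shared core tensor
U_out = rng.normal(size=(d_out, r1))  # output-dimension factor
U_in = rng.normal(size=(d_in, r2))    # input-dimension factor
U_exp = rng.normal(size=(E, r3))      # expert factor

def expert_update(e: int) -> np.ndarray:
    """Reconstruct expert e's (d_out, d_in) update from the factors."""
    return np.einsum("abc,ia,jb,c->ij", core, U_out, U_in, U_exp[e])

delta = expert_update(3)
# Storage: r1*r2*r3 + d_out*r1 + d_in*r2 + E*r3 parameters instead of
# E * d_out * d_in for dense per-expert updates.
```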
arXiv Detail & Related papers (2025-11-10T09:03:16Z)
- MISCGrasp: Leveraging Multiple Integrated Scales and Contrastive Learning for Enhanced Volumetric Grasping [15.127239823566194]
MISCGrasp is a volumetric grasping method that integrates multi-scale feature extraction with contrastive feature enhancement for self-adaptive grasping. We propose a query-based interaction between high-level and low-level features through the Insight Transformer, while the Empower Transformer selectively attends to the highest-level features. Extensive experiments in both simulated and real-world environments demonstrate that MISCGrasp outperforms baseline and variant methods in tabletop decluttering tasks.
arXiv Detail & Related papers (2025-07-03T14:36:45Z)
- Singular Value Decomposition on Kronecker Adaptation for Large Language Model [0.8747606955991707]
Large pre-trained Transformer models achieve state-of-the-art results across diverse language and reasoning tasks. Full fine-tuning incurs substantial storage, memory, and computational overhead. We propose SoKA, a novel PEFT strategy that combines Kronecker-product tensor factorization with SVD-driven initialization and dynamic rank selection.
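The Kronecker-product core of such an update can be sketched in a few lines; the factor sizes below are assumptions for illustration, and SoKA's SVD-driven initialization and dynamic rank selection are omitted.

```python
# Kronecker-product weight update: the Kronecker product of two small
# factors yields a large, potentially full-rank update from very few
# trainable parameters.
import numpy as np

d_out = d_in = 768
rng = np.random.default_rng(0)
A = rng.normal(size=(24, 24))          # trainable factor (size assumed)
B = rng.normal(size=(32, 32))          # trainable factor (size assumed)

delta_w = np.kron(A, B)                # shape (24*32, 24*32) = (768, 768)
assert delta_w.shape == (d_out, d_in)
# rank(kron(A, B)) = rank(A) * rank(B), i.e. up to 24 * 32 = 768: the
# update can be full-rank while training only 24**2 + 32**2 = 1600
# parameters instead of 768**2.
```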
arXiv Detail & Related papers (2025-06-18T08:28:53Z)
- Dynamic Context-oriented Decomposition for Task-aware Low-rank Adaptation with Less Forgetting and Faster Convergence [131.41894248194995]
We propose context-oriented decomposition adaptation (CorDA), a novel method that initializes adapters in a task-aware manner. Thanks to this task awareness, our method enables two optional adaptation modes: knowledge-preserved mode (KPM) and instruction-previewed mode (IPM).
arXiv Detail & Related papers (2025-06-16T07:55:14Z)
- HiLAB: A Hybrid Inverse-Design Framework [0.0]
HiLAB is a new paradigm for the inverse design of nanophotonic structures. It addresses multi-functional device design by generating diverse freeform configurations at reduced simulation costs.
arXiv Detail & Related papers (2025-05-23T05:34:56Z)
- SeWA: Selective Weight Average via Probabilistic Masking [51.015724517293236]
We show that only a few checkpoints are needed to achieve better and faster convergence. We transform the discrete selection problem into a continuous subset optimization framework. We derive SeWA's stability bounds, which are sharper than existing bounds under both convex and non-convex settings.
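The toy sketch below shows one way such a continuous relaxation of checkpoint selection can look: each checkpoint gets a learnable logit, a sigmoid gives a soft inclusion probability, and the average is a normalized soft combination. The paper's exact relaxation is not reproduced; names and shapes here are illustrative.

```python
# Probabilistic masking over checkpoints: discrete subset selection
# relaxed into a differentiable weighted average.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
checkpoints = [rng.normal(size=(100,)) for _ in range(10)]  # toy weights
logits = np.zeros(len(checkpoints))                         # trainable

def soft_average(logits: np.ndarray) -> np.ndarray:
    probs = sigmoid(logits)               # soft "keep" probabilities
    weights = probs / probs.sum()         # normalize the soft subset
    return sum(w * c for w, c in zip(weights, checkpoints))

averaged = soft_average(logits)           # differentiable in the logits
# After optimizing the logits (e.g. against validation loss), a hard
# subset can be recovered by thresholding the probabilities.
```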
arXiv Detail & Related papers (2025-02-14T12:35:21Z)
- Parameter-Efficient Fine-Tuning via Selective Discrete Cosine Transform [10.565509997395504]
We propose a novel Selective Discrete Cosine Transformation (sDCTFT) fine-tuning scheme to push the efficiency frontier of PEFT.
Its general idea is to exploit the superior energy compaction and decorrelation properties of the DCT.
Experiments on four benchmark datasets demonstrate its superior accuracy, reduced computational cost, and lower storage requirements.
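The energy-compaction property that both sDCTFT and MaCP rely on is easy to verify numerically; the toy signal below is an assumption chosen for illustration.

```python
# Energy compaction of the DCT: for a smooth signal, a handful of
# leading coefficients carry almost all of the energy, so selecting
# only a few coefficients loses very little information.
import numpy as np
from scipy.fft import dct

t = np.linspace(0, 1, 512)
signal = np.sin(2 * np.pi * 3 * t) + 0.5 * np.sin(2 * np.pi * 7 * t)
coeffs = dct(signal, norm="ortho")       # orthonormal DCT preserves energy

k = 16
top = np.sort(np.abs(coeffs))[::-1]      # coefficients by magnitude
print(np.sum(top[:k] ** 2) / np.sum(top ** 2))
# Prints a ratio close to 1: the top 16 of 512 coefficients hold
# nearly all of the signal's energy.
```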
arXiv Detail & Related papers (2024-10-09T16:07:42Z)
- CWF: Consolidating Weak Features in High-quality Mesh Simplification [50.634070540791555]
We propose a smooth functional that simultaneously considers the competing requirements of high-quality mesh simplification.
The functional comprises a normal anisotropy term and a Centroidal Voronoi Tessellation (CVT) energy term.
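A schematic form of such a combined objective is sketched below; the CVT term is the classical Centroidal Voronoi Tessellation energy, while the anisotropy term's precise definition is paper-specific and only named here, with $\lambda$ an assumed balancing weight.

```latex
E(\{x_i\}) \;=\;
\underbrace{\sum_i \int_{\Omega_i} \rho(x)\,\lVert x - x_i\rVert^2 \,\mathrm{d}x}_{\text{CVT energy}}
\;+\; \lambda\, E_{\mathrm{normal}}(\{x_i\}),
```

where $x_i$ are the sites, $\Omega_i$ their Voronoi cells, and $\rho$ a density function.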
arXiv Detail & Related papers (2024-04-24T05:37:17Z)
- Less is KEN: a Universal and Simple Non-Parametric Pruning Algorithm for Large Language Models [1.5807079236265718]
KEN is a straightforward, universal, and unstructured pruning algorithm based on Kernel Density Estimation (KDE).
KEN aims to construct optimized transformers by selectively preserving the most significant parameters while restoring others to their pre-training state.
KEN-pruned models achieve equal or better performance than their original unpruned versions, with a minimum parameter reduction of 25%.
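A loose sketch of KDE-guided selective retention in this spirit follows; the density-based significance score and the 75% keep fraction are assumptions of this illustration, not KEN's actual selection criterion.

```python
# Keep parameters whose values the KDE scores highest (an assumed
# proxy for significance) and reset the rest to pre-trained values.
import numpy as np
from scipy.stats import gaussian_kde

def ken_like_reset(w_ft: np.ndarray, w_pre: np.ndarray,
                   keep: float = 0.75) -> np.ndarray:
    density = gaussian_kde(w_ft.ravel())(w_ft.ravel())  # per-value density
    cutoff = np.quantile(density, 1.0 - keep)
    mask = (density >= cutoff).reshape(w_ft.shape)
    return np.where(mask, w_ft, w_pre)   # restore ~25% to pre-training

rng = np.random.default_rng(0)
w_pre = rng.normal(size=(64, 64))                     # pre-trained weights
w_ft = w_pre + 0.1 * rng.normal(size=(64, 64))        # fine-tuned weights
w_out = ken_like_reset(w_ft, w_pre)
```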
arXiv Detail & Related papers (2024-02-05T16:11:43Z)
- Efficient Micro-Structured Weight Unification and Pruning for Neural Network Compression [56.83861738731913]
Deep Neural Network (DNN) models are essential for practical applications, especially on resource-limited devices.
Previous unstructured or structured weight pruning methods can hardly truly accelerate inference.
We propose a generalized weight unification framework at a hardware-compatible micro-structured level to achieve a high degree of compression and acceleration.
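One illustrative take on hardware-friendly micro-structuring is sketched below: weights are processed in small fixed blocks, low-magnitude blocks are pruned, and low-variance blocks are unified to a shared value. The block size, thresholds, and rules are assumptions of this sketch, not the paper's algorithm.

```python
# Block-wise weight unification and pruning: after this pass, each
# 4x4 block is either zero, a single shared scalar, or unchanged.
import numpy as np

def micro_structure(w: np.ndarray, block: int = 4,
                    prune_thr: float = 0.05, unify_thr: float = 0.02):
    out = w.copy()
    rows, cols = w.shape
    for i in range(0, rows, block):
        for j in range(0, cols, block):
            blk = out[i:i + block, j:j + block]   # view into `out`
            if np.abs(blk).mean() < prune_thr:
                blk[...] = 0.0                    # structured pruning
            elif blk.std() < unify_thr:
                blk[...] = blk.mean()             # weight unification
    return out

w = np.random.default_rng(0).normal(scale=0.1, size=(16, 16))
w_compressed = micro_structure(w)
```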
arXiv Detail & Related papers (2021-06-15T17:22:59Z)
- Efficient Semantic Image Synthesis via Class-Adaptive Normalization [116.63715955932174]
Class-adaptive normalization (CLADE) is a lightweight but equally effective variant that is adaptive only to the semantic class.
We introduce an intra-class positional map encoding, calculated from semantic layouts, to modulate the normalization parameters of CLADE.
The proposed CLADE can be generalized to different SPADE-based methods while achieving generation quality comparable to SPADE.
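The core of class-adaptive modulation can be sketched as a per-class parameter lookup gathered through the label map (the positional-map variant is omitted; tensor shapes below are chosen for illustration).

```python
# Class-adaptive modulation: gamma and beta are per-class parameters
# gathered via the semantic label map, instead of spatially-varying
# maps predicted from the layout as in SPADE.
import numpy as np

n_classes, channels, h, w = 10, 8, 32, 32
rng = np.random.default_rng(0)
gamma = rng.normal(loc=1.0, scale=0.02, size=(n_classes, channels))
beta = rng.normal(scale=0.02, size=(n_classes, channels))

x = rng.normal(size=(channels, h, w))               # feature map
labels = rng.integers(0, n_classes, size=(h, w))    # semantic layout

# Normalize per channel, then modulate with class-gathered parameters.
x_norm = (x - x.mean(axis=(1, 2), keepdims=True)) / x.std(axis=(1, 2), keepdims=True)
g = gamma[labels].transpose(2, 0, 1)                # (C, H, W)
b = beta[labels].transpose(2, 0, 1)
out = g * x_norm + b
```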
arXiv Detail & Related papers (2020-12-08T18:59:32Z)
- Balancing Rates and Variance via Adaptive Batch-Size for Stochastic Optimization Problems [120.21685755278509]
In this work, we seek to balance the need for an attenuating step-size for exact convergence against the fact that a constant step-size learns faster, albeit only up to an error neighborhood.
Rather than fixing the minibatch and the step-size at the outset, we propose to allow these parameters to evolve adaptively.
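The toy loop below illustrates the trade-off in question: the step-size stays constant while the minibatch grows, so the gradient-noise variance attenuates over time. The quadratic objective and growth schedule are assumptions for this illustration.

```python
# Constant step-size with an adaptively growing batch: as the batch
# grows, gradient noise shrinks and the iterate converges exactly.
import numpy as np

rng = np.random.default_rng(0)
theta, lr = np.zeros(2), 0.1
target = np.array([3.0, -2.0])

def noisy_grad(theta, batch_size):
    # Gradient of a quadratic loss plus noise whose standard deviation
    # shrinks as 1/sqrt(batch_size), mimicking minibatch averaging.
    noise = rng.normal(scale=1.0 / np.sqrt(batch_size), size=2)
    return (theta - target) + noise

for t in range(200):
    batch_size = min(1 + t // 10, 64)    # adaptively growing batch
    theta -= lr * noisy_grad(theta, batch_size)

print(theta)                              # close to target despite fixed lr
```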
arXiv Detail & Related papers (2020-07-02T16:02:02Z)
- Compression of descriptor models for mobile applications [26.498907514590165]
We evaluate the computational cost, model size, and matching accuracy tradeoffs for deep neural networks.
We observe a significant redundancy in the learned weights, which we exploit through the use of depthwise separable layers.
We propose the Convolution-Depthwise-Pointwise (CDP) layer, which provides a means of interpolating between the standard and depthwise separable convolutions.
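The interpolation axis such a layer exposes can be illustrated with the `groups` argument of a standard convolution: `groups=1` is a full convolution, `groups=channels` is depthwise, and intermediate values trade parameters for expressivity. This only sketches the idea; the CDP layer's exact formulation is not reproduced.

```python
# Interpolating between standard and depthwise separable convolution
# via grouped convolution plus a pointwise (1x1) mixing stage.
import torch
import torch.nn as nn

channels, g = 32, 4                 # g=1: standard conv, g=32: depthwise
layer = nn.Sequential(
    nn.Conv2d(channels, channels, kernel_size=3, padding=1, groups=g),
    nn.Conv2d(channels, channels, kernel_size=1),  # pointwise mix
)

x = torch.randn(1, channels, 16, 16)
y = layer(x)
n_params = sum(p.numel() for p in layer.parameters())
# The 3x3 stage's parameter count scales as channels**2 * 9 / g, so
# larger g moves the layer toward the cheap depthwise-separable end.
```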
arXiv Detail & Related papers (2020-01-09T17:00:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of the information presented and is not responsible for any consequences arising from its use.