Related papers: PTSBench: A Comprehensive Post-Training Sparsity Benchmark Towards Algorithms and Models

PTSBench: A Comprehensive Post-Training Sparsity Benchmark Towards Algorithms and Models

URL: http://arxiv.org/abs/2412.07268v1
Date: Tue, 10 Dec 2024 07:49:07 GMT
Title: PTSBench: A Comprehensive Post-Training Sparsity Benchmark Towards Algorithms and Models
Authors: Zining Wnag, Jinyang Guo, Ruihao Gong, Yang Yong, Aishan Liu, Yushi Huang, Jiaheng Liu, Xianglong Liu,
Abstract summary: PTSBench is the first comprehensive post-training sparsity benchmark towards algorithms and models.<n>We benchmark 10+ PTS general-pluggable fine-grained techniques on 3 typical tasks using over 40 off-the-shelf model architectures.<n>Our PTSBench can provide (1) new observations for a better understanding of the PTS algorithms, (2) in-depth and comprehensive evaluations for the sparsification ability of models, and (3) a well-structured and easy-integrate open-source framework.
Score: 39.56594737760323
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the increased attention to model efficiency, post-training sparsity (PTS) has become more and more prevalent because of its effectiveness and efficiency. However, there remain questions on better practice of PTS algorithms and the sparsification ability of models, which hinders the further development of this area. Therefore, a benchmark to comprehensively investigate the issues above is urgently needed. In this paper, we propose the first comprehensive post-training sparsity benchmark called PTSBench towards algorithms and models. We benchmark 10+ PTS general-pluggable fine-grained techniques on 3 typical tasks using over 40 off-the-shelf model architectures. Through extensive experiments and analyses, we obtain valuable conclusions and provide several insights from both algorithms and model aspects. Our PTSBench can provide (1) new observations for a better understanding of the PTS algorithms, (2) in-depth and comprehensive evaluations for the sparsification ability of models, and (3) a well-structured and easy-integrate open-source framework. We hope this work will provide illuminating conclusions and advice for future studies of post-training sparsity methods and sparsification-friendly model design. The code for our PTSBench is released at \href{https://github.com/ModelTC/msbench}{https://github.com/ModelTC/msbench}.

Related papers

Teaching Language Models to Reason with Tools [73.21700643314917]
We present emphHint-Engineering, a new data synthesis strategy that strategically injects diverse hints at optimal points within reasoning paths.<n>CoRT significantly enhances efficiency, reducing token usage by approximately 30% for the 32B model and 50% for the 1.5B model.
arXiv Detail & Related papers (2025-10-23T08:41:44Z)
Incorporating Pre-trained Diffusion Models in Solving the Schrödinger Bridge Problem [24.258114034494827]
Iterative Proportional Mean-Matching (IPMM), Iterative Proportional Terminus-Matching (IPTM), and Iterative Proportional Flow-Matching (IPFM)<n>Novel initialization strategies that use pre-trained SGMs to effectively train SB-based models.<n>Extensive experiments demonstrate the significant effectiveness and improvements of the proposed methods.
arXiv Detail & Related papers (2025-08-25T14:56:16Z)
KAT-V1: Kwai-AutoThink Technical Report [50.84483585850113]
We present Kwaipilot-AutoThink (KAT), an open-source 40B large language model developed to address the overthinking problem in reasoning-intensive tasks.<n>KAT dynamically switches between reasoning and non-reasoning modes based on task complexity.<n>We also propose Step-SRPO, a reinforcement learning algorithm that incorporates intermediate supervision into the GRPO framework.
arXiv Detail & Related papers (2025-07-11T04:07:10Z)
Shadow-FT: Tuning Instruct Model via Training on Paired Base Model [67.20706292627106]
Large language models (LLMs) consistently benefit from further fine-tuning on various tasks.<n>We propose a novel Shadow-FT framework to tune the Instruct models by leveraging the corresponding Base models.<n>Our proposed Shadow-FT introduces no additional parameters, is easy to implement, and significantly improves performance.
arXiv Detail & Related papers (2025-05-19T05:16:21Z)
A Systematic Literature Review of Parameter-Efficient Fine-Tuning for Large Code Models [2.171120568435925]
Large Language Models (LLMs) for code require significant computational resources for training and fine-tuning. To address this, the research community has increasingly turned to Efficient Fine-Tuning (PEFT) PEFT enables the adaptation of large models by updating only a small subset of parameters, rather than the entire model. Our study synthesizes findings from 27 peer-reviewed papers, identifying patterns in configuration strategies and adaptation trade-offs.
arXiv Detail & Related papers (2025-04-29T16:19:25Z)
Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute [61.00662702026523]
We propose a unified Test-Time Compute scaling framework that leverages increased inference-time instead of larger models. Our framework incorporates two complementary strategies: internal TTC and external TTC. We demonstrate our textbf32B model achieves a 46% issue resolution rate, surpassing significantly larger models such as DeepSeek R1 671B and OpenAI o1.
arXiv Detail & Related papers (2025-03-31T07:31:32Z)
Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis [89.60263788590893]
Post-training Quantization (PTQ) technique has been extensively adopted for large language models (LLMs) compression. Existing algorithms focus primarily on performance, overlooking the trade-off among model size, performance, and quantization bitwidth. We provide a novel benchmark for LLMs PTQ in this paper.
arXiv Detail & Related papers (2025-02-18T07:35:35Z)
High-Performance Few-Shot Segmentation with Foundation Models: An Empirical Study [64.06777376676513]
We develop a few-shot segmentation (FSS) framework based on foundation models. To be specific, we propose a simple approach to extract implicit knowledge from foundation models to construct coarse correspondence. Experiments on two widely used datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-09-10T08:04:11Z)
Stochastic Two Points Method for Deep Model Zeroth-order Optimization [32.459322001738144]
This paper introduces an efficient Two-Point (S2P) approach within the gradient-free regime. We present the theoretical convergence properties of S2P under the general and relaxed smoothness assumptions. We show that VS2P is highly effective in optimizing objectives for deep models.
arXiv Detail & Related papers (2024-02-02T18:39:40Z)
When Parameter-efficient Tuning Meets General-purpose Vision-language Models [65.19127815275307]
PETAL revolutionizes the training process by requiring only 0.5% of the total parameters, achieved through a unique mode approximation technique. Our experiments reveal that PETAL not only outperforms current state-of-the-art methods in most scenarios but also surpasses full fine-tuning models in effectiveness.
arXiv Detail & Related papers (2023-12-16T17:13:08Z)
Precise Learning of Source Code Contextual Semantics via Hierarchical Dependence Structure and Graph Attention Networks [28.212889828892664]
We propose a novel source code model embedded with hierarchical dependencies. We introduce the syntactic structural of the basic block, i.e., its corresponding AST, in source code model to provide sufficient information. The results show that our model reduces the scale of parameters by 50% and achieves 4% improvement on accuracy on program classification task.
arXiv Detail & Related papers (2021-11-20T04:03:42Z)
DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models [152.29364079385635]
As pre-trained models grow bigger, the fine-tuning process can be time-consuming and computationally expensive. We propose a framework for resource- and parameter-efficient fine-tuning by leveraging the sparsity prior in both weight updates and the final model weights. Our proposed framework, dubbed Dually Sparsity-Embedded Efficient Tuning (DSEE), aims to achieve two key objectives: (i) parameter efficient fine-tuning and (ii) resource-efficient inference.
arXiv Detail & Related papers (2021-10-30T03:29:47Z)
AutoBERT-Zero: Evolving BERT Backbone from Scratch [94.89102524181986]
We propose an Operation-Priority Neural Architecture Search (OP-NAS) algorithm to automatically search for promising hybrid backbone architectures. We optimize both the search algorithm and evaluation of candidate models to boost the efficiency of our proposed OP-NAS. Experiments show that the searched architecture (named AutoBERT-Zero) significantly outperforms BERT and its variants of different model capacities in various downstream tasks.
arXiv Detail & Related papers (2021-07-15T16:46:01Z)
Highly Efficient Knowledge Graph Embedding Learning with Orthogonal Procrustes Analysis [10.154836127889487]
Knowledge Graph Embeddings (KGEs) have been intensively explored in recent years due to their promise for a wide range of applications. This paper proposes a simple yet effective KGE framework which can reduce the training time and carbon footprint by orders of magnitudes.
arXiv Detail & Related papers (2021-04-10T03:55:45Z)

This list is automatically generated from the titles and abstracts of the papers in this site.