Winning the Lottery by Preserving Network Training Dynamics with Concrete Ticket Search
- URL: http://arxiv.org/abs/2512.07142v1
- Date: Mon, 08 Dec 2025 03:48:51 GMT
- Title: Winning the Lottery by Preserving Network Training Dynamics with Concrete Ticket Search
- Authors: Tanay Arora, Christof Teuscher
- Abstract summary: The Lottery Ticket Hypothesis asserts the existence of highly sparse, trainable subnetworks ('winning tickets') within dense, randomly initialized neural networks. State-of-the-art methods of drawing these tickets, like Lottery Ticket Rewinding, are computationally prohibitive. In this work, we argue that the reliance of Pruning-at-Initialization (PaI) on first-order saliency metrics, which ignore inter-weight dependencies, contributes substantially to this performance gap.
- Score: 0.5156484100374058
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Lottery Ticket Hypothesis asserts the existence of highly sparse, trainable subnetworks ('winning tickets') within dense, randomly initialized neural networks. However, state-of-the-art methods of drawing these tickets, like Lottery Ticket Rewinding (LTR), are computationally prohibitive, while more efficient saliency-based Pruning-at-Initialization (PaI) techniques suffer from a significant accuracy-sparsity trade-off and fail basic sanity checks. In this work, we argue that PaI's reliance on first-order saliency metrics, which ignore inter-weight dependencies, contributes substantially to this performance gap, especially in the sparse regime. To address this, we introduce Concrete Ticket Search (CTS), an algorithm that frames subnetwork discovery as a holistic combinatorial optimization problem. By leveraging a Concrete relaxation of the discrete search space and a novel gradient balancing scheme (GRADBALANCE) to control sparsity, CTS efficiently identifies high-performing subnetworks near initialization without requiring sensitive hyperparameter tuning. Motivated by recent works on lottery ticket training dynamics, we further propose a knowledge distillation-inspired family of pruning objectives, finding that minimizing the reverse Kullback-Leibler divergence between sparse and dense network outputs (CTS-KL) is particularly effective. Experiments on a variety of image classification tasks show that CTS produces subnetworks that robustly pass sanity checks and achieve accuracy comparable to or exceeding LTR, while requiring only a small fraction of the computation. For example, with ResNet-20 on CIFAR-10, it reaches 99.3% sparsity with 74.0% accuracy in 7.9 minutes, while LTR attains the same sparsity with 68.3% accuracy in 95.2 minutes. CTS's subnetworks outperform saliency-based methods across all sparsities, but its advantage over LTR is most pronounced in the highly sparse regime.
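The abstract names three ingredients: a Concrete (Gumbel-Sigmoid) relaxation of the binary mask search space, a reverse-KL pruning objective between sparse and dense network outputs, and a GradBalance scheme that controls sparsity. The sketch below is an illustrative, unofficial reconstruction of that general shape in PyTorch; the class and helper names (ConcreteMask, MaskedLinear, reverse_kl_loss, cts_step) and the crude gradient-ratio balancing rule are assumptions of this note, not details taken from the paper.

```python
# Illustrative sketch only -- NOT the authors' implementation of CTS.
# Names and the sparsity-balancing rule below are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConcreteMask(nn.Module):
    """Per-weight Concrete (Gumbel-Sigmoid) gate over a layer's weights."""

    def __init__(self, weight_shape, temperature=0.5):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(weight_shape))  # mask logits
        self.temperature = temperature

    def forward(self):
        # Relaxed Bernoulli sample in (0, 1) while searching; hard 0/1 mask at eval.
        if self.training:
            u = torch.rand_like(self.logits).clamp(1e-6, 1 - 1e-6)
            noise = torch.log(u) - torch.log(1 - u)          # logistic noise
            return torch.sigmoid((self.logits + noise) / self.temperature)
        return (self.logits > 0).float()

    def expected_density(self):
        return torch.sigmoid(self.logits).mean()


class MaskedLinear(nn.Module):
    """Linear layer whose frozen weights are gated by a ConcreteMask."""

    def __init__(self, linear: nn.Linear, temperature=0.5):
        super().__init__()
        self.linear = linear
        for p in self.linear.parameters():
            p.requires_grad_(False)                          # only mask logits are trained
        self.mask = ConcreteMask(linear.weight.shape, temperature)

    def forward(self, x):
        return F.linear(x, self.linear.weight * self.mask(), self.linear.bias)


def reverse_kl_loss(sparse_logits, dense_logits):
    """KL(p_sparse || p_dense): the sparse (student) network is the first argument."""
    log_p_sparse = F.log_softmax(sparse_logits, dim=-1)
    log_p_dense = F.log_softmax(dense_logits, dim=-1)
    p_sparse = log_p_sparse.exp()
    return (p_sparse * (log_p_sparse - log_p_dense)).sum(dim=-1).mean()


def cts_step(masked_model, dense_model, masks, x, lam, target_density, opt):
    """One hypothetical ticket-search step. The ratio-based update of `lam`
    is a crude stand-in for the paper's GradBalance scheme."""
    with torch.no_grad():
        dense_logits = dense_model(x)            # frozen dense network outputs
    sparse_logits = masked_model(x)              # forward pass under sampled masks

    task_loss = reverse_kl_loss(sparse_logits, dense_logits)
    density = torch.stack([m.expected_density() for m in masks]).mean()
    sparsity_loss = (density - target_density).abs()

    # Keep the two gradient magnitudes on the mask logits comparable.
    g_task = torch.autograd.grad(task_loss, [m.logits for m in masks],
                                 retain_graph=True, allow_unused=True)
    g_spar = torch.autograd.grad(sparsity_loss, [m.logits for m in masks],
                                 retain_graph=True, allow_unused=True)
    total_norm = lambda gs: sum(g.norm() for g in gs if g is not None) + 1e-12
    lam = 0.9 * lam + 0.1 * float(total_norm(g_task) / total_norm(g_spar))

    opt.zero_grad()
    (task_loss + lam * sparsity_loss).backward()
    opt.step()
    return lam
```

In a full pipeline one would wrap each prunable layer this way, optimize only the mask logits for a short budget near initialization, binarize the masks (e.g. logits > 0), and then train the resulting subnetwork with the standard recipe. The actual GradBalance rule and the exact CTS objective are specified in the paper and may differ from the simplified ratio used here.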
Related papers
- The Quest for Winning Tickets in Low-Rank Adapters [24.58659526975649]
We investigate whether the Lottery Ticket Hypothesis (LTH) extends to parameter-efficient fine-tuning. Our key finding is that the LTH holds within Low-Rank Adaptation (LoRA) methods. We propose Partial-LoRA, a method that identifies such winning subnetworks and trains sparse low-rank adapters aligned with task-relevant subspaces of the pre-trained model.
arXiv Detail & Related papers (2025-12-27T06:39:08Z) - TwIST: Rigging the Lottery in Transformers with Independent Subnetwork Training [6.7228358095570995]
TwIST is a distributed training framework for efficient large language model sparsification. It trains multiple subnetworks in parallel, periodically aggregates their parameters, and resamples new subnetworks during training. It identifies high-quality subnetworks ("golden tickets") without requiring post-training procedures such as calibration or Hessian-based recovery.
arXiv Detail & Related papers (2025-11-06T02:13:24Z) - Slim-SC: Thought Pruning for Efficient Scaling with Self-Consistency [3.6199690908942546]
Self-Consistency (SC) generates multiple reasoning chains in parallel and selects the final answer via majority voting. We propose Slim-SC, a step-wise pruning strategy that identifies and removes redundant chains using inter-chain similarity at the thought level. Experiments show that Slim-SC reduces latency and KVC usage by up to 45% and 26%, respectively, with R1-Distill. (A minimal, hypothetical sketch of this similarity-based pruning idea appears after this list.)
arXiv Detail & Related papers (2025-09-17T14:00:51Z) - UniPTS: A Unified Framework for Proficient Post-Training Sparsity [67.16547529992928]
Post-training Sparsity (PTS) is a newly emerged avenue that pursues efficient network sparsity using only a limited amount of data.
In this paper, we attempt to reconcile this disparity by transposing three cardinal factors that profoundly alter the performance of conventional sparsity into the context of PTS.
Our framework, termed UniPTS, is validated to be much superior to existing PTS methods across extensive benchmarks.
arXiv Detail & Related papers (2024-05-29T06:53:18Z) - Towards Simple and Accurate Human Pose Estimation with Stair Network [34.421529219040295]
We develop a small yet discriminative model called STair Network, which can be stacked to form an accurate multi-stage pose estimation system.
To reduce computational cost, STair Network is composed of novel basic feature extraction blocks.
We demonstrate the effectiveness of the STair Network on two standard datasets.
arXiv Detail & Related papers (2022-02-18T10:37:13Z) - CATRO: Channel Pruning via Class-Aware Trace Ratio Optimization [61.71504948770445]
We propose a novel channel pruning method via Class-Aware Trace Ratio Optimization (CATRO) to reduce the computational burden and accelerate the model inference.
We show that CATRO achieves higher accuracy with similar cost or lower cost with similar accuracy than other state-of-the-art channel pruning algorithms.
Because of its class-aware property, CATRO is suitable to prune efficient networks adaptively for various classification subtasks, enhancing handy deployment and usage of deep networks in real-world applications.
arXiv Detail & Related papers (2021-10-21T06:26:31Z) - FreeTickets: Accurate, Robust and Efficient Deep Ensemble by Training
with Dynamic Sparsity [74.58777701536668]
We introduce the FreeTickets concept, which can boost the performance of sparse convolutional neural networks over their dense network equivalents by a large margin.
We propose two novel efficient ensemble methods with dynamic sparsity, which yield in one shot many diverse and accurate tickets "for free" during the sparse training process.
arXiv Detail & Related papers (2021-06-28T10:48:20Z) - Efficient Lottery Ticket Finding: Less Data is More [87.13642800792077]
The lottery ticket hypothesis (LTH) reveals the existence of winning tickets (sparse but critical subnetworks) for dense networks.
Finding winning tickets requires burdensome computations in the train-prune-retrain process.
This paper explores a new perspective on finding lottery tickets more efficiently, by doing so only with a specially selected subset of data.
arXiv Detail & Related papers (2021-06-06T19:58:17Z) - Enabling certification of verification-agnostic networks via
memory-efficient semidefinite programming [97.40955121478716]
We propose a first-order dual SDP algorithm that requires memory only linear in the total number of network activations.
We significantly improve L-inf verified robust accuracy from 1% to 88% and 6% to 40% respectively.
We also demonstrate tight verification of a quadratic stability specification for the decoder of a variational autoencoder.
arXiv Detail & Related papers (2020-10-22T12:32:29Z) - Fitting the Search Space of Weight-sharing NAS with Graph Convolutional
Networks [100.14670789581811]
We train a graph convolutional network to fit the performance of sampled sub-networks.
With this strategy, we achieve a higher rank correlation coefficient in the selected set of candidates.
arXiv Detail & Related papers (2020-04-17T19:12:39Z)
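As referenced in the Slim-SC entry above, the following is a minimal, hypothetical sketch of similarity-based chain pruning for self-consistency decoding. Slim-SC prunes step-wise at the thought level during generation; for brevity this sketch applies the same idea to completed chains. The embedding inputs, the 0.9 threshold, and the helper names are assumptions, not details from the paper.

```python
# Hypothetical illustration of similarity-based chain pruning for
# self-consistency; thresholds and helpers are assumptions, not details
# taken from the Slim-SC paper (which prunes step-wise during generation).
from collections import Counter

import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def prune_redundant_chains(chains, embeddings, threshold=0.9):
    """Keep a chain only if it is not too similar to an already-kept chain.

    chains     : list of (reasoning_text, final_answer) tuples
    embeddings : list of np.ndarray embeddings of each chain's thoughts
    threshold  : cosine similarity above which a chain counts as redundant
    """
    kept_idx = []
    for i in range(len(chains)):
        if all(cosine(embeddings[i], embeddings[j]) < threshold for j in kept_idx):
            kept_idx.append(i)
    return [chains[i] for i in kept_idx]


def majority_vote(chains):
    """Standard self-consistency answer selection over the surviving chains."""
    answers = [answer for _, answer in chains]
    return Counter(answers).most_common(1)[0][0]
```

Pruning redundant chains before voting reduces the number of chains that must be kept in memory (and their KV cache) while leaving the majority answer largely unchanged, which is the efficiency argument the Slim-SC summary makes.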
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.