Hyperparameter Tuning is All You Need for LISTA
- URL: http://arxiv.org/abs/2110.15900v1
- Date: Fri, 29 Oct 2021 16:35:38 GMT
- Title: Hyperparameter Tuning is All You Need for LISTA
- Authors: Xiaohan Chen, Jialin Liu, Zhangyang Wang, Wotao Yin
- Abstract summary: Learned Iterative Shrinkage-Thresholding Algorithm (LISTA) introduces the concept of unrolling an iterative algorithm and training it like a neural network.
We show that adding momentum to intermediate variables in the LISTA network achieves a better convergence rate.
We call this new ultra-lightweight network HyperLISTA.
- Score: 92.7008234085887
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learned Iterative Shrinkage-Thresholding Algorithm (LISTA) introduces the
concept of unrolling an iterative algorithm and training it like a neural
network. It has had great success on sparse recovery. In this paper, we show
that adding momentum to intermediate variables in the LISTA network achieves a
better convergence rate and, in particular, the network with instance-optimal
parameters is superlinearly convergent. Moreover, our new theoretical results
lead to a practical approach of automatically and adaptively calculating the
parameters of a LISTA network layer based on its previous layers. Perhaps most
surprisingly, such an adaptive-parameter procedure reduces the training of
LISTA to tuning only three hyperparameters from data: a new record set in the
context of the recent advances in trimming down LISTA complexity. We call this
new ultra-lightweight network HyperLISTA. Compared to state-of-the-art LISTA
models, HyperLISTA achieves almost the same performance on seen data
distributions and performs better when tested on unseen distributions
(specifically, those with different sparsity levels and nonzero magnitudes).
Code is available: https://github.com/VITA-Group/HyperLISTA.
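To make the idea concrete, here is a minimal NumPy sketch of the classical ISTA iteration for sparse recovery with a Nesterov-style momentum term added to the intermediate variable, as the abstract describes. This is not the HyperLISTA implementation: the step size, threshold, and momentum weight are hand-set constants, and the names `soft_threshold` and `momentum_ista` are illustrative placeholders, whereas the paper computes its per-layer parameters adaptively and tunes only three hyperparameters from data.
```python
# Minimal sketch (not the paper's code): ISTA for sparse recovery with a
# Nesterov-style momentum term on the intermediate variable. Step size,
# threshold, and momentum weight are hand-picked here, not the adaptively
# computed per-layer parameters of HyperLISTA.
import numpy as np

def soft_threshold(v, theta):
    # Proximal operator of theta * ||.||_1, applied elementwise.
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def momentum_ista(A, b, num_layers=16, lam=0.1, beta=0.3):
    # Unrolled iteration: each loop pass plays the role of one network layer.
    n = A.shape[1]
    L = np.linalg.norm(A, 2) ** 2       # Lipschitz constant of the gradient
    mu, theta = 1.0 / L, lam / L        # step size and threshold
    x_prev = x = np.zeros(n)
    for _ in range(num_layers):
        z = x + beta * (x - x_prev)     # momentum on the intermediate variable
        x_prev = x
        x = soft_threshold(z - mu * A.T @ (A @ z - b), theta)
    return x

# Toy usage: recover an 8-sparse vector from 64 noisy linear measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((64, 128)) / np.sqrt(64)
x_true = np.zeros(128)
x_true[rng.choice(128, size=8, replace=False)] = rng.standard_normal(8)
b = A @ x_true + 0.01 * rng.standard_normal(64)
x_hat = momentum_ista(A, b, num_layers=200)
print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```
With `beta = 0` this reduces to plain ISTA; LISTA replaces the fixed step size and threshold (and the matrices built from `A`) with learned per-layer values, and HyperLISTA in turn reduces those to three data-tuned hyperparameters.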
Related papers
- Just How Flexible are Neural Networks in Practice? [89.80474583606242]
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters.
In practice, however, we only find the solutions reachable through our training procedure, including the gradient-based optimizer and regularizers, which limits flexibility.
arXiv Detail & Related papers (2024-06-17T12:24:45Z)
- Low-Rank Representations Meets Deep Unfolding: A Generalized and Interpretable Network for Hyperspectral Anomaly Detection [41.50904949744355]
Current hyperspectral anomaly detection (HAD) benchmark datasets suffer from low resolution, simple backgrounds, and small detection-data size.
These factors also limit the robustness of the well-known low-rank representation (LRR) models.
We build a new set of HAD benchmark datasets, AIR-HAD for short, to improve the robustness of HAD algorithms in complex scenarios.
arXiv Detail & Related papers (2024-02-23T14:15:58Z)
- Parameter-efficient Tuning of Large-scale Multimodal Foundation Model [68.24510810095802]
We propose a graceful prompt framework for cross-modal transfer (Aurora) to overcome these challenges.
Considering the redundancy in existing architectures, we first use mode approximation to generate 0.1M trainable parameters for multimodal prompt tuning.
A thorough evaluation on six cross-modal benchmarks shows that it not only outperforms the state-of-the-art but even outperforms the full fine-tuning approach.
arXiv Detail & Related papers (2023-05-15T06:40:56Z)
- Hyperparameter Optimization through Neural Network Partitioning [11.6941692990626]
We propose a simple and efficient way to optimize hyperparameters in neural networks.
Our method partitions the training data and a neural network model into $K$ data shards and parameter partitions.
We demonstrate that the resulting objective can be used to optimize a variety of different hyperparameters in a single training run.
arXiv Detail & Related papers (2023-04-28T11:24:41Z)
- Hybrid ISTA: Unfolding ISTA With Convergence Guarantees Using Free-Form Deep Neural Networks [50.193061099112626]
It is promising to solve linear inverse problems by unfolding iterative algorithms as deep neural networks (DNNs) with learnable parameters.
Existing ISTA-based unfolded algorithms restrict the network architectures used for the iterative updates to a partial weight-coupling structure in order to guarantee convergence.
This paper is the first to provide a convergence-provable framework that enables free-form DNNs in ISTA-based unfolded algorithms (a generic unfolded layer with learnable weights is sketched after this list).
arXiv Detail & Related papers (2022-04-25T13:17:57Z)
- AUTOMATA: Gradient Based Data Subset Selection for Compute-Efficient Hyper-parameter Tuning [72.54359545547904]
We propose a gradient-based subset selection framework for hyperparameter tuning.
We show that using gradient-based data subsets for hyperparameter tuning achieves significantly faster turnaround times, with speedups of $3\times$-$30\times$.
arXiv Detail & Related papers (2022-03-15T19:25:01Z)
- Towards Robust and Automatic Hyper-Parameter Tunning [39.04604349338802]
We introduce a new class of HPO methods and explore how the low-rank factorization of the intermediate layers of a convolutional network can be used to define an analytical response surface.
We quantify how this surface behaves as a surrogate for model performance and show that it can be searched with a trust-region algorithm; we call the resulting method autoHyper.
arXiv Detail & Related papers (2021-11-28T05:27:34Z)
- Surrogate Model Based Hyperparameter Tuning for Deep Learning with SPOT [0.40611352512781856]
This article demonstrates how the architecture-level parameters of deep learning models implemented in Keras/TensorFlow can be optimized.
The tuning procedure is implemented 100% in R, the software environment for statistical computing.
arXiv Detail & Related papers (2021-05-30T21:16:51Z)
- Practical and sample efficient zero-shot HPO [8.41866793161234]
We provide an overview of available approaches and introduce two novel techniques to handle the problem.
The first is based on a surrogate model and adaptively chooses (dataset, configuration) pairs to query.
The second, for settings where finding, tuning, and testing a surrogate model is problematic, is a multi-fidelity technique that combines HyperBand with submodular optimization.
arXiv Detail & Related papers (2020-07-27T08:56:55Z)
- HyperSTAR: Task-Aware Hyperparameters for Deep Networks [52.50861379908611]
HyperSTAR is a task-aware method to warm-start HPO for deep neural networks.
It learns a dataset (task) representation along with the performance predictor directly from raw images.
It evaluates 50% fewer configurations than existing methods to achieve the best performance.
arXiv Detail & Related papers (2020-05-21T08:56:50Z)
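As a companion to the unrolling theme above (LISTA in the main abstract, Hybrid ISTA in the list), here is a minimal PyTorch sketch of one generic unfolded ISTA layer with learnable weights. It uses the standard LISTA-style parameterization rather than any specific construction from the listed papers, and the class name `UnfoldedISTALayer` is an illustrative placeholder.
```python
# A generic unfolded ISTA layer with learnable weights (standard LISTA-style
# parameterization), shown only to illustrate the "unrolling" idea; it is not
# the Hybrid ISTA or HyperLISTA construction.
import torch
import torch.nn as nn

class UnfoldedISTALayer(nn.Module):
    def __init__(self, A: torch.Tensor, step: float = 1.0):
        super().__init__()
        n = A.shape[1]
        # Initialize at the classical ISTA update, then let training adjust.
        self.W_b = nn.Parameter(step * A.t().clone())             # acts on measurements b
        self.W_x = nn.Parameter(torch.eye(n) - step * A.t() @ A)  # acts on the previous estimate
        self.theta = nn.Parameter(torch.tensor(0.1))              # learnable soft-threshold

    def forward(self, x, b):
        z = x @ self.W_x.t() + b @ self.W_b.t()
        return torch.sign(z) * torch.relu(z.abs() - self.theta)   # soft-thresholding

# Toy usage: stack a few layers and run a forward pass.
m, n, batch = 64, 128, 4
A = torch.randn(m, n) / m ** 0.5
L = torch.linalg.matrix_norm(A, ord=2).item() ** 2                # Lipschitz constant
layers = nn.ModuleList(UnfoldedISTALayer(A, step=1.0 / L) for _ in range(8))
x, b = torch.zeros(batch, n), torch.randn(batch, m)
for layer in layers:
    x = layer(x, b)
print(x.shape)  # torch.Size([4, 128])
```
Stacking a fixed number of such layers and training W_b, W_x, and theta end to end is what distinguishes a learned unrolled network from running plain ISTA with hand-set parameters.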
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.