Teleportation With Null Space Gradient Projection for Optimization Acceleration
- URL: http://arxiv.org/abs/2502.11362v1
- Date: Mon, 17 Feb 2025 02:27:16 GMT
- Title: Teleportation With Null Space Gradient Projection for Optimization Acceleration
- Authors: Zihao Wu, Juncheng Dong, Ahmed Aloui, Vahid Tarokh
- Abstract summary: We introduce an algorithm that projects the gradient of the teleportation objective function onto the input null space.
Our approach is readily generalizable from MLPs to CNNs, transformers, and potentially other advanced architectures.
- Score: 31.641252776379957
- License:
- Abstract: Optimization techniques have become increasingly critical due to the ever-growing model complexity and data scale. In particular, teleportation has emerged as a promising approach, which accelerates convergence of gradient descent-based methods by navigating within the loss invariant level set to identify parameters with advantageous geometric properties. Existing teleportation algorithms have primarily demonstrated their effectiveness in optimizing Multi-Layer Perceptrons (MLPs), but their extension to more advanced architectures, such as Convolutional Neural Networks (CNNs) and Transformers, remains challenging. Moreover, they often impose significant computational demands, limiting their applicability to complex architectures. To this end, we introduce an algorithm that projects the gradient of the teleportation objective function onto the input null space, effectively preserving the teleportation within the loss invariant level set and reducing computational cost. Our approach is readily generalizable from MLPs to CNNs, transformers, and potentially other advanced architectures. We validate the effectiveness of our algorithm across various benchmark datasets and optimizers, demonstrating its broad applicability.
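The abstract's core idea can be illustrated for a single linear layer: any weight update lying in the null space of the layer's inputs leaves the outputs, and hence the loss, unchanged, so projecting the teleportation gradient onto that null space keeps the search on the loss-invariant level set. A minimal NumPy sketch under that assumption (function name and shapes are illustrative, not taken from the paper):

```python
import numpy as np

def null_space_projection(grad_W, X):
    """Project a weight-space gradient onto the null space of the layer
    inputs X, so that moving along the projected gradient leaves the
    layer outputs unchanged: (grad_proj @ X) == 0."""
    # Orthonormal basis U for the column space of X via a thin SVD.
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    rank = np.sum(s > 1e-10 * s.max())
    U = U[:, :rank]
    # Subtract the component of each gradient row that lies in span(X).
    return grad_W - (grad_W @ U) @ U.T

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 5))   # inputs: 8 features, 5 samples
W = rng.standard_normal((4, 8))   # layer weights: Y = W @ X
G = rng.standard_normal((4, 8))   # gradient of a teleportation objective

G_proj = null_space_projection(G, X)
# A step along G_proj does not change the layer outputs:
Y_before = W @ X
Y_after = (W + 0.5 * G_proj) @ X
print(np.allclose(Y_before, Y_after))  # True
```

Because `G_proj @ X == 0`, the update changes the parameters without changing what the layer computes on the observed inputs, which is exactly the level-set constraint the paper's projection is designed to preserve.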
Related papers
- Accelerated Gradient-based Design Optimization Via Differentiable Physics-Informed Neural Operator: A Composites Autoclave Processing Case Study [0.0]
We introduce a novel Physics-Informed DeepONet (PIDON) architecture to effectively model the nonlinear behavior of complex engineering systems.
We demonstrate the effectiveness of this framework in the optimization of aerospace-grade composites curing processes, achieving a 3x speedup.
The proposed model has the potential to be used as a scalable and efficient optimization tool for broader applications in advanced engineering and digital twin systems.
arXiv Detail & Related papers (2025-02-17T07:11:46Z) - Implementing transferable annealing protocols for combinatorial optimisation on neutral atom quantum processors: a case study on smart-charging of electric vehicles [1.53934570513443]
In this paper, we build on the promising potential of parameter transferability across problem instances with similar local structures.
Our study reveals that, for Maximum Independent Set problems on graph families with shared geometries, optimal parameters naturally concentrate.
We apply this method to address a smart-charging optimisation problem on a real dataset.
arXiv Detail & Related papers (2024-11-25T18:41:02Z) - Generalizable Spacecraft Trajectory Generation via Multimodal Learning with Transformers [14.176630393074149]
We present a novel trajectory generation framework that generalizes across diverse problem configurations.
We leverage high-capacity transformer neural networks capable of learning from diverse data sources.
The framework is validated through simulations and experiments on a free-flyer platform.
arXiv Detail & Related papers (2024-10-15T15:55:42Z) - Deep-Unfolding for Next-Generation Transceivers [49.338084953253755]
The stringent performance requirements of future wireless networks are spurring studies on defining the next-generation multiple-input multiple-output (MIMO) transceivers.
For the design of advanced transceivers in wireless communications, optimization approaches often leading to iterative algorithms have achieved great success.
Deep learning, which approximates the iterative algorithms with deep neural networks (DNNs), can significantly reduce the computational time.
Deep-unfolding has emerged to incorporate the benefits of both deep learning and iterative algorithms, by unfolding the iterative algorithm into a layer-wise structure.
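As a toy instance of the deep-unfolding idea described above, K iterations of gradient descent on a least-squares problem can be written as a K-layer network whose per-layer step sizes become trainable parameters. A hedged sketch (all names are illustrative; this is not the transceiver design from the paper, and the step sizes here are fixed rather than learned):

```python
import numpy as np

def unfolded_gd(A, b, alpha):
    """Unfold gradient descent on ||A x - b||^2: each iteration is one
    'layer', and each step size alpha[k] would be a trainable parameter
    in a genuine deep-unfolding network."""
    x = np.zeros(A.shape[1])
    for a_k in alpha:                      # one layer per iteration
        x = x - a_k * A.T @ (A @ x - b)    # classic GD step as the layer map
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 5))
x_true = rng.standard_normal(5)
b = A @ x_true

alpha = [0.01] * 50                        # 50 "layers" with fixed steps
x_hat = unfolded_gd(A, b, alpha)
print(np.linalg.norm(A @ x_hat - b))       # residual shrinks toward 0
```

Training the `alpha` values (and possibly per-layer matrices) end-to-end is what lets unfolded networks match the iterative algorithm's accuracy in far fewer layers than the original iteration count.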
arXiv Detail & Related papers (2023-05-15T02:13:41Z) - Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
Adapter-ALBERT is an efficient model optimization that achieves maximal data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z) - Full Stack Optimization of Transformer Inference: a Survey [58.55475772110702]
Transformer models achieve superior accuracy across a wide range of applications.
The amount of compute and bandwidth required for inference of recent Transformer models is growing at a significant rate.
There has been an increased focus on making Transformer models more efficient.
arXiv Detail & Related papers (2023-02-27T18:18:13Z) - A mechanistic-based data-driven approach to accelerate structural topology optimization through finite element convolutional neural network (FE-CNN) [5.469226380238751]
A mechanistic data-driven approach is proposed to accelerate structural topology optimization.
Our approach can be divided into two stages: offline training, and online optimization.
Numerical examples demonstrate that this approach can accelerate optimization by up to an order of magnitude in computational time.
arXiv Detail & Related papers (2021-06-25T14:11:45Z) - Reconfigurable Intelligent Surface Assisted Mobile Edge Computing with Heterogeneous Learning Tasks [53.1636151439562]
Mobile edge computing (MEC) provides a natural platform for AI applications.
We present an infrastructure to perform machine learning tasks at an MEC with the assistance of a reconfigurable intelligent surface (RIS).
Specifically, we minimize the learning error of all participating users by jointly optimizing transmit power of mobile users, beamforming vectors of the base station, and the phase-shift matrix of the RIS.
arXiv Detail & Related papers (2020-12-25T07:08:50Z) - Adaptive pruning-based optimization of parameterized quantum circuits [62.997667081978825]
Variational hybrid quantum-classical algorithms are powerful tools to maximize the use of Noisy Intermediate-Scale Quantum devices.
We propose a strategy for such ansätze used in variational quantum algorithms, which we call "Parameter-Efficient Circuit Training" (PECT).
Instead of optimizing all of the ansatz parameters at once, PECT launches a sequence of variational algorithms.
arXiv Detail & Related papers (2020-10-01T18:14:11Z) - Optimization-driven Machine Learning for Intelligent Reflecting Surfaces Assisted Wireless Networks [82.33619654835348]
Intelligent reflecting surface (IRS) has been employed to reshape wireless channels by controlling the phase shifts of individual scattering elements.
Due to the large number of scattering elements, passive beamforming is typically challenged by high computational complexity.
In this article, we focus on machine learning (ML) approaches for performance optimization in IRS-assisted wireless networks.
arXiv Detail & Related papers (2020-08-29T08:39:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.