Assessing the Performance of Analog Training for Transfer Learning
- URL: http://arxiv.org/abs/2505.11067v1
- Date: Fri, 16 May 2025 10:02:32 GMT
- Title: Assessing the Performance of Analog Training for Transfer Learning
- Authors: Omobayode Fagbohungbe, Corey Lammie, Malte J. Rasch, Takashi Ando, Tayfun Gokmen, Vijay Narayanan
- Abstract summary: Analog in-memory computing promises fast, parallel, and energy-efficient deep learning training and transfer learning. A new algorithm, chopped TTv2 (c-TTv2), has been introduced, which leverages the chopped technique to address many of these challenges. In this paper, we assess the performance of the c-TTv2 algorithm for analog TL using a Swin-ViT model on a subset of the CIFAR100 dataset.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Analog in-memory computing is a next-generation computing paradigm that promises fast, parallel, and energy-efficient deep learning training and transfer learning (TL). However, achieving this promise has remained elusive due to a lack of suitable training algorithms. Analog memory devices exhibit asymmetric and non-linear switching behavior in addition to device-to-device variation, meaning that most, if not all, of the current off-the-shelf training algorithms cannot achieve good training outcomes. Also, recently introduced algorithms have received limited attention, as they require bi-directionally switching devices of unrealistically high symmetry and precision and are highly sensitive to device non-idealities. A new algorithm, chopped TTv2 (c-TTv2), has been introduced, which leverages the chopped technique to address many of the challenges mentioned above. In this paper, we assess the performance of the c-TTv2 algorithm for analog TL using a Swin-ViT model on a subset of the CIFAR100 dataset. We also investigate the robustness of our algorithm to changes in some device specifications, including weight transfer noise, symmetry point skew, and symmetry point variability.
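As a point of reference, the transfer-learning setup assessed above (a pretrained Swin-ViT with only its classifier head retrained on a subset of CIFAR100) can be sketched in plain PyTorch. This is a digital stand-in only: the 10-class subset, the hyperparameters, and the use of ordinary SGD in place of the analog c-TTv2 tile update are illustrative assumptions, not the paper's exact protocol.

```python
import torch
from torch import nn
from torchvision import datasets, models, transforms

# Pretrained Swin-ViT backbone; only the classifier head is retrained,
# mirroring the transfer-learning setting assessed in the paper.
model = models.swin_t(weights=models.Swin_T_Weights.IMAGENET1K_V1)
for p in model.parameters():
    p.requires_grad = False                          # freeze the backbone
model.head = nn.Linear(model.head.in_features, 10)   # assumed 10-class subset

tfm = transforms.Compose([
    transforms.Resize(224),                          # Swin-T expects 224x224
    transforms.ToTensor(),
    transforms.Normalize((0.5071, 0.4865, 0.4409), (0.2673, 0.2564, 0.2762)),
])
train = datasets.CIFAR100(root="data", train=True, download=True, transform=tfm)
subset_idx = [i for i, y in enumerate(train.targets) if y < 10]
loader = torch.utils.data.DataLoader(
    torch.utils.data.Subset(train, subset_idx), batch_size=64, shuffle=True)

# Ordinary SGD stands in for the analog c-TTv2 update simulated in the paper.
opt = torch.optim.SGD(model.head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for x, y in loader:
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()
```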
Related papers
- In-memory Training on Analog Devices with Limited Conductance States via Multi-tile Residual Learning
In-memory training typically requires at least 8-bit conductance states to match digital baselines. Many promising memristive devices, such as ReRAM, offer only about 4-bit resolution due to fabrication constraints. This paper proposes a residual learning framework that sequentially learns on multiple crossbar tiles to compensate for the residual errors.
arXiv Detail & Related papers (2025-10-02T19:44:25Z)
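The multi-tile residual idea above can be sketched abstractly: each limited-precision tile is trained to fit whatever error the previously frozen tiles leave behind, and the effective weight is the sum across tiles. The 4-bit quantizer, tile count, and idealized least-squares inner solve below are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def quantize(w, n_states=16, w_max=1.0):
    """Crude stand-in for a ~4-bit conductance grid."""
    step = 2 * w_max / (n_states - 1)
    return np.clip(np.round(w / step) * step, -w_max, w_max)

def fit_residual_tiles(X, y, n_tiles=3):
    """Each tile fits, in quantized form, the residual left by earlier tiles."""
    tiles, residual = [], y.copy()
    for _ in range(n_tiles):
        w, *_ = np.linalg.lstsq(X, residual, rcond=None)  # idealized inner solve
        w_q = quantize(w)
        tiles.append(w_q)
        residual = residual - X @ w_q    # next tile sees what this one missed
    return tiles

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))
w_true = rng.normal(size=8)
tiles = fit_residual_tiles(X, X @ w_true)
w_eff = np.sum(tiles, axis=0)            # effective weight = sum over tiles
print(np.linalg.norm(w_eff - w_true))    # error shrinks as tiles are added
```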
- Stochastic Primal-Dual Double Block-Coordinate for Two-way Partial AUC Maximization
Two-way partial AUC (TPAUC) is a critical performance metric for binary classification with imbalanced data. Existing algorithms for TPAUC optimization remain under-explored. We introduce two innovative stochastic primal-dual double block-coordinate algorithms for TPAUC optimization.
arXiv Detail & Related papers (2025-05-28T03:55:05Z)
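For context, two-way partial AUC restricts attention to the hardest examples on both sides of the decision boundary. A common pair-counting estimator (the exact objective optimized in the paper may differ) is sketched below; the score distributions are synthetic.

```python
import numpy as np

def tpauc_estimate(scores_pos, scores_neg, alpha=0.3, beta=0.3):
    """Fraction of correctly ranked pairs among the hardest examples:
    the bottom-alpha positives versus the top-beta negatives."""
    k_pos = max(1, int(alpha * len(scores_pos)))
    k_neg = max(1, int(beta * len(scores_neg)))
    hard_pos = np.sort(scores_pos)[:k_pos]     # lowest-scoring positives
    hard_neg = np.sort(scores_neg)[-k_neg:]    # highest-scoring negatives
    return float(np.mean(hard_pos[:, None] > hard_neg[None, :]))

rng = np.random.default_rng(0)
pos = rng.normal(1.0, 1.0, size=500)    # minority positive class
neg = rng.normal(0.0, 1.0, size=5000)   # imbalanced negative class
print(tpauc_estimate(pos, neg))         # much stricter than the full AUC
```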
- Towards Exact Gradient-based Training on Analog In-memory Computing
Inference on analog accelerators has been studied recently, but the training perspective is underexplored.
Recent studies have shown that the "workhorse" of digital AI training, the stochastic gradient descent (SGD) algorithm, converges inexactly when applied to model training on non-ideal devices.
This paper puts forth a theoretical foundation for gradient-based training on analog devices.
arXiv Detail & Related papers (2024-06-18T16:43:59Z)
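The inexactness has a simple mechanical origin. Under the soft-bounds device model commonly used in this literature (an illustrative choice here, not necessarily the paper's), up and down pulses shrink near the respective conductance bounds, so even a zero-mean update stream drags a weight toward the device's symmetry point instead of leaving it in place:

```python
import numpy as np

rng = np.random.default_rng(1)
w, w_max, w_min, eta = 0.8, 1.0, -1.0, 0.01
trace = []
for _ in range(2000):
    pulse = rng.choice([+1, -1])        # zero-mean pulse stream
    if pulse > 0:
        w += eta * (1 - w / w_max)      # potentiation shrinks near w_max
    else:
        w -= eta * (1 - w / w_min)      # depression shrinks near w_min
    trace.append(w)

# The weight drifts from 0.8 toward the symmetry point (here ~0),
# which is exactly the bias that breaks plain SGD on such devices.
print(np.mean(trace[-200:]))
```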
- Energy-based learning algorithms for analog computing: a comparative study
Energy-based learning algorithms have recently gained a surge of interest due to their compatibility with analog hardware.
We compare seven learning algorithms, namely variants of contrastive learning (CL), equilibrium propagation (EP), and coupled learning (CpL).
We find that negative perturbations are better than positive ones, and highlight the centered variant of EP as the best-performing algorithm.
arXiv Detail & Related papers (2023-12-22T22:49:58Z)
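The centered variant of EP estimates parameter gradients from two relaxations nudged in opposite directions, cancelling the first-order bias of the one-sided estimator. A toy with quadratic energy E(theta, s) = 0.5*(s - theta*x)**2 and cost C(s) = 0.5*(s - y)**2 admits a closed-form relaxation, so the estimate can be checked against the true gradient; the toy energy is an assumption for illustration only.

```python
def relax(theta, x, y, beta):
    """Closed-form minimizer of E + beta*C for this toy quadratic energy."""
    return (theta * x + beta * y) / (1.0 + beta)

def dE_dtheta(theta, x, s):
    """Partial derivative of E = 0.5*(s - theta*x)**2 with respect to theta."""
    return -x * (s - theta * x)

theta, x, y, beta = 0.7, 1.5, 2.0, 0.01
s_pos = relax(theta, x, y, +beta)   # state nudged toward the target
s_neg = relax(theta, x, y, -beta)   # state nudged away from the target
grad_centered = (dE_dtheta(theta, x, s_pos) - dE_dtheta(theta, x, s_neg)) / (2 * beta)
grad_true = x * (theta * x - y)     # d/dtheta of 0.5*(theta*x - y)**2
print(grad_centered, grad_true)     # agree up to O(beta**2)
```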
- In-Context Convergence of Transformers
We study the learning dynamics of a one-layer transformer with softmax attention trained via gradient descent.
For data with imbalanced features, we show that the learning dynamics take a stage-wise convergence process.
arXiv Detail & Related papers (2023-10-08T17:55:33Z)
- Sub-linear Regret in Adaptive Model Predictive Control
We present STT-MPC (Self-Tuning Tube-based Model Predictive Control), an online control scheme that combines the certainty-equivalence principle and polytopic tubes.
We analyze the regret of the algorithm relative to an algorithm with prior knowledge of the system dynamics.
arXiv Detail & Related papers (2023-10-07T15:07:10Z)
- Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection
This work first provides a comprehensive statistical theory for transformers performing in-context learning (ICL).
We show that transformers can implement a broad class of standard machine learning algorithms in context.
A single transformer can adaptively select different base ICL algorithms.
arXiv Detail & Related papers (2023-06-07T17:59:31Z)
- Fast offset corrected in-memory training
We propose and describe two new and improved algorithms for in-memory computing.
Chopped-TTv2 (c-TTv2) and Analog Gradient Accumulation with Dynamic reference (AGAD) retain the same runtime complexity but correct for any remaining offsets using choppers.
arXiv Detail & Related papers (2023-03-08T17:07:09Z)
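The chopper trick shared by c-TTv2 and AGAD can be shown in isolation: modulate the quantity being accumulated by a random +/-1 sequence and demodulate after the noisy read, so any additive offset is averaged toward zero. The read model below is a toy assumption, not either algorithm's full update path.

```python
import numpy as np

rng = np.random.default_rng(2)
g_true, offset, n = 0.3, 0.5, 1000
noisy_read = lambda v: v + offset + 0.05 * rng.normal(size=n)  # biased reads

plain = noisy_read(np.full(n, g_true)).mean()        # offset survives averaging

chop = rng.choice([-1.0, 1.0], size=n)               # random chopper signs
chopped = (chop * noisy_read(chop * g_true)).mean()  # de-chop after the read

print(plain, chopped)   # ~0.8 (biased) versus ~0.3 (offset averaged out)
```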
- Improved Algorithms for Neural Active Learning
We improve the theoretical and empirical performance of neural network (NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics, defined via the population loss, that are more suitable for active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z)
- TETRIS-ADAPT-VQE: An adaptive algorithm that yields shallower, denser circuit ansätze
We introduce an algorithm called TETRIS-ADAPT-VQE, which iteratively builds up variational ansätze a few operators at a time.
It results in denser but significantly shallower circuits, without increasing the number of CNOT gates or variational parameters.
These improvements bring us closer to the goal of demonstrating a practical quantum advantage on quantum hardware.
arXiv Detail & Related papers (2022-09-21T18:00:02Z)
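The "tetris" ingredient can be sketched as a greedy packing step: instead of appending only the single largest-gradient operator per iteration, append every high-gradient operator whose qubit support is disjoint from those already chosen, so the additions share one circuit layer. The operator pool, gradients, and supports below are made-up placeholders.

```python
def tetris_step(pool_gradients, supports, tol=1e-6):
    """pool_gradients: {op: |dE/dtheta|}; supports: {op: set of qubits}.
    Greedily packs disjoint-support operators into one circuit layer."""
    chosen, used_qubits = [], set()
    for op in sorted(pool_gradients, key=pool_gradients.get, reverse=True):
        if pool_gradients[op] > tol and supports[op].isdisjoint(used_qubits):
            chosen.append(op)             # fits alongside already-chosen ops
            used_qubits |= supports[op]
    return chosen                          # appended together: depth grows by 1

grads = {"op_a": 0.9, "op_b": 0.7, "op_c": 0.4, "op_d": 0.2}
sups = {"op_a": {0, 1}, "op_b": {1, 2}, "op_c": {2, 3}, "op_d": {0, 3}}
print(tetris_step(grads, sups))  # ['op_a', 'op_c']: disjoint, one layer deep
```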
- Task-Oriented Sensing, Computation, and Communication Integration for Multi-Device Edge AI
This paper studies a new multi-device edge artificial intelligence (AI) system, which jointly exploits AI-model split inference and integrated sensing and communication (ISAC).
We measure the inference accuracy by adopting an approximate but tractable metric, namely discriminant gain.
arXiv Detail & Related papers (2022-07-03T06:57:07Z)
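Assuming class-conditional Gaussian features with a shared covariance (an illustrative assumption; the paper defines the metric in terms of feature distributions), the pairwise discriminant gain reduces to a Mahalanobis distance between class centroids, and the worst-separated pair limits achievable inference accuracy:

```python
import numpy as np

def pairwise_discriminant_gain(mu_i, mu_j, cov):
    """Mahalanobis separation of two class centroids under shared covariance."""
    diff = mu_i - mu_j
    return float(diff @ np.linalg.solve(cov, diff))

rng = np.random.default_rng(3)
cov = np.diag(rng.uniform(0.5, 2.0, size=4))   # feature noise covariance
mu = rng.normal(size=(3, 4))                   # 3 class centroids, 4-dim features
gains = [pairwise_discriminant_gain(mu[i], mu[j], cov)
         for i in range(3) for j in range(i + 1, 3)]
print(min(gains))   # the worst-separated class pair bounds accuracy
```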
- Neural Network Training with Asymmetric Crosspoint Elements
Asymmetric conductance modulation of practical resistive devices critically degrades the classification accuracy of networks trained with conventional algorithms.
Here, we describe and experimentally demonstrate an alternative fully-parallel training algorithm: Hamiltonian Descent.
We provide critical intuition on why device asymmetry is fundamentally incompatible with conventional training algorithms and how the new approach exploits it as a useful feature instead.
arXiv Detail & Related papers (2022-01-31T17:41:36Z)
- Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning
We propose two single-timescale single-loop algorithms that require only one data point per step.
Our results are expressed in the form of simultaneous primal- and dual-side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.