Memory-Optimized Once-For-All Network
- URL: http://arxiv.org/abs/2409.05900v1
- Date: Thu, 5 Sep 2024 20:06:33 GMT
- Title: Memory-Optimized Once-For-All Network
- Authors: Maxime Girard, Victor Quétu, Samuel Tardieu, Van-Tam Nguyen, Enzo Tartaglione
- Abstract summary: The Memory-Optimized OFA (MOOFA) supernet is designed to maximize memory usage across different configurations.
Our code is available at https://github.com/MaximeGirard/memory-optimized-once-for-all.
- Score: 5.008189006630566
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deploying Deep Neural Networks (DNNs) on different hardware platforms is challenging due to varying resource constraints. Besides handcrafted approaches aiming at making deep models hardware-friendly, Neural Architecture Search (NAS) is emerging as a toolbox to craft more efficient DNNs without sacrificing performance. Among these, the Once-For-All (OFA) approach offers a solution by allowing the sampling of well-performing sub-networks from a single supernet -- this leads to evident advantages in terms of computation. However, OFA does not fully utilize the potential memory capacity of the target device, focusing instead on limiting maximum memory usage per layer. This leaves room for an unexploited potential in terms of model generalizability. In this paper, we introduce a Memory-Optimized OFA (MOOFA) supernet, designed to enhance DNN deployment on resource-limited devices by maximizing memory usage (and, in turn, feature diversity) across different configurations. Tested on ImageNet, our MOOFA supernet demonstrates improvements in memory exploitation and model accuracy compared to the original OFA supernet. Our code is available at https://github.com/MaximeGirard/memory-optimized-once-for-all.
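To make the supernet idea concrete, the sketch below samples a hypothetical sub-network configuration from an OFA-style search space and crudely estimates its peak per-layer activation memory; the search space, layer shapes, and constants are illustrative assumptions, not the MOOFA implementation.

```python
import random

# Hypothetical OFA-style search space: per-stage choices of depth, width and kernel size.
SEARCH_SPACE = {
    "depth": [2, 3, 4],           # blocks per stage
    "expand_ratio": [3, 4, 6],    # channel expansion inside a block
    "kernel_size": [3, 5, 7],
}

def sample_subnet(num_stages=5):
    """Sample one sub-network configuration from the supernet's search space."""
    return [{k: random.choice(v) for k, v in SEARCH_SPACE.items()} for _ in range(num_stages)]

def peak_activation_memory(config, in_ch=16, resolution=56, bytes_per_value=1):
    """Rough per-layer activation-memory estimate.

    OFA caps the worst-case per-layer memory; MOOFA instead tries to keep every
    sampled configuration close to the device budget so more features fit.
    """
    peak, ch, res = 0, in_ch, resolution
    for stage in config:
        for _ in range(stage["depth"]):
            expanded = ch * stage["expand_ratio"]
            peak = max(peak, expanded * res * res * bytes_per_value)
        ch, res = ch * 2, res // 2    # assume channels double and resolution halves per stage
    return peak

cfg = sample_subnet()
print(f"sampled config: ~{peak_activation_memory(cfg) / 1024:.0f} KiB peak activations")
```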
Related papers
- RNC: Efficient RRAM-aware NAS and Compilation for DNNs on Resource-Constrained Edge Devices [0.30458577208819987]
We aim to develop edge-friendly deep neural networks (DNNs) for accelerators based on resistive random-access memory (RRAM).
We propose an edge compilation and resource-constrained RRAM-aware neural architecture search (NAS) framework to search for optimized neural networks meeting specific hardware constraints.
The resulting model from NAS optimized for speed achieved 5x-30x speedup.
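As a rough illustration of hardware-constrained NAS (not the RNC framework itself), the sketch below filters randomly sampled candidates against assumed RRAM crossbar and buffer limits before scoring them with a stand-in accuracy proxy; every constant and the proxy itself are assumptions.

```python
import random

# Assumed limits for an RRAM-based accelerator (illustrative numbers only).
MAX_CROSSBAR_CELLS = 256 * 256 * 2    # programmable weight cells (two 256x256 crossbars)
MAX_ACTIVATION_KB = 320               # on-chip activation buffer

def random_candidate():
    return {"layers": random.randint(4, 12), "width": random.choice([16, 32, 64])}

def fits_hardware(cand):
    weight_cells = cand["layers"] * cand["width"] ** 2 * 9       # 3x3 kernels
    activation_kb = cand["width"] * 32 * 32 / 1024               # one 32x32 feature map
    return weight_cells <= MAX_CROSSBAR_CELLS and activation_kb <= MAX_ACTIVATION_KB

def proxy_score(cand):
    # Stand-in for a trained accuracy predictor or measured validation accuracy.
    return 0.5 * cand["layers"] + 0.1 * cand["width"]

feasible = [c for c in (random_candidate() for _ in range(1000)) if fits_hardware(c)]
print("best feasible candidate:", max(feasible, key=proxy_score))
```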
arXiv Detail & Related papers (2024-09-27T15:35:36Z)
- Multi-objective Differentiable Neural Architecture Search [58.67218773054753]
We propose a novel NAS algorithm that encodes user preferences for the trade-off between performance and hardware metrics.
Our method outperforms existing MOO NAS methods across a broad range of qualitatively different search spaces and datasets.
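One common way to encode such preferences, shown here as a generic sketch rather than the paper's exact method, is to scalarize the task loss and differentiable hardware estimates with user-chosen weights before updating the architecture parameters; the toy latency and energy models below are assumptions.

```python
import torch

def scalarized_loss(task_loss, latency_ms, energy_mj, prefs=(0.7, 0.2, 0.1)):
    """Combine accuracy and hardware objectives with user preference weights."""
    w_acc, w_lat, w_en = prefs
    return w_acc * task_loss + w_lat * latency_ms + w_en * energy_mj

# Toy differentiable stand-ins: relaxed architecture weights and linear hardware models.
alpha = torch.tensor([0.3, 0.7], requires_grad=True)
task_loss = (alpha - torch.tensor([1.0, 0.0])).pow(2).sum()
latency = 2.0 * alpha[0] + 5.0 * alpha[1]       # e.g. lookup-table latency per operator
energy = 1.0 * alpha[0] + 3.0 * alpha[1]
scalarized_loss(task_loss, latency, energy).backward()
print(alpha.grad)                               # gradient reflects the chosen trade-off
```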
arXiv Detail & Related papers (2024-02-28T10:09:04Z)
- SwapNet: Efficient Swapping for DNN Inference on Edge AI Devices Beyond the Memory Budget [18.63754969602021]
Deep neural networks (DNNs) on edge artificial intelligence (AI) devices enable various autonomous mobile computing applications.
Existing solutions, such as model compression or cloud offloading, reduce the memory footprint of DNN inference.
We develop SwapNet, an efficient block swapping ecosystem for edge AI devices.
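A toy sketch of the block-swapping idea (not SwapNet's actual runtime): each block's weights live on storage and are loaded only while that block executes, so resident memory stays near one block's size; the file layout and block shapes are assumptions.

```python
import os, tempfile
import numpy as np

class SwappedBlock:
    """Keep a block's weights on disk and load them only for the block's forward pass."""
    def __init__(self, weight, directory, idx):
        self.path = os.path.join(directory, f"block_{idx}.npy")
        np.save(self.path, weight)            # spill weights to storage
    def forward(self, x):
        w = np.load(self.path)                # swap in just before use
        y = np.maximum(x @ w, 0.0)            # ReLU(x W)
        del w                                 # weights become collectable again (swap out)
        return y

tmpdir = tempfile.mkdtemp()
blocks = [SwappedBlock(np.random.randn(64, 64).astype(np.float32), tmpdir, i) for i in range(4)]
x = np.random.randn(1, 64).astype(np.float32)
for block in blocks:                          # only one block's weights resident at a time
    x = block.forward(x)
print(x.shape)
```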
arXiv Detail & Related papers (2024-01-30T05:29:49Z)
- Heterogenous Memory Augmented Neural Networks [84.29338268789684]
We introduce a novel heterogeneous memory augmentation approach for neural networks.
By introducing learnable memory tokens with an attention mechanism, we can effectively boost performance without huge computational overhead.
We show our approach on various image and graph-based tasks under both in-distribution (ID) and out-of-distribution (OOD) conditions.
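A minimal sketch of memory tokens with attention, with dimensions and placement chosen as assumptions rather than taken from the paper: learnable tokens are concatenated to the input sequence before a standard multi-head self-attention layer.

```python
import torch
import torch.nn as nn

class MemoryAugmentedAttention(nn.Module):
    """Prepend learnable memory tokens to the sequence, then apply self-attention."""
    def __init__(self, dim=64, num_mem=8, num_heads=4):
        super().__init__()
        self.memory = nn.Parameter(0.02 * torch.randn(1, num_mem, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):                            # x: (batch, seq, dim)
        mem = self.memory.expand(x.size(0), -1, -1)  # share the tokens across the batch
        z = torch.cat([mem, x], dim=1)
        out, _ = self.attn(z, z, z)
        return out[:, mem.size(1):]                  # drop the memory positions at the output

layer = MemoryAugmentedAttention()
print(layer(torch.randn(2, 16, 64)).shape)           # torch.Size([2, 16, 64])
```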
arXiv Detail & Related papers (2023-10-17T01:05:28Z)
- Combining Multi-Objective Bayesian Optimization with Reinforcement Learning for TinyML [4.2019872499238256]
We propose a novel strategy for deploying Deep Neural Networks on microcontrollers (TinyML) based on Multi-Objective Bayesian optimization (MOBOpt)
Our methodology aims at efficiently finding tradeoffs between a DNN's predictive accuracy, memory consumption on a given target system, and computational complexity.
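Whatever optimizer drives the search, its output is typically a Pareto front over the competing objectives; the small filter below, using invented numbers, illustrates how dominated configurations are discarded.

```python
def dominates(a, b):
    """True if config a is at least as good as b on every objective and strictly better on one.

    Each config is a tuple of objectives to minimize, e.g. (top-1 error, RAM in kB, MMACs).
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(configs):
    return [c for c in configs if not any(dominates(o, c) for o in configs if o is not c)]

candidates = [                     # illustrative numbers only
    (0.12, 220, 18),
    (0.10, 310, 25),
    (0.15, 180, 12),
    (0.11, 330, 30),               # dominated by (0.10, 310, 25)
]
print(pareto_front(candidates))
```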
arXiv Detail & Related papers (2023-05-23T14:31:52Z)
- Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization that maximizes data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
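For context, the generic bottleneck-adapter pattern that such models build on is sketched below (the specific adapter-ALBERT design is not reproduced here): a small trainable down/up projection is added to a frozen backbone layer so that only the adapter changes per task.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: frozen backbone activations pass through a small trainable MLP."""
    def __init__(self, dim=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, h):
        # Residual connection keeps the frozen backbone's features intact.
        return h + self.up(torch.relu(self.down(h)))

h = torch.randn(2, 16, 768)          # hidden states from a frozen transformer layer
print(Adapter()(h).shape)            # torch.Size([2, 16, 768])
```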
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
- SmartSAGE: Training Large-scale Graph Neural Networks using In-Storage Processing Architectures [0.7792020418343023]
Graph neural networks (GNNs) extract features by learning both the representation of each object (i.e., graph node) and the relationships across different objects.
Despite their strengths, deploying these algorithms in a production environment is challenging, as the number of graph nodes and edges reaches the scale of billions to hundreds of billions.
In this work, we first conduct a detailed characterization of a state-of-the-art, large-scale GNN training algorithm, GraphSAGE.
Based on the characterization, we then explore the feasibility of utilizing capacity-optimized NVM for storing memory-hungry GNN data.
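For reference, the GraphSAGE layer being characterized combines each node's own features with an aggregate of its neighbors' features; a minimal mean-aggregator sketch with assumed shapes:

```python
import numpy as np

def sage_layer(features, neighbors, w_self, w_neigh):
    """One GraphSAGE layer with mean aggregation.

    features: (N, d) node features; neighbors: list of neighbor-index lists per node.
    """
    out = np.empty((features.shape[0], w_self.shape[1]))
    for v, nbrs in enumerate(neighbors):
        agg = features[nbrs].mean(axis=0) if nbrs else np.zeros(features.shape[1])
        out[v] = features[v] @ w_self + agg @ w_neigh
    return np.maximum(out, 0.0)      # ReLU

feats = np.random.randn(4, 8)
adjacency = [[1, 2], [0], [0, 3], [2]]
print(sage_layer(feats, adjacency, np.random.randn(8, 16), np.random.randn(8, 16)).shape)
```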
arXiv Detail & Related papers (2022-05-10T07:25:30Z)
- MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning [72.80896338009579]
We find that the memory bottleneck is due to the imbalanced memory distribution in convolutional neural network (CNN) designs.
We propose a generic patch-by-patch inference scheduling, which significantly cuts down the peak memory.
We automate the process with neural architecture search to jointly optimize the neural architecture and inference scheduling, leading to MCUNetV2.
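The sketch below illustrates patch-by-patch inference on a toy first stage (an assumed simplification, not the MCUNetV2 scheduler): only one spatial patch's activations are alive at a time, so peak memory scales with the patch size.

```python
import torch
import torch.nn as nn

stage = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())   # memory-heavy early stage

def patch_inference(x, patch=32):
    """Evaluate the stage patch-by-patch instead of on the whole feature map.

    With a 3x3 kernel, exact equivalence to whole-image inference needs overlapping
    halo pixels at patch borders; that bookkeeping is omitted here for brevity.
    """
    _, _, H, W = x.shape
    rows = []
    for i in range(0, H, patch):
        cols = [stage(x[:, :, i:i + patch, j:j + patch]) for j in range(0, W, patch)]
        rows.append(torch.cat(cols, dim=3))
    return torch.cat(rows, dim=2)

with torch.no_grad():
    y = patch_inference(torch.randn(1, 3, 128, 128))
print(y.shape)    # torch.Size([1, 16, 128, 128]) -- same output size, smaller peak memory
```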
arXiv Detail & Related papers (2021-10-28T17:58:45Z)
- Enabling Homomorphically Encrypted Inference for Large DNN Models [1.0679692136113117]
Homomorphic encryption (HE) enables inference on encrypted data, but it incurs 100x to 10,000x memory and runtime overheads.
Secure deep neural network (DNN) inference using HE is currently limited by computing and memory resources.
We explore the feasibility of leveraging hybrid memory systems comprised of DRAM and persistent memory.
arXiv Detail & Related papers (2021-03-30T07:53:34Z)
- PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in design space.
With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to re-gain and guarantee high hardware efficiency.
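A minimal sketch of kernel-pattern pruning (the pattern below is an assumption, not one of PatDNN's actual pattern sets): every 3x3 kernel keeps the same few positions, a regularity that a compiler can turn into efficient code.

```python
import torch

# One assumed 4-entry pattern for 3x3 kernels (1 = keep, 0 = prune).
PATTERN = torch.tensor([[0., 1., 0.],
                        [1., 1., 1.],
                        [0., 0., 0.]])

def apply_pattern(conv_weight):
    """Zero out kernel entries outside the pattern; conv_weight: (out_ch, in_ch, 3, 3)."""
    return conv_weight * PATTERN          # broadcasts over the channel dimensions

w = torch.randn(16, 8, 3, 3)
kept = (apply_pattern(w) != 0).float().mean().item()
print(f"{kept:.0%} of kernel weights kept")   # ~44% with this 4-of-9 pattern
```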
arXiv Detail & Related papers (2020-01-01T04:52:07Z)
- Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
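A simplified layer-wise fusion sketch, using greedy neuron matching as a stand-in for the paper's optimal-transport alignment: model B's output neurons are permuted to best match model A's before the two weight matrices are averaged.

```python
import numpy as np

def fuse_layer(wa, wb):
    """Align wb's output neurons to wa by greedy cosine matching, then average the weights.

    wa, wb: (out_dim, in_dim) weight matrices of the same layer from two models.
    """
    na = wa / np.linalg.norm(wa, axis=1, keepdims=True)
    nb = wb / np.linalg.norm(wb, axis=1, keepdims=True)
    sim = na @ nb.T                                    # pairwise neuron similarity
    perm, used = np.zeros(wa.shape[0], dtype=int), set()
    for i in np.argsort(-sim.max(axis=1)):             # most confident rows first
        j = next(j for j in np.argsort(-sim[i]) if j not in used)
        perm[i] = j
        used.add(j)
    return 0.5 * (wa + wb[perm])

wa, wb = np.random.randn(8, 4), np.random.randn(8, 4)
print(fuse_layer(wa, wb).shape)                         # (8, 4)
```

A full fusion would also propagate each layer's permutation to the next layer's input weights; the paper solves the alignment itself with optimal transport rather than greedy matching.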
arXiv Detail & Related papers (2019-10-12T22:07:15Z)