Boosting Mobile CNN Inference through Semantic Memory
- URL: http://arxiv.org/abs/2112.02644v1
- Date: Sun, 5 Dec 2021 18:18:31 GMT
- Title: Boosting Mobile CNN Inference through Semantic Memory
- Authors: Yun Li, Chen Zhang, Shihao Han, Li Lyna Zhang, Baoqun Yin, Yunxin Liu,
Mengwei Xu
- Abstract summary: We develop a semantic memory design to improve on-device CNN inference.
SMTM employs a hierarchical memory architecture to leverage the long-tail distribution of objects of interest.
It can significantly speed up model inference over the standard approach (up to 2X) and prior cache designs (up to 1.5X), with acceptable accuracy loss.
- Score: 12.45440733435801
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human brains are known to be capable of speeding up visual recognition of
repeatedly presented objects through faster memory encoding and accessing
procedures on activated neurons. For the first time, we borrow and distill such
a capability into a semantic memory design, namely SMTM, to improve on-device
CNN inference. SMTM employs a hierarchical memory architecture to leverage the
long-tail distribution of objects of interest, and further incorporates several
novel techniques to put it into effect: (1) it encodes high-dimensional
feature maps into low-dimensional, semantic vectors for low-cost yet accurate
cache and lookup; (2) it uses a novel metric in determining the exit timing
considering different layers' inherent characteristics; (3) it adaptively
adjusts the cache size and semantic vectors to fit the scene dynamics. SMTM is
prototyped on a commodity CNN engine and runs on both mobile CPU and GPU.
Extensive experiments on large-scale datasets and models show that SMTM can
significantly speed up model inference over the standard approach (up to 2X)
and prior cache designs (up to 1.5X), with acceptable accuracy loss.
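The cache-and-early-exit idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the global-average-pooling encoder, the cosine-similarity lookup, the top-1 versus top-2 margin used as the exit test, and all names (`encode`, `SemanticCache`, `infer_with_cache`, `margin`) are assumptions chosen for illustration only.

```python
import math


def encode(feature_map):
    """Assumed encoder: average each channel of a C x H x W nested list
    into one number, giving a C-dimensional vector."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Per-layer cache mapping a class label to a running-mean vector."""

    def __init__(self, margin=0.2):
        self.margin = margin  # hypothetical exit threshold
        self.entries = {}     # label -> (mean vector, sample count)

    def update(self, label, vec):
        """Fold a new vector into the running mean stored for this label."""
        if label not in self.entries:
            self.entries[label] = (list(vec), 1)
            return
        mean, n = self.entries[label]
        new_mean = [(m * n + v) / (n + 1) for m, v in zip(mean, vec)]
        self.entries[label] = (new_mean, n + 1)

    def lookup(self, vec):
        """Return the best label if it beats the runner-up by `margin`, else None."""
        if len(self.entries) < 2:
            return None  # a margin needs at least two candidates
        scored = sorted(
            ((cosine(vec, mean), label) for label, (mean, _) in self.entries.items()),
            reverse=True,
        )
        (best, label), (second, _) = scored[0], scored[1]
        return label if best - second >= self.margin else None


def infer_with_cache(x, blocks, caches, classify):
    """Run blocks in order; exit early when a layer's cache is confident.

    Returns (label, exited_early). `caches[i]` may be None for uncached layers.
    """
    for block, cache in zip(blocks, caches):
        x = block(x)
        if cache is not None:
            label = cache.lookup(encode(x))
            if label is not None:
                return label, True
    return classify(x), False
```

In this sketch a cache hit skips the remaining blocks entirely, which is where a speedup would come from; a miss falls through to the full model, so the margin threshold trades accuracy against the early-exit rate.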
Related papers
- mGRADE: Minimal Recurrent Gating Meets Delay Convolutions for Lightweight Sequence Modeling [0.5236468296934584]
mGRADE is a hybrid-memory system that integrates a temporal 1D-convolution with learnable spacings followed by a minimal gated recurrent unit. We demonstrate that mGRADE effectively separates and preserves multi-scale temporal features. This highlights mGRADE's promise as an efficient solution for memory-constrained multi-scale temporal processing at the edge.
arXiv Detail & Related papers (2025-07-02T15:44:35Z) - MNN-LLM: A Generic Inference Engine for Fast Large Language Model Deployment on Mobile Devices [4.385815629175844]
MNN-LLM is a framework specifically designed to accelerate the deployment of large language models on mobile devices. It addresses the runtime characteristics of LLMs through model quantization and DRAM-Flash hybrid storage. Notably, MNN-LLM achieves up to an 8.6x speed increase compared to current mainstream LLM-specific frameworks.
arXiv Detail & Related papers (2025-06-12T07:45:29Z) - A Sensorimotor Vision Transformer [0.0]
Sensorimotor Transformer (SMT) is a vision model inspired by human saccadic eye movements.
SMT identifies and selects the most salient patches based on intrinsic two-dimensional (i2D) features.
arXiv Detail & Related papers (2025-04-03T12:37:44Z) - Enhancing Biologically Inspired Hierarchical Temporal Memory with Hardware-Accelerated Reflex Memory [0.29127054707887967]
This paper introduces a Reflex Memory (RM) block, inspired by the Spinal Cord's working mechanisms, to accelerate the processing of first-order inferences.
The integration of RM with HTM forms a system called the Accelerated Hierarchical Temporal Memory (AHTM), which processes repetitive information more efficiently.
Compared to the original HTM algorithm, AHTM accelerates inference by up to 7.55x, while H-AHTM further enhances performance with a 10.10x speedup.
arXiv Detail & Related papers (2025-04-01T17:40:12Z) - Online Dense Point Tracking with Streaming Memory [54.22820729477756]
Dense point tracking is a challenging task requiring the continuous tracking of every point in the initial frame throughout a substantial portion of a video.
Recent point tracking algorithms usually depend on sliding windows for indirect information propagation from the first frame to the current one.
We present a lightweight and fast model with Streaming memory for dense POint Tracking and online video processing.
arXiv Detail & Related papers (2025-03-09T06:16:49Z) - CSR:Achieving 1 Bit Key-Value Cache via Sparse Representation [63.65323577445951]
We propose a novel approach called Cache Sparse Representation (CSR).
CSR transforms the dense Key-Value cache tensor into sparse indexes and weights, offering a more memory-efficient representation during LLM inference.
Our experiments demonstrate CSR achieves performance comparable to state-of-the-art KV cache quantization algorithms.
arXiv Detail & Related papers (2024-12-16T13:01:53Z) - B'MOJO: Hybrid State Space Realizations of Foundation Models with Eidetic and Fading Memory [91.81390121042192]
We develop a class of models called B'MOJO to seamlessly combine eidetic and fading memory within a composable module.
B'MOJO's ability to modulate eidetic and fading memory results in better inference on longer sequences tested up to 32K tokens.
arXiv Detail & Related papers (2024-07-08T18:41:01Z) - Robust and Efficient Memory Network for Video Object Segmentation [6.7995672846437305]
This paper proposes a Robust and Efficient Memory Network, or REMN, for studying semi-supervised video object segmentation (VOS).
We introduce a local attention mechanism that tackles the background distraction by enhancing the features of foreground objects with the previous mask.
Experiments demonstrate that our REMN achieves state-of-the-art results on DAVIS 2017, with a $\mathcal{J}\&\mathcal{F}$ score of 86.3% and on YouTube-VOS 2018, with a $\mathcal{G}$ over mean of 85.5%.
arXiv Detail & Related papers (2023-04-24T06:19:21Z) - TinyAD: Memory-efficient anomaly detection for time series data in
Industrial IoT [43.207210990362825]
We propose a novel framework named Tiny Anomaly Detection (TinyAD) to efficiently facilitate onboard inference of CNNs for real-time anomaly detection.
To reduce the peak memory consumption of CNNs, we explore two complementary strategies, in-place, and patch-by-patch memory rescheduling.
Our framework can reduce peak memory consumption by 2-5x with negligible overhead.
arXiv Detail & Related papers (2023-03-07T02:56:15Z) - Joint Spatial-Temporal and Appearance Modeling with Transformer for
Multiple Object Tracking [59.79252390626194]
We propose a novel solution named TransSTAM, which leverages Transformer to model both the appearance features of each object and the spatial-temporal relationships among objects.
The proposed method is evaluated on multiple public benchmarks including MOT16, MOT17, and MOT20, and it achieves a clear performance improvement in both IDF1 and HOTA.
arXiv Detail & Related papers (2022-05-31T01:19:18Z) - Memory-Guided Semantic Learning Network for Temporal Sentence Grounding [55.31041933103645]
We propose a memory-augmented network that learns and memorizes the rarely appeared content in TSG tasks.
MGSL-Net consists of three main parts: a cross-modal interaction module, a memory augmentation module, and a heterogeneous attention module.
arXiv Detail & Related papers (2022-01-03T02:32:06Z) - Memory-Augmented Deep Unfolding Network for Compressive Sensing [7.123516761504439]
Memory-Augmented Deep Unfolding Network (MADUN) is proposed to map a truncated optimization method into a deep neural network.
We show that our MADUN outperforms existing state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2021-10-19T07:03:12Z) - Unsupervised Motion Representation Learning with Capsule Autoencoders [54.81628825371412]
Motion Capsule Autoencoder (MCAE) models motion in a two-level hierarchy.
MCAE is evaluated on a novel Trajectory20 motion dataset and various real-world skeleton-based human action datasets.
arXiv Detail & Related papers (2021-10-01T16:52:03Z) - Memformer: A Memory-Augmented Transformer for Sequence Modeling [55.780849185884996]
We present Memformer, an efficient neural network for sequence modeling.
Our model achieves linear time complexity and constant memory space complexity when processing long sequences.
arXiv Detail & Related papers (2020-10-14T09:03:36Z) - Robust High-dimensional Memory-augmented Neural Networks [13.82206983716435]
Memory-augmented neural networks enhance neural networks with an explicit memory to overcome these issues.
Access to this explicit memory occurs via soft read and write operations involving every individual memory entry.
We propose a robust architecture that employs a computational memory unit as the explicit memory performing analog in-memory computation on high-dimensional (HD) vectors.
arXiv Detail & Related papers (2020-10-05T12:01:56Z) - Learning Spatio-Appearance Memory Network for High-Performance Visual
Tracking [79.80401607146987]
Existing object tracking usually learns a bounding-box based template to match visual targets across frames, which cannot accurately learn a pixel-wise representation.
This paper presents a novel segmentation-based tracking architecture, which is equipped with a spatio-appearance memory network to learn accurate spatio-temporal correspondence.
arXiv Detail & Related papers (2020-09-21T08:12:02Z) - STH: Spatio-Temporal Hybrid Convolution for Efficient Action Recognition [39.58542259261567]
We present a novel Spatio-Temporal Hybrid Network (STH) which simultaneously encodes spatial and temporal video information with a small parameter cost.
Such a design enables efficient spatio-temporal modeling and maintains a small model scale.
STH enjoys performance superiority over 3D CNNs while maintaining an even smaller parameter cost than 2D CNNs.
arXiv Detail & Related papers (2020-03-18T04:46:30Z)