Reducing Data Bottlenecks in Distributed, Heterogeneous Neural Networks
- URL: http://arxiv.org/abs/2410.09650v1
- Date: Sat, 12 Oct 2024 21:07:55 GMT
- Title: Reducing Data Bottlenecks in Distributed, Heterogeneous Neural Networks
- Authors: Ruhai Lin, Rui-Jie Zhu, Jason K. Eshraghian
- Abstract summary: This paper investigates the impact of bottleneck size on the performance of deep learning models in embedded multicore and many-core systems.
We apply a hardware-software co-design methodology where data bottlenecks are replaced with extremely narrow layers to reduce the amount of data traffic.
Hardware-side evaluation reveals that higher bottleneck ratios lead to substantial reductions in data transfer volume across the layers of the neural network.
- Score: 5.32129361961937
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rapid advancement of embedded multicore and many-core systems has revolutionized computing, enabling the development of high-performance, energy-efficient solutions for a wide range of applications. As models scale up in size, data movement is increasingly the bottleneck to performance. This data movement can occur between processor and memory, or between cores and chips. This paper investigates the impact of bottleneck size, in terms of inter-chip data traffic, on the performance of deep learning models in embedded multicore and many-core systems. We conduct a systematic analysis of the relationship between bottleneck size, computational resource utilization, and model accuracy. We apply a hardware-software co-design methodology where data bottlenecks are replaced with extremely narrow layers to reduce the amount of data traffic. In effect, time-multiplexing of signals is replaced by learnable embeddings that reduce the demands on chip I/Os. Our experiments on the CIFAR100 dataset demonstrate that classification accuracy generally decreases as the bottleneck ratio increases, with shallower models experiencing a more significant drop than deeper ones. Hardware-side evaluation reveals that higher bottleneck ratios lead to substantial reductions in data transfer volume across the layers of the neural network. Through this research, we determine the trade-off between data transfer volume and model performance, enabling the identification of a balanced point that achieves good performance while minimizing data transfer volume. This characteristic allows for the development of efficient models that are well-suited for resource-constrained environments.
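As a rough illustration of the co-design idea in the abstract, the sketch below inserts an extremely narrow, learnable layer at the point where activations would cross a chip boundary. This is a minimal PyTorch sketch under our own assumptions; the module name `NarrowBottleneck` and the `bottleneck_ratio` parameterization are illustrative, not the authors' released code.

```python
import torch
import torch.nn as nn

class NarrowBottleneck(nn.Module):
    """Learnable narrow layer standing in for a chip-to-chip boundary.

    A bottleneck_ratio of r shrinks a dim-dimensional activation to
    dim // r values before it crosses the interconnect, cutting the
    transferred data volume by roughly a factor of r."""
    def __init__(self, dim: int, bottleneck_ratio: int):
        super().__init__()
        narrow = max(1, dim // bottleneck_ratio)
        self.down = nn.Linear(dim, narrow)  # runs on the sending chip
        self.up = nn.Linear(narrow, dim)    # runs on the receiving chip

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.down(x)  # only z (dim // r values per sample) leaves the chip
        return self.up(z)

# Toy usage: a 512-d activation compressed 8x at the boundary.
layer = NarrowBottleneck(dim=512, bottleneck_ratio=8)
print(layer(torch.randn(32, 512)).shape)  # torch.Size([32, 512])
```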
Related papers
- Towards Resource-Efficient Federated Learning in Industrial IoT for Multivariate Time Series Analysis [50.18156030818883]
Anomalies and missing data constitute a thorny problem in industrial applications.
Deep learning enabled anomaly detection has emerged as a critical direction.
The data collected on edge devices contain sensitive user information.
arXiv Detail & Related papers (2024-11-06T15:38:31Z)
- High-Dimensional Distributed Sparse Classification with Scalable Communication-Efficient Global Updates [50.406127962933915]
We develop solutions that enable learning a communication-efficient distributed logistic regression model.
In our experiments we demonstrate a large improvement in accuracy over distributed algorithms with only a few distributed update steps needed.
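The snippet above does not spell out the algorithm; purely as a hedged illustration of communication-efficient distributed logistic regression, the sketch below runs many cheap local gradient steps per worker and exchanges parameters only a few times. The function names and the local-update scheme are our own assumptions, not the paper's method.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def distributed_logreg(shards, dim, rounds=5, local_steps=50, lr=0.1):
    """Each worker takes many cheap local gradient steps on its own shard;
    only `rounds` global parameter exchanges occur, so communication stays
    low relative to fully synchronous gradient descent."""
    w = np.zeros(dim)
    for _ in range(rounds):                # few global update steps
        local_ws = []
        for X, y in shards:                # one (X, y) pair per worker
            w_k = w.copy()
            for _ in range(local_steps):   # local-only computation
                grad = X.T @ (sigmoid(X @ w_k) - y) / len(y)
                w_k -= lr * grad
            local_ws.append(w_k)
        w = np.mean(local_ws, axis=0)      # one d-vector sent per worker
    return w

# Toy usage with four simulated workers.
rng = np.random.default_rng(0)
shards = [(rng.normal(size=(100, 5)), rng.integers(0, 2, size=100))
          for _ in range(4)]
w = distributed_logreg(shards, dim=5)
```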
arXiv Detail & Related papers (2024-07-08T19:34:39Z)
- IB-AdCSCNet: Adaptive Convolutional Sparse Coding Network Driven by Information Bottleneck [4.523653503622693]
We introduce IB-AdCSCNet, a deep learning model grounded in information bottleneck theory.
IB-AdCSCNet seamlessly integrates the information bottleneck trade-off strategy into deep networks.
Experimental results on CIFAR-10 and CIFAR-100 datasets demonstrate that IB-AdCSCNet not only matches the performance of deep residual convolutional networks but also outperforms them when handling corrupted data.
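IB-AdCSCNet's exact objective is defined in the paper; purely to illustrate what an information-bottleneck trade-off inside a deep network looks like, here is a standard variational-IB-style loss. The function name and the `beta` weighting are our own sketch, not the authors' formulation.

```python
import torch
import torch.nn.functional as F

def ib_tradeoff_loss(mu, logvar, logits, targets, beta=1e-3):
    """Generic information-bottleneck trade-off: fit the labels while
    penalizing how much the stochastic code z retains about the input.

    mu, logvar parameterize q(z|x) = N(mu, diag(exp(logvar))); the KL term
    to a standard normal prior is a common surrogate for I(X; Z)."""
    fit = F.cross_entropy(logits, targets)  # prediction term
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(1).mean()
    return fit + beta * kl                  # beta weights the compression term

# Toy invocation with random tensors standing in for a real encoder/decoder.
mu, logvar = torch.randn(16, 32), torch.zeros(16, 32)
logits, targets = torch.randn(16, 10), torch.randint(0, 10, (16,))
print(ib_tradeoff_loss(mu, logvar, logits, targets))
```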
arXiv Detail & Related papers (2024-05-23T05:35:57Z)
- DRCT: Saving Image Super-resolution away from Information Bottleneck [7.765333471208582]
Vision Transformer-based approaches for low-level vision tasks have achieved widespread success.
Dense-residual-connected Transformer (DRCT) is proposed to mitigate the loss of spatial information.
Our approach surpasses state-of-the-art methods on benchmark datasets.
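DRCT's architecture is not reproduced here; the following generic sketch only shows the dense-residual pattern the summary alludes to, where each block reuses all earlier feature maps so spatial information is not lost. The class name and layer choices are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DenseResidualStack(nn.Module):
    """Generic dense-residual pattern: block i consumes the concatenation of
    the input and all earlier feature maps, so early spatial detail is never
    discarded on the way to the final output."""
    def __init__(self, channels: int, blocks: int = 4):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Conv2d(channels * (i + 1), channels, kernel_size=3, padding=1)
            for i in range(blocks)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [x]
        for conv in self.blocks:
            feats.append(torch.relu(conv(torch.cat(feats, dim=1))))
        return feats[-1] + x  # residual connection around the whole stack

stack = DenseResidualStack(channels=16)
print(stack(torch.randn(1, 16, 32, 32)).shape)  # torch.Size([1, 16, 32, 32])
```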
arXiv Detail & Related papers (2024-03-31T15:34:45Z)
- Analysis and Optimization of Wireless Federated Learning with Data Heterogeneity [72.85248553787538]
This paper focuses on performance analysis and optimization for wireless FL, considering data heterogeneity, combined with wireless resource allocation.
We formulate the loss function minimization problem under constraints on long-term energy consumption and latency, and jointly optimize client scheduling, resource allocation, and the number of local training epochs (CRE).
Experiments on real-world datasets demonstrate that the proposed algorithm outperforms other benchmarks in terms of the learning accuracy and energy consumption.
arXiv Detail & Related papers (2023-08-04T04:18:01Z)
- SignalNet: A Low Resolution Sinusoid Decomposition and Estimation Network [79.04274563889548]
We propose SignalNet, a neural network architecture that detects the number of sinusoids and estimates their parameters from quantized in-phase and quadrature samples.
We introduce a worst-case learning threshold for comparing the results of our network relative to the underlying data distributions.
In simulation, we find that our algorithm is always able to surpass the threshold for three-bit data but often cannot exceed the threshold for one-bit data.
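To make the input regime concrete, the sketch below generates a noisy sinusoid mixture and quantizes its in-phase and quadrature components with a uniform mid-rise quantizer. This is our own toy data generator under stated assumptions, not SignalNet's actual pipeline.

```python
import numpy as np

def quantized_iq(freqs, amps, n=128, bits=3, seed=0):
    """Generate noisy complex sinusoids and quantize the in-phase (I) and
    quadrature (Q) components with a bits-bit mid-rise uniform quantizer."""
    rng = np.random.default_rng(seed)
    t = np.arange(n)
    x = sum(a * np.exp(2j * np.pi * f * t) for f, a in zip(freqs, amps))
    x = x + 0.05 * (rng.normal(size=n) + 1j * rng.normal(size=n))  # noise
    half = 2 ** bits / 2
    scale = max(np.abs(x.real).max(), np.abs(x.imag).max())
    def q(v):  # map each sample to the centre of its quantization bin
        idx = np.clip(np.floor(v / scale * half), -half, half - 1)
        return (idx + 0.5) / half
    return q(x.real) + 1j * q(x.imag)

iq3 = quantized_iq([0.10, 0.23], [1.0, 0.5], bits=3)  # learnable regime
iq1 = quantized_iq([0.10, 0.23], [1.0, 0.5], bits=1)  # much harder regime
```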
arXiv Detail & Related papers (2021-06-10T04:21:20Z)
- Deep Cellular Recurrent Network for Efficient Analysis of Time-Series Data with Spatial Information [52.635997570873194]
This work proposes a novel deep cellular recurrent neural network (DCRNN) architecture to process complex multi-dimensional time series data with spatial information.
The proposed architecture achieves state-of-the-art performance while using substantially fewer trainable parameters than comparable methods in the literature.
arXiv Detail & Related papers (2021-01-12T20:08:18Z)
- On Robustness and Transferability of Convolutional Neural Networks [147.71743081671508]
Modern deep convolutional neural networks (CNNs) are often criticized for not generalizing under distributional shifts.
We study the interplay between out-of-distribution and transfer performance of modern image classification CNNs for the first time.
We find that increasing both the training set and model sizes significantly improves robustness to distributional shift.
arXiv Detail & Related papers (2020-07-16T18:39:04Z)
- Deep convolutional generative adversarial networks for traffic data imputation encoding time series as images [7.053891669775769]
We have developed a generative adversarial network (GAN) based traffic sensor data imputation framework (TGAN).
We have also developed a novel time-dependent encoding method called the Gramian Angular Summation Field (GASF).
This study shows that the proposed model can significantly improve the traffic data imputation accuracy in terms of Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) compared to state-of-the-art models on the benchmark dataset.
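The GASF transform itself is publicly documented; here is a minimal sketch, assuming min-max rescaling to [-1, 1] before the angular encoding.

```python
import numpy as np

def gasf(series: np.ndarray) -> np.ndarray:
    """Gramian Angular Summation Field: rescale a series to [-1, 1],
    interpret values as cos(phi), and build the image cos(phi_i + phi_j)."""
    lo, hi = series.min(), series.max()
    x = (2 * series - hi - lo) / (hi - lo)      # rescale to [-1, 1]
    phi = np.arccos(np.clip(x, -1.0, 1.0))      # angular encoding
    return np.cos(phi[:, None] + phi[None, :])  # n x n "image"

image = gasf(np.sin(np.linspace(0, 4 * np.pi, 64)))  # 64x64 encoding
```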
arXiv Detail & Related papers (2020-05-05T19:14:02Z)
- A Survey on Impact of Transient Faults on BNN Inference Accelerators [0.9667631210393929]
The big data boom enables us to easily access and analyze very large data sets.
Deep learning models require significant computation power and extremely high memory accesses.
In this study, we demonstrate that soft errors in a customized deep learning algorithm can cause drastic image misclassification.
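As a toy model of such a transient fault (our sketch, not the survey's methodology), the snippet below quantizes a weight tensor to int8 and flips one random bit before de-quantizing for a faulty inference pass.

```python
import numpy as np

def flip_random_bit(weights, seed=None):
    """Model one transient fault: quantize weights to int8, flip a single
    randomly chosen bit, and de-quantize for a faulty inference pass."""
    rng = np.random.default_rng(seed)
    q = np.clip(np.round(weights * 127), -128, 127).astype(np.int8)
    raw = q.view(np.uint8).ravel()            # byte-level view of the weights
    i = rng.integers(raw.size)
    raw[i] ^= np.uint8(1 << rng.integers(8))  # the soft error: one flipped bit
    return q.astype(np.float32) / 127.0

faulty = flip_random_bit(0.05 * np.random.default_rng(0).normal(size=(64, 64)))
```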
arXiv Detail & Related papers (2020-04-10T16:15:55Z)