Taurus: A Data Plane Architecture for Per-Packet ML
- URL: http://arxiv.org/abs/2002.08987v2
- Date: Wed, 19 Jan 2022 20:20:04 GMT
- Title: Taurus: A Data Plane Architecture for Per-Packet ML
- Authors: Tushar Swamy, Alexander Rucker, Muhammad Shahbaz, Ishan Gaur, and
Kunle Olukotun
- Abstract summary: We present the design and implementation of Taurus, a data plane for line-rate inference.
Our evaluation of a Taurus switch ASIC shows that Taurus operates orders of magnitude faster than a server-based control plane.
- Score: 59.1343317736213
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Emerging applications -- cloud computing, the internet of things, and
augmented/virtual reality -- demand responsive, secure, and scalable datacenter
networks. These networks currently implement simple, per-packet, data-plane
heuristics (e.g., ECMP and sketches) under a slow, millisecond-latency control
plane that runs data-driven performance and security policies. However, to meet
applications' service-level objectives (SLOs) in a modern data center, networks
must bridge the gap between line-rate, per-packet execution and complex
decision making.
In this work, we present the design and implementation of Taurus, a data
plane for line-rate inference. Taurus adds custom hardware based on a flexible,
parallel-patterns (MapReduce) abstraction to programmable network devices, such
as switches and NICs; this new hardware uses pipelined SIMD parallelism to
enable per-packet MapReduce operations (e.g., inference). Our evaluation of a
Taurus switch ASIC -- supporting several real-world models -- shows that Taurus
operates orders of magnitude faster than a server-based control plane while
increasing area by 3.8% and latency for line-rate ML models by up to 221 ns.
Furthermore, our Taurus FPGA prototype achieves full model accuracy and detects
two orders of magnitude more events than a state-of-the-art control-plane
anomaly-detection system.
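The abstract describes lowering ML inference onto a parallel-patterns (MapReduce) abstraction executed with pipelined SIMD parallelism. As a purely illustrative sketch (the function names, toy weights, and structure below are assumptions, not the paper's actual hardware interface), a dense layer of a per-packet model decomposes into a map stage (per-element multiply) followed by a reduce stage (sum):

```python
# Illustrative sketch of expressing per-packet NN inference as MapReduce
# stages, in the spirit of Taurus's parallel-patterns abstraction.
# All names and the toy model are hypothetical, not the paper's interface.

def relu(x):
    """Simple activation; hardware would apply this per SIMD lane."""
    return x if x > 0 else 0.0

def dense_layer(inputs, weights, biases):
    """One fully connected layer: for each neuron, a map (elementwise
    multiply of inputs by that neuron's weights) and a reduce (sum)."""
    outputs = []
    for w_row, b in zip(weights, biases):
        products = [i * w for i, w in zip(inputs, w_row)]  # map stage
        acc = sum(products)                                # reduce stage
        outputs.append(relu(acc + b))
    return outputs

def per_packet_inference(features, layers):
    """Run a small feed-forward model over one packet's feature vector.
    In a pipelined design, successive packets occupy successive stages."""
    x = features
    for weights, biases in layers:
        x = dense_layer(x, weights, biases)
    return x

# Toy single-layer "anomaly score" over two packet features.
toy_layers = [([[1.0, -1.0]], [0.0])]
score = per_packet_inference([0.9, 0.2], toy_layers)
```

Each map stage is data-parallel across SIMD lanes, and the layers form a pipeline, which is what lets such a design sustain per-packet rates rather than batching work to a control plane.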
Related papers
- FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs).
We show that our system and method achieve a 1.45-9.39x speedup over baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z)
- Brain-on-Switch: Towards Advanced Intelligent Network Data Plane via NN-Driven Traffic Analysis at Line-Speed [33.455302442142994]
Programmable networks have sparked significant research on the Intelligent Network Data Plane (INDP), which achieves learning-based traffic analysis at line speed.
Prior work on INDP focuses on deploying tree/forest models in the data plane.
We present BoS to push the boundaries of INDP by enabling Neural Network (NN) driven traffic analysis at line-speed.
arXiv Detail & Related papers (2024-03-17T04:59:30Z)
- Complex-Valued Neural Networks for Data-Driven Signal Processing and Signal Understanding [1.2691047660244337]
Complex-valued neural networks have emerged with superior modeling performance for many tasks across the signal processing, sensing, and communications arenas.
This paper overviews a package built on PyTorch with the intention of implementing light-weight interfaces for common complex-valued neural network operations and architectures.
arXiv Detail & Related papers (2023-09-14T16:55:28Z)
- EasyNet: An Easy Network for 3D Industrial Anomaly Detection [49.26348455493123]
3D anomaly detection is an emerging and vital computer vision task in industrial manufacturing.
We propose an easy and deployment-friendly network (called EasyNet) without using pre-trained models and memory banks.
Experiments show that EasyNet achieves an anomaly detection AUROC of 92.6% without using pre-trained models and memory banks.
arXiv Detail & Related papers (2023-07-26T02:46:50Z)
- RouteNet-Fermi: Network Modeling with Graph Neural Networks [7.227467283378366]
We present RouteNet-Fermi, a custom Graph Neural Network (GNN) model that shares the same goals as queuing theory.
The proposed model accurately predicts the delay, jitter, and packet loss of a network.
Our experimental results show that RouteNet-Fermi achieves similar accuracy as computationally-expensive packet-level simulators.
arXiv Detail & Related papers (2022-12-22T23:02:40Z)
- Pathways: Asynchronous Distributed Dataflow for ML [24.940220376358457]
We present the design of a new large scale orchestration layer for accelerators.
Our system, Pathways, is explicitly designed to enable exploration of new systems and ML research ideas.
arXiv Detail & Related papers (2022-03-23T16:50:53Z)
- Parallel Successive Learning for Dynamic Distributed Model Training over Heterogeneous Wireless Networks [50.68446003616802]
Federated learning (FedL) has emerged as a popular technique for distributing model training over a set of wireless devices.
We develop parallel successive learning (PSL), which expands the FedL architecture along three dimensions.
Our analysis sheds light on the notion of cold vs. warmed up models, and model inertia in distributed machine learning.
arXiv Detail & Related papers (2022-02-07T05:11:01Z)
- Robust Semi-supervised Federated Learning for Images Automatic Recognition in Internet of Drones [57.468730437381076]
We present a Semi-supervised Federated Learning (SSFL) framework for privacy-preserving UAV image recognition.
There are significant differences in the number, features, and distribution of local data collected by UAVs using different camera modules.
We propose an aggregation rule based on the frequency of each client's participation in training, namely the FedFreq aggregation rule.
arXiv Detail & Related papers (2022-01-03T16:49:33Z)
- A Photonic-Circuits-Inspired Compact Network: Toward Real-Time Wireless Signal Classification at the Edge [3.841495731646297]
The large size of machine learning models can make them difficult to deploy on edge devices for latency-sensitive downstream tasks.
In wireless communication systems, ML data processing at a sub-millisecond scale will enable real-time network monitoring.
We propose a novel compact deep network that consists of a photonic-hardware-inspired recurrent neural network model.
arXiv Detail & Related papers (2021-06-25T19:55:41Z)
- One-step regression and classification with crosspoint resistive memory arrays [62.997667081978825]
High speed, low energy computing machines are in demand to enable real-time artificial intelligence at the edge.
One-step learning is supported by simulations of the prediction of the cost of a house in Boston and the training of a 2-layer neural network for MNIST digit recognition.
Results are all obtained in one computational step, thanks to the physical, parallel, and analog computing within the crosspoint array.
arXiv Detail & Related papers (2020-05-05T08:00:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.