Taurus: A Data Plane Architecture for Per-Packet ML
- URL: http://arxiv.org/abs/2002.08987v2
- Date: Wed, 19 Jan 2022 20:20:04 GMT
- Title: Taurus: A Data Plane Architecture for Per-Packet ML
- Authors: Tushar Swamy, Alexander Rucker, Muhammad Shahbaz, Ishan Gaur, and
Kunle Olukotun
- Abstract summary: We present the design and implementation of Taurus, a data plane for line-rate inference.
Our evaluation of a Taurus switch ASIC shows that Taurus operates orders of magnitude faster than a server-based control plane.
- Score: 59.1343317736213
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Emerging applications -- cloud computing, the internet of things, and
augmented/virtual reality -- demand responsive, secure, and scalable datacenter
networks. These networks currently implement simple, per-packet, data-plane
heuristics (e.g., ECMP and sketches) under a slow, millisecond-latency control
plane that runs data-driven performance and security policies. However, to meet
applications' service-level objectives (SLOs) in a modern data center, networks
must bridge the gap between line-rate, per-packet execution and complex
decision making.
In this work, we present the design and implementation of Taurus, a data
plane for line-rate inference. Taurus adds custom hardware based on a flexible,
parallel-patterns (MapReduce) abstraction to programmable network devices, such
as switches and NICs; this new hardware uses pipelined SIMD parallelism to
enable per-packet MapReduce operations (e.g., inference). Our evaluation of a
Taurus switch ASIC -- supporting several real-world models -- shows that Taurus
operates orders of magnitude faster than a server-based control plane while
increasing area by 3.8% and latency for line-rate ML models by up to 221 ns.
Furthermore, our Taurus FPGA prototype achieves full model accuracy and detects
two orders of magnitude more events than a state-of-the-art control-plane
anomaly-detection system.
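The abstract describes lowering ML inference onto a parallel-patterns (MapReduce) abstraction executed with pipelined SIMD parallelism. As a purely illustrative sketch (the function names, toy weights, and structure below are assumptions, not the paper's actual hardware interface), a dense layer of a per-packet model decomposes into a map stage (per-element multiply) followed by a reduce stage (sum):

```python
# Illustrative sketch of expressing per-packet NN inference as MapReduce
# stages, in the spirit of Taurus's parallel-patterns abstraction.
# All names and the toy model are hypothetical, not the paper's interface.

def relu(x):
    """Simple activation; hardware would apply this per SIMD lane."""
    return x if x > 0 else 0.0

def dense_layer(inputs, weights, biases):
    """One fully connected layer: for each neuron, a map (elementwise
    multiply of inputs by that neuron's weights) and a reduce (sum)."""
    outputs = []
    for w_row, b in zip(weights, biases):
        products = [i * w for i, w in zip(inputs, w_row)]  # map stage
        acc = sum(products)                                # reduce stage
        outputs.append(relu(acc + b))
    return outputs

def per_packet_inference(features, layers):
    """Run a small feed-forward model over one packet's feature vector.
    In a pipelined design, successive packets occupy successive stages."""
    x = features
    for weights, biases in layers:
        x = dense_layer(x, weights, biases)
    return x

# Toy single-layer "anomaly score" over two packet features.
toy_layers = [([[1.0, -1.0]], [0.0])]
score = per_packet_inference([0.9, 0.2], toy_layers)
```

Each map stage is data-parallel across SIMD lanes, and the layers form a pipeline, which is what lets such a design sustain per-packet rates rather than batching work to a control plane.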
Related papers
- FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs).
We show that our system and method achieve a 1.45-9.39x speedup over baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z)
- Brain-on-Switch: Towards Advanced Intelligent Network Data Plane via NN-Driven Traffic Analysis at Line-Speed [33.455302442142994]
Programmable networks have sparked significant research on the Intelligent Network Data Plane (INDP), which achieves learning-based traffic analysis at line speed.
Prior work on INDP focuses on deploying tree/forest models in the data plane.
We present BoS to push the boundaries of INDP by enabling Neural Network (NN) driven traffic analysis at line-speed.
arXiv Detail & Related papers (2024-03-17T04:59:30Z)
- Complex-Valued Neural Networks for Data-Driven Signal Processing and Signal Understanding [1.2691047660244337]
Complex-valued neural networks have emerged with superior modeling performance for many tasks across the signal processing, sensing, and communications arenas.
This paper overviews a package built on PyTorch with the intention of implementing light-weight interfaces for common complex-valued neural network operations and architectures.
arXiv Detail & Related papers (2023-09-14T16:55:28Z)
- EasyNet: An Easy Network for 3D Industrial Anomaly Detection [49.26348455493123]
3D anomaly detection is an emerging and vital computer vision task in industrial manufacturing.
We propose an easy and deployment-friendly network (called EasyNet) without using pre-trained models and memory banks.
Experiments show that EasyNet achieves an anomaly detection AUROC of 92.6% without using pre-trained models and memory banks.
arXiv Detail & Related papers (2023-07-26T02:46:50Z)
- RouteNet-Fermi: Network Modeling with Graph Neural Networks [7.227467283378366]
We present RouteNet-Fermi, a custom Graph Neural Network (GNN) model that shares the same goals as queuing theory.
The proposed model accurately predicts the delay, jitter, and packet loss of a network.
Our experimental results show that RouteNet-Fermi achieves similar accuracy as computationally-expensive packet-level simulators.
arXiv Detail & Related papers (2022-12-22T23:02:40Z)
- Pathways: Asynchronous Distributed Dataflow for ML [24.940220376358457]
We present the design of a new large scale orchestration layer for accelerators.
Our system, Pathways, is explicitly designed to enable exploration of new systems and ML research ideas.
arXiv Detail & Related papers (2022-03-23T16:50:53Z)
- Parallel Successive Learning for Dynamic Distributed Model Training over Heterogeneous Wireless Networks [50.68446003616802]
Federated learning (FedL) has emerged as a popular technique for distributing model training over a set of wireless devices.
We develop parallel successive learning (PSL), which expands the FedL architecture along three dimensions.
Our analysis sheds light on the notion of cold vs. warmed up models, and model inertia in distributed machine learning.
arXiv Detail & Related papers (2022-02-07T05:11:01Z)
- Robust Semi-supervised Federated Learning for Images Automatic Recognition in Internet of Drones [57.468730437381076]
We present a Semi-supervised Federated Learning (SSFL) framework for privacy-preserving UAV image recognition.
There are significant differences in the number, features, and distribution of local data collected by UAVs using different camera modules.
We propose an aggregation rule based on the frequency of each client's participation in training, namely the FedFreq aggregation rule.
arXiv Detail & Related papers (2022-01-03T16:49:33Z)
- A Photonic-Circuits-Inspired Compact Network: Toward Real-Time Wireless Signal Classification at the Edge [3.841495731646297]
The large size of machine learning models can make them difficult to deploy on edge devices for latency-sensitive downstream tasks.
In wireless communication systems, ML data processing at a sub-millisecond scale will enable real-time network monitoring.
We propose a novel compact deep network that consists of a photonic-hardware-inspired recurrent neural network model.
arXiv Detail & Related papers (2021-06-25T19:55:41Z)
- One-step regression and classification with crosspoint resistive memory arrays [62.997667081978825]
High speed, low energy computing machines are in demand to enable real-time artificial intelligence at the edge.
One-step learning is supported by simulations of the prediction of the cost of a house in Boston and the training of a 2-layer neural network for MNIST digit recognition.
Results are all obtained in one computational step, thanks to the physical, parallel, and analog computing within the crosspoint array.
arXiv Detail & Related papers (2020-05-05T08:00:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.