Related papers: Prioritising Interactive Flows in Data Center Networks With Central Control

URL: http://arxiv.org/abs/2402.00870v1
Date: Fri, 27 Oct 2023 07:15:15 GMT
Title: Prioritising Interactive Flows in Data Center Networks With Central Control
Authors: Mohana Prasad Sathya Moorthy
Abstract summary: We deal with two problems relating to central controller assisted prioritization of interactive flows in data center networks. In the second part of the thesis, we deal with the problem of congestion control in a software defined network. We propose a framework, where the controller with its global view of the network actively participates in the congestion control decisions of the end TCP hosts.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Data centers are on the rise and scientists are re-thinking and re-designing networks for data centers. The concept of central control, which was not effective in the Internet era, is now gaining popularity and is used in many data centers because of their smaller scale of operation (compared to the Internet), their structured topologies, and the fact that all network resources are under a single entity's control. With new opportunities, data center networks also pose new problems. Data centers require high utilization, low median and tail latencies, and fairness. In traditional systems, bulk traffic generally stalls the interactive flows, thereby affecting their flow completion times adversely. In this thesis, we deal with two problems relating to central controller assisted prioritization of interactive flows in data center networks. Fastpass is a centralized "zero-queue" data center network. However, the central arbiter of Fastpass does not scale well beyond 256 nodes (or 8 cores). In our test runs, it supports only about 1.5 Terabits/s of network traffic. In this work, we re-design the timeslot allocator of its central arbiter so that it scales linearly up to 12 cores and supports about 1024 nodes and 7.1 Terabits/s of network traffic. In the second part of the thesis, we deal with the problem of congestion control in a software defined network. We propose a framework where the controller, with its global view of the network, actively participates in the congestion control decisions of the end TCP hosts by setting the ECN bits of IPv4 packets appropriately. Our framework can be deployed very easily without any change to the end node TCPs or the SDN switches. We also show a 30x improvement over TCP CUBIC and a 1.7x improvement over RED in flow completion times of interactive traffic for one implementation of this framework.
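As a rough illustration of what "setting the ECN bits of IPv4 packets" involves at the byte level, the following Python sketch marks an ECN-capable packet as Congestion Experienced (CE) and recomputes the header checksum. The codepoint values and their position (the two low-order bits of the second header byte) follow RFC 3168. The function names `mark_congestion`, `mark_header`, and `ipv4_checksum` are hypothetical, and this is not the thesis's implementation: the abstract does not say how the controller decides which packets to mark or how the marking reaches the switches.

```python
# ECN codepoints (RFC 3168): the two low-order bits of the IPv4 TOS byte.
NOT_ECT = 0b00  # sender does not support ECN
ECT_1 = 0b01    # ECN-capable transport, codepoint 1
ECT_0 = 0b10    # ECN-capable transport, codepoint 0
CE = 0b11       # congestion experienced
ECN_MASK = 0b11


def mark_congestion(tos: int) -> int:
    """Return the TOS byte with CE set if the packet is ECN-capable.

    Not-ECT packets are returned unchanged, because a sender that has not
    negotiated ECN would not react to the mark.
    """
    if (tos & ECN_MASK) in (ECT_0, ECT_1):
        return (tos & ~ECN_MASK & 0xFF) | CE
    return tos


def ipv4_checksum(header: bytes) -> int:
    """One's-complement checksum over the header, taken as 16-bit words."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def mark_header(header: bytes) -> bytes:
    """Set CE in a raw IPv4 header and refresh its checksum (bytes 10-11)."""
    hdr = bytearray(header)
    new_tos = mark_congestion(hdr[1])
    if new_tos == hdr[1]:
        return bytes(hdr)  # nothing to change
    hdr[1] = new_tos
    hdr[10:12] = b"\x00\x00"  # checksum field is zero while summing
    hdr[10:12] = ipv4_checksum(bytes(hdr)).to_bytes(2, "big")
    return bytes(hdr)
```

Because the mark lives in the IP header, the end hosts' existing ECN-aware TCP stacks react to it without modification, which is consistent with the abstract's claim that no end-node change is needed.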

Related papers

Semi-decentralized Training of Spatio-Temporal Graph Neural Networks for Traffic Prediction [0.15978270011184256]
We explore and adapt semi-decentralized training techniques for Spatio-Temporal Graph Neural Networks (ST-GNNs) in the smart mobility domain. We implement a simulation framework where sensors are grouped by proximity into multiple cloudlets. We show that semi-decentralized setups are comparable to centralized approaches in performance metrics.
arXiv Detail & Related papers (2024-12-04T10:20:21Z)
I've Got 99 Problems But FLOPS Ain't One [70.3084616806354]
We take an unconventional approach to find relevant research directions, starting from public plans to build a $100 billion datacenter for machine learning applications. We discover what workloads such a datacenter might carry and explore the challenges one may encounter in doing so, with a focus on networking research. We conclude that building the datacenter and training such models is technically possible, but this requires novel wide-area transports for inter-DC communication, a multipath transport and novel datacenter topologies.
arXiv Detail & Related papers (2024-07-01T10:33:46Z)
Optimizing Closed Payment Networks on the Lightning Network: Dual Central Node Approach [0.0]
The Lightning Network, known for its millisecond settlement speeds and low transaction fees, offers a compelling alternative to traditional payment processors. This is particularly significant for the unbanked population, which lacks access to standard financial services. Our research targets businesses looking to shift their client-to-client payment processes, such as B2B invoicing, remittances, and cross-border transactions, to the Lightning Network.
arXiv Detail & Related papers (2023-12-06T21:35:19Z)
Center Focusing Network for Real-Time LiDAR Panoptic Segmentation [58.1194137706868]
A novel center focusing network (CFNet) is introduced to achieve accurate and real-time LiDAR panoptic segmentation. CFFE is proposed to explicitly understand the relationships between the original LiDAR points and virtual instance centers. Our CFNet outperforms all existing methods by a large margin and is 1.6 times faster than the most efficient method.
arXiv Detail & Related papers (2023-11-16T01:52:11Z)
GraphCC: A Practical Graph Learning-based Approach to Congestion Control in Datacenters [6.47712691414707]
Congestion Control (CC) plays a fundamental role in optimizing traffic in Data Center Networks (DCN). This paper presents GraphCC, a novel Machine Learning-based framework for in-network CC optimization.
arXiv Detail & Related papers (2023-08-09T12:04:41Z)
A Deep Reinforcement Learning Framework for Optimizing Congestion Control in Data Centers [2.310582065745938]
Various congestion control protocols have been designed to achieve high performance in different network environments. Modern online learning solutions that delegate the congestion control actions to a machine cannot properly converge in the stringent time scales of data centers. We leverage multiagent reinforcement learning to design a system for dynamic tuning of congestion control parameters at end-hosts in a data center.
arXiv Detail & Related papers (2023-01-29T22:08:35Z)
Bandwidth-efficient distributed neural network architectures with application to body sensor networks [73.02174868813475]
This paper describes a conceptual design methodology to design distributed neural network architectures. We show that the proposed framework enables up to a factor of 20 in bandwidth reduction with minimal loss. While the application focus of this paper is on wearable brain-computer interfaces, the proposed methodology can be applied in other sensor network-like applications as well.
arXiv Detail & Related papers (2022-10-14T12:35:32Z)
Impact of RoCE Congestion Control Policies on Distributed Training of DNNs [7.573461420853252]
We analyze some of the SOTA RoCE congestion control schemes vs. PFC when running on distributed training platforms. Our results indicate that previously proposed RoCE congestion control schemes have little impact on the end-to-end performance of training workloads.
arXiv Detail & Related papers (2022-07-22T06:29:17Z)
Implementing Reinforcement Learning Datacenter Congestion Control in NVIDIA NICs [64.26714148634228]
Congestion control (CC) algorithms become extremely difficult to design. It is currently not possible to deploy AI models on network devices due to their limited computational capabilities. We build a computationally-light solution based on a recent reinforcement learning CC algorithm.
arXiv Detail & Related papers (2022-07-05T20:42:24Z)
Decentralized Control with Graph Neural Networks [147.84766857793247]
We propose a novel framework using graph neural networks (GNNs) to learn decentralized controllers. GNNs are well-suited for the task since they are naturally distributed architectures and exhibit good scalability and transferability properties. The problems of flocking and multi-agent path planning are explored to illustrate the potential of GNNs in learning decentralized controllers.
arXiv Detail & Related papers (2020-12-29T18:59:14Z)
Proximity-based Networking: Small world overlays optimized with particle swarm optimization [0.0]
Small world networks can be incredibly useful in the dissemination and lookup of information within an internet network. We propose a networking scheme that incorporates geographic location in chord for the organization of peers within each node's partitioned key space. The flexibility of our proposed schemes enables a variety of swarm models, and agents.
arXiv Detail & Related papers (2020-06-03T01:40:46Z)
Decentralized Learning for Channel Allocation in IoT Networks over Unlicensed Bandwidth as a Contextual Multi-player Multi-armed Bandit Game [134.88020946767404]
We study a decentralized channel allocation problem in an ad-hoc Internet of Things network underlaying on the spectrum licensed to a primary cellular network. Our study maps this problem into a contextual multi-player, multi-armed bandit game, and proposes a purely decentralized, three-stage policy learning algorithm through trial-and-error.
arXiv Detail & Related papers (2020-03-30T10:05:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.