Apologies, I left out the subject line in the last email, so resending this...
Link speeds are going up at an incredible rate: cloud vendors are already asking for 400-800Gbps. Unfortunately, the rest of the host hardware infrastructure is not keeping up. Even at the "low low speed ;->" of 100Gbps, the host CPU speeds, cache sizes, memory access latency, memory bandwidth per core, and NIC buffer sizes are simply overwhelmed. As an example, as few as 6 CPU cores can saturate all available memory bandwidth - now imagine being on a machine with 56 cores (112 hyperthreads)....
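To get a feel for the numbers (my own back-of-the-envelope, not from the talk): 100Gbps is ~12.5GB/s on the wire, each received byte typically crosses the memory bus more than once (DMA write, then at least one copy toward the application), and a single core doing streaming copies can already move on the order of tens of GB/s on a modern server. A quick-and-dirty micro-benchmark like the hypothetical sketch below (again, mine, not the speakers' code) makes the saturation point visible: run it with increasing thread counts and watch the aggregate GB/s flatten after a handful of cores.

/*
 * Hypothetical sketch: spawn N threads, each doing streaming memcpy over a
 * large private buffer, and report rough aggregate GB/s. On many servers the
 * number flattens out after a handful of threads, i.e. a few cores already
 * saturate the memory bus that NIC DMA traffic also has to share.
 *
 * Build: gcc -O2 -pthread membw.c -o membw
 * Run:   ./membw <num_threads>
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define BUF_BYTES   (256UL << 20)       /* 256 MiB per buffer */
#define PASSES      4
#define MAX_THREADS 256

static void *stream_copy(void *arg)
{
	char *src = malloc(BUF_BYTES);
	char *dst = malloc(BUF_BYTES);

	memset(src, 1, BUF_BYTES);              /* fault pages in */
	memset(dst, 0, BUF_BYTES);
	for (int i = 0; i < PASSES; i++) {
		memcpy(dst, src, BUF_BYTES);    /* streaming read + write */
		__asm__ volatile("" : : "r"(dst) : "memory"); /* keep the copy */
	}
	free(src);
	free(dst);
	return arg;
}

int main(int argc, char **argv)
{
	int nthreads = argc > 1 ? atoi(argv[1]) : 1;
	pthread_t tid[MAX_THREADS];
	struct timespec t0, t1;

	if (nthreads < 1 || nthreads > MAX_THREADS)
		nthreads = 1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (int i = 0; i < nthreads; i++)
		pthread_create(&tid[i], NULL, stream_copy, NULL);
	for (int i = 0; i < nthreads; i++)
		pthread_join(tid[i], NULL);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
	/* Each memcpy pass reads and writes BUF_BYTES per thread. Setup
	 * (memset) sits inside the timed region, so treat the result as a
	 * rough lower bound, not a precise measurement. */
	double gbytes = 2.0 * PASSES * (double)BUF_BYTES * nthreads / 1e9;

	printf("%d threads: ~%.1f GB/s aggregate\n", nthreads, gbytes / secs);
	return 0;
}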
In this talk, Saksham Agarwal, Arvind Krishnamurthy and Rachit Agarwal argue that this has brought on trends not seen before: congestion within the host's processor, memory and peripheral interconnects, which of course causes back pressure all the way to the NIC - meaning back to the network - and, alas, congestion within the network itself.
Testing with Linux DCTCP, our speakers found that host congestion causes as much as 1% packet drops at the host, 35-55% throughput degradation, and 120-5000x tail latency inflation!
Agarwal et al. want us to look beyond network heuristics alone to resolve this emerging challenge. As an example: applications generating CPU-to-memory traffic and clogging the memory bus are currently not considered part of the congestion control calculus at all - even though they are most certainly among the culprits.
In this talk our esteemed speakers introduce what they call "hostcc" as a response to these challenges. Hostcc amounts to roughly 800 lines of kernel code changes.
A little more detail: https://netdevconf.info/0x17/6 For the real meat and potatoes, come to the talk of course!
cheers, jamal
PS: 5 more days left for early-bird registration. Visit netdevconf.info/0x17/registration