The war on memory bottlenecks[1][2] continues with this talk from David Wei and Pavel Begunkov.
Data received on a socket is first copied/DMA'ed into kernel memory and then copied again into user space memory on recv(), which adds pressure on overall memory bandwidth and, of course, comes with a CPU cost. While there are other approaches that avoid the second copy on recv(), such as DPDK, RDMA, etc., David and Pavel argue that all of them have downsides: being proprietary, needing custom-patched drivers, being difficult to debug and, worse, requiring a rewrite of the applications... Our esteemed speakers instead opt to keep using the network stack as is, with io_uring as the user-facing API. So what do we need to get this working? As in [2], the NIC must support header splitting and RSS flow steering. On incoming data, the headers traverse the TCP stack while the payload is DMA'ed not into kernel memory but directly into user space...
David and Pavel will discuss the overall approach they took and describe in some detail the kernel infrastructure as well as what the uAPI would look like. They will further review the zero-copy approaches that already exist in the kernel and how they plan to coexist with them. Last but not least, they will dig into the limitations and challenges of zero-copy receive and how they overcome them to facilitate a real deployment.
[1] https://netdevconf.info/0x17/sessions/talk/congestion-control-architecture-f...
[2] https://netdevconf.info/0x17/sessions/talk/device-memory-tcp.html
cheers, jamal
Reminder: 2 more days to go for early bird registration