On Fri, 29 Sep 2023 07:23:31 -0400 Jamal Hadi Salim via people <people@netdevconf.info> wrote:
Traditional host network performance measurement is based on analyzing the correlation between the two classical metrics, throughput and latency, against CPU utilization. This approach leaves out an important metric: power utilization. The huge influx of AI infrastructure, which ends up consuming network resources, has brought much-needed attention to power consumption as a variable in network infrastructure. Given that the operational cost of power is rising (significantly in some parts of the world with the ongoing crisis), we cannot ignore networking infrastructure's contribution, not just to cost but also to the environmental harm brought about by high power use.
Nabil Bitar, Jamal Hadi Salim and Pedro Tammela feel that Linux, which dominates data center deployments, deserves special attention. There is not much literature or shared experience in this space, so they hope to inspire a discussion by sharing their experiences with the community: How would one go about measuring power utilization for network workloads? How do we correlate metrics such as perf counters, throughput, etc. to the power utilized? How would one go about saving power while still achieving an application's stated end goals?
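One concrete way to attack the first question is to sample the CPU package energy counters that Linux exposes through the powercap (RAPL) sysfs interface around a traffic run. Below is a minimal sketch, not anything from the presenters: the intel-rapl:0 path assumes an Intel system with powercap enabled, and the 10-second window is an arbitrary stand-in for the actual workload.

/* Sample a RAPL package energy counter before and after a traffic
 * run and derive average power over the window. */
#include <stdio.h>
#include <unistd.h>

static long long read_energy_uj(void)
{
	FILE *f = fopen("/sys/class/powercap/intel-rapl:0/energy_uj", "r");
	long long uj = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%lld", &uj) != 1)
		uj = -1;
	fclose(f);
	return uj;
}

int main(void)
{
	long long before = read_energy_uj();

	sleep(10);			/* run the network workload here */

	long long after = read_energy_uj();

	if (before < 0 || after < 0)
		return 1;
	/* energy_uj wraps at max_energy_range_uj; a real tool must
	 * detect and correct for the wrap. */
	printf("avg power: %.2f W\n", (after - before) / 10.0 / 1e6);
	return 0;
}

Dividing the same energy delta by the bytes moved in the window gives a joules-per-byte figure that can sit next to throughput and latency in the usual correlation.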
One of the more common ways to increase network performance is a poll-mode architecture: the Linux kernel does this with its NAPI settings, and DPDK takes full advantage of it. The problem is that this trades power for performance. A multitude of heuristics have been tried to optimize the CPU-versus-performance tradeoff, but there is no complete solution; it is the networking analog of the kernel scheduler problem.
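To make the tradeoff concrete, here is a minimal sketch of the per-socket form of the knob, the SO_BUSY_POLL socket option; the 50-microsecond budget is an illustrative value, not a recommendation.

#include <stdio.h>
#include <sys/socket.h>

int main(void)
{
	int busy_poll_usecs = 50;	/* example budget, in microseconds */
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return 1;
	/* Spin in the driver for up to busy_poll_usecs on a blocking
	 * read instead of sleeping until the interrupt fires: lower
	 * latency, higher power. May need CAP_NET_ADMIN on older
	 * kernels. */
	if (setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL,
		       &busy_poll_usecs, sizeof(busy_poll_usecs)) < 0)
		perror("SO_BUSY_POLL");
	return 0;
}

The same tradeoff can be made system-wide through the net.core.busy_poll and net.core.busy_read sysctls.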
Since the network is idle most of the time, continuous polling wastes power. I have seen efforts that work by changing the number of queues, adjusting the CPU clock frequency, etc.
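As an illustration of the queue-count knob, here is a sketch of driving it programmatically through the same ETHTOOL_GCHANNELS/ETHTOOL_SCHANNELS ioctls that "ethtool -L" uses; "eth0" and the target of two combined queues are placeholders.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

int main(void)
{
	struct ethtool_channels ch = { .cmd = ETHTOOL_GCHANNELS };
	struct ifreq ifr = { 0 };
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1);
	ifr.ifr_data = (char *)&ch;

	if (fd < 0 || ioctl(fd, SIOCETHTOOL, &ifr) < 0) {
		perror("ETHTOOL_GCHANNELS");
		return 1;
	}
	printf("combined queues: %u (max %u)\n",
	       ch.combined_count, ch.max_combined);

	/* Shrink to two combined queues (needs CAP_NET_ADMIN); fewer
	 * queues means fewer NAPI contexts polling a mostly idle NIC. */
	ch.cmd = ETHTOOL_SCHANNELS;
	ch.combined_count = 2;
	if (ioctl(fd, SIOCETHTOOL, &ifr) < 0)
		perror("ETHTOOL_SCHANNELS");
	close(fd);
	return 0;
}

CPU clock frequency is usually adjusted from the other side, through the cpufreq governors rather than the networking stack.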