Hi Simon! Good timing on your email; just getting started with my coffee here ;->
On Wed, Feb 14, 2024 at 6:55 AM Simon Leinen <simon.leinen@switch.ch> wrote:
>> If you ignore any marketing, this NVIDIA doc is a good read: https://resources.nvidia.com/en-us-accelerated-networking-resource-library/n...
> Wasting energy on another mailing list post to reduce energy (and time) waste from following the b0rked pointer:
> https://resources.nvidia.com/en-us-accelerated-networking-resource-library/n...
> Nice find, thanks for sharing, Jamal!
> An interesting observation is that specialized silicon (here, the DPUs) generate significant savings (only) when the system is under high load. The devil's advocate would argue that most real-world systems spend most of their life in lower-load regimes, and that for such systems, the energy savings under high load must be traded off against the base overhead of keeping the special-purpose silicon "lit" even during the (probably dominant) lower-load times.
It goes without saying that if you don't have a loaded system (i.e., one running under capacity) there is nothing to improve on.
In my reading of the paper, though, I see the case being made for DPUs hinged on one thing: if you are using X servers and they are running at low capacity, maybe you don't need X servers. Move the workloads instead into VMs/containers and squeeze all of that into X-Y servers. Now you have systems likely to be loaded over 50%, and DPUs make sense. The argument then builds into: if you reduce the number of hosts, you reduce the amount of power consumed and, more importantly, you don't need to build the extra cooling infrastructure in your data centre that was needed to accommodate X hosts (which, according to the charts, is up to 40% of the power costs in data centres).
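To put rough numbers on the consolidation argument, here is a back-of-envelope sketch; all the figures below (server counts, utilizations, the linear power model) are illustrative assumptions of mine, not numbers from the white paper:

import math

N_SERVERS = 20          # X: servers you run today (assumed)
AVG_UTIL = 0.25         # each loaded to ~25% (assumed)
TARGET_UTIL = 0.60      # pack VMs/containers until hosts sit >50% busy
IDLE_W, BUSY_W = 100.0, 350.0   # assumed per-server power at 0%/100%

def server_watts(util):
    # crude linear power model between idle and full load (assumption)
    return IDLE_W + util * (BUSY_W - IDLE_W)

work = N_SERVERS * AVG_UTIL                    # total work held constant
consolidated = math.ceil(work / TARGET_UTIL)   # X-Y servers after packing

before = N_SERVERS * server_watts(AVG_UTIL)
after = consolidated * server_watts(work / consolidated)
print(f"{N_SERVERS} hosts @ {AVG_UTIL:.0%}: {before:.0f} W")
print(f"{consolidated} hosts @ {work / consolidated:.0%}: {after:.0f} W")

And every host you remove is also cooling you don't have to build, which is where the up-to-40% figure from the charts kicks in on top.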
Cloud vendors do this - but it doesn't seem like anyone else does. The motivation for cloud vendors is clear: host CPU cycles provide $ from customers. It sounds like enterprise/telco types mostly plan around "why do I care, I will replace my server in 3 years with the latest and greatest CPU Big Intel provides me". It's analogous to the QoS counter-argument: the answer to congestion is buying more bandwidth, which is then underutilized... Unfortunately we (engineers) often ignore the operational challenges and think squeezing the cycles is the answer to everything - which often requires above-average skills. The WP is certainly influenced by engineering philosophy more than operational perspective. IOW, it is also possible the motivation for these enterprises/telcos is that they don't have the in-house skills to manage and operate "compressing workloads into hosts", and more hardware is "reasonably" cheaper than hiring geniuses... they wouldn't be using Kubernetes if they really cared about power (or performance) ;->
There is another argument for DPUs (which is not being made in the WP) that I have seen; I can't remember which paper, but it was MS making that argument on why they offload, and I think I have seen some P4 folks from Google repeat that view: instead of refreshing your servers every 3 years, keep them longer and offload more as the newer workloads get more intense.
> In particular, the white paper has some cost savings projections (tables 1 - 3) - which is nice - from power saved by DPUs over three years for different applications, but the assumption there seems to be that the servers are 100% utilized over all of these three years, which seems quite contrived, at the very least for the telco workloads.
> So for workloads with typical time-of-day/seasonal variations, where you have to provision for maximum load, the savings might be significantly lower in practice - CPU frequency scaling seems to do a nice job at low utilization levels today. It may be hard to justify the investment in (and presumable increase in base power consumption of) DPUs.
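To make the 100%-utilization assumption concrete, a toy calculation; the watt figures and the day/night profile are made up for illustration, only the USD 0.15/kWh is from the WP:

KWH_PRICE = 0.15                 # USD/kWh, the figure quoted in the WP
HOURS_3Y = 3 * 365 * 24
SAVED_W_FULL_LOAD = 100.0        # assumed watts a DPU offloads at full load

def dollars(avg_saved_watts):
    return avg_saved_watts / 1000 * HOURS_3Y * KWH_PRICE

# flat 100% utilization, as the projections seem to assume:
print(f"pegged 24/7 : ${dollars(SAVED_W_FULL_LOAD):.0f}")

# a telco-ish day: 8 busy hours, 16 quiet hours where frequency
# scaling already keeps the host cheap and the DPU saves little
avg = (8 * SAVED_W_FULL_LOAD + 16 * 10.0) / 24
print(f"diurnal load: ${dollars(avg):.0f}")

Per server, that is roughly $394 over three years pegged versus ~$160 with the diurnal profile, which supports your point about the projections being optimistic for telco workloads.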
The white paper was honest (hard to do for marketing people ;->) in showing that a commodity feature like "CPU micro-sleep and frequency scaling" is a great way to get power savings. I am going to guess Ericsson wanted that in there ;->
I have a different perspective on what you said about DPUs increasing the power base. Consider a 2x25G BF2, which draws ~45W from the PCIe 3 bus. _Under load_ there is a clear win in the experiments we conducted: a high-enough load which offloads ACLs and TLS would cost ~100W if run on the host. The operative term is "under load".
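The break-even arithmetic with those rough numbers, assuming for simplicity that the host cost scales linearly with load (an assumption, not something we measured at every point):

DPU_W = 45.0          # BF2 draw from the PCIe slot, roughly constant
HOST_W_FULL = 100.0   # host CPU cost of the ACL+TLS work at full load

print(f"DPU wins above ~{DPU_W / HOST_W_FULL:.0%} of full offloadable load")
for load in (0.10, 0.45, 0.90):
    host_w = load * HOST_W_FULL   # assumed linear with load
    winner = "DPU" if host_w > DPU_W else "host"
    print(f"load {load:.0%}: host {host_w:5.1f} W vs DPU {DPU_W:.0f} W -> {winner} cheaper")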
You could argue that even if you used a plain non-smart NIC on PCIe 3 it would still consume 45 W to turn on. You can play with PCI registers to lower the power consumption for the non-smart NIC, but that comes at the cost of increased latency and other issues (I don't remember whether Jesse mentioned mucking with ASPM in PCIe). OTOH, I have seen that once you go past PCIe 3, these xPUs provide an extra cable you connect to the motherboard to draw extra power (very similar to GPUs). When asked, vendors would say "it's just insurance in case we need more power on overload" - it's hard to judge if that's the truth.
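For anyone who wants to poke at ASPM, this is the knob Linux exposes via sysfs; whether a given NIC tolerates an aggressive policy (without the latency hit mentioned above) is device-specific:

from pathlib import Path

# standard Linux sysfs knob for the PCIe ASPM policy; the kernel
# prints all policies with the active one in [brackets], e.g.
# "default performance [powersave] powersupersave"
policy = Path("/sys/module/pcie_aspm/parameters/policy")
if policy.exists():
    print("ASPM policy:", policy.read_text().strip())
else:
    print("ASPM not available on this kernel")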
BTW: one strange thing we observed is that CPU power management seems to be "work conserving". We totally shut down more than half the CPUs, and when we ran a high enough load (>90% CPU) on the remaining CPUs, the system drew as much power as if all the CPUs were on! Maybe someone has thoughts on this...
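In case anyone wants to reproduce that, a rough sketch of the experiment; needs root, the RAPL powercap path is Intel-specific, and counter wrap is ignored (all assumptions about your box, not our exact rig):

import time
from pathlib import Path

cpus = sorted(Path("/sys/devices/system/cpu").glob("cpu[0-9]*"))
rapl = Path("/sys/class/powercap/intel-rapl:0/energy_uj")  # Intel-specific

def package_watts(interval=1.0):
    # average package power over `interval` seconds from the RAPL counter
    e0 = int(rapl.read_text())
    time.sleep(interval)
    e1 = int(rapl.read_text())
    return (e1 - e0) / 1e6 / interval

# offline the upper half of the CPUs (cpu0 cannot be offlined)
for cpu in cpus[len(cpus) // 2:]:
    (cpu / "online").write_text("0")

# ...run your >90% load on the remaining CPUs here, then:
print(f"package power: {package_watts():.1f} W")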
> That said, some applications are different (crypto mining comes to mind), and in some places the energy costs are/will be much higher than the USD 0.15/kWh. But still :-)
I am looking at those "savings" shown and I am scratching my head over whether the savings, compared to the cost of hardware, are not a drop in the bucket. Does anybody care if it is USD 0.15/kWh? Of course we care from an environmental PoV, but who else does? One thing the paper doesn't consider is the cost of operations. Humans don't like the inconvenience of change. Quoting USD 0.15/kWh is not going to move things - the pain has to be a lot bigger than that. If you could make it "operationally easy" for people to save, then it is also easier to change their behavior. Or if you make both power and hardware very expensive... The AI craze will help. 800G NICs are available today - and, according to Broadcom, soon over 1Tbps NICs. There is no way hosts today can keep up, other than for a very slim number of use cases (mostly bulk, latency-insensitive workloads where you can reduce the PPS into the host using various tricks like GSO/GRO etc). So xPUs have a role to play despite all that...
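The PPS arithmetic behind that claim; the frame sizes and per-frame wire overhead are rough assumptions for illustration:

LINK_BPS = 800e9            # today's top-end NICs
WIRE_OVERHEAD_B = 24        # preamble + IFG + FCS, roughly

def mpps(frame_bytes):
    return LINK_BPS / ((frame_bytes + WIRE_OVERHEAD_B) * 8) / 1e6

print(f"1500B frames        : {mpps(1500):6.1f} Mpps")   # ~65 Mpps into the host
print(f"64KB GRO aggregates : {mpps(65536):6.2f} Mpps")  # what GSO/GRO buys you

Tens of Mpps of per-packet work is far beyond any sane host core budget, and the aggregation tricks only help the bulk, latency-insensitive cases - hence the role for xPUs.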
cheers, jamal