Qdiscs rely on a global lock to synchronize state across CPUs and therefore don't scale in the presence of many cores (or of very high bandwidth). Jonas Köppeler, Toke Høiland-Jørgensen, and Stefan Schmid implemented a multi-queue variant of sch_cake that can scale its rate limiting across hardware queues (and thus CPU cores) by sharing a bit of state on top of the mq qdisc.
In this talk, they will present the implementation and its performance evaluation, and discuss their proposal for an API that would make this work upstreamable and applicable to other qdiscs as well.
Details: https://netdevconf.info/0x19/16
cheers, jamal