Network latency and CPU C-States

A year ago I replaced my home router with a small Intel-based PC, and after some time I noticed that this device has relatively high network latency for local Ethernet.

For example:

$ ping -c 4 192.168.1.1
PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=1.34 ms
64 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=1.46 ms
64 bytes from 192.168.1.1: icmp_seq=3 ttl=64 time=1.23 ms
64 bytes from 192.168.1.1: icmp_seq=4 ttl=64 time=1.16 ms

--- 192.168.1.1 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 8ms
rtt min/avg/max/mdev = 1.157/1.293/1.455/0.118 ms

A latency of 1.2 ms or more is really huge for a small LAN; in some cases it’s even higher than what some WiFi devices show. I got curious and decided to investigate further.

The PC’s motherboard has an Intel I211 Gigabit NIC, and some sources claim that this device is relatively low-latency, which makes the case even more interesting. At first I thought the latency was introduced somewhere in the kernel, so I isolated the device from traffic and started debugging with bpftrace. Isolating the device from network traffic and debugging over a serial port let me use much simpler eBPF programs, since I could assume that every IRQ received from the Ethernet device was generated by my ICMP echo probes.

So I tried something like this:

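#!/usr/bin/env bpftrace
// Remember the timestamp of the latest IRQ from the NIC's RX queues.
tracepoint:irq:irq_handler_entry /args->irq == 152 || args->irq == 153/ {
    @start = nsecs;
}
// Print the nanoseconds elapsed between that IRQ and the ICMP echo handler.
kprobe:icmp_echo {
    printf("%d\n", nsecs - @start);
}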

Again, this produces correct results only if every IRQ from the Ethernet device is generated by our ICMP probes. Here 152 and 153 are the IRQ numbers assigned to this Ethernet device.

$ cat /proc/interrupts | egrep '(152|153):'
 152:          0        111        440       1362  IR-PCI-MSI 2621441-edge      eth5-rx-0
 153:        137          0         80         10  IR-PCI-MSI 2621442-edge      eth5-rx-1

But the results looked quite normal. In the meantime I noticed that latency isn’t always as high as 1.2 ms and that it depends on CPU load: it dropped while I was grepping through the kernel sources. Then it occurred to me that the latency could be caused by power management code, and the first thing that came to mind was Intel C-States. So I decided to disable them and see what happens. With C-States disabled the device started to respond 6 times faster: latency dropped to 200 µs.

Here’s an asciicast that demonstrates ping latency with different C-States.
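For reference, one way to switch C-States off at runtime on Linux is the per-state `disable` file that the cpuidle subsystem exposes in sysfs. The following is a minimal sketch of that approach, not necessarily the exact method used for the tests above. The function name `set_cstates` and the `CPU_ROOT` override are made up for illustration, and writing to the real files requires root.

```shell
#!/bin/sh
# Sketch: disable (1) or re-enable (0) every idle state with an index at or
# above a given value, on all CPUs, via the cpuidle sysfs interface.
# CPU_ROOT can be overridden to try the function on a fake directory tree.
CPU_ROOT="${CPU_ROOT:-/sys/devices/system/cpu}"

# set_cstates MIN_INDEX VALUE
#   MIN_INDEX: lowest state index to touch (state0 is usually the POLL state)
#   VALUE:     1 = disable the state, 0 = allow it again
set_cstates() {
    min_index="$1"
    value="$2"
    for f in "$CPU_ROOT"/cpu[0-9]*/cpuidle/state[0-9]*/disable; do
        [ -e "$f" ] || continue           # the glob matched nothing
        state_dir="${f%/disable}"         # .../cpuidle/stateN
        index="${state_dir##*/state}"     # N
        if [ "$index" -ge "$min_index" ]; then
            echo "$value" > "$f"
        fi
    done
}
```

Calling `set_cstates 1 1` would leave only state0 available, and `set_cstates 1 0` would allow all states again.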

I’ve found another post about this from 2011, but the latency in that post doesn’t jump as high as in my case. I guess this is because we now have more C-States and they are much more aggressive. Interestingly, C-States don’t seem to affect the latency of USB devices: when I ran tests with an old USB-to-Ethernet dongle, it wasn’t affected in any way, and on a system with all C-States enabled it performed better (in terms of latency) than the PCI-E Ethernet, staying at 0.4 ms almost all the time. I guess this is because USB uses polling rather than interrupts.

There is another interesting point. According to the /sys/devices/system/cpu/cpu0/cpuidle/state8/latency file, the maximum C-State latency should be 890 µs, and as you saw, ping latency with C-States disabled is around 200 µs. So I would guess that in the worst case C-States could push latency up to about 1.1 ms (890 µs + 200 µs), maybe 1.2 or even 1.3 ms. But sometimes it jumps up to 1.6 ms for some reason.
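The estimate is plain addition, so it is easy to reproduce on another machine. Here is a small sketch, assuming the cpuidle `latency` files hold exit latencies in microseconds; `max_exit_latency_us` and `worst_case_us` are hypothetical helper names, and the baseline has to come from your own ping measurement.

```shell
#!/bin/sh
# Print the largest exit latency (in microseconds) among the idle states of one CPU.
# The CPU directory is a parameter so the function can be tried on a fake tree.
max_exit_latency_us() {
    cpu_dir="${1:-/sys/devices/system/cpu/cpu0}"
    max=0
    for f in "$cpu_dir"/cpuidle/state[0-9]*/latency; do
        [ -r "$f" ] || continue           # the glob matched nothing
        value=$(cat "$f")
        if [ "$value" -gt "$max" ]; then
            max="$value"
        fi
    done
    echo "$max"
}

# worst_case_us BASELINE_US EXIT_LATENCY_US
# Rough upper bound: latency without C-States plus the deepest exit latency.
worst_case_us() {
    echo $(( $1 + $2 ))
}
```

With the numbers from this post, `worst_case_us 200 890` prints 1090, which is the roughly 1.1 ms mentioned above.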

I must also admit that there are many other possible causes of increased device latency, interrupt coalescing for example. It wasn’t causing any trouble for me, though:

$ sudo ethtool --show-coalesce eth5
Coalesce parameters for eth5:
Adaptive RX: off  TX: off
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0

rx-usecs: 3
rx-frames: 0
rx-usecs-irq: 0
rx-frames-irq: 0

tx-usecs: 3
tx-frames: 0
tx-usecs-irq: 0
tx-frames-irq: 0

rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0

rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0

To be honest, I wasn’t satisfied with disabling C-States, because it increased power consumption significantly (by around 16%). Disabling only some C-States looked like a much better solution, but I decided to write my own service that changes the allowed C-States depending on the amount of traffic passing through the device. When nobody is using the network, or is using it only lightly, the device can use deep C-States (C7 and higher); when someone starts to use the network heavily enough, these deep C-States are disabled.
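The idea can be sketched in a few lines of shell. This is only an illustration of the approach, not the actual service: the threshold, the sampling interval, the state index treated as "deep" (`DEEP_FROM`) and all function names are assumptions, and the right index for C7 has to be looked up in the `name` files of your own machine.

```shell
#!/bin/sh
# Sketch of a traffic-driven C-State switch. Every tunable below is an assumption.
IFACE="${IFACE:-eth5}"                 # interface name used in this post
DEEP_FROM="${DEEP_FROM:-5}"            # assumed first cpuidle index counted as "deep"
THRESHOLD_PPS="${THRESHOLD_PPS:-50}"   # assumed packets/s above which deep states go off
INTERVAL="${INTERVAL:-1}"              # whole seconds between samples
CPU_ROOT="${CPU_ROOT:-/sys/devices/system/cpu}"
NET_ROOT="${NET_ROOT:-/sys/class/net}"

# want_disable PPS THRESHOLD: print 1 if deep states should be disabled, else 0.
want_disable() {
    if [ "$1" -gt "$2" ]; then echo 1; else echo 0; fi
}

# apply_deep_states VALUE: write VALUE to "disable" of every state >= DEEP_FROM.
apply_deep_states() {
    for f in "$CPU_ROOT"/cpu[0-9]*/cpuidle/state[0-9]*/disable; do
        [ -e "$f" ] || continue
        dir="${f%/disable}"
        if [ "${dir##*/state}" -ge "$DEEP_FROM" ]; then
            echo "$1" > "$f"
        fi
    done
}

# Sum of the received and transmitted packet counters of the interface.
read_packets() {
    rx=$(cat "$NET_ROOT/$IFACE/statistics/rx_packets")
    tx=$(cat "$NET_ROOT/$IFACE/statistics/tx_packets")
    echo $(( rx + tx ))
}

# Sample the counters forever and flip the deep states only when the decision changes.
run_loop() {
    prev=$(read_packets)
    current=""
    while sleep "$INTERVAL"; do
        now=$(read_packets)
        pps=$(( (now - prev) / INTERVAL ))
        prev="$now"
        wanted=$(want_disable "$pps" "$THRESHOLD_PPS")
        if [ "$wanted" != "$current" ]; then
            apply_deep_states "$wanted"
            current="$wanted"
        fi
    done
}
```

Running `run_loop` as root (for example from a systemd unit) would start the switching. A real service would probably want some hysteresis so that a short burst of packets doesn’t toggle the states back and forth.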

If you are interested in power management and its connection with latency on Intel platforms, I highly recommend checking the Linux cpuidle driver documentation.