[WireGuard] Major Queueing Algorithm Simplification
Jason A. Donenfeld
Jason at zx2c4.com
Fri Nov 4 14:24:58 CET 2016
Hey,
This might be of interest...
Before, every time I got a GSO superpacket from the kernel, I'd split
it into little packets, and then queue each little packet as a
different parallel job.
Now, every time I get a GSO superpacket from the kernel, I split it
into little packets and queue up that whole bundle of packets as a
single parallel job. This means that each GSO superpacket expansion
gets processed on a single CPU. This greatly simplifies the algorithm
and delivers very impressive throughput gains.
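A toy model of the two scheduling schemes may make the difference concrete. This is a sketch in Python, not WireGuard's kernel code. The 1500-byte segment size, the round-robin assignment of jobs to CPUs, and the names `segment`, `schedule_per_packet` and `schedule_per_bundle` are all hypothetical simplifications. The model reports which CPUs touch the segments of each superpacket.

```python
from itertools import cycle

# Hypothetical per-segment size (roughly one Ethernet MTU); headers ignored.
SEG_SIZE = 1500


def segment(superpacket_len, seg_size=SEG_SIZE):
    """Split a GSO superpacket of superpacket_len bytes into segment lengths."""
    full, rest = divmod(superpacket_len, seg_size)
    return [seg_size] * full + ([rest] if rest else [])


def schedule_per_packet(superpackets, ncpus):
    """Old scheme: every segment is its own job.

    Assumption: jobs are handed to CPUs round-robin.
    Returns, per superpacket, the CPU id that handles each segment.
    """
    cpus = cycle(range(ncpus))
    return [[next(cpus) for _ in segment(sp)] for sp in superpackets]


def schedule_per_bundle(superpackets, ncpus):
    """New scheme: the whole bundle of segments is one job on one CPU.

    Same round-robin assumption, but applied per superpacket.
    """
    cpus = cycle(range(ncpus))
    out = []
    for sp in superpackets:
        cpu = next(cpus)
        out.append([cpu] * len(segment(sp)))
    return out


if __name__ == "__main__":
    gso = [65536, 65536]  # two maximal-size superpackets
    old = schedule_per_packet(gso, ncpus=8)
    new = schedule_per_bundle(gso, ncpus=8)
    # Which CPUs must finish before superpacket 0 can be sent in order?
    print(sorted(set(old[0])))  # [0, 1, 2, 3, 4, 5, 6, 7]
    print(sorted(set(new[0])))  # [0]
```

Under these assumptions, the per-packet scheme scatters pieces of the first superpacket across all eight CPUs, so all of them have to finish before it can go out in order. The per-bundle scheme keeps it on one CPU.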
In practice, this means that if you call send(tcp_socket_fd,
buffer, biglength), each 65k contiguous chunk of buffer (one GSO
superpacket) will be encrypted on the same CPU. Before, only each 1.5k
contiguous chunk (one packet) was guaranteed to stay on the same CPU.
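Rough arithmetic, assuming the "65k" chunk is a maximal 65536-byte GSO superpacket and the "1.5k" chunk is a 1500-byte packet (header overhead ignored):

```latex
\left\lceil \frac{65536\ \text{bytes}}{1500\ \text{bytes}} \right\rceil
  = \lceil 43.69\ldots \rceil = 44
```

So, under those assumptions, one superpacket used to become about 44 separate jobs that could each land on a different CPU; now it becomes one job.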
I had thought about doing this a long time ago but didn't, for
reasons that are now fuzzy to me. I believe it had something to do
with latency. Now, though, I think this solution will actually
reduce latency on systems with many cores, since those cores no
longer all have to be synchronized before a bundle can be sent
out. I haven't measured this yet, and I welcome any such tests. The
magic commit for this is [1], if you'd like to compare before and
after.
Are there any obvious objections I've overlooked with this simplified approach?
Thanks,
Jason
[1] https://git.zx2c4.com/WireGuard/commit/?id=7901251422e55bcd55ab04afb7fb390983593e39