Performance of Wireguard on Infiniband 40G

Baptiste Jonglez baptiste at bitsofnetworks.org
Sun May 14 11:55:52 CEST 2017


On Sun, May 14, 2017 at 12:52:11AM +0200, Jason A. Donenfeld wrote:
> One small and unfortunate thought just occurred to me: the backporting
> to really old kernels I'm pretty sure is way less efficient than newer
> kernels on the RX, due to some missing core fast-path APIs in the old
> kernels. In particular, I had to wrap the UDP layer with some nasty
> hacks to get packets out, whereas newer kernels have an elegant API
> for that which integrates in the right place. Just a thought... I
> haven't actually done concrete measurements though.

Good idea, I have redone the same setup with kernel 4.9.18 from
jessie-backports.

TL;DR: when switching from kernel 3.16 to 4.9, wireguard has a 50%
performance gain in the most favourable case (large MTU).  Also, iperf
seems generally faster than iperf3, most likely because iperf3 has no
multi-threading.


The full results, still over Infiniband 40G, are:

- unidirectional iperf[1 thread] with 1420 MTU: 2.1 Gbit/s
  (instead of 1.6 Gbit/s with kernel 3.16)

- bidirectional iperf[1 thread] with 1420 MTU: 780 Mbit/s + 1.0 Gbit/s
  (instead of 700 Mbit/s + 800 Mbit/s with kernel 3.16)

- unidirectional iperf[8 threads] with 65450 MTU: 11.4 Gbit/s
  (instead of 7.6 Gbit/s with kernel 3.16)

Without wireguard, as a baseline:

- unidirectional iperf[8 threads] with 65450 MTU: 23.3 Gbit/s
  (instead of 21.7 Gbit/s with kernel 3.16)

So, the new kernel definitely improved performance: by 7% for iperf, and
by up to 50% for wireguard + iperf.
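
As a quick check of those percentages, here is a minimal Python sketch
that recomputes the relative gains from the figures listed above.  The
dictionary keys and the gain_percent helper are just illustrative names,
not part of any benchmark tooling.

```python
# Throughput in Gbit/s, copied from the results above:
# (kernel 3.16, kernel 4.9) for each unidirectional test.
results = {
    "wireguard, 1 thread, MTU 1420": (1.6, 2.1),
    "wireguard, 8 threads, MTU 65450": (7.6, 11.4),
    "baseline, 8 threads, MTU 65450": (21.7, 23.3),
}


def gain_percent(old, new):
    """Relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


gains = {name: gain_percent(old, new) for name, (old, new) in results.items()}

for name, gain in gains.items():
    print(f"{name}: {gain:+.1f}%")
```

This gives roughly 31% for wireguard at 1420 MTU, 50% for wireguard at
65450 MTU, and 7% for the baseline without wireguard.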

> > - iperf 2.0.5
> 
> iperf2 has the -b bidirectional mode which is nice, but it seems like
> most people are using iperf3 now. Out of curiosity, is there a reason
> for preferring iperf2, beyond the -b switch?

As I said, it was just a quick test (to see if it worked fine with
Jessie's 3.16 kernel).  Iperf was already installed but Iperf3 was not.

It turns out that iperf3 tops out lower than iperf in this setup, most
likely because iperf is multi-threaded but iperf3 is not.  For the
baseline test (without wireguard):

- iperf[1 thread]:   13.7 Gbit/s
- iperf[8 threads]:  23.4 Gbit/s
- iperf3[1 stream]:  16.8 Gbit/s 
- iperf3[8 streams]: 13.6 Gbit/s

This was with iperf 2.0.5 and iperf3 3.0.7 (jessie).

Just to be sure, with more recent versions (iperf 2.0.9, iperf3 3.1.3):

- iperf[1 thread]:   13.6 Gbit/s
- iperf[8 threads]:  23.3 Gbit/s
- iperf3[1 stream]:  16.8 Gbit/s 
- iperf3[8 streams]: 13.6 Gbit/s

So, the behaviour is the same: with 8 parallel streams, iperf is faster
than iperf3 thanks to multi-threading (even though a single iperf3 stream
beats a single iperf thread).
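
To make the scaling difference explicit, here is a small Python sketch
that computes how throughput changes when going from 1 to 8 parallel
streams, using the baseline numbers from the more recent versions above.
The variable names are only illustrative.

```python
# Baseline throughput in Gbit/s without wireguard
# (iperf 2.0.9 and iperf3 3.1.3), keyed by number of threads/streams.
baseline = {
    "iperf": {1: 13.6, 8: 23.3},
    "iperf3": {1: 16.8, 8: 13.6},
}

# Ratio of 8-stream throughput to 1-stream throughput for each tool.
scaling = {tool: runs[8] / runs[1] for tool, runs in baseline.items()}

for tool, factor in scaling.items():
    print(f"{tool}: x{factor:.2f} when going from 1 to 8 streams")
```

A factor above 1 means the tool benefits from parallel streams; here
iperf gains about 70%, while iperf3 actually loses throughput.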

I also tested through wireguard:

- unidirectional iperf3[1 stream] with 65450 MTU: 6.47 Gbit/s
  (instead of 6.42 Gbit/s with iperf[1 thread])

- unidirectional iperf3[8 streams] with 65450 MTU: 10.9 Gbit/s
  (instead of 11.4 Gbit/s with iperf[8 threads])

> > - Xeon E5520 @2.27GHz (2 CPUs, 4 cores each)
> > - Mellanox ConnectX IB 4X QDR MT26428
> 
> *drools* That's some awesome hardware!

Well, it's not my hardware :)  But it's not exactly new, it dates back
to 2009.