Noise_IKpsk2_25519_ChaChaPoly_BLAKE2s first message benchmarks

Jason A. Donenfeld Jason at zx2c4.com
Tue May 23 14:58:57 CEST 2017


Hey folks,

[Noise-related, but CCing curves@, since this essentially amounts to a
benchmark of 25519.]

I added multi-core handshake processing to WireGuard this afternoon.
With that in place, I decided to run some tests on how many real life
network packets could be handled. To do this, I simply replayed the
same valid initiation packet over and over, from localhost, which
means each packet was processed all the way up to the
timestamp/counter in the payload, at which point it was recognized as
a replay and discarded. So pretty much all the Noise calculations were
being executed. Measurements below are in kilo-packets per second;
each packet requires 2 ECDH() calls and a bunch of hashing.

Intel(R) Xeon(R) CPU E3-1505M v5 @ 2.80GHz

AVX-accelerated ChaCha20Poly1305, Blake2s, Curve25519 (sandy2x):
multi-core: 48k/second
single-core: 10k/second

AVX-accelerated ChaCha20Poly1305, Blake2s | Curve25519-donna 64-bit:
multi-core: 42k/second
single-core: 8.8k/second

Reference C ChaCha20Poly1305, Blake2s | Curve25519-donna 64-bit:
multi-core: 41k/second
single-core: 8.6k/second

Having accelerated hashing and encryption helps only a _little_,
whereas having accelerated ECDH helps _a bit more than a little_ but
still not _tons and tons_.
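
To put rough numbers on "a little" versus "a bit more than a little", here is a small sketch that computes the relative gains directly from the figures above. The dictionary keys and the function name are made up for illustration; only the throughput values come from the measurements.

```python
# Throughput figures copied from the measurements above (kilo-packets/second).
# The configuration labels are illustrative names, not WireGuard identifiers.
RESULTS = {
    "avx_all_sandy2x": {"multi": 48.0, "single": 10.0},
    "avx_sym_donna64": {"multi": 42.0, "single": 8.8},
    "ref_c_donna64": {"multi": 41.0, "single": 8.6},
}


def speedup(fast, slow, mode):
    """Return the throughput ratio of configuration `fast` over `slow`."""
    return RESULTS[fast][mode] / RESULTS[slow][mode]


if __name__ == "__main__":
    for mode in ("multi", "single"):
        # Gain from AVX hashing/AEAD alone (ECDH held at donna 64-bit).
        sym = speedup("avx_sym_donna64", "ref_c_donna64", mode)
        # Gain from swapping donna 64-bit for sandy2x (AVX hashing/AEAD held).
        ecdh = speedup("avx_all_sandy2x", "avx_sym_donna64", mode)
        print(f"{mode}-core: AVX hashing/AEAD +{100 * (sym - 1):.1f}%, "
              f"sandy2x +{100 * (ecdh - 1):.1f}%")
```

On these figures the accelerated hashing and encryption are worth roughly 2%, and sandy2x roughly 14%, in both the multi-core and single-core runs.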

I found that on this hardware, with an incoming packet queue length of
4096, and a "do not process unless a mac2 is present, thereby
requiring a cookie reply message" max queue depth of 512, I was able
to fend off a localhost-based (read: infinite bandwidth) DoS attack.
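
The two thresholds can be read as a simple admission policy. The following is only a sketch of that reading, not the actual WireGuard code: the function name, the return labels, and the choice of `>=` at each threshold are assumptions made for illustration.

```python
MAX_QUEUE_LEN = 4096        # incoming packet queue length, from the text above
MAC2_REQUIRED_DEPTH = 512   # depth beyond which a mac2 is required, from the text

def admit(queue_depth, has_valid_mac2):
    """Classify an incoming initiation given the current queue depth.

    Hypothetical helper: returns "process", "cookie_reply", or "drop".
    """
    if queue_depth >= MAX_QUEUE_LEN:
        # Queue is full: nothing more can be accepted.
        return "drop"
    if queue_depth >= MAC2_REQUIRED_DEPTH and not has_valid_mac2:
        # Under load: skip the expensive ECDH work and answer with a
        # cookie reply, so the sender has to come back with a mac2.
        return "cookie_reply"
    # Lightly loaded, or the sender already proved itself with a mac2.
    return "process"
```

The point of the lower threshold is that a cookie reply is cheap compared to the two ECDH() calls, so under load the expensive work is reserved for senders that have completed the cookie round trip.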

Given that IK computes two ECDH() in the first message, are these
measurements roughly how you'd expect 25519 to perform? Is it "expected"
that the difference between donna and sandy2x isn't that massive?
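
For comparing against published 25519 cycle counts, the single-core figures can be turned into a rough per-ECDH cycle estimate. This is back-of-the-envelope arithmetic under two loud assumptions: the core runs at the nominal 2.80GHz from the CPU string (turbo is ignored, which underestimates), and the whole per-packet cost is attributed to the two ECDH() calls (hashing and everything else is ignored, which overestimates). The function name is made up.

```python
NOMINAL_HZ = 2.80e9    # nominal clock from the CPU string; turbo is ignored
ECDH_PER_PACKET = 2    # IK computes two ECDH() in the first message


def cycles_per_ecdh_estimate(packets_per_second):
    """Rough cycles per ECDH, charging all per-packet time to ECDH."""
    cycles_per_packet = NOMINAL_HZ / packets_per_second
    return cycles_per_packet / ECDH_PER_PACKET


if __name__ == "__main__":
    # Single-core figures from the measurements above, in packets/second.
    for name, pps in (("sandy2x", 10_000), ("donna 64-bit", 8_800)):
        print(f"{name}: ~{cycles_per_ecdh_estimate(pps):,.0f} cycles/ECDH")
```

With those assumptions the single-core numbers work out to about 140k cycles per ECDH for sandy2x and about 159k for donna 64-bit.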

Jason

