Noise_IKpsk2_25519_ChaChaPoly_BLAKE2s first message benchmarks
Jason A. Donenfeld
Jason at zx2c4.com
Tue May 23 14:58:57 CEST 2017
[Noise-related, but CCing curves@, since this essentially amounts to a
benchmark of 25519.]
I added multi-core handshake processing to WireGuard this afternoon.
With that in place, I decided to run some tests on how many real life
network packets could be handled. To do this, I simply replayed the
same valid initiation packet over and over from localhost. Each packet
was therefore processed all the way up to the timestamp/counter in the
payload, at which point it was recognized as a replay and discarded, so
essentially all of the Noise computations were executed. Measurements
below are in kilo-packets per second; each packet requires two ECDH()
calls and a bunch of hashing.
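To make the per-packet work concrete, here is a rough pure-Python sketch of what a responder redoes for each replayed initiation: the two X25519 operations ("es" and "ss") and the BLAKE2s hash/HKDF chain, followed by the replay check on the timestamp. It loosely follows the WireGuard construction (the protocol and identifier strings are the ones the WireGuard whitepaper specifies) but it is not WireGuard's code. The ChaCha20Poly1305 decryption of the static key and timestamp is omitted, so the initiator's static key and an integer timestamp are passed in the clear. The X25519 ladder is not constant-time, and all helper names (`consume_initiation`, `replay_kpps`, `last_ts`) are hypothetical. Its absolute numbers say nothing about kernel performance; it only illustrates the replay-loop methodology.

```python
import hashlib
import hmac
import os
import time

# --- X25519 per RFC 7748 (Montgomery ladder). Illustration only: NOT constant-time. ---
P = 2**255 - 19
A24 = 121665


def x25519(k: bytes, u: bytes) -> bytes:
    kb = bytearray(k)
    kb[0] &= 248
    kb[31] = (kb[31] & 127) | 64                      # clamp the scalar
    scalar = int.from_bytes(kb, "little")
    x1 = (int.from_bytes(u, "little") & ((1 << 255) - 1)) % P  # ignore the top bit
    x2, z2, x3, z3, swap = 1, 0, x1, 1, 0
    for t in reversed(range(255)):
        bit = (scalar >> t) & 1
        swap ^= bit
        if swap:
            x2, x3, z2, z3 = x3, x2, z3, z2
        swap = bit
        a, b = (x2 + z2) % P, (x2 - z2) % P
        aa, bb = a * a % P, b * b % P
        e = (aa - bb) % P
        da, cb = (x3 - z3) * a % P, (x3 + z3) * b % P
        x3, z3 = (da + cb) ** 2 % P, x1 * (da - cb) ** 2 % P
        x2, z2 = aa * bb % P, e * (aa + A24 * e) % P
    if swap:
        x2, x3, z2, z3 = x3, x2, z3, z2
    return (x2 * pow(z2, P - 2, P) % P).to_bytes(32, "little")


# --- Noise hashing with BLAKE2s (HKDF built on HMAC-BLAKE2s) ---
def blake2s_hash(*parts: bytes) -> bytes:
    return hashlib.blake2s(b"".join(parts)).digest()


def kdf(ck: bytes, ikm: bytes, n: int) -> list:
    """Noise HKDF: PRK = HMAC(ck, ikm); T_i = HMAC(PRK, T_{i-1} || byte(i))."""
    prk = hmac.new(ck, ikm, hashlib.blake2s).digest()
    out, prev = [], b""
    for i in range(1, n + 1):
        prev = hmac.new(prk, prev + bytes([i]), hashlib.blake2s).digest()
        out.append(prev)
    return out


CONSTRUCTION = b"Noise_IKpsk2_25519_ChaChaPoly_BLAKE2s"
IDENTIFIER = b"WireGuard v1 zx2c4 Jason@zx2c4.com"


def consume_initiation(responder: dict, packet: dict, state: dict) -> str:
    """Hypothetical responder work for one initiation; AEAD steps are omitted."""
    ck = blake2s_hash(CONSTRUCTION)
    h = blake2s_hash(blake2s_hash(ck, IDENTIFIER), responder["s_pub"])
    h = blake2s_hash(h, packet["e_pub"])
    ck = kdf(ck, packet["e_pub"], 1)[0]
    ck, _k1 = kdf(ck, x25519(responder["s_priv"], packet["e_pub"]), 2)       # ECDH #1: es
    h = blake2s_hash(h, packet["static_pub"])    # stand-in for hashing the encrypted static
    ck, _k2 = kdf(ck, x25519(responder["s_priv"], packet["static_pub"]), 2)  # ECDH #2: ss
    # Only now is the (normally decrypted) timestamp checked, so a replay costs full work.
    if packet["timestamp"] <= state.get("last_ts", -1):
        return "replay"
    state["last_ts"] = packet["timestamp"]
    return "accepted"


def replay_kpps(responder: dict, packet: dict, state: dict, n: int) -> float:
    """Replay the same packet n times and return kilo-packets per second."""
    start = time.perf_counter()
    for _ in range(n):
        consume_initiation(responder, packet, state)
    return n / (time.perf_counter() - start) / 1000.0


if __name__ == "__main__":
    base = (9).to_bytes(32, "little")                 # X25519 base point u = 9
    s_priv = os.urandom(32)
    responder = {"s_priv": s_priv, "s_pub": x25519(s_priv, base)}
    packet = {"e_pub": x25519(os.urandom(32), base),
              "static_pub": x25519(os.urandom(32), base), "timestamp": 1}
    state = {}
    print(consume_initiation(responder, packet, state))  # prints "accepted"
    print(f"{replay_kpps(responder, packet, state, 20):.3f} kpps (every copy is a replay)")
```

As in the benchmark, every copy after the first is rejected only after both X25519 calls and the hashing have already been paid for.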
Intel(R) Xeon(R) CPU E3-1505M v5 @ 2.80GHz
AVX-accelerated ChaCha20Poly1305, Blake2s, Curve25519 (sandy2x):
AVX-accelerated ChaCha20Poly1305, Blake2s | Curve25519-donna 64-bit:
Reference C ChaCha20Poly1305, Blake2s | Curve25519-donna 64-bit:
Having accelerated hashing and encryption helps only a _little_,
whereas having accelerated ECDH helps _a bit more than a little_ but
still not _tons and tons_.
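One way to reason about why accelerating one component helps only partially is an Amdahl's-law estimate: whatever share of per-packet time is not sped up bounds the overall gain. The sketch below assumes per-packet time splits cleanly into an ECDH part and a hashing/AEAD part; `ecdh_fraction` is a hypothetical input that would have to be measured (no such split is measured here), and the function name is illustrative.

```python
def overall_speedup(ecdh_fraction: float, ecdh_speedup: float,
                    other_speedup: float = 1.0) -> float:
    """Amdahl-style estimate of whole-packet speedup.

    ecdh_fraction: share of per-packet time spent in the two ECDH() calls (0..1);
                   an assumed, to-be-measured value.
    ecdh_speedup:  how much faster the ECDH part becomes (e.g. donna -> sandy2x).
    other_speedup: how much faster hashing + AEAD become (e.g. reference C -> AVX).
    """
    if not 0.0 <= ecdh_fraction <= 1.0:
        raise ValueError("ecdh_fraction must be between 0 and 1")
    if ecdh_speedup <= 0 or other_speedup <= 0:
        raise ValueError("speedups must be positive")
    # New per-packet time, normalized so that the old time is 1.0.
    new_time = ecdh_fraction / ecdh_speedup + (1.0 - ecdh_fraction) / other_speedup
    return 1.0 / new_time
```

Even an infinitely fast hashing/AEAD path caps the gain at `1 / ecdh_fraction`, which is why faster ECDH matters more when it dominates the per-packet cost.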
I found that on this hardware, with an incoming packet queue length of
4096, and with packets no longer processed unless a mac2 is present
(thereby requiring a cookie reply message) once 512 packets are queued,
I was able to fend off a localhost-based (read: infinite bandwidth) DoS
attack.
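The two queue parameters can be read as a small admission policy. Below is a sketch of one plausible reading. `admit`, `Verdict`, and the exact `>=` comparisons are hypothetical. mac1 verification, which WireGuard performs on every handshake message, is not modeled. The real kernel code may order these checks differently.

```python
from enum import Enum

QUEUE_LEN = 4096          # incoming handshake packet queue length
UNDER_LOAD_DEPTH = 512    # queue depth at which a valid mac2 becomes mandatory


class Verdict(Enum):
    DROP = "drop"
    SEND_COOKIE_REPLY = "send cookie reply"
    ENQUEUE = "enqueue"


def admit(queue_depth: int, has_valid_mac2: bool) -> Verdict:
    """Decide what to do with an incoming initiation given the current queue depth."""
    if queue_depth >= QUEUE_LEN:
        return Verdict.DROP                    # queue full: no room at all
    if queue_depth >= UNDER_LOAD_DEPTH and not has_valid_mac2:
        # Under load: skip the expensive ECDH work and answer with a cookie
        # reply, so the sender must prove it can receive at its source address.
        return Verdict.SEND_COOKIE_REPLY
    return Verdict.ENQUEUE                     # normal path: full Noise processing
```

A cookie reply costs far less than the two ECDH() calls, so under a flood the expensive work is limited to senders that can complete the cookie round trip.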
Given that IK computes two ECDH() operations in the first message, are
these measurements roughly what you'd expect from 25519? Is it
"expected" that the difference between donna and sandy2x isn't that
massive?