Standardized IPv6 ULA from PublicKey

Tue Jun 30 10:01:30 CEST 2020

> Fun fact: initial versions of WireGuard from years ago weren't like
> this. We wound up redoing some crypto and coming up with the `_psk2`
> variant for this purpose. I'm glad it's useful. I'm interested to
> learn: what are you doing this for? Got any code online?

That's a dangerous question to ask, because I'm really excited about
it! It's an embedded device that connects otherwise-insecure stuff
together into a transparent overlay network that's centrally
configurable. You get a set of them in a box, plug one legacy
thingamabob into each, and all the devices show up in a GUI. You
configure all the IP allocations and allowed traffic flows, and an
included hardware token signs the configuration. You then throw the
master key into a safe somewhere, confident in the knowledge that even
if your network infrastructure is all broken into by
nation-state-du-jour your traffic will stay confidential; even if the
network you're using to communicate is re-addressed underneath you --
or your stuff is moved across the country -- none of your legacy stuff
will need reconfiguration; and even if the legacy stuff itself is
broken into, the network configuration is being enforced by hardware.
The boxes themselves have no persistent storage at all; in fact,
possession of a private key the devices don't have is required to
unlock the flash. The point is to resist malware infection by making
assured remediation as simple as power-cycling the unit -- eventually,
I'll have a dedicated microcontroller acting as a watchdog which
shorts the reset pin to ground if the unit can't provide a TPM-backed
health attestation every minute or so. The grand master plan is that
as soon as you hack in and try to run something interesting, it all
resets and you're back out again.

Of course, each unit can't be updated every time you need to add a new
one, and that's where the LLAs and in-band authentication stuff comes
in. New boxes use Zeroconf to find peers, after which they connect and
present the certificate authorizing their `AllowedIPs` and appropriate
firewall setup. My goal is for every packet that comes out of each
device except for ICMP, DHCP, and (m)DNS to be a WireGuard packet,
cutting the attack surface to the bone -- and until you present a
valid certificate, the only thing allowed inside the tunnel is TFTP.

Most of my code for this thing is fairly hacky and
environment-specific at the moment -- that `wg-lla.sh` Gist from
before is the first real piece I've been able to clean up and
open-source. It's a fairly big project, but there are some more
sections that I'm fairly certain I'll end up releasing as well; for
example, the mesh-routing setup might be useful to some people. The
principle is that by "brute-forcing" the MAC1 field from handshake
initiations against the static public keys of all known peers, you can
figure out what peer (or, rather, peer's endpoint) to send the
handshake and subsequent flow towards, which makes every node in the
mesh a potential endpoint for any other peer in the mesh. There's also
a microcontroller-compatible implementation of WireGuard I'm working
on (though it's in the very early stages), targeted at the Cortex-M0
platform and written purely in `no_std`, `forbid(unsafe_code)` Rust.
All this stuff integrates into one big product in the end, but I'm
very open to hearing community feedback on which of these bits would
be most useful to others -- I'll prioritize them. (And if you're
really interested in any of it, it's all at least proof-of-concept,
and I do contract work!)
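
As a rough illustration of the MAC1 "brute-forcing" idea, here is a
minimal Python sketch. It assumes the message layout and `mac1`
construction from the published WireGuard protocol description (a
148-byte handshake initiation whose `mac1` is a keyed 16-byte BLAKE2s
over the preceding 116 bytes, keyed with the hash of `"mac1----"`
followed by the responder's static public key). The function names and
the `known_peers` dictionary are hypothetical, and this is not the code
from the project described above.

```python
import hashlib
import hmac

LABEL_MAC1 = b"mac1----"   # label used by the WireGuard protocol for mac1 keys
INITIATION_LEN = 148       # total size of a handshake initiation message
MAC1_OFFSET = 116          # mac1 covers every byte that precedes it


def compute_mac1(responder_public: bytes, covered: bytes) -> bytes:
    """Keyed 16-byte BLAKE2s over the message, keyed by HASH(label || pubkey)."""
    key = hashlib.blake2s(LABEL_MAC1 + responder_public).digest()
    return hashlib.blake2s(covered, digest_size=16, key=key).digest()


def find_responder(packet: bytes, known_peers: dict):
    """Return the name of the known peer this initiation targets, or None."""
    if len(packet) != INITIATION_LEN or packet[:4] != b"\x01\x00\x00\x00":
        return None  # not a handshake initiation
    covered = packet[:MAC1_OFFSET]
    mac1 = packet[MAC1_OFFSET:MAC1_OFFSET + 16]
    # Try every known static public key until one reproduces the mac1 field.
    for name, public_key in known_peers.items():
        if hmac.compare_digest(compute_mac1(public_key, covered), mac1):
            return name
    return None
```

A forwarding node would call `find_responder()` on each incoming
initiation and relay the packet towards whatever endpoint it has on
record for the matching peer; the cost is one pair of hashes per known
peer per handshake.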

By the way, putting the PSK exchange at the end is useful for another
reason, too: you can use it to chain authentication mechanisms. The
key here is that the PSK can be updated after sending an initiation
packet, but before receiving the response. I've done an experiment
using nfqueue on the initiator to catch an outgoing handshake request
and stick an extra nonce on the end -- which is signed using a
secondary key. On the responder side, another nonce is chosen; a new
PSK, calculated by hashing the nonces, is set using `wg`, and the
initiator ID is noted. The handshake initiation is then released for
processing, which occurs using the freshly-set PSK. When WireGuard
sends the handshake response, nfqueue intercepts the outgoing packet,
matches the initiator ID, and sticks the responder's nonce on the end
encrypted to the secondary key. The initiator intercepts the response
with nfqueue, decrypts the second nonce, calculates the new PSK, and
issues the same `wg set` command before releasing the response packet.
This all works just fine (at least, as long as the daemon stays up),
proving that you can do interactive authentication out-of-band. I've
since realized that sticking the authentication I'm looking for inside
the tunnel is a much better choice for my application, but I'm glad to
have options.
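
The PSK derivation step in that experiment could look something like
the following Python sketch. The nonce length, the choice of BLAKE2s,
and the concatenation order are all assumptions made for illustration
(the description above only says the PSK is calculated by hashing the
nonces), and the nfqueue interception and signing steps are left out
entirely.

```python
import base64
import hashlib
import secrets

NONCE_LEN = 32  # assumed nonce size; fixed length keeps concatenation unambiguous


def new_nonce() -> bytes:
    """Generate a fresh random nonce for one handshake."""
    return secrets.token_bytes(NONCE_LEN)


def derive_psk(initiator_nonce: bytes, responder_nonce: bytes) -> str:
    """Hash both nonces into a 32-byte PSK, base64-encoded as `wg` expects."""
    if len(initiator_nonce) != NONCE_LEN or len(responder_nonce) != NONCE_LEN:
        raise ValueError("unexpected nonce length")
    digest = hashlib.blake2s(initiator_nonce + responder_nonce).digest()
    return base64.b64encode(digest).decode("ascii")
```

Both sides call `derive_psk()` with the same two nonces and write the
result to a file that is handed to `wg set <interface> peer <public
key> preshared-key <file>` before the queued packet is released.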

> This sounds like a motivation for doing the LLv6 generation inside of
> your daemon, not inside of the kernel, right? In that case, your
> design must already take into account a malicious peer finding public
> key collisions after hashing.

I'm not actually looking for a feature here as much as I am a
standard, and this definitely shouldn't go in the kernel. (Heck, I
wrote a Blake2s implementation in Bash just so it wouldn't have to go
any deeper than `wg-quick`.) That said, part of me would really like
to see a command like `wg lla
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=` that spits out
`fe8b:5ea9:9e65:3bc2:b593:db41:30d1:0a4e`. That would serve the dual
purposes of avoiding running a hash algorithm in a shell script and
serving as a standard. There aren't many decisions to make when you
sit down with the goal to make an LLA from a public key hash -- I
listed them in my prior post, and I'm pretty sure it's exhaustive --
and it would be a shame if we didn't have sensible defaults to adopt.
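
To make the shape of such a derivation concrete, here is a minimal
Python sketch of one way it could work. Every parameter choice in it
is an assumption for illustration: hashing the raw 32-byte key rather
than its base64 text, asking BLAKE2s for a 16-byte digest rather than
truncating a 32-byte one, and overwriting the top 10 bits with the
`fe80::/10` prefix. It is not claimed to match `wg-lla.sh` or to
reproduce the example address above.

```python
import base64
import hashlib
import ipaddress

SUFFIX_BITS = 118        # 128 address bits minus the 10-bit fe80::/10 prefix
PREFIX = 0xFE80 << 112   # fe80:: as a 128-bit integer


def lla_from_public_key(public_key_b64: str) -> ipaddress.IPv6Address:
    """Derive a link-local address from a base64 WireGuard public key."""
    raw = base64.b64decode(public_key_b64, validate=True)
    if len(raw) != 32:
        raise ValueError("WireGuard public keys are 32 bytes")
    # Hash the key down to 16 bytes, then interpret the digest as an integer.
    digest = hashlib.blake2s(raw, digest_size=16).digest()
    value = int.from_bytes(digest, "big")
    # Keep the low 118 bits of the hash and force the top 10 bits to the prefix.
    suffix = value & ((1 << SUFFIX_BITS) - 1)
    return ipaddress.IPv6Address(PREFIX | suffix)
```

Each of those three choices changes the resulting address, which is
the argument for agreeing on defaults: two implementations that differ
on any one of them will never interoperate.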

As for security, a 256-bit ECC public key only gives 128 bits of
security in the first place. Hashing the key down to 16 bytes doesn't
hurt security, because finding a second key whose hash matches a
given target takes about 2^128 hash evaluations, which is as much
effort as it would take to just run Pollard's rho and crack the key
you're trying to mess with. The compromise comes in when you start masking
off bits, and losing 10 bits to fit into `fe80::/10` isn't actually
that bad -- in fact, I argue that it's negligible, because finding a
colliding keypair requires an ECC scalar multiplication to determine
the public key associated with each private key guess. This easily
takes more than 1024 times as long as running the hash itself, meaning
that the process of finding a keypair that's a second-preimage of a
desired 118-bit LLA suffix actually takes longer than brute-forcing a
second-preimage of a 128-bit Blake2s hash. (For reference, Curve25519
takes [832457 cycles][1] for a single scalar multiplication; Blake2s
on a single 64-byte block takes [5.5 cycles per byte][2], or 352
cycles. These numbers are from different microarchitectures, so it's
kind of an apples-to-oranges thing, but we're talking orders of magnitude
here.)
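
The arithmetic behind that comparison can be checked with a few lines
of Python. This is only a back-of-the-envelope sketch using the two
cycle counts quoted above; the helper name is hypothetical, and the
model assumes an attacker who needs one full scalar multiplication per
guess, with no batching or other shortcuts.

```python
import math

CURVE25519_CYCLES = 832457    # one scalar multiplication, from [1]
BLAKE2S_CYCLES = 5.5 * 64     # one 64-byte block at 5.5 cycles/byte, from [2]


def attack_cost_bits(target_bits: int, cycles_per_guess: float) -> float:
    """log2 of the total cycles needed to try 2**target_bits guesses."""
    return target_bits + math.log2(cycles_per_guess)


# Finding a keypair whose hash matches a 118-bit suffix: each guess costs
# a scalar multiplication plus a hash.
keypair_search = attack_cost_bits(118, CURVE25519_CYCLES + BLAKE2S_CYCLES)

# Brute-forcing a second preimage of a full 128-bit hash: each guess costs
# only a hash.
hash_search = attack_cost_bits(128, BLAKE2S_CYCLES)

# How many hashes one scalar multiplication is worth.
per_guess_ratio = CURVE25519_CYCLES / BLAKE2S_CYCLES

print(f"keypair search: 2^{keypair_search:.1f} cycles")
print(f"hash search:    2^{hash_search:.1f} cycles")
print(f"scalar mult / hash ratio: {per_guess_ratio:.0f}x")
```

With these figures the per-guess ratio is well above the 1024 needed
to make up for the 10 masked bits, so the keypair search comes out
roughly one bit more expensive than the plain 128-bit hash search.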

[1]: https://cr.yp.to/ecdh/curve25519-20051115.pdf
[2]: https://blake2.net/blake2.pdf