Standardized IPv6 ULA from PublicKey

Sat Jun 27 23:43:56 CEST 2020

I've been working on this a bit from a completely independent
perspective: bootstrapping embedded systems which have a persistent
keypair, but no persistent storage for stuff like `AllowedIPs`
assignments. In my use case, the by-convention assignment of an IPv6
link-local address to each WireGuard peer allows a gossip-style
protocol to update a newly-joined node with a (signed) set of
configuration parameters, including the `AllowedIPs` entries that
enable more comprehensive communication.

The fact that the assignment of cryptographically-bound IPv6 LLAs has
independently occurred to multiple parties now is not lost on me --
it's usually a sign of good design! I also agree that this type of
thing makes a lot more sense in the context of `wg-quick` than the
kernel module or `wg` tools themselves. However, care should be taken
to make sure that all potential implementations can adopt it without
extra overhead. For this reason, I'm biased towards simplicity in the
specification, not necessarily simplicity of implementation as part of
`wg-quick`. I would caution that the decision of how to generate and
assign addresses from public keys should be treated as a layer-3
problem. Every IPv6 interface is *required* to have a link-local
address (RFC 4291, section 2.1), even if you can get away without one
in practice. That makes it clear that the proper conceptual home of
LLA assignment is in the realm of bits and bytes, rather than strings
and pipes -- even if its appropriate place in the architecture of the
*reference implementation* is in an optional shell script.

One more point of clarification: ULAs are in the `fc00::/7` space,
while LLAs are in `fe80::/10`. LLAs are what we want, because they are
explicitly interface-scoped -- and that means that they can be counted
on to always be bound to the peer, no matter what the specific
network configuration of the node might be. Sending a packet to
`fe80::dead:beef%wg0` will always refer to a specific peer on the
`wg0` interface, and provides a guarantee that the contents of that
packet will be transmitted securely; whereas sending to
`fc00::dead:beef` *might* be on the `wg0` interface, but to be sure
you'd have to know that you didn't have a route to that address via
any other interface. This might be true on some -- or most -- nodes,
but it's not something that can be assumed. This makes
cryptographically-derived ULAs much less useful than
cryptographically-*bound* LLAs.
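
To make the two ranges concrete, here is a minimal Python sketch using only the standard `ipaddress` module. The two example addresses are the ones used above; the constant names are just for illustration. It only checks prefix membership and says nothing about routing, which is exactly the part that cannot be decided from the address alone.

```python
import ipaddress

LLA_NET = ipaddress.ip_network("fe80::/10")  # link-local unicast
ULA_NET = ipaddress.ip_network("fc00::/7")   # unique local addresses

lla = ipaddress.ip_address("fe80::dead:beef")
ula = ipaddress.ip_address("fc00::dead:beef")

# The LLA is only meaningful together with an interface (e.g. %wg0);
# the ULA is routed like any other unicast address.
print(lla in LLA_NET, lla in ULA_NET)  # True False
print(ula in ULA_NET, ula in LLA_NET)  # True False
```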

# OK, so how do we do it?
The general idea of using a hash to generate an IPv6 LLA is fairly
straightforward (and obvious, given that several people have come up
with it independently), but there are still some points that require
standardization. I think I have an exhaustive list of the points of
divergence that must be addressed; I will discuss each of them and my
perspective.

## What netmask should be used?
**fe80::/10.**

The IPv6 RFCs separate the address into a subnet and interface
identifier, which would seem to indicate that something like
`fe80::/64` should be used instead; however, by their very nature
link-local addresses are not part of a subnet. In addition, it is
desirable that each address be bound as strongly as possible to the
key it is derived from: a /10 prefix leaves 118 bits for the hash, and
118-bit security is a lot closer to 128-bit than 64-bit security is.

## Should the subnet identifier be concatenated with the results of the hash, or should leading bits of the hash be dropped?
**(SUBNET & MASK) | (HASH & ~MASK)**

Binary math is good, cheap, and obvious, whereas concatenation is only
straightforward if the netmask is a whole number of bytes. Otherwise
you have to bitshift everything and it just gets messy. Besides, it's
a net*mask* -- seems like you should use it to *mask* things.
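
As a rough illustration of the masking rule, here is a small Python sketch that treats the address as a 128-bit integer. The names `SUBNET`, `MASK`, and `combine` are mine, not part of any existing tool, and the hash input is just an arbitrary 128-bit value.

```python
import ipaddress

PREFIX_LEN = 10
SUBNET = int(ipaddress.ip_address("fe80::"))
MASK = ((1 << PREFIX_LEN) - 1) << (128 - PREFIX_LEN)  # top 10 bits set

def combine(hash128: int) -> ipaddress.IPv6Address:
    """Return (SUBNET & MASK) | (HASH & ~MASK) as an IPv6 address."""
    # hash128 is non-negative, so masking with ~MASK simply clears
    # the top 10 bits and keeps the remaining 118.
    return ipaddress.IPv6Address((SUBNET & MASK) | (hash128 & ~MASK))

print(combine(0))               # fe80::
print(combine((1 << 128) - 1))  # febf:ffff:ffff:ffff:ffff:ffff:ffff:ffff
```

The second line shows why no shifting is needed: the six hash bits that share a byte with the prefix survive untouched.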

## Should the hash be taken over the key itself, or the Base64 encoding of the key?
**The key itself.**

While the tools are fairly consistent in the use of the Base64
encoding in user-facing scenarios, it's important to consider that
there's nothing fundamental about the WireGuard protocol itself that
requires the use of Base64 anywhere. I argue that it would be
inappropriate to introduce a dependency on it at such a low level --
especially since you can just do `base64 -d` inside `wg-quick`.
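
The choice matters because the two inputs give unrelated digests. A quick Python sketch, using an all-zero key purely as a placeholder and the standard library's `hashlib.blake2s` as a stand-in for whatever Blake2s implementation is at hand:

```python
import base64
import hashlib

pubkey_b64 = "A" * 43 + "="          # Base64 text of an all-zero 32-byte key
raw = base64.b64decode(pubkey_b64)   # the bytes `base64 -d` would emit

over_key = hashlib.blake2s(raw).digest()
over_text = hashlib.blake2s(pubkey_b64.encode("ascii")).digest()

print(len(raw))               # 32
print(over_key == over_text)  # False
```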

## What algorithm should the hash be done with?
**Blake2s with 32 bytes of output.**

This is simply the `HASH()` function in the WireGuard protocol
specification, and I think that using the same hash function as the
Noise construction makes a lot of sense. Even though output length is
a tunable parameter of the Blake2s function and an LLA will never use
more than 16 bytes, I feel that being consistent and obvious is
important. (Also, note that Blake2 tunes output length by truncation
internally; the only difference between taking a 16- or 32-byte long
digest is flipping a couple of bits during the setup phase. The
performance characteristics are exactly the same.)
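
One practical consequence of the output length being part of the setup phase: a 16-byte digest is not a prefix of the 32-byte digest, so implementations have to agree on the length or they will derive different addresses. A small Python check, again with an all-zero placeholder key:

```python
import hashlib

key = bytes(32)  # placeholder all-zero key, for illustration only

full = hashlib.blake2s(key, digest_size=32).digest()
short = hashlib.blake2s(key, digest_size=16).digest()

print(len(full), len(short))  # 32 16
print(full[:16] == short)     # False
```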

That said, most of the attempts at implementing an IPv6 LLA assignment
scheme I've seen simply depend on `sha256sum` and call it a day,
because there's not a widespread CLI tool that does Blake2s for you.
There *are* a couple of different tools named `b2sum` -- the one made
by the Blake2 authors is fine, but the identically-named GNU coreutils
utility, which most people will get if they install their distro's
`b2sum` package, only does Blake2b (and takes a different set of flags
to boot).

Still, like I mentioned above, we should be looking at this from a
protocol point of view, and requiring a whole extra crypto primitive
just for calculating an LLA seems wasteful. Implementing WireGuard
already requires that the Blake2s hash be available; the fact that it
is not easily accessible from the `wg-quick` tool is simply an
unfortunate quirk of the reference implementation. Think about a
constrained environment
like a microcontroller -- SHA256 isn't a simple algorithm, and it
would probably cause a 50% increase in code size.

Luckily, Blake2s is a simple and elegant algorithm, and in an effort
to get some working code out there I've [implemented][1] it in ~100
lines of Bash script. (It's gotta be Bash because it needs array
support, but that's what `wg-quick` uses anyway.) It's slow compared
to a typical implementation, but it's not like we're mining
cryptocurrency here, and because WireGuard public keys are of a known,
fixed length the input will never be longer than a single block.
(Single-block hashes benchmark at around 50ms on my system, just for
reference.) I hope this helps accelerate the project, but I can
understand that a shell implementation might seem too janky for
long-term use: a potential solution would be to integrate the LLA
calculation into the `wg` tool, in a similar fashion to how the
Curve25519 public key calculation is handled by `wg pubkey`. I'm
imagining a `wg lla` command which takes in a Base64-encoded public
key and spits out a string of the form
`fe8b:5ea9:9e65:3bc2:b593:db41:30d1:0a4e` (which happens to be the LLA
associated with an all-zero public key under my proposed scheme).
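
For anyone who wants to experiment without the Bash script, here is a rough end-to-end sketch of what such a command might compute, written in Python. It is not the linked implementation. The function name `lla_from_pubkey` is hypothetical, and one detail is an assumption on my part: it feeds the *leading* 16 bytes of the 32-byte digest, read big-endian, into the mask step. If the intended byte selection differs, the output will not match the address quoted above.

```python
import base64
import hashlib
import ipaddress

PREFIX = ipaddress.ip_network("fe80::/10")

def lla_from_pubkey(pubkey_b64: str) -> ipaddress.IPv6Address:
    """Derive a link-local address from a Base64-encoded public key."""
    raw = base64.b64decode(pubkey_b64, validate=True)
    if len(raw) != 32:
        raise ValueError("expected a 32-byte public key")
    # Hash the raw key bytes, not the Base64 text.
    digest = hashlib.blake2s(raw, digest_size=32).digest()
    # ASSUMPTION: use the first 16 digest bytes, big-endian.
    hash128 = int.from_bytes(digest[:16], "big")
    mask = int(PREFIX.netmask)
    subnet = int(PREFIX.network_address)
    # (SUBNET & MASK) | (HASH & ~MASK)
    return ipaddress.IPv6Address((subnet & mask) | (hash128 & ~mask))

# Print in fully expanded form, matching the format shown above.
print(lla_from_pubkey("A" * 43 + "=").exploded)
```

The result is always inside `fe80::/10`, and its low 118 bits come straight from the digest.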

[1]: https://gist.github.com/reidrankin/3a39210ce437680f5cf1ac549fd1f1ff

--Reid

On Wed, Jun 24, 2020 at 1:11 PM Chriztoffer Hansen <ch at ntrv.dk> wrote:
>
> On Wed, 24 Jun 2020 at 17:37, Florian Klink <flokli at flokli.de> wrote:
> > Deriving an IPv6 link-local address from the pubkey and adding it to the
> > interface should be a no-brainer and sane default, and already fix Babel
> > Routing (and most other issues) for "point-to-point tunnels"
> > (only one peer, both sides set AllowedIPs=::/0).
>
> An idea to implement as an option for e.g. wg-quick, rather than the
> base code-base itself?
>
> --
>
> Chriztoffer