[WireGuard] NAT-T Keepalives

Fri Jul 15 14:07:54 CEST 2016

Hey Guus,

On Thu, Jul 14, 2016 at 12:55 PM, Guus Sliepen <guus at tinc-vpn.org> wrote:
> Some insights learned from tinc:

Thanks very much for these!

>
> On Thu, Jul 07, 2016 at 06:33:11PM +0200, Jason A. Donenfeld wrote:
>
>> a) The persistent keepalive does not need an active session and does
>> not need to send any encrypted data. It simply is a UDP packet to the
>> endpoint. The payload doesn't matter for the purpose of just keeping
>> the NAT mapping alive.
>
> Indeed.
>
>> 1. What should the payload be? Should it be a single fixed byte? Or
>> should it be a zero length UDP packet?
>
> A zero-length UDP packet should be fine, although it might upset some
> OSes or firewalls.
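
For reference, the plain zero-length variant discussed above amounts to very little code. A minimal Python sketch, assuming a hypothetical helper named `send_keepalive`; it illustrates only the unencrypted idea from the quoted text, not WireGuard's actual keepalive:

```python
import socket


def send_keepalive(sock: socket.socket, endpoint: tuple) -> int:
    """Send a zero-length UDP datagram to refresh a NAT mapping.

    Returns the number of payload bytes sent, which is 0 on success.
    """
    return sock.sendto(b"", endpoint)


if __name__ == "__main__":
    # Loopback demo: the receiver gets an empty datagram.
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(("127.0.0.1", 0))
    rx.settimeout(1.0)
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    send_keepalive(tx, rx.getsockname())
    data, _ = rx.recvfrom(2048)
    print("payload length:", len(data))
    tx.close()
    rx.close()
```

The payload is empty, so the only bytes on the wire are the IP and UDP headers.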

In fact, we wound up switching to an encrypted keepalive, so that it
works nicely with roaming and endpoint discovery. Now, turning on
persistent-keepalive mode ensures that WireGuard remains "connected",
whereas by default it will "go to sleep".

>
> Another issue that tinc deals with is path MTU discovery. It combines
> this with the heartbeat packets. While a zero-length UDP packet is
> enough to keep a NAT mapping alive, the actual path between two peers
> might change, and that also changes the path MTU. AFAIK WireGuard
> doesn't care about this, but in case you (start to) do, you want to send
> packets with the discovered MTU and perhaps a slightly bigger one too,
> once in a while, to check whether the PMTU changed.
>
> Discovering the PMTU between two peers and enforcing this inside the
> tunnel helps prevent fragmentation of the outer UDP packets. This
> improves performance and sometimes it's just necessary because there are
> firewalls out there that block fragments.

Doesn't the Linux kernel already support PMTU discovery with the usual
ICMP notifications, unless you turn it off with the sysctl knob? Have
you experimented at all with how this discovery trickles down to tinc?
I wonder if, since I'm inside the kernel, I'd have an even closer way
of integrating with the already existing mechanisms.
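
To make the userspace side of that question concrete, here is a minimal sketch, assuming Linux and IPv4, of how a program can ask the kernel for its current path MTU estimate on a connected UDP socket. The function name `kernel_path_mtu` is hypothetical, and nothing here is claimed to be how tinc or WireGuard does it. The numeric fallbacks are the Linux values from <linux/in.h>, used only if the Python build does not export the constants:

```python
import socket

# Linux-specific socket options; fall back to the <linux/in.h> values
# when this Python build does not expose them (assumption: Linux host).
IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)
IP_PMTUDISC_DO = getattr(socket, "IP_PMTUDISC_DO", 2)
IP_MTU = getattr(socket, "IP_MTU", 14)


def kernel_path_mtu(host: str, port: int) -> int:
    """Return the kernel's current path MTU estimate towards host."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        # Set the Don't Fragment bit and let the kernel track the PMTU
        # from incoming ICMP "fragmentation needed" messages.
        s.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)
        # IP_MTU can only be read on a connected socket.
        s.connect((host, port))
        return s.getsockopt(socket.IPPROTO_IP, IP_MTU)


if __name__ == "__main__":
    print(kernel_path_mtu("127.0.0.1", 9))
```

The value starts out as the MTU of the outgoing route and only shrinks once the kernel has seen an ICMP notification for that destination, so an application would need to re-read it after sends fail with EMSGSIZE.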

> If you want to keep alive a NAT mapping, then experience tells me 10
> seconds is something that works for virtually all NAT devices. Once you
> start to go over 10 seconds, you will find there are those that will
> drop the mappings. There are RFCs which tell you how a NAT device should
> behave (RFC 4787 and 7857), but it's hard to find devices that follow
> all these requirements. The recommended timeout for NAT devices is 5
> minutes. I'm quite sure a 3600 second interval is useless in practice.

Yea, I read those RFCs and then promptly found out nobody follows
them. What a disappointment.

In practice, have you seen any devices that are worse than 30 seconds?
That's about the lowest I saw.
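
The trade-off between the intervals mentioned in this thread can be put into numbers. A small Python sketch, with hypothetical helper names, that compares the 10, 30, 300 and 3600 second figures from the discussion; it assumes a NAT mapping survives only if a packet arrives strictly before the device's idle timeout expires:

```python
SECONDS_PER_DAY = 86400


def keepalives_per_day(interval_s: int) -> int:
    """Number of keepalive packets sent per day at a fixed interval."""
    return SECONDS_PER_DAY // interval_s


def mapping_survives(interval_s: int, nat_timeout_s: int) -> bool:
    """True if the keepalive interval is shorter than the NAT idle timeout."""
    return interval_s < nat_timeout_s


if __name__ == "__main__":
    # Timeouts: the lowest seen in practice (30 s) and the 5 minute figure.
    for interval in (10, 30, 300, 3600):
        print(
            interval,
            keepalives_per_day(interval),
            mapping_survives(interval, 30),
            mapping_survives(interval, 300),
        )
```

Under that assumption, only an interval below 30 seconds keeps a mapping open on the most aggressive devices mentioned here, at a cost of a few thousand small packets per day.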