Ultra low bandwidth wireguard question

Ed W lists at wildgooses.com
Tue Sep 28 14:43:02 UTC 2021


Hi, I have a satellite ISP where bandwidth costs around $10-100 per MB (that's MB, not GB).
There are also additional costs for each "connection", meaning it's more efficient to transmit in
bursts than as a gentle, intermittent trickle of packets.

An additional limitation of the network is that it takes 5-15 seconds to start passing packets once
it's been idle for a while ("a while" is about 25 seconds). During that time wireguard
retransmits packets on a fixed 5 sec interval, but the network queues all these packets, so
after say 15 seconds I might have 4+ rekey packets queued, which result in 4+ responses from the far
end. Such data is quite expensive on this network.

The device side is behind a network NAT and my goal is to keep a reverse connection in place, i.e. the
far end of the network can send packets back to the device. So I enable Keepalive in the wireguard
config.

A conceptual description would be: hub and spoke arrangement of IOT devices on the far end of a
satellite internet link, where we want the hub to be able to keep an open pipe and push commands to
the IOT devices. The NAT UDP timeout on the satellite link firewalls is measured at approx 3 minutes.


So I face 3 challenges:

- keepalive packets ideally only need to be sent every 3 minutes, but the encryption rekey timers
are set closer to 2 minutes. For example, setting the keepalive timer to 2 minutes leads to a
148 byte rekey packet being sent at every 2 minute mark. However, setting the keepalive timer just
short of 2 minutes leads to 1 plain keepalive around the 2 min mark, followed by a rekey packet at
the second 2 min mark, and so on.

- wireguard has a 2 minute ish rekey timeout, which causes it to send a 148 byte request that
triggers a 92 byte response. However, as the retry interval is 5 seconds, this usually leads to
sending 3-10x 148 byte requests (which are queued and retransmitted once the interface is up) and
to an equal number of 92 byte responses.
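
To put rough numbers on the two points above, here is a minimal sketch. It assumes one initiation goes out at t=0 plus one more per elapsed retry interval while the link is waking up, and that every queued initiation draws exactly one response. The function names are made up for illustration, and the sizes are the payload sizes quoted above, with no IP/UDP overhead added.

```c
#include <stdint.h>

/* Handshake message sizes as quoted in this post (payload only). */
enum { INITIATION_BYTES = 148, RESPONSE_BYTES = 92 };

/* Initiations queued while the link wakes up: the first one at t=0,
 * then one more for every full retry interval that elapses. */
uint32_t queued_initiations(uint32_t wake_delay_s, uint32_t retry_interval_s)
{
    return 1 + wake_delay_s / retry_interval_s;
}

/* Bytes on the wire for one rekey burst, assuming each initiation
 * is answered by exactly one response. */
uint32_t burst_bytes(uint32_t initiations)
{
    return initiations * (INITIATION_BYTES + RESPONSE_BYTES);
}
```

With a 15 s wake-up and the stock 5 s retry this gives 4 initiations and 960 bytes for a single rekey. The 3-10x range above corresponds to 720-2400 bytes per rekey.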


So questions:

- Is it feasible within the design of wireguard to "debounce" a stream of rekey packets
that arrive in quick succession (at about 22kbit/s)? In particular, it's the replies that I
want to queue, sending only the latest. I couldn't see from the code as it stands today that
this is feasible. Suggestions appreciated though.
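
To make the "debounce" idea concrete, here is a standalone sketch of the behaviour I have in mind. It is not wireguard code: the struct, the function names and the millisecond clock are all hypothetical. Each arriving initiation overwrites the previous one and pushes the reply deadline back, so only the newest initiation gets answered once the burst has gone quiet.

```c
#include <stdbool.h>
#include <stdint.h>

/* One-slot debouncer: remembers only the newest pending initiation. */
struct debounce {
    bool pending;           /* is there an unanswered initiation? */
    uint32_t latest_sender; /* hypothetical id of the newest initiation */
    uint64_t deadline_ms;   /* earliest time a reply may be sent */
    uint64_t window_ms;     /* quiet period required before replying */
};

void debounce_init(struct debounce *d, uint64_t window_ms)
{
    d->pending = false;
    d->latest_sender = 0;
    d->deadline_ms = 0;
    d->window_ms = window_ms;
}

/* Record an arriving initiation, replacing any earlier one and
 * restarting the quiet period. */
void debounce_offer(struct debounce *d, uint32_t sender, uint64_t now_ms)
{
    d->pending = true;
    d->latest_sender = sender;
    d->deadline_ms = now_ms + d->window_ms;
}

/* Returns true at most once per burst, after the quiet period has
 * passed, and reports which initiation should be answered. */
bool debounce_poll(struct debounce *d, uint64_t now_ms, uint32_t *sender_out)
{
    if (!d->pending || now_ms < d->deadline_ms)
        return false;
    d->pending = false;
    *sender_out = d->latest_sender;
    return true;
}
```

At 22kbit/s a 148 byte payload takes roughly 54 ms to arrive, so a window of a few hundred milliseconds would cover a burst of queued initiations. The cost is that every reply, including the reply to a lone initiation, is delayed by the window.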


- Is it possible to adjust these constants?

     REKEY_AFTER_TIME = 120,
     REJECT_AFTER_TIME = 180,

My concern looking at the code is that if I have some unmodified clients using the default settings,
then it's not clear to me how they would respond if one side has passed the REJECT_AFTER_TIME
interval and the other has not. (The intended scenario might be a hub and spoke of IOT clients on
the satellite network, being accessed by other clients via the general internet. The IOT clients and
hub server would be modified, but the other clients would be at defaults.)

Can anyone comment on the implications of, say, altering only the client IOT devices to have
REKEY/REJECT times closer to 30 minutes? (i.e. server remaining on defaults)


- I implemented a very basic backoff on the resend of rekeys which better suits the characteristics
of this network, e.g. the first retry is not until after 15 seconds, then it retries at 10, 15, 20,
25 sec intervals after that. Usually this leads to very few retries for my network. Code is below,
any comments?
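
As a rough check of the schedule, the sketch below recomputes the retransmit delay from the constants in the patch at the end of this post. It ignores the random jitter, and the names EXTRA_DELAY, retransmit_delay_s and total_wait_s are made up for illustration (EXTRA_DELAY stands for the literal "+ 5" in the timers.c hunk).

```c
#include <stdint.h>

/* Constants as they appear in the patch. EXTRA_DELAY is a made-up
 * name for the literal "+ 5" added in wg_timers_handshake_initiated(). */
enum { REKEY_TIMEOUT = 10, REKEY_BACKOFF = 5, EXTRA_DELAY = 5 };

/* Seconds until the next retransmit, given how many retries have
 * already been made (jitter ignored). */
uint32_t retransmit_delay_s(uint32_t attempts)
{
    return REKEY_TIMEOUT + attempts * REKEY_BACKOFF + EXTRA_DELAY;
}

/* Total seconds spent waiting across the first `timers` timer runs. */
uint32_t total_wait_s(uint32_t timers)
{
    uint32_t total = 0;
    for (uint32_t a = 0; a < timers; a++)
        total += retransmit_delay_s(a);
    return total;
}
```

Read this way, the waits come out as 15, 20, 25, 30 seconds and so on, which is 90 seconds across the first four timers, so the intervals quoted above should be taken as approximate.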


Results:

With these changes, and assuming a somewhat unreliable satellite network which might not have
coverage for some of the time (leading to additional retransmits), I see theoretical monthly idle
usage close to 3MB/month. However, being able to increase the REKEY/REJECT times to 30 mins might
drop this by a factor of 10 or more. Can it be done?
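
For scale, here is a simple estimator for idle handshake traffic. It assumes exactly one 148 byte request and one 92 byte response per interval, no retransmits, no keepalive or IP/UDP overhead, and a 30 day month, so it is a lower bound and not a reproduction of the figure above. The function name is made up for illustration.

```c
#include <stdint.h>

/* Handshake message sizes as quoted in this post (payload only). */
enum { INITIATION_BYTES = 148, RESPONSE_BYTES = 92 };

/* Assumption: a 30 day month. */
#define SECONDS_PER_MONTH (30u * 24u * 3600u)

/* Bytes per month if one clean handshake happens every interval_s
 * seconds and nothing else is sent. */
uint64_t monthly_handshake_bytes(uint32_t interval_s)
{
    uint64_t handshakes = SECONDS_PER_MONTH / interval_s;
    return handshakes * (INITIATION_BYTES + RESPONSE_BYTES);
}
```

A handshake every 180 seconds comes to 3,456,000 bytes per month, while one every 1800 seconds comes to 345,600 bytes, which is where the factor of 10 comes from.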

Thanks

Ed W


Patch:

--- a/src/messages.h    2021-09-06 16:24:47.121985094 +0000
+++ b/src/messages.h    2021-09-06 13:54:59.879700016 +0000
@@ -40,14 +40,15 @@
 enum limits {
     REKEY_AFTER_MESSAGES = 1ULL << 60,
     REJECT_AFTER_MESSAGES = U64_MAX - COUNTER_WINDOW_SIZE - 1,
-    REKEY_TIMEOUT = 5,
+    REKEY_TIMEOUT = 10,
+    REKEY_BACKOFF = 5,
     REKEY_TIMEOUT_JITTER_MAX_JIFFIES = HZ / 3,
     REKEY_AFTER_TIME = 120,
     REJECT_AFTER_TIME = 180,
     INITIATIONS_PER_SECOND = 50,
     MAX_PEERS_PER_DEVICE = 1U << 20,
     KEEPALIVE_TIMEOUT = 10,
-    MAX_TIMER_HANDSHAKES = 90 / REKEY_TIMEOUT,
+    MAX_TIMER_HANDSHAKES = 5, /* 100 secs */
     MAX_QUEUED_INCOMING_HANDSHAKES = 4096, /* TODO: replace this with DQL */
     MAX_STAGED_PACKETS = 128,
     MAX_QUEUED_PACKETS = 1024 /* TODO: replace this with DQL */
--- a/src/timers.c    2021-09-06 16:24:47.122985106 +0000
+++ b/src/timers.c    2021-09-06 16:27:41.050156437 +0000
@@ -64,7 +64,7 @@
         ++peer->timer_handshake_attempts;
         pr_debug("%s: Handshake for peer %llu (%pISpfsc) did not complete after %d seconds, retrying (try %d)\n",
              peer->device->dev->name, peer->internal_id,
-             &peer->endpoint.addr, REKEY_TIMEOUT,
+             &peer->endpoint.addr, (REKEY_TIMEOUT + (peer->timer_handshake_attempts * REKEY_BACKOFF)),
              peer->timer_handshake_attempts + 1);

         /* We clear the endpoint address src address, in case this is
@@ -182,7 +182,7 @@
 void wg_timers_handshake_initiated(struct wg_peer *peer)
 {
     mod_peer_timer(peer, &peer->timer_retransmit_handshake,
-               jiffies + REKEY_TIMEOUT * HZ +
+               jiffies + (REKEY_TIMEOUT + (peer->timer_handshake_attempts * REKEY_BACKOFF) + 5) * HZ +
                prandom_u32_max(REKEY_TIMEOUT_JITTER_MAX_JIFFIES));
 }





