From vlady.ivasyuk at gmail.com  Mon Feb  1 14:08:43 2021
From: vlady.ivasyuk at gmail.com (Vladyslav Ivasyuk)
Date: Mon, 1 Feb 2021 16:08:43 +0200
Subject: [Windows] Unable to build latest wireguard missing
 i686-w64-mingw32-windres
Message-ID: <CAJ0ZNNmwpRVdygsgUnTqJCZR436RhcCWvw2bgUG3KpM5s0=otg@mail.gmail.com>

Hi,

the problem with "unknown revision" is happening because git history
was rewritten. There are two projects which are kept in branches of
the Windows WireGuard client project. Those are "github.com/lxn/walk"
and "github.com/lxn/win" and branches for them are "pkg/walk" and
"plg/walk-win" respectively. Both branches have a number of
customization commits which are kept at the top of the git history.
Hence sync with lxn's code is done with rebasing and then force
pushing. So after the sync, old commit id is lost and replaced with a
new one. Each time those branches are updated, you can also see the
"mod: bump" commit in the main branch. All commits before that will
not work anymore. Therefore old tags are not reliable too.

Probably, it can be fixed by keeping separate branch for at least
every tag, so build on tags does not break. Though, it would be better
to create a new branch on every update/sync. Another options to
consider are maintaining the lxn in a separate repository or keep
changes as a set of patches.

Best,
Vladyslav

From pg131072 at protonmail.com  Sun Feb  7 14:20:56 2021
From: pg131072 at protonmail.com (pg131072)
Date: Sun, 07 Feb 2021 14:20:56 +0000
Subject: Fw: Suggestion: Extended AllowedIPs syntax
In-Reply-To: <IVCtdT6g25RGmvC0J8IaIfgXll8aFl8QRaSYYN-fsGA40QXw2xuMVS39Ujnaq1bqqg6LoKRxyo80ZrV-yxoYIUSNXAZV_sWZhfdRj3QWPFo=@protonmail.com>
References: <IVCtdT6g25RGmvC0J8IaIfgXll8aFl8QRaSYYN-fsGA40QXw2xuMVS39Ujnaq1bqqg6LoKRxyo80ZrV-yxoYIUSNXAZV_sWZhfdRj3QWPFo=@protonmail.com>
Message-ID: <GTiw2RsJf2QRZPV5fdL0Ae5UH4u5SlycvoZJKMDbH4f7D7H7MFeNaIdNiMAcJRFZISYndJSaj-rQQIH0XhV7r1kJk3GkVH-7yiCkZeJQSCU=@protonmail.com>

I find the AllowedIPs CIDR format difficult to grok. What if Wireguard allowed...

 +IP/mask - add a range
 +IP-IP - add a range
 -IP/mask - remove a range
 -IP-IP - remove a range

Multiple terms would be interpreted left to right

i.e.

AllowedIPs: +1.2.3.0/24 -1.2.3.1-1.2.3.10 -1.2.3.255

Example C++ code:https://pastebin.com/mCLCg5vr

Thanks

PG

Note: I originally posted to Reddit:?
https://www.reddit.com/r/WireGuard/comments/lemdmv/suggestion_extended_allowedips_syntax/


From harald.dunkel at aixigo.com  Thu Feb  4 07:51:35 2021
From: harald.dunkel at aixigo.com (Harald Dunkel)
Date: Thu, 4 Feb 2021 08:51:35 +0100
Subject: QR code is very useful, if its available
Message-ID: <0e968b54-d3c3-bc15-9139-e7e1b6b79138@aixigo.com>

Hi folks,

2 remarks about the very useful QR code feature:

* Too bad its missing on some platforms. All these road warrior
    laptops (Mac or Intel) do have a camera.
* Maybe it would be possible to support password protected QR
    codes? Not sure if such a thing exists at all.


Regards
Harri


From erik at essd.nl  Mon Feb  8 11:36:36 2021
From: erik at essd.nl (Erik Schuitema)
Date: Mon, 8 Feb 2021 12:36:36 +0100
Subject: "BUG: scheduling while atomic" on 5.4 kernels with PREEMPT_RT
In-Reply-To: <CAHmME9pjcBFBQM67iTOjG+HLH8OnMUhxNOX2rne2umiNW9bHew@mail.gmail.com>
References: <90c10d21558d31825a56aac48692b080@essd.nl>
 <CAHmME9qmYATYk2oV0T3OmnE4J1uTy49p09gyuRHdrGNZHKsejg@mail.gmail.com>
 <9c5569cf88048c3ceb343340e68d7564@essd.nl>
 <CAHmME9pjcBFBQM67iTOjG+HLH8OnMUhxNOX2rne2umiNW9bHew@mail.gmail.com>
Message-ID: <0d84e883-2aa5-df3c-95bd-24304223d07f@essd.nl>

Hi Jason,

(Sorry for the delay in my reply..)

On 19/12/2020 19:16, Jason A. Donenfeld wrote:
 > So far as I can tell, upstream is fine with this. I'd encourage you to
 > move to the newer LTS, 5.10. The compat stuff has always been pretty
 > meh. It was an important step in getting WireGuard bootstrapped, of
 > course, but just look at this horror:
 >
 > https://git.zx2c4.com/wireguard-linux-compat/tree/src/compat/compat.h

I don't have doubts about the upstream code, I was merely wondering 
whether the performance hit from disabling SIMD is still present in 
newer kernels (it wasn't immediately obvious to me while browsing the 
5.10 source).

 > I'll keep it working as people need, but folks should really really
 > move to the new LTS, now that it's out.

These efforts are highly appreciated! It's not trivial for me to switch 
to a new kernel (needs extensive product testing), so I'm happy with the 
5.4 patch. But I'll be sure to skip right to 5.10 when moving to a new 
kernel.

Best regards,
Erik


From Jason at zx2c4.com  Mon Feb  8 13:38:16 2021
From: Jason at zx2c4.com (Jason A. Donenfeld)
Date: Mon,  8 Feb 2021 14:38:16 +0100
Subject: [PATCH RFC v1] wireguard: queueing: get rid of per-peer ring buffers
Message-ID: <20210208133816.45333-1-Jason@zx2c4.com>

Having two ring buffers per-peer means that every peer results in two
massive ring allocations. On an 8-core x86_64 machine, this commit
reduces the per-peer allocation from 18,688 bytes to 1,856 bytes, which
is an 90% reduction. Ninety percent! With some single-machine
deployments approaching 400,000 peers, we're talking about a reduction
from 7 gigs of memory down to 700 megs of memory.

In order to get rid of these per-peer allocations, this commit switches
to using a list-based queueing approach. Currently GSO fragments are
chained together using the skb->next pointer, so we form the per-peer
queue around the unused skb->prev pointer, which makes sense because the
links are pointing backwards. Multiple cores can write into the queue at
any given time, because its writes occur in the start_xmit path or in
the udp_recv path. But reads happen in a single workqueue item per-peer,
amounting to a multi-producer, single-consumer paradigm.

The MPSC queue is implemented locklessly and never blocks. However, it
is not linearizable (though it is serializable), with a very tight and
unlikely race on writes, which, when hit (about 0.15% of the time on a
fully loaded 16-core x86_64 system), causes the queue reader to
terminate early. However, because every packet sent queues up the same
workqueue item after it is fully added, the queue resumes again, and
stopping early isn't actually a problem, since at that point the packet
wouldn't have yet been added to the encryption queue. These properties
allow us to avoid disabling interrupts or spinning.

Performance-wise, ordinarily list-based queues aren't preferable to
ringbuffers, because of cache misses when following pointers around.
However, we *already* have to follow the adjacent pointers when working
through fragments, so there shouldn't actually be any change there. A
potential downside is that dequeueing is a bit more complicated, but the
ptr_ring structure used prior had a spinlock when dequeueing, so all and
all the difference appears to be a wash.

Actually, from profiling, the biggest performance hit, by far, of this
commit winds up being atomic_add_unless(count, 1, max) and atomic_
dec(count), which account for the majority of CPU time, according to
perf. In that sense, the previous ring buffer was superior in that it
could check if it was full by head==tail, which the list-based approach
cannot do.

Cc: Dmitry Vyukov <dvyukov at google.com>
Signed-off-by: Jason A. Donenfeld <Jason at zx2c4.com>
---
Hoping to get some feedback here from people running massive deployments
and running into ram issues, as well as Dmitry on the queueing semantics
(the mpsc queue is his design), before I send this to Dave for merging.
These changes are quite invasive, so I don't want to get anything wrong.

 drivers/net/wireguard/device.c   | 12 ++---
 drivers/net/wireguard/device.h   | 15 +++---
 drivers/net/wireguard/peer.c     | 29 ++++-------
 drivers/net/wireguard/peer.h     |  4 +-
 drivers/net/wireguard/queueing.c | 82 +++++++++++++++++++++++++-------
 drivers/net/wireguard/queueing.h | 45 +++++++++++++-----
 drivers/net/wireguard/receive.c  | 16 +++----
 drivers/net/wireguard/send.c     | 31 +++++-------
 8 files changed, 141 insertions(+), 93 deletions(-)

diff --git a/drivers/net/wireguard/device.c b/drivers/net/wireguard/device.c
index cd51a2afa28e..d744199823b3 100644
--- a/drivers/net/wireguard/device.c
+++ b/drivers/net/wireguard/device.c
@@ -234,8 +234,8 @@ static void wg_destruct(struct net_device *dev)
 	destroy_workqueue(wg->handshake_receive_wq);
 	destroy_workqueue(wg->handshake_send_wq);
 	destroy_workqueue(wg->packet_crypt_wq);
-	wg_packet_queue_free(&wg->decrypt_queue, true);
-	wg_packet_queue_free(&wg->encrypt_queue, true);
+	wg_packet_queue_free(&wg->decrypt_queue);
+	wg_packet_queue_free(&wg->encrypt_queue);
 	rcu_barrier(); /* Wait for all the peers to be actually freed. */
 	wg_ratelimiter_uninit();
 	memzero_explicit(&wg->static_identity, sizeof(wg->static_identity));
@@ -337,12 +337,12 @@ static int wg_newlink(struct net *src_net, struct net_device *dev,
 		goto err_destroy_handshake_send;
 
 	ret = wg_packet_queue_init(&wg->encrypt_queue, wg_packet_encrypt_worker,
-				   true, MAX_QUEUED_PACKETS);
+				   MAX_QUEUED_PACKETS);
 	if (ret < 0)
 		goto err_destroy_packet_crypt;
 
 	ret = wg_packet_queue_init(&wg->decrypt_queue, wg_packet_decrypt_worker,
-				   true, MAX_QUEUED_PACKETS);
+				   MAX_QUEUED_PACKETS);
 	if (ret < 0)
 		goto err_free_encrypt_queue;
 
@@ -367,9 +367,9 @@ static int wg_newlink(struct net *src_net, struct net_device *dev,
 err_uninit_ratelimiter:
 	wg_ratelimiter_uninit();
 err_free_decrypt_queue:
-	wg_packet_queue_free(&wg->decrypt_queue, true);
+	wg_packet_queue_free(&wg->decrypt_queue);
 err_free_encrypt_queue:
-	wg_packet_queue_free(&wg->encrypt_queue, true);
+	wg_packet_queue_free(&wg->encrypt_queue);
 err_destroy_packet_crypt:
 	destroy_workqueue(wg->packet_crypt_wq);
 err_destroy_handshake_send:
diff --git a/drivers/net/wireguard/device.h b/drivers/net/wireguard/device.h
index 4d0144e16947..cb919f2ad1f8 100644
--- a/drivers/net/wireguard/device.h
+++ b/drivers/net/wireguard/device.h
@@ -27,13 +27,14 @@ struct multicore_worker {
 
 struct crypt_queue {
 	struct ptr_ring ring;
-	union {
-		struct {
-			struct multicore_worker __percpu *worker;
-			int last_cpu;
-		};
-		struct work_struct work;
-	};
+	struct multicore_worker __percpu *worker;
+	int last_cpu;
+};
+
+struct prev_queue {
+	struct sk_buff *head, *tail, *peeked;
+	struct { struct sk_buff *next, *prev; } empty;
+	atomic_t count;
 };
 
 struct wg_device {
diff --git a/drivers/net/wireguard/peer.c b/drivers/net/wireguard/peer.c
index b3b6370e6b95..1969fc22d47e 100644
--- a/drivers/net/wireguard/peer.c
+++ b/drivers/net/wireguard/peer.c
@@ -32,27 +32,22 @@ struct wg_peer *wg_peer_create(struct wg_device *wg,
 	peer = kzalloc(sizeof(*peer), GFP_KERNEL);
 	if (unlikely(!peer))
 		return ERR_PTR(ret);
-	peer->device = wg;
+	if (dst_cache_init(&peer->endpoint_cache, GFP_KERNEL))
+		goto err;
 
+	peer->device = wg;
 	wg_noise_handshake_init(&peer->handshake, &wg->static_identity,
 				public_key, preshared_key, peer);
-	if (dst_cache_init(&peer->endpoint_cache, GFP_KERNEL))
-		goto err_1;
-	if (wg_packet_queue_init(&peer->tx_queue, wg_packet_tx_worker, false,
-				 MAX_QUEUED_PACKETS))
-		goto err_2;
-	if (wg_packet_queue_init(&peer->rx_queue, NULL, false,
-				 MAX_QUEUED_PACKETS))
-		goto err_3;
-
 	peer->internal_id = atomic64_inc_return(&peer_counter);
 	peer->serial_work_cpu = nr_cpumask_bits;
 	wg_cookie_init(&peer->latest_cookie);
 	wg_timers_init(peer);
 	wg_cookie_checker_precompute_peer_keys(peer);
 	spin_lock_init(&peer->keypairs.keypair_update_lock);
-	INIT_WORK(&peer->transmit_handshake_work,
-		  wg_packet_handshake_send_worker);
+	INIT_WORK(&peer->transmit_handshake_work, wg_packet_handshake_send_worker);
+	INIT_WORK(&peer->transmit_packet_work, wg_packet_tx_worker);
+	wg_prev_queue_init(&peer->tx_queue);
+	wg_prev_queue_init(&peer->rx_queue);
 	rwlock_init(&peer->endpoint_lock);
 	kref_init(&peer->refcount);
 	skb_queue_head_init(&peer->staged_packet_queue);
@@ -68,11 +63,7 @@ struct wg_peer *wg_peer_create(struct wg_device *wg,
 	pr_debug("%s: Peer %llu created\n", wg->dev->name, peer->internal_id);
 	return peer;
 
-err_3:
-	wg_packet_queue_free(&peer->tx_queue, false);
-err_2:
-	dst_cache_destroy(&peer->endpoint_cache);
-err_1:
+err:
 	kfree(peer);
 	return ERR_PTR(ret);
 }
@@ -197,8 +188,8 @@ static void rcu_release(struct rcu_head *rcu)
 	struct wg_peer *peer = container_of(rcu, struct wg_peer, rcu);
 
 	dst_cache_destroy(&peer->endpoint_cache);
-	wg_packet_queue_free(&peer->rx_queue, false);
-	wg_packet_queue_free(&peer->tx_queue, false);
+	WARN_ON(wg_prev_queue_dequeue(&peer->tx_queue) || peer->tx_queue.peeked);
+	WARN_ON(wg_prev_queue_dequeue(&peer->rx_queue) || peer->rx_queue.peeked);
 
 	/* The final zeroing takes care of clearing any remaining handshake key
 	 * material and other potentially sensitive information.
diff --git a/drivers/net/wireguard/peer.h b/drivers/net/wireguard/peer.h
index aaff8de6e34b..8d53b687a1d1 100644
--- a/drivers/net/wireguard/peer.h
+++ b/drivers/net/wireguard/peer.h
@@ -36,7 +36,7 @@ struct endpoint {
 
 struct wg_peer {
 	struct wg_device *device;
-	struct crypt_queue tx_queue, rx_queue;
+	struct prev_queue tx_queue, rx_queue;
 	struct sk_buff_head staged_packet_queue;
 	int serial_work_cpu;
 	bool is_dead;
@@ -46,7 +46,7 @@ struct wg_peer {
 	rwlock_t endpoint_lock;
 	struct noise_handshake handshake;
 	atomic64_t last_sent_handshake;
-	struct work_struct transmit_handshake_work, clear_peer_work;
+	struct work_struct transmit_handshake_work, clear_peer_work, transmit_packet_work;
 	struct cookie latest_cookie;
 	struct hlist_node pubkey_hash;
 	u64 rx_bytes, tx_bytes;
diff --git a/drivers/net/wireguard/queueing.c b/drivers/net/wireguard/queueing.c
index 71b8e80b58e1..a72380ce97dd 100644
--- a/drivers/net/wireguard/queueing.c
+++ b/drivers/net/wireguard/queueing.c
@@ -9,8 +9,7 @@ struct multicore_worker __percpu *
 wg_packet_percpu_multicore_worker_alloc(work_func_t function, void *ptr)
 {
 	int cpu;
-	struct multicore_worker __percpu *worker =
-		alloc_percpu(struct multicore_worker);
+	struct multicore_worker __percpu *worker = alloc_percpu(struct multicore_worker);
 
 	if (!worker)
 		return NULL;
@@ -23,7 +22,7 @@ wg_packet_percpu_multicore_worker_alloc(work_func_t function, void *ptr)
 }
 
 int wg_packet_queue_init(struct crypt_queue *queue, work_func_t function,
-			 bool multicore, unsigned int len)
+			 unsigned int len)
 {
 	int ret;
 
@@ -31,25 +30,74 @@ int wg_packet_queue_init(struct crypt_queue *queue, work_func_t function,
 	ret = ptr_ring_init(&queue->ring, len, GFP_KERNEL);
 	if (ret)
 		return ret;
-	if (function) {
-		if (multicore) {
-			queue->worker = wg_packet_percpu_multicore_worker_alloc(
-				function, queue);
-			if (!queue->worker) {
-				ptr_ring_cleanup(&queue->ring, NULL);
-				return -ENOMEM;
-			}
-		} else {
-			INIT_WORK(&queue->work, function);
-		}
+	queue->worker = wg_packet_percpu_multicore_worker_alloc(function, queue);
+	if (!queue->worker) {
+		ptr_ring_cleanup(&queue->ring, NULL);
+		return -ENOMEM;
 	}
 	return 0;
 }
 
-void wg_packet_queue_free(struct crypt_queue *queue, bool multicore)
+void wg_packet_queue_free(struct crypt_queue *queue)
 {
-	if (multicore)
-		free_percpu(queue->worker);
+	free_percpu(queue->worker);
 	WARN_ON(!__ptr_ring_empty(&queue->ring));
 	ptr_ring_cleanup(&queue->ring, NULL);
 }
+
+#define NEXT(skb) ((skb)->prev)
+#define STUB(queue) ((struct sk_buff *)&queue->empty)
+
+void wg_prev_queue_init(struct prev_queue *queue)
+{
+	NEXT(STUB(queue)) = NULL;
+	queue->head = queue->tail = STUB(queue);
+	queue->peeked = NULL;
+	atomic_set(&queue->count, 0);
+}
+
+static void __wg_prev_queue_enqueue(struct prev_queue *queue, struct sk_buff *skb)
+{
+	WRITE_ONCE(NEXT(skb), NULL);
+	smp_wmb();
+	WRITE_ONCE(NEXT(xchg_relaxed(&queue->head, skb)), skb);
+}
+
+bool wg_prev_queue_enqueue(struct prev_queue *queue, struct sk_buff *skb)
+{
+	if (!atomic_add_unless(&queue->count, 1, MAX_QUEUED_PACKETS))
+		return false;
+	__wg_prev_queue_enqueue(queue, skb);
+	return true;
+}
+
+struct sk_buff *wg_prev_queue_dequeue(struct prev_queue *queue)
+{
+	struct sk_buff *tail = queue->tail, *next = smp_load_acquire(&NEXT(tail));
+
+	if (tail == STUB(queue)) {
+		if (!next)
+			return NULL;
+		queue->tail = next;
+		tail = next;
+		next = smp_load_acquire(&NEXT(next));
+	}
+	if (next) {
+		queue->tail = next;
+		atomic_dec(&queue->count);
+		return tail;
+	}
+	if (tail != READ_ONCE(queue->head))
+		return NULL;
+	__wg_prev_queue_enqueue(queue, STUB(queue));
+	next = smp_load_acquire(&NEXT(tail));
+	if (next) {
+		queue->tail = next;
+		atomic_dec(&queue->count);
+		return tail;
+	}
+	return NULL;
+}
+
+#undef NEXT
+#undef STUB
diff --git a/drivers/net/wireguard/queueing.h b/drivers/net/wireguard/queueing.h
index dfb674e03076..4ef2944a68bc 100644
--- a/drivers/net/wireguard/queueing.h
+++ b/drivers/net/wireguard/queueing.h
@@ -17,12 +17,13 @@ struct wg_device;
 struct wg_peer;
 struct multicore_worker;
 struct crypt_queue;
+struct prev_queue;
 struct sk_buff;
 
 /* queueing.c APIs: */
 int wg_packet_queue_init(struct crypt_queue *queue, work_func_t function,
-			 bool multicore, unsigned int len);
-void wg_packet_queue_free(struct crypt_queue *queue, bool multicore);
+			 unsigned int len);
+void wg_packet_queue_free(struct crypt_queue *queue);
 struct multicore_worker __percpu *
 wg_packet_percpu_multicore_worker_alloc(work_func_t function, void *ptr);
 
@@ -135,8 +136,31 @@ static inline int wg_cpumask_next_online(int *next)
 	return cpu;
 }
 
+void wg_prev_queue_init(struct prev_queue *queue);
+
+/* Multi producer */
+bool wg_prev_queue_enqueue(struct prev_queue *queue, struct sk_buff *skb);
+
+/* Single consumer */
+struct sk_buff *wg_prev_queue_dequeue(struct prev_queue *queue);
+
+/* Single consumer */
+static inline struct sk_buff *wg_prev_queue_peek(struct prev_queue *queue)
+{
+	if (queue->peeked)
+		return queue->peeked;
+	queue->peeked = wg_prev_queue_dequeue(queue);
+	return queue->peeked;
+}
+
+/* Single consumer */
+static inline void wg_prev_queue_drop_peeked(struct prev_queue *queue)
+{
+	queue->peeked = NULL;
+}
+
 static inline int wg_queue_enqueue_per_device_and_peer(
-	struct crypt_queue *device_queue, struct crypt_queue *peer_queue,
+	struct crypt_queue *device_queue, struct prev_queue *peer_queue,
 	struct sk_buff *skb, struct workqueue_struct *wq, int *next_cpu)
 {
 	int cpu;
@@ -145,8 +169,9 @@ static inline int wg_queue_enqueue_per_device_and_peer(
 	/* We first queue this up for the peer ingestion, but the consumer
 	 * will wait for the state to change to CRYPTED or DEAD before.
 	 */
-	if (unlikely(ptr_ring_produce_bh(&peer_queue->ring, skb)))
+	if (unlikely(!wg_prev_queue_enqueue(peer_queue, skb)))
 		return -ENOSPC;
+
 	/* Then we queue it up in the device queue, which consumes the
 	 * packet as soon as it can.
 	 */
@@ -157,9 +182,7 @@ static inline int wg_queue_enqueue_per_device_and_peer(
 	return 0;
 }
 
-static inline void wg_queue_enqueue_per_peer(struct crypt_queue *queue,
-					     struct sk_buff *skb,
-					     enum packet_state state)
+static inline void wg_queue_enqueue_per_peer_tx(struct sk_buff *skb, enum packet_state state)
 {
 	/* We take a reference, because as soon as we call atomic_set, the
 	 * peer can be freed from below us.
@@ -167,14 +190,12 @@ static inline void wg_queue_enqueue_per_peer(struct crypt_queue *queue,
 	struct wg_peer *peer = wg_peer_get(PACKET_PEER(skb));
 
 	atomic_set_release(&PACKET_CB(skb)->state, state);
-	queue_work_on(wg_cpumask_choose_online(&peer->serial_work_cpu,
-					       peer->internal_id),
-		      peer->device->packet_crypt_wq, &queue->work);
+	queue_work_on(wg_cpumask_choose_online(&peer->serial_work_cpu, peer->internal_id),
+		      peer->device->packet_crypt_wq, &peer->transmit_packet_work);
 	wg_peer_put(peer);
 }
 
-static inline void wg_queue_enqueue_per_peer_napi(struct sk_buff *skb,
-						  enum packet_state state)
+static inline void wg_queue_enqueue_per_peer_rx(struct sk_buff *skb, enum packet_state state)
 {
 	/* We take a reference, because as soon as we call atomic_set, the
 	 * peer can be freed from below us.
diff --git a/drivers/net/wireguard/receive.c b/drivers/net/wireguard/receive.c
index 2c9551ea6dc7..7dc84bcca261 100644
--- a/drivers/net/wireguard/receive.c
+++ b/drivers/net/wireguard/receive.c
@@ -444,7 +444,6 @@ static void wg_packet_consume_data_done(struct wg_peer *peer,
 int wg_packet_rx_poll(struct napi_struct *napi, int budget)
 {
 	struct wg_peer *peer = container_of(napi, struct wg_peer, napi);
-	struct crypt_queue *queue = &peer->rx_queue;
 	struct noise_keypair *keypair;
 	struct endpoint endpoint;
 	enum packet_state state;
@@ -455,11 +454,10 @@ int wg_packet_rx_poll(struct napi_struct *napi, int budget)
 	if (unlikely(budget <= 0))
 		return 0;
 
-	while ((skb = __ptr_ring_peek(&queue->ring)) != NULL &&
+	while ((skb = wg_prev_queue_peek(&peer->rx_queue)) != NULL &&
 	       (state = atomic_read_acquire(&PACKET_CB(skb)->state)) !=
 		       PACKET_STATE_UNCRYPTED) {
-		__ptr_ring_discard_one(&queue->ring);
-		peer = PACKET_PEER(skb);
+		wg_prev_queue_drop_peeked(&peer->rx_queue);
 		keypair = PACKET_CB(skb)->keypair;
 		free = true;
 
@@ -508,7 +506,7 @@ void wg_packet_decrypt_worker(struct work_struct *work)
 		enum packet_state state =
 			likely(decrypt_packet(skb, PACKET_CB(skb)->keypair)) ?
 				PACKET_STATE_CRYPTED : PACKET_STATE_DEAD;
-		wg_queue_enqueue_per_peer_napi(skb, state);
+		wg_queue_enqueue_per_peer_rx(skb, state);
 		if (need_resched())
 			cond_resched();
 	}
@@ -531,12 +529,10 @@ static void wg_packet_consume_data(struct wg_device *wg, struct sk_buff *skb)
 	if (unlikely(READ_ONCE(peer->is_dead)))
 		goto err;
 
-	ret = wg_queue_enqueue_per_device_and_peer(&wg->decrypt_queue,
-						   &peer->rx_queue, skb,
-						   wg->packet_crypt_wq,
-						   &wg->decrypt_queue.last_cpu);
+	ret = wg_queue_enqueue_per_device_and_peer(&wg->decrypt_queue, &peer->rx_queue, skb,
+						   wg->packet_crypt_wq, &wg->decrypt_queue.last_cpu);
 	if (unlikely(ret == -EPIPE))
-		wg_queue_enqueue_per_peer_napi(skb, PACKET_STATE_DEAD);
+		wg_queue_enqueue_per_peer_rx(skb, PACKET_STATE_DEAD);
 	if (likely(!ret || ret == -EPIPE)) {
 		rcu_read_unlock_bh();
 		return;
diff --git a/drivers/net/wireguard/send.c b/drivers/net/wireguard/send.c
index f74b9341ab0f..5368f7c35b4b 100644
--- a/drivers/net/wireguard/send.c
+++ b/drivers/net/wireguard/send.c
@@ -239,8 +239,7 @@ void wg_packet_send_keepalive(struct wg_peer *peer)
 	wg_packet_send_staged_packets(peer);
 }
 
-static void wg_packet_create_data_done(struct sk_buff *first,
-				       struct wg_peer *peer)
+static void wg_packet_create_data_done(struct wg_peer *peer, struct sk_buff *first)
 {
 	struct sk_buff *skb, *next;
 	bool is_keepalive, data_sent = false;
@@ -262,22 +261,19 @@ static void wg_packet_create_data_done(struct sk_buff *first,
 
 void wg_packet_tx_worker(struct work_struct *work)
 {
-	struct crypt_queue *queue = container_of(work, struct crypt_queue,
-						 work);
+	struct wg_peer *peer = container_of(work, struct wg_peer, transmit_packet_work);
 	struct noise_keypair *keypair;
 	enum packet_state state;
 	struct sk_buff *first;
-	struct wg_peer *peer;
 
-	while ((first = __ptr_ring_peek(&queue->ring)) != NULL &&
+	while ((first = wg_prev_queue_peek(&peer->tx_queue)) != NULL &&
 	       (state = atomic_read_acquire(&PACKET_CB(first)->state)) !=
 		       PACKET_STATE_UNCRYPTED) {
-		__ptr_ring_discard_one(&queue->ring);
-		peer = PACKET_PEER(first);
+		wg_prev_queue_drop_peeked(&peer->tx_queue);
 		keypair = PACKET_CB(first)->keypair;
 
 		if (likely(state == PACKET_STATE_CRYPTED))
-			wg_packet_create_data_done(first, peer);
+			wg_packet_create_data_done(peer, first);
 		else
 			kfree_skb_list(first);
 
@@ -306,16 +302,14 @@ void wg_packet_encrypt_worker(struct work_struct *work)
 				break;
 			}
 		}
-		wg_queue_enqueue_per_peer(&PACKET_PEER(first)->tx_queue, first,
-					  state);
+		wg_queue_enqueue_per_peer_tx(first, state);
 		if (need_resched())
 			cond_resched();
 	}
 }
 
-static void wg_packet_create_data(struct sk_buff *first)
+static void wg_packet_create_data(struct wg_peer *peer, struct sk_buff *first)
 {
-	struct wg_peer *peer = PACKET_PEER(first);
 	struct wg_device *wg = peer->device;
 	int ret = -EINVAL;
 
@@ -323,13 +317,10 @@ static void wg_packet_create_data(struct sk_buff *first)
 	if (unlikely(READ_ONCE(peer->is_dead)))
 		goto err;
 
-	ret = wg_queue_enqueue_per_device_and_peer(&wg->encrypt_queue,
-						   &peer->tx_queue, first,
-						   wg->packet_crypt_wq,
-						   &wg->encrypt_queue.last_cpu);
+	ret = wg_queue_enqueue_per_device_and_peer(&wg->encrypt_queue, &peer->tx_queue, first,
+						   wg->packet_crypt_wq, &wg->encrypt_queue.last_cpu);
 	if (unlikely(ret == -EPIPE))
-		wg_queue_enqueue_per_peer(&peer->tx_queue, first,
-					  PACKET_STATE_DEAD);
+		wg_queue_enqueue_per_peer_tx(first, PACKET_STATE_DEAD);
 err:
 	rcu_read_unlock_bh();
 	if (likely(!ret || ret == -EPIPE))
@@ -393,7 +384,7 @@ void wg_packet_send_staged_packets(struct wg_peer *peer)
 	packets.prev->next = NULL;
 	wg_peer_get(keypair->entry.peer);
 	PACKET_CB(packets.next)->keypair = keypair;
-	wg_packet_create_data(packets.next);
+	wg_packet_create_data(peer, packets.next);
 	return;
 
 out_invalid:
-- 
2.30.0


From jp at sec.uni-passau.de  Mon Feb  8 21:10:46 2021
From: jp at sec.uni-passau.de (Posegga, Joachim)
Date: Mon, 8 Feb 2021 21:10:46 +0000
Subject: Suggestion: Extended AllowedIPs syntax
In-Reply-To: <GTiw2RsJf2QRZPV5fdL0Ae5UH4u5SlycvoZJKMDbH4f7D7H7MFeNaIdNiMAcJRFZISYndJSaj-rQQIH0XhV7r1kJk3GkVH-7yiCkZeJQSCU=@protonmail.com>
References: <IVCtdT6g25RGmvC0J8IaIfgXll8aFl8QRaSYYN-fsGA40QXw2xuMVS39Ujnaq1bqqg6LoKRxyo80ZrV-yxoYIUSNXAZV_sWZhfdRj3QWPFo=@protonmail.com>
 <GTiw2RsJf2QRZPV5fdL0Ae5UH4u5SlycvoZJKMDbH4f7D7H7MFeNaIdNiMAcJRFZISYndJSaj-rQQIH0XhV7r1kJk3GkVH-7yiCkZeJQSCU=@protonmail.com>
Message-ID: <e2f7a02c4aa1436abbb88467faf2a05b@smith.sec.uni-passau.de>

I would very much appreciate a way to exclude subnets from being routed through a wg tunnel. Would be much more convenient than changing the system's routing table by hand, e.g. if you want to keep connectivity to your local subnet when establishing a tunnel for 0.0.0.0/0.

-----Original Message-----
From: WireGuard [mailto:wireguard-bounces at lists.zx2c4.com] On Behalf Of pg131072
Sent: Sunday, 7 February, 2021 15:21
To: wireguard at lists.zx2c4.com
Subject: Fw: Suggestion: Extended AllowedIPs syntax

I find the AllowedIPs CIDR format difficult to grok. What if Wireguard allowed...

 +IP/mask - add a range
 +IP-IP - add a range
 -IP/mask - remove a range
 -IP-IP - remove a range

Multiple terms would be interpreted left to right

i.e.

AllowedIPs: +1.2.3.0/24 -1.2.3.1-1.2.3.10 -1.2.3.255

Example C++ code:https://pastebin.com/mCLCg5vr

Thanks

PG

Note: I originally posted to Reddit:?
https://www.reddit.com/r/WireGuard/comments/lemdmv/suggestion_extended_allowedips_syntax/


From dvyukov at google.com  Tue Feb  9 08:24:27 2021
From: dvyukov at google.com (Dmitry Vyukov)
Date: Tue, 9 Feb 2021 09:24:27 +0100
Subject: [PATCH RFC v1] wireguard: queueing: get rid of per-peer ring
 buffers
In-Reply-To: <20210208133816.45333-1-Jason@zx2c4.com>
References: <20210208133816.45333-1-Jason@zx2c4.com>
Message-ID: <CACT4Y+ZhwdjNAOhT7zgfe2OtNhtfxAzwcWRxGy8SRe+B4c=q7A@mail.gmail.com>

On Mon, Feb 8, 2021 at 2:38 PM Jason A. Donenfeld <Jason at zx2c4.com> wrote:
>
> Having two ring buffers per-peer means that every peer results in two
> massive ring allocations. On an 8-core x86_64 machine, this commit
> reduces the per-peer allocation from 18,688 bytes to 1,856 bytes, which
> is an 90% reduction. Ninety percent! With some single-machine
> deployments approaching 400,000 peers, we're talking about a reduction
> from 7 gigs of memory down to 700 megs of memory.
>
> In order to get rid of these per-peer allocations, this commit switches
> to using a list-based queueing approach. Currently GSO fragments are
> chained together using the skb->next pointer, so we form the per-peer
> queue around the unused skb->prev pointer, which makes sense because the
> links are pointing backwards. Multiple cores can write into the queue at
> any given time, because its writes occur in the start_xmit path or in
> the udp_recv path. But reads happen in a single workqueue item per-peer,
> amounting to a multi-producer, single-consumer paradigm.
>
> The MPSC queue is implemented locklessly and never blocks. However, it
> is not linearizable (though it is serializable), with a very tight and
> unlikely race on writes, which, when hit (about 0.15% of the time on a
> fully loaded 16-core x86_64 system), causes the queue reader to
> terminate early. However, because every packet sent queues up the same
> workqueue item after it is fully added, the queue resumes again, and
> stopping early isn't actually a problem, since at that point the packet
> wouldn't have yet been added to the encryption queue. These properties
> allow us to avoid disabling interrupts or spinning.

Hi Jason,

Exciting! I reviewed only the queue code itself.

Strictly saying, 0.15% is for delaying the newly added item only. This
is not a problem, we can just consider that push has not finished yet
in this case. You can get this with any queue. It's just that consumer
has peeked on producer that it started enqueue but has not finished
yet. In a mutex-protected queue consumers just don't have the
opportunity to peek, they just block until enqueue has completed.
The problem is only when a partially queued item blocks subsequent
completely queued items. That should be some small fraction of 0.15%.


> Performance-wise, ordinarily list-based queues aren't preferable to
> ringbuffers, because of cache misses when following pointers around.
> However, we *already* have to follow the adjacent pointers when working
> through fragments, so there shouldn't actually be any change there. A
> potential downside is that dequeueing is a bit more complicated, but the
> ptr_ring structure used prior had a spinlock when dequeueing, so all and
> all the difference appears to be a wash.
>
> Actually, from profiling, the biggest performance hit, by far, of this
> commit winds up being atomic_add_unless(count, 1, max) and atomic_
> dec(count), which account for the majority of CPU time, according to
> perf. In that sense, the previous ring buffer was superior in that it
> could check if it was full by head==tail, which the list-based approach
> cannot do.

We could try to cheat a bit here.
We could split the counter into:

atomic_t enqueued;
unsigned dequeued;

then, consumer will do just dequeued++.
Producers can do (depending on how precise you want them to be):

if ((int)(atomic_read(&enqueued) - dequeued) >= MAX)
    return false;
atomic_add(&enqueued, 1);

or, for more precise counting we could do a CAS loop on enqueued.
Since any modifications to dequeued can only lead to reduction of
size, we don't need to double check it before CAS, thus the CAS loop
should provide a precise upper bound on size.
Or, we could check, opportunistically increment, and then decrement if
overflow, but that looks the least favorable option.


> Cc: Dmitry Vyukov <dvyukov at google.com>
> Signed-off-by: Jason A. Donenfeld <Jason at zx2c4.com>

The queue logic looks correct to me.
I did not spot any significant algorithmic differences with my algorithm:
https://groups.google.com/g/lock-free/c/Vd9xuHrLggE/m/B9-URa3B37MJ

Reviewed-by: Dmitry Vyukov <dvyukov at google.com>

> ---
> Hoping to get some feedback here from people running massive deployments
> and running into ram issues, as well as Dmitry on the queueing semantics
> (the mpsc queue is his design), before I send this to Dave for merging.
> These changes are quite invasive, so I don't want to get anything wrong.


> +struct prev_queue {
> +       struct sk_buff *head, *tail, *peeked;
> +       struct { struct sk_buff *next, *prev; } empty;
> +       atomic_t count;
>  };


This would benefit from a comment explaining that empty needs to match
sk_buff up to prev (and a corresponding build bug that offset of prev
match in empty and sk_buff), and why we use prev instead of next (I
don't know).


> +#define NEXT(skb) ((skb)->prev)
> +#define STUB(queue) ((struct sk_buff *)&queue->empty)
> +
> +void wg_prev_queue_init(struct prev_queue *queue)
> +{
> +       NEXT(STUB(queue)) = NULL;
> +       queue->head = queue->tail = STUB(queue);
> +       queue->peeked = NULL;
> +       atomic_set(&queue->count, 0);
> +}
> +
> +static void __wg_prev_queue_enqueue(struct prev_queue *queue, struct sk_buff *skb)
> +{
> +       WRITE_ONCE(NEXT(skb), NULL);
> +       smp_wmb();
> +       WRITE_ONCE(NEXT(xchg_relaxed(&queue->head, skb)), skb);
> +}
> +
> +bool wg_prev_queue_enqueue(struct prev_queue *queue, struct sk_buff *skb)
> +{
> +       if (!atomic_add_unless(&queue->count, 1, MAX_QUEUED_PACKETS))
> +               return false;
> +       __wg_prev_queue_enqueue(queue, skb);
> +       return true;
> +}
> +
> +struct sk_buff *wg_prev_queue_dequeue(struct prev_queue *queue)
> +{
> +       struct sk_buff *tail = queue->tail, *next = smp_load_acquire(&NEXT(tail));
> +
> +       if (tail == STUB(queue)) {
> +               if (!next)
> +                       return NULL;
> +               queue->tail = next;
> +               tail = next;
> +               next = smp_load_acquire(&NEXT(next));
> +       }
> +       if (next) {
> +               queue->tail = next;
> +               atomic_dec(&queue->count);
> +               return tail;
> +       }
> +       if (tail != READ_ONCE(queue->head))
> +               return NULL;
> +       __wg_prev_queue_enqueue(queue, STUB(queue));
> +       next = smp_load_acquire(&NEXT(tail));
> +       if (next) {
> +               queue->tail = next;
> +               atomic_dec(&queue->count);
> +               return tail;
> +       }
> +       return NULL;
> +}
> +
> +#undef NEXT
> +#undef STUB


> +void wg_prev_queue_init(struct prev_queue *queue);
> +
> +/* Multi producer */
> +bool wg_prev_queue_enqueue(struct prev_queue *queue, struct sk_buff *skb);
> +
> +/* Single consumer */
> +struct sk_buff *wg_prev_queue_dequeue(struct prev_queue *queue);
> +
> +/* Single consumer */
> +static inline struct sk_buff *wg_prev_queue_peek(struct prev_queue *queue)
> +{
> +       if (queue->peeked)
> +               return queue->peeked;
> +       queue->peeked = wg_prev_queue_dequeue(queue);
> +       return queue->peeked;
> +}
> +
> +/* Single consumer */
> +static inline void wg_prev_queue_drop_peeked(struct prev_queue *queue)
> +{
> +       queue->peeked = NULL;
> +}


> @@ -197,8 +188,8 @@ static void rcu_release(struct rcu_head *rcu)
>         struct wg_peer *peer = container_of(rcu, struct wg_peer, rcu);
>
>         dst_cache_destroy(&peer->endpoint_cache);
> -       wg_packet_queue_free(&peer->rx_queue, false);
> -       wg_packet_queue_free(&peer->tx_queue, false);
> +       WARN_ON(wg_prev_queue_dequeue(&peer->tx_queue) || peer->tx_queue.peeked);
> +       WARN_ON(wg_prev_queue_dequeue(&peer->rx_queue) || peer->rx_queue.peeked);

This could use just wg_prev_queue_peek.

From Jason at zx2c4.com  Tue Feb  9 15:44:41 2021
From: Jason at zx2c4.com (Jason A. Donenfeld)
Date: Tue, 9 Feb 2021 16:44:41 +0100
Subject: [PATCH RFC v1] wireguard: queueing: get rid of per-peer ring
 buffers
In-Reply-To: <CACT4Y+ZhwdjNAOhT7zgfe2OtNhtfxAzwcWRxGy8SRe+B4c=q7A@mail.gmail.com>
References: <20210208133816.45333-1-Jason@zx2c4.com>
 <CACT4Y+ZhwdjNAOhT7zgfe2OtNhtfxAzwcWRxGy8SRe+B4c=q7A@mail.gmail.com>
Message-ID: <CAHmME9pc==7a3vw-bv7TTDPv6XXAba_uwOHhBJvvJHZpH7hqAg@mail.gmail.com>

Hi Dmitry,

Thanks for the review.

On Tue, Feb 9, 2021 at 9:24 AM Dmitry Vyukov <dvyukov at google.com> wrote:
> Strictly saying, 0.15% is for delaying the newly added item only. This
> is not a problem, we can just consider that push has not finished yet
> in this case. You can get this with any queue. It's just that consumer
> has peeked on producer that it started enqueue but has not finished
> yet. In a mutex-protected queue consumers just don't have the
> opportunity to peek, they just block until enqueue has completed.
> The problem is only when a partially queued item blocks subsequent
> completely queued items. That should be some small fraction of 0.15%.

Ah right. I'll make that clear in the commit message.

> We could try to cheat a bit here.
> We could split the counter into:
>
> atomic_t enqueued;
> unsigned dequeued;
>
> then, consumer will do just dequeued++.
> Producers can do (depending on how precise you want them to be):
>
> if ((int)(atomic_read(&enqueued) - dequeued) >= MAX)
>     return false;
> atomic_add(&enqueued, 1);
>
> or, for more precise counting we could do a CAS loop on enqueued.

I guess the CAS case would look like `if
(!atomic_add_unless(&enqueued, 1, MAX + dequeued))` or similar, though
>= might be safer than ==, so writing out the loop manually wouldn't
be a bad idea.

But... I would probably need smp_load/smp_store helpers around
dequeued, right? Unless we argue some degree of courseness doesn't
matter.

> Or, we could check, opportunistically increment, and then decrement if
> overflow, but that looks the least favorable option.

I had originally done something like that, but I didn't like the idea
of it being able to grow beyond the limit by the number of CPU cores.

The other option, of course, is to just do nothing, and keep the
atomic as-is. There's already ~high overhead from kref_get, so I could
always revisit this after I move from kref.h over to
percpu-refcount.h.

>
> > +struct prev_queue {
> > +       struct sk_buff *head, *tail, *peeked;
> > +       struct { struct sk_buff *next, *prev; } empty;
> > +       atomic_t count;
> >  };
>
>
> This would benefit from a comment explaining that empty needs to match
> sk_buff up to prev (and a corresponding build bug that offset of prev
> match in empty and sk_buff), and why we use prev instead of next (I
> don't know).

That's a good idea. Will do.


> > @@ -197,8 +188,8 @@ static void rcu_release(struct rcu_head *rcu)
> >         struct wg_peer *peer = container_of(rcu, struct wg_peer, rcu);
> >
> >         dst_cache_destroy(&peer->endpoint_cache);
> > -       wg_packet_queue_free(&peer->rx_queue, false);
> > -       wg_packet_queue_free(&peer->tx_queue, false);
> > +       WARN_ON(wg_prev_queue_dequeue(&peer->tx_queue) || peer->tx_queue.peeked);
> > +       WARN_ON(wg_prev_queue_dequeue(&peer->rx_queue) || peer->rx_queue.peeked);
>
> This could use just wg_prev_queue_peek.

Nice catch, thanks.

Jason

From dvyukov at google.com  Tue Feb  9 16:20:01 2021
From: dvyukov at google.com (Dmitry Vyukov)
Date: Tue, 9 Feb 2021 17:20:01 +0100
Subject: [PATCH RFC v1] wireguard: queueing: get rid of per-peer ring
 buffers
In-Reply-To: <CAHmME9pc==7a3vw-bv7TTDPv6XXAba_uwOHhBJvvJHZpH7hqAg@mail.gmail.com>
References: <20210208133816.45333-1-Jason@zx2c4.com>
 <CACT4Y+ZhwdjNAOhT7zgfe2OtNhtfxAzwcWRxGy8SRe+B4c=q7A@mail.gmail.com>
 <CAHmME9pc==7a3vw-bv7TTDPv6XXAba_uwOHhBJvvJHZpH7hqAg@mail.gmail.com>
Message-ID: <CACT4Y+YM9d6nN7Sjy=Z=Rw++r333RtFOXJgNU7WfrJk4V+nofw@mail.gmail.com>

On Tue, Feb 9, 2021 at 4:44 PM Jason A. Donenfeld <Jason at zx2c4.com> wrote:
>
> Hi Dmitry,
>
> Thanks for the review.
>
> On Tue, Feb 9, 2021 at 9:24 AM Dmitry Vyukov <dvyukov at google.com> wrote:
> > Strictly saying, 0.15% is for delaying the newly added item only. This
> > is not a problem, we can just consider that push has not finished yet
> > in this case. You can get this with any queue. It's just that consumer
> > has peeked on producer that it started enqueue but has not finished
> > yet. In a mutex-protected queue consumers just don't have the
> > opportunity to peek, they just block until enqueue has completed.
> > The problem is only when a partially queued item blocks subsequent
> > completely queued items. That should be some small fraction of 0.15%.
>
> Ah right. I'll make that clear in the commit message.
>
> > We could try to cheat a bit here.
> > We could split the counter into:
> >
> > atomic_t enqueued;
> > unsigned dequeued;
> >
> > then, consumer will do just dequeued++.
> > Producers can do (depending on how precise you want them to be):
> >
> > if ((int)(atomic_read(&enqueued) - dequeued) >= MAX)
> >     return false;
> > atomic_add(&enqueued, 1);
> >
> > or, for more precise counting we could do a CAS loop on enqueued.
>
> I guess the CAS case would look like `if
> (!atomic_add_unless(&enqueued, 1, MAX + dequeued))` or similar, though
> >= might be safer than ==, so writing out the loop manually wouldn't
> be a bad idea.

What I had in mind is:

int e = READ_ONCE(q->enqueued);
for (;;) {
  int d = READ_ONCE(q->dequeued);
  if (e - d >= MAX)
    return false;
  int x = CAS(&q->enqueued, e, e+1);
  if (x == e)
    break;
  e = x;
}

From bspencer at blackberry.com  Wed Feb 10 14:02:14 2021
From: bspencer at blackberry.com (Brad Spencer)
Date: Wed, 10 Feb 2021 10:02:14 -0400
Subject: [Wintun] DEPENDENTLOADFLAG for wintun.dll?
Message-ID: <6c752624-1195-ec77-c16a-9fd438cb11ae@blackberry.com>

Would it make sense to link the official wintun.dll with the MSVC 
linker's -DEPENDENTLOADFLAG:0x800 option?

https://docs.microsoft.com/en-us/cpp/build/reference/dependentloadflag

Doing so restricts the search path for immediate dependencies to the 
%windows%\system32\ directory, and I think all of the DLLs Wintun needs 
are there.

-- 
Brad Spencer


From Jason at zx2c4.com  Wed Feb 10 14:43:13 2021
From: Jason at zx2c4.com (Jason A. Donenfeld)
Date: Wed, 10 Feb 2021 15:43:13 +0100
Subject: [Wintun] DEPENDENTLOADFLAG for wintun.dll?
In-Reply-To: <6c752624-1195-ec77-c16a-9fd438cb11ae@blackberry.com>
References: <6c752624-1195-ec77-c16a-9fd438cb11ae@blackberry.com>
Message-ID: <CAHmME9o97YiTK6rjOmfE3G5BPaBzM_bdfiOBEGoBHbrDRzf9uA@mail.gmail.com>

Hi Brad,

On Wed, Feb 10, 2021 at 3:04 PM Brad Spencer <bspencer at blackberry.com> wrote:
>
> Would it make sense to link the official wintun.dll with the MSVC
> linker's -DEPENDENTLOADFLAG:0x800 option?
>
> https://docs.microsoft.com/en-us/cpp/build/reference/dependentloadflag
>
> Doing so restricts the search path for immediate dependencies to the
> %windows%\system32\ directory, and I think all of the DLLs Wintun needs
> are there.

That flag is a bit of a can of worms, which I haven't been too
inclined to open. See:
https://skanthak.homepage.t-online.de/snafu.html

Instead, wintun.dll uses delay loading for all DLLs except for
kernel32.dll and ntdll.dll, and then forces the delay loader hook
through LoadLibraryEx. See:
https://git.zx2c4.com/wintun/tree/api/entry.c#n25 You can see this in
action by putting wintun.dll into depends:
https://data.zx2c4.com/depends-for-wintun-dll-feb-2021.png

(CCing Stefan, in case he's curious. The DLLs in question are
https://www.wintun.net/builds/wintun-0.10.1.zip )

Jason

From b.braunger at syseleven.de  Mon Feb  8 20:05:30 2021
From: b.braunger at syseleven.de (Benedikt Braunger)
Date: Mon, 8 Feb 2021 21:05:30 +0100
Subject: Wireguard DKMS build on OpenVZ 7
Message-ID: <cf30a113-8621-ccff-36e1-e34e09545d49@syseleven.de>

Hello Tunnelerz,

Again I'm having trouble to compile the newest Wireguard DKMS module for
the wonderfully frankensteined OpenVZ / Virtuozzo 7 kernel.

The problem occures when updating the wireguard-dkms package

[17:32:10] root at test ~ # uname -a
Linux test 3.10.0-1127.18.2.vz7.163.46 #1 SMP Fri Nov 20 21:47:55 MSK 2020 x86_64 x86_64 x86_64 GNU/Linux
[17:32:11] root at test ~ # yum update wireguard-dkms -y 

...

Updated:

? wireguard-dkms.noarch 1:1.0.20210124-1.el7??????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????? 

Complete!

[17:34:02] root at test ~ # dkms status
wireguard, 1.0.20210124: added

[17:34:03] root at test ~ # dkms autoinstall 
Kernel preparation unnecessary for this kernel.? Skipping...

Building module:

cleaning build area...

make -j48 KERNELRELEASE=3.10.0-1127.18.2.vz7.163.46 -C /lib/modules/3.10.0-1127.18.2.vz7.163.46/build M=/var/lib/dkms/wireguard/1.0.20210124/build...(bad exit status: 2)
Error! Bad return status for module build on kernel: 3.10.0-1127.18.2.vz7.163.46 (x86_64)
Consult /var/lib/dkms/wireguard/1.0.20210124/build/make.log for more information.

[17:34:14] root at test ~ # tail /var/lib/dkms/wireguard/1.0.20210124/build/make.log
? AS [M]? /var/lib/dkms/wireguard/1.0.20210124/build/crypto/zinc/chacha20/chacha20-x86_64.o
/var/lib/dkms/wireguard/1.0.20210124/build/crypto/zinc/chacha20/chacha20-x86_64.o: warning: objtool: chacha20_avx512vl()+0x35: can't find jump dest instruction at .text+0x1f3f
/var/lib/dkms/wireguard/1.0.20210124/build/socket.c: In function ?send6?:
/var/lib/dkms/wireguard/1.0.20210124/build/socket.c:139:18: error: ?const struct ipv6_stub? has no member named ?ipv6_dst_lookup_flow?
?? dst = ipv6_stub->ipv6_dst_lookup_flow(sock_net(sock), sock, &fl,
????????????????? ^

make[1]: *** [/var/lib/dkms/wireguard/1.0.20210124/build/socket.o] Error 1
make[1]: *** Waiting for unfinished jobs....
make: *** [_module_/var/lib/dkms/wireguard/1.0.20210124/build] Error 2
make: Leaving directory `/usr/src/kernels/3.10.0-1127.18.2.vz7.163.46'

As far as I understand this comes from the fact that
"ipv6_dst_lookup_flo" is not available in some kernels but is in others
and obviously it is used wrong here.

So I fixed this using the following workaround:

export VERSION='1.0.20210124';
yum install -y wireguard-dkms;
echo '#define ipv6_dst_lookup_flow(a, b, c, d) ipv6_dst_lookup(a, b, &dst, c) + (void *)0 ?: dst' >> /usr/src/wireguard-$VERSION/compat/compat.h;
dkms build wireguard/$VERSION && dkms install wireguard/$VERSION;
modprobe wireguard

[17:43:17] root at test~ # dkms status
wireguard, 1.0.20210124, 3.10.0-1127.18.2.vz7.163.46, x86_64: installed

I think the culprit is in
https://git.zx2c4.com/wireguard-linux-compat/tree/src/compat/compat.h#n92
and following

These conditions do not match for the OpenVZ systems as they have a
3.10.0 kernel and are RHEL based. I suggest an additional check like

?(LINUX_VERSION_CODE = KERNEL_VERSION(3, 10, 0) && defined(ISRHEL7))

but I am not sure if this could interfere with older RHEL versions.

Feedback or good ideas very welcome :-)

Regards,
Beni


From Marc-Antoine at Perennou.com  Tue Feb  9 15:58:43 2021
From: Marc-Antoine at Perennou.com (Marc-Antoine Perennou)
Date: Tue,  9 Feb 2021 16:58:43 +0100
Subject: [PATCH] wg-quick: add syncconf
Message-ID: <20210209155843.2100191-1-Marc-Antoine@Perennou.com>

Simplifies the process to reload an updated configuration, avoid subshells

Signed-off-by: Marc-Antoine Perennou <Marc-Antoine at Perennou.com>
---
 src/man/wg-quick.8            |  4 ++++
 src/systemd/wg-quick at .service |  2 +-
 src/wg-quick/darwin.bash      | 13 ++++++++++++-
 src/wg-quick/freebsd.bash     | 11 ++++++++++-
 src/wg-quick/linux.bash       | 11 ++++++++++-
 src/wg-quick/openbsd.bash     | 11 ++++++++++-
 6 files changed, 47 insertions(+), 5 deletions(-)

diff --git a/src/man/wg-quick.8 b/src/man/wg-quick.8
index b84eb64..f52a3fe 100644
--- a/src/man/wg-quick.8
+++ b/src/man/wg-quick.8
@@ -256,6 +256,10 @@ sessions:
 
 \fB    # wg syncconf wgnet0 <(wg-quick strip wgnet0)\fP
 
+You can also use the \fIsyncconf\fP command for the same purpose
+
+\fB    # wg-quick syncconf wgnet0\fP
+
 .SH SEE ALSO
 .BR wg (8),
 .BR ip (8),
diff --git a/src/systemd/wg-quick at .service b/src/systemd/wg-quick at .service
index dbdab44..cb8b3a9 100644
--- a/src/systemd/wg-quick at .service
+++ b/src/systemd/wg-quick at .service
@@ -15,7 +15,7 @@ Type=oneshot
 RemainAfterExit=yes
 ExecStart=/usr/bin/wg-quick up %i
 ExecStop=/usr/bin/wg-quick down %i
-ExecReload=/bin/bash -c 'exec /usr/bin/wg syncconf %i <(exec /usr/bin/wg-quick strip %i)'
+ExecReload=/usr/bin/wg-quick syncconf %i
 Environment=WG_ENDPOINT_RESOLUTION_RETRIES=infinity
 
 [Install]
diff --git a/src/wg-quick/darwin.bash b/src/wg-quick/darwin.bash
index cde1b54..94c669e 100755
--- a/src/wg-quick/darwin.bash
+++ b/src/wg-quick/darwin.bash
@@ -418,7 +418,7 @@ execute_hooks() {
 
 cmd_usage() {
 	cat >&2 <<-_EOF
-	Usage: $PROGRAM [ up | down | save | strip ] [ CONFIG_FILE | INTERFACE ]
+	Usage: $PROGRAM [ up | down | save | strip | syncconf ] [ CONFIG_FILE | INTERFACE ]
 
 	  CONFIG_FILE is a configuration file, whose filename is the interface name
 	  followed by \`.conf'. Otherwise, INTERFACE is an interface name, with
@@ -489,6 +489,13 @@ cmd_strip() {
 	echo "$WG_CONFIG"
 }
 
+cmd_syncconf() {
+	if ! get_real_interface || [[ " $(wg show interfaces) " != *" $REAL_INTERFACE "* ]]; then
+		die "\`$INTERFACE' is not a WireGuard interface"
+	fi
+	cmd wg syncconf "$REAL_INTERFACE" <(echo "$WG_CONFIG")
+}
+
 # ~~ function override insertion point ~~
 
 if [[ $# -eq 1 && ( $1 == --help || $1 == -h || $1 == help ) ]]; then
@@ -510,6 +517,10 @@ elif [[ $# -eq 2 && $1 == strip ]]; then
 	auto_su
 	parse_options "$2"
 	cmd_strip
+elif [[ $# -eq 2 && $1 == syncconf ]]; then
+	auto_su
+	parse_options "$2"
+	cmd_syncconf
 else
 	cmd_usage
 	exit 1
diff --git a/src/wg-quick/freebsd.bash b/src/wg-quick/freebsd.bash
index e1ee67f..9415926 100755
--- a/src/wg-quick/freebsd.bash
+++ b/src/wg-quick/freebsd.bash
@@ -387,7 +387,7 @@ execute_hooks() {
 
 cmd_usage() {
 	cat >&2 <<-_EOF
-	Usage: $PROGRAM [ up | down | save | strip ] [ CONFIG_FILE | INTERFACE ]
+	Usage: $PROGRAM [ up | down | save | strip | syncconf ] [ CONFIG_FILE | INTERFACE ]
 
 	  CONFIG_FILE is a configuration file, whose filename is the interface name
 	  followed by \`.conf'. Otherwise, INTERFACE is an interface name, with
@@ -454,6 +454,11 @@ cmd_strip() {
 	echo "$WG_CONFIG"
 }
 
+cmd_syncconf() {
+	[[ " $(wg show interfaces) " == *" $INTERFACE "* ]] || die "\`$INTERFACE' is not a WireGuard interface"
+	cmd wg syncconf "$INTERFACE" <(echo "$WG_CONFIG")
+}
+
 # ~~ function override insertion point ~~
 
 make_temp
@@ -477,6 +482,10 @@ elif [[ $# -eq 2 && $1 == strip ]]; then
 	auto_su
 	parse_options "$2"
 	cmd_strip
+elif [[ $# -eq 2 && $1 == syncconf ]]; then
+	auto_su
+	parse_options "$2"
+	cmd_syncconf
 else
 	cmd_usage
 	exit 1
diff --git a/src/wg-quick/linux.bash b/src/wg-quick/linux.bash
index e4d4c4f..83ae4a8 100755
--- a/src/wg-quick/linux.bash
+++ b/src/wg-quick/linux.bash
@@ -298,7 +298,7 @@ execute_hooks() {
 
 cmd_usage() {
 	cat >&2 <<-_EOF
-	Usage: $PROGRAM [ up | down | save | strip ] [ CONFIG_FILE | INTERFACE ]
+	Usage: $PROGRAM [ up | down | save | strip | syncconf ] [ CONFIG_FILE | INTERFACE ]
 
 	  CONFIG_FILE is a configuration file, whose filename is the interface name
 	  followed by \`.conf'. Otherwise, INTERFACE is an interface name, with
@@ -361,6 +361,11 @@ cmd_strip() {
 	echo "$WG_CONFIG"
 }
 
+cmd_syncconf() {
+	[[ " $(wg show interfaces) " == *" $INTERFACE "* ]] || die "\`$INTERFACE' is not a WireGuard interface"
+	cmd wg syncconf "$INTERFACE" <(echo "$WG_CONFIG")
+}
+
 # ~~ function override insertion point ~~
 
 if [[ $# -eq 1 && ( $1 == --help || $1 == -h || $1 == help ) ]]; then
@@ -381,6 +386,10 @@ elif [[ $# -eq 2 && $1 == strip ]]; then
 	auto_su
 	parse_options "$2"
 	cmd_strip
+elif [[ $# -eq 2 && $1 == syncconf ]]; then
+	auto_su
+	parse_options "$2"
+	cmd_syncconf
 else
 	cmd_usage
 	exit 1
diff --git a/src/wg-quick/openbsd.bash b/src/wg-quick/openbsd.bash
index 15550c8..6d0efa8 100755
--- a/src/wg-quick/openbsd.bash
+++ b/src/wg-quick/openbsd.bash
@@ -376,7 +376,7 @@ execute_hooks() {
 
 cmd_usage() {
 	cat >&2 <<-_EOF
-	Usage: $PROGRAM [ up | down | save | strip ] [ CONFIG_FILE | INTERFACE ]
+	Usage: $PROGRAM [ up | down | save | strip | syncconf ] [ CONFIG_FILE | INTERFACE ]
 
 	  CONFIG_FILE is a configuration file, whose filename is the interface name
 	  followed by \`.conf'. Otherwise, INTERFACE is an interface name, with
@@ -441,6 +441,11 @@ cmd_strip() {
 	echo "$WG_CONFIG"
 }
 
+cmd_syncconf() {
+	get_real_interface || die "\`$INTERFACE' is not a WireGuard interface"
+	cmd wg syncconf "$REAL_INTERFACE" <(echo "$WG_CONFIG")
+}
+
 # ~~ function override insertion point ~~
 
 if [[ $# -eq 1 && ( $1 == --help || $1 == -h || $1 == help ) ]]; then
@@ -461,6 +466,10 @@ elif [[ $# -eq 2 && $1 == strip ]]; then
 	auto_su
 	parse_options "$2"
 	cmd_strip
+elif [[ $# -eq 2 && $1 == syncconf ]]; then
+	auto_su
+	parse_options "$2"
+	cmd_syncconf
 else
 	cmd_usage
 	exit 1
-- 
2.30.0


From stefan.kanthak at nexgo.de  Wed Feb 10 14:57:40 2021
From: stefan.kanthak at nexgo.de (Stefan Kanthak)
Date: Wed, 10 Feb 2021 15:57:40 +0100
Subject: [Wintun] DEPENDENTLOADFLAG for wintun.dll?
In-Reply-To: <CAHmME9o97YiTK6rjOmfE3G5BPaBzM_bdfiOBEGoBHbrDRzf9uA@mail.gmail.com>
References: <6c752624-1195-ec77-c16a-9fd438cb11ae@blackberry.com>
 <CAHmME9o97YiTK6rjOmfE3G5BPaBzM_bdfiOBEGoBHbrDRzf9uA@mail.gmail.com>
Message-ID: <25F2CB0BAE7B48F7B5250BA5BBF68749@H270>

"Jason A. Donenfeld" <Jason at zx2c4.com> wrote:

> Hi Brad,
> 
> On Wed, Feb 10, 2021 at 3:04 PM Brad Spencer <bspencer at blackberry.com> wrote:
>>
>> Would it make sense to link the official wintun.dll with the MSVC
>> linker's -DEPENDENTLOADFLAG:0x800 option?
>>
>> https://docs.microsoft.com/en-us/cpp/build/reference/dependentloadflag
>>
>> Doing so restricts the search path for immediate dependencies to the
>> %windows%\system32\ directory, and I think all of the DLLs Wintun needs
>> are there.

This flag is supported only on current versions of Windows 10.
Since Wireguard still supports Windows 7 and 8 you but need the "classic"
mitigation there, i.e. delay-loading and your own delay-loading routine, as
Jason writes below.

> That flag is a bit of a can of worms, which I haven't been too
> inclined to open. See:
> https://skanthak.homepage.t-online.de/snafu.html

This flag also doesn't help with exports forwarded to "unknown" DLLs,
neither with /DEPENDENTLOADFLAG:... nor with LoadLibraryEx(): see
https://skanthak.homepage.t-online.de/detour.html

> Instead, wintun.dll uses delay loading for all DLLs except for
> kernel32.dll and ntdll.dll, and then forces the delay loader hook
> through LoadLibraryEx. See:
> https://git.zx2c4.com/wintun/tree/api/entry.c#n25 You can see this in
> action by putting wintun.dll into depends:
> https://data.zx2c4.com/depends-for-wintun-dll-feb-2021.png

Stefan

From bspencer at blackberry.com  Wed Feb 10 17:52:58 2021
From: bspencer at blackberry.com (Brad Spencer)
Date: Wed, 10 Feb 2021 13:52:58 -0400
Subject: [Wintun] DEPENDENTLOADFLAG for wintun.dll?
In-Reply-To: <25F2CB0BAE7B48F7B5250BA5BBF68749@H270>
References: <6c752624-1195-ec77-c16a-9fd438cb11ae@blackberry.com>
 <CAHmME9o97YiTK6rjOmfE3G5BPaBzM_bdfiOBEGoBHbrDRzf9uA@mail.gmail.com>
 <25F2CB0BAE7B48F7B5250BA5BBF68749@H270>
Message-ID: <61250c12-c74c-e94b-bbe7-cf9aaccd031f@blackberry.com>

On 2021-02-10 10:57 a.m., Stefan Kanthak wrote:
> This flag is supported only on current versions of Windows 10.
> Since Wireguard still supports Windows 7 and 8 you but need the "classic"
> mitigation there, i.e. delay-loading and your own delay-loading routine, as
> Jason writes below.

Thanks.? I have actually read your pages previously, Stefan, but I 
neglected to dig in to how wintun.dll loads its dependencies already.? 
Thanks to you both for the comprehensive replies.

-- 

Brad Spencer


From iiordanov at gmail.com  Mon Feb  8 10:42:12 2021
From: iiordanov at gmail.com (i iordanov)
Date: Mon, 8 Feb 2021 05:42:12 -0500
Subject: Nested Wireguard tunnels not working on Android
Message-ID: <CAMS0tn3oqdty_DKoxkQF0jNFGfS5i+hhC20SBOChOuZAN67CAw@mail.gmail.com>

Hello,

In order to allow traffic to assist devices that cannot reach each
other directly, I am setting up wireguard tunnels through a server
with a public IP (40.30.40.30 in the example below).

For reasons of privacy, I'd like for the server to not be able to
decrypt my traffic. As a result, I would like for one encapsulating
Wireguard tunnel (subnet 10.1.2.0/24) to be peered through the server,
while a second nested Wireguard tunnel (subnet 10.1.3.0/24) to be
established through the first tunnel, peered only at the two devices
(Android and Linux in this case) that need to communicate.

An attempt was made to use a single Wireguard interface. Doing it this
way works between two Linux machines and even between Linux and Mac OS
X, but does not work between a Pixel 3a XL running Android 11 with the
GoBackend Wireguard implementation and my Linux laptop.

The config on the Android device, obtained with toWgQuickString():
======================================
    [Interface]
    Address = 10.1.2.5/24, 10.1.3.5/24
    ListenPort = 46847
    MTU = 1200
    PrivateKey = PRIVATE_KEY

    [Peer]
    AllowedIPs = 10.1.2.0/24
    Endpoint = 40.30.40.30:10000
    PersistentKeepalive = 3600
    PublicKey = VF5dic+a+6MllssbV+ShVwEBRrX9gr4do2iNylWrPGs=

    [Peer]
    AllowedIPs = 10.1.3.1/32
    Endpoint = 10.1.2.1:51555
    PersistentKeepalive = 3600
    PublicKey = 0Awdb451Z4+3Gezm7UlbRquC1kcF52r68J9wG1x/zUE=
======================================

The 10.1.2.0/24 subnet is the one that is "visible" to the public
server. The 10.1.3.0/24 subnet is the one that is private to the two
devices.

The devices can actually reach each other with netcat over UDP at
10.1.2.5:46847 and 10.1.2.1:51555 respectively. So the "encapsulating"
tunnel is working, and iperf3 were used to test it over UDP and TCP
successfully.

The "nested" tunnel does not get established.

The following permutations of the above config have the commented problems:

# Only 10.1.2.0/24 works, 10.1.3.0/24 does not.
    Address = 10.1.2.1/24, 10.1.3.1/24

# Only 10.1.2.0/24 works, 10.1.3.0/24, as expected, does not.
    Address = 10.1.2.1/24

# Neither network works
    Address = 10.1.3.1/24, 10.1.2.1/24

Suspecting routing, i ran ip route over adb, and obtained:
===================================
$ ip route show table 0 | grep 10.1
10.1.2.0/24 dev tun0 table 1548 proto static scope link
10.1.3.0/24 dev tun0 table 1548 proto static scope link
10.1.3.1 dev tun0 table 1548 proto static scope link
10.1.2.0/24 dev tun0 proto kernel scope link src 10.1.2.5
10.1.3.0/24 dev tun0 proto kernel scope link src 10.1.3.5
broadcast 10.1.2.0 dev tun0 table local proto kernel scope link src 10.1.2.5
local 10.1.2.5 dev tun0 table local proto kernel scope host src 10.1.2.5
broadcast 10.1.2.255 dev tun0 table local proto kernel scope link src 10.1.2.5
broadcast 10.1.3.0 dev tun0 table local proto kernel scope link src 10.1.3.5
local 10.1.3.5 dev tun0 table local proto kernel scope host src 10.1.3.5
broadcast 10.1.3.255 dev tun0 table local proto kernel scope link src 10.1.3.5
======================================

ip addr over adb shows:
======================================
    550: tun0: <POINTOPOINT,UP,LOWER_UP> mtu 1200 qdisc pfifo_fast
state UNKNOWN group default qlen 500
    link/none
    inet 10.1.2.5/24 scope global tun0
       valid_lft forever preferred_lft forever
    inet 10.1.3.5/24 scope global tun0:1
       valid_lft forever preferred_lft forever
======================================

On the Android logcat, the log appears to show handshakes exchanged:
======================================
peer(VF5d?rPGs) - Received handshake response
peer(VF5d?rPGs) - Sending keepalive packet
peer(0Awd?/zUE) - Received handshake initiation
peer(0Awd?/zUE) - Sending handshake response
======================================

The other device (not the public server) is a Linux box. Dmesg shows
======================================
[334831.125034] wireguard: LinuxWg: Handshake for peer 520
(10.1.2.5:46847) did not complete after 5 seconds, retrying (try 17)
[334831.125062] wireguard: LinuxWg: Sending handshake initiation to
peer 520 (10.1.2.5:46847)
======================================

wg showconf shows:
======================================
[Interface]
ListenPort = 51555
PrivateKey = PRIVATE_KEY

[Peer]
PublicKey = BOApHt2nj7Tvm/LAGpYB9/2KsZ8iYkWjfEUEUm7x6Q0=
AllowedIPs = 10.1.3.5/32
Endpoint = 10.1.2.5:46847
PersistentKeepalive = 25
======================================

wg show:
======================================
interface: LinuxWg
  public key: 0Awdb451Z4+3Gezm7UlbRquC1kcF52r68J9wG1x/zUE=
  private key: (hidden)
  listening port: 51555

peer: BOApHt2nj7Tvm/LAGpYB9/2KsZ8iYkWjfEUEUm7x6Q0=
  endpoint: 10.1.2.5:46847
  allowed ips: 10.1.3.5/32
  transfer: 0 B received, 37.00 KiB sent
  persistent keepalive: every 25 seconds

interface: LinuxWg2
  public key: Bb92MANIA5rzukELvNdTXMDWaBAi8+T8s7C+nnytRiE=
  private key: (hidden)
  listening port: 51556

peer: VF5dic+a+6MllssbV+ShVwEBRrX9gr4do2iNylWrPGs=
  endpoint: 40.30.40.30:10000
  allowed ips: 10.1.2.0/24
  latest handshake: 1 minute, 22 seconds ago
  transfer: 11.89 KiB received, 61.08 KiB sent
  persistent keepalive: every 25 seconds
======================================

Kernel:
======================================
Linux hostname 5.4.0-59-generic #65~18.04.1-Ubuntu SMP Mon Dec 14
15:59:40 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
======================================

Would you expect for this to work with GoBackend, or is there an
inherent limitation that would break it?

Any suggestions on what to do differently are welcome!

Thank you very much,
iordan


-- 
The conscious mind has only one thread of execution.

From birdman_hsu at icloud.com  Fri Feb 12 02:33:49 2021
From: birdman_hsu at icloud.com (=?utf-8?B?5b6Q5ZWf6ICA?=)
Date: Fri, 12 Feb 2021 10:33:49 +0800
Subject: Possible to have direct download link for WireGuard macOS client?
Message-ID: <49FB35F7-ABFB-4AE3-AEB2-EB7A0B76AC55@icloud.com>

As many users in China, there are some difficulties downloading WireGuard macOS client from Apple App Store, wondering if there are some direct links able to download it?

Thanks/Br, 

Birdman Hsu 

Sent from my iPad Pro 11!

From Jason at zx2c4.com  Fri Feb 12 16:50:36 2021
From: Jason at zx2c4.com (Jason A. Donenfeld)
Date: Fri, 12 Feb 2021 17:50:36 +0100
Subject: com.wireguard.android:tunnel moved to Maven Central
Message-ID: <CAHmME9pVX7AOFuPD8kZZ8nptFS7ZwL78POj-Ch1qHyCpLYJr8A@mail.gmail.com>

Hi,

Ahead of JCenter shutting down, com.wireguard.android:tunnel has moved
to Maven Central:
https://search.maven.org/artifact/com.wireguard.android/tunnel

If you previously had in your Gradle:

repositories {
    ...
    jcenter()
}

You can now replace that with:

repositories {
    ...
    mavenCentral()
    jcenter()
}

Or simply remove the jcenter() line if it's no longer needed by other
dependencies you might have.

More information about com.wireguard.android:tunnel is available at:
https://git.zx2c4.com/wireguard-android/about/#embedding

Information about integrating WireGuard into other platforms is available at:
https://www.wireguard.com/embedding/

Thanks,
Jason

From clint at openziti.org  Mon Feb 15 13:32:52 2021
From: clint at openziti.org (Clint Dovholuk)
Date: Mon, 15 Feb 2021 08:32:52 -0500
Subject: Wintun changelog?
Message-ID: <b17cf1b329499660a7f242671b8f8114@mail.gmail.com>

This is not a wireguard issue/question, this is one for wintun. Is there a
changelog for Wintun itself? We're using wintun and we're seeing some
strange behavior in windows around DHCP somehow not working. The individual
reports that turning our software off (which closes the wintun
session/removes the adapter) 'fixes the problem' but an ipconfig /all taken
before this action shows the user having no valid IPv4 address. They have
the 169.*.*.* assigned IP.  Our software doesn't/shouldn't interfere with
DHCP at all and I doubt wintun would but I did notice we used wintun 0.10.0
and 0.10.1 is out.

I was wondering if there's a changelog somewhere that I didn't find - I
checked all the regular places but just didn't locate it. Hopefully I didn't
miss it. I expect to read the commit log to determine the changes in
between.

Thanks

From Jason at zx2c4.com  Mon Feb 15 18:04:31 2021
From: Jason at zx2c4.com (Jason A. Donenfeld)
Date: Mon, 15 Feb 2021 19:04:31 +0100
Subject: Wintun changelog?
In-Reply-To: <b17cf1b329499660a7f242671b8f8114@mail.gmail.com>
References: <b17cf1b329499660a7f242671b8f8114@mail.gmail.com>
Message-ID: <CAHmME9qRbAhk3ms6azd3D2GxE347okmBoDyH3Z-QAUVfiUqsdQ@mail.gmail.com>

Changes are available in the git repo:

https://git.zx2c4.com/wintun/log/

Regarding DHCP -- Wintun is Layer 3, so DHCP doesn't go over it. I'm
not sure what the user in question is up to but it's hard to narrow
this down without extensive information.

From toke at toke.dk  Wed Feb 17 18:36:35 2021
From: toke at toke.dk (Toke =?utf-8?Q?H=C3=B8iland-J=C3=B8rgensen?=)
Date: Wed, 17 Feb 2021 19:36:35 +0100
Subject: [PATCH RFC v1] wireguard: queueing: get rid of per-peer ring
 buffers
In-Reply-To: <20210208133816.45333-1-Jason@zx2c4.com>
References: <20210208133816.45333-1-Jason@zx2c4.com>
Message-ID: <87czwymtho.fsf@toke.dk>

"Jason A. Donenfeld" <Jason at zx2c4.com> writes:

> Having two ring buffers per-peer means that every peer results in two
> massive ring allocations. On an 8-core x86_64 machine, this commit
> reduces the per-peer allocation from 18,688 bytes to 1,856 bytes, which
> is an 90% reduction. Ninety percent! With some single-machine
> deployments approaching 400,000 peers, we're talking about a reduction
> from 7 gigs of memory down to 700 megs of memory.
>
> In order to get rid of these per-peer allocations, this commit switches
> to using a list-based queueing approach. Currently GSO fragments are
> chained together using the skb->next pointer, so we form the per-peer
> queue around the unused skb->prev pointer, which makes sense because the
> links are pointing backwards.

"which makes sense because the links are pointing backwards" - huh?

> Multiple cores can write into the queue at any given time, because its
> writes occur in the start_xmit path or in the udp_recv path. But reads
> happen in a single workqueue item per-peer, amounting to a
> multi-producer, single-consumer paradigm.
>
> The MPSC queue is implemented locklessly and never blocks. However, it
> is not linearizable (though it is serializable), with a very tight and
> unlikely race on writes, which, when hit (about 0.15% of the time on a
> fully loaded 16-core x86_64 system), causes the queue reader to
> terminate early. However, because every packet sent queues up the same
> workqueue item after it is fully added, the queue resumes again, and
> stopping early isn't actually a problem, since at that point the packet
> wouldn't have yet been added to the encryption queue. These properties
> allow us to avoid disabling interrupts or spinning.

Wow, so this was a fascinating rabbit hole into the concurrent algorithm
realm, thanks to Dmitry's link to his original posting of the algorithm.
Maybe referencing the origin of the algorithm would be nice for context
and posterity (as well as commenting it so the original properties are
not lost if the source should disappear)?

> Performance-wise, ordinarily list-based queues aren't preferable to
> ringbuffers, because of cache misses when following pointers around.
> However, we *already* have to follow the adjacent pointers when working
> through fragments, so there shouldn't actually be any change there. A
> potential downside is that dequeueing is a bit more complicated, but the
> ptr_ring structure used prior had a spinlock when dequeueing, so all and
> all the difference appears to be a wash.
>
> Actually, from profiling, the biggest performance hit, by far, of this
> commit winds up being atomic_add_unless(count, 1, max) and atomic_
> dec(count), which account for the majority of CPU time, according to
> perf. In that sense, the previous ring buffer was superior in that it
> could check if it was full by head==tail, which the list-based approach
> cannot do.

Are these performance measurements are based on micro-benchmarks of the
queueing structure, or overall wireguard performance? Do you see any
measurable difference in the overall performance (i.e., throughput
drop)? And what about relative to using one of the existing skb queueing
primitives in the kernel? Including some actual numbers would be nice to
justify adding yet-another skb queueing scheme to the kernel :)

I say this also because the actual queueing of the packets has never
really shown up on any performance radar in the qdisc and mac80211
layers, which both use traditional spinlock-protected queueing
structures. Now Wireguard does have a somewhat unusual structure with
the MPSC pattern, so it may of course be different here. But quantifying
that would be good; also for figuring out if this algorithm might be
useful in other areas as well (and don't get me wrong, I'm fascinated by
it!).

> Cc: Dmitry Vyukov <dvyukov at google.com>
> Signed-off-by: Jason A. Donenfeld <Jason at zx2c4.com>
> ---
> Hoping to get some feedback here from people running massive deployments
> and running into ram issues, as well as Dmitry on the queueing semantics
> (the mpsc queue is his design), before I send this to Dave for merging.
> These changes are quite invasive, so I don't want to get anything wrong.
>
>  drivers/net/wireguard/device.c   | 12 ++---
>  drivers/net/wireguard/device.h   | 15 +++---
>  drivers/net/wireguard/peer.c     | 29 ++++-------
>  drivers/net/wireguard/peer.h     |  4 +-
>  drivers/net/wireguard/queueing.c | 82 +++++++++++++++++++++++++-------
>  drivers/net/wireguard/queueing.h | 45 +++++++++++++-----
>  drivers/net/wireguard/receive.c  | 16 +++----
>  drivers/net/wireguard/send.c     | 31 +++++-------
>  8 files changed, 141 insertions(+), 93 deletions(-)
>
> diff --git a/drivers/net/wireguard/device.c b/drivers/net/wireguard/device.c
> index cd51a2afa28e..d744199823b3 100644
> --- a/drivers/net/wireguard/device.c
> +++ b/drivers/net/wireguard/device.c
> @@ -234,8 +234,8 @@ static void wg_destruct(struct net_device *dev)
>  	destroy_workqueue(wg->handshake_receive_wq);
>  	destroy_workqueue(wg->handshake_send_wq);
>  	destroy_workqueue(wg->packet_crypt_wq);
> -	wg_packet_queue_free(&wg->decrypt_queue, true);
> -	wg_packet_queue_free(&wg->encrypt_queue, true);
> +	wg_packet_queue_free(&wg->decrypt_queue);
> +	wg_packet_queue_free(&wg->encrypt_queue);
>  	rcu_barrier(); /* Wait for all the peers to be actually freed. */
>  	wg_ratelimiter_uninit();
>  	memzero_explicit(&wg->static_identity, sizeof(wg->static_identity));
> @@ -337,12 +337,12 @@ static int wg_newlink(struct net *src_net, struct net_device *dev,
>  		goto err_destroy_handshake_send;
>  
>  	ret = wg_packet_queue_init(&wg->encrypt_queue, wg_packet_encrypt_worker,
> -				   true, MAX_QUEUED_PACKETS);
> +				   MAX_QUEUED_PACKETS);
>  	if (ret < 0)
>  		goto err_destroy_packet_crypt;
>  
>  	ret = wg_packet_queue_init(&wg->decrypt_queue, wg_packet_decrypt_worker,
> -				   true, MAX_QUEUED_PACKETS);
> +				   MAX_QUEUED_PACKETS);
>  	if (ret < 0)
>  		goto err_free_encrypt_queue;
>  
> @@ -367,9 +367,9 @@ static int wg_newlink(struct net *src_net, struct net_device *dev,
>  err_uninit_ratelimiter:
>  	wg_ratelimiter_uninit();
>  err_free_decrypt_queue:
> -	wg_packet_queue_free(&wg->decrypt_queue, true);
> +	wg_packet_queue_free(&wg->decrypt_queue);
>  err_free_encrypt_queue:
> -	wg_packet_queue_free(&wg->encrypt_queue, true);
> +	wg_packet_queue_free(&wg->encrypt_queue);
>  err_destroy_packet_crypt:
>  	destroy_workqueue(wg->packet_crypt_wq);
>  err_destroy_handshake_send:
> diff --git a/drivers/net/wireguard/device.h b/drivers/net/wireguard/device.h
> index 4d0144e16947..cb919f2ad1f8 100644
> --- a/drivers/net/wireguard/device.h
> +++ b/drivers/net/wireguard/device.h
> @@ -27,13 +27,14 @@ struct multicore_worker {
>  
>  struct crypt_queue {
>  	struct ptr_ring ring;
> -	union {
> -		struct {
> -			struct multicore_worker __percpu *worker;
> -			int last_cpu;
> -		};
> -		struct work_struct work;
> -	};
> +	struct multicore_worker __percpu *worker;
> +	int last_cpu;
> +};
> +
> +struct prev_queue {
> +	struct sk_buff *head, *tail, *peeked;
> +	struct { struct sk_buff *next, *prev; } empty;
> +	atomic_t count;
>  };
>  
>  struct wg_device {
> diff --git a/drivers/net/wireguard/peer.c b/drivers/net/wireguard/peer.c
> index b3b6370e6b95..1969fc22d47e 100644
> --- a/drivers/net/wireguard/peer.c
> +++ b/drivers/net/wireguard/peer.c
> @@ -32,27 +32,22 @@ struct wg_peer *wg_peer_create(struct wg_device *wg,
>  	peer = kzalloc(sizeof(*peer), GFP_KERNEL);
>  	if (unlikely(!peer))
>  		return ERR_PTR(ret);
> -	peer->device = wg;
> +	if (dst_cache_init(&peer->endpoint_cache, GFP_KERNEL))
> +		goto err;
>  
> +	peer->device = wg;
>  	wg_noise_handshake_init(&peer->handshake, &wg->static_identity,
>  				public_key, preshared_key, peer);
> -	if (dst_cache_init(&peer->endpoint_cache, GFP_KERNEL))
> -		goto err_1;
> -	if (wg_packet_queue_init(&peer->tx_queue, wg_packet_tx_worker, false,
> -				 MAX_QUEUED_PACKETS))
> -		goto err_2;
> -	if (wg_packet_queue_init(&peer->rx_queue, NULL, false,
> -				 MAX_QUEUED_PACKETS))
> -		goto err_3;
> -
>  	peer->internal_id = atomic64_inc_return(&peer_counter);
>  	peer->serial_work_cpu = nr_cpumask_bits;
>  	wg_cookie_init(&peer->latest_cookie);
>  	wg_timers_init(peer);
>  	wg_cookie_checker_precompute_peer_keys(peer);
>  	spin_lock_init(&peer->keypairs.keypair_update_lock);
> -	INIT_WORK(&peer->transmit_handshake_work,
> -		  wg_packet_handshake_send_worker);
> +	INIT_WORK(&peer->transmit_handshake_work, wg_packet_handshake_send_worker);
> +	INIT_WORK(&peer->transmit_packet_work, wg_packet_tx_worker);

It's not quite clear to me why changing the queue primitives requires
adding another work queue?

> +	wg_prev_queue_init(&peer->tx_queue);
> +	wg_prev_queue_init(&peer->rx_queue);
>  	rwlock_init(&peer->endpoint_lock);
>  	kref_init(&peer->refcount);
>  	skb_queue_head_init(&peer->staged_packet_queue);
> @@ -68,11 +63,7 @@ struct wg_peer *wg_peer_create(struct wg_device *wg,
>  	pr_debug("%s: Peer %llu created\n", wg->dev->name, peer->internal_id);
>  	return peer;
>  
> -err_3:
> -	wg_packet_queue_free(&peer->tx_queue, false);
> -err_2:
> -	dst_cache_destroy(&peer->endpoint_cache);
> -err_1:
> +err:
>  	kfree(peer);
>  	return ERR_PTR(ret);
>  }
> @@ -197,8 +188,8 @@ static void rcu_release(struct rcu_head *rcu)
>  	struct wg_peer *peer = container_of(rcu, struct wg_peer, rcu);
>  
>  	dst_cache_destroy(&peer->endpoint_cache);
> -	wg_packet_queue_free(&peer->rx_queue, false);
> -	wg_packet_queue_free(&peer->tx_queue, false);
> +	WARN_ON(wg_prev_queue_dequeue(&peer->tx_queue) || peer->tx_queue.peeked);
> +	WARN_ON(wg_prev_queue_dequeue(&peer->rx_queue) || peer->rx_queue.peeked);
>  
>  	/* The final zeroing takes care of clearing any remaining handshake key
>  	 * material and other potentially sensitive information.
> diff --git a/drivers/net/wireguard/peer.h b/drivers/net/wireguard/peer.h
> index aaff8de6e34b..8d53b687a1d1 100644
> --- a/drivers/net/wireguard/peer.h
> +++ b/drivers/net/wireguard/peer.h
> @@ -36,7 +36,7 @@ struct endpoint {
>  
>  struct wg_peer {
>  	struct wg_device *device;
> -	struct crypt_queue tx_queue, rx_queue;
> +	struct prev_queue tx_queue, rx_queue;
>  	struct sk_buff_head staged_packet_queue;
>  	int serial_work_cpu;
>  	bool is_dead;
> @@ -46,7 +46,7 @@ struct wg_peer {
>  	rwlock_t endpoint_lock;
>  	struct noise_handshake handshake;
>  	atomic64_t last_sent_handshake;
> -	struct work_struct transmit_handshake_work, clear_peer_work;
> +	struct work_struct transmit_handshake_work, clear_peer_work, transmit_packet_work;
>  	struct cookie latest_cookie;
>  	struct hlist_node pubkey_hash;
>  	u64 rx_bytes, tx_bytes;
> diff --git a/drivers/net/wireguard/queueing.c b/drivers/net/wireguard/queueing.c
> index 71b8e80b58e1..a72380ce97dd 100644
> --- a/drivers/net/wireguard/queueing.c
> +++ b/drivers/net/wireguard/queueing.c
> @@ -9,8 +9,7 @@ struct multicore_worker __percpu *
>  wg_packet_percpu_multicore_worker_alloc(work_func_t function, void *ptr)
>  {
>  	int cpu;
> -	struct multicore_worker __percpu *worker =
> -		alloc_percpu(struct multicore_worker);
> +	struct multicore_worker __percpu *worker = alloc_percpu(struct multicore_worker);
>  
>  	if (!worker)
>  		return NULL;
> @@ -23,7 +22,7 @@ wg_packet_percpu_multicore_worker_alloc(work_func_t function, void *ptr)
>  }
>  
>  int wg_packet_queue_init(struct crypt_queue *queue, work_func_t function,
> -			 bool multicore, unsigned int len)
> +			 unsigned int len)
>  {
>  	int ret;
>  
> @@ -31,25 +30,74 @@ int wg_packet_queue_init(struct crypt_queue *queue, work_func_t function,
>  	ret = ptr_ring_init(&queue->ring, len, GFP_KERNEL);
>  	if (ret)
>  		return ret;
> -	if (function) {
> -		if (multicore) {
> -			queue->worker = wg_packet_percpu_multicore_worker_alloc(
> -				function, queue);
> -			if (!queue->worker) {
> -				ptr_ring_cleanup(&queue->ring, NULL);
> -				return -ENOMEM;
> -			}
> -		} else {
> -			INIT_WORK(&queue->work, function);
> -		}
> +	queue->worker = wg_packet_percpu_multicore_worker_alloc(function, queue);
> +	if (!queue->worker) {
> +		ptr_ring_cleanup(&queue->ring, NULL);
> +		return -ENOMEM;
>  	}
>  	return 0;
>  }
>  
> -void wg_packet_queue_free(struct crypt_queue *queue, bool multicore)
> +void wg_packet_queue_free(struct crypt_queue *queue)
>  {
> -	if (multicore)
> -		free_percpu(queue->worker);
> +	free_percpu(queue->worker);
>  	WARN_ON(!__ptr_ring_empty(&queue->ring));
>  	ptr_ring_cleanup(&queue->ring, NULL);
>  }
> +

It would be nice to add a comment block here explaining the algorithm,
with a link to the original implementation and the same reasoning as you
have in the commit message. And some of the explanation from the
original thread would be nice. But if you do copy that, please for the
love of $DEITY, expand the acronyms - it took me half an hour of
extremely frustrating Googling to figure out what PDR means! :D

(I finally found out that it means "Partial copy-on-write Deferred
Reclamation" in another of Dmitry's replies here:
https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Non-blocking-data-structures-vs-garbage-collection/td-p/847215 )

> +#define NEXT(skb) ((skb)->prev)

In particular, please explain this oxymoronic define :)

> +#define STUB(queue) ((struct sk_buff *)&queue->empty)
> +
> +void wg_prev_queue_init(struct prev_queue *queue)
> +{
> +	NEXT(STUB(queue)) = NULL;
> +	queue->head = queue->tail = STUB(queue);
> +	queue->peeked = NULL;
> +	atomic_set(&queue->count, 0);
> +}
> +
> +static void __wg_prev_queue_enqueue(struct prev_queue *queue, struct sk_buff *skb)
> +{
> +	WRITE_ONCE(NEXT(skb), NULL);
> +	smp_wmb();
> +	WRITE_ONCE(NEXT(xchg_relaxed(&queue->head, skb)), skb);

While this is nice and compact it's also really hard to read. It's also
hiding the "race condition" between the xchg() and setting the next ptr.
So why not split it between two lines and make the race explicit with a
comment?

> +}
> +
> +bool wg_prev_queue_enqueue(struct prev_queue *queue, struct sk_buff *skb)
> +{
> +	if (!atomic_add_unless(&queue->count, 1, MAX_QUEUED_PACKETS))
> +		return false;
> +	__wg_prev_queue_enqueue(queue, skb);
> +	return true;
> +}
> +
> +struct sk_buff *wg_prev_queue_dequeue(struct prev_queue *queue)
> +{
> +	struct sk_buff *tail = queue->tail, *next = smp_load_acquire(&NEXT(tail));
> +
> +	if (tail == STUB(queue)) {
> +		if (!next)
> +			return NULL;
> +		queue->tail = next;
> +		tail = next;
> +		next = smp_load_acquire(&NEXT(next));
> +	}
> +	if (next) {
> +		queue->tail = next;
> +		atomic_dec(&queue->count);
> +		return tail;
> +	}
> +	if (tail != READ_ONCE(queue->head))
> +		return NULL;
> +	__wg_prev_queue_enqueue(queue, STUB(queue));
> +	next = smp_load_acquire(&NEXT(tail));
> +	if (next) {
> +		queue->tail = next;
> +		atomic_dec(&queue->count);
> +		return tail;
> +	}
> +	return NULL;
> +}

I don't see anywhere that you're clearing the next pointer (or prev, as
it were). Which means you'll likely end up passing packets up or down
the stack with that pointer still set, right? See this commit for a
previous instance where something like this has lead to issues:

22f6bbb7bcfc ("net: use skb_list_del_init() to remove from RX sublists")

> +#undef NEXT
> +#undef STUB
> diff --git a/drivers/net/wireguard/queueing.h b/drivers/net/wireguard/queueing.h
> index dfb674e03076..4ef2944a68bc 100644
> --- a/drivers/net/wireguard/queueing.h
> +++ b/drivers/net/wireguard/queueing.h
> @@ -17,12 +17,13 @@ struct wg_device;
>  struct wg_peer;
>  struct multicore_worker;
>  struct crypt_queue;
> +struct prev_queue;
>  struct sk_buff;
>  
>  /* queueing.c APIs: */
>  int wg_packet_queue_init(struct crypt_queue *queue, work_func_t function,
> -			 bool multicore, unsigned int len);
> -void wg_packet_queue_free(struct crypt_queue *queue, bool multicore);
> +			 unsigned int len);
> +void wg_packet_queue_free(struct crypt_queue *queue);
>  struct multicore_worker __percpu *
>  wg_packet_percpu_multicore_worker_alloc(work_func_t function, void *ptr);
>  
> @@ -135,8 +136,31 @@ static inline int wg_cpumask_next_online(int *next)
>  	return cpu;
>  }
>  
> +void wg_prev_queue_init(struct prev_queue *queue);
> +
> +/* Multi producer */
> +bool wg_prev_queue_enqueue(struct prev_queue *queue, struct sk_buff *skb);
> +
> +/* Single consumer */
> +struct sk_buff *wg_prev_queue_dequeue(struct prev_queue *queue);
> +
> +/* Single consumer */
> +static inline struct sk_buff *wg_prev_queue_peek(struct prev_queue *queue)
> +{
> +	if (queue->peeked)
> +		return queue->peeked;
> +	queue->peeked = wg_prev_queue_dequeue(queue);
> +	return queue->peeked;
> +}
> +
> +/* Single consumer */
> +static inline void wg_prev_queue_drop_peeked(struct prev_queue *queue)
> +{
> +	queue->peeked = NULL;
> +}
> +
>  static inline int wg_queue_enqueue_per_device_and_peer(
> -	struct crypt_queue *device_queue, struct crypt_queue *peer_queue,
> +	struct crypt_queue *device_queue, struct prev_queue *peer_queue,
>  	struct sk_buff *skb, struct workqueue_struct *wq, int *next_cpu)
>  {
>  	int cpu;
> @@ -145,8 +169,9 @@ static inline int wg_queue_enqueue_per_device_and_peer(
>  	/* We first queue this up for the peer ingestion, but the consumer
>  	 * will wait for the state to change to CRYPTED or DEAD before.
>  	 */
> -	if (unlikely(ptr_ring_produce_bh(&peer_queue->ring, skb)))
> +	if (unlikely(!wg_prev_queue_enqueue(peer_queue, skb)))
>  		return -ENOSPC;
> +
>  	/* Then we queue it up in the device queue, which consumes the
>  	 * packet as soon as it can.
>  	 */
> @@ -157,9 +182,7 @@ static inline int wg_queue_enqueue_per_device_and_peer(
>  	return 0;
>  }
>  
> -static inline void wg_queue_enqueue_per_peer(struct crypt_queue *queue,
> -					     struct sk_buff *skb,
> -					     enum packet_state state)
> +static inline void wg_queue_enqueue_per_peer_tx(struct sk_buff *skb, enum packet_state state)
>  {
>  	/* We take a reference, because as soon as we call atomic_set, the
>  	 * peer can be freed from below us.
> @@ -167,14 +190,12 @@ static inline void wg_queue_enqueue_per_peer(struct crypt_queue *queue,
>  	struct wg_peer *peer = wg_peer_get(PACKET_PEER(skb));
>  
>  	atomic_set_release(&PACKET_CB(skb)->state, state);
> -	queue_work_on(wg_cpumask_choose_online(&peer->serial_work_cpu,
> -					       peer->internal_id),
> -		      peer->device->packet_crypt_wq, &queue->work);
> +	queue_work_on(wg_cpumask_choose_online(&peer->serial_work_cpu, peer->internal_id),
> +		      peer->device->packet_crypt_wq, &peer->transmit_packet_work);
>  	wg_peer_put(peer);
>  }
>  
> -static inline void wg_queue_enqueue_per_peer_napi(struct sk_buff *skb,
> -						  enum packet_state state)
> +static inline void wg_queue_enqueue_per_peer_rx(struct sk_buff *skb, enum packet_state state)
>  {
>  	/* We take a reference, because as soon as we call atomic_set, the
>  	 * peer can be freed from below us.
> diff --git a/drivers/net/wireguard/receive.c b/drivers/net/wireguard/receive.c
> index 2c9551ea6dc7..7dc84bcca261 100644
> --- a/drivers/net/wireguard/receive.c
> +++ b/drivers/net/wireguard/receive.c
> @@ -444,7 +444,6 @@ static void wg_packet_consume_data_done(struct wg_peer *peer,
>  int wg_packet_rx_poll(struct napi_struct *napi, int budget)
>  {
>  	struct wg_peer *peer = container_of(napi, struct wg_peer, napi);
> -	struct crypt_queue *queue = &peer->rx_queue;
>  	struct noise_keypair *keypair;
>  	struct endpoint endpoint;
>  	enum packet_state state;
> @@ -455,11 +454,10 @@ int wg_packet_rx_poll(struct napi_struct *napi, int budget)
>  	if (unlikely(budget <= 0))
>  		return 0;
>  
> -	while ((skb = __ptr_ring_peek(&queue->ring)) != NULL &&
> +	while ((skb = wg_prev_queue_peek(&peer->rx_queue)) != NULL &&
>  	       (state = atomic_read_acquire(&PACKET_CB(skb)->state)) !=
>  		       PACKET_STATE_UNCRYPTED) {
> -		__ptr_ring_discard_one(&queue->ring);
> -		peer = PACKET_PEER(skb);
> +		wg_prev_queue_drop_peeked(&peer->rx_queue);
>  		keypair = PACKET_CB(skb)->keypair;
>  		free = true;
>  
> @@ -508,7 +506,7 @@ void wg_packet_decrypt_worker(struct work_struct *work)
>  		enum packet_state state =
>  			likely(decrypt_packet(skb, PACKET_CB(skb)->keypair)) ?
>  				PACKET_STATE_CRYPTED : PACKET_STATE_DEAD;
> -		wg_queue_enqueue_per_peer_napi(skb, state);
> +		wg_queue_enqueue_per_peer_rx(skb, state);
>  		if (need_resched())
>  			cond_resched();
>  	}
> @@ -531,12 +529,10 @@ static void wg_packet_consume_data(struct wg_device *wg, struct sk_buff *skb)
>  	if (unlikely(READ_ONCE(peer->is_dead)))
>  		goto err;
>  
> -	ret = wg_queue_enqueue_per_device_and_peer(&wg->decrypt_queue,
> -						   &peer->rx_queue, skb,
> -						   wg->packet_crypt_wq,
> -						   &wg->decrypt_queue.last_cpu);
> +	ret = wg_queue_enqueue_per_device_and_peer(&wg->decrypt_queue, &peer->rx_queue, skb,
> +						   wg->packet_crypt_wq, &wg->decrypt_queue.last_cpu);
>  	if (unlikely(ret == -EPIPE))
> -		wg_queue_enqueue_per_peer_napi(skb, PACKET_STATE_DEAD);
> +		wg_queue_enqueue_per_peer_rx(skb, PACKET_STATE_DEAD);
>  	if (likely(!ret || ret == -EPIPE)) {
>  		rcu_read_unlock_bh();
>  		return;
> diff --git a/drivers/net/wireguard/send.c b/drivers/net/wireguard/send.c
> index f74b9341ab0f..5368f7c35b4b 100644
> --- a/drivers/net/wireguard/send.c
> +++ b/drivers/net/wireguard/send.c
> @@ -239,8 +239,7 @@ void wg_packet_send_keepalive(struct wg_peer *peer)
>  	wg_packet_send_staged_packets(peer);
>  }
>  
> -static void wg_packet_create_data_done(struct sk_buff *first,
> -				       struct wg_peer *peer)
> +static void wg_packet_create_data_done(struct wg_peer *peer, struct sk_buff *first)
>  {
>  	struct sk_buff *skb, *next;
>  	bool is_keepalive, data_sent = false;
> @@ -262,22 +261,19 @@ static void wg_packet_create_data_done(struct sk_buff *first,
>  
>  void wg_packet_tx_worker(struct work_struct *work)
>  {
> -	struct crypt_queue *queue = container_of(work, struct crypt_queue,
> -						 work);
> +	struct wg_peer *peer = container_of(work, struct wg_peer, transmit_packet_work);
>  	struct noise_keypair *keypair;
>  	enum packet_state state;
>  	struct sk_buff *first;
> -	struct wg_peer *peer;
>  
> -	while ((first = __ptr_ring_peek(&queue->ring)) != NULL &&
> +	while ((first = wg_prev_queue_peek(&peer->tx_queue)) != NULL &&
>  	       (state = atomic_read_acquire(&PACKET_CB(first)->state)) !=
>  		       PACKET_STATE_UNCRYPTED) {
> -		__ptr_ring_discard_one(&queue->ring);
> -		peer = PACKET_PEER(first);
> +		wg_prev_queue_drop_peeked(&peer->tx_queue);
>  		keypair = PACKET_CB(first)->keypair;
>  
>  		if (likely(state == PACKET_STATE_CRYPTED))
> -			wg_packet_create_data_done(first, peer);
> +			wg_packet_create_data_done(peer, first);
>  		else
>  			kfree_skb_list(first);
>  
> @@ -306,16 +302,14 @@ void wg_packet_encrypt_worker(struct work_struct *work)
>  				break;
>  			}
>  		}
> -		wg_queue_enqueue_per_peer(&PACKET_PEER(first)->tx_queue, first,
> -					  state);
> +		wg_queue_enqueue_per_peer_tx(first, state);
>  		if (need_resched())
>  			cond_resched();
>  	}
>  }
>  
> -static void wg_packet_create_data(struct sk_buff *first)
> +static void wg_packet_create_data(struct wg_peer *peer, struct sk_buff *first)
>  {
> -	struct wg_peer *peer = PACKET_PEER(first);
>  	struct wg_device *wg = peer->device;
>  	int ret = -EINVAL;
>  
> @@ -323,13 +317,10 @@ static void wg_packet_create_data(struct sk_buff *first)
>  	if (unlikely(READ_ONCE(peer->is_dead)))
>  		goto err;
>  
> -	ret = wg_queue_enqueue_per_device_and_peer(&wg->encrypt_queue,
> -						   &peer->tx_queue, first,
> -						   wg->packet_crypt_wq,
> -						   &wg->encrypt_queue.last_cpu);
> +	ret = wg_queue_enqueue_per_device_and_peer(&wg->encrypt_queue, &peer->tx_queue, first,
> +						   wg->packet_crypt_wq, &wg->encrypt_queue.last_cpu);
>  	if (unlikely(ret == -EPIPE))
> -		wg_queue_enqueue_per_peer(&peer->tx_queue, first,
> -					  PACKET_STATE_DEAD);
> +		wg_queue_enqueue_per_peer_tx(first, PACKET_STATE_DEAD);
>  err:
>  	rcu_read_unlock_bh();
>  	if (likely(!ret || ret == -EPIPE))
> @@ -393,7 +384,7 @@ void wg_packet_send_staged_packets(struct wg_peer *peer)
>  	packets.prev->next = NULL;
>  	wg_peer_get(keypair->entry.peer);
>  	PACKET_CB(packets.next)->keypair = keypair;
> -	wg_packet_create_data(packets.next);
> +	wg_packet_create_data(peer, packets.next);
>  	return;
>  
>  out_invalid:
> -- 
> 2.30.0

From Jason at zx2c4.com  Wed Feb 17 22:28:43 2021
From: Jason at zx2c4.com (Jason A. Donenfeld)
Date: Wed, 17 Feb 2021 23:28:43 +0100
Subject: [PATCH RFC v1] wireguard: queueing: get rid of per-peer ring
 buffers
In-Reply-To: <87czwymtho.fsf@toke.dk>
References: <20210208133816.45333-1-Jason@zx2c4.com> <87czwymtho.fsf@toke.dk>
Message-ID: <CAHmME9q98j=vXMJ5gonSg_Eaot08bBP+zXCExDVzPcCmLDb4vQ@mail.gmail.com>

On Wed, Feb 17, 2021 at 7:36 PM Toke H?iland-J?rgensen <toke at toke.dk> wrote:
> Are these performance measurements are based on micro-benchmarks of the
> queueing structure, or overall wireguard performance? Do you see any
> measurable difference in the overall performance (i.e., throughput
> drop)?

These are from counting cycles per instruction using perf and seeing
which instructions are hotspots that take a greater or smaller
percentage of the overall time.

> And what about relative to using one of the existing skb queueing
> primitives in the kernel? Including some actual numbers would be nice to
> justify adding yet-another skb queueing scheme to the kernel :)

If you're referring to skb_queue_* and friends, those very much will
not work in any way, shape, or form here. Aside from the fact that the
MPSC nature of it is problematic for performance, those functions use
a doubly linked list. In wireguard's case, there is only one pointer
available (skb->prev), as skb->next is used to create the singly
linked skb_list (see skb_list_walk_safe) of gso frags. And in fact, by
having these two pointers next to each other for the separate lists,
it doesn't need to pull in another cache line. This isn't "yet-another
queueing scheme" in the kernel. This is just a singly linked list
queue.

> I say this also because the actual queueing of the packets has never
> really shown up on any performance radar in the qdisc and mac80211
> layers, which both use traditional spinlock-protected queueing
> structures.

Those are single threaded and the locks aren't really contended much.

> that would be good; also for figuring out if this algorithm might be
> useful in other areas as well (and don't get me wrong, I'm fascinated by
> it!).

If I find the motivation -- and if the mailing list conversations
don't become overly miserable -- I might try to fashion the queueing
mechanism into a general header-only data structure in include/linux/.
But that'd take a bit of work to see if there are actually places
where it matters and where it's useful. WireGuard can get away with it
because of its workqueue design, but other things probably aren't as
lucky like that. So I'm on the fence about generality.


> > -     if (wg_packet_queue_init(&peer->tx_queue, wg_packet_tx_worker, false,
> > -                              MAX_QUEUED_PACKETS))
> > -             goto err_2;
> > +     INIT_WORK(&peer->transmit_packet_work, wg_packet_tx_worker);
>
> It's not quite clear to me why changing the queue primitives requires
> adding another work queue?

It doesn't require a new workqueue. It's just that a workqueue was
init'd earlier in the call to "wg_packet_queue_init", which allocated
a ring buffer at the same time. We're not going through that
infrastructure anymore, but I still want the workqueue it used, so I
init it there instead. I truncated the diff in my quoted reply -- take
a look at that quote above and you'll see more clearly what I mean.

> > +#define NEXT(skb) ((skb)->prev)
>
> In particular, please explain this oxymoronic define :)

I can write more about that, sure. But it's what I wrote earlier in
this email -- the next pointer is taken; the prev one is free. So,
this uses the prev one.

> While this is nice and compact it's also really hard to read.

Actually I've already reworked that a bit in master to get the memory
barrier better.

> I don't see anywhere that you're clearing the next pointer (or prev, as
> it were). Which means you'll likely end up passing packets up or down
> the stack with that pointer still set, right? See this commit for a
> previous instance where something like this has lead to issues:
>
> 22f6bbb7bcfc ("net: use skb_list_del_init() to remove from RX sublists")

The prev pointer is never used for anything or initialized to NULL
anywhere. skb_mark_not_on_list concerns skb->next.

Thanks for the review.

Jason

From toke at toke.dk  Wed Feb 17 23:41:41 2021
From: toke at toke.dk (Toke =?utf-8?Q?H=C3=B8iland-J=C3=B8rgensen?=)
Date: Thu, 18 Feb 2021 00:41:41 +0100
Subject: [PATCH RFC v1] wireguard: queueing: get rid of per-peer ring
 buffers
In-Reply-To: <CAHmME9q98j=vXMJ5gonSg_Eaot08bBP+zXCExDVzPcCmLDb4vQ@mail.gmail.com>
References: <20210208133816.45333-1-Jason@zx2c4.com> <87czwymtho.fsf@toke.dk>
 <CAHmME9q98j=vXMJ5gonSg_Eaot08bBP+zXCExDVzPcCmLDb4vQ@mail.gmail.com>
Message-ID: <874kiamfd6.fsf@toke.dk>

"Jason A. Donenfeld" <Jason at zx2c4.com> writes:

> On Wed, Feb 17, 2021 at 7:36 PM Toke H?iland-J?rgensen <toke at toke.dk> wrote:
>> Are these performance measurements are based on micro-benchmarks of the
>> queueing structure, or overall wireguard performance? Do you see any
>> measurable difference in the overall performance (i.e., throughput
>> drop)?
>
> These are from counting cycles per instruction using perf and seeing
> which instructions are hotspots that take a greater or smaller
> percentage of the overall time.

Right. Would still love to see some actual numbers if you have them.
I.e., what kind of overhead is the queueing operations compared to the
rest of the wg data path, and how much of that is the hotspot
operations? Even better if you have a comparison with a spinlock
version, but I do realise that may be asking too much :)

>> And what about relative to using one of the existing skb queueing
>> primitives in the kernel? Including some actual numbers would be nice to
>> justify adding yet-another skb queueing scheme to the kernel :)
>
> If you're referring to skb_queue_* and friends, those very much will
> not work in any way, shape, or form here. Aside from the fact that the
> MPSC nature of it is problematic for performance, those functions use
> a doubly linked list. In wireguard's case, there is only one pointer
> available (skb->prev), as skb->next is used to create the singly
> linked skb_list (see skb_list_walk_safe) of gso frags. And in fact, by
> having these two pointers next to each other for the separate lists,
> it doesn't need to pull in another cache line. This isn't "yet-another
> queueing scheme" in the kernel. This is just a singly linked list
> queue.

Having this clearly articulated in the commit message would be good, and
could prevent others from pushing back against what really does appear
at first glance to be "yet-another queueing scheme"...

I.e., in the version you posted you go "the ring buffer is too much
memory, so here's a new linked-list queueing algoritm", skipping the
"and this is why we can't use any of the existing ones" in-between.

>> I say this also because the actual queueing of the packets has never
>> really shown up on any performance radar in the qdisc and mac80211
>> layers, which both use traditional spinlock-protected queueing
>> structures.
>
> Those are single threaded and the locks aren't really contended much.
>
>> that would be good; also for figuring out if this algorithm might be
>> useful in other areas as well (and don't get me wrong, I'm fascinated by
>> it!).
>
> If I find the motivation -- and if the mailing list conversations
> don't become overly miserable -- I might try to fashion the queueing
> mechanism into a general header-only data structure in include/linux/.
> But that'd take a bit of work to see if there are actually places
> where it matters and where it's useful. WireGuard can get away with it
> because of its workqueue design, but other things probably aren't as
> lucky like that. So I'm on the fence about generality.

Yeah, I can't think of any off the top of my head either. But I'll
definitely keep this in mind if I do run into any. If there's no obvious
contenders, IMO it would be fine to just keep it internal to wg until
such a use case shows up, and then generalise it at that time. Although
that does give it less visibility for other users, it also saves you
some potentially-redundant work :)

>> > -     if (wg_packet_queue_init(&peer->tx_queue, wg_packet_tx_worker, false,
>> > -                              MAX_QUEUED_PACKETS))
>> > -             goto err_2;
>> > +     INIT_WORK(&peer->transmit_packet_work, wg_packet_tx_worker);
>>
>> It's not quite clear to me why changing the queue primitives requires
>> adding another work queue?
>
> It doesn't require a new workqueue. It's just that a workqueue was
> init'd earlier in the call to "wg_packet_queue_init", which allocated
> a ring buffer at the same time. We're not going through that
> infrastructure anymore, but I still want the workqueue it used, so I
> init it there instead. I truncated the diff in my quoted reply -- take
> a look at that quote above and you'll see more clearly what I mean.

Ah, right, it's moving things from wg_packet_queue_init() - missed that.
Thanks!

>> > +#define NEXT(skb) ((skb)->prev)
>>
>> In particular, please explain this oxymoronic define :)
>
> I can write more about that, sure. But it's what I wrote earlier in
> this email -- the next pointer is taken; the prev one is free. So,
> this uses the prev one.

Yeah, I just meant to duplicate the explanation and references in
comments as well as the commit message, to save the people looking at
the code in the future some head scratching, and to make the origins
of the algorithm clear (credit where credit is due and all that).

>> While this is nice and compact it's also really hard to read.
>
> Actually I've already reworked that a bit in master to get the memory
> barrier better.

That version still hides the possible race inside a nested macro
expansion, though. Not doing your readers any favours.

>> I don't see anywhere that you're clearing the next pointer (or prev, as
>> it were). Which means you'll likely end up passing packets up or down
>> the stack with that pointer still set, right? See this commit for a
>> previous instance where something like this has lead to issues:
>>
>> 22f6bbb7bcfc ("net: use skb_list_del_init() to remove from RX sublists")
>
> The prev pointer is never used for anything or initialized to NULL
> anywhere. skb_mark_not_on_list concerns skb->next.

I was more concerned with stepping on the 'struct list_head' that shares
the space with the next and prev pointers, actually. But if you audited
that there are no other users of the pointer space at all, great! Please
do note this somewhere, though.

> Thanks for the review.

You're welcome - feel free to add my:

Reviewed-by: Toke H?iland-J?rgensen <toke at redhat.com>

-Toke

From bjorn at kernel.org  Thu Feb 18 13:49:52 2021
From: bjorn at kernel.org (=?UTF-8?B?QmrDtnJuIFTDtnBlbA==?=)
Date: Thu, 18 Feb 2021 14:49:52 +0100
Subject: [PATCH RFC v1] wireguard: queueing: get rid of per-peer ring
 buffers
In-Reply-To: <20210208133816.45333-1-Jason@zx2c4.com>
References: <20210208133816.45333-1-Jason@zx2c4.com>
Message-ID: <CAJ+HfNjNBUg9rvFtiuvNDP3KKmjGg50O+23c6KJvtGfJ2Qf+bA@mail.gmail.com>

On Mon, 8 Feb 2021 at 14:47, Jason A. Donenfeld <Jason at zx2c4.com> wrote:
>
> Having two ring buffers per-peer means that every peer results in two
> massive ring allocations. On an 8-core x86_64 machine, this commit
> reduces the per-peer allocation from 18,688 bytes to 1,856 bytes, which
> is an 90% reduction. Ninety percent! With some single-machine
> deployments approaching 400,000 peers, we're talking about a reduction
> from 7 gigs of memory down to 700 megs of memory.
>
> In order to get rid of these per-peer allocations, this commit switches
> to using a list-based queueing approach. Currently GSO fragments are
> chained together using the skb->next pointer, so we form the per-peer
> queue around the unused skb->prev pointer, which makes sense because the
> links are pointing backwards. Multiple cores can write into the queue at
> any given time, because its writes occur in the start_xmit path or in
> the udp_recv path. But reads happen in a single workqueue item per-peer,
> amounting to a multi-producer, single-consumer paradigm.
>
> The MPSC queue is implemented locklessly and never blocks. However, it
> is not linearizable (though it is serializable), with a very tight and
> unlikely race on writes, which, when hit (about 0.15% of the time on a
> fully loaded 16-core x86_64 system), causes the queue reader to
> terminate early. However, because every packet sent queues up the same
> workqueue item after it is fully added, the queue resumes again, and
> stopping early isn't actually a problem, since at that point the packet
> wouldn't have yet been added to the encryption queue. These properties
> allow us to avoid disabling interrupts or spinning.
>
> Performance-wise, ordinarily list-based queues aren't preferable to
> ringbuffers, because of cache misses when following pointers around.
> However, we *already* have to follow the adjacent pointers when working
> through fragments, so there shouldn't actually be any change there. A
> potential downside is that dequeueing is a bit more complicated, but the
> ptr_ring structure used prior had a spinlock when dequeueing, so all and
> all the difference appears to be a wash.
>
> Actually, from profiling, the biggest performance hit, by far, of this
> commit winds up being atomic_add_unless(count, 1, max) and atomic_
> dec(count), which account for the majority of CPU time, according to
> perf. In that sense, the previous ring buffer was superior in that it
> could check if it was full by head==tail, which the list-based approach
> cannot do.
>
> Cc: Dmitry Vyukov <dvyukov at google.com>
> Signed-off-by: Jason A. Donenfeld <Jason at zx2c4.com>
> ---
> Hoping to get some feedback here from people running massive deployments
> and running into ram issues, as well as Dmitry on the queueing semantics
> (the mpsc queue is his design), before I send this to Dave for merging.
> These changes are quite invasive, so I don't want to get anything wrong.
>

[...]

> diff --git a/drivers/net/wireguard/queueing.c b/drivers/net/wireguard/queueing.c
> index 71b8e80b58e1..a72380ce97dd 100644
> --- a/drivers/net/wireguard/queueing.c
> +++ b/drivers/net/wireguard/queueing.c

[...]

> +
> +static void __wg_prev_queue_enqueue(struct prev_queue *queue, struct sk_buff *skb)
> +{
> +       WRITE_ONCE(NEXT(skb), NULL);
> +       smp_wmb();
> +       WRITE_ONCE(NEXT(xchg_relaxed(&queue->head, skb)), skb);
> +}
> +

I'll chime in with Toke; This MPSC and Dmitry's links really took me
to the "verify with pen/paper"-level! Thanks!

I'd replace the smp_wmb()/_relaxed above with a xchg_release(), which
might perform better on some platforms. Also, it'll be a nicer pair
with the ldacq below. :-P In general, it would be nice with some
wording how the fences pair. It would help the readers (like me!) a
lot.


Cheers,
Bj?rn

[...]

From Jason at zx2c4.com  Thu Feb 18 13:53:20 2021
From: Jason at zx2c4.com (Jason A. Donenfeld)
Date: Thu, 18 Feb 2021 14:53:20 +0100
Subject: [PATCH RFC v1] wireguard: queueing: get rid of per-peer ring
 buffers
In-Reply-To: <CAJ+HfNjNBUg9rvFtiuvNDP3KKmjGg50O+23c6KJvtGfJ2Qf+bA@mail.gmail.com>
References: <20210208133816.45333-1-Jason@zx2c4.com>
 <CAJ+HfNjNBUg9rvFtiuvNDP3KKmjGg50O+23c6KJvtGfJ2Qf+bA@mail.gmail.com>
Message-ID: <CAHmME9r+c+e-ibBOwnFoz6Bnis4tBoeht63fqVdkKF-oTcEtMg@mail.gmail.com>

Hey Bjorn,

On Thu, Feb 18, 2021 at 2:50 PM Bj?rn T?pel <bjorn at kernel.org> wrote:
> > +
> > +static void __wg_prev_queue_enqueue(struct prev_queue *queue, struct sk_buff *skb)
> > +{
> > +       WRITE_ONCE(NEXT(skb), NULL);
> > +       smp_wmb();
> > +       WRITE_ONCE(NEXT(xchg_relaxed(&queue->head, skb)), skb);
> > +}
> > +
>
> I'll chime in with Toke; This MPSC and Dmitry's links really took me
> to the "verify with pen/paper"-level! Thanks!
>
> I'd replace the smp_wmb()/_relaxed above with a xchg_release(), which
> might perform better on some platforms. Also, it'll be a nicer pair
> with the ldacq below. :-P In general, it would be nice with some
> wording how the fences pair. It would help the readers (like me!) a
> lot.

Exactly. This is what's been in my dev tree for the last week or so:

+static void __wg_prev_queue_enqueue(struct prev_queue *queue, struct
sk_buff *skb)
+{
+       WRITE_ONCE(NEXT(skb), NULL);
+       WRITE_ONCE(NEXT(xchg_release(&queue->head, skb)), skb);
+}

Look good?

Jason

From bjorn at kernel.org  Thu Feb 18 14:04:29 2021
From: bjorn at kernel.org (=?UTF-8?B?QmrDtnJuIFTDtnBlbA==?=)
Date: Thu, 18 Feb 2021 15:04:29 +0100
Subject: [PATCH RFC v1] wireguard: queueing: get rid of per-peer ring
 buffers
In-Reply-To: <CAHmME9r+c+e-ibBOwnFoz6Bnis4tBoeht63fqVdkKF-oTcEtMg@mail.gmail.com>
References: <20210208133816.45333-1-Jason@zx2c4.com>
 <CAJ+HfNjNBUg9rvFtiuvNDP3KKmjGg50O+23c6KJvtGfJ2Qf+bA@mail.gmail.com>
 <CAHmME9r+c+e-ibBOwnFoz6Bnis4tBoeht63fqVdkKF-oTcEtMg@mail.gmail.com>
Message-ID: <CAJ+HfNi9kaU1JN36gavVX7mh6tbuoFgUiXM03U7C5gLw_eiprQ@mail.gmail.com>

On Thu, 18 Feb 2021 at 14:53, Jason A. Donenfeld <Jason at zx2c4.com> wrote:
>
> Hey Bjorn,
>
> On Thu, Feb 18, 2021 at 2:50 PM Bj?rn T?pel <bjorn at kernel.org> wrote:
> > > +
> > > +static void __wg_prev_queue_enqueue(struct prev_queue *queue, struct sk_buff *skb)
> > > +{
> > > +       WRITE_ONCE(NEXT(skb), NULL);
> > > +       smp_wmb();
> > > +       WRITE_ONCE(NEXT(xchg_relaxed(&queue->head, skb)), skb);
> > > +}
> > > +
> >
> > I'll chime in with Toke; This MPSC and Dmitry's links really took me
> > to the "verify with pen/paper"-level! Thanks!
> >
> > I'd replace the smp_wmb()/_relaxed above with a xchg_release(), which
> > might perform better on some platforms. Also, it'll be a nicer pair
> > with the ldacq below. :-P In general, it would be nice with some
> > wording how the fences pair. It would help the readers (like me!) a
> > lot.
>
> Exactly. This is what's been in my dev tree for the last week or so:
>

Ah, nice!

> +static void __wg_prev_queue_enqueue(struct prev_queue *queue, struct
> sk_buff *skb)
> +{
> +       WRITE_ONCE(NEXT(skb), NULL);
> +       WRITE_ONCE(NEXT(xchg_release(&queue->head, skb)), skb);
> +}
>
> Look good?
>

Yes, exactly like that!


Cheers,
Bj?rn

From Jason at zx2c4.com  Thu Feb 18 14:15:24 2021
From: Jason at zx2c4.com (Jason A. Donenfeld)
Date: Thu, 18 Feb 2021 15:15:24 +0100
Subject: [PATCH RFC v1] wireguard: queueing: get rid of per-peer ring
 buffers
In-Reply-To: <CAJ+HfNi9kaU1JN36gavVX7mh6tbuoFgUiXM03U7C5gLw_eiprQ@mail.gmail.com>
References: <20210208133816.45333-1-Jason@zx2c4.com>
 <CAJ+HfNjNBUg9rvFtiuvNDP3KKmjGg50O+23c6KJvtGfJ2Qf+bA@mail.gmail.com>
 <CAHmME9r+c+e-ibBOwnFoz6Bnis4tBoeht63fqVdkKF-oTcEtMg@mail.gmail.com>
 <CAJ+HfNi9kaU1JN36gavVX7mh6tbuoFgUiXM03U7C5gLw_eiprQ@mail.gmail.com>
Message-ID: <CAHmME9qf_EY8xTi81VPr9O9_95HpdjxobLtd1=C7fK2kRUxWiw@mail.gmail.com>

On Thu, Feb 18, 2021 at 3:04 PM Bj?rn T?pel <bjorn at kernel.org> wrote:
>
> On Thu, 18 Feb 2021 at 14:53, Jason A. Donenfeld <Jason at zx2c4.com> wrote:
> >
> > Hey Bjorn,
> >
> > On Thu, Feb 18, 2021 at 2:50 PM Bj?rn T?pel <bjorn at kernel.org> wrote:
> > > > +
> > > > +static void __wg_prev_queue_enqueue(struct prev_queue *queue, struct sk_buff *skb)
> > > > +{
> > > > +       WRITE_ONCE(NEXT(skb), NULL);
> > > > +       smp_wmb();
> > > > +       WRITE_ONCE(NEXT(xchg_relaxed(&queue->head, skb)), skb);
> > > > +}
> > > > +
> > >
> > > I'll chime in with Toke; This MPSC and Dmitry's links really took me
> > > to the "verify with pen/paper"-level! Thanks!
> > >
> > > I'd replace the smp_wmb()/_relaxed above with a xchg_release(), which
> > > might perform better on some platforms. Also, it'll be a nicer pair
> > > with the ldacq below. :-P In general, it would be nice with some
> > > wording how the fences pair. It would help the readers (like me!) a
> > > lot.
> >
> > Exactly. This is what's been in my dev tree for the last week or so:
> >
>
> Ah, nice!
>
> > +static void __wg_prev_queue_enqueue(struct prev_queue *queue, struct
> > sk_buff *skb)
> > +{
> > +       WRITE_ONCE(NEXT(skb), NULL);
> > +       WRITE_ONCE(NEXT(xchg_release(&queue->head, skb)), skb);
> > +}
> >
> > Look good?
> >
>
> Yes, exactly like that!

The downside is that on armv7, this becomes a dmb(ish) instead of a
dmb(ishst). But I was unable to measure any actual difference anyway,
and the atomic bounded increment is already more expensive, so I think
it's okay.

Jason

From bjorn at kernel.org  Thu Feb 18 15:12:49 2021
From: bjorn at kernel.org (=?UTF-8?B?QmrDtnJuIFTDtnBlbA==?=)
Date: Thu, 18 Feb 2021 16:12:49 +0100
Subject: [PATCH RFC v1] wireguard: queueing: get rid of per-peer ring
 buffers
In-Reply-To: <CAHmME9qf_EY8xTi81VPr9O9_95HpdjxobLtd1=C7fK2kRUxWiw@mail.gmail.com>
References: <20210208133816.45333-1-Jason@zx2c4.com>
 <CAJ+HfNjNBUg9rvFtiuvNDP3KKmjGg50O+23c6KJvtGfJ2Qf+bA@mail.gmail.com>
 <CAHmME9r+c+e-ibBOwnFoz6Bnis4tBoeht63fqVdkKF-oTcEtMg@mail.gmail.com>
 <CAJ+HfNi9kaU1JN36gavVX7mh6tbuoFgUiXM03U7C5gLw_eiprQ@mail.gmail.com>
 <CAHmME9qf_EY8xTi81VPr9O9_95HpdjxobLtd1=C7fK2kRUxWiw@mail.gmail.com>
Message-ID: <CAJ+HfNi2Bdi-xa+rsgxWkxFdykM1CKznGMdj=FoazEbNh1m4Dw@mail.gmail.com>

On Thu, 18 Feb 2021 at 15:15, Jason A. Donenfeld <Jason at zx2c4.com> wrote:
>

[...]

> >
> > > +static void __wg_prev_queue_enqueue(struct prev_queue *queue, struct
> > > sk_buff *skb)
> > > +{
> > > +       WRITE_ONCE(NEXT(skb), NULL);
> > > +       WRITE_ONCE(NEXT(xchg_release(&queue->head, skb)), skb);
> > > +}
> > >
> > > Look good?
> > >
> >
> > Yes, exactly like that!
>
> The downside is that on armv7, this becomes a dmb(ish) instead of a
> dmb(ishst). But I was unable to measure any actual difference anyway,
> and the atomic bounded increment is already more expensive, so I think
> it's okay.
>

Who cares about armv7!? The world is moving to Armv8/LSE, where we'll
end up with one fine "swpl" in this case, w/o any explicit (well...)
fence. ;-P

On a more serious note, it does make sense to base the decision on
benchmarks. OTOH I'd guess that the systems that mostly benefit from
this memory saving patch are x86_64, where the
smp_wmb()/xchg_relaxed() and xchg_release() are identical.


Bj?rn

From cascardo at canonical.com  Thu Feb 18 17:01:32 2021
From: cascardo at canonical.com (Thadeu Lima de Souza Cascardo)
Date: Thu, 18 Feb 2021 14:01:32 -0300
Subject: [PATCH] compat: skb_mark_not_on_list will be backported to Ubuntu
 18.04
Message-ID: <20210218170132.22917-1-cascardo@canonical.com>

linux commit 22f6bbb7bcfcef0b373b0502a7ff390275c575dd ("net: use
skb_list_del_init() to remove from RX sublists") will be backported to Ubuntu
18.04 default kernel, which is based on linux 4.15.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo at canonical.com>
---
 src/compat/compat.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/compat/compat.h b/src/compat/compat.h
index c1b77d99e817..78e942dec084 100644
--- a/src/compat/compat.h
+++ b/src/compat/compat.h
@@ -823,7 +823,7 @@ static __always_inline void old_rcu_barrier(void)
 #define COMPAT_CANNOT_DEPRECIATE_BH_RCU
 #endif
 
-#if (LINUX_VERSION_CODE < KERNEL_VERSION(4, 19, 10) && LINUX_VERSION_CODE >= KERNEL_VERSION(4, 15, 0) && !defined(ISRHEL8)) || LINUX_VERSION_CODE < KERNEL_VERSION(4, 14, 217)
+#if (LINUX_VERSION_CODE < KERNEL_VERSION(4, 19, 10) && LINUX_VERSION_CODE >= KERNEL_VERSION(4, 15, 0) && !defined(ISRHEL8) && !defined(ISUBUNTU1804)) || LINUX_VERSION_CODE < KERNEL_VERSION(4, 14, 217)
 static inline void skb_mark_not_on_list(struct sk_buff *skb)
 {
 	skb->next = NULL;
-- 
2.27.0


From Jason at zx2c4.com  Thu Feb 18 19:30:20 2021
From: Jason at zx2c4.com (Jason A. Donenfeld)
Date: Thu, 18 Feb 2021 20:30:20 +0100
Subject: [PATCH] compat: skb_mark_not_on_list will be backported to
 Ubuntu 18.04
In-Reply-To: <20210218170132.22917-1-cascardo@canonical.com>
References: <20210218170132.22917-1-cascardo@canonical.com>
Message-ID: <YC7ATI5bnRO0XHpA@zx2c4.com>

On Thu, Feb 18, 2021 at 02:01:32PM -0300, Thadeu Lima de Souza Cascardo wrote:
> linux commit 22f6bbb7bcfcef0b373b0502a7ff390275c575dd ("net: use
> skb_list_del_init() to remove from RX sublists") will be backported to Ubuntu
> 18.04 default kernel, which is based on linux 4.15.

Applied. Thanks for the patch.

https://git.zx2c4.com/wireguard-linux-compat/commit/?id=cad80597c7947f0def83caf8cb56aff0149c83a8

Jason

From prochazka.nicolas at gmail.com  Fri Feb 19 07:58:53 2021
From: prochazka.nicolas at gmail.com (nicolas prochazka)
Date: Fri, 19 Feb 2021 08:58:53 +0100
Subject: ipv6 multicast peer ?
Message-ID: <CADdae-jn684uFbFPgkWY47TsmOAnTfzLq7dVGrBxv8Vv8Gcg-Q@mail.gmail.com>

Hello,
On a "server side" I've for example these peers, and i want to send a
ipv6 multicast group
ff02::1
How can I do that with peer / allowed-ips routing ?

Regards
Nicolas

interface: wg0
  public key: **************
  private key: (hidden)
  listening port: 6081

peer: ************
  preshared key: (hidden)
  endpoint: x.x.130.134:6081
  allowed ips: fd00:0:222d:0:f64d:30ff:fe6e:222d/128
  latest handshake: 52 seconds ago
  transfer: 56.96 MiB received, 1.96 GiB sent

peer: **********
  preshared key: (hidden)
  endpoint: x.x.x.x:6081
  allowed ips: fd00::8e2:97ff:fe2e:3/128,
fd00:0:2836:0:1e69:7aff:fe01:2836/128,
fd00:0:3340:0:a00:27ff:fe5a:3340/128
  latest handshake: 1 minute, 54 seconds ago
  transfer: 513.17 MiB received, 6.27 GiB sent
  persistent keepalive: every 25 seconds

peer: *****
  preshared key: (hidden)
  endpoint: x.x.x.x:6081
  allowed ips: fd00::/32, fd00::8e2:97ff:fe2e:0/112,
fd00::8e2:97ff:fe2e:eeee/128
  latest handshake: 1 minute, 59 seconds ago
  transfer: 2.70 MiB received, 6.69 MiB sent
  persistent keepalive: every 25 seconds

peer: **************
  preshared key: (hidden)
  endpoint: x.x.100.142:6081
  allowed ips: fd00:0:ec58:0:b26e:bfff:fe1e:2d5a/128
  latest handshake: 2 minutes, 5 seconds ago
  transfer: 195.00 MiB received, 2.19 GiB sent

From Jason at zx2c4.com  Fri Feb 19 14:14:45 2021
From: Jason at zx2c4.com (Jason A. Donenfeld)
Date: Fri, 19 Feb 2021 15:14:45 +0100
Subject: [ANNOUNCE] wireguard-linux-compat v1.0.20210219 released
Message-ID: <mailman.1.1613744092.16265.wireguard@lists.zx2c4.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Hello,

A new version, v1.0.20210219, of the backported WireGuard kernel module for
3.10 <= Linux <= 5.5.y has been tagged in the git repository.

== Changes ==

  * compat: remove unused version.h headers
  * compat: redefine version constants for sublevel>=256
  * compat: skb_mark_not_on_list will be backported to Ubuntu 18.04
  * compat: zero out skb->cb before icmp
  
  Various compat fixes, most notable of which is that the 4.9.256 and 4.4.256
  kernels no longer cause integer wraparound problems. For more info:
  https://lwn.net/Articles/845120/ https://lwn.net/Articles/845207/
  
  * selftests: test multiple parallel streams
  * qemu: bump default kernel version
  
  Usual test harness improvements.
  
  * peer: put frequently used members above cache lines
  * device: do not generate ICMP for non-IP packets
  * queueing: get rid of per-peer ring buffers
  
  Most notable here is the queueing commit. Having two ring buffers per-peer
  means that every peer resulted in two massive ring allocations. On an 8-core
  x86_64 machine, this commit reduces the per-peer allocation from 18,688 bytes
  to 1,856 bytes, which is an 90% reduction. With some single-machine
  deployments approaching 500,000 peers, we're talking about a reduction from 7
  gigs of memory down to 700 megs of memory.

This release contains commits from: Jason A. Donenfeld and Thadeu Lima de 
Souza Cascardo.

As always, the source is available at https://git.zx2c4.com/wireguard-linux-compat/
and information about the project is available at https://www.wireguard.com/ .

This version is available in compressed tarball form here:
  https://git.zx2c4.com/wireguard-linux-compat/snapshot/wireguard-linux-compat-1.0.20210219.tar.xz
  SHA2-256: 99d35296b8d847a0d4db97a4dda96b464311a6354e75fe0bef6e7c4578690f00

A PGP signature of that file decompressed is available here:
  https://git.zx2c4.com/wireguard-linux-compat/snapshot/wireguard-linux-compat-1.0.20210219.tar.asc
  Signing key: AB9942E6D4A4CFC3412620A749FC7012A5DE03AE
  Remember to unxz the tarball before verifying the signature.

If you're a package maintainer, please bump your package version. If you're a
user, the WireGuard team welcomes any and all feedback on this latest version.

Finally, WireGuard development thrives on donations. By popular demand, we
have a webpage for this: https://www.wireguard.com/donations/

Thank you,
Jason Donenfeld


-----BEGIN PGP SIGNATURE-----

iQJEBAEBCAAuFiEEq5lC5tSkz8NBJiCnSfxwEqXeA64FAmAvx8IQHGphc29uQHp4
MmM0LmNvbQAKCRBJ/HASpd4DrplOD/9DhOh9/IcW0HtQ1dpY3oiCQQwoSfZNwBsy
84xOTMDs3+/OcTklLJabyryMOMbzOtR9sj0Dlp32PNsIxPEpCrmi4QfjmAT77SnS
+Om4QsQhlzxAAuEdA0ZVlbHdV9+9Lxa1ajn1yHnz0oC2iDWIrMjvascggBBcexX4
9qmJV/bsEjVlI3LYS7WrISeFW9MhEMt1eDkUgGV32UlLDMkHNvexg/fRFaEl5bJL
u95mmY28nqv4MgtP0m5RRcQgWlgp/W3fYBp+ThRvm2rMPV1EjH1ZHphZ9imH7ZUt
w+aXiQHbIzlV0jUKWIVGISsHqT1rHXGhTH0fxQSl8oaa3jNBPj/RDWU1uxGfMJDP
OY5DP5x9RkEjmv6KfZS3aIz2OXgDHOVa/2M9HTo+ye5SLSr0Og374LXAHVvHR+xK
yjkLi5yturusltjbo3iK/0LzUZ6QZt3gc6fzid0ljlg1+QJW332qQCtZAEmeSKzt
xVf8iAapl5ezwN6NZNxTSxuzlVDl0f1c9sgjAjGbVAphziQvrmK9n4Iz52oOyNre
bLe5Al/tUBysnT6yKglODJhr7jrhtOEoaoU5ROEcPswT6QmBUOW2EAMVoQ+AYyp9
vMGQ97Jvew1sNBYmfAUrO1l/Azpfi0Mj3nFtyOGx/mgVPkGqhntWpFol0aImaEPc
7SReRurgMA==
=wGOc
-----END PGP SIGNATURE-----

From Jason at zx2c4.com  Sat Feb 20 14:33:23 2021
From: Jason at zx2c4.com (Jason A. Donenfeld)
Date: Sat, 20 Feb 2021 15:33:23 +0100
Subject: Feature request: tag incoming packets
In-Reply-To: <cc0a6bc9-5450-c30b-b3e9-8bb248b8c07d@noris.de>
References: <mailman.0.1612101211.63283.wireguard@lists.zx2c4.com>
 <cc0a6bc9-5450-c30b-b3e9-8bb248b8c07d@noris.de>
Message-ID: <CAHmME9pQYws_=qohpKGNeL_=aCmrJ6QWtJ+MyFLQrUqHWAscrQ@mail.gmail.com>

There is no need for this and WireGuard was designed to avoid needing
something like this. The AllowedIPs binding gives you a mapping
between source IP and peer public key.

So, if you have on wg0:

PublicKey = ABCD
AllowedIPs = 192.168.33.99/32

Then you can safely have a netfilter rule that says:

iptables -A INPUT -i wg0 -s 192.168.33.99/32 -j ACCEPT

You only need to match two things: the wireguard interface and the
source IP. The strong binding to the public key is the primary
security property that WireGuard gives you via cryptokey routing.

From Jason at zx2c4.com  Tue Feb 23 18:34:40 2021
From: Jason at zx2c4.com (Jason A. Donenfeld)
Date: Tue, 23 Feb 2021 19:34:40 +0100
Subject: [ANNOUNCE] wireguard-tools v1.0.20210223 released 
Message-ID: <mailman.2.1614105290.16265.wireguard@lists.zx2c4.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Hello,

A new version, v1.0.20210223, of wireguard-tools has been tagged in the git
repository, containing various required userspace utilities, such as the
wg(8) and wg-quick(8) commands and documentation.

== Changes ==

  * wg-quick: android: do not free iterated pointer
  * wg-quick: openbsd: no use for userspace support
  * embeddable-wg-library: sync latest from netlink.h
  * wincompat: recent mingw has inet_ntop/inet_pton
  * wincompat: add resource and manifest and enable lto
  * wincompat: do not elevate by default
  * completion: add help and syncconf completions
  * sticky-sockets: do not use SO_REUSEADDR
  * man: LOG_LEVEL variables changed name
  * ipc: do not use fscanf with trailing \n
  * ipc: read trailing responses after set operation

This release contains commits from: Jason A. Donenfeld.

As always, the source is available at https://git.zx2c4.com/wireguard-tools/ and
information about the project is available at https://www.wireguard.com/ .

This release is available in compressed tarball form here:
  https://git.zx2c4.com/wireguard-tools/snapshot/wireguard-tools-1.0.20210223.tar.xz
  SHA2-256: 1f72da217044622d79e0bab57779e136a3df795e3761a3fc1dc0941a9055877c

A PGP signature of that file decompressed is available here:
  https://git.zx2c4.com/wireguard-tools/snapshot/wireguard-tools-1.0.20210223.tar.asc
  Signing key: AB9942E6D4A4CFC3412620A749FC7012A5DE03AE
  Remember to unxz the tarball before verifying the signature.

If you're a package maintainer, please bump your package version. If you're a
user, the WireGuard team welcomes any and all feedback on this latest version.

Finally, WireGuard development thrives on donations. By popular demand, we
have a webpage for this: https://www.wireguard.com/donations/

Thank you,
Jason Donenfeld


-----BEGIN PGP SIGNATURE-----

iQJEBAEBCAAuFiEEq5lC5tSkz8NBJiCnSfxwEqXeA64FAmA1SrkQHGphc29uQHp4
MmM0LmNvbQAKCRBJ/HASpd4DrhPKD/0WYx0cqotY+xmQWyoti03cYo0QPB5uhcSk
Y+DFJb2uSBAgrSGwFPojozsR7JqT8vTxtVloxM/HGBU1f0e8bC87rSdqu8XJUuJV
s9SCvFfTjKL72l6C4a7o53HGQbtge4qsY3O0qFuX/LTCanryHk3mX48nsKFnZyBC
AMOR3lQPwxolshp7brrdfPYQk32ZOMnhPJcozvHMw7opxgeUhkA7KQVPsndSB6fr
NZK5f+OW3C2Z/d8HFzEtrLrdGlhIelcDxFCye5X3sUagZYnBSnCbQU0g93t9YC9+
/FKKYJSoJ28qZiRJU1oTYRo/wcBBWazAOqMni7MT/cCV8sILba7zRem2/AySpTV1
nNWCcoGsLbeM5kpaYMHpdtL3MKUUOI9FlLX/kfULg3d1xfpLv+A0mzs0YieM0+/5
HcnGD9lBhTEE70kic5OduMlyScLDhvclM9bOdN3/npW4xYy47R3896bSU0SMLk6C
S/ccax8XTFYKaq9aeKbi1k77RTPF1ZEBvLa+PQktHfWbl0MSc6XmETBoI0MJr1tL
b77MPz5sl1WJC653Kx+hD+8qIimiebZvY0GAAUo9GolaHqS3dapPAcUD2LyN+EUF
c8yMig3lqV7Lmr/b+ZSIceJuepYPt3BjfiNhnoZl85WbYmxXH2xsILvmqsECwdVn
bIyIEdAG7g==
=9bPf
-----END PGP SIGNATURE-----

From lifeng1519 at gmail.com  Thu Feb 25 04:51:48 2021
From: lifeng1519 at gmail.com (Feng Li)
Date: Thu, 25 Feb 2021 12:51:48 +0800
Subject: Windows 10 has poor bandwidth when using wireguard
Message-ID: <CAEK8JBCS9gUXFRFpBS5pns_BZ_fKWHPV3+VrxtqEpt4kA0m1Vw@mail.gmail.com>

Hi,
The wireguard version 0.3.5.
In the LAN environment, the download speed is 40MB/s through WIFI to
access the peer.
When the wireguard starts, the speed is down to 8MB/s.

Two machines are in the same LAN.
Any suggestions?
Thanks.

From laura.zelenku at wandera.com  Thu Feb 25 09:50:14 2021
From: laura.zelenku at wandera.com (Laura Zelenku)
Date: Thu, 25 Feb 2021 10:50:14 +0100
Subject: Patch: initialise device.peers.empty
Message-ID: <F332C4E6-0559-4BE0-8AF6-3AD942C01FAC@wandera.com>

Hi devs,
in some custom unit test for wireguard go I?m experiencing failed tests because ?device.peers.empty? contains default ?false? value right after device creation. Please apply following patch to initialise the value to true (empty = true) in device creation.

thanks
Laura

Index: device/device.go
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- device/device.go	(revision 7a0fb5bbb1720fdd9404a4cf41920e24a46e0dad)
+++ device/device.go	(date 1614241756159)
@@ -292,6 +292,7 @@
 	}
 	device.tun.mtu = int32(mtu)
 	device.peers.keyMap = make(map[NoisePublicKey]*Peer)
+	device.peers.empty.Set(true)
 	device.rate.limiter.Init()
 	device.indexTable.Init()
 	device.PopulatePools()


-- 
*IMPORTANT NOTICE*: This email, its attachments and any rights attaching 
hereto are confidential and intended exclusively for the person to whom the 
email is addressed. If you are not the intended recipient, do not read, 
copy, disclose or use the contents in any way. Wandera accepts no liability 
for any loss, damage or consequence resulting directly or indirectly from 
the use of this email and attachments.

From Jason at zx2c4.com  Thu Feb 25 10:47:46 2021
From: Jason at zx2c4.com (Jason A. Donenfeld)
Date: Thu, 25 Feb 2021 11:47:46 +0100
Subject: Patch: initialise device.peers.empty
In-Reply-To: <F332C4E6-0559-4BE0-8AF6-3AD942C01FAC@wandera.com>
References: <F332C4E6-0559-4BE0-8AF6-3AD942C01FAC@wandera.com>
Message-ID: <CAHmME9qyD4ZDoT9dyb-cdQ319jjTHYXhhXbcdFphU8+pSKdWiw@mail.gmail.com>

Hi Laura,

Thanks for the patch. Can you resubmit this as a proper git-formatted
patch containing your Signed-off-by line?

git commit -s --amend --no-edit
git send-email HEAD~

Also, you mentioned custom unit tests. Any of those suitable for
sending upstream?

Jason

From Jason at zx2c4.com  Thu Feb 25 11:23:28 2021
From: Jason at zx2c4.com (Jason A. Donenfeld)
Date: Thu, 25 Feb 2021 12:23:28 +0100
Subject: Handshake state collision between parralel RoutineHandshake
 threads
In-Reply-To: <27D86318-AED9-49EC-94EE-1FFC806533DC@wandera.com>
References: <27D86318-AED9-49EC-94EE-1FFC806533DC@wandera.com>
Message-ID: <CAHmME9oOKvMqvSw_CrjhiWU+R60AEVxLYzoaC8W7F7D5vWgWvA@mail.gmail.com>

Hi Laura,

I'm not sure this is actually a problem. The latest handshake message
should probably win the race. I don't see state machine or data
corruption here, but just one handshake interrupting another, which is
par for the course with WireGuard.

Or have I overlooked something important in the state machine implementation?

Jason

From Jason at zx2c4.com  Thu Feb 25 11:29:55 2021
From: Jason at zx2c4.com (Jason A. Donenfeld)
Date: Thu, 25 Feb 2021 12:29:55 +0100
Subject: Patch: initialise device.peers.empty
In-Reply-To: <CAHmME9qyD4ZDoT9dyb-cdQ319jjTHYXhhXbcdFphU8+pSKdWiw@mail.gmail.com>
References: <F332C4E6-0559-4BE0-8AF6-3AD942C01FAC@wandera.com>
 <CAHmME9qyD4ZDoT9dyb-cdQ319jjTHYXhhXbcdFphU8+pSKdWiw@mail.gmail.com>
Message-ID: <CAHmME9rJo5BzJDJubvVr_+0ra=hVkYv9Cvd_p_+TBcrr92KstQ@mail.gmail.com>

Fixed differently here:
https://git.zx2c4.com/wireguard-go/commit/?id=355fed440bd066b8aa32e63e04c7f92e7a097d88

From rudiwillalwaysloveyou at gmail.com  Thu Feb 25 12:16:43 2021
From: rudiwillalwaysloveyou at gmail.com (Rudi C)
Date: Thu, 25 Feb 2021 15:46:43 +0330
Subject: How to tunnel only udp traffic through Wireguard?
In-Reply-To: <CAE9z9A0mNSoCfV2oYBojk=sAtzUmA8iknMM=keAChTEZrp8p4g@mail.gmail.com>
References: <CAE9z9A0mNSoCfV2oYBojk=sAtzUmA8iknMM=keAChTEZrp8p4g@mail.gmail.com>
Message-ID: <CAE9z9A2a+G5pcjPpvRVf5Rtb6znLOCQgvsnv01iua4xwONg9NQ@mail.gmail.com>

I use naiveproxy+v2ray to proxy my tcp traffic, but naiveproxy doesn?t
support udp, and it just passes them through my normal network. I want
to tunnel all my udp traffic through WireGuard. Is this achievable?
Thanks.

From Jason at zx2c4.com  Thu Feb 25 12:16:56 2021
From: Jason at zx2c4.com (Jason A. Donenfeld)
Date: Thu, 25 Feb 2021 13:16:56 +0100
Subject: Windows 10 has poor bandwidth when using wireguard
In-Reply-To: <CAEK8JBCS9gUXFRFpBS5pns_BZ_fKWHPV3+VrxtqEpt4kA0m1Vw@mail.gmail.com>
References: <CAEK8JBCS9gUXFRFpBS5pns_BZ_fKWHPV3+VrxtqEpt4kA0m1Vw@mail.gmail.com>
Message-ID: <CAHmME9oj6v43HtVHFh7fwUT4WO7bpDsAMq-9eLREWStoEJcJRA@mail.gmail.com>

Try out version 0.3.6, just released minutes ago, which should be much
much faster.

From Jason at zx2c4.com  Thu Feb 25 13:11:08 2021
From: Jason at zx2c4.com (Jason A. Donenfeld)
Date: Thu, 25 Feb 2021 14:11:08 +0100
Subject: [Android] couldn't find "libwg-go.so" on Nexus 5X
In-Reply-To: <etPan.5f91aee8.f1a4b3.1474c@ovpn.com>
References: <etPan.5f91aee8.f1a4b3.1474c@ovpn.com>
Message-ID: <CAHmME9pLahB7n_O1tHtBAuKhAtk1cn2g=3Ko614F+8U5R6ytpA@mail.gmail.com>

Hey David,

That's a pretty interesting bug... are you able to reproduce every
time on that device? I'd seen reports like this in the play console
but never with useful verbose error reporting like this. The fact that
it's giving you /lib/x86 on an arm device is madness...

Jason

From Jason at zx2c4.com  Thu Feb 25 14:35:45 2021
From: Jason at zx2c4.com (Jason A. Donenfeld)
Date: Thu, 25 Feb 2021 15:35:45 +0100
Subject: Windows 10 has poor bandwidth when using wireguard
In-Reply-To: <CAEK8JBCg94R_AgrS6ek7DEUjEiCGFE4M3utipCRDLj5VUQa0vA@mail.gmail.com>
References: <CAEK8JBCS9gUXFRFpBS5pns_BZ_fKWHPV3+VrxtqEpt4kA0m1Vw@mail.gmail.com>
 <CAHmME9oj6v43HtVHFh7fwUT4WO7bpDsAMq-9eLREWStoEJcJRA@mail.gmail.com>
 <CAEK8JBCg94R_AgrS6ek7DEUjEiCGFE4M3utipCRDLj5VUQa0vA@mail.gmail.com>
Message-ID: <CAHmME9rKdyBq-Ky1jwnUuE4vzbqES2H_J=QnnfB45ifQ0mjTGw@mail.gmail.com>

Hi Feng,

Great to hear!

You wrote:
> In the LAN environment, the download speed is 40MB/s through WIFI to access the peer.
> When the wireguard starts, the speed is down to 8MB/s.
And now:
> The speed is up to 25MB/s, 2x faster than the previous version.

So new speed is 3.125x old speed.

I'll keep thinking about the problem space and try to get us up to the
full 40MB/s.

Jason

From lifeng1519 at gmail.com  Thu Feb 25 14:37:51 2021
From: lifeng1519 at gmail.com (Feng Li)
Date: Thu, 25 Feb 2021 22:37:51 +0800
Subject: Windows 10 has poor bandwidth when using wireguard
In-Reply-To: <CAHmME9rKdyBq-Ky1jwnUuE4vzbqES2H_J=QnnfB45ifQ0mjTGw@mail.gmail.com>
References: <CAEK8JBCS9gUXFRFpBS5pns_BZ_fKWHPV3+VrxtqEpt4kA0m1Vw@mail.gmail.com>
 <CAHmME9oj6v43HtVHFh7fwUT4WO7bpDsAMq-9eLREWStoEJcJRA@mail.gmail.com>
 <CAEK8JBCg94R_AgrS6ek7DEUjEiCGFE4M3utipCRDLj5VUQa0vA@mail.gmail.com>
 <CAHmME9rKdyBq-Ky1jwnUuE4vzbqES2H_J=QnnfB45ifQ0mjTGw@mail.gmail.com>
Message-ID: <CAEK8JBD1xt8nK6UJySmUMtG4F9tt_Te4-dXMw8uoKzTxxAOOWg@mail.gmail.com>

Great, Amazing!
thanks!

On Thu, Feb 25, 2021 at 10:35 PM Jason A. Donenfeld <Jason at zx2c4.com> wrote:
>
> Hi Feng,
>
> Great to hear!
>
> You wrote:
> > In the LAN environment, the download speed is 40MB/s through WIFI to access the peer.
> > When the wireguard starts, the speed is down to 8MB/s.
> And now:
> > The speed is up to 25MB/s, 2x faster than the previous version.
>
> So new speed is 3.125x old speed.
>
> I'll keep thinking about the problem space and try to get us up to the
> full 40MB/s.
>
> Jason

From s.devanath at gmail.com  Thu Feb 25 06:30:47 2021
From: s.devanath at gmail.com (Devanath S)
Date: Wed, 24 Feb 2021 22:30:47 -0800
Subject: wireguard-go on windows
Message-ID: <CADjMg7Vp+p7n_JLMfFU4GwtNu_X3ARFJOuhxLasonuD-fvB5Hg@mail.gmail.com>

Hi All,

I am trying to run wireguard-go on windows for debugging purpose only
and seem to get the below error.

Login user is local admin on the box and it is run as administrator. Plz advice.

c:\Go\wire-win\wireguard-go>.\wireguard.exe wg0
Warning: this is a test program for Windows, mainly used for debugging
this Go package. For a real WireGuard for Windows client, the repo you
want is <https://git.zx2c4.com/wireguard-windows/>, which includes
this code as a module.
INFO: (wg0) 2021/02/24 22:09:55 Starting wireguard-go version 0.0.20201118
DEBUG: (wg0) 2021/02/24 22:09:55 Debug log enabled
2021/02/24 22:09:55 [Wintun] CreateAdapter: Creating adapter
DEBUG: (wg0) 2021/02/24 22:09:56 UDP bind has been updated
INFO: (wg0) 2021/02/24 22:09:56 Device started
ERROR: (wg0) 2021/02/24 22:09:56 Failed to listen on uapi socket: open
\\.\pipe\ProtectedPrefix\Administrators\WireGuard\wg0: This security
ID may not be assigned as the owner of this object.


Regards,
srini

From Jason at zx2c4.com  Thu Feb 25 15:53:56 2021
From: Jason at zx2c4.com (Jason A. Donenfeld)
Date: Thu, 25 Feb 2021 16:53:56 +0100
Subject: wireguard-go on windows
In-Reply-To: <CADjMg7Vp+p7n_JLMfFU4GwtNu_X3ARFJOuhxLasonuD-fvB5Hg@mail.gmail.com>
References: <CADjMg7Vp+p7n_JLMfFU4GwtNu_X3ARFJOuhxLasonuD-fvB5Hg@mail.gmail.com>
Message-ID: <CAHmME9o9YNUKawN8bYArdfbDF9cZF+1nSTZuMef5RRVeaPTrXQ@mail.gmail.com>

I'm curious to learn what you're trying to debug this way; you're
better off using wireguard-windows.

The pipe permissions are too strict internally, it appears. Try
running as Local System.

Jason

From s.devanath at gmail.com  Thu Feb 25 16:42:58 2021
From: s.devanath at gmail.com (Devanath S)
Date: Thu, 25 Feb 2021 08:42:58 -0800
Subject: Fwd: wireguard-go on windows
In-Reply-To: <CADjMg7VHCX04+UgCY4AwLub5=otDJ9dTHMLdX16OVDZoDG88xQ@mail.gmail.com>
References: <CADjMg7Vp+p7n_JLMfFU4GwtNu_X3ARFJOuhxLasonuD-fvB5Hg@mail.gmail.com>
 <CAHmME9o9YNUKawN8bYArdfbDF9cZF+1nSTZuMef5RRVeaPTrXQ@mail.gmail.com>
 <CADjMg7VHCX04+UgCY4AwLub5=otDJ9dTHMLdX16OVDZoDG88xQ@mail.gmail.com>
Message-ID: <CADjMg7WuZ-W0bRPM9oCXT6CnVSdYK=fMRANna7XimtEkerOerw@mail.gmail.com>

Hi Jason,

Thank you for your prompt response.

We are trying to use wgctrl way of configuring the wireguard devices
and facing issues while creating/configuring the wireguard device on
windows.

1) First problem was while creating the wintun device using wintun.dll
and using wgctrl for configuring it.  It hangs in
wgclient.ConfigureDevice api()

2) So tried to first create the device through wireguard.exe. And then
used wgctrl way to configure it, but wgClient.Devices() is not able to
get the devices on our test windows boxes (even though it works on my
development machine)

So was trying to investigate how wireguard works on windows.  With
wgctrl package I was able to get it working on linux/mac, but facing
such issues on windows.  The reason for using wgctrl was to make it
configurable through our own APP.

Regard,
Dev

On Thu, Feb 25, 2021 at 7:54 AM Jason A. Donenfeld <Jason at zx2c4.com> wrote:
>
> I'm curious to learn what you're trying to debug this way; you're
> better off using wireguard-windows.
>
> The pipe permissions are too strict internally, it appears. Try
> running as Local System.
>
> Jason

From miclman.0x0efbd3 at gmail.com  Thu Feb 25 16:56:50 2021
From: miclman.0x0efbd3 at gmail.com (Michael Lennartz)
Date: Thu, 25 Feb 2021 17:56:50 +0100
Subject: Wireguard on Mac not working through a corporate VPN
Message-ID: <80702DEB-2D7D-4B4C-A268-3E4C8FCB746C@gmail.com>

Hi team,

Since a while already we?re testing Wireguard in our environment and I think, it?s a great project.
The focus is currently on Mac clients, where we?ve used the CLI version from homebrew so far very successfully.

It?s important to note, the we?re reaching the server peer via another (Corporate) VPN interface.


Recently we?ve updated to MacOS 11.2 (Big Sur) on the M1 architecture and the (most recent) CLI version of Wireguard stopped working:
When I now try to connect to the server peer, the "wg-quick up ?? hangs at the first ?wg set utun3 peer ?? command.

Then we try to use the GUI version from the AppStore, which seems to establish the tunnel interface and routing correctly. But we can?t see any traffic passing the corporate VPN interface towards the server peer. Even not the initial handshake.


Do you have some hints, if this setup is supposed to be working ? Or any suggestion where to look at ?


Br,
Michael


From Jason at zx2c4.com  Thu Feb 25 17:54:18 2021
From: Jason at zx2c4.com (Jason A. Donenfeld)
Date: Thu, 25 Feb 2021 18:54:18 +0100
Subject: wireguard-go on windows
In-Reply-To: <CADjMg7WuZ-W0bRPM9oCXT6CnVSdYK=fMRANna7XimtEkerOerw@mail.gmail.com>
References: <CADjMg7Vp+p7n_JLMfFU4GwtNu_X3ARFJOuhxLasonuD-fvB5Hg@mail.gmail.com>
 <CAHmME9o9YNUKawN8bYArdfbDF9cZF+1nSTZuMef5RRVeaPTrXQ@mail.gmail.com>
 <CADjMg7VHCX04+UgCY4AwLub5=otDJ9dTHMLdX16OVDZoDG88xQ@mail.gmail.com>
 <CADjMg7WuZ-W0bRPM9oCXT6CnVSdYK=fMRANna7XimtEkerOerw@mail.gmail.com>
Message-ID: <CAHmME9qNVFuf170GEObu1_d=F=snG94qE1Y+KWZJ5DEeL_14Vg@mail.gmail.com>

+ Matt Layher

Hi Davanath,

> We are trying to use wgctrl way of configuring the wireguard devices
> and facing issues while creating/configuring the wireguard device on
> windows.
>
> 1) First problem was while creating the wintun device using wintun.dll
> and using wgctrl for configuring it.  It hangs in
> wgclient.ConfigureDevice api()

wgctrl works with wireguard. wireguard uses wintun, but wireguard is not wintun.

>
> 2) So tried to first create the device through wireguard.exe. And then
> used wgctrl way to configure it, but wgClient.Devices() is not able to
> get the devices on our test windows boxes (even though it works on my
> development machine)

This sounds like a potential bug in wgctrl.

Matt -- I wonder if there's a bug in the parser, recently unearthed by
a change in wireguard-go. Specifically, uapi stipulates that requests
and responses end with \n\n. Is it possible that you're relying on the
socket to EOF, instead of looking for the \n\n? Recent wireguard-go
keeps the socket open, in case you want to send one request after
another.

Jason

From mdlayher at gmail.com  Thu Feb 25 20:14:58 2021
From: mdlayher at gmail.com (Matt Layher)
Date: Thu, 25 Feb 2021 15:14:58 -0500
Subject: wireguard-go on windows
In-Reply-To: <CAHmME9qNVFuf170GEObu1_d=F=snG94qE1Y+KWZJ5DEeL_14Vg@mail.gmail.com>
References: <CADjMg7Vp+p7n_JLMfFU4GwtNu_X3ARFJOuhxLasonuD-fvB5Hg@mail.gmail.com>
 <CAHmME9o9YNUKawN8bYArdfbDF9cZF+1nSTZuMef5RRVeaPTrXQ@mail.gmail.com>
 <CADjMg7VHCX04+UgCY4AwLub5=otDJ9dTHMLdX16OVDZoDG88xQ@mail.gmail.com>
 <CADjMg7WuZ-W0bRPM9oCXT6CnVSdYK=fMRANna7XimtEkerOerw@mail.gmail.com>
 <CAHmME9qNVFuf170GEObu1_d=F=snG94qE1Y+KWZJ5DEeL_14Vg@mail.gmail.com>
Message-ID: <f9d3c90b-57d7-a0fb-2c28-9a5ba8b8e103@gmail.com>

A glance at 
https://github.com/WireGuard/wgctrl-go/blob/master/internal/wguser/parse.go#L48 
seems to indicate that we treat the first "blank" line produced by 
bufio.Scanner (which strips \n) as a sentinel to stop parsing, which 
would mean something like "errno=0\n\n" would parse the errno and be 
done once it interprets the final line "\n".

The tests seem to indicate this works as expected, but I don't regularly 
develop on Windows and welcome PRs if something has changed.
- Matt

On 2/25/21 12:54 PM, Jason A. Donenfeld wrote:
> + Matt Layher
>
> Hi Davanath,
>
>> We are trying to use wgctrl way of configuring the wireguard devices
>> and facing issues while creating/configuring the wireguard device on
>> windows.
>>
>> 1) First problem was while creating the wintun device using wintun.dll
>> and using wgctrl for configuring it.  It hangs in
>> wgclient.ConfigureDevice api()
> wgctrl works with wireguard. wireguard uses wintun, but wireguard is not wintun.
>
>> 2) So tried to first create the device through wireguard.exe. And then
>> used wgctrl way to configure it, but wgClient.Devices() is not able to
>> get the devices on our test windows boxes (even though it works on my
>> development machine)
> This sounds like a potential bug in wgctrl.
>
> Matt -- I wonder if there's a bug in the parser, recently unearthed by
> a change in wireguard-go. Specifically, uapi stipulates that requests
> and responses end with \n\n. Is it possible that you're relying on the
> socket to EOF, instead of looking for the \n\n? Recent wireguard-go
> keeps the socket open, in case you want to send one request after
> another.
>
> Jason

From clint at openziti.org  Thu Feb 25 20:41:03 2021
From: clint at openziti.org (Clint Dovholuk)
Date: Thu, 25 Feb 2021 15:41:03 -0500
Subject: Wintun releases notification - or 'latest' option?
Message-ID: <a477c3a212d0566b76618848651827bd@mail.gmail.com>

Any chance you'd consider sending an email to the list when wintun gets an
update?  Right now there's no great way for me to get the 'latest'. The code
changes infrequently so it's not a huge deal but it'd be fantastic to be
notified when a new release is out.

Alternatively some form of redirect from say
https://www.wintun.net/builds/latest.zip -->
https://www.wintun.net/builds/wintun-0.10.2.zip would be really nice?

Something to consider, maybe? :)

Thanks
-Clint

From wireguard at lindenberg.one  Thu Feb 25 21:17:06 2021
From: wireguard at lindenberg.one (Joachim Lindenberg)
Date: Thu, 25 Feb 2021 22:17:06 +0100
Subject: best way for redundancy?
Message-ID: <03ff01d70bbb$953d4750$bfb7d5f0$@lindenberg.one>

Hello
I do have a wireguard VPN that connects multiple sites. Unfortunately some routers are not available all the time, causing network disruption. I?d like to improve connectivity via redundancy, i.e. add multiple routers that connect the networks.
What are the options to do that using wireguard? Can I have multiple peers with different keys and endpoint but same Allowed IPs? Will wireguard select the one available?
Any suggestions?
Thanks, Joachim


From s.devanath at gmail.com  Thu Feb 25 20:21:52 2021
From: s.devanath at gmail.com (Devanath S)
Date: Thu, 25 Feb 2021 12:21:52 -0800
Subject: wireguard-go on windows
In-Reply-To: <f9d3c90b-57d7-a0fb-2c28-9a5ba8b8e103@gmail.com>
References: <CADjMg7Vp+p7n_JLMfFU4GwtNu_X3ARFJOuhxLasonuD-fvB5Hg@mail.gmail.com>
 <CAHmME9o9YNUKawN8bYArdfbDF9cZF+1nSTZuMef5RRVeaPTrXQ@mail.gmail.com>
 <CADjMg7VHCX04+UgCY4AwLub5=otDJ9dTHMLdX16OVDZoDG88xQ@mail.gmail.com>
 <CADjMg7WuZ-W0bRPM9oCXT6CnVSdYK=fMRANna7XimtEkerOerw@mail.gmail.com>
 <CAHmME9qNVFuf170GEObu1_d=F=snG94qE1Y+KWZJ5DEeL_14Vg@mail.gmail.com>
 <f9d3c90b-57d7-a0fb-2c28-9a5ba8b8e103@gmail.com>
Message-ID: <CADjMg7UzeURgQPd9+XDsOLAOQZJa_7Nsz6UKgyXWHBzTC4E+eA@mail.gmail.com>

Hi  Jason/Matt,

I could try running any debug binaries or debug patches, that you want
to run to troubleshoot the issue. Plz, advice.

Regards,
Dev

On Thu, Feb 25, 2021 at 12:15 PM Matt Layher <mdlayher at gmail.com> wrote:
>
> A glance at
> https://github.com/WireGuard/wgctrl-go/blob/master/internal/wguser/parse.go#L48
> seems to indicate that we treat the first "blank" line produced by
> bufio.Scanner (which strips \n) as a sentinel to stop parsing, which
> would mean something like "errno=0\n\n" would parse the errno and be
> done once it interprets the final line "\n".
>
> The tests seem to indicate this works as expected, but I don't regularly
> develop on Windows and welcome PRs if something has changed.
> - Matt
>
> On 2/25/21 12:54 PM, Jason A. Donenfeld wrote:
> > + Matt Layher
> >
> > Hi Davanath,
> >
> >> We are trying to use wgctrl way of configuring the wireguard devices
> >> and facing issues while creating/configuring the wireguard device on
> >> windows.
> >>
> >> 1) First problem was while creating the wintun device using wintun.dll
> >> and using wgctrl for configuring it.  It hangs in
> >> wgclient.ConfigureDevice api()
> > wgctrl works with wireguard. wireguard uses wintun, but wireguard is not wintun.
> >
> >> 2) So tried to first create the device through wireguard.exe. And then
> >> used wgctrl way to configure it, but wgClient.Devices() is not able to
> >> get the devices on our test windows boxes (even though it works on my
> >> development machine)
> > This sounds like a potential bug in wgctrl.
> >
> > Matt -- I wonder if there's a bug in the parser, recently unearthed by
> > a change in wireguard-go. Specifically, uapi stipulates that requests
> > and responses end with \n\n. Is it possible that you're relying on the
> > socket to EOF, instead of looking for the \n\n? Recent wireguard-go
> > keeps the socket open, in case you want to send one request after
> > another.
> >
> > Jason

From iiordanov at gmail.com  Thu Feb 25 17:48:45 2021
From: iiordanov at gmail.com (i iordanov)
Date: Thu, 25 Feb 2021 12:48:45 -0500
Subject: Nested Wireguard tunnels not working on Android and Windows
Message-ID: <CAMS0tn2Pq65UAVWqjV6mNd1hGqvUxVhVgmR3s2h+kh=XF8jyKA@mail.gmail.com>

Hello!

In order to allow traffic to assist devices that cannot reach each
other directly, I am setting up wireguard tunnels through a server
with a public IP (40.30.40.30 in the example below).

For reasons of privacy, I'd like for the server to not be able to
decrypt my traffic. As a result, I would like for one encapsulating
Wireguard tunnel (subnet 10.1.2.0/24) to be peered through the server,
while a second nested Wireguard tunnel (subnet 10.1.3.0/24) to be
established through the first tunnel, peered only at the two devices
(Android and Linux in this case) that need to communicate.

An attempt was made to use a single Wireguard interface. Doing it this
way works between two Linux machines and even between Linux and Mac OS
X, but does not work between a Pixel 3a XL running Android 11 with the
GoBackend Wireguard implementation and my Linux laptop. I also tried
the same config on Windows 10 to no avail.

The config on the Android device, obtained with toWgQuickString():
======================================
    [Interface]
    Address = 10.1.2.5/24, 10.1.3.5/24
    ListenPort = 46847
    MTU = 1200
    PrivateKey = PRIVATE_KEY

    [Peer]
    AllowedIPs = 10.1.2.0/24
    Endpoint = 40.30.40.30:10000
    PersistentKeepalive = 3600
    PublicKey = VF5dic+a+6MllssbV+ShVwEBRrX9gr4do2iNylWrPGs=

    [Peer]
    AllowedIPs = 10.1.3.1/32
    Endpoint = 10.1.2.1:51555
    PersistentKeepalive = 3600
    PublicKey = 0Awdb451Z4+3Gezm7UlbRquC1kcF52r68J9wG1x/zUE=
======================================

The 10.1.2.0/24 subnet is the one that is "visible" to the public
server. The 10.1.3.0/24 subnet is the one that is private to the two
devices.

The devices can actually reach each other with netcat over UDP at
10.1.2.5:46847 and 10.1.2.1:51555 respectively. So the "encapsulating"
tunnel is working, and iperf3 were used to test it over UDP and TCP
successfully.

The "nested" tunnel does not get established.

The following permutations of the above config have the commented problems:

# Only 10.1.2.0/24 works, 10.1.3.0/24 does not.
    Address = 10.1.2.1/24, 10.1.3.1/24

# Only 10.1.2.0/24 works, 10.1.3.0/24 (as expected) does not.
    Address = 10.1.2.1/24

# Neither network works
    Address = 10.1.3.1/24, 10.1.2.1/24

This looks like a bug that is triggered when multiple addresses are
assigned to the interface.

Any suggestions on what to try are welcome.

Thanks!
iordan


-- 
The conscious mind has only one thread of execution.

From timstartuptim at gmail.com  Fri Feb 26 20:30:19 2021
From: timstartuptim at gmail.com (Tim)
Date: Fri, 26 Feb 2021 14:30:19 -0600
Subject: Wireguard Windows client/exe - disable creation of routes
Message-ID: <CABmZ64xFvs+yDU4=ZYYiRzNicUQK2u3zt2+Wb0S2V+1rvGc8fQ@mail.gmail.com>

Hello,

Does anybody know how one can use the Wireguard Windows
client/executables to establish a connection but not have the
client/executables modify the Windows routing tables.  Functionality
similar to the "Table = off" in the linux WG but using the Wireguard
Windows executables (wg.exe/wireguard.exe)?

My goal is to use the existing executables but to handle route
creation on my own.

I've tried "Table = off" but it does not work.  Nor does a cursory
examination of the source code.  Though this seems to be something
basic, so I imagine I am missing how to do it.

Any idea?

Thanks

From frank at carmickle.com  Sat Feb 27 17:16:47 2021
From: frank at carmickle.com (Frank Carmickle)
Date: Sat, 27 Feb 2021 12:16:47 -0500
Subject: Nested Wireguard tunnels not working on Android and Windows
In-Reply-To: <CAMS0tn2Pq65UAVWqjV6mNd1hGqvUxVhVgmR3s2h+kh=XF8jyKA@mail.gmail.com>
References: <CAMS0tn2Pq65UAVWqjV6mNd1hGqvUxVhVgmR3s2h+kh=XF8jyKA@mail.gmail.com>
Message-ID: <E8F8664E-3C8A-45B9-B9B8-1A1E2873289B@carmickle.com>

Iordan,

It's not totally clear to me how you are trying to achieve this, however I'm pretty certain that you want to be creating a second interface that routes the traffic to the endpoint reachable inside the other tunnel, 

You say that it's possible to run a nested configuration on Linux and Macos with just a single interface each,. Have you done a packet capture to prove that that is in fact what is happening? That doesn't seem like how it would act given the design goals.

--FC

On Feb 25, 2021, at 12:48 PM, i iordanov <iiordanov at gmail.com> wrote:
> 
> Hello!
> 
> In order to allow traffic to assist devices that cannot reach each
> other directly, I am setting up wireguard tunnels through a server
> with a public IP (40.30.40.30 in the example below).
> 
> For reasons of privacy, I'd like for the server to not be able to
> decrypt my traffic. As a result, I would like for one encapsulating
> Wireguard tunnel (subnet 10.1.2.0/24) to be peered through the server,
> while a second nested Wireguard tunnel (subnet 10.1.3.0/24) to be
> established through the first tunnel, peered only at the two devices
> (Android and Linux in this case) that need to communicate.
> 
> An attempt was made to use a single Wireguard interface. Doing it this
> way works between two Linux machines and even between Linux and Mac OS
> X, but does not work between a Pixel 3a XL running Android 11 with the
> GoBackend Wireguard implementation and my Linux laptop. I also tried
> the same config on Windows 10 to no avail.
> 
> The config on the Android device, obtained with toWgQuickString():
> ======================================
>    [Interface]
>    Address = 10.1.2.5/24, 10.1.3.5/24
>    ListenPort = 46847
>    MTU = 1200
>    PrivateKey = PRIVATE_KEY
> 
>    [Peer]
>    AllowedIPs = 10.1.2.0/24
>    Endpoint = 40.30.40.30:10000
>    PersistentKeepalive = 3600
>    PublicKey = VF5dic+a+6MllssbV+ShVwEBRrX9gr4do2iNylWrPGs=
> 
>    [Peer]
>    AllowedIPs = 10.1.3.1/32
>    Endpoint = 10.1.2.1:51555
>    PersistentKeepalive = 3600
>    PublicKey = 0Awdb451Z4+3Gezm7UlbRquC1kcF52r68J9wG1x/zUE=
> ======================================
> 
> The 10.1.2.0/24 subnet is the one that is "visible" to the public
> server. The 10.1.3.0/24 subnet is the one that is private to the two
> devices.
> 
> The devices can actually reach each other with netcat over UDP at
> 10.1.2.5:46847 and 10.1.2.1:51555 respectively. So the "encapsulating"
> tunnel is working, and iperf3 were used to test it over UDP and TCP
> successfully.
> 
> The "nested" tunnel does not get established.
> 
> The following permutations of the above config have the commented problems:
> 
> # Only 10.1.2.0/24 works, 10.1.3.0/24 does not.
>    Address = 10.1.2.1/24, 10.1.3.1/24
> 
> # Only 10.1.2.0/24 works, 10.1.3.0/24 (as expected) does not.
>    Address = 10.1.2.1/24
> 
> # Neither network works
>    Address = 10.1.3.1/24, 10.1.2.1/24
> 
> This looks like a bug that is triggered when multiple addresses are
> assigned to the interface.
> 
> Any suggestions on what to try are welcome.
> 
> Thanks!
> iordan
> 
> 
> -- 
> The conscious mind has only one thread of execution.


From labawi-wg at matrix-dream.net  Sat Feb 27 17:36:52 2021
From: labawi-wg at matrix-dream.net (Ivan =?iso-8859-1?Q?Lab=E1th?=)
Date: Sat, 27 Feb 2021 17:36:52 +0000
Subject: ipv6 multicast peer ?
In-Reply-To: <CADdae-jn684uFbFPgkWY47TsmOAnTfzLq7dVGrBxv8Vv8Gcg-Q@mail.gmail.com>
References: <CADdae-jn684uFbFPgkWY47TsmOAnTfzLq7dVGrBxv8Vv8Gcg-Q@mail.gmail.com>
Message-ID: <20210227173652.GA27005@matrix-dream.net>

Hello,

you can't, as wireguard does not currently implement multicast nor broadcast.

You can add unicast IPs and duplicate packets if you feel like it,
or add another layer or two, which is what most people posting here do.

Regards,
Ivan

On Fri, Feb 19, 2021 at 08:58:53AM +0100, nicolas prochazka wrote:
> Hello,
> On a "server side" I've for example these peers, and i want to send a
> ipv6 multicast group
> ff02::1
> How can I do that with peer / allowed-ips routing ?
> 
> Regards
> Nicolas
> 
> interface: wg0
>   public key: **************
>   private key: (hidden)
>   listening port: 6081
> 
> peer: ************
>   preshared key: (hidden)
>   endpoint: x.x.130.134:6081
>   allowed ips: fd00:0:222d:0:f64d:30ff:fe6e:222d/128
>   latest handshake: 52 seconds ago
>   transfer: 56.96 MiB received, 1.96 GiB sent
> 
> peer: **********
>   preshared key: (hidden)
>   endpoint: x.x.x.x:6081
>   allowed ips: fd00::8e2:97ff:fe2e:3/128,
> fd00:0:2836:0:1e69:7aff:fe01:2836/128,
> fd00:0:3340:0:a00:27ff:fe5a:3340/128
>   latest handshake: 1 minute, 54 seconds ago
>   transfer: 513.17 MiB received, 6.27 GiB sent
>   persistent keepalive: every 25 seconds
> 
> peer: *****
>   preshared key: (hidden)
>   endpoint: x.x.x.x:6081
>   allowed ips: fd00::/32, fd00::8e2:97ff:fe2e:0/112,
> fd00::8e2:97ff:fe2e:eeee/128
>   latest handshake: 1 minute, 59 seconds ago
>   transfer: 2.70 MiB received, 6.69 MiB sent
>   persistent keepalive: every 25 seconds
> 
> peer: **************
>   preshared key: (hidden)
>   endpoint: x.x.100.142:6081
>   allowed ips: fd00:0:ec58:0:b26e:bfff:fe1e:2d5a/128
>   latest handshake: 2 minutes, 5 seconds ago
>   transfer: 195.00 MiB received, 2.19 GiB sent

From kendziorra at dresearch-fe.de  Sat Feb 27 11:19:16 2021
From: kendziorra at dresearch-fe.de (Heiko Kendziorra)
Date: Sat, 27 Feb 2021 12:19:16 +0100
Subject: Fwd: Wireguard Win10 Client not work through an openVPN tunnel on the
 same machine
In-Reply-To: <CAA4ESQ28f+Q77uduCHRmb9YDqCfE3qKp7+P=n=0MU9VDOCCbDg@mail.gmail.com>
References: <CAA4ESQ28f+Q77uduCHRmb9YDqCfE3qKp7+P=n=0MU9VDOCCbDg@mail.gmail.com>
Message-ID: <CAA4ESQ2k-EK9-o6ThDQYt+C87ojvE71PLmXFyKpMg4BC2jNTYA@mail.gmail.com>

Machine A in Intranet Windows 10 Prof Version : 20H2
Address 172.1.2.3
Firewall is open for  webserver und wireguard (8080 tcp, 44444 udp)
is WireguardServer  Version 0.3.7

wg.conf:
PublicKey = A8C8+bRYaqu2MKs2SpwuRRgmwqItYwFFJjk77UtUUxU=
[Interface]
PrivateKey = ********************************
ListenPort = 44444
Address = 192.168.44.44/32
[Peer]
PublicKey = JkacJ6IYPUgCOv+OdHN6ZMJ+JRZr6V5/kDzthil/CUs=
AllowedIPs = 192.168.44.4/32
PersistentKeepalive = 25
--------------------------------------------------------------------------------
Machine B extern over openVPN connected with the Intranet Windows 10
Prof Version : 20H2  (OpenVPN Client running on B)
Address 172.11.12.13 could reach A over Routing  (Test: Webserver on
A: 172.1.2.3:8080)
is WireguardClient Version 0.3.7

wg.conf:
PublicKey = JkacJ6IYPUgCOv+OdHN6ZMJ+JRZr6V5/kDzthil/CUs=
[Interface]
PrivateKey = **********************
Address = 192.168.44.4/32

[Peer]
PublicKey = A8C8+bRYaqu2MKs2SpwuRRgmwqItYwFFJjk77UtUUxU=
AllowedIPs = 192.168.44.44/32
Endpoint = 172.16.41.20:44444
PersistentKeepalive = 25
--------------------------------------------------------------------------------

Result after Activation
The Client B could not estable a working Wireguard-Connetion to A :

Protokoll Server:
2021-02-27 10:53:02.636: [TUN] [44444] Startup complete
2021-02-27 10:53:03.615: [TUN] [44444] peer(Jkac?/CUs) - Received
handshake initiation
2021-02-27 10:53:03.615: [TUN] [44444] peer(Jkac?/CUs) - Sending
handshake response
2021-02-27 10:53:07.821: [TUN] [44444] peer(Jkac?/CUs) - Handshake did
not complete after 5 seconds, retrying (try 2)
2021-02-27 10:53:11.480: [MGR] [Wintun] IsPoolMember: Reading pool
devpkey failed, falling back: Element nicht gefunden. (Code
0x00000490)
2021-02-27 10:53:28.626: [TUN] [44444] peer(Jkac?/CUs) - Sending
handshake initiation
2021-02-27 10:53:33.794: [TUN] [44444] peer(Jkac?/CUs) - Handshake did
not complete after 5 seconds, retrying (try 2)
2021-02-27 10:53:33.794: [TUN] [44444] peer(Jkac?/CUs) - Sending
handshake initiation
2021-02-27 10:53:39.094: [TUN] [44444] peer(Jkac?/CUs) - Handshake did
not complete after 5 seconds, retrying (try 3)
2021-02-27 10:53:39.094: [TUN] [44444] peer(Jkac?/CUs) - Sending
handshake initiation
2021-02-27 10:53:44.286: [TUN] [44444] peer(Jkac?/CUs) - Handshake did
not complete after 5 seconds, retrying (try 4)
2021-02-27 10:53:44.286: [TUN] [44444] peer(Jkac?/CUs) - Sending
handshake initiation
2021-02-27 10:53:49.549: [TUN] [44444] peer(Jkac?/CUs) - Handshake did
not complete after 5 seconds, retrying (try 5)
2021-02-27 10:53:49.549: [TUN] [44444] peer(Jkac?/CUs) - Sending
handshake initiation

Protokoll Client:
2021-02-27 10:53:02.793: [TUN] [test-44444] Startup complete
2021-02-27 10:53:02.836: [TUN] [test-44444] peer(A8C8?UUxU) - Received
handshake response
2021-02-27 10:53:23.530: [TUN] [test-44444] peer(A8C8?UUxU) - Retrying
handshake because we stopped hearing back after 15 seconds
2021-02-27 10:53:23.530: [TUN] [test-44444] peer(A8C8?UUxU) - Sending
handshake initiation
2021-02-27 10:53:27.815: [TUN] [test-44444] peer(A8C8?UUxU) - Received
handshake initiation
2021-02-27 10:53:27.815: [TUN] [test-44444] peer(A8C8?UUxU) - Sending
handshake response
2021-02-27 10:53:28.815: [TUN] [test-44444] peer(A8C8?UUxU) -
Handshake did not complete after 5 seconds, retrying (try 2)
2021-02-27 10:53:32.982: [TUN] [test-44444] peer(A8C8?UUxU) - Received
handshake initiation
2021-02-27 10:53:32.982: [TUN] [test-44444] peer(A8C8?UUxU) - Sending
handshake response
2021-02-27 10:53:38.283: [TUN] [test-44444] peer(A8C8?UUxU) - Received
handshake initiation
2021-02-27 10:53:38.283: [TUN] [test-44444] peer(A8C8?UUxU) - Sending
handshake response
2021-02-27 10:53:43.475: [TUN] [test-44444] peer(A8C8?UUxU) - Received
handshake initiation
2021-02-27 10:53:43.475: [TUN] [test-44444] peer(A8C8?UUxU) - Sending
handshake response
2021-02-27 10:53:48.738: [TUN] [test-44444] peer(A8C8?UUxU) - Received
handshake initiation
2021-02-27 10:53:48.738: [TUN] [test-44444] peer(A8C8?UUxU) - Sending
handshake response
2021-02-27 10:53:54.066: [TUN] [test-44444] peer(A8C8?UUxU) - Received
handshake initiation
2021-02-27 10:53:54.066: [TUN] [test-44444] peer(A8C8?UUxU) - Sending
handshake response
2021-02-27 10:53:59.148: [TUN] [test-44444] peer(A8C8?UUxU) - Received
handshake initiation
2021-02-27 10:53:59.148: [TUN] [test-44444] peer(A8C8?UUxU) - Sending
handshake response
2021-02-27 10:54:04.459: [TUN] [test-44444] peer(A8C8?UUxU) - Received
handshake initiation
2021-02-27 10:54:04.459: [TUN] [test-44444] peer(A8C8?UUxU) - Sending
handshake response
2021-02-27 10:54:09.601: [TUN] [test-44444] Device closing

Apparently, the only message that the server has received from the
client is the one that was sent to the public address on port 44444.
After that, the client can no longer send a message - but the other
way round it can.

Modifikation

start a Win10 Sandbox on B.
install the Wireguard Client  there with the same configuration like on B
deactivate  WG-Client on  B
the Sandbox could reach A over routing through the running Open-VPN of B
under these conditions, the wiregiard connection can also be established!!

Protokoll Server:
2021-02-27 11:46:04.958: [TUN] [44444] Startup complete
2021-02-27 11:46:05.762: [TUN] [44444] peer(Jkac?/CUs) - Received
handshake initiation
2021-02-27 11:46:05.762: [TUN] [44444] peer(Jkac?/CUs) - Sending
handshake response
2021-02-27 11:46:05.786: [TUN] [44444] peer(Jkac?/CUs) - Receiving
keepalive packet
2021-02-27 11:46:13.757: [MGR] [Wintun] IsPoolMember: Reading pool
devpkey failed, falling back: Element nicht gefunden. (Code
0x00000490)
2021-02-27 11:46:30.795: [TUN] [44444] peer(Jkac?/CUs) - Sending
keepalive packet
2021-02-27 11:46:30.812: [TUN] [44444] peer(Jkac?/CUs) - Receiving
keepalive packet

Protokoll Client:
2021-02-27 11:46:05.050: [TUN] [wg-test-sandbox] Startup complete
2021-02-27 11:46:05.065: [TUN] [wg-test-sandbox] peer(A8C8?UUxU) -
Received handshake response
2021-02-27 11:46:05.088: [TUN] [wg-test-sandbox] peer(A8C8?UUxU) -
Receiving keepalive packet
2021-02-27 11:46:30.093: [TUN] [wg-test-sandbox] peer(A8C8?UUxU) -
Sending keepalive packet
2021-02-27 11:46:30.097: [TUN] [wg-test-sandbox] peer(A8C8?UUxU) -
Receiving keepalive packet

Heiko Kendziorra

From me at aaronmdjones.net  Sun Feb 28 00:53:41 2021
From: me at aaronmdjones.net (Aaron Jones)
Date: Sun, 28 Feb 2021 00:53:41 +0000
Subject: Nested Wireguard tunnels not working on Android and Windows
In-Reply-To: <E8F8664E-3C8A-45B9-B9B8-1A1E2873289B@carmickle.com>
References: <CAMS0tn2Pq65UAVWqjV6mNd1hGqvUxVhVgmR3s2h+kh=XF8jyKA@mail.gmail.com>
 <E8F8664E-3C8A-45B9-B9B8-1A1E2873289B@carmickle.com>
Message-ID: <65365aa6-cdd0-f9dc-f894-3a040ca596ae@aaronmdjones.net>

On 27/02/2021 17:16, Frank Carmickle wrote:
> Iordan,
>
> You say that it's possible to run a nested configuration on
> Linux and Macos with just a single interface each. Have you
> done a packet capture to prove that that is in fact what is
> happening? That doesn't seem like how it would act given the
> design goals.

Nesting (Using one of Peer A's AllowedIPs as Peer B's Endpoint) does
work within the same WireGuard interface, at least on Linux.


-------------- next part --------------
A non-text attachment was scrubbed...
Name: OpenPGP_signature
Type: application/pgp-signature
Size: 833 bytes
Desc: OpenPGP digital signature
URL: <http://lists.zx2c4.com/pipermail/wireguard/attachments/20210228/a792beb1/attachment.sig>

From frank.mosch at tutanota.com  Wed Feb 24 00:11:02 2021
From: frank.mosch at tutanota.com (frank.mosch at tutanota.com)
Date: Wed, 24 Feb 2021 01:11:02 +0100 (CET)
Subject: event for peer connection
Message-ID: <MUGMypO--3-2@tutanota.com>

Hello,

first thank you all for your great work on Wireguard! 

We would like to switch to wireguard at work but for our setup we need some automatic configuration based on the clients and their IP numbers. Is there a way to get notified when peers connect or send handshake packets? We'd like to avoid polling as the configuration needs to be done rather quickly on a connect for a pleasant user experience.
A file in /sys or /proc that can be waited on would be best but listening for netlink events would be useful, too. 

Servus, Frank

From peter.truman at gmail.com  Tue Feb 23 23:24:25 2021
From: peter.truman at gmail.com (Peter Truman)
Date: Tue, 23 Feb 2021 23:24:25 +0000
Subject: Hairpin/interface change...
Message-ID: <CAGJxn5oGmfp-vDPw7A6kTb0vtUMDRyaBhamQn1XqWoUyDrv+_g@mail.gmail.com>

Hi,

I'll no doubt be shot for asking, but having just switched routers to
one which appears not to want to support hairpin, everything I had
running has keeled over (yay).

I have setup internal DNS to point to the internal WG address, as well
as having a valid external one, but wireguard (Android) won't survive
an interface change (mobile to wifi and vice versa) - as I think it
tries the previously remembered IP.

Is there anyway to set an option for "crappy router" to force a DNS
lookup on an interface change? (or does the Android device DNS cache
break that?)