From nazar at mokrynskyi.com Wed Nov 1 01:23:22 2023 From: nazar at mokrynskyi.com (Nazar Mokrynskyi) Date: Wed, 1 Nov 2023 03:23:22 +0200 Subject: Android can't complete handshake Message-ID: <1ac814fd-85ba-4b73-97b2-b13ddde13873@mokrynskyi.com> Basically every day I have the following in logs: > 10-16 00:46:05.380 5200 5568 D WireGuard/GoBackend/xyz: peer(8MoP…XPVc) - Handshake did not complete after 5 seconds, retrying (try 2) > 10-16 00:46:05.381 5200 5568 D WireGuard/GoBackend/xyz: peer(8MoP…XPVc) - Sending handshake initiation > 10-16 00:46:10.468 5200 5678 D WireGuard/GoBackend/xyz: peer(8MoP…XPVc) - Handshake did not complete after 5 seconds, retrying (try 3) > 10-16 00:46:10.469 5200 5678 D WireGuard/GoBackend/xyz: peer(8MoP…XPVc) - Sending handshake initiation This is happening on Android 14 when connected to Wi-Fi. It is definitely not a loss of Wi-Fi or Internet connectivity: if I turn the VPN connection off and immediately back on, it connects instantly and successfully. It is just those retries that do not succeed. Is this a known issue, or is there perhaps a workaround? I have seen similar reports online that seem to be related to network connectivity, which is not the case here, or else reconnection would fail too. -- Sincerely, Nazar Mokrynskyi github.com/nazar-pc From kc at omnigroup.com Thu Nov 2 23:48:07 2023 From: kc at omnigroup.com (Ken Case) Date: Thu, 2 Nov 2023 16:48:07 -0700 Subject: [PATCH] Qualify routed DNS queries based on search domains Message-ID: Implement support for DNS search domains in the native apps for Apple platforms (Mac and iOS), matching the search domain support already implemented for other platforms. Rather than unconditionally routing all DNS queries through the associated tunnel's DNS, route all queries there only when no search domains have been specified. When search domains _have_ been specified, route those domains to the tunnel's DNS but let other domains continue to be routed to other network interfaces.
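For illustration, the matchDomains decision this patch makes can be restated as a small Bash function. This is a hypothetical sketch, not part of WireGuardKit: `match_domains` is an invented name, DNS servers and search domains are passed as space-separated strings, and the empty match domain `""` (meaning "send every query to this tunnel's resolver") is printed as the placeholder `<all>`.

```shell
#!/usr/bin/env bash
# Hypothetical restatement of the patch's matchDomains logic (not WireGuardKit code).
# $1: space-separated DNS servers, $2: space-separated search domains.
# Prints one match domain per line; "<all>" stands in for the empty domain "".
match_domains() {
    local dns_servers=$1 search_domains=$2
    local -a domains
    # No DNS servers: the patch leaves matchDomains unset, so print nothing.
    [[ -n $dns_servers ]] || return 0
    if [[ -z $search_domains ]]; then
        # No search domains: match "" so every query goes to the tunnel's DNS.
        echo "<all>"
    else
        # Search domains given: only those domains are sent to the tunnel's DNS.
        read -r -a domains <<< "$search_domains"
        printf '%s\n' "${domains[@]}"
    fi
}

match_domains "10.0.0.1" ""                           # prints <all>
match_domains "10.0.0.1" "corp.example lab.example"   # prints corp.example, then lab.example
match_domains "" "corp.example"                       # prints nothing
```

The empty-server case mirrors the patch, where the matchDomains assignment sits inside the `if !tunnelConfiguration.interface.dns.isEmpty` check.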
Signed-off-by: Ken Case --- Sources/WireGuardKit/PacketTunnelSettingsGenerator.swift | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/Sources/WireGuardKit/PacketTunnelSettingsGenerator.swift b/Sources/WireGuardKit/PacketTunnelSettingsGenerator.swift index c53a82c..5b7f63c 100644 --- a/Sources/WireGuardKit/PacketTunnelSettingsGenerator.swift +++ b/Sources/WireGuardKit/PacketTunnelSettingsGenerator.swift @@ -88,7 +88,13 @@ class PacketTunnelSettingsGenerator { let dnsSettings = NEDNSSettings(servers: dnsServerStrings) dnsSettings.searchDomains = tunnelConfiguration.interface.dnsSearch if !tunnelConfiguration.interface.dns.isEmpty { - dnsSettings.matchDomains = [""] // All DNS queries must first go through the tunnel's DNS + if tunnelConfiguration.interface.dnsSearch.isEmpty { + // Since no search domains were listed, use this tunnel's DNS for all queries + dnsSettings.matchDomains = [""] + } else { + // Only use this tunnel for the listed search domains + dnsSettings.matchDomains = tunnelConfiguration.interface.dnsSearch + } } networkSettings.dnsSettings = dnsSettings } -- 2.41.0 From dxld at darkboxed.org Sat Nov 18 02:19:01 2023 From: dxld at darkboxed.org (Daniel Gröber) Date: Sat, 18 Nov 2023 03:19:01 +0100 Subject: [Babel-users] [RFC] Replace WireGuard AllowedIPs with IP route attribute In-Reply-To: References: <20230819140218.5algu2nfmfostngh@House.clients.dxld.at> <4b-64e11f80-13-5e880900@8744214> <20230819212357.lkshcpslkgbeaq4e@House.clients.dxld.at> <20230828160705.a5uxv5l2zknna7yj@House.clients.dxld.at> <87v8czqd3w.wl-jch@irif.fr> <20230828221312.fw5pvnt4x7p2c52k@House.clients.dxld.at> <804a0c0a-78df-7f4c-1d0d-213e8bdb4120@nic.cz> Message-ID: <20231118021901.47kzvwn4pup4vkmg@House.clients.dxld.at> Hi Alexander, On Thu, Nov 09, 2023 at 12:57:26PM +0100, Alexander Zubkov wrote: > I heard recently about the lightweight tunnel infrastructure in Linux > kernel (ip route ... encap ...).
And I think this might be helpful in > the context of this thread. I hadn't seen that yet, thanks for pointing it out. > Linux kernel allows already to add encapsulation parameters to the route > entry in its table. So you do not need to create tunnel devices for > that. And wireguard encapsulation and destination might be added there > too. Right, I think ultimately it's going to come down to either technical constraints or in the absence of that, maintainer preference whether via-wgpeer or "encap wg" is the way. The idea is very similar anyway. > But as I understood the technology, it works only in one way (for > outgoing packets) and the decapsulation should be processed separately, > for example in case of VXLAN and MPLS they have their own tables. That would be a problem as I specifically want to tie the source address filtering to this too. I'll have a look at the internals (if and) when I get around to starting work on this. Thanks, --Daniel From mabived at proton.me Wed Nov 1 21:46:19 2023 From: mabived at proton.me (mabived) Date: Wed, 01 Nov 2023 21:46:19 -0000 Subject: Reduce height of macOS menu bar icon Message-ID: The menu bar extra icon for macOS looks out of place. Can the icon be resized to match system icons?
Screenshot: https://ibb.co/1Gxtz7P https://developer.apple.com/design/human-interface-guidelines/the-menu-bar#Menu-bar-extras https://bjango.com/articles/designingmenubarextras/ From green at qrator.net Thu Nov 9 11:57:26 2023 From: green at qrator.net (Alexander Zubkov) Date: Thu, 9 Nov 2023 12:57:26 +0100 Subject: [Babel-users] [RFC] Replace WireGuard AllowedIPs with IP route attribute In-Reply-To: <804a0c0a-78df-7f4c-1d0d-213e8bdb4120@nic.cz> References: <20230819140218.5algu2nfmfostngh@House.clients.dxld.at> <4b-64e11f80-13-5e880900@8744214> <20230819212357.lkshcpslkgbeaq4e@House.clients.dxld.at> <20230828160705.a5uxv5l2zknna7yj@House.clients.dxld.at> <87v8czqd3w.wl-jch@irif.fr> <20230828221312.fw5pvnt4x7p2c52k@House.clients.dxld.at> <804a0c0a-78df-7f4c-1d0d-213e8bdb4120@nic.cz> Message-ID: Hello all, I heard recently about the lightweight tunnel infrastructure in Linux kernel (ip route ... encap ...). And I think this might be helpful in the context of this thread. Linux kernel allows already to add encapsulation parameters to the route entry in its table. So you do not need to create tunnel devices for that. And wireguard encapsulation and destination might be added there too. But as I understood the technology, it works only in one way (for outgoing packets) and the decapsulation should be processed separately, for example in case of VXLAN and MPLS they have their own tables. Regards, Alexander Zubkov Qrator Labs On Mon, Sep 11, 2023 at 5:46 PM Maria Matejka via Bird-users wrote: > > Hello! > > On 8/29/23 00:13, Daniel Gröber wrote: > > On Mon, Aug 28, 2023 at 07:40:51PM +0200, Juliusz Chroboczek wrote: > > I've read the whole discussion, and I'm still not clear what advantages > the proposed route attribute has over having one interface per peer. Is > it because interfaces are expensive in the Linux kernel? Or is there some > other reason why it is better to run all WG tunnels over a single interface?
> > Off the top of my head UDP port exhaustion is a scalability concern here, > > For enterprise setups, this can very easily become a scalability concern. > > One wg-device per-peer means we need one UDP port per-peer, and since > currently binding to a specific IP is also not supported by wg (I have a > patch pending for this though), there's no good way to work around this. > > There is a theoretical Frankenstein approach: running a virtual machine (maybe netns is enough) for each of the public IP addresses, and connecting them by veth. You do not want to do this, but theoretically, it should work. > > Frankly having tons of interfaces is just an operational PITA in all sorts > of ways. Apart from the port exhaustion, having more than one wg device also > means I have to _allocate_ a new port for each node in my management system > somehow instead of just using a static port for the entire network. This > gets dicey fast as I want to move in the direction of dynamic peering as in > tinc. > > Even with my 6 machines running in weird locations, it's a mess. > > All of that could be solved, but I would also like to get my wg+babel VPN > setup deployed more widely at some point and all that friction isn't going > to help with that so I'd rather have this supported properly. > > All in all, I would also like to see this setup deployed worldwide. If we could somehow help on the BIRD side, please let us know. > > Thank you for bringing this up. > > -- > Maria Matejka (she/her) | BIRD Team Leader | CZ.NIC, z.s.p.o. From grishka93 at gmail.com Sat Nov 11 11:03:54 2023 From: grishka93 at gmail.com (Gregory Klyushnikov) Date: Sat, 11 Nov 2023 14:03:54 +0300 Subject: Wireguard Android app sometimes shows its UI instead of just toggling the VPN Message-ID: <2E6A421F-50FF-44BA-B0A6-FA1F2F421ABB@gmail.com> Hi, I haven't found a proper bug tracker to report this in (the wireguard-android GitHub repo has issues turned off), so... I have a Wireguard tile in my quick settings.
Sometimes, when I tap it, instead of toggling the VPN, the app's activity starts and the VPN state remains unchanged. My device is a Pixel 4a running Android 13, the app version is 1.0.20231018, installed from Google Play. I'm not able to reproduce this on purpose but it's always annoying when it happens. If logcat logs would be helpful, I'll try to get some next time I run into this issue. Regards, Gregory From jch at irif.fr Sat Nov 18 12:22:03 2023 From: jch at irif.fr (Juliusz Chroboczek) Date: Sat, 18 Nov 2023 13:22:03 +0100 Subject: [Babel-users] [RFC] Replace WireGuard AllowedIPs with IP route attribute In-Reply-To: <918e1d5b-9f11-4f9c-bf9a-94cb0d41ce2b@app.fastmail.com> References: <20230819140218.5algu2nfmfostngh@House.clients.dxld.at> <4b-64e11f80-13-5e880900@8744214> <20230819212357.lkshcpslkgbeaq4e@House.clients.dxld.at> <20230828160705.a5uxv5l2zknna7yj@House.clients.dxld.at> <87v8czqd3w.wl-jch@irif.fr> <20230828221312.fw5pvnt4x7p2c52k@House.clients.dxld.at> <804a0c0a-78df-7f4c-1d0d-213e8bdb4120@nic.cz> <20231118021901.47kzvwn4pup4vkmg@House.clients.dxld.at> <918e1d5b-9f11-4f9c-bf9a-94cb0d41ce2b@app.fastmail.com> Message-ID: <871qcn8d5g.wl-jch@irif.fr> > Is tying source address filtering to the routing table the right thing to do > here? It seems to me that it would cause issues similar to those we see more > generally with Unicast Reverse Path Filtering Issues are caused by the kernel performing filtering that the routing protocol is not aware of: it causes the routing daemon's routing table to no longer match the effective forwarding table (the kernel's routing table). That's the reason why uRPF breaks most routing protocols, that's the reason why we have trouble making Wireguard work with Babel, and also the reason behind https://github.com/jech/babeld/issues/111. Contrariwise, we can teach Babel to explicitly take into account the kernel features that we're interested in using. 
Thus, Babel could be aware of the restrictions placed on a wireguard interface, and collaborate with Wireguard so that the routing table and the forwarding table remain congruent. I haven't looked at the issue in detail, but I believe that would be an interesting (short-term) research project, one that I would be glad to collaborate with (but not necessarily lead, at least not right now). For the specific case of source address filtering, Babel already has an (implemented) extension to deal with source addresses, and I encourage you to consider whether it can be used to deal with the issue at hand. Please see https://arxiv.org/pdf/1403.0445.pdf and RFC 9079. -- Juliusz From erikschulz184 at gmail.com Mon Nov 6 18:43:06 2023 From: erikschulz184 at gmail.com (Erik Schulz) Date: Mon, 6 Nov 2023 19:43:06 +0100 Subject: Bugs in MacOS client: Infinite reconnect when using on-demand and switching user; missing reconnect feature Message-ID: I'm using the MacOS App Store client, App version: 1.0.16 (27) Go backend version: 1e2c3e5a I use multiple users and switch between them. a) When I am logged in as user A, which has the tunnel set up and Wireguard running, and switch to user B, Wireguard disconnects the tunnel. As user B, trying to switch on the tunnel in Settings > VPN fails. I'm guessing this is unavoidable, and a security feature of the OS, but if not, it would be nice to have a configuration option to allow the tunnel to continue to operate. b) With "On-Demand" enabled for Ethernet and Wi-Fi, when switching to user B, Settings > VPN seems to be stuck in an infinite loop, switching on/off. I'm guessing that Wireguard (running in user A) is trying to establish the tunnel, but failing. I'm guessing that there is a bug in the retry/wait logic for On-Demand. This causes high CPU load. This means that I'm unable to use "On-Demand". c) Instead of On-Demand, it would be nice to automatically reconnect when switching back to user A.
Currently I have to enable it manually each time I switch to user A. Could the app remember that the connection was active before the user switch, and automatically reconnect when switching back? Thanks! From mprobert at gmail.com Fri Nov 3 13:12:30 2023 From: mprobert at gmail.com (M P Robert) Date: Fri, 03 Nov 2023 13:12:30 -0000 Subject: wg-quick set_mtu_up - largest or smallest MTU? Message-ID: <81DE093A-5905-49CA-A412-5F934E0E0EBB@gmail.com> I was looking at the auto MTU detection in wg-quick. It appears that wg-quick is taking the LARGEST of all endpoint MTUs https://github.com/WireGuard/wireguard-tools/blob/13f4ac4cb74b5a833fa7f825ba785b1e5774e84f/src/wg-quick/linux.bash#L134 In a scenario where you have different peers on different network devices or routes with different MTUs, I would think you would want to take the SMALLEST mtu from all peers in order to avoid having fragmentation talking to the peers on networks with smaller MTUs. Or perhaps fragmentation for some peers is faster than selecting a smaller packet size for all peers? Or I am missing something (more likely). Happy to be educated on this point. I don't have git-send-email setup at this point, but just in case this is a valid issue, I've attached a sample fix for set_mtu_up that will take the smallest of the discovered peer MTUs rather than the largest. I'm not a bash guy, so just take it for illustration purposes. Thanks! -Matt -------------- next part -------------- A non-text attachment was scrubbed... Name: wg-quick.set_mtu_up.sh Type: application/octet-stream Size: 912 bytes Desc: not available URL: -------------- next part -------------- From dxld at darkboxed.org Sun Nov 19 14:41:51 2023 From: dxld at darkboxed.org (Daniel Gröber) Date: Sun, 19 Nov 2023 15:41:51 +0100 Subject: wg-quick set_mtu_up - largest or smallest MTU?
In-Reply-To: <81DE093A-5905-49CA-A412-5F934E0E0EBB@gmail.com> References: <81DE093A-5905-49CA-A412-5F934E0E0EBB@gmail.com> Message-ID: <20231119144151.2wcvbph4hplfydfj@House.clients.dxld.at> Hi Matt, On Fri, Nov 03, 2023 at 01:12:22PM +0000, M P Robert wrote: > I was looking at the auto MTU detection in wg-quick. > > It appears that wg-quick is taking the LARGEST of all endpoint MTUs > > https://github.com/WireGuard/wireguard-tools/blob/13f4ac4cb74b5a833fa7f825ba785b1e5774e84f/src/wg-quick/linux.bash#L134 My reading of the code is that we iterate over all peers' endpoints, do a route lookup to get the destination-specific MTU, and fall back to the default route's MTU if that didn't return anything. In both cases we use the "mtu" route attribute or, failing that, the actual interface MTU. Incidentally, the docs don't mention the details of the MTU selection behaviour. wg-quick(8) has this: MTU - if not specified, the MTU is automatically determined from the endpoint addresses or the system default route, which is usually a sane choice. However, to manually specify an MTU to override this automatic discovery, this value may be specified explicitly. > In a scenario where you have different peers on different network devices > or routes with different MTUs, I would think you would want to take the > SMALLEST mtu from all peers in order to avoid having fragmentation > talking to the peers on networks with smaller MTUs. I agree using the smallest MTU would probably be best for most users. > Or perhaps fragmentation for some peers is faster than selecting a > smaller packet size for all peers? Or I am missing something (more > likely). Happy to be educated on this point. In principle, hosts that have (some) interfaces using jumbo frames may want to make use of the largest MTU for efficiency; in that case one may be willing to take the fragmentation hit on interfaces with smaller MTU.
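The two selection policies being discussed can be compared with a small Bash sketch. The function name `pick_tunnel_mtu` is hypothetical, and it takes the per-endpoint underlay MTUs as arguments instead of querying `ip route get`. Like the current set_mtu_up, it falls back to 1500 when nothing was found and subtracts a fixed 80 bytes (40 for an IPv6 header plus 40 for UDP and WireGuard framing).

```shell
#!/usr/bin/env bash
# Hypothetical sketch: choose a tunnel MTU from per-endpoint underlay MTUs.
# $1: policy, "max" (what wg-quick does today) or "min" (the proposed change)
# $2...: underlay MTUs discovered for each peer endpoint
pick_tunnel_mtu() {
    local policy=$1 best=0 m
    shift
    for m in "$@"; do
        if (( best == 0 )) ||
           { [[ $policy == max ]] && (( m > best )); } ||
           { [[ $policy == min ]] && (( m < best )); }; then
            best=$m
        fi
    done
    # Nothing discovered: fall back to 1500, as the current script does.
    (( best > 0 )) || best=1500
    # Fixed 80-byte overhead: IPv6 header (40) + UDP and WireGuard (40).
    echo $(( best - 80 ))
}

pick_tunnel_mtu max 9000 1500   # prints 8920
pick_tunnel_mtu min 9000 1500   # prints 1420
pick_tunnel_mtu min             # prints 1420 (fallback to 1500)
```

With a jumbo-frame path and an ordinary 1500-byte path, "max" gives 8920, so traffic to the second peer is fragmented. "min" gives 1420, which fits both paths at the cost of smaller packets on the jumbo link.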
If the default were the smallest MTU, such hosts could still override the automatic MTU selection by configuring MTU= statically, so this isn't a blocker. > I don't have git-send-email setup at this point, but just in case this is > a valid issue, I've attached a sample fix for set_mtu_up that will take > the smallest of the discovered peer MTUs rather than the largest. I'm > not a bash guy, so just take it for illustration purposes. Looks like Thomas already whipped up a patch. You could test it and report back :) --Daniel From dxld at darkboxed.org Sun Nov 19 14:54:31 2023 From: dxld at darkboxed.org (Daniel Gröber) Date: Sun, 19 Nov 2023 15:54:31 +0100 Subject: Wireguard Windows keeps using lower priority interface (wifi) when a higher priority interface (wired) becomes available In-Reply-To: References: Message-ID: <20231119145431.leefvef3r4wdseq2@House.clients.dxld.at> Hi Dave, On Thu, Oct 19, 2023 at 09:43:46AM +0200, Dave Mifsud wrote: > Has anyone come across this issue? Can anything be done, apart from > creating a trigger in windows such that whenever a wired connection > becomes available Wireguard is restarted? We would like to avoid this, > as the solution seems too drastic. Sounds very similar to the behaviour I'm seeing with the Linux kernel implementation. This is intentional as best I can tell; it's called "sticky sockets". See my lament thread "Wg source address is too sticky for multihomed systems aka multiple endpoints redux" https://lists.zx2c4.com/pipermail/wireguard/2023-July/008111.html It's safe to say many people have run into this and I think will continue to do so as multihoming (aka wifi+ethernet) is pervasive. I have a workaround for this on Linux that doesn't require breaking connectivity by completely restarting the interface. It involves setting fwmark, which invalidates the cached route; I'm not sure a comparable codepath exists in the Windows impl.
--Daniel From dxld at darkboxed.org Mon Nov 20 01:17:01 2023 From: dxld at darkboxed.org (Daniel Gröber) Date: Mon, 20 Nov 2023 02:17:01 +0100 Subject: [PATCH] wg-quick: linux: fix MTU calculation (use PMTUD) In-Reply-To: <20231029192210.120316-1-tomxor@gmail.com> References: <20231029192210.120316-1-tomxor@gmail.com> Message-ID: <20231120011701.asllvpzuffih34wz@House.clients.dxld.at> Hi Thomas, On Sun, Oct 29, 2023 at 07:22:10PM +0000, Thomas Brierley wrote: > Currently MTU calculation fails to successfully utilise the kernel's > built-in path MTU discovery mechanism (PMTUD). Fixing this required a > re-write of the set_mtu_up() function, which also addresses two related > MTU issues as a side effect... > > 1. Trigger PMTUD Before Query > > Currently the endpoint path MTU acquired from `ip route get` will almost > definitely be empty, This is not entirely true: routes can specify the `mtu` route attribute explicitly and this will show up here. Something like: $ ip route add 2001:db8::1 dev eth0 via fe80::1 mtu 9000 $ ip route get 2001:db8::1 ~ 2001:db8::1 from :: via fe80::1 dev eth0 src 2001:db8:2 metric 1 mtu 9000 pref medium So this is useful even when not taking PMTU into account. The only concerning problem I see here is that ip-route-get may return a cached PMTU, so the MTU selection isn't fully deterministic. > because this only queries the routing cache. To > trigger PMTUD on the endpoint and fill this cache, it is necessary to > send an ICMP with the DF bit set. I don't think this is useful. Path MTU may change; doing this only once when the interface comes up just makes wg-quick less predictable IMO. > We now perform a ping beforehand with a total packet size equal to the > interface MTU, larger will not trigger PMTUD, and smaller can miss a > bottleneck. To calculate the ping payload, the device MTU and IP header > size must be determined first. > > 2.
Consider IPv6/4 Header Size > > Currently an 80 byte header size is assumed i.e. IPv6=40 + WireGuard=40. > However this is not optimal in the case of IPv4. Since determining the > IP header size is required for PMTUD anyway, this is now optimised as a > side effect of endpoint MTU calculation. This is not a good idea. Consider what happens when a peer roams from an IPv4 to an IPv6 endpoint address. It's better to be conservative and assume IPv6-sized overhead; besides, IPv4 is legacy anyway ;) > 3. Use Smallest Endpoint MTU > > Currently in the case of multiple endpoints the largest endpoint path > MTU is used. However WireGuard will dynamically switch between endpoints > when e.g. one fails, so the smallest MTU is now used to ensure all > endpoints will function correctly. "function correctly". Do note that wireguard lets its UDP packets be fragmented. So connectivity will still work even when the wg device MTU doesn't match the (current) PMTU. The only downsides to this mismatch being performance: - additional header overhead for fragments, - less than half max packets-per-second performance and - additional latency for tunnel packets hit by IPv6 PMTU discovery I was surprised to learn that this would happen periodically, every time the PMTU cache expires. Seems inherent in the IPv6 design as there's no way (AFAICT) for the kernel to validate the PMTU before the cache expires (like is done for NDP for example).
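Two of the steps discussed here can be sketched in Bash: reading an explicit `mtu` attribute from `ip route get` output, and subtracting per-family overhead. `route_mtu` and `tunnel_mtu_for` are hypothetical helper names. The regex is the one wg-quick's set_mtu_up already uses, the sample line is the `ip route get` output quoted above, and the overhead figures (a 20- or 40-byte IP header plus 40 bytes for UDP and WireGuard) come from the patch description. Nothing here runs `ip`; the output is passed in as a string.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the "mtu N" attribute from one line of
# `ip route get` output, or nothing if the route carries no explicit mtu
# (wg-quick then falls back to the device MTU).
route_mtu() {
    local line=$1
    if [[ $line =~ mtu\ ([0-9]+) ]]; then
        echo "${BASH_REMATCH[1]}"
    fi
}

# Hypothetical helper: tunnel MTU for one endpoint address.
# mode=conservative always assumes a 40-byte IPv6 outer header (current
# behaviour); mode=per-family uses 20 bytes for IPv4 endpoints (the patch).
tunnel_mtu_for() {
    local addr=$1 underlay=$2 mode=$3 iph=40 wgh=40
    if [[ $mode == per-family && $addr != *:* ]]; then
        iph=20
    fi
    echo $(( underlay - iph - wgh ))
}

sample='2001:db8::1 from :: via fe80::1 dev eth0 src 2001:db8:2 metric 1 mtu 9000 pref medium'
route_mtu "$sample"                          # prints 9000
tunnel_mtu_for 192.0.2.1 1500 per-family     # prints 1440
tunnel_mtu_for 192.0.2.1 1500 conservative   # prints 1420
tunnel_mtu_for 2001:db8::1 1500 per-family   # prints 1420
```

The last three lines show the trade-off being argued. Per-family sizing gains 20 bytes for an IPv4 endpoint, while the conservative choice yields 1420 either way and so stays valid if the peer later roams to an IPv6 endpoint.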
> Signed-off-by: Thomas Brierley > --- > src/wg-quick/linux.bash | 41 ++++++++++++++++++++++++++--------------- > 1 file changed, 26 insertions(+), 15 deletions(-) > > diff --git a/src/wg-quick/linux.bash b/src/wg-quick/linux.bash > index 4193ce5..5aba2cb 100755 > --- a/src/wg-quick/linux.bash > +++ b/src/wg-quick/linux.bash > @@ -123,22 +123,33 @@ add_addr() { > } > > set_mtu_up() { > - local mtu=0 endpoint output > - if [[ -n $MTU ]]; then > - cmd ip link set mtu "$MTU" up dev "$INTERFACE" > - return > - fi > - while read -r _ endpoint; do > - [[ $endpoint =~ ^\[?([a-z0-9:.]+)\]?:[0-9]+$ ]] || continue > - output="$(ip route get "${BASH_REMATCH[1]}" || true)" > - [[ ( $output =~ mtu\ ([0-9]+) || ( $output =~ dev\ ([^ ]+) && $(ip link show dev "${BASH_REMATCH[1]}") =~ mtu\ ([0-9]+) ) ) && ${BASH_REMATCH[1]} -gt $mtu ]] && mtu="${BASH_REMATCH[1]}" > - done < <(wg show "$INTERFACE" endpoints) > - if [[ $mtu -eq 0 ]]; then > - read -r output < <(ip route show default || true) || true > - [[ ( $output =~ mtu\ ([0-9]+) || ( $output =~ dev\ ([^ ]+) && $(ip link show dev "${BASH_REMATCH[1]}") =~ mtu\ ([0-9]+) ) ) && ${BASH_REMATCH[1]} -gt $mtu ]] && mtu="${BASH_REMATCH[1]}" > + local dev devmtu end endmtu iph=40 wgh=40 mtu > + # Device MTU > + if [[ -n $(ip route show default) ]]; then > + [[ $(ip route show default) =~ dev\ ([^ ]+) ]] > + dev=${BASH_REMATCH[1]} > + [[ $(ip addr show $dev scope global) =~ inet6 ]] && > + iph=40 || iph=20 > + if [[ $(ip link show dev $dev) =~ mtu\ ([0-9]+) ]]; then > + devmtu=${BASH_REMATCH[1]} > + [[ $(( devmtu - iph - wgh )) -gt $mtu ]] && > + mtu=$(( devmtu - iph - wgh )) > + fi > + # Endpoint MTU > + while read -r _ end; do > + [[ $end =~ ^\[?([a-f0-9:.]+)\]?:[0-9]+$ ]] > + end=${BASH_REMATCH[1]} > + [[ $end =~ [:] ]] && > + iph=40 || iph=20 > + ping -w 1 -M do -s $(( devmtu - iph - 8 )) $end &> /dev/null || true > + if [[ $(ip route get $end) =~ mtu\ ([0-9]+) ]]; then > + endmtu=${BASH_REMATCH[1]} > + [[ $(( endmtu - iph - wgh )) -lt 
$mtu ]] && > + mtu=$(( endmtu - iph - wgh )) > + fi > + done < <(wg show "$INTERFACE" endpoints) > fi > - [[ $mtu -gt 0 ]] || mtu=1500 > - cmd ip link set mtu $(( mtu - 80 )) up dev "$INTERFACE" > + cmd ip link set mtu ${MTU:-${mtu:-1420}} up dev "$INTERFACE" > } > > resolvconf_iface_prefix() { > -- > 2.30.2 > --Daniel From dxld at darkboxed.org Mon Nov 20 02:05:01 2023 From: dxld at darkboxed.org (Daniel Gröber) Date: Mon, 20 Nov 2023 03:05:01 +0100 Subject: [Babel-users] [RFC] Replace WireGuard AllowedIPs with IP route attribute In-Reply-To: <918e1d5b-9f11-4f9c-bf9a-94cb0d41ce2b@app.fastmail.com> <871qcn8d5g.wl-jch@irif.fr> Message-ID: <20231120020501.t6jw2k2u42y2ntqt@House.clients.dxld.at> Hi Erin, Juliusz, On Sat, Nov 18, 2023 at 11:21:57AM +0100, Erin Shepherd wrote: > On Sat, 18 Nov 2023, at 03:19, Daniel Gröber wrote: > > That would be a problem as I specifically want to tie the source address > > filtering to this too. I'll have a look at the internals (if and) when I > > get around to starting work on this. > > Is tying source address filtering to the routing table the right thing to > do here? It seems to me that it would cause issues similar to those we > see more generally with Unicast Reverse Path Filtering IMO not providing a way to do source address filtering at the routing level was the original sin :) There is certainly the multihoming challenge to be overcome, as traditional BCP38-style filtering doesn't cut it in the general case. I have some ideas on how to deal with this. I've done some experiments and found that in Linux, multi-nexthop routes actually match reverse path lookups (using nftables "rt") for _any_ of the source interfaces involved. I think this can be used to build RFC 3704-style Feasible Path Reverse Path Forwarding when the routing daemon involved supports ECMP.
This experiment is what got me interested in having via-wgpeer in the routing table in the first place; once we have that, we can apply the above idea not just at the interface level but at the wg peer level. Neat. Can you think of a use-case where fpRPF isn't enough? It's also noteworthy that once we have this support for via-wgpeer it'd be possible to apply ip-rule policy to the filtering decision. Perhaps that gives some additional power for more fun use-cases :) On Sat, Nov 18, 2023 at 01:22:03PM +0100, Juliusz Chroboczek wrote: > Issues are caused by the kernel performing filtering that the routing > protocol is not aware of: it causes the routing daemon's routing table to > no longer match the effective forwarding table (the kernel's routing > table). That's the reason why uRPF breaks most routing protocols, that's > the reason why we have trouble making Wireguard work with Babel, and also > the reason behind https://github.com/jech/babeld/issues/111. Right on the money as always. This idea has been on my mind too. > Contrariwise, we can teach Babel to explicitly take into account the > kernel features that we're interested in using. Thus, Babel could be > aware of the restrictions placed on a wireguard interface, and collaborate > with Wireguard so that the routing table and the forwarding table remain > congruent. I haven't looked at the issue in detail, but I believe that > would be an interesting (short-term) research project, one that I would be > glad to collaborate with (but not necessarily lead, at least not right now). Sounds interesting; do you have a funding source in mind? > For the specific case of source address filtering, Babel already has an > (implemented) extension to deal with source addresses, and I encourage you > to consider whether it can be used to deal with the issue at hand. Please > see https://arxiv.org/pdf/1403.0445.pdf and RFC 9079.
I don't think I mentioned this to you yet, but I have another one of my crazy ideas of doing something vaguely similar to BGP flowspec with babel. Restricted to IP source/destination address, so no L4 stuff. I just want to represent firewall policy using ipv6 subtrees and distribute it in realtime using babel :) Unfortunately this is currently stalled due to an apparent nft rt match kernel bug preventing me from representing multiple possible outcomes, since I want to support dropping and accepting, but also stateful firewalling of matching flows. --Daniel From dxld at darkboxed.org Wed Nov 22 07:39:35 2023 From: dxld at darkboxed.org (Daniel Gröber) Date: Wed, 22 Nov 2023 08:39:35 +0100 Subject: [Babel-users] [RFC] Replace WireGuard AllowedIPs with IP route attribute In-Reply-To: References: <918e1d5b-9f11-4f9c-bf9a-94cb0d41ce2b@app.fastmail.com> <871qcn8d5g.wl-jch@irif.fr> <20231120020501.t6jw2k2u42y2ntqt@House.clients.dxld.at> Message-ID: <20231122073935.7bkugxmv4wknsqfb@House.clients.dxld.at> Hi Alexander, On Wed, Nov 22, 2023 at 12:17:49AM +0100, Alexander Zubkov wrote: > > Can you think of a use-case where fpRPF isn't enough? > > Yes. IMHO, the problem with RPF is that the routing table doesn't reflect the > network topology, but only a subset of it. Right, that is the fundamental problem, so my solution to that is: routing should "just represent the full network topology" :) As the routing protocol sees it anyway, since the whole point of RPF is to only allow paths that the routing system chooses. Do note that while I implement the topology information using ECMP routes, there's no reason you actually have to use ECMP. You could still have regular routes in your (main) routing table and use a separate table with ECMP routes for RPF, and this is very much something I want us to support. > I mean in topologies where multiple paths are possible, you can choose > to use or even learn only a subset of those paths.
If I understand correctly, you're talking about (local) routing daemon policy here. Yes, this is something you can do, and my current approach of (abusing) ECMP only works when your routing policy satisfies some symmetry criteria. However, as Juliusz pointed out, integrating this idea into the routing protocol proper could allow using arbitrary policy without ever breaking RPF, but figuring out the details is (exciting) future work. > In that sense might be yes, the original sin is that the routing table > doesn't reflect all the topology, not only the paths we choose for egress. > Not sure though if it is a sin, in that case routing table would be too > overcomplicated. Right, routing table (modification) performance and clutter are certainly reasons to forgo this approach, but I find that for the kind of (small) networks I want to run, and that many people might run using wireguard, this is perfectly fine. > If I understand correctly, such fpRPF approach works only if you both learn > all possible paths and use all of them in a multi-nexthop route. But for > example in the Internet with its advanced BGP announcement policies it is > not true at all. Right, to deploy fpRPF on a large scale you really need some kind of support from the routing protocol. AFAIK there's nothing like that for BGP yet? I don't think it's completely inapplicable either, though; it might still work for iBGP with appropriately designed routing policy. My interest lies mostly with doing this using babel though. > So from my point of view it is good to split the topology definition > (ingress decapsulation) and the chosen paths (egress routing). Because it > is related, but still different processes. So the system can be more > flexible. Although we need to repeat common things and keep ingress and > egress consistent/synced. To me flexibility is only desirable insofar as it doesn't conflict with system security. Source address authenticity is an important property I wouldn't want to give up here.
If it's easier to ignore source address filtering than it is to implement it, nobody is going to do it (cf. the internet), and I think that's the crux of the problem with "encap". Wireguard gifted us this amazing state of source filtering being the easy default and I want to keep it that way. > my point is that RPF (with its variations too) has its bounds and cannot > be a universal solution, there is no silver bullet here. No, ofc. nothing we do can possibly "fix everything for everyone" but that's no reason not to try a new approach for a particular problem in a particular use-case :) --Daniel From tomxor at gmail.com Thu Nov 23 03:33:39 2023 From: tomxor at gmail.com (Thomas Brierley) Date: Thu, 23 Nov 2023 03:33:39 +0000 Subject: [PATCH] wg-quick: linux: fix MTU calculation (use PMTUD) In-Reply-To: <20231120011701.asllvpzuffih34wz@House.clients.dxld.at> References: <20231029192210.120316-1-tomxor@gmail.com> <20231120011701.asllvpzuffih34wz@House.clients.dxld.at> Message-ID: Hi Daniel, Thanks for having a look at this. On Mon, 20 Nov 2023 at 01:17, Daniel Gröber wrote: > > because this only queries the routing cache. To > > trigger PMTUD on the endpoint and fill this cache, it is necessary to > > send an ICMP with the DF bit set. > > I don't think this is useful. Path MTU may change; doing this only once > when the interface comes up just makes wg-quick less predictable IMO. Yes, I understand PMTU may change, usually when changing internet connection. There is also the issue of bringing up an interface without a connection, such as when using the wg-quick startup service. Accommodating dynamic PMTU is probably out of scope of the wg-quick script, but is something I would like to look into separately.
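The DF-bit probe that the patch uses to prime the route cache can be sketched as follows. `probe_command` is a hypothetical helper that only prints the ping command rather than running it. It parses one line of `wg show <interface> endpoints` output with the same regex as the patch, and sizes the ICMP payload as device MTU minus IP header (20 or 40 bytes) minus the 8-byte ICMP header, so the probe packet is exactly one device MTU long.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build (but do not run) the DF-bit probe for one peer.
# $1: a "<public-key> <endpoint>" line as printed by `wg show <iface> endpoints`
# $2: MTU of the underlay device
probe_command() {
    local line=$1 devmtu=$2 endpoint addr iph
    read -r _ endpoint <<< "$line"
    # Same pattern as the patch: optional [brackets] around the address, then :port.
    [[ $endpoint =~ ^\[?([a-f0-9:.]+)\]?:[0-9]+$ ]] || return 1
    addr=${BASH_REMATCH[1]}
    if [[ $addr == *:* ]]; then iph=40; else iph=20; fi
    # payload = device MTU - IP header - 8-byte ICMP header
    echo "ping -w 1 -M do -s $(( devmtu - iph - 8 )) $addr"
}

probe_command 'AAAA= 192.0.2.1:51820' 1500       # prints: ping -w 1 -M do -s 1472 192.0.2.1
probe_command 'BBBB= [2001:db8::1]:51820' 1500   # prints: ping -w 1 -M do -s 1452 2001:db8::1
```

As noted in the thread, a probe like this only captures the path MTU at the moment the interface comes up. If the path changes later, the cached value can go stale.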
I still think it would be beneficial to set the MTU optimally, if only upon
bringing an interface up, because PMTU is usually stable for a particular
gateway, and having this built in makes it far easier for users to
automatically obtain the appropriate MTU. I think it also more accurately
reflects the man page, which suggests automatic discovery.

> > 2. Consider IPv6/4 Header Size
> >
> > Currently an 80 byte header size is assumed i.e. IPv6=40 + WireGuard=40.
> > However this is not optimal in the case of IPv4. Since determining the
> > IP header size is required for PMTUD anyway, this is now optimised as a
> > side effect of endpoint MTU calculation.
>
> This is not a good idea. Consider what happens when a peer roams from an
> IPv4 to a IPv6 endpoint address. It's better to be conservative and assume
> IPv6 sized overhead, besides IPv4 is legacy anyway ;)

MTU calculation is performed independently for each endpoint, with a
separate header size calculation accommodating both IPv4 and IPv6 addresses
alongside each other. The smallest MTU of all endpoints is used, so
switching from an IPv4 to an IPv6 endpoint should not result in an MTU
which is too large due to IP header size differences. In my case the
current behaviour is not conservative enough, but that is due to the
absence of PMTUD rather than to the assumed IP header sizes.

> > 3. Use Smallest Endpoint MTU
> >
> > Currently in the case of multiple endpoints the largest endpoint path
> > MTU is used. However WireGuard will dynamically switch between endpoints
> > when e.g. one fails, so the smallest MTU is now used to ensure all
> > endpoints will function correctly.
>
> "function correctly". Do note that WireGuard lets it's UDP packets be
> fragmented. So connectivity will still work even when the wg device MTU
> doesn't match the (current) PMTU.
> The only downsides to this mismatch being
> performance:
>
> - additional header overhead for fragments,
> - less than half max packets-per-second performance and
> - additional lateny for tunnel packets hit by IPv6 PMTU discovery
>
> I was surprised to learn that this would happen periodically, every time
> the PMTU cache expires. Seems inherent in the IPv6 design as there's no
> way (AFAICT) for the kernel to validate the PMTU before the cache
> expires (like is done for NDP for example).

So, the reason I ended up tinkering with WireGuard MTU is due to real
world reliability issues. Although the risk in setting it optimally based
on PMTU remains unclear to me, marginal performance gains are not what
brought me here. Networking is not my area of expertise, so the best I can
do is lay out my experience and see if you think it adds any weight in
favour of this change in behaviour, because I haven't done a full root
cause analysis:

I found that browsing the web over WireGuard with an MTU set larger than
the PMTU resulted in randomly stalled HTTP requests. This is noticeable
even with a single stalled HTTP request due to HTTP/1.1 head-of-line
blocking. I tested this manually with individual HTTP requests with a
large enough payload, verifying that it only occurs over WireGuard
connections.

With naked HTTP/TCP the network seems happy (I assume it is fragmenting
packets), but over WireGuard, somehow, some packets just seem to get
dropped. Maybe UDP is getting treated differently, or maybe what's
actually happening is the network is blackholing in both cases but PMTUD
is figuring this out in the case of TCP (RFC 2923), and maybe that stops
working when encapsulated in UDP? But this is pure speculation; I'm out of
my depth here and haven't dug any deeper.

This behaviour is probably network operator dependent, or specific to LTE
networks, which I use for permanent internet access, and which commonly
use a lower than average MTU.
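The overhead figures in this exchange come from per-packet encapsulation. Each packet carries an outer IP header (20 bytes for IPv4, 40 for IPv6), an 8-byte UDP header, and 32 bytes of WireGuard data-message header and authentication tag, for 60 or 80 bytes in total. Below is a sketch of the minimum-over-endpoints calculation; all names are hypothetical. `per_family` switches between always subtracting the conservative IPv6 overhead and using the per-family overhead the patch proposes. For an IPv4 endpoint with a 1380-byte route MTU it yields 1300 or 1320 respectively. If no route MTU is known, it falls back to the given link MTU minus 80, i.e. 1420 for 1500.

```c
#include <assert.h>
#include <stddef.h>

/* Outer IP header + UDP header (8) + WireGuard data header and tag (16 + 16). */
enum {
	WG_OVERHEAD_V4 = 20 + 8 + 32, /* 60 */
	WG_OVERHEAD_V6 = 40 + 8 + 32, /* 80, what wg-quick subtracts today */
};

struct endpoint_route {
	int is_ipv6;   /* address family of the endpoint (outer header) */
	int route_mtu; /* MTU of the route towards it, 0 if unknown */
};

/*
 * Smallest tunnel MTU that fits every endpoint. With per_family == 0 the
 * conservative IPv6 overhead is used for every endpoint; otherwise IPv4
 * endpoints get the smaller IPv4 overhead. Falls back to
 * fallback_link_mtu (e.g. the default route's MTU, or 1500) minus the
 * IPv6 overhead when no usable route MTU is known.
 */
static int tunnel_mtu(const struct endpoint_route *eps, size_t n,
		      int fallback_link_mtu, int per_family)
{
	int best = 0; /* 0 means "nothing known yet" */

	for (size_t i = 0; i < n; i++) {
		int overhead = (per_family && !eps[i].is_ipv6) ? WG_OVERHEAD_V4
							      : WG_OVERHEAD_V6;
		int candidate;

		if (eps[i].route_mtu <= overhead)
			continue; /* unknown or nonsensical route MTU */
		candidate = eps[i].route_mtu - overhead;
		if (best == 0 || candidate < best)
			best = candidate;
	}
	if (best == 0)
		best = fallback_link_mtu - WG_OVERHEAD_V6;
	return best;
}
```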
For example, my current ISP uses 1380, and the current wg-quick behaviour
is to set the MTU to the default route interface MTU less 80 bytes (1420
for regular interfaces), which results in the above behaviour.

I've used all four of the major mobile network operators in my country and
experienced this on two of them (separate physical networks, not virtual
operators). The other two used an MTU of 1500 anyway.

Just to prove I'm not entirely on my own, this issue also appears to be
known to WireGuard VPN providers, e.g. from Mullvad's FAQ:

> The default MTU (maximum transmission unit) for WireGuard in the Mullvad
> app is 1380. You can set it to 1280 if the WireGuard connection stops
> working. This may be necessary in some mobile networks.

I suppose it could be argued this is not a WireGuard concern, and that
mobile networks are behaving weirdly. Also, IME it's not entirely
unreliable above the optimal MTU, it's just *less* reliable.

I had not anticipated such a patch would have any downsides; I saw this as
a general deficiency, although I appreciate, as you pointed out, that it is
not a 100% complete solution. I'm more interested in what your concerns are
and what you think of the above, but will move along if you still think
it's not suitable.

Cheers
Tom

From dxld at darkboxed.org Mon Nov 27 14:19:24 2023
From: dxld at darkboxed.org (Daniel =?utf-8?Q?Gr=C3=B6ber?=)
Date: Mon, 27 Nov 2023 15:19:24 +0100
Subject: Wg fragment packet blackholing issue (Was: [PATCH] wg-quick: linux: fix MTU calculation (use PMTUD))
In-Reply-To:
References: <20231029192210.120316-1-tomxor@gmail.com>
 <20231120011701.asllvpzuffih34wz@House.clients.dxld.at>
Message-ID: <20231127141924.txr6kdhtk6ainrur@House.clients.dxld.at>

Hi Tom,

On Thu, Nov 23, 2023 at 03:33:39AM +0000, Thomas Brierley wrote:
> On Mon, 20 Nov 2023 at 01:17, Daniel Gröber wrote:
> > > because this only queries the routing cache. To
> > > trigger PMTUD on the endpoint and fill this cache, it is necessary to
> > > send an ICMP with the DF bit set.
> >
> > I don't think this is useful. Path MTU may change, doing this only once
> > when the interface comes up just makes wg-quick less predictable IMO.
>
> Yes, I understand PMTU may change, usually when changing
> internet connection.

Just to make sure you understand the distinction: PMTU may change at any
time. It's a property of the *path* IP packets take to get to their
destination, specifically the minimum *link* MTU involved in forwarding
your packet on the routers along the path.

> > > 2. Consider IPv6/4 Header Size
> > >
> > > Currently an 80 byte header size is assumed i.e. IPv6=40 + WireGuard=40.
> > > However this is not optimal in the case of IPv4. Since determining the
> > > IP header size is required for PMTUD anyway, this is now optimised as a
> > > side effect of endpoint MTU calculation.
> >
> > This is not a good idea. Consider what happens when a peer roams from an
> > IPv4 to a IPv6 endpoint address. It's better to be conservative and assume
> > IPv6 sized overhead, besides IPv4 is legacy anyway ;)
>
> MTU calculation is performed independently for each endpoint

I think you misunderstand: this is not about different peers having
different endpoint IPs, it's about one peer (say, a laptop) moving from an
IPv4 network to an IPv6 one. In this case, doing the PMTU "optimization"
on the v4 network would cause fragmentation on the v6 network.

> > "function correctly". Do note that WireGuard lets it's UDP packets be
> > fragmented. So connectivity will still work even when the wg device MTU
> > doesn't match the (current) PMTU.
> > The only downsides to this mismatch being
> > performance:
> >
> > - additional header overhead for fragments,
> > - less than half max packets-per-second performance and
> > - additional lateny for tunnel packets hit by IPv6 PMTU discovery
> >
> > I was surprised to learn that this would happen periodically, every time
> > the PMTU cache expires. Seems inherent in the IPv6 design as there's no
> > way (AFAICT) for the kernel to validate the PMTU before the cache
> > expires (like is done for NDP for example).
>
> So, the reason I ended up tinkering with WireGuard MTU is due to real
> world reliability issues. Although the risk in setting it optimally
> based on PMTU remains unclear to me, marginal performance gains are not
> what brought me here. Networking is not my area of expertise, so the
> best I can do is lay out my experience and see if you think it adds any
> weight in favour of this change in behaviour, because I haven't done a
> full root cause analysis:
>
> I found that browsing the web over WireGuard with an MTU set larger than
> the PMTU resulted in randomly stalled HTTP requests. This is noticeable
> even with a single stalled HTTP request due to the HTTP 1.1 head of line
> blocking issue. I tested this manually with individual HTTP requests
> with a large enough payload, verifying that it only occurs over
> WireGuard connections.

Hanging TCP/HTTP connections are a typical symptom of broken IP
fragmentation, and there are a number of possible causes. There's a
browser-based test for the common issues here (thanks majek):

  http://icmpcheck.popcount.org/   (IPv4)
  http://icmpcheck6.popcount.org/  (IPv6)

Try it (without any active VPNs or tunnels, obviously) to see if your
internet provider's network is fundamentally broken.

Note: the IPv6 test can give wrong results on networks deploying
DNS64+NAT64. To make sure this doesn't affect you, check that
ipv4only.arpa fails to resolve. Expected:

  $ getent ahostsv6 ipv4only.arpa; echo rv=$?
  rv=2

If broken fragmentation is the root problem, you can work around it in a
number of ways. The easiest is probably TCP MSS clamping on your gateway
or host(s).

> With naked HTTP/TCP the network seems happy, I assume it is fragmenting
> packets; but over WireGuard, somehow, some packets just seem to get
> dropped. Maybe UDP is getting treated differently

That's entirely possible. UDP is (unfortunately) commonly abused for DDoS,
so network operators may filter it in desperation and pray no customer
notices. So it's up to you to notice, complain loudly, and switch
providers if all else fails :)

> or maybe what's actually happening is the network is blackholing in both
> cases but PMTUD is figuring this out in the case of TCP (RFC 2923), and
> maybe that stops working when encapsulated in UDP?...

PMTUD is independent of the L4 protocol, so it normally works either way,
but that always assumes your network operator is doing their job
correctly. It's possible to filter (e.g.) only ICMP-PTB errors involving
UDP packets. While this sounds pointless to me, it could be some attempt
at UDP fragment attack DDoS mitigation.

Speaking of RFC 2923, another possible workaround is enabling RFC 4821
(Packetization Layer Path MTU Discovery) behaviour on your hosts, but MSS
clamping has the advantage of letting you control this on a gateway,
i.e. without touching (all) your hosts.

> This behaviour is probably network operator dependent, or specific to
> LTE networks, which I use for permanent internet access, and which
> commonly use a lower than average MTU. For example my current ISP uses
> 1380, and the current wg-quick behaviour is to set the MTU to the
> default route interface MTU less 80 bytes (1420 for regular interfaces),
> which results in the above behaviour.
>
> I've used all four of the major mobile network operators in my country
> and experienced this on two of them (separate physical networks, not
> virtual operators). The other two used an MTU of 1500 anyway.
>
> Just to prove I'm not entirely on my own, this issue also appears to be
> known to WireGuard VPN providers, .e.g from Mullvad's FAQ:

I have no doubt the problem you're facing is real, but depending on the
results from the test above, you may just be complaining to the wrong
party ;)

Happy to help you figure out how to work around any ISP bullshit once we
figure out what it is that's happening.

--Daniel

From dxld at darkboxed.org Mon Nov 27 15:33:06 2023
From: dxld at darkboxed.org (=?UTF-8?q?Daniel=20Gr=C3=B6ber?=)
Date: Mon, 27 Nov 2023 16:33:06 +0100
Subject: [PATCH net-next v2] wireguard: Add netlink attrs for binding to address and netdev
Message-ID: <20231127153306.1975792-1-dxld@darkboxed.org>

Multihomed hosts may want to run distinct wg tunnels across all their
uplinks for redundant connectivity. Currently this entails picking
different ports for each wg tunnel, since we only allow binding to the
wildcard address. Sharing a single port number for all uplink connections
(but bound to a particular IP/netdev) simplifies management considerably.

A closely related use-case that also touches the socket binding code is
having a wg socket be part of a VRF. This mirrors how we support socket and
wg device in distinct namespaces. To make using VRFs with wg easy, we want
to be able to bind to a particular device, as this will cause the kernel to
automatically route all outgoing packets with the VRF's routing table and
(in the default udp_l3mdev_accept=0 config) only accept packets from
interfaces in the VRF, without the need for netfilter rules.

While users can currently use VRFs for wg tunnel traffic by configuring
fwmark ip-rules and setting sysctl udp_l3mdev_accept=1 (with or without
additional nft filtering), this is at best a kludge. When VRF membership
changes it becomes a major hassle to keep ip-rules up to date.
Signed-off-by: Daniel Gröber
---
Changes in v2:
  - Fix building without CONFIG_IPV6

 drivers/net/wireguard/device.c  |  4 +--
 drivers/net/wireguard/device.h  |  3 +-
 drivers/net/wireguard/netlink.c | 56 ++++++++++++++++++++++++++++-----
 drivers/net/wireguard/socket.c  | 41 +++++++++++++++---------
 drivers/net/wireguard/socket.h  |  3 +-
 include/uapi/linux/wireguard.h  |  6 ++++
 6 files changed, 88 insertions(+), 25 deletions(-)

diff --git a/drivers/net/wireguard/device.c b/drivers/net/wireguard/device.c
index deb9636b0ecf..ec28f5021791 100644
--- a/drivers/net/wireguard/device.c
+++ b/drivers/net/wireguard/device.c
@@ -48,7 +48,7 @@ static int wg_open(struct net_device *dev)
 		dev_v6->cnf.addr_gen_mode = IN6_ADDR_GEN_MODE_NONE;
 
 	mutex_lock(&wg->device_update_lock);
-	ret = wg_socket_init(wg, wg->incoming_port);
+	ret = wg_socket_init(wg, wg->port_cfg);
 	if (ret < 0)
 		goto out;
 	list_for_each_entry(peer, &wg->peer_list, peer_list) {
@@ -249,7 +249,7 @@ static void wg_destruct(struct net_device *dev)
 	rtnl_unlock();
 	mutex_lock(&wg->device_update_lock);
 	rcu_assign_pointer(wg->creating_net, NULL);
-	wg->incoming_port = 0;
+	memzero_explicit(&wg->port_cfg, sizeof(wg->port_cfg));
 	wg_socket_reinit(wg, NULL, NULL);
 	/* The final references are cleared in the below calls to destroy_workqueue. */
 	wg_peer_remove_all(wg);
diff --git a/drivers/net/wireguard/device.h b/drivers/net/wireguard/device.h
index 43c7cebbf50b..ac4092d8c9d0 100644
--- a/drivers/net/wireguard/device.h
+++ b/drivers/net/wireguard/device.h
@@ -17,6 +17,7 @@
 #include
 #include
 #include
+#include
 
 struct wg_device;
 
@@ -53,7 +54,7 @@ struct wg_device {
 	atomic_t handshake_queue_len;
 	unsigned int num_peers, device_update_gen;
 	u32 fwmark;
-	u16 incoming_port;
+	struct udp_port_cfg port_cfg;
 };
 
 int wg_device_init(void);
diff --git a/drivers/net/wireguard/netlink.c b/drivers/net/wireguard/netlink.c
index e220d761b1f2..cfc4f92d3dba 100644
--- a/drivers/net/wireguard/netlink.c
+++ b/drivers/net/wireguard/netlink.c
@@ -26,6 +26,8 @@ static const struct nla_policy device_policy[WGDEVICE_A_MAX + 1] = {
 	[WGDEVICE_A_PUBLIC_KEY] = NLA_POLICY_EXACT_LEN(NOISE_PUBLIC_KEY_LEN),
 	[WGDEVICE_A_FLAGS] = { .type = NLA_U32 },
 	[WGDEVICE_A_LISTEN_PORT] = { .type = NLA_U16 },
+	[WGDEVICE_A_LISTEN_ADDR] = NLA_POLICY_MIN_LEN(sizeof(struct in_addr)),
+	[WGDEVICE_A_LISTEN_IFINDEX] = { .type = NLA_U32 },
 	[WGDEVICE_A_FWMARK] = { .type = NLA_U32 },
 	[WGDEVICE_A_PEERS] = { .type = NLA_NESTED }
 };
@@ -230,11 +232,22 @@ static int wg_get_device_dump(struct sk_buff *skb, struct netlink_callback *cb)
 
 	if (!ctx->next_peer) {
 		if (nla_put_u16(skb, WGDEVICE_A_LISTEN_PORT,
-				wg->incoming_port) ||
+				ntohs(wg->port_cfg.local_udp_port)) ||
+		    nla_put_u32(skb, WGDEVICE_A_LISTEN_IFINDEX, wg->port_cfg.bind_ifindex) ||
 		    nla_put_u32(skb, WGDEVICE_A_FWMARK, wg->fwmark) ||
 		    nla_put_u32(skb, WGDEVICE_A_IFINDEX, wg->dev->ifindex) ||
 		    nla_put_string(skb, WGDEVICE_A_IFNAME, wg->dev->name))
 			goto out;
+		if (wg->port_cfg.family == AF_INET &&
+		    nla_put_in_addr(skb, WGDEVICE_A_LISTEN_ADDR,
+				    wg->port_cfg.local_ip.s_addr))
+			goto out;
+#if IS_ENABLED(CONFIG_IPV6)
+		if (wg->port_cfg.family == AF_INET6 &&
+		    nla_put_in6_addr(skb, WGDEVICE_A_LISTEN_ADDR,
+				     &wg->port_cfg.local_ip6))
+			goto out;
+#endif
 
 		down_read(&wg->static_identity.lock);
 		if (wg->static_identity.has_identity) {
@@ -311,19 +324,49 @@ static int wg_get_device_done(struct netlink_callback *cb)
 	return 0;
 }
 
-static int set_port(struct wg_device *wg, u16 port)
+static int set_port_cfg(struct wg_device *wg, struct nlattr **attrs)
 {
 	struct wg_peer *peer;
+	struct udp_port_cfg port_cfg = {
+		.family = AF_UNSPEC,
+	};
+
+	if (attrs[WGDEVICE_A_LISTEN_PORT])
+		port_cfg.local_udp_port =
+			htons(nla_get_u16(attrs[WGDEVICE_A_LISTEN_PORT]));
+	if (attrs[WGDEVICE_A_LISTEN_ADDR]) {
+		union {
+			struct in_addr addr4;
+			struct in6_addr addr6;
+		} *u_addr = nla_data(attrs[WGDEVICE_A_LISTEN_ADDR]);
+		size_t len = nla_len(attrs[WGDEVICE_A_LISTEN_ADDR]);
+		if (len == sizeof(struct in_addr)) {
+			port_cfg.family = AF_INET;
+			port_cfg.local_ip = u_addr->addr4;
+		} else if (len == sizeof(struct in6_addr)) {
+#if IS_ENABLED(CONFIG_IPV6)
+			port_cfg.family = AF_INET6;
+			port_cfg.local_ip6 = u_addr->addr6;
+#else
+			return -EAFNOSUPPORT;
+#endif
+
+		}
+	}
+	if (attrs[WGDEVICE_A_LISTEN_IFINDEX]) {
+		port_cfg.bind_ifindex =
+			nla_get_u32(attrs[WGDEVICE_A_LISTEN_IFINDEX]);
+	}
 
-	if (wg->incoming_port == port)
+	if (memcmp(&port_cfg, &wg->port_cfg, sizeof(port_cfg)) == 0)
 		return 0;
 	list_for_each_entry(peer, &wg->peer_list, peer_list)
 		wg_socket_clear_peer_endpoint_src(peer);
 	if (!netif_running(wg->dev)) {
-		wg->incoming_port = port;
+		wg->port_cfg = port_cfg;
 		return 0;
 	}
-	return wg_socket_init(wg, port);
+	return wg_socket_init(wg, port_cfg);
 }
 
 static int set_allowedip(struct wg_peer *peer, struct nlattr **attrs)
@@ -531,8 +574,7 @@ static int wg_set_device(struct sk_buff *skb, struct genl_info *info)
 	}
 
 	if (info->attrs[WGDEVICE_A_LISTEN_PORT]) {
-		ret = set_port(wg,
-			       nla_get_u16(info->attrs[WGDEVICE_A_LISTEN_PORT]));
+		ret = set_port_cfg(wg, info->attrs);
 		if (ret)
 			goto out;
 	}
diff --git a/drivers/net/wireguard/socket.c b/drivers/net/wireguard/socket.c
index 0414d7a6ce74..47bb46e0cdd9 100644
--- a/drivers/net/wireguard/socket.c
+++ b/drivers/net/wireguard/socket.c
@@ -346,7 +346,7 @@ static void set_sock_opts(struct socket *sock)
 	sk_set_memalloc(sock->sk);
 }
 
-int wg_socket_init(struct wg_device *wg, u16 port)
+int wg_socket_init(struct wg_device *wg, struct udp_port_cfg port_cfg)
 {
 	struct net *net;
 	int ret;
@@ -356,12 +356,7 @@ int wg_socket_init(struct wg_device *wg, u16 port)
 		.encap_rcv = wg_receive
 	};
 	struct socket *new4 = NULL, *new6 = NULL;
-	struct udp_port_cfg port4 = {
-		.family = AF_INET,
-		.local_ip.s_addr = htonl(INADDR_ANY),
-		.local_udp_port = htons(port),
-		.use_udp_checksums = true
-	};
+	struct udp_port_cfg port4;
 #if IS_ENABLED(CONFIG_IPV6)
 	int retries = 0;
 	struct udp_port_cfg port6 = {
@@ -373,6 +368,23 @@ int wg_socket_init(struct wg_device *wg, u16 port)
 	};
 #endif
 
+	if (port_cfg.family == AF_UNSPEC) {
+		port4 = (struct udp_port_cfg) {
+			.family = AF_INET,
+			.local_ip.s_addr = htonl(INADDR_ANY),
+			.local_udp_port = port_cfg.local_udp_port,
+			.use_udp_checksums = true
+		};
+	} else {
+		port4 = port_cfg;
+		port4.use_udp_checksums = true;
+		if (IS_ENABLED(CONFIG_IPV6) && port_cfg.family == AF_INET6) {
+			port4.use_udp6_tx_checksums = true;
+			port4.use_udp6_rx_checksums = true;
+			port4.ipv6_v6only = true;
+		}
+	}
+
 	rcu_read_lock();
 	net = rcu_dereference(wg->creating_net);
 	net = net ? maybe_get_net(net) : NULL;
@@ -380,10 +392,6 @@ int wg_socket_init(struct wg_device *wg, u16 port)
 	if (unlikely(!net))
 		return -ENONET;
 
-#if IS_ENABLED(CONFIG_IPV6)
-retry:
-#endif
-
 	ret = udp_sock_create(net, &port4, &new4);
 	if (ret < 0) {
 		pr_err("%s: Could not create IPv4 socket\n", wg->dev->name);
@@ -392,13 +400,18 @@ int wg_socket_init(struct wg_device *wg, u16 port)
 	set_sock_opts(new4);
 	setup_udp_tunnel_sock(net, new4, &cfg);
 
+	if (port_cfg.family != AF_UNSPEC)
+		goto reinit;
+
 #if IS_ENABLED(CONFIG_IPV6)
+retry:
 	if (ipv6_mod_enabled()) {
 		port6.local_udp_port = inet_sk(new4->sk)->inet_sport;
 		ret = udp_sock_create(net, &port6, &new6);
 		if (ret < 0) {
 			udp_tunnel_sock_release(new4);
-			if (ret == -EADDRINUSE && !port && retries++ < 100)
+			if (ret == -EADDRINUSE && !port_cfg.local_udp_port &&
+			    retries++ < 100)
 				goto retry;
 			pr_err("%s: Could not create IPv6 socket\n",
 			       wg->dev->name);
@@ -409,6 +422,8 @@ int wg_socket_init(struct wg_device *wg, u16 port)
 	}
 #endif
 
+reinit:
+	wg->port_cfg = port_cfg;
 	wg_socket_reinit(wg, new4->sk, new6 ? new6->sk : NULL);
 	ret = 0;
 out:
@@ -428,8 +443,6 @@ void wg_socket_reinit(struct wg_device *wg, struct sock *new4,
 				lockdep_is_held(&wg->socket_update_lock));
 	rcu_assign_pointer(wg->sock4, new4);
 	rcu_assign_pointer(wg->sock6, new6);
-	if (new4)
-		wg->incoming_port = ntohs(inet_sk(new4)->inet_sport);
 	mutex_unlock(&wg->socket_update_lock);
 	synchronize_net();
 	sock_free(old4);
diff --git a/drivers/net/wireguard/socket.h b/drivers/net/wireguard/socket.h
index bab5848efbcd..1532a263c518 100644
--- a/drivers/net/wireguard/socket.h
+++ b/drivers/net/wireguard/socket.h
@@ -10,8 +10,9 @@
 #include
 #include
 #include
+#include
 
-int wg_socket_init(struct wg_device *wg, u16 port);
+int wg_socket_init(struct wg_device *wg, struct udp_port_cfg port);
 void wg_socket_reinit(struct wg_device *wg, struct sock *new4,
 		      struct sock *new6);
 int wg_socket_send_buffer_to_peer(struct wg_peer *peer, void *data,
diff --git a/include/uapi/linux/wireguard.h b/include/uapi/linux/wireguard.h
index ae88be14c947..240d1c850dfd 100644
--- a/include/uapi/linux/wireguard.h
+++ b/include/uapi/linux/wireguard.h
@@ -28,6 +28,8 @@
  *    WGDEVICE_A_PRIVATE_KEY: NLA_EXACT_LEN, len WG_KEY_LEN
  *    WGDEVICE_A_PUBLIC_KEY: NLA_EXACT_LEN, len WG_KEY_LEN
  *    WGDEVICE_A_LISTEN_PORT: NLA_U16
+ *    WGDEVICE_A_LISTEN_ADDR : NLA_MIN_LEN(struct sockaddr), struct sockaddr_in or struct sockaddr_in6
+ *    WGDEVICE_A_LISTEN_IFINDEX : NLA_U32
  *    WGDEVICE_A_FWMARK: NLA_U32
  *    WGDEVICE_A_PEERS: NLA_NESTED
  *        0: NLA_NESTED
@@ -82,6 +84,8 @@
  *    peers should be removed prior to adding the list below.
  *    WGDEVICE_A_PRIVATE_KEY: len WG_KEY_LEN, all zeros to remove
  *    WGDEVICE_A_LISTEN_PORT: NLA_U16, 0 to choose randomly
+ *    WGDEVICE_A_LISTEN_ADDR : struct sockaddr_in or struct sockaddr_in6.
+ *    WGDEVICE_A_LISTEN_IFINDEX : NLA_U32
  *    WGDEVICE_A_FWMARK: NLA_U32, 0 to disable
  *    WGDEVICE_A_PEERS: NLA_NESTED
  *        0: NLA_NESTED
@@ -157,6 +161,8 @@ enum wgdevice_attribute {
 	WGDEVICE_A_LISTEN_PORT,
 	WGDEVICE_A_FWMARK,
 	WGDEVICE_A_PEERS,
+	WGDEVICE_A_LISTEN_ADDR,
+	WGDEVICE_A_LISTEN_IFINDEX,
 	__WGDEVICE_A_LAST
 };
 #define WGDEVICE_A_MAX (__WGDEVICE_A_LAST - 1)
--
2.39.2

From dxld at darkboxed.org Mon Nov 27 17:38:09 2023
From: dxld at darkboxed.org (=?UTF-8?q?Daniel=20Gr=C3=B6ber?=)
Date: Mon, 27 Nov 2023 18:38:09 +0100
Subject: [PATCH] wg-quick: Pick smallest MTU of any endpoint route
Message-ID: <20231127173809.2050198-1-dxld@darkboxed.org>

If the goal is to avoid fragmentation, picking the largest MTU of any
endpoint route doesn't make sense. Users with networks supporting jumbo
frames (MTU>1500) can still set MTU=, but most users will want to avoid
fragmentation by default.

Incidentally, android.c already had the correct behaviour:

	next_mtu = get_route_mtu(endpoint);
	if (next_mtu > 0 && next_mtu < endpoint_mtu)
		endpoint_mtu = next_mtu;

Only the shell implementations were problematic.

Signed-off-by: Daniel Gröber
---
 src/man/wg-quick.8        | 14 +++++++++++---
 src/wg-quick/freebsd.bash |  4 ++--
 src/wg-quick/linux.bash   |  4 ++--
 src/wg-quick/openbsd.bash |  4 ++--
 4 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/src/man/wg-quick.8 b/src/man/wg-quick.8
index bc9e145..391a095 100644
--- a/src/man/wg-quick.8
+++ b/src/man/wg-quick.8
@@ -83,9 +83,17 @@ specified multiple times. Upon bringing the interface up, this runs
 .BR resolvconf (8)
 are undesirable, the PostUp and PostDown keys below may be used instead.
 .IP \(bu
-MTU \(em if not specified, the MTU is automatically determined from the endpoint addresses
-or the system default route, which is usually a sane choice. However, to manually specify
-an MTU to override this automatic discovery, this value may be specified explicitly.
+MTU \(em wg tunnel interface MTU. If not specified, the MTU is set to the
+lowest route MTU of any peer endpoint or if route lookups don't return an
+explicit MTU the system default route's MTU less encapsulation overhead.
+
+Note that matching the tunnel MTU to the underlying network MTU is normally
+only a performance concern. WireGuard allows encapsulated (UDP) packets to
+be fragmented through PMTUD (IPv4/IPv6) or in-network fragmentation (IPv4)
+depending on system behaviour.
+
+However, to forgoe this automatic behaviour, a static value may may be
+specified here.
 .IP \(bu
 Table \(em Controls the routing table to which routes are added. There are two special
 values: `off' disables the creation of routes altogether, and `auto'
diff --git a/src/wg-quick/freebsd.bash b/src/wg-quick/freebsd.bash
index f72daf6..3886f47 100755
--- a/src/wg-quick/freebsd.bash
+++ b/src/wg-quick/freebsd.bash
@@ -191,11 +191,11 @@ set_mtu() {
 		family=inet
 		[[ ${BASH_REMATCH[1]} == *:* ]] && family=inet6
 		output="$(route -n get "-$family" "${BASH_REMATCH[1]}" || true)"
-		[[ $output =~ interface:\ ([^ ]+)$'\n' && $(ifconfig "${BASH_REMATCH[1]}") =~ mtu\ ([0-9]+) && ${BASH_REMATCH[1]} -gt $mtu ]] && mtu="${BASH_REMATCH[1]}"
+		[[ $output =~ interface:\ ([^ ]+)$'\n' && $(ifconfig "${BASH_REMATCH[1]}") =~ mtu\ ([0-9]+) && ${BASH_REMATCH[1]} -lt $mtu ]] && mtu="${BASH_REMATCH[1]}"
 	done < <(wg show "$INTERFACE" endpoints)
 	if [[ $mtu -eq 0 ]]; then
 		read -r output < <(route -n get default || true) || true
-		[[ $output =~ interface:\ ([^ ]+)$'\n' && $(ifconfig "${BASH_REMATCH[1]}") =~ mtu\ ([0-9]+) && ${BASH_REMATCH[1]} -gt $mtu ]] && mtu="${BASH_REMATCH[1]}"
+		[[ $output =~ interface:\ ([^ ]+)$'\n' && $(ifconfig "${BASH_REMATCH[1]}") =~ mtu\ ([0-9]+) && ${BASH_REMATCH[1]} -lt $mtu ]] && mtu="${BASH_REMATCH[1]}"
 	fi
 	[[ $mtu -gt 0 ]] || mtu=1500
 	cmd ifconfig "$INTERFACE" mtu $(( mtu - 80 ))
diff --git a/src/wg-quick/linux.bash b/src/wg-quick/linux.bash
index 4193ce5..eab411c 100755
--- a/src/wg-quick/linux.bash
+++ b/src/wg-quick/linux.bash
@@ -131,11 +131,11 @@ set_mtu_up() {
 	while read -r _ endpoint; do
 		[[ $endpoint =~ ^\[?([a-z0-9:.]+)\]?:[0-9]+$ ]] || continue
 		output="$(ip route get "${BASH_REMATCH[1]}" || true)"
-		[[ ( $output =~ mtu\ ([0-9]+) || ( $output =~ dev\ ([^ ]+) && $(ip link show dev "${BASH_REMATCH[1]}") =~ mtu\ ([0-9]+) ) ) && ${BASH_REMATCH[1]} -gt $mtu ]] && mtu="${BASH_REMATCH[1]}"
+		[[ ( $output =~ mtu\ ([0-9]+) || ( $output =~ dev\ ([^ ]+) && $(ip link show dev "${BASH_REMATCH[1]}") =~ mtu\ ([0-9]+) ) ) && ${BASH_REMATCH[1]} -lt $mtu ]] && mtu="${BASH_REMATCH[1]}"
 	done < <(wg show "$INTERFACE" endpoints)
 	if [[ $mtu -eq 0 ]]; then
 		read -r output < <(ip route show default || true) || true
-		[[ ( $output =~ mtu\ ([0-9]+) || ( $output =~ dev\ ([^ ]+) && $(ip link show dev "${BASH_REMATCH[1]}") =~ mtu\ ([0-9]+) ) ) && ${BASH_REMATCH[1]} -gt $mtu ]] && mtu="${BASH_REMATCH[1]}"
+		[[ ( $output =~ mtu\ ([0-9]+) || ( $output =~ dev\ ([^ ]+) && $(ip link show dev "${BASH_REMATCH[1]}") =~ mtu\ ([0-9]+) ) ) && ${BASH_REMATCH[1]} -lt $mtu ]] && mtu="${BASH_REMATCH[1]}"
 	fi
 	[[ $mtu -gt 0 ]] || mtu=1500
 	cmd ip link set mtu $(( mtu - 80 )) up dev "$INTERFACE"
diff --git a/src/wg-quick/openbsd.bash b/src/wg-quick/openbsd.bash
index b58ecf5..14f26ff 100755
--- a/src/wg-quick/openbsd.bash
+++ b/src/wg-quick/openbsd.bash
@@ -174,11 +174,11 @@ set_mtu() {
 		family=inet
 		[[ ${BASH_REMATCH[1]} == *:* ]] && family=inet6
 		output="$(route -n get "-$family" "${BASH_REMATCH[1]}" || true)"
-		[[ $output =~ interface:\ ([^ ]+)$'\n' && $(ifconfig "${BASH_REMATCH[1]}") =~ mtu\ ([0-9]+) && ${BASH_REMATCH[1]} -gt $mtu ]] && mtu="${BASH_REMATCH[1]}"
+		[[ $output =~ interface:\ ([^ ]+)$'\n' && $(ifconfig "${BASH_REMATCH[1]}") =~ mtu\ ([0-9]+) && ${BASH_REMATCH[1]} -lt $mtu ]] && mtu="${BASH_REMATCH[1]}"
 	done < <(wg show "$REAL_INTERFACE" endpoints)
 	if [[ $mtu -eq 0 ]]; then
 		read -r output < <(route -n get default || true) || true
-		[[ $output =~ interface:\ ([^ ]+)$'\n' && $(ifconfig "${BASH_REMATCH[1]}") =~ mtu\ ([0-9]+) && ${BASH_REMATCH[1]} -gt $mtu ]] && mtu="${BASH_REMATCH[1]}"
+		[[ $output =~ interface:\ ([^ ]+)$'\n' && $(ifconfig "${BASH_REMATCH[1]}") =~ mtu\ ([0-9]+) && ${BASH_REMATCH[1]} -lt $mtu ]] && mtu="${BASH_REMATCH[1]}"
 	fi
 	[[ $mtu -gt 0 ]] || mtu=1500
 	cmd ifconfig "$REAL_INTERFACE" mtu $(( mtu - 80 ))
--
2.39.2

From dzm at unexpl0.red Thu Nov 23 14:32:17 2023
From: dzm at unexpl0.red (z)
Date: Thu, 23 Nov 2023 14:32:17 -0000
Subject: UAPI socket for the macOS sandboxed Wireguard app
In-Reply-To:
References:
Message-ID: <44acf7f1-3a68-4e82-bb2e-0f296490b7ce@app.fastmail.com>

Would like to see this reviewed, as it appears to accomplish #4 on the
macOS TODO list[0]. I know Jason hasn't had a chance to review it yet, as
he says in the wgctrl-go PR. If we need extra review bandwidth, I can do
some testing if desired.

-dzm

[0]: https://docs.google.com/document/d/1BnzImOF8CkungFnuRlWhnEpY2OmEHSckat62aZ6LYGY/edit

On Sat, Oct 7, 2023, at 10:46 PM, Jan Noha wrote:
> Hello,
>
> I want to submit a series of patches concerning Wireguard on macOS.
>
> If it's ok, I will just link to a github PR which links to three other
> PRs (in wireguard-apple, wireguard-go and wireguard-tools).
>
> https://github.com/WireGuard/wgctrl-go/pull/143
>
> Let me explain what this is about. I've been trying to automate
> Wireguard tunnel configuration for some P2P use cases and I wanted to
> use wgctrl-go library for the task.
>
> This already works fine on Linux and Windows. On macOS, it's a bit
> more complicated. If you only use CLI for creating tun interfaces
> (using wireguard from homebrew for example), it also works.
> Specifically, wgctrl-go communicates with the wireguard user-space
> daemon via a unix domain socket located in /var/run/wireguard/ (this
> is referred to as UAPI in the code).
>
> However, if you want to use Wireguard from the App Store - which has
> some other advantages besides the UI (such as on-demand VPN and
> generally nice OS integration) - it comes as a sandboxed Network
> Extension. Currently, it does not expose any UAPI socket, so wgctrl-go
> cannot be used to configure it.
>
> The socket can be opened except it has to be inside the sandbox home
> directory. There is no problem connecting to it from "outside" using
> cli tools which are not sandboxed themselves.
>
> That's basically what I did here. Changes were needed in
> wireguard-apple and wireguard-go to open the socket in a
> macOS-specific location, then I updated wgctrl-go and wireguard-tools
> (so that wg commands work too) to look for UAPI sockets in both the
> sandbox location and the default one.
>
> If you're interested in discussing this topic further, I'll look
> forward to any feedback.
>
> Thank you,
> Jan Noha

From mirco.barone.3 at gmail.com Tue Nov 28 19:10:58 2023
From: mirco.barone.3 at gmail.com (Mirco Barone)
Date: Tue, 28 Nov 2023 19:10:58 -0000
Subject: Implementation of UDP GSO and GRO in WireGuard Kernel Version
Message-ID:

Hi everyone,

I've noticed the progress made by the Tailscale team with the userspace
version of WireGuard (wireguard-go). According to their report, the
implementation of UDP GSO in the physical interface responsible for
delivering UDP packets in the tunnel and UDP GRO in the interface
receiving UDP packets was crucial for their success.

I'm curious to know if this feature has been extended to the kernel
version of WireGuard. Additionally, is there anyone currently working on
implementing these features in the kernel version, if feasible?

Kind regards.
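For readers unfamiliar with the mechanism: on Linux, a userspace sender opts into UDP GSO with the `UDP_SEGMENT` socket option (or an equivalent per-send control message). It then passes one large buffer, and the kernel, or a NIC that supports the offload, splits it into datagrams of `gso_size` bytes, of which only the last may be shorter. The receive-side counterpart is the `UDP_GRO` option. The sketch below shows only the userspace socket option and the resulting segment arithmetic, and the helper names are illustrative. It does not describe how the in-kernel WireGuard driver does or could use these offloads.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/udp.h>
#include <stddef.h>
#include <sys/socket.h>

#ifndef UDP_SEGMENT
#define UDP_SEGMENT 103 /* value from <linux/udp.h>, Linux 4.18+ */
#endif

/* Ask the kernel to split every send on fd into gso_size-byte datagrams.
 * Returns 0 on success, -1 (with errno set) if unsupported. */
int enable_udp_gso(int fd, int gso_size)
{
	return setsockopt(fd, IPPROTO_UDP, UDP_SEGMENT, &gso_size,
			  sizeof(gso_size));
}

/* How many datagrams one send of len bytes turns into. */
static size_t gso_segment_count(size_t len, size_t gso_size)
{
	return gso_size ? (len + gso_size - 1) / gso_size : 0;
}

/* Payload size of the final datagram; all earlier ones are gso_size. */
static size_t gso_last_segment_len(size_t len, size_t gso_size)
{
	if (len == 0 || gso_size == 0)
		return 0;
	return len % gso_size ? len % gso_size : gso_size;
}
```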