From regressions at leemhuis.info  Sun Jul  2 12:37:18 2023
From: regressions at leemhuis.info (Linux regression tracking (Thorsten Leemhuis))
Date: Sun, 2 Jul 2023 14:37:18 +0200
Subject: Fwd: RCU stalls with wireguard over bonding over igb on Linux
 6.3.0+
In-Reply-To: <79196679-fb65-e5ad-e836-2c43447cfacd@gmail.com>
References: <e5b76a4f-81ae-5b09-535f-114149be5069@gmail.com>
 <79196679-fb65-e5ad-e836-2c43447cfacd@gmail.com>
Message-ID: <10f2a5ee-91e2-1241-9e3b-932c493e61b6@leemhuis.info>

On 02.07.23 13:57, Bagas Sanjaya wrote:
> [also Cc: original reporter]

BTW: I think you CCed too many developers here. There are situations
where this can make sense, but it's rare. And if you do this too often
people might start to not really look into your mails or might even
ignore them completely.

Normally it's enough to write the mail to (1) the people in the
signed-off-by-chain, (2) the maintainers of the subsystem that merged a
commit, and (3) the lists for all affected subsystems; leave it up to
developers from the first two groups to CC the maintainers of the third
group.

> On 7/2/23 10:31, Bagas Sanjaya wrote:
>> I notice a regression report on Bugzilla [1]. Quoting from it:
>>
>>> I've spent the last week on debugging a problem with my attempt to upgrade my kernel from 6.2.8 to 6.3.8 (now also with 
> [...]
>> See Bugzilla for the full thread.
>>
>> Anyway, I'm adding it to regzbot to make sure it doesn't fall through cracks
>> unnoticed:
>>
>> #regzbot introduced: fed8d8773b8ea6 https://bugzilla.kernel.org/show_bug.cgi?id=217620
>> #regzbot title: correcting acpi_is_processor_usable() check causes RCU stalls with wireguard over bonding+igb
>> #regzbot link: https://bugs.gentoo.org/909066

> satmd: Can you repeat bisection to confirm that fed8d8773b8ea6 is
> really the culprit?

I'd be careful about asking people to do that, as it might mean a lot of
work for them. Best to leave things like that to developers, unless it's
pretty obvious that something went sideways.

> Thorsten: It seems like the reporter concluded bisection to the
> (possibly) incorrect culprit.

What makes you think so? I just looked at bugzilla and it (for now)
seems reverting fed8d8773b8ea6 on top of 6.4 fixed things for the
reporter, which is a pretty strong indicator that this change really
causes the trouble somehow.

/me really wonders what he's missing

> What can I do in this case besides
> asking to repeat bisection?

Not much apart from updating regzbot state (e.g. something like "regzbot
introduced v6.3..v6.4") and a reply to your initial report (ideally with
a quick apology) to let everyone know it was a false alarm.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

From Jason at zx2c4.com  Sun Jul  2 13:46:38 2023
From: Jason at zx2c4.com (Jason A. Donenfeld)
Date: Sun, 2 Jul 2023 15:46:38 +0200
Subject: Fwd: RCU stalls with wireguard over bonding over igb on Linux
 6.3.0+
In-Reply-To: <10f2a5ee-91e2-1241-9e3b-932c493e61b6@leemhuis.info>
References: <e5b76a4f-81ae-5b09-535f-114149be5069@gmail.com>
 <79196679-fb65-e5ad-e836-2c43447cfacd@gmail.com>
 <10f2a5ee-91e2-1241-9e3b-932c493e61b6@leemhuis.info>
Message-ID: <CAHmME9onMWdJVUerf86V0kpmNKByt+VC=SUfys+GFryGq1ziHQ@mail.gmail.com>

I've got an overdue patch that I still need to submit to netdev, which
I suspect might actually fix this.

Can you let me know if
https://git.zx2c4.com/wireguard-linux/patch/?id=54d5e4329efe0d1dba8b4a58720d29493926bed0
solves the problem?

Jason

From bagasdotme at gmail.com  Sun Jul  2 03:31:56 2023
From: bagasdotme at gmail.com (Bagas Sanjaya)
Date: Sun, 2 Jul 2023 10:31:56 +0700
Subject: Fwd: RCU stalls with wireguard over bonding over igb on Linux 6.3.0+
Message-ID: <e5b76a4f-81ae-5b09-535f-114149be5069@gmail.com>

Hi,

I notice a regression report on Bugzilla [1]. Quoting from it:

> I've spent the last week debugging a problem with my attempt to upgrade my kernel from 6.2.8 to 6.3.8 (now also with 6.4.0).
> 
> The lengthy and detailed bug reports with all aspects of git bisect are at
> https://bugs.gentoo.org/909066
> 
> A summary:
> - if I do not configure wg0, the kernel does not hang
> - if I use a kernel older than commit fed8d8773b8ea68ad99d9eee8c8343bef9da2c2c, it does not hang
> 
> The commit refers to code that seems unrelated to the problem to my naive eye.
> 
> The hardware is a Dell PowerEdge R620 running Gentoo ~amd64.
> 
> I have so far excluded:
> - dracut for generating the initramfs is the same version over all kernels
> - linux-firmware has been the same
> - CPU microcode has been the same
> 
> It's been a long time since I was seriously involved with software development, and I have been even less involved with kernel development.
> 
> Gentoo maintainers recommended that I open a bug with upstream, so here I am.
> 
> I currently have no idea how to make progress, but I'm willing to try things.

See Bugzilla for the full thread.

Anyway, I'm adding it to regzbot to make sure it doesn't fall through the cracks
unnoticed:

#regzbot introduced: fed8d8773b8ea6 https://bugzilla.kernel.org/show_bug.cgi?id=217620
#regzbot title: correcting acpi_is_processor_usable() check causes RCU stalls with wireguard over bonding+igb
#regzbot link: https://bugs.gentoo.org/909066

Thanks.

[1]: https://bugzilla.kernel.org/show_bug.cgi?id=217620

-- 
An old man doll... just what I always wanted! - Clara

From bagasdotme at gmail.com  Sun Jul  2 11:57:05 2023
From: bagasdotme at gmail.com (Bagas Sanjaya)
Date: Sun, 2 Jul 2023 18:57:05 +0700
Subject: Fwd: RCU stalls with wireguard over bonding over igb on Linux
 6.3.0+
In-Reply-To: <e5b76a4f-81ae-5b09-535f-114149be5069@gmail.com>
References: <e5b76a4f-81ae-5b09-535f-114149be5069@gmail.com>
Message-ID: <79196679-fb65-e5ad-e836-2c43447cfacd@gmail.com>

[also Cc: original reporter]

On 7/2/23 10:31, Bagas Sanjaya wrote:
> I notice a regression report on Bugzilla [1]. Quoting from it:
> [...]

satmd: Can you repeat bisection to confirm that fed8d8773b8ea6 is
really the culprit?

Thorsten: It seems like the reporter concluded bisection to the
(possibly) incorrect culprit. What can I do in this case besides
asking to repeat bisection?



From bagasdotme at gmail.com  Sun Jul  2 14:08:27 2023
From: bagasdotme at gmail.com (Bagas Sanjaya)
Date: Sun, 2 Jul 2023 21:08:27 +0700
Subject: Fwd: RCU stalls with wireguard over bonding over igb on Linux
 6.3.0+
In-Reply-To: <10f2a5ee-91e2-1241-9e3b-932c493e61b6@leemhuis.info>
References: <e5b76a4f-81ae-5b09-535f-114149be5069@gmail.com>
 <79196679-fb65-e5ad-e836-2c43447cfacd@gmail.com>
 <10f2a5ee-91e2-1241-9e3b-932c493e61b6@leemhuis.info>
Message-ID: <644f4551-32e8-11f9-0d4a-ad1045fdae77@gmail.com>

On 7/2/23 19:37, Linux regression tracking (Thorsten Leemhuis) wrote:
> On 02.07.23 13:57, Bagas Sanjaya wrote:
>> [also Cc: original reporter]
> 
> BTW: I think you CCed too many developers here. There are situations
> where this can make sense, but it's rare. And if you do this too often
> people might start to not really look into your mails or might even
> ignore them completely.
> 
> Normally it's enough to write the mail to (1) the people in the
> signed-off-by-chain, (2) the maintainers of the subsystem that merged a
> commit, and (3) the lists for all affected subsystems; leave it up to
> developers from the first two groups to CC the maintainers of the third
> group.
> 

Hi,

In this case I also had to Cc: wireguard, bonding, RCU, and x86 people,
since this issue spans these subsystems (or so I naively thought). Anyway,
thanks for the detailed tip (honestly, /me wonders whether I'll forget this
later, as is often the case).

>> On 7/2/23 10:31, Bagas Sanjaya wrote:
>>> I notice a regression report on Bugzilla [1]. Quoting from it:
>>>
>>>> I've spent the last week on debugging a problem with my attempt to upgrade my kernel from 6.2.8 to 6.3.8 (now also with 
>> [...]
>>> See Bugzilla for the full thread.
>>>
>>> Anyway, I'm adding it to regzbot to make sure it doesn't fall through cracks
>>> unnoticed:
>>>
>>> #regzbot introduced: fed8d8773b8ea6 https://bugzilla.kernel.org/show_bug.cgi?id=217620
>>> #regzbot title: correcting acpi_is_processor_usable() check causes RCU stalls with wireguard over bonding+igb
>>> #regzbot link: https://bugs.gentoo.org/909066
> 
>> satmd: Can you repeat bisection to confirm that fed8d8773b8ea6 is
>> really the culprit?
> 
> I'd be careful about asking people to do that, as it might mean a lot of
> work for them. Best to leave things like that to developers, unless it's
> pretty obvious that something went sideways.
> 

OK.

>> Thorsten: It seems like the reporter concluded bisection to the
>> (possibly) incorrect culprit.
> 
> What makes you think so? I just looked at bugzilla and it (for now)
> seems reverting fed8d8773b8ea6 on top of 6.4 fixed things for the
> reporter, which is a pretty strong indicator that this change really
> causes the trouble somehow.
> 

OK too.

> /me really wonders what he's missing
> 
>> What can I do in this case besides
>> asking to repeat bisection?
> 
> Not much apart from updating regzbot state (e.g. something like "regzbot
> introduced v6.3..v6.4") and a reply to your initial report (ideally with
> a quick apology) to let everyone know it was a false alarm.
> 

OK.



From Jason at zx2c4.com  Mon Jul  3 01:29:30 2023
From: Jason at zx2c4.com (Jason A. Donenfeld)
Date: Mon, 3 Jul 2023 03:29:30 +0200
Subject: Fwd: RCU stalls with wireguard over bonding over igb on Linux
 6.3.0+
In-Reply-To: <CAHmME9onMWdJVUerf86V0kpmNKByt+VC=SUfys+GFryGq1ziHQ@mail.gmail.com>
References: <e5b76a4f-81ae-5b09-535f-114149be5069@gmail.com>
 <79196679-fb65-e5ad-e836-2c43447cfacd@gmail.com>
 <10f2a5ee-91e2-1241-9e3b-932c493e61b6@leemhuis.info>
 <CAHmME9onMWdJVUerf86V0kpmNKByt+VC=SUfys+GFryGq1ziHQ@mail.gmail.com>
Message-ID: <ZKIkevSrMJISHDig@zx2c4.com>

On Sun, Jul 02, 2023 at 03:46:38PM +0200, Jason A. Donenfeld wrote:
> I've got an overdue patch that I still need to submit to netdev, which
> I suspect might actually fix this.
> 
> Can you let me know if
> https://git.zx2c4.com/wireguard-linux/patch/?id=54d5e4329efe0d1dba8b4a58720d29493926bed0
> solves the problem?

satmd, the original reporter, confirmed over on the Gentoo bug report -
https://bugs.gentoo.org/909066 - that this patch fixes the issue.

This patch has been sent into netdev and will presumably hit the various
trees and stable in due time.

Jason

From bagasdotme at gmail.com  Mon Jul  3 01:34:01 2023
From: bagasdotme at gmail.com (Bagas Sanjaya)
Date: Mon, 3 Jul 2023 08:34:01 +0700
Subject: Fwd: RCU stalls with wireguard over bonding over igb on Linux
 6.3.0+
In-Reply-To: <CAHmME9onMWdJVUerf86V0kpmNKByt+VC=SUfys+GFryGq1ziHQ@mail.gmail.com>
References: <e5b76a4f-81ae-5b09-535f-114149be5069@gmail.com>
 <79196679-fb65-e5ad-e836-2c43447cfacd@gmail.com>
 <10f2a5ee-91e2-1241-9e3b-932c493e61b6@leemhuis.info>
 <CAHmME9onMWdJVUerf86V0kpmNKByt+VC=SUfys+GFryGq1ziHQ@mail.gmail.com>
Message-ID: <ZKIlibX5wCgWlonq@debian.me>

On Sun, Jul 02, 2023 at 03:46:38PM +0200, Jason A. Donenfeld wrote:
> I've got an overdue patch that I still need to submit to netdev, which
> I suspect might actually fix this.
> 
> Can you let me know if
> https://git.zx2c4.com/wireguard-linux/patch/?id=54d5e4329efe0d1dba8b4a58720d29493926bed0
> solves the problem?

The reporter on Bugzilla [1] said it fixed the regression, so telling
regzbot:

#regzbot fix: 54d5e4329efe0d

Thanks.

[1]: https://bugzilla.kernel.org/show_bug.cgi?id=217620#c6


From syzbot+list1615f52d4d969bd05dc8 at syzkaller.appspotmail.com  Fri Jul 14 09:49:00 2023
From: syzbot+list1615f52d4d969bd05dc8 at syzkaller.appspotmail.com (syzbot)
Date: Fri, 14 Jul 2023 02:49:00 -0700
Subject: [syzbot] Monthly wireguard report (Jul 2023)
Message-ID: <000000000000b46bf806006f5b02@google.com>

Hello wireguard maintainers/developers,

This is a 31-day syzbot report for the wireguard subsystem.
All related reports/information can be found at:
https://syzkaller.appspot.com/upstream/s/wireguard

During the period, 0 new issues were detected and 0 were fixed.
In total, 3 issues are still open and 13 have been fixed so far.

Some of the still happening issues:

Ref Crashes Repro Title
<1> 714     No    KCSAN: data-race in wg_packet_send_staged_packets / wg_packet_send_staged_packets (3)
                  https://syzkaller.appspot.com/bug?extid=6ba34f16b98fe40daef1
<2> 480     No    KCSAN: data-race in wg_packet_decrypt_worker / wg_packet_rx_poll (2)
                  https://syzkaller.appspot.com/bug?extid=d1de830e4ecdaac83d89
<3> 3       Yes   INFO: rcu detected stall in wg_packet_handshake_receive_worker
                  https://syzkaller.appspot.com/bug?extid=dbb6a05624cf5064858c

---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller at googlegroups.com.

To disable reminders for individual bugs, reply with the following command:
#syz set <Ref> no-reminders

To change bug's subsystems, reply with:
#syz set <Ref> subsystems: new-subsystem

You may send multiple commands in a single email message.

From maarten at de-vri.es  Fri Jul 14 10:27:01 2023
From: maarten at de-vri.es (Maarten de Vries)
Date: Fri, 14 Jul 2023 12:27:01 +0200
Subject: ip netns del zaps wg link
In-Reply-To: <4fd6c9cb-c2cf-7a16-ee62-d958790652ea@gmail.com>
References: <4fd6c9cb-c2cf-7a16-ee62-d958790652ea@gmail.com>
Message-ID: <bbbf8973-de16-781f-68dd-739e5033e681@de-vri.es>

On 18/05/2023 01:13, Harry G Coin wrote:
> First, Hi and thanks for all the effort!
>
> At least on Ubuntu latest LTS: As advertised, if a wireguard link 
> gets created by systemd/networkd, then set into a different net 
> namespace, all works well.
>
> However, if that namespace is deleted, the link appears to be 'gone 
> forever'. Other link types reappear in the primary namespace when the 
> namespace they are in gets deleted. I'm not sure whether the link 
> retains its 'up' or 'down' state when the namespace it's in gets 
> deleted and reset to primary. Not a big deal, doesn't happen often.
>
> This is 100% repeatable. Some other answer than 'inaccessible until 
> the next reboot' would be nice.
>
>

Hi,

This behavior is exactly what I would expect. I'm using namespaces to 
restrict access to a wireguard link. If the namespace gets destroyed, I 
absolutely do not want other programs to have access to the wireguard link.

You can simply re-create the wireguard link to use it again. This may 
not be the most convenient for you, but your use case seems to be a bit 
unconventional: you are moving and deleting a resource created by 
systemd and/or networkd manually. You are mixing automatic and manual 
management, so there is a risk of breaking the automatic management.

Alternatively, you could move the interface back before deleting the 
namespace.

Kind regards,

Maarten de Vries


From hgcoin at gmail.com  Fri Jul 14 21:48:30 2023
From: hgcoin at gmail.com (Harry G Coin)
Date: Fri, 14 Jul 2023 16:48:30 -0500
Subject: ip netns del zaps wg link
In-Reply-To: <bbbf8973-de16-781f-68dd-739e5033e681@de-vri.es>
References: <4fd6c9cb-c2cf-7a16-ee62-d958790652ea@gmail.com>
 <bbbf8973-de16-781f-68dd-739e5033e681@de-vri.es>
Message-ID: <3a110fda-8fc0-d2d3-e866-2a975cce085b@gmail.com>


On 7/14/23 05:27, Maarten de Vries wrote:
> On 18/05/2023 01:13, Harry G Coin wrote:
>> [...]
>
> This behavior is exactly what I would expect. I'm using namespaces to 
> restrict access to a wireguard link. If the namespace gets destroyed, 
> I absolutely do not want other programs to have access to the 
> wireguard link.
>
> [...]
>
Hi,

It's worth thinking about the only means by which a namespace 'gets 
destroyed'.

The point of systemd/networkd for most of us is similarity, convenience
and uniformity in initialization across interface device types. That
frees later choices in NIC management to involve only the detail
specific to those choices. Remember that systemd/networkd (which can be
just one-and-done, setup-time management) is a very different thing
from NetworkManager (automatic, active, ongoing management). Someday I
hope systemd/networkd adds namespace comprehension.

As wireguard and namespace management are both limited to the root
user, who presumably is aware of the security implications involved,
and as wireguard's birth in the initial namespace is a selling point no
matter how it moves among namespaces later, allowing wireguard
interfaces to behave like all other interfaces when a namespace is
destroyed (moving back to the namespace where it was born, and to which
it retains a connection anyhow) would avoid imposing further 'wireguard
only' admin burden. It might be convenient to automatically set the
wireguard link 'down' as the interface transitions back from the
namespace being destroyed to the primary one, so as to avoid any
possibility of overlapping existing entries in the primary routing
table. But destroying the interface altogether generates admin burden
beyond need.

Thanks for all the wireguard work!

Harry Coin


From mjt at tls.msk.ru  Sat Jul 15 04:48:48 2023
From: mjt at tls.msk.ru (Michael Tokarev)
Date: Sat, 15 Jul 2023 07:48:48 +0300
Subject: ip netns del zaps wg link
In-Reply-To: <3a110fda-8fc0-d2d3-e866-2a975cce085b@gmail.com>
References: <4fd6c9cb-c2cf-7a16-ee62-d958790652ea@gmail.com>
 <bbbf8973-de16-781f-68dd-739e5033e681@de-vri.es>
 <3a110fda-8fc0-d2d3-e866-2a975cce085b@gmail.com>
Message-ID: <45a9cfad-7d77-1be7-9e38-165d12a31c08@tls.msk.ru>

15.07.2023 00:48, Harry G Coin wrote:
..

> [] allowing wireguard interfaces to behave 
> like all other interfaces when a namespace is destroyed (moving back to the namespace where it was born and to which it retains connection anyhow) 

The thing is that all interface types behave like this when a network namespace is removed:
they're destroyed together with the namespace. That applies to everything which can be deleted
anyway, i.e. for which an `ip link del' command works; physical NICs are the only exception here,
because you can't remove a physical NIC from a physical machine this way.

So in this context, wg interfaces are *already* behaving like all other virtual interfaces,
and this is done by linux network/namespace subsystem, not by wireguard.

/mjt

From hgcoin at gmail.com  Sat Jul 15 20:29:25 2023
From: hgcoin at gmail.com (Harry G Coin)
Date: Sat, 15 Jul 2023 15:29:25 -0500
Subject: ip netns del zaps wg link
In-Reply-To: <45a9cfad-7d77-1be7-9e38-165d12a31c08@tls.msk.ru>
References: <4fd6c9cb-c2cf-7a16-ee62-d958790652ea@gmail.com>
 <bbbf8973-de16-781f-68dd-739e5033e681@de-vri.es>
 <3a110fda-8fc0-d2d3-e866-2a975cce085b@gmail.com>
 <45a9cfad-7d77-1be7-9e38-165d12a31c08@tls.msk.ru>
Message-ID: <c210924e-71f2-2243-dbf3-ef7a202a7661@gmail.com>


On 7/14/23 23:48, Michael Tokarev wrote:
> 15.07.2023 00:48, Harry G Coin wrote:
> ..
>
>> [] allowing wireguard interfaces to behave like all other interfaces 
>> when a namespace is destroyed (moving back to the namespace where it 
>> was born and to which it retains connection anyhow) 
>
> The thing is that all interface types behave like this when a network 
> namespace is removed: they're destroyed together with the namespace. 
> That applies to everything which can be deleted anyway, i.e. for which 
> an `ip link del' command works; physical NICs are the only exception 
> here, because you can't remove a physical NIC from a physical machine 
> this way.
>
> So in this context, wg interfaces are *already* behaving like all 
> other virtual interfaces, and this is done by linux network/namespace 
> subsystem, not by wireguard.
>
> /mjt


Oh dear. It sure makes more sense to me for anything called 'an
interface' to move in the same fashion as any other. Having to 'just
know' which ones will 'remain' and which ones will 'go away and need to
be entirely reconfigured all the time' seems more than the security need
calls for. Just setting the link down when the netns goes away would be
better; then I can decide when, whether and how to create and destroy
interfaces. Or at least there could be an option to 'treat all links the
same when the netns goes away' somehow.

Off soap box now!

Thanks for the comments.


From syzbot+96eb4e0d727f0ae998a6 at syzkaller.appspotmail.com  Tue Jul 18 15:21:13 2023
From: syzbot+96eb4e0d727f0ae998a6 at syzkaller.appspotmail.com (syzbot)
Date: Tue, 18 Jul 2023 08:21:13 -0700
Subject: [syzbot] [wireguard?] [jfs?] KASAN: slab-use-after-free Read in
 wg_noise_keypair_get
Message-ID: <0000000000002bfa570600c477b3@google.com>

Hello,

syzbot found the following issue on:

HEAD commit:    51f269a6ecc7 Merge tag 'probes-fixes-6.4-rc4' of git://git..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=111705d1280000
kernel config:  https://syzkaller.appspot.com/x/.config?x=162cf2103e4a7453
dashboard link: https://syzkaller.appspot.com/bug?extid=96eb4e0d727f0ae998a6
compiler:       Debian clang version 15.0.7, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=13101715280000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/dc3a22741e4e/disk-51f269a6.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/61d77fe6cfb4/vmlinux-51f269a6.xz
kernel image: https://storage.googleapis.com/syzbot-assets/bebce35b62e9/bzImage-51f269a6.xz
mounted in repro: https://storage.googleapis.com/syzbot-assets/d2ff4ad2d0c2/mount_0.gz

The issue was bisected to:

commit 586fb2641371cf7f23a401ab1c79b17e3ec457f4
Author: Kuninori Morimoto <kuninori.morimoto.gx at renesas.com>
Date:   Wed Jun 22 05:54:06 2022 +0000

    ASoC: soc-core.c: fixup snd_soc_of_get_dai_link_cpus()

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=108780b5280000
final oops:     https://syzkaller.appspot.com/x/report.txt?x=128780b5280000
console output: https://syzkaller.appspot.com/x/log.txt?x=148780b5280000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+96eb4e0d727f0ae998a6 at syzkaller.appspotmail.com
Fixes: 586fb2641371 ("ASoC: soc-core.c: fixup snd_soc_of_get_dai_link_cpus()")

IPv6: ADDRCONF(NETDEV_CHANGE): wlan1: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): wlan1: link becomes ready
==================================================================
BUG: KASAN: slab-use-after-free in instrument_atomic_read include/linux/instrumented.h:68 [inline]
BUG: KASAN: slab-use-after-free in atomic_read include/linux/atomic/atomic-instrumented.h:27 [inline]
BUG: KASAN: slab-use-after-free in refcount_read include/linux/refcount.h:147 [inline]
BUG: KASAN: slab-use-after-free in __refcount_add_not_zero include/linux/refcount.h:152 [inline]
BUG: KASAN: slab-use-after-free in __refcount_inc_not_zero include/linux/refcount.h:227 [inline]
BUG: KASAN: slab-use-after-free in refcount_inc_not_zero include/linux/refcount.h:245 [inline]
BUG: KASAN: slab-use-after-free in kref_get_unless_zero include/linux/kref.h:111 [inline]
BUG: KASAN: slab-use-after-free in wg_noise_keypair_get+0xd2/0x3a0 drivers/net/wireguard/noise.c:146
Read of size 4 at addr ffff88807d0304d8 by task kworker/0:6/5139

CPU: 0 PID: 5139 Comm: kworker/0:6 Not tainted 6.4.0-rc4-syzkaller-00268-g51f269a6ecc7 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/25/2023
Workqueue: ipv6_addrconf addrconf_dad_work
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
 print_address_description mm/kasan/report.c:351 [inline]
 print_report+0x163/0x540 mm/kasan/report.c:462
 kasan_report+0x176/0x1b0 mm/kasan/report.c:572
 kasan_check_range+0x283/0x290 mm/kasan/generic.c:187
 instrument_atomic_read include/linux/instrumented.h:68 [inline]
 atomic_read include/linux/atomic/atomic-instrumented.h:27 [inline]
 refcount_read include/linux/refcount.h:147 [inline]
 __refcount_add_not_zero include/linux/refcount.h:152 [inline]
 __refcount_inc_not_zero include/linux/refcount.h:227 [inline]
 refcount_inc_not_zero include/linux/refcount.h:245 [inline]
 kref_get_unless_zero include/linux/kref.h:111 [inline]
 wg_noise_keypair_get+0xd2/0x3a0 drivers/net/wireguard/noise.c:146
 wg_packet_send_staged_packets+0x406/0x1890 drivers/net/wireguard/send.c:357
 wg_xmit+0xbca/0x1120 drivers/net/wireguard/device.c:217
 __netdev_start_xmit include/linux/netdevice.h:4915 [inline]
 netdev_start_xmit include/linux/netdevice.h:4929 [inline]
 xmit_one net/core/dev.c:3578 [inline]
 dev_hard_start_xmit+0x241/0x750 net/core/dev.c:3594
 __dev_queue_xmit+0x19b9/0x38b0 net/core/dev.c:4244
 neigh_output include/net/neighbour.h:544 [inline]
 ip6_finish_output2+0xf80/0x1560 net/ipv6/ip6_output.c:134
 __ip6_finish_output net/ipv6/ip6_output.c:195 [inline]
 ip6_finish_output+0x6b0/0xa80 net/ipv6/ip6_output.c:206
 dst_output include/net/dst.h:458 [inline]
 NF_HOOK include/linux/netfilter.h:303 [inline]
 ndisc_send_skb+0xb08/0x1390 net/ipv6/ndisc.c:508
 addrconf_dad_completed+0x6ea/0xcf0 net/ipv6/addrconf.c:4254
 addrconf_dad_work+0xd92/0x16b0
 process_one_work+0x8a0/0x10e0 kernel/workqueue.c:2405
 worker_thread+0xa63/0x1210 kernel/workqueue.c:2552
 kthread+0x2b8/0x350 kernel/kthread.c:379
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
 </TASK>

Allocated by task 5137:
 kasan_save_stack mm/kasan/common.c:45 [inline]
 kasan_set_track+0x4f/0x70 mm/kasan/common.c:52
 ____kasan_kmalloc mm/kasan/common.c:374 [inline]
 __kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:383
 kmalloc include/linux/slab.h:559 [inline]
 kzalloc include/linux/slab.h:680 [inline]
 keypair_create drivers/net/wireguard/noise.c:100 [inline]
 wg_noise_handshake_begin_session+0xc4/0xb60 drivers/net/wireguard/noise.c:827
 wg_packet_send_handshake_response+0x120/0x2d0 drivers/net/wireguard/send.c:96
 wg_receive_handshake_packet drivers/net/wireguard/receive.c:154 [inline]
 wg_packet_handshake_receive_worker+0x5dd/0xf00 drivers/net/wireguard/receive.c:213
 process_one_work+0x8a0/0x10e0 kernel/workqueue.c:2405
 worker_thread+0xa63/0x1210 kernel/workqueue.c:2552
 kthread+0x2b8/0x350 kernel/kthread.c:379
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308

Freed by task 5086:
 kasan_save_stack mm/kasan/common.c:45 [inline]
 kasan_set_track+0x4f/0x70 mm/kasan/common.c:52
 kasan_save_free_info+0x2b/0x40 mm/kasan/generic.c:521
 ____kasan_slab_free+0xd6/0x120 mm/kasan/common.c:236
 kasan_slab_free include/linux/kasan.h:162 [inline]
 slab_free_hook mm/slub.c:1781 [inline]
 slab_free_freelist_hook mm/slub.c:1807 [inline]
 slab_free mm/slub.c:3786 [inline]
 __kmem_cache_free+0x264/0x3c0 mm/slub.c:3799
 diUnmount+0xf3/0x100 fs/jfs/jfs_imap.c:195
 jfs_umount+0x186/0x3a0 fs/jfs/jfs_umount.c:63
 jfs_put_super+0x8a/0x190 fs/jfs/super.c:194
 generic_shutdown_super+0x134/0x340 fs/super.c:500
 kill_block_super+0x84/0xf0 fs/super.c:1407
 deactivate_locked_super+0xa4/0x110 fs/super.c:331
 cleanup_mnt+0x426/0x4c0 fs/namespace.c:1177
 task_work_run+0x24a/0x300 kernel/task_work.c:179
 resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
 exit_to_user_mode_loop+0xd9/0x100 kernel/entry/common.c:171
 exit_to_user_mode_prepare+0xb1/0x140 kernel/entry/common.c:204
 __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
 syscall_exit_to_user_mode+0x64/0x280 kernel/entry/common.c:297
 do_syscall_64+0x4d/0xc0 arch/x86/entry/common.c:86
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

The buggy address belongs to the object at ffff88807d030000
 which belongs to the cache kmalloc-2k of size 2048
The buggy address is located 1240 bytes inside of
 freed 2048-byte region [ffff88807d030000, ffff88807d030800)

The buggy address belongs to the physical page:
page:ffffea0001f40c00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x7d030
head:ffffea0001f40c00 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff)
page_type: 0xffffffff()
raw: 00fff00000010200 ffff888012442000 dead000000000122 0000000000000000
raw: 0000000000000000 0000000000080008 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 3, migratetype Unmovable, gfp_mask 0x1d20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_HARDWALL), pid 5137, tgid 5137 (kworker/1:5), ts 1071863956331, free_ts 1071845351658
 set_page_owner include/linux/page_owner.h:31 [inline]
 post_alloc_hook+0x1e6/0x210 mm/page_alloc.c:1731
 prep_new_page mm/page_alloc.c:1738 [inline]
 get_page_from_freelist+0x321c/0x33a0 mm/page_alloc.c:3502
 __alloc_pages+0x255/0x670 mm/page_alloc.c:4768
 alloc_slab_page+0x6a/0x160 mm/slub.c:1851
 allocate_slab mm/slub.c:1998 [inline]
 new_slab+0x84/0x2f0 mm/slub.c:2051
 ___slab_alloc+0xa85/0x10a0 mm/slub.c:3192
 __slab_alloc mm/slub.c:3291 [inline]
 __slab_alloc_node mm/slub.c:3344 [inline]
 slab_alloc_node mm/slub.c:3441 [inline]
 __kmem_cache_alloc_node+0x1b8/0x290 mm/slub.c:3490
 kmalloc_trace+0x2a/0xe0 mm/slab_common.c:1057
 kmalloc include/linux/slab.h:559 [inline]
 kzalloc include/linux/slab.h:680 [inline]
 keypair_create drivers/net/wireguard/noise.c:100 [inline]
 wg_noise_handshake_begin_session+0xc4/0xb60 drivers/net/wireguard/noise.c:827
 wg_packet_send_handshake_response+0x120/0x2d0 drivers/net/wireguard/send.c:96
 wg_receive_handshake_packet drivers/net/wireguard/receive.c:154 [inline]
 wg_packet_handshake_receive_worker+0x5dd/0xf00 drivers/net/wireguard/receive.c:213
 process_one_work+0x8a0/0x10e0 kernel/workqueue.c:2405
 worker_thread+0xa63/0x1210 kernel/workqueue.c:2552
 kthread+0x2b8/0x350 kernel/kthread.c:379
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
page last free stack trace:
 reset_page_owner include/linux/page_owner.h:24 [inline]
 free_pages_prepare mm/page_alloc.c:1302 [inline]
 free_unref_page_prepare+0x903/0xa30 mm/page_alloc.c:2564
 free_unref_page+0x37/0x3f0 mm/page_alloc.c:2659
 free_large_kmalloc+0xff/0x190 mm/slab_common.c:943
 diMount+0x657/0x870 fs/jfs/jfs_imap.c:115
 jfs_mount_rw+0x2da/0x6a0 fs/jfs/jfs_mount.c:240
 jfs_remount+0x3d1/0x6b0 fs/jfs/super.c:454
 reconfigure_super+0x3c9/0x7c0 fs/super.c:956
 vfs_fsconfig_locked fs/fsopen.c:254 [inline]
 __do_sys_fsconfig fs/fsopen.c:439 [inline]
 __se_sys_fsconfig+0xa29/0xf70 fs/fsopen.c:314
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

Memory state around the buggy address:
 ffff88807d030380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88807d030400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88807d030480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                    ^
 ffff88807d030500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88807d030580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
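The offsets in the report above can be cross-checked with a little arithmetic. This is a minimal worked sketch that assumes generic KASAN's usual layout: one shadow byte per 8 bytes of memory, 16 shadow bytes per dump row, and `fb` marking freed slab memory. The constants are copied from the report.

```python
# Cross-check of the KASAN numbers above; constants copied from the report.
OBJ_START = 0xffff88807d030000  # "The buggy address belongs to the object at ..."
OBJ_SIZE = 2048                 # kmalloc-2k
OFFSET = 1240                   # "located 1240 bytes inside of"

GRANULE = 8                     # assumption: generic KASAN, 8 bytes per shadow byte
ROW_BYTES = 16 * GRANULE        # 16 shadow bytes printed per "Memory state" row


def locate(obj_start: int, offset: int) -> tuple:
    """Return (buggy address, dump row address, shadow-byte column in that row)."""
    addr = obj_start + offset
    row_start = addr & ~(ROW_BYTES - 1)     # dump rows are 128-byte aligned
    column = (addr - row_start) // GRANULE  # 0-based shadow byte within the row
    return addr, row_start, column


addr, row, col = locate(OBJ_START, OFFSET)
print(hex(addr), hex(row), col)  # 0xffff88807d0304d8 0xffff88807d030480 11
```

So the read falls into shadow byte index 11 of the `ffff88807d030480` row, and every shadow byte around it is `fb`: the surrounding 2 KiB object had already been freed when it was read.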


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller at googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection

If the bug is already fixed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to change bug's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the bug is a duplicate of another bug, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

From Jason at zx2c4.com  Tue Jul 18 17:57:22 2023
From: Jason at zx2c4.com (Jason A. Donenfeld)
Date: Tue, 18 Jul 2023 19:57:22 +0200
Subject: [syzbot] [wireguard?] [jfs?] KASAN: slab-use-after-free Read in
 wg_noise_keypair_get
In-Reply-To: <0000000000002bfa570600c477b3@google.com>
References: <0000000000002bfa570600c477b3@google.com>
Message-ID: <CAHmME9reBny-ufJp58uOg+KdMptircBRhLFd-N2KwxNUp6myTA@mail.gmail.com>

Freed in:

 diUnmount+0xf3/0x100 fs/jfs/jfs_imap.c:195
 jfs_umount+0x186/0x3a0 fs/jfs/jfs_umount.c:63
 jfs_put_super+0x8a/0x190 fs/jfs/super.c:194

So maybe not a wg issue?

From dvyukov at google.com  Wed Jul 19 07:18:14 2023
From: dvyukov at google.com (Dmitry Vyukov)
Date: Wed, 19 Jul 2023 09:18:14 +0200
Subject: Fwd: WireGuard Mailling List
In-Reply-To: <20230719091342.789107a9@parrot>
References: <20230719091342.789107a9@parrot>
Message-ID: <CACT4Y+bsdEDukXi02RKWuEa-BYajP=4w5B-7KEUqf--HOVPVjg@mail.gmail.com>

Hi,

I was asked to forward this to the list because the author has
problems reaching it.
It looks legit:

---------- Forwarded message ---------
From: Marek K?the <m.k at mk16.de>

Hello,

The WireGuard logo is a Chinese dragon. Does he/she have a name? What
could you call him/her? The background is that I want to make a mod for
[UnCiv](https://github.com/yairm210/Unciv) and would like to incorporate
various motifs of computer life into it, among others the dragon of
WireGuard. Therefore he/she needs, at least for my purposes, a name.
What would you call it?

--
Marek K?the
m.k at mk16.de
er/ihm he/him

From dave.kleikamp at oracle.com  Tue Jul 18 21:02:53 2023
From: dave.kleikamp at oracle.com (Dave Kleikamp)
Date: Tue, 18 Jul 2023 16:02:53 -0500
Subject: [syzbot] [wireguard?] [jfs?] KASAN: slab-use-after-free Read in
 wg_noise_keypair_get
In-Reply-To: <CAHmME9reBny-ufJp58uOg+KdMptircBRhLFd-N2KwxNUp6myTA@mail.gmail.com>
References: <0000000000002bfa570600c477b3@google.com>
 <CAHmME9reBny-ufJp58uOg+KdMptircBRhLFd-N2KwxNUp6myTA@mail.gmail.com>
Message-ID: <97a9c205-2074-07f8-ae9d-9f2b4aebbf9a@oracle.com>

On 7/18/23 12:57PM, Jason A. Donenfeld wrote:
> Freed in:
> 
>   diUnmount+0xf3/0x100 fs/jfs/jfs_imap.c:195
>   jfs_umount+0x186/0x3a0 fs/jfs/jfs_umount.c:63
>   jfs_put_super+0x8a/0x190 fs/jfs/super.c:194
> 
> So maybe not a wg issue?

Maybe not. It could possibly be fixed by:
https://github.com/kleikamp/linux-shaggy/commit/6e2bda2c192d0244b5a78b787ef20aa10cb319b7

Shaggy

From syzbot+96eb4e0d727f0ae998a6 at syzkaller.appspotmail.com  Wed Jul 19 17:52:36 2023
From: syzbot+96eb4e0d727f0ae998a6 at syzkaller.appspotmail.com (syzbot)
Date: Wed, 19 Jul 2023 10:52:36 -0700
Subject: [syzbot] [wireguard?] [jfs?] KASAN: slab-use-after-free Read in
 wg_noise_keypair_get
In-Reply-To: <30f03978-3035-a28e-c097-112036901bcb@nerdbynature.de>
Message-ID: <00000000000069291b0600dab2d6@google.com>

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+96eb4e0d727f0ae998a6 at syzkaller.appspotmail.com

Tested on:

commit:         6e2bda2c jfs: fix invalid free of JFS_IP(ipimap)->i_im..
git tree:       https://github.com/kleikamp/linux-shaggy.git
console output: https://syzkaller.appspot.com/x/log.txt?x=172aecaaa80000
kernel config:  https://syzkaller.appspot.com/x/.config?x=f631232ee56196bd
dashboard link: https://syzkaller.appspot.com/bug?extid=96eb4e0d727f0ae998a6
compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40

Note: no patches were applied.
Note: testing is done by a robot and is best-effort only.

From dxld at darkboxed.org  Fri Jul 21 00:06:43 2023
From: dxld at darkboxed.org (Daniel =?utf-8?Q?Gr=C3=B6ber?=)
Date: Fri, 21 Jul 2023 02:06:43 +0200
Subject: Wg source address is too sticky for multihomed systems aka multiple
 endpoints redux
Message-ID: <20230721000643.44y5pd7sfcjzhbjw@House.clients.dxld.at>

Hi wire-guard, :)

tl;dr: I want to implement multiple peer endpoints to fix the only two
problems haunting me with wireguard.

I have a multihomed router with two public IPv4 addresses plus default
routes in a failover configuration. The setup includes the two default
routes with different metrics and appropriate ip-rule(s) to make traffic
with a preselected source address leave via the correct interface.

On top of this v4 underlay I run a number of wireguard interfaces providing
IPv6 service for my network. Since one of the v4 uplinks is an LTE/5G
router, the main uplink is usually preferable and the (default) route
metrics reflect this.

However, I've observed wireguard continuing to send traffic via the
higher-metric default route after failover events, even after the primary
link and its default route are back.

Source address issues on multihomed hosts have been discussed on the
list multiple times before. See for example:
- https://lists.zx2c4.com/pipermail/wireguard/2023-February/007948.html
- https://lists.zx2c4.com/pipermail/wireguard/2021-October/007205.html
- https://lists.zx2c4.com/pipermail/wireguard/2021-November/007309.html

So I'm certainly not the only one experiencing issues like this.

I set out on a quest to debug this. My first reading of the code indicated
that perhaps the dst_cache is at fault, but after adding some tracing code
it became clear that our endpoint logic is simply broken for multihomed
systems:

The dst_cache gets properly invalidated whenever a route switchover happens,
but when doing a new rt lookup we force the lookup to use the (known good)
src address.

This is deficient because if we run a full route lookup we might get a
different source address (as is the case in my setup). I do think I
understand why we do things like this: we know this source address is
working and the new one could break connectivity. Fair enough.
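The effect described above can be reproduced in a toy model. This is a sketch, not WireGuard's code (the real lookup lives in drivers/net/wireguard/socket.c). The interface names, addresses (RFC 5737 documentation ranges) and metrics are hypothetical, and source-based ip rules are modelled as "only routes whose source matches qualify".

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Route:
    iface: str
    src: str       # preferred source address of this uplink (hypothetical)
    metric: int
    up: bool = True


def lookup(routes: List[Route], saddr: Optional[str] = None) -> Route:
    """Lowest-metric usable route; a pinned saddr restricts the candidates
    to routes with that source, like a `from <saddr>` ip rule would."""
    candidates = [r for r in routes if r.up and (saddr is None or r.src == saddr)]
    if not candidates:
        raise LookupError("no route")
    return min(candidates, key=lambda r: r.metric)


docsis = Route("eth0", "198.51.100.10", metric=100)
lte = Route("wwan0", "203.0.113.20", metric=200)

# 1. Primary link fails: the lookup falls back to LTE and its source is remembered.
cached_src = lookup([replace(docsis, up=False), lte]).src

# 2. Primary link is back; the cache is invalidated, but the new lookup is pinned.
pinned = lookup([docsis, lte], saddr=cached_src)
full = lookup([docsis, lte])
print(pinned.iface, full.iface)  # wwan0 eth0
```

Only the unpinned lookup returns to the primary uplink, which is the gap the following proposal tries to close without risking connectivity.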

So here's a proposal: we introduce a second wg_peer endpoint address for
use with handshakes. This way we can send a handshake using the new source
address and only switch if it succeeds.

I do expect this to be a fair bit of additional logic since we need to deal
with timeouts, retries and such. However, I think this is a good opportunity
to kill two birds with one stone. Hear me out.

I have a second issue with wireguard that's been bugging me for ages:
IPv4/6 non-dual-stack support _sucks_. The kernel only ever knows about one
endpoint address, so if an endpoint (DNS) host resolves to multiple
addresses, there's nothing userspace can easily do to make things work on
IPv4-only *and* IPv6-only networks.

This is kind of the same problem we're having with multihoming though: if
only wireguard could keep track of multiple endpoints (think: dst+src
address pairs).

So my proposal is to just add support for multiple endpoints. There is only
ever one endpoint involved in sending user data but we attempt handshakes
over all endpoints. (Exact logic TBD)

To fix the multihoming issue we then check if the socket.c:sendX rt lookup
returns a different src address from what we're expecting. If so, we clone
the current (dst) endpoint with the new source address and kick off a
handshake over it.
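A rough sketch of the proposed state machine, as I read it: one active endpoint carries user data, extra (dst, src) candidates only carry handshakes, and a candidate is promoted once its handshake completes. Everything here (`Endpoint`, `Peer`, the method names, the addresses) is hypothetical and ignores the timeouts and retries mentioned above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Endpoint:
    dst: str  # peer address
    src: str  # local source address


@dataclass
class Peer:
    active: Endpoint  # the only endpoint used for user data
    candidates: List[Endpoint] = field(default_factory=list)  # handshake-only

    def on_route_lookup(self, looked_up_src: str) -> Optional[Endpoint]:
        """Called with the source a full route lookup picked. If it differs,
        clone the active endpoint with that source and return the clone so
        the caller can send a handshake initiation over it."""
        if looked_up_src == self.active.src:
            return None
        probe = Endpoint(self.active.dst, looked_up_src)
        if probe not in self.candidates:
            self.candidates.append(probe)
        return probe

    def on_handshake_complete(self, ep: Endpoint) -> None:
        """Switch user data to a candidate only after its handshake succeeded."""
        if ep in self.candidates:
            self.candidates.remove(ep)
            self.active = ep


peer = Peer(Endpoint("192.0.2.1", "203.0.113.20"))  # currently stuck on LTE
probe = peer.on_route_lookup("198.51.100.10")        # primary uplink is back
# peer.active is unchanged until the handshake over `probe` completes
peer.on_handshake_complete(probe)
print(peer.active)
```

With several dst addresses (for example the v4 and v6 results for one hostname) in the candidate list, the same mechanism would also cover the dual-stack case.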

Note "multiple endpoints" was suggested before in "[RFC] Handling multiple
endpoints for a single peer" and I agree with most of the design spec
presented in it:
- https://lists.zx2c4.com/pipermail/wireguard/2017-January/000917.html

I would perhaps not go as far as to introduce fancy RTT measurement.
Personally, I have a proper routing daemon (babeld) in userspace using an
RTT metric for that. No need to do this in kernel.

The ability to send "out-of-band" packets to a particular peer mentioned by
Jason in the above mail would actually help routing daemons to cover the
entire failover story, as that's the only limitation currently: I need one
wg tunnel per peer to do routing, but I digress.

Let me know what y'all think, I'd like to start hacking/designing this
ASAP. These things have been the only pain point in an otherwise stellar
user experience with wireguard!

Thanks,
--Daniel

PS: I have found one viable workaround for this source stickiness. `wg set
$iface fwmark $mark` will reset all peer src addresses, but it doesn't stick
at high packet rates because (I think) the incoming packets immediately
overwrite the src address in wg_packet_consume_data_done() via
wg_socket_set_peer_endpoint(). So you have to do it a couple of times
(perhaps in a tight loop) for it to un-stick the source address :)
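The workaround can be scripted. A small sketch, assuming the `wg` tool is installed, the script runs as root, and you re-apply the mark the interface already uses (changing it could interfere with fwmark-based policy routing). The repeat count and delay are arbitrary guesses, not tested values, and the runner is injectable so the commands can be checked without touching a real interface.

```python
import subprocess
import time
from typing import Callable, List


def unstick_commands(iface: str, mark: int, repeats: int = 5) -> List[List[str]]:
    """argv lists that re-apply `wg set <iface> fwmark <mark>` `repeats` times."""
    return [["wg", "set", iface, "fwmark", str(mark)] for _ in range(repeats)]


def unstick(iface: str, mark: int, repeats: int = 5, delay: float = 0.05,
            run: Callable = subprocess.run) -> None:
    # Per the mail, each call clears the peers' cached source addresses, but
    # incoming packets may set them again, so the call is repeated a few times.
    for argv in unstick_commands(iface, mark, repeats):
        run(argv, check=True)
        time.sleep(delay)
```

On a router this would be called as `unstick("wg0", <current mark>)`, with `wg0` standing in for the real interface name.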

From nico.schottelius at ungleich.ch  Fri Jul 21 07:31:33 2023
From: nico.schottelius at ungleich.ch (Nico Schottelius)
Date: Fri, 21 Jul 2023 09:31:33 +0200
Subject: Wg source address is too sticky for multihomed systems aka
 multiple endpoints redux
In-Reply-To: <20230721000643.44y5pd7sfcjzhbjw@House.clients.dxld.at>
References: <20230721000643.44y5pd7sfcjzhbjw@House.clients.dxld.at>
Message-ID: <87351h4rp7.fsf@ungleich.ch>


Good morning,

Daniel Gröber <dxld at darkboxed.org> writes:
> [...]
> I have a multihomed router [...]

Following up on the thread from February, we migrated away from wireguard
to openvpn on systems that are multi homed.

The main reason for that is that the following type of connection fails
with high probability:

1) device -> [NAT/FIREWALL] -> multi homed server [IP A]
2) multi homed server [IP B] -> device: reply blocked by the firewall, as it
does not match the table entry

This always happens when the server has an asymmetric route back to
the originating device, which really depends on the routing tables
or routing policy present on the multi homed server.
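Why step 2 fails can be shown with a toy stateful firewall. This sketch ignores NAT rewriting (addresses are as seen on the firewall's outside) and uses hypothetical documentation addresses; a real connection tracker is far more involved.

```python
from typing import Set, Tuple

Flow = Tuple[str, int, str, int]  # UDP (src_ip, src_port, dst_ip, dst_port)


class StatefulFirewall:
    """Outbound packets create state; inbound packets are accepted only if
    they are the exact reverse of a tracked outbound flow."""

    def __init__(self) -> None:
        self.state: Set[Flow] = set()

    def outbound(self, flow: Flow) -> None:
        self.state.add(flow)

    def inbound_allowed(self, flow: Flow) -> bool:
        src_ip, src_port, dst_ip, dst_port = flow
        return (dst_ip, dst_port, src_ip, src_port) in self.state


fw = StatefulFirewall()
device = ("198.51.100.7", 40000)              # device as seen outside the NAT
ip_a, ip_b = ("192.0.2.1", 51820), ("192.0.2.2", 51820)

fw.outbound((*device, *ip_a))                 # 1) handshake sent to IP A
print(fw.inbound_allowed((*ip_a, *device)))   # reply from IP A: True
print(fw.inbound_allowed((*ip_b, *device)))   # reply from IP B: False
```

The reply only gets through if the server answers from the address the device sent to.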

I'm a big fan of simplicity, but without an equivalent of openvpn's
"local" statement, wireguard is deemed to be unusable in many network
scenarios.

Best regards,

Nico


--
Sustainable and modern Infrastructures by ungleich.ch

From johnalauro at gmail.com  Fri Jul 21 13:47:11 2023
From: johnalauro at gmail.com (John Lauro)
Date: Fri, 21 Jul 2023 09:47:11 -0400
Subject: Wg source address is too sticky for multihomed systems aka
 multiple endpoints redux
In-Reply-To: <87351h4rp7.fsf@ungleich.ch>
References: <20230721000643.44y5pd7sfcjzhbjw@House.clients.dxld.at>
 <87351h4rp7.fsf@ungleich.ch>
Message-ID: <CADGd2Dpxst52_OxMJBO19Qn11hrgxsXVaNnryb9YryipLvX8Gg@mail.gmail.com>

I have a lot of multihomed routers set up for site-to-site VPN and
running BGP over the VPN mesh.

First, make sure these are all 0, as the routers are multihomed:
cat $( find /proc/sys/net/ipv4 -name rp_filter )
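The same check as the `find`/`cat` one-liner, as a small sketch that reports only the offending files. According to the kernel's ip-sysctl documentation, the effective value for an interface is the maximum of `conf/all/rp_filter` and `conf/<iface>/rp_filter`, so a non-zero `all` alone is enough to enable filtering. The `root` parameter exists only so the function can be pointed at a test directory.

```python
from pathlib import Path
from typing import Dict


def nonzero_rp_filters(root: str = "/proc/sys/net/ipv4") -> Dict[str, int]:
    """Map every rp_filter file under `root` whose value is not 0 to that value."""
    found = {}
    for path in sorted(Path(root).rglob("rp_filter")):
        value = int(path.read_text().strip())
        if value != 0:
            found[str(path)] = value
    return found
```

On the router, `print(nonzero_rp_filters())` should print an empty dict once everything is set to 0.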

The other thing I do is run a separate wireguard interface and peer
on a different port for each interface.

With BGP on top, the connection between two multihomed routers just
ends up being multiple links it can route over, letting Linux/BGP
decide which ones to use and automatically fail over if one path goes
down.

That said, I don't have any NAT and both ends have fixed IPs, although
they are multihomed.

Can you create a separate wireguard interface for each physical
interface (I suggest a different port too).  Separate wireguard
interfaces should keep WG from having issues, and of course disabling
rp_filter to keep linux from having issues.


On Fri, Jul 21, 2023 at 4:05 AM Nico Schottelius
<nico.schottelius at ungleich.ch> wrote:
>
>
> Good morning,
>
> Daniel Gröber <dxld at darkboxed.org> writes:
> > [...]
> > I have a multihomed router [...]
>
> Following up on the thread from February, we migrated away from wireguard
> to openvpn on systems that are multi homed.
>
> The main reason for that is that the following type of connection fails
> with high probability:
>
> 1) device -> [NAT/FIREWALL] -> multi homed server [IP A]
> 2) multi homed server [IP B] -> device: reply blocked by the firewall, as it
> does not match the table entry
>
> This always happens when the server has an asymmetric route back to
> the originating device, which really depends on the routing tables
> or routing policy present on the multi homed server.
>
> I'm a big fan of simplicity, but without an equivalent of openvpn's
> "local" statement, wireguard is deemed to be unusable in many network
> scenarios.
>
> Best regards,
>
> Nico
>
>
> --
> Sustainable and modern Infrastructures by ungleich.ch

From larkwang at gmail.com  Sun Jul 23 15:31:56 2023
From: larkwang at gmail.com (Wang Jian)
Date: Sun, 23 Jul 2023 23:31:56 +0800
Subject: WireGuard Mailling List
In-Reply-To: <CACT4Y+bsdEDukXi02RKWuEa-BYajP=4w5B-7KEUqf--HOVPVjg@mail.gmail.com>
References: <20230719091342.789107a9@parrot>
 <CACT4Y+bsdEDukXi02RKWuEa-BYajP=4w5B-7KEUqf--HOVPVjg@mail.gmail.com>
Message-ID: <CAF75rJA4ev7f9L4qm4LgKAe2xS5vyb7z=vrDq0RN_WXbOX_0kw@mail.gmail.com>

Hi,

I don't know the origin of the WireGuard logo, but it seems to be a 蛟
(Jiao, pronounced j-ee-au), not a 龙 (Long).

The differences between Long (known as the Chinese dragon) and Jiao are:
* Jiao has 2 small straight horns or no horn at all, while Long has 2 bigger,
  forked horns
* Long has a beard, while Jiao has no beard
* Jiao has a snake-like tail, while Long has a fish-like tail (with a round fin)
* Long has 4 claws, while Jiao has only 2 claws (foreclaws), but this feature
  is usually misused.

Generally, a Jiao is an evolved snake living in rivers; a Long is an
evolved Jiao living in the sea, and a Long can fly into the sky.

Of course, you can call the logo the WireGuard dragon if you wish.

On Wed, Jul 19, 2023 at 15:22, Dmitry Vyukov <dvyukov at google.com> wrote:
>
> Hi,
>
> I was asked to forward this to the list because the author has
> problems reaching it.
> It looks legit:
>
> ---------- Forwarded message ---------
> From: Marek K?the <m.k at mk16.de>
>
> Hello,
>
> The WireGuard logo is a Chinese dragon. Does he/she have a name? What
> could you call him/her? The background is that I want to make a mod for
> [UnCiv](https://github.com/yairm210/Unciv) and would like to incorporate
> various motifs of computer life into it, among others the dragon of
> WireGuard. Therefore he/she needs, at least for my purposes, a name.
> What would you call it?
>
> --
> Marek K?the
> m.k at mk16.de
> er/ihm he/him

From Jason at zx2c4.com  Sun Jul 23 16:00:00 2023
From: Jason at zx2c4.com (Jason A. Donenfeld)
Date: Sun, 23 Jul 2023 18:00:00 +0200
Subject: WireGuard Mailling List
In-Reply-To: <CAF75rJA4ev7f9L4qm4LgKAe2xS5vyb7z=vrDq0RN_WXbOX_0kw@mail.gmail.com>
References: <20230719091342.789107a9@parrot>
 <CACT4Y+bsdEDukXi02RKWuEa-BYajP=4w5B-7KEUqf--HOVPVjg@mail.gmail.com>
 <CAF75rJA4ev7f9L4qm4LgKAe2xS5vyb7z=vrDq0RN_WXbOX_0kw@mail.gmail.com>
Message-ID: <CAHmME9rzmyZzAETFGm7NGZ9FBhgSueskgmUD7Q17SOSe=zXtzA@mail.gmail.com>

On 7/23/23, Wang Jian <larkwang at gmail.com> wrote:
> Hi,
>
> I don't know the origin of the WireGuard logo, but it seems to be a 蛟
> (Jiao, pronounced j-ee-au), not a 龙 (Long).
>
> The differences between Long (known as the Chinese dragon) and Jiao are:
> * Jiao has 2 small straight horns or no horn at all, while Long has 2 bigger,
>   forked horns
> * Long has a beard, while Jiao has no beard
> * Jiao has a snake-like tail, while Long has a fish-like tail (with a round fin)
> * Long has 4 claws, while Jiao has only 2 claws (foreclaws), but this feature
>   is usually misused.
>
> Generally, a Jiao is an evolved snake living in rivers; a Long is an
> evolved Jiao living in the sea, and a Long can fly into the sky.
>
> Of course, you can call the logo the WireGuard dragon if you wish.

It is absolutely none of those things at all. Please stop speculating
and keep discussions on this here technical.

From sam at gentoo.org  Sun Jul  2 14:04:24 2023
From: sam at gentoo.org (Sam James)
Date: Sun, 02 Jul 2023 14:04:24 -0000
Subject: Fwd: RCU stalls with wireguard over bonding over igb on Linux
 6.3.0+
In-Reply-To: <10f2a5ee-91e2-1241-9e3b-932c493e61b6@leemhuis.info>
Message-ID: <87sfa6if4x.fsf@gentoo.org>

#regzbot link: https://bugs.gentoo.org/909066

From lists at nerdbynature.de  Wed Jul 19 15:02:13 2023
From: lists at nerdbynature.de (Christian Kujau)
Date: Wed, 19 Jul 2023 17:02:13 +0200 (CEST)
Subject: [Jfs-discussion] [syzbot] [wireguard?] [jfs?] KASAN:
 slab-use-after-free Read in wg_noise_keypair_get
In-Reply-To: <97a9c205-2074-07f8-ae9d-9f2b4aebbf9a@oracle.com>
References: <0000000000002bfa570600c477b3@google.com>
 <CAHmME9reBny-ufJp58uOg+KdMptircBRhLFd-N2KwxNUp6myTA@mail.gmail.com>
 <97a9c205-2074-07f8-ae9d-9f2b4aebbf9a@oracle.com>
Message-ID: <30f03978-3035-a28e-c097-112036901bcb@nerdbynature.de>

On Tue, 18 Jul 2023, Dave Kleikamp via Jfs-discussion wrote:
> Maybe not. It could possibly be fixed by:
> https://github.com/kleikamp/linux-shaggy/commit/6e2bda2c192d0244b5a78b787ef20aa10cb319b7

Let's try this:

#syz test: https://github.com/kleikamp/linux-shaggy.git 6e2bda2c192d0244b5a78b787ef20aa10cb319b7

-- 
BOFH excuse #371:

Incorrectly configured static routes on the corerouters.

From n5d9xq3ti233xiyif2vp at protonmail.ch  Sun Jul 23 16:27:29 2023
From: n5d9xq3ti233xiyif2vp at protonmail.ch (Laura Smith)
Date: Sun, 23 Jul 2023 16:27:29 +0000
Subject: WireGuard Mailling List
In-Reply-To: <CAHmME9rzmyZzAETFGm7NGZ9FBhgSueskgmUD7Q17SOSe=zXtzA@mail.gmail.com>
References: <20230719091342.789107a9@parrot>
 <CACT4Y+bsdEDukXi02RKWuEa-BYajP=4w5B-7KEUqf--HOVPVjg@mail.gmail.com>
 <CAF75rJA4ev7f9L4qm4LgKAe2xS5vyb7z=vrDq0RN_WXbOX_0kw@mail.gmail.com>
 <CAHmME9rzmyZzAETFGm7NGZ9FBhgSueskgmUD7Q17SOSe=zXtzA@mail.gmail.com>
Message-ID: <JxWHQoPbe6yp-PphvO9y5GXGICRGLMS_J51rZ869XRdYRGR5r3yj10OmaJBgE6v_ltrWdMjg63dzAo8XtTw5MSWWPGhCsKnz-8qRHg-HaH8=@protonmail.ch>


> It is absolutely none of those things at all. Please stop speculating
> and keep discussions on this here technical.

I mean, the speculating muppets could have at least Googled it!

For the record, for people who are too lazy to Google, Jason already answered the question: https://www.reddit.com/r/linux/comments/hzyu8j/comment/fznp3md/

From dxld at darkboxed.org  Sun Jul 23 17:05:04 2023
From: dxld at darkboxed.org (Daniel =?utf-8?Q?Gr=C3=B6ber?=)
Date: Sun, 23 Jul 2023 19:05:04 +0200
Subject: Wg source address is too sticky for multihomed systems aka
 multiple endpoints redux
In-Reply-To: <CADGd2Dpxst52_OxMJBO19Qn11hrgxsXVaNnryb9YryipLvX8Gg@mail.gmail.com>
References: <20230721000643.44y5pd7sfcjzhbjw@House.clients.dxld.at>
 <87351h4rp7.fsf@ungleich.ch>
 <CADGd2Dpxst52_OxMJBO19Qn11hrgxsXVaNnryb9YryipLvX8Gg@mail.gmail.com>
Message-ID: <20230723170504.srjgry54xkyva4wf@House.clients.dxld.at>

Hi John,

On Fri, Jul 21, 2023 at 09:47:11AM -0400, John Lauro wrote:
> I have a lots of multihomed routers setup for vpn site to site and
> running bgp over the vpn mesh.
> 
> First, make sure these are all 0 as are multihomed.
> cat $( find /proc/sys/net/ipv4 -name rp_filter )

My routers are behind consumer ISPs, so I never get packets that would fail
RPF, and I have RPF upstream of me either way, so this doesn't make a
difference in my case. Like I said, I have ip-rules (PBR) to direct traffic
to the correct interface based on source address to appease upstream's RPF.

> The other thing I do is I run a different wireguard interface and peer
> on a different port and interface.

Same. In order to run a routing daemon on top of wg you pretty much have to
do that currently, as only one peer may have AllowedIPs=::/0 and the routing
daemons don't (yet; I'm working on this for babel) know how to update
AllowedIPs.
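Why only one peer can hold `::/0`: to my understanding, WireGuard's cryptokey routing maps each allowed-IP prefix to exactly one peer, so assigning a prefix to a second peer moves it rather than duplicating it, and outgoing packets go to the peer with the longest matching prefix. A toy model of that behaviour, not the kernel's trie, with hypothetical peer names:

```python
import ipaddress
from typing import Dict, Optional, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


class AllowedIPs:
    """Toy cryptokey-routing table: prefix -> single peer, longest-prefix match."""

    def __init__(self) -> None:
        self.table: Dict[Network, str] = {}

    def add(self, peer: str, prefix: str) -> None:
        # Re-assigning an existing prefix moves it to the new peer.
        self.table[ipaddress.ip_network(prefix)] = peer

    def peer_for(self, addr: str) -> Optional[str]:
        ip = ipaddress.ip_address(addr)
        matches = [n for n in self.table if n.version == ip.version and ip in n]
        if not matches:
            return None
        return self.table[max(matches, key=lambda n: n.prefixlen)]


aips = AllowedIPs()
aips.add("peer-a", "::/0")
aips.add("peer-b", "::/0")                 # peer-a silently loses the default
aips.add("peer-a", "2001:db8:1::/48")
print(aips.peer_for("2001:db8:1::5"), aips.peer_for("2001:db8:2::5"))  # peer-a peer-b
```

A routing daemon that wants to fail over between peers on one interface therefore has to rewrite AllowedIPs, which is why one interface per peer is the usual workaround.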

> With bgp on top, one multihomed router to another multihomed router
> just ends up being multiple links it can route over and let linux/bgp
> decide which ones to use and automatically fail over if one path goes
> down.
> 
> That said, I don't have any NAT and both ends have fixed IPs, although
> they are multihomed.

I'm pretty sure you're not seeing the problem I describe here because your
paths are going to be pretty equivalent, but in my case one is DOCSIS3 and
one is LTE/5G (depends on weather), which is much worse in terms of
bandwidth and latency/jitter consistency. So I can actually see the
difference in applications (video buffering etc.), which is what had me start
debugging in the first place :)

> Can you create a separate wireguard interface for each physical
> interface (I suggest a different port too).  Separate wireguard
> interfaces should keep WG from having issues, and of course disabling
> rp_filter to keep linux from having issues.

Hmm, that might just work since my routing daemon does RTT based routing
and the mobile connection is going to be much worse there. I already have
to deploy two tunnels because of the mentioned v4/v6 dualstack issue, so I'm
not really keen to multiply that number _again_. Besides, my `set fwmark`
workaround does actually legitimately work, but it's ugly as hell :)

> On Fri, Jul 21, 2023 at 4:05 AM Nico Schottelius

/me realizes you were replying to Nico *blush*. See this is why you don't
top-post. Learn some netiquette people :-)

I've actually taken my followup discussion with Nico off-list because I
think it might be a more involved debug session on what's going on in his
setup, which is going to distract from my proposal. I'll send any
conclusions we come to back to the list though.

FYI: I do have a patch to add the necessary debugging code and logs to show
the concrete issue here; I just didn't want to cause information overload
in the initial mail. Just let me know and I'll send those along if there's
any doubt about whether what I describe is the actual issue I'm having. I'm
pretty convinced, but the first rule of the internet is that the problem is
always the X-Y problem~.

Thanks,
--Daniel

From tech at tootai.net  Mon Jul 31 21:39:35 2023
From: tech at tootai.net (Daniel)
Date: Mon, 31 Jul 2023 23:39:35 +0200
Subject: Endpoint failover ip
Message-ID: <e9ddd93a-4517-7411-a5d3-360fa8e1bde6@tootai.net>

Hello,

I created a hostname with a few IPv4 and IPv6 addresses for my wireguard
server. Today I faced a problem: after the IP a customer's wg had
registered with failed, it kept trying to register with that IP instead
of falling back to another one.

Is there a way to avoid this problem and to get failover working
properly with wireguard?

Thanks for any hint
-- 
Daniel

From dxld at darkboxed.org  Mon Jul 31 22:27:44 2023
From: dxld at darkboxed.org (Daniel =?utf-8?Q?Gr=C3=B6ber?=)
Date: Tue, 1 Aug 2023 00:27:44 +0200
Subject: Endpoint failover ip
In-Reply-To: <e9ddd93a-4517-7411-a5d3-360fa8e1bde6@tootai.net>
References: <e9ddd93a-4517-7411-a5d3-360fa8e1bde6@tootai.net>
Message-ID: <20230731222744.5wej7mv5sef57w46@House.clients.dxld.at>

Hi Daniel,

On Mon, Jul 31, 2023 at 11:39:35PM +0200, Daniel wrote:
> I create a hostname with few IPs v4 & v6 for my wireguard server. I faced
> today a problem that after a failure with the ip a customer wg was
> registered, it continue to try to register with this ip insteed to fallback
> to another one.

Your message is hard to parse, but I think you're having the same v4/v6
failover problem as me. See my patch "wg: Support restricting address
family of DNS resolved Endpoint":

  https://lists.zx2c4.com/pipermail/wireguard/2023-February/007961.html

which has yet to get any attention from Jason unfortunately.

The headline is this: wireguard doesn't support multiple endpoints, so you
have to be careful with how you set up your host records. At the moment you
can't just throw multiple IPs in there and hope for the best. Wg will stick
to whatever IP the system picks when the tunnel comes up.
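Until something better lands, one can at least control which address wg gets by resolving the name oneself. This sketch is in the spirit of the address-family patch mentioned above, not a port of it: it picks the first result of the wanted family from `socket.getaddrinfo()`-style output and formats it as a wg endpoint, with IPv6 literals in brackets. The hostname and helper names are hypothetical.

```python
import socket
from typing import Optional, Sequence, Tuple


def pick_endpoint(infos: Sequence[tuple], family: int) -> Optional[Tuple[str, int]]:
    """First (address, port) of `family` in getaddrinfo()-style results."""
    for fam, _type, _proto, _canonname, sockaddr in infos:
        if fam == family:
            return sockaddr[0], sockaddr[1]
    return None


def endpoint_arg(host: str, port: int) -> str:
    """Endpoint syntax for `wg set ... endpoint`: IPv6 literals need brackets."""
    return f"[{host}]:{port}" if ":" in host else f"{host}:{port}"


# Usage (needs DNS; "vpn.example.net" is hypothetical):
#   infos = socket.getaddrinfo("vpn.example.net", 51820, type=socket.SOCK_DGRAM)
#   ep = pick_endpoint(infos, socket.AF_INET)
#   if ep: print(endpoint_arg(*ep))  # then: wg set wg0 peer <key> endpoint <that>
```

This still yields a single endpoint per tunnel; it only makes the choice explicit instead of leaving it to resolver ordering.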

> Is there a way to avoid this problem and to get failover working properly
> with wireguard ?

There isn't any wg-native solution[1] right now, only hacky
workarounds. You basically need one wg tunnel per unique endpoint, but once
you do that routing becomes an issue. Plain static routes won't cut it
anymore. On top of that, using an endpoint domain with multiple IPs is a
problem. Things are easier if you stick to one IP per domain or just
hardcode one endpoint IP for each of the many tunnels.

[1]: Supporting multiple active endpoints is where we have to head to fix
this properly IMO, see my recent proposal
https://lists.zx2c4.com/pipermail/wireguard/2023-July/008111.html

Anyway, with many wg tunnels one could then write a script to ping
through the tunnels and switch the appropriate route to the one that
responds. This has to happen at both ends of the tunnel. Personally, I
just use an easy-to-set-up routing daemon (babeld) to do that.
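A minimal sketch of such a script for one end, assuming Linux with a recent iputils `ping` (which handles IPv6 targets) and iproute2. The tunnel names, peer tunnel addresses and routed prefix are hypothetical. Tunnels are listed in preference order, so traffic moves back to the primary as soon as it answers again; scheduling it periodically, and running the mirror image at the other end, is left out.

```python
import subprocess
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical: tunnel interface -> the peer's address inside that tunnel.
TUNNELS: Dict[str, str] = {"wg-dsl": "fd00:1::1", "wg-lte": "fd00:2::1"}
PREFIX = "fd00:100::/48"  # hypothetical prefix routed via the tunnels


def ping_probe(iface: str) -> bool:
    """One ping through `iface` with a 1 second timeout."""
    argv = ["ping", "-c", "1", "-W", "1", "-I", iface, TUNNELS[iface]]
    return subprocess.run(argv, stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL).returncode == 0


def pick_tunnel(order: Sequence[str], probe: Callable[[str], bool]) -> Optional[str]:
    """First tunnel, in preference order, whose probe succeeds."""
    return next((t for t in order if probe(t)), None)


def route_cmd(prefix: str, iface: str) -> List[str]:
    return ["ip", "route", "replace", prefix, "dev", iface]


def failover_once(probe: Callable[[str], bool] = ping_probe,
                  run: Callable = subprocess.run) -> Optional[str]:
    """Point PREFIX at the best responding tunnel; return its name (or None)."""
    best = pick_tunnel(list(TUNNELS), probe)
    if best is not None:
        run(route_cmd(PREFIX, best), check=True)
    return best
```

A routing daemon such as babeld does the same job continuously and with proper metrics, which is why it is the less fragile option.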

--Daniel

From matteofranzil at gmail.com  Mon Jul 24 05:29:33 2023
From: matteofranzil at gmail.com (Matteo Franzil)
Date: Mon, 24 Jul 2023 05:29:33 -0000
Subject: wg-quick down not reverting DNS parameters on MacOS
In-Reply-To: <7522f2c6-5782-25f5-6f25-75d05d50b868@gmail.com>
References: <7522f2c6-5782-25f5-6f25-75d05d50b868@gmail.com>
Message-ID: <7cde2e10-b27d-0502-1b97-bacdbd9dd4a4@gmail.com>

Hi!

I searched extensively for any existing discussion of this bug (or at
least I hope I did), which has been bugging me for a while.

I am a WireGuard user on macOS Ventura (version 13.4.1 (c)), and
installed WireGuard via the wireguard-tools (version 1.0.20210914) and
wireguard-go (0.0.20230223) packages from brew.

Assume I have set my DNS servers either via GUI or via DHCP (it doesn't
matter how), and I use wg-quick to connect using a wg conf file (the
target server is also irrelevant).

If I:
- use wg-quick to bring up the VPN,
- put my Mac to sleep,
- reopen the lid,
- use wg-quick to stop the VPN,

then the DNS servers are not restored to their original values, and instead
stick to what the VPN configuration had set.

The only workaround is to check which DNS servers are set with scutil
--dns and cat /etc/resolv.conf and fix them by hand, which is a pain. I
often work with the VPN up, and closing the lid without remembering to
turn it off is common.
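Checking by hand can at least be reduced to a before/after comparison. A small sketch, assuming you save a copy of /etc/resolv.conf before `wg-quick up` and compare it with the file after `wg-quick down`. On macOS that file is generated by the system and `scutil --dns` is the more complete view, so this only catches the simple case; the helper names are hypothetical.

```python
from typing import List


def nameservers(resolv_conf: str) -> List[str]:
    """`nameserver` entries from resolv.conf-formatted text, in order."""
    servers = []
    for line in resolv_conf.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def leftover_servers(before: str, after: str) -> List[str]:
    """Servers present after `wg-quick down` that were not there before `up`."""
    original = set(nameservers(before))
    return [s for s in nameservers(after) if s not in original]
```

A non-empty result after `wg-quick down` indicates that the VPN's DNS settings were left behind.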

Let me know if I also need to provide further details.

See also this GitHub issue, which was posted on an unrelated repository 
but perfectly matches what I have just said:

https://github.com/StreisandEffect/streisand/issues/1334


Matteo


From jacob at schooley.com  Fri Jul 21 14:06:23 2023
From: jacob at schooley.com (Jacob Schooley)
Date: Fri, 21 Jul 2023 14:06:23 -0000
Subject: Search domains in iOS client
Message-ID: <CAAvyDFtz85toVPLBk_P0FMQ1BCE2UBOoLRAX2M-YKWYe5BstGA@mail.gmail.com>

Hello all,

I'm trying to figure out how to get search domains working. I
configured mine in the DNS line, but it only seems to work if I have
all traffic routed through the tunnel. If I set AllowedIPs to only
route specified subnets, I can't resolve hostnames without the FQDN.

It appears this was brought up a few years ago:
https://lists.zx2c4.com/pipermail/wireguard/2021-July/006927.html and
it has not been fixed.

Jacob