From liuhangbin at gmail.com Thu Nov 7 02:44:18 2024 From: liuhangbin at gmail.com (Hangbin Liu) Date: Thu, 7 Nov 2024 02:44:18 +0000 Subject: [PATCH net] selftests: wireguard: load nf_conntrack if it's not present Message-ID: <20241107024418.3606-1-liuhangbin@gmail.com> Some distros may not load nf_conntrack by default, which will cause subsequent nf_conntrack settings to fail. Let's load this module if it's not loaded by default. Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Signed-off-by: Hangbin Liu --- tools/testing/selftests/wireguard/netns.sh | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/testing/selftests/wireguard/netns.sh b/tools/testing/selftests/wireguard/netns.sh index 405ff262ca93..508b391e8d9a 100755 --- a/tools/testing/selftests/wireguard/netns.sh +++ b/tools/testing/selftests/wireguard/netns.sh @@ -66,6 +66,7 @@ cleanup() { orig_message_cost="$(< /proc/sys/net/core/message_cost)" trap cleanup EXIT printf 0 > /proc/sys/net/core/message_cost +lsmod | grep -q nf_conntrack || modprobe nf_conntrack ip netns del $netns0 2>/dev/null || true ip netns del $netns1 2>/dev/null || true -- 2.46.0 From liuhangbin at gmail.com Thu Nov 7 02:54:38 2024 From: liuhangbin at gmail.com (Hangbin Liu) Date: Thu, 7 Nov 2024 02:54:38 +0000 Subject: [PATCH net-next] selftests: wireguards: use nft by default Message-ID: <20241107025438.3766-1-liuhangbin@gmail.com> Use nft by default if it's supported, as nft is the replacement for iptables, which is used by default in some releases. Additionally, iptables is dropped in some releases. Signed-off-by: Hangbin Liu --- CC nft developers to see if there are any easier configurations, as I'm not very familiar with nft commands. 
--- tools/testing/selftests/wireguard/netns.sh | 63 ++++++++++++++++++---- 1 file changed, 53 insertions(+), 10 deletions(-) diff --git a/tools/testing/selftests/wireguard/netns.sh b/tools/testing/selftests/wireguard/netns.sh index 405ff262ca93..4e29c1a7003c 100755 --- a/tools/testing/selftests/wireguard/netns.sh +++ b/tools/testing/selftests/wireguard/netns.sh @@ -44,6 +44,7 @@ sleep() { read -t "$1" -N 1 || true; } waitiperf() { pretty "${1//*-}" "wait for iperf:${3:-5201} pid $2"; while [[ $(ss -N "$1" -tlpH "sport = ${3:-5201}") != *\"iperf3\",pid=$2,fd=* ]]; do sleep 0.1; done; } waitncatudp() { pretty "${1//*-}" "wait for udp:1111 pid $2"; while [[ $(ss -N "$1" -ulpH 'sport = 1111') != *\"ncat\",pid=$2,fd=* ]]; do sleep 0.1; done; } waitiface() { pretty "${1//*-}" "wait for $2 to come up"; ip netns exec "$1" bash -c "while [[ \$(< \"/sys/class/net/$2/operstate\") != up ]]; do read -t .1 -N 0 || true; done;"; } +use_nft() { nft --version &> /dev/null; } cleanup() { set +e @@ -196,13 +197,23 @@ ip1 link set wg0 mtu 1300 ip2 link set wg0 mtu 1300 n1 wg set wg0 peer "$pub2" endpoint 127.0.0.1:2 n2 wg set wg0 peer "$pub1" endpoint 127.0.0.1:1 -n0 iptables -A INPUT -m length --length 1360 -j DROP +if use_nft; then + n0 nft add table inet filter + n0 nft add chain inet filter INPUT { type filter hook input priority filter \; policy accept \; } + n0 nft add rule inet filter INPUT meta length 1360 counter drop +else + n0 iptables -A INPUT -m length --length 1360 -j DROP +fi n1 ip route add 192.168.241.2/32 dev wg0 mtu 1299 n2 ip route add 192.168.241.1/32 dev wg0 mtu 1299 n2 ping -c 1 -W 1 -s 1269 192.168.241.1 n2 ip route delete 192.168.241.1/32 dev wg0 mtu 1299 n1 ip route delete 192.168.241.2/32 dev wg0 mtu 1299 -n0 iptables -F INPUT +if use_nft; then + n0 nft delete table inet filter +else + n0 iptables -F INPUT +fi ip1 link set wg0 mtu $orig_mtu ip2 link set wg0 mtu $orig_mtu @@ -334,7 +345,13 @@ waitiface $netns2 veths n0 bash -c 'printf 1 > 
/proc/sys/net/ipv4/ip_forward' n0 bash -c 'printf 2 > /proc/sys/net/netfilter/nf_conntrack_udp_timeout' n0 bash -c 'printf 2 > /proc/sys/net/netfilter/nf_conntrack_udp_timeout_stream' -n0 iptables -t nat -A POSTROUTING -s 192.168.1.0/24 -d 10.0.0.0/24 -j SNAT --to 10.0.0.1 +if use_nft; then + n0 nft add table inet nat + n0 nft add chain inet nat POSTROUTING { type nat hook postrouting priority srcnat\; policy accept \; } + n0 nft add rule inet nat POSTROUTING ip saddr 192.168.1.0/24 ip daddr 10.0.0.0/24 counter snat to 10.0.0.1 +else + n0 iptables -t nat -A POSTROUTING -s 192.168.1.0/24 -d 10.0.0.0/24 -j SNAT --to 10.0.0.1 +fi n1 wg set wg0 peer "$pub2" endpoint 10.0.0.100:2 persistent-keepalive 1 n1 ping -W 1 -c 1 192.168.241.2 @@ -348,10 +365,20 @@ n1 wg set wg0 peer "$pub2" persistent-keepalive 0 # Test that sk_bound_dev_if works n1 ping -I wg0 -c 1 -W 1 192.168.241.2 # What about when the mark changes and the packet must be rerouted? -n1 iptables -t mangle -I OUTPUT -j MARK --set-xmark 1 +if use_nft; then + n1 nft add table inet mangle + n1 nft add chain inet mangle OUTPUT { type route hook output priority mangle\; policy accept \; } + n1 nft add rule inet mangle OUTPUT counter meta mark set 0x1 +else + n1 iptables -t mangle -I OUTPUT -j MARK --set-xmark 1 +fi n1 ping -c 1 -W 1 192.168.241.2 # First the boring case n1 ping -I wg0 -c 1 -W 1 192.168.241.2 # Then the sk_bound_dev_if case -n1 iptables -t mangle -D OUTPUT -j MARK --set-xmark 1 +if use_nft; then + n1 nft delete table inet mangle +else + n1 iptables -t mangle -D OUTPUT -j MARK --set-xmark 1 +fi # Test that onion routing works, even when it loops n1 wg set wg0 peer "$pub3" allowed-ips 192.168.242.2/32 endpoint 192.168.241.2:5 @@ -385,16 +412,32 @@ n1 ping -W 1 -c 100 -f 192.168.99.7 n1 ping -W 1 -c 100 -f abab::1111 # Have ns2 NAT into wg0 packets from ns0, but return an icmp error along the right route. 
-n2 iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -d 192.168.241.0/24 -j SNAT --to 192.168.241.2 -n0 iptables -t filter -A INPUT \! -s 10.0.0.0/24 -i vethrs -j DROP # Manual rpfilter just to be explicit. +if use_nft; then + n2 nft add table inet nat + n2 nft add chain inet nat POSTROUTING { type nat hook postrouting priority srcnat\; policy accept \; } + n2 nft add rule inet nat POSTROUTING ip saddr 10.0.0.0/24 ip daddr 192.168.241.0/24 counter snat to 192.168.241.2 + + n0 nft add table inet filter + n0 nft add chain inet filter INPUT { type filter hook input priority filter \; policy accept \; } + n0 nft add rule inet filter INPUT iifname "vethrs" ip saddr != 10.0.0.0/24 counter drop +else + n2 iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -d 192.168.241.0/24 -j SNAT --to 192.168.241.2 + n0 iptables -t filter -A INPUT \! -s 10.0.0.0/24 -i vethrs -j DROP # Manual rpfilter just to be explicit. +fi n2 bash -c 'printf 1 > /proc/sys/net/ipv4/ip_forward' ip0 -4 route add 192.168.241.1 via 10.0.0.100 n2 wg set wg0 peer "$pub1" remove [[ $(! 
n0 ping -W 1 -c 1 192.168.241.1 || false) == *"From 10.0.0.100 icmp_seq=1 Destination Host Unreachable"* ]]
-n0 iptables -t nat -F
-n0 iptables -t filter -F
-n2 iptables -t nat -F
+if use_nft; then
+	n0 nft delete table inet nat
+	n0 nft delete table inet filter
+	n2 nft delete table inet nat
+else
+	n0 iptables -t nat -F
+	n0 iptables -t filter -F
+	n2 iptables -t nat -F
+fi
 ip0 link del vethrc
 ip0 link del vethrs
 ip1 link del wg0
-- 
2.46.0

From horms at kernel.org Sun Nov 10 13:42:30 2024
From: horms at kernel.org (Simon Horman)
Date: Sun, 10 Nov 2024 13:42:30 +0000
Subject: [PATCH net] selftests: wireguard: load nf_conntrack if it's not present
In-Reply-To: <20241107024418.3606-1-liuhangbin@gmail.com>
References: <20241107024418.3606-1-liuhangbin@gmail.com>
Message-ID: <20241110134230.GR4507@kernel.org>

On Thu, Nov 07, 2024 at 02:44:18AM +0000, Hangbin Liu wrote:
> Some distros may not load nf_conntrack by default, which will cause
> subsequent nf_conntrack settings to fail. Let's load this module if it's
> not loaded by default.
>
> Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
> Signed-off-by: Hangbin Liu
> ---
> tools/testing/selftests/wireguard/netns.sh | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/tools/testing/selftests/wireguard/netns.sh b/tools/testing/selftests/wireguard/netns.sh
> index 405ff262ca93..508b391e8d9a 100755
> --- a/tools/testing/selftests/wireguard/netns.sh
> +++ b/tools/testing/selftests/wireguard/netns.sh
> @@ -66,6 +66,7 @@ cleanup() {
> orig_message_cost="$(< /proc/sys/net/core/message_cost)"
> trap cleanup EXIT
> printf 0 > /proc/sys/net/core/message_cost
> +lsmod | grep -q nf_conntrack || modprobe nf_conntrack

Hi Hangbin,

As modprobe should be idempotent both for the case where nf_conntrack is
built-in (I'm unsure if that case can ever occur) and the case where the
module has already been inserted, I think you can simply use:

	modprobe nf_conntrack

Of course, if nf_conntrack isn't available at all, then this will fail.
But that was the case with your patch too. And so I assume it is intended. > > ip netns del $netns0 2>/dev/null || true > ip netns del $netns1 2>/dev/null || true > -- > 2.46.0 > > From liuhangbin at gmail.com Mon Nov 11 04:19:02 2024 From: liuhangbin at gmail.com (Hangbin Liu) Date: Mon, 11 Nov 2024 04:19:02 +0000 Subject: [PATCHv2 net-next] selftests: wireguards: use nft by default Message-ID: <20241111041902.25814-1-liuhangbin@gmail.com> Use nft by default if it's supported, as nft is the replacement for iptables, which is used by default in some releases. Additionally, iptables is dropped in some releases. Signed-off-by: Hangbin Liu --- v2: use one nft table for testing (Phil Sutter) --- tools/testing/selftests/wireguard/netns.sh | 63 ++++++++++++++++++---- 1 file changed, 53 insertions(+), 10 deletions(-) diff --git a/tools/testing/selftests/wireguard/netns.sh b/tools/testing/selftests/wireguard/netns.sh index 405ff262ca93..be4e3b13ed22 100755 --- a/tools/testing/selftests/wireguard/netns.sh +++ b/tools/testing/selftests/wireguard/netns.sh @@ -44,6 +44,7 @@ sleep() { read -t "$1" -N 1 || true; } waitiperf() { pretty "${1//*-}" "wait for iperf:${3:-5201} pid $2"; while [[ $(ss -N "$1" -tlpH "sport = ${3:-5201}") != *\"iperf3\",pid=$2,fd=* ]]; do sleep 0.1; done; } waitncatudp() { pretty "${1//*-}" "wait for udp:1111 pid $2"; while [[ $(ss -N "$1" -ulpH 'sport = 1111') != *\"ncat\",pid=$2,fd=* ]]; do sleep 0.1; done; } waitiface() { pretty "${1//*-}" "wait for $2 to come up"; ip netns exec "$1" bash -c "while [[ \$(< \"/sys/class/net/$2/operstate\") != up ]]; do read -t .1 -N 0 || true; done;"; } +use_nft() { nft --version &> /dev/null; } cleanup() { set +e @@ -75,6 +76,12 @@ pp ip netns add $netns1 pp ip netns add $netns2 ip0 link set up dev lo +if use_nft; then + n0 nft add table ip wgtest + n1 nft add table ip wgtest + n2 nft add table ip wgtest +fi + ip0 link add dev wg0 type wireguard ip0 link set wg0 netns $netns1 ip0 link add dev wg0 type wireguard @@ 
-196,13 +203,22 @@ ip1 link set wg0 mtu 1300 ip2 link set wg0 mtu 1300 n1 wg set wg0 peer "$pub2" endpoint 127.0.0.1:2 n2 wg set wg0 peer "$pub1" endpoint 127.0.0.1:1 -n0 iptables -A INPUT -m length --length 1360 -j DROP +if use_nft; then + n0 nft add chain ip wgtest INPUT { type filter hook input priority filter \; policy accept \; } + n0 nft add rule ip wgtest INPUT meta length 1360 counter drop +else + n0 iptables -A INPUT -m length --length 1360 -j DROP +fi n1 ip route add 192.168.241.2/32 dev wg0 mtu 1299 n2 ip route add 192.168.241.1/32 dev wg0 mtu 1299 n2 ping -c 1 -W 1 -s 1269 192.168.241.1 n2 ip route delete 192.168.241.1/32 dev wg0 mtu 1299 n1 ip route delete 192.168.241.2/32 dev wg0 mtu 1299 -n0 iptables -F INPUT +if use_nft; then + n0 nft flush table ip wgtest +else + n0 iptables -F INPUT +fi ip1 link set wg0 mtu $orig_mtu ip2 link set wg0 mtu $orig_mtu @@ -334,7 +350,12 @@ waitiface $netns2 veths n0 bash -c 'printf 1 > /proc/sys/net/ipv4/ip_forward' n0 bash -c 'printf 2 > /proc/sys/net/netfilter/nf_conntrack_udp_timeout' n0 bash -c 'printf 2 > /proc/sys/net/netfilter/nf_conntrack_udp_timeout_stream' -n0 iptables -t nat -A POSTROUTING -s 192.168.1.0/24 -d 10.0.0.0/24 -j SNAT --to 10.0.0.1 +if use_nft; then + n0 nft add chain ip wgtest POSTROUTING { type nat hook postrouting priority srcnat\; policy accept \; } + n0 nft add rule ip wgtest POSTROUTING ip saddr 192.168.1.0/24 ip daddr 10.0.0.0/24 counter snat to 10.0.0.1 +else + n0 iptables -t nat -A POSTROUTING -s 192.168.1.0/24 -d 10.0.0.0/24 -j SNAT --to 10.0.0.1 +fi n1 wg set wg0 peer "$pub2" endpoint 10.0.0.100:2 persistent-keepalive 1 n1 ping -W 1 -c 1 192.168.241.2 @@ -348,10 +369,19 @@ n1 wg set wg0 peer "$pub2" persistent-keepalive 0 # Test that sk_bound_dev_if works n1 ping -I wg0 -c 1 -W 1 192.168.241.2 # What about when the mark changes and the packet must be rerouted? 
-n1 iptables -t mangle -I OUTPUT -j MARK --set-xmark 1 +if use_nft; then + n1 nft add chain ip wgtest OUTPUT { type route hook output priority mangle\; policy accept \; } + n1 nft add rule ip wgtest OUTPUT counter meta mark set 0x1 +else + n1 iptables -t mangle -I OUTPUT -j MARK --set-xmark 1 +fi n1 ping -c 1 -W 1 192.168.241.2 # First the boring case n1 ping -I wg0 -c 1 -W 1 192.168.241.2 # Then the sk_bound_dev_if case -n1 iptables -t mangle -D OUTPUT -j MARK --set-xmark 1 +if use_nft; then + n1 nft flush table ip wgtest +else + n1 iptables -t mangle -D OUTPUT -j MARK --set-xmark 1 +fi # Test that onion routing works, even when it loops n1 wg set wg0 peer "$pub3" allowed-ips 192.168.242.2/32 endpoint 192.168.241.2:5 @@ -385,16 +415,29 @@ n1 ping -W 1 -c 100 -f 192.168.99.7 n1 ping -W 1 -c 100 -f abab::1111 # Have ns2 NAT into wg0 packets from ns0, but return an icmp error along the right route. -n2 iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -d 192.168.241.0/24 -j SNAT --to 192.168.241.2 -n0 iptables -t filter -A INPUT \! -s 10.0.0.0/24 -i vethrs -j DROP # Manual rpfilter just to be explicit. +if use_nft; then + n2 nft add chain ip wgtest POSTROUTING { type nat hook postrouting priority srcnat\; policy accept \; } + n2 nft add rule ip wgtest POSTROUTING ip saddr 10.0.0.0/24 ip daddr 192.168.241.0/24 counter snat to 192.168.241.2 + + n0 nft add chain ip wgtest INPUT { type filter hook input priority filter \; policy accept \; } + n0 nft add rule ip wgtest INPUT iifname "vethrs" ip saddr != 10.0.0.0/24 counter drop +else + n2 iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -d 192.168.241.0/24 -j SNAT --to 192.168.241.2 + n0 iptables -t filter -A INPUT \! -s 10.0.0.0/24 -i vethrs -j DROP # Manual rpfilter just to be explicit. +fi n2 bash -c 'printf 1 > /proc/sys/net/ipv4/ip_forward' ip0 -4 route add 192.168.241.1 via 10.0.0.100 n2 wg set wg0 peer "$pub1" remove [[ $(! 
n0 ping -W 1 -c 1 192.168.241.1 || false) == *"From 10.0.0.100 icmp_seq=1 Destination Host Unreachable"* ]]
-n0 iptables -t nat -F
-n0 iptables -t filter -F
-n2 iptables -t nat -F
+if use_nft; then
+	n0 nft flush table ip wgtest
+	n2 nft flush table ip wgtest
+else
+	n0 iptables -t nat -F
+	n0 iptables -t filter -F
+	n2 iptables -t nat -F
+fi
 ip0 link del vethrc
 ip0 link del vethrs
 ip1 link del wg0
-- 
2.39.5 (Apple Git-154)

From liuhangbin at gmail.com Tue Nov 12 09:48:27 2024
From: liuhangbin at gmail.com (Hangbin Liu)
Date: Tue, 12 Nov 2024 09:48:27 +0000
Subject: [PATCHv2 net] selftests: wireguard: load nf_conntrack if it's not present
Message-ID: <20241112094828.391002-1-liuhangbin@gmail.com>

Some distros may not load nf_conntrack by default, which will cause
subsequent nf_conntrack settings to fail. Let's load this module if it's
not loaded by default.

Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Hangbin Liu
---
v2: load the module directly in case nf_conntrack is built in (Simon Horman)
---
 tools/testing/selftests/wireguard/netns.sh | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/testing/selftests/wireguard/netns.sh b/tools/testing/selftests/wireguard/netns.sh
index 405ff262ca93..fa4dd7eb5918 100755
--- a/tools/testing/selftests/wireguard/netns.sh
+++ b/tools/testing/selftests/wireguard/netns.sh
@@ -66,6 +66,7 @@ cleanup() {
 orig_message_cost="$(< /proc/sys/net/core/message_cost)"
 trap cleanup EXIT
 printf 0 > /proc/sys/net/core/message_cost
+modprobe nf_conntrack
 
 ip netns del $netns0 2>/dev/null || true
 ip netns del $netns1 2>/dev/null || true
-- 
2.39.5 (Apple Git-154)

From horms at kernel.org Wed Nov 13 09:48:32 2024
From: horms at kernel.org (Simon Horman)
Date: Wed, 13 Nov 2024 09:48:32 +0000
Subject: [PATCHv2 net] selftests: wireguard: load nf_conntrack if it's not present
In-Reply-To: <20241112094828.391002-1-liuhangbin@gmail.com>
References: <20241112094828.391002-1-liuhangbin@gmail.com>
Message-ID:
<20241113094832.GW4507@kernel.org>

On Tue, Nov 12, 2024 at 09:48:27AM +0000, Hangbin Liu wrote:
> Some distros may not load nf_conntrack by default, which will cause
> subsequent nf_conntrack settings to fail. Let's load this module if it's
> not loaded by default.
>
> Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
> Signed-off-by: Hangbin Liu
> ---
> v2: load the module directly in case nf_conntrack is built in (Simon Horman)

Thanks for the update.

Reviewed-by: Simon Horman

From kuba at kernel.org Fri Nov 15 23:35:26 2024
From: kuba at kernel.org (Jakub Kicinski)
Date: Fri, 15 Nov 2024 15:35:26 -0800
Subject: [PATCH net-next v3 4/6] rtnetlink: Decouple net namespaces in rtnl_newlink_create()
In-Reply-To: <20241113125715.150201-5-shaw.leon@gmail.com>
References: <20241113125715.150201-1-shaw.leon@gmail.com> <20241113125715.150201-5-shaw.leon@gmail.com>
Message-ID: <20241115153526.3582ebcd@kernel.org>

On Wed, 13 Nov 2024 20:57:13 +0800 Xiao Liang wrote:
> +/**
> + * struct rtnl_link_nets - net namespace context of newlink.
> + *
> + * @src_net: Source netns of rtnetlink socket
> + * @link_net: Link netns by IFLA_LINK_NETNSID, NULL if not specified.
> + */
> +struct rtnl_link_nets {
> +	struct net *src_net;
> +	struct net *link_net;
> +};

Let's not limit ourselves to passing just netns via this struct.
Let's call it rtnl_newlink_args or params.

The first patch of the series got merged independently so you'll need
to respin.
-- 
pw-bot: cr

From Jason at zx2c4.com Sun Nov 17 20:06:18 2024
From: Jason at zx2c4.com (Jason A. Donenfeld)
Date: Sun, 17 Nov 2024 21:06:18 +0100
Subject: [PATCH net-next] wireguard: allowedips: Fix useless call issue
In-Reply-To: <20241115110721.22932-1-dheeraj.linuxdev@gmail.com>
References: <20241115110721.22932-1-dheeraj.linuxdev@gmail.com>
Message-ID:

On Fri, Nov 15, 2024 at 04:37:21PM +0530, Dheeraj Reddy Jonnalagadda wrote:
> This commit fixes a useless call issue detected
> by Coverity (CID 1508092).
The call to > horrible_allowedips_lookup_v4 is unnecessary as > its return value is never checked. Applied to the wireguard tree, thanks. From Jason at zx2c4.com Sun Nov 17 20:09:00 2024 From: Jason at zx2c4.com (Jason A. Donenfeld) Date: Sun, 17 Nov 2024 21:09:00 +0100 Subject: [PATCHv2 net-next] selftests: wireguards: use nft by default In-Reply-To: <20241111041902.25814-1-liuhangbin@gmail.com> References: <20241111041902.25814-1-liuhangbin@gmail.com> Message-ID: On Mon, Nov 11, 2024 at 04:19:02AM +0000, Hangbin Liu wrote: > Use nft by default if it's supported, as nft is the replacement for iptables, > which is used by default in some releases. Additionally, iptables is dropped > in some releases. Rather than having this optionality, I'd rather just do everything in one way or the other. So if you're adamant that we need to use nft, just convert the whole thing. And then subsequently, make sure that the qemu test harness supports it. That should probably be a series. Jason From Jason at zx2c4.com Sun Nov 17 20:11:52 2024 From: Jason at zx2c4.com (Jason A. Donenfeld) Date: Sun, 17 Nov 2024 21:11:52 +0100 Subject: [PATCHv2 net] selftests: wireguard: load nf_conntrack if it's not present In-Reply-To: <20241112094828.391002-1-liuhangbin@gmail.com> References: <20241112094828.391002-1-liuhangbin@gmail.com> Message-ID: On Tue, Nov 12, 2024 at 09:48:27AM +0000, Hangbin Liu wrote: > Some distros may not load nf_conntrack by default, which will cause > subsequent nf_conntrack settings to fail. Let's load this module if it's > not loaded by default. > > Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") > Signed-off-by: Hangbin Liu Applied, thanks. From Jason at zx2c4.com Sun Nov 17 20:50:27 2024 From: Jason at zx2c4.com (Jason A. 
Donenfeld)
Date: Sun, 17 Nov 2024 21:50:27 +0100
Subject: [PATCH v2 net-next] wireguard: allowedips: Add WGALLOWEDIP_F_REMOVE_ME flag
In-Reply-To: <20240905200551.4099064-1-jrife@google.com>
References: <20240905200551.4099064-1-jrife@google.com>
Message-ID:

Hi Jordan,

Sorry it's taken me a bit to look at this patch. A few notes:

On Thu, Sep 05, 2024 at 03:05:41PM -0500, Jordan Rife wrote:
> With the current API the only way to remove an allowed IP is to
> completely rebuild the allowed IPs set for a peer using
> WGPEER_F_REPLACE_ALLOWEDIPS. In other words, if my current configuration
> is such that a peer has allowed IPs 192.168.0.2 and 192.168.0.3 and I
> want to remove 192.168.0.2 the actual transition looks like this.
>
>     [192.168.0.2, 192.168.0.3] <-- Initial state
>     []                         <-- Step 1: Allowed IPs removed for peer
>     [192.168.0.3]              <-- Step 2: Allowed IPs added back for peer
>
> This is true even if the allowed IP list is small and the update does
> not need to be batched into multiple WG_CMD_SET_DEVICE requests, as
> the removal and subsequent addition of IPs is non-atomic within a single
> request. Consequently, wg_allowedips_lookup_dst and
> wg_allowedips_lookup_src may return NULL while reconfiguring a peer even
> for packets bound for IPs a user did not intend to remove leading to
> unintended interruptions in connectivity. This presents in userspace as
> failed calls to sendto and sendmsg for UDP sockets. In my case, I ran
> netperf while repeatedly reconfiguring the allowed IPs for a peer with
> wg.
>
>     /usr/local/bin/netperf -H 10.102.73.72 -l 10m -t UDP_STREAM -- -R 1 -m 1024
>     send_data: data send error: No route to host (errno 113)
>     netperf: send_omni: send_data failed: No route to host

That's a worthwhile point. This indeed fixes a *bug*, beyond being just
a helpful new feature.

> incrementally updated.
> This patch also bumps WG_GENL_VERSION which can
> be used by clients to detect whether or not their system supports the
> WGALLOWEDIP_F_REMOVE_ME flag.

I'm actually less enthusiastic about this part, but mainly because I
haven't looked closely at what the convention for this is. I was
wondering though - this adds WGALLOWEDIP_A_FLAGS, which didn't exist
before. Shouldn't some upper layer return a relevant value in that case?
And even within the existing flags, for WGPEER_A_FLAGS, for example, old
kernels check to see if there are new flags, for this purpose, e.g.:

    if (attrs[WGPEER_A_FLAGS])
        flags = nla_get_u32(attrs[WGPEER_A_FLAGS]);
    ret = -EOPNOTSUPP;
    if (flags & ~__WGPEER_F_ALL)
        goto out;

So I think we might be able to avoid having to bump the version number.
GENL is supposed to be extensible like this.

> +static void _remove(struct allowedips_node *node, struct mutex *lock)

This file doesn't really do the _ prefix thing anywhere. Can you call
this something more descriptive, like "remove_node"?

> -	if (free_parent)
> -		child = rcu_dereference_protected(
> -			parent->bit[!(node->parent_bit_packed & 1)],
> -			lockdep_is_held(lock));
[...]
> +	if (free_parent)
> +		child = rcu_dereference_protected(parent->bit[!(node->parent_bit_packed & 1)],
> +						  lockdep_is_held(lock));

Thanks for fixing up the ugly extra \n in the original code you copy and
pasted from. I remember the horror of that when I added line breaks in
the original code.
> +	call_rcu(&node->rcu, node_free_rcu);
> +	if (!free_parent)
> +		return;
> +	if (child)
> +		child->parent_bit_packed = parent->parent_bit_packed;
> +	*(struct allowedips_node **)(parent->parent_bit_packed & ~3UL) = child;
> +	call_rcu(&parent->rcu, node_free_rcu);
> +}
> +
> +static int remove(struct allowedips_node __rcu **trie, u8 bits, const u8 *key,
> +		  u8 cidr, struct wg_peer *peer, struct mutex *lock)
> +{
> +	struct allowedips_node *node;
> +
> +	if (unlikely(cidr > bits || !peer))
> +		return -EINVAL;

Reasoning for this is that it copies the logic in add()?

> +	if (!rcu_access_pointer(*trie) ||
> +	    !node_placement(*trie, key, cidr, bits, &node, lock) ||
> +	    peer != rcu_access_pointer(node->peer))
> +		return 0;

What's the reasoning behind returning success when it can't find the
node? Because in that case it's already removed so the function is
idempotent? And you checked that nothing really cares about the return
value there anyway? Or is this a mistake and you meant to return
something else? I can imagine good reasoning in either direction; I'd
just like to learn that your choice is deliberate.

> +
> +	_remove(node, lock);
> +
> +	return 0;
> +}
> +
> 	family = nla_get_u16(attrs[WGALLOWEDIP_A_FAMILY]);
> 	cidr = nla_get_u8(attrs[WGALLOWEDIP_A_CIDR_MASK]);
> +	if (attrs[WGALLOWEDIP_A_FLAGS])
> +		flags = nla_get_u32(attrs[WGALLOWEDIP_A_FLAGS]);

As I mentioned above, you need to do the dance of:

    ret = -EOPNOTSUPP;
    if (flags & ~__WGALLOWEDIP_F_ALL)
        goto out;

So that we can safely extend this later.
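
To make that check concrete, here is a minimal userspace sketch of the
pattern. The flag value, the enum layout and the helper name
check_allowedip_flags() are assumptions made up for illustration; they
are not taken from the patch, and the real definitions would belong in
the uapi header.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed values for illustration only, not copied from the patch. */
enum wgallowedip_flag {
	WGALLOWEDIP_F_REMOVE_ME = 1U << 0,
	/* Mask of every flag this build knows about. */
	__WGALLOWEDIP_F_ALL = WGALLOWEDIP_F_REMOVE_ME
};

/*
 * Hypothetical helper: returns 0 when every set bit is a known flag,
 * and -EOPNOTSUPP as soon as an unknown bit is present.
 */
static int check_allowedip_flags(uint32_t flags)
{
	if (flags & ~(uint32_t)__WGALLOWEDIP_F_ALL)
		return -EOPNOTSUPP;
	return 0;
}
```

With a check like this, a request carrying only WGALLOWEDIP_F_REMOVE_ME
passes, while any bit outside __WGALLOWEDIP_F_ALL is rejected with
-EOPNOTSUPP rather than silently ignored, which is what lets userspace
probe for newer flags without a version bump.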
> > if (family == AF_INET && cidr <= 32 && > - nla_len(attrs[WGALLOWEDIP_A_IPADDR]) == sizeof(struct in_addr)) > - ret = wg_allowedips_insert_v4( > - &peer->device->peer_allowedips, > - nla_data(attrs[WGALLOWEDIP_A_IPADDR]), cidr, peer, > - &peer->device->device_update_lock); > - else if (family == AF_INET6 && cidr <= 128 && > - nla_len(attrs[WGALLOWEDIP_A_IPADDR]) == sizeof(struct in6_addr)) > - ret = wg_allowedips_insert_v6( > - &peer->device->peer_allowedips, > - nla_data(attrs[WGALLOWEDIP_A_IPADDR]), cidr, peer, > - &peer->device->device_update_lock); > + nla_len(attrs[WGALLOWEDIP_A_IPADDR]) == sizeof(struct in_addr)) { > + if (flags & WGALLOWEDIP_F_REMOVE_ME) > + ret = wg_allowedips_remove_v4(&peer->device->peer_allowedips, > + nla_data(attrs[WGALLOWEDIP_A_IPADDR]), > + cidr, > + peer, > + &peer->device->device_update_lock); We get 100 chars now, so you can rewrite this as: ret = wg_allowedips_remove_v4(&peer->device->peer_allowedips, nla_data(attrs[WGALLOWEDIP_A_IPADDR]), cidr, peer, &peer->device->device_update_lock); > + * WGALLOWEDIP_A_FLAGS: NLA_U32, WGALLOWEDIP_F_REMOVE_ME if > + * the specified IP should be removed, That comma should be a semicolon because what comes after is a complete sentence, and there's no conjunction. > + * otherwise this IP will be added if > + * it is not already present. 
> +remove-ip: > + gcc -I/usr/include/libnl3 \ > + -I../../../../usr/include \ > + remove-ip.c \ > + -o remove-ip \ > + -lnl-genl-3 \ > + -lnl-3 > > + sock = nl_socket_alloc(); > + genl_connect(sock); > + family = genl_ctrl_resolve(sock, WG_GENL_NAME); > + msg = nlmsg_alloc(); > + genlmsg_put(msg, NL_AUTO_PID, NL_AUTO_SEQ, family, 0, NLM_F_ECHO, > + WG_CMD_SET_DEVICE, WG_GENL_VERSION); > + nla_put_string(msg, WGDEVICE_A_IFNAME, argv[1]); > + > + struct nlattr *peers = nla_nest_start(msg, WGDEVICE_A_PEERS); > + struct nlattr *peer0 = nla_nest_start(msg, 0); > + > + nla_put(msg, WGPEER_A_PUBLIC_KEY, CURVE25519_KEY_SIZE, pub_key); > + > + struct nlattr *allowed_ips = nla_nest_start(msg, WGPEER_A_ALLOWEDIPS); > + struct nlattr *allowed_ip0 = nla_nest_start(msg, 0); > + > + nla_put_u16(msg, WGALLOWEDIP_A_FAMILY, af); > + nla_put(msg, WGALLOWEDIP_A_IPADDR, addr_len, &addr); > + nla_put_u8(msg, WGALLOWEDIP_A_CIDR_MASK, cidr); > + nla_put_u32(msg, WGALLOWEDIP_A_FLAGS, WGALLOWEDIP_F_REMOVE_ME); > + nla_nest_end(msg, allowed_ip0); > + nla_nest_end(msg, allowed_ips); > + nla_nest_end(msg, peer0); > + nla_nest_end(msg, peers); > + > + int err = nl_send_sync(sock, msg); > + > + if (err < 0) { > + char message[256]; > + > + nl_perror(err, message); > + printf("An error occurred: %d - %s\n", err, message); > + } > + I'm not so keen on this, simply because we've been able to do everything else in that script and keeping with the "make sure the userspace tooling" paradigm. There are two options: 1. Rewrite netns.sh all in C, only depending on libnl or whatever (which I would actually really *love* to see happen). This would change the testing paradigm, but I'd be okay with that if it's done well and all at once. 2. Add support for this new flag to wg(8) (which I think is necessary anyway for this to land; kernel features and userspace support oughta be posted at once). Thanks for the patch. I like the feature and I'm happy you posted this. 
Jason From Jason at zx2c4.com Sun Nov 17 21:46:37 2024 From: Jason at zx2c4.com (Jason A. Donenfeld) Date: Sun, 17 Nov 2024 22:46:37 +0100 Subject: [PATCH v2 net-next] wireguard: allowedips: Add WGALLOWEDIP_F_REMOVE_ME flag In-Reply-To: <20240905200551.4099064-1-jrife@google.com> References: <20240905200551.4099064-1-jrife@google.com> Message-ID: On Thu, Sep 05, 2024 at 03:05:41PM -0500, Jordan Rife wrote: > With the current API the only way to remove an allowed IP is to > completely rebuild the allowed IPs set for a peer using > WGPEER_F_REPLACE_ALLOWEDIPS. Just for posterity, there actually is another way: create a new peer with a random key, and give the allowed IP you want to remove to that peer. Moves are atomic. Then destroy that peer. Not that this is clean or nice or something, and I like your patch. But in case somebody gets into trouble before this lands, I thought I should note it on the list. Jason From Jason at zx2c4.com Mon Nov 18 01:35:09 2024 From: Jason at zx2c4.com (Jason A. Donenfeld) Date: Mon, 18 Nov 2024 02:35:09 +0100 Subject: WireGuard & Linux kernel RNG @ FOSDEM 2025 Message-ID: Hey folks, On February 1 & 2, 2025 in Brussels, the FOSDEM [0] conference will take place. This year, there's going to be a stand during those two days dedicated to answering questions and interacting with the community regarding WireGuard and the Linux kernel RNG [1], and perhaps other cryptography or networking adjacent topics, in addition to the bonanza of stickers and such of course. I'll be running the stand and will be hanging out there most of the two days. If anybody would like to help run the stand with me or organize anything around it, please don't hesitate to drop me an email. I think this presents a nice moment for the community, both to pull together developers and other interested parties into a common place to talk and discuss, and moreover to help with education and awareness in ways that "read the docs" or "ask on IRC" don't. 
Looking forward to seeing some of you there. Regards, Jason [0] https://fosdem.org/2025/ [1] https://fosdem.org/2025/news/2024-11-16-stands-announced/ From liuhangbin at gmail.com Mon Nov 18 10:08:55 2024 From: liuhangbin at gmail.com (Hangbin Liu) Date: Mon, 18 Nov 2024 10:08:55 +0000 Subject: [PATCHv2 net-next] selftests: wireguards: use nft by default In-Reply-To: References: <20241111041902.25814-1-liuhangbin@gmail.com> Message-ID: On Sun, Nov 17, 2024 at 09:09:00PM +0100, Jason A. Donenfeld wrote: > On Mon, Nov 11, 2024 at 04:19:02AM +0000, Hangbin Liu wrote: > > Use nft by default if it's supported, as nft is the replacement for iptables, > > which is used by default in some releases. Additionally, iptables is dropped > > in some releases. > > Rather than having this optionality, I'd rather just do everything in > one way or the other. So if you're adamant that we need to use nft, just > convert the whole thing. And then subsequently, make sure that the qemu > test harness supports it. That should probably be a series. Thanks, I will do an update for the qemu test. Hangbin From dheeraj.linuxdev at gmail.com Fri Nov 15 11:07:21 2024 From: dheeraj.linuxdev at gmail.com (Dheeraj Reddy Jonnalagadda) Date: Fri, 15 Nov 2024 16:37:21 +0530 Subject: [PATCH net-next] wireguard: allowedips: Fix useless call issue Message-ID: <20241115110721.22932-1-dheeraj.linuxdev@gmail.com> This commit fixes a useless call issue detected by Coverity (CID 1508092). The call to horrible_allowedips_lookup_v4 is unnecessary as its return value is never checked. 
Signed-off-by: Dheeraj Reddy Jonnalagadda --- drivers/net/wireguard/selftest/allowedips.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/net/wireguard/selftest/allowedips.c b/drivers/net/wireguard/selftest/allowedips.c index 3d1f64ff2e12..25de7058701a 100644 --- a/drivers/net/wireguard/selftest/allowedips.c +++ b/drivers/net/wireguard/selftest/allowedips.c @@ -383,7 +383,6 @@ static __init bool randomized_test(void) for (i = 0; i < NUM_QUERIES; ++i) { get_random_bytes(ip, 4); if (lookup(t.root4, 32, ip) != horrible_allowedips_lookup_v4(&h, (struct in_addr *)ip)) { - horrible_allowedips_lookup_v4(&h, (struct in_addr *)ip); pr_err("allowedips random v4 self-test: FAIL\n"); goto free; } -- 2.34.1 From phil at nwl.cc Thu Nov 7 08:23:04 2024 From: phil at nwl.cc (Phil Sutter) Date: Thu, 7 Nov 2024 09:23:04 +0100 Subject: [PATCH net-next] selftests: wireguards: use nft by default In-Reply-To: <20241107025438.3766-1-liuhangbin@gmail.com> References: <20241107025438.3766-1-liuhangbin@gmail.com> Message-ID: Hi Liu Hangbin, On Thu, Nov 07, 2024 at 02:54:38AM +0000, Hangbin Liu wrote: > Use nft by default if it's supported, as nft is the replacement for iptables, > which is used by default in some releases. Additionally, iptables is dropped > in some releases. > > Signed-off-by: Hangbin Liu > --- > CC nft developers to see if there are any easier configurations, > as I'm not very familiar with nft commands. 
Basically looks good, just a few minor remarks: > --- > tools/testing/selftests/wireguard/netns.sh | 63 ++++++++++++++++++---- > 1 file changed, 53 insertions(+), 10 deletions(-) > > diff --git a/tools/testing/selftests/wireguard/netns.sh b/tools/testing/selftests/wireguard/netns.sh > index 405ff262ca93..4e29c1a7003c 100755 > --- a/tools/testing/selftests/wireguard/netns.sh > +++ b/tools/testing/selftests/wireguard/netns.sh > @@ -44,6 +44,7 @@ sleep() { read -t "$1" -N 1 || true; } > waitiperf() { pretty "${1//*-}" "wait for iperf:${3:-5201} pid $2"; while [[ $(ss -N "$1" -tlpH "sport = ${3:-5201}") != *\"iperf3\",pid=$2,fd=* ]]; do sleep 0.1; done; } > waitncatudp() { pretty "${1//*-}" "wait for udp:1111 pid $2"; while [[ $(ss -N "$1" -ulpH 'sport = 1111') != *\"ncat\",pid=$2,fd=* ]]; do sleep 0.1; done; } > waitiface() { pretty "${1//*-}" "wait for $2 to come up"; ip netns exec "$1" bash -c "while [[ \$(< \"/sys/class/net/$2/operstate\") != up ]]; do read -t .1 -N 0 || true; done;"; } > +use_nft() { nft --version &> /dev/null; } > > cleanup() { > set +e > @@ -196,13 +197,23 @@ ip1 link set wg0 mtu 1300 > ip2 link set wg0 mtu 1300 > n1 wg set wg0 peer "$pub2" endpoint 127.0.0.1:2 > n2 wg set wg0 peer "$pub1" endpoint 127.0.0.1:1 > -n0 iptables -A INPUT -m length --length 1360 -j DROP > +if use_nft; then > + n0 nft add table inet filter Using inet family captures IPv6 traffic, too. You don't seem to explicitly configure it, but the usual auto-config traffic may offset rule counters. If you care about such side-effects, you may want to use ip family instead. Tables are family-specific, but generic otherwise. 
So you could add a table for testing in each netns up front: | if use_nft; then | n0 nft add table ip wgtest | n1 nft add table ip wgtest | n2 nft add table ip wgtest | fi > + n0 nft add chain inet filter INPUT { type filter hook input priority filter \; policy accept \; } > + n0 nft add rule inet filter INPUT meta length 1360 counter drop > +else > + n0 iptables -A INPUT -m length --length 1360 -j DROP > +fi > n1 ip route add 192.168.241.2/32 dev wg0 mtu 1299 > n2 ip route add 192.168.241.1/32 dev wg0 mtu 1299 > n2 ping -c 1 -W 1 -s 1269 192.168.241.1 > n2 ip route delete 192.168.241.1/32 dev wg0 mtu 1299 > n1 ip route delete 192.168.241.2/32 dev wg0 mtu 1299 > -n0 iptables -F INPUT > +if use_nft; then > + n0 nft delete table inet filter Here just flush the table (drops only the rules): | n0 nft flush table ip wgtest Cheers, Phil From shaw.leon at gmail.com Wed Nov 13 12:57:09 2024 From: shaw.leon at gmail.com (Xiao Liang) Date: Wed, 13 Nov 2024 20:57:09 +0800 Subject: [PATCH net-next v3 0/6] net: Improve netns handling in RTNL and ip_tunnel Message-ID: <20241113125715.150201-1-shaw.leon@gmail.com> This patch series includes some netns-related improvements and fixes for RTNL and ip_tunnel, to make link creation more intuitive: - Creating link in another net namespace doesn't conflict with link names in current one. - Refactor rtnetlink link creation. Create link in target namespace directly. Pass both source and link netns to drivers via newlink() callback. So that # ip link add netns ns1 link-netns ns2 tun0 type gre ... will create tun0 in ns1, rather than create it in ns2 and move to ns1. And don't conflict with another interface named "tun0" in current netns. Patch 1 from Donald is included just as a dependency. --- v3: - Drop "netns_atomic" flag and module parameter. Add netns parameter to newlink() instead, and convert drivers accordingly. - Move python NetNSEnter helper to net selftest lib.
v2: link: https://lore.kernel.org/all/20241107133004.7469-1-shaw.leon@gmail.com/ - Check NLM_F_EXCL to ensure only link creation is affected. - Add self tests for link name/ifindex conflict and notifications in different netns. - Changes in dummy driver and ynl in order to add the test case. v1: link: https://lore.kernel.org/all/20241023023146.372653-1-shaw.leon@gmail.com/ Donald Hunter (1): Revert "tools/net/ynl: improve async notification handling" Xiao Liang (5): net: ip_tunnel: Build flow in underlay net namespace rtnetlink: Lookup device in target netns when creating link rtnetlink: Decouple net namespaces in rtnl_newlink_create() selftests: net: Add python context manager for netns entering selftests: net: Add two test cases for link netns drivers/infiniband/ulp/ipoib/ipoib_netlink.c | 6 ++- drivers/net/amt.c | 6 +-- drivers/net/bareudp.c | 4 +- drivers/net/bonding/bond_netlink.c | 3 +- drivers/net/can/dev/netlink.c | 2 +- drivers/net/can/vxcan.c | 4 +- .../ethernet/qualcomm/rmnet/rmnet_config.c | 5 +- drivers/net/geneve.c | 4 +- drivers/net/gtp.c | 4 +- drivers/net/ipvlan/ipvlan.h | 2 +- drivers/net/ipvlan/ipvlan_main.c | 5 +- drivers/net/ipvlan/ipvtap.c | 4 +- drivers/net/macsec.c | 5 +- drivers/net/macvlan.c | 5 +- drivers/net/macvtap.c | 5 +- drivers/net/netkit.c | 4 +- drivers/net/pfcp.c | 4 +- drivers/net/ppp/ppp_generic.c | 4 +- drivers/net/team/team_core.c | 2 +- drivers/net/veth.c | 4 +- drivers/net/vrf.c | 2 +- drivers/net/vxlan/vxlan_core.c | 4 +- drivers/net/wireguard/device.c | 4 +- drivers/net/wireless/virtual/virt_wifi.c | 5 +- drivers/net/wwan/wwan_core.c | 6 ++- include/net/ip_tunnels.h | 5 +- include/net/rtnetlink.h | 22 ++++++++- net/8021q/vlan_netlink.c | 5 +- net/batman-adv/soft-interface.c | 5 +- net/bridge/br_netlink.c | 2 +- net/caif/chnl_net.c | 2 +- net/core/rtnetlink.c | 25 ++++++---- net/hsr/hsr_netlink.c | 8 +-- net/ieee802154/6lowpan/core.c | 5 +- net/ipv4/ip_gre.c | 13 +++-- net/ipv4/ip_tunnel.c | 16 +++---
net/ipv4/ip_vti.c | 5 +- net/ipv4/ipip.c | 5 +- net/ipv6/ip6_gre.c | 17 ++++--- net/ipv6/ip6_tunnel.c | 11 ++--- net/ipv6/ip6_vti.c | 11 ++--- net/ipv6/sit.c | 11 ++--- net/xfrm/xfrm_interface_core.c | 13 +++-- tools/net/ynl/cli.py | 10 ++-- tools/net/ynl/lib/ynl.py | 49 ++++++++----------- tools/testing/selftests/net/Makefile | 1 + .../testing/selftests/net/lib/py/__init__.py | 2 +- tools/testing/selftests/net/lib/py/netns.py | 18 +++++++ tools/testing/selftests/net/netns-name.sh | 10 ++++ tools/testing/selftests/net/netns_atomic.py | 38 ++++++++++++++ 50 files changed, 255 insertions(+), 157 deletions(-) create mode 100755 tools/testing/selftests/net/netns_atomic.py -- 2.47.0 From shaw.leon at gmail.com Wed Nov 13 12:57:10 2024 From: shaw.leon at gmail.com (Xiao Liang) Date: Wed, 13 Nov 2024 20:57:10 +0800 Subject: [PATCH net-next v3 1/6] Revert "tools/net/ynl: improve async notification handling" In-Reply-To: <20241113125715.150201-1-shaw.leon@gmail.com> References: <20241113125715.150201-1-shaw.leon@gmail.com> Message-ID: <20241113125715.150201-2-shaw.leon@gmail.com> From: Donald Hunter This reverts commit 1bf70e6c3a5346966c25e0a1ff492945b25d3f80. This modification to check_ntf() is being reverted so that its behaviour remains equivalent to ynl_ntf_check() in the C YNL. Instead a new poll_ntf() will be added in a separate patch. 
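[Editor's sketch] The restored check_ntf() reads whatever is already queued on the socket and returns as soon as a further read would block, instead of acting as a polling generator. A minimal Python sketch of that drain-until-EAGAIN pattern follows. It is only an illustration: it uses a Unix datagram socket pair rather than a netlink socket, and drain_pending() is a hypothetical helper name, not part of ynl.

```python
import socket


def drain_pending(sock, bufsize=4096):
    """Return every datagram already queued on sock without blocking.

    Hypothetical helper for illustration; not part of tools/net/ynl.
    """
    pending = []
    while True:
        try:
            # MSG_DONTWAIT makes this single recv() non-blocking.
            data = sock.recv(bufsize, socket.MSG_DONTWAIT)
        except BlockingIOError:
            # Nothing left in the queue: hand back what was collected.
            return pending
        if not data:
            # Empty read (for example the peer went away): stop as well.
            return pending
        pending.append(data)


if __name__ == "__main__":
    tx, rx = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    tx.send(b"ntf-1")
    tx.send(b"ntf-2")
    print(drain_pending(rx))  # both queued datagrams
    print(drain_pending(rx))  # nothing pending any more
    tx.close()
    rx.close()
```

A caller that wants to wait for notifications has to add its own sleep or poll loop around such a drain, which is presumably the role of the separate poll_ntf() mentioned above.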
Signed-off-by: Donald Hunter --- tools/net/ynl/cli.py | 10 +++----- tools/net/ynl/lib/ynl.py | 49 ++++++++++++++++------------------------ 2 files changed, 23 insertions(+), 36 deletions(-) diff --git a/tools/net/ynl/cli.py b/tools/net/ynl/cli.py index 9e95016b85b3..b8481f401376 100755 --- a/tools/net/ynl/cli.py +++ b/tools/net/ynl/cli.py @@ -5,7 +5,6 @@ import argparse import json import pprint import time -import signal from lib import YnlFamily, Netlink, NlError @@ -18,8 +17,6 @@ class YnlEncoder(json.JSONEncoder): return list(obj) return json.JSONEncoder.default(self, obj) -def handle_timeout(sig, frame): - exit(0) def main(): description = """ @@ -84,8 +81,7 @@ def main(): ynl.ntf_subscribe(args.ntf) if args.sleep: - signal.signal(signal.SIGALRM, handle_timeout) - signal.alarm(args.sleep) + time.sleep(args.sleep) if args.list_ops: for op_name, op in ynl.ops.items(): @@ -110,8 +106,8 @@ def main(): exit(1) if args.ntf: - for msg in ynl.check_ntf(): - output(msg) + ynl.check_ntf() + output(ynl.async_msg_queue) if __name__ == "__main__": diff --git a/tools/net/ynl/lib/ynl.py b/tools/net/ynl/lib/ynl.py index 92f85698c50e..c22c22bf2cb7 100644 --- a/tools/net/ynl/lib/ynl.py +++ b/tools/net/ynl/lib/ynl.py @@ -12,8 +12,6 @@ import sys import yaml import ipaddress import uuid -import queue -import time from .nlspec import SpecFamily @@ -491,7 +489,7 @@ class YnlFamily(SpecFamily): self.sock.setsockopt(Netlink.SOL_NETLINK, Netlink.NETLINK_GET_STRICT_CHK, 1) self.async_msg_ids = set() - self.async_msg_queue = queue.Queue() + self.async_msg_queue = [] for msg in self.msgs.values(): if msg.is_async: @@ -905,39 +903,32 @@ class YnlFamily(SpecFamily): msg['name'] = op['name'] msg['msg'] = attrs - self.async_msg_queue.put(msg) + self.async_msg_queue.append(msg) - def check_ntf(self, interval=0.1): + def check_ntf(self): while True: try: reply = self.sock.recv(self._recv_size, socket.MSG_DONTWAIT) - nms = NlMsgs(reply) - self._recv_dbg_print(reply, nms) - for nl_msg in nms: - 
if nl_msg.error: - print("Netlink error in ntf!?", os.strerror(-nl_msg.error)) - print(nl_msg) - continue - if nl_msg.done: - print("Netlink done while checking for ntf!?") - continue + except BlockingIOError: + return - decoded = self.nlproto.decode(self, nl_msg, None) - if decoded.cmd() not in self.async_msg_ids: - print("Unexpected msg id while checking for ntf", decoded) - continue + nms = NlMsgs(reply) + self._recv_dbg_print(reply, nms) + for nl_msg in nms: + if nl_msg.error: + print("Netlink error in ntf!?", os.strerror(-nl_msg.error)) + print(nl_msg) + continue + if nl_msg.done: + print("Netlink done while checking for ntf!?") + continue - self.handle_ntf(decoded) - except BlockingIOError: - pass + decoded = self.nlproto.decode(self, nl_msg, None) + if decoded.cmd() not in self.async_msg_ids: + print("Unexpected msg id done while checking for ntf", decoded) + continue - try: - yield self.async_msg_queue.get_nowait() - except queue.Empty: - try: - time.sleep(interval) - except KeyboardInterrupt: - return + self.handle_ntf(decoded) def operation_do_attributes(self, name): """ -- 2.47.0 From shaw.leon at gmail.com Wed Nov 13 12:57:11 2024 From: shaw.leon at gmail.com (Xiao Liang) Date: Wed, 13 Nov 2024 20:57:11 +0800 Subject: [PATCH net-next v3 2/6] net: ip_tunnel: Build flow in underlay net namespace In-Reply-To: <20241113125715.150201-1-shaw.leon@gmail.com> References: <20241113125715.150201-1-shaw.leon@gmail.com> Message-ID: <20241113125715.150201-3-shaw.leon@gmail.com> Build IPv4 flow in underlay net namespace, where encapsulated packets are routed. 
Signed-off-by: Xiao Liang --- net/ipv4/ip_tunnel.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c index 25505f9b724c..09b73acf037a 100644 --- a/net/ipv4/ip_tunnel.c +++ b/net/ipv4/ip_tunnel.c @@ -294,7 +294,7 @@ static int ip_tunnel_bind_dev(struct net_device *dev) ip_tunnel_init_flow(&fl4, iph->protocol, iph->daddr, iph->saddr, tunnel->parms.o_key, - iph->tos & INET_DSCP_MASK, dev_net(dev), + iph->tos & INET_DSCP_MASK, tunnel->net, tunnel->parms.link, tunnel->fwmark, 0, 0); rt = ip_route_output_key(tunnel->net, &fl4); @@ -611,7 +611,7 @@ void ip_md_tunnel_xmit(struct sk_buff *skb, struct net_device *dev, } ip_tunnel_init_flow(&fl4, proto, key->u.ipv4.dst, key->u.ipv4.src, tunnel_id_to_key32(key->tun_id), - tos & INET_DSCP_MASK, dev_net(dev), 0, skb->mark, + tos & INET_DSCP_MASK, tunnel->net, 0, skb->mark, skb_get_hash(skb), key->flow_flags); if (!tunnel_hlen) @@ -774,7 +774,7 @@ void ip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev, ip_tunnel_init_flow(&fl4, protocol, dst, tnl_params->saddr, tunnel->parms.o_key, tos & INET_DSCP_MASK, - dev_net(dev), READ_ONCE(tunnel->parms.link), + tunnel->net, READ_ONCE(tunnel->parms.link), tunnel->fwmark, skb_get_hash(skb), 0); if (ip_tunnel_encap(skb, &tunnel->encap, &protocol, &fl4) < 0) -- 2.47.0 From shaw.leon at gmail.com Wed Nov 13 12:57:12 2024 From: shaw.leon at gmail.com (Xiao Liang) Date: Wed, 13 Nov 2024 20:57:12 +0800 Subject: [PATCH net-next v3 3/6] rtnetlink: Lookup device in target netns when creating link In-Reply-To: <20241113125715.150201-1-shaw.leon@gmail.com> References: <20241113125715.150201-1-shaw.leon@gmail.com> Message-ID: <20241113125715.150201-4-shaw.leon@gmail.com> When creating link, lookup for existing device in target net namespace instead of current one. 
For example, two links created by: # ip link add dummy1 type dummy # ip link add netns ns1 dummy1 type dummy should have no conflict since they are in different namespaces. Signed-off-by: Xiao Liang --- net/core/rtnetlink.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 327fa4957929..f573ace60234 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -3846,20 +3846,26 @@ static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh, { struct nlattr ** const tb = tbs->tb; struct net *net = sock_net(skb->sk); + struct net *device_net; struct net_device *dev; struct ifinfomsg *ifm; bool link_specified; + /* When creating, lookup for existing device in target net namespace */ + device_net = (nlh->nlmsg_flags & NLM_F_CREATE) && + (nlh->nlmsg_flags & NLM_F_EXCL) ? + tgt_net : net; + ifm = nlmsg_data(nlh); if (ifm->ifi_index > 0) { link_specified = true; - dev = __dev_get_by_index(net, ifm->ifi_index); + dev = __dev_get_by_index(device_net, ifm->ifi_index); } else if (ifm->ifi_index < 0) { NL_SET_ERR_MSG(extack, "ifindex can't be negative"); return -EINVAL; } else if (tb[IFLA_IFNAME] || tb[IFLA_ALT_IFNAME]) { link_specified = true; - dev = rtnl_dev_get(net, tb); + dev = rtnl_dev_get(device_net, tb); } else { link_specified = false; dev = NULL; -- 2.47.0 From shaw.leon at gmail.com Wed Nov 13 12:57:13 2024 From: shaw.leon at gmail.com (Xiao Liang) Date: Wed, 13 Nov 2024 20:57:13 +0800 Subject: [PATCH net-next v3 4/6] rtnetlink: Decouple net namespaces in rtnl_newlink_create() In-Reply-To: <20241113125715.150201-1-shaw.leon@gmail.com> References: <20241113125715.150201-1-shaw.leon@gmail.com> Message-ID: <20241113125715.150201-5-shaw.leon@gmail.com> There are three net namespaces involved when creating links: - source netns - where the netlink socket resides, - target netns - where to put the device being created, - link netns - netns associated with the device (backend). 
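[Editor's sketch] The three namespaces listed above can be modelled with a small toy in Python. This is not kernel code: NewLinkRequest and its methods are hypothetical names. The fallbacks are assumptions taken from this series (target defaults to the source netns when none is requested, the effective link netns falls back to the source netns, and the lookup for an existing device uses the target netns only when both NLM_F_CREATE and NLM_F_EXCL are set).

```python
from dataclasses import dataclass
from typing import Optional

# Flag values as defined in include/uapi/linux/netlink.h
NLM_F_EXCL = 0x200
NLM_F_CREATE = 0x400


@dataclass
class NewLinkRequest:
    """Toy stand-in for an RTM_NEWLINK request; not a kernel structure."""
    source: str                   # netns of the netlink socket
    target: Optional[str] = None  # "ip link add netns ns1 ..."
    link: Optional[str] = None    # "... link-netns ns2 ..."
    flags: int = NLM_F_CREATE | NLM_F_EXCL

    def target_netns(self) -> str:
        # Assumption: without an explicit target, the device stays in source.
        return self.target or self.source

    def effective_link_netns(self) -> str:
        # Assumption from this series: no link netns means "use source".
        return self.link or self.source

    def lookup_netns(self) -> str:
        # Mirrors the device_net selection in patch 3/6: an exclusive create
        # looks for name/ifindex conflicts in the target netns.
        exclusive_create = bool(self.flags & NLM_F_CREATE) and bool(self.flags & NLM_F_EXCL)
        return self.target_netns() if exclusive_create else self.source


if __name__ == "__main__":
    # ip link add netns ns1 link-netns ns2 tun0 type gre ...
    req = NewLinkRequest(source="init", target="ns1", link="ns2")
    print(req.target_netns(), req.effective_link_netns(), req.lookup_netns())
```

With this model, a request without NLM_F_EXCL (a "change" rather than a "create") still resolves the device in the source netns, matching the intent that only link creation is affected.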
Currently, two nets are passed to newlink() callback - "src_net"
parameter and "dev_net" (implicitly in net_device). They are set as
follows, depending on whether IFLA_LINK_NETNSID is present.

+-------------------+---------+---------+
| IFLA_LINK_NETNSID | src_net | dev_net |
+-------------------+---------+---------+
| absent            | source  | target  |
+-------------------+---------+---------+
| present           | link    | link    |
+-------------------+---------+---------+

When IFLA_LINK_NETNSID is present, the device is created in link netns
first. This has some side effects, including extra ifindex allocation,
ifname validation and link notifications. There's also an extra step to
move the device to target netns. These could be avoided if we create it
in target netns at the beginning.

On the other hand, the meaning of src_net is ambiguous. It should be the
effective link netns by design, but some drivers ignore it and use
dev_net instead.

This patch refactors netns handling. rtnl_newlink_create() now creates
devices in target netns directly, so dev_net is always target netns.
Source and link netns are passed to newlink() as src_net and link_net in
a new struct consistently.

When determining the effective link netns, in the absence of link_net,
drivers should look for src_net in general. But for compatibility,
drivers that use dev_net will keep current behavior.

Signed-off-by: Xiao Liang
---

There are some issues found when converting drivers. Please check if they
work as intended:
  - In amt_newlink() in drivers/net/amt.c:
    amt->net = net;
    ...
    amt->stream_dev = dev_get_by_index(net, ...

    Uses net (src_net actually), but amt_lookup_upper_dev() only searches
    in dev_net.
  - In gtp_newlink() in drivers/net/gtp.c:
    gtp->net = src_net;
    ...
    gn = net_generic(dev_net(dev), gtp_net_id);
    list_add_rcu(&gtp->list, &gn->gtp_dev_list);

    Uses src_net, but is linked to list in dev_net.
  - In pfcp_newlink() in drivers/net/pfcp.c:
    pfcp->net = net;
    ...
pn = net_generic(dev_net(dev), pfcp_net_id); list_add_rcu(&pfcp->list, &pn->pfcp_dev_list); Same. - In lowpan_newlink() in net/ieee802154/6lowpan/core.c: wdev = dev_get_by_index(dev_net(ldev), nla_get_u32(tb[IFLA_LINK])); Looks for IFLA_LINK in dev_net, but in theory the ifindex is defined in link netns. --- drivers/infiniband/ulp/ipoib/ipoib_netlink.c | 6 +++-- drivers/net/amt.c | 6 ++--- drivers/net/bareudp.c | 4 ++-- drivers/net/bonding/bond_netlink.c | 3 ++- drivers/net/can/dev/netlink.c | 2 +- drivers/net/can/vxcan.c | 4 ++-- .../ethernet/qualcomm/rmnet/rmnet_config.c | 5 +++-- drivers/net/geneve.c | 4 ++-- drivers/net/gtp.c | 4 ++-- drivers/net/ipvlan/ipvlan.h | 2 +- drivers/net/ipvlan/ipvlan_main.c | 5 +++-- drivers/net/ipvlan/ipvtap.c | 4 ++-- drivers/net/macsec.c | 5 +++-- drivers/net/macvlan.c | 5 +++-- drivers/net/macvtap.c | 5 +++-- drivers/net/netkit.c | 4 ++-- drivers/net/pfcp.c | 4 ++-- drivers/net/ppp/ppp_generic.c | 4 ++-- drivers/net/team/team_core.c | 2 +- drivers/net/veth.c | 4 ++-- drivers/net/vrf.c | 2 +- drivers/net/vxlan/vxlan_core.c | 4 ++-- drivers/net/wireguard/device.c | 4 ++-- drivers/net/wireless/virtual/virt_wifi.c | 5 +++-- drivers/net/wwan/wwan_core.c | 6 +++-- include/net/ip_tunnels.h | 5 +++-- include/net/rtnetlink.h | 22 ++++++++++++++++++- net/8021q/vlan_netlink.c | 5 +++-- net/batman-adv/soft-interface.c | 5 +++-- net/bridge/br_netlink.c | 2 +- net/caif/chnl_net.c | 2 +- net/core/rtnetlink.c | 15 ++++++------- net/hsr/hsr_netlink.c | 8 +++---- net/ieee802154/6lowpan/core.c | 5 +++-- net/ipv4/ip_gre.c | 13 ++++++----- net/ipv4/ip_tunnel.c | 10 +++++---- net/ipv4/ip_vti.c | 5 +++-- net/ipv4/ipip.c | 5 +++-- net/ipv6/ip6_gre.c | 17 +++++++------- net/ipv6/ip6_tunnel.c | 11 +++++----- net/ipv6/ip6_vti.c | 11 +++++----- net/ipv6/sit.c | 11 +++++----- net/xfrm/xfrm_interface_core.c | 13 +++++------ 43 files changed, 153 insertions(+), 115 deletions(-) diff --git a/drivers/infiniband/ulp/ipoib/ipoib_netlink.c 
b/drivers/infiniband/ulp/ipoib/ipoib_netlink.c index 9ad8d9856275..7714115a4635 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_netlink.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_netlink.c @@ -97,7 +97,8 @@ static int ipoib_changelink(struct net_device *dev, struct nlattr *tb[], return ret; } -static int ipoib_new_child_link(struct net *src_net, struct net_device *dev, +static int ipoib_new_child_link(struct rtnl_link_nets *nets, + struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { @@ -109,7 +110,8 @@ static int ipoib_new_child_link(struct net *src_net, struct net_device *dev, if (!tb[IFLA_LINK]) return -EINVAL; - pdev = __dev_get_by_index(src_net, nla_get_u32(tb[IFLA_LINK])); + pdev = __dev_get_by_index(rtnl_link_netns(nets), + nla_get_u32(tb[IFLA_LINK])); if (!pdev || pdev->type != ARPHRD_INFINIBAND) return -ENODEV; diff --git a/drivers/net/amt.c b/drivers/net/amt.c index 98c6205ed19f..9537175f7ac8 100644 --- a/drivers/net/amt.c +++ b/drivers/net/amt.c @@ -3161,14 +3161,14 @@ static int amt_validate(struct nlattr *tb[], struct nlattr *data[], return 0; } -static int amt_newlink(struct net *net, struct net_device *dev, +static int amt_newlink(struct rtnl_link_nets *nets, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { struct amt_dev *amt = netdev_priv(dev); int err = -EINVAL; - amt->net = net; + amt->net = rtnl_link_netns(nets); amt->mode = nla_get_u32(data[IFLA_AMT_MODE]); if (data[IFLA_AMT_MAX_TUNNELS] && @@ -3183,7 +3183,7 @@ static int amt_newlink(struct net *net, struct net_device *dev, amt->hash_buckets = AMT_HSIZE; amt->nr_tunnels = 0; get_random_bytes(&amt->hash_seed, sizeof(amt->hash_seed)); - amt->stream_dev = dev_get_by_index(net, + amt->stream_dev = dev_get_by_index(rtnl_link_netns(nets), nla_get_u32(data[IFLA_AMT_LINK])); if (!amt->stream_dev) { NL_SET_ERR_MSG_ATTR(extack, tb[IFLA_AMT_LINK], diff --git a/drivers/net/bareudp.c 
b/drivers/net/bareudp.c index a2abfade82dd..c0946006d175 100644 --- a/drivers/net/bareudp.c +++ b/drivers/net/bareudp.c @@ -698,7 +698,7 @@ static void bareudp_dellink(struct net_device *dev, struct list_head *head) unregister_netdevice_queue(dev, head); } -static int bareudp_newlink(struct net *net, struct net_device *dev, +static int bareudp_newlink(struct rtnl_link_nets *nets, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { @@ -709,7 +709,7 @@ static int bareudp_newlink(struct net *net, struct net_device *dev, if (err) return err; - err = bareudp_configure(net, dev, &conf, extack); + err = bareudp_configure(rtnl_link_netns(nets), dev, &conf, extack); if (err) return err; diff --git a/drivers/net/bonding/bond_netlink.c b/drivers/net/bonding/bond_netlink.c index 2a6a424806aa..242a37934f25 100644 --- a/drivers/net/bonding/bond_netlink.c +++ b/drivers/net/bonding/bond_netlink.c @@ -564,7 +564,8 @@ static int bond_changelink(struct net_device *bond_dev, struct nlattr *tb[], return 0; } -static int bond_newlink(struct net *src_net, struct net_device *bond_dev, +static int bond_newlink(struct rtnl_link_nets *nets, + struct net_device *bond_dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { diff --git a/drivers/net/can/dev/netlink.c b/drivers/net/can/dev/netlink.c index 01aacdcda260..daf88325ccac 100644 --- a/drivers/net/can/dev/netlink.c +++ b/drivers/net/can/dev/netlink.c @@ -624,7 +624,7 @@ static int can_fill_xstats(struct sk_buff *skb, const struct net_device *dev) return -EMSGSIZE; } -static int can_newlink(struct net *src_net, struct net_device *dev, +static int can_newlink(struct rtnl_link_nets *nets, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { diff --git a/drivers/net/can/vxcan.c b/drivers/net/can/vxcan.c index da7c72105fb6..2375175044bc 100644 --- a/drivers/net/can/vxcan.c +++ b/drivers/net/can/vxcan.c @@ -172,7 +172,7 
@@ static void vxcan_setup(struct net_device *dev) /* forward declaration for rtnl_create_link() */ static struct rtnl_link_ops vxcan_link_ops; -static int vxcan_newlink(struct net *net, struct net_device *dev, +static int vxcan_newlink(struct rtnl_link_nets *nets, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { @@ -203,7 +203,7 @@ static int vxcan_newlink(struct net *net, struct net_device *dev, name_assign_type = NET_NAME_ENUM; } - peer_net = rtnl_link_get_net(net, tbp); + peer_net = rtnl_link_get_net(rtnl_link_netns(nets), tbp); peer = rtnl_create_link(peer_net, ifname, name_assign_type, &vxcan_link_ops, tbp, extack); if (IS_ERR(peer)) { diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c index f3bea196a8f9..fa881b038d69 100644 --- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c +++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c @@ -117,7 +117,7 @@ static void rmnet_unregister_bridge(struct rmnet_port *port) rmnet_unregister_real_device(bridge_dev); } -static int rmnet_newlink(struct net *src_net, struct net_device *dev, +static int rmnet_newlink(struct rtnl_link_nets *nets, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { @@ -134,7 +134,8 @@ static int rmnet_newlink(struct net *src_net, struct net_device *dev, return -EINVAL; } - real_dev = __dev_get_by_index(src_net, nla_get_u32(tb[IFLA_LINK])); + real_dev = __dev_get_by_index(rtnl_link_netns(nets), + nla_get_u32(tb[IFLA_LINK])); if (!real_dev) { NL_SET_ERR_MSG_MOD(extack, "link does not exist"); return -ENODEV; diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c index 2f29b1386b1c..f621d5887b3d 100644 --- a/drivers/net/geneve.c +++ b/drivers/net/geneve.c @@ -1614,7 +1614,7 @@ static void geneve_link_config(struct net_device *dev, geneve_change_mtu(dev, ldev_mtu - info->options_len); } -static int geneve_newlink(struct net 
*net, struct net_device *dev, +static int geneve_newlink(struct rtnl_link_nets *nets, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { @@ -1631,7 +1631,7 @@ static int geneve_newlink(struct net *net, struct net_device *dev, if (err) return err; - err = geneve_configure(net, dev, extack, &cfg); + err = geneve_configure(rtnl_link_netns(nets), dev, extack, &cfg); if (err) return err; diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c index 89a996ad8cd0..73ad219746ec 100644 --- a/drivers/net/gtp.c +++ b/drivers/net/gtp.c @@ -1460,7 +1460,7 @@ static int gtp_create_sockets(struct gtp_dev *gtp, const struct nlattr *nla, #define GTP_TH_MAXLEN (sizeof(struct udphdr) + sizeof(struct gtp0_header)) #define GTP_IPV6_MAXLEN (sizeof(struct ipv6hdr) + GTP_TH_MAXLEN) -static int gtp_newlink(struct net *src_net, struct net_device *dev, +static int gtp_newlink(struct rtnl_link_nets *nets, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { @@ -1494,7 +1494,7 @@ static int gtp_newlink(struct net *src_net, struct net_device *dev, gtp->restart_count = nla_get_u8_default(data[IFLA_GTP_RESTART_COUNT], 0); - gtp->net = src_net; + gtp->net = rtnl_link_netns(nets); err = gtp_hashtable_new(gtp, hashsize); if (err < 0) diff --git a/drivers/net/ipvlan/ipvlan.h b/drivers/net/ipvlan/ipvlan.h index 025e0c19ec25..81e11086df51 100644 --- a/drivers/net/ipvlan/ipvlan.h +++ b/drivers/net/ipvlan/ipvlan.h @@ -166,7 +166,7 @@ struct ipvl_addr *ipvlan_addr_lookup(struct ipvl_port *port, void *lyr3h, void *ipvlan_get_L3_hdr(struct ipvl_port *port, struct sk_buff *skb, int *type); void ipvlan_count_rx(const struct ipvl_dev *ipvlan, unsigned int len, bool success, bool mcast); -int ipvlan_link_new(struct net *src_net, struct net_device *dev, +int ipvlan_link_new(struct rtnl_link_nets *nets, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack); void 
ipvlan_link_delete(struct net_device *dev, struct list_head *head); diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c index ee2c3cf4df36..5bd670927d27 100644 --- a/drivers/net/ipvlan/ipvlan_main.c +++ b/drivers/net/ipvlan/ipvlan_main.c @@ -532,7 +532,7 @@ static int ipvlan_nl_fillinfo(struct sk_buff *skb, return ret; } -int ipvlan_link_new(struct net *src_net, struct net_device *dev, +int ipvlan_link_new(struct rtnl_link_nets *nets, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { @@ -545,7 +545,8 @@ int ipvlan_link_new(struct net *src_net, struct net_device *dev, if (!tb[IFLA_LINK]) return -EINVAL; - phy_dev = __dev_get_by_index(src_net, nla_get_u32(tb[IFLA_LINK])); + phy_dev = __dev_get_by_index(rtnl_link_netns(nets), + nla_get_u32(tb[IFLA_LINK])); if (!phy_dev) return -ENODEV; diff --git a/drivers/net/ipvlan/ipvtap.c b/drivers/net/ipvlan/ipvtap.c index 1afc4c47be73..7d8f780581e3 100644 --- a/drivers/net/ipvlan/ipvtap.c +++ b/drivers/net/ipvlan/ipvtap.c @@ -73,7 +73,7 @@ static void ipvtap_update_features(struct tap_dev *tap, netdev_update_features(vlan->dev); } -static int ipvtap_newlink(struct net *src_net, struct net_device *dev, +static int ipvtap_newlink(struct rtnl_link_nets *nets, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { @@ -97,7 +97,7 @@ static int ipvtap_newlink(struct net *src_net, struct net_device *dev, /* Don't put anything that may fail after macvlan_common_newlink * because we can't undo what it does. 
*/ - err = ipvlan_link_new(src_net, dev, tb, data, extack); + err = ipvlan_link_new(nets, dev, tb, data, extack); if (err) { netdev_rx_handler_unregister(dev); return err; diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c index 1bc1e5993f56..c7871c6747ca 100644 --- a/drivers/net/macsec.c +++ b/drivers/net/macsec.c @@ -4141,7 +4141,7 @@ static int macsec_add_dev(struct net_device *dev, sci_t sci, u8 icv_len) static struct lock_class_key macsec_netdev_addr_lock_key; -static int macsec_newlink(struct net *net, struct net_device *dev, +static int macsec_newlink(struct rtnl_link_nets *nets, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { @@ -4154,7 +4154,8 @@ static int macsec_newlink(struct net *net, struct net_device *dev, if (!tb[IFLA_LINK]) return -EINVAL; - real_dev = __dev_get_by_index(net, nla_get_u32(tb[IFLA_LINK])); + real_dev = __dev_get_by_index(rtnl_link_netns(nets), + nla_get_u32(tb[IFLA_LINK])); if (!real_dev) return -ENODEV; if (real_dev->type != ARPHRD_ETHER) diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c index edbd5afcec41..24a1630e2ead 100644 --- a/drivers/net/macvlan.c +++ b/drivers/net/macvlan.c @@ -1565,11 +1565,12 @@ int macvlan_common_newlink(struct net *src_net, struct net_device *dev, } EXPORT_SYMBOL_GPL(macvlan_common_newlink); -static int macvlan_newlink(struct net *src_net, struct net_device *dev, +static int macvlan_newlink(struct rtnl_link_nets *nets, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { - return macvlan_common_newlink(src_net, dev, tb, data, extack); + return macvlan_common_newlink(rtnl_link_netns(nets), dev, tb, data, + extack); } void macvlan_dellink(struct net_device *dev, struct list_head *head) diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c index 29a5929d48e5..99152d4ba82b 100644 --- a/drivers/net/macvtap.c +++ b/drivers/net/macvtap.c @@ -77,7 +77,7 @@ static void 
macvtap_update_features(struct tap_dev *tap, netdev_update_features(vlan->dev); } -static int macvtap_newlink(struct net *src_net, struct net_device *dev, +static int macvtap_newlink(struct rtnl_link_nets *nets, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { @@ -105,7 +105,8 @@ static int macvtap_newlink(struct net *src_net, struct net_device *dev, /* Don't put anything that may fail after macvlan_common_newlink * because we can't undo what it does. */ - err = macvlan_common_newlink(src_net, dev, tb, data, extack); + err = macvlan_common_newlink(rtnl_link_netns(nets), dev, tb, data, + extack); if (err) { netdev_rx_handler_unregister(dev); return err; diff --git a/drivers/net/netkit.c b/drivers/net/netkit.c index bb07725d1c72..70837ebbb9f2 100644 --- a/drivers/net/netkit.c +++ b/drivers/net/netkit.c @@ -327,7 +327,7 @@ static int netkit_validate(struct nlattr *tb[], struct nlattr *data[], static struct rtnl_link_ops netkit_link_ops; -static int netkit_new_link(struct net *src_net, struct net_device *dev, +static int netkit_new_link(struct rtnl_link_nets *nets, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { @@ -385,7 +385,7 @@ static int netkit_new_link(struct net *src_net, struct net_device *dev, (tb[IFLA_ADDRESS] || tbp[IFLA_ADDRESS])) return -EOPNOTSUPP; - net = rtnl_link_get_net(src_net, tbp); + net = rtnl_link_get_net(rtnl_link_netns(nets), tbp); peer = rtnl_create_link(net, ifname, ifname_assign_type, &netkit_link_ops, tbp, extack); if (IS_ERR(peer)) { diff --git a/drivers/net/pfcp.c b/drivers/net/pfcp.c index 69434fd13f96..e1cf41cb9a87 100644 --- a/drivers/net/pfcp.c +++ b/drivers/net/pfcp.c @@ -184,7 +184,7 @@ static int pfcp_add_sock(struct pfcp_dev *pfcp) return PTR_ERR_OR_ZERO(pfcp->sock); } -static int pfcp_newlink(struct net *net, struct net_device *dev, +static int pfcp_newlink(struct rtnl_link_nets *nets, struct net_device *dev, struct nlattr 
*tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { @@ -192,7 +192,7 @@ static int pfcp_newlink(struct net *net, struct net_device *dev, struct pfcp_net *pn; int err; - pfcp->net = net; + pfcp->net = rtnl_link_netns(nets); err = pfcp_add_sock(pfcp); if (err) { diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c index 4583e15ad03a..87ade0389d0d 100644 --- a/drivers/net/ppp/ppp_generic.c +++ b/drivers/net/ppp/ppp_generic.c @@ -1303,7 +1303,7 @@ static int ppp_nl_validate(struct nlattr *tb[], struct nlattr *data[], return 0; } -static int ppp_nl_newlink(struct net *src_net, struct net_device *dev, +static int ppp_nl_newlink(struct rtnl_link_nets *nets, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { @@ -1343,7 +1343,7 @@ static int ppp_nl_newlink(struct net *src_net, struct net_device *dev, if (!tb[IFLA_IFNAME] || !nla_len(tb[IFLA_IFNAME]) || !*(char *)nla_data(tb[IFLA_IFNAME])) conf.ifname_is_set = false; - err = ppp_dev_configure(src_net, dev, &conf); + err = ppp_dev_configure(rtnl_link_netns(nets), dev, &conf); out_unlock: mutex_unlock(&ppp_mutex); diff --git a/drivers/net/team/team_core.c b/drivers/net/team/team_core.c index a1b27b69f010..d0f1fc404264 100644 --- a/drivers/net/team/team_core.c +++ b/drivers/net/team/team_core.c @@ -2206,7 +2206,7 @@ static void team_setup(struct net_device *dev) dev->features |= NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_STAG_TX; } -static int team_newlink(struct net *src_net, struct net_device *dev, +static int team_newlink(struct rtnl_link_nets *nets, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { diff --git a/drivers/net/veth.c b/drivers/net/veth.c index 0d6d0d749d44..a64d0c87b18f 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -1765,7 +1765,7 @@ static int veth_init_queues(struct net_device *dev, struct nlattr *tb[]) return 0; } -static int veth_newlink(struct net *src_net, 
struct net_device *dev, +static int veth_newlink(struct rtnl_link_nets *nets, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { @@ -1800,7 +1800,7 @@ static int veth_newlink(struct net *src_net, struct net_device *dev, name_assign_type = NET_NAME_ENUM; } - net = rtnl_link_get_net(src_net, tbp); + net = rtnl_link_get_net(rtnl_link_netns(nets), tbp); peer = rtnl_create_link(net, ifname, name_assign_type, &veth_link_ops, tbp, extack); if (IS_ERR(peer)) { diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c index 67d25f4f94ef..7dd53d84b289 100644 --- a/drivers/net/vrf.c +++ b/drivers/net/vrf.c @@ -1698,7 +1698,7 @@ static void vrf_dellink(struct net_device *dev, struct list_head *head) unregister_netdevice_queue(dev, head); } -static int vrf_newlink(struct net *src_net, struct net_device *dev, +static int vrf_newlink(struct rtnl_link_nets *nets, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { diff --git a/drivers/net/vxlan/vxlan_core.c b/drivers/net/vxlan/vxlan_core.c index 42b07bc2b107..6a7fee5fc504 100644 --- a/drivers/net/vxlan/vxlan_core.c +++ b/drivers/net/vxlan/vxlan_core.c @@ -4345,7 +4345,7 @@ static int vxlan_nl2conf(struct nlattr *tb[], struct nlattr *data[], return 0; } -static int vxlan_newlink(struct net *src_net, struct net_device *dev, +static int vxlan_newlink(struct rtnl_link_nets *nets, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { @@ -4356,7 +4356,7 @@ static int vxlan_newlink(struct net *src_net, struct net_device *dev, if (err) return err; - return __vxlan_dev_create(src_net, dev, &conf, extack); + return __vxlan_dev_create(rtnl_link_netns(nets), dev, &conf, extack); } static int vxlan_changelink(struct net_device *dev, struct nlattr *tb[], diff --git a/drivers/net/wireguard/device.c b/drivers/net/wireguard/device.c index 45e9b908dbfb..25118e1085f7 100644 --- a/drivers/net/wireguard/device.c 
+++ b/drivers/net/wireguard/device.c @@ -306,14 +306,14 @@ static void wg_setup(struct net_device *dev) wg->dev = dev; } -static int wg_newlink(struct net *src_net, struct net_device *dev, +static int wg_newlink(struct rtnl_link_nets *nets, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { struct wg_device *wg = netdev_priv(dev); int ret = -ENOMEM; - rcu_assign_pointer(wg->creating_net, src_net); + rcu_assign_pointer(wg->creating_net, rtnl_link_netns(nets)); init_rwsem(&wg->static_identity.lock); mutex_init(&wg->socket_update_lock); mutex_init(&wg->device_update_lock); diff --git a/drivers/net/wireless/virtual/virt_wifi.c b/drivers/net/wireless/virtual/virt_wifi.c index 4ee374080466..3fea244010e8 100644 --- a/drivers/net/wireless/virtual/virt_wifi.c +++ b/drivers/net/wireless/virtual/virt_wifi.c @@ -519,7 +519,8 @@ static rx_handler_result_t virt_wifi_rx_handler(struct sk_buff **pskb) } /* Called with rtnl lock held. */ -static int virt_wifi_newlink(struct net *src_net, struct net_device *dev, +static int virt_wifi_newlink(struct rtnl_link_nets *nets, + struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { @@ -532,7 +533,7 @@ static int virt_wifi_newlink(struct net *src_net, struct net_device *dev, netif_carrier_off(dev); priv->upperdev = dev; - priv->lowerdev = __dev_get_by_index(src_net, + priv->lowerdev = __dev_get_by_index(rtnl_link_netns(nets), nla_get_u32(tb[IFLA_LINK])); if (!priv->lowerdev) diff --git a/drivers/net/wwan/wwan_core.c b/drivers/net/wwan/wwan_core.c index a51e2755991a..f208993e04c0 100644 --- a/drivers/net/wwan/wwan_core.c +++ b/drivers/net/wwan/wwan_core.c @@ -967,7 +967,8 @@ static struct net_device *wwan_rtnl_alloc(struct nlattr *tb[], return dev; } -static int wwan_rtnl_newlink(struct net *src_net, struct net_device *dev, +static int wwan_rtnl_newlink(struct rtnl_link_nets *nets, + struct net_device *dev, struct nlattr *tb[], struct nlattr 
*data[], struct netlink_ext_ack *extack) { @@ -1064,6 +1065,7 @@ static void wwan_create_default_link(struct wwan_device *wwandev, struct net_device *dev; struct nlmsghdr *nlh; struct sk_buff *msg; + struct rtnl_link_nets nets = { .src_net = &init_net }; /* Forge attributes required to create a WWAN netdev. We first * build a netlink message and then parse it. This looks @@ -1105,7 +1107,7 @@ static void wwan_create_default_link(struct wwan_device *wwandev, if (WARN_ON(IS_ERR(dev))) goto unlock; - if (WARN_ON(wwan_rtnl_newlink(&init_net, dev, tb, data, NULL))) { + if (WARN_ON(wwan_rtnl_newlink(&nets, dev, tb, data, NULL))) { free_netdev(dev); goto unlock; } diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h index 1aa31bdb2b31..ae1f2dda4533 100644 --- a/include/net/ip_tunnels.h +++ b/include/net/ip_tunnels.h @@ -406,8 +406,9 @@ int ip_tunnel_rcv(struct ip_tunnel *tunnel, struct sk_buff *skb, bool log_ecn_error); int ip_tunnel_changelink(struct net_device *dev, struct nlattr *tb[], struct ip_tunnel_parm_kern *p, __u32 fwmark); -int ip_tunnel_newlink(struct net_device *dev, struct nlattr *tb[], - struct ip_tunnel_parm_kern *p, __u32 fwmark); +int ip_tunnel_newlink(struct net *net, struct net_device *dev, + struct nlattr *tb[], struct ip_tunnel_parm_kern *p, + __u32 fwmark); void ip_tunnel_setup(struct net_device *dev, unsigned int net_id); bool ip_tunnel_netlink_encap_parms(struct nlattr *data[], diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h index bc0069a8b6ea..f285071a24d4 100644 --- a/include/net/rtnetlink.h +++ b/include/net/rtnetlink.h @@ -69,6 +69,26 @@ static inline int rtnl_msg_family(const struct nlmsghdr *nlh) return AF_UNSPEC; } +/** + * struct rtnl_link_nets - net namespace context of newlink. + * + * @src_net: Source netns of rtnetlink socket + * @link_net: Link netns by IFLA_LINK_NETNSID, NULL if not specified. 
+ */ +struct rtnl_link_nets { + struct net *src_net; + struct net *link_net; +}; + +/* Get effective link netns from struct rtnl_link_nets. Generally, this is + * link_net and falls back to src_net. But for compatibility, a driver may + * choose to use dev_net(dev) instead. + */ +static inline struct net *rtnl_link_netns(struct rtnl_link_nets *nets) +{ + return nets->link_net ? : nets->src_net; +} + /** * struct rtnl_link_ops - rtnetlink link operations * @@ -125,7 +145,7 @@ struct rtnl_link_ops { struct nlattr *data[], struct netlink_ext_ack *extack); - int (*newlink)(struct net *src_net, + int (*newlink)(struct rtnl_link_nets *nets, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], diff --git a/net/8021q/vlan_netlink.c b/net/8021q/vlan_netlink.c index 134419667d59..fe62c7c71e7f 100644 --- a/net/8021q/vlan_netlink.c +++ b/net/8021q/vlan_netlink.c @@ -135,7 +135,7 @@ static int vlan_changelink(struct net_device *dev, struct nlattr *tb[], return 0; } -static int vlan_newlink(struct net *src_net, struct net_device *dev, +static int vlan_newlink(struct rtnl_link_nets *nets, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { @@ -155,7 +155,8 @@ static int vlan_newlink(struct net *src_net, struct net_device *dev, return -EINVAL; } - real_dev = __dev_get_by_index(src_net, nla_get_u32(tb[IFLA_LINK])); + real_dev = __dev_get_by_index(rtnl_link_netns(nets), + nla_get_u32(tb[IFLA_LINK])); if (!real_dev) { NL_SET_ERR_MSG_MOD(extack, "link does not exist"); return -ENODEV; diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c index 2758aba47a2f..d1c12072feaa 100644 --- a/net/batman-adv/soft-interface.c +++ b/net/batman-adv/soft-interface.c @@ -1063,7 +1063,7 @@ static int batadv_softif_validate(struct nlattr *tb[], struct nlattr *data[], /** * batadv_softif_newlink() - pre-initialize and register new batadv link - * @src_net: the applicable net namespace + * @nets: the applicable net 
namespaces * @dev: network device to register * @tb: IFLA_INFO_DATA netlink attributes * @data: enum batadv_ifla_attrs attributes @@ -1071,7 +1071,8 @@ static int batadv_softif_validate(struct nlattr *tb[], struct nlattr *data[], * * Return: 0 if successful or error otherwise. */ -static int batadv_softif_newlink(struct net *src_net, struct net_device *dev, +static int batadv_softif_newlink(struct rtnl_link_nets *nets, + struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c index 3e0f47203f2a..715abe3f8b89 100644 --- a/net/bridge/br_netlink.c +++ b/net/bridge/br_netlink.c @@ -1553,7 +1553,7 @@ static int br_changelink(struct net_device *brdev, struct nlattr *tb[], return 0; } -static int br_dev_newlink(struct net *src_net, struct net_device *dev, +static int br_dev_newlink(struct rtnl_link_nets *nets, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { diff --git a/net/caif/chnl_net.c b/net/caif/chnl_net.c index 94ad09e36df2..a805f7552ac9 100644 --- a/net/caif/chnl_net.c +++ b/net/caif/chnl_net.c @@ -438,7 +438,7 @@ static void caif_netlink_parms(struct nlattr *data[], } } -static int ipcaif_newlink(struct net *src_net, struct net_device *dev, +static int ipcaif_newlink(struct rtnl_link_nets *nets, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index f573ace60234..93bc4ea8a62e 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -3750,6 +3750,10 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm, struct net_device *dev; char ifname[IFNAMSIZ]; int err; + struct rtnl_link_nets nets = { + .src_net = net, + .link_net = link_net, + }; if (!ops->alloc && !ops->setup) return -EOPNOTSUPP; @@ -3761,8 +3765,8 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct 
ifinfomsg *ifm, name_assign_type = NET_NAME_ENUM; } - dev = rtnl_create_link(link_net ? : tgt_net, ifname, - name_assign_type, ops, tb, extack); + dev = rtnl_create_link(tgt_net, ifname, name_assign_type, ops, tb, + extack); if (IS_ERR(dev)) { err = PTR_ERR(dev); goto out; @@ -3771,7 +3775,7 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm, dev->ifindex = ifm->ifi_index; if (ops->newlink) - err = ops->newlink(link_net ? : net, dev, tb, data, extack); + err = ops->newlink(&nets, dev, tb, data, extack); else err = register_netdevice(dev); if (err < 0) { @@ -3782,11 +3786,6 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm, err = rtnl_configure_link(dev, ifm, portid, nlh); if (err < 0) goto out_unregister; - if (link_net) { - err = dev_change_net_namespace(dev, tgt_net, ifname); - if (err < 0) - goto out_unregister; - } if (tb[IFLA_MASTER]) { err = do_set_master(dev, nla_get_u32(tb[IFLA_MASTER]), extack); if (err) diff --git a/net/hsr/hsr_netlink.c b/net/hsr/hsr_netlink.c index b68f2f71d0e1..efd42af79b94 100644 --- a/net/hsr/hsr_netlink.c +++ b/net/hsr/hsr_netlink.c @@ -29,7 +29,7 @@ static const struct nla_policy hsr_policy[IFLA_HSR_MAX + 1] = { /* Here, it seems a netdevice has already been allocated for us, and the * hsr_dev_setup routine has been executed. Nice! 
*/ -static int hsr_newlink(struct net *src_net, struct net_device *dev, +static int hsr_newlink(struct rtnl_link_nets *nets, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { @@ -46,7 +46,7 @@ static int hsr_newlink(struct net *src_net, struct net_device *dev, NL_SET_ERR_MSG_MOD(extack, "Slave1 device not specified"); return -EINVAL; } - link[0] = __dev_get_by_index(src_net, + link[0] = __dev_get_by_index(rtnl_link_netns(nets), nla_get_u32(data[IFLA_HSR_SLAVE1])); if (!link[0]) { NL_SET_ERR_MSG_MOD(extack, "Slave1 does not exist"); @@ -56,7 +56,7 @@ static int hsr_newlink(struct net *src_net, struct net_device *dev, NL_SET_ERR_MSG_MOD(extack, "Slave2 device not specified"); return -EINVAL; } - link[1] = __dev_get_by_index(src_net, + link[1] = __dev_get_by_index(rtnl_link_netns(nets), nla_get_u32(data[IFLA_HSR_SLAVE2])); if (!link[1]) { NL_SET_ERR_MSG_MOD(extack, "Slave2 does not exist"); @@ -69,7 +69,7 @@ static int hsr_newlink(struct net *src_net, struct net_device *dev, } if (data[IFLA_HSR_INTERLINK]) - interlink = __dev_get_by_index(src_net, + interlink = __dev_get_by_index(rtnl_link_netns(nets), nla_get_u32(data[IFLA_HSR_INTERLINK])); if (interlink && interlink == link[0]) { diff --git a/net/ieee802154/6lowpan/core.c b/net/ieee802154/6lowpan/core.c index 175efd860f7b..fd661fec3975 100644 --- a/net/ieee802154/6lowpan/core.c +++ b/net/ieee802154/6lowpan/core.c @@ -129,7 +129,7 @@ static int lowpan_validate(struct nlattr *tb[], struct nlattr *data[], return 0; } -static int lowpan_newlink(struct net *src_net, struct net_device *ldev, +static int lowpan_newlink(struct rtnl_link_nets *nets, struct net_device *ldev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { @@ -143,7 +143,8 @@ static int lowpan_newlink(struct net *src_net, struct net_device *ldev, if (!tb[IFLA_LINK]) return -EINVAL; /* find and hold wpan device */ - wdev = dev_get_by_index(dev_net(ldev), 
nla_get_u32(tb[IFLA_LINK])); + wdev = dev_get_by_index(nets->link_net ? : dev_net(ldev), + nla_get_u32(tb[IFLA_LINK])); if (!wdev) return -ENODEV; if (wdev->type != ARPHRD_IEEE802154) { diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c index f1f31ebfc793..6fc344f5005c 100644 --- a/net/ipv4/ip_gre.c +++ b/net/ipv4/ip_gre.c @@ -1389,10 +1389,11 @@ ipgre_newlink_encap_setup(struct net_device *dev, struct nlattr *data[]) return 0; } -static int ipgre_newlink(struct net *src_net, struct net_device *dev, +static int ipgre_newlink(struct rtnl_link_nets *nets, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { + struct net *net = nets->link_net ? : dev_net(dev); struct ip_tunnel_parm_kern p; __u32 fwmark = 0; int err; @@ -1404,13 +1405,14 @@ static int ipgre_newlink(struct net *src_net, struct net_device *dev, err = ipgre_netlink_parms(dev, data, tb, &p, &fwmark); if (err < 0) return err; - return ip_tunnel_newlink(dev, tb, &p, fwmark); + return ip_tunnel_newlink(net, dev, tb, &p, fwmark); } -static int erspan_newlink(struct net *src_net, struct net_device *dev, +static int erspan_newlink(struct rtnl_link_nets *nets, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { + struct net *net = nets->link_net ? 
: dev_net(dev); struct ip_tunnel_parm_kern p; __u32 fwmark = 0; int err; @@ -1422,7 +1424,7 @@ static int erspan_newlink(struct net *src_net, struct net_device *dev, err = erspan_netlink_parms(dev, data, tb, &p, &fwmark); if (err) return err; - return ip_tunnel_newlink(dev, tb, &p, fwmark); + return ip_tunnel_newlink(net, dev, tb, &p, fwmark); } static int ipgre_changelink(struct net_device *dev, struct nlattr *tb[], @@ -1695,6 +1697,7 @@ struct net_device *gretap_fb_dev_create(struct net *net, const char *name, LIST_HEAD(list_kill); struct ip_tunnel *t; int err; + struct rtnl_link_nets nets = { .src_net = net }; memset(&tb, 0, sizeof(tb)); @@ -1707,7 +1710,7 @@ struct net_device *gretap_fb_dev_create(struct net *net, const char *name, t = netdev_priv(dev); t->collect_md = true; - err = ipgre_newlink(net, dev, tb, NULL, NULL); + err = ipgre_newlink(&nets, dev, tb, NULL, NULL); if (err < 0) { free_netdev(dev); return ERR_PTR(err); diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c index 09b73acf037a..618a50d5c0c2 100644 --- a/net/ipv4/ip_tunnel.c +++ b/net/ipv4/ip_tunnel.c @@ -1213,11 +1213,11 @@ void ip_tunnel_delete_nets(struct list_head *net_list, unsigned int id, } EXPORT_SYMBOL_GPL(ip_tunnel_delete_nets); -int ip_tunnel_newlink(struct net_device *dev, struct nlattr *tb[], - struct ip_tunnel_parm_kern *p, __u32 fwmark) +int ip_tunnel_newlink(struct net *net, struct net_device *dev, + struct nlattr *tb[], struct ip_tunnel_parm_kern *p, + __u32 fwmark) { struct ip_tunnel *nt; - struct net *net = dev_net(dev); struct ip_tunnel_net *itn; int mtu; int err; @@ -1326,7 +1326,9 @@ int ip_tunnel_init(struct net_device *dev) } tunnel->dev = dev; - tunnel->net = dev_net(dev); + if (!tunnel->net) + tunnel->net = dev_net(dev); + strscpy(tunnel->parms.name, dev->name); iph->version = 4; iph->ihl = 5; diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c index f0b4419cef34..c315d60e02ec 100644 --- a/net/ipv4/ip_vti.c +++ b/net/ipv4/ip_vti.c @@ -575,7 +575,7 @@ static void 
vti_netlink_parms(struct nlattr *data[], *fwmark = nla_get_u32(data[IFLA_VTI_FWMARK]); } -static int vti_newlink(struct net *src_net, struct net_device *dev, +static int vti_newlink(struct rtnl_link_nets *nets, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { @@ -583,7 +583,8 @@ static int vti_newlink(struct net *src_net, struct net_device *dev, __u32 fwmark = 0; vti_netlink_parms(data, &parms, &fwmark); - return ip_tunnel_newlink(dev, tb, &parms, fwmark); + return ip_tunnel_newlink(nets->link_net ? : dev_net(dev), dev, tb, + &parms, fwmark); } static int vti_changelink(struct net_device *dev, struct nlattr *tb[], diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c index dc0db5895e0e..7d6619d7ab3e 100644 --- a/net/ipv4/ipip.c +++ b/net/ipv4/ipip.c @@ -436,7 +436,7 @@ static void ipip_netlink_parms(struct nlattr *data[], *fwmark = nla_get_u32(data[IFLA_IPTUN_FWMARK]); } -static int ipip_newlink(struct net *src_net, struct net_device *dev, +static int ipip_newlink(struct rtnl_link_nets *nets, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { @@ -453,7 +453,8 @@ static int ipip_newlink(struct net *src_net, struct net_device *dev, } ipip_netlink_parms(data, &p, &t->collect_md, &fwmark); - return ip_tunnel_newlink(dev, tb, &p, fwmark); + return ip_tunnel_newlink(nets->link_net ? 
: dev_net(dev), dev, tb, &p, + fwmark); } static int ipip_changelink(struct net_device *dev, struct nlattr *tb[], diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c index 235808cfec70..eb2909689002 100644 --- a/net/ipv6/ip6_gre.c +++ b/net/ipv6/ip6_gre.c @@ -1971,7 +1971,7 @@ static bool ip6gre_netlink_encap_parms(struct nlattr *data[], return ret; } -static int ip6gre_newlink_common(struct net *src_net, struct net_device *dev, +static int ip6gre_newlink_common(struct net *link_net, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { @@ -1992,7 +1992,7 @@ static int ip6gre_newlink_common(struct net *src_net, struct net_device *dev, eth_hw_addr_random(dev); nt->dev = dev; - nt->net = dev_net(dev); + nt->net = link_net; err = register_netdevice(dev); if (err) @@ -2005,12 +2005,12 @@ static int ip6gre_newlink_common(struct net *src_net, struct net_device *dev, return err; } -static int ip6gre_newlink(struct net *src_net, struct net_device *dev, +static int ip6gre_newlink(struct rtnl_link_nets *nets, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { struct ip6_tnl *nt = netdev_priv(dev); - struct net *net = dev_net(dev); + struct net *net = nets->link_net ? 
: dev_net(dev); struct ip6gre_net *ign; int err; @@ -2025,7 +2025,7 @@ static int ip6gre_newlink(struct net *src_net, struct net_device *dev, return -EEXIST; } - err = ip6gre_newlink_common(src_net, dev, tb, data, extack); + err = ip6gre_newlink_common(net, dev, tb, data, extack); if (!err) { ip6gre_tnl_link_config(nt, !tb[IFLA_MTU]); ip6gre_tunnel_link_md(ign, nt); @@ -2241,12 +2241,13 @@ static void ip6erspan_tap_setup(struct net_device *dev) netif_keep_dst(dev); } -static int ip6erspan_newlink(struct net *src_net, struct net_device *dev, +static int ip6erspan_newlink(struct rtnl_link_nets *nets, + struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { struct ip6_tnl *nt = netdev_priv(dev); - struct net *net = dev_net(dev); + struct net *net = nets->link_net ? : dev_net(dev); struct ip6gre_net *ign; int err; @@ -2262,7 +2263,7 @@ static int ip6erspan_newlink(struct net *src_net, struct net_device *dev, return -EEXIST; } - err = ip6gre_newlink_common(src_net, dev, tb, data, extack); + err = ip6gre_newlink_common(net, dev, tb, data, extack); if (!err) { ip6erspan_tnl_link_config(nt, !tb[IFLA_MTU]); ip6erspan_tunnel_link_md(ign, nt); diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c index 48fd53b98972..994357690d52 100644 --- a/net/ipv6/ip6_tunnel.c +++ b/net/ipv6/ip6_tunnel.c @@ -250,10 +250,9 @@ static void ip6_dev_free(struct net_device *dev) dst_cache_destroy(&t->dst_cache); } -static int ip6_tnl_create2(struct net_device *dev) +static int ip6_tnl_create2(struct net *net, struct net_device *dev) { struct ip6_tnl *t = netdev_priv(dev); - struct net *net = dev_net(dev); struct ip6_tnl_net *ip6n = net_generic(net, ip6_tnl_net_id); int err; @@ -308,7 +307,7 @@ static struct ip6_tnl *ip6_tnl_create(struct net *net, struct __ip6_tnl_parm *p) t = netdev_priv(dev); t->parms = *p; t->net = dev_net(dev); - err = ip6_tnl_create2(dev); + err = ip6_tnl_create2(net, dev); if (err < 0) goto failed_free; @@ -2002,11 
+2001,11 @@ static void ip6_tnl_netlink_parms(struct nlattr *data[], parms->fwmark = nla_get_u32(data[IFLA_IPTUN_FWMARK]); } -static int ip6_tnl_newlink(struct net *src_net, struct net_device *dev, +static int ip6_tnl_newlink(struct rtnl_link_nets *nets, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { - struct net *net = dev_net(dev); + struct net *net = nets->link_net ? : dev_net(dev); struct ip6_tnl_net *ip6n = net_generic(net, ip6_tnl_net_id); struct ip_tunnel_encap ipencap; struct ip6_tnl *nt, *t; @@ -2031,7 +2030,7 @@ static int ip6_tnl_newlink(struct net *src_net, struct net_device *dev, return -EEXIST; } - err = ip6_tnl_create2(dev); + err = ip6_tnl_create2(net, dev); if (!err && tb[IFLA_MTU]) ip6_tnl_change_mtu(dev, nla_get_u32(tb[IFLA_MTU])); diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c index 590737c27537..a329ee728331 100644 --- a/net/ipv6/ip6_vti.c +++ b/net/ipv6/ip6_vti.c @@ -174,10 +174,9 @@ vti6_tnl_unlink(struct vti6_net *ip6n, struct ip6_tnl *t) } } -static int vti6_tnl_create2(struct net_device *dev) +static int vti6_tnl_create2(struct net *net, struct net_device *dev) { struct ip6_tnl *t = netdev_priv(dev); - struct net *net = dev_net(dev); struct vti6_net *ip6n = net_generic(net, vti6_net_id); int err; @@ -221,7 +220,7 @@ static struct ip6_tnl *vti6_tnl_create(struct net *net, struct __ip6_tnl_parm *p t->parms = *p; t->net = dev_net(dev); - err = vti6_tnl_create2(dev); + err = vti6_tnl_create2(net, dev); if (err < 0) goto failed_free; @@ -997,11 +996,11 @@ static void vti6_netlink_parms(struct nlattr *data[], parms->fwmark = nla_get_u32(data[IFLA_VTI_FWMARK]); } -static int vti6_newlink(struct net *src_net, struct net_device *dev, +static int vti6_newlink(struct rtnl_link_nets *nets, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { - struct net *net = dev_net(dev); + struct net *net = nets->link_net ? 
: dev_net(dev); struct ip6_tnl *nt; nt = netdev_priv(dev); @@ -1012,7 +1011,7 @@ static int vti6_newlink(struct net *src_net, struct net_device *dev, if (vti6_locate(net, &nt->parms, 0)) return -EEXIST; - return vti6_tnl_create2(dev); + return vti6_tnl_create2(net, dev); } static void vti6_dellink(struct net_device *dev, struct list_head *head) diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c index 39bd8951bfca..6d7e4be325e7 100644 --- a/net/ipv6/sit.c +++ b/net/ipv6/sit.c @@ -198,10 +198,9 @@ static void ipip6_tunnel_clone_6rd(struct net_device *dev, struct sit_net *sitn) #endif } -static int ipip6_tunnel_create(struct net_device *dev) +static int ipip6_tunnel_create(struct net *net, struct net_device *dev) { struct ip_tunnel *t = netdev_priv(dev); - struct net *net = dev_net(dev); struct sit_net *sitn = net_generic(net, sit_net_id); int err; @@ -270,7 +269,7 @@ static struct ip_tunnel *ipip6_tunnel_locate(struct net *net, nt = netdev_priv(dev); nt->parms = *parms; - if (ipip6_tunnel_create(dev) < 0) + if (ipip6_tunnel_create(net, dev) < 0) goto failed_free; if (!parms->name[0]) @@ -1550,11 +1549,11 @@ static bool ipip6_netlink_6rd_parms(struct nlattr *data[], } #endif -static int ipip6_newlink(struct net *src_net, struct net_device *dev, +static int ipip6_newlink(struct rtnl_link_nets *nets, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { - struct net *net = dev_net(dev); + struct net *net = nets->link_net ? 
: dev_net(dev); struct ip_tunnel *nt; struct ip_tunnel_encap ipencap; #ifdef CONFIG_IPV6_SIT_6RD @@ -1575,7 +1574,7 @@ static int ipip6_newlink(struct net *src_net, struct net_device *dev, if (ipip6_tunnel_locate(net, &nt->parms, 0)) return -EEXIST; - err = ipip6_tunnel_create(dev); + err = ipip6_tunnel_create(net, dev); if (err < 0) return err; diff --git a/net/xfrm/xfrm_interface_core.c b/net/xfrm/xfrm_interface_core.c index 98f1e2b67c76..c072de6db133 100644 --- a/net/xfrm/xfrm_interface_core.c +++ b/net/xfrm/xfrm_interface_core.c @@ -242,10 +242,9 @@ static void xfrmi_dev_free(struct net_device *dev) gro_cells_destroy(&xi->gro_cells); } -static int xfrmi_create(struct net_device *dev) +static int xfrmi_create(struct net *net, struct net_device *dev) { struct xfrm_if *xi = netdev_priv(dev); - struct net *net = dev_net(dev); struct xfrmi_net *xfrmn = net_generic(net, xfrmi_net_id); int err; @@ -814,11 +813,11 @@ static void xfrmi_netlink_parms(struct nlattr *data[], parms->collect_md = true; } -static int xfrmi_newlink(struct net *src_net, struct net_device *dev, - struct nlattr *tb[], struct nlattr *data[], - struct netlink_ext_ack *extack) +static int xfrmi_newlink(struct rtnl_link_nets *nets, struct net_device *dev, + struct nlattr *tb[], struct nlattr *data[], + struct netlink_ext_ack *extack) { - struct net *net = dev_net(dev); + struct net *net = nets->link_net ? 
: dev_net(dev); struct xfrm_if_parms p = {}; struct xfrm_if *xi; int err; @@ -851,7 +850,7 @@ static int xfrmi_newlink(struct net *src_net, struct net_device *dev, xi->net = net; xi->dev = dev; - err = xfrmi_create(dev); + err = xfrmi_create(net, dev); return err; } -- 2.47.0 From shaw.leon at gmail.com Wed Nov 13 12:57:14 2024 From: shaw.leon at gmail.com (Xiao Liang) Date: Wed, 13 Nov 2024 20:57:14 +0800 Subject: [PATCH net-next v3 5/6] selftests: net: Add python context manager for netns entering In-Reply-To: <20241113125715.150201-1-shaw.leon@gmail.com> References: <20241113125715.150201-1-shaw.leon@gmail.com> Message-ID: <20241113125715.150201-6-shaw.leon@gmail.com> Change netns of current thread and switch back on context exit. For example: with NetNSEnter("ns1"): ip("link add dummy0 type dummy") The command will be executed in netns "ns1". Signed-off-by: Xiao Liang --- tools/testing/selftests/net/lib/py/__init__.py | 2 +- tools/testing/selftests/net/lib/py/netns.py | 18 ++++++++++++++++++ 2 files changed, 19 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/net/lib/py/__init__.py b/tools/testing/selftests/net/lib/py/__init__.py index 54d8f5eba810..e2d6c7b63019 100644 --- a/tools/testing/selftests/net/lib/py/__init__.py +++ b/tools/testing/selftests/net/lib/py/__init__.py @@ -2,7 +2,7 @@ from .consts import KSRC from .ksft import * -from .netns import NetNS +from .netns import NetNS, NetNSEnter from .nsim import * from .utils import * from .ynl import NlError, YnlFamily, EthtoolFamily, NetdevFamily, RtnlFamily diff --git a/tools/testing/selftests/net/lib/py/netns.py b/tools/testing/selftests/net/lib/py/netns.py index ecff85f9074f..8e9317044eef 100644 --- a/tools/testing/selftests/net/lib/py/netns.py +++ b/tools/testing/selftests/net/lib/py/netns.py @@ -1,9 +1,12 @@ # SPDX-License-Identifier: GPL-2.0 from .utils import ip +import ctypes import random import string +libc = ctypes.cdll.LoadLibrary('libc.so.6') + class NetNS: def __init__(self, 
name=None): @@ -29,3 +32,18 @@ class NetNS: def __repr__(self): return f"NetNS({self.name})" + + +class NetNSEnter: + def __init__(self, ns_name): + self.ns_path = f"/run/netns/{ns_name}" + + def __enter__(self): + self.saved = open("/proc/thread-self/ns/net") + with open(self.ns_path) as ns_file: + libc.setns(ns_file.fileno(), 0) + return self + + def __exit__(self, exc_type, exc_value, traceback): + libc.setns(self.saved.fileno(), 0) + self.saved.close() -- 2.47.0 From shaw.leon at gmail.com Wed Nov 13 12:57:15 2024 From: shaw.leon at gmail.com (Xiao Liang) Date: Wed, 13 Nov 2024 20:57:15 +0800 Subject: [PATCH net-next v3 6/6] selftests: net: Add two test cases for link netns In-Reply-To: <20241113125715.150201-1-shaw.leon@gmail.com> References: <20241113125715.150201-1-shaw.leon@gmail.com> Message-ID: <20241113125715.150201-7-shaw.leon@gmail.com> - Add test for creating link in another netns when a link of the same name and ifindex exists in current netns. - Add test for link netns atomicity - create link directly in target netns, and no notifications should be generated in current netns. 
Signed-off-by: Xiao Liang --- tools/testing/selftests/net/Makefile | 1 + tools/testing/selftests/net/netns-name.sh | 10 ++++++ tools/testing/selftests/net/netns_atomic.py | 38 +++++++++++++++++++++ 3 files changed, 49 insertions(+) create mode 100755 tools/testing/selftests/net/netns_atomic.py diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile index 2b2a5ec7fa6a..4c15a115c251 100644 --- a/tools/testing/selftests/net/Makefile +++ b/tools/testing/selftests/net/Makefile @@ -34,6 +34,7 @@ TEST_PROGS += gre_gso.sh TEST_PROGS += cmsg_so_mark.sh TEST_PROGS += cmsg_time.sh cmsg_ipv6.sh TEST_PROGS += netns-name.sh +TEST_PROGS += netns_atomic.py TEST_PROGS += nl_netdev.py TEST_PROGS += srv6_end_dt46_l3vpn_test.sh TEST_PROGS += srv6_end_dt4_l3vpn_test.sh diff --git a/tools/testing/selftests/net/netns-name.sh b/tools/testing/selftests/net/netns-name.sh index 6974474c26f3..0be1905d1f2f 100755 --- a/tools/testing/selftests/net/netns-name.sh +++ b/tools/testing/selftests/net/netns-name.sh @@ -78,6 +78,16 @@ ip -netns $NS link show dev $ALT_NAME 2> /dev/null && fail "Can still find alt-name after move" ip -netns $test_ns link del $DEV || fail +# +# Test no conflict of the same name/ifindex in different netns +# +ip -netns $NS link add name $DEV index 100 type dummy || fail +ip -netns $NS link add netns $test_ns name $DEV index 100 type dummy || + fail "Can create in netns without moving" +ip -netns $test_ns link show dev $DEV >> /dev/null || fail "Device not found" +ip -netns $NS link del $DEV || fail +ip -netns $test_ns link del $DEV || fail + echo -ne "$(basename $0) \t\t\t\t" if [ $RET_CODE -eq 0 ]; then echo "[ OK ]" diff --git a/tools/testing/selftests/net/netns_atomic.py b/tools/testing/selftests/net/netns_atomic.py new file mode 100755 index 000000000000..e6c4147ef75e --- /dev/null +++ b/tools/testing/selftests/net/netns_atomic.py @@ -0,0 +1,38 @@ +#!/usr/bin/env python3 +# SPDX-License-Identifier: GPL-2.0 + +import time + +from lib.py 
import ksft_run, ksft_exit, ksft_true +from lib.py import ip +from lib.py import NetNS, NetNSEnter +from lib.py import RtnlFamily + + +def test_event(ns1, ns2) -> None: + with NetNSEnter(str(ns1)): + rtnl = RtnlFamily() + + rtnl.ntf_subscribe("rtnlgrp-link") + + ip(f"netns set {ns1} 0", ns=str(ns2)) + + ip(f"link add netns {ns2} link-netnsid 0 dummy1 type dummy") + ip(f"link add netns {ns2} dummy2 type dummy", ns=str(ns1)) + + ip("link del dummy1", ns=str(ns2)) + ip("link del dummy2", ns=str(ns2)) + + time.sleep(1) + rtnl.check_ntf() + ksft_true(not rtnl.async_msg_queue, "Received unexpected link notification") + + +def main() -> None: + with NetNS() as ns1, NetNS() as ns2: + ksft_run([test_event], args=(ns1, ns2)) + ksft_exit() + + +if __name__ == "__main__": + main() -- 2.47.0 From patch-notifications at ellerman.id.au Sun Nov 17 11:56:15 2024 From: patch-notifications at ellerman.id.au (Michael Ellerman) Date: Sun, 17 Nov 2024 22:56:15 +1100 Subject: (subset) [PATCH 00/17] replace call_rcu by kfree_rcu for simple kmem_cache_free callback In-Reply-To: <20241013201704.49576-1-Julia.Lawall@inria.fr> References: <20241013201704.49576-1-Julia.Lawall@inria.fr> Message-ID: <173184457524.887714.2708612402334434298.b4-ty@ellerman.id.au> On Sun, 13 Oct 2024 22:16:47 +0200, Julia Lawall wrote: > Since SLOB was removed and since > commit 6c6c47b063b5 ("mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()"), > it is not necessary to use call_rcu when the callback only performs > kmem_cache_free. Use kfree_rcu() directly. > > The changes were done using the following Coccinelle semantic patch. > This semantic patch is designed to ignore cases where the callback > function is used in another way. > > [...] Applied to powerpc/topic/ppc-kvm. 
[13/17] KVM: PPC: replace call_rcu by kfree_rcu for simple kmem_cache_free callback
        https://git.kernel.org/powerpc/c/1db6a4e8a3fc8ccaa4690272935e02831dc6d40d

cheers

From shaw.leon at gmail.com Mon Nov 18 14:32:39 2024
From: shaw.leon at gmail.com (Xiao Liang)
Date: Mon, 18 Nov 2024 22:32:39 +0800
Subject: [PATCH net-next v4 0/5] net: Improve netns handling in RTNL and ip_tunnel
Message-ID: <20241118143244.1773-1-shaw.leon@gmail.com>

This patch series includes some netns-related improvements and fixes for
RTNL and ip_tunnel, to make link creation more intuitive:

 - Creating a link in another net namespace doesn't conflict with link
   names in the current one.
 - Refactor rtnetlink link creation. Create the link in the target
   namespace directly. Pass both source and link netns to drivers via
   the newlink() callback.

So that

  # ip link add netns ns1 link-netns ns2 tun0 type gre ...

will create tun0 in ns1, rather than creating it in ns2 and moving it to
ns1, and will not conflict with another interface named "tun0" in the
current netns.

---

v4:
 - Pack newlink() parameters into a single struct.
 - Use ynl async_msg_queue.empty() in selftest.

v3:
 link: https://lore.kernel.org/all/20241113125715.150201-1-shaw.leon at gmail.com/
 - Drop "netns_atomic" flag and module parameter. Add netns parameter to
   newlink() instead, and convert drivers accordingly.
 - Move python NetNSEnter helper to net selftest lib.

v2:
 link: https://lore.kernel.org/all/20241107133004.7469-1-shaw.leon at gmail.com/
 - Check NLM_F_EXCL to ensure only link creation is affected.
 - Add self tests for link name/ifindex conflict and notifications
   in different netns.
 - Changes in dummy driver and ynl in order to add the test case.

v1: link: https://lore.kernel.org/all/20241023023146.372653-1-shaw.leon at gmail.com/ Xiao Liang (5): net: ip_tunnel: Build flow in underlay net namespace rtnetlink: Lookup device in target netns when creating link rtnetlink: Decouple net namespaces in rtnl_newlink_create() selftests: net: Add python context manager for netns entering selftests: net: Add two test cases for link netns drivers/infiniband/ulp/ipoib/ipoib_netlink.c | 11 ++++-- drivers/net/amt.c | 13 ++++--- drivers/net/bareudp.c | 11 ++++-- drivers/net/bonding/bond_netlink.c | 8 ++-- drivers/net/can/dev/netlink.c | 4 +- drivers/net/can/vxcan.c | 11 ++++-- .../ethernet/qualcomm/rmnet/rmnet_config.c | 11 ++++-- drivers/net/geneve.c | 11 ++++-- drivers/net/gtp.c | 9 +++-- drivers/net/ipvlan/ipvlan.h | 4 +- drivers/net/ipvlan/ipvlan_main.c | 11 ++++-- drivers/net/ipvlan/ipvtap.c | 7 ++-- drivers/net/macsec.c | 11 ++++-- drivers/net/macvlan.c | 8 ++-- drivers/net/macvtap.c | 8 ++-- drivers/net/netkit.c | 11 ++++-- drivers/net/pfcp.c | 8 ++-- drivers/net/ppp/ppp_generic.c | 10 +++-- drivers/net/team/team_core.c | 7 ++-- drivers/net/veth.c | 11 ++++-- drivers/net/vrf.c | 7 ++-- drivers/net/vxlan/vxlan_core.c | 11 ++++-- drivers/net/wireguard/device.c | 8 ++-- drivers/net/wireless/virtual/virt_wifi.c | 10 +++-- drivers/net/wwan/wwan_core.c | 15 +++++-- include/net/ip_tunnels.h | 5 ++- include/net/rtnetlink.h | 34 +++++++++++++--- net/8021q/vlan_netlink.c | 11 ++++-- net/batman-adv/soft-interface.c | 8 ++-- net/bridge/br_netlink.c | 8 ++-- net/caif/chnl_net.c | 6 +-- net/core/rtnetlink.c | 29 +++++++++----- net/hsr/hsr_netlink.c | 14 ++++--- net/ieee802154/6lowpan/core.c | 9 +++-- net/ipv4/ip_gre.c | 27 ++++++++----- net/ipv4/ip_tunnel.c | 16 ++++---- net/ipv4/ip_vti.c | 10 +++-- net/ipv4/ipip.c | 10 +++-- net/ipv6/ip6_gre.c | 28 +++++++------ net/ipv6/ip6_tunnel.c | 16 ++++---- net/ipv6/ip6_vti.c | 15 ++++--- net/ipv6/sit.c | 16 ++++---- net/xfrm/xfrm_interface_core.c | 14 +++---- 
tools/testing/selftests/net/Makefile | 1 + .../testing/selftests/net/lib/py/__init__.py | 2 +- tools/testing/selftests/net/lib/py/netns.py | 18 +++++++++ tools/testing/selftests/net/netns-name.sh | 10 +++++ tools/testing/selftests/net/netns_atomic.py | 39 +++++++++++++++++++ 48 files changed, 377 insertions(+), 205 deletions(-) create mode 100755 tools/testing/selftests/net/netns_atomic.py -- 2.47.0 From shaw.leon at gmail.com Mon Nov 18 14:32:40 2024 From: shaw.leon at gmail.com (Xiao Liang) Date: Mon, 18 Nov 2024 22:32:40 +0800 Subject: [PATCH net-next v4 1/5] net: ip_tunnel: Build flow in underlay net namespace In-Reply-To: <20241118143244.1773-1-shaw.leon@gmail.com> References: <20241118143244.1773-1-shaw.leon@gmail.com> Message-ID: <20241118143244.1773-2-shaw.leon@gmail.com> Build IPv4 flow in underlay net namespace, where encapsulated packets are routed. Signed-off-by: Xiao Liang --- net/ipv4/ip_tunnel.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c index 25505f9b724c..09b73acf037a 100644 --- a/net/ipv4/ip_tunnel.c +++ b/net/ipv4/ip_tunnel.c @@ -294,7 +294,7 @@ static int ip_tunnel_bind_dev(struct net_device *dev) ip_tunnel_init_flow(&fl4, iph->protocol, iph->daddr, iph->saddr, tunnel->parms.o_key, - iph->tos & INET_DSCP_MASK, dev_net(dev), + iph->tos & INET_DSCP_MASK, tunnel->net, tunnel->parms.link, tunnel->fwmark, 0, 0); rt = ip_route_output_key(tunnel->net, &fl4); @@ -611,7 +611,7 @@ void ip_md_tunnel_xmit(struct sk_buff *skb, struct net_device *dev, } ip_tunnel_init_flow(&fl4, proto, key->u.ipv4.dst, key->u.ipv4.src, tunnel_id_to_key32(key->tun_id), - tos & INET_DSCP_MASK, dev_net(dev), 0, skb->mark, + tos & INET_DSCP_MASK, tunnel->net, 0, skb->mark, skb_get_hash(skb), key->flow_flags); if (!tunnel_hlen) @@ -774,7 +774,7 @@ void ip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev, ip_tunnel_init_flow(&fl4, protocol, dst, tnl_params->saddr, tunnel->parms.o_key, tos & 
INET_DSCP_MASK, - dev_net(dev), READ_ONCE(tunnel->parms.link), + tunnel->net, READ_ONCE(tunnel->parms.link), tunnel->fwmark, skb_get_hash(skb), 0); if (ip_tunnel_encap(skb, &tunnel->encap, &protocol, &fl4) < 0) -- 2.47.0 From shaw.leon at gmail.com Mon Nov 18 14:32:41 2024 From: shaw.leon at gmail.com (Xiao Liang) Date: Mon, 18 Nov 2024 22:32:41 +0800 Subject: [PATCH net-next v4 2/5] rtnetlink: Lookup device in target netns when creating link In-Reply-To: <20241118143244.1773-1-shaw.leon@gmail.com> References: <20241118143244.1773-1-shaw.leon@gmail.com> Message-ID: <20241118143244.1773-3-shaw.leon@gmail.com> When creating link, lookup for existing device in target net namespace instead of current one. For example, two links created by: # ip link add dummy1 type dummy # ip link add netns ns1 dummy1 type dummy should have no conflict since they are in different namespaces. Signed-off-by: Xiao Liang --- net/core/rtnetlink.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index dd142f444659..bc9d0ecd3a1e 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -3846,20 +3846,26 @@ static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh, { struct nlattr ** const tb = tbs->tb; struct net *net = sock_net(skb->sk); + struct net *device_net; struct net_device *dev; struct ifinfomsg *ifm; bool link_specified; + /* When creating, lookup for existing device in target net namespace */ + device_net = (nlh->nlmsg_flags & NLM_F_CREATE) && + (nlh->nlmsg_flags & NLM_F_EXCL) ? 
+		     tgt_net : net;
+
 	ifm = nlmsg_data(nlh);
 	if (ifm->ifi_index > 0) {
 		link_specified = true;
-		dev = __dev_get_by_index(net, ifm->ifi_index);
+		dev = __dev_get_by_index(device_net, ifm->ifi_index);
 	} else if (ifm->ifi_index < 0) {
 		NL_SET_ERR_MSG(extack, "ifindex can't be negative");
 		return -EINVAL;
 	} else if (tb[IFLA_IFNAME] || tb[IFLA_ALT_IFNAME]) {
 		link_specified = true;
-		dev = rtnl_dev_get(net, tb);
+		dev = rtnl_dev_get(device_net, tb);
 	} else {
 		link_specified = false;
 		dev = NULL;
--
2.47.0

From shaw.leon at gmail.com Mon Nov 18 14:32:42 2024
From: shaw.leon at gmail.com (Xiao Liang)
Date: Mon, 18 Nov 2024 22:32:42 +0800
Subject: [PATCH net-next v4 3/5] rtnetlink: Decouple net namespaces in rtnl_newlink_create()
In-Reply-To: <20241118143244.1773-1-shaw.leon@gmail.com>
References: <20241118143244.1773-1-shaw.leon@gmail.com>
Message-ID: <20241118143244.1773-4-shaw.leon@gmail.com>

There are three net namespaces involved when creating links:

 - source netns - where the netlink socket resides,
 - target netns - where to put the device being created,
 - link netns - netns associated with the device (backend).

Currently, two nets are passed to newlink() callback - "src_net"
parameter and "dev_net" (implicitly in net_device). They are set as
follows, depending on whether IFLA_LINK_NETNSID is present.

 +-------------------+---------+---------+
 | IFLA_LINK_NETNSID | src_net | dev_net |
 +-------------------+---------+---------+
 | absent            | source  | target  |
 +-------------------+---------+---------+
 | present           | link    | link    |
 +-------------------+---------+---------+

When IFLA_LINK_NETNSID is present, the device is created in link netns
first. This has some side effects, including extra ifindex allocation,
ifname validation and link notifications. There's also an extra step to
move the device to target netns. These could be avoided if we create it
in target netns at the beginning.

On the other hand, the meaning of src_net is ambiguous.
It should be the effective link netns by design, but some drivers ignore
it and use dev_net instead.

This patch refactors netns handling by packing newlink() parameters into
a struct, and passing source and link netns as is through this struct.
rtnl_newlink_create() now creates devices in target netns directly, so
dev_net is always target netns.

When determining the effective link netns, in the absence of link_net,
drivers should look for src_net in general. But for compatibility,
drivers that use dev_net will keep current behavior.

Signed-off-by: Xiao Liang
---

Some issues were found when converting drivers. Please check if they
work as intended:

- In amt_newlink() in drivers/net/amt.c:

    amt->net = net;
    ...
    amt->stream_dev = dev_get_by_index(net, ...

  Uses net (src_net actually), but amt_lookup_upper_dev() only searches
  in dev_net.

- In gtp_newlink() in drivers/net/gtp.c:

    gtp->net = src_net;
    ...
    gn = net_generic(dev_net(dev), gtp_net_id);
    list_add_rcu(&gtp->list, &gn->gtp_dev_list);

  Uses src_net, but is linked to list in dev_net.

- In pfcp_newlink() in drivers/net/pfcp.c:

    pfcp->net = net;
    ...
    pn = net_generic(dev_net(dev), pfcp_net_id);
    list_add_rcu(&pfcp->list, &pn->pfcp_dev_list);

  Same.

- In lowpan_newlink() in net/ieee802154/6lowpan/core.c:

    wdev = dev_get_by_index(dev_net(ldev), nla_get_u32(tb[IFLA_LINK]));

  Looks for IFLA_LINK in dev_net, but in theory the ifindex is defined
  in link netns.

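For readers following along, the fallback rule from the commit message
(use the link netns when one was given, otherwise fall back to the source
netns) can be sketched in plain userspace C. This is only an illustration
under stated assumptions: struct newlink_params_sketch and
effective_link_net() are hypothetical names, struct net is a stand-in, and
nothing here is the actual definition added to include/net/rtnetlink.h.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's struct net; only pointer identity matters. */
struct net {
	int id;
};

/* Hypothetical, trimmed-down parameter struct (field names assumed). */
struct newlink_params_sketch {
	struct net *src_net;	/* netns where the netlink socket resides */
	struct net *link_net;	/* netns from IFLA_LINK_NETNSID, NULL if absent */
};

/*
 * Effective link netns as described in the commit message: prefer the
 * explicit link netns, otherwise fall back to the source netns.
 */
static struct net *effective_link_net(const struct newlink_params_sketch *p)
{
	return p->link_net ? p->link_net : p->src_net;
}
```

With IFLA_LINK_NETNSID absent (link_net == NULL) the helper yields the
source netns; with it present it yields the link netns. The target netns
plays no part in this choice.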
--- drivers/infiniband/ulp/ipoib/ipoib_netlink.c | 11 +++--- drivers/net/amt.c | 13 ++++--- drivers/net/bareudp.c | 11 +++--- drivers/net/bonding/bond_netlink.c | 8 +++-- drivers/net/can/dev/netlink.c | 4 +-- drivers/net/can/vxcan.c | 11 +++--- .../ethernet/qualcomm/rmnet/rmnet_config.c | 11 +++--- drivers/net/geneve.c | 11 +++--- drivers/net/gtp.c | 9 ++--- drivers/net/ipvlan/ipvlan.h | 4 +-- drivers/net/ipvlan/ipvlan_main.c | 11 +++--- drivers/net/ipvlan/ipvtap.c | 7 ++-- drivers/net/macsec.c | 11 +++--- drivers/net/macvlan.c | 8 ++--- drivers/net/macvtap.c | 8 ++--- drivers/net/netkit.c | 11 +++--- drivers/net/pfcp.c | 8 ++--- drivers/net/ppp/ppp_generic.c | 10 +++--- drivers/net/team/team_core.c | 7 ++-- drivers/net/veth.c | 11 +++--- drivers/net/vrf.c | 7 ++-- drivers/net/vxlan/vxlan_core.c | 11 +++--- drivers/net/wireguard/device.c | 8 ++--- drivers/net/wireless/virtual/virt_wifi.c | 10 +++--- drivers/net/wwan/wwan_core.c | 15 +++++--- include/net/ip_tunnels.h | 5 +-- include/net/rtnetlink.h | 34 ++++++++++++++++--- net/8021q/vlan_netlink.c | 11 +++--- net/batman-adv/soft-interface.c | 8 ++--- net/bridge/br_netlink.c | 8 +++-- net/caif/chnl_net.c | 6 ++-- net/core/rtnetlink.c | 19 ++++++----- net/hsr/hsr_netlink.c | 14 ++++---- net/ieee802154/6lowpan/core.c | 9 ++--- net/ipv4/ip_gre.c | 27 ++++++++++----- net/ipv4/ip_tunnel.c | 10 +++--- net/ipv4/ip_vti.c | 10 +++--- net/ipv4/ipip.c | 10 +++--- net/ipv6/ip6_gre.c | 28 ++++++++------- net/ipv6/ip6_tunnel.c | 16 ++++----- net/ipv6/ip6_vti.c | 15 ++++---- net/ipv6/sit.c | 16 ++++----- net/xfrm/xfrm_interface_core.c | 14 ++++---- 43 files changed, 297 insertions(+), 199 deletions(-) diff --git a/drivers/infiniband/ulp/ipoib/ipoib_netlink.c b/drivers/infiniband/ulp/ipoib/ipoib_netlink.c index 9ad8d9856275..da587af85d4f 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_netlink.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_netlink.c @@ -97,10 +97,13 @@ static int ipoib_changelink(struct net_device *dev, struct nlattr 
*tb[], return ret; } -static int ipoib_new_child_link(struct net *src_net, struct net_device *dev, - struct nlattr *tb[], struct nlattr *data[], - struct netlink_ext_ack *extack) +static int ipoib_new_child_link(struct rtnl_newlink_params *params) { + struct net_device *dev = params->dev; + struct nlattr **tb = params->tb; + struct nlattr **data = params->data; + struct netlink_ext_ack *extack = params->extack; + struct net *link_net = rtnl_newlink_link_net(params); struct net_device *pdev; struct ipoib_dev_priv *ppriv; u16 child_pkey; @@ -109,7 +112,7 @@ static int ipoib_new_child_link(struct net *src_net, struct net_device *dev, if (!tb[IFLA_LINK]) return -EINVAL; - pdev = __dev_get_by_index(src_net, nla_get_u32(tb[IFLA_LINK])); + pdev = __dev_get_by_index(link_net, nla_get_u32(tb[IFLA_LINK])); if (!pdev || pdev->type != ARPHRD_INFINIBAND) return -ENODEV; diff --git a/drivers/net/amt.c b/drivers/net/amt.c index 98c6205ed19f..2f7bf50e05d2 100644 --- a/drivers/net/amt.c +++ b/drivers/net/amt.c @@ -3161,14 +3161,17 @@ static int amt_validate(struct nlattr *tb[], struct nlattr *data[], return 0; } -static int amt_newlink(struct net *net, struct net_device *dev, - struct nlattr *tb[], struct nlattr *data[], - struct netlink_ext_ack *extack) +static int amt_newlink(struct rtnl_newlink_params *params) { + struct net_device *dev = params->dev; + struct nlattr **tb = params->tb; + struct nlattr **data = params->data; + struct netlink_ext_ack *extack = params->extack; + struct net *link_net = rtnl_newlink_link_net(params); struct amt_dev *amt = netdev_priv(dev); int err = -EINVAL; - amt->net = net; + amt->net = link_net; amt->mode = nla_get_u32(data[IFLA_AMT_MODE]); if (data[IFLA_AMT_MAX_TUNNELS] && @@ -3183,7 +3186,7 @@ static int amt_newlink(struct net *net, struct net_device *dev, amt->hash_buckets = AMT_HSIZE; amt->nr_tunnels = 0; get_random_bytes(&amt->hash_seed, sizeof(amt->hash_seed)); - amt->stream_dev = dev_get_by_index(net, + amt->stream_dev = 
dev_get_by_index(link_net, nla_get_u32(data[IFLA_AMT_LINK])); if (!amt->stream_dev) { NL_SET_ERR_MSG_ATTR(extack, tb[IFLA_AMT_LINK], diff --git a/drivers/net/bareudp.c b/drivers/net/bareudp.c index a2abfade82dd..49ff79498129 100644 --- a/drivers/net/bareudp.c +++ b/drivers/net/bareudp.c @@ -698,10 +698,13 @@ static void bareudp_dellink(struct net_device *dev, struct list_head *head) unregister_netdevice_queue(dev, head); } -static int bareudp_newlink(struct net *net, struct net_device *dev, - struct nlattr *tb[], struct nlattr *data[], - struct netlink_ext_ack *extack) +static int bareudp_newlink(struct rtnl_newlink_params *params) { + struct net_device *dev = params->dev; + struct nlattr **tb = params->tb; + struct nlattr **data = params->data; + struct netlink_ext_ack *extack = params->extack; + struct net *link_net = rtnl_newlink_link_net(params); struct bareudp_conf conf; int err; @@ -709,7 +712,7 @@ static int bareudp_newlink(struct net *net, struct net_device *dev, if (err) return err; - err = bareudp_configure(net, dev, &conf, extack); + err = bareudp_configure(link_net, dev, &conf, extack); if (err) return err; diff --git a/drivers/net/bonding/bond_netlink.c b/drivers/net/bonding/bond_netlink.c index 2a6a424806aa..db3062c6dbe0 100644 --- a/drivers/net/bonding/bond_netlink.c +++ b/drivers/net/bonding/bond_netlink.c @@ -564,10 +564,12 @@ static int bond_changelink(struct net_device *bond_dev, struct nlattr *tb[], return 0; } -static int bond_newlink(struct net *src_net, struct net_device *bond_dev, - struct nlattr *tb[], struct nlattr *data[], - struct netlink_ext_ack *extack) +static int bond_newlink(struct rtnl_newlink_params *params) { + struct net_device *bond_dev = params->dev; + struct nlattr **tb = params->tb; + struct nlattr **data = params->data; + struct netlink_ext_ack *extack = params->extack; int err; err = bond_changelink(bond_dev, tb, data, extack); diff --git a/drivers/net/can/dev/netlink.c b/drivers/net/can/dev/netlink.c index 
01aacdcda260..52dae0e94858 100644 --- a/drivers/net/can/dev/netlink.c +++ b/drivers/net/can/dev/netlink.c @@ -624,9 +624,7 @@ static int can_fill_xstats(struct sk_buff *skb, const struct net_device *dev) return -EMSGSIZE; } -static int can_newlink(struct net *src_net, struct net_device *dev, - struct nlattr *tb[], struct nlattr *data[], - struct netlink_ext_ack *extack) +static int can_newlink(struct rtnl_newlink_params *params) { return -EOPNOTSUPP; } diff --git a/drivers/net/can/vxcan.c b/drivers/net/can/vxcan.c index da7c72105fb6..8727701d5055 100644 --- a/drivers/net/can/vxcan.c +++ b/drivers/net/can/vxcan.c @@ -172,10 +172,13 @@ static void vxcan_setup(struct net_device *dev) /* forward declaration for rtnl_create_link() */ static struct rtnl_link_ops vxcan_link_ops; -static int vxcan_newlink(struct net *net, struct net_device *dev, - struct nlattr *tb[], struct nlattr *data[], - struct netlink_ext_ack *extack) +static int vxcan_newlink(struct rtnl_newlink_params *params) { + struct net_device *dev = params->dev; + struct nlattr **tb = params->tb; + struct nlattr **data = params->data; + struct netlink_ext_ack *extack = params->extack; + struct net *link_net = rtnl_newlink_link_net(params); struct vxcan_priv *priv; struct net_device *peer; struct net *peer_net; @@ -203,7 +206,7 @@ static int vxcan_newlink(struct net *net, struct net_device *dev, name_assign_type = NET_NAME_ENUM; } - peer_net = rtnl_link_get_net(net, tbp); + peer_net = rtnl_link_get_net(link_net, tbp); peer = rtnl_create_link(peer_net, ifname, name_assign_type, &vxcan_link_ops, tbp, extack); if (IS_ERR(peer)) { diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c index f3bea196a8f9..d45555d784e6 100644 --- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c +++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c @@ -117,10 +117,13 @@ static void rmnet_unregister_bridge(struct rmnet_port *port) 
rmnet_unregister_real_device(bridge_dev); } -static int rmnet_newlink(struct net *src_net, struct net_device *dev, - struct nlattr *tb[], struct nlattr *data[], - struct netlink_ext_ack *extack) +static int rmnet_newlink(struct rtnl_newlink_params *params) { + struct net_device *dev = params->dev; + struct nlattr **tb = params->tb; + struct nlattr **data = params->data; + struct netlink_ext_ack *extack = params->extack; + struct net *link_net = rtnl_newlink_link_net(params); u32 data_format = RMNET_FLAGS_INGRESS_DEAGGREGATION; struct net_device *real_dev; int mode = RMNET_EPMODE_VND; @@ -134,7 +137,7 @@ static int rmnet_newlink(struct net *src_net, struct net_device *dev, return -EINVAL; } - real_dev = __dev_get_by_index(src_net, nla_get_u32(tb[IFLA_LINK])); + real_dev = __dev_get_by_index(link_net, nla_get_u32(tb[IFLA_LINK])); if (!real_dev) { NL_SET_ERR_MSG_MOD(extack, "link does not exist"); return -ENODEV; diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c index 2f29b1386b1c..a74962d5f066 100644 --- a/drivers/net/geneve.c +++ b/drivers/net/geneve.c @@ -1614,10 +1614,13 @@ static void geneve_link_config(struct net_device *dev, geneve_change_mtu(dev, ldev_mtu - info->options_len); } -static int geneve_newlink(struct net *net, struct net_device *dev, - struct nlattr *tb[], struct nlattr *data[], - struct netlink_ext_ack *extack) +static int geneve_newlink(struct rtnl_newlink_params *params) { + struct net_device *dev = params->dev; + struct nlattr **tb = params->tb; + struct nlattr **data = params->data; + struct netlink_ext_ack *extack = params->extack; + struct net *link_net = rtnl_newlink_link_net(params); struct geneve_config cfg = { .df = GENEVE_DF_UNSET, .use_udp6_rx_checksums = false, @@ -1631,7 +1634,7 @@ static int geneve_newlink(struct net *net, struct net_device *dev, if (err) return err; - err = geneve_configure(net, dev, extack, &cfg); + err = geneve_configure(link_net, dev, extack, &cfg); if (err) return err; diff --git a/drivers/net/gtp.c 
b/drivers/net/gtp.c index 89a996ad8cd0..3eb1bc3ac124 100644 --- a/drivers/net/gtp.c +++ b/drivers/net/gtp.c @@ -1460,10 +1460,11 @@ static int gtp_create_sockets(struct gtp_dev *gtp, const struct nlattr *nla, #define GTP_TH_MAXLEN (sizeof(struct udphdr) + sizeof(struct gtp0_header)) #define GTP_IPV6_MAXLEN (sizeof(struct ipv6hdr) + GTP_TH_MAXLEN) -static int gtp_newlink(struct net *src_net, struct net_device *dev, - struct nlattr *tb[], struct nlattr *data[], - struct netlink_ext_ack *extack) +static int gtp_newlink(struct rtnl_newlink_params *params) { + struct net_device *dev = params->dev; + struct nlattr **data = params->data; + struct net *link_net = rtnl_newlink_link_net(params); unsigned int role = GTP_ROLE_GGSN; struct gtp_dev *gtp; struct gtp_net *gn; @@ -1494,7 +1495,7 @@ static int gtp_newlink(struct net *src_net, struct net_device *dev, gtp->restart_count = nla_get_u8_default(data[IFLA_GTP_RESTART_COUNT], 0); - gtp->net = src_net; + gtp->net = link_net; err = gtp_hashtable_new(gtp, hashsize); if (err < 0) diff --git a/drivers/net/ipvlan/ipvlan.h b/drivers/net/ipvlan/ipvlan.h index 025e0c19ec25..beff25a1d6f0 100644 --- a/drivers/net/ipvlan/ipvlan.h +++ b/drivers/net/ipvlan/ipvlan.h @@ -166,9 +166,7 @@ struct ipvl_addr *ipvlan_addr_lookup(struct ipvl_port *port, void *lyr3h, void *ipvlan_get_L3_hdr(struct ipvl_port *port, struct sk_buff *skb, int *type); void ipvlan_count_rx(const struct ipvl_dev *ipvlan, unsigned int len, bool success, bool mcast); -int ipvlan_link_new(struct net *src_net, struct net_device *dev, - struct nlattr *tb[], struct nlattr *data[], - struct netlink_ext_ack *extack); +int ipvlan_link_new(struct rtnl_newlink_params *params); void ipvlan_link_delete(struct net_device *dev, struct list_head *head); void ipvlan_link_setup(struct net_device *dev); int ipvlan_link_register(struct rtnl_link_ops *ops); diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c index ee2c3cf4df36..53860e9d08b1 100644 --- 
a/drivers/net/ipvlan/ipvlan_main.c +++ b/drivers/net/ipvlan/ipvlan_main.c @@ -532,10 +532,13 @@ static int ipvlan_nl_fillinfo(struct sk_buff *skb, return ret; } -int ipvlan_link_new(struct net *src_net, struct net_device *dev, - struct nlattr *tb[], struct nlattr *data[], - struct netlink_ext_ack *extack) +int ipvlan_link_new(struct rtnl_newlink_params *params) { + struct net_device *dev = params->dev; + struct nlattr **tb = params->tb; + struct nlattr **data = params->data; + struct netlink_ext_ack *extack = params->extack; + struct net *link_net = rtnl_newlink_link_net(params); struct ipvl_dev *ipvlan = netdev_priv(dev); struct ipvl_port *port; struct net_device *phy_dev; @@ -545,7 +548,7 @@ int ipvlan_link_new(struct net *src_net, struct net_device *dev, if (!tb[IFLA_LINK]) return -EINVAL; - phy_dev = __dev_get_by_index(src_net, nla_get_u32(tb[IFLA_LINK])); + phy_dev = __dev_get_by_index(link_net, nla_get_u32(tb[IFLA_LINK])); if (!phy_dev) return -ENODEV; diff --git a/drivers/net/ipvlan/ipvtap.c b/drivers/net/ipvlan/ipvtap.c index 1afc4c47be73..69e7456a48ca 100644 --- a/drivers/net/ipvlan/ipvtap.c +++ b/drivers/net/ipvlan/ipvtap.c @@ -73,10 +73,9 @@ static void ipvtap_update_features(struct tap_dev *tap, netdev_update_features(vlan->dev); } -static int ipvtap_newlink(struct net *src_net, struct net_device *dev, - struct nlattr *tb[], struct nlattr *data[], - struct netlink_ext_ack *extack) +static int ipvtap_newlink(struct rtnl_newlink_params *params) { + struct net_device *dev = params->dev; struct ipvtap_dev *vlantap = netdev_priv(dev); int err; @@ -97,7 +96,7 @@ static int ipvtap_newlink(struct net *src_net, struct net_device *dev, /* Don't put anything that may fail after macvlan_common_newlink * because we can't undo what it does. 
*/ - err = ipvlan_link_new(src_net, dev, tb, data, extack); + err = ipvlan_link_new(params); if (err) { netdev_rx_handler_unregister(dev); return err; diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c index 1bc1e5993f56..e8b147fe4fce 100644 --- a/drivers/net/macsec.c +++ b/drivers/net/macsec.c @@ -4141,10 +4141,13 @@ static int macsec_add_dev(struct net_device *dev, sci_t sci, u8 icv_len) static struct lock_class_key macsec_netdev_addr_lock_key; -static int macsec_newlink(struct net *net, struct net_device *dev, - struct nlattr *tb[], struct nlattr *data[], - struct netlink_ext_ack *extack) +static int macsec_newlink(struct rtnl_newlink_params *params) { + struct net_device *dev = params->dev; + struct nlattr **tb = params->tb; + struct nlattr **data = params->data; + struct netlink_ext_ack *extack = params->extack; + struct net *link_net = rtnl_newlink_link_net(params); struct macsec_dev *macsec = macsec_priv(dev); rx_handler_func_t *rx_handler; u8 icv_len = MACSEC_DEFAULT_ICV_LEN; @@ -4154,7 +4157,7 @@ static int macsec_newlink(struct net *net, struct net_device *dev, if (!tb[IFLA_LINK]) return -EINVAL; - real_dev = __dev_get_by_index(net, nla_get_u32(tb[IFLA_LINK])); + real_dev = __dev_get_by_index(link_net, nla_get_u32(tb[IFLA_LINK])); if (!real_dev) return -ENODEV; if (real_dev->type != ARPHRD_ETHER) diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c index fed4fe2a4748..7050a061b2b9 100644 --- a/drivers/net/macvlan.c +++ b/drivers/net/macvlan.c @@ -1565,11 +1565,11 @@ int macvlan_common_newlink(struct net *src_net, struct net_device *dev, } EXPORT_SYMBOL_GPL(macvlan_common_newlink); -static int macvlan_newlink(struct net *src_net, struct net_device *dev, - struct nlattr *tb[], struct nlattr *data[], - struct netlink_ext_ack *extack) +static int macvlan_newlink(struct rtnl_newlink_params *params) { - return macvlan_common_newlink(src_net, dev, tb, data, extack); + return macvlan_common_newlink(rtnl_newlink_link_net(params), + params->dev, 
params->tb, params->data, + params->extack); } void macvlan_dellink(struct net_device *dev, struct list_head *head) diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c index 29a5929d48e5..213a16719c5a 100644 --- a/drivers/net/macvtap.c +++ b/drivers/net/macvtap.c @@ -77,10 +77,9 @@ static void macvtap_update_features(struct tap_dev *tap, netdev_update_features(vlan->dev); } -static int macvtap_newlink(struct net *src_net, struct net_device *dev, - struct nlattr *tb[], struct nlattr *data[], - struct netlink_ext_ack *extack) +static int macvtap_newlink(struct rtnl_newlink_params *params) { + struct net_device *dev = params->dev; struct macvtap_dev *vlantap = netdev_priv(dev); int err; @@ -105,7 +104,8 @@ static int macvtap_newlink(struct net *src_net, struct net_device *dev, /* Don't put anything that may fail after macvlan_common_newlink * because we can't undo what it does. */ - err = macvlan_common_newlink(src_net, dev, tb, data, extack); + err = macvlan_common_newlink(rtnl_newlink_link_net(params), dev, + params->tb, params->data, params->extack); if (err) { netdev_rx_handler_unregister(dev); return err; diff --git a/drivers/net/netkit.c b/drivers/net/netkit.c index bb07725d1c72..a50cdbe97588 100644 --- a/drivers/net/netkit.c +++ b/drivers/net/netkit.c @@ -327,10 +327,13 @@ static int netkit_validate(struct nlattr *tb[], struct nlattr *data[], static struct rtnl_link_ops netkit_link_ops; -static int netkit_new_link(struct net *src_net, struct net_device *dev, - struct nlattr *tb[], struct nlattr *data[], - struct netlink_ext_ack *extack) +static int netkit_new_link(struct rtnl_newlink_params *params) { + struct net_device *dev = params->dev; + struct nlattr **tb = params->tb; + struct nlattr **data = params->data; + struct netlink_ext_ack *extack = params->extack; + struct net *link_net = rtnl_newlink_link_net(params); struct nlattr *peer_tb[IFLA_MAX + 1], **tbp = tb, *attr; enum netkit_action policy_prim = NETKIT_PASS; enum netkit_action policy_peer = 
NETKIT_PASS; @@ -385,7 +388,7 @@ static int netkit_new_link(struct net *src_net, struct net_device *dev, (tb[IFLA_ADDRESS] || tbp[IFLA_ADDRESS])) return -EOPNOTSUPP; - net = rtnl_link_get_net(src_net, tbp); + net = rtnl_link_get_net(link_net, tbp); peer = rtnl_create_link(net, ifname, ifname_assign_type, &netkit_link_ops, tbp, extack); if (IS_ERR(peer)) { diff --git a/drivers/net/pfcp.c b/drivers/net/pfcp.c index 69434fd13f96..8576d5117233 100644 --- a/drivers/net/pfcp.c +++ b/drivers/net/pfcp.c @@ -184,15 +184,15 @@ static int pfcp_add_sock(struct pfcp_dev *pfcp) return PTR_ERR_OR_ZERO(pfcp->sock); } -static int pfcp_newlink(struct net *net, struct net_device *dev, - struct nlattr *tb[], struct nlattr *data[], - struct netlink_ext_ack *extack) +static int pfcp_newlink(struct rtnl_newlink_params *params) { + struct net_device *dev = params->dev; + struct net *link_net = rtnl_newlink_link_net(params); struct pfcp_dev *pfcp = netdev_priv(dev); struct pfcp_net *pn; int err; - pfcp->net = net; + pfcp->net = link_net; err = pfcp_add_sock(pfcp); if (err) { diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c index 4583e15ad03a..a0ace8aa5b5d 100644 --- a/drivers/net/ppp/ppp_generic.c +++ b/drivers/net/ppp/ppp_generic.c @@ -1303,10 +1303,12 @@ static int ppp_nl_validate(struct nlattr *tb[], struct nlattr *data[], return 0; } -static int ppp_nl_newlink(struct net *src_net, struct net_device *dev, - struct nlattr *tb[], struct nlattr *data[], - struct netlink_ext_ack *extack) +static int ppp_nl_newlink(struct rtnl_newlink_params *params) { + struct net_device *dev = params->dev; + struct nlattr **tb = params->tb; + struct nlattr **data = params->data; + struct net *link_net = rtnl_newlink_link_net(params); struct ppp_config conf = { .unit = -1, .ifname_is_set = true, @@ -1343,7 +1345,7 @@ static int ppp_nl_newlink(struct net *src_net, struct net_device *dev, if (!tb[IFLA_IFNAME] || !nla_len(tb[IFLA_IFNAME]) || !*(char *)nla_data(tb[IFLA_IFNAME])) 
conf.ifname_is_set = false; - err = ppp_dev_configure(src_net, dev, &conf); + err = ppp_dev_configure(link_net, dev, &conf); out_unlock: mutex_unlock(&ppp_mutex); diff --git a/drivers/net/team/team_core.c b/drivers/net/team/team_core.c index a1b27b69f010..c9ee70030517 100644 --- a/drivers/net/team/team_core.c +++ b/drivers/net/team/team_core.c @@ -2206,10 +2206,11 @@ static void team_setup(struct net_device *dev) dev->features |= NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_STAG_TX; } -static int team_newlink(struct net *src_net, struct net_device *dev, - struct nlattr *tb[], struct nlattr *data[], - struct netlink_ext_ack *extack) +static int team_newlink(struct rtnl_newlink_params *params) { + struct net_device *dev = params->dev; + struct nlattr **tb = params->tb; + if (tb[IFLA_ADDRESS] == NULL) eth_hw_addr_random(dev); diff --git a/drivers/net/veth.c b/drivers/net/veth.c index 0d6d0d749d44..90f8a773a256 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -1765,10 +1765,13 @@ static int veth_init_queues(struct net_device *dev, struct nlattr *tb[]) return 0; } -static int veth_newlink(struct net *src_net, struct net_device *dev, - struct nlattr *tb[], struct nlattr *data[], - struct netlink_ext_ack *extack) +static int veth_newlink(struct rtnl_newlink_params *params) { + struct net_device *dev = params->dev; + struct nlattr **tb = params->tb; + struct nlattr **data = params->data; + struct netlink_ext_ack *extack = params->extack; + struct net *link_net = rtnl_newlink_link_net(params); int err; struct net_device *peer; struct veth_priv *priv; @@ -1800,7 +1803,7 @@ static int veth_newlink(struct net *src_net, struct net_device *dev, name_assign_type = NET_NAME_ENUM; } - net = rtnl_link_get_net(src_net, tbp); + net = rtnl_link_get_net(link_net, tbp); peer = rtnl_create_link(net, ifname, name_assign_type, &veth_link_ops, tbp, extack); if (IS_ERR(peer)) { diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c index 67d25f4f94ef..f45bc224fd15 100644 --- 
a/drivers/net/vrf.c +++ b/drivers/net/vrf.c @@ -1698,10 +1698,11 @@ static void vrf_dellink(struct net_device *dev, struct list_head *head) unregister_netdevice_queue(dev, head); } -static int vrf_newlink(struct net *src_net, struct net_device *dev, - struct nlattr *tb[], struct nlattr *data[], - struct netlink_ext_ack *extack) +static int vrf_newlink(struct rtnl_newlink_params *params) { + struct net_device *dev = params->dev; + struct nlattr **data = params->data; + struct netlink_ext_ack *extack = params->extack; struct net_vrf *vrf = netdev_priv(dev); struct netns_vrf *nn_vrf; bool *add_fib_rules; diff --git a/drivers/net/vxlan/vxlan_core.c b/drivers/net/vxlan/vxlan_core.c index 9ea63059d52d..ba16ff9b1553 100644 --- a/drivers/net/vxlan/vxlan_core.c +++ b/drivers/net/vxlan/vxlan_core.c @@ -4351,10 +4351,13 @@ static int vxlan_nl2conf(struct nlattr *tb[], struct nlattr *data[], return 0; } -static int vxlan_newlink(struct net *src_net, struct net_device *dev, - struct nlattr *tb[], struct nlattr *data[], - struct netlink_ext_ack *extack) +static int vxlan_newlink(struct rtnl_newlink_params *params) { + struct net_device *dev = params->dev; + struct nlattr **tb = params->tb; + struct nlattr **data = params->data; + struct netlink_ext_ack *extack = params->extack; + struct net *link_net = rtnl_newlink_link_net(params); struct vxlan_config conf; int err; @@ -4362,7 +4365,7 @@ static int vxlan_newlink(struct net *src_net, struct net_device *dev, if (err) return err; - return __vxlan_dev_create(src_net, dev, &conf, extack); + return __vxlan_dev_create(link_net, dev, &conf, extack); } static int vxlan_changelink(struct net_device *dev, struct nlattr *tb[], diff --git a/drivers/net/wireguard/device.c b/drivers/net/wireguard/device.c index 45e9b908dbfb..3e22f715422b 100644 --- a/drivers/net/wireguard/device.c +++ b/drivers/net/wireguard/device.c @@ -306,14 +306,14 @@ static void wg_setup(struct net_device *dev) wg->dev = dev; } -static int wg_newlink(struct net *src_net, 
struct net_device *dev, - struct nlattr *tb[], struct nlattr *data[], - struct netlink_ext_ack *extack) +static int wg_newlink(struct rtnl_newlink_params *params) { + struct net_device *dev = params->dev; + struct net *link_net = rtnl_newlink_link_net(params); struct wg_device *wg = netdev_priv(dev); int ret = -ENOMEM; - rcu_assign_pointer(wg->creating_net, src_net); + rcu_assign_pointer(wg->creating_net, link_net); init_rwsem(&wg->static_identity.lock); mutex_init(&wg->socket_update_lock); mutex_init(&wg->device_update_lock); diff --git a/drivers/net/wireless/virtual/virt_wifi.c b/drivers/net/wireless/virtual/virt_wifi.c index 4ee374080466..107dc503b4f2 100644 --- a/drivers/net/wireless/virtual/virt_wifi.c +++ b/drivers/net/wireless/virtual/virt_wifi.c @@ -519,10 +519,12 @@ static rx_handler_result_t virt_wifi_rx_handler(struct sk_buff **pskb) } /* Called with rtnl lock held. */ -static int virt_wifi_newlink(struct net *src_net, struct net_device *dev, - struct nlattr *tb[], struct nlattr *data[], - struct netlink_ext_ack *extack) +static int virt_wifi_newlink(struct rtnl_newlink_params *params) { + struct net_device *dev = params->dev; + struct nlattr **tb = params->tb; + struct netlink_ext_ack *extack = params->extack; + struct net *link_net = rtnl_newlink_link_net(params); struct virt_wifi_netdev_priv *priv = netdev_priv(dev); int err; @@ -532,7 +534,7 @@ static int virt_wifi_newlink(struct net *src_net, struct net_device *dev, netif_carrier_off(dev); priv->upperdev = dev; - priv->lowerdev = __dev_get_by_index(src_net, + priv->lowerdev = __dev_get_by_index(link_net, nla_get_u32(tb[IFLA_LINK])); if (!priv->lowerdev) diff --git a/drivers/net/wwan/wwan_core.c b/drivers/net/wwan/wwan_core.c index a51e2755991a..450cf2e253e4 100644 --- a/drivers/net/wwan/wwan_core.c +++ b/drivers/net/wwan/wwan_core.c @@ -967,10 +967,11 @@ static struct net_device *wwan_rtnl_alloc(struct nlattr *tb[], return dev; } -static int wwan_rtnl_newlink(struct net *src_net, struct net_device 
*dev, - struct nlattr *tb[], struct nlattr *data[], - struct netlink_ext_ack *extack) +static int wwan_rtnl_newlink(struct rtnl_newlink_params *params) { + struct net_device *dev = params->dev; + struct nlattr **data = params->data; + struct netlink_ext_ack *extack = params->extack; struct wwan_device *wwandev = wwan_dev_get_by_parent(dev->dev.parent); u32 link_id = nla_get_u32(data[IFLA_WWAN_LINK_ID]); struct wwan_netdev_priv *priv = netdev_priv(dev); @@ -1064,6 +1065,11 @@ static void wwan_create_default_link(struct wwan_device *wwandev, struct net_device *dev; struct nlmsghdr *nlh; struct sk_buff *msg; + struct rtnl_newlink_params params = { + .src_net = &init_net, + .tb = tb, + .data = data, + }; /* Forge attributes required to create a WWAN netdev. We first * build a netlink message and then parse it. This looks @@ -1105,7 +1111,8 @@ static void wwan_create_default_link(struct wwan_device *wwandev, if (WARN_ON(IS_ERR(dev))) goto unlock; - if (WARN_ON(wwan_rtnl_newlink(&init_net, dev, tb, data, NULL))) { + params.dev = dev; + if (WARN_ON(wwan_rtnl_newlink(&params))) { free_netdev(dev); goto unlock; } diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h index 1aa31bdb2b31..ae1f2dda4533 100644 --- a/include/net/ip_tunnels.h +++ b/include/net/ip_tunnels.h @@ -406,8 +406,9 @@ int ip_tunnel_rcv(struct ip_tunnel *tunnel, struct sk_buff *skb, bool log_ecn_error); int ip_tunnel_changelink(struct net_device *dev, struct nlattr *tb[], struct ip_tunnel_parm_kern *p, __u32 fwmark); -int ip_tunnel_newlink(struct net_device *dev, struct nlattr *tb[], - struct ip_tunnel_parm_kern *p, __u32 fwmark); +int ip_tunnel_newlink(struct net *net, struct net_device *dev, + struct nlattr *tb[], struct ip_tunnel_parm_kern *p, + __u32 fwmark); void ip_tunnel_setup(struct net_device *dev, unsigned int net_id); bool ip_tunnel_netlink_encap_parms(struct nlattr *data[], diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h index bc0069a8b6ea..fb28e538dd2d 100644 ---
a/include/net/rtnetlink.h +++ b/include/net/rtnetlink.h @@ -69,6 +69,34 @@ static inline int rtnl_msg_family(const struct nlmsghdr *nlh) return AF_UNSPEC; } +/** + * struct rtnl_newlink_params - parameters of rtnl_link_ops::newlink() + * + * @src_net: Source netns of rtnetlink socket + * @link_net: Link netns by IFLA_LINK_NETNSID, NULL if not specified + * @dev: The net_device being created + * @tb: IFLA_* attributes + * @data: IFLA_INFO_DATA attributes + * @extack: Netlink extended ACK + */ +struct rtnl_newlink_params { + struct net *src_net; + struct net *link_net; + struct net_device *dev; + struct nlattr **tb; + struct nlattr **data; + struct netlink_ext_ack *extack; +}; + +/* Get effective link netns from newlink params. Generally, this is link_net + * and falls back to src_net. But for compatibility, a driver may choose to + * use dev_net(dev) instead. + */ +static inline struct net *rtnl_newlink_link_net(struct rtnl_newlink_params *p) +{ + return p->link_net ? : p->src_net; +} + /** * struct rtnl_link_ops - rtnetlink link operations * @@ -125,11 +153,7 @@ struct rtnl_link_ops { struct nlattr *data[], struct netlink_ext_ack *extack); - int (*newlink)(struct net *src_net, - struct net_device *dev, - struct nlattr *tb[], - struct nlattr *data[], - struct netlink_ext_ack *extack); + int (*newlink)(struct rtnl_newlink_params *params); int (*changelink)(struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], diff --git a/net/8021q/vlan_netlink.c b/net/8021q/vlan_netlink.c index 134419667d59..603b2ea1e2b4 100644 --- a/net/8021q/vlan_netlink.c +++ b/net/8021q/vlan_netlink.c @@ -135,10 +135,13 @@ static int vlan_changelink(struct net_device *dev, struct nlattr *tb[], return 0; } -static int vlan_newlink(struct net *src_net, struct net_device *dev, - struct nlattr *tb[], struct nlattr *data[], - struct netlink_ext_ack *extack) +static int vlan_newlink(struct rtnl_newlink_params *params) { + struct net_device *dev = params->dev; + struct nlattr **tb =
params->tb; + struct nlattr **data = params->data; + struct netlink_ext_ack *extack = params->extack; + struct net *link_net = rtnl_newlink_link_net(params); struct vlan_dev_priv *vlan = vlan_dev_priv(dev); struct net_device *real_dev; unsigned int max_mtu; @@ -155,7 +158,7 @@ static int vlan_newlink(struct net *src_net, struct net_device *dev, return -EINVAL; } - real_dev = __dev_get_by_index(src_net, nla_get_u32(tb[IFLA_LINK])); + real_dev = __dev_get_by_index(link_net, nla_get_u32(tb[IFLA_LINK])); if (!real_dev) { NL_SET_ERR_MSG_MOD(extack, "link does not exist"); return -ENODEV; diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c index 2758aba47a2f..12eb939cf0e4 100644 --- a/net/batman-adv/soft-interface.c +++ b/net/batman-adv/soft-interface.c @@ -1063,7 +1063,7 @@ static int batadv_softif_validate(struct nlattr *tb[], struct nlattr *data[], /** * batadv_softif_newlink() - pre-initialize and register new batadv link - * @src_net: the applicable net namespace + * @nets: the applicable net namespaces * @dev: network device to register * @tb: IFLA_INFO_DATA netlink attributes * @data: enum batadv_ifla_attrs attributes @@ -1071,10 +1071,10 @@ static int batadv_softif_validate(struct nlattr *tb[], struct nlattr *data[], * * Return: 0 if successful or error otherwise. 
*/ -static int batadv_softif_newlink(struct net *src_net, struct net_device *dev, - struct nlattr *tb[], struct nlattr *data[], - struct netlink_ext_ack *extack) +static int batadv_softif_newlink(struct rtnl_newlink_params *params) { + struct net_device *dev = params->dev; + struct nlattr **data = params->data; struct batadv_priv *bat_priv = netdev_priv(dev); const char *algo_name; int err; diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c index 3e0f47203f2a..ccce5119b28d 100644 --- a/net/bridge/br_netlink.c +++ b/net/bridge/br_netlink.c @@ -1553,10 +1553,12 @@ static int br_changelink(struct net_device *brdev, struct nlattr *tb[], return 0; } -static int br_dev_newlink(struct net *src_net, struct net_device *dev, - struct nlattr *tb[], struct nlattr *data[], - struct netlink_ext_ack *extack) +static int br_dev_newlink(struct rtnl_newlink_params *params) { + struct net_device *dev = params->dev; + struct nlattr **tb = params->tb; + struct nlattr **data = params->data; + struct netlink_ext_ack *extack = params->extack; struct net_bridge *br = netdev_priv(dev); int err; diff --git a/net/caif/chnl_net.c b/net/caif/chnl_net.c index 94ad09e36df2..748e38908709 100644 --- a/net/caif/chnl_net.c +++ b/net/caif/chnl_net.c @@ -438,10 +438,10 @@ static void caif_netlink_parms(struct nlattr *data[], } } -static int ipcaif_newlink(struct net *src_net, struct net_device *dev, - struct nlattr *tb[], struct nlattr *data[], - struct netlink_ext_ack *extack) +static int ipcaif_newlink(struct rtnl_newlink_params *params) { + struct net_device *dev = params->dev; + struct nlattr **data = params->data; int ret; struct chnl_net *caifdev; ASSERT_RTNL(); diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index bc9d0ecd3a1e..232ddce447b7 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -3750,6 +3750,13 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm, struct net_device *dev; char ifname[IFNAMSIZ]; int err; + struct 
rtnl_newlink_params params = { + .src_net = net, + .link_net = link_net, + .tb = tb, + .data = data, + .extack = extack, + }; if (!ops->alloc && !ops->setup) return -EOPNOTSUPP; @@ -3761,17 +3768,18 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm, name_assign_type = NET_NAME_ENUM; } - dev = rtnl_create_link(link_net ? : tgt_net, ifname, - name_assign_type, ops, tb, extack); + dev = rtnl_create_link(tgt_net, ifname, name_assign_type, ops, tb, + extack); if (IS_ERR(dev)) { err = PTR_ERR(dev); goto out; } dev->ifindex = ifm->ifi_index; + params.dev = dev; if (ops->newlink) - err = ops->newlink(link_net ? : net, dev, tb, data, extack); + err = ops->newlink(&params); else err = register_netdevice(dev); if (err < 0) { @@ -3782,11 +3790,6 @@ static int rtnl_newlink_create(struct sk_buff *skb, struct ifinfomsg *ifm, err = rtnl_configure_link(dev, ifm, portid, nlh); if (err < 0) goto out_unregister; - if (link_net) { - err = dev_change_net_namespace(dev, tgt_net, ifname); - if (err < 0) - goto out_unregister; - } if (tb[IFLA_MASTER]) { err = do_set_master(dev, nla_get_u32(tb[IFLA_MASTER]), extack); if (err) diff --git a/net/hsr/hsr_netlink.c b/net/hsr/hsr_netlink.c index b68f2f71d0e1..694392222637 100644 --- a/net/hsr/hsr_netlink.c +++ b/net/hsr/hsr_netlink.c @@ -29,10 +29,12 @@ static const struct nla_policy hsr_policy[IFLA_HSR_MAX + 1] = { /* Here, it seems a netdevice has already been allocated for us, and the * hsr_dev_setup routine has been executed. Nice!
*/ -static int hsr_newlink(struct net *src_net, struct net_device *dev, - struct nlattr *tb[], struct nlattr *data[], - struct netlink_ext_ack *extack) +static int hsr_newlink(struct rtnl_newlink_params *params) { + struct net_device *dev = params->dev; + struct nlattr **data = params->data; + struct netlink_ext_ack *extack = params->extack; + struct net *link_net = rtnl_newlink_link_net(params); enum hsr_version proto_version; unsigned char multicast_spec; u8 proto = HSR_PROTOCOL_HSR; @@ -46,7 +48,7 @@ static int hsr_newlink(struct net *src_net, struct net_device *dev, NL_SET_ERR_MSG_MOD(extack, "Slave1 device not specified"); return -EINVAL; } - link[0] = __dev_get_by_index(src_net, + link[0] = __dev_get_by_index(link_net, nla_get_u32(data[IFLA_HSR_SLAVE1])); if (!link[0]) { NL_SET_ERR_MSG_MOD(extack, "Slave1 does not exist"); @@ -56,7 +58,7 @@ static int hsr_newlink(struct net *src_net, struct net_device *dev, NL_SET_ERR_MSG_MOD(extack, "Slave2 device not specified"); return -EINVAL; } - link[1] = __dev_get_by_index(src_net, + link[1] = __dev_get_by_index(link_net, nla_get_u32(data[IFLA_HSR_SLAVE2])); if (!link[1]) { NL_SET_ERR_MSG_MOD(extack, "Slave2 does not exist"); @@ -69,7 +71,7 @@ static int hsr_newlink(struct net *src_net, struct net_device *dev, } if (data[IFLA_HSR_INTERLINK]) - interlink = __dev_get_by_index(src_net, + interlink = __dev_get_by_index(link_net, nla_get_u32(data[IFLA_HSR_INTERLINK])); if (interlink && interlink == link[0]) { diff --git a/net/ieee802154/6lowpan/core.c b/net/ieee802154/6lowpan/core.c index 175efd860f7b..65a5c61cf38c 100644 --- a/net/ieee802154/6lowpan/core.c +++ b/net/ieee802154/6lowpan/core.c @@ -129,10 +129,10 @@ static int lowpan_validate(struct nlattr *tb[], struct nlattr *data[], return 0; } -static int lowpan_newlink(struct net *src_net, struct net_device *ldev, - struct nlattr *tb[], struct nlattr *data[], - struct netlink_ext_ack *extack) +static int lowpan_newlink(struct rtnl_newlink_params *params) { + struct 
net_device *ldev = params->dev; + struct nlattr **tb = params->tb; struct net_device *wdev; int ret; @@ -143,7 +143,8 @@ static int lowpan_newlink(struct net *src_net, struct net_device *ldev, if (!tb[IFLA_LINK]) return -EINVAL; /* find and hold wpan device */ - wdev = dev_get_by_index(dev_net(ldev), nla_get_u32(tb[IFLA_LINK])); + wdev = dev_get_by_index(params->link_net ? : dev_net(ldev), + nla_get_u32(tb[IFLA_LINK])); if (!wdev) return -ENODEV; if (wdev->type != ARPHRD_IEEE802154) { diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c index f1f31ebfc793..4a3f8e450ef5 100644 --- a/net/ipv4/ip_gre.c +++ b/net/ipv4/ip_gre.c @@ -1389,10 +1389,12 @@ ipgre_newlink_encap_setup(struct net_device *dev, struct nlattr *data[]) return 0; } -static int ipgre_newlink(struct net *src_net, struct net_device *dev, - struct nlattr *tb[], struct nlattr *data[], - struct netlink_ext_ack *extack) +static int ipgre_newlink(struct rtnl_newlink_params *params) { + struct net_device *dev = params->dev; + struct nlattr **tb = params->tb; + struct nlattr **data = params->data; + struct net *net = params->link_net ? : dev_net(dev); struct ip_tunnel_parm_kern p; __u32 fwmark = 0; int err; @@ -1404,13 +1406,15 @@ static int ipgre_newlink(struct net *src_net, struct net_device *dev, err = ipgre_netlink_parms(dev, data, tb, &p, &fwmark); if (err < 0) return err; - return ip_tunnel_newlink(dev, tb, &p, fwmark); + return ip_tunnel_newlink(net, dev, tb, &p, fwmark); } -static int erspan_newlink(struct net *src_net, struct net_device *dev, - struct nlattr *tb[], struct nlattr *data[], - struct netlink_ext_ack *extack) +static int erspan_newlink(struct rtnl_newlink_params *params) { + struct net_device *dev = params->dev; + struct nlattr **tb = params->tb; + struct nlattr **data = params->data; + struct net *net = params->link_net ? 
: dev_net(dev); struct ip_tunnel_parm_kern p; __u32 fwmark = 0; int err; @@ -1422,7 +1426,7 @@ static int erspan_newlink(struct net *src_net, struct net_device *dev, err = erspan_netlink_parms(dev, data, tb, &p, &fwmark); if (err) return err; - return ip_tunnel_newlink(dev, tb, &p, fwmark); + return ip_tunnel_newlink(net, dev, tb, &p, fwmark); } static int ipgre_changelink(struct net_device *dev, struct nlattr *tb[], @@ -1695,6 +1699,10 @@ struct net_device *gretap_fb_dev_create(struct net *net, const char *name, LIST_HEAD(list_kill); struct ip_tunnel *t; int err; + struct rtnl_newlink_params params = { + .src_net = net, + .tb = tb, + }; memset(&tb, 0, sizeof(tb)); @@ -1707,7 +1715,8 @@ struct net_device *gretap_fb_dev_create(struct net *net, const char *name, t = netdev_priv(dev); t->collect_md = true; - err = ipgre_newlink(net, dev, tb, NULL, NULL); + params.dev = dev; + err = ipgre_newlink(&params); if (err < 0) { free_netdev(dev); return ERR_PTR(err); diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c index 09b73acf037a..618a50d5c0c2 100644 --- a/net/ipv4/ip_tunnel.c +++ b/net/ipv4/ip_tunnel.c @@ -1213,11 +1213,11 @@ void ip_tunnel_delete_nets(struct list_head *net_list, unsigned int id, } EXPORT_SYMBOL_GPL(ip_tunnel_delete_nets); -int ip_tunnel_newlink(struct net_device *dev, struct nlattr *tb[], - struct ip_tunnel_parm_kern *p, __u32 fwmark) +int ip_tunnel_newlink(struct net *net, struct net_device *dev, + struct nlattr *tb[], struct ip_tunnel_parm_kern *p, + __u32 fwmark) { struct ip_tunnel *nt; - struct net *net = dev_net(dev); struct ip_tunnel_net *itn; int mtu; int err; @@ -1326,7 +1326,9 @@ int ip_tunnel_init(struct net_device *dev) } tunnel->dev = dev; - tunnel->net = dev_net(dev); + if (!tunnel->net) + tunnel->net = dev_net(dev); + strscpy(tunnel->parms.name, dev->name); iph->version = 4; iph->ihl = 5; diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c index f0b4419cef34..b567e2375302 100644 --- a/net/ipv4/ip_vti.c +++ b/net/ipv4/ip_vti.c @@ -575,15
+575,17 @@ static void vti_netlink_parms(struct nlattr *data[], *fwmark = nla_get_u32(data[IFLA_VTI_FWMARK]); } -static int vti_newlink(struct net *src_net, struct net_device *dev, - struct nlattr *tb[], struct nlattr *data[], - struct netlink_ext_ack *extack) +static int vti_newlink(struct rtnl_newlink_params *params) { + struct net_device *dev = params->dev; + struct nlattr **tb = params->tb; + struct nlattr **data = params->data; struct ip_tunnel_parm_kern parms; __u32 fwmark = 0; vti_netlink_parms(data, &parms, &fwmark); - return ip_tunnel_newlink(dev, tb, &parms, fwmark); + return ip_tunnel_newlink(params->link_net ? : dev_net(dev), dev, tb, + &parms, fwmark); } static int vti_changelink(struct net_device *dev, struct nlattr *tb[], diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c index dc0db5895e0e..9dccaa0d6ba7 100644 --- a/net/ipv4/ipip.c +++ b/net/ipv4/ipip.c @@ -436,10 +436,11 @@ static void ipip_netlink_parms(struct nlattr *data[], *fwmark = nla_get_u32(data[IFLA_IPTUN_FWMARK]); } -static int ipip_newlink(struct net *src_net, struct net_device *dev, - struct nlattr *tb[], struct nlattr *data[], - struct netlink_ext_ack *extack) +static int ipip_newlink(struct rtnl_newlink_params *params) { + struct net_device *dev = params->dev; + struct nlattr **tb = params->tb; + struct nlattr **data = params->data; struct ip_tunnel *t = netdev_priv(dev); struct ip_tunnel_encap ipencap; struct ip_tunnel_parm_kern p; @@ -453,7 +454,8 @@ static int ipip_newlink(struct net *src_net, struct net_device *dev, } ipip_netlink_parms(data, &p, &t->collect_md, &fwmark); - return ip_tunnel_newlink(dev, tb, &p, fwmark); + return ip_tunnel_newlink(params->link_net ? 
: dev_net(dev), dev, tb, &p, + fwmark); } static int ipip_changelink(struct net_device *dev, struct nlattr *tb[], diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c index 235808cfec70..7d6d3db200a1 100644 --- a/net/ipv6/ip6_gre.c +++ b/net/ipv6/ip6_gre.c @@ -1971,7 +1971,7 @@ static bool ip6gre_netlink_encap_parms(struct nlattr *data[], return ret; } -static int ip6gre_newlink_common(struct net *src_net, struct net_device *dev, +static int ip6gre_newlink_common(struct net *link_net, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { @@ -1992,7 +1992,7 @@ static int ip6gre_newlink_common(struct net *src_net, struct net_device *dev, eth_hw_addr_random(dev); nt->dev = dev; - nt->net = dev_net(dev); + nt->net = link_net; err = register_netdevice(dev); if (err) @@ -2005,12 +2005,14 @@ static int ip6gre_newlink_common(struct net *src_net, struct net_device *dev, return err; } -static int ip6gre_newlink(struct net *src_net, struct net_device *dev, - struct nlattr *tb[], struct nlattr *data[], - struct netlink_ext_ack *extack) +static int ip6gre_newlink(struct rtnl_newlink_params *params) { + struct net_device *dev = params->dev; + struct nlattr **tb = params->tb; + struct nlattr **data = params->data; + struct netlink_ext_ack *extack = params->extack; struct ip6_tnl *nt = netdev_priv(dev); - struct net *net = dev_net(dev); + struct net *net = params->link_net ? 
: dev_net(dev); struct ip6gre_net *ign; int err; @@ -2025,7 +2027,7 @@ static int ip6gre_newlink(struct net *src_net, struct net_device *dev, return -EEXIST; } - err = ip6gre_newlink_common(src_net, dev, tb, data, extack); + err = ip6gre_newlink_common(net, dev, tb, data, extack); if (!err) { ip6gre_tnl_link_config(nt, !tb[IFLA_MTU]); ip6gre_tunnel_link_md(ign, nt); @@ -2241,12 +2243,14 @@ static void ip6erspan_tap_setup(struct net_device *dev) netif_keep_dst(dev); } -static int ip6erspan_newlink(struct net *src_net, struct net_device *dev, - struct nlattr *tb[], struct nlattr *data[], - struct netlink_ext_ack *extack) +static int ip6erspan_newlink(struct rtnl_newlink_params *params) { + struct net_device *dev = params->dev; + struct nlattr **tb = params->tb; + struct nlattr **data = params->data; + struct netlink_ext_ack *extack = params->extack; struct ip6_tnl *nt = netdev_priv(dev); - struct net *net = dev_net(dev); + struct net *net = params->link_net ? : dev_net(dev); struct ip6gre_net *ign; int err; @@ -2262,7 +2266,7 @@ static int ip6erspan_newlink(struct net *src_net, struct net_device *dev, return -EEXIST; } - err = ip6gre_newlink_common(src_net, dev, tb, data, extack); + err = ip6gre_newlink_common(net, dev, tb, data, extack); if (!err) { ip6erspan_tnl_link_config(nt, !tb[IFLA_MTU]); ip6erspan_tunnel_link_md(ign, nt); diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c index 48fd53b98972..33a58c3c9ebe 100644 --- a/net/ipv6/ip6_tunnel.c +++ b/net/ipv6/ip6_tunnel.c @@ -250,10 +250,9 @@ static void ip6_dev_free(struct net_device *dev) dst_cache_destroy(&t->dst_cache); } -static int ip6_tnl_create2(struct net_device *dev) +static int ip6_tnl_create2(struct net *net, struct net_device *dev) { struct ip6_tnl *t = netdev_priv(dev); - struct net *net = dev_net(dev); struct ip6_tnl_net *ip6n = net_generic(net, ip6_tnl_net_id); int err; @@ -308,7 +307,7 @@ static struct ip6_tnl *ip6_tnl_create(struct net *net, struct __ip6_tnl_parm *p) t = netdev_priv(dev); 
t->parms = *p; t->net = dev_net(dev); - err = ip6_tnl_create2(dev); + err = ip6_tnl_create2(net, dev); if (err < 0) goto failed_free; @@ -2002,11 +2001,12 @@ static void ip6_tnl_netlink_parms(struct nlattr *data[], parms->fwmark = nla_get_u32(data[IFLA_IPTUN_FWMARK]); } -static int ip6_tnl_newlink(struct net *src_net, struct net_device *dev, - struct nlattr *tb[], struct nlattr *data[], - struct netlink_ext_ack *extack) +static int ip6_tnl_newlink(struct rtnl_newlink_params *params) { - struct net *net = dev_net(dev); + struct net_device *dev = params->dev; + struct nlattr **tb = params->tb; + struct nlattr **data = params->data; + struct net *net = params->link_net ? : dev_net(dev); struct ip6_tnl_net *ip6n = net_generic(net, ip6_tnl_net_id); struct ip_tunnel_encap ipencap; struct ip6_tnl *nt, *t; @@ -2031,7 +2031,7 @@ static int ip6_tnl_newlink(struct net *src_net, struct net_device *dev, return -EEXIST; } - err = ip6_tnl_create2(dev); + err = ip6_tnl_create2(net, dev); if (!err && tb[IFLA_MTU]) ip6_tnl_change_mtu(dev, nla_get_u32(tb[IFLA_MTU])); diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c index 590737c27537..ff9dc74819c5 100644 --- a/net/ipv6/ip6_vti.c +++ b/net/ipv6/ip6_vti.c @@ -174,10 +174,9 @@ vti6_tnl_unlink(struct vti6_net *ip6n, struct ip6_tnl *t) } } -static int vti6_tnl_create2(struct net_device *dev) +static int vti6_tnl_create2(struct net *net, struct net_device *dev) { struct ip6_tnl *t = netdev_priv(dev); - struct net *net = dev_net(dev); struct vti6_net *ip6n = net_generic(net, vti6_net_id); int err; @@ -221,7 +220,7 @@ static struct ip6_tnl *vti6_tnl_create(struct net *net, struct __ip6_tnl_parm *p t->parms = *p; t->net = dev_net(dev); - err = vti6_tnl_create2(dev); + err = vti6_tnl_create2(net, dev); if (err < 0) goto failed_free; @@ -997,11 +996,11 @@ static void vti6_netlink_parms(struct nlattr *data[], parms->fwmark = nla_get_u32(data[IFLA_VTI_FWMARK]); } -static int vti6_newlink(struct net *src_net, struct net_device *dev, - struct 
nlattr *tb[], struct nlattr *data[], - struct netlink_ext_ack *extack) +static int vti6_newlink(struct rtnl_newlink_params *params) { - struct net *net = dev_net(dev); + struct net_device *dev = params->dev; + struct nlattr **data = params->data; + struct net *net = params->link_net ? : dev_net(dev); struct ip6_tnl *nt; nt = netdev_priv(dev); @@ -1012,7 +1011,7 @@ static int vti6_newlink(struct net *src_net, struct net_device *dev, if (vti6_locate(net, &nt->parms, 0)) return -EEXIST; - return vti6_tnl_create2(dev); + return vti6_tnl_create2(net, dev); } static void vti6_dellink(struct net_device *dev, struct list_head *head) diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c index 39bd8951bfca..cbcaccbfc3c9 100644 --- a/net/ipv6/sit.c +++ b/net/ipv6/sit.c @@ -198,10 +198,9 @@ static void ipip6_tunnel_clone_6rd(struct net_device *dev, struct sit_net *sitn) #endif } -static int ipip6_tunnel_create(struct net_device *dev) +static int ipip6_tunnel_create(struct net *net, struct net_device *dev) { struct ip_tunnel *t = netdev_priv(dev); - struct net *net = dev_net(dev); struct sit_net *sitn = net_generic(net, sit_net_id); int err; @@ -270,7 +269,7 @@ static struct ip_tunnel *ipip6_tunnel_locate(struct net *net, nt = netdev_priv(dev); nt->parms = *parms; - if (ipip6_tunnel_create(dev) < 0) + if (ipip6_tunnel_create(net, dev) < 0) goto failed_free; if (!parms->name[0]) @@ -1550,11 +1549,12 @@ static bool ipip6_netlink_6rd_parms(struct nlattr *data[], } #endif -static int ipip6_newlink(struct net *src_net, struct net_device *dev, - struct nlattr *tb[], struct nlattr *data[], - struct netlink_ext_ack *extack) +static int ipip6_newlink(struct rtnl_newlink_params *params) { - struct net *net = dev_net(dev); + struct net_device *dev = params->dev; + struct nlattr **tb = params->tb; + struct nlattr **data = params->data; + struct net *net = params->link_net ? 
: dev_net(dev); struct ip_tunnel *nt; struct ip_tunnel_encap ipencap; #ifdef CONFIG_IPV6_SIT_6RD @@ -1575,7 +1575,7 @@ static int ipip6_newlink(struct net *src_net, struct net_device *dev, if (ipip6_tunnel_locate(net, &nt->parms, 0)) return -EEXIST; - err = ipip6_tunnel_create(dev); + err = ipip6_tunnel_create(net, dev); if (err < 0) return err; diff --git a/net/xfrm/xfrm_interface_core.c b/net/xfrm/xfrm_interface_core.c index 98f1e2b67c76..d1f2674a98c8 100644 --- a/net/xfrm/xfrm_interface_core.c +++ b/net/xfrm/xfrm_interface_core.c @@ -242,10 +242,9 @@ static void xfrmi_dev_free(struct net_device *dev) gro_cells_destroy(&xi->gro_cells); } -static int xfrmi_create(struct net_device *dev) +static int xfrmi_create(struct net *net, struct net_device *dev) { struct xfrm_if *xi = netdev_priv(dev); - struct net *net = dev_net(dev); struct xfrmi_net *xfrmn = net_generic(net, xfrmi_net_id); int err; @@ -814,11 +813,12 @@ static void xfrmi_netlink_parms(struct nlattr *data[], parms->collect_md = true; } -static int xfrmi_newlink(struct net *src_net, struct net_device *dev, - struct nlattr *tb[], struct nlattr *data[], - struct netlink_ext_ack *extack) +static int xfrmi_newlink(struct rtnl_newlink_params *params) { - struct net *net = dev_net(dev); + struct net_device *dev = params->dev; + struct nlattr **data = params->data; + struct netlink_ext_ack *extack = params->extack; + struct net *net = params->link_net ? 
: dev_net(dev); struct xfrm_if_parms p = {}; struct xfrm_if *xi; int err; @@ -851,7 +851,7 @@ static int xfrmi_newlink(struct net *src_net, struct net_device *dev, xi->net = net; xi->dev = dev; - err = xfrmi_create(dev); + err = xfrmi_create(net, dev); return err; } -- 2.47.0 From shaw.leon at gmail.com Mon Nov 18 14:32:43 2024 From: shaw.leon at gmail.com (Xiao Liang) Date: Mon, 18 Nov 2024 22:32:43 +0800 Subject: [PATCH net-next v4 4/5] selftests: net: Add python context manager for netns entering In-Reply-To: <20241118143244.1773-1-shaw.leon@gmail.com> References: <20241118143244.1773-1-shaw.leon@gmail.com> Message-ID: <20241118143244.1773-5-shaw.leon@gmail.com> Change netns of current thread and switch back on context exit. For example: with NetNSEnter("ns1"): ip("link add dummy0 type dummy") The command will be executed in netns "ns1". Signed-off-by: Xiao Liang --- tools/testing/selftests/net/lib/py/__init__.py | 2 +- tools/testing/selftests/net/lib/py/netns.py | 18 ++++++++++++++++++ 2 files changed, 19 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/net/lib/py/__init__.py b/tools/testing/selftests/net/lib/py/__init__.py index 54d8f5eba810..e2d6c7b63019 100644 --- a/tools/testing/selftests/net/lib/py/__init__.py +++ b/tools/testing/selftests/net/lib/py/__init__.py @@ -2,7 +2,7 @@ from .consts import KSRC from .ksft import * -from .netns import NetNS +from .netns import NetNS, NetNSEnter from .nsim import * from .utils import * from .ynl import NlError, YnlFamily, EthtoolFamily, NetdevFamily, RtnlFamily diff --git a/tools/testing/selftests/net/lib/py/netns.py b/tools/testing/selftests/net/lib/py/netns.py index ecff85f9074f..8e9317044eef 100644 --- a/tools/testing/selftests/net/lib/py/netns.py +++ b/tools/testing/selftests/net/lib/py/netns.py @@ -1,9 +1,12 @@ # SPDX-License-Identifier: GPL-2.0 from .utils import ip +import ctypes import random import string +libc = ctypes.cdll.LoadLibrary('libc.so.6') + class NetNS: def __init__(self, name=None):
@@ -29,3 +32,18 @@ class NetNS: def __repr__(self): return f"NetNS({self.name})" + + +class NetNSEnter: + def __init__(self, ns_name): + self.ns_path = f"/run/netns/{ns_name}" + + def __enter__(self): + self.saved = open("/proc/thread-self/ns/net") + with open(self.ns_path) as ns_file: + libc.setns(ns_file.fileno(), 0) + return self + + def __exit__(self, exc_type, exc_value, traceback): + libc.setns(self.saved.fileno(), 0) + self.saved.close() -- 2.47.0 From shaw.leon at gmail.com Mon Nov 18 14:32:44 2024 From: shaw.leon at gmail.com (Xiao Liang) Date: Mon, 18 Nov 2024 22:32:44 +0800 Subject: [PATCH net-next v4 5/5] selftests: net: Add two test cases for link netns In-Reply-To: <20241118143244.1773-1-shaw.leon@gmail.com> References: <20241118143244.1773-1-shaw.leon@gmail.com> Message-ID: <20241118143244.1773-6-shaw.leon@gmail.com> - Add test for creating link in another netns when a link of the same name and ifindex exists in current netns. - Add test for link netns atomicity - create link directly in target netns, and no notifications should be generated in current netns. 
Signed-off-by: Xiao Liang --- tools/testing/selftests/net/Makefile | 1 + tools/testing/selftests/net/netns-name.sh | 10 ++++++ tools/testing/selftests/net/netns_atomic.py | 39 +++++++++++++++++++++ 3 files changed, 50 insertions(+) create mode 100755 tools/testing/selftests/net/netns_atomic.py diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile index 3d487b03c4a0..3aaa7950b0f0 100644 --- a/tools/testing/selftests/net/Makefile +++ b/tools/testing/selftests/net/Makefile @@ -34,6 +34,7 @@ TEST_PROGS += gre_gso.sh TEST_PROGS += cmsg_so_mark.sh TEST_PROGS += cmsg_time.sh cmsg_ipv6.sh TEST_PROGS += netns-name.sh +TEST_PROGS += netns_atomic.py TEST_PROGS += nl_netdev.py TEST_PROGS += srv6_end_dt46_l3vpn_test.sh TEST_PROGS += srv6_end_dt4_l3vpn_test.sh diff --git a/tools/testing/selftests/net/netns-name.sh b/tools/testing/selftests/net/netns-name.sh index 6974474c26f3..0be1905d1f2f 100755 --- a/tools/testing/selftests/net/netns-name.sh +++ b/tools/testing/selftests/net/netns-name.sh @@ -78,6 +78,16 @@ ip -netns $NS link show dev $ALT_NAME 2> /dev/null && fail "Can still find alt-name after move" ip -netns $test_ns link del $DEV || fail +# +# Test no conflict of the same name/ifindex in different netns +# +ip -netns $NS link add name $DEV index 100 type dummy || fail +ip -netns $NS link add netns $test_ns name $DEV index 100 type dummy || + fail "Can create in netns without moving" +ip -netns $test_ns link show dev $DEV >> /dev/null || fail "Device not found" +ip -netns $NS link del $DEV || fail +ip -netns $test_ns link del $DEV || fail + echo -ne "$(basename $0) \t\t\t\t" if [ $RET_CODE -eq 0 ]; then echo "[ OK ]" diff --git a/tools/testing/selftests/net/netns_atomic.py b/tools/testing/selftests/net/netns_atomic.py new file mode 100755 index 000000000000..d350a3fc0a91 --- /dev/null +++ b/tools/testing/selftests/net/netns_atomic.py @@ -0,0 +1,39 @@ +#!/usr/bin/env python3 +# SPDX-License-Identifier: GPL-2.0 + +import time + +from lib.py 
import ksft_run, ksft_exit, ksft_true +from lib.py import ip +from lib.py import NetNS, NetNSEnter +from lib.py import RtnlFamily + + +def test_event(ns1, ns2) -> None: + with NetNSEnter(str(ns1)): + rtnl = RtnlFamily() + + rtnl.ntf_subscribe("rtnlgrp-link") + + ip(f"netns set {ns1} 0", ns=str(ns2)) + + ip(f"link add netns {ns2} link-netnsid 0 dummy1 type dummy") + ip(f"link add netns {ns2} dummy2 type dummy", ns=str(ns1)) + + ip("link del dummy1", ns=str(ns2)) + ip("link del dummy2", ns=str(ns2)) + + time.sleep(1) + rtnl.check_ntf() + ksft_true(rtnl.async_msg_queue.empty(), + "Received unexpected link notification") + + +def main() -> None: + with NetNS() as ns1, NetNS() as ns2: + ksft_run([test_event], args=(ns1, ns2)) + ksft_exit() + + +if __name__ == "__main__": + main() -- 2.47.0 From ozy at netpower.fr Mon Nov 18 14:31:29 2024 From: ozy at netpower.fr (Orsiris de Jong) Date: Mon, 18 Nov 2024 15:31:29 +0100 Subject: potentially disallowing IP fragmentation on wg packets, and handling routing loops better In-Reply-To: References: Message-ID: Hello, Sorry to "unbury" this thread, but I'd love to know what has been decided for WireGuard regarding UDP fragmentation. I used to play with a setup that bridges two physical Ethernet ports over the internet in order to create a LAN2LAN tunnel, which roughly looks like the following: (SITE1 SWITCH)----ETH1----BRIDGE1----GRETAP1----WIREGUARD1----WAN1----(internet)----WAN2----WIREGUARD2---GRETAP2---BRIDGE2---ETH2----(SITE2 SWITCH) Setting the WireGuard MTU on both sides to 9200 and setting MSS clamping, I was able to let this chain act almost as a transparent Ethernet cable, where packets from ETH1 to ETH2 could be up to 9000 bytes MTU, even if the WAN MTU was 1500 bytes or less. Although I know many reasons why this is not ideal, and applications should compute pMTU in order to properly speak to each other, I have some industrial appliances that cannot be upgraded and that require an MTU 9000 link between both sites. 
When I made this setup on RHEL 8, everything worked okay, even if the bandwidth was low due to fragmentation. But setting up the same on RHEL 9, I cannot transfer packets larger than 1360 bytes (which is 1500 minus the WireGuard and gretap headers). I guess that somewhere between the two releases, something changed in the way WireGuard allows or honors Don't Fragment packets. I've searched a lot, and only found this thread, which discusses a similar setup from Roman. Could anyone give me some insight into what was decided for WireGuard fragmentation, and perhaps a tip to make this setup work again? Best regards, Orsiris de Jong from NetInvent. From jrife at google.com Mon Nov 18 20:44:26 2024 From: jrife at google.com (Jordan Rife) Date: Mon, 18 Nov 2024 13:44:26 -0700 Subject: [PATCH v2 net-next] wireguard: allowedips: Add WGALLOWEDIP_F_REMOVE_ME flag In-Reply-To: References: <20240905200551.4099064-1-jrife@google.com> Message-ID: Hi Jason, Thanks a lot for taking a look! > I'm actually less enthusiastic about this part, but mainly because I > haven't looked closely at what the convention for this is. I was > wondering though - this adds WGALLOWEDIP_A_FLAGS, which didn't exist > before. Shouldn't some upper layer return a relevant value in that case? > And even within the existing flags, for WGPEER_A_FLAGS, for example, old > kernels check to see if there are new flags, for this purpose, e.g.: > > if (attrs[WGPEER_A_FLAGS]) > flags = nla_get_u32(attrs[WGPEER_A_FLAGS]); > ret = -EOPNOTSUPP; > if (flags & ~__WGPEER_F_ALL) > goto out; > > So I think we might be able to avoid having to bump the version number. > GENL is supposed to be extensible like this. I think the challenge with WGALLOWEDIP_A_FLAGS in particular is that because it didn't exist since the beginning like WGPEER_A_FLAGS, there are kernels out there that have no knowledge of it and wouldn't have this check in place. 
While I think it's a good idea to replicate this check for WGALLOWEDIP_A_FLAGS as well for future compatibility, we still need some way for clients to probe whether or not this feature is supported in case they're running on an older kernel. If we want to keep the version number as-is, I see a few alternatives: 1. Include WGALLOWEDIP_A_FLAGS in the response of WG_CMD_GET_DEVICE. This would be a new field inside each entry of WGPEER_A_ALLOWEDIPS. If the result of WG_CMD_GET_DEVICE contains WGALLOWEDIP_A_FLAGS then a client knows that it can use features such as WGALLOWEDIP_F_REMOVE_ME. It could potentially be used later to contain persistent flags for an IP, but for now would just be zeroed out. This fails if there are no allowed IPs configured for the device you're trying to configure, but in the case where you're trying to use WGALLOWEDIP_F_REMOVE_ME there probably would be. 2. Keep everything the same. Don't bump the version number. Clients could in theory try to use WGALLOWEDIP_A_FLAGS with WG_CMD_SET_DEVICE then check if their request had the intended effect (e.g. checking if the IP was removed). I slightly prefer approach 1 myself, as I feel it's a bit cleaner, but I'm happy to discuss some other approaches here. I'm trying to think about how these probes could be used inside the WireGuard Go library and wg itself. Assuming approach one, * For libraries that manage WireGuard like golang.zx2c4.com/wireguard/wgctrl the first time a client uses .Device() (WG_CMD_GET_DEVICE under the hood) there would need to be some logic that looks at WGPEER_A_ALLOWEDIPS for one of the peers and sets some internal flag like c.supportsAllowedIPFlags. When a client later calls c.ConfigureDevice() the library would disallow direct IP removal if the feature is not supported (c.supportsAllowedIPFlags != nil && !*c.supportsAllowedIPFlags). 
Alternatively, you could do all this up front while initializing the client by creating some dummy device + peer, adding some IP, then using WG_CMD_GET_DEVICE to see if WGALLOWEDIP_A_FLAGS is present in the result. * Similarly for wg, if the user is trying to remove an allowed IP you'd first query the allowed IP for a peer and check for WGALLOWEDIP_A_FLAGS if it doesn't exist in the response then the command would fail and print something like "not supported". Bumping WG_GENL_VERSION is cleaner still, since clients in userspace just need to check an integer value and avoid any probing logic. However, like you I am unsure what the convention is here and whether or not this has any unintended effects. > This file doesn't really do the _ prefix thing anywhere. Can you call > this something more descriptive, like "remove_node"? Sure. I'll update this in v3. > Reasoning for this is that it copies the logic in add()? As for the cidr > bits check, the intent was simply to fail if the user passes an invalid value for WGALLOWEDIP_A_CIDR_MASK. Although, unlike the add() case, I suppose we could just remove this check. Worst case if a user passes something high like 255 remove() would just be a no-op. It looks safe to remove !peer check as well. I can update this in v3. > What's the reasoning behind returning success when it can't find the > node? Because in that case it's already removed so the function is > idempotent? And you checked that nothing really cares about the return > value there anyway? Or is this a mistake and you meant to return > something else? I can imagine good reasoning in either direction; I'd > just like to learn that your choice is deliberate. Yes, I opted for idempotence here intentionally. At least for the use cases I have in mind the intent is basically "remove all these allowed IPs from this peer if they exist". If an allowed IP already got removed or is mapped to another peer somehow then that's fine. 
I'd imagine it complicates the code in userspace to have to deal with corner cases involving possible error codes returned when removing a batch of allowed IPs. You'd need to query the device again, reform your request, etc. I /think/ add() is idempotent as well in cases where you re-add an IP to a peer, so there's some symmetry here. Perhaps if there are use cases that need stricter checks in the future we could introduce a new flag to return an error code in this case. > As I mentioned above, you need to do the dance of: > > ret = -EOPNOTSUPP; > if (flags & ~__WGALLOWEDIP_F_ALL) > goto out; > > So that we can safely extend this later. Good idea. I will add this in v3 as well. > We get 100 chars now, so you can rewrite this as: > > ret = wg_allowedips_remove_v4(&peer->device->peer_allowedips, > nla_data(attrs[WGALLOWEDIP_A_IPADDR]), cidr, > peer, &peer->device->device_update_lock); Nice, will do. > That comma should be a semicolon because what comes after is a complete > sentence, and there's no conjunction. Good point. I think we might actually need a comma /after/ otherwise: "...; otherwise, ...": "WGALLOWEDIP_F_REMOVE_ME if the specified IP should be removed; otherwise, this IP will be added if it is not already present". > I'm not so keen on this, simply because we've been able to do everything > else in that script and keeping with the "make sure the userspace > tooling" paradigm. There are two options: > > 1. Rewrite netns.sh all in C, only depending on libnl or whatever (which > I would actually really *love* to see happen). This would change the > testing paradigm, but I'd be okay with that if it's done well and all > at once. > > 2. Add support for this new flag to wg(8) (which I think is necessary > anyway for this to land; kernel features and userspace support oughta > be posted at once). > > > Thanks for the patch. I like the feature and I'm happy you posted this. Option 1 seems like a heavy lift for this patch. 
I think option 2 makes more sense, as long as there is an understanding that netns.sh needs to be run with the latest and greatest version of wg in order for all tests to pass. Maybe we can add a version check to selectively disable the remove IP tests if wg is on an older version. I agree though that long term option 1 would be nice, as it provides a finer level of control and tests could be run without as many external dependencies. I can get rid of the remove-ip cruft and send two patches to the wireguard mailing list if that works, one for the kernel and one for wireguard-tools. However, this raises some questions about the wg interface itself which would be used both by netns.sh and end users. Looking at the current interface for wg, the way to configure allowed IPs currently is "wg set". For example, wg set peer ABCD... allowed-ips 192.168.0.24/32,192.168.0.44/32,192.168.0.88/32 The intended effect is to completely replace the allowed IPs for that peer rather than to make an incremental change. I think we'll need to extend the interface a bit. There are a few directions we can take here: 1. Add a new flag to "wg set" like --incremental in which case it won't use WGPEER_F_REPLACE_ALLOWEDIPS under the hood. Support addition or removal of allowed IPs with a "-" suffix on CIDRs you want to remove (192.168.0.24/32-,192.168.0.44/32-,192.168.0.88/32). 2. Same as 1 but with a new argument name, allowed-ips-diff instead of the --incremental flag. This appears more in line with the current style of wg's arguments. There are a lot more variations here, so I wanted to get your input here before just picking a direction. Thanks again for the thorough feedback! 
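A minimal sketch of how the "-" suffix from direction 1 could be split into additions and removals, assuming a hypothetical helper named parse_allowedips_diff. Neither the helper nor the syntax exists in wg(8) today; the real parsing would presumably live in the C code of wireguard-tools, and the interface is still open for discussion:

```shell
# Hypothetical helper: classify each comma-separated CIDR as an addition
# or a removal, based on the proposed trailing "-" marker.
parse_allowedips_diff() {
    printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r entry; do
        case "$entry" in
            '')  ;;                                    # ignore empty items
            *-)  printf 'remove %s\n' "${entry%-}" ;;  # trailing "-" marks a removal
            *)   printf 'add %s\n' "$entry" ;;         # anything else is an addition
        esac
    done
}

# The example list from the message: two removals and one addition.
parse_allowedips_diff '192.168.0.24/32-,192.168.0.44/32-,192.168.0.88/32'
```

For the list above this prints two "remove" lines followed by one "add" line, which is the split a tool would need before deciding which entries carry WGALLOWEDIP_F_REMOVE_ME.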
-Jordan From syzbot+b0ae8f1abf7d891e0426 at syzkaller.appspotmail.com Tue Nov 19 01:31:21 2024 From: syzbot+b0ae8f1abf7d891e0426 at syzkaller.appspotmail.com (syzbot) Date: Mon, 18 Nov 2024 17:31:21 -0800 Subject: [syzbot] [net] INFO: task hung in tun_chr_close (5) In-Reply-To: <000000000000bd671b06222de427@google.com> Message-ID: <673bea69.050a0220.87769.0061.GAE@google.com> syzbot has found a reproducer for the following issue on: HEAD commit: 9fb2cfa4635a Merge tag 'pull-ufs' of git://git.kernel.org/.. git tree: upstream console output: https://syzkaller.appspot.com/x/log.txt?x=11a57378580000 kernel config: https://syzkaller.appspot.com/x/.config?x=e31661728c1a4027 dashboard link: https://syzkaller.appspot.com/bug?extid=b0ae8f1abf7d891e0426 compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40 syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1026f2e8580000 Downloadable assets: disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7feb34a89c2a/non_bootable_disk-9fb2cfa4.raw.xz vmlinux: https://storage.googleapis.com/syzbot-assets/e676eb2a9e5f/vmlinux-9fb2cfa4.xz kernel image: https://storage.googleapis.com/syzbot-assets/abff576e0e8f/bzImage-9fb2cfa4.xz IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+b0ae8f1abf7d891e0426 at syzkaller.appspotmail.com INFO: task syz-executor:5443 blocked for more than 143 seconds. Not tainted 6.12.0-syzkaller-00233-g9fb2cfa4635a #0 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
task:syz-executor state:D stack:20280 pid:5443 tgid:5443 ppid:1 flags:0x00004006 Call Trace: context_switch kernel/sched/core.c:5328 [inline] __schedule+0x184f/0x4c30 kernel/sched/core.c:6693 __schedule_loop kernel/sched/core.c:6770 [inline] schedule+0x14b/0x320 kernel/sched/core.c:6785 schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:6842 __mutex_lock_common kernel/locking/mutex.c:684 [inline] __mutex_lock+0x6a7/0xd70 kernel/locking/mutex.c:752 tun_detach drivers/net/tun.c:698 [inline] tun_chr_close+0x3b/0x1b0 drivers/net/tun.c:3517 __fput+0x23c/0xa50 fs/file_table.c:450 task_work_run+0x24f/0x310 kernel/task_work.c:239 exit_task_work include/linux/task_work.h:43 [inline] do_exit+0xa2f/0x28e0 kernel/exit.c:938 do_group_exit+0x207/0x2c0 kernel/exit.c:1087 get_signal+0x16a3/0x1740 kernel/signal.c:2918 arch_do_signal_or_restart+0x96/0x860 arch/x86/kernel/signal.c:337 exit_to_user_mode_loop kernel/entry/common.c:111 [inline] exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline] __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline] syscall_exit_to_user_mode+0xc9/0x370 kernel/entry/common.c:218 do_syscall_64+0x100/0x230 arch/x86/entry/common.c:89 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f250b78049a RSP: 002b:00007ffd42ecadc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000037 RAX: 0000000000000000 RBX: 00007ffd42ecadf0 RCX: 00007f250b78049a RDX: 0000000000000040 RSI: 0000000000000000 RDI: 0000000000000003 RBP: 0000000000000003 R08: 00007ffd42ecadec R09: 00007ffd42ecb207 R10: 00007ffd42ecadf0 R11: 0000000000000246 R12: 00007f250b90a500 R13: 00007ffd42ecadec R14: 0000000000000000 R15: 00007f250b90c000 INFO: task syz-executor:5449 blocked for more than 148 seconds. Not tainted 6.12.0-syzkaller-00233-g9fb2cfa4635a #0 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
task:syz-executor state:D stack:20752 pid:5449 tgid:5449 ppid:1 flags:0x00004006 Call Trace: context_switch kernel/sched/core.c:5328 [inline] __schedule+0x184f/0x4c30 kernel/sched/core.c:6693 __schedule_loop kernel/sched/core.c:6770 [inline] schedule+0x14b/0x320 kernel/sched/core.c:6785 schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:6842 __mutex_lock_common kernel/locking/mutex.c:684 [inline] __mutex_lock+0x6a7/0xd70 kernel/locking/mutex.c:752 tun_detach drivers/net/tun.c:698 [inline] tun_chr_close+0x3b/0x1b0 drivers/net/tun.c:3517 __fput+0x23c/0xa50 fs/file_table.c:450 task_work_run+0x24f/0x310 kernel/task_work.c:239 exit_task_work include/linux/task_work.h:43 [inline] do_exit+0xa2f/0x28e0 kernel/exit.c:938 do_group_exit+0x207/0x2c0 kernel/exit.c:1087 get_signal+0x16a3/0x1740 kernel/signal.c:2918 arch_do_signal_or_restart+0x96/0x860 arch/x86/kernel/signal.c:337 exit_to_user_mode_loop kernel/entry/common.c:111 [inline] exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline] __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline] syscall_exit_to_user_mode+0xc9/0x370 kernel/entry/common.c:218 do_syscall_64+0x100/0x230 arch/x86/entry/common.c:89 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f853b18049a RSP: 002b:00007fff077d36d8 EFLAGS: 00000202 ORIG_RAX: 0000000000000037 RAX: 0000000000000000 RBX: 00007fff077d3760 RCX: 00007f853b18049a RDX: 0000000000000041 RSI: 0000000000000000 RDI: 0000000000000003 RBP: 0000000000000003 R08: 00007fff077d36fc R09: 00007fff077d3b17 R10: 00007fff077d3760 R11: 0000000000000202 R12: 00007f853b30b280 R13: 00007fff077d36fc R14: 0000000000000000 R15: 00007f853b30c000 INFO: task kworker/0:8:5617 blocked for more than 152 seconds. Not tainted 6.12.0-syzkaller-00233-g9fb2cfa4635a #0 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
task:kworker/0:8 state:D stack:25104 pid:5617 tgid:5617 ppid:2 flags:0x00004000 Workqueue: events switchdev_deferred_process_work Call Trace: context_switch kernel/sched/core.c:5328 [inline] __schedule+0x184f/0x4c30 kernel/sched/core.c:6693 --- If you want syzbot to run the reproducer, reply with: #syz test: git://repo/address.git branch-or-commit-hash If you attach or paste a git patch, syzbot will apply it before testing. From Ariel.Otilibili-Anieli at eurecom.fr Mon Nov 18 22:58:18 2024 From: Ariel.Otilibili-Anieli at eurecom.fr (Ariel Otilibili-Anieli) Date: Mon, 18 Nov 2024 23:58:18 +0100 Subject: [PATCH] wireguard-tools: Extracted error message for the sake of legibility Message-ID: <15e782-673bc680-4ed-422b5480@29006332> Hello, This is a reminder about a patch sent to the WireGuard mailing list; CCing the maintainers of drivers/net/wireguard/; below is a verbatim copy of my cover letter. Thank you, ** I have been using WireGuard for some time; it does ease the configuration of VPNs. This is my first patch to the list. I asked to be subscribed; please confirm that this is the case. I would like to improve my C programming skills; your feedback will be much appreciated. Ariel -------- Original Message -------- Subject: [PATCH] wireguard-tools: Extracted error message for the sake of legibility Date: Thursday, August 01, 2024 11:43 CEST From: Ariel Otilibili Reply-To: Ariel Otilibili To: wireguard at lists.zx2c4.com CC: "Jason A . 
Donenfeld" , Ariel Otilibili References: <20240725204917.192647-2-otilibil at eurecom.fr> <20240801094932.4502-1-otilibil at eurecom.fr> Signed-off-by: Ariel Otilibili --- src/set.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/src/set.c b/src/set.c index 75560fd..b2fbd54 100644 --- a/src/set.c +++ b/src/set.c @@ -16,9 +16,19 @@ int set_main(int argc, const char *argv[]) { struct wgdevice *device = NULL; int ret = 1; + const char *error_message = "Usage: %s %s " + " [listen-port ]" + " [fwmark ]" + " [private-key ]" + " [peer [remove]" + " [preshared-key ]" + " [endpoint :]" + " [persistent-keepalive ]" + " [allowed-ips /[,/]...]" + " ]...\n"; if (argc < 3) { - fprintf(stderr, "Usage: %s %s [listen-port ] [fwmark ] [private-key ] [peer [remove] [preshared-key ] [endpoint :] [persistent-keepalive ] [allowed-ips /[,/]...] ]...\n", PROG_NAME, argv[0]); + fprintf(stderr, error_message, PROG_NAME, argv[0]); return 1; } -- 2.45.2 From kuba at kernel.org Tue Nov 19 03:07:00 2024 From: kuba at kernel.org (Jakub Kicinski) Date: Mon, 18 Nov 2024 19:07:00 -0800 Subject: [PATCH net-next v4 0/5] net: Improve netns handling in RTNL and ip_tunnel In-Reply-To: <20241118143244.1773-1-shaw.leon@gmail.com> References: <20241118143244.1773-1-shaw.leon@gmail.com> Message-ID: <20241118190700.4c1b8156@kernel.org> On Mon, 18 Nov 2024 22:32:39 +0800 Xiao Liang wrote: > This patch series includes some netns-related improvements and fixes for > RTNL and ip_tunnel, to make link creation more intuitive: > > - Creating link in another net namespace doesn't conflict with link names > in current one. > - Refector rtnetlink link creation. Create link in target namespace > directly. Pass both source and link netns to drivers via newlink() > callback. > > So that > > # ip link add netns ns1 link-netns ns2 tun0 type gre ... > > will create tun0 in ns1, rather than create it in ns2 and move to ns1. 
> And don't conflict with another interface named "tun0" in current netns. ## Form letter - net-next-closed The merge window for v6.13 has begun and net-next is closed for new drivers, features, code refactoring and optimizations. We are currently accepting bug fixes only. Please repost when net-next reopens after Dec 2nd. RFC patches sent for review only are welcome at any time. See: https://www.kernel.org/doc/html/next/process/maintainer-netdev.html#development-cycle -- pw-bot: defer From liuhangbin at gmail.com Tue Nov 19 07:22:21 2024 From: liuhangbin at gmail.com (Hangbin Liu) Date: Tue, 19 Nov 2024 07:22:21 +0000 Subject: [PATCHv2 net-next] selftests: wireguards: use nft by default In-Reply-To: References: <20241111041902.25814-1-liuhangbin@gmail.com> Message-ID: On Sun, Nov 17, 2024 at 09:09:00PM +0100, Jason A. Donenfeld wrote: > On Mon, Nov 11, 2024 at 04:19:02AM +0000, Hangbin Liu wrote: > > Use nft by default if it's supported, as nft is the replacement for iptables, > > which is used by default in some releases. Additionally, iptables is dropped > > in some releases. > > Rather than having this optionality, I'd rather just do everything in > one way or the other. So if you're adamant that we need to use nft, just > convert the whole thing. And then subsequently, make sure that the qemu > test harness supports it. That should probably be a series. Hmm, try build nft but got error # make -C tools/testing/selftests/wireguard/qemu/ make: Entering directory '/home/net/tools/testing/selftests/wireguard/qemu' Building for x86_64-linux-musl using x86_64-redhat-linux cd /home/net/tools/testing/selftests/wireguard/qemu/build/x86_64/nftables-1.0.9 && ./configure --prefix=/ --build=x86_64-redhat-linux --host=x86_64-linux-musl --enable-static --disable-shared checking for a BSD-compatible install... /usr/bin/install -c checking whether build environment is sane... yes ... checking for pkg-config... 
/usr/bin/pkg-config configure: WARNING: using cross tools not prefixed with host triplet checking pkg-config is at least version 0.9.0... yes checking for libmnl >= 1.0.4... yes checking for libnftnl >= 1.2.6... yes checking for __gmpz_init in -lgmp... no configure: error: No suitable version of libgmp found But I can config it manually like: ./configure --prefix=/ --build=x86_64-redhat-linux --host=x86_64-linux-musl --enable-static --disable-shared correctly Do you have any idea? Thanks Hangbin From phil at nwl.cc Tue Nov 19 14:37:23 2024 From: phil at nwl.cc (Phil Sutter) Date: Tue, 19 Nov 2024 15:37:23 +0100 Subject: [PATCHv2 net-next] selftests: wireguards: use nft by default In-Reply-To: References: <20241111041902.25814-1-liuhangbin@gmail.com> Message-ID: Hangbin, On Tue, Nov 19, 2024 at 07:22:21AM +0000, Hangbin Liu wrote: > On Sun, Nov 17, 2024 at 09:09:00PM +0100, Jason A. Donenfeld wrote: > > On Mon, Nov 11, 2024 at 04:19:02AM +0000, Hangbin Liu wrote: > > > Use nft by default if it's supported, as nft is the replacement for iptables, > > > which is used by default in some releases. Additionally, iptables is dropped > > > in some releases. > > > > Rather than having this optionality, I'd rather just do everything in > > one way or the other. So if you're adamant that we need to use nft, just > > convert the whole thing. And then subsequently, make sure that the qemu > > test harness supports it. That should probably be a series. > > Hmm, try build nft but got error > > # make -C tools/testing/selftests/wireguard/qemu/ > make: Entering directory '/home/net/tools/testing/selftests/wireguard/qemu' > Building for x86_64-linux-musl using x86_64-redhat-linux > cd /home/net/tools/testing/selftests/wireguard/qemu/build/x86_64/nftables-1.0.9 && ./configure --prefix=/ --build=x86_64-redhat-linux --host=x86_64-linux-musl --enable-static --disable-shared > checking for a BSD-compatible install... /usr/bin/install -c > checking whether build environment is sane... 
yes > ... > checking for pkg-config... /usr/bin/pkg-config > configure: WARNING: using cross tools not prefixed with host triplet > checking pkg-config is at least version 0.9.0... yes > checking for libmnl >= 1.0.4... yes > checking for libnftnl >= 1.2.6... yes > checking for __gmpz_init in -lgmp... no > configure: error: No suitable version of libgmp found You may find proper details about the failure in config.log. My guess is the cross build prevents host libraries from being used. (No idea why your manual call works, though.) > But I can config it manually like: ./configure --prefix=/ --build=x86_64-redhat-linux --host=x86_64-linux-musl --enable-static > --disable-shared correctly > > Do you have any idea? You may just pass '--with-mini-gmp' to nftables' configure call to avoid the external dependency. Cheers, Phil
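A minimal sketch of the suggestion above, assuming the same build and host triplets that appear in the quoted log. The helper name nft_configure_args is hypothetical; the only change relative to the failing call is the added --with-mini-gmp switch, which tells nftables' configure not to look for an external libgmp:

```shell
# Hypothetical helper: print the configure arguments for the musl cross
# build, one per line, with --with-mini-gmp appended.
nft_configure_args() {
    local build="$1" host="$2"
    printf '%s\n' \
        --prefix=/ \
        "--build=$build" \
        "--host=$host" \
        --enable-static \
        --disable-shared \
        --with-mini-gmp
}

# Print the arguments for review. To actually configure, run this from the
# nftables source directory instead:
#   ./configure $(nft_configure_args x86_64-redhat-linux x86_64-linux-musl)
nft_configure_args x86_64-redhat-linux x86_64-linux-musl
```

Whether the qemu selftest Makefile should carry this switch permanently is a separate decision for the series; the sketch only shows the shape of the call.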