[RFC PATCH 1/4] netdevice: add ndo_lookup_mtu for dynamically determining MTU

leon at is.currently.online leon at is.currently.online
Tue Dec 28 23:45:23 UTC 2021

From: Leon Schuermann <leon at is.currently.online>

Add an optional function `ndo_lookup_mtu` to `struct net_device_ops`.
Other parts of the network stack can call it to let the destination
netdevice determine the permitted MTU for a packet. The lookup is done
on a per-packet basis: the caller passes the `struct sk_buff` holding
the packet contents.

The information obtained through this method may be cached by other
parts of the network stack, for instance by the path MTU discovery
(PMTUD) mechanism. It is not guaranteed that this function will be
called for every packet, nor even that it is called for a single
packet of a given flow. When this function is not implemented, or when
it returns -ENODATA, no statement about the permitted MTU is made and
the networking stack falls back to the device MTU values. These
properties make this mechanism a way of providing a "suggestion" for
a packet's MTU that deviates from the default device MTU.
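
The fallback rule above can be sketched in plain C. This is only an
illustration, not part of the patch: the structs are userspace
stand-ins for the kernel types, and `example_effective_mtu` and the two
sample callbacks are hypothetical names. Treating every negative return
value (not just -ENODATA) as "no statement" is an assumption of the
sketch.

```c
#include <errno.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; only the fields used here. */
struct sk_buff {
	unsigned int len;
};

struct net_device;

struct net_device_ops {
	int (*ndo_lookup_mtu)(const struct sk_buff *skb,
			      const struct net_device *dev);
};

struct net_device {
	unsigned int mtu;
	const struct net_device_ops *netdev_ops;
};

/* Hypothetical callback: the device makes no statement for any packet. */
static int example_lookup_nodata(const struct sk_buff *skb,
				 const struct net_device *dev)
{
	(void)skb;
	(void)dev;
	return -ENODATA;
}

/* Hypothetical callback: the device suggests 1280 bytes for every packet. */
static int example_lookup_1280(const struct sk_buff *skb,
			       const struct net_device *dev)
{
	(void)skb;
	(void)dev;
	return 1280;
}

/*
 * Hypothetical caller-side helper: use the device's suggestion if there
 * is one, otherwise fall back to the regular device MTU.
 */
static unsigned int example_effective_mtu(const struct sk_buff *skb,
					  const struct net_device *dev)
{
	const struct net_device_ops *ops = dev->netdev_ops;
	int mtu;

	/* Hook not implemented: no statement about the MTU is made. */
	if (!ops || !ops->ndo_lookup_mtu)
		return dev->mtu;

	mtu = ops->ndo_lookup_mtu(skb, dev);

	/* -ENODATA (and, by assumption, any other error): fall back. */
	if (mtu < 0)
		return dev->mtu;

	return (unsigned int)mtu;
}
```

A caller that caches the result, as PMTUD might, would store the
returned value per flow or per destination instead of calling the
helper for every packet.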

The device is allowed to announce MTU values lower than the minimum
or higher than the maximum device MTU. Whether such MTU values will
be respected is up to the implementation.

Even though this mechanism is mandatory neither to implement nor to
respect, it has some interesting consequences. Because it can inspect
the entire packet buffer, the destination netdevice implementation can
control MTUs at flow granularity. For instance, it could be used to
allow two devices on a shared Ethernet segment to communicate with
each other using a large (> 1500 byte) MTU, while using a lower MTU
for other devices.
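
As a rough sketch of the shared-segment example, a driver-side callback
could look as follows. This is not code from this series: the structs
are userspace stand-ins for the kernel types, `skb->data` is assumed to
point at the Ethernet header, and the `jumbo_peer` and `jumbo_mtu`
fields as well as `example_lookup_mtu` are hypothetical.

```c
#include <errno.h>
#include <string.h>

#define ETH_ALEN 6	/* length of an Ethernet MAC address in bytes */

/* Userspace stand-ins for the kernel types; only the fields used here. */
struct sk_buff {
	const unsigned char *data;	/* assumed: start of Ethernet header */
	unsigned int len;
};

struct net_device {
	unsigned int mtu;			/* regular device MTU */
	unsigned char jumbo_peer[ETH_ALEN];	/* hypothetical: jumbo-capable peer */
	int jumbo_mtu;				/* hypothetical: MTU towards that peer */
};

/*
 * Hypothetical ndo_lookup_mtu implementation: suggest a larger MTU for
 * frames addressed to one known peer, and make no statement otherwise.
 */
static int example_lookup_mtu(const struct sk_buff *skb,
			      const struct net_device *dev)
{
	/* Too short to contain a destination MAC address. */
	if (skb->len < ETH_ALEN)
		return -ENODATA;

	/* The destination MAC is the first field of the Ethernet header. */
	if (memcmp(skb->data, dev->jumbo_peer, ETH_ALEN) == 0)
		return dev->jumbo_mtu;

	/* Any other destination: callers fall back to the device MTU. */
	return -ENODATA;
}
```

Returning -ENODATA for all other destinations keeps the default device
MTU in effect for them, so only the one flow is treated differently.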

The immediate motivation for these changes provides another example of
this mechanism being useful: when using WireGuard, peers can reside
behind paths of varying MTU restrictions. However, PMTUD does not work
across these tunnel links, as WireGuard cannot accept unauthenticated
ICMP responses. Thus it will continue to send packets that are too
large over lower-MTU links. With this mechanism WireGuard can reduce
the MTU at per-peer granularity, without limiting the overall device
MTU. Furthermore, it can employ in-band PMTUD mechanisms to resolve
these values automatically. An MTU metric can be set for specific FIB
routes and thus lower the MTU for individual peers, but this
completely disables PMTUD on the entire route. While regular PMTUD
does not work over the tunnel link, it should still be usable on the
rest of the route. Furthermore, when employing an in-band per-peer
PMTUD mechanism, modifying the FIB to store the detected MTU is
inelegant at best.

Signed-off-by: Leon Schuermann <leon at is.currently.online>
---
 include/linux/netdevice.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 7c3da0e1ea9d..d9d59b756f57 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1279,6 +1279,16 @@ struct netdev_net_notifier {
  * struct net_device *(*ndo_get_peer_dev)(struct net_device *dev);
  *	If a device is paired with a peer device, return the peer instance.
  *	The caller must be under RCU read context.
+ * int (*ndo_lookup_mtu)(const struct sk_buff *skb,
+ *			 const struct net_device *dev);
+ *	For devices supporting dynamic lookup of the MTU for individual
+ *	skb packets, this function returns the MTU for the passed skb.
+ *	A return value of -ENODATA must be treated as if the device does
+ *	not support this feature. It is not guaranteed that this function will
+ *	be called for every packet presented to the ndo_start_xmit function.
+ *	A device must always accept packets of the announced min/max device MTU.
+ *	This function may be used to potentially allow MTU sizes lower/higher
+ *	than the min/max device MTU respectively.
  */
 struct net_device_ops {
 	int			(*ndo_init)(struct net_device *dev);
@@ -1487,6 +1497,8 @@ struct net_device_ops {
 	int			(*ndo_tunnel_ctl)(struct net_device *dev,
 						  struct ip_tunnel_parm *p, int cmd);
 	struct net_device *	(*ndo_get_peer_dev)(struct net_device *dev);
+	int			(*ndo_lookup_mtu)(const struct sk_buff *skb,
+						  const struct net_device *dev);
