[PATCH] Enabling Threaded NAPI by Default

Mirco Barone mirco.barone at polito.it
Tue May 27 09:08:30 UTC 2025


Hi everyone,


While testing WireGuard with a large number of tunnels, we expected throughput to scale linearly with the number of active tunnels. Instead, we observed very poor performance: multiple NAPI functions stack up on the same CPU core, creating a bottleneck that prevents the system from scaling.
More details are given on page 3 of this paper:
https://netdevconf.info/0x18/docs/netdev-0x18-paper23-talk-paper.pdf


Since each peer has its own NAPI struct, the problem can occur whenever many peers are created on the same machine. The simple solution we found is to enable threaded NAPI, which considerably improves throughput under our test conditions while showing no drawbacks in traditional deployment scenarios (i.e., a single tunnel). We therefore propose a small code change that makes threaded NAPI the new default.


Any comments?

The option to revert to softirq-based NAPI is preserved: simply write 0 to the `/sys/class/net/<iface>/threaded` flag.
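
For illustration only, the revert path through sysfs could be scripted from user space roughly as follows. This is a minimal sketch, not part of the patch: the helper name `set_napi_threaded` and its parameters are hypothetical, and it assumes the caller has permission to write to the attribute (typically root) and that the kernel exposes the `threaded` file for the interface.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of the patch): writes "1" (threaded NAPI)
 * or "0" (softirq NAPI) to <sysfs_net_dir>/<iface>/threaded.
 * sysfs_net_dir would normally be "/sys/class/net"; it is a parameter
 * only so the helper can be exercised against a scratch directory.
 * Returns 0 on success, -1 on any error.
 */
int set_napi_threaded(const char *sysfs_net_dir, const char *iface, int enable)
{
    char path[256];
    int n = snprintf(path, sizeof(path), "%s/%s/threaded", sysfs_net_dir, iface);

    /* Reject encoding errors and truncated paths. */
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    FILE *f = fopen(path, "w");
    if (!f)
        return -1;

    int ok = fputs(enable ? "1\n" : "0\n", f) >= 0;

    /* A rejected write is usually only reported when the buffer is flushed. */
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

Calling `set_napi_threaded("/sys/class/net", "wg0", 0)` would then switch a (hypothetical) interface named `wg0` back to softirq processing, and passing 1 would re-enable threaded NAPI.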

-----------------------------------------------------------------------
CHANGES
-----------------------------------------------------------------------
 drivers/net/wireguard/device.c | 1 +
 1 file changed, 1 insertion(+)
diff --git a/drivers/net/wireguard/device.c b/drivers/net/wireguard/device.c
index 45e9b908dbfb..bb77f54d7526 100644
--- a/drivers/net/wireguard/device.c
+++ b/drivers/net/wireguard/device.c
@@ -363,6 +363,7 @@ static int wg_newlink(struct net *src_net, struct net_device *dev,
        ret = wg_ratelimiter_init();
        if (ret < 0)
                goto err_free_handshake_queue;
+       dev_set_threaded(dev, true);

        ret = register_netdevice(dev);
        if (ret < 0)

