Should setting the listen-port require CAP_SYS_ADMIN in the socket namespace?
ju.orth at gmail.com
Sun Sep 9 11:13:43 CEST 2018
Consider the following scenario:
1. Sysadmin runs a hostile application `h1` in container `c1`.
2. Sysadmin creates a Wireguard device `wg0` in the init namespace.
3. Sysadmin moves `wg0` into `c1`.
4. On the same server, a user wishes to sometimes run an application `a1` that
listens on a well-known unprivileged port in the init namespace.
`h1` has a full set of capabilities in `c1`. This allows `h1` to listen on an
arbitrary unprivileged port in the init namespace by setting the listen-port
of `wg0`. This allows `h1` to block the usage of `a1`.
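The blocking effect can be reproduced with ordinary UDP sockets. The following is only a sketch: `holder` is a plain userspace socket standing in for the kernel socket behind `wg0`, the helper `port_is_blocked` is a made-up name, and the loopback address is used only to keep the example local.

```python
import errno
import socket


def port_is_blocked(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if binding a UDP socket to (host, port) fails with EADDRINUSE."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        probe.bind((host, port))
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return True
        raise
    finally:
        probe.close()
    return False


# Stand-in for the socket that backs `wg0` (assumption: a normal UDP socket).
holder = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
holder.bind(("127.0.0.1", 0))  # let the kernel pick a free port
taken_port = holder.getsockname()[1]

# While the port is held, `a1` would not be able to bind to it.
blocked_while_held = port_is_blocked(taken_port)

# Once the holder goes away, the port is available again.
holder.close()
blocked_after_release = port_is_blocked(taken_port)
```

In the scenario above, `h1` controls when the port is held, so `a1` has no reliable way to wait it out.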
`h1` cannot gain access to the traffic of `a1` unless `a1` was also intending
to use this port for a Wireguard device `wg1` and the private key of `wg1` is
already known to `h1`.
Therefore: Should setting the listen-port require some capability in the
network namespace of the socket?
setns(<fd>, CLONE_NEWNET) requires CAP_SYS_ADMIN in the user namespace of <fd>.
This leaves getting access to <fd>:
* If the namespace of <fd> is mounted somewhere in the file system, then `h1`
might be able to open that file without any additional requirements. (If the
namespace was created via ip(1), then it is mounted at /run/netns/<name>.)
* If /proc is mounted, opening /proc/<pid>/ns/net seems to require
CAP_SYS_PTRACE in /proc/<pid>/ns/user.
I don't know if there are any other ways to get access to <fd> without help
from outside `c1`.
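To check which of these handles a process can actually open, something like the following could be run inside `c1`. This is a rough sketch: the function names are made up for illustration, and /run/netns is only the location used by ip(1). Holding such a descriptor is not sufficient on its own, because setns still needs CAP_SYS_ADMIN as described above.

```python
import os


def candidate_netns_paths(pid: int, netns_dir: str = "/run/netns") -> list:
    """List the paths through which a network namespace handle might be opened."""
    paths = [f"/proc/{pid}/ns/net"]
    try:
        names = sorted(os.listdir(netns_dir))
    except OSError:
        names = []  # directory missing or unreadable
    paths.extend(os.path.join(netns_dir, name) for name in names)
    return paths


def openable_netns_paths(pid: int) -> list:
    """Return the candidate paths that this process is able to open read-only."""
    found = []
    for path in candidate_netns_paths(pid):
        try:
            fd = os.open(path, os.O_RDONLY)
        except OSError:
            continue  # no /proc, missing privileges, or not present
        os.close(fd)
        found.append(path)
    return found


# Checking our own pid; `h1` would try the pids of processes outside `c1`.
paths = openable_netns_paths(os.getpid())
```

An empty result for foreign pids and for /run/netns would be consistent with the isolation discussed below, but it does not rule out other ways of obtaining <fd>.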
Running `c1` in a separate mount namespace with a separate / mount and a
separate pid namespace might be sufficient to prevent access to <fd>. This is
the case for Docker containers. Someone with a better overview of the
namespace model might be able to answer the following question: Are network
namespaces already considered insecure unless combined with mount and pid
namespaces? (A mount namespace is probably required to hide /sys/class/net.)
setns(<fd>, CLONE_NEWNET) is sufficient to create a socket in that network
namespace but also gives the caller other abilities in that namespace.
A less-powerful capability might therefore be better in the case of
PS: This problem becomes more severe with the transit-net series because it
allows `h1` to create the socket in any network namespace it can refer to
via process id or file descriptor. Note that the CAP_SYS_PTRACE
requirement does not apply in this case because `h1` does not have to open
This allows `h1` to gain network access by creating a Wireguard device and
moving the transit namespace to PID 1.
Once again, this might not be a problem in containers with separate mount
namespaces, separate / mounts, and separate pid namespaces.
Since the selling point of the transit-net series is enabling users to
create working Wireguard devices without capabilities, the series will have
to be reworked as follows:
The caller has to prove that he already had access to the intended transit
namespace. This can be done by passing matching UDP sockets into the
kernel. Unfortunately this makes it harder to use this feature from Bash
I'll add this requirement in v2.
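As a userspace illustration of what such a check might look at, the sketch below tests whether a socket handed over by the caller is a UDP socket bound to the expected port. This is not the planned kernel code: the function name is hypothetical, and reading "matching" as "UDP, IPv4 or IPv6, bound to the listen-port" is an assumption.

```python
import socket


def looks_like_matching_udp_socket(sock: socket.socket, expected_port: int) -> bool:
    """Hypothetical check: is `sock` a UDP socket bound to `expected_port`?"""
    # Must be a datagram socket ...
    if sock.getsockopt(socket.SOL_SOCKET, socket.SO_TYPE) != socket.SOCK_DGRAM:
        return False
    # ... of an IP address family ...
    if sock.family not in (socket.AF_INET, socket.AF_INET6):
        return False
    # ... bound to the port the device is supposed to listen on.
    return sock.getsockname()[1] == expected_port


# The caller creates the socket inside the namespace it already has access to.
udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp.bind(("127.0.0.1", 0))
port = udp.getsockname()[1]

tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

accepted = looks_like_matching_udp_socket(udp, port)
rejected_wrong_type = looks_like_matching_udp_socket(tcp, port)
rejected_wrong_port = looks_like_matching_udp_socket(udp, port + 1)

udp.close()
tcp.close()
```

The point of the design is that a socket can only be created by a process that is already inside the namespace, so possession of the socket serves as the proof of access.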
Regarding the CAP_SYS_ADMIN requirement of setns: Not described in the man page. But see netns_install: