Re: Unexpected experience of site-to-site wireguard tunneling‏‏

حامد صابر hsaber at gmail.com
Sat Sep 4 16:55:15 UTC 2021


UPDATE:
During the outage, I ran a simple test to check if both sides have
access to each other's specified UDP ports, then I found something
interesting:
Some 148 bytes length packets are transferred between parties which I
can't recognize what are them.
By the way here I copy my test results.
Anybody can spot what's going on here?
The middle-node host name which runs wg1 is ir-pp and the exit-note
(to free internet) which runs wg2 is sf-do:
(these two parts happened at the same time and are copied from 2
different servers):

========================
root at ir-pp:~# nc -u sf-do 50840
123456
1234
12
^C
root at ir-pp:~# tcpdump 'port 50841'
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
20:56:57.607342 IP ir-pp.50841 > sf-do.50840: UDP, length 148
20:57:01.042471 IP sf-do.43161 > ir-pp.50841: UDP, length 3
20:57:02.361827 IP ir-pp.50841 > sf-do.50840: UDP, length 148
20:57:03.635754 IP sf-do.43161 > ir-pp.50841: UDP, length 5
20:57:06.740922 IP sf-do.43161 > ir-pp.50841: UDP, length 7
20:57:07.552305 IP ir-pp.50841 > sf-do.50840: UDP, length 148
^C
6 packets captured
15 packets received by filter
0 packets dropped by kernel
root at ir-pp:~#


========================
root at sf-do:~# tcpdump 'port 50840'
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
20:56:09.126958 IP sf-do.50840 > ir-pp.50841: UDP, length 148
20:56:14.758974 IP sf-do.50840 > ir-pp.50841: UDP, length 148
20:56:18.222814 IP ir-pp.38136 > sf-do.50840: UDP, length 7
20:56:20.391041 IP sf-do.50840 > ir-pp.50841: UDP, length 148
20:56:22.307488 IP ir-pp.38136 > sf-do.50840: UDP, length 5
20:56:24.702590 IP ir-pp.38136 > sf-do.50840: UDP, length 3
20:56:25.510985 IP sf-do.50840 > ir-pp.50841: UDP, length 148
20:56:30.630917 IP sf-do.50840 > ir-pp.50841: UDP, length 148
20:56:35.750965 IP sf-do.50840 > ir-pp.50841: UDP, length 148
^C
9 packets captured
9 packets received by filter
0 packets dropped by kernel
root at sf-do:~# nc -u ir-pp 50841
12
1234
123456
^C
root at sf-do:~#

‫حامد صابر <‪hsaber at gmail.com‬‏> در تاریخ جمعه ۳ سپتامبر ۲۰۲۱ ساعت ۸:۳۹ نوشت:‬
>
> Hi again,
> Something new happened which makes me more confused.
> I wrote a small shell-script to check the connection between wg1 and
> wg2, so whenever it drops, the script restarts the wg1 and everything
> comes back.
> Yes yes I don't like this way of addressing issues, but what could I
> do if no meaningful debug information exists, and I predict that it
> might be a bug of Wireguard itself?
> BTW this system worked fine and the anti-censorship VPN chain was up
> and running till this morning.
> The connection went down at 7:49 and didn't come back with
> auto-restart. The process of restarting continued for about 6 minutes,
> and at last at 7:55 it came back.
> During the outage I checked both sides and everything seemed fine.
> Sides could ping the public ip of each other, their wg udp ports were
> accessible to each other, and even handshake seemed to be finished
> correctly (using wg-show command) but peers couldn't ping each other.
> And the most confusing part is everything came back to life without
> any action from my side. Just after 6 minutes of continuously
> restarting the wg1 interface!
> Isn't there a bug? Somebody please help me to debug this problem and
> find the cause.
>
> Here I bring you my shell-script code, and then its related log:
>
> -------------------------
> #!/bin/bash
>
> exec >>/var/log/wg-ping 2>&1
> while true
> do
>         connection=$(ping -c 1 10.10.10.1)
>         time=$(date +%H:%M)
>         seconds=$(date +%S)
>         seconds=${seconds#0}
>         if [[ "$connection" != *"icmp"* ]]; then
>           echo " "
>           date
>           wg-quick down wg1
>           echo " "
>           wg-quick up wg1
>           connection=$(ping -c 1 10.10.10.1)
>           time=$(date +%T)
>           if [[ "$connection" != *"icmp"* ]]; then
>             echo "$time ERROR"
>           else
>             echo "$time OK"
>             echo " "
>           fi
>         elif [[ $seconds -lt 5  ]]; then
>           echo "$time OK"
>         fi
>         sleep 5
> done
> ===============
> Sample log of simply restarting the wg1 which makes everything fine
> (and happens every few hours):
> -----------------------
> 01:17 OK
> 01:18 OK
> 01:19 OK
> 01:20 OK
> 01:21 OK
> 01:22 OK
> 01:23 OK
> 01:24 OK
> 01:25 OK
> 01:26 OK
>
> Fri Sep  3 01:26:41 +0430 2021
> [#] ip route del default dev wg1 table middle
> [#] ip rule del iif wg0 lookup middle
> [#] ip link delete dev wg1
>
> [#] ip link add wg1 type wireguard
> [#] wg setconf wg1 /dev/fd/63
> [#] ip -4 address add 10.10.10.2/32 dev wg1
> [#] ip link set mtu 1420 up dev wg1
> [#] ip -4 route add 10.10.10.1/32 dev wg1
> [#] ip route add default dev wg1 table middle
> [#] ip rule add iif wg0 lookup middle
> [#] wg set wg1 peer <public key> allowed-ips 0.0.0.0/0
> 01:26:42 OK
>
> 01:27 OK
> 01:28 OK
> 01:30 OK
> 01:31 OK
> 01:32 OK
> 01:33 OK
> 01:34 OK
> =====================================================
> And the log for the confusing situation I explained:
> -----------------------
> 07:45 OK
> 07:46 OK
> 07:47 OK
> 07:48 OK
> 07:49 OK
>
> Fri Sep  3 07:49:42 +0430 2021
> [#] ip route del default dev wg1 table middle
> [#] ip rule del iif wg0 lookup middle
> [#] ip link delete dev wg1
>
> [#] ip link add wg1 type wireguard
> [#] wg setconf wg1 /dev/fd/63
> [#] ip -4 address add 10.10.10.2/32 dev wg1
> [#] ip link set mtu 1420 up dev wg1
> [#] ip -4 route add 10.10.10.1/32 dev wg1
> [#] ip route add default dev wg1 table middle
> [#] ip rule add iif wg0 lookup middle
> [#] wg set wg1 peer <public key> allowed-ips 0.0.0.0/0
> 07:49:53 ERROR
>
> Fri Sep  3 07:50:08 +0430 2021
> [#] ip route del default dev wg1 table middle
> [#] ip rule del iif wg0 lookup middle
> [#] ip link delete dev wg1
>
> [#] ip link add wg1 type wireguard
> [#] wg setconf wg1 /dev/fd/63
> [#] ip -4 address add 10.10.10.2/32 dev wg1
> [#] ip link set mtu 1420 up dev wg1
> [#] ip -4 route add 10.10.10.1/32 dev wg1
> [#] ip route add default dev wg1 table middle
> [#] ip rule add iif wg0 lookup middle
> [#] wg set wg1 peer <public key> allowed-ips 0.0.0.0/0
> 07:50:18 ERROR
>
> ======
> LOTS OF RETRY LOGS CROPPED
> ======
>
> Fri Sep  3 07:55:13 +0430 2021
> [#] ip route del default dev wg1 table middle
> [#] ip rule del iif wg0 lookup middle
> [#] ip link delete dev wg1
>
> [#] ip link add wg1 type wireguard
> [#] wg setconf wg1 /dev/fd/63
> [#] ip -4 address add 10.10.10.2/32 dev wg1
> [#] ip link set mtu 1420 up dev wg1
> [#] ip -4 route add 10.10.10.1/32 dev wg1
> [#] ip route add default dev wg1 table middle
> [#] ip rule add iif wg0 lookup middle
> [#] wg set wg1 peer <public key> allowed-ips 0.0.0.0/0
> 07:55:23 ERROR
>
> Fri Sep  3 07:55:38 +0430 2021
> [#] ip route del default dev wg1 table middle
> [#] ip rule del iif wg0 lookup middle
> [#] ip link delete dev wg1
>
> [#] ip link add wg1 type wireguard
> [#] wg setconf wg1 /dev/fd/63
> [#] ip -4 address add 10.10.10.2/32 dev wg1
> [#] ip link set mtu 1420 up dev wg1
> [#] ip -4 route add 10.10.10.1/32 dev wg1
> [#] ip route add default dev wg1 table middle
> [#] ip rule add iif wg0 lookup middle
> [#] wg set wg1 peer <public key> allowed-ips 0.0.0.0/0
> 07:55:39 OK
>
> 07:56 OK
> 07:57 OK
> 07:58 OK
> 07:59 OK
> 08:00 OK
> ---------------------------------------
>
>
>
> ‫حامد صابر <‪hsaber at gmail.com‬‏> در تاریخ پنجشنبه ۲ سپتامبر ۲۰۲۱ ساعت
> ۸:۴۷ نوشت:‬
> >
> > -Thanks for reply.
> > For test reasons, I turned the firewall off on my middle-node server.
> > But I can't understand how this issue may be related to firewall,
> > because most of the time it works (and to me it means firewall is Ok),
> > but sometime for some unknown reason it stops working, which when I
> > restart the wg1 interface with this command everything comes back to
> > life:
> > wg-quick down wg1 && wg-quick up wg1
> >
> > BTW here it is the firewall status of middle-node:
> > ==================
> > ● firewalld.service - firewalld - dynamic firewall daemon
> >    Loaded: loaded (/usr/lib/systemd/system/firewalld.service;
> > disabled; vendor preset: enabled)
> >    Active: inactive (dead)
> >      Docs: man:firewalld(1)
> > -----------------------------------
> > Chain INPUT (policy ACCEPT)
> > target     prot opt source               destination
> >
> > Chain FORWARD (policy ACCEPT)
> > target     prot opt source               destination
> >
> > Chain OUTPUT (policy ACCEPT)
> > target     prot opt source               destination
> > -----------------------------------
> >
> > And the firewall status of exit-node:
> > ====================
> > Unit firewalld.service could not be found.
> > ---------------------------
> > Chain INPUT (policy ACCEPT)
> > target     prot opt source               destination
> > ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0            udp dpt:50842
> > ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0            udp dpt:53
> >
> > Chain FORWARD (policy ACCEPT)
> > target     prot opt source               destination
> > ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0
> > ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0
> >
> > Chain OUTPUT (policy ACCEPT)
> > target     prot opt source               destination
> > ---------------------
> > Regards
> >
> > ‫‪John Lauro‬‏ <‪johnalauro at gmail.com‬‏> در تاریخ چهارشنبه ۱ سپتامبر
> > ۲۰۲۱ ساعت ۲۱:۵۱ نوشت:‬
> > >
> > > Just a guess, but I would be suspicious about connection tracking causing the issue.  What are your firewall rules?
> > >
> > > ‪On Wed, Sep 1, 2021 at 9:51 AM ‫حامد صابر‬‎ <hsaber at gmail.com> wrote:‬
> > >>
> > >> Dear friends,
> > >> I have configured 3 wireguard interfaces on 2 servers to act as a
> > >> chained VPN for me (to bypass the internet censorship in my country),
> > >> with this schema:
> > >>
> > >> client -- wg0 on middle-node -- wg1 on middle node -- wg2 on exit node
> > >> (to free internet)
> > >>
> > >> Everything works fine, but after a while, the connection between wg1
> > >> and wg2 drops and I can't find the reason. The connection comes back
> > >> to action by simply switching the wg1 down and up again using
> > >> wg-quick. And the amazing behaviour is that sometimes the connection
> > >> comes back to work automatically  after some random time passes,
> > >> without any actions from my side (sometimes after a few tens of
> > >> minutes, sometimes after a few hours).
> > >> When the wg1-wg2 connection is not working, anything else between 2
> > >> servers (middle-node and exit-node) works fine. I mean I can ping the
> > >> public IP of each server from another part, but the local wireguard ip
> > >> of none of them are accessible.
> > >>
> > >> I tried to monitor the situation and read the logs but couldn't find
> > >> what is happening here, so please help!
> > >>
> > >> The configuration:
> > >> ======================
> > >>
> > >> client (my mobile phone):
> > >> -------------------------------------------
> > >> [Interface]
> > >> Address = 10.10.20.2/32
> > >> PrivateKey =  <private key of client>
> > >> DNS = 10.10.10.1
> > >>
> > >> ### Middle Node
> > >> [Peer]
> > >> PublicKey =  <public key of wg0>
> > >> PresharedKey =  <preshared key>
> > >> AllowedIPs = 0.0.0.0/0
> > >> Endpoint = middle-node:50842
> > >> ======================
> > >>
> > >> wg0 (in middle-node server):
> > >> -------------------------------------------
> > >> [Interface]
> > >> Address = 10.10.20.1/24
> > >> ListenPort = 50842
> > >> PrivateKey =  <private key of wg0>
> > >>
> > >> ### Client
> > >> [Peer]
> > >> PublicKey =  <public key of client>
> > >> PresharedKey =  <preshared key>
> > >> AllowedIPs = 10.10.20.2/32
> > >> ======================
> > >>
> > >> wg1 (again in middle-node server):
> > >> -------------------------------------------
> > >> [Interface]
> > >> Address = 10.10.10.2/32
> > >> PrivateKey =  <private key of wg1>
> > >>
> > >> PostUp = ip route add default dev wg1 table middle
> > >> PostUp = ip rule add iif wg0 lookup middle
> > >> PostUp = wg set wg1 peer <publickey of wg2 (in exit-node)> allowed-ips 0.0.0.0/0
> > >>
> > >> PreDown = ip route del default dev wg1 table middle
> > >> PreDown = ip rule del iif wg0 lookup middle
> > >>
> > >> ### Exit Node
> > >> [Peer]
> > >> PublicKey =  <publickey of wg2 (in exit-node)>
> > >> PresharedKey =  <preshared key>
> > >> AllowedIPs = 10.10.10.1/32
> > >> Endpoint = exit-node:50842
> > >> PersistentKeepalive = 25
> > >> ======================
> > >>
> > >> wg2 (in exit-node server):
> > >> -------------------------------------------
> > >> [Interface]
> > >> Address = 10.10.10.1/24
> > >> ListenPort = 50842
> > >> PrivateKey =  <private key of wg2>
> > >>
> > >> PostUp   = iptables -A FORWARD -i eth0 -o wg2 -j ACCEPT
> > >> PostUp   = iptables -A FORWARD -i wg2 -j ACCEPT
> > >> PostUp   = iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
> > >>
> > >> PostDown = iptables -D FORWARD -i eth0 -o wg2 -j ACCEPT
> > >> PostDown = iptables -D FORWARD -i wg2 -j ACCEPT
> > >> PostDown = iptables -t nat -D POSTROUTING -o eth0 -j MASQUERADE
> > >>
> > >> ### Middle Node
> > >> [Peer]
> > >> PublicKey =  <publickey of wg1 (in middle-node)>
> > >> PresharedKey =  <preshared key>
> > >> AllowedIPs = 10.0.0.0/8
> > >> ======================
> > >> ======================
> > >> ======================
> > >>
> > >> Sample log of dmesg when the wg1-wg2 connection is not working:
> > >> -------------------------------------------
> > >> [Wed Sep  1 11:19:32 2021] wireguard: wg1: Sending keepalive packet to
> > >> peer 12 (~exit-node-ip~:50842)
> > >> [Wed Sep  1 11:19:44 2021] wireguard: wg0: Sending keepalive packet to
> > >> peer 8 (~client-ip~:65323)
> > >> [Wed Sep  1 11:19:44 2021] wireguard: wg1: Receiving keepalive packet
> > >> from peer 12 (~exit-node-ip~:50842)
> > >> [Wed Sep  1 11:20:09 2021] wireguard: wg0: Receiving handshake
> > >> initiation from peer 8 (~client-ip~:65323)
> > >> [Wed Sep  1 11:20:09 2021] wireguard: wg0: Sending handshake response
> > >> to peer 8 (~client-ip~:65323)
> > >> [Wed Sep  1 11:20:09 2021] wireguard: wg0: Keypair 2867 destroyed for peer 8
> > >> [Wed Sep  1 11:20:09 2021] wireguard: wg0: Keypair 2871 created for peer 8
> > >> [Wed Sep  1 11:20:09 2021] wireguard: wg0: Receiving keepalive packet
> > >> from peer 8 (~client-ip~:65323)
> > >> [Wed Sep  1 11:21:19 2021] wireguard: wg0: Sending keepalive packet to
> > >> peer 8 (~client-ip~:65323)
> > >> [Wed Sep  1 11:21:24 2021] wireguard: wg1: Retrying handshake with
> > >> peer 12 (~exit-node-ip~:50842) because we stopped hearing back after
> > >> 15 seconds
> > >> [Wed Sep  1 11:21:24 2021] wireguard: wg1: Sending handshake
> > >> initiation to peer 12 (~exit-node-ip~:50842)
> > >> [Wed Sep  1 11:21:30 2021] wireguard: wg1: Handshake for peer 12
> > >> (~exit-node-ip~:50842) did not complete after 5 seconds, retrying (try
> > >> 2)
> > >> ======================
> > >>
> > >> Sample log of dmesg when the wg1-wg2 connection is coming back using
> > >> manual restart:
> > >> -------------------------------------------
> > >> [Wed Sep  1 11:45:52 2021] wireguard: wg1: Sending handshake
> > >> initiation to peer 12 (~exit-node-ip~:50842)
> > >> [Wed Sep  1 11:45:52 2021] wireguard: wg0: Sending keepalive packet to
> > >> peer 8 (~client-ip~:2335)
> > >> [Wed Sep  1 11:45:58 2021] wireguard: wg1: Handshake for peer 12
> > >> (~exit-node-ip~:50842) did not complete after 5 seconds, retrying (try
> > >> 3)
> > >> [Wed Sep  1 11:45:58 2021] wireguard: wg1: Sending handshake
> > >> initiation to peer 12 (~exit-node-ip~:50842)
> > >> [Wed Sep  1 11:45:58 2021] wireguard: wg1: Keypair 2878 destroyed for peer 12
> > >> [Wed Sep  1 11:45:58 2021] wireguard: wg1: Peer 12
> > >> (~exit-node-ip~:50842) destroyed
> > >> [Wed Sep  1 11:45:58 2021] wireguard: wg1: Interface destroyed
> > >> [Wed Sep  1 11:45:58 2021] wireguard: wg1: Interface created
> > >> [Wed Sep  1 11:45:58 2021] wireguard: wg1: Peer 13 created
> > >> [Wed Sep  1 11:45:58 2021] wireguard: wg1: Sending keepalive packet to
> > >> peer 13 (~exit-node-ip~:50842)
> > >> [Wed Sep  1 11:45:58 2021] wireguard: wg1: Sending handshake
> > >> initiation to peer 13 (~exit-node-ip~:50842)
> > >> [Wed Sep  1 11:45:58 2021] wireguard: wg1: Receiving handshake
> > >> response from peer 13 (~exit-node-ip~:50842)
> > >> [Wed Sep  1 11:45:58 2021] wireguard: wg1: Keypair 2881 created for peer 13
> > >> [Wed Sep  1 11:46:12 2021] wireguard: wg0: Receiving keepalive packet
> > >> from peer 8 (~client-ip~:2335)
> > >> [Wed Sep  1 11:46:14 2021] wireguard: wg1: Receiving keepalive packet
> > >> from peer 13 (~exit-node-ip~:50842)
> > >> [Wed Sep  1 11:46:27 2021] wireguard: wg0: Sending keepalive packet to
> > >> peer 8 (~client-ip~:2335)
> > >> [Wed Sep  1 11:46:28 2021] wireguard: wg1: Receiving keepalive packet
> > >> from peer 13 (~exit-node-ip~:50842)
> > >> [Wed Sep  1 11:46:52 2021] wireguard: wg1: Receiving keepalive packet
> > >> from peer 13 (~exit-node-ip~:50842)
> > >>
> > >>
> > >> Thanks in advance for your kind help


More information about the WireGuard mailing list