Re: Unexpected experience of site-to-site wireguard tunneling‏‏

حامد صابر hsaber at gmail.com
Fri Sep 3 04:09:16 UTC 2021


Hi again,
Something new happened which makes me more confused.
I wrote a small shell-script to check the connection between wg1 and
wg2, so whenever it drops, the script restarts the wg1 and everything
comes back.
Yes yes I don't like this way of addressing issues, but what could I
do if no meaningful debug information exists, and I predict that it
might be a bug of Wireguard itself?
BTW this system worked fine and the anti-censorship VPN chain was up
and running till this morning.
The connection went down at 7:49 and didn't come back with
auto-restart. The process of restarting continued for about 6 minutes,
and at last at 7:55 it came back.
During the outage I checked both sides and everything seemed fine.
Sides could ping the public ip of each other, their wg udp ports were
accessible to each other, and even handshake seemed to be finished
correctly (using wg-show command) but peers couldn't ping each other.
And the most confusing part is everything came back to life without
any action from my side. Just after 6 minutes of continuously
restarting the wg1 interface!
Isn't there a bug? Somebody please help me to debug this problem and
find the cause.

Here I bring you my shell-script code, and then its related log:

-------------------------
#!/bin/bash

exec >>/var/log/wg-ping 2>&1
while true
do
        connection=$(ping -c 1 10.10.10.1)
        time=$(date +%H:%M)
        seconds=$(date +%S)
        seconds=${seconds#0}
        if [[ "$connection" != *"icmp"* ]]; then
          echo " "
          date
          wg-quick down wg1
          echo " "
          wg-quick up wg1
          connection=$(ping -c 1 10.10.10.1)
          time=$(date +%T)
          if [[ "$connection" != *"icmp"* ]]; then
            echo "$time ERROR"
          else
            echo "$time OK"
            echo " "
          fi
        elif [[ $seconds -lt 5  ]]; then
          echo "$time OK"
        fi
        sleep 5
done
===============
Sample log of simply restarting the wg1 which makes everything fine
(and happens every few hours):
-----------------------
01:17 OK
01:18 OK
01:19 OK
01:20 OK
01:21 OK
01:22 OK
01:23 OK
01:24 OK
01:25 OK
01:26 OK

Fri Sep  3 01:26:41 +0430 2021
[#] ip route del default dev wg1 table middle
[#] ip rule del iif wg0 lookup middle
[#] ip link delete dev wg1

[#] ip link add wg1 type wireguard
[#] wg setconf wg1 /dev/fd/63
[#] ip -4 address add 10.10.10.2/32 dev wg1
[#] ip link set mtu 1420 up dev wg1
[#] ip -4 route add 10.10.10.1/32 dev wg1
[#] ip route add default dev wg1 table middle
[#] ip rule add iif wg0 lookup middle
[#] wg set wg1 peer <public key> allowed-ips 0.0.0.0/0
01:26:42 OK

01:27 OK
01:28 OK
01:30 OK
01:31 OK
01:32 OK
01:33 OK
01:34 OK
=====================================================
And the log for the confusing situation I explained:
-----------------------
07:45 OK
07:46 OK
07:47 OK
07:48 OK
07:49 OK

Fri Sep  3 07:49:42 +0430 2021
[#] ip route del default dev wg1 table middle
[#] ip rule del iif wg0 lookup middle
[#] ip link delete dev wg1

[#] ip link add wg1 type wireguard
[#] wg setconf wg1 /dev/fd/63
[#] ip -4 address add 10.10.10.2/32 dev wg1
[#] ip link set mtu 1420 up dev wg1
[#] ip -4 route add 10.10.10.1/32 dev wg1
[#] ip route add default dev wg1 table middle
[#] ip rule add iif wg0 lookup middle
[#] wg set wg1 peer <public key> allowed-ips 0.0.0.0/0
07:49:53 ERROR

Fri Sep  3 07:50:08 +0430 2021
[#] ip route del default dev wg1 table middle
[#] ip rule del iif wg0 lookup middle
[#] ip link delete dev wg1

[#] ip link add wg1 type wireguard
[#] wg setconf wg1 /dev/fd/63
[#] ip -4 address add 10.10.10.2/32 dev wg1
[#] ip link set mtu 1420 up dev wg1
[#] ip -4 route add 10.10.10.1/32 dev wg1
[#] ip route add default dev wg1 table middle
[#] ip rule add iif wg0 lookup middle
[#] wg set wg1 peer <public key> allowed-ips 0.0.0.0/0
07:50:18 ERROR

======
LOTS OF RETRY LOGS CROPPED
======

Fri Sep  3 07:55:13 +0430 2021
[#] ip route del default dev wg1 table middle
[#] ip rule del iif wg0 lookup middle
[#] ip link delete dev wg1

[#] ip link add wg1 type wireguard
[#] wg setconf wg1 /dev/fd/63
[#] ip -4 address add 10.10.10.2/32 dev wg1
[#] ip link set mtu 1420 up dev wg1
[#] ip -4 route add 10.10.10.1/32 dev wg1
[#] ip route add default dev wg1 table middle
[#] ip rule add iif wg0 lookup middle
[#] wg set wg1 peer <public key> allowed-ips 0.0.0.0/0
07:55:23 ERROR

Fri Sep  3 07:55:38 +0430 2021
[#] ip route del default dev wg1 table middle
[#] ip rule del iif wg0 lookup middle
[#] ip link delete dev wg1

[#] ip link add wg1 type wireguard
[#] wg setconf wg1 /dev/fd/63
[#] ip -4 address add 10.10.10.2/32 dev wg1
[#] ip link set mtu 1420 up dev wg1
[#] ip -4 route add 10.10.10.1/32 dev wg1
[#] ip route add default dev wg1 table middle
[#] ip rule add iif wg0 lookup middle
[#] wg set wg1 peer <public key> allowed-ips 0.0.0.0/0
07:55:39 OK

07:56 OK
07:57 OK
07:58 OK
07:59 OK
08:00 OK
---------------------------------------



‫حامد صابر <‪hsaber at gmail.com‬‏> در تاریخ پنجشنبه ۲ سپتامبر ۲۰۲۱ ساعت
۸:۴۷ نوشت:‬
>
> -Thanks for reply.
> For test reasons, I turned the firewall off on my middle-node server.
> But I can't understand how this issue may be related to firewall,
> because most of the time it works (and to me it means firewall is Ok),
> but sometime for some unknown reason it stops working, which when I
> restart the wg1 interface with this command everything comes back to
> life:
> wg-quick down wg1 && wg-quick up wg1
>
> BTW here it is the firewall status of middle-node:
> ==================
> ● firewalld.service - firewalld - dynamic firewall daemon
>    Loaded: loaded (/usr/lib/systemd/system/firewalld.service;
> disabled; vendor preset: enabled)
>    Active: inactive (dead)
>      Docs: man:firewalld(1)
> -----------------------------------
> Chain INPUT (policy ACCEPT)
> target     prot opt source               destination
>
> Chain FORWARD (policy ACCEPT)
> target     prot opt source               destination
>
> Chain OUTPUT (policy ACCEPT)
> target     prot opt source               destination
> -----------------------------------
>
> And the firewall status of exit-node:
> ====================
> Unit firewalld.service could not be found.
> ---------------------------
> Chain INPUT (policy ACCEPT)
> target     prot opt source               destination
> ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0            udp dpt:50842
> ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0            udp dpt:53
>
> Chain FORWARD (policy ACCEPT)
> target     prot opt source               destination
> ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0
> ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0
>
> Chain OUTPUT (policy ACCEPT)
> target     prot opt source               destination
> ---------------------
> Regards
>
> ‫‪John Lauro‬‏ <‪johnalauro at gmail.com‬‏> در تاریخ چهارشنبه ۱ سپتامبر
> ۲۰۲۱ ساعت ۲۱:۵۱ نوشت:‬
> >
> > Just a guess, but I would be suspicious about connection tracking causing the issue.  What are your firewall rules?
> >
> > ‪On Wed, Sep 1, 2021 at 9:51 AM ‫حامد صابر‬‎ <hsaber at gmail.com> wrote:‬
> >>
> >> Dear friends,
> >> I have configured 3 wireguard interfaces on 2 servers to act as a
> >> chained VPN for me (to bypass the internet censorship in my country),
> >> with this schema:
> >>
> >> client -- wg0 on middle-node -- wg1 on middle node -- wg2 on exit node
> >> (to free internet)
> >>
> >> Everything works fine, but after a while, the connection between wg1
> >> and wg2 drops and I can't find the reason. The connection comes back
> >> to action by simply switching the wg1 down and up again using
> >> wg-quick. And the amazing behaviour is that sometimes the connection
> >> comes back to work automatically  after some random time passes,
> >> without any actions from my side (sometimes after a few tens of
> >> minutes, sometimes after a few hours).
> >> When the wg1-wg2 connection is not working, anything else between 2
> >> servers (middle-node and exit-node) works fine. I mean I can ping the
> >> public IP of each server from another part, but the local wireguard ip
> >> of none of them are accessible.
> >>
> >> I tried to monitor the situation and read the logs but couldn't find
> >> what is happening here, so please help!
> >>
> >> The configuration:
> >> ======================
> >>
> >> client (my mobile phone):
> >> -------------------------------------------
> >> [Interface]
> >> Address = 10.10.20.2/32
> >> PrivateKey =  <private key of client>
> >> DNS = 10.10.10.1
> >>
> >> ### Middle Node
> >> [Peer]
> >> PublicKey =  <public key of wg0>
> >> PresharedKey =  <preshared key>
> >> AllowedIPs = 0.0.0.0/0
> >> Endpoint = middle-node:50842
> >> ======================
> >>
> >> wg0 (in middle-node server):
> >> -------------------------------------------
> >> [Interface]
> >> Address = 10.10.20.1/24
> >> ListenPort = 50842
> >> PrivateKey =  <private key of wg0>
> >>
> >> ### Client
> >> [Peer]
> >> PublicKey =  <public key of client>
> >> PresharedKey =  <preshared key>
> >> AllowedIPs = 10.10.20.2/32
> >> ======================
> >>
> >> wg1 (again in middle-node server):
> >> -------------------------------------------
> >> [Interface]
> >> Address = 10.10.10.2/32
> >> PrivateKey =  <private key of wg1>
> >>
> >> PostUp = ip route add default dev wg1 table middle
> >> PostUp = ip rule add iif wg0 lookup middle
> >> PostUp = wg set wg1 peer <publickey of wg2 (in exit-node)> allowed-ips 0.0.0.0/0
> >>
> >> PreDown = ip route del default dev wg1 table middle
> >> PreDown = ip rule del iif wg0 lookup middle
> >>
> >> ### Exit Node
> >> [Peer]
> >> PublicKey =  <publickey of wg2 (in exit-node)>
> >> PresharedKey =  <preshared key>
> >> AllowedIPs = 10.10.10.1/32
> >> Endpoint = exit-node:50842
> >> PersistentKeepalive = 25
> >> ======================
> >>
> >> wg2 (in exit-node server):
> >> -------------------------------------------
> >> [Interface]
> >> Address = 10.10.10.1/24
> >> ListenPort = 50842
> >> PrivateKey =  <private key of wg2>
> >>
> >> PostUp   = iptables -A FORWARD -i eth0 -o wg2 -j ACCEPT
> >> PostUp   = iptables -A FORWARD -i wg2 -j ACCEPT
> >> PostUp   = iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
> >>
> >> PostDown = iptables -D FORWARD -i eth0 -o wg2 -j ACCEPT
> >> PostDown = iptables -D FORWARD -i wg2 -j ACCEPT
> >> PostDown = iptables -t nat -D POSTROUTING -o eth0 -j MASQUERADE
> >>
> >> ### Middle Node
> >> [Peer]
> >> PublicKey =  <publickey of wg1 (in middle-node)>
> >> PresharedKey =  <preshared key>
> >> AllowedIPs = 10.0.0.0/8
> >> ======================
> >> ======================
> >> ======================
> >>
> >> Sample log of dmesg when the wg1-wg2 connection is not working:
> >> -------------------------------------------
> >> [Wed Sep  1 11:19:32 2021] wireguard: wg1: Sending keepalive packet to
> >> peer 12 (~exit-node-ip~:50842)
> >> [Wed Sep  1 11:19:44 2021] wireguard: wg0: Sending keepalive packet to
> >> peer 8 (~client-ip~:65323)
> >> [Wed Sep  1 11:19:44 2021] wireguard: wg1: Receiving keepalive packet
> >> from peer 12 (~exit-node-ip~:50842)
> >> [Wed Sep  1 11:20:09 2021] wireguard: wg0: Receiving handshake
> >> initiation from peer 8 (~client-ip~:65323)
> >> [Wed Sep  1 11:20:09 2021] wireguard: wg0: Sending handshake response
> >> to peer 8 (~client-ip~:65323)
> >> [Wed Sep  1 11:20:09 2021] wireguard: wg0: Keypair 2867 destroyed for peer 8
> >> [Wed Sep  1 11:20:09 2021] wireguard: wg0: Keypair 2871 created for peer 8
> >> [Wed Sep  1 11:20:09 2021] wireguard: wg0: Receiving keepalive packet
> >> from peer 8 (~client-ip~:65323)
> >> [Wed Sep  1 11:21:19 2021] wireguard: wg0: Sending keepalive packet to
> >> peer 8 (~client-ip~:65323)
> >> [Wed Sep  1 11:21:24 2021] wireguard: wg1: Retrying handshake with
> >> peer 12 (~exit-node-ip~:50842) because we stopped hearing back after
> >> 15 seconds
> >> [Wed Sep  1 11:21:24 2021] wireguard: wg1: Sending handshake
> >> initiation to peer 12 (~exit-node-ip~:50842)
> >> [Wed Sep  1 11:21:30 2021] wireguard: wg1: Handshake for peer 12
> >> (~exit-node-ip~:50842) did not complete after 5 seconds, retrying (try
> >> 2)
> >> ======================
> >>
> >> Sample log of dmesg when the wg1-wg2 connection is coming back using
> >> manual restart:
> >> -------------------------------------------
> >> [Wed Sep  1 11:45:52 2021] wireguard: wg1: Sending handshake
> >> initiation to peer 12 (~exit-node-ip~:50842)
> >> [Wed Sep  1 11:45:52 2021] wireguard: wg0: Sending keepalive packet to
> >> peer 8 (~client-ip~:2335)
> >> [Wed Sep  1 11:45:58 2021] wireguard: wg1: Handshake for peer 12
> >> (~exit-node-ip~:50842) did not complete after 5 seconds, retrying (try
> >> 3)
> >> [Wed Sep  1 11:45:58 2021] wireguard: wg1: Sending handshake
> >> initiation to peer 12 (~exit-node-ip~:50842)
> >> [Wed Sep  1 11:45:58 2021] wireguard: wg1: Keypair 2878 destroyed for peer 12
> >> [Wed Sep  1 11:45:58 2021] wireguard: wg1: Peer 12
> >> (~exit-node-ip~:50842) destroyed
> >> [Wed Sep  1 11:45:58 2021] wireguard: wg1: Interface destroyed
> >> [Wed Sep  1 11:45:58 2021] wireguard: wg1: Interface created
> >> [Wed Sep  1 11:45:58 2021] wireguard: wg1: Peer 13 created
> >> [Wed Sep  1 11:45:58 2021] wireguard: wg1: Sending keepalive packet to
> >> peer 13 (~exit-node-ip~:50842)
> >> [Wed Sep  1 11:45:58 2021] wireguard: wg1: Sending handshake
> >> initiation to peer 13 (~exit-node-ip~:50842)
> >> [Wed Sep  1 11:45:58 2021] wireguard: wg1: Receiving handshake
> >> response from peer 13 (~exit-node-ip~:50842)
> >> [Wed Sep  1 11:45:58 2021] wireguard: wg1: Keypair 2881 created for peer 13
> >> [Wed Sep  1 11:46:12 2021] wireguard: wg0: Receiving keepalive packet
> >> from peer 8 (~client-ip~:2335)
> >> [Wed Sep  1 11:46:14 2021] wireguard: wg1: Receiving keepalive packet
> >> from peer 13 (~exit-node-ip~:50842)
> >> [Wed Sep  1 11:46:27 2021] wireguard: wg0: Sending keepalive packet to
> >> peer 8 (~client-ip~:2335)
> >> [Wed Sep  1 11:46:28 2021] wireguard: wg1: Receiving keepalive packet
> >> from peer 13 (~exit-node-ip~:50842)
> >> [Wed Sep  1 11:46:52 2021] wireguard: wg1: Receiving keepalive packet
> >> from peer 13 (~exit-node-ip~:50842)
> >>
> >>
> >> Thanks in advance for your kind help


More information about the WireGuard mailing list