soft lockup - may be related to wireguard (backported)
Wang Jian
larkwang at gmail.com
Mon May 4 05:55:35 CEST 2020
The following is kernel log of soft lockup
# uname -a
Linux gw 5.5.0-2-amd64 #1 SMP Debian 5.5.17-1 (2020-04-15) x86_64 GNU/Linux
# cat /sys/module/wireguard/version
1.0.0
This is a linux box run debian testing. I think the wireguard is
backported from 5.6.
The box had a cpu usage spike during the soft lockup, and after that,
the wireguard stopped working, no handshake.
ifdown wg0; ifup wg0 was not capable to recover the wireguard, no handshake.
--
May 4 01:02:17 gw kernel: [27929.504771] rcu: INFO: rcu_sched
self-detected stall on CPU
May 4 01:02:17 gw kernel: [27929.504859] rcu: 2-....: (5582 ticks
this GP) idle=c02/1/0x4000000000000002 softirq=300509/300511 fqs=2470
May 4 01:02:17 gw kernel: [27929.504937] (t=5250 jiffies g=518541 q=1110)
May 4 01:02:17 gw kernel: [27929.504975] NMI backtrace for cpu 2
May 4 01:02:17 gw kernel: [27929.505010] CPU: 2 PID: 25094 Comm:
kworker/2:1 Not tainted 5.5.0-2-amd64 #1 Debian 5.5.17-1
May 4 01:02:17 gw kernel: [27929.505076] Hardware name: /DN2800MT,
BIOS MTCDT10N.86A.0169.2013.0923.1101 09/23/2013
May 4 01:02:17 gw kernel: [27929.505152] Workqueue: wg-crypt-wg0
wg_packet_tx_worker [wireguard]
May 4 01:02:17 gw kernel: [27929.505205] Call Trace:
May 4 01:02:17 gw kernel: [27929.505233] <IRQ>
May 4 01:02:17 gw kernel: [27929.505263] dump_stack+0x66/0x90
May 4 01:02:17 gw kernel: [27929.505297] nmi_cpu_backtrace.cold+0x14/0x53
May 4 01:02:17 gw kernel: [27929.505339] ? lapic_can_unplug_cpu.cold+0x3e/0x3e
May 4 01:02:17 gw kernel: [27929.505381]
nmi_trigger_cpumask_backtrace+0xf6/0xf8
May 4 01:02:17 gw kernel: [27929.505427] rcu_dump_cpu_stacks+0x8f/0xbd
May 4 01:02:17 gw kernel: [27929.505466] rcu_sched_clock_irq.cold+0x1b2/0x3ab
May 4 01:02:17 gw kernel: [27929.505509] update_process_times+0x24/0x50
May 4 01:02:17 gw kernel: [27929.505549] tick_sched_handle+0x22/0x60
May 4 01:02:17 gw kernel: [27929.505585] tick_sched_timer+0x38/0x80
May 4 01:02:17 gw kernel: [27929.505620] ? tick_sched_do_timer+0x60/0x60
May 4 01:02:17 gw kernel: [27929.505660] __hrtimer_run_queues+0xf6/0x270
May 4 01:02:17 gw kernel: [27929.505700] hrtimer_interrupt+0x10e/0x240
May 4 01:02:17 gw kernel: [27929.505740] smp_apic_timer_interrupt+0x6c/0x130
May 4 01:02:17 gw kernel: [27929.505782] apic_timer_interrupt+0xf/0x20
May 4 01:02:17 gw kernel: [27929.505818] </IRQ>
May 4 01:02:17 gw kernel: [27929.505843] RIP: 0010:queue_work_on+0x27/0x40
May 4 01:02:17 gw kernel: [27929.505883] Code: 00 00 00 0f 1f 44 00
00 53 9c 58 0f 1f 44 00 00 48 89 c3 fa 66 0f 1f 44 00 00 f0 48 0f ba
2a 00 73 12 45 31 c0 48 89 df 57 9d <0f> 1f 44 00 00 44 89 c0 5b c3 e8
ea fb ff ff 41 b8 01 00 00 00 eb
May 4 01:02:17 gw kernel: [27929.506019] RSP: 0018:ffffb071c2eefaa8
EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13
May 4 01:02:17 gw kernel: [27929.506080] RAX: 0000000000000001 RBX:
0000000000000286 RCX: 0000000000000000
May 4 01:02:17 gw kernel: [27929.506135] RDX: 00001966d8343dfd RSI:
0000000000000083 RDI: 0000000000000286
May 4 01:02:17 gw kernel: [27929.506191] RBP: ffff8b61aa1c0000 R08:
0000000000000001 R09: ffff8b61abdad510
May 4 01:02:17 gw kernel: [27929.506246] R10: ffff8b61a467ee98 R11:
ffff8b61abdacb78 R12: ffff8b61abc43000
May 4 01:02:17 gw kernel: [27929.506301] R13: ffff8b60fef63500 R14:
0000000000000003 R15: ffff8b61a9623904
May 4 01:02:17 gw kernel: [27929.506367]
wg_packet_send_staged_packets+0x320/0x5d0 [wireguard]
May 4 01:02:17 gw kernel: [27929.506426] wg_xmit+0x324/0x490 [wireguard]
May 4 01:02:17 gw kernel: [27929.506469] dev_hard_start_xmit+0x8d/0x1e0
May 4 01:02:17 gw kernel: [27929.506508] __dev_queue_xmit+0x721/0x8e0
May 4 01:02:17 gw kernel: [27929.506549] ip_finish_output2+0x19b/0x590
May 4 01:02:17 gw kernel: [27929.506604] ? nf_confirm+0xcb/0xf0 [nf_conntrack]
May 4 01:02:17 gw kernel: [27929.506648] ip_output+0x76/0xf0
May 4 01:02:17 gw kernel: [27929.506681] ? __ip_finish_output+0x1c0/0x1c0
May 4 01:02:17 gw kernel: [27929.506720] iptunnel_xmit+0x174/0x210
May 4 01:02:17 gw kernel: [27929.506761] send4+0x120/0x390 [wireguard]
May 4 01:02:17 gw kernel: [27929.506806]
wg_socket_send_skb_to_peer+0x98/0xb0 [wireguard]
May 4 01:02:17 gw kernel: [27929.506860]
wg_packet_tx_worker+0xa9/0x210 [wireguard]
May 4 01:02:17 gw kernel: [27929.506907] process_one_work+0x1b4/0x370
May 4 01:02:17 gw kernel: [27929.506944] worker_thread+0x50/0x3c0
May 4 01:02:17 gw kernel: [27929.506980] kthread+0xf9/0x130
May 4 01:02:17 gw kernel: [27929.510322] ? process_one_work+0x370/0x370
May 4 01:02:17 gw kernel: [27929.514059] ? kthread_park+0x90/0x90
May 4 01:02:17 gw kernel: [27929.517774] ret_from_fork+0x1f/0x40
May 4 01:02:45 gw kernel: [27957.473746] watchdog: BUG: soft lockup -
CPU#2 stuck for 22s! [kworker/2:1:25094]
May 4 01:02:45 gw kernel: [27957.476765] Modules linked in:
nf_conntrack_netlink sctp wireguard pppoe pppox ppp_generic slhc
curve25519_x86_64 libcurve25519 libchacha20poly1305 chacha_x86_64
poly1305_x86_64 ip6_udp_tunnel udp_tunnel libblake2s blake2s_x86_64
libblake2s_generic libchacha cfg80211 rfkill 8021q garp stp mrp llc
xt_mark xt_set nft_chain_nat xt_MASQUERADE xt_TCPMSS xt_nat nf_nat
nft_counter xt_tcpudp ipt_REJECT nf_reject_ipv4 xt_state xt_conntrack
xt_multiport nft_compat nf_tables ip_set_hash_ip ip_set nfnetlink
binfmt_misc nls_ascii nls_cp437 vfat intel_powerclamp fat coretemp
snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio hid_generic
snd_hda_codec_hdmi evdev efi_pstore snd_hda_intel snd_intel_dspcfg
snd_hda_codec snd_hda_core gma500_gfx usbhid snd_hwdep snd_pcm hid
iTCO_wdt snd_timer efivars pcspkr iTCO_vendor_support drm_kms_helper
snd sg watchdog soundcore i2c_algo_bit acpi_cpufreq button tcp_bbr
sch_cake nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 loop drm efivarfs
ip_tables x_tables autofs4
May 4 01:02:45 gw kernel: [27957.476843] ext4 crc16 mbcache jbd2
btrfs xor zstd_decompress zstd_compress raid6_pq crc32c_generic
libcrc32c sd_mod ahci libahci uhci_hcd ehci_pci libata ehci_hcd e1000e
scsi_mod usbcore ptp i2c_i801 lpc_ich pps_core mfd_core usb_common
video [last unloaded: wireguard]
May 4 01:02:45 gw kernel: [27957.501504] CPU: 2 PID: 25094 Comm:
kworker/2:1 Not tainted 5.5.0-2-amd64 #1 Debian 5.5.17-1
May 4 01:02:45 gw kernel: [27957.505274] Hardware name: /DN2800MT,
BIOS MTCDT10N.86A.0169.2013.0923.1101 09/23/2013
May 4 01:02:45 gw kernel: [27957.509046] Workqueue: wg-crypt-wg0
wg_packet_tx_worker [wireguard]
May 4 01:02:45 gw kernel: [27957.512801] RIP:
0010:__xfrm_decode_session+0xd/0x800
May 4 01:02:45 gw kernel: [27957.516563] Code: 00 00 00 e8 55 be ce
ff eb 8c e8 be 46 95 ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f
1f 44 00 00 55 48 89 e5 41 57 41 56 <41> 89 ce 41 55 41 54 49 89 fc 53
48 89 f3 48 83 ec 10 83 fa 02 74
May 4 01:02:45 gw kernel: [27957.520497] RSP: 0018:ffffb071c2eefb28
EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
May 4 01:02:45 gw kernel: [27957.524422] RAX: 0000000000000000 RBX:
ffff8b618d13ed80 RCX: 0000000000000000
May 4 01:02:45 gw kernel: [27957.528366] RDX: 0000000000000002 RSI:
ffffb071c2eefb58 RDI: ffff8b60fe81ce00
May 4 01:02:45 gw kernel: [27957.532285] RBP: ffffb071c2eefb38 R08:
0000000000000000 R09: ffff8b60ff8e5f00
May 4 01:02:45 gw kernel: [27957.536216] R10: 0000000000000000 R11:
0000000000000001 R12: ffffffff92ae7f80
May 4 01:02:45 gw kernel: [27957.540097] R13: ffffb071c2eefb58 R14:
0000000000000000 R15: 0000000000000001
May 4 01:02:45 gw kernel: [27957.543929] FS: 0000000000000000(0000)
GS:ffff8b61abd00000(0000) knlGS:0000000000000000
May 4 01:02:45 gw kernel: [27957.547734] CS: 0010 DS: 0000 ES: 0000
CR0: 0000000080050033
May 4 01:02:45 gw kernel: [27957.551506] CR2: 00007fa15c342fb8 CR3:
000000010c80a000 CR4: 00000000000006e0
May 4 01:02:45 gw kernel: [27957.555248] Call Trace:
May 4 01:02:45 gw kernel: [27957.558920] ip_route_me_harder+0x1ce/0x250
May 4 01:02:45 gw kernel: [27957.562614]
nf_route_table_hook4+0xee/0x110 [nf_tables]
May 4 01:02:45 gw kernel: [27957.566288] nf_hook_slow+0x40/0xb0
May 4 01:02:45 gw kernel: [27957.569916] __ip_local_out+0xd4/0x140
May 4 01:02:45 gw kernel: [27957.573500] ? ip_forward_options.cold+0x23/0x23
May 4 01:02:45 gw kernel: [27957.577098] ip_local_out+0x18/0x50
May 4 01:02:45 gw kernel: [27957.580669] iptunnel_xmit+0x174/0x210
May 4 01:02:45 gw kernel: [27957.584210] send4+0x120/0x390 [wireguard]
May 4 01:02:45 gw kernel: [27957.587726]
wg_socket_send_skb_to_peer+0x98/0xb0 [wireguard]
May 4 01:02:45 gw kernel: [27957.591377]
wg_packet_tx_worker+0xa9/0x210 [wireguard]
May 4 01:02:45 gw kernel: [27957.594986] process_one_work+0x1b4/0x370
May 4 01:02:45 gw kernel: [27957.598526] worker_thread+0x50/0x3c0
May 4 01:02:45 gw kernel: [27957.602063] kthread+0xf9/0x130
May 4 01:02:45 gw kernel: [27957.605583] ? process_one_work+0x370/0x370
May 4 01:02:45 gw kernel: [27957.609112] ? kthread_park+0x90/0x90
May 4 01:02:45 gw kernel: [27957.612633] ret_from_fork+0x1f/0x40
More information about the WireGuard
mailing list