soft lockup - may be related to wireguard (backported)

Serge Belyshev belyshev at depni.sinp.msu.ru
Mon May 4 12:47:17 CEST 2020


Hi! I can reproduce a similar RCU stall with a different kernel under
specific conditions on a specific box:

[   54.437636] rcu: INFO: rcu_sched self-detected stall on CPU
[   54.438838] rcu:  0-...!: (2101 ticks this GP) idle=ea6/1/0x4000000000000002 softirq=604/604 fqs=0 
[   54.440052]  (t=2101 jiffies g=69 q=89)
[   54.441273] rcu: rcu_sched kthread starved for 2101 jiffies! g69 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
[   54.442547] rcu: RCU grace-period kthread stack dump:
[   54.443812] rcu_sched       I    0    10      2 0x80004000
[   54.443814] Call Trace:
[   54.445087]  ? __schedule+0x540/0xa80
[   54.446356]  schedule+0x45/0xb0
[   54.447612]  schedule_timeout+0x144/0x280
[   54.448859]  ? __next_timer_interrupt+0xc0/0xc0
[   54.450099]  rcu_gp_kthread+0x3f0/0x840
[   54.451329]  kthread+0xe6/0x120
[   54.452557]  ? rcu_gp_slow.part.0+0x30/0x30
[   54.453761]  ? __kthread_create_on_node+0x150/0x150
[   54.454943]  ret_from_fork+0x1f/0x30
[   54.456095] NMI backtrace for cpu 0
[   54.457221] CPU: 0 PID: 2910 Comm: md5sum Not tainted 5.6.0-00001-g6e142c237f00 #1309
[   54.458355] Hardware name: Gigabyte Technology Co., Ltd. GA-MA790FX-DQ6/GA-MA790FX-DQ6, BIOS F7g 07/19/2010
[   54.459484] Call Trace:
[   54.460576]  <IRQ>
[   54.461672]  dump_stack+0x50/0x70
[   54.462772]  nmi_cpu_backtrace.cold+0x14/0x53
[   54.463871]  ? lapic_can_unplug_cpu.cold+0x3e/0x3e
[   54.464955]  nmi_trigger_cpumask_backtrace+0x7c/0x89
[   54.466026]  rcu_dump_cpu_stacks+0x7b/0xa9
[   54.467088]  rcu_sched_clock_irq.cold+0x153/0x38a
[   54.468146]  update_process_times+0x1f/0x50
[   54.469204]  tick_sched_timer+0x33/0x70
[   54.470262]  ? tick_sched_do_timer+0x50/0x50
[   54.471321]  __hrtimer_run_queues+0xe2/0x180
[   54.472378]  hrtimer_interrupt+0x109/0x240
[   54.473423]  smp_apic_timer_interrupt+0x48/0x80
[   54.474461]  apic_timer_interrupt+0xf/0x20
[   54.475486]  </IRQ>
[   54.476495] RIP: 0033:0x556cbd33bf19
[   54.477506] Code: ce 44 8b 4b 10 c1 c9 0f 01 d1 44 89 4c 24 c8 21 ce 31 c6 01 fe 41 8d bc 01 af 0f 7c f5 89 d0 44 8b 4b 3c c1 ce 0a 31 c8 01 ce <21> f0 31 d0 01 f8 41 8d bc 12 2a c6 87 47 89 ca 41 89 ea c1 c0 07
[   54.479694] RSP: 002b:00007ffc30913ce8 EFLAGS: 00000283 ORIG_RAX: ffffffffffffff13
[   54.480813] RAX: 00000000980270bd RBX: 0000556cbe81e4e0 RCX: 00000000c35c3b1a
[   54.481943] RDX: 000000005b5e4ba7 RSI: 00000000ae8ee5ae RDI: 0000000009b5de85
[   54.483075] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[   54.484201] R10: 0000000000000000 R11: 00000000b16eb4f8 R12: 0000000000000000
[   54.485317] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000023604445


Details and steps to reproduce:

1. kernel from git://github.com/ckolivas/linux  tag: 5.6-muqss-199
2. kernel .config in the attachment
3. boot with "threadirqs"
4. launch 100% cpu load for all threads, e.g.:  for N in {1..6}; do md5sum /dev/zero & done
5. observe that the box stops responding to pings via the wireguard interface.
6. after some time an RCU stall may be triggered (but not always).
7. further wireguard configuration details in the attachment.
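
Steps 3, 4 and 6 above can be sketched as a few small shell helpers. This is
only an illustration under stated assumptions: the function names
(has_threadirqs, start_load, stop_load, has_rcu_stall) are hypothetical and
not part of the original report, and the stall pattern is taken from the
first line of the log quoted above.

```shell
#!/bin/sh
# Illustrative helpers for the reproduction steps; all names are hypothetical.

LOAD_PIDS=""

# Step 3: succeed if the kernel command line given on stdin
# contains the "threadirqs" parameter.
has_threadirqs() {
    grep -qw 'threadirqs'
}

# Step 4: start N CPU-bound md5sum processes (default 6) in the
# background and remember their PIDs so they can be stopped later.
start_load() {
    _n=${1:-6}
    _i=0
    while [ "$_i" -lt "$_n" ]; do
        md5sum /dev/zero >/dev/null 2>&1 &
        LOAD_PIDS="$LOAD_PIDS $!"
        _i=$((_i + 1))
    done
}

# Stop every process started by start_load.
stop_load() {
    for _pid in $LOAD_PIDS; do
        kill "$_pid" 2>/dev/null || true
        wait "$_pid" 2>/dev/null || true
    done
    LOAD_PIDS=""
}

# Step 6: succeed if the kernel log given on stdin contains an
# "rcu: INFO: ... self-detected stall on CPU" line.
has_rcu_stall() {
    grep -q 'rcu: INFO: .* self-detected stall on CPU'
}
```

A possible use would be to run "has_threadirqs < /proc/cmdline", then
"start_load 6", ping the box over the wireguard interface from another host,
check "dmesg | has_rcu_stall" after a while, and finish with "stop_load".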

Note that this is a heisenbug: it disappears when more debugging options are enabled,
I cannot trigger it on a mainline kernel or with a different scheduler configuration,
and on a different box (skylake laptop) with exactly the same kernel it
is very hard to trigger.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: config.gz
Type: application/gzip
Size: 25579 bytes
Desc: not available
URL: <http://lists.zx2c4.com/pipermail/wireguard/attachments/20200504/c2f593f6/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: proc_cpuinfo.gz
Type: application/gzip
Size: 775 bytes
Desc: not available
URL: <http://lists.zx2c4.com/pipermail/wireguard/attachments/20200504/c2f593f6/attachment-0001.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: wg-config-details.gz
Type: application/gzip
Size: 1305 bytes
Desc: not available
URL: <http://lists.zx2c4.com/pipermail/wireguard/attachments/20200504/c2f593f6/attachment-0002.bin>

