soft lockup - may be related to wireguard (backported)

Mon May 4 18:51:02 CEST 2020

On Mon, May 4, 2020 at 9:49 PM, Alex Xu (Hello71) <alex_y_xu at yahoo.ca> wrote:
>
> Excerpts from Jason A. Donenfeld's message of May 4, 2020 1:26 am:
> > Are you routing wireguard over wireguard, or something interesting like that?
> >
> > Is ipsec being used?
> >
>
> This "DN2800MT" looks like an Atom board from 2012; are you trying to
> run a very large bandwidth through it? I think it's plausible that
> buffering could cause a large enough chunk of work for a slow CPU that
> the kernel could erroneously think that the code is stuck. I don't know
> why that would happen on the tx thread though, or why it would result in
> an RCU stall, seeing as only a single access is made under an RCU lock
> in this function...

It's an old Atom CPU but still capable of handling at least 100Mbps
(even with encryption).

I have Prometheus running on this box, so I have monitoring data at
15-second intervals.

Looking at the monitoring data:
1. The event started at 1:02:00am and ended at 1:04:45am.
2. Load, CPU usage and memory usage all peaked at 1:03:15am.
3. Available memory dropped from 3.359GB to 2.714GB at 1:03:15am, and
recovered gradually from that point.
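
For reference, the figures above can be turned into durations and a memory
delta with a few lines of Python. This is only a rough sketch: the variable
names are made up for illustration, it assumes that 3.359GB was the level
just before the event, and it assumes that the 15-second scrapes line up
with the timestamps listed.

```python
from datetime import datetime

SCRAPE_INTERVAL_S = 15  # Prometheus interval stated in the message

# Timestamps taken from the list above
start = datetime.strptime("01:02:00", "%H:%M:%S")
peak = datetime.strptime("01:03:15", "%H:%M:%S")
end = datetime.strptime("01:04:45", "%H:%M:%S")

# Length of the event and time from start to peak, in seconds
duration_s = int((end - start).total_seconds())
time_to_peak_s = int((peak - start).total_seconds())

# Number of samples covering the event, counting both endpoints
# (assumes scrapes fall exactly on the start and end timestamps)
samples = duration_s // SCRAPE_INTERVAL_S + 1

# Drop in available memory (assumes 3.359GB was the pre-event level)
mem_before_gb = 3.359
mem_at_peak_gb = 2.714
mem_drop_gb = round(mem_before_gb - mem_at_peak_gb, 3)

print(f"event lasted {duration_s}s, peak after {time_to_peak_s}s")
print(f"about {samples} samples, memory drop {mem_drop_gb}GB")
```

So the whole event is covered by only about a dozen data points, and the
real peak may have been shorter and sharper than the graphs show.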

It seems to be related to wireguard's memory use, either as a cause or
as a result.