kernel warning with 0.0.20170223: entered softirq 3 NET_RX net_rx_action+0x0/0x760 with preempt_count 00000101, exited with 00000100?

Brad Spengler spender at grsecurity.net
Mon Feb 27 12:53:08 CET 2017


Hi,

From looking at the code, this seems to have been introduced during
the port to 4.9.  In previous kernels (and in both of the stable
kernels), get_cpu() was used to look at the per-cpu irq stack pointer
value, and it was properly paired with a put_cpu() in all cases.  When
the code was reworked for 4.9 to use some new upstream helper functions
to identify the various stacks (one of which uses this_cpu_ptr to
identify the irq stack), that single put_cpu() wasn't removed along
with the others.  We'll have this fixed in the next patch.
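
To illustrate the imbalance, here is a minimal user-space sketch, not the
actual grsecurity/PaX code.  It assumes only that get_cpu() raises the
preempt count by one and put_cpu() lowers it by one; every name below
(model_preempt_count, model_get_cpu, model_put_cpu, balanced_check,
leftover_put_check, run_model) is a hypothetical stand-in.

```c
#include <assert.h>

/* Hypothetical user-space stand-in for the kernel's preempt_count(). */
static int model_preempt_count;

/* Models get_cpu(): preemption is disabled, so the count goes up. */
static int model_get_cpu(void)
{
    model_preempt_count++;
    return 0; /* pretend we are always on CPU 0 */
}

/* Models put_cpu(): preemption is re-enabled, so the count goes down. */
static void model_put_cpu(void)
{
    model_preempt_count--;
}

/* Pre-4.9 shape: every get_cpu() is matched by a put_cpu(). */
static int balanced_check(void)
{
    int before = model_preempt_count;

    (void)model_get_cpu();
    /* ... inspect the per-cpu irq stack pointer here ... */
    model_put_cpu();
    return model_preempt_count - before;
}

/* 4.9 port shape: get_cpu() is gone, one put_cpu() was left behind. */
static int leftover_put_check(void)
{
    int before = model_preempt_count;

    /* ... this_cpu_ptr-style lookup, no get_cpu() ... */
    model_put_cpu();
    return model_preempt_count - before;
}

/* Runs both shapes and checks the net change each one leaves behind. */
static void run_model(void)
{
    assert(balanced_check() == 0);
    assert(leftover_put_check() == -1);
}
```

The net change of -1 in the second case lines up with the warning in the
subject: the softirq was entered with preempt_count 00000101 and exited
with 00000100.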

Thanks!
-Brad

On Mon, Feb 27, 2017 at 04:22:34AM +0100, Jason A. Donenfeld wrote:
> Hey Pipacs,
> 
> I've been receiving reports of strange bugs from grsec users with
> WireGuard. The first set of bugs was a heisenbug crash, and I never
> found the root cause, but it seemed to happen in the rx path. Then
> today Timothée emailed another different bug from a grsec box, also
> along the rx path. This time it was related to the preemption count
> being wrong coming into and going out of the rx softirq. This kind of
> preemption mismatch, I figure, might account for the earlier bug I
> never solved.
> 
> So armed with this new information, I went hunting. I followed the
> path inward, surrounding the body of each function with:
> 
> int i = preempt_count();
> function_body...
> if (i != preempt_count()) pr_err("LORDHAVEMERCY\n");
> 
> Eventually I isolated the bug to an interesting situation like this:
> 
> int i = preempt_count();
> other_function(...);
> if (i != preempt_count()) pr_err("This will print out\n");
> 
> void other_function(int a)
> {
>     int vla[a];
>     int i = preempt_count();
>     function_body...
>     if (i != preempt_count()) pr_err("This will NOT print out\n");
> }
> 
> Since I only got the outer print, I thought this was strange, so I rearranged:
> 
> void other_function(int a)
> {
>     int i = preempt_count();
>     int vla[a];
>     if (i != preempt_count()) pr_err("This will print out\n");
>     function_body...
> }
> 
> Yay, we found the bug. But wtf, what could possibly be changing the
> preempt_count there?
> 
> So I went disassembling, and lo and behold the clever PaX stack leak
> plugin was adding calls to pax_check_alloca. Very nice! But still, why
> the preemption bug situation? I went hunting further:
> 
> void __used pax_check_alloca(unsigned long size)
> {
>  ...
>        case STACK_TYPE_IRQ:
>                stack_left = sp & (IRQ_STACK_SIZE - 1);
>                put_cpu();
>                break;
>  ...
> }
> 
> Do you see the bug? Looks like somebody snuck in a "put_cpu()" there,
> where it really does not belong. "put_cpu()" basically just jiggers
> the preempt_count. I can confirm that removing the erroneous call to
> "put_cpu()" fixes the bug.
> 
> So, either this is by design, and there's some odd subtlety I'm
> missing, or this is a bug that should be fixed in grsec/PaX.
> 
> In the case of the latter, I believe this introduces a security
> vulnerability, since it opens up a whole host of interesting race
> conditions that can be exploited.
> 
> Thanks,
> Jason
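
The bracketing technique described in the quoted message (save the
preempt count before a piece of code, compare it afterwards, and complain
on a mismatch) can be sketched in user space roughly as follows.  This is
only an illustration under stated assumptions: model_preempt_count stands
in for preempt_count(), fprintf stands in for pr_err, and the macro
CHECK_PREEMPT_BALANCE and the functions well_behaved, leaky and
count_mismatches are hypothetical names, not anything from the kernel or
from WireGuard.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's preempt_count(). */
static int model_preempt_count;

/* Number of mismatches reported so far. */
static int mismatches;

/* Run stmt and report if it changed the modelled preempt count. */
#define CHECK_PREEMPT_BALANCE(label, stmt)                          \
    do {                                                            \
        int saved_ = model_preempt_count;                           \
        stmt;                                                       \
        if (saved_ != model_preempt_count) {                        \
            mismatches++;                                           \
            fprintf(stderr, "%s: preempt count %d -> %d\n",         \
                    (label), saved_, model_preempt_count);          \
        }                                                           \
    } while (0)

/* Balanced: raises the count and lowers it again. */
static void well_behaved(void)
{
    model_preempt_count++;
    model_preempt_count--;
}

/* Unbalanced: lowers the count without having raised it first. */
static void leaky(void)
{
    model_preempt_count--;
}

/* Brackets both functions and returns how many mismatches were seen. */
static int count_mismatches(void)
{
    mismatches = 0;
    CHECK_PREEMPT_BALANCE("well_behaved", well_behaved());
    CHECK_PREEMPT_BALANCE("leaky", leaky());
    return mismatches;
}

/* Only the unbalanced function should be flagged. */
static void run_check(void)
{
    assert(count_mismatches() == 1);
}
```

Bracketing progressively smaller regions this way narrows the search to
the one statement that changes the count, which is how the hunt in the
quoted message ended up at the VLA declaration.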