[PATCH net] wireguard: Use tunnel helpers for decapsulating ECN markings

Tue Apr 28 11:10:43 CEST 2020

"Rodney W. Grimes" <ietf at gndrsh.dnsmgr.net> writes:

> Replying to a single issue I am reading, and really hope I
> am misunderstanding.  I am neither a wireguard nor a linux
> user, so I may be misunderstanding what is said.
>
> Inline at {RWG}
>
>> "Jason A. Donenfeld" <Jason at zx2c4.com> writes:
>> 
>> > Hey Toke,
>> >
>> > Thanks for fixing this. I wasn't aware there was a newer ECN RFC. A
>> > few comments below:
>> >
>> > On Mon, Apr 27, 2020 at 8:47 AM Toke Høiland-Jørgensen <toke at redhat.com> wrote:
>> >> RFC6040 also recommends dropping packets on certain combinations of
>> >> erroneous code points on the inner and outer packet headers which shouldn't
>> >> appear in normal operation. The helper signals this by a return value > 1,
>> >> so also add a handler for this case.
>> >
>> > This worries me. In the old implementation, we propagate some outer
>> > header data to the inner header, which is technically an authenticity
>> > violation, but minor enough that we let it slide. This patch here
>> > seems to make that violation a bit worse: namely, we're now changing
>> > the behavior based on a combination of outer header + inner header. An
>> > attacker can manipulate the outer header (set it to CE) in order to
>> > learn whether the inner header was CE or not, based on whether or not
>> > the packet gets dropped, which is often observable. That's some form
>
> Why is anyone dropping on decap over the CE bit?  It should be passed
> on, not lead to a packet drop.  If the outer header is CE on an inner
> header of CE it should just continue to be a CE, dropping it is actually
> breaking the purpose of the CE codepoint, to signal congestion before
> having to cause a packet loss.
>
>> > of an oracle, which I'm not too keen on having in wireguard. On the
>> > other hand, we pretty much already _explicitly leak this bit_ on tx
>> > side -- in send.c:
>> >
>> > PACKET_CB(skb)->ds = ip_tunnel_ecn_encap(0, ip_hdr(skb), skb); // inner packet
>> > ...
>> > wg_socket_send_skb_to_peer(peer, skb, PACKET_CB(skb)->ds); // outer packet
>> >
>> > We considered that leak a-okay. But a decryption oracle seems slightly
>> > worse than an explicit and intentional leak. But maybe not that much
>> > worse.
>> 
>> Well, seeing as those two bits on the outer header are already copied
>> from the inner header, there's no additional leak added by this change,
>> is there? An in-path observer could set CE and observe that the packet
>> gets dropped, but all they would learn is that the bits were zero
>
> Again why is CE leading to anyone dropping?
>
>> (non-ECT). Which they already knew because they could just read the bits
>> directly from the header.
>> 
>> Also note, BTW, that another difference between RFC 3168 and 6040 is the
>> propagation of ECT(1) from outer to inner header. That's not actually
>> done correctly in Linux ATM, but I sent a separate patch to fix this[0],
>> which Wireguard will also benefit from with this patch.
>
> Thanks for this.
>
>> 
>> > I wanted to check with you: is the analysis above correct? And can you
>> > somehow imagine the ==2 case leading to different behavior, in which
>> > the packet isn't dropped? Or would that ruin the "[de]congestion" part
>> > of ECN? I just want to make sure I understand the full picture before
>> > moving in one direction or another.
>> 
>> So I think the logic here is supposed to be that if there are CE marks
>> on the outer header, then an AQM somewhere along the path has marked the
>> packet, which is supposed to be a congestion signal, which we want to
>> propagate all the way to the receiver (who will then echo it back to the
>> sender). However, if the inner packet is non-ECT then we can't
>> actually propagate the ECN signal; and a drop is thus the only
>> alternative congestion signal available to us.
>
> You cannot get a CE mark on the outer packet if the inner packet is
> not ECT, as the outer packet would also be not ECT and thus not
> eligible for CE mark.  If you get the above-cited condition, something
> has gone *wrong*.

Yup, you're quite right. If everything is working correctly, this should
never happen. This being the internet, though, there are bound to be
cases where it will go wrong :)

>> This case shouldn't
>> actually happen that often; a middlebox has to be misconfigured to
>> CE-mark a non-ECT packet in the first place. But, well, misconfigured
>> middleboxes do exist as you're no doubt aware :)
>
> That is true, though I believe the "be liberal in what you accept"
> concept would say: ok, someone messed up, just propagate it and
> let the end nodes deal with it; otherwise you're creating a blackhole
> that could be very hard to find.

But that would lead you to ignore a congestion signal. And someone has
to go through an awful lot of trouble to set this signal; if they're
just randomly mangling bits the packet checksum will likely be wrong and
the packet would be dropped anyway. So on balance I'd tend to agree with
the RFC that the right thing to do is to propagate the congestion
signal; which in the case of a non-ECT packet means dropping it,
otherwise we'd just be contributing to the RFC-violating behaviour...
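
To make the combinations concrete, here is a rough userspace sketch of
the decapsulation table as I read it in RFC 6040. It is not the kernel
helper: the names ecn_decap, enum ecn and enum decap_verdict are made up
for illustration, and the verdict values only mirror the "return value
> 1 means drop" convention mentioned in the patch description.

```c
/* ECN codepoints: the two low bits of the IPv4 TOS / IPv6 traffic class. */
enum ecn { ECN_NOT_ECT = 0, ECN_ECT_1 = 1, ECN_ECT_0 = 2, ECN_CE = 3 };

/* Hypothetical verdicts; anything above DECAP_LOG means "drop". */
enum decap_verdict { DECAP_OK = 0, DECAP_LOG = 1, DECAP_DROP = 2 };

/*
 * Combine outer and inner ECN fields on decapsulation.
 * *result receives the codepoint to forward on the inner packet.
 */
static enum decap_verdict ecn_decap(unsigned char outer_tos,
				    unsigned char inner_tos,
				    enum ecn *result)
{
	enum ecn outer = (enum ecn)(outer_tos & 3);
	enum ecn inner = (enum ecn)(inner_tos & 3);

	*result = inner; /* default: forward the inner field unchanged */

	if (inner == ECN_NOT_ECT) {
		/* CE on a non-ECT packet cannot be forwarded as a mark. */
		if (outer == ECN_CE)
			return DECAP_DROP;
		/* ECT on the outer only: keep Not-ECT, but it is anomalous. */
		return outer == ECN_NOT_ECT ? DECAP_OK : DECAP_LOG;
	}

	if (outer == ECN_CE) {
		/* Normal case: propagate the congestion mark inwards. */
		*result = ECN_CE;
		return DECAP_OK;
	}

	if (outer == ECN_ECT_1) {
		if (inner == ECN_ECT_0) {
			/* RFC 6040 addition: ECT(1) propagates to the inner. */
			*result = ECN_ECT_1;
			return DECAP_OK;
		}
		if (inner == ECN_CE)
			return DECAP_LOG; /* CE is kept, combination is odd */
	}

	if (outer == ECN_ECT_0 && inner == ECN_ECT_1)
		return DECAP_LOG; /* inner ECT(1) is kept, unused combination */

	return DECAP_OK;
}
```

With this, ecn_decap(ECN_CE, ECN_NOT_ECT, &r) is the only combination
that yields DECAP_DROP, which is the case being debated here.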

I do believe the advice in the RFC to log these cases is exactly because
of the risk of blackholes you're referring to. I discussed this a bit
with Jason and we ended up agreeing that just marking it as a framing
error should be enough for Wireguard, though...
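
As a minimal sketch of what "marking it as a framing error" could look
like, assuming a verdict where a value > 1 means drop: struct rx_stats
and account_decap below are hypothetical stand-ins, not the actual
Wireguard receive path, and the counter name is only borrowed from the
rx_frame_errors field of the kernel's device statistics.

```c
/* Hypothetical stand-in for the device's receive counters. */
struct rx_stats {
	unsigned long rx_packets;
	unsigned long rx_frame_errors;
};

/*
 * Account for one decapsulated packet.
 * Returns 1 if the packet should be delivered, 0 if it is dropped.
 * A verdict > 1 is treated as "drop"; 0 and 1 are both delivered.
 */
static int account_decap(int verdict, struct rx_stats *stats)
{
	if (verdict > 1) {
		/* Count the drop so a blackhole is at least visible. */
		stats->rx_frame_errors++;
		return 0;
	}
	stats->rx_packets++;
	return 1;
}
```

The point of the counter is that a misbehaving middlebox then shows up
in the interface statistics instead of as silent loss.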

-Toke