Handshake state collision between parallel RoutineHandshake threads

Laura Zelenku laura.zelenku at wandera.com
Mon Mar 1 14:08:26 UTC 2021


Hi Jason,
I’ll try to explain the issue.

For an incoming handshake, `handshake.state` changes as follows:
1. set state handshakeInitiationConsumed
2. check that the state is handshakeInitiationConsumed, otherwise "handshake initiation must be consumed first" error
3. set state handshakeResponseCreated
4. check that the state is handshakeResponseCreated, otherwise "invalid state for keypair derivation" error
5. set state handshakeZeroed

For an outgoing handshake, `handshake.state` changes as follows:
1. set state handshakeInitiationCreated
2. <sending handshake and waiting for response>
3. check the state is handshakeInitiationCreated, otherwise skip the packet
4. set state handshakeResponseConsumed
5. check that the state is handshakeResponseConsumed, otherwise "invalid state for keypair derivation" error
6. set state handshakeZeroed
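
The two sequences above can be written down as a small state model. This is only a sketch: the state names and error strings are the ones from this mail, but the struct, the method names (consumeInitiation, createResponse, createInitiation, consumeResponse, deriveKeypair) and the numeric values are made up for illustration and are not the wireguard-go code.

```go
package main

import (
	"errors"
	"fmt"
)

// handshakeState uses the state names from this mail; the values are arbitrary.
type handshakeState int

const (
	handshakeZeroed handshakeState = iota
	handshakeInitiationCreated
	handshakeInitiationConsumed
	handshakeResponseCreated
	handshakeResponseConsumed
)

// handshake is a simplified, hypothetical stand-in for the per-peer handshake.
type handshake struct {
	state handshakeState
}

var (
	errNotConsumed = errors.New("handshake initiation must be consumed first")
	errBadState    = errors.New("invalid state for keypair derivation")
)

// Incoming, step 1.
func (h *handshake) consumeInitiation() { h.state = handshakeInitiationConsumed }

// Incoming, steps 2 and 3.
func (h *handshake) createResponse() error {
	if h.state != handshakeInitiationConsumed {
		return errNotConsumed
	}
	h.state = handshakeResponseCreated
	return nil
}

// Outgoing, step 1.
func (h *handshake) createInitiation() { h.state = handshakeInitiationCreated }

// Outgoing, steps 3 and 4; false means the packet is skipped.
func (h *handshake) consumeResponse() bool {
	if h.state != handshakeInitiationCreated {
		return false
	}
	h.state = handshakeResponseConsumed
	return true
}

// Last two steps of either list: check the expected state, then zero it.
func (h *handshake) deriveKeypair(want handshakeState) error {
	if h.state != want {
		return errBadState
	}
	h.state = handshakeZeroed
	return nil
}

func main() {
	h := &handshake{}

	// Incoming handshake, run without interference.
	h.consumeInitiation()
	errResp := h.createResponse()
	errKeys := h.deriveKeypair(handshakeResponseCreated)
	fmt.Println("incoming:", errResp, errKeys)

	// Outgoing handshake, run without interference.
	h.createInitiation()
	ok := h.consumeResponse()
	errKeys = h.deriveKeypair(handshakeResponseConsumed)
	fmt.Println("outgoing:", ok, errKeys)
}
```

When each sequence runs on its own, every check passes. The checks only fail when something else changes the state between a "set" and the following "check".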

Usually only the "client" sends handshake initiations and the "server" responds. But after some delay (e.g. caused by network issues, mainly on mobile devices), the "server" can start sending handshake initiations too (expiredNewHandshake or expiredRetransmitHandshake timers). At that point the client and the server are sending handshake initiations to each other. "go device.RoutineHandshake()" runs in multiple threads, while `handshake.state` is defined per peer. Two RoutineHandshake threads can therefore process both handshakes (incoming and outgoing) at the same time, and both work with a shared resource, `handshake.state`. Because each routine expects the state it set earlier, and the other thread can modify the state in between, the routine can fail when it checks for the expected `handshake.state`.
This is happening to us. We are getting the error "handshake initiation must be consumed first": handshakeInitiationConsumed is expected, but handshakeZeroed is actually set (by a different thread). The error is logged at error level ("Failed to create response message").
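
To make the collision concrete, here is a sketch that replays one possible interleaving of the numbered steps from the two lists. It is an assumption about the ordering, not a trace from our logs. The steps run sequentially in a single function so the outcome is deterministic. "Thread A" handles the incoming initiation and "thread B" handles the response to our own initiation. The function name replayInterleaving is hypothetical.

```go
package main

import "fmt"

// State names as used in this mail; the numeric values are arbitrary.
type handshakeState int

const (
	handshakeZeroed handshakeState = iota
	handshakeInitiationCreated
	handshakeInitiationConsumed
	handshakeResponseCreated
	handshakeResponseConsumed
)

// replayInterleaving applies the steps of both lists to one shared state
// in an interleaved order and returns the first failure plus the state seen.
func replayInterleaving() (string, handshakeState) {
	// Outgoing step 1 is already done: our initiation is in flight.
	state := handshakeInitiationCreated

	// Thread B, outgoing step 3: the check passes.
	if state != handshakeInitiationCreated {
		return "response packet skipped", state
	}
	// Thread A, incoming step 1: overwrites the shared state.
	state = handshakeInitiationConsumed
	// Thread B, outgoing step 4: overwrites it again.
	state = handshakeResponseConsumed
	// Thread B, outgoing step 5: the check passes.
	if state != handshakeResponseConsumed {
		return "invalid state for keypair derivation", state
	}
	// Thread B, outgoing step 6.
	state = handshakeZeroed
	// Thread A, incoming step 2: expects its own earlier value, finds zeroed.
	if state != handshakeInitiationConsumed {
		return "handshake initiation must be consumed first", state
	}
	return "", state
}

func main() {
	msg, state := replayInterleaving()
	fmt.Printf("error: %q, state zeroed: %v\n", msg, state == handshakeZeroed)
}
```

With this ordering thread B completes its handshake, and thread A then fails at its step 2 because the state it set in step 1 is gone. Other orderings would instead hit the "invalid state for keypair derivation" check or skip the response packet.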

I hope this helps to explain the issue.

Laura


> On 25 Feb 2021, at 12:23, Jason A. Donenfeld <Jason at zx2c4.com> wrote:
> 
> Hi Laura,
> 
> I'm not sure this is actually a problem. The latest handshake message
> should probably win the race. I don't see state machine or data
> corruption here, but just one handshake interrupting another, which is
> par for the course with WireGuard.
> 
> Or have I overlooked something important in the state machine implementation?
> 
> Jason



