Handshake state collision between parralel RoutineHandshake threads
Laura Zelenku
laura.zelenku at wandera.com
Tue Mar 16 14:03:09 UTC 2021
Still struggling with this issue. Running RoutineHandshake in single instance will help. Remember there is shared resource “peer.handshake.state”. Even the resource is per peer there are two directions (upstream/downstream) that can fight for this resource and write it’s own value.
Index: device/receive.go
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- device/receive.go (revision 5f0c8b942d93be6ac36a156c0ba44c86c3698f91)
+++ device/receive.go (date 1615902577604)
@@ -10,6 +10,7 @@
"encoding/binary"
"errors"
"net"
+ "runtime"
"sync"
"sync/atomic"
"time"
@@ -239,7 +240,9 @@
func (device *Device) RoutineHandshake() {
defer func() {
device.log.Verbosef("Routine: handshake worker - stopped")
- device.queue.encryption.wg.Done()
+ for i := 0; i < runtime.NumCPU(); i++ {
+ device.queue.encryption.wg.Done()
+ }
}()
device.log.Verbosef("Routine: handshake worker - started")
Index: device/device.go
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- device/device.go (revision 5f0c8b942d93be6ac36a156c0ba44c86c3698f91)
+++ device/device.go (date 1615902577566)
@@ -305,12 +305,12 @@
cpus := runtime.NumCPU()
device.state.stopping.Wait()
- device.queue.encryption.wg.Add(cpus) // One for each RoutineHandshake
+ device.queue.encryption.wg.Add(cpus)
for i := 0; i < cpus; i++ {
go device.RoutineEncryption()
go device.RoutineDecryption()
- go device.RoutineHandshake()
- }
+ }
+ go device.RoutineHandshake()
device.state.stopping.Add(1) // RoutineReadFromTUN
device.queue.encryption.wg.Add(1) // RoutineReadFromTUN
> On 1. 3. 2021, at 15:08, Laura Zelenku <laura.zelenku at wandera.com> wrote:
>
> Hi Jason,
> I’ll try to explain the issue.
>
> For incomming hanshake, the `handshake.state` is changing in the following way:
> 1. set state handshakeInitiationConsumed
> 2. check the state is handshakeInitiationConsumed otherwise "handshake initiation must be consumed first” error
> 3. set state handshakeResponseCreated
> 4. check the state is handshakeResponseCreated, otherwise "invalid state for keypair derivation” error
> 5. set state handshakeZeroed
>
> For outgoing handshake the `handshake.state` is changing:
> 1. set state handshakeInitiationCreated
> 2. <sending handshake and waiting for response>
> 3. check the state is handshakeInitiationCreated, otherwise skip the packet
> 4. set state handshakeResponseConsumed
> 5. check the state is handshakeResponseConsumed, otherwise "invalid state for keypair derivation” error
> 6. set state handshakeZeroed
>
> Usually only “client” is sending handshake initiations and the “server” responding. But in case some delay (e.g. cause by some network issues mainly for mobile devices) the “server” can start sending handshake initiations (expiredNewHandshake or expiredRetransmitHandshake timers). In this time the client and server are sending hanshake initiations against each other. "go device.RoutineHandshake()” is running in multiple threads. `handshake.state` is defined per peer. Two threads (RoutineHandshake) can process both handshakes (incomming, outgoing) in the same time and these threads are working with shared resource, handshake.state. Because the routine is expecting state that was set before and the second thread can modify the state, the routine can fail on checking the expected handshake.state.
> This is happening to us. We are getting error "handshake initiation must be consumed first”. handshakeInitiationConsumed is expected but handshakeZeroed is actually set (set by different thread). The error is logged on error level (Failed to create response message).
>
> Hope this will help to understand the issue well.
>
> Laura
>
>
>> On 25 Feb 2021, at 12:23, Jason A. Donenfeld <Jason at zx2c4.com> wrote:
>>
>> Hi Laura,
>>
>> I'm not sure this is actually a problem. The latest handshake message
>> should probably win the race. I don't see state machine or data
>> corruption here, but just one handshake interrupting another, which is
>> par for the course with WireGuard.
>>
>> Or have I overlooked something important in the state machine implementation?
>>
>> Jason
>
--
*IMPORTANT NOTICE*: This email, its attachments and any rights attaching
hereto are confidential and intended exclusively for the person to whom the
email is addressed. If you are not the intended recipient, do not read,
copy, disclose or use the contents in any way. Wandera accepts no liability
for any loss, damage or consequence resulting directly or indirectly from
the use of this email and attachments.
More information about the WireGuard
mailing list