[WireGuard] News about MIPS and ARM optimized code?

Thu Sep 22 20:27:20 CEST 2016

Hi Jason,

I am using the LEDE project's default kernel.
My comparison is only between the patched C version with the aligned
memory reads and my assembly version of the module.

I think the code is too complex for GCC to optimize, so it follows the
source to the letter.
This results in a lot of data hazards.

By writing the assembly by hand you can avoid many data hazards.
The trick is to do two things at once by weaving (interleaving) the
instructions of both together.
The downside is that this results in less maintainable code.
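
To illustrate the weaving idea, here is a minimal sketch in C. It is not
taken from the module: mix(), mix_serial(), mix_interleaved() and the
constant are hypothetical, chosen only to show two independent dependency
chains. In the serial version every operation waits on the result of the
previous one; in the interleaved version the next operation belongs to the
other chain, so an in-order core has independent work to issue. A compiler
may or may not schedule the two versions differently.

```c
#include <stdint.h>

/* One dependent step: each operation needs the previous result.
 * Hypothetical mixing function, for illustration only. */
static inline uint32_t mix(uint32_t x, uint32_t k)
{
    x += k;
    x ^= x >> 7;
    x *= 0x9E3779B1u;
    return x;
}

/* Straightforward order: finish chain a, then chain b.
 * Consecutive operations always depend on each other. */
void mix_serial(uint32_t *a, uint32_t *b, const uint32_t *k, int rounds)
{
    for (int i = 0; i < rounds; i++)
        *a = mix(*a, k[i]);
    for (int i = 0; i < rounds; i++)
        *b = mix(*b, k[i]);
}

/* Interleaved order: alternate between the two chains so that
 * neighbouring operations are independent of each other. */
void mix_interleaved(uint32_t *a, uint32_t *b, const uint32_t *k, int rounds)
{
    uint32_t x = *a, y = *b;

    for (int i = 0; i < rounds; i++) {
        uint32_t key = k[i];

        x += key;          y += key;
        x ^= x >> 7;       y ^= y >> 7;
        x *= 0x9E3779B1u;  y *= 0x9E3779B1u;
    }
    *a = x;
    *b = y;
}
```

Both functions compute the same values; only the order of the operations
differs, which is also why the woven form is harder to read and maintain.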

Greetings,

René van Dorst.

Quoting "Jason A. Donenfeld" <Jason at zx2c4.com>:

> Hey René,
>
> That's excellent. Thanks for writing that. I'll review this implementation.
>
> Is your speed up compared to your unaligned optimization from the
> other patch? Or is that against vanilla?
>
> With only a 1% increase, I'm first interested to see where precisely
> that improvement is coming from, and if we could squeeze that out of
> gcc instead, so that they're producing more or less the same code.
>
> Regards,
> Jason