[WireGuard] News about MIPS and ARM optimized code?

René van Dorst opensource at vdorst.com
Thu Sep 8 13:57:53 CEST 2016


I tried writing some MIPS32r2 code.
I wrote chacha20_keysetup, chacha20_generic_block and
poly1305_generic_blocks in assembly.
I tried to keep all the needed variables in registers, which should
reduce the number of memory accesses.
But it is very difficult for me to profile the code, or to isolate it
and build benchmark programs the way SUPERCOP does.
So testing was simple: cross-compile the code, copy the module to the
target and load it, then run the setup script and iperf.
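
For isolating the block function from the kernel, a small user-space
timing harness might look like the sketch below. Everything in it is an
assumption made for illustration: chacha20_block_ref and
bench_ns_per_block are hypothetical names, the block function works on a
bare 16-word state instead of struct chacha20_ctx, and it is a plain C
reference rather than the assembly described above. An assembly routine
with the same signature could be passed in through the function pointer.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <stddef.h>
#include <time.h>

#define ROTL32(v, n) (((v) << (n)) | ((v) >> (32 - (n))))

/* One ChaCha quarter round on four state words. */
#define QR(a, b, c, d) do {                 \
        a += b; d ^= a; d = ROTL32(d, 16);  \
        c += d; b ^= c; b = ROTL32(b, 12);  \
        a += b; d ^= a; d = ROTL32(d, 8);   \
        c += d; b ^= c; b = ROTL32(b, 7);   \
} while (0)

/* Hypothetical user-space stand-in for the block function:
 * 20 rounds over a 16-word input state, then the feed-forward add. */
static void chacha20_block_ref(const uint32_t in[16], uint32_t out[16])
{
        uint32_t x[16];
        int i;

        for (i = 0; i < 16; i++)
                x[i] = in[i];
        for (i = 0; i < 10; i++) {      /* 10 double rounds = 20 rounds */
                QR(x[0], x[4], x[8],  x[12]);   /* column rounds */
                QR(x[1], x[5], x[9],  x[13]);
                QR(x[2], x[6], x[10], x[14]);
                QR(x[3], x[7], x[11], x[15]);
                QR(x[0], x[5], x[10], x[15]);   /* diagonal rounds */
                QR(x[1], x[6], x[11], x[12]);
                QR(x[2], x[7], x[8],  x[13]);
                QR(x[3], x[4], x[9],  x[14]);
        }
        for (i = 0; i < 16; i++)
                out[i] = x[i] + in[i];
}

typedef void (*block_fn)(const uint32_t in[16], uint32_t out[16]);

/* Average wall-clock nanoseconds per call of fn over iters calls. */
static double bench_ns_per_block(block_fn fn, unsigned long iters)
{
        /* Only the four ChaCha constants are set; key and nonce stay zero. */
        uint32_t state[16] = { 0x61707865, 0x3320646e, 0x79622d32, 0x6b206574 };
        uint32_t out[16];
        struct timespec t0, t1;
        volatile uint32_t sink = 0;     /* keeps the loop from being optimized away */
        unsigned long i;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (i = 0; i < iters; i++) {
                fn(state, out);
                state[12]++;            /* bump the block counter word */
                sink ^= out[0];
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);
        (void)sink;
        return ((t1.tv_sec - t0.tv_sec) * 1e9 +
                (t1.tv_nsec - t0.tv_nsec)) / (double)iters;
}
```

Calling bench_ns_per_block(chacha20_block_ref, 1000000) from a main()
on the target and printing the result would give a rough per-block
cost. It measures wall-clock time only, so it is much coarser than
cycle counting.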

#ifdef CONFIG_CPU_MIPS32_R2
asmlinkage void chacha20_keysetup(struct chacha20_ctx *ctx,
		const u8 key[static 32], const u8 nonce[static 8]);
asmlinkage void chacha20_generic_block(struct chacha20_ctx *ctx);
asmlinkage unsigned int poly1305_generic_blocks(struct poly1305_ctx *ctx,
		const u8 *src, unsigned int srclen, u32 hibit);
#endif

But the speed is equal or lower on my TP-Link WR1043ND, which is a
big-endian MIPS32r2 24Kc.
So GCC already does a good job. Also, the 24Kc has no special
coprocessors or FPU.

The biggest improvement I got was from changing the buildroot default
optimization from -Os to -O2.
This gives around a 1-3% speed improvement.

Ideas:
- Remove the little-endian conversions on MIPS.
   Of course, this has to be done on the other side as well.
   On this device I can't switch endianness.
   But I did not see any improvement. Swapping a 32-bit register
   needs 2 instructions.
   By a quick calculation it could save around 0.4%, which is
   ~0.1 Mbit/s on this device.
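
To make the swap cost concrete, here is a minimal C sketch of the
conversion a big-endian CPU has to do for every little-endian word.
The names swab32_sketch and le32_to_cpu_sketch are hypothetical and
only mirror the idea, not the actual kernel helpers. The two-instruction
figure would correspond to the compiler emitting wsbh followed by rotr
on MIPS32r2, which is an assumption about code generation that I have
not verified here.

```c
#include <stdint.h>

/* Portable 32-bit byte swap: 0x11223344 becomes 0x44332211. */
static inline uint32_t swab32_sketch(uint32_t x)
{
        return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
               ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Hypothetical helper: convert a word that was loaded in host byte
 * order but holds little-endian data. It is a no-op on little-endian
 * hosts and a byte swap on big-endian ones such as the 24Kc above.
 * The __BYTE_ORDER__ macros are GCC/Clang predefines. */
static inline uint32_t le32_to_cpu_sketch(uint32_t x)
{
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
        return swab32_sketch(x);
#else
        return x;
#endif
}
```

Dropping the conversion only removes the swap itself, so the gain is
bounded by how many words are converted per packet.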

Greets,

René van Dorst.


