[WireGuard] News about MIPS and ARM optimized code?
René van Dorst
opensource at vdorst.com
Thu Sep 8 13:57:53 CEST 2016
I tried writing some MIPS32r2 code.
I wrote chacha20_keysetup, chacha20_generic_block and
poly1305_generic_blocks in assembly.
I tried to keep all the needed variables in registers, which should
reduce the memory overhead.
But it is very difficult for me to profile the code, or to isolate
it and build benchmark programs like SUPERCOP.
So testing was simple: cross-compile the code, copy the module to the
target and load it, then run the setup script and iperf.
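For isolating a routine outside the kernel, a minimal userspace timing loop might look like the sketch below. It is not SUPERCOP and not what I used; bench_fn, bench_ns_per_call and count_call are hypothetical names, and clock() measures CPU time with coarse resolution, so many iterations are needed.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical callback type: one call does one unit of work,
 * e.g. one cipher block. */
typedef void (*bench_fn)(void *arg);

/* Average CPU time per call in nanoseconds over `iters` calls.
 * Uses ISO C clock(), so it builds with any cross toolchain. */
static double bench_ns_per_call(bench_fn fn, void *arg, uint64_t iters)
{
    clock_t start, end;
    uint64_t i;

    if (iters == 0)
        return 0.0;

    start = clock();
    for (i = 0; i < iters; i++)
        fn(arg);
    end = clock();

    return ((double)(end - start) / (double)CLOCKS_PER_SEC) * 1e9
           / (double)iters;
}

/* Stand-in workload that only counts its calls. Replace it with a
 * wrapper around the routine under test. */
static void count_call(void *arg)
{
    (*(uint64_t *)arg)++;
}
```

Running the same loop once against the C version and once against the assembly version would give a per-call comparison that does not depend on iperf or the network.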
asmlinkage void chacha20_keysetup(struct chacha20_ctx *ctx,
                                  const u8 key[static 32],
                                  const u8 nonce[static 8]);
asmlinkage void chacha20_generic_block(struct chacha20_ctx *ctx);
asmlinkage unsigned int poly1305_generic_blocks(struct poly1305_ctx *ctx,
                                                const u8 *src,
                                                unsigned int srclen,
                                                u32 hibit);
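For reference, the core of a ChaCha20 block is the quarter round from RFC 7539, applied to a state of 16 32-bit words. The portable C sketch below is not my assembly; the function names are made up for illustration. It shows why keeping the whole state in registers is attractive: MIPS32 has 32 general-purpose registers, and every step is an add, an xor or a rotate (MIPS32r2 has ROTR for the rotate).

```c
#include <stdint.h>

/* Rotate left; only called here with n = 16, 12, 8 or 7. */
static inline uint32_t rotl32(uint32_t v, unsigned n)
{
    return (v << n) | (v >> (32 - n));
}

/* One ChaCha quarter round on state words a, b, c, d (RFC 7539, 2.1). */
static void chacha_quarter_round(uint32_t x[16], int a, int b, int c, int d)
{
    x[a] += x[b]; x[d] = rotl32(x[d] ^ x[a], 16);
    x[c] += x[d]; x[b] = rotl32(x[b] ^ x[c], 12);
    x[a] += x[b]; x[d] = rotl32(x[d] ^ x[a], 8);
    x[c] += x[d]; x[b] = rotl32(x[b] ^ x[c], 7);
}

/* One double round: four column rounds, then four diagonal rounds.
 * A full ChaCha20 block runs this 10 times and then adds the
 * original input state to the result. */
static void chacha_double_round(uint32_t x[16])
{
    chacha_quarter_round(x, 0, 4,  8, 12);
    chacha_quarter_round(x, 1, 5,  9, 13);
    chacha_quarter_round(x, 2, 6, 10, 14);
    chacha_quarter_round(x, 3, 7, 11, 15);
    chacha_quarter_round(x, 0, 5, 10, 15);
    chacha_quarter_round(x, 1, 6, 11, 12);
    chacha_quarter_round(x, 2, 7,  8, 13);
    chacha_quarter_round(x, 3, 4,  9, 14);
}
```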
But the speed is the same or lower on my TP-Link WR1043ND, which is a
MIPS32r2 24Kc big-endian device.
So GCC does a good job. Also, the 24Kc has no special coprocessors or FPU.
The biggest improvement I got was from changing the buildroot default
optimization from -Os to -O2.
This gives around a 1-3% speed improvement.
- Remove the little-endian conversions on the MIPS.
Of course this also has to be done on the other side.
On this device I can't switch the endianness.
But I did not see any improvement. A swap needs 2 instructions.
A quick calculation shows it could save around 0.4%, which is
~0.1 Mbit/s on this device.
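To illustrate the two-instruction swap: on MIPS32r2 a 32-bit byte swap can be done with WSBH (swap the bytes within each halfword) followed by ROTR by 16. The C sketch below mirrors those two steps with portable helpers whose names are made up for this example; it is not the kernel's code, and a compiler may or may not emit exactly those two instructions for it.

```c
#include <stdint.h>

/* Same effect as MIPS32r2 WSBH: swap the two bytes in each halfword.
 * 0x11223344 becomes 0x22114433. */
static inline uint32_t swap_bytes_in_halfwords(uint32_t v)
{
    return ((v & 0x00ff00ffu) << 8) | ((v & 0xff00ff00u) >> 8);
}

/* Same effect as MIPS32r2 ROTR; only called here with n = 16. */
static inline uint32_t rotr32(uint32_t v, unsigned n)
{
    return (v >> n) | (v << (32 - n));
}

/* Full byte swap as the two-step sequence WSBH + ROTR 16. */
static inline uint32_t bswap32_two_step(uint32_t v)
{
    return rotr32(swap_bytes_in_halfwords(v), 16);
}

/* Endian-independent little-endian load. On a big-endian CPU this is
 * equivalent to a plain 32-bit load followed by the swap above. */
static inline uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

Since the swap is only paid once per 32-bit word loaded or stored, while each ChaCha20 block does hundreds of add/xor/rotate operations, a saving well below 1% is plausible.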
René van Dorst.