ARM multithreaded?
René van Dorst
opensource at vdorst.com
Tue Nov 21 10:40:32 CET 2017
Hi Jason,
Part 2 ;)
I was expecting that my imx6 quad core at 933MHz would outperform my single
core Dove at 800MHz by a large margin.
Dove (Cubox-es) iperf results:
root@cubox-es:~# iperf3 -c 10.0.0.1 -t 10 -Z -i 10
Connecting to host 10.0.0.1, port 5201
[  4] local 10.0.0.4 port 43600 connected to 10.0.0.1 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-10.00  sec   194 MBytes   163 Mbits/sec    0    820 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec   194 MBytes   163 Mbits/sec    0          sender
[  4]   0.00-10.00  sec   192 MBytes   161 Mbits/sec               receiver
iperf Done.
root@cubox-es:~# iperf3 -c 10.0.0.1 -t 10 -Z -i 10 -P 3
Connecting to host 10.0.0.1, port 5201
[  4] local 10.0.0.4 port 43604 connected to 10.0.0.1 port 5201
[  6] local 10.0.0.4 port 43606 connected to 10.0.0.1 port 5201
[  8] local 10.0.0.4 port 43608 connected to 10.0.0.1 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-10.00  sec  89.3 MBytes  74.9 Mbits/sec    0    354 KBytes
[  6]   0.00-10.00  sec  38.8 MBytes  32.6 Mbits/sec    0    227 KBytes
[  8]   0.00-10.00  sec  54.3 MBytes  45.5 Mbits/sec    0    235 KBytes
[SUM]   0.00-10.00  sec   182 MBytes   153 Mbits/sec    0
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  89.3 MBytes  74.9 Mbits/sec    0          sender
[  4]   0.00-10.00  sec  88.5 MBytes  74.2 Mbits/sec               receiver
[  6]   0.00-10.00  sec  38.8 MBytes  32.6 Mbits/sec    0          sender
[  6]   0.00-10.00  sec  38.4 MBytes  32.2 Mbits/sec               receiver
[  8]   0.00-10.00  sec  54.3 MBytes  45.5 Mbits/sec    0          sender
[  8]   0.00-10.00  sec  53.6 MBytes  44.9 Mbits/sec               receiver
[SUM]   0.00-10.00  sec   182 MBytes   153 Mbits/sec    0          sender
[SUM]   0.00-10.00  sec   180 MBytes   151 Mbits/sec               receiver
Imx6 (Utilite) iperf results:
[root@utilite ~]# iperf3 -c 10.0.0.1 -t 10 -Z -i 10
Connecting to host 10.0.0.1, port 5201
[  4] local 10.0.0.5 port 40336 connected to 10.0.0.1 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-10.00  sec   216 MBytes   181 Mbits/sec    0    382 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec   216 MBytes   181 Mbits/sec    0          sender
[  4]   0.00-10.00  sec   215 MBytes   181 Mbits/sec               receiver
iperf Done.
[root@utilite ~]# iperf3 -c 10.0.0.1 -t 10 -Z -i 10 -P 3
Connecting to host 10.0.0.1, port 5201
[  4] local 10.0.0.5 port 40340 connected to 10.0.0.1 port 5201
[  6] local 10.0.0.5 port 40342 connected to 10.0.0.1 port 5201
[  8] local 10.0.0.5 port 40344 connected to 10.0.0.1 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-10.00  sec  93.5 MBytes  78.4 Mbits/sec    0    270 KBytes
[  6]   0.00-10.00  sec  76.1 MBytes  63.9 Mbits/sec    1    224 KBytes
[  8]   0.00-10.00  sec  88.9 MBytes  74.6 Mbits/sec    0    270 KBytes
[SUM]   0.00-10.00  sec   259 MBytes   217 Mbits/sec    1
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  93.5 MBytes  78.4 Mbits/sec    0          sender
[  4]   0.00-10.00  sec  93.0 MBytes  78.0 Mbits/sec               receiver
[  6]   0.00-10.00  sec  76.1 MBytes  63.9 Mbits/sec    1          sender
[  6]   0.00-10.00  sec  75.5 MBytes  63.3 Mbits/sec               receiver
[  8]   0.00-10.00  sec  88.9 MBytes  74.6 Mbits/sec    0          sender
[  8]   0.00-10.00  sec  88.4 MBytes  74.1 Mbits/sec               receiver
[SUM]   0.00-10.00  sec   259 MBytes   217 Mbits/sec    1          sender
[SUM]   0.00-10.00  sec   257 MBytes   215 Mbits/sec               receiver
iperf Done.
I looked at the CPU usage on the imx6 while running iperf.
iperf itself only uses around 2-10% CPU, but the kernel threads use a lot more.
Below is the typical CPU usage (htop CPU bar output).
Running: iperf3 -c 10.0.0.1 -t 10 -Z -i 40 -P 3
1 [|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 87.7%]  Tasks: 29, 9 thr, 83 kthr; 6 running
2 [||||||||||||||||||||                                          28.5%]  Load average: 0.86 0.64 0.87
3 [|||||||||||||||||||                                           27.3%]  Uptime: 4 days, 14:22:07
4 [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%]
Mem[|||||||||||||||||||||||||||||||||||||||||||||           85.9M/1000M]
Swp[                                                            0K/244M]
Running: iperf3 -c 10.0.0.1 -t 10 -Z -i 40
htop output
1 [|||||||||||||||||||||||||||||||||||||||||||||||||||          74.0%]  Tasks: 29, 9 thr, 83 kthr; 4 running
2 [|||||||||||||||                                               20.5%]  Load average: 1.20 0.73 0.90
3 [||||||||||||||                                                19.5%]  Uptime: 4 days, 14:22:22
4 [|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||96.8%]
Mem[|||||||||||||||||||||||||||||||||||||||||||||           86.0M/1000M]
Swp[                                                            0K/244M]
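One core is nearly saturated while the others stay well below 30%, so maybe the ethernet
IRQ and the NET_RX softirq work all end up on that one core. A rough way to check this
(just a sketch, not verified on these boards, and the interrupt name depends on the
ethernet driver):

cat /proc/interrupts    # see which CPU column counts up for the ethernet controller
cat /proc/softirqs      # per-CPU NET_RX / NET_TX counters
taskset -c 1 iperf3 -c 10.0.0.1 -t 10 -Z -i 10    # pin iperf3 to a different core and compare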
So it seems that one of the processes in the chain is the bottleneck.
htop only shows "kworker" as the name, which is not really useful for debugging.
See below.
1 [||||||||||||||||||||||||||||||||||||||||||||||||||||||       79.1%]  Tasks: 29, 9 thr, 82 kthr; 5 running
2 [||||||||||||||||||                                            24.5%]  Load average: 2.07 1.33 1.35
3 [||||||||||||||||                                              23.2%]  Uptime: 4 days, 14:34:57
4 [|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||99.4%]
Mem[|||||||||||||||||||||||||||||||||||||||||||||           86.3M/1000M]
Swp[                                                            0K/244M]
  PID USER      PRI  NI  VIRT  RES  SHR S CPU% MEM%   TIME+  Command
13706 root       20   0     0    0    0 R 61.8  0.0  1:20.60 kworker/3:6
    7 root       20   0     0    0    0 R 20.6  0.0  2:39.03 ksoftirqd/0
13743 root       20   0     0    0    0 S 19.9  0.0  0:10.00 kworker/2:0
13755 root       20   0     0    0    0 R 17.9  0.0  0:18.32 kworker/3:3
13707 root       20   0     0    0    0 S 15.9  0.0  0:24.29 kworker/1:3
13747 root       20   0     0    0    0 S 14.6  0.0  0:03.73 kworker/3:0
13753 root       20   0     0    0    0 S 13.3  0.0  0:01.68 kworker/0:1
13754 root       20   0     0    0    0 R  7.3  0.0  0:03.91 kworker/0:2
13752 root       20   0     0    0    0 S  4.7  0.0  0:02.97 kworker/1:0
13751 root       20   0     0    0    0 S  4.0  0.0  0:03.97 kworker/3:2
13748 root       20   0  2944  608  536 S  2.7  0.1  0:01.14 iperf3 -c 10.0.0.1 -t 1000 -Z -i 40
13749 root       20   0     0    0    0 S  2.7  0.0  0:02.61 kworker/2:1
13733 root       20   0 12860 3252 2368 R  2.0  0.3  0:16.53 htop
13757 root       20   0     0    0    0 S  0.7  0.0  0:01.54 kworker/2:2
13684 root       20   0     0    0    0 S  0.0  0.0  0:25.83 kworker/1:1
13750 root       20   0     0    0    0 S  0.0  0.0  0:04.12 kworker/3:1
13756 root       20   0     0    0    0 S  0.0  0.0  0:01.21 kworker/1:2
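Perhaps the workqueue tracepoints can at least reveal which work functions those kworkers
are running. A sketch, assuming debugfs is mounted at /sys/kernel/debug and ftrace is
enabled in the kernel config:

cd /sys/kernel/debug/tracing
echo 1 > events/workqueue/workqueue_execute_start/enable    # log every work item a kworker starts
cat trace_pipe | head -n 50                                 # work function names show up here while iperf3 runs
echo 0 > events/workqueue/workqueue_execute_start/enable
perf top -g    # alternatively, sample where the kernel time actually goes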
Any ideas on how to debug this further and improve the performance?
Greets,
René van Dorst.