I've prepared some information about

* current RISC-V hardware
* future RISC-V hardware

http://fedora.riscv.rocks/koji/
https://koji.fedoraproject.org/koji/

Primary architecture or koji-shadow?

## Current RISC-V hardware

* Milk-V Pioneer (SG2042)
* Sophgo 2U server (SG2042)

  - Possible performance and quality issues
  - Likely to end up as e-waste soon

  - Doesn't support any recent extensions, esp. V, H
  - No BMC, but might use BMC-on-PCIe card

* Single board computers (SBCs)

  - VisionFive2 (StarFive JH7xxx), Lichee Pi 4A, ...
  - Limited memory, cores
  - Upfront & ongoing engineering headaches integrating into a 19" rack

* QEMU emulation on x86-64 servers

  - Performance numbers below, but not great
  - Really using x86-64 hardware
  - Hardware (x86-64) is well known and easy to manage
  - No "e-waste", servers can be repurposed
  - Supports all the latest extensions
  - Supports any amount of RAM, large numbers of vCPUs
  - We can add new extensions and fix bugs relatively easily

## Future RISC-V hardware

- announced in August 2023

- power and efficiency versions, but they are not fully compatible
- 8 lanes of PCIe Gen3
- Mini-ITX development board

## Performance numbers

              binutils   openssl  python3.12  mingw-gcc

    i686          1589      1577        3411       4292
    x86-64        1419      1172        2462       1827
    aarch64       1573   811 (?)        1845       2521
    ppc64le       2165      1291        3073       4388
    s390x         2553      1380    1984 (?)       6824

qemu-system-riscv64 16 vCPUs, 16 GB
on AMD Ryzen 9 7950X server

(LTO) 4493 3052 14502 12428
+217% +160% +489% +580%
(no LTO) 3267 1351 6353 (failed)

qemu-system-riscv64 32 vCPUs, 16 GB
on AMD Genoa-X server

(no LTO) 5115 1882 (failed)

(LTO) 7202 8823 (crashed in LTO step)
(no LTO) 3274 2059 11627

koji.riscv.rocks (mainly HiFive Unmatched)

(LTO) 12743 15098 26058 67947
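
The percentage row under the 16 vCPU LTO figures appears to be the slowdown of the emulated build relative to the native x86-64 row. A minimal Python sketch of that arithmetic, assuming the four values in each row line up with the four package columns in order (the dictionary and function names are made up for illustration):

```python
# Build figures copied from the rows above (units as given there).
NATIVE_X86_64 = {"binutils": 1419, "openssl": 1172, "python3.12": 2462, "mingw-gcc": 1827}
# qemu-system-riscv64, 16 vCPUs, on the Ryzen 9 7950X host, with LTO.
EMULATED_LTO = {"binutils": 4493, "openssl": 3052, "python3.12": 14502, "mingw-gcc": 12428}


def overhead_percent(emulated: float, native: float) -> int:
    """Extra cost of the emulated build over the native one, in percent."""
    return round((emulated / native - 1) * 100)


overheads = {
    pkg: overhead_percent(EMULATED_LTO[pkg], NATIVE_X86_64[pkg])
    for pkg in NATIVE_X86_64
}
for pkg, pct in overheads.items():
    print(f"{pkg:12s} +{pct}%")
```

Under that assumption the results match the +217% / +160% / +489% / +580% row.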

## Single thread performance

    qemu-system-riscv64      912

    qemu-system-riscv64      598

    AMD Genoa-X (x86-64)      36
    AMD 7950X (x86-64)        35

"TCG" is the name for QEMU's software emulation, e.g. a fully emulated
RISC-V guest on an x86-64 host.

It works using Translation Blocks (TBs), which translate basic blocks of
guest code into host code.

Well understood (by me), easy to fix simpler issues.

- I posted a patch yesterday which gives a ~6% performance gain
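
To make the Translation Block idea concrete, here is a toy model in Python. It is not QEMU code: the three-instruction guest ISA, `translate_block` and `execute` are all invented for illustration, and the "translation" is a Python closure rather than generated host machine code. What it does show is the caching behaviour: a basic block is translated on first execution and then reused from the cache.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical guest ISA (illustration only):
#   ("addi", reg, imm)     regs[reg] += imm
#   ("bnez", reg, target)  jump to target if regs[reg] != 0
#   ("halt",)              stop the guest
Insn = Tuple
Regs = Dict[str, int]
TranslatedBlock = Callable[[Regs], Optional[int]]


def translate_block(program: List[Insn], start_pc: int) -> TranslatedBlock:
    """Translate the basic block at start_pc: straight-line code up to and
    including the first branch or halt."""
    block: List[Tuple[int, Insn]] = []
    pc = start_pc
    while True:
        insn = program[pc]
        block.append((pc, insn))
        pc += 1
        if insn[0] in ("bnez", "halt"):
            break

    def run(regs: Regs) -> Optional[int]:
        # Returns the next guest pc, or None when the guest halts.
        for insn_pc, insn in block:
            if insn[0] == "addi":
                regs[insn[1]] += insn[2]
            elif insn[0] == "bnez":
                return insn[2] if regs[insn[1]] != 0 else insn_pc + 1
        return None  # the block ended with "halt"

    return run


def execute(program: List[Insn], regs: Regs) -> Tuple[int, int]:
    """Run the guest; return (blocks translated, blocks executed)."""
    tb_cache: Dict[int, TranslatedBlock] = {}
    translated = executed = 0
    pc: Optional[int] = 0
    while pc is not None:
        tb = tb_cache.get(pc)
        if tb is None:  # cache miss: translate once, keep for later
            tb = tb_cache[pc] = translate_block(program, pc)
            translated += 1
        pc = tb(regs)
        executed += 1
    return translated, executed


program = [
    ("addi", "acc", 2),   # pc 0: loop body starts here
    ("addi", "n", -1),    # pc 1
    ("bnez", "n", 0),     # pc 2: loop while n != 0
    ("halt",),            # pc 3
]
regs = {"acc": 0, "n": 5}
stats = execute(program, regs)
print(stats, regs)
```

The loop block runs five times but is translated only once, so `stats` comes out as `(2, 6)`: two translations, six block executions.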

A few tips to make TCG run (a bit) fast(er):

- Compile with -march=native (+4%)

- Don't overprovision host CPUs

  * However pinning vCPUs to pCPUs didn't really help

- Give it plenty of guest & host RAM

  * Measured memory overhead on host is up to 40% after running

  * Host TBs track guest page cache; as long as a translated
    executable remains in the guest page cache, it will not be
    retranslated

- Don't restart the VM

- Performance scales almost linearly with # vCPUs

  * I compared a 4 vCPU and a 16 vCPU virtual machine: parallelized
    build speed inside the guest increased by 3.3x for 4x the vCPUs,
    which is almost linear
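
The figures in these tips can be turned into quick back-of-the-envelope checks. A small Python sketch, with two assumptions: the 40% figure is read as host memory overhead on top of the guest RAM size, and the helper names are hypothetical.

```python
def host_ram_budget_gb(guest_ram_gb: float, tcg_overhead: float = 0.40) -> float:
    """Host RAM to set aside for one guest, assuming the measured overhead
    scales with the guest RAM size (an assumption, see above)."""
    return guest_ram_gb * (1 + tcg_overhead)


def scaling_efficiency(vcpus_small: int, vcpus_large: int, speedup: float) -> float:
    """Fraction of ideal linear scaling achieved when growing the guest."""
    ideal_speedup = vcpus_large / vcpus_small
    return speedup / ideal_speedup


budget = host_ram_budget_gb(16)              # 16 GB guest, as in the benchmark VMs
efficiency = scaling_efficiency(4, 16, 3.3)  # 4 -> 16 vCPUs gave 3.3x
print(f"host RAM budget: {budget:.1f} GB")
print(f"scaling efficiency: {efficiency:.1%}")
```

For a 16 GB guest this suggests budgeting roughly 22.4 GB on the host, and a 3.3x speedup for 4x the vCPUs is about 82.5% of ideal linear scaling.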