kashyapc 29 minutes ago [-]
A couple of corrections (the blog post is by a colleague, but I'm not speaking for Marcin! :))
First, we do have a recent 'binutils' build[1], including the test suites, that took 67 minutes (on a Milk-V Megrez) in the Fedora RISC-V build system.
Second, the current fastest development machine is not the Banana Pi BPI-F3. If we consider what is reasonably accessible today, it is the SiFive "HiFive P550" (P550 for short), plus an upcoming UltraRISC "DP1000", for which we have access to an eval board.
FWIW, see our FOSDEM talk from earlier this year: "Fedora on RISC-V: state of the arch"[1]. It gives an overview of the hardware situation and also has a couple of related poor man's benchmarks (an 'xz' compression test and a 'binutils' build without the test suite on the above two boards - that's what I could manage with the time I had).
Edit: Marcin's RISC-V test was done on StarFive "Vision Five 2". This small board has its strengths (upstreamed drivers), but it is not known for its speed!
Don't blame the ISA - blame the silicon implementations AND the software with no architecture-specific optimisations.
RISC-V will get there, eventually.
I remember that ARM started as a speed demon with conscious power consumption, then was surpassed by x86s and PPCs on desktops and moved to embedded, where it shone by being very frugal with power, only to now be leaving the embedded space with implementations optimised for speed more than power.
newpavlov 13 hours ago [-]
In some cases the RISC-V ISA spec is definitely the one to blame:
Another example is the hard-coded 4 KiB page size, which effectively kneecaps the ISA when compared against ARM.
weebull 12 hours ago [-]
All of those things are solved with modern extensions. It's like comparing pre-MMX x86 code with modern x86. Misaligned loads and stores are Zicclsm, bit manipulation is Zb[abcs], atomic memory operations are made mandatory in Ziccamoa.
All of these extensions are mandatory in the RVA22 and RVA23 profiles and so will be implemented on any up to date RISC-V core. It's definitely worth setting your compiler target appropriately before making comparisons.
LeFantome 11 hours ago [-]
Ubuntu being RVA23 is looking smarter and smarter.
The RISC-V ecosystem being handicapped by backwards compatibility does not make sense at this point.
Every new RISC-V board is going to be RVA23 capable. Now is the time to draw a line in the sand.
saagarjha 2 hours ago [-]
I’d be kind of depressed if every new RISC-V board was not RVA23 capable.
cmovq 8 hours ago [-]
But RISC-V is a _new_ ISA. Why did we start out with the wrong design that now needs a bunch of extensions? RISC-V should have taken the learnings from x86 and ARM but instead they seem to be committing the same mistakes.
kldg 4 hours ago [-]
I was a bit shocked by the headline, given how poorly ARM and x86 compare to RISC-V in speed, cost, and efficiency ... in the MCU space where I near-exclusively live and where RISC-V has near-exclusively lived up until quite recently. RISC-V has been great for RTOS systems and Espressif in particular has pushed MCUs up to a new level where it's become viable to run a designed-from-scratch web server (you better believe we're using vector graphics) on a $5 board that sits on your thumb, but using RISC-V in SBCs and beyond as the primary CPU is a very different ballgame.
wolvoleo 8 hours ago [-]
It is a reduced instruction set computing isa of course. It shouldn't really have instructions for every edge case.
I only use it for microcontrollers and it's really nice there. But yeah I can imagine it doesn't perform well on bigger stuff. The idea of risc was to put the intelligence in the compiler though, not the silicon.
pjmlp 4 hours ago [-]
As proven by the evolution of x86/x64 and ARM, going all in on pure RISC doesn't pay off, because there is only so much compilers can do in an AOT deployment scenario.
hun3 8 hours ago [-]
It was kind of an experiment from start. Some ideas turned out to be good, so we keep them. Some ideas turned out not to be good, so we fix them with extensions.
pjmlp 4 hours ago [-]
The problem with hardware experiments is that the people owning the hardware are stuck with the experiments.
nsvd2 24 minutes ago [-]
Sure, but if you bought a dev board with an experimental ISA I think you knew what you were getting into.
rbanffy 2 hours ago [-]
If your hardware is new, you get the nicest extensions though. You just don’t use the bad parts in your code.
pjmlp 2 hours ago [-]
Sure, if you are developing software for the computer you own, instead of supporting everyone.
veltas 2 hours ago [-]
Relatively new, we're about 16 years down the road.
pajko 2 hours ago [-]
Intentionally. Back then the guys were saying that everything could be solved by raw power.
sidewndr46 9 hours ago [-]
You're correct but I guess my thoughts are if we're going to wind up with a mess of extensions, why not just use x86-64?
LeFantome 6 hours ago [-]
First, x86-64 also has “extensions” such as avx, avx2, and avx512. Not all “x86-64” CPUs support the same ones. And you get things like svm on AMD and avx on Intel. Remember 3DNow?
X86-64 also has “profiles” which tell you what extensions should be available. There is x86-64v1 and x86-64v4 with v2 and v3 in the middle.
RVA23 offers a very similar feature-set to x86-64v4.
You do not end up with a mess of extensions. You get RVA23. Yes, RVA23 represents a set of mandatory extensions. The important thing is that two RVA23 compliant chips will implement the same ones.
But the most important point is that you cannot “just use x86-64”. Only Intel and AMD can do that. Anybody can build a RISC-V chip. You do not need permission.
BoredomIsFun 4 hours ago [-]
1. Yes, but most of the code would run on anything older than 2007. 20 years of stable ISA.
2. Also, fundamentally all modern CPUs are still 64-bit versions of the 80386. The MMU, protection, and low-level details are all the same.
whaleofatw2022 9 hours ago [-]
Because the ISA is not encumbered the way other ISAs are legally, and there are use cases where the minimal profile is fine for the sake of embedded whatever vs the cost to implement the extensions
computably 8 hours ago [-]
> why not just use x86-64?
Uh, because you can't? It's not open in any meaningful sense.
userbinator 7 hours ago [-]
The original amd64 came out in 2003. Any patents on the original instruction set have long expired, and even more so for 32-bit x86.
panick21_ 3 hours ago [-]
It's not about patents. Believe what you want, but there is a reason nobody else is doing x86 or ARM chips unless they are allowed to by the owner.
dbdr 53 minutes ago [-]
You're probably right. It would be helpful to say what the reason is, if it's not patents.
It's 4k on x86 as well. Doesn't seem to hurt so bad -- at least, not enough to explain the risc-v performance gap.
twoodfin 9 hours ago [-]
Hmm? x86 has supported much larger “huge” page sizes for ages.
ori_b 6 hours ago [-]
Yes, and Linux, at least historically, has not used them without explicit program opt-in. The advice is often to disable transparent huge pages for performance reasons. Not sure about other operating systems.
Now they want to introduce yet another (sic!) extension Oilsm... It maaaaaay become part of RVA30, so in the best case scenario it will be decades before we will be able to rely on it widely (especially considering that RVA23 is likely to become heavily entrenched as "the default").
IMO the spec authors should've mandated that the base load/store instructions work only with aligned pointers and introduced misaligned instructions in a separate early extension. (After all, passing a misaligned pointer where your code does not expect it is a correctness issue.) But I would've been fine as well if they had mandated that misaligned pointers should always be accepted. Instead we have to deal with the terrible middle ground.
>atomic memory operations are made mandatory in Ziccamoa
In other words, forget about potential performance advantages of load-link/store-conditional instructions. `compare_exchange` and `compare_exchange_weak` will always compile into the same instructions.
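For readers less familiar with the distinction being drawn here: the weak form of compare-exchange is allowed to fail spuriously, which is what lets it map onto a single LL/SC attempt, so it is normally written as a retry loop. A minimal C11 sketch follows; the helper name `fetch_max_u32` is made up for illustration, and which instructions it compiles to depends on the target and compiler flags.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Atomically raise *target to at least `value`; returns the previous value.
 * Hypothetical helper, for illustration only. */
static uint32_t fetch_max_u32(_Atomic uint32_t *target, uint32_t value)
{
    uint32_t seen = atomic_load_explicit(target, memory_order_relaxed);
    /* The weak variant may fail spuriously, so it lives in a loop.
     * On failure, `seen` is refreshed with the current contents. */
    while (seen < value &&
           !atomic_compare_exchange_weak_explicit(target, &seen, value,
                                                  memory_order_acq_rel,
                                                  memory_order_relaxed)) {
        /* retry with the updated `seen` */
    }
    return seen;
}
```

If the target only offers AMO-style compare-and-swap, the weak and strong forms of this loop would be expected to generate the same code, which is the complaint above.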
And I guess you are fine with the page size part. I know there are huge-page-like proposals, but they do not resolve the fundamental issue.
I have other minor performance-related nits, such as the `seed` CSR being allowed to produce poor-quality entropy, which means that we have to bring a whole CSPRNG if we want to generate a cryptographic key or nonce on a low-powered microcontroller.
By no means do I consider myself a RISC-V expert; if anything, my familiarity with the ISA as a systems language programmer is quite shallow. But the number of accumulated disappointments even from such shallow familiarity has cooled my enthusiasm for RISC-V quite significantly.
pseudohadamard 56 minutes ago [-]
RISC-V truly is the RyanAir of processors: Oh, you want FP maths? That's an optional extra, did you check that when you booked? And was that single or double-precision, all optional extras at an extra charge. Atomic instructions, that's an extra too, have your credit card details handy. Multiply and divide? Yeah, extras. Now, let me tell you about our high-end customer options, packed SIMD and user-level interrupts, only for business class users. And then there's our first-class benefits, hypervisor extensions for big spenders, and even more, all optional extras.
newpavlov 24 minutes ago [-]
>Multiply and divide
And where it actually mattered they did not introduce a separate extension. Integer division is significantly more complex than multiplication, so it may make sense for low-end microcontrollers to implement in hardware only the latter.
IshKebab 3 hours ago [-]
I think having separate unaligned load/store instructions would be a much worse design, not least because they use a lot of the opcode space. I don't understand why you don't just have an option to not generate misaligned loads for people that happen to be running on CPUs where it's really slow. You don't need to wait for a profile for that.
As for `seed`, if you're running on a microcontroller you can just look up the data sheet to see if its seed entropy is sufficient. By the time you get to CPUs where portable code is important a CSPRNG is probably fine.
I agree about page size though. Svnapot seems overly complicated and gives only a fraction of the advantages of actually bigger pages.
newpavlov 30 minutes ago [-]
>As for `seed`, if you're running on a microcontroller you can just look up the data sheet to see if its seed entropy is sufficient.
It's a terrible attitude to have towards programmers, but looking at misaligned ops, I guess we can see a pattern from RISC-V authors here.
Most programmers do not target a concrete microcontroller and develop every line of code from scratch. They either develop portable libraries (e.g. https://docs.rs/getrandom) or build their projects using those libraries.
The whole raison d'être of an ISA is to provide a portable contract between hardware vendors and programmers. The RISC-V authors shirk this responsibility with a "just look at your micro specs, lol" attitude.
dzaima 2 hours ago [-]
The option to generate or not generate misaligned loads/stores does exist (-mno-strict-align / -mstrict-align). But of course that's a compile-time option, and of course the preferred state would be to have use of them on by default, but RVA23 doesn't sufficiently guarantee/encourage them not being unreasonably-slow, leaving native misaligned loads/stores still effectively-unusable (and off by default on clang/gcc on -march=rva23u64).
aka, Zicclsm / RVA23 are entirely-useless as far as actually getting to make use of native misaligned loads/stores goes.
IshKebab 1 hours ago [-]
> RVA23 doesn't guarantee them not being unreasonably-slow
Right, but it doesn't guarantee that anything is not unreasonably slow, does it? I am free to make an RVA23 compliant CPU with a div instruction that takes 10k cycles. Does that mean LLVM won't output div? At some point you're left with either -mcpu=<specific cpu> or falling back to reasonable assumptions about the actual hardware landscape.
Do ARM or x86 make any guarantees about the performance of misaligned loads/stores? I couldn't find anything.
dzaima 1 hours ago [-]
I don't think x86/ARM particularly guarantee fastness, but at least they effectively encourage making use of them via their contributions to compilers that do. They also don't really need to given that they mostly control who can make hardware anyway. (at the very least, if general-purpose HW with horribly-slow misaligned loads/stores came out from them, people would laugh at it, and assume/hope that that's because of some silicon defect requiring chicken-bit-ing it off, instead of just not bothering to implement it)
Indeed one can make any instruction take basically-forever, but I think it's a fairly reasonable expectation that all supported hardware instructions/behaviors (at least non-deprecated ones) are not slower than a software implementation (on at least some inputs), else having said instruction is strictly-redundant.
And if any significant general-purpose hardware actually did a 10k-cycle div around the time the respective compiler defaults were decided, I think there's a good chance that software would have defaulted to calling division through a function such that an implementation can be picked depending on the running hardware. (let's ignore whether 10k-cycle-division and general-purpose-hardware would ever go together... but misaligned-mem-ops+general-purpose-hardware definitely do)
IshKebab 22 minutes ago [-]
> if general-purpose HW with horribly-slow misaligned loads/stores came out from them
How is that different for RISC-V?
> I think it's a fairly reasonable expectation that all supported hardware instructions/behaviors (at least non-deprecated ones) are not slower than a software implementation
I agree! So just use misaligned loads if Zicclsm is supported. As you observed there's a feedback loop between what compilers output and what gets optimised in hardware. Since RVA23 hardware is basically non-existent at the moment you kind of have the opportunity to dictate to hardware "LLVM will use misaligned accesses on RVA23; if you make an RVA23 chip where this is horribly slow then people will laugh at you and assume it's some sort of silicon defect".
saagarjha 2 hours ago [-]
RISC-V is not particularly good at using opcode space, unfortunately.
IshKebab 1 hours ago [-]
I don't think it's too bad. The compressed extension was arguably a mistake (and shouldn't be in RVA23 IMO), but apart from that there aren't any major blunders. You're probably thinking about how JAL(R) basically always uses x1/x5 (or whatever it is), but I don't think that's a huge deal.
About 1/3 of the opcode space is used currently so there's a decent amount of space left.
tosti 6 hours ago [-]
Regarding misaligned reads, IIRC only x86 hides non-aligned memory access. It's still slower than aligned reads. Other processors just fault, so it would make sense to do the same on riscv.
The problem is decades of software being written on a chip that from the outside appears not to care.
torginus 1 hours ago [-]
Yes, unaligned loads/stores are a niche feature that has huge implications in processor design - loads across cache-lines with different residency, pages that fault etc.
This is the classic conundrum of legacy system redesign - if customers keep demanding every feature of the old system be present, and work the exact same then the new system will take on the baggage it was designed to get rid of.
The new implementation will be slow and buggy by this standard and nobody will use it.
0x000xca0xfe 9 minutes ago [-]
Unaligned load/store is crucial for zero-copy handling of mmaped data, network streams and all other kinds of space-optimized data structures.
If the CPU doesn't do it software must make many tiny conditional copies which is bad for branch prediction.
This sucks double when you have variable length vector operations... IMO fast unaligned memory accesses should have been mandatory without exceptions for all application-level profiles and everything with vector.
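To make the zero-copy point concrete: in portable C, the usual way to read a multi-byte field at an arbitrary offset is a fixed-size `memcpy`, which a compiler may lower to a single load when the target is known to handle misaligned access well, and to byte loads otherwise. A sketch, with `load_u32_le` as an illustrative name:

```c
#include <stdint.h>
#include <string.h>

/* Read a little-endian 32-bit value from any byte offset in a buffer.
 * memcpy avoids the undefined behaviour of casting to uint32_t*. */
static uint32_t load_u32_le(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);   /* alignment-agnostic copy */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    v = __builtin_bswap32(v);  /* GCC/Clang builtin; assumed available here */
#endif
    /* If __BYTE_ORDER__ is not defined, a little-endian host is assumed. */
    return v;
}
```

Whether the single-load form is actually emitted on RISC-V depends on the alignment options discussed elsewhere in the thread.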
fredoralive 2 hours ago [-]
ARM Cortex-A cores also allow unaligned access (MCU cores don't though, and older ARM is weird). There's perhaps a hint if the two most popular CPU architectures have ended up in the forgiving approach to unaligned access, rather than the penalising approach of raising an interrupt.
pjmlp 4 hours ago [-]
That is on modern CPUs. In the past, across the 8-, 16- and 32-bit generations, it was not something you had to care about outside RISC.
inkyoto 4 hours ago [-]
PDP-11, m68k – to name a few, did not allow misaligned access to anything that was not a byte.
Neither is RISC or modern.
pjmlp 2 hours ago [-]
As for the 68000, I don't remember; I only used it during demoscene coding parties when my friends let me touch their Amigas.
I have only seen PDP-11 Assembly snippets in UNIX related books, wasn't aware of its alignment requirements.
inkyoto 35 minutes ago [-]
PDP-11 was a major source of inspiration for m68k architecture designers. The influence can be seen in multiple places, starting from the orthogonal ISA design down to instruction mnemonics.
It is quite likely that not allowing the misaligned access was also influenced by PDP-11.
adastra22 13 hours ago [-]
Also the bit manipulation extension wasn't part of the core. So things like bit rotation are slow for no good reason, if you want portable code. Why? Who knows.
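For context, portable code usually writes a rotate as two shifts and an OR and relies on the compiler to pick a rotate instruction when the target has one (on RISC-V that would mean Zbb being enabled in `-march`); without it, the same source becomes a short shift/or sequence. A sketch, with `rotl32` as an illustrative name:

```c
#include <stdint.h>

/* Rotate left by n bits, written so that every shift count is in range. */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;                                  /* keep the count in 0..31 */
    return (x << n) | (x >> ((32 - n) & 31)); /* n == 0 stays well-defined */
}
```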
adgjlsfhk1 13 hours ago [-]
> Also the bit manipulation extension wasn't part of the core.
This is primarily because the core is primarily a teaching ISA. One of the best parts about RiscV is that you can teach a freshman level architecture class or a senior level chip building project with an ISA that is actually used. Anything powerful enough to run Linux (other than a manually built-from-source one) will support a profile that bundles all the commonly needed instructions to be fast.
jacquesm 13 hours ago [-]
Bit manipulation instructions are part and parcel of any curriculum that teaches CPU architecture. They are the basic building blocks for many more complex instructions.
I can see quite a few items on that list that imnsho should have been included in the core and for the life of me I can't see the rationale behind leaving them out. Even the most basic 8 bit CPU had various shifts and rolls baked in.
rwmj 13 hours ago [-]
This is the reason behind the profiles like RVA23 which include bitmanip, vector and a large number of other extensions. Real chips coming very soon will all be RVA23.
jacquesm 13 hours ago [-]
Neat. I can't wait to get my hands on a devboard.
NekkoDroid 12 hours ago [-]
The earliest I know of is the SpaceMit K3, which Sipeed will have dev boards for.
statusfailed 11 hours ago [-]
The Milk-V Jupiter 2 (coming out in April) is RVA23 too
jacquesm 11 hours ago [-]
Nice board but very low on max RAM.
kevin_thibedeau 12 hours ago [-]
32-bit barrel shifters consume significant area and RISC-V was developed to support resource constrained low cost embedded hardware in a minimal ISA implementation.
pezezin 10 hours ago [-]
The 32-bit ARM architecture included a barrel shifter as part of its basic design, as in every instruction had a shift field.
If a CPU built in 1985 with a grand total of 26 000 transistors could afford it, I am pretty sure that anything built in this century could afford it too.
snvzz 10 hours ago [-]
26k is a lot of transistors for an embedded MCU.
You'd be excluding many small CPUs which exist within other chips running very specialized code.
As profiles mandate these instructions anyway, there's no good reason to complicate the most basic RISC-V possible.
RISC-V is the ISA for everything, from the smallest such CPUs to supercomputers.
wk_end 9 hours ago [-]
What MCUs are you thinking of?
To the best of my knowledge (and Google-fu), 26K really isn't a lot of transistors for an embedded MCU - at least not a fully-featured 32-bit one comparable to a minimal RISC-V core. An ARM Cortex M0, which is pretty much the smallest thing out there, is around 10K gates => around 40K transistors. This is also around the same size as a minimal RISC-V core AFAICT.
The ARM core has a shifter, though.
snvzz 8 hours ago [-]
There's a reason RV32E and RV64E, with half the registers, are a thing. RV32I/RV64I isn't small enough.
There are many chips in the market that do embed 8051s for janitorial tasks, because it is small and not legally encumbered. Some chips have several non-exposed tiny embedded CPUs within.
RISC-V is replacing many of these, bringing modern tooling. There's even open source designs like SERV that fit in a corner of an already small FPGA, leaving room for other purposes.
wk_end 8 hours ago [-]
Per https://en.wikipedia.org/wiki/Transistor_count, even an 8051 has 50K transistors, which reinforces my claim that 26K really doesn't seem like a big ask for an MCU core. Whether that means a barrel shifter is worth it or not is a totally orthogonal question, of course.
(Although I do have to eat my words here - I didn't check that Wikipedia page, and it does actually list a ~6K RISC-V core! It's an experimental academic prototype "made from a two-dimensional material [...] crafted from molybdenum disulfide"; I don't know if that construction might allow for a more efficient transistor count and it's totally impractical - 1KHz clock speed, 1-bit ALU, etc. - for almost any purpose, but it is technically a RISC-V implementation significantly smaller than 26K)
userbinator 7 hours ago [-]
> I don't know if that construction might allow for a more efficient transistor count and it's totally impractical - 1KHz clock speed, 1-bit ALU, etc. - for almost any purpose, but it is technically a RISC-V implementation significantly smaller than 26K
That sounds like a microcoded RISC-V implementation, which can really be done for any ISA at the extreme expense of speed.
inkyoto 5 hours ago [-]
If I'm not mistaken, microcode is a thing at least on Intel CPU's, and that is how they patched Spectre, Meltdown and other vulnerabilities – Intel released a microcode update that BIOS applies at the cold start and hot patches the CPU.
Maybe other CPU's have it as well, though I do not have enough information on that.
adgjlsfhk1 8 hours ago [-]
> There's a reason RV32E and RV64E, with half the registers, are a thing. RV32I/RV64I isn't small enough.
This is actually kind of counter to your point. The really tiny micro-controllers from the 80s only had 224 bits of registers. RV32E is at least twice that (16 registers*32 bits), and modern mcus generally use 2-4kbs of sram, so the overhead of a 32 bit barrel shifter is pretty minimal.
torginus 1 hours ago [-]
It was the case even 15 years ago when Cortex M0/M3 really started to get traction, that the processor area of ARM cores was small enough to not make a difference in practice.
adgjlsfhk1 11 hours ago [-]
IIUC this is a lot less true in the modern era. Even with 24nm transistors (the cheapest transistor last time I checked), modern microcontrollers have a fairly big transistor budget for the core (since 80+% of the transistors are going to sram anyway).
jacquesm 12 hours ago [-]
You can save a lot of silicon by doing 8 or 16 bit shifters and then doing the rest at the code generation level. Not having any seems really anemic to me.
bmenrigh 7 hours ago [-]
Yeah I don’t get it. Shifts and rolls are among the simplest of all instructions to implement because they can be done with just wires, zero gates. Hard to imagine a justification for leaving them out.
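A shift by a fixed amount is indeed only wiring; the cost raised upthread comes from a shift amount that is only known at run time, which needs a multiplexer per bit per stage. A small C model of a logarithmic (five-stage) left shifter, purely illustrative and not taken from any real core:

```c
#include <stdint.h>

/* Software model of a 32-bit logarithmic barrel shifter (left shift).
 * Each stage is a fixed shift (wires only) followed by a 2:1 mux
 * selected by one bit of the shift amount. */
static uint32_t barrel_shl(uint32_t x, unsigned amount)
{
    for (unsigned stage = 0; stage < 5; stage++) {
        uint32_t shifted = x << (1u << stage);     /* fixed wiring: 1, 2, 4, 8, 16 */
        uint32_t select  = (amount >> stage) & 1u; /* mux control bit */
        x = select ? shifted : x;                  /* models 32 two-input muxes */
    }
    return x;
}
```

Five stages of 32 muxes is 160 two-input muxes for one shift direction, which is the area being weighed against the rest of a very small core.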
hackyhacky 13 hours ago [-]
> One of the best parts about RiscV is that you can teach a freshman level architecture class or a senior level chip building project with an ISA that is actually used.
Same could be said of MIPS.
My understanding is the RISC-V raison d'etre is rather avoidance of patented/copyrighted designs.
musicale 7 hours ago [-]
As you indicate, MIPS was widely used in computer architecture courses and textbooks, including pre-RISC-V editions of Patterson & Hennessy (Computer Organization & Design) and Harris & Harris (Digital Design and Computer Architecture).
In spite of the currently mediocre RISC-V implementations, RISC-V seems to have more of a future and isn't clouded by ISA IP issues, as you note.
adgjlsfhk1 13 hours ago [-]
the avoidance of patent/copyright is critical for (legally) having students design their own chips. MIPS was pretty good (and widely used) for teaching assembly, but pretty bad for teaching a class where students design chips
musicale 6 hours ago [-]
This is largely contradicted by the (pre RISC-V) MIPS editions of Patterson & Hennessy, Harris & Harris, etc., which teach you how to design a MIPS datapath (at the gate level.)
Regarding silicon implementations, consider that 1) you can synthesize it from HDL/RTL designs using modern CAD tools, and 2) MIPS was originally designed to be simple enough for grad students to implement with the primitive CAD tools of the 1980s (basically semi-manual layout).
userbinator 7 hours ago [-]
MIPS patents have long expired too (and incidentally for any other CPU released prior to 2006), so that's a moot point.
fidotron 13 hours ago [-]
The fact the Hazard3 designer ended up creating an extension to resolve related oddities was kind of astonishing.
Why did it fall to them to do it? Impressive that he did, but it shouldn't have been necessary.
rllj 13 hours ago [-]
Which extension is that?
mjmas 13 hours ago [-]
An extension he calls Xh3bextm. For extracting multiple bits from bitfields.
There are also four other custom extensions implemented.
mort96 3 hours ago [-]
Do you typically care about portability to the degree that you want the same machine code to execute on both a Linux box and a microcontroller? Why?
torginus 1 hours ago [-]
Unaligned load/store is a horrible feature to implement.
Page size can be easily extended down the line without breaking changes.
direwolf20 7 hours ago [-]
The first one is common across many architectures, including ARM, and the second is just LLVM developers not understanding how cmpxchg works
fidotron 13 hours ago [-]
> RISC-V will get there, eventually.
Not trolling: I legitimately don't see why this is assumed to be true. It is one of those things that is true only once it has been achieved. Otherwise we would be able to create super high performance Sparc or SuperH processors, and we don't.
As you note, Arm once was fast, then slow, then fast. RISC-V has never actually been fast. It has enabled surprisingly good implementations by small numbers of people, but competing at the high end (mobile, desktop or server) it is not.
lizknope 12 hours ago [-]
I think the bigger question is does RISC-V need to be fast? Who wants to make it fast?
I'm a chip designer and I see people using RISC-V as small processor cores for things like PCIE link training or various bookkeeping tasks. These don't need to be fast, they need to be small and low power which means they will be relatively slow.
Most people on tech review sites only care about desktop / laptop / server performance. They may know about some of the ARM Cortex A series CPUs that have MMUs and can run desktop or smartphone Linux versions.
They generally don't care about the ARM Cortex M or R versions for embedded and real time use. Those are the areas where you don't need high performance and where RISC-V is already replacing ARM.
EDIT:
I'll add that there are companies that COULD make a fast RISC-V implementation.
Intel, AMD, Apple, Qualcomm, or Nvidia could redirect their existing teams to design a high performance RISC-V CPU. But why should they? They are heavily invested in their existing x86 and ARM CPU lines. Amazon and Google are using licensed ARM cores in their server CPUs.
What is the incentive for any of them to make a high performance RISC-V CPU? The only reason I can think of is that Softbank keeps raising ARM licensing costs and it gets high enough that it is more profitable to hire a team and design your own RISC-V CPU.
adgjlsfhk1 11 hours ago [-]
Of your list, Qualcomm and Nvidia are fairly likely to make high perf Riscv cpus. Qualcomm because Arm sued them to try and stop them from designing their own arm chips without paying a lot more money, and Nvidia because they already have a lot of teams making riscv chips, so it seems likely that they will try to unify on the one that doesn't require licensing.
lizknope 10 hours ago [-]
Yeah, they could but then what is the market? Qualcomm wants to sell smartphone chips and Android can run on RISC-V and most Android Java apps could in theory run.
But if you look at the Intel x86 smartphone chips from about 10 years ago they had to make an ARM to x86 emulator because even the Java apps contained native ARM instructions for performance reasons.
Qualcomm is trying to push their ARM Snapdragon chips in Windows laptops but I don't think they are selling well.
Nvidia could also make RISC-V based chips but where would they go? Nvidia is moving further away from the consumer space to the data center space. So even if Nvidia made a really fast RISC-V CPU it would probably be for the server / data center market and they may not even sell it to ordinary consumers.
Or if they did it could be like the Ampere ARM chips for servers. Yeah you can buy one as an ordinary consumer but they were in the $4,000 range last time I looked. How many people are going to buy that?
adgjlsfhk1 8 hours ago [-]
> Qualcomm is trying to push their ARM Snapdragon chips in Windows laptops but I don't think they are selling well.
That definitely seems to be the case. I think they likely would have more luck with Riscv phones (much less app brand loyalty) or servers (Arm in the server has done a lot better than on Windows).
For Nvidia, if they made a consumer riscv cpu it would be a gaming handheld/console (Switch 3 or similar) once the AI bubble pops. Before that, likely would be server cpus that cost $10k for big AI systems. Before that, I could see them expanding the role of Riscv in their GPUs (likely not visible to users).
lizknope 8 hours ago [-]
Many PC hardware enthusiasts say they want a RISC-V or ARM CPU but then when these systems exist they don't actually want them.
Why? Because they want something like a $300 CPU and $150 motherboard using standard DDR4/5 DIMMs that is RISC-V or ARM or something not x86 but is faster than x86. The sub $1000 systems that hardware companies make that are RISC-V or ARM chips are low end embedded single board systems that are too slow for these people. The really fast systems are $4000 server level chips that they can't afford. The only company really bringing fast non-x86 CPUs with consumer level pricing is Apple. We can also include Qualcomm but I'm skeptical of the software infrastructure and compatibility since they are relying on x86 emulation for Windows.
benced 6 hours ago [-]
China is likely where it would come from - ARM and x86 are owned by Western companies.
rwmj 13 hours ago [-]
RISC-V doesn't have the pitfalls of Sparc (register windows, branch delay slots), largely because we learned from that. It's in fact a very "boring" architecture. There's no one that expects it'll be hard to optimize for. There are at least 2 designs that have taped out in small runs and have high end performance.
adrian_b 13 hours ago [-]
RISC-V does not have the pitfalls of experimental ISAs from 45 years ago, but it has other pitfalls that have not existed in almost any ISA since the first vacuum-tube computers, like the lack of means for integer overflow detection and the lack of indexed addressing.
Especially the lack of integer overflow detection is a choice of great stupidity, for which there exists no excuse.
Detecting integer overflow in hardware is extremely cheap, its cost is absolutely negligible. On the other hand, detecting integer overflow in software is extremely expensive, increasing both the program size and the execution time considerably, because each arithmetic operation must be replaced by multiple operations.
Because of the unacceptable cost, normal RISC-V programs choose to ignore the risk of overflows, which makes them unreliable.
The highest performance implementations of RISC-V from previous years were forced to introduce custom extensions for indexed addressing, but those used inefficient encodings, because something like indexed addressing must be in the base ISA, not in an extension.
hackyhacky 13 hours ago [-]
> On the other hand, detecting integer overflow in software is extremely expensive, increasing both the program size and the execution time considerably,
Most languages don't care about integer overflow. Your typical C program will happily wrap around.
If I really want to detect overflow, I can do this:
    add t0, a0, a1
    blt t0, a0, overflow
Which is one more instruction, which is not great, not terrible.
From what I’ve read most native compiled code doesn’t really check for overflows in optimised builds, but this is more of an issue for JavaScript et al where they may detect the overflow and switch the underlying type? I’m definitely no expert on this.
sitharus 9 hours ago [-]
A bit more reading shows there's a three instruction general case version for 32-bit additions on the 64-bit RISC-V ISA. I'm not familiar with RISC-V assembly and they didn't provide an example, but I _think_ it's as easy as this since 64-bit add wouldn't match the 32-bit overflowed add.
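No snippet follows in this copy of the thread. As a rough stand-in (not the poster's code), here is a small Python model of the idea as I read it: on RV64, do the addition once at full 64-bit width (add) and once as a sign-extended 32-bit add (addw), then branch if the two results differ (bne). The helper names are made up for illustration.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


def add(a, b):
    """Model of RV64 `add`: 64-bit wrapping add, returned as a signed value."""
    return sign_extend(a + b, 64)


def addw(a, b):
    """Model of RV64 `addw`: add, keep the low 32 bits, sign-extend to 64."""
    return sign_extend(a + b, 32)


def add32_overflows(a, b):
    """Check a 32-bit signed add for overflow.

    a and b are assumed to be 32-bit signed values held sign-extended in
    64-bit registers, which is how RV64 normally keeps 32-bit integers.
    """
    return add(a, b) != addw(a, b)   # add, addw, bne
```

With sign-extended 32-bit inputs the 64-bit add can never wrap, so any disagreement between the two results means the 32-bit result did not fit.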
Neither x86-64 nor RISC-V is implemented by running each single instruction. They both recognize patterns in the code and translate those into micro-ops. On high performance chips like Rivos's (now Meta's) I doubt there'd be any difference in the amount of work done.
Code size is a benefit for x86-64 however - no one is arguing that - but you have to trade that against the difficulty of instruction decoding.
adrian_b 12 hours ago [-]
That is not the correct way to test for integer overflow.
The correct sequence of instructions is given in the RISC-V documentation and it needs more instructions.
"Integer overflow" means "overflow in operations with signed integers". It does not mean "overflow in operations with non-negative integers". The latter is normally referred to as "carry".
The 2 instructions given above detect carry, not overflow.
Carry is needed for multi-word operations, and these are also painful on RISC-V, but overflow detection is required much more frequently, i.e. it is needed at any arithmetic operation, unless it can be proven by static program analysis that overflow is impossible at that operation.
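To make the carry/overflow distinction concrete, here is a hedged Python model of 64-bit registers (function names are illustrative, not from any library). The first check is the unsigned carry test, which the RISC-V unprivileged spec writes as add followed by bltu. The second follows the general signed sequence from the same spec (add, slti, slt, bne): overflow happened exactly when "b is negative" disagrees with "the sum is less than a".

```python
BITS = 64
MASK = (1 << BITS) - 1


def to_signed(value):
    """Reinterpret a 64-bit pattern as a two's-complement signed integer."""
    value &= MASK
    return value - (1 << BITS) if value >> (BITS - 1) else value


def carry_out(a, b):
    """Unsigned check: the wrapped sum is smaller than one of the operands."""
    a &= MASK
    b &= MASK
    return ((a + b) & MASK) < a


def signed_overflow(a, b):
    """General signed check on signed 64-bit inputs a and b."""
    total = to_signed(a + b)         # add  t0, t1, t2   (wraps at 64 bits)
    b_negative = b < 0               # slti t3, t2, 0
    sum_less = total < a             # slt  t4, t0, t1
    return b_negative != sum_less    # bne  t3, t4, overflow
```

The two conditions are independent: -1 + 1 produces a carry without overflowing, while (2**63 - 1) + 1 overflows without producing a carry.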
refulgentis 12 hours ago [-]
I have no idea or practical experience with anything this low-level, so idk how much the following matters, it's just someone from the crowd offering unvarnished impressions:
It's easy to believe you're replying to something that has an element of hyperbole.
It's hard to believe "just do 2x as many instructions" and "ehhh who cares [i.e. your typical C program doesn't check for overflow]", coupled to a seemingly self-conscious repetition of a quip from the television series Chernobyl that is meant to reference sticking your head in the sand, retire the issue from discussion.
adrian_b 12 hours ago [-]
There was no hyperbole in what I have said.
The sequence of instructions given above is incorrect, it does not detect integer overflow (i.e. signed integer overflow). It detects carry, which is something else.
The correct sequence, which can be found in the official RISC-V documentation, requires more instructions.
Not checking for overflow in C programs is a serious mistake. All decent C compilers have compilation options for enabling overflow checking (in GCC and Clang, for example, -ftrapv or -fsanitize=signed-integer-overflow). Such options should always be used, except in functions that the programmer has analyzed carefully and concluded cannot overflow.
For example with operations involving counters or indices, overflow cannot normally happen, so in such places overflow checking may be disabled.
adgjlsfhk1 12 hours ago [-]
> On the other hand, detecting integer overflow in software is extremely expensive
this just isn't true. both addition and multiplication can check for overflow in <2 instructions.
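For anyone who wants to count instructions for the multiply case, here is a hedged Python model of one common way to check a signed 64-bit multiplication. The choice of sequence is my assumption, not something taken from this comment: compute the low half with mul, the high half with mulh, and compare the high half against the sign-fill of the low half.

```python
BITS = 64
MASK = (1 << BITS) - 1


def to_signed(value):
    """Reinterpret a 64-bit pattern as a two's-complement signed integer."""
    value &= MASK
    return value - (1 << BITS) if value >> (BITS - 1) else value


def mul(a, b):
    """Model of `mul`: low 64 bits of the product, as a signed value."""
    return to_signed(a * b)


def mulh(a, b):
    """Model of `mulh`: high 64 bits of the signed 128-bit product."""
    return (a * b) >> BITS           # Python's >> on ints is arithmetic


def mul_overflows(a, b):
    """True if a * b does not fit in a signed 64-bit register."""
    low = mul(a, b)                  # mul  t0, a0, a1
    high = mulh(a, b)                # mulh t1, a0, a1
    sign_fill = low >> (BITS - 1)    # srai t2, t0, 63  (0 or -1)
    return high != sign_fill         # bne  t1, t2, overflow
```

In this model the check costs a second multiply, a shift and a branch on top of the multiply itself.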
+1 -- misinformation is best corrected quickly. If not, AI will propagate it and many will believe the erroneous information. I guess that would be viral hallucinations.
classichasclass 13 hours ago [-]
As a counterexample, I point to another relatively boring RISC, PA-RISC. It took off not (just) because the architecture was straightforward, but because HP poured cash into making it quick, and PA-RISC continued to be a very competitive architecture until the mass insanity of Itanic arrived. I don't see RISC-V vendors making that level of investment, either because they won't (selling to cheap markets) or can't (no capacity or funding), and a cynical take would say they hide them behind NDAs so no one can look behind the curtain.
I know this is a very negative take. I don't try to hide my pro-Power ISA bias, but that doesn't mean I wouldn't like another choice. So far, however, I've been repeatedly disappointed by RISC-V. It's always "five or six years" from getting there.
adrian_b 11 hours ago [-]
I would not call PA-RISC boring. Already at launch there was no doubt that it was a better ISA than SPARC or MIPS, and later it was improved. At the time when PA-RISC 2.0 was replaced by Itanium it was not at all clear which of the 2 ISAs was better. The later failures to design high-performance Itanium CPUs make it plausible that if HP had kept PA-RISC 2.0 they might have had more competitive CPUs than with Itanium.
SPARC (derived from Berkeley RISC) and MIPS were pioneers that experimented with various features or lack of features, but they were inferior from many points of view to the earlier IBM 801.
The RISC ISAs developed later, including ARM, HP PA-RISC and IBM POWER, have avoided some of the mistakes of SPARC and MIPS, while also taking some features from IBM 801 (e.g. its addressing modes), so they were better.
burntoutgray 10 hours ago [-]
ISAs fail to gain traction when the sufficiently smart compilers don't eventuate.
The x86-64 is a dog's breakfast of features. But due to its widespread use, compiler writers make the effort to create compilers that optimize for its quirks.
Itanium hardware designers were expecting the compiler writers to cater for its unique design. Intel is a semi company. As good as some of their compilers are, internally they invested more in their biggest seller and the Itanium never got the level of support that was anticipated at the outset.
pjmlp 4 hours ago [-]
I am a firm believer that if AMD wasn't in the position to be able to come up with AMD64 architecture, eventually those Itanium issues would have been sorted out, Windows XP was already there and there was no other way for 64 bit going forward.
imtringued 1 hours ago [-]
I don't know anything about Itanium in particular, but AMD's NPU uses a VLIW architecture and they had to break backwards compatibility in the ISA for the second generation NPU (XDNA2) to get better performance.
classichasclass 10 hours ago [-]
I mean "boring" in the sense that its ISA was relatively straightforward, no performance-entangling kinks like delay slots, a good set of typical non-windowed GPRs, no wild or exotic operations. And POWER/PowerPC and PA-RISC weren't a lot later than SPARC or MIPS, either.
fidotron 13 hours ago [-]
> RISC-V doesn't have the pitfalls of Sparc (register windows, branch delay slots),
You're saying ISA design does have implementation performance implications then? ;)
> There's no one that expects it'll be hard to optimize for
[Raises hand]
> There are at least 2 designs that have taped out in small runs and have high end performance.
Are these public?
Edit: I should add, I'm well aware of the cultural mismatch between HN and the semi industry, and have been caught in it more than a few times, but I also know the semi industry well enough to not trust anything they say. (Everything from well meaning but optimistic through to outright malicious depending on the company).
rwmj 13 hours ago [-]
The 2 designs I'm thinking of are (tiresomely) under NDA, although I'm sure others will be able to say what they are. Last November I had a sample of one of them in my hand and played with the silicon at their labs, running a bunch of AI workloads. They didn't let me take notes or photographs.
> There's no one that expects it'll be hard to optimize for
No one who is an expert in the field, and we (at Red Hat) talk to them routinely.
saagarjha 1 hours ago [-]
Expert here, are these made for general purpose workloads or do you expect them to be fast for AI only?
mastax 7 hours ago [-]
I assume the TensTorrent TT-Ascalon is one of the CPU designs.
Findecanor 11 hours ago [-]
Because today, getting a fast CPU out isn't as much an engineering issue as it is about getting the investment for hiring a world-class fab.
The most promising RISC-V companies today have not set out to compete directly with Intel, AMD, Apple or Samsung, but are targeting a niche such as AI, HPC and/or high-end embedded such as automotive.
And you can bet that Qualcomm has RISC-V designs in-house, but is only making ARM chips right now because ARM is where the market for smartphone and desktop SoCs is.
Once Google starts allowing RVA23 on Android / ChromeOS, the flood gates will open.
adgjlsfhk1 11 hours ago [-]
It's very much both. You need millions of dollars for the fab, but you also need ~5 years to get 3 generations of cpus out (to fix all the performance bugs you find in the first two)
gt0 13 hours ago [-]
I don't think anybody suggests Oracle couldn't make faster SPARC processors, it's just that development of SPARC ended almost 10 years ago. At the time SPARC was abandoned, it was very competitive.
twoodfin 11 hours ago [-]
In single-threaded performance? That’s not how I remember it: Sun was pushing parallel throughput over everything else, with designs like the T-Series & Rock.
gt0 10 hours ago [-]
Perhaps not single thread, but Rock was a dead end a while before Oracle pulled the plug, and Sun/Oracle's core market of course was always servers not workstations. We used Niagara machines at my work around the T2 era, a long time ago, but they were very competitive if you could saturate the cores and had the RAM to back it up.
twoodfin 10 hours ago [-]
Sure, my work got a few of the Niagaras too and they were tremendous build machines for Solaris software.
But if you’re judging an ISA by performance scalability, you generally want to look at single-threaded performance.
icedchai 9 hours ago [-]
Sparc stopped being competitive in the early 2000s.
snvzz 9 hours ago [-]
Fast, RVA23-compatible microarchitectures already exist. Everything high performance seems to be based on RVA23, which is the current application profile and comparable to ARMv9 and x86-64v4.
However, it takes time from microarchitecture to chips, and from chips to products on shelves.
The very first RVA23-compatible chips to show up will likely be the SpacemiT K3 SoC, due in development boards in April (i.e. next month).
More of them, and more performant ones, are coming this summer, such as a development board with the Tenstorrent Ascalon CPU in the form of the Atlantis SoC, which was taped out recently.
It is even possible such designs will show up in products aimed at the general public within the present year.
rwmj 14 hours ago [-]
Marcin is working with us on RISC-V enablement for Fedora and RHEL, he's well aware of the problem with current implementations. We're hopeful that this'll be pretty much resolved by the end of the year.
LeFantome 11 hours ago [-]
If he expects it to be resolved by the end of the year (and I agree it likely will be), why is he writing a post like this?
Is this because Fedora 44 is going to beta?
Dwedit 13 hours ago [-]
There's the ARM video from LowSpecGamer, where they talk about how they forgot to connect power to the chip, and it was still executing code anyway. According to Steve Furber, the chip was accidentally being powered from the protection diodes alone. So ARM was incredibly power efficient from the very beginning.
cogman10 14 hours ago [-]
> AND the software with no architecture-specific optimisations
The optimizations that'd be applied to ARM and MIPS would be equally applicable to RISC-V. I do not believe this is a lack of software optimization issue.
We are well past the days where hand written assembly gives much benefit, and modern compilers like gcc and llvm do nearly identical work right up until it comes to instruction emissions (including determining where SIMD instructions could be placed).
Unless these chips have very very weird performance characteristics (like the weirdness around x86's lea instruction being used for arithmetic) there's just not going to be a lot of missed heuristics.
hrmtst93837 13 hours ago [-]
One thing compilers still struggle with is exploiting weird microarchitectural quirks or timing behaviors that aren't obvious from the ISA spec, especially with memory, cache and pipeline tuning. If a new RISC-V core doesn't expose the same prefetching tricks or has odd branch prediction you won't get parity just by porting the same backend. If you want peak numbers sometimes you do still need to tune libraries or even sprinkle in a bit of inline asm despite all the "let the compiler handle it" dogma.
cogman10 13 hours ago [-]
While true, it's typically not going to be impactful on system performance.
There's a reason, for example, why the linux distros all target a generic x86 architecture rather than a specific architecture.
Some applications may target a generic x86 architecture without any impact on performance.
However, other applications which must do cryptographic operations, audio/video processing, scientific/technical/engineering computing, etc. may have wildly different performances when compiled for different x86-64 ISA versions, for which dedicated assembly-language functions exist.
cogman10 12 hours ago [-]
Granted, these applications do exist. They are simply becoming more and more rare. I'd also say that there's been a pretty steady dedicated effort to abstracting the assembly. It's still pretty low level, as in you are caring about the specific instructions being used, but it's also not quite assembly in both C++/rust.
Java, interestingly enough, is somewhat leading the way here with their Vector API. I think they actually have one of the better setups for allowing someone to write fast code that is platform independent.
C++ is also diving into this realm. C++26 just merged in new SIMD facilities.
That is the bulk of the benefit of diving down into assembly.
I would not say that such applications are becoming more and more rare.
Most of the applications whose performance matters for me, because I must wait a non-negligible time for them to do their job, are dependent on assembly implementation for certain functions invoked inside critical loops. I do not see any sign of replacements for them. On the contrary, Intel, AMD and Arm continue to introduce special instructions that are useful in certain niche applications and taking advantage of them will require additional assembly language functions, not less.
For me, there is only one application that I use and which consumes non-negligible computer time and which does not depend on SIMD optimizations, which is the compilation of software projects.
CyberDildonics 9 hours ago [-]
> audio/video processing, scientific/technical/engineering computing, etc. may have wildly different performances when compiled for different x86-64 ISA versions
This is pretty vague and makes it sound like there are big differences in instruction sets.
In actuality it comes down to memory access first, which has nothing to do with instructions.
After that it comes down to simple SIMD/AVX instructions and not some exotic entirely different instruction set.
CyberDildonics 9 hours ago [-]
The things you are talking about are taken care of by out of order execution and the CPU itself being smart about how it executes. Putting in prefetch instructions rarely beats the actual prefetcher itself. Compilers didn't end up generating perfect pentium asm either. OOO execution is what changed the game in not needing perfect compiler output any more.
bobmcnamara 13 hours ago [-]
> The optimizations that'd be applied to ARM and MIPS would be equally applicable to RISC-V.
There's no carry bit, and no widening multiply (or MAC)
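As a hedged illustration of what the missing carry flag means for multi-word arithmetic, here is a Python model of a 128-bit add built from two 64-bit limbs. Without a flags register, the carry out of the low limb has to be recomputed with an unsigned compare (sltu on RISC-V) and then added into the high limb as a separate step. Function and variable names are illustrative.

```python
MASK64 = (1 << 64) - 1


def add128(a_lo, a_hi, b_lo, b_hi):
    """Add two 128-bit values given as (low, high) unsigned 64-bit limbs.

    Returns the (low, high) limbs of the sum, wrapping at 128 bits.
    """
    lo = (a_lo + b_lo) & MASK64      # add  lo, a_lo, b_lo
    carry = 1 if lo < a_lo else 0    # sltu carry, lo, a_lo
    hi = (a_hi + b_hi) & MASK64      # add  hi, a_hi, b_hi
    hi = (hi + carry) & MASK64       # add  hi, hi, carry
    return lo, hi
```

On an ISA with a carry flag the same operation is typically an add followed by an add-with-carry, so the model above spends two extra instructions per additional limb.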
Findecanor 11 hours ago [-]
RISC-V splits widening multiply out into two instructions: one for the high bits and one for the low. Just like 64-bit ARM does.
Integer MAC doesn't exist, and is also hindered by a design decision not to require more than two source operands, so as to allow simple implementations to stay simple.
The same reason also prevents RISC-V from having a true conditional move instruction: there is one but the second operand is hard-coded zero.
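I take the instruction meant here to be the conditional-zero pair from the Zicond extension (czero.eqz and czero.nez); that reading is my assumption. A hedged Python model of how a full select is then composed from two conditional zeros and an or, with each step reading at most two source registers:

```python
def czero_eqz(value, cond):
    """Model of czero.eqz: result is 0 if cond == 0, otherwise value."""
    return 0 if cond == 0 else value


def czero_nez(value, cond):
    """Model of czero.nez: result is 0 if cond != 0, otherwise value."""
    return 0 if cond != 0 else value


def select(cond, if_true, if_false):
    """cond ? if_true : if_false, built from three two-source operations."""
    kept_true = czero_eqz(if_true, cond)     # czero.eqz t0, a1, a0
    kept_false = czero_nez(if_false, cond)   # czero.nez t1, a2, a0
    return kept_true | kept_false            # or        a0, t0, t1
```

Exactly one of the two intermediate values is zeroed for any cond, so the or simply passes the surviving one through.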
FMAC exists, but only because it is in the IEEE 754 spec ... and it requires significant op-code space.
bsder 12 hours ago [-]
> Don't blame the ISA - blame the silicon implementations
That's true, but tautological.
The issue is that the RISC-V core is the easy part of the problem, and nobody seems to even be able to generate a chip that gets that right without weirdness and quirks.
The more fundamental technical problem is that things like the cache organization and DDR interface and PCI interface and ... cannot just be synthesized. They require analog/RF VLSI designers doing things like clock forwarding and signal integrity analysis. If you get them wrong, your performance tanks, and, so far, everybody has gotten them wrong in various ways.
The business problem is the fact that everybody wants to be the "performance" RISC-V vendor, but nobody wants to be the "embedded" RISC-V vendor. This is a problem because practically anybody who is willing to cough up for a "performance" processor is almost completely insensitive to any cost premium that ARM demands. The embedded space is hugely sensitive to cost, but nobody is willing to step into it because that requires that you do icky ecosystem things like marketing, software, debugging tools, inventory distribution, etc.
This leads to the US business problem which is the fact that everybody wants to be an IP vendor and nobody wants to ship a damn chip. Consequently, if I want actual RISC-V hardware, I'm stuck dealing with Chinese vendors of various levels of dodginess.
userbinator 7 hours ago [-]
ARM was never a "speed demon"; it started out as a low power small-area core and clearly had more complexity and thought put into it than MIPS or RISC-V.
Strong doubt. Those of us who were around in the 90s might remember how much hype there was with MIPS.
rbanffy 2 hours ago [-]
I don’t think you remember, but the first Archimedes smoked the just-launched Compaq 386s with a dedicated 387 coprocessor.
It was not designed to be one, but it ended up being surprisingly fast.
api 14 hours ago [-]
A pattern I've noticed for a very long time:
A lot of times the path to the highest performing CPU seems to be to optimize for power first, then speed, then repeat. That's because power and heat are a major design constraint that limits speed.
I first noticed this way back with the Pentium 4 "Netburst" architecture vs. the smaller x86 cores that became the ancestor of the Core architecture. Intel eventually ran into a wall with P4 and then branched high performance cores off those lower-power ones and that's what gave us the venerable Core architecture that made Intel the dominant CPU maker for over a decade.
ARM's history is another example.
cpgxiii 13 hours ago [-]
I think the story is a bit more complicated. Core succeeded precisely because Intel had both the low-power experience with Pentium-M and the high-power experience with Netburst. The P4 architecture told them a lot about what was and wasn't viable and at what complexity. When you look at the successor generations from Core, what you see are a lot of more complex P4-like features being re-added, but with the benefits of improved microarch and fab processes. Obviously we will never know, but I don't think you would get to Haswell or Skylake in the form they were without the learning experience of the P4.
In comparison, I think Arm is actually a very strong cautionary tale that focusing on power will not get you to performance. Arm processors remained pretty poor performers until designers from other CPU families entirely (PowerPC and Intel) took it on at Apple and basically dragged Arm to the performance level it is at today.
maximilianburke 10 hours ago [-]
And not just any PowerPC architects either, but the people from PA Semi. Motorola couldn't get the speed up and IBM couldn't get the power down.
userbinator 6 hours ago [-]
NetBurst was supposed to be the application of RISC principles to x86 taken to its extreme (ultra-long pipelines to reduce clock-to-clock delay, highest clock speed possible --- basically reducing work-per-clock and hoping that reduces complexity enough to increase clock speed to compensate.) The ALU was 16 bits, "double pumped" with the carry split between the two, which led to 32-bit ALU operations that don't carry between the lower and upper halves actually finishing a clock cycle faster than those with a carry.
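A rough Python model of the split add as described in this comment (a sketch of the description only, not of the actual circuit): the low 16 bits are added first, and the upper 16 bits can only be finished once the carry out of the low half is known. The function name and the returned flag are made up for illustration.

```python
HALF_BITS = 16
HALF_MASK = (1 << HALF_BITS) - 1


def staggered_add32(a, b):
    """32-bit wrapping add done as two 16-bit steps.

    Returns (result, carry_crossed), where carry_crossed says whether a
    carry had to travel from the low half into the high half.
    """
    low = (a & HALF_MASK) + (b & HALF_MASK)          # first 16-bit step
    carry = low >> HALF_BITS                         # 0 or 1
    high = (((a >> HALF_BITS) & HALF_MASK)
            + ((b >> HALF_BITS) & HALF_MASK)
            + carry)                                 # second 16-bit step
    result = ((high & HALF_MASK) << HALF_BITS) | (low & HALF_MASK)
    return result, bool(carry)
```

For example, staggered_add32(0x0000FFFF, 1) needs the carry to cross the halves, while staggered_add32(0x00010000, 0x00020000) does not.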
I don’t have a micro architecture background so I apologize if this is obvious — What do power and speed mean in this context?
McP 14 hours ago [-]
Power - how many Watts does it need?
Speed - how quickly can it perform operations?
wmf 12 hours ago [-]
You can get low power with a simple design at a low clock. This definitely will not help achieve high performance later.
weebull 12 hours ago [-]
Clock rate isn't the only factor. A design can be power hungry at a low clock rate if designed badly, and if it is... you're never getting that thing running fast.
unethical_ban 14 hours ago [-]
One could say "Optimize for efficiency first, then performance".
cptskippy 13 hours ago [-]
Core evolved from the Banias (Centrino) CPU core which was based on P3, not P4. Banias used the front-side bus from P4 but not the cores.
Banias was hyper optimized for power, the mantra was to get done quickly and go to sleep to save power. Somewhere along the line someone said "hey what happens if we don't go to sleep?" and Core was born.
jauntywundrkind 13 hours ago [-]
Parallels to code design, where optimizing data or code size can end up having fantastic performance benefits (sometimes).
dmitrygr 14 hours ago [-]
IF you care to read the article, they indeed do not blame the architecture but the available silicon implementations.
topspin 14 hours ago [-]
I keep checking in on Tenstorrent every few months thinking Keller is going to rock our world... losing hope.
At this point the most likely place for truly competitive RISC-V to appear is China.
Findecanor 10 hours ago [-]
Tenstorrent is supposedly taping out 8-wide Ascalon processors as we speak, with devboards projected to be available in Q2/Q3 this year.
BTW. Keller is also on the board of AheadComputing — founded by former Intel engineers behind the fabled "Royal Core".
topspin 8 hours ago [-]
I can't know what Ascalon will actually be, but back in April/May 2025 there were actual performance numbers presented by Tenstorrent, and I analyzed what was shown. I concluded that Ascalon would be the x86_64 equivalent of an i5-9600K.
That's useable for many applications, but it's not going to change the world. A lot of "micro PCs" with low power CPUs are well past that now. If that's what Ascalon turns out to be, it will amount to an SBC class device.
imtringued 28 minutes ago [-]
I don't know what bubble you are living in, but the i5-9600K is many steps up beyond "SBC class".
The Raspberry Pi 5 results on Geekbench 6 are all over the place. A score between 500 and 900 in single core and a 2000 multi core score.
Radxa X4 is an SBC based around the N100 and it basically gets the same or slightly higher performance as the Raspberry Pi 5.
Meanwhile the i5-9600K gets a score of 1677 in single core, which is 83% of the performance of the entire Raspberry Pi 5 and gets a score of 6199 when using multiple cores, that's 3x the performance.
I'd call this at least "Laptop class" and you even admitted yourself back in 2025 that you're using a processor on that level.
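The ratios behind those statements are easy to recompute. A minimal Python sketch using only the scores quoted in this comment (the variable names are mine, and the scores themselves are taken at face value):

```python
# Geekbench 6 scores as quoted in the comment above.
PI5_SINGLE_RANGE = (500, 900)
PI5_MULTI = 2000
I5_9600K_SINGLE = 1677
I5_9600K_MULTI = 6199


def ratio(numerator, denominator):
    """Plain quotient, kept as a function so each comparison is named."""
    return numerator / denominator


# One i5-9600K core compared with all Raspberry Pi 5 cores together.
single_vs_pi5_multi = ratio(I5_9600K_SINGLE, PI5_MULTI)

# All i5-9600K cores compared with all Raspberry Pi 5 cores.
multi_vs_pi5_multi = ratio(I5_9600K_MULTI, PI5_MULTI)

# One i5-9600K core compared with one Pi 5 core, for both ends of the range.
single_vs_pi5_single = tuple(
    ratio(I5_9600K_SINGLE, score) for score in PI5_SINGLE_RANGE
)

print(f"single core vs Pi 5 multi core: {single_vs_pi5_multi:.2f}")
print(f"multi core vs Pi 5 multi core:  {multi_vs_pi5_multi:.2f}")
```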
snvzz 9 hours ago [-]
>Ascalon tape out
Supposedly happened earlier this year. Tenstorrent says devboards in Q3.
Now we just wait.
rbanffy 14 hours ago [-]
> At this point the most likely place for fast RISC-V to appear is China.
Or we just adopt Loongson.
balou23 14 hours ago [-]
TBH I still don't really get how it's different from MIPS. As far as I can tell... Loongson seems to be really just MIPS, while LoongArch is MIPS with some extra instructions.
pantalaimon 13 hours ago [-]
They did get rid of the delay slots and some other MIPS oddities
bonzini 13 hours ago [-]
LoongArch is, on a first approximation, an almost RISC-V user space instruction set together with MIPS-like privileged instructions and registers.
mananaysiempre 11 hours ago [-]
Wait, this is a modern-ish ISA with a software-managed TLB, I didn’t realize that! The manual seems a bit unhappy about that part though:
> In the current version of this architecture specification, TLB refill and consistent maintenance between TLB and page tables are still [sic] all led by software.
But legally distinct! I guess calling it M○PS was not enough for plausible deniability.
genxy 13 hours ago [-]
ISAs shouldn't be patentable in the first place.
throawayonthe 13 hours ago [-]
(purely on vibes) loongson feels to me like an intermediate step/backup strategy rather than a longterm target (though they'll probably power govt equipment for decades of legacy either way :p)
rbanffy 14 hours ago [-]
I did read it. A Banana Pi is not the fastest developer platform. The title is misleading.
BTW, it's quite impressive how the s390x is so fast per core compared to the others. I mean, of course it's fast - we all knew that.
And don't let IBM legal see this can be considered a published benchmark, because they are very shy about s390x performance numbers.
Aurornis 14 hours ago [-]
> A Banana Pi is not the fastest developer platform.
What is the current fastest platform that isn’t exorbitantly expensive? Not upcoming releases, but something I can actually buy.
I check in every 3-6 months but the situation hasn’t changed significantly yet.
adgjlsfhk1 12 hours ago [-]
A P550 based board is the best you can get for now (~2-3x faster than the Banana Pi). In 2-3 months there should be a number of SpacemiT K3 chips that are ~4-6x faster than the Banana Pi and somewhat reasonably priced (~200-300). By the end of the year, however, you should be able to get an Ascalon chip which should be way way faster than that (roughly Apple M1/Zen 3 speed)
cestith 13 hours ago [-]
What is the current fastest ppc64le implementation that isn’t exorbitantly expensive? How about the s390x?
gt0 14 hours ago [-]
I was really surprised by the s390x performance, but I also don't really understand why there are build times listed by architecture, not the actual processors.
kpil 13 hours ago [-]
What's fast on Z platforms is typically IO rather than raw CPU - the platform can push a lot of parallel data. This is typically the bottleneck when compiling.
The cores are in my experience moderately fast at most. Note that there are a lot of licencing options and I think some are speed-capped - but I don't think that applies to IFL - a standard CPU licence-restricted to only run linux.
burntoutgray 10 hours ago [-]
I thought I read somewhere that Z CPUs run at 5GHz ??
rbanffy 14 hours ago [-]
Probably because that's just the infrastructure they have.
pantalaimon 13 hours ago [-]
i686 builds even faster
menaerus 14 hours ago [-]
Which risc-v implementation is considered fast?
patchnull 14 hours ago [-]
Nothing shipping today is really competitive with modern ARM or x86. The SiFive P870 and Tenstorrent Ascalon (Jim Keller's team) are the most anticipated high-performance designs, but neither is widely available. What you can actually buy today tops out around Cortex-A76 class single-thread performance at best, which is roughly where ARM was five or six years ago.
menaerus 14 hours ago [-]
I remember taking down some notes wrt SiFive P870 specs, comparing them to x86_64, and reaching the same conclusion. Narrower core width (4-wide vs 8-wide), lower clock frequency (peaks at 3GHz) and no turbo (?), limited support for vector execution (128-bit vs 512-bit), limited L1 bandwidth (1x 128-bit load/cycle?), limited FP compute (2x 128-bit vs 2x 512-bit), load queue is also inconveniently small with 48 entries (affecting already limited load bandwidth), unclear system memory bandwidth and how it scales wrt the number of cores (L3 contention) although for the latter they seem to use what AMD is doing (exclusive L3 cache per chiplet).
LeFantome 12 hours ago [-]
SpacemiT K3 is about the same performance as a Rockchip RK3588. So, 4 years ago?
Except the K3 kills it on AI (60 TOPS).
LeFantome 12 hours ago [-]
> Which risc-v implementation is considered fast?
SpacemiT K3 is 2010 Macbook performance single-core, 2019 Macbook Air multi-core, and better than M4 Apple Silicon for AI.
So I guess it depends on what you are going to do with it.
menaerus 4 hours ago [-]
M4 is 38 TOPS at INT8 precision whereas SpacemiT K3 is 60 TOPS at INT4 precision so at best they would be equal in "AI" performance but they are not because the rest of the K3 chip is much less capable than M4 (as I would expect).
E.g. M4 total system memory bandwidth is 120GB/s whereas K3 is 51GB/s, single core memory bandwidth is 100-120GB/s vs ~30GB/s. M4 has 10 CPU cores and neural engine with 16 cores whereas K3 has 8 CPU cores and 8 "AI" cores, K3 clock frequency is almost half the clock frequency in M4 etc. etc.
But anyway thanks for sharing, always good to learn about new hardware.
NooneAtAll3 14 hours ago [-]
DC-ROMA 2 is on the Raspberry Pi 4 level of performance last I heard
snvzz 9 hours ago [-]
>I did read it. A Banana Pi is not the fastest developer platform. The title is misleading.
Ironically, its SoC (spacemiT K1) is slower than the JH7110 used in the first mass-produced RISC-V SBC, VisionFive 2.
But unlike JH7110, it has vector 1.0, making it a very popular target.
Of course, none of these pre-RVA23 boards will be relevant anymore, once the first development boards with RVA23-compatible K3 ship next month.
These are also much faster than anything RISC-V currently purchasable. Developers have been playing with them for months through ssh access.
tromp 14 hours ago [-]
But they didn't reflect that in a title like "current RISC-V silicon Is Sloooow" ...
spiderice 14 hours ago [-]
Then how do you justify the title?
crest 12 hours ago [-]
RISC-V lacks a bunch of really useful relatively easy to implement instructions and most extensions are truly optional so you can't rely on them. That's the problem if you let a bunch of academics turn your ISA into a paper mill.
In theory you can spend a lot of effort to make a flawed ISA perform, but it will be neither easy nor pretty e.g. real world Linux distros can't distribute optimised packages for every uarch from dual-issue in-order RV64GC to 8-wide OoO RV64 with all the bells and whistles. Only in (deeply) embedded systems can you retarget the toolchain and optimise for each damn architecture subset you encounter.
kashyapc 10 hours ago [-]
Arm had 40 years to be where it is today. RISC-V is 15 years old. Some more patience is warranted.
Assuming they will keep their word, later this year Tenstorrent is supposed to ship their RVA23-based server development platform[1]. They announced[2] it at last year's NA RISC-V Summit. Let's see.
The ball is in the court of hardware vendors to cook some high-end silicon.
MIPS, which RISC-V is closely modeled after, is also roughly 4 decades old and was massively hyped in the early 90s as well.
cesaref 26 minutes ago [-]
Just out of interest, why aren't they cross compiling RISC-V? I thought that was common practice when targeting lower performing hardware. It seems odd to me that the build cycle on the target hardware is a metric that matters.
kashyapc 20 minutes ago [-]
Please skim the thread :) We've already discussed it twice. Fedora "mandates" native builds.
Build time on target hardware matters when you're re-building an entire Linux distribution (25000+ packages) every six months.
Levitating 14 hours ago [-]
This is why felix has been building the risc-v archlinux repositories[1] using the Milk-V Pioneer.
I think the ban of SOPHGO is partly to blame for the slow development.[2] They had the most performant and interesting SOCs. I had a bunch of pre-orders for the Milk-V Oasis before it was cancelled. It was supposed to come out a while ago, using the SG2380, supposedly much more performant than the Milk-V Titan mentioned in the article (which still isn't out).
It was also SOPHGO's SOCs that powered the crazy cheap/performant/versatile Milk-V DUO boards. They have the ability to switch ARM/RISC-V architecture.
Can you articulate why you think this ban impacted anything and what you think the ban applies to?
Levitating 1 hours ago [-]
I won't pretend to understand the geo-politics or rulings.
What I do know is since the ban, all ongoing products featuring SOPHGO SOCs were cancelled, and I haven't seen any products featuring them since. The SOPHGO forums have also closed down.
The Milk-V Oasis would have had 16 cores (SG2380 w/ SiFive P670), it was replaced by the Milk-V Megrez with just 4 cores (SiFive P550) for around the same price. The new Milk-V Titan has only 8. We're slowly catching up, but the performance is now one or two years behind what it could've been.
The SG2380 would've been the first desktop ready RISC-V SOC at an affordable price. I think it's still the only SOC made that used the SiFive P670 core.
echoangle 9 hours ago [-]
Is there a simple explanation why RISC-V software has to be built on a RISC-V system? Why is it so hard for compilers to compile for a different architecture? The general structure of the target architecture lives inside the compiler code and isn’t generated by introspecting the current system, right?
AnssiH 26 minutes ago [-]
The cross-compiler part itself is easy, but getting all the build scripting of tens of thousands of Fedora packages to work perfectly for cross-compiling would be a lot of work.
There are lots of small issues (libraries or headers not being found, wrong libraries or headers being found, build scripts trying to run the binaries they just built, wrong compiler being used, wrong flags being used, etc.) when trying to cross-compile arbitrary software.
All fixable (cross-compiling entire distributions is a thing), but a lot of work and an extra maintenance burden.
flowerthoughts 3 hours ago [-]
Old compilers tended to make it a compile-time switch which backends were included, probably because backends were "huge", so they were left out. (The insn lookup table in GCC took ages to generate and compile.) And of course all development environments running on Windows assumed x86 was the only architecture.
With LLVM existing, cross-compiling is not a problem anymore, but it means you can't run tests without an emulator. So it might just be easier to do it all on the target machine.
boredatoms 9 hours ago [-]
Underspecified build dependencies that use libraries/config on your host OS rather than the target system.
You can solve this on a per language basis, but the C/C++ ecosystem is messy.
So people use VMs or real hardware of the target arch to not have to think about it
anarazel 9 hours ago [-]
Cross building is possible, but it's rather useful to be able to test the software you just built... And often enough, tests take more resources than the build.
lifis 14 hours ago [-]
Or they could fix cross compilation and then compile it on a normal x86_64 server
mort96 3 hours ago [-]
Fixing cross compilation is a huge undertaking. So much software needs to be patched to be properly cross-compilable.
utopiah 2 hours ago [-]
FWIW check out dockcross/linux-riscv32 and dockcross/linux-riscv64 if compilation itself is your problem.
I set up a CopyParty server on a headless RISC-V SBC and it was a breeze. Just get the packets, do the thing, move on. Obviously it depends on your needs, but maybe you're not using the right workflow and are blaming the tools instead.
leni536 14 hours ago [-]
Is cross compilation out of the question?
STKFLT 13 hours ago [-]
I'd guess that the issue is running the `%install` and `%check` stages of the .spec file. The Python library rpy (to pull a random example from Marcin's PRs) runs rpy's pytest test suite and had to be modified to avoid running vector tests on RISC-V.
Obviously a solvable problem to split build and test but perhaps the time savings aren't worth the complexity.
Maybe the tests could be run with user-mode qemu instead of the whole thing running under qemu or on RISC-V hardware. Could possibly be more or less seamless with binfmt_misc being set up in the builders.
kashyapc 10 hours ago [-]
Near as I know, Fedora prefers native compilation for the builds.
Your question made me look up Arm's history in Fedora, and I came upon this 2012 LWN thread[1]. There was some discussion against cross-compilation already back then.
It's usually an enormous pain to set up. QEMU is probably the best option.
VorpalWay 3 hours ago [-]
Yocto, which we use at work, manages it just fine to build a whole embedded Linux distro. So I don't see why Fedora couldn't make it work if they wanted. You could even scp over the test suites to run that on native systems if you wanted.
mort96 3 hours ago [-]
Yocto manages it thanks to the tireless effort of a community of people maintaining patches and unholy hacks for a ton of software to make it cross compilable. And they have nowhere near the amount of recipes that Fedora has.
How does it handle .so version differences and glibc version differences between the container and the target system?
sofixa 14 hours ago [-]
Depends on the language, it's pretty trivial with Go.
Zambyte 13 hours ago [-]
Unless you use CGO. I've heard of people using Zig (which has great cross compilation for the Zig language as well) to cross compile C with CGO though.
IshKebab 13 hours ago [-]
Yes, but they're compiling binutils.
mrbluecoat 10 hours ago [-]
> Random mumblings of ARM developer ... RISC-V is sloooow
Old news. See also:
> Random mumblings of x86_64 developer ... ARM is sloooow
throwa356262 16 minutes ago [-]
What kind of ancient Arm hardware are they using here?
On a related note, SoC companies need to get their act together and start using the latest Arm cores. Even the mid-range cores of 1-2 years ago show a huge leap in performance:
If I'm reading their chart right, they have barely half as much memory for their RISC-V machine compared to any of the others? I don't know enough to know whether it's actually bottlenecked by memory, but it's a bit odd to claim it's slower, give those numbers, and not say anything about it. I'd hope they ruled that out as the source of the discrepancy, but it's hard to tell without confirmation.
Levitating 56 minutes ago [-]
I think it's mentioned clearly in the article.
> RISC-V builders have four or eight cores with 8, 16 or 32 GB of RAM (depending on a board)
> The UltraRISC UR-DP1000 SoC, present on the Milk-V Titan motherboard should improve situation a bit (and can have 64 GB ram).
RISC-V SOCs just typically don't support much ram. With the exception of the SG2042 which can take 128GB, but it's expensive, buggy and now old.
So I am sure it's a combination of low ram and low clockspeeds.
mkj 10 hours ago [-]
Does that page even say which RISC-V CPUs are being used that are slow? I couldn't see it, which seems a bit of pointless complaining.
Levitating 43 minutes ago [-]
> RISC-V builders have four or eight cores with 8, 16 or 32 GB of RAM (depending on a board).
Which boards are used specifically should not matter much. There's not much available.
Except for the Milk-V Pioneer, which has 64 cores and 128GB ram. But that's an older architecture and it's expensive.
AceJohnny2 10 hours ago [-]
There was a Mastodon post some time back (~1y?) where someone realized that the fastest RISC-V hardware they could get was still slower than running it on QEMU.
That's not how it usually works :\
RISC-V is certainly spreading across niches, but performant computing is not one of them.
Edit: lol the author mentions the same! Perhaps they were the source of the original Mastodon post I'm thinking of.
Levitating 46 minutes ago [-]
The Milk-V Pioneer breaks that barrier, it's expensive though. And the RISC-V architecture used is now old; the company that developed it was sanctioned by the US and is now dead.
sylware 1 hours ago [-]
The current hardware used is self-hosting mini-server grade, and certainly not on the latest silicon process. "Slow" is expected.
It is not the ISA, but the implementations and those horrible SDKs which need to be adjusted for RISC-V (actually any new ISA).
RISC-V needs extremely performant implementations, and on the best silicon process; until then RISC-V _will be_ "slow".
Not to mention, RISC-V is a 'standard ISA': software written in assembly is more than appropriate in many cases.
aa-jv 1 hours ago [-]
I don't care as long as it keeps my soldering iron hot.
srott 13 hours ago [-]
Couldn't it be caused by a slower compiler? E.g., what would be the difference when cross compiling the same code to aarch64 vs risc-v?
rbalint 14 hours ago [-]
If the builds are slow, build accelerators can help a lot. Ccache would work for sure and there is also firebuild, that can accelerate the linker phase and many other tools in builds.
OK, I'll bite. If this is a truly competitive core - I don't claim enough personal expertise to judge - does anyone fab and sell it? There should be a business case if it is.
luyu_wu 11 hours ago [-]
If I remember correctly, it was taped out by some company as some embedded core in a GPU?
I guess that may be the true use case for 'Open-Source' cores.
That being said, the advertised SPEC2007 scores are close to a M1 in IPC.
sltkr 13 hours ago [-]
Are you sure you are comparing apples with apples here?
The fact that i686 is 14% faster than x86_64 is a little suspicious, because usually the same software runs _faster_ on x86_64 (despite the increased memory use) thanks to a larger register set, an optimized ABI, and more vector instructions.
Of course, if you are compiling an i686 binary on i686, and an x86_64 binary on x86_64, then the compilers aren't really doing the same work, since their output is different. I'm not a compiler expert, but I could imagine that compiling x86_64 binaries is intrinsically slower than for i686 for a variety of reasons. For example, x86_64 is mostly a superset of i686, so a compiler has way more instructions to consider, including potential optimizations using e.g. SIMD instructions that don't exist on i686 at all. Or a compiler might assume a larger instruction cache size, by default, and do more unrolling or inlining when compiling for x86_64. And so on.
In that case, compiling on x86_64 is slower not because the hardware is bad but because the compiler does more work. Perhaps something similar is happening on RISC-V.
jmalicki 12 hours ago [-]
It isn't crazy uncommon to see i686 be faster - usually it means you're memory bandwidth bound.
But yeah, it may mean the benchmark is not representative.
fweimer 13 hours ago [-]
The x86-64 build runs about 50% more linker tests than the i686 build.
andrepd 14 hours ago [-]
There's zero mention of hardware specs or cost beyond architecture and core counts... What is the purpose of this post?
Anyway, it's hardly surprising that a young ISA with not a 1/1000th of the investment of x86 or ARM has slower chips than them x)
kashyapc 10 hours ago [-]
On benchmarks, for more precise details, I recommend the RISC-V Vector (RVV) benchmarks[1], maintained by Olaf Bernstein. He only covers the Vector stuff, but in great depth.
i. llvm presentation can thrash caches if set up wrong (given the plethora of fragmented RISC-V versions, most compilers won't cover every vanity silicon.)
ii. gcc is also "slow" in general, but is predictable/reliable
iii. emulation is always slower than kvm in qemu
It may seem silly, but I'd try a gcc build with -O0 flag, and a toy unit test with -S to see if the ASM is actually foobar. One may have to force the -mtune=boom flag to narrow your search. Best regards =3
brcmthrowaway 14 hours ago [-]
Why is it slow? I thought we have Rivos chips
rwmj 13 hours ago [-]
Rivos was acquired by Meta last year.
IshKebab 14 hours ago [-]
Yeah it's a few years behind ARM, but not that many. Imagine trying to compile this on ARM 10 years ago. It would be similarly painful.
kllrnohj 13 hours ago [-]
> Imagine trying to compile this on ARM 10 years ago
Cortex A57 is 14 years old and is significantly faster than the 9 year old Cortex A55 these RISC-V cores are being compared against.
So yes it's many years behind. Many, many years.
LeFantome 12 hours ago [-]
SpacemiT K3 is on par with Rockchip RK3588. So, about 4 years behind ARM.
Tenstorrent Atlantis (first Ascalon silicon) should ship in Q2/Q3 and be twice as fast. About as fast as Ryzen5. So, about 5 years behind AMD.
But even the K3 has faster AI than Apple Silicon or Qualcomm X Elite.
Current trend-lines suggest ARM64 and RISC-V performance parity before 2030.
ben-schaaf 10 minutes ago [-]
Not sure why you're taking the rk3588 as a milestone for ARM, when it's a low end chip using core designs that were old when it released. Cortex-A76 is from 2018, so if that's the yardstick then the K3 is 8 years behind. Even then at the time the A76 was released Apple was significantly ahead with their own ARM CPUs.
HerbManic 10 hours ago [-]
I love the optimism, but I do think your timeline is a little quick. It will be more like 10 years than 4.
kllrnohj 11 hours ago [-]
> SpacemiT K3 is on par with Rockchip RK3588. So, about 4 years behind ARM.
That'd be ~7 years behind, not 4. Cortex A76 came out in late 2018. Also what benchmarks are you looking at?
> Tenstorrent Atlantis (first Ascalon silicon) should ship in Q2/Q3 and be twice as fast. About as fast as Ryzen5. So, about 5 years behind AMD.
Which Ryzen 5? The first Ryzen 5 came out in 2017, which was a lot more than 5 years ago.
> But even the K3 has faster AI than Apple Silicon or Qualcomm X Elite.
Which isn't RISC-V. Might as well brag about a RISC-V CPU with an RTX 5090 being faster at CUDA than a Nintendo Switch. That's a coprocessor that has nothing to do with the ISA or CPU core.
> Current trend-lines suggest ARM64 and RISC-V performance parity before 2030.
L. O. fucking. L. That's not how this works. That's not how any of this works.
hackerInnen 14 hours ago [-]
This. While I doubt that there will be a good (whatever that means) desktop risc-v CPU anytime soon, I do think that it will eventually catch up in embedded systems and special applications. Maybe even high core count servers.
It just takes time, people who believe in it and tons of money. Will see where the journey goes, but I am a big risc-v believer
theodric 13 hours ago [-]
[flagged]
throwaway27448 14 hours ago [-]
[flagged]
primis 14 hours ago [-]
Hey! I get this is a throwaway account so you might not answer, but I really, really don't like opening an article and having the first thing I see in a thread be someone calling the author a slur. There are ways of expressing insult without bringing intellectual disabilities into the mix.
dmit 14 hours ago [-]
For future readers: throwaway27448's comment used to say something completely different, featuring the r-slur, and was then immediately edited.
throwaway27448 6 hours ago [-]
[flagged]
notenlish 2 hours ago [-]
Can you explain why you think the author is stupid?
throwaway27448 6 hours ago [-]
The author could try to not be retarded for once
ephou7 14 hours ago [-]
Ulrich Drepper, Lennart Poettering, this clown. Red Hat seems to have a skill of hiring savants with high technical and low social aptitude.
devl547 4 hours ago [-]
Is it RISC-V or bloated software full of layered abstractions?
[1] https://riscv-koji.fedoraproject.org/koji/taskinfo?taskID=91...
[2] Slides: https://fosdem.org/2026/events/attachments/SQGLW7-fedora-on-...
1) https://github.com/llvm/llvm-project/issues/150263
2) https://github.com/llvm/llvm-project/issues/141488
The RISC-V ecosystem being handicapped by backwards compatibility does not make sense at this point.
Every new RISC-V board is going to be RVA23 capable. Now is the time to draw a line in the sand.
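To make the "set your compiler target" point concrete, here is a minimal C sketch that reports what the compiler was told about the target. It assumes the toolchain defines the usual RISC-V predefined macros (`__riscv`, `__riscv_xlen`, and per-extension macros such as `__riscv_zba`, `__riscv_zbb`, `__riscv_zbs`, `__riscv_v`) when the matching extensions are enabled through `-march`. The function names are made up for illustration, and the check covers only a few of the extensions mentioned in this thread, not a full profile.

```c
/* Reports the register width the compiler is targeting, or 0 when the
 * target is not RISC-V at all. */
int target_xlen(void) {
#if defined(__riscv_xlen)
    return __riscv_xlen;
#else
    return 0;
#endif
}

/* Returns 1 only when the build was configured with Zba, Zbb, Zbs and V
 * enabled, i.e. when the compiler is allowed to emit those instructions.
 * This is a compile-time property; it says nothing about the CPU the
 * binary later runs on. */
int built_with_bitmanip_and_vector(void) {
#if defined(__riscv) && defined(__riscv_zba) && defined(__riscv_zbb) && \
    defined(__riscv_zbs) && defined(__riscv_v)
    return 1;
#else
    return 0;
#endif
}
```

A plain RV64GC build would be expected to return 0 from the second function, which is one way to notice that a benchmark was compiled without the extensions being argued about.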
I only use it for microcontrollers and it's really nice there. But yeah I can imagine it doesn't perform well on bigger stuff. The idea of risc was to put the intelligence in the compiler though, not the silicon.
X86-64 also has “profiles” which tell you what extensions should be available. There is x86-64v1 and x86-64v4 with v2 and v3 in the middle.
RVA23 offers a very similar feature-set to x86-64v4.
You do not end up with a mess of extensions. You get RVA23. Yes, RVA23 represents a set of mandatory extensions. The important thing is that two RVA23 compliant chips will implement the same ones.
But the most important point is that you cannot “just use x86-64”. Only Intel and AMD can do that. Anybody can build a RISC-V chip. You do not need permission.
2. Also, fundamentally all modern CPUs are still 64-bit version of 80386. MMU, protection, low level details are all same.
Uh, because you can't? It's not open in any meaningful sense.
See, for example, https://www.pingcap.com/blog/transparent-huge-pages-why-we-d...
Nope. See https://github.com/llvm/llvm-project/issues/110454 which was linked in the first issue. The spec authors have managed to make a mess even here.
Now they want to introduce yet another (sic!) extension Oilsm... It maaaaaay become part of RVA30, so in the best case scenario it will be decades before we will be able to rely on it widely (especially considering that RVA23 is likely to become heavily entrenched as "the default").
IMO the spec authors should've mandated that the base load/store instructions work only with aligned pointers and introduced misaligned instructions in a separate early extension. (After all, passing a misaligned pointer where your code does not expect it is a correctness issue.) But I would've been fine as well if they mandated that misaligned pointers should always be accepted. Instead we have to deal with the terrible middle ground.
>atomic memory operations are made mandatory in Ziccamoa
In other words, forget about potential performance advantages of load-link/store-conditional instructions. `compare_exchange` and `compare_exchange_weak` will always compile into the same instructions.
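For readers unfamiliar with the distinction, here is a hedged C11 sketch of where the weak form matters. `atomic_compare_exchange_weak_explicit` is allowed to fail spuriously, which is what lets a compiler map it to a single load-reserved/store-conditional attempt inside a retry loop. The helper name `atomic_fetch_max_u32` is made up for this example. Whether a given RISC-V target actually emits different code for the weak and strong forms depends on the compiler and the enabled extensions.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Atomically stores max(*target, candidate) and returns the value that
 * was in *target before the call. */
uint32_t atomic_fetch_max_u32(_Atomic uint32_t *target, uint32_t candidate) {
    uint32_t observed = atomic_load_explicit(target, memory_order_relaxed);
    /* Retry while the stored value is still smaller and the exchange fails.
     * On failure (real or spurious) `observed` is refreshed with the current
     * value, so the loop condition is re-evaluated against fresh data. */
    while (observed < candidate &&
           !atomic_compare_exchange_weak_explicit(target, &observed, candidate,
                                                  memory_order_acq_rel,
                                                  memory_order_relaxed)) {
    }
    return observed;
}
```

Because the loop already retries, a spurious failure costs one extra iteration, which is why the weak form is the usual choice in this pattern.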
And I guess you are fine with the page size part. I know there are huge-page-like proposals, but they do not resolve the fundamental issue.
I have other minor performance-related nits, such as the `seed` CSR being allowed to produce poor quality entropy, which means that we have to bring a whole CSPRNG if we want to generate a cryptographic key or nonce on a low-powered micro-controller.
By no means do I consider myself a RISC-V expert; if anything, my familiarity with the ISA as a systems language programmer is quite shallow, but the number of accumulated disappointments even from such shallow familiarity has cooled my enthusiasm for RISC-V quite significantly.
And where it actually mattered they did not introduce a separate extension. Integer division is significantly more complex than multiplication, so it may make sense for low-end microcontrollers to implement in hardware only the latter.
As for `seed`, if you're running on a microcontroller you can just look up the data sheet to see if its seed entropy is sufficient. By the time you get to CPUs where portable code is important a CSPRNG is probably fine.
I agree about page size though. Svnapot seems overly complicated and gives only a fraction of the advantages of actually bigger pages.
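As a rough illustration of why the base page size matters, the arithmetic can be sketched in C. The helper name is hypothetical, and the 16 KiB and 64 KiB figures are used only as examples of larger granules; this counts page-table leaf entries for a mapping and ignores huge pages and TLB details entirely.

```c
#include <stdint.h>

/* Number of pages needed to map `bytes` with a given page size, rounding
 * up. Each page needs its own translation, so this is also a lower bound
 * on the TLB entries required to cover the region without misses. */
uint64_t pages_needed(uint64_t bytes, uint64_t page_size) {
    return (bytes + page_size - 1) / page_size;
}
```

For a 1 GiB working set this gives 262144 pages at 4 KiB, 65536 at 16 KiB, and 16384 at 64 KiB, so the same TLB covers 4x or 16x more memory with the larger granules.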
It's a terrible attitude to have towards programmers, but looking at misaligned ops, I guess we can see a pattern from RISC-V authors here.
Most programmers do not target a concrete microcontroller and develop every line of code from scratch. They either develop portable libraries (e.g. https://docs.rs/getrandom) or build their projects using those libraries.
The whole raison d'être of an ISA is to provide a portable contract between hardware vendors and programmers. RISC-V authors shirk this responsibility with a "just look at your micro specs, lol" attitude.
aka, Zicclsm / RVA23 are entirely-useless as far as actually getting to make use of native misaligned loads/stores goes.
Right but it doesn't guarantee that anything is unreasonably slow does it? I am free to make an RVA23 compliant CPU with a div instruction that takes 10k cycles. Does that mean LLVM won't output div? At some point you're left with either -mcpu=<specific cpu> and falling back to reasonable assumptions about the actual hardware landscape.
Do ARM or x86 make any guarantees about the performance of misaligned loads/stores? I couldn't find anything.
Indeed one can make any instruction take basically-forever, but I think it's a fairly reasonable expectation that all supported hardware instructions/behaviors (at least non-deprecated ones) are not slower than a software implementation (on at least some inputs), else having said instruction is strictly-redundant.
And if any significant general-purpose hardware actually did a 10k-cycle div around the time the respective compiler defaults were decided, I think there's a good chance that software would have defaulted to calling division through a function such that an implementation can be picked depending on the running hardware. (let's ignore whether 10k-cycle-division and general-purpose-hardware would ever go together... but misaligned-mem-ops+general-purpose-hardware definitely do)
How is that different for RISC-V?
> I think it's a fairly reasonable expectation that all supported hardware instructions/behaviors (at least non-deprecated ones) are not slower than a software implementation
I agree! So just use misaligned loads if Zicclsm is supported. As you observed there's a feedback loop between what compilers output and what gets optimised in hardware. Since RVA23 hardware is basically non-existent at the moment you kind of have the opportunity to dictate to hardware "LLVM will use misaligned accesses on RVA23; if you make an RVA23 chip where this is horribly slow then people will laugh at you and assume it's some sort of silicon defect".
About 1/3 of the opcode space is used currently so there's a decent amount of space left.
The problem is decades of software being written on a chip that from the outside appears not to care.
This is the classic conundrum of legacy system redesign - if customers keep demanding every feature of the old system be present, and work the exact same then the new system will take on the baggage it was designed to get rid of.
The new implementation will be slow and buggy by this standard and nobody will use it.
If the CPU doesn't do it software must make many tiny conditional copies which is bad for branch prediction.
This sucks double when you have variable length vector operations... IMO fast unaligned memory accesses should have been mandatory without exceptions for all application-level profiles and everything with vector.
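The two code shapes being argued about in this subthread can be sketched in portable C. This is only an illustration with made-up function names: the `memcpy` form leaves the choice to the compiler, which typically lowers it to one load when it believes misaligned accesses are supported and fast on the target, and to byte loads otherwise. The second function is the explicit byte-by-byte fallback.

```c
#include <stdint.h>
#include <string.h>

/* Loads 4 bytes in native byte order from a possibly misaligned address.
 * memcpy avoids undefined behaviour; the compiler decides whether this
 * becomes one load instruction or several. */
uint32_t load_u32_unaligned(const unsigned char *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Explicit little-endian byte-wise load: four byte loads plus shifts and
 * ORs, roughly what a target without fast misaligned access ends up doing. */
uint32_t load_u32_le_bytewise(const unsigned char *p) {
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}
```

Comparing the generated assembly for `load_u32_unaligned` under different `-march`/`-mtune` settings is a quick way to see which assumption a given compiler makes.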
Neither are RISC nor modern.
I have only seen PDP-11 Assembly snippets in UNIX related books, wasn't aware of its alignment requirements.
It is quite likely that not allowing the misaligned access was also influenced by PDP-11.
This is primarily because the core is a teaching ISA. One of the best parts about RISC-V is that you can teach a freshman-level architecture class or a senior-level chip-building project with an ISA that is actually used. Anything powerful enough to run a Linux that isn't manually built from source will support a profile that bundles all the commonly needed instructions to be fast.
https://five-embeddev.com/riscv-bitmanip/1.0.0/bitmanip.html
I can see quite a few items on that list that imnsho should have been included in the core and for the life of me I can't see the rationale behind leaving them out. Even the most basic 8 bit CPU had various shifts and rolls baked in.
If a CPU built in 1985 with a grand total of 26 000 transistors could afford it, I am pretty sure that anything built in this century could afford it too.
You'd be excluding many small CPUs which exist within other chips running very specialized code.
As profiles mandate these instructions anyway, there's no good reason to complicate the most basic RISC-V possible.
RISC-V is the ISA for everything, from the smallest such CPUs to supercomputers.
To the best of my knowledge (and Google-fu), 26K really isn't a lot of transistors for an embedded MCU - at least not a fully-featured 32-bit one comparable to a minimal RISC-V core. An ARM Cortex M0, which is pretty much the smallest thing out there, is around 10K gates => around 40K transistors. This is also around the same size as a minimal RISC-V core AFAICT.
The ARM core has a shifter, though.
There are many chips in the market that do embed 8051s for janitorial tasks, because it is small and not legally encumbered. Some chips have several non-exposed tiny embedded CPUs within.
RISC-V is replacing many of these, bringing modern tooling. There's even open source designs like SERV that fit in a corner of an already small FPGA, leaving room for other purposes.
(Although I do have to eat my words here - I didn't check that Wikipedia page, and it does actually list a ~6K RISC-V core! It's an experimental academic prototype "made from a two-dimensional material [...] crafted from molybdenum disulfide"; I don't know if that construction might allow for a more efficient transistor count and it's totally impractical - 1KHz clock speed, 1-bit ALU, etc. - for almost any purpose, but it is technically a RISC-V implementation significantly smaller than 26K)
That sounds like a microcoded RISC-V implementation, which can really be done for any ISA at the extreme expense of speed.
Maybe other CPU's have it as well, though I do not have enough information on that.
This is actually kind of counter to your point. The really tiny micro-controllers from the 80s only had 224 bits of registers. RV32E is at least twice that (16 registers*32 bits), and modern mcus generally use 2-4kbs of sram, so the overhead of a 32 bit barrel shifter is pretty minimal.
Same could be said of MIPS.
My understanding is the RISC-V raison d'etre is rather avoidance of patented/copyrighted designs.
In spite of the currently mediocre RISC-V implementations, RISC-V seems to have more of a future and isn't clouded by ISA IP issues, as you note.
Regarding silicon implementations, consider that 1) you can synthesize it from HDL/RTL designs using modern CAD tools, and 2) MIPS was originally designed to be simple enough for grad students to implement with the primitive CAD tools of the 1980s (basically semi-manual layout).
Why did it fall to them to do it? Impressive that he did, but it shouldn't have been necessary.
https://wren.wtf/hazard3/doc/#extension-xh3bextm-section
There are also four other custom extensions implemented.
Page size can be easily extended down the line without breaking changes.
Not trolling: I legitimately don't see why this is assumed to be true. It is one of those things that is true only once it has been achieved. Otherwise we would be able to create super high performance Sparc or SuperH processors, and we don't.
As you note, Arm once was fast, then slow, then fast. RISC-V has never actually been fast. It has enabled surprisingly good implementations by small numbers of people, but competing at the high end (mobile, desktop or server) it is not.
I'm a chip designer and I see people using RISC-V as small processor cores for things like PCIE link training or various bookkeeping tasks. These don't need to be fast, they need to be small and low power which means they will be relatively slow.
Most people on tech review sites only care about desktop / laptop / server performance. They may know about some of the ARM Cortex A series CPUs that have MMUs and can run desktop or smartphone Linux versions.
They generally don't care about the ARM Cortex M or R versions for embedded and real time use. Those are the areas where you don't need high performance and where RISC-V is already replacing ARM.
EDIT:
I'll add that there are companies that COULD make a fast RISC-V implementation.
Intel, AMD, Apple, Qualcomm, or Nvidia could redirect their existing teams to design a high performance RISC-V CPU. But why should they? They are heavily invested in their existing x86 and ARM CPU lines. Amazon and Google are using licensed ARM cores in their server CPUs.
What is the incentive for any of them to make a high performance RISC-V CPU? The only reason I can think of is that Softbank keeps raising ARM licensing costs and it gets high enough that it is more profitable to hire a team and design your own RISC-V CPU.
But if you look at the Intel x86 smartphone chips from about 10 years ago they had to make an ARM to x86 emulator because even the Java apps contained native ARM instructions for performance reasons.
Qualcomm is trying to push their ARM Snapdragon chips in Windows laptops but I don't think they are selling well.
Nvidia could also make RISC-V based chips but where would they go? Nvidia is moving further away from the consumer space to the data center space. So even if Nvidia made a really fast RISC-V CPU it would probably be for the server / data center market and they may not even sell it to ordinary consumers.
Or if they did it could be like the Ampere ARM chips for servers. Yeah you can buy one as an ordinary consumer but they were in the $4,000 range last time I looked. How many people are going to buy that?
That definitely seems to be the case. I think they would likely have more luck with RISC-V phones (much less app brand loyalty) or servers (Arm in the server has done a lot better than on Windows).
For Nvidia, if they made a consumer RISC-V CPU it would be a gaming handheld/console (Switch 3 or similar) once the AI bubble pops. Before that, it would likely be server CPUs that cost $10k for big AI systems. Before that, I could see them expanding the role of RISC-V in their GPUs (likely not visible to users).
Why? Because they want something like a $300 CPU and $150 motherboard using standard DDR4/5 DIMMs that is RISC-V or ARM or something not x86 but is faster than x86. The sub $1000 systems that hardware companies make that are RISC-V or ARM chips are low end embedded single board systems that are too slow for these people. The really fast systems are $4000 server level chips that they can't afford. The only company really bringing fast non-x86 CPUs with consumer level pricing is Apple. We can also include Qualcomm but I'm skeptical of the software infrastructure and compatibility since they are relying on x86 emulation for windows.
Especially the lack of integer overflow detection is a choice of great stupidity, for which there exists no excuse.
Detecting integer overflow in hardware is extremely cheap, its cost is absolutely negligible. On the other hand, detecting integer overflow in software is extremely expensive, increasing both the program size and the execution time considerably, because each arithmetic operation must be replaced by multiple operations.
Because of the unacceptable cost, normal RISC-V programs choose to ignore the risk of overflows, which makes them unreliable.
The highest performance implementations of RISC-V from previous years were forced to introduce custom extensions for indexed addressing, but those used inefficient encodings, because something like indexed addressing must be in the base ISA, not in an extension.
Most languages don't care about integer overflow. Your typical C program will happily wrap around.
If I really want to detect overflow, I can do this:
Which is one more instruction, which is not great, not terrible.
And what did I find? Yep that code is right from the manual for unsigned integer overflow.
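A minimal sketch in C of the unsigned check being discussed here, for readers who want to see the idea without the assembly. The function name `add_u64_carry` is made up for illustration, and whether a compiler lowers this to exactly one add plus one compare-and-branch on RISC-V depends on the compiler and flags.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unsigned add with wraparound detection, no flags register needed.
 * Unsigned arithmetic in C wraps modulo 2^64, so the sum is smaller
 * than an operand exactly when the addition wrapped. */
static bool add_u64_carry(uint64_t a, uint64_t b, uint64_t *sum)
{
    *sum = a + b;     /* the add itself */
    return *sum < a;  /* the one extra compare */
}
```

Note that this only covers the unsigned case; signed addition needs a different test.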
For signed addition if you know one of the signs (eg it’s a compile time constant) the manual says
But the general case for signed addition if you need to check for overflow and don’t have knowledge of the signs
From what I’ve read most native compiled code doesn’t really check for overflows in optimised builds, but this is more of an issue for JavaScript et al where they may detect the overflow and switch the underlying type? I’m definitely no expert on this.
Code size is a benefit for x86-64 however - no one is arguing that - but you have to trade that against the difficulty of instruction decoding.
The correct sequence of instructions is given in the RISC-V documentation and it needs more instructions.
"Integer overflow" means "overflow in operations with signed integers". It does not mean "overflow in operations with non-negative integers". The latter is normally referred to as "carry".
The 2 instructions given above detect carry, not overflow.
Carry is needed for multi-word operations, and these are also painful on RISC-V, but overflow detection is required much more frequently, i.e. it is needed at any arithmetic operation, unless it can be proven by static program analysis that overflow is impossible at that operation.
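To make the carry-versus-overflow distinction concrete, here is a hedged C sketch of one common way to detect signed overflow without a flags register. The name `add_i64_overflows` is hypothetical, and this is an illustration of the general idea rather than a quote of any official instruction sequence: the add is followed by two comparisons and a test that they disagree.

```c
#include <stdbool.h>
#include <stdint.h>

/* Signed add with overflow detection using only adds and compares.
 * Assumption: converting an out-of-range uint64_t to int64_t wraps
 * (implementation-defined before C23; GCC and Clang do wrap). */
static bool add_i64_overflows(int64_t a, int64_t b, int64_t *sum)
{
    /* Add in unsigned arithmetic so the wraparound is well defined. */
    uint64_t wrapped = (uint64_t)a + (uint64_t)b;
    *sum = (int64_t)wrapped;
    /* Adding a negative b must make the result smaller than a, and
     * adding a non-negative b must not. Any disagreement is overflow. */
    return (b < 0) != (*sum < a);
}
```

Compared with the unsigned check, this needs two extra comparisons plus the final test instead of one compare, which is the cost the comment above is pointing at.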
It's easy to believe you're replying to something that has an element of hyperbole.
It's hard to believe "just do 2x as many instructions" and "ehhh who cares [i.e. your typical C program doesn't check for overflow]", coupled to a seemingly self-conscious repetition of a quip from the television series Chernobyl that is meant to reference sticking your head in the sand, retire the issue from discussion.
The sequence of instructions given above is incorrect, it does not detect integer overflow (i.e. signed integer overflow). It detects carry, which is something else.
The correct sequence, which can be found in the official RISC-V documentation, requires more instructions.
Not checking for overflow in C programs is a serious mistake. All decent C compilers have compilation options for enabling checking for overflow. Such options should always be used, with the exception of the functions that have been analyzed carefully by the programmer and the conclusion has been that integer overflow cannot happen.
For example with operations involving counters or indices, overflow cannot normally happen, so in such places overflow checking may be disabled.
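As an illustration of what checked arithmetic looks like at the source level, here is a small sketch using `__builtin_add_overflow`, which is a GCC/Clang extension rather than ISO C. The wrapper name `checked_add_i32` is made up for this example. Compiler-wide options in the same spirit include `-ftrapv` and `-fsanitize=signed-integer-overflow`; which ones the commenter had in mind is not stated.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true and stores a + b in *out when the sum fits in int32_t.
 * Returns false when the addition would overflow. */
static bool checked_add_i32(int32_t a, int32_t b, int32_t *out)
{
    /* GCC/Clang builtin: returns true if the operation overflowed. */
    return !__builtin_add_overflow(a, b, out);
}
```

A caller would typically branch on the return value and handle the overflow case explicitly, leaving unchecked arithmetic only for code that has been shown not to overflow.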
this just isn't true. both addition and multiplication can check for overflow in <2 instructions.
I know this is a very negative take. I don't try to hide my pro-Power ISA bias, but that doesn't mean I wouldn't like another choice. So far, however, I've been repeatedly disappointed by RISC-V. It's always "five or six years" from getting there.
SPARC (formerly called Berkeley RISC) and MIPS were pioneers that experimented with various features or lack of features, but they were inferior from many points of view to the earlier IBM 801.
The RISC ISAs developed later, including ARM, HP PA-RISC and IBM POWER, have avoided some of the mistakes of SPARC and MIPS, while also taking some features from IBM 801 (e.g. its addressing modes), so they were better.
The x86-64 is a dog's breakfast of features. But due to its widespread use, compiler writers make the effort to create compilers that optimize for its quirks.
Itanium hardware designers were expecting the compiler writers to cater for its unique design. Intel is a semi company. As good as some of their compilers are, internally they invested more in their biggest seller and the Itanium never got the level of support that was anticipated at the outset.
You're saying ISA design does have implementation performance implications then? ;)
> There's no one that expects it'll be hard to optimize for
[Raises hand]
> There are at least 2 designs that have taped out in small runs and have high end performance.
Are these public?
Edit: I should add, I'm well aware of the cultural mismatch between HN and the semi industry, and have been caught in it more than a few times, but I also know the semi industry well enough to not trust anything they say. (Everything from well meaning but optimistic through to outright malicious depending on the company).
> There's no one that expects it'll be hard to optimize for
No one who is an expert in the field, and we (at Red Hat) talk to them routinely.
The most promising RISC-V companies today have not set out to compete directly with Intel, AMD, Apple or Samsung, but are targeting a niche such as AI, HPC and/or high-end embedded such as automotive.
And you can bet that Qualcomm has RISC-V designs in-house, but is only making ARM chips right now because ARM is where the market for smartphone and desktop SoCs is. Once Google starts allowing RVA23 on Android / ChromeOS, the flood gates will open.
But if you’re judging an ISA by performance scalability, you generally want to look at single-threaded performance.
However, it takes time from microarchitecture to chips, and from chips to products on shelves.
The very first RVA23-compatible chips to show up will likely be the spacemiT K3 SoC, due in development boards in April (i.e. next month).
More of them, and more performant ones, such as a development board with the Tenstorrent Ascalon CPU in the form of the Atlantis SoC, which was taped out recently, are coming this summer.
It is even possible such designs will show up in products aimed at the general public within the present year.
Is this because Fedora 44 is going to beta?
The optimizations that'd be applied to ARM and MIPS would be equally applicable to RISC-V. I do not believe this is a lack of software optimization issue.
We are well past the days where hand written assembly gives much benefit, and modern compilers like gcc and llvm do nearly identical work right up until it comes to instruction emissions (including determining where SIMD instructions could be placed).
Unless these chips have very very weird performance characteristics (like the weirdness around x86's lea instruction being used for arithmetic) there's just not going to be a lot of missed heuristics.
There's a reason, for example, why the linux distros all target a generic x86 architecture rather than a specific architecture.
https://discourse.ubuntu.com/t/introducing-architecture-vari...
However, other applications which must do cryptographic operations, audio/video processing, scientific/technical/engineering computing, etc. may have wildly different performances when compiled for different x86-64 ISA versions, for which dedicated assembly-language functions exist.
Java, interestingly enough, is somewhat leading the way here with their Vector API. I think they actually have one of the better setups for allowing someone to write fast code that is platform independent.
C++ is also diving into this realm. C++26 just merged in std::simd.
That is the bulk of the benefit of diving down into assembly.
https://en.cppreference.com/w/cpp/numeric/simd.html
Most of the applications whose performance matters for me, because I must wait a non-negligible time for them to do their job, are dependent on assembly implementation for certain functions invoked inside critical loops. I do not see any sign of replacements for them. On the contrary, Intel, AMD and Arm continue to introduce special instructions that are useful in certain niche applications and taking advantage of them will require additional assembly language functions, not less.
For me, there is only one application that I use and which consumes non-negligible computer time and which does not depend on SIMD optimizations, which is the compilation of software projects.
This is pretty vague and makes it sound like there are big differences in instruction sets.
In actuality it comes down to memory access first, which has nothing to do with instructions.
After that it comes down to simple SIMD/AVX instructions and not some exotic entirely different instruction set.
There's no carry bit, and no widening multiply (or MAC)
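To show what the missing carry bit means in practice, here is a hedged C sketch of a two-limb (128-bit) addition where the carry has to be recomputed with a comparison instead of being read from a flag. The `u128` struct and `u128_add` function are hypothetical names for illustration only.

```c
#include <stdint.h>

/* Hypothetical 128-bit integer stored as two 64-bit limbs. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} u128;

/* Multi-word add without a carry flag: the carry out of the low limb
 * is recovered by checking whether the low sum wrapped around. */
static u128 u128_add(u128 a, u128 b)
{
    u128 r;
    r.lo = a.lo + b.lo;
    uint64_t carry = (r.lo < a.lo) ? 1u : 0u; /* extra compare per limb */
    r.hi = a.hi + b.hi + carry;               /* wraps modulo 2^128 */
    return r;
}
```

On an ISA with a carry flag the compare disappears into an add-with-carry instruction, so the difference grows with the number of limbs.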
Integer MAC doesn't exist, and is also hindered by a design decision not to require more than two source operands, so as to allow simple implementations to stay simple. The same reason also prevents RISC-V from having a true conditional move instruction: there is one but the second operand is hard-coded zero.
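Assuming the instruction pair meant here is Zicond's `czero.eqz` / `czero.nez` (my reading; the comment does not name it), the sketch below models their semantics in C and shows how a full select is typically composed from two of them plus an OR. The C function names are illustrative only, not an API.

```c
#include <stdint.h>

/* Model of czero.eqz: yields 0 when cond == 0, otherwise value. */
static uint64_t czero_eqz(uint64_t value, uint64_t cond)
{
    return cond == 0 ? 0 : value;
}

/* Model of czero.nez: yields 0 when cond != 0, otherwise value. */
static uint64_t czero_nez(uint64_t value, uint64_t cond)
{
    return cond != 0 ? 0 : value;
}

/* cond ? a : b, built from two conditional-zero ops and one OR.
 * Exactly one of the two terms is zeroed, so the OR picks the other. */
static uint64_t select_u64(uint64_t cond, uint64_t a, uint64_t b)
{
    return czero_eqz(a, cond) | czero_nez(b, cond);
}
```

This keeps every instruction at two source operands, at the price of three instructions where a three-operand conditional move would need one.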
FMAC exists, but only because it is in the IEEE 754 spec ... and it requires significant op-code space.
That's true, but tautological.
The issue is that the RISC-V core is the easy part of the problem, and nobody seems to even be able to generate a chip that gets that right without weirdness and quirks.
The more fundamental technical problem is that things like the cache organization and DDR interface and PCI interface and ... cannot just be synthesized. They require analog/RF VLSI designers doing things like clock forwarding and signal integrity analysis. If you get them wrong, your performance tanks, and, so far, everybody has gotten them wrong in various ways.
The business problem is the fact that everybody wants to be the "performance" RISC-V vendor, but nobody wants to be the "embedded" RISC-V vendor. This is a problem because practically anybody who is willing to cough up for a "performance" processor is almost completely insensitive to any cost premium that ARM demands. The embedded space is hugely sensitive to cost, but nobody is willing to step into it because that requires that you do icky ecosystem things like marketing, software, debugging tools, inventory distribution, etc.
This leads to the US business problem which is the fact that everybody wants to be an IP vendor and nobody wants to ship a damn chip. Consequently, if I want actual RISC-V hardware, I'm stuck dealing with Chinese vendors of various levels of dodginess.
Over a decade ago: https://news.ycombinator.com/item?id=8235120
> RISC-V will get there, eventually.
Strong doubt. Those of us who were around in the 90s might remember how much hype there was with MIPS.
It was not designed to be one, but it ended up being surprisingly fast.
A lot of times the path to the highest performing CPU seems to be to optimize for power first, then speed, then repeat. That's because power and heat are a major design constraint that limits speed.
I first noticed this way back with the Pentium 4 "Netburst" architecture vs. the smaller x86 cores that became the ancestor of the Core architecture. Intel eventually ran into a wall with P4 and then branched high performance cores off those lower-power ones and that's what gave us the venerable Core architecture that made Intel the dominant CPU maker for over a decade.
ARM's history is another example.
In comparison, I think Arm is actually a very strong cautionary tale that focusing on power will not get you to performance. Arm processors remained pretty poor performers until designers from other CPU families entirely (PowerPC and Intel) took it on at Apple and basically dragged Arm to the performance level it is at today.
https://stackoverflow.com/questions/45066299/was-there-a-p4-...
Banias was hyper optimized for power, the mantra was to get done quickly and go to sleep to save power. Somewhere along the line someone said "hey what happens if we don't go to sleep?" and Core was born.
At this point the most likely place for truly competitive RISC-V to appear is China.
BTW. Keller is also on the board of AheadComputing — founded by former Intel engineers behind the fabled "Royal Core".
That's useable for many applications, but it's not going to change the world. A lot of "micro PCs" with low power CPUs are well past that now. If that's what Ascalon turns out to be, it will amount to an SBC class device.
The Raspberry Pi 5 results on Geekbench 6 are all over the place. A score between 500 and 900 in single core and a 2000 multi core score.
Radxa 4 is an SBC based around the N100 and it basically gets the same or slightly higher performance as the Raspberry Pi 5.
Meanwhile the i5-9600K gets a score of 1677 in single core, which is 83% of the performance of the entire Raspberry Pi 5 and gets a score of 6199 when using multiple cores, that's 3x the performance.
I'd call this at least "Laptop class" and you even admitted yourself back in 2025 that you're using a processor on that level.
Supposedly happened earlier this year. Tenstorrent says devboards in Q3.
Now we just wait.
Or we just adopt Loongson.
> In the current version of this architecture specification, TLB refill and consistent maintenance between TLB and page tables are still [sic] all led by software.
https://loongson.github.io/LoongArch-Documentation/LoongArch...
https://lwn.net/Articles/932048/
BTW, it's quite impressive how the s390x is so fast per core compared to the others. I mean, of course it's fast - we all knew that.
And don't let IBM legal see that this can be considered a published benchmark, because they are very shy about s390x performance numbers.
What is the current fastest platform that isn’t exorbitantly expensive? Not upcoming releases, but something I can actually buy.
I check in every 3-6 months but the situation hasn’t changed significantly yet.
The cores are in my experience moderately fast at most. Note that there are a lot of licencing options and I think some are speed-capped - but I don't think that applies to IFL - a standard CPU licence-restricted to only run linux.
Except the K3 kills it on AI (60 TOPS).
SpacemiT K3 is 2010 Macbook performance single-core, 2019 Macbook Air multi-core, and better than M4 Apple Silicon for AI.
So I guess it depends on what you are going to do with it.
E.g. M4 total system memory bandwidth is 120GB/s whereas K3 is 51GB/s, single core memory bandwidth is 100-120GB/s vs ~30GB/s. M4 has 10 CPU cores and neural engine with 16 cores whereas K3 has 8 CPU cores and 8 "AI" cores, K3 clock frequency is almost half the clock frequency in M4 etc. etc.
But anyway thanks for sharing, always good to learn about new hardware.
Ironically, its SoC (spacemiT K1) is slower than the JH7110 used in the first mass-produced RISC-V SBC, VisionFive 2.
But unlike JH7110, it has vector 1.0, making it a very popular target.
Of course, none of these pre-RVA23 boards will be relevant anymore, once the first development boards with RVA23-compatible K3 ship next month.
These are also much faster than anything RISC-V currently purchasable. Developers have been playing with them for months through ssh access.
In theory you can spend a lot of effort to make a flawed ISA perform, but it will be neither easy nor pretty e.g. real world Linux distros can't distribute optimised packages for every uarch from dual-issue in-order RV64GC to 8-wide OoO RV64 with all the bells and whistles. Only in (deeply) embedded systems can you retarget the toolchain and optimise for each damn architecture subset you encounter.
Assuming they will keep their word, later this year Tenstorrent is supposed to ship their RVA23-based server development platform[1]. They announced[2] it at last year's NA RISC-V Summit. Let's see.
The ball is in the court of hardware vendors to cook some high-end silicon.
[1] https://tenstorrent.com/ip/risc-v-cpu
[2] https://static.sched.com/hosted_files/riscvsummit2025/e2/Unl...
Build time on target hardware matters when you're re-building an entire Linux distribution (25000+ packages) every six months.
I think the ban of SOPHGO is partly to blame for the slow development.[2] They had the most performant and interesting SOCs. I had a bunch of pre-orders for the Milk-V Oasis before it was cancelled. It was supposed to come out a while ago, using the SG2380, supposedly much more performant than the Milk-V Titan mentioned in the article (which still isn't out).
It was also SOPHGO's SOCs that powered the crazy cheap/performant/versatile Milk-V DUO boards. They have the ability to switch ARM/RISC-V architecture.
[1]: https://archriscv.felixc.at/
[2]: https://www.tomshardware.com/tech-industry/artificial-intell...
What I do know is since the ban, all ongoing products featuring SOPHGO SOCs were cancelled, and I haven't seen any products featuring them since. The SOPHGO forums have also closed down.
The Milk-V Oasis would have had 16 cores (SG2380 w/ SiFive P670), it was replaced by the Milk-V Megrez with just 4 cores (SiFive P550) for around the same price. The new Milk-V Titan has only 8. We're slowly catching up, but the performance is now one or two years behind what it could've been.
The SG2380 would've been the first desktop ready RISC-V SOC at an affordable price. I think it's still the only SOC made that used the SiFive P670 core.
There are lots of small issues (libraries or headers not being found, wrong libraries or headers being found, build scripts trying to run the binaries they just built, wrong compiler being used, wrong flags being used, etc.) when trying to cross-compile arbitrary software.
All fixable (cross-compiling entire distributions is a thing), but a lot of work and an extra maintenance burden.
With LLVM existing, cross-compiling is not a problem anymore, but it means you can't run tests without an emulator. So it might just be easier to do it all on the target machine.
You can solve this on a per language basis, but the C/C++ ecosystem is messy. So people use VMs or real hardware of the target arch to not have to think about it
I set up a CopyParty server on a headless RISC-V SBC and it was a breeze. Just get the packages, do the thing, move on. Obviously it depends on your needs, but maybe you're not using the right workflow and are blaming the tools instead.
Obviously a solvable problem to split build and test but perhaps the time savings aren't worth the complexity.
https://src.fedoraproject.org/rpms/rpy/pull-request/4#reques...
Your question made me look up Arm's history in Fedora and I came upon this 2012 LWN thread[1]. There's some discussion against cross-compilation already back then.
[1] https://lwn.net/Articles/487622/
https://t2linux.com/
https://github.com/dockcross/dockcross
Old news. See also:
> Random mumblings of x86_64 developer ... ARM is sloooow
On a related note, SoC companies need to get their act together and start using the latest Arm cores. Even the mid range cores of 1-2 years ago show a huge leap in performance:
https://sbc.compare/56-raspberry-pi-500-plus-16gb/101-radxa-...
> RISC-V builders have four or eight cores with 8, 16 or 32 GB of RAM (depending on a board)
> The UltraRISC UR-DP1000 SoC, present on the Milk-V Titan motherboard should improve situation a bit (and can have 64 GB ram).
RISC-V SOCs just typically don't support much ram. With the exception of the SG2042 which can take 128GB, but it's expensive, buggy and now old.
So I am sure it's a combination of low ram and low clockspeeds.
Which boards are used specifically should not matter much. There's not much available.
Except for the Milk-V Pioneer, which has 64 cores and 128GB ram. But that's an older architecture and it's expensive.
That's not how it usually works :\
RISC-V is certainly spreading across niches, but performant computing is not one of them.
Edit: lol the author mentions the same! Perhaps they were the source of the original Mastodon post I'm thinking of.
It is not the ISA, but the implementations and those horrible SDKs which need to be adjusted for RISC-V (actually any new ISA).
RISC-V needs extremely performant implementations, and on the best silicon process; until then RISC-V _will be_ "slow".
Not to mention, RISC-V is a 'standard ISA': software written in assembly is more than appropriate in many cases.
I guess that may be the true use case for 'Open-Source' cores.
That being said, the advertised SPEC2007 scores are close to a M1 in IPC.
The fact that i686 is 14% faster than x86_64 is a little suspicious, because usually the same software runs _faster_ on x86_64 (despite the increased memory use) thanks to a larger register set, an optimized ABI, and more vector instructions.
Of course, if you are compiling an i686 binary on i686, and an x86_64 binary on x86_64, then the compilers aren't really doing the same work, since their output is different. I'm not a compiler expert, but I could imagine that compiling x86_64 binaries is intrinsically slower than for i686 for a variety of reasons. For example, x86_64 is mostly a superset of i686, so a compiler has way more instructions to consider, including potential optimizations using e.g. SIMD instructions that don't exist on i686 at all. Or a compiler might assume a larger instruction cache size, by default, and do more unrolling or inlining when compiling for x86_64. And so on.
In that case, compiling on x86_64 is slower not because the hardware is bad but because the compiler does more work. Perhaps something similar is happening on RISC-V.
But yeah, it may mean the benchmark is not representative.
Anyway, it's hardly surprising that a young ISA with not a 1/1000th of the investment of x86 or ARM has slower chips than them x)
[1] https://camel-cdr.github.io/rvv-bench-results/
i. llvm presentation can thrash caches if setup wrong (given the plethora of RISC-V fragmented versions, most compilers won't cover every vanity silicon.)
ii. gcc is also "slow" in general, but is predictable/reliable
iii. emulation is always slower than kvm in qemu
It may seem silly, but I'd try a gcc build with -O0 flag, and a toy unit test with -S to see if the ASM is actually foobar. One may have to force the -mtune=boom flag to narrow your search. Best regards =3
Cortex A57 is 14 years old and is significantly faster than the 9 year old Cortex A55 these RISC-V cores are being compared against.
So yes it's many years behind. Many, many years.
Tenstorrent Atlantis (first Ascalon silicon) should ship in Q2/Q3 and be twice as fast. About as fast as Ryzen5. So, about 5 years behind AMD.
But even the K3 has faster AI than Apple Silicon or Qualcomm X Elite.
Current trend-lines suggest ARM64 and RISC-V performance parity before 2030.
That'd be ~7 years behind, not 4. Cortex A76 came out in late 2018. Also what benchmarks are you looking at?
> Tenstorrent Atlantis (first Ascalon silicon) should ship in Q2/Q3 and be twice as fast. About as fast as Ryzen5. So, about 5 years behind AMD.
Which Ryzen 5? The first Ryzen 5 came out in 2017, which was a lot more than 5 years ago.
> But even the K3 has faster AI than Apple Silicon or Qualcomm X Elite.
Which isn't RISC-V. Might as well brag about a RISC-V CPU with an RTX 5090 being faster at CUDA than a Nintendo Switch. That's a coprocessor that has nothing to do with the ISA or CPU core.
> Current trend-lines suggest ARM64 and RISC-V performance parity before 2030.
L. O. fucking. L. That's not how this works. That's not how any of this works.
It just takes time, people who believe in it and tons of money. We'll see where the journey goes, but I am a big RISC-V believer.