Portable SIMD project group
The Major Change Process was proposed in RFC 2936 and is not yet in full operation. This template is meant to show how it could work.
Create a project group for considering what portable SIMD in the standard library should look like.
While Rust presently exposes the ALU features of the underlying ISAs in a portable way, it does not expose their SIMD capabilities portably except via autovectorization.
A wide variety of computation tasks can be accomplished faster using SIMD than using the ALU capabilities. Relying on autovectorization to go from ALU-oriented source code to SIMD-using object code is not a proper programming model. It is brittle and depends on the programmer being able to guess correctly what the compiler back end will do. Requiring godbolting for every step is not good for programmer productivity.
Using ISA-specific instructions results in ISA-specific code. An operation like "perform lane-wise addition of these two vectors of 16 u8 lanes" should be a portable operation for the same reason that "add these two u8 scalars" is a portable operation that does not require the programmer to write ISA-specific code.
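The semantics of such a portable lane-wise operation can be sketched with a plain array and a scalar loop; the function name below is illustrative, not a real std::simd API, and a real implementation would lower this to a single SIMD instruction on targets that support it.

```rust
// Semantic model of a portable lane-wise u8x16 wrapping addition,
// written as a scalar loop. (Illustrative name, not a std::simd API.)
fn u8x16_wrapping_add(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..16 {
        // Each lane is added independently, with wraparound on overflow,
        // matching what a vector add instruction does per lane.
        out[i] = a[i].wrapping_add(b[i]);
    }
    out
}

fn main() {
    let a = [250u8; 16];
    let b = [10u8; 16];
    // Lane-wise wrapping add: 250 + 10 wraps to 4 in every lane.
    assert_eq!(u8x16_wrapping_add(a, b), [4u8; 16]);
}
```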
Typical use cases for SIMD involve text encoding conversion and graphics operations on bitmaps. Firefox already relies on the Rust
packed_simd crate for text encoding conversion.
Compiler back ends in general and LLVM in particular provide a notion of portable SIMD where the types are lane-aware and of particular size and the operations are ISA-independent and lower to ISA-specific instructions later. To avoid a massive task of replicating the capabilities of LLVM's optimizer and back ends, it makes sense to leverage this existing capability.
However, to avoid exposing the potentially subject-to-change LLVM intrinsics, it makes sense to expose an API that is conceptually close to and maps rather directly onto the LLVM concepts while making sense for Rust and being stable for Rust applications. This means introducing lane-aware types of typical vector sizes, such as
f32x4, etc., and providing lane-wise operations that are broadly supported by various ISAs on these types. This means basic lane-wise arithmetic and comparisons.
Additionally, it is essential to provide shuffles whose lane mapping is known at compile time. Also, unlike the LLVM layer, it makes sense to provide distinct boolean/mask vector types for the outputs of lane-wise comparisons, because encoding the invariant that the bits of a lane are either all ones or all zeros allows operations like "are all lanes true" or "is at least one lane true" to be implemented more efficiently, especially on x86/x86_64.
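The mask-vector and compile-time-shuffle semantics described above can be sketched with plain arrays and scalar loops; all type and function names here are illustrative placeholders, not the proposed std::simd API.

```rust
// Illustrative mask type: one boolean per lane, with "all"/"any"
// reductions (cheap on x86/x86_64 via movemask-style instructions).
#[derive(Clone, Copy, PartialEq, Debug)]
struct Mask32x4([bool; 4]);

impl Mask32x4 {
    /// "Are all lanes true?"
    fn all(self) -> bool { self.0.iter().all(|&b| b) }
    /// "Is at least one lane true?"
    fn any(self) -> bool { self.0.iter().any(|&b| b) }
}

/// Lane-wise "greater than" producing a mask rather than a plain vector.
fn f32x4_gt(a: [f32; 4], b: [f32; 4]) -> Mask32x4 {
    let mut m = [false; 4];
    for i in 0..4 { m[i] = a[i] > b[i]; }
    Mask32x4(m)
}

/// Shuffle whose lane mapping is fixed at compile time via const generics.
fn shuffle4<const I0: usize, const I1: usize, const I2: usize, const I3: usize>(
    v: [f32; 4],
) -> [f32; 4] {
    [v[I0], v[I1], v[I2], v[I3]]
}

fn main() {
    let a = [1.0, 2.0, 3.0, 4.0];
    let b = [0.0, 0.0, 9.0, 0.0];
    let m = f32x4_gt(a, b); // lanes: [true, true, false, true]
    assert!(m.any());
    assert!(!m.all());
    // Reverse the lanes with a compile-time shuffle.
    assert_eq!(shuffle4::<3, 2, 1, 0>(a), [4.0, 3.0, 2.0, 1.0]);
}
```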
When the target doesn't support SIMD, LLVM provides ALU-based emulation, which might not be a performance win compared to manual ALU code, but at least keeps the code portable.
When the target does support SIMD, the portable types must be zero-cost transmutable to the types that vendor intrinsics accept, so that specific things can be optimized with ISA-specific alternative code paths.
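A minimal sketch of what zero-cost transmutability requires: the portable type's size and alignment must match the vendor register type, so the conversion compiles to nothing. The type and function names below are hypothetical, chosen only to illustrate the layout constraint.

```rust
// A hypothetical portable 128-bit vector type, laid out so it can be
// transmuted at zero cost to the vendor type where one exists.
#[repr(C, align(16))]
#[derive(Clone, Copy)]
struct U8x16([u8; 16]);

// On x86_64 the matching vendor type is __m128i; the conversion is a
// no-op at runtime because both types are 16 bytes, 16-byte aligned.
#[cfg(target_arch = "x86_64")]
fn as_vendor(v: U8x16) -> std::arch::x86_64::__m128i {
    // SAFETY: U8x16 and __m128i have identical size and compatible layout.
    unsafe { std::mem::transmute(v) }
}

fn main() {
    // The layout invariants that make the transmute zero-cost.
    assert_eq!(std::mem::size_of::<U8x16>(), 16);
    assert_eq!(std::mem::align_of::<U8x16>(), 16);
}
```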
The packed_simd crate provides an implementation that already works across a wide variety of Rust targets and that has already been developed with the intent that it could become
std::simd. It makes sense not to start from scratch but to start from there.
The code needs to go in the standard library if it is assumed that rustc won't, on stable Rust, expose the kind of compiler internals that
packed_simd depends on.
Please see the FAQ.
Once this MCP is filed, a Zulip topic will be opened for discussion. Ultimately, one of the following things can happen:
You can read [more about the lang-team MCP process on forge].
This issue is not meant to be used for technical discussion. There is a Zulip stream for that. Use this issue to leave procedural comments, such as volunteering to review, indicating that you second the proposal (or third, etc), or raising a concern that you would like to be addressed.
Answer (scottmcm)
The "language design" portion of it is basically limited to "should we add intrinsic capabilities and therefore tie ourselves to LLVM even further", correct?
I don't think that, from a formal specification perspective, this is true. Another rust implementation could provide a fully semantically-correct implementation by just calling the scalar versions of all the functions in the appropriate loops.
Now, obviously from a quality-of-implementation perspective a compiler would likely want to provide something smarter than that, to take better advantage of hardware capabilities. But I think LLVM is only one way of getting that -- albeit what I would probably pick if I was implementing atop of it anyway. We could also have it with an implementation strategy of
cfg_if!s to call existing-stable intrinsics on the relevant platforms, for example, as people hit them or they stabilize.
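That alternative implementation strategy can be sketched without cfg_if!, using plain cfg attributes and runtime feature detection to call an existing-stable vendor intrinsic where available and fall back to a scalar loop elsewhere; the function names are illustrative.

```rust
/// Portable entry point: dispatch to an ISA-specific path when the
/// target supports it, otherwise use the scalar fallback.
fn add_u8x16(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("sse2") {
            // SAFETY: guarded by the runtime SSE2 check above.
            return unsafe { add_u8x16_sse2(a, b) };
        }
    }
    add_u8x16_scalar(a, b)
}

/// Scalar fallback: the semantic reference for every platform.
fn add_u8x16_scalar(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..16 {
        out[i] = a[i].wrapping_add(b[i]);
    }
    out
}

/// x86_64 path using stable std::arch intrinsics (one vector add).
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse2")]
unsafe fn add_u8x16_sse2(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    use std::arch::x86_64::*;
    let mut out = [0u8; 16];
    unsafe {
        let va = _mm_loadu_si128(a.as_ptr() as *const __m128i);
        let vb = _mm_loadu_si128(b.as_ptr() as *const __m128i);
        // _mm_add_epi8 performs wrapping lane-wise addition.
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, _mm_add_epi8(va, vb));
    }
    out
}

fn main() {
    let a = [200u8; 16];
    let b = [100u8; 16];
    // Both paths agree: 200 + 100 wraps to 44 in every lane.
    assert_eq!(add_u8x16(a, b), [44u8; 16]);
    assert_eq!(add_u8x16(a, b), add_u8x16_scalar(a, b));
}
```

The portability contract here is that the intrinsic path and the scalar path are semantically indistinguishable; only performance differs.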
My expectation from "portable" is that such differences would necessarily be unobservable semantically, and thus I think I personally would be fine with this being entirely a libs project: to figure out the portable set of operations, how best to expose a Rust interface to those, and how best to have them interoperate with non-portable platform intrinsics where needed, etc. (There might be some libs-impl/compiler/lang conversations about implementation details, but I suspect none of those will lock us into things.)