nagisa/django-bfm 40

[Not really maintained anymore] BFM is an Apache-licensed server file manager for Django made with ease of use and new web technologies in mind.

nagisa/Manga-Fox-Grabber 12

GUI to download manga from mangafox, make it into a PDF, and maybe read it on your e-book reader.

nagisa/marksman_escape 8

HTML escaping and unescaping in Rust

lucab/memfd-rs 7

A pure-Rust library to work with Linux memfd

nagisa/e4rat-preload-lite 5

More efficient way to preload e4rat file lists.

nagisa/Feeds 3

Feed Reader for GNOME.

nagisa/gnome-shell-theme-min 3

A GNOME-Shell theme without bells or whistles

nagisa/llvm_build_utils.rs 2

LLVM build utils for cargo build scripts

nagisa/django-localflavor-lt 1

Country-specific Django helpers for Lithuania.

nagisa/kazlauskas.me 1

Hakyll-based website of mine

created tag nagisa/rust_tracy_client

tag tracy-client-sys-v0.8.1

Tracy client libraries for Rust

created time in 8 hours

created tag nagisa/rust_tracy_client

tag tracy-client-v0.7.4

Tracy client libraries for Rust

created time in 8 hours

pull request comment nagisa/rust_tracy_client

Add rust functions for set_thread_name in the Tracy C API

Thanks!

zanshi

comment created time in 9 hours

PR merged nagisa/rust_tracy_client

Add rust functions for set_thread_name in the Tracy C API

Tracy now has support for setting the thread name in the C API; this PR adds support for this!

I think you might have to publish the new versions to crates.io for everything to work; I'm not really sure how that works, though.

I also ran cargo fmt, hence the formatting changes.

+35 -15

1 comment

6 changed files

zanshi

pr closed time in 9 hours

push event nagisa/rust_tracy_client

Niclas Olmenius

commit sha 77a8f310c194227bba78d1810e18e8d60a74ff00

add rust functions for set_thread_name

view details

Simonas Kazlauskas

commit sha 2428a641fcbff53571facbeb7496c022fe9984c5

Also allow resolving deps by a relative path

Helps CI to work when changes are made across multiple packages

view details

push time in 9 hours

delete branch nagisa/rust_tracy_client

delete branch: master

delete time in 9 hours

delete branch nagisa/rust_tracy_client

delete branch: nagisa/ci-1

delete time in 9 hours

create branch nagisa/rust_tracy_client

branch: master

created branch time in 9 hours

push event nagisa/rust_tracy_client

Simonas Kazlauskas

commit sha 2428a641fcbff53571facbeb7496c022fe9984c5

Also allow resolving deps by a relative path

Helps CI to work when changes are made across multiple packages

view details

push time in 9 hours

create branch nagisa/rust_tracy_client

branch: nagisa/ci-1

created branch time in 9 hours

pull request comment nagisa/rust_tracy_client

Add rust functions for set_thread_name in the Tracy C API

Very cool. I’ll see if I can get the CI to pass without publishing things to crates.io first.

zanshi

comment created time in 9 hours

pull request comment rust-lang/rust

rustc: Improving safe wasm float->int casts

Yeah LGTM. It’s too bad we cannot easily generate phi – having many casts will ultimately end up generating a larger amount of IR for LLVM to optimize, slowing down compilation. But nevertheless the improvement is worth it.

@bors r+

alexcrichton

comment created time in a day

issue comment NixOS/nixpkgs

musl: Trivial binaries built with musl will fault on nixos

The most obvious way to compile something with musl, i.e. musl-gcc test.c, will result in a binary which segfaults

This was mentioned in the original report as well.

nagisa

comment created time in 3 days

issue comment rust-lang/stacker

Linking failing when compiling for wasm32-wasi

cc @alexcrichton

bjorn3

comment created time in 5 days

issue comment rust-lang/stacker

Linking failing when compiling for wasm32-wasi

Please do provide more information either way.

bjorn3

comment created time in 5 days

issue comment rust-lang/stacker

Linking failing when compiling for wasm32-wasi

Huh, the CI actually even runs tests for the wasm32-wasi target as added in #40.

bjorn3

comment created time in 5 days

issue comment rust-lang/rust

Pattern matching regression 1.45.0 -> 1.45.1 (+nightly)

I think this is the 2nd instance over the years where we end up with a breaking stable backport. Last time (in 1.27.2) we very quickly prepared another point release. This bug might merit something like that again. I’m going to preliminarily and unilaterally prioritize this as P-critical, but also leave the I-prioritize label to ensure the prioritization team looks at this.

jsgf

comment created time in 5 days

issue comment rust-lang/rust

Awkward table on firefox

This is a Firefox bug. Closing.

gabriel-araujjo

comment created time in 5 days

issue comment notracking/hosts-blocklists

newrelic.com is blocked

Is a more granular rule possible here? New Relic may be used for instrumenting web applications, but it can also be a tool for instrumenting and observing properties of non-user-facing applications.

docs.newrelic.com and blog.newrelic.com are also blocked by this rule, despite being very unlikely to be used to indirectly track visitors. Similarly, rpm.newrelic.com is the New Relic application itself, which is also unlikely to track visitors but may be a critical tool for work.

macalimlim

comment created time in 5 days

pull request comment rust-lang/rust

Fix incorrect clashing_extern_declarations warnings.

@bors r+

jumbatm

comment created time in 5 days

issue comment NixOS/nixpkgs

musl: Trivial binaries built with musl will fault on nixos

As per the #musl channel on IRC, running binaries that use the musl libc with the glibc interpreter is not supported.

nagisa

comment created time in 5 days

issue closed rust-lang/rust

x86_64-unknown-linux-musl binaries SIGSEGV during early initialization

Minimized reproducer

fn main() {}

// $ rustc --target=x86_64-unknown-linux-musl test.rs
// $ ./test
// fish: “./test” terminated by signal SIGSEGV (Address boundary error)

Stack trace:

(gdb) bt
#0  0x0000555555558193 in _start_c ()
#1  0x00007fffffffc6b9 in ?? ()
#2  0x00000000178bfbff in ?? ()
#3  0x0000000000000064 in ?? ()
#4  0x0000000000000000 in ?? ()

The faulting instruction is this:

0x0000555555558193 <+291>:   mov    0x8(%rdx),%rsi

The register value is not obviously wrong (i.e. it’s not 0 or something like that), so this could be a buffer overrun of some sort.

rustc 1.45.0 (5c1f21c3b 2020-07-13) works. rustc 1.46.0-beta.2 (6f959902b 2020-07-23) fails. rustc 1.47.0-nightly (5ef299eb9 2020-07-24) fails.

closed time in 5 days

nagisa

issue opened NixOS/nixpkgs

musl: Trivial binaries built with musl will fault on nixos

Describe the bug

Running statically linked musl binaries on NixOS will fault during early runtime initialization. There appear to be multiple issues; the first is that linking on NixOS when PIE or PIC is enabled will implicitly also add the glibc ld-linux.so, which causes issues in Rust binaries: https://github.com/rust-lang/rust/issues/74757, but it looks like trivial C binaries don’t work either...

This is the backtrace:

(gdb) bt
#0  0x00007ffff7f7aba6 in do_init_fini () from /nix/store/d8d7k50ffblx4dkqrg4j4flj41dxbwcp-musl-1.1.24/lib/libc.so
#1  0x00007ffff7f7bf43 in __libc_start_init () from /nix/store/d8d7k50ffblx4dkqrg4j4flj41dxbwcp-musl-1.1.24/lib/libc.so
#2  0x00007ffff7f3fc33 in libc_start_main_stage2 () from /nix/store/d8d7k50ffblx4dkqrg4j4flj41dxbwcp-musl-1.1.24/lib/libc.so
#3  0x0000000000401059 in _start ()

To Reproduce

Observe that building a static binary works:

$ nix-shell -p 'musl' --run 'echo "int main(void) { return 0; }" | musl-gcc -static -xc - && ./a.out'

but if run with a dynamic interpreter, the binary will fault:

$ /nix/store/jx19wa4xlh9n4324xdl9rjnykd19mmq3-glibc-2.30/lib/ld-linux-x86-64.so.2 ./a.out
fish: “/nix/store/jx19wa4xlh9n4324xdl9…” terminated by signal SIGSEGV (Address boundary error)

Running a dynamically linked binary doesn’t work either (it encodes the same interpreter in its PT_INTERP ELF metadata):

$ nix-shell -p 'musl' --run 'echo "int main(void) { return 0; }" | musl-gcc -xc - && ./a.out'
/run/user/1000/nix-shell-8331-0/rc: line 1:  8374 Segmentation fault      (core dumped) ./a.out

Expected behavior

Binaries linked against musl should work regardless of whether they are linked statically, dynamically or are position independent.

Screenshots N/A

Additional context N/A

Notify maintainers

@thoughtpolice @dtzWill

Metadata

  • system: "x86_64-linux"
  • host os: Linux 5.6.16, NixOS, 20.09pre228599.7a07f2a5edd (Nightingale)
  • multi-user?: yes
  • sandbox: yes
  • version: nix-env (Nix) 2.3.6
  • channels(root): "nixos-20.09pre228599.7a07f2a5edd"
  • channels(nagisa): ""
  • nixpkgs: /nix/var/nix/profiles/per-user/root/channels/nixos

Maintainer information:

# a list of nixpkgs attributes affected by the problem
attribute:
# a list of nixos modules affected by the problem
module:

created time in 5 days

pull request comment rust-lang/stacker

Fix support for wasm32

psm 0.1.11 and stacker 0.1.10 have been released with these changes.

alexcrichton

comment created time in 5 days

push event rust-lang/stacker

Simonas Kazlauskas

commit sha 7f1e6f234353b24bad14ba6aadf4756202af642a

psm -> 0.1.11

view details

push time in 6 days

created tag rust-lang/stacker

tag psm-0.1.11

Manual segmented stacks for Rust

created time in 6 days

created tag rust-lang/stacker

tag stacker-0.1.10

Manual segmented stacks for Rust

created time in 6 days

push event rust-lang/stacker

Simonas Kazlauskas

commit sha 320c26487e45cb3a863febeb249cef17b9b43e39

Disable CI on sparc64-linux and powerpc64-linux

For stacker portion of the CI. See 6bde9f47426d544f945.

view details

push time in 6 days

PR merged rust-lang/stacker

Fix support for wasm32

This commit updates the build scripts to fix support for wasm32 so that it links correctly. Currently it compiles correctly but doesn't have a ton of functionality, but afterwards this should have functionality for growing the stack as well as linking correctly.

+159 -98

4 comments

10 changed files

alexcrichton

pr closed time in 6 days

push event rust-lang/stacker

Alex Crichton

commit sha 6452a64f92e6c878b01d45d6fbee35fffcb11e0c

Fix support for wasm32

This commit updates the build scripts to fix support for wasm32 so that it links correctly. Currently it compiles correctly but doesn't have a ton of functionality, but afterwards this should have functionality for growing the stack as well as linking correctly.

view details

Alex Crichton

commit sha 6e1af7dbf1075b455468af433de66cfab71e096f

Add `rust_psm_replace_stack`

view details

Alex Crichton

commit sha eccbde44911eb6707c8af3bd2a57bc64d4e3604d

Get tests on CI for wasm/wasi

view details

Simonas Kazlauskas

commit sha 6bde9f47426d544f9458e2d99928e77073a61a8a

Disable CI on sparc64-linux and powerpc64-linux

The upstream project we depend on has disabled building of the docker images of these targets.

view details

Simonas Kazlauskas

commit sha c0d8e410596ee70a4746c74f426489ac3d0fe8b6

stacker -> 0.1.10

view details

push time in 6 days

pull request comment rust-embedded/cross

Bump version to 0.2.1.

Is there any description of how the sparc/ppc images keep breaking?

reitermarkus

comment created time in 6 days

pull request comment rust-lang/stacker

Fix support for wasm32

Cool, I’ll take a closer look at the rest of the PR in a couple hours.

alexcrichton

comment created time in 6 days

pull request comment rust-lang/stacker

Fix support for wasm32

Also wasm doesn't give you infinite recursion (since engines still have a hard limit)

Yeah, this is a known limitation of the target, documented in its original addition: https://github.com/rust-lang/stacker/pull/10

so I had to decrease some of the recursion amounts.

Hm, I would still want to keep exercising the actual call stack operations on the other platforms, so we could perhaps add a constant of some sort that indicates the amount of recursion in tests and use a smaller value on wasm.
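
For illustration, a sketch of such a constant (the name and the exact values here are made up):

#[cfg(target_arch = "wasm32")]
const TEST_RECURSION_DEPTH: usize = 1_000; // stay under the engine's hard stack limit
#[cfg(not(target_arch = "wasm32"))]
const TEST_RECURSION_DEPTH: usize = 1_000_000; // keep exercising deep call stacks natively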

I can add testing to CI if desired but the CI here looks somewhat complicated so I may need some guidance...

https://github.com/rust-lang/stacker/pull/14 could help. I don’t particularly mind it not being tested, but having some would be lovely!

alexcrichton

comment created time in 6 days

Pull request review comment rust-lang/stacker

Fix support for wasm32

 #include "psm.h"
-.text
-.section .text.rust_psm_stack_direction,"",@
+# Note that this function is not compiled when this package is uploaded to
+# crates.io, this source is only here as a reference for how the corresponding
+# wasm32.o was generated. This file can be compiled with:
+#
+#    cpp psm/src/arch/wasm32.s | llvm-mc -o psm/src/arch/wasm32.o --arch=wasm32 -filetype=obj
+#
+# where you'll want to ensure that `llvm-mc` is from a relatively recent
+# version of LLVM.
+
+.globaltype __stack_pointer, i32
+
 .globl rust_psm_stack_direction
 .type rust_psm_stack_direction,@function
 rust_psm_stack_direction:
 .functype rust_psm_stack_direction () -> (i32)
-    i32.const $STACK_DIRECTION_DESCENDING
+    i32.const STACK_DIRECTION_DESCENDING
     end_function
-.rust_psm_stack_direction_end:
-.size rust_psm_stack_direction, .rust_psm_stack_direction_end-rust_psm_stack_direction
-
-.section .text.rust_psm_stack_pointer,"",@
 .globl rust_psm_stack_pointer
 .type rust_psm_stack_pointer,@function
 rust_psm_stack_pointer:
 .functype rust_psm_stack_pointer () -> (i32)
     global.get __stack_pointer
     end_function
-.rust_psm_stack_pointer_end:
-.size rust_psm_stack_pointer, .rust_psm_stack_pointer_end-rust_psm_stack_pointer
-.globaltype __stack_pointer, i32
+.globl rust_psm_on_stack
+.type rust_psm_on_stack,@function

You may need to add another similar implementation for rust_psm_replace_stack. If the crate(s) work on wasm right now, it’s probably only because stacker or its test suite itself doesn't use the function, but psm's public API does depend on it.

alexcrichton

comment created time in 6 days

issue comment rust-lang/rust

sysroot spans still not printed, when remap-path-prefix is set with a custom path

Closely related, almost to the point of being a duplicate, to https://github.com/rust-lang/rust/issues/73167

infinity0

comment created time in 8 days

issue comment rust-lang/rust

Incorrect read_vectored behaviour on Windows

Much like with read, you must inspect the number of bytes that read_vectored returns. That’s the number of bytes read, and the buffers are filled in as if they were concatenated. In your case, I’m almost certain that the function call returned 3.

The documentation also says this:

Data is copied to fill each buffer in order, with the final buffer written to possibly being only partially filled. This method must behave equivalently to a single call to read with concatenated buffers.
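
A minimal sketch of that behaviour, using an in-memory byte slice as the reader rather than the original Windows handles:

use std::io::{IoSliceMut, Read};

fn main() -> std::io::Result<()> {
    // &[u8] implements Read, so it stands in for any reader here.
    let mut reader: &[u8] = b"abc";
    let (mut a, mut b) = ([0u8; 2], [0u8; 2]);
    let mut bufs = [IoSliceMut::new(&mut a), IoSliceMut::new(&mut b)];
    // Only three bytes are available, so the first buffer is filled completely
    // and the second only partially, exactly as if they were concatenated.
    let n = reader.read_vectored(&mut bufs)?;
    assert_eq!(n, 3);
    assert_eq!(&a, b"ab");
    assert_eq!(&b[..1], b"c");
    Ok(())
}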

kubkon

comment created time in 8 days

pull request comment rust-lang/rust

rustbuild: fix bad usage of UNIX exec() in rustc wrapper

@bors r+ rollup

infinity0

comment created time in 9 days

pull request comment rust-lang/rust

rustbuild: fix bad usage of UNIX exec() in rustc wrapper

Wait, --on-fail still works at all? I thought it was entirely broken… (though perhaps only affecting --on-fail bash usage?)

I think the motivation from https://github.com/rust-lang/rust/commit/b3b2f1b0d626e79c0a4dd8650042fb805b79a6c4 still stands for the proper rustc invocations, though, and only --on-fail is really broken. If that’s true, I’d adjust the --on-fail path only.

infinity0

comment created time in 9 days

issue comment rust-lang/rust

x86_64-unknown-linux-musl binaries SIGSEGV during early initialization

It likely deserves an issue report to both the NixOS and musl upstreams. I’m going to do it as time permits. Then this can be closed.

nagisa

comment created time in 9 days

issue comment rust-lang/rust

x86_64-unknown-linux-musl binaries SIGSEGV during early initialization

It looks like the issue is coming from the fact that on NixOS cc wraps gcc to add the following arguments:

  -Wl\,-dynamic-linker
  -Wl\,/nix/store/jx19wa4xlh9n4324xdl9rjnykd19mmq3-glibc-2.30/lib/ld-linux-x86-64.so.2

If the wrapper is not utilized, or these arguments are removed, the binary runs correctly. Conversely, if it’s run with the interpreter directly, it also faults:

/nix/store/jx19wa4xlh9n4324xdl9rjnykd19mmq3-glibc-2.30/lib/ld-linux-x86-64.so.2 ./test
fish: “/nix/store/jx19wa4xlh9n4324xdl9…” terminated by signal SIGSEGV (Address boundary error)

Possibly not a rustc bug either way.

nagisa

comment created time in 10 days

issue comment rust-lang/rust

x86_64-unknown-linux-musl binaries SIGSEGV during early initialization

Disabling PIE helps:

$ rustc --version; and rustc --target=x86_64-unknown-linux-musl test.rs -C relocation-model=static; and ./test
rustc 1.46.0-beta.2 (6f959902b 2020-07-23)
$ echo $status
0

nagisa

comment created time in 10 days

issue comment rust-lang/rust

x86_64-unknown-linux-musl binaries SIGSEGV during early initialization

Marking as O-NixOS then, as that is where this was reproduced.

nagisa

comment created time in 10 days

issue opened rust-lang/rust

x86_64-unknown-linux-musl binaries SIGSEGV during early initialization

Minimized reproducer

fn main() {}

// $ rustc --target=x86_64-unknown-linux-musl test.rs
// $ ./test
// fish: “./test” terminated by signal SIGSEGV (Address boundary error)

Stack trace:

(gdb) bt
#0  0x0000555555558193 in _start_c ()
#1  0x00007fffffffc6b9 in ?? ()
#2  0x00000000178bfbff in ?? ()
#3  0x0000000000000064 in ?? ()
#4  0x0000000000000000 in ?? ()

The faulting instruction is this:

0x0000555555558193 <+291>:   mov    0x8(%rdx),%rsi

The register value is not obviously wrong (i.e. it’s not 0 or something like that), so this could be a buffer overrun of some sort.

rustc 1.45.0 (5c1f21c3b 2020-07-13) works. rustc 1.46.0-beta.2 (6f959902b 2020-07-23) fails. rustc 1.47.0-nightly (5ef299eb9 2020-07-24) fails.

created time in 10 days

Pull request review comment rust-lang/rust

rustc: Improving safe wasm float->int casts

[review diff context: the `cast_float_to_int` hunk from this PR, which gates the existing fcmp/select saturation sequence behind a new `fptosui_may_trap` check and, for builders whose `fptosi`/`fptoui` may trap on out-of-bounds values, emits basic blocks and control flow so the conversion is only executed for in-bounds values]

This is what I ended up with in the end:

    ;; block so far... %0 is the argument
    %inbound_lower = fcmp oge double %0, 0xC1E0000000000000
    %inbound_upper = fcmp ole double %0, 0x41DFFFFFFFC00000
    ;; match (inbound_lower, inbound_upper) { 
    ;;     (true, true) => %0 can be converted without trapping
    ;;     (false, false) => %0 is a NaN
    ;;     (true, false) => %0 is too large
    ;;     (false, true) => %0 is too small
    ;; }
    ;;
    ;; The (true, true) check, go to %convert if so.
    %inbounds = and i1 %inbound_lower, %inbound_upper
    br i1 %inbounds, label %convert, label %specialcase

specialcase:
    ;; Handle the cases where the number is NaN, too large or too small
    
    ;; Either (true, false) or (false, true)
    %is_not_nan = or i1 %inbound_lower, %inbound_upper
    ;; Figure out which saturated value we are interested in if not `NaN`
    %saturated = select i1 %inbound_lower, i32 2147483647, i32 -2147483648
    ;; Figure out between saturated and NaN representations
    %result_nan = select i1 %is_not_nan, i32 %saturated, i32 0
    br label %done

convert:
    %result = call i32 @llvm.wasm.trunc.signed.i32.f64(double %0) ; fptosi double %0 to i32
    br label %done

done:
    %r = phi i32 [ %result, %convert ], [ %result_nan, %specialcase ]
    ;; continues here

Here’s a godbolt comparison of the two approaches: https://rust.godbolt.org/z/cnEGWM. The new one looks somewhat better to my untrained eye, and it’s also something LLVM is not really able to optimize much further, so at the very least there’s a very minor win in terms of compile time.

alexcrichton

comment created time in 11 days

Pull request review comment rust-lang/rust

rustc: Improving safe wasm float->int casts

[review diff context: the same `cast_float_to_int` hunk as above]

Here’s something I have so far (caveat untested, unchecked and not validated):

  ; check that the number is within the expected bounds and just convert if the number is in range
  %inbound_lower = fcmp oge double %0, 0xC1E0000000000000
  %inbound_upper = fcmp ole double %0, 0x41DFFFFFFFC00000
  %inbounds = and i1 %inbound_lower, %inbound_upper
  br i1 %inbounds, label %convert, label %specialcase

specialcase:
  %isnan = fcmp uno double %0, 0.000000e+00 ; can probably also be `!inbound_lower && !inbound_upper`?
  ; select a max/min value for the case where float is not nan
  %result_notnan = select i1 %inbound_lower, i32 2147483647, i32 -2147483648
  ; nan or min/max
  %result_nan = select i1 %isnan, i32 0, i32 %result_notnan
  ret i32 %result_nan

convert:
  %result = fptosi double %0 to i32
  ret i32 %result
alexcrichton

comment created time in 11 days

Pull request review comment rust-lang/rust

rustc: Improving safe wasm float->int casts

[review diff context: the same `cast_float_to_int` hunk as above]

This looks more complicated than it needs to be. I’m aware that this is not very constructive in terms of feedback as it is… but I’m experimenting with some alternative ways to write this down as LLVM IR and hopefully will come back to update this before Monday.

alexcrichton

comment created time in 11 days

issue comment tokio-rs/tracing

core: Make `Field` keys more flexible

Note: just pondering below, not an actual proposal.

If the field names are at all times known statically (as they are in e.g. calls to the event! and span! families of macros), the interner could be avoided entirely and the linker could be made to do the work for us. In particular something like…

#[export_name="_rust_tracing_field_name_$fieldname"]
#[linkage="linkonce"]
static $fieldname_FIELD_NAME: [u8; _] = *b"$fieldname";

Since the symbol names must be unique across the executable, there will only ever be one address for &$fieldname_FIELD_NAME across the entire executable, no matter how many definitions of this static exist.

The two problems with this approach are that you also need #[linkage="linkonce"], which is unstable… and that you’d somehow need to generate at most one such static per crate...

Less reliable (read: useless), but in a given codegen unit multiple equivalent string literals will end up having the same address...
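
As a rough illustration of what a guaranteed-unique address buys (a hypothetical helper, not tracing's API), field-name equality could then take a pointer-comparison fast path:

// If every field name has exactly one address, pointer equality implies
// field equality; fall back to a byte comparison for the general case.
fn same_field(a: &'static str, b: &'static str) -> bool {
    std::ptr::eq(a.as_ptr(), b.as_ptr()) && a.len() == b.len() || a == b
}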

hawkw

comment created time in 12 days

PR opened NixOS/nixpkgs

arcanist: 20200127 -> 20200711

Motivation for this change

Note that this arcanist bump introduces some breaking changes to the tool interface. arcanist hasn’t been updated for quite a while now, though, to the point where some documentation is no longer applicable to the old version, leading to confused users and broken workflows.

I verified that the following commands work (provided git is already in the environment):

arc help
arc branches
arc work
arc todo # changes state in phabricator
arc get-config
arc which
arc tasks
arc look

I also found that the following commands fail to work:

arc version # installation is not a git repo
arc anoid   # easter egg, requires python3, not worth the closure size bump 
arc browse # unable to find a browser command to run.

r? @thoughtpolice

Things done

  • [x] Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • [x] NixOS
    • [ ] macOS
    • [ ] other Linux distributions
  • [ ] Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • [ ] Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • [x] Tested execution of all binary files (usually in ./result/bin/)
  • [x] Determined the impact on package closure size (by running nix path-info -S before and after)
  • [ ] Ensured that relevant documentation is up to date
  • [ ] Fits CONTRIBUTING.md.
+31 -36

0 comments

1 changed file

pr created time in 13 days

create branch nagisa/nixpkgs

branch: arcup

created branch time in 13 days

pull request comment rust-lang/rust

Rearrange the pipeline of `pow` to gain efficiency

@bors r-, yeah, please squash.

Neutron3529

comment created time in 13 days

pull request comment rust-lang/rust

Rearrange the pipeline of `pow` to gain efficiency

@bors r+ rollup

Neutron3529

comment created time in 14 days

push event nagisa/msi-rgb

Simonas Kazlauskas

commit sha 12d465aa20a70a51cc7dd25e6add26cec8a34437

Redirect visitors to openrgb

view details

push time in 14 days

pull request comment rust-lang/rust

Add the aarch64-apple-darwin target

@bors r+

shepmaster

comment created time in 14 days

Pull request review comment rust-lang/rust

Add the aarch64-apple-darwin target

+use crate::spec::{LinkerFlavor, Target, TargetOptions, TargetResult};
+
+pub fn target() -> TargetResult {
+    let mut base = super::apple_base::opts();
+    base.cpu = "apple-a12".to_string();

This brings up an interesting question though – should we actually strive to leave this set to apple-a12 to support these soon-legacy devices?

shepmaster

comment created time in 14 days

pull request comment rust-lang/rust

Remove branch in optimized is_ascii

It improves medium and short by 1ns but regresses unaligned_tail by 2ns

This amount of improvement sounds like it’s most likely within the error margin. At this point we probably would need to start measuring cycles (with e.g. llvm-mca), rather than nanoseconds.

Much like @thomcc I’m worried that the impact of having an unconditional unaligned load will move the performance hit outside of the error margin on architectures where unaligned loads are expensive (and possibly simulated in generated code).

One thing we could try to mitigate the cost somewhat is to replace the unaligned load with a plain byte-by-byte loop if we are making that part of the code unconditional... though it would probably make the improvement also vanish...
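
For reference, the byte-by-byte fallback could be as simple as this sketch (not the actual standard library code):

// Check the tail one byte at a time instead of issuing an unaligned
// word-sized load; every ASCII byte has its high bit clear.
fn tail_is_ascii(tail: &[u8]) -> bool {
    tail.iter().all(|&b| b < 0x80)
}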

All that said, I guess I'm not super against the idea of landing this, but it does not look like a slam dunk either.

pickfire

comment created time in 14 days

pull request comment rust-lang/rust

Rename HAIR to THIR (Typed HIR).

I'm not sure whether this ought to merit an MCP

IMHO we can do without one here. HAIR is a comparatively obscure internal representation not many interact with or will have an opinion about.

Lezzz

comment created time in 14 days

pull request comment rust-lang/rust

Stabilize the backtrace feature.

The inherent

impl dyn Error {
    pub fn downcast<T: Error + 'static>(self: Box<Self>) -> Result<Box<T>, Box<dyn Error>> { ... }
}

method was cited as problematic for Error’s inclusion in core; however, is that really the case? Is there any reason we couldn’t put something along the lines of

impl Box<dyn core::error::Error> {
    pub fn downcast<T: core::error::Error + 'static>(self) -> Result<Box<T>, Box<dyn core::error::Error>> { ... }
}

in liballoc instead? That way libcore users wouldn’t have downcast available to them, but it’s not like they have Box without liballoc anyway.

withoutboats

comment created time in 14 days

issue opened image-rs/image

image::flat::Error does not implement std::error::Error

This happens in 0.23.7

Expected

Error types should implement the standard Error trait.

Actual behaviour

image::flat::Error does not implement the standard Error trait.

Reproduction steps

extern crate image;

fn is_error<E: std::error::Error>() {}

fn main() {
    is_error::<image::flat::Error>();
}

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=07a134e278eaa0d55afab09954ff7159

created time in 15 days

pull request comment rust-lang/rust

Fix an ICE on an invalid `binding @ ...` in a tuple struct pattern

@bors r+

jakubadamw

comment created time in 15 days

Pull request review comment rust-lang/rust

Fix an ICE on an invalid `binding @ ...` in a tuple struct pattern

 impl<'a, 'hir> LoweringContext<'a, 'hir> {
                             Applicability::MaybeIncorrect,
                         )
                         .emit();
-                    break;

This specific piece of code was conceptually correct – interpreting this pattern as a simple PatKind::Rest makes the most sense.

In that respect, failing to resolve the variable (i.e. using the first iteration of this PR) seems like a better workaround.

The ideal fix would probably be something that ensures users cannot interpret ddpos as anything but PatKind::Rest. Achieving this might involve reworking the AST representation or something along those lines. Or perhaps something along the lines of the following hack might work:

            elems.push(self.lower_pat(Pat {
                kind: PatKind::Rest,
                ..pat
            }));
            break;

so that the downstream users also end up seeing .. rather than x @ .., which they could otherwise misinterpret in some way.

jakubadamw

comment created time in 15 days

pull request comment rust-lang/rust

Fix an ICE on an invalid `binding @ ...` in a tuple struct pattern

I personally like the former slightly more because it’s not as redundant (and x not being found is fine given that it’s a variable introduced by an error), but ultimately either error seems fine to me, as either approach fixes the significantly more important problem – the ICE.

In the end the decision on diagnostics is something that @rust-lang/wg-diagnostics is responsible for and the diagnostics themselves can be iterated further in a separate PR that does not target a backport to beta or stable. If we want this to be backportable it needs to be as simple and obviously correct as it can possibly get.

jakubadamw

comment created time in 15 days

pull request comment rust-lang/rust

Fix an ICE on an invalid `binding @ ...` in a tuple struct pattern

@bors r+

jakubadamw

comment created time in 15 days

Pull request review comment rust-lang/rust

Fix an ICE on an invalid `binding @ ...` in a tuple struct pattern

 impl<'a, 'b, 'ast> LateResolutionVisitor<'a, 'b, 'ast> {
         pat_src: PatternSource,
         bindings: &mut SmallVec<[(PatBoundCtx, FxHashSet<Ident>); 1]>,
     ) {
+        let is_tuple_struct_pat = matches!(pat.kind, PatKind::TupleStruct(_, _));
+
         // Visit all direct subpatterns of this pattern.
         pat.walk(&mut |pat| {
             debug!("resolve_pattern pat={:?} node={:?}", pat, pat.kind);
             match pat.kind {
-                PatKind::Ident(bmode, ident, ref sub) => {
+                PatKind::Ident(bmode, ident, ref sub)
+                    // In tuple struct patterns ignore the invalid `ident @ ...`.
+                    // It will be handled as an error by the AST lowering.
+                    if !(is_tuple_struct_pat && sub.as_ref().filter(|p| p.is_rest()).is_some()) => {

Very minor nit: this is indented weirdly; it took me a couple of reads before I realised this is a pattern guard.

jakubadamw

comment created time in 15 days

issue comment rust-random/rand

Use-after-free when using a ThreadRng from a std::thread_local destructor

It is not clear to me if (6) is a viable solution. mem::forget is safe. If somebody had a good reason to forget it, they would end up with a panic in the TLS destructor, which can possibly abort the entire program for no good reason.

In particular I think something along the lines of the following snippet is fairly plausible:

let thread_rng = Box::new(thread_rng());
let ptr = Box::into_raw(thread_rng);
// Something that could maybe panic; if a panic occurs you end up with a leaked
// reference. Eventually, on thread termination, the TLS destructor may run and
// may abort if the thread termination is due to this or a different panic.
let thread_rng = unsafe { Box::from_raw(ptr) };
// carry on...
nathdobson

comment created time in 16 days

pull request comment rust-lang/rfcs

Add Drop::poll_drop_ready for asynchronous destructors

@burdges from what I can tell, not at all. AFAIK accounting for the inherent lack of ordering in TLS destructors can only really be done in a library- or application-specific manner.

withoutboats

comment created time in 16 days

issue comment rust-random/rand

Use-after-free when using a ThreadRng from a std::thread_local destructor

Could you explain in a bit more detail the failure case with (3)

I’m also curious. As the sample implementation is written now, the example will just panic if thread_rng is no longer usable. From what I can tell there's no difference between implementations (3) and (2).

Unless the intent was to say that users can misuse the ThreadRng::rng method to obtain and retain the pointer?


In my opinion the possibility of a panic (from LocalKey::with) inside of a destructor is also a problem, but from what I can tell the APIs don’t really allow for anything else. Could we perhaps fall back to something slower, like getrandom, if the TLS key can no longer be accessed?
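
A sketch of that fallback idea, with hypothetical names standing in for rand's internals:

use std::cell::RefCell;

use rand::rngs::StdRng;
use rand::{RngCore, SeedableRng};

thread_local! {
    // Hypothetical stand-in for ThreadRng's thread-local storage.
    static THREAD_RNG: RefCell<StdRng> = RefCell::new(StdRng::from_entropy());
}

fn fill_bytes_robust(dest: &mut [u8]) {
    // `LocalKey::try_with` returns an Err once the TLS slot has been
    // destroyed, instead of panicking the way `with` does.
    if THREAD_RNG.try_with(|rng| rng.borrow_mut().fill_bytes(dest)).is_err() {
        // The TLS destructor already ran; fall back to the OS.
        getrandom::getrandom(dest).expect("OS randomness unavailable");
    }
}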

nathdobson

comment created time in 16 days

Pull request review comment rust-lang/compiler-builtins

Use REP MOVSB/STOSB/CMPSB on x86_64

+use super::c_int;
+
+// On recent Intel processors, "rep movsb" and "rep stosb" have been enhanced to

How does this implementation fare on non-Intel implementations of x86_64?

josephlr

comment created time in 17 days

pull request comment rust-lang/rust

Rearrange the pipeline of `pow` to gain efficiency

It took me a while to understand the benchmark results, but from what I can tell the results are all awfully close to being within the error margin? Can the benchmarks be done with something like criterion, so that we get more information than just the sum?
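
For reference, a minimal criterion benchmark along those lines could look like this (hypothetical harness, assuming criterion as a dev-dependency):

use criterion::{black_box, criterion_group, criterion_main, Criterion};

// Benchmark a single `pow` call; criterion reports mean, deviation and
// outliers rather than a single sum over all inputs.
fn bench_pow(c: &mut Criterion) {
    c.bench_function("u64::pow", |b| b.iter(|| black_box(3u64).pow(black_box(20))));
}

criterion_group!(benches, bench_pow);
criterion_main!(benches);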

Neutron3529

comment created time in 17 days

pull request comment rust-lang/measureme

Add named "additional data" support

Note: this doesn’t actually add the API to measureme to output such data. Ideas on what API would make most sense here would be appreciated.

nagisa

comment created time in 17 days

push event nagisa/measureme

Simonas Kazlauskas

commit sha 48c55c49842910d8c3e59a4b9c1509da52d6dda9

Add named "additional data" support This allows representing KV pairs of data in e.g. chrome profiles more faithfully.

view details

push time in 17 days

PR opened rust-lang/measureme

Add named "additional data" support

This allows representing KV pairs of data in e.g. Chrome profiles more faithfully. Previously we would automatically generate arg0... as a key for all such data.

[screenshot: Chromium Profiler showing proper key-value information for the selected span]

+101 -51

0 comments

7 changed files

pr created time in 17 days

create branch nagisa/measureme

branch: nagisa/kv-additional-data

created branch time in 17 days

Pull request review comment rust-lang/rfcs

Add Drop::poll_drop_ready for asynchronous destructors

[review diff context: the text of the `poll_drop_ready` RFC, ending with:]

However, we do guarantee that values dropped in an async context *will* have `poll_drop_ready` called, and we even guarantee the drop order between variables and between fields of a type. When the user sees a value go out of scope *in an async context*, they know that `poll_drop_ready` is called. And we do guarantee that the destructors of all fields of that type will be called.

Does this need to hold up in the presence of panics? Double panics? If one of the asynchronous destructors panics when polled, do we attempt to drive the other destructors to completion? Do we call the non-async drop for the fields whose async destructor panicked? Do we call the non-async portion of the drop glue at all after a panic in an async destructor?

withoutboats

comment created time in 18 days

Pull request review comment rust-lang/rfcs

Add Drop::poll_drop_ready for asynchronous destructors

[RFC diff context, trimmed. The proposal adds
`fn poll_drop_ready(&mut self, cx: &mut Context<'_>) -> Poll<()>` to `Drop`;
drop glue in async contexts polls it to `Ready` before calling `drop`, and
implementations must have "fused" semantics since the method may be called
again after returning `Ready`. The passages the comments below refer to:]

Therefore, there are two cases in which users will have to take special care
to ensure that async destructors are called:

1. If they would already have to call `drop_in_place` to ensure that normal
   destructors are called (as in data structures like `Vec`).
2. If they are dropping a value that may have an async destructor inside a
   poll method.

## Drop order

Between variables, calls to `poll_drop_ready` will occur in reverse order of
the variable's introduction, immediately prior to the calls to that variable's
destructor.

Between fields of a type, calls to `poll_drop_ready` will occur in the textual
order of that type's declaration, with the call for the type itself occurring
first. This is similar to the order for calls to `drop`. *However*, these
calls will occur in a loop, at the level of the type being dropped, until all
calls to `poll_drop_ready` for that value have returned `Ready`, so they will
be "interleaved" and concurrent in practice. The program will *not* wait for
each field to return ready before beginning to process the subsequent field.

Consider referencing https://github.com/rust-lang/rfcs/blob/master/text/1857-stabilize-drop-order.md here.
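
For concreteness, the order RFC 1857 stabilized is easy to demonstrate on stable Rust today; a minimal, purely illustrative example:

```rust
struct Noisy(&'static str);

impl Drop for Noisy {
    fn drop(&mut self) {
        println!("dropping {}", self.0);
    }
}

fn main() {
    let _a = Noisy("a");
    let _b = Noisy("b");
    // Prints "dropping b" and then "dropping a": variables drop in reverse
    // declaration order, per RFC 1857. Under this proposal, each variable's
    // `poll_drop_ready` would be driven to `Ready` immediately before its
    // `drop` call, in that same order.
}
```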

Also consider mentioning how built-in structures like `Vec` ought to behave, and whether there are any special considerations in the presence of panics, especially during structure construction.
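
To make the `Vec` question concrete, here is a rough sketch of how a container might drive its elements' drop-readiness. This is not from the RFC; `poll_ready_of` is a stand-in for the proposed `mem::poll_drop_ready`, which does not exist in today's std:

```rust
use std::task::{Context, Poll};

// Hypothetical helper a Vec-like container might use under this proposal.
// Every element is re-polled on each call; the RFC's "fused" semantics are
// what make re-polling already-ready elements harmless.
fn poll_drop_elements<T>(
    items: &mut [T],
    cx: &mut Context<'_>,
    mut poll_ready_of: impl FnMut(&mut T, &mut Context<'_>) -> Poll<()>,
) -> Poll<()> {
    let mut all_ready = true;
    for item in items.iter_mut() {
        if poll_ready_of(item, cx).is_pending() {
            all_ready = false;
        }
    }
    if all_ready { Poll::Ready(()) } else { Poll::Pending }
}
```

In particular, if a panic unwinds partway through constructing such a structure, presumably only the already-initialized elements should get this treatment, but the RFC should spell that out.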

withoutboats

comment created time in 18 days

Pull request review commentrust-lang/rfcs

Add Drop::poll_drop_ready for asynchronous destructors

[RFC diff context, trimmed to the heading the comment below refers to:]

## Types can which close asynchronously

s/types can which/types which can/?

withoutboats

comment created time in 18 days

Pull request review commentrust-lang/rfcs

Stabilize Cargo's new feature resolver

[RFC diff context, trimmed to the passage the comment below refers to:]

* This adds complexity to Cargo, and adds boilerplate to `Cargo.toml`. It can
  also be confusing when switching between projects that use different
  settings. It is intended in the future that the new resolver will become the
  default via the "edition" declaration. This will remove the extra

Edition is a non-global choice, and workspace roots generally do not have an edition annotation. Does this mean that workspaces are doomed to forever use the `resolver` annotation and/or the old resolver?

ehuss

comment created time in 18 days

Pull request review commentrust-lang/rfcs

Stabilize Cargo's new feature resolver

[RFC diff context, trimmed to the passage the comment below refers to:]

Setting the resolver to `"2"` switches Cargo to use the new feature resolver.
It also enables backwards-incompatible behavior detailed in [New command-line
behavior](#new-command-line-behavior). `"2"` is the only valid option; if
`resolver` is not specified then the old behavior is used.

Is there any reason why `resolver = "1"` is not something we are going to accept, even if just for a reduced surprise factor? (As prior art, `--edition` accepts both 2015 and 2018 as values.)

ehuss

comment created time in 18 days

Pull request review commentrust-lang/rfcs

RFC: Promote aarch64-unknown-linux-gnu to a Tier-1 Rust target

[RFC diff context, trimmed to the passage the comments below refer to:]

# Drawbacks
[drawbacks]: #drawbacks

**There is no drawback envisioned in promoting the Rust aarch64-unknown-linux-gnu to Tier-1.**

I don’t think your average contributor will be willing to spend their own dollars just to hack on an architecture that they don’t have much stake in.

That said, this doesn't really affect me much: I already have a fairly capable aarch64 machine in my possession. If others don't consider it a problem, that's fine by me.

raw-bin

comment created time in 18 days

pull request commentrust-lang/rfcs

RFC: Promote aarch64-unknown-linux-gnu to a Tier-1 Rust target

I’m in broad support of this RFC.

raw-bin

comment created time in 18 days

Pull request review commentrust-lang/rfcs

RFC: Promote aarch64-unknown-linux-gnu to a Tier-1 Rust target

[RFC diff context, trimmed to the same "Drawbacks" passage quoted above:]

**There is no drawback envisioned in promoting the Rust aarch64-unknown-linux-gnu to Tier-1.**

There is one important caveat (or drawback?) compared to other current T1 targets: most developers won't yet have access to aarch64 hardware capable enough to support compiler development, and we do not currently have any infrastructure to provide such access either. As such, there is a significant limit on the developer-power we could mobilize if something broke on aarch64.

That's not an explicit requirement for a target to become T1 though (should it be?), so I'm not saying this is a blocker or anything of the sort.

raw-bin

comment created time in 18 days

pull request commentrust-lang/rust

bootstrap.py: patch RPATH on NixOS to handle the new zlib dependency.

@bors r+ with or without comments addressed.

eddyb

comment created time in 18 days

Pull request review commentrust-lang/rust

bootstrap.py: patch RPATH on NixOS to handle the new zlib dependency.

[bootstrap.py diff context, trimmed to the hunk the comment below refers to:]

if fname.endswith(".so"):
    # Dynamic library, patch RPATH to point to system dependencies.
    dylib_deps = ["zlib"]
    rpath_entries = [

Yeah, it is just not clear to me that $ORIGIN/../lib is sufficient, or that there aren't libraries/binaries that implicitly gain other rpath entries as part of their build process. I'm fine with it if it is, I guess.

eddyb

comment created time in 18 days

Pull request review commentrust-lang/rust

bootstrap.py: patch RPATH on NixOS to handle the new zlib dependency.

(diff context: same `fix_executable` hunk as quoted above, truncated at the `nix-build` invocation)

The symlinks won't be named as nicely as they are right now, but:

$ nix-build '<nixpkgs>' -A zlib -A stdenv.cc.bintools -o foo/.nixdeps
$ ls -alh foo/
total 0
drwxr-xr-x  2 nagisa users  80 Jul 17 17:43 .
drwxrwxrwt 21 root   root  560 Jul 17 17:43 ..
lrwxrwxrwx  1 nagisa users  55 Jul 17 17:43 .nixdeps -> /nix/store/ml4ipdnvc7pr07dr6i35831f7ffxny0k-zlib-1.2.11
lrwxrwxrwx  1 nagisa users  67 Jul 17 17:43 .nixdeps-2 -> /nix/store/wy6v1s2y8rvxcy98l2yvxqj280cq9wgc-binutils-wrapper-2.31.1
eddyb

comment created time in 18 days

Pull request review commentrust-lang/rust

bootstrap.py: patch RPATH on NixOS to handle the new zlib dependency.

(diff context: same `fix_executable` hunk as quoted above)

Hm… shouldn't we just append the rpath?

eddyb

comment created time in 18 days

Pull request review commentrust-lang/rust

bootstrap.py: patch RPATH on NixOS to handle the new zlib dependency.

(diff context: same `fix_executable` hunk as quoted above, truncated at the `nix-build` invocation)

Would it make sense to try and build all of the deps with a single command?

eddyb

comment created time in 18 days

issue commentrust-lang/rust

On linux, __rust_probestack shouldn't be called on smallish stack allocations

__rust_probestack being called for anything larger than 1 typical page on the platform (4k on Linux) is expected. That's because when the guard is only 1 page large (true at least for non-main threads), not probing could allow a function's stack frame to jump over the guard entirely.

All the linked PR does is stop manually allocating said guard page on the main thread. I'm not sure what the relation between it and the probing functionality is, other than the fact that both involve stacks and their guard pages.
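
To make the failure mode concrete, here is a minimal sketch (mine, not from the PR) of the kind of function whose frame needs probing:

    // With a 4 KiB guard page and no probing, the first store into a
    // frame this large could land past the guard page instead of
    // faulting inside it, silently corrupting whatever is mapped there.
    fn big_frame() -> u8 {
        // 64 KiB of locals spans many pages at once.
        let buf = [0u8; 64 * 1024];
        // The compiler therefore emits a __rust_probestack call in the
        // prologue so that every page of the frame is touched in order.
        buf[buf.len() - 1]
    }

    fn main() {
        let _ = big_frame();
    }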

KyleSiefring

comment created time in 19 days

issue commentnagisa/msi-rgb

MSI MEG x399 Creation

I'm not actively developing this project beyond its current state, mostly because I do not have access to any hardware other than my own motherboard, and I am unlikely to purchase any consumer-grade computer hardware made by MSI in the future.

scott-hyperdrive

comment created time in 19 days

startedjakubadamw/arbitrary-model-tests

started time in 19 days

pull request commentrust-lang/rust

improve DiscriminantKind handling

@bors r+

lcnr

comment created time in 20 days

pull request commentrust-lang/compiler-builtins

Improve `__clzsi2` performance

Yeah, I think it's fine to conditionally switch between implementations in compiler-builtins. It's a library that's used everywhere, after all.

AaronKutch

comment created time in 20 days

Pull request review commentrust-lang/rust

improve DiscriminantKind handling

 language_item_table! {
     CloneTraitLangItem,          "clone",              clone_trait,             Target::Trait;
     SyncTraitLangItem,           "sync",               sync_trait,              Target::Trait;
     DiscriminantKindTraitLangItem,"discriminant_kind", discriminant_kind_trait, Target::Trait;
+    DiscrKindDiscrLangItem,      "discr_kind_discr",   discr_kind_discr,        Target::AssocTy;

Maybe just discriminant_type?
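
For concreteness, following the table's format, the renamed entry would presumably end up looking something like this (the lang item constant name is just as much of a guess):

    DiscriminantTypeLangItem,    "discriminant_type",  discriminant_type,       Target::AssocTy;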

lcnr

comment created time in 21 days

pull request commentrust-lang/compiler-builtins

Improve `__clzsi2` performance

I did some llvm-mca spelunking:

                          uOps Per Cycle    IPC    Block RThroughput
    OLD x86_64 znver1          3.05         3.05          9.3
    NEW x86_64 znver1          1.48         1.48         10.0
    OLD Apple-A13              1.58         1.58          7.8
    NEW Apple-A13              1.10         1.10         11.5
    OLD Cortex-A73             1.66         1.66         15.0
    NEW Cortex-A73             1.22         1.22         14.5

(lower Block RThroughput is better and is probably the best indication of real-world performance you can expect)

With the caveat that llvm-mca apparently only models superscalar CPUs, while this builtin mostly matters on targets without such hardware, the old version is better in 2 out of 3 cases.

A cursory look suggests that both the new and the old implementation are pretty good about their register dependencies and don't spend much time waiting on anything; LLVM simply ends up generating fewer instructions on targets that happen to have a "conditional set/move" kind of instruction. For the tested targets the assembly also ends up non-branchy with either implementation, which makes sense, since the generated code uses conditional moves instead of branches.

For example here’s assembly for MIPS:

<details> <summary>Old</summary>

	srl	$1, $4, 16
	addiu	$2, $zero, 16
	addiu	$3, $zero, 32
	movz	$2, $3, $1
	addiu	$3, $2, -8
	movz	$1, $4, $1
	srl	$4, $1, 8
	movz	$3, $2, $4
	addiu	$2, $3, -4
	movz	$4, $1, $4
	srl	$1, $4, 4
	movz	$2, $3, $1
	addiu	$3, $zero, -2
	addiu	$5, $2, -2
	movz	$1, $4, $1
	srl	$4, $1, 2
	movz	$5, $2, $4
	movz	$4, $1, $4
	negu	$1, $4
	sltiu	$2, $4, 2
	movz	$1, $3, $2
	jr	$ra
	addu	$2, $1, $5

</details>

<details> <summary>New</summary>

	ori	$1, $zero, 65535
	sltu	$1, $1, $4
	sll	$1, $1, 4
	srlv	$2, $4, $1
	addiu	$3, $zero, 255
	sltu	$3, $3, $2
	sll	$3, $3, 3
	srlv	$2, $2, $3
	or	$1, $1, $3
	addiu	$3, $zero, 32
	addiu	$4, $zero, 1
	addiu	$5, $zero, 3
	addiu	$6, $zero, 15
	sltu	$6, $6, $2
	sll	$6, $6, 2
	srlv	$2, $2, $6
	sltu	$5, $5, $2
	sll	$5, $5, 1
	srlv	$2, $2, $5
	sltu	$4, $4, $2
	srlv	$2, $2, $4
	or	$1, $1, $6
	or	$1, $1, $5
	or	$1, $1, $4
	addu	$1, $1, $2
	jr	$ra
	subu	$2, $3, $1

</details>

So my overall conclusion is that if there's any improvement with the new implementation, it's probably going to be limited to RISC-V...

(though also remember that targets like x86_64 and aarch64 have no use for this builtin as they have a native instruction)
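
For readers following along, the old implementation has roughly this binary-search shape (a simplified 32-bit sketch of mine, not the exact compiler-builtins source):

    // Each step conditionally halves the remaining search window; on
    // targets with a conditional set/move instruction LLVM lowers these
    // ifs branchlessly, which is exactly the code shape measured above.
    pub fn clz32(mut x: u32) -> u32 {
        let mut n = 32;
        if x >> 16 != 0 { n -= 16; x >>= 16; }
        if x >> 8 != 0 { n -= 8; x >>= 8; }
        if x >> 4 != 0 { n -= 4; x >>= 4; }
        if x >> 2 != 0 { n -= 2; x >>= 2; }
        if x >> 1 != 0 { return n - 2; }
        n - x
    }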

AaronKutch

comment created time in 21 days

pull request commentrust-lang/compiler-builtins

Improve `__clzsi2` performance

There are also examples of benchmarks in #365, but the general idea is that if this is done for performance reasons, it would be good to confirm that the performance did indeed improve.

IMO, looking at something like llvm-mca output for cases like this gives significantly more information and is a more rigorous way to evaluate the performance of functions in compiler-builtins. So, while I think benchmarks would be cool, I'd also be fine with accepting this with just an llvm-mca evaluation or a similar function latency/throughput analysis for the targets that might actually use this function.

AaronKutch

comment created time in 21 days

Pull request review commentrust-lang/compiler-builtins

Improve `__clzsi2` performance

-extern crate compiler_builtins;
+use rand::random;
 
 use compiler_builtins::int::__clzsi2;
 
 #[test]
 fn __clzsi2_test() {
-    let mut i: usize = core::usize::MAX;
-    // Check all values above 0
-    while i > 0 {
-        assert_eq!(__clzsi2(i) as u32, i.leading_zeros());
-        i >>= 1;
-    }
-    // check 0 also
-    i = 0;
-    assert_eq!(__clzsi2(i) as u32, i.leading_zeros());
-    // double check for bit patterns that aren't just solid 1s
-    i = 1;
-    for _ in 0..63 {
-        assert_eq!(__clzsi2(i) as u32, i.leading_zeros());
-        i <<= 2;
-        i += 1;
+    // binary fuzzer
+    let mut x = 0usize;
+    let mut ones: usize;
+    // creates a mask for indexing the bits of the type
+    let bit_indexing_mask = usize::MAX.count_ones() - 1;
+    for _ in 0..1000 {
+        for _ in 0..4 {
+            let r0: u32 = bit_indexing_mask & random::<u32>();
+            ones = !0 >> r0;
+            let r1: u32 = bit_indexing_mask & random::<u32>();
+            let mask = ones.rotate_left(r1);
+            match (random(), random()) {

I wonder if something like https://docs.rs/quickcheck/0.9.2/quickcheck/ is something we could consider for these tests?
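
Something along these lines, perhaps (just a sketch of how the property could be phrased; the test name is made up):

    // Property: the builtin must agree with the standard library's
    // leading_zeros for every input quickcheck throws at it.
    #[macro_use]
    extern crate quickcheck;

    use compiler_builtins::int::__clzsi2;

    quickcheck! {
        fn matches_leading_zeros(x: usize) -> bool {
            __clzsi2(x) as u32 == x.leading_zeros()
        }
    }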

AaronKutch

comment created time in 21 days

Pull request review commentrust-lang/compiler-builtins

Greatly improve division performance for u128 and other cases

+// TODO: when `unsafe_block_in_unsafe_fn` is stabilized, remove this
+#![allow(unused_unsafe)]
+
+//! This `specialized_div_rem` module is from version 0.4.0 of the `specialized-div-rem`
+//! crate. The build system cannot handle including it as a regular dependency, so we have
+//! pasted `asymmetric.rs`, `binary_long.rs`, `delegate.rs`, `norm_shift.rs`, and
+//! `trifecta.rs` here without modification. Note that `for` loops with ranges are not
+//! used in this module, since unoptimized compilation may generate references to `memcpy`.

Maybe I should just let go and say that if any future contributor wants to break this dependency they can. I don't know if I will ever touch my specialized-div-rem crate again.

If the very good implementation is made available in compiler-builtins, the external crate kind of loses its purpose, except maybe as a place to experiment with further improvements, so that would make sense to me.

If we end up allowing the code in compiler-builtins to eventually diverge from what’s in the crate, then I would probably take this approach:

  1. Reference the crate where the code comes from in e.g. mod.rs once, as you suggested;
  2. Add yourself to the list of authors in Cargo.toml if you want your contribution to be named/acknowledged;
  3. Land the code as is without caring much about the minor stylistic nits I made.

After the PR lands we can then clean up the code to match the compiler-builtins conventions better, without needing to worry about the problems that divergence would pose, as the codebases would effectively be entirely independent from that point on.

AaronKutch

comment created time in 21 days

Pull request review commentrust-lang/compiler-builtins

Greatly improve division performance for u128 and other cases

-use int::Int;
+use int::specialized_div_rem::*;
 
-trait Div: Int {
-    /// Returns `a / b`
-    fn div(self, other: Self) -> Self {
-        let s_a = self >> (Self::BITS - 1);
-        let s_b = other >> (Self::BITS - 1);
-        // NOTE it's OK to overflow here because of the `.unsigned()` below.
-        // This whole operation is computing the absolute value of the inputs
-        // So some overflow will happen when dealing with e.g. `i64::MIN`
-        // where the absolute value is `(-i64::MIN) as u64`
-        let a = (self ^ s_a).wrapping_sub(s_a);
-        let b = (other ^ s_b).wrapping_sub(s_b);
-        let s = s_a ^ s_b;
-
-        let r = a.unsigned().aborting_div(b.unsigned());
-        (Self::from_unsigned(r) ^ s) - s
-    }
-}
-
-impl Div for i32 {}
-impl Div for i64 {}
-impl Div for i128 {}
-
-trait Mod: Int {
-    /// Returns `a % b`
-    fn mod_(self, other: Self) -> Self {
-        let s = other >> (Self::BITS - 1);
-        // NOTE(wrapping_sub) see comment in the `div`
-        let b = (other ^ s).wrapping_sub(s);
-        let s = self >> (Self::BITS - 1);
-        let a = (self ^ s).wrapping_sub(s);
-
-        let r = a.unsigned().aborting_rem(b.unsigned());
-        (Self::from_unsigned(r) ^ s) - s
-    }
-}
-
-impl Mod for i32 {}
-impl Mod for i64 {}
-impl Mod for i128 {}
-
-trait Divmod: Int {
-    /// Returns `a / b` and sets `*rem = n % d`
-    fn divmod<F>(self, other: Self, rem: &mut Self, div: F) -> Self
-    where
-        F: Fn(Self, Self) -> Self,
-    {
-        let r = div(self, other);
-        // NOTE won't overflow because it's using the result from the
-        // previous division
-        *rem = self - r.wrapping_mul(other);
-        r
-    }
-}
-
-impl Divmod for i32 {}
-impl Divmod for i64 {}
+// NOTE: there are aborts inside the specialized_div_rem functions if division by 0
+// is encountered, however these should be unreachable and optimized away unless
+// uses of `std/core::intrinsics::unchecked_div/rem` do not have a 0 check in front
+// of them.

This is only true (ignoring factual mistakes in the quote) for typical crate use mechanisms, that is, when crates are depended on in Cargo.toml or otherwise specified as an --extern crate to rustc. compiler-builtins is a somewhat special case and is linked to as if it were a system/FFI library, rather than like a typical Rust crate.

Most of the common advice about how these things work will not apply to the compiler-builtins crate, just because of how special it is in various ways. compiler-builtins' #[inline]s not working outside compiler-builtins is one of those exceptions.

While LTO might still be able to look at cross-library relationships and maybe run some optimizations, I wouldn't hold my breath on that being true, either.

AaronKutch

comment created time in 21 days

Pull request review commentrust-lang/compiler-builtins

Greatly improve division performance for u128 and other cases

(diff context: same `sdiv.rs` hunk as quoted above)

I believe any #[inline] annotations, as far as compiler-builtins is concerned, will only be relevant within this crate, as it is not a traditional crate. AFAIK #[inline] will not have any effect on the code a typical Rust user writes.

AaronKutch

comment created time in 21 days

Pull request review commentrust-lang/compiler-builtins

Greatly improve division performance for u128 and other cases

(diff context: the new `impl_trifecta!` macro from `specialized_div_rem/trifecta.rs`, several hundred added lines implementing the unsigned `$unsigned_name` division with its short-division, two-possibility, and undersubtracting long-division branches; the comment below refers to the signed wrapper `pub fn $signed_name(duo: $iD, div: $iD) -> ($iD, $iD)`)

This is something that LLVM does as an optimisation in user code. In particular it will replace

%3 = sdiv iN %0, %1   ; signed division

with

%7 = udiv iN %0, %1   ; unsigned division

if it can tell %0 and %1 are both non-negative. The implementation of division in compiler-builtins should not have any effect on this optimizer behaviour.
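
A small illustration of when this kicks in (a sketch of mine; the exact IR depends on the LLVM version and optimization level):

    // After the masks below, both operands are provably non-negative
    // (and b is provably non-zero), so LLVM's instcombine is free to
    // rewrite the signed division into a udiv despite the i64 types.
    pub fn div_nonneg(a: i64, b: i64) -> i64 {
        let a = a & 0x7fff_ffff_ffff_ffff; // clears the sign bit
        let b = (b & 0xffff) | 1;          // now in 1..=65535
        a / b
    }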


I don't particularly care what the implementation of this function currently is; they should all be pretty much equivalent in terms of speed. I just found it curious that it is repeated across the different (trifecta/asymmetric/etc.) strategies.

AaronKutch

comment created time in 21 days
