Ask questionsLTO triggers apparent miscompilation on Windows 10 x64

I tried this code:


name = "bmp-minimal"
version = "0.1.0"
authors = ["Malloc Voidstar <>"]
edition = "2018"


lto = true
codegen-units = 1

use std::io::Write;

fn main() {
    let mut out = Vec::with_capacity(51200);
    for val in 0u8..=255 {
        out.write_all(&[val, val, val, 0]).unwrap();

    for _ in (0..1).rev() {
        for _ in 0..1 {
    println!("{} bytes written", out.len());

(extremely minimized form of grayscale BMP encoding from image crate)

Command: cargo run --release

I expected to see this happen: 1025 bytes written

Instead, this happened:

memory allocation of 26843545600 bytes failederror: process didn't exit successfully: `target\release\bmp-minimal.exe` (exit code: 0xc0000409, STATUS_STACK_BUFFER_OVERRUN)


rustc --version --verbose:

rustc 1.45.0 (5c1f21c3b 2020-07-13)
binary: rustc
commit-hash: 5c1f21c3b82297671ad3ae1e8c942d2ca92e84f2
commit-date: 2020-07-13
host: x86_64-pc-windows-msvc
release: 1.45.0
LLVM version: 10.0

It appears to work fine on ARM Linux in my limited testing.

This minimized form works/doesn't work on the following versions:

Rust version Works?
1.45 (LLVM 10.0) No
1.44.1 No
1.43.1 Yes
1.42.0 No
1.41.1 No
1.40.0 No
1.39.0 No
1.38.0 (LLVM 9.0) No
1.37.0 Yes
1.36.0 Yes
1.35.0 Yes
1.34.0 Yes

If the LTO and codegen-units settings are removed it works on all above versions.

I originally noticed this when a project that worked on 1.44.1 suddenly started failing with huge memory allocations on 1.45, so this table specifically applies to this repro, not whatever the underlying bug is.


Answer questions luqmana

Here's the reduced IR of the minimal rust code here:, a bit simplified and with some renaming

target datalayout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-windows-msvc19.26.28806"

define i32 @main() unnamed_addr {
  %proc_heap = tail call i8* @GetProcessHeap()
  %vec_ptr0 = tail call i8* @HeapAlloc(i8* %proc_heap, i32 0, i64 8)
  br label %loop_hdr
loop_hdr:                                         ; preds = %loop_write, %entry
  %vec_ptr1 = phi i8* [ %vec_ptr0, %entry ], [ %vec_ptr2, %loop_write ]
  %vec_cap1 = phi i64 [ 8, %entry ], [ %new_vec_cap1, %loop_write ]
  %vec_len1 = phi i64 [ 0, %entry ], [ %new_vec_len1, %loop_write ]
  %val = phi i8 [ 0, %entry ], [ %1, %loop_write ]
  %0 = icmp eq i8 %val, 255
  %1 = add nuw i8 %val, 1
  %2 = sub i64 %vec_cap1, %vec_len1
  %3 = icmp ult i64 %2, 2
  br i1 %3, label %check_realloc, label %pre_loop_write
check_realloc:                   ; preds = %loop_hdr
  %len_plus_two_checked = tail call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 %vec_len1, i64 2)
  %checked_res = extractvalue { i64, i1 } %len_plus_two_checked, 1
  br i1 %checked_res, label %exit, label %realloc
realloc:                         ; preds = %check_realloc
  %len_plus_two = extractvalue { i64, i1 } %len_plus_two_checked, 0
  %cap_times_two = shl i64 %vec_cap1, 1
  %4 = icmp ugt i64 %cap_times_two, %len_plus_two
  %new_cap = select i1 %4, i64 %cap_times_two, i64 %len_plus_two
  %new_vec_ptr = tail call i8* @HeapReAlloc(i8* %proc_heap, i32 0, i8* %vec_ptr1, i64 %new_cap)
  br label %loop_write
pre_loop_write:                                             ; preds = %loop_hdr
  %new_vec_len2 = add i64 %vec_len1, 2
  br label %loop_write
loop_write:                                       ; preds = %realloc, %pre_loop_write
  %new_vec_len1 = phi i64 [ %new_vec_len2, %pre_loop_write ], [ %len_plus_two, %realloc ]
  %vec_ptr2 = phi i8* [ %vec_ptr1, %pre_loop_write ], [ %new_vec_ptr, %realloc ]
  %new_vec_cap1 = phi i64 [ %vec_cap1, %pre_loop_write ], [ %new_cap, %realloc ]
  %vec_store_slot1 = getelementptr inbounds i8, i8* %vec_ptr2, i64 %vec_len1
  store i8 %val, i8* %vec_store_slot1, align 1
  %vec_store_slot2 = getelementptr inbounds i8, i8* %vec_store_slot1, i64 1
  store i8 %val, i8* %vec_store_slot2, align 1
  br i1 %0, label %exit, label %loop_hdr
exit:                                             ; preds = %loop_write, %check_realloc
  %exit_code = phi i32 [ 0, %loop_write ], [ 1, %check_realloc ]
  ret i32 %exit_code
declare { i64, i1 } @llvm.uadd.with.overflow.i64(i64, i64)
declare i8* @GetProcessHeap() unnamed_addr
declare i8* @HeapAlloc(i8*, i32, i64) unnamed_addr
declare i8* @HeapReAlloc(i8*, i32, i8*, i64) unnamed_addr

Running this through clang -O (I tried v 10.0.0), the resulting binary will continue to allocate memory in an infinite loop.

Running it through opt -loop-reduce will return a slightly different IR:

  %0 = add nuw i8 %val, 1
  %4 = icmp eq i8 %0, 0
  br i1 %4, label %exit, label %loop_hdr

This is the same issue @comex mentioned above. LLVM should be clearing the no wrap flag on the add there after transforming to a post-increment check but it doesn't. Then later it just gets rid of the exit condition altogether since it believes that the add will never wrap, it can't possibly be equal to 0.

But that doesn't explain why the flag doesn't seem to work post 1.44, looking at the IR (with -cvp-dont-add-nowrap-flags=true):


%_5.0.i.i.i = add i8 %iter.sroa.0.060, 1


%7 = add nuw i8 %iter.sroa.0.063, 1

So in 1.45+ we're not relying on LLVM's CVP to determine that the add has no unsigned wrap but some other mechanism. But after looking through print-after-all for a bit I'm pretty sure this new line in Range::next is the reason:

That boils down to an unchecked_add which in LLVM becomes LLVMBuildNUWAdd in this case.

Since that code was introduced in that lines up with the behaviour we see in 1.45+.

Either way, still an LLVM bug that's just triggered more now.


Related questions

Spurious NaNs produced by trig functions with valid inputs on Windows GNU toolchains hot 3
using 'cargo install xsv' on windows 10 triggers rustc internal error hot 2
under latest MinGW, cannot link with C code using stdout hot 2
Archive all nightlies hot 2
chain() make collect very slow hot 1
if/while Some(n) = &mut foo sugar will leak a temporary mutable borrow to current scope in particular situation hot 1
build an empty project failed (undefined reference to `__onexitbegin') hot 1
Invalid collision with TryFrom implementation? hot 1
Crater runs for Rust 1.38.0 hot 1
Spurious NaNs produced by trig functions with valid inputs on Windows GNU toolchains hot 1
Building LLVM with Clang fails hot 1
Internal compiler error: can't buffer lints after HIR lowering hot 1
E0373 help suggests `move async` but the correct syntax is `async move` hot 1
Tracking issue for `Option::contains` and `Result::contains` hot 1
async fn + rustfmt don't "just work" inside of RLS hot 1
Github User Rank List