@the8472
Member

the8472 commented Sep 28, 2024

Elide temporary allocations in patterns like vec.append(&mut slice.to_vec())
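A minimal sketch of the pattern in question, for readers unfamiliar with it. Because `Vec::append` takes `&mut Vec<T>`, the temporary is passed as `&mut slice.to_vec()`. The function names `append_via_temp` and `append_direct` are hypothetical; `append_direct` is the hand-written equivalent using `extend_from_slice`, which is what the temporary-allocation version should ideally optimize down to. Whether LLVM actually removes the temporary depends on the rustc and LLVM versions involved, as the discussion below shows.

```rust
/// The pattern from the PR description: copy a slice into a temporary
/// `Vec`, then move its elements onto the end of `dst`.
pub fn append_via_temp(dst: &mut Vec<u8>, src: &[u8]) {
    // `to_vec` allocates a fresh buffer and copies `src` into it; `append`
    // copies that buffer onto the end of `dst`, and the emptied temporary
    // is freed when it is dropped at the end of the statement.
    dst.append(&mut src.to_vec());
}

/// Hand-written equivalent: a single copy from `src` into `dst`,
/// with no temporary allocation.
pub fn append_direct(dst: &mut Vec<u8>, src: &[u8]) {
    dst.extend_from_slice(src);
}
```

Both functions leave `dst` in the same state; the difference is only the extra allocation, copy and free in the first one.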

related discussion: https://rust-lang.zulipchat.com/#narrow/stream/187780-t-compiler.2Fwg-llvm/topic/nocapture.20and.20allocation.20elimination

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Sep 28, 2024
@the8472
Member Author

the8472 commented Sep 28, 2024

On its own I don't expect this to do much; we also need llvm/llvm-project#110280 to get memcpy propagation.

But let's see what the perf impact is without the LLVM changes.

@bors try @rust-timer queue


@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Sep 28, 2024
@bors
Collaborator

bors commented Sep 28, 2024

⌛ Trying commit 9a03f37 with merge fa341e6...

bors added a commit to rust-lang-ci/rust that referenced this pull request Sep 28, 2024
@bors
Collaborator

bors commented Sep 29, 2024

☀️ Try build successful - checks-actions
Build commit: fa341e6 (fa341e6bb26c4367ebce0bf9e5583eb3df53c79f)


@rust-timer
Collaborator

Finished benchmarking commit (fa341e6): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.5% | [0.3%, 0.9%] | 5 |
| Regressions ❌ (secondary) | 0.2% | [0.2%, 0.2%] | 1 |
| Improvements ✅ (primary) | -0.3% | [-0.4%, -0.3%] | 2 |
| Improvements ✅ (secondary) | -0.2% | [-0.2%, -0.2%] | 1 |
| All ❌✅ (primary) | 0.3% | [-0.4%, 0.9%] | 7 |

Max RSS (memory usage)

Results (primary -0.1%, secondary 0.5%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 2.5% | [2.5%, 2.5%] | 1 |
| Regressions ❌ (secondary) | 1.7% | [1.7%, 1.7%] | 1 |
| Improvements ✅ (primary) | -2.6% | [-2.6%, -2.6%] | 1 |
| Improvements ✅ (secondary) | -0.7% | [-0.7%, -0.7%] | 1 |
| All ❌✅ (primary) | -0.1% | [-2.6%, 2.5%] | 2 |

Cycles

Results (secondary -0.3%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 2.4% | [2.4%, 2.4%] | 1 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -3.1% | [-3.1%, -3.1%] | 1 |
| All ❌✅ (primary) | - | - | 0 |

Binary size

Results (primary 0.0%, secondary 0.0%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.0% | [0.0%, 0.0%] | 13 |
| Regressions ❌ (secondary) | 0.0% | [0.0%, 0.1%] | 31 |
| Improvements ✅ (primary) | -0.1% | [-0.1%, -0.1%] | 1 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.0% | [-0.1%, 0.0%] | 14 |

Bootstrap: 769.092s -> 770.401s (0.17%)
Artifact size: 341.40 MiB -> 341.45 MiB (0.01%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Sep 29, 2024
@the8472
Member Author

the8472 commented Sep 29, 2024

As expected, it doesn't do much on its own. Let's wait for the LLVM change.

@the8472 the8472 added the S-blocked Status: Blocked on something else such as an RFC or other implementation work. label Sep 29, 2024

@the8472
Member Author

the8472 commented Sep 30, 2024

@bors try @rust-timer queue


@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Sep 30, 2024
@bors
Collaborator

bors commented Sep 30, 2024

⌛ Trying commit c44e0f4 with merge fd59d69...

bors added a commit to rust-lang-ci/rust that referenced this pull request Sep 30, 2024
@bors
Collaborator

bors commented Oct 1, 2024

☀️ Try build successful - checks-actions
Build commit: fd59d69 (fd59d692ac24ed3d88de1b531716c0c053cc7680)


@rust-timer
Collaborator

Finished benchmarking commit (fd59d69): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.5% | [0.3%, 0.8%] | 5 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.4% | [-0.7%, -0.3%] | 5 |
| Improvements ✅ (secondary) | -0.2% | [-0.3%, -0.1%] | 11 |
| All ❌✅ (primary) | 0.1% | [-0.7%, 0.8%] | 10 |

Max RSS (memory usage)

Results (primary 2.5%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 5.1% | [4.1%, 6.2%] | 2 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -2.8% | [-2.8%, -2.8%] | 1 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 2.5% | [-2.8%, 6.2%] | 3 |

Cycles

Results (secondary -2.8%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -2.8% | [-2.8%, -2.8%] | 1 |
| All ❌✅ (primary) | - | - | 0 |

Binary size

Results (primary 0.0%, secondary 0.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.1% | [0.0%, 0.5%] | 43 |
| Regressions ❌ (secondary) | 0.1% | [0.1%, 0.1%] | 35 |
| Improvements ✅ (primary) | -0.2% | [-0.5%, -0.0%] | 8 |
| Improvements ✅ (secondary) | -0.1% | [-0.1%, -0.1%] | 3 |
| All ❌✅ (primary) | 0.0% | [-0.5%, 0.5%] | 51 |

Bootstrap: 770.125s -> 769.973s (-0.02%)
Artifact size: 341.42 MiB -> 341.49 MiB (0.02%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Oct 1, 2024
@Dylan-DPC Dylan-DPC removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Nov 30, 2024
@the8472
Member Author

the8472 commented Feb 9, 2025

waiting for #135763

@Dylan-DPC
Member

#135763 is now merged, so unblocking.

@Dylan-DPC Dylan-DPC removed the S-blocked Status: Blocked on something else such as an RFC or other implementation work. label Feb 25, 2025
@rustbot rustbot removed the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Jan 11, 2026
@the8472
Member Author

the8472 commented Jan 11, 2026

r? compiler

@rustbot rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Jan 11, 2026
// applies to argument place instead of function place
let allocated_pointer = AttributeKind::AllocatedPointer.create_attr(cx.llcx);
attributes::apply_to_llfn(llfn, AttributePlace::Argument(0), &[allocated_pointer]);
let attrs: &[_] = if llvm_util::get_version() >= (21, 0, 0) {
Member

RalfJung commented Jan 11, 2026

It would be good to have some sort of justification for why this is correct. This operation destroys the provenance given to it; why is it okay to consider that non-capturing?

Member Author

My understanding from the Zulip conversation is that since the allocation ends, there's nothing further that the callee could do that would be relevant to the aliasing rules of this allocation.

Whether or not the allocator touches the memory afterwards (e.g. for zeroing) isn't relevant to this side of the allocator boundary.

I assume we need to tell LLVM both things separately because they just happen to be tracked separately.

Member Author

@nikic, just to check the obvious: adding this attribute wouldn't allow

let p = alloc();
foo(p);
dealloc(p);

to be reordered to

let p = alloc();
dealloc(p);
foo(p);

Being an allocator method already inhibits that reordering.
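For illustration, the `let p = alloc(); foo(p); dealloc(p);` sequence above can be written out with `std::alloc` roughly as follows. `foo` and `alloc_use_free` are hypothetical stand-ins; the sketch only shows why the reordering would change the program's meaning (it would turn `foo` into a read of freed memory) and says nothing about which LLVM attributes rustc actually emits.

```rust
use std::alloc::{alloc, dealloc, Layout};

/// Hypothetical stand-in for `foo(p)`: reads through the pointer.
fn foo(p: *mut u32) -> u32 {
    // SAFETY: the caller passes a live, initialized, aligned pointer.
    unsafe { *p }
}

/// `let p = alloc(); foo(p); dealloc(p);` written out with `std::alloc`.
pub fn alloc_use_free(value: u32) -> u32 {
    let layout = Layout::new::<u32>();
    // SAFETY: `layout` has non-zero size.
    let p = unsafe { alloc(layout) }.cast::<u32>();
    assert!(!p.is_null(), "allocation failed");
    // SAFETY: `p` is non-null, aligned for `u32`, and valid for writes.
    unsafe { p.write(value) };
    let result = foo(p);
    // Moving the `dealloc` above the `foo` call would make `foo` read freed
    // memory (UB), so a correct optimization must not reorder them.
    // SAFETY: `p` came from `alloc` with this same `layout`.
    unsafe { dealloc(p.cast::<u8>(), layout) };
    result
}
```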

Contributor

"Does not capture provenance" means "if the function call stashes the pointer somewhere, accessing that pointer after the function returns is UB". It does not limit what can be done with the pointer within the function itself.

FWIW, the C free function is marked captures(none).
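To make the distinction concrete, here is a hypothetical pair of Rust functions: `read_only` uses its pointer argument only during the call, while `stash` stores it in a thread-local so code can use it after the call returns. Only the second captures the pointer in the sense described above. The names and the thread-local slot are illustrative assumptions, not part of the PR. If the callee were a deallocation function, any such stashed copy would point to freed memory, so using it afterwards would already be UB.

```rust
use std::cell::Cell;
use std::ptr;

thread_local! {
    // Hypothetical per-thread slot that outlives any single call.
    static STASH: Cell<*const u32> = Cell::new(ptr::null());
}

/// Reads through `p` during the call and keeps nothing: `p` is not captured.
pub fn read_only(p: *const u32) -> u32 {
    // SAFETY: the caller guarantees `p` is valid for reads.
    unsafe { *p }
}

/// Stores `p` where later code can find it: `p` is captured.
pub fn stash(p: *const u32) {
    STASH.with(|s| s.set(p));
}

/// Returns the pointer most recently stashed on this thread.
pub fn stashed() -> *const u32 {
    STASH.with(|s| s.get())
}
```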

Member Author

Any reason why we shouldn't match what C is doing then?

Contributor

My thinking was that if GlobalAlloc is used, it may inspect the address of the pointer, so using captures(none) may not be correct.

Member

Suggested change
- let attrs: &[_] = if llvm_util::get_version() >= (21, 0, 0) {
+ let attrs: &[_] = if llvm_util::get_version() >= (21, 0, 0) {
+     // "Does not capture provenance" means "if the function call stashes the pointer somewhere,
+     // accessing that pointer after the function returns is UB". That is definitely the case here since
+     // freeing will destroy the provenance.

Member Author

> My thinking was that if GlobalAlloc is used, it may inspect the address of the pointer, so using captures(none) may not be correct.

But C allocators would also have to look at the address (e.g. comparing it to memory pools). Is the difference that those are assumed to be compiled separately?

Contributor

What we care about here are effects that are observable outside the allocator. Specifically, what I had in mind is GlobalAlloc::dealloc() doing something like print "I'm now freeing pointer 0xdeadbeef".
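A minimal sketch of the kind of allocator described here, assuming a hypothetical `AddrLogging` wrapper around `std::alloc::System`. Printing from inside `dealloc` could itself allocate, so the sketch stores the freed address in a global atomic instead. The effect is the same: the pointer's address becomes observable outside the allocator, even though its provenance is dead once the block is freed. This is the scenario that would make `captures(none)` too strong for a user-provided `GlobalAlloc`.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper around the system allocator that publishes the
/// address of every block it frees.
pub struct AddrLogging;

/// Address of the most recently freed block (0 if none yet).
pub static LAST_FREED: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for AddrLogging {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // SAFETY: forwarded unchanged; the caller upholds `alloc`'s contract.
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // Publish only the integer address (`addr` does not expose
        // provenance). Code outside the allocator can observe it, but it
        // cannot be used to access the block being freed.
        LAST_FREED.store(ptr.addr(), Ordering::Relaxed);
        // SAFETY: `ptr` was returned by `alloc` above with the same `layout`.
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: AddrLogging = AddrLogging;
```

Calling the wrapper directly (rather than relying on a `Box` being allocated) keeps the observation deterministic, since the compiler is allowed to elide heap allocations that go through the global allocator.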

@the8472 the8472 force-pushed the bail-before-memcpy branch from f4a2ee3 to 468eb45 Compare January 12, 2026 01:54
@rustbot
Collaborator

rustbot commented Jan 12, 2026

This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

@nnethercote
Contributor

This is way outside my wheelhouse, but the interactions with Ralf and Nikita make it seem like it's ok...

@bors r+

@rust-bors rust-bors bot added the S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. label Jan 14, 2026
@rust-bors
Contributor

rust-bors bot commented Jan 14, 2026

📌 Commit 468eb45 has been approved by nnethercote

It is now in the queue for this repository.

@rust-bors rust-bors bot removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jan 14, 2026

@rust-bors rust-bors bot added merged-by-bors This PR was explicitly merged by bors. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Jan 14, 2026
@rust-bors
Contributor

rust-bors bot commented Jan 14, 2026

☀️ Test successful - CI
Approved by: nnethercote
Pushing 86a49fd to main...

@rust-bors rust-bors bot merged commit 86a49fd into rust-lang:main Jan 14, 2026
12 checks passed
@rustbot rustbot added this to the 1.94.0 milestone Jan 14, 2026
@github-actions
Contributor

What is this? This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.

Comparing 8c52f73 (parent) -> 86a49fd (this PR)

Test differences

199 test diffs in total:

Stage 1

  • [codegen] tests/codegen-llvm/lib-optimizations/append-elements.rs: [missing] -> pass (J1)
  • [codegen] tests/codegen-llvm/lib-optimizations/append-elements.rs: [missing] -> ignore (ignored when the LLVM version 20.1.2 is older than 21.0.0) (J4)

Stage 2

  • [codegen] tests/codegen-llvm/lib-optimizations/append-elements.rs: [missing] -> ignore (ignored when the LLVM version 20.1.2 is older than 21.0.0) (J0)
  • [codegen] tests/codegen-llvm/lib-optimizations/append-elements.rs: [missing] -> pass (J2)
  • [codegen] tests/codegen-llvm/lib-optimizations/append-elements.rs: [missing] -> ignore (ignored when the LLVM version 20.1.8 is older than 21.0.0) (J3)

Additionally, 194 doctest diffs were found. These are ignored, as they are noisy.

Job group index

Test dashboard

Run

cargo run --manifest-path src/ci/citool/Cargo.toml -- \
    test-dashboard 86a49fd71fecd25b0fd20247db0ba95eeceaba28 --output-dir test-dashboard

And then open test-dashboard/index.html in your browser to see an overview of all executed tests.

Job duration changes

  1. aarch64-apple: 7730.4s -> 11396.2s (+47.4%)
  2. x86_64-gnu-debug: 6890.0s -> 8410.0s (+22.1%)
  3. dist-i686-mingw: 11143.3s -> 8903.5s (-20.1%)
  4. dist-x86_64-mingw: 10552.7s -> 8861.4s (-16.0%)
  5. pr-check-1: 1526.4s -> 1738.5s (+13.9%)
  6. x86_64-mingw-1: 11594.3s -> 10007.1s (-13.7%)
  7. x86_64-mingw-2: 10805.4s -> 9491.8s (-12.2%)
  8. x86_64-rust-for-linux: 2840.8s -> 2522.7s (-11.2%)
  9. x86_64-gnu-aux: 7384.5s -> 6573.9s (-11.0%)
  10. x86_64-gnu-llvm-21-2: 5697.2s -> 5115.0s (-10.2%)
How to interpret the job duration changes?

Job durations can vary a lot, based on the actual runner instance
that executed the job, system noise, invalidated caches, etc. The table above is provided
mostly for t-infra members, for simpler debugging of potential CI slow-downs.

@rust-timer
Collaborator

Finished benchmarking commit (86a49fd): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Our benchmarks found a performance regression caused by this PR.
This might be an actual regression, but it can also be just noise.

Next Steps:

  • If the regression was expected or you think it can be justified,
    please write a comment with sufficient written justification, and add
    @rustbot label: +perf-regression-triaged to it, to mark the regression as triaged.
  • If you think that you know of a way to resolve the regression, try to create
    a new PR with a fix for the regression.
  • If you do not understand the regression or you think that it is just noise,
    you can ask the @rust-lang/wg-compiler-performance working group for help (members of this group
    were already notified of this PR).

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.6% | [0.1%, 1.2%] | 10 |
| Regressions ❌ (secondary) | 0.2% | [0.1%, 0.9%] | 8 |
| Improvements ✅ (primary) | -1.3% | [-2.7%, -0.5%] | 3 |
| Improvements ✅ (secondary) | -1.4% | [-4.3%, -0.3%] | 6 |
| All ❌✅ (primary) | 0.2% | [-2.7%, 1.2%] | 13 |

Max RSS (memory usage)

Results (primary 7.2%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 7.2% | [4.1%, 10.2%] | 2 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 7.2% | [4.1%, 10.2%] | 2 |

Cycles

Results (primary -2.6%, secondary -3.6%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -2.6% | [-2.6%, -2.6%] | 1 |
| Improvements ✅ (secondary) | -3.6% | [-4.4%, -2.9%] | 2 |
| All ❌✅ (primary) | -2.6% | [-2.6%, -2.6%] | 1 |

Binary size

Results (primary -0.0%, secondary -0.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.1% | [0.0%, 0.5%] | 41 |
| Regressions ❌ (secondary) | 0.1% | [0.0%, 0.1%] | 8 |
| Improvements ✅ (primary) | -0.4% | [-1.4%, -0.0%] | 6 |
| Improvements ✅ (secondary) | -0.8% | [-1.4%, -0.1%] | 2 |
| All ❌✅ (primary) | -0.0% | [-1.4%, 0.5%] | 47 |

Bootstrap: 472.286s -> 475.838s (0.75%)
Artifact size: 383.57 MiB -> 383.64 MiB (0.02%)

@Zalathar
Member

The new codegen test appears to have been causing flaky CI failures:

jieyouxu added a commit to jieyouxu/rust that referenced this pull request Jan 15, 2026
…g#130998"

This reverts PR <rust-lang#130998> because
the added test seems to be flaky / non-deterministic, and has been
failing in unrelated PRs during merge CI.
rust-bors bot pushed a commit that referenced this pull request Jan 15, 2026
Revert "avoid phi node for pointers flowing into Vec appends #130998"

This reverts PR #130998 because the added test seems to be flaky / non-deterministic, and has been failing in unrelated PRs during merge CI:

- #151129 (comment)
- #150772 (comment)
- #150925 (comment)

See also [#t-infra > Tree ops](https://rust-lang.zulipchat.com/#narrow/channel/242791-t-infra/topic/Tree.20ops/with/568111767).

> [!NOTE]
>
> This is a "fallback" PR in case the FileCheck failure isn't obvious (i.e. fix-forward). This PR reverts #130998 wholesale in case the failure is genuine and indicative of a bug in the actual implementation change.
Zalathar added a commit to Zalathar/rust that referenced this pull request Jan 15, 2026
Revert "avoid phi node for pointers flowing into Vec appends rust-lang#130998"

This reverts PR rust-lang#130998 because the added test seems to be flaky / non-deterministic, and has been failing in unrelated PRs during merge CI:

- rust-lang#151129 (comment)
- rust-lang#150772 (comment)
- rust-lang#150925 (comment)
- rust-lang#151145 (comment)

See also [#t-infra > Tree ops](https://rust-lang.zulipchat.com/#narrow/channel/242791-t-infra/topic/Tree.20ops/with/568111767).

> [!NOTE]
>
> This is a "fallback" PR in case the FileCheck failure isn't obvious (i.e. fix-forward). This PR reverts rust-lang#130998 wholesale in case the failure is genuine and indicative of a bug in the actual implementation change.
rust-bors bot pushed a commit that referenced this pull request Jan 15, 2026
Rollup of 2 pull requests

Successful merges:

 - #151150 (Revert "avoid phi node for pointers flowing into Vec appends #130998")
 - #151145 (Reduce rustdoc GUI flakyness, take 2)

r? @ghost
rust-timer added a commit that referenced this pull request Jan 15, 2026
Rollup merge of #151150 - revert-vec-append, r=Zalathar

Revert "avoid phi node for pointers flowing into Vec appends #130998"

This reverts PR #130998 because the added test seems to be flaky / non-deterministic, and has been failing in unrelated PRs during merge CI:

- #151129 (comment)
- #150772 (comment)
- #150925 (comment)
- #151145 (comment)

See also [#t-infra > Tree ops](https://rust-lang.zulipchat.com/#narrow/channel/242791-t-infra/topic/Tree.20ops/with/568111767).

> [!NOTE]
>
> This is a "fallback" PR in case the FileCheck failure isn't obvious (i.e. fix-forward). This PR reverts #130998 wholesale in case the failure is genuine and indicative of a bug in the actual implementation change.
the8472 added a commit to the8472/rust that referenced this pull request Jan 15, 2026

Labels

A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue.

10 participants