Conversation

@valadaptive
Contributor

valadaptive commented Dec 19, 2024

I ran into this when optimizing simdnoise. With gather ops no longer being a thing, I reimplemented them as standard loops. This led to a 20-30% slowdown in my application (which spends around half its time in simdnoise, so simdnoise itself is about 40-60% slower).

I thought the software gather ops were just inherently slower, but the actual problem is this line, with the innocuous-looking `indices[i]` access. `i` ranges from 0 up to (but not including) `WIDTH`, so the compiler should easily be able to remove the bounds check, but it couldn't, because the indexing operations weren't being inlined.
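For readers unfamiliar with the pattern, here is a minimal sketch of the kind of software gather loop described above. It is an illustration only: `Lanes`, `WIDTH`, and `gather` are hypothetical names, not simdeez's actual types or the code changed in this PR. The assumption is that the index vector is a wrapper type with its own `Index` impl, and that marking the impl as inlinable lets the optimizer see that `i` is always less than `WIDTH`.

```rust
use std::ops::Index;

/// Hypothetical lane count; real SIMD types have a width fixed by the ISA.
const WIDTH: usize = 8;

/// Hypothetical stand-in for a SIMD vector of indices (not simdeez's real type).
#[derive(Clone, Copy)]
struct Lanes([i32; WIDTH]);

impl Index<usize> for Lanes {
    type Output = i32;

    // If this call is not inlined, the caller's loop cannot prove that `i`
    // is in bounds, so the bounds check inside stays on every iteration.
    // Inlining exposes the array access to the optimizer.
    #[inline(always)]
    fn index(&self, i: usize) -> &i32 {
        &self.0[i]
    }
}

/// Software gather: out[i] = table[indices[i]] for each lane.
/// The lookup into `table` is still bounds-checked; only the lane access
/// `indices[i]` is the one the compiler can prove safe.
fn gather(table: &[f32], indices: Lanes) -> [f32; WIDTH] {
    let mut out = [0.0f32; WIDTH];
    for i in 0..WIDTH {
        out[i] = table[indices[i] as usize];
    }
    out
}
```

Whether the compiler actually elides the check depends on the optimization level and on how the real types are defined, so it is worth confirming in the generated assembly or with a benchmark.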

With that fixed, simdnoise should now be as fast as it used to be before updating to simdeez 2.

@valadaptive
Contributor Author

@arduano Just wanted to make sure this (and my other PRs here) haven't fallen off your radar.

@arduano
Owner

arduano commented Feb 11, 2025

Apologies, I will address all applicable PRs right now. I miss these notifications sometimes, and GitHub does a terrible job of making sure I'm aware of them.

@arduano arduano merged commit 38f6457 into arduano:master Feb 11, 2025
4 checks passed
@arduano
Owner

arduano commented Feb 11, 2025

I've published 2.0.0-dev5.

Also, just a note: the reason I'm still publishing dev versions is that I haven't ported sleef across yet. There are big architectural concerns around porting it, so I'm not sure whether I should just bite the bullet and deprecate sleef support, or wait until someone PRs it in.
