Skip to content

Conversation

@He-Pin
Copy link
Contributor

@He-Pin He-Pin commented Dec 9, 2025

The ArrayBuffer#toArray will do another copy.

@stephenamar-db stephenamar-db merged commit da96167 into databricks:master Dec 9, 2025
6 checks passed
for (v <- arrValue) {
if (out.isEmpty) {
out.append(v)
val outResult = out.result()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this change is introducing an O(n^2) bug because now we're making a fresh copy of the array on every loop iteration.

I've opened #575 with a suggested fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we need just use the length to fix this.

@He-Pin He-Pin deleted the arrayBuilder branch December 22, 2025 03:04
stephenamar-db pushed a commit that referenced this pull request Dec 22, 2025
…ns (#575)

This PR implements two performance optimizations in `uniqueArr`:

- **Fix an O(n^2) performance bug**:
#564 changed this code to
call `out.result()` on every loop iteration, resulting in copying of an
`O(n)` sized array on each loop iteration. We can avoid this by keeping
a reference to the last output element.
- **Avoid duplicate keyF evaluations**: in the old code, each loop
iteration would repeat the `keyF` evaluation for the last element of the
output array; the new code calls `keyF` exactly once per element by
maintaining a `lastAddedKey` variable during the loop.
- This is similar in spirit to
#245

Claude (Opus 4.5) spotted the O(n^2) bug and designed that part of the
fix. I spotted the duplicate keyF evaluation and suggested this fix
(plus the switch to a `while` loop).

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants