@JoshRosen JoshRosen commented Dec 21, 2025

This PR implements two performance optimizations in uniqueArr:

  • Fix an O(n^2) performance bug: #564 ("chore: Use ArrayBuilder to build the array") changed this code to call out.result() on every loop iteration, which copies an O(n)-sized array each time. We can avoid this by keeping a reference to the last output element instead of rebuilding the array.
  • Avoid duplicate keyF evaluations: in the old code, each loop iteration re-evaluated keyF on the last element of the output array; the new code calls keyF exactly once per element by maintaining a lastAddedKey variable across iterations.
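
The two points above can be illustrated with a minimal sketch. This is not the code from the PR: the object name `UniqueArrSketch`, the method name `uniqAdjacent`, and the generic signature are hypothetical, and the sketch assumes the goal is to drop adjacent elements whose keys are equal. It calls `result()` once, after the loop, and evaluates `keyF` once per input element.

```scala
import scala.collection.mutable.ArrayBuilder
import scala.reflect.ClassTag

// Hypothetical illustration only; names and signature are assumptions,
// not the actual uniqueArr implementation.
object UniqueArrSketch {
  /** Keeps the first element of each run of adjacent elements with equal keys. */
  def uniqAdjacent[A: ClassTag, K](xs: Array[A])(keyF: A => K): Array[A] = {
    if (xs.length == 0) new Array[A](0)
    else {
      val out = ArrayBuilder.make[A]
      // Seed the output with the first element and remember its key.
      out += xs(0)
      var lastAddedKey = keyF(xs(0))
      var i = 1
      // A while loop avoids allocating a closure for foreach.
      while (i < xs.length) {
        val x = xs(i)
        val key = keyF(x) // the only keyF call for this element
        if (key != lastAddedKey) {
          out += x
          lastAddedKey = key // reuse the key instead of recomputing it later
        }
        i += 1
      }
      out.result() // materialize the array once, after the loop
    }
  }
}
```

For example, `UniqueArrSketch.uniqAdjacent(Array("a", "A", "b"))(_.toLowerCase)` keeps `"a"` and `"b"`, since the first two elements share a key. Caching the key in `lastAddedKey` is what removes both the per-iteration array copy and the repeated `keyF` call.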

Claude (Opus 4.5) spotted the O(n^2) bug and designed that part of the fix. I spotted the duplicate keyF evaluation and suggested this fix (plus the switch to a while loop).

The previous implementation called ArrayBuilder.result() inside the loop,
which creates a new array copy on each iteration, resulting in O(n²) behavior.

This fix tracks lastAddedKey separately so that each element's key can be
compared against the previously added element's key without rebuilding the
array. It also uses a while loop instead of foreach to avoid closure
allocation overhead.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
stephenamar-db merged commit fb2a9cd into databricks:master Dec 22, 2025
9 checks passed
He-Pin commented Dec 22, 2025

@JoshRosen Thanks. I think we may need to use an AI agent to do a detailed review of the current implementation before we cut 1.0.0.

JoshRosen deleted the fix-nsquared-uniqueArr branch December 22, 2025 16:13