Skip to content

Comments

add knninrange to uniformly sample k points in range within a given range#223

Open
KristofferC wants to merge 1 commit intomasterfrom
kc/knninrange
Open

add knninrange to uniformly sample k points in range within a given range#223
KristofferC wants to merge 1 commit intomasterfrom
kc/knninrange

Conversation

@KristofferC
Copy link
Owner

Fixes #210.

@juliohm, mind trying this out?

@juliohm
Copy link
Contributor

juliohm commented Nov 12, 2025

Tested locally and it works as expected. Thank you!

Looking forward to the next release.

@KristofferC
Copy link
Owner Author

I just want to say that it won't save you much time, because it still checks all points in range; it is just that it only store a random subset of size k of all these points in that range. But that is maybe what you wanted anyway.

@juliohm
Copy link
Contributor

juliohm commented Nov 12, 2025

I guess the main benefit would be the pre-allocation of k indices and knninrange! updates in-place. I am curious to benchmark this new function against our current implementation (combination of knn with inrange).

@KristofferC
Copy link
Owner Author

KristofferC commented Nov 13, 2025

I guess the main benefit would be the pre-allocation of k indices and knninrange! updates in-place.

Note that there is also an inrange! function that take an input array to mutate. The type of algorithm used here is typically for when the memory for storing the full output set is quite large and you cannot afford to store them all https://en.wikipedia.org/wiki/Reservoir_sampling:

The size of the population n is not known to the algorithm and is typically too large for all n items to fit into main memory.

So I am actually not sure how useful this is in practice vs just allocating an output idx vector an calling inrange! (unless you will get a huge number of points returned from it).

@juliohm
Copy link
Contributor

juliohm commented Nov 13, 2025

You are probably right.

What is your recommendation for the fastest method to retrieve up to k points inside a range of radius r?

My current implementation looks like the following:

inds, dists = knn(tree, x, k, true)

inds[dists .≤ r]

@KristofferC
Copy link
Owner Author

You won't get a uniformly random sample from the points inside that range with that code but I don't know if you care about that. For uniformity you could do something like:

  buffer = Int[]
  idxs = inrange!(buffer, tree, x, r)
  sample = rand(rng, idxs, min(k, length(idxs)))

(which would be just a version of knninrange! that allocates a bit more memory).

Also, in your code you allocate idx_dists = dists .≤ r and inds[idx_dists]. You could pass around a filter like (ind for (ind, dist) in zip(inds, dists) if dist ≤ r) to avoid allocating anything and just iterate over that.

@juliohm
Copy link
Contributor

juliohm commented Nov 13, 2025

(which would be just a version of knninrange! that allocates a bit more memory).

Thank you. I believe knninrange! is what I need then.

@juliohm
Copy link
Contributor

juliohm commented Dec 18, 2025

@KristofferC what is the plan regarding this PR? Is it ready?

@KristofferC
Copy link
Owner Author

Kind of, but I am a little bit uncertain how useful it is... When it comes to allocations, creating the ReservoirSampler object causes an allocation so you get one from that at every call. That could potentially be remidied with an API where you create a sampler externally and pass it in through. Can you try this in practice and see that you actually get a performance/memory improvement from it?

@juliohm
Copy link
Contributor

juliohm commented Dec 22, 2025

Can you try this in practice and see that you actually get a performance/memory improvement from it?

I will investigate this after the break and report results back here.

@juliohm
Copy link
Contributor

juliohm commented Jan 21, 2026

@KristofferC I started looking into this. Did some benchmarks by replacing

inds, dists = knn(tree, x, k, true)
keep = dists .≤ r

with

inds = knninrange(tree, x, r, k)
dists = zeros(length(inds)) # currently knninrange doesn't return distances
keep = trues(length(inds)) # keep all indices as they are inside range already

and calling the function multiple times inside a hot loop. The original version is 4x faster.

I wonder if an in-place knninrange! that modifies both inds and dists could make things faster?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Retrieve up to k random indices inrange

2 participants