This project implements the KNN algorithms for the following paper:
- Yiqi Wang, Long Yuan, Wenjie Zhang, Xuemin Lin, Zi Chen, Qing Liu, "Simpler is More: Efficient Top-K Nearest Neighbors Search on Large Road Networks", which is submitted to VLDB2024.
All experiments were conducted on a Linux machine with an Intel Xeon CPU and 384GB of memory, Debian GNU/Linux 12 and the g++ version is 12.2.0. We have implemented all methods using the C++11 standard and turned on the O3 optimization flag.
All the datasets in this paper can be downloaded from DIMACS
There are five files in the data folder:
-
NY.datastores the whole graph data. Please refer to the format below, including sample data:6 6 (there are total 6 vertices and 6 edges) 1 2 8 (an edge between vertex 1 and vertex 2 with the distance of 8) 1 3 2 2 4 1 3 5 2 4 5 3 5 6 3 -
NY.qustores all queries. Please refer to the format below, including sample data:3 (there are total 3 queries) 1 3 (a query: return top 3 nearest neighbors for query vertex 1) 2 2 3 4 -
NY.objectstores all objects in the candidate set. Please refer to the format below, including sample data:4 (there are total 4 objects in the candidate object set) 1 (vertex 1 belongs to the candidate object set) 3 5 6 -
NY.destores all objected to be deleted from the candidate set. Please refer to the format below, including sample data:2 (there are 2 objects, will be deleted from the set) d 3 (vertex 3 will be deleted from the candidate object set) d 5 -
NY.instores all objected to be inserted into the candidate set. Please refer to the format below, including sample data:1 (there are 1 objects, will be inserted into the set) i 2 (vertex 2 will be inserted into the candidate object set)
Note: In our experiments, all deleted objects (inserted objects) are valid. All objects to be deleted are part of the original candidate object set. While, all objects to be inserted do not belong to the original candidate object set.
-
compile predata.cpp for preprocessing the raw graph data
g++ -std=c++11 -O3 predata.cpp -o pre -
compile index_bng.cpp for building bridge neighbor preserved graph for the original graph
g++ -std=c++11 -O3 index_bng.cpp -o bng -
compile query_up.cpp for knn-index construction, knn queries and updating objects
g++ -std=c++11 -O3 query_up.cpp -o qu
-
Preprocess raw graph data from DIMACS
./pre oridata gendata-
oridata: the file path to the raw data
-
gendata: the file path to generated data file, which stores the formatted data
eg:
./pre data/USA-road-d.NY.gr data/NY.data -
-
construct bridge neighbor graph
./pre origraph bng-
origraph: the file path to a data file, containing the original graph data
-
bng: the file path to the data file, storing the bridge neighbor graph
eg:
./bng data/NY.data data/NY.idx -
-
construct knn-index, query and update obejcts
./qu bng objset -X XF alg topk-
construct knn-index and query
./qu data/NY.idx data/NY.object -q data/NY.qu opt 40 -
update for inserting objects into a set of candidate objects
./qu data/NY.idx data/NY.object -u data/NY.in opt 40 -
update for deleting objects from a set of candidate objects
./qu data/NY.idx data/NY.object -u data/NY.de opt 40
-
-
Arguments
-
bng: the file path to the bridge neighbor graph -
objset: the file path to the object file -
-X:-qdenotes to query,-udenotes to update objects -
XF: the file path to queries document or the path to updates document -
alg: choose a knn-index construction algo,prirepresents our primary bottom-up computing-sharing algorithm (alg 2 in our paper),optrepresents our optimized bidirectional construction algorihtm (alg 3 in our paper). -
topk: the paramter for$k$ in the top$k$ nearesr neighbor search
-