Description
According to the paper, training queries are sampled from the graph after edges have been removed from it. In parallel_sample(), the test and validation edges are loaded only if test is True:
graphqembed/netquery/data_utils.py
Lines 77 to 86 in 2775a24
    def parallel_sample(graph, num_workers, samples_per_worker, data_dir, test=False, start_ind=None):
        if test:
            print "Loading test/val data.."
            test_edges = load_queries(data_dir + "/test_edges.pkl")
            val_edges = load_queries(data_dir + "/val_edges.pkl")
        else:
            test_edges = []
            val_edges = []
        proc_range = range(num_workers) if start_ind is None else range(start_ind, num_workers+start_ind)
        procs = [Process(target=parallel_sample_worker, args=[i, samples_per_worker, graph, data_dir, test, val_edges+test_edges]) for i in proc_range]
test is then passed to parallel_sample_worker as is_test, and the worker removes the edges only when not is_test holds. So if test is True, the test and validation edges are loaded, but not is_test evaluates to False in parallel_sample_worker and the edges are never removed:
graphqembed/netquery/data_utils.py
Lines 67 to 69 in 2775a24
    def parallel_sample_worker(pid, num_samples, graph, data_dir, is_test, test_edges):
        if not is_test:
            graph.remove_edges([(q.target_node, q.formula.rels[0], q.anchor_nodes[0]) for q in test_edges])
Conversely, if test is False, an empty list of edges is passed to parallel_sample_worker. There not is_test evaluates to True, so remove_edges is called with this empty list and removes nothing. In both cases the graph keeps all of its edges.
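This might affect the evaluation of the methods, since training queries (test is False) would then be sampled from a graph that still contains the held-out test and validation edges.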
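The two branches described above can be reproduced in isolation. The following is only a minimal sketch in Python 3, not the repository code: StubGraph, HELD_OUT, EDGES and sample_as_written are hypothetical names made up for illustration, and edges are plain tuples instead of query objects loaded from the .pkl files. It copies just the control flow of parallel_sample and parallel_sample_worker.

```python
class StubGraph:
    """Hypothetical stand-in for the netquery graph; it only stores edges."""

    def __init__(self, edges):
        self.edges = set(edges)

    def remove_edges(self, edges):
        # Removing an empty list leaves the edge set unchanged.
        self.edges -= set(edges)


# Assumption: one held-out edge stands in for test_edges.pkl / val_edges.pkl.
HELD_OUT = [("a", "r", "b")]
EDGES = [("a", "r", "b"), ("b", "r", "c")]


def sample_as_written(graph, test):
    # Mirrors parallel_sample: held-out edges are loaded only if test is True.
    held_out = list(HELD_OUT) if test else []
    # Mirrors parallel_sample_worker: edges are removed only if test is False.
    if not test:
        graph.remove_edges(held_out)
    return graph.edges


for flag in (True, False):
    remaining = sample_as_written(StubGraph(EDGES), flag)
    # The held-out edge is still present for both values of the flag.
    print(flag, HELD_OUT[0] in remaining)
```

Under these assumptions the held-out edge survives for both values of test, which matches the behaviour described in this issue.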