Possible evaluation issue #8

According to the paper, queries for the training set are sampled after removing edges from the graph. In `parallel_sample()`, if `test` is True, the test and validation edges are loaded:

https://github.com/williamleif/graphqembed/blob/2775a242cf8a5df520530f2dd08aa8790f109ebf/netquery/data_utils.py#L77-L86

`test` is then passed to `parallel_sample_worker` as `is_test`, and the edges are removed only under `if not is_test`. This means that if `test` is True, the edges are loaded, but in `parallel_sample_worker` the expression `not is_test` evaluates to False and the edges are not removed:

https://github.com/williamleif/graphqembed/blob/2775a242cf8a5df520530f2dd08aa8790f109ebf/netquery/data_utils.py#L67-L69

Conversely, if `test` is False, an empty list of edges is passed to `parallel_sample_worker`, in which case `not is_test` evaluates to True and this empty list is used to remove edges from the graph, effectively *not* removing edges from the graph.
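The two cases can be reproduced with a small stand-in for the control flow. This is a minimal sketch, not the repository code: `ToyGraph`, `worker`, `sample`, and the edge tuples are hypothetical, and only the branching on `test` / `is_test` mirrors the linked lines.

```python
class ToyGraph:
    """Hypothetical stand-in for the graph; it only tracks a set of edges."""

    def __init__(self, edges):
        self.edges = set(edges)

    def remove_edges(self, edges):
        self.edges -= set(edges)


# Plays the role of val_edges + test_edges (made-up data).
HELD_OUT = [("a", "r", "b")]


def worker(graph, is_test, test_edges):
    # Same guard as in parallel_sample_worker: remove only when not is_test.
    if not is_test:
        graph.remove_edges(test_edges)
    return graph


def sample(test):
    # Same branching as in parallel_sample: edges are loaded only if test is True.
    test_edges = HELD_OUT if test else []
    graph = ToyGraph([("a", "r", "b"), ("b", "r", "c")])
    return worker(graph, test, test_edges)


for flag in (True, False):
    # Report whether the held-out edge is still present after sampling setup.
    print(flag, ("a", "r", "b") in sample(flag).edges)
```

With either value of the flag, the held-out edge is still in the toy graph afterwards.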

In both cases the graph used for sampling still contains the held-out edges, which might affect the evaluation of the methods.
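One possible change, assuming the intent is that training queries are sampled from a graph without the held-out edges, would be to load those edges in both cases and keep the `not is_test` guard. The sketch below uses hypothetical names (`ToyGraph`, `prepare_graph`, `HELD_OUT`) and made-up edges; it illustrates the idea only and is untested against the repository code.

```python
class ToyGraph:
    """Hypothetical stand-in for the graph; it only tracks a set of edges."""

    def __init__(self, edges):
        self.edges = set(edges)

    def remove_edges(self, edges):
        self.edges -= set(edges)


# Plays the role of val_edges + test_edges (made-up data).
HELD_OUT = [("a", "r", "b")]


def prepare_graph(graph, is_test, held_out_edges=HELD_OUT):
    # Assumption: held-out edges are always available here,
    # independent of the value of is_test.
    if not is_test:
        # Sampling training queries: drop the held-out edges first.
        graph.remove_edges(held_out_edges)
    return graph


train_graph = prepare_graph(ToyGraph([("a", "r", "b"), ("b", "r", "c")]), is_test=False)
test_graph = prepare_graph(ToyGraph([("a", "r", "b"), ("b", "r", "c")]), is_test=True)
```

Under this assumption, `train_graph` no longer contains the held-out edge while `test_graph` keeps it.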

`parallel_sample` (`netquery/data_utils.py`, lines 77-86 at the commit linked above):

	def parallel_sample(graph, num_workers, samples_per_worker, data_dir, test=False, start_ind=None):
	    if test:
	        print "Loading test/val data.."
	        test_edges = load_queries(data_dir + "/test_edges.pkl")
	        val_edges = load_queries(data_dir + "/val_edges.pkl")
	    else:
	        test_edges = []
	        val_edges = []
	    proc_range = range(num_workers) if start_ind is None else range(start_ind, num_workers+start_ind)
	    procs = [Process(target=parallel_sample_worker, args=[i, samples_per_worker, graph, data_dir, test, val_edges+test_edges]) for i in proc_range]

`parallel_sample_worker` (`netquery/data_utils.py`, lines 67-69 at the commit linked above):

	def parallel_sample_worker(pid, num_samples, graph, data_dir, is_test, test_edges):
	    if not is_test:
	        graph.remove_edges([(q.target_node, q.formula.rels[0], q.anchor_nodes[0]) for q in test_edges])
