Possible evaluation issue #8

@dfdazac

According to the paper, queries for the training set are sampled after removing edges from the graph. In parallel_sample(), if test is True, the test and validation edges are loaded:

def parallel_sample(graph, num_workers, samples_per_worker, data_dir, test=False, start_ind=None):
    if test:
        print "Loading test/val data.."
        test_edges = load_queries(data_dir + "/test_edges.pkl")
        val_edges = load_queries(data_dir + "/val_edges.pkl")
    else:
        test_edges = []
        val_edges = []
    proc_range = range(num_workers) if start_ind is None else range(start_ind, num_workers+start_ind)
    procs = [Process(target=parallel_sample_worker, args=[i, samples_per_worker, graph, data_dir, test, val_edges+test_edges]) for i in proc_range]

test is then passed to parallel_sample_worker as is_test, and the worker removes the edges only when not is_test is True. This means that if test is True, the test and validation edges are loaded, but in parallel_sample_worker not is_test evaluates to False and the edges are not removed:

def parallel_sample_worker(pid, num_samples, graph, data_dir, is_test, test_edges):
    if not is_test:
        graph.remove_edges([(q.target_node, q.formula.rels[0], q.anchor_nodes[0]) for q in test_edges])

Conversely, if test is False, an empty list of edges is passed to parallel_sample_worker. In that case not is_test evaluates to True, but remove_edges is called with the empty list, so no edges are removed from the graph.
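To make the two cases concrete, here is a minimal sketch that mirrors only the control flow quoted above. ToyGraph, HELD_OUT, sample, worker and edges_left are hypothetical stand-ins, not the repository's code, and plain tuples replace the (target_node, rel, anchor_node) triples built from the queries.

```python
# Toy stand-ins (hypothetical, not the repository's classes) that mirror
# only the control flow of parallel_sample and parallel_sample_worker.

class ToyGraph(object):
    def __init__(self, edges):
        self.edges = set(edges)

    def remove_edges(self, edges):
        # Removing an empty list leaves the graph unchanged.
        self.edges -= set(edges)


# Plays the role of val_edges + test_edges (assumed toy data).
HELD_OUT = [("a", "r", "b")]


def sample(graph, test):
    # Mirrors parallel_sample: held-out edges are loaded only when test is True.
    held_out = list(HELD_OUT) if test else []
    return worker(graph, test, held_out)


def worker(graph, is_test, test_edges):
    # Mirrors parallel_sample_worker: removal runs only when is_test is False.
    if not is_test:
        graph.remove_edges(test_edges)
    return len(graph.edges)


def edges_left(test):
    # Build a fresh two-edge graph and report how many edges survive.
    graph = ToyGraph([("a", "r", "b"), ("b", "r", "c")])
    return sample(graph, test)
```

Under these assumptions, edges_left(True) and edges_left(False) both return 2: the held-out edge ("a", "r", "b") stays in the graph whichever value the flag takes.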

Taken together, the test and validation edges do not appear to be removed from the graph in either case, which might affect the evaluation of the methods.
