Graph Influence

Graph Influence is an implementation of Influence Functions (Pang Wei Koh and Percy Liang) for Graph Neural Networks. The main idea is to apply the same methodology to estimate the influence of a training node on a test node.
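
For reference, the quantity being estimated is the standard influence of up-weighting a training point on the test loss from Koh and Liang (2017), with the training and test points being graph nodes here:

$$
\mathcal{I}_{\mathrm{up,loss}}(z_{\mathrm{train}}, z_{\mathrm{test}}) = -\nabla_\theta L(z_{\mathrm{test}}, \hat\theta)^{\top} \, H_{\hat\theta}^{-1} \, \nabla_\theta L(z_{\mathrm{train}}, \hat\theta)
$$

where $\hat\theta$ are the trained parameters and $H_{\hat\theta}$ is the Hessian of the training loss at $\hat\theta$.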

There are two main files in this repository: train.py and graph_influence.py. The former is a generic training script for graph neural networks on several toy datasets and models; the latter computes the influence values. For a more comprehensive introduction to influence functions, please take a look at my blog post on the topic.

Training from scratch

You can train a model from scratch while leaving one specific node out, for validation or debugging purposes. When --leave_out is set, the node and its edges are not actually removed from the graph; instead, the node is masked out during training (see the sketch after the parameter table below):

```
$ python train.py --dataset Cora --model GAT --leave_out 11
```

| Parameter | Description | Default |
| --- | --- | --- |
| --dataset | Dataset to use (Cora, Pubmed, CiteSeer, Flickr) | Cora |
| --model | Network model (GCN, GAT, GIN, ARMA) | GCN |
| --sampling | Use sampling (random, graphsaint, shadowkhop) | |
| --batch_size | Batch size for sampling | 1024 |
| --seed | Random seed | 123 |
| --device | Device to train on (cpu, cuda, ...) | cuda |
| --epochs | Number of epochs | 50 |
| --lr | Learning rate | 0.001 |
| --hidden_layers | Number of hidden layers for MLP | 256 |
| --num_layers | Number of layers (GCN or GIN only) | 2 |
| --heads | Attention heads | 8 |
| --leave_out | Leave a specific node id out of training | None |
| --node_ids | Store the test loss of the given node ids | None |
| --debug | Log level | logging.INFO |
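
The following is a minimal sketch of what this masking can look like with PyTorch Geometric; the dataset loading and the train_mask attribute follow the standard Planetoid interface, and the actual code in train.py may differ:

```python
# Minimal sketch (assumed, not the exact code in train.py): keep the node and
# its edges in the graph so message passing is unchanged, but exclude the node
# from the supervised training loss by flipping its entry in train_mask.
from torch_geometric.datasets import Planetoid

dataset = Planetoid(root="data/Cora", name="Cora")
data = dataset[0]

leave_out = 11  # node id passed via --leave_out

data.train_mask = data.train_mask.clone()
data.train_mask[leave_out] = False  # node 11 no longer contributes to the loss
```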

Calculating the influence

To calculate the influence values, use graph_influence.py. The following example computes them for the test node ids 1708, 1720, and 1800 on the Cora dataset with GAT (a condensed sketch of the underlying computation follows the parameter table below):

```
$ python graph_influence.py --dataset Cora --model GAT --node_ids 1708 1720 1800
```

| Parameter | Description | Default |
| --- | --- | --- |
| --dataset | Dataset to use (Cora, Pubmed, CiteSeer, Flickr) | Cora |
| --model | Network model (GCN, GAT, GIN, ARMA) | GCN |
| --sampling | Use sampling (random, graphsaint, shadowkhop) | |
| --batch_size | Batch size for sampling | 1024 |
| --seed | Random seed | 123 |
| --device | Device to train on (cpu, cuda, ...) | cuda |
| --epochs | Number of epochs | 50 |
| --lr | Learning rate | 0.001 |
| --hidden_layers | Number of hidden layers for MLP | 256 |
| --num_layers | Number of layers (GCN or GIN only) | 2 |
| --heads | Attention heads | 8 |
| --node_ids | Test node ids for which to measure the impact of each training node (required) | None |
| --recursion_depth | Recursion depth for the s_test calculation | 1 |
| --r_averaging | Number of s_test runs to average | 1 |
| --debug | Log level | logging.INFO |
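
To make the role of --recursion_depth and --r_averaging concrete, here is a condensed sketch of the Koh and Liang style estimation they parametrize: s_test approximates the inverse Hessian applied to the test-loss gradient via a LiSSA-style recursion, and the influence of a training node is the negative dot product of its loss gradient with s_test. All names below (the loss callables, damping and scaling constants) are illustrative placeholders, not the actual interface of graph_influence.py.

```python
import torch


def hvp(loss, params, vec):
    """Hessian-vector product: H @ vec for the Hessian of `loss` w.r.t. `params`."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    dot = sum((g * v).sum() for g, v in zip(grads, vec))
    return torch.autograd.grad(dot, params)


def s_test(test_loss_fn, train_loss_fn, params, depth=1, r=1, damp=0.01, scale=25.0):
    """LiSSA-style estimate of H^{-1} grad(test loss), averaged over r runs."""
    v = torch.autograd.grad(test_loss_fn(), params)   # gradient of the test-node loss
    estimate = None
    for _ in range(r):                                # --r_averaging
        h = [vi.clone() for vi in v]
        for _ in range(depth):                        # --recursion_depth
            hv = hvp(train_loss_fn(), params, h)      # ideally a fresh mini-batch each step
            h = [vi + (1 - damp) * hi - hvi / scale
                 for vi, hi, hvi in zip(v, h, hv)]
        h = [hi / scale for hi in h]
        estimate = h if estimate is None else [e + hi for e, hi in zip(estimate, h)]
    return [e / r for e in estimate]


def influence(train_node_loss_fn, s_test_vec, params):
    """Influence of up-weighting one training node on the test-node loss."""
    grad_train = torch.autograd.grad(train_node_loss_fn(), params)
    return -sum((g * s).sum() for g, s in zip(grad_train, s_test_vec)).item()
```

In this sketch, params would be something like list(model.parameters()), and each loss callable recomputes its loss from the current graph so that autograd can rebuild the computation graph on every call.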

If you have any questions, feel free to contact me, open an issue, or submit a pull request.
