Graph Influence

Graph Influence is an implementation of Influence Functions, by Pang Wei Koh and Percy Liang, for Graph Neural Networks. The main idea is to use the same methodology to calculate the influence of a training node on a test node.
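For reference, the quantity defined by Koh and Liang is written out below. Reading z as a training node and z_test as a test node is an assumption about how the method carries over to graphs; the exact loss and Hessian approximation used in this repository may differ.

```latex
% Influence of upweighting a training point z on the loss at a test point z_test
\mathcal{I}_{\text{up,loss}}(z, z_{\text{test}})
  = -\,\nabla_\theta L(z_{\text{test}}, \hat\theta)^{\top}
      H_{\hat\theta}^{-1}\,
      \nabla_\theta L(z, \hat\theta),
\qquad
H_{\hat\theta} = \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta^{2} L(z_i, \hat\theta)
```

Here \hat\theta denotes the trained parameters and n the number of training points.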

There are two main files in this repository: train.py and graph_influence.py. The first is a generic script for training graph neural networks with different models on different toy datasets; the second calculates the influence of each training node on the given test nodes. For more comprehensive information about influence functions, please take a look at my blog post about influence functions.

Training from scratch

You can train a model from scratch while leaving one specific node out, for validation/debugging purposes. When --leave_out is set, the node and its edges are not actually removed from the graph; the node is only masked out during training:

$ python train.py --dataset Cora --model GAT --leave_out 11
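The masking behaviour described above can be sketched as follows. This is a minimal illustration in plain Python, not the code in train.py: the helper names `build_train_mask` and `masked_mean_loss` are hypothetical, and a real implementation would more likely use a boolean tensor mask.

```python
def build_train_mask(num_nodes, train_ids, leave_out=None):
    """Return a boolean mask over all nodes.

    The left-out node stays in the graph (its edges are untouched);
    it is only excluded from the set of nodes that contribute to the loss.
    """
    mask = [False] * num_nodes
    for node_id in train_ids:
        mask[node_id] = True
    if leave_out is not None:
        mask[leave_out] = False  # hypothetical equivalent of --leave_out
    return mask


def masked_mean_loss(per_node_loss, mask):
    """Average the per-node losses over the masked-in nodes only."""
    selected = [loss for loss, keep in zip(per_node_loss, mask) if keep]
    return sum(selected) / len(selected)


# Toy example: 5 nodes, nodes 0-3 are training nodes, node 2 is left out.
mask = build_train_mask(num_nodes=5, train_ids=[0, 1, 2, 3], leave_out=2)
loss = masked_mean_loss([0.5, 1.0, 9.0, 1.5, 2.0], mask)
```

Because the node is masked rather than deleted, it can still pass messages to its neighbours during training; only its own label stops contributing to the loss.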

Parameter	Description	Default
--dataset	Dataset to use (Cora, Pubmed, CiteSeer, Flickr)	Cora
--model	Network model (GCN, GAT, GIN, ARMA)	GCN
--sampling	Use sampling (random, graphsaint, shadowkhop)
--batch_size	Batch size for sampling	1024
--seed	Seeding number	123
--device	Device to train (cpu, cuda...)	cuda
--epochs	Number of epochs	50
--lr	Learning rate	0.001
--hidden_layers	Number of hidden layers for MLP	256
--num_layers	Number of layers (GCN or GIN only)	2
--heads	Attention heads	8
--leave_out	Use this option to leave some specific node id out of the training	None
--node_ids	Store the test loss of those specific node ids	None
--debug	Log level	logging.INFO

Calculating the influence

To calculate the influence function, use graph_influence.py. The following example calculates the influence values for the test node ids 1708, 1720 and 1800 on the Cora dataset with a GAT model.

$ python graph_influence.py --dataset Cora --model GAT --node_ids 1708 1720 1800
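The command above passes several node ids to a single option. A minimal sketch of how such an option could be parsed with argparse is shown below; it is an assumption for illustration, covers only three of the options, and the actual argument handling in graph_influence.py may differ.

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring a subset of the documented options.
    parser = argparse.ArgumentParser(description="Sketch of an influence CLI")
    parser.add_argument("--dataset", default="Cora",
                        choices=["Cora", "Pubmed", "CiteSeer", "Flickr"])
    parser.add_argument("--model", default="GCN",
                        choices=["GCN", "GAT", "GIN", "ARMA"])
    # nargs="+" accepts one or more space-separated ids after the flag.
    parser.add_argument("--node_ids", type=int, nargs="+", required=True)
    return parser


args = build_parser().parse_args(
    ["--dataset", "Cora", "--model", "GAT", "--node_ids", "1708", "1720", "1800"]
)
print(args.node_ids)  # [1708, 1720, 1800]
```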

Parameter	Description	Default
--dataset	Dataset to use (Cora, Pubmed, CiteSeer, Flickr)	Cora
--model	Network model (GCN, GAT, GIN, ARMA)	GCN
--sampling	Use sampling (random, graphsaint, shadowkhop)
--batch_size	Batch size for sampling	1024
--seed	Seeding number	123
--device	Device to train (cpu, cuda...)	cuda
--epochs	Number of epochs	50
--lr	Learning rate	0.001
--hidden_layers	Number of hidden layers for MLP	256
--num_layers	Number of layers (GCN or GIN only)	2
--heads	Attention heads	8
--node_ids	Test node ids to measure the impact for each training node (required)	None
--recursion_depth	Recursion depth for s_test calculation	1
--r_averaging	R averaging	1
--debug	Log level	logging.INFO
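The --recursion_depth and --r_averaging options appear to correspond to the stochastic estimation of s_test (the inverse-Hessian-vector product) described by Koh and Liang. The sketch below illustrates that recursion on a toy problem in plain Python. It is an assumption about what the two options control, it omits the scaling and damping terms used in practice, and the function `s_test` here is hypothetical rather than the one in this repository.

```python
import random


def matvec(matrix, vector):
    """Multiply a small dense matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def s_test(v, sample_hessians, recursion_depth, r_averaging, seed=123):
    """Estimate H^{-1} v with the recursion h_j = v + (I - H_sample) h_{j-1}.

    Each step draws one per-sample Hessian at random; the whole recursion is
    repeated r_averaging times and the results are averaged.
    """
    rng = random.Random(seed)
    dim = len(v)
    total = [0.0] * dim
    for _ in range(r_averaging):
        h = list(v)  # h_0 = v
        for _ in range(recursion_depth):
            hessian = rng.choice(sample_hessians)
            hh = matvec(hessian, h)
            h = [v[i] + h[i] - hh[i] for i in range(dim)]
        total = [t + x for t, x in zip(total, h)]
    return [t / r_averaging for t in total]


# Toy per-sample Hessians whose mean is diag(0.5, 0.25).
hessians = [
    [[0.6, 0.0], [0.0, 0.3]],
    [[0.4, 0.0], [0.0, 0.2]],
]
estimate = s_test([1.0, 1.0], hessians, recursion_depth=200, r_averaging=10)
```

The recursion only converges when the Hessian's eigenvalues lie between 0 and 1, which is why practical implementations rescale the loss. With a single repeat and a depth of 1, the estimate is just v + (I - H_sample) v, so larger values trade computation time for a more accurate estimate.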

If you have any questions, feel free to contact me or open an issue or a pull request.

