Many mining algorithms on heterogeneous information networks (HINs) share a common, computationally intensive step: finding the pairs of nodes that are connected via a given metapath.
This repository provides open-source implementations of the Sequential and Dynamic Programming matrix multiplication strategies proposed in [1] for this problem.
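To make the idea concrete, here is a minimal, illustrative Python sketch (using `scipy.sparse`; not part of this repository, and all names in it are hypothetical) of how the node pairs of a metapath such as `APA` can be computed as a chain of sparse matrix multiplications:

```python
# Minimal illustrative sketch (not part of HMiner): metapath-connected
# node pairs as a product of sparse relation matrices.
import numpy as np
from scipy.sparse import csr_matrix

def metapath_pairs(matrices):
    """Multiply the relation matrices of a metapath left to right
    (the Sequential strategy); a nonzero entry (i, j) in the result
    means node i reaches node j via the metapath."""
    result = matrices[0]
    for m in matrices[1:]:
        result = result @ m  # sparse matrix-matrix multiplication
    return result

# Toy metapath APA (Author -> Paper -> Author):
# AP[i, j] = 1 iff author i wrote paper j; PA is its transpose.
AP = csr_matrix(np.array([[1, 0, 1],
                          [0, 1, 0]]))
APA = metapath_pairs([AP, AP.T])
rows, cols = APA.nonzero()
print(list(zip(rows, cols)))  # author pairs sharing at least one paper
```

The Dynamic Programming strategy computes the same product but chooses the cheapest parenthesization of the chain, based on the dimensions and sparsity of the intermediate matrices.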
Input files can be one of the following types; a short Python sketch that loads both file types follows the list:
- File containing node attributes. These are tab-separated files containing all node attributes. The first line is the header that contains all attribute names, with `_n` and `_s` suffixes for numeric and alphanumeric attributes respectively. The first column should be an incremental integer identifier starting from 0, denoted as `id_n` in the header. These files should be named with the first letter of the entity they represent. For example, the file that contains the attributes for node type `Author` should be named `A.csv`. An example of a file containing node attributes is the following:
```
id_n name_s surname_s
0 Makoto Satoh
1 Ryo Muramatsu
...
```
- File containing relations. These are tab-separated files needed to construct the relations among nodes. They contain two columns, the source and target identifiers respectively, should be sorted on the first column, and do not contain a header. They should be named according to the source and target node types, with the extension `.csv`. For example, the file with the relations between node types `Author` and `Paper` should be named `AP.csv`. An example of a file containing node relations is the following:
```
0 1
0 2
0 3
0 5
1 0
...
```
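As a quick way to sanity-check these formats, here is a standalone Python sketch (illustrative only, not part of this repository; `P.csv` is a hypothetical paper attribute file) that loads a node attribute file and a relations file and builds a sparse adjacency matrix:

```python
# Illustrative loaders for the two input formats above (not part of
# HMiner); file names follow the A.csv / AP.csv convention.
import csv
from scipy.sparse import coo_matrix

def load_nodes(path):
    """Read a tab-separated node attribute file with a header row."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f, delimiter="\t"))

def load_relations(path, n_src, n_tgt):
    """Read a headerless, tab-separated relations file into a sparse
    adjacency matrix of shape (n_src, n_tgt)."""
    rows, cols = [], []
    with open(path) as f:
        for line in f:
            src, tgt = line.split("\t")
            rows.append(int(src))
            cols.append(int(tgt))
    return coo_matrix(([1] * len(rows), (rows, cols)),
                      shape=(n_src, n_tgt)).tocsr()

authors = load_nodes("A.csv")   # columns: id_n, name_s, surname_s
papers = load_nodes("P.csv")    # hypothetical paper attribute file
AP = load_relations("AP.csv", len(authors), len(papers))
```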
All methods use the same input file format. Examples of input files can be found in the `./data/` folder. Two sample datasets are included: (a) a subset of the GDELT dataset (folder `./data/GDELT/`), and (b) a subset of the DBLP dataset from AMiner (folder `./data/DBLP_subset/`).
An example query can be found in `data/DBLP_subset/query.txt`.
```bash
# clone repository
git clone --recursive https://github.com/smartdatalake/HMiner.git

# update submodules
git submodule update --remote

# build project
make
```
The program receives its parameters from a JSON config file. An example config file is the following:
```json
{
  "indir": "./data/DBLP_subset/nodes/",
  "irdir": "./data/DBLP_subset/relations/",
  "algorithm": "OTree",
  "hin_out": "./data/out/",
  "qf": "./data/DBLP_subset/queries.txt",
  "cache_size": 4096,
  "cache_policy": "PGDS",
  "dynamic_optimizer": "Sparse",
  "ranking": {
    "ranking_out": "./data/out/",
    "pr_alpha": 0.5,
    "pr_tol": 0.00000000001,
    "threshold": 2
  }
}
```
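For illustration, here is a minimal Python sketch of reading such a config file; the key names come from the example above, while the choice of which keys to treat as required is an assumption, not the program's documented behavior:

```python
# Illustrative config loader (not part of HMiner); the key names come
# from the example above, and the required set below is an assumption.
import json

REQUIRED_KEYS = {"indir", "irdir", "algorithm", "hin_out", "qf"}

def load_config(path):
    with open(path) as f:
        config = json.load(f)
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"missing config keys: {sorted(missing)}")
    return config

config = load_config("config.json")
print(config["algorithm"], config.get("cache_policy"))
```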
The query workload has the following format:
{ "metapath": "APPA", "constraints": { "P": "year >= 2000" } }
{ "metapath": "APPAP", "constraints": { "P": "year >= 2000" }, "src_field": "name" }}
Fields "metapath" and "constraints" are required. The field "src_field" is optional and is used to append the specified column in the output in the case of ranking.
Execute ranking with the following command:
```bash
./run -c config.json
```
[1] C. Shi, Y. Li, P. S. Yu, and B. Wu. Constrained-meta-path-based ranking in heterogeneous information network. Knowledge and Information Systems, 2016.