A high-performance RDF triple store implementation using RDFLib and the Apache Parquet format.

## Features

- Storage: Store RDF graphs as Parquet files for efficient disk usage
- Loading: Load RDF graphs from Parquet files
- Batch Operations: Store and load multiple graphs
- Querying: Basic SPARQL query support
- Indexing: Optional indexed storage for faster subject/predicate queries
- Merging: Merge multiple RDF graphs into a single dataset
- Export: Export triples to Turtle format
## Installation

```bash
cd parquad
pip install -r requirements.txt
```

Required packages:

- rdflib >= 6.4.0
- pyarrow >= 14.0.0
- pandas >= 2.0.0

## Usage

### Initialize the Store

```python
from parquet_triple_store import ParquetTripleStore

# Initialize the store
store = ParquetTripleStore(storage_path="my_triple_store")
```

### Store a Graph

```python
from rdflib import Graph, URIRef, Literal, RDF
from parquet_triple_store import ParquetTripleStore

graph = Graph()

# Add some triples
graph.add((URIRef("http://example.org/person1"),
           RDF.type,
           URIRef("http://xmlns.com/foaf/0.1/Person")))
graph.add((URIRef("http://example.org/person1"),
           URIRef("http://xmlns.com/foaf/0.1/name"),
           Literal("Alice")))

# Store the graph
filepath = store.store_graph(graph, "my_data")
```

### Load Graphs

```python
# Load a specific graph
loaded_graph = store.load_graph("my_data")

# Load all graphs
all_triples = store.load_all_graphs()
```

### Statistics and Export

```python
# Get statistics
stats = store.get_statistics()
print(f"Total triples: {stats['total_triples']}")

# Export to Turtle
store.export_to_turtle("output.ttl")
```

### Indexed Store

For faster queries by subject or predicate:

```python
from parquet_triple_store import ParquetTripleStoreWithIndex

# Initialize indexed store
indexed_store = ParquetTripleStoreWithIndex()

# Store and load graphs
indexed_store.store_graph(graph, "indexed_data")
indexed_store.load_all_graphs()

# Query by subject
results = indexed_store.find_by_subject("http://example.org/person1")

# Find triples matching given criteria
results = indexed_store.find_triples(
    subject="http://example.org/person1",
    predicate="http://xmlns.com/foaf/0.1/name"
)
```

### Batch Operations and Merging

```python
from rdflib import Graph
from parquet_triple_store import ParquetTripleStore

# Store multiple graphs at once
graphs_to_store = [
    ("dataset1", graph1),
    ("dataset2", graph2),
    ("dataset3", graph3)
]
filenames = store.batch_store(graphs_to_store)

# Merge two graphs into a new file
merged_file = store.merge_graphs("dataset1", "dataset2")
```

## API Reference

### ParquetTripleStore

- `ParquetTripleStore(storage_path)`: Initialize the triple store with a storage directory.
- `store_graph(...)`: Store an RDF graph as a Parquet file.
- `load_graph(...)`: Load an RDF graph from a Parquet file.
- `batch_store(...)`: Store multiple graphs at once.
- `load_all_graphs()`: Load all Parquet files and return the triples as a DataFrame.
- `get_statistics()`: Get statistics about the stored triples.
- `export_to_turtle(...)`: Export triples to Turtle format.
- `merge_graphs(...)`: Merge two graphs and store the result as a new file.
- Delete a specific Parquet file.

### ParquetTripleStoreWithIndex

Extends `ParquetTripleStore` with additional indexing capabilities.

- `find_by_subject(...)`: Find all triples with a specific subject.
- Find all triples with a specific predicate.
- `find_triples(...)`: Find triples matching the given criteria.
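To illustrate what criteria-based lookup over loaded triples can look like once they sit in a pandas DataFrame, here is a minimal filtering sketch. The `find_triples` helper below is a standalone stand-in, not the library's actual implementation, and the column names are assumptions.

```python
import pandas as pd

# Triples loaded from Parquet arrive as a DataFrame; column names
# here are assumed, not taken from the library's real schema.
df = pd.DataFrame({
    "subject": ["http://example.org/person1", "http://example.org/person2"],
    "predicate": ["http://xmlns.com/foaf/0.1/name"] * 2,
    "object": ["Alice", "Bob"],
})

def find_triples(df, subject=None, predicate=None):
    """Return rows matching any combination of subject/predicate."""
    mask = pd.Series(True, index=df.index)
    if subject is not None:
        mask &= df["subject"] == subject
    if predicate is not None:
        mask &= df["predicate"] == predicate
    return df[mask]

matches = find_triples(df, subject="http://example.org/person1")
```

An indexed store can speed this up further by keeping a precomputed mapping from subject (or predicate) to row positions, avoiding the full-column scan.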
## Notes

- The Parquet format provides excellent compression and fast reads
- The indexed store is recommended for frequent subject/predicate queries
- For large datasets, consider loading only the data you need
- SPARQL queries require the `sparqlwrapper` package
See `usage_example.py` for comprehensive examples.
## License

MIT