Command-line and Python client for downloading and deploying datasets on DBpedia Databus.
- Quickstart
- DBpedia
- CLI Usage
- Module Usage
- Development & Contributing
The client supports two main workflows: downloading datasets from the Databus and deploying datasets to the Databus. Below you can choose how to run it (Python or Docker), then follow the sections on DBpedia downloads, CLI usage, or module usage.
You can use either Python or Docker. Both methods support all client features. The Docker image is available at dbpedia/databus-python-client.
Requirements: Python 3.11+ and pip
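Optionally, you can install the client into a virtual environment so it stays isolated from system packages (this is general Python practice, not a requirement of the client):
# optional: create and activate an isolated environment
python3 -m venv .venv
source .venv/bin/activate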
Before using the client, install it via pip:
python3 -m pip install databusclient
You can then use the client in the command line:
databusclient --help
databusclient deploy --help
databusclient delete --help
databusclient download --help
Requirements: Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client --help
docker run --rm -v $(pwd):/data dbpedia/databus-python-client deploy --help
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download --help
Commands to download the DBpedia Knowledge Graphs generated by Live Fusion. DBpedia Live Fusion publishes two kinds of KGs:
- Open Core Knowledge Graphs under CC-BY-SA license, open with copyleft/share-alike, no registration needed.
- Industry Knowledge Graphs under BUSL 1.1 license, unrestricted for research and experimentation, commercial license for productive use, free registration needed.
To download BUSL 1.1 licensed datasets, you need to register and get an access token.
- If you do not have a DBpedia Account yet (Forum/Databus), please register at https://account.dbpedia.org
- Log in at https://account.dbpedia.org and create your token.
- Save the token to a file, e.g. vault-token.dat.
High-frequency, conflict-resolved knowledge graph that merges Live Wikipedia and Wikidata signals into a single, queryable dump for enterprise consumption. More information
# Python
databusclient download https://databus.dbpedia.org/dbpedia-enterprise/live-fusion-kg-dump --vault-token vault-token.dat
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia-enterprise/live-fusion-kg-dump --vault-token vault-token.dat
DBpedia Wikipedia Extraction Enriched
DBpedia-based enrichment of structured Wikipedia extractions (currently EN DBpedia only). More information
# Python
databusclient download https://databus.dbpedia.org/dbpedia-enterprise/dbpedia-wikipedia-kg-enriched-dump --vault-token vault-token.dat
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia-enterprise/dbpedia-wikipedia-kg-enriched-dump --vault-token vault-token.dat
Original extraction of structured Wikipedia data before enrichment. More information
# Python
databusclient download https://databus.dbpedia.org/dbpedia/dbpedia-wikipedia-kg-dump
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia/dbpedia-wikipedia-kg-dump
Original extraction of structured Wikidata data before enrichment. More information
# Python
databusclient download https://databus.dbpedia.org/dbpedia/dbpedia-wikidata-kg-dump
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia/dbpedia-wikidata-kg-dump
To get started with the command-line interface (CLI) of the databus-python-client, you can use either the Python installation or the Docker image. The examples below show both methods.
Help and further general information:
# Python
databusclient --help
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client --help
# Output:
Usage: databusclient [OPTIONS] COMMAND [ARGS]...
Databus Client CLI
Options:
--help Show this message and exit.
Commands:
deploy Flexible deploy to Databus command supporting three modes:
download Download datasets from databus, optionally using vault access...
With the download command, you can download datasets or parts thereof from the Databus. The download command expects one or more Databus URIs or a SPARQL query as arguments. The URIs can point to files, versions, artifacts, groups, or collections. If a SPARQL query is provided, it must return Databus download URLs, which will then be downloaded.
# Python
databusclient download $DOWNLOADTARGET
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download $DOWNLOADTARGET
- $DOWNLOADTARGET - Can be any Databus URI including collections OR SPARQL query (or several thereof).
- --localdir - If no --localdir is provided, the current working directory is used as base directory. The downloaded files will be stored in the working directory in a folder structure according to the Databus layout, i.e. ./$ACCOUNT/$GROUP/$ARTIFACT/$VERSION/. See the example after this list.
- --vault-token - If the dataset/files to be downloaded require vault authentication, you need to provide a vault token with --vault-token /path/to/vault-token.dat. See Registration (Access Token) for details on how to get a vault token. Note: Vault tokens are only required for certain protected Databus hosts (for example: data.dbpedia.io, data.dev.dbpedia.link). The client detects those hosts and will fail early with a clear message if a token is required but not provided. Do not pass --vault-token for public downloads.
- --databus-key - If the databus is protected and needs API key authentication, you can provide the API key with --databus-key YOUR_API_KEY.
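For example, a sketch of downloading the latest version of an artifact into a custom base directory (./my-databus-data is just a placeholder; with the Docker image, paths must refer to locations inside the container, such as the mounted /data volume):
# Python
databusclient download https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals --localdir ./my-databus-data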
Help and further information on download command:
# Python
databusclient download --help
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download --help
# Output:
Usage: databusclient download [OPTIONS] DATABUSURIS...
Download datasets from databus, optionally using vault access if vault
options are provided.
Options:
--localdir TEXT Local databus folder (if not given, databus folder
structure is created in current working directory)
--databus TEXT Databus URL (if not given, inferred from databusuri,
e.g. https://databus.dbpedia.org/sparql)
--vault-token TEXT Path to Vault refresh token file
--databus-key TEXT Databus API key to download from protected databus
--all-versions When downloading artifacts, download all versions
instead of only the latest
--authurl TEXT Keycloak token endpoint URL [default:
https://auth.dbpedia.org/realms/dbpedia/protocol/openid-
connect/token]
--clientid TEXT Client ID for token exchange [default: vault-token-
exchange]
--help Show this message and exit.
Download File: download of a single file
# Python
databusclient download https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals/2022.12.01/mappingbased-literals_lang=az.ttl.bz2
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals/2022.12.01/mappingbased-literals_lang=az.ttl.bz2
Download Version: download of all files of a specific version
# Python
databusclient download https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals/2022.12.01
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals/2022.12.01
Download Artifact: download of all files with the latest version of an artifact
# Python
databusclient download https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals
Download Group: download of all files with the latest version of all artifacts of a group
# Python
databusclient download https://databus.dbpedia.org/dbpedia/mappings
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia/mappings
Download Collection: download of all files within a collection
# Python
databusclient download https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2022-12
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2022-12
Download Query: download of all files returned by a query (SPARQL endpoint must be provided with --databus)
# Python
databusclient download 'PREFIX dcat: <http://www.w3.org/ns/dcat#> SELECT ?x WHERE { ?sub dcat:downloadURL ?x . } LIMIT 10' --databus https://databus.dbpedia.org/sparql
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download 'PREFIX dcat: <http://www.w3.org/ns/dcat#> SELECT ?x WHERE { ?sub dcat:downloadURL ?x . } LIMIT 10' --databus https://databus.dbpedia.org/sparql
With the deploy command, you can deploy datasets to the Databus. The deploy command supports three modes:
- Classic dataset deployment via list of distributions
- Metadata-based deployment via metadata JSON file
- Upload & deploy via Nextcloud/WebDAV
# Python
databusclient deploy [OPTIONS] [DISTRIBUTIONS]...
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client deploy [OPTIONS] [DISTRIBUTIONS]...
Help and further information on deploy command:
# Python
databusclient deploy --help
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client deploy --help
# Output:
Usage: databusclient deploy [OPTIONS] [DISTRIBUTIONS]...
Flexible deploy to Databus command supporting three modes:
- Classic deploy (distributions as arguments)
- Metadata-based deploy (--metadata <file>)
- Upload & deploy via Nextcloud (--webdav-url, --remote, --path)
Options:
--version-id TEXT Target databus version/dataset identifier of the form <h
ttps://databus.dbpedia.org/$ACCOUNT/$GROUP/$ARTIFACT/$VE
RSION> [required]
--title TEXT Dataset title [required]
--abstract TEXT Dataset abstract max 200 chars [required]
--description TEXT Dataset description [required]
--license TEXT License (see dalicc.net) [required]
--apikey TEXT API key [required]
--metadata PATH Path to metadata JSON file (for metadata mode)
--webdav-url TEXT WebDAV URL (e.g.,
https://cloud.example.com/remote.php/webdav)
--remote TEXT rclone remote name (e.g., 'nextcloud')
--path TEXT Remote path on Nextcloud (e.g., 'datasets/mydataset')
--help Show this message and exit.
# Python
databusclient deploy \
--version-id https://databus.dbpedia.org/user1/group1/artifact1/2022-05-18 \
--title "Client Testing" \
--abstract "Testing the client...." \
--description "Testing the client...." \
--license http://dalicc.net/licenselibrary/AdaptivePublicLicense10 \
--apikey MYSTERIOUS \
'https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml|type=swagger'
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client deploy \
--version-id https://databus.dbpedia.org/user1/group1/artifact1/2022-05-18 \
--title "Client Testing" \
--abstract "Testing the client...." \
--description "Testing the client...." \
--license http://dalicc.net/licenselibrary/AdaptivePublicLicense10 \
--apikey MYSTERIOUS \
'https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml|type=swagger'
A few more notes for CLI usage:
- The content variants can be left out ONLY IF there is just one distribution
- For completely inferred file info: just use the plain URL, e.g. https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml
- If other parameters are used, you need to leave unused fields empty, like https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml||yml|7a751b6dd5eb8d73d97793c3c564c71ab7b565fa4ba619e4a8fd05a6f80ff653:367116
Use a JSON metadata file to define all distributions. The metadata.json should list all distributions and their metadata. All files referenced there will be registered on the Databus.
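Each entry needs the file's checksum (a SHA-256 hex digest, as also used by sha256_length_tuple in the module API) and its size in bytes. One way to compute both for a local file with standard Unix tools (example.ttl is a placeholder file name):
sha256sum example.ttl | cut -d ' ' -f 1   # checksum
wc -c < example.ttl                       # size in bytes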
# Python
databusclient deploy \
--metadata ./metadata.json \
--version-id https://databus.dbpedia.org/user1/group1/artifact1/1.0 \
--title "Metadata Deploy Example" \
--abstract "This is a short abstract of the dataset." \
--description "This dataset was uploaded using metadata.json." \
--license https://dalicc.net/licenselibrary/Apache-2.0 \
--apikey "API-KEY"
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client deploy \
--metadata ./metadata.json \
--version-id https://databus.dbpedia.org/user1/group1/artifact1/1.0 \
--title "Metadata Deploy Example" \
--abstract "This is a short abstract of the dataset." \
--description "This dataset was uploaded using metadata.json." \
--license https://dalicc.net/licenselibrary/Apache-2.0 \
--apikey "API-KEY"
Example metadata.json file structure (file_format and compression are optional):
[
  {
    "checksum": "0929436d44bba110fc7578c138ed770ae9f548e195d19c2f00d813cca24b9f39",
    "size": 12345,
    "url": "https://cloud.example.com/remote.php/webdav/datasets/mydataset/example.ttl",
    "file_format": "ttl"
  },
  {
    "checksum": "2238acdd7cf6bc8d9c9963a9f6014051c754bf8a04aacc5cb10448e2da72c537",
    "size": 54321,
    "url": "https://cloud.example.com/remote.php/webdav/datasets/mydataset/example.csv.gz",
    "file_format": "csv",
    "compression": "gz"
  }
]
Upload local files or folders to a WebDAV/Nextcloud instance and automatically deploy to DBpedia Databus. Rclone is required.
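The --remote option refers to an rclone remote that must already be configured on your machine; setting one up is done with rclone itself, not with this client. As a rough sketch (exact prompts depend on your rclone version):
# interactive setup: choose the "webdav" storage type, vendor "nextcloud",
# and name the remote "nextcloud" so it matches --remote below
rclone config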
# Python
databusclient deploy \
--webdav-url https://cloud.example.com/remote.php/webdav \
--remote nextcloud \
--path datasets/mydataset \
--version-id https://databus.dbpedia.org/user1/group1/artifact1/1.0 \
--title "Test Dataset" \
--abstract "Short abstract of dataset" \
--description "This dataset was uploaded for testing the Nextcloud → Databus pipeline." \
--license https://dalicc.net/licenselibrary/Apache-2.0 \
--apikey "API-KEY" \
./localfile1.ttl \
./data_folder
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client deploy \
--webdav-url https://cloud.example.com/remote.php/webdav \
--remote nextcloud \
--path datasets/mydataset \
--version-id https://databus.dbpedia.org/user1/group1/artifact1/1.0 \
--title "Test Dataset" \
--abstract "Short abstract of dataset" \
--description "This dataset was uploaded for testing the Nextcloud → Databus pipeline." \
--license https://dalicc.net/licenselibrary/Apache-2.0 \
--apikey "API-KEY" \
./localfile1.ttl \
./data_folder
With the delete command, you can delete collections, groups, artifacts, and versions from the Databus. Deleting individual files via the API is not supported.
Note: Deleting datasets will recursively delete all data associated with the dataset below the specified level. Please use this command with caution. As a security measure, the delete command will prompt you for confirmation before proceeding with any deletion.
# Python
databusclient delete [OPTIONS] DATABUSURIS...
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client delete [OPTIONS] DATABUSURIS...
Help and further information on delete command:
# Python
databusclient delete --help
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client delete --help
# Output:
Usage: databusclient delete [OPTIONS] DATABUSURIS...
Delete a dataset from the databus.
Delete a group, artifact, or version identified by the given databus URI.
Will recursively delete all data associated with the dataset.
Options:
--databus-key TEXT Databus API key to access protected databus [required]
--dry-run Perform a dry run without actual deletion
--force Force deletion without confirmation prompt
--help Show this message and exit.
To authenticate the delete request, you need to provide an API key with --databus-key YOUR_API_KEY.
If you want to perform a dry run without actual deletion, use the --dry-run option. This will show you what would be deleted without making any changes.
As a security measure, the delete command will prompt you for confirmation before proceeding with the deletion. If you want to skip this prompt, you can use the --force option.
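For example, a dry run that only reports what would be deleted (the URI and key are placeholders taken from the examples below):
# Python
databusclient delete https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals/2022.12.01 --databus-key YOUR_API_KEY --dry-run
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client delete https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals/2022.12.01 --databus-key YOUR_API_KEY --dry-run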
Delete Version: delete a specific version
# Python
databusclient delete https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals/2022.12.01 --databus-key YOUR_API_KEY
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client delete https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals/2022.12.01 --databus-key YOUR_API_KEY
Delete Artifact: delete an artifact and all its versions
# Python
databusclient delete https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals --databus-key YOUR_API_KEY
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client delete https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals --databus-key YOUR_API_KEY
Delete Group: delete a group and all its artifacts and versions
# Python
databusclient delete https://databus.dbpedia.org/dbpedia/mappings --databus-key YOUR_API_KEY
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client delete https://databus.dbpedia.org/dbpedia/mappings --databus-key YOUR_API_KEY
Delete Collection: delete a collection
# Python
databusclient delete https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2022-12 --databus-key YOUR_API_KEY
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client delete https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2022-12 --databus-key YOUR_API_KEY
from databusclient import create_distribution
# create a list
distributions = []
# minimal requirements
# compression and filetype will be inferred from the path
# this will trigger the download of the file to evaluate the shasum and content length
distributions.append(
    create_distribution(url="https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml", cvs={"type": "swagger"})
)
# full parameters
# will just place parameters correctly, nothing will be downloaded or inferred
distributions.append(
    create_distribution(
        url="https://example.org/some/random/file.csv.bz2",
        cvs={"type": "example", "realfile": "false"},
        file_format="csv",
        compression="bz2",
        sha256_length_tuple=("7a751b6dd5eb8d73d97793c3c564c71ab7b565fa4ba619e4a8fd05a6f80ff653", 367116)
    )
)
A few notes:
- The dict for content variants can be empty ONLY IF there is just one distribution
- There can be no compression if there is no file format
from databusclient import create_dataset
# minimal way
dataset = create_dataset(
    version_id="https://dev.databus.dbpedia.org/denis/group1/artifact1/2022-05-18",
    title="Client Testing",
    abstract="Testing the client....",
    description="Testing the client....",
    license_url="http://dalicc.net/licenselibrary/AdaptivePublicLicense10",
    distributions=distributions,
)
# with group metadata
dataset = create_dataset(
    version_id="https://dev.databus.dbpedia.org/denis/group1/artifact1/2022-05-18",
    title="Client Testing",
    abstract="Testing the client....",
    description="Testing the client....",
    license_url="http://dalicc.net/licenselibrary/AdaptivePublicLicense10",
    distributions=distributions,
    group_title="Title of group1",
    group_abstract="Abstract of group1",
    group_description="Description of group1"
)
NOTE: Group metadata is applied only if all group parameters are set.
from databusclient import deploy
# to deploy something you just need the dataset from the previous step and an API key
# API key can be found (or generated) at https://$$DATABUS_BASE$$/$$USER$$#settings
deploy(dataset, "mysterious API key")
Install development dependencies yourself or via Poetry:
poetry install --with dev
The linter used is Ruff. It is configured in pyproject.toml and enforced in CI (.github/workflows/ruff.yml).
For development, you can run linting locally with ruff check . and optionally auto-format with ruff format ..
To ensure compatibility with the dependencies configured in pyproject.toml, run Ruff via Poetry:
# To check for linting issues:
poetry run ruff check .
# To auto-format code:
poetry run ruff format .
When developing new features, please make sure to add appropriate tests and ensure that all tests pass. Tests are under tests/ and use pytest as the test framework.
When fixing bugs or refactoring existing code, please make sure to add tests that cover the affected functionality. The current test coverage is very low, so any additional tests are highly appreciated.
To run tests locally, use:
pytest tests/
Or, to ensure compatibility with the dependencies configured in pyproject.toml, run pytest via Poetry:
poetry run pytest tests/