- Title: Decentralized Geospatial (dgeo)
- Identifier: https://raw.githubusercontent.com/DecentralizedGeo/dgeo/refs/heads/main/json-schema/schema.json
- Field Name Prefix: dgeo
- Scope: Item, Collection, Assets
- Extension Maturity Classification: Proposal
- Owner: @DecentralizedGeo
- Version: 1.0.0
The Decentralized Geospatial (dgeo) extension provides a standard set of fields to associate STAC Items and Collections with resources on the decentralized web (dweb), such as IPFS, Filecoin, and other content-addressed storage systems.
The primary objectives of this extension are:
- To enable queryable CID-based discovery through STAC API and pgstac, allowing users to search for Items by Content Identifier (CID).
- To provide asset-level DAG metadata that describes how content-addressed structures were generated, enabling reproducible verification and reconstruction.
- To separate discovery (queryable CID arrays) from access (asset hrefs and descriptions), following STAC best practices.
Unlike extensions that describe how to access a specific file, dgeo describes what decentralized resources exist in a queryable, pgstac-compatible format.
Relation to other extensions
- The `dgeo` extension is complementary to the Alternate Assets extension. Alternate Assets provides alternate location URIs for the same file, while `dgeo` provides content-addressed identifiers and technical DAG profiles.
- Like the MLM extension, `dgeo` is designed to compose with multiple other extensions. A properly described decentralized dataset will typically use `dgeo` alongside EO, Raster, File, and possibly MLM depending on context.
- Examples:
  - Item example: Landsat scene with multiple CIDs and asset-level metadata.
  - Core example: Multiple CID representations for the same asset.
  - Collection example: Collection-level dgeo fields.
- JSON Schema (JSON Schema Draft-07).
- Changelog
The dgeo extension MUST only be declared in STAC Items and Collections via the stac_extensions array.
- Catalogs
  - Catalogs MUST NOT declare the `dgeo` extension in `stac_extensions`, but MAY contain child Items or Collections that implement it.
- Collections
  - Collections MAY implement `dgeo`.
  - When used at the Collection level, `dgeo:cids` and `dgeo:piece_cids` describe collection-wide decentralized resources (for example, a collection-level CAR archive or IPFS directory root).
  - Asset-level `dgeo` fields MAY be used in Collection assets.
- Items
  - Items MAY implement `dgeo`.
  - Item-level `dgeo` fields describe decentralized resources specific to that Item (for example, CIDs corresponding to raster assets).
- Assets
  - Both Item and Collection assets MAY include `dgeo:cid` and `dgeo:cid_profile` fields to provide asset-specific decentralized metadata.
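The declaration rules above can be checked mechanically. The following is a minimal sketch, not part of the extension itself: the function name `check_dgeo_declaration` is hypothetical, and it assumes the STAC document is already parsed into a Python dict and that Item-level `dgeo:cids` lives under `properties`. Collection-level placement is deliberately not checked here.

```python
# Schema identifier as given in the extension header.
DGEO_SCHEMA_URL = (
    "https://raw.githubusercontent.com/DecentralizedGeo/dgeo/"
    "refs/heads/main/json-schema/schema.json"
)


def check_dgeo_declaration(doc: dict) -> list[str]:
    """Return a list of problems with how a STAC document declares dgeo.

    Hypothetical helper: covers only the Catalog rule and the
    Item-level REQUIRED field.
    """
    problems = []
    declared = DGEO_SCHEMA_URL in doc.get("stac_extensions", [])
    doc_type = doc.get("type")  # "Catalog", "Collection", or "Feature" (Item)

    if declared and doc_type == "Catalog":
        problems.append("Catalogs MUST NOT declare the dgeo extension")

    if declared and doc_type == "Feature":
        # dgeo:cids is REQUIRED once the extension is declared.
        if "dgeo:cids" not in doc.get("properties", {}):
            problems.append("Item declares dgeo but has no dgeo:cids")

    return problems
```

An empty list means none of the checked rules were violated; it does not replace validation against the JSON Schema.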
The fields in the table below can be used in these parts of STAC documents:
- Collections
- Item Properties (including Summaries in Collections)
- Assets (for both Collections and Items, including Item Asset Definitions in Collections)

They do not apply to Catalogs, which MUST NOT declare the extension. Links are outside the declared scope (Item, Collection, Assets).
| Field Name | Type | Description |
|---|---|---|
| `dgeo:cids` | string[] | REQUIRED. Array of IPFS Content Identifiers (CIDs) associated with this Item or Collection. Queryable via pgstac and STAC API. |
| `dgeo:piece_cids` | string[] | OPTIONAL. Array of Filecoin Piece CIDs (commP) used for storage verification and proof-of-replication workflows. Queryable via pgstac and STAC API. |

dgeo:cids

Type: Array of strings
REQUIRED when the dgeo extension is declared.
An array of IPFS Content Identifiers (CIDs) associated with this Item or Collection. Each CID MUST be immutable; mutable pointers such as IPNS MUST NOT be included.
This field is designed for queryability via pgstac and STAC API CQL2 filters, enabling users to search for Items by CID.
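As an illustration of such a query, the sketch below builds a STAC API search body that filters on `dgeo:cids`. It assumes the server implements the Filter extension with CQL2 JSON and supports the CQL2 array operator `a_contains`; whether a particular deployment does so is not stated here and should be confirmed against its conformance classes. The function name `cid_search_body` is hypothetical.

```python
import json


def cid_search_body(cid: str, limit: int = 10) -> dict:
    """Build a POST /search body matching Items whose dgeo:cids contains `cid`.

    Assumption: the server accepts cql2-json and the a_contains operator.
    """
    return {
        "filter-lang": "cql2-json",
        "filter": {
            "op": "a_contains",
            "args": [{"property": "dgeo:cids"}, [cid]],
        },
        "limit": limit,
    }


body = cid_search_body(
    "bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi"
)
# Serialize for use as the request payload of a STAC API search.
payload = json.dumps(body)
```

The payload can be sent with any HTTP client; the endpoint URL is deployment-specific and is left out on purpose.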
CID Format Validation:
- CIDv0: `^Qm[1-9A-HJ-NP-Za-km-z]{44}$` (a 46-character string starting with "Qm", base58-encoded multihash)
- CIDv1: `^b[a-z2-7]{58,}$` (base32-encoded self-describing multiformat protocol)
Constraints:
- Minimum 1 item (`minItems: 1`)
- All items must be unique (`uniqueItems: true`)
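The format patterns and constraints above can be applied outside of JSON Schema validation as well. The following is a minimal sketch under that assumption; `validate_cids` is a hypothetical helper, and the check is purely syntactic (it does not decode the CID or confirm that the content exists).

```python
import re

# Patterns copied from the CID Format Validation list above.
CID_V0 = re.compile(r"^Qm[1-9A-HJ-NP-Za-km-z]{44}$")
CID_V1 = re.compile(r"^b[a-z2-7]{58,}$")


def validate_cids(cids: list[str]) -> list[str]:
    """Return human-readable errors for a dgeo:cids array (empty if valid)."""
    errors = []
    if len(cids) < 1:
        errors.append("dgeo:cids must contain at least one CID (minItems: 1)")
    if len(set(cids)) != len(cids):
        errors.append("dgeo:cids must not contain duplicates (uniqueItems: true)")
    for cid in cids:
        # fullmatch rejects trailing newlines that `$` alone would tolerate.
        if not (CID_V0.fullmatch(cid) or CID_V1.fullmatch(cid)):
            errors.append(f"not a CIDv0 or CIDv1 string: {cid!r}")
    return errors
```

Returning a list of messages rather than raising on the first failure lets a caller report every problem in an Item at once.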
Example:
{
  "properties": {
    "datetime": "2020-12-04T22:38:32Z",
    "dgeo:cids": [
      "bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi",
      "bafybeic2zycyt36xnkbvbzqbrmhb3jndcylvqywb4zfqn7i6kcsjjl62ka"
    ]
  }
}

dgeo:piece_cids

Type: Array of strings
OPTIONAL
An array of Filecoin Piece CIDs (commP) used for storage verification and proof-of-replication workflows. Like dgeo:cids, this field is queryable via pgstac.
Piece CID Format Validation:
^baga6ea[a-z2-7]{52,}$
Constraints:
- All items must be unique (`uniqueItems: true`)
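One practical consequence of the patterns in this document: any string that matches the Piece CID pattern also matches the CIDv1 pattern given for `dgeo:cids`, because both describe base32 strings that start with "b". A tool that sorts identifiers into the two arrays therefore needs to test the more specific pattern first. The sketch below illustrates this; `classify_identifier` is a hypothetical helper, not part of the extension.

```python
import re

# Patterns as given in this document.
CID_V0 = re.compile(r"^Qm[1-9A-HJ-NP-Za-km-z]{44}$")
CID_V1 = re.compile(r"^b[a-z2-7]{58,}$")
PIECE_CID = re.compile(r"^baga6ea[a-z2-7]{52,}$")


def classify_identifier(value: str) -> str:
    """Return "piece_cid", "cid", or "unknown" for a candidate identifier."""
    # Test the Piece CID pattern first: it is a subset of the CIDv1 pattern.
    if PIECE_CID.fullmatch(value):
        return "piece_cid"
    if CID_V0.fullmatch(value) or CID_V1.fullmatch(value):
        return "cid"
    return "unknown"
```

This is a syntactic classification only; it says nothing about whether the identifier resolves on IPFS or Filecoin.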
Example:
{
  "properties": {
    "dgeo:cids": ["bafybei..."],
    "dgeo:piece_cids": [
      "baga6ea4seaqao7s73y24kcutaosvacpdjgfe5pw76ooefnyqw4ynr3d2y6x2mpq"
    ]
  }
}

| Field Name | Type | Description |
|---|---|---|
| `dgeo:cid_profile` | object | OPTIONAL. Technical details about how the CID's DAG was generated (chunking, hashing, layout, sharding). See CID Profile Object. |
| `dgeo:cid` | string | OPTIONAL. The IPFS CID that this asset represents. MUST appear in the Item's `dgeo:cids` array. Enables programmatic CID-to-asset correlation. |
| `dgeo:piece_cid` | string | OPTIONAL. The Filecoin Piece CID (commP) that this specific asset represents. When present, this CID MUST also appear in the Item's `dgeo:piece_cids` array at the properties level. |

dgeo:cid_profile

Type: Object
OPTIONAL
Technical details about how the CID's DAG was generated (chunking, hashing, layout, sharding, etc.). This metadata enables reproducible DAG reconstruction and verification workflows.
See the CID Profile Object section for detailed field definitions.

dgeo:cid

Type: String
OPTIONAL
The IPFS CID that this specific asset represents. When present, this CID MUST also appear in the Item's dgeo:cids array at the properties level.
This field solves the CID-to-asset correlation problem: after querying Items by CID, clients can programmatically identify which asset that CID corresponds to, even when the asset href uses an HTTP gateway URL.
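A client-side lookup for this correlation might look like the following sketch. It assumes the Item has been parsed into a Python dict; `assets_for_cid` and the asset keys used in the usage lines are hypothetical names chosen for illustration.

```python
def assets_for_cid(item: dict, cid: str) -> list[str]:
    """Return the keys of all assets whose dgeo:cid equals `cid`."""
    return [
        key
        for key, asset in item.get("assets", {}).items()
        if asset.get("dgeo:cid") == cid
    ]


# Usage with a hypothetical two-band Item.
cid = "bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi"
item = {
    "properties": {"dgeo:cids": [cid]},
    "assets": {
        "red": {"href": "https://example.com/red.tif", "dgeo:cid": cid},
        "nir": {"href": "https://example.com/nir.tif"},
    },
}
matching_keys = assets_for_cid(item, cid)
```

The lookup compares strings exactly, so a CIDv0 and a CIDv1 that encode the same multihash are treated as different identifiers.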
Example:
{
  "properties": {
    "dgeo:cids": ["bafybeigdyrzt..."]
  },
  "assets": {
    "red": {
      "href": "https://gateway.pinata.cloud/ipfs/bafybeigdyrzt...",
      "type": "image/tiff",
      "dgeo:cid": "bafybeigdyrzt...",
      "dgeo:cid_profile": {...}
    }
  }
}

dgeo:piece_cid

Type: String
OPTIONAL
The Filecoin Piece CID (commP) that this specific asset represents. When present, this CID MUST also appear in the Item's dgeo:piece_cids array at the properties level.
This field solves the Piece CID-to-asset correlation problem: after querying Items by Piece CID, clients can programmatically identify which asset that Piece CID corresponds to, even when the asset href uses an HTTP gateway URL.
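The two MUST rules that tie asset-level identifiers to the properties-level arrays (`dgeo:cid` to `dgeo:cids`, and `dgeo:piece_cid` to `dgeo:piece_cids`) can be verified together. The sketch below is one possible way to do so for an Item parsed into a dict; `check_asset_cid_consistency` is a hypothetical name.

```python
def check_asset_cid_consistency(item: dict) -> list[str]:
    """Report asset-level CIDs that are absent from the properties-level arrays."""
    props = item.get("properties", {})
    errors = []
    # Each asset-level field is paired with the array that must contain it.
    pairs = (("dgeo:cid", "dgeo:cids"), ("dgeo:piece_cid", "dgeo:piece_cids"))
    for asset_field, array_field in pairs:
        listed = set(props.get(array_field, []))
        for key, asset in item.get("assets", {}).items():
            value = asset.get(asset_field)
            if value is not None and value not in listed:
                errors.append(
                    f"asset {key!r}: {asset_field} {value!r} "
                    f"is missing from properties[{array_field!r}]"
                )
    return errors
```

An empty result means every asset-level identifier is discoverable through the queryable arrays, which is the property the correlation workflow relies on.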
Example:
{
  "properties": {
    "dgeo:piece_cids": ["baga6ea4seaqao7s73y24kcutaosvacpdjgfe5pw76ooefnyqw4ynr3d2y6x2mpq"]
  },
  "assets": {
    "red": {
      "href": "https://gateway.pinata.cloud/ipfs/bafybeigdyrzt...",
      "type": "image/tiff",
      "dgeo:piece_cid": "baga6ea4seaqao7s73y24kcutaosvacpdjgfe5pw76ooefnyqw4ynr3d2y6x2mpq",
      "dgeo:cid_profile": {...}
    }
  }
}

CID Profile Object

The CID Profile Object describes the parameters that affect DAG and CID generation, conceptually aligned with UnixFS and IPIP-0499 UnixFS parameters. This allows independent systems to re-chunk or verify DAGs in a reproducible way.
When cid_profile is present, the following fields are RECOMMENDED: cid_version, chunking_algorithm, dag_layout, and hash_function.
| Field Name | Type | Description |
|---|---|---|
| `cid_version` | integer | RECOMMENDED. Content Identifier (CID) version (0 or 1) specifying the format's structure and encoding. |
| `hash_function` | string | RECOMMENDED. Multihash function to use (e.g., "sha2-256"). |
| `chunking_algorithm` | string | RECOMMENDED. Algorithm used to split files into chunks (e.g., "fixed-size", "rabin"). |
| `chunk_size` | integer | OPTIONAL. Maximum size of each chunk in bytes. |
| `dag_width` | integer | OPTIONAL. Maximum number of children per node in the DAG. |
| `dag_layout` | string | RECOMMENDED. Layout of the DAG (e.g., "balanced", "balanced-packed", "trickle"). |
| `empty_directories` | boolean | OPTIONAL. Whether empty directories are included in the DAG. |
| `hamt_directory_fanout` | string | OPTIONAL. Maximum number of block entries per HAMT directory node (e.g., "256 blocks"). |
| `hamt_directory_threshold` | string | OPTIONAL. Threshold at which a directory converts to a HAMT structure. |
| `hamt_switch_comparison` | string | OPTIONAL. Comparison operator (`>=` or `>`) used when deciding to switch to a HAMT structure. |
| `leaves` | string | OPTIONAL. Determines whether file data is stored in a dag-pb-wrapped block or as raw bytes. |
| `hidden_entities` | boolean | OPTIONAL. Whether hidden entities (including dot files) are included in the DAG. |
| `symlinks` | string | OPTIONAL. Method for handling symbolic links (e.g., "preserve", "followed", "skipped"). |
| `mode_permissions` | boolean | OPTIONAL. Whether POSIX file permissions are included in the DAG. |
| `mod_time` | boolean | OPTIONAL. Whether file modification time is included in the DAG. |
JSON Schema for cid_profile uses "additionalProperties": true to allow other UnixFS/IPLD parameters in the future.
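Because the RECOMMENDED fields are not enforced by the schema, a publisher may want a lightweight review step before writing a profile. The following sketch shows one way to do that; `review_cid_profile` is a hypothetical helper, and it deliberately ignores unknown keys since the schema allows additional properties.

```python
# Fields the extension lists as RECOMMENDED for a CID profile.
RECOMMENDED_FIELDS = (
    "cid_version",
    "chunking_algorithm",
    "dag_layout",
    "hash_function",
)


def review_cid_profile(profile: dict) -> list[str]:
    """Return advisory notes for a dgeo:cid_profile object (empty if none)."""
    notes = [
        f"missing RECOMMENDED field: {name}"
        for name in RECOMMENDED_FIELDS
        if name not in profile
    ]
    version = profile.get("cid_version")
    # The field table describes cid_version as 0 or 1.
    if version is not None and version not in (0, 1):
        notes.append(f"cid_version should be 0 or 1, got {version!r}")
    return notes
```

The notes are advisory: a profile that omits RECOMMENDED fields is still valid, but it gives other systems less information for reproducing the DAG.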
Example:
{
  "dgeo:cid_profile": {
    "cid_version": 1,
    "chunking_algorithm": "fixed-size",
    "chunk_size": 262144,
    "dag_layout": "balanced",
    "hash_function": "sha2-256"
  }
}

For details on common Usage Patterns and best practices, please see the Implementation Guide.
All contributions are subject to the STAC Specification Code of Conduct. For contributions, please follow the STAC specification contributing guide. Instructions for running tests are copied here for convenience.
The same checks that run on PRs are part of the repository and can be run locally to verify that changes are valid.
To run tests locally, you'll need npm, which is a standard part of any node.js installation.
First you'll need to install everything with npm once. Just navigate to the root of this repository and on your command line run:
npm install

Then to check markdown formatting and test the examples against the JSON schema, you can run:

npm test

This will print the same output that you see online, and you can then fix your markdown or examples.
If the tests reveal formatting problems with the examples, you can fix them with:
npm run format-examples