Darwincore

Generator for extended/event-based DarwinCore-Archive (DwC-A) files.

Usage

This DwC-A generator converts a text file containing data and metadata to a DwC-A file. At the moment, only text files embedded in zip files in the SHARK data format are supported.

The extended DwC-A format is used: the event table sits at the center of the mandatory star schema, and the extendedmeasurementorfact (eMoF) extension is allowed to reference both the event and the occurrence tables.
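The relationship between the three tables can be sketched in a few lines of Python. This is only an illustration of the star schema described above, not code from this repository: the row values, the identifiers `EV1` and `OC1`, and the helper `resolve` are all hypothetical.

```python
# Hypothetical rows; identifiers and values are made up for illustration.
events = {
    "EV1": {"eventID": "EV1", "eventDate": "2020-05-01"},
}
occurrences = {
    "OC1": {"occurrenceID": "OC1", "eventID": "EV1", "scientificName": "Mytilus edulis"},
}
emof = [
    # Measurement that describes the event itself (no occurrenceID).
    {"eventID": "EV1", "occurrenceID": None,
     "measurementType": "Water temperature", "measurementValue": "8.4"},
    # Measurement that describes one occurrence within that event.
    {"eventID": "EV1", "occurrenceID": "OC1",
     "measurementType": "Abundance", "measurementValue": "12"},
]


def resolve(row):
    """Return ("occurrence", record) or ("event", record) for an eMoF row."""
    occ_id = row.get("occurrenceID")
    if occ_id:
        return "occurrence", occurrences[occ_id]
    return "event", events[row["eventID"]]


# Which table each eMoF row points at, in order.
targets = [resolve(row)[0] for row in emof]
```

Every eMoF row carries an eventID, so the event table remains the core; the occurrenceID is filled in only when the measurement belongs to a specific occurrence.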

The mapping between the input data file and DwC-A is controlled by a number of YAML files located in the directory dwca_config.
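As a rough idea of what such a mapping does, the sketch below renames source columns to Darwin Core terms. It is an assumption-laden illustration: the layout of the real YAML files in dwca_config is not shown here and may differ, the source column names are hypothetical, and a plain dict stands in for the parsed YAML.

```python
# Stand-in for a parsed YAML mapping. The real files in dwca_config may be
# organised differently; the source column names here are hypothetical.
FIELD_MAPPING = {
    "sample_date": "eventDate",
    "scientific_name": "scientificName",
    "sample_latitude_dd": "decimalLatitude",
}


def map_row(source_row, mapping):
    """Rename mapped source columns to Darwin Core terms; drop unmapped ones."""
    return {
        dwc_term: source_row[source_column]
        for source_column, dwc_term in mapping.items()
        if source_column in source_row
    }


row = {"sample_date": "2020-05-01", "scientific_name": "Mytilus edulis", "station": "X1"}
mapped = map_row(row, FIELD_MAPPING)
```

Here the unmapped `station` column is dropped and the two mapped columns come out under their Darwin Core names.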

Installation

Darwincore is installed using uv. Follow the instructions at https://docs.astral.sh/uv/ to install uv.

With uv, you can run darwincore directly without installing first.

$ uv run dwca-generator-main

Virtual environment

You can also run from a virtual environment. First you need to initialize the virtual environment.

$ uv venv

Activate the venv (Mac/Linux). By default, uv venv creates the environment in the directory .venv:

$ source .venv/bin/activate

Activate the venv (Windows):

$ .venv\Scripts\activate

When the environment is activated, you can run the scripts using python:

(darwincore) $ python dwca_generator_cli.py

Usage

Before running darwincore, add zipped datasets to the directory data_in/datasets.
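To check what is in place before a run, a small helper like the one below can list the zipped datasets and the text files inside them. This is a sketch and not part of darwincore: the function name `list_datasets` is hypothetical, and it assumes that the embedded text files use a .txt suffix.

```python
from pathlib import Path
import zipfile


def list_datasets(directory="data_in/datasets"):
    """Return {zip file name: [text member names]} for each valid zip in directory."""
    found = {}
    for path in sorted(Path(directory).glob("*.zip")):
        if not zipfile.is_zipfile(path):
            continue  # skip files that only carry a .zip suffix
        with zipfile.ZipFile(path) as archive:
            # Assumption: the embedded data and metadata files end in .txt.
            found[path.name] = [
                name for name in archive.namelist() if name.endswith(".txt")
            ]
    return found
```

Calling `list_datasets()` from the project root returns an empty dict if no zipped datasets have been added yet.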

darwincore-generator-main

Run all configurations:

$ uv run dwca-generator-main

darwincore-generator-cli

CLI for choosing which configuration to run.

$ uv run dwca-generator-cli

put-metadatafile-to-yamr

Post yame JSON metadata files to an existing metadata record in yamr.

$ uv run put-metadatafile-to-yamr

publish-to-yamr

Post from yame test to yamr prod.

$ uv run publish-to-yamr

Development

Testing

Run all tests:

$ uv run pytest

Adding dependencies

Add project dependencies:

$ uv add <name-of-dependency>

Add developer dependencies:

$ uv add --dev <name-of-dependency>

Formatting and linting

The project is configured to use ruff for both formatting and linting. Specific rulesets are configured in pyproject.toml.

Run formatting for all files:

$ uv run ruff format

Run formatting for a specific file or directory:

$ uv run ruff format <path>

Run linting of code:

$ uv run ruff check

pre-commit

Optionally, you can activate pre-commit, which automatically runs formatting and linting on everything you commit.

Initialize it once:

$ uv run pre-commit install

After this, a commit will fail if there are formatting or linting errors in the committed files. For formatting errors, the fix is applied to the files, but you must accept the changes by adding the affected files to the commit again.

To skip this step for a specific commit (e.g. you just want to store work in progress) you can use the --no-verify flag in git.

$ git commit --no-verify

Contact

shark@smhi.se
