5 changes: 4 additions & 1 deletion .github/workflows/ci_linting.yml
@@ -13,7 +13,7 @@ jobs:
steps:
- uses: actions/checkout@v5

- name: Install extra dependencies for a python 3.7.17 install
- name: Install extra dependencies for a python install
run: |
sudo apt-get update
sudo apt -y install --no-install-recommends liblzma-dev libbz2-dev libreadline-dev
@@ -26,6 +26,9 @@ jobs:

- name: reshim asdf
run: asdf reshim

- name: ensure poetry using desired python version
run: poetry env use $(asdf which python)

- name: Cache Poetry virtualenv
uses: actions/cache@v4
10 changes: 6 additions & 4 deletions .github/workflows/ci_testing.yml
@@ -13,11 +13,12 @@ jobs:
steps:
- name: Checkout code
uses: actions/checkout@v5

- name: Install extra dependencies for a python 3.7.17 install
- name: Install extra dependencies for a python install
run: |
sudo apt-get update
sudo apt -y install --no-install-recommends liblzma-dev libbz2-dev libreadline-dev

- name: Install asdf cli
uses: asdf-vm/actions/setup@v4

@@ -26,6 +27,9 @@ jobs:

- name: reshim asdf
run: asdf reshim

- name: ensure poetry using desired python version
run: poetry env use $(asdf which python)

- name: Cache Poetry virtualenv
uses: actions/cache@v4
@@ -42,7 +46,6 @@ jobs:
- name: Run pytest and coverage
run: |
export JAVA_HOME=$(asdf where java)
echo "JAVA_HOME - $JAVA_HOME"
make coverage

- name: Upload Coverage Report
@@ -54,5 +57,4 @@ jobs:
- name: Run behave tests
run: |
export JAVA_HOME=$(asdf where java)
echo "JAVA_HOME - $JAVA_HOME"
make behave
4 changes: 2 additions & 2 deletions .mise.toml
@@ -1,4 +1,4 @@
[tools]
python="3.7.17"
poetry="1.4.2"
python="3.11"
poetry="2.2"
java="liberica-1.8.0"
4 changes: 2 additions & 2 deletions .tool-versions
@@ -1,3 +1,3 @@
python 3.7.17
poetry 1.4.2
python 3.11.14
poetry 2.2.0
java liberica-1.8.0
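
Review note: the Python pin in `.tool-versions` has to stay inside the `python = ">=3.10,<3.12"` range that `pyproject.toml` declares in this PR, otherwise `poetry env use $(asdf which python)` would point Poetry at an interpreter the package rejects. A minimal sketch of that consistency check is below. It is not part of this PR; the helper names `parse_tool_versions` and `python_pin_is_supported` are hypothetical, and the file contents are inlined rather than read from disk.

```python
import re

# Contents of .tool-versions after this change (copied from the diff above).
TOOL_VERSIONS = """\
python 3.11.14
poetry 2.2.0
java liberica-1.8.0
"""

# Bounds taken from `python = ">=3.10,<3.12"` in pyproject.toml.
MIN_PYTHON = (3, 10)
MAX_PYTHON_EXCLUSIVE = (3, 12)


def parse_tool_versions(text: str) -> dict:
    """Map each tool name to its pinned version string (hypothetical helper)."""
    pins = {}
    for line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        name, version = line.split(None, 1)
        pins[name] = version.strip()
    return pins


def python_pin_is_supported(pins: dict) -> bool:
    """Return True if the pinned major.minor lies in [3.10, 3.12)."""
    match = re.match(r"(\d+)\.(\d+)", pins["python"])
    if match is None:
        return False
    major_minor = (int(match.group(1)), int(match.group(2)))
    return MIN_PYTHON <= major_minor < MAX_PYTHON_EXCLUSIVE
```

Comparing `(major, minor)` tuples avoids the string-comparison trap where `"3.7"` sorts after `"3.10"`.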
23 changes: 23 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,28 @@
## v0.2.0 (2025-11-12)

### Refactor

- ensure dve working on python 3.10
- ensure dve working on python 3.11

### BREAKING CHANGE

- Numerous typing updates that will make this codebase unusable below python 3.9

note - this does not mean the package will work on python 3.9. Minimum working version is 3.10.

### Feat

- added functionality to allow error messages in business rules t… (#8)

### Refactor

- bump pylint to work correctly with py3.11 and fix numerous linting issues

## 0.1.0 (2025-11-10)

*NB - This was previously v1.0.0 and v1.1.0 but has been rolled back into a 0.1.0 release to reflect lack of package stability.*

### Feat

- Added ability to define custom error codes and templated messages for data contract feedback messages
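
Review note on the BREAKING CHANGE entry above: the changelog does not say which typing updates were made. One common kind that fails below Python 3.9 is subscripting built-in collections in annotations (PEP 585). The sketch below assumes that is the sort of change meant; the function `group_error_codes` is hypothetical and does not come from the DVE codebase.

```python
from collections import defaultdict


# On Python 3.7/3.8 this `def` raises "TypeError: 'type' object is not
# subscriptable" at import time, because `list[...]` and `dict[...]` are
# evaluated when the function is defined. On 3.9+ it works as written.
def group_error_codes(records: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Group integer error codes by field name, preserving input order."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for field, code in records:
        grouped[field].append(code)
    return dict(grouped)
```

For example, `group_error_codes([("nhs_number", 1), ("dob", 2), ("nhs_number", 3)])` groups the two `nhs_number` codes under one key.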
11 changes: 10 additions & 1 deletion Makefile
@@ -3,7 +3,7 @@ activate = poetry run
# dev
install:
poetry lock
poetry install --with dev,test
poetry install --with dev

# dist
wheel:
@@ -27,6 +27,15 @@ coverage:
$(activate) coverage report
$(activate) coverage xml

# lint
pylint:
${activate} pylint src/

mypy:
${activate} mypy src/

lint: mypy pylint

# pre-commit
pre-commit-all:
${activate} pre-commit run --all-files
6 changes: 3 additions & 3 deletions README.md
@@ -1,6 +1,6 @@
# Data Validation Engine

The Data Validation Engine (DVE) is a configuration driven data validation library built and utilised by NHS England.
The Data Validation Engine (DVE) is a configuration driven data validation library built and utilised by NHS England. The package has been reverted from the v1.0.0 release to a 0.x version, as we feel it is not yet mature enough to be considered a 1.0.0 release. Please bear this in mind if you come across commits or references to a v1+ release while on v0.x.

As mentioned above, the DVE is "configuration driven" which means the majority of development for you as a user will be building a JSON document to describe how the data will be validated. The JSON document is known as a `dischema` file and example files can be accessed [here](./tests/testdata/). If you'd like to learn more about JSON document and how to build one from scratch, then please read the documentation [here](./docs/).

@@ -21,7 +21,7 @@ Additionally, if you'd like to contribute a new backend implementation into the

## Installation and usage

The DVE is a Python package and can be installed using `pip`. As of release v0.1.0 we currently only supports Python 3.7, with Spark version 3.2.1 and DuckDB version of 1.1.0. We are currently working on upgrading the DVE to work on Python 3.11+ and this will be made available asap with version 1.0.0 release.
The DVE is a Python package and can be installed using `pip`. Release v0.1.x only supports Python 3.7, with Spark version 3.2.1 and DuckDB version 1.1.0. We are currently working on upgrading the DVE to work on Python 3.10-3.11, and this will be made available with the v0.2.x release.

In addition to a working Python 3.7+ installation you will need OpenJDK 11 installed if you're planning to use the Spark backend implementation.

@@ -49,7 +49,7 @@ Below is a list of features that we would like to implement or have been request
| Feature | Release Version | Released? |
| ------- | --------------- | --------- |
| Open source release | 0.1.0 | Yes |
| Uplift to Python 3.11 | 1.0.0 | No |
| Uplift to Python 3.11 | 0.2.0 | Yes |
| Upgrade to Pydantic 2.0 | Not yet confirmed | No |
| Create a more user friendly interface for building and modifying dischema files | Not yet confirmed | No |

70 changes: 42 additions & 28 deletions pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "nhs_dve"
version = "0.1.0"
version = "0.2.0"
description = "`nhs data validation engine` is a framework used to validate data"
authors = ["NHS England <england.contactus@nhs.net>"]
readme = "README.md"
@@ -9,58 +9,73 @@ packages = [
]
classifiers = [
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.7",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Operating System :: OS Independent",
"Topic :: Software Development :: Libraries",
"Typing :: Typed",
]

[tool.poetry.dependencies]
python = ">=3.7.2,<3.8"
boto3 = "1.28.47" # Boto3 will no longer support Python 3.7 starting December 13, 2023
botocore = "1.31.47"
delta-spark = "1.1.0"
python = ">=3.10,<3.12"
boto3 = "1.34.162"
botocore = "1.34.162"
delta-spark = "2.4.0"
duckdb = "1.1.0" # mitigates security vuln in < 1.1.0
formulas = "1.2.4"
idna = "3.7" # Downstream dep of requests but has security vuln < 3.7
Jinja2 = "3.1.6" # mitigates security vuln in < 3.1.6
lxml = "4.9.1"
openpyxl = "3.1.0"
pandas = "1.3.5"
polars = "0.17.14"
pyarrow = "7.0.0"
pandas = "2.2.2"
polars = "0.20.14"
pyarrow = "17.0.0"
pydantic = "1.10.15" # Mitigates security vuln in < 1.10.13
pymongo = "4.6.3"
pyspark = "3.2.1"
pyspark = "3.4.4"
pytz = "2022.1"
PyYAML = "5.4"
requests = "2.31.0"
PyYAML = "6.0.3"
requests = "2.32.4" # Mitigates security vuln in < 2.31.0
schedula = "1.2.19"
sqlalchemy = "2.0.19"
typing_extensions = "4.6.2"
urllib3 = "1.26.19" # Used transiently, but has security vuln < 1.26.19
urllib3 = "2.5.0" # Mitigates security vuln in < 1.26.19
xmltodict = "0.13.0"

[tool.poetry.group.dev]
optional = true
include-groups = [
"test",
"lint"
]

[tool.poetry.group.dev.dependencies]
commitizen = "3.9.1" # latest version to support Python 3.7.17
pre-commit = "2.21.0" # latest version to support Python 3.7.17
commitizen = "4.9.1"
pre-commit = "4.3.0"

[tool.poetry.group.test]
optional = true

[tool.poetry.group.test.dependencies]
faker = "18.11.1"
behave = "1.2.6"
coverage = "6.4.3"
moto = {extras = ["s3"], version = "3.1.18"}
behave = "1.3.3"
coverage = "7.11.0"
moto = {extras = ["s3"], version = "4.0.13"}
Werkzeug = "3.0.6" # Dependency of moto which needs 3.0.6 for security vuln mitigation
mongomock = "4.1.2"
pytest = "7.4.4"
pytest-lazy-fixture = "0.6.3"
pytest = "8.4.2"
pytest-lazy-fixtures = "1.4.0" # switched from https://github.com/TvoroG/pytest-lazy-fixture as it's no longer supported
xlsx2csv = "0.8.2"

[tool.poetry.group.lint]
optional = true

[tool.poetry.group.lint.dependencies]
black = "22.6.0"
astroid = "2.11.7"
black = "24.3.0"
astroid = "2.14.2"
isort = "5.11.5"
pylint = "2.14.5"
mypy = "0.982"
pylint = "2.16.4"
mypy = "0.991"
boto3-stubs = {extras = ["essential"], version = "1.26.72"}
botocore-stubs = "1.29.72"
pandas-stubs = "1.2.0.62"
@@ -112,9 +127,8 @@ source_pkgs = [
show_missing = true

[tool.pylint]
# Can't add support for custom checker until running on Python 3.9+ again.
# init-hook = "import sys; sys.path.append('./pylint_checkers')"
# load-plugins = "check_typing_imports"
init-hook = "import sys; sys.path.append('./pylint_checkers')"
load-plugins = "check_typing_imports"

[tool.pylint.main]
# Analyse import fallback blocks. This can be used to support both Python 2 and 3
@@ -189,7 +203,7 @@ persistent = true

# Minimum Python version to use for version dependent checks. Will default to the
# version used to run pylint.
py-version = "3.7"
py-version = "3.11"

# Discover python modules and packages in the file system subtree.
# recursive =