Merged
85 commits
2d76ff8
change: don't use `or` to check for `None` arg
rmnldwg Jul 14, 2025
ff8a01d
merge: branch 'main' into 'dev'
rmnldwg Jul 23, 2025
2ba9d41
merge: branch 'dev' into 7-combining-introduces-incosistencies
rmnldwg Jul 23, 2025
2518817
remove!: old funcs to infer/combine data
rmnldwg Jul 23, 2025
f23b07f
fix!: combine mods & lvl info using probs
rmnldwg Jul 23, 2025
11c6161
test: test new combine/augment with CLB patient 17
rmnldwg Jul 23, 2025
7412c06
feat: `pass_to` to `C` objects for custom queries
rmnldwg Jul 24, 2025
d4658b7
refac: move `C` & `Q` to own module
rmnldwg Jul 24, 2025
90ef414
change!: update schema for new 2nd lvl cols
rmnldwg Jul 31, 2025
f7f41a0
fix: use spec/sens < 1 in `augment`
rmnldwg Jul 31, 2025
eef4bd3
feat: allow importing `LyDataFrame` type from root
rmnldwg Jul 31, 2025
9d52456
fix: make `LyDataFrame` importable
rmnldwg Jul 31, 2025
374b256
fix: ensure alignment during combine/augment
rmnldwg Jul 31, 2025
2a1b20c
fix: change midlvl col from `info` to `core`
rmnldwg Jul 31, 2025
6b8c41b
docs: add more info to augment/combine
rmnldwg Aug 4, 2025
229e00b
feat: add working sorting functions
rmnldwg Aug 4, 2025
4635820
feat: add convenience `.ly.enhance()` method
rmnldwg Aug 4, 2025
cebbacd
test: add basic `.ly.combine()` test
rmnldwg Aug 4, 2025
aff4c79
change: improve final sorting of tables
rmnldwg Aug 4, 2025
c89275d
test: add scripts to compare augment/combine
rmnldwg Aug 4, 2025
fce13ba
test: check one patient with specific issue
rmnldwg Aug 5, 2025
2beacec
fix: don't override super when subs unknown
rmnldwg Aug 5, 2025
dba7666
fix: join using "outer" in enhance
rmnldwg Aug 5, 2025
a2c157c
fix: `None`s due to index mismatch etc.
rmnldwg Aug 5, 2025
382aa46
test: add util doctest (though unnecessary)
rmnldwg Aug 5, 2025
9952c31
test: add some more patient-specific checks
rmnldwg Aug 5, 2025
01a62ec
fix: replace instead of update augmented cols
rmnldwg Aug 5, 2025
116a70b
test: add scripts to compare old vs new
rmnldwg Aug 5, 2025
3ac9338
docs: update some docstrings
rmnldwg Aug 6, 2025
56be73b
merge: branch '7-combining-introduces-incosistencies' into 'dev'
rmnldwg Aug 6, 2025
0d725ad
feat: add basic pydantic patient/tumor model
rmnldwg Jul 9, 2025
e7ea8ed
test: ensure basic functionality of schemas
rmnldwg Jul 9, 2025
e84f19f
feat: add schema for modalities
rmnldwg Jul 10, 2025
65fde0a
change!: rewrite validation using new schema
rmnldwg Jul 10, 2025
9d8430c
test: check another wrong patient
rmnldwg Aug 6, 2025
524eb46
fix: augment during combine for max_llh/rank
rmnldwg Aug 6, 2025
eb99daf
test: rerun comparison
rmnldwg Aug 6, 2025
631c347
change!: start using only schema for validation
rmnldwg Aug 6, 2025
2dcc940
change: update midlvl cols to new `core`
rmnldwg Aug 11, 2025
fa33dcd
feat: add working casting function
rmnldwg Aug 11, 2025
e1b76bf
test: cover casting with minimal checks
rmnldwg Aug 11, 2025
736934a
test: update schema test to use `core`, too
rmnldwg Aug 11, 2025
de658f8
fix: nicotine/pack years validator
rmnldwg Aug 12, 2025
9bd354c
test: fix schema tests
rmnldwg Aug 12, 2025
c2c591d
feat: add `cast()` to lydata accessor
rmnldwg Aug 12, 2025
82e6d00
feat: add function to write JSON schema to file
rmnldwg Aug 12, 2025
8e922a8
feat: add basic collector web UI
rmnldwg Aug 12, 2025
6828696
feat: get parsing and downloading CSV to work
rmnldwg Aug 12, 2025
d535d86
docs: add docstrings to JS code
rmnldwg Aug 13, 2025
bcc71e4
build: remove typer dependency
rmnldwg Aug 13, 2025
79c5d49
remove: put collector in lyscripts
rmnldwg Aug 13, 2025
dd874c2
merge: branch '4-formalize-dataset-schema' into '7-combining-introduc…
rmnldwg Aug 19, 2025
1d5cdf8
test: add another 2025-USZ patient to test cases
rmnldwg Aug 19, 2025
ccfd3d5
test: fix small issues causing tests to fail
rmnldwg Aug 19, 2025
66d2129
fix: use default subdivisions in `.enhance()`
rmnldwg Aug 19, 2025
3c9f569
test: ensure diff does not change again
rmnldwg Aug 19, 2025
98760fc
merge: branch '7-combining-introduces-incosistencies' into '4-formali…
rmnldwg Aug 21, 2025
220bbcc
change: expand the schema
rmnldwg Aug 21, 2025
c653c62
fix: make casting safer and better
rmnldwg Aug 21, 2025
0049197
feat: add pre-/suffixes to T/N stages in schema
rmnldwg Aug 21, 2025
6d1bc06
change: further improve validation
rmnldwg Aug 21, 2025
b18ef3e
docs: update schema & validation docstrings
rmnldwg Aug 22, 2025
9490b0b
feat: casting, validating, & enhancing during load
rmnldwg Aug 22, 2025
8163d9e
test: update to new, cast data
rmnldwg Aug 22, 2025
02f7a8e
fix: avoid pydantic's weird TypeError for pd.NaT
rmnldwg Aug 25, 2025
e59454d
fix: check central info in schema
rmnldwg Aug 25, 2025
3ba50b2
fix: call `logger.error` over `exception`
rmnldwg Aug 25, 2025
884a0d9
fix: allow MX=-1 in schema
rmnldwg Aug 25, 2025
1e22f1b
fix: allow `None` in more patient fields
rmnldwg Aug 25, 2025
a60a8af
fix: allow central & side to be `None`
rmnldwg Aug 25, 2025
c5c6ad3
fix: side may be `None` when central=`True`
rmnldwg Aug 25, 2025
93aac55
fix: make some fields robust to uppercase strings
rmnldwg Aug 25, 2025
1e6196b
docs: add new modules to sphinx
rmnldwg Aug 25, 2025
e5a08d1
merge: branch '4-formalize-dataset-schema' into 'dev'
rmnldwg Aug 26, 2025
8469a26
feat: add a `.get_tnm()` helper method
rmnldwg Aug 26, 2025
6812761
feat: fail more informatively when loading
rmnldwg Aug 26, 2025
d3b8e06
fix: allow loading from disk using custom paths
rmnldwg Aug 27, 2025
371970f
feat: add `location` to short column access
rmnldwg Aug 27, 2025
0c82338
fix: get github fetch working again
rmnldwg Aug 27, 2025
5cc6a22
test: ensure .env is loaded during all tests
rmnldwg Sep 4, 2025
b65a65f
merge: branch '10-fetching-data-should-fail-more-informatively' into …
rmnldwg Sep 4, 2025
26414d0
chore: update changelog
rmnldwg Sep 4, 2025
53bb4ad
chore: remove old vs new scripts
rmnldwg Sep 4, 2025
0912188
docs: fix old data version in README.md
rmnldwg Sep 4, 2025
a8a5f58
ci: use env var for main tests, too
rmnldwg Sep 4, 2025
3 changes: 2 additions & 1 deletion .github/workflows/tests.yml
@@ -29,10 +29,11 @@ jobs:

# Below, we first run pytest in the `tests/` folder. Because we use a `src`
# layout, this will fail if the package is not installed correctly.
- - name: Test package is installable
+ - name: Patient-specific and installation tests
run: pytest --cov=lydata --cov-config=pyproject.toml tests
env:
COVERAGE_FILE: .coverage.is_installable
+ GITHUB_TOKEN: ${{ secrets.LYCOSYSTEM_READALL }}

# Now, we execute all doctests in the `src` tree. This will NOT run with
# the installed code, but it doesn't matter, because we already know it is
2 changes: 1 addition & 1 deletion .gitignore
@@ -176,5 +176,5 @@ pyrightconfig.json
# End of https://www.toptal.com/developers/gitignore/api/python
**/_version.py

- # VS Code
+ ## VS Code
.vscode/
83 changes: 83 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,85 @@

All notable changes to this project will be documented in this file.

## [0.4.0] - 2025-09-04

### 🚀 Features

- Allow custom functions via `.pass_to()` of `C` objects
- Allow importing `LyDataFrame` type from root
- Add working sorting functions for `LyDataFrame`
- Add convenience `.ly.enhance()` method
- Add pydantic patient/tumor model
- Add schema for modalities
- Add working dtype casting function
- Add `.ly.cast()` to lydata accessor
- Add function to write JSON schema to file
- Add pre-/suffixes to T/N stages in schema
- Casting, validating, & enhancing during load
- Add a `.get_tnm()` helper method
- Fail more informatively when loading. Fixes [#10].
- Add `.ly.location` to short column access
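
The sorting functions mentioned above are not part of this excerpt. As a rough illustration only, lymph node level names could be ordered by their Roman numeral and then by sublevel letter. The sketch below is an assumption, not lydata's code: `roman_to_int` and `lnl_sort_key` are hypothetical names, and the level naming pattern (`I`, `Ia`, `IIb`, ...) is assumed.

```python
import re

# Hypothetical helper, not part of lydata: values of the Roman symbols used.
_ROMAN_VALUES = {"I": 1, "V": 5, "X": 10}


def roman_to_int(numeral: str) -> int:
    """Convert a small Roman numeral such as 'IV' to an integer."""
    total = 0
    # Pair every symbol with its successor; a blank marks the end.
    for symbol, following in zip(numeral, numeral[1:] + " "):
        value = _ROMAN_VALUES[symbol]
        # A smaller symbol in front of a larger one is subtracted (IV = 4).
        if following != " " and value < _ROMAN_VALUES[following]:
            total -= value
        else:
            total += value
    return total


def lnl_sort_key(name: str) -> tuple[int, str]:
    """Sort by numeric level first, then by sublevel letter ('' sorts first)."""
    match = re.fullmatch(r"([IVX]+)([a-z]?)", name)
    if match is None:
        raise ValueError(f"Not a recognised level name: {name!r}")
    numeral, sublevel = match.groups()
    return roman_to_int(numeral), sublevel


levels = ["IV", "IIb", "Ia", "III", "II", "I", "IIa", "V", "Ib"]
sorted_levels = sorted(levels, key=lnl_sort_key)
print(sorted_levels)
```

A tuple key keeps a superlevel such as `II` directly in front of its sublevels `IIa` and `IIb`, which a plain alphabetical sort would not guarantee.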

### 🐛 Bug Fixes

- [**breaking**] Combine mods & lvl info using probabilities over likelihoods
- Use spec/sens < 1 in `augment`
- Make `LyDataFrame` importable
- Ensure alignment of columns during combine/augment
- Change mid-level column from `info` to `core`
- Don't override superlevel when sublevels unknown
- Join using "outer" in `.ly.enhance()`
- Avoid `None`s due to index mismatch etc.
- Replace instead of update augmented columns
- Augment during combine for max_llh/rank
- Use default subdivisions in `.ly.enhance()`
- Make casting safer and better
- Avoid pydantic's weird `TypeError` for `pd.NaT`
- Check central info in schema
- Call `logger.error` over `exception`
- Allow MX=-1 in schema
- Allow `None` in more patient fields
- Side may be `None` when central=`True`
- Make some fields robust to uppercase strings
- Allow loading from disk using custom paths
- Get github fetch working again
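
The first two fixes concern how findings from several diagnostic modalities are reconciled. The package's implementation is not shown in this excerpt, so the following is only a minimal sketch of the general idea, assuming independent modalities and a flat prior. `Modality`, `prob_involved`, and all sensitivity/specificity values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Modality:
    """Hypothetical container for a modality's error rates (not lydata's API)."""

    name: str
    sensitivity: float  # P(positive finding | level involved)
    specificity: float  # P(negative finding | level healthy)


def prob_involved(findings, prior=0.5):
    """Posterior probability of involvement given (modality, finding) pairs.

    A finding of None (level not examined) carries no information and is skipped.
    """
    weight_involved = prior
    weight_healthy = 1.0 - prior
    for modality, finding in findings:
        if finding is None:
            continue
        if finding:
            weight_involved *= modality.sensitivity
            weight_healthy *= 1.0 - modality.specificity
        else:
            weight_involved *= 1.0 - modality.sensitivity
            weight_healthy *= modality.specificity
    # Normalise so that the two hypotheses sum to one.
    return weight_involved / (weight_involved + weight_healthy)


# Illustrative values only, not taken from lydata.
ct = Modality("CT", sensitivity=0.8, specificity=0.8)
pathology = Modality("pathology", sensitivity=0.99, specificity=0.99)

posterior = prob_involved([(ct, True), (pathology, False)])
print(f"{posterior:.3f}")
```

With these numbers the more reliable negative pathology finding outweighs the positive CT finding, so the posterior ends up well below 0.5. In this sketch, two conflicting findings that both have sensitivity and specificity of exactly 1 would make both weights zero and the normalisation would divide by zero, which may be related to the "spec/sens < 1" fix, though that is a guess.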

### 💼 Other

- Don't use `or` to check for `None` arg
- [**breaking**] Remove old functions to infer/combine data
- Move `C` & `Q` to own module
- [**breaking**] Update schema for new 2nd lvl cols
- Improve final sorting of tables
- [**breaking**] Rewrite validation using new schema
- [**breaking**] Start using only pydantic schema for validation
- Update mid-level cols to new `core`
- Remove typer dependency
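
The first entry in this list refers to a common Python pitfall: `arg or default` replaces every falsy argument, not only `None`. The affected function is not shown in this excerpt, so the sketch below uses hypothetical function names and a made-up default to show the difference.

```python
# Made-up default, used only for this illustration.
DEFAULT_SUBDIVISIONS = {"II": ["a", "b"]}


def subdivisions_with_or(subdivisions=None):
    """Pitfall: an explicitly passed empty dict is falsy and gets replaced."""
    return subdivisions or DEFAULT_SUBDIVISIONS


def subdivisions_with_is_none(subdivisions=None):
    """Only fall back to the default when no argument was given."""
    if subdivisions is None:
        subdivisions = DEFAULT_SUBDIVISIONS
    return subdivisions


print(subdivisions_with_or({}))       # the empty dict is silently discarded
print(subdivisions_with_is_none({}))  # the empty dict is respected
```

The explicit `is None` check matters whenever an empty container, `0`, or `False` is a meaningful argument value.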

### 📚 Documentation

- Add more info to augment/combine
- Update some docstrings
- Add docstrings to JS code
- Update schema & validation docstrings
- Add new modules to sphinx

### 🧪 Testing

- Test new combine/augment with CLB patient 17
- Add basic `.ly.combine()` test
- Add scripts to compare augment/combine
- Check one patient with specific issue
- Add util doctest (though unnecessary)
- Add some more patient-specific checks
- Ensure basic functionality of schemas
- Cover casting with minimal checks
- Update schema test to use `core`, too
- Add another 2025-USZ patient to test cases
- Fix small issues causing tests to fail
- Update to new, cast data
- Ensure .env is loaded during all tests

## [0.3.3] - 2025-07-22

### 🚀 Features
@@ -301,6 +380,9 @@ Initial implementation of the lyDATA library.
<!-- generated by git-cliff -->
<!-- markdownlint-disable-file MD024 -->

[0.4.0]: https://github.com/lycosystem/lydata-package/compare/0.3.3..0.4.0
[0.3.3]: https://github.com/lycosystem/lydata-package/compare/0.3.2..0.3.3
[0.3.2]: https://github.com/lycosystem/lydata-package/compare/0.3.1..0.3.2
[0.3.1]: https://github.com/lycosystem/lydata-package/compare/0.3.0..0.3.1
[0.3.0]: https://github.com/lycosystem/lydata-package/compare/8ae13..0.3.0
[0.2.5]: https://github.com/lycosystem/lydata/compare/0.2.4..0.2.5
@@ -321,3 +403,4 @@ Initial implementation of the lyDATA library.
[#4]: https://github.com/lycosystem/lydata/issues/4
[#13]: https://github.com/lycosystem/lydata/issues/13
[#5]: https://github.com/lycosystem/lydata-package/issues/5
[#10]: https://github.com/lycosystem/lydata-package/issues/10
5 changes: 5 additions & 0 deletions conftest.py
@@ -0,0 +1,5 @@
"""Pytest configuration and fixtures for lydata tests."""

from dotenv import load_dotenv

load_dotenv()
7 changes: 7 additions & 0 deletions docs/source/augmentor.rst
@@ -0,0 +1,7 @@
.. currentmodule:: lydata.augmentor

Enhancing and Augmenting Datasets
=================================

.. automodule:: lydata.augmentor
:members:
3 changes: 3 additions & 0 deletions docs/source/index.md
@@ -9,7 +9,10 @@
:maxdepth: 2

accessor
augmentor
loader
querier
schema
utils
validator
:::
7 changes: 7 additions & 0 deletions docs/source/querier.rst
@@ -0,0 +1,7 @@
.. currentmodule:: lydata.querier

Efficient and Reusable DataFrame Queries
========================================

.. automodule:: lydata.querier
:members:
7 changes: 7 additions & 0 deletions docs/source/schema.rst
@@ -0,0 +1,7 @@
.. currentmodule:: lydata.schema

Formal Definition of a Patient Record
=====================================

.. automodule:: lydata.schema
:members:
4 changes: 2 additions & 2 deletions docs/source/validator.rst
@@ -1,7 +1,7 @@
.. currentmodule:: lydata.validator

- Pandera Schemas to Validate Datasets
- ====================================
+ Type Casting and Validation
+ ===========================

.. automodule:: lydata.validator
:members:
5 changes: 5 additions & 0 deletions pyproject.toml
@@ -28,6 +28,7 @@ dependencies = [
"pandera",
"pydantic",
"loguru",
"roman",
]

[project.urls]
@@ -44,6 +45,7 @@ docs = [
tests = [
"pytest",
"pytest-cov",
"python-dotenv>=1.1.1",
]
dev = [
"pre-commit",
@@ -67,6 +69,9 @@ exclude = ["docs"]
select = ["E", "F", "W", "B", "C", "R", "U", "D", "I", "S", "T", "A", "N"]
ignore = ["B028", "N816", "E712"]

[tool.ruff.lint.per-file-ignores]
"tests/*" = ["S101"]

[tool.uv]
package = true

11 changes: 6 additions & 5 deletions src/lydata/__init__.py
@@ -3,27 +3,28 @@
from loguru import logger

import lydata._version as _version
from lydata.accessor import C, Q
from lydata.accessor import LyDataFrame
from lydata.loader import (
available_datasets,
load_datasets,
)
from lydata.utils import infer_and_combine_levels
from lydata.validator import validate_datasets
from lydata.querier import C, Q
from lydata.validator import is_valid

__author__ = "Roman Ludwig"
__email__ = "roman.ludwig@usz.ch"
__uri__ = "https://github.com/lycosystem/lydata"
__version__ = _version.__version__

__all__ = [
"LyDataFrame",
"accessor",
"Q",
"C",
"available_datasets",
"load_datasets",
"validate_datasets",
"infer_and_combine_levels",
"is_valid",
]

logger.disable("lydata")
logger.remove()