permutation tests #43
- saved processed dataset to example_data
- removed data cleaning code from notebook
- added descriptions and permutation explanation in markdown
enryH
left a comment
I will test it now.
| @@ -0,0 +1,53 @@ | |||
| # preprocessing | |||
I would rename this to something like compositional.py. Ideally it would not be specific to microbiome, but we can group it here for now. Maybe you could move the file to transform/compositional.py?
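For context on why the module is not microbiome-specific: the overview further down lists CLR transformation functions for compositional data in this file. Below is a minimal sketch of a centred log-ratio (CLR) transform for a single composition. The function name `clr` and the strict-positivity check are assumptions for illustration, not the code in this PR.

```python
import math


def clr(composition):
    """Centred log-ratio transform of one composition (hypothetical helper).

    Each part is log-transformed and the mean of the logs is subtracted,
    which is the same as dividing by the geometric mean before taking logs.
    """
    if any(x <= 0 for x in composition):
        # Zeros need a pseudocount or imputation first; that choice is left open here.
        raise ValueError("CLR requires strictly positive parts")
    logs = [math.log(x) for x in composition]
    centre = sum(logs) / len(logs)
    return [value - centre for value in logs]
```

Nothing in the transform depends on the parts being taxa, so it applies to any compositional vector, and the transformed values sum to zero up to floating-point error.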
| @@ -0,0 +1,25 @@ | |||
| eff_id,inf_id,sampling_location,sampling_read,eff_abundance,inf_abundance | |||
Could you add a README.md to the folder with a brief link to where this data originates? And, if it is a subset, maybe a small preprocessing script?
Pull Request Overview
This PR adds comprehensive permutation testing functionality to the acore library, including implementations for paired and independent sample tests, chi-squared tests, and Jaccard similarity calculations. The changes also include documentation, examples, and minor code quality improvements.
- Implements three main permutation test functions: `paired_permutation`, `indep_permutation`, and `chi2_permutation`
- Adds Jaccard similarity calculations for graph comparison
- Includes comprehensive test suite and tutorial documentation with real-world metagenomics example
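To make the idea behind a paired permutation test concrete, here is a minimal standalone sketch using random sign flips of the paired differences. It is an illustration under stated assumptions and not the `paired_permutation` implementation from this PR: the function name, its parameters, the mean-difference statistic, and the add-one correction are all choices made here.

```python
import random
from statistics import mean


def paired_sign_flip_pvalue(before, after, n_permutations=10_000, seed=0):
    """Two-sided p-value for the mean paired difference (hypothetical helper)."""
    if len(before) != len(after):
        raise ValueError("paired samples must have equal length")
    rng = random.Random(seed)  # local generator, so results are reproducible
    diffs = [a - b for a, b in zip(after, before)]
    observed = abs(mean(diffs))
    hits = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the labels within a pair are exchangeable,
        # so each difference keeps or flips its sign with probability 0.5.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_permutations + 1)
```

An independent-samples variant would shuffle the pooled group labels instead of flipping signs.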
Reviewed Changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 17 comments.
| File | Description |
|---|---|
| src/acore/permutation_test/__init__.py | Core permutation test implementations with support for multiple metrics |
| src/acore/permutation_test/internal_functions.py | Helper functions for permutation and contingency table operations |
| src/acore/permutation_test/jaccard.py | Jaccard similarity calculation functions for set comparison |
| src/acore/types/permutation_test.py | Pydantic model for permutation test results |
| tests/test_permutation_test.py | Comprehensive test suite for permutation test functions |
| docs/api_examples/permutation_testing.py | Tutorial demonstrating permutation testing on metagenomics data |
| docs/api_examples/permutation_testing.ipynb | Jupyter notebook version of the tutorial |
| docs/index.rst | Updated documentation index to include permutation testing tutorial |
| src/acore/microbiome/internal_functions.py | New CLR transformation functions for compositional data |
| example_data/mgnify/Ju2018_GO0017001_enf_inf_paired.csv | Example dataset for tutorial |
| src/acore/multiple_testing/__init__.py | Code style fix: changed membership test to use the `not in` operator |
| src/acore/enrichment_analysis/__init__.py | Code style fix: changed membership tests to use the `not in` operator |
| src/acore/decomposition/umap.py | Added blank line for formatting consistency |
| src/acore/io/uniprot/uniprot.py | Removed extra blank lines |
| CONTRIBUTING.rst | Fixed installation command quoting and corrected PR template path |
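For readers unfamiliar with the Jaccard similarity mentioned for `jaccard.py`, a minimal sketch of the set-based definition follows. The function name and the convention for two empty sets are assumptions made here; the functions in the PR may differ.

```python
def jaccard_similarity(a, b):
    """Size of the intersection divided by size of the union (hypothetical helper)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Two empty sets: treated as identical here; this is a convention, not a given.
        return 1.0
    return len(set_a & set_b) / len(union)


# Comparing two graphs by their edge sets, with edges stored as sorted tuples.
edges_g1 = {("a", "b"), ("b", "c"), ("c", "d")}
edges_g2 = {("a", "b"), ("b", "c"), ("a", "d")}
similarity = jaccard_similarity(edges_g1, edges_g2)  # 2 shared edges out of 4 distinct
```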
| # ## Data preparation details | ||
| # | ||
| # ### Downloading | ||
| # The analysed samples were downloaded via the [MGnify API](https://www.ebi.ac.uk/metagenomics/api/docs/). The inffluent (INF) and effluent (EFFF) datasets have paired samples and we also needed to download the sample metadata (also available via Mgnify API) to assign the correct pairing. |
Copilot
AI
Nov 7, 2025
Multiple spelling errors: "inffluent" should be "influent" and "EFFF" should be "EFF".
| "## Data preparation details\n", | ||
| "\n", | ||
| "### Downloading\n", | ||
| "The analysed samples were downloaded via the [MGnify API](https://www.ebi.ac.uk/metagenomics/api/docs/). The inffluent (INF) and effluent (EFFF) datasets have paired samples and we also needed to download the sample metadata (also available via Mgnify API) to assign the correct pairing.\n", |
Copilot
AI
Nov 7, 2025
Multiple spelling errors: "inffluent" should be "influent" and "EFFF" should be "EFF".
| def random_choice_equal(): | ||
| return np.random.choice( | ||
| ["A", "B", "C", "D"], size=100, replace=True, p=[0.25, 0.25, 0.25, 0.25] | ||
| ) | ||
| @pytest.fixture | ||
| def random_choice_unequal(): | ||
| return np.random.choice( | ||
| ["A", "B", "C", "D"], size=100, replace=True, p=[0.1, 0.25, 0.05, 0.6] | ||
| ) |
Copilot
AI
Nov 7, 2025
Test fixtures that use random number generation without a fixed seed can lead to non-reproducible test failures. Consider using np.random.default_rng(seed=...) or np.random.seed() to ensure reproducibility.
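One way to follow this suggestion is sketched below, with the two fixtures sharing a small helper that builds a locally seeded generator. The helper name `draw_choices`, the constant names, and the seed value 42 are assumptions for illustration and are not part of the PR.

```python
import numpy as np
import pytest

CATEGORIES = ["A", "B", "C", "D"]
SEED = 42  # arbitrary fixed value (assumption); any constant gives reproducibility


def draw_choices(probabilities, seed=SEED, size=100):
    """Draw a categorical sample from a freshly seeded generator (hypothetical helper)."""
    # A new Generator per call avoids touching NumPy's global random state.
    rng = np.random.default_rng(seed)
    return rng.choice(CATEGORIES, size=size, replace=True, p=probabilities)


@pytest.fixture
def random_choice_equal():
    return draw_choices([0.25, 0.25, 0.25, 0.25])


@pytest.fixture
def random_choice_unequal():
    return draw_choices([0.1, 0.25, 0.05, 0.6])
```

Because each fixture seeds its own generator, the samples do not depend on test execution order, which would not hold with a single `np.random.seed()` call at module import.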
| from acore import permutation_test as pt | ||
| import pytest | ||
| import numpy as np | ||
| from scipy.stats import ttest_rel, ttest_ind, mannwhitneyu, wilcoxon |
Copilot
AI
Nov 7, 2025
Import of 'ttest_ind' is not used.
Import of 'mannwhitneyu' is not used.
Import of 'wilcoxon' is not used.
| from scipy.stats import ttest_rel, ttest_ind, mannwhitneyu, wilcoxon | |
| from scipy.stats import ttest_rel |
@angelphanth Definitely 'squash and merge' this PR.
Co-authored-by: Henry Webel <heweb@dtu.dk>
mainly typos
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@angelphanth I merged main, where some documentation files were updated (see #47).
Description
Add a subpackage (as new model) to core.
Tasks Checklist
- … `__init__.py` in the new folder, so that they are available when the subpackage is imported.
- Create Pandera schema in a file with subpackage name in the `src/acore/types` folder. Optimal is to have only one output schema of results per subpackage or module.
- … `data` folder, or reuse an existing one for testing
- … `docs/api_examples` folder with that data
- … `index.rst` file in the `docs` folder with the new example
- … `/test` folder with the name of the subpackage or module using pytest or unittests to test your new functionality.