From cbd99ab10960062be58dcdc736958412d95a4eb0 Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Thu, 27 Nov 2025 10:20:31 -0800 Subject: [PATCH 01/87] remove excel section --- book/workflows.md | 19 ++++++------------- 1 file changed, 6 insertions(+), 13 deletions(-) diff --git a/book/workflows.md b/book/workflows.md index 7955729..1717223 100644 --- a/book/workflows.md +++ b/book/workflows.md @@ -5,26 +5,19 @@ https://workflowhub.eu/ -### Russ's First Law of Scientific Data Management + - use narps as the working example -> "Don't use spreadsheets to manage scientific data." + +## Simple workflow management with Makefiles -In this chapter I will talk in detail about best practices for data management, but I start by discussing a data management "anti-pattern", which is the use of spreadsheets for data management. Spreadsheet software such as Microsoft Excel is commonly used by researchers for all sorts of data management and processing operations. Why are spreadsheets problematic? -- They encourage manual manipulation of the data, which makes the operations non-reproducible by definition. -- Spreadsheet tools will often automatically format data, sometimes changing things in important but unwanted ways. For example, gene names such as "SEPT2" and "MARCH1" are converted to dates by Microsoft Excel, and some accession numbers (e.g. "2310009E13") are converted to floating point numbers. An analysis of published genomics papers {cite:p}`Ziemann:2016aa` found that roughly twenty percent of supplementary gene lists created using Excel contained errors in gene names due to these conversions. -- It is very easy to make errors when performing operations on a spreadsheet, and these errors can often go unnoticed. A well known example occurred in the paper ["Growth in the time of debt"](https://www.nber.org/papers/w15639) by the prominent economists Carmen Reinhart and Kenneth Rogoff. 
This paper claimed to have found that high levels of national debt led to decreased economic growth, and was used as a basis for promoting austerity programs after the 2008 financial crisis. However, [researchers subsequently discovered](https://academic.oup.com/cje/article/38/2/257/1714018) that the authors had made an error in their Excel spreadsheet, excluding data from several countries; when the full data were used, the relationship between growth and debt became much weaker. -- Spreadsheet software can sometimes have limitations that can cause problems. For example, the use of an outdated Microsoft Excel file format (.xls) [caused underreporting of COVID-19 cases](https://www.bbc.com/news/technology-54423988) due to limitations on the number of rows in that file format, and the lack of any warnings when additional rows in the imported data files were ignored. -- Spreadsheets do not easily lend themselves to version control and change tracking, although some spreadsheet tools (such as Google Sheets) do provide the ability to clearly label versions of the data. +## Workflow management systems for complex workflows -I will occasionally use Microsoft Excel to examine a data file, but I think that spreadsheet tools should *never* be used as part of a scientific data workflow. 
+## Building a workflow management system from scratch -## Simple workflow management with Makefiles - -## Workflow management systems for complex workflows +## Tracking provenance -## Building a workflow management system from scratch From 5887fa597ac9e638aacce8d4f041aae4afa3031c Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Fri, 28 Nov 2025 09:40:56 -0800 Subject: [PATCH 02/87] initial section on desiderata --- book/workflows.md | 122 +++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 120 insertions(+), 2 deletions(-) diff --git a/book/workflows.md b/book/workflows.md index 1717223..1b56b8e 100644 --- a/book/workflows.md +++ b/book/workflows.md @@ -1,5 +1,85 @@ # Workflow Management +In most parts of science today, data processing and analysis comprises many different steps. We will refer to such a set of steps as a computational *workflow* (or, interchangeably, *pipeline*). If you have been doing science for very long, you have very likely encountered a *mega-script* that implements such a workflow. Usually written in a scripting language like *Bash*, this is a script that may be hundreds or even thousands of lines long that runs a single workflow from start to end. Often these scripts are handed down to new trainees over generations, such that users become afraid to make any changes lest the entire house of cards comes crashing down. I think that most of us can agree that this is not an optimal workflow, and in this chapter I will discuss in detail how to move from a mega-script to a workflow that meets all of the requirements needed to provide robust and reliable answers to our scientific questions. + +## What do we want from a scientific workflow? + +First let's ask: What do we want from a computational scientific workflow? Here are some of the factors that I think are important. 
First, we care about the *correctness* of the workflow, which includes the following factors: + +- *Verifiability*: The workflow includes validation procedures to guard against known problems or edge cases. +- *Reproducibility*: The workflow can be rerun from scratch on the same data and give the same answer, at least within the limits of uncontrollable factors such as floating-point imprecision. +- *Robustness*: When there is a problem, the workflow fails quickly with explicit error messages, or degrades gracefully when possible. + +Second, we care about the *usability* of the workflow. Factors related to usability include: + +- *Configurability*: The workflow uses smart defaults, but allows the user to easily change the configuration in a way that is traceable. +- *Parameterizability*: Multiple runs of the workflow can be executed with different parameters, and the separate outputs can be tracked. +- *Standards compliance*: The workflow leverages common standards to easily read in data and generates output using community standards for file formats and organization when available. + +Third, we care about the *engineering quality* of the code, which includes: + +- *Maintainability*: The workflow is structured and documented so that others (including your future self) can easily maintain, update, and extend it. +- *Modularity*: The workflow is composed of a set of independently testable modules, which can be swapped in or out relatively easily. +- *Idempotency*: This term from computer science means that running the workflow multiple times gives the same result as running it once, which allows safely rerunning the workflow when there is a failure. +- *Traceability*: All operations are logged, and provenance information is stored for outputs. + +Finally, we care about the *efficiency* of the workflow implementation. This includes: + +- *Incremental execution*: The workflow only reruns a module if necessary, such as when an input changes. 
+- *Amortized computation*: The workflow pre-computes and reuses results from expensive operations when possible. + +It's worth noting that these different desiderata will sometimes conflict with one another (such as configurability versus maintainability), and that no workflow will be perfect. + + + +### Breaking a workflow into stages + +good breakpoints between workflow modules include: + +- conceptual logic - different stages do different things +- points where one might need to restart the computation (e.g. due to computational cost) +- sections where one might wish to swap in a new method +- points where the output could be reusable elsewhere + +the workflow should be stateless when possible + +- allows each stage to be run independently + +but sometimes state is required + +- e.g. when training a neural network, one needs to know where one is in the process + + + +## Modularity and reusability + +- separate analysis logic from workflow orchestration +- analysis modules should be tested (e.g. 
with synthetic data) + + + +## Idempotency + +- running it multiple times should give the same answer as running it once +- + +## Precomputing expensive/common operations + + +## Tracking provenance + + + +### Configuration management + +- how to configure a workflow + + - configuration files + - command line arguments + - defaults + +- interaction with provenance + - discuss fit-transform model somewhere https://workflowhub.eu/ @@ -7,17 +87,55 @@ https://workflowhub.eu/ - use narps as the working example + +## Error handling and robustness + +- Fail fast +- Gracefully handle missing data +- Checkpointing for long-running workflows +- write tests for common edge cases + - use a small toy dataset for testing +- unit vs integration tests + + +## Logging + + ## Simple workflow management with Makefiles + +## Python workflow management with checkpoints + + + ## Workflow management systems for complex workflows +- introduce DAGs -## Building a workflow management system from scratch + +- general purpose vs domain specific + - overview various engines -## Tracking provenance + +## Scaling workflows + +- maybe leave this to the HPC chapter? + + +## FAIR-inspired practices for workflows + - FAIR workflows + - https://pmc.ncbi.nlm.nih.gov/articles/PMC10538699/ + - https://www.nature.com/articles/s41597-025-04451-9 + - this seems really heavyweight. 
+ - 80/20 approach to reproducible workflows + - version control + documentation + - requirements file or container + - clear workflow structure + - standard file formats + - The full FAIR approach may be necessary in some contexts + + From 18e0e9a9da0e9c638078f4f80f13b86945b7068e Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Sat, 29 Nov 2025 09:08:52 -0800 Subject: [PATCH 03/87] initial add --- tests/narps/test_bids.py | 267 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 267 insertions(+) create mode 100644 tests/narps/test_bids.py diff --git a/tests/narps/test_bids.py b/tests/narps/test_bids.py new file mode 100644 index 0000000..40f0540 --- /dev/null +++ b/tests/narps/test_bids.py @@ -0,0 +1,267 @@ +"""Tests for BIDS utility functions.""" + +import pytest +from pathlib import Path +import tempfile +import shutil + +from BetterCodeBetterScience.narps.bids_utils import ( + parse_bids_filename, + find_bids_files, +) + + +# Tests for parse_bids_filename + + +def test_parse_bids_filename_basic_string(): + """Test parsing a basic BIDS filename from a string.""" + filename = "sub-01_task-rest_bold.nii.gz" + result = parse_bids_filename(filename) + + assert result['sub'] == '01' + assert result['task'] == 'rest' + assert result['suffix'] == 'bold' + assert 'path' in result + assert result['path'] == filename + + +def test_parse_bids_filename_path_object(): + """Test parsing a BIDS filename from a Path object.""" + filepath = Path("/data/sub-02_ses-01_T1w.nii.gz") + result = parse_bids_filename(filepath) + + assert result['sub'] == '02' + assert result['ses'] == '01' + assert result['suffix'] == 'T1w' + assert result['path'] == str(filepath) + + +def test_parse_bids_filename_complex(): + """Test parsing a complex BIDS filename with multiple entities.""" + filename = "sub-01_ses-02_task-rest_acq-mb_run-01_bold.nii.gz" + result = parse_bids_filename(filename) + + assert result['sub'] == '01' + assert result['ses'] == '02' + assert result['task'] == 'rest' 
+ assert result['acq'] == 'mb' + assert result['run'] == '01' + assert result['suffix'] == 'bold' + + +def test_parse_bids_filename_with_hyphen_in_value(): + """Test parsing when a value contains a hyphen.""" + filename = "sub-control-01_task-go-nogo_bold.nii" + result = parse_bids_filename(filename) + + # Should split only on first hyphen + assert result['sub'] == 'control-01' + assert result['task'] == 'go-nogo' + assert result['suffix'] == 'bold' + + +def test_parse_bids_filename_narps_format(): + """Test parsing NARPS-specific filename format.""" + filename = "sub-team01_hyp-1_type-thresh_desc-orig_stat.nii.gz" + result = parse_bids_filename(filename) + + assert result['sub'] == 'team01' + assert result['hyp'] == '1' + assert result['type'] == 'thresh' + assert result['desc'] == 'orig' + assert result['suffix'] == 'stat' + + +def test_parse_bids_filename_with_full_path(): + """Test parsing with a full absolute path.""" + filepath = Path("/home/user/data/sub-03_T2w.nii.gz") + result = parse_bids_filename(filepath) + + assert result['sub'] == '03' + assert result['suffix'] == 'T2w' + assert result['path'] == str(filepath) + # Should only parse the filename, not the directory + assert 'home' not in result + assert 'user' not in result + + +def test_parse_bids_filename_minimal(): + """Test parsing a minimal BIDS filename.""" + filename = "sub-01_bold.nii" + result = parse_bids_filename(filename) + + assert result['sub'] == '01' + assert result['suffix'] == 'bold' + assert len(result) == 3 # sub, suffix, path + + +def test_parse_bids_filename_no_extension(): + """Test parsing a filename without extension.""" + filename = "sub-01_task-rest_bold" + result = parse_bids_filename(filename) + + assert result['sub'] == '01' + assert result['task'] == 'rest' + assert result['suffix'] == 'bold' + + +# Tests for find_bids_files + + +@pytest.fixture +def temp_bids_dir(): + """Create a temporary BIDS-like directory structure for testing.""" + tmpdir = tempfile.mkdtemp() + 
basedir = Path(tmpdir) + + # Create test files + files = [ + "sub-01_task-rest_bold.nii.gz", + "sub-01_task-rest_T1w.nii.gz", + "sub-01_task-memory_bold.nii.gz", + "sub-02_task-rest_bold.nii.gz", + "sub-02_task-memory_bold.nii.gz", + "sub-03_ses-01_task-rest_bold.nii.gz", + "sub-03_ses-02_task-rest_bold.nii.gz", + ] + + for filename in files: + filepath = basedir / filename + filepath.touch() + + yield basedir + + # Cleanup + shutil.rmtree(tmpdir) + + +def test_find_bids_files_single_tag(temp_bids_dir): + """Test finding files with a single BIDS tag.""" + results = find_bids_files(temp_bids_dir, sub='01') + + assert len(results) == 3 + assert all('sub-01' in r.name for r in results) + + +def test_find_bids_files_multiple_tags(temp_bids_dir): + """Test finding files with multiple BIDS tags.""" + results = find_bids_files(temp_bids_dir, sub='01', task='rest') + + assert len(results) == 2 + filenames = [r.name for r in results] + assert "sub-01_task-rest_bold.nii.gz" in filenames + assert "sub-01_task-rest_T1w.nii.gz" in filenames + + +def test_find_bids_files_with_suffix(temp_bids_dir): + """Test finding files filtered by suffix.""" + results = find_bids_files(temp_bids_dir, suffix='bold') + + assert len(results) == 6 + assert all('bold' in r.name for r in results) + + +def test_find_bids_files_no_matches(temp_bids_dir): + """Test finding files when no matches exist.""" + results = find_bids_files(temp_bids_dir, sub='99') + + assert len(results) == 0 + + +def test_find_bids_files_all_tags_must_match(temp_bids_dir): + """Test that all specified tags must match.""" + results = find_bids_files(temp_bids_dir, sub='01', task='rest', suffix='bold') + + assert len(results) == 1 + assert results[0].name == "sub-01_task-rest_bold.nii.gz" + + +def test_find_bids_files_with_session(temp_bids_dir): + """Test finding files with session tag.""" + results = find_bids_files(temp_bids_dir, sub='03', ses='01') + + assert len(results) == 1 + assert results[0].name == 
"sub-03_ses-01_task-rest_bold.nii.gz" + + +def test_find_bids_files_string_basedir(temp_bids_dir): + """Test that basedir can be a string.""" + results = find_bids_files(str(temp_bids_dir), sub='02') + + assert len(results) == 2 + assert all('sub-02' in r.name for r in results) + + +def test_find_bids_files_no_tags(temp_bids_dir): + """Test finding files with no tag filters returns all files.""" + results = find_bids_files(temp_bids_dir) + + assert len(results) == 7 + + +def test_find_bids_files_nonexistent_tag(temp_bids_dir): + """Test filtering by a tag that doesn't exist in any files.""" + results = find_bids_files(temp_bids_dir, run='01') + + assert len(results) == 0 + + +@pytest.fixture +def temp_narps_dir(): + """Create a temporary NARPS-like directory structure.""" + tmpdir = tempfile.mkdtemp() + basedir = Path(tmpdir) + + # Create NARPS-style test files + files = [ + "sub-team01_hyp-1_type-thresh_desc-orig_stat.nii.gz", + "sub-team01_hyp-1_type-unthresh_desc-orig_stat.nii.gz", + "sub-team01_hyp-2_type-thresh_desc-orig_stat.nii.gz", + "sub-team02_hyp-1_type-thresh_desc-orig_stat.nii.gz", + "sub-team02_hyp-1_type-thresh_desc-rect_stat.nii.gz", + ] + + for filename in files: + filepath = basedir / filename + filepath.touch() + + yield basedir + + shutil.rmtree(tmpdir) + + +def test_find_bids_files_narps_format(temp_narps_dir): + """Test finding NARPS-formatted files.""" + results = find_bids_files(temp_narps_dir, sub='team01', hyp='1') + + assert len(results) == 2 + + +def test_find_bids_files_narps_by_type(temp_narps_dir): + """Test finding NARPS files by type.""" + results = find_bids_files(temp_narps_dir, type='thresh') + + assert len(results) == 4 + + +def test_find_bids_files_narps_by_desc(temp_narps_dir): + """Test finding NARPS files by description.""" + results = find_bids_files(temp_narps_dir, desc='rect') + + assert len(results) == 1 + assert results[0].name == "sub-team02_hyp-1_type-thresh_desc-rect_stat.nii.gz" + + +def 
test_find_bids_files_narps_complex_filter(temp_narps_dir): + """Test complex filtering on NARPS files.""" + results = find_bids_files( + temp_narps_dir, + sub='team01', + hyp='1', + type='thresh', + desc='orig' + ) + + assert len(results) == 1 + assert results[0].name == "sub-team01_hyp-1_type-thresh_desc-orig_stat.nii.gz" From 9b5f867795a39192afd4cdf2bbee2e194da9201b Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Sat, 29 Nov 2025 10:28:11 -0800 Subject: [PATCH 04/87] add summary function --- .../narps/bids_utils.py | 229 ++++++++++++++++++ 1 file changed, 229 insertions(+) create mode 100644 src/BetterCodeBetterScience/narps/bids_utils.py diff --git a/src/BetterCodeBetterScience/narps/bids_utils.py b/src/BetterCodeBetterScience/narps/bids_utils.py new file mode 100644 index 0000000..1a41efe --- /dev/null +++ b/src/BetterCodeBetterScience/narps/bids_utils.py @@ -0,0 +1,229 @@ +from typing import Dict, List, Union +from pathlib import Path + + +def parse_bids_filename(filename) -> Dict[str, str]: + """ + Parse a BIDS-like filename into its components. + + Parameters + ---------- + filename : str or Path + BIDS-like filename (string or Path object) + + Returns + ------- + dict + Dictionary with components of the filename, including 'path' key + """ + if isinstance(filename, str): + filepath = Path(filename) + else: + filepath = filename + + base = filepath.name + parts = base.split('_') + # take the last element which is the suffix + suffix = parts[-1].split('.')[0] + parts = parts[:-1] # remove suffix part + components = {'path': str(filepath), 'suffix': suffix} + for part in parts: + if '-' in part: + key, value = part.split('-', 1) + components[key] = value + return components + + +# take in a set of bids tags (defined as kwargs) and find all files matching them +def find_bids_files(basedir: Union[str, Path], **bids_tags) -> List[Path]: + """ + Find all files in a BIDS-like directory that match specified tags. 
+ + Parameters + ---------- + basedir : str or Path + Base directory to search + **bids_tags : dict + Key-value pairs of BIDS tags to match + + Returns + ------- + list of Path + List of Path objects for files matching the specified tags + """ + if isinstance(basedir, str): + basedir = Path(basedir) + + matched_files = [] + for filepath in basedir.rglob('*'): + if filepath.is_file(): + components = parse_bids_filename(filepath) + if all(components.get(k) == v for k, v in bids_tags.items()): + matched_files.append(filepath) + return matched_files + + +def modify_bids_filename(filename: Union[str, Path], **bids_tags) -> Union[str, Path]: + """ + Modify a BIDS-like filename by replacing specified tag values. + + Parameters + ---------- + filename : str or Path + Original BIDS-like filename (string or Path object) + **bids_tags : dict + Key-value pairs of BIDS tags to modify + + Returns + ------- + str or Path + Modified filename with updated tag values, returned in the same type + as the input (str or Path). Full directory path is preserved. + + Examples + -------- + >>> modify_bids_filename("sub-123_desc-test_type-1_stat.nii.gz", desc="real") + 'sub-123_desc-real_type-1_stat.nii.gz' + + >>> modify_bids_filename("sub-01_task-rest_bold.nii", task="memory", run="02") + 'sub-01_task-memory_run-02_bold.nii' + """ + # Track input type + input_is_string = isinstance(filename, str) + + if input_is_string: + filepath = Path(filename) + else: + filepath = filename + + # Get directory and original filename + parent_dir = filepath.parent + original_name = filepath.name + + # Get the file extension(s) + if '.' in original_name: + # Handle both .nii.gz and .nii cases + ext_parts = original_name.split('.') + extension = '.' + '.'.join(ext_parts[1:]) + else: + extension = '' + + # Parse original filename to extract key-value pairs in order + base = original_name + if '.' 
in base: + base = base.split('.')[0] # Remove extension + + parts = base.split('_') + suffix = parts[-1] # Last part is the suffix + kv_parts = parts[:-1] # Everything before suffix + + # Build ordered list of (key, value) tuples, preserving original order + ordered_pairs = [] + existing_keys = set() + + for part in kv_parts: + if '-' in part: + key, value = part.split('-', 1) + # Update value if modification requested + if key in bids_tags: + value = bids_tags[key] + ordered_pairs.append((key, value)) + existing_keys.add(key) + + # Add any new keys from bids_tags that weren't in original + for key, value in bids_tags.items(): + if key not in existing_keys: + ordered_pairs.append((key, value)) + + # Reconstruct filename maintaining order + new_parts = [f"{key}-{value}" for key, value in ordered_pairs] + new_filename = '_'.join(new_parts) + f"_{suffix}{extension}" + + # Reconstruct full path + new_filepath = parent_dir / new_filename + + # Return in same type as input + if input_is_string: + return str(new_filepath) + else: + return new_filepath + + +def bids_summary( + basedir: Union[str, Path], + extension: str = ".nii.gz", + verbose: bool = True +) -> Dict[str, Dict[str, int]]: + """ + Summarize BIDS files in a directory by counting images for each type within each desc. + + Parameters + ---------- + basedir : str or Path + Base directory containing BIDS files + extension : str, optional + File extension to filter (default: ".nii.gz") + verbose : bool, optional + Whether to print summary to screen (default: True) + + Returns + ------- + dict + Nested dictionary with structure: + {desc_value: {type_value: count, ...}, ...} + If desc is not present, uses 'no_desc' as key. + If type is not present, uses 'no_type' as key. 
+ + Examples + -------- + >>> summary = bids_summary("/data/narps", extension=".nii.gz", verbose=True) + Summary of BIDS files in /data/narps (*.nii.gz): + desc-orig: + type-thresh: 10 files + type-unthresh: 10 files + desc-rect: + type-thresh: 5 files + """ + if isinstance(basedir, str): + basedir = Path(basedir) + + # Find all files with the specified extension + if extension.startswith('.'): + pattern = f"*{extension}" + else: + pattern = f"*.{extension}" + + all_files = list(basedir.rglob(pattern)) + + # Count by desc and type + summary_dict = {} + + for filepath in all_files: + components = parse_bids_filename(filepath) + + # Get desc and type, with defaults if missing + desc_value = components.get('desc', 'no_desc') + type_value = components.get('type', 'no_type') + + # Initialize nested dict if needed + if desc_value not in summary_dict: + summary_dict[desc_value] = {} + + if type_value not in summary_dict[desc_value]: + summary_dict[desc_value][type_value] = 0 + + summary_dict[desc_value][type_value] += 1 + + # Print summary if verbose + if verbose: + print(f"Summary of BIDS files in {basedir} ({pattern}):") + if not summary_dict: + print(" No files found") + else: + for desc, type_counts in sorted(summary_dict.items()): + print(f" desc-{desc}:") + for type_val, count in sorted(type_counts.items()): + print(f" type-{type_val}: {count} files") + + return summary_dict + From 67a0b25bc61aa526ad695ed3a78d22512f68616c Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Sat, 29 Nov 2025 10:28:30 -0800 Subject: [PATCH 05/87] add summary function --- tests/narps/test_bids.py | 341 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 341 insertions(+) diff --git a/tests/narps/test_bids.py b/tests/narps/test_bids.py index 40f0540..6961659 100644 --- a/tests/narps/test_bids.py +++ b/tests/narps/test_bids.py @@ -8,6 +8,8 @@ from BetterCodeBetterScience.narps.bids_utils import ( parse_bids_filename, find_bids_files, + modify_bids_filename, + bids_summary, ) @@ 
-265,3 +267,342 @@ def test_find_bids_files_narps_complex_filter(temp_narps_dir): assert len(results) == 1 assert results[0].name == "sub-team01_hyp-1_type-thresh_desc-orig_stat.nii.gz" + + +# Tests for modify_bids_filename + + +def test_modify_bids_filename_single_tag(): + """Test modifying a single BIDS tag.""" + original = "sub-123_desc-test_type-1_stat.nii.gz" + result = modify_bids_filename(original, desc="real") + + # Should preserve order and return same type + assert result == "sub-123_desc-real_type-1_stat.nii.gz" + assert isinstance(result, str) + + +def test_modify_bids_filename_multiple_tags(): + """Test modifying multiple BIDS tags at once.""" + original = "sub-01_task-rest_bold.nii" + result = modify_bids_filename(original, task="memory", run="02") + + # Should add run and modify task, preserving original order + assert result == "sub-01_task-memory_run-02_bold.nii" + assert isinstance(result, str) + + +def test_modify_bids_filename_path_object(): + """Test that function works with Path objects.""" + original = Path("sub-01_ses-01_T1w.nii.gz") + result = modify_bids_filename(original, ses="02") + + # Should return Path object when input is Path + assert isinstance(result, Path) + assert result.name == "sub-01_ses-02_T1w.nii.gz" + + +def test_modify_bids_filename_add_new_tag(): + """Test adding a tag that wasn't in the original filename.""" + original = "sub-01_task-rest_bold.nii.gz" + result = modify_bids_filename(original, run="01", acq="mb") + + # New tags should be added at the end (after existing tags) + assert result == "sub-01_task-rest_run-01_acq-mb_bold.nii.gz" + assert isinstance(result, str) + + +def test_modify_bids_filename_preserve_extension(): + """Test that file extensions are preserved.""" + original = "sub-01_bold.nii.gz" + result = modify_bids_filename(original, sub="02") + + assert result.endswith(".nii.gz") + assert result == "sub-02_bold.nii.gz" + + +def test_modify_bids_filename_simple_extension(): + """Test with simple extension 
(not .nii.gz).""" + original = "sub-01_task-rest_bold.nii" + result = modify_bids_filename(original, task="memory") + + assert result.endswith(".nii") + assert "task-memory" in result + + +def test_modify_bids_filename_no_extension(): + """Test with no file extension.""" + original = "sub-01_task-rest_bold" + result = modify_bids_filename(original, task="memory") + + assert "task-memory" in result + assert not result.endswith(".") + + +def test_modify_bids_filename_narps_format(): + """Test modifying NARPS-specific fields.""" + original = "sub-team01_hyp-1_type-thresh_desc-orig_stat.nii.gz" + result = modify_bids_filename(original, desc="rect", type="unthresh") + + assert "desc-rect" in result + assert "type-unthresh" in result + assert "hyp-1" in result + assert "sub-team01" in result + + +def test_modify_bids_filename_change_subject(): + """Test changing the subject ID.""" + original = "sub-01_task-rest_bold.nii.gz" + result = modify_bids_filename(original, sub="99") + + # Should preserve order with modified subject + assert result == "sub-99_task-rest_bold.nii.gz" + + +def test_modify_bids_filename_preserve_suffix(): + """Test that suffix is preserved when modifying other tags.""" + original = "sub-01_task-rest_bold.nii.gz" + result = modify_bids_filename(original, task="memory") + + assert result.endswith("_bold.nii.gz") + + +def test_modify_bids_filename_complex_modification(): + """Test complex modification with many changes.""" + original = "sub-01_ses-01_task-rest_acq-mb_bold.nii.gz" + result = modify_bids_filename( + original, + sub="02", + ses="03", + task="memory", + run="01" + ) + + # Should preserve original order and append new tag + assert result == "sub-02_ses-03_task-memory_acq-mb_run-01_bold.nii.gz" + + +def test_modify_bids_filename_hyphenated_value(): + """Test that hyphenated values are handled correctly.""" + original = "sub-control-01_task-go_bold.nii" + result = modify_bids_filename(original, task="go-nogo") + + # Should preserve order with 
hyphenated values + assert result == "sub-control-01_task-go-nogo_bold.nii" + + +def test_modify_bids_filename_empty_modification(): + """Test that calling with no modifications returns identical filename.""" + original = "sub-01_task-rest_bold.nii.gz" + result = modify_bids_filename(original) + + # Should return exactly the same filename + assert result == original + + +def test_modify_bids_filename_with_full_path(): + """Test modification preserves full directory path.""" + original = Path("/data/bids/sub-01_task-rest_bold.nii.gz") + result = modify_bids_filename(original, task="memory") + + # Result should preserve the full directory path + assert isinstance(result, Path) + assert result == Path("/data/bids/sub-01_task-memory_bold.nii.gz") + assert result.parent == Path("/data/bids") + assert result.name == "sub-01_task-memory_bold.nii.gz" + + +def test_modify_bids_filename_string_with_path(): + """Test that string input with directory path preserves the path.""" + original = "/data/bids/sub-01_task-rest_bold.nii.gz" + result = modify_bids_filename(original, task="memory") + + # Should return string and preserve directory path + assert isinstance(result, str) + assert result == "/data/bids/sub-01_task-memory_bold.nii.gz" + + +def test_modify_bids_filename_order_preservation(): + """Test that original key order is strictly preserved.""" + original = "sub-01_desc-orig_type-thresh_hyp-1_stat.nii.gz" + result = modify_bids_filename(original, type="unthresh") + + # Order should be exactly: sub, desc, type, hyp + assert result == "sub-01_desc-orig_type-unthresh_hyp-1_stat.nii.gz" + + +def test_modify_bids_filename_relative_path(): + """Test with relative path.""" + original = Path("data/sub-01_task-rest_bold.nii.gz") + result = modify_bids_filename(original, task="memory") + + assert isinstance(result, Path) + assert result == Path("data/sub-01_task-memory_bold.nii.gz") + assert result.parent == Path("data") + + +# Tests for bids_summary + + +@pytest.fixture +def 
temp_summary_dir(): + """Create a temporary directory with various BIDS files for summary testing.""" + tmpdir = tempfile.mkdtemp() + basedir = Path(tmpdir) + + # Create test files with different desc and type combinations + files = [ + "sub-team01_desc-orig_type-thresh_hyp-1_stat.nii.gz", + "sub-team01_desc-orig_type-thresh_hyp-2_stat.nii.gz", + "sub-team01_desc-orig_type-unthresh_hyp-1_stat.nii.gz", + "sub-team01_desc-orig_type-unthresh_hyp-2_stat.nii.gz", + "sub-team02_desc-orig_type-thresh_hyp-1_stat.nii.gz", + "sub-team02_desc-rect_type-thresh_hyp-1_stat.nii.gz", + "sub-team02_desc-rect_type-unthresh_hyp-1_stat.nii.gz", + "sub-team03_desc-rect_type-thresh_hyp-1_stat.nii.gz", + # Files without desc or type + "sub-team04_hyp-1_stat.nii.gz", + "sub-team05_desc-orig_hyp-1_stat.nii.gz", + # Different extension (should be excluded by default) + "sub-team01_desc-orig_type-thresh_hyp-1_stat.nii", + ] + + for filename in files: + filepath = basedir / filename + filepath.touch() + + yield basedir + + shutil.rmtree(tmpdir) + + +def test_bids_summary_basic(temp_summary_dir, capsys): + """Test basic summary functionality.""" + result = bids_summary(temp_summary_dir, verbose=False) + + # Check the structure + assert 'orig' in result + assert 'rect' in result + + # Check counts for orig + assert result['orig']['thresh'] == 3 + assert result['orig']['unthresh'] == 2 + + # Check counts for rect + assert result['rect']['thresh'] == 2 + assert result['rect']['unthresh'] == 1 + + +def test_bids_summary_verbose_output(temp_summary_dir, capsys): + """Test that verbose mode prints summary.""" + result = bids_summary(temp_summary_dir, verbose=True) + + captured = capsys.readouterr() + assert "Summary of BIDS files" in captured.out + assert "desc-orig:" in captured.out + assert "desc-rect:" in captured.out + assert "type-thresh:" in captured.out + assert "type-unthresh:" in captured.out + + +def test_bids_summary_no_verbose(temp_summary_dir, capsys): + """Test that verbose=False 
suppresses output.""" + result = bids_summary(temp_summary_dir, verbose=False) + + captured = capsys.readouterr() + assert captured.out == "" + + +def test_bids_summary_different_extension(temp_summary_dir): + """Test filtering by different extension.""" + result = bids_summary(temp_summary_dir, extension=".nii", verbose=False) + + # Should find only the .nii file + assert 'orig' in result + assert result['orig']['thresh'] == 1 + # Should not find .nii.gz files + assert result['orig'].get('unthresh') is None + + +def test_bids_summary_no_desc(temp_summary_dir): + """Test handling of files without desc tag.""" + result = bids_summary(temp_summary_dir, verbose=False) + + # Files without desc should be grouped under 'no_desc' + assert 'no_desc' in result + + +def test_bids_summary_no_type(temp_summary_dir): + """Test handling of files without type tag.""" + result = bids_summary(temp_summary_dir, verbose=False) + + # Check for files with desc but no type + assert 'orig' in result + assert 'no_type' in result['orig'] + assert result['orig']['no_type'] == 1 + + +def test_bids_summary_empty_directory(): + """Test summary on empty directory.""" + tmpdir = tempfile.mkdtemp() + try: + result = bids_summary(tmpdir, verbose=False) + assert result == {} + finally: + shutil.rmtree(tmpdir) + + +def test_bids_summary_string_path(temp_summary_dir): + """Test that function accepts string paths.""" + result = bids_summary(str(temp_summary_dir), verbose=False) + + assert isinstance(result, dict) + assert 'orig' in result + + +def test_bids_summary_extension_with_dot(temp_summary_dir): + """Test that extension works with or without leading dot.""" + result1 = bids_summary(temp_summary_dir, extension=".nii.gz", verbose=False) + result2 = bids_summary(temp_summary_dir, extension="nii.gz", verbose=False) + + # Both should produce same results + assert result1 == result2 + + +def test_bids_summary_nested_directories(): + """Test summary works with nested directory structures.""" + tmpdir 
= tempfile.mkdtemp() + basedir = Path(tmpdir) + + try: + # Create nested structure + subdir1 = basedir / "sub-01" + subdir2 = basedir / "sub-02" / "nested" + subdir1.mkdir(parents=True) + subdir2.mkdir(parents=True) + + (subdir1 / "sub-01_desc-orig_type-thresh_stat.nii.gz").touch() + (subdir2 / "sub-02_desc-orig_type-thresh_stat.nii.gz").touch() + + result = bids_summary(basedir, verbose=False) + + # Should find files in nested directories + assert result['orig']['thresh'] == 2 + + finally: + shutil.rmtree(tmpdir) + + +def test_bids_summary_return_structure(temp_summary_dir): + """Test that return structure is correct type.""" + result = bids_summary(temp_summary_dir, verbose=False) + + assert isinstance(result, dict) + for desc_key, type_dict in result.items(): + assert isinstance(desc_key, str) + assert isinstance(type_dict, dict) + for type_key, count in type_dict.items(): + assert isinstance(type_key, str) + assert isinstance(count, int) + assert count > 0 + From bb65968aa668458b42e08d9522ba296561f08e8e Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Sat, 29 Nov 2025 10:28:46 -0800 Subject: [PATCH 06/87] add deps --- pyproject.toml | 1 + uv.lock | 22 ++++++++++++++++++++++ 2 files changed, 23 insertions(+) diff --git a/pyproject.toml b/pyproject.toml index 1fa8704..8f8d3ec 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -59,6 +59,7 @@ dependencies = [ "ols-client>=0.2.1", "statsmodels>=0.14.5", "blue>=0.9.1", + "nilearn>=0.12.1", ] [build-system] diff --git a/uv.lock b/uv.lock index bed2c1b..dde3a40 100644 --- a/uv.lock +++ b/uv.lock @@ -327,6 +327,7 @@ dependencies = [ { name = "neo4j" }, { name = "networkx" }, { name = "nibabel" }, + { name = "nilearn" }, { name = "numba" }, { name = "numpy" }, { name = "ols-client" }, @@ -385,6 +386,7 @@ requires-dist = [ { name = "neo4j", specifier = ">=6.0.3" }, { name = "networkx", specifier = ">=3.4.2" }, { name = "nibabel", specifier = ">=5.3.2" }, + { name = "nilearn", specifier = ">=0.12.1" }, { name = 
"numba", specifier = ">=0.61.0" }, { name = "numpy", specifier = ">=2.1.2" }, { name = "ols-client", specifier = ">=0.2.1" }, @@ -3005,6 +3007,26 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/43/b2/dc384197be44e2a640bb43311850e23c2c30f3b82ce7c8cdabbf0e53045e/nibabel-5.3.2-py3-none-any.whl", hash = "sha256:52970a5a8a53b1b55249cba4d9bcfaa8cc57e3e5af35a29d7352237e8680a6f8", size = 3293839, upload-time = "2024-10-23T14:19:52.65Z" }, ] +[[package]] +name = "nilearn" +version = "0.12.1" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "joblib" }, + { name = "lxml" }, + { name = "nibabel" }, + { name = "numpy" }, + { name = "packaging" }, + { name = "pandas" }, + { name = "requests" }, + { name = "scikit-learn" }, + { name = "scipy" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/4f/1c/5d6ef3f80145889bee1c7abbf107c88e7adea7d5b498633f8644910c673c/nilearn-0.12.1.tar.gz", hash = "sha256:a08bbfae94d0fac5ba0aebbbcd864b7f91d1ef5725d1c309ce643dd64b2391b9", size = 25134624, upload-time = "2025-09-03T06:00:40.631Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/6b/90/f17ebc6914b9ed0b577475a17a0b8d31e929897f7002bae1b03438852dad/nilearn-0.12.1-py3-none-any.whl", hash = "sha256:2112e1cdf9f7b96e0af87d679997e834ee36534fc0a970811a703700820edc4c", size = 12743284, upload-time = "2025-09-03T06:00:18.625Z" }, +] + [[package]] name = "nodeenv" version = "1.9.1" From 34165578daeb036b0ef00b60d64537775358259e Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Sat, 29 Nov 2025 10:30:13 -0800 Subject: [PATCH 07/87] initial add --- .../narps/image_utils.py | 134 ++++++++ .../narps/narps_megascript.py | 285 ++++++++++++++++++ 2 files changed, 419 insertions(+) create mode 100644 src/BetterCodeBetterScience/narps/image_utils.py create mode 100644 src/BetterCodeBetterScience/narps/narps_megascript.py diff --git a/src/BetterCodeBetterScience/narps/image_utils.py b/src/BetterCodeBetterScience/narps/image_utils.py new 
file mode 100644 index 0000000..7a9e950 --- /dev/null +++ b/src/BetterCodeBetterScience/narps/image_utils.py @@ -0,0 +1,134 @@ + +import os +from typing import Dict, Optional, Union +from pathlib import Path + +import numpy as np +import nibabel as nib + + +def compare_thresh_unthresh_values( + thresh_image_path: Union[str, Path], + unthresh_image_path: Union[str, Path], + hyp_num: int, + team_id: Optional[str] = None, + collection_id: Optional[str] = None, + error_thresh: float = 0.05, + verbose: bool = True, + logger=None, +) -> Dict: + """ + Examine unthresholded values within thresholded map voxels + to check direction of maps and determine if rectification is needed. + + If more than error_thresh percent of voxels are in opposite direction, + then flag a problem. We allow a few to bleed over due to interpolation. + + Parameters + ---------- + thresh_image_path : str or Path + Path to thresholded image + unthresh_image_path : str or Path + Path to unthresholded image + hyp_num : int + Hypothesis number + team_id : str, optional + Team identifier + collection_id : str, optional + Collection identifier + error_thresh : float + Threshold for flagging problems (proportion of voxels in wrong direction) + verbose : bool + Whether to print diagnostic messages + + Returns + ------- + dict + Dictionary containing diagnostic information: + - autorectify: bool, whether image should be rectified + - problem: bool, whether there's a problem with the image + - n_thresh_vox: int, number of thresholded voxels + - min_unthresh: float, minimum unthresholded value within mask + - max_unthresh: float, maximum unthresholded value within mask + - p_pos_unthresh: float, proportion of positive unthresholded values + - p_neg_unthresh: float, proportion of negative unthresholded values + """ + result = { + "hyp": hyp_num, + "team_id": team_id, + "collection_id": collection_id, + "autorectify": False, + "problem": False, + "n_thresh_vox": 0, + "min_unthresh": np.nan, + "max_unthresh": 
np.nan, + "p_pos_unthresh": np.nan, + "p_neg_unthresh": np.nan, + } + + # Check if files exist + if not os.path.exists(thresh_image_path): + if verbose and logger: + logger.warning("Threshold image not found: %s", thresh_image_path) + return result + + if not os.path.exists(unthresh_image_path): + if verbose and logger: + logger.warning("Unthreshold image not found: %s", unthresh_image_path) + return result + + # Load images + thresh_img = nib.load(thresh_image_path) + thresh_data = thresh_img.get_fdata().flatten() + thresh_data = np.nan_to_num(thresh_data) + + unthresh_img = nib.load(unthresh_image_path) + unthresh_data = unthresh_img.get_fdata().flatten() + unthresh_data = np.nan_to_num(unthresh_data) + + # Check shape compatibility + if thresh_data.shape != unthresh_data.shape: + if verbose and logger: + logger.error("thresh/unthresh size mismatch for hyp %d", hyp_num) + result["problem"] = True + return result + + # Count thresholded voxels + n_thresh_vox = np.sum(thresh_data > 0) + result["n_thresh_vox"] = int(n_thresh_vox) + + if n_thresh_vox == 0: + if verbose and logger: + logger.warning("hyp %d - empty mask", hyp_num) + return result + + # Analyze values within the mask + inmask_unthresh_data = unthresh_data[thresh_data > 0] + + min_val = float(np.min(inmask_unthresh_data)) + max_val = float(np.max(inmask_unthresh_data)) + p_pos_unthresh = float(np.mean(inmask_unthresh_data > 0)) + p_neg_unthresh = float(np.mean(inmask_unthresh_data < 0)) + + result["min_unthresh"] = min_val + result["max_unthresh"] = max_val + result["p_pos_unthresh"] = p_pos_unthresh + result["p_neg_unthresh"] = p_neg_unthresh + + # Check if rectification is needed + if max_val < 0: # All values are negative + result["autorectify"] = True + if verbose and logger: + logger.info("Autorectify needed: hyp %d", hyp_num) + + # Check for problems + min_p_direction = min(p_pos_unthresh, p_neg_unthresh) + if min_p_direction > error_thresh: + if verbose and logger: + logger.warning( + "hyp %d 
invalid in-mask values (neg: %.3f, pos: %.3f)", + hyp_num, p_neg_unthresh, p_pos_unthresh + ) + result["problem"] = True + + return result diff --git a/src/BetterCodeBetterScience/narps/narps_megascript.py b/src/BetterCodeBetterScience/narps/narps_megascript.py new file mode 100644 index 0000000..a696657 --- /dev/null +++ b/src/BetterCodeBetterScience/narps/narps_megascript.py @@ -0,0 +1,285 @@ +## This is a megascript to run the NARPS preprocessing workflow +## This will serve as the basis for refactoring into a more modular workflow later. + +import os +import dotenv +from pathlib import Path +import tarfile +import urllib.request +from typing import Dict, List, Union +import shutil +from BetterCodeBetterScience.narps.bids_utils import ( + parse_bids_filename, + find_bids_files, + modify_bids_filename +) +from nilearn.maskers import NiftiMasker +import nibabel as nib +import numpy as np + +dotenv.load_dotenv() + + + + +## Download data +# - the organized data are available from https://zenodo.org/record/3528329/files/narps_origdata_1.0.tgz + +assert 'NARPS_DATADIR' in os.environ, "Please set NARPS_DATADIR in your environment variables or .env file" +basedir = Path(os.environ['NARPS_DATADIR']) +if not basedir.exists(): + basedir.mkdir(parents=True, exist_ok=True) + +narps_data_url = "https://zenodo.org/record/3528329/files/narps_origdata_1.0.tgz" +narps_data_archive = basedir / "narps_origdata_1.0.tgz" + +overwrite_data = False + +if not narps_data_archive.exists() or overwrite_data: + + print(f"Downloading NARPS data from {narps_data_url}...") + urllib.request.urlretrieve(narps_data_url, narps_data_archive) + print("Download complete.") + +origdir = basedir / "orig" +if not origdir.exists() or overwrite_data: + print("Extracting data...") + with tarfile.open(narps_data_archive, "r:gz") as tar: + tar.extractall(path=basedir) + print("Extraction complete.") + + +## Get info about teams and hypotheses +## team dirs are in orig, starting with numeric team IDs + 
+teamdirs = sorted([d for d in origdir.iterdir() if d.is_dir() and d.name[0].isdigit()]) +print(f"Found {len(teamdirs)} team directories.") +team_dict = {d.name: {'orig': d} for d in teamdirs} + +# convert orig data to a BIDS-like organization + +overwrite = False +datadir = basedir / "data" +if datadir.exists() and overwrite: + shutil.rmtree(datadir) + +if not datadir.exists(): + datadir.mkdir(parents=True, exist_ok=True) + +for team_id, paths in team_dict.items(): + # only use the team number + team_id_short = team_id.split('_')[0] + team_orig_dir = paths['orig'] + # include "thresh" to prevent some additional files from being detected + for type in ['thresh', 'unthresh']: + for img_file in team_orig_dir.glob(f"*_{type}.nii.gz"): + hyp, imgtype = img_file.name.split('.')[0].replace('hypo','').split('_') + dest_file = datadir / f"team-{team_id_short}_hyp-{hyp}_type-{imgtype}_desc-orig_stat.nii.gz" + if not dest_file.exists(): + print(f"Copying {img_file} to {dest_file}") + shutil.copy(img_file.resolve(), dest_file) + assert parse_bids_filename(dest_file)['team'] == team_id_short + assert parse_bids_filename(dest_file)['hyp'] == hyp + assert parse_bids_filename(dest_file)['type'] == imgtype + +# - Create rectified images - narps.create_rectified_images() +# - input: original image (thresh and unthresh versions) +# - output: rectified images for reverse contrasts +# - NOTE: see logic within get_binarized_thresh_masks + +print("Creating rectified images...") +unthresh_images_to_rectify = find_bids_files( + datadir, type='unthresh', desc='orig') + +for unthresh_img_path in unthresh_images_to_rectify: + components = parse_bids_filename(unthresh_img_path) + hyp_num = int(components['hyp']) + team_id = components['team'] + result = { + "hyp": hyp_num, + "team_id": team_id, + } + thresh_img_path = modify_bids_filename( + unthresh_img_path, + type='thresh' + ) + if not Path(thresh_img_path).exists(): + print(f"Thresholded image not found for hyp {hyp_num}, team {team_id}, 
skipping rectification.") + result["problem"] = 'thresholded image not found' + continue + + output_path = Path(modify_bids_filename( + unthresh_img_path, + desc='rectified' + )) + result["output_path"] = str(output_path) + if output_path.exists() and not overwrite: + print(f"Rectified image already exists: {output_path}, skipping.") + continue + + # recitification involves looking at the values of the unthresh + # image within the mask defined by the thresholded image + unthresh_img = nib.load(str(unthresh_img_path)) + thresh_img = nib.load(str(thresh_img_path)) + + # check image dimensions + if unthresh_img.shape != thresh_img.shape: + print(f"Image shape mismatch for hyp {components['hyp']}, team {components['team']}, skipping rectification.") + result["problem"] = 'image shape mismatch' + continue + thresh_data = thresh_img.get_fdata().flatten() + thresh_data = np.nan_to_num(thresh_data) + n_thresh_vox = np.sum(thresh_data > 0) + result["n_thresh_vox"] = int(n_thresh_vox) + if n_thresh_vox == 0: + print(f"Empty mask for hyp {components['hyp']}, team {components['team']}, skipping rectification.") + continue + + masker = NiftiMasker(mask_img=thresh_img) + masker.fit() + unthresh_data = masker.transform(str(unthresh_img_path)).flatten() + +ljksadf + + +# - Get binarized thresholded maps (narps.get_binarized_thresh_masks()) +# - input: thresholded original image +# - output: binarized version + + +# - Get resampled images (narps.get_resampled_images()) +# - input: all image types (thresh, bin, unthresh) +# - output: resampled image in MNI space + + +# - Create concatenated versions of all images - narps.create_concat_imag +# - input: individual 3d images +# - output: combined 4d images for each image type + + +# - Check image values - narps.check_image_values() +# - input: thresholded images +# - output: data frame with number of NA and nonzero voxels + + +# - Create mean thresholded images - narps.create_mean_thresholded_images() +# - input: contact thresh image 
+# - output: mean thresholded image + + +# - Convert to zscores - narps.convert_to_zscores() +# - input: unthresh rectified images +# - output: zscore images + + +# - Create concatenated utnhresh zstat images - narps.create_concat_images(datatype='zstat', imgtypes=['unthresh'], +# create_voxel_map=True) +# input: zstat images +# output: concatenated 4d zstat image +# NOTE: maynbe consider using the concatenated unthresh image to compute zstats + + +## Analyses + +# - Compute image stats - narps.compute_image_stats() +# input: unthresholded concatenated files +# output: range and std images + +# - Estimate smoothness - narps.estimate_smoothness() - relies upon FSL smoothest +# input: individual 3d team zstat images +# output: dataframe containing smoothness estimates for each team + + +# ### Map analysis functions + +# NOTE: the following three are largely duplicative +# - Create overlap maps - AnalyzeMaps.mk_overlap_maps +# input: mean thresholded images +# output: pdf and png files with rendering of map on anatomy +# - Create range maps - AnalyzeMaps.mk_range_maps +# input: range images +# output: pdf/png files with rendering of range map on anatomy +# - Create std maps - AnalyzeMaps.mk_std_maps +# input: std images +# output: pdf/png files with rendering of std map on anatomy + +# - Unthresholded correlation analysis - test_unthresh_correlation_analysis +# - Create correlation maps for unthresholded images - AnalyzeMaps.mk_correlation_maps_unthresh +# input: concatenated unthresholded images +# output: +# - median pattern correlation (?) +# - df containing correlation matrix between datasets for each hypothesis +# - png/pdf images of clustermap based on correlation matrix +# - cluster membership (saved as json) +# - reordered correlation matrix based on ward clustering +# - dendrograms for each hypothesis +# NOTE: this one has WAY too much going on - good candidate for refactoring example! 
+# - Analyze clusters - AnalyzeMaps.analyze_clusters +# input: dendrograms and membership dict +# output: +# - mean unthresholded map for each cluster/hypothesis +# - various stats output to logfile +# -pdf/png files with rendered thresholded map for each cluster +# - cluster metadata saved to csv +# - cluster similarity (rand scores) saved to csv +# - consistency of cluster membership across hypotheses +# - Plot distance from mean - AnalyzeMaps.plot_distance_from_mean(narps) +# input: median pattern correlation +# output: +# - pdf with bar plot of correlations across teams +# - pdf of bar plot excluding teams above .2 correlation (i.e. only bad teams) +# - Get similarity of thresholded images - AnalyzeMaps.get_thresh_similarity(narps) +# input: concatenated thresholded data +# output: +# - df with percent agreement +# - pdf/png with cluster map of percent agreement +# - mean jaccard for nonzero voxels +# - Get thresholded z-stat maps - utils.get_thresholded_Z_maps(narps) +# input: unthresholded z maps and thresholded binarized maps +# output: thresholded Z maps +# NOTE: currently done on individual maps, probably should do on concatenated +# - Get diagnostics on thresholded maps - ThreshVoxelStatistics.get_zstat_diagnostics(narps) +# input: thresholded and unthresholded maps +# output: diagnostic data comparing thresh and unthresh for each team +# - Get stats on thresholded maps - ThreshVoxelStatistics.get_thresh_voxel_stats(narps.basedir) +# input: individual diagnostic data frames +# output: +# - df with all diagnostic data combined +# - df with statistics on thresholded maps +# - Get similarity summary - GetMeanSimilarity.get_similarity_summary() +# input: correlation data frames, cluster membership json +# output: +# - png with histogram of correlations +# - pngs with hisrogram of correlations for each cluster +# - data frame with correlation results by hyp/group + + +# ### Consensus analysis functions + +# - run t-tests - ConsensusAnalysis.run_ttests +# 
input: unthresholded zstat images +# output: +# - t, pval, and fdr image for each hypothesis +# - tau image for each hypothesis +# - make figures - ConsensusAnalysis.mk_figures() +# input: t and fdr images +# output: +# - pdf with rendered consensus images +# - pdf with rendered tau maps +# - pdf with rendered tau histograms + +# - Compute cluster similarity for Tom et al. and mean activation - ClusterImageCorrelation.cluster_image_correlation +# input: cluster images and target image (mean or tom et al) +# output: +# - correlation maps for each cluster vs target for each hyp +# - df with cluster correlations for each hyp + +# - Make combined cluster figures - MakeCombinedClusterFigures.make_combined_cluster_figures +# input: pngs for corr map and cluster means +# output: png with combined cluster images +# - Make supp figure 1 - MakeSupplementaryFigure1.mk_supp_figure1 +# input: metadata +# output: +# - csv tables with merged metadata, decision data, and confidence data +# - png image with confidence data and modeling info + From 25ae638effd4876df13e80ec7389be8796ec3da2 Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Sat, 29 Nov 2025 17:30:09 -0800 Subject: [PATCH 08/87] working version --- .../narps/narps_megascript.py | 421 +++++++++++------- 1 file changed, 257 insertions(+), 164 deletions(-) diff --git a/src/BetterCodeBetterScience/narps/narps_megascript.py b/src/BetterCodeBetterScience/narps/narps_megascript.py index a696657..98abaed 100644 --- a/src/BetterCodeBetterScience/narps/narps_megascript.py +++ b/src/BetterCodeBetterScience/narps/narps_megascript.py @@ -16,12 +16,12 @@ from nilearn.maskers import NiftiMasker import nibabel as nib import numpy as np +import json +import templateflow.api as tflow +from nilearn.image import resample_to_img dotenv.load_dotenv() - - - ## Download data # - the organized data are available from https://zenodo.org/record/3528329/files/narps_origdata_1.0.tgz @@ -48,6 +48,9 @@ tar.extractall(path=basedir) 
print("Extraction complete.") +logdir = basedir / "logs" +if not logdir.exists(): + logdir.mkdir(parents=True, exist_ok=True) ## Get info about teams and hypotheses ## team dirs are in orig, starting with numeric team IDs @@ -59,7 +62,7 @@ # convert orig data to a BIDS-like organization overwrite = False -datadir = basedir / "data" +datadir = basedir / "data-teams" if datadir.exists() and overwrite: shutil.rmtree(datadir) @@ -74,7 +77,12 @@ for type in ['thresh', 'unthresh']: for img_file in team_orig_dir.glob(f"*_{type}.nii.gz"): hyp, imgtype = img_file.name.split('.')[0].replace('hypo','').split('_') - dest_file = datadir / f"team-{team_id_short}_hyp-{hyp}_type-{imgtype}_desc-orig_stat.nii.gz" + try: + int(hyp) + except ValueError: + print(f"Unexpected hypothesis number format in file {img_file}, skipping.") + continue + dest_file = datadir / f"team-{team_id_short}_hyp-{hyp}_type-{imgtype}_space-native_desc-orig_stat.nii.gz" if not dest_file.exists(): print(f"Copying {img_file} to {dest_file}") shutil.copy(img_file.resolve(), dest_file) @@ -82,204 +90,289 @@ assert parse_bids_filename(dest_file)['hyp'] == hyp assert parse_bids_filename(dest_file)['type'] == imgtype -# - Create rectified images - narps.create_rectified_images() -# - input: original image (thresh and unthresh versions) -# - output: rectified images for reverse contrasts -# - NOTE: see logic within get_binarized_thresh_masks +# QC to identify bad data and move them to excluded data dir +# look for: +# - different image dimensions or affine between thresh and unthresh images for a given team/hyp +# - missing thresholded images +# - need for rectification (i.e. mostly negative values in unthresh image within the mask defined by the thresh image) +# - invalid in-mask values (i.e. 
both positive and negative values in unthresh image within the mask defined by the thresh image) - don't exclude, just note in log -print("Creating rectified images...") -unthresh_images_to_rectify = find_bids_files( - datadir, type='unthresh', desc='orig') +datadir_excluded = basedir / "data-teams-excluded" +if not datadir_excluded.exists(): + datadir_excluded.mkdir(parents=True, exist_ok=True) + +qc_results = {} +error_thresh = 0.1 # proportion of invalid in-mask values to flag problem -for unthresh_img_path in unthresh_images_to_rectify: +print("Running QC on original images...") +unthresh_images = find_bids_files(datadir, type='unthresh', desc='orig') +print (f"Found {len(unthresh_images)} unthresh original images to QC.") + +for unthresh_img_path in unthresh_images: components = parse_bids_filename(unthresh_img_path) hyp_num = int(components['hyp']) team_id = components['team'] result = { "hyp": hyp_num, "team_id": team_id, + 'infile': str(unthresh_img_path) } thresh_img_path = modify_bids_filename( unthresh_img_path, type='thresh' ) if not Path(thresh_img_path).exists(): - print(f"Thresholded image not found for hyp {hyp_num}, team {team_id}, skipping rectification.") - result["problem"] = 'thresholded image not found' + print(f"Thresholded image not found for hyp {hyp_num}, team {team_id}, moving to excluded data.") + shutil.move(unthresh_img_path, datadir_excluded / unthresh_img_path.name) + result["exclusion"] = 'thresholded image not found' + qc_results[str(unthresh_img_path)] = result continue - output_path = Path(modify_bids_filename( - unthresh_img_path, - desc='rectified' - )) - result["output_path"] = str(output_path) - if output_path.exists() and not overwrite: - print(f"Rectified image already exists: {output_path}, skipping.") - continue - - # recitification involves looking at the values of the unthresh - # image within the mask defined by the thresholded image + # Load images unthresh_img = nib.load(str(unthresh_img_path)) thresh_img = 
nib.load(str(thresh_img_path)) - # check image dimensions - if unthresh_img.shape != thresh_img.shape: - print(f"Image shape mismatch for hyp {components['hyp']}, team {components['team']}, skipping rectification.") - result["problem"] = 'image shape mismatch' + # check image dimensions and affine + if unthresh_img.shape != thresh_img.shape or not np.allclose(unthresh_img.affine, thresh_img.affine): + print(f"Image shape or affine mismatch for hyp {components['hyp']}, team {components['team']}, moving to excluded data.") + shutil.move(unthresh_img_path, datadir_excluded / unthresh_img_path.name) + shutil.move(thresh_img_path, datadir_excluded / Path(thresh_img_path).name) + result["exclusion"] = 'image shape or affine mismatch' + qc_results[str(unthresh_img_path)] = result continue + thresh_data = thresh_img.get_fdata().flatten() thresh_data = np.nan_to_num(thresh_data) n_thresh_vox = np.sum(thresh_data > 0) + # check for min_p_direction > error_thresh result["n_thresh_vox"] = int(n_thresh_vox) - if n_thresh_vox == 0: - print(f"Empty mask for hyp {components['hyp']}, team {components['team']}, skipping rectification.") - continue - masker = NiftiMasker(mask_img=thresh_img) - masker.fit() - unthresh_data = masker.transform(str(unthresh_img_path)).flatten() -ljksadf + if n_thresh_vox > 0: + masker = NiftiMasker(mask_img=thresh_img) + masker.fit() + unthresh_data_masked = masker.transform(str(unthresh_img_path)) + + min_val = float(np.min(unthresh_data_masked.flatten())) + max_val = float(np.max(unthresh_data_masked.flatten())) + p_pos_unthresh = float(np.mean(unthresh_data_masked.flatten() > 0)) + p_neg_unthresh = float(np.mean(unthresh_data_masked.flatten() < 0)) + result["min_unthresh"] = min_val + result["max_unthresh"] = max_val + result["p_pos_unthresh"] = p_pos_unthresh + result["p_neg_unthresh"] = p_neg_unthresh + + if p_neg_unthresh > (1 - error_thresh): + # mostly negative values, rectify + result["autorectify"] = True + else: + result["autorectify"] = False 
+ + # Check for problems - note these in QC but don't exclude + min_p_direction = min(p_pos_unthresh, p_neg_unthresh) + if min_p_direction > error_thresh: + result["problem"] = 'invalid in-mask values' + print(f"hyp {hyp_num}, team {team_id} - invalid in-mask values (neg: {p_neg_unthresh:.3f}, pos: {p_pos_unthresh:.3f})") + + qc_results[str(unthresh_img_path)] = result + +with open(logdir / "qc_log.json", "w") as f: + json.dump(qc_results, f, indent=4) # - Get binarized thresholded maps (narps.get_binarized_thresh_masks()) # - input: thresholded original image # - output: binarized version +# Create binarized versions of thresholded images + +print("Creating binarized images...") +thresh_images_to_binarize = find_bids_files( + datadir, type='thresh', desc='orig') +print (f"Found {len(thresh_images_to_binarize)} thresh binarized images to process.") + +results = {} +overwrite = False +thresh = 1e-4 +for thresh_img_path in thresh_images_to_binarize: + outfile = Path(modify_bids_filename( + thresh_img_path, + desc='binarized', suffix='mask' + )) + if outfile.exists() and not overwrite: + continue + + thresh_img = nib.load(str(thresh_img_path)) + thresh_data = thresh_img.get_fdata() + thresh_data = np.nan_to_num(thresh_data) + binarized_data = (np.abs(thresh_data) > thresh).astype(np.float32) + binarized_img = nib.Nifti1Image( + binarized_data, + thresh_img.affine, + thresh_img.header + ) + binarized_img.to_filename(str(outfile)) + results[str(outfile)] = { + 'infile': str(thresh_img_path), + 'n_nonzero_voxels': int(np.sum(binarized_data)) + } + +with open(logdir / "binarization_log.json", "w") as f: + json.dump(results, f, indent=4) + + +# - Create rectified images - narps.create_rectified_images() +# - input: original image (thresh and unthresh versions) +# - output: rectified images for reverse contrasts +# - NOTE: see logic within get_binarized_thresh_masks + +print("Creating rectified images...") +results = {} +overwrite = False + +for unthresh_img_path, values in 
qc_results.items(): + if not Path(unthresh_img_path).exists(): + continue + + output_path = Path(modify_bids_filename( + unthresh_img_path, + desc='rectified' + )) + + if output_path.exists() and not overwrite: + print(f"Rectified image already exists: {output_path}, skipping.") + continue + + unthresh_img = nib.load(str(unthresh_img_path)) + unthresh_data = unthresh_img.get_fdata() + + if values.get("autorectify", False): + # mostly negative values, rectify + print(f"Rectifying unthresh image for hyp {values['hyp']}, team {values['team_id']}") + rectified_data = -1 * unthresh_data + qc_results[unthresh_img_path]["rectified"] = True + else: + rectified_data = unthresh_data + qc_results[unthresh_img_path]["rectified"] = False + + rectified_img = nib.Nifti1Image( + rectified_data, + unthresh_img.affine, + unthresh_img.header + ) + rectified_img.to_filename(str(output_path)) + + results[str(unthresh_img_path)] = values + +with open(logdir / "rectification_log.json", "w") as f: + json.dump(qc_results, f, indent=4) + # - Get resampled images (narps.get_resampled_images()) # - input: all image types (thresh, bin, unthresh) # - output: resampled image in MNI space +## first get MNI152NLin2009cAsym template from templateflow +mni_template = tflow.get("MNI152NLin2009cAsym", resolution=2, suffix="T1w", desc=None) + +print("Resampling images to MNI space...") +results = {} +overwrite = False +all_images_to_resample = find_bids_files( + datadir, type='thresh', space='native', desc='binarized') + find_bids_files( + datadir, type='unthresh', space='native', desc='rectified') + +print (f"Found {len(all_images_to_resample)} images to resample.") + +for img_path in all_images_to_resample: + components = parse_bids_filename(img_path) + output_path = Path(modify_bids_filename( + img_path, + space='MNI152NLin2009cAsym' + )) + results [str(output_path)] = { + 'infile': str(img_path), + } + if output_path.exists() and not overwrite: + continue + + img = nib.load(str(img_path)) + # use linear 
interpolation for binarized maps, then threshold at 0.5 + # this avoids empty voxels that can occur with NN interpolation + # resample to MNI space + if components['desc'] == 'binarized': + interpolation = 'linear' + else: + interpolation = 'continuous' + + resampled_img = resample_to_img(img, mni_template, + interpolation=interpolation, + force_resample=True, copy_header=True) + + if components['desc'] == 'binarized': + interpolation = 'linear' + # re-binarize + resampled_data = resampled_img.get_fdata() + binarized_data = (resampled_data > 0.5).astype(np.float32) + resampled_img = nib.Nifti1Image( + binarized_data, + resampled_img.affine, + resampled_img.header + ) + + + resampled_img.to_filename(str(output_path)) + +with open(logdir / "resampling_log.json", "w") as f: + json.dump(results, f, indent=4) + + + # - Create concatenated versions of all images - narps.create_concat_imag # - input: individual 3d images # - output: combined 4d images for each image type - -# - Check image values - narps.check_image_values() -# - input: thresholded images -# - output: data frame with number of NA and nonzero voxels - - -# - Create mean thresholded images - narps.create_mean_thresholded_images() -# - input: contact thresh image -# - output: mean thresholded image - - -# - Convert to zscores - narps.convert_to_zscores() -# - input: unthresh rectified images -# - output: zscore images - - -# - Create concatenated utnhresh zstat images - narps.create_concat_images(datatype='zstat', imgtypes=['unthresh'], -# create_voxel_map=True) -# input: zstat images -# output: concatenated 4d zstat image -# NOTE: maynbe consider using the concatenated unthresh image to compute zstats - - -## Analyses - -# - Compute image stats - narps.compute_image_stats() -# input: unthresholded concatenated files -# output: range and std images - -# - Estimate smoothness - narps.estimate_smoothness() - relies upon FSL smoothest -# input: individual 3d team zstat images -# output: dataframe containing 
smoothness estimates for each team - - -# ### Map analysis functions - -# NOTE: the following three are largely duplicative -# - Create overlap maps - AnalyzeMaps.mk_overlap_maps -# input: mean thresholded images -# output: pdf and png files with rendering of map on anatomy -# - Create range maps - AnalyzeMaps.mk_range_maps -# input: range images -# output: pdf/png files with rendering of range map on anatomy -# - Create std maps - AnalyzeMaps.mk_std_maps -# input: std images -# output: pdf/png files with rendering of std map on anatomy - -# - Unthresholded correlation analysis - test_unthresh_correlation_analysis -# - Create correlation maps for unthresholded images - AnalyzeMaps.mk_correlation_maps_unthresh -# input: concatenated unthresholded images -# output: -# - median pattern correlation (?) -# - df containing correlation matrix between datasets for each hypothesis -# - png/pdf images of clustermap based on correlation matrix -# - cluster membership (saved as json) -# - reordered correlation matrix based on ward clustering -# - dendrograms for each hypothesis -# NOTE: this one has WAY too much going on - good candidate for refactoring example! -# - Analyze clusters - AnalyzeMaps.analyze_clusters -# input: dendrograms and membership dict -# output: -# - mean unthresholded map for each cluster/hypothesis -# - various stats output to logfile -# -pdf/png files with rendered thresholded map for each cluster -# - cluster metadata saved to csv -# - cluster similarity (rand scores) saved to csv -# - consistency of cluster membership across hypotheses -# - Plot distance from mean - AnalyzeMaps.plot_distance_from_mean(narps) -# input: median pattern correlation -# output: -# - pdf with bar plot of correlations across teams -# - pdf of bar plot excluding teams above .2 correlation (i.e. 
only bad teams) -# - Get similarity of thresholded images - AnalyzeMaps.get_thresh_similarity(narps) -# input: concatenated thresholded data -# output: -# - df with percent agreement -# - pdf/png with cluster map of percent agreement -# - mean jaccard for nonzero voxels -# - Get thresholded z-stat maps - utils.get_thresholded_Z_maps(narps) -# input: unthresholded z maps and thresholded binarized maps -# output: thresholded Z maps -# NOTE: currently done on individual maps, probably should do on concatenated -# - Get diagnostics on thresholded maps - ThreshVoxelStatistics.get_zstat_diagnostics(narps) -# input: thresholded and unthresholded maps -# output: diagnostic data comparing thresh and unthresh for each team -# - Get stats on thresholded maps - ThreshVoxelStatistics.get_thresh_voxel_stats(narps.basedir) -# input: individual diagnostic data frames -# output: -# - df with all diagnostic data combined -# - df with statistics on thresholded maps -# - Get similarity summary - GetMeanSimilarity.get_similarity_summary() -# input: correlation data frames, cluster membership json -# output: -# - png with histogram of correlations -# - pngs with hisrogram of correlations for each cluster -# - data frame with correlation results by hyp/group - - -# ### Consensus analysis functions - -# - run t-tests - ConsensusAnalysis.run_ttests -# input: unthresholded zstat images -# output: -# - t, pval, and fdr image for each hypothesis -# - tau image for each hypothesis -# - make figures - ConsensusAnalysis.mk_figures() -# input: t and fdr images -# output: -# - pdf with rendered consensus images -# - pdf with rendered tau maps -# - pdf with rendered tau histograms - -# - Compute cluster similarity for Tom et al. 
and mean activation - ClusterImageCorrelation.cluster_image_correlation -# input: cluster images and target image (mean or tom et al) -# output: -# - correlation maps for each cluster vs target for each hyp -# - df with cluster correlations for each hyp - -# - Make combined cluster figures - MakeCombinedClusterFigures.make_combined_cluster_figures -# input: pngs for corr map and cluster means -# output: png with combined cluster images -# - Make supp figure 1 - MakeSupplementaryFigure1.mk_supp_figure1 -# input: metadata -# output: -# - csv tables with merged metadata, decision data, and confidence data -# - png image with confidence data and modeling info - +concat_dir = basedir / "data-concat" +if not concat_dir.exists(): + concat_dir.mkdir(parents=True, exist_ok=True) + +results = {} +for hyp in range(1,10): + print(f"Creating concatenated images for hypothesis {hyp}...") + + resampled_images = {'unthresh': find_bids_files( + datadir, type='unthresh', space='MNI152NLin2009cAsym', hyp=str(hyp)), + 'thresh': []} + team_ids = [] + for img_path in resampled_images['unthresh']: + components = parse_bids_filename(img_path) + team_ids.append(components['team']) + thresh_img_path = modify_bids_filename( + img_path, + type='thresh', desc='binarized', suffix='mask' + ) + assert Path(thresh_img_path).exists(), f"Binarized thresholded image not found for {img_path}" + resampled_images['thresh'].append(thresh_img_path) + assert len(resampled_images['thresh']) == len(resampled_images['unthresh']), "Mismatch in number of unthresh and thresh images" + print (f"Found {len(team_ids)} unthresh resampled images to concatenate for hypothesis {hyp}.") + + + suffix_dict = {'unthresh': 'stat', 'thresh': 'mask'} + for imgtype in ['unthresh', 'thresh']: + img_list = [] + for img_path in resampled_images[imgtype]: + img = nib.load(str(img_path)) + img_list.append(img) + # concatenate along 4th dimension + concat_img = nilearn.image.concat_imgs(img_list) + + output_path = concat_dir / 
f"hyp-{hyp}_type-{imgtype}_space-MNI152NLin2009cAsym_desc-concat_{suffix_dict[imgtype]}.nii.gz" + concat_img.to_filename(str(output_path)) + print(f"Saved concatenated {imgtype} image for hypothesis {hyp} to {output_path}") + results[str(output_path)] = { + 'team_ids': team_ids + } + +with open(logdir / "concatenation_log.json", "w") as f: + json.dump(results, f, indent=4) \ No newline at end of file From ac742f6e9566c5beda9befe5fd9f8bdadd22c369 Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Sat, 29 Nov 2025 17:30:28 -0800 Subject: [PATCH 09/87] Add words --- project-words.txt | 1 + 1 file changed, 1 insertion(+) diff --git a/project-words.txt b/project-words.txt index 4b02c83..54f60e7 100644 --- a/project-words.txt +++ b/project-words.txt @@ -1,4 +1,5 @@ # New Words +ttests pheno tsim keepdims From aaf1818955563a3da16b6272b45b7476675dc124 Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Sat, 29 Nov 2025 17:33:38 -0800 Subject: [PATCH 10/87] ruff/blue fixes --- .../narps/narps_megascript.py | 264 ++++++++++-------- 1 file changed, 147 insertions(+), 117 deletions(-) diff --git a/src/BetterCodeBetterScience/narps/narps_megascript.py b/src/BetterCodeBetterScience/narps/narps_megascript.py index 98abaed..b4d3024 100644 --- a/src/BetterCodeBetterScience/narps/narps_megascript.py +++ b/src/BetterCodeBetterScience/narps/narps_megascript.py @@ -6,14 +6,14 @@ from pathlib import Path import tarfile import urllib.request -from typing import Dict, List, Union import shutil from BetterCodeBetterScience.narps.bids_utils import ( parse_bids_filename, find_bids_files, - modify_bids_filename + modify_bids_filename, ) from nilearn.maskers import NiftiMasker +import nilearn.image import nibabel as nib import numpy as np import json @@ -25,44 +25,50 @@ ## Download data # - the organized data are available from https://zenodo.org/record/3528329/files/narps_origdata_1.0.tgz -assert 'NARPS_DATADIR' in os.environ, "Please set NARPS_DATADIR in your environment variables or 
.env file" +assert ( + 'NARPS_DATADIR' in os.environ +), 'Please set NARPS_DATADIR in your environment variables or .env file' basedir = Path(os.environ['NARPS_DATADIR']) if not basedir.exists(): basedir.mkdir(parents=True, exist_ok=True) -narps_data_url = "https://zenodo.org/record/3528329/files/narps_origdata_1.0.tgz" -narps_data_archive = basedir / "narps_origdata_1.0.tgz" +narps_data_url = ( + 'https://zenodo.org/record/3528329/files/narps_origdata_1.0.tgz' +) +narps_data_archive = basedir / 'narps_origdata_1.0.tgz' overwrite_data = False if not narps_data_archive.exists() or overwrite_data: - print(f"Downloading NARPS data from {narps_data_url}...") + print(f'Downloading NARPS data from {narps_data_url}...') urllib.request.urlretrieve(narps_data_url, narps_data_archive) - print("Download complete.") + print('Download complete.') -origdir = basedir / "orig" +origdir = basedir / 'orig' if not origdir.exists() or overwrite_data: - print("Extracting data...") - with tarfile.open(narps_data_archive, "r:gz") as tar: + print('Extracting data...') + with tarfile.open(narps_data_archive, 'r:gz') as tar: tar.extractall(path=basedir) - print("Extraction complete.") + print('Extraction complete.') -logdir = basedir / "logs" +logdir = basedir / 'logs' if not logdir.exists(): logdir.mkdir(parents=True, exist_ok=True) ## Get info about teams and hypotheses ## team dirs are in orig, starting with numeric team IDs -teamdirs = sorted([d for d in origdir.iterdir() if d.is_dir() and d.name[0].isdigit()]) -print(f"Found {len(teamdirs)} team directories.") +teamdirs = sorted( + [d for d in origdir.iterdir() if d.is_dir() and d.name[0].isdigit()] +) +print(f'Found {len(teamdirs)} team directories.') team_dict = {d.name: {'orig': d} for d in teamdirs} # convert orig data to a BIDS-like organization overwrite = False -datadir = basedir / "data-teams" +datadir = basedir / 'data-teams' if datadir.exists() and overwrite: shutil.rmtree(datadir) @@ -75,16 +81,23 @@ team_orig_dir = 
paths['orig'] # include "thresh" to prevent some additional files from being detected for type in ['thresh', 'unthresh']: - for img_file in team_orig_dir.glob(f"*_{type}.nii.gz"): - hyp, imgtype = img_file.name.split('.')[0].replace('hypo','').split('_') + for img_file in team_orig_dir.glob(f'*_{type}.nii.gz'): + hyp, imgtype = ( + img_file.name.split('.')[0].replace('hypo', '').split('_') + ) try: int(hyp) except ValueError: - print(f"Unexpected hypothesis number format in file {img_file}, skipping.") + print( + f'Unexpected hypothesis number format in file {img_file}, skipping.' + ) continue - dest_file = datadir / f"team-{team_id_short}_hyp-{hyp}_type-{imgtype}_space-native_desc-orig_stat.nii.gz" + dest_file = ( + datadir + / f'team-{team_id_short}_hyp-{hyp}_type-{imgtype}_space-native_desc-orig_stat.nii.gz' + ) if not dest_file.exists(): - print(f"Copying {img_file} to {dest_file}") + print(f'Copying {img_file} to {dest_file}') shutil.copy(img_file.resolve(), dest_file) assert parse_bids_filename(dest_file)['team'] == team_id_short assert parse_bids_filename(dest_file)['hyp'] == hyp @@ -97,34 +110,35 @@ # - need for rectification (i.e. mostly negative values in unthresh image within the mask defined by the thresh image) # - invalid in-mask values (i.e. 
both positive and negative values in unthresh image within the mask defined by the thresh image) - don't exclude, just note in log -datadir_excluded = basedir / "data-teams-excluded" +datadir_excluded = basedir / 'data-teams-excluded' if not datadir_excluded.exists(): datadir_excluded.mkdir(parents=True, exist_ok=True) qc_results = {} error_thresh = 0.1 # proportion of invalid in-mask values to flag problem -print("Running QC on original images...") +print('Running QC on original images...') unthresh_images = find_bids_files(datadir, type='unthresh', desc='orig') -print (f"Found {len(unthresh_images)} unthresh original images to QC.") +print(f'Found {len(unthresh_images)} unthresh original images to QC.') for unthresh_img_path in unthresh_images: components = parse_bids_filename(unthresh_img_path) hyp_num = int(components['hyp']) team_id = components['team'] result = { - "hyp": hyp_num, - "team_id": team_id, - 'infile': str(unthresh_img_path) + 'hyp': hyp_num, + 'team_id': team_id, + 'infile': str(unthresh_img_path), } - thresh_img_path = modify_bids_filename( - unthresh_img_path, - type='thresh' - ) + thresh_img_path = modify_bids_filename(unthresh_img_path, type='thresh') if not Path(thresh_img_path).exists(): - print(f"Thresholded image not found for hyp {hyp_num}, team {team_id}, moving to excluded data.") - shutil.move(unthresh_img_path, datadir_excluded / unthresh_img_path.name) - result["exclusion"] = 'thresholded image not found' + print( + f'Thresholded image not found for hyp {hyp_num}, team {team_id}, moving to excluded data.' 
+ ) + shutil.move( + unthresh_img_path, datadir_excluded / unthresh_img_path.name + ) + result['exclusion'] = 'thresholded image not found' qc_results[str(unthresh_img_path)] = result continue @@ -133,11 +147,19 @@ thresh_img = nib.load(str(thresh_img_path)) # check image dimensions and affine - if unthresh_img.shape != thresh_img.shape or not np.allclose(unthresh_img.affine, thresh_img.affine): - print(f"Image shape or affine mismatch for hyp {components['hyp']}, team {components['team']}, moving to excluded data.") - shutil.move(unthresh_img_path, datadir_excluded / unthresh_img_path.name) - shutil.move(thresh_img_path, datadir_excluded / Path(thresh_img_path).name) - result["exclusion"] = 'image shape or affine mismatch' + if unthresh_img.shape != thresh_img.shape or not np.allclose( + unthresh_img.affine, thresh_img.affine + ): + print( + f"Image shape or affine mismatch for hyp {components['hyp']}, team {components['team']}, moving to excluded data." + ) + shutil.move( + unthresh_img_path, datadir_excluded / unthresh_img_path.name + ) + shutil.move( + thresh_img_path, datadir_excluded / Path(thresh_img_path).name + ) + result['exclusion'] = 'image shape or affine mismatch' qc_results[str(unthresh_img_path)] = result continue @@ -145,8 +167,7 @@ thresh_data = np.nan_to_num(thresh_data) n_thresh_vox = np.sum(thresh_data > 0) # check for min_p_direction > error_thresh - result["n_thresh_vox"] = int(n_thresh_vox) - + result['n_thresh_vox'] = int(n_thresh_vox) if n_thresh_vox > 0: masker = NiftiMasker(mask_img=thresh_img) @@ -157,26 +178,28 @@ max_val = float(np.max(unthresh_data_masked.flatten())) p_pos_unthresh = float(np.mean(unthresh_data_masked.flatten() > 0)) p_neg_unthresh = float(np.mean(unthresh_data_masked.flatten() < 0)) - result["min_unthresh"] = min_val - result["max_unthresh"] = max_val - result["p_pos_unthresh"] = p_pos_unthresh - result["p_neg_unthresh"] = p_neg_unthresh + result['min_unthresh'] = min_val + result['max_unthresh'] = max_val + 
result['p_pos_unthresh'] = p_pos_unthresh + result['p_neg_unthresh'] = p_neg_unthresh if p_neg_unthresh > (1 - error_thresh): # mostly negative values, rectify - result["autorectify"] = True + result['autorectify'] = True else: - result["autorectify"] = False + result['autorectify'] = False # Check for problems - note these in QC but don't exclude min_p_direction = min(p_pos_unthresh, p_neg_unthresh) if min_p_direction > error_thresh: - result["problem"] = 'invalid in-mask values' - print(f"hyp {hyp_num}, team {team_id} - invalid in-mask values (neg: {p_neg_unthresh:.3f}, pos: {p_pos_unthresh:.3f})") + result['problem'] = 'invalid in-mask values' + print( + f'hyp {hyp_num}, team {team_id} - invalid in-mask values (neg: {p_neg_unthresh:.3f}, pos: {p_pos_unthresh:.3f})' + ) qc_results[str(unthresh_img_path)] = result -with open(logdir / "qc_log.json", "w") as f: +with open(logdir / 'qc_log.json', 'w') as f: json.dump(qc_results, f, indent=4) @@ -186,19 +209,21 @@ # Create binarized versions of thresholded images -print("Creating binarized images...") +print('Creating binarized images...') thresh_images_to_binarize = find_bids_files( - datadir, type='thresh', desc='orig') -print (f"Found {len(thresh_images_to_binarize)} thresh binarized images to process.") + datadir, type='thresh', desc='orig' +) +print( + f'Found {len(thresh_images_to_binarize)} thresh binarized images to process.' 
+) results = {} overwrite = False thresh = 1e-4 for thresh_img_path in thresh_images_to_binarize: - outfile = Path(modify_bids_filename( - thresh_img_path, - desc='binarized', suffix='mask' - )) + outfile = Path( + modify_bids_filename(thresh_img_path, desc='binarized', suffix='mask') + ) if outfile.exists() and not overwrite: continue @@ -207,26 +232,24 @@ thresh_data = np.nan_to_num(thresh_data) binarized_data = (np.abs(thresh_data) > thresh).astype(np.float32) binarized_img = nib.Nifti1Image( - binarized_data, - thresh_img.affine, - thresh_img.header + binarized_data, thresh_img.affine, thresh_img.header ) - binarized_img.to_filename(str(outfile)) + binarized_img.to_filename(str(outfile)) results[str(outfile)] = { 'infile': str(thresh_img_path), - 'n_nonzero_voxels': int(np.sum(binarized_data)) + 'n_nonzero_voxels': int(np.sum(binarized_data)), } -with open(logdir / "binarization_log.json", "w") as f: +with open(logdir / 'binarization_log.json', 'w') as f: json.dump(results, f, indent=4) # - Create rectified images - narps.create_rectified_images() # - input: original image (thresh and unthresh versions) # - output: rectified images for reverse contrasts -# - NOTE: see logic within get_binarized_thresh_masks +# - NOTE: see logic within get_binarized_thresh_masks -print("Creating rectified images...") +print('Creating rectified images...') results = {} overwrite = False @@ -234,37 +257,34 @@ if not Path(unthresh_img_path).exists(): continue - output_path = Path(modify_bids_filename( - unthresh_img_path, - desc='rectified' - )) + output_path = Path( + modify_bids_filename(unthresh_img_path, desc='rectified') + ) if output_path.exists() and not overwrite: - print(f"Rectified image already exists: {output_path}, skipping.") + print(f'Rectified image already exists: {output_path}, skipping.') continue unthresh_img = nib.load(str(unthresh_img_path)) unthresh_data = unthresh_img.get_fdata() - if values.get("autorectify", False): + if values.get('autorectify', False): # 
mostly negative values, rectify - print(f"Rectifying unthresh image for hyp {hyp_num}, team {team_id}") + print(f'Rectifying unthresh image for hyp {hyp_num}, team {team_id}') rectified_data = -1 * unthresh_data - qc_results[unthresh_img_path]["rectified"] = True + qc_results[unthresh_img_path]['rectified'] = True else: rectified_data = unthresh_data - qc_results[unthresh_img_path]["rectified"] = False + qc_results[unthresh_img_path]['rectified'] = False rectified_img = nib.Nifti1Image( - rectified_data, - unthresh_img.affine, - unthresh_img.header + rectified_data, unthresh_img.affine, unthresh_img.header ) rectified_img.to_filename(str(output_path)) results[str(unthresh_img_path)] = result -with open(logdir / "rectification_log.json", "w") as f: +with open(logdir / 'rectification_log.json', 'w') as f: json.dump(qc_results, f, indent=4) @@ -273,24 +293,25 @@ # - output: resampled image in MNI space ## first get MNI152NLin2009cAsym template from templateflow -mni_template = tflow.get("MNI152NLin2009cAsym", resolution=2, suffix="T1w", desc=None) +mni_template = tflow.get( + 'MNI152NLin2009cAsym', resolution=2, suffix='T1w', desc=None +) -print("Resampling images to MNI space...") +print('Resampling images to MNI space...') results = {} overwrite = False all_images_to_resample = find_bids_files( - datadir, type='thresh', space='native', desc='binarized') + find_bids_files( - datadir, type='unthresh', space='native', desc='rectified') + datadir, type='thresh', space='native', desc='binarized' +) + find_bids_files(datadir, type='unthresh', space='native', desc='rectified') -print (f"Found {len(all_images_to_resample)} images to resample.") +print(f'Found {len(all_images_to_resample)} images to resample.') for img_path in all_images_to_resample: components = parse_bids_filename(img_path) - output_path = Path(modify_bids_filename( - img_path, - space='MNI152NLin2009cAsym' - )) - results [str(output_path)] = { + output_path = Path( + modify_bids_filename(img_path, 
space='MNI152NLin2009cAsym') + ) + results[str(output_path)] = { 'infile': str(img_path), } if output_path.exists() and not overwrite: @@ -304,10 +325,14 @@ interpolation = 'linear' else: interpolation = 'continuous' - - resampled_img = resample_to_img(img, mni_template, + + resampled_img = resample_to_img( + img, + mni_template, interpolation=interpolation, - force_resample=True, copy_header=True) + force_resample=True, + copy_header=True, + ) if components['desc'] == 'binarized': interpolation = 'linear' @@ -315,48 +340,50 @@ resampled_data = resampled_img.get_fdata() binarized_data = (resampled_data > 0.5).astype(np.float32) resampled_img = nib.Nifti1Image( - binarized_data, - resampled_img.affine, - resampled_img.header + binarized_data, resampled_img.affine, resampled_img.header ) - resampled_img.to_filename(str(output_path)) -with open(logdir / "resampling_log.json", "w") as f: +with open(logdir / 'resampling_log.json', 'w') as f: json.dump(results, f, indent=4) - - # - Create concatenated versions of all images - narps.create_concat_imag # - input: individual 3d images # - output: combined 4d images for each image type -concat_dir = basedir / "data-concat" +concat_dir = basedir / 'data-concat' if not concat_dir.exists(): concat_dir.mkdir(parents=True, exist_ok=True) results = {} -for hyp in range(1,10): - print(f"Creating concatenated images for hypothesis {hyp}...") - - resampled_images = {'unthresh': find_bids_files( - datadir, type='unthresh', space='MNI152NLin2009cAsym', hyp=str(hyp)), - 'thresh': []} +for hyp in range(1, 10): + print(f'Creating concatenated images for hypothesis {hyp}...') + + resampled_images = { + 'unthresh': find_bids_files( + datadir, type='unthresh', space='MNI152NLin2009cAsym', hyp=str(hyp) + ), + 'thresh': [], + } team_ids = [] for img_path in resampled_images['unthresh']: components = parse_bids_filename(img_path) team_ids.append(components['team']) thresh_img_path = modify_bids_filename( - img_path, - type='thresh', 
desc='binarized', suffix='mask' + img_path, type='thresh', desc='binarized', suffix='mask' ) - assert Path(thresh_img_path).exists(), f"Binarized thresholded image not found for {img_path}" + assert Path( + thresh_img_path + ).exists(), f'Binarized thresholded image not found for {img_path}' resampled_images['thresh'].append(thresh_img_path) - assert len(resampled_images['thresh']) == len(resampled_images['unthresh']), "Mismatch in number of unthresh and thresh images" - print (f"Found {len(team_ids)} unthresh resampled images to concatenate for hypothesis {hyp}.") - + assert len(resampled_images['thresh']) == len( + resampled_images['unthresh'] + ), 'Mismatch in number of unthresh and thresh images' + print( + f'Found {len(team_ids)} unthresh resampled images to concatenate for hypothesis {hyp}.' + ) suffix_dict = {'unthresh': 'stat', 'thresh': 'mask'} for imgtype in ['unthresh', 'thresh']: @@ -367,12 +394,15 @@ # concatenate along 4th dimension concat_img = nilearn.image.concat_imgs(img_list) - output_path = concat_dir / f"hyp-{hyp}_type-{imgtype}_space-MNI152NLin2009cAsym_desc-concat_{suffix_dict[imgtype]}.nii.gz" + output_path = ( + concat_dir + / f'hyp-{hyp}_type-{imgtype}_space-MNI152NLin2009cAsym_desc-concat_{suffix_dict[imgtype]}.nii.gz' + ) concat_img.to_filename(str(output_path)) - print(f"Saved concatenated {imgtype} image for hypothesis {hyp} to {output_path}") - results[str(output_path)] = { - 'team_ids': team_ids - } + print( + f'Saved concatenated {imgtype} image for hypothesis {hyp} to {output_path}' + ) + results[str(output_path)] = {'team_ids': team_ids} -with open(logdir / "concatenation_log.json", "w") as f: - json.dump(results, f, indent=4) \ No newline at end of file +with open(logdir / 'concatenation_log.json', 'w') as f: + json.dump(results, f, indent=4) From 561c3b0ac91c9c8cc1be2628dcb608d5c21aefb7 Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Sun, 30 Nov 2025 09:56:11 -0800 Subject: [PATCH 11/87] fix suffix handling --- 
src/BetterCodeBetterScience/narps/bids_utils.py | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/src/BetterCodeBetterScience/narps/bids_utils.py b/src/BetterCodeBetterScience/narps/bids_utils.py index 1a41efe..66c1d8c 100644 --- a/src/BetterCodeBetterScience/narps/bids_utils.py +++ b/src/BetterCodeBetterScience/narps/bids_utils.py @@ -130,9 +130,14 @@ def modify_bids_filename(filename: Union[str, Path], **bids_tags) -> Union[str, ordered_pairs.append((key, value)) existing_keys.add(key) - # Add any new keys from bids_tags that weren't in original + # Check if suffix should be modified + if 'suffix' in bids_tags: + suffix = bids_tags['suffix'] + existing_keys.add('suffix') + + # Add any new keys from bids_tags that weren't in original (except suffix) for key, value in bids_tags.items(): - if key not in existing_keys: + if key not in existing_keys and key != 'suffix': ordered_pairs.append((key, value)) # Reconstruct filename maintaining order From 7973dd209928e1b846cac80b1d59dbecae26f0c1 Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Sun, 30 Nov 2025 09:56:23 -0800 Subject: [PATCH 12/87] fix suffix handling --- tests/narps/test_bids.py | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/tests/narps/test_bids.py b/tests/narps/test_bids.py index 6961659..1f10019 100644 --- a/tests/narps/test_bids.py +++ b/tests/narps/test_bids.py @@ -367,6 +367,25 @@ def test_modify_bids_filename_preserve_suffix(): assert result.endswith("_bold.nii.gz") +def test_modify_bids_filename_change_suffix(): + """Test changing the suffix.""" + original = "sub-01_task-rest_bold.nii.gz" + result = modify_bids_filename(original, suffix="T1w") + + # Suffix should be at the end, not as suffix-value + assert result == "sub-01_task-rest_T1w.nii.gz" + assert "suffix-" not in result + + +def test_modify_bids_filename_change_suffix_and_tags(): + """Test changing both suffix and other tags.""" + original = "sub-01_task-rest_bold.nii.gz" + result = 
modify_bids_filename(original, task="memory", suffix="T1w") + + assert result == "sub-01_task-memory_T1w.nii.gz" + assert "suffix-" not in result + + def test_modify_bids_filename_complex_modification(): """Test complex modification with many changes.""" original = "sub-01_ses-01_task-rest_acq-mb_bold.nii.gz" From 5cf3f15af48ef959d5b5509b4b35a84464bba169 Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Sun, 30 Nov 2025 09:56:41 -0800 Subject: [PATCH 13/87] working version --- .../narps/narps_megascript.py | 174 ++++++++++++++---- 1 file changed, 143 insertions(+), 31 deletions(-) diff --git a/src/BetterCodeBetterScience/narps/narps_megascript.py b/src/BetterCodeBetterScience/narps/narps_megascript.py index b4d3024..d7c6843 100644 --- a/src/BetterCodeBetterScience/narps/narps_megascript.py +++ b/src/BetterCodeBetterScience/narps/narps_megascript.py @@ -19,10 +19,12 @@ import json import templateflow.api as tflow from nilearn.image import resample_to_img +import pandas as pd +from scipy.stats import norm, t dotenv.load_dotenv() -## Download data +## 1. Download data # - the organized data are available from https://zenodo.org/record/3528329/files/narps_origdata_1.0.tgz assert ( @@ -56,16 +58,8 @@ if not logdir.exists(): logdir.mkdir(parents=True, exist_ok=True) -## Get info about teams and hypotheses -## team dirs are in orig, starting with numeric team IDs -teamdirs = sorted( - [d for d in origdir.iterdir() if d.is_dir() and d.name[0].isdigit()] -) -print(f'Found {len(teamdirs)} team directories.') -team_dict = {d.name: {'orig': d} for d in teamdirs} - -# convert orig data to a BIDS-like organization +## 2. 
convert orig data to a BIDS-like organization overwrite = False datadir = basedir / 'data-teams' @@ -75,9 +69,24 @@ if not datadir.exists(): datadir.mkdir(parents=True, exist_ok=True) +# Get info about teams - we will only process data for hypothesis 1 here +# team dirs are in orig, starting with numeric team IDs + +teamdirs = sorted( + [d for d in origdir.iterdir() if d.is_dir() and d.name[0].isdigit()] +) +print(f'Found {len(teamdirs)} team directories.') +team_dict = {d.name: {'orig': d} for d in teamdirs} + +# we will only process hypothesis 1 for this analysis for the sake of speed +target_hypothesis = 1 + +team_id_to_number = {} + for team_id, paths in team_dict.items(): # only use the team number team_id_short = team_id.split('_')[0] + team_id_to_number[team_id.split('_')[1]] = team_id_short team_orig_dir = paths['orig'] # include "thresh" to prevent some additional files from being detected for type in ['thresh', 'unthresh']: @@ -92,6 +101,8 @@ f'Unexpected hypothesis number format in file {img_file}, skipping.' ) continue + if int(hyp) != target_hypothesis: + continue dest_file = ( datadir / f'team-{team_id_short}_hyp-{hyp}_type-{imgtype}_space-native_desc-orig_stat.nii.gz' @@ -103,7 +114,7 @@ assert parse_bids_filename(dest_file)['hyp'] == hyp assert parse_bids_filename(dest_file)['type'] == imgtype -# QC to identify bad data and move them to excluded data dir +## 3. 
QC to identify bad data and move them to excluded data dir # look for: # - different image dimensions or affine between thresh and unthresh images for a given team/hyp # - missing thresholded images @@ -169,6 +180,9 @@ # check for min_p_direction > error_thresh result['n_thresh_vox'] = int(n_thresh_vox) + # decide whether to rectify based on in-mask values + # and info from researcher survey + if n_thresh_vox > 0: masker = NiftiMasker(mask_img=thresh_img) masker.fit() @@ -203,11 +217,9 @@ json.dump(qc_results, f, indent=4) -# - Get binarized thresholded maps (narps.get_binarized_thresh_masks()) -# - input: thresholded original image -# - output: binarized version - -# Create binarized versions of thresholded images +## 4 - Get binarized thresholded maps +# some thresholded masks have continuous values, so we will binarize +# them at a small threshold (1e-4) print('Creating binarized images...') thresh_images_to_binarize = find_bids_files( @@ -244,10 +256,10 @@ json.dump(results, f, indent=4) -# - Create rectified images - narps.create_rectified_images() -# - input: original image (thresh and unthresh versions) -# - output: rectified images for reverse contrasts -# - NOTE: see logic within get_binarized_thresh_masks +# 5 - Create rectified images +# some unthresh images need to be rectified (i.e. 
multiplied by -1) +# so that they match the hypothesis +# we infer this based on the match between thresh and unthresh images print('Creating rectified images...') results = {} @@ -282,15 +294,13 @@ ) rectified_img.to_filename(str(output_path)) - results[str(unthresh_img_path)] = result + results[str(output_path)] = result with open(logdir / 'rectification_log.json', 'w') as f: json.dump(qc_results, f, indent=4) -# - Get resampled images (narps.get_resampled_images()) -# - input: all image types (thresh, bin, unthresh) -# - output: resampled image in MNI space +## 6: Get resampled images ## first get MNI152NLin2009cAsym template from templateflow mni_template = tflow.get( @@ -348,22 +358,122 @@ with open(logdir / 'resampling_log.json', 'w') as f: json.dump(results, f, indent=4) +## 7: convert resampled unthresh images to z scores +# some teams provided t instead of z scores + + +def TtoZ(data, df=54): + """ + takes a numpy array of t values and converts from t to z + using Hughett's transform + adapted from: + https://github.com/vsoch/TtoZ/blob/master/TtoZ/scripts.py + - default to 54 which is full sample per condition for narps + """ + + # Select just the nonzero voxels + nonzero_vox = data != 0 + nonzero = data[nonzero_vox] + + # We will store our results here + Z = np.zeros(len(nonzero)) + + # Select values less than or == 0, and greater than zero + c = np.zeros(len(nonzero)) + k1 = nonzero <= c + k2 = nonzero > c + + # Subset the data into two sets + t1 = nonzero[k1] + t2 = nonzero[k2] + + # Calculate p values for <=0 + p_values_t1 = t.cdf(t1, df=df) + z_values_t1 = norm.ppf(p_values_t1) + + # Calculate p values for > 0 + p_values_t2 = t.cdf(-t2, df=df) + z_values_t2 = -norm.ppf(p_values_t2) + Z[k1] = z_values_t1 + Z[k2] = z_values_t2 + + # Place the z values back into an array with the original shape + new_nii = np.zeros(data.shape) + new_nii[nonzero_vox] = Z + + return new_nii + + +print('Converting unthresh images to z-scores...') + +# first load the spreadsheet to get the stats types
+stats_types_df = pd.read_csv( + origdir / 'narps_neurovault_images_details_responses_corrected.csv' + ) #.set_index('team_id') +stats_types_df.columns = [ + 'Timestamp', 'team_id', 'software', + 'unthresh_type', 'thresh_type', + 'template', 'h5', 'h6', 'h9', 'comments'] +stats_types_df['team_number'] = [ + team_id_to_number.get(tid, None) for tid in stats_types_df['team_id']] + +stat_type_by_team = {} +for _, row in stats_types_df.iterrows(): + team_number = row['team_number'] + if 't value' in row['unthresh_type'].strip().lower(): + stat_type_by_team[team_number] = 't' + else: + # default to z if unsure - i.e. no conversion + stat_type_by_team[team_number] = 'z' + +for team, stattype in stat_type_by_team.items(): + team_unthresh_images = find_bids_files( + datadir, team=team,type='unthresh', space='MNI152NLin2009cAsym', desc='rectified' + ) + if len(team_unthresh_images) == 0: + continue + if len(team_unthresh_images) > 1: + print(f'Warning: multiple unthresh images found for team {team}, using first one.') + unthresh_img_path = team_unthresh_images[0] + output_path = Path( + modify_bids_filename( + unthresh_img_path, suffix='zstat' + ) + ) + if output_path.exists() and not overwrite: + continue + unthresh_img = nib.load(str(unthresh_img_path)) + unthresh_data = unthresh_img.get_fdata() + converted = False + if stattype == 't': + unthresh_data = TtoZ(unthresh_data, df=54) + converted = True + zstat_img = nib.Nifti1Image( + unthresh_data, unthresh_img.affine, unthresh_img.header + ) + zstat_img.to_filename(str(output_path)) + results[str(output_path)] = { + 'infile': str(unthresh_img_path), + 'original_stat_type': stattype, + } + +with open(logdir / 't_to_z_log.json', 'w') as f: + json.dump(results, f, indent=4) -# - Create concatenated versions of all images - narps.create_concat_imag -# - input: individual 3d images -# - output: combined 4d images for each image type +## 8: Create concatenated versions of all images concat_dir = basedir / 'data-concat' if not 
concat_dir.exists(): concat_dir.mkdir(parents=True, exist_ok=True) results = {} -for hyp in range(1, 10): +for hyp in [target_hypothesis]: print(f'Creating concatenated images for hypothesis {hyp}...') resampled_images = { 'unthresh': find_bids_files( - datadir, type='unthresh', space='MNI152NLin2009cAsym', hyp=str(hyp) + datadir, type='unthresh', space='MNI152NLin2009cAsym', hyp=str(hyp), + suffix='zstat' ), 'thresh': [], } @@ -385,12 +495,13 @@ f'Found {len(team_ids)} unthresh resampled images to concatenate for hypothesis {hyp}.' ) - suffix_dict = {'unthresh': 'stat', 'thresh': 'mask'} + suffix_dict = {'unthresh': 'zstat', 'thresh': 'mask'} for imgtype in ['unthresh', 'thresh']: img_list = [] for img_path in resampled_images[imgtype]: img = nib.load(str(img_path)) img_list.append(img) + img_paths = [p.as_posix() for p in resampled_images[imgtype]] # concatenate along 4th dimension concat_img = nilearn.image.concat_imgs(img_list) @@ -402,7 +513,8 @@ print( f'Saved concatenated {imgtype} image for hypothesis {hyp} to {output_path}' ) - results[str(output_path)] = {'team_ids': team_ids} + results[str(output_path)] = {'infiles': img_paths} with open(logdir / 'concatenation_log.json', 'w') as f: json.dump(results, f, indent=4) + From 9024ea1a360c29a3b0ecd2f97294b5e5234edee2 Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Sun, 30 Nov 2025 10:50:21 -0800 Subject: [PATCH 14/87] minor cleanup, format with blue ? 
--- .../narps/narps_megascript.py | 51 ++++++++++++------- 1 file changed, 32 insertions(+), 19 deletions(-) diff --git a/src/BetterCodeBetterScience/narps/narps_megascript.py b/src/BetterCodeBetterScience/narps/narps_megascript.py index d7c6843..b7e6ddf 100644 --- a/src/BetterCodeBetterScience/narps/narps_megascript.py +++ b/src/BetterCodeBetterScience/narps/narps_megascript.py @@ -217,8 +217,8 @@ json.dump(qc_results, f, indent=4) -## 4 - Get binarized thresholded maps -# some thresholded masks have continuous values, so we will binarize +## 4 - Get binarized thresholded maps +# some thresholded masks have continuous values, so we will binarize # them at a small threshold (1e-4) print('Creating binarized images...') @@ -256,7 +256,7 @@ json.dump(results, f, indent=4) -# 5 - Create rectified images +# 5 - Create rectified images # some unthresh images need to be rectified (i.e. multiplied by -1) # so that they match the hypothesis # we infer this based on the match between thresh and unthresh images @@ -409,13 +409,22 @@ def TtoZ(data, df=54): # first load the spreadsheet to get the stats types stats_types_df = pd.read_csv( origdir / 'narps_neurovault_images_details_responses_corrected.csv' - ) #.set_index('team_id') +) # .set_index('team_id') stats_types_df.columns = [ - 'Timestamp', 'team_id', 'software', - 'unthresh_type', 'thresh_type', - 'template', 'h5', 'h6', 'h9', 'comments'] + 'Timestamp', + 'team_id', + 'software', + 'unthresh_type', + 'thresh_type', + 'template', + 'h5', + 'h6', + 'h9', + 'comments', +] stats_types_df['team_number'] = [ - team_id_to_number.get(tid, None) for tid in stats_types_df['team_id']] + team_id_to_number.get(tid, None) for tid in stats_types_df['team_id'] +] stat_type_by_team = {} for _, row in stats_types_df.iterrows(): @@ -428,18 +437,20 @@ def TtoZ(data, df=54): for team, stattype in stat_type_by_team.items(): team_unthresh_images = find_bids_files( - datadir, team=team,type='unthresh', space='MNI152NLin2009cAsym', 
desc='rectified' + datadir, + team=team, + type='unthresh', + space='MNI152NLin2009cAsym', + desc='rectified', ) if len(team_unthresh_images) == 0: continue if len(team_unthresh_images) > 1: - print(f'Warning: multiple unthresh images found for team {team}, using first one.') - unthresh_img_path = team_unthresh_images[0] - output_path = Path( - modify_bids_filename( - unthresh_img_path, suffix='zstat' + print( + f'Warning: multiple unthresh images found for team {team}, using first one.' ) - ) + unthresh_img_path = team_unthresh_images[0] + output_path = Path(modify_bids_filename(unthresh_img_path, suffix='zstat')) if output_path.exists() and not overwrite: continue unthresh_img = nib.load(str(unthresh_img_path)) @@ -460,7 +471,7 @@ def TtoZ(data, df=54): with open(logdir / 't_to_z_log.json', 'w') as f: json.dump(results, f, indent=4) -## 8: Create concatenated versions of all images +## 8: Create concatenated versions of all images concat_dir = basedir / 'data-concat' if not concat_dir.exists(): @@ -472,8 +483,11 @@ def TtoZ(data, df=54): resampled_images = { 'unthresh': find_bids_files( - datadir, type='unthresh', space='MNI152NLin2009cAsym', hyp=str(hyp), - suffix='zstat' + datadir, + type='unthresh', + space='MNI152NLin2009cAsym', + hyp=str(hyp), + suffix='zstat', ), 'thresh': [], } @@ -517,4 +531,3 @@ def TtoZ(data, df=54): with open(logdir / 'concatenation_log.json', 'w') as f: json.dump(results, f, indent=4) - From 936ea6c6e842656bb17db42a2a4d802c4200f108 Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Thu, 4 Dec 2025 07:27:14 -0800 Subject: [PATCH 15/87] initial workflow development --- book/extras.md | 9 +++++++++ book/workflows.md | 7 +++++-- pyproject.toml | 1 + uv.lock | 11 +++++++++++ 4 files changed, 26 insertions(+), 2 deletions(-) diff --git a/book/extras.md b/book/extras.md index 6d9b932..14a4269 100644 --- a/book/extras.md +++ b/book/extras.md @@ -329,3 +329,12 @@ Found 422 publications containing Memory in the title One very nice feature of 
the document store is that not all records have to have the same keys; this provides a great deal of flexibility at data ingestion. However, too much heterogeneity between documents can make the database hard to work with. One benefit of homogeneity in the document structure is that it allows indexing, which can greatly increase the speed of queries in large document stores. For example, if we know that we will often want to search by the `year` field, then we can add an index for this field: *MORE HERE* + + +### NARPS + +The example comes from a paper that we published in 2020 {cite:p}`Botvinik-Nezer:2020aa`, which involved analysis of data from a large study called the Neuroimaging Analysis Replication and Prediction Study (hereafter *NARPS* for short). The goal of this study was to identify how the results of data analysis varied between different research groups when given the same data. A relatively large neuroimaging dataset was collected and distributed to groups of researchers, who were asked to test a set of nine hypotheses about brain activity in relation to a monetary gambling task that the participants performed during MRI scanning. Seventy teams submitted results, which included their answers to the 9 yes/no hypotheses along with a detailed description of their analysis workflow and a number of outputs from intermediate stages of the analysis. The main finding was that there was a striking amount of variability in the results between teams, even though the raw data were identical. + +The workflow that I will use here starts with the results that the teams submitted, and ends with preprocessed data that are ready for further statistical analysis. I wrote much of the original analysis code for the project, which can be found [here](https://github.com/poldrack/narps). 
This code was written at the point when I was just becoming interested in software engineering practices for science, and while it represents a first step in that direction, it has *a lot* of problems. In particular, it uses the problematic *God object* anti-pattern that I mentioned in an earlier chapter. For the purposes of this chapter I have first rewritten the analysis into a monolithic mega-script, which I will then incrementally refactor into a well-structured workflow. I chose this example because it is relatively complex yet runs quickly on any modern laptop. + + diff --git a/book/workflows.md b/book/workflows.md index 1b56b8e..4d10fdb 100644 --- a/book/workflows.md +++ b/book/workflows.md @@ -30,15 +30,18 @@ Finally, we care about the *efficiency* of the workflow implementation. This inc It's worth noting that these different desiderata will sometimes conflict with one another (such as configurability versus maintainability), and that no workflow will be perfect. +## An example workflow +In this chapter I will use a running example to show how to move from a monolithic analysis script to a well-structured and usable workflow that meets most of the desired features outlined above. -### Breaking a workflow into stages + +## Breaking a workflow into stages good breakpoints between workflow modules include: - conceptual logic - different stages do different things - points where one might need to restart the computation (e.g. 
due to computational cost) -- sections where one might wish to swap in a new method +- sections where one might wish to swap in a new method or different parameterization - points where the output could be reusable elsewhere the workflow should be stateless when possible diff --git a/pyproject.toml b/pyproject.toml index 8f8d3ec..da969f7 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -60,6 +60,7 @@ dependencies = [ "statsmodels>=0.14.5", "blue>=0.9.1", "nilearn>=0.12.1", + "fmriprep-docker>=25.2.3", ] [build-system] diff --git a/uv.lock b/uv.lock index dde3a40..c5951fe 100644 --- a/uv.lock +++ b/uv.lock @@ -312,6 +312,7 @@ dependencies = [ { name = "docutils" }, { name = "fastembed" }, { name = "fastparquet" }, + { name = "fmriprep-docker" }, { name = "gprofiler-official" }, { name = "h5py" }, { name = "hypothesis" }, @@ -371,6 +372,7 @@ requires-dist = [ { name = "docutils", specifier = "==0.17.1" }, { name = "fastembed", specifier = ">=0.7.3" }, { name = "fastparquet", specifier = ">=2024.11.0" }, + { name = "fmriprep-docker", specifier = ">=25.2.3" }, { name = "gprofiler-official", specifier = ">=1.0.0" }, { name = "h5py", specifier = ">=3.15.1" }, { name = "hypothesis", specifier = ">=6.115.3" }, @@ -1326,6 +1328,15 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/ee/1b/00a78aa2e8fbd63f9af08c9c19e6deb3d5d66b4dda677a0f61654680ee89/flatbuffers-25.9.23-py2.py3-none-any.whl", hash = "sha256:255538574d6cb6d0a79a17ec8bc0d30985913b87513a01cce8bcdb6b4c44d0e2", size = 30869, upload-time = "2025-09-24T05:25:28.912Z" }, ] +[[package]] +name = "fmriprep-docker" +version = "25.2.3" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/0f/1a/df92c6e30179895add181faf3a7ce0ee9c7888052177cb0ca29a35e99db6/fmriprep_docker-25.2.3.tar.gz", hash = "sha256:19c9fe7ac860a49142aa51e5351c44af2a68737a3a0c78603e2ef138f718ae93", size = 9278, upload-time = "2025-10-17T18:07:32.242Z" } +wheels = [ + { url = 
"https://files.pythonhosted.org/packages/c3/b3/b442ad416fd14bbc33ef7c595f5cbe119c2b612f2f1b77928be5e5d583dd/fmriprep_docker-25.2.3-py2.py3-none-any.whl", hash = "sha256:16b31baf18fa9bf761e022ec931bde3464a0673d5778c9ed5121ebb4208ae241", size = 10449, upload-time = "2025-10-17T18:07:28.637Z" }, +] + [[package]] name = "fonttools" version = "4.54.1" From eb12af0fe3e3b888ac93dd32edc2cf687b32f3bd Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Mon, 15 Dec 2025 15:11:22 -0800 Subject: [PATCH 16/87] cleanup --- .gitmodules | 4 ---- my_datalad_repo | 1 - 2 files changed, 5 deletions(-) delete mode 160000 my_datalad_repo diff --git a/.gitmodules b/.gitmodules index 5bca6a6..e69de29 100644 --- a/.gitmodules +++ b/.gitmodules @@ -1,4 +0,0 @@ -[submodule "my_datalad_repo"] - path = my_datalad_repo - url = ./my_datalad_repo - datalad-id = 74807713-a6cf-4418-9dfc-e490a881645b diff --git a/my_datalad_repo b/my_datalad_repo deleted file mode 160000 index e8ce63a..0000000 --- a/my_datalad_repo +++ /dev/null @@ -1 +0,0 @@ -Subproject commit e8ce63a3121002619888b8a6c87a693d177d78ea From 1147e5f48e17668b8f97547eeda9bf012221d5a0 Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Mon, 15 Dec 2025 15:26:47 -0800 Subject: [PATCH 17/87] update build cmds --- Makefile | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/Makefile b/Makefile index 7cd3860..35add9a 100644 --- a/Makefile +++ b/Makefile @@ -1,10 +1,11 @@ clean: - rm -rf book/_build -build: clean - uv run jupyter-book build book/ +build-html: clean + myst build --html + npx serve _build/html -pdf: +build-pdf: jupyter-book build book/ --builder pdflatex pipinstall: From b9f4b22aeb53f0f7147dec9becc2409d29862e90 Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Tue, 16 Dec 2025 07:30:47 -0800 Subject: [PATCH 18/87] add deps --- myst.yml | 4 +-- pyproject.toml | 2 ++ uv.lock | 73 ++++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 77 insertions(+), 2 deletions(-) diff --git a/myst.yml 
b/myst.yml index b4c5624..e179bf8 100644 --- a/myst.yml +++ b/myst.yml @@ -8,7 +8,7 @@ project: - book/references.bib exports: - format: pdf - template: https://github.com/myst-templates/plain_typst_book.git + template: plain_latex_book output: exports/book.pdf - format: md - format: docx @@ -30,5 +30,5 @@ project: site: options: logo: logo.png - folders: true +# folders: true template: book-theme diff --git a/pyproject.toml b/pyproject.toml index 795338a..4e1d5b6 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -62,6 +62,8 @@ dependencies = [ "nilearn>=0.12.1", "fmriprep-docker>=25.2.3", "mystmd>=1.7.0", + "mne>=1.11.0", + "mongomock>=4.3.0", ] [build-system] diff --git a/uv.lock b/uv.lock index 5afa41f..a31483c 100644 --- a/uv.lock +++ b/uv.lock @@ -309,7 +309,9 @@ dependencies = [ { name = "mariadb" }, { name = "matplotlib" }, { name = "mdnewline" }, + { name = "mne" }, { name = "monarch-py" }, + { name = "mongomock" }, { name = "mysql-connector-python" }, { name = "mystmd" }, { name = "neo4j" }, @@ -370,7 +372,9 @@ requires-dist = [ { name = "mariadb", specifier = ">=1.1.14" }, { name = "matplotlib", specifier = ">=3.9.2" }, { name = "mdnewline", specifier = ">=0.1.3" }, + { name = "mne", specifier = ">=1.11.0" }, { name = "monarch-py", specifier = ">=1.22.0" }, + { name = "mongomock", specifier = ">=4.3.0" }, { name = "mysql-connector-python", specifier = ">=9.5.0" }, { name = "mystmd", specifier = ">=1.7.0" }, { name = "neo4j", specifier = ">=6.0.3" }, @@ -2437,6 +2441,18 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/82/3d/14ce75ef66813643812f3093ab17e46d3a206942ce7376d31ec2d36229e7/lark-1.3.1-py3-none-any.whl", hash = "sha256:c629b661023a014c37da873b4ff58a817398d12635d3bbb2c5a03be7fe5d1e12", size = 113151, upload-time = "2025-10-27T18:25:54.882Z" }, ] +[[package]] +name = "lazy-loader" +version = "0.4" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "packaging" }, +] +sdist = { url = 
"https://files.pythonhosted.org/packages/6f/6b/c875b30a1ba490860c93da4cabf479e03f584eba06fe5963f6f6644653d8/lazy_loader-0.4.tar.gz", hash = "sha256:47c75182589b91a4e1a85a136c074285a5ad4d9f39c63e0d7fb76391c4574cd1", size = 15431, upload-time = "2024-04-05T13:03:12.261Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/83/60/d497a310bde3f01cb805196ac61b7ad6dc5dcf8dce66634dc34364b20b4f/lazy_loader-0.4-py3-none-any.whl", hash = "sha256:342aa8e14d543a154047afb4ba8ef17f5563baad3fc610d7b15b213b0f119efc", size = 12097, upload-time = "2024-04-05T13:03:10.514Z" }, +] + [[package]] name = "linkml" version = "1.9.3" @@ -2728,6 +2744,26 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/15/fd/f7420e8cbce45c259c770cac5718badf907b302d3a99ec587ba5ce030237/mmh3-5.2.0-cp312-cp312-win_arm64.whl", hash = "sha256:3d6bfd9662a20c054bc216f861fa330c2dac7c81e7fb8307b5e32ab5b9b4d2e0", size = 39350, upload-time = "2025-07-29T07:42:29.794Z" }, ] +[[package]] +name = "mne" +version = "1.11.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "decorator" }, + { name = "jinja2" }, + { name = "lazy-loader" }, + { name = "matplotlib" }, + { name = "numpy" }, + { name = "packaging" }, + { name = "pooch" }, + { name = "scipy" }, + { name = "tqdm" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/9b/4f/ed4c27b6179665235a7ce3a738b24a05cafe9fa67f728994361598b27c2f/mne-1.11.0.tar.gz", hash = "sha256:0a89b8fc44133b81218a35cdcba74ad0f8ae2e265136249b365b9ce04864c688", size = 7152794, upload-time = "2025-11-21T19:34:45.907Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/c3/60/a9b51009f6df38b491a874b3ca5db1ce37395c6327faca1008e71e9814e6/mne-1.11.0-py3-none-any.whl", hash = "sha256:993f25b0c92e563c23cb272c42c6c0298be10f40ed50abe4dd2deeba8d184ac2", size = 7451119, upload-time = "2025-11-21T19:34:43.463Z" }, +] + [[package]] name = "monarch-py" version = "1.23.1" @@ -2755,6 +2791,20 @@ wheels = [ { url = 
"https://files.pythonhosted.org/packages/be/3b/6b13ac39ed9e714c75214ae9d6bb70f272ea65c36e37b8a396ebff20e2da/monarch_py-1.23.1-py3-none-any.whl", hash = "sha256:9cba9cad1c88745fba19b699a5e81aee653802568997c700e7d1707b863aecd1", size = 71086, upload-time = "2025-12-10T04:04:01.81Z" }, ] +[[package]] +name = "mongomock" +version = "4.3.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "packaging" }, + { name = "pytz" }, + { name = "sentinels" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/4d/a4/4a560a9f2a0bec43d5f63104f55bc48666d619ca74825c8ae156b08547cf/mongomock-4.3.0.tar.gz", hash = "sha256:32667b79066fabc12d4f17f16a8fd7361b5f4435208b3ba32c226e52212a8c30", size = 135862, upload-time = "2024-11-16T11:23:25.957Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/94/4d/8bea712978e3aff017a2ab50f262c620e9239cc36f348aae45e48d6a4786/mongomock-4.3.0-py2.py3-none-any.whl", hash = "sha256:5ef86bd12fc8806c6e7af32f21266c61b6c4ba96096f85129852d1c4fec1327e", size = 64891, upload-time = "2024-11-16T11:23:24.748Z" }, +] + [[package]] name = "more-click" version = "0.1.3" @@ -3650,6 +3700,20 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/54/20/4d324d65cc6d9205fabedc306948156824eb9f0ee1633355a8f7ec5c66bf/pluggy-1.6.0-py3-none-any.whl", hash = "sha256:e920276dd6813095e9377c0bc5566d94c932c33b27a3e3945d8389c374dd4746", size = 20538, upload-time = "2025-05-15T12:30:06.134Z" }, ] +[[package]] +name = "pooch" +version = "1.8.2" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "packaging" }, + { name = "platformdirs" }, + { name = "requests" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/c6/77/b3d3e00c696c16cf99af81ef7b1f5fe73bd2a307abca41bd7605429fe6e5/pooch-1.8.2.tar.gz", hash = "sha256:76561f0de68a01da4df6af38e9955c4c9d1a5c90da73f7e40276a5728ec83d10", size = 59353, upload-time = "2024-06-06T16:53:46.224Z" } +wheels = [ + { url = 
"https://files.pythonhosted.org/packages/a8/87/77cc11c7a9ea9fd05503def69e3d18605852cd0d4b0d3b8f15bbeb3ef1d1/pooch-1.8.2-py3-none-any.whl", hash = "sha256:3529a57096f7198778a5ceefd5ac3ef0e4d06a6ddaf9fc2d609b806f25302c47", size = 64574, upload-time = "2024-06-06T16:53:44.343Z" }, +] + [[package]] name = "posthog" version = "5.4.0" @@ -4790,6 +4854,15 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/40/b0/4562db6223154aa4e22f939003cb92514c79f3d4dccca3444253fd17f902/Send2Trash-1.8.3-py3-none-any.whl", hash = "sha256:0c31227e0bd08961c7665474a3d1ef7193929fedda4233843689baa056be46c9", size = 18072, upload-time = "2024-04-07T00:01:07.438Z" }, ] +[[package]] +name = "sentinels" +version = "1.1.1" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/6f/9b/07195878aa25fe6ed209ec74bc55ae3e3d263b60a489c6e73fdca3c8fe05/sentinels-1.1.1.tar.gz", hash = "sha256:3c2f64f754187c19e0a1a029b148b74cf58dd12ec27b4e19c0e5d6e22b5a9a86", size = 4393, upload-time = "2025-08-12T07:57:50.26Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/49/65/dea992c6a97074f6d8ff9eab34741298cac2ce23e2b6c74fb7d08afdf85c/sentinels-1.1.1-py3-none-any.whl", hash = "sha256:835d3b28f3b47f5284afa4bf2db6e00f2dc5f80f9923d4b7e7aeeeccf6146a11", size = 3744, upload-time = "2025-08-12T07:57:48.858Z" }, +] + [[package]] name = "setuptools" version = "80.9.0" From 389465310ff0c0091a45f177023b1f4d16391a8a Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Tue, 16 Dec 2025 08:01:40 -0800 Subject: [PATCH 19/87] initial add --- .../LifeSnaps_example.ipynb | 219 ++++++++++++++++++ 1 file changed, 219 insertions(+) create mode 100644 src/BetterCodeBetterScience/LifeSnaps_example.ipynb diff --git a/src/BetterCodeBetterScience/LifeSnaps_example.ipynb b/src/BetterCodeBetterScience/LifeSnaps_example.ipynb new file mode 100644 index 0000000..bdab0ca --- /dev/null +++ b/src/BetterCodeBetterScience/LifeSnaps_example.ipynb @@ -0,0 +1,219 @@ +{ + 
"cells": [ + { + "cell_type": "code", + "execution_count": 8, + "id": "5397a69c", + "metadata": {}, + "outputs": [], + "source": [ + "import pymongo\n", + "from pathlib import Path\n", + "import os\n", + "\n", + "\n", + "db_import_dir = Path('/Users/poldrack/data_unsynced/LifeSnaps/rais_anonymized/mongo_rais_anonymized')\n" + ] + }, + { + "cell_type": "markdown", + "id": "9318b6f9", + "metadata": {}, + "source": [ + "## Step 1: load mongodb data from bson\n", + "\n", + "There are three data files:\n", + "\n", + "- fitbit.bson\n", + "- sema.bson\n", + "- surveys.bson\n", + "\n", + "we load these into the local mongodb using mongorestore.\n", + "\n", + "the id/user_id entry in the mongo records refers to the subject id\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1c55e4bf", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Collection 'fitbit' already loaded with 71284346 documents.\n", + "Collection 'sema' already loaded with 15380 documents.\n", + "Collection 'surveys' already loaded with 935 documents.\n" + ] + } + ], + "source": [ + "# Connect to MongoDB\n", + "def get_mongo_client(host='localhost', port=27017):\n", + " try:\n", + " client = pymongo.MongoClient(f\"mongodb://{host}:{port}/\")\n", + " except pymongo.errors.ConnectionFailure as e:\n", + " raise Exception(f\"Error connecting to MongoDB - have you set it up yet?: {e}\")\n", + " return client\n", + "\n", + "client = get_mongo_client()\n", + "\n", + "# load the database and import data if necessary\n", + "db = client['lifesnaps']\n", + "collection_lengths = {\n", + " 'fitbit': 71284346,\n", + " 'sema': 15380,\n", + " 'surveys': 935\n", + "}\n", + "\n", + "overwrite = False\n", + "\n", + "for collection_name, expected_length in collection_lengths.items():\n", + " collection = db[collection_name]\n", + " actual_length = collection.count_documents({})\n", + " # use ge since we will be removing some objects below\n", + " if
actual_length >= expected_length and not overwrite:\n", + " print(f\"Collection '{collection_name}' already loaded with {actual_length} documents.\")\n", + " else:\n", + " # import the data from the BSON file\n", + " print(f\"Collection '{collection_name}' has {actual_length} documents, expected {expected_length}. Importing data...\")\n", + " import_file = db_import_dir / f\"{collection_name}.bson\"\n", + " if not import_file.exists():\n", + " raise FileNotFoundError(f\"Import file {import_file} does not exist.\")\n", + " print(f\"Importing data into collection '{collection_name}' from {import_file}...\")\n", + " command = f\"mongorestore --host {client.address[0]} --port {client.address[1]} --db lifesnaps --collection {collection_name} --drop {import_file}\"\n", + " print(f\"Running command: {command}\")\n", + " os.system(command)\n", + " \n", + " collection = db[collection_name]\n", + " actual_length = collection.count_documents({})\n", + " assert actual_length >= expected_length, f\"After import, collection '{collection_name}' has {actual_length} documents, expected {expected_length}.\"\n", + " print(f\"Successfully imported collection '{collection_name}' with {actual_length} documents.\")\n", + " " + ] + }, + { + "cell_type": "markdown", + "id": "0729c2ec", + "metadata": {}, + "source": [ + "### Step 2: remove unnecessary entries from fitbit database\n", + "\n", + "The fitbit store is huge and we don't need many of the entries, so let's remove them.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "56ad6fcb", + "metadata": {}, + "source": [] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "a2dd2b2e", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Removed 9844973 unwanted documents from 'fitbit' collection.\n" + ] + } + ], + "source": [ + "fitbit_types_to_keep = [\n", + " \"Profile\",\n", + " \"heart_rate\",\n", + " \"sleep\",\n", + " \"steps\",\n", + " \"lightly_active_minutes\",\n", 
+ " \"moderately_active_minutes\",\n", + " \"very_active_minutes\",\n", + " \"sedentary_minutes\",\n", + " \"calories\",\n", + "]\n", + "\n", + "# remove unwanted fitbit data\n", + "fitbit_collection = db['fitbit']\n", + "deletion_result = fitbit_collection.delete_many({\"type\": {\"$nin\": fitbit_types_to_keep}})\n", + "print(f\"Removed {deletion_result.deleted_count} unwanted documents from 'fitbit' collection.\")" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "3752b2db", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'Profile': 69,\n", + " 'calories': 9675782,\n", + " 'heart_rate': 48720040,\n", + " 'lightly_active_minutes': 7203,\n", + " 'moderately_active_minutes': 7203,\n", + " 'sedentary_minutes': 7203,\n", + " 'sleep': 4141,\n", + " 'steps': 3010529,\n", + " 'very_active_minutes': 7203}" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def get_collection_type_counts(db, collection_name, sample_size=3):\n", + " # Get distinct types in a collection and sample documents for each type\n", + " collection = db[collection_name]\n", + " distinct_types = collection.distinct(\"type\")\n", + " type_counts = {}\n", + " for dtype in distinct_types:\n", + " count = collection.count_documents({\"type\": dtype})\n", + " sample_docs = list(collection.find({\"type\": dtype}).limit(sample_size))\n", + " type_counts[dtype] = count\n", + " \n", + " return type_counts\n", + "\n", + "get_collection_type_counts(db, 'fitbit')\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "11b8f064", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "BetterCodeBetterScience", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + 
"nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.0" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} From 98e22acdf92aac1443bd1367c8a542e65a69d7af Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Tue, 16 Dec 2025 15:36:27 -0800 Subject: [PATCH 20/87] remove 'one of us' --- book/software_engineering.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/book/software_engineering.md b/book/software_engineering.md index 90a9bd8..b6b1575 100644 --- a/book/software_engineering.md +++ b/book/software_engineering.md @@ -69,7 +69,7 @@ User stories are also useful for thinking through the potential impact of new fe Perhaps the most common example of violations of YAGNI comes about in the development of visualization tools. In this example, the developer might decide to create an visualizer to show how the original dataset is being converted into the new format, with interactive features that would allow the user to view features of individual files. The question that you should always ask yourself is: What user stories would this feature address? If it's difficult to come up with stories that make clear how the feature would help solve particular problems for users, then the feature is probably not needed. "If you build it, they will come" might work in baseball, but it rarely works in scientific software. -This is the reason that one of us (RP) regularly tells his trainees to post a note in their workspace with one simple mantra: "MVP". +This is the reason that I regularly tell my trainees to post a note in their workspace with one simple mantra: "MVP". ## Refactoring code @@ -954,7 +954,7 @@ def get_subject_label(file): return None ``` -When one of us asked the question "Should there ever be a file path that doesn't include a subject label?", the answer was "No", meaning that this code allows what amounts to an error to occur without announcing its presence.
+When I asked the question "Should there ever be a file path that doesn't include a subject label?", the answer was "No", meaning that this code allows what amounts to an error to occur without announcing its presence. When we looked at the place where this function was used in the code, there was no check for whether the output was `None`, meaning that such an error would go unnoticed until it caused an error later when `subject_label` was assumed to be a string. Also note that the docstring for this function is misleading, as it states that a message will be printed if the return value is `None`, but no message is actually printed. In general, printing a message is a poor way to signal the potential presence of a problem, particularly if the code has a large amount of text output in which the message might be lost. From 0abbf6d02b23e57368a1e0f4a795f5b90f39f418 Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Wed, 17 Dec 2025 08:39:11 -0800 Subject: [PATCH 21/87] add link check --- Makefile | 3 + pyproject.toml | 5 ++ uv.lock | 173 +++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 181 insertions(+) diff --git a/Makefile b/Makefile index 35add9a..cadd7fe 100644 --- a/Makefile +++ b/Makefile @@ -8,6 +8,9 @@ build-html: clean build-pdf: jupyter-book build book/ --builder pdflatex +check-links: + check-links + pipinstall: uv pip install -r pyproject.toml uv pip install -e . 
diff --git a/pyproject.toml b/pyproject.toml index 4e1d5b6..62b01fc 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -64,12 +64,17 @@ dependencies = [ "mystmd>=1.7.0", "mne>=1.11.0", "mongomock>=4.3.0", + "linkcheckmd>=1.4.0", ] [build-system] requires = ["hatchling"] build-backend = "hatchling.build" +# add script entry points here +[project.scripts] +check-links = "BetterCodeBetterScience.check_links:main" + [tool.codespell] # Ref: https://github.com/codespell-project/codespell#using-a-config-file skip = './book/transcripts,./book/_build/html,.js*,.git*,*.lock,*.bib,.venv*,*.ipynb,*.json' diff --git a/uv.lock b/uv.lock index a31483c..0c89da9 100644 --- a/uv.lock +++ b/uv.lock @@ -29,6 +29,62 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/39/e8/806475fe4cdfd8635535d3fa11bd61d19b7cc94b61b9147ebdd2ab4cbbee/acres-0.5.0-py3-none-any.whl", hash = "sha256:fcc32b974b510897de0f041609b4234f9ff03e2e960aea088f63973fb106c772", size = 12703, upload-time = "2025-06-04T12:40:28.745Z" }, ] +[[package]] +name = "aiohappyeyeballs" +version = "2.6.1" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/26/30/f84a107a9c4331c14b2b586036f40965c128aa4fee4dda5d3d51cb14ad54/aiohappyeyeballs-2.6.1.tar.gz", hash = "sha256:c3f9d0113123803ccadfdf3f0faa505bc78e6a72d1cc4806cbd719826e943558", size = 22760, upload-time = "2025-03-12T01:42:48.764Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/0f/15/5bf3b99495fb160b63f95972b81750f18f7f4e02ad051373b669d17d44f2/aiohappyeyeballs-2.6.1-py3-none-any.whl", hash = "sha256:f349ba8f4b75cb25c99c5c2d84e997e485204d2902a9597802b0371f09331fb8", size = 15265, upload-time = "2025-03-12T01:42:47.083Z" }, +] + +[[package]] +name = "aiohttp" +version = "3.13.2" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "aiohappyeyeballs" }, + { name = "aiosignal" }, + { name = "attrs" }, + { name = "frozenlist" }, + { name = "multidict" }, + { name = 
"propcache" }, + { name = "yarl" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/1c/ce/3b83ebba6b3207a7135e5fcaba49706f8a4b6008153b4e30540c982fae26/aiohttp-3.13.2.tar.gz", hash = "sha256:40176a52c186aefef6eb3cad2cdd30cd06e3afbe88fe8ab2af9c0b90f228daca", size = 7837994, upload-time = "2025-10-28T20:59:39.937Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/29/9b/01f00e9856d0a73260e86dd8ed0c2234a466c5c1712ce1c281548df39777/aiohttp-3.13.2-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:b1e56bab2e12b2b9ed300218c351ee2a3d8c8fdab5b1ec6193e11a817767e47b", size = 737623, upload-time = "2025-10-28T20:56:30.797Z" }, + { url = "https://files.pythonhosted.org/packages/5a/1b/4be39c445e2b2bd0aab4ba736deb649fabf14f6757f405f0c9685019b9e9/aiohttp-3.13.2-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:364e25edaabd3d37b1db1f0cbcee8c73c9a3727bfa262b83e5e4cf3489a2a9dc", size = 492664, upload-time = "2025-10-28T20:56:32.708Z" }, + { url = "https://files.pythonhosted.org/packages/28/66/d35dcfea8050e131cdd731dff36434390479b4045a8d0b9d7111b0a968f1/aiohttp-3.13.2-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:c5c94825f744694c4b8db20b71dba9a257cd2ba8e010a803042123f3a25d50d7", size = 491808, upload-time = "2025-10-28T20:56:34.57Z" }, + { url = "https://files.pythonhosted.org/packages/00/29/8e4609b93e10a853b65f8291e64985de66d4f5848c5637cddc70e98f01f8/aiohttp-3.13.2-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:ba2715d842ffa787be87cbfce150d5e88c87a98e0b62e0f5aa489169a393dbbb", size = 1738863, upload-time = "2025-10-28T20:56:36.377Z" }, + { url = "https://files.pythonhosted.org/packages/9d/fa/4ebdf4adcc0def75ced1a0d2d227577cd7b1b85beb7edad85fcc87693c75/aiohttp-3.13.2-cp312-cp312-manylinux2014_armv7l.manylinux_2_17_armv7l.manylinux_2_31_armv7l.whl", hash = "sha256:585542825c4bc662221fb257889e011a5aa00f1ae4d75d1d246a5225289183e3", size = 1700586, upload-time = "2025-10-28T20:56:38.034Z" }, + 
{ url = "https://files.pythonhosted.org/packages/da/04/73f5f02ff348a3558763ff6abe99c223381b0bace05cd4530a0258e52597/aiohttp-3.13.2-cp312-cp312-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:39d02cb6025fe1aabca329c5632f48c9532a3dabccd859e7e2f110668972331f", size = 1768625, upload-time = "2025-10-28T20:56:39.75Z" }, + { url = "https://files.pythonhosted.org/packages/f8/49/a825b79ffec124317265ca7d2344a86bcffeb960743487cb11988ffb3494/aiohttp-3.13.2-cp312-cp312-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:e67446b19e014d37342f7195f592a2a948141d15a312fe0e700c2fd2f03124f6", size = 1867281, upload-time = "2025-10-28T20:56:41.471Z" }, + { url = "https://files.pythonhosted.org/packages/b9/48/adf56e05f81eac31edcfae45c90928f4ad50ef2e3ea72cb8376162a368f8/aiohttp-3.13.2-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:4356474ad6333e41ccefd39eae869ba15a6c5299c9c01dfdcfdd5c107be4363e", size = 1752431, upload-time = "2025-10-28T20:56:43.162Z" }, + { url = "https://files.pythonhosted.org/packages/30/ab/593855356eead019a74e862f21523db09c27f12fd24af72dbc3555b9bfd9/aiohttp-3.13.2-cp312-cp312-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:eeacf451c99b4525f700f078becff32c32ec327b10dcf31306a8a52d78166de7", size = 1562846, upload-time = "2025-10-28T20:56:44.85Z" }, + { url = "https://files.pythonhosted.org/packages/39/0f/9f3d32271aa8dc35036e9668e31870a9d3b9542dd6b3e2c8a30931cb27ae/aiohttp-3.13.2-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:d8a9b889aeabd7a4e9af0b7f4ab5ad94d42e7ff679aaec6d0db21e3b639ad58d", size = 1699606, upload-time = "2025-10-28T20:56:46.519Z" }, + { url = "https://files.pythonhosted.org/packages/2c/3c/52d2658c5699b6ef7692a3f7128b2d2d4d9775f2a68093f74bca06cf01e1/aiohttp-3.13.2-cp312-cp312-musllinux_1_2_armv7l.whl", hash = "sha256:fa89cb11bc71a63b69568d5b8a25c3ca25b6d54c15f907ca1c130d72f320b76b", size = 1720663, 
upload-time = "2025-10-28T20:56:48.528Z" }, + { url = "https://files.pythonhosted.org/packages/9b/d4/8f8f3ff1fb7fb9e3f04fcad4e89d8a1cd8fc7d05de67e3de5b15b33008ff/aiohttp-3.13.2-cp312-cp312-musllinux_1_2_ppc64le.whl", hash = "sha256:8aa7c807df234f693fed0ecd507192fc97692e61fee5702cdc11155d2e5cadc8", size = 1737939, upload-time = "2025-10-28T20:56:50.77Z" }, + { url = "https://files.pythonhosted.org/packages/03/d3/ddd348f8a27a634daae39a1b8e291ff19c77867af438af844bf8b7e3231b/aiohttp-3.13.2-cp312-cp312-musllinux_1_2_riscv64.whl", hash = "sha256:9eb3e33fdbe43f88c3c75fa608c25e7c47bbd80f48d012763cb67c47f39a7e16", size = 1555132, upload-time = "2025-10-28T20:56:52.568Z" }, + { url = "https://files.pythonhosted.org/packages/39/b8/46790692dc46218406f94374903ba47552f2f9f90dad554eed61bfb7b64c/aiohttp-3.13.2-cp312-cp312-musllinux_1_2_s390x.whl", hash = "sha256:9434bc0d80076138ea986833156c5a48c9c7a8abb0c96039ddbb4afc93184169", size = 1764802, upload-time = "2025-10-28T20:56:54.292Z" }, + { url = "https://files.pythonhosted.org/packages/ba/e4/19ce547b58ab2a385e5f0b8aa3db38674785085abcf79b6e0edd1632b12f/aiohttp-3.13.2-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:ff15c147b2ad66da1f2cbb0622313f2242d8e6e8f9b79b5206c84523a4473248", size = 1719512, upload-time = "2025-10-28T20:56:56.428Z" }, + { url = "https://files.pythonhosted.org/packages/70/30/6355a737fed29dcb6dfdd48682d5790cb5eab050f7b4e01f49b121d3acad/aiohttp-3.13.2-cp312-cp312-win32.whl", hash = "sha256:27e569eb9d9e95dbd55c0fc3ec3a9335defbf1d8bc1d20171a49f3c4c607b93e", size = 426690, upload-time = "2025-10-28T20:56:58.736Z" }, + { url = "https://files.pythonhosted.org/packages/0a/0d/b10ac09069973d112de6ef980c1f6bb31cb7dcd0bc363acbdad58f927873/aiohttp-3.13.2-cp312-cp312-win_amd64.whl", hash = "sha256:8709a0f05d59a71f33fd05c17fc11fcb8c30140506e13c2f5e8ee1b8964e1b45", size = 453465, upload-time = "2025-10-28T20:57:00.795Z" }, +] + +[[package]] +name = "aiosignal" +version = "1.4.0" +source = { registry = 
"https://pypi.org/simple" } +dependencies = [ + { name = "frozenlist" }, + { name = "typing-extensions" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/61/62/06741b579156360248d1ec624842ad0edf697050bbaf7c3e46394e106ad1/aiosignal-1.4.0.tar.gz", hash = "sha256:f47eecd9468083c2029cc99945502cb7708b082c232f9aca65da147157b251c7", size = 25007, upload-time = "2025-07-03T22:54:43.528Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/fb/76/641ae371508676492379f16e2fa48f4e2c11741bd63c48be4b12a6b09cba/aiosignal-1.4.0-py3-none-any.whl", hash = "sha256:053243f8b92b990551949e63930a839ff0cf0b0ebbe0597b0f3fb19e1a0fe82e", size = 7490, upload-time = "2025-07-03T22:54:42.156Z" }, +] + [[package]] name = "airium" version = "0.2.7" @@ -306,6 +362,7 @@ dependencies = [ { name = "jupyter" }, { name = "jupyter-book" }, { name = "jupytext" }, + { name = "linkcheckmd" }, { name = "mariadb" }, { name = "matplotlib" }, { name = "mdnewline" }, @@ -369,6 +426,7 @@ requires-dist = [ { name = "jupyter", specifier = ">=1.1.1" }, { name = "jupyter-book", specifier = ">=1.0.2" }, { name = "jupytext", specifier = ">=1.16.4" }, + { name = "linkcheckmd", specifier = ">=1.4.0" }, { name = "mariadb", specifier = ">=1.1.14" }, { name = "matplotlib", specifier = ">=3.9.2" }, { name = "mdnewline", specifier = ">=0.1.3" }, @@ -1394,6 +1452,31 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/38/74/f94141b38a51a553efef7f510fc213894161ae49b88bffd037f8d2a7cb2f/frozendict-2.4.7-py3-none-any.whl", hash = "sha256:972af65924ea25cf5b4d9326d549e69a9a4918d8a76a9d3a7cd174d98b237550", size = 16264, upload-time = "2025-11-11T22:40:12.836Z" }, ] +[[package]] +name = "frozenlist" +version = "1.8.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/2d/f5/c831fac6cc817d26fd54c7eaccd04ef7e0288806943f7cc5bbf69f3ac1f0/frozenlist-1.8.0.tar.gz", hash = "sha256:3ede829ed8d842f6cd48fc7081d7a41001a56f1f38603f9d49bf3020d59a31ad", 
size = 45875, upload-time = "2025-10-06T05:38:17.865Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/69/29/948b9aa87e75820a38650af445d2ef2b6b8a6fab1a23b6bb9e4ef0be2d59/frozenlist-1.8.0-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:78f7b9e5d6f2fdb88cdde9440dc147259b62b9d3b019924def9f6478be254ac1", size = 87782, upload-time = "2025-10-06T05:36:06.649Z" }, + { url = "https://files.pythonhosted.org/packages/64/80/4f6e318ee2a7c0750ed724fa33a4bdf1eacdc5a39a7a24e818a773cd91af/frozenlist-1.8.0-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:229bf37d2e4acdaf808fd3f06e854a4a7a3661e871b10dc1f8f1896a3b05f18b", size = 50594, upload-time = "2025-10-06T05:36:07.69Z" }, + { url = "https://files.pythonhosted.org/packages/2b/94/5c8a2b50a496b11dd519f4a24cb5496cf125681dd99e94c604ccdea9419a/frozenlist-1.8.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:f833670942247a14eafbb675458b4e61c82e002a148f49e68257b79296e865c4", size = 50448, upload-time = "2025-10-06T05:36:08.78Z" }, + { url = "https://files.pythonhosted.org/packages/6a/bd/d91c5e39f490a49df14320f4e8c80161cfcce09f1e2cde1edd16a551abb3/frozenlist-1.8.0-cp312-cp312-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:494a5952b1c597ba44e0e78113a7266e656b9794eec897b19ead706bd7074383", size = 242411, upload-time = "2025-10-06T05:36:09.801Z" }, + { url = "https://files.pythonhosted.org/packages/8f/83/f61505a05109ef3293dfb1ff594d13d64a2324ac3482be2cedc2be818256/frozenlist-1.8.0-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:96f423a119f4777a4a056b66ce11527366a8bb92f54e541ade21f2374433f6d4", size = 243014, upload-time = "2025-10-06T05:36:11.394Z" }, + { url = "https://files.pythonhosted.org/packages/d8/cb/cb6c7b0f7d4023ddda30cf56b8b17494eb3a79e3fda666bf735f63118b35/frozenlist-1.8.0-cp312-cp312-manylinux2014_armv7l.manylinux_2_17_armv7l.manylinux_2_31_armv7l.whl", hash = 
"sha256:3462dd9475af2025c31cc61be6652dfa25cbfb56cbbf52f4ccfe029f38decaf8", size = 234909, upload-time = "2025-10-06T05:36:12.598Z" }, + { url = "https://files.pythonhosted.org/packages/31/c5/cd7a1f3b8b34af009fb17d4123c5a778b44ae2804e3ad6b86204255f9ec5/frozenlist-1.8.0-cp312-cp312-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:c4c800524c9cd9bac5166cd6f55285957fcfc907db323e193f2afcd4d9abd69b", size = 250049, upload-time = "2025-10-06T05:36:14.065Z" }, + { url = "https://files.pythonhosted.org/packages/c0/01/2f95d3b416c584a1e7f0e1d6d31998c4a795f7544069ee2e0962a4b60740/frozenlist-1.8.0-cp312-cp312-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:d6a5df73acd3399d893dafc71663ad22534b5aa4f94e8a2fabfe856c3c1b6a52", size = 256485, upload-time = "2025-10-06T05:36:15.39Z" }, + { url = "https://files.pythonhosted.org/packages/ce/03/024bf7720b3abaebcff6d0793d73c154237b85bdf67b7ed55e5e9596dc9a/frozenlist-1.8.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:405e8fe955c2280ce66428b3ca55e12b3c4e9c336fb2103a4937e891c69a4a29", size = 237619, upload-time = "2025-10-06T05:36:16.558Z" }, + { url = "https://files.pythonhosted.org/packages/69/fa/f8abdfe7d76b731f5d8bd217827cf6764d4f1d9763407e42717b4bed50a0/frozenlist-1.8.0-cp312-cp312-musllinux_1_2_armv7l.whl", hash = "sha256:908bd3f6439f2fef9e85031b59fd4f1297af54415fb60e4254a95f75b3cab3f3", size = 250320, upload-time = "2025-10-06T05:36:17.821Z" }, + { url = "https://files.pythonhosted.org/packages/f5/3c/b051329f718b463b22613e269ad72138cc256c540f78a6de89452803a47d/frozenlist-1.8.0-cp312-cp312-musllinux_1_2_ppc64le.whl", hash = "sha256:294e487f9ec720bd8ffcebc99d575f7eff3568a08a253d1ee1a0378754b74143", size = 246820, upload-time = "2025-10-06T05:36:19.046Z" }, + { url = "https://files.pythonhosted.org/packages/0f/ae/58282e8f98e444b3f4dd42448ff36fa38bef29e40d40f330b22e7108f565/frozenlist-1.8.0-cp312-cp312-musllinux_1_2_s390x.whl", hash = 
"sha256:74c51543498289c0c43656701be6b077f4b265868fa7f8a8859c197006efb608", size = 250518, upload-time = "2025-10-06T05:36:20.763Z" }, + { url = "https://files.pythonhosted.org/packages/8f/96/007e5944694d66123183845a106547a15944fbbb7154788cbf7272789536/frozenlist-1.8.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:776f352e8329135506a1d6bf16ac3f87bc25b28e765949282dcc627af36123aa", size = 239096, upload-time = "2025-10-06T05:36:22.129Z" }, + { url = "https://files.pythonhosted.org/packages/66/bb/852b9d6db2fa40be96f29c0d1205c306288f0684df8fd26ca1951d461a56/frozenlist-1.8.0-cp312-cp312-win32.whl", hash = "sha256:433403ae80709741ce34038da08511d4a77062aa924baf411ef73d1146e74faf", size = 39985, upload-time = "2025-10-06T05:36:23.661Z" }, + { url = "https://files.pythonhosted.org/packages/b8/af/38e51a553dd66eb064cdf193841f16f077585d4d28394c2fa6235cb41765/frozenlist-1.8.0-cp312-cp312-win_amd64.whl", hash = "sha256:34187385b08f866104f0c0617404c8eb08165ab1272e884abc89c112e9c00746", size = 44591, upload-time = "2025-10-06T05:36:24.958Z" }, + { url = "https://files.pythonhosted.org/packages/a7/06/1dc65480ab147339fecc70797e9c2f69d9cea9cf38934ce08df070fdb9cb/frozenlist-1.8.0-cp312-cp312-win_arm64.whl", hash = "sha256:fe3c58d2f5db5fbd18c2987cba06d51b0529f52bc3a6cdc33d3f4eab725104bd", size = 40102, upload-time = "2025-10-06T05:36:26.333Z" }, + { url = "https://files.pythonhosted.org/packages/9a/9a/e35b4a917281c0b8419d4207f4334c8e8c5dbf4f3f5f9ada73958d937dcc/frozenlist-1.8.0-py3-none-any.whl", hash = "sha256:0c18a16eab41e82c295618a77502e17b195883241c563b00f0aa5106fc4eaa0d", size = 13409, upload-time = "2025-10-06T05:38:16.721Z" }, +] + [[package]] name = "fsspec" version = "2025.12.0" @@ -2453,6 +2536,15 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/83/60/d497a310bde3f01cb805196ac61b7ad6dc5dcf8dce66634dc34364b20b4f/lazy_loader-0.4-py3-none-any.whl", hash = "sha256:342aa8e14d543a154047afb4ba8ef17f5563baad3fc610d7b15b213b0f119efc", size = 12097, upload-time = 
"2024-04-05T13:03:10.514Z" }, ] +[[package]] +name = "linkcheckmd" +version = "1.4.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "aiohttp" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/59/ea/34da82a2c946699e18275b19a97464046f1193299c42401ae7b088108eed/linkcheckmd-1.4.0.tar.gz", hash = "sha256:3a539c9a4e11697fc7fcc269d379accf93c8cccbf971f3cea0bae40912d9f609", size = 10760, upload-time = "2021-02-28T02:50:22.504Z" } + [[package]] name = "linkml" version = "1.9.3" @@ -2852,6 +2944,33 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/c5/31/5b1a1f70eb0e87d1678e9624908f86317787b536060641d6798e3cf70ace/msgpack-1.1.2-cp312-cp312-win_arm64.whl", hash = "sha256:be5980f3ee0e6bd44f3a9e9dea01054f175b50c3e6cdb692bc9424c0bbb8bf69", size = 64119, upload-time = "2025-10-08T09:15:13.589Z" }, ] +[[package]] +name = "multidict" +version = "6.7.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/80/1e/5492c365f222f907de1039b91f922b93fa4f764c713ee858d235495d8f50/multidict-6.7.0.tar.gz", hash = "sha256:c6e99d9a65ca282e578dfea819cfa9c0a62b2499d8677392e09feaf305e9e6f5", size = 101834, upload-time = "2025-10-06T14:52:30.657Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/c2/9e/9f61ac18d9c8b475889f32ccfa91c9f59363480613fc807b6e3023d6f60b/multidict-6.7.0-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:8a3862568a36d26e650a19bb5cbbba14b71789032aebc0423f8cc5f150730184", size = 76877, upload-time = "2025-10-06T14:49:20.884Z" }, + { url = "https://files.pythonhosted.org/packages/38/6f/614f09a04e6184f8824268fce4bc925e9849edfa654ddd59f0b64508c595/multidict-6.7.0-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:960c60b5849b9b4f9dcc9bea6e3626143c252c74113df2c1540aebce70209b45", size = 45467, upload-time = "2025-10-06T14:49:22.054Z" }, + { url = 
"https://files.pythonhosted.org/packages/b3/93/c4f67a436dd026f2e780c433277fff72be79152894d9fc36f44569cab1a6/multidict-6.7.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:2049be98fb57a31b4ccf870bf377af2504d4ae35646a19037ec271e4c07998aa", size = 43834, upload-time = "2025-10-06T14:49:23.566Z" }, + { url = "https://files.pythonhosted.org/packages/7f/f5/013798161ca665e4a422afbc5e2d9e4070142a9ff8905e482139cd09e4d0/multidict-6.7.0-cp312-cp312-manylinux1_i686.manylinux_2_28_i686.manylinux_2_5_i686.whl", hash = "sha256:0934f3843a1860dd465d38895c17fce1f1cb37295149ab05cd1b9a03afacb2a7", size = 250545, upload-time = "2025-10-06T14:49:24.882Z" }, + { url = "https://files.pythonhosted.org/packages/71/2f/91dbac13e0ba94669ea5119ba267c9a832f0cb65419aca75549fcf09a3dc/multidict-6.7.0-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:b3e34f3a1b8131ba06f1a73adab24f30934d148afcd5f5de9a73565a4404384e", size = 258305, upload-time = "2025-10-06T14:49:26.778Z" }, + { url = "https://files.pythonhosted.org/packages/ef/b0/754038b26f6e04488b48ac621f779c341338d78503fb45403755af2df477/multidict-6.7.0-cp312-cp312-manylinux2014_armv7l.manylinux_2_17_armv7l.manylinux_2_31_armv7l.whl", hash = "sha256:efbb54e98446892590dc2458c19c10344ee9a883a79b5cec4bc34d6656e8d546", size = 242363, upload-time = "2025-10-06T14:49:28.562Z" }, + { url = "https://files.pythonhosted.org/packages/87/15/9da40b9336a7c9fa606c4cf2ed80a649dffeb42b905d4f63a1d7eb17d746/multidict-6.7.0-cp312-cp312-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:a35c5fc61d4f51eb045061e7967cfe3123d622cd500e8868e7c0c592a09fedc4", size = 268375, upload-time = "2025-10-06T14:49:29.96Z" }, + { url = "https://files.pythonhosted.org/packages/82/72/c53fcade0cc94dfaad583105fd92b3a783af2091eddcb41a6d5a52474000/multidict-6.7.0-cp312-cp312-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = 
"sha256:29fe6740ebccba4175af1b9b87bf553e9c15cd5868ee967e010efcf94e4fd0f1", size = 269346, upload-time = "2025-10-06T14:49:31.404Z" }, + { url = "https://files.pythonhosted.org/packages/0d/e2/9baffdae21a76f77ef8447f1a05a96ec4bc0a24dae08767abc0a2fe680b8/multidict-6.7.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:123e2a72e20537add2f33a79e605f6191fba2afda4cbb876e35c1a7074298a7d", size = 256107, upload-time = "2025-10-06T14:49:32.974Z" }, + { url = "https://files.pythonhosted.org/packages/3c/06/3f06f611087dc60d65ef775f1fb5aca7c6d61c6db4990e7cda0cef9b1651/multidict-6.7.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:b284e319754366c1aee2267a2036248b24eeb17ecd5dc16022095e747f2f4304", size = 253592, upload-time = "2025-10-06T14:49:34.52Z" }, + { url = "https://files.pythonhosted.org/packages/20/24/54e804ec7945b6023b340c412ce9c3f81e91b3bf5fa5ce65558740141bee/multidict-6.7.0-cp312-cp312-musllinux_1_2_armv7l.whl", hash = "sha256:803d685de7be4303b5a657b76e2f6d1240e7e0a8aa2968ad5811fa2285553a12", size = 251024, upload-time = "2025-10-06T14:49:35.956Z" }, + { url = "https://files.pythonhosted.org/packages/14/48/011cba467ea0b17ceb938315d219391d3e421dfd35928e5dbdc3f4ae76ef/multidict-6.7.0-cp312-cp312-musllinux_1_2_i686.whl", hash = "sha256:c04a328260dfd5db8c39538f999f02779012268f54614902d0afc775d44e0a62", size = 251484, upload-time = "2025-10-06T14:49:37.631Z" }, + { url = "https://files.pythonhosted.org/packages/0d/2f/919258b43bb35b99fa127435cfb2d91798eb3a943396631ef43e3720dcf4/multidict-6.7.0-cp312-cp312-musllinux_1_2_ppc64le.whl", hash = "sha256:8a19cdb57cd3df4cd865849d93ee14920fb97224300c88501f16ecfa2604b4e0", size = 263579, upload-time = "2025-10-06T14:49:39.502Z" }, + { url = "https://files.pythonhosted.org/packages/31/22/a0e884d86b5242b5a74cf08e876bdf299e413016b66e55511f7a804a366e/multidict-6.7.0-cp312-cp312-musllinux_1_2_s390x.whl", hash = 
"sha256:9b2fd74c52accced7e75de26023b7dccee62511a600e62311b918ec5c168fc2a", size = 259654, upload-time = "2025-10-06T14:49:41.32Z" }, + { url = "https://files.pythonhosted.org/packages/b2/e5/17e10e1b5c5f5a40f2fcbb45953c9b215f8a4098003915e46a93f5fcaa8f/multidict-6.7.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:3e8bfdd0e487acf992407a140d2589fe598238eaeffa3da8448d63a63cd363f8", size = 251511, upload-time = "2025-10-06T14:49:46.021Z" }, + { url = "https://files.pythonhosted.org/packages/e3/9a/201bb1e17e7af53139597069c375e7b0dcbd47594604f65c2d5359508566/multidict-6.7.0-cp312-cp312-win32.whl", hash = "sha256:dd32a49400a2c3d52088e120ee00c1e3576cbff7e10b98467962c74fdb762ed4", size = 41895, upload-time = "2025-10-06T14:49:48.718Z" }, + { url = "https://files.pythonhosted.org/packages/46/e2/348cd32faad84eaf1d20cce80e2bb0ef8d312c55bca1f7fa9865e7770aaf/multidict-6.7.0-cp312-cp312-win_amd64.whl", hash = "sha256:92abb658ef2d7ef22ac9f8bb88e8b6c3e571671534e029359b6d9e845923eb1b", size = 46073, upload-time = "2025-10-06T14:49:50.28Z" }, + { url = "https://files.pythonhosted.org/packages/25/ec/aad2613c1910dce907480e0c3aa306905830f25df2e54ccc9dea450cb5aa/multidict-6.7.0-cp312-cp312-win_arm64.whl", hash = "sha256:490dab541a6a642ce1a9d61a4781656b346a55c13038f0b1244653828e3a83ec", size = 43226, upload-time = "2025-10-06T14:49:52.304Z" }, + { url = "https://files.pythonhosted.org/packages/b7/da/7d22601b625e241d4f23ef1ebff8acfc60da633c9e7e7922e24d10f592b3/multidict-6.7.0-py3-none-any.whl", hash = "sha256:394fc5c42a333c9ffc3e421a4c85e08580d990e08b99f6bf35b4132114c5dcb3", size = 12317, upload-time = "2025-10-06T14:52:29.272Z" }, +] + [[package]] name = "murmurhash" version = "1.0.15" @@ -3830,6 +3949,30 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/cb/79/217ae7eb2462ef254aab95b3610d85105a86f4ec8b43863788e32c0d5369/pronto-2.7.2-py3-none-any.whl", hash = "sha256:9c4b037ae1f9598398f38a55306eb1af7c3cdf8be7d535925500b978b32d1450", size = 62212, upload-time = 
"2025-11-10T12:45:42.969Z" }, ] +[[package]] +name = "propcache" +version = "0.4.1" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/9e/da/e9fc233cf63743258bff22b3dfa7ea5baef7b5bc324af47a0ad89b8ffc6f/propcache-0.4.1.tar.gz", hash = "sha256:f48107a8c637e80362555f37ecf49abe20370e557cc4ab374f04ec4423c97c3d", size = 46442, upload-time = "2025-10-08T19:49:02.291Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/a2/0f/f17b1b2b221d5ca28b4b876e8bb046ac40466513960646bda8e1853cdfa2/propcache-0.4.1-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:e153e9cd40cc8945138822807139367f256f89c6810c2634a4f6902b52d3b4e2", size = 80061, upload-time = "2025-10-08T19:46:46.075Z" }, + { url = "https://files.pythonhosted.org/packages/76/47/8ccf75935f51448ba9a16a71b783eb7ef6b9ee60f5d14c7f8a8a79fbeed7/propcache-0.4.1-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:cd547953428f7abb73c5ad82cbb32109566204260d98e41e5dfdc682eb7f8403", size = 46037, upload-time = "2025-10-08T19:46:47.23Z" }, + { url = "https://files.pythonhosted.org/packages/0a/b6/5c9a0e42df4d00bfb4a3cbbe5cf9f54260300c88a0e9af1f47ca5ce17ac0/propcache-0.4.1-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:f048da1b4f243fc44f205dfd320933a951b8d89e0afd4c7cacc762a8b9165207", size = 47324, upload-time = "2025-10-08T19:46:48.384Z" }, + { url = "https://files.pythonhosted.org/packages/9e/d3/6c7ee328b39a81ee877c962469f1e795f9db87f925251efeb0545e0020d0/propcache-0.4.1-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:ec17c65562a827bba85e3872ead335f95405ea1674860d96483a02f5c698fa72", size = 225505, upload-time = "2025-10-08T19:46:50.055Z" }, + { url = "https://files.pythonhosted.org/packages/01/5d/1c53f4563490b1d06a684742cc6076ef944bc6457df6051b7d1a877c057b/propcache-0.4.1-cp312-cp312-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = 
"sha256:405aac25c6394ef275dee4c709be43745d36674b223ba4eb7144bf4d691b7367", size = 230242, upload-time = "2025-10-08T19:46:51.815Z" }, + { url = "https://files.pythonhosted.org/packages/20/e1/ce4620633b0e2422207c3cb774a0ee61cac13abc6217763a7b9e2e3f4a12/propcache-0.4.1-cp312-cp312-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:0013cb6f8dde4b2a2f66903b8ba740bdfe378c943c4377a200551ceb27f379e4", size = 238474, upload-time = "2025-10-08T19:46:53.208Z" }, + { url = "https://files.pythonhosted.org/packages/46/4b/3aae6835b8e5f44ea6a68348ad90f78134047b503765087be2f9912140ea/propcache-0.4.1-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:15932ab57837c3368b024473a525e25d316d8353016e7cc0e5ba9eb343fbb1cf", size = 221575, upload-time = "2025-10-08T19:46:54.511Z" }, + { url = "https://files.pythonhosted.org/packages/6e/a5/8a5e8678bcc9d3a1a15b9a29165640d64762d424a16af543f00629c87338/propcache-0.4.1-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:031dce78b9dc099f4c29785d9cf5577a3faf9ebf74ecbd3c856a7b92768c3df3", size = 216736, upload-time = "2025-10-08T19:46:56.212Z" }, + { url = "https://files.pythonhosted.org/packages/f1/63/b7b215eddeac83ca1c6b934f89d09a625aa9ee4ba158338854c87210cc36/propcache-0.4.1-cp312-cp312-musllinux_1_2_armv7l.whl", hash = "sha256:ab08df6c9a035bee56e31af99be621526bd237bea9f32def431c656b29e41778", size = 213019, upload-time = "2025-10-08T19:46:57.595Z" }, + { url = "https://files.pythonhosted.org/packages/57/74/f580099a58c8af587cac7ba19ee7cb418506342fbbe2d4a4401661cca886/propcache-0.4.1-cp312-cp312-musllinux_1_2_ppc64le.whl", hash = "sha256:4d7af63f9f93fe593afbf104c21b3b15868efb2c21d07d8732c0c4287e66b6a6", size = 220376, upload-time = "2025-10-08T19:46:59.067Z" }, + { url = "https://files.pythonhosted.org/packages/c4/ee/542f1313aff7eaf19c2bb758c5d0560d2683dac001a1c96d0774af799843/propcache-0.4.1-cp312-cp312-musllinux_1_2_s390x.whl", hash = 
"sha256:cfc27c945f422e8b5071b6e93169679e4eb5bf73bbcbf1ba3ae3a83d2f78ebd9", size = 226988, upload-time = "2025-10-08T19:47:00.544Z" }, + { url = "https://files.pythonhosted.org/packages/8f/18/9c6b015dd9c6930f6ce2229e1f02fb35298b847f2087ea2b436a5bfa7287/propcache-0.4.1-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:35c3277624a080cc6ec6f847cbbbb5b49affa3598c4535a0a4682a697aaa5c75", size = 215615, upload-time = "2025-10-08T19:47:01.968Z" }, + { url = "https://files.pythonhosted.org/packages/80/9e/e7b85720b98c45a45e1fca6a177024934dc9bc5f4d5dd04207f216fc33ed/propcache-0.4.1-cp312-cp312-win32.whl", hash = "sha256:671538c2262dadb5ba6395e26c1731e1d52534bfe9ae56d0b5573ce539266aa8", size = 38066, upload-time = "2025-10-08T19:47:03.503Z" }, + { url = "https://files.pythonhosted.org/packages/54/09/d19cff2a5aaac632ec8fc03737b223597b1e347416934c1b3a7df079784c/propcache-0.4.1-cp312-cp312-win_amd64.whl", hash = "sha256:cb2d222e72399fcf5890d1d5cc1060857b9b236adff2792ff48ca2dfd46c81db", size = 41655, upload-time = "2025-10-08T19:47:04.973Z" }, + { url = "https://files.pythonhosted.org/packages/68/ab/6b5c191bb5de08036a8c697b265d4ca76148efb10fa162f14af14fb5f076/propcache-0.4.1-cp312-cp312-win_arm64.whl", hash = "sha256:204483131fb222bdaaeeea9f9e6c6ed0cac32731f75dfc1d4a567fc1926477c1", size = 37789, upload-time = "2025-10-08T19:47:06.077Z" }, + { url = "https://files.pythonhosted.org/packages/5b/5a/bc7b4a4ef808fa59a816c17b20c4bef6884daebbdf627ff2a161da67da19/propcache-0.4.1-py3-none-any.whl", hash = "sha256:af2a6052aeb6cf17d3e46ee169099044fd8224cbaf75c76a2ef596e8163e2237", size = 13305, upload-time = "2025-10-08T19:49:00.792Z" }, +] + [[package]] name = "protobuf" version = "6.33.2" @@ -5837,6 +5980,36 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/15/d1/b51471c11592ff9c012bd3e2f7334a6ff2f42a7aed2caffcf0bdddc9cb89/wrapt-2.0.1-py3-none-any.whl", hash = "sha256:4d2ce1bf1a48c5277d7969259232b57645aae5686dba1eaeade39442277afbca", size = 44046, upload-time = 
"2025-11-07T00:45:32.116Z" }, ] +[[package]] +name = "yarl" +version = "1.22.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "idna" }, + { name = "multidict" }, + { name = "propcache" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/57/63/0c6ebca57330cd313f6102b16dd57ffaf3ec4c83403dcb45dbd15c6f3ea1/yarl-1.22.0.tar.gz", hash = "sha256:bebf8557577d4401ba8bd9ff33906f1376c877aa78d1fe216ad01b4d6745af71", size = 187169, upload-time = "2025-10-06T14:12:55.963Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/75/ff/46736024fee3429b80a165a732e38e5d5a238721e634ab41b040d49f8738/yarl-1.22.0-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:e340382d1afa5d32b892b3ff062436d592ec3d692aeea3bef3a5cfe11bbf8c6f", size = 142000, upload-time = "2025-10-06T14:09:44.631Z" }, + { url = "https://files.pythonhosted.org/packages/5a/9a/b312ed670df903145598914770eb12de1bac44599549b3360acc96878df8/yarl-1.22.0-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:f1e09112a2c31ffe8d80be1b0988fa6a18c5d5cad92a9ffbb1c04c91bfe52ad2", size = 94338, upload-time = "2025-10-06T14:09:46.372Z" }, + { url = "https://files.pythonhosted.org/packages/ba/f5/0601483296f09c3c65e303d60c070a5c19fcdbc72daa061e96170785bc7d/yarl-1.22.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:939fe60db294c786f6b7c2d2e121576628468f65453d86b0fe36cb52f987bd74", size = 94909, upload-time = "2025-10-06T14:09:48.648Z" }, + { url = "https://files.pythonhosted.org/packages/60/41/9a1fe0b73dbcefce72e46cf149b0e0a67612d60bfc90fb59c2b2efdfbd86/yarl-1.22.0-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:e1651bf8e0398574646744c1885a41198eba53dc8a9312b954073f845c90a8df", size = 372940, upload-time = "2025-10-06T14:09:50.089Z" }, + { url = 
"https://files.pythonhosted.org/packages/17/7a/795cb6dfee561961c30b800f0ed616b923a2ec6258b5def2a00bf8231334/yarl-1.22.0-cp312-cp312-manylinux2014_armv7l.manylinux_2_17_armv7l.manylinux_2_31_armv7l.whl", hash = "sha256:b8a0588521a26bf92a57a1705b77b8b59044cdceccac7151bd8d229e66b8dedb", size = 345825, upload-time = "2025-10-06T14:09:52.142Z" }, + { url = "https://files.pythonhosted.org/packages/d7/93/a58f4d596d2be2ae7bab1a5846c4d270b894958845753b2c606d666744d3/yarl-1.22.0-cp312-cp312-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:42188e6a615c1a75bcaa6e150c3fe8f3e8680471a6b10150c5f7e83f47cc34d2", size = 386705, upload-time = "2025-10-06T14:09:54.128Z" }, + { url = "https://files.pythonhosted.org/packages/61/92/682279d0e099d0e14d7fd2e176bd04f48de1484f56546a3e1313cd6c8e7c/yarl-1.22.0-cp312-cp312-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:f6d2cb59377d99718913ad9a151030d6f83ef420a2b8f521d94609ecc106ee82", size = 396518, upload-time = "2025-10-06T14:09:55.762Z" }, + { url = "https://files.pythonhosted.org/packages/db/0f/0d52c98b8a885aeda831224b78f3be7ec2e1aa4a62091f9f9188c3c65b56/yarl-1.22.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:50678a3b71c751d58d7908edc96d332af328839eea883bb554a43f539101277a", size = 377267, upload-time = "2025-10-06T14:09:57.958Z" }, + { url = "https://files.pythonhosted.org/packages/22/42/d2685e35908cbeaa6532c1fc73e89e7f2efb5d8a7df3959ea8e37177c5a3/yarl-1.22.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:1e8fbaa7cec507aa24ea27a01456e8dd4b6fab829059b69844bd348f2d467124", size = 365797, upload-time = "2025-10-06T14:09:59.527Z" }, + { url = "https://files.pythonhosted.org/packages/a2/83/cf8c7bcc6355631762f7d8bdab920ad09b82efa6b722999dfb05afa6cfac/yarl-1.22.0-cp312-cp312-musllinux_1_2_armv7l.whl", hash = "sha256:433885ab5431bc3d3d4f2f9bd15bfa1614c522b0f1405d62c4f926ccd69d04fa", size = 365535, upload-time = 
"2025-10-06T14:10:01.139Z" }, + { url = "https://files.pythonhosted.org/packages/25/e1/5302ff9b28f0c59cac913b91fe3f16c59a033887e57ce9ca5d41a3a94737/yarl-1.22.0-cp312-cp312-musllinux_1_2_ppc64le.whl", hash = "sha256:b790b39c7e9a4192dc2e201a282109ed2985a1ddbd5ac08dc56d0e121400a8f7", size = 382324, upload-time = "2025-10-06T14:10:02.756Z" }, + { url = "https://files.pythonhosted.org/packages/bf/cd/4617eb60f032f19ae3a688dc990d8f0d89ee0ea378b61cac81ede3e52fae/yarl-1.22.0-cp312-cp312-musllinux_1_2_s390x.whl", hash = "sha256:31f0b53913220599446872d757257be5898019c85e7971599065bc55065dc99d", size = 383803, upload-time = "2025-10-06T14:10:04.552Z" }, + { url = "https://files.pythonhosted.org/packages/59/65/afc6e62bb506a319ea67b694551dab4a7e6fb7bf604e9bd9f3e11d575fec/yarl-1.22.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:a49370e8f711daec68d09b821a34e1167792ee2d24d405cbc2387be4f158b520", size = 374220, upload-time = "2025-10-06T14:10:06.489Z" }, + { url = "https://files.pythonhosted.org/packages/e7/3d/68bf18d50dc674b942daec86a9ba922d3113d8399b0e52b9897530442da2/yarl-1.22.0-cp312-cp312-win32.whl", hash = "sha256:70dfd4f241c04bd9239d53b17f11e6ab672b9f1420364af63e8531198e3f5fe8", size = 81589, upload-time = "2025-10-06T14:10:09.254Z" }, + { url = "https://files.pythonhosted.org/packages/c8/9a/6ad1a9b37c2f72874f93e691b2e7ecb6137fb2b899983125db4204e47575/yarl-1.22.0-cp312-cp312-win_amd64.whl", hash = "sha256:8884d8b332a5e9b88e23f60bb166890009429391864c685e17bd73a9eda9105c", size = 87213, upload-time = "2025-10-06T14:10:11.369Z" }, + { url = "https://files.pythonhosted.org/packages/44/c5/c21b562d1680a77634d748e30c653c3ca918beb35555cff24986fff54598/yarl-1.22.0-cp312-cp312-win_arm64.whl", hash = "sha256:ea70f61a47f3cc93bdf8b2f368ed359ef02a01ca6393916bc8ff877427181e74", size = 81330, upload-time = "2025-10-06T14:10:13.112Z" }, + { url = "https://files.pythonhosted.org/packages/73/ae/b48f95715333080afb75a4504487cbe142cae1268afc482d06692d605ae6/yarl-1.22.0-py3-none-any.whl", 
hash = "sha256:1380560bdba02b6b6c90de54133c81c9f2a453dee9912fe58c1dcced1edb7cff", size = 46814, upload-time = "2025-10-06T14:12:53.872Z" }, +] + [[package]] name = "zarr" version = "3.1.5" From 693bd6643cefda4bb7f365e14e44aebbb18d5222 Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Wed, 17 Dec 2025 08:39:30 -0800 Subject: [PATCH 22/87] full data prep: --- .../LifeSnaps_example.ipynb | 514 ++++++++++++++++-- 1 file changed, 467 insertions(+), 47 deletions(-) diff --git a/src/BetterCodeBetterScience/LifeSnaps_example.ipynb b/src/BetterCodeBetterScience/LifeSnaps_example.ipynb index bdab0ca..9d75202 100644 --- a/src/BetterCodeBetterScience/LifeSnaps_example.ipynb +++ b/src/BetterCodeBetterScience/LifeSnaps_example.ipynb @@ -2,7 +2,7 @@ "cells": [ { "cell_type": "code", - "execution_count": 8, + "execution_count": 1, "id": "5397a69c", "metadata": {}, "outputs": [], @@ -36,7 +36,32 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 2, + "id": "342e7980", + "metadata": {}, + "outputs": [], + "source": [ + "def get_collection_type_counts(db, collection_name, sample_size=3):\n", + " # Get distinct types in a collection and sample documents for each type\n", + " collection = db[collection_name]\n", + " distinct_types = collection.distinct(\"type\")\n", + " type_counts = {}\n", + " for dtype in distinct_types:\n", + " count = collection.count_documents({\"type\": dtype})\n", + " sample_docs = list(collection.find({\"type\": dtype}).limit(sample_size))\n", + " type_counts[dtype] = count\n", + " \n", + " return type_counts\n", + "\n", + "def get_collection_size(db, collection_name):\n", + " # Get the number of documents in a collection\n", + "\n", + " return db[collection_name].count_documents({})" + ] + }, + { + "cell_type": "code", + "execution_count": 3, "id": "1c55e4bf", "metadata": {}, "outputs": [ @@ -44,9 +69,100 @@ "name": "stdout", "output_type": "stream", "text": [ - "Collection 'fitbit' already loaded with 71284346 documents.\n", - 
"Collection 'sema' already loaded with 15380 documents.\n", - "Collection 'surveys' already loaded with 935 documents.\n" + "Collection 'fitbit' has 0 documents, expected 71284346. Importing data...\n", + "Importing data into collection 'fitbit' from /Users/poldrack/data_unsynced/LifeSnaps/rais_anonymized/mongo_rais_anonymized/fitbit.bson...\n", + "Running command: mongorestore --host localhost --port 27017 --db lifesnaps --collection fitbit --drop /Users/poldrack/data_unsynced/LifeSnaps/rais_anonymized/mongo_rais_anonymized/fitbit.bson\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "2025-12-16T20:53:00.855-0800\tchecking for collection data in /Users/poldrack/data_unsynced/LifeSnaps/rais_anonymized/mongo_rais_anonymized/fitbit.bson\n", + "2025-12-16T20:53:00.856-0800\treading metadata for lifesnaps.fitbit from /Users/poldrack/data_unsynced/LifeSnaps/rais_anonymized/mongo_rais_anonymized/fitbit.metadata.json\n", + "2025-12-16T20:53:00.889-0800\trestoring lifesnaps.fitbit from /Users/poldrack/data_unsynced/LifeSnaps/rais_anonymized/mongo_rais_anonymized/fitbit.bson\n", + "2025-12-16T20:53:03.854-0800\t[........................] lifesnaps.fitbit 246MB/9.02GB (2.7%)\n", + "2025-12-16T20:53:06.854-0800\t[#.......................] lifesnaps.fitbit 511MB/9.02GB (5.5%)\n", + "2025-12-16T20:53:09.854-0800\t[#.......................] lifesnaps.fitbit 758MB/9.02GB (8.2%)\n", + "2025-12-16T20:53:12.854-0800\t[##......................] lifesnaps.fitbit 968MB/9.02GB (10.5%)\n", + "2025-12-16T20:53:15.854-0800\t[###.....................] lifesnaps.fitbit 1.19GB/9.02GB (13.2%)\n", + "2025-12-16T20:53:18.854-0800\t[###.....................] lifesnaps.fitbit 1.45GB/9.02GB (16.0%)\n", + "2025-12-16T20:53:21.854-0800\t[####....................] lifesnaps.fitbit 1.66GB/9.02GB (18.4%)\n", + "2025-12-16T20:53:24.854-0800\t[#####...................] lifesnaps.fitbit 1.90GB/9.02GB (21.0%)\n", + "2025-12-16T20:53:27.854-0800\t[#####...................] 
lifesnaps.fitbit 2.15GB/9.02GB (23.8%)\n", + "2025-12-16T20:53:30.854-0800\t[######..................] lifesnaps.fitbit 2.40GB/9.02GB (26.7%)\n", + "2025-12-16T20:53:33.854-0800\t[#######.................] lifesnaps.fitbit 2.66GB/9.02GB (29.5%)\n", + "2025-12-16T20:53:36.854-0800\t[#######.................] lifesnaps.fitbit 2.91GB/9.02GB (32.3%)\n", + "2025-12-16T20:53:39.854-0800\t[########................] lifesnaps.fitbit 3.13GB/9.02GB (34.7%)\n", + "2025-12-16T20:53:42.854-0800\t[#########...............] lifesnaps.fitbit 3.39GB/9.02GB (37.6%)\n", + "2025-12-16T20:53:45.854-0800\t[#########...............] lifesnaps.fitbit 3.64GB/9.02GB (40.4%)\n", + "2025-12-16T20:53:48.854-0800\t[##########..............] lifesnaps.fitbit 3.89GB/9.02GB (43.1%)\n", + "2025-12-16T20:53:51.854-0800\t[##########..............] lifesnaps.fitbit 4.13GB/9.02GB (45.8%)\n", + "2025-12-16T20:53:54.854-0800\t[###########.............] lifesnaps.fitbit 4.34GB/9.02GB (48.1%)\n", + "2025-12-16T20:53:57.854-0800\t[############............] lifesnaps.fitbit 4.59GB/9.02GB (50.9%)\n", + "2025-12-16T20:54:00.854-0800\t[############............] lifesnaps.fitbit 4.84GB/9.02GB (53.7%)\n", + "2025-12-16T20:54:03.854-0800\t[#############...........] lifesnaps.fitbit 5.10GB/9.02GB (56.6%)\n", + "2025-12-16T20:54:06.854-0800\t[##############..........] lifesnaps.fitbit 5.36GB/9.02GB (59.4%)\n", + "2025-12-16T20:54:09.854-0800\t[##############..........] lifesnaps.fitbit 5.62GB/9.02GB (62.3%)\n", + "2025-12-16T20:54:12.854-0800\t[###############.........] lifesnaps.fitbit 5.83GB/9.02GB (64.6%)\n", + "2025-12-16T20:54:15.854-0800\t[################........] lifesnaps.fitbit 6.06GB/9.02GB (67.2%)\n", + "2025-12-16T20:54:18.854-0800\t[################........] lifesnaps.fitbit 6.31GB/9.02GB (69.9%)\n", + "2025-12-16T20:54:21.854-0800\t[#################.......] lifesnaps.fitbit 6.57GB/9.02GB (72.8%)\n", + "2025-12-16T20:54:24.854-0800\t[##################......] 
lifesnaps.fitbit 6.81GB/9.02GB (75.5%)\n", + "2025-12-16T20:54:27.854-0800\t[##################......] lifesnaps.fitbit 7.05GB/9.02GB (78.2%)\n", + "2025-12-16T20:54:30.854-0800\t[###################.....] lifesnaps.fitbit 7.29GB/9.02GB (80.8%)\n", + "2025-12-16T20:54:33.854-0800\t[####################....] lifesnaps.fitbit 7.55GB/9.02GB (83.7%)\n", + "2025-12-16T20:54:36.854-0800\t[####################....] lifesnaps.fitbit 7.81GB/9.02GB (86.6%)\n", + "2025-12-16T20:54:39.854-0800\t[#####################...] lifesnaps.fitbit 8.06GB/9.02GB (89.4%)\n", + "2025-12-16T20:54:42.854-0800\t[######################..] lifesnaps.fitbit 8.31GB/9.02GB (92.2%)\n", + "2025-12-16T20:54:45.855-0800\t[######################..] lifesnaps.fitbit 8.54GB/9.02GB (94.7%)\n", + "2025-12-16T20:54:48.855-0800\t[#######################.] lifesnaps.fitbit 8.78GB/9.02GB (97.4%)\n", + "2025-12-16T20:54:51.722-0800\t[########################] lifesnaps.fitbit 9.02GB/9.02GB (100.0%)\n", + "2025-12-16T20:54:51.722-0800\tfinished restoring lifesnaps.fitbit (71284346 documents, 0 failures)\n", + "2025-12-16T20:54:51.722-0800\trestoring indexes for collection lifesnaps.fitbit from metadata\n", + "2025-12-16T20:54:51.722-0800\tindex: &idx.IndexDocument{Options:primitive.M{\"background\":true, \"name\":\"type_1\", \"ns\":\"raisV3_anonymized.fitbit\", \"v\":2}, Key:primitive.D{primitive.E{Key:\"type\", Value:1}}, PartialFilterExpression:primitive.D(nil)}\n", + "2025-12-16T20:54:51.722-0800\tindex: &idx.IndexDocument{Options:primitive.M{\"background\":true, \"name\":\"id_1_type_1\", \"ns\":\"raisV3_anonymized.fitbit\", \"v\":2}, Key:primitive.D{primitive.E{Key:\"id\", Value:1}, primitive.E{Key:\"type\", Value:1}}, PartialFilterExpression:primitive.D(nil)}\n", + "2025-12-16T20:54:51.722-0800\tindex: &idx.IndexDocument{Options:primitive.M{\"background\":true, \"name\":\"id_1_type_1_data.dateTime_1_data.value.bpm_1\", \"ns\":\"raisV3_anonymized.fitbit\", \"v\":2}, Key:primitive.D{primitive.E{Key:\"id\", 
Value:1}, primitive.E{Key:\"type\", Value:1}, primitive.E{Key:\"data.dateTime\", Value:1}, primitive.E{Key:\"data.value.bpm\", Value:1}}, PartialFilterExpression:primitive.D(nil)}\n", + "2025-12-16T20:54:51.722-0800\tindex: &idx.IndexDocument{Options:primitive.M{\"background\":true, \"name\":\"data.dateTime_1\", \"ns\":\"raisV3_anonymized.fitbit\", \"v\":2}, Key:primitive.D{primitive.E{Key:\"data.dateTime\", Value:1}}, PartialFilterExpression:primitive.D(nil)}\n", + "2025-12-16T20:54:51.722-0800\tindex: &idx.IndexDocument{Options:primitive.M{\"background\":true, \"name\":\"data.reading_time_1\", \"ns\":\"raisV3_anonymized.fitbit\", \"v\":2}, Key:primitive.D{primitive.E{Key:\"data.reading_time\", Value:1}}, PartialFilterExpression:primitive.D(nil)}\n", + "2025-12-16T20:54:51.722-0800\tindex: &idx.IndexDocument{Options:primitive.M{\"background\":true, \"name\":\"data.sleep_start_1\", \"ns\":\"raisV3_anonymized.fitbit\", \"v\":2}, Key:primitive.D{primitive.E{Key:\"data.sleep_start\", Value:1}}, PartialFilterExpression:primitive.D(nil)}\n", + "2025-12-16T20:54:51.722-0800\tindex: &idx.IndexDocument{Options:primitive.M{\"background\":true, \"name\":\"data.startTime_1\", \"ns\":\"raisV3_anonymized.fitbit\", \"v\":2}, Key:primitive.D{primitive.E{Key:\"data.startTime\", Value:1}}, PartialFilterExpression:primitive.D(nil)}\n", + "2025-12-16T20:54:51.722-0800\tindex: &idx.IndexDocument{Options:primitive.M{\"background\":true, \"name\":\"id_1\", \"ns\":\"raisV3_anonymized.fitbit\", \"v\":2}, Key:primitive.D{primitive.E{Key:\"id\", Value:1}}, PartialFilterExpression:primitive.D(nil)}\n", + "2025-12-16T20:54:51.722-0800\tindex: &idx.IndexDocument{Options:primitive.M{\"background\":true, \"name\":\"data.timestamp_1\", \"ns\":\"raisV3_anonymized.fitbit\", \"v\":2}, Key:primitive.D{primitive.E{Key:\"data.timestamp\", Value:1}}, PartialFilterExpression:primitive.D(nil)}\n", + "2025-12-16T20:59:25.881-0800\t71284346 document(s) restored successfully. 
0 document(s) failed to restore.\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Successfully imported collection 'fitbit' with 71284346 documents.\n", + "Collection 'sema' has 0 documents, expected 15380. Importing data...\n", + "Importing data into collection 'sema' from /Users/poldrack/data_unsynced/LifeSnaps/rais_anonymized/mongo_rais_anonymized/sema.bson...\n", + "Running command: mongorestore --host localhost --port 27017 --db lifesnaps --collection sema --drop /Users/poldrack/data_unsynced/LifeSnaps/rais_anonymized/mongo_rais_anonymized/sema.bson\n", + "Successfully imported collection 'sema' with 15380 documents.\n", + "Collection 'surveys' has 0 documents, expected 935. Importing data...\n", + "Importing data into collection 'surveys' from /Users/poldrack/data_unsynced/LifeSnaps/rais_anonymized/mongo_rais_anonymized/surveys.bson...\n", + "Running command: mongorestore --host localhost --port 27017 --db lifesnaps --collection surveys --drop /Users/poldrack/data_unsynced/LifeSnaps/rais_anonymized/mongo_rais_anonymized/surveys.bson\n", + "Successfully imported collection 'surveys' with 935 documents.\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "2025-12-16T20:59:31.410-0800\tchecking for collection data in /Users/poldrack/data_unsynced/LifeSnaps/rais_anonymized/mongo_rais_anonymized/sema.bson\n", + "2025-12-16T20:59:31.411-0800\treading metadata for lifesnaps.sema from /Users/poldrack/data_unsynced/LifeSnaps/rais_anonymized/mongo_rais_anonymized/sema.metadata.json\n", + "2025-12-16T20:59:31.442-0800\trestoring lifesnaps.sema from /Users/poldrack/data_unsynced/LifeSnaps/rais_anonymized/mongo_rais_anonymized/sema.bson\n", + "2025-12-16T20:59:31.492-0800\tfinished restoring lifesnaps.sema (15380 documents, 0 failures)\n", + "2025-12-16T20:59:31.492-0800\tno indexes to restore for collection lifesnaps.sema\n", + "2025-12-16T20:59:31.492-0800\t15380 document(s) restored successfully. 
0 document(s) failed to restore.\n", + "2025-12-16T20:59:31.538-0800\tchecking for collection data in /Users/poldrack/data_unsynced/LifeSnaps/rais_anonymized/mongo_rais_anonymized/surveys.bson\n", + "2025-12-16T20:59:31.538-0800\treading metadata for lifesnaps.surveys from /Users/poldrack/data_unsynced/LifeSnaps/rais_anonymized/mongo_rais_anonymized/surveys.metadata.json\n", + "2025-12-16T20:59:31.560-0800\trestoring lifesnaps.surveys from /Users/poldrack/data_unsynced/LifeSnaps/rais_anonymized/mongo_rais_anonymized/surveys.bson\n", + "2025-12-16T20:59:31.604-0800\tfinished restoring lifesnaps.surveys (935 documents, 0 failures)\n", + "2025-12-16T20:59:31.604-0800\tno indexes to restore for collection lifesnaps.surveys\n", + "2025-12-16T20:59:31.604-0800\t935 document(s) restored successfully. 0 document(s) failed to restore.\n" ] } ], @@ -69,11 +185,11 @@ " 'surveys': 935\n", "}\n", "\n", - "overwrite = False\n", + "# in general we will need to overwrite to get the full dataset to begin with\n", + "overwrite = True\n", "\n", "for collection_name, expected_length in collection_lengths.items():\n", - " collection = db[collection_name]\n", - " actual_length = collection.count_documents({})\n", + " actual_length = get_collection_size(db, collection_name)\n", " # use ge since we will removing some objects below\n", " if actual_length >= expected_length and not overwrite:\n", " print(f\"Collection '{collection_name}' already loaded with {actual_length} documents.\")\n", @@ -88,8 +204,7 @@ " print(f\"Running command: {command}\")\n", " os.system(command)\n", " \n", - " collection = db[collection_name]\n", - " actual_length = collection.count_documents({})\n", + " actual_length = get_collection_size(db, collection_name)\n", " assert actual_length >= expected_length, f\"After import, collection '{collection_name}' has {actual_length} documents, expected {expected_length}.\"\n", " print(f\"Successfully imported collection '{collection_name}' with {actual_length} 
documents.\")\n", " " @@ -110,11 +225,47 @@ "cell_type": "markdown", "id": "56ad6fcb", "metadata": {}, - "source": [] + "source": [ + "First pull Profile records into a separate object store since they are a different kind of data" + ] }, { "cell_type": "code", - "execution_count": 11, + "execution_count": 4, + "id": "e6bfae7a", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Created 'fitbit_profile' collection with 69 documents.\n" + ] + } + ], + "source": [ + "# create a new table containing all documents from fitbit with type 'Profile'\n", + "profile_collection = db['fitbit_profile']\n", + "profile_collection.drop() # drop existing collection if it exists\n", + "fitbit_collection = db['fitbit']\n", + "profiles = list(fitbit_collection.find({\"type\": \"Profile\"}))\n", + "if len(profiles) > 0:\n", + " profile_collection.insert_many(profiles)\n", + " print(f\"Created 'fitbit_profile' collection with {get_collection_size(db, 'fitbit_profile')} documents.\")\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "a653da8c", + "metadata": {}, + "source": [ + "Remove unwanted fitbit data types" + ] + }, + { + "cell_type": "code", + "execution_count": 5, "id": "a2dd2b2e", "metadata": {}, "outputs": [ @@ -122,13 +273,22 @@ "name": "stdout", "output_type": "stream", "text": [ - "Removed 9844973 unwanted documents from 'fitbit' collection.\n" + "Removed 9845042 unwanted documents from 'fitbit' collection.\n", + "Remaining documents in 'fitbit' collection: 61439304\n", + "Final document counts in 'fitbit' collection after cleanup:\n", + "Type: calories, Count: 9675782\n", + "Type: heart_rate, Count: 48720040\n", + "Type: lightly_active_minutes, Count: 7203\n", + "Type: moderately_active_minutes, Count: 7203\n", + "Type: sedentary_minutes, Count: 7203\n", + "Type: sleep, Count: 4141\n", + "Type: steps, Count: 3010529\n", + "Type: very_active_minutes, Count: 7203\n" ] } ], "source": [ "fitbit_types_to_keep = [\n", - " 
\"Profile\",\n", " \"heart_rate\",\n", " \"sleep\",\n", " \"steps\",\n", @@ -142,57 +302,317 @@ "# remove unwanted fitbit data\n", "fitbit_collection = db['fitbit']\n", "deletion_result = fitbit_collection.delete_many({\"type\": {\"$nin\": fitbit_types_to_keep}})\n", - "print(f\"Removed {deletion_result.deleted_count} unwanted documents from 'fitbit' collection.\")" + "print(f\"Removed {deletion_result.deleted_count} unwanted documents from 'fitbit' collection.\")\n", + "print(f\"Remaining documents in 'fitbit' collection: {get_collection_size(db, 'fitbit')}\")\n", + "print(\"Final document counts in 'fitbit' collection after cleanup:\")\n", + "final_type_counts = get_collection_type_counts(db, 'fitbit')\n", + "for dtype, count in final_type_counts.items():\n", + " print(f\"Type: {dtype}, Count: {count}\")" + ] + }, + { + "cell_type": "markdown", + "id": "2f6d90f7", + "metadata": {}, + "source": [ + "## Harmonize the documents and combine into a single database\n", + "\n", + "We want to be able to treat each of the different data types similarly, but currently some of them have their value in a different location than the `value` field." 
] }, { "cell_type": "code", - "execution_count": 13, + "execution_count": 6, "id": "3752b2db", "metadata": {}, "outputs": [ { - "data": { - "text/plain": [ - "{'Profile': 69,\n", - " 'calories': 9675782,\n", - " 'heart_rate': 48720040,\n", - " 'lightly_active_minutes': 7203,\n", - " 'moderately_active_minutes': 7203,\n", - " 'sedentary_minutes': 7203,\n", - " 'sleep': 4141,\n", - " 'steps': 3010529,\n", - " 'very_active_minutes': 7203}" - ] - }, - "execution_count": 13, - "metadata": {}, - "output_type": "execute_result" + "name": "stdout", + "output_type": "stream", + "text": [ + "Removed 2 documents with null 'data.SURVEY_NAME' from 'sema' collection.\n", + "Remaining documents in 'sema' collection: 15378\n", + "Updated 15378 documents in 'sema' collection to add 'type' field.\n", + "Document counts in 'sema' collection after adding 'type' field:\n", + "Type: Context and Mood Survey, Count: 11526\n", + "Type: Step Goal Survey, Count: 3852\n" + ] } ], "source": [ - "def get_collection_type_counts(db, collection_name, sample_size=3):\n", - " # Get distinct types in a collection and sample documents for each type\n", - " collection = db[collection_name]\n", - " distinct_types = collection.distinct(\"type\")\n", - " type_counts = {}\n", - " for dtype in distinct_types:\n", - " count = collection.count_documents({\"type\": dtype})\n", - " sample_docs = list(collection.find({\"type\": dtype}).limit(sample_size))\n", - " type_counts[dtype] = count\n", - " \n", - " return type_counts\n", + "# for sema collection, create a 'type' field based on data['SURVEY_NAME']\n", "\n", - "get_collection_type_counts(db, 'fitbit')\n" + "sema_collection = db['sema']\n", + "# first remove documents that have None for data.SURVEY_NAME\n", + "deletion_result = sema_collection.delete_many({\"data.SURVEY_NAME\": None})\n", + "print(f\"Removed {deletion_result.deleted_count} documents with null 'data.SURVEY_NAME' from 'sema' collection.\")\n", + "print(f\"Remaining documents in 'sema' collection: 
{get_collection_size(db,'sema')}\")\n", + "# now update documents to add 'type'\n", + "update_result = sema_collection.update_many(\n", + " {\"type\": {\"$exists\": False}},\n", + " [{\"$set\": {\"type\": \"$data.SURVEY_NAME\"}}]\n", + ")\n", + "print(f\"Updated {update_result.modified_count} documents in 'sema' collection to add 'type' field.\")\n", + "print(f\"Document counts in 'sema' collection after adding 'type' field:\")\n", + "sema_type_counts = get_collection_type_counts(db, 'sema')\n", + "for dtype, count in sema_type_counts.items():\n", + " print(f\"Type: {dtype}, Count: {count}\") " ] }, { "cell_type": "code", - "execution_count": null, - "id": "11b8f064", + "execution_count": 7, + "id": "eb0057c2", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Combined 'sema' collection into 'fitbit'. New 'fitbit' collection size: 61450830 documents.\n" + ] + } + ], + "source": [ + "# combine the sema collection into fitbit\n", + "\n", + "sema_collection = db['sema']\n", + "# drop the \"Step Goal Survey\" from sema\n", + "sema_collection.delete_many({\"type\": \"Step Goal Survey\"})\n", + "\n", + "fitbit_collection = db['fitbit']\n", + "fitbit_collection.insert_many(sema_collection.find())\n", + "print(f\"Combined 'sema' collection into 'fitbit'. 
New 'fitbit' collection size: {get_collection_size(db, 'fitbit')} documents.\")" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "853eb7e5", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Updated 48720040 documents of type 'heart_rate' to add 'value' field from 'data.value.bpm'.\n", + "Updated 4141 documents of type 'sleep' to add 'date' field from 'data.endTime'.\n", + "Updated 4141 documents of type 'sleep' to add 'value' field from 'data.minutesAsleep'.\n", + "Updated 7203 documents of type 'lightly_active_minutes' to add 'value' field from 'data.value'.\n", + "Updated 7203 documents of type 'moderately_active_minutes' to add 'value' field from 'data.value'.\n", + "Updated 7203 documents of type 'very_active_minutes' to add 'value' field from 'data.value'.\n", + "Updated 7203 documents of type 'sedentary_minutes' to add 'value' field from 'data.value'.\n", + "Updated 9675782 documents of type 'calories' to add 'date' field from 'data.dateTime'.\n", + "Updated 9675782 documents of type 'calories' to add 'value' field from 'data.value'.\n", + "Updated 3010529 documents of type 'steps' to add 'date' field from 'data.dateTime'.\n", + "Updated 3010529 documents of type 'steps' to add 'value' field from 'data.value'.\n", + "Updated 11526 documents of type 'Context and Mood Survey' to add 'date' field from 'data.COMPLETED_TS'.\n", + "Updated 11526 documents of type 'Context and Mood Survey' to add 'value' field from 'data.MOOD'.\n", + "\n", + "Verifying updates:\n", + "Type 'heart_rate': 48720040/48720040 documents now have 'value' field.\n", + "Type 'sleep': 4141/4141 documents now have 'value' field.\n", + "Type 'lightly_active_minutes': 7203/7203 documents now have 'value' field.\n", + "Type 'moderately_active_minutes': 7203/7203 documents now have 'value' field.\n", + "Type 'very_active_minutes': 7203/7203 documents now have 'value' field.\n", + "Type 'sedentary_minutes': 7203/7203 documents 
now have 'value' field.\n", + "Type 'calories': 9675782/9675782 documents now have 'value' field.\n", + "Type 'steps': 3010529/3010529 documents now have 'value' field.\n", + "Type 'Context and Mood Survey': 11526/11526 documents now have 'value' field.\n" + ] + } + ], + "source": [ + "# some already are called \"value\": calories, active/sedentary minutes\n", + "value_variable = {\n", + " 'heart_rate': 'value.bpm',\n", + " 'sleep': 'minutesAsleep',\n", + " 'lightly_active_minutes': 'value',\n", + " 'moderately_active_minutes': 'value',\n", + " 'very_active_minutes': 'value',\n", + " 'sedentary_minutes': 'value',\n", + " 'calories': 'value',\n", + " 'steps': 'value',\n", + " \"Context and Mood Survey\": 'MOOD'\n", + "}\n", + "date_variable = {\n", + " 'sleep': 'endTime',\n", + " 'calories': 'dateTime',\n", + " 'steps': 'dateTime',\n", + " \"Context and Mood Survey\": 'COMPLETED_TS'\n", + "}\n", + "# for each object that has type matching one of the keys in value_variable,\n", + "# move data[value_variable] into 'value' field at root level of object\n", + "\n", + "fitbit_collection = db['fitbit']\n", + "\n", + "for doc_type, value_field in value_variable.items():\n", + " # Update documents of this type to move the value from data[value_field] to root level 'value'\n", + " update_result = fitbit_collection.update_many(\n", + " {\n", + " \"type\": doc_type,\n", + " f\"data.{value_field}\": {\"$exists\": True}\n", + " },\n", + " [\n", + " {\"$set\": {\n", + " \"value\": f\"$data.{value_field}\",\n", + " \"value_origin\": value_field\n", + " }}\n", + " ]\n", + " )\n", + " # fix date field if applicable\n", + " if doc_type in date_variable:\n", + " date_field = date_variable[doc_type]\n", + " date_update_result = fitbit_collection.update_many(\n", + " {\n", + " \"type\": doc_type,\n", + " f\"data.{date_field}\": {\"$exists\": True}\n", + " },\n", + " [\n", + " {\"$set\": {\n", + " \"date\": f\"$data.{date_field}\",\n", + " \"date_origin\": date_field\n", + " }}\n", + " 
]\n", + " )\n", + " print(f\"Updated {date_update_result.modified_count} documents of type '{doc_type}' to add 'date' field from 'data.{date_field}'.\")\n", + " print(f\"Updated {update_result.modified_count} documents of type '{doc_type}' to add 'value' field from 'data.{value_field}'.\")\n", + "\n", + "print(\"\\nVerifying updates:\")\n", + "for doc_type in value_variable.keys():\n", + " count_with_value = fitbit_collection.count_documents({\n", + " \"type\": doc_type,\n", + " \"value\": {\"$exists\": True}\n", + " })\n", + " total_count = fitbit_collection.count_documents({\"type\": doc_type})\n", + " print(f\"Type '{doc_type}': {count_with_value}/{total_count} documents now have 'value' field.\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "b137d0f7", + "metadata": {}, + "source": [ + "Harmonize by adding type field to sema database" + ] + }, + { + "cell_type": "markdown", + "id": "335a1d89", + "metadata": {}, + "source": [ + "rename id to user_id for fitbit collection to harmonize with others" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "89b2a549", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Renamed 'id' field to 'user_id' in 61439304 documents in 'fitbit' collection.\n" + ] + } + ], + "source": [ + "# rename \"id\" field to \"user_id\" in fitbit collection\n", + "\n", + "# skip if all entities already have \"user_id\"\n", + "if fitbit_collection.count_documents({\"user_id\": {\"$exists\": False}}) > 0:\n", + "\n", + " rename_result = fitbit_collection.update_many(\n", + " {},\n", + " {\"$rename\": {\"id\": \"user_id\"}}\n", + " )\n", + " print(f\"Renamed 'id' field to 'user_id' in {rename_result.modified_count} documents in 'fitbit' collection.\")\n", + "else:\n", + " print(\"Field 'id' already renamed to 'user_id' in 'fitbit' collection; skipping rename.\") " + ] + }, + { + "cell_type": "markdown", + "id": "df746117", + "metadata": {}, + "source": [ + "Doublecheck that 
every document has fields for type, value, user_id, and date." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "523fa955", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Missing field counts in 'fitbit' collection:\n", + "Field 'user_id': 0 documents missing this field.\n", + "Field 'type': 0 documents missing this field.\n", + "Field 'value': 0 documents missing this field.\n", + "Field 'date': 48748852 documents missing this field.\n" + ] + } + ], + "source": [ + "field_to_check_for = ['user_id', 'type', 'value', 'date']\n", + "# check that all documents in fitbit collection have these fields\n", + "missing_field_counts = {}\n", + "for field in field_to_check_for:\n", + " count_missing = fitbit_collection.count_documents({field: {\"$exists\": False}})\n", + " missing_field_counts[field] = count_missing\n", + "print(\"Missing field counts in 'fitbit' collection:\")\n", + "for field, count in missing_field_counts.items():\n", + " print(f\"Field '{field}': {count} documents missing this field.\") " + ] + }, + { + "cell_type": "markdown", + "id": "ce192d95", + "metadata": {}, + "source": [ + "### Combine all three stores into a single store\n" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "ce02ac7b", + "metadata": {}, + "outputs": [], + "source": [ + "# delete intermediate collections if desired\n", + "#fitbit_collection.drop()\n", + "#sema_collection.drop()\n", + "#print(\"Deleted intermediate collections 'fitbit' and 'sema'.\")\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "5e7e8199", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Total size of 'lifesnaps_data' collection: 0.00 GB\n" + ] + } + ], + "source": [ + "# get total size of lifesnaps_data collection in gigabytes\n", + "\n", + "total_size_gb = get_collection_size(db, 'lifesnaps_data') / 1e7\n", + "print(f\"Total 
size of 'lifesnaps_data' collection: {total_size_gb:.02f} GB\")" + ] + } ], "metadata": { From 3beb12d3fdfeb58aaa7b722b9875d63b28ccd6bb Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Fri, 19 Dec 2025 12:52:28 -0800 Subject: [PATCH 23/87] add a few topics --- book/workflows.md | 53 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 53 insertions(+) diff --git a/book/workflows.md b/book/workflows.md index 4d10fdb..79c77ef 100644 --- a/book/workflows.md +++ b/book/workflows.md @@ -66,6 +66,56 @@ but sometimes state is required - running it multiple time should give same answer as running it once - +- somewhere talk about in-place operations and their challenges + +- local mutation - never change an object that is passed in as an argument + - always copy + - if the package uses copy-on-write then this is cheap (only metadata is copied) + - this is opt-in in pandas 2.x and becomes the default in 3.0 + - in 2.x it can be enabled using pd.options.mode.copy_on_write = True + - need to check for other frameworks + +- lazy frames (polars) + +- any function that must mutate in place should do so clearly + - e.g. normalize_(x) (PyTorch uses a trailing underscore for in-place operations) + - or using "inplace" in the function name + + +- can encode state in type (e.g. a lightweight class that tracks state, such as "NormalizedArray") +- or track stage explicitly (e.g. Dataset(stage='normalized', data=array)) + +- use zarr to save each pipeline step as a new group: + +dataset.zarr/ +├── raw/ +│ └── signal +├── zscored/ +│ └── signal +├── filtered/ +│ └── signal + +- can also store parameters as attrs in zarr +- e.g. 
z.attrs.update({ + "stage": "zscore", + "mean_method": "time", + "std_ddof": 1, +}) + + +also look at arrow for columnar data - look into arrow immutability + + +## Deferred execution + +- dask, xarray + +## Checkpointing + +- pipeline state should be files on disk, not in memory +- functions don't pass large objects in memory, they simply pass file names +- for modern formats like parquet (others?) reading is very fast so the penalty is minimal + ## Precomputing expensive/common operations @@ -104,6 +154,9 @@ https://workflowhub.eu/ ## Logging +## Report generation + + ## Simple workflow management with Makefiles From bc8554900e97783b6c7d2c31390b9279d5509a36 Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Fri, 19 Dec 2025 12:52:51 -0800 Subject: [PATCH 24/87] add deps for scrna-seq example: --- pyproject.toml | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/pyproject.toml b/pyproject.toml index 62b01fc..49d70ca 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -65,6 +65,17 @@ dependencies = [ "mne>=1.11.0", "mongomock>=4.3.0", "linkcheckmd>=1.4.0", + "anndata>=0.12.7", + "xarray>=2025.12.0", + "dask>=2025.12.0", + "pyarrow>=22.0.0", + "scanpy>=1.11.5", + "scrublet>=0.2.3", + "igraph>=1.0.0", + "leidenalg>=0.11.0", + "fastcluster>=1.3.0", + "scikit-misc>=0.5.2", + "harmony-pytorch>=0.1.8", ] [build-system] From c0434ca6b41ce51fa624bc2da1ad5ea25af3fe74 Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Fri, 19 Dec 2025 12:53:07 -0800 Subject: [PATCH 25/87] add deps for scrna-seq example: --- uv.lock | 599 ++++++++++++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 512 insertions(+), 87 deletions(-) diff --git a/uv.lock b/uv.lock index 0c89da9..51b298c 100644 --- a/uv.lock +++ b/uv.lock @@ -103,6 +103,26 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/32/34/d4e1c02d3bee589efb5dfa17f88ea08bdb3e3eac12bc475462aec52ed223/alabaster-0.7.16-py3-none-any.whl", hash = 
"sha256:b46733c07dce03ae4e150330b975c75737fa60f0a7c591b6c8bf4928a28e2c92", size = 13511, upload-time = "2024-01-10T00:56:08.388Z" }, ] +[[package]] +name = "anndata" +version = "0.12.7" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "array-api-compat" }, + { name = "h5py" }, + { name = "legacy-api-wrap" }, + { name = "natsort" }, + { name = "numpy" }, + { name = "packaging" }, + { name = "pandas" }, + { name = "scipy" }, + { name = "zarr" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/d8/64/ea6da6d88c0b5ad3231828a8ab895cec871ff965e626e4986bc4dfae053d/anndata-0.12.7.tar.gz", hash = "sha256:10612d476e78570be2fdd391b09cb64d3b33cda32b1b46a0a4b999ba98d64d47", size = 2248853, upload-time = "2025-12-16T13:47:14.246Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/78/bc/dee9a01c1b9cd16d7e257644a2fc8ee6df6c685faaf68d289bdc4c91adec/anndata-0.12.7-py3-none-any.whl", hash = "sha256:bd7c18bdc2ed24b9089fd1494b52b787566dea175dde4689d4144693d0949581", size = 174195, upload-time = "2025-12-16T13:47:12.637Z" }, +] + [[package]] name = "annexremote" version = "1.6.6" @@ -130,6 +150,12 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/78/b6/6307fbef88d9b5ee7421e68d78a9f162e0da4900bc5f5793f6d3d0e34fb8/annotated_types-0.7.0-py3-none-any.whl", hash = "sha256:1f02e8b43a8fbbc3f3e0d4f0f4bfc8131bcb4eebe8849b8e5c773f3a1c582a53", size = 13643, upload-time = "2024-05-20T21:33:24.1Z" }, ] +[[package]] +name = "annoy" +version = "1.17.3" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/07/38/e321b0e05d8cc068a594279fb7c097efb1df66231c295d482d7ad51b6473/annoy-1.17.3.tar.gz", hash = "sha256:9cbfebefe0a5f843eba29c6be4c84d601f4f41ad4ded0486f1b88c3b07739c15", size = 647460, upload-time = "2023-06-14T16:37:34.152Z" } + [[package]] name = "anthropic" version = "0.75.0" @@ -219,6 +245,15 @@ wheels = [ { url = 
"https://files.pythonhosted.org/packages/42/b9/f8d6fa329ab25128b7e98fd83a3cb34d9db5b059a9847eddb840a0af45dd/argon2_cffi_bindings-25.1.0-cp39-abi3-win_arm64.whl", hash = "sha256:b0fdbcf513833809c882823f98dc2f931cf659d9a1429616ac3adebb49f5db94", size = 27149, upload-time = "2025-07-30T10:01:59.329Z" }, ] +[[package]] +name = "array-api-compat" +version = "1.12.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/8d/bd/9fa5c7c5621698d5632cc852a79fbbdc28024462c9396698e5fdcb395f37/array_api_compat-1.12.0.tar.gz", hash = "sha256:585bc615f650de53ac24b7c012baecfcdd810f50df3573be47e6dd9fa20df974", size = 99883, upload-time = "2025-05-16T08:49:59.897Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/e0/b1/0542e0cab6f49f151a2d7a42400f84f706fc0b64e85dc1f56708b2e9fd37/array_api_compat-1.12.0-py3-none-any.whl", hash = "sha256:a0b4795b6944a9507fde54679f9350e2ad2b1e2acf4a2408a098cdc27f890a8b", size = 58156, upload-time = "2025-05-16T08:49:58.129Z" }, +] + [[package]] name = "arrow" version = "1.4.0" @@ -343,25 +378,31 @@ version = "0.1.0" source = { editable = "." 
} dependencies = [ { name = "accelerate" }, + { name = "anndata" }, { name = "anthropic" }, { name = "biopython" }, { name = "biothings-client" }, { name = "blue" }, { name = "chromadb" }, { name = "codespell" }, + { name = "dask" }, { name = "datalad" }, { name = "datalad-osf" }, { name = "docutils" }, + { name = "fastcluster" }, { name = "fastembed" }, { name = "fastparquet" }, { name = "fmriprep-docker" }, { name = "gprofiler-official" }, { name = "h5py" }, + { name = "harmony-pytorch" }, { name = "hypothesis" }, { name = "icecream" }, + { name = "igraph" }, { name = "jupyter" }, { name = "jupyter-book" }, { name = "jupytext" }, + { name = "leidenalg" }, { name = "linkcheckmd" }, { name = "mariadb" }, { name = "matplotlib" }, @@ -382,6 +423,7 @@ dependencies = [ { name = "pandas" }, { name = "pickleshare" }, { name = "pre-commit" }, + { name = "pyarrow" }, { name = "pygithub" }, { name = "pymongo" }, { name = "pyppeteer" }, @@ -392,8 +434,11 @@ dependencies = [ { name = "pyyaml" }, { name = "rpy2" }, { name = "ruff" }, + { name = "scanpy" }, { name = "scikit-learn" }, + { name = "scikit-misc" }, { name = "scipy" }, + { name = "scrublet" }, { name = "seaborn" }, { name = "statsmodels" }, { name = "templateflow" }, @@ -401,31 +446,38 @@ dependencies = [ { name = "torch" }, { name = "tqdm" }, { name = "transformers" }, + { name = "xarray" }, { name = "zarr" }, ] [package.metadata] requires-dist = [ { name = "accelerate", specifier = ">=1.4.0" }, + { name = "anndata", specifier = ">=0.12.7" }, { name = "anthropic", specifier = ">=0.61.0" }, { name = "biopython", specifier = ">=1.86" }, { name = "biothings-client", specifier = ">=0.4.1" }, { name = "blue", specifier = ">=0.9.1" }, { name = "chromadb", specifier = ">=1.3.5" }, { name = "codespell", specifier = ">=2.4.1" }, + { name = "dask", specifier = ">=2025.12.0" }, { name = "datalad", specifier = ">=1.2.3" }, { name = "datalad-osf", specifier = ">=0.3.0" }, { name = "docutils", specifier = "==0.17.1" }, + { name 
= "fastcluster", specifier = ">=1.3.0" }, { name = "fastembed", specifier = ">=0.7.3" }, { name = "fastparquet", specifier = ">=2024.11.0" }, { name = "fmriprep-docker", specifier = ">=25.2.3" }, { name = "gprofiler-official", specifier = ">=1.0.0" }, { name = "h5py", specifier = ">=3.15.1" }, + { name = "harmony-pytorch", specifier = ">=0.1.8" }, { name = "hypothesis", specifier = ">=6.115.3" }, { name = "icecream", specifier = ">=2.1.4" }, + { name = "igraph", specifier = ">=1.0.0" }, { name = "jupyter", specifier = ">=1.1.1" }, { name = "jupyter-book", specifier = ">=1.0.2" }, { name = "jupytext", specifier = ">=1.16.4" }, + { name = "leidenalg", specifier = ">=0.11.0" }, { name = "linkcheckmd", specifier = ">=1.4.0" }, { name = "mariadb", specifier = ">=1.1.14" }, { name = "matplotlib", specifier = ">=3.9.2" }, @@ -446,6 +498,7 @@ requires-dist = [ { name = "pandas", specifier = ">=2.2.3" }, { name = "pickleshare", specifier = ">=0.7.5" }, { name = "pre-commit", specifier = ">=4.2.0" }, + { name = "pyarrow", specifier = ">=22.0.0" }, { name = "pygithub", specifier = ">=2.4.0" }, { name = "pymongo", extras = ["srv"], specifier = ">=4.15.4" }, { name = "pyppeteer", specifier = ">=2.0.0" }, @@ -456,8 +509,11 @@ requires-dist = [ { name = "pyyaml", specifier = ">=6.0.2" }, { name = "rpy2", specifier = ">=3.6.4" }, { name = "ruff", specifier = ">=0.6.9" }, + { name = "scanpy", specifier = ">=1.11.5" }, { name = "scikit-learn", specifier = ">=1.5.2" }, + { name = "scikit-misc", specifier = ">=0.5.2" }, { name = "scipy", specifier = ">=1.14.1" }, + { name = "scrublet", specifier = ">=0.2.3" }, { name = "seaborn", specifier = ">=0.13.2" }, { name = "statsmodels", specifier = ">=0.14.5" }, { name = "templateflow", specifier = ">=25.1.1" }, @@ -465,6 +521,7 @@ requires-dist = [ { name = "torch", specifier = ">=2.6.0" }, { name = "tqdm", specifier = ">=4.66.5" }, { name = "transformers", specifier = ">=4.49.0" }, + { name = "xarray", specifier = ">=2025.12.0" }, { name = 
"zarr", specifier = ">=3.1.3" }, ] @@ -609,30 +666,30 @@ wheels = [ [[package]] name = "boto3" -version = "1.42.8" +version = "1.42.12" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "botocore" }, { name = "jmespath" }, { name = "s3transfer" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/9d/34/64e34fb40903d358a4a3d697e2ee4784a7b52c11e7effbad01967b2d3fc3/boto3-1.42.8.tar.gz", hash = "sha256:e967706af5887339407481562c389c612d5eae641eb854ddd59026d049df740e", size = 112886, upload-time = "2025-12-11T21:54:15.614Z" } +sdist = { url = "https://files.pythonhosted.org/packages/98/66/ffe9623d64e97800ff6bac26953cd9ef99410fb864a0b26a0ea2e09b97f0/boto3-1.42.12.tar.gz", hash = "sha256:649b134d25b278c24fcc8b3f94519de3884283b7848dc32f42b0ffdd9d19ce99", size = 112868, upload-time = "2025-12-17T20:30:42.394Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/96/37/9702c0b8e63aaeb1ad430ece22567b03e58ea41e446d68b92e2cb00e7817/boto3-1.42.8-py3-none-any.whl", hash = "sha256:747acc83488fc80b0e7d1c4ff0c533039ff3ede21bdbd4e89544e25b010b070c", size = 140559, upload-time = "2025-12-11T21:54:14.513Z" }, + { url = "https://files.pythonhosted.org/packages/3e/8b/20a90c75499e3c3a8e3eb5607d930c723577ef8c64968b9be6b743f18158/boto3-1.42.12-py3-none-any.whl", hash = "sha256:8112e1beb5978bb455ea4b41a9ef26fc408f6340d8ff69ef93dded4f80fd53e9", size = 140573, upload-time = "2025-12-17T20:30:40.063Z" }, ] [[package]] name = "botocore" -version = "1.42.8" +version = "1.42.12" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "jmespath" }, { name = "python-dateutil" }, { name = "urllib3" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/3a/ea/4be7a4a640d599b5691c7cf27e125155d7d3643ecbe37e32941f412e3de5/botocore-1.42.8.tar.gz", hash = "sha256:4921aa454f82fed0880214eab21126c98a35fe31ede952693356f9c85ce3574b", size = 14861038, upload-time = "2025-12-11T21:54:04.031Z" } +sdist = { url = 
"https://files.pythonhosted.org/packages/a0/b6/9b7988a8476712cdbfeeb68c733933005465c85ebf0ee469a6ea5ca3415c/botocore-1.42.12.tar.gz", hash = "sha256:1f9f63c3d6bb1f768519da30d6018706443c5d8af5472274d183a4945f3d81f8", size = 14879004, upload-time = "2025-12-17T20:30:29.542Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/1c/24/a4301564a979368d6f3644f47acc921450b5524b8846e827237d98b04746/botocore-1.42.8-py3-none-any.whl", hash = "sha256:4cb89c74dd9083d16e45868749b999265a91309b2499907c84adeffa0a8df89b", size = 14534173, upload-time = "2025-12-11T21:54:01.143Z" }, + { url = "https://files.pythonhosted.org/packages/8f/73/22764d0a17130b7d95b2a4104607e6db5487a0e5afb68f5691260ae9c3dc/botocore-1.42.12-py3-none-any.whl", hash = "sha256:4f163880350f6d831857ce5d023875b7c6534be862e5affd9fcf82b8d1ab3537", size = 14552878, upload-time = "2025-12-17T20:30:24.671Z" }, ] [[package]] @@ -651,11 +708,11 @@ wheels = [ [[package]] name = "cachetools" -version = "6.2.2" +version = "6.2.4" source = { registry = "https://pypi.org/simple" } -sdist = { url = "https://files.pythonhosted.org/packages/fb/44/ca1675be2a83aeee1886ab745b28cda92093066590233cc501890eb8417a/cachetools-6.2.2.tar.gz", hash = "sha256:8e6d266b25e539df852251cfd6f990b4bc3a141db73b939058d809ebd2590fc6", size = 31571, upload-time = "2025-11-13T17:42:51.465Z" } +sdist = { url = "https://files.pythonhosted.org/packages/bc/1d/ede8680603f6016887c062a2cf4fc8fdba905866a3ab8831aa8aa651320c/cachetools-6.2.4.tar.gz", hash = "sha256:82c5c05585e70b6ba2d3ae09ea60b79548872185d2f24ae1f2709d37299fd607", size = 31731, upload-time = "2025-12-15T18:24:53.744Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/e6/46/eb6eca305c77a4489affe1c5d8f4cae82f285d9addd8de4ec084a7184221/cachetools-6.2.2-py3-none-any.whl", hash = "sha256:6c09c98183bf58560c97b2abfcedcbaf6a896a490f534b031b661d3723b45ace", size = 11503, upload-time = "2025-11-13T17:42:50.232Z" }, + { url = 
"https://files.pythonhosted.org/packages/2c/fc/1d7b80d0eb7b714984ce40efc78859c022cd930e402f599d8ca9e39c78a4/cachetools-6.2.4-py3-none-any.whl", hash = "sha256:69a7a52634fed8b8bf6e24a050fb60bff1c9bd8f6d24572b99c32d4e71e62a51", size = 11551, upload-time = "2025-12-15T18:24:52.332Z" }, ] [[package]] @@ -766,7 +823,7 @@ wheels = [ [[package]] name = "chromadb" -version = "1.3.6" +version = "1.3.7" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "bcrypt" }, @@ -797,13 +854,13 @@ dependencies = [ { name = "typing-extensions" }, { name = "uvicorn", extra = ["standard"] }, ] -sdist = { url = "https://files.pythonhosted.org/packages/c0/5c/c8d751b327f863c11cc51e4cd01750696bacdc65b291beda8b008917910e/chromadb-1.3.6.tar.gz", hash = "sha256:834d7d154471b36bed10ddb53fcc96dfa912d18f0d57418490d829f7aad59895", size = 1959127, upload-time = "2025-12-10T05:25:22.644Z" } +sdist = { url = "https://files.pythonhosted.org/packages/a9/b9/23eb242c0bad56bcac57d9f45a6cc85e016a44ae9baf763c0d040e45e2d7/chromadb-1.3.7.tar.gz", hash = "sha256:393b866b6ac60c12fc0f2a43d07b2884f2d02a68a1b2cb43c5ef87d141543571", size = 1960950, upload-time = "2025-12-12T21:03:13.941Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/3b/6d/2a09221575a4fd7b6356c1416160d491fccd38ab6b24eee7df030552a7ac/chromadb-1.3.6-cp39-abi3-macosx_10_12_x86_64.whl", hash = "sha256:d80b9621af6dfdd23a4aced8e33f667ab41e8252c2a1c4f46eb27acbbb67fe48", size = 20782782, upload-time = "2025-12-10T05:25:20.205Z" }, - { url = "https://files.pythonhosted.org/packages/bf/8e/528f64a8ec1e32b9d079c77572e0e8301e1fb461474fefca6bce7ce90f8e/chromadb-1.3.6-cp39-abi3-macosx_11_0_arm64.whl", hash = "sha256:1dee1987755d1dc920d9608f76eedb0f70ae06ade020d462f3265b290ca05b65", size = 20078289, upload-time = "2025-12-10T05:25:17.621Z" }, - { url = 
"https://files.pythonhosted.org/packages/33/1c/07bb66f9d8f243ae232fa9f73bcb63ae1386dc327cfa085feca7365c34ad/chromadb-1.3.6-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:f1a59a2ef5121b0321ce1ffcdf84dfa1f11fba7c6734fe6b3b22a22c9c531314", size = 20703285, upload-time = "2025-12-10T05:25:11.594Z" }, - { url = "https://files.pythonhosted.org/packages/4b/5d/210639c32d3f6f49b265c84990be5da1bb6e1b2fd31f9845abec4580e3ba/chromadb-1.3.6-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:2f98ee1e180c11aaf33f0aed11a54f385def8361931fcab8c876e56e94f02699", size = 21633675, upload-time = "2025-12-10T05:25:14.759Z" }, - { url = "https://files.pythonhosted.org/packages/59/91/bf5dfa3bd5b457a9195439e077128c274202b1a231bd3a9d4d7f3c2259a3/chromadb-1.3.6-cp39-abi3-win_amd64.whl", hash = "sha256:cde42db6c3b31b7edf75bbf60447cd83e9693fe08d2005496ae418c05cae9b3f", size = 21870402, upload-time = "2025-12-10T05:25:24.475Z" }, + { url = "https://files.pythonhosted.org/packages/b6/9d/306e220cfb4382e9f29e645339826d1deec64c34cf905c344d0d7345dbdb/chromadb-1.3.7-cp39-abi3-macosx_10_12_x86_64.whl", hash = "sha256:74839c349a740b8e349fabc569f8f4becae9806fa8ff9ca186797bef1f54ee4c", size = 20816599, upload-time = "2025-12-12T21:03:11.173Z" }, + { url = "https://files.pythonhosted.org/packages/51/3e/0fbb4c6e7971019c976cf3dbef1c22c1a3089f74ef86c88e2e066edc47e4/chromadb-1.3.7-cp39-abi3-macosx_11_0_arm64.whl", hash = "sha256:fe9c96f73450274d9f722572afc9d455b4f6f4cd960fa49e4bf489075ef30e6f", size = 20113076, upload-time = "2025-12-12T21:03:07.873Z" }, + { url = "https://files.pythonhosted.org/packages/69/78/2ae4064c9b194271b9c2bc66a26a7e11363d13ed2bd691a563fac1a7c5f2/chromadb-1.3.7-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:972cb168033db76a4bb1031bc38b6cc4e6d05ef716c1ffce8ae95a1a3b515dd2", size = 20738619, upload-time = "2025-12-12T21:03:01.409Z" }, + { url = 
"https://files.pythonhosted.org/packages/01/5d/3aa34cb02c3c0e4920a47da5d9092cab690fcbf6df13ec744eacf96891d6/chromadb-1.3.7-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:3e05190236e309b54165866dd11676c2702a35b73beaa29502741f22f333c51a", size = 21654395, upload-time = "2025-12-12T21:03:04.909Z" }, + { url = "https://files.pythonhosted.org/packages/00/36/7d2d7b6bb26e53214492d71ccb4e128fa2de4d98a215befb7787deaf2701/chromadb-1.3.7-cp39-abi3-win_amd64.whl", hash = "sha256:4618ba7bb5ef5dbf0d4fd9ce708b912d8cd1ab24d3c81e0e092841f325b2c94d", size = 21874973, upload-time = "2025-12-12T21:03:16.918Z" }, ] [[package]] @@ -839,6 +896,15 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/ae/8a/c4bb04426d608be4a3171efa2e233d2c59a5c8937850c10d098e126df18e/cloudpathlib-0.23.0-py3-none-any.whl", hash = "sha256:8520b3b01468fee77de37ab5d50b1b524ea6b4a8731c35d1b7407ac0cd716002", size = 62755, upload-time = "2025-10-07T22:47:54.905Z" }, ] +[[package]] +name = "cloudpickle" +version = "3.1.2" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/27/fb/576f067976d320f5f0114a8d9fa1215425441bb35627b1993e5afd8111e5/cloudpickle-3.1.2.tar.gz", hash = "sha256:7fda9eb655c9c230dab534f1983763de5835249750e85fbcef43aaa30a9a2414", size = 22330, upload-time = "2025-11-03T09:25:26.604Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/88/39/799be3f2f0f38cc727ee3b4f1445fe6d5e4133064ec2e4115069418a5bb6/cloudpickle-3.1.2-py3-none-any.whl", hash = "sha256:9acb47f6afd73f60dc1df93bb801b472f05ff42fa6c84167d25cb206be1fbf4a", size = 22228, upload-time = "2025-11-03T09:25:25.534Z" }, +] + [[package]] name = "codespell" version = "2.4.1" @@ -1001,15 +1067,15 @@ wheels = [ [[package]] name = "curies" -version = "0.12.5" +version = "0.12.6" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "pydantic" }, { name = "typing-extensions" }, ] -sdist = { url = 
"https://files.pythonhosted.org/packages/a9/4c/fc5d51c21b99f802948a8b3079565806239c76e7b2f1f6702a603fe282f7/curies-0.12.5.tar.gz", hash = "sha256:57e4853045f8029c2564fbf2290221ff7a529034405076d1e82b7a8727b33dfc", size = 282912, upload-time = "2025-11-25T12:47:24.825Z" } +sdist = { url = "https://files.pythonhosted.org/packages/7b/48/f2d73a4b266c7407af39a8f6d397bce7260627802d1bdf88ab3745a89408/curies-0.12.6.tar.gz", hash = "sha256:fe16fd217dadc7f85e80137ae303ba6afc35abaeeadb2730b45de01433843248", size = 283201, upload-time = "2025-12-17T11:42:18.316Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/8c/dd/29000adb47118edbf865a6e366fba294dcdacdf34322cedb23b8e7d30ae0/curies-0.12.5-py3-none-any.whl", hash = "sha256:e7fbb63cb49aeb389d46db64dae02f1563741084e033c2075cd1e163fdb1ead8", size = 69711, upload-time = "2025-11-25T12:47:23.058Z" }, + { url = "https://files.pythonhosted.org/packages/de/b5/2a7682e76e011ea56547f7adf0bddd4cc151caab2183355773d069368e55/curies-0.12.6-py3-none-any.whl", hash = "sha256:64568d18d547daeca82f96e24acf7b829ae62c7a7ee875f9d178f7f59871f321", size = 69950, upload-time = "2025-12-17T11:42:16.427Z" }, ] [[package]] @@ -1037,6 +1103,46 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/94/fb/1b681635bfd5f2274d0caa8f934b58435db6c091b97f5593738065ddb786/cymem-2.0.13-cp312-cp312-win_arm64.whl", hash = "sha256:6bbd701338df7bf408648191dff52472a9b334f71bcd31a21a41d83821050f67", size = 35959, upload-time = "2025-11-14T14:57:41.682Z" }, ] +[[package]] +name = "cython" +version = "3.2.3" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/39/e1/c0d92b1258722e1bc62a12e630c33f1f842fdab53fd8cd5de2f75c6449a9/cython-3.2.3.tar.gz", hash = "sha256:f13832412d633376ffc08d751cc18ed0d7d00a398a4065e2871db505258748a6", size = 3276650, upload-time = "2025-12-14T07:50:34.691Z" } +wheels = [ + { url = 
"https://files.pythonhosted.org/packages/b4/14/d16282d17c9eb2f78ca9ccd5801fed22f6c3360f5a55dbcce3c93cc70352/cython-3.2.3-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:cf210228c15b5c625824d8e31d43b6fea25f9e13c81dac632f2f7d838e0229a5", size = 2968471, upload-time = "2025-12-14T07:51:01.207Z" }, + { url = "https://files.pythonhosted.org/packages/d0/3c/46304a942dac5a636701c55f5b05ec00ad151e6722cd068fe3d0993349bb/cython-3.2.3-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:f5bf0cebeb4147e172a114437d3fce5a507595d8fdd821be792b1bb25c691514", size = 3223581, upload-time = "2025-12-14T07:51:04.336Z" }, + { url = "https://files.pythonhosted.org/packages/29/ad/15da606d71f40bcf2c405f84ca3d4195cb252f4eaa2f551fe6b2e630ee7c/cython-3.2.3-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:d1f8700ba89c977438744f083890d87187f15709507a5489e0f6d682053b7fa0", size = 3391391, upload-time = "2025-12-14T07:51:05.998Z" }, + { url = "https://files.pythonhosted.org/packages/51/9e/045b35eb678682edc3e2d57112cf5ac3581a9ef274eb220b638279195678/cython-3.2.3-cp312-cp312-win_amd64.whl", hash = "sha256:25732f3981a93407826297f4423206e5e22c3cfccfc74e37bf444453bbdc076f", size = 2756814, upload-time = "2025-12-14T07:51:07.759Z" }, + { url = "https://files.pythonhosted.org/packages/43/49/afe1e3df87a770861cf17ba39f4a91f6d22a2571010fc1890b3708360630/cython-3.2.3-cp39-abi3-macosx_10_9_x86_64.whl", hash = "sha256:74f482da8b605c61b4df6ff716d013f20131949cb2fa59b03e63abd36ef5bac0", size = 2874467, upload-time = "2025-12-14T07:51:31.568Z" }, + { url = "https://files.pythonhosted.org/packages/c7/da/044f725a083e28fb4de5bd33d13ec13f0753734b6ae52d4bc07434610cc8/cython-3.2.3-cp39-abi3-manylinux1_i686.manylinux_2_28_i686.manylinux_2_5_i686.whl", hash = "sha256:0a75a04688875b275a6c875565e672325bae04327dd6ec2fc25aeb5c6cf82fce", size = 3211272, upload-time = "2025-12-14T07:51:33.673Z" }, + { url = 
"https://files.pythonhosted.org/packages/95/14/af02ba6e2e03279f2ca2956e3024a44faed4c8496bda8170b663dc3ba6e8/cython-3.2.3-cp39-abi3-manylinux2014_armv7l.manylinux_2_17_armv7l.manylinux_2_31_armv7l.whl", hash = "sha256:6b01b36c9eb1b68c25bddbeef7379f7bfc37f7c9afc044e71840ffab761a2dd0", size = 2856058, upload-time = "2025-12-14T07:51:36.015Z" }, + { url = "https://files.pythonhosted.org/packages/69/16/d254359396c2f099ab154f89b2b35f5b8b0dd21a8102c2c96a7e00291434/cython-3.2.3-cp39-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:3829f99d611412288f44ff543e9d2b5c0c83274998b2a6680bbe5cca3539c1fd", size = 2993276, upload-time = "2025-12-14T07:51:37.863Z" }, + { url = "https://files.pythonhosted.org/packages/51/0e/1a071381923e896f751f8fbff2a01c5dc8860a8b9a90066f6ec8df561dc4/cython-3.2.3-cp39-abi3-musllinux_1_2_armv7l.whl", hash = "sha256:c2365a0c79ab9c0fa86d30a4a6ba7e37fc1be9537c48b79b9d63ee7e08bf2fef", size = 2890843, upload-time = "2025-12-14T07:51:40.409Z" }, + { url = "https://files.pythonhosted.org/packages/f4/46/1e93e10766db988e6bb8e5c6f7e2e90b9e62f1ac8dee4c1a6cf1fc170773/cython-3.2.3-cp39-abi3-musllinux_1_2_i686.whl", hash = "sha256:3141734fb15f8b5e9402b9240f8da8336edecae91742b41c85678c31ab68f66d", size = 3225339, upload-time = "2025-12-14T07:51:42.09Z" }, + { url = "https://files.pythonhosted.org/packages/d4/ae/c284b06ae6a9c95d5883bf8744d10466cf0df64cef041a4c80ccf9fd07bd/cython-3.2.3-cp39-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:9a24cc653fad3adbd9cbaa638d80df3aa08a1fe27f62eb35850971c70be680df", size = 3114751, upload-time = "2025-12-14T07:51:44.088Z" }, + { url = "https://files.pythonhosted.org/packages/c6/d6/7795a4775c70256217134195f06b07233cf17b00f8905d5b3d782208af64/cython-3.2.3-cp39-abi3-win32.whl", hash = "sha256:b39dff92db70cbd95528f3b81d70e06bd6d3fc9c1dd91321e4d3b999ece3bceb", size = 2435616, upload-time = "2025-12-14T07:51:46.063Z" }, + { url = 
"https://files.pythonhosted.org/packages/18/9e/2a3edcb858ad74e6274448dccf32150c532bc6e423f112a71f65ff3b5680/cython-3.2.3-cp39-abi3-win_arm64.whl", hash = "sha256:18edc858e6a52de47fe03ffa97ea14dadf450e20069de0a8aef531006c4bbd93", size = 2440952, upload-time = "2025-12-14T07:51:47.943Z" }, + { url = "https://files.pythonhosted.org/packages/e5/41/54fd429ff8147475fc24ca43246f85d78fb4e747c27f227e68f1594648f1/cython-3.2.3-py3-none-any.whl", hash = "sha256:06a1317097f540d3bb6c7b81ed58a0d8b9dbfa97abf39dfd4c22ee87a6c7241e", size = 1255561, upload-time = "2025-12-14T07:50:31.217Z" }, +] + +[[package]] +name = "dask" +version = "2025.12.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "click" }, + { name = "cloudpickle" }, + { name = "fsspec" }, + { name = "packaging" }, + { name = "partd" }, + { name = "pyyaml" }, + { name = "toolz" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/49/ae/92fca08ff8fe3e8413842564dd55ee30c9cd9e07629e1bf4d347b005a5bf/dask-2025.12.0.tar.gz", hash = "sha256:8d478f2aabd025e2453cf733ad64559de90cf328c20209e4574e9543707c3e1b", size = 10995316, upload-time = "2025-12-12T14:59:10.885Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/6f/3a/2121294941227c548d4b5f897a8a1b5f4c44a58f5437f239e6b86511d78e/dask-2025.12.0-py3-none-any.whl", hash = "sha256:4213ce9c5d51d6d89337cff69de35d902aa0bf6abdb8a25c942a4d0281f3a598", size = 1481293, upload-time = "2025-12-12T14:58:59.32Z" }, +] + [[package]] name = "datalad" version = "1.2.3" @@ -1098,15 +1204,15 @@ wheels = [ [[package]] name = "debugpy" -version = "1.8.18" +version = "1.8.19" source = { registry = "https://pypi.org/simple" } -sdist = { url = "https://files.pythonhosted.org/packages/62/1a/7cb5531840d7ba5d9329644109e62adee41f2f0083d9f8a4039f01de58cf/debugpy-1.8.18.tar.gz", hash = "sha256:02551b1b84a91faadd2db9bc4948873f2398190c95b3cc6f97dc706f43e8c433", size = 1644467, upload-time = "2025-12-10T19:48:07.236Z" } +sdist = { url = 
"https://files.pythonhosted.org/packages/73/75/9e12d4d42349b817cd545b89247696c67917aab907012ae5b64bbfea3199/debugpy-1.8.19.tar.gz", hash = "sha256:eea7e5987445ab0b5ed258093722d5ecb8bb72217c5c9b1e21f64efe23ddebdb", size = 1644590, upload-time = "2025-12-15T21:53:28.044Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/83/01/439626e3572a33ac543f25bc1dac1e80bc01c7ce83f3c24dc4441302ca13/debugpy-1.8.18-cp312-cp312-macosx_15_0_universal2.whl", hash = "sha256:530c38114725505a7e4ea95328dbc24aabb9be708c6570623c8163412e6d1d6b", size = 2549961, upload-time = "2025-12-10T19:48:21.73Z" }, - { url = "https://files.pythonhosted.org/packages/cd/73/1eeaa15c20a2b627be57a65bc1ebf2edd8d896950eac323588b127d776f2/debugpy-1.8.18-cp312-cp312-manylinux_2_34_x86_64.whl", hash = "sha256:a114865099283cbed4c9330cb0c9cb7a04cfa92e803577843657302d526141ec", size = 4309855, upload-time = "2025-12-10T19:48:23.41Z" }, - { url = "https://files.pythonhosted.org/packages/e4/6f/2da8ded21ae55df7067e57bd7f67ffed7e08b634f29bdba30c03d3f19918/debugpy-1.8.18-cp312-cp312-win32.whl", hash = "sha256:4d26736dfabf404e9f3032015ec7b0189e7396d0664e29e5bdbe7ac453043c95", size = 5280577, upload-time = "2025-12-10T19:48:25.386Z" }, - { url = "https://files.pythonhosted.org/packages/f5/8e/ebe887218c5b84f9421de7eb7bb7cdf196e84535c3f504a562219297d755/debugpy-1.8.18-cp312-cp312-win_amd64.whl", hash = "sha256:7e68ba950acbcf95ee862210133681f408cbb78d1c9badbb515230ec55ed6487", size = 5322458, upload-time = "2025-12-10T19:48:28.049Z" }, - { url = "https://files.pythonhosted.org/packages/dc/0d/bf7ac329c132436c57124202b5b5ccd6366e5d8e75eeb184cf078c826e8d/debugpy-1.8.18-py2.py3-none-any.whl", hash = "sha256:ab8cf0abe0fe2dfe1f7e65abc04b1db8740f9be80c1274acb625855c5c3ece6e", size = 5286576, upload-time = "2025-12-10T19:48:56.071Z" }, + { url = "https://files.pythonhosted.org/packages/4a/15/d762e5263d9e25b763b78be72dc084c7a32113a0bac119e2f7acae7700ed/debugpy-1.8.19-cp312-cp312-macosx_15_0_universal2.whl", hash = 
"sha256:bccb1540a49cde77edc7ce7d9d075c1dbeb2414751bc0048c7a11e1b597a4c2e", size = 2549995, upload-time = "2025-12-15T21:53:43.773Z" }, + { url = "https://files.pythonhosted.org/packages/a7/88/f7d25c68b18873b7c53d7c156ca7a7ffd8e77073aa0eac170a9b679cf786/debugpy-1.8.19-cp312-cp312-manylinux_2_34_x86_64.whl", hash = "sha256:e9c68d9a382ec754dc05ed1d1b4ed5bd824b9f7c1a8cd1083adb84b3c93501de", size = 4309891, upload-time = "2025-12-15T21:53:45.26Z" }, + { url = "https://files.pythonhosted.org/packages/c5/4f/a65e973aba3865794da65f71971dca01ae66666132c7b2647182d5be0c5f/debugpy-1.8.19-cp312-cp312-win32.whl", hash = "sha256:6599cab8a783d1496ae9984c52cb13b7c4a3bd06a8e6c33446832a5d97ce0bee", size = 5286355, upload-time = "2025-12-15T21:53:46.763Z" }, + { url = "https://files.pythonhosted.org/packages/d8/3a/d3d8b48fec96e3d824e404bf428276fb8419dfa766f78f10b08da1cb2986/debugpy-1.8.19-cp312-cp312-win_amd64.whl", hash = "sha256:66e3d2fd8f2035a8f111eb127fa508469dfa40928a89b460b41fd988684dc83d", size = 5328239, upload-time = "2025-12-15T21:53:48.868Z" }, + { url = "https://files.pythonhosted.org/packages/25/3e/e27078370414ef35fafad2c06d182110073daaeb5d3bf734b0b1eeefe452/debugpy-1.8.19-py2.py3-none-any.whl", hash = "sha256:360ffd231a780abbc414ba0f005dad409e71c78637efe8f2bd75837132a41d38", size = 5292321, upload-time = "2025-12-15T21:54:16.024Z" }, ] [[package]] @@ -1270,7 +1376,7 @@ wheels = [ [[package]] name = "fastapi" -version = "0.124.4" +version = "0.125.0" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "annotated-doc" }, @@ -1278,9 +1384,26 @@ dependencies = [ { name = "starlette" }, { name = "typing-extensions" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/cd/21/ade3ff6745a82ea8ad88552b4139d27941549e4f19125879f848ac8f3c3d/fastapi-0.124.4.tar.gz", hash = "sha256:0e9422e8d6b797515f33f500309f6e1c98ee4e85563ba0f2debb282df6343763", size = 378460, upload-time = "2025-12-12T15:00:43.891Z" } +sdist = { url = 
"https://files.pythonhosted.org/packages/17/71/2df15009fb4bdd522a069d2fbca6007c6c5487fce5cb965be00fc335f1d1/fastapi-0.125.0.tar.gz", hash = "sha256:16b532691a33e2c5dee1dac32feb31dc6eb41a3dd4ff29a95f9487cb21c054c0", size = 370550, upload-time = "2025-12-17T21:41:44.15Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/34/2f/ff2fcc98f500713368d8b650e1bbc4a0b3ebcdd3e050dcdaad5f5a13fd7e/fastapi-0.125.0-py3-none-any.whl", hash = "sha256:2570ec4f3aecf5cca8f0428aed2398b774fcdfee6c2116f86e80513f2f86a7a1", size = 112888, upload-time = "2025-12-17T21:41:41.286Z" }, +] + +[[package]] +name = "fastcluster" +version = "1.3.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "numpy" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/41/1e/417892546cb92e71f5bcaeffc8d89b47716fd811805a8ae559b91f754015/fastcluster-1.3.0.tar.gz", hash = "sha256:d5233aeba5c3faa949c7fa6a39345a09f716ccebbd748541e5735c866696df02", size = 173065, upload-time = "2025-05-06T17:45:30.101Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/3e/57/aa70121b5008f44031be645a61a7c4abc24e0e888ad3fc8fda916f4d188e/fastapi-0.124.4-py3-none-any.whl", hash = "sha256:6d1e703698443ccb89e50abe4893f3c84d9d6689c0cf1ca4fad6d3c15cf69f15", size = 113281, upload-time = "2025-12-12T15:00:42.44Z" }, + { url = "https://files.pythonhosted.org/packages/41/dc/b43081c5f4c1441b46e847adee464cea22dbb106891437b4a2d41a81f59a/fastcluster-1.3.0-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:b20785852abb0ba5af62316327b654cea0fd736f819cd48792de0875ffb485f0", size = 62799, upload-time = "2025-05-06T17:45:15.084Z" }, + { url = "https://files.pythonhosted.org/packages/fe/77/d1cf1f6e6c83c11ebcf4d378a5ea566d30b50e240477f695e33a9b88698b/fastcluster-1.3.0-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:6be2e33529917df1398f5d85ea55856ebddd81041b0fbe2dfc6badcb0c3b2054", size = 38879, upload-time = "2025-05-06T17:45:16.649Z" }, + { url = 
"https://files.pythonhosted.org/packages/e5/69/0bf77416d2fba60d773039eb236c6fcf64384236c58e63b5a2120e803af3/fastcluster-1.3.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:1ddd6df989ee9ced20c4ecd7cef8df421a10b5410913385bb29d9183d21cc5ee", size = 34778, upload-time = "2025-05-06T17:45:17.646Z" }, + { url = "https://files.pythonhosted.org/packages/db/36/bc720b34d27bcb40024d63692e1f30a4e9402670881121755c5a1fb5e5c8/fastcluster-1.3.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:d0d22a99d2fef1d7a314650e0f5fc78d4f91d4b233e5aa81b31da45506d25f21", size = 184385, upload-time = "2025-05-06T17:45:18.697Z" }, + { url = "https://files.pythonhosted.org/packages/27/eb/df607b9e505fc105539977c7da68af06a448d6dfb86355ff2b839c775fbe/fastcluster-1.3.0-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:428126288895fcb6316a239635bafaa19e4677240afee2723a952e488929091d", size = 194095, upload-time = "2025-05-06T17:45:20.378Z" }, + { url = "https://files.pythonhosted.org/packages/d0/63/e6ffa0b2cc9d708f9ab6eb4dd22fc843d64002e7cf9b2bc1ca6ec6df0dd7/fastcluster-1.3.0-cp312-cp312-win_amd64.whl", hash = "sha256:317db2531895cdf178a3009d3a8b13dfa83a5ed4ab14943b33377174cf9420cf", size = 37350, upload-time = "2025-05-06T17:45:22.018Z" }, ] [[package]] @@ -1360,11 +1483,11 @@ wheels = [ [[package]] name = "filelock" -version = "3.20.0" +version = "3.20.1" source = { registry = "https://pypi.org/simple" } -sdist = { url = "https://files.pythonhosted.org/packages/58/46/0028a82567109b5ef6e4d2a1f04a583fb513e6cf9527fcdd09afd817deeb/filelock-3.20.0.tar.gz", hash = "sha256:711e943b4ec6be42e1d4e6690b48dc175c822967466bb31c0c293f34334c13f4", size = 18922, upload-time = "2025-10-08T18:03:50.056Z" } +sdist = { url = "https://files.pythonhosted.org/packages/a7/23/ce7a1126827cedeb958fc043d61745754464eb56c5937c35bbf2b8e26f34/filelock-3.20.1.tar.gz", hash = 
"sha256:b8360948b351b80f420878d8516519a2204b07aefcdcfd24912a5d33127f188c", size = 19476, upload-time = "2025-12-15T23:54:28.027Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/76/91/7216b27286936c16f5b4d0c530087e4a54eead683e6b0b73dd0c64844af6/filelock-3.20.0-py3-none-any.whl", hash = "sha256:339b4732ffda5cd79b13f4e2711a31b0365ce445d95d243bb996273d072546a2", size = 16054, upload-time = "2025-10-08T18:03:48.35Z" }, + { url = "https://files.pythonhosted.org/packages/e3/7f/a1a97644e39e7316d850784c642093c99df1290a460df4ede27659056834/filelock-3.20.1-py3-none-any.whl", hash = "sha256:15d9e9a67306188a44baa72f569d2bfd803076269365fdea0934385da4dc361a", size = 16666, upload-time = "2025-12-15T23:54:26.874Z" }, ] [[package]] @@ -1401,19 +1524,19 @@ wheels = [ [[package]] name = "fonttools" -version = "4.61.0" +version = "4.61.1" source = { registry = "https://pypi.org/simple" } -sdist = { url = "https://files.pythonhosted.org/packages/33/f9/0e84d593c0e12244150280a630999835a64f2852276161b62a0f98318de0/fonttools-4.61.0.tar.gz", hash = "sha256:ec520a1f0c7758d7a858a00f090c1745f6cde6a7c5e76fb70ea4044a15f712e7", size = 3561884, upload-time = "2025-11-28T17:05:49.491Z" } +sdist = { url = "https://files.pythonhosted.org/packages/ec/ca/cf17b88a8df95691275a3d77dc0a5ad9907f328ae53acbe6795da1b2f5ed/fonttools-4.61.1.tar.gz", hash = "sha256:6675329885c44657f826ef01d9e4fb33b9158e9d93c537d84ad8399539bc6f69", size = 3565756, upload-time = "2025-12-12T17:31:24.246Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/00/5d/19e5939f773c7cb05480fe2e881d63870b63ee2b4bdb9a77d55b1d36c7b9/fonttools-4.61.0-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:e24a1565c4e57111ec7f4915f8981ecbb61adf66a55f378fdc00e206059fcfef", size = 2846930, upload-time = "2025-11-28T17:04:46.639Z" }, - { url = "https://files.pythonhosted.org/packages/25/b2/0658faf66f705293bd7e739a4f038302d188d424926be9c59bdad945664b/fonttools-4.61.0-cp312-cp312-macosx_10_13_x86_64.whl", hash = 
"sha256:e2bfacb5351303cae9f072ccf3fc6ecb437a6f359c0606bae4b1ab6715201d87", size = 2383016, upload-time = "2025-11-28T17:04:48.525Z" }, - { url = "https://files.pythonhosted.org/packages/29/a3/1fa90b95b690f0d7541f48850adc40e9019374d896c1b8148d15012b2458/fonttools-4.61.0-cp312-cp312-manylinux1_x86_64.manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:0bdcf2e29d65c26299cc3d502f4612365e8b90a939f46cd92d037b6cb7bb544a", size = 4949425, upload-time = "2025-11-28T17:04:50.482Z" }, - { url = "https://files.pythonhosted.org/packages/af/00/acf18c00f6c501bd6e05ee930f926186f8a8e268265407065688820f1c94/fonttools-4.61.0-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:e6cd0d9051b8ddaf7385f99dd82ec2a058e2b46cf1f1961e68e1ff20fcbb61af", size = 4999632, upload-time = "2025-11-28T17:04:52.508Z" }, - { url = "https://files.pythonhosted.org/packages/5f/e0/19a2b86e54109b1d2ee8743c96a1d297238ae03243897bc5345c0365f34d/fonttools-4.61.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:e074bc07c31406f45c418e17c1722e83560f181d122c412fa9e815df0ff74810", size = 4939438, upload-time = "2025-11-28T17:04:54.437Z" }, - { url = "https://files.pythonhosted.org/packages/04/35/7b57a5f57d46286360355eff8d6b88c64ab6331107f37a273a71c803798d/fonttools-4.61.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:5a9b78da5d5faa17e63b2404b77feeae105c1b7e75f26020ab7a27b76e02039f", size = 5088960, upload-time = "2025-11-28T17:04:56.348Z" }, - { url = "https://files.pythonhosted.org/packages/3e/0e/6c5023eb2e0fe5d1ababc7e221e44acd3ff668781489cc1937a6f83d620a/fonttools-4.61.0-cp312-cp312-win32.whl", hash = "sha256:9821ed77bb676736b88fa87a737c97b6af06e8109667e625a4f00158540ce044", size = 2264404, upload-time = "2025-11-28T17:04:58.149Z" }, - { url = "https://files.pythonhosted.org/packages/36/0b/63273128c7c5df19b1e4cd92e0a1e6ea5bb74a400c4905054c96ad60a675/fonttools-4.61.0-cp312-cp312-win_amd64.whl", hash = 
"sha256:0011d640afa61053bc6590f9a3394bd222de7cfde19346588beabac374e9d8ac", size = 2314427, upload-time = "2025-11-28T17:04:59.812Z" }, - { url = "https://files.pythonhosted.org/packages/0c/14/634f7daea5ffe6a5f7a0322ba8e1a0e23c9257b80aa91458107896d1dfc7/fonttools-4.61.0-py3-none-any.whl", hash = "sha256:276f14c560e6f98d24ef7f5f44438e55ff5a67f78fa85236b218462c9f5d0635", size = 1144485, upload-time = "2025-11-28T17:05:47.573Z" }, + { url = "https://files.pythonhosted.org/packages/6f/16/7decaa24a1bd3a70c607b2e29f0adc6159f36a7e40eaba59846414765fd4/fonttools-4.61.1-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:f3cb4a569029b9f291f88aafc927dd53683757e640081ca8c412781ea144565e", size = 2851593, upload-time = "2025-12-12T17:30:04.225Z" }, + { url = "https://files.pythonhosted.org/packages/94/98/3c4cb97c64713a8cf499b3245c3bf9a2b8fd16a3e375feff2aed78f96259/fonttools-4.61.1-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:41a7170d042e8c0024703ed13b71893519a1a6d6e18e933e3ec7507a2c26a4b2", size = 2400231, upload-time = "2025-12-12T17:30:06.47Z" }, + { url = "https://files.pythonhosted.org/packages/b7/37/82dbef0f6342eb01f54bca073ac1498433d6ce71e50c3c3282b655733b31/fonttools-4.61.1-cp312-cp312-manylinux1_x86_64.manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:10d88e55330e092940584774ee5e8a6971b01fc2f4d3466a1d6c158230880796", size = 4954103, upload-time = "2025-12-12T17:30:08.432Z" }, + { url = "https://files.pythonhosted.org/packages/6c/44/f3aeac0fa98e7ad527f479e161aca6c3a1e47bb6996b053d45226fe37bf2/fonttools-4.61.1-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:15acc09befd16a0fb8a8f62bc147e1a82817542d72184acca9ce6e0aeda9fa6d", size = 5004295, upload-time = "2025-12-12T17:30:10.56Z" }, + { url = "https://files.pythonhosted.org/packages/14/e8/7424ced75473983b964d09f6747fa09f054a6d656f60e9ac9324cf40c743/fonttools-4.61.1-cp312-cp312-musllinux_1_2_aarch64.whl", hash = 
"sha256:e6bcdf33aec38d16508ce61fd81838f24c83c90a1d1b8c68982857038673d6b8", size = 4944109, upload-time = "2025-12-12T17:30:12.874Z" }, + { url = "https://files.pythonhosted.org/packages/c8/8b/6391b257fa3d0b553d73e778f953a2f0154292a7a7a085e2374b111e5410/fonttools-4.61.1-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:5fade934607a523614726119164ff621e8c30e8fa1ffffbbd358662056ba69f0", size = 5093598, upload-time = "2025-12-12T17:30:15.79Z" }, + { url = "https://files.pythonhosted.org/packages/d9/71/fd2ea96cdc512d92da5678a1c98c267ddd4d8c5130b76d0f7a80f9a9fde8/fonttools-4.61.1-cp312-cp312-win32.whl", hash = "sha256:75da8f28eff26defba42c52986de97b22106cb8f26515b7c22443ebc9c2d3261", size = 2269060, upload-time = "2025-12-12T17:30:18.058Z" }, + { url = "https://files.pythonhosted.org/packages/80/3b/a3e81b71aed5a688e89dfe0e2694b26b78c7d7f39a5ffd8a7d75f54a12a8/fonttools-4.61.1-cp312-cp312-win_amd64.whl", hash = "sha256:497c31ce314219888c0e2fce5ad9178ca83fe5230b01a5006726cdf3ac9f24d9", size = 2319078, upload-time = "2025-12-12T17:30:22.862Z" }, + { url = "https://files.pythonhosted.org/packages/c7/4e/ce75a57ff3aebf6fc1f4e9d508b8e5810618a33d900ad6c19eb30b290b97/fonttools-4.61.1-py3-none-any.whl", hash = "sha256:17d2bf5d541add43822bcf0c43d7d847b160c9bb01d15d5007d84e2217aaa371", size = 1148996, upload-time = "2025-12-12T17:31:21.03Z" }, ] [[package]] @@ -1505,29 +1628,29 @@ wheels = [ [[package]] name = "google-auth" -version = "2.43.0" +version = "2.45.0" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "cachetools" }, { name = "pyasn1-modules" }, { name = "rsa" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/ff/ef/66d14cf0e01b08d2d51ffc3c20410c4e134a1548fc246a6081eae585a4fe/google_auth-2.43.0.tar.gz", hash = "sha256:88228eee5fc21b62a1b5fe773ca15e67778cb07dc8363adcb4a8827b52d81483", size = 296359, upload-time = "2025-11-06T00:13:36.587Z" } +sdist = { url = 
"https://files.pythonhosted.org/packages/e5/00/3c794502a8b892c404b2dea5b3650eb21bfc7069612fbfd15c7f17c1cb0d/google_auth-2.45.0.tar.gz", hash = "sha256:90d3f41b6b72ea72dd9811e765699ee491ab24139f34ebf1ca2b9cc0c38708f3", size = 320708, upload-time = "2025-12-15T22:58:42.889Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/6f/d1/385110a9ae86d91cc14c5282c61fe9f4dc41c0b9f7d423c6ad77038c4448/google_auth-2.43.0-py2.py3-none-any.whl", hash = "sha256:af628ba6fa493f75c7e9dbe9373d148ca9f4399b5ea29976519e0a3848eddd16", size = 223114, upload-time = "2025-11-06T00:13:35.209Z" }, + { url = "https://files.pythonhosted.org/packages/c6/97/451d55e05487a5cd6279a01a7e34921858b16f7dc8aa38a2c684743cd2b3/google_auth-2.45.0-py2.py3-none-any.whl", hash = "sha256:82344e86dc00410ef5382d99be677c6043d72e502b625aa4f4afa0bdacca0f36", size = 233312, upload-time = "2025-12-15T22:58:40.777Z" }, ] [[package]] name = "google-crc32c" -version = "1.7.1" +version = "1.8.0" source = { registry = "https://pypi.org/simple" } -sdist = { url = "https://files.pythonhosted.org/packages/19/ae/87802e6d9f9d69adfaedfcfd599266bf386a54d0be058b532d04c794f76d/google_crc32c-1.7.1.tar.gz", hash = "sha256:2bff2305f98846f3e825dbeec9ee406f89da7962accdb29356e4eadc251bd472", size = 14495, upload-time = "2025-03-26T14:29:13.32Z" } +sdist = { url = "https://files.pythonhosted.org/packages/03/41/4b9c02f99e4c5fb477122cd5437403b552873f014616ac1d19ac8221a58d/google_crc32c-1.8.0.tar.gz", hash = "sha256:a428e25fb7691024de47fecfbff7ff957214da51eddded0da0ae0e0f03a2cf79", size = 14192, upload-time = "2025-12-16T00:35:25.142Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/dd/b7/787e2453cf8639c94b3d06c9d61f512234a82e1d12d13d18584bd3049904/google_crc32c-1.7.1-cp312-cp312-macosx_12_0_arm64.whl", hash = "sha256:2d73a68a653c57281401871dd4aeebbb6af3191dcac751a76ce430df4d403194", size = 30470, upload-time = "2025-03-26T14:34:31.655Z" }, - { url = 
"https://files.pythonhosted.org/packages/ed/b4/6042c2b0cbac3ec3a69bb4c49b28d2f517b7a0f4a0232603c42c58e22b44/google_crc32c-1.7.1-cp312-cp312-macosx_12_0_x86_64.whl", hash = "sha256:22beacf83baaf59f9d3ab2bbb4db0fb018da8e5aebdce07ef9f09fce8220285e", size = 30315, upload-time = "2025-03-26T15:01:54.634Z" }, - { url = "https://files.pythonhosted.org/packages/29/ad/01e7a61a5d059bc57b702d9ff6a18b2585ad97f720bd0a0dbe215df1ab0e/google_crc32c-1.7.1-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:19eafa0e4af11b0a4eb3974483d55d2d77ad1911e6cf6f832e1574f6781fd337", size = 33180, upload-time = "2025-03-26T14:41:32.168Z" }, - { url = "https://files.pythonhosted.org/packages/3b/a5/7279055cf004561894ed3a7bfdf5bf90a53f28fadd01af7cd166e88ddf16/google_crc32c-1.7.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:b6d86616faaea68101195c6bdc40c494e4d76f41e07a37ffdef270879c15fb65", size = 32794, upload-time = "2025-03-26T14:41:33.264Z" }, - { url = "https://files.pythonhosted.org/packages/0f/d6/77060dbd140c624e42ae3ece3df53b9d811000729a5c821b9fd671ceaac6/google_crc32c-1.7.1-cp312-cp312-win_amd64.whl", hash = "sha256:b7491bdc0c7564fcf48c0179d2048ab2f7c7ba36b84ccd3a3e1c3f7a72d3bba6", size = 33477, upload-time = "2025-03-26T14:29:10.94Z" }, + { url = "https://files.pythonhosted.org/packages/e9/5f/7307325b1198b59324c0fa9807cafb551afb65e831699f2ce211ad5c8240/google_crc32c-1.8.0-cp312-cp312-macosx_12_0_arm64.whl", hash = "sha256:4b8286b659c1335172e39563ab0a768b8015e88e08329fa5321f774275fc3113", size = 31300, upload-time = "2025-12-16T00:21:56.723Z" }, + { url = "https://files.pythonhosted.org/packages/21/8e/58c0d5d86e2220e6a37befe7e6a94dd2f6006044b1a33edf1ff6d9f7e319/google_crc32c-1.8.0-cp312-cp312-macosx_12_0_x86_64.whl", hash = "sha256:2a3dc3318507de089c5384cc74d54318401410f82aa65b2d9cdde9d297aca7cb", size = 30867, upload-time = "2025-12-16T00:38:31.302Z" }, + { url = 
"https://files.pythonhosted.org/packages/ce/a9/a780cc66f86335a6019f557a8aaca8fbb970728f0efd2430d15ff1beae0e/google_crc32c-1.8.0-cp312-cp312-manylinux1_x86_64.manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:14f87e04d613dfa218d6135e81b78272c3b904e2a7053b841481b38a7d901411", size = 33364, upload-time = "2025-12-16T00:40:22.96Z" }, + { url = "https://files.pythonhosted.org/packages/21/3f/3457ea803db0198c9aaca2dd373750972ce28a26f00544b6b85088811939/google_crc32c-1.8.0-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:cb5c869c2923d56cb0c8e6bcdd73c009c36ae39b652dbe46a05eb4ef0ad01454", size = 33740, upload-time = "2025-12-16T00:40:23.96Z" }, + { url = "https://files.pythonhosted.org/packages/df/c0/87c2073e0c72515bb8733d4eef7b21548e8d189f094b5dad20b0ecaf64f6/google_crc32c-1.8.0-cp312-cp312-win_amd64.whl", hash = "sha256:3cc0c8912038065eafa603b238abf252e204accab2a704c63b9e14837a854962", size = 34437, upload-time = "2025-12-16T00:35:21.395Z" }, ] [[package]] @@ -1640,6 +1763,23 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/09/a8/2d02b10a66747c54446e932171dd89b8b4126c0111b440e6bc05a7c852ec/h5py-3.15.1-cp312-cp312-win_arm64.whl", hash = "sha256:61d5a58a9851e01ee61c932bbbb1c98fe20aba0a5674776600fb9a361c0aa652", size = 2458214, upload-time = "2025-10-16T10:34:35.733Z" }, ] +[[package]] +name = "harmony-pytorch" +version = "0.1.8" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "numpy" }, + { name = "pandas" }, + { name = "psutil" }, + { name = "scikit-learn" }, + { name = "threadpoolctl" }, + { name = "torch" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/8d/df/71fe694d70ff148db2364c4118a6bbd37e598be269be471ed3d2f6bfee1a/harmony-pytorch-0.1.8.tar.gz", hash = "sha256:1b097906d49c6ed9dde6cf234f7d987fb49a3b649b8a1323d99e6ea71b5b7df2", size = 8373, upload-time = "2024-01-07T21:36:56.059Z" } +wheels = [ + { url = 
"https://files.pythonhosted.org/packages/75/da/42486f1c79b6f2db9140ee23161791e5b25d9369f30c1d9f67b67f3eb4bf/harmony_pytorch-0.1.8-py3-none-any.whl", hash = "sha256:1f92f6145ea93225b0226fda9da5bdd442e411d14ff402052afae0fde7fd1452", size = 8474, upload-time = "2024-01-07T21:36:54.488Z" }, +] + [[package]] name = "hbreader" version = "0.9.1" @@ -1792,6 +1932,26 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/0e/61/66938bbb5fc52dbdf84594873d5b51fb1f7c7794e9c0f5bd885f30bc507b/idna-3.11-py3-none-any.whl", hash = "sha256:771a87f49d9defaf64091e6e6fe9c18d4833f140bd19464795bc32d966ca37ea", size = 71008, upload-time = "2025-10-12T14:55:18.883Z" }, ] +[[package]] +name = "igraph" +version = "1.0.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "texttable" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/23/be/56bef1919005b4caf1f71522b300d359f7faeb7ae93a3b0baa9b4f146a87/igraph-1.0.0.tar.gz", hash = "sha256:2414d0be2e4d77ee5357807d100974b40f6082bb1bb71988ec46cfb6728651ee", size = 5077105, upload-time = "2025-10-23T12:22:50.127Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/a5/03/3278ad0ceb3ea0e84d8ae3a85bdded4d0e57853aeb802a200feb43847b93/igraph-1.0.0-cp39-abi3-macosx_10_15_x86_64.whl", hash = "sha256:c2cbc415e02523e5a241eecee82319080bf928a70b1ba299f3b3e25bf029b6d4", size = 2257415, upload-time = "2025-10-23T12:22:27.246Z" }, + { url = "https://files.pythonhosted.org/packages/0d/bc/6281ec7f9baaf71ee57c3b1748da2d3148d15d253e1a03006f204aa68ca5/igraph-1.0.0-cp39-abi3-macosx_11_0_arm64.whl", hash = "sha256:1a27753cd80680a8f676c2d5a467aaa4a95e510b30748398ec4e4aeb982130e8", size = 2048555, upload-time = "2025-10-23T12:22:29.49Z" }, + { url = "https://files.pythonhosted.org/packages/2a/38/3cd6428a4ed4c09a56df05998438e7774fd1d799ee4fb8fc481674f5f7fc/igraph-1.0.0-cp39-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:a55dc3a2a4e3fc3eba42479910c1511bfc3ecb33cdf5f0406891fd85f14b5aee", size = 5314141, 
upload-time = "2025-10-23T12:22:31.023Z" }, + { url = "https://files.pythonhosted.org/packages/7d/da/dd2867c25adbb41563720f14b5fc895c98bf88be682a3faff4f7b3118d2a/igraph-1.0.0-cp39-abi3-manylinux_2_28_x86_64.whl", hash = "sha256:2d04c2c76f686fb1f554ee35dfd3085f5e73b7965ba6b4cf06d53e66b1955522", size = 5683134, upload-time = "2025-10-23T12:22:32.423Z" }, + { url = "https://files.pythonhosted.org/packages/e5/40/243c118d34ab80382d7009c4dcb99b887384c3d2ce84d29eeac19e2a007a/igraph-1.0.0-cp39-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:f2b52dc1757fff0fed29a9f7a276d971a11db4211569ed78b9eab36288dfcc9d", size = 6211583, upload-time = "2025-10-23T12:22:34.238Z" }, + { url = "https://files.pythonhosted.org/packages/1d/b7/88f433819c54b496cb0315fce28e658970cb20ff5dbd52a5a605ce2888de/igraph-1.0.0-cp39-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:05c79a2a8fca695b2f217a6fa7f2549f896f757d4db41be32a055400cb19cc30", size = 6594509, upload-time = "2025-10-23T12:22:35.831Z" }, + { url = "https://files.pythonhosted.org/packages/7b/5d/8f7f6f619d374e959aa3664ebc4b24c10abc90c2e8efbed97f2623fadaf5/igraph-1.0.0-cp39-abi3-win32.whl", hash = "sha256:c2bce3cd472fec3dd9c4d8a3ea5b6b9be65fb30edf760beb4850760dd4f2d479", size = 2725406, upload-time = "2025-10-23T12:22:37.588Z" }, + { url = "https://files.pythonhosted.org/packages/af/77/a85b3745cf40a0572bae2de8cd9c2a2a8af78e5cf3e880fc0a249114e609/igraph-1.0.0-cp39-abi3-win_amd64.whl", hash = "sha256:faeff8ede0cf15eb4ded44b0fcea6e1886740146e60504c24ad2da14e0939563", size = 3221663, upload-time = "2025-10-23T12:22:39.404Z" }, + { url = "https://files.pythonhosted.org/packages/ef/7e/5df541c37bdf6493035e89c22bd53f30d99b291bcda6c78e9a8afeecec2b/igraph-1.0.0-cp39-abi3-win_arm64.whl", hash = "sha256:b607cafc24b10a615e713ee96e58208ef27e0764af80140c7cc45d4724a3f2df", size = 2785701, upload-time = "2025-10-23T12:22:41.03Z" }, +] + [[package]] name = "ijson" version = "3.4.0.post0" @@ -1811,6 +1971,19 @@ wheels = [ { url = 
"https://files.pythonhosted.org/packages/a1/2b/6f7ade27a8ff5758fc41006dadd2de01730def84fe3e60553b329c59e0d4/ijson-3.4.0.post0-cp312-cp312-win_amd64.whl", hash = "sha256:e15833dcf6f6d188fdc624a31cd0520c3ba21b6855dc304bc7c1a8aeca02d4ac", size = 54789, upload-time = "2025-10-10T05:28:19.552Z" }, ] +[[package]] +name = "imageio" +version = "2.37.2" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "numpy" }, + { name = "pillow" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/a3/6f/606be632e37bf8d05b253e8626c2291d74c691ddc7bcdf7d6aaf33b32f6a/imageio-2.37.2.tar.gz", hash = "sha256:0212ef2727ac9caa5ca4b2c75ae89454312f440a756fcfc8ef1993e718f50f8a", size = 389600, upload-time = "2025-11-04T14:29:39.898Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/fb/fe/301e0936b79bcab4cacc7548bf2853fc28dced0a578bab1f7ef53c9aa75b/imageio-2.37.2-py3-none-any.whl", hash = "sha256:ad9adfb20335d718c03de457358ed69f141021a333c40a53e57273d8a5bd0b9b", size = 317646, upload-time = "2025-11-04T14:29:37.948Z" }, +] + [[package]] name = "imagesize" version = "1.4.1" @@ -2064,11 +2237,11 @@ wheels = [ [[package]] name = "joblib" -version = "1.5.2" +version = "1.5.3" source = { registry = "https://pypi.org/simple" } -sdist = { url = "https://files.pythonhosted.org/packages/e8/5d/447af5ea094b9e4c4054f82e223ada074c552335b9b4b2d14bd9b35a67c4/joblib-1.5.2.tar.gz", hash = "sha256:3faa5c39054b2f03ca547da9b2f52fde67c06240c31853f306aea97f13647b55", size = 331077, upload-time = "2025-08-27T12:15:46.575Z" } +sdist = { url = "https://files.pythonhosted.org/packages/41/f2/d34e8b3a08a9cc79a50b2208a93dce981fe615b64d5a4d4abee421d898df/joblib-1.5.3.tar.gz", hash = "sha256:8561a3269e6801106863fd0d6d84bb737be9e7631e33aaed3fb9ce5953688da3", size = 331603, upload-time = "2025-12-15T08:41:46.427Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/1e/e8/685f47e0d754320684db4425a0967f7d3fa70126bffd76110b7009a0090f/joblib-1.5.2-py3-none-any.whl", 
hash = "sha256:4e1f0bdbb987e6d843c70cf43714cb276623def372df3c22fe5266b2670bc241", size = 308396, upload-time = "2025-08-27T12:15:45.188Z" }, + { url = "https://files.pythonhosted.org/packages/7b/91/984aca2ec129e2757d1e4e3c81c3fcda9d0f85b74670a094cc443d9ee949/joblib-1.5.3-py3-none-any.whl", hash = "sha256:5fc3c5039fc5ca8c0276333a188bbd59d6b7ab37fe6632daa76bc7f9ec18e713", size = 309071, upload-time = "2025-12-15T08:41:44.973Z" }, ] [[package]] @@ -2341,7 +2514,7 @@ wheels = [ [[package]] name = "jupyterlab" -version = "4.5.0" +version = "4.5.1" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "async-lru" }, @@ -2358,9 +2531,9 @@ dependencies = [ { name = "tornado" }, { name = "traitlets" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/df/e5/4fa382a796a6d8e2cd867816b64f1ff27f906e43a7a83ad9eb389e448cd8/jupyterlab-4.5.0.tar.gz", hash = "sha256:aec33d6d8f1225b495ee2cf20f0514f45e6df8e360bdd7ac9bace0b7ac5177ea", size = 23989880, upload-time = "2025-11-18T13:19:00.365Z" } +sdist = { url = "https://files.pythonhosted.org/packages/09/21/413d142686a4e8f4268d985becbdb4daf060524726248e73be4773786987/jupyterlab-4.5.1.tar.gz", hash = "sha256:09da1ddfbd9eec18b5101dbb8515612aa1e47443321fb99503725a88e93d20d9", size = 23992251, upload-time = "2025-12-15T16:58:59.361Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/6c/1e/5a4d5498eba382fee667ed797cf64ae5d1b13b04356df62f067f48bb0f61/jupyterlab-4.5.0-py3-none-any.whl", hash = "sha256:88e157c75c1afff64c7dc4b801ec471450b922a4eae4305211ddd40da8201c8a", size = 12380641, upload-time = "2025-11-18T13:18:56.252Z" }, + { url = "https://files.pythonhosted.org/packages/af/c3/acced767eecc11a70c65c45295db5396c4f0c1937874937d5a76d7b177b6/jupyterlab-4.5.1-py3-none-any.whl", hash = "sha256:31b059de96de0754ff1f2ce6279774b6aab8c34d7082e9752db58207c99bd514", size = 12384821, upload-time = "2025-12-15T16:58:55.563Z" }, ] [[package]] @@ -2536,6 +2709,35 @@ wheels = [ { url = 
"https://files.pythonhosted.org/packages/83/60/d497a310bde3f01cb805196ac61b7ad6dc5dcf8dce66634dc34364b20b4f/lazy_loader-0.4-py3-none-any.whl", hash = "sha256:342aa8e14d543a154047afb4ba8ef17f5563baad3fc610d7b15b213b0f119efc", size = 12097, upload-time = "2024-04-05T13:03:10.514Z" }, ] +[[package]] +name = "legacy-api-wrap" +version = "1.5" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/58/49/f06f94048c8974205730d40beca879e43b6eee08efb0101cfb8623e60f41/legacy_api_wrap-1.5.tar.gz", hash = "sha256:b41ba6532f3ebfe3a897a35a7f97dec3be04b92a450f6c2bcf89f1b91c9cadf2", size = 11610, upload-time = "2025-11-03T13:21:12.437Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/41/5b/058db09c45ba58a7321bdf2294cae651b37d6fec68117265af90cde043b0/legacy_api_wrap-1.5-py3-none-any.whl", hash = "sha256:5a8ea50e3e3bcbcdec3447b77034fd0d32cb2cf4089db799238708e4d7e0098d", size = 10182, upload-time = "2025-11-03T13:21:11.102Z" }, +] + +[[package]] +name = "leidenalg" +version = "0.11.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "igraph" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/5d/a5/853e93441aed7f82b0389f86f37e19413e817ba0c54cc790895935256968/leidenalg-0.11.0.tar.gz", hash = "sha256:f454be96bbc8089ea2a90ca853d8d389ab646de964a03bd58417f8b29ff8ef5d", size = 452850, upload-time = "2025-10-31T17:14:48.684Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/28/a9/4ab4e244215db0c8b626e4bed0d3e0fbd191c52d2d5f5cb9d160139ecc7e/leidenalg-0.11.0-cp38-abi3-macosx_10_9_x86_64.whl", hash = "sha256:5607589050bfc1926e657b4d8a3b5341fe1eb81018c22cf4a3d3a39e368d1fcb", size = 2256514, upload-time = "2025-10-31T17:14:27.574Z" }, + { url = "https://files.pythonhosted.org/packages/98/f4/98db342d603671ae0a233f0a624939a47161044a2716cbd62a50440a1132/leidenalg-0.11.0-cp38-abi3-macosx_11_0_arm64.whl", hash = 
"sha256:9b5781876b1f1faed72a4f9926ff52de286843556b9d6791fe25a2acb33b7a5c", size = 1926003, upload-time = "2025-10-31T17:14:29.521Z" }, + { url = "https://files.pythonhosted.org/packages/9b/38/fd6ac21af10b12828b472eada4fce0edf2a212581238ad0c8d1afebc6f98/leidenalg-0.11.0-cp38-abi3-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:a80f49477f8e793f27d8e08949177f19e3834cd878af50a662b4f87335d06549", size = 2545535, upload-time = "2025-10-31T17:14:31.102Z" }, + { url = "https://files.pythonhosted.org/packages/77/87/b087584750a788535b4a8d56ddeb82a175d32b472aa5338a4e2cc593a42c/leidenalg-0.11.0-cp38-abi3-manylinux_2_26_i686.manylinux_2_28_i686.whl", hash = "sha256:2143be3e80485584ccbdf927323fce65345da17facd0f8b438f11015f5dc6c27", size = 2845029, upload-time = "2025-10-31T17:14:32.815Z" }, + { url = "https://files.pythonhosted.org/packages/b0/a4/a89e2ce16a580f7bea066ed49364f0b3e04a6412f0c3692975bee8515141/leidenalg-0.11.0-cp38-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:571a0934f831a69442d82889d319bdba93de924bd9e09b720cd8cbe6fdc08c17", size = 2738084, upload-time = "2025-10-31T17:14:35.246Z" }, + { url = "https://files.pythonhosted.org/packages/e8/fe/8923cac6cd7c9e0ac5f38aaa69a4744c93d025575763d05f7a3baae8020d/leidenalg-0.11.0-cp38-abi3-musllinux_1_2_i686.whl", hash = "sha256:aec03e7178b19102dd271b453a39b9865cf283b4113151ba60514e5681046294", size = 4070307, upload-time = "2025-10-31T17:14:36.796Z" }, + { url = "https://files.pythonhosted.org/packages/fe/94/beaab5ee9968f9389f705532c31ffb868bad8a5ce68fb699ddde5ddc5409/leidenalg-0.11.0-cp38-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:310b9269a11fd1e960590c1a2b6ff685a2cc42aa3234ce67bc2a623ab61f26a9", size = 3797863, upload-time = "2025-10-31T17:14:38.124Z" }, + { url = "https://files.pythonhosted.org/packages/6e/8e/8caf4ba38fd7d8e6197348b348a4ab666b1b3117225ea2f0934a98a93176/leidenalg-0.11.0-cp38-abi3-win32.whl", hash = 
"sha256:5ea4cd7ee054540112b28f7e2d64658dcccd59f61a5d6a08a41df808645f96e9", size = 1643351, upload-time = "2025-10-31T17:14:39.385Z" }, + { url = "https://files.pythonhosted.org/packages/47/15/7d459a8e2a43f17c1db129b997b7bb7aa7f000a0967bab87c28b8c5cf448/leidenalg-0.11.0-cp38-abi3-win_amd64.whl", hash = "sha256:5e789c0960008d185413344a402d0587580c441644d4d20bf57c96f25d4d1710", size = 1990321, upload-time = "2025-10-31T17:14:40.892Z" }, +] + [[package]] name = "linkcheckmd" version = "1.4.0" @@ -2631,6 +2833,15 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/2a/6b/d139535d7590a1bba1ceb68751bef22fadaa5b815bbdf0e858e3875726b2/llvmlite-0.46.0-cp312-cp312-win_amd64.whl", hash = "sha256:398b39db462c39563a97b912d4f2866cd37cba60537975a09679b28fbbc0fb38", size = 38138940, upload-time = "2025-12-08T18:15:10.162Z" }, ] +[[package]] +name = "locket" +version = "1.0.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/2f/83/97b29fe05cb6ae28d2dbd30b81e2e402a3eed5f460c26e9eaa5895ceacf5/locket-1.0.0.tar.gz", hash = "sha256:5c0d4c052a8bbbf750e056a8e65ccd309086f4f0f18a2eac306a8dfa4112a632", size = 4350, upload-time = "2022-04-20T22:04:44.312Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/db/bc/83e112abc66cd466c6b83f99118035867cecd41802f8d044638aa78a106e/locket-1.0.0-py2.py3-none-any.whl", hash = "sha256:b6c819a722f7b6bd955b80781788e4a66a55628b858d347536b7e81325a3a5e3", size = 4398, upload-time = "2022-04-20T22:04:42.23Z" }, +] + [[package]] name = "loguru" version = "0.7.3" @@ -3025,11 +3236,20 @@ wheels = [ [[package]] name = "narwhals" -version = "2.13.0" +version = "2.14.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/4a/84/897fe7b6406d436ef312e57e5a1a13b4a5e7e36d1844e8d934ce8880e3d3/narwhals-2.14.0.tar.gz", hash = "sha256:98be155c3599db4d5c211e565c3190c398c87e7bf5b3cdb157dece67641946e0", size = 600648, upload-time = 
"2025-12-16T11:29:13.458Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/79/3e/b8ecc67e178919671695f64374a7ba916cf0adbf86efedc6054f38b5b8ae/narwhals-2.14.0-py3-none-any.whl", hash = "sha256:b56796c9a00179bd757d15282c540024e1d5c910b19b8c9944d836566c030acf", size = 430788, upload-time = "2025-12-16T11:29:11.699Z" }, +] + +[[package]] +name = "natsort" +version = "8.4.0" source = { registry = "https://pypi.org/simple" } -sdist = { url = "https://files.pythonhosted.org/packages/89/ea/f82ef99ced4d03c33bb314c9b84a08a0a86c448aaa11ffd6256b99538aa5/narwhals-2.13.0.tar.gz", hash = "sha256:ee94c97f4cf7cfeebbeca8d274784df8b3d7fd3f955ce418af998d405576fdd9", size = 594555, upload-time = "2025-12-01T13:54:05.329Z" } +sdist = { url = "https://files.pythonhosted.org/packages/e2/a9/a0c57aee75f77794adaf35322f8b6404cbd0f89ad45c87197a937764b7d0/natsort-8.4.0.tar.gz", hash = "sha256:45312c4a0e5507593da193dedd04abb1469253b601ecaf63445ad80f0a1ea581", size = 76575, upload-time = "2023-06-20T04:17:19.925Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/87/0d/1861d1599571974b15b025e12b142d8e6b42ad66c8a07a89cb0fc21f1e03/narwhals-2.13.0-py3-none-any.whl", hash = "sha256:9b795523c179ca78204e3be53726da374168f906e38de2ff174c2363baaaf481", size = 426407, upload-time = "2025-12-01T13:54:03.861Z" }, + { url = "https://files.pythonhosted.org/packages/ef/82/7a9d0550484a62c6da82858ee9419f3dd1ccc9aa1c26a1e43da3ecd20b0d/natsort-8.4.0-py3-none-any.whl", hash = "sha256:4732914fb471f56b5cce04d7bae6f164a592c7712e1c85f9ef585e197299521c", size = 38268, upload-time = "2023-06-20T04:17:17.522Z" }, ] [[package]] @@ -3181,7 +3401,7 @@ wheels = [ [[package]] name = "notebook" -version = "7.5.0" +version = "7.5.1" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "jupyter-server" }, @@ -3190,9 +3410,9 @@ dependencies = [ { name = "notebook-shim" }, { name = "tornado" }, ] -sdist = { url = 
"https://files.pythonhosted.org/packages/89/ac/a97041621250a4fc5af379fb377942841eea2ca146aab166b8fcdfba96c2/notebook-7.5.0.tar.gz", hash = "sha256:3b27eaf9913033c28dde92d02139414c608992e1df4b969c843219acf2ff95e4", size = 14052074, upload-time = "2025-11-19T08:36:20.093Z" } +sdist = { url = "https://files.pythonhosted.org/packages/8a/a9/882707b0aa639e6d7d3e7df4bfbe07479d832e9a8f02d8471002a4ea6d65/notebook-7.5.1.tar.gz", hash = "sha256:b2fb4cef4d47d08c33aecce1c6c6e84be05436fbd791f88fce8df9fbca088b75", size = 14058696, upload-time = "2025-12-16T07:38:59.223Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/73/96/00df2a4760f10f5af0f45c4955573cae6189931f9a30265a35865f8c1031/notebook-7.5.0-py3-none-any.whl", hash = "sha256:3300262d52905ca271bd50b22617681d95f08a8360d099e097726e6d2efb5811", size = 14460968, upload-time = "2025-11-19T08:36:15.869Z" }, + { url = "https://files.pythonhosted.org/packages/d1/86/ca516cb58ad2cb2064124d31cf0fd8b012fca64bebeb26da2d2ddf03fc79/notebook-7.5.1-py3-none-any.whl", hash = "sha256:f4e2451c19910c33b88709b84537e11f6368c1cdff1aa0c43db701aea535dd44", size = 14468080, upload-time = "2025-12-16T07:38:55.644Z" }, ] [[package]] @@ -3508,7 +3728,7 @@ wheels = [ [[package]] name = "openai" -version = "2.11.0" +version = "2.13.0" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "anyio" }, @@ -3520,9 +3740,9 @@ dependencies = [ { name = "tqdm" }, { name = "typing-extensions" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/f4/8c/aa6aea6072f985ace9d6515046b9088ff00c157f9654da0c7b1e129d9506/openai-2.11.0.tar.gz", hash = "sha256:b3da01d92eda31524930b6ec9d7167c535e843918d7ba8a76b1c38f1104f321e", size = 624540, upload-time = "2025-12-11T19:11:58.539Z" } +sdist = { url = "https://files.pythonhosted.org/packages/0f/39/8e347e9fda125324d253084bb1b82407e5e3c7777a03dc398f79b2d95626/openai-2.13.0.tar.gz", hash = "sha256:9ff633b07a19469ec476b1e2b5b26c5ef700886524a7a72f65e6f0b5203142d5", size = 626583, 
upload-time = "2025-12-16T18:19:44.387Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/e5/f1/d9251b565fce9f8daeb45611e3e0d2f7f248429e40908dcee3b6fe1b5944/openai-2.11.0-py3-none-any.whl", hash = "sha256:21189da44d2e3d027b08c7a920ba4454b8b7d6d30ae7e64d9de11dbe946d4faa", size = 1064131, upload-time = "2025-12-11T19:11:56.816Z" }, + { url = "https://files.pythonhosted.org/packages/bb/d5/eb52edff49d3d5ea116e225538c118699ddeb7c29fa17ec28af14bc10033/openai-2.13.0-py3-none-any.whl", hash = "sha256:746521065fed68df2f9c2d85613bb50844343ea81f60009b60e6a600c9352c79", size = 1066837, upload-time = "2025-12-16T18:19:43.124Z" }, ] [[package]] @@ -3722,6 +3942,19 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/16/32/f8e3c85d1d5250232a5d3477a2a28cc291968ff175caeadaf3cc19ce0e4a/parso-0.8.5-py2.py3-none-any.whl", hash = "sha256:646204b5ee239c396d040b90f9e272e9a8017c630092bf59980beb62fd033887", size = 106668, upload-time = "2025-08-23T15:15:25.663Z" }, ] +[[package]] +name = "partd" +version = "1.4.2" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "locket" }, + { name = "toolz" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/b2/3a/3f06f34820a31257ddcabdfafc2672c5816be79c7e353b02c1f318daa7d4/partd-1.4.2.tar.gz", hash = "sha256:d022c33afbdc8405c226621b015e8067888173d85f7f5ecebb3cafed9a20f02c", size = 21029, upload-time = "2024-05-06T19:51:41.945Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/71/e7/40fb618334dcdf7c5a316c0e7343c5cd82d3d866edc100d98e29bc945ecd/partd-1.4.2-py3-none-any.whl", hash = "sha256:978e4ac767ec4ba5b86c6eaa52e5a2a3bc748a2ca839e8cc798f1cc6ce6efb0f", size = 18905, upload-time = "2024-05-06T19:51:39.271Z" }, +] + [[package]] name = "pathlib-abc" version = "0.5.2" @@ -3851,7 +4084,7 @@ wheels = [ [[package]] name = "pre-commit" -version = "4.5.0" +version = "4.5.1" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "cfgv" }, @@ -3860,9 +4093,9 @@ 
dependencies = [ { name = "pyyaml" }, { name = "virtualenv" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/f4/9b/6a4ffb4ed980519da959e1cf3122fc6cb41211daa58dbae1c73c0e519a37/pre_commit-4.5.0.tar.gz", hash = "sha256:dc5a065e932b19fc1d4c653c6939068fe54325af8e741e74e88db4d28a4dd66b", size = 198428, upload-time = "2025-11-22T21:02:42.304Z" } +sdist = { url = "https://files.pythonhosted.org/packages/40/f1/6d86a29246dfd2e9b6237f0b5823717f60cad94d47ddc26afa916d21f525/pre_commit-4.5.1.tar.gz", hash = "sha256:eb545fcff725875197837263e977ea257a402056661f09dae08e4b149b030a61", size = 198232, upload-time = "2025-12-16T21:14:33.552Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/5d/c4/b2d28e9d2edf4f1713eb3c29307f1a63f3d67cf09bdda29715a36a68921a/pre_commit-4.5.0-py2.py3-none-any.whl", hash = "sha256:25e2ce09595174d9c97860a95609f9f852c0614ba602de3561e267547f2335e1", size = 226429, upload-time = "2025-11-22T21:02:40.836Z" }, + { url = "https://files.pythonhosted.org/packages/5d/19/fd3ef348460c80af7bb4669ea7926651d1f95c23ff2df18b9d24bab4f3fa/pre_commit-4.5.1-py2.py3-none-any.whl", hash = "sha256:3b3afd891e97337708c1674210f8eba659b52a38ea5f822ff142d10786221f77", size = 226437, upload-time = "2025-12-16T21:14:32.409Z" }, ] [[package]] @@ -4038,6 +4271,21 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/2f/6a/15135b69e4fd28369433eb03264d201b1b0040ba534b05eddeb02a276684/py_rust_stemmers-0.1.5-cp312-none-win_amd64.whl", hash = "sha256:6ed61e1207f3b7428e99b5d00c055645c6415bb75033bff2d06394cbe035fd8e", size = 209395, upload-time = "2025-02-19T13:55:36.519Z" }, ] +[[package]] +name = "pyarrow" +version = "22.0.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/30/53/04a7fdc63e6056116c9ddc8b43bc28c12cdd181b85cbeadb79278475f3ae/pyarrow-22.0.0.tar.gz", hash = "sha256:3d600dc583260d845c7d8a6db540339dd883081925da2bd1c5cb808f720b3cd9", size = 1151151, upload-time = 
"2025-10-24T12:30:00.762Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/af/63/ba23862d69652f85b615ca14ad14f3bcfc5bf1b99ef3f0cd04ff93fdad5a/pyarrow-22.0.0-cp312-cp312-macosx_12_0_arm64.whl", hash = "sha256:bea79263d55c24a32b0d79c00a1c58bb2ee5f0757ed95656b01c0fb310c5af3d", size = 34211578, upload-time = "2025-10-24T10:05:21.583Z" }, + { url = "https://files.pythonhosted.org/packages/b1/d0/f9ad86fe809efd2bcc8be32032fa72e8b0d112b01ae56a053006376c5930/pyarrow-22.0.0-cp312-cp312-macosx_12_0_x86_64.whl", hash = "sha256:12fe549c9b10ac98c91cf791d2945e878875d95508e1a5d14091a7aaa66d9cf8", size = 35989906, upload-time = "2025-10-24T10:05:29.485Z" }, + { url = "https://files.pythonhosted.org/packages/b4/a8/f910afcb14630e64d673f15904ec27dd31f1e009b77033c365c84e8c1e1d/pyarrow-22.0.0-cp312-cp312-manylinux_2_28_aarch64.whl", hash = "sha256:334f900ff08ce0423407af97e6c26ad5d4e3b0763645559ece6fbf3747d6a8f5", size = 45021677, upload-time = "2025-10-24T10:05:38.274Z" }, + { url = "https://files.pythonhosted.org/packages/13/95/aec81f781c75cd10554dc17a25849c720d54feafb6f7847690478dcf5ef8/pyarrow-22.0.0-cp312-cp312-manylinux_2_28_x86_64.whl", hash = "sha256:c6c791b09c57ed76a18b03f2631753a4960eefbbca80f846da8baefc6491fcfe", size = 47726315, upload-time = "2025-10-24T10:05:47.314Z" }, + { url = "https://files.pythonhosted.org/packages/bb/d4/74ac9f7a54cfde12ee42734ea25d5a3c9a45db78f9def949307a92720d37/pyarrow-22.0.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:c3200cb41cdbc65156e5f8c908d739b0dfed57e890329413da2748d1a2cd1a4e", size = 47990906, upload-time = "2025-10-24T10:05:58.254Z" }, + { url = "https://files.pythonhosted.org/packages/2e/71/fedf2499bf7a95062eafc989ace56572f3343432570e1c54e6599d5b88da/pyarrow-22.0.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:ac93252226cf288753d8b46280f4edf3433bf9508b6977f8dd8526b521a1bbb9", size = 50306783, upload-time = "2025-10-24T10:06:08.08Z" }, + { url = 
"https://files.pythonhosted.org/packages/68/ed/b202abd5a5b78f519722f3d29063dda03c114711093c1995a33b8e2e0f4b/pyarrow-22.0.0-cp312-cp312-win_amd64.whl", hash = "sha256:44729980b6c50a5f2bfcc2668d36c569ce17f8b17bccaf470c4313dcbbf13c9d", size = 27972883, upload-time = "2025-10-24T10:06:14.204Z" }, +] + [[package]] name = "pyasn1" version = "0.6.1" @@ -4294,6 +4542,22 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/35/76/c34426d532e4dce7ff36e4d92cb20f4cbbd94b619964b93d24e8f5b5510f/pynacl-1.6.1-cp38-abi3-win_arm64.whl", hash = "sha256:5953e8b8cfadb10889a6e7bd0f53041a745d1b3d30111386a1bb37af171e6daf", size = 183970, upload-time = "2025-11-10T16:02:05.786Z" }, ] +[[package]] +name = "pynndescent" +version = "0.5.13" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "joblib" }, + { name = "llvmlite" }, + { name = "numba" }, + { name = "scikit-learn" }, + { name = "scipy" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/7e/58/560a4db5eb3794d922fe55804b10326534ded3d971e1933c1eef91193f5e/pynndescent-0.5.13.tar.gz", hash = "sha256:d74254c0ee0a1eeec84597d5fe89fedcf778593eeabe32c2f97412934a9800fb", size = 2975955, upload-time = "2024-06-17T15:48:32.914Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/d2/53/d23a97e0a2c690d40b165d1062e2c4ccc796be458a1ce59f6ba030434663/pynndescent-0.5.13-py3-none-any.whl", hash = "sha256:69aabb8f394bc631b6ac475a1c7f3994c54adf3f51cd63b2730fefba5771b949", size = 56850, upload-time = "2024-06-17T15:48:31.184Z" }, +] + [[package]] name = "pyparsing" version = "3.2.5" @@ -4906,6 +5170,61 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/5d/e6/ec8471c8072382cb91233ba7267fd931219753bb43814cbc71757bfd4dab/safetensors-0.7.0-cp38-abi3-win_amd64.whl", hash = "sha256:d1239932053f56f3456f32eb9625590cc7582e905021f94636202a864d470755", size = 341380, upload-time = "2025-11-19T15:18:44.427Z" }, ] +[[package]] +name = "scanpy" +version = "1.11.5" +source = { registry = 
"https://pypi.org/simple" } +dependencies = [ + { name = "anndata" }, + { name = "h5py" }, + { name = "joblib" }, + { name = "legacy-api-wrap" }, + { name = "matplotlib" }, + { name = "natsort" }, + { name = "networkx" }, + { name = "numba" }, + { name = "numpy" }, + { name = "packaging" }, + { name = "pandas" }, + { name = "patsy" }, + { name = "pynndescent" }, + { name = "scikit-learn" }, + { name = "scipy" }, + { name = "seaborn" }, + { name = "session-info2" }, + { name = "statsmodels" }, + { name = "tqdm" }, + { name = "typing-extensions" }, + { name = "umap-learn" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/d2/a8/285f1a9c995906b7e0ae3c399208fe67cfba8126dd31359dfef0908f6edc/scanpy-1.11.5.tar.gz", hash = "sha256:b2ef5476dfb1144b7dd0fae90b0198699c7988e6b27f083904150642c7ba6b89", size = 14088122, upload-time = "2025-10-21T08:24:43.999Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/32/e9/c1d43543da87cd27e8e2a74db85cf0b6c5cff2d5f04a86bd584d2fbc2bb0/scanpy-1.11.5-py3-none-any.whl", hash = "sha256:fcd383ddcf7acbf7c0ca232c25ad51b00aec9f8d2f7c8954b8c6ee0962257166", size = 2097836, upload-time = "2025-10-21T08:24:41.741Z" }, +] + +[[package]] +name = "scikit-image" +version = "0.25.2" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "imageio" }, + { name = "lazy-loader" }, + { name = "networkx" }, + { name = "numpy" }, + { name = "packaging" }, + { name = "pillow" }, + { name = "scipy" }, + { name = "tifffile" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/c7/a8/3c0f256012b93dd2cb6fda9245e9f4bff7dc0486880b248005f15ea2255e/scikit_image-0.25.2.tar.gz", hash = "sha256:e5a37e6cd4d0c018a7a55b9d601357e3382826d3888c10d0213fc63bff977dde", size = 22693594, upload-time = "2025-02-18T18:05:24.538Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/35/8c/5df82881284459f6eec796a5ac2a0a304bb3384eec2e73f35cfdfcfbf20c/scikit_image-0.25.2-cp312-cp312-macosx_10_13_x86_64.whl", hash = 
"sha256:8db8dd03663112783221bf01ccfc9512d1cc50ac9b5b0fe8f4023967564719fb", size = 13986000, upload-time = "2025-02-18T18:04:47.156Z" }, + { url = "https://files.pythonhosted.org/packages/ce/e6/93bebe1abcdce9513ffec01d8af02528b4c41fb3c1e46336d70b9ed4ef0d/scikit_image-0.25.2-cp312-cp312-macosx_12_0_arm64.whl", hash = "sha256:483bd8cc10c3d8a7a37fae36dfa5b21e239bd4ee121d91cad1f81bba10cfb0ed", size = 13235893, upload-time = "2025-02-18T18:04:51.049Z" }, + { url = "https://files.pythonhosted.org/packages/53/4b/eda616e33f67129e5979a9eb33c710013caa3aa8a921991e6cc0b22cea33/scikit_image-0.25.2-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:9d1e80107bcf2bf1291acfc0bf0425dceb8890abe9f38d8e94e23497cbf7ee0d", size = 14178389, upload-time = "2025-02-18T18:04:54.245Z" }, + { url = "https://files.pythonhosted.org/packages/6b/b5/b75527c0f9532dd8a93e8e7cd8e62e547b9f207d4c11e24f0006e8646b36/scikit_image-0.25.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:a17e17eb8562660cc0d31bb55643a4da996a81944b82c54805c91b3fe66f4824", size = 15003435, upload-time = "2025-02-18T18:04:57.586Z" }, + { url = "https://files.pythonhosted.org/packages/34/e3/49beb08ebccda3c21e871b607c1cb2f258c3fa0d2f609fed0a5ba741b92d/scikit_image-0.25.2-cp312-cp312-win_amd64.whl", hash = "sha256:bdd2b8c1de0849964dbc54037f36b4e9420157e67e45a8709a80d727f52c7da2", size = 12899474, upload-time = "2025-02-18T18:05:01.166Z" }, +] + [[package]] name = "scikit-learn" version = "1.8.0" @@ -4926,6 +5245,21 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/24/90/344a67811cfd561d7335c1b96ca21455e7e472d281c3c279c4d3f2300236/scikit_learn-1.8.0-cp312-cp312-win_arm64.whl", hash = "sha256:8c497fff237d7b4e07e9ef1a640887fa4fb765647f86fbe00f969ff6280ce2bb", size = 7641898, upload-time = "2025-12-10T07:08:01.36Z" }, ] +[[package]] +name = "scikit-misc" +version = "0.5.2" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "numpy" }, +] +sdist 
= { url = "https://files.pythonhosted.org/packages/02/71/d6d2d1710fb56473817b0520212d33874069952dcb417614f6dd24efb51e/scikit_misc-0.5.2.tar.gz", hash = "sha256:49fa30e4051b341edc7422db66a12c0f59d468729285bfe644d10924dc51be0a", size = 298626, upload-time = "2025-11-03T11:56:30.08Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/56/43/1daca03447aa3bb80e4ca604fac647fc9ce926d928102a2aab9a9426ef18/scikit_misc-0.5.2-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:36f33c33494bea53196e68ba165a03cc0af19d09f83585adb0dc469d62dff0b7", size = 162933, upload-time = "2025-11-03T11:56:16.738Z" }, + { url = "https://files.pythonhosted.org/packages/59/48/5a486b3a9cff8cd8abc0bdc21a1a23f9c5b73962ef6e66a502b7636fad08/scikit_misc-0.5.2-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:efc64474adcec7fc373b13519db19682ae1e75fbed0da044efce1ae232a6bb01", size = 150855, upload-time = "2025-11-03T11:56:17.895Z" }, + { url = "https://files.pythonhosted.org/packages/6a/7e/f003fd232ec3c3e29ae565e38536dbdef417c76f7c29a67203e05b800f44/scikit_misc-0.5.2-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:cd5a6e06864b07e9fe18c2bac756163e87f26615e5ddaa5f6129fd62535b7cfb", size = 182978, upload-time = "2025-11-03T11:56:19.104Z" }, + { url = "https://files.pythonhosted.org/packages/46/35/fe7a3074c1453b2b8cd259d1797fc5146d2383603f9ac838c92bc0bca148/scikit_misc-0.5.2-cp312-cp312-win_amd64.whl", hash = "sha256:4e46fd2e8c46625d1e69ea7fa6f4544d73203387e2601f2bbce82ff0a086ada1", size = 150692, upload-time = "2025-11-03T11:56:20.286Z" }, +] + [[package]] name = "scipy" version = "1.16.3" @@ -4947,6 +5281,27 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/ce/69/c5c7807fd007dad4f48e0a5f2153038dc96e8725d3345b9ee31b2b7bed46/scipy-1.16.3-cp312-cp312-win_arm64.whl", hash = "sha256:a8a26c78ef223d3e30920ef759e25625a0ecdd0d60e5a8818b7513c3e5384cf2", size = 25463014, upload-time = "2025-10-28T17:33:25.975Z" }, ] +[[package]] +name = "scrublet" +version = 
"0.2.3" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "annoy" }, + { name = "cython" }, + { name = "matplotlib" }, + { name = "numba" }, + { name = "numpy" }, + { name = "pandas" }, + { name = "scikit-image" }, + { name = "scikit-learn" }, + { name = "scipy" }, + { name = "umap-learn" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/bf/f8/52cecc93d2ac7b7ffe53662b60c34b2ad7f97eed7360e3d264080f8b1608/scrublet-0.2.3.tar.gz", hash = "sha256:2185f63070290267f82a36e5b4cae8c321f10415d2d0c9f7e5e97b1126bf653a", size = 15331, upload-time = "2020-12-29T03:02:03.561Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/21/74/82308f7bdcbda730b772a6d1afb6f55b9706601032126c4359afb3fb8986/scrublet-0.2.3-py3-none-any.whl", hash = "sha256:92b8a0206fc710b397c8dd535ac75d26242dea0976d8aa632e3765438b60478a", size = 15491, upload-time = "2020-12-29T03:02:02.62Z" }, +] + [[package]] name = "seaborn" version = "0.13.2" @@ -5006,6 +5361,15 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/49/65/dea992c6a97074f6d8ff9eab34741298cac2ce23e2b6c74fb7d08afdf85c/sentinels-1.1.1-py3-none-any.whl", hash = "sha256:835d3b28f3b47f5284afa4bf2db6e00f2dc5f80f9923d4b7e7aeeeccf6146a11", size = 3744, upload-time = "2025-08-12T07:57:48.858Z" }, ] +[[package]] +name = "session-info2" +version = "0.2.3" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/45/4f/6333d79d97ccfea6d2199b7e666f8c53c5a31b64968c948a750a0b5c748a/session_info2-0.2.3.tar.gz", hash = "sha256:6d16e3c6bb72ea52e589da4d722d24798aa3511c34ab8446a131d655cba2e2c9", size = 23859, upload-time = "2025-10-09T12:51:28.07Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/9d/b7/7d4c95c7b8525dabea23c548a1bb068d7a61635d544e8c92c51e784dad63/session_info2-0.2.3-py3-none-any.whl", hash = "sha256:f211d9930f73b485b727b6c4d8b964fa1b634351b3079393738f42be9b4c7f5e", size = 16347, upload-time = 
"2025-10-09T12:51:26.413Z" }, +] + [[package]] name = "setuptools" version = "80.9.0" @@ -5450,6 +5814,15 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/6a/9e/2064975477fdc887e47ad42157e214526dcad8f317a948dee17e1659a62f/terminado-0.18.1-py3-none-any.whl", hash = "sha256:a4468e1b37bb318f8a86514f65814e1afc977cf29b3992a4500d9dd305dcceb0", size = 14154, upload-time = "2024-03-12T14:34:36.569Z" }, ] +[[package]] +name = "texttable" +version = "1.7.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/1c/dc/0aff23d6036a4d3bf4f1d8c8204c5c79c4437e25e0ae94ffe4bbb55ee3c2/texttable-1.7.0.tar.gz", hash = "sha256:2d2068fb55115807d3ac77a4ca68fa48803e84ebb0ee2340f858107a36522638", size = 12831, upload-time = "2023-10-03T09:48:12.272Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/24/99/4772b8e00a136f3e01236de33b0efda31ee7077203ba5967fcc76da94d65/texttable-1.7.0-py2.py3-none-any.whl", hash = "sha256:72227d592c82b3d7f672731ae73e4d1f88cd8e2ef5b075a7a7f01a23a3743917", size = 10768, upload-time = "2023-10-03T09:48:10.434Z" }, +] + [[package]] name = "thinc" version = "8.3.10" @@ -5489,6 +5862,18 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/32/d5/f9a850d79b0851d1d4ef6456097579a9005b31fea68726a4ae5f2d82ddd9/threadpoolctl-3.6.0-py3-none-any.whl", hash = "sha256:43a0b8fd5a2928500110039e43a5eed8480b918967083ea48dc3ab9f13c4a7fb", size = 18638, upload-time = "2025-03-13T13:49:21.846Z" }, ] +[[package]] +name = "tifffile" +version = "2025.12.12" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "numpy" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/31/b9/4253513a66f0a836ec3a5104266cf73f7812bfbbcda9d87d8c0e93b28293/tifffile-2025.12.12.tar.gz", hash = "sha256:97e11fd6b1d8dc971896a098c841d9cd4e6eb958ac040dd6fb8b332c3f7288b6", size = 373597, upload-time = "2025-12-13T03:42:53.765Z" } +wheels = [ + { url = 
"https://files.pythonhosted.org/packages/d5/5c/e444e1b024a519e488326525f0c154396c6b16baff17e00623f2c21dfc42/tifffile-2025.12.12-py3-none-any.whl", hash = "sha256:e3e3f1290ec6741ca248a5b5a997125209b5c2962f6bd9aef01ea9352c25d0ee", size = 232132, upload-time = "2025-12-13T03:42:52.072Z" }, +] + [[package]] name = "tinycss2" version = "1.4.0" @@ -5543,6 +5928,15 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/77/b8/0135fadc89e73be292b473cb820b4f5a08197779206b33191e801feeae40/tomli-2.3.0-py3-none-any.whl", hash = "sha256:e95b1af3c5b07d9e643909b5abbec77cd9f1217e6d0bca72b0234736b9fb1f1b", size = 14408, upload-time = "2025-10-08T22:01:46.04Z" }, ] +[[package]] +name = "toolz" +version = "1.1.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/11/d6/114b492226588d6ff54579d95847662fc69196bdeec318eb45393b24c192/toolz-1.1.0.tar.gz", hash = "sha256:27a5c770d068c110d9ed9323f24f1543e83b2f300a687b7891c1a6d56b697b5b", size = 52613, upload-time = "2025-10-17T04:03:21.661Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/fb/12/5911ae3eeec47800503a238d971e51722ccea5feb8569b735184d5fcdbc0/toolz-1.1.0-py3-none-any.whl", hash = "sha256:15ccc861ac51c53696de0a5d6d4607f99c210739caf987b5d2054f3efed429d8", size = 58093, upload-time = "2025-10-17T04:03:20.435Z" }, +] + [[package]] name = "torch" version = "2.9.1" @@ -5581,21 +5975,21 @@ wheels = [ [[package]] name = "tornado" -version = "6.5.3" +version = "6.5.4" source = { registry = "https://pypi.org/simple" } -sdist = { url = "https://files.pythonhosted.org/packages/7f/2e/3d22d478f27cb4b41edd4db7f10cd7846d0a28ea443342de3dba97035166/tornado-6.5.3.tar.gz", hash = "sha256:16abdeb0211796ffc73765bc0a20119712d68afeeaf93d1a3f2edf6b3aee8d5a", size = 513348, upload-time = "2025-12-11T04:16:42.225Z" } +sdist = { url = "https://files.pythonhosted.org/packages/37/1d/0a336abf618272d53f62ebe274f712e213f5a03c0b2339575430b8362ef2/tornado-6.5.4.tar.gz", hash = 
"sha256:a22fa9047405d03260b483980635f0b041989d8bcc9a313f8fe18b411d84b1d7", size = 513632, upload-time = "2025-12-15T19:21:03.836Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/d3/e9/bf22f66e1d5d112c0617974b5ce86666683b32c09b355dfcd59f8d5c8ef6/tornado-6.5.3-cp39-abi3-macosx_10_9_universal2.whl", hash = "sha256:2dd7d7e8d3e4635447a8afd4987951e3d4e8d1fb9ad1908c54c4002aabab0520", size = 443860, upload-time = "2025-12-11T04:16:26.638Z" }, - { url = "https://files.pythonhosted.org/packages/ca/9c/594b631f0b8dc5977080c7093d1e96f1377c10552577d2c31bb0208c9362/tornado-6.5.3-cp39-abi3-macosx_10_9_x86_64.whl", hash = "sha256:5977a396f83496657779f59a48c38096ef01edfe4f42f1c0634b791dde8165d0", size = 442118, upload-time = "2025-12-11T04:16:28.32Z" }, - { url = "https://files.pythonhosted.org/packages/78/f6/685b869f5b5b9d9547571be838c6106172082751696355b60fc32a4988ed/tornado-6.5.3-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:f72ac800be2ac73ddc1504f7aa21069a4137e8d70c387172c063d363d04f2208", size = 445700, upload-time = "2025-12-11T04:16:29.64Z" }, - { url = "https://files.pythonhosted.org/packages/91/4c/f0d19edf24912b7f21ae5e941f7798d132ad4d9b71441c1e70917a297265/tornado-6.5.3-cp39-abi3-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:c43c4fc4f5419c6561cfb8b884a8f6db7b142787d47821e1a0e1296253458265", size = 445041, upload-time = "2025-12-11T04:16:30.799Z" }, - { url = "https://files.pythonhosted.org/packages/eb/2b/e02da94f4a4aef2bb3b923c838ef284a77548a5f06bac2a8682b36b4eead/tornado-6.5.3-cp39-abi3-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:de8b3fed4b3afb65d542d7702ac8767b567e240f6a43020be8eaef59328f117b", size = 445270, upload-time = "2025-12-11T04:16:32.316Z" }, - { url = "https://files.pythonhosted.org/packages/58/e2/7a7535d23133443552719dba526dacbb7415f980157da9f14950ddb88ad6/tornado-6.5.3-cp39-abi3-musllinux_1_2_aarch64.whl", 
hash = "sha256:dbc4b4c32245b952566e17a20d5c1648fbed0e16aec3fc7e19f3974b36e0e47c", size = 445957, upload-time = "2025-12-11T04:16:33.913Z" }, - { url = "https://files.pythonhosted.org/packages/a0/1f/9ff92eca81ff17a86286ec440dcd5eab0400326eb81761aa9a4eecb1ffb9/tornado-6.5.3-cp39-abi3-musllinux_1_2_i686.whl", hash = "sha256:db238e8a174b4bfd0d0238b8cfcff1c14aebb4e2fcdafbf0ea5da3b81caceb4c", size = 445371, upload-time = "2025-12-11T04:16:35.093Z" }, - { url = "https://files.pythonhosted.org/packages/70/b1/1d03ae4526a393b0b839472a844397337f03c7f3a1e6b5c82241f0e18281/tornado-6.5.3-cp39-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:892595c100cd9b53a768cbfc109dfc55dec884afe2de5290611a566078d9692d", size = 445348, upload-time = "2025-12-11T04:16:36.679Z" }, - { url = "https://files.pythonhosted.org/packages/4b/7d/7c181feadc8941f418d0d26c3790ee34ffa4bd0a294bc5201d44ebd19c1e/tornado-6.5.3-cp39-abi3-win32.whl", hash = "sha256:88141456525fe291e47bbe1ba3ffb7982549329f09b4299a56813923af2bd197", size = 446433, upload-time = "2025-12-11T04:16:38.332Z" }, - { url = "https://files.pythonhosted.org/packages/34/98/4f7f938606e21d0baea8c6c39a7c8e95bdf8e50b0595b1bb6f0de2af7a6e/tornado-6.5.3-cp39-abi3-win_amd64.whl", hash = "sha256:ba4b513d221cc7f795a532c1e296f36bcf6a60e54b15efd3f092889458c69af1", size = 446842, upload-time = "2025-12-11T04:16:39.867Z" }, - { url = "https://files.pythonhosted.org/packages/7a/27/0e3fca4c4edf33fb6ee079e784c63961cd816971a45e5e4cacebe794158d/tornado-6.5.3-cp39-abi3-win_arm64.whl", hash = "sha256:278c54d262911365075dd45e0b6314308c74badd6ff9a54490e7daccdd5ed0ea", size = 445863, upload-time = "2025-12-11T04:16:41.099Z" }, + { url = "https://files.pythonhosted.org/packages/ab/a9/e94a9d5224107d7ce3cc1fab8d5dc97f5ea351ccc6322ee4fb661da94e35/tornado-6.5.4-cp39-abi3-macosx_10_9_universal2.whl", hash = "sha256:d6241c1a16b1c9e4cc28148b1cda97dd1c6cb4fb7068ac1bedc610768dff0ba9", size = 443909, upload-time = "2025-12-15T19:20:48.382Z" }, + { url = 
"https://files.pythonhosted.org/packages/db/7e/f7b8d8c4453f305a51f80dbb49014257bb7d28ccb4bbb8dd328ea995ecad/tornado-6.5.4-cp39-abi3-macosx_10_9_x86_64.whl", hash = "sha256:2d50f63dda1d2cac3ae1fa23d254e16b5e38153758470e9956cbc3d813d40843", size = 442163, upload-time = "2025-12-15T19:20:49.791Z" }, + { url = "https://files.pythonhosted.org/packages/ba/b5/206f82d51e1bfa940ba366a8d2f83904b15942c45a78dd978b599870ab44/tornado-6.5.4-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:d1cf66105dc6acb5af613c054955b8137e34a03698aa53272dbda4afe252be17", size = 445746, upload-time = "2025-12-15T19:20:51.491Z" }, + { url = "https://files.pythonhosted.org/packages/8e/9d/1a3338e0bd30ada6ad4356c13a0a6c35fbc859063fa7eddb309183364ac1/tornado-6.5.4-cp39-abi3-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:50ff0a58b0dc97939d29da29cd624da010e7f804746621c78d14b80238669335", size = 445083, upload-time = "2025-12-15T19:20:52.778Z" }, + { url = "https://files.pythonhosted.org/packages/50/d4/e51d52047e7eb9a582da59f32125d17c0482d065afd5d3bc435ff2120dc5/tornado-6.5.4-cp39-abi3-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:e5fb5e04efa54cf0baabdd10061eb4148e0be137166146fff835745f59ab9f7f", size = 445315, upload-time = "2025-12-15T19:20:53.996Z" }, + { url = "https://files.pythonhosted.org/packages/27/07/2273972f69ca63dbc139694a3fc4684edec3ea3f9efabf77ed32483b875c/tornado-6.5.4-cp39-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:9c86b1643b33a4cd415f8d0fe53045f913bf07b4a3ef646b735a6a86047dda84", size = 446003, upload-time = "2025-12-15T19:20:56.101Z" }, + { url = "https://files.pythonhosted.org/packages/d1/83/41c52e47502bf7260044413b6770d1a48dda2f0246f95ee1384a3cd9c44a/tornado-6.5.4-cp39-abi3-musllinux_1_2_i686.whl", hash = "sha256:6eb82872335a53dd063a4f10917b3efd28270b56a33db69009606a0312660a6f", size = 445412, upload-time = "2025-12-15T19:20:57.398Z" }, + { url = 
"https://files.pythonhosted.org/packages/10/c7/bc96917f06cbee182d44735d4ecde9c432e25b84f4c2086143013e7b9e52/tornado-6.5.4-cp39-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:6076d5dda368c9328ff41ab5d9dd3608e695e8225d1cd0fd1e006f05da3635a8", size = 445392, upload-time = "2025-12-15T19:20:58.692Z" }, + { url = "https://files.pythonhosted.org/packages/0c/1a/d7592328d037d36f2d2462f4bc1fbb383eec9278bc786c1b111cbbd44cfa/tornado-6.5.4-cp39-abi3-win32.whl", hash = "sha256:1768110f2411d5cd281bac0a090f707223ce77fd110424361092859e089b38d1", size = 446481, upload-time = "2025-12-15T19:21:00.008Z" }, + { url = "https://files.pythonhosted.org/packages/d6/6d/c69be695a0a64fd37a97db12355a035a6d90f79067a3cf936ec2b1dc38cd/tornado-6.5.4-cp39-abi3-win_amd64.whl", hash = "sha256:fa07d31e0cd85c60713f2b995da613588aa03e1303d75705dca6af8babc18ddc", size = 446886, upload-time = "2025-12-15T19:21:01.287Z" }, + { url = "https://files.pythonhosted.org/packages/50/49/8dc3fd90902f70084bd2cd059d576ddb4f8bb44c2c7c0e33a11422acb17e/tornado-6.5.4-cp39-abi3-win_arm64.whl", hash = "sha256:053e6e16701eb6cbe641f308f4c1a9541f91b6261991160391bfc342e8a551a1", size = 445910, upload-time = "2025-12-15T19:21:02.571Z" }, ] [[package]] @@ -5699,11 +6093,11 @@ wheels = [ [[package]] name = "tzdata" -version = "2025.2" +version = "2025.3" source = { registry = "https://pypi.org/simple" } -sdist = { url = "https://files.pythonhosted.org/packages/95/32/1a225d6164441be760d75c2c42e2780dc0873fe382da3e98a2e1e48361e5/tzdata-2025.2.tar.gz", hash = "sha256:b60a638fcc0daffadf82fe0f57e53d06bdec2f36c4df66280ae79bce6bd6f2b9", size = 196380, upload-time = "2025-03-23T13:54:43.652Z" } +sdist = { url = "https://files.pythonhosted.org/packages/5e/a7/c202b344c5ca7daf398f3b8a477eeb205cf3b6f32e7ec3a6bac0629ca975/tzdata-2025.3.tar.gz", hash = "sha256:de39c2ca5dc7b0344f2eba86f49d614019d29f060fc4ebc8a417896a620b56a7", size = 196772, upload-time = "2025-12-13T17:45:35.667Z" } wheels = [ - { url = 
"https://files.pythonhosted.org/packages/5c/23/c7abc0ca0a1526a0774eca151daeb8de62ec457e77262b66b359c3c7679e/tzdata-2025.2-py2.py3-none-any.whl", hash = "sha256:1a403fada01ff9221ca8044d701868fa132215d84beb92242d9acd2147f667a8", size = 347839, upload-time = "2025-03-23T13:54:41.845Z" }, + { url = "https://files.pythonhosted.org/packages/c7/b0/003792df09decd6849a5e39c28b513c06e84436a54440380862b5aeff25d/tzdata-2025.3-py2.py3-none-any.whl", hash = "sha256:06a47e5700f3081aab02b2e513160914ff0694bce9947d6b76ebd6bf57cfc5d1", size = 348521, upload-time = "2025-12-13T17:45:33.889Z" }, ] [[package]] @@ -5718,6 +6112,23 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/c2/14/e2a54fabd4f08cd7af1c07030603c3356b74da07f7cc056e600436edfa17/tzlocal-5.3.1-py3-none-any.whl", hash = "sha256:eb1a66c3ef5847adf7a834f1be0800581b683b5608e74f86ecbcef8ab91bb85d", size = 18026, upload-time = "2025-03-05T21:17:39.857Z" }, ] +[[package]] +name = "umap-learn" +version = "0.5.9.post2" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "numba" }, + { name = "numpy" }, + { name = "pynndescent" }, + { name = "scikit-learn" }, + { name = "scipy" }, + { name = "tqdm" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/5f/ee/6bc65bd375c812026a7af63fe9d09d409382120aff25f2152f1ba12af5ec/umap_learn-0.5.9.post2.tar.gz", hash = "sha256:bdf60462d779bd074ce177a0714ced17e6d161285590fa487f3f9548dd3c31c9", size = 95441, upload-time = "2025-07-03T00:18:02.479Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/6b/b1/c24deeda9baf1fd491aaad941ed89e0fed6c583a117fd7b79e0a33a1e6c0/umap_learn-0.5.9.post2-py3-none-any.whl", hash = "sha256:fbe51166561e0e7fab00ef3d516ac2621243b8d15cf4bef9f656d701736b16a0", size = 90146, upload-time = "2025-07-03T00:18:01.042Z" }, +] + [[package]] name = "universal-pathlib" version = "0.3.7" @@ -5980,6 +6391,20 @@ wheels = [ { url = 
"https://files.pythonhosted.org/packages/15/d1/b51471c11592ff9c012bd3e2f7334a6ff2f42a7aed2caffcf0bdddc9cb89/wrapt-2.0.1-py3-none-any.whl", hash = "sha256:4d2ce1bf1a48c5277d7969259232b57645aae5686dba1eaeade39442277afbca", size = 44046, upload-time = "2025-11-07T00:45:32.116Z" }, ] +[[package]] +name = "xarray" +version = "2025.12.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "numpy" }, + { name = "packaging" }, + { name = "pandas" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/d3/af/7b945f331ba8911fdfff2fdfa092763156119f124be1ba4144615c540222/xarray-2025.12.0.tar.gz", hash = "sha256:73f6a6fadccc69c4d45bdd70821a47c72de078a8a0313ff8b1e97cd54ac59fed", size = 3082244, upload-time = "2025-12-05T21:51:22.432Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/d5/e4/62a677feefde05b12a70a4fc9bdc8558010182a801fbcab68cb56c2b0986/xarray-2025.12.0-py3-none-any.whl", hash = "sha256:9e77e820474dbbe4c6c2954d0da6342aa484e33adaa96ab916b15a786181e970", size = 1381742, upload-time = "2025-12-05T21:51:20.841Z" }, +] + [[package]] name = "yarl" version = "1.22.0" From 87ac49d7a304757c8b2e07c49138aef3880255cc Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Fri, 19 Dec 2025 12:53:51 -0800 Subject: [PATCH 26/87] initial add --- CLAUDE.md | 82 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 82 insertions(+) create mode 100644 CLAUDE.md diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 0000000..7512e7a --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,82 @@ +# CLAUDE.md + +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. + +## Project Overview + +This is an open-source book on building better code for science using AI, authored by Russell Poldrack. The rendered book is published at https://poldrack.github.io/BetterCodeBetterScience/. 
+ +## Build Commands + +```bash +# Install dependencies (uses uv package manager) +uv pip install -r pyproject.toml +uv pip install -e . + +# Build book as HTML and serve locally +myst build --html +npx serve _build/html + +# Build PDF (requires LaTeX) +jupyter-book build book/ --builder pdflatex + +# Clean build artifacts +rm -rf book/_build +``` + +## Testing + +```bash +# Run all tests +pytest + +# Run tests with coverage +pytest --cov=src/BetterCodeBetterScience --cov-report term-missing + +# Run specific test modules +pytest tests/textmining/ +pytest tests/property_based_testing/ +pytest tests/narps/ + +# Run tests with specific markers +pytest -m unit +pytest -m integration +``` + +Test markers defined in pyproject.toml: `unit` and `integration`. + +## Linting and Code Quality + +```bash +# Spell checking (configured in pyproject.toml) +codespell + +# Python linting and formatting +ruff check . +ruff format . + +# Pre-commit hooks (runs codespell) +pre-commit run --all-files +``` + +## Project Structure + +- `book/` - MyST markdown chapters (configured in myst.yml) +- `src/BetterCodeBetterScience/` - Example Python code referenced in book chapters +- `tests/` - Test examples demonstrating testing concepts from the book +- `data/` - Data files for examples +- `scripts/` - Utility scripts +- `_build/` - Build output (gitignored) + +## Key Configuration Files + +- `myst.yml` - MyST book configuration (table of contents, exports, site settings) +- `pyproject.toml` - Python dependencies, pytest config, codespell settings +- `.pre-commit-config.yaml` - Pre-commit hooks (codespell) + +## Contribution Guidelines + +- New text should be authored by a human (AI may be used to check/improve text) +- Code examples should follow PEP8 +- Avoid introducing new dependencies when possible +- Custom words for codespell are in `project-words.txt` From 0541df3198752d857b3f2c97002b647f6e1c25e1 Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Sat, 20 Dec 2025 12:23:10 -0800 
Subject: [PATCH 27/87] initial add --- .../rnaseq/immune_scrnaseq_1_dataprep.ipynb | 406 ++++ .../rnaseq/immune_scrnaseq_2_preprocess.ipynb | 1984 +++++++++++++++++ 2 files changed, 2390 insertions(+) create mode 100644 src/BetterCodeBetterScience/rnaseq/immune_scrnaseq_1_dataprep.ipynb create mode 100644 src/BetterCodeBetterScience/rnaseq/immune_scrnaseq_2_preprocess.ipynb diff --git a/src/BetterCodeBetterScience/rnaseq/immune_scrnaseq_1_dataprep.ipynb b/src/BetterCodeBetterScience/rnaseq/immune_scrnaseq_1_dataprep.ipynb new file mode 100644 index 0000000..3fbe4cb --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/immune_scrnaseq_1_dataprep.ipynb @@ -0,0 +1,406 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "b317a9e1", + "metadata": {}, + "source": [ + "### Immune system gene expression and aging\n", + "\n", + "We will use a dataset distributed by the [OneK1K](https://onek1k.org/) project, which includes single-cell RNA-seq data from peripheral blood mononuclear cells (PBMCs) obtained from 982 donors, comprising more than 1.2 million cells in total. These data are released under a Creative Commons Zero Public Domain Dedication and are thus free to reuse, with the restriction that users agree not to attempt to reidentify the participants. \n", + "\n", + "The flagship paper for this study is:\n", + "\n", + "Yazar S., Alquicira-Hernández J., Wing K., Senabouth A., Gordon G., Andersen S., Lu Q., Rowson A., Taylor T., Clarke L., Maccora L., Chen C., Cook A., Ye J., Fairfax K., Hewitt A., Powell J. Single cell eQTL mapping identified cell type specific control of autoimmune disease. Science, 376, 6589 (2022)\n", + "\n", + "We will use the data to ask a simple question: how does gene expression in PBMCs change with age?" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 29, + "id": "0d3385e0", + "metadata": {}, + "outputs": [], + "source": [ + "import anndata as ad\n", + "from anndata.experimental import read_lazy\n", + "import dask.array as da\n", + "import h5py\n", + "import numpy as np\n", + "import scanpy as sc\n", + "from pathlib import Path\n", + "import os\n", + "\n", + "datadir = Path('/Users/poldrack/data_unsynced/BCBS/immune_aging/')" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "id": "7b67c0b6", + "metadata": {}, + "outputs": [], + "source": [ + "datafile = datadir / 'a3f5651f-cd1a-4d26-8165-74964b79b4f2.h5ad'\n", + "url = 'https://datasets.cellxgene.cziscience.com/a3f5651f-cd1a-4d26-8165-74964b79b4f2.h5ad'\n", + "dataset_name = 'OneK1K'\n", + "\n", + "if not datafile.exists():\n", + " cmd = f'wget -O {datafile.as_posix()} {url}'\n", + " print(f'Downloading data from {url} to {datafile.as_posix()}')\n", + " os.system(cmd)\n", + "\n", + "load_annotation_index = True\n", + "adata = read_lazy(h5py.File(datafile, 'r'),\n", + " load_annotation_index=load_annotation_index)" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "id": "40d53939", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "AnnData object with n_obs × n_vars = 1248980 × 35528\n", + " obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'percent.mt', 'donor_id', 'pool_number', 'predicted.celltype.l2', 'predicted.celltype.l2.score', 'age', 'tissue_ontology_term_id', 'assay_ontology_term_id', 'disease_ontology_term_id', 'cell_type_ontology_term_id', 'self_reported_ethnicity_ontology_term_id', 'development_stage_ontology_term_id', 'sex_ontology_term_id', 'is_primary_data', 'suspension_type', 'tissue_type', 'cell_type', 'assay', 'disease', 'sex', 'tissue', 'self_reported_ethnicity', 'development_stage', 'observation_joinid'\n", + " var: 'vst.mean', 'vst.variance', 'vst.variance.expected', 'vst.variance.standardized', 'vst.variable', 
'feature_is_filtered', 'feature_name', 'feature_reference', 'feature_biotype', 'feature_length', 'feature_type'\n", + " uns: 'cell_type_ontology_term_id_colors', 'citation', 'default_embedding', 'organism', 'organism_ontology_term_id', 'schema_reference', 'schema_version', 'title'\n", + " obsm: 'X_azimuth_spca', 'X_azimuth_umap', 'X_harmony', 'X_pca', 'X_umap'\n", + " varm: 'PCs'\n" + ] + } + ], + "source": [ + "print(adata)" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "id": "7eeca179", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "['CD14-low, CD16-positive monocyte' 'CD14-positive monocyte'\n", + " 'CD16-negative, CD56-bright natural killer cell, human'\n", + " 'CD4-positive, alpha-beta T cell'\n", + " 'CD4-positive, alpha-beta cytotoxic T cell'\n", + " 'CD8-positive, alpha-beta T cell'\n", + " 'central memory CD4-positive, alpha-beta T cell'\n", + " 'central memory CD8-positive, alpha-beta T cell'\n", + " 'conventional dendritic cell' 'dendritic cell'\n", + " 'double negative thymocyte'\n", + " 'effector memory CD4-positive, alpha-beta T cell'\n", + " 'effector memory CD8-positive, alpha-beta T cell' 'erythrocyte'\n", + " 'gamma-delta T cell' 'hematopoietic precursor cell'\n", + " 'innate lymphoid cell' 'memory B cell' 'mucosal invariant T cell'\n", + " 'naive B cell' 'naive thymus-derived CD4-positive, alpha-beta T cell'\n", + " 'naive thymus-derived CD8-positive, alpha-beta T cell'\n", + " 'natural killer cell' 'peripheral blood mononuclear cell' 'plasmablast'\n", + " 'plasmacytoid dendritic cell' 'platelet' 'regulatory T cell'\n", + " 'transitional stage B cell']\n" + ] + } + ], + "source": [ + "unique_cell_types = np.unique(adata.obs['cell_type'])\n", + "print(unique_cell_types)" + ] + }, + { + "cell_type": "markdown", + "id": "c763a5e1", + "metadata": {}, + "source": [ + "### Filtering out bad donors" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "id": "a0e4918c", + 
"metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Donor Cell Count Statistics:\n", + "count 981.000000\n", + "mean 1273.170234\n", + "std 322.280557\n", + "min 333.000000\n", + "25% 1070.000000\n", + "50% 1246.000000\n", + "75% 1446.000000\n", + "max 3511.000000\n", + "Name: count, dtype: float64\n", + "cutoff of 894 would exclude 98 donors\n" + ] + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "import matplotlib.pyplot as plt\n", + "import pandas as pd\n", + "from scipy.stats import scoreatpercentile\n", + "\n", + "# 1. Calculate how many cells each donor has\n", + "donor_cell_counts = pd.Series(adata.obs['donor_id']).value_counts()\n", + "\n", + "# Print some basic statistics to read the exact numbers\n", + "print(\"Donor Cell Count Statistics:\")\n", + "print(donor_cell_counts.describe())\n", + "\n", + "# 2. Plot the histogram\n", + "plt.figure(figsize=(10, 6))\n", + "# Bins set to 'auto' or a fixed number depending on your N of donors\n", + "plt.hist(donor_cell_counts.values, bins=50, color='skyblue', edgecolor='black')\n", + "\n", + "plt.title('Distribution of Total Cells per Donor')\n", + "plt.xlabel('Number of Cells Captured')\n", + "plt.ylabel('Number of Donors')\n", + "plt.grid(axis='y', alpha=0.5)\n", + "\n", + "# Optional: Draw a vertical line at the propsoed cutoff\n", + "# This helps you visualize how many donors you would lose.\n", + "cutoff_percentile = 10 # e.g., 10th percentile\n", + "min_cells_per_donor = int(scoreatpercentile(donor_cell_counts.values, cutoff_percentile))\n", + "print(f'cutoff of {min_cells_per_donor} would exclude {(donor_cell_counts < min_cells_per_donor).sum()} donors')\n", + "plt.axvline(min_cells_per_donor, color='red', linestyle='dashed', linewidth=1, label=f'Cutoff ({min_cells_per_donor} cells)')\n", + "plt.legend()\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "id": "8ad05821", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Filtering to keep only donors with at least 894 cells.\n", + "Number of donors excluded: 98\n" + ] + } + ], + "source": [ + "print(f\"Filtering to keep only donors with at least {min_cells_per_donor} cells.\")\n", + "print(f\"Number of donors excluded: {(donor_cell_counts < min_cells_per_donor).sum()}\")\n", + 
"valid_donors = donor_cell_counts[donor_cell_counts >= min_cells_per_donor].index\n", + "adata = adata[adata.obs['donor_id'].isin(valid_donors)]" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "id": "5a5e8f9b", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Number of donors after filtering: 883\n" + ] + } + ], + "source": [ + "print(f'Number of donors after filtering: {len(valid_donors)}')" + ] + }, + { + "cell_type": "markdown", + "id": "81b16da4", + "metadata": {}, + "source": [ + "### Filtering cell types by frequency\n", + "\n", + "Drop cell types that don't have at least 10 cells for at least 95% of people" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "id": "00dff55b", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Keeping 8 cell types out of 29\n", + "Cell types to keep: ['central memory CD4-positive, alpha-beta T cell', 'effector memory CD4-positive, alpha-beta T cell', 'effector memory CD8-positive, alpha-beta T cell', 'memory B cell', 'naive B cell', 'naive thymus-derived CD4-positive, alpha-beta T cell', 'natural killer cell', 'regulatory T cell']\n" + ] + } + ], + "source": [ + "import pandas as pd\n", + "\n", + "# 1. Calculate the count of cells for each 'cell_type' within each 'donor_id'\n", + "# We use pandas crosstab on adata.obs, which is loaded in memory.\n", + "counts_per_donor = pd.crosstab(adata.obs['donor_id'], adata.obs['cell_type'])\n", + "\n", + "# 2. 
Identify cell types to keep\n", + "# Keep if >= 10 cells in at least 90% of donors\n", + "\n", + "min_cells = 10\n", + "percent_donors = 0.9\n", + "donor_count = counts_per_donor.shape[0]\n", + "cell_types_to_keep = counts_per_donor.columns[\n", + " (counts_per_donor >= min_cells).sum(axis=0) >= (donor_count * percent_donors)]\n", + "\n", + "print(f\"Keeping {len(cell_types_to_keep)} cell types out of {len(counts_per_donor.columns)}\")\n", + "print(f\"Cell types to keep: {cell_types_to_keep.tolist()}\")\n", + "\n", + "# 3. Filter the AnnData object\n", + "# We subset the AnnData to include only observations belonging to the valid cell types.\n", + "adata_filtered = adata[adata.obs['cell_type'].isin(cell_types_to_keep)]" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "id": "ba931464", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Final number of donors after filtering: 698\n" + ] + } + ], + "source": [ + "# now drop donors who have fewer than min_cells cells in any of these cell types\n", + "donor_celltype_counts = pd.crosstab(adata_filtered.obs['donor_id'], adata_filtered.obs['cell_type'])\n", + "valid_donors_final = donor_celltype_counts.index[\n", + " (donor_celltype_counts >= min_cells).all(axis=1)]\n", + "adata_filtered = adata_filtered[adata_filtered.obs['donor_id'].isin(valid_donors_final)]\n", + "print(f\"Final number of donors after filtering: {len(valid_donors_final)}\")" + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "id": "f741845e", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Loading data into memory (this can take a few minutes)...\n", + "Filtering genes with zero counts...\n" + ] + } + ], + "source": [ + "\n", + "print(\"Loading data into memory (this can take a few minutes)...\")\n", + "adata_loaded = adata_filtered.to_memory()\n", + "\n", + "# filter out genes with zero counts across all selected cells\n", + "print(\"Filtering genes 
with zero counts...\")\n", + "sc.pp.filter_genes(adata_loaded, min_counts=1)\n" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "id": "310f8343", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "AnnData object with n_obs × n_vars = 785021 × 29331\n", + " obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'percent.mt', 'donor_id', 'pool_number', 'predicted.celltype.l2', 'predicted.celltype.l2.score', 'age', 'tissue_ontology_term_id', 'assay_ontology_term_id', 'disease_ontology_term_id', 'cell_type_ontology_term_id', 'self_reported_ethnicity_ontology_term_id', 'development_stage_ontology_term_id', 'sex_ontology_term_id', 'is_primary_data', 'suspension_type', 'tissue_type', 'cell_type', 'assay', 'disease', 'sex', 'tissue', 'self_reported_ethnicity', 'development_stage', 'observation_joinid'\n", + " var: 'vst.mean', 'vst.variance', 'vst.variance.expected', 'vst.variance.standardized', 'vst.variable', 'feature_is_filtered', 'feature_name', 'feature_reference', 'feature_biotype', 'feature_length', 'feature_type', 'n_counts'\n", + " uns: 'citation', 'default_embedding', 'organism', 'organism_ontology_term_id', 'schema_reference', 'schema_version', 'title'\n", + " obsm: 'X_azimuth_spca', 'X_azimuth_umap', 'X_harmony', 'X_pca', 'X_umap'\n", + " varm: 'PCs'\n" + ] + } + ], + "source": [ + "print(adata_loaded)\n" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "id": "ecb61446", + "metadata": {}, + "outputs": [], + "source": [ + "adata_loaded.write(datadir / f'dataset-{dataset_name}_subset-immune_filtered.h5ad')" + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "id": "10b6c9e6", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "total 22022184\n", + "-rw-r--r--@ 1 poldrack staff 4.1G Dec 19 09:03 a3f5651f-cd1a-4d26-8165-74964b79b4f2.h5ad\n", + "-rw-r--r--@ 1 poldrack staff 6.4G Dec 20 09:36 dataset-OneK1K_subset-immune_filtered.h5ad\n", + 
"-rw-r--r-- 1 poldrack staff 185B Dec 19 09:02 get_data.sh\n" + ] + } + ], + "source": [ + "!ls -lh /Users/poldrack/data_unsynced/BCBS/immune_aging" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "BetterCodeBetterScience", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.0" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/src/BetterCodeBetterScience/rnaseq/immune_scrnaseq_2_preprocess.ipynb b/src/BetterCodeBetterScience/rnaseq/immune_scrnaseq_2_preprocess.ipynb new file mode 100644 index 0000000..02a3b5e --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/immune_scrnaseq_2_preprocess.ipynb @@ -0,0 +1,1984 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "c3333c8c", + "metadata": {}, + "source": [ + "Preprocessing based on suggestions from Google Gemini\n", + "\n", + "based on https://www.sc-best-practices.org/preprocessing_visualization/quality_control.html\n", + "\n", + "and https://www.10xgenomics.com/analysis-guides/common-considerations-for-quality-control-filters-for-single-cell-rna-seq-data\n", + "\n", + "Code in this notebook primarily generated using Gemini 3.0" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "5e94c5f6", + "metadata": {}, + "outputs": [], + "source": [ + "import anndata as ad\n", + "import dask.array as da\n", + "import h5py\n", + "import numpy as np\n", + "import scanpy as sc\n", + "from pathlib import Path\n", + "import matplotlib.pyplot as plt\n", + "import seaborn as sns\n", + "\n", + "datadir = Path('/Users/poldrack/data_unsynced/BCBS/immune_aging/')" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "3c5b35d2", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + 
"AnnData object with n_obs × n_vars = 785021 × 29331\n", + " obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'percent.mt', 'donor_id', 'pool_number', 'predicted.celltype.l2', 'predicted.celltype.l2.score', 'age', 'tissue_ontology_term_id', 'assay_ontology_term_id', 'disease_ontology_term_id', 'cell_type_ontology_term_id', 'self_reported_ethnicity_ontology_term_id', 'development_stage_ontology_term_id', 'sex_ontology_term_id', 'is_primary_data', 'suspension_type', 'tissue_type', 'cell_type', 'assay', 'disease', 'sex', 'tissue', 'self_reported_ethnicity', 'development_stage', 'observation_joinid'\n", + " var: 'vst.mean', 'vst.variance', 'vst.variance.expected', 'vst.variance.standardized', 'vst.variable', 'feature_is_filtered', 'feature_name', 'feature_reference', 'feature_biotype', 'feature_length', 'feature_type', 'n_counts'\n", + " uns: 'citation', 'default_embedding', 'organism', 'organism_ontology_term_id', 'schema_reference', 'schema_version', 'title'\n", + " obsm: 'X_azimuth_spca', 'X_azimuth_umap', 'X_harmony', 'X_pca', 'X_umap'\n", + " varm: 'PCs'\n" + ] + } + ], + "source": [ + "adata = ad.read_h5ad(datadir / 'dataset-OneK1K_subset-immune_filtered.h5ad')\n", + "print(adata)" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "003237c7", + "metadata": {}, + "outputs": [], + "source": [ + "var_to_feature = dict(zip(adata.var_names, adata.var['feature_name']))\n" + ] + }, + { + "cell_type": "markdown", + "id": "ca1edf40", + "metadata": {}, + "source": [ + "### Quality control\n", + "\n", + "based on https://www.sc-best-practices.org/preprocessing_visualization/quality_control.html\n" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "a95e8baa", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Number of mitochondrial genes: 13\n", + "Number of ribosomal genes: 107\n", + "Number of hemoglobin genes: 12\n" + ] + } + ], + "source": [ + "# mitochondrial genes\n", + 
"adata.var[\"mt\"] = adata.var['feature_name'].str.startswith(\"MT-\")\n", + "print(f\"Number of mitochondrial genes: {adata.var['mt'].sum()}\")\n", + "\n", + "# ribosomal genes\n", + "adata.var[\"ribo\"] = adata.var['feature_name'].str.startswith((\"RPS\", \"RPL\"))\n", + "print(f\"Number of ribosomal genes: {adata.var['ribo'].sum()}\")\n", + "\n", + "# hemoglobin genes.\n", + "adata.var[\"hb\"] = adata.var['feature_name'].str.contains(\"^HB[^(P)]\")\n", + "print(f\"Number of hemoglobin genes: {adata.var['hb'].sum()}\")\n", + "\n", + "sc.pp.calculate_qc_metrics(\n", + " adata, qc_vars=[\"mt\", \"ribo\", \"hb\"], inplace=True, percent_top=[20], log1p=True\n", + ")\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "c79181f1", + "metadata": {}, + "source": [ + "#### Visualization of distributions " + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "a4819733", + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "\n", + "# 1. Violin plots to see the distribution of QC metrics\n", + "# Note: I am using the exact column names from your adata output\n", + "p1 = sc.pl.violin(adata, ['total_counts', 'n_genes_by_counts', 'pct_counts_mt'],\n", + " jitter=0.4, multi_panel=True)\n", + "\n", + "# 2. Scatter plot to spot doublets and dying cells\n", + "# High mito + low genes = dying cell\n", + "# High counts + high genes = potential doublet\n", + "sc.pl.scatter(adata, x='total_counts', y='n_genes_by_counts', color='pct_counts_mt')" + ] + }, + { + "cell_type": "markdown", + "id": "f44acde9", + "metadata": {}, + "source": [ + "#### Check Hemoglobin (RBC contamination)\n" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "73d83421", + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAhsAAAGJCAYAAAAjYfFoAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjgsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvwVt1zgAAAAlwSFlzAAAPYQAAD2EBqD+naQAAQwlJREFUeJzt3Qd8VFX6//EnBAhVAZFOCCgdAaWJDVE0ArKCjV1bxBVWBUWxLPx2JeofRUUiKiqWBewiKqwVlCJYQJoURQRWQBapikpoUbj/1/dkZ5hUQshkJnc+79fryr13JnfOnRnnPvec55wT53meZwAAAGFSKlwHBgAAEIINAAAQVgQbAAAgrAg2AABAWBFsAACAsCLYAAAAYUWwAQAAwopgAwAAhBXBBgAACCuCDaCYnX322W45UuvXr7e4uDh75JFHDvvce+65xz0X0SnwWU6cODHsr6XX0GvpNQOSkpLswgsvtOLwySefuNfXv4hdBBuIuMCP4aJFi3J9XBfmVq1aFXu5kNNvv/1m9957r7Vp08YqVapk5cuXd5/N3//+d/vxxx/D9rp79uxxAVRxXbCeeuqpIwoE9P0NLKVLl7Zq1apZu3btbPDgwbZy5cqIlas4RXPZEHmlI10AAEXvn//8pw0dOrRIj/n9999bt27d7IcffrDLLrvMBgwYYGXLlrXly5fbv/71L5syZYqtXr3awhVsKMiRwtQKFebCWb16dbv22msL/DfnnXeeXXPNNabppn799VdbtmyZvfDCC+5YDz30kA0ZMiT43AYNGtjevXutTJkyYS/X1VdfbX/+858tISHBwimvsp111lnuXPVdQewi2AB8SHfXWorKH3/8YRdffLFt3brV1S6cccYZWR6///773QU1ljVp0sSuuuqqLPsefPBB69Wrl91+++3WrFkz69Gjh9uvGpBy5cqFtTy7d++2ihUrWnx8vFsipVSpUmE/V0Q/mlFQYr388suuqlpV+aq21t3bxo0bc22C0d13ly5drEKFCnbiiSfam2++6R6fM2eOderUyR2jadOmNmPGjByv89VXX1n37t3tmGOOcU0H5557rs2fPz/H8wKvoWPVq1fPRowYYRMmT
MjRXp6bbdu22V//+lerWbOm+2FWM4XuivPy6KOPurtjvZZe8+uvvz5szoa2Bw0aZFOnTnXvie50W7ZsadOmTbPDeeutt9yd+j/+8Y8cgYbovVHAEWry5MnBz0d3vLoQb9q0KctzdBes91T7e/fu7daPP/54u+OOO+zAgQPuOXrvtE9UuxFortA5BqxatcouvfRS9z3Q+9e+fXt75513cm2u+/zzz10tg46pi3GfPn1s+/btWfIZvvnmG/fdCLxWYWtTjjvuOHv99ddd4Bf6/uSWs7Flyxbr16+f++7os6ldu7ZddNFFwe9OfuUKnJseu+mmm6xGjRruOKGP5fYd/Oijj6xt27buPWvRooW9/fbbBcr9yX7M/MqWV85GUX0/UDJQs4GooarnHTt25Nj/+++/59inH+67777bLr/8crv++uvdxeKJJ55wVbYKDqpUqRJ87s6dO10ynIIRVf8//fTTbv2VV16xW2+91W644Qa74oorbNSoUe6CpYClcuXK7m/1A3rmmWe6i+ldd93lqr2feeYZ90MaCFREP4Zdu3Z1P6rDhg1zF7Hnn3++QFXXqmLW8dauXeuCgYYNG7ofYv3Q/vLLL67dP9SLL75ou3btsoEDB9q+ffvsscces3POOcdWrFjhgpX8fPbZZ+6CoguSzvHxxx+3Sy65xDWN6MKYl8CFW1XyBaGLkS6cHTp0sJEjR7oaEZVTF/rsn48uGsnJye69VPKrAr7Ro0fbCSecYDfeeKO7uOgz07oCA9WwSOvWrYOf0emnn25169Z1TUd679944w13cVKQpL8JdfPNN1vVqlUtNTXVXSzHjBnj3vdJkya5x7Wt5+jCpuBKDve+5icxMdEFhLNnz3Y5L/ou5Uafg85Fr62LtwLQjz/+2H022i5IufS56v0aPny4q9nIz5o1a6xv377u+5+SkuICY/3/oeBTTUJH4kjfs6L8fqCE8IAImzBhgqevYn5Ly5Ytg89fv369Fx8f791///1ZjrNixQqvdOnSWfZ36dLF/f2rr74a3Ldq1Sq3r1SpUt78+fOD+6dPn+72qzwBvXv39sqWLev95z//Ce778ccfvcqVK3tnnXVWcN/NN9/sxcXFeV999VVw308//eRVq1bNHXPdunVZyqQlYMyYMe45L7/8cnBfRkaG17lzZ69SpUreb7/95vbpGHpe+fLlvf/+97/B53755Zdu/2233Rbcl5qa6vaF0rbOZe3atcF9y5Ytc/ufeOIJLz8nn3yyd+yxx+b7nNCy16hRw2vVqpW3d+/e4P733nvPvdbw4cOD+1JSUty+++67L8frtWvXLri9fft29zydV3bnnnuud9JJJ3n79u0L7jt48KB32mmneY0bN87xPevWrZt7PEDvm75Pv/zyS3Cfvm+hn9Hh6LgDBw7M8/HBgwe75+j9Dv0sA9+1nTt3uu1Ro0bl+zp5lStwbmeccYb3xx9/5PpY6HewQYMGbt9bb70V3Pfrr796tWvXdu99ft+jvI6ZV9lmz57tnqt/w/X9QPSjGQVR48knn3R3ctmXwB1sgO7MDx486Go1VBMSWGrVqmWNGzd2d5ChdLelmowANZfozql58+bBmgkJrCsRMnBHpWpm3SE3atQo+DxVb6smRLUEulMV3Q127tzZVUkHqEr/yiuvPOx5f/DBB67sf/nLX4L7VINyyy23WHp6uqtBCaXy6C4+oGPHjq7sOs7hKMFTd4QBem91px0457zoPAO1PYejXkW6K9dddmhbfc+ePV3ewvvvv5/jb3R3HUq1SYcrk/z88882a9Ys911QbU/gu/DTTz+5u2HdvWevmldia2jTgF5Ln/WGDRssXPQdFJUxN2pKUAKlmhpUE1dY/fv3L3B+Rp06dbLU+uh7oARX1SyoSSdcivP7gehBMwqihi6aamvPTlXeoc0ruoDoZlKBRW6yZ/ir7Tp7u/Oxxx5r9evXz7FPAj/2appRLwgFJ9kpUFHAoyYX5T3oQqVgIzvlhxyO/
lbnokS67K8ReDxUbuet5EQ1HRSkSj+39/dwF7iCBCQBgfLm9r7pYqIgLZQuOIGcjCMpk6jpSd8FNalpyY0ubKHBWfb3QK8lR3ORPxwFjZJXwKbmNiXYKpFUzQ+nnnqqa/rTxV+BaEGpCa6g9N3M/v+Fvkei5qUjed0jUZzfD0QPgg2UOLrI60fyww8/zPUuLnAXGZDXnV5e+zNrxf2psOesi4DueBVcZQ/SwlWmgn4XRAmDqsnITfaALxKfuxJ49br5BQPKH1LPFSXwTp8+3QVPymdQzc3JJ59coNdRDUlRymtguOJMzoxkTxoUHYINlDhqBtCFQT/cgTuxcNDdlHqvfPfddzkeU+8H1UQELrzqGaK77Oxy25ed/lY9WXThDK3d0GsEHg+lmp3sNL6FkgjDRRfB1157zfUAUgJsfgLl1fumxNVQ2pf9fI7mohdo3lJtlpqIikpRjr6qBE81hanm63BNUfpuq3ZDiz5nNcspGVLve1GXK1ArFHrMwDgpge9SoNZHicqhSZu5NTkVtGzh+H4g+pGzgRJHvRF0t6NukNnvRrWt9vqioNc4//zz7d///neWboPKnH/11VddF9BAzwLdVc+bN8+WLl2aJZ9APV4OR2MvqI080BsiMK6FeteolkY9GULpzjc0D2HBggX25Zdfuu654aJeOieddJLrBaTzzE65CIFeCGoKU9fLcePG2f79+4PPUU3Ut99+69rmj5SCvsBFL5ReRz151ENo8+bNOf4utEvrkVCPluyvVRj6DigXRzUBgfcnN2quU8+i7IGHgpPQ97CoyiUa8VUDsYXm5ainkwKcQBNKIL9n7ty5weepl0tu3bILWrZwfD8Q/ajZQImjH0CNYaE7bAUBSpjUj/K6devcj6cSAFWtXhT0OkpSVWChhDaNl6ALm34kH3744eDz1C1Wd5/qMqgugIGur8oP0AUnv7s+lVfHVFfXxYsXu7tKjQOiboDqUpj9bljNAiqPuv2pHHqOuq2qDOGimgMl5qr2QN2LlZCp7qbar+6aCr50F6xgRPuUf6CujQqUdLENdG3Uud12221H/PpqHtA4EArIVJul5FuNFaJFicV6PxQMKUFStR16PQVF//3vf934IEdK4z+ou60+f73fujhmvwvPTrUC+g4o4NWFW6+rLszK10hLS7MLLrgg37/V+C16X3We+p7pu6zzCE1uLky58qL3UWO7LFy40OWJjB8/3r2eusAGKNjWd1jPu/POO10Aruep1k81NoV5z8Lx/UAJEOnuMECgG93ChQtzfVzd6UK7vgao2566+lWsWNEtzZo1c90Pv/vuu8P+rbr+9ezZs0BdGJcsWeIlJye7bqgVKlTwunbt6n3xxRc5/lbdXs8880wvISHBq1evnjdy5Ejv8ccfd8fcsmVLljJl7yK4detWr1+/fl716tVd91R15QztghvaXVLdI0ePHu3Vr1/fvZZeM9Cl8nBdX3Prnqn3Ql0MC0JdNNU1UeXTe1GuXDnXhXHYsGHe5s2bszx30qRJrouiyqguwFdeeWWWLrui19Vnl11u5dd7ru6Oen+yd4NV1+RrrrnGq1WrllemTBmvbt263oUXXui9+eabh/2eZe+aKfq89P1QF2c9drhusKHdtNWlukqVKu7c1eX1m2++yfH87F1fd+zY4T4bfYf1fqibcadOnbw33ngjy9/lVa78/h/Kq+urjqPu3q1bt3afkV578uTJOf5+8eLFrix63xMTE720tLRcj5lX2XJ7f8Px/UB0i9N/Ih3wAH6lpD/VWujulkQ3ALGKnA2giGgk0FDKHXnppZdcFT+BBoBYRs4GUETU20DJihofQ23QmglVbfd5jf8AALGCYAMoIupVosTOZ5991iWEnnLKKS7gUEIlAMQycjYAAEBYkbMBAADCimADAACEVcznbGiIaI2kp4GTinIoYAAA/M7zPDeCsGYRzj6ZZKiYDzYUaBT1xFIAAMSSjRs3uhm28xLzwUZgKGi9UYF5L
gAAwOGpe79u2A83yWDMBxuBphMFGiU62Nizx6xDh8z1hQs1c1WkSwQAiBFxh0lD8EWCqCbg6tq1q5vASJMxaVbCmKMezCtXZi70ZgYARBFf1GxotkzNNHjmmWe6GTYTEhIiXSQAAOCXYEPTW2vKYgUaoqmnAQBA9Ih4sDF37lwbNWqULV682DZv3mxTpkyx3r17Z3nOk08+6Z6zZcsWa9OmjT3xxBPWsWNH99iaNWusUqVK1qtXL9u0aZNdeuml9n//938ROhsAiG3qCvnHH3/YgQMHIl0UFAFNIlm6dOmjHhoi4sGG8isUQFx33XV28cUX53h80qRJNmTIEBs3bpx16tTJxowZY8nJyfbdd99ZjRo13Jf6008/taVLl7rtCy64wDp06GDnnXdeRM4HAGJVRkaGu2nco4R1+EaFChWsdu3aVrZs2ZIbbHTv3t0teUlLS7P+/ftbv3793LaCjvfff9/Gjx9vQ4cOtbp161r79u2DY2VoMiwFHnkFG/v373dLaLcdAMDRD5CoZH3dCWuAJ12YGCix5NdSZWRk2Pbt291n27hx43wH7orqYCM/Okk1rwwbNiy4TyfarVs3mzdvnttWLca2bdts586dduyxx7pmmb/97W95HnPkyJF27733mu/of+oGDQ6tA0Ax/14r4NCNn+6E4Q/ly5d3eZEbNmxwn3G5cuUKdZyo7vq6Y8cO1+5Xs2bNLPu1rfwNUVvSAw884Kbxbt26tYu8LrzwwjyPqcDl119/DS4azMsX9D/3+vWZC/+jA4iQwt75wt+faVTXbBRVU0wodYvVoqRTLSQxAQAQXlEdglavXt21/23dujXLfm3XqlXrqI49cOBAW7lypS3UaJsAACA2gw0lGLVr185mzpwZ3Kc2QW137tz5qI6tWg2NOKqcD1/YuzdzuHItWgcAIITSD9R5omLFilalSpU89/ky2EhPT3e9R7SIMl61/sMPP7htdXt97rnn7IUXXrBvv/3WbrzxRtddNtA7JVprNsaOHWvNmjXLc9HjRergQbNFizIXrQMADuuee+5xvWZCF/1Gh9J1SANGKvn1lVdeyfLY5MmT3ThPBZGRkWEPP/ywG+5BSbSqvT/99NNtwoQJ9vvvvxe4zCrj1KlT7Ug9+uijrmuyrrGrV6/Oc184RDxnY9GiRW5ek9APVVJSUmzixInWt29f1+1m+PDhLgJr27atTZs2LUfSaLQZNGiQWwAA0a1ly5Y2Y8aM4LY6HgS8++679uqrr9pHH33kBpHUmFAa60mBgjoZ/OMf/8jyt/kFGsnJybZs2TL7f//v/7kgQ5N/zp8/3x555BE7+eST3fUtnP7zn/+41gJ1pMhvX1h4MWrs2LFe8+bNvSZNmmjWMu/XX3/1SrT0dE2/lrloHQCK0d69e72VK1e6f7PQ71Fey5E8d8+egj33CKWmpnpt2rTJ8/GHHnrI69u3b3C7Ro0a3oIFC9z6gAEDvLS0tAK9zkMPPeSVKlXKW7JkSY7HMjIyvPT/lb1Bgwbeo48+muVxlU/lDDyua1Zg0XbAU0895TVq1MgrU6aMu7a9+OKLwcey/11KSkqu+47os/U8d+0syDU04jUbkaJmFC0a1EvjcwAAwqBSpbwf69HD7P33D23XqGGW1+ijXbqYffLJoe2kJI2PkPN5hZj1WjUWGohMY0goH1DjMSUmJrrH1OTx7LPPurGcvv/+e9u7d6+deOKJ9tlnn9mSJUvsqaeeKtBrvPLKK26MKNVgZKdxLLQUhJr+NVq2ml40YrY6UYim+hg8eLAbZVuv895777l0g3r16rnWA/3dNddc42pTHnvsMTd+hmpbsu/zbc4GAACRomkw1GSv5vmnn37a5Q1qYs9du3a5x9X0cdVVV7nOBJphXPmDSqZU/qBGtNbfNG3a1DWLaGLQ/AKaZtlyQQrj+OOPd/8qmVO9MgPbaopR+W666SZr0qSJS0nQFCDaH/g7DfuggEJ/p5vs3PaFS8zWbDDOBgAUg
/T0vB/731150LZteT83+8BSGsCwCISO0aSBIRV8NGjQwN544w3761//Gkwi1RKgUahVe6DaiBEjRtiKFStcTYJqCTTqdV5Df4eTOlAMGDAgyz4FQKqxiAYxG2z4shmlevVIlwAAsqpYMfLPPQKqMVDNwNq1a3N9fNWqVfbyyy/bV1995ebo0ujVqiG4/PLLXfKoakQqV66c4++aNGni/rYgo3VmD0yOpKdKtKIZxS/0P9727ZlLmP4nBAC/03AM6qGhWU6zUxCgubc0QWilSpVczXggEAj8m1dt+RVXXOF6rShIyU5/qyEdRIGLuqIG6IZYTTuhVKOS/XWaN29un3/+eZZ92tZ4UtGAYAMAELPuuOMOmzNnjq1fv96++OIL69Onj0u6/Mtf/pLjuc8//7wLBgLjaqiZYtasWa77qsar0IU9r4Gxbr31Vvf8c8891zXhqwusEk7VXHPqqae6nA4555xz7KWXXrJPP/3UNc9oGIhAEmhAUlKSG9xSw0EocVXuvPNOl3uiHBIdSwHR22+/7c4vGsRsMwo5GwCA//73vy6w+Omnn1wgccYZZ7jgIZB4GTpNxv333+8CkoCOHTva7bffbj179nQ9RJQ8mpeEhAT7+OOPXVDyzDPPuCBAA3upRuKWW26xVq1aBScLVU2GJhRVE7/G5MheszF69OjggJd169Z1gVLv3r1dfoYSQtUrpWHDhq7Hytlnn23RIE79Xy2GBXI2NDiLuv+UWBqiPJDo9OGHmhc40iUCEEP27dvnLoq6yBV2GnKUvM+2oNfQmK3Z8B0NUT5nzqF1AACiBDkbAAAgrAg2AABAWMVssOG7KeYBAIhSMRtshHuKeQCIRTHe58CXvCL4TGM22AAAFJ3ARGJ78ppIDSVW4DMt6GRxuaE3ip9UqBDpEgCIURp4SgNabfvf/CYaQyIuLi7SxcJR1mgo0NBnqs82++BiR4Jgwy80RPn/hrsFgEjQzKESCDjgD1X+N8Ps0SDYAAAUCdVkaE4Rjabph8nDYK7p5GhqNCzWgw2GKweA8NDFqSguUPAPhiv3y3Dl+/aZXXJJ5vpbb5kxXDAAIMwYrjzWqIbmgw8OrQMAECXo+goAAMKKYAMAAIQVwQYAAAgrgg0AABBWBBsAACCsYjbYYNZXAACKB+Ns+GWcDQAAovQaGrM1GwAAoHgQbAAAgLAi2PALDVd+2WWZi9YBAIgSBBt+oSHK33wzc2G4cgBAFCHYAAAAYUWwAQAAwopgAwAAhJUvpphPSkpy/XtLlSplVatWtdmzZ0e6SAAAwE/BhnzxxRdWqVKlSBcDAABkQzMKAADwd7Axd+5c69Wrl9WpU8fi4uJs6tSpuc5joqaScuXKWadOnWzBggVZHtffdenSxc1z8sorr1hMqlDBLD09c9E6AABRIuLBxu7du61NmzYuoMjNpEmTbMiQIZaammpLlixxz01OTrZt27YFn/PZZ5/Z4sWL7Z133rEHHnjAli9fnufr7d+/343lHrr4QlycWcWKmYvWAQCIEhEPNrp3724jRoywPn365Pp4Wlqa9e/f3/r16+dmaR03bpxVqFDBxo8fH3xO3bp13b+1a9e2Hj16uKAkLyNHjnSTxgSW+vXrh+GsAABA1AQb+cnIyHA1Ft26dQvuU48Tbc+bNy9YM7Jr1y63np6ebrNmzbKWLVvmecxhw4a52ekCy8aNG80X9u83u/bazEXrAABEiajujbJjxw47cOCA1axZM8t+ba9atcqtb926NVgroueqFkS5G3lJSEhwi5pttOhvfOGPP8xeeCFzXU1SCQmRLhEAANEfbBREo0aNbNmyZUf8dwMHDnSLcjbUnAIAAGKwGaV69eoWHx/vai9CabtWrVoRKxcAAPBJsFG2bFlr166dzZw5M7jv4MGDbrtz585HdWw1oSjhNL8mFwAA4INmFCV1rl27Nri9bt06W7p0qVWrVs0SExNdt9eUlBRr3769dezY0caMGeOSQtU75WjQjAIAQ
IwEG4sWLbKuXbsGtxVciAKMiRMnWt++fW379u02fPhw27Jli7Vt29amTZuWI2kUAABEpzjP8zyLQaG9UVavXu26wWoytxJr926zwNwwGkVUg3sBABBGgdaBw11DYzbYONI3KurpY9yxI3O9enVGEQUARM01NOLNKCgiCi6OPz7SpQAAoGT1RgkneqMAAFA8aEbxSzOKhij/X3KtpaUxgigAIGquoTFbs+E7Gq78qacyF60DABAlCDYAAEBYxWywQc4GAADFg5wNv+RsMM4GAKCYkbMBAACiAsEGAAAIq5gNNsjZAACgeJCz4ZecjYMHzX74IXM9MdGsVMzGkQCAYsJw5bFGwUVSUqRLAQBADtz+AgCAsCLY8IuMDLM778xctA4AQJQgZ8MvORuMswEAKGaMs3EY9EYBAKB4ULNBzQYAAIVCzQYAAIgKBBsAACCsCDYAAEBYEWwAAICwYgRRvyhf3uzrrw+tAwAQJQg2/DRcecuWkS4FAAA5xGwzCuNsAABQPBhnwy/jbGiI8gceyFz/v/8zK1s20iUCAPjcbwW8hhJs+CXYYFAvAEAxY1AvAAAQFQg2AABAWBFsAACAsCLYAAAAYUWwAQAAwopgAwAAhJVvgo09e/ZYgwYN7I477rCYVK6c2YIFmYvWAQCIEr4Zrvz++++3U0891WJWfLwZo6ECAKKQL2o21qxZY6tWrbLu3btHuigAACDago25c+dar169rE6dOhYXF2dTp07NdR6TpKQkK1eunHXq1MkWqKkghJpORo4caTFNw5WPGpW5aB0AgCgR8WBj9+7d1qZNGxdQ5GbSpEk2ZMgQS01NtSVLlrjnJicn27Zt29zj//73v61JkyZuiWm//252112Zi9YBAIgSEc/ZUNNHfs0faWlp1r9/f+vXr5/bHjdunL3//vs2fvx4Gzp0qM2fP99ef/11mzx5sqWnp9vvv//uxmcfPnx4rsfbv3+/W0LHdQcAAD6u2chPRkaGLV682Lp16xbcV6pUKbc9b948t63mk40bN9r69evtkUcecYFJXoFG4PmaNCaw1K9fv1jOBQCAWBXVwcaOHTvswIEDVrNmzSz7tb1ly5ZCHXPYsGFudrrAokAFAAD4uBmlKF177bWHfU5CQoJblCOiRcEMAACI0ZqN6tWrW3x8vG3dujXLfm3XqlXrqI49cOBAW7lypS1cuPAoSwkAAEpssFG2bFlr166dzZw5M7jv4MGDbrtz585HdWzVarRo0cI6MBAWAAD+bkZRD5K1a9cGt9etW2dLly61atWqWWJiouv2mpKSYu3bt7eOHTvamDFjXHfZQO+Uo6nZ0KLeKEoULfE0RPns2YfWAQCIEhEPNhYtWmRdu3YNbiu4EAUYEydOtL59+9r27dtdDxMlhbZt29amTZuWI2k05mm48rPPjnQpAADIIc7zPM9iUGiC6OrVq13PFI3PAQAACibQOnC4a2jMBhtH+kZFPY0a+uyzmesDBpiVKRPpEgEAfO63Al5DI96MgiKi+VAGDcpcVxdggg0AQJSI6t4o4URvFAAAigfNKH5pRtm926xSpcz19HSzihUjXSIAgM/9VsBraMzWbAAAgOJBsAEAAMIqZoMNcjYAACge5GyQswEAQKHQ9TXWJCSYvffeoXUAAKIEwYZflC5t1rNnpEsBAEAOMZuzAQAAikfMBhu+SxDVcOUTJ2YuWgcAIEqQIEqCKAAAhcKgXgAAICoQbAAAgLAi2AAAAGFFsAEAAMIqZoMN3/VGAQAgStEbhd4oAAAUCsOVxxoNUf7GG4fWAQCIEgQbfhqu/LLLIl0KAAByiNmcDQAAUDyo2fCLP/4wmzIlc71Pn8yaDgAAogBXJL/Yv9/s8ssPJYgSbAAASnIzSqNGjeynn37Ksf+XX35xjwEAABxVsLF+/Xo7cOBAjv379++3TZs2WUnAOBsAABSPI6prf+edd4Lr06dPd31rAxR8zJw505KSkqwkGDhwoFsCfYQBAEAUBBu9e
/d2/8bFxVlKSkqWx8qUKeMCjdGjRxdtCQEAQOwEGwcPHnT/NmzY0BYuXGjVq1cPV7kAAIBPFKrLwrp164q+JAAAwJcK3T9S+Rlatm3bFqzxCBg/fnxRlA1HomxZswkTDq0DAFCSg417773X7rvvPmvfvr3Vrl3b5XAgwsqUMbv22kiXAgCAogk2xo0bZxMnTrSrr766MH8OAABiSKGCjYyMDDvttNOKvjQ4uuHKp0/PXE9OZgRRAEDJHtTr+uuvt1dffdWigUYtVXNO27ZtrVWrVvbcc89ZzA5XfuGFmYvWAQCIEoW6/d23b589++yzNmPGDGvdurUbYyNUWlqaFZfKlSvb3LlzrUKFCrZ7924XcFx88cV23HHHFVsZAABAEQcby5cvdzUJ8vXXX2d5rLiTRePj412gERgu3fM8twAAgBIcbMyePbvICqBaiVGjRtnixYtt8+bNNmXKlOBIpaHzmOg5W7ZssTZt2tgTTzxhHTt2zNKU0qVLF1uzZo17HoONAQBQwnM2ipKaPhRAKKDIzaRJk2zIkCGWmppqS5Yscc9NTk5243sEVKlSxZYtW+YGG1MuydatW/N8PdV+aD6U0AUAAIRPnFeINoeuXbvm21wya9aswhUmLi5HzUanTp3czKxjx4512xpArH79+nbzzTfb0KFDcxzjpptusnPOOccuvfTSXF/jnnvuceOEZPfrr7/aMcccYyXW7t1mlSplrqenm1WsGOkSAQB87rf/TWZ6uGtooWo2lK+hGobAoqna1R1WNQ8nnXSSFRUdU80r3bp1O1TgUqXc9rx589y2ajF27drl1nWyapZp2rRpnsccNmyYe15g2bhxY5GVFwAAFFHOxqOPPppnrUG67qqLyI4dO9zU9TVr1syyX9urVq1y6xs2bLABAwYEE0NV45FfwJOQkOAWNdto0fF9QUOU/6/2h+HKAQDRpEhHfrrqqqtc4uYjjzxixUWvt3Tp0iP+u4EDB7olUAVU4qn78cCBkS4FAADhTRBV00a5cuWK7HjqVaKurdkTPrVdq1atInsdAAAQZTUbGjQrlJov1G110aJFdvfddxdV2axs2bLWrl07N7tsIGlUCaLaHjRo0FEd23fNKDqPTz/NXD/zTA1AEukSAQBQ+GAje7ODkjaVlKmZYM8///wjOpZyPNauXRvcVvdVNYtUq1bNEhMTXbfXlJQUNyS5mkzGjBnjusv269fPjobvmlH27VM3ocx1eqMAAEp6sDFhwoQiK4BqQ9SVNkDBhSjA0Myyffv2te3bt9vw4cPdoF7qCTNt2rQcSaMAAMBH42wEqFvqt99+69ZbtmxpJ598spUUoc0oq1evZpwNAADCNM5GoYINjd755z//2T755BM3emdgyHDVULz++ut2/PHHm9/eqKhHsAEA8NOgXhrLQgNpffPNN/bzzz+7RROy6UVvueWWoyk3AADwmULlbChnQtPLN2/ePLhPo4iqWeJIE0QjxXe9UQAAiFKFqtlQ99MyGkQqG+3TYyWBeqKsXLnSFi5cGOmiAADga4UKNjTR2eDBg+3HH38M7tu0aZPddtttdu655xZl+VBQCv4efjhzySUQBAAgUgqVIKrJy/70pz+5nA3NwBrY16pVK3vnnXesXr16VlL4JkEUAIAovYYWKmdDAYZmeFXeRmBCNOVvhM7OGu3I2QAAIAprNmbNmuWGCZ8/f36OCEZRzWmnnWbjxo2zMzVcdgnhm5oNBU1LlmSun3IKw5UDAEpm11cNFd6/f/9cD6gX+9vf/mZpaWmFKzGOfrjyjh0zF60DABAljijYWLZsmV1wwQV5Pq5urxpVFAAAoFDBhqZ2z63La0Dp0qXdPCYlgfI1NDZIhw4dIl0UAAB87YiCjbp167qRQvOyfPlyq127tpUEjLMBAEAUBhs9evSwu+++2/blkhOwd+9eS01NtQsvvLAoywcAAGKpN4qaUU455RSLj493vVKaNm3q9qv7a6AbqbrElqTp333TG4WJ2
AAAfhhnQ0HEF198YTfeeKMNGzbMAnFKXFycJScnu4CjJAUaAAAg/I54UK8GDRrYBx98YDt37rS1a9e6gKNx48ZWtWrV8JQQBaPE3dTUQ+sAAJTk4cr9IHQE0dWrV5f8ZhQAAKK0GSVmgw3f5WwAAOCnuVEQhQ4eNPv228z15s3NShVqQl8AAIocwYZf7N1r1qpV5jq9UQAAUYTbXwAAEFYEGwAAIKwINgAAQFgRbAAAgLCK2WCDWV8BACgejLPhl3E2mBsFAFDMGGcj1miI8jvuOLQOAECUINjwi7JlzUaNinQpAADIgWAjQsaOHeuW/AwaNMgtAACUZORs+CVnQ8OV//BD5npiIsOVAwDCjpyNWByuvGHDzHUSRAEAUYTbXwAAEFYEGwAAIKxKfLCxceNGO/vss90AXa1bt7bJkydHukgAAMBPORulS5e2MWPGWNu2bW3Lli3Wrl0769Gjh1UkZwEAgKhQ4oON2rVru0Vq1apl1atXt59//plgAwCAKBHxZpS5c+dar169rE6dOhYXF2dTp07NdR6TpKQkK1eunHXq1MkWLFiQ67EWL15sBw4csPr16xdDyQEAQIkINnbv3m1t2rRxAUVuJk2aZEOGDLHU1FRbsmSJe25ycrJt27Yty/NUm3HNNdfYs88+azGpdGmzm27KXLQOAECUiKpBvVSzMWXKFOvdu3dwn2oyNDNrYLTNgwcPupqLm2++2YYOHer27d+/38477zzr37+/XX311fm+hp6rJXRAEh2vxA/qBQBAlA7qFfGajfxkZGS4ppFu3boF95UqVcptz5s3z20rVrr22mvtnHPOOWygISNHjnRvTGChyQUAgPCK6mBjx44dLgejZs2aWfZrWz1P5PPPP3dNLcr1UI8ULStWrMjzmMOGDXMRWGBR11lfUAXV9u2ZS/RUVgEAUPJ7o5xxxhmuaaWgEhIS3KIcES0KZnxhzx6zGjUy1xmuHAAQRaK6ZkPdWOPj423r1q1Z9mtb3VyPxsCBA23lypW2cOHCoywlAAAoscFG2bJl3SBdM2fODO5TLYa2O3fufFTHVq2GRh1V8ikAAPBxM0p6erqtXbs2uL1u3TpbunSpVatWzRITE12315SUFGvfvr117NjRjRaq7rL9+vU76poNLYFMWgAA4NNgY9GiRda1a9fgtoILUYAxceJE69u3r23fvt2GDx/ukkKVADpt2rQcSaMAACA6RdU4G8UpNEF09erVJX+cjd27zSpVylwnQRQAEEXjbMRssHGkb1TUI9gAAETpNTTizSgoIhqiPCXl0DoAAFEiZq9KvhtnIyHBbOLESJcCAIAcaEbxSzMKAADFjGaUWKOYUaOISoUKmtUu0iUCACD6B/XCEVCgoQRRLYGgAwCAKBCzwQYjiAIAUDzI2fBLzgZdXwEAUXoNjdmaDQAAUDwINgAAQFgRbAAAgLCK2WCDBFEAAIoHCaJ+SRDdt8/s6qsz1196yaxcuUiXCADgc78xqFeMUXAxeXKkSwEAQA4x24wCAACKB8EGAAAIK4INv9CgXpoPRYvWAQCIEjEbbNAbBQCA4kFvFL/0RmG4cgBAMaM3ig+MHTvWLXkZNGiQWwAAiGbUbFCzAQBAoTARGwAAiAoEGwAAIKzI2fCL+HizHj0OrQMAECUINvw0XPn770e6FAAA5BCzzSiMswEAQPGgN4pfeqMAAFDM6I0Sa9T1Vd1dtTBcOQAgipCz4Sd79kS6BAAA5EDNBgAACCuCDQAAEFYEGwAAIKwINgAAQFgRbAAAgLDyRbDRp08fq1q1ql166aUWs0qVMuvSJXPROgAAUcIXV6XBgwfbiy++aDGtfHmzTz7JXLQOAECU8EWwcfbZZ1vlypUjXQwAABCNwcbcuXOtV69eVqdOHYuLi7OpU6fmOo9JUlKSlStXzjp16mQLFiyISFmjzdixY61Zs2b5LnoOAAAxPYLo7t27rU2bNnbdddfZxRdfnOPxSZMm2ZAhQ2zcuHEu0BgzZ
owlJyfbd999ZzVq1Dji19u/f79bQsd1L6kGDRrkFkdDlCclZa6vX585bDkAAFEg4jUb3bt3txEjRrgkz9ykpaVZ//79rV+/fm6WVgUdFSpUsPHjxxfq9UaOHOkmjQks9evXN9/YsSNzAQAgikQ82MhPRkaGLV682Lp16xbcV6pUKbc9b968Qh1z2LBhbna6wLJx48YiLDEAAIi6ZpT87Nixww4cOGA1a9bMsl/bq1atCm4r+Fi2bJlrkqlXr55NnjzZOnfunOsxExIS3KI8EC06PgAAiNFgo6BmzJhxxH8zcOBAtyhnQ80pAAAgBptRqlevbvHx8bZ169Ys+7Vdq1atiJULAAD4JNgoW7astWvXzmbOnBncd/DgQbedVzNJQakJRQmnHTp0KIKSAgCAqG1GSU9Pt7Vr1wa3161bZ0uXLrVq1apZYmKi6/aakpJi7du3t44dO7qur8rNUO+Uo+G7ZhQNUd6+/aF1AACiRMSDjUWLFlnXrl2D2wouRAHGxIkTrW/fvrZ9+3YbPny4bdmyxdq2bWvTpk3LkTQa8zRE+cKFkS4FAAA5xHme51kMCu2Nsnr1atcN9phjjol0sQAAKDECrQOHu4bGbLBxpG8UAAAo3DWUxn2/2LMnc7hyLVoHACBKRDxnI1J8N6iXKqg2bDi0DgBAlKAZxS/NKJqIrVKlzPX0dCZiAwCEHc0oAAAgKhBsAACAsCJnwy85Gz43duxYt+Rn0KBBbgHvVyx/tnyuiEbkbJCzAQBAWK+hMVuz4TtxcWYtWhxaBwAgShBs+EWFCmbffBPpUgAAkEPMBhvkbMQe8hgQC/ieIxqRs+GXnA0AAIoZ42zEGg1R3rJl5sJw5QCAKBKzzSi+owqqlSsPrQMAECWo2QAAAGFFsAEAAMIqZptRYqU3yuEy00866SRbsWJFoR8vSGZ7QbLjD/c6sZQ9Hy29Cfz0uRXHqJvR8rkVhaI4F0Y6RSh6o/ilNwojiAIAihm9UQAAQFSI2WYU39EQ5Q0aHFoHACBKEGz4abjy9esjXQoAAHKgGQUAAIQVwQYAAAgrgg2/2LvXrEOHzEXrAABEiZjN2fDdOBsHD5otWnRovRhFS3/6oihHtJxLcYilc42lMUVKiuIYy6O4joHDY5wNxtkAAKBQGGcDAABEBYINAAAQVgQbAAAgrAg2AABAWMVsbxRfql490iUAACAHgg2/UO+T7dsjXQoAAHKgGQUAAISVL4KN9957z5o2bWqNGze2559/PtLFAQAAfmpG+eOPP2zIkCE2e/ZsN7BIu3btrE+fPnbcccdZTNEQ5d27Z65/+KFZ+fKRLhEAAP6o2ViwYIG1bNnS6tata5UqVbLu3bvbRx99ZDFHQ5TPmZO5FPNw5QAARHWwMXfuXOvVq5fVqVPH4uLibOrUqTmeozlMkpKSrFy5ctapUycXYAT8+OOPLtAI0PqmTZuKrfwAACDKg43du3dbmzZtXECRm0mTJrlmktTUVFuyZIl7bnJysm3btq3YywoAAEpgsKFmjxEjRrg8i9ykpaVZ//79rV+/ftaiRQsbN26cVahQwcaPH+8eV41IaE2G1rUvL/v373cTx4QuAAAgRhNEMzIybPHixTZs2LDgvlKlSlm3bt1s3rx5brtjx4729ddfuyBDCaIffvih3X333Xkec+TIkXbvvfcWS/nhP8UxpXVBprMuKdPDF8XU7Yd7vCDPiZb343CY7rxkipbPbWwU/y5E1RTzytmYMmWK9e7dO0s+xhdffGGdO3cOPu+uu+6yOXPm2Jdffum233nnHbvjjjvs4MGD7rEBAwbkW7OhJUA1G/Xr12eKeQAAwjTFfFTXbBTUn/70J7cUREJCgluUI6LlwIED5hsVKkS6BAAARF/ORn6qV69u8fHxtnXr1iz7tV2rVq2jOvbAgQNt5cqVtnDhQvMF1WSodkMLtRoAgCgS1cFG2bJl3SBdM2fODO5TU4m2Q5tVCkO1Gko47dChQxGUFAAARG0zS
np6uq1duza4vW7dOlu6dKlVq1bNEhMTXbfXlJQUa9++vUsGHTNmjOsuq94pR1uzoSXQ3gQAAHwabCxatMi6du0a3FZwIQowJk6caH379rXt27fb8OHDbcuWLda2bVubNm2a1axZM4KljkL79pldcknm+ltvmZUrF+kSAQAQfb1RilNogujq1avpjQIAQJh6o8RssHGkb1TUI9gAAETpNTSqE0QBAEDJF7PBBr1RAAAoHjSj0IwCAECh0IwCAACiQsS7vkZaoGKnxM/+qpqNAJ2Ln4ZhBwBEpcC183CNJDEfbOzatcv9q8nYfKNOnUiXAAAQY9fSY/MZIDPmczY0/Llml61cubKbdbYoBGaS3bhxY8nOAymgWDvfWDznWDvfWDznWDvfWDzn38JwvgohFGjUqVPHSpXKOzMj5ms29ObUq1cvLMfWhxkLX+BYPd9YPOdYO99YPOdYO99YPOdjivh8CzLlBwmiAAAgrAg2AABAWBFshEFCQoKlpqa6f2NBrJ1vLJ5zrJ1vLJ5zrJ1vLJ5zQgTPN+YTRAEAQHhRswEAAMKKYAMAAIQVwQYAAAgrgg0AABBWBBthmLo+KSnJypUrZ506dbIFCxaYX8ydO9d69erlRorTaKtTp07N8rhyjYcPH261a9e28uXLW7du3WzNmjVWUo0cOdI6dOjgRpetUaOG9e7d27777rssz9m3b58NHDjQjjvuOKtUqZJdcskltnXrViupnn76aWvdunVw0J/OnTvbhx9+6Nvzze7BBx903+1bb73Vl+d8zz33uPMLXZo1a+bLcw21adMmu+qqq9x56bfppJNOskWLFvnytyspKSnHZ6xFn2skP2OCjSI0adIkGzJkiOtatGTJEmvTpo0lJyfbtm3bzA92797tzkkBVW4efvhhe/zxx23cuHH25ZdfWsWKFd3568tdEs2ZM8f9Tzl//nz7+OOP7ffff7fzzz/fvQ8Bt912m7377rs2efJk93wNfX/xxRdbSaXRdHXBXbx4sfsxPuecc+yiiy6yb775xpfnG2rhwoX2zDPPuGArlN/OuWXLlrZ58+bg8tlnn/n2XGXnzp12+umnW5kyZVzgvHLlShs9erRVrVrVl79dCxcuzPL56rdLLrvsssh+xur6iqLRsWNHb+DAgcHtAwcOeHXq1PFGjhzp+Y2+OlOmTAluHzx40KtVq5Y3atSo4L5ffvnFS0hI8F577TXPD7Zt2+bOe86cOcHzK1OmjDd58uTgc7799lv3nHnz5nl+UbVqVe/555/39fnu2rXLa9y4sffxxx97Xbp08QYPHuz2++2cU1NTvTZt2uT6mN/ONeDvf/+7d8YZZ+T5uN9/uwYPHuydcMIJ7jwj+RlTs1FEMjIy3N2gqt9C513R9rx588zv1q1bZ1u2bMly/hovX01Jfjn/X3/91f1brVo1968+b9V2hJ6zqqQTExN9cc4HDhyw119/3dXkqDnFz+erGqyePXtmOTfx4zmreUBNoY0aNbIrr7zSfvjhB9+eq7zzzjvWvn17d2ev5tCTTz7ZnnvuuZj47crIyLCXX37ZrrvuOteUEsnPmGCjiOzYscP9ONesWTPLfm3ri+x3gXP06/lrdmC146s6tlWrVm6fzqts2bJWpUoVX53zihUrXFuuRhm84YYbbMqUKdaiRQvfnq8CKjV7KkcnO7+dsy6gEydOtGnTprn8HF1ozzzzTDdrp9/ONeD7779359q4cWObPn263XjjjXbLLbfYCy+84PvfrqlTp9ovv/xi1157rduO5Gcc87O+AgW98/3666+ztG/7VdOmTW3p0qWuJufNN9+0lJQU17brR5pqe/Dgwa5dW0ndfte9e/fgunJTFHw0aNDA3njjDZcY6Ue6UVDNxgMPPOC2VbOh/5eVn6Hvtp/961//cp+5arIijZqNIlK9enWLj4/PkdWr7Vq1apnfBc7Rj+c/aNAge++992z27NkugTJA56VqSt05+Omcdedz4oknWrt27dzdv
pKCH3vsMV+er6qVlcB9yimnWOnSpd2iwErJglrXHZ/fzjmU7nCbNGlia9eu9eXnK+phopq5UM2bNw82H/n1t2vDhg02Y8YMu/7664P7IvkZE2wU4Q+0fpxnzpyZJaLWttq7/a5hw4buyxp6/r/99pvL7C6p5688WAUaakaYNWuWO8dQ+ryV4R56zuoaqx+xknrOudH3eP/+/b4833PPPdc1G6kmJ7DoLli5DIF1v51zqPT0dPvPf/7jLsh+/HxFTZ/Zu6yvXr3a1ej49bdLJkyY4HJUlIsUENHPOKzppzHm9ddfdxnMEydO9FauXOkNGDDAq1KlirdlyxbPD5Sx/9VXX7lFX520tDS3vmHDBvf4gw8+6M733//+t7d8+XLvoosu8ho2bOjt3bvXK4luvPFG79hjj/U++eQTb/PmzcFlz549wefccMMNXmJiojdr1ixv0aJFXufOnd1SUg0dOtT1tlm3bp37DLUdFxfnffTRR74839yE9kbx2znffvvt7vusz/fzzz/3unXr5lWvXt31tPLbuQYsWLDAK126tHf//fd7a9as8V555RWvQoUK3ssvvxx8jt9+uw4cOOA+R/XEyS5SnzHBRhF74okn3AdZtmxZ1xV2/vz5nl/Mnj3bBRnZl5SUFPe4ulbdfffdXs2aNV3Qde6553rfffedV1Lldq5aJkyYEHyOfoxuuukm1z1UP2B9+vRxAUlJdd1113kNGjRw39/jjz/efYaBQMOP51uQYMNP59y3b1+vdu3a7vOtW7eu2167dq0vzzXUu+++67Vq1cr9LjVr1sx79tlnszzut9+u6dOnu9+q3M4hUp8xU8wDAICwImcDAACEFcEGAAAIK4INAAAQVgQbAAAgrAg2AABAWBFsAACAsCLYAAAAYUWwAQAAwopgA0DE3HPPPda2bdsCP3/9+vUWFxfn5i3JyyeffOKek32yKQCRQ7ABFLNXXnnF6tevb1WrVrUhQ4bkuJhqFk5NBFXYi+7ZZ59tt956q8Wq0047zTZv3mzHHnvsUR/rrbfecu+njlWpUiU3Lft9991nP//8s0VzUAZEG4INoBjt2LHDTfn8yCOP2EcffWQvv/yym74+4KabbrIHH3zQjjnmmIiWs6TPwKxZPBWMHY1//OMf1rdvX+vQoYN9+OGH9vXXX9vo0aNt2bJl9tJLLxVZeYFYQLABFKPvv//e3SUHLmJdu3a1b7/91j322muvuemfL7744iJ9TU0Pf8cdd1jdunWtYsWK1qlTJ9fUEDBx4kSrUqWKC3qaNm1qFSpUsEsvvdT27NljL7zwgiUlJblamFtuucUOHDgQ/LudO3faNddc4x7T33Tv3t3WrFmT5bWfe+45V4ujx/v06WNpaWnutfKbzl41B/Xq1bOEhAR3Nz9t2rQcz1u1apWrwShXrpy1atXK5syZk2czSuD8pk+fbs2bN3c1FBdccIGr/cjLggUL7IEHHnDBxahRo9xr6X0477zzXG1HSkpK8LlPP/20nXDCCS7I0fsXGojkVgOlcmlf4DMIlFfTfmtKe71Xer3AtOgq/7333uuCHD1Pi/ZpWivVeCQmJrr3qk6dOu4zAqJS2Kd6AxD0888/e5UrV/aWLFni/fTTT24a62nTprn9J5xwgvfDDz8U6DiaIlz/+3711VeHnbX0+uuv90477TRv7ty5bobPUaNGuZktV69e7R7XLLZlypTxzjvvPFcuTTF/3HHHeeeff753+eWXe998842bNVMzhb7++uvB4/7pT3/ymjdv7o67dOlSLzk52TvxxBO9jIwM9/hnn33mlSpVyr2eZp988sknvWrVqnnHHnts8BipqalemzZtgttpaWneMccc47322mveqlWrvLvuusuVLVDWwHnXq1fPe/PNN72VK1e689N7umPHjiyzE+/cuTPL+Wk69YULF3qLFy925b7iiivyfH9vueUWr1KlSsFzycvbb7/tjq1z0zmOHj3ai4+Pd9N35/U5qVzap3KGl
rdTp05u+ne932eeeab7zGTPnj1uaviWLVu62Tm1aN/kyZPde/XBBx94GzZs8L788sscs5kC0YJgAyhmukBpumsFF7rYBqZ2f/TRR92Fvm3btu7CootJXgIXsfLly3sVK1bMsugCHwg2dBHSxW/Tpk1Z/l5TaA8bNix4MdaxQqca/9vf/uamn961a1dwn4IJ7Rdd/PU3n3/+efBxXexVnjfeeMNta/rynj17ZnndK6+8Mt9go06dOt7999+f5W86dOjgpsQOPe8HH3ww+Pjvv//ugo+HHnooz2Aj+/kpONB04nnp3r2717p1a+9wFBD0798/y77LLrvM69GjxxEHGzNmzAg+5/3333f7NB14bu+TKLBp0qTJYQMiIBqUjnTNChBr1JygJUBNAMuXL7cnnnjCTjzxRNecopyDjh072llnnWU1atTI81iTJk1yTQOhrrzyyuD6ihUrXNOHkk6zN60cd9xxwW1V3aspIKBmzZqu2UBNDqH7tm3b5tbV9FO6dGnXJBOg46kZIdAspGaA0PMUnVNojkooJcX++OOPdvrpp2fZr201IYTq3LlzcF3lUPND4HVzk/38ateuHTyX3OhGrCD0mgMGDMhR3scee8yOlJJPQ8snKqOaSXJz2WWX2ZgxY6xRo0auWahHjx7Wq1cv934A0YZvJRBBuugrKVTt/GvXrrU//vjDunTp4h5TgPDll1+6C0helA+hACVU+fLlg+vp6ekWHx9vixcvdv+GCg0klCsSSnkBue1TTkVJlNu55BdQ6L3/7LPP7Pfff8/xt0eiVKnMtLjQ19IxD1fGQHJrfu+3PnsFdDNmzLCPP/7YfY+UX6Lg9WjKDIQDCaJABI0YMcLdlZ5yyimuBkLBRuhFKTQhszBOPvlkdwzdISsoCV1Ue1JYqk1RWRUMBfz000/u4teiRQu3rVqOhQsXZvm77Nuh1ANHSY6ff/55lv3aDhwzYP78+cF1lUPBVPYanqNxxRVXuEDtqaeeyvXxQPKpXjO/8h5//PHu39Bk1PzGCMmLkk9z+y4osFQw+vjjj7tE03nz5rnaLCDaULMBRMjKlStdM8hXX33ltps1a+buhP/1r3+5QEA9LtRj5WjoDl3NKuo1op4VCj62b9/uej6o2r5nz56FOm7jxo3toosusv79+9szzzxjlStXtqFDh7oeL9ovN998s2sGUg8UXRBnzZrlupDm1yX1zjvvtNTUVNfkoZ4oEyZMcBdnjU0S6sknn3Rl0MX+0UcfdT1jrrvuOisqah6666677Pbbb7dNmza55iAFQqp9GjdunJ1xxhk2ePBgV97LL7/cva/dunWzd999195++21X2xAIBk499VTXnblhw4Yu6PvnP/95xOVRk9a6devce6GeOnq/1dymAERlVTORulHr9Ro0aFBk7wNQZCKdNALEooMHD3qnn3666+URStuJiYkuefG5554rkt4oSiAcPny4l5SU5HpO1K5d2+vTp4+3fPnyYAJlaNJmXgmJKSkp3kUXXRTcVg+aq6++2v2tEkOVQBroNRKg3hF169Z1j/fu3dsbMWKEV6tWrTxf58CBA94999zj/kZl1WMffvhhjvN+9dVXvY4dO7oeMi1atAj2/sgrQTT7+U2ZMsU953AmTZrknXXWWa63i5JvlTR63333BY8tTz31lNeoUSNXXiVsvvjii1mOoR4znTt3du+Bkn8/+uijXBNEQ4+pz1X7dL6yb98+75JLLvGqVKni9uucdA7qwaIeKSrbqaeemiXJFIgmcfpP0YUuAJA31YSoxubTTz+NdFEAFCOaUQCEjUZK1UBYGkxMTSgaJCyvPAgA/kXNBoCwUT6DEhd37drlumgqj+OGG26IdLEAFDOCDQAAEFZ0fQUAAGFFsAEAAMKKYAMAAIQVwQYAAAgrgg0AABBWBBsAACCsCDYAAEBYEWwAAAALp/8PeIpPJUg+i0YAAAAASUVORK5CYII=", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "\n", + "plt.figure(figsize=(6, 4))\n", + "sns.histplot(adata.obs['pct_counts_hb'], bins=50, log_scale=(False, True)) # Log scale y to see small RBC populations\n", + "plt.title(\"Hemoglobin Content Distribution\")\n", + "plt.xlabel(\"% Hemoglobin Counts\")\n", + "plt.axvline(5, color='red', linestyle='--', label='5% Cutoff')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "dabf25b2", + "metadata": {}, + "source": [ + "#### Create a copy of the data and apply QC cutoffs\n" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "be603387", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Before filtering: 785021 cells\n", + "After filtering: 784929 cells\n" + ] + } + ], + "source": [ + "# Create a copy or view to avoid modifying the original if needed\n", + "adata_qc = adata.copy()\n", + "\n", + "# --- Define Thresholds ---\n", + "# Low quality (Empty droplets / debris)\n", + "min_genes = 200 # Standard for immune cells (T-cells can be small)\n", + "min_counts = 500 # Minimum UMIs\n", + "\n", + "# Doublets (Two cells stuck together)\n", + "# Adjust this based on the scatter plot above. \n", + "# 4000-6000 is common for 10x Genomics data.\n", + "max_genes = 6000 \n", + "max_counts = 30000 # Very high counts often indicate doublets\n", + "\n", + "# Contaminants\n", + "max_hb_pct = 5.0 # Remove Red Blood Cells (> 5% hemoglobin)\n", + "\n", + "# --- Apply Filtering ---\n", + "print(f\"Before filtering: {adata_qc.n_obs} cells\")\n", + "\n", + "# 1. Filter Low Quality & Doublets\n", + "adata_qc = adata_qc[\n", + " (adata_qc.obs['n_genes_by_counts'] > min_genes) &\n", + " (adata_qc.obs['n_genes_by_counts'] < max_genes) &\n", + " (adata_qc.obs['total_counts'] > min_counts) &\n", + " (adata_qc.obs['total_counts'] < max_counts)\n", + "]\n", + "\n", + "# 2. 
Filter Red Blood Cells (Hemoglobin)\n", + "# Only run this if you want to remove RBCs\n", + "adata_qc = adata_qc[adata_qc.obs['pct_counts_hb'] < max_hb_pct]\n", + "\n", + "print(f\"After filtering: {adata_qc.n_obs} cells\")" + ] + }, + { + "cell_type": "markdown", + "id": "faa4a504", + "metadata": {}, + "source": [ + "### Perform doublet detection\n", + "\n", + "According to Gemini:\n", + "\n", + "You must do this before normalization or clustering because doublets create \"hybrid\" expression profiles that can form fake clusters (e.g., a \"cluster\" that looks like a mix of T-cells and B-cells) or distort your normalization factors.\n", + "\n", + "**Important: Run Per Donor**\n", + "\n", + "Since you have multiple people, you must run doublet detection separately for each donor. The doublet rate is a technical artifact of the physical loading of the machine (10x Genomics chip), which varies per run. If you run it on the whole dataset at once, the algorithm will get confused by biological differences between people.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "7c89ced5", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Data shape before doublet detection: (784929, 29331)\n", + "Running Scrublet on 698 donors...\n", + "Detected 7335 doublets across all donors.\n", + "predicted_doublet\n", + "False 777594\n", + "True 7335\n", + "Name: count, dtype: int64\n" + ] + } + ], + "source": [ + "\n", + "# 1. Check preliminary requirements\n", + "# Scrublet needs RAW counts. Ensure adata.X contains integers, not log-normalized data.\n", + "# If your main layer is already normalized, use adata.raw or a specific layer.\n", + "print(f\"Data shape before doublet detection: {adata_qc.shape}\")\n", + "\n", + "# 2. 
Run Scrublet per donor\n", + "# We split the data, run detection, and then recombine.\n", + "# This prevents the algorithm from comparing a cell from Person A to a cell from Person B.\n", + "\n", + "adatas_list = []\n", + "# Get list of unique donors\n", + "donors = adata_qc.obs['donor_id'].unique()\n", + "\n", + "print(f\"Running Scrublet on {len(donors)} donors...\")\n", + "\n", + "for donor in donors:\n", + " # Subset to current donor\n", + " curr_adata = adata_qc[adata_qc.obs['donor_id'] == donor].copy()\n", + " \n", + " # Skip donors with too few cells (Scrublet needs statistical power)\n", + " if curr_adata.n_obs < 100:\n", + " print(f\"Skipping donor {donor}: too few cells ({curr_adata.n_obs})\")\n", + " # We still add it back to keep the data, but mark as singlet (or filter later)\n", + " curr_adata.obs['doublet_score'] = 0\n", + " curr_adata.obs['predicted_doublet'] = False\n", + " adatas_list.append(curr_adata)\n", + " continue\n", + "\n", + " # Run Scrublet\n", + " # expected_doublet_rate=0.06 is standard for 10x (approx ~0.8% per 1000 cells recovered)\n", + " # If you loaded very heavily (20k cells/well), increase this to 0.10\n", + " sc.pp.scrublet(curr_adata, expected_doublet_rate=0.06)\n", + " \n", + " adatas_list.append(curr_adata)\n", + "\n", + "# 3. Merge back into one object\n", + "adata_qc = sc.concat(adatas_list)\n", + "\n", + "# 4. Check results\n", + "print(f\"Detected {adata_qc.obs['predicted_doublet'].sum()} doublets across all donors.\")\n", + "print(adata_qc.obs['predicted_doublet'].value_counts())" + ] + }, + { + "cell_type": "markdown", + "id": "04d2984a", + "metadata": {}, + "source": [ + "#### Visualize doublets\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "5882e417", + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "sc.pl.umap(adata_qc, color=['doublet_score', 'predicted_doublet'], size=20)" + ] + }, + { + "cell_type": "markdown", + "id": "25f6e3fb", + "metadata": {}, + "source": [ + "#### Filter doublets\n", + "- Question: how consistent are these results with other methods for doublet detection? https://www.sc-best-practices.org/preprocessing_visualization/quality_control.html#doublet-detection" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "dd9b6443", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "found 7335 predicted doublets\n", + "Remaining cells: 777594\n" + ] + } + ], + "source": [ + "# Check how many doublets were found\n", + "print(f'found {adata_qc.obs[\"predicted_doublet\"].sum()} predicted doublets')\n", + "\n", + "# Filter the data to keep only singlets (False)\n", + "# write back to adata for simplicity\n", + "adata = adata_qc[adata_qc.obs['predicted_doublet'] == False, :]\n", + "print(f\"Remaining cells: {adata.n_obs}\")" + ] + }, + { + "cell_type": "markdown", + "id": "71f3bbaa", + "metadata": {}, + "source": [ + "#### Save raw counts for later use" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "00da60cb", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/var/folders/r2/f85nyfr1785fj4257wkdj7480000gn/T/ipykernel_48619/871672687.py:2: ImplicitModificationWarning: Setting element `.layers['counts']` of view, initializing view as actual.\n", + " adata.layers['counts'] = adata.X.copy()\n" + ] + } + ], + "source": [ + "# set the .raw attribute (standard Scanpy convention)\n", + "adata.layers['counts'] = adata.X.copy()" + ] + }, + { + "cell_type": "markdown", + "id": "d97842a3", + "metadata": {}, + "source": [ + "### Total Count Normalization\n", + "This scales each cell so that they all have the same total number of counts (default is 
often 10,000, known as \"CP10k\")." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "2d2d9b0c", + "metadata": {}, + "outputs": [], + "source": [ + "# Normalize to 10,000 counts per cell\n", + "# target_sum=1e4 is a common choice for 10x data\n", + "sc.pp.normalize_total(adata, target_sum=1e4)" + ] + }, + { + "cell_type": "markdown", + "id": "0efc2045", + "metadata": {}, + "source": [ + "### Log Transformation (Log1p)\n", + "This applies a natural logarithm to the data: log(X+1). This reduces the skewness of the data (count distributions are heavily right-skewed) and helps stabilize the variance." + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "d412ddd8", + "metadata": {}, + "outputs": [], + "source": [ + "# Logarithmically transform the data\n", + "sc.pp.log1p(adata)" + ] + }, + { + "cell_type": "markdown", + "id": "bd5a1cde", + "metadata": {}, + "source": [ + "### Select high-variance features\n", + "\n", + "According to Gemini:\n", + "For a large immune dataset (PBMCs, ~1.2M cells), the standard defaults often fail to capture the subtle biological variation needed to distinguish similar cell types (like CD4+ T-cell subsets).\n", + "\n", + "Here are the reasonable parameters and, more importantly, the **immune-specific strategy** you should use.\n", + "\n", + "#### The Recommended Parameters\n", + "\n", + "For a dataset of your size, the **`seurat_v3`** flavor is generally superior because it selects genes based on standardized variance (handling the mean-variance relationship better than the dispersion-based method).\n", + "\n", + "* **`flavor`**: `'seurat_v3'` (Requires **RAW integer counts** in `adata.X` or a layer)\n", + "* **`n_top_genes`**: **2000 - 3000** (3000 is safer for immune data to capture rare cytokines/markers)\n", + "* **`batch_key`**: **`'donor_id'`** (CRITICAL)\n", + " * *Why?* With 1.2M cells across many people, you have massive batch effects. 
If you don't set this, \"highly variable genes\" will just be the genes that differ between Person A and Person B (e.g., HLA genes, gender-specific genes like XIST/RPS4Y1), rather than genes distinguishing cell types.\n", + "\n", + "#### The \"Expert\" Trick: Blocklisting Nuisance Genes\n", + "In immune datasets, \"highly variable\" does not always mean \"biologically interesting.\" You often need to **exclude** specific gene families from the HVG list *after* calculation but *before* PCA, or they will hijack your clustering:\n", + "1. **TCR/BCR Variable Regions (IG*, TR*):** These are hyper-variable by definition (V(D)J recombination). If you keep them, T-cells will cluster by **clone** (clonotype) rather than by **phenotype** (state).\n", + "2. **Mitochondrial/Ribosomal:** Usually technical noise.\n", + "3. **Cell Cycle:** (Optional) If you don't want proliferating cells to cluster separately.\n", + "\n", + "\n", + "\n", + "#### Why 3000 genes instead of 2000?\n", + "Immune cells are dense with specific markers. The difference between a *Naive CD8 T-cell* and a *Central Memory CD8 T-cell* might rest on a handful of genes (e.g., *CCR7, SELL, IL7R* vs *GZMK*). If you limit to 2000 genes in a massive, diverse dataset, you might accidentally drop a subtle marker required to resolve these fine-grained states." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "5a64c2c4", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Blocked 0 immune receptor genes from HVG list.\n", + "Final HVG count: 3000\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/Users/poldrack/Dropbox/code/BetterCodeBetterScience/.venv/lib/python3.12/site-packages/scanpy/preprocessing/_pca/__init__.py:226: FutureWarning: Argument `use_highly_variable` is deprecated, consider using the mask argument. Use_highly_variable=True can be called through mask_var=\"highly_variable\". 
Use_highly_variable=False can be called through mask_var=None\n", + " mask_var_param, mask_var = _handle_mask_var(\n" + ] + } + ], + "source": [ + "\n", + "import scanpy as sc\n", + "import pandas as pd\n", + "\n", + "\n", + "# 2. Run Highly Variable Gene Selection\n", + "# batch_key is critical here to find genes variable WITHIN donors, not BETWEEN them.\n", + "sc.pp.highly_variable_genes(\n", + " adata,\n", + " n_top_genes=3000,\n", + " flavor='seurat_v3',\n", + " batch_key='donor_id',\n", + " span=0.8, # helps avoid numerical issues with LOESS\n", + " layer='counts', # Change this to None if adata.X is raw counts\n", + " subset=False # Keep False so we can manually filter the list below\n", + ")\n", + "\n", + "# 3. Filter out \"Nuisance\" Genes from the HVG list\n", + "# We don't remove the genes from the object, we just set their 'highly_variable' status to False\n", + "# so they aren't used in PCA.\n", + "\n", + "# A. Identify TCR/BCR genes (starts with IG or TR)\n", + "# Regex: IG or TR followed by a V, D, J, or C gene part\n", + "import re\n", + "immune_receptor_genes = [\n", + " name for name in adata.var_names \n", + " if re.match(r'^(IG[HKL]|TR[ABDG])[VDJC]', name)\n", + "]\n", + "\n", + "# B. Identify Ribosomal/Mitochondrial (if not already handled)\n", + "mt_genes = adata.var_names[adata.var_names.str.startswith('MT-')]\n", + "rb_genes = adata.var_names[adata.var_names.str.startswith(('RPS', 'RPL'))]\n", + "\n", + "# C. Manually set them to False\n", + "genes_to_block = list(immune_receptor_genes) + list(mt_genes) + list(rb_genes)\n", + "\n", + "# Using set operations for speed\n", + "adata.var.loc[adata.var_names.isin(genes_to_block), 'highly_variable'] = False\n", + "\n", + "print(f\"Blocked {len(immune_receptor_genes)} immune receptor genes from HVG list.\")\n", + "print(f\"Final HVG count: {adata.var['highly_variable'].sum()}\")\n", + "\n", + "# 4. 
Proceed to PCA\n", + "sc.tl.pca(adata, svd_solver='arpack', use_highly_variable=True)\n" + ] + }, + { + "cell_type": "markdown", + "id": "2055120b", + "metadata": {}, + "source": [ + "### Dimensionality reduction" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "93802dfb", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "2025-12-20 10:37:58,784 - harmonypy - INFO - Computing initial centroids with sklearn.KMeans...\n", + "2025-12-20 10:38:23,074 - harmonypy - INFO - sklearn.KMeans initialization complete.\n", + "2025-12-20 10:38:25,994 - harmonypy - INFO - Iteration 1 of 10\n", + "2025-12-20 10:49:03,832 - harmonypy - INFO - Iteration 2 of 10\n", + "2025-12-20 10:59:50,478 - harmonypy - INFO - Iteration 3 of 10\n", + "2025-12-20 11:10:24,641 - harmonypy - INFO - Iteration 4 of 10\n", + "2025-12-20 11:20:50,945 - harmonypy - INFO - Iteration 5 of 10\n", + "2025-12-20 11:30:24,097 - harmonypy - INFO - Converged after 5 iterations\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Harmony integration successful. Using corrected PCA.\n" + ] + } + ], + "source": [ + "import scanpy.external as sce\n", + "\n", + "# 1. Run Harmony\n", + "# This adjusts the PCA coordinates to mix donors together while preserving biology.\n", + "# It creates a new entry in obsm: 'X_pca_harmony'\n", + "try:\n", + " sce.pp.harmony_integrate(adata, key='donor_id', basis='X_pca', adjusted_basis='X_pca_harmony')\n", + " use_rep = 'X_pca_harmony'\n", + " print(\"Harmony integration successful. Using corrected PCA.\")\n", + "except ImportError:\n", + " print(\"Harmony not installed. Proceeding with standard PCA (Warning: Batch effects may persist).\")\n", + " print(\"To install: pip install harmonypy\")\n", + " use_rep = 'X_pca'" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "55f80ce1", + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Reality check: Check if PC1 is just \"Cell Size\":\n", + "\n", + "sc.pl.pca(adata, color=['total_counts', 'cell_type'], components=['1,2'])" + ] + }, + { + "cell_type": "markdown", + "id": "406256c8", + "metadata": {}, + "source": [ + "PC1 separates cell types and isn't driven only by the number of cells." + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "5cde35ac", + "metadata": {}, + "outputs": [], + "source": [ + "# 2. Compute Neighbors\n", + "# n_neighbors: 15-30 is standard. Higher (30-50) is better for large datasets to preserve global structure.\n", + "# n_pcs: 30-50 is standard.\n", + "sc.pp.neighbors(adata, n_neighbors=30, n_pcs=40, use_rep=use_rep)\n", + "\n", + "# 3. Compute UMAP\n", + "# This projects the graph into 2D for you to look at.\n", + "sc.tl.umap(adata, init_pos='X_pca_harmony')" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "ce2bc327", + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "sc.pl.umap(adata, color=\"total_counts\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "2209ee21", + "metadata": {}, + "source": [ + "### Clustering\n" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "35247ba6", + "metadata": {}, + "outputs": [], + "source": [ + "# 4. Run Clustering (Leiden algorithm)\n", + "# We run multiple resolutions so you can choose the best one later.\n", + "#sc.tl.leiden(adata, resolution=0.5, key_added='leiden_0.5')\n", + "sc.tl.leiden(adata, resolution=1.0, key_added='leiden_1.0',\n", + " flavor=\"igraph\", n_iterations=2)\n" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "1e9ab973", + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAABWAAAAGvCAYAAADVBnyvAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjgsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvwVt1zgAAAAlwSFlzAAAPYQAAD2EBqD+naQABAABJREFUeJzs3QV8XGW6BvDnnDM+mXgaqRvQlhaKU6y4LL6Lu+tiiy/uLssuuotedGEXWFhkcXcvlLo3buMzR+7v/dIJSZq0TRvP87+/3GTszJlJ7+XNM+/3fprjOA6IiIiIiIiIiIiIqMvpXX9IIiIiIiIiIiIiIhIMYImIiIiIiIiIiIi6CQNYIiIiIiIiIiIiom7CAJaIiIiIiIiIiIiomzCAJSIiIiIiIiIiIuomDGCJiIiIiIiIiIiIugkDWCIiIiIiIiIiIqJuwgCWiIiIiIiIiIiIqJswgCUiIiIiIiIiIiLqJgxgiahLaJqGq666qvnyo48+qq5bsGBBr54XEREREVFft7a1s9Tf8ri1qdeJiKjnMIAloj7jk08+UUVhfX19b58KERERERF1g+uvvx777rsviouL1yoUTiaTuOiii1BWVga/348tt9wS//vf/7rtfImIugIDWCLqUwHs1VdfzQCWiIiIiGgNXHbZZYjH4+hv5/zll19i6tSpa/X4Y489FnfccQeOOOII3H333TAMA3vttRc++uijLj9XIqKu4uqyIxERERERERFRj3G5XOqrP5k/fz5GjRqF6upqFBUVdeqxX3zxBZ555hnceuutOP/889V1Rx99NDbccENceOGFqqGDiKgvYgcs0SC3dOlSnHDCCWoJj9frxejRo3HaaachlUqp26Ub9ZxzzsHw4cPV7ePGjcPNN98M27a79Dxk6dEFF1ygfpZzkOVImTlYO+ywAzbaaKN2H7f++utj9913Vz/LfeUxt912G+68806MHDlSLUuSx//0008rPXbmzJn4wx/+gPz8fPh8Pmy22WZ4+eWXu/R1ERERERGtjddeew3bbbcdgsEgQqEQfve732HGjBmrnQErS/TPPfdcFW7K42S5/5IlSzr8W+D4449X4wCk1p80aRIefvjhVvd577331HM899xzanzAsGHDVO288847Y86cOZ1+XRK+rq3nn39edbyefPLJzdfJucjfM59++
ikWL1681scmIupO/eujMiLqUsuWLcMWW2yhQlYpYjbYYANVhElhE4vFYJqmCi/lulNOOQUjRoxQnypfcsklWL58Oe66664uO5cDDzwQs2bNwtNPP63C08LCQnW9FI5HHXUUTjrpJBWiyqfbGbJ0SR4jy5haevzxxxEOh3HGGWcgkUiopUk77bQTfvzxR1VcCilet9lmGwwdOhQXX3yxKmylqNx///3xwgsv4IADDuiy10ZERERE1BlPPPEEjjnmGNVoIM0PUpvfd9992HbbbfHtt9+uMsQ88cQT8X//9384/PDDMW3aNLzzzjsqvG2roqICW221lQpXzzzzTFV3S+grYWZjY6Nqwmjppptugq7rqvO0oaEBt9xyixoD8Pnnn6OnyGtfb731kJ2d3ep6+ZtGfPfdd6pxhIioz3GIaNA6+uijHV3XnS+//HKl22zbdq699lonGAw6s2bNanXbxRdf7BiG4SxatKj5Ovl/J1deeWXz5UceeURdN3/+/DU+n1tvvbXdx9TX1zs+n8+56KKLWl1/1llnqfOLRCLqsjxOHu/3+50lS5Y03+/zzz9X15977rnN1+28887O5MmTnUQi0eo1T5s2zRk/fvwanzMRERER0bpqWTuHw2EnNzfXOemkk1rdp7y83MnJyWl1vdTfLf+s/+6779Tl008/vdVjDz/88JXq9RNOOMEpLS11qqurW9330EMPVc8Ti8XU5XfffVc9dsKECU4ymWy+3913362u//HHH9fqNVdVVa10TqszadIkZ6eddlrp+hkzZqhj3X///Wt1LkRE3Y0jCIgGKRkh8OKLL2KfffZRS+/bkk/C//nPf6plT3l5eWpGU+Zrl112gWVZ+OCDD3rkXHNycrDffvup7timrBfq+Z999lnVsSrdqy3JddLZ2vITcdkd9b///a+6XFtbqzoBDj74YNUpm3ldNTU1qstg9uzZquuXiIiIiKin/e9//1Mr1A477LBWNbgsvZea9t133+3wsZl696yzzmp1fdtuVqmpZdWX/C0gP7d8HqmHpcP1m2++afWY4447Dh6Pp/my/J0g5s2bh54iG47JqIS2ZAxB5nYior6IIwiIBqmqqiq1tKjlkv62JIj84YcfOhyOX1lZiZ4iw/UlcP3www+x/fbb46233lLLpmQ8QVvjx49f6TpZqiQjBoTMqpJC8/LLL1dfHb22liEuEREREVFPkBpcyAit9rRdft/SwoUL1ZiAsWPHrrRvQtu/BSTkffDBB9XXmtT6Mo6sJWnSEHV1degpsr+DzLhtS8aOZW4nIuqLGMAS0Sq7ZHfddVe1o2h7JNTsKfJJvMxvlXlWEsDK95KSEtWN21mZDcRkflVmA6+2ZLMxIiIiIqKelqlVZQ6s1LttuVyuLnuOI488Us2abc+UKVNaXZYO3PZkVqj1hNLS0nZXqsn+FEI2FiYi6osYwBINUtLVKp+ey8ZWHZFPziORyFqFnGuj7Q6ubQs+2Ujg0UcfVRsRyPgE2ZirvUIw0zXQkmzWldmsYMyYMeq72+3usddGRERERLQmMt2rQ4YM6XStOnLkSBWuzp07t1XX66+//rrS3wKhUEiN9epP9fDGG2+sRjDISr6WncCZjcDkdiKivogzYIkGKVmaJLNS//Of/+Crr75q95NsmZH66aef4o033ljpdlmyZJpml55TZparHLs9Mm5AljidcsopKhiWT+zbI+Fsy0/Gv/jiC1WU7bnnns3F7PTp0/HAAw80f1redkkWEREREVFvkBVaEi7ecMMNSKfTnapVM/XuX/7yl1bX33XXXa0uSxPD73//ezUHtr2GjL5QD8s82pkzZyIWizVf94c//EGFxi3HJshIgkceeUTNxx0+fHgvnS0R0aqxA5ZoEJOi7s0338QOO+yAk08+GRMmTFCBpGy+9dFHH+GCCy7Ayy+/jL333hvHHnssNt10U0SjUfz44494/vnnsWDBAhQWFnbZ+cjxxZ///GcceuihqkNVNgbIB
LNTp05VM2vl/ORcN9lkkw7HB2y77bY47bTTVEEmBWdBQUGrUQp/+9vf1H0mT56sOmmlK1ZmykrgvGTJEnz//fdd9rqIiIiIiNaUhK/33Xefaj6QelfqYulYXbRoEV599VVss802+Otf/9ruY6UDVDbvuvfee9VGWtOmTcPbb7+t9kBo66abblLdpBJcSj08ceJEtVmtbL4l+y3Iz91BRivIrNpMsCob+1533XXqZ3nN0sUr5DVeffXV6hyleULIuR500EG45JJL1Ixaqfsfe+wx9XfJP/7xj245XyKirsAAlmgQk02mpDNUNqJ68skn1VIeuU4+OQ8EAmqX0/fff18FtRJ6Pv7446oglNmvUgzl5OR06flsvvnmuPbaa3H//ffj9ddfV8un5s+f3xzAZjbjkiC1vc23Wt5HOnwleJXCbIsttlAFnMyMypACUzp/5XXIWIOamhrVGSsh7xVXXNGlr4uIiIiIqDNk9JbMM5WQ9NZbb1VNBVKnb7fddjjuuONW+diHH35YBbZS38vKMNnMS4Lbtt2hsr+CrBS75ppr8K9//UuFttK0MGnSJDXyq7tIUCp/Y2RIwCpfQhokMgFsR+RvEvn7RYJcWR0ns2pfeeUVtU8EEVFfpTk9OTGbiGgd3X333Tj33HPVp9xtd2KV60aPHq2KVNlgi4iIiIiIiIiot3EGLBH1G/J5kXxiLiMT2oavRERERERERER9EUcQEFG3kw2z5GtVZJmUbAbQHpk7K7NoZWmSzJ996aWXuulMiYiIiIios+LxuJo5uyr5+flqxBkR0WDEAJaIut1tt92mZq2uisx6HTVqVLu3yS6sMgcrNzcXl156Kfbdd99uOlMiIiIiIuqsZ599drWzaVtupkVENNhwBiwRdbt58+apr1WRgfs+n6/HzomIiIiIiLrG8uXLMWPGjFXeZ9NNN0VeXl6PnRMRUV/CAJaIiIiIiIiIiIiom3ATLiIiIiIiIiIiIqKBNAPWtm0sW7YMoVAImqb1xikQERFRHyYLdMLhMMrKyqDr/LyYqCexViciIqLVYb3eDwJYKeiGDx/eG09NRERE/cjixYsxbNiw3j4NokGFtToRERGtKdbrfTiAlU/TM7+k7Ozs3jgFIiIi6sMaGxtVAJSpGYio57BWJyIiotVhvd4PAtjMUiYp6FjUERERUUe4/Jmo57FWJyIiojXFen3NcEgDERERERERERERUTdhAEtERERERERERETUTRjAEhEREREREREREXUTBrBERERERERERERE3YQBLBEREREREREREVE3YQBLRERERERERERE1E0YwBIRERERERERERF1EwawRERERERERERERN2EASwRERERERERERFRN2EAS0RERERERERERNRNGMASERERERERERERdRMGsERERERERERERETdhAEsERERERERERERUTdhADuIRFKRVpeXNC5BOBnutfMhIiIiIqLf2Amz+WfHtGHWJ2HF0716TkRERLTuXF1wDOpDbMeGrunNP8fSMdQl6hB0BxFJR1Abr1XXp+00loWXYUT2CIS8od4+bSIiIiKiQcGxHWi61hy4ah4D6aoYNJ8BOA6scBJw6UjVxmG4dOh+Dwy/u7dPm4iIiNYBA9gBwLRNuPSmX+W8+nlIWknAaQpgy6PlKmz1G34VuKacFPI8eSgOFWNm7UxUxapUSCvXDwkMQY43B1merN5+SUREREREA4ZjOdCMptA1sbgRmlcHLMAKp+CYacDR4EBDalkEus+Au9APK2kjmUjDZWgwoym4Qx7A0GAEPdA9Rm+/JCIiIuoEBrD9lOM4iJkxGJqBmngNSoIlSNkplGWV4aMlH6HMX4ZPl32KqmgV3C43quPVCBgBlMfLodkaxhWMU49bXL8Yny/9HIdMPER1yspxZVRBSVZJb79EIiIiIqJ+SWpqTdNgRVPQXAbslAXNrcHwuaG7dFjhNKyEhfjCOjVqAG7Aqk9LDwWchiiQG0BgaDZiM
6th5PuhL44ge/NSGEE30nVxuHN80H38U46IiKi/4H+1+ykJX+PRKhTmjlLjBX6o+gFF/iJ8V/Ed5tTPwRPLnkCOyw/N8KIqXgW/y4/6eD3MdBQNZkJ1wzYkG5Dnz4PX7cUHSz5Aob9QHWtc3jhke7NV0SiPIyIiIiKiNeekbdgSrNoOHNuG5tGRWBqGK8eL5PIIkssakfy5CigLAXFTlrQBKQuw04Bs27C8AeHqBuhxCWbjsLL9iHy7HN4R2UDSgr5+oXoehrBERET9A/+L3U9JUFqtV8OKVSCejquRAw98+wA8mgdxO666WGUcgQceOEgjARMOHMQQU49vTDTCgoWaWA0KtHwsCy5THbI5nhw0Jhsxv2E+RuaM7O2XSURERETU76gRAR4DqcoYNI8NO2nBbIij7vOlcGUZMGsSgOkAsxsBmSYgra/J1sfQlsmVQAoaPFLNF/qh1yTgLg0iVR+HK2HCKwEuERER9XkMYPuhimgFUlYKny/+HEVZRbAsC3Pq5mBy0WQ8+sOjiCVjiCKKhJ0Amvbj+o0tFaGMnLJUoefVgLATwXvL3sPMupnYvHRzFPgKsPPonZHl5ixYIiIiIqLOzntN18ZV92tibi00mdka8iAxvxHeoAfJGRVAfVNNLrW5k1bjYGEiDR/cSCIFGw588ECDBgM27DoN+L4W8SBgTi2BrzAI11BPb79UIiIiWkMMYPsR6WqVDbKk+3Vhw0IUBgtRHi5Xm2ktiSxBNBlFRbICKaSaHtAyfJUP0LU212lA0rYBven+Mqpgds1s5A7NxesLXke+Px/FweKefZFERERERP2QnTDVSADZbEs20krXJGBpDtKLG+HAQrq8EWhMNoWv6gHS3WqqLxfcMGEiASCOOPKQgwpUwwNDlfFSxLtNL0IxH9IL6yFbdum6BsPvgivX16uvm4iIiFaPAWw/IuGrhLAy/zVtpjGzeiZqkjVY1LAItbFaLEss+y18zZClTZrWFL42bbzaRAWvetPXCjKeQEYP1MarsGHhFHyx7AtMLJqoOmEZxBIRERERdSwzj1WCWBk5EJldA8PQYEdSMGtiQF2yab7rClGkYavw1VDdrzI0LA23uq4WtchFDuoRVRMKNFgwYSGVNuFZnECqIQFnQ0APeGBG0vCUBNXmXkRERNQ3MYDtR2Q2a8pO4aMlH2FZ4zK12ZadttHYWIlysxoJ9Zl5G4YGWCagrdmvuhGNcFIOljTOQ9LaUgW7gexA178YIiIiIqIBxJENt1KW2mzLrI2pn2OVUSBpAtUJaW1VGpCEjbQaOiDRaloNIJBOCQdxJBCAgTgsRFCNLEh3q6Oq/CQakIIfHtOPvJQP6UX1SA4NwOtu2WVBREREfRE/Ju0nTNvE8shyVEYrEUlE4E3EUBYqwy+1vyCcqkbMadpcC1bTIqWmny0glVQdsDI7qvk6+cp0vjpO02X7t8dFEEFtKoL/zvkvahI1mF83Hz9U/YBYesVzEBERERFRK3Y0hXRDElYsCasxBbfXDbh1mfOFVNxEBAnUqa7XmNokNyazXVUUK9dYiKseV7nFgR8G3PAgDBM1iCCKGBJqOqwJNxxUxSJwqpOIza2HY9qIz6lBYmFDb78FRERE1AEGsP2ES3ehJFiCpeGlqInXIOgbAi1SB90xUalFfxsvoDlN814lb5XRAx4PEEnAksA1E77K9aZ84r5CJoyVsQTqM3YH1akaJM0kFjQsQI4vB1OKpiDgZicsEREREVF7jJAXukeXTRaQjiaRlro7kZZZA3CrrlYbFsKqVE8irUaHSawaUZtuyYgBU226FUcKi1CHatSr/9HhUsGrtFQ0IIzlqIVtJ7EoUQFNdyFVHYO7LAu+kTm9/RYQERFRBziCoB+paVgEj+FBZXgJPEY1Pqj+FpblIGRpqHe1mOuaMoF0CtCNpsA15AcSCcC0AK+36WCG0dT9Kpo7YrUVAa0Or66rjb6G1g5FjicHhmFgfO54GHJMIiIiIiJaSXJxW
HW9pqoT0NwpoLxpBVkCDgwkVMCaUN2u6RXjBzSkkEQSgA8GokjBp+JaDVkIqGEFsimXC7qaBSsdsgF4Va9F0PSiYeYS2IkieEuz4ERMeMtCvf0WEBERUTvYAdtPVDQsRrp+Ib5c/iVKXFlYlo6gzFeAHN392z5a8oNpNwWxPn9TyOr3NnXDSiesR5ZBuVTIqknQmkyuGD9gN3XFOnbTYzQNjXYcebofo3JHoSHVgHA8DHtFh6xsBEZERERERGjeeCtRHkaqPo7okgboIQ8c2Ych20C9J4plqIYOLxqRVGGsjB6QWbDS5SqXJXSVVWgSscqoAQlca9GoRoOFVTwrnTNyH9mzQa6TgQU2fD4vArlZSC7/bUWcY9lw0jJXloiIiPoKBrD9hEszUO3yotgIoipWg0Q8jLn1c7DQLkfStGFIkJo24U+l1P01Kc9UU6sGdzIOxBNAOg00ptT4AW+y6X4qgJXQNdMFq8LYpn8ZVXYYby14C42xRiyPLoe+IunN8mT15ltBRERERNSnaF4DTtKCNz8ALWbBTltIhqOIL0uhLhWBjRQqUaU22Wqa/5pS33VoiKlANYUw0liGBsxGFRoRUxNhJajNR0CFtTHE4YMXHrjU5rvqCGGg/pdlcOqT0GTerJyLoUNzc9UaERFRX8IAtp+oj1WhyOXHUF8eHM1GvRmGyzZhOEDAsWDF08i2ZIlSUyesu7EBfhk70NCgljZJOItUGtCbgteEhKwe729jCOQ+mR8tB17bVnOoKiOVqE5VI5qOIm6u2LqViIiIiIia2XETMB1Y2QbSbhsIpxCPJdCIRgThQVzNeNXggY75KIcNB3WIwK3+HDPUGALpi5Umilz4EFV9sBYakUD1ikvSCStdrzIz1gsvsqDDTqbgjQJxO4Z0XQJ2ssU+D0RERNRnMIDtJ3xGCCF/Eapi1XDiYQzVPSjyl8LQgJjmQpYB+KJhZEknq5lGynAhLh2vALypZNMoAhkzIB2utgOX3BRONIWy6joZRbBiHIFjq4VOsjXApKJJ8MCD0lApKmIV6ngcQUBERERE9Bsps42QB1rcggsuxOw4NJcLZm7TXFc/dBWyViIKN1yIIg4NOqpU92tM/SxjBSRY9alpsNIZK/NjTRXaSjAbggflaFC77crU2AhkzICNaAEQzM2BFTObTmTFSAQiIiLqOxjA9gMJMwGfZmNZ3WxoXj/GBoag2kygMJmGLwHoasyAhRrDjZSZApIJQGZOaU0bbqWlyzUWA1wGQhLQOg4cRzbpkp27nKZZsBK8Gi7AJV+G6qKV/zFTJjYt3RSL6hehPFyOmngNRxAQEREREbVgmzYc20G6KgatwA3NAzS6YyoIlYECS1GPBahSQwVk/MCKR6mfZf6rjByoVQMKUqhDvVqdJh2zCbUmzYIBl+qY9cBQ4wrccMMj82CNBAKBIDSXDrMhATOSVM+p+7jXMhERUV/CALYf0NIueLNLsMhJwgrXYaw3hLJYBNWJZQjGk0jHk/DapkwOgCG/Us2NkAO47ASk9HLLJ+Eut5oBG5YdU9MpWBK0mpGmDbnkNtNEIBoBkmkEkk1D+6fmbATNBVTWzIGmaSgOFsPn8sFpHltARERERES621AhqMvtQnxZI9yeALwxA07CVuMDQvChADmoh8yDtVVXbCPSKEGO2nyrqcNVYlUdbngRRgJpWPBLNy0SqFJbciUhQ8Ky4FGzYOV+BcXZMGUcWUMaRsjdvBEXERER9S0MYPsBrxaDFqvDLsWbY7yRh0BWMWJuH4I2UKLZGGnqSEBDELLkyUa2k4ZP5kc5DmTxUVI22TJlyH/TLzwqna8ynsBx1CfnXgloPR7EdLfqfI15DbX06eeGXxH0ZCHp8WL6qOnI9+cj6A6qMJaIiIiIiH6jauR8NwKluXC8Fmq8UcSQhA8eRFdsoiUJqQVL1eQuaGhAI7LgQ6O6h6W6YCtQBxd0NRO2UW3OZSIAP7LgVaMK5BgyqkDmyMbDNtwuLwy/B
v+IXLiCXna/EhER9UEMYPuBmOGCz5+PaKQBfieIurqF2BQu5OePQ5ajw9RsFEJHtgMVuhY4Duo1DQ26Dtn/NM9smgFly5d0r9pyWQZVudU/Ab+MJ0hbcHkMlCXkXnKrhgPGHoCklcS4vHEIuUNI22nOfyUiIiIiaofMgJVRAVYshfJkGAXZ+dBzPaqulo24pGdVBgrUIgYTDvwwVFdrJcKq7xWwVCesjCRIIq26XW2k1WNlCy8JaCNq9qts7hWD5dZRuFGZem7vqHwVvsomXPaK1WxERETUdzCA7QcC7gCMRD2CwQByvW5MXH8/jPTkY+vqpVjfchBAGsXpNEotB4WmlG6A5gCFjgPNcZC14hcdcBzkNx/VUXcyNQ11mg44BkxTQ4PfheHIw8TgSEStKLYbvh1yfDloSDYg25vN+a9ERERERG1I16kdk81tHXhGZGPY6KEI2xHZ1RYBuJGChRRMBOHFEGSrUDYFG14EVGesBK8ODPW/5Z4y9VVud+BSYWs9YpDWCpn8WooiePL9yCnNgWUDgVG56hyscAKukBe6V1owiIiIqC9hANtfhEoRiCVROm4L5FYuRXVDDRJFY6EZPhTaGibFo9A0By5NR5WmYajjqEVORY6DRgA++WVrGkrV+ABDBbOj0xbcjg24vCiwHWRrOspc2ShHI3Kzh2JY9jCMyB6BSQWTkOPNgd/l7+13gYiIiIioT9L9LhhZbmQVhmDbNvKi2QhmBVCrJ9XM1ywE1LRXPzxqNEEaptqvwQ8fYqrjFWoTLhk2IPFsAYJqYIHIRY7qiPXAjZgehx2zoee4kRXwwjciB56SIPSgDBcjIiKivogBbD/hxBtg+/PVZ9+pqhnI9+gIRBsxwgI2tNzIMXyYHrUxOpnAREeWNAFDHdXninxV6AHDHAdLAWQ7plSIaNCl4HNgaB64fC6EvCH5kB67hyajNl6FHE8Ocv25iFtx5PpyOX6AiIiIiKgDVtyEO8+PdCwJozyFtCuFRlcCxU4QBUY20pB9GlxqBqwbBrJUhW6jARHokD0dpH3CVtGs1PAS2jYgre6rVrjBQHEoDz6XH4GROQhHwkiXeYCkrVa/GQE37ETT6DEiIiLqWxjA9hPhdBppuxFZmg6zcDMMsYCCSBIl0UaMTKeRazZNiaoyPChKAVMSaeQ5GoZKAGvZyJERBLaDPAfIk7mwdhwF8MAJ5mCI4cAxDHjdeRiaux5m2o3YOH8DFb7menNRGChELB3j+AEiIiIiolVIVcfhzc1COgTkaAFEI2GE9Tgsy0KO6myVGNVAHkLqD7EIYvDChVz4UYhsdZ8c+GBDhwkLhQjCDRfcEtF6ZLtdDfljCpEIh1EwuhielAbf8JDa3sGxbG7ARURE1Efxv9D9RJY/C1aiEUjFUJyYg8qsUhQ7fkS1GEY1LMMC3cBIOwUjGUW9x49qB/DaDkpMG0s9GraIpFHh12E5NpKagRzZnisQUBt2GZ4QcjUPvK4ANtN8CA/dEVPLpmJs3li18ZZlW/Aa3t5+C4iIiIiI+iwnZcOd64UZSSGEACoDEYzRRmIuFiK/IYCU+h8LDtKII6maJ3KQgzCialMuBzaiaoMuX9NGuggh5XYQdAJIu0xoIRdSPsDt82BC6SQEJhYh7bGQTCbh9Xtl3lhvvwVERETUAXbA9hO6NwB3XiksmREwZANkBQIwrBjykwl40hbWt+MwkIbhuFGUNLFDOolRsShsK43ihI18x4QvkcbQtI0C+DBSDyDXnY3NHC+m6VnwaTa29+TByRuNYXnDUCozZ90B5Pvz4XP5YOgc5k9ERERE1BF3kV9twuXK8kALueH2GPC6NAxJZsNU/awWdGhwwUAQPhXBNiCqxg/I2IEaRKDBg0YkoRtu1LqTKPTlocCXg6AniBK9EKOySuBkeZAa7obH54ZHc8OfE1TdsZra64GIiIj6InbA9is2DPmNjdwSDb98gKxANtzLw7ARQ518gq5pcDkyuB9YrGnw6jq2s
Ez8YBiIOhpGmIDPSqPCcBB0GZiS0pFTugMak9XIc+nQ0zEMNxNw0mkU+wpQm2qAV/fCY3hg2iZCnpCaA8tRBERERERErUkAKiMArEgK/qE5cBYuRyJoI5FIqoA1IcGq2p/Bj0ak4JYRYfAjBRtpWAjCDwsmgvBAd2kY6iqCpulwD89CmVEALWaiQcaCBaTad5B2OfAaHsSqGuHJC8BxHLjdTXNgOYqAiIiob2EHbH/iDQG5Q1H36wfIGb4+oikdjVYYsuWWS9OQ7Thq1quM3k9JAAvA5wDTzCTSbkPNj0oGh6BYd8Nvmgj4izCvbjZcmoFiVzaKR+0Ib9EGyM8fi5rwUiTNpJonVR4rR0OiQZ0Cw1ciIiIiovZJ8Kl5DKTrEnAXhhCrjsIPrwpXA/CozbTCSEB2b3DBBT/8anMuLwy4oMOreZHtDqHQyUHYiiEnPxtaowXHsWGGdJSOK0V2SQ5CrgCctIVEOAaXy41IRT309G/nQERERH0LA9j+xkohr7AUiEWRX/8LCqEjC2FEHAd6ChgGQGqvKbaNTSwLAU1DCMDWqSR2QAKbhOsw3jSRlpjW5cWGMPBFbDka88pQGshHfjqN0ZqBZKIehmYgnArDMR01joCIiIiIiFbNSdswQi5olgN/1AMbQBwxeFTIqqERCdXxKvW49MZKk4Tc5oEOv+NCYToE3dKQl5UNXRYspm0V6JaNHAqXx4dQxAs95EGiOoKEmVZjDwyvW5piiYiIqI9iANvvOMCwzREYtwOC+SOgpZMwHCDH0hDQgaqUhmG2DCuQiQUagg7QqGkodhzEnCCGI45s08a2Whby/EE18P+w4qnIbahEypeNhNuNxmQDhuWMxMickRibOxZBbxCV0UqEk+HefvFERERERH2aEXQja3wRAoEAsnK8kPVjQQTV5luy0VYushCCV32XEQRZ8EPL/FmmGWh0x+DRXch1ZcEIuOF4gZIxZahfUotQcS6iqQQMU0N2cR7yigvgCnrgzvIiVRnt7ZdOREREHWAA29/4coBUHFj0KdKBPBjuYoQTQLbhwHYBRW5HjR3IcoChcDDccbCR46Bac6HUimCJr1DNim3U3VhW/i2qfAFEdGD8yO3QmA7D8echqXthGW7MqpmFhY0LkeXOQq4vt/kUZA4sERERERGtTEYAmJEU7KQt87tUY6pMbZUtuAxoah6spYYQWKhHI/IRhA0HDgwYjgGf4UfSa8OuT8FZFgd8biTNNNabvD6saBIB3Q2X3424mUC4sqFp9qtjwJ3nU/NfReY7ERER9Q0cENTf2BZQOQPIGw5r+TdIGgkEvUA0DQRk5VEC8DtANKjBZTtIaoBpAYWGiWojFzG3jnAkgXy/B3m+QniNIjiOG3asBpuN2B4ROwUYksz7sEHhBqhN1CKajiLkDSFhJhi+EhERERGtgoSfZnUcWtCAS9NRiyQ0pNSGui64YaMRyxCTXlm1um0ZatUYgjBiKEMuyu0GpE0guyQXBYlcLM2Kwp8dRGV1FUo2HAlXwkEiEkOwJBe624AVbtq3wfJrsCJJuD3ssSEiIupr+F/n/kY3gMLxgOGGT/PA682DXwcCGgCpveQ36gMKbQe2DXgcIF8Dih0gF/WYGK5BkSuAEsQQdfmwJL8MSw0Di3QdP9fORsAfgunPQ8qwkbJSGBIYgqAnCMuW4f+y06ps7QXUJ+tVIEtERERERK07YI0cD7xBL0wPEPR64YYPluqFdRBDGsNQgkI1fMCDHARQiwgC8GIWlsFlGShEDsykBafYh1I7G5F4FAGXH9VVVYBHhz8vhFQ0Cc3QYGR71agCy7KQdFvQNPnDADAbkr39VhAREdEKDGD7Idkn1QkNBXy5cA8Zp2bAYkUACzfUCAL5UF0zAM0GZPWT3C7ffDDQ6ERhWSbSNXOx/vIfUWoD60Xq4PX4Ea+rQnjpzypgDVhNS5dUB6wnBEM34NKbmqZzPDnNYSwREREREbUOYVMewOcPIpEjf3RpqEQjT
NjQYCCCRrjUQAIHFoB8BNQYgiKEkDDS0E0TemMK1eVVqIk2YEh+IRL1ERXWxpY1Il4XgR1rqtU1XUM8moDf74ff8CKdTsNxSzDrUQ0URERE1PsYwPZDbqSg+YKAPwRbkxIOyNKBhB8IOIBlADmBpvDVMJq6Yy1bQ8iWz9xTyDNNxMwYcvy58GWVqU22sks2RGOyDpZuIFg4THXO1sNWna8BVwCVsUokraT6nvlUXY5BREREREStSQgaGip7KFjI1rMQRwJFyFXNEPIHmBd+pGCqnxsRQXjFBl0ueICUg6Djg2YYCMKHkSNGwIynkDO0AIalwwi5YXk1eLK8iIeb6nG3ZiAajarA1Yyn1fOr7gtTbc1LREREvYwBbH/kywa8IWD93ZDOHQNTy4NlAaVuwNCbGmFl7n6e3jQyNqwDPt2BowM5ahMAE6M1H/J0H+J1i5AfrkXacTDKdDC/6gukHRtDVvzTkLC1Lt6Iykgj8n35anuAJeEl+Kr8KwTdQXUffrJORERERPQbX3YAXp8XyeFuhPL8cMOj+l2lA7YIeUgioQJXmf0qS9XkVhlB4IEGuXe1FkFICyCZZyK5sA6OacPULETqGhFbUA/drcM0LHjdHiQSCVhuB6naGHSPC+lIAlY8jciyemhuadVgvU5ERNTbGMD2R7E6IB0HYvXwjtoE7oJhcLk8iFtAMg1400DSBOrMpjEEWRoQWvHQejXCwKXmt+aloihw+dFgRhF0+xDPH4GyoZurTbwaPEH4XD4sqPgJAXcWJucUozJaiUJfIdy6G+vlr4dwKtx0OuyEJSIiIiJqZtYn4NgOvJoH8RFuaEGXmgCbUrGrBQ0upGGhElH44Iah1qlZqEVCzYrNMX2ImXHYSQtmjhvelAG334esnCwE1s+H7dhw0rYadVC5tAJux0B2fq7qfA3mhmDrDjw5Ppimqc6DnbBERES9iwFsfxTIA0LFMMumwgoWwhqzF1w5I2W+ANw2EHABmg8IpgCfBVnIpOhp6X4FZBFU2vCixjZhhnKQnzcG0WQjvKYGd0MFsgvHozirGD4zhTElU1CgAxE4KHT5UV69ALnefMTNODyGB7XxWtUJG0lFevlNISIiIiLqG1y5PrUxVvH4MpQVlyGwXiF8WVlqFmxMDR9Iw4SDAgQQgA8lyFOXhyCELOmHNdxoRByGqcGbE4Dl0VTgCl2DHnGQW5SPQEEIdsLE8DEjoLl0pMw0PLoL9TV18Lm8gFdHKpJUM2JlmZydkmmzRERE1BsYwPZjLlgwxu4AM2cIMHpb+IduhlqfdKRqyNEAT6DpfjKSwEwBhltTQ/5loZMr3oAhaRvpaDUWJiOINCxEmZNCcfEEeCPV6nFpjx+J2rmImjE4mo4Z5TMQS1WiLlKvljHNrp2NukQdKqIVyPJk9e6bQURERETUxxguF7LGFCBckIa2gR8+XxABeJAFnxo4IJtuSRQr3a858KEBDZCttWTvhZx0AF7ThZrlVUjZJoyIBcPvhjvHp0YMiFQiBStpwoqkEAmHUVNTA03X4URMpMpjcKWAZHUUqWgCuqdpHAERERH1PAaw/VUyrGbB2rYNffQ0YNI+QP4YlARGIi2ffKcB01EfksNrAZobCDoOJCbVoCMBHZ5YNfzROEI5xRhTNAHlZhT13hDi8Tq1XCkaroQvq1RdDplJbDpme3hCZRhbNApBVxC5rlzV/TokMAThZBixNEcREBEREREJmc0qIwLExA0moGzcKGhlHlT6ZEsumf9qqOA1igQqUK9GFPgRRL6KaF2IuhIor69CQSgX/hy/Glkg3aymYyKRSqruV0dzANtBYzqGwqJCFJUMgS/oh6c4qB5jehwYLgNunwdWNNXbbwkREdGgxQC2v5JNuOQXaBhqBiwMN5BdBpROQrB4MtyahkyJJR2wMvpJJj/FLPmlyw6rDTAMvxrYX1oxS+3WpdkWcjx50LNGweVyIejxq+PqvhxEbBO10Wrkp+JAohENqQbMqp2Fe
KoRS+vmojJeCa/hZQhLRERERCQbcfl86ruEqcXDSpFOJuHK82DcpPEwgm7UIII6xJCPLLihw4EGGzYiSCKKOLwxl9rIqyZWD6sqAW92APF0EnZAhz8rAMetwev1wtRteD0eFchGq8PwB3zq54aGRqA6jcaaeqRr4mpcmeAoAiIiop7HAHYgSIaRLNkMGLMtUqN3Q8pxwzVsWtNMWB3wG03dsG4NCKxYeRSCC+FkI8yEjepUAsnsUrhjjaiom9N0h3g9bDmAlUIgFkfA9sM0YzByh6E6XIPhnhxMHToVfk82orBQ5spCeeNCFcISEREREVET6YKtqqrCuKkTMHrCerAkMM3xIzc3F1kIQIOBIciXaBRhNSE2DRsaHIlkow6MOgfw6/AW+BEurwNSNuykiVh9RB3bsiykUin1s8vSAceBnUgjWJwNJ9+FnLICpB0L0XAE6VhKzYslIiKintW0Job6t5yh8KaiwPid4BlWCz1SDu3X/yI5Zj8kF3wAB3Xw6apWk3oMXhXCehEM5gGpGuixbJgz/wfPsE0Q0A0YhgloOry+XETidXAbBuysILJsN3RLh9/nQ1jT4E+bsGMxmOkGLLQdGI6FxbWzMDx/PRg6Z0wREREREYmioiI1kmDkhmORNFMwA3mom1uJWn8U/gogYkfhhVuFrqUoUtt0FQTzoCUdBPJCwII4ooUNyBmSC7/Hh1TaRKggp+ngUQvekF89xje0aWOupGPCrXthORqSlVGkNBOa5iBW3YhgdpbaJIyIiIh6DgPYgcITbPqeSiBdPBX+oROBisXQckrgXfA2kA6rma+AWyZSSaUG2xiKhOOCO1mDZMkEZNsm0LhUjTdIx8PQLRNZwXwgFQPSMbjgwHJswJ2HoNcHODbcWXmobmzA0OxhWNCwAC7Dg7kNc+EzfPAYHpQES3r7nSEiIiIi6jMjCTSPjuwNSjBk9FAsmrcIVYEK5FS6YNom/NEU/FoWkk6tlOvwF3mgLYoBxSG4/G6kXTaidWEYIQ+qq6uRlZUFf24QjmkjbVtqvIDsESGbdXk8HriCOuJ2DL6kC1aOAbM+qTpwGxdVIic/T4XBgdwVf0cQERFRt2EAO9DkDYffcCNlB+AN/QItuQHCyShCQ0YBs99GePlseLRsILsQ3uwh8BsBmGYU6cVfIzl6F3g9XsA24fIGoJkJoOIXYMgGau4rHAtJOw3bbSCRAHS3X818Xb9kqnrqiYUTsTS8FCGXXxWQ8XQc4UQaIZ+EvkRERERENH7i+kjXxRFPJlA0thi5oSykSlJwWxoSC2vgKtcR8A5BsMgPmSlm+xzYQTfqF9TAPyYbjleHxzGQ7cmCbhlIN8ThLgxAi5nQvBrMhAm3y606bj26G/5QAHpR0599lu6DCQs+vweOZcGKxQEGsERERN2OAewAk4qb8GSXQA+HYYzYDHUz3kXWNmfBcaLQ0lGE8oYj5S2CUb8Y0G2kHRd82x4D7/q7I1m1CMjOg520oHt1JCKN8PlygGgNoGlAeDkCOcMATQLYMFzeAuRAR8JMwEybCHj9cKW05k/3U6YO3UghadrwujgbloiIiIhIRgRIh2oo1wdPwIuIqwHeEV7IFLB5yRSMYTpCCCFdEYWW7VEb5RbvvQF02UrXsWF4XDAbk9C9Bux4GlZAgxFNw+V2Ibm4EZ7h2XASptqBV7ptHemKlcsS5zoOzEQa/lCWmhmrpWJIxBJqsy9N13r7rSEiIhqwGMAOMB6/C8lYDOmaGmSFQsiZtDOklEovmQF9+A6o8yxAbskY6OH5wIitgHQa8HoQrp2DkFRpugEtKwBEq+DLLQZiNUBWIdBYDmSXwa6ZBz1UAl9OWdMT+rJhhpchK1SmumENlwepdAr5wXy4DTdMS+bJApFUBFmerN5+e4iIiIiIepUEn/FwDK6YDl9+AN5AU/NCYmkYhaOHqI20nEI3gssC8G9eBKs6CSduwYIJ07BhuA0YEsw6gOZzw2VogK7BNi1oWR4V3
LryfPD5V4w8kOdriMKXHYCua3BHf9uEKxDKbv5ZQlo5NyIiIup63AJzAPIGAnAPGaJ+TtXWwElG4CkeCbNoHLI33gtutwWnZDPYmg5f6QZIpWyECsYD2aVqlmyqsRrIKkbCtJuuE1lFahasVTQRyGo6NsIVTTe5s9TPAXcAQ3KHwOP2NH/C7nF5YMOGz8VB/0REREREwuv1wpXjhWM5SFdFobl0eEuCCA7LQ/5U2WBXhzPaDyMJ+PL8sj8uXCEvAvkhFZLa0XRTB6uhNX136XAFpO624Clp6m51bAdWONX8fPKz7jHgKW4aOeCkbWheQz1WumQZvhIREXUf/ld2gJHuVwlg5UvoviB0vx/JaAqerEJoPh+SmgdGIAgzEYFH14GsgqYH+7JVoWYYeYCuN48SaDqQASdYCpcsTYosRzKVgtulQ49WAzIr1t30fKlUCgFX08+yREpzNPhdMr+qCTthiYiIiGgwk9msmTo7mUzCXRRQNbjtOAgWhlSLTHBkvrpd86wISO1Uc0AqnaoydkB2Wchclxkf4M2SsFZDqjzSFMI6DuzaGHTDaPV4dZvlwJHRYysC3PbOj4iIiLoGO2AHmEzwKrNgha6nkU4moekm0ol6pBNxaJ4sOLYNX3Y+rHQa7mAuovPmobahUhVsrux25rUmGqC5DWiGLhUevEWjoftzAcMDBIcAgaYiUXZb1SXUlSVN7gDSVrrVYSSMlVEFRERERESDUSbclKBTOlOduKkCWE3T1ExXO26psQESvkrHqmPa0Nw6UhVRxOqjKjBtG5DaKUsdIxOy6n43XLk+1SHryvFBD7rVzFi0DG1XdM9m5sNmaLbdQ+8EERHR4MEO2AE+C1Z+xZpjw+MPAPK1gm03haRurw9mOg2tsBD52b/NgFqJbMa1QsqbD08y3HSdI8OnOh7Y7zf8qitWgllh6AZgelAfSyE30HQdEREREdFgIyGqhJ/S5ADLgeF1AfK1ggSqQkJY23bgyvXC0+L2luQ+LTloeqwR8qpgt7MNHclYVKLY5uYOIiIiWjfsgB2gknGzeRSBx/fbCAAJTJN1ldAlCF0hhTS8WcHmEQGrY7ikOAyteHAEMFNAOtHqPvKJvrqvYTSHrxkBj6tV+BpNS4FHRERERDR4ZEYBqK8V3aki3hBTowMyYwUy9BXha9uO1fa4sppWtMls144ek7muvdmv3kCwOXy1LQtmqmmWLBEREa0dBrADlNffVEhZZusRAKmEBfiykIw2Ba3SJaslLNhmUwG2JvNZJVT97YlCgMsDOWxLbZdFZQLZ9mRmxhIRERERDRaZ4NNcUYcLO2nCl+1XIWzzdQkTdvK3YntNNsuS8QItQ1v1mERju8/f8nnaPU/DgOGWibNERES0thjADnCydEk+QW85msArowhkK9UVS4x0Q4fhWreianWD+ld1e2eXRRERERERDRQta3UJTKU2DuQGmwNRtRGXz2h1v7XiW8W4sdUEu6zXiYiI1g0D2AHO5fGojbea5sE2ScVjiNQ1tlpihHT3D9tfVRcsEREREdFgZK/Y9CoTuMrlZDyBeDiuLssoApkPmxkn0K3nsgbjDYiIiKjzGMAO0PmvLckGXGqY/orrDbhRMLSk+fZ0MtG8PKkrWJa1Vl2yRERERESDQcug0+v1Nnegqrmwug4XDASLQr/dP2V1ab3uWO03X6zJeAMiIiLqPAawA3j+a1uOtiKAdZny0Xrz9ZnxA027nXbdp/hERERERNT5oFMz9HYvd1mHqr2O4wyIiIioUxjADiLNHaiGR31L1i5rHqyfGUXQFWMC3BzST0RERES01sGs5tJVl2qsPtp6U60VXbLrSnO32FSXiIiIuh0D2MFIN2TdEbw5Q1a6qdWYgGSkVadsd+J8WCIiIiIiNM99lS5V2YxrVd2zPTmzlfNhiYiI1h4D2MHKcMNG0yfpjm0j3V4A6s0C9J75J8L5sERERERErbtUHctZZfjZkzNbOR+WiIho7TGAHcQsM
62+a7oO94oAlJ9sExERERH1DZnNsnqr65WIiIi6BgPYQSIZX7lQc3tX7jrlJ9tERERERD2vvWBV96w8q5X1OhERUf/DAHaQ8PpZqBERERER9VUMVomIiAYuBrCDVDLWtKMqERERERH1PRw1QERENHAwgB2kvIGVd1QlIiIiIqK+gR2xREREAwcDWCIiIiIiIiIiIqJuwgCWiIiIiIiIiIiIqJswgCUiIiIiIiIiIiLqJgxgiYiIiIiIiIiIiLoJA9hBIBnnDqpERERERH2RYzuwk1ZvnwYRERF1I26tOQh4/fw1ExERERH1RZquQfMavX0aRERE1I3YAUtERERERERERETUTRjAEhEREREREREREXUTBrCDgGWm4ThOb58GERERERG1wzHt3j4FIiIi6kYMYAcJTdN6+xSIiIiIiIiIiIgGHQawg4Dhcvf2KRARERERUQc0F/8sIyIiGsj4X3oiIiIiIiIiIiKibsIAloiIiIiIiIiIiKibMIAlIiIiIiIiIiIi6iYMYImIiIiIiIiIiIi6CQNYIiIiIiIiIiIiom7CAJaIiIiIiIiIiIiomzCAJSIiIiIiIiIiIuomDGCJiIiIiIiIiIiIugkDWCIiIiIiIiIiIqJuwgCWiIiIiIiIiIiIqJswgCUiIiIiIiIiIiLqJgxgiYiIiIiIiIiIiLoJA1giIiIi6tD06dNxzjnndHj7VVddhY033rjD2x999FHk5uZ2eP9jjz0W+++/P/oiTdPw4osvqp8XLFigLn/33Xe9fVpERERE1M8wgCUiIiLqJ9qGmf3BIYccglmzZvX2aRARERERrZG//e1vGDVqFHw+H7bcckt88cUXWFcMYImIiGhQsmwHn86twUvfLVXf5fJgkk6ne+R5/H4/hgwZ0m3HdxwHpml22/GJiIiIaPDU688++yzOO+88XHnllfjmm2+w0UYbYffdd0dlZeU6HZcBLBEREQ06r/+0HNve/A4Oe+gznP3Md+q7XJbru3Mp/1lnnYULL7wQ+fn5KCkpUcvxW7rjjjswefJkBINBDB8+HKeffjoikYi67b333sNxxx2HhoYGtRRevjKPb7lUPkM6ZaVjtuXyeSkod9hhB/Vp/pNPPomamhocdthhGDp0KAKBgHrup59+ep1e59y5czFmzBiceeaZKhztbNeubdu48cYbMXr0aBXeStH7/PPPN98u74O8ltdeew2bbropvF4vPvroo3aPtWTJEvX65P2W93SzzTbD559/3nz7Sy+9hE022US9H3LOV199NcNcIiIiokFar2fq8ZNOOknV3RMnTsT999+v6uSHH34Y64IBLBEREQ0qUrSd9n/fYHlDotX15Q0JdX13FnWPPfaYCgIlBLzllltwzTXX4H//+1/z7bqu4y9/+QtmzJih7vvOO++owFZMmzYNd911F7Kzs7F8+XL1df7553fq+S+++GKcffbZ+OWXX9Qn+YlEQoWYr776Kn766SecfPLJOOqoo9Z6mdUPP/yAbbfdFocffjj++te/qqC0syR8ffzxx1WxK+/DueeeiyOPPBLvv//+Sq/lpptuUq9lypQpKx1HgmsJm5cuXYqXX34Z33//vXovJeAVH374IY4++mj1fvz888944IEHVFh8/fXXr9VrJyIiIqL+Xa+nUil8/fXX2GWXXVrV53L5008/Xadju7rg/IiIiIj6BVm2dPV/fkZ7i5fkOokL5fZdJ5bA0DsfHq6OBIWynEmMHz9ehZRvv/02dt11V3Vdy82uZO7Uddddh1NPPRX33nsvPB4PcnJyVKgp3bNrQ45/4IEHtrquZYj7xz/+EW+88Qaee+45bLHFFp069ieffIK9994bf/7zn/GnP/1prc4vmUzihhtuwFtvvYWtt95aXSedqdLhKgGpBKoZEl5n3rf2PPXUU6iqqsKXX36pOmDFuHHjmm+XblcJcY855pjm57n22mtVSJv5HRERERHR4KnXq6urYVkWiouLW10vl2fOnLlOx
2YAS0RERIPGF/NrV/okvW1RJ7fL/bYeW9Dlz9+2U7O0tLTVPCkJHqUDVAq8xsZGtRxeulRjsZha+rSuZAl+S1JgSuApgat0isqn/hKCdva5Fi1apMJQ6R5tGSJ31pw5c9RrbRusynlNnTp1la+lre+++049JhO+tiUdsR9//HGrjld5P7ry/SYiIiKizunter27MIAlIiKiQaMynOjS+3WW2+1udVm6WTNL4mVOq3SQnnbaaSoUlOBQOj9POOEEFUCuKhCU48i81dVtsiXjD1q69dZbcffdd6vRBpnZsxKgyvN1RlFREcrKytT82OOPP16NSVgbmXm3MhJB5tK2JLNeV/Va2pL5sat7LumCbdsRLGQmLBERERENrnq9sLAQhmGgoqKi1fVyeW1XoGVwBiwRERENGkNCvi69X1eSeVMSxt5+++3YaqutsN5662HZsmWt7iNjCKRLs70AVGbCZsyePVt1ca6OdIDut99+asaqbHYly/BnzZrV6XOXsPOVV15RwaXMlg2Hw1gbstGBBK3SUSvjAlp+yaZkne02li7Y2tradm+Xzbd+/fXXlZ5HvmTWFxERERENrnrd4/Go/RFkRFiG1OdyOTMea22xuiQiIqJBY4vR+SjN8anZUe2R6+V2uV9Pk+BPulbvuecezJs3D0888YTaiKolmQsrnZtSBMqMqkzIutNOO6l5st9++y2++uorNTe2bbdte2QOrWwCJvNbZTOrU045ZaVP/NeUdKRK56rL5cKee+7Z3M3aGaFQSM2klY23ZBOyuXPn4ptvvlHviVzujMMOO0x1Kuy///4qaJb39IUXXmjeQOGKK65Qm31JF6xs9iWv/5lnnsFll13W6fMmIiIiooFRr5933nl46KGHVO0p9aGsTotGozjuuOPW6bgMYImIiGjQkEH9V+4zUf3ctqjLXJbbu2MDrtWRDtQ77rgDN998MzbccEM8+eSTah5sS9OmTVPh6iGHHKK6Xm+55RZ1vXTNSofodttth8MPP1yFmGsyw1TCRukEla7V6dOnNweWaysrKwuvvfaaGofwu9/9ThWrnSUbYV1++eXqtU+YMAF77LGHCnZHjx7d6Q6GN998E0OGDMFee+2lRizcdNNNalmZkNcsXbtyn80331x1Hd95550YOXJkp8+ZiIiIiAZGvX7IIYfgtttuUx/Wb7zxxmpF1euvv77SxlydpTltB4b1ANlUQnbxbWhoWOsZYURERDRwdXet8PpPy9XuqS0H/Msn6VLM7bFhaZc/H1F/wlqdiIiIVof1eudwEy4iIiIadKRo23Viido9VQb4ywwpWcbUG52vREREREQ0sOt1BrBEREQ0KEnxtvXYgt4+DSIiIiIiGuD1OmfAEhEREREREREREXUTBrBERERERERERERE3YQBLBEREREREREREVE3YQBLRERERERERERE1E0YwBIRERERERERERF1EwawRERERERERERERN2EASwRERERERERERFRN2EAS0REREQDznvvvQdN01BfX68uP/roo8jNze3t0yIiIiKiQYgBLBERERF1exCa+fL7/Zg0aRIefPDB3j41IiIiIqJWPvjgA+yzzz4oKytTteuLL76IruDqkqMQERER9Te2BSz8BIhUAFnFwMhpgG709ln1WalUCh6PZ60f/+uvvyI7OxvxeBz/+c9/cNppp2Hs2LHYeeedu/Q8iYiIiGiAsHu+Xo9Go9hoo41w/PHH48ADD+yy47IDloiIiAafn18G7toQeGxv4IUTmr7LZbm+m0yfPh1//OMfcc455yAvLw/FxcV46KGHVJF33HHHIRQKYdy4cXjttddaPe6nn37CnnvuiaysLPWYo446CtXV1et83Pfffx9bbLEFvF4vSktLcfHFF8M0zVbHPfPMM9VxCwsLsfvuu6tCdO+99251nHQ6jSFDhuAf//jHKl+/3KekpASjR4/GWWedpb5/8803q3zMxx9/rM4jEAio1ybnUFdXp26zbRs33
nijOo501Uqh/Pzzz6/Bb4KIiIiI+ryfe75eF1J3X3fddTjggAO69LgMYImIiGhwkaLtuaOBxmWtr29c3nR9NxZ1jz32mAozv/jiCxWaShfoQQcdhGnTpqkwcrfddlMBaywWU/eX+aU77bQTpk6diq+++gqvv/46KioqcPDBB6/TcZcuXYq99toLm2++Ob7//nvcd999KkCVYrPtcaXrVYLQ+++/HyeeeKI6h+XLlzff55VXXlHHPeSQQ9boPXAcRx1j0aJF2HLLLTu833fffae6YydOnIhPP/0UH330kVoOZlmWul3C18cff1yd14wZM3DuuefiyCOPVMEyEREREfVjP/devd5dNEeq4B7W2NiInJwcNDQ0qKVoRERERD1SK8gyJvnkvG0x10wDssuAc37s8uVN0skp4eGHH36oLsvP8hplaZMEiaK8vFx1o0rguNVWW6lAVO7/xhtvNB9nyZIlGD58uFrSv956663Vcf/85z/jhRdewC+//KJmW4l7770XF110kXrPdV1Xx5XfQ9suVZnfeswxx+DCCy9Ul/fdd18UFBTgkUce6XAG7I477ohgMKguJ5NJ1b16zTXXqPPoyOGHH65CWgle25Jj5Ofn46233sLWW2/dfL0ExBIGP/XUU83PKx2zsvmWbMIl3byZTbmoY6zViYiIaDDW621Jnfzvf/8b+++/P9YVZ8ASERHR4CEzpDos5oQDNC5tut/o7br86adMmdL8s2EYKricPHly83UyPkBUVlaq79Kd+u6776rxA23NnTtXBbBrc1wJXiW4zISvYptttkEkElEB74gRI9R1m2666UrPKyGnbKAlAax048pog3feeWe1r10CYhmHIOGpdOrKeAMJUaVbt6MOWOnibc+cOXNU0LrrrruuNKdWuoWJiIiIqJ9a2Lv1endhAEtERESDhwzw78r7dZLb7W51WQLQltdlAlHpEFWnEYmoZfc333zzSseSjta1Pe6aynSttnT00UerebHSTfvJJ5+oGazbbbf64lfuJ52omS7azz//HNdff32HAazMde2IvC/i1VdfxdChQ1vdJjNtiYiIiKifivRuvd5dGMASERHR4CG7p3bl/brZJptsokYFjBo1Ci5X15VtEyZMUMeVSVSZcFbmvEqH6rBhw1b5WOmulWVYMnJAQljZ6GttSKduPB7v8Hbp6n377bdx9dVXr3SbzIWVoFVGFOywww5r9fxERERE1Adl9a96fU1xEy4iIiIaPEZOa5oZJbOjOpwpNbTpfn3AGWecgdraWhx22GH48ssv1dgBmQcroWdmM6q1cfrpp2Px4sVqw66ZM2fipZdewpVXXonzzjtPzX9dHRlDIBt0ySgDmQe7JmT8gcyiXbhwIf75z3/iiSeewH777dfh/S+55BL1muVcf/jhB3WesllYdXW1CorPP/98tfGWnIe8LzKr9p577lGXiYiIiKifGtm79bqstJJRWPIl5s+fr36WD/7XBTtgiYiIaPCQQf173Ny0e6oq6lruRbqiyNvjpm4f6L+mysrKVGeqbI612267qfmpI0eOxB577LFGQWlHZNn+f//7X1xwwQXYaKON1CzWE044AZdddtkaPX6XXXZRIxBklICc45pYf/311Xfp5JVNxE455RRcddVVHd5f5tu++eabuPTSS7HFFluokQRbbrmlCqPFtddei6KiItx4442YN2+eGm8gHcNyfyIiIiLqp/Terde/+uortZFrhjQoCGk6kE1d15bmyNqzHsadVYmIiKhXa4WfXwZev6j1gH/5JF2KuYn7dv3zDTDSGSAhrowhOPDAA3v7dKiLsVYnIiKi1WG93jnsgCUiIqLBR4q2DX7XtHuqDPCXGVKyjKmPdL72VbKJl4wAuP3221XH6b779r/il4iIiIj6gYkDq15nAEtERESDkxRvo7fr7bPoV2T21ejRo9VGXbIEqys3BiMiIiIiGqj1OqtmIiIiIlojo0aNQi9MryIiIiIi6tfWfvcGI
iIiIiIiIiIiIlolBrBERERERERERERE3YQBLBEREREREREREVE3YQBLRERERERERERE1E0YwBIRERERERERERF1EwawRERERERERERERN2EASwRERFRP/Hoo48iNzcXA9F7770HTdNQX18/4F8rEREREQ0uDGCJiIiI+olDDjkEs2bN6pEgNPPl9/sxadIkPPjgg936vEREREREvenGG2/E5ptvjlAohCFDhmD//ffHr7/+2iXHdnXJUYiIiIj6Gcu28E3lN6iKVaEoUIRNhmwCQzfQl0kYKl89QYrN7OxsxONx/Oc//8Fpp52GsWPHYuedd+6R5yciIiKiwc3q4Xr9/fffxxlnnKFCWNM0cemll2K33XbDzz//jGAwuE7HZgcsERERDTpvLXwLu7+wO45/43hc9OFF6rtcluu7y/Tp03HWWWfhwgsvRH5+PkpKSnDVVVe1us8dd9yByZMnqwJv+PDhOP300xGJRJpvb7ksXzphpUN15syZrY5x5513qqA046effsKee+6JrKwsFBcX46ijjkJ1dfVqz1c+9ZdzHD16tDpv+f7NN9+s8jEff/yxep2BQAB5eXnYfffdUVdXp26zbVt1FchxJETeaKON8Pzzz6/hu0dEREREg8lbvVCvv/766zj22GPV6i+pVaX2XrRoEb7++ut1PjYDWCIiIhpUpGg7773zUBGraHV9ZaxSXd+dRd1jjz2mwtXPP/8ct9xyC6655hr873//a75d13X85S9/wYwZM9R933nnHRXYtme99dbDZptthieffLLV9XL58MMPVz/LPNWddtoJU6dOxVdffaWKyoqKChx88MFrfM6O46jHSfG55ZZbdni/7777TnXHTpw4EZ9++ik++ugj7LPPPrAsS90u4evjjz+O+++/X72+c889F0ceeaTqNCAiIiIi6gv1eksNDQ3quzRPrCvNkaq6hzU2NiInJ0e9EFnaRkRERNQTtYIsY5JPztsWcxkaNBQHivH671/v8uVN0hkqYeSHH37YfN0WW2yhAtKbbrqp3cdIh+ipp57a3LEqn8Kfc845zRtV3XXXXfjrX/+KOXPmNHfFrr/++vjll1+wwQYb4LrrrlPP98YbbzQfc8mSJaq7VkYMSIjb3gzYHXfcsXmZVTKZVN2rEhb/+c9/7vD1SegrIa0Er23JMaRwfeutt7D11ls3X3/iiSciFovhqaeean5e6ZiVLt+2r5V6Dmt1IiIiGoz1ektS/+67776qFm2vvu0szoAlIiKiQUNmSHVUzAkHDspj5ep+m5ds3uXPP2XKlFaXS0tLUVlZ2XxZAkrpFJWxAlLUyuypRCKhQkpZ1t/WoYceivPPPx+fffYZttpqK9X9uskmm6jwVXz//fd499131fiBtubOndtuAJshwa1sQCDh6RdffIEzzzxThagyC7ajDtiDDjqo3dskIJbXsOuuu7a6PpVKqe5cIiIiIqK+UK9nyCxYGeXVFeGrYABLREREg4YM8O/K+3WW2+1udVlmuMqn62LBggXYe++9VcB5/fXXq7BTCr4TTjhBBZXtBbAyo1U6aKWDVAJY+d4yIJX5sTIG4Oabb17psRL+rorMas3Mm5U5WDI2Qc6rowB2VZuDZebYvvrqqxg6dGir27xe7yrPg4iIiIgGj6perteFNB688sor+OCDDzBs2LAuOSYDWCIiIho0ZPfUrrxfV5Lh/hLG3n777WoWrHjuuedW+7gjjjhCzYk97LDDMG/ePNUVmyHdsC+88AJGjRoFl2vdyj7DMBCPx1fZ3fv222/j6quvXuk2mQsrQauMKNhhhx3W6TyIiIiIaOAq6sV6Xaa0/vGPf8S///1vNR5LGhK6CjfhIiIiokFjkyGbqJlRMjuqPXJ9SaBE3a+njRs3Dul0Gvfcc48KUp944gm1YdXqHHjggQiHw6ozVWaolpWVtVo6VVtbq8LZL7/8Uo0dkHmwxx13XPPmWB2R0Qjl5eVYuHAh/vnPf6rz2W+//Tq8/
yWXXKKe4/TTT8cPP/ygxijcd999an6tjDKQUQmy8ZZsLibn8c0336jXKpeJiIiIiHq7Xpfa+f/+7//UqjKpX6UWlq9VNSGsKQawRERENGjIoP6Lt7hY/dy2qMtcvmiLi7p1oH9HNtpoI9xxxx1qXMCGG26o5rnKPNjVkeJQxgzIvFfphm1JwtiPP/5Yha277bYbJk+erDa2ktECmS7bjshmXjKmQILhiy66CKeccooKTDsi82TffPNNdR6yuZhstvXSSy81d95ee+21uPzyy9VrmjBhAvbYYw81kqArOwuIiIiIqH8zerFel+YB2VRMNs+VOjjz9eyzz67zsTVH+mt7GHdWJSIiot6sFd5a+BZu+uKmVgP+5ZN0KeZ2GblLlz8fUX/CWp2IiIhWh/V653AGLBEREQ06UrTtOHxHtXuqDPCXGVKyjKk3Ol+JiIiIiGhg1+sMYImIiGhQkuJt85LNe/s0iIiIiIhogNfrnAFLRERERERERERE1E0YwBIRERERERERERF1EwawRERERERERERERN2EASwRERERERERERFRXwlg4/E4PvroI/z8888r3ZZIJPD444931bkREREREVEnsV4nIiIi6scB7KxZszBhwgRsv/32mDx5MnbYYQcsX768+faGhgYcd9xx3XGeRERERES0GqzXiYiIiPp5AHvRRRdhww03RGVlJX799VeEQiFss802WLRoUfedIRERERERrRHW60RERET9PID95JNPcOONN6KwsBDjxo3Df/7zH+y+++7YbrvtMG/evO47SyIiIiJaawsWLICmafjuu+96+1T6tPfee0+9T/X19eryo48+itzcXPQnrNeJiIiI+nkAK/OkXC5X82UpUO+77z7ss88+anmTLHkiIiIiov7v2GOPxf77798rz33VVVepOnNVX9Q+1utEREREa0dqpilTpiA7O1t9bb311njttdfQFX6rztbABhtsgK+++krNlWrpr3/9q/q+7777dslJEREREXU3x7IQ++prmFVVcBUVIbDZptAMo8eeP5VKwePxYKBbm9d5/vnn49RTT22+vPnmm+Pkk0/GSSed1A1nOLCwXiciIqKBwunhen3YsGG46aabMH78eDiOg8ceewz77bcfvv32W0yaNKnnOmAPOOAAPP300+3eJkXdYYcdpk6QiIiIqC9rfPNNzNl5Fyw65hgsO/989V0uy/XdZfr06TjzzDNxzjnnqOXhsixc/PTTT9hzzz2RlZWF4uJiHHXUUaiurm5+XDgcxhFHHIFgMIjS0lLceeed6lhynJZdji+++GKr55Ol87KEvj2WZeGEE07A6NGj4ff7sf766+Puu+9u1YEqBedLL73U3HEqy/PFjz/+iJ122kk9rqCgQAWjkUhkpc7Z66+/HmVlZerY11xzjZpL2tbGG2+Myy+/fKXr5b0oKSlp/jIMQ80ybXldRz7++GP1/gQCAeTl5an3ua6uTt1m27Zanp953RtttBGef/55DCSs14mIiGggaOyFel1WDO21114qgF1vvfVUPSt16WeffbbOx+5UAHvJJZfgv//9b4e333vvvaqwJSIiIuqrpGhbevY5MMvLW11vVlSo67uzqJNQU7pBJSS8//771axRCTOnTp2quhZff/11VFRU4OCDD25+zHnnnafu//LLL+N///sfPvzwQ3zzzTfrdB5Sr8kn/P/85z/x888/44orrsCll16K5557rrkDVc5hjz32wPLly9XXtGnTEI1GVaApweaXX36pHv/WW2+pYLmlt99+W20AJef7yiuv4Pjjj8cvv/yiHpMhnQQ//PADjjvuOHQVmXG78847Y+LEifj000/x0UcfqUJaAmch4evjjz+u3vsZM2bg3HPPxZFHHon3338fAwXrdSIiIurvGnuxXs+Q+vGZZ55R9a+MIujREQSZTRykmJblZDJHqr1uBiIiIqK+uoyp4oYbgfY6AOU6TVO3h3beuVuWN8mn6bfcckvz5euuu06Fr
zfccEPzdQ8//DCGDx+uZnVKx6uEtk899ZQKFsUjjzyiOkvXhdvtxtVXX918WTpCJbCUAFaCV/mkXzpEk8lkq25TOZdEIqFCTOnIzXRVSsh58803qw5eIbf9/e9/bzV6QIJbOXcZJ5B5HVJLjhkzBl1F3tvNNttMhYwZmeVi8lrkfZbAOFNEy3NLSPvAAw+ocxkoWK8TERFRf+X0cr0uq72kVpSaV2rif//73+rD/R4NYN99913svffeari/erDLpf5IkM4BIiIior5OzZBq80l6K46jbpf7Bbfcosuff9NNN211+fvvv1f1lRR3bc2dO1fVXOl0Glts8du55OTkqGX96+pvf/ubquMWLVqknkfCOhkJsCrSxSrL9jPhq9hmm21UR6V0vGYC2MmTJ68091Xmt0on7B133AFd11WoLOMUupJ0wB500EHt3jZnzhzEYjHsuuuura6X1y0h+EDBep2IiIj6s1gv1+tSZ0tN2dDQoEZVHXPMMWq11LqGsJ0KYGVGlxStsiuYz+fDZZddhgsvvJAFHREREfULMsC/K+/XWS2DSyGzUzPdo21J96uEhmtCZrS2nespwW1HZDmVjBm4/fbb1Sf8Ml/11ltvxeeff47ueJ1CXqfX61VdBBLOyvn94Q9/QFeSrt2OZObUvvrqqxg6dGir2+S8BgrW60RERNSfmb1cr0udOm7cuObmCRmhJXslyIqpHgtgZZOITz75RP1BIKRQlxOoqalRmzAQERER9WWye2pX3m9dbbLJJnjhhRcwatQo1anYliyRl3EBUviNGDFCXSefxst4gu233775fkVFRWpOa8bs2bNVt2dHZKaszHQ9/fTTW3Xcti0+M7NTMyZMmKA29pJZWJmQVY4lHa2r68qV1ycdBDJ6QI596KGHrjIwXRtTpkxR82dbjlfIkK4FCVql43cgjRtoi/U6ERER9WeuPlavy0ovGWXVo5twNTY2ql17M2R3WSmc5Q8BIiIior4usNmmcMlMU01r/w6apm6X+/WEM844A7W1tWpneglZJQR944031MZUEn5KZ6qElhdccIFaWi4bR51wwgkq8JSu1wzZyEtmscrGVrKZ16mnnqqC21XNopX7yXNJmCtdky03yBISCssmWTJaoLq6WnWsHnHEEaqrUs5Jgj45pz/+8Y846qijmscPrMqJJ56Id955R202JuMIumMDKnkdEizLuc+cOVN1gsr5y3spXb+y8ZbMspX3WjYzu+eee9TlgYL1OhEREfVngV6s16WW/OCDD9Q8fZkFK5ffe+89VQP3+CZcUqjL7LGWSbB0GkgRnrHvvvuu84kRERERdTUZ1F986SVq91RV1LVctr+iyJPbu2Ogf3tkMy3pIL3ooouw2267qU/XR44ciT322EOFrEJmpkqgKnM9s7Oz1XLyxYsXqyA0Q0YJSGi73XbbqWPKMqmvv/66w+c95ZRTVFh7yCGHqCBXAmAJLV977bVWM1ul4JRNrWT5voSt06dPV7Xg2WefrTbTknDv97//vTrHNSHBr3TeSui85ZZboqutt956ePPNN3HppZequbkSPMrzyOsT1157reoWvvHGGzFv3jzk5uaqLmS5/0DCep2IiIj6K60X6/XKykocffTRamWZ1FKyukrqqrZ7CKwNzWk7MGwVMn8IrPKAmrbScrX2PpmXFyKfxMsfEkREREQ9WSs0vvmm2j215YB/+SRdirns3XZDXybL/2WGqYSu0g3bn0jZKSGshL3nnXdeb5/OgNQV9TprdSIiIlod1uvd2AErn54TERER9XdStIV23rlpl9WqKjVDSpYx9VTna2dIp6ospZeOTilwr7nmGnX9fvvth/6kqqpKbf5VXl6uunWpe7BeJyIiooEgux/V690ygmB1Bd9///tftUSOiIiIqC+T4i245RboD2677TY1i1U2r5LdWD/88MNWcz77gyFDhqhzfvDBB5GXl9fbpzNosV4nIiKi/kLrR/V6jwSwc+bMwcMPP6x2x
ZXuBtmkgYiIiIjW3dSpU1c5z7W/6MTUK+oGrNeJiIiIes/qh0R1IB6P4/HHH8f222+P9ddfH5988gmuuOIKLFmypGvPkIiIiIiIOo31OhEREVE/7YD98ssv8fe//13N8Bo7diyOOOIIVczde++9mDhxYvecJRERERERrRHW60RERET9OICdMmWK2uXs8MMPV0XcpEmT1PUXX3xxd50fERERERGtIdbrRERERP18BIFs/iBLmHbccUd+ek5ERERE1MewXiciIiLq5wHsvHnz1Pyo0047DcOGDcP555+Pb7/9Fpqmdd8ZEhERERHRGmG9TkRERNTPA9ihQ4fiz3/+s9pF9YknnkB5eTm22WYbmKapdlSdNWtW950pERERERGtEut1IiIion4ewLa000474f/+7/+wfPly/PWvf8U777yDDTbYQM2dIiIiIqKuJwFabm5urzz3qFGjcNddd2GgOvbYY7H//vuv83GuuuoqbLzxxugLvw/W60RERET9PIDNyMnJwemnn46vvvoK33zzDaZPn941Z0ZERETUjWzbwdJf6zDry3L1XS73dYcccki3dzD2Zsg7EMiS/7fffht9gWzGJd2wW265Jc477zwsWbIEm2++OYYPHw7Hafr3LrW7jCeQL6/XqzpoDz744N4+dSIiIiL0dr1+0003qRrpnHPOWedjudCF5NP+v/zlL115SCIiIqIuN/fbSnz47GxE65PN1wVzvdjukPEYO3UI+iq/36++qO+RQNOyLGRlZamv3lZfX49tt90WDQ0NuO6661Tw6nK58P777+Pmm29W12eC9pNOOgnXXHONGlMgIe0zzzyDN954o7dfAhEREQ1ic3u5Xv/yyy/xwAMPdNnKoU51wMoyptV97bzzzl1yYkRERETdVcy9/sBPrYo5IZflerm9O0in4VlnnYULL7wQ+fn5KCkpUcvVW7rjjjswefJkBINB1aUoq4wikUi73anSCSufyM+cObPVMe68806MHTu2+fJPP/2EPffcU4WCxcXFOOqoo1BdXd3uOb733ns47rjjVDiX6YpseY6xWAzHH388QqEQRowYgQcffLD5NqkDzzzzzFbHq6qqgsfjae4IlWXzEgYeffTR6nxGjhyJl19+Wd1vv/32U9dJkSsrq1a1pF+W3suxWp73Fltsod43eX9k5unChQs7/F1IUCodoXLfgoIC9TvJdIRm2LaNG2+8EaNHj1ah90YbbYTnn3++1XPK+/Paa69h0003Vd2jH330UavzffPNN+Hz+VQY2tLZZ5+t3q8Medx2222nnkd+7/LvJBqNNt9eWVmJffbZR90u5/Pkk092+NoyNfmkSZPwyy+/qN/TY489pn43p556Kp5++mm1OVfLkDgQCKh/j3L9VlttpcJYIiIiosFWr2dI/X3EEUfgoYceQl5eHno8gJVCc/78+Zg4caIqQtv74kwpIiIi6qtk2ZJ8kr4qHz03u9uWN0kQJiHh559/jltuuUUFXf/73/+ab9d1Xa0mmjFjhrqvzOyUcLA96623HjbbbLOVwji5fPjhh6ufJfiTMG7q1Kkq1Hz99ddRUVHR4RLzadOmqXAzOztbzQ2VL1lSn3H77ber5/z2229VOHzaaafh119/VbedeOKJeOqpp5BM/lYoy/xRWdLeMmyUgFgCUjnG7373OxUISyB75JFHqnFWEh7L5baBaEeka1Nmt+6www744Ycf8Omnn+Lkk09W4WhH5HVImP3www+r8LO2thb//ve/W91HwtfHH38c999/v/p9nHvuueocpYO0pYsvvlgtT5Ows20dLI0JEvK+8MILrcLfZ599VhX1Yu7cudhjjz3w+9//Xp2/3Cbn1DLMlvm0ixcvxrvvvqtC4HvvvVeFsquq12tqalTNLsF023pd/j1INywRERFRX2P3cr0uzjjjDFWn7rLLLl12zE5VXrJc6ZFHHsE///lPVTRKB8SGG27YZSdDRERE1J2Wz65f6ZP0tiJ1SXW/oet3z
afdLUlAd+WVV6qfx48frzZGku7QXXfdVV3Xcr5UpltUuhYlcGuP1GNyjGuvvba5K/brr79WwaeQ2yRsu+GGG5ofI6GjdFnKfSXEbUm6VWW+v4SX0hHZ1l577aWCV3HRRRepMFVCwfXXXx8HHnigCg1feuml5oBXQk4JD1uGoXKMU045Rf18xRVX4L777lPL4w866KDm42699dYqKG7vHNqbcyodu3vvvXdz5++ECRNW+RgJmS+55BJ1zkJC1pZL7iVElvfsrbfeUucixowZo4JRWYomYW+GhOiZ319bhmHg0EMPVcH0CSecoK6T37cE4xK4ZoJe+T1mfvfy70JCeHkOeW8WLVqkumy/+OIL9T6Jf/zjHx2+RqnX//73v6vXIN298rys14mIiKi/WN7L9bqMYpKmABlB0JU61QF7wQUX4Oeff8aLL76IcDisuhfkU3UpWqX4JSIiIurLoo3JLr1fZ7XtkCwtLW3VySiBn3RNSteoLPOX7lDpZJSl/+2RcG/BggX47LPPmrtfN9lkE7XTvfj+++9VQJqZSypfmduk83Jdzj8T0mbOX5bay/lKwCukcJXxBxLAdnQMGYkgZOxC2+s66vBsS8Y5yHPsvvvuapn+3XffrTp3hYSXLV+7hKoS1srtsjFVhnSDSmdvxpw5c9R7LsFqy8dLR2zb963l49oj4ap0pS5btqz5dyQdFZlREvI7kqC65fPIa5ERCNLJKp21cn4y5iBDfocdbZQm9foHH3ygfk4kEqzXiYiIqF+J9mK9LiuOZFSU1GtS2/ZaAJshnQAyB0GKV2nLlUK7rKyMRR0RERH1acFsb5fer7PcbneryxJiStAmJEiVLk4JKGXJunSy/u1vf1O3pVKpdo8nAags75cOSyHfM0vbM/OrJJT87rvvWn3Nnj0b22+/fZeef2YMgYxUkI2cZNWUnJvMee3oGJnO2PauyxxXxjK0HUeQTqdbXZbnktEDMkJBlvBLZ6+E0lKftnzd0k28JjJzd1999dVWj5dGhJZzYIWMlFgV6VqVzlzppojH42rUQdvfkXQEt3weCWXld9Rylm9nFBUVqYBWxg2wXiciIqL+JNiL9brU39IEIA0N8gF4ZgNTWZ0kP8soqbW1TsOfpLNBTkQ+mZelTW2LciIiIqK+pHR8rto9dVXLmrLyvOp+PU0KPgkdZT6phI7iueeeW+3jJMyTObGHHXYY5s2bp7piM6R4lDBXxhms6cxPGUOwtsWldLJKR6h8UC9hsIxAWFcSJpaXl6sQNhPOSkjZloxakC8ZLSDNAvL8sqHUuHHjVrqvdB7LHN5MCC1zZOX9l/dLyOxU2VRLOmhbjhtYW/I7kk4K2eRKfrfSAZshzynBbnvnmel2zZxfZgSBzN1tu7FXS/Ic8u/giSeeUBuwta3XJfSVrg7OgSUiIqK+prQX63VZifbjjz+2uk42qJV6TMZkyXipHuuAleVTsnxLOgv+8Ic/qGVfUsBKl4HszEpERETUV+m6hu0OGb/K+2x78Hh1v54mAZx0dt5zzz0qSJXwTJaNr47MMZXRULIh1o477qi6HDOk81E2mJJwVuZYyfJ5mXUqhWRHIauEtRLQyazS6urqDscfdES6YGVTKglMDzjgAKyr6dOno6qqSm1aJucvXcEyEzVDlulL6CodsAsXLsSbb76pukdXNQdWlpbJOcpYrZkzZ6q5ti0DTRn/IJuPycZbshmaPK80HsjvRi6vTQArj7/++utV/SzhboYU85988oman5vpTpY5uplNuGS+rmzSJV2yUnNLECvv8arqbqnXZf6rdE7LrFnpgpUAWP5NPf300yqoznT5CvkdS8gtnctS08tsXiIiIqLBVq+HQiH1gXXLL1ntJHXVus7U71QAK5smyFIoKf5uvfVWVaTddtttqkuAiIiIqD8YO3UI9jhlQ/XJettP0uV6ub03yHLxO+64Q22iJAWeBGayQdOaFIoyZkCWrbdc2i4kjP34449V2LrbbrupD
lXZ7EmWp2e6bNuSZfyyVP+QQw5R3acSfHaGhL3SWSnfu2J2lgSpsgmZBK/yHslmVBKOZgQCARWiStAoDQInn3yyCp4zG321509/+pOaV3vMMceobll5D9uGxbKx2eWXX65+B3IOEoLKSILRo0evVbguc1h/+OGHlX5HMnJCOlRlU7TttttOhaMSgLYM0mXEglyWblwJ3OU1DhkyZJX1unRvyGxZea8kQJZNzuT4EsBKHS+brWVIx7J0Bcvj5PjyfhIRERH1lrF9tF5fF5rTdqjWKkihLsWZFHwtd7NtSz7hXxWZPSVFn2yCkJ2d3bkzJiIiogGvJ2oF23aadlltTKoZUrKMqTc6XwcamWUrQZ503GaW9FPP6Yp6nbU6ERERrQ7r9c7p1OAn+TR+VYUcERERUX8hxdvQ9fN6+zQGDBmfUFNTg8suu0zNXmX42jtYrxMREdFAoQ+ger1TAexVV13VfWdCRERERP2WjDqQGbQyBuD555/v7dMZtFivExEREfXzADYvL6/dT9Sl5ViKbZkxteuuu3bl+RERERFRPyCbZXVishV1E9brRERERP08gL3rrrvavV52jZUdWffee2/V8SAbQRARERERUc9ivU5ERETUzwNY2Sl2VTbeeGO1UywLOiIiIiKinsd6nYiIiKjv0bvyYPKJ+syZM7vykERERERE1EVYrxMRERH18wA2mUzC4/F05SGJiIiIiKiLsF4nIiIi6ucB7D/+8Q+1rImIiIiIiPoe1utEREREfXwG7Hnnndfu9Q0NDfjmm28wa9YsfPDBB111bkRERERE1Ams14mIiIj6eQD77bfftnt9dnY2dt11V/zrX//C6NGju+rciIiIiKgXaZqGf//739h///17+1RoDbFeJyIiIurnAey7777bfWdCRERE1INs28LSX2YgUl+HrNw8DJ0wCbpuoD8bNWoUzjnnHPVF3ctxHDz00ENqSf+MGTPgcrkwbtw4HHnkkTj55JMRCARw1VVX4eqrr1b3NwwDubm5mDhxIg488ECcdtpp8Hq97R771FNPxQMPPIA777yz079L1utEREQ0UNg9XK+3rN0y1l9//S7ZwLRTASwRERHRQDD780/wzqMPIlJb3XxdVn4hdjr2ZIzfchoGMsuyVGerrnfpVgD9UiqVWusNqY466ijVTXrZZZfhr3/9K4qKivD999/jrrvuUkF4pmt40qRJeOutt2DbNmpqavDee+/huuuuwxNPPKF+DoVCrY4rHcefffYZysrKuuQ1EhEREfVHs3upXs/UbhnyIXtXYOVNREREg66Ye/mOG1oVc0Iuy/Vye3eQAO6WW25RXZLS+ThixAhcf/31zbcvXrwYBx98sOqSzM/Px3777YcFCxY0337ssceqUO+2225DaWkpCgoKcMYZZyCdTqvbp0+fjoULF+Lcc89VAat8iUcffVQd8+WXX1bdl/LcixYtwpdffqmWpBcWFiInJwc77LCDmhHaGfKcf/zjH1WXZl5eHoqLi1VXaDQaxXHHHafCRXm9r732WqvH/fTTT9hzzz2RlZWlHiNhZnV19Tof9/3338cWW2yhXqO8RxdffDFM02x13DPPPFMdV1737rvvjuOPPx577713q+PIezpkyBDV3dqe5557Dk8++SSefvppXHrppdh8881V6Cq/s3feeQc77rhjq6K9pKREBaqTJ09Wr0vOU96Dm2++udVxly5dqm6XY7vd7tW+/xLgyu/51VdfxZQpU+Dz+bDVVlupY7f0wgsvqD8m5H2R87z99ttb3X7vvfdi/Pjx6vGZ3wcRERHRYKvXW9ZumS+pGbsCA1giIiIaVMuY5JP0VXn3sQfV/braJZdcgptuugmXX345fv75Zzz11FMq7MoEfhIGSrD44Ycf4uOPP1bh5B577KG6NJvP7d13MXfuXPX9scceU+GqfAnpxhw2bBiuueYaLF++XH1lxGIxFfb9/e9/V8vlJVwMh8M45phj8NFHH6mOSwng9tprL
3V9Z8h5SGH6xRdfqPBQltYfdNBBmDZtmgp0d9ttNxXoyTmI+vp67LTTTpg6dSq++uorvP7666ioqFDh87ocV8JLOX8JQ6UT9b777lMBqnSbtj2udL3Ke3z//ffjxBNPVOfQ8v165ZVX1HEPOeSQdl+zBKSyHE0C17YkEJVAe1U22GADFUDL76xlQC+v54ILLlBhaWfIYyRUlVBdOnH32Wef5mD+66+/Vu/toYceih9//FEtrZN/g5l/N/I7OOuss9S/m19//VW9F9tss02nnp+IiIhoINTrYvbs2eqD8zFjxuCII45QjQtdgQEsERERDRpqhlSbT9LbCtdUq/t1JQk17777btUBK6Hn2LFjse2226rwTzz77LMqgJOAVLokJ0yYgEceeUQVfNLlmCHdoLLcXQI86dr83e9+h7ffflvdJl2zMmdUQtzMJ/YZEsZJl6OElxIcynxSCUFlXqkcS57vwQcfVKGjdGd2xkYbbaSW4UuAKyGzdFFKcHrSSSep66644gq19P6HH35Q95fzl/D1hhtuUM8tPz/88MMqVJ41a9ZaH1de3/Dhw5vfH+kWlhleEkzKe5shj5Xfg7wP8pV5T2QkQIa89xL2SgjeUWEuj1kXco4tO5wlIJeOCwlDO+vKK69U3czyb0cCZgm0ZZSBuOOOO7Dzzjur0HW99dZTndTSBXzrrbeq2+XfWDAYVP+eRo4cqX4fMoOWiIiIaDDV62LLLbdUH1LLB9LyYf78+fOx3XbbdbpBoT0MYImIiGjQkAH+XXm/NfXLL78gmUyqIKw90rE5Z84cFZ5K6CdfEqgmEgnV8ZohnZESsmbIMvvKysrVPr90fMoS9ZYkpMuEmdKxmZ2djUgk0ulP+VseV85NRiNIEJiR6fLNnKe8VglbM69TviSMFC1fa2ePK+/x1ltv3Tx6QUgnp7ymJUuWNF+36aabrvQaJAiX0DXzvshoAxlNsKoNuNaVHCNzrtKlKgG9FPwtz7+lzMgG+WrbISuvO0P+3Ug4LO+HkO9tO1rlsoTIMg9YglsJXqXLQzpwpbs301VMRERENFjq9Uy9JR/CSx0qq9P++9//qtVbMn5qXXETLiIiIho0ZPfUrrzfmvL7/au8XUJCCQYl/GpLlpRntJ0LKmFdy+7OVT1/22BPOnGlg1SCPwngZD6oBHktRx6sifbOqeV1mefNnKe8Vlki33b+aSZQXtvjrinp9mzr6KOPVvNiP/30U3zyyScYPXq06nboiHSSrutuuBKMyvMIGTshQbLMBc6QcPRPf/qT2tRLOmWlOzoej6vb1mQ+7JqS0F9GOkin9Ztvvqk6i+WLiIiIqDf0Vr3eHtlHQeo+aZRYV+yAJSIiokFj6IRJavfUVQkVFKr7dSXpMpUQNDMuoK1NNtlEdSTKbFbZXKrl1+rmibbtdJXgbk3IDFRZ7i5zUzMbNLXcCKu7yGuVObSyGVTb19peOLqmZIyCBKgtu1PlNUrAKLNxV0W6a2VkgXTBSheqbPS1Kocffrgal/DSSy+tdJs8f0NDwyofL+GtLG37/e9/ry5L56mMUvjuu++av2T2mMx2feONN9R9hg4d2vw+SWDekszwzairq1PnJu9H5n2R96EluSx/TGS6qWX0wS677KJGM8h5yGZuRERERIOpXm+PNA7ICq2WTQJriwEsERERDRq6bmCnY09e5X12POZkdb+uJPNLL7roIlx44YV4/PHHVSEnoZlsEiVkwL/MN5VNnaQbUuZNSUeiBKQtl8+vjoSaH3zwgdqQanVhqoTCMvdUOjE///xzdQ6r69TtCmeccQZqa2tx2GGHqU2j5L2QkFFCzzUNj9tz+umnY/HixWrDLgk4JRyV2ajnnXcedH31Ja+MIZD5qfJ+SHfwqsimVrJBl7wGmWUrG1lJaCmbd0mQKSMWMkzTRHl5OZYtW6Y2wbrnnnuwww47YOONN1YBayYA3nDDDVt9SZerzPFdk1mzsoGWhPs//fSTmvEq/5YkUBbSR
Su3XXvttSqYldcoc3LPP/98dbuc81/+8hcV+sprkH+fne0qJiIiIurv9bqQ+kj2Q5DVR7Iq6oADDlAfWEvNt64YwBIREdGgMn7Ladj3vEtX+mRdPkmX6+X27iCbIEkYJsu7pStRArzM/FLZFEuCU1mCfuCBB6rbTzjhBDUDVmazrikJ4qRglE2+Wo4uaI+Ev9ItKR2p0oEpYa904HY36eyUDkwJW3fbbTc11/Wcc85RS7zWJCjtiHSIypyuL774Qm3gJRtJyXsoG3mtCQlOpbtB5n3JOa6KjD946qmn1AZXL774ogpUZVbYVVddpUJ0OUaGdPvKceV3O336dDVDTDYVk6C9o02+Ouumm27C2WefrcZYSNj7n//8R3VDC/n9ynM+88wzKtiVf3/y70SCWiHv+7/+9S+1KZv8u7v//vvVpmhEREREg61eX7JkiQpb5QNw+cBdPiSXponV1dVrQnO6YheBTmpsbFTL6WR5Vmf+qCAiIqLBoSdqBdu2mnZZra9TM6RkGVN3fJJO/YMsMZMQV8YQSAjeH0iX9I477qiCdAlSuwprdSIiIlod1uudw024iIiIaFCS4m34pCm9fRrUy2S5vYxruP3221WIue+++/b2KRERERERBla9zgCWiIj6FFmY0Xa3diKi7rJo0SKMHj1abdQlG3DJhlRERETUMdbrRJ3HCpOIiPqUmBlD0L32O6ETEXWGbFzWCxO5uoTMlO2v505ERP2Xk7KgeRknEXUGN+EiIqI+ZVXhayQV6dFzISIiIiKi1vRVhK/JWKxHz4Wov2AAS0RE/UaWp2t2DCciIiIioq7nDQR6+xSI+iQGsERE1O80xlMIJ9K9fRpERERERNSGjMeRTlh2wxL9hgEsERH1O4YrjZDPvdL1kaTZK+dDRERERERNzHQKbq93pW7YVJyBLA1enJpMRET9ek5sy11Ygx6jF8+KiIiIiIjcHm+ry5l63e3z99o5EfU2dsASEVG/E0v91ukaTVnqu4wkyASxRP3Vxx9/jMmTJ8PtdmP//ffv8DoiIiKiviydSLT4Oa6+p+JN34kGIwawRETU57Wd9+p1/dbpmtViF9akacGynR49N6KudN5552HjjTfG/Pnz8eijj3Z43bpYsGCB+rDiu+++64IzprYaGxvx5z//GRtssAF8Ph9KSkqwyy674F//+pfqABLTp09XvwP58nq9GDp0KPbZZx91n44kk0n174C/OyIi6ovsROtRYIbnt3FhHv9vowjSyd+CWaK+aOnSpTjyyCNRUFAAv9+vGiG++uqrdT4uA1giIurz2s57NXSt3ftUh5NoeRM36qJVcWwHibn1iH1Xqb7L5d42d+5c7LTTThg2bBhyc3M7vK6vSKcH5v+Nre3rqq+vx7Rp0/D444/jkksuwTfffIMPPvgAhxxyCC688EI0NDQ03/ekk07C8uXL1e/3hRdewMSJE3HooYfi5JNPbvfY8viysrK1fk1ERETdSfe1nnCp6yuPBpOZsLEW/y0U3KiL+lK9XldXh2222UatPHvttdfw888/4/bbb0deXt46H5sBLBERDRhD8wKtxhBkgtto0oTdB8I16jviP1Wj/OYvUP3Qj6h95lf1XS7L9d3Ftm3ceOONGD16tPo0faONNsLzzz/fqiu1pqYGxx9/vPpZul3bu0789NNP2HPPPZGVlYXi4mIcddRRqK6ubvVct9xyC8aNG6c6LEeMGIHrr79e3SbPL6ZOnaqOKd2Ymcdcc801KuiVx0i35euvv958zMw5Pvvss9hhhx1Ud+eTTz7Z7muV+z3wwAPYe++9EQgEMGHCBHz66aeYM2eOer5gMKiCSgkfW3rppZewySabqGOPGTMGV199NUzTXOfj3nfffRg7diw8Hg/WX399PPHEEyudr9xn3333Vce47rrr1Ht32223tbqfdJ7KfeX52nPppZeq9+nzzz/HMccco0LV9dZbT4Wt8lj5fWXI+Ut3rLzfW221FW6++Wb12h566
CG89dZbrY4rfwC8+eabK51PR6666ir1+5PjDR8+XD3XwQcf3CoAXt3vm4iIqDvkDCludTmzUVcyFu2lM6K+Kt4L9brUY1I7PfLII9hiiy1U3bzbbrupOnJdMYAlIqIBobFFt2tFpF4t9c10wFqOA13X1OWGWEqFsS3vT4OLFG01//cLrIZUq+vlslzfXUWdhK/SGXn//fdjxowZOPfcc9Xypvfff18VetINmZ2djbvuukv9fNBBB610nXRSSpeldMRKgCrLoSQ0q6ioUAFbhnRf3nTTTbj88svVJ/dPPfWUCmrFF198ob5LyCfHzCx7v/vuu9Un/BLy/fDDD9h9991VIDl79uxWr+Piiy/G2WefjV9++UXdpyPXXnstjj76aBU8ynL8ww8/HKeccoo6Nzlv+b/RM888s/n+H374obq/HFvOWcJDCZwzwfHaHvff//63Ouaf/vQnFVzLfY877ji8++67K4WWBxxwAH788UeccMIJKvSW4rslubz99turcLYtCTSfeeYZHHHEEe12qkr46nKtev9bCW2lw6LlKAL53UqAK6GxBKlrSkLi5557Dv/5z3/Uv5Fvv/0Wp59+evPta/r7JiIi6gqObSO1YhasCNdUt9sBG66tgZlqqtEYyg5e8V6q119++WVsttlmqg4fMmSIqrflw/Eu4fSChoYGaUNS34mIiDrLtGwnmkw3Xw4nw05jPNX0cyLt2Latfs5c11LKtJy0aa10vWXZ6qs97R2H+metYFu2s+yGz5zFF33Q4deyGz5X9+tKiUTCCQQCzieffNLq+hNOOME57LDDmi/n5OQ4jzzySKv7tL3u2muvdXbbbbdW91m8eLF6v3799VensbHR8Xq9zkMPPdTuucyfP1/d99tvv211fVlZmXP99de3um7zzTd3Tj/99FaPu+uuu1b7euV+l112WfPlTz/9VF33j3/8o/m6p59+2vH5fM2Xd955Z+eGG25odZwnnnjCKS0tXafjTps2zTnppJNaHfeggw5y9tprr1bHPeecc1rdZ+nSpY5hGM7nn3+uLqdSKaewsNB59NFH233NFRUV6jh33HHHat+fHXbYwTn77LPbvW3LLbd09txzT/Wz/P+yPfbYQ/3OV/W7a+vKK69U575kyZLm61577TVH13Vn+fLlq/19s1YnIqJ1lYhGV7qcuS4RjXR4P5FKJto9ppn+rf5vdf943LEscx3PmAZ7vS6khpavSy65xPnmm2+cBx54QNWVHdV/ncEOWCIi6ndkBmzA09RJprITx9c8bkA25cqMIZDrltfHW82CdRtN/+mrakw2b9ylNu9yHNgrNshpOzu27QzagUZe74Lqpg6D2mgSKdNWP8t7J9KmjWTaWukx/VFyfsNKn6S3ZTUk1f26knQjxmIx7LrrrqoTMvMlHbFtl8uvzvfff6+6N1seRzpBhRxLOlNlw6add965UxtHLVu2TM28akkuy/Fakq6ANTFlypTmnzPdt7KJQcvrEomEeu7M65Il8S1fV2ZOqrx3a3tcOf+1eV3Sxfq73/0ODz/8sLosnaTyvkpHRHsyG2ytKzlO5v+H3XPPPQiHw6q7tyMt369TTz21+XoZOyGbe2VsvfXWqkv3119/7dTvm4iIaG1kRgtkOlnlcuY6byDY6n41Sxa3eqzb40W4pqb5v62ZLlnbttrtmnX7fO3OnB1I5DXXly9T70EiEoFlplVXcbi2Vn230ml1XdvH9EfJXqrXhdRKMg7rhhtuUN2vMptf6lFZwbauVr0OioiIqI+ToEJC17ZkxEC2z43SXP9Kt7kMHUXZXhUitheuDoTA1bRs9Tozr1HGLsypDKMk1w/TcuDSgUTSgnwU69Z1eFwaPptXjbpoUg23H1+SjSEhn3ofY0kTy+tjmDqyQIWzmaC6P7LDqS6935qKRCLq+6uvvtoqFBMyf7Ozx9pnn33UjKq2SktLMW/ePHQnmZG6JmTzgoxMoNjedVLoZl6XzHw98MADVzqWzIRd2
+Ouy+s68cQT1XzdO++8U40fkBEQHY0BKCoqUpukzZw5E2vLsiw1AmDzzTdXl9955x0147btvxEJi2XUwWOPPaZGMWTIuAoiIqK+pmXgmmFblgoNJTwtGDZ8pdtDBQXNIWImuJVgtul4az6Spy+T90A3jObXKN8TkTA8K16f1DTpRBzpVBqB7Gy4vF7M/upTBHMKUVu+GGOmbAq3x4OG6kq4vX7E6mpRNGq0Oo7hcsHl6Z9/09i9VK9nammZ4d+S7DkgG6auKwawREQ04EjoqLVzXdtgtaMAtq9re94Stpq2g28W1qAoJEGVhhy/G0tqo5AMKifowveL6lERjmOzEXn4bnE9huYF8cHMcjhwEEnbcCwLHrcLU8pyMTQ/gK/n16q5udJtHE2lMboopDYziyTS8HtdsGynX75/esjTpfdbU1LISYi2aNEitYHVupBP5aUIHDVqVLszRcePH682+Xr77bdVgNiWbESVCftaBnfS8fnxxx+3Oj+5LBsQ9AR5XdKd2d581XUhRbO8DpmvmiGX2xbX7dlrr71UMCsbdMkc1Q8++KDD++q6jkMPPVTNar3yyitXmgMrAbMEyauaAyuBquy++/vf/15d/stf/qI2BMuQrlWZ1SoboW255Zbquo7eL/m3JvfPnMdnn32mzlE2IesLv28iIhq8UvEY3D7/Sh+WtgxbM/dznM59oNoXSOduOpmAx+dfKWhdNutnFAwdoW63TVM1NliWCTOZQqS+Fg0VFSgePQYV8+YhmJ+H+T98C38giEhdDSzTQoFsrhnKgSfgx+JZM5CONq1Yc7ncCBUVIdZYD48/qIJbmaUbKihEf6P3Ur2eWQ0k9WhLs2bNwsiRI9f52AxgiYhowFnTrtay3NV/et5RyNid4WOme7WlBdURjMgPqlBUnveX5fXI9nnh1jU8//l8jByShe8X16M4x4fZFWEVwMYSJhpTJiLxFFwaMKc6ird+LkeB34WfvPVYUBtFcdCL/Cw/GpMOPIaOuJlGNGWhPpps+tTdsbDjBmVwYKMyHEdBlk8VlXKOeYHfuhL7C+/oHBg5nlUuazJyvOp+XSkUCuH8889XG2/JHxvbbrut2pFeAi8Jw1qGg6tzxhlnqM0ADjvsMFx44YXIz89XIw5kA6i///3vKuS76KKL1G0StkohWVVVpTb+ks2lZEMBCWglUBw2bJi6f05ODi644AIVHMourxtvvLHq+JTuyieffBI94YorrsDee++tls7/4Q9/UGGhjCWQjbNahpCdJa9LNiiTZWS77LKLGiUgm1zJJmSrYxgGjj32WDUCQIJtWca/KrJh2HvvvafCUflZOlWlO1c2GJNN2L788kvVJStkrEJ5eTlM08SSJUvUZmHSaXvaaadhxx13VPeR96IlGTMg5Hckv7tVkd+r/LuSTbZk5MBZZ52l3oeSkpLm96U3f99ERDR4efyZUQSta/G2lzP3W522wW1H13WlTKiaIRuMJWNxhPLzVQ0t4euSmb+gcNgwVC9ZjNnffIbxm22F+T/8gFlffgZ/VjbMeArhxjo4lglL02DFYyhfMB9Dx41XK9KgOQjX1qFo+HA4moa8kqGI1tUjkJsPjy+Iyvnz4fX4VRg7YvIU1C1bDrfXByudgmWaa/z+9TXeXqrXhdTq06ZNUyMIpG6SzWsffPBB9bWuGMASEdGAlBlB0FIibcHnNjqct9iejkLW7gpf62MpuHQdWYauQs76WBquFaf8ydwqxFMmKuqTWFgbhgYNlZEEKhuieOarefB5PPC5dCyuTUBGQCWkbpOl0SEgHAdkcEBNYxoLDMA2AVmVZCZseA0gkrLhNzR8Nb8O08a78FN5PQ6YOgJel46ldVGMLMhCZUMSI/KzVFdsf6XpGnL3Gat2T+1I7j5j1P262rXXXquWqUsQJ2MCJIiTrs9LL720U8fJdC5KyLrbbrupuaTyqfwee+yhQktx+eWXq05LCTWlC1KWU2Xmg8r10lkp81bl9u22206Fh
hLQSSj8pz/9CZWVlapDVHaCleCxJ0hn5yuvvKLOS8YrSHAps23b6+LtjP333x933323CiLPPvtsjB49WoWN06dPX6PHS2gtRfhxxx232vtKGC6dpjfddJMKjRcuXIi8vDw1o/bWW29VQXeGhOjyJSF5QUEBNt10U9XZesABB6ArSGesjHOQLt7a2loVbt97773Nt6/q952Zn0tERNSdmmbDth5PkE4lm0cNrGm93l7Q2p3ha6yhAW6ft7mrNZ1KQNcMxOprsGz2z/AGQiifOxvpVArfv/OGOvdls2di7tdfw0mbMg8N4fpaOFGZ0WoDhgtZ+UWIRBqgazoWzZwBt8uFZGMj/PmFqFiyCAXFQ1FbuQxWKgV7vg6zOI54XSNGbrMh0mkTtYsXI6ekFBXz52HYBhPbHfvQX2i9WK/LGCj5UFw+fJeaVOrGu+66S41+Wlea7MSFHiZFnRSgUvRxVhUREfWUtGWrELZleBpJmu3OkO0pmYLyp6X1TV2rKRMp01JB7K/LI5g0NBsf/FqlZrMmzTTqImnMqWxANCnFCWCZQEPr/bHaJa9QSlnZakt66GQxV2EQWB4Fhua6VDfwhmU5mFXRiE1HFaIo5MGIwhDcLh25Pg+G5fnV6IGBUivEf6pG/X/mtvpkXT5Jl2LOv2H/W6pF3Ue6V2VDs8WLFzdv+tXXXXXVVXjxxRdbzYftDNbqRETUW+KRMHyBILQVHypnRhH0VjenbHAl5yLfZRzA0PUmoHbpEpjpFNLJJOqWL8PwSZPwzWsvIxFLwIGGeLgBdcuXIJlOSRMr7GjTXgCrpUnXhWwwbEMPZsNOJgCvB0jEUTB6HHweH7KLipBKJlA2dgOkzSQmbrMD4vV1COTnIysnF6424XV3Yr3eOeyAJSKifqczy/9bdsK6DV19tSThqxxPZI7ZXvdsV5Hn0jUN0WQa86oiiCZTqI2ZKAy6YZsePPHZQlQ1JODzaJhVUY/qekc+FIfbA2grRmBVNI166hRzxZeQT17lFddHARmdJCucgl4XokkLG48sgNtlwIaGyUNzm65PmT0avvYEKdp8EwvU7qkywF9mSMkypu74JJ36J+ksltENEmYedNBB/SZ8JSIi6gsSiUSrTTQ7Ikvlbdtq7nr1Z4VWuo+Er9JpKlVsprOzve7ZriLHTsXjaiOrmqVL0VBdAZfHA92lo3b5Unzw7OMwDDcMjweLfv4Bzr/1piX/bo/q4NV0N+xogzpWpzoend+6KmwrpUYQIJFAztDhSDU0IDg8hJziMiQj9fCFsuBJ+5BXXILs/AL1HvZk+NoT/AOsXh9Yf00REVG/IF2oEkJ6XK3D0DUlQWkmJF1dGLsmQWrbx6/qMWsa/iZNS81UlXUm8nob4mmU5vrV42sjSXy9oAbL6qLwuFxYVNMAGzo+mV2jHit7IyVMIJE5mKSlTRlxl5BDlYYA04aaGZvl1jCqIIBheVnQDA3Zfg8mlWar7tdwMo0cf9cPuO8LpHjzjW2ax0nU1tNPP63GD8h81Mcff7y3T4eIiKhHresMVQlfMyHpqo4lIaexBtHUyvNi2w9fzXRT0exyr75el02qJFiVsNXt8yFcU4NQQQESkTCqlixG9ZJFSEQiaKwqV5tZlc+ahcolC+Dx+tUYArXbbYvQNKXWmnUydO2INEhk50N3G2qkWOGIUcgqKFTvQ/Go0YjUVWPCtjuo91Zq2syGXwONNoDqdY4gICKiAachlkJO4LfQUEJPN6x2P4XPBLkSmApvZuBqBzrqjpXHy2MrGuPQoSGSMKFpjgqZf17WqOaoLq2LYVZFGGnTRMBjYMbyBoS8OpY32EimgSR6hpSvoSAQNKA2Y5o8NAeThucix+dRowhGFQbRmDAxpqhpw5/ewFqBqPfw//6IiKg7yXL+VCLRKlRt6nBtf3ZrJsBd01C4ve5YeU7ZCFXm5Ucb65GMROELZSNaXw9oFuqWLUO0vhGLf/kRdjqJeCKFRETuF4OVTsJxNKTC9
egpnlC22nzXn50Pw+3BmE02RSAnG4GsEHKHFCO/bJja9Cu7sAi9hfVC57ADloiIBhTbltCzdYja1LHqbreTNROmtgxe2+tybXl/y3bURlRynXyOmeV1Y3l9XIW+M5bUozKSRGVjAkuqIwh43fhoTgVsy8bSegseDfB5gKpM2hpeMVegB//DL521hS4N65fkY1RREBuPyseE0hzMqQxj/ZJshPxu5AZ69ryIiIiIaHCQZford7R2HLxmbst8TyXiamRByzmxre8fVEvydb1pkyy1PN/lRjIeh+PYWDxzBuKNdQjX1COdSCDSUIvl8+bA0HVEKpbDyMqGFendDSlTiRRySktQOmYcPMEgRk/eGG6vT7320nEbqPu4vQOz63WgYgBLREQDinxSvIpNUputaoxAe7dVhhNw6zrqEykVwlZHEiq0XVobRyRloiGawqdzqjCzvGkzLdlIqzKcRDJpoTxiIzM6PukA4aR0266q01Zul4JSWzG5teV/rlf32I5JyZoXAvKCfpRmeTC+LIQNSnIwsiCEvKAXm470qPBVuNrMyiUiIiIi6gqyXD6z/H9VOup2bW+5vWyIlYxHVKerNEjICIKG+grohhuVC2chkFeIxTN+xrKZPyOZjMLnz0I8FkbVwoUwskJIVlU0H6tXw9dgFrIkRA6GUDhiJAqGjcSQEaNVp6uMSTBkc4gVdGPt/iag3sEAloiIBhyZL9uelGm3O3dWrjdtWwWq0tma6aSdXx1BQZYXpmWjIODF818thNdrIBo31XULaiIwLQs/LalDbSSlxgjEU00zVuU/sHE1AarluWTCVCmWMh2m+opQ1Wnxn2WjRdDqWnFfp/lxOhzYnfhPuEvdW8cmI7PhchsYXhDEpLIQfG4XhmT7sUEplwwRERERUc9pbyMl2chKQkWtnVreMtMwU+lWoWyktgaaywWXy4V4OKzq5V+//xSpZEL+IEA6EkGkrg7LFy+A7gANFUvVsn2kUoCEuImmsQdmL3e7NvMHUDJ8JAqGjkAgFII3mIX80lL4s0O9OmqAugYDWCIiGnA62tzL7mDsuVwf8LjUaAEhn5r/uKQeQ7K9+HBWJRxoasMsmVLwyexy1WFbHUnj5yURSKNoveSqbTSN/9dW8Z/dludorHhEy27Xtp9oZ47lbo5uWz9bxx29pdkGRuX6EbEsbDUiFxOH5SKRsjCqIISNR+Z1+DgiIiIiou5guNqrXTveosixHRW+ZkYLSL1eX1kBGBoSDWGEa6uRiETh8fmw8IevYbgMNFRVoGbpMtVpa0YjrQ+4InztK3KHj4A/lAtfMAiP34sRkyajoaIC+UOHIb90aG+fHnUBBrBERDSgtDe/NcPnNhBPWXAbMqZAQ0VDvHnJ/eLaKGrDKRTmePHw+/MwdVQu/vr2ctiWibRtYWZ5RNWEqaSJqO1q7nI1W6WhHY0HcFZ8Zbpd2+uIbdv52vJY6RY/Z46RuSwn0P7yozwvMCTkxYTiAMoKs5Hn92G79QrREE8h2+/FhLKczry1RERERETrrKPNtDKhbOZ227IQra+D2+dXY8aWzJwNTTfUz9+/9QZKx6+PXz/7BB6PC/W1tQhXVajGCjMSlhZbSW3V8cx0ZhhY3xMsKobH60PZ2PHQ3R6Mm7oZArl5gOYgp2gK8krLevsUqYswgCUiogElE762F8TKJ+UyYuDXijCqGuPwunTEEjaqojFE4iaqIkmUR5JoCMfx4awlMC3AkrEC8uU0xZwptaC/yW+Nr1LU6dChrbgtU+TJXCu5puUjXCtCVLmP0SKcda+4LjMLKxOqWm1C2vZC3tYdvzkGUBgyMG29EkwZlo2i7CCCHh2hgBfheApbjOESJiIiIiLqHc2bacVj8PgDK9XrUhtXL16EJbN+xrD1JmLhjO8Rb2yAyxvAou+/g8vvQ0NNJeZ/9w1sWDBNB2Ys2lS4Z+riFeFrX+UOZaOgZBgKRgzHuE23hNfrgyOb9QaC0F0uFAwdvtImY9S/M
YAlIqI+JZFIIAVDbXS1NpKmpeazNsZ/C2BllqvH0PH+rxUq6vR73Ph2YZWa25rtN1AfNVERjiGeNFEfS6su17TtQtQ2Yaml/1L8OCvGCrQshDKzWSU0jcFW95VNAQz102+LqDIBa9NjXEjAVJclSE22CFTd7XTAtg1b2w9f5T/o3hUR7/RJQ+B36Zg0LBdDcppmvMp7ku33IFTCea992ccff4xTTz0VM2fOxO9+9zu8+OKL7V5HRERE1FuSEnaqIDW4lo+PwXC71XcJYOORRiQjUcTCDQjX1qJmyWKECgqx9OcZqJg3V4Wy5fPmweNxIxJuQCLcCNO04NgmkJRauqW+HbzC61NL6DaYth08Hi9yS0rV2IHCEaOQikXhDWR1uPkY9W8MYImIqM8Erz6fr+mrE4+ToFW6WuW7pJ6/Lm/A5OF58HsMvP7jEnwxtw4luToiiab7f7ewBlWRNLZerwjzKxuxtDoFjxeIrbhdSjizeeOrzH8mrRVBZ2xFAOpts/TfbBHMyn0tOCv6ZZvCWXPFl8SyBkx138SK42stjtXymJkuWXebTljRenOvkA6kbGDYEA8O2mQkKsMJTByaj3AyiU1HFiDL50K5jFtYy1Cbes55552HjTfeGK+99hqysrI6vG5dLFiwAKNHj8a3336rjktdq7GxETfffDNeeOEF9V7n5uZiww03xOmnn44DDjhAjT+ZPn063n//fXV/j8eDwsJCbLLJJjjuuONw4IEHtjrerFmzcMEFF6ggPpVKYcqUKbj22mux44479tIrJCKiwSozGqCzwWukthpuX1OoWFe+DI5lIbe4BOlUCj++9yaWz52LvOJSRBvrkYxG1NxWK5lEwdBhWDjjB0QaGuDIxlqy+YJsoNXvaIBmAC4dIydujMKRwyCNviMnTEa0sQ5FI0dD13R4fPLeMnztTaNGjcLChQtXul7quL/97W/rdGwGsERE1CdI8Lo2Qj4X5lVGUBNJoDqaxPeLavHhrAqMKcpGVSSOGeUN+Lncxv+z9x9AkuVrehf8O/6kd+VN+57pcXeuX2+02tXKg1YWIpD4hAn48IFAGAFCIZBCAkkQfBBCAi3I7iIkVhJi/WoN1+i6uePbTpvyLn0ef84X7z8re2r6julxPT09/9+NvFl18uRxVdP15nOe93m3DoZ4DrzWBd+E/a/foRulKjSgH87a/k9iHgui2fHzSfEyP5ZqRQR1TwzBmoUSJMfL5XUpEm1MRuRKWi6Ot+mecMZmx4LszE07G8o1E37N42VyvCLu2mpLNhauC6faZb5wocNCrYRnW/zg40s8e6qlhB7/eCBZ2dN/8u8lz3NVYI1GIyVsnj59GvMjbvW6fv26cruura297bKHhSRJcJxHT9h/r+fV6/X4/u//fvr9Pn/6T/9pvvCFL6jJzCK2/of/4X/Ij/zIjyhBVvhX/pV/hT/1p/4UaZqysbHB3//7f58/9If+EP/iv/gv8j//z//z3W3+zt/5O7l48SK//Mu/TKlU4i/9pb+klsnvxdLS0gd63hqNRqPRvB3vVRws1Zvs375JOB4SjkfcfvklHM+ns7LKreefIxgO2b5+ldHhAYZhEXT3oVRmb3sDgqnbVnFylMLHBdel1GxSa3RYf+op6nNLRIMel77vh/CrNWzXwbJsTMsiP3YWaz66ev1rX/saWfb6L9qLL77Ij/3Yj/H7f//vf9/b1oESGo1Go/lYMAgTwiRT2a7Cc3eO+AffusNf/9INfvI3rvIXfv5F/sZvXOfm7oAvv7zB/+8XXuZXX9xhrzfg1Y2hylDa6eZKxhzlcBBJDWccRwHMclpHGEowje/5UzlzqZrKvyriq/WGCnAm4GbHwuzM9ZocP1JyFU1gHAut/vHzyX2Jq5YTDljZRulEZuyUAo+mZ1ErWZxf8vjei3N84VyHZ9fbnO3U+NGnlvn+xxbUulXPxhanALznSIdHlZdfflkJWf/b//a/KaeiPMv3svzDLCD/zJ/5M8p9K
kLas88+y9/9u39XvSZOSRHMDw8P+aN/9I+qr3/yJ3/yTZfNisHf9tt+mypEFxcX+Rf+hX+Bg4ODN+zrz/25P8eFCxfwPI9Tp07xX/1X/5V6TfYvfOYzn7nrxpy9RwRBEXrlPeKO/dmf/dm725wd40/91E/xQz/0Q+qmyd/8m3/zTc9V1vvLf/kvK6GwXC7zxBNP8OUvf5lr166p/VUqFb73e79XiYgn+Zmf+RnlBJVtnzt3jv/yv/wvlUD5frf7P/1P/xPnz59XbtPHH3+cv/7X//p3HK+s87t/9+9W2xDxVK7df/Pf/DdvWO+5555T68r+3oz/5D/5T9R1+upXv8of+SN/hCeffJLHHntMia3y3pMOZjl+EVDlen/3d3+3cs3Kuf2Vv/JX+MVf/EW1jvxMr169yn/0H/1HyvkqQuyf/bN/lslkon4H3gr5PRGhV+Iq5D1yPX/8x3+cO3fuvKvrotFoNBrNu0EGZs2csmkccfvlF3nxV3+ZL/+ff4fnf/GX+PLf/z/4yj/4e+xcu8zVr32Fr/5fP83WjWscbN2hu7ONYVsq61URTN4ovn7MMMs1ynNztFfWufDpL3D6mU8xv3qWztIqz/zIb6Fcr1OqVnFcT4mv7yfS4VHl5Y+gXp+fn1f12ezxj/7RP1K1ktS+7xejmCYcP/DWrEajodwB9brOotNoNJqHLQbgo+Te4VnjKMU1Df7fGwd8Zr3FX/4nr1LzXX71yj6eZbDTDan6hsp+HYYpcVwovbI3yQiwaNpgZDAsCkwSlS87FT0NDEwK9fXMmZofC58j8daeEFTt48fJ2IDieD33WDyNjpfJ9g2MPKAwneNtj4+fT7pq4+N9zDJmZ27Ze924CSaOWuqYcH7B48JinXrF55nVFqc7ZXYGIb/16RV6k5isKJivfbQ/w4e9VpCi7ad/+qff8vU/8Af+gBLOPmhEAP0bf+NvqMJRRLFf+7VfU87Wn/u5n1Ouyf39fSWCiQj6B//gH6RWqzEcDt+wTK5JFEVK1PuX/+V/mT/8h/8wQRDwx//4H1dCpbgkBflehLy/+Bf/otr29va2ypCV98id/S9+8YtK5HvqqaeU+NZut9W6f/JP/kklAoo4+7/+r/+rWvbSSy+p451FF0hr1n/73/63ah3592J5efk7zlVEytXVVf7CX/gLSsiV4xEBUkRVcYKKICyisoiEEq0g/Pqv/7oSVv/7//6/5wd+4AeUiPqv/qv/qnKE/hf/xX/xnrcrzlK5dnLdf/RHf1QV0rLuL/zCL9xt45ftLiwsKHFTCmxxrcrPSgRmOf8Z/86/8++o/c3iA04iArZECYhDQq7h2yFisRy/HNObbUOcsP/j//g/qsw7EZnlesi6IozL85//839e/TxbrdZbCrBy7UTkl+spP2NpnZPzkiiDd7oun/vc53StrtFoNA8hSRhiuQ6mee9MgAefAXtSLFSCaxqzf/s25UqNl37jl8mznP07NzEomAz6WJbDZDwkzwrl+hweHkB8nN/q+sdfz0wPH0/Mao08zWgvrbB87hyNhSVay6vYjk210Wbx3HlG3SPcUumREFsfxXr9JBL9tLKyouLA5Cb7+0ULsBqNRqN5aB2vUoJd2R7wj1/YUuLi9d0hc1UXx4KdUcwrG2MWqnAwgoU6ZCnsTgpyUlJpz1fyqqXcqpkq5qYZrK/HAMizyJsBiXKbZsc5rzNBVpgJtukJIVa2E4iH7YSjdbot+d7Ap1Cia+l4PRfrWJyd+mZLmByR03pDlqtDSnJPOlDThOWOnLPJZ0938BxLuVqfXm3xg48vkBeFEq1Xmo9WXtSHVSuIwCWCk2z/rZD9/bv/7r/7gbY3iWgqIqeInt/zPd9zd7kIouJm/Ft/62+p70U4lOMT0XHGvcvEnSlipQi3M6SFfX19ncuXLytBVO7e/w//w/+gtn+/GbAibP4b/8a/8YYCU4RaaaOXzKvZ++RYRIh8O0TQ/BN/4k+orFLhK1/5ijrv/+V/+
V+UQCr8nb/zd1TmqQjIgoiAv/k3/2b+4//4P767HRFBRRTc2tp6z9v9vu/7PiU0n2zrl6J9PB7zf//f//fd7crPXATnGbJPEXS/9KUvqesgsQRShIsrVtyt97K3t6fcyCIO/3v/3r/3ngRYQdyw8rvyj//xP777s/1n/9l/lm9+85vqd1KEYjluEcDfChFg5RrI9fmu7/outUwEWxFzxZ0r5/N21+Vv/+2/rWt1jUaj0bwjwXAApslr3/4GN776Nbxqia1r17B9H8s2VfbroNtXNxRFiPUqFZIwIBtPPtZC6xswbZVNu3bpSYb7+zz2vT/MZNBl8cw55tbWaS2t4FerSrSuNKYxRI8Kj1q9fi8iAP/z//w/z+3bt1UN+H7RgXAajUajeSgQB6vkl17bG3AwjHh5s8dcrcRXr+8zX3X5+Rf31J30q9tjJVk2alPJcxhC2YXNwcxDaijhVV4VKVaQqAGPmEj92Ztluk6HWkmmaqZE1iFQOSG4iqgqxaF9T6SAPM8cuiLwyL4kyCAnUeuJq1Zw1DtTJcIWZOo9U+esxYRMuV/jY/fsVJaVrS+VYCeAlgO1ksk/89k1JbJ+9vQ8RWFQL9ksNnz2+iEVz34kxdcPE8mQertiTpDXZb1Zq/4HgbSsi9AqGVL33ll/OyHtzfj2t7/Nr/zKr7zpQC5xjUoOqQi+ImbeL3LOIjiKKHcS+V72d5LPf/7z97VNaZmfIcKk8Mwzz7xhmbjuZd9SRMt+xJ05i0oQJINL1pFrJy3772W7r7zyinKD3nte/91/99+97XlJof07fsfvUE5gESz/4T/8h+q6vlUG2AflaVAfUg3j7tciiovoKqK7RFf81b/6V/ldv+t3KSeziO0ios6GRYhTdub8FberiOczLl26pMR8uR5yPvd7XTQajUajEbI0wbKnNfCtF76tXKx7N6+TJCnBoEthwo3nn1MOT9t1Sccj/HZbDc8yS2XyKCI83OeRwPHFjoxRqTK3tMr5z30eyzJZPPe4ejYsC79UEcsj1VZLOYQfNfH1UazX70Vu8Evk1wchvgpagNVoNBrNh4oIFtI2m+cFYZpRdr/zT88oSPjFl3eUFPpT//Q2q3WfUZRxZaeHJcJmYXAwlAiBjIZrEiQ5ljX1lHbTqQBaVU7WBAuDAJcaEybHg7EcJb5OBQ2L5K4LVSIEZKvyeN3VahwPuhLxVZDXZl/P3KrFPc/iXPVOrCfH4ZOeyIM1CchpKPF36sblhJAra8t3GWvtMktZwefPzhEl4gI2ORpF1DybL56fV4LrXj/ic2c66p0n4xo074wE+H+Q673b/YpzUZymJ5H/Pt7ttkSAk8zQexFB7saNG3yYSEbq/XByiNVMUHyzZeJymJ2XZL7+xE/8xHds62Q0yrvd7vs5L3EQS76uOGP/2l/7a6plfyYE34u4jkXgFKfpe0UEZ8l8nQmnEikh0QDdbveus0SiCSQmQHLQJBtWnLLizhVEoNVoNBqN5t0ShwGuX1JC4ZsN2hLxNYlCnvulnyUaTrj2tS8xf/Eih7fuMOwdKME1GAxIJL/VLeG6JXLDPE74SsnHJ8Q0+Tv94BuxPxhEgC4KLnz+i/R2t1k+ex7LL0NhEI4CmgsLVJotlet6tLVBZ+3U+xpe9kll9BHV6ycRcVc61/7e3/t7H9g2tQCr0Wg0mg+VmTBimsZ3iK+9ccSvvLrHTn/ML7+6C0VOb5QwihKsJGJ3UNApwSgR12jGWIRKA8LCJOpN/4jFZNjEFDj0MKioXFaPofKTTsXZhDIOE5Jj8dVSUu6sEBSRxlbip6StqmNVjtVpNqtDyDQMQYTUqcjhEJEoyTRVYQOh2oaIt/7dDNhMuVvLx45ZT3lwp8jr1bui8UrHJokjcmw+f26Jz59qszdJcYyCM3MNLq226E0SOlXvruCqRdf3zpu5Rt/PeveLZFSJ0CotTO83xF+GVMkgAsliFZfjvUheqwhxv/RLv/SmEQSSByqcnPAq4p7c3
RcH6snjk+/FLfkgkPOSCAUZfvVBIm33ch4nYwPk+/vJDfvtv/23K2FWhlXJQDLJ7X0rpAVOsltlkJVk1t7rlpAPCSIkv9nPbIaIqiK2/t7f+3vV9+L8nW373n3NBGaZBvxmSCbw17/+9bs/P7m24o6W6/F+r4tGo9FoHi1EMHwroTAYDvn6//MzytV59Sv/L4a03BcZ25cvE/R7hOMAt+yTRMd5rllKEI4pgjHhm+WcfszEV7fWwKtWyOIU0zZ57Lu/n0qtxuLZc2RJwvqlpzAsg3KthV+uKCFbmImvmo9PvX4SufEuHUjSDfVBoQVYjUaj0XyonBQOxAUbZzmbRxN+5uu36E4ibuz12R8GzNdLZHlG3S04GIXcGeVUDZPdIOdTyyUlgV7bjbCSDI9ESasigi5YBqPMJ8TEYESkhFILj5CClFiJoiK9SvaUcRxMkN17lMciq2zVIFdrylYkS1a+ksdM0JWvHAwiFSYQnhii5RIQK6FVRFb3ONZAXK+m2qZzvBVJfq3WbM4vNXl8qQpZzrmlFj/x2XWu7A5ZapRI0pSl5sc/nP9hQ8QqERvfKVPqrUSt94oM1Ppjf+yPqWxQEc5kMJbkZYngJft7s0zRt0Ja0mXA1j/3z/1zKh9V8kIl4kCyT6U9XUQ+GU4lr4nYKm3lMuBLhkn9S//Sv6SKSRFoRVBcW1tT60t+13/wH/wHSjiUSa+STyqFpwyckkFUD4L//D//z9UQLsld/X2/7/epfzskluDFF19UubfvFTkvyTaVqAfJmZUoAXEziKvhnRCnvWTvSi6tCNsn83vfDIlP+Cf/5J+o3FX5WmIN5CaUxAf8mT/zZ1RsgLhkZ+Lqzs6OEkol51WGYonT9l//1//1u8PBZH8yaEt+P+T6yM9NfvavvfbaO34gkP3+W//Wv6WGcIno+2/+m/+mypedCbLv57poNBqN5tHCOFGvixvWsm1ee+E5rnzlS9Tn5vn2L/+iym+t1hrkRcbwaKyyXONBDyxb1fDz66eZ9HuMu0cU8dS0wGQMIkgWhkzq4uODgV2rYTsuFz//XViWTZHnXPjCFzn15KfYfe065XqTUq2mBmppHo16fYbU6lIHS/31djfO3y1agNVoNBrNh06S5Wx1J7y63ePa7gjPhis7h1imyfXDgCyG7UGAGDv7Ecx5EhBQkBWR8q4OhgH9SUjL8QiTFNcwmRQipg7ZyzzlaDUZHTtYIyWnRipiQEZwRcrNKn/wJKdVwgBkvemArlmkgHyVKp9qqIZ2iXRr3BVf7eP3TZlOZxU5dxpjMB28NR2h5dx1yk7XbuCS4ZsWTgkatslCs8xCzeVUu0pcGKx1ynzx7ByjMFGCk+S7+o5FU4JtNR84co1/62/9rW87VVVe/zAC/WVwlLSpixAnMQEixInr891OVZ05VUVk/S2/5beomA8pQE8e93/2n/1nqmAU0U6yXSWa4F/71/419ZosF1HuT/2pP6Vel9xQEQ3/7X/731ai8L//7//7aqCUOCH/wT/4B0p4fBD8+I//uGq3l+OSeAURECW39M1cvO8GGWAluaYyPEuGh0lWmBTVMgjrfhDR+r/+r/9rNdTqnRAxXAZf/dk/+2eVaCztayKgSkbtn//zf14J3TNESJWHiOSdTofPfe5z/NRP/RS/5/f8nrvrzM3NKaH8P/1P/1N+5Ed+REUNSObrz/zMz/Dss8++7bFIVIL8jsjwiM3NTfVzliyz+7ku75S7ptFoNJpHjziYsHP9Ot3dLRVFEI6HTHpHTPoD4iggCicM9vemDtY4nAqr0umWJFimxf7tm5AX4HsQp5AdmxvC6VDMhxrTEhUat1qhXG/RWFhUsQqdlVWiIGD9qadozi3ilMqYlkWpVqexsECe32vq0Hzc63VBbkZL19pswOsHhVF8UBMDHoJJaRqNRqN5eBiECeMoZTCJGYYxv35ln9Ek5JWtAa5rMQgz8izFiBMOJxINYBGGOfNeRGK4TMIEwzWVHNqLZ
bxVqtaRoADv2FEqya4JJeU8DcS1moNjyjAsQ6WwToVTCQZI1fq5kkXlq5lEOhVgPTVAS5JZRUAtlIQqkuz0/eKglZaq4niolrhjPQpKKtZAogYkd3bOrnKQGtRtA7OAcsViHMNyo8RTa3V81+J0u8EwjPjBi20mqcFSs0QUZ9imybmFGqaYA9JcCbCfdD7sWuHll19WwtZJoUn2I8WcbsHWnETcqzLQ7M6dO3eHfj3s/ORP/qSaDCyRA+8FXatrNBrNo4+Ih9ForG7udXe2lLB49etfpb+7RTAaUVgW6WRClkjXl8Hw4AApVvNUZiWA5/hESTQVZKWkDieScyQTRl/PS02PnbAPCUa5SjEZgeXhVX0VqxCOQ574vh8iGg6YWz+FZRs4pSpnnn6WUfdQ5bpK11uexLRX19R1kviwWWzDJxldr787tANWo9FoNB84/UnMVm/CK5sD9oYj+kHG9n6XkdwNt22OhmMGQUbZgkrJoJlZ+FWP65sDLL9EFomoWdBwc+LcpZkMSQobny5VXLoqmzWnj0+uPKsmFcl6NafSqMil8rpNRqpelVFemRJWRVSVKlFE2qkLVuRVqRptqhQqQfa4bMTBxTr+TnJhbUzGamuSOSvLKiyVXMLc5UzH5SnfwzYzohx+6NI6Nw+GXFyo0mmUWW9XcC2D8ws1dgch20cjzs3VCJPsruAqX1uiwmo+dKRoE3elOBQlm1MypMRF+mHdSdd8/BBnsUQ3/Mk/+Sf5/b//939sxFeNRqPRaO6HwcEB4WjInkycP9xh0uvT3d6gkMGOlkt4dEgwCSiXKyRppCIILNdhuLdPY2Ge/sE+hu1gmCJOihPUhESsD8c8LOKruFtLvopBuPjpzxIMB0RhTGd5Gbdcxi1XWTpzBq9cxfY96u0OXrnC8OiI+vwCreW1Nwiu0WSsXtd8+Dz5iNXrWoDVaDQazQfCMEzUcKh+ECsh8We+doODUYxnW2wNY5xoTG657HeHTCYZy22X/SCjGqf4vk33oMdSyWJvnFC1RRiNyYOU2Mwp+yXGgYQAmEyUAOoQE9JkSIbDSLldp65YKfvKKiKgwM5zAtNUHlURV2VIVwNLuWXrajiXhAiIEIuKKRhjUFP3+MXTKoO4UN/b2IywWal5jDMLI0+peTanFqosNsqcaXvcHOZKUK24Fk+ttkjznO+5OK/crY6RUViOihWQgmG5WVYP4aTbVTtfHyzys5C2a43mzfjbf/tvq/gBycP93//3//2jPhyNRqPRaN43Ei1g2haGYdLb3ebyV77MuHeoWuuTYESWpCrP9XDzNWzXptGZU0KtabtSOBH2hzh+if6+uGEtivEIu9FWPWN2tUr6IU6lf2dMsC11fKVGg8HBPpVGi/Nf+G7yOKYxN8foqMfaU5/CdVw6p9YxTRvDMLDErUuhBonVOvNqa53V1weSnXS7avH1wWI+QvW6jiDQaDQazfsWXeM0Z38Ysrt/QD932DoKuL3fYzIa89xejFOElCd9mF+WHnt2DiPmajmD3KZGRJhJOEDBsB+y3PYY5h7j0RjSsRJU5X5hT0msqXKeStkzwcOmICBU2aslWbMwGRoWlXxM16xTJ1Z3Gj0iAjwcRIwVV6xHhTFjfBVqIMKuQQWTUK1Td1LGRQkrL2iWc5r1mhritVb3uHBqlVu7XZY6DUpOwfpck+89P8flrQHDJOFUq8woLvjUWpM4E3ervtf5XtC1gkbz0aH/+9NoNJpHg5Nuzd7+LnmWEfS6ZFnOa899nSRO6e5sMzzaJQ0jLM9VQ3Mbi8tsXXmVUrlCvd1mf+MObrlCmkRE4zH1hQXi0Ziw351GEDxoSmWIYnAsSpUaWRxTmetQb7SxXJeFM2fpHezjqBzXddYvPUm5Xmf7+hXldJ1bP0N/b4e5U6fVoC0RYTXvHl0vvDv0p0KNRqPRvGvCMFST0/Nowmv7Yw5Tn0EYUnVLkGWMJwELzRq3R0OW7IBSxadr1TnqT3h23qBhFXTcn
CtHMeMooeIY3BnatCWzPwjoB2MwS8RKNB0rZ6uDQZuQHjJ0K8YnIVAhA45ys06wKBsJdZFUTZsVEiYU1AgYUqVuOhzmMbXjcVk5Lk3LxSEjyzIca0JqlWj7MNduY+YJjWqZUZhhOS7ffWGROJdk2Yzf+elTfOrMAqZpsNGd0B0GPLXeouRajKKUCzJNTBytpv4zq9FoNBqNRqP5aNyuErslOabjbpdxv6v6vPxSlaPt21iOh2Hb2I5Nc3GV4dEB8WSMZbsqhmBueYXJcEAwGuD6NmkcEE1iDMdheHhAEQTgl6fZr+8C6TKbDrO9B8tFTea9u6IFRXYstkYgA69cHwyD9rwMxCphug62YRKMB6xeeIql8+c42LhNrbPAhS98D9V6E69WZffGVTUc9+ynP68EV7k2yxce+wCuskZz/+hPhhqNRvMJEUs/CKdrkuX0JjFpHGPHsN9P2ewmXN3vc2a+ws9/86o0IPH87S5/+AcvcXl/zGdOtbm9e8RaOWOt4fLy7TFzXsZ+tUKrnNOuuGwcTDjtSM4r9FM51hAzD5g3c4LcZawiAWBMQRlLDdgyVdKrOFkTqmR0GJMYVZJimug6JEci8yumS6nIiPMJbbNC08zppxl1r0IRDXnqzCLDzCSKExoVl+VOjWatzpm5MoXtsVS1SZOEcWLSqZc5P+erNqw4zah6DqutMiXHUmKsINdJo9FoNBqNRqO5b8IB+O/PQSiiolcuK9erEl6PDlVbvRgJZMjW3q2blKo1Nq9cIQpHDLuHnHnqWYLxCNf1qHfmmbgetuewceUK9WqJamcJS4ZxHaREoz6likc4HoApAV6S4SWBXYK4YF8XVR2rQMXC3l0mPW02WCZWnpAWJ2K3xH0qEQDJsfjq+ZDlkMbgV5TDdvWpZwiGY4oiVbECi2fO4pbKtJZXZJPkSUFtcZE8Tpg7dZ5au6XONQ4DbNthbk0yXl+PFDj5tUbzoNACrEaj0TzivJn4OhNU74eT6+Zpwm5voFr9XTPh16/uUTcT7mx1MYoGjlPGTSZ8asHmqNujlAVsbkaqcJPBXJ26zdkFj3wyIA6G7ByN8EslepGB53sqqqCeT+gRKyE3V1muDktWQmw71IuM7SRn2QhU4muQ94/9sCaOX+colErPwjcLWg7EqQmmzZId4pYqXB1a1Eo2jzc9DoOcs8unSQqX9YrBhbOn2N454gvnF0ksh3bVo+Y5LNR9bh6M6FQ9qp7NOM6U01UeQsXTf0o1Go1Go9FoNO+De8XXeAy2Px0g9S7EVyHPczZefoHVS0+ydeUyh5u3GBwcEokYWRIjgYntOMytniIadBnsbbOwvEC4u0FkVmi1KzTrZcqOSX93k/FwQsk1lBCaxDG+ZZCnE9VPliXRcddXSlJq4SYjIon2ymXoLbhMiCljOja+CZMkJ7ctZBCDIQOwrIxoLPV7oZyqfr3J+HCX1vIcRqlOEsU05xfx6zWWz16gsbhKEozprK9SrjVVfd9aWmLYO6JUrqpzd0slkjDAtCz8SlUdhxZcNQ8D+lOjRqPRfAK5X/F1vzdU7fwySMq2DI6CnOFwhGX77AYxc/QoVzustEuq/ai/v82leR/Hs3ht4xanWx6D0ZhkklGtlhiPYrJ4QC+U5qMU1wxJopySFJg5DKMAKTPPWCGx6dFNMlxDQgJyomjIIRYlJjimTWL7BFFEatrMVUxGucuCPSKkIE7BNwycagXfs0hDm1bF51NehldrcGGtw3qzxPmWw9raGrtHfdK84Lc+/SlsyySIM3Z7Q+ru9DqdmZsWb4KIsBqNRqPRaDQazYeGe3+DnuLJhLwoSOMQ0zLJs5x4EtDv9pmPYo52t5VxYuXxJ7jz8kt0t7YZbF1X3w9394ksm0arQTIZkbktyQKjv71NnoSMkwK/GEEyIHSXaFhj0rRgHMmA2oBms4IVT9ifeGR+hVa6w1HsqvG14mVdbLnsTZq4SZc4rVA7tUiwN6Tsir48gWSEb3t4Cy0l6
hZ5TmqYNBfmWbr0aWzT4dznv4hjOzTXVjna2KTSaCjXaxpHOJ7P4cYd4jCk3p574+WT2AKN5iFDf4rUaDQazV0molxmKeWSz1ZvgmdMU5q+fuU2WRxiuz7b/Yj97gEUCeQWHStUU1Mj02OJkG/vuMz5CaZhcGP7iE7VouOkXN8O8CseZhRhZQmu5TBJMwwrp27ISC2DHibn3ZhuUaZchGRWAzcPaVkmVSOjbAYc5A26qc16OVftSacrEbtZmXoxoe94lK2CugltJyG3DdbmyjQbDc4vNFQo/0FU8MNPrbF/NKTTaSLJAdVKhflmlSwv1DVwLZMzC82P+seh0Wg0Go1Go9F8R3eamyfK1Tnu91QWajgaqmiBNAnxyxWS7g7f+Ed/XwmpzeVTdG++iimRWdE+9axLEckwrR52EZEebGO31vGKAd7oFnF1iSV3wHiSkuY5sVGhlncpewbjSU5Z6vKyryIDoiBhpVOi248o/BIt18KIR6RehSgc0yl5GI6JMz9PcrTFWs0iSS2yuQaEfSrzLbLIpH7mAounL5EZBqValTSKWbr4OK7vU643VJRBY36B+tyccvs6nqeuRWdt/aP+cWg0940WYDUajeYTwttlwcprrusqUXJOWu73elRdmyBOeXGrz3LZ4ueuDHDMI1omtBhzkLh89fouT62WmQxHjNOcPE4xwwPCSYJlO/hpQNKN2cLHLjJGI4jMEvVkwFgyVI0MM+sxycoMcOkQchibWHZKZFdoZqG6g34UZyy7OWOzgVPIHXUbL5oWghUpzEKTPB3xhfUycwtLjMcpuZFRcy3Or6/Tnmtxfqml8mt/qF0hCEJWl+bvOoFnl8UyDcqu/tOo0Wg0Go1Go3m4smCnQ7UgGw4oqjV2rl+htbzG9svfonBKRMGY7muvYFkGdtLFNetkk022vnVDggmoVW2Koy0sI6S68yWGeQMjOES0zFJwXYJUaZRTtnp3yMwcv5hg2GWa1QPizKJkOrRKIVFmkOPh2iHmcoejYULZntCWqK7Uxq56lJ0JtYpPFE6QhrLc2CfoNGjlmwxbn6fa6RDbLUrhFllljvUv/Bh+tUatM68yZv1KXYmv4m6dxQfc+6zRfNzQnzI1Go3mE4KIr28mwh70huqucksEU8/iha0Rjy3V2Tk4Yjjokw8DXuq7LJkTllZX+cpzlzmaTGgUI5Ydi8H+QMUA3DkIuThX4nYQUsoj6nZMZPh4nkceBSQYGEZGPT7ElCj+rKDiu4zjgjknZDcrYRUlWtZQTTMN0zGJN0caBKx4BoZpU0uHzFfK2PkIs1JmbNeozDW4ZJvER9u0ltZ5+uwq4eCAdrNNu1GhWS3jOA6+77yeZSvb0lECGo1Go9FoNJqHCRFfwz744vqcEgcTiqJQAqy01ju+x9FrL1FbPsvOjSuEkwAj2MbLE6rpAUbnKXqXbzHe/SqLa+eYTPapmiMalkFiBvh+mbi3yby5Q1IUWJU5qkwIRkPKMnCr6tBMN3BIGNHAqHoEkwn1yoRB1qGT95irDNgYVXDyPU4t2uTDAWl9gSSJ8JyMIKuSp2Pqi2cIC59qvUkWGizPLbBQWqR2+kk8xyb0v5dKZx6nXAXbw/VLdwVWGaClxVbNo4T+9KnRaDQfY6QYi+NYiZz3N0zL50iC9G2DUqmklnm+z43NPYr+mEk6IpiY7B/GXLuzx97WBtVakzQN2R1F7Lz4EpZjYOYx/eEQvBb5ZA+jXGfRDHCDCZ0iwLQtjCKFZEiRZhylVSpEVFyLfu6w6sYEccF8uUKemxxFBnN2H4yAYdpisWQwKjwadqymqDaqvhrKNUk8FUngVqtUmnN8rpYzKgxKnkVz+TE+//QlxjJU68kz6rqIq3d27idlZz04S6PRaDQajUbzUbta33yYVoPR0SGO7yvBNTZdnCxmsHUT06sSjvqYJGy8/Dx1a8jo6reoVEpYaYAZHlHd+iUCy+FstUt/7+uslRokwwNMt4Y92aHdnGeU72H6H
bJoRDO9TWCW8QhxC4c0jbHIqJVznLhP4S0Shjkto0/TDyUmlmHgstryVY0dJCaV2lRAXZmrsrW3h2clLK0uEZaqZJlJWF/k/No8vcCh9eT3UW7NKXG1FIUqy3V27icRMVajeZTQn0A1Go3mY4xhGO8ovspAKclvEvdnmuUcTlLy0YRKM0XGW5mmxXgUcSUdUisbbG7usbPjcOvqFdqtCsngkFe3jqgZIxKzTpJmuJM+RZEzGe8RUKboDig5E8KooYTSWhGRFLDkwEbis+RE7CUeRTyhbccEhscZNyVOItxChnE5qh3KzKFh9TGTlIrVoe2kOJ5BnGYsVQusuk+7UqVZr7IVOJy/dIFSrU7Vsan6troWddsmSZK74uu7GTqm0Wg0Go1Go9F8oNyH+JpEoRqgJe7WNI4JRj16d45YW5+navsMRhPc0S3swGRp+Sw3f+0f0Wy0ObjyFZrVFt5Rn+hwF89tYzkJi2FAnS4RFk40wPOHlLKb9K0atWQXp1rQj/qsV2IMK8BNC0qdnCgMaJRGDPIyFceg7hWM8xHLcyFhnJJj0K7GxNgUaZdqY528cHDiQ9JSmaZnU16fp3bm0+xu71KfW6H0+A+TYFNptmm11yFLiPJCnfdMfBW021XzqKMFWI1Go/mYEUWREhdFfL0fHMmBMmA0GnFl+4h6pczu0RHFSHJaU2qOSU7Br726xXK2xwv7GV9Yd0mKlM2DCCPq0ah4XN9Kqdl7lDGIsxTPiEjyEo7ZJSRnkDgcJSENMyRWd9EnBLlF0xxBCutOpHJiVZmVx1iWQy7uXVPyYftULYvI9MkNcEwP0yuUYze1LBZbNZYX5uiPA86vLRAUFt/fbrC60MY0TSW8iqNXo9FoNBqNRqP5OLleBcMwsc2UcPs6LiFWlJPuXaEfb9BYu4gx6JH298h2nsd84Q7lkYkXVll0UorwNm5wQLNaIul/FS90iIsQJ7dZsBKsvCBOCxw7puUdSvIYTQP8UpmikGkLhRp0GyY2ZT8jTTPmjQF1y+EwK1Gnh0p9dQOcDGKrhJODUepQsQNyy+So/BSdU4/hhdsUc58lXTxPo3aAv3wRRHQ1bfBq4JTU45179zSaRw8twGo0Gs3HbGDWWzlee+MIu0hUMZUmMZsHA6oVn0a1wiuvXlM5rZnhMuwe4Tk5L7/4DZbmazw/yFivuGRHN+k8fpbK7stsdWsc7R9iFRlBYRD3+5SKCKuAQQ6FIZEAYItrtSjwLQvfilVAf5wb+GZMNzTomBm9wqdjh5TMHNc1Gec5SWpiGQnjzGDJGTO2DOXG7Rc5DUL6SZ31hsuFixexipRqrY5lmlxYnePMmTNUq1UmYawmsA6HQy2+ajQajUaj0WgeHNEQ3Koop2/++puJr9GQQe5RZ6IyXqPRgG5/zJI3wupcYPiVn8ZtLRJ4c1T6V8gaZXj+pwn2nqDmGCSHGzjGBGrz1A9ewi9Mkv4dadbHI4JwQpTBxJKBuTAyHKI0wTehYkPJksMt040nSBqXJ/FhTJ2okQlelmJgUbMzJcpGeYxppDSkRU0ivXBwLOmgy5XAW4sLaF6E8z/ASqlFVJgw90P4ovCufhq7uQ7jI/AqEI3elSCt0TyKaAFWo9FoHkJm4uvbCbGz14UEi3rJIUkgznI15MozMl68epu6W2DkFuPwiGGUMdg/JDFNDsYTXtvaYqVd58uX96nkId/80gZZkXM42FflmG8a5HlBKnWYCeMUJcJaIqgaMM4KItPAyjJSQ4Zb5aqQC+KUugnS+d/IYiYJJCYYZo5nmNTsnMSQYyzUfpLC4iiH1XJBbe4Mn55vUa/WKAyT+YU1OrUyfq1J1bPV9ZCIgVplKrrW67qY02g0Go1Go9E8QMTNeZ9u1zzokhkl8tyiboZgl5Ug6eVjFqw+vPzzhHOX1NyDZPNlCtMh3n+ZtpGQhnuYr95i6DhUupvkdok4PqBsTChCD4MIFwuQQV0wyaFlqeYzMBIWHTAN+awAg0yiv
iZUbek2kxWKEwcJvmyG7PVTFFGWXIm6WA1qjOULaKzgzZ2FchuWnyQKI7ylJ8CuQ2sF3BLYPuQ5VNrTjdna86rRaAFWo9FoHtCwrDRNcZx3l0V6UnyVkHspk+rHeaaDMFHCZ5xDt9clrfoYjsfW5i1sv8xrNzfZ3d/D6DS5cv01wtihtTLH9Zu3cT2HIE2xx3vsjPeQtNSe3D0vCroppIXPvBtKKYdjQSk1ifIcRwo4aTnKp4Kq1GjyHtuBJBNnbEGam1TNnLCAUSrFW45jo0Rc15BSLyezoFxALAIvBp1qBcu2mG/WMCybM2ur1KpVFS9QbXZYWeioc86yTF3LtxOlNRqNRqPRaDSad00STFvk3y0z8TWNIBanZwtMpXDC5AjTdDHHmzDpEbUfx9h5gXzcxShVGW9ext+7QmmyS3LnBXxX/KMZ9LeJMDEGB5D2qBoZ/SzBKyoYIr4CMSmx1NfWQO0qNqB1/FFDfLnV2RcSSYaIr8dfHx/aSURsfSs89b4Y2k9AFkKtA63TsHhJOXm9c0+C4+J1zqp8V6UE26/PYtBoNFO0AKvRaDQPAMlrFTHxvZJkOWGSMV/zlQhpSct/mvH113Z4bG2O/a3bbOKwsXuAY1vs3bnN3nDC/HyHL732GulkyKSw2Nu9SpyC6ZiMshTHsMiKjKxADccaFuAUcrwh41jcriKWmkjkvhRrFQsmUlMh7tdpASd31VOpAg2Dhl0wiAvlbhWHbCB3048Lv4GJEoszw6AtLlmgYhbY1Tq+WfC5zzxDuVym3W7fdbW2Wq03DNOS89ZoNBqNRqPRaD5wJKf0/SACrGxDav5M2sZsRoMBztFVvGoTdl/Ge+3XIOhD2GNw65u0bYNhVsbIeyTdLSxyUgKiLKWu6t6USW5iWjlNC6JsrMwR6nDzjOqJjxcnPaYfXMUsO3Dh9HdDdwO+//8LtuQZLEBjGeprUG6CebxHSw++1WjeCi3AajQazYdAHMdvEA7fq3gorfaScSpCZMXKubnbY+fgiOW6h2UUOEXC1VeuctA9ZHtnT+6TMwxjeqMQI404eu0KoyJXAmmcJkoszTAx85SaKcJpRpiCDCLdz+UO+vSmtWMaZLm4X00SEWyNacSVirnKp3fCaxbKLZuJEGuJsFocC7ESeSC1mUHDKuhlBk0RWiXs3xIXbKGcwGW3xFy7gVuq8sPf990qx9W2bSXCCtrlqtFoNBqNRqP5UIjH4FbeuOy9iodHN6G2OBUhRXjdeh4ObxA1zlE1YvJwn8HuFbIooHztHxE3L1Db+wb1OFTt/bU0ICq6zI7Guus6TdXrZWuawcrd5VOktr7PmbzvAROcuWl+6+KT4NXhN/0xaJ6BsA9LT73rQWMazScdLcBqNBrNh4AIibkMm0qStxya9U4C7mAwYKs7Yr7qcXBwwJ3tPTYP+uSmyfbNAUZhsLt1h8y22dk6pFHKlViLUaEoxmqC6ci0iGMouagYgH4sYqkIsgZRZqi8V8l0lVQDCegf5dDxICgKDFOep6+JsCpnIc8S5yTFXv84NsqyTVyJFTAglBgCaW2yRLSdrtC2C7Wwbk9F6EqlwurqKk8++aS6RjOhdWFh4YP9IWg0Go1Go9FoNG+GZJQK70dEjCdKfI2Ge8RhQs0IGG68AqM9asUEb/8KbH4d0y5R37tGb3iA11zCu/aPp44Ggtddqypn9fVND6R2lnkJb+Pf+NDE19rFad7Ys38QWqegvjgVlktz0FwD1t560JhGo3lLtACr0Wg0HzAnB2e9G/E1iiK1fhAEKhf1cG8P1zT59teukZY9qk7B7q2bbGzsYvkeRjohjRJSMiYxmKkM4JImoT5Dw1ZtS2EONU9ynwp6cY5jWSr/KaTAtYppPIAxDewfipPVMNhP5LmYxgvI+eTTZ/mLIe5YccPGMiPVKFSmq4ioYu41xWVrgGVC1YKe2t50+65hcO7cWeUKPnPmDKdOnWJ5eZnRaKQyXUWUfaeBYxqNRqPRaDQaz
QfmgBUB8d2KiNEIvCrR0Q5eY45o7xpeFsGL/5DIr1GbO0300v8Jgzsk1VVG3U1aoq4WXUoish7svOlmRWgVt+tMcBXx9cEi07qqsP55mH8M1p6Fcz84HTaWy4eMMVTmIQnB0fW6RvNe0AKsRqP5RCFi4fvJYr0f7kdEFIdrlBvUfIeNrR1azQa7B12icZ9JlDAaDjGKDNfzeP7GdRxCdezDgYTsF4wGY2xxrEYZBZZyuEqOq4ioJctWRVtfxFcTjCxjbEiilKVy+PsZNO3pffdeDmVzmtdatmGUFcw78jxdJtsoHTthJWpA7rTLfh1zKr4OVabrNNg/KqbbnSE5VTNX6+nTp6nVasr5uri4SLVaVYKrLJuh8101Go1Go9FoPuHk2et5oh8m9yO8Bj2wPeV0HaYWtXBHuVsxHbyNb0Cpgjc8hGhCvPVt3P4mmWXgBVKvBzhHB7SUm9V6Rzer8E6vf/Achx4016F9DuYuQHUenv5noNyaXiOJGyg1pw+NRvO+0AKsRqP5xCACpgyw+rAFWOHN3JwHvRGNisd4PFYPyXW9cuU16o0mt2/dJCsMvvalL9FcXObOrdtUyy637uxBESrhNDgWQ0kKCstQ7tKKZ5GIWGrCODep2LmKEpgJoAeJQcu2lDu1Yk+drhIFIC5WGaJVETFVogMk51XyWWVQVjF9lnUEyW0VtVa+F9F1hqxXMaZzBsQZm6g+qEKd99LSknLziuAqma7ieG00GuraS/6r2q52u2o0Go1Go9FoTpKG35nN+mEh4qLfeOOy/oaKJxgO+9TKJRjsQNClJi7XLIfudbj8C1Bfge0XIepBb5saY1UHT4DyiQgBoW6I9fVhojQdntWQxxKc/r5p5MDp75kuc6fzGBT3Xh+NRvOe0QKsRqN55HirVnYR/x6E+CpRAif3L8cjwq9nFdy4cYOdnR0lUF69eoWDgwHDUY9chm0d9uiFI25t3FHv2xEhNC+YpDm2iKhZTmybZIYMyJpm44s2WpjTmIAKCePCUsJokWZ4jkXFLESvVe5YiWRt2BLnPxVMJc9VXpsF/YeS82pMX1fHnU/FWdmeZbz+9UyEPSnGVqsVlms15XhtNpt86lOfUtfaMAyVg1uv15Xw+iCuv0aj0Wg0Go3mIefNxE/hQYmv0fCN+5csWHlkMWx+m5rkty5/iujVXyQuCoruBslgh3x8hB9tUSNWwa3idRUv7VANooVZb9eDjxB4KyQOTVrYSlN3a+sMzD0GK8/AwiWoLcFgE+rLUJ4D+41DhDUazQeHFmA1Gs0jx0flrIyjCNOyVKapcOXly5w6d5qbG1tMBj3293Zpd+a4fPk6N2/e5OjoiH6//50bKqbt/CJ6xkaO7U77kQzDVE5YiQaQiIBEhmGJaHrsUBULa1mEVsl0tSwlrhbGsWtW6srj/FYpq5Rge+x+nQX4y3YFmZml3K/H34vo6twjunY6HRUlIMLyhQsXWFlZUfmuIrLOMl3le41Go9FoNBqN5jv4iJyV+aQ3NQRItmmWMn75F6mcenrqZr3zLah2VMzAcOPb1C7/Et7eS8RpRJ3guOCG6MT2ZkEGtYdGcJUDkW4zH87/wFTQlnNd+yyc/i5wazA5gPaZ6YcAeb2+9FEftEbziUALsBqNRvMeEWen5LiK6Dmi4GBzk0azoRyvIrDu7uzy7eeeI3cdeof7YNp8+Stf/Y7thJmJJ20/IoYWkEstVEjgqkVuWDgiqh4XfNLvH8Vg2JLDOhVHFcfuVMl7dWxL5bqKDizriNNVNFgZuCUxBhyXZSLw3js9VbYh65dPuls9F9t28EolKiWfxx57jLNnzyoRdjKZqMf8/LwSXkV0FderOi89VEuj0Wg0Go1G81EyPiQrtYlGXcqlEsGtb1BZvAA7L8DL/w9WNKF/+WcpHV3GLc/Dt59jODqiYHh3EzNXa3xcQ9//iN0HgQt2e6q7lmpQqkOpAed/BJ78HWA5YEhW2T7MX5y6fBtPv+4C1mg0D
wwtwGo0Gs27QERGiRjoTyLGvUPlAn3t9haTJGD/qE+nP+Drzz/PoHtEliSEcUSaHjf158eDBWzrDYKnbeZqkFWeg5slKtjfKAoVBeAkCYbjEGPedZ+atsQNZMSFpYpAcbpOU1VR4uuMmbhqnxBTZ27Ye5GIAMlqHQ6HSliVc5ybm2NlbZ08L9RAsMcef5zcdHCNnNFopIZmyTpvhRZfNRqNRqPRaDQPHHEhRAM4vK4KZ2v7ecp+g+6L36ZVKTN67nmqN/8J+XAPP+yTZClRvItLFbFVvD4i9o18uCZXVdW/zevVaT6rHESeQPs8pJPpssd+Mwz3p3ECC49DuQkyCEwE1sWnwLKhsfqdw8fEGavRaB4YWoDVaDSat0HE02GYMN7dJ3Ft8jigVCqpVvvdjW2uXr3K3t6ecsPevnVbuVan01unUQJ33akirooCa08zWpWYKsJplpKaFnFqUJG6S8TXMCJwPUo5xLaDm6W4UjgdZ0tN4wBeF2RnvFnDv7hRK14Zp+QqUVUGY4mILGKrxAeI2CrDwA4ODjh37pyKDhDhdDAYqNdEmJ0NMJNzlu/lfTJQS85Zo9FoNBqNRqP5SOlvKoFx8OqvUffsaY6pDPMScfL5n4LDOypWgO4Ww94GVRIGRUxmxGoWwkySHBaj74gSGB93hzUkuusDOVjZY3ZcuUtxn4C9AH51+vnBKUEaTV2sMujBcWH+ianwKu6K1c9AuQ3lFnQ3oLYInfMQi5vVgrnz0/VKEzVMjGQClgjLGo3mo0YLsBqNRnOMCIvi6tza2lICZZEUjJKINArojsdsvfAae70uS3Mtdu/ssnewRyaDse6OrZI66lh85Z6BVTNBNgFX/uVNJWJAogRsVcxJfTUj9bxpHqu09Bci2r7+T7XKrMqnQa+WbVOtVhkNhlTLNYzjoFYRWCdJQb3kKIeuiKoSByACrHwtea0ioEr+7ExY/cIXvqDEWbkGtm2rbcj3IrLOYgVmTl65RiefNRqNRqPRaDSaB4J0i0lb/da3GZbXYLhFLZ+oqIHyzjeJjq7jpSGHgYET7lE6eIlR0qdFNBU/juclyJAs6SJTy4y3znGtnJiLcN9YTdTEXL80bf+3JbTAAVMGNEg2mDMViN06WCZc+m0w2IBoBNWF6aTdS78V4jEc3YRSGzqnoH12Ks6WO1Mn63wP3CokATRWptdm1gLnlqfPnhZfNZqHBS3AajQfU0QoE7QI9vZi6tsh4qMImpPeiNyeul2ltV4yTcfjsdrGc889RxAESoC8fXuDPE+5/ZrktUrI6nFBpgZgvV6cjXKomseia36iaFMF2PH3J6IC7rpX5WdqmjiGgYONX/UYjse0W20mwUQJojPnqRyPOHFFgJXlcq5y/OJQlddERJUhX5/5zGeUu1VEV1lHvhaH69ramtqOPGbCq7x+7zWT12bINjUajUaj0Wg090eR5BgSyK/5TlTH2H18jhHBUkTFwxtTMXGwM22x379MbS6HF/6Pqaxx+8tYSYzZ32KQHtEhpTube3C8KWm4H5wQUwMRYt/POdhzYJch7MHcuWMn7tJ00JUIoyK8Bkew9jlorMPmN6BzemqAXXpyur5EBVz6MRgdQnURFh6D678K9RVonQLLnW7fb05drrJ8Rqk5fbaOowREmNZoNA8t+tO0RvMxZTboSPPuBFhxgjqFRZxHJFnO/t4e1167Q7tZZW97izub20RxpNyhIjjGcawmpKrspLscu1DThNR2MY3pDe2Z0Cri613366zmFserWvE7i/CZYFrYNmsry5imRWEYuJaF6ThkUYq5uKDE11qtpqIBRCgWIVUyWDc2NlhaWlKvyTn3ej11/uJ+FZH2mWeeUectgvPp06eV41WQ59nXWsjXaDQajUaj+YDR5fpbo9ya5ndOhJXcUrfKeNil4nuEh7fwD69MIwTE2HD9N6ZirKiYo12GYirNArJiQku6y44jBY4KaBsSK+BRNyIVJVCS107s7qT4Kg38ryeiy
hyDUPyv4PoQT44XL4GRwNJTx8dtQHUeNr8JS4/D079LOXFZfhqcCnRvwfrnoTIHt74E3/VHYelpGGxN3a6l1jRKoL4Eq5+D4Q5k8dT9ehKJHFD7b3zwPweNRvPA0AKsRvMxRbWia94SaZsXpJVe8kyl9X7r1gZ9cX0OBlx77ZrqO9rrHxFMxsrxOn3DtJYq0ow4zrHlMs/EV3ltJrSKmGpZxwOujtuJpIic7X9W3Imr1XYp+y6GbamhXKVymVatSW/UpzW/wKeefIJrly+zduFxgkFXCbIXL15UzlsRTUVkFXeruFhFGJbHqVOnlGgrwqlkuJ4ceqXiE4ri7u+IiLF6IJZGo9FoNBrNg8VQhaTmTXFO1Kbh4PgLA679kmrRLw334LVfxxKhUoZp5SmMNsTVcPdtg2NzrBghRFidiajDY/FVqBnR61EC34EYEETVhZqIq+I2FQFYlpUWoboCkwN47Mfgwm+Cyz8Hp744da42T8HZH4K9F+GxH4VzPwz927D8DBzdgElv6nytL08/I4jIKrMi3ArMXZw6gJUL2J4O1VIHsfQhXWyNRvMwoAVYjUbzyCBipYiNIlCKaHnz5k3u3LnD4eGhcoK+9NJLjMc5pCPV+TOND8inrlRxtKoCbOoEFbHUTqTAs9UqUvOpDvxZlqsMyDKt42FaBqZlYlqWatkX4TNLcixZZoq71aFRb3BqeYn20jxBGFJgKPfp0vw8ju8rgfTs2bNKKJaYAIkGaDabSkxVMQmT6Z33WeTASUH13miAk25WuSYajUaj0Wg0Gs1DgYiO4vSU1n3Hgxf/IYw2p9ECRxtMdl9lOB6xSJe4iKczbU+IpyKuigHiDW7W2fOxUeLNslyHlKlVliHsQqms6ndiyU5dhag3PZ7zv+lYKBVn6wIEIyhX4PyPQH9jKpZ+6vdB0IP+Fsishku/fermlUzWbHG6M4kbKM9Nl824170q8QuzCAZdrms0nwi0AKvRPAREkzGe/HHX3DcyUEriAUScFKeotN2LcCm5p9vb22rZ0cEh+bRsmzJzr85yW6XuSo+zWg2DODNxRZqd5bOKsCl31dMMQ4rFQpZbGGJ4NaEmEQVpSqPZIE4SLl26RBHndI/2WF1f59S5c+oYRUg9PDigUq0q4bRjmpgpVNvTvFYRVSVWQMUdgBKPRTgV5+4sIkDiBYQ3y2l9O7RTWqPRaDQajeb9kUeZynI1pANKc/+E/WlLvgiS4grdfmnqHh3vws0vw/iAbDIiyQ8I8NSgLPGCzsZGSX+afVzKeypO4HizhUnJeKNqKa+J8DoqoGpI/eyrmn6IS83MqIm7NBrAZ/4AdG9PM1pXPgO1DrTPT4dbSdTB0jPg1WC8D6tzYLrToVfyWHx6OhhLslflIUYHiRNork0Porbwehbru8ljVUO6NBrNo44WYDWah4BPqvgquaT32xovgqQIlNJaf/v2bXZ2dpSz9dat2+omtbw2c4nG+YkIAEG+FsuriKjOsXipnKvgyp1rUWMzG9eRhSfEzdk6lkVuObgVnzjJWWp1SJIQz/doH2ezBqMxjz1xieFwyLnHz5NnGevr60oMljiE8xcuKHfrbOjVvYioKi7dk8KpCLcajUaj0Wg0mo8W0/uEZuUnwdT1eT9iosRxySMeQTKBF/6vqdg63GF4559SsxsQ7EG4ffct0uwvV7YsA7GyiEyMD8Y0LkAEVcl0lXq8m08F2NedrfldwXXanmZSGFJfR1T8BbBkutaSmtdQ65wDrwKVRZi/OHWtnvvBqeAqy898P2x9G6pz00Fa4nwVR+vC4995jiqW7EQdL0aHmfiq0Wg074AWYDUazUfGW4mvSRQrl2l/PFRO1q2tLXrdHoPhQImvInKKEKs4jgNQhgQDnNwCI5tmr1ompMbxv3TibDWnQizHYqw4Xb03/jNYN8sM8gnlkq9iAmqVGuWSS2dhUQmo4rwteSWeePwS416fQThi/cwZPNfFdpy75zTodlV8wJkzZ97gW
H0z8VWj0Wg0Go1Go3nocF43B9zLcNCnVvKgfwfGR0SvfRlvcHsquu5dheEtgiJFpFt5SJ+X6iVT4mqJmhGoOr4sRorjOQtStqttH4urs+fWGxq6JH9glVqwwaDzOerhPjTWqItrdf4JDL88dd2ufBqqi+CV4Oj2NLu1tiiDHqZxACIuy0NE2FNfeKPIPIsG0Gg0mg8QLcBqNJoHjgyUOplbKu5WeVy+fJlut3s3SuBgb59JGBy/SVRWqbkkrzWdZgBYJqlk14sCKxmupkWixFd5gzUVZy15TSasijB7fJNcMC0cwyTJEhbabYZxzHyrw/LyCoPBEbVmi7Lp45Y8Kq2aEl5XV1eVg7Ver1N2fBqdJst5roZqzRy9M+rHg7HeTVyARqPRaDQajUbzkZPGaojsXWYzEyRS4IWfgfEetf4u3PrKNFpAFd0TukVB04BJYRAXBVXxQoigmsOcIUOzDBrisiBQUQEzR6tEDTSlVDfe4C+lZjgMJUpAtr/4DLXBBtRX4MKPMAxH1CWwII/g/A9P81y9KrglqK9OB2pVOhBP4PT3TDd+cuCXiMszgfndxAVoNBrNe0QLsBqN5k3JoxTDtVRG6QcZJSDOVXnIezY2Nrh+5Rrt+Q4vvPCCihQYj8cnVj4Rqi/jTTNz+q+WiK0ixMo/YpIgIAWhiK4ivsoda2M6PMvCIJu9doxjWpSqFYoip11v4eY2ixdWiIMAI4qpZSal5jzVxQ7VUklFB9iuq4Thdrv9hnOJgwnusfgq3O810Gg0Go1Go9Fo3i95mGL69v0JqtJ67/j3GSWQv/6eK78A1351mpV67Zdh9xpkR2+IEZBJBd6xy1VKd7colKs1E1er1N8G9KRmLwq6TB2t27lP1QjZKHzWjFBlv4aFQc2oMizPU5O63q9SK89PN/rYb4Htb4PXgckutbXPTgXhCz8CzVPT2AD5fFA6MexKdbx5r4uvwsnBWBqNRvMA0QKsRqN5U8x7WvPfjrcTHme5rftbu1i+o0RXiRV49eVXlLu13+upOm+WtzoVUo/fPBuWJZWbdVy9yTq2M31OUshlUJYMRrAolPgqB28rp6uBie+YFK7FE+vn8eaqzJVbdKMB7WqLzZ1NmrUGZ8+fZjAcKVH28UtPYJgmRZ7heq+f173iq+DKBFWNRqPRaDQajeYj4L7EV+Gkm/VeknDqFpUogaCr4gQ4vAEHV+HKr0zNE8MryqU6HQn7OjMXq2S0CrOoAMlyFW9pUFhYRo5NQaI62V4XIJZN6RyrsWT7DIsWju/Bpd8t02+pzT8GO9+Gte+GG78CF34YKkvTYV7yuWPti1P3qqr536bbTF4z3zpGQaPRaB4kWoDVaDTvizdzvw4GA6rVqooa2Nzc5Fvf+hZbm1tK7Nzb28PEoDcZTMXV5PhfoiydBtsfi69qkFZ2/Jp0BUm260yInTla7dn6FoXkv2JSbTZoWBXwLaqeR63TwIhSKrUWpZqD5Ze42FhUA64+812fZXjYp7nQVkKxDMuaMRlM3iDAajQajUaj0Wg0HzuiIbjVN7pAhf72NBN1uAPXfgW2X4YbvwSN07B/FYJDhkyU6CrvlOfNwmJVnKnHOa3V4+eZ8JoWU0esDM86knSw46JdhmrJNoZUqDnVqRN37jwUGXZ1Tg3Q8ucew5d81loLqvOw+v+B2jw88ePToV6N1WmcgFt+XThOQ3A/mcOMNRrNxw8twGo0n1DeTWzA2yHbkHxU0zSVuJoEEftHh2xubXLn2k2KrCALEvrFhIPDg6mIKkWaiK/HdaAp8U0ipqrM1mn/km9Mo1sVx/O2LCnjJIPKOF5P2ooMi5XaIoZnsLy+gpkYLJ9aJghDLjx+gTCKcWyHarlEnGVUymWMIlduWclnFfFVOCm+CuX6ifYljUaj0Wg0Go3mYY0YeDtkONVMiA3608FT289PzQ//9K+pPNdhkVAbHjJiTLV//e5bZ47X2TCsOSNTkurMcyqabnqipo+OowgEx5T3N
xhZPsNsQH3tczIIAh7/MejehMVLU2F49dPUZXiuDMySfNgshuZpSMRzaxxHBkxnK9wVX9UOtFFCo9F8vNACrEbzARIFKV7pwf9nFU0meCeySO+H9yq+nhygJcLr1sYmhmVy+duvMM4Ctl+7w3g4ISCSm9rHQumxWCouVjPHKiwyUVWPBVeZqXV3OJZ87U7fMnXISl6VSd0ukxYpVsXBcR0ajo/bqOLGMZ210yyuLKjjckslnNAkrI0p8oLFxUXVOjU73zgMcEv6TrlGo9FoNBrNJ5EPRNR8L4gAOhND75P3fJwnh2hNjqaO1sObcPn/AbcON/4JjCWNFQYMlH4q30mzflccrfeOgBBjhJTuapDW9PvZOuJ+lYa2ITaOM09JOtryEbXWGYZuk6rnY3h1OP1FapV5hqVlaouPK8F1GKXULA9WnpgOx5rls0Yj8LUZQqPRPFpoAfYTgBKcfJ198yD4KMRXtd93Kb6+H/I8Z3tjm3QU8tzlF8i6EYfDHv1ggBkW9NLR9C643Mk2prEBTmqRFBmmKt7ExSo9ScWxwmqBc1zRJRJBZZDJ3W9ZxYKK6+LUa/iOTctv4rsVGitNKrUyzXZHOW9HvRFGLsO1putVFhqUSqfeIBbP0P8taDQajUajeZjIskzdLL63ZtF8OHwk4qvwLsXX94waopXB1/8GLD0OX/4r0NuENIL+5tRZmvXvri4yrV3A5Fh8bU5LcpXvOikcVs1EeSb6xz7UFIeSWgP2C4/UWKLRqkKWU20tYYigWplTDtda6zxDr0UtH8HRLbA8aiUP6o/D/GPUZHCW5LjeOxzLk3ADjUajebTQf+U/ATjurBFE86jzXu/oz+II7o0lCCYBpXKJW7duEU4C8iSj3x1w+/YtwmHI4d4+Q8RpCkZukB8Lp4r0uDnJgsScul9z+VWc5bhKA1NhY5qQS/9SkVHxS8R5RqVUotWoc+bcOQwjI49CSq15VhbXmAQ9qlaZ9ullxuMxFc/l9PK6Ou9oMsYrv+5ulQ8yWZooJ6x9T8SARqPRaDQazcOARCKJAKv5ZPCe6nXJOzXMqav1pFN05nCVWnr3Mgx2YLQNlg9Xfha++lfp7r9KTgmbI8RTOjzOapVc1nEO9rHgKoOzJAJMrAriclUO1yJRA7YSw6BCoURYw2xiFSOGtbP4p7+XvNJmIqaH+gLj1ily06JOynDh09SsCCe3ieuLuKe+OB32JYKr5Rz/8h8/33tOGo1G8wiiBdhPADLRXfNgEcHPELvnA+a9iK9SBM5EV1dmlBYF3W5XPff3uhz1jlQMwNb2FnsHB6T7AX65zNZkE2J7mtOaZsquKsO1zMIiNdPpvy7yWULEVxFjbXk+nkaqPmPYariWZ5Uo1XyWyh0q7QZJPmF9+RRm1WdhYYHhzhFzC01SxyEOAuZbC8ytLanjrVQqSmBN4hgP+w3i6wzLPi7sNBqNRqPRaB5SjHsHJGkeyVr9PdXr4lyV349ZpICK8SqmsQLidL39NTJcrKNXGRxtUr/zFSZJQtlxYbJJkjssmEds5i65GYtRlcKw6BUZcyaMjwXZKPMxjZSumRIWEBQurpliUqfilalUOhzVztOfe5Ik3Kdx+gns9e/i4HCfZtnGrq9A0GOQSfhrg1pnTR2tL8cqUWRJDM5bdKJp8VWj0XwC0AKsRvMhUCQ5hjeLp3+4EGF1VuRPRmNKns+gPyAYDCnXm1x/7hU8v0S312UyGHN74zZxPyLOYoo4p7ANDgbbpCK8Zin13GJcGBiZQe7mGHmBIxmvZoZlWngJjOwUNzOpVBp4FZPJOKFc9ajXG9SrVZZOnaZSONgNByf3sHwbt1qi0+nQbjQwbPstM2tFYNUiq0aj0Wg0Go3mvslyCrEOfEQi7DsiouVxvT4a71JtnILeHRjuEjVP4fzyn8C88KPkG18j3HgOe7xPv79DOxowLCwSMgbRAZaR0kxShp588E8IZbQCDpESYV32SIhxiFmiZXYxjIS8chEzt
3Bsg8Aq4Z/7fvI8pHL604SBzdzyWfz+TezOaSg3mVt4Aobb0JwKrt8hpcp5SC6spUVWjUbzyUYLsBrNh4D5kIivMiTL86YRFEmSkEwiTM8mCxIcy+Fgfx+75HLtylXMPtw4ukXVq7B5+zYrtQV2dvZIrJgkTRinKTXTZJjklAsLD4swCxiXfMkQIM1ylSwgAq/pgpMU6vtmtU3bdYnNlJVTy5g4KvvJapRYbs9jug55kSmBteyXaM7NMx4PqdamRZpd1RlQGo1Go9FoNJoPDsN5OGp1FS2gWvKPP5ZLhIBbhngsNl3Vol/dvQr7N+DWrzEcjKF3hajwyV764xRpihHtE5hlleu6WeSUiZXAapkFk9ynaxnME+Aa0+8jw2BUNKh7Pnky4g5r1Ms+rtNWJozKhR+gNNggLi/gti7RXfgM7mSXSblBfaWGVaqRzZ/DKddRWWLCsfiq0Wg0mrdGC7AazSPILMvVsR3yOCOOAyzJX8oTov0h3e4RSZRw68p14jBlkiaM+l06Rh3iIckoojc8xMgK0iSikvvK8RqmGXXTYpwGFK6La7vkcYzp2rSsOnG1oO02GKRjluotmvMt1k+dZTDpMr+0wsGdHRrLbTWZdf38Re7cuM7p9bPqWKPJBNt1lBNhJr5qNBqNRqPRaDSPJJJ7KoO5ouE0hFUQIfbgOnh12PomJBOG3/g/qTkGw8mYIjriwD7Nle6EzxeXmdAgzTLCQnK+XObMPpPcYlJ4uCaUzQmVHJrAnrFG1zRZLplEjUuUkl1u1X+cx5aXaLfa7McupUqN3Mwozv0goVVn3jeJMofV1U+pmIBhmOA5psot1mg0Gs27QwuwGs0jIrZKlqsQJTF2YjAc9Qm2Rnh1n2++8DVqRo2FpXle+uZLZEnKaDjGCgtKls2AEUUS0o0iHNMlLxIm4ZCBk1FhmulaNkyGZo7EOmUVl7pRxsLAdS0V3l+qtKjULebPnsbaTqicruOWHIowZmlxhVKthnPRZXV1lXA8wjRNHn/6mbvn4ZXLH+FV1Gg0Go1Go9FoPiTCPviNqeh6PHBqe2Ky2H+F3cMB7XgL9+AVeqmJ4/jkV3+RflFlMDxkb+IzNB0+XezjFD32woBzTsghdcR/apFTGCktK+ZOsUKFIQvOkKExp5yySWWJseWS+21keENx7hmK2lluDCIudmzG7hwHS09hhEOMWk3ltdZai9TEiZsErJ7IZ635OvZLo9Fo3itagH2EKeIYQ09+f2SHBgT9MVmWYRgmk3DE/rVtqnM19nf2qNXrJHnK4fV9DnY2CYcTuv4ed758ld3iEMu2GeRjvNRVw6uGBKSkqgMqTEbUzQpDbyxx/mDY2J7LKOqx5DexSh51CxbsMqX5JiurS5SX5wi2emTZhErZpfVDF1XOfmGm1BsNbNdTIvHw8GAai1Cu6DvnGo1Go9FoPvGkMj3e1h/JHjlk6FQWQzyZOlxlCO34EK78Iqx9lv0Xfp7lc5+F7efIbl7jxZ1NWtkBVtjl9kRmIgxIzBpeukHLkC62OV7MYc22cM2cSVKAXScrMhZMSA0Hv9KCkUV98Ty3YwsbE6+9hF9tED3xe1iv2lzb3OFrWYtT800+92yVPAkxJiG1ugfr59Whb/cCatkIrIaKQNBoNBrNB4P+a/8Io4YtfdQHoXnfg7JOOl3D/gRfZTS5ap3bV28zjkeszq2QWTk3X7zKaBzw4sbzmJlBlKakoWQ+lYi6I5zCw8GkGjt4VGWGKoGR4hayFFpZmdvGkcr97xhVnMwkqBtU22VaQYPmXBuj4tCwHWorS1glk2ZR5aC7T7npU186jZEnzC2tMDw6otZuv+Ecap25B3wlNRqNRqPRaB5epJ7TfAw5MSTrJNGoqxyunl+eiq57l0mO7uCkI0aUKWQQ1a3niJ7/B3zzxS8zDmLuDGLmOGQzdwiyFm0OuZqtsIfPPCUOafIFrpJSkBg+ThHQsPtcL+bouB59Wkw6T2MXh7S8BOPU57GNB
isVg37lDO78eRrj27zSX6S0/iwXkoAzp05TOA7hsEdtofWGc1hulgB5aDQajeaDRAuwjzCm55FEGc5DMhBKc//EcayGZ4noaucm4yQgOwopdarsvHCLoZMyvzBHOAiY86q8+s0XsDcCts0BlZFLK7I5ZICPQZjnYIbs50ecMhfIyDkqAjBMiiKhavoMipCl+SXCIMBLTS6urFKe72COUrr9LpXVBbwioV6ts3TmHMOwT6laZX5xkfFkwqXKCo7rEnQHlFrz6hzuFV81Go1Go9FoNG9EC7AfU1IZnuWojFaJEyj6W0RpwV4w4VTF4863fo5KZ4UtY5nLm/t8VzPhG7e7ZHe+QtVxuDlsY6UjBpnJEW3WCWgwokuNJXpU8fk0r3KFVZ61Ntgr2ixbKeHC9zPcu8JypUW19QMczT3L/PCbVFrn2A1DnNYpzM55LrRsbrPAeX9C0jpLJVnk6UqHIMnU4Tvu1Nlarr9RfNVoNBrNh4cWYB9xLOd4MqXmvshGI6xq9YHuM89zwiCkVC4p12uv16PuT4/BjgzC3QHD/j5G7nHn5RsYQUExznjlN64Q9cfcnAQYtq2iCWqFxX6xT2HapFmEhwuWRURADY+BGWAUBZ7hkHgFSRBRml+gQZ1KuYZVqfO5xz9LYOQE+3tc/MHPcHR0QKMzRzKOWT67jmvatIMOuW9QZBmNRkNluooAW2rp4VkajUaj0Wg094uOH3h3yMwD03/w16xIYwwRXd0qmCbDyRE1GaCVRiRJyss3dxkkBf0xbB78ApPcwLz1NYLty9wY1flKZNPhgIAqOzTocEibgD1q2CR8mUv8dr6MS8Id4wyhWefQtlkU40S+SHPt01wNXC42DKLOJeLHLpFsblKbW2Xw5I+w4ieUMMnSmItnzyqB+HSYUmquk4YJoVWmZBg6w1Wj0Wg+QvRf/Ecc03zwIQRFnmOYU+E3zzKyNMHxfD4OfJjiq8o+PXa1Sh5qIMKpZLnuT8hcg9H1ffxGFSvJuX77ZVzXwUlt9m9sMWJMthdiFR5RfzLNbc27VM0KmQzGSgtiM2VsZDTyMqN8QseaZ1QMMXyHIJxQ8mtEZkCl0sI3HaJgxPwXP0Ol2qBRKROPI+w8olzzWV1ZJltcwqtWaS0uUhQ5pVNlxgdd0nqF3APf8+6eW7mmhVeNRqPRaDSad4sMJX3QyM3/k/v9qETN98KHeZxplpPmBX4+UUJrVuRY8YihYbJ3bROvM8/B3kvUWh2ivU16wz57IRxSYftgyHJ8k4ONm7waLxBYHlkkFojHqZAQUeIFOuS4VJkQYLFPU6wSzBsRTzsjriWXaNRKrMWXob3AjckqS/UK5xZKxOvfzfzmZfzVsxwyT9ezWP3045SNDLO9wFxF9jfC8mtc3RlycUlq80Sd11R01cKrRqPRfNR8PP7Sat41aZxhux9N9EASR7j+NDfItCz1+DDIkvyhd/kWaU4eZRRJRmakpJmBhK2Od3rkhzHjoxHZokOw2aezuMjW169QhCn5MKO3M2TkxdBPCLMxriWCZ8iEBDO1wDYJSXBT6FoJfmpw6I7pWA2qmU3sJiyvnmY07tOem8Na9alVmlhHEdXzHbKowDQzojjFq1mc/+ynMbEY9ns05jv0HQOvUpEqHb9cVudTmdNtShqNRqPRaDQfBEmS4DjORzL4S/Y7mzfwYYqaHwdxN0ozrNEQI5sQFCX8ikF2uMWekbD18jVKzSZOkWPvH9KL6/zKV1/GHG5xeVzloBfRT4/YDWMlcUY8SUqCkWY4zFMl5IiScr0WmPwEX+eWcZ79wmCNPS7UYoL647RLEX4ywV56itXF38LL2yNOrz3F+fG3uFL7InOTa6x+10+wMtdmaXRIYNVZaZa4tnXImucQZwWuX1Pnc3Fp+qzdrhqNRvNw8XD/NdS8e8I++A1Ma1pQRZOxmjj/IJmJrx82H5XwOnOwvlkxKzle4igQZ8GsoC6yHLvpk++PiaMJTAomvQmj/QPSUUq2G
XDn1TuEq132bmxQ88p0J31K2PihQxKnNIwG8iNNyKmT0bdTnCQnskMiueaZxVFpyGKpTdWuclQccfqxiww3tlg8PQ/zZVwMLj75FNgW+zc2aD69SBInDPcPmD+1ThSFRIMJ86fX1HG35xdIk0RFGGg0Go1Go9Fo3j9Zlt2tE2cu1LeqLT8sXFd8lw+Gj0x8DQcqm/Wtlt8bOzbGotxconc0YdAdc+dwgu97fPXAotu1ubWxybg/JPcq3NiLqRsWR8UBR2abat5jhPz8xLDgI5Wzi0mdsfq4Le7XJqaKF9iyz+O3TiOjkptU8C+cZXdriy+sN+j7KzyxWOP0qTW81T3yOKD6+O/l2TRVHXOLnRbBoIudhax0FtVxX1jp3P38p9FoNJqHGy3APuTEQYpbehc/puM/vqY1Lejei/j6rvf5ERIFKd4DPtaTBbLkrkqMwMkMr2h3pJaNsjHJboDRdBl9/TJxP2TSC6kXPrsb+xi1nGzXxHUtXMl6vX6EmRYqo9WQu9iZx8ScYNkuRmGyGe9Q8RuESU+yEui483TtEVbNJclGPLn4NCMvoFqpsb54gdBMmH/yixi9CVazipXnJFFM2a+y+PgalmUTmSb15WWSPMPzfOqn3zg4y/4IXBkajUaj0Wg0HxckYkoEzZmb9J2wLEs9Zl8L71Z8nd30/yjcsx8bF+wJ8fVkPJosH4YJo9Qgur3DKMsYDBMKxyR84WVeubXDK1GLowg2ukPmfbjenRCnEhvgykQFIly6hUnCKuRjJb5Kn1qE/A5ESnqNsIiNCm6Rs96wMIqUed+gs/AkrmnSnl/hqdoKUXmJ+sJZVpdsVvOCXphxMIpZX+iQmD5BnLLom9zIzpNlOfPz829yrlp81Wg0mo8DHw+V7RPMRyGEvu99PsC7sA9CfD1ZNMrXhWOQScxCqazEaktybicp6Tgmq5tk+2N6r+6QDlNSPyY9HGGFPvFgjBHnbJNiZJAPLWwzgdDCyQzlji05JkE6xjYMUsdUImk/6tF2OzT9Jqmd4VWaVNpV8smEi49dYnRwwHh+jopX4+ypS/QHh1j1KityRzyHoX1AZ3WZ8eEeQRbgmxUKw8JyHaq+T5HkuA9xjINGo9FoNBrNw4rk+3/cBnc9aMftAxFfj52tSZaT5QW2aRCmORXXoujvsUkd0zDYfO0Oa6eXeOX6bX7xxginiBmFCbcmNsNJzOhgSGhbdFNDymjuDMEgVPEBMCFC4rhiCQE73rGYXSqqI01omykNL8GxU2r1thJN/8jnmvz85SFPrbeolGosN1wOg4zVMyvUfJtwMiDFZW2uztXdESUzIzbKVPMRpUpDzfRYszJ8R3elaTQazccZo5Dbpw+YwWCgJqf3+33qdT2858Pm4+RofRgQkfVksSgu13QQYh3nKPV3D3AKh3wQYbRK5Pshw90uxf6I3m6PtJkzH7fYu71Hw6kySofspQdU7BrlwqFbxHhZQZLHWIVFTQRRG0bZiMgocAwTOzcYGBE1WzKcYopOCadZwo9NBpUhnuHRmF8iXy/TlNamMKKzvMD+zgG+5zKKh3QqHRK/oNGZp1SbZriG4xF+5cMbNKbRaDQfFLpW0Gg+OvR/fw+ej0NW6sOCOFgl33T2LGR5xt4gplZy6AZ9ukOTeDDCaViMQ4ev3zrECWO+vjNir9tlvtVk62jMfj8hU+7WspJYA+Sj8czNPDmOFRApNjxeNq2pZ8je68ZECabfe9ZlkvuEScpjS03iNOGfOW/SLerUG2JOKVhqlLhyaxu3XGe5lDHIPRolk4tLrbsCaz+IaZQeXEyERqPRvFd0vfDu0H/lPwF8FOLr+7mzHk0meMdDnx4kaW9aWEleazZJOHx1g9pCmySJ8KplJrtdgu6AvJeTlTxuX7nOgtOhd7NLUozwCg8mKcaBwa7xGuQeo2TCIBlSjhxsq6DnDfByhwpluumEzCrop30K26NqVSjMMdWiRLccUcur1JstJrWYUsfDtB1MC86fv
YCZGxRJTFF2MYOMtYsX2Lv6CstPPI5hm3QkUqBcIUvTNwxB0+KrRqPRaDQazcPHgxZf8/EYo1R6vTX/YyIWb/cC9SzC6zhM+cr1fT59qs04CbAMl/1hwNWdmIPdTV7u2qxUM77+2iHbo4Q8s0jzjMNBymuHPfIiZfIdH4ffLEpCrpHMU+gSU8YioG3ZJDhYNjyztqze95lTJcLcpGRbfN+5Ol/bimgtzhONQxbclKWFea5v7vC9T53FtU1qIvD6DeI0w7Vfr9e1+KrRaDSPJlqA/YTzZmLnB+GYvV/x9c32/2GLr7N9ikjsJCaJmWJGBZbjkAUxQW9AvDHAs0r0rh8wPhjgGj7haER6NMAo2cRHEc4o5mp8mXbkqGmncTykbJUZpdP1IwLs0RicAtvzSbOMdtxg5ERK1HVdDwMP25X76hmpkVGqVqktdag6BWkJCrvg8bWLxHOQZBl+ZkPJIjjqMvfkOYo44eD2HaIsoPXEBWzPmZaNs2yx99miptFoNBqNRqP5aLlX8JTc2ZNDvN4LZqXynvf/IMRXcbeKSKm+DlKKOKbkWozCjIpvsdmd8NXXDllqePz0V1/DsSzS8ZhX9iYMspx2xeXaTpd/OumzF5QoGROGeZmRmFnF0SpGV2lBU4XzW332eH15lQkj6ipw4PxSm7pvc2GxhinDtTybp9eblEyLkifHXKgIgqcX4Im1JuWjMeGwT5CkfGqljlv11PnNIttOiq8ajUajeXTR6swjSp7lREFEqVp6W0fpmy3/IByzo26Xaqv1joOy7kdsTZNEDRUzxf75Fpw8R4kMkGLq3mEI4nDNopTMTun3h1RKDeI8ZPNbN2hfWCSOU8zdBNOzsDslBncOmLywh1WaxunHRU62MWAchZCllMwSrdQkSkb4dplxGBLlE6V9mlaG4dlkdoTv1jDTCaMiwihEaC0wPJ/QjKi7FQIvopz5VFdaVBaa2BWHqJRjByH+Spva+iLD/iH1RlvlyNZXFuD0sjpnv15j/eknH3iWl0aj0Wg0Go3m/RFOQjzfU7n+ZqX8loO87hU8P6jc2WAwplSvkGUZeZ6/5VCv+xFc8zjDdN9eSDxp8kizHPt4aPBJNo/G07q+WqEYjam06vSCmF//1i0+f7aDbxf83MuHPN6p41s2P//8Ntf3R7TdhAiHpIBvvjZAZuT2M9liSW239waR9dhhapyMGZjx+vfy/3JKjgtL9Rb9MOZ7zi2yXMlptdqYRsHpuolhOpxdbhGnOZYB9ZJLq+ISxBl5UXB2rgpz1amo7E+3PYtO0Gg0Gs0nBy3APqKIYOkeF0sfRTt/+UT+x73iazQZq/b4+xFT8zxT52IY73SHv6BIc/IoozAK8jRVA6ySNMI2XNU5NMrGlMsljMQi3B0xGfeplKvMV+c4urzP+PZtvDMtxl8Z0lmo0rt6xIgxjuORBDl+Knfj+3iGj+26DLMh1iQjIiaOMmIzpexX6Yc9anYTexySmxaTdEDVcPFsG8N1KYqIyvkqftqARs5abQXTkTamGGvBYzw4olRqYSw1qZeaBP0Bw6MB2AZ2vcp4NKJSrWJ73t1rpcVXjUaj0Wg0mo8XMyHVqt6/G/WDxK9M623LstTjJIMwof42IuEsf7XICgzLwHgTMfVec4hhGup9Yj5N0pwkKyi7prgmGAYJ9ZKDYYX41aoSaH/jzpjmYUQlT6Fc4jdeuMlrYU6tUuWXr19mMLE4GIS4Vs4d0+NoPGaSwHSaw1RffedhJ69/ThKpNqFMVRrOXPjexxbYPJyw1imzWPVpVz31OcN1HNZbZXUOq4s17HDA/iBU8Qc/+sQywahPHgSUSg01FGx2rbToqtFoNJ9stAD7CDETNmeinGV/dH/kT+aO3su94qs4XIWiyHFc7w2CcZYkOJ5PHMe47ut5SGkcKxF0huuUlABrVRySbsBwZxvLr1E/Pcdks0vv9j55mJMu14niMfH1PsO9A3qduhp6NTkaEo4igq9vq/U2D7tE6
URFCRhByiQ+IkilRahCGI1wkzJhPsK1fSp+hXEQ4vgmk7SPbVsE2QSn5OHYNuOsT7HYwZ8M8ZfrtBfK5IOU6pOLMM4wqh5W3cUZTXBbPp3TnyKxEny/jOnabN+6wfzF01imoYTlWVaXOBU+CnFdo9FoNBqNRvM+hr2ahsrsF+Hyo+Tt9n+v+DrraCuSDMOx7oqJhQirlqW2JbmyJ6MNwiS7O1gqzgss28DJTXqThLmqy8++tMWzq03m6z4vvraH59psHHSp1Bp4jsmvv7pDzc9Zn2/x8o1d8Fz2RinJ7j77+yNS1yPOcjzG7KXfKQC/mfgqR6OMsffQsKBVsfBtg+++OE+GScWIeebZNazJAc35Fu2aw14/4uJSXYnFVdfGsU3yikc+CDk/P4+Mt869KoFhqLiCLC+08KrRaDQahRZgP2Tkbm+WFTjv0JIzI40z7Ptc962EzQ9alMszcaFad9uLVMEofT3vkTiY4Pilu21W9nG7UxwGdwtTw7MochXSRNSPMGyDLEtI0hBzbGC2pu7PPJBMKpfJdp9Kq0a83yMdxATdEWY1Zfh/bZDkMVajTHhlj3hyBHGBH9UJDntq0Faj2SExJySDROWw2rnkQZmUDJ8sjEgqBqW8QRDvkOOTFQm5Y1I4UAQZh+NDJRKX8ypd6wDbs/FNmyxPsBoei+3TVCsV3PoqUT1mdLDL4toFKqstrr78Mo+f+bTKofXOdMjjlKJUKDHYsWysksPKufPqOiRJguV62Me5rm/VJqbRaDQajUajuX/ebZTTTIR8L3xY+aknW/qz0Qir+v4Gr947E2LW0SZRX8axQ7bm2RRxAG6Vw16I49oYYcJuL6RaLuhUpm33gyBR77l1OGahXiLLc752Yx9JDfvW7S4v3u5hZzGDPCVNoHlUUHITkjzjy692+flvH+B4Fg1/zO3+9ANsgkcRidtVzlmyWFWy63fwhuUFFMY0gKBThlEEoo1WHJtLq3UWGj6tii9NZ+qzgO9UaVcdxu4CF5dq1Hyb1WaVZnnqZr12u8fFU01M06BRdu9et1rp9Rp9JkBrNBqNRqMF2A8ZlV36Lv7u3s+d8LfKdE3iSDlIP2iyVDJYpych2U7vZ0iXiKpydzwMU0qt+hvORTJelStAzuUgUC33IjpKPr6VQz4qoGnRzQfUuiWC233MBZfstYRuukNvz8UemdOMpX6f8PJrarBWc3WN3eevUV6oc3h1i3ySMDk6ZG5hmdxLORru4OQ+ru9gjnNMx2UUj0F+djKYKx7TD7s0/DmKNMUpl4jsAVWnwyTeV2Wda7v4tSaNSUK9tUy8mFNPynSzI4p6AmdL+NUWu1/+Bus//GkM31HF/lPf912E4yGlRl05eq2arZzMnYUFJoPBLKFKoQVXjUaj0Wg0mg+edx3ldB9GhLcSdfMwxPwQoqOknX82y0nE1/dTr4vbVc5wcnCA3WyrHNOZkCidWIejiKpvc+3mLvVOA2e/qzq1ktjndn+ohlOFO9u8chSzezQidx1sy2ISply+fUC77rE1TGmWbL5684gaBtWay85OxJoz5sVuyJX9kRJH/ZLNiIwgQgmungFjEVJPOFzlyGZVcnj8LMdfNcGz4SCGmgEx0K5C3bMYRTm/89kFldO6P0p4bKnBM6cabBwGXN8e8Ye+7zS+bXE0Snhitc52P2De9zAkMkFd71yJr0mY4pWdD3SWhkaj0WgeTfRfiIcMyzK/Y9Lovdwrvs4KLGOaJP+mr71bTgqj4u48ycntiVgoYfLiiJ2F70sWVBwF6v2zAm72HnG5utUWafH6PrLhEKNSUQJkMBwqVym+QdINlfiaXBsRtyw8yybdHWAmLvvFIbldkO4kBHcOqS4ucvDNW9QW64R3btO9s4/lOVhl2HrlFTrNU2xefhU/MJkYMUkScrC1gUOZlIz9YJNF8xShM4J0jGEWjOhTzts0WwvkLkRphCXuX8fBycqkWUDfn3Dq0kX6gz1wbDr1ddxOk/hwE/90m
7miQm4aVBpNbNNi7guXqC3NkxkFw/0dcA0azaU3dTKfzNHVaDQajUaj0TwcREmMb729iHpSfC3imKIoMCXz9Z5BW+KmxXz3cQQy98BwXu9KK93TQTervWW/Uo/7xyKhCI6ubSrBdJZNOu5H+FVHfQ6R9RNT1ktpNNtEaabWSQcDrFpN7TPupQyO81slZuDaVp/M8TGZUHdMvnJjn92+fAYYcmrO58ZeQHccqpzZL1094InlBjvdIVcOAjqeQ7Vs8trNXVYaZX7tVkzqJQQhRDH004yyMR2NZR8Lr/IJRfrmxHYijWHimj0759ANEvohiBm1TMhO6OPn8HTb5fRSnSQvOBpHXFis45gmyw2PJC34wScqbHYndMol6iAItn0AAEHSSURBVJ7LWtVgrVXByWEiPx/gscU31uUzY+tJ8VWj0Wg0mrdDC7AfEW/lYn0vrUmzAksGVglZkmM55htem+U2vRX3ir7i3D0ZPfBW64tYqApAT7KfpvsM+iP8Sonh4QG1zhzhaESWeoy2AuprNVUo9q9dx7HrJHZANBqzc3SZxz7/DMP9I0w1fNSg0mrS39ojKMY4sUN/NOTo5ZusXHqGqDsgHPU5urqpttf0K2R2xN7GLQ6ONukULfIgI58UmKHFnWuvUCmVCWyHJBipcP5JHmNWM/JhTMnyyK0Uf1Jm6PUp202sNGNc9LFdj6gXUZufV07c3aMbtGtLWG6Fx8uPE7gW59pPyu10qucWCZIRbqNJZW2e1IzJMehtbbNy8QLzSY1JMKI5N0fl7IW7ea7iMpbMqFkcg0aj0Wg0Go3mo+WtWvnfrWNWhrDelVfTVKZvkccxpiw/VvJE+MzHk7cdyHXyeExv2jlmvM3nBjE5iGianRB3HcvgcBzhWya+zBroBSw3SxwMI+ZqHldu7vL42SWqns3/+/xt1lZb3Ng4YG+Q0GolPLXa5Op+j7AweXqhRtm1uNJP+exyRpbm/NKNHv3BhEsrNa4ehrxw64DL+0Nsy2a97eOZJhuHA67KQNxhRm3B4FvXQmwffuPWmPUy7A0L3EKlhiFn2xBDRgoTYzpgS854uQz7E5ivmEwGOeMoVd1r51oOqZkyDkucrU3r6lbVo+SYLJddnl1u8NhqnaNxSsk1+KxEgKU5c75NnGWsljzOdpZJRgl+y+dTa6271+79uIo1Go1GozEK+Wv/gBkMBjQaDfr9PnXt8vtAicaJahMqVV3yKFKDmuySzPT8Tt6piJi9LrEBEm/g+iV6e0c0F9pMeiNVfEpcqpmi7oZLHqs8B5MhpmNTqtbo7mxRJDb5IMYpObjtCgdXb+LPVYj2hpgVh+0rN2l7JeqnzrK3eRunVqHIQghyBr19DNumsE2Sm30m4QgTm9zNsDyf4cYepWaZOzdvUDZK+FaJ3IjVe8pZicDKKDnS2JQzDga0aytMenuM4zFOqYSVGviSWzW+xULrEkHUxfI8zCLFWW/S39ymvbBC4kk4f1O5FjJHclz36TTO4s2VaJ9aJ/dySvNN0s0xI+uIemuV3IyZJCPa7WX8apnChF73iPbc/If4G6DRaDSPBrpW0Gg+OvR/fx8uaS/EKptKmB0MA+q1N6/V3ynL9eTrM3PEKEqVW9Wv+sRpRqVSxUgTJsXU3SozGOJJqATJ1J5mmb640WMhGfH8CM7XbTLDohtApery1RsHfPFch199fkvFBNRKHuPxSAnAg2Gh8k6v7A45W7d46TBUYubNozEly+BUp85ef8z24QC3WuHW5ph6DY4G8OS8w7VeQpzCharJIMnV54rYgDCCkThwHZtAZvUa08FZYjaV2b0iJ9fq4tcwWZlrcDQcq+PyXEvl0uZpRr3q8cpGnydX2yx1SlQ9lwuLVZY9h6t9OT5bxSo0aw5uZlKqOrQrLtEoUbM7tNCq0Wg074yuF94d+i/LQ8793mnNskw9vIp7NyBf2pyyOKJIc9WvI+LoSVTAvPxHs9XFrjmYRoFp20pozdIU08oJdnfx5
ufUMokYcF2X0V6PIowwm1XG+13yOFSipRR01tjlcHNTuUq9WhW3WVYFJmWb29+4hl1LGO73aA/nyE2HIhjieFU279zm9ms3MPMKYdjFNctMRj1cr8zoaIfISujUlznY28SzS0wmPTrWKoHZJzroUcMjdBN8u0wWuYwmm0RGDc+r0g8PSAnw4iZb6StUym1q1iITe09t1zQcvGqL3A5xCgO3VMUp+8SjCY7rIrfg0/6QfjNXsQanLz2Fd6tC85k1zJ7JKD5keNBlwTxLmAxYPHuB3Cjo3jyifXFZOZ3TNMFxfC2+ajQajUaj0TxCvJuBVzNHqt2cOmiLLKPqToXRYZRSl9DTE8h2ZfuJ6zHY2KGyukw/SFhpzgTbaTEv+aQLtelnAFtirjwY3t5hsVGmKFfY6k4o4oiGLx1lIaVOh5e+/Sq7YcHcYo1T7QZdw2a9FPN3f+5bdBarmHNz+EclVkoFv/DCbS4YAS/eiNjKHIowVjms/bhgte6yddTja4lFveQwDgJyx+Gwm7B1OGE0LlSEwPXemKYpnXrTQVhb4wQ3gsyC2+OcuhiBEzjM4YKItNj4KZytWdxMMpriFM4zDsT9WnOol1yGaUp/GFCxHMTk2wtylhuuEpVPtSusV8tUGz5rVZdBlnNrs0++VKUk7t21JuNuRHAYs3y2Ob12lond+ODnaWg0Go1GI2gB9iGIHHjDelGEJ/lQs/D+0ju3OKlppJInlZt3B3GJ6FpInZJmJIZFmub4mckoGOB5PlZqUThSNKZ4VZckyel192Fg4s0baqKoa1mMDgbUx4Vyb96+fpuzn71If2ODYBzgzlWZ9Puq9Du4fIVwEuCZFQoRfg8OCO+Mycc5lpFjdlz2ru2R5GOMscHB9dcIDvYx2w2S4YQ0H+H5VYxhQVDOSQd90omFWzaJBj2segVrFNKfHFErLZDGGUOny2B8hGuXiJMRS9Z5BsURo/EhnlNlFHXJnIJqaU4NPEjoUyt3sCQtdz7BH9eIwkP64S6duVVyx2TuzEUm9CnylPLqIsWOSbU2R/3UIsXApPX4Gn5qM14ukWYxXsPDzF1On34Gr15leOgQ5yFH+3usf+pxLAmmUpEOegKqRqPRaDQazcPKOBlTtsvTmvptmCQTys60ph/GQ2rV2n1tXwa5zvJfZ4O4iihSswWyXo9quUxyeCRTVwnSHEPmJPT7NGRo7WBEZ75JcrTPwZ1d9uod1ltl8u4R9twc4xde4dbpJUb9FGdwSPnxSyry6mrk8dl6yvbhEb7lcuXFV7lidOiML/PMZ57gV3/9BtW9CfPBi2R+k369ybV+TiMesftcj6RdYTgIVNxX2wi5HXhyeHgx1CseN7sRz++MVRZrToo/DvEMGytJZMwBu+MCaeA/OD71SgaWhLeWYDiGwFRfsppDxbDoFRmPl2EwgqWKxblTDV7Z71MxDfwwp9OpcME2KSyTM2ea9PcD5hbKbOwO+fz5ebqTEDeFJ0+3uXM05snTLTWcbM33OOxN+J4nlimXHUbdcPqjmKSce6yFNZteptFoNBrNh4iOIHiAiKDqOY4K4jfviQWQYVaSpxpHEaZp3s0CDYJAtfoXUaYGV1lOwWi7S2N9gf7mJo3VVQb7PUq1Kjuv3MFtQmNhBb/isfnqyxS5RXVhjuHmmLnzLXq39giiAFccnmHAaPOAtc88Sz7ss7Fxg0qljJ9W6IYHFBNpj1rESHsqd8nyXCWI3vzWy6w/cZGDl69jz7XovXabw6MDqvU2JbdJd3SdIjax8gr9wQ5GGjFMj6haLZXbZEwmBCpK36VQ80glJdWhwKBZW6A33oU8wKZGjSaDYpuSUVEDAdLcomzIrfMyhg2DdMhcfX06FlWEaMfEzA3KdoO+26cUOUqMnWQ9vFqDUrVKvTrH7WvPs/rYp4iigYo8iHZHrD/1FDdufQsvsFl99lOkVkLZrVCEGXbVxzRtFp48z2tf/QblpRatpRVGgwHJaMipp55hODigtbx63
2K7RqPRaN6aT2qtoNE8DHxS//uTj0WTdIKd23cNETMk2kvE2cKxKVRW6+uuV2n5t+t15VZVhAOy1MRuz5MeHeAuLdLduklJBs0aMVli4i0tqcis3i//CvXv+iIkAd1b29TmOoRJShAlpNUqvVdeoV6r03z2GV584VWq7XmMXo+NwwPczgrjtKCdbZMVTaLxhP284KmzK/zMlductV1eGAes+Q7fuNanZRsMfY8zjsHz/Ry7CChFOS8EZRqGSd/okdKklOwRFgvE0vIvuas52HmKbRrYhoVnwLCAJAfp7Yqz6XCs1BAXKUipLuvI91KvlwqoG9CXS5ND04I2sAUs1S0GRxnrNZtDAxUhcKHsMRhnXA8jHlurMkgyyp5FFma056oqtmESFywvVfi+xxa5fP2Q1bqPXfeJJzHff6rDC1cPKbVKnFqq0N8LqHs2peWK2r6gIwY0Go3m/fNJrRfeK1qAvYcHEa4ejsf4Fck5ze8Orupt7eHVS2KBpXvUp72yTri3xeHmNka9il9exMoj+sMBftkm7VsMxgfkBwHzj61w++WX6aytcbR5xLnPnmO4v8/e7gi7SKmvX+Bw+yb52KG56nDzuWtU7ZCSVyMtEgxabNx5iaW1C4xGt/GMJsPeNtG4j+e1iOMYy3ew8gRyl97hDmEe45kWQdqFxCFKo2ND9UBGE6jZpCYOubg/s4hcnacpvVbqFriJhUuTMD9Ui9+AhDxZImBOpPmIlBCTBrkq2zwaxhxBMSQlIlf7DGhV1ikccQHnuIZDKIKsaVA3OuReTNWuY3gumZNTSsskjQTLcjGNnNbKWepeg266jyXu1njI4pnTLJ2+QJYlbL30Ko2lBYrYwGv6hFlAfX6BNApZOH2WNI5x3uUwBo1Go9F8fGsFjeZR52H/7+/e4bEfNNOBWGPM4xvqMjQ1TxL6r12htnyKgRHivXQD9/x5+nsb7N2+zdnHniYbj5kMeyT7PSpf/DSDb30dd+MAw/dJzy5x+NWXaF+8gLl5C/93/QRHv/SLVPKU51pNnt7cplhsYu8ccj122HHLLEd9rMMeu5/9LOMXLpNbIZvuEutNjzv7MRkxG70hjdYio+AAx3CpLDT59pFFfzgmcSu0soSRZTIqcspRyJbfwCgcnNGY0LJksgKpVYKswLUd8smYtFSfRpVZkE5n/GLGqtkOW2LNZCiWLJT/M8HJIfFV6Y2TQsVWoxyopyDzZicJLHiQmVBXBgo4X7W5OkmZL2DddNko55RdE3tUcGG+zPUk4VTFx6k59IYR3/vYIjtHE3LDoNV0sYOcufNNNTTrcGvIWPaNQcmxOLVYpzcIac+XsQpoLVVIogzH005XjUaj+STVCw8bWoB9G9FVHKKSffp2hV93a4c8T2itrHK01cNPBzgLC0wGAV7ZZTJKcRwP28248hvfwnCX2Xr5Kzz++TMkgwpZPCb1TPbuDHGIsTKbwWgfy6lglRbZvf1NPLtFNtgk9n11B9wyPZI4IA0c9o9ucard4mA4odyqEU0qJP1NjIqPFzpM2MawHNJoj/n2ebqDXcJxgWWEVBoVBpMMNxtjUCFxHSwmROND8qxCUQTkmJjZkNyKsXDIpuqo+l9GCLENriyTXyMJdjruMcqKqbAqwqu8JEVc8fpqb7yYuXKvyi1zednGI0ME3Rm2EmxFfM0ZYKlmJcluLZMUE3wqOP//9u4sVrL8qvP9d8875uHMOWdWuSo9ULQxtLtFW/iqaa4RcNV+QYwSQrzwghC8IoQQbzzwAuKJ6YFBIHGRuKB7hdp0c41paDcu29UeqrKGnM8cc8Se/63135HldFUZ290+VVmZ6yMdZeY5EXHiRGXaK357/dciwrg5VRjgRtCiT3dnh3kyoUwyOttDyllB3OwwWt6n3+/hNVps7V6ks3WO9GDEwpvT7W7QOb9NWRVUpsB1IqoyJzke43VCNi5eJFnM2Lp4lfgbnPellFLq3V8rKPUkeRT//c2zOe3wzbWXDUpbr
be8j3SkyjH/MklJGy5VHBOezDCNBlEYUp6OoNmwR/09OfK/XDL71Kc4XZ3SPl0xv36ROIgpMbi39qlefgn/4lUOzRjv1X2GxOxvuITHc6Lbd1hsbJAuJ7Tfcx1/NKV87ZC02yU/eIXeBz5E9pnnSfd2GXkj5kmHrizMTcfMhhfYnJfMmzFhOcXH49WFIR32OFdU3Ci6vLyzzeXVPo3Y47NVl9h0mJUOKz9lYQa4Xk57kfNSawPHDQmqnGUprasOK2JbkvfyEaedXWnnlflY2HlljoefGgq3gWNSTBmBn8rRt7p+T9YvZgBxDokc0qukOSKWkay27Jd3ky0XsrCe8SrTaGWNgoSxoTQSy9sDA/Jfb+7BZSn4A2h5Lu9rh9yToJYK04xYJJntlo36Ee2lw/Bal0vtJrNZils6uH2f88MmYRwSZCW9Qcz+nSkbW03uvjzm4rMbdhFYb6tJqx/hy+xYpZRST0y98Ch7cgPYZApxl7Ks8NZdqA+rlksWSUq732N6MsVbzJnJpdzZGH+wQTI65fDmPlFfBrg7BPGcipjT1/aJhw3y0iXfnzA7WeD1uixv32Tlx4ROwWwuXZ8RZVrRaJ+jmr9MGe3apU8Oc3vUP5MxAXlI5htcs8JzIgoyG1BS7oMvGzu3yf0uppiCuQtOgeddpCxXeG4F0nkaDKDKMbbykTsX9op2Uckxfwi9c2SMwcS45ZzKTfCcBqUzW3eyyqXtrG5cXQ/HtxWUPIiEpq5cMV/awgu/7mut8iUEdUha560SrJY4pcxskp7YNbmTPKjx7XGsdbb7eterlG8yvsDYS/ASkBc2BJaZtzK8P3BDG44WrsvlC+/l7u3PsbFzhayqaHQ6hHHMeHpII+4R02Z4ZY8yLexrubk7oHftCqM7d3F7vv17sLl3idnoGM/zWI4nBHHDbniVH3fryjWGe+fseAH5J+O6sqxLxwwopdRjXSso9YR6FP79FUVha65AxnfluZ2V+kblfEExmxLu7JDcuGHTwMlyBOOpndO6qFKSF75AdmWb41e+yPWr38nB0ausPvNZBs99F+nxAav//hnbVJE5Do0v3WS51ebES+jdn9KuAl5oL7iyapGVS8owwjlOpGymdDzCvLThYurDnaHHhdOSaeyxbFQMJi5pXDKYwWlTqmqP41aLHQlmpY7NYdqNWHgBvUlChYfvBdxp+yR+xHF8iaRhGCQ+G7MDRkGbG5sXCFOfSVMW8DYxLYdl6kEcEVYBx1HArKxYhj166YqZ71O5McZxwStxKw/XZFSuQ2QatvTO3QpTSS0eg1nZDVlDx3Bqmx/keYfrqtz+qLYil/4L+YO8U0jk25d1MNvOsS0UhQvtCqIYOjmsQtiTejr2mFUlW47PqgXd7RbhpCQrMvLM4alzHRqNAGeeU243ebrZoLsRcedwTmu7wRU/pjdo8PLtE66c63PnpVO2LnSZHS/Zudan1Y3oDBvMRglR7OuYAaWUegLqhXeTJzKATVcFUcMnWeasJjMqU9Hu9UiTKcm9Kf6wyf2Xj+i1fe6/cANv0OHO5/ZphAWjhSGbVrjZfWZ5BwIH//SIpRlSpS/hRzvSOkvlX8TNXrILsHz/AmX+GkUwIKwcMi8nkI7T4i6OhJM2ZB3g4GMY4xZQeRtU7gg7IakcyTl+cOZgJNiscGTJlr8B3MEtQ4xZYgK5+p/SyLssgyVOGeG5GaXTBnMPt3QJ8Um9EmOvfMur0bRXsXGlklvYEQJuWVJF7rrKkmJXVpRKYOqtn8e6ozUDzw8oqxzflBRy7sht2fmteB3I5+uwdv3CS3grD7fIoSUbwORLlQ1iC1lSVT5IedchryuvTkBpcnwnpJAnZCpc2QhmCjvWwLHX0lcYeeDApe8PiTabFFlKf/sSaTnHL0K2z58nXy0ItmWBQcLWM5cZ3bpPMIjxcygWGc3NHoNnr5HLttkspbu9S3dz246L8OUyvlJKqSemVlDqSfYo/PuT7lWZs5q8+qoNX6uio
HHlCsvPfc6ea3d7Gyz+8VMctyrK2RxzcMDBF/+ZxuYO4/ER5mTE7qjgJMjJs4JqNWKVQn8Or1yBp1+C0x60J7C/AdszOGjAKoDhwpb4lLJwalWRRy5eAoHsY1gHrpMGxAYy223g0JsZ0qAOHGUI1zJwOW5Df1H3GSTSERpAYwGHDR8/KFg5Ls3Swy8qbssprDRi5Jc03JCTVoSTlbiZi+s3mXk+w6Ti5e4WR+0+T83GzAOfm92hfU5H7R54Ma2y4MCXoawtVo5HbBxKryKqGsjQAuP4BGVC7vVps2CFRybdrOvWCwlb1/mqjWC9dfku2xvkR5WK2F3frn5XUn9etjpckg5Xe2atHkoWyMgCH7ayehRBI4SpD9dKON9vknQ9zuFzf56xMfTZcGPCrSaNeQLNkNDx6Dd8jmX57v6cD/27i8zvLRjstXB8l3Ses3mhQ6MZ0hp89cxepZRST0a98G7yxF0WlGAtn8y58/yUdHkizZssVxlOu8XpP32G1pVNJl9+jdNpSpFuUTnHFJl0pC4pTYpH1y58KrI7OMEFKqc+ch86FYVUHr5L4Uuod5vCD3CcgJwZrtfEdwoyVzpLO+SswB/iVnKsaUzlnICRMHEL31nim4mdfOqXRxRy1RqHOItJvDF4W5hgCWUC3pDKO7VXq+0HbVJzB8fsgTPFcXfrBVXSDetfIDEneElKGW1jn7A/lfQRNwqpvJa8GBCHYHKQP5dL/GKTIpzWowKcPh4LPHlNggWlTNmvDEWwPl8k5Zc83Ux+Tg83cKkKuRyeQ25ksj6u79rmWbl5IStSbehajyB4nRwxcjwbvkrYK92/D1Ru3Vfr01zPh5XXrcArXYpOSZUtGWzsMjq+zWB7j3l6SupsEbdjtncuMGvN8Johg6vnMVnJ0fFtujubBP0um+cv4bmunc0rHbBfbxOuUkoppZT61oev2f4BJ//0fxNvbjOZHrJ67WXKpy+T/H9/S/bBZyg/+d+ZFVNu5Sd0mz1e5JDhBHZf+Dw3OyCNoVsF7NyBSQvaSziRg2EJ7NyEwy5MQ8hb4OUwbkAewYUR3B9CO4F4JeWry8qDTgj+ov7aMPXxnIKFD1EGzczwmafgmbvrDlDHZSqBYw5HXewc0t4UDhyXeMOQuQUrr0EjhShdcRy0uDqaMYkLhmWbtEroL+YM05x73U2mQUy0yiljB+lbHaYn9v6e67OzWHGr3+f6eMys0+ae32KjKoiMoecXHBppevBJHWn18PGMIXVjGd5lu267xDY0lSq++VC/hN3jsP5YPPS5eP2r3DZb/1m+3gGO1qGshLPbEtpKn4UHpyFckg7Z9aKuLHao2j7NwnC8TNhtx7Yrtx0FbBiX3pUNPByytkdDntjhEr8ZwbJk91qf9jCmKycO01K7XJVSSr1rPFH/j2WqCi8vuPvCXRIS/sef/RfK1h7e7CaNzhYHd4+pvvgaReVQOSmlc0qQTsHtEpgIJxhSmZTS7WLCy7jVHM+VMQZH5G4MrT6UE3uEHkcudw+ppLxx9vFMi9L0iCvfHjsqi6mdV1rK5WB7LTkAR0qXOaUf4pgIzyRU0hkqpZCBIpBSJsCRSsY0Ma5ciw5t0ViFQ4x0hjo9TLAJxSvgNClcKYVkz6h0uC7BNZRxB8pb4O/Z8DR0QzLjEOcJSVSB17bhrJvMqKIupQxwylw8b0U5cym9tF532pa5UY26+lpIhbWEKKy7YyvPHnWqZHK/tAPIxP7YsRVoJeeSHlR3EhpLt6v9D7TurJUXyDP22NnDZHWX67gUVLQlaHZS+XFsmee6hrA9wDUlTbdBmWa0B9tsPvM03dGM+eyY/jPv5fhkn41LF1mcnBL3+xBXfOT/+DFO7txmcO4ccUv+GyillFJKqXdCOZtxMt7n89mLvOK9yBc+/ftcWEa44wX7y4DhZJ/u//PfWBhYSodlCaP5IQMJOwd1IijB5tEuNAu4ea5OF0/bsHcKh0MYt+DcCXSS+j7SqXrtfh3Uurkcq
TcYma/aglEEckJ/sYKO9+CYfWFPrB1vQH8J/gqePQIpxSXUTXzoSBPqOng93JbQERJT2T/LmbWOt2LuDwjSgvmgSZslyyhh0YyZl3361ZKbrffi5ROq2JDEHXqpwW3uYlxpVKiIUp88DHmvdPgGWywKw7MEbGYeX45dpqXLOdmh4MLcgYGRkNRh6fgUOETr8PVB9SsB6ca6e7V6KGCN1mV7b93hKuX3an3bbB26dtZlvNy+bjdZfy2FrnQO2x27Lu284ny/wTTLOVdAsNXm2l6H1Sil0fKJNkLGpwkXrvRIjxb4m03OPT3g+jAilKXAoUfcrMdRaPiqlFLq3eSJ+H8tO4A/anDn8ze5+cJNpvuG1Yuf5mSWkY9vEK3mzE4CMqdpL/MaJ8IxshgqJDAZSy8GL6xnnsrRneoWxuvg0LVXi6UL1dAkMgmpt7UuSST0dPDk2A89cr+PZyoSt41DgpGJ9KaDbyoqJ6d0DI59tBmuuUjqyFzXCj+Z40QDylI6auWbtfCXRxRxH6T7FIcq2MEzOYWMKCCtg1i/wq2OqOTslDexoaqMX61khmoygsYQjFyf9sjcNl45JpFr9n6Iyf06QJUO3myEI9mqzKz1e1BNoNGwn7Nl1vS0PlskH450zjo0/YAkT+x8fztnQKo218FPKyqpAKUbN4zX82SlUpNZsNLBu/6zfR1cTCYLAOSqvZzl8jFFZY9QSUPwwszwysqOIZDbx80usRvS6LcpY5/58YhL199PsUpp9jt4nYD+3h6Hr91ienJIGDfp7Wyzce4iWZKwffUpXBt2K6WUUkqpt1NSJHYk2HQx4r+98v9z9/Q1vnz7c4xeexEzX5DMbBMn55+Hz23BpbQOV6WLVfoERj3YmkAzhVETWikMTuu5pN05TNrQXNUjB5YRjGPYvwybK9g9qUcO3N3Cdlt+8Ro8c8dBDncdyoiCEvoTmHXAHUNV1J2yUrq2M9ujYI/Vy7sqCYPtjoUCpgGY0OF4YOjJwbfEp+kVpH6LwinYHKe8fNUnrNr00jZFKM0KLfbmBWFV8upgg8g1jJvnaZZ3CY1HFp6nkUzxow597zyLRsFm4ZCagNzkPG1CJq7Dot3mfQ7cBq6WhmMcBsR2rIIEpA+2X8jTHq8/zkujaV25s7MOWiXm7NfDyOzn6/NtdQkvtz+1Z/fqzld/3RU7Wz++PKZoVXU37LGBAR7Tjst23CAZpwwbPlutkGhe0DvXpuF7do7r6bRgdLRisN1m40KbdiciWRa0+vGbGjSUUkqpd4vHOoAt84pqNefk5iEvff6Ewy98kaPjDtn4HpUJqKq83uAZD5jb0kKOs8uL4lO4nXpWUhQSOSmpBI5ORuy2KMwUx4QU5sR2ucpcVh8ZFSChqwSpAa6Z4zl9KjmUIwuuzAzXiWxzp+tsUhkf44wpJPTFx5EuWDOSQQPkzvH6urF0lF6kkhLI3VmXQCV588K6tJEnJXtZKwonxjEdjCzPksVY8lnZnOXv4hcnFJxSmuY68LwCZlJ3xPqRPaNVyvH/UjpfS4zv4yUhpSSorUt4ZkK1LGAu3cARzKYUgYe78qi8JkQyXb8idnySoLQjHWi4dQ4tzzNJII8p7IgEubwugbZMP5Bu25hC5jzZRNdAQ7qHJcPO8YOYYjmXy9sgW1ylasyreohUWdnmWT8rKUKPJM8oqjGL+yOa29s02z1mizEmmbB34SmKNGd8eJ+ty+ftTNdWf4Nmr0fcbMHX2JyrlFJKKaXOVlZmnCQnvHLnBf765n8ivX+Hf1h+jjKDnimRnoWNDE5jmFyQI/tw47w9xMTeMQxncNSDmQ/3tyBcwGAO8249PetqWXe/7vfh/IksnYKBjBZYB6h3diGQ7lbpQ3Dh+m24s4ldnjVrweY+3NmqZ8WO2vDKLrz/Fow70JxA6IPsvpWyuzuDZaM+/CUn/4PScHEMt7elii/IWg0a4yWTpuw/gDJsg3PIpDshSGPyxjkaxRHTSE57hbSXCWXcs
AGtU0VUvSHzMKJFhLR4hN6Q2CyYuT6tZovccxi6KW6esgwinjIw9lxi1xDJeoWHOlpvAbsPjRrw12GqxJvyrmiwvp0EqQ8OrjUeur/t4l23TUQPzYF90BXbW3fUPpgReyWXE4CGC0uPk+WM882QTjNkepTi9mOc05Rg2CSZ55x7Tx838Nnca+HasWC+/RA6HkwppdS71WMdwCajCYvJivHxCUE+ZZlJ+llQmhinOsF1dyjJMLLkSjpcpeuykiVPPpXnkMrZIjYx5n49R9UZklQyWkDC0CUOcux/hOtIJ+wmQTHB8zq2yMgd1z42HOM7OxSUdpaRY2SR1HE9rtUWTxXGhq+yaErCXUktfTBj3MpQOkd4Evoywi9PKbzLNJY3WTUkuJXn3LHjDowtnWQW7GRdBsmPI2VRk6K4aztUTbCLI4/vnOAyxK+W5KsUrzimkIVeUYGzzDFBTiWdp7JMK59QLCR4DXBbAdWywo82KMwCU0oZtgJ5XWVEQrEEI1Woi7soqWRUgoTIUihV0tHbsI/DTF7XlEKuYNtO2nqzrRvKHjCZ+SqzbUOKbAGlqTtg7bpVmUtr8KRDWb5eZVRS9QoZ7ZCucGU7beUzX41wWh5bl64yn494/7/5KKdHd7n03m8jakk5qJRSSiml3kllVTJajbg3vccr+b4dETZZTfD9mMbBwtbLEpi+tAWbU9jvwb9+CV4+B7Ke4NY2fGkHro3rTtRlALELQQVHDdiWuau9ehFWFsKkCbMGvP9efe3/pR3YnEN3VYeuMuv1srQ8LOCkBxcP4ahVd89uJ3C7C9/xCjSlR6KQEV4Zi6heNjVtRTSqlEUI3SziMEiZBy6LjQ6NlSyPXVIELU76Pr15g2mrxd6ize3dlMrtE/uJ7eo97e1g/KdoJYajjTm9qkPSbLAMNigaOWE5wE99TocNNsYGL3aJw0gOmxHJ3gQjTyi2Z/EkZN5dL7CV9yfLdTAqAev2OkwdrGe4Cn/d6Vo+tIxLHkdaQ9rr7lZ5C+M9dNvV+vEkzD34SguJDXMn618beESuRzk3hKFjD79VXZfp8ZytrS7tfojneXS6IXEn4PL75dkppZRSj5fHOoB14iaruyMODiImMw9/mZAvE3vUv3Q31mXCgEpmnkr5YCIiGmQyv9UGrS05KAPOJnCCX92ncOUlk6KgxLgNOwfWcXoE0rnqN3DMCOOE+OWKwnUIqz1k5KlfypVuY8cblPZasQyGkqVWEb50sJJTyeM6cpgnwCHHeF2MdOMaufYsj7PCq1JWjQH2HL6ZrwNXR8bU2/mvvP7YXZC5tPL1uI1HQFEd2nm28rWqOiXzDX4RUjRC26HrzWbk3R0oEoycr2qsd5hKN2sqoWxsu1CLqp67ZPPVoAm5lGiVXTpmh12VMqVVDi7Jc6hnz0rw7DgzKrvETF5TmRUb2jm3pVNgkgVl5uGYGOOs8LMlpQ2zV5BmEMn3lrEFKWUhM2h9CKv1DFmpzuf1lfeiopgf0t27QKs/ZPvyFfaefg+bFy5znve+038llVJKKaXUmsz2l47GF45f4Hh2zGJ0zIE7Z1otiWSUwBIGY5hswrgts1nhP38AntmH14bQyWHWhZvrKVI7p7KstZ7luujCpILOFLyoHj8gfQISqt7ago1TOyGL3VM46kCQw3AOr23bfg16Kdwf1IFtV/owSmgV8OULEKbSgZuRezCYYn9NgpQbW/VM2C/tOPRP5fMOSTilv+rSKzdx5GRep+Le5iXa0qWaw6Tf5up+n/3ugLIokfN5y/ZnqZz/QBkPODE+ebOgP/VIfReTN3G6hqHMcA0MiyC2pTCOb+vtwInJkPc7sW3VsO8Z7LuFhBaxDUujr7Rr2HYR+ZDd1c6627Wz/pCA1V1/Tl7ii+tQdT1xwdbeG+uv76/D1gezYqXH4qr9DnLSz8GvHDqyz0E+lbsEy5yNvR69rRab55s8853n8QMdB6aUUurx5T/WA/zvzFnNMlbHI1jMmS8rnColr
GReqgSeFS4BlYSBtCmdef25qk+Ah5FQz1nZcajSnlm5EtQ6xMkBSSzlRknlSjhrKO0BnQbGaeOaBYUnk5GkMJNZr3OM38DYEQKn6+BVSpzEFi710XzphZXrxR2caoJxm7ZAMeYYnHPra84SVkaQn4AUKE7LFlv2unIxxXgjKke+x/26JDKvrksgj9JkOHaYq5RDMxw7GmFA0SihSHGKFnlDHk9CW1n6lcA0B79RL8mqSlhJ8Glw3Mw+bzuOYDWuby+drfJj2PmzIRQzfAxFKaWbhMltjF0eJoFpcx14Z5ROC79KbEiNK7eRcLZJIQu97G3k9hIAS/eydMLm0G7UZ8TkFZKzZeujSG6zTbPT40P/138kDkKe/tf/ltB2CiullFJKqUfJqlgRuAGfvv9p2kGbT00+hdvpcD8ds3fscRIVct2dL12ug0/Z63oS1ouzXtuAZ+/Vs11lxutBq971KsHosgnbY+hM6i5WCVjv7tTLsGQhlIS602Y9rkC6Zl+8AJMGhCUcDKC9hN4MJl3oLqG/gsM+7HfrgFbKzlJK4FxCXp/esmDUDZiFuaxgYG535yaETsjtXos4W7DfyvHSgJP+VRa+PKaci9siie/hO9/GSbvPazs3aOfb7CV99iYbJFEB3iaOkzDIm6z6hqocE4a+3QshqyCSlmOD0fqcvyFzG4R2GJr0tjoYOxrtgXr664M3fzLCjHVg+iBsDdajBSRQffA1Ie+ApPL21h2z2fprD+bByq9PrW8r388lpIsszq3P+z2QYmg3HNq7ER/+P58iaPjsXusTrscLKKWUUo8z/3ENX5epS75MWd69RehVjOYuKyI8v03hyVKuNtVyRSlzkoxj5776VQPjevZqdUELxxtg6rLGBqehFDZOTBL3cMmo7LVjCRVlbqldx1UHqY6Es1JYSZnh4ZoMz2lSmhFemVL6UuZIh60c1pHAUUqe3fVYA8eGr9LJWlHiODs41TG+icg8KYkyCDbAzMCGrRL8xpSedMBOwMj3lseXs1XSbSrdoSUE0qXah3wf392hcEu7DEvaWH23QR7HOFmAky8wsiCrOpVVpfbn8hIJcBMc49mfx1Slfb1yM8aR51pJWCrFn7QHxFBIcCpjExY4VYjjNKgq6Yh16zDXCvBsAJtSyPeR0QjSdSs1WiXPVSYsyCra9eQoWZAlC7eCRv37B1oygsGwc+kq7/mO7+LcM+9l8+JlijTV8FUppZRS6hFVVAUvj15mmk45WB5wtXeVT9z4BFVaMW+4pAmUsZwmkziz7n6VkFSu50uX6UEf5g1soLmzgDSEPKwPScnMVulobSTw0rk6tJ23ZN4sHFyuF2rJ3Fdf6s5xPbd1JrtlZezAPry4Uc+Lvb+JXWIroa10zRZBHdY2DGxOpC+h4r++B3xZQpv7uHnBohlw0ulSOKd0c4cvn2viFSHzzS6JV9Gfx9wdNonznPbqCrlTMRpOeGrxESqzYuXl+Bvdeinwao4XRCy9JS5NHL9jl5HZnodAwtU63HTcEI8Yad9YUki/qdzatnr08FhS2XcmUkHLO5gmHiMyW8/LMLQhHtKGktt3QEHdIIIhs7GsVO11lJvI4tt1lFuPP/PWt5Z1xIG0psg7n9dD1wcVu327UMGzzw547785T9wO6G23iJq+dr0qpZR6YjyWAazX6TC9vY9nFsSDHfJZTrc9ZSlnmYojkoZHtVriui1cU1JKR6bxSG2wlxKWEwqvK1GjXbQV0LSdmokjweap/XNuj/nLClTpwJTDOOH6WvB6BAATXNvVmoDbsx2oOO16aZW9btwAc2pHENjZqLY7VfLS+jh/fT1ZHmuLypmR2ZEJo/X1Z/mQWbKy6EvGpEqXbIhftCn8g/XX23X4Kh23ocy8le7cCQQyj3aKX0YUsrBLumA9H6+YUHmFLWqd8QQv7lNkCTRzysCAL8GrPLR0zJYUzgonauAtxpSRDG+V2a8S3JbgyOswtX+9jBeut5XKyIF1gCw/nx9Rl
jKyIIdA7ivzYT1YunXAKo+ZyzKveqGYHW0g58SkcvZCnGbM3rWnuPK+f0WaLPi2j/4Hmt0ejY78N5IfX+e8KqWUUko9yk5Xp1wdXLVBrJSvzWaDy+Ueo2iJMRNymWa1KAkXxh68ko7VzIORzHUdy4gvaJWwL4NKC1jE4OfQWtULseKiDmNvduBf3YDbOzCO4JL0LDhw0KwD1fPzeu2Ak8LLu/X1/2lQP+4zd+Botw5xT3sxbpFQVHBnENNbJCRRQBH4zLOMIvQx/jn81Zh7O0NSLyTMZIZrxd2WR28x5PbWjKhqMkiaLFo9e9IrMG2SYEVZGRp5E68sKfyYvCXzbGW/wdK+Pg/6VyVulXECU0ridaerxLJyRq25vo2EoA/GDNTn6mr+OhYd1AMKbLuI7Uy1twltICvvQFq49uMBWftbyCxXXHv7hztb5WsS8H4lcq3fZcoe3a0LHc6/d4PQc3nqO3ZpDyS+VUoppZ48j2UAu5ymdHebLCYuk/2bBGZE6+kNmt4pk/Qid+4muDI/1Nkkzw8IiogyKwmLAxx3iOPEtvDKnRKv6lB4UmZ0CBiR0yO3R/zl8nOKY0PYFoYRURnbQqsOSkPZTWoLEWPmuDZU9TG221VCRQfHiTC2E9bHMzmlDS6b6/vLttMDjO0AldH4MjLgDjjX61H4Tj0yX1Z7uVVll2YVUnFKUGlnPWV4ZUYpow8qWWwlRZEP5V3wfAovw5i+nVnrlh6VDLuqZPnXHNOvKMppPebAHvEv6kxXSEU2aOPNXEonodwM7aQAT66Rh0bOk60PHzmYsImTTqDRhmQdGstcWbmJrK5NpLM1htir7yeVcEtC2jUZNZAX6+5XD+Im57/tac4/e53NS1fZuniZzfOXKIsCz38s/yorpZRSSj2Wy7fSMuXp4dOMkhGbzU1ujG7w7698LzcPX+ZedsiSxHZoLtoL8pYhCRwahyXdqWEkh8AkPAzrbtam7HqV7K8+VW9HBow7EJX1Yi4vhU8+B42sXiFwT0piUy/l6hT1n49jWZ5Vd7kOx3B3D+IU5v2QUWRIo4pRnFAFLm4R4BaFnTUrna/jZoLX7DOYLqnKe0xbe5j5Ma7bwHEHuMWUVjlg3shpZHISzeN+L6CbePhGelsjjrxDdswWbtAnC6TCL3BloK28Xq70l8pZNqn/v0Lmtsq8V9lhIR4Erm/0xsD0YW/8vDSQ1NHsV6s7aJ23vM+Dz4ugB5vn23z7v7toR5NtXe7S2YgxlcGV02xKKaXUE8oxdXvi22o6ndLr9ZhMJnS7Ujp8a6XLnKoyLMZzSgnwCDj+8pcZXLvGp//fz+EkKbN5wnLZJvZvMRkFVFVsj9b4EmpOZcaSj+clFHGMtxpROh6lLWukg1PGCkAlXavrrlfpdTW2+JHR9jIX9qi+xlzOKL0Y38QUjiMvuA1cjV0AJp23MpvV2KJE5sz6+QFFIOMIDuspTNIhKleTZRQqE4yRAFeO92xS2s5bWcTVXC/lkhEAsvRKtgrI507WYa+UUS6UJ+BJi4DMhu1D4WCHVQmZvSpjEIrbuIVLFa4LKTtjdj1/VZ6K/FFCWZn16vlyfgyyqj4TVsjzrMCTv1Je3T4gA7vsQzl1d6x8Xu7ny7Iu6YRdh7z2LvIc5f4PFWdxk86gz95Tz/D0d36Y7StXabS6NM/g741SSqlHx1nXCkqpd/bf3yydscgXnKxOaAZNvnD8BTzH4978Hq+OX+X2/DYnixOSPKHvdflS+iKNKiJxMzqT0p61IpKGBoOXGaKFsXsbGgWcRhDJ6gB3Hcyaej6sdM/aJk0PQukRcGV5Vn3wLJXDUyXIeoRUylPp4FxCIfNJy4JQZq4GHjtzj2lc4WcFRbxBkI/w3A2i1YrMD+kmO6zcI4b5M+TOksRL2Mqu2Vq7zDu0Smnq8OmUm8jKLXkyXXaoSO1uCululYpZjvM3iIhlLNo64EztQLU6bH2UyD7jOIJrHzzH1ec2Gey0CXyPqPVgiqxSS
qnHkdbr35zHMoB9oyIrWewfUvgRWWo4vX9AmVTc/h932Lh2hYNXRjSqA+4dRmQnUzuLCX+EwzbV7JSsygndLhkz2y2bByF+tiLHw4vbeJ5HtkjwE1nYVa6DWR/HjzCFlIdd4mpMIoGkdHyWEuI2cc0Ur3LIvYLISFElYaWEqhGEcmQ/hPQYvB18r6LIZBFXCtEepKd4Hd9uS3WdBqRzTKOP62VU6QwvSPH9DCep6O1d5+T4LpW7ouM7ON0u6WT6+uh93y9JV9DsRrhhwejuAVs7V/EbFaODA1qNDmXkUY5H5JXD5rVLnN67STOIOL57i0COSBXrObD1VKr616gB0zGEsqBrHejaAPqrr5p3Bps47QZBEBPEMc9+14eZHB/zng/9W2ajE576jg9RFTmeHxA1H+qQVUop9djSgk6pJ+vf3zybc7g6JM1TpvmU26Pb3J7eZpSOuNC5wOcPP2/3N9wqDpgtjqlMRd8fclAe45UFs6CwKw9yaW2VjPK0hEgu6jv1FimpP0dS469PdEk3RewSFpBJU0IhNWoJfc9+3V9AYEcehLSmCYvIpZSO2dOSWdMnzlyiuImXDGknM+53pmxWF2jM2rSqju1hMGlseyLOF5dZJnOa2ZB20YOmQ5WU7PqXWJgRvdYQN4PlYv10e3L4y2ExlXFo0O36rOYFnnTzui6Oa1jODIOdmLLImY1KWn0P3/c4PcwIIxhsNZhNVhQSRNe7gu2ptf9VcR/ag5D5OOPSsxsMd9rkhexh6NpZrsNzLYrM6IgBpZR6gmi9/s15IgLYBzJ7PB6CSAoXl+n9OfgFi5OE5WRM/9we91854vZLx/T9FdPJ0i6rWkqIubHF/ssvkaURzfaAfl82kDrkSc7k7ph2T9avuixHx6R5TNxuY8oVReqRLUe0dy7TaoxIpyt7/MaNQhrtNuP79/CbAcUip/K7XP7AZe6/fIO9p7YIQ4d7N8bsvKfD6d3bpPOKS+97D37gkMym9Hc2Od6fyEhUZvtz4kaD/kaTyqvYuLBDnleYMqdIc7zExR82cLIli3RBkbm0hi3KJCfPMpLphEvPfZjV9C5VVTEfnXLx+vtYzVKy0wMyY9h75gPcfOF5PN+h1R/S7PeZHN5nOR7juj6zySmu69HqdHClOAx8OhtbHLxyw/6s0uWb5SmtTp/2cIO42SSVStPAxoVL9vdBFBG1WpRZjh9FOI6DqSr730sppdSTQws6pZ7Mf3+TZELgBcR+bMcUHC2PbK14d36XrMhoRS3+4fY/cDS7zzPNq/yX438ijmK7cMtpN3np5CVW2YpznXMkZcL53nlOZ6e8NH2J53aeY3+6z2w1oxN3GC1H9Ft9G/4usyVXu1cZtAbcmt5iO94iI6cTdrg5vkXbb7EV73AyGfFc90OM09fY3LpMvGpzK3+FLXeXeNWxq7CuX3iaO7M7BFmDS+EVRv4Rk9GMMjF0iw2uvW+P5SSjt9Uiz3Ki0ON4f2lr5Yvv3+Tk1oQsKfFCaeGF/mbM6DClKks2zrfJkorZ6ZLNCx0CXxaWlSznGa12RLMbcOfFU9oDaW5waQ1i7t8Y2cNrctv5JCFPDXtXuowOVrSHIV7oMztZEndCqtzY9wZRI6Ldj2kPY9JZZp/P+WeHTE5WxM0AP3SpyoogrMeAyVtKqduVUko9ObRe/+Y8UQHsw0Fs2PBJl0s8P8LJVnZMwNHN1+jtnmN8KEFiE79KyCcTyuYQ1ytoDdu0ezGje8ekSWpDV+mAndw7wnccjo5Ldi4NufvSq+w+fbkODUuZtZqzeX6b47uHtAcdDl++zeDiZVaH+zQ3e6zmOVE3pMoS4pYEt+D6LlGzyeToiK1Le4zun+J4GfORFFvnbFHkuA4n9+7i+bLQq6TIEzYvXLQ/l5D7LydT3CAgbn7luNJiPKbV77MYjyjLks5wwxZ38ngPF1APgk8p9oosrX8eGaPguPhBfaTI/vWR20ugfXxEs9enzHP7vd9KniYE0de+Mi7P/
WvdVyml1JPjna4VlHqSPQr//mbZzIafMqJgGA+ZplPm+ZxxMqYdtFkUC1b5ikbQsLedZlM2402u9K7Yzz1/8Lw9un994zqrcmXHGozTMQ2vged6HM4Pub553TYNSD1bmpKtxharQpZhlRytjmgFLfIqtyMSJKA93znP0eSEyxsXYeqTRymBGzDL5mx3trg3v0+jalKsKlqbIb1G/dod7Y+pCkO73STPC/pbLebjxAacMjZtfprQ3fxKrb6cpDR7EWUuTREJrX6EJ+24tpH3jfW61OGOfX8jb+qihm9/H0Te67W9BKXSACL3WYxTGu2Qsqjs+6G3kmclQfjQQq2v8V5KKaXUk+1RqBfeTZ7IAPYB6bh8qyPtyWKOkbH/HvZI/IPFXjLHqEhXuL5vOz2X04LO8CthoryUcnVYCp9vRpHnZKslzW6v/l6TGc1e53/751NKKaXerR6VWkGpJ9Gj9O/vQRD7sFxOeJWFHVWw09qxn1vmSxuiSjgrQa2QmbISoPai3td9zK/nwePL/aRjtuE0Xw9FlVJKqSfRo1QvvBs80Zcuv9Y8UelCfSOZjyq8h+7TGX71yydXob/Z8FVIN6kvK0MffC8NX5VSSimllHrLoFRGFMhHI/xK16h0qf5L9/l6j/n1PPz47fDN7xWUUkoppf4letlWKaWUUkoppZRSSimlzogGsEoppZRSSimllFJKKXVGNIBVSimllFJKKaWUUkqpM6IBrFJKKaWUUkoppZRSSp0RDWCVUkoppZRSSimllFLqjGgAq5RSSimllFJKKaWUUmdEA1illFJKKaWUUkoppZQ6IxrAKqWUUkoppZRSSiml1BnRAFYppZRSSimllFJKKaXOiAawSimllFJKKaWUUkopdUY0gFVKKaWUUkoppZRSSqkzogGsUkoppZRSSimllFJKnRENYJVSSimllFJKKaWUUuqMaACrlFJKKaWUUkoppZRSZ8TnHWCMsb9Op9N34tsrpZRS6hH3oEZ4UDMopd4+WqsrpZRS6uvRev1dEMDOZjP768WLF9+Jb6+UUkqpdwmpGXq93jv9NJR6omitrpRSSqlvlNbr3xjHvANRdVVV3Lt3j06ng+M4b/e3V0oppdQjTsoTKebOnTuH6+rEJKXeTlqrK6WUUurr0Xr9XRDAKqWUUkoppZRSSiml1JNAI2qllFJKKaWUUkoppZQ6IxrAKqWUUkoppZRSSiml1BnRAFYppZRSSimllFJKKaXOiAawSimllFJKKaWUUkopdUY0gFVKfcM++tGP8vM///Nv+vzv//7v0+/37e9/5Vd+xW5M/tjHPvam2/36r/+6/Zo8zhvduXOHMAz5wAc+8JbfW+734KPX6/Hd3/3dfOITn3j963/3d3/HD/3QD9kNjHKbv/iLv/jf/GmVUkoppZR6d9F6XSmlHk0awCqlvuX29vb427/9W1ukPex3f/d3uXTp0lveR4rCH/7hH2Y6nfKP//iPb3mb3/u93+P+/fv8/d//PZubm/zgD/4gr7zyiv3aYrHg27/92/mt3/qtM/iJlFJKKaWUenxova6UUm8vDWCVUt9y29vbfN/3fR9/8Ad/8PrnPvWpT3F8fMwP/MAPvOn2xhhbrP3kT/4kP/ZjP8bv/M7vvOXjylX73d1de9X9t3/7t1mtVvzN3/yN/dr3f//382u/9mt8/OMfP8OfTCmllFJKqXc/rdeVUurtpQGsUupM/PRP/7S9Sv7w1fQf//Eft8eW3kiuvi+XS773e7+Xn/iJn+BP/uRP7BXyf0mj0bC/Zll2Bs9eKaWUUkqpx5vW60op9fbRAFYpdSbkuJEcT5JZT1Kc/emf/qkt8t6KXEH/kR/5ETzPs1fLr127xp/92Z99zceW4u+XfumX7O2/53u+5wx/CqWUUkoppR5PWq8rpdTbx38bv5dS6gkSBIG9Oi5HlWTu0zPPPMNzzz33ptuNx2P+/M//nE9+8pOvf07uJ0XeT/3UT33VbX/0R3/UFnFyl
Glra8ve5q0eUymllFJKKfUv03pdKaXePhrAKqW+Yd1ul8lk8pZFmWw6fSO5gv7hD3+YF1544WteTf+jP/ojkiSxt3t4xlRVVbz44ou2EHzgN37jN+yxJ/leUtAppZRSSimlvkLrdaWUejTpCAKl1Dfs2Wef5Z//+Z/f9Hn53MOF1wPvf//77YcUdDKs/63IVfFf/MVf5Pnnn3/947Of/Swf+chH7Byqh8lA/6efflqLOaWUUkoppd6C1utKKfVo0g5YpdQ37Gd/9mf5zd/8TX7u536On/mZnyGKIv7qr/6KP/7jP+Yv//Iv3/I+n/jEJ8jz3G5EfSMp3qQY/MM//EOuX7/+puNLv/qrv2o3pfr+1/+fqvl8zo0bN17/86uvvmoffzgccunSpf+ln1cppZRSSql3E63XlVLq0aQdsEqpb5gM25ch/V/60pfs0SI5hiTD+mUA/8c+9rG3vE+r1XrLYu7B1fT3ve99byrmxMc//nEODw/567/+62/ouX3605/mgx/8oP0Qv/ALv2B//8u//Mvf1M+olFJKKaXUu5XW60op9WhyjAxvUUoppZRSSimllFJKKfUtpx2wSimllFJKKaWUUkopdUY0gFVKKaWUUkoppZRSSqkzogGsUkoppZRSSimllFJKnRENYJVSSimllFJKKaWUUuqMaACrlFJKKaWUUkoppZRSZ0QDWKWUUkoppZRSSimllDojGsAqpZRSSimllFJKKaXUGdEAVimllFJKKaWUUkoppc6IBrBKKaWUUkoppZRSSil1RjSAVUoppZRSSimllFJKqTOiAaxSSimllFJKKaWUUkqdEQ1glVJKKaWUUkoppZRSirPxPwEzC3b9tjhwEAAAAABJRU5ErkJggg==", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Plot UMAP colored by Donor (to check integration) and Clusters\n", + "sc.pl.umap(adata, color=['cell_type', 'leiden_1.0'], wspace=0.3)" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "44a42466", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "cell_type natural killer cell memory B cell naive B cell \\\n", + "leiden_1.0 \n", + "0 16 0 0 \n", + "1 3307 0 0 \n", + "2 4 2 0 \n", + "3 0 0 0 \n", + "4 1 4 0 \n", + "5 115151 0 7 \n", + "6 4 13789 31184 \n", + "7 3 9320 19541 \n", + "\n", + "cell_type regulatory T cell \\\n", + "leiden_1.0 \n", + "0 562 \n", + "1 24 \n", + "2 6528 \n", + "3 5765 \n", + "4 8161 \n", + "5 1 \n", + "6 67 \n", + "7 29 \n", + "\n", + "cell_type naive thymus-derived CD4-positive, alpha-beta T cell \\\n", + "leiden_1.0 \n", + "0 2518 \n", + "1 73 \n", + "2 100187 \n", + "3 69940 \n", + "4 23760 \n", + "5 3 \n", + "6 79 \n", + "7 33 \n", + "\n", + "cell_type central memory CD4-positive, alpha-beta T cell \\\n", + "leiden_1.0 \n", + "0 51618 \n", + "1 427 \n", + "2 34061 \n", + "3 37446 \n", + "4 104032 \n", + "5 51 \n", + "6 145 \n", + "7 93 \n", + "\n", + "cell_type effector memory CD4-positive, alpha-beta T cell \\\n", + "leiden_1.0 \n", + "0 17521 \n", + "1 2570 \n", + "2 1341 \n", + "3 1125 \n", + "4 2417 \n", + "5 62 \n", + "6 23 \n", + "7 7 \n", + "\n", + "cell_type effector memory CD8-positive, alpha-beta T cell \n", + "leiden_1.0 \n", + "0 2877 \n", + "1 102572 \n", + "2 211 \n", + "3 150 \n", + "4 142 \n", + "5 8594 \n", + "6 35 \n", + "7 11 \n" + ] + } + ], + "source": [ + "# compute overlap between clusters and cell types\n", + "contingency_table = pd.crosstab(adata.obs['leiden_1.0'], adata.obs['cell_type'])\n", + "print(contingency_table)" + ] + }, + { + "cell_type": "markdown", + "id": "eb9e87c0", + "metadata": {}, + "source": [ + "### Pseudobulking" + ] + }, + { + 
"cell_type": "code", + "execution_count": 22, + "id": "28538794", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Aggregating counts...\n", + "Pseudobulk complete.\n", + "Original shape: (777594, 29331)\n", + "Pseudobulk shape: (5584, 29331) (Samples x Genes)\n", + " cell_type \\\n", + "central memory CD4-positive, alpha-beta T cell:... central memory CD4-positive, alpha-beta T cell \n", + "central memory CD4-positive, alpha-beta T cell:... central memory CD4-positive, alpha-beta T cell \n", + "central memory CD4-positive, alpha-beta T cell:... central memory CD4-positive, alpha-beta T cell \n", + "central memory CD4-positive, alpha-beta T cell:... central memory CD4-positive, alpha-beta T cell \n", + "central memory CD4-positive, alpha-beta T cell:... central memory CD4-positive, alpha-beta T cell \n", + "\n", + " donor_id n_cells \\\n", + "central memory CD4-positive, alpha-beta T cell:... 1000_1001 562 \n", + "central memory CD4-positive, alpha-beta T cell:... 1001_1002 392 \n", + "central memory CD4-positive, alpha-beta T cell:... 1003_1004 414 \n", + "central memory CD4-positive, alpha-beta T cell:... 1004_1005 248 \n", + "central memory CD4-positive, alpha-beta T cell:... 1008_1009 656 \n", + "\n", + " development_stage sex \n", + "central memory CD4-positive, alpha-beta T cell:... 73-year-old stage female \n", + "central memory CD4-positive, alpha-beta T cell:... 57-year-old stage female \n", + "central memory CD4-positive, alpha-beta T cell:... 58-year-old stage female \n", + "central memory CD4-positive, alpha-beta T cell:... 74-year-old stage female \n", + "central memory CD4-positive, alpha-beta T cell:... 
71-year-old stage male \n" + ] + } + ], + "source": [ + "import pandas as pd\n", + "import numpy as np\n", + "import anndata as ad\n", + "from scipy import sparse\n", + "from sklearn.preprocessing import OneHotEncoder\n", + "\n", + "def create_pseudobulk(adata, group_col, donor_col, layer='counts', metadata_cols=None):\n", + " \"\"\"\n", + " Sum raw counts for each (Donor, CellType) pair.\n", + " \n", + " Parameters:\n", + " -----------\n", + " adata : AnnData\n", + " Input single-cell data\n", + " group_col : str\n", + " Column name for grouping (e.g., 'cell_type')\n", + " donor_col : str\n", + " Column name for donor ID\n", + " layer : str\n", + " Layer to use for aggregation (default: 'counts')\n", + " metadata_cols : list of str, optional\n", + " Additional metadata columns to preserve from obs (e.g., ['development_stage', 'sex'])\n", + " These should have consistent values within each donor\n", + " \"\"\"\n", + " # 1. Create a combined key (e.g., \"Bcell::Donor1\")\n", + " groups = adata.obs[group_col].astype(str)\n", + " donors = adata.obs[donor_col].astype(str)\n", + " \n", + " # Create a DataFrame to manage the unique combinations\n", + " group_df = pd.DataFrame({'group': groups, 'donor': donors})\n", + " group_df['combined'] = group_df['group'] + \"::\" + group_df['donor']\n", + " \n", + " # 2. Build the Aggregation Matrix (One-Hot Encoding)\n", + " enc = OneHotEncoder(sparse_output=True, dtype=np.float32)\n", + " membership_matrix = enc.fit_transform(group_df[['combined']])\n", + " \n", + " # 3. Aggregation (Summing)\n", + " if layer is not None and layer in adata.layers:\n", + " X_source = adata.layers[layer]\n", + " else:\n", + " X_source = adata.X\n", + " \n", + " pseudobulk_X = membership_matrix.T @ X_source\n", + " \n", + " # 4. 
Create the Obs Metadata for the new object\n", + " unique_ids = enc.categories_[0]\n", + " \n", + " # Split back into Donor and Cell Type\n", + " obs_data = []\n", + " for uid in unique_ids:\n", + " ctype, donor = uid.split(\"::\")\n", + " obs_data.append({'cell_type': ctype, 'donor_id': donor})\n", + " \n", + " pb_obs = pd.DataFrame(obs_data, index=unique_ids)\n", + " \n", + " # 5. Count how many cells went into each sum\n", + " cell_counts = np.array(membership_matrix.sum(axis=0)).flatten()\n", + " pb_obs['n_cells'] = cell_counts.astype(int)\n", + " \n", + " # 6. Add additional metadata columns\n", + " if metadata_cols is not None:\n", + " for col in metadata_cols:\n", + " if col in adata.obs.columns:\n", + " # For each pseudobulk sample, get the first (should be consistent) value\n", + " # from the original data for that donor\n", + " col_values = []\n", + " for uid in unique_ids:\n", + " ctype, donor = uid.split(\"::\")\n", + " # Get value from any cell with this donor (should all be the same)\n", + " donor_mask = adata.obs[donor_col] == donor\n", + " if donor_mask.any():\n", + " col_values.append(adata.obs.loc[donor_mask, col].iloc[0])\n", + " else:\n", + " col_values.append(None)\n", + " pb_obs[col] = col_values\n", + " \n", + " # 7. 
Assemble the AnnData\n", + " pb_adata = ad.AnnData(X=pseudobulk_X, obs=pb_obs, var=adata.var.copy())\n", + " \n", + " return pb_adata\n", + "\n", + "# --- Execute ---\n", + "\n", + "target_cluster_col = 'cell_type'\n", + "\n", + "print(\"Aggregating counts...\")\n", + "pb_adata = create_pseudobulk(\n", + " adata, \n", + " group_col=target_cluster_col, \n", + " donor_col='donor_id', \n", + " layer='counts',\n", + " metadata_cols=['development_stage', 'sex'] # Add any other donor-level metadata here\n", + ")\n", + "\n", + "print(f\"Pseudobulk complete.\")\n", + "print(f\"Original shape: {adata.shape}\")\n", + "print(f\"Pseudobulk shape: {pb_adata.shape} (Samples x Genes)\")\n", + "print(pb_adata.obs.head())" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "8fb76888", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Dropping samples with < 10 cells...\n", + "Remaining samples: 5561\n" + ] + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "min_cells = 10\n", + "print(f\"Dropping samples with < {min_cells} cells...\")\n", + "\n", + "pb_adata = pb_adata[pb_adata.obs['n_cells'] >= min_cells].copy()\n", + "\n", + "print(f\"Remaining samples: {pb_adata.n_obs}\")\n", + "\n", + "# Optional: Visualize the 'depth' of your new pseudobulk samples\n", + "import scanpy as sc\n", + "pb_adata.obs['total_counts'] = np.array(pb_adata.X.sum(axis=1)).flatten()\n", + "sc.pl.violin(pb_adata, ['n_cells', 'total_counts'], multi_panel=True)" + ] + }, + { + "cell_type": "markdown", + "id": "2907db1a", + "metadata": {}, + "source": [ + "### Differential expression with age\n", + "\n", + "We first z-score the age variable so that it is on a standardized scale, which helps the model fit converge." + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "id": "b6298767", + "metadata": {}, + "outputs": [], + "source": [ + "# first create a numeric 'age' variable from 'development_stage'\n", + "# e.g. from '19-year-old stage' to 19\n", + "ages = pb_adata.obs['development_stage'].str.extract(r'(\\d+)-year-old').astype(float)\n", + "pb_adata.obs['age'] = ages\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "id": "f5fade97", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " age age_scaled\n", + "central memory CD4-positive, alpha-beta T cell:... 73.0 0.632347\n", + "central memory CD4-positive, alpha-beta T cell:... 57.0 -0.313739\n", + "central memory CD4-positive, alpha-beta T cell:... 58.0 -0.254608\n", + "central memory CD4-positive, alpha-beta T cell:... 74.0 0.691478\n", + "central memory CD4-positive, alpha-beta T cell:... 
71.0 0.514087\n" + ] + } + ], + "source": [ + "import pandas as pd\n", + "from pydeseq2.dds import DeseqDataSet\n", + "from pydeseq2.ds import DeseqStats\n", + "from sklearn.preprocessing import StandardScaler\n", + "\n", + "# Assume pb_adata is your pseudobulk object from the previous step\n", + "# 1. Extract counts and metadata\n", + "counts_df = pd.DataFrame(\n", + " pb_adata.X.toarray(), \n", + " index=pb_adata.obs_names, \n", + " columns=[var_to_feature.get(var, var) for var in pb_adata.var_names]\n", + ")\n", + "# remove duplicate columns if any\n", + "counts_df = counts_df.loc[:,~counts_df.columns.duplicated()]\n", + "\n", + "metadata = pb_adata.obs.copy()\n", + "\n", + "# 2. IMPORTANT: Scale the continuous variable\n", + "# This prevents convergence errors.\n", + "scaler = StandardScaler()\n", + "metadata['age_scaled'] = scaler.fit_transform(metadata[['age']]).flatten()\n", + "metadata['age_scaled'] = metadata['age_scaled'].astype(float)\n", + "\n", + "\n", + "# Check the scaling (Mean should be ~0, Std ~1)\n", + "print(metadata[['age', 'age_scaled']].head())" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "id": "0e30e949", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/var/folders/r2/f85nyfr1785fj4257wkdj7480000gn/T/ipykernel_48619/2983949029.py:14: DeprecationWarning: design_factors is deprecated and will soon be removed.Please consider providing a formulaic formula using the design argumentinstead.\n", + " dds = DeseqDataSet(\n", + "Fitting size factors...\n", + "... done in 0.18 seconds.\n", + "\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Using None as control genes, passed at DeseqDataSet initialization\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Fitting dispersions...\n", + "... done in 1.58 seconds.\n", + "\n", + "Fitting dispersion trend curve...\n", + "... 
done in 0.19 seconds.\n", + "\n", + "Fitting MAP dispersions...\n", + "... done in 2.87 seconds.\n", + "\n", + "Fitting LFCs...\n", + "... done in 1.74 seconds.\n", + "\n", + "Calculating cook's distance...\n", + "... done in 0.29 seconds.\n", + "\n", + "Replacing 1 outlier genes.\n", + "\n", + "Fitting dispersions...\n", + "... done in 0.01 seconds.\n", + "\n", + "Fitting MAP dispersions...\n", + "... done in 0.01 seconds.\n", + "\n", + "Fitting LFCs...\n", + "... done in 0.01 seconds.\n", + "\n" + ] + } + ], + "source": [ + "# Perform DE analysis separately for each cell type\n", + "# For this example we just choose one of them\n", + "\n", + "cell_type = 'central memory CD4-positive, alpha-beta T cell'\n", + "pb_adata_ct = pb_adata[pb_adata.obs['cell_type'] == cell_type].copy()\n", + "counts_df_ct = counts_df.loc[pb_adata_ct.obs_names].copy()\n", + "\n", + "metadata_ct = metadata.loc[pb_adata_ct.obs_names].copy()\n", + "\n", + "assert 'age_scaled' in metadata_ct.columns, \"age_scaled column missing in metadata\"\n", + "assert 'sex' in metadata_ct.columns, \"sex column missing in metadata\"\n", + "\n", + "# 3. Initialize DeseqDataSet\n", + "dds = DeseqDataSet(\n", + " counts=counts_df_ct,\n", + " metadata=metadata_ct,\n", + " design_factors=[\"age_scaled\", \"sex\"], # Use the scaled column\n", + " refit_cooks=True,\n", + " n_cpus=8\n", + ")\n", + "\n", + "# 4. 
Run the fitting (Dispersions & LFCs)\n", + "dds.deseq2()\n" + ] + }, + { + "cell_type": "markdown", + "id": "6c867e35", + "metadata": {}, + "source": [ + "#### Compute statistics" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "id": "0ab33f40", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "contrast: [0 1 0], model_vars: Index(['Intercept', 'age_scaled', 'sex[T.male]'], dtype='object')\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Running Wald tests...\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Log2 fold change & Wald test p-value, contrast vector: [0 1 0]\n", + " baseMean log2FoldChange lfcSE stat \\\n", + "OR4F5 0.000000 NaN NaN NaN \n", + "ENSG00000239945 0.005952 -0.100132 0.949557 -0.105451 \n", + "ENSG00000241860 0.365122 0.025313 0.091008 0.278135 \n", + "ENSG00000241599 0.001283 -0.103422 1.448770 -0.071386 \n", + "ENSG00000229905 0.000000 NaN NaN NaN \n", + "... ... ... ... ... \n", + "MT-ND4L 102.560848 0.079307 0.014440 5.492099 \n", + "MT-ND4 2574.820764 0.015613 0.009306 1.677810 \n", + "MT-ND5 759.807343 0.007827 0.010848 0.721464 \n", + "MT-ND6 37.003224 0.029722 0.013672 2.173891 \n", + "MT-CYB 2443.119315 0.030728 0.011338 2.710161 \n", + "\n", + " pvalue padj \n", + "OR4F5 NaN NaN \n", + "ENSG00000239945 9.160176e-01 NaN \n", + "ENSG00000241860 7.809089e-01 NaN \n", + "ENSG00000241599 9.430907e-01 NaN \n", + "ENSG00000229905 NaN NaN \n", + "... ... ... \n", + "MT-ND4L 3.971854e-08 0.000002 \n", + "MT-ND4 9.338413e-02 0.226846 \n", + "MT-ND5 4.706241e-01 0.658790 \n", + "MT-ND6 2.971332e-02 0.099708 \n", + "MT-CYB 6.725063e-03 0.032845 \n", + "\n", + "[29324 rows x 6 columns]\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "... 
done in 2.22 seconds.\n", + "\n" + ] + } + ], + "source": [ + "model_vars = dds.varm[\"LFC\"].columns\n", + "contrast = np.array([0, 1, 0])\n", + "print(f\"contrast: {contrast}, model_vars: {model_vars}\")\n", + "\n", + "# 5. Statistical Test (Wald Test)\n", + "# A numeric contrast vector selects one coefficient; here [0, 1, 0] picks 'age_scaled' (see model_vars)\n", + "stat_res = DeseqStats(\n", + " dds, \n", + " contrast=contrast\n", + ")\n", + "\n", + "stat_res.summary()\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "id": "6598e6ce", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Running Wald tests...\n", + "... done in 0.62 seconds.\n", + "\n" + ] + } + ], + "source": [ + "stat_res.run_wald_test()" + ] + }, + { + "cell_type": "markdown", + "id": "01e512d6", + "metadata": {}, + "source": [ + "#### Find significant genes" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "id": "cd4ccda1", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Found 3124 significant genes.\n", + " log2FoldChange padj\n", + "ENSG00000260613 0.662743 6.155660e-05\n", + "GTSCR1 0.529188 2.553945e-35\n", + "PHLDA3 0.518890 7.488643e-12\n", + "GABRE 0.494251 4.340036e-04\n", + "DONSON 0.488810 5.428557e-09\n" + ] + } + ], + "source": [ + "# 1. Get the results dataframe\n", + "res = stat_res.results_df\n", + "\n", + "# 2. Filter for significant genes (e.g., FDR < 0.05)\n", + "# This automatically excludes NaNs (since NaN < 0.05 is False)\n", + "sigs = res[res['padj'] < 0.05]\n", + "\n", + "# 3. 
Sort by effect size (Log2 Fold Change) to see top hits\n", + "sigs = sigs.sort_values('log2FoldChange', ascending=False)\n", + "\n", + "print(f\"Found {len(sigs)} significant genes.\")\n", + "print(sigs[['log2FoldChange', 'padj']].head())" + ] + }, + { + "cell_type": "markdown", + "id": "c68688e2", + "metadata": {}, + "source": [ + "### Pathway enrichment: GSEA\n", + "\n", + "- what pathways are enriched in the differentially expressed genes?" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "id": "7f64589e", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "2025-12-20 11:59:18,597 [WARNING] Duplicated values found in preranked stats: 5.93% of genes\n", + "The order of those genes will be arbitrary, which may produce unexpected results.\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Top Upregulated Pathways:\n", + " Term NES FDR q-val\n", + "0 MSigDB_Hallmark_2020__TNF-alpha Signaling via ... 2.097542 0.0\n", + "1 MSigDB_Hallmark_2020__Hypoxia 1.782058 0.003216\n", + "2 MSigDB_Hallmark_2020__Interferon Gamma Response 1.750043 0.003216\n", + "3 MSigDB_Hallmark_2020__Apoptosis 1.660934 0.004824\n", + "4 MSigDB_Hallmark_2020__p53 Pathway 1.652138 0.005788\n", + "5 MSigDB_Hallmark_2020__Reactive Oxygen Species ... 
1.601382 0.008039\n", + "6 MSigDB_Hallmark_2020__Interferon Alpha Response 1.550408 0.014701\n", + "7 MSigDB_Hallmark_2020__IL-2/STAT5 Signaling 1.4815 0.023314\n", + "8 MSigDB_Hallmark_2020__Oxidative Phosphorylation 1.473942 0.023582\n", + "10 MSigDB_Hallmark_2020__Cholesterol Homeostasis 1.429341 0.031836\n", + "\n", + "Top Downregulated Pathways:\n", + " Term NES FDR q-val\n", + "40 MSigDB_Hallmark_2020__Spermatogenesis -1.003029 0.778001\n", + "39 MSigDB_Hallmark_2020__Androgen Response -1.023541 0.776461\n", + "36 MSigDB_Hallmark_2020__Xenobiotic Metabolism -1.042344 0.787156\n", + "31 MSigDB_Hallmark_2020__Myc Targets V2 -1.143465 0.462271\n", + "25 MSigDB_Hallmark_2020__Apical Junction -1.2273 0.266236\n", + "23 MSigDB_Hallmark_2020__PI3K/AKT/mTOR Signaling -1.24776 0.26635\n", + "21 MSigDB_Hallmark_2020__Protein Secretion -1.251274 0.320531\n", + "14 MSigDB_Hallmark_2020__TGF-beta Signaling -1.369893 0.120925\n", + "12 MSigDB_Hallmark_2020__Notch Signaling -1.394944 0.128555\n", + "9 MSigDB_Hallmark_2020__Wnt-beta Catenin Signaling -1.468539 0.102245\n" + ] + } + ], + "source": [ + "import gseapy as gp\n", + "import matplotlib.pyplot as plt\n", + "import pandas as pd\n", + "\n", + "# 1. Prepare the Ranked List\n", + "# We use the 'stat' column if available (best metric). \n", + "# If 'stat' isn't there, approximate it with -log10(pvalue) * sign(log2FoldChange)\n", + "rank_df = res[['stat']].dropna().sort_values('stat', ascending=False)\n", + "\n", + "# 2. Run GSEA Preranked\n", + "# We look at GO Biological Process and the \"Hallmark\" set (good for general states)\n", + "# For immune specific, you can also add 'Reactome_2022' or 'KEGG_2021_Human'\n", + "prerank_res = gp.prerank(\n", + " rnk=rank_df, \n", + " gene_sets=['MSigDB_Hallmark_2020'],\n", + " threads=4,\n", + " min_size=10, # Min genes in pathway\n", + " max_size=1000, \n", + " permutation_num=1000, # Reduce to 100 for speed if testing\n", + " seed=42\n", + ")\n", + "\n", + "# 3. 
View Results\n", + "# 'NES' = Normalized Enrichment Score (Positive = Upregulated in Age, Negative = Downregulated)\n", + "# 'FDR q-val' = Significance\n", + "terms = prerank_res.res2d.sort_values('NES', ascending=False)\n", + "\n", + "print(\"Top Upregulated Pathways:\")\n", + "print(terms[['Term', 'NES', 'FDR q-val']].head(10))\n", + "\n", + "print(\"\\nTop Downregulated Pathways:\")\n", + "print(terms[['Term', 'NES', 'FDR q-val']].tail(10))\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "573c2d0a", + "metadata": {}, + "source": [ + "#### Create a plot for the results" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "id": "217bc4f9", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Plotting 20 pathways.\n", + " Term NES FDR q-val Count\n", + "0 TNF-alpha Signaling via NF-kB 2.097542 0.000000 86\n", + "1 Hypoxia 1.782058 0.003216 51\n", + "2 Interferon Gamma Response 1.750043 0.003216 80\n", + "3 Apoptosis 1.660934 0.004824 51\n", + "4 p53 Pathway 1.652138 0.005788 59\n" + ] + } + ], + "source": [ + "\n", + "# 1. Get the results table\n", + "# (Assumes 'prerank_res' is your output from gp.prerank)\n", + "gsea_df = prerank_res.res2d.copy()\n", + "\n", + "# 2. Sort by NES to separate Up vs Down\n", + "gsea_df = gsea_df.sort_values('NES', ascending=False)\n", + "\n", + "# 3. Select Top 10 Up and Top 10 Down\n", + "top_up = gsea_df.head(10).copy()\n", + "top_down = gsea_df.tail(10).copy()\n", + "\n", + "# 4. Combine them\n", + "combined_gsea = pd.concat([top_up, top_down])\n", + "\n", + "# 5. 
Create metrics for plotting\n", + "# Direction based on NES sign\n", + "combined_gsea['Direction'] = combined_gsea['NES'].apply(lambda x: 'Upregulated' if x > 0 else 'Downregulated')\n", + "\n", + "# Significance for X-axis (-log10 FDR)\n", + "# We add a tiny epsilon (1e-10) to avoid log(0) errors if FDR is exactly 0\n", + "combined_gsea['FDR q-val'] = pd.to_numeric(combined_gsea['FDR q-val'], errors='coerce')\n", + "combined_gsea['log_FDR'] = -np.log10(combined_gsea['FDR q-val'] + 1e-10)\n", + "\n", + "# Gene Count for Dot Size\n", + "# GSEApy stores the leading edge genes as a semi-colon separated string in 'Lead_genes'\n", + "combined_gsea['Count'] = combined_gsea['Lead_genes'].apply(lambda x: len(str(x).split(';')))\n", + "\n", + "## remove MSigDB label from Term\n", + "combined_gsea['Term'] = combined_gsea['Term'].str.replace('MSigDB_Hallmark_2020__', '', regex=False)\n", + "\n", + "print(f\"Plotting {len(combined_gsea)} pathways.\")\n", + "print(combined_gsea[['Term', 'NES', 'FDR q-val', 'Count']].head())" + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "id": "3da89ee8", + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "plt.figure(figsize=(10, 8))\n", + "\n", + "# Create the scatter plot\n", + "sns.scatterplot(\n", + " data=combined_gsea,\n", + " x='log_FDR',\n", + " y='Term',\n", + " hue='Direction', # Color by NES Direction\n", + " size='Count', # Size by number of Leading Edge genes\n", + " palette={'Upregulated': '#E41A1C', 'Downregulated': '#377EB8'}, # Red/Blue\n", + " sizes=(50, 400), # Range of dot sizes\n", + " alpha=0.8\n", + ")\n", + "\n", + "# Customization\n", + "plt.title('Top GSEA Pathways (Up vs Down)', fontsize=14)\n", + "plt.xlabel('-log10(FDR q-value)', fontsize=12)\n", + "plt.ylabel('')\n", + "\n", + "# Add a vertical line for significance (FDR < 0.05 => -log10(0.05) ~= 1.3)\n", + "plt.axvline(-np.log10(0.25), color='gray', linestyle=':', label='FDR=0.25 (GSEA standard)')\n", + "plt.axvline(-np.log10(0.05), color='gray', linestyle='--', label='FDR=0.05')\n", + "\n", + "plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.)\n", + "plt.grid(axis='x', alpha=0.3)\n", + "plt.tight_layout()\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "ceaad09c", + "metadata": {}, + "source": [ + "### Enrichr analysis for overrepresentation" + ] + }, + { + "cell_type": "code", + "execution_count": 42, + "id": "7f5e6224", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Analyzing 1205 upregulated and 1919 downregulated genes.\n", + "Upregulated Pathways:\n", + " Term Adjusted P-value Overlap\n", + "0 TNF-alpha Signaling via NF-kB 3.695219e-15 48/200\n", + "1 Myc Targets V1 2.231576e-13 45/200\n", + "2 p53 Pathway 5.909649e-11 41/200\n", + "3 Apoptosis 6.464364e-11 36/161\n", + "4 Interferon Gamma Response 2.287547e-09 38/200\n", + "5 Hypoxia 2.679064e-07 34/200\n", + "6 Oxidative Phosphorylation 2.679064e-07 34/200\n", + "7 Reactive Oxygen Species Pathway 4.441324e-06 14/49\n", + "8 Unfolded Protein 
Response 1.912995e-05 21/113\n", + "9 mTORC1 Signaling 4.982898e-05 29/200\n", + "Downregulated Pathways:\n", + " Term Adjusted P-value Overlap\n", + "0 Myc Targets V1 0.001618 38/200\n", + "1 Protein Secretion 0.005788 21/96\n", + "2 PI3K/AKT/mTOR Signaling 0.005788 22/105\n", + "3 Androgen Response 0.014601 20/100\n", + "4 Wnt-beta Catenin Signaling 0.054265 10/42\n", + "5 TGF-beta Signaling 0.102715 11/54\n", + "6 Oxidative Phosphorylation 0.115051 29/200\n", + "7 Fatty Acid Metabolism 0.301496 22/158\n", + "8 G2-M Checkpoint 0.342076 26/200\n", + "9 mTORC1 Signaling 0.342076 26/200\n" + ] + } + ], + "source": [ + "# 1. Define your significant gene lists\n", + "# Up in Age\n", + "up_genes = res[\n", + " (res['padj'] < 0.05) & (res['log2FoldChange'] > 0)\n", + "].index.tolist()\n", + "\n", + "# Down in Age\n", + "down_genes = res[\n", + " (res['padj'] < 0.05) & (res['log2FoldChange'] < 0)\n", + "].index.tolist()\n", + "\n", + "print(f\"Analyzing {len(up_genes)} upregulated and {len(down_genes)} downregulated genes.\")\n", + "\n", + "# 2. 
Run Enrichr (Over-Representation Analysis)\n", + "if len(up_genes) > 0:\n", + " enr_up = gp.enrichr(\n", + " gene_list=up_genes,\n", + " gene_sets=['MSigDB_Hallmark_2020'],\n", + " organism='human', \n", + " outdir=None\n", + " )\n", + " print(\"Upregulated Pathways:\")\n", + " print(enr_up.results[['Term', 'Adjusted P-value', 'Overlap']].head(10))\n", + " \n", + "\n", + "if len(down_genes) > 0:\n", + " enr_down = gp.enrichr(\n", + " gene_list=down_genes,\n", + " gene_sets=['MSigDB_Hallmark_2020'],\n", + " organism='human', \n", + " outdir=None\n", + " )\n", + " print(\"Downregulated Pathways:\")\n", + " print(enr_down.results[['Term', 'Adjusted P-value', 'Overlap']].head(10))\n", + " \n" + ] + }, + { + "cell_type": "code", + "execution_count": 43, + "id": "390796aa", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Plotting 20 pathways.\n" + ] + } + ], + "source": [ + "\n", + "# 1. Add a \"Direction\" column to distinguish them\n", + "up_res = enr_up.results.copy()\n", + "up_res['Direction'] = 'Upregulated'\n", + "up_res['Color'] = 'Red' # For custom palette\n", + "\n", + "down_res = enr_down.results.copy()\n", + "down_res['Direction'] = 'Downregulated'\n", + "down_res['Color'] = 'Blue'\n", + "\n", + "# 2. Filter for top 10 pathways by Adjusted P-value\n", + "# (You can also filter by 'Combined Score' if you prefer)\n", + "top_up = up_res.sort_values('Adjusted P-value').head(10)\n", + "top_down = down_res.sort_values('Adjusted P-value').head(10)\n", + "\n", + "# 3. Concatenate\n", + "combined = pd.concat([top_up, top_down])\n", + "\n", + "# 4. Create a \"-log10(P-value)\" column for plotting\n", + "combined['log_p'] = -np.log10(combined['Adjusted P-value'])\n", + "\n", + "# 5. 
Extract \"Count\" from the \"Overlap\" column (e.g., \"5/200\" -> 5)\n", + "# This is used to size the dots\n", + "combined['Gene_Count'] = combined['Overlap'].apply(lambda x: int(x.split('/')[0]))\n", + "\n", + "print(f\"Plotting {len(combined)} pathways.\")\n" + ] + }, + { + "cell_type": "code", + "execution_count": 44, + "id": "290615f7", + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "import seaborn as sns\n", + "import matplotlib.pyplot as plt\n", + "\n", + "plt.figure(figsize=(10, 8))\n", + "\n", + "# Create the scatter plot\n", + "sns.scatterplot(\n", + " data=combined,\n", + " x='log_p',\n", + " y='Term',\n", + " hue='Direction', # Color by Up/Down\n", + " size='Gene_Count', # Size by number of genes in pathway\n", + " palette={'Upregulated': '#E41A1C', 'Downregulated': '#377EB8'}, # Red/Blue\n", + " sizes=(50, 400), # Range of dot sizes\n", + " alpha=0.8\n", + ")\n", + "\n", + "# Customization\n", + "plt.title('Top Enriched Pathways (Up vs Down)', fontsize=14)\n", + "plt.xlabel('-log10(Adjusted P-value)', fontsize=12)\n", + "plt.ylabel('')\n", + "plt.axvline(-np.log10(0.05), color='gray', linestyle='--', alpha=0.5, label='p=0.05') # Significance threshold line\n", + "plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.)\n", + "plt.grid(axis='x', alpha=0.3)\n", + "plt.tight_layout()\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "bc7868dc", + "metadata": {}, + "source": [ + "### Age prediction from gene expression\n", + "\n", + "Here we will build a predictive model and assess our ability to predict age from held-out individuals. We also test against a baseline model with only sex as a covariate." 
+ ] + }, + { + "cell_type": "markdown", + "id": "b826767b", + "metadata": {}, + "source": [] + }, + { + "cell_type": "code", + "execution_count": 45, + "id": "086597f5", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Feature matrix shape: (698, 29325)\n", + "Number of samples: 698\n", + "Age range: 19.0 - 97.0 years\n", + "\n", + "Fold 1/5\n", + " R² Score: 0.310\n", + " MAE: 11.38 years\n", + "\n", + "Fold 2/5\n", + " R² Score: 0.305\n", + " MAE: 11.63 years\n", + "\n", + "Fold 3/5\n", + " R² Score: 0.247\n", + " MAE: 11.94 years\n", + "\n", + "Fold 4/5\n", + " R² Score: 0.314\n", + " MAE: 10.67 years\n", + "\n", + "Fold 5/5\n", + " R² Score: 0.240\n", + " MAE: 10.87 years\n", + "\n", + "==================================================\n", + "CROSS-VALIDATION RESULTS\n", + "==================================================\n", + "R² Score: 0.283 ± 0.033\n", + "MAE: 11.30 ± 0.47 years\n", + "==================================================\n" + ] + } + ], + "source": [ + "from sklearn.svm import LinearSVR\n", + "from sklearn.model_selection import ShuffleSplit\n", + "from sklearn.preprocessing import StandardScaler\n", + "from sklearn.metrics import r2_score, mean_absolute_error\n", + "import pandas as pd\n", + "import numpy as np\n", + "\n", + "# 1. Prepare features and target\n", + "# Features: all genes from counts_df_ct + sex variable\n", + "X_genes = counts_df_ct.copy()\n", + "\n", + "# Add sex as a binary feature (encode as 0/1)\n", + "sex_encoded = pd.get_dummies(metadata_ct['sex'], drop_first=True)\n", + "X = pd.concat([X_genes, sex_encoded], axis=1)\n", + "\n", + "# Target: age\n", + "y = metadata_ct['age'].values\n", + "\n", + "print(f\"Feature matrix shape: {X.shape}\")\n", + "print(f\"Number of samples: {len(y)}\")\n", + "print(f\"Age range: {y.min():.1f} - {y.max():.1f} years\")\n", + "\n", + "# 2. 
Set up ShuffleSplit cross-validation\n", + "# Using 5 splits with 20% test size\n", + "cv = ShuffleSplit(n_splits=5, test_size=0.2, random_state=42)\n", + "\n", + "# 3. Store results\n", + "r2_scores = []\n", + "mae_scores = []\n", + "predictions_list = []\n", + "actual_list = []\n", + "\n", + "# 4. Train and evaluate model for each split\n", + "for fold, (train_idx, test_idx) in enumerate(cv.split(X)):\n", + " print(f\"\\nFold {fold + 1}/5\")\n", + " \n", + " # Split data\n", + " X_train, X_test = X.iloc[train_idx], X.iloc[test_idx]\n", + " y_train, y_test = y[train_idx], y[test_idx]\n", + " \n", + " # Scale features (important for SVR)\n", + " scaler = StandardScaler()\n", + " X_train_scaled = scaler.fit_transform(X_train)\n", + " X_test_scaled = scaler.transform(X_test)\n", + " \n", + " # Train Linear SVR\n", + " # C parameter controls regularization (smaller = more regularization)\n", + " model = LinearSVR(C=1.0, max_iter=10000, random_state=42, dual='auto')\n", + " model.fit(X_train_scaled, y_train)\n", + " \n", + " # Predict on test set\n", + " y_pred = model.predict(X_test_scaled)\n", + " \n", + " # Calculate metrics\n", + " r2 = r2_score(y_test, y_pred)\n", + " mae = mean_absolute_error(y_test, y_pred)\n", + " \n", + " r2_scores.append(r2)\n", + " mae_scores.append(mae)\n", + " predictions_list.extend(y_pred)\n", + " actual_list.extend(y_test)\n", + " \n", + " print(f\" R² Score: {r2:.3f}\")\n", + " print(f\" MAE: {mae:.2f} years\")\n", + "\n", + "# 5. Summary statistics\n", + "print(\"\\n\" + \"=\"*50)\n", + "print(\"CROSS-VALIDATION RESULTS\")\n", + "print(\"=\"*50)\n", + "print(f\"R² Score: {np.mean(r2_scores):.3f} ± {np.std(r2_scores):.3f}\")\n", + "print(f\"MAE: {np.mean(mae_scores):.2f} ± {np.std(mae_scores):.2f} years\")\n", + "print(\"=\"*50)" + ] + }, + { + "cell_type": "code", + "execution_count": 46, + "id": "9a1f7eda", + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Visualize predictions vs actual ages\n", + "import matplotlib.pyplot as plt\n", + "\n", + "plt.figure(figsize=(8, 6))\n", + "\n", + "# Scatter plot of predictions vs actual\n", + "plt.scatter(actual_list, predictions_list, alpha=0.6, s=80)\n", + "\n", + "# Add diagonal line (perfect predictions)\n", + "min_age = min(min(actual_list), min(predictions_list))\n", + "max_age = max(max(actual_list), max(predictions_list))\n", + "plt.plot([min_age, max_age], [min_age, max_age], 'r--', linewidth=2, label='Perfect Prediction')\n", + "\n", + "plt.xlabel('Actual Age (years)', fontsize=12)\n", + "plt.ylabel('Predicted Age (years)', fontsize=12)\n", + "plt.title(f'Age Prediction Performance\\nR² = {np.mean(r2_scores):.3f}, MAE = {np.mean(mae_scores):.2f} years', \n", + " fontsize=14)\n", + "plt.legend()\n", + "plt.grid(alpha=0.3)\n", + "plt.tight_layout()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "902799f5", + "metadata": {}, + "source": [ + "#### Baseline model: Sex only\n", + "\n", + "Compare against a baseline model that only uses sex as a predictor to assess the contribution of gene expression." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 47, + "id": "781a934c", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "============================================================\n", + "MODEL COMPARISON\n", + "============================================================\n", + "Full Model (Genes + Sex):\n", + " R² Score: 0.283 ± 0.033\n", + " MAE: 11.30 ± 0.47 years\n", + "\n", + "Baseline Model (Sex Only):\n", + " R² Score: -0.027 ± 0.018\n", + " MAE: 13.55 ± 0.54 years\n", + "\n", + "Improvement:\n", + " ΔR²: 0.310\n", + " ΔMAE: 2.25 years\n", + "============================================================\n" + ] + } + ], + "source": [ + "# Baseline model: Sex only\n", + "X_baseline = sex_encoded.copy()\n", + "\n", + "# Store baseline results\n", + "baseline_r2_scores = []\n", + "baseline_mae_scores = []\n", + "\n", + "# Train and evaluate baseline model for each split\n", + "for fold, (train_idx, test_idx) in enumerate(cv.split(X_baseline)):\n", + " # Split data\n", + " X_train, X_test = X_baseline.iloc[train_idx], X_baseline.iloc[test_idx]\n", + " y_train, y_test = y[train_idx], y[test_idx]\n", + " \n", + " # Scale features\n", + " scaler = StandardScaler()\n", + " X_train_scaled = scaler.fit_transform(X_train)\n", + " X_test_scaled = scaler.transform(X_test)\n", + " \n", + " # Train Linear SVR\n", + " model = LinearSVR(C=1.0, max_iter=10000, random_state=42, dual='auto')\n", + " model.fit(X_train_scaled, y_train)\n", + " \n", + " # Predict on test set\n", + " y_pred = model.predict(X_test_scaled)\n", + " \n", + " # Calculate metrics\n", + " r2 = r2_score(y_test, y_pred)\n", + " mae = mean_absolute_error(y_test, y_pred)\n", + " \n", + " baseline_r2_scores.append(r2)\n", + " baseline_mae_scores.append(mae)\n", + "\n", + "# Summary comparison\n", + "print(\"=\"*60)\n", + "print(\"MODEL COMPARISON\")\n", + "print(\"=\"*60)\n", + "print(f\"Full Model (Genes + Sex):\")\n", + "print(f\" R² Score: 
{np.mean(r2_scores):.3f} ± {np.std(r2_scores):.3f}\")\n", + "print(f\" MAE: {np.mean(mae_scores):.2f} ± {np.std(mae_scores):.2f} years\")\n", + "print(f\"\\nBaseline Model (Sex Only):\")\n", + "print(f\" R² Score: {np.mean(baseline_r2_scores):.3f} ± {np.std(baseline_r2_scores):.3f}\")\n", + "print(f\" MAE: {np.mean(baseline_mae_scores):.2f} ± {np.std(baseline_mae_scores):.2f} years\")\n", + "print(f\"\\nImprovement:\")\n", + "print(f\" ΔR²: {np.mean(r2_scores) - np.mean(baseline_r2_scores):.3f}\")\n", + "print(f\" ΔMAE: {np.mean(baseline_mae_scores) - np.mean(mae_scores):.2f} years\")\n", + "print(\"=\"*60)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "052f9c31", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "BetterCodeBetterScience", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.0" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} From 91d26f880d6da49140b18440e89c610a8a8e094f Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Sat, 20 Dec 2025 12:27:29 -0800 Subject: [PATCH 28/87] initial add --- .../rnaseq/immune_scrnaseq_monolithic.py | 1119 +++++++++++++++++ 1 file changed, 1119 insertions(+) create mode 100644 src/BetterCodeBetterScience/rnaseq/immune_scrnaseq_monolithic.py diff --git a/src/BetterCodeBetterScience/rnaseq/immune_scrnaseq_monolithic.py b/src/BetterCodeBetterScience/rnaseq/immune_scrnaseq_monolithic.py new file mode 100644 index 0000000..7c8d32c --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/immune_scrnaseq_monolithic.py @@ -0,0 +1,1119 @@ +# --- +# jupyter: +# jupytext: +# text_representation: +# extension: .py +# format_name: percent +# format_version: '1.3' +# jupytext_version: 1.18.1 +# 
kernelspec: +# display_name: BetterCodeBetterScience +# language: python +# name: python3 +# --- + +# %% [markdown] +# ### Immune system gene expression and aging +# +# We will use a dataset distributed by the [OneK1K](https://onek1k.org/) project, which includes single-cell RNA-seq data from peripheral blood mononuclear cells (PBMCs) obtained from 982 donors, comprising more than 1.2 million cells in total. These data are released under a Creative Commons Zero Public Domain Dedication and are thus free to reuse, with the restriction that users agree not to attempt to reidentify the participants. +# +# The flagship paper for this study is: +# +# Yazar S., Alquicira-Hernández J., Wing K., Senabouth A., Gordon G., Andersen S., Lu Q., Rowson A., Taylor T., Clarke L., Maccora L., Chen C., Cook A., Ye J., Fairfax K., Hewitt A., Powell J. Single cell eQTL mapping identified cell type specific control of autoimmune disease. Science, 376, 6589 (2022) +# +# We will use the data to ask a simple question: how does gene expression in PBMCs change with age? + + + +# %% +import anndata as ad +import dask.array as da +import h5py +import numpy as np +import scanpy as sc +from pathlib import Path +import matplotlib.pyplot as plt +import seaborn as sns + +datadir = Path('/Users/poldrack/data_unsynced/BCBS/immune_aging/') + + +# %% [markdown] +# ### Immune system gene expression and aging +# +# We will use a dataset distributed by the [OneK1K](https://onek1k.org/) project, which includes single-cell RNA-seq data from peripheral blood mononuclear cells (PBMCs) obtained from 982 donors, comprising more than 1.2 million cells in total. These data are released under a Creative Commons Zero Public Domain Dedication and are thus free to reuse, with the restriction that users agree not to attempt to reidentify the participants. 
+# +# The flagship paper for this study is: +# +# Yazar S., Alquicira-Hernández J., Wing K., Senabouth A., Gordon G., Andersen S., Lu Q., Rowson A., Taylor T., Clarke L., Maccora L., Chen C., Cook A., Ye J., Fairfax K., Hewitt A., Powell J. Single cell eQTL mapping identified cell type specific control of autoimmune disease. Science, 376, 6589 (2022) +# +# We will use the data to ask a simple question: how does gene expression in PBMCs change with age? +# +# # Code in this notebook primarily generated using Gemini 3.0 + + +# %% +import anndata as ad +from anndata.experimental import read_lazy +import dask.array as da +import h5py +import numpy as np +import scanpy as sc +from pathlib import Path +import os + +datadir = Path('/Users/poldrack/data_unsynced/BCBS/immune_aging/') + +# %% +datafile = datadir / 'a3f5651f-cd1a-4d26-8165-74964b79b4f2.h5ad' +url = 'https://datasets.cellxgene.cziscience.com/a3f5651f-cd1a-4d26-8165-74964b79b4f2.h5ad' +dataset_name = 'OneK1K' + +if not datafile.exists(): + cmd = f'wget -O {datafile.as_posix()} {url}' + print(f'Downloading data from {url} to {datafile.as_posix()}') + os.system(cmd) + +load_annotation_index = True +adata = read_lazy(h5py.File(datafile, 'r'), + load_annotation_index=load_annotation_index) + +# %% +print(adata) + +# %% +unique_cell_types = np.unique(adata.obs['cell_type']) +print(unique_cell_types) + +# %% [markdown] +# ### Filtering out bad donors + +# %% +import matplotlib.pyplot as plt +import pandas as pd +from scipy.stats import scoreatpercentile + +# 1. Calculate how many cells each donor has +donor_cell_counts = pd.Series(adata.obs['donor_id']).value_counts() + +# Print some basic statistics to read the exact numbers +print("Donor Cell Count Statistics:") +print(donor_cell_counts.describe()) + +# 2. 
Plot the histogram +plt.figure(figsize=(10, 6)) +# Bins set to 'auto' or a fixed number depending on your N of donors +plt.hist(donor_cell_counts.values, bins=50, color='skyblue', edgecolor='black') + +plt.title('Distribution of Total Cells per Donor') +plt.xlabel('Number of Cells Captured') +plt.ylabel('Number of Donors') +plt.grid(axis='y', alpha=0.5) + +# Optional: Draw a vertical line at the proposed cutoff +# This helps you visualize how many donors you would lose. +cutoff_percentile = 10 # e.g., 10th percentile +min_cells_per_donor = int(scoreatpercentile(donor_cell_counts.values, cutoff_percentile)) +print(f'cutoff of {min_cells_per_donor} would exclude {(donor_cell_counts < min_cells_per_donor).sum()} donors') +plt.axvline(min_cells_per_donor, color='red', linestyle='dashed', linewidth=1, label=f'Cutoff ({min_cells_per_donor} cells)') +plt.legend() + +plt.show() + +# %% +print(f"Filtering to keep only donors with at least {min_cells_per_donor} cells.") +print(f"Number of donors excluded: {(donor_cell_counts < min_cells_per_donor).sum()}") +valid_donors = donor_cell_counts[donor_cell_counts >= min_cells_per_donor].index +adata = adata[adata.obs['donor_id'].isin(valid_donors)] + +# %% +print(f'Number of donors after filtering: {len(valid_donors)}') + +# %% [markdown] +# ### Filtering cell types by frequency +# +# Drop cell types that don't have at least 10 cells in at least 90% of donors + +# %% +import pandas as pd + +# 1. Calculate the count of cells for each 'cell_type' within each 'donor_id' +# We use pandas crosstab on adata.obs, which is loaded in memory. +counts_per_donor = pd.crosstab(adata.obs['donor_id'], adata.obs['cell_type']) + +# 2. 
Identify cell types to keep +# Keep if >= 10 cells in at least 90% of donors + +min_cells = 10 +percent_donors = 0.9 +donor_count = counts_per_donor.shape[0] +cell_types_to_keep = counts_per_donor.columns[ + (counts_per_donor >= min_cells).sum(axis=0) >= (donor_count * percent_donors)] + +print(f"Keeping {len(cell_types_to_keep)} cell types out of {len(counts_per_donor.columns)}") +print(f"Cell types to keep: {cell_types_to_keep.tolist()}") + +# 3. Filter the AnnData object +# We subset the AnnData to include only observations belonging to the valid cell types. +adata_filtered = adata[adata.obs['cell_type'].isin(cell_types_to_keep)] + +# %% +# now drop subjects who have any zeros in these cell types +donor_celltype_counts = pd.crosstab(adata_filtered.obs['donor_id'], adata_filtered.obs['cell_type']) +valid_donors_final = donor_celltype_counts.index[ + (donor_celltype_counts >= min_cells).all(axis=1)] +adata_filtered = adata_filtered[adata_filtered.obs['donor_id'].isin(valid_donors_final)] +print(f"Final number of donors after filtering: {len(valid_donors_final)}") + +# %% + +print("Loading data into memory (this can take a few minutes)...") +adata_loaded = adata_filtered.to_memory() + +# filter out genes with zero counts across all selected cells +print("Filtering genes with zero counts...") +sc.pp.filter_genes(adata_loaded, min_counts=1) + + +# %% +print(adata_loaded) + + +# %% +adata_loaded.write(datadir / f'dataset-{dataset_name}_subset-immune_filtered.h5ad') +del adata_loaded + +# %% +adata = ad.read_h5ad(datadir / f'dataset-{dataset_name}_subset-immune_filtered.h5ad') +print(adata) + +# %% +var_to_feature = dict(zip(adata.var_names, adata.var['feature_name'])) + +# %% [markdown] +# Preprocessing based on suggestions from Google Gemini +# +# based on https://www.sc-best-practices.org/preprocessing_visualization/quality_control.html +# +# and 
https://www.10xgenomics.com/analysis-guides/common-considerations-for-quality-control-filters-for-single-cell-rna-seq-data +# + +# %% [markdown] +# ### Quality control +# +# based on https://www.sc-best-practices.org/preprocessing_visualization/quality_control.html +# + +# %% +# mitochondrial genes +adata.var["mt"] = adata.var['feature_name'].str.startswith("MT-") +print(f"Number of mitochondrial genes: {adata.var['mt'].sum()}") + +# ribosomal genes +adata.var["ribo"] = adata.var['feature_name'].str.startswith(("RPS", "RPL")) +print(f"Number of ribosomal genes: {adata.var['ribo'].sum()}") + +# hemoglobin genes. +adata.var["hb"] = adata.var['feature_name'].str.contains("^HB[^(P)]") +print(f"Number of hemoglobin genes: {adata.var['hb'].sum()}") + +sc.pp.calculate_qc_metrics( + adata, qc_vars=["mt", "ribo", "hb"], inplace=True, percent_top=[20], log1p=True +) + + + +# %% [markdown] +# #### Visualization of distributions + +# %% + +# 1. Violin plots to see the distribution of QC metrics +# Note: I am using the exact column names from your adata output +p1 = sc.pl.violin(adata, ['total_counts', 'n_genes_by_counts', 'pct_counts_mt'], + jitter=0.4, multi_panel=True) + +# 2. 
Scatter plot to spot doublets and dying cells +# High mito + low genes = dying cell +# High counts + high genes = potential doublet +sc.pl.scatter(adata, x='total_counts', y='n_genes_by_counts', color='pct_counts_mt') + +# %% [markdown] +# #### Check Hemoglobin (RBC contamination) +# + +# %% + +plt.figure(figsize=(6, 4)) +sns.histplot(adata.obs['pct_counts_hb'], bins=50, log_scale=(False, True)) # Log scale y to see small RBC populations +plt.title("Hemoglobin Content Distribution") +plt.xlabel("% Hemoglobin Counts") +plt.axvline(5, color='red', linestyle='--', label='5% Cutoff') +plt.legend() +plt.show() + +# %% [markdown] +# #### Create a copy of the data and apply QC cutoffs +# + +# %% +# Create a copy or view to avoid modifying the original if needed +adata_qc = adata.copy() + +# --- Define Thresholds --- +# Low quality (Empty droplets / debris) +min_genes = 200 # Standard for immune cells (T-cells can be small) +min_counts = 500 # Minimum UMIs + +# Doublets (Two cells stuck together) +# Adjust this based on the scatter plot above. +# 4000-6000 is common for 10x Genomics data. +max_genes = 6000 +max_counts = 30000 # Very high counts often indicate doublets + +# Contaminants +max_hb_pct = 5.0 # Remove Red Blood Cells (> 5% hemoglobin) + +# --- Apply Filtering --- +print(f"Before filtering: {adata_qc.n_obs} cells") + +# 1. Filter Low Quality & Doublets +adata_qc = adata_qc[ + (adata_qc.obs['n_genes_by_counts'] > min_genes) & + (adata_qc.obs['n_genes_by_counts'] < max_genes) & + (adata_qc.obs['total_counts'] > min_counts) & + (adata_qc.obs['total_counts'] < max_counts) +] + +# 2. 
Filter Red Blood Cells (Hemoglobin) +# Only run this if you want to remove RBCs +adata_qc = adata_qc[adata_qc.obs['pct_counts_hb'] < max_hb_pct] + +print(f"After filtering: {adata_qc.n_obs} cells") + +# %% [markdown] +# ### Perform doublet detection +# +# According to Gemini: +# +# You must do this before normalization or clustering because doublets create "hybrid" expression profiles that can form fake clusters (e.g., a "cluster" that looks like a mix of T-cells and B-cells) or distort your normalization factors. +# +# **Important: Run Per Donor** +# +# Since you have multiple people, you must run doublet detection separately for each donor. The doublet rate is a technical artifact of the physical loading of the machine (10x Genomics chip), which varies per run. If you run it on the whole dataset at once, the algorithm will get confused by biological differences between people. +# + +# %% + +# 1. Check preliminary requirements +# Scrublet needs RAW counts. Ensure adata.X contains integers, not log-normalized data. +# If your main layer is already normalized, use adata.raw or a specific layer. +print(f"Data shape before doublet detection: {adata_qc.shape}") + +# 2. Run Scrublet per donor +# We split the data, run detection, and then recombine. +# This prevents the algorithm from comparing a cell from Person A to a cell from Person B. 
+ +adatas_list = [] +# Get list of unique donors +donors = adata_qc.obs['donor_id'].unique() + +print(f"Running Scrublet on {len(donors)} donors...") + +for donor in donors: + # Subset to current donor + curr_adata = adata_qc[adata_qc.obs['donor_id'] == donor].copy() + + # Skip donors with too few cells (Scrublet needs statistical power) + if curr_adata.n_obs < 100: + print(f"Skipping donor {donor}: too few cells ({curr_adata.n_obs})") + # We still add it back to keep the data, but mark as singlet (or filter later) + curr_adata.obs['doublet_score'] = 0 + curr_adata.obs['predicted_doublet'] = False + adatas_list.append(curr_adata) + continue + + # Run Scrublet + # expected_doublet_rate=0.06 is standard for 10x (approx ~0.8% per 1000 cells recovered) + # If you loaded very heavily (20k cells/well), increase this to 0.10 + sc.pp.scrublet(curr_adata, expected_doublet_rate=0.06) + + adatas_list.append(curr_adata) + +# 3. Merge back into one object +adata_qc = sc.concat(adatas_list) + +# 4. Check results +print(f"Detected {adata_qc.obs['predicted_doublet'].sum()} doublets across all donors.") +print(adata_qc.obs['predicted_doublet'].value_counts()) + +# %% [markdown] +# #### Visualize doublets +# +# + +# %% +sc.pl.umap(adata_qc, color=['doublet_score', 'predicted_doublet'], size=20) + +# %% [markdown] +# #### Filter doublets +# - Question: how consistent are these results with other methods for doublet detection? 
https://www.sc-best-practices.org/preprocessing_visualization/quality_control.html#doublet-detection + +# %% +# Check how many doublets were found +print(f'found {adata_qc.obs["predicted_doublet"].sum()} predicted doublets') + +# Filter the data to keep only singlets (False) +# write back to adata for simplicity +adata = adata_qc[adata_qc.obs['predicted_doublet'] == False, :] +print(f"Remaining cells: {adata.n_obs}") + +# %% [markdown] +# #### Save raw counts for later use + +# %% +# store the raw counts in a 'counts' layer (used later for HVG selection and pseudobulking) +adata.layers['counts'] = adata.X.copy() + +# %% [markdown] +# ### Total Count Normalization +# This scales each cell so that they all have the same total number of counts (a common target is 10,000 counts per cell, known as "CP10k"). + +# %% +# Normalize to 10,000 counts per cell +# target_sum=1e4 is the standard for 10x data +sc.pp.normalize_total(adata, target_sum=1e4) + +# %% [markdown] +# ### Log Transformation (Log1p) +# This applies a natural logarithm to the data: log(X+1). This reduces the skewness of the data (since gene expression follows a power law) and stabilizes the variance. + +# %% +# Logarithmically transform the data +sc.pp.log1p(adata) + +# %% [markdown] +# ### Select high-variance features +# +# According to Gemini: +# For a large immune dataset (PBMCs, ~1.2M cells), the standard defaults often fail to capture the subtle biological variation needed to distinguish similar cell types (like CD4+ T-cell subsets). +# +# Here are the reasonable parameters and, more importantly, the **immune-specific strategy** you should use. +# +# #### The Recommended Parameters +# +# For a dataset of your size, the **`seurat_v3`** flavor is generally superior because it selects genes based on standardized variance (handling the mean-variance relationship better than the dispersion-based method). 
+# +# * **`flavor`**: `'seurat_v3'` (Requires **RAW integer counts** in `adata.X` or a layer) +# * **`n_top_genes`**: **2000 - 3000** (3000 is safer for immune data to capture rare cytokines/markers) +# * **`batch_key`**: **`'donor_id'`** (CRITICAL) +# * *Why?* With 1.2M cells across many people, you have massive batch effects. If you don't set this, "highly variable genes" will just be the genes that differ between Person A and Person B (e.g., HLA genes, gender-specific genes like XIST/RPS4Y1), rather than genes distinguishing cell types. +# +# #### The "Expert" Trick: Blocklisting Nuisance Genes +# In immune datasets, "highly variable" does not always mean "biologically interesting." You often need to **exclude** specific gene families from the HVG list *after* calculation but *before* PCA, or they will hijack your clustering: +# 1. **TCR/BCR Variable Regions (IG*, TR*):** These are hyper-variable by definition (V(D)J recombination). If you keep them, T-cells will cluster by **clone** (clonotype) rather than by **phenotype** (state). +# 2. **Mitochondrial/Ribosomal:** Usually technical noise. +# 3. **Cell Cycle:** (Optional) If you don't want proliferating cells to cluster separately. +# +# +# +# #### Why 3000 genes instead of 2000? +# Immune cells are dense with specific markers. The difference between a *Naive CD8 T-cell* and a *Central Memory CD8 T-cell* might rest on a handful of genes (e.g., *CCR7, SELL, IL7R* vs *GZMK*). If you limit to 2000 genes in a massive, diverse dataset, you might accidentally drop a subtle marker required to resolve these fine-grained states. + +# %% + +import scanpy as sc +import pandas as pd + + +# 2. Run Highly Variable Gene Selection +# batch_key is critical here to find genes variable WITHIN donors, not BETWEEN them. 
+sc.pp.highly_variable_genes( + adata, + n_top_genes=3000, + flavor='seurat_v3', + batch_key='donor_id', + span=0.8, # helps avoid numerical issues with LOESS + layer='counts', # Change this to None if adata.X is raw counts + subset=False # Keep False so we can manually filter the list below +) + +# 3. Filter out "Nuisance" Genes from the HVG list +# We don't remove the genes from the object, we just set their 'highly_variable' status to False +# so they aren't used in PCA. + +# A. Identify TCR/BCR genes (symbol starts with IG or TR) +# Regex: IG or TR followed by a V, D, J, or C gene part +# var_names are Ensembl IDs in this dataset, so match on gene symbols via var_to_feature +symbols = pd.Index([str(var_to_feature.get(v, v)) for v in adata.var_names]) +immune_receptor_genes = adata.var_names[ + symbols.str.match(r'^(IG[HKL]|TR[ABDG])[VDJC]') +] + +# B. Identify Ribosomal/Mitochondrial (if not already handled) +mt_genes = adata.var_names[symbols.str.startswith('MT-')] +rb_genes = adata.var_names[symbols.str.startswith(('RPS', 'RPL'))] + +# C. Manually set them to False +genes_to_block = list(immune_receptor_genes) + list(mt_genes) + list(rb_genes) + +# Mark every blocked gene as not highly variable +adata.var.loc[adata.var_names.isin(genes_to_block), 'highly_variable'] = False + +print(f"Blocked {len(immune_receptor_genes)} immune receptor genes from HVG list.") +print(f"Final HVG count: {adata.var['highly_variable'].sum()}") + +# 4. Proceed to PCA +sc.tl.pca(adata, svd_solver='arpack', use_highly_variable=True) + + +# %% [markdown] +# ### Dimensionality reduction + +# %% +import scanpy.external as sce + +# 1. Run Harmony +# This adjusts the PCA coordinates to mix donors together while preserving biology. +# It creates a new entry in obsm: 'X_pca_harmony' +try: + sce.pp.harmony_integrate(adata, key='donor_id', basis='X_pca', adjusted_basis='X_pca_harmony') + use_rep = 'X_pca_harmony' + print("Harmony integration successful. Using corrected PCA.") +except ImportError: + print("Harmony not installed. 
Proceeding with standard PCA (Warning: Batch effects may persist).") + print("To install: pip install harmonypy") + use_rep = 'X_pca' + +# %% +# Reality check: Check if PC1 is just "Cell Size": + +sc.pl.pca(adata, color=['total_counts', 'cell_type'], components=['1,2']) + +# %% [markdown] +# PC1 separates cell types and isn't driven only by the total counts per cell. + +# %% +# 2. Compute Neighbors +# n_neighbors: 15-30 is standard. Higher (30-50) is better for large datasets to preserve global structure. +# n_pcs: 30-50 is standard. +sc.pp.neighbors(adata, n_neighbors=30, n_pcs=40, use_rep=use_rep) + +# 3. Compute UMAP +# This projects the graph into 2D for you to look at. +sc.tl.umap(adata, init_pos=use_rep) + +# %% +sc.pl.umap(adata, color="total_counts") + + +# %% [markdown] +# ### Clustering +# + +# %% +# 4. Run Clustering (Leiden algorithm) +# Only resolution 1.0 is run here; uncomment the line below to compare a coarser resolution. +#sc.tl.leiden(adata, resolution=0.5, key_added='leiden_0.5') +sc.tl.leiden(adata, resolution=1.0, key_added='leiden_1.0', + flavor="igraph", n_iterations=2) + + +# %% +# Plot UMAP colored by annotated cell type and by Leiden cluster +sc.pl.umap(adata, color=['cell_type', 'leiden_1.0'], wspace=0.3) + +# %% +# compute overlap between clusters and cell types +contingency_table = pd.crosstab(adata.obs['leiden_1.0'], adata.obs['cell_type']) +print(contingency_table) + +# %% [markdown] +# ### Pseudobulking + +# %% +import pandas as pd +import numpy as np +import anndata as ad +from scipy import sparse +from sklearn.preprocessing import OneHotEncoder + +def create_pseudobulk(adata, group_col, donor_col, layer='counts', metadata_cols=None): + """ + Sum raw counts for each (Donor, CellType) pair. 
+ + Parameters: + ----------- + adata : AnnData + Input single-cell data + group_col : str + Column name for grouping (e.g., 'cell_type') + donor_col : str + Column name for donor ID + layer : str + Layer to use for aggregation (default: 'counts') + metadata_cols : list of str, optional + Additional metadata columns to preserve from obs (e.g., ['development_stage', 'sex']) + These should have consistent values within each donor + """ + # 1. Create a combined key (e.g., "Bcell::Donor1") + groups = adata.obs[group_col].astype(str) + donors = adata.obs[donor_col].astype(str) + + # Create a DataFrame to manage the unique combinations + group_df = pd.DataFrame({'group': groups, 'donor': donors}) + group_df['combined'] = group_df['group'] + "::" + group_df['donor'] + + # 2. Build the Aggregation Matrix (One-Hot Encoding) + enc = OneHotEncoder(sparse_output=True, dtype=np.float32) + membership_matrix = enc.fit_transform(group_df[['combined']]) + + # 3. Aggregation (Summing) + if layer is not None and layer in adata.layers: + X_source = adata.layers[layer] + else: + X_source = adata.X + + pseudobulk_X = membership_matrix.T @ X_source + + # 4. Create the Obs Metadata for the new object + unique_ids = enc.categories_[0] + + # Split back into Donor and Cell Type + obs_data = [] + for uid in unique_ids: + ctype, donor = uid.split("::") + obs_data.append({'cell_type': ctype, 'donor_id': donor}) + + pb_obs = pd.DataFrame(obs_data, index=unique_ids) + + # 5. Count how many cells went into each sum + cell_counts = np.array(membership_matrix.sum(axis=0)).flatten() + pb_obs['n_cells'] = cell_counts.astype(int) + + # 6. 
Add additional metadata columns + if metadata_cols is not None: + for col in metadata_cols: + if col in adata.obs.columns: + # For each pseudobulk sample, get the first (should be consistent) value + # from the original data for that donor + col_values = [] + for uid in unique_ids: + ctype, donor = uid.split("::") + # Get value from any cell with this donor (should all be the same) + donor_mask = adata.obs[donor_col] == donor + if donor_mask.any(): + col_values.append(adata.obs.loc[donor_mask, col].iloc[0]) + else: + col_values.append(None) + pb_obs[col] = col_values + + # 7. Assemble the AnnData + pb_adata = ad.AnnData(X=pseudobulk_X, obs=pb_obs, var=adata.var.copy()) + + return pb_adata + +# --- Execute --- + +target_cluster_col = 'cell_type' + +print("Aggregating counts...") +pb_adata = create_pseudobulk( + adata, + group_col=target_cluster_col, + donor_col='donor_id', + layer='counts', + metadata_cols=['development_stage', 'sex'] # Add any other donor-level metadata here +) + +print(f"Pseudobulk complete.") +print(f"Original shape: {adata.shape}") +print(f"Pseudobulk shape: {pb_adata.shape} (Samples x Genes)") +print(pb_adata.obs.head()) + +# %% +min_cells = 10 +print(f"Dropping samples with < {min_cells} cells...") + +pb_adata = pb_adata[pb_adata.obs['n_cells'] >= min_cells].copy() + +print(f"Remaining samples: {pb_adata.n_obs}") + +# Optional: Visualize the 'depth' of your new pseudobulk samples +import scanpy as sc +pb_adata.obs['total_counts'] = np.array(pb_adata.X.sum(axis=1)).flatten() +sc.pl.violin(pb_adata, ['n_cells', 'total_counts'], multi_panel=True) + +# %% [markdown] +# ### Differential expression with age +# +# First need to z-score the age variable to put it on same scale as expression, to help with convergence + +# %% +# first need to create 'age_scaled' variable from 'development_stage' +# eg. 
from '19-year-old stage' to 19 +ages = pb_adata.obs['development_stage'].str.extract(r'(\d+)-year-old').astype(float) +pb_adata.obs['age'] = ages + + + +# %% +import pandas as pd +from pydeseq2.dds import DeseqDataSet +from pydeseq2.ds import DeseqStats +from sklearn.preprocessing import StandardScaler + +# Assume pb_adata is your pseudobulk object from the previous step +# 1. Extract counts and metadata +counts_df = pd.DataFrame( + pb_adata.X.toarray(), + index=pb_adata.obs_names, + columns=[var_to_feature.get(var, var) for var in pb_adata.var_names] +) +# remove duplicate columns if any +counts_df = counts_df.loc[:,~counts_df.columns.duplicated()] + +metadata = pb_adata.obs.copy() + +# 2. IMPORTANT: Scale the continuous variable +# This prevents convergence errors. +scaler = StandardScaler() +metadata['age_scaled'] = scaler.fit_transform(metadata[['age']]).flatten() +metadata['age_scaled'] = metadata['age_scaled'].astype(float) + + +# Check the scaling (Mean should be ~0, Std ~1) +print(metadata[['age', 'age_scaled']].head()) + +# %% +# Perform DE analysis separately for each cell type +# For this example we just choose one of them + +cell_type = 'central memory CD4-positive, alpha-beta T cell' +pb_adata_ct = pb_adata[pb_adata.obs['cell_type'] == cell_type].copy() +counts_df_ct = counts_df.loc[pb_adata_ct.obs_names].copy() + +metadata_ct = metadata.loc[pb_adata_ct.obs_names].copy() + +assert 'age_scaled' in metadata_ct.columns, "age_scaled column missing in metadata" +assert 'sex' in metadata_ct.columns, "sex column missing in metadata" + +# 3. Initialize DeseqDataSet +dds = DeseqDataSet( + counts=counts_df_ct, + metadata=metadata_ct, + design_factors=["age_scaled", "sex"], # Use the scaled column + refit_cooks=True, + n_cpus=8 +) + +# 4. 
Run the fitting (Dispersions & LFCs) +dds.deseq2() + + +# %% [markdown] +# #### Compute statistics + +# %% +model_vars = dds.varm["LFC"].columns +contrast = np.array([0, 1, 0]) +print(f"contrast: {contrast}, model_vars: {model_vars}") + +# 5. Statistical Test (Wald Test) +# Syntax for continuous: ["variable", "", ""] +stat_res = DeseqStats( + dds, + contrast=contrast +) + +stat_res.summary() + + + +# %% +stat_res.run_wald_test() + +# %% [markdown] +# #### Find significant genes + +# %% +# 1. Get the results dataframe +res = stat_res.results_df + +# 2. Filter for significant genes (e.g., FDR < 0.05) +# This automatically excludes NaNs (since NaN < 0.05 is False) +sigs = res[res['padj'] < 0.05] + +# 3. Sort by effect size (Log2 Fold Change) to see top hits +sigs = sigs.sort_values('log2FoldChange', ascending=False) + +print(f"Found {len(sigs)} significant genes.") +print(sigs[['log2FoldChange', 'padj']].head()) + +# %% [markdown] +# ### Pathway enrichment: GSEA +# +# - what pathways are enriched in the differentially expressed genes? + +# %% +import gseapy as gp +import matplotlib.pyplot as plt +import pandas as pd + +# 1. Prepare the Ranked List +# We use the 'stat' column if available (best metric). +# If 'stat' isn't there, approximate it with -log10(pvalue) * sign(log2FoldChange) +rank_df = res[['stat']].dropna().sort_values('stat', ascending=False) + +# 2. Run GSEA Preranked +# We look at GO Biological Process and the "Hallmark" set (good for general states) +# For immune specific, you can also add 'Reactome_2022' or 'KEGG_2021_Human' +prerank_res = gp.prerank( + rnk=rank_df, + gene_sets=['MSigDB_Hallmark_2020'], + threads=4, + min_size=10, # Min genes in pathway + max_size=1000, + permutation_num=1000, # Reduce to 100 for speed if testing + seed=42 +) + +# 3. 
View Results +# 'NES' = Normalized Enrichment Score (Positive = Upregulated in Age, Negative = Downregulated) +# 'FDR q-val' = Significance +terms = prerank_res.res2d.sort_values('NES', ascending=False) + +print("Top Upregulated Pathways:") +print(terms[['Term', 'NES', 'FDR q-val']].head(10)) + +print("\nTop Downregulated Pathways:") +print(terms[['Term', 'NES', 'FDR q-val']].tail(10)) + + + +# %% [markdown] +# #### Create a plot for the results + +# %% + +# 1. Get the results table +# (Assumes 'prerank_res' is your output from gp.prerank) +gsea_df = prerank_res.res2d.copy() + +# 2. Sort by NES to separate Up vs Down +gsea_df = gsea_df.sort_values('NES', ascending=False) + +# 3. Select Top 10 Up and Top 10 Down +top_up = gsea_df.head(10).copy() +top_down = gsea_df.tail(10).copy() + +# 4. Combine them +combined_gsea = pd.concat([top_up, top_down]) + +# 5. Create metrics for plotting +# Direction based on NES sign +combined_gsea['Direction'] = combined_gsea['NES'].apply(lambda x: 'Upregulated' if x > 0 else 'Downregulated') + +# Significance for X-axis (-log10 FDR) +# We add a tiny epsilon (1e-10) to avoid log(0) errors if FDR is exactly 0 +combined_gsea['FDR q-val'] = pd.to_numeric(combined_gsea['FDR q-val'], errors='coerce') +combined_gsea['log_FDR'] = -np.log10(combined_gsea['FDR q-val'] + 1e-10) + +# Gene Count for Dot Size +# GSEApy stores the leading edge genes as a semi-colon separated string in 'Lead_genes' +combined_gsea['Count'] = combined_gsea['Lead_genes'].apply(lambda x: len(str(x).split(';'))) + +## remove MSigDB label from Term +combined_gsea['Term'] = combined_gsea['Term'].str.replace('MSigDB_Hallmark_2020__', '', regex=False) + +print(f"Plotting {len(combined_gsea)} pathways.") +print(combined_gsea[['Term', 'NES', 'FDR q-val', 'Count']].head()) + +# %% +plt.figure(figsize=(10, 8)) + +# Create the scatter plot +sns.scatterplot( + data=combined_gsea, + x='log_FDR', + y='Term', + hue='Direction', # Color by NES Direction + size='Count', # Size by number 
of Leading Edge genes + palette={'Upregulated': '#E41A1C', 'Downregulated': '#377EB8'}, # Red/Blue + sizes=(50, 400), # Range of dot sizes + alpha=0.8 +) + +# Customization +plt.title('Top GSEA Pathways (Up vs Down)', fontsize=14) +plt.xlabel('-log10(FDR q-value)', fontsize=12) +plt.ylabel('') + +# Add a vertical line for significance (FDR < 0.05 => -log10(0.05) ~= 1.3) +plt.axvline(-np.log10(0.25), color='gray', linestyle=':', label='FDR=0.25 (GSEA standard)') +plt.axvline(-np.log10(0.05), color='gray', linestyle='--', label='FDR=0.05') + +plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.) +plt.grid(axis='x', alpha=0.3) +plt.tight_layout() + +plt.show() + +# %% [markdown] +# ### Enrichr analysis for overrepresentation + +# %% +# 1. Define your significant gene lists +# Up in Age +up_genes = res[ + (res['padj'] < 0.05) & (res['log2FoldChange'] > 0) +].index.tolist() + +# Down in Age +down_genes = res[ + (res['padj'] < 0.05) & (res['log2FoldChange'] < 0) +].index.tolist() + +print(f"Analyzing {len(up_genes)} upregulated and {len(down_genes)} downregulated genes.") + +# 2. Run Enrichr (Over-Representation Analysis) +if len(up_genes) > 0: + enr_up = gp.enrichr( + gene_list=up_genes, + gene_sets=['MSigDB_Hallmark_2020'], + organism='human', + outdir=None + ) + print("Upregulated Pathways:") + print(enr_up.results[['Term', 'Adjusted P-value', 'Overlap']].head(10)) + + +if len(down_genes) > 0: + enr_down = gp.enrichr( + gene_list=down_genes, + gene_sets=['MSigDB_Hallmark_2020'], + organism='human', + outdir=None + ) + print("Downregulated Pathways:") + print(enr_down.results[['Term', 'Adjusted P-value', 'Overlap']].head(10)) + + + +# %% + +# 1. Add a "Direction" column to distinguish them +up_res = enr_up.results.copy() +up_res['Direction'] = 'Upregulated' +up_res['Color'] = 'Red' # For custom palette + +down_res = enr_down.results.copy() +down_res['Direction'] = 'Downregulated' +down_res['Color'] = 'Blue' + +# 2. 
Filter for top 10 pathways by Adjusted P-value +# (You can also filter by 'Combined Score' if you prefer) +top_up = up_res.sort_values('Adjusted P-value').head(10) +top_down = down_res.sort_values('Adjusted P-value').head(10) + +# 3. Concatenate +combined = pd.concat([top_up, top_down]) + +# 4. Create a "-log10(P-value)" column for plotting +combined['log_p'] = -np.log10(combined['Adjusted P-value']) + +# 5. Extract "Count" from the "Overlap" column (e.g., "5/200" -> 5) +# This is used to size the dots +combined['Gene_Count'] = combined['Overlap'].apply(lambda x: int(x.split('/')[0])) + +print(f"Plotting {len(combined)} pathways.") + + +# %% +import seaborn as sns +import matplotlib.pyplot as plt + +plt.figure(figsize=(10, 8)) + +# Create the scatter plot +sns.scatterplot( + data=combined, + x='log_p', + y='Term', + hue='Direction', # Color by Up/Down + size='Gene_Count', # Size by number of genes in pathway + palette={'Upregulated': '#E41A1C', 'Downregulated': '#377EB8'}, # Red/Blue + sizes=(50, 400), # Range of dot sizes + alpha=0.8 +) + +# Customization +plt.title('Top Enriched Pathways (Up vs Down)', fontsize=14) +plt.xlabel('-log10(Adjusted P-value)', fontsize=12) +plt.ylabel('') +plt.axvline(-np.log10(0.05), color='gray', linestyle='--', alpha=0.5, label='p=0.05') # Significance threshold line +plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.) +plt.grid(axis='x', alpha=0.3) +plt.tight_layout() + +plt.show() + +# %% [markdown] +# ### Age prediction from gene expression +# +# Here we will build a predictive model and assess our ability to predict age from held-out individuals. We also test against a baseline model with only sex as a covariate. + +# %% [markdown] +# + +# %% +from sklearn.svm import LinearSVR +from sklearn.model_selection import ShuffleSplit +from sklearn.preprocessing import StandardScaler +from sklearn.metrics import r2_score, mean_absolute_error +import pandas as pd +import numpy as np + +# 1. 
Prepare features and target +# Features: all genes from counts_df_ct + sex variable +X_genes = counts_df_ct.copy() + +# Add sex as a binary feature (encode as 0/1) +sex_encoded = pd.get_dummies(metadata_ct['sex'], drop_first=True) +X = pd.concat([X_genes, sex_encoded], axis=1) + +# Target: age +y = metadata_ct['age'].values + +print(f"Feature matrix shape: {X.shape}") +print(f"Number of samples: {len(y)}") +print(f"Age range: {y.min():.1f} - {y.max():.1f} years") + +# 2. Set up ShuffleSplit cross-validation +# Using 5 splits with 20% test size +cv = ShuffleSplit(n_splits=5, test_size=0.2, random_state=42) + +# 3. Store results +r2_scores = [] +mae_scores = [] +predictions_list = [] +actual_list = [] + +# 4. Train and evaluate model for each split +for fold, (train_idx, test_idx) in enumerate(cv.split(X)): + print(f"\nFold {fold + 1}/5") + + # Split data + X_train, X_test = X.iloc[train_idx], X.iloc[test_idx] + y_train, y_test = y[train_idx], y[test_idx] + + # Scale features (important for SVR) + scaler = StandardScaler() + X_train_scaled = scaler.fit_transform(X_train) + X_test_scaled = scaler.transform(X_test) + + # Train Linear SVR + # C parameter controls regularization (smaller = more regularization) + model = LinearSVR(C=1.0, max_iter=10000, random_state=42, dual='auto') + model.fit(X_train_scaled, y_train) + + # Predict on test set + y_pred = model.predict(X_test_scaled) + + # Calculate metrics + r2 = r2_score(y_test, y_pred) + mae = mean_absolute_error(y_test, y_pred) + + r2_scores.append(r2) + mae_scores.append(mae) + predictions_list.extend(y_pred) + actual_list.extend(y_test) + + print(f" R² Score: {r2:.3f}") + print(f" MAE: {mae:.2f} years") + +# 5. 
Summary statistics +print("\n" + "="*50) +print("CROSS-VALIDATION RESULTS") +print("="*50) +print(f"R² Score: {np.mean(r2_scores):.3f} ± {np.std(r2_scores):.3f}") +print(f"MAE: {np.mean(mae_scores):.2f} ± {np.std(mae_scores):.2f} years") +print("="*50) + +# %% +# Visualize predictions vs actual ages +import matplotlib.pyplot as plt + +plt.figure(figsize=(8, 6)) + +# Scatter plot of predictions vs actual +plt.scatter(actual_list, predictions_list, alpha=0.6, s=80) + +# Add diagonal line (perfect predictions) +min_age = min(min(actual_list), min(predictions_list)) +max_age = max(max(actual_list), max(predictions_list)) +plt.plot([min_age, max_age], [min_age, max_age], 'r--', linewidth=2, label='Perfect Prediction') + +plt.xlabel('Actual Age (years)', fontsize=12) +plt.ylabel('Predicted Age (years)', fontsize=12) +plt.title(f'Age Prediction Performance\nR² = {np.mean(r2_scores):.3f}, MAE = {np.mean(mae_scores):.2f} years', + fontsize=14) +plt.legend() +plt.grid(alpha=0.3) +plt.tight_layout() +plt.show() + +# %% [markdown] +# #### Baseline model: Sex only +# +# Compare against a baseline model that only uses sex as a predictor to assess the contribution of gene expression. 
+ +# %% +# Baseline model: Sex only +X_baseline = sex_encoded.copy() + +# Store baseline results +baseline_r2_scores = [] +baseline_mae_scores = [] + +# Train and evaluate baseline model for each split +for fold, (train_idx, test_idx) in enumerate(cv.split(X_baseline)): + # Split data + X_train, X_test = X_baseline.iloc[train_idx], X_baseline.iloc[test_idx] + y_train, y_test = y[train_idx], y[test_idx] + + # Scale features + scaler = StandardScaler() + X_train_scaled = scaler.fit_transform(X_train) + X_test_scaled = scaler.transform(X_test) + + # Train Linear SVR + model = LinearSVR(C=1.0, max_iter=10000, random_state=42, dual='auto') + model.fit(X_train_scaled, y_train) + + # Predict on test set + y_pred = model.predict(X_test_scaled) + + # Calculate metrics + r2 = r2_score(y_test, y_pred) + mae = mean_absolute_error(y_test, y_pred) + + baseline_r2_scores.append(r2) + baseline_mae_scores.append(mae) + +# Summary comparison +print("="*60) +print("MODEL COMPARISON") +print("="*60) +print(f"Full Model (Genes + Sex):") +print(f" R² Score: {np.mean(r2_scores):.3f} ± {np.std(r2_scores):.3f}") +print(f" MAE: {np.mean(mae_scores):.2f} ± {np.std(mae_scores):.2f} years") +print(f"\nBaseline Model (Sex Only):") +print(f" R² Score: {np.mean(baseline_r2_scores):.3f} ± {np.std(baseline_r2_scores):.3f}") +print(f" MAE: {np.mean(baseline_mae_scores):.2f} ± {np.std(baseline_mae_scores):.2f} years") +print(f"\nImprovement:") +print(f" ΔR²: {np.mean(r2_scores) - np.mean(baseline_r2_scores):.3f}") +print(f" ΔMAE: {np.mean(baseline_mae_scores) - np.mean(mae_scores):.2f} years") +print("="*60) + +# %% From 925f1f0656fb3d63d6b0e097d05ad9f10eb2af43 Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Sat, 20 Dec 2025 12:34:04 -0800 Subject: [PATCH 29/87] formatted and ruffed, about to remove plt.show commands which halt execution --- .../rnaseq/immune_scrnaseq_monolithic.py | 533 ++++++++++-------- 1 file changed, 306 insertions(+), 227 deletions(-) diff --git 
a/src/BetterCodeBetterScience/rnaseq/immune_scrnaseq_monolithic.py b/src/BetterCodeBetterScience/rnaseq/immune_scrnaseq_monolithic.py index 7c8d32c..95c994d 100644 --- a/src/BetterCodeBetterScience/rnaseq/immune_scrnaseq_monolithic.py +++ b/src/BetterCodeBetterScience/rnaseq/immune_scrnaseq_monolithic.py @@ -15,7 +15,7 @@ # %% [markdown] # ### Immune system gene expression and aging # -# We will use a dataset distributed by the [OneK1K](https://onek1k.org/) project, which includes single-cell RNA-seq data from peripheral blood mononuclear cells (PBMCs) obtained from 982 donors, comprising more than 1.2 million cells in total. These data are released under a Creative Commons Zero Public Domain Dedication and are thus free to reuse, with the restriction that users agree not to attempt to reidentify the participants. +# We will use a dataset distributed by the [OneK1K](https://onek1k.org/) project, which includes single-cell RNA-seq data from peripheral blood mononuclear cells (PBMCs) obtained from 982 donors, comprising more than 1.2 million cells in total. These data are released under a Creative Commons Zero Public Domain Dedication and are thus free to reuse, with the restriction that users agree not to attempt to reidentify the participants. # # The flagship paper for this study is: # @@ -24,16 +24,29 @@ # We will use the data to ask a simple question: how does gene expression in PBMCs change with age? 
- # %% import anndata as ad -import dask.array as da import h5py import numpy as np import scanpy as sc from pathlib import Path import matplotlib.pyplot as plt import seaborn as sns +from anndata.experimental import read_lazy +import os +import pandas as pd +from scipy.stats import scoreatpercentile +import re +import scanpy.external as sce +from sklearn.preprocessing import OneHotEncoder +from pydeseq2.dds import DeseqDataSet +from pydeseq2.ds import DeseqStats +from sklearn.preprocessing import StandardScaler +import gseapy as gp +from sklearn.svm import LinearSVR +from sklearn.model_selection import ShuffleSplit +from sklearn.metrics import r2_score, mean_absolute_error + datadir = Path('/Users/poldrack/data_unsynced/BCBS/immune_aging/') @@ -41,26 +54,18 @@ # %% [markdown] # ### Immune system gene expression and aging # -# We will use a dataset distributed by the [OneK1K](https://onek1k.org/) project, which includes single-cell RNA-seq data from peripheral blood mononuclear cells (PBMCs) obtained from 982 donors, comprising more than 1.2 million cells in total. These data are released under a Creative Commons Zero Public Domain Dedication and are thus free to reuse, with the restriction that users agree not to attempt to reidentify the participants. +# We will use a dataset distributed by the [OneK1K](https://onek1k.org/) project, which includes single-cell RNA-seq data from peripheral blood mononuclear cells (PBMCs) obtained from 982 donors, comprising more than 1.2 million cells in total. These data are released under a Creative Commons Zero Public Domain Dedication and are thus free to reuse, with the restriction that users agree not to attempt to reidentify the participants. # # The flagship paper for this study is: # # Yazar S., Alquicira-Hernández J., Wing K., Senabouth A., Gordon G., Andersen S., Lu Q., Rowson A., Taylor T., Clarke L., Maccora L., Chen C., Cook A., Ye J., Fairfax K., Hewitt A., Powell J. 
Single cell eQTL mapping identified cell type specific control of autoimmune disease. Science, 376, 6589 (2022) # # We will use the data to ask a simple question: how does gene expression in PBMCs change with age? -# +# # # Code in this notebook primarily generated using Gemini 3.0 # %% -import anndata as ad -from anndata.experimental import read_lazy -import dask.array as da -import h5py -import numpy as np -import scanpy as sc -from pathlib import Path -import os datadir = Path('/Users/poldrack/data_unsynced/BCBS/immune_aging/') @@ -75,8 +80,9 @@ os.system(cmd) load_annotation_index = True -adata = read_lazy(h5py.File(datafile, 'r'), - load_annotation_index=load_annotation_index) +adata = read_lazy( + h5py.File(datafile, 'r'), load_annotation_index=load_annotation_index +) # %% print(adata) @@ -89,15 +95,13 @@ # ### Filtering out bad donors # %% -import matplotlib.pyplot as plt -import pandas as pd -from scipy.stats import scoreatpercentile + # 1. Calculate how many cells each donor has donor_cell_counts = pd.Series(adata.obs['donor_id']).value_counts() # Print some basic statistics to read the exact numbers -print("Donor Cell Count Statistics:") +print('Donor Cell Count Statistics:') print(donor_cell_counts.describe()) # 2. Plot the histogram @@ -113,17 +117,33 @@ # Optional: Draw a vertical line at the proposed cutoff # This helps you visualize how many donors you would lose.
cutoff_percentile = 10 # e.g., 10th percentile -min_cells_per_donor = int(scoreatpercentile(donor_cell_counts.values, cutoff_percentile)) -print(f'cutoff of {min_cells_per_donor} would exclude {(donor_cell_counts < min_cells_per_donor).sum()} donors') -plt.axvline(min_cells_per_donor, color='red', linestyle='dashed', linewidth=1, label=f'Cutoff ({min_cells_per_donor} cells)') +min_cells_per_donor = int( + scoreatpercentile(donor_cell_counts.values, cutoff_percentile) +) +print( + f'cutoff of {min_cells_per_donor} would exclude {(donor_cell_counts < min_cells_per_donor).sum()} donors' +) +plt.axvline( + min_cells_per_donor, + color='red', + linestyle='dashed', + linewidth=1, + label=f'Cutoff ({min_cells_per_donor} cells)', +) plt.legend() plt.show() # %% -print(f"Filtering to keep only donors with at least {min_cells_per_donor} cells.") -print(f"Number of donors excluded: {(donor_cell_counts < min_cells_per_donor).sum()}") -valid_donors = donor_cell_counts[donor_cell_counts >= min_cells_per_donor].index +print( + f'Filtering to keep only donors with at least {min_cells_per_donor} cells.' +) +print( + f'Number of donors excluded: {(donor_cell_counts < min_cells_per_donor).sum()}' +) +valid_donors = donor_cell_counts[ + donor_cell_counts >= min_cells_per_donor +].index adata = adata[adata.obs['donor_id'].isin(valid_donors)] # %% @@ -135,7 +155,6 @@ # Drop cell types that don't have at least 10 cells for at least 90% of people # %% -import pandas as pd # 1. Calculate the count of cells for each 'cell_type' within each 'donor_id' # We use pandas crosstab on adata.obs, which is loaded in memory.
@@ -148,10 +167,14 @@ percent_donors = 0.9 donor_count = counts_per_donor.shape[0] cell_types_to_keep = counts_per_donor.columns[ - (counts_per_donor >= min_cells).sum(axis=0) >= (donor_count * percent_donors)] + (counts_per_donor >= min_cells).sum(axis=0) + >= (donor_count * percent_donors) +] -print(f"Keeping {len(cell_types_to_keep)} cell types out of {len(counts_per_donor.columns)}") -print(f"Cell types to keep: {cell_types_to_keep.tolist()}") +print( + f'Keeping {len(cell_types_to_keep)} cell types out of {len(counts_per_donor.columns)}' +) +print(f'Cell types to keep: {cell_types_to_keep.tolist()}') # 3. Filter the AnnData object # We subset the AnnData to include only observations belonging to the valid cell types. @@ -159,19 +182,24 @@ # %% # now drop subjects who have any zeros in these cell types -donor_celltype_counts = pd.crosstab(adata_filtered.obs['donor_id'], adata_filtered.obs['cell_type']) +donor_celltype_counts = pd.crosstab( + adata_filtered.obs['donor_id'], adata_filtered.obs['cell_type'] +) valid_donors_final = donor_celltype_counts.index[ - (donor_celltype_counts >= min_cells).all(axis=1)] -adata_filtered = adata_filtered[adata_filtered.obs['donor_id'].isin(valid_donors_final)] -print(f"Final number of donors after filtering: {len(valid_donors_final)}") + (donor_celltype_counts >= min_cells).all(axis=1) +] +adata_filtered = adata_filtered[ + adata_filtered.obs['donor_id'].isin(valid_donors_final) +] +print(f'Final number of donors after filtering: {len(valid_donors_final)}') # %% -print("Loading data into memory (this can take a few minutes)...") +print('Loading data into memory (this can take a few minutes)...') adata_loaded = adata_filtered.to_memory() # filter out genes with zero counts across all selected cells -print("Filtering genes with zero counts...") +print('Filtering genes with zero counts...') sc.pp.filter_genes(adata_loaded, min_counts=1) @@ -180,11 +208,15 @@ # %% -adata_loaded.write(datadir / 
f'dataset-{dataset_name}_subset-immune_filtered.h5ad') +adata_loaded.write( + datadir / f'dataset-{dataset_name}_subset-immune_filtered.h5ad' +) del adata_loaded # %% -adata = ad.read_h5ad(datadir / f'dataset-{dataset_name}_subset-immune_filtered.h5ad') +adata = ad.read_h5ad( + datadir / f'dataset-{dataset_name}_subset-immune_filtered.h5ad' +) print(adata) # %% @@ -206,37 +238,46 @@ # %% # mitochondrial genes -adata.var["mt"] = adata.var['feature_name'].str.startswith("MT-") +adata.var['mt'] = adata.var['feature_name'].str.startswith('MT-') print(f"Number of mitochondrial genes: {adata.var['mt'].sum()}") # ribosomal genes -adata.var["ribo"] = adata.var['feature_name'].str.startswith(("RPS", "RPL")) +adata.var['ribo'] = adata.var['feature_name'].str.startswith(('RPS', 'RPL')) print(f"Number of ribosomal genes: {adata.var['ribo'].sum()}") # hemoglobin genes. -adata.var["hb"] = adata.var['feature_name'].str.contains("^HB[^(P)]") +adata.var['hb'] = adata.var['feature_name'].str.contains('^HB[^(P)]') print(f"Number of hemoglobin genes: {adata.var['hb'].sum()}") sc.pp.calculate_qc_metrics( - adata, qc_vars=["mt", "ribo", "hb"], inplace=True, percent_top=[20], log1p=True + adata, + qc_vars=['mt', 'ribo', 'hb'], + inplace=True, + percent_top=[20], + log1p=True, ) - # %% [markdown] -# #### Visualization of distributions +# #### Visualization of distributions # %% # 1. Violin plots to see the distribution of QC metrics # Note: I am using the exact column names from your adata output -p1 = sc.pl.violin(adata, ['total_counts', 'n_genes_by_counts', 'pct_counts_mt'], - jitter=0.4, multi_panel=True) +p1 = sc.pl.violin( + adata, + ['total_counts', 'n_genes_by_counts', 'pct_counts_mt'], + jitter=0.4, + multi_panel=True, +) # 2. 
Scatter plot to spot doublets and dying cells # High mito + low genes = dying cell # High counts + high genes = potential doublet -sc.pl.scatter(adata, x='total_counts', y='n_genes_by_counts', color='pct_counts_mt') +sc.pl.scatter( + adata, x='total_counts', y='n_genes_by_counts', color='pct_counts_mt' +) # %% [markdown] # #### Check Hemoglobin (RBC contamination) @@ -245,9 +286,11 @@ # %% plt.figure(figsize=(6, 4)) -sns.histplot(adata.obs['pct_counts_hb'], bins=50, log_scale=(False, True)) # Log scale y to see small RBC populations -plt.title("Hemoglobin Content Distribution") -plt.xlabel("% Hemoglobin Counts") +sns.histplot( + adata.obs['pct_counts_hb'], bins=50, log_scale=(False, True) +) # Log scale y to see small RBC populations +plt.title('Hemoglobin Content Distribution') +plt.xlabel('% Hemoglobin Counts') plt.axvline(5, color='red', linestyle='--', label='5% Cutoff') plt.legend() plt.show() @@ -266,30 +309,30 @@ min_counts = 500 # Minimum UMIs # Doublets (Two cells stuck together) -# Adjust this based on the scatter plot above. +# Adjust this based on the scatter plot above. # 4000-6000 is common for 10x Genomics data. -max_genes = 6000 +max_genes = 6000 max_counts = 30000 # Very high counts often indicate doublets # Contaminants max_hb_pct = 5.0 # Remove Red Blood Cells (> 5% hemoglobin) # --- Apply Filtering --- -print(f"Before filtering: {adata_qc.n_obs} cells") +print(f'Before filtering: {adata_qc.n_obs} cells') # 1. Filter Low Quality & Doublets adata_qc = adata_qc[ - (adata_qc.obs['n_genes_by_counts'] > min_genes) & - (adata_qc.obs['n_genes_by_counts'] < max_genes) & - (adata_qc.obs['total_counts'] > min_counts) & - (adata_qc.obs['total_counts'] < max_counts) + (adata_qc.obs['n_genes_by_counts'] > min_genes) + & (adata_qc.obs['n_genes_by_counts'] < max_genes) + & (adata_qc.obs['total_counts'] > min_counts) + & (adata_qc.obs['total_counts'] < max_counts) ] # 2. 
Filter Red Blood Cells (Hemoglobin) # Only run this if you want to remove RBCs adata_qc = adata_qc[adata_qc.obs['pct_counts_hb'] < max_hb_pct] -print(f"After filtering: {adata_qc.n_obs} cells") +print(f'After filtering: {adata_qc.n_obs} cells') # %% [markdown] # ### Perform doublet detection @@ -308,7 +351,7 @@ # 1. Check preliminary requirements # Scrublet needs RAW counts. Ensure adata.X contains integers, not log-normalized data. # If your main layer is already normalized, use adata.raw or a specific layer. -print(f"Data shape before doublet detection: {adata_qc.shape}") +print(f'Data shape before doublet detection: {adata_qc.shape}') # 2. Run Scrublet per donor # We split the data, run detection, and then recombine. @@ -318,15 +361,15 @@ # Get list of unique donors donors = adata_qc.obs['donor_id'].unique() -print(f"Running Scrublet on {len(donors)} donors...") +print(f'Running Scrublet on {len(donors)} donors...') for donor in donors: # Subset to current donor curr_adata = adata_qc[adata_qc.obs['donor_id'] == donor].copy() - + # Skip donors with too few cells (Scrublet needs statistical power) if curr_adata.n_obs < 100: - print(f"Skipping donor {donor}: too few cells ({curr_adata.n_obs})") + print(f'Skipping donor {donor}: too few cells ({curr_adata.n_obs})') # We still add it back to keep the data, but mark as singlet (or filter later) curr_adata.obs['doublet_score'] = 0 curr_adata.obs['predicted_doublet'] = False @@ -337,14 +380,16 @@ # expected_doublet_rate=0.06 is standard for 10x (approx ~0.8% per 1000 cells recovered) # If you loaded very heavily (20k cells/well), increase this to 0.10 sc.pp.scrublet(curr_adata, expected_doublet_rate=0.06) - + adatas_list.append(curr_adata) # 3. Merge back into one object adata_qc = sc.concat(adatas_list) # 4. Check results -print(f"Detected {adata_qc.obs['predicted_doublet'].sum()} doublets across all donors.") +print( + f"Detected {adata_qc.obs['predicted_doublet'].sum()} doublets across all donors." 
+) print(adata_qc.obs['predicted_doublet'].value_counts()) # %% [markdown] @@ -365,8 +410,8 @@ # Filter the data to keep only singlets (False) # write back to adata for simplicity -adata = adata_qc[adata_qc.obs['predicted_doublet'] == False, :] -print(f"Remaining cells: {adata.n_obs}") +adata = adata_qc[~adata_qc.obs['predicted_doublet'].astype(bool), :] +print(f'Remaining cells: {adata.n_obs}') # %% [markdown] # #### Save raw counts for later use @@ -422,9 +467,6 @@ # %% -import scanpy as sc -import pandas as pd - # 2. Run Highly Variable Gene Selection # batch_key is critical here to find genes variable WITHIN donors, not BETWEEN them. @@ -435,7 +477,7 @@ batch_key='donor_id', span=0.8, # helps avoid numerical issues with LOESS layer='counts', # Change this to None if adata.X is raw counts - subset=False # Keep False so we can manually filter the list below + subset=False, # Keep False so we can manually filter the list below ) # 3. Filter out "Nuisance" Genes from the HVG list @@ -444,9 +486,10 @@ # A. Identify TCR/BCR genes (starts with IG or TR) # Regex: IG or TR followed by a V, D, J, or C gene part -import re + immune_receptor_genes = [ - name for name in adata.var_names + name + for name in adata.var_names if re.match(r'^(IG[HKL]|TR[ABDG])[VDJC]', name) ] @@ -460,7 +503,9 @@ # Using set operations for speed adata.var.loc[adata.var_names.isin(genes_to_block), 'highly_variable'] = False -print(f"Blocked {len(immune_receptor_genes)} immune receptor genes from HVG list.") +print( + f'Blocked {len(immune_receptor_genes)} immune receptor genes from HVG list.' +) print(f"Final HVG count: {adata.var['highly_variable'].sum()}") # 4. Proceed to PCA @@ -471,18 +516,21 @@ # ### Dimensionality reduction # %% -import scanpy.external as sce # 1. Run Harmony # This adjusts the PCA coordinates to mix donors together while preserving biology.
# It creates a new entry in obsm: 'X_pca_harmony' try: - sce.pp.harmony_integrate(adata, key='donor_id', basis='X_pca', adjusted_basis='X_pca_harmony') + sce.pp.harmony_integrate( + adata, key='donor_id', basis='X_pca', adjusted_basis='X_pca_harmony' + ) use_rep = 'X_pca_harmony' - print("Harmony integration successful. Using corrected PCA.") + print('Harmony integration successful. Using corrected PCA.') except ImportError: - print("Harmony not installed. Proceeding with standard PCA (Warning: Batch effects may persist).") - print("To install: pip install harmony-pytorch") + print( + 'Harmony not installed. Proceeding with standard PCA (Warning: Batch effects may persist).' + ) + print('To install: pip install harmony-pytorch') use_rep = 'X_pca' # %% @@ -504,7 +552,7 @@ sc.tl.umap(adata, init_pos='X_pca_harmony') # %% -sc.pl.umap(adata, color="total_counts") +sc.pl.umap(adata, color='total_counts') # %% [markdown] @@ -514,9 +562,14 @@ # %% # 4. Run Clustering (Leiden algorithm) # We run multiple resolutions so you can choose the best one later. 
-#sc.tl.leiden(adata, resolution=0.5, key_added='leiden_0.5') -sc.tl.leiden(adata, resolution=1.0, key_added='leiden_1.0', - flavor="igraph", n_iterations=2) +# sc.tl.leiden(adata, resolution=0.5, key_added='leiden_0.5') +sc.tl.leiden( + adata, + resolution=1.0, + key_added='leiden_1.0', + flavor='igraph', + n_iterations=2, +) # %% @@ -525,23 +578,23 @@ # %% # compute overlap between clusters and cell types -contingency_table = pd.crosstab(adata.obs['leiden_1.0'], adata.obs['cell_type']) +contingency_table = pd.crosstab( + adata.obs['leiden_1.0'], adata.obs['cell_type'] +) print(contingency_table) # %% [markdown] # ### Pseudobulking # %% -import pandas as pd -import numpy as np -import anndata as ad -from scipy import sparse -from sklearn.preprocessing import OneHotEncoder -def create_pseudobulk(adata, group_col, donor_col, layer='counts', metadata_cols=None): + +def create_pseudobulk( + adata, group_col, donor_col, layer='counts', metadata_cols=None +): """ Sum raw counts for each (Donor, CellType) pair. - + Parameters: ----------- adata : AnnData @@ -559,38 +612,38 @@ def create_pseudobulk(adata, group_col, donor_col, layer='counts', metadata_cols # 1. Create a combined key (e.g., "Bcell::Donor1") groups = adata.obs[group_col].astype(str) donors = adata.obs[donor_col].astype(str) - + # Create a DataFrame to manage the unique combinations group_df = pd.DataFrame({'group': groups, 'donor': donors}) - group_df['combined'] = group_df['group'] + "::" + group_df['donor'] - + group_df['combined'] = group_df['group'] + '::' + group_df['donor'] + # 2. Build the Aggregation Matrix (One-Hot Encoding) enc = OneHotEncoder(sparse_output=True, dtype=np.float32) membership_matrix = enc.fit_transform(group_df[['combined']]) - + # 3. Aggregation (Summing) if layer is not None and layer in adata.layers: X_source = adata.layers[layer] else: X_source = adata.X - + pseudobulk_X = membership_matrix.T @ X_source - + # 4. 
Create the Obs Metadata for the new object unique_ids = enc.categories_[0] - + # Split back into Donor and Cell Type obs_data = [] for uid in unique_ids: - ctype, donor = uid.split("::") + ctype, donor = uid.split('::') obs_data.append({'cell_type': ctype, 'donor_id': donor}) - + pb_obs = pd.DataFrame(obs_data, index=unique_ids) - + # 5. Count how many cells went into each sum cell_counts = np.array(membership_matrix.sum(axis=0)).flatten() pb_obs['n_cells'] = cell_counts.astype(int) - + # 6. Add additional metadata columns if metadata_cols is not None: for col in metadata_cols: @@ -599,48 +652,54 @@ def create_pseudobulk(adata, group_col, donor_col, layer='counts', metadata_cols # from the original data for that donor col_values = [] for uid in unique_ids: - ctype, donor = uid.split("::") + ctype, donor = uid.split('::') # Get value from any cell with this donor (should all be the same) donor_mask = adata.obs[donor_col] == donor if donor_mask.any(): - col_values.append(adata.obs.loc[donor_mask, col].iloc[0]) + col_values.append( + adata.obs.loc[donor_mask, col].iloc[0] + ) else: col_values.append(None) pb_obs[col] = col_values - + # 7. 
Assemble the AnnData pb_adata = ad.AnnData(X=pseudobulk_X, obs=pb_obs, var=adata.var.copy()) - + return pb_adata + # --- Execute --- target_cluster_col = 'cell_type' -print("Aggregating counts...") +print('Aggregating counts...') pb_adata = create_pseudobulk( - adata, - group_col=target_cluster_col, - donor_col='donor_id', + adata, + group_col=target_cluster_col, + donor_col='donor_id', layer='counts', - metadata_cols=['development_stage', 'sex'] # Add any other donor-level metadata here + metadata_cols=[ + 'development_stage', + 'sex', + ], # Add any other donor-level metadata here ) -print(f"Pseudobulk complete.") -print(f"Original shape: {adata.shape}") -print(f"Pseudobulk shape: {pb_adata.shape} (Samples x Genes)") +print('Pseudobulk complete.') +print(f'Original shape: {adata.shape}') +print(f'Pseudobulk shape: {pb_adata.shape} (Samples x Genes)') print(pb_adata.obs.head()) # %% min_cells = 10 -print(f"Dropping samples with < {min_cells} cells...") +print(f'Dropping samples with < {min_cells} cells...') pb_adata = pb_adata[pb_adata.obs['n_cells'] >= min_cells].copy() -print(f"Remaining samples: {pb_adata.n_obs}") +print(f'Remaining samples: {pb_adata.n_obs}') # Optional: Visualize the 'depth' of your new pseudobulk samples -import scanpy as sc + pb_adata.obs['total_counts'] = np.array(pb_adata.X.sum(axis=1)).flatten() sc.pl.violin(pb_adata, ['n_cells', 'total_counts'], multi_panel=True) @@ -652,26 +711,26 @@ def create_pseudobulk(adata, group_col, donor_col, layer='counts', metadata_cols # %% # first need to create 'age_scaled' variable from 'development_stage' # eg. 
from '19-year-old stage' to 19 -ages = pb_adata.obs['development_stage'].str.extract(r'(\d+)-year-old').astype(float) +ages = ( + pb_adata.obs['development_stage'] + .str.extract(r'(\d+)-year-old') + .astype(float) +) pb_adata.obs['age'] = ages - # %% -import pandas as pd -from pydeseq2.dds import DeseqDataSet -from pydeseq2.ds import DeseqStats -from sklearn.preprocessing import StandardScaler + # Assume pb_adata is your pseudobulk object from the previous step # 1. Extract counts and metadata counts_df = pd.DataFrame( - pb_adata.X.toarray(), - index=pb_adata.obs_names, - columns=[var_to_feature.get(var, var) for var in pb_adata.var_names] + pb_adata.X.toarray(), + index=pb_adata.obs_names, + columns=[var_to_feature.get(var, var) for var in pb_adata.var_names], ) # remove duplicate columns if any -counts_df = counts_df.loc[:,~counts_df.columns.duplicated()] +counts_df = counts_df.loc[:, ~counts_df.columns.duplicated()] metadata = pb_adata.obs.copy() @@ -695,16 +754,18 @@ def create_pseudobulk(adata, group_col, donor_col, layer='counts', metadata_cols metadata_ct = metadata.loc[pb_adata_ct.obs_names].copy() -assert 'age_scaled' in metadata_ct.columns, "age_scaled column missing in metadata" -assert 'sex' in metadata_ct.columns, "sex column missing in metadata" +assert ( + 'age_scaled' in metadata_ct.columns +), 'age_scaled column missing in metadata' +assert 'sex' in metadata_ct.columns, 'sex column missing in metadata' # 3. Initialize DeseqDataSet dds = DeseqDataSet( counts=counts_df_ct, metadata=metadata_ct, - design_factors=["age_scaled", "sex"], # Use the scaled column + design_factors=['age_scaled', 'sex'], # Use the scaled column refit_cooks=True, - n_cpus=8 + n_cpus=8, ) # 4. 
Run the fitting (Dispersions & LFCs) @@ -715,21 +776,17 @@ def create_pseudobulk(adata, group_col, donor_col, layer='counts', metadata_cols # #### Compute statistics # %% -model_vars = dds.varm["LFC"].columns +model_vars = dds.varm['LFC'].columns contrast = np.array([0, 1, 0]) -print(f"contrast: {contrast}, model_vars: {model_vars}") +print(f'contrast: {contrast}, model_vars: {model_vars}') # 5. Statistical Test (Wald Test) # Syntax for continuous: ["variable", "", ""] -stat_res = DeseqStats( - dds, - contrast=contrast -) +stat_res = DeseqStats(dds, contrast=contrast) stat_res.summary() - # %% stat_res.run_wald_test() @@ -747,7 +804,7 @@ def create_pseudobulk(adata, group_col, donor_col, layer='counts', metadata_cols # 3. Sort by effect size (Log2 Fold Change) to see top hits sigs = sigs.sort_values('log2FoldChange', ascending=False) -print(f"Found {len(sigs)} significant genes.") +print(f'Found {len(sigs)} significant genes.') print(sigs[['log2FoldChange', 'padj']].head()) # %% [markdown] @@ -756,12 +813,10 @@ def create_pseudobulk(adata, group_col, donor_col, layer='counts', metadata_cols # - what pathways are enriched in the differentially expressed genes? # %% -import gseapy as gp -import matplotlib.pyplot as plt -import pandas as pd + # 1. Prepare the Ranked List -# We use the 'stat' column if available (best metric). +# We use the 'stat' column if available (best metric). 
# If 'stat' isn't there, approximate it with -log10(pvalue) * sign(log2FoldChange) rank_df = res[['stat']].dropna().sort_values('stat', ascending=False) @@ -769,13 +824,13 @@ def create_pseudobulk(adata, group_col, donor_col, layer='counts', metadata_cols # We look at GO Biological Process and the "Hallmark" set (good for general states) # For immune specific, you can also add 'Reactome_2022' or 'KEGG_2021_Human' prerank_res = gp.prerank( - rnk=rank_df, + rnk=rank_df, gene_sets=['MSigDB_Hallmark_2020'], threads=4, - min_size=10, # Min genes in pathway - max_size=1000, - permutation_num=1000, # Reduce to 100 for speed if testing - seed=42 + min_size=10, # Min genes in pathway + max_size=1000, + permutation_num=1000, # Reduce to 100 for speed if testing + seed=42, ) # 3. View Results @@ -783,14 +838,13 @@ def create_pseudobulk(adata, group_col, donor_col, layer='counts', metadata_cols # 'FDR q-val' = Significance terms = prerank_res.res2d.sort_values('NES', ascending=False) -print("Top Upregulated Pathways:") +print('Top Upregulated Pathways:') print(terms[['Term', 'NES', 'FDR q-val']].head(10)) -print("\nTop Downregulated Pathways:") +print('\nTop Downregulated Pathways:') print(terms[['Term', 'NES', 'FDR q-val']].tail(10)) - # %% [markdown] # #### Create a plot for the results @@ -812,21 +866,29 @@ def create_pseudobulk(adata, group_col, donor_col, layer='counts', metadata_cols # 5. 
Create metrics for plotting # Direction based on NES sign -combined_gsea['Direction'] = combined_gsea['NES'].apply(lambda x: 'Upregulated' if x > 0 else 'Downregulated') +combined_gsea['Direction'] = combined_gsea['NES'].apply( + lambda x: 'Upregulated' if x > 0 else 'Downregulated' +) # Significance for X-axis (-log10 FDR) # We add a tiny epsilon (1e-10) to avoid log(0) errors if FDR is exactly 0 -combined_gsea['FDR q-val'] = pd.to_numeric(combined_gsea['FDR q-val'], errors='coerce') +combined_gsea['FDR q-val'] = pd.to_numeric( + combined_gsea['FDR q-val'], errors='coerce' +) combined_gsea['log_FDR'] = -np.log10(combined_gsea['FDR q-val'] + 1e-10) # Gene Count for Dot Size # GSEApy stores the leading edge genes as a semi-colon separated string in 'Lead_genes' -combined_gsea['Count'] = combined_gsea['Lead_genes'].apply(lambda x: len(str(x).split(';'))) +combined_gsea['Count'] = combined_gsea['Lead_genes'].apply( + lambda x: len(str(x).split(';')) +) ## remove MSigDB label from Term -combined_gsea['Term'] = combined_gsea['Term'].str.replace('MSigDB_Hallmark_2020__', '', regex=False) +combined_gsea['Term'] = combined_gsea['Term'].str.replace( + 'MSigDB_Hallmark_2020__', '', regex=False +) -print(f"Plotting {len(combined_gsea)} pathways.") +print(f'Plotting {len(combined_gsea)} pathways.') print(combined_gsea[['Term', 'NES', 'FDR q-val', 'Count']].head()) # %% @@ -837,11 +899,11 @@ def create_pseudobulk(adata, group_col, donor_col, layer='counts', metadata_cols data=combined_gsea, x='log_FDR', y='Term', - hue='Direction', # Color by NES Direction - size='Count', # Size by number of Leading Edge genes - palette={'Upregulated': '#E41A1C', 'Downregulated': '#377EB8'}, # Red/Blue - sizes=(50, 400), # Range of dot sizes - alpha=0.8 + hue='Direction', # Color by NES Direction + size='Count', # Size by number of Leading Edge genes + palette={'Upregulated': '#E41A1C', 'Downregulated': '#377EB8'}, # Red/Blue + sizes=(50, 400), # Range of dot sizes + alpha=0.8, ) # 
Customization @@ -850,10 +912,15 @@ def create_pseudobulk(adata, group_col, donor_col, layer='counts', metadata_cols plt.ylabel('') # Add a vertical line for significance (FDR < 0.05 => -log10(0.05) ~= 1.3) -plt.axvline(-np.log10(0.25), color='gray', linestyle=':', label='FDR=0.25 (GSEA standard)') +plt.axvline( + -np.log10(0.25), + color='gray', + linestyle=':', + label='FDR=0.25 (GSEA standard)', +) plt.axvline(-np.log10(0.05), color='gray', linestyle='--', label='FDR=0.05') -plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.) +plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.0) plt.grid(axis='x', alpha=0.3) plt.tight_layout() @@ -874,30 +941,31 @@ def create_pseudobulk(adata, group_col, donor_col, layer='counts', metadata_cols (res['padj'] < 0.05) & (res['log2FoldChange'] < 0) ].index.tolist() -print(f"Analyzing {len(up_genes)} upregulated and {len(down_genes)} downregulated genes.") +print( + f'Analyzing {len(up_genes)} upregulated and {len(down_genes)} downregulated genes.' +) # 2. Run Enrichr (Over-Representation Analysis) if len(up_genes) > 0: enr_up = gp.enrichr( gene_list=up_genes, gene_sets=['MSigDB_Hallmark_2020'], - organism='human', - outdir=None + organism='human', + outdir=None, ) - print("Upregulated Pathways:") + print('Upregulated Pathways:') print(enr_up.results[['Term', 'Adjusted P-value', 'Overlap']].head(10)) - + if len(down_genes) > 0: enr_down = gp.enrichr( gene_list=down_genes, gene_sets=['MSigDB_Hallmark_2020'], - organism='human', - outdir=None + organism='human', + outdir=None, ) - print("Downregulated Pathways:") + print('Downregulated Pathways:') print(enr_down.results[['Term', 'Adjusted P-value', 'Overlap']].head(10)) - # %% @@ -924,14 +992,15 @@ def create_pseudobulk(adata, group_col, donor_col, layer='counts', metadata_cols # 5. 
Extract "Count" from the "Overlap" column (e.g., "5/200" -> 5) # This is used to size the dots -combined['Gene_Count'] = combined['Overlap'].apply(lambda x: int(x.split('/')[0])) +combined['Gene_Count'] = combined['Overlap'].apply( + lambda x: int(x.split('/')[0]) +) -print(f"Plotting {len(combined)} pathways.") +print(f'Plotting {len(combined)} pathways.') # %% -import seaborn as sns -import matplotlib.pyplot as plt + plt.figure(figsize=(10, 8)) @@ -940,19 +1009,21 @@ def create_pseudobulk(adata, group_col, donor_col, layer='counts', metadata_cols data=combined, x='log_p', y='Term', - hue='Direction', # Color by Up/Down + hue='Direction', # Color by Up/Down size='Gene_Count', # Size by number of genes in pathway - palette={'Upregulated': '#E41A1C', 'Downregulated': '#377EB8'}, # Red/Blue - sizes=(50, 400), # Range of dot sizes - alpha=0.8 + palette={'Upregulated': '#E41A1C', 'Downregulated': '#377EB8'}, # Red/Blue + sizes=(50, 400), # Range of dot sizes + alpha=0.8, ) # Customization plt.title('Top Enriched Pathways (Up vs Down)', fontsize=14) plt.xlabel('-log10(Adjusted P-value)', fontsize=12) plt.ylabel('') -plt.axvline(-np.log10(0.05), color='gray', linestyle='--', alpha=0.5, label='p=0.05') # Significance threshold line -plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.) +plt.axvline( + -np.log10(0.05), color='gray', linestyle='--', alpha=0.5, label='p=0.05' +) # Significance threshold line +plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.0) plt.grid(axis='x', alpha=0.3) plt.tight_layout() @@ -967,12 +1038,7 @@ def create_pseudobulk(adata, group_col, donor_col, layer='counts', metadata_cols # # %% -from sklearn.svm import LinearSVR -from sklearn.model_selection import ShuffleSplit -from sklearn.preprocessing import StandardScaler -from sklearn.metrics import r2_score, mean_absolute_error -import pandas as pd -import numpy as np + # 1. 
Prepare features and target # Features: all genes from counts_df_ct + sex variable @@ -985,9 +1051,9 @@ def create_pseudobulk(adata, group_col, donor_col, layer='counts', metadata_cols # Target: age y = metadata_ct['age'].values -print(f"Feature matrix shape: {X.shape}") -print(f"Number of samples: {len(y)}") -print(f"Age range: {y.min():.1f} - {y.max():.1f} years") +print(f'Feature matrix shape: {X.shape}') +print(f'Number of samples: {len(y)}') +print(f'Age range: {y.min():.1f} - {y.max():.1f} years') # 2. Set up ShuffleSplit cross-validation # Using 5 splits with 20% test size @@ -1001,48 +1067,47 @@ def create_pseudobulk(adata, group_col, donor_col, layer='counts', metadata_cols # 4. Train and evaluate model for each split for fold, (train_idx, test_idx) in enumerate(cv.split(X)): - print(f"\nFold {fold + 1}/5") - + print(f'\nFold {fold + 1}/5') + # Split data X_train, X_test = X.iloc[train_idx], X.iloc[test_idx] y_train, y_test = y[train_idx], y[test_idx] - + # Scale features (important for SVR) scaler = StandardScaler() X_train_scaled = scaler.fit_transform(X_train) X_test_scaled = scaler.transform(X_test) - + # Train Linear SVR # C parameter controls regularization (smaller = more regularization) model = LinearSVR(C=1.0, max_iter=10000, random_state=42, dual='auto') model.fit(X_train_scaled, y_train) - + # Predict on test set y_pred = model.predict(X_test_scaled) - + # Calculate metrics r2 = r2_score(y_test, y_pred) mae = mean_absolute_error(y_test, y_pred) - + r2_scores.append(r2) mae_scores.append(mae) predictions_list.extend(y_pred) actual_list.extend(y_test) - - print(f" R² Score: {r2:.3f}") - print(f" MAE: {mae:.2f} years") + + print(f' R² Score: {r2:.3f}') + print(f' MAE: {mae:.2f} years') # 5. 
Summary statistics -print("\n" + "="*50) -print("CROSS-VALIDATION RESULTS") -print("="*50) -print(f"R² Score: {np.mean(r2_scores):.3f} ± {np.std(r2_scores):.3f}") -print(f"MAE: {np.mean(mae_scores):.2f} ± {np.std(mae_scores):.2f} years") -print("="*50) +print('\n' + '=' * 50) +print('CROSS-VALIDATION RESULTS') +print('=' * 50) +print(f'R² Score: {np.mean(r2_scores):.3f} ± {np.std(r2_scores):.3f}') +print(f'MAE: {np.mean(mae_scores):.2f} ± {np.std(mae_scores):.2f} years') +print('=' * 50) # %% # Visualize predictions vs actual ages -import matplotlib.pyplot as plt plt.figure(figsize=(8, 6)) @@ -1052,12 +1117,20 @@ def create_pseudobulk(adata, group_col, donor_col, layer='counts', metadata_cols # Add diagonal line (perfect predictions) min_age = min(min(actual_list), min(predictions_list)) max_age = max(max(actual_list), max(predictions_list)) -plt.plot([min_age, max_age], [min_age, max_age], 'r--', linewidth=2, label='Perfect Prediction') +plt.plot( + [min_age, max_age], + [min_age, max_age], + 'r--', + linewidth=2, + label='Perfect Prediction', +) plt.xlabel('Actual Age (years)', fontsize=12) plt.ylabel('Predicted Age (years)', fontsize=12) -plt.title(f'Age Prediction Performance\nR² = {np.mean(r2_scores):.3f}, MAE = {np.mean(mae_scores):.2f} years', - fontsize=14) +plt.title( + f'Age Prediction Performance\nR² = {np.mean(r2_scores):.3f}, MAE = {np.mean(mae_scores):.2f} years', + fontsize=14, +) plt.legend() plt.grid(alpha=0.3) plt.tight_layout() @@ -1081,39 +1154,45 @@ def create_pseudobulk(adata, group_col, donor_col, layer='counts', metadata_cols # Split data X_train, X_test = X_baseline.iloc[train_idx], X_baseline.iloc[test_idx] y_train, y_test = y[train_idx], y[test_idx] - + # Scale features scaler = StandardScaler() X_train_scaled = scaler.fit_transform(X_train) X_test_scaled = scaler.transform(X_test) - + # Train Linear SVR model = LinearSVR(C=1.0, max_iter=10000, random_state=42, dual='auto') model.fit(X_train_scaled, y_train) - + # Predict on test set 
y_pred = model.predict(X_test_scaled) - + # Calculate metrics r2 = r2_score(y_test, y_pred) mae = mean_absolute_error(y_test, y_pred) - + baseline_r2_scores.append(r2) baseline_mae_scores.append(mae) # Summary comparison -print("="*60) -print("MODEL COMPARISON") -print("="*60) -print(f"Full Model (Genes + Sex):") -print(f" R² Score: {np.mean(r2_scores):.3f} ± {np.std(r2_scores):.3f}") -print(f" MAE: {np.mean(mae_scores):.2f} ± {np.std(mae_scores):.2f} years") -print(f"\nBaseline Model (Sex Only):") -print(f" R² Score: {np.mean(baseline_r2_scores):.3f} ± {np.std(baseline_r2_scores):.3f}") -print(f" MAE: {np.mean(baseline_mae_scores):.2f} ± {np.std(baseline_mae_scores):.2f} years") -print(f"\nImprovement:") -print(f" ΔR²: {np.mean(r2_scores) - np.mean(baseline_r2_scores):.3f}") -print(f" ΔMAE: {np.mean(baseline_mae_scores) - np.mean(mae_scores):.2f} years") -print("="*60) +print('=' * 60) +print('MODEL COMPARISON') +print('=' * 60) +print('Full Model (Genes + Sex):') +print(f' R² Score: {np.mean(r2_scores):.3f} ± {np.std(r2_scores):.3f}') +print(f' MAE: {np.mean(mae_scores):.2f} ± {np.std(mae_scores):.2f} years') +print('\nBaseline Model (Sex Only):') +print( + f' R² Score: {np.mean(baseline_r2_scores):.3f} ± {np.std(baseline_r2_scores):.3f}' +) +print( + f' MAE: {np.mean(baseline_mae_scores):.2f} ± {np.std(baseline_mae_scores):.2f} years' +) +print('\nImprovement:') +print(f' ΔR²: {np.mean(r2_scores) - np.mean(baseline_r2_scores):.3f}') +print( + f' ΔMAE: {np.mean(baseline_mae_scores) - np.mean(mae_scores):.2f} years' +) +print('=' * 60) # %% From 1267abab1814be47cd200f4059af3d2492f095fb Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Sun, 21 Dec 2025 08:54:51 -0800 Subject: [PATCH 30/87] add coding prefernces --- CLAUDE.md | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/CLAUDE.md b/CLAUDE.md index 7512e7a..5e0589a 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -80,3 +80,14 @@ pre-commit run --all-files - Code examples should follow PEP8 - Avoid 
introducing new dependencies when possible - Custom words for codespell are in `project-words.txt` + +## Coding guidelines + +## Notes for Development + +- Think about the problem before generating code. +- Write code that is clean and modular. Prefer shorter functions/methods over longer ones. +- Prefer reliance on widely used packages (such as numpy, pandas, and scikit-learn); avoid unknown packages from Github. +- Do not include *any* code in `__init__.py` files. +- Use pytest for testing. +- Use functions rather than classes for tests. Use pytest fixtures to share resources between tests. From 55bfa7f8a685ad9a83f26f6897a440a5ada6f8ac Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Sun, 21 Dec 2025 08:55:53 -0800 Subject: [PATCH 31/87] intermediate progress --- book/workflows.md | 127 ++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 118 insertions(+), 9 deletions(-) diff --git a/book/workflows.md b/book/workflows.md index 79c77ef..edae351 100644 --- a/book/workflows.md +++ b/book/workflows.md @@ -1,6 +1,6 @@ # Workflow Management -In most parts of science today, data processing and analysis comprises many different steps. We will refer to such a set of steps as a computational *workflow* (or, interchangeably, *pipeline*). If you have been doing science for very long, you have very likely encountered a *mega-script* that implements such a workflow. Usually written in a scripting language like *Bash*, this is a script that may be hundreds or even thousands of lines long that runs a single workflow from start to end. Often these scripts are handed down to new trainees over generations, such that users become afraid to make any changes lest the entire house of cards comes crashing down. 
I think that most of us can agree that this is not an optimal workflow, and in this chapter I will discuss in detail how to move from a mega-script to a workflow that will meet all of the requirements that are required to provide robust and reliable answers to our scientific questions. +In most parts of science today, data processing and analysis comprises many different steps. We will refer to such a set of steps as a computational *workflow*. If you have been doing science for very long, you have very likely encountered a *mega-script* that implements such a workflow. Usually written in a scripting language like *Bash*, this is a script that may be hundreds or even thousands of lines long that runs a single workflow from start to end. Often these scripts are handed down to new trainees over generations, such that users become afraid to make any changes lest the entire house of cards comes crashing down. I think that most of us can agree that this is not an optimal workflow, and in this chapter I will discuss in detail how to move from a mega-script to a workflow that will meet all of the requirements needed to provide robust and reliable answers to our scientific questions. ## What do we want from a scientific workflow? @@ -30,19 +30,128 @@ Finally, we care about the *efficiency* of the workflow implementation. This inc It's worth noting that these different desiderata will sometimes conflict with one another (such as configurability versus maintainability), and that no workflow will be perfect. -## An example workflow +### Pipelines versus workflows -In this chapter I will use a running example to show how to move from a monolithic analysis script to a well-structured and usable workflow that meets most of the desired features outlined above. +The terms *workflow* and *pipeline* are sometimes used interchangeably, but in this chapter I will use them to refer to different kinds of applications.
I will use *workflow* as the more general term to refer to any set of analysis procedures that are implemented as separate modules. I will use the term *pipeline* to refer more specifically to a data analysis workflow where several operations are combined into a single command through the use of *pipes*, a syntactic construct that feeds the output of one process directly into the next process as input. Some readers may be familiar with pipes from the UNIX command line, where they are represented by the vertical bar "|". For example, let's say that we had a log file that contains the following entries: +```bash +2024-01-15 10:23:45 ERROR: Database connection failed +2024-01-15 10:24:12 ERROR: Invalid user input +2024-01-15 10:25:33 ERROR: Database connection failed +2024-01-15 10:26:01 INFO: Request processed +2024-01-15 10:27:15 ERROR: Database connection failed +``` + +and that we wanted to generate a summary of errors. We could use the following pipeline: + +```bash +grep "ERROR" app.log | sed 's/.*ERROR: //' | sort | uniq -c | sort -rn > error_summary.txt + +``` + +where: + +- `grep "ERROR" app.log` extracts the lines containing the word "ERROR" +- `sed 's/.*ERROR: //'` replaces everything up to the actual message with an empty string +- `sort` sorts the rows alphabetically +- `uniq -c` counts the number of appearances of each unique error message +- `sort -rn` sorts the rows in descending numerical order, so that the most frequent error comes first +- `> error_summary.txt` redirects the output into a file called `error_summary.txt` + +#### Method chaining + +One way that simple pipelines can be built in Python is using *method chaining*, where the output of one method is passed directly into the next method. This is commonly used to perform data transformations in `pandas`, as it allows composing multiple transformations into a single command. As an example, we will work with the Eisenberg et al.
dataset that we used in a previous chapter, to compute the probability of having ever been arrested separately for males and females in the sample. To do this we need to perform a number of operations: + +- drop any observations that have missing values for the `Sex` or `ArrestedChargedLifeCount` variables +- replace the numeric values in the `Sex` variable with text labels +- create a new variable called `EverArrested` that binarizes the counts in the ArrestedChargedLifeCount variable +- group the data by the `Sex` variable +- select the column that we want to compute the mean of (`EverArrested`) +- compute the mean + +We can do this in a single command using method chaining in `pandas`. It's useful to format the code in a way that makes the pipeline steps explicit, by putting parentheses around the operation; in Python, any commands within parentheses are combined into a single command, which can be useful for making complex code more readable: + +```python +arrest_stats_by_sex = (df + .dropna(subset=['Sex', 'ArrestedChargedLifeCount']) + .replace({'Sex': {0: 'Male', 1: 'Female'}}) + .assign(EverArrested=lambda x: (x['ArrestedChargedLifeCount'] > 0).astype(int)) + .groupby('Sex') + ['EverArrested'] + .mean() +) +print(arrest_stats_by_sex) +``` +```bash +Sex +Female 0.156489 +Male 0.274131 +Name: EverArrested, dtype: float64 +``` + +Note that `pandas` data frames also include an explicit `.pipe` method that allows using arbitrary functions within a pipeline. While these kinds of pipelines can be useful for simple data processing operations, they can become very difficult to debug, so I would generally avoid using complex functions within a method chain. + + +## An example of a complex workflow + +In this chapter we will focus primarily on complex workflows that have many stages. I will use a running example to show how to move from a monolithic analysis script to a well-structured and usable workflow that meets most of the desired features outlined above. 
For this example I will use an analysis of single-cell RNA-sequencing data to determine how gene expression in immune system cells changes with age. This analysis will utilize a [large openly available dataset](https://cellxgene.cziscience.com/collections/dde06e0f-ab3b-46be-96a2-a8082383c4a1) that includes data from about 1.3 million immune system cells for about 35K transcripts. I chose this particular example for several reasons: + +- It is a realistic example of a workflow that a researcher might actually perform. +- The data are large enough to call for a real workflow management scheme, but small enough to be processed on a single laptop (assuming it has decent memory). +- The workflow has many different steps, some of which can take a significant amount of time (over 30 minutes) +- There is an established Python library ([scanpy](https://scanpy.readthedocs.io/en/stable/)) that implements the necessary workflow components. +- It's an example outside of my own research domain, to help demonstrate the applicability of the book's ideas across a broader set of data types. + +### Starting point: One huge notebook + +I developed the initial version of this workflow in a way that many researchers would do so: By creating a Jupyter notebook that implements the entire workflow, which can be found [here](). Although I don't usually prefer to do code generation using a chatbot, I did most of the coding for this example using the Google Gemini 3.0 chatbot, for a couple of reasons. First, this model seemed particularly knowledgeable about this kind of analysis and the relevant packages. Second, I found it useful to read the commentary about why particular analysis steps were being selected. For debugging I used a mixture of the Gemini 3.0 chatbot and the VSCode Copilot agent, depending on the nature of the problem; for problems specific to the RNA-seq analysis tools I used Gemini, while for standard Python/Pandas issues I used Copilot. 
The total execution time for this notebook is about two hours on an M3 Max Macbook Pro. + +#### The problem of in-place operations + +What I found as I developed the workflow is that I increasingly ran into problems that arose because the state of particular objects had changed. This occurred for two reasons at different points. In some cases it occurred because I saved a new version of the object to the same name, resulting in an object with different structure than before. + +#### Converting from Jupyter notebook to a runnable python script + +- had to prevent plots from being displayed because this blocked execution +- used copilot to find and fix all plotting commands to save them to file rather than showing + +## Decomposing a complex workflow + +The first thing we need to do with a large monolithic workflow is to determine how to decompose it into coherent modules. There are various reasons that one might choose a particular breakpoint between modules. First and foremost, there are usually different stages that do conceptually different things. 
In our example, we can break the workflow into several high-level processes:
+
+- Data (down)loading
+- Data filtering (removing subjects or cell types with insufficient observations)
+- Quality control
+    - Identifying bad cells on the basis of mitochondrial, ribosomal, or hemoglobin genes or hemoglobin contamination
+    - Identifying "doublets" (multiple cells identified as one)
+- Preprocessing
+    - Count normalization
+    - Log transformation
+    - Identification of high-variance features
+    - Filtering of nuisance genes
+- Dimensionality reduction
+- UMAP generation
+- Clustering
+- Pseudobulking
+- Differential expression analysis
+- Pathway enrichment analysis (GSEA)
+- Overrepresentation analysis (Enrichr)
+- Predictive modeling
+
+In addition to a conceptual breakdown, there are also other reasons that one might want to further decompose the workflow:
+
+- There may be points where one might need to restart the computation (e.g. due to computational cost)
+- There may be sections where one might wish to swap in a new method or different parameterization
+- There may be points where the output could be reusable elsewhere
+
+
+
+## Stateless workflows
+
+I asked Claude Code to help modularize the monolithic workflow, using a prompt that provided the conceptual breakdown described above. The resulting code (found at XXX) ran correctly at first, but crashed about two hours into the process because of a resource issue that appeared to result from requesting too many CPU cores in the differential expression analysis. This left me in the situation of having to rerun the entire two hours of preliminary workflow simply to get to a point where I could test my fix for the differential expression component, which is not a particularly efficient way of coding. The problem here is that the workflow execution is *stateful*, in the sense that the previous steps need to be rerun prior to performing the current step in order to establish the required objects in memory.
The solution to this problem is to implement the workflow in a *stateless* way, which doesn't require that earlier steps be rerun if they have already been completed. One way to do this is by implementing a process called *checkpointing*, in which intermediate results are stored for each step. These can then be used to start the workflow at any point without having to rerun all of the previous steps. -## Breaking a workflow into stages -good breakpoints between workflow modules include: -- conceptual logic - different stages do different things -- points where one might need to restart the computation (e.g. due to computational cost) -- sections where one might wish to swap in a new method or different parameterization -- points where the output could be reusable elsewhere the workflow should be stateless when possible From 10b4c998fd1688ae2f88c7d93df2a7dd0a45d392 Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Sun, 21 Dec 2025 08:56:41 -0800 Subject: [PATCH 32/87] add deps --- pyproject.toml | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/pyproject.toml b/pyproject.toml index 49d70ca..50c0f88 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -35,7 +35,6 @@ dependencies = [ "pre-commit>=4.2.0", "mdnewline>=0.1.3", "anthropic>=0.61.0", - "rpy2>=3.6.4", "nibabel>=5.3.2", "fastparquet>=2024.11.0", "templateflow>=25.1.1", @@ -47,7 +46,6 @@ dependencies = [ "datalad-osf>=0.3.0", "pymongo[srv]>=4.15.4", "mysql-connector-python>=9.5.0", - "mariadb>=1.1.14", "biopython>=1.86", "neo4j>=6.0.3", "tqdm>=4.66.5", @@ -76,6 +74,11 @@ dependencies = [ "fastcluster>=1.3.0", "scikit-misc>=0.5.2", "harmony-pytorch>=0.1.8", + "pydeseq2>=0.5.3", + "gseapy>=1.1.11", + "ipython>=9.8.0", + "harmonypy>=0.0.10", + "rpy2>=3.6.4", ] [build-system] From 84138d344a45b47ef6232983af16bf7729e81e7b Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Sun, 21 Dec 2025 08:56:50 -0800 Subject: [PATCH 33/87] add deps --- uv.lock | 281 
++++++++++++++++++++++++++++++++++++-------------------- 1 file changed, 182 insertions(+), 99 deletions(-) diff --git a/uv.lock b/uv.lock index 51b298c..996c206 100644 --- a/uv.lock +++ b/uv.lock @@ -394,17 +394,19 @@ dependencies = [ { name = "fastparquet" }, { name = "fmriprep-docker" }, { name = "gprofiler-official" }, + { name = "gseapy" }, { name = "h5py" }, { name = "harmony-pytorch" }, + { name = "harmonypy" }, { name = "hypothesis" }, { name = "icecream" }, { name = "igraph" }, + { name = "ipython" }, { name = "jupyter" }, { name = "jupyter-book" }, { name = "jupytext" }, { name = "leidenalg" }, { name = "linkcheckmd" }, - { name = "mariadb" }, { name = "matplotlib" }, { name = "mdnewline" }, { name = "mne" }, @@ -424,6 +426,7 @@ dependencies = [ { name = "pickleshare" }, { name = "pre-commit" }, { name = "pyarrow" }, + { name = "pydeseq2" }, { name = "pygithub" }, { name = "pymongo" }, { name = "pyppeteer" }, @@ -469,17 +472,19 @@ requires-dist = [ { name = "fastparquet", specifier = ">=2024.11.0" }, { name = "fmriprep-docker", specifier = ">=25.2.3" }, { name = "gprofiler-official", specifier = ">=1.0.0" }, + { name = "gseapy", specifier = ">=1.1.11" }, { name = "h5py", specifier = ">=3.15.1" }, { name = "harmony-pytorch", specifier = ">=0.1.8" }, + { name = "harmonypy", specifier = ">=0.0.10" }, { name = "hypothesis", specifier = ">=6.115.3" }, { name = "icecream", specifier = ">=2.1.4" }, { name = "igraph", specifier = ">=1.0.0" }, + { name = "ipython", specifier = ">=9.8.0" }, { name = "jupyter", specifier = ">=1.1.1" }, { name = "jupyter-book", specifier = ">=1.0.2" }, { name = "jupytext", specifier = ">=1.16.4" }, { name = "leidenalg", specifier = ">=0.11.0" }, { name = "linkcheckmd", specifier = ">=1.4.0" }, - { name = "mariadb", specifier = ">=1.1.14" }, { name = "matplotlib", specifier = ">=3.9.2" }, { name = "mdnewline", specifier = ">=0.1.3" }, { name = "mne", specifier = ">=1.11.0" }, @@ -499,6 +504,7 @@ requires-dist = [ { name = 
"pickleshare", specifier = ">=0.7.5" }, { name = "pre-commit", specifier = ">=4.2.0" }, { name = "pyarrow", specifier = ">=22.0.0" }, + { name = "pydeseq2", specifier = ">=0.5.3" }, { name = "pygithub", specifier = ">=2.4.0" }, { name = "pymongo", extras = ["srv"], specifier = ">=4.15.4" }, { name = "pyppeteer", specifier = ">=2.0.0" }, @@ -539,16 +545,16 @@ wheels = [ [[package]] name = "bidsschematools" -version = "1.1.3" +version = "1.1.4" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "acres" }, { name = "click" }, { name = "pyyaml" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/f6/e4/98ecf10fb56d0967619c7d03e78a895adce7546f24d6af6008297f07ba74/bidsschematools-1.1.3.tar.gz", hash = "sha256:a3df5a4dfa085cd3acd00be2f0770019eaa5c4e6827763dad522d3ebef735b3a", size = 1754718, upload-time = "2025-11-18T12:50:59.759Z" } +sdist = { url = "https://files.pythonhosted.org/packages/c9/1c/c9464d519150e88c1a5fdd298384313fff067405f58f08dcd9372561b933/bidsschematools-1.1.4.tar.gz", hash = "sha256:843b611050a2d294dde64e7774e0fb998f57cf96fe44127ac7e44b2bdaa3f750", size = 1757344, upload-time = "2025-12-19T01:06:54.75Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/3b/40/e91d484e66e24c684baac044ff703c9a4cec833be00cad4b21e8cb9d8f8d/bidsschematools-1.1.3-py3-none-any.whl", hash = "sha256:01334729cb84bb7c0f7a4ee026debfd562fbabc52e5dce3029a1e08fa9b8cce4", size = 180659, upload-time = "2025-11-18T12:50:57.27Z" }, + { url = "https://files.pythonhosted.org/packages/4f/ab/9f28c6637450c03ff0a950b92853262d65cc95fe9b595310483ec172a512/bidsschematools-1.1.4-py3-none-any.whl", hash = "sha256:8aa97c035ebff3d25b85a3e7554211e094333ff0887defeda4845d5477bd0394", size = 182396, upload-time = "2025-12-19T01:06:51.355Z" }, ] [[package]] @@ -666,30 +672,30 @@ wheels = [ [[package]] name = "boto3" -version = "1.42.12" +version = "1.42.14" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "botocore" }, { name = 
"jmespath" }, { name = "s3transfer" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/98/66/ffe9623d64e97800ff6bac26953cd9ef99410fb864a0b26a0ea2e09b97f0/boto3-1.42.12.tar.gz", hash = "sha256:649b134d25b278c24fcc8b3f94519de3884283b7848dc32f42b0ffdd9d19ce99", size = 112868, upload-time = "2025-12-17T20:30:42.394Z" } +sdist = { url = "https://files.pythonhosted.org/packages/09/72/e236ca627bc0461710685f5b7438f759ef3b4106e0e08dda08513a6539ab/boto3-1.42.14.tar.gz", hash = "sha256:a5d005667b480c844ed3f814a59f199ce249d0f5669532a17d06200c0a93119c", size = 112825, upload-time = "2025-12-19T20:27:15.325Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/3e/8b/20a90c75499e3c3a8e3eb5607d930c723577ef8c64968b9be6b743f18158/boto3-1.42.12-py3-none-any.whl", hash = "sha256:8112e1beb5978bb455ea4b41a9ef26fc408f6340d8ff69ef93dded4f80fd53e9", size = 140573, upload-time = "2025-12-17T20:30:40.063Z" }, + { url = "https://files.pythonhosted.org/packages/bb/ba/c657ea6f6d63563cc46748202fccd097b51755d17add00ebe4ea27580d06/boto3-1.42.14-py3-none-any.whl", hash = "sha256:bfcc665227bb4432a235cb4adb47719438d6472e5ccbf7f09512046c3f749670", size = 140571, upload-time = "2025-12-19T20:27:13.316Z" }, ] [[package]] name = "botocore" -version = "1.42.12" +version = "1.42.14" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "jmespath" }, { name = "python-dateutil" }, { name = "urllib3" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/a0/b6/9b7988a8476712cdbfeeb68c733933005465c85ebf0ee469a6ea5ca3415c/botocore-1.42.12.tar.gz", hash = "sha256:1f9f63c3d6bb1f768519da30d6018706443c5d8af5472274d183a4945f3d81f8", size = 14879004, upload-time = "2025-12-17T20:30:29.542Z" } +sdist = { url = "https://files.pythonhosted.org/packages/35/3f/50c56f093c2c6ce6de1f579726598db1cf9a9cccd3bf8693f73b1cf5e319/botocore-1.42.14.tar.gz", hash = "sha256:cf5bebb580803c6cfd9886902ca24834b42ecaa808da14fb8cd35ad523c9f621", size = 14910547, upload-time = 
"2025-12-19T20:27:04.431Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/8f/73/22764d0a17130b7d95b2a4104607e6db5487a0e5afb68f5691260ae9c3dc/botocore-1.42.12-py3-none-any.whl", hash = "sha256:4f163880350f6d831857ce5d023875b7c6534be862e5affd9fcf82b8d1ab3537", size = 14552878, upload-time = "2025-12-17T20:30:24.671Z" }, + { url = "https://files.pythonhosted.org/packages/ad/94/67a78a8d08359e779894d4b1672658a3c7fcce216b48f06dfbe1de45521d/botocore-1.42.14-py3-none-any.whl", hash = "sha256:efe89adfafa00101390ec2c371d453b3359d5f9690261bc3bd70131e0d453e8e", size = 14583247, upload-time = "2025-12-19T20:27:00.54Z" }, ] [[package]] @@ -1376,7 +1382,7 @@ wheels = [ [[package]] name = "fastapi" -version = "0.125.0" +version = "0.127.0" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "annotated-doc" }, @@ -1384,9 +1390,9 @@ dependencies = [ { name = "starlette" }, { name = "typing-extensions" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/17/71/2df15009fb4bdd522a069d2fbca6007c6c5487fce5cb965be00fc335f1d1/fastapi-0.125.0.tar.gz", hash = "sha256:16b532691a33e2c5dee1dac32feb31dc6eb41a3dd4ff29a95f9487cb21c054c0", size = 370550, upload-time = "2025-12-17T21:41:44.15Z" } +sdist = { url = "https://files.pythonhosted.org/packages/0c/02/2cbbecf6551e0c1a06f9b9765eb8f7ae126362fbba43babbb11b0e3b7db3/fastapi-0.127.0.tar.gz", hash = "sha256:5a9246e03dcd1fdb19f1396db30894867c1d630f5107dc167dcbc5ed1ea7d259", size = 369269, upload-time = "2025-12-21T16:47:16.393Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/34/2f/ff2fcc98f500713368d8b650e1bbc4a0b3ebcdd3e050dcdaad5f5a13fd7e/fastapi-0.125.0-py3-none-any.whl", hash = "sha256:2570ec4f3aecf5cca8f0428aed2398b774fcdfee6c2116f86e80513f2f86a7a1", size = 112888, upload-time = "2025-12-17T21:41:41.286Z" }, + { url = "https://files.pythonhosted.org/packages/8a/fa/6a27e2ef789eb03060abb43b952a7f0bd39e6feaa3805362b48785bcedc5/fastapi-0.127.0-py3-none-any.whl", hash = 
"sha256:725aa2bb904e2eff8031557cf4b9b77459bfedd63cae8427634744fd199f6a49", size = 112055, upload-time = "2025-12-21T16:47:14.757Z" }, ] [[package]] @@ -1460,7 +1466,7 @@ wheels = [ [[package]] name = "fastparquet" -version = "2024.11.0" +version = "2025.12.0" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "cramjam" }, @@ -1469,16 +1475,15 @@ dependencies = [ { name = "packaging" }, { name = "pandas" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/b4/66/862da14f5fde4eff2cedc0f51a8dc34ba145088e5041b45b2d57ac54f922/fastparquet-2024.11.0.tar.gz", hash = "sha256:e3b1fc73fd3e1b70b0de254bae7feb890436cb67e99458b88cb9bd3cc44db419", size = 467192, upload-time = "2024-11-15T19:30:10.413Z" } +sdist = { url = "https://files.pythonhosted.org/packages/1e/ad/87f7f5750685e8e0a359d732c85332481ba9b5723af579f8755f81154d0b/fastparquet-2025.12.0.tar.gz", hash = "sha256:85f807d3846c7691855a68ed7ff6ee40654b72b997f5b1199e6310a1e19d1cd5", size = 480045, upload-time = "2025-12-18T16:22:22.016Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/08/76/068ac7ec9b4fc783be21a75a6a90b8c0654da4d46934d969e524ce287787/fastparquet-2024.11.0-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:dbad4b014782bd38b58b8e9f514fe958cfa7a6c4e187859232d29fd5c5ddd849", size = 915968, upload-time = "2024-11-12T20:37:52.861Z" }, - { url = "https://files.pythonhosted.org/packages/c7/9e/6d3b4188ad64ed51173263c07109a5f18f9c84a44fa39ab524fca7420cda/fastparquet-2024.11.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:403d31109d398b6be7ce84fa3483fc277c6a23f0b321348c0a505eb098a041cb", size = 685399, upload-time = "2024-11-12T20:37:54.899Z" }, - { url = "https://files.pythonhosted.org/packages/8f/6c/809220bc9fbe83d107df2d664c3fb62fb81867be8f5218ac66c2e6b6a358/fastparquet-2024.11.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:cbbb9057a26acf0abad7adf58781ee357258b7708ee44a289e3bee97e2f55d42", size = 1758557, upload-time = 
"2024-11-12T20:37:56.553Z" }, - { url = "https://files.pythonhosted.org/packages/e0/2c/b3b3e6ca2e531484289024138cd4709c22512b3fe68066d7f9849da4a76c/fastparquet-2024.11.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:63e0e416e25c15daa174aad8ba991c2e9e5b0dc347e5aed5562124261400f87b", size = 1781052, upload-time = "2024-11-12T20:37:58.339Z" }, - { url = "https://files.pythonhosted.org/packages/21/fe/97ed45092d0311c013996dae633122b7a51c5d9fe8dcbc2c840dc491201e/fastparquet-2024.11.0-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:0e2d7f02f57231e6c86d26e9ea71953737202f20e948790e5d4db6d6a1a150dc", size = 1715797, upload-time = "2024-11-12T20:38:00.694Z" }, - { url = "https://files.pythonhosted.org/packages/24/df/02fa6aee6c0d53d1563b5bc22097076c609c4c5baa47056b0b4bed456fcf/fastparquet-2024.11.0-cp312-cp312-musllinux_1_2_i686.whl", hash = "sha256:fbe4468146b633d8f09d7b196fea0547f213cb5ce5f76e9d1beb29eaa9593a93", size = 1795682, upload-time = "2024-11-12T20:38:02.38Z" }, - { url = "https://files.pythonhosted.org/packages/b0/25/f4f87557589e1923ee0e3bebbc84f08b7c56962bf90f51b116ddc54f2c9f/fastparquet-2024.11.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:29d5c718817bcd765fc519b17f759cad4945974421ecc1931d3bdc3e05e57fa9", size = 1857842, upload-time = "2024-11-12T20:38:04.196Z" }, - { url = "https://files.pythonhosted.org/packages/b1/f9/98cd0c39115879be1044d59c9b76e8292776e99bb93565bf990078fd11c4/fastparquet-2024.11.0-cp312-cp312-win_amd64.whl", hash = "sha256:74a0b3c40ab373442c0fda96b75a36e88745d8b138fcc3a6143e04682cbbb8ca", size = 673269, upload-time = "2024-12-11T21:22:48.073Z" }, + { url = "https://files.pythonhosted.org/packages/6c/b2/229a4482d80a737d0fe6706c4f93adb631f42ec5b0a2b154247d63bb48fe/fastparquet-2025.12.0-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:27b1cf0557ddddbf0e28db64d4d3bea1384be1d245b2cef280d001811e3600fe", size = 896986, upload-time = 
"2025-12-18T21:53:52.611Z" }, + { url = "https://files.pythonhosted.org/packages/2c/c2/953117c43bf617379eff79ce8a2318ef49f7f41908faade051fa12281ac8/fastparquet-2025.12.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:9356c59e48825d61719960ccb9ce799ad5cd1b04f2f13368f03fab1f3c645d1e", size = 687642, upload-time = "2025-12-18T21:54:13.594Z" }, + { url = "https://files.pythonhosted.org/packages/92/35/41deaa9a4fc9ab6c00f3b49afe56cbafee13a111032aa41f23d077b69ad6/fastparquet-2025.12.0-cp312-cp312-manylinux1_i686.manylinux_2_28_i686.manylinux_2_5_i686.whl", hash = "sha256:c4c92e299a314d4b542dc881eeb4d587dc075c0a5a86c07ccf171d8852e9736d", size = 1764260, upload-time = "2025-12-18T21:58:11.197Z" }, + { url = "https://files.pythonhosted.org/packages/1a/0f/a229b3f699aaccc7b5ec3f5e21cff8aa99bc199499bff08cf38bc6ab52c6/fastparquet-2025.12.0-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:4881dc91c7e6d1d08cda9968ed1816b0c66a74b1826014c26713cad923aaca71", size = 1810920, upload-time = "2025-12-18T21:57:31.514Z" }, + { url = "https://files.pythonhosted.org/packages/90/c2/ca76afca0c2debef368a42a701d501e696490e0a7138f0337709a724b189/fastparquet-2025.12.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:d8d70d90614f19752919037c4a88aaaeda3cd7667aeb54857c48054e2a9e3588", size = 1819692, upload-time = "2025-12-18T21:58:43.095Z" }, + { url = "https://files.pythonhosted.org/packages/ab/41/f235c0d8171f6676b9d4fb8468c781fbe7bf90fed2c4383f2d8d82e574db/fastparquet-2025.12.0-cp312-cp312-musllinux_1_2_i686.whl", hash = "sha256:8e2ccf387f629cb11b72fec6f15a55e0f40759b47713124764a9867097bcd377", size = 1784357, upload-time = "2025-12-18T21:58:13.258Z" }, + { url = "https://files.pythonhosted.org/packages/29/7e/c86bf33b363cf5a1ad71d3ebd4a352131ba99566c78aa58d9e56c98526ba/fastparquet-2025.12.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = 
"sha256:1978e7f3c32044f2f7a0b35784240dfc3eaeb8065a879fa3011c832fea4e7037", size = 1815777, upload-time = "2025-12-18T21:58:44.432Z" }, ] [[package]] @@ -1506,11 +1511,10 @@ wheels = [ [[package]] name = "flatbuffers" -version = "25.9.23" +version = "25.12.19" source = { registry = "https://pypi.org/simple" } -sdist = { url = "https://files.pythonhosted.org/packages/9d/1f/3ee70b0a55137442038f2a33469cc5fddd7e0ad2abf83d7497c18a2b6923/flatbuffers-25.9.23.tar.gz", hash = "sha256:676f9fa62750bb50cf531b42a0a2a118ad8f7f797a511eda12881c016f093b12", size = 22067, upload-time = "2025-09-24T05:25:30.106Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/ee/1b/00a78aa2e8fbd63f9af08c9c19e6deb3d5d66b4dda677a0f61654680ee89/flatbuffers-25.9.23-py2.py3-none-any.whl", hash = "sha256:255538574d6cb6d0a79a17ec8bc0d30985913b87513a01cce8bcdb6b4c44d0e2", size = 30869, upload-time = "2025-09-24T05:25:28.912Z" }, + { url = "https://files.pythonhosted.org/packages/e8/2d/d2a548598be01649e2d46231d151a6c56d10b964d94043a335ae56ea2d92/flatbuffers-25.12.19-py2.py3-none-any.whl", hash = "sha256:7634f50c427838bb021c2d66a3d1168e9d199b0607e6329399f04846d42e20b4", size = 26661, upload-time = "2025-12-19T23:16:13.622Z" }, ] [[package]] @@ -1557,6 +1561,20 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/1a/9d/c2c8b51b32f829a16fe042db30ad1dcef7947bf1dcf77c2cfd7b6f37b83a/formulaic-1.2.1-py3-none-any.whl", hash = "sha256:661d6d2467aa961b9afb3a1e2a187494239793c63eb729e422d1307afa98b43b", size = 117290, upload-time = "2025-09-21T05:27:30.025Z" }, ] +[[package]] +name = "formulaic-contrasts" +version = "1.0.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "formulaic" }, + { name = "pandas" }, + { name = "session-info" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/28/e6/4850976c248746062cfaa08628b3ec5ba3dfcab3d6ecd0d3886c36c04681/formulaic_contrasts-1.0.0.tar.gz", hash = 
"sha256:0a575a810bf1fba28938259d86a3ae2ae90cb9826fca84b9409085170862f701", size = 123794, upload-time = "2024-12-15T13:44:06.844Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/40/7b/639411281256c84e8111bf6cb9676c44dbf5d8ad4cb042f4359b7e7b9e74/formulaic_contrasts-1.0.0-py3-none-any.whl", hash = "sha256:e1220d315cf446bdec9385375ca4da43896e4ba68114ebea1b2a37efa5d097f5", size = 10054, upload-time = "2024-12-15T13:44:05.454Z" }, +] + [[package]] name = "fqdn" version = "1.5.1" @@ -1723,6 +1741,26 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/9e/00/7bd478cbb851c04a48baccaa49b75abaa8e4122f7d86da797500cccdd771/grpcio-1.76.0-cp312-cp312-win_amd64.whl", hash = "sha256:c088e7a90b6017307f423efbb9d1ba97a22aa2170876223f9709e9d1de0b5347", size = 4704003, upload-time = "2025-10-21T16:21:46.244Z" }, ] +[[package]] +name = "gseapy" +version = "1.1.11" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "matplotlib" }, + { name = "numpy" }, + { name = "pandas" }, + { name = "requests" }, + { name = "scipy" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/1c/78/7c0fbec6019db95dadb560049cc503e950438488b3b0822a2270e1f62d2a/gseapy-1.1.11.tar.gz", hash = "sha256:d36a164ee466f7ea6deadfe82ea041f3328ee937ff4c9de862b3e6e2825df0dd", size = 116084, upload-time = "2025-11-16T22:55:26.486Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/07/71/8034311bc4a7a41414cd188da9b411b4cd0c357574b01d8609d6e9a1d336/gseapy-1.1.11-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:4d22c2ec30dc863b86292a0e967f8c7216ef03028b41f1ece6c59d277a870bdc", size = 533921, upload-time = "2025-11-16T23:02:53.061Z" }, + { url = "https://files.pythonhosted.org/packages/0d/3e/c3c23ff829d6a88c403cda12ed856ff93c7f07c510e3bf5c114a4d2f575e/gseapy-1.1.11-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:48453a402feae0412f6330a3c39f95ab02e82693f33bc4a1c6c02c867e7e6d1c", size = 605338, upload-time = 
"2025-11-16T22:58:48.471Z" }, + { url = "https://files.pythonhosted.org/packages/2a/9f/592125b3eabb64ecaf3275e4f0cff7dc59d438f8ffde360d686802787bc5/gseapy-1.1.11-cp312-cp312-manylinux_2_28_aarch64.whl", hash = "sha256:6c64e60a8f61047c7d4053b791c7dea1c375bd28b955f0c50ae3cd607013c47f", size = 585366, upload-time = "2025-11-16T23:41:36.192Z" }, + { url = "https://files.pythonhosted.org/packages/fa/a6/2b86532b665a3dd50d5bf5390e3b751487a6b6f64eddff1c21ea9d302fef/gseapy-1.1.11-cp312-cp312-win32.whl", hash = "sha256:18ba31a03b043b7a78397c0589f04d0f4d7a3ff76af09e219f0240085708c4c6", size = 391320, upload-time = "2025-11-16T23:05:48.509Z" }, + { url = "https://files.pythonhosted.org/packages/9c/ab/6374ddf4cd4637b0cab1e9cd2dd8b1bf007bdf1e9fe1bf8bff2d83482a9b/gseapy-1.1.11-cp312-cp312-win_amd64.whl", hash = "sha256:5645f8f8c88a9218225a7c207d6d1de9eed9955f108ad2a06c46f42885ba4fa8", size = 423182, upload-time = "2025-11-16T23:03:12.868Z" }, +] + [[package]] name = "gunicorn" version = "23.0.0" @@ -1780,6 +1818,21 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/75/da/42486f1c79b6f2db9140ee23161791e5b25d9369f30c1d9f67b67f3eb4bf/harmony_pytorch-0.1.8-py3-none-any.whl", hash = "sha256:1f92f6145ea93225b0226fda9da5bdd442e411d14ff402052afae0fde7fd1452", size = 8474, upload-time = "2024-01-07T21:36:54.488Z" }, ] +[[package]] +name = "harmonypy" +version = "0.0.10" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "numpy" }, + { name = "pandas" }, + { name = "scikit-learn" }, + { name = "scipy" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/1a/69/9af6183745618057797b940a76320c52a38ad2a69e688e6345e2a0219655/harmonypy-0.0.10.tar.gz", hash = "sha256:27bd39a6f9ada1708ffa577e46c9b7363d1e2fd62740e477ce11fd61819a54df", size = 20339, upload-time = "2024-07-04T20:55:06.385Z" } +wheels = [ + { url = 
"https://files.pythonhosted.org/packages/cc/cd/9479dd66e503af191edc016a302d2125c4f02ea777ebea1e48f6b944b073/harmonypy-0.0.10-py3-none-any.whl", hash = "sha256:dab528052f909204e521c9c2bd980221c64003538b0c0fe25be2e43c1199282b", size = 20885, upload-time = "2024-07-04T20:55:00.329Z" }, +] + [[package]] name = "hbreader" version = "0.9.1" @@ -1880,11 +1933,11 @@ wheels = [ [[package]] name = "humanize" -version = "4.14.0" +version = "4.15.0" source = { registry = "https://pypi.org/simple" } -sdist = { url = "https://files.pythonhosted.org/packages/b6/43/50033d25ad96a7f3845f40999b4778f753c3901a11808a584fed7c00d9f5/humanize-4.14.0.tar.gz", hash = "sha256:2fa092705ea640d605c435b1ca82b2866a1b601cdf96f076d70b79a855eba90d", size = 82939, upload-time = "2025-10-15T13:04:51.214Z" } +sdist = { url = "https://files.pythonhosted.org/packages/ba/66/a3921783d54be8a6870ac4ccffcd15c4dc0dd7fcce51c6d63b8c63935276/humanize-4.15.0.tar.gz", hash = "sha256:1dd098483eb1c7ee8e32eb2e99ad1910baefa4b75c3aff3a82f4d78688993b10", size = 83599, upload-time = "2025-12-20T20:16:13.19Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/c3/5b/9512c5fb6c8218332b530f13500c6ff5f3ce3342f35e0dd7be9ac3856fd3/humanize-4.14.0-py3-none-any.whl", hash = "sha256:d57701248d040ad456092820e6fde56c930f17749956ac47f4f655c0c547bfff", size = 132092, upload-time = "2025-10-15T13:04:49.404Z" }, + { url = "https://files.pythonhosted.org/packages/c5/7b/bca5613a0c3b542420cf92bd5e5fb8ebd5435ce1011a091f66bb7693285e/humanize-4.15.0-py3-none-any.whl", hash = "sha256:b1186eb9f5a9749cd9cb8565aee77919dd7c8d076161cf44d70e59e3301e1769", size = 132203, upload-time = "2025-12-20T20:16:11.67Z" }, ] [[package]] @@ -1995,14 +2048,14 @@ wheels = [ [[package]] name = "importlib-metadata" -version = "8.7.0" +version = "8.7.1" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "zipp" }, ] -sdist = { url = 
"https://files.pythonhosted.org/packages/76/66/650a33bd90f786193e4de4b3ad86ea60b53c89b669a5c7be931fac31cdb0/importlib_metadata-8.7.0.tar.gz", hash = "sha256:d13b81ad223b890aa16c5471f2ac3056cf76c5f10f82d6f9292f0b415f389000", size = 56641, upload-time = "2025-04-27T15:29:01.736Z" } +sdist = { url = "https://files.pythonhosted.org/packages/f3/49/3b30cad09e7771a4982d9975a8cbf64f00d4a1ececb53297f1d9a7be1b10/importlib_metadata-8.7.1.tar.gz", hash = "sha256:49fef1ae6440c182052f407c8d34a68f72efc36db9ca90dc0113398f2fdde8bb", size = 57107, upload-time = "2025-12-21T10:00:19.278Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/20/b0/36bd937216ec521246249be3bf9855081de4c5e06a0c9b4219dbeda50373/importlib_metadata-8.7.0-py3-none-any.whl", hash = "sha256:e5dd1551894c77868a30651cef00984d50e1002d06942a7101d34870c5f02afd", size = 27656, upload-time = "2025-04-27T15:29:00.214Z" }, + { url = "https://files.pythonhosted.org/packages/fa/5e/f8e9a1d23b9c20a551a8a02ea3637b4642e22c2626e3a13a9a29cdea99eb/importlib_metadata-8.7.1-py3-none-any.whl", hash = "sha256:5a1f80bf1daa489495071efbb095d75a634cf28a8bc299581244063b53176151", size = 27865, upload-time = "2025-12-21T10:00:18.329Z" }, ] [[package]] @@ -2158,14 +2211,14 @@ wheels = [ [[package]] name = "jaraco-functools" -version = "4.3.0" +version = "4.4.0" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "more-itertools" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/f7/ed/1aa2d585304ec07262e1a83a9889880701079dde796ac7b1d1826f40c63d/jaraco_functools-4.3.0.tar.gz", hash = "sha256:cfd13ad0dd2c47a3600b439ef72d8615d482cedcff1632930d6f28924d92f294", size = 19755, upload-time = "2025-08-18T20:05:09.91Z" } +sdist = { url = "https://files.pythonhosted.org/packages/0f/27/056e0638a86749374d6f57d0b0db39f29509cce9313cf91bdc0ac4d91084/jaraco_functools-4.4.0.tar.gz", hash = "sha256:da21933b0417b89515562656547a77b4931f98176eb173644c0d35032a33d6bb", size = 19943, upload-time = 
"2025-12-21T09:29:43.6Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/b4/09/726f168acad366b11e420df31bf1c702a54d373a83f968d94141a8c3fde0/jaraco_functools-4.3.0-py3-none-any.whl", hash = "sha256:227ff8ed6f7b8f62c56deff101545fa7543cf2c8e7b82a7c2116e672f29c26e8", size = 10408, upload-time = "2025-08-18T20:05:08.69Z" }, + { url = "https://files.pythonhosted.org/packages/fd/c4/813bb09f0985cb21e959f21f2464169eca882656849adf727ac7bb7e1767/jaraco_functools-4.4.0-py3-none-any.whl", hash = "sha256:9eec1e36f45c818d9bf307c8948eb03b2b56cd44087b3cdc989abca1f20b9176", size = 10481, upload-time = "2025-12-21T09:29:42.27Z" }, ] [[package]] @@ -2890,19 +2943,6 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/ea/7b/93c73c67db235931527301ed3785f849c78991e2e34f3fd9a6663ffda4c5/lxml-6.0.2-cp312-cp312-win_arm64.whl", hash = "sha256:61cb10eeb95570153e0c0e554f58df92ecf5109f75eacad4a95baa709e26c3d6", size = 3672836, upload-time = "2025-09-22T04:01:52.145Z" }, ] -[[package]] -name = "mariadb" -version = "1.1.14" -source = { registry = "https://pypi.org/simple" } -dependencies = [ - { name = "packaging" }, -] -sdist = { url = "https://files.pythonhosted.org/packages/c4/ba/cedef19833be88e07bfff11964441cda8a998f1628dd3b2fa3e7751d36e0/mariadb-1.1.14.tar.gz", hash = "sha256:e6d702a53eccf20922e47f2f45cfb5c7a0c2c6c0a46e4ee2d8a80d0ff4a52f34", size = 111715, upload-time = "2025-10-07T06:45:48.017Z" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/00/04/659a8d30513700b5921ec96bddc07f550016c045fcbeb199d8cd18476ecc/mariadb-1.1.14-cp312-cp312-win32.whl", hash = "sha256:98d552a8bb599eceaa88f65002ad00bd88aeed160592c273a7e5c1d79ab733dd", size = 185266, upload-time = "2025-10-07T06:45:34.164Z" }, - { url = "https://files.pythonhosted.org/packages/e4/a9/8f210291bc5fc044e20497454f40d35b3bab326e2cab6fccdc38121cb2c1/mariadb-1.1.14-cp312-cp312-win_amd64.whl", hash = "sha256:685a1ad2a24fd0aae1c4416fe0ac794adc84ab9209c8d0c57078f770d39731db", size = 202112, 
upload-time = "2025-10-07T06:45:35.824Z" }, -] - [[package]] name = "markdown-it-py" version = "4.0.0" @@ -3254,7 +3294,7 @@ wheels = [ [[package]] name = "nbclient" -version = "0.10.2" +version = "0.10.3" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "jupyter-client" }, @@ -3262,9 +3302,9 @@ dependencies = [ { name = "nbformat" }, { name = "traitlets" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/87/66/7ffd18d58eae90d5721f9f39212327695b749e23ad44b3881744eaf4d9e8/nbclient-0.10.2.tar.gz", hash = "sha256:90b7fc6b810630db87a6d0c2250b1f0ab4cf4d3c27a299b0cde78a4ed3fd9193", size = 62424, upload-time = "2024-12-19T10:32:27.164Z" } +sdist = { url = "https://files.pythonhosted.org/packages/8d/f3/1f6cf2ede4b026bc5f0b424cb41adf22f9c804e90a4dbd4fdb42291a35d5/nbclient-0.10.3.tar.gz", hash = "sha256:0baf171ee246e3bb2391da0635e719f27dc77d99aef59e0b04dcb935ee04c575", size = 62564, upload-time = "2025-12-19T15:50:09.331Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/34/6d/e7fa07f03a4a7b221d94b4d586edb754a9b0dc3c9e2c93353e9fa4e0d117/nbclient-0.10.2-py3-none-any.whl", hash = "sha256:4ffee11e788b4a27fabeb7955547e4318a5298f34342a4bfd01f2e1faaeadc3d", size = 25434, upload-time = "2024-12-19T10:32:24.139Z" }, + { url = "https://files.pythonhosted.org/packages/b2/77/0c73678f5260501a271fd7342bee5d639440f2e9e07d590f1100a056d87c/nbclient-0.10.3-py3-none-any.whl", hash = "sha256:39e9bd403504dd2484dd0fd25235bb6a683ce8cd9873356e40d880696adc9e35", size = 25473, upload-time = "2025-12-19T15:50:07.671Z" }, ] [[package]] @@ -3728,7 +3768,7 @@ wheels = [ [[package]] name = "openai" -version = "2.13.0" +version = "2.14.0" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "anyio" }, @@ -3740,9 +3780,9 @@ dependencies = [ { name = "tqdm" }, { name = "typing-extensions" }, ] -sdist = { url = 
"https://files.pythonhosted.org/packages/0f/39/8e347e9fda125324d253084bb1b82407e5e3c7777a03dc398f79b2d95626/openai-2.13.0.tar.gz", hash = "sha256:9ff633b07a19469ec476b1e2b5b26c5ef700886524a7a72f65e6f0b5203142d5", size = 626583, upload-time = "2025-12-16T18:19:44.387Z" } +sdist = { url = "https://files.pythonhosted.org/packages/d8/b1/12fe1c196bea326261718eb037307c1c1fe1dedc2d2d4de777df822e6238/openai-2.14.0.tar.gz", hash = "sha256:419357bedde9402d23bf8f2ee372fca1985a73348debba94bddff06f19459952", size = 626938, upload-time = "2025-12-19T03:28:45.742Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/bb/d5/eb52edff49d3d5ea116e225538c118699ddeb7c29fa17ec28af14bc10033/openai-2.13.0-py3-none-any.whl", hash = "sha256:746521065fed68df2f9c2d85613bb50844343ea81f60009b60e6a600c9352c79", size = 1066837, upload-time = "2025-12-16T18:19:43.124Z" }, + { url = "https://files.pythonhosted.org/packages/27/4b/7c1a00c2c3fbd004253937f7520f692a9650767aa73894d7a34f0d65d3f4/openai-2.14.0-py3-none-any.whl", hash = "sha256:7ea40aca4ffc4c4a776e77679021b47eec1160e341f42ae086ba949c9dcc9183", size = 1067558, upload-time = "2025-12-19T03:28:43.727Z" }, ] [[package]] @@ -4423,6 +4463,25 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/f7/07/34573da085946b6a313d7c42f82f16e8920bfd730665de2d11c0c37a74b5/pydantic_core-2.41.5-graalpy312-graalpy250_312_native-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:76d0819de158cd855d1cbb8fcafdf6f5cf1eb8e470abe056d5d161106e38062b", size = 2139017, upload-time = "2025-11-04T13:42:59.471Z" }, ] +[[package]] +name = "pydeseq2" +version = "0.5.3" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "anndata" }, + { name = "formulaic" }, + { name = "formulaic-contrasts" }, + { name = "matplotlib" }, + { name = "numpy" }, + { name = "pandas" }, + { name = "scikit-learn" }, + { name = "scipy" }, +] +sdist = { url = 
"https://files.pythonhosted.org/packages/58/e2/92bab7a299821396baca1a8c600c273ee13c5a081bcd788adc8e4a000ccc/pydeseq2-0.5.3.tar.gz", hash = "sha256:7dcbb34f80ce8147f566e9080d259b88df193174138983c4a77c1fe18ff3fe76", size = 790037, upload-time = "2025-10-28T15:43:41.284Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/fa/68/c969b97f7147090273f147a01cf48d87b26296775f02b3caacd4c061ce34/pydeseq2-0.5.3-py3-none-any.whl", hash = "sha256:113842dedaeffdeac0873ed498af2408c58b99fb646a89013f0c7710ae796608", size = 48420, upload-time = "2025-10-28T15:35:51.995Z" }, +] + [[package]] name = "pyee" version = "11.1.1" @@ -5112,28 +5171,28 @@ wheels = [ [[package]] name = "ruff" -version = "0.14.9" -source = { registry = "https://pypi.org/simple" } -sdist = { url = "https://files.pythonhosted.org/packages/f6/1b/ab712a9d5044435be8e9a2beb17cbfa4c241aa9b5e4413febac2a8b79ef2/ruff-0.14.9.tar.gz", hash = "sha256:35f85b25dd586381c0cc053f48826109384c81c00ad7ef1bd977bfcc28119d5b", size = 5809165, upload-time = "2025-12-11T21:39:47.381Z" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/b8/1c/d1b1bba22cffec02351c78ab9ed4f7d7391876e12720298448b29b7229c1/ruff-0.14.9-py3-none-linux_armv6l.whl", hash = "sha256:f1ec5de1ce150ca6e43691f4a9ef5c04574ad9ca35c8b3b0e18877314aba7e75", size = 13576541, upload-time = "2025-12-11T21:39:14.806Z" }, - { url = "https://files.pythonhosted.org/packages/94/ab/ffe580e6ea1fca67f6337b0af59fc7e683344a43642d2d55d251ff83ceae/ruff-0.14.9-py3-none-macosx_10_12_x86_64.whl", hash = "sha256:ed9d7417a299fc6030b4f26333bf1117ed82a61ea91238558c0268c14e00d0c2", size = 13779363, upload-time = "2025-12-11T21:39:20.29Z" }, - { url = "https://files.pythonhosted.org/packages/7d/f8/2be49047f929d6965401855461e697ab185e1a6a683d914c5c19c7962d9e/ruff-0.14.9-py3-none-macosx_11_0_arm64.whl", hash = "sha256:d5dc3473c3f0e4a1008d0ef1d75cee24a48e254c8bed3a7afdd2b4392657ed2c", size = 12925292, upload-time = "2025-12-11T21:39:38.757Z" }, - { url = 
"https://files.pythonhosted.org/packages/9e/e9/08840ff5127916bb989c86f18924fd568938b06f58b60e206176f327c0fe/ruff-0.14.9-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:84bf7c698fc8f3cb8278830fb6b5a47f9bcc1ed8cb4f689b9dd02698fa840697", size = 13362894, upload-time = "2025-12-11T21:39:02.524Z" }, - { url = "https://files.pythonhosted.org/packages/31/1c/5b4e8e7750613ef43390bb58658eaf1d862c0cc3352d139cd718a2cea164/ruff-0.14.9-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:aa733093d1f9d88a5d98988d8834ef5d6f9828d03743bf5e338bf980a19fce27", size = 13311482, upload-time = "2025-12-11T21:39:17.51Z" }, - { url = "https://files.pythonhosted.org/packages/5b/3a/459dce7a8cb35ba1ea3e9c88f19077667a7977234f3b5ab197fad240b404/ruff-0.14.9-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:6a1cfb04eda979b20c8c19550c8b5f498df64ff8da151283311ce3199e8b3648", size = 14016100, upload-time = "2025-12-11T21:39:41.948Z" }, - { url = "https://files.pythonhosted.org/packages/a6/31/f064f4ec32524f9956a0890fc6a944e5cf06c63c554e39957d208c0ffc45/ruff-0.14.9-py3-none-manylinux_2_17_ppc64.manylinux2014_ppc64.whl", hash = "sha256:1e5cb521e5ccf0008bd74d5595a4580313844a42b9103b7388eca5a12c970743", size = 15477729, upload-time = "2025-12-11T21:39:23.279Z" }, - { url = "https://files.pythonhosted.org/packages/7a/6d/f364252aad36ccd443494bc5f02e41bf677f964b58902a17c0b16c53d890/ruff-0.14.9-py3-none-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:cd429a8926be6bba4befa8cdcf3f4dd2591c413ea5066b1e99155ed245ae42bb", size = 15122386, upload-time = "2025-12-11T21:39:33.125Z" }, - { url = "https://files.pythonhosted.org/packages/20/02/e848787912d16209aba2799a4d5a1775660b6a3d0ab3944a4ccc13e64a02/ruff-0.14.9-py3-none-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:ab208c1b7a492e37caeaf290b1378148f75e13c2225af5d44628b95fd7834273", size = 14497124, upload-time = "2025-12-11T21:38:59.33Z" }, - { url = 
"https://files.pythonhosted.org/packages/f3/51/0489a6a5595b7760b5dbac0dd82852b510326e7d88d51dbffcd2e07e3ff3/ruff-0.14.9-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:72034534e5b11e8a593f517b2f2f2b273eb68a30978c6a2d40473ad0aaa4cb4a", size = 14195343, upload-time = "2025-12-11T21:39:44.866Z" }, - { url = "https://files.pythonhosted.org/packages/f6/53/3bb8d2fa73e4c2f80acc65213ee0830fa0c49c6479313f7a68a00f39e208/ruff-0.14.9-py3-none-manylinux_2_31_riscv64.whl", hash = "sha256:712ff04f44663f1b90a1195f51525836e3413c8a773574a7b7775554269c30ed", size = 14346425, upload-time = "2025-12-11T21:39:05.927Z" }, - { url = "https://files.pythonhosted.org/packages/ad/04/bdb1d0ab876372da3e983896481760867fc84f969c5c09d428e8f01b557f/ruff-0.14.9-py3-none-musllinux_1_2_aarch64.whl", hash = "sha256:a111fee1db6f1d5d5810245295527cda1d367c5aa8f42e0fca9a78ede9b4498b", size = 13258768, upload-time = "2025-12-11T21:39:08.691Z" }, - { url = "https://files.pythonhosted.org/packages/40/d9/8bf8e1e41a311afd2abc8ad12be1b6c6c8b925506d9069b67bb5e9a04af3/ruff-0.14.9-py3-none-musllinux_1_2_armv7l.whl", hash = "sha256:8769efc71558fecc25eb295ddec7d1030d41a51e9dcf127cbd63ec517f22d567", size = 13326939, upload-time = "2025-12-11T21:39:53.842Z" }, - { url = "https://files.pythonhosted.org/packages/f4/56/a213fa9edb6dd849f1cfbc236206ead10913693c72a67fb7ddc1833bf95d/ruff-0.14.9-py3-none-musllinux_1_2_i686.whl", hash = "sha256:347e3bf16197e8a2de17940cd75fd6491e25c0aa7edf7d61aa03f146a1aa885a", size = 13578888, upload-time = "2025-12-11T21:39:35.988Z" }, - { url = "https://files.pythonhosted.org/packages/33/09/6a4a67ffa4abae6bf44c972a4521337ffce9cbc7808faadede754ef7a79c/ruff-0.14.9-py3-none-musllinux_1_2_x86_64.whl", hash = "sha256:7715d14e5bccf5b660f54516558aa94781d3eb0838f8e706fb60e3ff6eff03a8", size = 14314473, upload-time = "2025-12-11T21:39:50.78Z" }, - { url = 
"https://files.pythonhosted.org/packages/12/0d/15cc82da5d83f27a3c6b04f3a232d61bc8c50d38a6cd8da79228e5f8b8d6/ruff-0.14.9-py3-none-win32.whl", hash = "sha256:df0937f30aaabe83da172adaf8937003ff28172f59ca9f17883b4213783df197", size = 13202651, upload-time = "2025-12-11T21:39:26.628Z" }, - { url = "https://files.pythonhosted.org/packages/32/f7/c78b060388eefe0304d9d42e68fab8cffd049128ec466456cef9b8d4f06f/ruff-0.14.9-py3-none-win_amd64.whl", hash = "sha256:c0b53a10e61df15a42ed711ec0bda0c582039cf6c754c49c020084c55b5b0bc2", size = 14702079, upload-time = "2025-12-11T21:39:11.954Z" }, - { url = "https://files.pythonhosted.org/packages/26/09/7a9520315decd2334afa65ed258fed438f070e31f05a2e43dd480a5e5911/ruff-0.14.9-py3-none-win_arm64.whl", hash = "sha256:8e821c366517a074046d92f0e9213ed1c13dbc5b37a7fc20b07f79b64d62cc84", size = 13744730, upload-time = "2025-12-11T21:39:29.659Z" }, +version = "0.14.10" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/57/08/52232a877978dd8f9cf2aeddce3e611b40a63287dfca29b6b8da791f5e8d/ruff-0.14.10.tar.gz", hash = "sha256:9a2e830f075d1a42cd28420d7809ace390832a490ed0966fe373ba288e77aaf4", size = 5859763, upload-time = "2025-12-18T19:28:57.98Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/60/01/933704d69f3f05ee16ef11406b78881733c186fe14b6a46b05cfcaf6d3b2/ruff-0.14.10-py3-none-linux_armv6l.whl", hash = "sha256:7a3ce585f2ade3e1f29ec1b92df13e3da262178df8c8bdf876f48fa0e8316c49", size = 13527080, upload-time = "2025-12-18T19:29:25.642Z" }, + { url = "https://files.pythonhosted.org/packages/df/58/a0349197a7dfa603ffb7f5b0470391efa79ddc327c1e29c4851e85b09cc5/ruff-0.14.10-py3-none-macosx_10_12_x86_64.whl", hash = "sha256:674f9be9372907f7257c51f1d4fc902cb7cf014b9980152b802794317941f08f", size = 13797320, upload-time = "2025-12-18T19:29:02.571Z" }, + { url = 
"https://files.pythonhosted.org/packages/7b/82/36be59f00a6082e38c23536df4e71cdbc6af8d7c707eade97fcad5c98235/ruff-0.14.10-py3-none-macosx_11_0_arm64.whl", hash = "sha256:d85713d522348837ef9df8efca33ccb8bd6fcfc86a2cde3ccb4bc9d28a18003d", size = 12918434, upload-time = "2025-12-18T19:28:51.202Z" }, + { url = "https://files.pythonhosted.org/packages/a6/00/45c62a7f7e34da92a25804f813ebe05c88aa9e0c25e5cb5a7d23dd7450e3/ruff-0.14.10-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:6987ebe0501ae4f4308d7d24e2d0fe3d7a98430f5adfd0f1fead050a740a3a77", size = 13371961, upload-time = "2025-12-18T19:29:04.991Z" }, + { url = "https://files.pythonhosted.org/packages/40/31/a5906d60f0405f7e57045a70f2d57084a93ca7425f22e1d66904769d1628/ruff-0.14.10-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:16a01dfb7b9e4eee556fbfd5392806b1b8550c9b4a9f6acd3dbe6812b193c70a", size = 13275629, upload-time = "2025-12-18T19:29:21.381Z" }, + { url = "https://files.pythonhosted.org/packages/3e/60/61c0087df21894cf9d928dc04bcd4fb10e8b2e8dca7b1a276ba2155b2002/ruff-0.14.10-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:7165d31a925b7a294465fa81be8c12a0e9b60fb02bf177e79067c867e71f8b1f", size = 14029234, upload-time = "2025-12-18T19:29:00.132Z" }, + { url = "https://files.pythonhosted.org/packages/44/84/77d911bee3b92348b6e5dab5a0c898d87084ea03ac5dc708f46d88407def/ruff-0.14.10-py3-none-manylinux_2_17_ppc64.manylinux2014_ppc64.whl", hash = "sha256:c561695675b972effb0c0a45db233f2c816ff3da8dcfbe7dfc7eed625f218935", size = 15449890, upload-time = "2025-12-18T19:28:53.573Z" }, + { url = "https://files.pythonhosted.org/packages/e9/36/480206eaefa24a7ec321582dda580443a8f0671fdbf6b1c80e9c3e93a16a/ruff-0.14.10-py3-none-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:4bb98fcbbc61725968893682fd4df8966a34611239c9fd07a1f6a07e7103d08e", size = 15123172, upload-time = "2025-12-18T19:29:23.453Z" }, + { url = 
"https://files.pythonhosted.org/packages/5c/38/68e414156015ba80cef5473d57919d27dfb62ec804b96180bafdeaf0e090/ruff-0.14.10-py3-none-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:f24b47993a9d8cb858429e97bdf8544c78029f09b520af615c1d261bf827001d", size = 14460260, upload-time = "2025-12-18T19:29:27.808Z" }, + { url = "https://files.pythonhosted.org/packages/b3/19/9e050c0dca8aba824d67cc0db69fb459c28d8cd3f6855b1405b3f29cc91d/ruff-0.14.10-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:59aabd2e2c4fd614d2862e7939c34a532c04f1084476d6833dddef4afab87e9f", size = 14229978, upload-time = "2025-12-18T19:29:11.32Z" }, + { url = "https://files.pythonhosted.org/packages/51/eb/e8dd1dd6e05b9e695aa9dd420f4577debdd0f87a5ff2fedda33c09e9be8c/ruff-0.14.10-py3-none-manylinux_2_31_riscv64.whl", hash = "sha256:213db2b2e44be8625002dbea33bb9c60c66ea2c07c084a00d55732689d697a7f", size = 14338036, upload-time = "2025-12-18T19:29:09.184Z" }, + { url = "https://files.pythonhosted.org/packages/6a/12/f3e3a505db7c19303b70af370d137795fcfec136d670d5de5391e295c134/ruff-0.14.10-py3-none-musllinux_1_2_aarch64.whl", hash = "sha256:b914c40ab64865a17a9a5b67911d14df72346a634527240039eb3bd650e5979d", size = 13264051, upload-time = "2025-12-18T19:29:13.431Z" }, + { url = "https://files.pythonhosted.org/packages/08/64/8c3a47eaccfef8ac20e0484e68e0772013eb85802f8a9f7603ca751eb166/ruff-0.14.10-py3-none-musllinux_1_2_armv7l.whl", hash = "sha256:1484983559f026788e3a5c07c81ef7d1e97c1c78ed03041a18f75df104c45405", size = 13283998, upload-time = "2025-12-18T19:29:06.994Z" }, + { url = "https://files.pythonhosted.org/packages/12/84/534a5506f4074e5cc0529e5cd96cfc01bb480e460c7edf5af70d2bcae55e/ruff-0.14.10-py3-none-musllinux_1_2_i686.whl", hash = "sha256:c70427132db492d25f982fffc8d6c7535cc2fd2c83fc8888f05caaa248521e60", size = 13601891, upload-time = "2025-12-18T19:28:55.811Z" }, + { url = 
"https://files.pythonhosted.org/packages/0d/1e/14c916087d8598917dbad9b2921d340f7884824ad6e9c55de948a93b106d/ruff-0.14.10-py3-none-musllinux_1_2_x86_64.whl", hash = "sha256:5bcf45b681e9f1ee6445d317ce1fa9d6cba9a6049542d1c3d5b5958986be8830", size = 14336660, upload-time = "2025-12-18T19:29:16.531Z" }, + { url = "https://files.pythonhosted.org/packages/f2/1c/d7b67ab43f30013b47c12b42d1acd354c195351a3f7a1d67f59e54227ede/ruff-0.14.10-py3-none-win32.whl", hash = "sha256:104c49fc7ab73f3f3a758039adea978869a918f31b73280db175b43a2d9b51d6", size = 13196187, upload-time = "2025-12-18T19:29:19.006Z" }, + { url = "https://files.pythonhosted.org/packages/fb/9c/896c862e13886fae2af961bef3e6312db9ebc6adc2b156fe95e615dee8c1/ruff-0.14.10-py3-none-win_amd64.whl", hash = "sha256:466297bd73638c6bdf06485683e812db1c00c7ac96d4ddd0294a338c62fdc154", size = 14661283, upload-time = "2025-12-18T19:29:30.16Z" }, + { url = "https://files.pythonhosted.org/packages/74/31/b0e29d572670dca3674eeee78e418f20bdf97fa8aa9ea71380885e175ca0/ruff-0.14.10-py3-none-win_arm64.whl", hash = "sha256:e51d046cf6dda98a4633b8a8a771451107413b0f07183b2bef03f075599e44e6", size = 13729839, upload-time = "2025-12-18T19:28:48.636Z" }, ] [[package]] @@ -5204,7 +5263,7 @@ wheels = [ [[package]] name = "scikit-image" -version = "0.25.2" +version = "0.26.0" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "imageio" }, @@ -5216,13 +5275,16 @@ dependencies = [ { name = "scipy" }, { name = "tifffile" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/c7/a8/3c0f256012b93dd2cb6fda9245e9f4bff7dc0486880b248005f15ea2255e/scikit_image-0.25.2.tar.gz", hash = "sha256:e5a37e6cd4d0c018a7a55b9d601357e3382826d3888c10d0213fc63bff977dde", size = 22693594, upload-time = "2025-02-18T18:05:24.538Z" } +sdist = { url = "https://files.pythonhosted.org/packages/a1/b4/2528bb43c67d48053a7a649a9666432dc307d66ba02e3a6d5c40f46655df/scikit_image-0.26.0.tar.gz", hash = 
"sha256:f5f970ab04efad85c24714321fcc91613fcb64ef2a892a13167df2f3e59199fa", size = 22729739, upload-time = "2025-12-20T17:12:21.824Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/35/8c/5df82881284459f6eec796a5ac2a0a304bb3384eec2e73f35cfdfcfbf20c/scikit_image-0.25.2-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:8db8dd03663112783221bf01ccfc9512d1cc50ac9b5b0fe8f4023967564719fb", size = 13986000, upload-time = "2025-02-18T18:04:47.156Z" }, - { url = "https://files.pythonhosted.org/packages/ce/e6/93bebe1abcdce9513ffec01d8af02528b4c41fb3c1e46336d70b9ed4ef0d/scikit_image-0.25.2-cp312-cp312-macosx_12_0_arm64.whl", hash = "sha256:483bd8cc10c3d8a7a37fae36dfa5b21e239bd4ee121d91cad1f81bba10cfb0ed", size = 13235893, upload-time = "2025-02-18T18:04:51.049Z" }, - { url = "https://files.pythonhosted.org/packages/53/4b/eda616e33f67129e5979a9eb33c710013caa3aa8a921991e6cc0b22cea33/scikit_image-0.25.2-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:9d1e80107bcf2bf1291acfc0bf0425dceb8890abe9f38d8e94e23497cbf7ee0d", size = 14178389, upload-time = "2025-02-18T18:04:54.245Z" }, - { url = "https://files.pythonhosted.org/packages/6b/b5/b75527c0f9532dd8a93e8e7cd8e62e547b9f207d4c11e24f0006e8646b36/scikit_image-0.25.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:a17e17eb8562660cc0d31bb55643a4da996a81944b82c54805c91b3fe66f4824", size = 15003435, upload-time = "2025-02-18T18:04:57.586Z" }, - { url = "https://files.pythonhosted.org/packages/34/e3/49beb08ebccda3c21e871b607c1cb2f258c3fa0d2f609fed0a5ba741b92d/scikit_image-0.25.2-cp312-cp312-win_amd64.whl", hash = "sha256:bdd2b8c1de0849964dbc54037f36b4e9420157e67e45a8709a80d727f52c7da2", size = 12899474, upload-time = "2025-02-18T18:05:01.166Z" }, + { url = "https://files.pythonhosted.org/packages/99/e8/e13757982264b33a1621628f86b587e9a73a13f5256dad49b19ba7dc9083/scikit_image-0.26.0-cp312-cp312-macosx_10_13_x86_64.whl", hash = 
"sha256:d454b93a6fa770ac5ae2d33570f8e7a321bb80d29511ce4b6b78058ebe176e8c", size = 12376452, upload-time = "2025-12-20T17:10:52.796Z" }, + { url = "https://files.pythonhosted.org/packages/e3/be/f8dd17d0510f9911f9f17ba301f7455328bf13dae416560126d428de9568/scikit_image-0.26.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:3409e89d66eff5734cd2b672d1c48d2759360057e714e1d92a11df82c87cba37", size = 12061567, upload-time = "2025-12-20T17:10:55.207Z" }, + { url = "https://files.pythonhosted.org/packages/b3/2b/c70120a6880579fb42b91567ad79feb4772f7be72e8d52fec403a3dde0c6/scikit_image-0.26.0-cp312-cp312-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:4c717490cec9e276afb0438dd165b7c3072d6c416709cc0f9f5a4c1070d23a44", size = 13084214, upload-time = "2025-12-20T17:10:57.468Z" }, + { url = "https://files.pythonhosted.org/packages/f4/a2/70401a107d6d7466d64b466927e6b96fcefa99d57494b972608e2f8be50f/scikit_image-0.26.0-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:7df650e79031634ac90b11e64a9eedaf5a5e06fcd09bcd03a34be01745744466", size = 13561683, upload-time = "2025-12-20T17:10:59.49Z" }, + { url = "https://files.pythonhosted.org/packages/13/a5/48bdfd92794c5002d664e0910a349d0a1504671ef5ad358150f21643c79a/scikit_image-0.26.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:cefd85033e66d4ea35b525bb0937d7f42d4cdcfed2d1888e1570d5ce450d3932", size = 14112147, upload-time = "2025-12-20T17:11:02.083Z" }, + { url = "https://files.pythonhosted.org/packages/ee/b5/ac71694da92f5def5953ca99f18a10fe98eac2dd0a34079389b70b4d0394/scikit_image-0.26.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:3f5bf622d7c0435884e1e141ebbe4b2804e16b2dd23ae4c6183e2ea99233be70", size = 14661625, upload-time = "2025-12-20T17:11:04.528Z" }, + { url = "https://files.pythonhosted.org/packages/23/4d/a3cc1e96f080e253dad2251bfae7587cf2b7912bcd76fd43fd366ff35a87/scikit_image-0.26.0-cp312-cp312-win_amd64.whl", hash = 
"sha256:abed017474593cd3056ae0fe948d07d0747b27a085e92df5474f4955dd65aec0", size = 11911059, upload-time = "2025-12-20T17:11:06.61Z" }, + { url = "https://files.pythonhosted.org/packages/35/8a/d1b8055f584acc937478abf4550d122936f420352422a1a625eef2c605d8/scikit_image-0.26.0-cp312-cp312-win_arm64.whl", hash = "sha256:4d57e39ef67a95d26860c8caf9b14b8fb130f83b34c6656a77f191fa6d1d04d8", size = 11348740, upload-time = "2025-12-20T17:11:09.118Z" }, ] [[package]] @@ -5361,6 +5423,18 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/49/65/dea992c6a97074f6d8ff9eab34741298cac2ce23e2b6c74fb7d08afdf85c/sentinels-1.1.1-py3-none-any.whl", hash = "sha256:835d3b28f3b47f5284afa4bf2db6e00f2dc5f80f9923d4b7e7aeeeccf6146a11", size = 3744, upload-time = "2025-08-12T07:57:48.858Z" }, ] +[[package]] +name = "session-info" +version = "1.0.1" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "stdlib-list" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/f5/dc/4a0c85aee2034be368d3ca293a563128122dde6db6e1bc9ca9ef3472c731/session_info-1.0.1.tar.gz", hash = "sha256:d71950d5a8ce7f7f7d5e86aa208c148c4e50b5440b77d5544d422b48e4f3ed41", size = 24663, upload-time = "2025-04-11T16:08:43.504Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/c5/c4/f6b7c0ec5241a2bde90c7ba1eca6ba44f8488bcedafe9072c79593015ec0/session_info-1.0.1-py3-none-any.whl", hash = "sha256:451d191e51816070b9f21a6ff3f6eb5d6015ae2738e8db63ac4e6398260a5838", size = 9119, upload-time = "2025-04-11T16:08:42.612Z" }, +] + [[package]] name = "session-info2" version = "0.2.3" @@ -5450,11 +5524,11 @@ wheels = [ [[package]] name = "soupsieve" -version = "2.8" +version = "2.8.1" source = { registry = "https://pypi.org/simple" } -sdist = { url = "https://files.pythonhosted.org/packages/6d/e6/21ccce3262dd4889aa3332e5a119a3491a95e8f60939870a3a035aabac0d/soupsieve-2.8.tar.gz", hash = "sha256:e2dd4a40a628cb5f28f6d4b0db8800b8f581b65bb380b97de22ba5ca8d72572f", size = 103472, 
upload-time = "2025-08-27T15:39:51.78Z" } +sdist = { url = "https://files.pythonhosted.org/packages/89/23/adf3796d740536d63a6fbda113d07e60c734b6ed5d3058d1e47fc0495e47/soupsieve-2.8.1.tar.gz", hash = "sha256:4cf733bc50fa805f5df4b8ef4740fc0e0fa6218cf3006269afd3f9d6d80fd350", size = 117856, upload-time = "2025-12-18T13:50:34.655Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/14/a0/bb38d3b76b8cae341dad93a2dd83ab7462e6dbcdd84d43f54ee60a8dc167/soupsieve-2.8-py3-none-any.whl", hash = "sha256:0cc76456a30e20f5d7f2e14a98a4ae2ee4e5abdc7c5ea0aafe795f344bc7984c", size = 36679, upload-time = "2025-08-27T15:39:50.179Z" }, + { url = "https://files.pythonhosted.org/packages/48/f3/b67d6ea49ca9154453b6d70b34ea22f3996b9fa55da105a79d8732227adc/soupsieve-2.8.1-py3-none-any.whl", hash = "sha256:a11fe2a6f3d76ab3cf2de04eb339c1be5b506a8a47f2ceb6d139803177f85434", size = 36710, upload-time = "2025-12-18T13:50:33.267Z" }, ] [[package]] @@ -5763,6 +5837,15 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/60/15/3daba2df40be8b8a9a027d7f54c8dedf24f0d81b96e54b52293f5f7e3418/statsmodels-0.14.6-cp312-cp312-win_amd64.whl", hash = "sha256:b5eb07acd115aa6208b4058211138393a7e6c2cf12b6f213ede10f658f6a714f", size = 9543991, upload-time = "2025-12-05T23:10:58.536Z" }, ] +[[package]] +name = "stdlib-list" +version = "0.12.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/8c/25/f1540879c8815387980e56f973e54605bd924612399ace31487f7444171c/stdlib_list-0.12.0.tar.gz", hash = "sha256:517824f27ee89e591d8ae7c1dd9ff34f672eae50ee886ea31bb8816d77535675", size = 60923, upload-time = "2025-10-24T19:21:22.849Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/3b/3d/2970b27a11ae17fb2d353e7a179763a2fe6f37d6d2a9f4d40104a2f132e9/stdlib_list-0.12.0-py3-none-any.whl", hash = "sha256:df2d11e97f53812a1756fb5510393a11e3b389ebd9239dc831c7f349957f62f2", size = 87615, upload-time = "2025-10-24T19:21:20.619Z" }, +] + 
[[package]] name = "sympy" version = "1.14.0" @@ -5864,14 +5947,14 @@ wheels = [ [[package]] name = "tifffile" -version = "2025.12.12" +version = "2025.12.20" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "numpy" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/31/b9/4253513a66f0a836ec3a5104266cf73f7812bfbbcda9d87d8c0e93b28293/tifffile-2025.12.12.tar.gz", hash = "sha256:97e11fd6b1d8dc971896a098c841d9cd4e6eb958ac040dd6fb8b332c3f7288b6", size = 373597, upload-time = "2025-12-13T03:42:53.765Z" } +sdist = { url = "https://files.pythonhosted.org/packages/f8/a6/85e8ecfd7cb4167f8bd17136b2d42cba296fbc08a247bba70d5747e2046a/tifffile-2025.12.20.tar.gz", hash = "sha256:cb8a4fee327d15b3e3eeac80bbdd8a53b323c80473330bcfb99418ee4c1c827f", size = 373364, upload-time = "2025-12-21T06:23:54.241Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/d5/5c/e444e1b024a519e488326525f0c154396c6b16baff17e00623f2c21dfc42/tifffile-2025.12.12-py3-none-any.whl", hash = "sha256:e3e3f1290ec6741ca248a5b5a997125209b5c2962f6bd9aef01ea9352c25d0ee", size = 232132, upload-time = "2025-12-13T03:42:52.072Z" }, + { url = "https://files.pythonhosted.org/packages/1b/fe/e59859aa1134fac065d36864752daf13215c98b379cb5d93f954dc0ec830/tifffile-2025.12.20-py3-none-any.whl", hash = "sha256:bc0345a20675149353cfcb3f1c48d0a3654231ee26bd46beebaab4d2168feeb6", size = 232031, upload-time = "2025-12-21T06:23:53.003Z" }, ] [[package]] @@ -6044,7 +6127,7 @@ wheels = [ [[package]] name = "typer" -version = "0.20.0" +version = "0.20.1" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "click" }, @@ -6052,22 +6135,22 @@ dependencies = [ { name = "shellingham" }, { name = "typing-extensions" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/8f/28/7c85c8032b91dbe79725b6f17d2fffc595dff06a35c7a30a37bef73a1ab4/typer-0.20.0.tar.gz", hash = "sha256:1aaf6494031793e4876fb0bacfa6a912b551cf43c1e63c800df8b1a866720c37", size = 106492, 
upload-time = "2025-10-20T17:03:49.445Z" } +sdist = { url = "https://files.pythonhosted.org/packages/6d/c1/933d30fd7a123ed981e2a1eedafceab63cb379db0402e438a13bc51bbb15/typer-0.20.1.tar.gz", hash = "sha256:68585eb1b01203689c4199bc440d6be616f0851e9f0eb41e4a778845c5a0fd5b", size = 105968, upload-time = "2025-12-19T16:48:56.302Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/78/64/7713ffe4b5983314e9d436a90d5bd4f63b6054e2aca783a3cfc44cb95bbf/typer-0.20.0-py3-none-any.whl", hash = "sha256:5b463df6793ec1dca6213a3cf4c0f03bc6e322ac5e16e13ddd622a889489784a", size = 47028, upload-time = "2025-10-20T17:03:47.617Z" }, + { url = "https://files.pythonhosted.org/packages/c8/52/1f2df7e7d1be3d65ddc2936d820d4a3d9777a54f4204f5ca46b8513eff77/typer-0.20.1-py3-none-any.whl", hash = "sha256:4b3bde918a67c8e03d861aa02deca90a95bbac572e71b1b9be56ff49affdb5a8", size = 47381, upload-time = "2025-12-19T16:48:53.679Z" }, ] [[package]] name = "typer-slim" -version = "0.20.0" +version = "0.20.1" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "click" }, { name = "typing-extensions" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/8e/45/81b94a52caed434b94da65729c03ad0fb7665fab0f7db9ee54c94e541403/typer_slim-0.20.0.tar.gz", hash = "sha256:9fc6607b3c6c20f5c33ea9590cbeb17848667c51feee27d9e314a579ab07d1a3", size = 106561, upload-time = "2025-10-20T17:03:46.642Z" } +sdist = { url = "https://files.pythonhosted.org/packages/3f/3d/6a4ec47010e8de34dade20c8e7bce90502b173f62a6b41619523a3fcf562/typer_slim-0.20.1.tar.gz", hash = "sha256:bb9e4f7e6dc31551c8a201383df322b81b0ce37239a5ead302598a2ebb6f7c9c", size = 106113, upload-time = "2025-12-19T16:48:54.206Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/5e/dd/5cbf31f402f1cc0ab087c94d4669cfa55bd1e818688b910631e131d74e75/typer_slim-0.20.0-py3-none-any.whl", hash = "sha256:f42a9b7571a12b97dddf364745d29f12221865acef7a2680065f9bb29c7dc89d", size = 47087, upload-time = 
"2025-10-20T17:03:44.546Z" }, + { url = "https://files.pythonhosted.org/packages/d8/f9/a273c8b57c69ac1b90509ebda204972265fdc978fbbecc25980786f8c038/typer_slim-0.20.1-py3-none-any.whl", hash = "sha256:8e89c5dbaffe87a4f86f4c7a9e2f7059b5b68c66f558f298969d42ce34f10122", size = 47440, upload-time = "2025-12-19T16:48:52.678Z" }, ] [[package]] @@ -6174,15 +6257,15 @@ wheels = [ [[package]] name = "uvicorn" -version = "0.38.0" +version = "0.40.0" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "click" }, { name = "h11" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/cb/ce/f06b84e2697fef4688ca63bdb2fdf113ca0a3be33f94488f2cadb690b0cf/uvicorn-0.38.0.tar.gz", hash = "sha256:fd97093bdd120a2609fc0d3afe931d4d4ad688b6e75f0f929fde1bc36fe0e91d", size = 80605, upload-time = "2025-10-18T13:46:44.63Z" } +sdist = { url = "https://files.pythonhosted.org/packages/c3/d1/8f3c683c9561a4e6689dd3b1d345c815f10f86acd044ee1fb9a4dcd0b8c5/uvicorn-0.40.0.tar.gz", hash = "sha256:839676675e87e73694518b5574fd0f24c9d97b46bea16df7b8c05ea1a51071ea", size = 81761, upload-time = "2025-12-21T14:16:22.45Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/ee/d9/d88e73ca598f4f6ff671fb5fde8a32925c2e08a637303a1d12883c7305fa/uvicorn-0.38.0-py3-none-any.whl", hash = "sha256:48c0afd214ceb59340075b4a052ea1ee91c16fbc2a9b1469cca0e54566977b02", size = 68109, upload-time = "2025-10-18T13:46:42.958Z" }, + { url = "https://files.pythonhosted.org/packages/3d/d8/2083a1daa7439a66f3a48589a57d576aa117726762618f6bb09fe3798796/uvicorn-0.40.0-py3-none-any.whl", hash = "sha256:c6c8f55bc8bf13eb6fa9ff87ad62308bbbc33d0b67f84293151efe87e0d5f2ee", size = 68502, upload-time = "2025-12-21T14:16:21.041Z" }, ] [package.optional-dependencies] From 943e3c17c6d4c6078242e853a0edfa7f1dc892fc Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Sun, 21 Dec 2025 08:57:10 -0800 Subject: [PATCH 34/87] final version of monolithic workflow --- .../rnaseq/immune_scrnaseq_monolithic.py | 55 
+++++++++++++------ 1 file changed, 38 insertions(+), 17 deletions(-) diff --git a/src/BetterCodeBetterScience/rnaseq/immune_scrnaseq_monolithic.py b/src/BetterCodeBetterScience/rnaseq/immune_scrnaseq_monolithic.py index 95c994d..cac638e 100644 --- a/src/BetterCodeBetterScience/rnaseq/immune_scrnaseq_monolithic.py +++ b/src/BetterCodeBetterScience/rnaseq/immune_scrnaseq_monolithic.py @@ -68,11 +68,13 @@ # %% datadir = Path('/Users/poldrack/data_unsynced/BCBS/immune_aging/') +figure_dir = datadir / 'workflow/figures' +figure_dir.mkdir(parents=True, exist_ok=True) # %% -datafile = datadir / 'a3f5651f-cd1a-4d26-8165-74964b79b4f2.h5ad' -url = 'https://datasets.cellxgene.cziscience.com/a3f5651f-cd1a-4d26-8165-74964b79b4f2.h5ad' dataset_name = 'OneK1K' +datafile = datadir / f'dataset-{dataset_name}_subset-immune_raw.h5ad' +url = 'https://datasets.cellxgene.cziscience.com/a3f5651f-cd1a-4d26-8165-74964b79b4f2.h5ad' if not datafile.exists(): cmd = f'wget -O {datafile.as_posix()} {url}' @@ -116,7 +118,7 @@ # Optional: Draw a vertical line at the propsoed cutoff # This helps you visualize how many donors you would lose. -cutoff_percentile = 10 # e.g., 10th percentile +cutoff_percentile = 1 # e.g., 1st percentile min_cells_per_donor = int( scoreatpercentile(donor_cell_counts.values, cutoff_percentile) ) @@ -132,7 +134,8 @@ ) plt.legend() -plt.show() +plt.savefig(figure_dir / 'donor_cell_counts_distribution.png', dpi=300, bbox_inches='tight') +plt.close() # %% print( @@ -164,7 +167,7 @@ # Keep if >= 10 cells in at least 90% of donors min_cells = 10 -percent_donors = 0.9 +percent_donors = 0.95 donor_count = counts_per_donor.shape[0] cell_types_to_keep = counts_per_donor.columns[ (counts_per_donor >= min_cells).sum(axis=0) @@ -270,14 +273,19 @@ ['total_counts', 'n_genes_by_counts', 'pct_counts_mt'], jitter=0.4, multi_panel=True, + show=False, ) +plt.savefig(figure_dir / 'qc_violin_plots.png', dpi=300, bbox_inches='tight') +plt.close() # 2. 
Scatter plot to spot doublets and dying cells # High mito + low genes = dying cell # High counts + high genes = potential doublet sc.pl.scatter( - adata, x='total_counts', y='n_genes_by_counts', color='pct_counts_mt' + adata, x='total_counts', y='n_genes_by_counts', color='pct_counts_mt', show=False ) +plt.savefig(figure_dir / 'qc_scatter_doublets.png', dpi=300, bbox_inches='tight') +plt.close() # %% [markdown] # #### Check Hemoglobin (RBC contamination) @@ -293,7 +301,8 @@ plt.xlabel('% Hemoglobin Counts') plt.axvline(5, color='red', linestyle='--', label='5% Cutoff') plt.legend() -plt.show() +plt.savefig(figure_dir / 'hemoglobin_distribution.png', dpi=300, bbox_inches='tight') +plt.close() # %% [markdown] # #### Create a copy of the data and apply QC cutoffs @@ -398,7 +407,9 @@ # # %% -sc.pl.umap(adata_qc, color=['doublet_score', 'predicted_doublet'], size=20) +sc.pl.umap(adata_qc, color=['doublet_score', 'predicted_doublet'], size=20, show=False) +plt.savefig(figure_dir / 'doublet_detection_umap.png', dpi=300, bbox_inches='tight') +plt.close() # %% [markdown] # #### Filter doublets @@ -410,7 +421,7 @@ # Filter the data to keep only singlets (False) # write back to adata for simplicity -adata = adata_qc[not adata_qc.obs['predicted_doublet'], :] +adata = adata_qc[adata_qc.obs['predicted_doublet'] == False, :] #noqa: E712 print(f'Remaining cells: {adata.n_obs}') # %% [markdown] @@ -536,7 +547,9 @@ # %% # Reality check: Check if PC1 is just "Cell Size": -sc.pl.pca(adata, color=['total_counts', 'cell_type'], components=['1,2']) +sc.pl.pca(adata, color=['total_counts', 'cell_type'], components=['1,2'], show=False) +plt.savefig(figure_dir / 'pca_cell_type.png', dpi=300, bbox_inches='tight') +plt.close() # %% [markdown] # PC1 separates cell types and isn't driven only by the number of cells. 
@@ -552,7 +565,9 @@ sc.tl.umap(adata, init_pos='X_pca_harmony') # %% -sc.pl.umap(adata, color='total_counts') +sc.pl.umap(adata, color='total_counts', show=False) +plt.savefig(figure_dir / 'umap_total_counts.png', dpi=300, bbox_inches='tight') +plt.close() # %% [markdown] @@ -574,7 +589,9 @@ # %% # Plot UMAP colored by Donor (to check integration) and Clusters -sc.pl.umap(adata, color=['cell_type', 'leiden_1.0'], wspace=0.3) +sc.pl.umap(adata, color=['cell_type', 'leiden_1.0'], wspace=0.3, show=False) +plt.savefig(figure_dir / 'umap_cell_type_leiden.png', dpi=300, bbox_inches='tight') +plt.close() # %% # compute overlap between clusters and cell types @@ -701,7 +718,9 @@ def create_pseudobulk( # Optional: Visualize the 'depth' of your new pseudobulk samples pb_adata.obs['total_counts'] = np.array(pb_adata.X.sum(axis=1)).flatten() -sc.pl.violin(pb_adata, ['n_cells', 'total_counts'], multi_panel=True) +sc.pl.violin(pb_adata, ['n_cells', 'total_counts'], multi_panel=True, show=False) +plt.savefig(figure_dir / 'pseudobulk_violin.png', dpi=300, bbox_inches='tight') +plt.close() # %% [markdown] # ### Differential expression with age @@ -721,7 +740,6 @@ def create_pseudobulk( # %% - # Assume pb_adata is your pseudobulk object from the previous step # 1. 
Extract counts and metadata counts_df = pd.DataFrame( @@ -924,7 +942,8 @@ def create_pseudobulk( plt.grid(axis='x', alpha=0.3) plt.tight_layout() -plt.show() +plt.savefig(figure_dir / 'gsea_pathways.png', dpi=300, bbox_inches='tight') +plt.close() # %% [markdown] # ### Enrichr analysis for overrepresentation @@ -1027,7 +1046,8 @@ def create_pseudobulk( plt.grid(axis='x', alpha=0.3) plt.tight_layout() -plt.show() +plt.savefig(figure_dir / 'enrichr_pathways.png', dpi=300, bbox_inches='tight') +plt.close() # %% [markdown] # ### Age prediction from gene expression @@ -1134,7 +1154,8 @@ def create_pseudobulk( plt.legend() plt.grid(alpha=0.3) plt.tight_layout() -plt.show() +plt.savefig(figure_dir / 'age_prediction_performance.png', dpi=300, bbox_inches='tight') +plt.close() # %% [markdown] # #### Baseline model: Sex only From 678983e1c337b6a23b0f35cfb974a87587cfd13e Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Sun, 21 Dec 2025 08:57:51 -0800 Subject: [PATCH 35/87] initial add --- refactor_monolithic_to_modular.md | 27 ++ .../rnaseq/modular_workflow/__init__.py | 0 .../rnaseq/modular_workflow/clustering.py | 137 +++++++ .../rnaseq/modular_workflow/data_filtering.py | 246 +++++++++++++ .../rnaseq/modular_workflow/data_loading.py | 77 ++++ .../differential_expression.py | 268 ++++++++++++++ .../dimensionality_reduction.py | 182 ++++++++++ .../overrepresentation_analysis.py | 262 ++++++++++++++ .../modular_workflow/pathway_analysis.py | 249 +++++++++++++ .../modular_workflow/predictive_modeling.py | 337 ++++++++++++++++++ .../rnaseq/modular_workflow/preprocessing.py | 210 +++++++++++ .../rnaseq/modular_workflow/pseudobulk.py | 207 +++++++++++ .../modular_workflow/quality_control.py | 322 +++++++++++++++++ .../rnaseq/modular_workflow/run_workflow.py | 314 ++++++++++++++++ 14 files changed, 2838 insertions(+) create mode 100644 refactor_monolithic_to_modular.md create mode 100644 src/BetterCodeBetterScience/rnaseq/modular_workflow/__init__.py create mode 100644 
src/BetterCodeBetterScience/rnaseq/modular_workflow/clustering.py create mode 100644 src/BetterCodeBetterScience/rnaseq/modular_workflow/data_filtering.py create mode 100644 src/BetterCodeBetterScience/rnaseq/modular_workflow/data_loading.py create mode 100644 src/BetterCodeBetterScience/rnaseq/modular_workflow/differential_expression.py create mode 100644 src/BetterCodeBetterScience/rnaseq/modular_workflow/dimensionality_reduction.py create mode 100644 src/BetterCodeBetterScience/rnaseq/modular_workflow/overrepresentation_analysis.py create mode 100644 src/BetterCodeBetterScience/rnaseq/modular_workflow/pathway_analysis.py create mode 100644 src/BetterCodeBetterScience/rnaseq/modular_workflow/predictive_modeling.py create mode 100644 src/BetterCodeBetterScience/rnaseq/modular_workflow/preprocessing.py create mode 100644 src/BetterCodeBetterScience/rnaseq/modular_workflow/pseudobulk.py create mode 100644 src/BetterCodeBetterScience/rnaseq/modular_workflow/quality_control.py create mode 100644 src/BetterCodeBetterScience/rnaseq/modular_workflow/run_workflow.py diff --git a/refactor_monolithic_to_modular.md b/refactor_monolithic_to_modular.md new file mode 100644 index 0000000..a6ea614 --- /dev/null +++ b/refactor_monolithic_to_modular.md @@ -0,0 +1,27 @@ +Prompt: please read CLAUDE.md for guidelines, and then read refactor_monolithic_to_modular.md for a description of your task. + +# Goal + +src/BetterCodeBetterScience/rnaseq/immune_scrnaseq_monolithic.py is currently a single monolithic script for a data analysis workflow. 
I would like to refactor it into a modular script based on the following decomposition of the workflow: + +- Data (down)loading +- Data filtering (removing subjects or cell types with insufficient observations) +- Quality control + - identifying bad cells on the basis of mitochondrial, ribosomal, or hemoglobin genes or hemoglobin contamination + - identifying "doublets" (multiple cells identified as one) +- Preprocessing + - Count normalization + - Log transformation + - Identification of high-variance features + - Filtering of nuisance genes +- Dimensionality reduction +- UMAP generation +- Clustering +- Pseudobulking +- Differential expression analysis +- Pathway enrichment analysis (GSEA) +- Overrepresentation analysis (Enrichr) +- Predictive modeling + +Please generate a new set of scripts within a new directory called `src/BetterCodeBetterScience/rnaseq/modular_workflow` that implements the same workflow in a modular way. + diff --git a/src/BetterCodeBetterScience/rnaseq/modular_workflow/__init__.py b/src/BetterCodeBetterScience/rnaseq/modular_workflow/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/src/BetterCodeBetterScience/rnaseq/modular_workflow/clustering.py b/src/BetterCodeBetterScience/rnaseq/modular_workflow/clustering.py new file mode 100644 index 0000000..d707a7e --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/modular_workflow/clustering.py @@ -0,0 +1,137 @@ +"""Clustering module for scRNA-seq analysis workflow. + +Functions for cell clustering using Leiden algorithm. +""" + +from pathlib import Path + +import anndata as ad +import matplotlib.pyplot as plt +import pandas as pd +import scanpy as sc + + +def run_leiden_clustering( + adata: ad.AnnData, + resolution: float = 1.0, + key_added: str = "leiden_1.0", + flavor: str = "igraph", + n_iterations: int = 2, +) -> ad.AnnData: + """Run Leiden clustering algorithm. 
+ + Parameters + ---------- + adata : AnnData + AnnData object with neighbor graph + resolution : float + Resolution parameter for clustering + key_added : str + Key to store cluster assignments + flavor : str + Implementation flavor + n_iterations : int + Number of iterations + + Returns + ------- + AnnData + AnnData with cluster assignments + """ + sc.tl.leiden( + adata, + resolution=resolution, + key_added=key_added, + flavor=flavor, + n_iterations=n_iterations, + ) + return adata + + +def plot_clusters( + adata: ad.AnnData, + cluster_key: str = "leiden_1.0", + cell_type_key: str = "cell_type", + figure_dir: Path | None = None, +) -> None: + """Plot UMAP colored by clusters and cell types. + + Parameters + ---------- + adata : AnnData + AnnData object with UMAP and clusters + cluster_key : str + Key for cluster assignments + cell_type_key : str + Key for cell type annotations + figure_dir : Path, optional + Directory to save figures + """ + sc.pl.umap(adata, color=[cell_type_key, cluster_key], wspace=0.3, show=False) + if figure_dir is not None: + plt.savefig( + figure_dir / "umap_cell_type_leiden.png", dpi=300, bbox_inches="tight" + ) + plt.close() + + +def compute_cluster_celltype_overlap( + adata: ad.AnnData, + cluster_key: str = "leiden_1.0", + cell_type_key: str = "cell_type", +) -> pd.DataFrame: + """Compute contingency table between clusters and cell types. + + Parameters + ---------- + adata : AnnData + AnnData object with clusters and cell types + cluster_key : str + Key for cluster assignments + cell_type_key : str + Key for cell type annotations + + Returns + ------- + pd.DataFrame + Contingency table + """ + contingency_table = pd.crosstab(adata.obs[cluster_key], adata.obs[cell_type_key]) + return contingency_table + + +def run_clustering_pipeline( + adata: ad.AnnData, + resolution: float = 1.0, + figure_dir: Path | None = None, +) -> ad.AnnData: + """Run complete clustering pipeline. 
+ + Parameters + ---------- + adata : AnnData + Input AnnData object with UMAP computed + resolution : float + Leiden resolution parameter + figure_dir : Path, optional + Directory to save figures + + Returns + ------- + AnnData + AnnData with cluster assignments + """ + cluster_key = f"leiden_{resolution}" + + # Run Leiden clustering + adata = run_leiden_clustering(adata, resolution, cluster_key) + + # Plot clusters + plot_clusters(adata, cluster_key, figure_dir=figure_dir) + + # Compute and print overlap + contingency = compute_cluster_celltype_overlap(adata, cluster_key) + print("Cluster-Cell Type Contingency Table:") + print(contingency) + + return adata diff --git a/src/BetterCodeBetterScience/rnaseq/modular_workflow/data_filtering.py b/src/BetterCodeBetterScience/rnaseq/modular_workflow/data_filtering.py new file mode 100644 index 0000000..2bc75d6 --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/modular_workflow/data_filtering.py @@ -0,0 +1,246 @@ +"""Data filtering module for scRNA-seq analysis workflow. + +Functions for filtering donors and cell types with insufficient observations. +""" + +from pathlib import Path + +import anndata as ad +import matplotlib.pyplot as plt +import pandas as pd +import scanpy as sc +from scipy.stats import scoreatpercentile + + +def compute_donor_cell_counts(adata: ad.AnnData) -> pd.Series: + """Calculate how many cells each donor has. + + Parameters + ---------- + adata : AnnData + AnnData object with 'donor_id' in obs + + Returns + ------- + pd.Series + Cell counts per donor + """ + return pd.Series(adata.obs["donor_id"]).value_counts() + + +def plot_donor_cell_distribution( + donor_cell_counts: pd.Series, + cutoff_percentile: float = 1.0, + figure_dir: Path | None = None, +) -> int: + """Plot distribution of cells per donor and determine cutoff. 
+ + Parameters + ---------- + donor_cell_counts : pd.Series + Cell counts per donor + cutoff_percentile : float + Percentile to use as cutoff (default: 1.0) + figure_dir : Path, optional + Directory to save figure + + Returns + ------- + int + Minimum cells per donor cutoff + """ + min_cells_per_donor = int( + scoreatpercentile(donor_cell_counts.values, cutoff_percentile) + ) + + plt.figure(figsize=(10, 6)) + plt.hist(donor_cell_counts.values, bins=50, color="skyblue", edgecolor="black") + plt.title("Distribution of Total Cells per Donor") + plt.xlabel("Number of Cells Captured") + plt.ylabel("Number of Donors") + plt.grid(axis="y", alpha=0.5) + + print( + f"cutoff of {min_cells_per_donor} would exclude " + f"{(donor_cell_counts < min_cells_per_donor).sum()} donors" + ) + plt.axvline( + min_cells_per_donor, + color="red", + linestyle="dashed", + linewidth=1, + label=f"Cutoff ({min_cells_per_donor} cells)", + ) + plt.legend() + + if figure_dir is not None: + plt.savefig( + figure_dir / "donor_cell_counts_distribution.png", + dpi=300, + bbox_inches="tight", + ) + plt.close() + + return min_cells_per_donor + + +def filter_donors_by_cell_count( + adata: ad.AnnData, min_cells_per_donor: int +) -> ad.AnnData: + """Filter to keep only donors with sufficient cells. 
+ + Parameters + ---------- + adata : AnnData + AnnData object + min_cells_per_donor : int + Minimum cells required per donor + + Returns + ------- + AnnData + Filtered AnnData object + """ + donor_cell_counts = compute_donor_cell_counts(adata) + print(f"Filtering to keep only donors with at least {min_cells_per_donor} cells.") + print( + f"Number of donors excluded: {(donor_cell_counts < min_cells_per_donor).sum()}" + ) + valid_donors = donor_cell_counts[donor_cell_counts >= min_cells_per_donor].index + filtered = adata[adata.obs["donor_id"].isin(valid_donors)] + print(f"Number of donors after filtering: {len(valid_donors)}") + return filtered + + +def filter_cell_types_by_frequency( + adata: ad.AnnData, min_cells: int = 10, percent_donors: float = 0.95 +) -> ad.AnnData: + """Filter cell types that don't have sufficient observations. + + Keep cell types with at least min_cells in at least percent_donors of donors. + + Parameters + ---------- + adata : AnnData + AnnData object + min_cells : int + Minimum cells per cell type per donor + percent_donors : float + Fraction of donors that must meet the min_cells threshold + + Returns + ------- + AnnData + Filtered AnnData object + """ + counts_per_donor = pd.crosstab(adata.obs["donor_id"], adata.obs["cell_type"]) + donor_count = counts_per_donor.shape[0] + + cell_types_to_keep = counts_per_donor.columns[ + (counts_per_donor >= min_cells).sum(axis=0) >= (donor_count * percent_donors) + ] + + print( + f"Keeping {len(cell_types_to_keep)} cell types out of " + f"{len(counts_per_donor.columns)}" + ) + print(f"Cell types to keep: {cell_types_to_keep.tolist()}") + + return adata[adata.obs["cell_type"].isin(cell_types_to_keep)] + + +def filter_donors_with_missing_cell_types( + adata: ad.AnnData, min_cells: int = 10 +) -> ad.AnnData: + """Filter donors who have zeros in any remaining cell types. 
+ + Parameters + ---------- + adata : AnnData + AnnData object + min_cells : int + Minimum cells per cell type per donor + + Returns + ------- + AnnData + Filtered AnnData object + """ + donor_celltype_counts = pd.crosstab(adata.obs["donor_id"], adata.obs["cell_type"]) + valid_donors = donor_celltype_counts.index[ + (donor_celltype_counts >= min_cells).all(axis=1) + ] + filtered = adata[adata.obs["donor_id"].isin(valid_donors)] + print(f"Final number of donors after filtering: {len(valid_donors)}") + return filtered + + +def load_to_memory_and_filter_genes(adata: ad.AnnData) -> ad.AnnData: + """Load lazy AnnData to memory and filter zero-count genes. + + Parameters + ---------- + adata : AnnData + Lazy AnnData object + + Returns + ------- + AnnData + In-memory AnnData with filtered genes + """ + print("Loading data into memory (this can take a few minutes)...") + adata_loaded = adata.to_memory() + + print("Filtering genes with zero counts...") + sc.pp.filter_genes(adata_loaded, min_counts=1) + + return adata_loaded + + +def run_filtering_pipeline( + adata: ad.AnnData, + cutoff_percentile: float = 1.0, + min_cells_per_celltype: int = 10, + percent_donors: float = 0.95, + figure_dir: Path | None = None, +) -> ad.AnnData: + """Run complete filtering pipeline. 
+ + Parameters + ---------- + adata : AnnData + Input AnnData object (can be lazy) + cutoff_percentile : float + Percentile for donor cell count cutoff + min_cells_per_celltype : int + Minimum cells per cell type per donor + percent_donors : float + Fraction of donors that must meet cell count threshold + figure_dir : Path, optional + Directory to save figures + + Returns + ------- + AnnData + Filtered and loaded AnnData object + """ + # Step 1: Filter donors by total cell count + donor_counts = compute_donor_cell_counts(adata) + min_cells = plot_donor_cell_distribution( + donor_counts, cutoff_percentile, figure_dir + ) + adata = filter_donors_by_cell_count(adata, min_cells) + + # Step 2: Filter cell types by frequency + adata = filter_cell_types_by_frequency( + adata, min_cells_per_celltype, percent_donors + ) + + # Step 3: Filter donors with missing cell types + adata = filter_donors_with_missing_cell_types(adata, min_cells_per_celltype) + + # Step 4: Load to memory and filter genes + adata = load_to_memory_and_filter_genes(adata) + + print(f"Final dataset shape: {adata.shape}") + return adata diff --git a/src/BetterCodeBetterScience/rnaseq/modular_workflow/data_loading.py b/src/BetterCodeBetterScience/rnaseq/modular_workflow/data_loading.py new file mode 100644 index 0000000..491784c --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/modular_workflow/data_loading.py @@ -0,0 +1,77 @@ +"""Data loading module for scRNA-seq analysis workflow. + +Functions for downloading and loading single-cell RNA-seq data. +""" + +import os +from pathlib import Path + +import anndata as ad +import h5py +from anndata.experimental import read_lazy + + +def download_data(datafile: Path, url: str) -> None: + """Download data file if it doesn't exist. 
+ + Parameters + ---------- + datafile : Path + Path to save the downloaded file + url : str + URL to download from + """ + if not datafile.exists(): + cmd = f"wget -O {datafile.as_posix()} {url}" + print(f"Downloading data from {url} to {datafile.as_posix()}") + os.system(cmd) + + +def load_lazy_anndata(datafile: Path, load_annotation_index: bool = True) -> ad.AnnData: + """Load AnnData object lazily from h5ad file. + + Parameters + ---------- + datafile : Path + Path to h5ad file + load_annotation_index : bool + Whether to load annotation index + + Returns + ------- + AnnData + Lazily loaded AnnData object + """ + adata = read_lazy( + h5py.File(datafile, "r"), load_annotation_index=load_annotation_index + ) + return adata + + +def load_anndata(datafile: Path) -> ad.AnnData: + """Load AnnData object from h5ad file. + + Parameters + ---------- + datafile : Path + Path to h5ad file + + Returns + ------- + AnnData + AnnData object + """ + return ad.read_h5ad(datafile) + + +def save_anndata(adata: ad.AnnData, filepath: Path) -> None: + """Save AnnData object to h5ad file. + + Parameters + ---------- + adata : AnnData + AnnData object to save + filepath : Path + Path to save the file + """ + adata.write(filepath) diff --git a/src/BetterCodeBetterScience/rnaseq/modular_workflow/differential_expression.py b/src/BetterCodeBetterScience/rnaseq/modular_workflow/differential_expression.py new file mode 100644 index 0000000..ec257a7 --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/modular_workflow/differential_expression.py @@ -0,0 +1,268 @@ +"""Differential expression module for scRNA-seq analysis workflow. + +Functions for running DESeq2-based differential expression analysis. 
+""" + +import anndata as ad +import numpy as np +import pandas as pd +from pydeseq2.dds import DeseqDataSet +from pydeseq2.ds import DeseqStats +from sklearn.preprocessing import StandardScaler + + +def extract_age_from_development_stage(pb_adata: ad.AnnData) -> ad.AnnData: + """Extract numeric age from development_stage column. + + Parameters + ---------- + pb_adata : AnnData + Pseudobulk AnnData with 'development_stage' in obs + + Returns + ------- + AnnData + AnnData with 'age' column added to obs + """ + ages = ( + pb_adata.obs["development_stage"].str.extract(r"(\d+)-year-old").astype(float) + ) + pb_adata.obs["age"] = ages + return pb_adata + + +def prepare_deseq_inputs( + pb_adata: ad.AnnData, + var_to_feature: dict | None = None, +) -> tuple[pd.DataFrame, pd.DataFrame]: + """Prepare counts and metadata for DESeq2. + + Parameters + ---------- + pb_adata : AnnData + Pseudobulk AnnData object + var_to_feature : dict, optional + Mapping from var_names to feature names + + Returns + ------- + tuple[pd.DataFrame, pd.DataFrame] + Counts dataframe and metadata dataframe + """ + # Extract counts + if var_to_feature is not None: + columns = [var_to_feature.get(var, var) for var in pb_adata.var_names] + else: + columns = pb_adata.var_names.tolist() + + counts_df = pd.DataFrame( + pb_adata.X.toarray(), + index=pb_adata.obs_names, + columns=columns, + ) + # Remove duplicate columns + counts_df = counts_df.loc[:, ~counts_df.columns.duplicated()] + + # Extract metadata + metadata = pb_adata.obs.copy() + + # Scale continuous variables + if "age" in metadata.columns: + scaler = StandardScaler() + metadata["age_scaled"] = scaler.fit_transform(metadata[["age"]]).flatten() + metadata["age_scaled"] = metadata["age_scaled"].astype(float) + print("Age scaling applied:") + print(metadata[["age", "age_scaled"]].head()) + + return counts_df, metadata + + +def subset_by_cell_type( + pb_adata: ad.AnnData, + counts_df: pd.DataFrame, + metadata: pd.DataFrame, + cell_type: str, +) -> 
tuple[ad.AnnData, pd.DataFrame, pd.DataFrame]: + """Subset data to a specific cell type. + + Parameters + ---------- + pb_adata : AnnData + Pseudobulk AnnData object + counts_df : pd.DataFrame + Counts dataframe + metadata : pd.DataFrame + Metadata dataframe + cell_type : str + Cell type to subset to + + Returns + ------- + tuple[AnnData, pd.DataFrame, pd.DataFrame] + Subsetted AnnData, counts, and metadata + """ + pb_adata_ct = pb_adata[pb_adata.obs["cell_type"] == cell_type].copy() + counts_df_ct = counts_df.loc[pb_adata_ct.obs_names].copy() + metadata_ct = metadata.loc[pb_adata_ct.obs_names].copy() + + return pb_adata_ct, counts_df_ct, metadata_ct + + +def run_deseq2( + counts_df: pd.DataFrame, + metadata: pd.DataFrame, + design_factors: list[str], + n_cpus: int = 2, +) -> DeseqDataSet: + """Initialize and fit DESeq2 model. + + Parameters + ---------- + counts_df : pd.DataFrame + Counts dataframe + metadata : pd.DataFrame + Metadata dataframe + design_factors : list[str] + Design factors for the model + n_cpus : int + Number of CPUs for parallel processing + + Returns + ------- + DeseqDataSet + Fitted DESeq2 dataset + """ + # Validate required columns + for factor in design_factors: + assert factor in metadata.columns, f"{factor} column missing in metadata" + + # Initialize and fit + dds = DeseqDataSet( + counts=counts_df, + metadata=metadata, + design_factors=design_factors, + refit_cooks=True, + n_cpus=n_cpus, + ) + dds.deseq2() + + return dds + + +def run_wald_test( + dds: DeseqDataSet, + contrast: np.ndarray | None = None, +) -> DeseqStats: + """Run Wald test for differential expression. 
+ + Parameters + ---------- + dds : DeseqDataSet + Fitted DESeq2 dataset + contrast : np.ndarray, optional + Contrast vector for the test + + Returns + ------- + DeseqStats + Statistics results object + """ + model_vars = dds.varm["LFC"].columns + print(f"Model variables: {model_vars.tolist()}") + + if contrast is None: + # Default: test second variable (typically age_scaled) + contrast = np.zeros(len(model_vars)) + contrast[1] = 1 + + print(f"Contrast: {contrast}") + + stat_res = DeseqStats(dds, contrast=contrast) + stat_res.summary() + stat_res.run_wald_test() + + return stat_res + + +def get_significant_genes( + stat_res: DeseqStats, + padj_threshold: float = 0.05, +) -> pd.DataFrame: + """Extract significant genes from DESeq2 results. + + Parameters + ---------- + stat_res : DeseqStats + DESeq2 statistics results + padj_threshold : float + Adjusted p-value threshold + + Returns + ------- + pd.DataFrame + Significant genes sorted by log2 fold change + """ + res = stat_res.results_df + sigs = res[res["padj"] < padj_threshold] + sigs = sigs.sort_values("log2FoldChange", ascending=False) + + print(f"Found {len(sigs)} significant genes.") + print(sigs[["log2FoldChange", "padj"]].head()) + + return sigs + + +def run_differential_expression_pipeline( + pb_adata: ad.AnnData, + cell_type: str, + design_factors: list[str] | None = None, + var_to_feature: dict | None = None, + n_cpus: int = 8, +) -> tuple[DeseqStats, pd.DataFrame, pd.DataFrame]: + """Run complete differential expression pipeline for a cell type. 
+ + Parameters + ---------- + pb_adata : AnnData + Pseudobulk AnnData object + cell_type : str + Cell type to analyze + design_factors : list[str], optional + Design factors (default: ['age_scaled', 'sex']) + var_to_feature : dict, optional + Mapping from var_names to feature names + n_cpus : int + Number of CPUs + + Returns + ------- + tuple[DeseqStats, pd.DataFrame, pd.DataFrame] + Statistics results, full results dataframe, and counts dataframe + """ + if design_factors is None: + design_factors = ["age_scaled", "sex"] + + # Extract age + pb_adata = extract_age_from_development_stage(pb_adata) + + # Prepare inputs + counts_df, metadata = prepare_deseq_inputs(pb_adata, var_to_feature) + + # Subset to cell type + _, counts_df_ct, metadata_ct = subset_by_cell_type( + pb_adata, counts_df, metadata, cell_type + ) + + print(f"Running DE analysis for cell type: {cell_type}") + print(f"Number of samples: {len(counts_df_ct)}") + + # Run DESeq2 + dds = run_deseq2(counts_df_ct, metadata_ct, design_factors, n_cpus) + + # Run Wald test + stat_res = run_wald_test(dds) + + # Get significant genes + _ = get_significant_genes(stat_res) + + return stat_res, stat_res.results_df, counts_df_ct diff --git a/src/BetterCodeBetterScience/rnaseq/modular_workflow/dimensionality_reduction.py b/src/BetterCodeBetterScience/rnaseq/modular_workflow/dimensionality_reduction.py new file mode 100644 index 0000000..ddd7146 --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/modular_workflow/dimensionality_reduction.py @@ -0,0 +1,182 @@ +"""Dimensionality reduction module for scRNA-seq analysis workflow. + +Functions for batch correction, neighbor computation, and UMAP generation. 
+""" + +from pathlib import Path + +import anndata as ad +import matplotlib.pyplot as plt +import scanpy as sc +import scanpy.external as sce + + +def run_harmony_integration( + adata: ad.AnnData, + batch_key: str = "donor_id", + basis: str = "X_pca", + adjusted_basis: str = "X_pca_harmony", +) -> tuple[ad.AnnData, str]: + """Run Harmony batch correction on PCA coordinates. + + Parameters + ---------- + adata : AnnData + AnnData object with PCA computed + batch_key : str + Column name for batch variable + basis : str + Name of PCA coordinates + adjusted_basis : str + Name for corrected coordinates + + Returns + ------- + tuple[AnnData, str] + AnnData with Harmony results and the representation to use + """ + try: + sce.pp.harmony_integrate( + adata, key=batch_key, basis=basis, adjusted_basis=adjusted_basis + ) + use_rep = adjusted_basis + print("Harmony integration successful. Using corrected PCA.") + except ImportError: + print( + "Harmony not installed. Proceeding with standard PCA " + "(Warning: Batch effects may persist)." + ) + print("To install: pip install harmonypy") + use_rep = basis + + return adata, use_rep + + +def plot_pca_qc(adata: ad.AnnData, figure_dir: Path | None = None) -> None: + """Plot PCA colored by total counts and cell type. + + Parameters + ---------- + adata : AnnData + AnnData object with PCA computed + figure_dir : Path, optional + Directory to save figures + """ + sc.pl.pca( + adata, color=["total_counts", "cell_type"], components=["1,2"], show=False + ) + if figure_dir is not None: + plt.savefig(figure_dir / "pca_cell_type.png", dpi=300, bbox_inches="tight") + plt.close() + + +def compute_neighbors( + adata: ad.AnnData, + n_neighbors: int = 30, + n_pcs: int = 40, + use_rep: str = "X_pca_harmony", +) -> ad.AnnData: + """Compute neighborhood graph. 
+ + Parameters + ---------- + adata : AnnData + AnnData object + n_neighbors : int + Number of neighbors + n_pcs : int + Number of PCs to use + use_rep : str + Representation to use + + Returns + ------- + AnnData + AnnData with neighbor graph + """ + sc.pp.neighbors(adata, n_neighbors=n_neighbors, n_pcs=n_pcs, use_rep=use_rep) + return adata + + +def compute_umap( + adata: ad.AnnData, + init_pos: str = "X_pca_harmony", +) -> ad.AnnData: + """Compute UMAP embedding. + + Parameters + ---------- + adata : AnnData + AnnData object with neighbor graph + init_pos : str + Initial position for UMAP + + Returns + ------- + AnnData + AnnData with UMAP coordinates + """ + sc.tl.umap(adata, init_pos=init_pos) + return adata + + +def plot_umap_qc(adata: ad.AnnData, figure_dir: Path | None = None) -> None: + """Plot UMAP colored by total counts. + + Parameters + ---------- + adata : AnnData + AnnData object with UMAP + figure_dir : Path, optional + Directory to save figures + """ + sc.pl.umap(adata, color="total_counts", show=False) + if figure_dir is not None: + plt.savefig(figure_dir / "umap_total_counts.png", dpi=300, bbox_inches="tight") + plt.close() + + +def run_dimensionality_reduction_pipeline( + adata: ad.AnnData, + batch_key: str = "donor_id", + n_neighbors: int = 30, + n_pcs: int = 40, + figure_dir: Path | None = None, +) -> ad.AnnData: + """Run complete dimensionality reduction pipeline. 
+ + Parameters + ---------- + adata : AnnData + Input AnnData object with PCA computed + batch_key : str + Column for batch correction + n_neighbors : int + Number of neighbors for graph + n_pcs : int + Number of PCs to use + figure_dir : Path, optional + Directory to save figures + + Returns + ------- + AnnData + AnnData with UMAP coordinates + """ + # Run Harmony integration + adata, use_rep = run_harmony_integration(adata, batch_key) + + # Plot PCA QC + plot_pca_qc(adata, figure_dir) + + # Compute neighbors + adata = compute_neighbors(adata, n_neighbors, n_pcs, use_rep) + + # Compute UMAP + init_pos = use_rep if use_rep == "X_pca_harmony" else "spectral" + adata = compute_umap(adata, init_pos) + + # Plot UMAP QC + plot_umap_qc(adata, figure_dir) + + return adata diff --git a/src/BetterCodeBetterScience/rnaseq/modular_workflow/overrepresentation_analysis.py b/src/BetterCodeBetterScience/rnaseq/modular_workflow/overrepresentation_analysis.py new file mode 100644 index 0000000..9aa6dd2 --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/modular_workflow/overrepresentation_analysis.py @@ -0,0 +1,262 @@ +"""Overrepresentation analysis module for scRNA-seq analysis workflow. + +Functions for Enrichr-based overrepresentation analysis. +""" + +from pathlib import Path + +import gseapy as gp +import matplotlib.pyplot as plt +import numpy as np +import pandas as pd +import seaborn as sns + + +def get_significant_gene_lists( + results_df: pd.DataFrame, + padj_threshold: float = 0.05, +) -> tuple[list[str], list[str]]: + """Extract lists of up and down regulated genes. 
+ + Parameters + ---------- + results_df : pd.DataFrame + DESeq2 results dataframe + padj_threshold : float + Adjusted p-value threshold + + Returns + ------- + tuple[list[str], list[str]] + Lists of upregulated and downregulated genes + """ + up_genes = results_df[ + (results_df["padj"] < padj_threshold) & (results_df["log2FoldChange"] > 0) + ].index.tolist() + + down_genes = results_df[ + (results_df["padj"] < padj_threshold) & (results_df["log2FoldChange"] < 0) + ].index.tolist() + + print( + f"Analyzing {len(up_genes)} upregulated and " + f"{len(down_genes)} downregulated genes." + ) + + return up_genes, down_genes + + +def run_enrichr( + gene_list: list[str], + gene_sets: list[str] | None = None, + organism: str = "human", +) -> gp.Enrichr | None: + """Run Enrichr overrepresentation analysis. + + Parameters + ---------- + gene_list : list[str] + List of gene names + gene_sets : list[str], optional + Gene set databases to use + organism : str + Organism name + + Returns + ------- + gp.Enrichr or None + Enrichr results object or None if no genes + """ + if len(gene_list) == 0: + print("No genes to analyze.") + return None + + if gene_sets is None: + gene_sets = ["MSigDB_Hallmark_2020"] + + enr = gp.enrichr( + gene_list=gene_list, + gene_sets=gene_sets, + organism=organism, + outdir=None, + ) + + return enr + + +def run_enrichr_both_directions( + results_df: pd.DataFrame, + gene_sets: list[str] | None = None, + padj_threshold: float = 0.05, +) -> tuple[gp.Enrichr | None, gp.Enrichr | None]: + """Run Enrichr for both up and down regulated genes. 
+ + Parameters + ---------- + results_df : pd.DataFrame + DESeq2 results dataframe + gene_sets : list[str], optional + Gene set databases + padj_threshold : float + Adjusted p-value threshold + + Returns + ------- + tuple[gp.Enrichr, gp.Enrichr] + Enrichr results for up and down genes + """ + up_genes, down_genes = get_significant_gene_lists(results_df, padj_threshold) + + enr_up = None + enr_down = None + + if len(up_genes) > 0: + enr_up = run_enrichr(up_genes, gene_sets) + if enr_up is not None: + print("Upregulated Pathways:") + print(enr_up.results[["Term", "Adjusted P-value", "Overlap"]].head(10)) + + if len(down_genes) > 0: + enr_down = run_enrichr(down_genes, gene_sets) + if enr_down is not None: + print("Downregulated Pathways:") + print(enr_down.results[["Term", "Adjusted P-value", "Overlap"]].head(10)) + + return enr_up, enr_down + + +def prepare_enrichr_plot_data( + enr_up: gp.Enrichr | None, + enr_down: gp.Enrichr | None, + n_top: int = 10, +) -> pd.DataFrame | None: + """Prepare Enrichr results for plotting. 
+ + Parameters + ---------- + enr_up : gp.Enrichr + Enrichr results for upregulated genes + enr_down : gp.Enrichr + Enrichr results for downregulated genes + n_top : int + Number of top pathways per direction + + Returns + ------- + pd.DataFrame or None + Combined dataframe for plotting + """ + dfs = [] + + if enr_up is not None: + up_res = enr_up.results.copy() + up_res["Direction"] = "Upregulated" + up_res["Color"] = "Red" + top_up = up_res.sort_values("Adjusted P-value").head(n_top) + dfs.append(top_up) + + if enr_down is not None: + down_res = enr_down.results.copy() + down_res["Direction"] = "Downregulated" + down_res["Color"] = "Blue" + top_down = down_res.sort_values("Adjusted P-value").head(n_top) + dfs.append(top_down) + + if not dfs: + return None + + combined = pd.concat(dfs) + + # Compute -log10(P-value) + combined["log_p"] = -np.log10(combined["Adjusted P-value"]) + + # Extract gene count from Overlap (e.g., "5/200" -> 5) + combined["Gene_Count"] = combined["Overlap"].apply(lambda x: int(x.split("/")[0])) + + return combined + + +def plot_enrichr_results( + combined: pd.DataFrame, + figure_dir: Path | None = None, + title: str = "Top Enriched Pathways (Up vs Down)", +) -> None: + """Plot Enrichr results as dot plot. 
+ + Parameters + ---------- + combined : pd.DataFrame + Prepared Enrichr dataframe + figure_dir : Path, optional + Directory to save figures + title : str + Plot title + """ + print(f"Plotting {len(combined)} pathways.") + + plt.figure(figsize=(10, 8)) + + sns.scatterplot( + data=combined, + x="log_p", + y="Term", + hue="Direction", + size="Gene_Count", + palette={"Upregulated": "#E41A1C", "Downregulated": "#377EB8"}, + sizes=(50, 400), + alpha=0.8, + ) + + plt.title(title, fontsize=14) + plt.xlabel("-log10(Adjusted P-value)", fontsize=12) + plt.ylabel("") + plt.axvline( + -np.log10(0.05), color="gray", linestyle="--", alpha=0.5, label="p=0.05" + ) + plt.legend(bbox_to_anchor=(1.05, 1), loc="upper left", borderaxespad=0.0) + plt.grid(axis="x", alpha=0.3) + plt.tight_layout() + + if figure_dir is not None: + plt.savefig(figure_dir / "enrichr_pathways.png", dpi=300, bbox_inches="tight") + plt.close() + + +def run_overrepresentation_pipeline( + results_df: pd.DataFrame, + gene_sets: list[str] | None = None, + padj_threshold: float = 0.05, + n_top: int = 10, + figure_dir: Path | None = None, +) -> tuple[gp.Enrichr | None, gp.Enrichr | None]: + """Run complete overrepresentation analysis pipeline. 
+ + Parameters + ---------- + results_df : pd.DataFrame + DESeq2 results dataframe + gene_sets : list[str], optional + Gene set databases + padj_threshold : float + Adjusted p-value threshold + n_top : int + Number of top pathways to show + figure_dir : Path, optional + Directory to save figures + + Returns + ------- + tuple[gp.Enrichr, gp.Enrichr] + Enrichr results for up and down genes + """ + # Run Enrichr + enr_up, enr_down = run_enrichr_both_directions( + results_df, gene_sets, padj_threshold + ) + + # Prepare and plot + combined = prepare_enrichr_plot_data(enr_up, enr_down, n_top) + if combined is not None: + plot_enrichr_results(combined, figure_dir) + + return enr_up, enr_down diff --git a/src/BetterCodeBetterScience/rnaseq/modular_workflow/pathway_analysis.py b/src/BetterCodeBetterScience/rnaseq/modular_workflow/pathway_analysis.py new file mode 100644 index 0000000..bdba7a5 --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/modular_workflow/pathway_analysis.py @@ -0,0 +1,249 @@ +"""Pathway analysis module for scRNA-seq analysis workflow. + +Functions for Gene Set Enrichment Analysis (GSEA). +""" + +from pathlib import Path + +import gseapy as gp +import matplotlib.pyplot as plt +import numpy as np +import pandas as pd +import seaborn as sns + + +def prepare_ranked_list(results_df: pd.DataFrame) -> pd.DataFrame: + """Prepare ranked gene list for GSEA from DE results. + + Parameters + ---------- + results_df : pd.DataFrame + DESeq2 results dataframe with 'stat' column + + Returns + ------- + pd.DataFrame + Ranked gene list sorted by statistic + """ + rank_df = results_df[["stat"]].dropna().sort_values("stat", ascending=False) + return rank_df + + +def run_gsea_prerank( + rank_df: pd.DataFrame, + gene_sets: list[str] | None = None, + min_size: int = 10, + max_size: int = 1000, + permutation_num: int = 1000, + threads: int = 4, + seed: int = 42, +) -> gp.GSEA: + """Run GSEA preranked analysis. 
+ + Parameters + ---------- + rank_df : pd.DataFrame + Ranked gene list + gene_sets : list[str], optional + Gene set databases to use + min_size : int + Minimum genes in pathway + max_size : int + Maximum genes in pathway + permutation_num : int + Number of permutations + threads : int + Number of threads + seed : int + Random seed + + Returns + ------- + gp.GSEA + GSEA results object + """ + if gene_sets is None: + gene_sets = ["MSigDB_Hallmark_2020"] + + prerank_res = gp.prerank( + rnk=rank_df, + gene_sets=gene_sets, + threads=threads, + min_size=min_size, + max_size=max_size, + permutation_num=permutation_num, + seed=seed, + ) + + return prerank_res + + +def get_gsea_top_terms( + prerank_res: gp.GSEA, + n_top: int = 10, +) -> tuple[pd.DataFrame, pd.DataFrame]: + """Get top upregulated and downregulated pathways. + + Parameters + ---------- + prerank_res : gp.GSEA + GSEA results object + n_top : int + Number of top terms to return + + Returns + ------- + tuple[pd.DataFrame, pd.DataFrame] + Top upregulated and downregulated pathways + """ + terms = prerank_res.res2d.sort_values("NES", ascending=False) + + print("Top Upregulated Pathways:") + print(terms[["Term", "NES", "FDR q-val"]].head(n_top)) + + print("\nTop Downregulated Pathways:") + print(terms[["Term", "NES", "FDR q-val"]].tail(n_top)) + + top_up = terms.head(n_top) + top_down = terms.tail(n_top) + + return top_up, top_down + + +def prepare_gsea_plot_data( + prerank_res: gp.GSEA, + n_top: int = 10, + label_prefix: str = "MSigDB_Hallmark_2020__", +) -> pd.DataFrame: + """Prepare GSEA results for plotting. 
+ + Parameters + ---------- + prerank_res : gp.GSEA + GSEA results object + n_top : int + Number of top terms per direction + label_prefix : str + Prefix to remove from term names + + Returns + ------- + pd.DataFrame + Prepared dataframe for plotting + """ + gsea_df = prerank_res.res2d.copy() + gsea_df = gsea_df.sort_values("NES", ascending=False) + + top_up = gsea_df.head(n_top).copy() + top_down = gsea_df.tail(n_top).copy() + combined = pd.concat([top_up, top_down]) + + # Add direction + combined["Direction"] = combined["NES"].apply( + lambda x: "Upregulated" if x > 0 else "Downregulated" + ) + + # Compute -log10(FDR) + combined["FDR q-val"] = pd.to_numeric(combined["FDR q-val"], errors="coerce") + combined["log_FDR"] = -np.log10(combined["FDR q-val"] + 1e-10) + + # Get gene count from leading edge + combined["Count"] = combined["Lead_genes"].apply(lambda x: len(str(x).split(";"))) + + # Clean term names + combined["Term"] = combined["Term"].str.replace(label_prefix, "", regex=False) + + return combined + + +def plot_gsea_results( + combined_gsea: pd.DataFrame, + figure_dir: Path | None = None, + title: str = "Top GSEA Pathways (Up vs Down)", +) -> None: + """Plot GSEA results as dot plot. 
+ + Parameters + ---------- + combined_gsea : pd.DataFrame + Prepared GSEA dataframe + figure_dir : Path, optional + Directory to save figures + title : str + Plot title + """ + plt.figure(figsize=(10, 8)) + + sns.scatterplot( + data=combined_gsea, + x="log_FDR", + y="Term", + hue="Direction", + size="Count", + palette={"Upregulated": "#E41A1C", "Downregulated": "#377EB8"}, + sizes=(50, 400), + alpha=0.8, + ) + + plt.title(title, fontsize=14) + plt.xlabel("-log10(FDR q-value)", fontsize=12) + plt.ylabel("") + + plt.axvline( + -np.log10(0.25), + color="gray", + linestyle=":", + label="FDR=0.25 (GSEA standard)", + ) + plt.axvline(-np.log10(0.05), color="gray", linestyle="--", label="FDR=0.05") + + plt.legend(bbox_to_anchor=(1.05, 1), loc="upper left", borderaxespad=0.0) + plt.grid(axis="x", alpha=0.3) + plt.tight_layout() + + if figure_dir is not None: + plt.savefig(figure_dir / "gsea_pathways.png", dpi=300, bbox_inches="tight") + plt.close() + + +def run_gsea_pipeline( + results_df: pd.DataFrame, + gene_sets: list[str] | None = None, + n_top: int = 10, + figure_dir: Path | None = None, +) -> gp.GSEA: + """Run complete GSEA pipeline. 
+ + Parameters + ---------- + results_df : pd.DataFrame + DESeq2 results dataframe + gene_sets : list[str], optional + Gene set databases + n_top : int + Number of top pathways to show + figure_dir : Path, optional + Directory to save figures + + Returns + ------- + gp.GSEA + GSEA results object + """ + # Prepare ranked list + rank_df = prepare_ranked_list(results_df) + + # Run GSEA + prerank_res = run_gsea_prerank(rank_df, gene_sets) + + # Get top terms + get_gsea_top_terms(prerank_res, n_top) + + # Prepare and plot + combined = prepare_gsea_plot_data(prerank_res, n_top) + print(f"Plotting {len(combined)} pathways.") + print(combined[["Term", "NES", "FDR q-val", "Count"]].head()) + + plot_gsea_results(combined, figure_dir) + + return prerank_res diff --git a/src/BetterCodeBetterScience/rnaseq/modular_workflow/predictive_modeling.py b/src/BetterCodeBetterScience/rnaseq/modular_workflow/predictive_modeling.py new file mode 100644 index 0000000..474ecb4 --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/modular_workflow/predictive_modeling.py @@ -0,0 +1,337 @@ +"""Predictive modeling module for scRNA-seq analysis workflow. + +Functions for building and evaluating age prediction models. +""" + +from pathlib import Path + +import matplotlib.pyplot as plt +import numpy as np +import pandas as pd +from sklearn.metrics import mean_absolute_error, r2_score +from sklearn.model_selection import ShuffleSplit +from sklearn.preprocessing import StandardScaler +from sklearn.svm import LinearSVR + + +def prepare_features( + counts_df: pd.DataFrame, + metadata: pd.DataFrame, +) -> tuple[pd.DataFrame, np.ndarray]: + """Prepare feature matrix and target variable. 
+ + Parameters + ---------- + counts_df : pd.DataFrame + Gene expression counts + metadata : pd.DataFrame + Metadata with 'age' and 'sex' columns + + Returns + ------- + tuple[pd.DataFrame, np.ndarray] + Feature matrix (genes + sex) and age target + """ + X_genes = counts_df.copy() + + # Add sex as binary feature + sex_encoded = pd.get_dummies(metadata["sex"], drop_first=True) + X = pd.concat([X_genes, sex_encoded], axis=1) + + # Target: age + y = metadata["age"].values + + print(f"Feature matrix shape: {X.shape}") + print(f"Number of samples: {len(y)}") + print(f"Age range: {y.min():.1f} - {y.max():.1f} years") + + return X, y + + +def prepare_baseline_features(metadata: pd.DataFrame) -> pd.DataFrame: + """Prepare baseline feature matrix (sex only). + + Parameters + ---------- + metadata : pd.DataFrame + Metadata with 'sex' column + + Returns + ------- + pd.DataFrame + Sex-only feature matrix + """ + return pd.get_dummies(metadata["sex"], drop_first=True) + + +def train_evaluate_fold( + X_train: np.ndarray, + X_test: np.ndarray, + y_train: np.ndarray, + y_test: np.ndarray, + C: float = 1.0, + max_iter: int = 10000, + random_state: int = 42, +) -> tuple[float, float, np.ndarray]: + """Train and evaluate model for one fold. 
+ + Parameters + ---------- + X_train : np.ndarray + Training features + X_test : np.ndarray + Test features + y_train : np.ndarray + Training target + y_test : np.ndarray + Test target + C : float + Regularization parameter + max_iter : int + Maximum iterations + random_state : int + Random seed + + Returns + ------- + tuple[float, float, np.ndarray] + R2 score, MAE, and predictions + """ + # Scale features + scaler = StandardScaler() + X_train_scaled = scaler.fit_transform(X_train) + X_test_scaled = scaler.transform(X_test) + + # Train model + model = LinearSVR(C=C, max_iter=max_iter, random_state=random_state, dual="auto") + model.fit(X_train_scaled, y_train) + + # Predict + y_pred = model.predict(X_test_scaled) + + # Metrics + r2 = r2_score(y_test, y_pred) + mae = mean_absolute_error(y_test, y_pred) + + return r2, mae, y_pred + + +def run_cross_validation( + X: pd.DataFrame, + y: np.ndarray, + n_splits: int = 5, + test_size: float = 0.2, + random_state: int = 42, +) -> tuple[list[float], list[float], list[float], list[float]]: + """Run cross-validation for age prediction. 
+ + Parameters + ---------- + X : pd.DataFrame + Feature matrix + y : np.ndarray + Target variable + n_splits : int + Number of CV splits + test_size : float + Fraction for test set + random_state : int + Random seed + + Returns + ------- + tuple[list, list, list, list] + R2 scores, MAE scores, predictions, actuals + """ + cv = ShuffleSplit(n_splits=n_splits, test_size=test_size, random_state=random_state) + + r2_scores = [] + mae_scores = [] + predictions_list = [] + actual_list = [] + + for fold, (train_idx, test_idx) in enumerate(cv.split(X)): + print(f"\nFold {fold + 1}/{n_splits}") + + X_train, X_test = X.iloc[train_idx], X.iloc[test_idx] + y_train, y_test = y[train_idx], y[test_idx] + + r2, mae, y_pred = train_evaluate_fold( + X_train.values, X_test.values, y_train, y_test + ) + + r2_scores.append(r2) + mae_scores.append(mae) + predictions_list.extend(y_pred) + actual_list.extend(y_test) + + print(f" R2 Score: {r2:.3f}") + print(f" MAE: {mae:.2f} years") + + return r2_scores, mae_scores, predictions_list, actual_list + + +def print_cv_results( + r2_scores: list[float], + mae_scores: list[float], + model_name: str = "Model", +) -> None: + """Print cross-validation summary. + + Parameters + ---------- + r2_scores : list[float] + R2 scores per fold + mae_scores : list[float] + MAE scores per fold + model_name : str + Name of the model + """ + print("\n" + "=" * 50) + print(f"{model_name} CROSS-VALIDATION RESULTS") + print("=" * 50) + print(f"R2 Score: {np.mean(r2_scores):.3f} +/- {np.std(r2_scores):.3f}") + print(f"MAE: {np.mean(mae_scores):.2f} +/- {np.std(mae_scores):.2f} years") + print("=" * 50) + + +def plot_predictions( + actual: list[float], + predicted: list[float], + r2_scores: list[float], + mae_scores: list[float], + figure_dir: Path | None = None, +) -> None: + """Plot predicted vs actual ages. 
+ + Parameters + ---------- + actual : list[float] + Actual ages + predicted : list[float] + Predicted ages + r2_scores : list[float] + R2 scores + mae_scores : list[float] + MAE scores + figure_dir : Path, optional + Directory to save figures + """ + plt.figure(figsize=(8, 6)) + + plt.scatter(actual, predicted, alpha=0.6, s=80) + + min_age = min(min(actual), min(predicted)) + max_age = max(max(actual), max(predicted)) + plt.plot( + [min_age, max_age], + [min_age, max_age], + "r--", + linewidth=2, + label="Perfect Prediction", + ) + + plt.xlabel("Actual Age (years)", fontsize=12) + plt.ylabel("Predicted Age (years)", fontsize=12) + plt.title( + f"Age Prediction Performance\n" + f"R2 = {np.mean(r2_scores):.3f}, MAE = {np.mean(mae_scores):.2f} years", + fontsize=14, + ) + plt.legend() + plt.grid(alpha=0.3) + plt.tight_layout() + + if figure_dir is not None: + plt.savefig( + figure_dir / "age_prediction_performance.png", dpi=300, bbox_inches="tight" + ) + plt.close() + + +def compare_models( + full_r2: list[float], + full_mae: list[float], + baseline_r2: list[float], + baseline_mae: list[float], +) -> None: + """Print comparison between full and baseline models. 
+ + Parameters + ---------- + full_r2 : list[float] + R2 scores for full model + full_mae : list[float] + MAE scores for full model + baseline_r2 : list[float] + R2 scores for baseline + baseline_mae : list[float] + MAE scores for baseline + """ + print("=" * 60) + print("MODEL COMPARISON") + print("=" * 60) + print("Full Model (Genes + Sex):") + print(f" R2 Score: {np.mean(full_r2):.3f} +/- {np.std(full_r2):.3f}") + print(f" MAE: {np.mean(full_mae):.2f} +/- {np.std(full_mae):.2f} years") + print("\nBaseline Model (Sex Only):") + print(f" R2 Score: {np.mean(baseline_r2):.3f} +/- {np.std(baseline_r2):.3f}") + print(f" MAE: {np.mean(baseline_mae):.2f} +/- {np.std(baseline_mae):.2f} years") + print("\nImprovement:") + print(f" Delta R2: {np.mean(full_r2) - np.mean(baseline_r2):.3f}") + print(f" Delta MAE: {np.mean(baseline_mae) - np.mean(full_mae):.2f} years") + print("=" * 60) + + +def run_predictive_modeling_pipeline( + counts_df: pd.DataFrame, + metadata: pd.DataFrame, + n_splits: int = 5, + figure_dir: Path | None = None, +) -> dict: + """Run complete predictive modeling pipeline. 
+ + Parameters + ---------- + counts_df : pd.DataFrame + Gene expression counts + metadata : pd.DataFrame + Metadata with age and sex + n_splits : int + Number of CV splits + figure_dir : Path, optional + Directory to save figures + + Returns + ------- + dict + Results dictionary with scores + """ + # Prepare features + X, y = prepare_features(counts_df, metadata) + X_baseline = prepare_baseline_features(metadata) + + # Run full model + print("\n--- Full Model (Genes + Sex) ---") + r2_scores, mae_scores, predictions, actuals = run_cross_validation(X, y, n_splits) + print_cv_results(r2_scores, mae_scores, "Full Model") + + # Plot predictions + plot_predictions(actuals, predictions, r2_scores, mae_scores, figure_dir) + + # Run baseline model + print("\n--- Baseline Model (Sex Only) ---") + baseline_r2, baseline_mae, _, _ = run_cross_validation(X_baseline, y, n_splits) + print_cv_results(baseline_r2, baseline_mae, "Baseline") + + # Compare models + compare_models(r2_scores, mae_scores, baseline_r2, baseline_mae) + + return { + "full_r2": r2_scores, + "full_mae": mae_scores, + "baseline_r2": baseline_r2, + "baseline_mae": baseline_mae, + "predictions": predictions, + "actuals": actuals, + } diff --git a/src/BetterCodeBetterScience/rnaseq/modular_workflow/preprocessing.py b/src/BetterCodeBetterScience/rnaseq/modular_workflow/preprocessing.py new file mode 100644 index 0000000..4ab383e --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/modular_workflow/preprocessing.py @@ -0,0 +1,210 @@ +"""Preprocessing module for scRNA-seq analysis workflow. + +Functions for normalization, log transformation, and feature selection. +""" + +import re + +import anndata as ad +import scanpy as sc + + +def normalize_counts(adata: ad.AnnData, target_sum: float = 1e4) -> ad.AnnData: + """Normalize counts to target sum per cell. 
+ + Parameters + ---------- + adata : AnnData + AnnData object with raw counts + target_sum : float + Target sum for normalization (default: 10,000) + + Returns + ------- + AnnData + Normalized AnnData object + """ + sc.pp.normalize_total(adata, target_sum=target_sum) + return adata + + +def log_transform(adata: ad.AnnData) -> ad.AnnData: + """Apply log1p transformation. + + Parameters + ---------- + adata : AnnData + AnnData object + + Returns + ------- + AnnData + Log-transformed AnnData object + """ + sc.pp.log1p(adata) + return adata + + +def select_highly_variable_genes( + adata: ad.AnnData, + n_top_genes: int = 3000, + batch_key: str = "donor_id", + layer: str = "counts", + span: float = 0.8, +) -> ad.AnnData: + """Select highly variable genes using seurat_v3 method. + + Parameters + ---------- + adata : AnnData + AnnData object + n_top_genes : int + Number of top variable genes to select + batch_key : str + Column name for batch correction + layer : str + Layer containing raw counts + span : float + LOESS span parameter + + Returns + ------- + AnnData + AnnData with highly_variable annotation + """ + sc.pp.highly_variable_genes( + adata, + n_top_genes=n_top_genes, + flavor="seurat_v3", + batch_key=batch_key, + span=span, + layer=layer, + subset=False, + ) + return adata + + +def identify_nuisance_genes(adata: ad.AnnData) -> list[str]: + """Identify nuisance genes to exclude from HVG list. + + Identifies TCR/BCR variable regions, mitochondrial, and ribosomal genes. 
+ + Parameters + ---------- + adata : AnnData + AnnData object + + Returns + ------- + list[str] + List of gene names to block + """ + # TCR/BCR genes (V(D)J recombination genes) + immune_receptor_genes = [ + name for name in adata.var_names if re.match(r"^(IG[HKL]|TR[ABDG])[VDJC]", name) + ] + + # Mitochondrial genes + mt_genes = adata.var_names[adata.var_names.str.startswith("MT-")] + + # Ribosomal genes + rb_genes = adata.var_names[adata.var_names.str.startswith(("RPS", "RPL"))] + + genes_to_block = list(immune_receptor_genes) + list(mt_genes) + list(rb_genes) + return genes_to_block + + +def filter_nuisance_genes_from_hvg(adata: ad.AnnData) -> ad.AnnData: + """Remove nuisance genes from HVG list. + + Parameters + ---------- + adata : AnnData + AnnData object with highly_variable annotation + + Returns + ------- + AnnData + AnnData with filtered HVG list + """ + genes_to_block = identify_nuisance_genes(adata) + + # Count immune receptor genes separately for reporting + immune_receptor_genes = [ + name for name in adata.var_names if re.match(r"^(IG[HKL]|TR[ABDG])[VDJC]", name) + ] + + # Set blocked genes to not highly variable + adata.var.loc[adata.var_names.isin(genes_to_block), "highly_variable"] = False + + print(f"Blocked {len(immune_receptor_genes)} immune receptor genes from HVG list.") + print(f"Final HVG count: {adata.var['highly_variable'].sum()}") + + return adata + + +def run_pca( + adata: ad.AnnData, + svd_solver: str = "arpack", + use_highly_variable: bool = True, +) -> ad.AnnData: + """Run PCA on the data. 
+ + Parameters + ---------- + adata : AnnData + AnnData object + svd_solver : str + SVD solver to use + use_highly_variable : bool + Whether to use only HVGs + + Returns + ------- + AnnData + AnnData with PCA results + """ + sc.tl.pca(adata, svd_solver=svd_solver, use_highly_variable=use_highly_variable) + return adata + + +def run_preprocessing_pipeline( + adata: ad.AnnData, + target_sum: float = 1e4, + n_top_genes: int = 3000, + batch_key: str = "donor_id", +) -> ad.AnnData: + """Run complete preprocessing pipeline. + + Parameters + ---------- + adata : AnnData + Input AnnData object with raw counts in 'counts' layer + target_sum : float + Target sum for normalization + n_top_genes : int + Number of HVGs to select + batch_key : str + Column for batch correction + + Returns + ------- + AnnData + Preprocessed AnnData object + """ + # Normalize + adata = normalize_counts(adata, target_sum) + + # Log transform + adata = log_transform(adata) + + # Select HVGs + adata = select_highly_variable_genes(adata, n_top_genes, batch_key) + + # Filter nuisance genes + adata = filter_nuisance_genes_from_hvg(adata) + + # Run PCA + adata = run_pca(adata) + + return adata diff --git a/src/BetterCodeBetterScience/rnaseq/modular_workflow/pseudobulk.py b/src/BetterCodeBetterScience/rnaseq/modular_workflow/pseudobulk.py new file mode 100644 index 0000000..3fd1a92 --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/modular_workflow/pseudobulk.py @@ -0,0 +1,207 @@ +"""Pseudobulking module for scRNA-seq analysis workflow. + +Functions for aggregating single-cell counts to pseudobulk samples. 
+""" + +from pathlib import Path + +import anndata as ad +import matplotlib.pyplot as plt +import numpy as np +import pandas as pd +import scanpy as sc +from sklearn.preprocessing import OneHotEncoder + + +def create_pseudobulk( + adata: ad.AnnData, + group_col: str, + donor_col: str, + layer: str = "counts", + metadata_cols: list[str] | None = None, +) -> ad.AnnData: + """Sum raw counts for each (Donor, CellType) pair. + + Parameters + ---------- + adata : AnnData + Input single-cell data + group_col : str + Column name for grouping (e.g., 'cell_type') + donor_col : str + Column name for donor ID + layer : str + Layer to use for aggregation (default: 'counts') + metadata_cols : list of str, optional + Additional metadata columns to preserve from obs + + Returns + ------- + AnnData + Pseudobulk AnnData object + """ + # Create a combined key (e.g., "Bcell::Donor1") + groups = adata.obs[group_col].astype(str) + donors = adata.obs[donor_col].astype(str) + + group_df = pd.DataFrame({"group": groups, "donor": donors}) + group_df["combined"] = group_df["group"] + "::" + group_df["donor"] + + # Build the aggregation matrix (One-Hot Encoding) + enc = OneHotEncoder(sparse_output=True, dtype=np.float32) + membership_matrix = enc.fit_transform(group_df[["combined"]]) + + # Get source matrix + if layer is not None and layer in adata.layers: + X_source = adata.layers[layer] + else: + X_source = adata.X + + # Aggregate by summing + pseudobulk_X = membership_matrix.T @ X_source + + # Create obs metadata for the new object + unique_ids = enc.categories_[0] + + obs_data = [] + for uid in unique_ids: + ctype, donor = uid.split("::") + obs_data.append({"cell_type": ctype, "donor_id": donor}) + + pb_obs = pd.DataFrame(obs_data, index=unique_ids) + + # Count cells per pseudobulk sample + cell_counts = np.array(membership_matrix.sum(axis=0)).flatten() + pb_obs["n_cells"] = cell_counts.astype(int) + + # Add additional metadata columns + if metadata_cols is not None: + for col in 
metadata_cols: + if col in adata.obs.columns: + col_values = [] + for uid in unique_ids: + ctype, donor = uid.split("::") + donor_mask = donors == donor  # string-cast IDs, matching how the keys were built + if donor_mask.any(): + col_values.append(adata.obs.loc[donor_mask, col].iloc[0]) + else: + col_values.append(None) + pb_obs[col] = col_values + + # Assemble the AnnData + pb_adata = ad.AnnData(X=pseudobulk_X, obs=pb_obs, var=adata.var.copy()) + + return pb_adata + + +def filter_pseudobulk_by_cell_count( + pb_adata: ad.AnnData, min_cells: int = 10 +) -> ad.AnnData: + """Filter pseudobulk samples with too few cells. + + Parameters + ---------- + pb_adata : AnnData + Pseudobulk AnnData object + min_cells : int + Minimum cells required per sample + + Returns + ------- + AnnData + Filtered pseudobulk AnnData + """ + print(f"Dropping samples with < {min_cells} cells...") + pb_adata = pb_adata[pb_adata.obs["n_cells"] >= min_cells].copy() + print(f"Remaining samples: {pb_adata.n_obs}") + return pb_adata + + +def compute_pseudobulk_qc(pb_adata: ad.AnnData) -> ad.AnnData: + """Compute QC metrics for pseudobulk samples. + + Parameters + ---------- + pb_adata : AnnData + Pseudobulk AnnData object + + Returns + ------- + AnnData + Pseudobulk AnnData with QC metrics + """ + pb_adata.obs["total_counts"] = np.array(pb_adata.X.sum(axis=1)).flatten() + return pb_adata + + +def plot_pseudobulk_qc(pb_adata: ad.AnnData, figure_dir: Path | None = None) -> None: + """Plot QC metrics for pseudobulk samples. 
+ + Parameters + ---------- + pb_adata : AnnData + Pseudobulk AnnData object + figure_dir : Path, optional + Directory to save figures + """ + sc.pl.violin(pb_adata, ["n_cells", "total_counts"], multi_panel=True, show=False) + if figure_dir is not None: + plt.savefig(figure_dir / "pseudobulk_violin.png", dpi=300, bbox_inches="tight") + plt.close() + + +def run_pseudobulk_pipeline( + adata: ad.AnnData, + group_col: str = "cell_type", + donor_col: str = "donor_id", + metadata_cols: list[str] | None = None, + min_cells: int = 10, + figure_dir: Path | None = None, +) -> ad.AnnData: + """Run complete pseudobulking pipeline. + + Parameters + ---------- + adata : AnnData + Input single-cell AnnData object + group_col : str + Column for cell type grouping + donor_col : str + Column for donor ID + metadata_cols : list of str, optional + Metadata columns to preserve + min_cells : int + Minimum cells per pseudobulk sample + figure_dir : Path, optional + Directory to save figures + + Returns + ------- + AnnData + Pseudobulk AnnData object + """ + if metadata_cols is None: + metadata_cols = ["development_stage", "sex"] + + print("Aggregating counts...") + pb_adata = create_pseudobulk( + adata, + group_col=group_col, + donor_col=donor_col, + layer="counts", + metadata_cols=metadata_cols, + ) + + print("Pseudobulk complete.") + print(f"Original shape: {adata.shape}") + print(f"Pseudobulk shape: {pb_adata.shape} (Samples x Genes)") + print(pb_adata.obs.head()) + + # Filter by cell count + pb_adata = filter_pseudobulk_by_cell_count(pb_adata, min_cells) + + # Compute and plot QC + pb_adata = compute_pseudobulk_qc(pb_adata) + plot_pseudobulk_qc(pb_adata, figure_dir) + + return pb_adata diff --git a/src/BetterCodeBetterScience/rnaseq/modular_workflow/quality_control.py b/src/BetterCodeBetterScience/rnaseq/modular_workflow/quality_control.py new file mode 100644 index 0000000..7115679 --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/modular_workflow/quality_control.py @@ -0,0 
+1,322 @@ +"""Quality control module for scRNA-seq analysis workflow. + +Functions for identifying bad cells and doublets. +""" + +from pathlib import Path + +import anndata as ad +import matplotlib.pyplot as plt +import scanpy as sc +import seaborn as sns + + +def annotate_gene_types(adata: ad.AnnData) -> ad.AnnData: + """Annotate mitochondrial, ribosomal, and hemoglobin genes. + + Parameters + ---------- + adata : AnnData + AnnData object with 'feature_name' in var + + Returns + ------- + AnnData + AnnData with mt, ribo, hb annotations in var + """ + # Mitochondrial genes + adata.var["mt"] = adata.var["feature_name"].str.startswith("MT-") + print(f"Number of mitochondrial genes: {adata.var['mt'].sum()}") + + # Ribosomal genes + adata.var["ribo"] = adata.var["feature_name"].str.startswith(("RPS", "RPL")) + print(f"Number of ribosomal genes: {adata.var['ribo'].sum()}") + + # Hemoglobin genes + adata.var["hb"] = adata.var["feature_name"].str.contains("^HB[^(P)]") + print(f"Number of hemoglobin genes: {adata.var['hb'].sum()}") + + return adata + + +def calculate_qc_metrics(adata: ad.AnnData) -> ad.AnnData: + """Calculate QC metrics for cells. + + Parameters + ---------- + adata : AnnData + AnnData object with gene type annotations + + Returns + ------- + AnnData + AnnData with QC metrics in obs + """ + sc.pp.calculate_qc_metrics( + adata, + qc_vars=["mt", "ribo", "hb"], + inplace=True, + percent_top=[20], + log1p=True, + ) + return adata + + +def plot_qc_metrics(adata: ad.AnnData, figure_dir: Path | None = None) -> None: + """Plot QC metric distributions. 
+ + Parameters + ---------- + adata : AnnData + AnnData object with QC metrics + figure_dir : Path, optional + Directory to save figures + """ + # Violin plots for QC metrics + sc.pl.violin( + adata, + ["total_counts", "n_genes_by_counts", "pct_counts_mt"], + jitter=0.4, + multi_panel=True, + show=False, + ) + if figure_dir is not None: + plt.savefig(figure_dir / "qc_violin_plots.png", dpi=300, bbox_inches="tight") + plt.close() + + # Scatter plot for doublets and dying cells + sc.pl.scatter( + adata, + x="total_counts", + y="n_genes_by_counts", + color="pct_counts_mt", + show=False, + ) + if figure_dir is not None: + plt.savefig( + figure_dir / "qc_scatter_doublets.png", dpi=300, bbox_inches="tight" + ) + plt.close() + + +def plot_hemoglobin_distribution( + adata: ad.AnnData, figure_dir: Path | None = None +) -> None: + """Plot hemoglobin content distribution to check RBC contamination. + + Parameters + ---------- + adata : AnnData + AnnData object with QC metrics + figure_dir : Path, optional + Directory to save figures + """ + plt.figure(figsize=(6, 4)) + sns.histplot(adata.obs["pct_counts_hb"], bins=50, log_scale=(False, True)) + plt.title("Hemoglobin Content Distribution") + plt.xlabel("% Hemoglobin Counts") + plt.axvline(5, color="red", linestyle="--", label="5% Cutoff") + plt.legend() + if figure_dir is not None: + plt.savefig( + figure_dir / "hemoglobin_distribution.png", dpi=300, bbox_inches="tight" + ) + plt.close() + + +def apply_qc_filters( + adata: ad.AnnData, + min_genes: int = 200, + max_genes: int = 6000, + min_counts: int = 500, + max_counts: int = 30000, + max_hb_pct: float = 5.0, +) -> ad.AnnData: + """Apply QC filters to remove low quality cells and doublets. 
+ + Parameters + ---------- + adata : AnnData + AnnData object with QC metrics + min_genes : int + Minimum genes per cell + max_genes : int + Maximum genes per cell (doublet filter) + min_counts : int + Minimum UMIs per cell + max_counts : int + Maximum UMIs per cell (doublet filter) + max_hb_pct : float + Maximum hemoglobin percentage (RBC filter) + + Returns + ------- + AnnData + Filtered AnnData object + """ + adata_qc = adata.copy() + print(f"Before filtering: {adata_qc.n_obs} cells") + + # Filter low quality and doublets + adata_qc = adata_qc[ + (adata_qc.obs["n_genes_by_counts"] > min_genes) + & (adata_qc.obs["n_genes_by_counts"] < max_genes) + & (adata_qc.obs["total_counts"] > min_counts) + & (adata_qc.obs["total_counts"] < max_counts) + ] + + # Filter Red Blood Cells + adata_qc = adata_qc[adata_qc.obs["pct_counts_hb"] < max_hb_pct] + + print(f"After filtering: {adata_qc.n_obs} cells") + return adata_qc + + +def detect_doublets_per_donor( + adata: ad.AnnData, + expected_doublet_rate: float = 0.06, + min_cells_per_donor: int = 100, +) -> ad.AnnData: + """Run doublet detection separately for each donor. 
+ + Parameters + ---------- + adata : AnnData + AnnData object with raw counts + expected_doublet_rate : float + Expected doublet rate for Scrublet + min_cells_per_donor : int + Minimum cells required to run Scrublet + + Returns + ------- + AnnData + AnnData with doublet annotations + """ + print(f"Data shape before doublet detection: {adata.shape}") + + adatas_list = [] + donors = adata.obs["donor_id"].unique() + + print(f"Running Scrublet on {len(donors)} donors...") + + for donor in donors: + curr_adata = adata[adata.obs["donor_id"] == donor].copy() + + if curr_adata.n_obs < min_cells_per_donor: + print(f"Skipping donor {donor}: too few cells ({curr_adata.n_obs})") + curr_adata.obs["doublet_score"] = 0 + curr_adata.obs["predicted_doublet"] = False + adatas_list.append(curr_adata) + continue + + sc.pp.scrublet(curr_adata, expected_doublet_rate=expected_doublet_rate) + adatas_list.append(curr_adata) + + adata_combined = sc.concat(adatas_list) + + print( + f"Detected {adata_combined.obs['predicted_doublet'].sum()} " + f"doublets across all donors." + ) + print(adata_combined.obs["predicted_doublet"].value_counts()) + + return adata_combined + + +def plot_doublets(adata: ad.AnnData, figure_dir: Path | None = None) -> None: + """Visualize doublet detection results on UMAP. + + Parameters + ---------- + adata : AnnData + AnnData object with doublet annotations + figure_dir : Path, optional + Directory to save figures + """ + sc.pl.umap(adata, color=["doublet_score", "predicted_doublet"], size=20, show=False) + if figure_dir is not None: + plt.savefig( + figure_dir / "doublet_detection_umap.png", dpi=300, bbox_inches="tight" + ) + plt.close() + + +def filter_doublets(adata: ad.AnnData) -> ad.AnnData: + """Remove predicted doublets from the dataset. 
+ + Parameters + ---------- + adata : AnnData + AnnData object with doublet predictions + + Returns + ------- + AnnData + Filtered AnnData with only singlets + """ + print(f"Found {adata.obs['predicted_doublet'].sum()} predicted doublets") + adata_filtered = adata[adata.obs["predicted_doublet"] == False, :] # noqa: E712 + print(f"Remaining cells: {adata_filtered.n_obs}") + return adata_filtered + + +def run_qc_pipeline( + adata: ad.AnnData, + min_genes: int = 200, + max_genes: int = 6000, + min_counts: int = 500, + max_counts: int = 30000, + max_hb_pct: float = 5.0, + expected_doublet_rate: float = 0.06, + figure_dir: Path | None = None, +) -> ad.AnnData: + """Run complete quality control pipeline. + + Parameters + ---------- + adata : AnnData + Input AnnData object + min_genes : int + Minimum genes per cell + max_genes : int + Maximum genes per cell + min_counts : int + Minimum UMIs per cell + max_counts : int + Maximum UMIs per cell + max_hb_pct : float + Maximum hemoglobin percentage + expected_doublet_rate : float + Expected doublet rate + figure_dir : Path, optional + Directory to save figures + + Returns + ------- + AnnData + QC-filtered AnnData object + """ + # Annotate gene types + adata = annotate_gene_types(adata) + + # Calculate QC metrics + adata = calculate_qc_metrics(adata) + + # Plot QC metrics + plot_qc_metrics(adata, figure_dir) + plot_hemoglobin_distribution(adata, figure_dir) + + # Apply QC filters + adata = apply_qc_filters( + adata, min_genes, max_genes, min_counts, max_counts, max_hb_pct + ) + + # Detect and filter doublets + adata = detect_doublets_per_donor(adata, expected_doublet_rate) + adata = filter_doublets(adata) + + # Save raw counts for later use + adata.layers["counts"] = adata.X.copy() + + return adata diff --git a/src/BetterCodeBetterScience/rnaseq/modular_workflow/run_workflow.py b/src/BetterCodeBetterScience/rnaseq/modular_workflow/run_workflow.py new file mode 100644 index 0000000..f88d965 --- /dev/null +++ 
b/src/BetterCodeBetterScience/rnaseq/modular_workflow/run_workflow.py @@ -0,0 +1,314 @@ +"""Main workflow runner for scRNA-seq immune aging analysis. + +This script orchestrates the complete analysis workflow using the modular components. +""" + +import os +from pathlib import Path + +from dotenv import load_dotenv + +from BetterCodeBetterScience.rnaseq.modular_workflow.clustering import ( + run_clustering_pipeline, +) +from BetterCodeBetterScience.rnaseq.modular_workflow.data_filtering import ( + run_filtering_pipeline, +) +from BetterCodeBetterScience.rnaseq.modular_workflow.data_loading import ( + download_data, + load_anndata, + load_lazy_anndata, + save_anndata, +) +from BetterCodeBetterScience.rnaseq.modular_workflow.differential_expression import ( + run_differential_expression_pipeline, +) +from BetterCodeBetterScience.rnaseq.modular_workflow.dimensionality_reduction import ( + run_dimensionality_reduction_pipeline, +) +from BetterCodeBetterScience.rnaseq.modular_workflow.overrepresentation_analysis import ( + run_overrepresentation_pipeline, +) +from BetterCodeBetterScience.rnaseq.modular_workflow.pathway_analysis import ( + run_gsea_pipeline, +) +from BetterCodeBetterScience.rnaseq.modular_workflow.predictive_modeling import ( + run_predictive_modeling_pipeline, +) +from BetterCodeBetterScience.rnaseq.modular_workflow.preprocessing import ( + run_preprocessing_pipeline, +) +from BetterCodeBetterScience.rnaseq.modular_workflow.pseudobulk import ( + run_pseudobulk_pipeline, +) +from BetterCodeBetterScience.rnaseq.modular_workflow.quality_control import ( + run_qc_pipeline, +) + + +def run_full_workflow( + datadir: Path, + dataset_name: str = "OneK1K", + url: str = "https://datasets.cellxgene.cziscience.com/a3f5651f-cd1a-4d26-8165-74964b79b4f2.h5ad", + cell_type_for_de: str = "central memory CD4-positive, alpha-beta T cell", + skip_download: bool = False, + skip_filtering: bool = False, + skip_qc: bool = False, +) -> dict: + """Run the complete immune aging 
scRNA-seq analysis workflow. + + Parameters + ---------- + datadir : Path + Base directory for data files + dataset_name : str + Name of the dataset + url : str + URL to download data from + cell_type_for_de : str + Cell type to use for differential expression + skip_download : bool + Skip data download step + skip_filtering : bool + Skip filtering, load pre-filtered data + skip_qc : bool + Skip QC, load post-QC data + + Returns + ------- + dict + Dictionary containing all results + """ + # Setup directories + figure_dir = datadir / "workflow/figures" + figure_dir.mkdir(parents=True, exist_ok=True) + + results = {} + + # ===================================================================== + # STEP 1: Data Loading + # ===================================================================== + print("\n" + "=" * 60) + print("STEP 1: DATA LOADING") + print("=" * 60) + + datafile = datadir / f"dataset-{dataset_name}_subset-immune_raw.h5ad" + filtered_file = datadir / f"dataset-{dataset_name}_subset-immune_filtered.h5ad" + + if not skip_download: + download_data(datafile, url) + + # ===================================================================== + # STEP 2: Data Filtering + # ===================================================================== + print("\n" + "=" * 60) + print("STEP 2: DATA FILTERING") + print("=" * 60) + + if skip_filtering and filtered_file.exists(): + print("Loading pre-filtered data...") + adata = load_anndata(filtered_file) + else: + adata = load_lazy_anndata(datafile) + print(f"Loaded dataset: {adata}") + + adata = run_filtering_pipeline( + adata, + cutoff_percentile=1.0, + min_cells_per_celltype=10, + percent_donors=0.95, + figure_dir=figure_dir, + ) + + # Save filtered data + save_anndata(adata, filtered_file) + print(f"Saved filtered data to {filtered_file}") + + print(f"Dataset after filtering: {adata}") + + # Build var_to_feature mapping + var_to_feature = dict(zip(adata.var_names, adata.var["feature_name"])) + + # 
===================================================================== + # STEP 3: Quality Control + # ===================================================================== + print("\n" + "=" * 60) + print("STEP 3: QUALITY CONTROL") + print("=" * 60) + + qc_file = datadir / f"dataset-{dataset_name}_subset-immune_qc.h5ad" + + if skip_qc and qc_file.exists(): + print("Loading post-QC data...") + adata = load_anndata(qc_file) + else: + adata = run_qc_pipeline( + adata, + min_genes=200, + max_genes=6000, + min_counts=500, + max_counts=30000, + max_hb_pct=5.0, + expected_doublet_rate=0.06, + figure_dir=figure_dir, + ) + save_anndata(adata, qc_file) + + print(f"Dataset after QC: {adata}") + + # ===================================================================== + # STEP 4: Preprocessing + # ===================================================================== + print("\n" + "=" * 60) + print("STEP 4: PREPROCESSING") + print("=" * 60) + + adata = run_preprocessing_pipeline( + adata, + target_sum=1e4, + n_top_genes=3000, + batch_key="donor_id", + ) + + # ===================================================================== + # STEP 5: Dimensionality Reduction + # ===================================================================== + print("\n" + "=" * 60) + print("STEP 5: DIMENSIONALITY REDUCTION") + print("=" * 60) + + adata = run_dimensionality_reduction_pipeline( + adata, + batch_key="donor_id", + n_neighbors=30, + n_pcs=40, + figure_dir=figure_dir, + ) + + # ===================================================================== + # STEP 6: Clustering + # ===================================================================== + print("\n" + "=" * 60) + print("STEP 6: CLUSTERING") + print("=" * 60) + + adata = run_clustering_pipeline( + adata, + resolution=1.0, + figure_dir=figure_dir, + ) + + results["adata"] = adata + + # ===================================================================== + # STEP 7: Pseudobulking + # 
===================================================================== + print("\n" + "=" * 60) + print("STEP 7: PSEUDOBULKING") + print("=" * 60) + + pb_adata = run_pseudobulk_pipeline( + adata, + group_col="cell_type", + donor_col="donor_id", + metadata_cols=["development_stage", "sex"], + min_cells=10, + figure_dir=figure_dir, + ) + + results["pb_adata"] = pb_adata + + # ===================================================================== + # STEP 8: Differential Expression + # ===================================================================== + print("\n" + "=" * 60) + print("STEP 8: DIFFERENTIAL EXPRESSION") + print("=" * 60) + + stat_res, de_results, counts_df_ct = run_differential_expression_pipeline( + pb_adata, + cell_type=cell_type_for_de, + design_factors=["age_scaled", "sex"], + var_to_feature=var_to_feature, + n_cpus=8, + ) + + results["de_results"] = de_results + results["counts_df"] = counts_df_ct + + # ===================================================================== + # STEP 9: Pathway Analysis (GSEA) + # ===================================================================== + print("\n" + "=" * 60) + print("STEP 9: PATHWAY ANALYSIS (GSEA)") + print("=" * 60) + + gsea_results = run_gsea_pipeline( + de_results, + gene_sets=["MSigDB_Hallmark_2020"], + n_top=10, + figure_dir=figure_dir, + ) + + results["gsea"] = gsea_results + + # ===================================================================== + # STEP 10: Overrepresentation Analysis (Enrichr) + # ===================================================================== + print("\n" + "=" * 60) + print("STEP 10: OVERREPRESENTATION ANALYSIS (Enrichr)") + print("=" * 60) + + enr_up, enr_down = run_overrepresentation_pipeline( + de_results, + gene_sets=["MSigDB_Hallmark_2020"], + padj_threshold=0.05, + n_top=10, + figure_dir=figure_dir, + ) + + results["enrichr_up"] = enr_up + results["enrichr_down"] = enr_down + + # ===================================================================== + # STEP 
11: Predictive Modeling + # ===================================================================== + print("\n" + "=" * 60) + print("STEP 11: PREDICTIVE MODELING") + print("=" * 60) + + # Get metadata for the cell type + pb_adata_ct = pb_adata[pb_adata.obs["cell_type"] == cell_type_for_de].copy() + pb_adata_ct.obs["age"] = ( + pb_adata_ct.obs["development_stage"] + .str.extract(r"(\d+)-year-old")[0] + .astype(float) + ) + metadata_ct = pb_adata_ct.obs.copy() + + prediction_results = run_predictive_modeling_pipeline( + counts_df_ct, + metadata_ct, + n_splits=5, + figure_dir=figure_dir, + ) + + results["prediction"] = prediction_results + + print("\n" + "=" * 60) + print("WORKFLOW COMPLETE") + print("=" * 60) + print(f"Figures saved to: {figure_dir}") + + return results + + +if __name__ == "__main__": + load_dotenv() + + datadir_env = os.getenv("DATADIR") + if datadir_env is None: + raise ValueError("DATADIR environment variable not set") + + datadir = Path(datadir_env) / "immune_aging" + results = run_full_workflow(datadir) From 9b1daeb41a7be9f3fa9a428a1bfa43ce352aedcd Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Sun, 21 Dec 2025 11:13:50 -0800 Subject: [PATCH 36/87] cleanup --- .../refactor_monolithic_to_modular.md | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename refactor_monolithic_to_modular.md => prompts/refactor_monolithic_to_modular.md (100%) diff --git a/refactor_monolithic_to_modular.md b/prompts/refactor_monolithic_to_modular.md similarity index 100% rename from refactor_monolithic_to_modular.md rename to prompts/refactor_monolithic_to_modular.md From 5ad1df94ec559628594e1f8a74442750568f13be Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Sun, 21 Dec 2025 11:22:05 -0800 Subject: [PATCH 37/87] Add stateless workflow with checkpointing and execution logging MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Implement a stateless execution pattern for the scRNA-seq workflow that: - Saves checkpoint 
files after each step for resumption - Automatically skips completed steps by loading from cache - Logs execution details (timing, parameters, status) to JSON - Supports forced re-execution from any step onwards New files: - checkpoint.py: Utilities for saving/loading checkpoints with auto-format detection - execution_log.py: Structured logging with StepRecord and ExecutionLog dataclasses - run_workflow.py: Workflow runner using checkpoint wrappers 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 --- .../rnaseq/stateless_workflow/__init__.py | 0 .../rnaseq/stateless_workflow/checkpoint.py | 342 +++++++++ .../stateless_workflow/execution_log.py | 202 ++++++ .../rnaseq/stateless_workflow/run_workflow.py | 650 ++++++++++++++++++ 4 files changed, 1194 insertions(+) create mode 100644 src/BetterCodeBetterScience/rnaseq/stateless_workflow/__init__.py create mode 100644 src/BetterCodeBetterScience/rnaseq/stateless_workflow/checkpoint.py create mode 100644 src/BetterCodeBetterScience/rnaseq/stateless_workflow/execution_log.py create mode 100644 src/BetterCodeBetterScience/rnaseq/stateless_workflow/run_workflow.py diff --git a/src/BetterCodeBetterScience/rnaseq/stateless_workflow/__init__.py b/src/BetterCodeBetterScience/rnaseq/stateless_workflow/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/src/BetterCodeBetterScience/rnaseq/stateless_workflow/checkpoint.py b/src/BetterCodeBetterScience/rnaseq/stateless_workflow/checkpoint.py new file mode 100644 index 0000000..11d2eba --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/stateless_workflow/checkpoint.py @@ -0,0 +1,342 @@ +"""Checkpoint utilities for stateless workflow execution. + +Provides functions to save and load intermediate results, enabling +workflow resumption from any step. 
+""" + +from __future__ import annotations + +import hashlib +import json +import pickle +from collections.abc import Callable +from pathlib import Path +from typing import TYPE_CHECKING, Any + +import anndata as ad +import pandas as pd + +if TYPE_CHECKING: + from BetterCodeBetterScience.rnaseq.stateless_workflow.execution_log import ( + ExecutionLog, + ) + + +def get_file_type(filepath: Path) -> str: + """Determine file type from extension. + + Parameters + ---------- + filepath : Path + Path to the file + + Returns + ------- + str + File type identifier: 'h5ad', 'parquet', or 'pickle' + """ + suffix = filepath.suffix.lower() + if suffix == ".h5ad": + return "h5ad" + elif suffix == ".parquet": + return "parquet" + else: + return "pickle" + + +def save_checkpoint(data: Any, filepath: Path) -> None: + """Save data to a checkpoint file. + + Automatically selects serialization format based on file extension: + - .h5ad: AnnData objects + - .parquet: pandas DataFrames + - .pkl: Any picklable object + + Parameters + ---------- + data : Any + Data to save + filepath : Path + Path to save the checkpoint + """ + filepath.parent.mkdir(parents=True, exist_ok=True) + file_type = get_file_type(filepath) + + if file_type == "h5ad": + if not isinstance(data, ad.AnnData): + raise TypeError(f"Expected AnnData for .h5ad file, got {type(data)}") + data.write(filepath) + elif file_type == "parquet": + if not isinstance(data, pd.DataFrame): + raise TypeError(f"Expected DataFrame for .parquet file, got {type(data)}") + data.to_parquet(filepath) + else: + with open(filepath, "wb") as f: + pickle.dump(data, f) + + +def load_checkpoint(filepath: Path) -> Any: + """Load data from a checkpoint file. + + Automatically selects deserialization format based on file extension. 
+ + Parameters + ---------- + filepath : Path + Path to the checkpoint file + + Returns + ------- + Any + Loaded data + """ + file_type = get_file_type(filepath) + + if file_type == "h5ad": + return ad.read_h5ad(filepath) + elif file_type == "parquet": + return pd.read_parquet(filepath) + else: + with open(filepath, "rb") as f: + return pickle.load(f) + + +def hash_parameters(**kwargs) -> str: + """Create a hash of parameters for cache invalidation. + + Parameters + ---------- + **kwargs + Parameters to hash + + Returns + ------- + str + 8-character hash string + """ + param_str = json.dumps(kwargs, sort_keys=True, default=str) + return hashlib.md5(param_str.encode()).hexdigest()[:8] + + +def run_with_checkpoint( + step_name: str, + checkpoint_file: Path, + func: Callable, + *args, + force: bool = False, + execution_log: ExecutionLog | None = None, + step_number: int | None = None, + log_parameters: dict[str, Any] | None = None, + **kwargs, +) -> Any: + """Execute a function with checkpoint caching. + + If the checkpoint file exists and force=False, loads and returns + the cached result. Otherwise, executes the function, saves the + result, and returns it. 
+ + Parameters + ---------- + step_name : str + Human-readable name for logging + checkpoint_file : Path + Path to save/load the checkpoint + func : Callable + Function to execute + *args + Positional arguments for func + force : bool + If True, ignore existing checkpoint and re-run + execution_log : ExecutionLog, optional + Execution log to record step details + step_number : int, optional + Step number for logging + log_parameters : dict, optional + Parameters to record in the execution log + **kwargs + Keyword arguments for func + + Returns + ------- + Any + Result of func(*args, **kwargs) or cached result + """ + # Start logging if provided + step_record = None + if execution_log is not None and step_number is not None: + step_record = execution_log.add_step( + step_number=step_number, + step_name=step_name, + parameters=log_parameters, + checkpoint_file=str(checkpoint_file), + ) + + from_cache = False + error_message = None + + try: + if checkpoint_file.exists() and not force: + print(f"[{step_name}] Loading from checkpoint: {checkpoint_file.name}") + from_cache = True + result = load_checkpoint(checkpoint_file) + else: + print(f"[{step_name}] Executing...") + result = func(*args, **kwargs) + + print(f"[{step_name}] Saving checkpoint: {checkpoint_file.name}") + save_checkpoint(result, checkpoint_file) + + return result + + except Exception as e: + error_message = str(e) + raise + + finally: + if step_record is not None: + execution_log.complete_step( + step_record, from_cache=from_cache, error_message=error_message + ) + + +def run_with_checkpoint_multi( + step_name: str, + checkpoint_files: dict[str, Path], + func: Callable, + *args, + force: bool = False, + execution_log: ExecutionLog | None = None, + step_number: int | None = None, + log_parameters: dict[str, Any] | None = None, + **kwargs, +) -> dict[str, Any]: + """Execute a function that returns multiple outputs with checkpoint caching. 
+ + The function must return a dict with keys matching checkpoint_files. + + Parameters + ---------- + step_name : str + Human-readable name for logging + checkpoint_files : dict[str, Path] + Mapping of output names to checkpoint file paths + func : Callable + Function to execute (must return dict) + *args + Positional arguments for func + force : bool + If True, ignore existing checkpoints and re-run + execution_log : ExecutionLog, optional + Execution log to record step details + step_number : int, optional + Step number for logging + log_parameters : dict, optional + Parameters to record in the execution log + **kwargs + Keyword arguments for func + + Returns + ------- + dict[str, Any] + Dict of results keyed by output names + """ + # Start logging if provided + step_record = None + if execution_log is not None and step_number is not None: + checkpoint_files_str = {k: str(v) for k, v in checkpoint_files.items()} + step_record = execution_log.add_step( + step_number=step_number, + step_name=step_name, + parameters=log_parameters, + checkpoint_file=str(checkpoint_files_str), + ) + + from_cache = False + error_message = None + + try: + all_exist = all(fp.exists() for fp in checkpoint_files.values()) + + if all_exist and not force: + print(f"[{step_name}] Loading from checkpoints...") + from_cache = True + return {key: load_checkpoint(fp) for key, fp in checkpoint_files.items()} + + print(f"[{step_name}] Executing...") + results = func(*args, **kwargs) + + if not isinstance(results, dict): + raise TypeError( + f"Function must return dict for multi-checkpoint, got {type(results)}" + ) + + print(f"[{step_name}] Saving checkpoints...") + for key, filepath in checkpoint_files.items(): + if key not in results: + raise KeyError(f"Function result missing key: {key}") + save_checkpoint(results[key], filepath) + + return results + + except Exception as e: + error_message = str(e) + raise + + finally: + if step_record is not None: + execution_log.complete_step( + step_record, 
from_cache=from_cache, error_message=error_message + ) + + +def clear_checkpoints(checkpoint_dir: Path, pattern: str = "step*.") -> list[Path]: + """Remove checkpoint files matching a pattern. + + Parameters + ---------- + checkpoint_dir : Path + Directory containing checkpoints + pattern : str + Glob pattern for files to remove + + Returns + ------- + list[Path] + List of removed files + """ + removed = [] + for filepath in checkpoint_dir.glob(pattern + "*"): + filepath.unlink() + removed.append(filepath) + print(f"Removed: {filepath.name}") + return removed + + +def clear_checkpoints_from_step(checkpoint_dir: Path, from_step: int) -> list[Path]: + """Remove checkpoints from a specific step onwards. + + Useful for invalidating downstream results when re-running an upstream step. + + Parameters + ---------- + checkpoint_dir : Path + Directory containing checkpoints + from_step : int + Step number to start clearing from (inclusive) + + Returns + ------- + list[Path] + List of removed files + """ + removed = [] + for filepath in checkpoint_dir.glob("step*"): + try: + step_num = int(filepath.name.split("_")[0].replace("step", "")) + if step_num >= from_step: + filepath.unlink() + removed.append(filepath) + print(f"Removed: {filepath.name}") + except (ValueError, IndexError): + continue + return removed diff --git a/src/BetterCodeBetterScience/rnaseq/stateless_workflow/execution_log.py b/src/BetterCodeBetterScience/rnaseq/stateless_workflow/execution_log.py new file mode 100644 index 0000000..b496d96 --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/stateless_workflow/execution_log.py @@ -0,0 +1,202 @@ +"""Execution logging for stateless workflow. + +Tracks execution details including timing, parameters, and status for each step. 
+""" + +import json +from dataclasses import asdict, dataclass, field +from datetime import datetime +from pathlib import Path +from typing import Any + + +@dataclass +class StepRecord: + """Record of a single workflow step execution.""" + + step_number: int + step_name: str + start_time: str + end_time: str | None = None + duration_seconds: float | None = None + parameters: dict[str, Any] = field(default_factory=dict) + from_cache: bool = False + status: str = "running" + checkpoint_file: str | None = None + error_message: str | None = None + + +@dataclass +class ExecutionLog: + """Complete execution log for a workflow run.""" + + workflow_name: str + run_id: str + start_time: str + end_time: str | None = None + total_duration_seconds: float | None = None + status: str = "running" + steps: list[StepRecord] = field(default_factory=list) + workflow_parameters: dict[str, Any] = field(default_factory=dict) + + def add_step( + self, + step_number: int, + step_name: str, + parameters: dict[str, Any] | None = None, + checkpoint_file: str | None = None, + ) -> StepRecord: + """Add a new step record and return it.""" + record = StepRecord( + step_number=step_number, + step_name=step_name, + start_time=datetime.now().isoformat(), + parameters=parameters or {}, + checkpoint_file=checkpoint_file, + ) + self.steps.append(record) + return record + + def complete_step( + self, + record: StepRecord, + from_cache: bool = False, + error_message: str | None = None, + ) -> None: + """Mark a step as completed.""" + record.end_time = datetime.now().isoformat() + start = datetime.fromisoformat(record.start_time) + end = datetime.fromisoformat(record.end_time) + record.duration_seconds = (end - start).total_seconds() + record.from_cache = from_cache + record.status = "completed" if error_message is None else "failed" + record.error_message = error_message + + def complete(self, error_message: str | None = None) -> None: + """Mark the entire workflow as completed.""" + self.end_time = 
datetime.now().isoformat() + start = datetime.fromisoformat(self.start_time) + end = datetime.fromisoformat(self.end_time) + self.total_duration_seconds = (end - start).total_seconds() + self.status = "completed" if error_message is None else "failed" + + def to_dict(self) -> dict: + """Convert to dictionary for JSON serialization.""" + return { + "workflow_name": self.workflow_name, + "run_id": self.run_id, + "start_time": self.start_time, + "end_time": self.end_time, + "total_duration_seconds": self.total_duration_seconds, + "status": self.status, + "workflow_parameters": self.workflow_parameters, + "steps": [asdict(step) for step in self.steps], + } + + def save(self, log_dir: Path) -> Path: + """Save execution log to a date-stamped JSON file. + + Parameters + ---------- + log_dir : Path + Directory to save the log file + + Returns + ------- + Path + Path to the saved log file + """ + log_dir.mkdir(parents=True, exist_ok=True) + timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") + log_file = log_dir / f"execution_log_{timestamp}.json" + + with open(log_file, "w") as f: + json.dump(self.to_dict(), f, indent=2, default=str) + + return log_file + + def print_summary(self) -> None: + """Print a summary of the execution.""" + print("\n" + "=" * 60) + print("EXECUTION SUMMARY") + print("=" * 60) + print(f"Workflow: {self.workflow_name}") + print(f"Run ID: {self.run_id}") + print(f"Status: {self.status}") + if self.total_duration_seconds is not None: + print(f"Total Duration: {self.total_duration_seconds:.1f} seconds") + + print("\nStep Details:") + print("-" * 60) + for step in self.steps: + cache_indicator = " [cached]" if step.from_cache else "" + duration = ( + f"{step.duration_seconds:.1f}s" + if step.duration_seconds is not None + else "N/A" + ) + status_icon = "✓" if step.status == "completed" else "✗" + print( + f" {status_icon} Step {step.step_number}: {step.step_name:<25} " + f"{duration:>8}{cache_indicator}" + ) + print("-" * 60) + + +def 
create_execution_log( + workflow_name: str, + workflow_parameters: dict[str, Any] | None = None, +) -> ExecutionLog: + """Create a new execution log. + + Parameters + ---------- + workflow_name : str + Name of the workflow + workflow_parameters : dict, optional + Parameters passed to the workflow + + Returns + ------- + ExecutionLog + New execution log instance + """ + run_id = datetime.now().strftime("%Y%m%d_%H%M%S") + return ExecutionLog( + workflow_name=workflow_name, + run_id=run_id, + start_time=datetime.now().isoformat(), + workflow_parameters=workflow_parameters or {}, + ) + + +def serialize_parameters(**kwargs) -> dict[str, Any]: + """Convert parameters to JSON-serializable format. + + Handles common non-serializable types like Path objects. + + Parameters + ---------- + **kwargs + Parameters to serialize + + Returns + ------- + dict + Serialized parameters + """ + result = {} + for key, value in kwargs.items(): + if isinstance(value, Path): + result[key] = str(value) + elif hasattr(value, "tolist"): # numpy arrays + result[key] = value.tolist() + elif hasattr(value, "__dict__"): # objects + result[key] = str(type(value).__name__) + else: + try: + json.dumps(value) + result[key] = value + except (TypeError, ValueError): + result[key] = str(value) + return result diff --git a/src/BetterCodeBetterScience/rnaseq/stateless_workflow/run_workflow.py b/src/BetterCodeBetterScience/rnaseq/stateless_workflow/run_workflow.py new file mode 100644 index 0000000..76d4072 --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/stateless_workflow/run_workflow.py @@ -0,0 +1,650 @@ +"""Stateless workflow runner for scRNA-seq immune aging analysis. + +This script orchestrates the complete analysis workflow using checkpointing +to enable stateless execution and resumption from any step. 
+""" + +import os +from pathlib import Path + +from dotenv import load_dotenv + +from BetterCodeBetterScience.rnaseq.modular_workflow.clustering import ( + run_clustering_pipeline, +) +from BetterCodeBetterScience.rnaseq.modular_workflow.data_filtering import ( + run_filtering_pipeline, +) +from BetterCodeBetterScience.rnaseq.modular_workflow.data_loading import ( + download_data, + load_lazy_anndata, +) +from BetterCodeBetterScience.rnaseq.modular_workflow.differential_expression import ( + run_differential_expression_pipeline, +) +from BetterCodeBetterScience.rnaseq.modular_workflow.dimensionality_reduction import ( + run_dimensionality_reduction_pipeline, +) +from BetterCodeBetterScience.rnaseq.modular_workflow.overrepresentation_analysis import ( + run_overrepresentation_pipeline, +) +from BetterCodeBetterScience.rnaseq.modular_workflow.pathway_analysis import ( + run_gsea_pipeline, +) +from BetterCodeBetterScience.rnaseq.modular_workflow.predictive_modeling import ( + run_predictive_modeling_pipeline, +) +from BetterCodeBetterScience.rnaseq.modular_workflow.preprocessing import ( + run_preprocessing_pipeline, +) +from BetterCodeBetterScience.rnaseq.modular_workflow.pseudobulk import ( + run_pseudobulk_pipeline, +) +from BetterCodeBetterScience.rnaseq.modular_workflow.quality_control import ( + run_qc_pipeline, +) +from BetterCodeBetterScience.rnaseq.stateless_workflow.checkpoint import ( + clear_checkpoints_from_step, + run_with_checkpoint, + run_with_checkpoint_multi, +) +from BetterCodeBetterScience.rnaseq.stateless_workflow.execution_log import ( + ExecutionLog, + create_execution_log, + serialize_parameters, +) + + +def _run_differential_expression_as_dict( + pb_adata, + cell_type, + design_factors, + var_to_feature, + n_cpus, +): + """Wrapper to return DE results as dict for checkpointing.""" + stat_res, de_results, counts_df = run_differential_expression_pipeline( + pb_adata, + cell_type=cell_type, + design_factors=design_factors, + 
var_to_feature=var_to_feature, + n_cpus=n_cpus, + ) + return { + "stat_res": stat_res, + "de_results": de_results, + "counts_df": counts_df, + } + + +def _run_overrepresentation_as_dict( + de_results, + gene_sets, + padj_threshold, + n_top, + figure_dir, +): + """Wrapper to return enrichment results as dict for checkpointing.""" + enr_up, enr_down = run_overrepresentation_pipeline( + de_results, + gene_sets=gene_sets, + padj_threshold=padj_threshold, + n_top=n_top, + figure_dir=figure_dir, + ) + return { + "enr_up": enr_up, + "enr_down": enr_down, + } + + +def run_stateless_workflow( + datadir: Path, + dataset_name: str = "OneK1K", + url: str = "https://datasets.cellxgene.cziscience.com/a3f5651f-cd1a-4d26-8165-74964b79b4f2.h5ad", + cell_type_for_de: str = "central memory CD4-positive, alpha-beta T cell", + force_from_step: int | None = None, +) -> dict: + """Run the complete immune aging scRNA-seq analysis workflow with checkpointing. + + Each step saves its output to a checkpoint file. On subsequent runs, + completed steps are skipped by loading from checkpoints. 
+ + Parameters + ---------- + datadir : Path + Base directory for data files + dataset_name : str + Name of the dataset + url : str + URL to download data from + cell_type_for_de : str + Cell type to use for differential expression + force_from_step : int, optional + If provided, clears checkpoints from this step onwards and re-runs + + Returns + ------- + dict + Dictionary containing all results + """ + # Setup directories + figure_dir = datadir / "workflow/figures" + figure_dir.mkdir(parents=True, exist_ok=True) + + checkpoint_dir = datadir / "workflow/checkpoints" + checkpoint_dir.mkdir(parents=True, exist_ok=True) + + log_dir = datadir / "workflow/logs" + log_dir.mkdir(parents=True, exist_ok=True) + + # Clear downstream checkpoints if forcing re-run + if force_from_step is not None: + print(f"\nClearing checkpoints from step {force_from_step} onwards...") + clear_checkpoints_from_step(checkpoint_dir, force_from_step) + + # Initialize execution log + execution_log = create_execution_log( + workflow_name="immune_aging_scrnaseq", + workflow_parameters=serialize_parameters( + datadir=datadir, + dataset_name=dataset_name, + url=url, + cell_type_for_de=cell_type_for_de, + force_from_step=force_from_step, + ), + ) + + results = {} + error_occurred = None + + try: + # ===================================================================== + # STEP 1: Data Download + # ===================================================================== + print("\n" + "=" * 60) + print("STEP 1: DATA DOWNLOAD") + print("=" * 60) + + datafile = datadir / f"dataset-{dataset_name}_subset-immune_raw.h5ad" + + # Log step 1 manually (no checkpoint wrapper for download) + # Check for the file before download_data() runs, so from_cache is accurate + already_downloaded = datafile.exists() + step1_record = execution_log.add_step( + step_number=1, + step_name="data_download", + parameters=serialize_parameters(datafile=datafile, url=url), + ) + download_data(datafile, url) + execution_log.complete_step(step1_record, from_cache=already_downloaded) + + # 
===================================================================== + # STEP 2: Data Filtering + # ===================================================================== + print("\n" + "=" * 60) + print("STEP 2: DATA FILTERING") + print("=" * 60) + + step2_params = { + "cutoff_percentile": 1.0, + "min_cells_per_celltype": 10, + "percent_donors": 0.95, + } + + def _load_and_filter(): + adata = load_lazy_anndata(datafile) + print(f"Loaded dataset: {adata}") + return run_filtering_pipeline( + adata, + cutoff_percentile=step2_params["cutoff_percentile"], + min_cells_per_celltype=step2_params["min_cells_per_celltype"], + percent_donors=step2_params["percent_donors"], + figure_dir=figure_dir, + ) + + adata = run_with_checkpoint( + "filtering", + checkpoint_dir / "step02_filtered.h5ad", + _load_and_filter, + execution_log=execution_log, + step_number=2, + log_parameters=step2_params, + ) + print(f"Dataset after filtering: {adata}") + + # Build var_to_feature mapping + var_to_feature = dict(zip(adata.var_names, adata.var["feature_name"])) + + # ===================================================================== + # STEP 3: Quality Control + # ===================================================================== + print("\n" + "=" * 60) + print("STEP 3: QUALITY CONTROL") + print("=" * 60) + + step3_params = { + "min_genes": 200, + "max_genes": 6000, + "min_counts": 500, + "max_counts": 30000, + "max_hb_pct": 5.0, + "expected_doublet_rate": 0.06, + } + + adata = run_with_checkpoint( + "quality_control", + checkpoint_dir / "step03_qc.h5ad", + run_qc_pipeline, + adata, + min_genes=step3_params["min_genes"], + max_genes=step3_params["max_genes"], + min_counts=step3_params["min_counts"], + max_counts=step3_params["max_counts"], + max_hb_pct=step3_params["max_hb_pct"], + expected_doublet_rate=step3_params["expected_doublet_rate"], + figure_dir=figure_dir, + execution_log=execution_log, + step_number=3, + log_parameters=step3_params, + ) + print(f"Dataset after QC: {adata}") + 
+ # ===================================================================== + # STEP 4: Preprocessing + # ===================================================================== + print("\n" + "=" * 60) + print("STEP 4: PREPROCESSING") + print("=" * 60) + + step4_params = { + "target_sum": 1e4, + "n_top_genes": 3000, + "batch_key": "donor_id", + } + + adata = run_with_checkpoint( + "preprocessing", + checkpoint_dir / "step04_preprocessed.h5ad", + run_preprocessing_pipeline, + adata, + target_sum=step4_params["target_sum"], + n_top_genes=step4_params["n_top_genes"], + batch_key=step4_params["batch_key"], + execution_log=execution_log, + step_number=4, + log_parameters=step4_params, + ) + + # ===================================================================== + # STEP 5: Dimensionality Reduction + # ===================================================================== + print("\n" + "=" * 60) + print("STEP 5: DIMENSIONALITY REDUCTION") + print("=" * 60) + + step5_params = { + "batch_key": "donor_id", + "n_neighbors": 30, + "n_pcs": 40, + } + + adata = run_with_checkpoint( + "dimensionality_reduction", + checkpoint_dir / "step05_dimreduced.h5ad", + run_dimensionality_reduction_pipeline, + adata, + batch_key=step5_params["batch_key"], + n_neighbors=step5_params["n_neighbors"], + n_pcs=step5_params["n_pcs"], + figure_dir=figure_dir, + execution_log=execution_log, + step_number=5, + log_parameters=step5_params, + ) + + # ===================================================================== + # STEP 6: Clustering + # ===================================================================== + print("\n" + "=" * 60) + print("STEP 6: CLUSTERING") + print("=" * 60) + + step6_params = {"resolution": 1.0} + + adata = run_with_checkpoint( + "clustering", + checkpoint_dir / "step06_clustered.h5ad", + run_clustering_pipeline, + adata, + resolution=step6_params["resolution"], + figure_dir=figure_dir, + execution_log=execution_log, + step_number=6, + log_parameters=step6_params, + ) + + 
results["adata"] = adata + + # ===================================================================== + # STEP 7: Pseudobulking + # ===================================================================== + print("\n" + "=" * 60) + print("STEP 7: PSEUDOBULKING") + print("=" * 60) + + step7_params = { + "group_col": "cell_type", + "donor_col": "donor_id", + "metadata_cols": ["development_stage", "sex"], + "min_cells": 10, + } + + pb_adata = run_with_checkpoint( + "pseudobulking", + checkpoint_dir / "step07_pseudobulk.h5ad", + run_pseudobulk_pipeline, + adata, + group_col=step7_params["group_col"], + donor_col=step7_params["donor_col"], + metadata_cols=step7_params["metadata_cols"], + min_cells=step7_params["min_cells"], + figure_dir=figure_dir, + execution_log=execution_log, + step_number=7, + log_parameters=step7_params, + ) + + results["pb_adata"] = pb_adata + + # ===================================================================== + # STEP 8: Differential Expression + # ===================================================================== + print("\n" + "=" * 60) + print("STEP 8: DIFFERENTIAL EXPRESSION") + print("=" * 60) + + step8_params = { + "cell_type": cell_type_for_de, + "design_factors": ["age_scaled", "sex"], + "n_cpus": 8, + } + + de_outputs = run_with_checkpoint_multi( + "differential_expression", + { + "stat_res": checkpoint_dir / "step08_stat_res.pkl", + "de_results": checkpoint_dir / "step08_de_results.parquet", + "counts_df": checkpoint_dir / "step08_counts.parquet", + }, + _run_differential_expression_as_dict, + pb_adata, + cell_type=step8_params["cell_type"], + design_factors=step8_params["design_factors"], + var_to_feature=var_to_feature, + n_cpus=step8_params["n_cpus"], + execution_log=execution_log, + step_number=8, + log_parameters=step8_params, + ) + + de_results = de_outputs["de_results"] + counts_df_ct = de_outputs["counts_df"] + + results["stat_res"] = de_outputs["stat_res"] + results["de_results"] = de_results + results["counts_df"] = 
counts_df_ct + + # ===================================================================== + # STEP 9: Pathway Analysis (GSEA) + # ===================================================================== + print("\n" + "=" * 60) + print("STEP 9: PATHWAY ANALYSIS (GSEA)") + print("=" * 60) + + step9_params = { + "gene_sets": ["MSigDB_Hallmark_2020"], + "n_top": 10, + } + + gsea_results = run_with_checkpoint( + "gsea", + checkpoint_dir / "step09_gsea.pkl", + run_gsea_pipeline, + de_results, + gene_sets=step9_params["gene_sets"], + n_top=step9_params["n_top"], + figure_dir=figure_dir, + execution_log=execution_log, + step_number=9, + log_parameters=step9_params, + ) + + results["gsea"] = gsea_results + + # ===================================================================== + # STEP 10: Overrepresentation Analysis (Enrichr) + # ===================================================================== + print("\n" + "=" * 60) + print("STEP 10: OVERREPRESENTATION ANALYSIS (Enrichr)") + print("=" * 60) + + step10_params = { + "gene_sets": ["MSigDB_Hallmark_2020"], + "padj_threshold": 0.05, + "n_top": 10, + } + + enr_outputs = run_with_checkpoint_multi( + "overrepresentation", + { + "enr_up": checkpoint_dir / "step10_enr_up.pkl", + "enr_down": checkpoint_dir / "step10_enr_down.pkl", + }, + _run_overrepresentation_as_dict, + de_results, + gene_sets=step10_params["gene_sets"], + padj_threshold=step10_params["padj_threshold"], + n_top=step10_params["n_top"], + figure_dir=figure_dir, + execution_log=execution_log, + step_number=10, + log_parameters=step10_params, + ) + + results["enrichr_up"] = enr_outputs["enr_up"] + results["enrichr_down"] = enr_outputs["enr_down"] + + # ===================================================================== + # STEP 11: Predictive Modeling + # ===================================================================== + print("\n" + "=" * 60) + print("STEP 11: PREDICTIVE MODELING") + print("=" * 60) + + # Get metadata for the cell type + pb_adata_ct = 
pb_adata[pb_adata.obs["cell_type"] == cell_type_for_de].copy() + pb_adata_ct.obs["age"] = ( + pb_adata_ct.obs["development_stage"] + .str.extract(r"(\d+)-year-old")[0] + .astype(float) + ) + metadata_ct = pb_adata_ct.obs.copy() + + step11_params = {"n_splits": 5} + + prediction_results = run_with_checkpoint( + "predictive_modeling", + checkpoint_dir / "step11_prediction.pkl", + run_predictive_modeling_pipeline, + counts_df_ct, + metadata_ct, + n_splits=step11_params["n_splits"], + figure_dir=figure_dir, + execution_log=execution_log, + step_number=11, + log_parameters=step11_params, + ) + + results["prediction"] = prediction_results + + except Exception as e: + error_occurred = str(e) + raise + + finally: + # Complete and save execution log + execution_log.complete(error_message=error_occurred) + log_file = execution_log.save(log_dir) + execution_log.print_summary() + print(f"\nExecution log saved to: {log_file}") + + print("\n" + "=" * 60) + print("WORKFLOW COMPLETE") + print("=" * 60) + print(f"Figures saved to: {figure_dir}") + print(f"Checkpoints saved to: {checkpoint_dir}") + + return results + + +def list_checkpoints(datadir: Path) -> list[tuple[str, Path]]: + """List all checkpoint files in the workflow directory. + + Parameters + ---------- + datadir : Path + Base directory for data files + + Returns + ------- + list[tuple[str, Path]] + List of (step_name, file_path) tuples + """ + checkpoint_dir = datadir / "workflow/checkpoints" + if not checkpoint_dir.exists(): + return [] + + checkpoints = [] + for filepath in sorted(checkpoint_dir.glob("step*")): + step_name = ( + filepath.stem.split("_", 1)[1] if "_" in filepath.stem else filepath.stem + ) + checkpoints.append((step_name, filepath)) + + return checkpoints + + +def print_checkpoint_status(datadir: Path) -> None: + """Print the status of all workflow checkpoints. 
+ + Parameters + ---------- + datadir : Path + Base directory for data files + """ + checkpoints = list_checkpoints(datadir) + + if not checkpoints: + print("No checkpoints found.") + return + + print("\nCheckpoint Status:") + print("-" * 50) + for step_name, filepath in checkpoints: + size_mb = filepath.stat().st_size / (1024 * 1024) + print(f" {step_name:<30} ({size_mb:.1f} MB)") + print("-" * 50) + + +def list_execution_logs(datadir: Path) -> list[Path]: + """List all execution log files. + + Parameters + ---------- + datadir : Path + Base directory for data files + + Returns + ------- + list[Path] + List of log file paths, sorted by date (newest first) + """ + log_dir = datadir / "workflow/logs" + if not log_dir.exists(): + return [] + + return sorted(log_dir.glob("execution_log_*.json"), reverse=True) + + +def load_execution_log(log_file: Path) -> ExecutionLog: + """Load an execution log from a JSON file. + + Parameters + ---------- + log_file : Path + Path to the log file + + Returns + ------- + ExecutionLog + Loaded execution log + """ + import json + + from BetterCodeBetterScience.rnaseq.stateless_workflow.execution_log import ( + StepRecord, + ) + + with open(log_file) as f: + data = json.load(f) + + log = ExecutionLog( + workflow_name=data["workflow_name"], + run_id=data["run_id"], + start_time=data["start_time"], + end_time=data.get("end_time"), + total_duration_seconds=data.get("total_duration_seconds"), + status=data.get("status", "unknown"), + workflow_parameters=data.get("workflow_parameters", {}), + ) + + for step_data in data.get("steps", []): + step = StepRecord( + step_number=step_data["step_number"], + step_name=step_data["step_name"], + start_time=step_data["start_time"], + end_time=step_data.get("end_time"), + duration_seconds=step_data.get("duration_seconds"), + parameters=step_data.get("parameters", {}), + from_cache=step_data.get("from_cache", False), + status=step_data.get("status", "unknown"), + 
checkpoint_file=step_data.get("checkpoint_file"), + error_message=step_data.get("error_message"), + ) + log.steps.append(step) + + return log + + +if __name__ == "__main__": + load_dotenv() + + datadir_env = os.getenv("DATADIR") + if datadir_env is None: + raise ValueError("DATADIR environment variable not set") + + datadir = Path(datadir_env) / "immune_aging" + + # Print current checkpoint status + print_checkpoint_status(datadir) + + # Show recent execution logs + logs = list_execution_logs(datadir) + if logs: + print(f"\nRecent execution logs: {len(logs)} found") + for log_path in logs[:3]: + print(f" {log_path.name}") + + # Run the workflow (will resume from last checkpoint) + results = run_stateless_workflow(datadir) From e7d52c47281b109d886f8dff1e3d82e20d6383e4 Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Sun, 21 Dec 2025 11:36:58 -0800 Subject: [PATCH 38/87] Fix checkpoint logging to correctly handle interrupts during save MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The checkpoint wrapper was incorrectly marking steps as "completed" even when saving was interrupted (e.g., by Ctrl+C). This happened because: 1. KeyboardInterrupt/SystemExit are BaseException, not Exception 2. The `except Exception` block didn't catch them 3. 
The finally block would mark the step as completed with no error Now we: - Track `save_succeeded` flag, only set True after save completes - Catch BaseException to properly log all interruptions - Only report success when checkpoint is actually persisted 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 --- .../rnaseq/stateless_workflow/checkpoint.py | 21 ++++++++++++++----- 1 file changed, 16 insertions(+), 5 deletions(-) diff --git a/src/BetterCodeBetterScience/rnaseq/stateless_workflow/checkpoint.py b/src/BetterCodeBetterScience/rnaseq/stateless_workflow/checkpoint.py index 11d2eba..ebac03c 100644 --- a/src/BetterCodeBetterScience/rnaseq/stateless_workflow/checkpoint.py +++ b/src/BetterCodeBetterScience/rnaseq/stateless_workflow/checkpoint.py @@ -173,23 +173,28 @@ def run_with_checkpoint( from_cache = False error_message = None + save_succeeded = False try: if checkpoint_file.exists() and not force: print(f"[{step_name}] Loading from checkpoint: {checkpoint_file.name}") from_cache = True result = load_checkpoint(checkpoint_file) + save_succeeded = True # Loading counts as success else: print(f"[{step_name}] Executing...") result = func(*args, **kwargs) print(f"[{step_name}] Saving checkpoint: {checkpoint_file.name}") save_checkpoint(result, checkpoint_file) + save_succeeded = True # Only mark success after save completes return result - except Exception as e: - error_message = str(e) + except BaseException as e: + # Catch all exceptions including KeyboardInterrupt, SystemExit + if not save_succeeded: + error_message = f"{type(e).__name__}: {e}" raise finally: @@ -253,6 +258,7 @@ def run_with_checkpoint_multi( from_cache = False error_message = None + save_succeeded = False try: all_exist = all(fp.exists() for fp in checkpoint_files.values()) @@ -260,7 +266,9 @@ def run_with_checkpoint_multi( if all_exist and not force: print(f"[{step_name}] Loading from checkpoints...") from_cache = True - return {key: 
load_checkpoint(fp) for key, fp in checkpoint_files.items()} + result = {key: load_checkpoint(fp) for key, fp in checkpoint_files.items()} + save_succeeded = True # Loading counts as success + return result print(f"[{step_name}] Executing...") results = func(*args, **kwargs) @@ -276,10 +284,13 @@ def run_with_checkpoint_multi( raise KeyError(f"Function result missing key: {key}") save_checkpoint(results[key], filepath) + save_succeeded = True # Only mark success after all saves complete return results - except Exception as e: - error_message = str(e) + except BaseException as e: + # Catch all exceptions including KeyboardInterrupt, SystemExit + if not save_succeeded: + error_message = f"{type(e).__name__}: {e}" raise finally: From 8df37e526dee6d7f48f386ae8f906a3601a9369e Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Sun, 21 Dec 2025 16:05:57 -0800 Subject: [PATCH 39/87] Optimize checkpoint storage and add selective checkpointing MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Add BIDS naming for checkpoint files (dataset-{name}_step-{num}_desc-{desc}.ext) - Enable gzip compression for h5ad checkpoint files - Remove redundant layers["counts"] storage by loading raw counts from step 3 checkpoint during pseudobulking (step 7) - Add checkpoint_steps parameter to control which steps save checkpoints (default: {2, 3, 5} for filtering, QC, and dimensionality reduction) - Add skip_save parameter to run_with_checkpoint() functions - Update pseudobulk pipeline to accept layer parameter These changes reduce checkpoint storage by ~50% for steps 3-6 and allow users to checkpoint only expensive steps. 
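As a rough illustration of the naming and selective-checkpointing behaviour described above, here is a minimal, standalone sketch. The two naming helpers mirror the format string and regex added to checkpoint.py in this patch; `should_skip_save` is a hypothetical helper used only for illustration (the patch itself inlines the check as `skip_save=<step> not in checkpoint_steps` at each call site).

```python
import re


def bids_checkpoint_name(dataset_name, step_number, description, extension="h5ad"):
    # Same format string as the helper added to checkpoint.py in this patch.
    return f"dataset-{dataset_name}_step-{step_number:02d}_desc-{description}.{extension}"


def parse_bids_checkpoint_name(filename):
    # Same pattern as in checkpoint.py; returns {} for names that do not match.
    match = re.match(r"dataset-([^_]+)_step-(\d+)_desc-([^.]+)\.(.+)", filename)
    if not match:
        return {}
    return {
        "dataset": match.group(1),
        "step_number": int(match.group(2)),
        "description": match.group(3),
        "extension": match.group(4),
    }


def should_skip_save(step_number, checkpoint_steps):
    # Hypothetical helper (not part of the patch): a step skips saving
    # when it is not listed in checkpoint_steps.
    return step_number not in checkpoint_steps


name = bids_checkpoint_name("OneK1K", 2, "filtered")
print(name)                                              # BIDS-style filename
print(parse_bids_checkpoint_name(name))                  # parsed entities
print(parse_bids_checkpoint_name("step02_filtered.h5ad"))  # legacy name -> {}

# Steps that would save a checkpoint under the default {2, 3, 5}
saving_steps = [s for s in range(2, 8) if not should_skip_save(s, {2, 3, 5})]
print(saving_steps)
```

One thing to keep in mind with this pattern: the dataset entity is matched with `[^_]+`, so a dataset name containing an underscore would not parse as BIDS and would fall through to the legacy-name check in `clear_checkpoints_from_step()`.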
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 --- problems_to_solve.md | 45 +++++++ .../rnaseq/modular_workflow/pseudobulk.py | 5 +- .../modular_workflow/quality_control.py | 5 +- .../rnaseq/stateless_workflow/checkpoint.py | 117 ++++++++++++++---- .../rnaseq/stateless_workflow/run_workflow.py | 114 +++++++++++++---- 5 files changed, 237 insertions(+), 49 deletions(-) create mode 100644 problems_to_solve.md diff --git a/problems_to_solve.md b/problems_to_solve.md new file mode 100644 index 0000000..14c82c4 --- /dev/null +++ b/problems_to_solve.md @@ -0,0 +1,45 @@ +## Problems to be fixed + +Open problems marked with [ ] +Fixed problems marked with [x] + + +[x] Please change the file naming scheme for the checkpoint files to use a BIDS schema, just like the downloaded data. + - Implemented `bids_checkpoint_name()` and `parse_bids_checkpoint_name()` functions + - Checkpoint files now use format: `dataset-{name}_step-{number}_desc-{description}.{extension}` + - Updated `list_checkpoints()` and `clear_checkpoints_from_step()` to support both BIDS and legacy naming + +[x] Please save the checkpoint h5ad files using compression='gzip' + - Added `compression="gzip"` to `data.write()` in `save_checkpoint()` function + +[x] The size of the checkpoint files is very large, I think in part due to their storage of the original counts within the .X variable in the dataset. However, I'm not sure if and when that's actually necessary, versus simply reloading the original data or an earlier checkpoint to re-populate that variable. Please examine the usage of this .X variable and determine whether it would make more sense to remove it for the sake of space and then reload if needed from an earlier checkpoint. + +### Analysis of .X variable usage: + +The workflow uses two main data storage locations in AnnData: + +1. 
**`.X`** - The main expression matrix: + - Steps 2-3: Contains raw counts + - Steps 4-6: Contains normalized, log-transformed expression data (after preprocessing) + - Used for: QC metrics, normalization, HVG selection, PCA, neighbor graph, UMAP, clustering + +2. **`layers["counts"]`** - Raw counts layer (no longer used): + - Previously created at end of Step 3 (QC) - now removed + - Step 7 (pseudobulking) now loads step 3 checkpoint directly to get raw counts from `.X` + +**Optimization implemented:** +- Removed `layers["counts"]` creation from step 3 (QC) +- Step 7 (pseudobulking) now loads the step 3 checkpoint to get raw counts from `.X` +- This eliminates redundant storage of raw counts in `layers["counts"]` for steps 3-6 checkpoints + +**Storage savings:** +- Steps 3-6 checkpoints now store only `.X` (not both `.X` and `layers["counts"]`) +- Combined with gzip compression, this roughly halves the expression data storage for these checkpoints + +After Step 7, the pseudobulk AnnData is a separate object with only aggregated counts in `.X` (no layers needed). Steps 8-11 use pickle/parquet files, not h5ad. 
+ +**Selective checkpointing:** +- Added `checkpoint_steps` parameter to `run_stateless_workflow()` (default: `{2, 3, 5}`) +- Only specified steps save checkpoints; other steps run without saving +- Step 3 is always required (provides raw counts for pseudobulking) +- Added `skip_save` parameter to `run_with_checkpoint()` and `run_with_checkpoint_multi()` diff --git a/src/BetterCodeBetterScience/rnaseq/modular_workflow/pseudobulk.py b/src/BetterCodeBetterScience/rnaseq/modular_workflow/pseudobulk.py index 3fd1a92..14f64e7 100644 --- a/src/BetterCodeBetterScience/rnaseq/modular_workflow/pseudobulk.py +++ b/src/BetterCodeBetterScience/rnaseq/modular_workflow/pseudobulk.py @@ -157,6 +157,7 @@ def run_pseudobulk_pipeline( metadata_cols: list[str] | None = None, min_cells: int = 10, figure_dir: Path | None = None, + layer: str | None = None, ) -> ad.AnnData: """Run complete pseudobulking pipeline. @@ -174,6 +175,8 @@ def run_pseudobulk_pipeline( Minimum cells per pseudobulk sample figure_dir : Path, optional Directory to save figures + layer : str, optional + Layer to use for counts. If None, uses .X directly. Returns ------- @@ -188,7 +191,7 @@ def run_pseudobulk_pipeline( adata, group_col=group_col, donor_col=donor_col, - layer="counts", + layer=layer, metadata_cols=metadata_cols, ) diff --git a/src/BetterCodeBetterScience/rnaseq/modular_workflow/quality_control.py b/src/BetterCodeBetterScience/rnaseq/modular_workflow/quality_control.py index 7115679..8200e37 100644 --- a/src/BetterCodeBetterScience/rnaseq/modular_workflow/quality_control.py +++ b/src/BetterCodeBetterScience/rnaseq/modular_workflow/quality_control.py @@ -316,7 +316,8 @@ def run_qc_pipeline( adata = detect_doublets_per_donor(adata, expected_doublet_rate) adata = filter_doublets(adata) - # Save raw counts for later use - adata.layers["counts"] = adata.X.copy() + # Note: Raw counts remain in .X at this point. + # They will be accessed from this checkpoint during pseudobulking (step 7). 
+ # This avoids redundant storage of counts in layers["counts"] for steps 4-6. return adata diff --git a/src/BetterCodeBetterScience/rnaseq/stateless_workflow/checkpoint.py b/src/BetterCodeBetterScience/rnaseq/stateless_workflow/checkpoint.py index ebac03c..08459ba 100644 --- a/src/BetterCodeBetterScience/rnaseq/stateless_workflow/checkpoint.py +++ b/src/BetterCodeBetterScience/rnaseq/stateless_workflow/checkpoint.py @@ -22,6 +22,60 @@ ) +def bids_checkpoint_name( + dataset_name: str, + step_number: int, + description: str, + extension: str = "h5ad", +) -> str: + """Generate a BIDS-compliant checkpoint filename. + + Parameters + ---------- + dataset_name : str + Name of the dataset (e.g., "OneK1K") + step_number : int + Step number in the workflow + description : str + Description of the checkpoint content (e.g., "filtered", "qc", "preprocessed") + extension : str + File extension without the dot (e.g., "h5ad", "pkl", "parquet") + + Returns + ------- + str + BIDS-formatted filename like "dataset-OneK1K_step-02_desc-filtered.h5ad" + """ + return f"dataset-{dataset_name}_step-{step_number:02d}_desc-{description}.{extension}" + + +def parse_bids_checkpoint_name(filename: str) -> dict[str, str | int]: + """Parse a BIDS-formatted checkpoint filename. + + Parameters + ---------- + filename : str + BIDS-formatted filename + + Returns + ------- + dict + Dictionary with keys: dataset, step_number, description, extension + """ + import re + + pattern = r"dataset-([^_]+)_step-(\d+)_desc-([^.]+)\.(.+)" + match = re.match(pattern, filename) + if match: + return { + "dataset": match.group(1), + "step_number": int(match.group(2)), + "description": match.group(3), + "extension": match.group(4), + } + return {} + + def get_file_type(filepath: Path) -> str: """Determine file type from extension. 
@@ -65,7 +119,7 @@ def save_checkpoint(data: Any, filepath: Path) -> None: if file_type == "h5ad": if not isinstance(data, ad.AnnData): raise TypeError(f"Expected AnnData for .h5ad file, got {type(data)}") - data.write(filepath) + data.write(filepath, compression="gzip") elif file_type == "parquet": if not isinstance(data, pd.DataFrame): raise TypeError(f"Expected DataFrame for .parquet file, got {type(data)}") @@ -124,6 +178,7 @@ def run_with_checkpoint( func: Callable, *args, force: bool = False, + skip_save: bool = False, execution_log: ExecutionLog | None = None, step_number: int | None = None, log_parameters: dict[str, Any] | None = None, @@ -133,7 +188,7 @@ def run_with_checkpoint( If the checkpoint file exists and force=False, loads and returns the cached result. Otherwise, executes the function, saves the - result, and returns it. + result (unless skip_save=True), and returns it. Parameters ---------- @@ -147,6 +202,8 @@ def run_with_checkpoint( Positional arguments for func force : bool If True, ignore existing checkpoint and re-run + skip_save : bool + If True, don't save checkpoint after running (still loads if exists) execution_log : ExecutionLog, optional Execution log to record step details step_number : int, optional @@ -168,7 +225,7 @@ def run_with_checkpoint( step_number=step_number, step_name=step_name, parameters=log_parameters, - checkpoint_file=str(checkpoint_file), + checkpoint_file=str(checkpoint_file) if not skip_save else None, ) from_cache = False @@ -185,9 +242,10 @@ def run_with_checkpoint( print(f"[{step_name}] Executing...") result = func(*args, **kwargs) - print(f"[{step_name}] Saving checkpoint: {checkpoint_file.name}") - save_checkpoint(result, checkpoint_file) - save_succeeded = True # Only mark success after save completes + if not skip_save: + print(f"[{step_name}] Saving checkpoint: {checkpoint_file.name}") + save_checkpoint(result, checkpoint_file) + save_succeeded = True # Mark success after execution (and optional save) 
return result @@ -210,6 +268,7 @@ def run_with_checkpoint_multi( func: Callable, *args, force: bool = False, + skip_save: bool = False, execution_log: ExecutionLog | None = None, step_number: int | None = None, log_parameters: dict[str, Any] | None = None, @@ -231,6 +290,8 @@ def run_with_checkpoint_multi( Positional arguments for func force : bool If True, ignore existing checkpoints and re-run + skip_save : bool + If True, don't save checkpoints after running (still loads if exists) execution_log : ExecutionLog, optional Execution log to record step details step_number : int, optional @@ -253,7 +314,7 @@ def run_with_checkpoint_multi( step_number=step_number, step_name=step_name, parameters=log_parameters, - checkpoint_file=str(checkpoint_files_str), + checkpoint_file=str(checkpoint_files_str) if not skip_save else None, ) from_cache = False @@ -278,13 +339,14 @@ def run_with_checkpoint_multi( f"Function must return dict for multi-checkpoint, got {type(results)}" ) - print(f"[{step_name}] Saving checkpoints...") - for key, filepath in checkpoint_files.items(): - if key not in results: - raise KeyError(f"Function result missing key: {key}") - save_checkpoint(results[key], filepath) + if not skip_save: + print(f"[{step_name}] Saving checkpoints...") + for key, filepath in checkpoint_files.items(): + if key not in results: + raise KeyError(f"Function result missing key: {key}") + save_checkpoint(results[key], filepath) - save_succeeded = True # Only mark success after all saves complete + save_succeeded = True # Mark success after execution (and optional saves) return results except BaseException as e: @@ -327,6 +389,7 @@ def clear_checkpoints_from_step(checkpoint_dir: Path, from_step: int) -> list[Pa """Remove checkpoints from a specific step onwards. Useful for invalidating downstream results when re-running an upstream step. + Supports both legacy naming (step02_*) and BIDS naming (dataset-*_step-02_*). 
Parameters ---------- @@ -341,13 +404,23 @@ def clear_checkpoints_from_step(checkpoint_dir: Path, from_step: int) -> list[Pa List of removed files """ removed = [] - for filepath in checkpoint_dir.glob("step*"): - try: - step_num = int(filepath.name.split("_")[0].replace("step", "")) - if step_num >= from_step: - filepath.unlink() - removed.append(filepath) - print(f"Removed: {filepath.name}") - except (ValueError, IndexError): - continue + for filepath in checkpoint_dir.glob("*"): + step_num = None + # Try BIDS format first + parsed = parse_bids_checkpoint_name(filepath.name) + if parsed: + step_num = parsed["step_number"] + else: + # Try legacy format (step02_*) + try: + if filepath.name.startswith("step"): + step_num = int(filepath.name.split("_")[0].replace("step", "")) + except (ValueError, IndexError): + pass + + if step_num is not None and step_num >= from_step: + filepath.unlink() + removed.append(filepath) + print(f"Removed: {filepath.name}") + return removed diff --git a/src/BetterCodeBetterScience/rnaseq/stateless_workflow/run_workflow.py b/src/BetterCodeBetterScience/rnaseq/stateless_workflow/run_workflow.py index 76d4072..28c0e08 100644 --- a/src/BetterCodeBetterScience/rnaseq/stateless_workflow/run_workflow.py +++ b/src/BetterCodeBetterScience/rnaseq/stateless_workflow/run_workflow.py @@ -44,7 +44,10 @@ run_qc_pipeline, ) from BetterCodeBetterScience.rnaseq.stateless_workflow.checkpoint import ( + bids_checkpoint_name, clear_checkpoints_from_step, + load_checkpoint, + parse_bids_checkpoint_name, run_with_checkpoint, run_with_checkpoint_multi, ) @@ -98,17 +101,21 @@ def _run_overrepresentation_as_dict( } +DEFAULT_CHECKPOINT_STEPS = frozenset({2, 3, 5}) + + def run_stateless_workflow( datadir: Path, dataset_name: str = "OneK1K", url: str = "https://datasets.cellxgene.cziscience.com/a3f5651f-cd1a-4d26-8165-74964b79b4f2.h5ad", cell_type_for_de: str = "central memory CD4-positive, alpha-beta T cell", force_from_step: int | None = None, + checkpoint_steps: 
set[int] | None = None, ) -> dict: """Run the complete immune aging scRNA-seq analysis workflow with checkpointing. - Each step saves its output to a checkpoint file. On subsequent runs, - completed steps are skipped by loading from checkpoints. + Only specified steps save checkpoints. On subsequent runs, steps with + existing checkpoints are skipped by loading from checkpoints. Parameters ---------- @@ -122,12 +129,28 @@ def run_stateless_workflow( Cell type to use for differential expression force_from_step : int, optional If provided, clears checkpoints from this step onwards and re-runs + checkpoint_steps : set[int], optional + Set of step numbers that should save checkpoints. Defaults to {2, 3, 5}. + Step 3 is always required (provides raw counts for pseudobulking). Returns ------- dict Dictionary containing all results """ + # Setup checkpoint steps + if checkpoint_steps is None: + checkpoint_steps = set(DEFAULT_CHECKPOINT_STEPS) + else: + checkpoint_steps = set(checkpoint_steps) + + # Step 3 is required for pseudobulking (provides raw counts) + if 3 not in checkpoint_steps: + print("Warning: Step 3 is required for pseudobulking. 
Adding to checkpoint_steps.") + checkpoint_steps.add(3) + + print(f"Checkpointing enabled for steps: {sorted(checkpoint_steps)}") + # Setup directories figure_dir = datadir / "workflow/figures" figure_dir.mkdir(parents=True, exist_ok=True) @@ -152,6 +175,7 @@ def run_stateless_workflow( url=url, cell_type_for_de=cell_type_for_de, force_from_step=force_from_step, + checkpoint_steps=sorted(checkpoint_steps), ), ) @@ -205,8 +229,9 @@ def _load_and_filter(): adata = run_with_checkpoint( "filtering", - checkpoint_dir / "step02_filtered.h5ad", + checkpoint_dir / bids_checkpoint_name(dataset_name, 2, "filtered"), _load_and_filter, + skip_save=2 not in checkpoint_steps, execution_log=execution_log, step_number=2, log_parameters=step2_params, @@ -234,7 +259,7 @@ def _load_and_filter(): adata = run_with_checkpoint( "quality_control", - checkpoint_dir / "step03_qc.h5ad", + checkpoint_dir / bids_checkpoint_name(dataset_name, 3, "qc"), run_qc_pipeline, adata, min_genes=step3_params["min_genes"], @@ -244,6 +269,7 @@ def _load_and_filter(): max_hb_pct=step3_params["max_hb_pct"], expected_doublet_rate=step3_params["expected_doublet_rate"], figure_dir=figure_dir, + skip_save=3 not in checkpoint_steps, execution_log=execution_log, step_number=3, log_parameters=step3_params, @@ -265,12 +291,13 @@ def _load_and_filter(): adata = run_with_checkpoint( "preprocessing", - checkpoint_dir / "step04_preprocessed.h5ad", + checkpoint_dir / bids_checkpoint_name(dataset_name, 4, "preprocessed"), run_preprocessing_pipeline, adata, target_sum=step4_params["target_sum"], n_top_genes=step4_params["n_top_genes"], batch_key=step4_params["batch_key"], + skip_save=4 not in checkpoint_steps, execution_log=execution_log, step_number=4, log_parameters=step4_params, @@ -291,13 +318,14 @@ def _load_and_filter(): adata = run_with_checkpoint( "dimensionality_reduction", - checkpoint_dir / "step05_dimreduced.h5ad", + checkpoint_dir / bids_checkpoint_name(dataset_name, 5, "dimreduced"), 
run_dimensionality_reduction_pipeline, adata, batch_key=step5_params["batch_key"], n_neighbors=step5_params["n_neighbors"], n_pcs=step5_params["n_pcs"], figure_dir=figure_dir, + skip_save=5 not in checkpoint_steps, execution_log=execution_log, step_number=5, log_parameters=step5_params, @@ -314,11 +342,12 @@ def _load_and_filter(): adata = run_with_checkpoint( "clustering", - checkpoint_dir / "step06_clustered.h5ad", + checkpoint_dir / bids_checkpoint_name(dataset_name, 6, "clustered"), run_clustering_pipeline, adata, resolution=step6_params["resolution"], figure_dir=figure_dir, + skip_save=6 not in checkpoint_steps, execution_log=execution_log, step_number=6, log_parameters=step6_params, @@ -333,6 +362,12 @@ def _load_and_filter(): print("STEP 7: PSEUDOBULKING") print("=" * 60) + # Load step 3 checkpoint to get raw counts (stored in .X) + # This avoids redundant storage of counts in layers["counts"] for steps 4-6 + step3_checkpoint = checkpoint_dir / bids_checkpoint_name(dataset_name, 3, "qc") + adata_raw_counts = load_checkpoint(step3_checkpoint) + print(f"Loaded raw counts from step 3 checkpoint: {adata_raw_counts.shape}") + step7_params = { "group_col": "cell_type", "donor_col": "donor_id", @@ -342,14 +377,16 @@ def _load_and_filter(): pb_adata = run_with_checkpoint( "pseudobulking", - checkpoint_dir / "step07_pseudobulk.h5ad", + checkpoint_dir / bids_checkpoint_name(dataset_name, 7, "pseudobulk"), run_pseudobulk_pipeline, - adata, + adata_raw_counts, # Use step 3 data with raw counts in .X group_col=step7_params["group_col"], donor_col=step7_params["donor_col"], metadata_cols=step7_params["metadata_cols"], min_cells=step7_params["min_cells"], figure_dir=figure_dir, + layer=None, # Use .X directly (raw counts) + skip_save=7 not in checkpoint_steps, execution_log=execution_log, step_number=7, log_parameters=step7_params, @@ -373,9 +410,12 @@ def _load_and_filter(): de_outputs = run_with_checkpoint_multi( "differential_expression", { - "stat_res": checkpoint_dir 
/ "step08_stat_res.pkl", - "de_results": checkpoint_dir / "step08_de_results.parquet", - "counts_df": checkpoint_dir / "step08_counts.parquet", + "stat_res": checkpoint_dir + / bids_checkpoint_name(dataset_name, 8, "statres", "pkl"), + "de_results": checkpoint_dir + / bids_checkpoint_name(dataset_name, 8, "deresults", "parquet"), + "counts_df": checkpoint_dir + / bids_checkpoint_name(dataset_name, 8, "counts", "parquet"), }, _run_differential_expression_as_dict, pb_adata, @@ -383,6 +423,7 @@ def _load_and_filter(): design_factors=step8_params["design_factors"], var_to_feature=var_to_feature, n_cpus=step8_params["n_cpus"], + skip_save=8 not in checkpoint_steps, execution_log=execution_log, step_number=8, log_parameters=step8_params, @@ -409,12 +450,13 @@ def _load_and_filter(): gsea_results = run_with_checkpoint( "gsea", - checkpoint_dir / "step09_gsea.pkl", + checkpoint_dir / bids_checkpoint_name(dataset_name, 9, "gsea", "pkl"), run_gsea_pipeline, de_results, gene_sets=step9_params["gene_sets"], n_top=step9_params["n_top"], figure_dir=figure_dir, + skip_save=9 not in checkpoint_steps, execution_log=execution_log, step_number=9, log_parameters=step9_params, @@ -438,8 +480,10 @@ def _load_and_filter(): enr_outputs = run_with_checkpoint_multi( "overrepresentation", { - "enr_up": checkpoint_dir / "step10_enr_up.pkl", - "enr_down": checkpoint_dir / "step10_enr_down.pkl", + "enr_up": checkpoint_dir + / bids_checkpoint_name(dataset_name, 10, "enrup", "pkl"), + "enr_down": checkpoint_dir + / bids_checkpoint_name(dataset_name, 10, "enrdown", "pkl"), }, _run_overrepresentation_as_dict, de_results, @@ -447,6 +491,7 @@ def _load_and_filter(): padj_threshold=step10_params["padj_threshold"], n_top=step10_params["n_top"], figure_dir=figure_dir, + skip_save=10 not in checkpoint_steps, execution_log=execution_log, step_number=10, log_parameters=step10_params, @@ -475,12 +520,14 @@ def _load_and_filter(): prediction_results = run_with_checkpoint( "predictive_modeling", - 
checkpoint_dir / "step11_prediction.pkl", + checkpoint_dir + / bids_checkpoint_name(dataset_name, 11, "prediction", "pkl"), run_predictive_modeling_pipeline, counts_df_ct, metadata_ct, n_splits=step11_params["n_splits"], figure_dir=figure_dir, + skip_save=11 not in checkpoint_steps, execution_log=execution_log, step_number=11, log_parameters=step11_params, @@ -511,6 +558,8 @@ def _load_and_filter(): def list_checkpoints(datadir: Path) -> list[tuple[str, Path]]: """List all checkpoint files in the workflow directory. + Supports both BIDS naming (dataset-*_step-*_desc-*) and legacy naming (step*_*). + Parameters ---------- datadir : Path @@ -519,20 +568,37 @@ def list_checkpoints(datadir: Path) -> list[tuple[str, Path]]: Returns ------- list[tuple[str, Path]] - List of (step_name, file_path) tuples + List of (step_name, file_path) tuples, sorted by step number """ checkpoint_dir = datadir / "workflow/checkpoints" if not checkpoint_dir.exists(): return [] checkpoints = [] - for filepath in sorted(checkpoint_dir.glob("step*")): - step_name = ( - filepath.stem.split("_", 1)[1] if "_" in filepath.stem else filepath.stem - ) - checkpoints.append((step_name, filepath)) - - return checkpoints + for filepath in checkpoint_dir.glob("*"): + if filepath.is_dir(): + continue + + # Try BIDS format first + parsed = parse_bids_checkpoint_name(filepath.name) + if parsed: + step_num = parsed["step_number"] + step_name = parsed["description"] + checkpoints.append((step_num, step_name, filepath)) + else: + # Try legacy format (step02_*) + if filepath.name.startswith("step"): + try: + parts = filepath.stem.split("_", 1) + step_num = int(parts[0].replace("step", "")) + step_name = parts[1] if len(parts) > 1 else filepath.stem + checkpoints.append((step_num, step_name, filepath)) + except (ValueError, IndexError): + continue + + # Sort by step number and return as (step_name, filepath) tuples + checkpoints.sort(key=lambda x: x[0]) + return [(step_name, filepath) for _, step_name, filepath 
in checkpoints] def print_checkpoint_status(datadir: Path) -> None: From 319ccc4f2a16cf52106b69d361195cc04bf52f47 Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Sun, 21 Dec 2025 16:17:56 -0800 Subject: [PATCH 40/87] Fix: Restore layers['counts'] for HVG selection, delete after step 4 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The HVG selection in preprocessing (step 4) requires raw counts in layers["counts"]. This was incorrectly removed in the previous commit. Fix: - Restore layers["counts"] creation in QC step (needed for step 4) - Delete counts layer after step 4, before step 5 saves checkpoint - This still saves space in steps 5-6 checkpoints 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 --- problems_to_solve.md | 19 +++++++++++-------- .../modular_workflow/quality_control.py | 7 ++++--- .../rnaseq/stateless_workflow/run_workflow.py | 6 ++++++ 3 files changed, 21 insertions(+), 11 deletions(-) diff --git a/problems_to_solve.md b/problems_to_solve.md index 14c82c4..d803a69 100644 --- a/problems_to_solve.md +++ b/problems_to_solve.md @@ -23,18 +23,21 @@ The workflow uses two main data storage locations in AnnData: - Steps 4-6: Contains normalized, log-transformed expression data (after preprocessing) - Used for: QC metrics, normalization, HVG selection, PCA, neighbor graph, UMAP, clustering -2. **`layers["counts"]`** - Raw counts layer (no longer used): - - Previously created at end of Step 3 (QC) - now removed - - Step 7 (pseudobulking) now loads step 3 checkpoint directly to get raw counts from `.X` +2. 
**`layers["counts"]`** - Raw counts layer: + - Created at end of Step 3 (QC) for HVG selection in step 4 + - Deleted after step 4 before step 5 checkpoint is saved + - Step 7 (pseudobulking) loads step 3 checkpoint directly to get raw counts from `.X` **Optimization implemented:** -- Removed `layers["counts"]` creation from step 3 (QC) -- Step 7 (pseudobulking) now loads the step 3 checkpoint to get raw counts from `.X` -- This eliminates redundant storage of raw counts in `layers["counts"]` for steps 3-6 checkpoints +- `layers["counts"]` is created in step 3 (needed for HVG selection in step 4) +- After step 4 (preprocessing), the counts layer is deleted before step 5 saves its checkpoint +- Step 7 (pseudobulking) loads the step 3 checkpoint to get raw counts from `.X` +- This eliminates redundant storage of raw counts in steps 5-6 checkpoints **Storage savings:** -- Steps 3-6 checkpoints now store only `.X` (not both `.X` and `layers["counts"]`) -- Combined with gzip compression, this roughly halves the expression data storage for these checkpoints +- Step 3 checkpoint stores both `.X` (raw counts) and `layers["counts"]` (needed for step 4) +- Steps 5-6 checkpoints store only `.X` (counts layer deleted after step 4) +- Combined with gzip compression, this reduces storage for steps 5-6 checkpoints After Step 7, the pseudobulk AnnData is a separate object with only aggregated counts in `.X` (no layers needed). Steps 8-11 use pickle/parquet files, not h5ad. 
diff --git a/src/BetterCodeBetterScience/rnaseq/modular_workflow/quality_control.py b/src/BetterCodeBetterScience/rnaseq/modular_workflow/quality_control.py index 8200e37..92fa012 100644 --- a/src/BetterCodeBetterScience/rnaseq/modular_workflow/quality_control.py +++ b/src/BetterCodeBetterScience/rnaseq/modular_workflow/quality_control.py @@ -316,8 +316,9 @@ def run_qc_pipeline( adata = detect_doublets_per_donor(adata, expected_doublet_rate) adata = filter_doublets(adata) - # Note: Raw counts remain in .X at this point. - # They will be accessed from this checkpoint during pseudobulking (step 7). - # This avoids redundant storage of counts in layers["counts"] for steps 4-6. + # Save raw counts for HVG selection (step 4) and pseudobulking (step 7) + # Note: Raw counts are also in .X at this point, which will be used + # by pseudobulking when loading this checkpoint directly. + adata.layers["counts"] = adata.X.copy() return adata diff --git a/src/BetterCodeBetterScience/rnaseq/stateless_workflow/run_workflow.py b/src/BetterCodeBetterScience/rnaseq/stateless_workflow/run_workflow.py index 28c0e08..3592857 100644 --- a/src/BetterCodeBetterScience/rnaseq/stateless_workflow/run_workflow.py +++ b/src/BetterCodeBetterScience/rnaseq/stateless_workflow/run_workflow.py @@ -303,6 +303,12 @@ def _load_and_filter(): log_parameters=step4_params, ) + # Remove counts layer after preprocessing to save space in subsequent checkpoints + # (counts layer was needed for HVG selection but is no longer needed) + if "counts" in adata.layers: + del adata.layers["counts"] + print("Removed counts layer to save checkpoint space") + # ===================================================================== # STEP 5: Dimensionality Reduction # ===================================================================== From 20f2e71c97664c2cefad53b8f3b70960f5ac77f6 Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Sun, 21 Dec 2025 19:28:44 -0800 Subject: [PATCH 41/87] Add steps 9-11 to default 
checkpoint steps MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit These steps produce small pickle files, so caching them adds minimal disk usage while avoiding re-computation on subsequent runs. Default checkpoint_steps now: {2, 3, 5, 9, 10, 11} 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 --- problems_to_solve.md | 3 ++- .../rnaseq/stateless_workflow/run_workflow.py | 4 ++-- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/problems_to_solve.md b/problems_to_solve.md index d803a69..5ad9246 100644 --- a/problems_to_solve.md +++ b/problems_to_solve.md @@ -42,7 +42,8 @@ The workflow uses two main data storage locations in AnnData: After Step 7, the pseudobulk AnnData is a separate object with only aggregated counts in `.X` (no layers needed). Steps 8-11 use pickle/parquet files, not h5ad. **Selective checkpointing:** -- Added `checkpoint_steps` parameter to `run_stateless_workflow()` (default: `{2, 3, 5}`) +- Added `checkpoint_steps` parameter to `run_stateless_workflow()` (default: `{2, 3, 5, 9, 10, 11}`) - Only specified steps save checkpoints; other steps run without saving - Step 3 is always required (provides raw counts for pseudobulking) +- Steps 9-11 included by default as they produce small pickle files - Added `skip_save` parameter to `run_with_checkpoint()` and `run_with_checkpoint_multi()` diff --git a/src/BetterCodeBetterScience/rnaseq/stateless_workflow/run_workflow.py b/src/BetterCodeBetterScience/rnaseq/stateless_workflow/run_workflow.py index 3592857..861c285 100644 --- a/src/BetterCodeBetterScience/rnaseq/stateless_workflow/run_workflow.py +++ b/src/BetterCodeBetterScience/rnaseq/stateless_workflow/run_workflow.py @@ -101,7 +101,7 @@ def _run_overrepresentation_as_dict( } -DEFAULT_CHECKPOINT_STEPS = frozenset({2, 3, 5}) +DEFAULT_CHECKPOINT_STEPS = frozenset({2, 3, 5, 9, 10, 11}) def run_stateless_workflow( @@ -130,7 +130,7 @@ def 
run_stateless_workflow( force_from_step : int, optional If provided, clears checkpoints from this step onwards and re-runs checkpoint_steps : set[int], optional - Set of step numbers that should save checkpoints. Defaults to {2, 3, 5}. + Set of step numbers that should save checkpoints. Defaults to {2, 3, 5, 9, 10, 11}. Step 3 is always required (provides raw counts for pseudobulking). Returns From 70ad51be2fdc2613e53cc50ad13d26c5db3b978b Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Mon, 22 Dec 2025 06:19:48 -0800 Subject: [PATCH 42/87] Add step 8 (differential expression) to default checkpoint steps MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Default checkpoint_steps now: {2, 3, 5, 8, 9, 10, 11} 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 --- problems_to_solve.md | 4 ++-- .../rnaseq/stateless_workflow/run_workflow.py | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/problems_to_solve.md b/problems_to_solve.md index 5ad9246..03cffce 100644 --- a/problems_to_solve.md +++ b/problems_to_solve.md @@ -42,8 +42,8 @@ The workflow uses two main data storage locations in AnnData: After Step 7, the pseudobulk AnnData is a separate object with only aggregated counts in `.X` (no layers needed). Steps 8-11 use pickle/parquet files, not h5ad. 
**Selective checkpointing:** -- Added `checkpoint_steps` parameter to `run_stateless_workflow()` (default: `{2, 3, 5, 9, 10, 11}`) +- Added `checkpoint_steps` parameter to `run_stateless_workflow()` (default: `{2, 3, 5, 8, 9, 10, 11}`) - Only specified steps save checkpoints; other steps run without saving - Step 3 is always required (provides raw counts for pseudobulking) -- Steps 9-11 included by default as they produce small pickle files +- Steps 8-11 included by default as they produce small pickle/parquet files - Added `skip_save` parameter to `run_with_checkpoint()` and `run_with_checkpoint_multi()` diff --git a/src/BetterCodeBetterScience/rnaseq/stateless_workflow/run_workflow.py b/src/BetterCodeBetterScience/rnaseq/stateless_workflow/run_workflow.py index 861c285..f43384a 100644 --- a/src/BetterCodeBetterScience/rnaseq/stateless_workflow/run_workflow.py +++ b/src/BetterCodeBetterScience/rnaseq/stateless_workflow/run_workflow.py @@ -101,7 +101,7 @@ def _run_overrepresentation_as_dict( } -DEFAULT_CHECKPOINT_STEPS = frozenset({2, 3, 5, 9, 10, 11}) +DEFAULT_CHECKPOINT_STEPS = frozenset({2, 3, 5, 8, 9, 10, 11}) def run_stateless_workflow( @@ -130,7 +130,7 @@ def run_stateless_workflow( force_from_step : int, optional If provided, clears checkpoints from this step onwards and re-runs checkpoint_steps : set[int], optional - Set of step numbers that should save checkpoints. Defaults to {2, 3, 5, 9, 10, 11}. + Set of step numbers that should save checkpoints. Defaults to {2, 3, 5, 8, 9, 10, 11}. Step 3 is always required (provides raw counts for pseudobulking). 
Returns From c68aa0f10892f3250c696424cb231b7c26cec584 Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Mon, 22 Dec 2025 06:53:43 -0800 Subject: [PATCH 43/87] finished checkpointed workflow section --- book/workflows.md | 151 +++++++++++++++++++++++++++++++++++++++------- 1 file changed, 129 insertions(+), 22 deletions(-) diff --git a/book/workflows.md b/book/workflows.md index edae351..9f7c432 100644 --- a/book/workflows.md +++ b/book/workflows.md @@ -1,12 +1,12 @@ # Workflow Management -In most parts of science today, data processing and analysis comprises many different steps. We will refer to such a set of steps as a computational *workflow*. If you have been doing science for very long, you have very likely encountered a *mega-script* that implements such a workflow. Usually written in a scripting language like *Bash*, this is a script that may be hundreds or even thousands of lines long that runs a single workflow from start to end. Often these scripts are handed down to new trainees over generations, such that users become afraid to make any changes lest the entire house of cards comes crashing down. I think that most of us can agree that this is not an optimal workflow, and in this chapter I will discuss in detail how to move from a mega-script to a workflow that will meet all of the requirements that are required to provide robust and reliable answers to our scientific questions. +In most parts of science today, data processing and analysis comprise many different steps. We will refer to such a set of steps as a computational *workflow*. If you have been doing science for very long, you have very likely encountered a *mega-script* that implements such a workflow. Usually written in a scripting language like *Bash*, this is a script that may be hundreds or even thousands of lines long that runs a single workflow from start to end. 
Often these scripts are handed down to new trainees over generations, such that users become afraid to make any changes lest the entire house of cards comes crashing down. I think that most of us can agree that this is not an optimal workflow, and in this chapter I will discuss in detail how to move from a mega-script to a workflow that will meet all of the requirements to provide robust and reliable answers to our scientific questions. ## What do we want from a scientific workflow? First let's ask: What do we want from a computational scientific workflow? Here are some of the factors that I think are important. First, we care about the *correctness* of the workflow, which includes the following factors: -- *Verifiability*: The workflow includes validation procedures to ensure against known problems or edge cases. +- *Validity*: The workflow includes validation procedures to ensure against known problems or edge cases. - *Reproducibility*: The workflow can be rerun from scratch on the same data and get the same answer, at least within the limits of uncontrollable factors such as floating point imprecision. - *Robustness*: When there is a problem, the workflow fails quickly with explicit error messages, or degrades gracefully when possible. @@ -20,19 +20,19 @@ Third, we care about the *engineering quality* of the code, which includes: - *Maintainability*: The workflow is structured and documented so that others (including your future self) can easily maintain, update, and extend it in the future. - *Modularity*: The workflow is composed of a set of independently testable modules, which can be swapped in or out relatively easily. -- *Idempotency*: This term from computer science means that running the workflow multiple times gives the same result as running it once, which allows safely rerunning the workflow when there is a failure. 
+- *Idempotency*: This term from computer science means that the result of the workflow does not change after its first successful run, which allows safely rerunning the workflow when there is a failure. - *Traceability*: All operations are logged, and provenance information is stored for outputs. Finally, we care about the *efficiency* of the workflow implementation. This includes: - *Incremental execution*: The workflow only reruns a module if necessary, such as when an input changes. -- *Amortized computation*: The workflow pre-computes and reuses results from expensive operations when possible. +- *Cached computation*: The workflow pre-computes and reuses results from expensive operations when possible. -It's worth noting that these different desiderata will sometimes conflict with one another (such as configurability versus maintainability), and that no workflow will be perfect. +It's worth noting that these different desiderata will sometimes conflict with one another (such as configurability versus maintainability), and that no workflow will be perfect. For example, a highly configurable workflow will often be more difficult to maintain. -### Pipelines versus workflows +## Pipelines versus workflows -The terms *workflow* and *pipeline* are sometimes used interchangeable, but in this chapter I will use them to refer to different kinds of applications. I will use *workflow* as the more general term to refer to any set of analysis procedures that are implemented as separate modules. I will use the term *pipeline* to refer more specifically to a data analysis workflow where the several operations are combined into a single command through the use of *pipes*, which are a syntactic construct that feed the output of one process directly into the next process as input. Some readers may be familiar with pipes from the UNIX filesystem, where they are represented by the vertical bar "|". 
For example, let's say that we had a log file that contains the follow entries: +The terms *workflow* and *pipeline* are sometimes used interchangeably, but in this chapter I will use them to refer to different kinds of applications. I will use *workflow* as the more general term to refer to any set of analysis procedures that are implemented as separate modules. I will use the term *pipeline* to refer more specifically to a data analysis workflow where several operations are combined into a single command through the use of *pipes*, which are a syntactic construct that feed the output of one process directly into the next process as input. Some readers may be familiar with pipes from the UNIX command line, where they are represented by the vertical bar "|". For example, let's say that we had a log file that contains the following entries: ```bash 2024-01-15 10:23:45 ERROR: Database connection failed @@ -51,16 +51,16 @@ grep "ERROR" app.log | sed 's/.*ERROR: //' | sort | uniq -c | sort -rn > error_s where: -- `grep "ERROR" app.log` extracts line containing the word "ERROR" +- `grep "ERROR" app.log` extracts lines containing the word "ERROR" - `sed 's/.*ERROR: //'` replaces everything up to the actual message with an empty string - `sort` sorts the rows alphabetically - `uniq -c` counts the number of appearances of each unique error message -- `sort -rn` sorts the rows in numerical order +- `sort -rn` sorts the rows in reverse numerical order (largest to smallest) - `> error_summary.txt` redirects the output into a file called `error_summary.txt` #### Method chaining -One way that simple pipelines can be built in Python is using *method chaining*, where the output of one class method is redirected into the next class method. This is commonly used to perform data transformations in `pandas`, as it allows composing multiple transformations into a single command. As an example, we will work with the Eisenberg et al. 
dataset that we used in a previous chapter, to compute the probability of having ever been arrested separately for males and females in the sample. To do this we need to perform a number of operations: +One way that simple pipelines can be built in Python is using *method chaining*, where each method returns an object on which the next method is called; this is slightly different from the operation of UNIX pipes, where it is the result of each command that is being passed through the pipe. This is commonly used to perform data transformations in `pandas`, as it allows composing multiple transformations into a single command. As an example, we will work with the Eisenberg et al. dataset that we used in a previous chapter, to compute the probability of having ever been arrested separately for males and females in the sample. To do this we need to perform a number of operations: - drop any observations that have missing values for the `Sex` or `ArrestedChargedLifeCount` variables - replace the numeric values in the `Sex` variable with text labels @@ -69,7 +69,7 @@ One way that simple pipelines can be built in Python is using *method chaining*, - select the column that we want to compute the mean of (`EverArrested`) - compute the mean -We can do this in a single command using method chaining in `pandas`. It's useful to format the code in a way that makes the pipeline steps explicit, by putting parentheses around the operation; in Python, any commands within parentheses are combined into a single command, which can be useful for making complex code more readable: +We can do this in a single command using method chaining in `pandas`. 
It's useful to format the code in a way that makes the pipeline steps explicit, by putting parentheses around the operation; in Python, any commands within parentheses are implicitly treated as a single line, which can be useful for making complex code more readable: ```python arrest_stats_by_sex = (df @@ -94,7 +94,7 @@ Note that `pandas` data frames also include an explicit `.pipe` method that allo ## An example of a complex workflow -In this chapter we will focus primarily on complex workflows that have many stages. I will use a running example to show how to move from a monolithic analysis script to a well-structured and usable workflow that meets most of the desired features outlined above. For this example I will use an analysis of single-cell RNA-sequencing data to determine how gene expression in immune system cells changes with age. This analysis will utilize a [large openly available dataset](https://cellxgene.cziscience.com/collections/dde06e0f-ab3b-46be-96a2-a8082383c4a1) that includes data from about 1.3 million immune system cells for about 35K transcripts. I chose this particular example for several reasons: +In this chapter we will focus primarily on complex workflows that have many stages. I will use a running example to show how to move from a monolithic analysis script to a well-structured and usable workflow that meets most of the desired features described earlier. For this example I will use an analysis of single-cell RNA-sequencing data to determine how gene expression in immune system cells changes with age. This analysis will utilize a [large openly available dataset](https://cellxgene.cziscience.com/collections/dde06e0f-ab3b-46be-96a2-a8082383c4a1) that includes data from about 1.3 million immune system cells for about 35K transcripts. I chose this particular example for several reasons: - It is a realistic example of a workflow that a researcher might actually perform. 
- The data are large enough to call for a real workflow management scheme, but small enough to be processed on a single laptop (assuming it has decent memory). @@ -104,7 +104,7 @@ In this chapter we will focus primarily on complex workflows that have many stag ### Starting point: One huge notebook -I developed the initial version of this workflow in a way that many researchers would do so: By creating a Jupyter notebook that implements the entire workflow, which can be found [here](). Although I don't usually prefer to do code generation using a chatbot, I did most of the coding for this example using the Google Gemini 3.0 chatbot, for a couple of reasons. First, this model seemed particularly knowledgeable about this kind of analysis and the relevant packages. Second, I found it useful to read the commentary about why particular analysis steps were being selected. For debugging I used a mixture of the Gemini 3.0 chatbot and the VSCode Copilot agent, depending on the nature of the problem; for problems specific to the RNA-seq analysis tools I used Gemini, while for standard Python/Pandas issues I used Copilot. The total execution time for this notebook is about two hours on an M3 Max Macbook Pro. +I developed the initial version of this workflow as many researchers would: by creating a Jupyter notebook that implements the entire workflow, which can be found [here](). Although I don't usually prefer to do code generation using a chatbot, I did most of the coding for this example using the Google Gemini 3 chatbot, for a couple of reasons. First, this model seemed particularly knowledgeable about this kind of analysis and the relevant packages. Second, I found it useful to read the commentary about why particular analysis steps were being selected. 
For debugging I used a mixture of the Gemini 3 chatbot and the VS Code Copilot agent, depending on the nature of the problem; for problems specific to the RNA-seq analysis tools I used Gemini, while for standard Python/Pandas issues I used Copilot. The total execution time for this notebook is about two hours on an M3 Max MacBook Pro. #### The problem of in-place operations @@ -138,28 +138,135 @@ The first thing we need to do with a large monolithic workflow is to determine h - Overrepresentation analysis (Enrichr) - Predictive modeling -In addition to a conceptual breakdown, there are also other reasons that one might to further decompose the workflow: +In addition to a conceptual breakdown, there are also other reasons that one might want to further decompose the workflow: -- There may be points where one might need to restart the computation (e.g. due to computational cost) -- There may be sections where one might wish to swap in a new method or different parameterization -- There may be points where the output could be reusable elsewhere +- There may be points where one might need to restart the computation (e.g. due to computational cost). +- There may be sections where one might wish to swap in a new method or different parameterization. +- There may be points where the output could be reusable elsewhere. ## Stateless workflows -I asked Claude Code to help modularize the monolithic workflow, using a prompt that provided the conceptual breakdown described above. The resulting code (found at XXX) ran correctly, but crashed about two hours into the process due to a resource issue that appeared to be due to asking for too many CPU cores in the differential expression analysis. This left me in the situation of having to rerun the entire two hours of preliminary workflow simply to get to a point where I could test my fix for the differential expression component, which is not a particularly efficient way of coding. 
The problem here is that the workflow execution is *stateful*, in the sense that the previous steps need to be rerun prior to performing the current step in order to establish the required objects in memory. The solution to this problem is to implement the workflow in a *stateless* way, which doesn't require that earlier steps be rerun if they have already been completed. One way to do this is by implementing a process called *checkpointing*, in which intermediate results are stored for each step. These can then be used to start the workflow at any point without having to rerun all of the previous steps. +I asked Claude Code to help modularize the monolithic workflow, using a prompt that provided the conceptual breakdown described above. The resulting code (found at XXX - link to commit 678983e1c337b6a23b0f35cfb974a87587cfd13e) ran correctly at first, but crashed about two hours into the process because of a resource issue that appeared to stem from requesting too many CPU cores in the differential expression analysis. This left me in the situation of having to rerun the entire two hours of preliminary workflow simply to get to a point where I could test my fix for the differential expression component, which is not a particularly efficient way of coding. The problem here is that the workflow execution is *stateful*, in the sense that the previous steps need to be rerun prior to performing the current step in order to establish the required objects in memory. The solution to this problem is to implement the workflow in a *stateless* way, which doesn't require that earlier steps be rerun if they have already been completed. One way to do this is by implementing a process called *checkpointing*, in which intermediate results are stored for each step. These can then be used to start the workflow at any point without having to rerun all of the previous steps. 
+Another important feature of a workflow related to statelessness is *idempotency*, which means that a workflow will result in the same answer when run multiple times. This is related to, but not the same as, the idea of statelessness. For example, a stateless workflow that saves its outputs to checkpoint files could fail to be idempotent if the results were appended to the output file with each execution, rather than overwriting them. This would result in different outputs depending on how many times the workflow has been executed. Thus, when we use checkpointing we should be sure to either reuse the existing file or rewrite it completely with a new version. +I asked Claude Code to help with this: -the workflow should be stateless when possible +> I would like to modify the workflow described in src/BetterCodeBetterScience/rnaseq/modular_workflow/run_workflow.py to make it execute in a stateless way through the use of checkpointing. Please analyze the code and suggest the best way to accomplish this. -- allows each state to be run independently +After analyzing the codebase Claude came up with three proposed solutions to the problem: + +- 1. Use a "registry pattern" in which we define each step in terms of its inputs, outputs, and checkpoint file, and then assemble these into a workflow that can be executed in a stateless way, automatically skipping completed steps. This was its recommended approach. +- 2. Use a simple "wrapper" approach in which each module in the workflow is executed via a wrapper function that checks for cached checkpoint values. +- 3. Use a well-established existing workflow engine such as [Prefect](https://www.prefect.io/) or [Luigi](https://github.com/spotify/luigi). While these are powerful, they incur additional dependencies and complexity and may be too heavyweight for our problem. 
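The registry pattern in the first option can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the code that Claude generated: the `Step` class, the `run_registry` function, and the JSON checkpoint files are all hypothetical, and each step is assumed to take only the previous step's output as its input.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Callable


@dataclass
class Step:
    """One registry entry (hypothetical structure for illustration)."""
    name: str
    func: Callable[[Any], Any]  # receives the previous step's output
    checkpoint: str             # file name within the checkpoint directory


def run_registry(steps, initial, checkpoint_dir):
    """Run steps in order, skipping those with a valid checkpoint."""
    checkpoint_dir = Path(checkpoint_dir)
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    data = initial
    executed = []
    stale = False  # becomes True once any upstream step has been rerun
    for step in steps:
        path = checkpoint_dir / step.checkpoint
        if path.exists() and not stale:
            data = json.loads(path.read_text())
        else:
            data = step.func(data)
            # write_text replaces the file, so reruns do not accumulate output
            path.write_text(json.dumps(data))
            executed.append(step.name)
            stale = True
    return data, executed


steps = [
    Step("double", lambda xs: [2 * x for x in xs], "step01_double.json"),
    Step("total", lambda xs: sum(xs), "step02_total.json"),
]

with tempfile.TemporaryDirectory() as tmp:
    result1, ran1 = run_registry(steps, [1, 2, 3], tmp)  # runs both steps
    result2, ran2 = run_registry(steps, [1, 2, 3], tmp)  # uses checkpoints
```

Because each checkpoint is overwritten rather than appended to, running the registry repeatedly gives the same result, which is the idempotency property discussed above. The `stale` flag is one simple way to avoid reusing a downstream checkpoint after an upstream step has been recomputed.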
+ +Here we will examine the first (recommended) option and the third solution; while the second option is easy to implement, it's not as clean as the registry approach. + +### A workflow registry with checkpointing + +We start with a custom approach in order to get a better view of the details of workflow orchestration. + +> let's implement the recommended Stateless Workflow with Checkpointing. Please generate new code within src/BetterCodeBetterScience/rnaseq/stateless_workflow. + +The resulting code worked straight out of the box, but it didn't maintain any sort of log of its processing, which can be very useful. In particular, I wanted to log the time required to execute each step in the workflow, for use in optimization that I will discuss further below. I asked Claude to add this: + +> I would like to log information about execution, including the time required to execute each step along with the details about execution such as parameters passed for each step. please record these during execution and save to a date-stamped json file within the workflows directory. 
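
The kind of logging requested in this prompt might look something like the following sketch. It assumes that step parameters are JSON-serializable; the function `run_logged`, the log file name, and the toy `filter_values` step are hypothetical and are not taken from the generated code.

```python
import json
import tempfile
import time
from datetime import datetime
from pathlib import Path


def run_logged(steps, log_dir):
    """Run (name, func, params) tuples and write a date-stamped JSON log."""
    run_id = datetime.now().strftime("%Y%m%d_%H%M%S")
    records = []
    for name, func, params in steps:
        start = time.perf_counter()
        func(**params)
        records.append(
            {
                "step": name,
                "parameters": params,
                "duration_seconds": round(time.perf_counter() - start, 3),
            }
        )
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    log_file = log_dir / f"execution_log_{run_id}.json"
    log_file.write_text(json.dumps({"run_id": run_id, "steps": records}, indent=2))
    return log_file


def filter_values(values, cutoff):
    """Toy stand-in for a real workflow step."""
    return [v for v in values if v >= cutoff]


with tempfile.TemporaryDirectory() as tmp:
    log_file = run_logged(
        [("filtering", filter_values, {"values": [1, 5, 9], "cutoff": 4})], tmp
    )
    log = json.loads(log_file.read_text())
```

Recording the parameters alongside the timings makes it possible to tell later which settings produced a given run, and the per-step durations show where optimization effort is best spent.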
+ +After Claude's implementation of this feature, a fresh run of the workflow gives the following summary: + +```bash +============================================================ +EXECUTION SUMMARY +============================================================ +Workflow: immune_aging_scrnaseq +Run ID: 20251221_114458 +Status: completed +Total Duration: 7094.5 seconds + +Step Details: +------------------------------------------------------------ + ✓ Step 1: data_download 0.0s [cached] + ✓ Step 2: filtering 74.7s + ✓ Step 3: quality_control 263.3s + ✓ Step 4: preprocessing 35.9s + ✓ Step 5: dimensionality_reduction 6565.4s + ✓ Step 6: clustering 69.6s + ✓ Step 7: pseudobulking 11.6s + ✓ Step 8: differential_expression 19.0s + ✓ Step 9: gsea 1.7s + ✓ Step 10: overrepresentation 13.3s + ✓ Step 11: predictive_modeling 39.8s +------------------------------------------------------------ +``` + +The associated JSON file contains much more detail regarding each workflow step. If we run the workflow again, we see that it now uses cached results at each step: + +```bash +============================================================ +EXECUTION SUMMARY +============================================================ +Workflow: immune_aging_scrnaseq +Run ID: 20251221_142225 +Status: completed +Total Duration: 17.4 seconds + +Step Details: +------------------------------------------------------------ + ✓ Step 1: data_download 0.0s [cached] + ✓ Step 2: filtering 1.9s [cached] + ✓ Step 3: quality_control 3.0s [cached] + ✓ Step 4: preprocessing 3.1s [cached] + ✓ Step 5: dimensionality_reduction 3.4s [cached] + ✓ Step 6: clustering 4.3s [cached] + ✓ Step 7: pseudobulking 0.1s [cached] + ✓ Step 8: differential_expression 1.4s [cached] + ✓ Step 9: gsea 0.0s [cached] + ✓ Step 10: overrepresentation 0.0s [cached] + ✓ Step 11: predictive_modeling 0.0s [cached] +------------------------------------------------------------ +``` + +Checkpointing thus solved our problem, by allowing each step to 
be skipped over once it's been completed. + +#### Checkpointing and disk usage + +One potential drawback of checkpointing is that it can result in substantial disk usage when working with large datasets. In the example above, the checkpoint directory after workflow completion weighs in at a whopping 64 Gigabytes, with numerous very large files: + +```bash +➤ du -sh * + +7.3G step02_filtered.h5ad + 13G step03_qc.h5ad + 13G step04_preprocessed.h5ad + 14G step05_dimreduced.h5ad + 14G step06_clustered.h5ad +380M step07_pseudobulk.h5ad + 28M step08_counts.parquet +1.6M step08_de_results.parquet +1.8G step08_stat_res.pkl + 13M step09_gsea.pkl + 44K step10_enr_down.pkl + 28K step10_enr_up.pkl + 36K step11_prediction.pkl + ``` + +In particular, in step 3 a copy of the original data was added for reuse in a later step (in a separate variable within the dataset) alongside the results of processing at that step, leading to files that were roughly doubled in size. However, those raw data were not needed again until step 7. By changing the workflow to avoid saving those data in the checkpoints and instead loading them directly at step 7, we were able to halve the size of those intermediate checkpoints. + +In this implementation a checkpoint file was stored for each step in the workflow. However, if the goal of checkpointing is primarily to avoid having to rerun expensive computations, then we don't need to checkpoint every step given that some of them take relatively little time. In this case, we can checkpoint only after a subset of steps. In this case I chose to checkpoint after steps 2, 3, and 5 since those each take well over a minute to run (with step 5 taking well over an hour). Another goal of checkpointing is to store files that might be useful for later analyses or QA by the researcher. 
In this example workflow, steps 1-7 can be classified as "preprocessing" in the sense that they are preparing the data for analysis, whereas steps 8-11 reflect actual analyses of the data, such that the results could be reported in a publication. It is thus important to save those outputs for later analyses and for sharing with the final results. + +#### Compressing checkpoint files + +Another potentially helpful solution is to compress the checkpoint data if they are not already being compressed by default. In this example, the default in the AnnData package for saving `h5ad` files is to use no compression, so there are substantial savings in disk space to be had by compressing the data: whereas the raw data file was 7.3 GB, a version of the same data saved using compression took up only 2.9 GB. The tradeoff is that working with compressed files takes longer. This is particularly the case for saving of files; whereas it took about 3 seconds to save an uncompressed version of the data, it took about 105 seconds to store the compressed version. Given that the saving of the compressed file will happen in the context of an already long workflow, that doesn't seem like such a concern. We are more concerned about how the use of compression increases loading times, and here the difference is not quite so dramatic, at 1.3 seconds versus 19.8 seconds. The decision about whether or not to compress will ultimately come down to the relative cost of time versus disk space, but in this case I decided to go ahead and compress the checkpoint files. + +Combining these strategies of reducing data duplication, eliminating some intermediate checkpoints, and compressing the stored data, our final pipeline generates about 13 GB worth of checkpoint data, substantially smaller than the initial 64 GB. With all checkpoints generated, the entire workflow completes in less than four minutes, with only three time-consuming steps being rerun each time. 
The initial execution of the workflow is a few minutes longer due to the extra time needed to read and write compressed checkpoint files, but these few minutes are hardly noticeable for a workflow that takes more than two hours to complete. + + +### Using a workflow engine -but sometimes state is required -- e.g. training a neural network, one needs to know where you are in the process From ad9381ae07731a69bbb5c21f69c430e4f317a6f9 Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Mon, 22 Dec 2025 07:34:11 -0800 Subject: [PATCH 44/87] Add Prefect-based workflow with parallel per-cell-type analysis MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Implements a new Prefect workflow at rnaseq/prefect_workflow/ that: - Wraps existing modular workflow functions as Prefect tasks - Runs steps 1-7 sequentially with checkpoint caching - Runs steps 8-11 (DE, GSEA, Enrichr, prediction) in parallel for each cell type - Organizes results by cell type in workflow/results/per_cell_type/ New files: - tasks.py: Prefect task definitions - flows.py: Main workflow flow with parallel execution - run_workflow.py: CLI entry point with --force-from, --cell-type, --list-cell-types 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 --- problems_to_solve.md | 56 +-- pyproject.toml | 1 + .../rnaseq/prefect_workflow/__init__.py | 0 .../rnaseq/prefect_workflow/flows.py | 430 +++++++++++++++++ .../rnaseq/prefect_workflow/run_workflow.py | 177 +++++++ .../rnaseq/prefect_workflow/tasks.py | 376 +++++++++++++++ uv.lock | 446 +++++++++++++++++- 7 files changed, 1427 insertions(+), 59 deletions(-) create mode 100644 src/BetterCodeBetterScience/rnaseq/prefect_workflow/__init__.py create mode 100644 src/BetterCodeBetterScience/rnaseq/prefect_workflow/flows.py create mode 100644 src/BetterCodeBetterScience/rnaseq/prefect_workflow/run_workflow.py create mode 100644 
src/BetterCodeBetterScience/rnaseq/prefect_workflow/tasks.py diff --git a/problems_to_solve.md b/problems_to_solve.md index 03cffce..f1e6d77 100644 --- a/problems_to_solve.md +++ b/problems_to_solve.md @@ -3,47 +3,15 @@ Open problems marked with [ ] Fixed problems marked with [x] - -[x] Please change the file naming scheme for the checkpoint files to use a BIDS schema, just like the downloaded data. - - Implemented `bids_checkpoint_name()` and `parse_bids_checkpoint_name()` functions - - Checkpoint files now use format: `dataset-{name}_step-{number}_desc-{description}.{extension}` - - Updated `list_checkpoints()` and `clear_checkpoints_from_step()` to support both BIDS and legacy naming - -[x] Please save the checkpoint h5ad files using compression='gzip' - - Added `compression="gzip"` to `data.write()` in `save_checkpoint()` function - -[x] The size of the checkpoint files is very large, I think in part due to their storage of the original counts within the .X variable in the dataset. However, I'm not sure if and when that's actually necessary, versus simply reloading the original data or an earlier checkpoint to re-populate that variable. Please examine the usage of this .X variable and determine whether it would make more sense to remove it for the sake of space and then reload if needed from an earlier checkpoint. - -### Analysis of .X variable usage: - -The workflow uses two main data storage locations in AnnData: - -1. **`.X`** - The main expression matrix: - - Steps 2-3: Contains raw counts - - Steps 4-6: Contains normalized, log-transformed expression data (after preprocessing) - - Used for: QC metrics, normalization, HVG selection, PCA, neighbor graph, UMAP, clustering - -2. 
**`layers["counts"]`** - Raw counts layer: - - Created at end of Step 3 (QC) for HVG selection in step 4 - - Deleted after step 4 before step 5 checkpoint is saved - - Step 7 (pseudobulking) loads step 3 checkpoint directly to get raw counts from `.X` - -**Optimization implemented:** -- `layers["counts"]` is created in step 3 (needed for HVG selection in step 4) -- After step 4 (preprocessing), the counts layer is deleted before step 5 saves its checkpoint -- Step 7 (pseudobulking) loads the step 3 checkpoint to get raw counts from `.X` -- This eliminates redundant storage of raw counts in steps 5-6 checkpoints - -**Storage savings:** -- Step 3 checkpoint stores both `.X` (raw counts) and `layers["counts"]` (needed for step 4) -- Steps 5-6 checkpoints store only `.X` (counts layer deleted after step 4) -- Combined with gzip compression, this reduces storage for steps 5-6 checkpoints - -After Step 7, the pseudobulk AnnData is a separate object with only aggregated counts in `.X` (no layers needed). Steps 8-11 use pickle/parquet files, not h5ad. - -**Selective checkpointing:** -- Added `checkpoint_steps` parameter to `run_stateless_workflow()` (default: `{2, 3, 5, 8, 9, 10, 11}`) -- Only specified steps save checkpoints; other steps run without saving -- Step 3 is always required (provides raw counts for pseudobulking) -- Steps 8-11 included by default as they produce small pickle/parquet files -- Added `skip_save` parameter to `run_with_checkpoint()` and `run_with_checkpoint_multi()` +[x] I would like to add a new workflow, with code saved to src/BetterCodeBetterScience/rnaseq/prefect_workflow. This workflow will use the Prefect workflow manager (https://github.com/PrefectHQ/prefect) to manage the workflow that was previously developed in src/BetterCodeBetterScience/rnaseq/stateless_workflow. The one new feature that I would like to add here is to perform steps 8-11 separately on each different cell type that survives the initial filtering. 
+ - Created `prefect_workflow/` directory with: + - `tasks.py`: Prefect task definitions wrapping modular workflow functions + - `flows.py`: Main workflow flow with parallel per-cell-type analysis + - `run_workflow.py`: CLI entry point with argument parsing + - Steps 1-7 run sequentially with checkpoint caching (reuses existing system) + - Steps 8-11 run in parallel for each cell type: + - DE tasks submitted in parallel across all cell types + - GSEA, Enrichr, and predictive modeling run in parallel within each cell type + - Added `prefect>=3.0` dependency to pyproject.toml + - Results organized by cell type in `workflow/results/per_cell_type/` + - CLI supports: `--force-from`, `--cell-type`, `--list-cell-types`, `--min-samples` diff --git a/pyproject.toml b/pyproject.toml index 50c0f88..2874068 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -79,6 +79,7 @@ dependencies = [ "ipython>=9.8.0", "harmonypy>=0.0.10", "rpy2>=3.6.4", + "prefect>=3.0", ] [build-system] diff --git a/src/BetterCodeBetterScience/rnaseq/prefect_workflow/__init__.py b/src/BetterCodeBetterScience/rnaseq/prefect_workflow/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/src/BetterCodeBetterScience/rnaseq/prefect_workflow/flows.py b/src/BetterCodeBetterScience/rnaseq/prefect_workflow/flows.py new file mode 100644 index 0000000..a0b06f0 --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/prefect_workflow/flows.py @@ -0,0 +1,430 @@ +"""Prefect flow definitions for scRNA-seq workflow. + +Main workflow flow that orchestrates all tasks. 
+""" + +from pathlib import Path +from typing import Any + +from prefect import flow, get_run_logger + +from BetterCodeBetterScience.rnaseq.prefect_workflow.tasks import ( + clustering_task, + differential_expression_task, + dimensionality_reduction_task, + download_data_task, + load_and_filter_task, + overrepresentation_task, + pathway_analysis_task, + predictive_modeling_task, + preprocessing_task, + pseudobulk_task, + quality_control_task, +) +from BetterCodeBetterScience.rnaseq.stateless_workflow.checkpoint import ( + bids_checkpoint_name, + load_checkpoint, +) + + +@flow(name="immune_aging_scrna_workflow", log_prints=True) +def run_workflow( + datadir: Path, + dataset_name: str = "OneK1K", + url: str = "https://datasets.cellxgene.cziscience.com/a3f5651f-cd1a-4d26-8165-74964b79b4f2.h5ad", + force_from_step: int | None = None, + min_samples_per_cell_type: int = 10, +) -> dict[str, Any]: + """Run the complete immune aging scRNA-seq workflow with Prefect. + + Steps 1-7 run sequentially (shared preprocessing). + Steps 8-11 run in parallel for each cell type. 
+ + Parameters + ---------- + datadir : Path + Base directory for data files + dataset_name : str + Name of the dataset + url : str + URL to download data from + force_from_step : int, optional + If provided, forces re-run from this step onwards + min_samples_per_cell_type : int + Minimum samples required per cell type to run steps 8-11 + + Returns + ------- + dict + Dictionary containing all results organized by cell type + """ + logger = get_run_logger() + + # Setup directories + figure_dir = datadir / "workflow/figures" + figure_dir.mkdir(parents=True, exist_ok=True) + + checkpoint_dir = datadir / "workflow/checkpoints" + checkpoint_dir.mkdir(parents=True, exist_ok=True) + + results_dir = datadir / "workflow/results/per_cell_type" + results_dir.mkdir(parents=True, exist_ok=True) + + # Determine which steps to force re-run + force = {i: False for i in range(1, 12)} + if force_from_step is not None: + for i in range(force_from_step, 12): + force[i] = True + + # ========================================================================= + # STEP 1: Data Download + # ========================================================================= + logger.info("=" * 60) + logger.info("STEP 1: DATA DOWNLOAD") + logger.info("=" * 60) + + datafile = datadir / f"dataset-{dataset_name}_subset-immune_raw.h5ad" + download_data_task(datafile, url) + + # ========================================================================= + # STEP 2: Data Filtering + # ========================================================================= + logger.info("=" * 60) + logger.info("STEP 2: DATA FILTERING") + logger.info("=" * 60) + + adata = load_and_filter_task( + datafile=datafile, + checkpoint_file=checkpoint_dir + / bids_checkpoint_name(dataset_name, 2, "filtered"), + cutoff_percentile=1.0, + min_cells_per_celltype=10, + percent_donors=0.95, + figure_dir=figure_dir, + force=force[2], + ) + + # Build var_to_feature mapping + var_to_feature = dict(zip(adata.var_names, adata.var["feature_name"])) 
+ + # ========================================================================= + # STEP 3: Quality Control + # ========================================================================= + logger.info("=" * 60) + logger.info("STEP 3: QUALITY CONTROL") + logger.info("=" * 60) + + adata = quality_control_task( + adata=adata, + checkpoint_file=checkpoint_dir / bids_checkpoint_name(dataset_name, 3, "qc"), + min_genes=200, + max_genes=6000, + min_counts=500, + max_counts=30000, + max_hb_pct=5.0, + expected_doublet_rate=0.06, + figure_dir=figure_dir, + force=force[3], + ) + + # ========================================================================= + # STEP 4: Preprocessing + # ========================================================================= + logger.info("=" * 60) + logger.info("STEP 4: PREPROCESSING") + logger.info("=" * 60) + + adata = preprocessing_task( + adata=adata, + checkpoint_file=checkpoint_dir + / bids_checkpoint_name(dataset_name, 4, "preprocessed"), + target_sum=1e4, + n_top_genes=3000, + batch_key="donor_id", + force=force[4], + ) + + # ========================================================================= + # STEP 5: Dimensionality Reduction + # ========================================================================= + logger.info("=" * 60) + logger.info("STEP 5: DIMENSIONALITY REDUCTION") + logger.info("=" * 60) + + adata = dimensionality_reduction_task( + adata=adata, + checkpoint_file=checkpoint_dir + / bids_checkpoint_name(dataset_name, 5, "dimreduced"), + batch_key="donor_id", + n_neighbors=30, + n_pcs=40, + figure_dir=figure_dir, + force=force[5], + ) + + # ========================================================================= + # STEP 6: Clustering + # ========================================================================= + logger.info("=" * 60) + logger.info("STEP 6: CLUSTERING") + logger.info("=" * 60) + + adata = clustering_task( + adata=adata, + checkpoint_file=checkpoint_dir + / bids_checkpoint_name(dataset_name, 6, 
"clustered"), + resolution=1.0, + figure_dir=figure_dir, + force=force[6], + ) + + # ========================================================================= + # STEP 7: Pseudobulking + # ========================================================================= + logger.info("=" * 60) + logger.info("STEP 7: PSEUDOBULKING") + logger.info("=" * 60) + + # Load step 3 checkpoint for raw counts + step3_checkpoint = checkpoint_dir / bids_checkpoint_name(dataset_name, 3, "qc") + adata_raw_counts = load_checkpoint(step3_checkpoint) + logger.info(f"Loaded raw counts from step 3: {adata_raw_counts.shape}") + + pb_adata = pseudobulk_task( + adata=adata_raw_counts, + checkpoint_file=checkpoint_dir + / bids_checkpoint_name(dataset_name, 7, "pseudobulk"), + group_col="cell_type", + donor_col="donor_id", + metadata_cols=["development_stage", "sex"], + min_cells=10, + figure_dir=figure_dir, + layer=None, # Use .X directly (raw counts) + force=force[7], + ) + + # ========================================================================= + # STEPS 8-11: Per-Cell-Type Analysis (Parallel) + # ========================================================================= + logger.info("=" * 60) + logger.info("STEPS 8-11: PER-CELL-TYPE ANALYSIS") + logger.info("=" * 60) + + # Get all cell types from pseudobulk + cell_types = pb_adata.obs["cell_type"].unique().tolist() + logger.info(f"Found {len(cell_types)} cell types to analyze") + + # Filter cell types with insufficient samples + cell_type_counts = pb_adata.obs["cell_type"].value_counts() + valid_cell_types = [ + ct for ct in cell_types if cell_type_counts[ct] >= min_samples_per_cell_type + ] + skipped_cell_types = [ct for ct in cell_types if ct not in valid_cell_types] + + if skipped_cell_types: + logger.warning( + f"Skipping {len(skipped_cell_types)} cell types with < {min_samples_per_cell_type} samples: " + f"{skipped_cell_types}" + ) + + logger.info(f"Analyzing {len(valid_cell_types)} cell types") + + # Step 8: Submit all DE tasks in 
parallel + logger.info("Submitting differential expression tasks...") + de_futures = {} + for cell_type in valid_cell_types: + de_futures[cell_type] = differential_expression_task.submit( + pb_adata=pb_adata, + cell_type=cell_type, + var_to_feature=var_to_feature, + output_dir=results_dir, + design_factors=["age_scaled", "sex"], + n_cpus=4, # Reduced per-task to allow parallelism + ) + + # Steps 9-11: Submit pathway/enrichment/prediction tasks as DE completes + gsea_futures = {} + enrichr_futures = {} + prediction_futures = {} + + for cell_type in valid_cell_types: + # Wait for DE to complete for this cell type + de_result = de_futures[cell_type].result() + + # Get metadata for this cell type (for predictive modeling) + pb_adata_ct = pb_adata[pb_adata.obs["cell_type"] == cell_type].copy() + pb_adata_ct.obs["age"] = ( + pb_adata_ct.obs["development_stage"] + .str.extract(r"(\d+)-year-old")[0] + .astype(float) + ) + metadata_ct = pb_adata_ct.obs.copy() + + # Submit steps 9, 10, 11 in parallel for this cell type + gsea_futures[cell_type] = pathway_analysis_task.submit( + de_results=de_result["de_results"], + cell_type=cell_type, + output_dir=results_dir, + gene_sets=["MSigDB_Hallmark_2020"], + n_top=10, + ) + + enrichr_futures[cell_type] = overrepresentation_task.submit( + de_results=de_result["de_results"], + cell_type=cell_type, + output_dir=results_dir, + gene_sets=["MSigDB_Hallmark_2020"], + padj_threshold=0.05, + n_top=10, + ) + + prediction_futures[cell_type] = predictive_modeling_task.submit( + counts_df=de_result["counts_df"], + metadata=metadata_ct, + cell_type=cell_type, + output_dir=results_dir, + n_splits=5, + ) + + # Collect all results + logger.info("Collecting results...") + all_results = { + "adata": adata, + "pb_adata": pb_adata, + "per_cell_type": {}, + } + + for cell_type in valid_cell_types: + try: + all_results["per_cell_type"][cell_type] = { + "de": de_futures[cell_type].result(), + "gsea": gsea_futures[cell_type].result(), + "enrichment": 
enrichr_futures[cell_type].result(), + "prediction": prediction_futures[cell_type].result(), + } + logger.info(f"Completed analysis for: {cell_type}") + except Exception as e: + logger.error(f"Failed analysis for {cell_type}: {e}") + all_results["per_cell_type"][cell_type] = {"error": str(e)} + + # ========================================================================= + # Summary + # ========================================================================= + logger.info("=" * 60) + logger.info("WORKFLOW COMPLETE") + logger.info("=" * 60) + + successful = sum( + 1 + for ct_results in all_results["per_cell_type"].values() + if "error" not in ct_results + ) + failed = len(valid_cell_types) - successful + + logger.info( + f"Successfully analyzed: {successful}/{len(valid_cell_types)} cell types" + ) + if failed > 0: + logger.warning(f"Failed: {failed} cell types") + + logger.info(f"Figures saved to: {figure_dir}") + logger.info(f"Checkpoints saved to: {checkpoint_dir}") + logger.info(f"Per-cell-type results saved to: {results_dir}") + + return all_results + + +@flow(name="analyze_single_cell_type", log_prints=True) +def analyze_single_cell_type( + datadir: Path, + cell_type: str, + dataset_name: str = "OneK1K", +) -> dict[str, Any]: + """Run analysis for a single cell type (useful for debugging/testing). + + Requires that steps 1-7 have already been run. 
+ + Parameters + ---------- + datadir : Path + Base directory for data files + cell_type : str + Cell type to analyze + dataset_name : str + Name of the dataset + + Returns + ------- + dict + Results for the specified cell type + """ + logger = get_run_logger() + + checkpoint_dir = datadir / "workflow/checkpoints" + results_dir = datadir / "workflow/results/per_cell_type" + results_dir.mkdir(parents=True, exist_ok=True) + + # Load required checkpoints + pb_adata = load_checkpoint( + checkpoint_dir / bids_checkpoint_name(dataset_name, 7, "pseudobulk") + ) + adata_filtered = load_checkpoint( + checkpoint_dir / bids_checkpoint_name(dataset_name, 2, "filtered") + ) + var_to_feature = dict( + zip(adata_filtered.var_names, adata_filtered.var["feature_name"]) + ) + + # Verify cell type exists + available_cell_types = pb_adata.obs["cell_type"].unique().tolist() + if cell_type not in available_cell_types: + raise ValueError( + f"Cell type '{cell_type}' not found. Available: {available_cell_types}" + ) + + logger.info(f"Analyzing cell type: {cell_type}") + + # Run DE + de_result = differential_expression_task( + pb_adata=pb_adata, + cell_type=cell_type, + var_to_feature=var_to_feature, + output_dir=results_dir, + ) + + # Get metadata + pb_adata_ct = pb_adata[pb_adata.obs["cell_type"] == cell_type].copy() + pb_adata_ct.obs["age"] = ( + pb_adata_ct.obs["development_stage"] + .str.extract(r"(\d+)-year-old")[0] + .astype(float) + ) + metadata_ct = pb_adata_ct.obs.copy() + + # Run parallel tasks + gsea_future = pathway_analysis_task.submit( + de_results=de_result["de_results"], + cell_type=cell_type, + output_dir=results_dir, + ) + + enrichr_future = overrepresentation_task.submit( + de_results=de_result["de_results"], + cell_type=cell_type, + output_dir=results_dir, + ) + + prediction_future = predictive_modeling_task.submit( + counts_df=de_result["counts_df"], + metadata=metadata_ct, + cell_type=cell_type, + output_dir=results_dir, + ) + + return { + "cell_type": cell_type, + 
"de": de_result, + "gsea": gsea_future.result(), + "enrichment": enrichr_future.result(), + "prediction": prediction_future.result(), + } diff --git a/src/BetterCodeBetterScience/rnaseq/prefect_workflow/run_workflow.py b/src/BetterCodeBetterScience/rnaseq/prefect_workflow/run_workflow.py new file mode 100644 index 0000000..bc5e8ff --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/prefect_workflow/run_workflow.py @@ -0,0 +1,177 @@ +"""Entry point for running the Prefect-based scRNA-seq workflow. + +Usage: + python -m BetterCodeBetterScience.rnaseq.prefect_workflow.run_workflow + +Or with arguments: + python -m BetterCodeBetterScience.rnaseq.prefect_workflow.run_workflow --force-from 8 +""" + +import argparse +import os +from pathlib import Path + +from dotenv import load_dotenv + +from BetterCodeBetterScience.rnaseq.prefect_workflow.flows import ( + analyze_single_cell_type, + run_workflow, +) + + +def main(): + """Run the Prefect workflow.""" + parser = argparse.ArgumentParser( + description="Run the immune aging scRNA-seq workflow with Prefect" + ) + parser.add_argument( + "--datadir", + type=Path, + default=None, + help="Base directory for data files (default: from DATADIR env var)", + ) + parser.add_argument( + "--dataset-name", + type=str, + default="OneK1K", + help="Name of the dataset (default: OneK1K)", + ) + parser.add_argument( + "--force-from", + type=int, + default=None, + dest="force_from_step", + help="Force re-run from this step onwards (1-11)", + ) + parser.add_argument( + "--min-samples", + type=int, + default=10, + dest="min_samples", + help="Minimum samples per cell type for steps 8-11 (default: 10)", + ) + parser.add_argument( + "--cell-type", + type=str, + default=None, + dest="cell_type", + help="Run analysis for a single cell type only (requires prior completion of steps 1-7)", + ) + parser.add_argument( + "--list-cell-types", + action="store_true", + dest="list_cell_types", + help="List available cell types and exit", + ) + + args = 
parser.parse_args() + + # Load environment variables + load_dotenv() + + # Get data directory + if args.datadir is not None: + datadir = args.datadir + else: + datadir_env = os.getenv("DATADIR") + if datadir_env is None: + raise ValueError( + "DATADIR environment variable not set. " + "Set it or use --datadir argument." + ) + datadir = Path(datadir_env) / "immune_aging" + + print(f"Data directory: {datadir}") + + # List cell types if requested + if args.list_cell_types: + from BetterCodeBetterScience.rnaseq.stateless_workflow.checkpoint import ( + bids_checkpoint_name, + load_checkpoint, + ) + + checkpoint_dir = datadir / "workflow/checkpoints" + pb_checkpoint = checkpoint_dir / bids_checkpoint_name( + args.dataset_name, 7, "pseudobulk" + ) + + if not pb_checkpoint.exists(): + print(f"Pseudobulk checkpoint not found: {pb_checkpoint}") + print("Run steps 1-7 first to generate pseudobulk data.") + return + + pb_adata = load_checkpoint(pb_checkpoint) + cell_types = pb_adata.obs["cell_type"].unique().tolist() + cell_type_counts = pb_adata.obs["cell_type"].value_counts() + + print(f"\nAvailable cell types ({len(cell_types)} total):") + print("-" * 60) + for ct in sorted(cell_types): + count = cell_type_counts[ct] + status = ( + "OK" if count >= args.min_samples else f"< {args.min_samples} samples" + ) + print(f" {ct}: {count} samples ({status})") + return + + # Run single cell type analysis + if args.cell_type is not None: + print(f"\nRunning analysis for single cell type: {args.cell_type}") + results = analyze_single_cell_type( + datadir=datadir, + cell_type=args.cell_type, + dataset_name=args.dataset_name, + ) + print("\nResults:") + print(f" DE genes: {len(results['de']['de_results'])}") + if results["gsea"]["gsea_results"] is not None: + print(f" GSEA pathways: {len(results['gsea']['gsea_results'].res2d)}") + if results["prediction"]["prediction_results"]: + pred = results["prediction"]["prediction_results"] + import numpy as np + + print(f" Prediction R2: 
{np.mean(pred['full_r2']):.3f}") + print(f" Prediction MAE: {np.mean(pred['full_mae']):.2f} years") + return + + # Run full workflow + print("\nRunning full workflow...") + if args.force_from_step: + print(f"Forcing re-run from step {args.force_from_step}") + + results = run_workflow( + datadir=datadir, + dataset_name=args.dataset_name, + force_from_step=args.force_from_step, + min_samples_per_cell_type=args.min_samples, + ) + + # Print summary + print("\n" + "=" * 60) + print("RESULTS SUMMARY") + print("=" * 60) + + successful_cell_types = [ + ct for ct, res in results["per_cell_type"].items() if "error" not in res + ] + + print(f"Analyzed {len(successful_cell_types)} cell types:") + for ct in sorted(successful_cell_types): + ct_res = results["per_cell_type"][ct] + de_count = len(ct_res["de"]["de_results"]) + sig_genes = (ct_res["de"]["de_results"]["padj"] < 0.05).sum() + print(f" {ct}:") + print(f" - DE genes tested: {de_count}") + print(f" - Significant (padj<0.05): {sig_genes}") + + failed_cell_types = [ + ct for ct, res in results["per_cell_type"].items() if "error" in res + ] + if failed_cell_types: + print(f"\nFailed cell types ({len(failed_cell_types)}):") + for ct in failed_cell_types: + print(f" {ct}: {results['per_cell_type'][ct]['error']}") + + +if __name__ == "__main__": + main() diff --git a/src/BetterCodeBetterScience/rnaseq/prefect_workflow/tasks.py b/src/BetterCodeBetterScience/rnaseq/prefect_workflow/tasks.py new file mode 100644 index 0000000..b0c7c83 --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/prefect_workflow/tasks.py @@ -0,0 +1,376 @@ +"""Prefect task definitions for scRNA-seq workflow. + +Wraps modular workflow functions as Prefect tasks for orchestration. 
+""" + +from pathlib import Path +from typing import Any + +import anndata as ad +import pandas as pd +from prefect import task + +from BetterCodeBetterScience.rnaseq.modular_workflow.clustering import ( + run_clustering_pipeline, +) +from BetterCodeBetterScience.rnaseq.modular_workflow.data_filtering import ( + run_filtering_pipeline, +) +from BetterCodeBetterScience.rnaseq.modular_workflow.data_loading import ( + download_data, + load_lazy_anndata, +) +from BetterCodeBetterScience.rnaseq.modular_workflow.differential_expression import ( + run_differential_expression_pipeline, +) +from BetterCodeBetterScience.rnaseq.modular_workflow.dimensionality_reduction import ( + run_dimensionality_reduction_pipeline, +) +from BetterCodeBetterScience.rnaseq.modular_workflow.overrepresentation_analysis import ( + run_overrepresentation_pipeline, +) +from BetterCodeBetterScience.rnaseq.modular_workflow.pathway_analysis import ( + run_gsea_pipeline, +) +from BetterCodeBetterScience.rnaseq.modular_workflow.predictive_modeling import ( + run_predictive_modeling_pipeline, +) +from BetterCodeBetterScience.rnaseq.modular_workflow.preprocessing import ( + run_preprocessing_pipeline, +) +from BetterCodeBetterScience.rnaseq.modular_workflow.pseudobulk import ( + run_pseudobulk_pipeline, +) +from BetterCodeBetterScience.rnaseq.modular_workflow.quality_control import ( + run_qc_pipeline, +) +from BetterCodeBetterScience.rnaseq.stateless_workflow.checkpoint import ( + load_checkpoint, + save_checkpoint, +) + + +@task(name="download_data", retries=2, retry_delay_seconds=30) +def download_data_task(datafile: Path, url: str) -> Path: + """Download data file if it doesn't exist. + + Returns the datafile path for chaining. 
+ """ + download_data(datafile, url) + return datafile + + +@task(name="load_and_filter") +def load_and_filter_task( + datafile: Path, + checkpoint_file: Path, + cutoff_percentile: float = 1.0, + min_cells_per_celltype: int = 10, + percent_donors: float = 0.95, + figure_dir: Path | None = None, + force: bool = False, +) -> ad.AnnData: + """Load data and run filtering pipeline with checkpointing.""" + if checkpoint_file.exists() and not force: + print(f"Loading from checkpoint: {checkpoint_file.name}") + return load_checkpoint(checkpoint_file) + + adata = load_lazy_anndata(datafile) + print(f"Loaded dataset: {adata}") + adata = run_filtering_pipeline( + adata, + cutoff_percentile=cutoff_percentile, + min_cells_per_celltype=min_cells_per_celltype, + percent_donors=percent_donors, + figure_dir=figure_dir, + ) + save_checkpoint(adata, checkpoint_file) + return adata + + +@task(name="quality_control") +def quality_control_task( + adata: ad.AnnData, + checkpoint_file: Path, + min_genes: int = 200, + max_genes: int = 6000, + min_counts: int = 500, + max_counts: int = 30000, + max_hb_pct: float = 5.0, + expected_doublet_rate: float = 0.06, + figure_dir: Path | None = None, + force: bool = False, +) -> ad.AnnData: + """Run quality control pipeline with checkpointing.""" + if checkpoint_file.exists() and not force: + print(f"Loading from checkpoint: {checkpoint_file.name}") + return load_checkpoint(checkpoint_file) + + adata = run_qc_pipeline( + adata, + min_genes=min_genes, + max_genes=max_genes, + min_counts=min_counts, + max_counts=max_counts, + max_hb_pct=max_hb_pct, + expected_doublet_rate=expected_doublet_rate, + figure_dir=figure_dir, + ) + save_checkpoint(adata, checkpoint_file) + return adata + + +@task(name="preprocessing") +def preprocessing_task( + adata: ad.AnnData, + checkpoint_file: Path, + target_sum: float = 1e4, + n_top_genes: int = 3000, + batch_key: str = "donor_id", + force: bool = False, +) -> ad.AnnData: + """Run preprocessing pipeline with 
checkpointing.""" + if checkpoint_file.exists() and not force: + print(f"Loading from checkpoint: {checkpoint_file.name}") + return load_checkpoint(checkpoint_file) + + adata = run_preprocessing_pipeline( + adata, + target_sum=target_sum, + n_top_genes=n_top_genes, + batch_key=batch_key, + ) + # Remove counts layer after preprocessing to save space + if "counts" in adata.layers: + del adata.layers["counts"] + print("Removed counts layer to save checkpoint space") + + save_checkpoint(adata, checkpoint_file) + return adata + + +@task(name="dimensionality_reduction") +def dimensionality_reduction_task( + adata: ad.AnnData, + checkpoint_file: Path, + batch_key: str = "donor_id", + n_neighbors: int = 30, + n_pcs: int = 40, + figure_dir: Path | None = None, + force: bool = False, +) -> ad.AnnData: + """Run dimensionality reduction pipeline with checkpointing.""" + if checkpoint_file.exists() and not force: + print(f"Loading from checkpoint: {checkpoint_file.name}") + return load_checkpoint(checkpoint_file) + + adata = run_dimensionality_reduction_pipeline( + adata, + batch_key=batch_key, + n_neighbors=n_neighbors, + n_pcs=n_pcs, + figure_dir=figure_dir, + ) + save_checkpoint(adata, checkpoint_file) + return adata + + +@task(name="clustering") +def clustering_task( + adata: ad.AnnData, + checkpoint_file: Path, + resolution: float = 1.0, + figure_dir: Path | None = None, + force: bool = False, +) -> ad.AnnData: + """Run clustering pipeline with checkpointing.""" + if checkpoint_file.exists() and not force: + print(f"Loading from checkpoint: {checkpoint_file.name}") + return load_checkpoint(checkpoint_file) + + adata = run_clustering_pipeline( + adata, + resolution=resolution, + figure_dir=figure_dir, + ) + save_checkpoint(adata, checkpoint_file) + return adata + + +@task(name="pseudobulk") +def pseudobulk_task( + adata: ad.AnnData, + checkpoint_file: Path, + group_col: str = "cell_type", + donor_col: str = "donor_id", + metadata_cols: list[str] | None = None, + min_cells: 
int = 10, + figure_dir: Path | None = None, + layer: str | None = None, + force: bool = False, +) -> ad.AnnData: + """Run pseudobulking pipeline with checkpointing.""" + if checkpoint_file.exists() and not force: + print(f"Loading from checkpoint: {checkpoint_file.name}") + return load_checkpoint(checkpoint_file) + + pb_adata = run_pseudobulk_pipeline( + adata, + group_col=group_col, + donor_col=donor_col, + metadata_cols=metadata_cols, + min_cells=min_cells, + figure_dir=figure_dir, + layer=layer, + ) + save_checkpoint(pb_adata, checkpoint_file) + return pb_adata + + +@task(name="differential_expression", retries=1) +def differential_expression_task( + pb_adata: ad.AnnData, + cell_type: str, + var_to_feature: dict[str, str], + output_dir: Path, + design_factors: list[str] | None = None, + n_cpus: int = 8, +) -> dict[str, Any]: + """Run differential expression for a specific cell type. + + Returns dict with stat_res, de_results, and counts_df. + """ + print(f"\n{'=' * 60}") + print(f"Running DE for cell type: {cell_type}") + print(f"{'=' * 60}") + + stat_res, de_results, counts_df = run_differential_expression_pipeline( + pb_adata, + cell_type=cell_type, + design_factors=design_factors, + var_to_feature=var_to_feature, + n_cpus=n_cpus, + ) + + # Save results to cell-type specific directory + ct_dir = output_dir / _sanitize_cell_type(cell_type) + ct_dir.mkdir(parents=True, exist_ok=True) + + save_checkpoint(stat_res, ct_dir / "stat_res.pkl") + de_results.to_parquet(ct_dir / "de_results.parquet") + counts_df.to_parquet(ct_dir / "counts.parquet") + + return { + "cell_type": cell_type, + "stat_res": stat_res, + "de_results": de_results, + "counts_df": counts_df, + } + + +@task(name="pathway_analysis") +def pathway_analysis_task( + de_results: pd.DataFrame, + cell_type: str, + output_dir: Path, + gene_sets: list[str] | None = None, + n_top: int = 10, +) -> dict[str, Any]: + """Run GSEA pathway analysis for a cell type.""" + print(f"\n{'=' * 60}") + print(f"Running GSEA 
for cell type: {cell_type}") + print(f"{'=' * 60}") + + ct_dir = output_dir / _sanitize_cell_type(cell_type) + ct_dir.mkdir(parents=True, exist_ok=True) + figure_dir = ct_dir / "figures" + figure_dir.mkdir(parents=True, exist_ok=True) + + gsea_results = run_gsea_pipeline( + de_results, + gene_sets=gene_sets, + n_top=n_top, + figure_dir=figure_dir, + ) + + save_checkpoint(gsea_results, ct_dir / "gsea_results.pkl") + + return { + "cell_type": cell_type, + "gsea_results": gsea_results, + } + + +@task(name="overrepresentation") +def overrepresentation_task( + de_results: pd.DataFrame, + cell_type: str, + output_dir: Path, + gene_sets: list[str] | None = None, + padj_threshold: float = 0.05, + n_top: int = 10, +) -> dict[str, Any]: + """Run Enrichr overrepresentation analysis for a cell type.""" + print(f"\n{'=' * 60}") + print(f"Running Enrichr for cell type: {cell_type}") + print(f"{'=' * 60}") + + ct_dir = output_dir / _sanitize_cell_type(cell_type) + ct_dir.mkdir(parents=True, exist_ok=True) + figure_dir = ct_dir / "figures" + figure_dir.mkdir(parents=True, exist_ok=True) + + enr_up, enr_down = run_overrepresentation_pipeline( + de_results, + gene_sets=gene_sets, + padj_threshold=padj_threshold, + n_top=n_top, + figure_dir=figure_dir, + ) + + save_checkpoint(enr_up, ct_dir / "enrichr_up.pkl") + save_checkpoint(enr_down, ct_dir / "enrichr_down.pkl") + + return { + "cell_type": cell_type, + "enr_up": enr_up, + "enr_down": enr_down, + } + + +@task(name="predictive_modeling") +def predictive_modeling_task( + counts_df: pd.DataFrame, + metadata: pd.DataFrame, + cell_type: str, + output_dir: Path, + n_splits: int = 5, +) -> dict[str, Any]: + """Run predictive modeling for a cell type.""" + print(f"\n{'=' * 60}") + print(f"Running predictive modeling for cell type: {cell_type}") + print(f"{'=' * 60}") + + ct_dir = output_dir / _sanitize_cell_type(cell_type) + ct_dir.mkdir(parents=True, exist_ok=True) + figure_dir = ct_dir / "figures" + figure_dir.mkdir(parents=True, 
exist_ok=True) + + prediction_results = run_predictive_modeling_pipeline( + counts_df, + metadata, + n_splits=n_splits, + figure_dir=figure_dir, + ) + + save_checkpoint(prediction_results, ct_dir / "prediction_results.pkl") + + return { + "cell_type": cell_type, + "prediction_results": prediction_results, + } + + +def _sanitize_cell_type(cell_type: str) -> str: + """Sanitize cell type name for use as directory name.""" + return cell_type.replace(" ", "_").replace(",", "").replace("-", "_") diff --git a/uv.lock b/uv.lock index 996c206..5ab0f7a 100644 --- a/uv.lock +++ b/uv.lock @@ -85,6 +85,15 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/fb/76/641ae371508676492379f16e2fa48f4e2c11741bd63c48be4b12a6b09cba/aiosignal-1.4.0-py3-none-any.whl", hash = "sha256:053243f8b92b990551949e63930a839ff0cf0b0ebbe0597b0f3fb19e1a0fe82e", size = 7490, upload-time = "2025-07-03T22:54:42.156Z" }, ] +[[package]] +name = "aiosqlite" +version = "0.22.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/3a/0d/449c024bdabd0678ae07d804e60ed3b9786facd3add66f51eee67a0fccea/aiosqlite-0.22.0.tar.gz", hash = "sha256:7e9e52d72b319fcdeac727668975056c49720c995176dc57370935e5ba162bb9", size = 14707, upload-time = "2025-12-13T18:32:45.762Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/ef/39/b2181148075272edfbbd6d87e6cd78cc71dca243446fa3b381fd4116950b/aiosqlite-0.22.0-py3-none-any.whl", hash = "sha256:96007fac2ce70eda3ca1bba7a3008c435258a592b8fbf2ee3eeaa36d33971a09", size = 17263, upload-time = "2025-12-13T18:32:44.619Z" }, +] + [[package]] name = "airium" version = "0.2.7" @@ -103,6 +112,20 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/32/34/d4e1c02d3bee589efb5dfa17f88ea08bdb3e3eac12bc475462aec52ed223/alabaster-0.7.16-py3-none-any.whl", hash = "sha256:b46733c07dce03ae4e150330b975c75737fa60f0a7c591b6c8bf4928a28e2c92", size = 13511, upload-time = "2024-01-10T00:56:08.388Z" }, ] +[[package]] 
+name = "alembic" +version = "1.17.2" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "mako" }, + { name = "sqlalchemy" }, + { name = "typing-extensions" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/02/a6/74c8cadc2882977d80ad756a13857857dbcf9bd405bc80b662eb10651282/alembic-1.17.2.tar.gz", hash = "sha256:bbe9751705c5e0f14877f02d46c53d10885e377e3d90eda810a016f9baa19e8e", size = 1988064, upload-time = "2025-11-14T20:35:04.057Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/ba/88/6237e97e3385b57b5f1528647addea5cc03d4d65d5979ab24327d41fb00d/alembic-1.17.2-py3-none-any.whl", hash = "sha256:f483dd1fe93f6c5d49217055e4d15b905b425b6af906746abb35b69c1996c4e6", size = 248554, upload-time = "2025-11-14T20:35:05.699Z" }, +] + [[package]] name = "anndata" version = "0.12.7" @@ -212,6 +235,24 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/81/29/5ecc3a15d5a33e31b26c11426c45c501e439cb865d0bff96315d86443b78/appnope-0.1.4-py2.py3-none-any.whl", hash = "sha256:502575ee11cd7a28c0205f379b525beefebab9d161b7c964670864014ed7213c", size = 4321, upload-time = "2024-02-06T09:43:09.663Z" }, ] +[[package]] +name = "apprise" +version = "1.9.6" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "certifi" }, + { name = "click" }, + { name = "markdown" }, + { name = "pyyaml" }, + { name = "requests" }, + { name = "requests-oauthlib" }, + { name = "tzdata", marker = "sys_platform == 'win32'" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/a9/a7/bb182d81f35c3fe405505f0976da4b74f942cfdd53c7193b0fe50412aa27/apprise-1.9.6.tar.gz", hash = "sha256:4206be9cb5694a3d08dd8e0393bbb9b36212ac3a7769c2633620055e75c6caef", size = 1921714, upload-time = "2025-12-07T19:24:30.587Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/39/df/343d125241f8cd3c9af58fd09688cf2bf59cc1edfd609adafef3556ce8ec/apprise-1.9.6-py3-none-any.whl", hash = 
"sha256:2fd18e8a5251b6a12f6f9d169f1d895d458d1de36a5faee4db149cedcce51674", size = 1452059, upload-time = "2025-12-07T19:24:28.568Z" }, +] + [[package]] name = "argon2-cffi" version = "25.1.0" @@ -267,6 +308,18 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/ed/c9/d7977eaacb9df673210491da99e6a247e93df98c715fc43fd136ce1d3d33/arrow-1.4.0-py3-none-any.whl", hash = "sha256:749f0769958ebdc79c173ff0b0670d59051a535fa26e8eba02953dc19eb43205", size = 68797, upload-time = "2025-10-18T17:46:45.663Z" }, ] +[[package]] +name = "asgi-lifespan" +version = "2.1.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "sniffio" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/6a/da/e7908b54e0f8043725a990bf625f2041ecf6bfe8eb7b19407f1c00b630f7/asgi-lifespan-2.1.0.tar.gz", hash = "sha256:5e2effaf0bfe39829cf2d64e7ecc47c7d86d676a6599f7afba378c31f5e3a308", size = 15627, upload-time = "2023-03-28T17:35:49.126Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/2f/f5/c36551e93acba41a59939ae6a0fb77ddb3f2e8e8caa716410c65f7341f72/asgi_lifespan-2.1.0-py3-none-any.whl", hash = "sha256:ed840706680e28428c01e14afb3875d7d76d3206f3d5b2f2294e059b5c23804f", size = 10895, upload-time = "2023-03-28T17:35:47.772Z" }, +] + [[package]] name = "asttokens" version = "3.0.1" @@ -285,6 +338,22 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/03/49/d10027df9fce941cb8184e78a02857af36360d33e1721df81c5ed2179a1a/async_lru-2.0.5-py3-none-any.whl", hash = "sha256:ab95404d8d2605310d345932697371a5f40def0487c03d6d0ad9138de52c9943", size = 6069, upload-time = "2025-03-16T17:25:35.422Z" }, ] +[[package]] +name = "asyncpg" +version = "0.31.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/fe/cc/d18065ce2380d80b1bcce927c24a2642efd38918e33fd724bc4bca904877/asyncpg-0.31.0.tar.gz", hash = "sha256:c989386c83940bfbd787180f2b1519415e2d3d6277a70d9d0f0145ac73500735", size = 993667, 
upload-time = "2025-11-24T23:27:00.812Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/2a/a6/59d0a146e61d20e18db7396583242e32e0f120693b67a8de43f1557033e2/asyncpg-0.31.0-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:b44c31e1efc1c15188ef183f287c728e2046abb1d26af4d20858215d50d91fad", size = 662042, upload-time = "2025-11-24T23:25:49.578Z" }, + { url = "https://files.pythonhosted.org/packages/36/01/ffaa189dcb63a2471720615e60185c3f6327716fdc0fc04334436fbb7c65/asyncpg-0.31.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:0c89ccf741c067614c9b5fc7f1fc6f3b61ab05ae4aaa966e6fd6b93097c7d20d", size = 638504, upload-time = "2025-11-24T23:25:51.501Z" }, + { url = "https://files.pythonhosted.org/packages/9f/62/3f699ba45d8bd24c5d65392190d19656d74ff0185f42e19d0bbd973bb371/asyncpg-0.31.0-cp312-cp312-manylinux_2_28_aarch64.whl", hash = "sha256:12b3b2e39dc5470abd5e98c8d3373e4b1d1234d9fbdedf538798b2c13c64460a", size = 3426241, upload-time = "2025-11-24T23:25:53.278Z" }, + { url = "https://files.pythonhosted.org/packages/8c/d1/a867c2150f9c6e7af6462637f613ba67f78a314b00db220cd26ff559d532/asyncpg-0.31.0-cp312-cp312-manylinux_2_28_x86_64.whl", hash = "sha256:aad7a33913fb8bcb5454313377cc330fbb19a0cd5faa7272407d8a0c4257b671", size = 3520321, upload-time = "2025-11-24T23:25:54.982Z" }, + { url = "https://files.pythonhosted.org/packages/7a/1a/cce4c3f246805ecd285a3591222a2611141f1669d002163abef999b60f98/asyncpg-0.31.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:3df118d94f46d85b2e434fd62c84cb66d5834d5a890725fe625f498e72e4d5ec", size = 3316685, upload-time = "2025-11-24T23:25:57.43Z" }, + { url = "https://files.pythonhosted.org/packages/40/ae/0fc961179e78cc579e138fad6eb580448ecae64908f95b8cb8ee2f241f67/asyncpg-0.31.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:bd5b6efff3c17c3202d4b37189969acf8927438a238c6257f66be3c426beba20", size = 3471858, upload-time = "2025-11-24T23:25:59.636Z" }, + { url = 
"https://files.pythonhosted.org/packages/52/b2/b20e09670be031afa4cbfabd645caece7f85ec62d69c312239de568e058e/asyncpg-0.31.0-cp312-cp312-win32.whl", hash = "sha256:027eaa61361ec735926566f995d959ade4796f6a49d3bde17e5134b9964f9ba8", size = 527852, upload-time = "2025-11-24T23:26:01.084Z" }, + { url = "https://files.pythonhosted.org/packages/b5/f0/f2ed1de154e15b107dc692262395b3c17fc34eafe2a78fc2115931561730/asyncpg-0.31.0-cp312-cp312-win_amd64.whl", hash = "sha256:72d6bdcbc93d608a1158f17932de2321f68b1a967a13e014998db87a72ed3186", size = 597175, upload-time = "2025-11-24T23:26:02.564Z" }, +] + [[package]] name = "attrs" version = "25.4.0" @@ -425,6 +494,7 @@ dependencies = [ { name = "pandas" }, { name = "pickleshare" }, { name = "pre-commit" }, + { name = "prefect" }, { name = "pyarrow" }, { name = "pydeseq2" }, { name = "pygithub" }, @@ -503,6 +573,7 @@ requires-dist = [ { name = "pandas", specifier = ">=2.2.3" }, { name = "pickleshare", specifier = ">=0.7.5" }, { name = "pre-commit", specifier = ">=4.2.0" }, + { name = "prefect", specifier = ">=3.0" }, { name = "pyarrow", specifier = ">=22.0.0" }, { name = "pydeseq2", specifier = ">=0.5.3" }, { name = "pygithub", specifier = ">=2.4.0" }, @@ -714,11 +785,11 @@ wheels = [ [[package]] name = "cachetools" -version = "6.2.4" +version = "5.5.2" source = { registry = "https://pypi.org/simple" } -sdist = { url = "https://files.pythonhosted.org/packages/bc/1d/ede8680603f6016887c062a2cf4fc8fdba905866a3ab8831aa8aa651320c/cachetools-6.2.4.tar.gz", hash = "sha256:82c5c05585e70b6ba2d3ae09ea60b79548872185d2f24ae1f2709d37299fd607", size = 31731, upload-time = "2025-12-15T18:24:53.744Z" } +sdist = { url = "https://files.pythonhosted.org/packages/6c/81/3747dad6b14fa2cf53fcf10548cf5aea6913e96fab41a3c198676f8948a5/cachetools-5.5.2.tar.gz", hash = "sha256:1a661caa9175d26759571b2e19580f9d6393969e5dfca11fdb1f947a23e640d4", size = 28380, upload-time = "2025-02-20T21:01:19.524Z" } wheels = [ - { url = 
"https://files.pythonhosted.org/packages/2c/fc/1d7b80d0eb7b714984ce40efc78859c022cd930e402f599d8ca9e39c78a4/cachetools-6.2.4-py3-none-any.whl", hash = "sha256:69a7a52634fed8b8bf6e24a050fb60bff1c9bd8f6d24572b99c32d4e71e62a51", size = 11551, upload-time = "2025-12-15T18:24:52.332Z" }, + { url = "https://files.pythonhosted.org/packages/72/76/20fa66124dbe6be5cafeb312ece67de6b61dd91a0247d1ea13db4ebb33c2/cachetools-5.5.2-py3-none-any.whl", hash = "sha256:d26a22bcc62eb95c3beabd9f1ee5e820d3d2704fe2967cbe350e20c8ffcd3f0a", size = 10080, upload-time = "2025-02-20T21:01:16.647Z" }, ] [[package]] @@ -883,14 +954,14 @@ wheels = [ [[package]] name = "click" -version = "8.3.1" +version = "8.1.8" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "colorama", marker = "sys_platform == 'win32'" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/3d/fa/656b739db8587d7b5dfa22e22ed02566950fbfbcdc20311993483657a5c0/click-8.3.1.tar.gz", hash = "sha256:12ff4785d337a1bb490bb7e9c2b1ee5da3112e94a8622f26a6c77f5d2fc6842a", size = 295065, upload-time = "2025-11-15T20:45:42.706Z" } +sdist = { url = "https://files.pythonhosted.org/packages/b9/2e/0090cbf739cee7d23781ad4b89a9894a41538e4fcf4c31dcdd705b78eb8b/click-8.1.8.tar.gz", hash = "sha256:ed53c9d8990d83c2a27deae68e4ee337473f6330c040a31d4225c9574d16096a", size = 226593, upload-time = "2024-12-21T18:38:44.339Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/98/78/01c019cdb5d6498122777c1a43056ebb3ebfeef2076d9d026bfe15583b2b/click-8.3.1-py3-none-any.whl", hash = "sha256:981153a64e25f12d547d3426c367a4857371575ee7ad18df2a6183ab0545b2a6", size = 108274, upload-time = "2025-11-15T20:45:41.139Z" }, + { url = "https://files.pythonhosted.org/packages/7e/d4/7ebdbd03970677812aac39c869717059dbb71a4cfc033ca6e5221787892c/click-8.1.8-py3-none-any.whl", hash = "sha256:63c132bbbed01578a06712a2d1f497bb62d9c1c0d329b7903a866228027263b2", size = 98188, upload-time = "2024-12-21T18:38:41.666Z" }, ] [[package]] @@ 
-985,6 +1056,15 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/d1/e2/f05240d2c39a1ed228d8328a78b6f44cd695f7ef47beb3e684cf93604f86/contourpy-1.3.3-cp312-cp312-win_arm64.whl", hash = "sha256:07ce5ed73ecdc4a03ffe3e1b3e3c1166db35ae7584be76f65dbbe28a7791b0cc", size = 193655, upload-time = "2025-07-26T12:01:37.999Z" }, ] +[[package]] +name = "coolname" +version = "2.2.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/c5/c6/1eaa4495ff4640e80d9af64f540e427ba1596a20f735d4c4750fe0386d07/coolname-2.2.0.tar.gz", hash = "sha256:6c5d5731759104479e7ca195a9b64f7900ac5bead40183c09323c7d0be9e75c7", size = 59006, upload-time = "2023-01-09T14:50:41.724Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/1b/b1/5745d7523d8ce53b87779f46ef6cf5c5c342997939c2fe967e607b944e43/coolname-2.2.0-py2.py3-none-any.whl", hash = "sha256:4d1563186cfaf71b394d5df4c744f8c41303b6846413645e31d31915cdeb13e8", size = 37849, upload-time = "2023-01-09T14:50:39.897Z" }, +] + [[package]] name = "coverage" version = "7.13.0" @@ -1208,6 +1288,21 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/53/ee/e97f37023938022e38d3abb28058191025c7a2cb240210e7e016f21fee72/datalad_osf-0.3.0-py2.py3-none-any.whl", hash = "sha256:2cdc42ac3015d0734ac1f386a2f09fe2bfd2bad56e2035ebcce87a378b0ec209", size = 26384, upload-time = "2023-06-09T09:45:40.014Z" }, ] +[[package]] +name = "dateparser" +version = "1.2.2" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "python-dateutil" }, + { name = "pytz" }, + { name = "regex" }, + { name = "tzlocal" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/a9/30/064144f0df1749e7bb5faaa7f52b007d7c2d08ec08fed8411aba87207f68/dateparser-1.2.2.tar.gz", hash = "sha256:986316f17cb8cdc23ea8ce563027c5ef12fc725b6fb1d137c14ca08777c5ecf7", size = 329840, upload-time = "2025-06-26T09:29:23.211Z" } +wheels = [ + { url = 
"https://files.pythonhosted.org/packages/87/22/f020c047ae1346613db9322638186468238bcfa8849b4668a22b97faad65/dateparser-1.2.2-py3-none-any.whl", hash = "sha256:5a5d7211a09013499867547023a2a0c91d5a27d15dd4dbcea676ea9fe66f2482", size = 315453, upload-time = "2025-06-26T09:29:21.412Z" }, +] + [[package]] name = "debugpy" version = "1.8.19" @@ -1371,6 +1466,18 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/26/b5/d343da782460999bd3e7c3c367b91d7b77f2eaf424bff7b315ce72bb4e54/eutils-0.6.1-py3-none-any.whl", hash = "sha256:6916efd10f397f20ba0e6bd5b84d4e868e077161509e240d7c4ab1d98fb2d3b1", size = 40910, upload-time = "2025-12-07T23:33:43.053Z" }, ] +[[package]] +name = "exceptiongroup" +version = "1.3.1" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "typing-extensions" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/50/79/66800aadf48771f6b62f7eb014e352e5d06856655206165d775e675a02c9/exceptiongroup-1.3.1.tar.gz", hash = "sha256:8b412432c6055b0b7d14c310000ae93352ed6754f70fa8f7c34141f91c4e3219", size = 30371, upload-time = "2025-11-21T23:01:54.787Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/8a/0e/97c33bf5009bdbac74fd2beace167cab3f978feb69cc36f1ef79360d6c4e/exceptiongroup-1.3.1-py3-none-any.whl", hash = "sha256:a7a39a3bd276781e98394987d3a5701d0c4edffb633bb7a5144577f82c773598", size = 16740, upload-time = "2025-11-21T23:01:53.443Z" }, +] + [[package]] name = "executing" version = "2.2.1" @@ -1720,6 +1827,18 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/6c/79/3912a94cf27ec503e51ba493692d6db1e3cd8ac7ac52b0b47c8e33d7f4f9/greenlet-3.3.0-cp312-cp312-win_amd64.whl", hash = "sha256:a7a34b13d43a6b78abf828a6d0e87d3385680eaf830cd60d20d52f249faabf39", size = 301964, upload-time = "2025-12-04T14:36:58.316Z" }, ] +[[package]] +name = "griffe" +version = "1.15.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "colorama" }, +] +sdist = { url = 
"https://files.pythonhosted.org/packages/0d/0c/3a471b6e31951dce2360477420d0a8d1e00dea6cf33b70f3e8c3ab6e28e1/griffe-1.15.0.tar.gz", hash = "sha256:7726e3afd6f298fbc3696e67958803e7ac843c1cfe59734b6251a40cdbfb5eea", size = 424112, upload-time = "2025-11-10T15:03:15.52Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/9c/83/3b1d03d36f224edded98e9affd0467630fc09d766c0e56fb1498cbb04a9b/griffe-1.15.0-py3-none-any.whl", hash = "sha256:6f6762661949411031f5fcda9593f586e6ce8340f0ba88921a0f2ef7a81eb9a3", size = 150705, upload-time = "2025-11-10T15:03:13.549Z" }, +] + [[package]] name = "grpcio" version = "1.76.0" @@ -1782,6 +1901,19 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/04/4b/29cac41a4d98d144bf5f6d33995617b185d14b22401f75ca86f384e87ff1/h11-0.16.0-py3-none-any.whl", hash = "sha256:63cf8bbe7522de3bf65932fda1d9c2772064ffb3dae62d55932da54b31cb6c86", size = 37515, upload-time = "2025-04-24T03:35:24.344Z" }, ] +[[package]] +name = "h2" +version = "4.3.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "hpack" }, + { name = "hyperframe" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/1d/17/afa56379f94ad0fe8defd37d6eb3f89a25404ffc71d4d848893d270325fc/h2-4.3.0.tar.gz", hash = "sha256:6c59efe4323fa18b47a632221a1888bd7fde6249819beda254aeca909f221bf1", size = 2152026, upload-time = "2025-08-23T18:12:19.778Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/69/b2/119f6e6dcbd96f9069ce9a2665e0146588dc9f88f29549711853645e736a/h2-4.3.0-py3-none-any.whl", hash = "sha256:c438f029a25f7945c69e0ccf0fb951dc3f73a5f6412981daee861431b70e2bdd", size = 61779, upload-time = "2025-08-23T18:12:17.779Z" }, +] + [[package]] name = "h5py" version = "3.15.1" @@ -1857,6 +1989,15 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/cb/44/870d44b30e1dcfb6a65932e3e1506c103a8a5aea9103c337e7a53180322c/hf_xet-1.2.0-cp37-abi3-win_amd64.whl", hash = 
"sha256:e6584a52253f72c9f52f9e549d5895ca7a471608495c4ecaa6cc73dba2b24d69", size = 2905735, upload-time = "2025-10-24T19:04:35.928Z" }, ] +[[package]] +name = "hpack" +version = "4.1.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/2c/48/71de9ed269fdae9c8057e5a4c0aa7402e8bb16f2c6e90b3aa53327b113f8/hpack-4.1.0.tar.gz", hash = "sha256:ec5eca154f7056aa06f196a557655c5b009b382873ac8d1e66e79e87535f1dca", size = 51276, upload-time = "2025-01-22T21:44:58.347Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/07/c6/80c95b1b2b94682a72cbdbfb85b81ae2daffa4291fbfa1b1464502ede10d/hpack-4.1.0-py3-none-any.whl", hash = "sha256:157ac792668d995c657d93111f46b4535ed114f0c9c8d672271bbec7eae1b496", size = 34357, upload-time = "2025-01-22T21:44:56.92Z" }, +] + [[package]] name = "httpcore" version = "1.0.9" @@ -1900,6 +2041,11 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/2a/39/e50c7c3a983047577ee07d2a9e53faf5a69493943ec3f6a384bdc792deb2/httpx-0.28.1-py3-none-any.whl", hash = "sha256:d909fcccc110f8c7faf814ca82a9a4d816bc5a6dbfea25d6591d6985b8ba59ad", size = 73517, upload-time = "2024-12-06T15:37:21.509Z" }, ] +[package.optional-dependencies] +http2 = [ + { name = "h2" }, +] + [[package]] name = "huggingface-hub" version = "0.36.0" @@ -1940,6 +2086,15 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/c5/7b/bca5613a0c3b542420cf92bd5e5fb8ebd5435ce1011a091f66bb7693285e/humanize-4.15.0-py3-none-any.whl", hash = "sha256:b1186eb9f5a9749cd9cb8565aee77919dd7c8d076161cf44d70e59e3301e1769", size = 132203, upload-time = "2025-12-20T20:16:11.67Z" }, ] +[[package]] +name = "hyperframe" +version = "6.1.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/02/e7/94f8232d4a74cc99514c13a9f995811485a6903d48e5d952771ef6322e30/hyperframe-6.1.0.tar.gz", hash = "sha256:f630908a00854a7adeabd6382b43923a4c4cd4b821fcb527e6ab9e15382a3b08", size = 26566, 
upload-time = "2025-01-22T21:41:49.302Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/48/30/47d0bf6072f7252e6521f3447ccfa40b421b6824517f82854703d0f5a98b/hyperframe-6.1.0-py3-none-any.whl", hash = "sha256:b03380493a519fce58ea5af42e4a42317bf9bd425596f7a0835ffce80f1a42e5", size = 13007, upload-time = "2025-01-22T21:41:47.295Z" }, +] + [[package]] name = "hypothesis" version = "6.148.7" @@ -2254,6 +2409,19 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/62/a1/3d680cbfd5f4b8f15abc1d571870c5fc3e594bb582bc3b64ea099db13e56/jinja2-3.1.6-py3-none-any.whl", hash = "sha256:85ece4451f492d0c13c5dd7c13a64681a86afae63a5f347908daf103ce6d2f67", size = 134899, upload-time = "2025-03-05T20:05:00.369Z" }, ] +[[package]] +name = "jinja2-humanize-extension" +version = "0.4.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "humanize" }, + { name = "jinja2" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/74/77/0bba383819dd4e67566487c11c49479ced87e77c3285d8e7f7a3401cf882/jinja2_humanize_extension-0.4.0.tar.gz", hash = "sha256:e7d69b1c20f32815bbec722330ee8af14b1287bb1c2b0afa590dbf031cadeaa0", size = 4746, upload-time = "2023-09-01T12:52:42.781Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/26/b4/08c9d297edd5e1182506edecccbb88a92e1122a057953068cadac420ca5d/jinja2_humanize_extension-0.4.0-py3-none-any.whl", hash = "sha256:b6326e2da0f7d425338bebf58848e830421defbce785f12ae812e65128518156", size = 4769, upload-time = "2023-09-01T12:52:41.098Z" }, +] + [[package]] name = "jiter" version = "0.12.0" @@ -2352,6 +2520,18 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/f8/62/d9ba6323b9202dd2fe166beab8a86d29465c41a0288cbe229fac60c1ab8d/jsonlines-4.0.0-py3-none-any.whl", hash = "sha256:185b334ff2ca5a91362993f42e83588a360cf95ce4b71a73548502bda52a7c55", size = 8701, upload-time = "2023-09-01T12:34:42.563Z" }, ] +[[package]] +name = "jsonpatch" +version = "1.33" +source = { registry = 
"https://pypi.org/simple" } +dependencies = [ + { name = "jsonpointer" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/42/78/18813351fe5d63acad16aec57f94ec2b70a09e53ca98145589e185423873/jsonpatch-1.33.tar.gz", hash = "sha256:9fcd4009c41e6d12348b4a0ff2563ba56a2923a7dfee731d004e212e1ee5030c", size = 21699, upload-time = "2023-06-26T12:07:29.144Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/73/07/02e16ed01e04a374e644b575638ec7987ae846d25ad97bcc9945a3ee4b0e/jsonpatch-1.33-py2.py3-none-any.whl", hash = "sha256:0ae28c0cd062bbd8b8ecc26d7d164fbbea9652a1a3693f3b956c1eae5145dade", size = 12898, upload-time = "2023-06-16T21:01:28.466Z" }, +] + [[package]] name = "jsonpointer" version = "3.0.0" @@ -2943,6 +3123,27 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/ea/7b/93c73c67db235931527301ed3785f849c78991e2e34f3fd9a6663ffda4c5/lxml-6.0.2-cp312-cp312-win_arm64.whl", hash = "sha256:61cb10eeb95570153e0c0e554f58df92ecf5109f75eacad4a95baa709e26c3d6", size = 3672836, upload-time = "2025-09-22T04:01:52.145Z" }, ] +[[package]] +name = "mako" +version = "1.3.10" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "markupsafe" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/9e/38/bd5b78a920a64d708fe6bc8e0a2c075e1389d53bef8413725c63ba041535/mako-1.3.10.tar.gz", hash = "sha256:99579a6f39583fa7e5630a28c3c1f440e4e97a414b80372649c0ce338da2ea28", size = 392474, upload-time = "2025-04-10T12:44:31.16Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/87/fb/99f81ac72ae23375f22b7afdb7642aba97c00a713c217124420147681a2f/mako-1.3.10-py3-none-any.whl", hash = "sha256:baef24a52fc4fc514a0887ac600f9f1cff3d82c61d4d700a1fa84d597b88db59", size = 78509, upload-time = "2025-04-10T12:50:53.297Z" }, +] + +[[package]] +name = "markdown" +version = "3.10" +source = { registry = "https://pypi.org/simple" } +sdist = { url = 
"https://files.pythonhosted.org/packages/7d/ab/7dd27d9d863b3376fcf23a5a13cb5d024aed1db46f963f1b5735ae43b3be/markdown-3.10.tar.gz", hash = "sha256:37062d4f2aa4b2b6b32aefb80faa300f82cc790cb949a35b8caede34f2b68c0e", size = 364931, upload-time = "2025-11-03T19:51:15.007Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/70/81/54e3ce63502cd085a0c556652a4e1b919c45a446bd1e5300e10c44c8c521/markdown-3.10-py3-none-any.whl", hash = "sha256:b5b99d6951e2e4948d939255596523444c0e677c669700b1d17aa4a8a464cb7c", size = 107678, upload-time = "2025-11-03T19:51:13.887Z" }, +] + [[package]] name = "markdown-it-py" version = "4.0.0" @@ -3927,11 +4128,11 @@ wheels = [ [[package]] name = "packaging" -version = "25.0" +version = "24.2" source = { registry = "https://pypi.org/simple" } -sdist = { url = "https://files.pythonhosted.org/packages/a1/d4/1fc4078c65507b51b96ca8f8c3ba19e6a61c8253c72794544580a7b6c24d/packaging-25.0.tar.gz", hash = "sha256:d443872c98d677bf60f6a1f2f8c1cb748e8fe762d2bf9d3148b5599295b0fc4f", size = 165727, upload-time = "2025-04-19T11:48:59.673Z" } +sdist = { url = "https://files.pythonhosted.org/packages/d0/63/68dbb6eb2de9cb10ee4c9c14a0148804425e13c4fb20d61cce69f53106da/packaging-24.2.tar.gz", hash = "sha256:c228a6dc5e932d346bc5739379109d49e8853dd8223571c7c5b55260edc0b97f", size = 163950, upload-time = "2024-11-08T09:47:47.202Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/20/12/38679034af332785aac8774540895e234f4d07f7545804097de4b666afd8/packaging-25.0-py3-none-any.whl", hash = "sha256:29572ef2b1f17581046b3a2227d5c611fb25ec70ca1ba8554b24b0e69331a484", size = 66469, upload-time = "2025-04-19T11:48:57.875Z" }, + { url = "https://files.pythonhosted.org/packages/88/ef/eb23f262cca3c0c4eb7ab1933c3b1f03d021f2c48f54763065b6f0e321be/packaging-24.2-py3-none-any.whl", hash = "sha256:09abb1bccd265c01f4a3aa3f7a7db064b36514d2cba19a2f694fe6150451a759", size = 65451, upload-time = "2024-11-08T09:47:44.722Z" }, ] [[package]] @@ -4034,6 +4235,29 @@ 
wheels = [ { url = "https://files.pythonhosted.org/packages/f1/70/ba4b949bdc0490ab78d545459acd7702b211dfccf7eb89bbc1060f52818d/patsy-1.0.2-py2.py3-none-any.whl", hash = "sha256:37bfddbc58fcf0362febb5f54f10743f8b21dd2aa73dec7e7ef59d1b02ae668a", size = 233301, upload-time = "2025-10-20T16:17:36.563Z" }, ] +[[package]] +name = "pendulum" +version = "3.1.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "python-dateutil" }, + { name = "tzdata" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/23/7c/009c12b86c7cc6c403aec80f8a4308598dfc5995e5c523a5491faaa3952e/pendulum-3.1.0.tar.gz", hash = "sha256:66f96303560f41d097bee7d2dc98ffca716fbb3a832c4b3062034c2d45865015", size = 85930, upload-time = "2025-04-19T14:30:01.675Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/7a/d7/b1bfe15a742f2c2713acb1fdc7dc3594ff46ef9418ac6a96fcb12a6ba60b/pendulum-3.1.0-cp312-cp312-macosx_10_12_x86_64.whl", hash = "sha256:4dfd53e7583ccae138be86d6c0a0b324c7547df2afcec1876943c4d481cf9608", size = 336209, upload-time = "2025-04-19T14:01:27.815Z" }, + { url = "https://files.pythonhosted.org/packages/eb/87/0392da0c603c828b926d9f7097fbdddaafc01388cb8a00888635d04758c3/pendulum-3.1.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:6a6e06a28f3a7d696546347805536f6f38be458cb79de4f80754430696bea9e6", size = 323130, upload-time = "2025-04-19T14:01:29.336Z" }, + { url = "https://files.pythonhosted.org/packages/c0/61/95f1eec25796be6dddf71440ee16ec1fd0c573fc61a73bd1ef6daacd529a/pendulum-3.1.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:7e68d6a51880708084afd8958af42dc8c5e819a70a6c6ae903b1c4bfc61e0f25", size = 341509, upload-time = "2025-04-19T14:01:31.1Z" }, + { url = "https://files.pythonhosted.org/packages/b5/7b/eb0f5e6aa87d5e1b467a1611009dbdc92f0f72425ebf07669bfadd8885a6/pendulum-3.1.0-cp312-cp312-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = 
"sha256:9e3f1e5da39a7ea7119efda1dd96b529748c1566f8a983412d0908455d606942", size = 378674, upload-time = "2025-04-19T14:01:32.974Z" }, + { url = "https://files.pythonhosted.org/packages/29/68/5a4c1b5de3e54e16cab21d2ec88f9cd3f18599e96cc90a441c0b0ab6b03f/pendulum-3.1.0-cp312-cp312-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:e9af1e5eeddb4ebbe1b1c9afb9fd8077d73416ade42dd61264b3f3b87742e0bb", size = 436133, upload-time = "2025-04-19T14:01:34.349Z" }, + { url = "https://files.pythonhosted.org/packages/87/5d/f7a1d693e5c0f789185117d5c1d5bee104f5b0d9fbf061d715fb61c840a8/pendulum-3.1.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:20f74aa8029a42e327bfc150472e0e4d2358fa5d795f70460160ba81b94b6945", size = 351232, upload-time = "2025-04-19T14:01:35.669Z" }, + { url = "https://files.pythonhosted.org/packages/30/77/c97617eb31f1d0554edb073201a294019b9e0a9bd2f73c68e6d8d048cd6b/pendulum-3.1.0-cp312-cp312-musllinux_1_1_aarch64.whl", hash = "sha256:cf6229e5ee70c2660148523f46c472e677654d0097bec010d6730f08312a4931", size = 521562, upload-time = "2025-04-19T14:01:37.05Z" }, + { url = "https://files.pythonhosted.org/packages/76/22/0d0ef3393303877e757b848ecef8a9a8c7627e17e7590af82d14633b2cd1/pendulum-3.1.0-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:350cabb23bf1aec7c7694b915d3030bff53a2ad4aeabc8c8c0d807c8194113d6", size = 523221, upload-time = "2025-04-19T14:01:38.444Z" }, + { url = "https://files.pythonhosted.org/packages/99/f3/aefb579aa3cebd6f2866b205fc7a60d33e9a696e9e629024752107dc3cf5/pendulum-3.1.0-cp312-cp312-win_amd64.whl", hash = "sha256:42959341e843077c41d47420f28c3631de054abd64da83f9b956519b5c7a06a7", size = 260502, upload-time = "2025-04-19T14:01:39.814Z" }, + { url = "https://files.pythonhosted.org/packages/02/74/4332b5d6e34c63d4df8e8eab2249e74c05513b1477757463f7fdca99e9be/pendulum-3.1.0-cp312-cp312-win_arm64.whl", hash = "sha256:006758e2125da2e624493324dfd5d7d1b02b0c44bc39358e18bf0f66d0767f5f", size = 253089, upload-time 
= "2025-04-19T14:01:41.171Z" }, + { url = "https://files.pythonhosted.org/packages/6e/23/e98758924d1b3aac11a626268eabf7f3cf177e7837c28d47bf84c64532d0/pendulum-3.1.0-py3-none-any.whl", hash = "sha256:f9178c2a8e291758ade1e8dd6371b1d26d08371b4c7730a6e9a3ef8b16ebae0f", size = 111799, upload-time = "2025-04-19T14:02:34.739Z" }, +] + [[package]] name = "pexpect" version = "4.9.0" @@ -4138,6 +4362,69 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/5d/19/fd3ef348460c80af7bb4669ea7926651d1f95c23ff2df18b9d24bab4f3fa/pre_commit-4.5.1-py2.py3-none-any.whl", hash = "sha256:3b3afd891e97337708c1674210f8eba659b52a38ea5f822ff142d10786221f77", size = 226437, upload-time = "2025-12-16T21:14:32.409Z" }, ] +[[package]] +name = "prefect" +version = "3.2.7" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "aiosqlite" }, + { name = "alembic" }, + { name = "anyio" }, + { name = "apprise" }, + { name = "asgi-lifespan" }, + { name = "asyncpg" }, + { name = "cachetools" }, + { name = "click" }, + { name = "cloudpickle" }, + { name = "coolname" }, + { name = "cryptography" }, + { name = "dateparser" }, + { name = "docker" }, + { name = "exceptiongroup" }, + { name = "fastapi" }, + { name = "fsspec" }, + { name = "graphviz" }, + { name = "griffe" }, + { name = "httpcore" }, + { name = "httpx", extra = ["http2"] }, + { name = "humanize" }, + { name = "jinja2" }, + { name = "jinja2-humanize-extension" }, + { name = "jsonpatch" }, + { name = "jsonschema" }, + { name = "opentelemetry-api" }, + { name = "orjson" }, + { name = "packaging" }, + { name = "pathspec" }, + { name = "pendulum" }, + { name = "prometheus-client" }, + { name = "pydantic" }, + { name = "pydantic-core" }, + { name = "pydantic-extra-types" }, + { name = "pydantic-settings" }, + { name = "python-dateutil" }, + { name = "python-slugify" }, + { name = "python-socks" }, + { name = "pytz" }, + { name = "pyyaml" }, + { name = "readchar" }, + { name = "rfc3339-validator" }, + { name = 
"rich" }, + { name = "ruamel-yaml" }, + { name = "sniffio" }, + { name = "sqlalchemy", extra = ["asyncio"] }, + { name = "toml" }, + { name = "typer" }, + { name = "typing-extensions" }, + { name = "ujson" }, + { name = "uvicorn" }, + { name = "websockets" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/18/9e/7009c09d4e6a09ff4ad35afa93d5dba901635dd2c8f8e09e1eca597b884d/prefect-3.2.7.tar.gz", hash = "sha256:e24c06acabc38a1062e8672f71fea8a61348e8619df8ce2a4749e7bbf0142b54", size = 5780087, upload-time = "2025-02-21T19:40:28.984Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/f1/70/13eb4bb6a9224d07fb7893d2d970488a607445d9868aa2acaa034148cdd0/prefect-3.2.7-py3-none-any.whl", hash = "sha256:7c91097e1de68fd6bd6f22bdf883866b757a6b661437b3ba68727936187f1842", size = 6264722, upload-time = "2025-02-21T19:40:25.936Z" }, +] + [[package]] name = "prefixcommons" version = "0.1.12" @@ -4463,6 +4750,33 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/f7/07/34573da085946b6a313d7c42f82f16e8920bfd730665de2d11c0c37a74b5/pydantic_core-2.41.5-graalpy312-graalpy250_312_native-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:76d0819de158cd855d1cbb8fcafdf6f5cf1eb8e470abe056d5d161106e38062b", size = 2139017, upload-time = "2025-11-04T13:42:59.471Z" }, ] +[[package]] +name = "pydantic-extra-types" +version = "2.10.6" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "pydantic" }, + { name = "typing-extensions" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/3a/10/fb64987804cde41bcc39d9cd757cd5f2bb5d97b389d81aa70238b14b8a7e/pydantic_extra_types-2.10.6.tar.gz", hash = "sha256:c63d70bf684366e6bbe1f4ee3957952ebe6973d41e7802aea0b770d06b116aeb", size = 141858, upload-time = "2025-10-08T13:47:49.483Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/93/04/5c918669096da8d1c9ec7bb716bd72e755526103a61bc5e76a3e4fb23b53/pydantic_extra_types-2.10.6-py3-none-any.whl", hash = 
"sha256:6106c448316d30abf721b5b9fecc65e983ef2614399a24142d689c7546cc246a", size = 40949, upload-time = "2025-10-08T13:47:48.268Z" }, +] + +[[package]] +name = "pydantic-settings" +version = "2.12.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "pydantic" }, + { name = "python-dotenv" }, + { name = "typing-inspection" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/43/4b/ac7e0aae12027748076d72a8764ff1c9d82ca75a7a52622e67ed3f765c54/pydantic_settings-2.12.0.tar.gz", hash = "sha256:005538ef951e3c2a68e1c08b292b5f2e71490def8589d4221b95dab00dafcfd0", size = 194184, upload-time = "2025-11-10T14:25:47.013Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/c1/60/5d4751ba3f4a40a6891f24eec885f51afd78d208498268c734e256fb13c4/pydantic_settings-2.12.0-py3-none-any.whl", hash = "sha256:fddb9fd99a5b18da837b29710391e945b1e30c135477f484084ee513adb93809", size = 51880, upload-time = "2025-11-10T14:25:45.546Z" }, +] + [[package]] name = "pydeseq2" version = "0.5.3" @@ -4824,6 +5138,27 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/51/e5/fecf13f06e5e5f67e8837d777d1bc43fac0ed2b77a676804df5c34744727/python_json_logger-4.0.0-py3-none-any.whl", hash = "sha256:af09c9daf6a813aa4cc7180395f50f2a9e5fa056034c9953aec92e381c5ba1e2", size = 15548, upload-time = "2025-10-06T04:15:17.553Z" }, ] +[[package]] +name = "python-slugify" +version = "8.0.4" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "text-unidecode" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/87/c7/5e1547c44e31da50a460df93af11a535ace568ef89d7a811069ead340c4a/python-slugify-8.0.4.tar.gz", hash = "sha256:59202371d1d05b54a9e7720c5e038f928f45daaffe41dd10822f3907b937c856", size = 10921, upload-time = "2024-02-08T18:32:45.488Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/a4/62/02da182e544a51a5c3ccf4b03ab79df279f9c60c5e82d5e8bec7ca26ac11/python_slugify-8.0.4-py2.py3-none-any.whl", hash = 
"sha256:276540b79961052b66b7d116620b36518847f52d5fd9e3a70164fc8c50faa6b8", size = 10051, upload-time = "2024-02-08T18:32:43.911Z" }, +] + +[[package]] +name = "python-socks" +version = "2.8.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/6c/07/cfdd6a846ac859e513b4e68bb6c669a90a74d89d8d405516fba7fc9c6f0c/python_socks-2.8.0.tar.gz", hash = "sha256:340f82778b20a290bdd538ee47492978d603dff7826aaf2ce362d21ad9ee6f1b", size = 273130, upload-time = "2025-12-09T12:17:05.433Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/13/10/e2b575faa32d1d32e5e6041fc64794fa9f09526852a06b25353b66f52cae/python_socks-2.8.0-py3-none-any.whl", hash = "sha256:57c24b416569ccea493a101d38b0c82ed54be603aa50b6afbe64c46e4a4e4315", size = 55075, upload-time = "2025-12-09T12:17:03.269Z" }, +] + [[package]] name = "pytz" version = "2025.2" @@ -4943,6 +5278,15 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/5f/97/d8a785d2c7131c731c90cb0e65af9400081af4380bea4ec04868dc21aa92/rdflib_shim-1.0.3-py3-none-any.whl", hash = "sha256:7a853e7750ef1e9bf4e35dea27d54e02d4ed087de5a9e0c329c4a6d82d647081", size = 5190, upload-time = "2021-12-21T16:31:05.719Z" }, ] +[[package]] +name = "readchar" +version = "4.2.1" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/dd/f8/8657b8cbb4ebeabfbdf991ac40eca8a1d1bd012011bd44ad1ed10f5cb494/readchar-4.2.1.tar.gz", hash = "sha256:91ce3faf07688de14d800592951e5575e9c7a3213738ed01d394dcc949b79adb", size = 9685, upload-time = "2024-11-04T18:28:07.757Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/a9/10/e4b1e0e5b6b6745c8098c275b69bc9d73e9542d5c7da4f137542b499ed44/readchar-4.2.1-py3-none-any.whl", hash = "sha256:a769305cd3994bb5fa2764aa4073452dc105a4ec39068ffe6efd3c20c60acc77", size = 9350, upload-time = "2024-11-04T18:28:02.859Z" }, +] + [[package]] name = "referencing" version = "0.37.0" @@ -5080,15 +5424,15 @@ 
wheels = [ [[package]] name = "rich" -version = "14.2.0" +version = "13.9.4" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "markdown-it-py" }, { name = "pygments" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/fb/d2/8920e102050a0de7bfabeb4c4614a49248cf8d5d7a8d01885fbb24dc767a/rich-14.2.0.tar.gz", hash = "sha256:73ff50c7c0c1c77c8243079283f4edb376f0f6442433aecb8ce7e6d0b92d1fe4", size = 219990, upload-time = "2025-10-09T14:16:53.064Z" } +sdist = { url = "https://files.pythonhosted.org/packages/ab/3a/0316b28d0761c6734d6bc14e770d85506c986c85ffb239e688eeaab2c2bc/rich-13.9.4.tar.gz", hash = "sha256:439594978a49a09530cff7ebc4b5c7103ef57baf48d5ea3184f21d9a2befa098", size = 223149, upload-time = "2024-11-01T16:43:57.873Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/25/7a/b0178788f8dc6cafce37a212c99565fa1fe7872c70c6c9c1e1a372d9d88f/rich-14.2.0-py3-none-any.whl", hash = "sha256:76bc51fe2e57d2b1be1f96c524b890b816e334ab4c1e45888799bfaab0021edd", size = 243393, upload-time = "2025-10-09T14:16:51.245Z" }, + { url = "https://files.pythonhosted.org/packages/19/71/39c7c0d87f8d4e6c020a393182060eaefeeae6c01dab6a84ec346f2567df/rich-13.9.4-py3-none-any.whl", hash = "sha256:6049d5e6ec054bf2779ab3358186963bac2ea89175919d699e378b99738c2a90", size = 242424, upload-time = "2024-11-01T16:43:55.817Z" }, ] [[package]] @@ -5169,6 +5513,36 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/64/8d/0133e4eb4beed9e425d9a98ed6e081a55d195481b7632472be1af08d2f6b/rsa-4.9.1-py3-none-any.whl", hash = "sha256:68635866661c6836b8d39430f97a996acbd61bfa49406748ea243539fe239762", size = 34696, upload-time = "2025-04-16T09:51:17.142Z" }, ] +[[package]] +name = "ruamel-yaml" +version = "0.18.17" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "ruamel-yaml-clib", marker = "platform_python_implementation == 'CPython'" }, +] +sdist = { url = 
"https://files.pythonhosted.org/packages/3a/2b/7a1f1ebcd6b3f14febdc003e658778d81e76b40df2267904ee6b13f0c5c6/ruamel_yaml-0.18.17.tar.gz", hash = "sha256:9091cd6e2d93a3a4b157ddb8fabf348c3de7f1fb1381346d985b6b247dcd8d3c", size = 149602, upload-time = "2025-12-17T20:02:55.757Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/af/fe/b6045c782f1fd1ae317d2a6ca1884857ce5c20f59befe6ab25a8603c43a7/ruamel_yaml-0.18.17-py3-none-any.whl", hash = "sha256:9c8ba9eb3e793efdf924b60d521820869d5bf0cb9c6f1b82d82de8295e290b9d", size = 121594, upload-time = "2025-12-17T20:02:07.657Z" }, +] + +[[package]] +name = "ruamel-yaml-clib" +version = "0.2.15" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/ea/97/60fda20e2fb54b83a61ae14648b0817c8f5d84a3821e40bfbdae1437026a/ruamel_yaml_clib-0.2.15.tar.gz", hash = "sha256:46e4cc8c43ef6a94885f72512094e482114a8a706d3c555a34ed4b0d20200600", size = 225794, upload-time = "2025-11-16T16:12:59.761Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/72/4b/5fde11a0722d676e469d3d6f78c6a17591b9c7e0072ca359801c4bd17eee/ruamel_yaml_clib-0.2.15-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:cb15a2e2a90c8475df45c0949793af1ff413acfb0a716b8b94e488ea95ce7cff", size = 149088, upload-time = "2025-11-16T16:13:22.836Z" }, + { url = "https://files.pythonhosted.org/packages/85/82/4d08ac65ecf0ef3b046421985e66301a242804eb9a62c93ca3437dc94ee0/ruamel_yaml_clib-0.2.15-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:64da03cbe93c1e91af133f5bec37fd24d0d4ba2418eaf970d7166b0a26a148a2", size = 134553, upload-time = "2025-11-16T16:13:24.151Z" }, + { url = "https://files.pythonhosted.org/packages/b9/cb/22366d68b280e281a932403b76da7a988108287adff2bfa5ce881200107a/ruamel_yaml_clib-0.2.15-cp312-cp312-manylinux1_i686.manylinux_2_28_i686.manylinux_2_5_i686.whl", hash = "sha256:f6d3655e95a80325b84c4e14c080b2470fe4f33b6846f288379ce36154993fb1", size = 737468, upload-time = 
"2025-11-16T20:22:47.335Z" }, + { url = "https://files.pythonhosted.org/packages/71/73/81230babf8c9e33770d43ed9056f603f6f5f9665aea4177a2c30ae48e3f3/ruamel_yaml_clib-0.2.15-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:71845d377c7a47afc6592aacfea738cc8a7e876d586dfba814501d8c53c1ba60", size = 753349, upload-time = "2025-11-16T16:13:26.269Z" }, + { url = "https://files.pythonhosted.org/packages/61/62/150c841f24cda9e30f588ef396ed83f64cfdc13b92d2f925bb96df337ba9/ruamel_yaml_clib-0.2.15-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:11e5499db1ccbc7f4b41f0565e4f799d863ea720e01d3e99fa0b7b5fcd7802c9", size = 788211, upload-time = "2025-11-16T16:13:27.441Z" }, + { url = "https://files.pythonhosted.org/packages/30/93/e79bd9cbecc3267499d9ead919bd61f7ddf55d793fb5ef2b1d7d92444f35/ruamel_yaml_clib-0.2.15-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:4b293a37dc97e2b1e8a1aec62792d1e52027087c8eea4fc7b5abd2bdafdd6642", size = 743203, upload-time = "2025-11-16T16:13:28.671Z" }, + { url = "https://files.pythonhosted.org/packages/8d/06/1eb640065c3a27ce92d76157f8efddb184bd484ed2639b712396a20d6dce/ruamel_yaml_clib-0.2.15-cp312-cp312-musllinux_1_2_i686.whl", hash = "sha256:512571ad41bba04eac7268fe33f7f4742210ca26a81fe0c75357fa682636c690", size = 747292, upload-time = "2025-11-16T20:22:48.584Z" }, + { url = "https://files.pythonhosted.org/packages/a5/21/ee353e882350beab65fcc47a91b6bdc512cace4358ee327af2962892ff16/ruamel_yaml_clib-0.2.15-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:e5e9f630c73a490b758bf14d859a39f375e6999aea5ddd2e2e9da89b9953486a", size = 771624, upload-time = "2025-11-16T16:13:29.853Z" }, + { url = "https://files.pythonhosted.org/packages/57/34/cc1b94057aa867c963ecf9ea92ac59198ec2ee3a8d22a126af0b4d4be712/ruamel_yaml_clib-0.2.15-cp312-cp312-win32.whl", hash = "sha256:f4421ab780c37210a07d138e56dd4b51f8642187cdfb433eb687fe8c11de0144", size = 100342, 
upload-time = "2025-11-16T16:13:31.067Z" }, + { url = "https://files.pythonhosted.org/packages/b3/e5/8925a4208f131b218f9a7e459c0d6fcac8324ae35da269cb437894576366/ruamel_yaml_clib-0.2.15-cp312-cp312-win_amd64.whl", hash = "sha256:2b216904750889133d9222b7b873c199d48ecbb12912aca78970f84a5aa1a4bc", size = 119013, upload-time = "2025-11-16T16:13:32.164Z" }, +] + [[package]] name = "ruff" version = "0.14.10" @@ -5725,6 +6099,11 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/bf/e1/3ccb13c643399d22289c6a9786c1a91e3dcbb68bce4beb44926ac2c557bf/sqlalchemy-2.0.45-py3-none-any.whl", hash = "sha256:5225a288e4c8cc2308dbdd874edad6e7d0fd38eac1e9e5f23503425c8eee20d0", size = 1936672, upload-time = "2025-12-09T21:54:52.608Z" }, ] +[package.optional-dependencies] +asyncio = [ + { name = "greenlet" }, +] + [[package]] name = "sqlalchemy-utils" version = "0.38.3" @@ -5897,6 +6276,15 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/6a/9e/2064975477fdc887e47ad42157e214526dcad8f317a948dee17e1659a62f/terminado-0.18.1-py3-none-any.whl", hash = "sha256:a4468e1b37bb318f8a86514f65814e1afc977cf29b3992a4500d9dd305dcceb0", size = 14154, upload-time = "2024-03-12T14:34:36.569Z" }, ] +[[package]] +name = "text-unidecode" +version = "1.3" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/ab/e2/e9a00f0ccb71718418230718b3d900e71a5d16e701a3dae079a21e9cd8f8/text-unidecode-1.3.tar.gz", hash = "sha256:bad6603bb14d279193107714b288be206cac565dfa49aa5b105294dd5c4aab93", size = 76885, upload-time = "2019-08-30T21:36:45.405Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/a6/a5/c0b6468d3824fe3fde30dbb5e1f687b291608f9473681bbf7dabbf5a87d7/text_unidecode-1.3-py2.py3-none-any.whl", hash = "sha256:1311f10e8b895935241623731c2ba64f4c455287888b18189350b67134a822e8", size = 78154, upload-time = "2019-08-30T21:37:03.543Z" }, +] + [[package]] name = "texttable" version = "1.7.0" @@ -5994,6 +6382,15 @@ wheels = [ { 
url = "https://files.pythonhosted.org/packages/b3/46/e33a8c93907b631a99377ef4c5f817ab453d0b34f93529421f42ff559671/tokenizers-0.22.1-cp39-abi3-win_amd64.whl", hash = "sha256:65fd6e3fb11ca1e78a6a93602490f134d1fdeb13bcef99389d5102ea318ed138", size = 2674684, upload-time = "2025-09-19T09:49:24.953Z" }, ] +[[package]] +name = "toml" +version = "0.10.2" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/be/ba/1f744cdc819428fc6b5084ec34d9b30660f6f9daaf70eead706e3203ec3c/toml-0.10.2.tar.gz", hash = "sha256:b3bda1d108d5dd99f4a20d24d9c348e91c4db7ab1b749200bded2f839ccbe68f", size = 22253, upload-time = "2020-11-01T01:40:22.204Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/44/6f/7120676b6d73228c96e17f1f794d8ab046fc910d781c8d151120c3f1569e/toml-0.10.2-py2.py3-none-any.whl", hash = "sha256:806143ae5bfb6a3c6e736a764057db0e6a0e05e338b5630894a5f779cabb4f9b", size = 16588, upload-time = "2020-11-01T01:40:20.672Z" }, +] + [[package]] name = "tomli" version = "2.3.0" @@ -6127,7 +6524,7 @@ wheels = [ [[package]] name = "typer" -version = "0.20.1" +version = "0.15.4" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "click" }, @@ -6135,9 +6532,9 @@ dependencies = [ { name = "shellingham" }, { name = "typing-extensions" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/6d/c1/933d30fd7a123ed981e2a1eedafceab63cb379db0402e438a13bc51bbb15/typer-0.20.1.tar.gz", hash = "sha256:68585eb1b01203689c4199bc440d6be616f0851e9f0eb41e4a778845c5a0fd5b", size = 105968, upload-time = "2025-12-19T16:48:56.302Z" } +sdist = { url = "https://files.pythonhosted.org/packages/6c/89/c527e6c848739be8ceb5c44eb8208c52ea3515c6cf6406aa61932887bf58/typer-0.15.4.tar.gz", hash = "sha256:89507b104f9b6a0730354f27c39fae5b63ccd0c95b1ce1f1a6ba0cfd329997c3", size = 101559, upload-time = "2025-05-14T16:34:57.704Z" } wheels = [ - { url = 
"https://files.pythonhosted.org/packages/c8/52/1f2df7e7d1be3d65ddc2936d820d4a3d9777a54f4204f5ca46b8513eff77/typer-0.20.1-py3-none-any.whl", hash = "sha256:4b3bde918a67c8e03d861aa02deca90a95bbac572e71b1b9be56ff49affdb5a8", size = 47381, upload-time = "2025-12-19T16:48:53.679Z" }, + { url = "https://files.pythonhosted.org/packages/c9/62/d4ba7afe2096d5659ec3db8b15d8665bdcb92a3c6ff0b95e99895b335a9c/typer-0.15.4-py3-none-any.whl", hash = "sha256:eb0651654dcdea706780c466cf06d8f174405a659ffff8f163cfbfee98c0e173", size = 45258, upload-time = "2025-05-14T16:34:55.583Z" }, ] [[package]] @@ -6195,6 +6592,25 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/c2/14/e2a54fabd4f08cd7af1c07030603c3356b74da07f7cc056e600436edfa17/tzlocal-5.3.1-py3-none-any.whl", hash = "sha256:eb1a66c3ef5847adf7a834f1be0800581b683b5608e74f86ecbcef8ab91bb85d", size = 18026, upload-time = "2025-03-05T21:17:39.857Z" }, ] +[[package]] +name = "ujson" +version = "5.11.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/43/d9/3f17e3c5773fb4941c68d9a37a47b1a79c9649d6c56aefbed87cc409d18a/ujson-5.11.0.tar.gz", hash = "sha256:e204ae6f909f099ba6b6b942131cee359ddda2b6e4ea39c12eb8b991fe2010e0", size = 7156583, upload-time = "2025-08-20T11:57:02.452Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/b9/ef/a9cb1fce38f699123ff012161599fb9f2ff3f8d482b4b18c43a2dc35073f/ujson-5.11.0-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:7895f0d2d53bd6aea11743bd56e3cb82d729980636cd0ed9b89418bf66591702", size = 55434, upload-time = "2025-08-20T11:55:34.987Z" }, + { url = "https://files.pythonhosted.org/packages/b1/05/dba51a00eb30bd947791b173766cbed3492269c150a7771d2750000c965f/ujson-5.11.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:12b5e7e22a1fe01058000d1b317d3b65cc3daf61bd2ea7a2b76721fe160fa74d", size = 53190, upload-time = "2025-08-20T11:55:36.384Z" }, + { url = 
"https://files.pythonhosted.org/packages/03/3c/fd11a224f73fbffa299fb9644e425f38b38b30231f7923a088dd513aabb4/ujson-5.11.0-cp312-cp312-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:0180a480a7d099082501cad1fe85252e4d4bf926b40960fb3d9e87a3a6fbbc80", size = 57600, upload-time = "2025-08-20T11:55:37.692Z" }, + { url = "https://files.pythonhosted.org/packages/55/b9/405103cae24899df688a3431c776e00528bd4799e7d68820e7ebcf824f92/ujson-5.11.0-cp312-cp312-manylinux_2_24_i686.manylinux_2_28_i686.whl", hash = "sha256:fa79fdb47701942c2132a9dd2297a1a85941d966d8c87bfd9e29b0cf423f26cc", size = 59791, upload-time = "2025-08-20T11:55:38.877Z" }, + { url = "https://files.pythonhosted.org/packages/17/7b/2dcbc2bbfdbf68f2368fb21ab0f6735e872290bb604c75f6e06b81edcb3f/ujson-5.11.0-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:8254e858437c00f17cb72e7a644fc42dad0ebb21ea981b71df6e84b1072aaa7c", size = 57356, upload-time = "2025-08-20T11:55:40.036Z" }, + { url = "https://files.pythonhosted.org/packages/d1/71/fea2ca18986a366c750767b694430d5ded6b20b6985fddca72f74af38a4c/ujson-5.11.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:1aa8a2ab482f09f6c10fba37112af5f957689a79ea598399c85009f2f29898b5", size = 1036313, upload-time = "2025-08-20T11:55:41.408Z" }, + { url = "https://files.pythonhosted.org/packages/a3/bb/d4220bd7532eac6288d8115db51710fa2d7d271250797b0bfba9f1e755af/ujson-5.11.0-cp312-cp312-musllinux_1_2_i686.whl", hash = "sha256:a638425d3c6eed0318df663df44480f4a40dc87cc7c6da44d221418312f6413b", size = 1195782, upload-time = "2025-08-20T11:55:43.357Z" }, + { url = "https://files.pythonhosted.org/packages/80/47/226e540aa38878ce1194454385701d82df538ccb5ff8db2cf1641dde849a/ujson-5.11.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:7e3cff632c1d78023b15f7e3a81c3745cd3f94c044d1e8fa8efbd6b161997bbc", size = 1088817, upload-time = "2025-08-20T11:55:45.262Z" }, + { url = 
"https://files.pythonhosted.org/packages/7e/81/546042f0b23c9040d61d46ea5ca76f0cc5e0d399180ddfb2ae976ebff5b5/ujson-5.11.0-cp312-cp312-win32.whl", hash = "sha256:be6b0eaf92cae8cdee4d4c9e074bde43ef1c590ed5ba037ea26c9632fb479c88", size = 39757, upload-time = "2025-08-20T11:55:46.522Z" }, + { url = "https://files.pythonhosted.org/packages/44/1b/27c05dc8c9728f44875d74b5bfa948ce91f6c33349232619279f35c6e817/ujson-5.11.0-cp312-cp312-win_amd64.whl", hash = "sha256:b7b136cc6abc7619124fd897ef75f8e63105298b5ca9bdf43ebd0e1fa0ee105f", size = 43859, upload-time = "2025-08-20T11:55:47.987Z" }, + { url = "https://files.pythonhosted.org/packages/22/2d/37b6557c97c3409c202c838aa9c960ca3896843b4295c4b7bb2bbd260664/ujson-5.11.0-cp312-cp312-win_arm64.whl", hash = "sha256:6cd2df62f24c506a0ba322d5e4fe4466d47a9467b57e881ee15a31f7ecf68ff6", size = 38361, upload-time = "2025-08-20T11:55:49.122Z" }, +] + [[package]] name = "umap-learn" version = "0.5.9.post2" From def34e76500db8c3e26c93681548034d649e663b Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Mon, 22 Dec 2025 09:44:17 -0800 Subject: [PATCH 45/87] Remove parallelization from Prefect workflow to reduce memory usage MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit All cell types are now processed sequentially: - Steps 8-11 run one cell type at a time instead of in parallel - analyze_single_cell_type also runs tasks sequentially - Restored n_cpus=8 for DE since we're no longer parallelizing 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 --- .../rnaseq/prefect_workflow/flows.py | 144 +++++++++--------- 1 file changed, 68 insertions(+), 76 deletions(-) diff --git a/src/BetterCodeBetterScience/rnaseq/prefect_workflow/flows.py b/src/BetterCodeBetterScience/rnaseq/prefect_workflow/flows.py index a0b06f0..7f54095 100644 --- a/src/BetterCodeBetterScience/rnaseq/prefect_workflow/flows.py +++ b/src/BetterCodeBetterScience/rnaseq/prefect_workflow/flows.py 
@@ -37,8 +37,7 @@ def run_workflow( ) -> dict[str, Any]: """Run the complete immune aging scRNA-seq workflow with Prefect. - Steps 1-7 run sequentially (shared preprocessing). - Steps 8-11 run in parallel for each cell type. + All steps run sequentially to minimize memory usage. Parameters ---------- @@ -204,10 +203,10 @@ def run_workflow( ) # ========================================================================= - # STEPS 8-11: Per-Cell-Type Analysis (Parallel) + # STEPS 8-11: Per-Cell-Type Analysis (Sequential) # ========================================================================= logger.info("=" * 60) - logger.info("STEPS 8-11: PER-CELL-TYPE ANALYSIS") + logger.info("STEPS 8-11: PER-CELL-TYPE ANALYSIS (SEQUENTIAL)") logger.info("=" * 60) # Get all cell types from pseudobulk @@ -227,82 +226,75 @@ def run_workflow( f"{skipped_cell_types}" ) - logger.info(f"Analyzing {len(valid_cell_types)} cell types") - - # Step 8: Submit all DE tasks in parallel - logger.info("Submitting differential expression tasks...") - de_futures = {} - for cell_type in valid_cell_types: - de_futures[cell_type] = differential_expression_task.submit( - pb_adata=pb_adata, - cell_type=cell_type, - var_to_feature=var_to_feature, - output_dir=results_dir, - design_factors=["age_scaled", "sex"], - n_cpus=4, # Reduced per-task to allow parallelism - ) - - # Steps 9-11: Submit pathway/enrichment/prediction tasks as DE completes - gsea_futures = {} - enrichr_futures = {} - prediction_futures = {} - - for cell_type in valid_cell_types: - # Wait for DE to complete for this cell type - de_result = de_futures[cell_type].result() - - # Get metadata for this cell type (for predictive modeling) - pb_adata_ct = pb_adata[pb_adata.obs["cell_type"] == cell_type].copy() - pb_adata_ct.obs["age"] = ( - pb_adata_ct.obs["development_stage"] - .str.extract(r"(\d+)-year-old")[0] - .astype(float) - ) - metadata_ct = pb_adata_ct.obs.copy() - - # Submit steps 9, 10, 11 in parallel for this cell type - 
gsea_futures[cell_type] = pathway_analysis_task.submit( - de_results=de_result["de_results"], - cell_type=cell_type, - output_dir=results_dir, - gene_sets=["MSigDB_Hallmark_2020"], - n_top=10, - ) - - enrichr_futures[cell_type] = overrepresentation_task.submit( - de_results=de_result["de_results"], - cell_type=cell_type, - output_dir=results_dir, - gene_sets=["MSigDB_Hallmark_2020"], - padj_threshold=0.05, - n_top=10, - ) + logger.info(f"Analyzing {len(valid_cell_types)} cell types sequentially") - prediction_futures[cell_type] = predictive_modeling_task.submit( - counts_df=de_result["counts_df"], - metadata=metadata_ct, - cell_type=cell_type, - output_dir=results_dir, - n_splits=5, - ) - - # Collect all results - logger.info("Collecting results...") + # Initialize results all_results = { "adata": adata, "pb_adata": pb_adata, "per_cell_type": {}, } - for cell_type in valid_cell_types: + # Process each cell type sequentially + for i, cell_type in enumerate(valid_cell_types): + logger.info(f"\n[{i + 1}/{len(valid_cell_types)}] Processing: {cell_type}") + try: + # Step 8: Differential Expression + de_result = differential_expression_task( + pb_adata=pb_adata, + cell_type=cell_type, + var_to_feature=var_to_feature, + output_dir=results_dir, + design_factors=["age_scaled", "sex"], + n_cpus=8, + ) + + # Get metadata for this cell type (for predictive modeling) + pb_adata_ct = pb_adata[pb_adata.obs["cell_type"] == cell_type].copy() + pb_adata_ct.obs["age"] = ( + pb_adata_ct.obs["development_stage"] + .str.extract(r"(\d+)-year-old")[0] + .astype(float) + ) + metadata_ct = pb_adata_ct.obs.copy() + + # Step 9: Pathway Analysis (GSEA) + gsea_result = pathway_analysis_task( + de_results=de_result["de_results"], + cell_type=cell_type, + output_dir=results_dir, + gene_sets=["MSigDB_Hallmark_2020"], + n_top=10, + ) + + # Step 10: Overrepresentation Analysis (Enrichr) + enrichr_result = overrepresentation_task( + de_results=de_result["de_results"], + cell_type=cell_type, + 
output_dir=results_dir, + gene_sets=["MSigDB_Hallmark_2020"], + padj_threshold=0.05, + n_top=10, + ) + + # Step 11: Predictive Modeling + prediction_result = predictive_modeling_task( + counts_df=de_result["counts_df"], + metadata=metadata_ct, + cell_type=cell_type, + output_dir=results_dir, + n_splits=5, + ) + all_results["per_cell_type"][cell_type] = { - "de": de_futures[cell_type].result(), - "gsea": gsea_futures[cell_type].result(), - "enrichment": enrichr_futures[cell_type].result(), - "prediction": prediction_futures[cell_type].result(), + "de": de_result, + "gsea": gsea_result, + "enrichment": enrichr_result, + "prediction": prediction_result, } logger.info(f"Completed analysis for: {cell_type}") + except Exception as e: logger.error(f"Failed analysis for {cell_type}: {e}") all_results["per_cell_type"][cell_type] = {"error": str(e)} @@ -401,20 +393,20 @@ def analyze_single_cell_type( ) metadata_ct = pb_adata_ct.obs.copy() - # Run parallel tasks - gsea_future = pathway_analysis_task.submit( + # Run tasks sequentially + gsea_result = pathway_analysis_task( de_results=de_result["de_results"], cell_type=cell_type, output_dir=results_dir, ) - enrichr_future = overrepresentation_task.submit( + enrichr_result = overrepresentation_task( de_results=de_result["de_results"], cell_type=cell_type, output_dir=results_dir, ) - prediction_future = predictive_modeling_task.submit( + prediction_result = predictive_modeling_task( counts_df=de_result["counts_df"], metadata=metadata_ct, cell_type=cell_type, @@ -424,7 +416,7 @@ def analyze_single_cell_type( return { "cell_type": cell_type, "de": de_result, - "gsea": gsea_future.result(), - "enrichment": enrichr_future.result(), - "prediction": prediction_future.result(), + "gsea": gsea_result, + "enrichment": enrichr_result, + "prediction": prediction_result, } From 13e70289507af2823e5b061460bd947ba6d67f88 Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Mon, 22 Dec 2025 14:30:01 -0800 Subject: [PATCH 46/87] Pin numba<0.63 
to fix pynndescent compatibility issue MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit numba 0.63+ has stricter type checking that causes TypingError in pynndescent's nn_descent function when using print statements in numba-compiled code. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 --- pyproject.toml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pyproject.toml b/pyproject.toml index 2874068..ca205b3 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -29,7 +29,7 @@ dependencies = [ "icecream>=2.1.4", "python-dotenv>=1.0.1", "pyyaml>=6.0.2", - "numba>=0.61.0", + "numba>=0.61,<0.63", "codespell>=2.4.1", "tomli>=2.2.1", "pre-commit>=4.2.0", From 6fe3ec6cf453c540ae87730c1c405428e873e34c Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Mon, 22 Dec 2025 14:30:56 -0800 Subject: [PATCH 47/87] Update uv.lock for numba version constraint MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 --- uv.lock | 28 +++++++++++++++------------- 1 file changed, 15 insertions(+), 13 deletions(-) diff --git a/uv.lock b/uv.lock index 5ab0f7a..37de7fe 100644 --- a/uv.lock +++ b/uv.lock @@ -566,7 +566,7 @@ requires-dist = [ { name = "networkx", specifier = ">=3.4.2" }, { name = "nibabel", specifier = ">=5.3.2" }, { name = "nilearn", specifier = ">=0.12.1" }, - { name = "numba", specifier = ">=0.61.0" }, + { name = "numba", specifier = ">=0.61,<0.63" }, { name = "numpy", specifier = ">=2.1.2" }, { name = "ols-client", specifier = ">=0.2.1" }, { name = "openai", specifier = ">=1.51.2" }, @@ -3056,14 +3056,15 @@ wheels = [ [[package]] name = "llvmlite" -version = "0.46.0" +version = "0.45.1" source = { registry = "https://pypi.org/simple" } -sdist = { url = 
"https://files.pythonhosted.org/packages/74/cd/08ae687ba099c7e3d21fe2ea536500563ef1943c5105bf6ab4ee3829f68e/llvmlite-0.46.0.tar.gz", hash = "sha256:227c9fd6d09dce2783c18b754b7cd9d9b3b3515210c46acc2d3c5badd9870ceb", size = 193456, upload-time = "2025-12-08T18:15:36.295Z" } +sdist = { url = "https://files.pythonhosted.org/packages/99/8d/5baf1cef7f9c084fb35a8afbde88074f0d6a727bc63ef764fe0e7543ba40/llvmlite-0.45.1.tar.gz", hash = "sha256:09430bb9d0bb58fc45a45a57c7eae912850bedc095cd0810a57de109c69e1c32", size = 185600, upload-time = "2025-10-01T17:59:52.046Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/2b/f8/4db016a5e547d4e054ff2f3b99203d63a497465f81ab78ec8eb2ff7b2304/llvmlite-0.46.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:6b9588ad4c63b4f0175a3984b85494f0c927c6b001e3a246a3a7fb3920d9a137", size = 37232767, upload-time = "2025-12-08T18:15:00.737Z" }, - { url = "https://files.pythonhosted.org/packages/aa/85/4890a7c14b4fa54400945cb52ac3cd88545bbdb973c440f98ca41591cdc5/llvmlite-0.46.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:3535bd2bb6a2d7ae4012681ac228e5132cdb75fefb1bcb24e33f2f3e0c865ed4", size = 56275176, upload-time = "2025-12-08T18:15:03.936Z" }, - { url = "https://files.pythonhosted.org/packages/6a/07/3d31d39c1a1a08cd5337e78299fca77e6aebc07c059fbd0033e3edfab45c/llvmlite-0.46.0-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:4cbfd366e60ff87ea6cc62f50bc4cd800ebb13ed4c149466f50cf2163a473d1e", size = 55128630, upload-time = "2025-12-08T18:15:07.196Z" }, - { url = "https://files.pythonhosted.org/packages/2a/6b/d139535d7590a1bba1ceb68751bef22fadaa5b815bbdf0e858e3875726b2/llvmlite-0.46.0-cp312-cp312-win_amd64.whl", hash = "sha256:398b39db462c39563a97b912d4f2866cd37cba60537975a09679b28fbbc0fb38", size = 38138940, upload-time = "2025-12-08T18:15:10.162Z" }, + { url = 
"https://files.pythonhosted.org/packages/e2/7c/82cbd5c656e8991bcc110c69d05913be2229302a92acb96109e166ae31fb/llvmlite-0.45.1-cp312-cp312-macosx_10_15_x86_64.whl", hash = "sha256:28e763aba92fe9c72296911e040231d486447c01d4f90027c8e893d89d49b20e", size = 43043524, upload-time = "2025-10-01T18:03:30.666Z" }, + { url = "https://files.pythonhosted.org/packages/9d/bc/5314005bb2c7ee9f33102c6456c18cc81745d7055155d1218f1624463774/llvmlite-0.45.1-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:1a53f4b74ee9fd30cb3d27d904dadece67a7575198bd80e687ee76474620735f", size = 37253123, upload-time = "2025-10-01T18:04:18.177Z" }, + { url = "https://files.pythonhosted.org/packages/96/76/0f7154952f037cb320b83e1c952ec4a19d5d689cf7d27cb8a26887d7bbc1/llvmlite-0.45.1-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:5b3796b1b1e1c14dcae34285d2f4ea488402fbd2c400ccf7137603ca3800864f", size = 56288211, upload-time = "2025-10-01T18:01:24.079Z" }, + { url = "https://files.pythonhosted.org/packages/00/b1/0b581942be2683ceb6862d558979e87387e14ad65a1e4db0e7dd671fa315/llvmlite-0.45.1-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:779e2f2ceefef0f4368548685f0b4adde34e5f4b457e90391f570a10b348d433", size = 55140958, upload-time = "2025-10-01T18:02:30.482Z" }, + { url = "https://files.pythonhosted.org/packages/33/94/9ba4ebcf4d541a325fd8098ddc073b663af75cc8b065b6059848f7d4dce7/llvmlite-0.45.1-cp312-cp312-win_amd64.whl", hash = "sha256:9e6c9949baf25d9aa9cd7cf0f6d011b9ca660dd17f5ba2b23bdbdb77cc86b116", size = 38132231, upload-time = "2025-10-01T18:05:03.664Z" }, ] [[package]] @@ -3682,18 +3683,19 @@ wheels = [ [[package]] name = "numba" -version = "0.63.1" +version = "0.62.1" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "llvmlite" }, { name = "numpy" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/dc/60/0145d479b2209bd8fdae5f44201eceb8ce5a23e0ed54c71f57db24618665/numba-0.63.1.tar.gz", hash = 
"sha256:b320aa675d0e3b17b40364935ea52a7b1c670c9037c39cf92c49502a75902f4b", size = 2761666, upload-time = "2025-12-10T02:57:39.002Z" } +sdist = { url = "https://files.pythonhosted.org/packages/a3/20/33dbdbfe60e5fd8e3dbfde299d106279a33d9f8308346022316781368591/numba-0.62.1.tar.gz", hash = "sha256:7b774242aa890e34c21200a1fc62e5b5757d5286267e71103257f4e2af0d5161", size = 2749817, upload-time = "2025-09-29T10:46:31.551Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/14/9c/c0974cd3d00ff70d30e8ff90522ba5fbb2bcee168a867d2321d8d0457676/numba-0.63.1-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:2819cd52afa5d8d04e057bdfd54367575105f8829350d8fb5e4066fb7591cc71", size = 2680981, upload-time = "2025-12-10T02:57:17.579Z" }, - { url = "https://files.pythonhosted.org/packages/cb/70/ea2bc45205f206b7a24ee68a159f5097c9ca7e6466806e7c213587e0c2b1/numba-0.63.1-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:5cfd45dbd3d409e713b1ccfdc2ee72ca82006860254429f4ef01867fdba5845f", size = 3801656, upload-time = "2025-12-10T02:57:19.106Z" }, - { url = "https://files.pythonhosted.org/packages/0d/82/4f4ba4fd0f99825cbf3cdefd682ca3678be1702b63362011de6e5f71f831/numba-0.63.1-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:69a599df6976c03b7ecf15d05302696f79f7e6d10d620367407517943355bcb0", size = 3501857, upload-time = "2025-12-10T02:57:20.721Z" }, - { url = "https://files.pythonhosted.org/packages/af/fd/6540456efa90b5f6604a86ff50dabefb187e43557e9081adcad3be44f048/numba-0.63.1-cp312-cp312-win_amd64.whl", hash = "sha256:bbad8c63e4fc7eb3cdb2c2da52178e180419f7969f9a685f283b313a70b92af3", size = 2750282, upload-time = "2025-12-10T02:57:22.474Z" }, + { url = "https://files.pythonhosted.org/packages/5e/fa/30fa6873e9f821c0ae755915a3ca444e6ff8d6a7b6860b669a3d33377ac7/numba-0.62.1-cp312-cp312-macosx_10_15_x86_64.whl", hash = "sha256:1b743b32f8fa5fff22e19c2e906db2f0a340782caf024477b97801b918cf0494", size = 2685346, upload-time = 
"2025-09-29T10:43:43.677Z" }, + { url = "https://files.pythonhosted.org/packages/a9/d5/504ce8dc46e0dba2790c77e6b878ee65b60fe3e7d6d0006483ef6fde5a97/numba-0.62.1-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:90fa21b0142bcf08ad8e32a97d25d0b84b1e921bc9423f8dda07d3652860eef6", size = 2688139, upload-time = "2025-09-29T10:44:04.894Z" }, + { url = "https://files.pythonhosted.org/packages/50/5f/6a802741176c93f2ebe97ad90751894c7b0c922b52ba99a4395e79492205/numba-0.62.1-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:6ef84d0ac19f1bf80431347b6f4ce3c39b7ec13f48f233a48c01e2ec06ecbc59", size = 3796453, upload-time = "2025-09-29T10:42:52.771Z" }, + { url = "https://files.pythonhosted.org/packages/7e/df/efd21527d25150c4544eccc9d0b7260a5dec4b7e98b5a581990e05a133c0/numba-0.62.1-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:9315cc5e441300e0ca07c828a627d92a6802bcbf27c5487f31ae73783c58da53", size = 3496451, upload-time = "2025-09-29T10:43:19.279Z" }, + { url = "https://files.pythonhosted.org/packages/80/44/79bfdab12a02796bf4f1841630355c82b5a69933b1d50eb15c7fa37dabe8/numba-0.62.1-cp312-cp312-win_amd64.whl", hash = "sha256:44e3aa6228039992f058f5ebfcfd372c83798e9464297bdad8cc79febcf7891e", size = 2745552, upload-time = "2025-09-29T10:44:26.399Z" }, ] [[package]] From 2154feb2babfb9c89b37064078f49d9f3fce676b Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Mon, 22 Dec 2025 14:38:42 -0800 Subject: [PATCH 48/87] Add execution logging to Prefect workflow MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Set up file-based logging that captures all Prefect logs to workflow/logs/ - Integrate with existing ExecutionLog system for structured JSON logs - Track timing, parameters, and cache status for each workflow step - Log files saved as: prefect_workflow_YYYYMMDD_HHMMSS.log - Execution logs saved as: execution_log_YYYYMMDD_HHMMSS.json - Both logs are saved in the finally block to 
capture errors 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 --- .../rnaseq/prefect_workflow/flows.py | 661 +++++++++++------- 1 file changed, 423 insertions(+), 238 deletions(-) diff --git a/src/BetterCodeBetterScience/rnaseq/prefect_workflow/flows.py b/src/BetterCodeBetterScience/rnaseq/prefect_workflow/flows.py index 7f54095..d349cdd 100644 --- a/src/BetterCodeBetterScience/rnaseq/prefect_workflow/flows.py +++ b/src/BetterCodeBetterScience/rnaseq/prefect_workflow/flows.py @@ -3,6 +3,8 @@ Main workflow flow that orchestrates all tasks. """ +import logging +from datetime import datetime from pathlib import Path from typing import Any @@ -25,6 +27,47 @@ bids_checkpoint_name, load_checkpoint, ) +from BetterCodeBetterScience.rnaseq.stateless_workflow.execution_log import ( + create_execution_log, + serialize_parameters, +) + + +def setup_file_logging(log_dir: Path) -> tuple[Path, logging.FileHandler]: + """Set up file-based logging for the workflow. 
+ + Parameters + ---------- + log_dir : Path + Directory to save log files + + Returns + ------- + tuple[Path, logging.FileHandler] + Path to log file and the file handler (for cleanup) + """ + log_dir.mkdir(parents=True, exist_ok=True) + timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") + log_file = log_dir / f"prefect_workflow_{timestamp}.log" + + # Create file handler + file_handler = logging.FileHandler(log_file) + file_handler.setLevel(logging.INFO) + formatter = logging.Formatter( + "%(asctime)s | %(levelname)-8s | %(name)s - %(message)s", + datefmt="%Y-%m-%d %H:%M:%S", + ) + file_handler.setFormatter(formatter) + + # Add handler to root logger to capture all logs + root_logger = logging.getLogger() + root_logger.addHandler(file_handler) + + # Also add to prefect logger + prefect_logger = logging.getLogger("prefect") + prefect_logger.addHandler(file_handler) + + return log_file, file_handler @flow(name="immune_aging_scrna_workflow", log_prints=True) @@ -69,259 +112,401 @@ def run_workflow( results_dir = datadir / "workflow/results/per_cell_type" results_dir.mkdir(parents=True, exist_ok=True) + log_dir = datadir / "workflow/logs" + log_dir.mkdir(parents=True, exist_ok=True) + + # Set up file logging + log_file, file_handler = setup_file_logging(log_dir) + logger.info(f"Logging to file: {log_file}") + + # Initialize execution log for structured tracking + execution_log = create_execution_log( + workflow_name="immune_aging_scrnaseq_prefect", + workflow_parameters=serialize_parameters( + datadir=datadir, + dataset_name=dataset_name, + url=url, + force_from_step=force_from_step, + min_samples_per_cell_type=min_samples_per_cell_type, + ), + ) + # Determine which steps to force re-run force = {i: False for i in range(1, 12)} if force_from_step is not None: for i in range(force_from_step, 12): force[i] = True - # ========================================================================= - # STEP 1: Data Download - # 
========================================================================= - logger.info("=" * 60) - logger.info("STEP 1: DATA DOWNLOAD") - logger.info("=" * 60) - - datafile = datadir / f"dataset-{dataset_name}_subset-immune_raw.h5ad" - download_data_task(datafile, url) - - # ========================================================================= - # STEP 2: Data Filtering - # ========================================================================= - logger.info("=" * 60) - logger.info("STEP 2: DATA FILTERING") - logger.info("=" * 60) - - adata = load_and_filter_task( - datafile=datafile, - checkpoint_file=checkpoint_dir - / bids_checkpoint_name(dataset_name, 2, "filtered"), - cutoff_percentile=1.0, - min_cells_per_celltype=10, - percent_donors=0.95, - figure_dir=figure_dir, - force=force[2], - ) - - # Build var_to_feature mapping - var_to_feature = dict(zip(adata.var_names, adata.var["feature_name"])) - - # ========================================================================= - # STEP 3: Quality Control - # ========================================================================= - logger.info("=" * 60) - logger.info("STEP 3: QUALITY CONTROL") - logger.info("=" * 60) - - adata = quality_control_task( - adata=adata, - checkpoint_file=checkpoint_dir / bids_checkpoint_name(dataset_name, 3, "qc"), - min_genes=200, - max_genes=6000, - min_counts=500, - max_counts=30000, - max_hb_pct=5.0, - expected_doublet_rate=0.06, - figure_dir=figure_dir, - force=force[3], - ) - - # ========================================================================= - # STEP 4: Preprocessing - # ========================================================================= - logger.info("=" * 60) - logger.info("STEP 4: PREPROCESSING") - logger.info("=" * 60) - - adata = preprocessing_task( - adata=adata, - checkpoint_file=checkpoint_dir - / bids_checkpoint_name(dataset_name, 4, "preprocessed"), - target_sum=1e4, - n_top_genes=3000, - batch_key="donor_id", - force=force[4], - ) - - # 
========================================================================= - # STEP 5: Dimensionality Reduction - # ========================================================================= - logger.info("=" * 60) - logger.info("STEP 5: DIMENSIONALITY REDUCTION") - logger.info("=" * 60) - - adata = dimensionality_reduction_task( - adata=adata, - checkpoint_file=checkpoint_dir - / bids_checkpoint_name(dataset_name, 5, "dimreduced"), - batch_key="donor_id", - n_neighbors=30, - n_pcs=40, - figure_dir=figure_dir, - force=force[5], - ) + error_occurred = None - # ========================================================================= - # STEP 6: Clustering - # ========================================================================= - logger.info("=" * 60) - logger.info("STEP 6: CLUSTERING") - logger.info("=" * 60) - - adata = clustering_task( - adata=adata, - checkpoint_file=checkpoint_dir - / bids_checkpoint_name(dataset_name, 6, "clustered"), - resolution=1.0, - figure_dir=figure_dir, - force=force[6], - ) + try: + # ===================================================================== + # STEP 1: Data Download + # ===================================================================== + logger.info("=" * 60) + logger.info("STEP 1: DATA DOWNLOAD") + logger.info("=" * 60) - # ========================================================================= - # STEP 7: Pseudobulking - # ========================================================================= - logger.info("=" * 60) - logger.info("STEP 7: PSEUDOBULKING") - logger.info("=" * 60) - - # Load step 3 checkpoint for raw counts - step3_checkpoint = checkpoint_dir / bids_checkpoint_name(dataset_name, 3, "qc") - adata_raw_counts = load_checkpoint(step3_checkpoint) - logger.info(f"Loaded raw counts from step 3: {adata_raw_counts.shape}") - - pb_adata = pseudobulk_task( - adata=adata_raw_counts, - checkpoint_file=checkpoint_dir - / bids_checkpoint_name(dataset_name, 7, "pseudobulk"), - group_col="cell_type", - 
donor_col="donor_id", - metadata_cols=["development_stage", "sex"], - min_cells=10, - figure_dir=figure_dir, - layer=None, # Use .X directly (raw counts) - force=force[7], - ) - - # ========================================================================= - # STEPS 8-11: Per-Cell-Type Analysis (Sequential) - # ========================================================================= - logger.info("=" * 60) - logger.info("STEPS 8-11: PER-CELL-TYPE ANALYSIS (SEQUENTIAL)") - logger.info("=" * 60) - - # Get all cell types from pseudobulk - cell_types = pb_adata.obs["cell_type"].unique().tolist() - logger.info(f"Found {len(cell_types)} cell types to analyze") - - # Filter cell types with insufficient samples - cell_type_counts = pb_adata.obs["cell_type"].value_counts() - valid_cell_types = [ - ct for ct in cell_types if cell_type_counts[ct] >= min_samples_per_cell_type - ] - skipped_cell_types = [ct for ct in cell_types if ct not in valid_cell_types] - - if skipped_cell_types: - logger.warning( - f"Skipping {len(skipped_cell_types)} cell types with < {min_samples_per_cell_type} samples: " - f"{skipped_cell_types}" + step_record = execution_log.add_step( + step_number=1, + step_name="data_download", + parameters=serialize_parameters(url=url), ) + datafile = datadir / f"dataset-{dataset_name}_subset-immune_raw.h5ad" + download_data_task(datafile, url) + execution_log.complete_step(step_record, from_cache=datafile.exists()) + + # ===================================================================== + # STEP 2: Data Filtering + # ===================================================================== + logger.info("=" * 60) + logger.info("STEP 2: DATA FILTERING") + logger.info("=" * 60) + + step2_params = { + "cutoff_percentile": 1.0, + "min_cells_per_celltype": 10, + "percent_donors": 0.95, + } + step_record = execution_log.add_step( + step_number=2, + step_name="data_filtering", + parameters=step2_params, + ) + checkpoint_file = checkpoint_dir / bids_checkpoint_name( + 
dataset_name, 2, "filtered" + ) + from_cache = checkpoint_file.exists() and not force[2] + adata = load_and_filter_task( + datafile=datafile, + checkpoint_file=checkpoint_file, + cutoff_percentile=step2_params["cutoff_percentile"], + min_cells_per_celltype=step2_params["min_cells_per_celltype"], + percent_donors=step2_params["percent_donors"], + figure_dir=figure_dir, + force=force[2], + ) + execution_log.complete_step(step_record, from_cache=from_cache) + + # Build var_to_feature mapping + var_to_feature = dict(zip(adata.var_names, adata.var["feature_name"])) + + # ===================================================================== + # STEP 3: Quality Control + # ===================================================================== + logger.info("=" * 60) + logger.info("STEP 3: QUALITY CONTROL") + logger.info("=" * 60) + + step3_params = { + "min_genes": 200, + "max_genes": 6000, + "min_counts": 500, + "max_counts": 30000, + "max_hb_pct": 5.0, + "expected_doublet_rate": 0.06, + } + step_record = execution_log.add_step( + step_number=3, + step_name="quality_control", + parameters=step3_params, + ) + checkpoint_file = checkpoint_dir / bids_checkpoint_name(dataset_name, 3, "qc") + from_cache = checkpoint_file.exists() and not force[3] + adata = quality_control_task( + adata=adata, + checkpoint_file=checkpoint_file, + min_genes=step3_params["min_genes"], + max_genes=step3_params["max_genes"], + min_counts=step3_params["min_counts"], + max_counts=step3_params["max_counts"], + max_hb_pct=step3_params["max_hb_pct"], + expected_doublet_rate=step3_params["expected_doublet_rate"], + figure_dir=figure_dir, + force=force[3], + ) + execution_log.complete_step(step_record, from_cache=from_cache) + + # ===================================================================== + # STEP 4: Preprocessing + # ===================================================================== + logger.info("=" * 60) + logger.info("STEP 4: PREPROCESSING") + logger.info("=" * 60) + + step4_params = { + 
"target_sum": 1e4, + "n_top_genes": 3000, + "batch_key": "donor_id", + } + step_record = execution_log.add_step( + step_number=4, + step_name="preprocessing", + parameters=step4_params, + ) + checkpoint_file = checkpoint_dir / bids_checkpoint_name( + dataset_name, 4, "preprocessed" + ) + from_cache = checkpoint_file.exists() and not force[4] + adata = preprocessing_task( + adata=adata, + checkpoint_file=checkpoint_file, + target_sum=step4_params["target_sum"], + n_top_genes=step4_params["n_top_genes"], + batch_key=step4_params["batch_key"], + force=force[4], + ) + execution_log.complete_step(step_record, from_cache=from_cache) + + # ===================================================================== + # STEP 5: Dimensionality Reduction + # ===================================================================== + logger.info("=" * 60) + logger.info("STEP 5: DIMENSIONALITY REDUCTION") + logger.info("=" * 60) + + step5_params = { + "batch_key": "donor_id", + "n_neighbors": 30, + "n_pcs": 40, + } + step_record = execution_log.add_step( + step_number=5, + step_name="dimensionality_reduction", + parameters=step5_params, + ) + checkpoint_file = checkpoint_dir / bids_checkpoint_name( + dataset_name, 5, "dimreduced" + ) + from_cache = checkpoint_file.exists() and not force[5] + adata = dimensionality_reduction_task( + adata=adata, + checkpoint_file=checkpoint_file, + batch_key=step5_params["batch_key"], + n_neighbors=step5_params["n_neighbors"], + n_pcs=step5_params["n_pcs"], + figure_dir=figure_dir, + force=force[5], + ) + execution_log.complete_step(step_record, from_cache=from_cache) + + # ===================================================================== + # STEP 6: Clustering + # ===================================================================== + logger.info("=" * 60) + logger.info("STEP 6: CLUSTERING") + logger.info("=" * 60) + + step6_params = {"resolution": 1.0} + step_record = execution_log.add_step( + step_number=6, + step_name="clustering", + 
parameters=step6_params, + ) + checkpoint_file = checkpoint_dir / bids_checkpoint_name( + dataset_name, 6, "clustered" + ) + from_cache = checkpoint_file.exists() and not force[6] + adata = clustering_task( + adata=adata, + checkpoint_file=checkpoint_file, + resolution=step6_params["resolution"], + figure_dir=figure_dir, + force=force[6], + ) + execution_log.complete_step(step_record, from_cache=from_cache) + + # ===================================================================== + # STEP 7: Pseudobulking + # ===================================================================== + logger.info("=" * 60) + logger.info("STEP 7: PSEUDOBULKING") + logger.info("=" * 60) + + step7_params = { + "group_col": "cell_type", + "donor_col": "donor_id", + "metadata_cols": ["development_stage", "sex"], + "min_cells": 10, + } + step_record = execution_log.add_step( + step_number=7, + step_name="pseudobulking", + parameters=step7_params, + ) + # Load step 3 checkpoint for raw counts + step3_checkpoint = checkpoint_dir / bids_checkpoint_name(dataset_name, 3, "qc") + adata_raw_counts = load_checkpoint(step3_checkpoint) + logger.info(f"Loaded raw counts from step 3: {adata_raw_counts.shape}") - logger.info(f"Analyzing {len(valid_cell_types)} cell types sequentially") - - # Initialize results - all_results = { - "adata": adata, - "pb_adata": pb_adata, - "per_cell_type": {}, - } - - # Process each cell type sequentially - for i, cell_type in enumerate(valid_cell_types): - logger.info(f"\n[{i + 1}/{len(valid_cell_types)}] Processing: {cell_type}") - - try: - # Step 8: Differential Expression - de_result = differential_expression_task( - pb_adata=pb_adata, - cell_type=cell_type, - var_to_feature=var_to_feature, - output_dir=results_dir, - design_factors=["age_scaled", "sex"], - n_cpus=8, - ) - - # Get metadata for this cell type (for predictive modeling) - pb_adata_ct = pb_adata[pb_adata.obs["cell_type"] == cell_type].copy() - pb_adata_ct.obs["age"] = ( - 
pb_adata_ct.obs["development_stage"] - .str.extract(r"(\d+)-year-old")[0] - .astype(float) - ) - metadata_ct = pb_adata_ct.obs.copy() - - # Step 9: Pathway Analysis (GSEA) - gsea_result = pathway_analysis_task( - de_results=de_result["de_results"], - cell_type=cell_type, - output_dir=results_dir, - gene_sets=["MSigDB_Hallmark_2020"], - n_top=10, - ) - - # Step 10: Overrepresentation Analysis (Enrichr) - enrichr_result = overrepresentation_task( - de_results=de_result["de_results"], - cell_type=cell_type, - output_dir=results_dir, - gene_sets=["MSigDB_Hallmark_2020"], - padj_threshold=0.05, - n_top=10, + checkpoint_file = checkpoint_dir / bids_checkpoint_name( + dataset_name, 7, "pseudobulk" + ) + from_cache = checkpoint_file.exists() and not force[7] + pb_adata = pseudobulk_task( + adata=adata_raw_counts, + checkpoint_file=checkpoint_file, + group_col=step7_params["group_col"], + donor_col=step7_params["donor_col"], + metadata_cols=step7_params["metadata_cols"], + min_cells=step7_params["min_cells"], + figure_dir=figure_dir, + layer=None, # Use .X directly (raw counts) + force=force[7], + ) + execution_log.complete_step(step_record, from_cache=from_cache) + + # ===================================================================== + # STEPS 8-11: Per-Cell-Type Analysis (Sequential) + # ===================================================================== + logger.info("=" * 60) + logger.info("STEPS 8-11: PER-CELL-TYPE ANALYSIS (SEQUENTIAL)") + logger.info("=" * 60) + + # Get all cell types from pseudobulk + cell_types = pb_adata.obs["cell_type"].unique().tolist() + logger.info(f"Found {len(cell_types)} cell types to analyze") + + # Filter cell types with insufficient samples + cell_type_counts = pb_adata.obs["cell_type"].value_counts() + valid_cell_types = [ + ct for ct in cell_types if cell_type_counts[ct] >= min_samples_per_cell_type + ] + skipped_cell_types = [ct for ct in cell_types if ct not in valid_cell_types] + + if skipped_cell_types: + logger.warning( + 
f"Skipping {len(skipped_cell_types)} cell types with " + f"< {min_samples_per_cell_type} samples: {skipped_cell_types}" ) - # Step 11: Predictive Modeling - prediction_result = predictive_modeling_task( - counts_df=de_result["counts_df"], - metadata=metadata_ct, - cell_type=cell_type, - output_dir=results_dir, - n_splits=5, + logger.info(f"Analyzing {len(valid_cell_types)} cell types sequentially") + + # Initialize results + all_results = { + "adata": adata, + "pb_adata": pb_adata, + "per_cell_type": {}, + } + + # Process each cell type sequentially + for i, cell_type in enumerate(valid_cell_types): + logger.info(f"\n[{i + 1}/{len(valid_cell_types)}] Processing: {cell_type}") + + # Log combined steps 8-11 for this cell type + step_record = execution_log.add_step( + step_number=8, + step_name=f"per_cell_type_analysis ({cell_type})", + parameters=serialize_parameters( + cell_type=cell_type, + design_factors=["age_scaled", "sex"], + gene_sets=["MSigDB_Hallmark_2020"], + n_splits=5, + ), ) - all_results["per_cell_type"][cell_type] = { - "de": de_result, - "gsea": gsea_result, - "enrichment": enrichr_result, - "prediction": prediction_result, - } - logger.info(f"Completed analysis for: {cell_type}") - - except Exception as e: - logger.error(f"Failed analysis for {cell_type}: {e}") - all_results["per_cell_type"][cell_type] = {"error": str(e)} - - # ========================================================================= - # Summary - # ========================================================================= - logger.info("=" * 60) - logger.info("WORKFLOW COMPLETE") - logger.info("=" * 60) - - successful = sum( - 1 - for ct_results in all_results["per_cell_type"].values() - if "error" not in ct_results - ) - failed = len(valid_cell_types) - successful - - logger.info( - f"Successfully analyzed: {successful}/{len(valid_cell_types)} cell types" - ) - if failed > 0: - logger.warning(f"Failed: {failed} cell types") + try: + # Step 8: Differential Expression + de_result = 
differential_expression_task( + pb_adata=pb_adata, + cell_type=cell_type, + var_to_feature=var_to_feature, + output_dir=results_dir, + design_factors=["age_scaled", "sex"], + n_cpus=8, + ) + + # Get metadata for this cell type (for predictive modeling) + pb_adata_ct = pb_adata[pb_adata.obs["cell_type"] == cell_type].copy() + pb_adata_ct.obs["age"] = ( + pb_adata_ct.obs["development_stage"] + .str.extract(r"(\d+)-year-old")[0] + .astype(float) + ) + metadata_ct = pb_adata_ct.obs.copy() + + # Step 9: Pathway Analysis (GSEA) + gsea_result = pathway_analysis_task( + de_results=de_result["de_results"], + cell_type=cell_type, + output_dir=results_dir, + gene_sets=["MSigDB_Hallmark_2020"], + n_top=10, + ) + + # Step 10: Overrepresentation Analysis (Enrichr) + enrichr_result = overrepresentation_task( + de_results=de_result["de_results"], + cell_type=cell_type, + output_dir=results_dir, + gene_sets=["MSigDB_Hallmark_2020"], + padj_threshold=0.05, + n_top=10, + ) + + # Step 11: Predictive Modeling + prediction_result = predictive_modeling_task( + counts_df=de_result["counts_df"], + metadata=metadata_ct, + cell_type=cell_type, + output_dir=results_dir, + n_splits=5, + ) + + all_results["per_cell_type"][cell_type] = { + "de": de_result, + "gsea": gsea_result, + "enrichment": enrichr_result, + "prediction": prediction_result, + } + logger.info(f"Completed analysis for: {cell_type}") + execution_log.complete_step(step_record) + + except Exception as e: + logger.error(f"Failed analysis for {cell_type}: {e}") + all_results["per_cell_type"][cell_type] = {"error": str(e)} + execution_log.complete_step(step_record, error_message=str(e)) + + # ===================================================================== + # Summary + # ===================================================================== + logger.info("=" * 60) + logger.info("WORKFLOW COMPLETE") + logger.info("=" * 60) + + successful = sum( + 1 + for ct_results in all_results["per_cell_type"].values() + if "error" not in 
ct_results + ) + failed = len(valid_cell_types) - successful - logger.info(f"Figures saved to: {figure_dir}") - logger.info(f"Checkpoints saved to: {checkpoint_dir}") - logger.info(f"Per-cell-type results saved to: {results_dir}") + logger.info( + f"Successfully analyzed: {successful}/{len(valid_cell_types)} cell types" + ) + if failed > 0: + logger.warning(f"Failed: {failed} cell types") + + logger.info(f"Figures saved to: {figure_dir}") + logger.info(f"Checkpoints saved to: {checkpoint_dir}") + logger.info(f"Per-cell-type results saved to: {results_dir}") + + except Exception as e: + error_occurred = str(e) + raise + + finally: + # Complete and save execution log + execution_log.complete(error_message=error_occurred) + execution_log_file = execution_log.save(log_dir) + execution_log.print_summary() + logger.info(f"Execution log saved to: {execution_log_file}") + logger.info(f"Workflow log saved to: {log_file}") + + # Clean up file handler + logging.getLogger().removeHandler(file_handler) + logging.getLogger("prefect").removeHandler(file_handler) + file_handler.close() return all_results From 532a113f87384154dbc16023cfb9a7e70353865d Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Mon, 22 Dec 2025 14:51:24 -0800 Subject: [PATCH 49/87] Disable numba JIT to fix Prefect/pynndescent compatibility MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Prefect (like Ray) interferes with numba's JIT compilation, causing TypingError when pynndescent tries to use print in JIT-compiled code. Setting NUMBA_DISABLE_JIT=1 before imports avoids this issue. This makes some operations slower but ensures reliable execution. 
See: https://github.com/ray-project/ray/issues/44714

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5
---
 .../rnaseq/prefect_workflow/run_workflow.py | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/BetterCodeBetterScience/rnaseq/prefect_workflow/run_workflow.py b/src/BetterCodeBetterScience/rnaseq/prefect_workflow/run_workflow.py
index bc5e8ff..65707a4 100644
--- a/src/BetterCodeBetterScience/rnaseq/prefect_workflow/run_workflow.py
+++ b/src/BetterCodeBetterScience/rnaseq/prefect_workflow/run_workflow.py
@@ -7,8 +7,13 @@
     python -m BetterCodeBetterScience.rnaseq.prefect_workflow.run_workflow --force-from 8
 """
 
-import argparse
+# Disable numba JIT to avoid compatibility issues with Prefect/pynndescent
+# This must be set BEFORE importing any numba-dependent packages
 import os
+
+os.environ["NUMBA_DISABLE_JIT"] = "1"
+
+import argparse
 from pathlib import Path
 
 from dotenv import load_dotenv

From 8b3a383ab1b83d0c71db7accde17e6ebdec16c7a Mon Sep 17 00:00:00 2001
From: Russell Poldrack
Date: Mon, 22 Dec 2025 15:03:37 -0800
Subject: [PATCH 50/87] Pre-warm numba before Prefect import to fix JIT compatibility
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Instead of disabling numba JIT (which breaks scanpy's normalize_total),
import pynndescent before Prefect to trigger JIT compilation in a clean
Python environment before Prefect modifies the runtime.

This is specific to Prefect/Ray - the stateless workflow doesn't have this
issue because it doesn't use Prefect.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5
---
 .../rnaseq/prefect_workflow/run_workflow.py | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/src/BetterCodeBetterScience/rnaseq/prefect_workflow/run_workflow.py b/src/BetterCodeBetterScience/rnaseq/prefect_workflow/run_workflow.py
index 65707a4..0c9fa75 100644
--- a/src/BetterCodeBetterScience/rnaseq/prefect_workflow/run_workflow.py
+++ b/src/BetterCodeBetterScience/rnaseq/prefect_workflow/run_workflow.py
@@ -7,13 +7,14 @@
     python -m BetterCodeBetterScience.rnaseq.prefect_workflow.run_workflow --force-from 8
 """
 
-# Disable numba JIT to avoid compatibility issues with Prefect/pynndescent
-# This must be set BEFORE importing any numba-dependent packages
-import os
-
-os.environ["NUMBA_DISABLE_JIT"] = "1"
+# Pre-warm numba JIT compilation BEFORE importing Prefect
+# This avoids the Prefect/numba compatibility issue where Prefect's initialization
+# interferes with numba's JIT compilation of pynndescent functions.
+# See: https://github.com/ray-project/ray/issues/44714
+import pynndescent # noqa: F401 - triggers numba JIT compilation
 
 import argparse
+import os
 from pathlib import Path
 
 from dotenv import load_dotenv

From 6f8e524e29ea3a6df3823ac941656d960965a1bc Mon Sep 17 00:00:00 2001
From: Russell Poldrack
Date: Mon, 22 Dec 2025 15:11:12 -0800
Subject: [PATCH 51/87] Set NUMBA_CAPTURED_ERRORS=old_style to fix pynndescent print issue
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This environment variable allows print statements in numba nopython
functions, fixing the TypingError with pynndescent/Prefect.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5
---
 .../rnaseq/prefect_workflow/run_workflow.py | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/BetterCodeBetterScience/rnaseq/prefect_workflow/run_workflow.py b/src/BetterCodeBetterScience/rnaseq/prefect_workflow/run_workflow.py
index 0c9fa75..9b9dfcd 100644
--- a/src/BetterCodeBetterScience/rnaseq/prefect_workflow/run_workflow.py
+++ b/src/BetterCodeBetterScience/rnaseq/prefect_workflow/run_workflow.py
@@ -7,14 +7,14 @@
     python -m BetterCodeBetterScience.rnaseq.prefect_workflow.run_workflow --force-from 8
 """
 
-# Pre-warm numba JIT compilation BEFORE importing Prefect
-# This avoids the Prefect/numba compatibility issue where Prefect's initialization
-# interferes with numba's JIT compilation of pynndescent functions.
+# Set numba environment variables BEFORE any imports
+# NUMBA_CAPTURED_ERRORS='old_style' allows print in nopython functions
 # See: https://github.com/ray-project/ray/issues/44714
-import pynndescent # noqa: F401 - triggers numba JIT compilation
+import os
+
+os.environ["NUMBA_CAPTURED_ERRORS"] = "old_style"
 
 import argparse
-import os
 from pathlib import Path
 
 from dotenv import load_dotenv

From 07d5a243c4e297cf5860e5251c0f70f1570add68 Mon Sep 17 00:00:00 2001
From: Russell Poldrack
Date: Mon, 22 Dec 2025 15:31:54 -0800
Subject: [PATCH 52/87] set log_prints=False to fix numba issue

---
 .../rnaseq/prefect_workflow/flows.py | 4 ++--
 .../rnaseq/prefect_workflow/run_workflow.py | 8 +-------
 2 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/src/BetterCodeBetterScience/rnaseq/prefect_workflow/flows.py b/src/BetterCodeBetterScience/rnaseq/prefect_workflow/flows.py
index d349cdd..d57a9f7 100644
--- a/src/BetterCodeBetterScience/rnaseq/prefect_workflow/flows.py
+++ b/src/BetterCodeBetterScience/rnaseq/prefect_workflow/flows.py
@@ -70,7 +70,7 @@ def setup_file_logging(log_dir: Path) -> tuple[Path, logging.FileHandler]:
     return log_file, file_handler
 
 
-@flow(name="immune_aging_scrna_workflow", log_prints=True)
+@flow(name="immune_aging_scrna_workflow", log_prints=False)
 def run_workflow(
     datadir: Path,
     dataset_name: str = "OneK1K",
@@ -511,7 +511,7 @@ def run_workflow(
     return all_results
 
 
-@flow(name="analyze_single_cell_type", log_prints=True)
+@flow(name="analyze_single_cell_type", log_prints=False)
 def analyze_single_cell_type(
     datadir: Path,
     cell_type: str,
diff --git a/src/BetterCodeBetterScience/rnaseq/prefect_workflow/run_workflow.py b/src/BetterCodeBetterScience/rnaseq/prefect_workflow/run_workflow.py
index 9b9dfcd..bc5e8ff 100644
--- a/src/BetterCodeBetterScience/rnaseq/prefect_workflow/run_workflow.py
+++ b/src/BetterCodeBetterScience/rnaseq/prefect_workflow/run_workflow.py
@@ -7,14 +7,8 @@
     python -m BetterCodeBetterScience.rnaseq.prefect_workflow.run_workflow --force-from 8
 """
 
-# Set numba environment variables BEFORE any imports
-# NUMBA_CAPTURED_ERRORS='old_style' allows print in nopython functions
-# See: https://github.com/ray-project/ray/issues/44714
-import os
-
-os.environ["NUMBA_CAPTURED_ERRORS"] = "old_style"
-
 import argparse
+import os
 from pathlib import Path
 
 from dotenv import load_dotenv

From 70764cd0ee8017c6e86dfafe1a5a88b917440191 Mon Sep 17 00:00:00 2001
From: Russell Poldrack
Date: Mon, 22 Dec 2025 16:07:01 -0800
Subject: [PATCH 53/87] Add Snakemake workflow for scRNA-seq analysis
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Create a Snakemake workflow functionally equivalent to the Prefect and
stateless workflows:
- Snakefile with modular rule files (preprocessing, pseudobulk, per_cell_type)
- Uses Snakemake checkpoint for step 7 to discover cell types dynamically
- Per-cell-type analysis (steps 8-11) triggered for all valid cell types
- Python scripts wrap existing modular workflow functions
- Configuration in config/config.yaml with all workflow parameters
- Added snakemake>=8.0 dependency

Usage:
  snakemake --cores 8 --config datadir=/path/to/data

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5
---
 problems_to_solve.md | 15 +++
 pyproject.toml | 1 +
 .../rnaseq/snakemake_workflow/Snakefile | 100 ++++++++++++++++
 .../rnaseq/snakemake_workflow/__init__.py | 0
 .../snakemake_workflow/config/config.yaml | 75 ++++++++++++
 .../snakemake_workflow/rules/common.smk | 92 +++++++++++++++
 .../rules/per_cell_type.smk | 96 +++++++++++++++
 .../rules/preprocessing.smk | 107 +++++++++++++++++
 .../snakemake_workflow/rules/pseudobulk.smk | 37 ++++++
 .../scripts/aggregate_results.py | 56 +++++++++
 .../snakemake_workflow/scripts/cluster.py | 43 +++++++
 .../scripts/differential_expression.py | 79 +++++++++++++
 .../snakemake_workflow/scripts/dimred.py | 47 ++++++++
 .../snakemake_workflow/scripts/download.py | 23 ++++
 .../snakemake_workflow/scripts/enrichr.py | 54 +++++++++
 .../snakemake_workflow/scripts/filter.py | 48 ++++++++
 .../rnaseq/snakemake_workflow/scripts/gsea.py | 48 ++++++++
 .../snakemake_workflow/scripts/prediction.py | 79 +++++++++++++
 .../snakemake_workflow/scripts/preprocess.py | 48 ++++++++
 .../snakemake_workflow/scripts/pseudobulk.py | 110 ++++++++++++++++++
 .../rnaseq/snakemake_workflow/scripts/qc.py | 54 +++++++++
 21 files changed, 1212 insertions(+)
 create mode 100644 src/BetterCodeBetterScience/rnaseq/snakemake_workflow/Snakefile
 create mode 100644 src/BetterCodeBetterScience/rnaseq/snakemake_workflow/__init__.py
 create mode 100644 src/BetterCodeBetterScience/rnaseq/snakemake_workflow/config/config.yaml
 create mode 100644 src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/common.smk
 create mode 100644 src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/per_cell_type.smk
 create mode 100644 src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/preprocessing.smk
 create mode 100644 src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/pseudobulk.smk
 create mode 100644
src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/aggregate_results.py create mode 100644 src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/cluster.py create mode 100644 src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/differential_expression.py create mode 100644 src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/dimred.py create mode 100644 src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/download.py create mode 100644 src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/enrichr.py create mode 100644 src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/filter.py create mode 100644 src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/gsea.py create mode 100644 src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/prediction.py create mode 100644 src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/preprocess.py create mode 100644 src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/pseudobulk.py create mode 100644 src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/qc.py diff --git a/problems_to_solve.md b/problems_to_solve.md index f1e6d77..b2d4a7d 100644 --- a/problems_to_solve.md +++ b/problems_to_solve.md @@ -3,6 +3,21 @@ Open problems marked with [ ] Fixed problems marked with [x] +[x] I would now like to add another workflow, with code saved to src/BetterCodeBetterScience/rnaseq/snakemake_workflow. This workflow will use the Snakemake workflow manager (https://snakemake.readthedocs.io/en/stable/index.html); otherwise it should be functionally equivalent to the other workflows already developed. 
+ - Created `snakemake_workflow/` directory with: + - `Snakefile`: Main workflow entry point + - `config/config.yaml`: All workflow parameters with defaults + - `rules/common.smk`: Helper functions (sanitize_cell_type, aggregate functions) + - `rules/preprocessing.smk`: Steps 1-6 rules + - `rules/pseudobulk.smk`: Step 7 as Snakemake checkpoint (enables dynamic rules) + - `rules/per_cell_type.smk`: Steps 8-11 with {cell_type} wildcard + - `scripts/*.py`: 12 Python scripts wrapping modular workflow functions + - Uses Snakemake checkpoint for step 7 to discover cell types dynamically + - Per-cell-type steps (8-11) triggered automatically for all valid cell types + - Reuses existing modular workflow functions and checkpoint utilities + - Added `snakemake>=8.0` dependency to pyproject.toml + - Usage: `snakemake --cores 8 --config datadir=/path/to/data` + [x] I would like to add a new workflow, with code saved to src/BetterCodeBetterScience/rnaseq/prefect_workflow. This workflow will use the Prefect workflow manager (https://github.com/PrefectHQ/prefect) to manage the workflow that was previously developed in src/BetterCodeBetterScience/rnaseq/stateless_workflow. The one new feature that I would like to add here is to perform steps 8-11 separately on each different cell type that survives the initial filtering. 
- Created `prefect_workflow/` directory with: - `tasks.py`: Prefect task definitions wrapping modular workflow functions diff --git a/pyproject.toml b/pyproject.toml index ca205b3..f467571 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -80,6 +80,7 @@ dependencies = [ "harmonypy>=0.0.10", "rpy2>=3.6.4", "prefect>=3.0", + "snakemake>=8.0", ] [build-system] diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/Snakefile b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/Snakefile new file mode 100644 index 0000000..87b617b --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/Snakefile @@ -0,0 +1,100 @@ +"""Main Snakemake workflow for scRNA-seq immune aging analysis. + +This workflow is functionally equivalent to the Prefect and stateless workflows. +It processes scRNA-seq data through 11 steps: + +Global Steps (1-7): +1. Data Download +2. Data Filtering +3. Quality Control +4. Preprocessing +5. Dimensionality Reduction +6. Clustering +7. Pseudobulking (discovers cell types) + +Per-Cell-Type Steps (8-11): +8. Differential Expression +9. Pathway Analysis (GSEA) +10. Overrepresentation Analysis (Enrichr) +11. 
Predictive Modeling + +Usage: + # Run full workflow + snakemake --cores 8 --config datadir=/path/to/data + + # Dry run (see what would be executed) + snakemake -n --config datadir=/path/to/data + + # Force re-run from a specific rule + snakemake --cores 8 --forcerun dimensionality_reduction --config datadir=/path/to/data + + # Run only preprocessing (steps 1-6) + snakemake --cores 8 clustering --config datadir=/path/to/data +""" + +from pathlib import Path + +from snakemake.utils import min_version + +# Require Snakemake 8.0 or higher +min_version("8.0") + + +# Load configuration +configfile: "config/config.yaml" + + +# Validate required config +if config.get("datadir") is None: + raise ValueError( + "datadir must be provided via --config datadir=/path/to/data" + ) + +DATADIR = Path(config["datadir"]) +DATASET = config["dataset_name"] + +# Derived paths +CHECKPOINT_DIR = DATADIR / "workflow" / "checkpoints" +RESULTS_DIR = DATADIR / "workflow" / "results" +FIGURE_DIR = DATADIR / "workflow" / "figures" +LOG_DIR = DATADIR / "workflow" / "logs" + + +# Include modular rule files +include: "rules/common.smk" +include: "rules/preprocessing.smk" +include: "rules/pseudobulk.smk" +include: "rules/per_cell_type.smk" + + +# Default target: all analyses complete +rule all: + input: + # Global preprocessing outputs + CHECKPOINT_DIR / f"dataset-{DATASET}_step-06_desc-clustered.h5ad", + # Aggregated per-cell-type results (triggers dynamic rules) + RESULTS_DIR / "workflow_complete.txt", + + +# Rule to aggregate all per-cell-type results +rule aggregate_results: + input: + aggregate_per_cell_type_outputs, + output: + RESULTS_DIR / "workflow_complete.txt", + log: + LOG_DIR / "aggregate_results.log", + script: + "scripts/aggregate_results.py" + + +# Preprocessing-only target (stops at step 6) +rule preprocessing_only: + input: + CHECKPOINT_DIR / f"dataset-{DATASET}_step-06_desc-clustered.h5ad", + + +# Pseudobulk-only target (stops at step 7) +rule pseudobulk_only: + input: + 
CHECKPOINT_DIR / f"dataset-{DATASET}_step-07_desc-pseudobulk.h5ad", diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/__init__.py b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/config/config.yaml b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/config/config.yaml new file mode 100644 index 0000000..121be0f --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/config/config.yaml @@ -0,0 +1,75 @@ +# Configuration for Snakemake scRNA-seq immune aging workflow +# +# Usage: +# snakemake --cores 8 --config datadir=/path/to/data +# +# Override any parameter: +# snakemake --cores 8 --config datadir=/path/to/data dataset_name=MyDataset + +# Dataset configuration +dataset_name: "OneK1K" +url: "https://datasets.cellxgene.cziscience.com/a3f5651f-cd1a-4d26-8165-74964b79b4f2.h5ad" + +# Step 2: Filtering parameters +filtering: + cutoff_percentile: 1.0 + min_cells_per_celltype: 10 + percent_donors: 0.95 + +# Step 3: QC parameters +qc: + min_genes: 200 + max_genes: 6000 + min_counts: 500 + max_counts: 30000 + max_hb_pct: 5.0 + expected_doublet_rate: 0.06 + +# Step 4: Preprocessing parameters +preprocessing: + target_sum: 10000 + n_top_genes: 3000 + batch_key: "donor_id" + +# Step 5: Dimensionality reduction parameters +dimred: + batch_key: "donor_id" + n_neighbors: 30 + n_pcs: 40 + +# Step 6: Clustering parameters +clustering: + resolution: 1.0 + +# Step 7: Pseudobulking parameters +pseudobulk: + group_col: "cell_type" + donor_col: "donor_id" + metadata_cols: + - "development_stage" + - "sex" + min_cells: 10 + +# Steps 8-11: Per-cell-type analysis parameters +differential_expression: + design_factors: + - "age_scaled" + - "sex" + n_cpus: 8 + +pathway_analysis: + gene_sets: + - "MSigDB_Hallmark_2020" + n_top: 10 + +overrepresentation: + gene_sets: + - "MSigDB_Hallmark_2020" + padj_threshold: 0.05 + n_top: 10 + 
+predictive_modeling: + n_splits: 5 + +# Minimum samples per cell type for per-cell-type analysis +min_samples_per_cell_type: 10 diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/common.smk b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/common.smk new file mode 100644 index 0000000..3c11e87 --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/common.smk @@ -0,0 +1,92 @@ +"""Common helper functions and wildcards for Snakemake workflow.""" + +import json +from pathlib import Path + + +def sanitize_cell_type(cell_type: str) -> str: + """Sanitize cell type name for filesystem use. + + Matches the function used in Prefect workflow. + """ + return cell_type.replace(" ", "_").replace(",", "").replace("-", "_") + + +def unsanitize_cell_type(sanitized: str, cell_types_file: Path) -> str: + """Convert sanitized cell type back to original name. + + Parameters + ---------- + sanitized : str + Sanitized cell type name + cell_types_file : Path + Path to cell_types.json file from pseudobulk step + + Returns + ------- + str + Original cell type name + """ + with open(cell_types_file) as f: + data = json.load(f) + # Reverse lookup + for original, sanitized_name in data["sanitized_names"].items(): + if sanitized_name == sanitized: + return original + raise ValueError(f"Unknown sanitized cell type: {sanitized}") + + +def bids_checkpoint_name( + dataset_name: str, step_number: int, description: str, extension: str = "h5ad" +) -> str: + """Generate BIDS-compliant checkpoint filename. + + Matches the naming convention used in stateless_workflow. + """ + return f"dataset-{dataset_name}_step-{step_number:02d}_desc-{description}.{extension}" + + +def get_valid_cell_types(wildcards): + """Get list of valid cell types from pseudobulk checkpoint. + + This function is used as an input function after the checkpoint + to determine which cell types to process. 
+ """ + checkpoint_output = checkpoints.pseudobulk.get(**wildcards) + cell_types_file = checkpoint_output.output.cell_types + + with open(cell_types_file) as f: + data = json.load(f) + + return data["valid_cell_types"] + + +def aggregate_per_cell_type_outputs(wildcards): + """Aggregate function to collect all per-cell-type outputs. + + This is called after the pseudobulk checkpoint is resolved to + generate the list of expected outputs for all valid cell types. + """ + checkpoint_output = checkpoints.pseudobulk.get(**wildcards) + cell_types_file = checkpoint_output.output.cell_types + + with open(cell_types_file) as f: + data = json.load(f) + + valid_cell_types = data["valid_cell_types"] + + outputs = [] + for ct in valid_cell_types: + ct_sanitized = sanitize_cell_type(ct) + ct_dir = RESULTS_DIR / "per_cell_type" / ct_sanitized + outputs.extend( + [ + ct_dir / "de_results.parquet", + ct_dir / "gsea_results.pkl", + ct_dir / "enrichr_up.pkl", + ct_dir / "enrichr_down.pkl", + ct_dir / "prediction_results.pkl", + ] + ) + + return outputs diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/per_cell_type.smk b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/per_cell_type.smk new file mode 100644 index 0000000..9d06fa6 --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/per_cell_type.smk @@ -0,0 +1,96 @@ +"""Per-cell-type analysis rules (Steps 8-11). + +These rules are triggered dynamically based on the cell types discovered +in the pseudobulk checkpoint (step 7). Each rule uses the {cell_type} +wildcard to process all valid cell types. 
+ +The workflow is: +- Step 8: Differential Expression (required first - provides DE results and counts) +- Step 9: Pathway Analysis (GSEA) - depends on DE results +- Step 10: Overrepresentation Analysis (Enrichr) - depends on DE results +- Step 11: Predictive Modeling - depends on counts from DE step +""" + + +# Step 8: Differential Expression (per cell type) +rule differential_expression: + input: + pseudobulk=CHECKPOINT_DIR / bids_checkpoint_name(DATASET, 7, "pseudobulk"), + var_to_feature=CHECKPOINT_DIR / f"dataset-{DATASET}_step-07_var_to_feature.json", + cell_types=CHECKPOINT_DIR / f"dataset-{DATASET}_step-07_cell_types.json", + output: + stat_res=RESULTS_DIR / "per_cell_type" / "{cell_type}" / "stat_res.pkl", + de_results=RESULTS_DIR / "per_cell_type" / "{cell_type}" / "de_results.parquet", + counts_df=RESULTS_DIR / "per_cell_type" / "{cell_type}" / "counts.parquet", + params: + cell_type=lambda wildcards: wildcards.cell_type, + design_factors=config["differential_expression"]["design_factors"], + n_cpus=config["differential_expression"]["n_cpus"], + threads: config["differential_expression"]["n_cpus"] + log: + LOG_DIR / "step08_de_{cell_type}.log", + script: + "../scripts/differential_expression.py" + + +# Step 9: Pathway Analysis (GSEA) (per cell type) +rule pathway_analysis: + input: + de_results=RESULTS_DIR / "per_cell_type" / "{cell_type}" / "de_results.parquet", + output: + gsea_results=RESULTS_DIR / "per_cell_type" / "{cell_type}" / "gsea_results.pkl", + params: + cell_type=lambda wildcards: wildcards.cell_type, + gene_sets=config["pathway_analysis"]["gene_sets"], + n_top=config["pathway_analysis"]["n_top"], + figure_dir=lambda wildcards: str( + RESULTS_DIR / "per_cell_type" / wildcards.cell_type / "figures" + ), + log: + LOG_DIR / "step09_gsea_{cell_type}.log", + script: + "../scripts/gsea.py" + + +# Step 10: Overrepresentation Analysis (Enrichr) (per cell type) +rule overrepresentation: + input: + de_results=RESULTS_DIR / "per_cell_type" / 
"{cell_type}" / "de_results.parquet", + output: + enr_up=RESULTS_DIR / "per_cell_type" / "{cell_type}" / "enrichr_up.pkl", + enr_down=RESULTS_DIR / "per_cell_type" / "{cell_type}" / "enrichr_down.pkl", + params: + cell_type=lambda wildcards: wildcards.cell_type, + gene_sets=config["overrepresentation"]["gene_sets"], + padj_threshold=config["overrepresentation"]["padj_threshold"], + n_top=config["overrepresentation"]["n_top"], + figure_dir=lambda wildcards: str( + RESULTS_DIR / "per_cell_type" / wildcards.cell_type / "figures" + ), + log: + LOG_DIR / "step10_enrichr_{cell_type}.log", + script: + "../scripts/enrichr.py" + + +# Step 11: Predictive Modeling (per cell type) +rule predictive_modeling: + input: + counts_df=RESULTS_DIR / "per_cell_type" / "{cell_type}" / "counts.parquet", + pseudobulk=CHECKPOINT_DIR / bids_checkpoint_name(DATASET, 7, "pseudobulk"), + cell_types=CHECKPOINT_DIR / f"dataset-{DATASET}_step-07_cell_types.json", + output: + prediction_results=RESULTS_DIR + / "per_cell_type" + / "{cell_type}" + / "prediction_results.pkl", + params: + cell_type=lambda wildcards: wildcards.cell_type, + n_splits=config["predictive_modeling"]["n_splits"], + figure_dir=lambda wildcards: str( + RESULTS_DIR / "per_cell_type" / wildcards.cell_type / "figures" + ), + log: + LOG_DIR / "step11_prediction_{cell_type}.log", + script: + "../scripts/prediction.py" diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/preprocessing.smk b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/preprocessing.smk new file mode 100644 index 0000000..3545060 --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/preprocessing.smk @@ -0,0 +1,107 @@ +"""Preprocessing rules (Steps 1-6). + +These rules handle the initial data processing pipeline: +1. Data Download +2. Data Filtering +3. Quality Control +4. Preprocessing (normalization, HVG selection) +5. Dimensionality Reduction (PCA, UMAP) +6. 
Clustering +""" + + +# Step 1: Data Download +rule download_data: + output: + DATADIR / f"dataset-{DATASET}_subset-immune_raw.h5ad", + params: + url=config["url"], + log: + LOG_DIR / "step01_download.log", + script: + "../scripts/download.py" + + +# Step 2: Data Filtering +rule filter_data: + input: + DATADIR / f"dataset-{DATASET}_subset-immune_raw.h5ad", + output: + CHECKPOINT_DIR / bids_checkpoint_name(DATASET, 2, "filtered"), + params: + cutoff_percentile=config["filtering"]["cutoff_percentile"], + min_cells_per_celltype=config["filtering"]["min_cells_per_celltype"], + percent_donors=config["filtering"]["percent_donors"], + figure_dir=str(FIGURE_DIR), + log: + LOG_DIR / "step02_filtering.log", + script: + "../scripts/filter.py" + + +# Step 3: Quality Control +rule quality_control: + input: + CHECKPOINT_DIR / bids_checkpoint_name(DATASET, 2, "filtered"), + output: + CHECKPOINT_DIR / bids_checkpoint_name(DATASET, 3, "qc"), + params: + min_genes=config["qc"]["min_genes"], + max_genes=config["qc"]["max_genes"], + min_counts=config["qc"]["min_counts"], + max_counts=config["qc"]["max_counts"], + max_hb_pct=config["qc"]["max_hb_pct"], + expected_doublet_rate=config["qc"]["expected_doublet_rate"], + figure_dir=str(FIGURE_DIR), + log: + LOG_DIR / "step03_qc.log", + script: + "../scripts/qc.py" + + +# Step 4: Preprocessing +rule preprocess: + input: + CHECKPOINT_DIR / bids_checkpoint_name(DATASET, 3, "qc"), + output: + CHECKPOINT_DIR / bids_checkpoint_name(DATASET, 4, "preprocessed"), + params: + target_sum=config["preprocessing"]["target_sum"], + n_top_genes=config["preprocessing"]["n_top_genes"], + batch_key=config["preprocessing"]["batch_key"], + log: + LOG_DIR / "step04_preprocessing.log", + script: + "../scripts/preprocess.py" + + +# Step 5: Dimensionality Reduction +rule dimensionality_reduction: + input: + CHECKPOINT_DIR / bids_checkpoint_name(DATASET, 4, "preprocessed"), + output: + CHECKPOINT_DIR / bids_checkpoint_name(DATASET, 5, "dimreduced"), + params: + 
batch_key=config["dimred"]["batch_key"], + n_neighbors=config["dimred"]["n_neighbors"], + n_pcs=config["dimred"]["n_pcs"], + figure_dir=str(FIGURE_DIR), + log: + LOG_DIR / "step05_dimred.log", + script: + "../scripts/dimred.py" + + +# Step 6: Clustering +rule clustering: + input: + CHECKPOINT_DIR / bids_checkpoint_name(DATASET, 5, "dimreduced"), + output: + CHECKPOINT_DIR / bids_checkpoint_name(DATASET, 6, "clustered"), + params: + resolution=config["clustering"]["resolution"], + figure_dir=str(FIGURE_DIR), + log: + LOG_DIR / "step06_clustering.log", + script: + "../scripts/cluster.py" diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/pseudobulk.smk b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/pseudobulk.smk new file mode 100644 index 0000000..ad3be77 --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/pseudobulk.smk @@ -0,0 +1,37 @@ +"""Pseudobulking rule (Step 7) - Uses Snakemake checkpoint for dynamic cell types. + +This step aggregates single-cell data to pseudobulk samples per cell-type and donor. +It outputs a JSON file listing valid cell types, which enables dynamic downstream rules. 
+ +IMPORTANT: This uses 'checkpoint' instead of 'rule' because: +- The number of cell types is not known until this step completes +- Downstream rules (steps 8-11) need to run for each discovered cell type +- Snakemake's checkpoint mechanism re-evaluates the DAG after this step +""" + + +# Step 7: Pseudobulking (CHECKPOINT - enables dynamic per-cell-type rules) +checkpoint pseudobulk: + input: + # Step 3 provides raw counts in .X + qc_checkpoint=CHECKPOINT_DIR / bids_checkpoint_name(DATASET, 3, "qc"), + # Step 6 provides clustered data (for var_to_feature mapping) + clustered_checkpoint=CHECKPOINT_DIR / bids_checkpoint_name(DATASET, 6, "clustered"), + output: + # Main pseudobulk AnnData + pseudobulk=CHECKPOINT_DIR / bids_checkpoint_name(DATASET, 7, "pseudobulk"), + # JSON file listing valid cell types (enables dynamic downstream rules) + cell_types=CHECKPOINT_DIR / f"dataset-{DATASET}_step-07_cell_types.json", + # Gene name mapping (needed for DE analysis) + var_to_feature=CHECKPOINT_DIR / f"dataset-{DATASET}_step-07_var_to_feature.json", + params: + group_col=config["pseudobulk"]["group_col"], + donor_col=config["pseudobulk"]["donor_col"], + metadata_cols=config["pseudobulk"]["metadata_cols"], + min_cells=config["pseudobulk"]["min_cells"], + min_samples_per_cell_type=config["min_samples_per_cell_type"], + figure_dir=str(FIGURE_DIR), + log: + LOG_DIR / "step07_pseudobulk.log", + script: + "../scripts/pseudobulk.py" diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/aggregate_results.py b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/aggregate_results.py new file mode 100644 index 0000000..4686bea --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/aggregate_results.py @@ -0,0 +1,56 @@ +"""Snakemake script for aggregating all per-cell-type results. + +This script runs after all per-cell-type analyses are complete. +It creates a summary file indicating successful completion. 
+""" +# ruff: noqa: F821 + +from datetime import datetime +from pathlib import Path + + +def main(): + """Aggregate results and create completion marker.""" + output_file = Path(snakemake.output[0]) + input_files = [Path(f) for f in snakemake.input] + + print(f"Aggregating {len(input_files)} result files...") + + # Group files by cell type + cell_types = set() + for f in input_files: + # Extract cell type from path: results/per_cell_type/{cell_type}/... + parts = f.parts + if "per_cell_type" in parts: + idx = parts.index("per_cell_type") + if idx + 1 < len(parts): + cell_types.add(parts[idx + 1]) + + # Create summary + summary = { + "workflow": "snakemake_scrna_immune_aging", + "completed_at": datetime.now().isoformat(), + "cell_types_analyzed": sorted(cell_types), + "total_cell_types": len(cell_types), + "output_files": len(input_files), + } + + # Write summary to output file + output_file.parent.mkdir(parents=True, exist_ok=True) + with open(output_file, "w") as f: + f.write("Workflow completed successfully!\n") + f.write(f"Completed at: {summary['completed_at']}\n") + f.write(f"Cell types analyzed: {summary['total_cell_types']}\n") + f.write("\n") + f.write("Cell types:\n") + for ct in summary["cell_types_analyzed"]: + f.write(f" - {ct}\n") + f.write("\n") + f.write(f"Total output files: {summary['output_files']}\n") + + print(f"Summary written to: {output_file}") + print(f"Analyzed {len(cell_types)} cell types") + + +if __name__ == "__main__": + main() diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/cluster.py b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/cluster.py new file mode 100644 index 0000000..243a107 --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/cluster.py @@ -0,0 +1,43 @@ +"""Snakemake script for Step 6: Clustering.""" + +from pathlib import Path + +from BetterCodeBetterScience.rnaseq.modular_workflow.clustering import ( + run_clustering_pipeline, +) +from 
BetterCodeBetterScience.rnaseq.stateless_workflow.checkpoint import ( + load_checkpoint, + save_checkpoint, +) + + +def main(): + """Run clustering pipeline.""" + # ruff: noqa: F821 + input_file = Path(snakemake.input[0]) + output_file = Path(snakemake.output[0]) + + # Get parameters + resolution = snakemake.params.resolution + figure_dir = ( + Path(snakemake.params.figure_dir) if snakemake.params.figure_dir else None + ) + + print(f"Loading data from: {input_file}") + adata = load_checkpoint(input_file) + print(f"Loaded dataset: {adata}") + + print("Running clustering pipeline...") + adata = run_clustering_pipeline( + adata, + resolution=resolution, + figure_dir=figure_dir, + ) + + # Save checkpoint + save_checkpoint(adata, output_file) + print(f"Saved checkpoint: {output_file}") + + +if __name__ == "__main__": + main() diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/differential_expression.py b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/differential_expression.py new file mode 100644 index 0000000..193b423 --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/differential_expression.py @@ -0,0 +1,79 @@ +"""Snakemake script for Step 8: Differential Expression.""" + +import json +from pathlib import Path + +from BetterCodeBetterScience.rnaseq.modular_workflow.differential_expression import ( + run_differential_expression_pipeline, +) +from BetterCodeBetterScience.rnaseq.stateless_workflow.checkpoint import ( + load_checkpoint, + save_checkpoint, +) + + +def unsanitize_cell_type(sanitized: str, cell_types_file: Path) -> str: + """Convert sanitized cell type back to original name.""" + # ruff: noqa: F821 + with open(cell_types_file) as f: + data = json.load(f) + # Reverse lookup + for original, sanitized_name in data["sanitized_names"].items(): + if sanitized_name == sanitized: + return original + raise ValueError(f"Unknown sanitized cell type: {sanitized}") + + +def main(): + """Run 
differential expression for a cell type.""" + pseudobulk_file = Path(snakemake.input.pseudobulk) + var_to_feature_file = Path(snakemake.input.var_to_feature) + cell_types_file = Path(snakemake.input.cell_types) + + output_stat_res = Path(snakemake.output.stat_res) + output_de_results = Path(snakemake.output.de_results) + output_counts_df = Path(snakemake.output.counts_df) + + # Get parameters + sanitized_cell_type = snakemake.params.cell_type + design_factors = snakemake.params.design_factors + n_cpus = snakemake.params.n_cpus + + # Get original cell type name + cell_type = unsanitize_cell_type(sanitized_cell_type, cell_types_file) + + print(f"Running DE for cell type: {cell_type}") + print(f"(sanitized: {sanitized_cell_type})") + + # Create output directory + output_stat_res.parent.mkdir(parents=True, exist_ok=True) + + # Load pseudobulk data + pb_adata = load_checkpoint(pseudobulk_file) + + # Load var_to_feature mapping + with open(var_to_feature_file) as f: + var_to_feature = json.load(f) + + # Run differential expression + stat_res, de_results, counts_df = run_differential_expression_pipeline( + pb_adata, + cell_type=cell_type, + design_factors=design_factors, + var_to_feature=var_to_feature, + n_cpus=n_cpus, + ) + + # Save outputs + save_checkpoint(stat_res, output_stat_res) + de_results.to_parquet(output_de_results) + counts_df.to_parquet(output_counts_df) + + print(f"DE results saved for: {cell_type}") + print(f" - stat_res: {output_stat_res}") + print(f" - de_results: {output_de_results}") + print(f" - counts_df: {output_counts_df}") + + +if __name__ == "__main__": + main() diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/dimred.py b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/dimred.py new file mode 100644 index 0000000..fc20290 --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/dimred.py @@ -0,0 +1,47 @@ +"""Snakemake script for Step 5: Dimensionality Reduction.""" + +from pathlib 
import Path + +from BetterCodeBetterScience.rnaseq.modular_workflow.dimensionality_reduction import ( + run_dimensionality_reduction_pipeline, +) +from BetterCodeBetterScience.rnaseq.stateless_workflow.checkpoint import ( + load_checkpoint, + save_checkpoint, +) + + +def main(): + """Run dimensionality reduction pipeline.""" + # ruff: noqa: F821 + input_file = Path(snakemake.input[0]) + output_file = Path(snakemake.output[0]) + + # Get parameters + batch_key = snakemake.params.batch_key + n_neighbors = snakemake.params.n_neighbors + n_pcs = snakemake.params.n_pcs + figure_dir = ( + Path(snakemake.params.figure_dir) if snakemake.params.figure_dir else None + ) + + print(f"Loading data from: {input_file}") + adata = load_checkpoint(input_file) + print(f"Loaded dataset: {adata}") + + print("Running dimensionality reduction pipeline...") + adata = run_dimensionality_reduction_pipeline( + adata, + batch_key=batch_key, + n_neighbors=n_neighbors, + n_pcs=n_pcs, + figure_dir=figure_dir, + ) + + # Save checkpoint + save_checkpoint(adata, output_file) + print(f"Saved checkpoint: {output_file}") + + +if __name__ == "__main__": + main() diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/download.py b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/download.py new file mode 100644 index 0000000..659b04d --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/download.py @@ -0,0 +1,23 @@ +"""Snakemake script for Step 1: Data Download.""" + +from pathlib import Path + +from BetterCodeBetterScience.rnaseq.modular_workflow.data_loading import download_data + + +def main(): + """Download data file if it doesn't exist.""" + # ruff: noqa: F821 + datafile = Path(snakemake.output[0]) + url = snakemake.params.url + + print(f"Downloading data from: {url}") + print(f"Output file: {datafile}") + + download_data(datafile, url) + + print(f"Download complete: {datafile}") + + +if __name__ == "__main__": + main() diff --git 
a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/enrichr.py b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/enrichr.py new file mode 100644 index 0000000..a076199 --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/enrichr.py @@ -0,0 +1,54 @@ +"""Snakemake script for Step 10: Overrepresentation Analysis (Enrichr).""" + +from pathlib import Path + +import pandas as pd + +from BetterCodeBetterScience.rnaseq.modular_workflow.overrepresentation_analysis import ( + run_overrepresentation_pipeline, +) +from BetterCodeBetterScience.rnaseq.stateless_workflow.checkpoint import save_checkpoint + + +def main(): + """Run Enrichr overrepresentation analysis for a cell type.""" + # ruff: noqa: F821 + de_results_file = Path(snakemake.input.de_results) + output_enr_up = Path(snakemake.output.enr_up) + output_enr_down = Path(snakemake.output.enr_down) + + # Get parameters + cell_type = snakemake.params.cell_type + gene_sets = snakemake.params.gene_sets + padj_threshold = snakemake.params.padj_threshold + n_top = snakemake.params.n_top + figure_dir = Path(snakemake.params.figure_dir) + + print(f"Running Enrichr for cell type: {cell_type}") + + # Create output directories + output_enr_up.parent.mkdir(parents=True, exist_ok=True) + figure_dir.mkdir(parents=True, exist_ok=True) + + # Load DE results + de_results = pd.read_parquet(de_results_file) + + # Run overrepresentation analysis + enr_up, enr_down = run_overrepresentation_pipeline( + de_results, + gene_sets=gene_sets, + padj_threshold=padj_threshold, + n_top=n_top, + figure_dir=figure_dir, + ) + + # Save results + save_checkpoint(enr_up, output_enr_up) + save_checkpoint(enr_down, output_enr_down) + print("Enrichr results saved:") + print(f" - enr_up: {output_enr_up}") + print(f" - enr_down: {output_enr_down}") + + +if __name__ == "__main__": + main() diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/filter.py 
b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/filter.py new file mode 100644 index 0000000..9a37dd6 --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/filter.py @@ -0,0 +1,48 @@ +"""Snakemake script for Step 2: Data Filtering.""" + +from pathlib import Path + +from BetterCodeBetterScience.rnaseq.modular_workflow.data_filtering import ( + run_filtering_pipeline, +) +from BetterCodeBetterScience.rnaseq.modular_workflow.data_loading import ( + load_lazy_anndata, +) +from BetterCodeBetterScience.rnaseq.stateless_workflow.checkpoint import save_checkpoint + + +def main(): + """Load data and run filtering pipeline.""" + # ruff: noqa: F821 + input_file = Path(snakemake.input[0]) + output_file = Path(snakemake.output[0]) + + # Get parameters + cutoff_percentile = snakemake.params.cutoff_percentile + min_cells_per_celltype = snakemake.params.min_cells_per_celltype + percent_donors = snakemake.params.percent_donors + figure_dir = ( + Path(snakemake.params.figure_dir) if snakemake.params.figure_dir else None + ) + + print(f"Loading data from: {input_file}") + adata = load_lazy_anndata(input_file) + print(f"Loaded dataset: {adata}") + + print("Running filtering pipeline...") + adata = run_filtering_pipeline( + adata, + cutoff_percentile=cutoff_percentile, + min_cells_per_celltype=min_cells_per_celltype, + percent_donors=percent_donors, + figure_dir=figure_dir, + ) + print(f"Dataset after filtering: {adata}") + + # Save checkpoint + save_checkpoint(adata, output_file) + print(f"Saved checkpoint: {output_file}") + + +if __name__ == "__main__": + main() diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/gsea.py b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/gsea.py new file mode 100644 index 0000000..2d956c0 --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/gsea.py @@ -0,0 +1,48 @@ +"""Snakemake script for Step 9: Pathway Analysis (GSEA).""" + +from pathlib 
import Path + +import pandas as pd + +from BetterCodeBetterScience.rnaseq.modular_workflow.pathway_analysis import ( + run_gsea_pipeline, +) +from BetterCodeBetterScience.rnaseq.stateless_workflow.checkpoint import save_checkpoint + + +def main(): + """Run GSEA pathway analysis for a cell type.""" + # ruff: noqa: F821 + de_results_file = Path(snakemake.input.de_results) + output_file = Path(snakemake.output.gsea_results) + + # Get parameters + cell_type = snakemake.params.cell_type + gene_sets = snakemake.params.gene_sets + n_top = snakemake.params.n_top + figure_dir = Path(snakemake.params.figure_dir) + + print(f"Running GSEA for cell type: {cell_type}") + + # Create output directories + output_file.parent.mkdir(parents=True, exist_ok=True) + figure_dir.mkdir(parents=True, exist_ok=True) + + # Load DE results + de_results = pd.read_parquet(de_results_file) + + # Run GSEA + gsea_results = run_gsea_pipeline( + de_results, + gene_sets=gene_sets, + n_top=n_top, + figure_dir=figure_dir, + ) + + # Save results + save_checkpoint(gsea_results, output_file) + print(f"GSEA results saved: {output_file}") + + +if __name__ == "__main__": + main() diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/prediction.py b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/prediction.py new file mode 100644 index 0000000..5ff128b --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/prediction.py @@ -0,0 +1,79 @@ +"""Snakemake script for Step 11: Predictive Modeling.""" + +import json +from pathlib import Path + +import pandas as pd + +from BetterCodeBetterScience.rnaseq.modular_workflow.predictive_modeling import ( + run_predictive_modeling_pipeline, +) +from BetterCodeBetterScience.rnaseq.stateless_workflow.checkpoint import ( + load_checkpoint, + save_checkpoint, +) + + +def unsanitize_cell_type(sanitized: str, cell_types_file: Path) -> str: + """Convert sanitized cell type back to original name.""" + # ruff: noqa: F821 + 
with open(cell_types_file) as f: + data = json.load(f) + # Reverse lookup + for original, sanitized_name in data["sanitized_names"].items(): + if sanitized_name == sanitized: + return original + raise ValueError(f"Unknown sanitized cell type: {sanitized}") + + +def main(): + """Run predictive modeling for a cell type.""" + counts_df_file = Path(snakemake.input.counts_df) + pseudobulk_file = Path(snakemake.input.pseudobulk) + cell_types_file = Path(snakemake.input.cell_types) + output_file = Path(snakemake.output.prediction_results) + + # Get parameters + sanitized_cell_type = snakemake.params.cell_type + n_splits = snakemake.params.n_splits + figure_dir = Path(snakemake.params.figure_dir) + + # Get original cell type name + cell_type = unsanitize_cell_type(sanitized_cell_type, cell_types_file) + + print(f"Running predictive modeling for cell type: {cell_type}") + + # Create output directories + output_file.parent.mkdir(parents=True, exist_ok=True) + figure_dir.mkdir(parents=True, exist_ok=True) + + # Load counts + counts_df = pd.read_parquet(counts_df_file) + + # Load pseudobulk to get metadata + pb_adata = load_checkpoint(pseudobulk_file) + + # Get metadata for this cell type + pb_adata_ct = pb_adata[pb_adata.obs["cell_type"] == cell_type].copy() + pb_adata_ct.obs["age"] = ( + pb_adata_ct.obs["development_stage"] + .str.extract(r"(\d+)-year-old")[0] + .astype(float) + ) + metadata = pb_adata_ct.obs.copy() + + # Run predictive modeling + prediction_results = run_predictive_modeling_pipeline( + counts_df, + metadata, + n_splits=n_splits, + figure_dir=figure_dir, + ) + + # Save results + save_checkpoint(prediction_results, output_file) + print(f"Prediction results saved: {output_file}") + + +if __name__ == "__main__": + main() diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/preprocess.py b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/preprocess.py new file mode 100644 index 0000000..dfdee46 --- /dev/null +++ 
b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/preprocess.py @@ -0,0 +1,48 @@ +"""Snakemake script for Step 4: Preprocessing.""" + +from pathlib import Path + +from BetterCodeBetterScience.rnaseq.modular_workflow.preprocessing import ( + run_preprocessing_pipeline, +) +from BetterCodeBetterScience.rnaseq.stateless_workflow.checkpoint import ( + load_checkpoint, + save_checkpoint, +) + + +def main(): + """Run preprocessing pipeline.""" + # ruff: noqa: F821 + input_file = Path(snakemake.input[0]) + output_file = Path(snakemake.output[0]) + + # Get parameters + target_sum = snakemake.params.target_sum + n_top_genes = snakemake.params.n_top_genes + batch_key = snakemake.params.batch_key + + print(f"Loading data from: {input_file}") + adata = load_checkpoint(input_file) + print(f"Loaded dataset: {adata}") + + print("Running preprocessing pipeline...") + adata = run_preprocessing_pipeline( + adata, + target_sum=target_sum, + n_top_genes=n_top_genes, + batch_key=batch_key, + ) + + # Remove counts layer after preprocessing to save space + if "counts" in adata.layers: + del adata.layers["counts"] + print("Removed counts layer to save checkpoint space") + + # Save checkpoint + save_checkpoint(adata, output_file) + print(f"Saved checkpoint: {output_file}") + + +if __name__ == "__main__": + main() diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/pseudobulk.py b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/pseudobulk.py new file mode 100644 index 0000000..8d19de4 --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/pseudobulk.py @@ -0,0 +1,110 @@ +"""Snakemake script for Step 7: Pseudobulking. + +This script creates the pseudobulk data AND outputs a JSON file listing +all valid cell types, which enables downstream dynamic rules. 
+""" +# ruff: noqa: F821 + +import json +from pathlib import Path + +from BetterCodeBetterScience.rnaseq.modular_workflow.pseudobulk import ( + run_pseudobulk_pipeline, +) +from BetterCodeBetterScience.rnaseq.stateless_workflow.checkpoint import ( + load_checkpoint, + save_checkpoint, +) + + +def sanitize_cell_type(cell_type: str) -> str: + """Sanitize cell type name for filesystem use.""" + return cell_type.replace(" ", "_").replace(",", "").replace("-", "_") + + +def main(): + """Run pseudobulking pipeline and output cell types JSON.""" + qc_checkpoint = Path(snakemake.input.qc_checkpoint) + clustered_checkpoint = Path(snakemake.input.clustered_checkpoint) + output_pseudobulk = Path(snakemake.output.pseudobulk) + output_cell_types = Path(snakemake.output.cell_types) + output_var_to_feature = Path(snakemake.output.var_to_feature) + + # Get parameters + group_col = snakemake.params.group_col + donor_col = snakemake.params.donor_col + metadata_cols = snakemake.params.metadata_cols + min_cells = snakemake.params.min_cells + min_samples_per_cell_type = snakemake.params.min_samples_per_cell_type + figure_dir = ( + Path(snakemake.params.figure_dir) if snakemake.params.figure_dir else None + ) + + # Load step 3 checkpoint (raw counts in .X) + print(f"Loading raw counts from: {qc_checkpoint}") + adata_raw = load_checkpoint(qc_checkpoint) + print(f"Loaded: {adata_raw}") + + # Load clustered data to get var_to_feature mapping + print(f"Loading clustered data from: {clustered_checkpoint}") + adata_clustered = load_checkpoint(clustered_checkpoint) + var_to_feature = dict( + zip(adata_clustered.var_names, adata_clustered.var["feature_name"]) + ) + + # Run pseudobulking on raw counts + print("Running pseudobulking pipeline...") + pb_adata = run_pseudobulk_pipeline( + adata_raw, + group_col=group_col, + donor_col=donor_col, + metadata_cols=metadata_cols, + min_cells=min_cells, + figure_dir=figure_dir, + layer=None, # Use .X directly (raw counts) + ) + print(f"Pseudobulk data: 
{pb_adata}") + + # Save pseudobulk checkpoint + save_checkpoint(pb_adata, output_pseudobulk) + print(f"Saved pseudobulk checkpoint: {output_pseudobulk}") + + # Determine valid cell types (with sufficient samples) + all_cell_types = pb_adata.obs[group_col].unique().tolist() + cell_type_counts = pb_adata.obs[group_col].value_counts() + + valid_cell_types = [ + ct for ct in all_cell_types if cell_type_counts[ct] >= min_samples_per_cell_type + ] + skipped_cell_types = [ct for ct in all_cell_types if ct not in valid_cell_types] + + print(f"\nFound {len(all_cell_types)} cell types total") + print( + f"Valid cell types (>= {min_samples_per_cell_type} samples): {len(valid_cell_types)}" + ) + if skipped_cell_types: + print(f"Skipped cell types (insufficient samples): {skipped_cell_types}") + + # Write cell types JSON (enables dynamic rules) + cell_types_data = { + "all_cell_types": all_cell_types, + "valid_cell_types": valid_cell_types, + "skipped_cell_types": skipped_cell_types, + "cell_type_counts": {str(k): int(v) for k, v in cell_type_counts.items()}, + "min_samples_threshold": min_samples_per_cell_type, + # Include sanitized names for filesystem use + "sanitized_names": {ct: sanitize_cell_type(ct) for ct in valid_cell_types}, + } + + with open(output_cell_types, "w") as f: + json.dump(cell_types_data, f, indent=2) + print(f"Saved cell types JSON: {output_cell_types}") + + # Write var_to_feature mapping (needed for DE) + with open(output_var_to_feature, "w") as f: + json.dump(var_to_feature, f) + print(f"Saved var_to_feature mapping: {output_var_to_feature}") + + +if __name__ == "__main__": + main() diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/qc.py b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/qc.py new file mode 100644 index 0000000..d66066d --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/qc.py @@ -0,0 +1,54 @@ +"""Snakemake script for Step 3: Quality Control.""" + +from pathlib import 
Path + +from BetterCodeBetterScience.rnaseq.modular_workflow.quality_control import ( + run_qc_pipeline, +) +from BetterCodeBetterScience.rnaseq.stateless_workflow.checkpoint import ( + load_checkpoint, + save_checkpoint, +) + + +def main(): + """Run quality control pipeline.""" + # ruff: noqa: F821 + input_file = Path(snakemake.input[0]) + output_file = Path(snakemake.output[0]) + + # Get parameters + min_genes = snakemake.params.min_genes + max_genes = snakemake.params.max_genes + min_counts = snakemake.params.min_counts + max_counts = snakemake.params.max_counts + max_hb_pct = snakemake.params.max_hb_pct + expected_doublet_rate = snakemake.params.expected_doublet_rate + figure_dir = ( + Path(snakemake.params.figure_dir) if snakemake.params.figure_dir else None + ) + + print(f"Loading data from: {input_file}") + adata = load_checkpoint(input_file) + print(f"Loaded dataset: {adata}") + + print("Running QC pipeline...") + adata = run_qc_pipeline( + adata, + min_genes=min_genes, + max_genes=max_genes, + min_counts=min_counts, + max_counts=max_counts, + max_hb_pct=max_hb_pct, + expected_doublet_rate=expected_doublet_rate, + figure_dir=figure_dir, + ) + print(f"Dataset after QC: {adata}") + + # Save checkpoint + save_checkpoint(adata, output_file) + print(f"Saved checkpoint: {output_file}") + + +if __name__ == "__main__": + main() From f2052d2ac88878a5976532c5e604c8012a8fefae Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Tue, 23 Dec 2025 05:50:16 -0800 Subject: [PATCH 54/87] Fix feature_name KeyError in Snakemake pseudobulk step MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Get var_to_feature mapping from QC checkpoint (step 3) instead of clustered checkpoint (step 6), since HVG selection during preprocessing removes the feature_name column. 
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 --- .../rnaseq/snakemake_workflow/rules/pseudobulk.smk | 4 ++-- .../rnaseq/snakemake_workflow/scripts/pseudobulk.py | 13 +++++++------ 2 files changed, 9 insertions(+), 8 deletions(-) diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/pseudobulk.smk b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/pseudobulk.smk index ad3be77..ba9e165 100644 --- a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/pseudobulk.smk +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/pseudobulk.smk @@ -13,9 +13,9 @@ IMPORTANT: This uses 'checkpoint' instead of 'rule' because: # Step 7: Pseudobulking (CHECKPOINT - enables dynamic per-cell-type rules) checkpoint pseudobulk: input: - # Step 3 provides raw counts in .X + # Step 3 provides raw counts in .X and var_to_feature mapping qc_checkpoint=CHECKPOINT_DIR / bids_checkpoint_name(DATASET, 3, "qc"), - # Step 6 provides clustered data (for var_to_feature mapping) + # Step 6 listed for workflow ordering (ensures clustering completes first) clustered_checkpoint=CHECKPOINT_DIR / bids_checkpoint_name(DATASET, 6, "clustered"), output: # Main pseudobulk AnnData diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/pseudobulk.py b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/pseudobulk.py index 8d19de4..6caa564 100644 --- a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/pseudobulk.py +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/pseudobulk.py @@ -25,7 +25,8 @@ def sanitize_cell_type(cell_type: str) -> str: def main(): """Run pseudobulking pipeline and output cell types JSON.""" qc_checkpoint = Path(snakemake.input.qc_checkpoint) - clustered_checkpoint = Path(snakemake.input.clustered_checkpoint) + # Note: clustered_checkpoint is listed as input for dependency ordering + # but not used here - we use QC checkpoint for both 
counts and gene names output_pseudobulk = Path(snakemake.output.pseudobulk) output_cell_types = Path(snakemake.output.cell_types) output_var_to_feature = Path(snakemake.output.var_to_feature) @@ -40,17 +41,17 @@ def main(): Path(snakemake.params.figure_dir) if snakemake.params.figure_dir else None ) - # Load step 3 checkpoint (raw counts in .X) + # Load step 3 checkpoint (raw counts in .X, has feature_name annotations) print(f"Loading raw counts from: {qc_checkpoint}") adata_raw = load_checkpoint(qc_checkpoint) print(f"Loaded: {adata_raw}") - # Load clustered data to get var_to_feature mapping - print(f"Loading clustered data from: {clustered_checkpoint}") - adata_clustered = load_checkpoint(clustered_checkpoint) + # Get var_to_feature mapping from QC checkpoint (before HVG selection) + # Step 6 (clustered) may not have feature_name after preprocessing var_to_feature = dict( - zip(adata_clustered.var_names, adata_clustered.var["feature_name"]) + zip(adata_raw.var_names, adata_raw.var["feature_name"]) ) + print(f"Built var_to_feature mapping with {len(var_to_feature)} genes") # Run pseudobulking on raw counts print("Running pseudobulking pipeline...") From e7106a7f06d5b8c9bbdfed419f269d24dccc4839 Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Tue, 23 Dec 2025 05:58:05 -0800 Subject: [PATCH 55/87] Handle missing feature_name column in pseudobulk step MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Fall back to using var_names as feature names if the feature_name column doesn't exist in the checkpoint. This handles datasets where gene symbols are already stored in the var index. 
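A minimal sketch of the fallback described above, for illustration only. The helper name `build_var_to_feature` and the plain-dict `var_columns` argument are hypothetical stand-ins for the AnnData `.var` table used in the script; the actual change is in the diff below.

```python
from typing import Mapping, Sequence


def build_var_to_feature(
    var_names: Sequence[str],
    var_columns: Mapping[str, Sequence[str]],
) -> dict[str, str]:
    """Map variable IDs to feature names, falling back to the IDs themselves.

    `var_columns` is a hypothetical plain-dict stand-in for `adata.var`
    (column name -> column values), used here to keep the sketch
    dependency-free.
    """
    if "feature_name" in var_columns:
        # Preferred path: pair each variable ID with its feature_name entry.
        return dict(zip(var_names, var_columns["feature_name"]))
    # Fallback: assume var_names already hold the gene symbols.
    return dict(zip(var_names, var_names))
```

The fallback keeps the downstream JSON file in the same shape whether or not the column is present, so the DE step would not need to know which path was taken.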
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 --- .../snakemake_workflow/scripts/pseudobulk.py | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/pseudobulk.py b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/pseudobulk.py index 6caa564..e090b4e 100644 --- a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/pseudobulk.py +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/pseudobulk.py @@ -48,10 +48,17 @@ def main(): # Get var_to_feature mapping from QC checkpoint (before HVG selection) # Step 6 (clustered) may not have feature_name after preprocessing - var_to_feature = dict( - zip(adata_raw.var_names, adata_raw.var["feature_name"]) - ) - print(f"Built var_to_feature mapping with {len(var_to_feature)} genes") + # Fall back to var_names if feature_name column doesn't exist + if "feature_name" in adata_raw.var.columns: + var_to_feature = dict( + zip(adata_raw.var_names, adata_raw.var["feature_name"]) + ) + print("Built var_to_feature mapping from feature_name column") + else: + # Use var_names as feature names (they may already be gene symbols) + var_to_feature = dict(zip(adata_raw.var_names, adata_raw.var_names)) + print("No feature_name column found, using var_names as feature names") + print(f"var_to_feature mapping has {len(var_to_feature)} genes") # Run pseudobulking on raw counts print("Running pseudobulking pipeline...") From b81dfb625227dc543f91157439524377103aad6b Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Tue, 23 Dec 2025 06:07:38 -0800 Subject: [PATCH 56/87] Use step 2 (filtered) checkpoint for var_to_feature mapping MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Match Prefect workflow behavior: get feature_name from step 2 (filtered) checkpoint, not step 3 (QC). 
The feature_name column is available after filtering but may not be preserved through QC processing. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 --- .../snakemake_workflow/rules/pseudobulk.smk | 4 ++- .../snakemake_workflow/scripts/pseudobulk.py | 26 +++++++------------ 2 files changed, 13 insertions(+), 17 deletions(-) diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/pseudobulk.smk b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/pseudobulk.smk index ba9e165..678937e 100644 --- a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/pseudobulk.smk +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/pseudobulk.smk @@ -13,7 +13,9 @@ IMPORTANT: This uses 'checkpoint' instead of 'rule' because: # Step 7: Pseudobulking (CHECKPOINT - enables dynamic per-cell-type rules) checkpoint pseudobulk: input: - # Step 3 provides raw counts in .X and var_to_feature mapping + # Step 2 provides feature_name column for var_to_feature mapping + filtered_checkpoint=CHECKPOINT_DIR / bids_checkpoint_name(DATASET, 2, "filtered"), + # Step 3 provides raw counts in .X (after QC filtering) qc_checkpoint=CHECKPOINT_DIR / bids_checkpoint_name(DATASET, 3, "qc"), # Step 6 listed for workflow ordering (ensures clustering completes first) clustered_checkpoint=CHECKPOINT_DIR / bids_checkpoint_name(DATASET, 6, "clustered"), diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/pseudobulk.py b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/pseudobulk.py index e090b4e..8273b68 100644 --- a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/pseudobulk.py +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/pseudobulk.py @@ -24,9 +24,9 @@ def sanitize_cell_type(cell_type: str) -> str: def main(): """Run pseudobulking pipeline and output cell types JSON.""" + filtered_checkpoint = Path(snakemake.input.filtered_checkpoint) qc_checkpoint = 
Path(snakemake.input.qc_checkpoint) # Note: clustered_checkpoint is listed as input for dependency ordering - # but not used here - we use QC checkpoint for both counts and gene names output_pseudobulk = Path(snakemake.output.pseudobulk) output_cell_types = Path(snakemake.output.cell_types) output_var_to_feature = Path(snakemake.output.var_to_feature) @@ -41,25 +41,19 @@ def main(): Path(snakemake.params.figure_dir) if snakemake.params.figure_dir else None ) - # Load step 3 checkpoint (raw counts in .X, has feature_name annotations) + # Load step 2 checkpoint to get var_to_feature mapping (has feature_name) + print(f"Loading filtered data for var_to_feature: {filtered_checkpoint}") + adata_filtered = load_checkpoint(filtered_checkpoint) + var_to_feature = dict( + zip(adata_filtered.var_names, adata_filtered.var["feature_name"]) + ) + print(f"Built var_to_feature mapping with {len(var_to_feature)} genes") + + # Load step 3 checkpoint for raw counts (after QC filtering) print(f"Loading raw counts from: {qc_checkpoint}") adata_raw = load_checkpoint(qc_checkpoint) print(f"Loaded: {adata_raw}") - # Get var_to_feature mapping from QC checkpoint (before HVG selection) - # Step 6 (clustered) may not have feature_name after preprocessing - # Fall back to var_names if feature_name column doesn't exist - if "feature_name" in adata_raw.var.columns: - var_to_feature = dict( - zip(adata_raw.var_names, adata_raw.var["feature_name"]) - ) - print("Built var_to_feature mapping from feature_name column") - else: - # Use var_names as feature names (they may already be gene symbols) - var_to_feature = dict(zip(adata_raw.var_names, adata_raw.var_names)) - print("No feature_name column found, using var_names as feature names") - print(f"var_to_feature mapping has {len(var_to_feature)} genes") - # Run pseudobulking on raw counts print("Running pseudobulking pipeline...") pb_adata = run_pseudobulk_pipeline( From 32f342856d9933f9eaa226c3fed007a4c48e58cc Mon Sep 17 00:00:00 2001 From: 
Russell Poldrack Date: Tue, 23 Dec 2025 06:43:36 -0800 Subject: [PATCH 57/87] Add thread specifications to compute-intensive Snakemake rules MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Use workflow.cores for QC, preprocessing, and dimensionality reduction - Set NUMBA_NUM_THREADS and OMP_NUM_THREADS in dimred.py script This uses the cores specified via --cores on the command line, matching Prefect behavior where tasks have access to all CPUs by default. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 --- .../rnaseq/snakemake_workflow/rules/preprocessing.smk | 3 +++ .../rnaseq/snakemake_workflow/scripts/dimred.py | 9 ++++++++- 2 files changed, 11 insertions(+), 1 deletion(-) diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/preprocessing.smk b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/preprocessing.smk index 3545060..7991264 100644 --- a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/preprocessing.smk +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/preprocessing.smk @@ -45,6 +45,7 @@ rule quality_control: CHECKPOINT_DIR / bids_checkpoint_name(DATASET, 2, "filtered"), output: CHECKPOINT_DIR / bids_checkpoint_name(DATASET, 3, "qc"), + threads: workflow.cores params: min_genes=config["qc"]["min_genes"], max_genes=config["qc"]["max_genes"], @@ -65,6 +66,7 @@ rule preprocess: CHECKPOINT_DIR / bids_checkpoint_name(DATASET, 3, "qc"), output: CHECKPOINT_DIR / bids_checkpoint_name(DATASET, 4, "preprocessed"), + threads: workflow.cores params: target_sum=config["preprocessing"]["target_sum"], n_top_genes=config["preprocessing"]["n_top_genes"], @@ -81,6 +83,7 @@ rule dimensionality_reduction: CHECKPOINT_DIR / bids_checkpoint_name(DATASET, 4, "preprocessed"), output: CHECKPOINT_DIR / bids_checkpoint_name(DATASET, 5, "dimreduced"), + threads: workflow.cores params: batch_key=config["dimred"]["batch_key"], 
n_neighbors=config["dimred"]["n_neighbors"], diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/dimred.py b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/dimred.py index fc20290..7a4c52c 100644 --- a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/dimred.py +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/dimred.py @@ -1,7 +1,13 @@ """Snakemake script for Step 5: Dimensionality Reduction.""" +# ruff: noqa: F821 +import os from pathlib import Path +# Set thread count for numba/pynndescent before importing scanpy +os.environ["NUMBA_NUM_THREADS"] = str(snakemake.threads) +os.environ["OMP_NUM_THREADS"] = str(snakemake.threads) + from BetterCodeBetterScience.rnaseq.modular_workflow.dimensionality_reduction import ( run_dimensionality_reduction_pipeline, ) @@ -13,7 +19,8 @@ def main(): """Run dimensionality reduction pipeline.""" - # ruff: noqa: F821 + print(f"Running with {snakemake.threads} threads") + input_file = Path(snakemake.input[0]) output_file = Path(snakemake.output[0]) From 2ba2d788e97713992e279a31a55ee088cf79119d Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Tue, 23 Dec 2025 06:53:03 -0800 Subject: [PATCH 58/87] Create output directories in Snakemake scripts MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add directory creation for checkpoint and figure directories in all scripts that save outputs. This prevents FileNotFoundError when the workflow runs from a clean state. 
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 --- .../rnaseq/snakemake_workflow/scripts/cluster.py | 5 +++++ .../rnaseq/snakemake_workflow/scripts/dimred.py | 5 +++++ .../rnaseq/snakemake_workflow/scripts/filter.py | 5 +++++ .../rnaseq/snakemake_workflow/scripts/pseudobulk.py | 5 +++++ .../rnaseq/snakemake_workflow/scripts/qc.py | 5 +++++ 5 files changed, 25 insertions(+) diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/cluster.py b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/cluster.py index 243a107..4a67cb4 100644 --- a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/cluster.py +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/cluster.py @@ -23,6 +23,11 @@ def main(): Path(snakemake.params.figure_dir) if snakemake.params.figure_dir else None ) + # Create output directories + output_file.parent.mkdir(parents=True, exist_ok=True) + if figure_dir: + figure_dir.mkdir(parents=True, exist_ok=True) + print(f"Loading data from: {input_file}") adata = load_checkpoint(input_file) print(f"Loaded dataset: {adata}") diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/dimred.py b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/dimred.py index 7a4c52c..4ce8f1e 100644 --- a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/dimred.py +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/dimred.py @@ -32,6 +32,11 @@ def main(): Path(snakemake.params.figure_dir) if snakemake.params.figure_dir else None ) + # Create output directories + output_file.parent.mkdir(parents=True, exist_ok=True) + if figure_dir: + figure_dir.mkdir(parents=True, exist_ok=True) + print(f"Loading data from: {input_file}") adata = load_checkpoint(input_file) print(f"Loaded dataset: {adata}") diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/filter.py 
b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/filter.py index 9a37dd6..4935521 100644 --- a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/filter.py +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/filter.py @@ -25,6 +25,11 @@ def main(): Path(snakemake.params.figure_dir) if snakemake.params.figure_dir else None ) + # Create output directories + output_file.parent.mkdir(parents=True, exist_ok=True) + if figure_dir: + figure_dir.mkdir(parents=True, exist_ok=True) + print(f"Loading data from: {input_file}") adata = load_lazy_anndata(input_file) print(f"Loaded dataset: {adata}") diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/pseudobulk.py b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/pseudobulk.py index 8273b68..d8a19bb 100644 --- a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/pseudobulk.py +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/pseudobulk.py @@ -41,6 +41,11 @@ def main(): Path(snakemake.params.figure_dir) if snakemake.params.figure_dir else None ) + # Create output directories + output_pseudobulk.parent.mkdir(parents=True, exist_ok=True) + if figure_dir: + figure_dir.mkdir(parents=True, exist_ok=True) + # Load step 2 checkpoint to get var_to_feature mapping (has feature_name) print(f"Loading filtered data for var_to_feature: {filtered_checkpoint}") adata_filtered = load_checkpoint(filtered_checkpoint) diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/qc.py b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/qc.py index d66066d..4839258 100644 --- a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/qc.py +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/qc.py @@ -28,6 +28,11 @@ def main(): Path(snakemake.params.figure_dir) if snakemake.params.figure_dir else None ) + # Create output directories + output_file.parent.mkdir(parents=True, exist_ok=True) + if 
figure_dir: + figure_dir.mkdir(parents=True, exist_ok=True) + print(f"Loading data from: {input_file}") adata = load_checkpoint(input_file) print(f"Loaded dataset: {adata}") From a531c9825e18709134e6d65ebeec85329dd081db Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Tue, 23 Dec 2025 07:21:53 -0800 Subject: [PATCH 59/87] initial add --- .../rnaseq/snakemake_workflow/Makefile | 5 +++++ 1 file changed, 5 insertions(+) create mode 100644 src/BetterCodeBetterScience/rnaseq/snakemake_workflow/Makefile diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/Makefile b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/Makefile new file mode 100644 index 0000000..52d3e71 --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/Makefile @@ -0,0 +1,5 @@ +include ../../../../.env +export + +rulegraph: + snakemake --rulegraph --config datadir=$(DATADIR)/immune_aging/workflow/ --cores 2 | dot -Tpng > rulegraph.png From 263bdde530ddf60d65dbc32e68d5306110ecff07 Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Tue, 23 Dec 2025 07:43:37 -0800 Subject: [PATCH 60/87] intermediate progress, full draft of workflow engine intro --- book/workflows.md | 77 ++++++++++++++++++++++++++++++----------------- 1 file changed, 50 insertions(+), 27 deletions(-) diff --git a/book/workflows.md b/book/workflows.md index 9f7c432..fd0112c 100644 --- a/book/workflows.md +++ b/book/workflows.md @@ -112,6 +112,8 @@ What I found as I developed the workflow is that I increasingly ran into problem #### Converting from Jupyter notebook to a runnable python script +As we discussed in an earlier chapter, converting a Jupyter notebook to a pure python script is easy using `jupytext`. This results in a script that can be run from the command line. However, there can be some commands that will block execution of the script; in particular, plotting commands can open windows that will block execution until they are closed. 
To prevent this, and to ensure that the results of the plots are saved for later examination, I replaced all of the `plt.show()` commands that display a figure to the screen with `plt.savefig()` commands that save the figures to a file in the results directory. (This was an easy job for the Copilot agent to complete.) + - had to prevent plots from being displayed because this blocked execution - used copilot to find and fix all plotting commands to save them to file rather than showing @@ -144,8 +146,6 @@ In addition to a conceptual breakdown, there are also other reasons that one mig - There may be sections where one might wish to swap in a new method or different parameterization. - There may be points where the output could be reusable elsewhere. - - ## Stateless workflows I asked Claude Code to help modularize the monolithic workflow, using a prompt that provided the conceptual breakdown described above. The resulting code (found at XXX - link to commit 678983e1c337b6a23b0f35cfb974a87587cfd13e) ran correctly, but crashed about two hours into the process due to a resource issue that appeared to be due to asking for too many CPU cores in the differential expression analysis. This left me in the situation of having to rerun the entire two hours of preliminary workflow simply to get to a point where I could test my fix for the differential expression component, which is not a particularly efficient way of coding. The problem here is that the workflow execution is *stateful*, in the sense that the previous steps need to be rerun prior to performing the current step in order to establish the required objects in memory. The solution to this problem is to implement the workflow in a *stateless* way, which doesn't require that earlier steps be rerun if they have already been completed. One way to do this is by implementing a process called *checkpointing*, in which intermediate results are stored for each step. 
These can then be used to start the workflow at any point without having to rerun all of the previous steps. @@ -263,11 +263,59 @@ Another potentially helpful solution is to compress the checkpoint data if they Combining these strategies of reducing data duplication, eliminating some intermediate checkpoints, and compressing the stored data, our final pipeline generates about 13 GB worth of checkpoint data, substantially smaller than the initial 64 GB. With all checkpoints generated, the entire workflow completes in less than four minutes, with only three time-consuming steps being rerun each time. The initial execution of the workflow is a few minutes longer due to the extra time needed to read and write compressed checkpoint files, but these few minutes are hardly noticeable for a workflow that takes more than two hours to complete. +The use of a modular architecture for our stateless workflow helps to separate the actual workflow components from the execution logic of the workflow. One important benefit of this is that it allows us to plug those modules into any other workflow system, and as long as the inputs are correct it should work. We will see that next when we create new versions of this workflow using two common workflow engines. ### Using a workflow engine +There is a wide variety of workflow engines available for data analysis workflows, most of which are centered around the concept of an "execution graph." This is a graph in the sense described by graph theory, which refers to a set of nodes that are connected by lines (known as "edges"). Workflow execution graphs are a particular kind of graph known as a *directed acyclic graph*, or *DAG* for short. Each node in the graph represents a single step in the workflow, and each edge represents the dependency relationships that exist between nodes. DAGs have two important features. First, the edges are directed, which means that they move in one direction that is represented graphically as an arrow. 
These represent the dependencies within the workflow. For example, in our workflow step 1 (obtaining the data) must occur before step 2 (filtering the data), so the graph would have an edge from step 1 with an arrow pointing at step 2. Second, the graph is *acyclic*, which means that it doesn't have any cycles, that is, it never circles back on itself. Cycles would be problematic, since they could result in workflows that executed in an infinite loop as the cycle repeated itself. + +Most workflow engines provide tools to visualize a workflow as a DAG. #DAG-fig shows our example workflow visualized using the Snakemake tool that we will introduce below: + +```{figure} images/snakemake-DAG.png +:label: DAG-fig +:align: center +:width: 300px + +The execution graph for the RNA-seq analysis workflow visualized as a DAG. +``` + +The use of DAGs to represent workflows provides a number of important benefits: + +- The engine can identify independent pathways through the graph, which can then be executed in parallel +- If one node of the graph changes, the engine can identify which downstream nodes need to be rerun +- If a node fails, the engine can continue with executing the nodes that don't depend on the failed node either directly or indirectly + +Another benefit of using a workflow engine is that they generally deal automatically with checkpointing and caching of intermediate results. + +#### General-purpose versus domain-specific workflow engines + +With the growth of data science within industry and research, there has been an explosion of new workflow management systems that aim to solve particular problems; a list of these can be found at [awesome-workflow-engines](https://github.com/meirwah/awesome-workflow-engines). One important distinction between engines is the degree to which the workflow definition is built into the code, or whether it is defined in a *domain-specific language* (DSL). 
We will look at two examples below, one of which (Prefect) builds the workflow details in the code, and the other (Snakemake) uses a specialized syntax built on Python to define the workflow. + +It's also worth noting that there are a number of domain-specific workflow engines that are specialized for particular kinds of data and workflows. Examples include [Galaxy](https://galaxyproject.org/) which is specialized for bioinformatics and genomics, and [Nipype](https://nipype.readthedocs.io/en/latest/index.html) which is specialized for neuroimaging analysis workflows. If your research community uses one of these then it's worth exploring that engine as your first option, since it will probably be well supported within the community. However, a benefit of using a general-purpose engine is that they will often be better maintained and supported, and AI tools will likely have more examples to work from in generating workflows. + +#### A language-specific workflow management example: Prefect + + + + +#### A general-purpose workflow management example: Snakemake +The first time that I ran my workflow using snakemake, I noticed that it was substantially slower than when I ran it with Prefect. When I asked Claude about this, it gave me a reasonable answer: + +> The Snakemake rules don't specify a threads directive. By default, Snakemake runs each job with 1 thread, which limits parallelization of computationally intensive operations like neighbor computation and UMAP (which use numba internally). + +I asked Claude to fix this, and it returned the following change: + + +> Fixes applied: +> 1. Added threads: 8 to dimensionality_reduction rule +> 2. Added threads: 8 to quality_control rule (Scrublet) +> 3. Added threads: 4 to preprocess rule +> 4. Set NUMBA_NUM_THREADS and OMP_NUM_THREADS environment variables in dimred.py +> In contrast, Prefect tasks run in the main process with access to all CPUs by default, which is why it was faster. 
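As a rough illustration of item 4 in the list quoted above, the following sketch sets the thread-count environment variables before any numba-based library is imported. The helper names `resolve_threads` and `configure_thread_env` are hypothetical and do not appear in the workflow code. Clamping the request to `os.cpu_count()` is an added assumption for illustration, not something the quoted fix does.

```python
import os


def resolve_threads(requested: int) -> int:
    """Clamp a requested thread count to the CPUs reported by the OS.

    Hypothetical helper: the clamping policy is an assumption for this sketch.
    """
    if requested < 1:
        raise ValueError("requested must be at least 1")
    available = os.cpu_count() or 1  # os.cpu_count() can return None
    return min(requested, available)


def configure_thread_env(n_threads: int) -> dict[str, str]:
    """Export thread limits; call this before importing numba-based libraries."""
    settings = {
        "NUMBA_NUM_THREADS": str(n_threads),
        "OMP_NUM_THREADS": str(n_threads),
    }
    os.environ.update(settings)
    return settings


# Ask for 8 threads, but never more than the machine reports.
n_threads = resolve_threads(8)
thread_settings = configure_thread_env(n_threads)
print(f"Running with {n_threads} threads")
```

Inside a Snakemake script the requested value would presumably come from `snakemake.threads`, as in the `dimred.py` change earlier in this patch series, rather than from a hard-coded number.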
+ +This solves the problem but it's an odd choice: in particular, it will probably fail if there are fewer than 8 threads available on the system. Snakemake actually take a command line argument (`--cores`) to specify the number of cores to use, so I instead asked Claude to have Snakemake use the number of cores specified at the command line rather than an arbitrary number that might fail if the requested number of cores are not available. ## Modularity and reusability @@ -326,14 +374,6 @@ also look at arrow for columnar data - look into arrow immutability - dask, xarray -## Checkpointing - -- pipeline state should be files on disk, not in memory -- functions don't pass large objects in memory, they simply pass file names -- for modern formats like parquet (others?) reading is very fast so the penalty is minimal - -## Precomputing expensive/common operations - ## Tracking provenance @@ -354,8 +394,6 @@ also look at arrow for columnar data - look into arrow immutability https://workflowhub.eu/ - - use narps as the working example - ## Error handling and robustness @@ -374,21 +412,6 @@ https://workflowhub.eu/ -## Simple workflow management with Makefiles - - - -## Python workflow management with checkpoints - - - -## Workflow management systems for complex workflows - -- introduce DAGs - -- general purpose vs domain specific - - overview various engines - From af0ce0b7ee0db50924a533098c00bb546112e99a Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Tue, 23 Dec 2025 08:30:47 -0800 Subject: [PATCH 61/87] clean up immutability section --- book/workflows.md | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/book/workflows.md b/book/workflows.md index fd0112c..0646b96 100644 --- a/book/workflows.md +++ b/book/workflows.md @@ -108,14 +108,17 @@ I developed the initial version of this workflow as many researchers would: by c #### The problem of in-place operations -What I found as I developed the workflow is that I increasingly ran into 
problems that arose because the state of particular objects had changed. This occurred for two reasons at different points. In some cases it occurred because I saved a new version of the object to the same name, resulting in an object with different structure than before. +What I found as I developed the workflow is that I increasingly ran into problems that arose because the state of particular objects had changed. This occurred for two reasons at different points. In some cases it occurred because I saved a new version of the object to the same name, resulting in an object with different structure than before. Second, and more insidiously, it occurred when an object passed into a function is modified by the function internally. This is known as an *in-place* operation, in which a function modifies an object directly rather than returning a new object that can be assigned to a variable. -#### Converting from Jupyter notebook to a runnable python script +In-place operations can make code particularly difficult to debug in the context of a Jupyter notebook, because it's a case where out-of-order execution can result in very confusing results or errors, since the changes that were made in-place may not be obvious. For this reason, I generally avoid any kind of in-place operations if possible. Rather, any functions should immediately create a copy of the object that was passed in, and then do its work on that copy, which is returned at the end of the function for assignment to a new variable. One can then re-assign it to the same variable name if desired, which is more transparent than an in-place operation but still makes the workflow dependent on the exact state of execution and can lead to confusion when debugging. Some packages allow a feature called "copy-on-write" which defers actually copying the data in memory until it is actually modified, which can make copying more efficient. 
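The difference between the two styles described above can be sketched with plain Python containers. This is a minimal illustration, not code from the workflow: the function names and the `record` layout are hypothetical, and `copy.deepcopy` stands in for whatever copy mechanism a given package provides.

```python
import copy


def normalize_inplace(record: dict) -> None:
    """In-place version: silently changes the caller's object."""
    total = sum(record["counts"])
    record["counts"] = [c / total for c in record["counts"]]
    record["normalized"] = True


def normalize(record: dict) -> dict:
    """Copying version: works on a copy and returns it, leaving the input alone."""
    result = copy.deepcopy(record)
    total = sum(result["counts"])
    result["counts"] = [c / total for c in result["counts"]]
    result["normalized"] = True
    return result


raw = {"counts": [1, 3], "normalized": False}

# The copying version leaves `raw` untouched, so rerunning this line is harmless.
normalized = normalize(raw)
print(raw["counts"], normalized["counts"])

# The in-place version changes `raw` itself; running it a second time would
# operate on already-normalized data.
normalize_inplace(raw)
print(raw["counts"])
```

With the copying version, executing a notebook cell twice or out of order gives the same result, because the input object is never altered.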
+ +If one must modify objects in-place, then it is good practice to announce this loudly. The loudest way to do this would be to put "inplace" in the function name. Another cleaner but less loud way is through conventions regarding function naming; for example, in PyTorch it is a convention that any function that ends with an underscore (e.g. `tensor.mul_(x)`) performs an in-place operation whereas the same function without the underscore (`tensor.mul(x)`) returns a new object. Another way that some packages enable explicit in-place operations is through a function argument (e.g. `inplace=True` in pandas), though this is being phased out from many functions in Pandas because "It is generally seen (at least by several pandas maintainers and educators) as bad practice and often unnecessary" ([PDEP-8](https://pandas.pydata.org/pdeps/0008-inplace-methods-in-pandas.html)). -As we discussed in an earlier chapter, converting a Jupyter notebook to a pure python script is easy using `jupytext`. This results in a script that can be run from the command line. However, there can be some commands that will block execution of the script; in particular, plotting commands can open windows that will block execution until they are closed. To prevent this, and to ensure that the results of the plots are saved for later examination, I replaced all of the `plt.show()` commands that display a figure to the screen with `plt.savefig()` commands that save the figures to a file in the results directory. (This was an easy job for the Copilot agent to complete.) +One way to prevent in-place operations altogether is to use data types that are *immutable*, meaning that they can't be changed once created. This is one of the central principles in *functional programming* languages (such as Haskell), where all data types are immutable, such that one is required to create a new object any time data are modified. 
Some native data types in Python are immutable (such as tuples and frozensets), and some data science packages also provide immutable data types; in particular, the Polars package (which is meant to be a high-performance alternative to pandas) implements its version of a data frame as an immutable object, and the JAX package (for high-performance numerical computation and machine learning) implements immutable numerical arrays. -- had to prevent plots from being displayed because this blocked execution -- used copilot to find and fix all plotting commands to save them to file rather than showing +#### Converting from Jupyter notebook to a runnable python script + +As we discussed in an earlier chapter, converting a Jupyter notebook to a pure Python script is easy using `jupytext`. This results in a script that can be run from the command line. However, there can be some commands that will block execution of the script; in particular, plotting commands can open windows that will block execution until they are closed. To prevent this, and to ensure that the results of the plots are saved for later examination, I replaced all of the `plt.show()` commands that display a figure to the screen with `plt.savefig()` commands that save the figures to a file in the results directory. (This was an easy job for the Copilot agent to complete.) ## Decomposing a complex workflow @@ -301,6 +304,7 @@ It's also worth noting that there are a number of domain-specific workflow engin #### A general-purpose workflow management example: Snakemake + The first time that I ran my workflow using snakemake, I noticed that it was substantially slower than when I ran it with Prefect. When I asked Claude about this, it gave me a reasonable answer: > The Snakemake rules don't specify a threads directive. By default, Snakemake runs each job with 1 thread, which limits parallelization of computationally intensive operations like neighbor computation and UMAP (which use numba internally). 
@@ -315,7 +319,7 @@ I asked Claude to fix this, and it returned the following change: > 4. Set NUMBA_NUM_THREADS and OMP_NUM_THREADS environment variables in dimred.py > In contrast, Prefect tasks run in the main process with access to all CPUs by default, which is why it was faster. -This solves the problem but it's an odd choice: in particular, it will probably fail if there are fewer than 8 threads available on the system. Snakemake actually take a command line argument (`--cores`) to specify the number of cores to use, so I instead asked Claude to have Snakemake use the number of cores specified at the command line rather than an arbitrary number that might fail if the requested number of cores are not available. +This solves the problem but it's an odd choice: in particular, it will probably fail if there are fewer than 8 threads available on the system. Snakemake actually take a command line argument (`--cores`) to specify the number of cores to use, so I instead asked Claude to have Snakemake use the number of cores specified at the command line rather than an arbitrary number that might fail if the requested number of cores are not available. We will discuss optimization further in a later chapter. ## Modularity and reusability From 67c95e6e179e75c9307f122291e9bc012eaf1b7f Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Tue, 23 Dec 2025 08:31:01 -0800 Subject: [PATCH 62/87] initial add --- book/images/snakemake-DAG.png | Bin 0 -> 38256 bytes 1 file changed, 0 insertions(+), 0 deletions(-) create mode 100644 book/images/snakemake-DAG.png diff --git a/book/images/snakemake-DAG.png b/book/images/snakemake-DAG.png new file mode 100644 index 0000000000000000000000000000000000000000..231ddf5467603b62dd95813fdb2cf178e2eee67f GIT binary patch literal 38256 [base85-encoded PNG data omitted]
z^B7PzSEpAxwZrSM(8O@pOx%4ag^ZotAK9G-Vx<7TVR@H1pUH66BdlKvd^Er;5G$8u zL^6x%2vHfgNfzI zlE?^>EVI5>@@O^$`efws^$kHIpmsFh_5)7LTvRGu7GLoJSOD=rEwT;H!p|G2dEdlC_Pj7;h# zJVr)GANQx5IO2;W)hj_pi^6?Z^0ivmvV*Z43tJH`yhLSx!Kw?kHIIiVF!*>myG4|S z&MM$xLe`#ZSR|TZfKwHkIQS!Su;1-H>9zHmT|f7En8UjEW0rdkv(d{1#w2fuKA96O zC@^;Y79@oc@$i7_+j6~Hx|XU4`xDqS$_7IWi-I`OSTfNlf*&pgc8g(p-@h;#AUpi< zxe}iqU%*&v9~s;f9c$(8>#sc%laQ$`RuxmL`%678Z|GOcO2hARo>cUlroXP&GYYc) z%t@q7Zlp@bAzA&>(W44Ur+_T*{lAkgioW9`d_+|K8-H-CsnTg9$eabNxs?3JsbC-(=z@QrPZZ8s6pupNIqUJ znPfr;aAoNpeND(eQ0wnyM354*`KedYlzz29hU}2IbF3w>SvI2wPOTTgj&^iyW?pT| zn>hCm?{ga;p}(**`Y5=Z1Lu&ff0n4v4DAaV1Qv?&Lv1UxU0}qD^WO4+a8f^TX!i-z z{x-0xKms0DP97hSAY)DQrR9!$Y@Ut6WFHfjj0WHHu<=1qI0n_|(x3Rtx7G$E;Bg_m zLhFweuX#)$!yVCXg6;1d{=DB9@%xnIfepH^bE6Z|WRmzunMKs9HHN*Df z22{#l)NMQ7fA5+p7E9MJ;pieZK!QFDPXz0|a962RKzJWInAV;!xb-7(k3|0_Huu1Y z2a0}ml$%HKQcdG_G54r!8DkE4@+^i)Q7Jmv6fmF<;t49ku}6YVi;{9>=@f6NojyTrLR3 zFm2LD*%K3gyDUgzv!N20**LfSZvWUmg!oqSdqWAqSX_vFyBh6;3aShfiy24lfdi`w za$vxHCXru66CiQ+wgWrR1{M;4UV(*Zuo;EZFsSphAOSapi;o|7l{LE;HtTz-mN;QU zdmJ2fCjO6B*1ql>D-yRVqRW}WJ0LoF`Y29DL7}&}GV(+}|3gOqxW^e3PY#dbCgR%}#Tzohx zLvTbJ2b2X+G(ZPDdUR3}vH11wS0B&b?~)K4zA2>ul=~k80~$c<0vgX@W%P+YW1dEp zmVzKjB0zB{r)TNNdFzNCLsA3B(`6V!rgpA zY$rtRcRjmk3dF#Ak0Zuix9ha2R>x&zx83PL?v@K=2?+@gP#%AjoC`;V7$sA!I#L}z zX7sPA56PO}9ns$*6>3IybV$ROFr?ZV4_XGK>U^)=%3gj&2ZGuERA1vQy*bn9&J8vS zXmuD6D<@|cSVoomf#L2#Gc|yz(@mcIknvE4AVL)a;RzJr_cTeGAs7c(7@wDy*Wt8T z{PN*)Cq#JEAS*i{48zAxM(7vP_V!$NZ=IZYy}Y;_!mAyZrJ*HT3>;{Wp&ot2Ei5dwa&S;EGh+frB(~P@%Axrlm;+IzlcMkJptVSAWzBE(zERN0chu7i-3E`Iq?Jv|Nl~w@I8muo%2BPF|a5N1A5F` z7AfyDPN2(^aT|)~zwj^H>@hpSU+Ikvhal>hc}q){MO=^@xJ~e5<&*sM zD-yImpg;gaNp2sgf3y0$%l!V`JK48c|CS^b`OpQo2b5NPw;A6Qpc+bRMKv=pVodyy zpKm%oCF{jCHmaN4^~4uO0uq$H@L;l4@etVcgQRcxBqhvtCq|^@&2rIX%QxqXxqsh= ziauk+_NVV$T-42DIE`c~@06vkO^6RJVE;n6YjRK^gpZ@vZ_d|e-u_U(B%P_^8-B9> z<@y8k*(M=SktEu&*emYoBjoex-W0KnUbte(k2WdEo$W|R=1853YY~Gt5sn`ntw`-m zpYHJVKey5-9!J64KVC756cgc~yr2lzigSY==V`|4d-{<5!GGIvb>pYkq9JQd3xle} 
znMBSy$PQQ1%H`I6GS`{al1+2%$+*6iKN3;E3AZO;3HwHnO#Q{;v75E}WUg0hNa~n> zZU5mH=yv@20rVJ_xO7#PcpPI2SlT~B^+U=2K>=PqYE8e~b=HUgTU%9_-6Z6cSDjXxB^Fp&WaBuuua< ziaTaE(cov{RZr7* z_SK=R7-2_&bZ_j%C%jN^ZoBk5nKVwW4Lm-dVwT-5hg7-Mbfg-wUypb4u<$n{j6%h z^Nuvqa0`;zj()9FF4OaULV?7y=quf_yPjn%bZ*AQeVfltL%Nt73(OU04u8?8*5P8U z_1C{ID=wGJ|Ka`(b7yFLkm~9KFIl;r^cZP|#1#y*0gcam#_k6%99{f$n}Vkou80Pe zB-r_*c#Sf{iC>eQ@I5eRyj6&M53L&Gc$k2fL)nsCvgTFTV55fe`m|M6e9erDEa&=b zkT-O*zU(8TeSZbz~VNa`irJ=vbo zt3vvcRGqzmO?T0U_KV;;3+@XWL{j7OALg7Q1E{>TX4>djjY?dFYO8Kth^@UH?Nnd_ z`=OnfnnsEO$P<02;=3Q(GZ!qs_~4cBLXn+)xq?P|T1m02ECN3y7EC3Is-=7aI<(PT zHkrC7S+IBdjG;Q89|tm`6mHwSD~-jP&nl>ftrYO2-%po~^2`t){I%A-26tr5knTM3J#iS zD*CJ{04pP|OgOwH-uB6^Kb-9xAOGh%X?N$Mu>GwFAbtgg*uX@y<68Njf(Ji<)XL_S$$F%7GqOuNQb#I;>KVkP%&g-$xljEvr2CILsA?_ z$wL*9jbqyF>rgPyQbTSVlOh2p?v^o0f@ttDr23*njXuF#6~b*`M+)FGE1yV5X~E>t zzp>1IVCK+#KN0@JBxXAB$jAJlp)z%dOBwkx*f=@$oK(e0VT=Xo9Ub`=aiHW)RvwH1 ztsCH-iHO->!&K%8X}Fu15O5X6Eo%?JwQmQ0A@L6IZ}U2 z>|<3~`S_pK5x)lOB1b!QmNF45g;tS2ySuXf{!)M{3sB*mcu9bpBrcn&3Q)yH6enTLDxxxw+-;5)1 z0H7lhDWsy!eh-8Qy49Z_YY*3{!AhkNPtYv6|32={Z6rf5TD}7t0OW39!!)sVI*I_w z0sykA4Q_1P-h=oQG?;+2!eEYsSRA-Bc*EjWtuA{+@^~T(o|+&k+OXdKOzJBp5xNB% zV^vl)Y9-*xp=&HK$^7pJJn0hNZra&jnY|DnZ3&CSfr=bB+{fI~GFlxxW+>~T0NiWY zgfY3F$E*i8Q7QR%aXN`VnCC}r_HY49kO#E3jiyKhNp^2l!Q_h%Jdt13_n1Km-9P1E zu$nR5-)fP z>_qJ6J-u`K^Ftd&rgNHy4=sgN1=F|xss;Eq)wcgRbl?}Re;az73zkML>4PVqVj~SA z>XVfO69!I3i;I|Kv;qfXIPx?}yG}AX9e_U6HTki#YePJCxTU#D%oge6B7N^#;ih-D z6PL#$@}?Fp-0Ap&n(O$!(cw_GO|em1@}9K}5KvlRG%!OWa=k{Q-hi8ngac(}zG6}f z)_d|#3%><$Q%0@oA6|JaAJqg`gRe1YJjUG}=I{Gak1gl97^<1by(MT(YUiWMs^W!# zUr@7i&yiCAM=MBWo^j3G5Tn_i*PN8moCawaEpyf71kW$xNiR7>fxg>o;*uq;aofye zVZc3I1H!M593>2ww%K1^;ehT|jOsaM)bT-}nigF_exf#WvW@Ym5nk}EjmSLt*zZ$z znthk0G=hiOcX)N5DCI8QQPQs7xY%rTy&yH|7@fWqFRL|!f`-!bn;H{sb@3P^9g|wV zZBwR6>{)JAx5Q8GPqpbn^OM1cXl1N|QXwU`exWH2Czn3j{E@*9~DYQ`Ajivl|%MFnT?S8+@pbyh~ zNmB=|DbOJ+C_7Y0cp6mhpghHK=pk0*I{dF3eRIo}ni%tDkjHw@FT2%N>z%vs$!|0# z4pK!U%8UlqAB{LrTN=R|2`$_GQQGVaN+^uWM!u|+XA(w?AqB!cJ2L(JHH 
z0vY0a8=d!SbYq73)^&TPZ~P%LG@Wg^bfj8w0Q~V0Kcfmdtsv$QeSbgPD=S*2?{%vG z9R>QHV6rI2X?hp957vYdGco1K_iiU{LIwt}d1BBiP_DUG@m(=nCQtovKvi&h$>$yT1B)RaKKNRMDr4S!KcJSQU|m-TL#Vy^}yHi}1~aZt81eV~9Ws&ZE!uLi%QoA#X&M zHS&T#F0?13OF^Y$tuK*D#BG;CoQ?pLX%OdUQ|){KtrX&*`5jctF;xiq-}?AmEQj+p z*ngAG?bi;uqlrA3i3V^%_i$qZI=zlZizdqu*(uKvT zTLp!NhJxyt>Do8c6lV2+kn{2LAFV`lGjeki0MP-_{tvo3>hS$2Ir`!__m1XB0@UCI>y`I#R5k_W=QC=*coBa|CwHY{V{4nIQyLBr zHv|)@Lcs5Qx=yE{OHpKbM*0$-~zIjIAFdUE`4FC#WBW2y_4 zQ@?;#8Q!>6L{$&S+m#g+h5c6_6F~!+SR;GfUq&%5$ zT?-4C3*hmKo1>R6{?p0?>VtZCCoc)ff6;>A}`m7@AbyadqGf9iw)(WRoI zDsru0n^M82m&65(5mcqnfR@wxor*>vWaRV4-_4Bph0V?Bml?IvL5IQga0qA@piN8= z=q*483IF_=9SCaTfBTpKgtHPb1?3|$qJ%)2um<@*&R4a#po?GDz<`#*43uwpfXe={ zU%!HT>AZ2n3F7t3LFvN(R6@j*le;R?bksvVUGKyM*6*}G_o!H#eSAVrYTWzg>J$_m z`ZS#F&IHfeH$L3P%dd4mi8^ifv{K@Bi;#0I~l6(e)NkQT|=m@X+1e9nztM^nm1$iXxzNs30v!BRO9*%{P zFwiHfFZ#u`4-iL0nDA%t(vr@C#tteMRt?^Kh>*?PJWAtmjz(MIU|{fF6E0D=Bnp&a z0IY0&xl!o=G%#dlf$VED_uZk34jZ6HdfC-AHH<*sr5lvKYiMq18T;vz1rXgFfe(}3 z>r6NEU~c6483m8=NMKa~-O~vyFdDW8R7%2++GE?OUhp>~7Zlnu%WnnLraY*9r!U@Y z-n|G?C>@Dm=b6EvN9I)v6eoZ0C7+7N{tva6?<$2Mu0`pX8Jy^-qP;&$%7Xktl{YiP5z>v#4@M|-oo@; z%*;_ReNBs>|4Jx|S=2uWe~kjI?GX3pvHaQ%LlWXIt>@1YoVp=$S`aks_7?VN?uUix zsuO$zKZDW&`*NJu_KuI!H&VWx%keb0wD+^=B(ruZ#9t`FzhyCmxu(9}2~0J{s~kG2 zA|Fn8aR2TLg2fG!FDEp=mBytK2h_6#KqGM9`*N#8_HMFL#+lCyx`&?poxuS)Z3`x~ z=F{Q1NwBMENTXY3ht+0Z8pBIV%l+1WnLcth;0@^I##gQeFF&S>t}e#x1Pjwo%NPGn znTsJ_KgW`F->ZeqZJ?kHU`dM9g_zb^XRDS_<=~KzYR^dppgVVW>jM3b9C&PL`;EYT zMktP2fTIFh1&YHn^HMlt#ji0C(jzBHLOb;saVjC)3qK9>D;?JB#KhT`5BH`+XrLbG zbnFKz?yATID#a;O`Nbpvz~ixP^%ieLyFD6{&-nOzvnM7z5Uv4W=2>9$cYmq9VGeF8B-}wL;PX8gk?jM% z7Stdefj$R>OD$N^0Vu7yHvA-^@-57<;;|KnT5DO@C#P;N79zZtZ!xD<%$TaHvyy3rJKcJdSr2hVU(hJwTH=X1; zdFKejL{5Ylz7#0?5hD|)q)BW}r%^XUo#ATd`n!{GD+y)|2z7zN`SpbNTssk_@GKLEQ_LqV?d)i>dI{kl zZ-eKJT5Fu@rQqPk&9pxNotu#6n8fLiiGFhXG&V-^V36eXgz9RZR4xTfXHiK>uI(_& zFc1DQdHOay?-K&lvwXS<`4k=4e*VL^KX)#6*7NaMh5J;9-wSQB9Wx^%X5VUo8}J6^ zBqC-bk4fgup_8pv8A>tnk{f5P$>qyL93T2jp4oXIagQdFh5P=8IJgwVO9)L(Vp{6b 
z1~TDDm%cc~->Lp7uq6QgSyE*5bT~y+uyu4#_KOq4n<8vvj0=TdVDn~j4SJ`k`{)zvXVeOqUya>?nSxf zg)JY3$}w@nd0RR9OM{*6&**PXa>k1Uc;CsRES}LQaajHGsQMxAsyF&|l*s*0vKWjU z|WT^|+^0S&e%S+ByIn5o1#aa~WUDU18 z?_coLu3yVCD2(J4M%S+~1U0q2oR?Uj7iBsy;K}kej$^naix+uVFd1vka;b49O2hua zhykgiDj~iJq#J}NQC6y7uimj_EyM*2*YuOoTW_+l$eO=+m2tn;b<^48bl@Rg7TXz3 z+M$l(vIV9&5(&k>UJuP_O#eI}{gzJ|zlULs!-n<8h$x1f`83jYKy?4e$aXo{%|e7f z!M)+rF15#RX*|@mpP}DeD^n+`PoY#euVRE(?xN^Ab=us`F77u}UG&_TNHLi_Hh7TE zt6MT$#{;%dc!W-YF0pCaZSLJcmOJ3VGxs1)HH#OXGwS2@L^JCubA6Ya%RvOjYc{7w zixN(8TGiZYPRU7CM4>1dMXQeqt&RQf&M!ort%%K%k;Oxmw6LH5#f{%V_&4&_-jSkI zttyqGOAozl>zZfR;~#d#C2rEFNQaTU%o7^{5BgAYE;4W{E{Vwso$!^GZO>oFedP5J zOTrE=GJIOruKytNzM;bT-dURAq!Y5EffD!iATCJ4l?F)eFBpw;+L?S|<=-`X@yh;8 zrQWS>MLbM}f*^aGYunFVFq_B{EA}agNg^|in>%|Ur(f6f#BFI^1(mz_tCY$>lC?B= zI`ygw(*)a$)~)XP7{Xp1&B=~SWMLe_$%(~zcPrsk(lpJJrt zZVS(AU9N(6%wLRVt-m5p&lBXFRsmt9!}l4T`PnYP(}Vm>b6tYWMx2qIf^%k5I-&a! z+^Gu|XK$h%`F%uh@~daxpbwo1bvv+H{e}`0Cd0X(CSN)K3)2}@7mrszDLM6d+41}_ z+A-4}G`@!e*eWM>c$snH-bnNwRFpN}yYPdT!W>C* zrw%b(nytm{5GC%mOOc&Ar;%T^xQv;vmZVJ2OCYwSO)?T2;0yV{QQiHbZH-dr<8p|Fh(s^1B^4_zpoLl z=n}t*58q1u@6Q3QAGu0!?yC>U9k*r|fD@$X@+w`stJ7Ylb7x#om|xq6VQaSQA8)Qj zwn&l&9+oJh?AKmAeTku9k*hwIy`g$9O=g(1cG`ntO{L zd0LQfgTRkpJw8V8cyqmFW+OL%BOwT9gra9KKLw6_XxRfjO{0SSEU|LZ`R0F*H1Xn# zuIy9i`PEH2b1(GX`Q9b?nJNPfBP;*Z`6tQuQuH0^BP<+3@)lrMn`e?nZg#u_E3o&| z_F1t8hX%1O1Ffx*v^3p0Z)Rtuj;P>X5Lgx!VmpoXyI~%j$@`E97A*e zk0E>?dQ$QIs5}`H!!L{lDn&OM5}7%#STtR(#5sP>IERN9U*LgWV$R0K{q((tCmb*t z=>P&iMtj@q12SaTq@+vjhcmM_qbH80$ zXB`QWp*&z%@|7nSVq{kvVf*v(M|PW;63IdPbs58eb`M$#<;8oOMPxJF zr;F0_FsRQfI)i1+ODYj4yugAuH=&@lfoeR$sax=`K}D+d;$OOKYXLboiF3DhCC%LD z&vWKb_M92({O(IOpGRt%+v%+8&3Q1uqd7JdNb4P+_So#nV{Ip~I~F$xOAIAGC5Lc>)%U6Ar>9O!>Ev>+$D`bRCq2GUZBR8lUGbfEE?Kv zpB|vgl1^CUGkQkl`D6{ys?EmkNBfueIl=ixw@mBv%ey^Ae&USR|Ex8Zy5#aoz9m%l zJ0@JM<*HvTRJBRU&iJisRf?IP{sZxX`O)gU19Q8 z8}9~(Moe*#6si>-)MADw|0E;FH-)chEDU!Ind*xwwfSV(@J#nyb_z96C^L?2v(3Vk z%&GMhie&uH9O3x2vPvfn8;5v7jnYXTBi}O4GlRH{CGjst_$6U 
zdt==2HE{hron*eM@B2HD84%GX=XSo1+A-6|pl8QX1s_CEDE9MoH8~j^Y7AO6Z-$Fy zi57)b?x*3Nw3-XTWH+3U-})1Z775Pk6>NH{qv=S0XY8}!$Ya*TN~Q_+Ypr%qBXM|@ zdo-LF}AJe|MV|(QG@gA;-2!pcTzA#t!^ircEevi|-%)3s!JLhTW z$*=~bcw?hFJOPcDb&N{2-4(o-cC+{oQ&Txr8Nj;or+0mjmLom>N+3l&li2OCubqh> zS2_=-=!)_WH;b?rCKZEKxr@0)hOUxnI?A0bD_UijlxsAx@H)r)0~s8AqW+rY6t+VIZ{^=0gd7T1k|IT()L$dy%%9=eg9mSPO(|K8~-_g zn!UliXYO(C)jMBr;&_fXNe)-tG_=9xXM9^JmE*V3;%+b#A zJM{GTC;tsC7~+-9Cset$stIJPe>5&6)kid#e3B~c0E`bWcf!jH^Hy=-rhuCqN3@{r=gLg1M9c@oZ7p@5_K#gLNm(m|cY0K#^Kfp-SMR z=g-N(hDQ;~uiT-VS{E){NZvN=Avsm&wF*PQnmXa%{wI7cEKCiC0o$z^M0PemKQC+7 zCvxs?Zten5pgQ5#L!TkI8 z#UXh02fIz0igE3N9)n5dLa%i(U zexO!UzXW(O3U#T%wk=c0$a{a&;eSA332i{nvV}rv35eI=mV?Pj`%sDA79zY83t>up z!J~L$Q=r8vYMti|KdLvx_fccQ=21?;Dz&1W2bYYC#EX9UVuX zTaE*IA2ZVdZ~1_S?doJb4JXb49@GAc-O1Ux1p}_KwIv9@+Dv4dA^LK*0MyYQ;Ngaa zRF(7x57M3%1(Ux!TJtKlh#gcIzFrGvtqlJ^NC-6xlP%_IW&G&%E^ub`Su6~k2Z)Qg z&fn|Z(XA=K_K~6oqDTB+@h!4<@dzl^#bVb<6TV4+xN;%?4JXVXS96Lj!@9yi&Vy`Q zhBJ4TY{CGuw-q=Pc6bG)xm-R-mJ-q34m2$QM}ekA_h9V5_>XcEfBu3J;wpTzo&)46 zY@a`NyT0aAQ@cWY>VM)+XF2pb=Db^7Ul#-OlRJ&Xx}3Q$R~nZVap;7w@cPF>JU={} z*;PXEE`FKJ@LOBUdZp(9k+jd@z7Glp(Xen>*usb2ED?v-695kZHB91*3AN^emJ^&@w)u!A{^{ zPo4;88C>AQbI;HEg@~|W$|EXjnZR}1!)4Qmc6g+5U11hPr@AnJyT@tL$Cx=SjW!+U7s5h*<9bPEmg#y&UL=95mU>EBxtX$4P%F zl=Jff;}HFFYjCCO`}Zy_>5TMaN~sLLj3|kEVT_w>c?D@Cb z*=8OBT!&}FA3i!WgM8N**G81Nos=yBL2H3=v|pSYMZ2oXDS=pCNV}l0#>U3RZ^IDN z=}fx(m;#ywq;dW3i`S_Wep79y9uo<;tG@JQ7N^SpmkW?(s%HA?>4-|9NyFE+A76Xt z@4~Nj^t{}cVm;^muI2Bz3P`Z_TnvdDB3ZuTrtwfRSG`^C4Eh=+FoM*js?qKox{Tk-C_4*=e{)_=Jzxohi8;0^7Q5r zt?UNJPAfpAsl)n80u|PPkTsV6^M)$%m8(^c#xy=A{eWAfDwf^wa zYm*WZyZ*QDitgK`MVkwb)RJooEc|&U@jfrc)z`C0@*J_>!-#(}VnKTq5~Y-}8kU>) zoX?==m7%au5lC%NO4QRW;!DbJIMhS}lRU2$*xO6kkoFyJ^a)>4XU~)z z6P!SBfZhi4W#b;3!+IIZaF)6+Y4(*T^e+Q7JzhJK62vcCmxi#XU$&(h?*9(i1;gaC zm@JRWRa{(L?Vj`u`58Iqp?aK6&yf?ey$M&E093@BTH@eZC&Y3Xm z{r`+<=0+G#4St6bwpw|cQk+X~$HF~MRw{|UN8i60ppF2DSMr%B1Y30nnI4t#v-G`; zk!!1~j5$ih(Ic5uUp{2e7T+ccbtI*TF$MSx53=s~M 
zr0$TZvdXN-W5Qx%66*6%krrh-mX>!nA44fxDd6XYankn}_}I=?fG0ySK^T;1`Pqq^N1&EJ!64O-FJ=0Ds6k8z2_> zB8w!R+BH|)(TfJqqw%#ilUv}OZg43{9uN$&4E;gDY=}ZJ;90i+7u@xqy~v7aD;b~c zD1IZ<+mzIy4X28Z_QKwJOF>R91GHraYG1G%gZ2XHW*qb>rBQv&c{AO|S>{UGhD0B8vDYKw|&3T^^J zjk#cv;QWpxtl?wB=TC+*cx5wKn*#Ztz>^qq*D*Fu*Mzy`_bbisqd?XR0UH zfTNDOW5jv&V`n5g5sX_Vwl zi(uKc&&tZQf`iXS+RtuwVj0?{UyRJDW$Dy;eUncay?zEVK=c3`g+4}57w}3|_7q~+ zpXpque=*?>Z^mRB?EJ3!;L6a%o7Bbc(Pe}O0a@F|qbiB}7gR*S$t#Bmhh@(C=u6b1 zU?>LHPbteV){RbuSNqS95#pnV-J7q{lnk+sZFjN5p>+3dWG1VK_`XmwDRuiaF>%VH z(&z({Yzyi=bIVMIB<#h!QbrM3L>OzF*43+5Z3mfX;LSM3&KNyv_|o@x0^WXgFiy|( z$?md(pG$U5%}|zkFBanUX;^&A9aZvekfyAHY+1NxXXv1ehlm^@CY?#oC`j) zP{BjjHtZ`fE4B6E^pag(TZ^4sBc^RKu`qbRlX$w^#G$mVpX}UoCS2$t|J5~)+m(dr z>qy3y!3l`k{`bsGw|0T9#DySYhsQ-W^yFA9;oO3-1tWuIFcU7r*H|ev$(j_5UE*8_ zjnwJ|v%jsI86_uJm&c8*+?|pWVQiV-hn5WRIMH1`P}3aOsLTV{uyTw;m=^-o2H)@G z01g5t2mT?#6*DuLA6qbcz@LVu`Dz+KPoRSBUCfDr38}R&gno6LsYhYw6MC?7O^K7b zc)p>r(c1ln3}gKy*VyH^M@L6WK#^DF9!#h%$S@=TeoM(&g^egmO3H^2np06xwTD@J zkR*%EQKr}V2{)ztyw9rtS)ZMyp$2A}U$RN=(d}Y)#eQDy*Ur^0j(BS;Ra09l6tL

uw*ybbd+8$| zm_3+hu5lm-U+^>Lrz@KuU)Ta6_WAY@NdIF$xF+3~1eSz3&a?Y%CmWdjrRC)0RPBi~ zAxAL2wl8t0Nn8N>KFwriikqQ&hYUJn-m23Y6ZxY3UcdHxy*r&ccY228=h({TgG#5U zAq87sUvHRzDlP8N?V>0n`tAGbgo~IUt(4yly@c@SGHm*t1tAvfo~lLsQ>O+hFNoo$ z*1_;F;q0!OwJczpPrq>T`}9o6Ec~1MZ zqlJY9y=9BxE9Aj^V~3BUEuivTR8;h=$PUbf>K(Q28WO!2J^OQEsM{#Ax=RpN9Yn~$ zBhAJqo5Ab*8v`qpOa*gVXt-_@kRjNswN*Q0WVbPpTFa_=j z0LS15$B*z*f4aor7PM~MxG}ZLehTZUYQ$;pEDK3c2dset-7ckYhjhpaPmAaQP+$gv zl^d1K3eO!b0Gq%)0a!=l&HA#x9l^?H^ETZ;bmIPV&m_t_8oR-i{((F#1 zN2T}N+VPY=evS|P`SzoRj?O1bLq@jKGS9Qk-t%VppGLkwBhvLL3w6XM`*;}M>b0M3 zPd&q|Y_4~II>aOd0A40&D0W*oY#J4QelpSTNNP?!OE>oMA2usz1;<1!b*$acr`es^ zJ}DLIn%#~cyCpL$t)D5@_Fp#hXcS(K#2C-NzRqSszRiZE`_pH4j1SWqJkx;PmgH$- z>+T!8xtf2`zcG3`SGUdHMT_}nVJ201rW{9;MEp^C^Z4jEK_G8KYv1Q*o1HJw*-u0+ zIxi2W%lHo7QA4B`rV3&uj^4E6?aa&yyY3T{gLIu;%L{t>rI*B z0VeycN;m7FABMK7DV;j)fa}d{q}fFC%EA*2`mw^y_p!169#+rIy4WC7@}u*)a{Ep1 zbLu&lqXV>W*`WptueC*L$bM&r9Vc!Nqm-IuDLw9)t1_fYyHof4^-hDAb%nbMqeV2{ zX2-SLdH1;__&vBc9Vfpp-8VKShDb*M$5q%zO#?R-5B00&y2ne2^xa!ts%A2+reZI9 zIBe;;-B&tXaGTU8vKn4fgPNTwA8i?w6UazrKNxYL9mmy%eT!L?>5QH)S&~o5OR}_X zuCPllWo63vt`ONwDn`DJR-v`L9Bi8_SsBiQ{n6+_aQ)Stu}zP66t!VnU%$etB;&h{ zp{}`7DA`okDe*|?*IZrZ3;7^*E?f0(3giSu1_fJvl!h2MKVK;wdpMOBEXg$A=3l~m z?qD`;V2fJ>-)Cx90Qs64^%@iIo~5hQ8yS;OFv$qd%2=o@bnF(owGP3lIW-x$!>I; zr@oeG9H3Y&v=iL8FdkZ&YFROwTlTW6u(S@5yD*J96v^cqBmKHq9e z1kN1qJk>xz7rF<2fc>d%buuid--3l2`X7l>x<2TUrKiGTJD~uo+Oj`S0sVzU=lx(K zMhu)%{*EI`$)-(w*rgG%F(IST2E1m;v#AlwAoIMsutl6FQ7n2CIQA@uN`S9rd}=E)DGX7z+yv zk#=F5K#wIrfyxu%HO)dRFl4QjvTI2OV>~;nd*Hwp?ZA3kQE_2=Py@HMfZGE9E?w0b z_rs%^6SzC$QNT>0~e{16U5QHjl53 zOn?ao&0|Hp!BP*Uwb2g7G1WM30-O&BP7%W>9Azo@e|nll>;*=|gD(@<-l?G(Hozud z03hNQmnw--jEs!-z_Sbj$S`J1V|U~{axl_u0D4rd&b*75+StX46t#loSqv~)g$vUG z5P793b-8Lm2sRGa?a`m}!2E?0KygzOr&r4#VdP3c`wOe(0-P`*YGib@4MdWF)`E9G zpsJl9rzLXDCdP!~eTR{g+?B)I?y?$G8)cuEaqA=^uk7Jh3X>raD(LG>bgRe@)AV^mxu)x-_;gbrBynQ9P)o zzIypGxGj~a?-jy?#X~I)KEQ!vUNtnE1Bxyq()tbNA-Ek^T^|*57<2ve>pK`>cfmMo 
z_2GmeS`9CI%(QU9t__}j;a-7#1|53F-TK}msZvIs4Dw5d{v;JBHiDR<)M{;1{r~<$lKdL=JMT&+7O@~`U^l| z2I=3)crQ)D=KoaLY#^HAh{dXC>G?Nfx8TV?Pz!y<9~lBQNuls z20K_FF~)W26qtwrlRX0v>_?9tRpCIJJqswvysRqBc!;Gs&Wza#)I`iyPk10){3j-b zi3J5ff%Cj}Ey}uF_vX!;Ba2By9|*3hhSzYTy#Ho%PF!&I>XLqbS zu@b|?C~4`2W*XB?w*}O5^7EB{uD~4w@<2jbK1LWCghY&trPdKJEQea$$x)W$W%4sl z)^+NPu;t%M&+9=c2MH&pyO45py?@)-$PR9UKB~r2Pd1g2Hl|B}1gLcxA!D z#oImwWbAJFVoK-=H`PVjjJzJgGLee>;flBo>noo^X?hKB=rde1vOa7th2x@F7|Nvl z42LVFMhCiXTU~hWQ*rWhSHd98?)6Ooms)YgkouK2OnRA|FVR#~nt41%#v|Umm!7y`4bre@})+<};il;k5jmlF2KAH_Um%cqj-q;u}}_Z3Jkj1122yh=zArE`M4G zmVz(kjpr)du$C|8q!fI2euhf=wn$ccpKs7p^*qvN&a4t2(!A6uuvpB;>y;I;UvJhY zj1Y?BVn#l<&&P~*1|^!%-``>>B;$0GN#{DOK~LZ^;!Pn$emq>7=h{RNWY zk-gKqvyZOaYZp~uS-ZK)#^_XHja8Y?M)N@2ib;=l!$%R{rC%|MJ|m`{)8PjJ)pvtJ z!zHVa-?tXt$d{vOL&D0~i0}F)UowWgJ)ibogRl#*2_OH-Z$Fs+vAd98Y4dSKNDCV` z-L~Sw(R8(VOl}&v1#u3}qk_v(zij&zmhlMVJv9QF=mMy(B{?@;^AE22qf;VN5bFAb zdM|e|b?f;!?9BoIRE?-H@HYJ29LI=@$@`=3ylZ%;G@5)=u0SZ5r^jH5&j}%?f`|Hq zpahw?ynoQ)J^W|$R)9PGbmJF53T23Ad@C&H#TEwdI%3kkSM8!z)K?nLvOLwVY@>N% zZC2=_Ie%=KNkiRQc2(rqY7F}$J^PQ>ch_03r;A$U`m3W5RnbA33>7aF?dmbujOx^h z-U=Q6l*C2#)tjV(x%_efyTs|IPB{i?j?KinwHAPn3gD$ie`V3!j%2Z$AogzPqq~SJ zwG{JqGoZf6lsL?p)TyHjc{$<)hKhvRYTEdk(fjeH@Kukc?x;HTkGa3aC?jKC1zI#l z>Bs#4RHwJ_U`Ts}#+*-M*qM%R1#`FDI5Ssrk)vg`&WSg6k_o~^iGEqf+figLe6g91 zT1D7oz$=uXOG~7^ct_d!r->fsp=92i1G{g{H@r(eeDBbVjMdOOweJ#kDZTq(*8G`| z^6$b2Q|A>4>N9ur;C>!yZ^G|Q%CsK2Z#}Ox$ zP=(=eji$fhDy)mip;grTO_r4s+R!@eJ%0L>=B+dn5RdTS z%IYY0(LLg1K%T_kG~#+UWBNMYsnu*tf3&@&I6bcv1<%R)Cx#oe1`zF`G+kX?tuDR2 z5Z5V4^pLA?ydy-Urt=Ls)_|Orol{U2OU0bzVoMZR27C_khojfaW~IB~LSf%cxm=lL zX!f+05gOz{@YcHQpLjjKvYaZlL~rhSdF<0_I=r`BFE0eGw=qSKj z-ENaQE~>bZoo(h1HKpH=^aMe?mwiNwy;(~qJLi>53-OhxQw8*9RekW^XdLp%#+_}R6BMf=q6|S?> zK?LjRLtIbpif2IKT9Yx%S|4LUn7XhDnNXz5Wu(l1{m#m~s7Z`CIBowe)<)gwW9w_) zYp?ORe8Em1?W{yym!_e+c;AaSR|A!t#M}mHk5hPf&6xsz;Ij%=cKL-g_)RKk>1PuY zkUt0}fAYQ0K))2fqqQbfIZ?qY&e!{thlW3C;bdLP9}D^{#ye^E8@{eL)OfF|c%z#C z{>DRPFE5K8cS%2da7O&cBHw&k#Vh(QjT2>f85KDl_V;7ptY+sJj40ZzmqS|t{tp*+ 
z$M6Vq6+WkM6rU1HTvkr#d~`|--S))To_Lt_?9~XaX~-NOm^x;nJLSt;w$maQZ+S}W zO;a4LPROdM^HDeE%R9VTjwfHF2 zl#~u0(;BUqhR$F;5&PG8sF9Hi%0gy~-%u$0qoWRPxJp?N8iG75ay)@dWqj>wMvlU= z6J0OOM%b&Hdz(0VPf9-Z&?zt#aQqZdbhvU&Md{3}#v)ydqM-08(sv^cmve~y=wzc| zreiM&OJs2F?Cv6@EV7egUReVzeT*=Y7jFWkBGI!u50Lk!&2|cq&7$bt6k+x|%CNRl zR)lpiW&!?e?xc_rZYiMl7x8Gd?)F?*rjLOtlXDoS+y0X}rXh zMn)tp0A|lbLK+mY&@V>kcgO7X931$-XFb6Qm+AnZDxr{O_Nq+;pmdLb<$`qE2J0xWk;g*;7bf=N3#KL-6e3{+x>r}#c8c?! zJFeg~XGSIx->X1A5ePQz11q~G`$~(SE4~ku+n3dOKnsFh10AgkSNdviPSEe(028P_ z&zY7&r^`7vpwk`nJPCxPu>zV%4=?*qAgqShn)X03Ti|!L&{kWU;1bA4n-72W(WNo;Dx@wnK$YYsI5BbJdcE&CLn^&>z}-gM#N|*;Crrbb>`B`sUZ_XSxdmr?Jp33A}~4uU)%_R9bGW zPhmmWiF8qhHlc{Am$~eCNQ*e{!X+wEts$ zD`fwF3@r2g#r377NFc01KCY`~A~@OYm%2r}tn^>rSbHSKm1(h6??yk0i&DouAz`2Z5>=6WTEY!=0Pe4JU%X^)c>7HMUJB5;duXl z;a`PxF6g|XBBd7J;CX3)_5XJ|dTHEUh+!Al`GUF*WQ&I5Yp2A;qYWRo1F;Q7m>Xpr zeJ)p+k8s z7jK}8U)pw%DqWgxfbqS+7o2X3oH`g7poftsTuPTx_?#C6n=$6XpvLtl|q_KYkp$ zvazuUD2`H&i*jyXmyfM?ydx-c|4pLmAs9kapm1w=@X&Cui40t*4q!pFlxflAo7dO1Kx(aoWlK>^@$mohZ!09Y5W831N9*2nA`{@(PC zLVaS9(A3fw7Uwgyj@af8<``P7r-UG_aH_E+1NN;XUQlCvav>?p3{Bt&&(@T$}+xc#6fxSAS7gNWhvqA zWgU75(jp_O7QL(udjIW~1>o#Hck6`I_ztdJCXbmKmD&;SfGFs%?;^?Z;f)y?YL(e6oCm8z2fu33>kj zU8&bS--WtUg7%EIQ$k-ISN}se-R5f8EoAhNl~bOB97{%5HDc zZ&Ylw&LNpn;nMUQ1{Apa_Z&~}>4tK{M{~$jG8(XADjb(@d`6LmZ=F5fjHlD%qot2o z_z3sCokr*P%puiCv`P?=p7_#>il}4x_bv0>Rqm{gz9{DI=dX?UBh)y3Tg3G@-Tct* zfJznojnXW9kOuGTg-Kz_@Bqg3fB7K64WIDqOgoN;@y0B%-h0H)=rKj=TBULs$Kmxb3?OKRo|i&8 zF)l;9V4lJ1p-B~s{~25u?>%~yAT0n#W7J9eN-8UKr?qLK*b3aH_GV5}a-KeTLU|cG z>*;xJyD~hss5t}=m291I7|#<2y;)@KjAFwHBWLS2=h{k30iOq=N|-NBC4{+rP(zw< zo%Jz263I5ZQf2eCpA#gBf;zzJ1i|F-(H)Vp)9mI|*BJcG>%@T~PIwIMY1bcHJSAtz&o?^wfZETdt^a@u2F15{P zm9-(AQ{ak=&@Y}~(w|LD&(aJW(WD3!1so^sS&5BuBrjhn5?U{69+D=jL-f~pDcALBA zq4V$^k4H@YyRA4&Q8SGj%&a-FZ)%ykr>_oq7zmOVQom28EH%!@|Jt8#=Hh00kS8b{ zgX(*=kCYMriL8YBQ#0iMi>IJ~cHs1LqzoXdEABV)Xu-@JeLc+1ZYEE=;c+Tt;83oJ zoaHReAP%M*(so40v<0fTT~;X0I_`Yf(sI?^^0I{m{@uib7p72s<4^uLL0sdPxV*deVu395_;LH`)*E7w+Dyn-C}Y57zAf%sq#&^ 
z4~8VLui=wYzkAQ2=g2I3R#3Q8X8(}>-Hdc!gTa+6h299~2@-B;KnfgyWlL5c5*(~k zBlsJt8yyPc4qu(+Vw*2H`ljd6b^t9r#8m#HPhmj-+9?CvuBsPK z^^&A1x}GFi(^9y+VJb>yQ!U;^fxtq^4(Z z)On?S{meIB;SV1^AmM8ylp2~~g z$IrlkoX=h8)57<`_PL|)pII~g>pz?r6SYNwvW-p$V4aSt11ap%2eXrYj9?)Eh?2QbR`JfmDgjQ?# zMRR0G9s1p(%4b{M`#bdcb58mJXb<&N)5cbqTxS7Ffbos{%FNHn&WP4YfzA{ln}D4P z!O`LK6B9@*S|yPLO`RHWqQu0+-r$;wi1}-3F3>gJVu{7@Enf-`<7JR5M` zYQTUW|23dKP(6fbOiNFX1WeQH748pBPgCLZPWArlSV zWCTCxlfph*!7JS%lYJvo6J|{UBuNOB<$1*Wpp75NjM8jWFGdr=JC*8k@_AVas9y5D9=XmeDx(PG^{jr+=RZX=K8)Q=+1@XZR#C z^JFNak*{hTOacHU#gIF&W&|iH1v;bQ3KXH0(wvHrz-=Lbub>o!PP0ba!q1UqEJd!G$r_lN;-JcD zM9`B(3b(QyY?G=YJye~9iye_xjo@=@sBa_(pEr;VJJ^Zwk62{*46;lp0vjJ0X41kT z7MnJ&wy1j=ZKaSZBy_#W9=%s8V=GtBE+QD+F&Bwj!1jGp1+@?unHcIC`Raq!^T0+& zYwUUNA{DYgBd*Y|YxY+_oT%wfV$aIUZTD0*weQ*R!x4`3oJ*9Ry_BVc1xYpNd%0Nv zRFYOni3C%xpPR`bAXQp(ywAY!>7`Ilqr(}w>p6s72?;8K!ex}W;*B0W|FsAbJhy!b z&d2Ip_nP=csdQx$zfs?Mb(QAu7Dv+s7uc-7`l7_^I}xomS1He*)DhbtN}27G9S)tY?Clrw3;H zd9=?_k#B_HKMcK_hrT3#&i*Tb`|Tq(Eqyr_$>`k*Jr`*~)I-_DlKtX{JKda^qrf7S zr~#T2ZRZ!?NB~~>_@_-FXfpbB(60RRjJH2pvdMMDsMth9XE%yZ6NXgpgdg)2ESe^I z+-(Ao0#8*kcPJ1L5zWG^_&aYEg^v*Kh}&X&@R-1zgj?%>op#7>cjf_N%9W~(n^uga z{4{lcBMQwiGdJIx+!CrFS8)X&wOOZFvJs)qJi-45&|p7esj+s;2xip72Um$Y`d4)7 z^g|^S!+6L&y%nD>4u74+|5n(E4`}vq#D1jrWK(E)LXIHK`yGt)fE_t%bJCMlDFW@B z^bHSzr9t4gc;`eEhJ^dm1eNFf`8|1io^y2z{pMSg8FYX!cTawfX^n)oq|cfzI^W1l zsyTFg_PQT5KUd%x_7DI(zi{ma{mmOmPq6L=zL^R^#!X*6XG5=MlxL{J;li^^3^%vm zwRc*MJZZbp?(^zeGp{hb9=F>VJL!9BHc6;>SGjmBF^^SH*f;KZ#iMw`5n(DUnHH^V zRA-F75^Vm6ixRk}jeDZMz`98W!n#>#Km`+t?5(jr2Q298uu{G} zz(D^FHh}0m0m;vR3<(;pfE}5s$995}rX#wOC>9+BZZ@-so(al~dW55Y#NF$ZNyT)b$6<`ePC(fauu+-rYbTLCyX*YhL@=!wMH5g_tlS;w)Gh zq4JC*U#eiivyYfpfhT+Q2Z4E%q+db<`098AxPus`JIl2>%i*Dp!I+#5xN)0ifa>2C zmxFtjAP$5^hX@u%#snu)xVzOANWl4RBrH)Z1LwGvKY@@~0&O6^I1(tZS!Rs9I`YXGOo$`WY zbNJy6_rdey)61??X9AV4!YF19$YEnQ^4{yb{aZ=tovCwDW_%RI)3(NFs(i}f7w779 zx#2?vJI{$wk*$$jzt1DGmD^QZjc1*dny+UhT@o&e4dcjCi>Wkp0>1^ju&g#@3-iko-|fB>7`e>?(e|Qk>+Jx3VOd`8lq69eJ Date: Tue, 23 Dec 2025 08:31:23 -0800 Subject: 
[PATCH 63/87] Add comprehensive workflow overview documentation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Document all 11 steps of the scRNA-seq analysis pipeline with:
- Parameters and default values
- Algorithms used (DESeq2, Harmony, GSEA, etc.)
- Input/output specifications
- Workflow diagram
- Output directory structure

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5
---
 problems_to_solve.md | 1 +
 .../snakemake_workflow/WORKFLOW_OVERVIEW.md | 380 ++++++++++++++++++
 2 files changed, 381 insertions(+)
 create mode 100644 src/BetterCodeBetterScience/rnaseq/snakemake_workflow/WORKFLOW_OVERVIEW.md

diff --git a/problems_to_solve.md b/problems_to_solve.md
index b2d4a7d..cca966d 100644
--- a/problems_to_solve.md
+++ b/problems_to_solve.md
@@ -3,6 +3,7 @@
 Open problems marked with [ ]
 Fixed problems marked with [x]
 
+
 [x] I would now like to add another workflow, with code saved to src/BetterCodeBetterScience/rnaseq/snakemake_workflow. This workflow will use the Snakemake workflow manager (https://snakemake.readthedocs.io/en/stable/index.html); otherwise it should be functionally equivalent to the other workflows already developed.
 - Created `snakemake_workflow/` directory with:
   - `Snakefile`: Main workflow entry point
diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/WORKFLOW_OVERVIEW.md b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/WORKFLOW_OVERVIEW.md
new file mode 100644
index 0000000..ec4f27a
--- /dev/null
+++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/WORKFLOW_OVERVIEW.md
@@ -0,0 +1,380 @@
+# Snakemake scRNA-seq Immune Aging Workflow
+
+## Overview
+
+This workflow analyzes single-cell RNA sequencing (scRNA-seq) data to investigate gene expression changes associated with aging in immune cells. It processes data from the OneK1K dataset (peripheral blood mononuclear cells from 982 donors) through quality control, normalization, dimensionality reduction, and per-cell-type differential expression analysis.
+
+The workflow is divided into two phases:
+- **Global Steps (1-7)**: Process the entire dataset
+- **Per-Cell-Type Steps (8-11)**: Run independently for each cell type discovered in Step 7
+
+---
+
+## Workflow Diagram
+
+```
+┌─────────────────────────────────────────────────────────────────────────┐
+│                           GLOBAL STEPS (1-7)                            │
+├─────────────────────────────────────────────────────────────────────────┤
+│                                                                         │
+│  Step 1: Download ──► Step 2: Filter ──► Step 3: QC ──► Step 4: Preprocess
+│                                                                 │       │
+│                                                                 ▼       │
+│               Step 7: Pseudobulk ◄── Step 6: Cluster ◄── Step 5: DimRed
+│                              │                                          │
+└──────────────────────────────┼──────────────────────────────────────────┘
+                               │
+                               ▼ (discovers N cell types)
+┌──────────────────────────────────────────────────────────────────────────┐
+│                        PER-CELL-TYPE STEPS (8-11)                        │
+│                   Runs in parallel for each cell type                    │
+├──────────────────────────────────────────────────────────────────────────┤
+│                                                                          │
+│  For each cell type:                                                     │
+│                                                                          │
+│    Step 8: Differential Expression                                       │
+│       │                                                                  │
+│       ├──► Step 9: GSEA (Pathway Analysis)                               │
+│       │                                                                  │
+│       ├──► Step 10: Enrichr (Overrepresentation)                         │
+│       │                                                                  │
+│       └──► Step 11: Predictive Modeling (Age Prediction)                 │
+│                                                                          │
+└──────────────────────────────────────────────────────────────────────────┘
+```
+
+---
+
+## Step Details
+
+### Step 1: Data Download
+
+**Purpose**: Download the raw scRNA-seq dataset from CELLxGENE.
+
+| Parameter | Default Value | Description |
+|-----------|--------------|-------------|
+| `url` | CELLxGENE URL | Source URL for the h5ad file |
+| `dataset_name` | "OneK1K" | Dataset identifier |
+
+**Input**: None (downloads from URL)
+**Output**: Raw h5ad file (~1.2M cells × 35K genes)
+
+---
+
+### Step 2: Data Filtering
+
+**Purpose**: Remove low-quality donors and rare cell types to ensure robust downstream analysis.
+
+**Operations**:
+1. **Donor filtering**: Remove donors with abnormally low cell counts
+   - Uses percentile-based cutoff to identify outliers
+2. **Cell type filtering**: Keep only cell types present in sufficient donors
+   - Requires minimum cells per cell type in a threshold percentage of donors
+3. **Zero-count gene removal**: Remove genes with no expression across retained cells
+4. **Memory loading**: Convert lazy-loaded data to in-memory representation
+
+| Parameter | Default Value | Description |
+|-----------|--------------|-------------|
+| `cutoff_percentile` | 1.0 | Percentile for donor cell count cutoff |
+| `min_cells_per_celltype` | 10 | Minimum cells per cell type per donor |
+| `percent_donors` | 0.95 | Fraction of donors that must have the cell type |
+
+**Input**: Raw h5ad file
+**Output**: Filtered h5ad checkpoint
+
+---
+
+### Step 3: Quality Control (QC)
+
+**Purpose**: Identify and remove low-quality cells, dying cells, and doublets.
+
+**Operations**:
+1. **Gene annotation**: Identify mitochondrial (MT-), ribosomal (RPS/RPL), and hemoglobin (HB) genes
+2. **QC metric calculation**: Compute per-cell metrics using scanpy
+   - Total counts, genes detected, % mitochondrial, % ribosomal, % hemoglobin
+3. **Cell filtering**: Remove cells outside quality thresholds
+4. **Doublet detection**: Run Scrublet per-donor to identify and remove doublets
+5. **Raw count preservation**: Store raw counts in a layer for pseudobulking
+
+| Parameter | Default Value | Description |
+|-----------|--------------|-------------|
+| `min_genes` | 200 | Minimum genes per cell |
+| `max_genes` | 6000 | Maximum genes per cell (doublet filter) |
+| `min_counts` | 500 | Minimum UMIs per cell |
+| `max_counts` | 30000 | Maximum UMIs per cell (doublet filter) |
+| `max_hb_pct` | 5.0 | Maximum hemoglobin % (RBC contamination) |
+| `expected_doublet_rate` | 0.06 | Expected doublet rate for Scrublet |
+
+**Algorithm**: Scrublet (doublet detection)
+**Input**: Filtered h5ad
+**Output**: QC-filtered h5ad checkpoint
+
+---
+
+### Step 4: Preprocessing
+
+**Purpose**: Normalize expression values and select informative genes for downstream analysis.
+
+**Operations**:
+1. **Normalization**: Scale counts to target sum per cell (CPM-like)
+2. **Log transformation**: Apply log1p transformation for variance stabilization
+3. **HVG selection**: Identify highly variable genes using Seurat v3 method
+   - Accounts for batch effects using donor as batch key
+4. **Nuisance gene removal**: Exclude from HVG list:
+   - TCR/BCR variable region genes (IG[HKL]V, TR[ABDG]V patterns)
+   - Mitochondrial genes (MT-*)
+   - Ribosomal genes (RPS*, RPL*)
+5. **PCA**: Compute principal components on HVG subset
+
+| Parameter | Default Value | Description |
+|-----------|--------------|-------------|
+| `target_sum` | 10000 | Target sum for normalization |
+| `n_top_genes` | 3000 | Number of highly variable genes |
+| `batch_key` | "donor_id" | Column for batch correction in HVG selection |
+
+**Algorithms**:
+- Normalization: scanpy `normalize_total`
+- HVG: Seurat v3 method (`flavor="seurat_v3"`)
+- PCA: ARPACK solver
+
+**Input**: QC-filtered h5ad
+**Output**: Preprocessed h5ad checkpoint
+
+---
+
+### Step 5: Dimensionality Reduction
+
+**Purpose**: Reduce dimensionality and correct for batch effects for visualization and clustering.
+
+**Operations**:
+1. **Batch correction**: Run Harmony integration on PCA coordinates
+   - Corrects for donor-specific technical effects
+2. **Neighbor graph**: Compute k-nearest neighbor graph in corrected PCA space
+3. **UMAP**: Generate 2D embedding for visualization
+
+| Parameter | Default Value | Description |
+|-----------|--------------|-------------|
+| `batch_key` | "donor_id" | Column for batch correction |
+| `n_neighbors` | 30 | Number of neighbors for graph |
+| `n_pcs` | 40 | Number of PCs to use |
+
+**Algorithms**:
+- Batch correction: Harmony (harmony-pytorch)
+- Neighbor graph: scanpy `pp.neighbors` (uses pynndescent/numba)
+- UMAP: scanpy `tl.umap`
+
+**Input**: Preprocessed h5ad
+**Output**: Dimensionality-reduced h5ad checkpoint
+
+---
+
+### Step 6: Clustering
+
+**Purpose**: Cluster cells for visualization and validation (uses pre-existing cell type annotations).
+
+**Operations**:
+1. **Leiden clustering**: Community detection on neighbor graph
+2. **UMAP visualization**: Plot clusters colored by cell type
+
+| Parameter | Default Value | Description |
+|-----------|--------------|-------------|
+| `resolution` | 1.0 | Leiden clustering resolution |
+
+**Algorithm**: Leiden clustering (scanpy `tl.leiden`)
+
+**Input**: Dimensionality-reduced h5ad
+**Output**: Clustered h5ad checkpoint
+
+---
+
+### Step 7: Pseudobulking (Checkpoint)
+
+**Purpose**: Aggregate single-cell counts to donor-level pseudobulk samples for differential expression analysis.
+
+**Operations**:
+1. **Count aggregation**: Sum raw counts per (cell_type, donor) combination
+   - Uses one-hot encoding for efficient sparse matrix multiplication
+2. **Metadata preservation**: Retain donor-level metadata (age, sex, etc.)
+3. **Sample filtering**: Remove pseudobulk samples with too few contributing cells
+4. **Cell type discovery**: Identify cell types with sufficient samples for analysis
+
+| Parameter | Default Value | Description |
+|-----------|--------------|-------------|
+| `group_col` | "cell_type" | Column for grouping cells |
+| `donor_col` | "donor_id" | Column for donor identity |
+| `metadata_cols` | ["development_stage", "sex"] | Metadata to preserve |
+| `min_cells` | 10 | Minimum cells per pseudobulk sample |
+| `min_samples_per_cell_type` | 10 | Minimum samples to include cell type |
+
+**Note**: This step uses Snakemake's `checkpoint` mechanism to dynamically determine which cell types to analyze in subsequent steps.
+
+**Input**: QC checkpoint (raw counts), filtered checkpoint (gene names)
+**Output**: Pseudobulk h5ad, cell_types.json, var_to_feature.json
+
+---
+
+### Step 8: Differential Expression (Per-Cell-Type)
+
+**Purpose**: Identify genes associated with aging within each cell type.
+
+**Operations**:
+1. **Age extraction**: Parse numeric age from development_stage field
+2. **Age scaling**: Z-score normalize age for stable model fitting
+3. **DESeq2 analysis**: Fit negative binomial GLM with design `~ age_scaled + sex`
+4. **Wald test**: Test for age effect (contrast on age_scaled coefficient)
+5. **Multiple testing**: Apply Benjamini-Hochberg FDR correction
+
+| Parameter | Default Value | Description |
+|-----------|--------------|-------------|
+| `design_factors` | ["age_scaled", "sex"] | Model covariates |
+| `n_cpus` | 8 | CPUs for DESeq2 |
+
+**Algorithm**: DESeq2 (via PyDESeq2)
+- Dispersion estimation with shrinkage
+- Wald test for coefficient significance
+- Cook's distance for outlier detection
+
+**Input**: Pseudobulk h5ad, var_to_feature mapping
+**Output**: DESeq2 statistics (pkl), results table (parquet), counts (parquet)
+
+---
+
+### Step 9: Pathway Analysis - GSEA (Per-Cell-Type)
+
+**Purpose**: Identify biological pathways enriched in aging-associated genes using ranked gene set enrichment.
+
+**Operations**:
+1. **Gene ranking**: Rank genes by DESeq2 test statistic
+2. **Preranked GSEA**: Run against MSigDB Hallmark gene sets
+3. **NES calculation**: Compute normalized enrichment scores
+4. **Visualization**: Generate pathway enrichment plots
+
+| Parameter | Default Value | Description |
+|-----------|--------------|-------------|
+| `gene_sets` | ["MSigDB_Hallmark_2020"] | Gene set databases |
+| `n_top` | 10 | Number of top pathways to display |
+
+**Algorithm**: GSEA prerank (via gseapy)
+- 1000 permutations for p-value estimation
+- Min pathway size: 10 genes, Max: 1000 genes
+
+**Input**: DE results (parquet)
+**Output**: GSEA results (pkl)
+
+---
+
+### Step 10: Overrepresentation Analysis - Enrichr (Per-Cell-Type)
+
+**Purpose**: Test for pathway enrichment in significantly up/down-regulated gene sets.
+
+**Operations**:
+1. **Gene set extraction**: Separate significant genes (padj < 0.05) by direction
+2. **Enrichr analysis**: Query Enrichr web service for pathway enrichment
+3. **Visualization**: Generate dot plots for enriched pathways
+
+| Parameter | Default Value | Description |
+|-----------|--------------|-------------|
+| `gene_sets` | ["MSigDB_Hallmark_2020"] | Gene set databases |
+| `padj_threshold` | 0.05 | Significance threshold for gene selection |
+| `n_top` | 10 | Number of top pathways to display |
+
+**Algorithm**: Enrichr (via gseapy)
+- Fisher's exact test with FDR correction
+
+**Input**: DE results (parquet)
+**Output**: Enrichr results for up/down genes (pkl)
+
+---
+
+### Step 11: Predictive Modeling (Per-Cell-Type)
+
+**Purpose**: Assess whether gene expression in each cell type can predict donor age.
+
+**Operations**:
+1. **Feature preparation**: Combine gene expression with sex as features
+2. **Cross-validation**: 5-fold shuffle split
+3. **Model training**: Linear Support Vector Regression with scaling
+4. **Baseline comparison**: Compare full model (genes + sex) vs. baseline (sex only)
+5. **Metrics**: R² score and Mean Absolute Error (MAE)
+
+| Parameter | Default Value | Description |
+|-----------|--------------|-------------|
+| `n_splits` | 5 | Number of CV folds |
+
+**Algorithm**: Linear SVR (scikit-learn)
+- StandardScaler for feature normalization
+- C=1.0 regularization
+- Max 10,000 iterations
+
+**Input**: Counts (parquet), pseudobulk metadata
+**Output**: Prediction results with CV metrics (pkl)
+
+---
+
+## Output Structure
+
+```
+{datadir}/workflow/
+├── checkpoints/
+│   ├── dataset-{name}_step-02_desc-filtered.h5ad
+│   ├── dataset-{name}_step-03_desc-qc.h5ad
+│   ├── dataset-{name}_step-04_desc-preprocessed.h5ad
+│   ├── dataset-{name}_step-05_desc-dimreduced.h5ad
+│   ├── dataset-{name}_step-06_desc-clustered.h5ad
+│   ├── dataset-{name}_step-07_desc-pseudobulk.h5ad
+│   ├── dataset-{name}_step-07_cell_types.json
+│   └── dataset-{name}_step-07_var_to_feature.json
+├── results/
+│   └── per_cell_type/
+│       └── {cell_type}/
+│           ├── stat_res.pkl
+│           ├── de_results.parquet
+│           ├── counts.parquet
+│           ├── gsea_results.pkl
+│           ├── enrichr_up.pkl
+│           ├── enrichr_down.pkl
+│           └── prediction_results.pkl
+├── figures/
+│   ├── donor_cell_counts_distribution.png
+│   ├── qc_violin_plots.png
+│   ├── hemoglobin_distribution.png
+│   ├── pca_cell_type.png
+│   ├── umap_total_counts.png
+│   └── per_cell_type/{cell_type}/
+│       ├── gsea_pathways.png
+│       └── enrichr_*.png
+└── logs/
+    └── step*.log
+```
+
+---
+
+## Usage
+
+```bash
+# Run full workflow
+snakemake --cores 16 --config datadir=/path/to/data
+
+# Dry run (see what would be executed)
+snakemake -n --config datadir=/path/to/data
+
+# Run only preprocessing (steps 1-6)
+snakemake --cores 16 preprocessing_only --config datadir=/path/to/data
+
+# Force re-run from a specific step
+snakemake --cores 16 --forcerun dimensionality_reduction --config datadir=/path/to/data
+```
+
+---
+
+## Key Dependencies
+
+| Package | Purpose |
+|---------|---------|
+| scanpy | scRNA-seq analysis framework |
+| anndata | Data structure for
scRNA-seq | +| harmony-pytorch | Batch correction | +| pydeseq2 | Differential expression | +| gseapy | GSEA and Enrichr analysis | +| scikit-learn | Predictive modeling | +| snakemake | Workflow management | From 9461cda9fe022870bfa136d23b80eaa4fa3cddc2 Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Tue, 23 Dec 2025 08:31:42 -0800 Subject: [PATCH 64/87] update deps --- uv.lock | 266 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 266 insertions(+) diff --git a/uv.lock b/uv.lock index 37de7fe..bc7755a 100644 --- a/uv.lock +++ b/uv.lock @@ -286,6 +286,15 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/42/b9/f8d6fa329ab25128b7e98fd83a3cb34d9db5b059a9847eddb840a0af45dd/argon2_cffi_bindings-25.1.0-cp39-abi3-win_arm64.whl", hash = "sha256:b0fdbcf513833809c882823f98dc2f931cf659d9a1429616ac3adebb49f5db94", size = 27149, upload-time = "2025-07-30T10:01:59.329Z" }, ] +[[package]] +name = "argparse-dataclass" +version = "2.0.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/1a/ff/a2e4e328075ddef2ac3c9431eb12247e4ba707a70324894f1e6b4f43c286/argparse_dataclass-2.0.0.tar.gz", hash = "sha256:09ab641c914a2f12882337b9c3e5086196dbf2ee6bf0ef67895c74002cc9297f", size = 6395, upload-time = "2023-06-11T20:32:54.465Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/b3/66/e6c0a808950ba5a4042e2fcedd577fc7401536c7db063de4d7c36be06f84/argparse_dataclass-2.0.0-py3-none-any.whl", hash = "sha256:3ffc8852a88d9d98d1364b4441a712491320afb91fb56049afd8a51d74bb52d2", size = 8762, upload-time = "2023-06-11T20:32:52.724Z" }, +] + [[package]] name = "array-api-compat" version = "1.12.0" @@ -513,6 +522,7 @@ dependencies = [ { name = "scipy" }, { name = "scrublet" }, { name = "seaborn" }, + { name = "snakemake" }, { name = "statsmodels" }, { name = "templateflow" }, { name = "tomli" }, @@ -592,6 +602,7 @@ requires-dist = [ { name = "scipy", specifier = ">=1.14.1" }, { name = 
"scrublet", specifier = ">=0.2.3" }, { name = "seaborn", specifier = ">=0.13.2" }, + { name = "snakemake", specifier = ">=8.0" }, { name = "statsmodels", specifier = ">=0.14.5" }, { name = "templateflow", specifier = ">=25.1.1" }, { name = "tomli", specifier = ">=2.2.1" }, @@ -1021,6 +1032,18 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/60/97/891a0971e1e4a8c5d2b20bbe0e524dc04548d2307fee33cdeba148fd4fc7/comm-0.2.3-py3-none-any.whl", hash = "sha256:c615d91d75f7f04f095b30d1c1711babd43bdc6419c1be9886a85f2f4e489417", size = 7294, upload-time = "2025-07-25T14:02:02.896Z" }, ] +[[package]] +name = "conda-inject" +version = "1.3.2" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "pyyaml" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/b1/a8/8dc86113c65c949cc72d651461d6e4c544b3302a85ed14a5298829e6a419/conda_inject-1.3.2.tar.gz", hash = "sha256:0b8cde8c47998c118d8ff285a04977a3abcf734caf579c520fca469df1cd0aac", size = 3635, upload-time = "2024-05-27T12:20:58.873Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/87/4c/fc30b69fb4062aee57e3ab7ff493647c4220144908f0839c619f912045bf/conda_inject-1.3.2-py3-none-any.whl", hash = "sha256:6e641b408980c2814e3e527008c30749117909a21ff47392f07ef807da93a564", size = 4133, upload-time = "2024-05-27T12:20:57.332Z" }, +] + [[package]] name = "confection" version = "0.1.5" @@ -1034,6 +1057,21 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/0c/00/3106b1854b45bd0474ced037dfe6b73b90fe68a68968cef47c23de3d43d2/confection-0.1.5-py3-none-any.whl", hash = "sha256:e29d3c3f8eac06b3f77eb9dfb4bf2fc6bcc9622a98ca00a698e3d019c6430b14", size = 35451, upload-time = "2024-05-31T16:16:59.075Z" }, ] +[[package]] +name = "configargparse" +version = "1.7.1" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/85/4d/6c9ef746dfcc2a32e26f3860bb4a011c008c392b83eabdfb598d1a8bbe5d/configargparse-1.7.1.tar.gz", hash = 
"sha256:79c2ddae836a1e5914b71d58e4b9adbd9f7779d4e6351a637b7d2d9b6c46d3d9", size = 43958, upload-time = "2025-05-23T14:26:17.369Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/31/28/d28211d29bcc3620b1fece85a65ce5bb22f18670a03cd28ea4b75ede270c/configargparse-1.7.1-py3-none-any.whl", hash = "sha256:8b586a31f9d873abd1ca527ffbe58863c99f36d896e2829779803125e83be4b6", size = 25607, upload-time = "2025-05-23T14:26:15.923Z" }, +] + +[[package]] +name = "connection-pool" +version = "0.0.3" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/bd/df/c9b4e25dce00f6349fd28aadba7b6c3f7431cc8bd4308a158fbe57b6a22e/connection_pool-0.0.3.tar.gz", hash = "sha256:bf429e7aef65921c69b4ed48f3d48d3eac1383b05d2df91884705842d974d0dc", size = 3795, upload-time = "2020-09-17T02:48:28.824Z" } + [[package]] name = "contourpy" version = "1.3.3" @@ -1435,6 +1473,15 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/0c/d5/c5db1ea3394c6e1732fb3286b3bd878b59507a8f77d32a2cebda7d7b7cd4/donfig-0.8.1.post1-py3-none-any.whl", hash = "sha256:2a3175ce74a06109ff9307d90a230f81215cbac9a751f4d1c6194644b8204f9d", size = 21592, upload-time = "2024-05-23T14:13:55.283Z" }, ] +[[package]] +name = "dpath" +version = "2.2.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/b5/ce/e1fd64d36e4a5717bd5e6b2ad188f5eaa2e902fde871ea73a79875793fc9/dpath-2.2.0.tar.gz", hash = "sha256:34f7e630dc55ea3f219e555726f5da4b4b25f2200319c8e6902c394258dd6a3e", size = 28266, upload-time = "2024-06-12T22:08:03.686Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/05/d1/8952806fbf9583004ab479d8f58a9496c3d35f6b6009ddd458bdd9978eaf/dpath-2.2.0-py3-none-any.whl", hash = "sha256:b330a375ded0a0d2ed404440f6c6a715deae5313af40bbb01c8a41d891900576", size = 17618, upload-time = "2024-06-12T22:08:01.881Z" }, +] + [[package]] name = "durationpy" version = "0.10" @@ -1751,6 +1798,30 @@ wheels 
= [ { url = "https://files.pythonhosted.org/packages/36/93/f7d93f394eaa5f96d8249fb582034dfb5a7d1eb4007ad4a1cc65c2e17463/funowl-0.2.3-py3-none-any.whl", hash = "sha256:4c4328d03c7815cd61d6691f0fafc78dc9a78ec3dcab4c83afb64d125ad3660e", size = 51376, upload-time = "2023-08-08T17:38:14.735Z" }, ] +[[package]] +name = "gitdb" +version = "4.0.12" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "smmap" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/72/94/63b0fc47eb32792c7ba1fe1b694daec9a63620db1e313033d18140c2320a/gitdb-4.0.12.tar.gz", hash = "sha256:5ef71f855d191a3326fcfbc0d5da835f26b13fbcba60c32c21091c349ffdb571", size = 394684, upload-time = "2025-01-02T07:20:46.413Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/a0/61/5c78b91c3143ed5c14207f463aecfc8f9dbb5092fb2869baf37c273b2705/gitdb-4.0.12-py3-none-any.whl", hash = "sha256:67073e15955400952c6565cc3e707c554a4eea2e428946f7a4c162fab9bd9bcf", size = 62794, upload-time = "2025-01-02T07:20:43.624Z" }, +] + +[[package]] +name = "gitpython" +version = "3.1.45" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "gitdb" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/9a/c8/dd58967d119baab745caec2f9d853297cec1989ec1d63f677d3880632b88/gitpython-3.1.45.tar.gz", hash = "sha256:85b0ee964ceddf211c41b9f27a49086010a190fd8132a24e21f362a4b36a791c", size = 215076, upload-time = "2025-07-24T03:45:54.871Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/01/61/d4b89fec821f72385526e1b9d9a3a0385dda4a72b206d28049e2c7cd39b8/gitpython-3.1.45-py3-none-any.whl", hash = "sha256:8908cb2e02fb3b93b7eb0f2827125cb699869470432cc885f019b8fd0fccff77", size = 208168, upload-time = "2025-07-24T03:45:52.517Z" }, +] + [[package]] name = "google-auth" version = "2.45.0" @@ -2201,6 +2272,22 @@ wheels = [ { url = 
"https://files.pythonhosted.org/packages/ff/62/85c4c919272577931d407be5ba5d71c20f0b616d31a0befe0ae45bb79abd/imagesize-1.4.1-py2.py3-none-any.whl", hash = "sha256:0d8d18d08f840c19d0ee7ca1fd82490fdc3729b7ac93f49870406ddde8ef8d8b", size = 8769, upload-time = "2022-07-01T12:21:02.467Z" }, ] +[[package]] +name = "immutables" +version = "0.21" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/69/41/0ccaa6ef9943c0609ec5aa663a3b3e681c1712c1007147b84590cec706a0/immutables-0.21.tar.gz", hash = "sha256:b55ffaf0449790242feb4c56ab799ea7af92801a0a43f9e2f4f8af2ab24dfc4a", size = 89008, upload-time = "2024-10-10T00:55:01.434Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/4d/f9/0c46f600702b815182212453f5514c0070ee168b817cdf7c3767554c8489/immutables-0.21-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:ef1ed262094b755903122c3c3a83ad0e0d5c3ab7887cda12b2fe878769d1ee0d", size = 31885, upload-time = "2024-10-10T00:54:19.406Z" }, + { url = "https://files.pythonhosted.org/packages/29/34/7608d2eab6179aa47e8f59ab0fbd5b3eeb2333d78c9dc2da0de8de4ed322/immutables-0.21-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:ce604f81d9d8f26e60b52ebcb56bb5c0462c8ea50fb17868487d15f048a2f13e", size = 31537, upload-time = "2024-10-10T00:54:20.998Z" }, + { url = "https://files.pythonhosted.org/packages/f7/52/cb9e2bb7a69338155ffabbd2f993c968c750dd2d5c6c6eaa6ebb7bfcbdfa/immutables-0.21-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:b48b116aaca4500398058b5a87814857a60c4cb09417fecc12d7da0f5639b73d", size = 104270, upload-time = "2024-10-10T00:54:21.912Z" }, + { url = "https://files.pythonhosted.org/packages/0f/a4/25df835a9b9b372a4a869a8a1ac30a32199f2b3f581ad0e249f7e3d19eed/immutables-0.21-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:dad7c0c74b285cc0e555ec0e97acbdc6f1862fcd16b99abd612df3243732e741", size = 104864, upload-time = 
"2024-10-10T00:54:22.956Z" }, + { url = "https://files.pythonhosted.org/packages/4a/51/b548fbc657134d658e179ee8d201ae82d9049aba5c3cb2d858ed2ecb7e3f/immutables-0.21-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:e44346e2221a5a676c880ca8e0e6429fa24d1a4ae562573f5c04d7f2e759b030", size = 99733, upload-time = "2024-10-10T00:54:23.99Z" }, + { url = "https://files.pythonhosted.org/packages/47/db/d7b1e0e88faf07fe9a88579a86f58078a9a37fff871f4b3dbcf28cad9a12/immutables-0.21-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:8b10139b529a460e53fe8be699ebd848c54c8a33ebe67763bcfcc809a475a26f", size = 101698, upload-time = "2024-10-10T00:54:25.734Z" }, + { url = "https://files.pythonhosted.org/packages/69/2d/6fe42a1a053dd8cfb9f45e91d5246522637c7287dc6bd347f67aedf7aedb/immutables-0.21-cp312-cp312-win32.whl", hash = "sha256:fc512d808662614feb17d2d92e98f611d69669a98c7af15910acf1dc72737038", size = 30977, upload-time = "2024-10-10T00:54:27.436Z" }, + { url = "https://files.pythonhosted.org/packages/63/45/d062aca6971e99454ce3ae42a7430037227fee961644ed1f8b6c9b99e0a5/immutables-0.21-cp312-cp312-win_amd64.whl", hash = "sha256:461dcb0f58a131045155e52a2c43de6ec2fe5ba19bdced6858a3abb63cee5111", size = 35088, upload-time = "2024-10-10T00:54:28.388Z" }, +] + [[package]] name = "importlib-metadata" version = "8.7.1" @@ -4573,6 +4660,15 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/22/a6/858897256d0deac81a172289110f31629fc4cee19b6f01283303e18c8db3/ptyprocess-0.7.0-py2.py3-none-any.whl", hash = "sha256:4b41f3967fce3af57cc7e94b888626c18bf37a083e3651ca8feeb66d492fef35", size = 13993, upload-time = "2020-12-28T15:15:28.35Z" }, ] +[[package]] +name = "pulp" +version = "3.3.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/16/1c/d880b739b841a8aa81143091c9bdda5e72e226a660aa13178cb312d4b27f/pulp-3.3.0.tar.gz", hash = "sha256:7eb99b9ce7beeb8bbb7ea9d1c919f02f003ab7867e0d1e322f2f2c26dd31c8ba", size = 16301847, 
upload-time = "2025-09-18T08:14:57.552Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/99/6c/64cafaceea3f99927e84b38a362ec6a8f24f33061c90bda77dfe1cd4c3c6/pulp-3.3.0-py3-none-any.whl", hash = "sha256:dd6ad2d63f196d1254eddf9dcff5cd224912c1f046120cb7c143c5b0eda63fae", size = 16387700, upload-time = "2025-09-18T08:14:53.368Z" }, +] + [[package]] name = "pure-eval" version = "0.2.3" @@ -5382,6 +5478,15 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/3f/51/d4db610ef29373b879047326cbf6fa98b6c1969d6f6dc423279de2b1be2c/requests_toolbelt-1.0.0-py2.py3-none-any.whl", hash = "sha256:cccfdd665f0a24fcf4726e690f65639d272bb0637b9b92dfd91a5568ccf6bd06", size = 54481, upload-time = "2023-05-01T04:11:28.427Z" }, ] +[[package]] +name = "reretry" +version = "0.11.8" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/40/1d/25d562a62b7471616bccd7c15a7533062eb383927e68667bf331db990415/reretry-0.11.8.tar.gz", hash = "sha256:f2791fcebe512ea2f1d153a2874778523a8064860b591cd90afc21a8bed432e3", size = 4836, upload-time = "2022-12-18T11:08:50.641Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/66/11/e295e07d4ae500144177f875a8de11daa4d86b8246ab41c76a98ce9280ca/reretry-0.11.8-py2.py3-none-any.whl", hash = "sha256:5ec1084cd9644271ee386d34cd5dd24bdb3e91d55961b076d1a31d585ad68a79", size = 5609, upload-time = "2022-12-18T11:08:49.1Z" }, +] + [[package]] name = "rfc3339-validator" version = "0.1.4" @@ -5871,6 +5976,135 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/ad/95/bc978be7ea0babf2fb48a414b6afaad414c6a9e8b1eafc5b8a53c030381a/smart_open-7.5.0-py3-none-any.whl", hash = "sha256:87e695c5148bbb988f15cec00971602765874163be85acb1c9fb8abc012e6599", size = 63940, upload-time = "2025-11-08T21:38:39.024Z" }, ] +[[package]] +name = "smmap" +version = "5.0.2" +source = { registry = "https://pypi.org/simple" } +sdist = { url = 
"https://files.pythonhosted.org/packages/44/cd/a040c4b3119bbe532e5b0732286f805445375489fceaec1f48306068ee3b/smmap-5.0.2.tar.gz", hash = "sha256:26ea65a03958fa0c8a1c7e8c7a58fdc77221b8910f6be2131affade476898ad5", size = 22329, upload-time = "2025-01-02T07:14:40.909Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/04/be/d09147ad1ec7934636ad912901c5fd7667e1c858e19d355237db0d0cd5e4/smmap-5.0.2-py3-none-any.whl", hash = "sha256:b30115f0def7d7531d22a0fb6502488d879e75b260a9db4d0819cfb25403af5e", size = 24303, upload-time = "2025-01-02T07:14:38.724Z" }, +] + +[[package]] +name = "snakemake" +version = "9.14.5" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "appdirs" }, + { name = "conda-inject" }, + { name = "configargparse" }, + { name = "connection-pool" }, + { name = "docutils" }, + { name = "dpath" }, + { name = "gitpython" }, + { name = "humanfriendly" }, + { name = "immutables" }, + { name = "jinja2" }, + { name = "jsonschema" }, + { name = "nbformat" }, + { name = "packaging" }, + { name = "psutil" }, + { name = "pulp" }, + { name = "pyyaml" }, + { name = "referencing" }, + { name = "requests" }, + { name = "reretry" }, + { name = "smart-open" }, + { name = "snakemake-interface-common" }, + { name = "snakemake-interface-executor-plugins" }, + { name = "snakemake-interface-logger-plugins" }, + { name = "snakemake-interface-report-plugins" }, + { name = "snakemake-interface-scheduler-plugins" }, + { name = "snakemake-interface-storage-plugins" }, + { name = "tabulate" }, + { name = "throttler" }, + { name = "wrapt" }, + { name = "yte" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/0c/28/b6ad097922b45a85f792bdf9b4f823ee6f37e72d80d471d0b13262309398/snakemake-9.14.5.tar.gz", hash = "sha256:f66b181806f02f9d7f43542bd85091a2e65ab126dfcb7bd869a622917d786bfb", size = 6745401, upload-time = "2025-12-15T13:55:25.898Z" } +wheels = [ + { url = 
"https://files.pythonhosted.org/packages/e0/a7/a13ec64b15cbd3dd0797847cb529e118e939d46968e80d0cdf7f5c638374/snakemake-9.14.5-py3-none-any.whl", hash = "sha256:a0f23d2250918553d283675051486f75c14f9fc33217a65810fefd52e292d390", size = 1132670, upload-time = "2025-12-15T13:55:23.721Z" }, +] + +[[package]] +name = "snakemake-interface-common" +version = "1.22.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "argparse-dataclass" }, + { name = "configargparse" }, + { name = "packaging" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/fd/be/cbd3f30c24eecb0e7d48f7025c770b7dc664124a01c8d9df6da73eb4fbd1/snakemake_interface_common-1.22.0.tar.gz", hash = "sha256:ef1fa710a15629be4cc352b938596ab5235ecf0b615c5845f086d6c5da10cb88", size = 13859, upload-time = "2025-09-30T17:11:00.92Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/0f/c4/2da11760cebae7cfc66304ce5dccbabf9f1323e3e0ab8091960b84ad2bd6/snakemake_interface_common-1.22.0-py3-none-any.whl", hash = "sha256:a68c57cba8569536195fc9b7db07bc2a91b56ad811636585dae0313b2ca2e1ce", size = 16840, upload-time = "2025-09-30T17:10:59.594Z" }, +] + +[[package]] +name = "snakemake-interface-executor-plugins" +version = "9.3.9" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "argparse-dataclass" }, + { name = "snakemake-interface-common" }, + { name = "throttler" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/18/51/e62e14090393d6688e7d4026a574f0a9de14ffb678bc4c6993306fc3e62a/snakemake_interface_executor_plugins-9.3.9.tar.gz", hash = "sha256:988ab388d48522fac84107867ae3f3398312b93b55df6ed7b99afc225468ca26", size = 16530, upload-time = "2025-07-29T15:34:21.993Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/7a/8b/fec4419acedfa5924549f40664cb2134f2ea5fae3d8a39df5e24035df06a/snakemake_interface_executor_plugins-9.3.9-py3-none-any.whl", hash = 
"sha256:bae310d5e258d5504731cca69d73051cd5ae1702d46fa66c03ef947be27fe09a", size = 22511, upload-time = "2025-07-29T15:34:20.472Z" }, +] + +[[package]] +name = "snakemake-interface-logger-plugins" +version = "2.0.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "snakemake-interface-common" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/c0/92/2fe4fa879a5d4408cad6db5330cd4ebd352e47529cb0fdfdf8ebf73f2920/snakemake_interface_logger_plugins-2.0.0.tar.gz", hash = "sha256:0e8ff2af4c55ca140d6ea1c1540e733a4b3944abae48fe0eaf6a707e5797cd17", size = 13767, upload-time = "2025-09-28T19:51:55.094Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/c3/1f/f0848d750e7ca675e2cc0ea5e14135f432db498e8ad8cf746a19108e9d55/snakemake_interface_logger_plugins-2.0.0-py3-none-any.whl", hash = "sha256:c06a6779528a60f0362049c3adfb558e64d071769691718c810ef3057fdb9ff3", size = 12293, upload-time = "2025-09-28T19:51:53.556Z" }, +] + +[[package]] +name = "snakemake-interface-report-plugins" +version = "1.3.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "snakemake-interface-common" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/18/d6/6160ed98de665d6871dd356597dbf726688cc786e88668359ca37b7d9f54/snakemake_interface_report_plugins-1.3.0.tar.gz", hash = "sha256:fc9495298bec4e69721ab8afe6d6d88a86966fda2eeb003db56b9a88b86d5934", size = 4283, upload-time = "2025-10-31T10:52:36.55Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/f0/f0/df73f6abc9b5910e43612ae28c7b6f666af80c4edd46a216ef47599ab6cb/snakemake_interface_report_plugins-1.3.0-py3-none-any.whl", hash = "sha256:78da3931f70e79eef51e5645a40b172929e555fe4d86ff45d6b856e521a379db", size = 7251, upload-time = "2025-10-31T10:52:35.474Z" }, +] + +[[package]] +name = "snakemake-interface-scheduler-plugins" +version = "2.0.2" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = 
"snakemake-interface-common" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/88/d9/d480807d2cfc2d132bc760d877d45ec8fbe620a24200ec4d2697c4a26031/snakemake_interface_scheduler_plugins-2.0.2.tar.gz", hash = "sha256:2797e8fa9019d983132c2b403f14d6fcd3c5ad4c8d8a66b984b4740a71cacc46", size = 8642, upload-time = "2025-10-20T13:58:12.988Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/0e/d0/f4e9894c8aaf37efe3bf1afe15ee3cf0546d82b2713a589e266ee47bf2ef/snakemake_interface_scheduler_plugins-2.0.2-py3-none-any.whl", hash = "sha256:b9ddfa508bd480711de1770dfb24f3b813cfa3cd0f862f0127ef721ae5346915", size = 10766, upload-time = "2025-10-20T13:58:11.898Z" }, +] + +[[package]] +name = "snakemake-interface-storage-plugins" +version = "4.3.2" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "reretry" }, + { name = "snakemake-interface-common" }, + { name = "throttler" }, + { name = "wrapt" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/d4/0c/906d09e4e99733b605a5b24b03fcdbe40c47787c770aea42421f225f9171/snakemake_interface_storage_plugins-4.3.2.tar.gz", hash = "sha256:2f45c6b784e2af5b6e7102d3cb700d597b7cf7515fcf02d7d1153065e90a7895", size = 14543, upload-time = "2025-12-01T13:30:30.754Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/80/7e/51e4d50494725c77116fc3978879babe1a15336d9b144bba061ec968e02a/snakemake_interface_storage_plugins-4.3.2-py3-none-any.whl", hash = "sha256:bd185233cb7882a58d79294ad2f8d1cead535744fe3c9d42d9ef51bc8f1744b1", size = 17800, upload-time = "2025-12-01T13:30:29.699Z" }, +] + [[package]] name = "sniffio" version = "1.3.1" @@ -6239,6 +6473,15 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/a2/09/77d55d46fd61b4a135c444fc97158ef34a095e5681d0a6c10b75bf356191/sympy-1.14.0-py3-none-any.whl", hash = "sha256:e091cc3e99d2141a0ba2847328f5479b05d94a6635cb96148ccb3f34671bd8f5", size = 6299353, upload-time = "2025-04-27T18:04:59.103Z" }, ] +[[package]] +name = 
"tabulate" +version = "0.9.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/ec/fe/802052aecb21e3797b8f7902564ab6ea0d60ff8ca23952079064155d1ae1/tabulate-0.9.0.tar.gz", hash = "sha256:0095b12bf5966de529c0feb1fa08671671b3368eec77d7ef7ab114be2c068b3c", size = 81090, upload-time = "2022-10-06T17:21:48.54Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/40/44/4a5f08c96eb108af5cb50b41f76142f0afa346dfa99d5296fe7202a11854/tabulate-0.9.0-py3-none-any.whl", hash = "sha256:024ca478df22e9340661486f85298cff5f6dcdba14f3813e8830015b9ed1948f", size = 35252, upload-time = "2022-10-06T17:21:44.262Z" }, +] + [[package]] name = "templateflow" version = "25.1.1" @@ -6335,6 +6578,15 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/32/d5/f9a850d79b0851d1d4ef6456097579a9005b31fea68726a4ae5f2d82ddd9/threadpoolctl-3.6.0-py3-none-any.whl", hash = "sha256:43a0b8fd5a2928500110039e43a5eed8480b918967083ea48dc3ab9f13c4a7fb", size = 18638, upload-time = "2025-03-13T13:49:21.846Z" }, ] +[[package]] +name = "throttler" +version = "1.2.2" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/b4/22/638451122136d5280bc477c8075ea448b9ebdfbd319f0f120edaecea2038/throttler-1.2.2.tar.gz", hash = "sha256:d54db406d98e1b54d18a9ba2b31ab9f093ac64a0a59d730c1cf7bb1cdfc94a58", size = 7970, upload-time = "2022-11-22T19:08:57.847Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/df/d4/36bf6010b184286000b2334622bfb3446a40c22c1d2a9776bff025cb0fe5/throttler-1.2.2-py3-none-any.whl", hash = "sha256:fc6ae612a2529e01110b32335af40375258b98e3b81232ec77cd07f51bf71392", size = 7609, upload-time = "2022-11-22T19:08:55.699Z" }, +] + [[package]] name = "tifffile" version = "2025.12.20" @@ -6936,6 +7188,20 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/73/ae/b48f95715333080afb75a4504487cbe142cae1268afc482d06692d605ae6/yarl-1.22.0-py3-none-any.whl", hash 
= "sha256:1380560bdba02b6b6c90de54133c81c9f2a453dee9912fe58c1dcced1edb7cff", size = 46814, upload-time = "2025-10-06T14:12:53.872Z" }, ] +[[package]] +name = "yte" +version = "1.9.4" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "argparse-dataclass" }, + { name = "dpath" }, + { name = "pyyaml" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/44/f5/7e44620e6e077bfe624b9a17c329b8e0d0159e176e1f1a93c2790428ab2c/yte-1.9.4.tar.gz", hash = "sha256:86a47e6d722cec9419a7ac88be57d0d6c4ce28f02860393b71a66f2c674069f6", size = 8101, upload-time = "2025-11-27T12:55:00.85Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/4d/63/6a44729fdc60eb255a7b156a84e7552290174a9bf151e3b6c18e83d6fbfa/yte-1.9.4-py3-none-any.whl", hash = "sha256:5dac63303d3e6bc2ebadc36ece3c3fb09343772fe6e25e9356d9baf8f9dfaf6d", size = 10618, upload-time = "2025-11-27T12:55:01.685Z" }, +] + [[package]] name = "zarr" version = "3.1.5" From 4e455a28a673a1414776b0f15d3b3e73047f433a Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Tue, 23 Dec 2025 09:05:45 -0800 Subject: [PATCH 65/87] Extract Prefect workflow params to config file, update output folders MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Create prefect_workflow/config/config.yaml with all workflow parameters - Add load_config() function to flows.py for YAML config loading - Update run_workflow() and analyze_single_cell_type() to accept config_path - Add --config CLI argument to run_workflow.py - Change Prefect output folder from "workflow" to "wf_prefect" - Change Snakemake output folder from "workflow" to "wf_snakemake" - Update WORKFLOW_OVERVIEW.md to reflect new output structure 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 --- problems_to_solve.md | 12 ++ .../prefect_workflow/config/config.yaml | 75 +++++++++ .../rnaseq/prefect_workflow/flows.py | 144 ++++++++++-------- 
.../rnaseq/prefect_workflow/run_workflow.py | 43 +++--- .../rnaseq/snakemake_workflow/Snakefile | 10 +- .../snakemake_workflow/WORKFLOW_OVERVIEW.md | 12 +- 6 files changed, 206 insertions(+), 90 deletions(-) create mode 100644 src/BetterCodeBetterScience/rnaseq/prefect_workflow/config/config.yaml diff --git a/problems_to_solve.md b/problems_to_solve.md index cca966d..ee73576 100644 --- a/problems_to_solve.md +++ b/problems_to_solve.md @@ -3,6 +3,18 @@ Open problems marked with [ ] Fixed problems marked with [x] +[x] For the Prefect workflow, the default parameters for each workflow module are embedded in the python code for the workflow. I would rather that they be defined using a configuration file. Please extract all of the parameters into a configuration file (using whatever format you think is most appropriate) and read those in during workflow execution rather than hard-coding. + - Created `prefect_workflow/config/config.yaml` with all workflow parameters + - Parameters organized by step: filtering, qc, preprocessing, dimred, clustering, pseudobulk, differential_expression, pathway_analysis, overrepresentation, predictive_modeling + - Added `load_config()` function to flows.py that loads from YAML file + - Updated `run_workflow()` and `analyze_single_cell_type()` to accept `config_path` parameter + - Added `--config` CLI argument to run_workflow.py + - Default config bundled with package; custom configs can be specified via CLI +[x] For the Prefect workflow, please save the output to a folder called "wf_prefect" (rather than "workflow") + - Updated all output directories in flows.py and run_workflow.py to use `wf_prefect/` instead of `workflow/` +[x] For the Snakemake workflow, please save the output to a folder called "wf_snakemake" (rather than "workflow") + - Updated Snakefile to use `wf_snakemake/` for CHECKPOINT_DIR, RESULTS_DIR, FIGURE_DIR, LOG_DIR + - Updated WORKFLOW_OVERVIEW.md to reflect new output structure [x] I would now like to add another 
workflow, with code saved to src/BetterCodeBetterScience/rnaseq/snakemake_workflow. This workflow will use the Snakemake workflow manager (https://snakemake.readthedocs.io/en/stable/index.html); otherwise it should be functionally equivalent to the other workflows already developed. - Created `snakemake_workflow/` directory with: diff --git a/src/BetterCodeBetterScience/rnaseq/prefect_workflow/config/config.yaml b/src/BetterCodeBetterScience/rnaseq/prefect_workflow/config/config.yaml new file mode 100644 index 0000000..e4043e2 --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/prefect_workflow/config/config.yaml @@ -0,0 +1,75 @@ +# Configuration for Prefect scRNA-seq immune aging workflow +# +# Usage: +# python -m BetterCodeBetterScience.rnaseq.prefect_workflow.run_workflow --config /path/to/config.yaml +# +# Override any parameter via CLI: +# python ... --dataset-name MyDataset --force-from 5 + +# Dataset configuration +dataset_name: "OneK1K" +url: "https://datasets.cellxgene.cziscience.com/a3f5651f-cd1a-4d26-8165-74964b79b4f2.h5ad" + +# Step 2: Filtering parameters +filtering: + cutoff_percentile: 1.0 + min_cells_per_celltype: 10 + percent_donors: 0.95 + +# Step 3: QC parameters +qc: + min_genes: 200 + max_genes: 6000 + min_counts: 500 + max_counts: 30000 + max_hb_pct: 5.0 + expected_doublet_rate: 0.06 + +# Step 4: Preprocessing parameters +preprocessing: + target_sum: 10000 + n_top_genes: 3000 + batch_key: "donor_id" + +# Step 5: Dimensionality reduction parameters +dimred: + batch_key: "donor_id" + n_neighbors: 30 + n_pcs: 40 + +# Step 6: Clustering parameters +clustering: + resolution: 1.0 + +# Step 7: Pseudobulking parameters +pseudobulk: + group_col: "cell_type" + donor_col: "donor_id" + metadata_cols: + - "development_stage" + - "sex" + min_cells: 10 + +# Steps 8-11: Per-cell-type analysis parameters +differential_expression: + design_factors: + - "age_scaled" + - "sex" + n_cpus: 8 + +pathway_analysis: + gene_sets: + - "MSigDB_Hallmark_2020" + n_top: 10 
+ +overrepresentation: + gene_sets: + - "MSigDB_Hallmark_2020" + padj_threshold: 0.05 + n_top: 10 + +predictive_modeling: + n_splits: 5 + +# Minimum samples per cell type for per-cell-type analysis +min_samples_per_cell_type: 10 diff --git a/src/BetterCodeBetterScience/rnaseq/prefect_workflow/flows.py b/src/BetterCodeBetterScience/rnaseq/prefect_workflow/flows.py index d57a9f7..ef40fdc 100644 --- a/src/BetterCodeBetterScience/rnaseq/prefect_workflow/flows.py +++ b/src/BetterCodeBetterScience/rnaseq/prefect_workflow/flows.py @@ -8,6 +8,7 @@ from pathlib import Path from typing import Any +import yaml from prefect import flow, get_run_logger from BetterCodeBetterScience.rnaseq.prefect_workflow.tasks import ( @@ -33,6 +34,31 @@ ) +def get_default_config_path() -> Path: + """Get the path to the default config file bundled with the package.""" + return Path(__file__).parent / "config" / "config.yaml" + + +def load_config(config_path: Path | None = None) -> dict[str, Any]: + """Load workflow configuration from YAML file. + + Parameters + ---------- + config_path : Path, optional + Path to config file. If None, uses the default config bundled with the package. + + Returns + ------- + dict + Configuration dictionary + """ + if config_path is None: + config_path = get_default_config_path() + + with open(config_path) as f: + return yaml.safe_load(f) + + def setup_file_logging(log_dir: Path) -> tuple[Path, logging.FileHandler]: """Set up file-based logging for the workflow. 
@@ -73,10 +99,8 @@ def setup_file_logging(log_dir: Path) -> tuple[Path, logging.FileHandler]: @flow(name="immune_aging_scrna_workflow", log_prints=False) def run_workflow( datadir: Path, - dataset_name: str = "OneK1K", - url: str = "https://datasets.cellxgene.cziscience.com/a3f5651f-cd1a-4d26-8165-74964b79b4f2.h5ad", + config_path: Path | None = None, force_from_step: int | None = None, - min_samples_per_cell_type: int = 10, ) -> dict[str, Any]: """Run the complete immune aging scRNA-seq workflow with Prefect. @@ -86,14 +110,10 @@ def run_workflow( ---------- datadir : Path Base directory for data files - dataset_name : str - Name of the dataset - url : str - URL to download data from + config_path : Path, optional + Path to config file. If None, uses the default config bundled with the package. force_from_step : int, optional If provided, forces re-run from this step onwards - min_samples_per_cell_type : int - Minimum samples required per cell type to run steps 8-11 Returns ------- @@ -102,17 +122,23 @@ def run_workflow( """ logger = get_run_logger() - # Setup directories - figure_dir = datadir / "workflow/figures" + # Load configuration + config = load_config(config_path) + dataset_name = config["dataset_name"] + url = config["url"] + min_samples_per_cell_type = config["min_samples_per_cell_type"] + + # Setup directories (using wf_prefect folder) + figure_dir = datadir / "wf_prefect/figures" figure_dir.mkdir(parents=True, exist_ok=True) - checkpoint_dir = datadir / "workflow/checkpoints" + checkpoint_dir = datadir / "wf_prefect/checkpoints" checkpoint_dir.mkdir(parents=True, exist_ok=True) - results_dir = datadir / "workflow/results/per_cell_type" + results_dir = datadir / "wf_prefect/results/per_cell_type" results_dir.mkdir(parents=True, exist_ok=True) - log_dir = datadir / "workflow/logs" + log_dir = datadir / "wf_prefect/logs" log_dir.mkdir(parents=True, exist_ok=True) # Set up file logging @@ -163,11 +189,7 @@ def run_workflow( logger.info("STEP 2: DATA 
FILTERING") logger.info("=" * 60) - step2_params = { - "cutoff_percentile": 1.0, - "min_cells_per_celltype": 10, - "percent_donors": 0.95, - } + step2_params = config["filtering"] step_record = execution_log.add_step( step_number=2, step_name="data_filtering", @@ -198,14 +220,7 @@ def run_workflow( logger.info("STEP 3: QUALITY CONTROL") logger.info("=" * 60) - step3_params = { - "min_genes": 200, - "max_genes": 6000, - "min_counts": 500, - "max_counts": 30000, - "max_hb_pct": 5.0, - "expected_doublet_rate": 0.06, - } + step3_params = config["qc"] step_record = execution_log.add_step( step_number=3, step_name="quality_control", @@ -234,11 +249,7 @@ def run_workflow( logger.info("STEP 4: PREPROCESSING") logger.info("=" * 60) - step4_params = { - "target_sum": 1e4, - "n_top_genes": 3000, - "batch_key": "donor_id", - } + step4_params = config["preprocessing"] step_record = execution_log.add_step( step_number=4, step_name="preprocessing", @@ -265,11 +276,7 @@ def run_workflow( logger.info("STEP 5: DIMENSIONALITY REDUCTION") logger.info("=" * 60) - step5_params = { - "batch_key": "donor_id", - "n_neighbors": 30, - "n_pcs": 40, - } + step5_params = config["dimred"] step_record = execution_log.add_step( step_number=5, step_name="dimensionality_reduction", @@ -297,7 +304,7 @@ def run_workflow( logger.info("STEP 6: CLUSTERING") logger.info("=" * 60) - step6_params = {"resolution": 1.0} + step6_params = config["clustering"] step_record = execution_log.add_step( step_number=6, step_name="clustering", @@ -323,12 +330,7 @@ def run_workflow( logger.info("STEP 7: PSEUDOBULKING") logger.info("=" * 60) - step7_params = { - "group_col": "cell_type", - "donor_col": "donor_id", - "metadata_cols": ["development_stage", "sex"], - "min_cells": 10, - } + step7_params = config["pseudobulk"] step_record = execution_log.add_step( step_number=7, step_name="pseudobulking", @@ -393,15 +395,21 @@ def run_workflow( for i, cell_type in enumerate(valid_cell_types): logger.info(f"\n[{i + 
1}/{len(valid_cell_types)}] Processing: {cell_type}") + # Get config for per-cell-type steps + de_config = config["differential_expression"] + gsea_config = config["pathway_analysis"] + enrichr_config = config["overrepresentation"] + pred_config = config["predictive_modeling"] + # Log combined steps 8-11 for this cell type step_record = execution_log.add_step( step_number=8, step_name=f"per_cell_type_analysis ({cell_type})", parameters=serialize_parameters( cell_type=cell_type, - design_factors=["age_scaled", "sex"], - gene_sets=["MSigDB_Hallmark_2020"], - n_splits=5, + design_factors=de_config["design_factors"], + gene_sets=gsea_config["gene_sets"], + n_splits=pred_config["n_splits"], ), ) @@ -412,8 +420,8 @@ def run_workflow( cell_type=cell_type, var_to_feature=var_to_feature, output_dir=results_dir, - design_factors=["age_scaled", "sex"], - n_cpus=8, + design_factors=de_config["design_factors"], + n_cpus=de_config["n_cpus"], ) # Get metadata for this cell type (for predictive modeling) @@ -430,8 +438,8 @@ def run_workflow( de_results=de_result["de_results"], cell_type=cell_type, output_dir=results_dir, - gene_sets=["MSigDB_Hallmark_2020"], - n_top=10, + gene_sets=gsea_config["gene_sets"], + n_top=gsea_config["n_top"], ) # Step 10: Overrepresentation Analysis (Enrichr) @@ -439,9 +447,9 @@ def run_workflow( de_results=de_result["de_results"], cell_type=cell_type, output_dir=results_dir, - gene_sets=["MSigDB_Hallmark_2020"], - padj_threshold=0.05, - n_top=10, + gene_sets=enrichr_config["gene_sets"], + padj_threshold=enrichr_config["padj_threshold"], + n_top=enrichr_config["n_top"], ) # Step 11: Predictive Modeling @@ -450,7 +458,7 @@ def run_workflow( metadata=metadata_ct, cell_type=cell_type, output_dir=results_dir, - n_splits=5, + n_splits=pred_config["n_splits"], ) all_results["per_cell_type"][cell_type] = { @@ -515,7 +523,7 @@ def run_workflow( def analyze_single_cell_type( datadir: Path, cell_type: str, - dataset_name: str = "OneK1K", + config_path: Path | 
None = None, ) -> dict[str, Any]: """Run analysis for a single cell type (useful for debugging/testing). @@ -527,8 +535,8 @@ def analyze_single_cell_type( Base directory for data files cell_type : str Cell type to analyze - dataset_name : str - Name of the dataset + config_path : Path, optional + Path to config file. If None, uses the default config bundled with the package. Returns ------- @@ -537,8 +545,16 @@ def analyze_single_cell_type( """ logger = get_run_logger() - checkpoint_dir = datadir / "workflow/checkpoints" - results_dir = datadir / "workflow/results/per_cell_type" + # Load configuration + config = load_config(config_path) + dataset_name = config["dataset_name"] + de_config = config["differential_expression"] + gsea_config = config["pathway_analysis"] + enrichr_config = config["overrepresentation"] + pred_config = config["predictive_modeling"] + + checkpoint_dir = datadir / "wf_prefect/checkpoints" + results_dir = datadir / "wf_prefect/results/per_cell_type" results_dir.mkdir(parents=True, exist_ok=True) # Load required checkpoints @@ -567,6 +583,8 @@ def analyze_single_cell_type( cell_type=cell_type, var_to_feature=var_to_feature, output_dir=results_dir, + design_factors=de_config["design_factors"], + n_cpus=de_config["n_cpus"], ) # Get metadata @@ -583,12 +601,17 @@ def analyze_single_cell_type( de_results=de_result["de_results"], cell_type=cell_type, output_dir=results_dir, + gene_sets=gsea_config["gene_sets"], + n_top=gsea_config["n_top"], ) enrichr_result = overrepresentation_task( de_results=de_result["de_results"], cell_type=cell_type, output_dir=results_dir, + gene_sets=enrichr_config["gene_sets"], + padj_threshold=enrichr_config["padj_threshold"], + n_top=enrichr_config["n_top"], ) prediction_result = predictive_modeling_task( @@ -596,6 +619,7 @@ def analyze_single_cell_type( metadata=metadata_ct, cell_type=cell_type, output_dir=results_dir, + n_splits=pred_config["n_splits"], ) return { diff --git 
a/src/BetterCodeBetterScience/rnaseq/prefect_workflow/run_workflow.py b/src/BetterCodeBetterScience/rnaseq/prefect_workflow/run_workflow.py index bc5e8ff..6247b55 100644 --- a/src/BetterCodeBetterScience/rnaseq/prefect_workflow/run_workflow.py +++ b/src/BetterCodeBetterScience/rnaseq/prefect_workflow/run_workflow.py @@ -5,6 +5,9 @@ Or with arguments: python -m BetterCodeBetterScience.rnaseq.prefect_workflow.run_workflow --force-from 8 + +With custom config: + python -m BetterCodeBetterScience.rnaseq.prefect_workflow.run_workflow --config /path/to/config.yaml """ import argparse @@ -15,6 +18,7 @@ from BetterCodeBetterScience.rnaseq.prefect_workflow.flows import ( analyze_single_cell_type, + load_config, run_workflow, ) @@ -31,10 +35,11 @@ def main(): help="Base directory for data files (default: from DATADIR env var)", ) parser.add_argument( - "--dataset-name", - type=str, - default="OneK1K", - help="Name of the dataset (default: OneK1K)", + "--config", + type=Path, + default=None, + dest="config_path", + help="Path to config file (default: uses bundled config/config.yaml)", ) parser.add_argument( "--force-from", @@ -43,13 +48,6 @@ def main(): dest="force_from_step", help="Force re-run from this step onwards (1-11)", ) - parser.add_argument( - "--min-samples", - type=int, - default=10, - dest="min_samples", - help="Minimum samples per cell type for steps 8-11 (default: 10)", - ) parser.add_argument( "--cell-type", type=str, @@ -69,6 +67,16 @@ def main(): # Load environment variables load_dotenv() + # Load configuration + config = load_config(args.config_path) + dataset_name = config["dataset_name"] + min_samples = config["min_samples_per_cell_type"] + + if args.config_path: + print(f"Using config file: {args.config_path}") + else: + print("Using default bundled config") + # Get data directory if args.datadir is not None: datadir = args.datadir @@ -90,9 +98,9 @@ def main(): load_checkpoint, ) - checkpoint_dir = datadir / "workflow/checkpoints" + checkpoint_dir = 
datadir / "wf_prefect/checkpoints" pb_checkpoint = checkpoint_dir / bids_checkpoint_name( - args.dataset_name, 7, "pseudobulk" + dataset_name, 7, "pseudobulk" ) if not pb_checkpoint.exists(): @@ -108,9 +116,7 @@ def main(): print("-" * 60) for ct in sorted(cell_types): count = cell_type_counts[ct] - status = ( - "OK" if count >= args.min_samples else f"< {args.min_samples} samples" - ) + status = "OK" if count >= min_samples else f"< {min_samples} samples" print(f" {ct}: {count} samples ({status})") return @@ -120,7 +126,7 @@ def main(): results = analyze_single_cell_type( datadir=datadir, cell_type=args.cell_type, - dataset_name=args.dataset_name, + config_path=args.config_path, ) print("\nResults:") print(f" DE genes: {len(results['de']['de_results'])}") @@ -141,9 +147,8 @@ def main(): results = run_workflow( datadir=datadir, - dataset_name=args.dataset_name, + config_path=args.config_path, force_from_step=args.force_from_step, - min_samples_per_cell_type=args.min_samples, ) # Print summary diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/Snakefile b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/Snakefile index 87b617b..7dcda81 100644 --- a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/Snakefile +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/Snakefile @@ -53,11 +53,11 @@ if config.get("datadir") is None: DATADIR = Path(config["datadir"]) DATASET = config["dataset_name"] -# Derived paths -CHECKPOINT_DIR = DATADIR / "workflow" / "checkpoints" -RESULTS_DIR = DATADIR / "workflow" / "results" -FIGURE_DIR = DATADIR / "workflow" / "figures" -LOG_DIR = DATADIR / "workflow" / "logs" +# Derived paths (using wf_snakemake folder) +CHECKPOINT_DIR = DATADIR / "wf_snakemake" / "checkpoints" +RESULTS_DIR = DATADIR / "wf_snakemake" / "results" +FIGURE_DIR = DATADIR / "wf_snakemake" / "figures" +LOG_DIR = DATADIR / "wf_snakemake" / "logs" # Include modular rule files diff --git 
a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/WORKFLOW_OVERVIEW.md b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/WORKFLOW_OVERVIEW.md index ec4f27a..ab08ea9 100644 --- a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/WORKFLOW_OVERVIEW.md +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/WORKFLOW_OVERVIEW.md @@ -20,13 +20,13 @@ The workflow is divided into two phases: │ Step 1: Download ──► Step 2: Filter ──► Step 3: QC ──► Step 4: Preprocess │ │ │ │ ▼ │ -│ Step 7: Pseudobulk ◄── Step 6: Cluster ◄── Step 5: DimRed +│ Step 7: Pseudobulk ◄── Step 6: Cluster ◄── Step 5: DimRed. | │ │ │ └──────────────────────────────┼──────────────────────────────────────────┘ │ ▼ (discovers N cell types) ┌──────────────────────────────────────────────────────────────────────────┐ -│ PER-CELL-TYPE STEPS (8-11) │ +│ PER-CELL-TYPE STEPS (8-11) │ │ Runs in parallel for each cell type │ ├──────────────────────────────────────────────────────────────────────────┤ │ │ @@ -34,11 +34,11 @@ The workflow is divided into two phases: │ │ │ Step 8: Differential Expression │ │ │ │ -│ ├──► Step 9: GSEA (Pathway Analysis) │ +│ ├──► Step 9: GSEA (Pathway Analysis) │ │ │ │ -│ ├──► Step 10: Enrichr (Overrepresentation) │ +│ ├──► Step 10: Enrichr (Overrepresentation) │ │ │ │ -│ └──► Step 11: Predictive Modeling (Age Prediction) │ +│ └──► Step 11: Predictive Modeling (Age Prediction) │ │ │ └──────────────────────────────────────────────────────────────────────────┘ ``` @@ -314,7 +314,7 @@ The workflow is divided into two phases: ## Output Structure ``` -{datadir}/workflow/ +{datadir}/wf_snakemake/ ├── checkpoints/ │ ├── dataset-{name}_step-02_desc-filtered.h5ad │ ├── dataset-{name}_step-03_desc-qc.h5ad From bb87992178aa979bb826180e6d47923a5496b57b Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Tue, 23 Dec 2025 09:10:55 -0800 Subject: [PATCH 66/87] Add workflow documentation with usage examples MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 
Content-Transfer-Encoding: 8bit Create RUNNING_WORKFLOWS.md covering all five workflow implementations: - Monolithic: single script, no checkpointing - Modular: reusable functions, basic skip functionality - Stateless: BIDS checkpointing, execution logging - Prefect: config file, all cell types, orchestration - Snakemake: dynamic rules, parallel execution Includes prerequisites, CLI options, configuration examples, output locations, and comparison table. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 --- .../rnaseq/RUNNING_WORKFLOWS.md | 441 ++++++++++++++++++ 1 file changed, 441 insertions(+) create mode 100644 src/BetterCodeBetterScience/rnaseq/RUNNING_WORKFLOWS.md diff --git a/src/BetterCodeBetterScience/rnaseq/RUNNING_WORKFLOWS.md b/src/BetterCodeBetterScience/rnaseq/RUNNING_WORKFLOWS.md new file mode 100644 index 0000000..64b9c96 --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/RUNNING_WORKFLOWS.md @@ -0,0 +1,441 @@ +# Running the scRNA-seq Immune Aging Workflows + +This document provides examples of how to run each of the five workflow implementations for the immune aging scRNA-seq analysis. + +All workflows perform the same 11-step analysis pipeline: +1. Data Download +2. Data Filtering +3. Quality Control +4. Preprocessing +5. Dimensionality Reduction +6. Clustering +7. Pseudobulking +8. Differential Expression (per cell type) +9. Pathway Analysis / GSEA (per cell type) +10. Overrepresentation Analysis / Enrichr (per cell type) +11. Predictive Modeling (per cell type) + +--- + +## Prerequisites + +### Environment Setup + +All workflows require the `DATADIR` environment variable to be set, pointing to the base data directory. The workflows will create an `immune_aging/` subdirectory within this path. 
+ +```bash +# Option 1: Set environment variable directly +export DATADIR=/path/to/your/data + +# Option 2: Create a .env file in your working directory +echo "DATADIR=/path/to/your/data" > .env +``` + +### Install Dependencies + +```bash +# From the repository root +uv pip install -e . +``` + +--- + +## 1. Monolithic Workflow + +The monolithic workflow is a single Python script that runs all analysis steps sequentially. It's the simplest implementation but lacks checkpointing and resumability. + +**Location:** `immune_scrnaseq_monolithic.py` + +### Running as a Script + +```bash +# Edit the datadir path in the script first, then run: +python src/BetterCodeBetterScience/rnaseq/immune_scrnaseq_monolithic.py +``` + +### Running as a Jupyter Notebook + +The script uses jupytext format and can be opened directly in Jupyter: + +```bash +jupyter notebook src/BetterCodeBetterScience/rnaseq/immune_scrnaseq_monolithic.py +``` + +### Output Location + +``` +{datadir}/workflow/figures/ +``` + +### Notes + +- No checkpointing - must run from start each time +- Analyzes only one cell type (hardcoded in script) +- Best for understanding the analysis pipeline + +--- + +## 2. Modular Workflow + +The modular workflow uses reusable pipeline functions organized by analysis step. It provides better code organization than the monolithic version but still lacks robust checkpointing. 
+ +**Location:** `modular_workflow/run_workflow.py` + +### Running the Workflow + +```bash +# Using environment variable +export DATADIR=/path/to/your/data +python -m BetterCodeBetterScience.rnaseq.modular_workflow.run_workflow + +# Or import and run programmatically +python -c " +from pathlib import Path +from BetterCodeBetterScience.rnaseq.modular_workflow.run_workflow import run_full_workflow + +datadir = Path('/path/to/your/data/immune_aging') +results = run_full_workflow(datadir) +" +``` + +### Available Options + +The `run_full_workflow()` function accepts these parameters: + +| Parameter | Default | Description | +|-----------|---------|-------------| +| `datadir` | Required | Base directory for data files | +| `dataset_name` | "OneK1K" | Name of the dataset | +| `url` | CELLxGENE URL | Source URL for the h5ad file | +| `cell_type_for_de` | "central memory CD4-positive, alpha-beta T cell" | Cell type for differential expression | +| `skip_download` | False | Skip data download if file exists | +| `skip_filtering` | False | Load pre-filtered data | +| `skip_qc` | False | Load post-QC data | + +### Output Location + +``` +{datadir}/workflow/figures/ +``` + +### Notes + +- Basic skip functionality for early steps +- Analyzes only one cell type +- Outputs figures only (no result files saved) + +--- + +## 3. Stateless Workflow (with Checkpointing) + +The stateless workflow adds robust checkpointing using BIDS-compliant naming. It can resume from any step and tracks execution history. 
+ +**Location:** `stateless_workflow/run_workflow.py` + +### Running the Workflow + +```bash +# Basic run (resumes from last checkpoint automatically) +export DATADIR=/path/to/your/data +python -m BetterCodeBetterScience.rnaseq.stateless_workflow.run_workflow +``` + +### Force Re-run from a Specific Step + +```python +from pathlib import Path +from BetterCodeBetterScience.rnaseq.stateless_workflow.run_workflow import ( + run_stateless_workflow, + print_checkpoint_status, +) + +datadir = Path('/path/to/your/data/immune_aging') + +# Check current checkpoint status +print_checkpoint_status(datadir) + +# Force re-run from step 5 onwards +results = run_stateless_workflow(datadir, force_from_step=5) +``` + +### Available Options + +| Parameter | Default | Description | +|-----------|---------|-------------| +| `datadir` | Required | Base directory for data files | +| `dataset_name` | "OneK1K" | Name of the dataset | +| `url` | CELLxGENE URL | Source URL for the h5ad file | +| `cell_type_for_de` | "central memory CD4-positive, alpha-beta T cell" | Cell type for differential expression | +| `force_from_step` | None | Clear checkpoints from this step onwards | +| `checkpoint_steps` | {2,3,5,8,9,10,11} | Which steps to save checkpoints for | + +### Utility Functions + +```python +from BetterCodeBetterScience.rnaseq.stateless_workflow.run_workflow import ( + list_checkpoints, + print_checkpoint_status, + list_execution_logs, + load_execution_log, +) + +# List all checkpoints +checkpoints = list_checkpoints(datadir) + +# Print checkpoint status with file sizes +print_checkpoint_status(datadir) + +# View execution history +logs = list_execution_logs(datadir) +if logs: + log = load_execution_log(logs[0]) # Load most recent + log.print_summary() +``` + +### Output Location + +``` +{datadir}/workflow/ +├── checkpoints/ # BIDS-named checkpoint files +├── figures/ # Visualization outputs +└── logs/ # Execution logs (JSON) +``` + +### Notes + +- Automatic checkpointing and resumption 
+- Execution logging with timing information +- Analyzes only one cell type +- Step 3 checkpoint required for pseudobulking + +--- + +## 4. Prefect Workflow + +The Prefect workflow uses the Prefect orchestration framework and analyzes all cell types, processing them one after another. + +**Location:** `prefect_workflow/run_workflow.py` + +### Running the Workflow + +```bash +# Basic run with default config +python -m BetterCodeBetterScience.rnaseq.prefect_workflow.run_workflow --datadir /path/to/data/immune_aging + +# With custom config file +python -m BetterCodeBetterScience.rnaseq.prefect_workflow.run_workflow \ + --datadir /path/to/data/immune_aging \ + --config /path/to/custom_config.yaml + +# Force re-run from step 8 +python -m BetterCodeBetterScience.rnaseq.prefect_workflow.run_workflow \ + --datadir /path/to/data/immune_aging \ + --force-from 8 +``` + +### List Available Cell Types + +```bash +python -m BetterCodeBetterScience.rnaseq.prefect_workflow.run_workflow \ + --datadir /path/to/data/immune_aging \ + --list-cell-types +``` + +### Analyze a Single Cell Type + +```bash +python -m BetterCodeBetterScience.rnaseq.prefect_workflow.run_workflow \ + --datadir /path/to/data/immune_aging \ + --cell-type "central memory CD4-positive, alpha-beta T cell" +``` + +### CLI Options + +| Option | Description | +|--------|-------------| +| `--datadir` | Base directory for data files | +| `--config` | Path to custom config YAML file | +| `--force-from` | Force re-run from this step onwards (1-11) | +| `--cell-type` | Analyze a single cell type only | +| `--list-cell-types` | List available cell types and exit | + +### Configuration File + +The default configuration is in `prefect_workflow/config/config.yaml`. 
You can create a custom config to change any parameter. As `load_config` is written, the file you pass is loaded in place of the default rather than merged with it, so a custom config should define every section the workflow reads (the simplest approach is to copy the default file and edit it); the example below shows only the values being changed: + +```yaml +# Custom config example +dataset_name: "MyDataset" + +filtering: + cutoff_percentile: 2.0 + min_cells_per_celltype: 20 + +qc: + min_genes: 300 + max_genes: 5000 + +differential_expression: + n_cpus: 16 + +min_samples_per_cell_type: 20 +``` + +### Output Location + +``` +{datadir}/wf_prefect/ +├── checkpoints/ # BIDS-named checkpoint files +├── figures/ # Visualization outputs +├── results/per_cell_type/ # Per-cell-type analysis results +│ └── {cell_type}/ +│ ├── stat_res.pkl +│ ├── de_results.parquet +│ ├── counts.parquet +│ ├── gsea_results.pkl +│ ├── enrichr_up.pkl +│ ├── enrichr_down.pkl +│ └── prediction_results.pkl +└── logs/ # Execution logs +``` + +### Notes + +- Analyzes ALL cell types (not just one) +- Configuration via YAML file +- Results saved per cell type +- Uses Prefect for orchestration and logging + +--- + +## 5. Snakemake Workflow + +The Snakemake workflow uses the Snakemake workflow management system with dynamic rule generation for per-cell-type analysis. 
+ +**Location:** `snakemake_workflow/Snakefile` + +### Running the Workflow + +```bash +cd src/BetterCodeBetterScience/rnaseq/snakemake_workflow + +# Run full workflow +snakemake --cores 16 --config datadir=/path/to/data/immune_aging + +# Dry run (see what would be executed) +snakemake -n --config datadir=/path/to/data/immune_aging + +# Run only preprocessing (steps 1-6) +snakemake --cores 16 preprocessing_only --config datadir=/path/to/data/immune_aging + +# Run only through pseudobulking (steps 1-7) +snakemake --cores 16 pseudobulk_only --config datadir=/path/to/data/immune_aging +``` + +### Force Re-run from a Specific Rule + +```bash +# Force re-run from dimensionality reduction +snakemake --cores 16 --forcerun dimensionality_reduction \ + --config datadir=/path/to/data/immune_aging + +# Force re-run from a specific cell type's DE +snakemake --cores 16 --forcerun differential_expression \ + --config datadir=/path/to/data/immune_aging +``` + +### Configuration + +Configuration is in `snakemake_workflow/config/config.yaml`. 
Override any parameter via command line: + +```bash +snakemake --cores 16 --config \ + datadir=/path/to/data/immune_aging \ + dataset_name=MyDataset \ + min_samples_per_cell_type=20 +``` + +### Generate Workflow Visualization + +```bash +# Generate rule graph +snakemake --rulegraph --config datadir=/path/to/data/immune_aging | dot -Tpng > rulegraph.png + +# Generate DAG for specific run +snakemake --dag --config datadir=/path/to/data/immune_aging | dot -Tpng > dag.png +``` + +### Output Location + +``` +{datadir}/wf_snakemake/ +├── checkpoints/ # BIDS-named checkpoint files +├── figures/ # Visualization outputs +├── results/ +│ ├── per_cell_type/ # Per-cell-type analysis results +│ │ └── {cell_type}/ +│ │ ├── stat_res.pkl +│ │ ├── de_results.parquet +│ │ ├── counts.parquet +│ │ ├── gsea_results.pkl +│ │ ├── enrichr_up.pkl +│ │ ├── enrichr_down.pkl +│ │ └── prediction_results.pkl +│ └── workflow_complete.txt +└── logs/ # Step logs +``` + +### Notes + +- Uses Snakemake checkpoint mechanism for dynamic cell type discovery +- Analyzes ALL cell types +- Automatic dependency tracking and parallel execution +- See `WORKFLOW_OVERVIEW.md` for detailed step documentation + +--- + +## Workflow Comparison + +| Feature | Monolithic | Modular | Stateless | Prefect | Snakemake | +|---------|------------|---------|-----------|---------|-----------| +| Checkpointing | No | Limited | Yes | Yes | Yes | +| Resume from step | No | Limited | Yes | Yes | Yes | +| All cell types | No | No | No | Yes | Yes | +| Config file | No | No | No | Yes | Yes | +| Execution logs | No | No | Yes | Yes | Yes | +| Parallel execution | No | No | No | Sequential | Yes | +| Output folder | workflow | workflow | workflow | wf_prefect | wf_snakemake | + +--- + +## Troubleshooting + +### Memory Issues + +The dataset is large (~1.2M cells). If you encounter memory issues: + +1. Use a machine with at least 64GB RAM +2. For Prefect/Snakemake workflows, reduce `--cores` to limit parallel jobs +3. 
For the stateless workflow, ensure checkpoints are saved to reduce memory pressure + +### Missing Dependencies + +```bash +# Install all dependencies +uv pip install -e ".[dev]" + +# For Snakemake specifically (quoted so the shell does not treat ">" as a redirect) +uv pip install "snakemake>=8.0" +``` + +### Environment Variable Not Set + +If you see "DATADIR environment variable not set": + +```bash +# Set it for your session +export DATADIR=/path/to/your/data + +# Or create .env file +echo "DATADIR=/path/to/your/data" > .env +``` From 487262374c6f7b150846188f71a09b41ca8deef5 Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Tue, 23 Dec 2025 09:33:11 -0800 Subject: [PATCH 67/87] clean up discussion of engines --- book/workflows.md | 68 +++++++---------------------------------------- 1 file changed, 9 insertions(+), 59 deletions(-) diff --git a/book/workflows.md b/book/workflows.md index 0646b96..041d1f1 100644 --- a/book/workflows.md +++ b/book/workflows.md @@ -288,7 +288,7 @@ The use of DAGs to represent workflows provides a number of important benefits: - If one node of the graph changes, the engine can identify which downstream nodes need to be rerun - If a node fails, the engine can continue with executing the nodes that don't depend on the failed node either directly or indirectly -Another benefit of using a workflow engine is that they generally deal automatically with checkpointing and caching of intermediate results. +There are two additional benefits to using a workflow engine. The first is that these engines generally handle checkpointing and caching of intermediate results automatically. The second is that the workflow engine uses the execution graph to optimize the computation, performing only those operations that are actually needed. This is similar in spirit to the concept of *lazy execution* used by packages like Polars, in which the system optimizes computational efficiency by first analyzing the full computational graph. 
#### General-purpose versus domain-specific workflow engines @@ -296,14 +296,19 @@ With the growth of data science within industry and research, there has been an It's also worth noting that there are a number of domain-specific workflow engines that are specialized for particular kinds of data and workflows. Examples include [Galaxy](https://galaxyproject.org/) which is specialized for bioinformatics and genomics, and [Nipype](https://nipype.readthedocs.io/en/latest/index.html) which is specialized for neuroimaging analysis workflows. If your research community uses one of these then it's worth exploring that engine as your first option, since it will probably be well supported within the community. However, a benefit of using a general-purpose engine is that they will often be better maintained and supported, and AI tools will likely have more examples to work from in generating workflows. -#### A language-specific workflow management example: Prefect +### A language-specific workflow management example: Prefect +#### Configuration management +The initial version of the Prefect workflow generated by Claude had the default parameter settings for each workflow step hard-coded into the task definitions in Python. This is not a great practice, since it means that any change in parameter settings requires a change in the code, making it difficult to test different parameter settings. A better practice is to put the parameter settings into a human-editable configuration file, which can be specified upon execution of the workflow. YAML is a common data-serialization format for configuration files, and it is used in both this workflow and the following one. -#### A general-purpose workflow management example: Snakemake +### A general-purpose workflow management example: Snakemake + + +#### Pipeline optimization The first time that I ran my workflow using snakemake, I noticed that it was substantially slower than when I ran it with Prefect. 
When I asked Claude about this, it gave me a reasonable answer: @@ -319,64 +324,9 @@ I asked Claude to fix this, and it returned the following change: > 4. Set NUMBA_NUM_THREADS and OMP_NUM_THREADS environment variables in dimred.py > In contrast, Prefect tasks run in the main process with access to all CPUs by default, which is why it was faster. -This solves the problem but it's an odd choice: in particular, it will probably fail if there are fewer than 8 threads available on the system. Snakemake actually take a command line argument (`--cores`) to specify the number of cores to use, so I instead asked Claude to have Snakemake use the number of cores specified at the command line rather than an arbitrary number that might fail if the requested number of cores are not available. We will discuss optimization further in a later chapter. - - -## Modularity and reusability - -- separate analysis logic from workflow orchestration -- analysis modules should be tested (e.g. with synthetic data) - - - -## Idempotency - -- running it multiple time should give same answer as running it once -- - -- somewhere talk about in-place operations and their challenges - -- local mutation - never change an object that is passed in as an argument - - always copy - - if the package uses copy-on-write then this is cheap (only copied metadata) - - this is not default in pandas 1.x but coming in 2.x - - google says it's possible using pd.options.mode.copy_on_write = True - need to confirm - - need to check for other frameworks - -- lazy frames (polars) - -- any function that must mutate in place should do so clearly - - e.g. normalize_(x) (apparently pytorch style for mutating functions?) - - or using "inplace" in the function name - - -- can encode state in type (e.g. a lightweight class that tracks state ,e.g. "NormalizedArray") -- or track stage explicitly (e.g. 
Dataset(stage='normalized', data=array)) - -- use zarr to save each pipeline step as a new group: - -dataset.zarr/ -├── raw/ -│ └── signal -├── zscored/ -│ └── signal -├── filtered/ -│ └── signal - -- can also store parameters as attrs in zarr -- e.g. z.attrs.update({ - "stage": "zscore", - "mean_method": "time", - "std_ddof": 1, -}) - - -also look at arrow for columnar data - look into arrow immutability - +This solves the problem but it's an odd choice: in particular, it will probably fail if there are fewer than 8 threads available on the system. Snakemake actually takes a command-line argument (`--cores`) to specify the number of cores to use, so I instead asked Claude to have Snakemake use the number of cores specified at the command line rather than an arbitrary number that might fail if the requested number of cores is not available. We will discuss optimization in much greater detail in a later chapter. -## Deferred execution -- dask, xarray ## Tracking provenance From 63cccd45a0ea95d3b4c847bfca05487f4efe6b18 Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Tue, 23 Dec 2025 10:03:58 -0800 Subject: [PATCH 68/87] Add Snakemake report generation for workflow results MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Add report/ directory with RST caption files for each figure type - Add global report declaration to Snakefile - Update preprocessing.smk rules to declare figures as outputs with report() wrapper - Update pseudobulk.smk checkpoint to include figure with report() wrapper - Update per_cell_type.smk rules to include figures with report() wrapper and subcategory - Update common.smk aggregate function to include figure files - Add report and report-zip targets to Makefile - Update WORKFLOW_OVERVIEW.md with report generation documentation Usage: snakemake --report report.html --config datadir=/path/to/data 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 --- 
problems_to_solve.md | 11 +++++ .../rnaseq/snakemake_workflow/Makefile | 25 +++++++++- .../rnaseq/snakemake_workflow/Snakefile | 4 ++ .../snakemake_workflow/WORKFLOW_OVERVIEW.md | 44 +++++++++++++++++ .../snakemake_workflow/report/clustering.rst | 4 ++ .../snakemake_workflow/report/de_results.rst | 4 ++ .../report/doublet_umap.rst | 4 ++ .../snakemake_workflow/report/enrichr.rst | 4 ++ .../snakemake_workflow/report/filtering.rst | 4 ++ .../rnaseq/snakemake_workflow/report/gsea.rst | 4 ++ .../snakemake_workflow/report/hemoglobin.rst | 4 ++ .../rnaseq/snakemake_workflow/report/pca.rst | 4 ++ .../snakemake_workflow/report/prediction.rst | 4 ++ .../snakemake_workflow/report/pseudobulk.rst | 4 ++ .../snakemake_workflow/report/qc_scatter.rst | 5 ++ .../snakemake_workflow/report/qc_violin.rst | 8 ++++ .../rnaseq/snakemake_workflow/report/umap.rst | 4 ++ .../snakemake_workflow/report/workflow.rst | 29 +++++++++++ .../snakemake_workflow/rules/common.smk | 4 ++ .../rules/per_cell_type.smk | 18 +++++++ .../rules/preprocessing.smk | 48 +++++++++++++++++-- .../snakemake_workflow/rules/pseudobulk.smk | 6 +++ 22 files changed, 241 insertions(+), 5 deletions(-) create mode 100644 src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/clustering.rst create mode 100644 src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/de_results.rst create mode 100644 src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/doublet_umap.rst create mode 100644 src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/enrichr.rst create mode 100644 src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/filtering.rst create mode 100644 src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/gsea.rst create mode 100644 src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/hemoglobin.rst create mode 100644 src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/pca.rst create mode 100644 
src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/prediction.rst create mode 100644 src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/pseudobulk.rst create mode 100644 src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/qc_scatter.rst create mode 100644 src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/qc_violin.rst create mode 100644 src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/umap.rst create mode 100644 src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/workflow.rst diff --git a/problems_to_solve.md b/problems_to_solve.md index ee73576..94a86b1 100644 --- a/problems_to_solve.md +++ b/problems_to_solve.md @@ -3,6 +3,17 @@ Open problems marked with [ ] Fixed problems marked with [x] +[x] For the Snakemake workflow I would like to use the Snakemake report generating functions to create a report showing the results from each of the analyses. + - Added `report: "report/workflow.rst"` global declaration to Snakefile + - Created `report/` directory with RST caption files for each figure type + - Updated preprocessing.smk rules (filtering, qc, dimred, clustering) to declare figures as outputs with `report()` wrapper + - Updated pseudobulk.smk checkpoint to include pseudobulk figure with `report()` wrapper + - Updated per_cell_type.smk rules (GSEA, Enrichr, prediction) to include figures with `report()` wrapper and cell_type subcategory + - Updated common.smk `aggregate_per_cell_type_outputs()` to include figure files + - Added `report` and `report-zip` targets to Makefile + - Updated WORKFLOW_OVERVIEW.md with report generation documentation + - Usage: `snakemake --report report.html --config datadir=/path/to/data` or `make report` + [x] For the Prefect workflow, the default parameters for each workflow module are embedded in the python code for the workflow. I would rather that they be defined using a configuration file. 
Please extract all of the parameters into a configuration file (using whatever format you think is most appropriate) and read those in during workflow execution rather than hard-coding. - Created `prefect_workflow/config/config.yaml` with all workflow parameters - Parameters organized by step: filtering, qc, preprocessing, dimred, clustering, pseudobulk, differential_expression, pathway_analysis, overrepresentation, predictive_modeling diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/Makefile b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/Makefile index 52d3e71..0947d14 100644 --- a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/Makefile +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/Makefile @@ -1,5 +1,28 @@ include ../../../../.env -export +export +# Generate rule dependency graph rulegraph: snakemake --rulegraph --config datadir=$(DATADIR)/immune_aging/workflow/ --cores 2 | dot -Tpng > rulegraph.png + +# Run the full workflow +run: + snakemake --cores 8 --config datadir=$(DATADIR)/immune_aging/workflow/ + +# Generate HTML report (run after workflow completes) +report: + snakemake --report $(DATADIR)/immune_aging/workflow/wf_snakemake/report.html --config datadir=$(DATADIR)/immune_aging/workflow/ + +# Generate ZIP report (for larger reports with many figures) +report-zip: + snakemake --report $(DATADIR)/immune_aging/workflow/wf_snakemake/report.zip --config datadir=$(DATADIR)/immune_aging/workflow/ + +# Dry run - show what would be executed +dry-run: + snakemake -n --config datadir=$(DATADIR)/immune_aging/workflow/ + +# Clean all outputs +clean: + rm -rf $(DATADIR)/immune_aging/workflow/wf_snakemake/ + +.PHONY: rulegraph run report report-zip dry-run clean diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/Snakefile b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/Snakefile index 7dcda81..796a140 100644 --- a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/Snakefile +++ 
b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/Snakefile @@ -44,6 +44,10 @@ min_version("8.0") configfile: "config/config.yaml" +# Global report description +report: "report/workflow.rst" + + # Validate required config if config.get("datadir") is None: raise ValueError( diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/WORKFLOW_OVERVIEW.md b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/WORKFLOW_OVERVIEW.md index ab08ea9..b39b832 100644 --- a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/WORKFLOW_OVERVIEW.md +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/WORKFLOW_OVERVIEW.md @@ -363,6 +363,50 @@ snakemake --cores 16 preprocessing_only --config datadir=/path/to/data # Force re-run from a specific step snakemake --cores 16 --forcerun dimensionality_reduction --config datadir=/path/to/data + +# Generate HTML report (after workflow completes) +snakemake --report /path/to/data/wf_snakemake/report.html --config datadir=/path/to/data + +# Generate ZIP report (for larger reports) +snakemake --report /path/to/data/wf_snakemake/report.zip --config datadir=/path/to/data +``` + +--- + +## Report Generation + +The workflow includes Snakemake's built-in report generation functionality. 
After the workflow completes, you can generate a self-contained HTML report that includes: + +- **Runtime statistics**: Execution times for each rule +- **Provenance information**: Input/output file tracking +- **Workflow topology**: Visual representation of rule dependencies +- **Analysis results**: Figures from each analysis step + +### Report Contents + +The report organizes results by analysis step: + +| Category | Contents | +|----------|----------| +| Step 2: Filtering | Donor cell count distribution | +| Step 3: Quality Control | QC violin plots, scatter plots, hemoglobin distribution, doublet UMAP | +| Step 5: Dimensionality Reduction | PCA and UMAP visualizations | +| Step 6: Clustering | UMAP with cell type and cluster annotations | +| Step 7: Pseudobulking | Pseudobulk sample characteristics | +| Step 9: Pathway Analysis (GSEA) | GSEA pathway plots per cell type | +| Step 10: Overrepresentation Analysis | Enrichr pathway plots per cell type | +| Step 11: Predictive Modeling | Age prediction performance per cell type | + +### Using the Makefile + +If using the provided Makefile: + +```bash +# Generate HTML report +make report + +# Generate ZIP report (recommended for many cell types) +make report-zip ``` --- diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/clustering.rst b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/clustering.rst new file mode 100644 index 0000000..79cc652 --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/clustering.rst @@ -0,0 +1,4 @@ +UMAP visualization with cell type and Leiden clustering. + +Cell types are annotated from the original dataset. Leiden clustering +identifies communities of cells with similar expression profiles. 
diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/de_results.rst b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/de_results.rst new file mode 100644 index 0000000..3210df8 --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/de_results.rst @@ -0,0 +1,4 @@ +Differential expression results for {{ snakemake.wildcards.cell_type }}. + +Contains log2 fold changes, p-values, and adjusted p-values from +DESeq2 analysis comparing conditions within this cell type. diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/doublet_umap.rst b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/doublet_umap.rst new file mode 100644 index 0000000..efc9341 --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/doublet_umap.rst @@ -0,0 +1,4 @@ +UMAP visualization of predicted doublets. + +Doublets (cells containing RNA from multiple cells) are computationally +detected and shown here in the context of the full UMAP embedding. diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/enrichr.rst b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/enrichr.rst new file mode 100644 index 0000000..fb9dc2b --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/enrichr.rst @@ -0,0 +1,4 @@ +Enrichr pathway analysis for {{ snakemake.wildcards.cell_type }}. + +Overrepresentation analysis of differentially expressed genes using +the Enrichr database to identify enriched biological processes. diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/filtering.rst b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/filtering.rst new file mode 100644 index 0000000..2afe459 --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/filtering.rst @@ -0,0 +1,4 @@ +Distribution of cell counts per donor after filtering. 
+ +This plot shows the number of cells from each donor that passed quality filters, +helping to identify potential batch effects or sample quality issues. diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/gsea.rst b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/gsea.rst new file mode 100644 index 0000000..bc4c4e0 --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/gsea.rst @@ -0,0 +1,4 @@ +Gene Set Enrichment Analysis (GSEA) results for {{ snakemake.wildcards.cell_type }}. + +Shows significantly enriched pathways based on the ranking of genes +by their differential expression statistics. diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/hemoglobin.rst b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/hemoglobin.rst new file mode 100644 index 0000000..ea70d4f --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/hemoglobin.rst @@ -0,0 +1,4 @@ +Distribution of hemoglobin gene expression. + +High hemoglobin gene expression can indicate red blood cell contamination. +Cells exceeding the threshold are filtered out to improve data quality. diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/pca.rst b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/pca.rst new file mode 100644 index 0000000..805e6ec --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/pca.rst @@ -0,0 +1,4 @@ +PCA visualization of cells colored by cell type. + +Principal component analysis reduces dimensionality while preserving +the major sources of variation in gene expression. 
diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/prediction.rst b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/prediction.rst new file mode 100644 index 0000000..da779b7 --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/prediction.rst @@ -0,0 +1,4 @@ +Age prediction performance for {{ snakemake.wildcards.cell_type }}. + +Cross-validated prediction of donor age from gene expression using +machine learning models. Shows correlation between predicted and actual age. diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/pseudobulk.rst b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/pseudobulk.rst new file mode 100644 index 0000000..3fcbdd4 --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/pseudobulk.rst @@ -0,0 +1,4 @@ +Distribution of pseudobulk sample characteristics. + +Shows the distribution of cells per sample and total counts after +aggregating single-cell data to pseudobulk samples per donor and cell type. diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/qc_scatter.rst b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/qc_scatter.rst new file mode 100644 index 0000000..5e2b9f4 --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/qc_scatter.rst @@ -0,0 +1,5 @@ +Scatter plot of QC metrics with doublet predictions. + +Shows the relationship between gene counts and UMI counts per cell, +colored by predicted doublet status. Doublets typically have elevated +counts in both dimensions. diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/qc_violin.rst b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/qc_violin.rst new file mode 100644 index 0000000..e4d4415 --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/qc_violin.rst @@ -0,0 +1,8 @@ +Violin plots of quality control metrics. 
+ +Shows the distribution of key QC metrics across cells: +- Number of genes detected per cell +- Total UMI counts per cell +- Mitochondrial gene percentage + +These metrics help identify low-quality cells for filtering. diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/umap.rst b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/umap.rst new file mode 100644 index 0000000..64c8b38 --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/umap.rst @@ -0,0 +1,4 @@ +UMAP visualization colored by total counts. + +UMAP provides a non-linear dimensionality reduction that preserves +local structure. Coloring by total counts helps assess technical variation. diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/workflow.rst b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/workflow.rst new file mode 100644 index 0000000..f011d94 --- /dev/null +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/report/workflow.rst @@ -0,0 +1,29 @@ +This workflow performs single-cell RNA-seq analysis of immune aging data. + +The analysis consists of 11 steps: + +**Global Preprocessing (Steps 1-7)** + +1. Data Download - Retrieve raw data from repository +2. Data Filtering - Remove low-quality cells and cell types +3. Quality Control - Filter cells based on QC metrics and detect doublets +4. Preprocessing - Normalize and select highly variable genes +5. Dimensionality Reduction - PCA and UMAP computation +6. Clustering - Leiden clustering for cell type identification +7. Pseudobulking - Aggregate single-cell data to pseudobulk per donor/cell-type + +**Per-Cell-Type Analysis (Steps 8-11)** + +For each cell type discovered in step 7: + +8. Differential Expression - Compare gene expression between conditions +9. Pathway Analysis (GSEA) - Identify enriched biological pathways +10. Overrepresentation Analysis - Enrichr analysis of DE genes +11. 
Predictive Modeling - Age prediction from gene expression + +Configuration +============= + +Dataset: ``{{ snakemake.config["dataset_name"] }}`` + +Data directory: ``{{ snakemake.config["datadir"] }}`` diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/common.smk b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/common.smk index 3c11e87..1bff4bb 100644 --- a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/common.smk +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/common.smk @@ -86,6 +86,10 @@ def aggregate_per_cell_type_outputs(wildcards): ct_dir / "enrichr_up.pkl", ct_dir / "enrichr_down.pkl", ct_dir / "prediction_results.pkl", + # Figures for report + ct_dir / "figures" / "gsea_pathways.png", + ct_dir / "figures" / "enrichr_pathways.png", + ct_dir / "figures" / "age_prediction_performance.png", ] ) diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/per_cell_type.smk b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/per_cell_type.smk index 9d06fa6..c5b0149 100644 --- a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/per_cell_type.smk +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/per_cell_type.smk @@ -39,6 +39,12 @@ rule pathway_analysis: de_results=RESULTS_DIR / "per_cell_type" / "{cell_type}" / "de_results.parquet", output: gsea_results=RESULTS_DIR / "per_cell_type" / "{cell_type}" / "gsea_results.pkl", + fig_gsea=report( + RESULTS_DIR / "per_cell_type" / "{cell_type}" / "figures" / "gsea_pathways.png", + caption="report/gsea.rst", + category="Step 9: Pathway Analysis (GSEA)", + subcategory="{cell_type}", + ), params: cell_type=lambda wildcards: wildcards.cell_type, gene_sets=config["pathway_analysis"]["gene_sets"], @@ -59,6 +65,12 @@ rule overrepresentation: output: enr_up=RESULTS_DIR / "per_cell_type" / "{cell_type}" / "enrichr_up.pkl", enr_down=RESULTS_DIR / "per_cell_type" / "{cell_type}" / "enrichr_down.pkl", + fig_enrichr=report( 
+ RESULTS_DIR / "per_cell_type" / "{cell_type}" / "figures" / "enrichr_pathways.png", + caption="report/enrichr.rst", + category="Step 10: Overrepresentation Analysis", + subcategory="{cell_type}", + ), params: cell_type=lambda wildcards: wildcards.cell_type, gene_sets=config["overrepresentation"]["gene_sets"], @@ -84,6 +96,12 @@ rule predictive_modeling: / "per_cell_type" / "{cell_type}" / "prediction_results.pkl", + fig_prediction=report( + RESULTS_DIR / "per_cell_type" / "{cell_type}" / "figures" / "age_prediction_performance.png", + caption="report/prediction.rst", + category="Step 11: Predictive Modeling", + subcategory="{cell_type}", + ), params: cell_type=lambda wildcards: wildcards.cell_type, n_splits=config["predictive_modeling"]["n_splits"], diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/preprocessing.smk b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/preprocessing.smk index 7991264..29d4946 100644 --- a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/preprocessing.smk +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/preprocessing.smk @@ -27,7 +27,12 @@ rule filter_data: input: DATADIR / f"dataset-{DATASET}_subset-immune_raw.h5ad", output: - CHECKPOINT_DIR / bids_checkpoint_name(DATASET, 2, "filtered"), + checkpoint=CHECKPOINT_DIR / bids_checkpoint_name(DATASET, 2, "filtered"), + fig_donor_counts=report( + FIGURE_DIR / "donor_cell_counts_distribution.png", + caption="report/filtering.rst", + category="Step 2: Filtering", + ), params: cutoff_percentile=config["filtering"]["cutoff_percentile"], min_cells_per_celltype=config["filtering"]["min_cells_per_celltype"], @@ -44,7 +49,27 @@ rule quality_control: input: CHECKPOINT_DIR / bids_checkpoint_name(DATASET, 2, "filtered"), output: - CHECKPOINT_DIR / bids_checkpoint_name(DATASET, 3, "qc"), + checkpoint=CHECKPOINT_DIR / bids_checkpoint_name(DATASET, 3, "qc"), + fig_violin=report( + FIGURE_DIR / "qc_violin_plots.png", + 
caption="report/qc_violin.rst", + category="Step 3: Quality Control", + ), + fig_scatter=report( + FIGURE_DIR / "qc_scatter_doublets.png", + caption="report/qc_scatter.rst", + category="Step 3: Quality Control", + ), + fig_hemoglobin=report( + FIGURE_DIR / "hemoglobin_distribution.png", + caption="report/hemoglobin.rst", + category="Step 3: Quality Control", + ), + fig_doublet_umap=report( + FIGURE_DIR / "doublet_detection_umap.png", + caption="report/doublet_umap.rst", + category="Step 3: Quality Control", + ), threads: workflow.cores params: min_genes=config["qc"]["min_genes"], @@ -82,7 +107,17 @@ rule dimensionality_reduction: input: CHECKPOINT_DIR / bids_checkpoint_name(DATASET, 4, "preprocessed"), output: - CHECKPOINT_DIR / bids_checkpoint_name(DATASET, 5, "dimreduced"), + checkpoint=CHECKPOINT_DIR / bids_checkpoint_name(DATASET, 5, "dimreduced"), + fig_pca=report( + FIGURE_DIR / "pca_cell_type.png", + caption="report/pca.rst", + category="Step 5: Dimensionality Reduction", + ), + fig_umap=report( + FIGURE_DIR / "umap_total_counts.png", + caption="report/umap.rst", + category="Step 5: Dimensionality Reduction", + ), threads: workflow.cores params: batch_key=config["dimred"]["batch_key"], @@ -100,7 +135,12 @@ rule clustering: input: CHECKPOINT_DIR / bids_checkpoint_name(DATASET, 5, "dimreduced"), output: - CHECKPOINT_DIR / bids_checkpoint_name(DATASET, 6, "clustered"), + checkpoint=CHECKPOINT_DIR / bids_checkpoint_name(DATASET, 6, "clustered"), + fig_clustering=report( + FIGURE_DIR / "umap_cell_type_leiden.png", + caption="report/clustering.rst", + category="Step 6: Clustering", + ), params: resolution=config["clustering"]["resolution"], figure_dir=str(FIGURE_DIR), diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/pseudobulk.smk b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/pseudobulk.smk index 678937e..7e71baf 100644 --- a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/pseudobulk.smk +++ 
b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/pseudobulk.smk @@ -26,6 +26,12 @@ checkpoint pseudobulk: cell_types=CHECKPOINT_DIR / f"dataset-{DATASET}_step-07_cell_types.json", # Gene name mapping (needed for DE analysis) var_to_feature=CHECKPOINT_DIR / f"dataset-{DATASET}_step-07_var_to_feature.json", + # Pseudobulk figure + fig_pseudobulk=report( + FIGURE_DIR / "pseudobulk_violin.png", + caption="report/pseudobulk.rst", + category="Step 7: Pseudobulking", + ), params: group_col=config["pseudobulk"]["group_col"], donor_col=config["pseudobulk"]["donor_col"], From 65d6706ee3ff1a15f4d019588d962fd67e534aeb Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Tue, 23 Dec 2025 13:51:54 -0800 Subject: [PATCH 69/87] Add simple workflow example for Prefect, Snakemake, and Make MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Create a simple pandas-based data analysis workflow to demonstrate workflow manager features. The workflow loads Self-Regulation Ontology data, filters to numerical columns, joins datasets, computes Spearman correlations, and generates a clustered heatmap. 
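For orientation, the core of the pipeline described above can be sketched in a few lines of pandas. This is only an illustrative sketch, not the code added by this commit: the two small frames are synthetic stand-ins for the Self-Regulation Ontology CSVs (hypothetical values), the helper functions are redefined inline so the snippet is self-contained, and the Seaborn heatmap step is left out.

```python
import pandas as pd

# Synthetic stand-ins for the two CSV files (hypothetical values, not real data).
meaningful_vars = pd.DataFrame(
    {
        "task_a": [1.0, 2.0, 3.0, 4.0],
        "task_b": [4.0, 3.0, 2.0, 1.0],
        "notes": ["w", "x", "y", "z"],  # non-numerical, should be dropped
    },
    index=["s1", "s2", "s3", "s4"],
)
demographics = pd.DataFrame(
    {"age": [20, 30, 40, 50], "sex": ["F", "M", "F", "M"]},
    index=["s1", "s2", "s3", "s5"],  # s5 has no match in meaningful_vars
)


def filter_numerical_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only the numerical columns of a dataframe."""
    return df.select_dtypes(include=["number"])


def join_dataframes(
    df1: pd.DataFrame, df2: pd.DataFrame, how: str = "inner"
) -> pd.DataFrame:
    """Join two dataframes on their index."""
    return df1.join(df2, how=how, lsuffix="_mv", rsuffix="_demo")


# Filter each frame, join on the shared index, then correlate.
joined = join_dataframes(
    filter_numerical_columns(meaningful_vars),
    filter_numerical_columns(demographics),
)
corr = joined.corr(method="spearman")
print(corr)
```

With an inner join only the subjects present in both frames survive (here `s1` to `s3`), so the correlation matrix is computed over the three numerical columns for those rows.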
Includes: - Core modules: load_data, filter_data, join_data, correlation, visualization - Prefect workflow with tasks and flows - Snakemake workflow with rules and report generation - GNU Make workflow with Makefile and CLI scripts 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 --- problems_to_solve.md | 31 ++++ .../simple_workflow/__init__.py | 0 .../simple_workflow/correlation.py | 43 +++++ .../simple_workflow/filter_data.py | 55 +++++++ .../simple_workflow/join_data.py | 54 +++++++ .../simple_workflow/load_data.py | 86 ++++++++++ .../simple_workflow/make_workflow/Makefile | 107 ++++++++++++ .../simple_workflow/make_workflow/__init__.py | 0 .../scripts/compute_correlation.py | 42 +++++ .../make_workflow/scripts/download_data.py | 35 ++++ .../make_workflow/scripts/filter_data.py | 42 +++++ .../make_workflow/scripts/generate_heatmap.py | 38 +++++ .../make_workflow/scripts/join_data.py | 44 +++++ .../prefect_workflow/__init__.py | 0 .../simple_workflow/prefect_workflow/flows.py | 115 +++++++++++++ .../prefect_workflow/run_workflow.py | 43 +++++ .../simple_workflow/prefect_workflow/tasks.py | 153 ++++++++++++++++++ .../snakemake_workflow/Snakefile | 145 +++++++++++++++++ .../snakemake_workflow/__init__.py | 0 .../snakemake_workflow/config/config.yaml | 15 ++ .../snakemake_workflow/report/heatmap.rst | 1 + .../snakemake_workflow/report/workflow.rst | 10 ++ .../scripts/compute_correlation.py | 34 ++++ .../scripts/download_data.py | 26 +++ .../snakemake_workflow/scripts/filter_data.py | 33 ++++ .../scripts/generate_heatmap.py | 40 +++++ .../snakemake_workflow/scripts/join_data.py | 35 ++++ .../simple_workflow/visualization.py | 85 ++++++++++ 28 files changed, 1312 insertions(+) create mode 100644 src/BetterCodeBetterScience/simple_workflow/__init__.py create mode 100644 src/BetterCodeBetterScience/simple_workflow/correlation.py create mode 100644 src/BetterCodeBetterScience/simple_workflow/filter_data.py create mode 100644 
src/BetterCodeBetterScience/simple_workflow/join_data.py create mode 100644 src/BetterCodeBetterScience/simple_workflow/load_data.py create mode 100644 src/BetterCodeBetterScience/simple_workflow/make_workflow/Makefile create mode 100644 src/BetterCodeBetterScience/simple_workflow/make_workflow/__init__.py create mode 100644 src/BetterCodeBetterScience/simple_workflow/make_workflow/scripts/compute_correlation.py create mode 100644 src/BetterCodeBetterScience/simple_workflow/make_workflow/scripts/download_data.py create mode 100644 src/BetterCodeBetterScience/simple_workflow/make_workflow/scripts/filter_data.py create mode 100644 src/BetterCodeBetterScience/simple_workflow/make_workflow/scripts/generate_heatmap.py create mode 100644 src/BetterCodeBetterScience/simple_workflow/make_workflow/scripts/join_data.py create mode 100644 src/BetterCodeBetterScience/simple_workflow/prefect_workflow/__init__.py create mode 100644 src/BetterCodeBetterScience/simple_workflow/prefect_workflow/flows.py create mode 100644 src/BetterCodeBetterScience/simple_workflow/prefect_workflow/run_workflow.py create mode 100644 src/BetterCodeBetterScience/simple_workflow/prefect_workflow/tasks.py create mode 100644 src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/Snakefile create mode 100644 src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/__init__.py create mode 100644 src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/config/config.yaml create mode 100644 src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/report/heatmap.rst create mode 100644 src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/report/workflow.rst create mode 100644 src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/scripts/compute_correlation.py create mode 100644 src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/scripts/download_data.py create mode 100644 
src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/scripts/filter_data.py create mode 100644 src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/scripts/generate_heatmap.py create mode 100644 src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/scripts/join_data.py create mode 100644 src/BetterCodeBetterScience/simple_workflow/visualization.py diff --git a/problems_to_solve.md b/problems_to_solve.md index 94a86b1..f45c496 100644 --- a/problems_to_solve.md +++ b/problems_to_solve.md @@ -3,6 +3,37 @@ Open problems marked with [ ] Fixed problems marked with [x] +[x] I would like to generate a new example of a very simple pandas-based data analysis workflow for demonstrating the features of Prefect and snakemake. Put the new code into src/BetterCodeBetterScience/simple_workflow. The example should include separate modules that implement each of the following functions: +- load these two files (using the first column as the index for each): + - https://raw.githubusercontent.com/IanEisenberg/Self_Regulation_Ontology/refs/heads/master/Data/Complete_02-16-2019/meaningful_variables_clean.csv + - https://raw.githubusercontent.com/IanEisenberg/Self_Regulation_Ontology/refs/heads/master/Data/Complete_02-16-2019/demographics.csv +- Filter out any non-numerical variables from each +- join the two data frames based on the index +- compute the correlation matrix across all measures using Spearman correlation +- generate a clustered heatmap from the correlation matrix using Seaborn + - Created `simple_workflow/` directory with modular functions: + - `load_data.py`: Functions to load CSV data from URLs with optional caching + - `filter_data.py`: Functions to filter dataframes to numerical columns only + - `join_data.py`: Functions to join dataframes based on index + - `correlation.py`: Functions to compute Spearman correlation matrices + - `visualization.py`: Functions to generate clustered heatmaps with Seaborn + - Created `prefect_workflow/` 
subdirectory: + - `tasks.py`: Prefect task definitions wrapping each workflow function + - `flows.py`: Main workflow flow orchestrating all steps + - `run_workflow.py`: CLI entry point + - Usage: `python run_workflow.py --output-dir ./output` + - Created `snakemake_workflow/` subdirectory: + - `Snakefile`: Workflow rules with dependencies + - `config/config.yaml`: Configuration for URLs and heatmap settings + - `scripts/*.py`: Scripts for each workflow step + - `report/`: RST files for Snakemake report generation + - Usage: `snakemake --cores 1 --config output_dir=/path/to/output` + - Created `make_workflow/` subdirectory: + - `Makefile`: GNU Make-based workflow with proper dependencies + - `scripts/*.py`: Standalone CLI scripts for each step + - Usage: `make OUTPUT_DIR=/path/to/output all` + + [x] For the Snakemake workflow I would like to use the Snakemake report generating functions to create a report showing the results from each of the analyses. - Added `report: "report/workflow.rst"` global declaration to Snakefile - Created `report/` directory with RST caption files for each figure type diff --git a/src/BetterCodeBetterScience/simple_workflow/__init__.py b/src/BetterCodeBetterScience/simple_workflow/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/src/BetterCodeBetterScience/simple_workflow/correlation.py b/src/BetterCodeBetterScience/simple_workflow/correlation.py new file mode 100644 index 0000000..935b194 --- /dev/null +++ b/src/BetterCodeBetterScience/simple_workflow/correlation.py @@ -0,0 +1,43 @@ +"""Correlation computation module for the simple workflow example. + +This module provides functions to compute correlation matrices. +""" + +import pandas as pd + + +def compute_spearman_correlation(df: pd.DataFrame) -> pd.DataFrame: + """Compute Spearman correlation matrix for a dataframe. 
+ + Parameters + ---------- + df : pd.DataFrame + Input dataframe with numerical columns + + Returns + ------- + pd.DataFrame + Spearman correlation matrix + """ + return df.corr(method="spearman") + + +def compute_correlation_matrix( + df: pd.DataFrame, + method: str = "spearman", +) -> pd.DataFrame: + """Compute correlation matrix using the specified method. + + Parameters + ---------- + df : pd.DataFrame + Input dataframe with numerical columns + method : str + Correlation method: 'pearson', 'spearman', or 'kendall' (default: 'spearman') + + Returns + ------- + pd.DataFrame + Correlation matrix + """ + return df.corr(method=method) diff --git a/src/BetterCodeBetterScience/simple_workflow/filter_data.py b/src/BetterCodeBetterScience/simple_workflow/filter_data.py new file mode 100644 index 0000000..292d73a --- /dev/null +++ b/src/BetterCodeBetterScience/simple_workflow/filter_data.py @@ -0,0 +1,55 @@ +"""Data filtering module for the simple workflow example. + +This module provides functions to filter dataframes to keep only numerical columns. +""" + +import pandas as pd + + +def filter_numerical_columns(df: pd.DataFrame) -> pd.DataFrame: + """Filter a dataframe to keep only numerical columns. + + Parameters + ---------- + df : pd.DataFrame + Input dataframe + + Returns + ------- + pd.DataFrame + Dataframe with only numerical columns + """ + numerical_df = df.select_dtypes(include=["number"]) + return numerical_df + + +def filter_meaningful_variables(df: pd.DataFrame) -> pd.DataFrame: + """Filter meaningful variables dataframe to numerical columns only. + + Parameters + ---------- + df : pd.DataFrame + Meaningful variables dataframe + + Returns + ------- + pd.DataFrame + Filtered dataframe with only numerical columns + """ + return filter_numerical_columns(df) + + +def filter_demographics(df: pd.DataFrame) -> pd.DataFrame: + """Filter demographics dataframe to numerical columns only. 
+ + Parameters + ---------- + df : pd.DataFrame + Demographics dataframe + + Returns + ------- + pd.DataFrame + Filtered dataframe with only numerical columns + """ + return filter_numerical_columns(df) diff --git a/src/BetterCodeBetterScience/simple_workflow/join_data.py b/src/BetterCodeBetterScience/simple_workflow/join_data.py new file mode 100644 index 0000000..f5c8090 --- /dev/null +++ b/src/BetterCodeBetterScience/simple_workflow/join_data.py @@ -0,0 +1,54 @@ +"""Data joining module for the simple workflow example. + +This module provides functions to join dataframes based on their index. +""" + +import pandas as pd + + +def join_dataframes( + df1: pd.DataFrame, + df2: pd.DataFrame, + how: str = "inner", +) -> pd.DataFrame: + """Join two dataframes based on their index. + + Parameters + ---------- + df1 : pd.DataFrame + First dataframe + df2 : pd.DataFrame + Second dataframe + how : str + Type of join: 'inner', 'outer', 'left', 'right' (default: 'inner') + + Returns + ------- + pd.DataFrame + Joined dataframe + """ + return df1.join(df2, how=how, lsuffix="_mv", rsuffix="_demo") + + +def join_meaningful_and_demographics( + meaningful_vars: pd.DataFrame, + demographics: pd.DataFrame, + how: str = "inner", +) -> pd.DataFrame: + """Join meaningful variables and demographics dataframes. 
+ + Parameters + ---------- + meaningful_vars : pd.DataFrame + Meaningful variables dataframe (filtered to numerical) + demographics : pd.DataFrame + Demographics dataframe (filtered to numerical) + how : str + Type of join (default: 'inner') + + Returns + ------- + pd.DataFrame + Joined dataframe + """ + return join_dataframes(meaningful_vars, demographics, how=how) diff --git a/src/BetterCodeBetterScience/simple_workflow/load_data.py b/src/BetterCodeBetterScience/simple_workflow/load_data.py new file mode 100644 index 0000000..59a3c98 --- /dev/null +++ b/src/BetterCodeBetterScience/simple_workflow/load_data.py @@ -0,0 +1,86 @@ +"""Data loading module for the simple workflow example. + +This module provides functions to load CSV data from URLs or local files. +""" + +from pathlib import Path + +import pandas as pd + + +def load_csv_from_url(url: str, index_col: int = 0) -> pd.DataFrame: + """Load a CSV file from a URL. + + Parameters + ---------- + url : str + URL to the CSV file + index_col : int + Column to use as index (default: 0, first column) + + Returns + ------- + pd.DataFrame + Loaded dataframe with the first column as index + """ + return pd.read_csv(url, index_col=index_col) + + +def load_meaningful_variables( + url: str = "https://raw.githubusercontent.com/IanEisenberg/Self_Regulation_Ontology/refs/heads/master/Data/Complete_02-16-2019/meaningful_variables_clean.csv", + cache_path: Path | None = None, +) -> pd.DataFrame: + """Load the meaningful variables dataset. 
+ + Parameters + ---------- + url : str + URL to the meaningful variables CSV file + cache_path : Path, optional + If provided, save/load from this local path + + Returns + ------- + pd.DataFrame + Meaningful variables dataframe + """ + if cache_path is not None and cache_path.exists(): + return pd.read_csv(cache_path, index_col=0) + + df = load_csv_from_url(url) + + if cache_path is not None: + cache_path.parent.mkdir(parents=True, exist_ok=True) + df.to_csv(cache_path) + + return df + + +def load_demographics( + url: str = "https://raw.githubusercontent.com/IanEisenberg/Self_Regulation_Ontology/refs/heads/master/Data/Complete_02-16-2019/demographics.csv", + cache_path: Path | None = None, +) -> pd.DataFrame: + """Load the demographics dataset. + + Parameters + ---------- + url : str + URL to the demographics CSV file + cache_path : Path, optional + If provided, save/load from this local path + + Returns + ------- + pd.DataFrame + Demographics dataframe + """ + if cache_path is not None and cache_path.exists(): + return pd.read_csv(cache_path, index_col=0) + + df = load_csv_from_url(url) + + if cache_path is not None: + cache_path.parent.mkdir(parents=True, exist_ok=True) + df.to_csv(cache_path) + + return df diff --git a/src/BetterCodeBetterScience/simple_workflow/make_workflow/Makefile b/src/BetterCodeBetterScience/simple_workflow/make_workflow/Makefile new file mode 100644 index 0000000..b1c642f --- /dev/null +++ b/src/BetterCodeBetterScience/simple_workflow/make_workflow/Makefile @@ -0,0 +1,107 @@ +# Simple Correlation Workflow using Make +# +# This Makefile demonstrates a simple data analysis pipeline using Make. +# Each target represents a step in the workflow, with dependencies ensuring +# correct execution order. 
+# +# Usage: +# make OUTPUT_DIR=/path/to/output all +# make OUTPUT_DIR=/path/to/output clean +# make OUTPUT_DIR=/path/to/output help + +# Configuration +OUTPUT_DIR ?= ./output +DATA_DIR := $(OUTPUT_DIR)/data +RESULTS_DIR := $(OUTPUT_DIR)/results +FIGURES_DIR := $(OUTPUT_DIR)/figures +LOGS_DIR := $(OUTPUT_DIR)/logs + +# Data URLs +MV_URL := https://raw.githubusercontent.com/IanEisenberg/Self_Regulation_Ontology/refs/heads/master/Data/Complete_02-16-2019/meaningful_variables_clean.csv +DEMO_URL := https://raw.githubusercontent.com/IanEisenberg/Self_Regulation_Ontology/refs/heads/master/Data/Complete_02-16-2019/demographics.csv + +# Python command +PYTHON := python + +# Default target +.PHONY: all +all: $(FIGURES_DIR)/correlation_heatmap.png + +# Help target +.PHONY: help +help: + @echo "Simple Correlation Workflow" + @echo "" + @echo "Usage: make OUTPUT_DIR=/path/to/output " + @echo "" + @echo "Targets:" + @echo " all - Run full workflow (default)" + @echo " clean - Remove all generated files" + @echo " help - Show this help message" + @echo "" + @echo "Individual steps:" + @echo " download_mv - Download meaningful variables data" + @echo " download_demo - Download demographics data" + @echo " filter_mv - Filter meaningful variables to numerical" + @echo " filter_demo - Filter demographics to numerical" + @echo " join - Join the two datasets" + @echo " correlation - Compute correlation matrix" + @echo " heatmap - Generate clustered heatmap" + +# Create directories +$(DATA_DIR) $(RESULTS_DIR) $(FIGURES_DIR) $(LOGS_DIR): + mkdir -p $@ + +# Step 1a: Download meaningful variables +.PHONY: download_mv +download_mv: $(DATA_DIR)/meaningful_variables.csv + +$(DATA_DIR)/meaningful_variables.csv: | $(DATA_DIR) $(LOGS_DIR) + $(PYTHON) scripts/download_data.py "$(MV_URL)" "$@" > $(LOGS_DIR)/download_mv.log 2>&1 + +# Step 1b: Download demographics +.PHONY: download_demo +download_demo: $(DATA_DIR)/demographics.csv + +$(DATA_DIR)/demographics.csv: | $(DATA_DIR) $(LOGS_DIR) + 
$(PYTHON) scripts/download_data.py "$(DEMO_URL)" "$@" > $(LOGS_DIR)/download_demo.log 2>&1 + +# Step 2a: Filter meaningful variables +.PHONY: filter_mv +filter_mv: $(DATA_DIR)/meaningful_variables_numerical.csv + +$(DATA_DIR)/meaningful_variables_numerical.csv: $(DATA_DIR)/meaningful_variables.csv | $(LOGS_DIR) + $(PYTHON) scripts/filter_data.py "$<" "$@" > $(LOGS_DIR)/filter_mv.log 2>&1 + +# Step 2b: Filter demographics +.PHONY: filter_demo +filter_demo: $(DATA_DIR)/demographics_numerical.csv + +$(DATA_DIR)/demographics_numerical.csv: $(DATA_DIR)/demographics.csv | $(LOGS_DIR) + $(PYTHON) scripts/filter_data.py "$<" "$@" > $(LOGS_DIR)/filter_demo.log 2>&1 + +# Step 3: Join datasets +.PHONY: join +join: $(DATA_DIR)/joined_data.csv + +$(DATA_DIR)/joined_data.csv: $(DATA_DIR)/meaningful_variables_numerical.csv $(DATA_DIR)/demographics_numerical.csv | $(LOGS_DIR) + $(PYTHON) scripts/join_data.py "$<" "$(word 2,$^)" "$@" > $(LOGS_DIR)/join.log 2>&1 + +# Step 4: Compute correlation +.PHONY: correlation +correlation: $(RESULTS_DIR)/correlation_matrix.csv + +$(RESULTS_DIR)/correlation_matrix.csv: $(DATA_DIR)/joined_data.csv | $(RESULTS_DIR) $(LOGS_DIR) + $(PYTHON) scripts/compute_correlation.py "$<" "$@" > $(LOGS_DIR)/correlation.log 2>&1 + +# Step 5: Generate heatmap +.PHONY: heatmap +heatmap: $(FIGURES_DIR)/correlation_heatmap.png + +$(FIGURES_DIR)/correlation_heatmap.png: $(RESULTS_DIR)/correlation_matrix.csv | $(FIGURES_DIR) $(LOGS_DIR) + $(PYTHON) scripts/generate_heatmap.py "$<" "$@" > $(LOGS_DIR)/heatmap.log 2>&1 + +# Clean target +.PHONY: clean +clean: + rm -rf $(OUTPUT_DIR) diff --git a/src/BetterCodeBetterScience/simple_workflow/make_workflow/__init__.py b/src/BetterCodeBetterScience/simple_workflow/make_workflow/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/src/BetterCodeBetterScience/simple_workflow/make_workflow/scripts/compute_correlation.py b/src/BetterCodeBetterScience/simple_workflow/make_workflow/scripts/compute_correlation.py new 
file mode 100644 index 0000000..500366e --- /dev/null +++ b/src/BetterCodeBetterScience/simple_workflow/make_workflow/scripts/compute_correlation.py @@ -0,0 +1,42 @@ +#!/usr/bin/env python3 +"""Compute Spearman correlation matrix. + +Usage: + python compute_correlation.py +""" + +import sys +from pathlib import Path + +import pandas as pd + +from BetterCodeBetterScience.simple_workflow.correlation import ( + compute_spearman_correlation, +) + + +def main(): + """Compute Spearman correlation matrix.""" + if len(sys.argv) != 3: + print("Usage: python compute_correlation.py ") + sys.exit(1) + + input_path = Path(sys.argv[1]) + output_path = Path(sys.argv[2]) + + # Load data + df = pd.read_csv(input_path, index_col=0) + print(f"Loaded {df.shape} from {input_path}") + + # Compute correlation + corr_matrix = compute_spearman_correlation(df) + print(f"Computed Spearman correlation matrix: {corr_matrix.shape}") + + # Save + output_path.parent.mkdir(parents=True, exist_ok=True) + corr_matrix.to_csv(output_path) + print(f"Saved to {output_path}") + + +if __name__ == "__main__": + main() diff --git a/src/BetterCodeBetterScience/simple_workflow/make_workflow/scripts/download_data.py b/src/BetterCodeBetterScience/simple_workflow/make_workflow/scripts/download_data.py new file mode 100644 index 0000000..b843979 --- /dev/null +++ b/src/BetterCodeBetterScience/simple_workflow/make_workflow/scripts/download_data.py @@ -0,0 +1,35 @@ +#!/usr/bin/env python3 +"""Download data from URL and save to CSV. 
+ +Usage: + python download_data.py +""" + +import sys +from pathlib import Path + +import pandas as pd + + +def main(): + """Download data from URL.""" + if len(sys.argv) != 3: + print("Usage: python download_data.py ") + sys.exit(1) + + url = sys.argv[1] + output_path = Path(sys.argv[2]) + + # Create output directory + output_path.parent.mkdir(parents=True, exist_ok=True) + + # Download and save + df = pd.read_csv(url, index_col=0) + df.to_csv(output_path) + + print(f"Downloaded {len(df)} rows from {url}") + print(f"Saved to {output_path}") + + +if __name__ == "__main__": + main() diff --git a/src/BetterCodeBetterScience/simple_workflow/make_workflow/scripts/filter_data.py b/src/BetterCodeBetterScience/simple_workflow/make_workflow/scripts/filter_data.py new file mode 100644 index 0000000..62f0c86 --- /dev/null +++ b/src/BetterCodeBetterScience/simple_workflow/make_workflow/scripts/filter_data.py @@ -0,0 +1,42 @@ +#!/usr/bin/env python3 +"""Filter dataframe to numerical columns only. + +Usage: + python filter_data.py +""" + +import sys +from pathlib import Path + +import pandas as pd + +from BetterCodeBetterScience.simple_workflow.filter_data import ( + filter_numerical_columns, +) + + +def main(): + """Filter data to numerical columns.""" + if len(sys.argv) != 3: + print("Usage: python filter_data.py ") + sys.exit(1) + + input_path = Path(sys.argv[1]) + output_path = Path(sys.argv[2]) + + # Load data + df = pd.read_csv(input_path, index_col=0) + print(f"Loaded {df.shape} from {input_path}") + + # Filter to numerical columns + df_num = filter_numerical_columns(df) + print(f"Filtered to {df_num.shape} (numerical columns only)") + + # Save + output_path.parent.mkdir(parents=True, exist_ok=True) + df_num.to_csv(output_path) + print(f"Saved to {output_path}") + + +if __name__ == "__main__": + main() diff --git a/src/BetterCodeBetterScience/simple_workflow/make_workflow/scripts/generate_heatmap.py 
b/src/BetterCodeBetterScience/simple_workflow/make_workflow/scripts/generate_heatmap.py new file mode 100644 index 0000000..1f55cde --- /dev/null +++ b/src/BetterCodeBetterScience/simple_workflow/make_workflow/scripts/generate_heatmap.py @@ -0,0 +1,38 @@ +#!/usr/bin/env python3 +"""Generate clustered heatmap from correlation matrix. + +Usage: + python generate_heatmap.py +""" + +import sys +from pathlib import Path + +import pandas as pd + +from BetterCodeBetterScience.simple_workflow.visualization import ( + generate_clustered_heatmap, +) + + +def main(): + """Generate and save clustered heatmap.""" + if len(sys.argv) != 3: + print("Usage: python generate_heatmap.py ") + sys.exit(1) + + input_path = Path(sys.argv[1]) + output_path = Path(sys.argv[2]) + + # Load correlation matrix + corr_matrix = pd.read_csv(input_path, index_col=0) + print(f"Loaded correlation matrix: {corr_matrix.shape}") + + # Generate heatmap + output_path.parent.mkdir(parents=True, exist_ok=True) + generate_clustered_heatmap(corr_matrix, output_path=output_path) + print(f"Saved heatmap to {output_path}") + + +if __name__ == "__main__": + main() diff --git a/src/BetterCodeBetterScience/simple_workflow/make_workflow/scripts/join_data.py b/src/BetterCodeBetterScience/simple_workflow/make_workflow/scripts/join_data.py new file mode 100644 index 0000000..593a713 --- /dev/null +++ b/src/BetterCodeBetterScience/simple_workflow/make_workflow/scripts/join_data.py @@ -0,0 +1,44 @@ +#!/usr/bin/env python3 +"""Join two dataframes based on their index. 
+ +Usage: + python join_data.py +""" + +import sys +from pathlib import Path + +import pandas as pd + +from BetterCodeBetterScience.simple_workflow.join_data import join_dataframes + + +def main(): + """Join the two datasets.""" + if len(sys.argv) != 4: + print("Usage: python join_data.py ") + sys.exit(1) + + input1_path = Path(sys.argv[1]) + input2_path = Path(sys.argv[2]) + output_path = Path(sys.argv[3]) + + # Load data + df1 = pd.read_csv(input1_path, index_col=0) + df2 = pd.read_csv(input2_path, index_col=0) + + print(f"Dataset 1: {df1.shape}") + print(f"Dataset 2: {df2.shape}") + + # Join + joined = join_dataframes(df1, df2) + print(f"Joined: {joined.shape}") + + # Save + output_path.parent.mkdir(parents=True, exist_ok=True) + joined.to_csv(output_path) + print(f"Saved to {output_path}") + + +if __name__ == "__main__": + main() diff --git a/src/BetterCodeBetterScience/simple_workflow/prefect_workflow/__init__.py b/src/BetterCodeBetterScience/simple_workflow/prefect_workflow/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/src/BetterCodeBetterScience/simple_workflow/prefect_workflow/flows.py b/src/BetterCodeBetterScience/simple_workflow/prefect_workflow/flows.py new file mode 100644 index 0000000..498e670 --- /dev/null +++ b/src/BetterCodeBetterScience/simple_workflow/prefect_workflow/flows.py @@ -0,0 +1,115 @@ +"""Prefect flow definitions for the simple correlation workflow. + +This workflow demonstrates Prefect features with a simple pandas-based analysis: +1. Load two datasets from URLs +2. Filter to numerical columns +3. Join the datasets +4. Compute Spearman correlation +5. 
Generate a clustered heatmap +""" + +from pathlib import Path + +from prefect import flow, get_run_logger + +from BetterCodeBetterScience.simple_workflow.prefect_workflow.tasks import ( + compute_correlation_task, + filter_numerical_task, + generate_heatmap_task, + join_dataframes_task, + load_demographics_task, + load_meaningful_variables_task, + save_correlation_task, +) + + +@flow(name="simple_correlation_workflow", log_prints=True) +def run_workflow( + output_dir: Path, + cache_data: bool = True, +) -> Path: + """Run the simple correlation workflow. + + Steps: + 1. Load meaningful variables and demographics datasets + 2. Filter both to numerical columns only + 3. Join the datasets on their index + 4. Compute Spearman correlation matrix + 5. Generate clustered heatmap + + Parameters + ---------- + output_dir : Path + Directory to save outputs (correlation matrix CSV and heatmap) + cache_data : bool + Whether to cache downloaded data locally (default: True) + + Returns + ------- + Path + Path to the generated heatmap + """ + logger = get_run_logger() + + # Set up directories + output_dir = Path(output_dir) + data_dir = output_dir / "data" + results_dir = output_dir / "results" + figures_dir = output_dir / "figures" + + for d in [data_dir, results_dir, figures_dir]: + d.mkdir(parents=True, exist_ok=True) + + # Step 1: Load data (the two loads are independent, but direct task calls like these run sequentially) + logger.info("Step 1: Loading datasets...") + mv_cache = data_dir / "meaningful_variables.csv" if cache_data else None + demo_cache = data_dir / "demographics.csv" if cache_data else None + + meaningful_vars = load_meaningful_variables_task(cache_path=mv_cache) + demographics = load_demographics_task(cache_path=demo_cache) + + logger.info(f" Meaningful variables: {meaningful_vars.shape}") + logger.info(f" Demographics: {demographics.shape}") + + # Step 2: Filter to numerical columns + logger.info("Step 2: Filtering to numerical columns...") + meaningful_vars_num = filter_numerical_task(meaningful_vars) +
demographics_num = filter_numerical_task(demographics) + + logger.info(f" Meaningful variables (numerical): {meaningful_vars_num.shape}") + logger.info(f" Demographics (numerical): {demographics_num.shape}") + + # Step 3: Join datasets + logger.info("Step 3: Joining datasets...") + joined_df = join_dataframes_task(meaningful_vars_num, demographics_num) + logger.info(f" Joined dataset: {joined_df.shape}") + + # Step 4: Compute correlation matrix + logger.info("Step 4: Computing Spearman correlation matrix...") + corr_matrix = compute_correlation_task(joined_df) + logger.info(f" Correlation matrix: {corr_matrix.shape}") + + # Save correlation matrix + corr_path = results_dir / "correlation_matrix.csv" + save_correlation_task(corr_matrix, corr_path) + logger.info(f" Saved correlation matrix to: {corr_path}") + + # Step 5: Generate heatmap + logger.info("Step 5: Generating clustered heatmap...") + heatmap_path = figures_dir / "correlation_heatmap.png" + generate_heatmap_task(corr_matrix, heatmap_path) + logger.info(f" Saved heatmap to: {heatmap_path}") + + logger.info("Workflow complete!") + return heatmap_path + + +if __name__ == "__main__": + import sys + + if len(sys.argv) < 2: + print("Usage: python flows.py ") + sys.exit(1) + + output_dir = Path(sys.argv[1]) + run_workflow(output_dir) diff --git a/src/BetterCodeBetterScience/simple_workflow/prefect_workflow/run_workflow.py b/src/BetterCodeBetterScience/simple_workflow/prefect_workflow/run_workflow.py new file mode 100644 index 0000000..e85c670 --- /dev/null +++ b/src/BetterCodeBetterScience/simple_workflow/prefect_workflow/run_workflow.py @@ -0,0 +1,43 @@ +#!/usr/bin/env python3 +"""CLI entry point for the simple correlation Prefect workflow. 
+ +Usage: + python run_workflow.py --output-dir ./output + python run_workflow.py --output-dir ./output --no-cache +""" + +import argparse +from pathlib import Path + +from BetterCodeBetterScience.simple_workflow.prefect_workflow.flows import run_workflow + + +def main(): + """Run the simple correlation workflow from the command line.""" + parser = argparse.ArgumentParser( + description="Run the simple correlation workflow with Prefect" + ) + parser.add_argument( + "--output-dir", + type=Path, + required=True, + help="Directory to save outputs", + ) + parser.add_argument( + "--no-cache", + action="store_true", + help="Do not cache downloaded data locally", + ) + + args = parser.parse_args() + + result = run_workflow( + output_dir=args.output_dir, + cache_data=not args.no_cache, + ) + + print(f"\nWorkflow complete! Heatmap saved to: {result}") + + +if __name__ == "__main__": + main() diff --git a/src/BetterCodeBetterScience/simple_workflow/prefect_workflow/tasks.py b/src/BetterCodeBetterScience/simple_workflow/prefect_workflow/tasks.py new file mode 100644 index 0000000..4d5193e --- /dev/null +++ b/src/BetterCodeBetterScience/simple_workflow/prefect_workflow/tasks.py @@ -0,0 +1,153 @@ +"""Prefect task definitions for the simple correlation workflow. + +Each task wraps a function from the modular workflow modules. 
+""" + +from pathlib import Path + +import pandas as pd +from prefect import task + +from BetterCodeBetterScience.simple_workflow.correlation import ( + compute_spearman_correlation, +) +from BetterCodeBetterScience.simple_workflow.filter_data import ( + filter_numerical_columns, +) +from BetterCodeBetterScience.simple_workflow.join_data import join_dataframes +from BetterCodeBetterScience.simple_workflow.load_data import ( + load_demographics, + load_meaningful_variables, +) +from BetterCodeBetterScience.simple_workflow.visualization import ( + generate_clustered_heatmap, + save_correlation_matrix, +) + + +@task(name="load_meaningful_variables") +def load_meaningful_variables_task( + cache_path: Path | None = None, +) -> pd.DataFrame: + """Load meaningful variables dataset. + + Parameters + ---------- + cache_path : Path, optional + Path to cache the downloaded data + + Returns + ------- + pd.DataFrame + Meaningful variables dataframe + """ + return load_meaningful_variables(cache_path=cache_path) + + +@task(name="load_demographics") +def load_demographics_task( + cache_path: Path | None = None, +) -> pd.DataFrame: + """Load demographics dataset. + + Parameters + ---------- + cache_path : Path, optional + Path to cache the downloaded data + + Returns + ------- + pd.DataFrame + Demographics dataframe + """ + return load_demographics(cache_path=cache_path) + + +@task(name="filter_numerical_columns") +def filter_numerical_task(df: pd.DataFrame) -> pd.DataFrame: + """Filter dataframe to keep only numerical columns. + + Parameters + ---------- + df : pd.DataFrame + Input dataframe + + Returns + ------- + pd.DataFrame + Filtered dataframe + """ + return filter_numerical_columns(df) + + +@task(name="join_dataframes") +def join_dataframes_task( + df1: pd.DataFrame, + df2: pd.DataFrame, +) -> pd.DataFrame: + """Join two dataframes based on their index. 
+ + Parameters + ---------- + df1 : pd.DataFrame + First dataframe + df2 : pd.DataFrame + Second dataframe + + Returns + ------- + pd.DataFrame + Joined dataframe + """ + return join_dataframes(df1, df2) + + +@task(name="compute_correlation") +def compute_correlation_task(df: pd.DataFrame) -> pd.DataFrame: + """Compute Spearman correlation matrix. + + Parameters + ---------- + df : pd.DataFrame + Input dataframe + + Returns + ------- + pd.DataFrame + Correlation matrix + """ + return compute_spearman_correlation(df) + + +@task(name="save_correlation_matrix") +def save_correlation_task( + corr_matrix: pd.DataFrame, + output_path: Path, +) -> None: + """Save correlation matrix to CSV. + + Parameters + ---------- + corr_matrix : pd.DataFrame + Correlation matrix + output_path : Path + Output path + """ + save_correlation_matrix(corr_matrix, output_path) + + +@task(name="generate_heatmap") +def generate_heatmap_task( + corr_matrix: pd.DataFrame, + output_path: Path, +) -> None: + """Generate and save clustered heatmap. + + Parameters + ---------- + corr_matrix : pd.DataFrame + Correlation matrix + output_path : Path + Output path for the figure + """ + generate_clustered_heatmap(corr_matrix, output_path) diff --git a/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/Snakefile b/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/Snakefile new file mode 100644 index 0000000..88104ce --- /dev/null +++ b/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/Snakefile @@ -0,0 +1,145 @@ +"""Simple correlation Snakemake workflow. + +This workflow demonstrates Snakemake features with a simple pandas-based analysis: +1. Load two datasets from URLs +2. Filter to numerical columns +3. Join the datasets +4. Compute Spearman correlation +5. 
Generate a clustered heatmap + +Usage: + # Run full workflow + snakemake --cores 1 --config output_dir=/path/to/output + + # Dry run + snakemake -n --config output_dir=/path/to/output + + # Generate report + snakemake --report report.html --config output_dir=/path/to/output +""" + +from pathlib import Path + +from snakemake.utils import min_version + +min_version("8.0") + + +# Load configuration +configfile: "config/config.yaml" + + +# Global report +report: "report/workflow.rst" + + +# Validate required config +if config.get("output_dir") is None: + raise ValueError("output_dir must be provided via --config output_dir=/path/to/output") + +OUTPUT_DIR = Path(config["output_dir"]) +DATA_DIR = OUTPUT_DIR / "data" +RESULTS_DIR = OUTPUT_DIR / "results" +FIGURES_DIR = OUTPUT_DIR / "figures" + + +# Default target +rule all: + input: + FIGURES_DIR / "correlation_heatmap.png", + + +# Step 1a: Download meaningful variables data +rule download_meaningful_variables: + output: + DATA_DIR / "meaningful_variables.csv", + params: + url=config["meaningful_variables_url"], + log: + OUTPUT_DIR / "logs" / "download_meaningful_variables.log", + script: + "scripts/download_data.py" + + +# Step 1b: Download demographics data +rule download_demographics: + output: + DATA_DIR / "demographics.csv", + params: + url=config["demographics_url"], + log: + OUTPUT_DIR / "logs" / "download_demographics.log", + script: + "scripts/download_data.py" + + +# Step 2a: Filter meaningful variables to numerical columns +rule filter_meaningful_variables: + input: + DATA_DIR / "meaningful_variables.csv", + output: + DATA_DIR / "meaningful_variables_numerical.csv", + log: + OUTPUT_DIR / "logs" / "filter_meaningful_variables.log", + script: + "scripts/filter_data.py" + + +# Step 2b: Filter demographics to numerical columns +rule filter_demographics: + input: + DATA_DIR / "demographics.csv", + output: + DATA_DIR / "demographics_numerical.csv", + log: + OUTPUT_DIR / "logs" / "filter_demographics.log", + script: + 
"scripts/filter_data.py" + + +# Step 3: Join the two datasets +rule join_datasets: + input: + meaningful_vars=DATA_DIR / "meaningful_variables_numerical.csv", + demographics=DATA_DIR / "demographics_numerical.csv", + output: + DATA_DIR / "joined_data.csv", + log: + OUTPUT_DIR / "logs" / "join_datasets.log", + script: + "scripts/join_data.py" + + +# Step 4: Compute correlation matrix +rule compute_correlation: + input: + DATA_DIR / "joined_data.csv", + output: + RESULTS_DIR / "correlation_matrix.csv", + params: + method=config["correlation_method"], + log: + OUTPUT_DIR / "logs" / "compute_correlation.log", + script: + "scripts/compute_correlation.py" + + +# Step 5: Generate clustered heatmap +rule generate_heatmap: + input: + RESULTS_DIR / "correlation_matrix.csv", + output: + report( + FIGURES_DIR / "correlation_heatmap.png", + caption="report/heatmap.rst", + category="Results", + ), + params: + figsize=config["heatmap"]["figsize"], + cmap=config["heatmap"]["cmap"], + vmin=config["heatmap"]["vmin"], + vmax=config["heatmap"]["vmax"], + log: + OUTPUT_DIR / "logs" / "generate_heatmap.log", + script: + "scripts/generate_heatmap.py" diff --git a/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/__init__.py b/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/config/config.yaml b/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/config/config.yaml new file mode 100644 index 0000000..332399f --- /dev/null +++ b/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/config/config.yaml @@ -0,0 +1,15 @@ +# Configuration for the simple correlation Snakemake workflow + +# Data URLs +meaningful_variables_url: "https://raw.githubusercontent.com/IanEisenberg/Self_Regulation_Ontology/refs/heads/master/Data/Complete_02-16-2019/meaningful_variables_clean.csv" +demographics_url: 
"https://raw.githubusercontent.com/IanEisenberg/Self_Regulation_Ontology/refs/heads/master/Data/Complete_02-16-2019/demographics.csv" + +# Correlation settings +correlation_method: "spearman" + +# Heatmap settings +heatmap: + figsize: [12, 10] + cmap: "coolwarm" + vmin: -1.0 + vmax: 1.0 diff --git a/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/report/heatmap.rst b/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/report/heatmap.rst new file mode 100644 index 0000000..8700d3b --- /dev/null +++ b/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/report/heatmap.rst @@ -0,0 +1 @@ +Clustered correlation heatmap showing Spearman correlations between all numerical variables from the meaningful variables and demographics datasets. Variables are hierarchically clustered to reveal patterns of related measures. diff --git a/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/report/workflow.rst b/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/report/workflow.rst new file mode 100644 index 0000000..0dadb46 --- /dev/null +++ b/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/report/workflow.rst @@ -0,0 +1,10 @@ +Simple Correlation Workflow +=========================== + +This workflow demonstrates a simple pandas-based data analysis pipeline: + +1. **Data Loading**: Downloads two datasets from the Self-Regulation Ontology project +2. **Filtering**: Removes non-numerical columns from both datasets +3. **Joining**: Combines the datasets based on their common index (subject IDs) +4. **Correlation**: Computes Spearman correlation matrix across all measures +5. 
**Visualization**: Generates a clustered heatmap showing correlation patterns diff --git a/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/scripts/compute_correlation.py b/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/scripts/compute_correlation.py new file mode 100644 index 0000000..5095888 --- /dev/null +++ b/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/scripts/compute_correlation.py @@ -0,0 +1,34 @@ +"""Snakemake script for computing correlation matrix.""" + +from pathlib import Path + +import pandas as pd + +from BetterCodeBetterScience.simple_workflow.correlation import ( + compute_correlation_matrix, +) + + +def main(): + """Compute a correlation matrix using the configured method.""" + # ruff: noqa: F821 + input_path = Path(snakemake.input[0]) + output_path = Path(snakemake.output[0]) + method = snakemake.params.method + + # Load data + df = pd.read_csv(input_path, index_col=0) + print(f"Loaded {df.shape} from {input_path}") + + # Compute correlation + corr_matrix = compute_correlation_matrix(df, method=method) + print(f"Computed {method} correlation matrix: {corr_matrix.shape}") + + # Save + output_path.parent.mkdir(parents=True, exist_ok=True) + corr_matrix.to_csv(output_path) + print(f"Saved to {output_path}") + + +if __name__ == "__main__": + main() diff --git a/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/scripts/download_data.py b/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/scripts/download_data.py new file mode 100644 index 0000000..61474f2 --- /dev/null +++ b/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/scripts/download_data.py @@ -0,0 +1,26 @@ +"""Snakemake script for downloading data from URL.""" + +from pathlib import Path + +import pandas as pd + + +def main(): + """Download data from URL.""" + # ruff: noqa: F821 + url = snakemake.params.url + output_path = Path(snakemake.output[0]) + + # Create output directory + output_path.parent.mkdir(parents=True,
exist_ok=True) + + # Download and save + df = pd.read_csv(url, index_col=0) + df.to_csv(output_path) + + print(f"Downloaded {len(df)} rows from {url}") + print(f"Saved to {output_path}") + + +if __name__ == "__main__": + main() diff --git a/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/scripts/filter_data.py b/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/scripts/filter_data.py new file mode 100644 index 0000000..325b818 --- /dev/null +++ b/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/scripts/filter_data.py @@ -0,0 +1,33 @@ +"""Snakemake script for filtering data to numerical columns.""" + +from pathlib import Path + +import pandas as pd + +from BetterCodeBetterScience.simple_workflow.filter_data import ( + filter_numerical_columns, +) + + +def main(): + """Filter data to numerical columns.""" + # ruff: noqa: F821 + input_path = Path(snakemake.input[0]) + output_path = Path(snakemake.output[0]) + + # Load data + df = pd.read_csv(input_path, index_col=0) + print(f"Loaded {df.shape} from {input_path}") + + # Filter to numerical columns + df_num = filter_numerical_columns(df) + print(f"Filtered to {df_num.shape} (numerical columns only)") + + # Save + output_path.parent.mkdir(parents=True, exist_ok=True) + df_num.to_csv(output_path) + print(f"Saved to {output_path}") + + +if __name__ == "__main__": + main() diff --git a/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/scripts/generate_heatmap.py b/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/scripts/generate_heatmap.py new file mode 100644 index 0000000..cb59946 --- /dev/null +++ b/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/scripts/generate_heatmap.py @@ -0,0 +1,40 @@ +"""Snakemake script for generating clustered heatmap.""" + +from pathlib import Path + +import pandas as pd + +from BetterCodeBetterScience.simple_workflow.visualization import ( + generate_clustered_heatmap, +) + + +def main(): + """Generate and save 
clustered heatmap.""" + # ruff: noqa: F821 + input_path = Path(snakemake.input[0]) + output_path = Path(snakemake.output[0]) + figsize = tuple(snakemake.params.figsize) + cmap = snakemake.params.cmap + vmin = snakemake.params.vmin + vmax = snakemake.params.vmax + + # Load correlation matrix + corr_matrix = pd.read_csv(input_path, index_col=0) + print(f"Loaded correlation matrix: {corr_matrix.shape}") + + # Generate heatmap + output_path.parent.mkdir(parents=True, exist_ok=True) + generate_clustered_heatmap( + corr_matrix, + output_path=output_path, + figsize=figsize, + cmap=cmap, + vmin=vmin, + vmax=vmax, + ) + print(f"Saved heatmap to {output_path}") + + +if __name__ == "__main__": + main() diff --git a/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/scripts/join_data.py b/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/scripts/join_data.py new file mode 100644 index 0000000..73b915d --- /dev/null +++ b/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/scripts/join_data.py @@ -0,0 +1,35 @@ +"""Snakemake script for joining two dataframes.""" + +from pathlib import Path + +import pandas as pd + +from BetterCodeBetterScience.simple_workflow.join_data import join_dataframes + + +def main(): + """Join the two datasets.""" + # ruff: noqa: F821 + mv_path = Path(snakemake.input.meaningful_vars) + demo_path = Path(snakemake.input.demographics) + output_path = Path(snakemake.output[0]) + + # Load data + meaningful_vars = pd.read_csv(mv_path, index_col=0) + demographics = pd.read_csv(demo_path, index_col=0) + + print(f"Meaningful variables: {meaningful_vars.shape}") + print(f"Demographics: {demographics.shape}") + + # Join + joined = join_dataframes(meaningful_vars, demographics) + print(f"Joined: {joined.shape}") + + # Save + output_path.parent.mkdir(parents=True, exist_ok=True) + joined.to_csv(output_path) + print(f"Saved to {output_path}") + + +if __name__ == "__main__": + main() diff --git 
a/src/BetterCodeBetterScience/simple_workflow/visualization.py b/src/BetterCodeBetterScience/simple_workflow/visualization.py new file mode 100644 index 0000000..5c30f19 --- /dev/null +++ b/src/BetterCodeBetterScience/simple_workflow/visualization.py @@ -0,0 +1,85 @@ +"""Visualization module for the simple workflow example. + +This module provides functions to generate heatmaps from correlation matrices. +""" + +from pathlib import Path + +import matplotlib.pyplot as plt +import pandas as pd +import seaborn as sns + + +def generate_clustered_heatmap( + corr_matrix: pd.DataFrame, + output_path: Path | None = None, + figsize: tuple[int, int] = (12, 10), + cmap: str = "coolwarm", + vmin: float = -1.0, + vmax: float = 1.0, +) -> sns.matrix.ClusterGrid: + """Generate a clustered heatmap from a correlation matrix. + + Parameters + ---------- + corr_matrix : pd.DataFrame + Correlation matrix + output_path : Path, optional + If provided, save the figure to this path + figsize : tuple + Figure size (width, height) in inches + cmap : str + Colormap name (default: 'coolwarm') + vmin : float + Minimum value for color scale + vmax : float + Maximum value for color scale + + Returns + ------- + sns.matrix.ClusterGrid + The ClusterGrid object containing the heatmap + """ + # Create clustered heatmap + g = sns.clustermap( + corr_matrix, + cmap=cmap, + vmin=vmin, + vmax=vmax, + figsize=figsize, + dendrogram_ratio=(0.1, 0.1), + cbar_pos=(0.02, 0.8, 0.03, 0.15), + xticklabels=True, + yticklabels=True, + ) + + # Rotate x-axis labels for readability + plt.setp(g.ax_heatmap.get_xticklabels(), rotation=90, fontsize=6) + plt.setp(g.ax_heatmap.get_yticklabels(), rotation=0, fontsize=6) + + # Set title + g.fig.suptitle("Clustered Correlation Heatmap (Spearman)", y=1.02, fontsize=14) + + # Save if output path provided + if output_path is not None: + output_path.parent.mkdir(parents=True, exist_ok=True) + g.savefig(output_path, dpi=150, bbox_inches="tight") + + return g + + +def 
save_correlation_matrix( + corr_matrix: pd.DataFrame, + output_path: Path, +) -> None: + """Save a correlation matrix to a CSV file. + + Parameters + ---------- + corr_matrix : pd.DataFrame + Correlation matrix + output_path : Path + Path to save the CSV file + """ + output_path.parent.mkdir(parents=True, exist_ok=True) + corr_matrix.to_csv(output_path) From 2a13fd2da2718ab4c50715368daa0161d389d98b Mon Sep 17 00:00:00 2001 From: Russ Poldrack Date: Tue, 23 Dec 2025 14:40:23 -0800 Subject: [PATCH 70/87] add reporting cmd --- src/BetterCodeBetterScience/rnaseq/snakemake_workflow/Makefile | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/Makefile b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/Makefile index 52d3e71..59a7b59 100644 --- a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/Makefile +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/Makefile @@ -3,3 +3,6 @@ export rulegraph: snakemake --rulegraph --config datadir=$(DATADIR)/immune_aging/workflow/ --cores 2 | dot -Tpng > rulegraph.png + +report: + snakemake --verbose --report report.html --config datadir=$(DATADIR)/immune_aging/ --cores 1 From c37360e7247df64240b1fa4e6b214fbdf4789ee8 Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Tue, 23 Dec 2025 14:44:05 -0800 Subject: [PATCH 71/87] fix paths --- .../rnaseq/snakemake_workflow/Makefile | 15 +++++++-------- 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/Makefile b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/Makefile index 0947d14..147bd26 100644 --- a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/Makefile +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/Makefile @@ -3,26 +3,25 @@ export # Generate rule dependency graph rulegraph: - snakemake --rulegraph --config datadir=$(DATADIR)/immune_aging/workflow/ --cores 2 | dot -Tpng > rulegraph.png + snakemake --rulegraph --config 
datadir=$(DATADIR)/immune_aging/wf_snakemake/ --cores 2 | dot -Tpng > rulegraph.png # Run the full workflow run: - snakemake --cores 8 --config datadir=$(DATADIR)/immune_aging/workflow/ - + snakemake --cores 8 --config datadir=$(DATADIR)/immune_aging/wf_snakemake/ # Generate HTML report (run after workflow completes) report: - snakemake --report $(DATADIR)/immune_aging/workflow/wf_snakemake/report.html --config datadir=$(DATADIR)/immune_aging/workflow/ + snakemake --report $(DATADIR)/immune_aging/wf_snakemake/report.html --config datadir=$(DATADIR)/immune_aging/ # Generate ZIP report (for larger reports with many figures) report-zip: - snakemake --report $(DATADIR)/immune_aging/workflow/wf_snakemake/report.zip --config datadir=$(DATADIR)/immune_aging/workflow/ + snakemake --report $(DATADIR)/immune_aging/workflow/wf_snakemake/report.zip --config datadir=$(DATADIR)/immune_aging/ # Dry run - show what would be executed dry-run: - snakemake -n --config datadir=$(DATADIR)/immune_aging/workflow/ + snakemake -n --config datadir=$(DATADIR)/immune_aging/wf_snakemake/ # Clean all outputs clean: - rm -rf $(DATADIR)/immune_aging/workflow/wf_snakemake/ - + rm -rf $(DATADIR)/immune_aging/wf_snakemake + .PHONY: rulegraph run report report-zip dry-run clean From daebf73f03aa764bd77c45380537332dadf79e94 Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Tue, 23 Dec 2025 14:51:46 -0800 Subject: [PATCH 72/87] Fix missing doublet UMAP visualization in QC step MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The doublet_detection_umap.png was declared as an output in the Snakemake workflow but never generated because: 1. plot_doublets() was never called in run_qc_pipeline() 2. No UMAP coordinates existed at the QC step Added compute_umap_for_qc() to compute a simple UMAP embedding specifically for doublet visualization. This runs on a temporary copy to preserve raw counts, then copies coordinates back. 
Updated run_qc_pipeline() to compute UMAP and plot doublets after detection but before filtering, so both doublets and singlets are visible in the visualization. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 --- .../modular_workflow/quality_control.py | 51 ++++++++++++++++++- 1 file changed, 49 insertions(+), 2 deletions(-) diff --git a/src/BetterCodeBetterScience/rnaseq/modular_workflow/quality_control.py b/src/BetterCodeBetterScience/rnaseq/modular_workflow/quality_control.py index 92fa012..3e83822 100644 --- a/src/BetterCodeBetterScience/rnaseq/modular_workflow/quality_control.py +++ b/src/BetterCodeBetterScience/rnaseq/modular_workflow/quality_control.py @@ -224,16 +224,56 @@ def detect_doublets_per_donor( return adata_combined +def compute_umap_for_qc(adata: ad.AnnData, n_pcs: int = 30) -> ad.AnnData: + """Compute a simple UMAP embedding for QC visualization. + + This computes a quick UMAP on the raw counts for visualizing + doublet detection results. This is separate from the main + dimensionality reduction in step 5. + + Parameters + ---------- + adata : AnnData + AnnData object (raw counts in .X) + n_pcs : int + Number of principal components to use + + Returns + ------- + AnnData + AnnData with UMAP coordinates in .obsm['X_umap'] + """ + # Work on a copy to avoid modifying the original + adata_temp = adata.copy() + + # Basic preprocessing for UMAP computation + sc.pp.normalize_total(adata_temp, target_sum=1e4) + sc.pp.log1p(adata_temp) + sc.pp.highly_variable_genes(adata_temp, n_top_genes=2000, flavor="seurat_v3") + sc.pp.pca(adata_temp, n_comps=n_pcs, use_highly_variable=True) + sc.pp.neighbors(adata_temp, n_neighbors=15, n_pcs=n_pcs) + sc.tl.umap(adata_temp) + + # Copy UMAP coordinates back to original + adata.obsm["X_umap"] = adata_temp.obsm["X_umap"] + + return adata + + def plot_doublets(adata: ad.AnnData, figure_dir: Path | None = None) -> None: """Visualize doublet detection results on UMAP. 
Parameters ---------- adata : AnnData - AnnData object with doublet annotations + AnnData object with doublet annotations and UMAP coordinates figure_dir : Path, optional Directory to save figures """ + if "X_umap" not in adata.obsm: + print("Warning: No UMAP coordinates found, skipping doublet plot") + return + sc.pl.umap(adata, color=["doublet_score", "predicted_doublet"], size=20, show=False) if figure_dir is not None: plt.savefig( @@ -312,8 +352,15 @@ def run_qc_pipeline( adata, min_genes, max_genes, min_counts, max_counts, max_hb_pct ) - # Detect and filter doublets + # Detect doublets adata = detect_doublets_per_donor(adata, expected_doublet_rate) + + # Compute UMAP for doublet visualization (before filtering) + print("Computing UMAP for doublet visualization...") + adata = compute_umap_for_qc(adata) + plot_doublets(adata, figure_dir) + + # Filter doublets adata = filter_doublets(adata) # Save raw counts for HVG selection (step 4) and pseudobulking (step 7) From a07cecb66f80d0cd8ab36d6dea508ae5a31cba4d Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Tue, 23 Dec 2025 15:22:06 -0800 Subject: [PATCH 73/87] Fix report caption paths in Snakemake rule files MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Caption paths were relative to the rules/ directory but pointed to report/ instead of ../report/. This caused "caption file not found" errors when generating Snakemake reports. 
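As a rough illustration of the path problem described above, the sketch below resolves a caption path against the rules/ directory. It is not Snakemake's own resolution code, and the directory layout (`snakemake_workflow/rules` next to `snakemake_workflow/report`) is an assumption inferred from the file paths in this patch.

```python
import posixpath

# Assumed layout (hypothetical, inferred from the paths in this patch):
#   snakemake_workflow/rules/*.smk
#   snakemake_workflow/report/*.rst
RULES_DIR = "snakemake_workflow/rules"


def resolve_caption(caption: str, base_dir: str = RULES_DIR) -> str:
    """Resolve a caption path relative to the directory holding the rule file."""
    return posixpath.normpath(posixpath.join(base_dir, caption))


# Old caption value: stays inside rules/, where no report/ directory exists.
old_path = resolve_caption("report/gsea.rst")

# New caption value: steps up one level before entering report/.
new_path = resolve_caption("../report/gsea.rst")

print(old_path)
print(new_path)
```

Under that assumed layout, only the second path points into the sibling report/ directory, which is consistent with the "caption file not found" errors going away after this change.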
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 --- .../snakemake_workflow/rules/per_cell_type.smk | 6 +++--- .../snakemake_workflow/rules/preprocessing.smk | 16 ++++++++-------- .../snakemake_workflow/rules/pseudobulk.smk | 2 +- 3 files changed, 12 insertions(+), 12 deletions(-) diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/per_cell_type.smk b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/per_cell_type.smk index c5b0149..ea5bbe1 100644 --- a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/per_cell_type.smk +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/per_cell_type.smk @@ -41,7 +41,7 @@ rule pathway_analysis: gsea_results=RESULTS_DIR / "per_cell_type" / "{cell_type}" / "gsea_results.pkl", fig_gsea=report( RESULTS_DIR / "per_cell_type" / "{cell_type}" / "figures" / "gsea_pathways.png", - caption="report/gsea.rst", + caption="../report/gsea.rst", category="Step 9: Pathway Analysis (GSEA)", subcategory="{cell_type}", ), @@ -67,7 +67,7 @@ rule overrepresentation: enr_down=RESULTS_DIR / "per_cell_type" / "{cell_type}" / "enrichr_down.pkl", fig_enrichr=report( RESULTS_DIR / "per_cell_type" / "{cell_type}" / "figures" / "enrichr_pathways.png", - caption="report/enrichr.rst", + caption="../report/enrichr.rst", category="Step 10: Overrepresentation Analysis", subcategory="{cell_type}", ), @@ -98,7 +98,7 @@ rule predictive_modeling: / "prediction_results.pkl", fig_prediction=report( RESULTS_DIR / "per_cell_type" / "{cell_type}" / "figures" / "age_prediction_performance.png", - caption="report/prediction.rst", + caption="../report/prediction.rst", category="Step 11: Predictive Modeling", subcategory="{cell_type}", ), diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/preprocessing.smk b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/preprocessing.smk index 29d4946..5700454 100644 --- 
a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/preprocessing.smk +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/preprocessing.smk @@ -30,7 +30,7 @@ rule filter_data: checkpoint=CHECKPOINT_DIR / bids_checkpoint_name(DATASET, 2, "filtered"), fig_donor_counts=report( FIGURE_DIR / "donor_cell_counts_distribution.png", - caption="report/filtering.rst", + caption="../report/filtering.rst", category="Step 2: Filtering", ), params: @@ -52,22 +52,22 @@ rule quality_control: checkpoint=CHECKPOINT_DIR / bids_checkpoint_name(DATASET, 3, "qc"), fig_violin=report( FIGURE_DIR / "qc_violin_plots.png", - caption="report/qc_violin.rst", + caption="../report/qc_violin.rst", category="Step 3: Quality Control", ), fig_scatter=report( FIGURE_DIR / "qc_scatter_doublets.png", - caption="report/qc_scatter.rst", + caption="../report/qc_scatter.rst", category="Step 3: Quality Control", ), fig_hemoglobin=report( FIGURE_DIR / "hemoglobin_distribution.png", - caption="report/hemoglobin.rst", + caption="../report/hemoglobin.rst", category="Step 3: Quality Control", ), fig_doublet_umap=report( FIGURE_DIR / "doublet_detection_umap.png", - caption="report/doublet_umap.rst", + caption="../report/doublet_umap.rst", category="Step 3: Quality Control", ), threads: workflow.cores @@ -110,12 +110,12 @@ rule dimensionality_reduction: checkpoint=CHECKPOINT_DIR / bids_checkpoint_name(DATASET, 5, "dimreduced"), fig_pca=report( FIGURE_DIR / "pca_cell_type.png", - caption="report/pca.rst", + caption="../report/pca.rst", category="Step 5: Dimensionality Reduction", ), fig_umap=report( FIGURE_DIR / "umap_total_counts.png", - caption="report/umap.rst", + caption="../report/umap.rst", category="Step 5: Dimensionality Reduction", ), threads: workflow.cores @@ -138,7 +138,7 @@ rule clustering: checkpoint=CHECKPOINT_DIR / bids_checkpoint_name(DATASET, 6, "clustered"), fig_clustering=report( FIGURE_DIR / "umap_cell_type_leiden.png", - caption="report/clustering.rst", + 
caption="../report/clustering.rst", category="Step 6: Clustering", ), params: diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/pseudobulk.smk b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/pseudobulk.smk index 7e71baf..606ab86 100644 --- a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/pseudobulk.smk +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/rules/pseudobulk.smk @@ -29,7 +29,7 @@ checkpoint pseudobulk: # Pseudobulk figure fig_pseudobulk=report( FIGURE_DIR / "pseudobulk_violin.png", - caption="report/pseudobulk.rst", + caption="../report/pseudobulk.rst", category="Step 7: Pseudobulking", ), params: From de06d21c74d178aa1920a486c388d299d8de65ff Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Tue, 23 Dec 2025 15:34:37 -0800 Subject: [PATCH 74/87] Fix Snakemake scripts to use named output references MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Scripts were using snakemake.output[0] but the rules have named outputs (checkpoint=, fig_*=). This caused checkpoint files to not be saved to the correct path. Changed to snakemake.output.checkpoint in filter.py, qc.py, dimred.py, and cluster.py. 
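One general reason to prefer named access is that it does not depend on the order in which outputs are declared. The sketch below uses a hypothetical stand-in class (`NamedOutputs`, not Snakemake's own output object) and made-up file names to show the difference between positional and named lookup.

```python
class NamedOutputs:
    """Hypothetical stand-in for a rule's named outputs (not Snakemake's class)."""

    def __init__(self, **named: str) -> None:
        # Keyword arguments keep their declaration order.
        self._names = list(named)
        self._values = list(named.values())

    def __getitem__(self, index: int) -> str:
        # Positional access: the result depends on declaration order.
        return self._values[index]

    def __getattr__(self, name: str) -> str:
        # Named access: the result is independent of declaration order.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._values[self._names.index(name)]
        except ValueError:
            raise AttributeError(name) from None


# Illustrative file names only; the real paths come from the rule definitions.
checkpoint_first = NamedOutputs(
    checkpoint="step3_qc.h5ad", fig_violin="qc_violin_plots.png"
)
figure_first = NamedOutputs(
    fig_violin="qc_violin_plots.png", checkpoint="step3_qc.h5ad"
)

# Index 0 changes meaning when the declaration order changes...
print(checkpoint_first[0], figure_first[0])

# ...while the named lookup returns the checkpoint in both cases.
print(checkpoint_first.checkpoint, figure_first.checkpoint)
```

With named lookup, adding or reordering figure outputs in a rule leaves the script's checkpoint path unchanged.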
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 --- .../rnaseq/snakemake_workflow/scripts/cluster.py | 2 +- .../rnaseq/snakemake_workflow/scripts/dimred.py | 2 +- .../rnaseq/snakemake_workflow/scripts/filter.py | 2 +- .../rnaseq/snakemake_workflow/scripts/qc.py | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/cluster.py b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/cluster.py index 4a67cb4..ff8ed68 100644 --- a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/cluster.py +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/cluster.py @@ -15,7 +15,7 @@ def main(): """Run clustering pipeline.""" # ruff: noqa: F821 input_file = Path(snakemake.input[0]) - output_file = Path(snakemake.output[0]) + output_file = Path(snakemake.output.checkpoint) # Get parameters resolution = snakemake.params.resolution diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/dimred.py b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/dimred.py index 4ce8f1e..4e72cb6 100644 --- a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/dimred.py +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/dimred.py @@ -22,7 +22,7 @@ def main(): print(f"Running with {snakemake.threads} threads") input_file = Path(snakemake.input[0]) - output_file = Path(snakemake.output[0]) + output_file = Path(snakemake.output.checkpoint) # Get parameters batch_key = snakemake.params.batch_key diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/filter.py b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/filter.py index 4935521..41a24bc 100644 --- a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/filter.py +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/filter.py @@ -15,7 +15,7 @@ def main(): """Load data and run filtering 
pipeline.""" # ruff: noqa: F821 input_file = Path(snakemake.input[0]) - output_file = Path(snakemake.output[0]) + output_file = Path(snakemake.output.checkpoint) # Get parameters cutoff_percentile = snakemake.params.cutoff_percentile diff --git a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/qc.py b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/qc.py index 4839258..abda90f 100644 --- a/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/qc.py +++ b/src/BetterCodeBetterScience/rnaseq/snakemake_workflow/scripts/qc.py @@ -15,7 +15,7 @@ def main(): """Run quality control pipeline.""" # ruff: noqa: F821 input_file = Path(snakemake.input[0]) - output_file = Path(snakemake.output[0]) + output_file = Path(snakemake.output.checkpoint) # Get parameters min_genes = snakemake.params.min_genes From 7dcc0a160b71479367a22942ea680065785ece5a Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Wed, 24 Dec 2025 05:55:15 -0800 Subject: [PATCH 75/87] file naming --- myst.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/myst.yml b/myst.yml index 741d1b8..edc3d9b 100644 --- a/myst.yml +++ b/myst.yml @@ -23,7 +23,7 @@ project: - file: book/project_organization.md - file: book/data_management.md # - file: workflows -# - file: validation_robustness.md +# - file: validation.md # - file: performance # - file: HPC # - file: sharing From dd97412f3edc1024c550c92df596589c019cb678 Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Wed, 24 Dec 2025 05:56:51 -0800 Subject: [PATCH 76/87] simplify workflow --- .../simple_workflow/make_workflow/Makefile | 107 ++---------------- .../scripts/compute_correlation.py | 22 ++-- .../make_workflow/scripts/download_data.py | 32 +++--- .../make_workflow/scripts/filter_data.py | 34 +++--- .../make_workflow/scripts/generate_heatmap.py | 12 +- .../make_workflow/scripts/join_data.py | 28 ++--- 6 files changed, 69 insertions(+), 166 deletions(-) diff --git 
a/src/BetterCodeBetterScience/simple_workflow/make_workflow/Makefile b/src/BetterCodeBetterScience/simple_workflow/make_workflow/Makefile index b1c642f..b3367b6 100644 --- a/src/BetterCodeBetterScience/simple_workflow/make_workflow/Makefile +++ b/src/BetterCodeBetterScience/simple_workflow/make_workflow/Makefile @@ -1,107 +1,20 @@ # Simple Correlation Workflow using Make # -# This Makefile demonstrates a simple data analysis pipeline using Make. -# Each target represents a step in the workflow, with dependencies ensuring -# correct execution order. -# # Usage: -# make OUTPUT_DIR=/path/to/output all -# make OUTPUT_DIR=/path/to/output clean -# make OUTPUT_DIR=/path/to/output help +# make all - Run full workflow +# make clean - Remove output directory -# Configuration OUTPUT_DIR ?= ./output -DATA_DIR := $(OUTPUT_DIR)/data -RESULTS_DIR := $(OUTPUT_DIR)/results -FIGURES_DIR := $(OUTPUT_DIR)/figures -LOGS_DIR := $(OUTPUT_DIR)/logs - -# Data URLs -MV_URL := https://raw.githubusercontent.com/IanEisenberg/Self_Regulation_Ontology/refs/heads/master/Data/Complete_02-16-2019/meaningful_variables_clean.csv -DEMO_URL := https://raw.githubusercontent.com/IanEisenberg/Self_Regulation_Ontology/refs/heads/master/Data/Complete_02-16-2019/demographics.csv - -# Python command -PYTHON := python - -# Default target -.PHONY: all -all: $(FIGURES_DIR)/correlation_heatmap.png - -# Help target -.PHONY: help -help: - @echo "Simple Correlation Workflow" - @echo "" - @echo "Usage: make OUTPUT_DIR=/path/to/output " - @echo "" - @echo "Targets:" - @echo " all - Run full workflow (default)" - @echo " clean - Remove all generated files" - @echo " help - Show this help message" - @echo "" - @echo "Individual steps:" - @echo " download_mv - Download meaningful variables data" - @echo " download_demo - Download demographics data" - @echo " filter_mv - Filter meaningful variables to numerical" - @echo " filter_demo - Filter demographics to numerical" - @echo " join - Join the two datasets" - @echo " 
correlation - Compute correlation matrix" - @echo " heatmap - Generate clustered heatmap" - -# Create directories -$(DATA_DIR) $(RESULTS_DIR) $(FIGURES_DIR) $(LOGS_DIR): - mkdir -p $@ - -# Step 1a: Download meaningful variables -.PHONY: download_mv -download_mv: $(DATA_DIR)/meaningful_variables.csv - -$(DATA_DIR)/meaningful_variables.csv: | $(DATA_DIR) $(LOGS_DIR) - $(PYTHON) scripts/download_data.py "$(MV_URL)" "$@" > $(LOGS_DIR)/download_mv.log 2>&1 - -# Step 1b: Download demographics -.PHONY: download_demo -download_demo: $(DATA_DIR)/demographics.csv - -$(DATA_DIR)/demographics.csv: | $(DATA_DIR) $(LOGS_DIR) - $(PYTHON) scripts/download_data.py "$(DEMO_URL)" "$@" > $(LOGS_DIR)/download_demo.log 2>&1 - -# Step 2a: Filter meaningful variables -.PHONY: filter_mv -filter_mv: $(DATA_DIR)/meaningful_variables_numerical.csv - -$(DATA_DIR)/meaningful_variables_numerical.csv: $(DATA_DIR)/meaningful_variables.csv | $(LOGS_DIR) - $(PYTHON) scripts/filter_data.py "$<" "$@" > $(LOGS_DIR)/filter_mv.log 2>&1 - -# Step 2b: Filter demographics -.PHONY: filter_demo -filter_demo: $(DATA_DIR)/demographics_numerical.csv - -$(DATA_DIR)/demographics_numerical.csv: $(DATA_DIR)/demographics.csv | $(LOGS_DIR) - $(PYTHON) scripts/filter_data.py "$<" "$@" > $(LOGS_DIR)/filter_demo.log 2>&1 - -# Step 3: Join datasets -.PHONY: join -join: $(DATA_DIR)/joined_data.csv - -$(DATA_DIR)/joined_data.csv: $(DATA_DIR)/meaningful_variables_numerical.csv $(DATA_DIR)/demographics_numerical.csv | $(LOGS_DIR) - $(PYTHON) scripts/join_data.py "$<" "$(word 2,$^)" "$@" > $(LOGS_DIR)/join.log 2>&1 - -# Step 4: Compute correlation -.PHONY: correlation -correlation: $(RESULTS_DIR)/correlation_matrix.csv - -$(RESULTS_DIR)/correlation_matrix.csv: $(DATA_DIR)/joined_data.csv | $(RESULTS_DIR) $(LOGS_DIR) - $(PYTHON) scripts/compute_correlation.py "$<" "$@" > $(LOGS_DIR)/correlation.log 2>&1 -# Step 5: Generate heatmap -.PHONY: heatmap -heatmap: $(FIGURES_DIR)/correlation_heatmap.png +.PHONY: all clean 
-$(FIGURES_DIR)/correlation_heatmap.png: $(RESULTS_DIR)/correlation_matrix.csv | $(FIGURES_DIR) $(LOGS_DIR) - $(PYTHON) scripts/generate_heatmap.py "$<" "$@" > $(LOGS_DIR)/heatmap.log 2>&1 +all: + mkdir -p $(OUTPUT_DIR)/data $(OUTPUT_DIR)/results $(OUTPUT_DIR)/figures + python scripts/download_data.py $(OUTPUT_DIR)/data + python scripts/filter_data.py $(OUTPUT_DIR)/data + python scripts/join_data.py $(OUTPUT_DIR)/data + python scripts/compute_correlation.py $(OUTPUT_DIR)/data $(OUTPUT_DIR)/results + python scripts/generate_heatmap.py $(OUTPUT_DIR)/results $(OUTPUT_DIR)/figures -# Clean target -.PHONY: clean clean: rm -rf $(OUTPUT_DIR) diff --git a/src/BetterCodeBetterScience/simple_workflow/make_workflow/scripts/compute_correlation.py b/src/BetterCodeBetterScience/simple_workflow/make_workflow/scripts/compute_correlation.py index 500366e..e09a55b 100644 --- a/src/BetterCodeBetterScience/simple_workflow/make_workflow/scripts/compute_correlation.py +++ b/src/BetterCodeBetterScience/simple_workflow/make_workflow/scripts/compute_correlation.py @@ -2,7 +2,7 @@ """Compute Spearman correlation matrix. 
Usage: - python compute_correlation.py + python compute_correlation.py """ import sys @@ -18,24 +18,20 @@ def main(): """Compute Spearman correlation matrix.""" if len(sys.argv) != 3: - print("Usage: python compute_correlation.py ") + print("Usage: python compute_correlation.py ") sys.exit(1) - input_path = Path(sys.argv[1]) - output_path = Path(sys.argv[2]) + data_dir = Path(sys.argv[1]) + results_dir = Path(sys.argv[2]) - # Load data - df = pd.read_csv(input_path, index_col=0) - print(f"Loaded {df.shape} from {input_path}") + # Load joined data + df = pd.read_csv(data_dir / "joined_data.csv", index_col=0) + print(f"Loaded joined data: {df.shape}") # Compute correlation corr_matrix = compute_spearman_correlation(df) - print(f"Computed Spearman correlation matrix: {corr_matrix.shape}") - - # Save - output_path.parent.mkdir(parents=True, exist_ok=True) - corr_matrix.to_csv(output_path) - print(f"Saved to {output_path}") + corr_matrix.to_csv(results_dir / "correlation_matrix.csv") + print(f"Saved correlation matrix: {corr_matrix.shape}") if __name__ == "__main__": diff --git a/src/BetterCodeBetterScience/simple_workflow/make_workflow/scripts/download_data.py b/src/BetterCodeBetterScience/simple_workflow/make_workflow/scripts/download_data.py index b843979..a3abe23 100644 --- a/src/BetterCodeBetterScience/simple_workflow/make_workflow/scripts/download_data.py +++ b/src/BetterCodeBetterScience/simple_workflow/make_workflow/scripts/download_data.py @@ -1,8 +1,8 @@ #!/usr/bin/env python3 -"""Download data from URL and save to CSV. +"""Download data files. 
Usage: - python download_data.py + python download_data.py """ import sys @@ -10,25 +10,27 @@ import pandas as pd +MV_URL = "https://raw.githubusercontent.com/IanEisenberg/Self_Regulation_Ontology/refs/heads/master/Data/Complete_02-16-2019/meaningful_variables_clean.csv" +DEMO_URL = "https://raw.githubusercontent.com/IanEisenberg/Self_Regulation_Ontology/refs/heads/master/Data/Complete_02-16-2019/demographics.csv" + def main(): - """Download data from URL.""" - if len(sys.argv) != 3: - print("Usage: python download_data.py ") + """Download both data files.""" + if len(sys.argv) != 2: + print("Usage: python download_data.py ") sys.exit(1) - url = sys.argv[1] - output_path = Path(sys.argv[2]) - - # Create output directory - output_path.parent.mkdir(parents=True, exist_ok=True) + data_dir = Path(sys.argv[1]) - # Download and save - df = pd.read_csv(url, index_col=0) - df.to_csv(output_path) + # Download meaningful variables + mv_df = pd.read_csv(MV_URL, index_col=0) + mv_df.to_csv(data_dir / "meaningful_variables.csv") + print(f"Downloaded meaningful_variables.csv ({len(mv_df)} rows)") - print(f"Downloaded {len(df)} rows from {url}") - print(f"Saved to {output_path}") + # Download demographics + demo_df = pd.read_csv(DEMO_URL, index_col=0) + demo_df.to_csv(data_dir / "demographics.csv") + print(f"Downloaded demographics.csv ({len(demo_df)} rows)") if __name__ == "__main__": diff --git a/src/BetterCodeBetterScience/simple_workflow/make_workflow/scripts/filter_data.py b/src/BetterCodeBetterScience/simple_workflow/make_workflow/scripts/filter_data.py index 62f0c86..73c13bc 100644 --- a/src/BetterCodeBetterScience/simple_workflow/make_workflow/scripts/filter_data.py +++ b/src/BetterCodeBetterScience/simple_workflow/make_workflow/scripts/filter_data.py @@ -1,8 +1,8 @@ #!/usr/bin/env python3 -"""Filter dataframe to numerical columns only. +"""Filter dataframes to numerical columns only. 
Usage: - python filter_data.py + python filter_data.py """ import sys @@ -16,26 +16,24 @@ def main(): - """Filter data to numerical columns.""" - if len(sys.argv) != 3: - print("Usage: python filter_data.py ") + """Filter both datasets to numerical columns.""" + if len(sys.argv) != 2: + print("Usage: python filter_data.py ") sys.exit(1) - input_path = Path(sys.argv[1]) - output_path = Path(sys.argv[2]) + data_dir = Path(sys.argv[1]) - # Load data - df = pd.read_csv(input_path, index_col=0) - print(f"Loaded {df.shape} from {input_path}") + # Filter meaningful variables + mv_df = pd.read_csv(data_dir / "meaningful_variables.csv", index_col=0) + mv_num = filter_numerical_columns(mv_df) + mv_num.to_csv(data_dir / "meaningful_variables_numerical.csv") + print(f"Filtered meaningful_variables: {mv_df.shape} -> {mv_num.shape}") - # Filter to numerical columns - df_num = filter_numerical_columns(df) - print(f"Filtered to {df_num.shape} (numerical columns only)") - - # Save - output_path.parent.mkdir(parents=True, exist_ok=True) - df_num.to_csv(output_path) - print(f"Saved to {output_path}") + # Filter demographics + demo_df = pd.read_csv(data_dir / "demographics.csv", index_col=0) + demo_num = filter_numerical_columns(demo_df) + demo_num.to_csv(data_dir / "demographics_numerical.csv") + print(f"Filtered demographics: {demo_df.shape} -> {demo_num.shape}") if __name__ == "__main__": diff --git a/src/BetterCodeBetterScience/simple_workflow/make_workflow/scripts/generate_heatmap.py b/src/BetterCodeBetterScience/simple_workflow/make_workflow/scripts/generate_heatmap.py index 1f55cde..11b0415 100644 --- a/src/BetterCodeBetterScience/simple_workflow/make_workflow/scripts/generate_heatmap.py +++ b/src/BetterCodeBetterScience/simple_workflow/make_workflow/scripts/generate_heatmap.py @@ -2,7 +2,7 @@ """Generate clustered heatmap from correlation matrix. 
Usage: - python generate_heatmap.py + python generate_heatmap.py """ import sys @@ -18,18 +18,18 @@ def main(): """Generate and save clustered heatmap.""" if len(sys.argv) != 3: - print("Usage: python generate_heatmap.py ") + print("Usage: python generate_heatmap.py ") sys.exit(1) - input_path = Path(sys.argv[1]) - output_path = Path(sys.argv[2]) + results_dir = Path(sys.argv[1]) + figures_dir = Path(sys.argv[2]) # Load correlation matrix - corr_matrix = pd.read_csv(input_path, index_col=0) + corr_matrix = pd.read_csv(results_dir / "correlation_matrix.csv", index_col=0) print(f"Loaded correlation matrix: {corr_matrix.shape}") # Generate heatmap - output_path.parent.mkdir(parents=True, exist_ok=True) + output_path = figures_dir / "correlation_heatmap.png" generate_clustered_heatmap(corr_matrix, output_path=output_path) print(f"Saved heatmap to {output_path}") diff --git a/src/BetterCodeBetterScience/simple_workflow/make_workflow/scripts/join_data.py b/src/BetterCodeBetterScience/simple_workflow/make_workflow/scripts/join_data.py index 593a713..b8f09f4 100644 --- a/src/BetterCodeBetterScience/simple_workflow/make_workflow/scripts/join_data.py +++ b/src/BetterCodeBetterScience/simple_workflow/make_workflow/scripts/join_data.py @@ -2,7 +2,7 @@ """Join two dataframes based on their index. 
Usage: - python join_data.py + python join_data.py """ import sys @@ -15,30 +15,24 @@ def main(): """Join the two datasets.""" - if len(sys.argv) != 4: - print("Usage: python join_data.py ") + if len(sys.argv) != 2: + print("Usage: python join_data.py ") sys.exit(1) - input1_path = Path(sys.argv[1]) - input2_path = Path(sys.argv[2]) - output_path = Path(sys.argv[3]) + data_dir = Path(sys.argv[1]) - # Load data - df1 = pd.read_csv(input1_path, index_col=0) - df2 = pd.read_csv(input2_path, index_col=0) + # Load filtered data + mv_df = pd.read_csv(data_dir / "meaningful_variables_numerical.csv", index_col=0) + demo_df = pd.read_csv(data_dir / "demographics_numerical.csv", index_col=0) - print(f"Dataset 1: {df1.shape}") - print(f"Dataset 2: {df2.shape}") + print(f"Meaningful variables: {mv_df.shape}") + print(f"Demographics: {demo_df.shape}") # Join - joined = join_dataframes(df1, df2) + joined = join_dataframes(mv_df, demo_df) + joined.to_csv(data_dir / "joined_data.csv") print(f"Joined: {joined.shape}") - # Save - output_path.parent.mkdir(parents=True, exist_ok=True) - joined.to_csv(output_path) - print(f"Saved to {output_path}") - if __name__ == "__main__": main() From 066be6766db7e4806358c45c97bc8780a6737e6c Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Wed, 24 Dec 2025 05:57:39 -0800 Subject: [PATCH 77/87] clean up figure --- .../simple_workflow/visualization.py | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/src/BetterCodeBetterScience/simple_workflow/visualization.py b/src/BetterCodeBetterScience/simple_workflow/visualization.py index 5c30f19..f6dddca 100644 --- a/src/BetterCodeBetterScience/simple_workflow/visualization.py +++ b/src/BetterCodeBetterScience/simple_workflow/visualization.py @@ -13,7 +13,7 @@ def generate_clustered_heatmap( corr_matrix: pd.DataFrame, output_path: Path | None = None, - figsize: tuple[int, int] = (12, 10), + figsize: tuple[int, int] = (8, 10), cmap: str = "coolwarm", vmin: float = -1.0, vmax: 
float = 1.0, @@ -49,13 +49,12 @@ def generate_clustered_heatmap( figsize=figsize, dendrogram_ratio=(0.1, 0.1), cbar_pos=(0.02, 0.8, 0.03, 0.15), - xticklabels=True, + xticklabels=False, yticklabels=True, ) - # Rotate x-axis labels for readability - plt.setp(g.ax_heatmap.get_xticklabels(), rotation=90, fontsize=6) - plt.setp(g.ax_heatmap.get_yticklabels(), rotation=0, fontsize=6) + # Set y-axis label font size + plt.setp(g.ax_heatmap.get_yticklabels(), rotation=0, fontsize=3) # Set title g.fig.suptitle("Clustered Correlation Heatmap (Spearman)", y=1.02, fontsize=14) @@ -63,7 +62,7 @@ def generate_clustered_heatmap( # Save if output path provided if output_path is not None: output_path.parent.mkdir(parents=True, exist_ok=True) - g.savefig(output_path, dpi=150, bbox_inches="tight") + g.savefig(output_path, dpi=300, bbox_inches="tight") return g From 584e0c112dad82c7036c98ae4761a518bdc6622f Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Wed, 24 Dec 2025 05:59:00 -0800 Subject: [PATCH 78/87] add return values --- src/BetterCodeBetterScience/simple_workflow/filter_data.py | 1 + src/BetterCodeBetterScience/simple_workflow/load_data.py | 1 + 2 files changed, 2 insertions(+) diff --git a/src/BetterCodeBetterScience/simple_workflow/filter_data.py b/src/BetterCodeBetterScience/simple_workflow/filter_data.py index 292d73a..c4b3bbe 100644 --- a/src/BetterCodeBetterScience/simple_workflow/filter_data.py +++ b/src/BetterCodeBetterScience/simple_workflow/filter_data.py @@ -53,3 +53,4 @@ def filter_demographics(df: pd.DataFrame) -> pd.DataFrame: Filtered dataframe with only numerical columns """ return filter_numerical_columns(df) + diff --git a/src/BetterCodeBetterScience/simple_workflow/load_data.py b/src/BetterCodeBetterScience/simple_workflow/load_data.py index 59a3c98..5f67460 100644 --- a/src/BetterCodeBetterScience/simple_workflow/load_data.py +++ b/src/BetterCodeBetterScience/simple_workflow/load_data.py @@ -84,3 +84,4 @@ def load_demographics( df.to_csv(cache_path) 
return df + From 7e87b67e13246fc56cfd0ed1801fc8b8a7c4009d Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Wed, 24 Dec 2025 05:59:23 -0800 Subject: [PATCH 79/87] first draft of simple workflows --- book/workflows.md | 123 ++++++++++++++++++++++++++++++++++++---------- 1 file changed, 98 insertions(+), 25 deletions(-) diff --git a/book/workflows.md b/book/workflows.md index 041d1f1..075c0f8 100644 --- a/book/workflows.md +++ b/book/workflows.md @@ -20,7 +20,7 @@ Third, we care about the *engineering quality* of the code, which includes: - *Maintainability*: The workflow is structured and documented so that others (including your future self) can easily maintain, update, and extend it in the future. - *Modularity*: The workflow is composed of a set of independently testable modules, which can be swapped in or out relatively easily. -- *Idempotency*: This term from computer science means that the result of the workflow does not change after its first successful run, which allows safely rerunning the workflow when there is a failure. +- *Idempotency*: This term from computer science means that the result of the workflow does not change after its first successful run, which allows safely rerunning the workflow. - *Traceability*: All operations are logged, and provenance information is stored for outputs. Finally, we care about the *efficiency* of the workflow implementation. This includes: @@ -91,10 +91,99 @@ Name: EverArrested, dtype: float64 Note that `pandas` data frames also include an explicit `.pipe` method that allows using arbitrary functions within a pipeline. While these kinds of pipelines can be useful for simple data processing operations, they can become very difficult to debug, so I would generally avoid using complex functions within a method chain. 
+## A simple workflow example -## An example of a complex workflow +Most real scientific workflows are complex and can often run for hours, and we will encounter such a complex workflow later in the chapter. However, we will start our discussion of workflows with a relatively simple and fast-running example that will help us understand the basic concepts of workflow execution. We will use the same data as above (from Eisenberg et al.) to perform a simple workflow: -In this chapter we will focus primarily on complex workflows that have many stages. I will use a running example to show how to move from a monolithic analysis script to a well-structured and usable workflow that meets most of the desired features described earlier. For this example I will use an analysis of single-cell RNA-sequencing data to determine how gene expression in immune system cells changes with age. This analysis will utilize a [large openly available dataset](https://cellxgene.cziscience.com/collections/dde06e0f-ab3b-46be-96a2-a8082383c4a1) that includes data from about 1.3 million immune system cells for about 35K transcripts. I chose this particular example for several reasons: +- Load the demographic and meaningful variables files +- Filter out any non-numeric variables from each data frame +- Join the data frames using their shared index +- Compute the correlation matrix across all variables +- Generate a clustered heatmap for the correlation matrix + +### Running a simple workflow using UNIX make + +One of the simplest ways to organize a workflow is using the UNIX `make` command, which executes commands defined in a file named `Makefile`. `make` is a very handy general-purpose tool that every user of UNIX systems should become familiar with. 
The Makefile defines a set of labeled commands, like this: + +```Makefile + +all: step1 step2 + +step1: + python step1.py + +step2: + python step2.py +``` + +In this case, the command `make step1` will run `python step1.py`, `make step2` will run `python step2.py`, and `make all` will run both of those commands. This should already show you why `make` is such a handy tool: Any time there is a command that you run regularly in a particular directory, you can put it into a `Makefile` and then execute it with just a single `make` call. Here is how we could build a very simple Makefile to run our simple workflow: + +```Makefile +# Simple Correlation Workflow using Make +# +# Usage: +# make all - Run full workflow +# make clean - Remove output directory + +# if OUTPUT_DIR isn't already defined, set it to the default +OUTPUT_DIR ?= ./output + +# run commands even if files exist with these names +.PHONY: all clean + +all: + mkdir -p $(OUTPUT_DIR)/data $(OUTPUT_DIR)/results $(OUTPUT_DIR)/figures + python scripts/download_data.py $(OUTPUT_DIR)/data + python scripts/filter_data.py $(OUTPUT_DIR)/data + python scripts/join_data.py $(OUTPUT_DIR)/data + python scripts/compute_correlation.py $(OUTPUT_DIR)/data $(OUTPUT_DIR)/results + python scripts/generate_heatmap.py $(OUTPUT_DIR)/results $(OUTPUT_DIR)/figures + +clean: + rm -rf $(OUTPUT_DIR) +``` + +We can run the entire workflow by simply running `make all`. We could also take advantage of another feature of `make`: it only triggers the action if a file with the name of the action doesn't exist. Thus, if the command was `make results/output.txt`, then the action would only be triggered if the file does not exist. This is why we had to put the `.PHONY` command in the makefile above: it's telling `make` that those are not meant to be interpreted as file names, but rather as commands, so that they will be run even if files named "all" or "clean" exist. 
+ +For a very simple workflow `make` can be useful, but we will see below why this wouldn't be sufficient for a complex workflow. For those workflows we could either build our own more complex workflow management system, or we could use an existing software tool that is built to manage workflow execution, known as a *workflow engine*. Later in the chapter I will show an example of a purpose-built workflow management system, but for this first example we will now turn to a general-purpose workflow engine. + +### Using a workflow engine + +There is a wide variety of workflow engines available for data analysis workflows, most of which are centered around the concept of an "execution graph". This is a graph in the sense described by graph theory, which refers to a set of nodes that are connected by lines (known as "edges"). Workflow execution graphs are a particular kind of graph known as a *directed acyclic graph*, or *DAG* for short. Each node in the graph represents a single step in the workflow, and each edge represents the dependency relationships that exist between nodes. DAGs have two important features. First, the edges are directed, which means that they move in one direction that is represented graphically as an arrow. These represent the dependencies within the workflow. For example, in our workflow step 1 (obtaining the data) must occur before step 2 (filtering the data), so the graph would have an edge from step 1 with an arrow pointing at step 2. Second, the graph is *acyclic*, which means that it doesn't have any cycles, that is, it never circles back on itself. Cycles would be problematic, since they could result in workflows that executed in an infinite loop as the cycle repeated itself. + +Most workflow engines provide tools to visualize a workflow as a DAG. 
#simpleDAG-fig shows our example workflow visualized using the Snakemake tool that we will introduce below: + +```{figure} images/simple-DAG.png +:label: simpleDAG-fig +:align: center +:width: 300px + +The execution graph for the simple example analysis workflow visualized as a DAG. +``` + +The use of DAGs to represent workflows provides a number of important benefits: + +- The engine can identify independent pathways through the graph, which can then be executed in parallel +- If one node of the graph changes, the engine can identify which downstream nodes need to be rerun +- If a node fails, the engine can continue with executing the nodes that don't depend on the failed node either directly or indirectly + +There are a couple of additional benefits to using a workflow engine, which we will discuss in more detail in the context of a more complex workflow. The first is that they generally deal automatically with the storage of intermediate results (known as *checkpointing*), which can help speed up execution when nothing has changed. The second is that the workflow engine uses the execution graph to optimize the computation, only performing those operations that are actually needed. This is similar in spirit to the concept of *lazy execution* used by packages like Polars, in which the system optimizes computational efficiency by first analyzing the full computational graph. + +#### General-purpose versus domain-specific workflow engines + +With the growth of data science within industry and research, there has been an explosion of new workflow management systems that aim to solve particular problems; a list of these can be found at [awesome-workflow-engines](https://github.com/meirwah/awesome-workflow-engines). One important distinction between engines is the degree to which the workflow definition is built into the code, or whether it is defined in a *domain-specific language* (DSL). 
We will look at two examples below, one of which (Prefect) builds the workflow details in the code, and the other (Snakemake) uses a specialized syntax built on Python to define the workflow. + +It's also worth noting that there are a number of domain-specific workflow engines that are specialized for particular kinds of data and workflows. Examples include [Galaxy](https://galaxyproject.org/) which is specialized for bioinformatics and genomics, and [Nipype](https://nipype.readthedocs.io/en/latest/index.html) which is specialized for neuroimaging analysis workflows. If your research community uses one of these then it's worth exploring that engine as your first option, since it will probably be well supported within the community. However, a benefit of using a general-purpose engine is that they will often be better maintained and supported, and AI tools will likely have more examples to work from in generating workflows. + +### Using the Snakemake workflow engine + + + +- show how one can run snakemake with an output file name to reconstruct that file (using --force if it already exists) + +## Scaling to complex workflows + +We now turn to a more realistic and complex scientific data analysis workflow. For this example I will use an analysis of single-cell RNA-sequencing data to determine how gene expression in immune system cells changes with age. This analysis will utilize a [large openly available dataset](https://cellxgene.cziscience.com/collections/dde06e0f-ab3b-46be-96a2-a8082383c4a1) that includes data from 982 people comprising about 1.3 million immune system cells for about 35K transcripts. I chose this particular example for several reasons: - It is a realistic example of a workflow that a researcher might actually perform. - The data are large enough to call for a real workflow management scheme, but small enough to be processed on a single laptop (assuming it has decent memory). 
@@ -102,6 +191,8 @@ In this chapter we will focus primarily on complex workflows that have many stag - There is an established Python library ([scanpy](https://scanpy.readthedocs.io/en/stable/)) that implements the necessary workflow components. - It's an example outside of my own research domain, to help demonstrate the applicability of the book's ideas across a broader set of data types. +I will use this example to show how to move from a monolithic analysis script to a well-structured and usable workflow that meets most of the desired features described above. + ### Starting point: One huge notebook I developed the initial version of this workflow as many researchers would: by creating a Jupyter notebook that implements the entire workflow, which can be found [here](). Although I don't usually prefer to do code generation using a chatbot, I did most of the coding for this example using the Google Gemini 3 chatbot, for a couple of reasons. First, this model seemed particularly knowledgeable about this kind of analysis and the relevant packages. Second, I found it useful to read the commentary about why particular analysis steps were being selected. For debugging I used a mixture of the Gemini 3 chatbot and the VSCode Copilot agent, depending on the nature of the problem; for problems specific to the RNA-seq analysis tools I used Gemini, while for standard Python/Pandas issues I used Copilot. The total execution time for this notebook is about two hours on an M3 Max Macbook Pro. @@ -268,11 +359,8 @@ Combining these strategies of reducing data duplication, eliminating some interm The use of a modular architecture for our stateless workflow helps to separate the actual workflow components from the execution logic of the workflow. One important benefit of this is that it allows us to plug those modules into any other workflow system, and as long as the inputs are correct it should work. 
We will see that next when we create new versions of this workflow using two common workflow engines. -### Using a workflow engine - -There is a wide variety of workflow engines available for data analysis workflows, most of which are centered around the concept of an "execution graph." This is a graph in the sense described by graph theory, which refers to a set of nodes that are connected by lines (known as "edges"). Workflow execution graphs are a particular kind of graph known as a *directed acyclic graph*, or *DAG* for short. Each node in the graph represents a single step in the workflow, and each edge represents the dependency relationships that exist between nodes. DAGs have two important features. First, the edges are directed, which means that they move in one direction that is represented graphically as an arrow. These represent the dependencies within the workflow. For example, in our workflow step 1 (obtaining the data) must occur before step 2 (filtering the data), so the graph would have an edge from step 1 with an arrow pointing at step 2. Second, the graph is *acyclic*, which means that it doesn't have any cycles, that is, it never circles back on itself. Cycles would be problematic, since they could result in workflows that executed in an infinite loop as the cycle repeated itself. -Most workflow engines provide tools to visualize a workflow as a DAG. #DAG-fig shows our example workflow visualized using the Snakemake tool that we will introduce below: +### Using a workflow engine ```{figure} images/snakemake-DAG.png :label: DAG-fig @@ -282,22 +370,10 @@ Most workflow engines provide tools to visualize a workflow as a DAG. #DAG-fig s The execution graph for the RNA-seq analysis workflow visualized as a DAG. 
``` -The use of DAGs to represent workflows provides a number of important benefits: - -- The engine can identify independent pathways through the graph, which can then be executed in parallel -- If one node of the graph changes, the engine can identify which downstream nodes need to be rerun -- If a node fails, the engine can continue with executing the nodes that don't depend on the failed node either directly or indirectly - -There are a couple of additional benefits to using a workflow engine. The first is that they generally deal automatically with checkpointing and caching of intermediate results. The second is that the workflow engine uses the execution graph to optimize the computation, only performing those operations that are actually needed. This is similar in spirit to the concept of *lazy execution* used by packages like Polars, in which the system optimizes computational efficiency by first analyzing the full computational graph. - -#### General-purpose versus domain-specific workflow engines - -With the growth of data science within industry and research, there has been an explosion of new workflow management systems that aim to solve particular problems; a list of these can be found at [awesome-workflow-engines](https://github.com/meirwah/awesome-workflow-engines). One important distinction between engines is the degree to which the workflow definition is built into the code, or whether it is defined in a *domain-specific language* (DSL). We will look at two examples below, one of which (Prefect) builds the workflow details in the code, and the other (Snakemake) uses a specialized syntax built on Python to define the workflow. - -It's also worth noting that there are a number of domain-specific workflow engines that are specialized for particular kinds of data and workflows. 
Examples include [Galaxy](https://galaxyproject.org/) which is specialized for bioinformatics and genomics, and [Nipype](https://nipype.readthedocs.io/en/latest/index.html) which is specialized for neuroimaging analysis workflows. If your research community uses one of these then it's worth exploring that engine as your first option, since it will probably be well supported within the community. However, a benefit of using a general-purpose engine is that they will often be better maintained and supported, and AI tools will likely have more examples to work from in generating workflows. ### A language-specific workflow management example: Prefect +- First build a very simple workflow example using Prefect #### Configuration management @@ -307,6 +383,7 @@ The initial version of the Prefect workflow generated by Claude had the default ### A general-purpose workflow management example: Snakemake +- First build the simple example using snakemake #### Pipeline optimization @@ -324,7 +401,7 @@ I asked Claude to fix this, and it returned the following change: > 4. Set NUMBA_NUM_THREADS and OMP_NUM_THREADS environment variables in dimred.py > In contrast, Prefect tasks run in the main process with access to all CPUs by default, which is why it was faster. -This solves the problem but it's an odd choice: in particular, it will probably fail if there are fewer than 8 threads available on the system. Snakemake actually take a command line argument (`--cores`) to specify the number of cores to use, so I instead asked Claude to have Snakemake use the number of cores specified at the command line rather than an arbitrary number that might fail if the requested number of cores are not available. We will discuss optimization in much greater detail in a later chapter. +This solves the problem but it's a brittle solution: in particular, it will probably fail if there are fewer than 8 threads available on the system and it won't take advantage of more than 8 if they are available.
Snakemake actually takes a command-line argument (`--cores`) to specify the number of cores to use, so I instead asked Claude to have Snakemake use the number of cores specified at the command line rather than an arbitrary number that might not be optimal. We will discuss optimization in much greater detail in a later chapter, but whenever a pipeline takes much longer to run using a workflow manager than one would expect, it's likely that there is optimization to be done. @@ -365,10 +442,6 @@ https://workflowhub.eu/ ## Report generation - - - - ## Scaling workflows - maybe leave this to the HPC chapter? From 89a5ab2798aeed594c9db659d37344be4693a653 Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Wed, 24 Dec 2025 05:59:36 -0800 Subject: [PATCH 80/87] initial add --- book/images/simple-DAG.png | Bin 0 -> 92015 bytes 1 file changed, 0 insertions(+), 0 deletions(-) create mode 100644 book/images/simple-DAG.png diff --git a/book/images/simple-DAG.png b/book/images/simple-DAG.png new file mode 100644 index 0000000000000000000000000000000000000000..aa0abe0ef4f8934fbacc93ea7450fd29087430f1 GIT binary patch literal 92015
z)SLj#I@Lf7UNi;7%>^KOWrwIV@dK4t|2=;@Z0En{KcrzmS7Z{X!THWuR{zT&0uSZr zumgvii)$Xp6rk>bB2uqgb_WtPK;g!U)prE|oq*i|g*1m?AMxwK6G*hXy{)E!E;kxQ z;_HIIk5u;|sGT{0Gk`il9PD<5{D~hZt+FB^p0)WCS1QKSr(kT8MXPs`QDk?3#7okv zko}*}Rmt-;lK}FpW6o{i4kWID{AYf1wn9GF&cd(GQx|T{0JoTZvIDx-DBga13yJv* z`shZTKx59t=%@gwOMu17O%gGS?1uqGHW#3-gB~TmZO|e2S^ULK#6K0r-7MEJ3I$;r zL=@?{SJ3{_oxaPB0_Nu81vp;FQ^+&&4z_`T0l+>0AqQc#f-6Blc=z(#aha=|-r)kR znLh7phxGzMUA-G3yW!}65Th3M=@=Amv))2{E&nK%oIeHnbZ{3MMA+}-$*404=pOA%ON8l>P50tf$N$?zOJvct zk#$A;lY>erN;gI$EULDM_z4737xlJG8$XhC6uNO|Lsfg^P_NY8=YiCl&m+ftB*iT6C>C_aCaO?o0co;Stb6XrgaN+TrdVCZ6%( zD)VS_KYa{8HB=tSC6w9#=RLPt@6ecOT}sW z!z0+-)ULqF^UXWZGz)ROBvK69VX{-s7e*s$3xAjyVmOOUi%0@rJ^8sL046BQD!n~- zp@4b)t|`fNY=)7l3~X9~%kFE@PcQU=!HpTOrt$VXKH?r(9wBy|2AcjIMwhJ-@`ft~d*z&#FhW zM(_%UopP*cniBWYC(!XuEhEZ_r96C9b9x5GDqJqUt&vso=V;J~OhOrLK07i}9;LnT zC1@oUSCiV{7JV~A8>j*YP50|WkBK?ynYAXb|7p5M2OEFqYW(YZa%Bd%u)($i zm+0Iue#MDK{qpHu%Y*>^o1jWnA}05bIOXtC77n9_gzB^6h z0~(JZ6w4+PT&x{J%SVgxPBOi770U{ieM14E)BJytuIg|)*%#0yE3J?stxU63^7J(r zrIGlwK$h)*6klsDv4OM|=%@btiDQL-Qn)@}HM%=N&-m=tbKj;_=k6C~F_@8W2RUPfIzPk*Os?+xA5*x^&p0S=Vs;&R~F z=)1_AEiGo*_eV*S$WcKqX1Zc4>1o2g)y?GA4z^1=a;|X$;1ez7dvCHvz;`OS8Uyax z)Ql&NSz)Job4@QJ;~2VuePj)Z>L%rPW5-(px-F@g1>6Lko@-+E6Uz4aS$5L1>>PGs%y#T;ZqZ7w#Y;h1@ z@$gOidT^V)7%0=yG{m-PO!QgQJd3S#5=sUBXXg%Q?IYl17zql0biY%dS#~|wo6?dC zv>j`lJ1hO=Rt2`mnDOeV+q@zTkOdoDhAmZoeY62>*CM!Ls^($gO_x4!QnG2h`4nK9 zYYLpb$?4M+C1~+`nDpil;crg=R%w$mZL1kt1BjlEY;F(Sc*G zl-PNHG1FxLoBJV#Ru#S!931$dH9J{TC7!;SGdLOk3}sjdTG_K1M%z@Y;cWfXYYo`Y zryIG~x2g6?LZUX!Y2Z?%|CTc^IO%CZqwW}1M9Ekka+vL1WRH0rm!z^AY&@gXq82RkioTDYY2P z(Fqkj-&kX-o@gN_r9*Xyku3_JVehEcwovEGQ8pO2Ujww8(1p>fStASe>A;)dyg_BK zjn9PRwEsh!cwxMu0H+JgrD zamg3f|MocVzb+6j69(RUD#Eccr`A^d+e5?RK^%>#!u){f$cPt(MaqO^+YS6GwA<-^8_84bM)LRK<2uu+m)o*A*d zJx^aLtUq{Qa?W?IN&!|<_D=yvm2q`S47Kxk*uZnN_@LUaW3y)v!W0wZDpD+}TFw{; z4Dy!6*btH1ONo)TKVOlx|3p9s?3VONPIC3pLX2+hzoP7q*QN;{!1ea49;*B*wH~4` z?AFM@SVHdXMWT)YOzrnJygUn9<1T8&jy>4hs!_zukcc!i|3N;J7%YZ*dsj-R+-G)e 
zN7*(9n8TrCYe(PUW`ktrL5wNb`a0N;ctn=7mXKVbberAU4cApa3F-Mn1(=O8$4YT2 ztT^X{KgQI#F)iD#X;nzFpOQETmh$b8$6GIRRA6+Aph-L>pY^MOy<1AFaQhyJZJqV5 zZgXAN1^nVtoAP5DtW3V3c|n=ExW0p}b^6-e(|V1v--1p~lP?zk|p}mAkYW`q5|-yM^^OT90Lis$#B`mHGVI=b>rDPfm7_Q5aW|XvzhAg`%k?yKIT~ z$N`J4Pfh#w$M9dpDS5_B0NBIty>m zwM|~fjwa#HnR8)jdY|C+LZA5kXk-BM@oRYI(0{7yL6u`t)>E*HD!{}=-4?3N%>JY_ zQ!CZgaH%C()}T(o8o0jvoNVw(_@xopYpXwK7YD-0zx(0APn2+i{qQqUp{$b1`OAJy z1M4v{A+J@CRs%$1$cp@6*XB|rbTnM1&BAEi2>ecj5|~y9!^zu>JDbd1yPNt6kKI_8 zDZ&@?nZ-k@v_Hf};bqf7=GL|TEiB0<#F?31Ed(cb;1@H#Y;E4w7OmN&xQ4P83{$-z zoCm%Zs`L`?4{|(S{O`-Xn&p2T`a`e%wx-$`YSW%Kd;)hGRW|0Y&sk$WTKaa&jzvvK zP{w8jR|{cy4Ss{6_2@U)Jl(=lG)!e`&MYsNEEY@h;^8BRv^92jZ&xCBdW(xVB|6yf z$wIFSwwg(?eCPUf#}M9j@%jK{76+~Ezl%gxODvf90*S{@IDYl*lj zUy-GZEK$CuS`Lo8IKx?REh5=U_A#fMyPSZGCarm_Vji2gJ-5}wp|BMA7zfX)GZ2jb!t{C1fj=dy!Wtx)S zWG#F=8iF?d7=%)=U}jseR@+A`{~yZUJ08pTec--rAv>##B%AD&9YS{Y%FZTaW+W1_ z30Waz3%8XrD;XiHkQuVd9$C+E`TTyr=kV3x~}s&&*MDK<2;V@cppQM zvtL+rYFgA2We@%SAjVSir_=(NZ{OD z&!vKH*=T$>FurU?udxs8a&PQH9w_?(e3&; zntEt&a%;W^a)NW4BUP(2lR!|Ch0Kfkp6lA6P@7K+%|woJ875BuWJ`URvU6DYwpw_LLg8WJJ1Q*RGiaCO=omxL}qn20fcNOL+#!I@z%X3Eb-WM!0^1Sck zV3XH05+G7kQcuhlSmlT|w;oMU!nuYG%!aQWVn3zdWz;n~yc-%DS}aL9Jy5H1o^Sop z*aKVq-d?xYJ)VC96*tgBPEh!be)D=@&WamdE_vALI`$}9po2N@`tG7 z8!hX!spv1-+9XRbEAkZWdj4x7sjHOvc3NOCE*lOv z*bHSx-@0q3-rdywS|!GtJCM5LK;oT9z2TVud-}mx9N0?6zed0d-iua{2mR7gL9}SM z#&3eSHqgIp`|`amsX8#mtcVpIaj$FUr1|Mp^WGePi;^S7R)05s^FwuA@Z5Iz^y4ci zlvr2tFODDScXs|A+utU?EM9)EzNj&8dv!i5BTJy^0j%%}3ED{l+zD%6RE$f-inmYs(Aat*y=zXN;>6F| zpV3@ps9EL<0+rrZRh&YjC5DK9=x7u(lFJS|aO7tTTxPE^w;sI9n$Ie3b}t#gLL*C- z`q`a}o3%}oBVfagjU7%qTz0s;UUOt0P>RWYdEZ1Z?93?}*R?rvw1Y!iyZ!$a_WYRf>ndq`L( zUoUujeNAQ{q~6olrzgV_;v2!$2iqQOOOXt}GuQfg7%x4@pk|bDFdXt%96r|D3tEi-JD>|I$29d!TVh)zu=LkeoC%)~S|j zNK4~2f3xyD&ZLsXMcPA=`{(odEB>PJPkSLdZH=OZJG5$^-$o%;StI&?n;}Ei?cned z4(W2@!IR@H!u~NNu|z`uIm7icY&8F9zzFx?OCXVDfgU-3BAg=spZ2gM56xK`qr}he zt(-_n_wW0x`&_I~x)p^f{-G1|zA6}~q$u{=2NMnchw4Igakqtwnje#{O 
za{uptt$qUILl%mBYepJ;)FIXGIaEyY$W2IcGe?)U8%=r7eNWGPh&tff^M774)WA*`lBNGg8*!515H z&h~PcwIT5q%#8nY6K<+#Eiz;(hs5JA)HY}mGBLt$GjPvvUWRiy#*aj191IN!v;x2V zycJO(!2jJK^uJbMS}R~E5W|(EAb}EBh@V>O5+jNH8iPVSLbfR66_M$LePN-Ns6#AE zm9rhd~UC zIhOrDeoVW1FZH^8P4{dKL^?ChxZE!A3Uml6B&jN%B;+>I6zSiH@Viq`ZDDM%BSRg* zRfIO=mh`~``9ooy)S?)|p%V@^)B>MZAlmaj2+DuoM8qBylEJq})0&U;C}X5To=&mh zf3sTb|1P~Uvr_Vr9MNiDXSy8*VYKk!IWwq2Y3OYC^p94#>``cZcu*+Pqr~nDwMzMe zj_4>v0fe~CuDreuH^)aNf!Hhk(#9yhV`RoK_J8xC2_X9KULpFW3!Lp6*6@rVc%Ka+ zX;k`$Njm}0YCf~E=b#3Ltv&f7tLjFP7-^k9Sf?kqG=8?uMT>wa(ADuJYQudRu9OdI6tSDVhJN_EQJP%nko4vC(DIJ|`!>VDELa6LR;k^ zf(hvwAcb1&$*KivtasTd3Bu)hQzzU64>=XJAtnEz0n&U@Xudrj+W04=kL_*3Vd$D4 z&OO$R6)F=D$=Z?y<_3y^l9G}owWrUnX3$=UF2;0(g*;M?HR+3uPH}Q6F^nNLq6;$> zhlC6+PYw@vzFhT~rF{b(G=qj16-2`hR<$TczI|&C4&R3mtU$V2vF`XM*&l8b5|0-= z`~p6UV_|c{`?XdS6ezlVw&2?^O>(7%YzJLVoFA89{L(VNIAqPLO*viu;(WO`Gc)t~ zRSlFW3{LA=-|X-TtKt2pr0;OEVEvoPhbJlo@USkY^ejM%EcNLPGSlwY7D1 zhrNl};@VT}U>a!C#VF-!`TnWCQ5qJ89f?ioj_KaskHz#VJR%}Pjg_Cm7!xE6HPh3% zR&pI*o;iVqh2_~{6iOPVx~wd8r;3USa|=7XKC0CRu8BVMc>DHk8VhX*DBuv;)XYro zMNXiXISSDYn!}*W8Zz$>MFlU@wl0AT<9q283thjiSerUzFVw(+4?>ZuS^$%Bu36v4 zhJ_a!ODs(A4vLAKf4HA#W$y+gWDv6Pik=fq`Lt2nGIaACZ?ELh}-+l?i73g5IxfSDY z-mni+#o%p_z;u0uhF6JHICz4kl9-)+9kzb5LJZvwNr{P)Z%t4#h|y>aqm@mw`+CHY zKl~bWWK2~RU)uzFU|=9Vp44*x!puIR7NV1rDckNBwe}&Su4yBXKC?gkDv7EU!r&Mo z;cBdRR8+{ka~jg2KaZbRb;QA{ecON$wPApKZf0c#`kvA~;=GYE*PJz?p;yKbZRJfK z$s#PQ*Ghjr6vt}z_HVT6hr4YA`pa*kW~Pxx7d149<=S!Tvr~FBn#P-f3i7>Btjjr^ ztN}d%LIHomrbA{(B+D0gTRHFW^De&=N+k0_ndQQT3xEFNG4Y)*VIbzArjI|JzeFnb z7YA!AD;VCRBMf{sXlLu=ogJ@zpHNVcsEs>UuU;jOe0P61jh>*D{`^zul4sBGMST#( zjyg_MV1D2!dsjgLTS*Y30ogT(E!r8BD==%9j>zL6+Foz+O3-s?izxE!&$Y(oHwlUC z(Z;ziBtT*|ME?AOKZ!_6nzmY?FcD4Tt_o>*rXTL%-va(sYg~9(SVJ?BriVv$RT~QX zT*pzSs4i>?VExz4G%AU3l!I4NJ6Ef-vlE|zMD`V;@xQ97s61Gg5IUcGEsY|_V*LDM z`-`Xs#KIA~!<+g}MByBXd>~5gVH~MI&dB&T5?XK((RC4kYT4r@sMO_zV-gW_(AOKd za&mG4>1Ae`k#TXikMPge(E(zZpu)mvGtIX1zT8(%bT%;H`A;^Z@t9u0(1z^61#V+j zH8j4U?H<1}6L-}_Y&(=^z`)P 
zGfem6hwg}O!6uGT%Gc%e9It1{f+R6h?SUsmRrq{gF*jw3r5O&T>PGkn27f`Oq?Ej+*C&+siTo^yM z!b*-WTV{ryotTShzEJ$NN(nFZaYHH3mhY6^`am|E6 zuOn7A85tQpIgu=;5I*I$J!no0tfr4V_aU1GD3MM2|8cavK%OG~hr*Ac9bw_&v@xIbDVVTA z*y4rMeY}X6i0DUjY+W6z>2`Iar^fS&jLU>Z2Ib^J5`|y#T%~cu`JjF^!QnbVj6!Y# zXJ|o*uiu;G&KXws7b$fwGI}z~)csY9>Q{D?rxem9B-lwWIb6T46shO5{##a-Mn|VG z;SE}++&J&T3*3+0)_UY+M!$LDhyGRhN1uk~HV^3DwJG6oEG`!j_3=`G)dS9xl3%hc z?tW&YsPK}rqDu`3(-JX~XP=^mKT&=%-@ zM%6%99&^~#Y-;w$!R*6h0WU8x)BwM7=-O-5)%;~a<(qFZnr^DZ)6;8P$Z4z!b{o^} z37Lvb`Zl&ju~MK~qhE96vxa#y(tlIr(f0DHYk79PCaA& zO`$$kr<=nqD(daU4az#|!^_Tq{HKMb;EiESu1mHXJx&|(!t8hsGjTI2%z(XK>eQ{LZ_YE|PE2?+?Yx)Z99c7zTsDqM_9z=L<4$uqvQzCV&dBFdm70<-ka z-NK)jrHbMTHE-k1xy29pC*YROMtA3bhO=fZTxf-g1ooiRzTC$=hI?cgn?1gS( zwUPEM*NcCzwW0lTIH#o zHfxi=Tbv4vvdkU@Bd)cjjkhl`lKb9{VT~D@2rw=3nr?|Fcp?H+Z@tN%wm3TEVguyZr!a zy;Yf(^#d+G6Kl7Ze(+-QAKmOfy5$9RX8rVZLe7?Jc}yHs;y%Zj(` z{80otA_8K89)|tAB{Xrku;V=Ltradk9RKr=5a+4KQqG8GQZ0ATqbkY{t@jDp--3-# zRlK)VmU6xsjo5m8|3%hzZIG6bbjCjPv`4F|C`eO(`vRfMgDs=Mt>2&L7&Y$}1PF<= zL}b==3}A1X{7iQ%ttk^)P&=XV+u*T$|M{sGVIrp#JsdU>tNgB8|8Dm9@zZG5B@L@Q z8@agaYAc}}3GdYsitSx~1T64qnB~SNZ4F+vW(`-@xLfEa!H3%5i5dNLaxwT+`|Mp3 zU@monf_t$jzLF1mylcT<#Jw?R!%%4#NTfFW)B0;<(%#ArumEE^`2 z#e+H3g3RP8MnMl9drdM-tov6SB%9w_J&E*|EDcXijwNOgyteg+-7gvcG581sh8BPB zzN;T`b*Ys=i`-Z0`X1cL8=vH%iMN;)3OJ~s z!YQ4zzuoF=Sq%m<>M5scb2V4Lk&@+Yv2{5p>N&Xjzqh#`{-X^)j4$0TsQ1mRT!SJW zs?{LfVf6;u&%|l_3b7&s-SBe4!37T;P7RCJSNJR2QR+u@ds_zAE_gip9o9JLIp~Lp z`Xe^zbv&JtbMs16<|}1i5eM63A|KIOEDz(&}(@yT1p4n`{*{i;5={jT_3Bm{5|>RFwJ z*cKRyv!=`n8SzulSqH|Ey=_VfSG9 z%2=95#y9VfE*PCFm&Kr&7FOW&QI2BgQ`E&^k+u2C(K|yG(PmVCSQJvH$jaZ-3BRhs544X&pk0R?mGqP}irQSlCpF*)DO=(&t_VbT-7 zWbz)#Iw|~zeKzNkBp<;0jGo6>_p!6*uFvuQKq9tZKqq}op4}aMuWYQ4jjyZyfm(Y( zPtcExG^zy_KE6(^)#)vtr|~e36V&3Vd~xhQV%T?=O_BDkzysAZwQJ{(s5UhlY`Pq9 zI9BzZ%W9_V&DFec$~5m`m$Ql$lWYP3ojRdExiDoF;gAESR!ZYmOrj`f?;|IxecgS3 z5eFQqg9pk@jw2%jk+cR!Pe`NY?TK+ZXU> znrzaSE0t$T1}t0--W=b`o~lMgWR6~|8_QZu|1GA^P|duoFZ>6X@c0%|!w=$E)n<(1 
z_ljOlCQ_NEP|Kz?I)aFr7O=?iMw*OkPTcE_N1==_Bs?usHvLk=wC;D(_mWuScAmbp?YwXz^A~Rfb*Pb_#(%S0$mROWzJZqfM-o)!^cL`AVi#tgA+&BlP2KsE>%67R**}*lxzD^4H^_e3Lk4>tpa4zUaLyy7{nd`Ayp90rsz2`EPf5WQKpa zr?7H${EeGE4#fm3{hMlv4PD!|hiE_8s zUdyxh6Eax#(VKqhP?NUlD+Bi%*Y7>f zzo@&eZ(A`wy}@R%95xX7-j_NI8}*x=GtyssO-3bF1>1>R_}!{c^p(U^P)%u|n(CVb zO$+X}A9>yV97PL}|vBzI<*#p59vH1qi-BX?~$njLHyx1IOQa2Rr? zdFtg9SdtB@!Nn&(WNAB}f9|1CONn#YZ>;SmEE-d=*uj&46A)}0-(g_X^$tfc7a=sPr ze?NDZ8daARih0!6_HXhd=kLUs&CjuN_zr=F! za(w(s;dVf_myZ>&rCqn67KAlt%#nAp<1MY*G&+ck7O;XGnNd>Df@=9D>$=hKY z(HhM%Me<+0?A%jWs1H}Kc{%4XC!o7zX^BM`$i*2dVc|gOBW8E2p1<3$R>fNzxYOPo zWv=9%59`HC&Q7gUAO5|klzU5ckpzy&s!iuh=9q! zK-bCg@4vsfv8$z)Ee$Y<2S$6l+6Z($8GBW2=0#fN?BLVrk2kd>1U6Zp*4%yqBOYodOF6Xp+MGtkHz{O-esa26^w0oZa9CTBjAqwh_-2sAf^^}`Dr@-e zFV258vp4v!Z7LpJ-hXgOogj-Fc}yq`oe;Mg6|}Niy!6L7w}+%TzIgdNkBsOB+-gnY z)$p6J%YOC#Te5bR-e8$x>hqL~`eiCzUzBP4SAV2lRn}tJ`L+XHi9+qGs4Rbve^91K zHSS@YbHi0^>ik9z70cqdl#P^>4HA&k;b(jt$?r>Fu>O`bd)w7^5HI*|$nIre`GJb1 zXc9KBFjk1>OBxZC_JoG4Z&~ExG&jP4B+WYRVSewEu@E1-xM9ipvo=WbLvQw|a5#~T zp>>%W`qx>)eihDSC_wy5mR-C04FlbtQ`TYAql)w?EIbe6`pl7(x`DO3TEyQ9pCNIT zgNp?Uqj!>8jlGNw9CNWwcET^I4;(g0j^^>MRJ47wNSMb!kq$7eGmU=EC|6GA|iWv@6P2Q9hG77X4Uh?>Q`cZ_~0*rD^=CjYVL0!YvU)q3{2PL?FQbyKzTu@*=@l7%|43&w)2+^$T%OEo9ubj2BCa zc9?I^>BSEiiR4SRF+sm(KE;_j(x#VeFtkeq*Z^_$SmNKuF%ex%f zu1myjUnp1L@Ia4o8M|Nj_%=3_l0pHFR?5C+3Ap1@ zz>#om(}S|@^szF2aaRJYXR*mPGKJcYf3lwkzTlRqdq@oO{O1u8x;p>a&_n!4$WN;Q z;yFD3`Tz69QG=7s&x}ZJqcRJ^8&6M6Y^MZZBlCg`9UYyq6#tDIaU4G(Rk=a?N$Ni! 
zOQpMggmlND!CCjY!*N2)7UK5pdrz<5dC>)qb2TdPYqk$F0-MbHr8E}9_eY;W7#XV3oknYdATW%^ulq;+TY)Tc@_Sp&R;vEm94Z+e>$ za(hsy@3`E|X)bW?3a;eEliClrfkJ7F?)JRDcjNtz&Ui2YCOU3aVh-?TpN?ap8hZ2b z*oC@LDB|24s86!A1ZiU9b$9RHg?!dh;HSj1a15V1IXXf{o9jfKA-DWGRV)4f=Spzi zNI`CQ@7@y<>9mA|!IE~M$%%L_s!e!nX~p{{JnnB=F3f-gLFkm2|lx1)P@j3I39 z zJnZ(V7!sr}d6?W~$}1s~L{EcEzGmnpFF2h)<0 z?&|B0LIErQI03P%u_`;gXT4DBRHjrb!YhB!7k~-C)`du?>(8b}_g}S!+yK~wBtYCJ zS9y7PrKF_9#J-0sk8|F*(f{>pMP;Stoja=l2zGWZ+n&wU-Y^$xaI~H6uQ@fI-fMB= zj0^aEsR$?{fMRpQ4gz)wr1i&tIbHpyTxtG(3bG}mq6kLsXGsNk z{|>gXwf(VI-QLzw*vthtg)n?*N=KI;763(^tZHxcZ{vQYk;~c9 z=K?&bov5#juT1SzPb#%t7Z3^x49{9ka(sL|Ad;AX-dwm0q40Xo#3V<|hK1wX(7*6-|qe z-#9(`dpSnwv?X+3h~RSfSPjGK6nl_*Elo4T(#9g;-2EiLs?!~#sc+aZRF;-Hp(!c^ z+wW)lJ-vShsmYF)`ynUT*jBhaEF$9UX!^|cNa2VEvbldwPD*;O86L3_e0wy`cpYJn zLvr)(%1AWVFfjQCkOsC19uC0RK9r#E=W05B{GEpA>6aw7ZONZ@kVagt5xF+K3^D_h zc8lQ95jZ^oQo+-Ie-~8vMsibK@FzewHdF+c>XrZ<1r8@HCcv8`z^jZshMHH}P}82BqBQ?2c1Hk{==T(rY_V%7Nf<0u$WyAu*%0B+xRFzH}O|EaJ&)4})$(um+Oyi8hXo zD}Y~xcCW>~Mv62@JS-Wsx4AeE&4p>(-_tYiPpx0N8_AXp3W(GICy4j&v?ChP51<1^ zm%Yj~;b;fo(yH9jk3W^yPj&CqCck)r0Pu^p09n$>J;o3CH2s)2TGJHmF*(^<{92S~ zn=gsC8T>jO(c=sWUu??%nVc>KpB6zLKIqi-sDVDlc}TLZe=x5UmFIj02!}PGQl;{m ztwqp5{fvu{tPT|@%pPO-##Rb->z6)+72vX{>#q{v_eWjgn(|ws&(6Z%vXct5g1_g$I84~09+`M`7eH4@5*866kOm|A#uwnleEnMY*XkckUl$UfpI5s6dGw>&w$6-y^&+q<1CP8X zo*)`1Z~^Lq08pi-qT=gbKZBJ2F=z;|3vi#zj-#D2fDr`Q`galVA!FbU+1Tm_0Fd_* zt${QFA4Nl zLo-VPC`0;2MvZoTbau~2r``tu`5JH3M*<$VwdE1Y#Hx{b03;AipgzzpJvIU+gs?u1gocJ9Wix=+#fsnVEOG_~ZR&$Z;T@#@ zp{XO(SP^JZS}-(XOLq#;-8CmbIjZHnv=SiSusKA*iJL!ouw0Zwjtz&M^vet%0K@@0 z6AHCJnm+JlGaGLe1N{(K<32Q(IzYxM%-*FXEki++E5K=OYLbwVIRn}Rtm!~eN;iWM z0N|bNS3KO)&tTpKyCmNWSQkFZ_)irNBF8TzEXVbryH?y!d7p|U6qf@BrR z=z}OXLid2FU~Qe>X3+pWQ`0Z)bXi!Pv~zi==pN9HE5txy*qTtEAgcrr;h~@pSTU)o z`=Iy*$GCg#wz(>s=Qel*MO-HvfWgy78YE&*)a zg5K9UnC@j-n*CctL+dIy*d*@pOZlcW4qI)LN3j+=g1FIt=;7gEY=#MaYy;4MQb!m= z-n*8-C3Csg(?=3kxC~hGANrJJWZK5Y-Tz|Ka3h(XL65q>zYEoi&Z-g3n8XH$(2m}P zWHJVssDqzI@rOS6*A2d?-H! 
zbaof6hV65+K>SMQf*nz>tZ)XeSyL79{=d2C8)+3pi7cL;o)aH%Mq3gxWIiDV4Fk!= zi{(&DD9{H+#4h`T&tLOV3C#BI)@+e7dug+S-ap*vY+n|F@tK|>u9El&?5~L_^Gy)< z`j?W$BlqXNW?WkDl&u^S-2bgw@C~?xQM*`(9HeKxE-W06TYK2@(`&1rbM2~>wDi;m zJ=?YEmK<8SaxHe=dmnl!(Ndo+P8#<9PWLw(p^aV6j5dW#pvH+>cL{VUjCNfBFYKyf zW9{|^R&H&yD`8m28Rh0sjjykal=iAC0P_bHf@`Zh1n^)^!NsE5+TkF>J6r%$>FVim z5>|OW!vLlQh*G`3JMicL#$$WRk=i5bGTwRRlm@|tNL&Si%FhzrcoLSIHz(>gft-`H zs*woW2(Zw{>n7^-p2HAHZgx_*Fao5EVcX8c*mzvn>vP{p3(%-iQc~*H!KCEldGj8O znh$&9P8H$5PH+KGT%^f3;caP|yWSpb0u$kyU{mq>h?V@n-wU%VnwqUN%8AFI905Br zS@#=+`Tkc`huiy90+VO_fNA8H75KR7_mZR9E)0v`!5q*X35&>nbY z$B6(+Ft7Ij0Or0f_g!?gk>UxkXTTu%UI*+&^@@sNgfay+8R}f1BGU&bHlvm>v(7q} zqxap;cBoLj?2Zr+g1mw1QMdl$1=SHDMLD-=L;UVM9dMywiF2F~p&`=9?kF>V-;|fv zUUve_@1Icq!DST=eVhW_-|F;-W0(jX&OJB}H|_%P={l3!fZeaM#OnWr$E9CpfxCpr zjOhq~e3r(0kG@7fdxoI@45NXV)#eNc%oAY2IGfz&P4!tB{!(YAr#DhEd_wXX4vlbg zbHjc^N-8V(r+c9`RM4;~e>LLRIK?4Hy61UQ&%j{2nUtD(Z+c*$)wKW5w*tccVJ2id z7%(*>`lsbA$Swm8?Fw!f0#pUo*m6~z4e&=1et*lv*I{9dW0Y~7$OgvhMEC*raW-ki=}uV8 zEVxLPIAS6rE&s5*cK8STUt*u%i*ee_34A}70f`HCcwNrq$0-1u{l~uU24Cw@s*5$dpz^AiVmmWX=@T#dc9e?Xrj+*i3GjYQFu;`jE9qD){t~r)P~z z0B8CEqZW2I`WLoltz^{e_r zCE2cD4=+{D)#97Ty6NNHz-4FuCT50F*+fV^H8u4wI0=BM+pEmNTb7zcySM$wy)U2r z;1t&6UP&?6NwL$DZSK;U)w%ik^pE5f=I_THKWT(AfoT(9_ilYnix|BcAn&_aG&V(Z z<#g^T5pBEkz4!hpE!9jiN=G$KfCL4tT4DHh4&I)4A9vvDqc_@hZiGOq`*N=l>|7T{ z$~{B~TTipVHq0W<`AYQA&L0QIP1NkDrfxre%CY=a-P&2%-rt0T7cV|-dmX(VPWMdX zGN|C2D9evsN6474x}TS(c#c~ymp{X06I+V<-sro@ToVZRKHyGnj@7qR82~eKuWJIl z5f~lw?a$L^$Goq*Mqr=GmgzgWy8Z!Cgjaq9W)G}e6NR2$0t}}Z*2kr#e)ea3F$81o z0=WGuPZ`r-Wd_s4c>+rJlv5elc#H$BD27ocJJ`TANE82RkiY9EQYxxXBP#7xXVPN5 z4mgMW9{?i+du=OMUnr=6&RT#|k2W?oYT1&9(CXd4ZX5?xjE>3c6POV(0!}!V(`8E< zL~Av|_lc?Z%tpZ=GXVZ0^hefVP1AJQ_qZQ+wmC!NEw^z6GfhbMZnXKQRtgTV9}xVi z9gyDR=?3J{#KVH9-7foHmo!-6R%ynmKA&fr3ov- zj}JWVzHc;@mE%B;qZfZmLM4&-#@+Fy z+A`}nvxd;tj#KN}v-k;|n9xc{9>1kqb6!ZO~_#jD(u?t9H0Xx6nf}g04 zC(0DILoH{p9)Xo}g_zN!g-w6s1k8Vd3F6C=4=NsfztPOgiZWb)sf`v}lIKE+;e 
z{;O;eNm8u@%n|2Wxi6{>29cB5d64nSpU`x)x5Gv|5V(vWRBy9iWdd*yXnSfZDu-aL zmxBqi%${<|pM!Jei5OF#C{ijMZqp^_N8})J7d{U*LA&*;rgNLxr=uKtwkAnl>30y^O5H;eO z;yOHu!EtpAl4Xf3q*4!Wc_%wBuU#Fz`Ivjo3ERDU_x>(U4Y2!)pG$7N4SR|zSt0kH zN9ZFa*zQ{xCxh$F1)y|c*6;1tD*36YskiI<_AX{G)n+v#-SS-J&YhY*{lxV2LqLsD zU_XFLAB5Dr-#ItDzWI0TE40#N8mX6?-MFn22ksciwiACd`gs3v%J<0YDBBSq9Hc-0O}qmInIPcfH!>(;Hw z%70RB#A|2|A8s$Lv+X(cNb8b4T^9d@r6CyAtw2sS#NCD}dPaQV*z6>dN zD}N7Mo$!A2=oozEUq|xN`t0nogcz-YOXE5ctb1wWG8*hA>MdWSQ8TI}_Bw8X$DqET zT8+&`OKSkgVtxM<+zztwFN3c)sbgXm@PnV7>$!?{I|o$?bbfmD@Bod_At*>%@SPg8 zRq&1~%J&&~R`)#V#U-Vy>~KD8*{x3(cYImL4}}^wRW7$6b>v^(M+Z?HH~lrYHUv&B6NPu^86U7V(9v$hvrP~wuviaoRe z?){a$hYe@ekLF`Uo#TiYr8Acz;Y2fH8lR@;r8ZzHA(47dp46<#s}bS| z=ZgyhqFuDhlnIs|t7)DMVD=>fFze$JZmn{puwi=5ui~O?54#930O#lCoa4Zcym@c9 zo{^S)O(6@l4))#_xQ5eRTLC%wYhmHRNHe&dfNyCS{b;Tah=TD(+4o_7mqm~ALGy3g zWrz<%4!IHpAr7&KAI^)MKVg@$9R+~H-N`Fv&w-oBkM)R{UZPV_;5XDrv#9V#uz&w& z-Sq`onT%eqn+ER=Y>FmMfxG-sxKqAzOy*=~s89;^TV#I#j*p} z$6S%Lw>k>2tO!}t8L?^@1nr2%?*7+L29@T4Km|68hGH3Mc@Q^%U5-S}Ez#iT$YzS+ z7{=7s{MoW4F#*BK)cbdWZ!sD^xzR0DRaL>xBjIrATAZYn%=C>88v!X8f71x+9N!w? 
zi;Eaw%8^nkN`Z6D^sd?&SdB1+Ke)UD2>>{9wM+DQ_qBK}e*@QEpRw2BmhDj%g7#aF z99AR@v%09@{rQZP0$qMj|8USu#}FtrBj4aAQeCO}~v{rZwL@pUH%!e+2F7PQvO z4nFledJ>7hUdTqgjDQ+T^h%9wXTV8HBmGFsb4*i7=@TqrX2kEzkP?|h9#de0BN2r_ zD;TgW0b2r!4Ul8|>BrBN+9E8btnFjO2`I1=pe8Rw$&!ADl9JIZ2|LD;E4uO)D=2SJxfBE zpZ_gXL4vIa7!0*&KhRdNqp2##hLD{g1e(BF0p%W;_dze!>(AsNQaA#1-+D}@q+c~` zh#(#YmK4NKKLnjfcwRipjbH#Ca-3;UE?c(m9PFfPUBHtE88A@RaUY!Rg@uctvg+!fhj<@EScm z!fJ)WtmA*g(gbX~kuC1M50l<((wlY#I?KT7^*LIvpKvvl^7^gi@Ba^k1qLBA84A8+ zAy>i%P2h_qj5i?Xom%i7jH_E_q(Gz+&) zNLIlIuuOm1>j#Ofd%tSk6@L}XWhf`3W{pti;eRm7kqa)8052ho3-xAEY^}oW2JqzHekUA^tnu9G% zcA&;h%XaLeNM57HLn$(rqDy>FySlpKWKMm7BMgs*H-;)^Zbu%lj8CJTsK=U($|jHb zLE-O&g9t687&za{1qAw!{`bzB`Gb$ZT?T>Lub**eFlC&Qk*eNMwQ?1T z-vvJpR31X!9U2DC6{s38vbVs@%nTt*(Ia%M0-wjPE?6Ew5lm3g@n!Tu*>IOs85$_W z;Zz-?)MoI?^NMj&&jR>^s&cqv^?ZE~NMKn3Lme9r&r{DA{GQ+l_~VX4d^WR)R}H-e zW14|pR##Oue&@XoLkZIokBDXiDx%_FX)*-PuP2NrOi&+C0DDGJss&9Di2xYy0gzzF zr&s6PvrY6#U?U2ap;69Fn9wk?<4+miRo%BljIzBu#9O_@w*8ksuwj1U14=T;dhBJ< z3YP@=T;E>_w_$g3aG%$5VKMFjxV2!XxH4ABICiR%meZ_p91h^hgkF9}c;Xz?nnm&&lqrlom<8{kcAtWRf``Wz0I-44J-sqTG;#eC z6fT2kehi?%znENnPoSusdgC6{Lx&KYin6lBR67FMZd&D?*n_nl+;X#9cPKAiT7cR= zi;LY0TM(Jq0moO?d%sKP=5hy&DXfzHPy+)>`W_q{z_Aat3=tkl$;m#ZpNIUgx`Pb? 
zYY|lWx@QEbNtv11*+>Zi*f2nqx7NPzne}VqFE|BzbS3r`FTNAZYq7WVZ3(;>AgW*0 ziz-U67xgA{E6a~KxRJ5yx6SVNnes-4FWv~Mk-i3@Hqqw9zxB=uf5f?QZemt03Ocq*9A6L7XFQTy`(_c zfgEzn3SO#3(a*umvSs~L2IW#Ai1*zWB>^ZEwnHBc!qx)o3H8xTDoMz<?({Zp~S*sL$LxPAR z+IN+el_4-O_HYt<6;{vjI7KSO%U6U48xW74Xp_7R*CTiuuz#r$9oS`zW9-qn;+Bn|s$bqEWG^uY{Pj8am3yq2M%y-3D7 z7c9NdM`(IV4#G!QF_>Cs`Pij%TtzWDBX%M4kEkd022#O&jXDRI03gn;BYgc3Erx|+ zuXF>Bd0{ivW}vUeKeKgOl8h`AqS;_FHM!*UyFI@D6@39AMODSXjR%DvN_8m(kBykmdMJwGh|#^~!E$~;vG$61 zIRlb{iQlG-*1FQs+c5qrq#(gPyMI|+0|Nu~D0ZsEI?%M|f!PgU5IxLN>JDPV8WVm9 zWC8C%&*|PgC7Jc$A&-6S-(u%qCmvIj5dOsQ#HryELu$rl#q&dgI*Nq%pX8GpcxV>kaN(aO*AmM>Q~CJ8 zEvAGgn;w?cd=(CPQEH0FxzCvkwu=ISebPZ}8Vm#55pZDFtVAHR^v#y1mE zwXB;_S~!s@gr|fm@hVX)qj`BR6~^fQ@C&f60$1=5>K$Wh0f3a z`8+_ga!gX|VbP9p&_kYds|;mElo2CIBCcW=W|Nt$#wv)oidNWLsuNpwYJ|J9A*j@C z`aH8v3X&Dd;!^9A$4pifku$XJ!L$U1jGEeTS~jvCk|n7sYN7=H=?J=9I=_sqs4*jv z5)Ro@ncjaWB=`TY_1^JV_Tk_7DXT>_Q^DjIy&gWp5&Tr;sE&TV{6l9u?Vy zB72h^+4Fat`rgm?_q<-u_0N4*=XGA6b?o=?{>(4^Jw@KCclOUof0ND?e}uQq;_8FR zm#5{LJ*`1~_i@zUea=((uUFV`hv#^NA5C=@rYk!kdxDBi1l@td_sm%NKmgQwf7;mE zQ+<0rLGWj-*#AIoX{_*CA3I3-S3AM-0E&HnL0UbiD$C$S`ts#N9A50K-&s$D3;9!p ztv;}!SazD^nmO(M=Jd3Zg0ccYpHj?}Sf7hFQ;ZDF71!t-$m6Cz#Y9uHUumn1dP(t- zVDq-I0FF6sk&tyVe&*I5x=S>~m#yxl8`S%!_*eN$q+H^ovUw5Tuk*t9mx)Gh>F;Mf zC>a*}pputQj_Es|WHQT%yEjd5J^B=qJfu`FiGpqYp5?B8OzZ{V_>T*u|8UGY0t4+% z_DO*7L~PAU$cthN3Q5IVUxYl8k*D(r5G06Ko2~eki3Po# ztBWbG1J(SBNTFpgqLe!TC9dAB5~esVvK^`AWT?miFm-&-=Tr^q%yX_bUv<&dw$=Nv zu&**r(x1|m&AyJe&1ZoLP3lgeShsJN;96XgC9GcT4m^DkH<#5c`P_}`5A#WJuNXXN zZhgdO(UU}Q@FoJV=?F5iX)#)$Eq?g{Dq2hc>lFZO~awybX&AOAVG(58@_TX?gb; zrk;S}=hd^xFU5Ch8h*%1vM6K#I5}^Vn1*Nmlcd2gnt(@71e@@`ImzP3b*R8l-Ws#y z15&(u_rt#1YyEl(`WxIYEFQ0lyogu5a@XH3-@!e3aw<5=^){>lLcOJRG>F4;F!2jI zF_GM<>3FI|s@68*epg&(Q{Hr(w@rzfzwIe|=dvBoSCkdt1)1+J!$Zf1Ndr9!;7fs^Kvy^1LvNO)^ zLw_bNnYD5Er;t_|c{F8eX9hwO%v1TJ)p~qvXk)nl%xOj@bb8&m#S9y5d?5M}n6o(^ zDM^thap^0}RPR~J!cgnnl6Lxz*oCMt)1qM?x|W#7JGPOD14+W)Kj}5vBFIo?m*S4c z){b`i)ei&5^+&^^~#P~ z5$#k7Wo~79Ww#P@!DjjAU*Yi5*&GTgZTd1CHP(mO@zh@{w+ll#>)i~> 
zeGKrR+4(Ft839RLj{LJo9BCD=5UT6O0>|@XAKh*kg=(h@4;;!na?Q^JO=gj?#oq&? zwKy{_wU^Kss{BY9I7y z+mIs+laF+ZxCUglk-b>8byHldE11YwgOrvqt6Tl+gMf%joMlNFN$;=kLAi!%J>-3l zW}t7*rgNW42N^a(IN_k$7nffDQ)^S31jk&-8gG>r8ko$J9VcWz-?fY6(TW>qbxHj^ zL}Lt=B=0*`ZQ;|eW=CfEcu$HxRrYw1anVuxLR%R0m89EEJ2Pzl(r_}iuu60Wd+B}G z>;v1{$ZqPdM!v82-^ I=+Enl*1gM#ltpI^?m#JO&1-L+4zj}NJAfwnuk-eov<3y zqjx87w8-fQh;GyB!Ml}kaMQ=Ay6*B!94v;rQT&R23?0Y9#_3si?rsxHNX5zjmO2e7vDn%$HiF6 zJ0;Qg5_2LL0%S@2L;KGlSj1dbI$C8r5&7>kyT(%`Hx-%#Nfo^xs%CVAc)ZK}LOU9IgS8`KXwh~@*rD1A7Igs)a z3-N(wh$M5)`n>&DCV9*1E+b(T>FVk6;J0?;MP^4c%BV`uXp@-ww&b=RRW0%w-^W4z z*bu0G&7N8@*-RIBUX}Hftl21lR1t|Kv-hHw)l0-v3s%!(^Q`=)cti6cLiqls?~p=a zqF3ZHjoF`##!POaE-KDX*UiXVBbuS-sZM9VI;L#n$JMLxb-X+eni7D6*n5qc`)7@l zf&9yu0?b<)SC18YDMb~#+=N64CE9j1TXWzSoEODfXsP6#Q;PBa%3*p-{?4rmX%p_~ zOPI<^4f~iV?Mj79YXWV5j~wkM$;ZUcp7VXE;3pJ%i|^4T_S243-><>Zca^Vk_K+st z3*}s4VN$wLX`_aLaYrjomt9w7=*xf!JLBw*#Hf?SE;00%dLxfbtLo;yxgD|?NIzPD zM#%$4U#2O0#u|+0HYRr^ zC2&HB*_ieG<5iX2DDZexu4PoLonbudOBe9woi#|j&!h=-qpiFyTr7V5LOetB*!&Xb zN?RJ!m9!gmQuecP^aX0~S$1=pScpi!u`hmDy)LJWOcapk66h%(A3$5?u{o#Wc6SAR zs;>G!CP*b&Oo z%sq~$L|D+C=*hlNrtT8J>m$lS(EqAjYK)azXTCs!XlD2OIs26PY)U z(7>^|vWGt-(l&bhnsH5W11?97i^vUQbS+Ac*A@ zrwU4HEdE6iknz%b@8j}|@J_~rtoRI>YDRM&25St&AKadpJHk}f z0v*MNtT!~Nj+Cdk!25-Ec;OZ^ewkYF`h3fA><2Mik+W$I&(sTszwC>(z7U`UKelAR z<-uToY)#)jkqL7+mYrH-Lv<>S^!L&EJEp#4y45TGi)&22HZRXeTznp47TR}UzMs-f ziVS7nieAed-ZS|sbbjs4&YR<*{ z{MPM~=I_aJii%T%6JRDUVQ$qgkGi;P{Y=~z7kM&UdW4*` z=9LVA^pfwrndxkl84=bGKJ;CXHhI6zZm)e3HNED2ls$3Y49$O;A1N$OuVf`XTE}Ljr6k2QNeV2Z z1;qv*rqlKsAOF1+-d?i7n8DmDq_>N`YNRgAX{CItidC4}Yst?@zf}H5+fP}hbofQ1 z*k!!uj(Ec-h={Yyy?NZ_z9Yxf&-X@3Z&5jjPSY!MHc37!^W@D5B(5l#5XMy)G+59f7Os3At72? 
zraEZn;yAL~4^a)F?=3|FR5(TZCe*3Fc!)!Q1s5hA4*Wp;Y2jLhN8H*_A3(Dq4GZ@=t$ z!L_m$bD%v&vhp66X6l)C?A7$Ios?LCj7nLx+_n22F{PDK+e9JqT=_8r#(ExB+dsEP zq@rm!wPWx^Z|8AKR1wlVRp$J;XI8yqtXs)srj%w8Xm2?5>(;!kvq#xd9J2mFLC0BRe>U1!J1Rr}FAvi@GlM|2KZ&|m_|2{gJfXVs z><`76fO|&U5jnErQU%(Ceq(fh+hcC%lH$D9lZvU~sWi2a+S~rIafYNij3I?-{>QoY z!0@Q)m43&}fvXG8Qp6Lv)-d}9=v>0EL_ghR?VcC7o2ehlM(2koD{kM`4||^z##6|h z9MMuzX#R!I>kH@i!dpW&NmNc9g>syPb|hsHImWL^f)p(?9ajskPS*%d2TEkH_840I znE73jUKJ5$Sb49P&!EMzgrdpM86Mp{RS4!Me%L}apm~4qX8#YBt=e@?r|GwUv zw@>6^>9~}OclL564TmzBswuO1_)=Mhb^GHvv8og-@r+Og5G_ zOF2Q~Lo1U)X3{}9KI9jPq&kI9;Rc0He2esR5_?5u<~>dl_!Ust1KHC4JTu+ zZKDr~i=2CaZS#xR?s(%;vofbjk#=0lck@{$%}O4XP|c6*tZLEG=56o~X*rlc(bsR~ z;82p2!}5Kwx3@Pto0(||16h!jmoG0ZbrEPE*N+bi>;3T~O>upJTI6?O1U8ZzG`zC9 zv~=0*7B@Hdty?rId%czAeTUY^gHJesqlK}NQ=3T4Z_?+Xr7FD^o< zboT2tZ)YgnL7C&a&=v-WAe3Jcd~WJoYf>pCD$KJ&66ej;$KI-texe;_pL8yywo;bG8tgb-Q-CQ2033VQ6U ztk`@=MRwnrVbE{`eHavGXPy-%<-q;(FjVxtlU&fkHaf~SmQhhrQB|cG>x$5^!hro` zjEys#bGLxHo{|z6L))?uni(8iTsJqjrs0UlNUO0L4fI(~XjbtijEjwh%T3Al7MR0Z zq5ZR{xHvr*54k@JWyuE*9!yyQ%v0g<Gxr*N#EWGTCAj z3L=2*7mE!8LOdXs-o?Loac6TgfLnn%5|GnbSy|{DEvB_Dh`2}c$H=I{#MRYrYN#MI z=u&^l1hyAgds+f*_UO<0pEUDe7Z8idb?2es;TJe?Z>r`QHe?|bR__foLPBy~oE*4} zUwqat9;AZ-t2Rxqmz0&s$;t+ngtpH@O$m_hVJbkI(;FtR^C4#SX#T(wObG(TIXb_f0YEXo0&fEf|RQe*d1ZkVL|6$8`cKZOTVA z=O@qIyO&Kd#7JOW#Jm-=EG;dGlPd4>@j)?*31oq2@bIukx4`LM0vF9$SvYcXnFcD3$m0Ou~IM2al5^vt%ug7@m7NB&z|CB*va3t1HQ4uHn=#~2!*xtXxA|q9_ zv{*RDyO*}gN=rd?N^6jK^L`%~C=V0S($HW6;U^?Q<-(NMK~NFu3cr56kBMP2QbC@o z$;)^D`SZy+4zPZ`y^3aw7~Tp`U};hwLj%+E=cNxZLz??QaD;`?HGCQ|u(Yzg49f_e z3u4LGgh4H!Dbn%IpNBuw^){?HQsCnh53-_h z0MK5b=tFVXquW6c@>J0UeE^5w&Fx@!x8fm%UNRdiYXtf{8~S`H>^F2ch2qF%_UYgH zWyym+f7#vXrp&4Mw;IA?V`F2Z;(xuF{4qj^>FVZ&i;F8xZps}Cx4WjL(V^ey9I#vihW@eYhXWPk4`}+Fm(J{O}fHy~hb_Eem?YY^@0Jk=0_AwFT zwPv3oM8~9Cpr@`*XqF1iclUIpq+Ww0iBzEa@f&=pzrTNB!7P*qAO~=l_A|b4J8!Wk zry5|F=r>SRXJs+*@o9k05IN&Mg5iDTMtJMu(h?UZr`S+OM+YcUR{<*YheTC#N`cu~ z;5&2(-#HcUMrzoBMyh{@CMt@d_nRT-a|ef8S8)T}jEgJN)0?czgGzw-+;Rt-Mi(6q 
z0}MJrE;=@rx!2g-JiEA9sVEG2XafMo-@fyp2%*y&_?*$PNTF58+a3R_2%zf@&I+oq z!(tK=1fMTP46l%yg-1m}qxeU~EOfKzP=(!iD^e=GZ=rGiG90n3&CQ3ZM2Hzo1dE^q zO5q?-SenRsFuJQQj@5#{H*S%TkWlbgzRRXXCXKkshlhuo27KB(IxJ01^|d$-tZge1Pdp+6I&iLT9IoOC?Az6c$@slUW*|3jO2f&%zI6-eL{%_mBWw_-nwwy1$wP zWVZjSS74Jwcx7Z{U<6=@0MY>?IOj|4h!8R=Dk?hMO8~}UGkbut_VheP&!T2(YAW~% zo}QjoR={r_4re_8HU5nu(DMH6o5cF?JoK$GFfgDm{s{e;4$q$lmNPLkgMEI;Z3r`C zfh%D}=dUF~Q|+^7im=4F;}aA5PEO^{aUfoci{_{-|Ne}MgFh%qu=Qg7Vv((eFgECg zRaC{+w}RU9{=vbiQvt1oQc4#QF_LevgBOfUOoFdI(^{z^fQGrMyMD=(g2`Q(x7ME-0!Gj`x&C_P!9O=btGlT<%iIkZ5 zA=rE|Hn5u__kXXfP+q%6Y?j<(3WN(lxC=|&Q2Br9h^56v_3~?o`xWrC{1gEHqOHx^ z+hAP|TF(FLbr9t!7#Ge9QGxHw#IK=SA6%AQbY;$^L5H`4gK`>}Nh@=6#hjlonuca4 z`-72@_=gnfii4nm6BF6NP!k@ZFUKkZEcfA|dngWow88$dM@z%vob{cs<<->G^u^Ik zkdsWt_NE7Q`%VN5zN~EUZmxLopgY|C4R^k1sEGL5^In6IxbN8kL+;2(N0}C4%Z|?T z%ereVc*dY-XstLoImsy~NKUg}baGKB>@3#_`nAUc5$PhF~jZUgQZ8E{%<<}toNr}5&JYxO95 zuf%_zf(RD6rFL+8N3E_6Pn`G=Kv-c{fLr^bWH@3e~h9_!7FH z9W3W3b8p6{PqqLChIMSL1qY+QUv(Y(8*nzme1HudRfcb#uHHuIxKA|$NPyVToBwnJ z;W#v~`mVl@1TW#o53pO_k*Iq^^J`@r-X7P<(3??GQj(o5RTSo3!7V6A>9AD_+5|RK zl@iyR3sqh|A3#ro8Srl571hG3q*6_&D*z* zsIa$ocNb)3Jqfl$)YL9V8sxx8-jwh zA3lh>ZyUhn+cPt1FoHr0TiZ|7)mk;OfL#Wg&b@_*99{<|>!=O3YGxt}*bPZZ@Qxyi zm}%zlFI?E(-hN%q#KJOM<5G-1t33s**bN(KAL8(^xTh=-PeX_A^0FmU&Y7o8fQx}Y zH4a<;`t%_sq7qxPz#8WpBlBY!9fjBd{4#Y<>kMNXsM`(!!~T8UKour zkamETNb><#+uU4AS~?t-A=c$zpt1Dve*+DgI?yAC=>IFy5MvzAo10?!*lJh=OZ?@_ zeK=CyyWd6(fXNR)x1jt9%_g2z;^VydX&#wPOGEA=kO=ck_9-hXivksLT@0v)u+$)K z)753~ZCG$1$H#$#q47vWdEHq6L#M+jaDDdn|}KL%hgEn8D6or zvPy`Fx%7X#8bnwST6sri1A>Z~naE7Pe3$^|#n(yDn>17I_G?Z58*Yg3qT89H0RBT# z5=So>r_cJgcoVvwJ2sA12Al&qDH!PP>DwAdUh&}SnzSsdLvwS+!4k#!_c0Kb2k2!R zI)Fd@34u;;!+(Kjf*WYEK#m5?+%~{qnCzS^Zi!5&b4~vA2MT9qcJ!MAA9|Ee7%!CGZ

R&rL-n3kXEn9YUJDV-` z1%BFxv=7LTrdXbX6oL_3Pp<(VFg7+8z^<<6VRkqWmk^!-_bjt7u)F*kIr-w++RjPO zg2Uq0KCI_Di(Bm&kh*_Y4s0x2rpX%zLWblY(0%$M-!Up7ArFmM#n0mX_abs|lE4|8 zDl97~=q_IlKXrRGCTokp`LP&YB>i*wYTA~TBW5=0WBKqfGG^A*>9WrofQd$g-7%2s zz*j>-LD4j?3!-`fA0L!-uDgXobo(7nPNLUmLMy<3QSI$8OKk&DQjeNmA0^%YM zs^#p|JO@G(0f8s_1`~aNTtO4`6P0L6Dyj=r*9i&X_y8nO`-wqn&+nj8(4>H1BR%m~ zQ&3O<@dr>I(@PN@Q<%}`uF=~a8#{5J;zMBUv0-i#sYm6 z75b_xv$M025fOfi<_zIxFCqOdbe%eF6d>Pl+)WG4A?g>wv!r?RM#8|fd~z~h6F|U+ zhj6>Z$9%XN20~JzFKebSa8Wqle(jBoVh|)XPoG=4vjXRaBY@Wt$UX>l^OBR3;W7?C zGAzUo3q2;Ju!~^quaDpETH3mB@uL5BY=AVtr$6W@gM?VZ{UiVTtShW9XwJ#%iuPH& zzNSle@AE%gvIh(BLb%QI|rM^sg>6&YWO;69QWS0e81lOwfP4YkC)nFw(w-47VLrM&lZu2gP zXE$}Yl!NRa-Fmn*?{E*yJy@VzNu;3?72fA&78XH2RvBO;EO{i;EW{X7dwB2OWi$%; zFVb0Q9gQnb;;=og)mbCcOEq(e1#J1>b@GDc<~x$XSwp zAd(joaYR35t%mo2bKqlGUaKKzT&HDL7%${Nc(>D3vXsGRfn)@8B=ZazF5-t~ZoSWP zkq5&}t`7##Lo&Lo%<*VIROSi=#S$cBCb2cn;2}>hp&LqaKg`mIa}ZF&+~l$!v+w`d zaIKXLp(-7;yj}^xgeGu?KrAjTf+5m@I)N!IK>ntuH|`pB$i&758DP98j{@XCwn1K( z>H*0t2lIkHjC~GlB;YKnNET%P-uzFU5_GHFB%V+5s#9$By|iw5^rD0n0~@h>0roaP z0uCDJ0eyXag35HzX>bg#sa%&rV_7Gsf|W|<$ZSY_r=|JPCnGdM=xOO`U)qwBkpXWd zk9p9XI~H)yb_Rwu8_ojg_=i_=Jf6b-nwc@7gO5KQe1c6!S$)0N>}~K)Gm?{Oa#gM? 
zzVi2Xb9GhSiU8dVhbGGVjCTd8qgOu3Uj^&=eb6rxM(7^^X_e-|SHSyB>#PG=Q$r3#x@952G>!i6gBd=V>H0>}eIs9YZzvEG69+ot+1g`32X@&Kz~ zse}*d0(8|>tjmY*Qt&Ahjn;X%LHeXo<+}U7k+4=&qCNK){mo71#LkJ1bgS|WHa1U~5{{H@gR3U4jrzzL$7S)$q_SK{Q z7pb=|HnkfTfgOQqLDNnPDGn0Ai=K`#R=F~~ms0@DS7VM0Wyf3U9UmVb7-0N_MT|^b zL&xmC!8i;I0#M2juez3-&^~+b*-U^oI|tNK1-iD z>^8G}n3HM3-%KiS_)3&DyxlMM%M6P-XZP^?Y{gs8&LfXsqfZ70y3Dgs78f{mbq!vL z6xV(vuc%!`KkZR^bbXN7MYz`of4q^RW>kO3)ae%x!i$*1Vja9_@58&v3#b=7#yFG3&o1ji<`_ zT6|n!&>K#CVk>(5QpN0g79R2m)shX&SJ?6y$H;~G^fkWPpYKzlun%W`#%-h#yJ7j< zwMa45kWEAE&FE}`mMwt{KTgI9X5V(PQduG2Jb8X`p1^D$8OcoArzJ=^p+4s)MX#0k zqpitSY=4tk`NvF#rQ)r_ZHnblnZ#+u#;hqryYYW#yk{GeD-9ezFX%SmrWuwV&i(z% zJ{)$bZU`=PLewNQH3*KyARok4@y&j#Q7vpNAGZ*DMV@rnPr!~L( zrG16V->s9SX3z1#+qw2N7GtS1$iqO`EJFLI4I97oB-vQr+pu2t`^(ZNRCXity`UVS zo`f-Fy|A&s2!DN8w+xf{9vM5!kY*y$b`cObpV~*-;~~C--4E-L_zHKT)0c;1s;^64 zK2Gs&`OP(yz+PO`yyFBdpNp%%?tuzG1Pxxoh9+hd@CndkaS^aXNyDq=3M9EO}eIe z4mL~X=?qc2k5QuS1_dFV!y{9j+x^Ce1wnWLr3j%AdOF6P7i+VG%doC~$m@{Z-#B=^ z(W^c}a?WaK%5^s~boE@EbMaDa@y<#|WCk7bPz3$TiF2XVoZ6a9KzU^f!*ta7zmd-* z5|Pgf+sE6#XM*y$raG;MBAtTSrJ>{q_j$v}PG*DiE0gMOGi+#yvR}5YyyazMmEmv) zqi|w&J4UaL>-t8h6vNr1>a;&$=2426-rvI|!9f>M=DlNiH2A?TR3s;&eYK`0K;(S< zQd?CAhpsLir^{!Sv%>t`WA`j0i1fe?vXNvA#bo0RAP|kh&Mu#;vCX;9egMl>PE04QK}48Z zMoj0m)kYLatDn1)_~~f;Yq(+**1yE{xs}nez35>~_Ce*9Kh(X- z^bT8>>@dA7=4|N7X&mmJp=-Yde@H3c-0U)5o!G80h+9QpN zgwc3?DztwoGdjA+@EHIxaBAVb{kUxGPqOYzFfLflGF7u^kl~#T9*s2Pzng}0@8Anl z-^iIDag2s4I6E|x0t1}26Y8GdQdQaNw`i?XvF|VRkOBC>P(xsSX;#d&Wc2Vzb!V8?D z6;CAPl8bE|J2lUYz)h5AQEPB+rgN3A+F~!H>(0y8D#g6UEhI2Ci#5H@d9VEa=&T3- zALAYbu|t=ZrogcS=ic+LZ@qWm8o|I2T&ir3EGy$ZEdVTjMyLJQ;SD=}>~eczHj7j~ zn^aT<6-|#N)5HApeb8&nMclB$wb0iZYPxAzH`H6uhv{1UtOR>7{~i3gjJAbC)aylUKuU; zd@#x(Jsvv*gX!jMk#sq0y+MCX=P1|bO#?*z4*GL`QOJuDUmEQiKAkoxZ>)00Km4nW zA`LrCQ0G4VcafyX^SvrfSIq;1Fo-lBWmMhLDP7JIxqB8IYIJJHo$5N6eeDI#HMQ}; zexG$GI+OKa*Sklk`qQBxz-mQcdk9bV-?ZaGELeuUYe_)kUKYuAq7$Y@xc`O+-s-I` z2}geSuIV$@g9=6VVa1lR67L`8L1CC|9X!0Xht(tqA$r`W^t`<7hI&~`r9QuI_Wq%a 
zr(rj@u~WE%TOkW~p#OU!InE4Zfk^F4LnduQHOKe=^WAtgLJ8xjq|S#K*tF>aCVNif zEpNWE*fAl5)93)_Nv}~wo9ta|rPo-V!W5G*^R8@`D<7Eq^H3WfY2=2LXynVeG0{(B z_6U!O1!E$be0S6YT;T?DbPc*RmNA5>Ge7P@)$~ z0Q@}l3?s<9s`|j$vg?||-`(SV)MSG}!x@I@l?slsx5{K6{1%^+H+Bk^sKOs zNOE0d8ehjI(;;C&mkU!+$Wzm{6MxSR zg8208&VQpuW3d*vdbc@*nwpX_AucW_KOb0Gp&WO8%?KhyTU%R*uOLK&falJgJ2N@~ z5F$c}1zrc`SqtYXK$t+(wzL$NkiZ5$FGL*>`MQKIEd!-76pSVjwdf5GJ|xony$+%uq4ed>BKir z4+4jgjpyGu;Bl~DSCof;hX4Bu{iQ(PP=BmO9;S$aJZ^;e$3n8ZYy3}fm(huR6~qra zKmxjTlJByZmDy!AvF=>A|FeLZ7%x;$5h5{^dTqDKYa=`f>mKCh`T-_3GIH(rKEyMj zl2qsVPB4O+oOt|vAiH6?E z{(w07IPer+Uuf!Ha6s8Lo>l%Dl|e%=J-wvelYB@oS&dey%+NxRU4J;sK_+ZeTEMNG z{>ra!<=o~s0ar&|Uuc7{wiN~+^$g_Efd%<6mHOy3nA;4FWRt4|oR-&)lh;mVw}-V= zE!2s190!d6qvJzfF}hm4TD1h!3KrFMXUm6cfp!AOC1@1@)B*G#JBNS#g^A5<`(*D@ z3OIj136NochDK9sYj##v?Q0!yFoG|7?S7U1rlw|kRO>!XF%F#Mi~t~73xP=SUV=3c zH!?@Ijn4rG1JH&;d$wsZClDXk0MM=pbHSGe_|4glQ(FzIW}~#~Hq$xsj#XY2DI+2! zXy{Hm0dz83u!R)kh1%>VT~2lO{~F-|01@803EWS>JCkty?PUJsFiH4mLEhr`Y(|_l z)cY=DO}=KycitW}0lptWF0Q{oxaP4_J+8uhvb0J8Q#(1@DLBAp+v+|!>;~?TKPPr; zW3~0WO&pq9T2MNJ26|iIo;X+?+w4vj8UT8XLx17J&rb1iaR)%Z_r7zkuC5Ns2xud> zjEn$q*FYY%4hMKH7eT1_$Jc?s*X17B6t z{3ZV;NtYlg7ATJd{>*`;t<~}R@p^`Lnt6$k@oWk@=74M%MEu8RR)O>u zjzwB0u-#T0Qs&Y>2N8>`YlhSp#Qo7LE%lEIXwE{5|3O%B(#hfck;%cqL3Hhxdu}MB zk$Z><5Z=kju^((UY2B-Zix=j3DuB}Rh(Ks7OJ0LD%6@nRcs`DKAejYaC^*-2>90=9 z`pQuAMjoz!OhNi&gA$b_uah0Gddp(?`r3KDUEj5e`uY>C@%?`C{VJe%0$PB4d+PeV zxi}y@Uf_4+y~ZdPul}_O(`C7^88D0<%}%4w2X^XqYVFr1Yv9ZuuZ$nd&d@??@GqtN zs(lg*JG*XKCyAOiD9}lMA<&rQQaUCapu=}4EywEwzvH!{Oa>>rBQ6sF!3q)D|G4H1 zgcb4v>wc;^f;+V12V*;j#cyF*$mFARKWs(gApvq|aF8}PKYw)D8dRKB`6xs0?(q0= znOE2+uh+|M>1k>1f8)mS9URw(9z*FMr`DMU>iaP5bxMoUp{5t_q(?x=p!Tb~{o~Ic zTc}b)7RJaZ-Ait{=+Mj_2vthc^NYyj;+go=iu80nEP|Ytx#oN+S+Xc8JG-yqlc9?XEn{Ff>GgbRwS<#&J z9bOT#n`%Uvl_5V24Cdg4XkB)$efcUS8nshE0OAVd9qwM&sHi;u0KJd$ z(dr!_uO{8oT3uY+pe`>j2Se65HOfZCPDM_RB4Uoe+*bx`Z;pCYVVdz~?QLzoWxx~v zmwIP=J3qTXMOOJjWK#@}sRUfa0XLHe-CfY>Fy%h(6F%zelNV-tR*FfPh#NVUI4LHr*dEY1Y|g-*Dix&?<$i^`!N`YVDXgU~@ 
z*Hd@oK{3U30&3}RtGs8gCC|YzZ1&-Yiyo#x1PGU*0MlFU$Bzm*KbJV}_|~#Z?8~}i z)ZGXI|MA@qsL~uZL#0HNYo(o&PSKQABM$) zK>jns&(6#Y{29?_#AP%4Du+sw>6hr?Vwvom{QU)auk3-PBk3nwJ39w=?-3pI^s(2K zt3G96LN_0EzfqAu`2(9Mk_23O>wj21c<{A|;0k<_R8K3i?3E)3lSSkcm4MTV!_XSB*lcM#6hs^Z!v;5s=`EUV%H44yUP4emT_u1rf-SXbRB)Bgh=^BmF&D4+4EWl^zF0y92el zjYuP03bF#vP4E+N$=|teg3An-k%UT-avTWUzI|y!7g!1xgpc9;(kcde9mGRD1Md7_ z725hreS8aW)=q~UCmS3-83#ah14RA9C(UuzRc7lN>uTiLa2|Wm?Bu&4^y~thetaF+ zF}GJoEjVMMO2D&n%CA3m_5V6~Fbxb_6rsT`{c|Apf8+~=;MSY;Xx%D0(-@fypeVy3 zKLN1H#>IHJVPYA`pmd7v=7LW*{gxl~>DW9+z@uqR50cdL`tyRCm9P3(7oep_Zik1z zC589D-K~Q*BFEd`T6W# zoB+)12DY)lfT)^#%#n-0)f(>*|i7cN}!_ukCKX@PMmV#sRC{ z*H=7)cTwO^1H=hy2Yi8dh?FoA*_ zvfxl9@3|3PSAc)0ckQh2AMh!G!wnc}(R&zxkjv|%s|!R=aF>mHNb@v=Ag~-K>mgn> z%SG)_jV}VqBnT$q_KNZ2?egQwItWZqC$RFQdKFI5_~G2$x;rU2I5_CyK%@!)XSmrY zfzMu|V1QcC^#`>mn8A~y@e_|$65l$2dsCgpLMVnlz#EMCe(8Yj!3?AD!my_!XkWOl z1c=uJ%4b3G0HrZUW8IOjFjN@CESVY z1jt2VY%Jar4%k7MR{h@Sn&-d{HUJRdGhB4?phR=!_~-yis9l2TJ*E&_91eDSZo_ue z(wx2EKaKM1?Tk8gTMdqkpgz24&QEq1f!{17Sl3T&IAORsf`k^DbW_#po`CAVsqg~I zkC0CI4Hr2*1fW^bCxyX(6sK_J99NktwLL+vQ^Lgw>#xP61>m#_?|#16)oJ5P16QmF zZM58g(pUazkaRpnR$sA<-qr*f=94q!obBcY%Bl)S5~{!oWSlX}-zaRrX&@f-$+l&R~hl85?wP zwt&ZaHqJ(XheuUgd*s)z3Wxwn^0WAF-LeI~W7x37#6%b()bOO?mbIdGcCBYQ^C-z3 zCI6f)FZdoDl`}*q+0wo=5KE4YLF*5D`LK?HLUP{%X%fK`4)i(~z$2b2Qa3;vLHVB< zE{e&{X0Mp=g@Z|QycV03Bn%X1Al7p6ytnV&(}WAgG&QRrPhgQ>$?#kFRb)*sVN{hI zd&o{k5=Z^gn%Xw$+6bYo1^qu@K!ANul9>+{nwL^|0*E{K(5Mzl5F<3K?o0dojcvqJ z;8kqt%|H-#9PADWCKMHqJh204?1>$4`C+kWRDv*V=`~uH7o5-{i2h47ZKALbM=4`q z35E}pOwY$}NTqYCeq)PXE-mH(aLrOZ|DA3>L6cmhCJzoNab$Q`6j7|Vei^hjz zk9qT4kFNfatnFL~Eh0iZY1iruNvOL4GrIfX z93=8`&h4gpLh?>5py|?|(M%PVv|KFYDVkB9if@@W2ugr|di8Vlf&(-CpB)9rjjcbp zj7(|2zBdu#3aP_PbL5E;fq+~)fFspOJWXLVec7`mEyia(9>JO{s8O{zh zj6c9Yx|%M^0=ycpl9f>f@~xlEsvDb-KMQx+PgdKdm6*lPrS@^u=$|1 zM8ppt%Xuf##DXsWzk%twbZsPs;lA0vwXATS=B&+x29eKL-hWBiGr@;Yg1UbBL8;YONRWK1j$?VRu^uJkmSu zP*A zZMHHCy~k+CUQfM@5H*R%^HFluj- zxvWTvNf$82H2h%|YC_^;k^dD(7GIb6Y&UE=zf7vRJ-5v&9D${8HA1yE?f(t4w$LGc 
zsEW2M9oxGU6_GD{jV_fwtJ#$A%j)%SJ<=o~T)S{Z#>ZHs1#5D881CTxuR0@VWZm0E zkmb`QWdDU%VESDV70uh=u6tsu%EC4VLm7v3XnAP&0q7605*xU%xb9FtuW#ijWT5#B zn=z*TY`pba{n4F?>PPACMnBz0ijVwx8l2yovyW! zbwhVQm+FAHLp^(J0zBKNZT3vgQX0!wn#@EJO zJ@pM4Ceoy6o)C2Nu6kJO>uXvj`~gv*9%b0>yoBgm8S2_Ji0>iDxu)6mk$HW z``my{=icwKYoS`UK4KFjewe^BIv%$X;Q4+iBHCdMS2|o-<}If8C$uns{H><*D1i*e zUq*E9iVd6m-k0>3OtIeQ88FwaK2PqoQW!`cbWC;Pas3!SQqXYaTq8!#l7XVq&$^nk zxG9dUrSTUhaDFCvcHSmfKITPktZHa#<@{opOrBya;Jk06Ca(Px zuYj<-+3x21Uv!2^fIB!Ic)7RW<5Yn?71*@ux$*l408(C*jl?HyZskZT#hD8=m%<$_s?bAaxiK)(1MJqjgF?r~pnY2~53FBYc~faMEq?jbzp6xr^+#OpXq< zKa>}dhbXY=>{R62!J{=f-cgo)kH;}_o1uAw0HoOy2;rSr$7??5>`shdO(@DstChNb z@B@2K?T4XX^)193?UW=O8Du&Da0Z-BwO>L?IU~h{F)kExBLzQ9?@Q&IgqCc=uIpWGJMmbHSA1>6_ox=Wew+GC|lhnW@2BQp&PH(8#3 zE@^uH?EABt@m%Vns;2UUBdty&rlk6kTft4eh{!WY@ij=fU{4lgns~Vx_D=oDT@2(D z{q-)#j9GT@U20ZK4{=&fOw5~`lk0fX_HR8?Yr9Qtu9j?4tmGW&N2ElJSFo%V#o`}7 z#uP!Tlt_+2QI+TVb|W}>cl1O7e`IXd8&0BGDj=TsJ$n}Z>ql~iHupjZz@2zURQ*k? zKMNWb4eMB(u$I_A`2X1Z^JpsD_YD}|XfB#GAta^9P$ENyN=a!nrA&p)lzG}{6p|^K zLWs1drJN)&o^|793Jx{s!zOVbbuj{$@_>TPHbb*wkNWY^$Xm-6zfHz>tpE9%S&2h}$Kd_Y{{sm40atZq># zE-aM0%`x{XyXpHwMd~+>5knfjXJM~l8g7@Y+mTZlV*Du3@4c_4YJi=( zg>YIQ4ZA&Y%>e;a)-x>Tnj6#pSnzZ$ta))@?O^rmp*5AIq-`cJBre{Q_Rm_j z4`bQ&3=5_hvN5uYRp^{OmpbPj{Bl}2CiT`Sk_;Ed#XfUCF&W4zKe{*Ao4FpM6)%o6 zMdzi;4A&P_4Pzv0i>|?O3*Y_>kqD%#Md+e;OIA zTGf1KX20!Bx~&1Yk9R`*?ckd#iAqQJzFF(i2L2`KqTao3Q4tfiKpJ*a-$xXoWkDOg8he)2%b+>4_sOZNAC%%pUZZu>;~a3%BwRaAaDW@VRonE%SI z!3Gk^h*(PO_tUeokXQ;^wuQS5YR1eg9KMH%GFu6^^#8oVi3|SU7D@l-D@v$(<28SJ zV~1BJZRX{#)sky_ud;7={&ZQO>h5P5-ScjrH*vF$aJ%zWe#+I*w6dB{O&;TR?@?FV z)zTc!6FaesW-H&G-sj#AIB7)Bjyd=S>5N4!&zCuljvJro8WhdRG3sA*oRGajmQzkvA|O?`dxjGFU}aV%~H0 zG;28O4`Zt|@ANCuN!JBX{Xm8-bivx*-X2<~uN|mJq+E0SK)Sw#ppZW{0Ix}Td98~a z7o4;&AjKZq;pSo!e328pE86a(q+Y_Oe#W@s9e3t7DTcuBY?6cTZo#{WUG}+=i=yxUlcuN5fQq$xrmYQ{2pN(0;Vt%jJKf3h&&BRJRXU}*w_tPvZQT$ ziErc&{H5?Dog^A{ zz44_b9L|kTh|A)M%Wj=Kbt>TXYx$-7$Qzq*?! 
zByp!3G3&R9`G^HC9kbrsmJMly)YRiCc$$9`6%1tM29TT^{rgiy947sgu!9EPMDiA!SB}ms#V|*JPsU6-GV!Ng2 zBr)_2SmXWy0eXYp|5@V+7#4vP`Q=SsN+s4f7bf}Z0S7~W%uE>ZaaU{wWI}F)5V??q z1UiH&?!**F_!tPZ5cB5!#2jO$>6B3jR)mE6dsfmB5#l?0e;!aJR=FNV z|5WbZD&L=dcv;+uq-BVqv&-WqxK)Ex%fBT~+d@Zrb_gRbfBDwf`1f&A<6GS407$Ys zJW4oxE}mo(n%wJRa24;_8%a15ef<|Q9RHc?XMF4bvrQ-Vg_#o{#X8s~2}TC?DQMfz z_|}j%9Sgn`y-A&pr2S@On5iwxA~ps8h_?f-ID9GE_ZgqN^eEfQ*4GS-Ih$yAdHR2= zFD`S?LdQcTWicE)_E>3WvD;YeP? zP`}Ldip-xCnd!&wsvegTm^P_zm#)*f?ZWZq((IU1hQstw^6a%)V~68A9+s^8bkr#D z4!KQE$u(5*_;iMketKOWi3k~vTv#2wBi`~Ocf7>)x&7R(oZak>m$-)wy-r7&Rxno% z>Fvqc?XrR$;C_)?v}tk3Z-2u0VE3n%*kb~B>npP_550M-NnSWwe@iaHgQk3Q(C&9f z-uph;Z#34GHWlzh-C?u(W`X9{%0ef@EN^uAo|4hl(BC2t5Sw&F^v&f~iGt!Nd;X7|JP4(2})bN?{uG@dlx(r4p-n}{%&D6Vxr0Ie#6?uV{WGpnEcB-)vc4oN2vks*F+r4_!={j;2c4 z=B?=|1w3Wz<7KrERbS>e5t%$Y?fs|@Mq`Rkv@UJ6U|76WBIxo!!%%6ALiaCS=Av)W zzUP)TJ{OvlBs~drFdx*+pJ)miARpVEQf_mX*% zm`-i*LCSlp_V573rghhM3DDEeT8%l)#JtLw-~UNZwRj}kY*o#`%7UasBae;i&A;l9 zWm@ou=SgU8nhj4Kp61Qh=wrCG#D+UQ8kW(W>Ni&po^Gi<6x_dFVONn1HmzNK zmR@6jCxzlIxN7+7w05Z~o4bFu1O0}Jp9=Nxi4o_)Un5svyi=GG&d*?-sXMsS)BhxI{`<*qS!?P5>R`Z`(r$bp)S{=D>FBFe_OZ-t9hPjX>n zJFVzF@?Rrx_yliJ?_Wo&k}xG3ox$t*4+ck0QLWO`kA3tdnQ!NHtF`~b^{K2wg2%Z% zn-W)aiMAXmSAmV|#4^na$Coa0{@J;Sd*E1*6-D!c_MtP%YrUiN6LngcQ{sc29sBv> zM(02*r<)KHe#Tz_S^4W7Tr@pwq4)l z_b*we^uS`Bd9&uhMLg0|n4I7~W{qQEfvf$6d`D9@e&Td_$P*VEO8%l*_S;Q<|6Jn2 zd1_cSKHyC}5lvjvtL4HFCZ_VH?wk?+N~VEzs@sd!9AX-%7Fl+ePm1gPIY*_~q|DK{ zyW(=E(b}%&w9&0Ww}-vR3Zq5=YgM-vy^j^0dbyJ>=U=dmjob?%g68yen(i{*)nJ93BSbe^YJ*59gl zX(748(|?`Xjv^hm(o1?_4!SX?ys?)?$zq+%*)+6pAzzkVuPvAx7^`gRc(8J(^3I8Z zt#s^+JE~oq*=)|lw9M%_3$}H!>PV2Ha0rw}Vf$A+>b?>3J0r>SL*Buk_N1wZMXT}3 z8N={i>7K9;rkg*rx;(+0YYVrvx1{zL$R{P~ISEu536Rcm-oD)uyjV}d>9Bvqcr(WP z@4}XX{%%F{2Td_kEv<_Ct?w1hl6tQ--{r#2Fl>rZH25yNs! 
z2qatfq%|><@B4y+#ae#s)Hy_2k29uJ zORJ}5jMqrkgxe@k0oEL!TN7no^ZV2D_9$U1GY!FyW_tL$_rzn@Ee^R~$f{FC(_EJ$2;O^bB2@dyA0Ax_*yT|v_4mgO7?nl*zlD-xu}G+hA_|d4fUFeE!jAe zHzD&>u4Ml1gI?$6C}AZwr%Q5?+?eUdjF%Oa*sL33uSnbpzs4}LPHk(E08TTNF6Qtn z^z<(~UJp937AfA$(3&w)H`IUU&}c!khfYAA__o&m<|v%L{?o5EXYO{PjlKQgET`B~ zR@qo}?K-tB|M?nnuI7F#a>@57*gRUkb?ml>`h<1~Y6u!CjE~X(?>AlHZZ6cDj^*{Y z@Y?u!x69eMw+bc??^<*2pvN_j-7Zuy-S_J79TeN>t%3`#K3~eYpssoI z2J?eMq-Xwva^tX_E<7|ZVDm*QXViINA&KAYx~%A#BiT3(wR>3Da=A}tGxB$Qy$v1> z9%>w)@;$+O9TZOKtG=recvmGnw8?AUeY|u+w_{7IoJ$@ygSqdeL$YQsvb|{=<)060 zln!%BOD=Mw8@Ti>%mS$J{%|S(Z~Z~~g#^l|)6~DqJyfSNh6H4+L1<;6Op5O!J*iRI zAOhdFw~#`ipomS+4=un#cA=S>nfFG2M@Ehjm%KqiXQ7ZXHclCvkm`U+bwpLf5fL7e z``drNKzoUn6iunT(;_0b>TVy|U9-uB192WnzI>nF+lPc+5^hmKjovG9N2>t>5&Gibt3YTt&O{-Y4HDI9zjWPNs?SDKh1Drc`EHW$02J*u| z8>xNN`ht%N#Yh==i;0U9{Q^*EMVF_wn|A}=y+%#i=k2aI#UXrUd!F5NY8h*1Q`5;Y z(?#*HI7LIqIzqyD{O{jDDBJ*Shn7WLMt?+uzd1i4)PW%Ii$Xc7<^$CsDv0s2qiLU> z+vLKMNFDq@oiGbvS?NhGGu;&9f=?fVA~JAM>(xpD)FqTabF;GMz3wB)8l%k20IWX1X7-vL#di!vPN- zWI%NKA=J79_0Gon`iB!5fSD}yK_*ADU`2i0zI7s)00a=o+?{SR_M0(_17QwhDC(!( z7|w$ZP=i&sRjaz{t6~7g!^g5QnTp)ADb%HuVgg?fV`D?$&Qn$@RSJ9!6wu+4I>|f3kjO5L#SVWdVT!pCM#tq#*2}+5JOYs7CL!y1~B$~(A^jQoci(L z+*nsbsm*CNVA(Be%>Zx?qPz$4^DodZw>31Jio(-BeVRg1oY`)Ufw@t262Ns{Y7~LW z@N-C)r7){ynW}U9Gwt2`8Hm#hlx24rj!D}UU?_m?TQ4|D-Z5%F5_2QPb=Y}nkTl2e zfCRh7^{hIbfJ-0^axkbG%{rh}A0!zjf0){DMu|bEx~kpKL9(mj&0<+;C(xUNsK)m3 zYm#T1At9yLQ4Rq3+e^{JLIYGpo?LT!`a2qO(#@Hd>{8j1g``K!%{Xz$8vPvcne*{- z1oC9Sh1YT1q35#A=fN0b=c5Gnzd=G105Hrf@Lpn7KH0zBc`kI-026K_9V;RYfX7y+ zi6_VW<*(D!y89fT!W$YK896grY)80mRdJ2ja3ZSg zGapJVm0zytO@)p->Y01Zm3yYBruE|*sb5cbo-jek-V0Q(a!lb&1>CapC9CNO?;kvv zILEeBw_bAM5s=a|mFOo#cUT_Hz50l60YvM|or{P42<#XT<0)I|>7!N96pU9R&$jo} z2&56I?cuBK02|dGD>3fMnT39vMy|C<3*Y#r$q90v5R46YbpJsO*2T~I-(n27e zxya+^t(3vmfT-chqxGyJI@PKZA52n85roHC?RSgH73lRB#js>y*JZfaN)#0+Pd^-!>iSAMRSicLvj z8~f-?rL^XcSzVJ{)ZyG=ps&9W;Jg?B7>)ja5Ypecq!ue?R?`7m90v$mc!&?VL%#f( zhDVPcfkgo{r8N*GgasQMRE0n+fk;EnKw9O~p*jXC27B!%EJKur33{v8dk0pN|Qp;c>Ti&jC4Z|<04 
ziK`Vlt>SGHa*fbpBXGN5@D%SSrVS8Qz2Yiu37$`Z-=fIvx&aWeme}F;)~4U$e((Qc z0lMu7ngIcc!ZAYNjK-;y@y4B*eigl-)JXMfuuA{kW&dt3JIkgM^R|FX4nDZ4tUO1U zte=v=T$>pf$Z0lb8MmMKl#!b&PX1#gCs*O+G!u2hW~PtdSv3C{4vUv>ABmcG|9P-4 z3xpCHHfJD$F1)fi&*9T}Ron!#yxwvdusffz2Nj0`UK(Gjm0%TyNarurFY7-8Fy1)N zfqtj?z&#^l(_9*NW;T zCck?kJQuFc?-zEathll{Ex7RZwwXLG6I}lB=TyO@XYm!}iq^BZrdzhXlHnleIy6by zZ+EHovtC;&{GxwSY>07~*psNlAy4RO|7S@DV2&f)=sD=#O0Yp77rEu1sdHZX;B5aO z9BOYX(^6E6r82sHYp3BmJsmh0C#C>vmdYO|k271z5*88|NUFK59l8k zFLXAVpKM^8jg}HRyElDDhCCSznqPf8!2@{CY`S+grp0-|)%jwrZx^%{%d=B(!VFuT~e8&3ovgXoFu?H~&hOF-(s|W7H*Vro_yLm71JO89(69Iu_>)1}OGU7!4 z)rW=}y79eMDGpr#`+3LBV#^sC8?$H>;NV3kNWt0fe#A*!e2Pvss-Cd7s$omhfJq5U zbS)9<_pt8xd0S-}4j+PO7SKBSvV%9J%mQusknP2`Qv3Yi7m%d^(RyNy&;;cV;2qF! z@UjM)D#rPV&qbZyv?yQ}CHl>2zE+^!se4mLh7r*cEP${B$ILzgkG-(eT3CQX|03s! zsa@ibcD(9*rf{a=awGrseM$-njljV<0P!cw;p54C3VRH~&%Xp~L=KXNM`$kG_lHoj zaHB?pvpcM#&X7g@`8~REf<71DKKQzkuy{yny&8GWW!i8QOknrb%w=^|1=UUnOVmIKt7UlN1k-KZo7LEWmR3g+R0bA>Y z@TH8ff5m&Y(HnZw?>!w)s-Qt&o8j6Gyhuu3^pxk1>Q=MAc(CxAo?DawQu~8_gqyM+ zZQaD&xxXXiP}L0PJ>S@-Gepv;Rk z?kvt91Prs-y#~-QvC^Lhi>ZeonZSvKqUh>Gl)pvKR}uN8`WnfyWf$2D8u^_FOnjJv zG%iRM3)^5EwIQd*y;o(&bh~n_*4vcNya=Ux&sHGDP_KWoai%}rC&(Ip^`Gumk(b}K zuuvdcYBd%f`g#F7^86BO>&S)C-QOpTjf~*#NZL$SpVG@#5k_nHAaMaxKd1DqVV+xD zdIo41<9rN~va{`2?z92QJ8|~dZX-iOF&!%4d>;a2N5lITCH$gua+bUu_65D#;r8v8 zO+C83_kW@sAzv)JyC8S*p9PRv^WgM2u=(dbc9Y?1iHp>g{vjkTDV01aj(E-HZ_h|5 zDl!JQmZY>NEal5x2@4kQkSfQc8%7{G9}n zEZi6`(f&Y$Gmc*#uRSL81*l4h94-@a~wJYocN0ecaY>y*- zu=o2ILn5ryQ)oYG-EBQCD~UjaSO8)O(K4&Qk_7E6W|u&@OC?4cz}AU|%lucyx?Oku8v~pu4xy=G3PB8w9sm!ca5{8+ z1x5yLjIu9r9E69h!rG-TXLv5CB;I;hUdEU|pAT$2HpijMgfonAV{A3%N@^NMO4&3V z;A>QCZ3y(DIu>Lv2^!MiuOAgkin7fAYKiVm!y5BBGrh%l9)YTZcR|Ia3a+kFs?D4% z>{s2m!UCOy$`N(?tY(~ouXU}V%6U1fO7CTRnyn_87qGd+}UK3{ltcFkN9X$Z9?X+L`iVoCGi5J};sC{@!Yf*;)Wx#FkH9wARBisys{h1INnz-pFdv$ zPXH6;wdLwTD;pbU!grzC!ZFirHyexTR`n_TBRX~9CSv&qF%P?NVwv{JS$$NiahG== z)u)q}gSnZ^0adrb!U$P{?#+Ck-$g}f>$nmW#K)WYi1xsge}oO1>0N{=WFybi7gr(j 
zFT>k(o=|XRv+MPN+B}$w&I@I=fI_cz)-2ysF!79^Ja~Cfs`Vp5whtD z!oace=m(b5$MVekD*djC4T3ArN|{7NA~l-c$S~_#)66K&Gi7sHuR7SDvRNHI$bb}e zk*-@B5ch5`a8)6?JzG8tfo_SP;}U3+=JC%Q_ztkrF{bFBTq>l_Q=J+^x=*gORATJ! zqNDq8nrY@(8X(St$}tf|TMpP1OotL=zO8L%;hcg@;PC6&9)_M<3y{wxPSYmPR~LKz z2pMR2^I(C`5XXe`6-`7^L~!+v;uvLQV$y{J3ADq$UyM0gUmhchlY&9OLcG!{ifzRV z4SJWY8Wz<;&XfUX$vh&e{vIyn{OjwpGFAs;EaG4xqvua#u3V`mX58nX)4}L8SZ|g?(L4`}Ts&~@mEQToCD!RQF|0UiW04&(fIs;S)3O3u349sE zNxOKISypM1L12Tfu;8)T;2j~h5S(y zvrfcC0GuOGVB7yA0)8M7+mvU$-;#?X7YO$hkpC@D-$LwF?plqbDV#F%K{d?X5pND{ z5o{oKQY?jzbOj)QV68~LyTjpQ3udk4l#kcS%7B!Y3m}1-qb1IIIASj4U$ z*z`KAPO}o-3S=eki1;DfKx{DTVmLKu@C9;2L=q zR$HOqv!=nWVrg8X-Nv*&%?aU92e2+P6jo-%H$>+W`f`aBU8_4P|%?t{?0+R}jw>pKx>U?9p6 zo5rVb+Et_DALPinLMz;>$?EV7PRq zclNW6jg5~>4aG(gXQGeuxVG58X_7F_Wh=C+Rmj?(Ct2^<_Si<(;pc8?Ng|tl^trtF zGtXdyM+C0rd$gZ{68aP(*blNC8;+?u4&eEoS(U&)RrO)i@M zWa(obc0@P@?GDs9vC3EbL;D>&M)CIBiSKdDf6;dcX?>x(RC3d7sllViDBYOfR6eEM zGz%aCss`)Twn#^8nH^ysF=fj*+Gw?dJ%#x2YoY4erl!`+q~n_>a8h>4MDe9X>6 zvlj@AUe0C#&eB@!g&dXL=R;niBrk`ZUK%=WL~ zVV~*gRN~QE>;C2&-Tz#}=W}=IVQY+@LPdm&2h9ec6Wa0@vu}j#@SMg+w16J)PH=CR z^~#ZK?`cdMefIC7+BXA!C_V@)eb-=laQt6^L=E+yn=vuTvpi1{*SurT5!T6TQQ{RL zhTec@QS(JV7hDKHi?+9}x%%=@F7CpvHEJ|g-J~eg^G*Am7H)DpHgK`S z)?_&L3O>%-mL*7(rjEA-JruqK%)`C@8#JwS-k7?V=TF|wTF9#pxfwS%;KXb!M51Y5 z{+=QzA@mN=5{`QP|1Fs*WG(#Amvd~2nQNZkVx75GZL4$vzyQ@JCqB`#YZ;TgRTVB%V3$@QWg!G+XwTO^cTlMb!2Z@3lm_w|mH%Tsq`xEw!Xc3|=ai zZ7~Iv#-^A`Q4gN`yqoZybeNSt_2x?s96t5oZVlOE6*VfXTZsX*Q7g4WQzV1pG7JWz z&+MiX;6|nTbmbjQ1*==7^T)>=C)S@MtDkA16BzVe?aF0cx*O9HUwn0L6WhDJ#FsX~ zxsy*x**juNAX0#U(lQ@){OgJ}D{JO)Aw+Y&MyrZuFHvN^>K3z@AsdQ2{aIWFKw~?~ zvVO7C{$=y|6U*{R8M65>5Vu_VNIFDSQxo`<+Law1O>@i1 z11UJ)Z}$;aqD0=rVt1kW@yo3qD9%}xt)9|h8YUf)jN3o&oaJ2GFJG;+`D&u*)C~J( z+H&Bi0%P~-k&h@V`@~sMFK-CtyjA)Y)8UWEb}mUgU;QDUczAQnqx>rZkq|1W-X zEsTIJ13{xm3YeO1QXB3PuwmnC;|G#0BA9{)_K1N*9jZBQKuQMTs8zK2k1%; zWDM6+J7(*)9*ldrC;C>NV@){4#6answ=PXDUvQ=4MmjB)9AU|{*;cCWB?e$cEPD3| z))a6UEdgbNIy;~SS=4CPv>8_p>&}J#(E#%N@e^hZ#Zqq3#UZyp~ZEwHE%4)*orpkC@Zu-AZ`wzg_kL 
z0EaWsm-O5!+1YTJ+{(3_^U20YbefL7OxGb&T>3cn&Qvscwo9CC<;8sd zu6$dyXKJa+=+!MId8MM6CP1Y27ZyJJXmM@MwEouM_i0IGi{8O6ifaYV;G<4fCXV+z z)msm*Ujov|{@3vBtA)ojI+>)EiqfGbsNEVxZO$HZJ2&^Tq)M-FF}TvMZ0+@3Mem70 zC)<==zT-4K8!`3@zc0OH=p~so_}U|17%(L4xlZMkaB?k`&!0t1XTY~IM|97M$>Ph_ zHP`nR=^#b*SfDLKud}Ol)_D0GUuaacE&xuczdmoiln^qdzE?nr-a2!XJa-DsA>gc3#EZG1sX6+Il`R)$z?b`2%r^ z^KY5~kQjEHtxzky@7l~OnPS!6SAWYu<1$@2nt8KdU)HqO!O8^ypqJtUbkXeg%L)DA zw~h~I0EI^}r2ER_&ob0L=bB8SI3@_FY*210)Ixf zVrvdWop#ZJ@p_gzUb5asVkRo0d-YHE*}NE04e^6-V)^70OMz*0i`RFvdm^A4{d}Dt zSshjQh=MU4&4HqV{dCJVUJgdU7XZXkI2YS`?cOueTA(}`zXOe;zLl!q} zBb)L&HGixwT+7W$JZgEMs7XND}BpfS*V58>i%k~8F!K?wuh)d&yC= z^Z6sPI0s)>qmtR`ja9eOC#?aqqeso0k1yCC*;KEeXMG{k;7s30R04zeQH{>qe;<=* z0a<-2Z~iXDxi+RClb&wbnqk)Z@N%x-Y8Q@}ih?nqW$K!JG;>4VSgX`X*phYU`};G? zo}{?4pVy8~mC*%|1n38zgkwij+*&g^gZI1^`tDJ(^I0+jakK+pi~Pn|N5=5aS-_4x z_V6bcm3K4)xVK}u?v({Iw7HF$&cOQUzEgOy2P!WT6rt6 zAU*vMs~79iIns(00u#p9?lKyqqvYr;8*R;==RnJm$!Cu}J$wF`Fu%{O>&n7XJ|dBM zD-%=Z+xM`GBYqhAdpTMeIt#!wv^QtiJopH1#7-rJo1fl)ntC4}uMnTWqiwp6ZtpXk z-}V(3b(Hi5^7rqrwX)fo6JM;T$mGd-M|JV}($da^c$qu@g3oP6Y%D8X1>`~_vmO)0 z-7&8H4Gq&n!?nP_%`11 z8yvfH-xW|sJv}#W^`VV91dov=Y;=r5d9^=x-MYh{LvL~3RK59|CjL5Sj$g%CM1&hP zI9`aAI)Q|ivhvllS4a^U9Aj=xLFZLL}H2}VjPtsW8*+Ds)${Ji=1N?-cL z$4@EkllCFX!-^+%EVhqz75+%$ny<1nI!$?=lan?)Wa!yYq?l@^|CjRWm4mpLb-&~A znUg!RV^b&9tE=zp{^M+2e>q1_ua1_h!DC};VzLK)@UprUHJ#Ozluq{e(>(KbmkApk z%1Ov4E2*fw((HN_m$>pRQPrt(Hbaez2mn@b~P1&3}#XlFQ5a`rm zZPSzPFn>8e-*qJD#f6kGiDj9rR$7Gs!YgI%E~GAvcDgcH=_bD@>27YC>3_aX(DQD` zrSlwx!+#ehlhq|$?eAL2om~2bpfEX!K1RUB1>t7N@3FZ#wviupNg_X(C`C)(k&*A+yi9ItEe%s|o(yS#wb2{*dr?A(%&-dj zviqmAy;e^FwJr6J_x!QE-0a23(o(HUv>PCkaou*_+VaOoZP__V!K%vSAR(iI`D)sz z_KqIs(Uzo;<>fS+Wi6xOkm2DcM>n$X#pFGbXm4MvPP&uUv(?7jYPRftO`^M9JE!-J zYbO@R#jQ^lI6J?QtiRa1Gwpt;vcgXA5EhX>Vr;PJ>eB4|q$@BSy@2IBZfH6ua^$D? 
zO`r1WTpOx+ZP+pLrKtLAwLun%vN3cD&ome57{6l~ni+QmAF8`>&}Tj?T@4zb*=`p& z#!XF~G%sI%<58-`J?oit=hW=%S`{QwF>TN4{p2;WOzxQV2Ry*K{YqG_DL$J^!(`}MSCCM{p*r=i`j7KI5CMqvXKtnZ=$U+TqPzcjsY1(iKIN=2_H zmpVGGeGm5UbLgI@8|*(gDtRuhiC)~=Hg)T^n^+RktUt2G{lC}Vvr|!Myn(Kf7y^&1lA>Be)9bj| zJ`}6xM+t%efqYzZSHM3Vo;sIHFWqri4dkCSdt`Fu_PeXr-Vk5zVf zQdIkMS;vG|_ijKdadt19jXF$|`krOiQsM|+!(F|mmdyKmvV6k6C9Vkix)T&9PBTch$VkI(I{a|I^!-Ib@{}Ky9o-`Cw>}|O3|9U+{0vbm0VF@ zxGhXkX%}+hK{`1cvxogUmzR-g*Mek@OJ3XSUxU|?jcSbDAU*M_+VOnl%xiI%NGzn@ zlIL4`)+XFbePeqXWP%e0Zebh0F~N( zJ2%0;XU{v;rn^4-b+5OpM{4d`b7gPS@x7dU_3VW&wOMv@1${fjcTxW0H{Kxci(A|u z|9Q^wdtKYfBW>(;b<9`B9h6k8c<1M3uNZQ7S$V`H4DmIm3>3S1T@F>qlAOP(*qU3L zJ#bdLx8=pu=Es-&!*odR2o;|f64__a?cewWo>7`tc(Er#6t?1|SBf&4$xBT2(E8zV zPaamKSRDOo_Ln4maKUgbDRs{;adts38vQFzYY*+zjlE%2hrA3-gzD%6s1vf^o=;~*MZ*oE+W316$8_2vzmF=kz4`^X;hap z_J1Q>FRVg}E3~7JQ=~>1UC)4)B<+0<#xmHyMVYO`N*Qmym9?)!hkI1C@bmv?Si>L- zf3Z7_UVFijk&ysle;JKHRT!9~h=>Caj0NmmP|)JRvH=9A;^GFuUaqQQmUM((n!B&i zMPIaAy>aJ`INIBySKP<_XzvL0IwUlq`wrnl3v=_DIi+HWq#Nm?iAfyG*xseu#*Oh>5wtSR&_lIH6j6@MmwEZTs+e|P@p^r8mZ83H>SC{LZ(TsQfrjA*TmF0gmQ zGf>gpOc_H{$VaWWQGSK`@;RG(!XZSvZvePNrfzJBe16WAVeRI&TpKejE>!MifY|t% zP4sUJ|KNz`hCp`=hcB84cV0Qwl@17l2?z8-%Gze_&LUTm!e+uv@?*$*UtTUAkvboE)HScx;?>= z>H3?n;8-v_-0E5&1@Yzk)xjC~7ufDVwpW~T1gU0SSi2!wL)u#a)0>y1iZES2NQ~`&kvPMj!w0zH2GOCq}z=;xTA{inF zKcuG1hK~FuXP)z#0ajN86Spw=r!NdJwZn40OiWBray-~N0Qk^D5tdX57CB!)wsz@% zYB%=f%tveR?fussfifH)9NbseW66lkY3WI<{0K}N`jFe89?sx;(0F}7RfgqYL!B9Q zbrkIqfoxo!AKyy;Slc>=wF?{uK^gh#^omtN3@~T9^$aqx)~MElCGzmCAsTQagGd1m z;%ny8Duw8`3dHpnS(zKhjvfs$pwj=-|QHhC(REwz7n@Vb|rPbBDfN#d$>l6C{(Ezlr{|q>k&1lD=)&T%k zRMYi+I!DKP7!!&Kf)HpjdyIihqu-yjYt1jfnb89DzJo)qnh%Kt8^J!O$nrtG(1|>; zvs>rH+5+ovk%zy_eGU@R8U)-(+OJ-Ji}kSJRY&x5;m@mK>2W4NO7mo8(1s5PjwdeGbH6E7iBERAYW;$DD z09}pBpVl{S)DtyyKvYKCQV-Tm1ID%?v%k03xI0hO@_I5rmw=u{1O!sh`*h1m=q78Nddlw&aHi z3q&#}6#JYlH8_7`%gW{NRaFjvY|;v5tvXTBwsNeHN-0vJqXC9~-;W*&2_2}VTd}yd zIT-486ir=z#Qy~St-EYuL~$jy>*Ptr}i@Dd7V77z`buTh)qRh%TtSz8K@`}+AU%LjN;jxdUJl3z95 zOC>;clu?{|REq4hCH%*ycTgI!{U 
zGk{4kNOp6n`~V=_+~G5TMNhgUS0;*YHP>-9f}VxG;zA>hX`epPl38tk(qspCbH?Eq zs)O3M0XeuP@KdBzc>VD7sU36XJucWtYjdCRnKJrCUH@r3FM2aOmB3%ETZUEx4gm@o zBHT|DHUf*%T6pfHYa>@9(WKt|+Sb(%IRJER;AcF9BU}BCJtUI0LDBKcmuQO`QN6xO ze{%@}w1n2)AFqHk<3*>k)je|IhehqPt2f4a1VyB#0n*yiS!S9=?d6j4={)zh;|987 zJdD2!Q1V?fL5rcLICPZ#PitIzfYZ$R4{SIq@W#^G`oiDCDtEVd9kNH$6+%uxZ5SMs zC!N@^5gF(07%Sh7UdM0D$Mex{bzPPjfNQ{}&5mmehseHS+1JJ(v5V&T>9@d|3RpgO zWW#c@5jc%)I}9@7c1XcJKZYmbe1UDaruErv?1860@20hn;bA^r`66U?0j+w;OEVkz z18@eZUHe>nw(<-%2G5Zr-krt06@E`TQ45)jy$cE6FQSoT#yxvfR=OUq-Q273b$FIl z)SUAyy-&%U;{^f<`QpLB_PX5Vr|Yu+dYuhQ+dqVZiXq=pSn=<^K#_-$KUe{|qGbkT zoE&`5RwP3E&njfNf2KICI63R;>h|W-6x9*kTh}~XS(yAe2g}7_!{FfMhh8nPw?)>y ze|HENG`uf*j^68bQ#w?JyRt=(G5-BO7-A;dTDCET)Vg!Z0p|q!bPHHSZS6llt#?er zF5HN^a9*E2M&y90^TE3%y~gv$?|s{leb}R9#}8J3tM=LS%`evuMb{2JSo)zKU$}AS z;Wb(3sGfAB$MSk;2UPb2Yxd_H&`ZqO7HwuLFwpd-m@fhkY}6b2^wk;Iq`BIP?a$Xd zbF%SqlcgsbM)!VLkMjf|SYYt?OF8A?|L_B&x#ZIj7+(8#?LZ@g+IKIGt9rR!Ki~J1 zqa}TmW5kiIT$UV`w<3asXL>}SHV78&Mvj?f(_Vq2Ti6>}#n5v*;vAjx>_^St%Rllg z+3(#s_jkYQS3?5>yPuENqYo3~uc0eXzC6f0Xhm}zJA+x~smHGy-)4>HZzw9B^fvuc zlC}Q8Qt|flH%uQ}Ue}M&MW@`S1a|7;whvcBH<^Y{dYHF;B$%OpRkkJ`Xp(L7X)C{{ zac&JTjLeu=fWMdYcBHiv-U^N-?VbYildz)1-qbDE|C&5ne?(b!`*>QH?9<^2ZH~jP za6imT#m-UV=CU{MA+fuFhQNJUAu0KNx#C3|*q5E_gx>X=DD7`GfQal2Lp9?RK7{ z%C7K)-d_Dsecz+*1H_s>?^mPSN7_bs;~P0FjWh9A<}viEkt~e%vvJLa{&$XxBr_)5 z#qRzhZw{y?juPJfM`t$U3_dQpX>CoW?>#FKr^3R#JVzkjwl6)UC{)+fuzpJ*Nq^PG zym7v$yR1_yqB#Rp@Gw#G7?R0k={RO{Cz}&SiBkEjA(pu6Q!f>AX$76cnfa%_D561y&Ncq%1kiI~t??31`6E$`g9|FQh(0Dk;@F{f|$_tdSrT$w$q~jJEVJkhLP9``}yKWd&Aeo zml`v&5#QUD<&C#;na5a(g@y>5Y%<0iG4yI&A_@HG&T5wR=f51%p#aC}8mi9n5nb2O zi}|TZ4%&^=GQ$iIj_lKbP6&c%Ql{M(B}=SK0Iv zww0nj^V#uCfRPE=5>qa?*R~MM(prYngVNvD`Ri9d5PkADQzi+aAIC1krfoU2jQ~P& zpE{BQm?E&mC%-zK-D7QPY&_EQ?}grAeRfap&c0N!w=N)4qf5H>-H=yoQmi?yfS`u! 
zMw0{-k)~pQ2;L^r5*)y8`!fCm!qo&2cV%%J(M<200Hhz-s0(F}MnNhE z_Y4ZPvx5rC%GiGC@B`OSJ_FDU!bZu&5ysdEh@p#oRla?D3ApFEY&U(NejyB$Z9h9y zSJv6lv9u>AIEGEKd6vs3hrDzQ%#~0lW|ZnSUUTd63!rbmLCKulu$l%^3R& zcw5!fd*`*50l4i3?G2PR7%qI0j6UJu8S*?}ko9EB08)F8hV7YcPvNJ5j(jcnq`~4; zA590dgw=nwJm^KZTX6DM)eRTAl!9GWGH51Nw)Ld?EWR0gcemYdO}-WKW@<>V1& z&r(@dJI>D!64*CDspEFIM0#c>;Wp~6Y&?NSg;|%Z1wrrf$6`D#OGFJ$&r@yFP>Oh| zL=R58+sDr9dU?%5FZT(eL)ZLxPtU+4B2BK3u3(hZb!M$gyq&@Q9Oi&uO&i(v@Jxy>{f!-z@&AMizW#{M`c&fmk68`D? zn3(5ex9{J-gO*?hksCDOe^qwih(a-X`LQ~T91p2(DKu#Nco#n7%ve`0I?=_*2blww z{yq@+lH0 z$Bl;XU638StC-U@ucGwKQXMB0f>@A^)In7KBJ}1DZ?}-5iVFHItU@(T(rLw3S-EB6 zS221QoI=P|x62Kk({T@8UTjdpLF=yxWj^iqm>mZ%Ocx784hE+C1_rSFj;1SBXy3SS zQZ}NuUAgEdgWl{=GnU_mlV~gezkkEykGKo=Jow@nSy===AKEkFcy2UzcrEJPn4dqd zto+=>KK;GUd#CQVq~|~llD>smpGbO(o&!`u4~3}>I!vIpD`&B4{apYRrAMkU9+^++ z)>{WRk#;<-9dLkOK--BK+0GY-Pz)G;dSlSKMb|*^@SP^oP#3pdh?)d^U-aScr|cVG zhd?DV|09LSu`|O+Mau!ZrPBXJ+JRFCRuYXnGe_ZodOAswNcy`Go$n|k_+DU6T;28* z!$1On(92p~X2G!y0_UdW?LyxE#u4sj1&fE5Lf4FUuOUejnL+8_gtx@rxasjk-MOvJ zt*t%S29G+!(a9g=HgMSySY~*7dOnimQAvJ#T>H6UA4epjq6>vPOLp(0B}wxVBWD4x zlZcwAqY%{EwkD;OkVpevLQps)DK<7XKECqLgxa4M!45zi+l9pmWdn6S zm5*_sKih&qgZ2@iGsU>?qCMCC`l!Ngehrp3;gIT~-E^dlF3b4*h*EYJmCwSt15D^+ zHzz3RzMMkB!eNqRF+st!ckhZS$%koX6k0oTFZ5H=y&zXLkxe7riAMb{ZI@4!YqbBG zTJ}$rVJw09Sy^#Bq{-+~AhV!$=?xXinVvr_Hj(bt9;Whslx>I4q1Wq8Rdb&(SVv;| zGQfz%QmwU3H<-iGW(`S6gk12^%L7kTM<@A##6I^jQr~}()x*EdLQ1RXVEtc?ao5;-!9z zxa(iechG3n5pn09gF0hQ2j;uBlbSf@vpMHQ*O7t=t=`Xj`KHxBwh#CD-X?X_k4|UU z=id#vAb@an>%Wg3nu3JiXSGYpw508hjg1Fezae3${%;stvTpS%Z|)j@aOCzTQn}x< z%#O8p(P-0!l{h^464w>~egFRB72U2b->=OiX%Sy`=*V{*my=UA?WOzAxAQBE&VBUC zvdvF)+9JI<{)cI{hrx#m$Hz1-EyQ6f+9MsQ|3L!OrN2( z^vh8;kQ@=pJO z;74OcR|Y?JXIP!WIBDw4#(5tYC&$@H9*GsC#xjtrRB05Ga3h8-SIApN)c|9|-%}xx z+N^CtSruG0ZL;cCFu52cQEw{dwsY%G2PWCskNMXSRxs9LpRH{ccJ^_@$>-0Rngt#9 zV2P;sDb6@8#VLv{*J;hi%-LS6k^__PY0S#fOYrSZRB%=HDo2l8hDW9cU5TD9eoRa3 zJt!kX*D|C7^+=vqW!r{qoxf4k+}tdY5CdeO7DyWV43cAGxk{EE^3Zfhh+2fZT~3#c%CIqFaF)N5|`)Y+^pM+_}s5z7K9} 
z{1u^8SPwtt2zt}XrJ+V76K~vzp7$d5240fF9zywleReDv?|TKy z5~A|>j5k2cDcKy*7`Y=X)y{7(0?$%-(fi`&X=g)eP>w7RC!~?A`p3vSlrq3Kp z`DMB%JAwpDo09YHF>hjTimZ?~TLC9c8yKSPHskwt;jf_wcfW~ExP4LOp>bZ@V!+(T zoPiNjoa(N~$QfVX0MD5i)or@ZJ#(u9WARpd$V&*{ed%K5`3o1+{mq}XP5M@rmmBzt zvtUx0dPQq2*;hSVMP8&@N{ni1yv?`7d;m3zF?|ke^nLX3ea2jT3%UqSj5io!f!$L- zwyLuJsw-*~%=nd=!PP569f=Mvq=#$j&E{@x(HxsOpPJVDQHMDEFqV?@)W_A;ZCf=p zzXr31!-;7Zy6tuId9cvqpv7L;co`0SN=P^)Q|eoNp{U?@R<_bB0yZ@qm6{ron`cx# zLgQE1@;S-G5YTR_S&$ZB{=*&@fQv5%=SX=K9miyx3ATAI^ zQmYv3V`O4m=b?y{L!EhJs_X+c+twBL$4b1oVWVr!DLfR*T{A1PCi>pXS1QV7Mfj&L z1Rtzmu%qB%hR)dC1345IE~NLc5sh-YZn--z92E}Q{oR`+b9RT6lVh}jac*3W|2B@@5~OZ{p=GYV8ntd5!VKKD>?i?QlIcaE^^?7KM;#~eQz z(D)%fD|<9Ct>c#<{t{YdPEnNC-BsHhgoJjldwh^ch_e9BvCZ==z{Bu}M)3HJ zd|EMdaPV_kl-n3Q5PWw3zO3wL4Zx1CL2hmC$hX@7${3jEFb^Qre(%W1(YLXo`zB+= z`AY28x#J!lb8}Nu=X(CuxQNr6tyWN=5{#xZr-8i^??uE--wzHpNlDoq@Q{z@n^XCI zEv>A9`Sm*~RLl9b-}q-2ggvENQ$Cbzyq1s@n(y7Zv(UfW9Mu5F_xg1W|F*ES9K;G! zW;2^D5-chrlvR#gf5m7ZXzae>nmVfxSLLC`0WI1NUK-^ELh3F^+qh1 z0i-&GuV36z)$V~L#>D`uq)V<(-VvNn_aC4p)kZpsW!+3tk#idM4~XcP{Jwe2KOm&f zXmD`$*c{6l*qonbk3LCq_?9MtKY@x-yw+UskRAOj75bDNdr;79VUQ3nIbx$tqQ^8S zK%~o;7=q55m;~H#c3!^qv5*SCQLL7Q>+kmz0F$8Os;cVOZUYSIpIuU4 z-26Jbekj{!Cq;O0x+W;N`lz(lEiXSXmIO9lnEjEQM`cI-Lo(lg`PDt`@#~I#{BBN( zImK^QNluBwvY(tQ*u&r}EKwdb!KH7}&s@j@c)v||-pf16<#6+DVST;I`$a}Vahb}> zt$l$~%Hl_qt`%4k2w9)!cVc21HqPkOhQUOk$rJCf(jtjrod-ZtFutay zBT#^D=f}{{5aiDU%z*Pg*OGV+r5y35;O(P7e9$t0&lKeJ(ZA{XpiANN-xvJ%IQ)N` z57t^Fcz1ANV&ajArZeBSf!4t-0K7sAgRgnCdB0+ zbS_QGcB3loR3z@Hr@-kE-|DSa>2<(IzM3jE-l zwd5zT!FTwJ2z-!UJT^84 z@HKAgy1Bc5`^|u1d}wdOZwNB0LekpUjEkGQ-oC7H1&3i89O#{Dp+JWZy!03YqUy)g zn%mpOS--Bh8!#&TFW-tLT!Gs4t^>Edyvkdw9`<~I4jw|+7le*3|GAyK;tQZ`EM$dG zjRG_L`_|Tn>&_Hd=PtB=7+x1;EiNgsd^0GeV2xp=K$?l+gBvUfqQDA|?IpJ|*vi@( zU;wNOosJSsF4O!Nrh_acIe82Jo|K^>V@*cON_mhJ(wO-22CrQ5y$=vaJiZ)=?s##y zcUJm(dE$_G{)+@7lmxpiW_L9|>-d?=`?1Tn(13~Y>KVkq9sm9x++$OsKs=n(9!DTB zMEhV^CrYhghK#@I1<=4gB{!8B36K@YGvSw*g^lzGswXN(qL#e>1nBDV 
zPm!yF_Zpv=I6`Z?R&V%67QuAr?E}`Q6HcEw(=o#jg?9gn=D!LhNYcg0J)3vrWjOPL z6%$2g!m!TL9t?j{=KMknNF^mCt2|<$VzdS6E z$b*lNI~gAvYb}w~@wwemB5nSj7VPP$0uWsG&-`teTnd^x>8RV}meSE$X`MoHl&446 z9J{Qpt`69d&RYAjtn6$7J=8-~KSl01m-+>pNM=k_3=L;zNOKP26Jk%EiEm^!acC)K%-*m>~4Xt&{87jys+2kN$GHO7L-yHSrP-2wV;{C#;kN_C@?us z)WzjzJCTB9>5*qv>e%@BgACS<%VWM<7#8`N{Jsb8ER@yN4{o0#uZi4i(8?KvEzv9t(2Q zFGx#EGe5!!2&m5g4NES8Pnj?v4sxJZx6CZ69-j2rNV%F;|7CGnlax_Adpb$`JV#j` z>p6M1;x1>&&yI>o?N6evLn;ds<%4d?eI@YpwL}*_+~@7->Ec;NdTrQuLgE#hqr~j@ zFTIDj1-TV@!q}q4>|?%1RD6MYHE1Stc3pO^fJ&(g#iyILe~KeTD|Q5D7*(tH$iX6h~$B*<`-)gw`K%dW_3wN8iMU42LDoPtz6dHjNyMgv-Usu!epVEsoE z?R&aI87(mqzFw={&$E(`mN`&hcDrE8q^rIuECklQff0YB?wDkk)k0sdm&*U~QdTbQ_C+ACkL{xmZ>M%96QD3{QN}Y z(5Fw~WSN|A3QzgOSIE!Jy|X$vubtIUB!S_n(Fz%uSgUXx83L}kw;{Iew{P$G_QaQ# zR>lQaE_s4oKh*zYdowjF>#nR4V@VN77>27dS_q>>&^Nr-`s>$MFZ_J`Gp~sH^Qf$+ zedBoiVD^<0-8^*4Zbzj$w--p^yZ2E4q8)sueBXQ2>55vIg9(O(p?NFT>iPB3u(tZ` zq@+FF{a)Ma(rt0L6!)uDdG`#Bn<6A@wu*nw7Qs(3(b}apYTZ*^KMMC@>8R$E+#@KS zpu08}?O(C^_>c*Nu$IK!Y*A3Ob2a% zc30O}%#wkb2^YcZCfQ}*Zu(kmb|#N1@M1z;`d-4vSC&dWjbYoVsm;d}qI3Q!Bx%{U zYh?*z_>*XzI^6*D5s~ za``a48)~B3)JVnRvbMUitXYuk>W!X#3$NSkDg^YHJZeXKv;k1zz1V*^C%RVSSN-v< z56L;ed5_&kZJcrU|GkoWgzC{f;WFpp0RSQ$$|_z(7^C1Ev*|$xWqY> zg|kLMno6ZN@C2887uW0yO}7G}qN+Lz%|F=(yqk>I+;=gtASZ^f#;rk@$Z%AM~drVp^ zFzggE6B<&0iP3jw`=wge++1mY2~*)7d4qwue)!)@HQ0QKs&6R%gh(4qU=;3+9xvzj z`|@|~+k@PE&E}WQDn%V1js5iWUyiC9$FPisJ{nAJ$TL1J*w0w(2$8_ts%rL9Zo#x{ zcypECm=@zyK`Qjly3APUJa?%;zJApjXHCtN!V8Cbsyd3+QN6HtR`j@F3$ILwU=I zpW(1w9)>ZXS10gzcc>2r?F9vRQ(6vh_v&+F#KFNasaqP(Hee0~!l>e&Gw9mvNkd~A9{Ckb{hR!d6 z2W4+TO8lb$_!kGz+DLx`bkia)qX45Z9&aXW^7^Aq36&8rW+5t20I2@M4NPWwuP;Fl z{=7*=RdqKzkhl2)n*FmZq%{GtyVr5CzeMS2D>!`;z2~VhIcq>mwL+q!xtX5wI^g*Z z`NYM<=;-S1z1P{_pQGUl?&Kd=oe2}Q1cs7 z;Q)JLXJ`Kewl(H-9pL=ffgkI4tgWqquwMp}-tZTEAQh~x9x>oa0q!E~9+=oupxpyT z@y{^WK@Qdjr~^kA^@~r!58=cnP|q#^Ap)q&Fc3EH$T>(P@v5iZ0Zb^MJ6)E>T5@Cd zV0asVsABk5DB~6d^-wOn8Vp>_KD)6T~P^Rk45HKCb0Ez_9zxj1IggdDx`o z)F7!(>kK6M*#w^Z^W}e#J~}yI$NY&s+Mg5@q^0Ws%n3r$`Yp(TLFWZI_d)<_FaIbn 
zmmn_h$KFQP9RYLc#oYf1%92-6p-&$K*g?eK23%(JlrZAffSiy1OU)2C;$UWf%m>$2 z7a8Ch{oX(vj95TF>6;zc`T@A15V~VCJ+^=xOOkYPc5v8l1z9Q7X3yDPU}{E{;aFP{ z+5^7iR*VP-&O;5YDAF=*e4S$Ed){lD1y@Z5{fpU105P^Emp4qLmUSTuzs7X zMS%1KoCN?xAv^{38Akx3bRc+V^Vl-H#`(=U9KG!ZtO_EKgu%YYX$(xnZt;DJb^zL- zKpS4?!$aJPi1q)hWd$Ia-ak1de-{jgPT2|A8{qR_JE{T~{^!FkOqU7Iy!K*?^4Mic zr0S|z352mko_N(375vL*sT{<=z;h-1&uyC}a9Fr+%v~lnpOQwPZnM+81SMk0S#12R zQ#`o6!QU^~ZCM%G;YYvFjGBvHdIF)LPAHs(aXMe>;yZI!TS67vM_|tuA8@?KcM>$L z`AxDqBP96`4R)sA#K{G4B&?l)&;!{R#Y>%@oDA`+s(5YqMFqbNUfT8Jk}QmT#{Yh) zRB<@#p;cF+AO&BG&`%N*aSbea7 zDVYp`C<)G2Oog)^P*CT}`XnUKDLohax%<$D?ngEl+6=WUKD6u|{0N>o zp(*p`dHmRl*W3_Wc}q-m{Vp&FMNpfE`mLzn+=1f;zH2b3oz=DxLj}*Ytg>2n{|SSJ zl1d+m#5vd_Q%VIp@s%XJLQsv#r^M({!A;HuLQ{BbQb)i0Sm~6AJj9M~F2Kbr>mCF694R+(v%bVQ34Uq1&cIf`>Q6 z$?eYi^O7p#FFvATD@UcjZq4MqR11FgOblXDcsewQDxy5R*{x=}Z5l!`U0d-r&En5Z z-5v}TfzxUTFanH^XLi6rA)4P&yF+r+qh(ZtG4uhHN$?5hI55bQcm=$eJV@{3pCI3e z(c(iJvfT>Hrk}^(p$j3bm|y;JHbpJqSD>-O#yzTus&iCrV7k&GGc%cNvx+-LS5#Nu z{-%gb(k?*m`wR}$NjzXy?fv?WY?3k|^I1CU@#FUbg^>>P5J_ea01|VArw)1X{oBPa)PW$;ivVLJ^)&Q5zlb#r`gWTuO(HY-sZ=gL(ZqD zuudw}8mij@dY39$>}KPH!u>WO<-a$FR9d%T>Gv0&)Bw>Z456A zC>{6I*E@Ob5QCBnaKaY(|G=kZ*z~ zfP3`!uJ7fgA7z(9i%3R=fzQ*^!Q-UV)B$DrCkCH;iiek_&{W8ju%KHa5kl!Yrm!}%vR18YFXVJP z8NuJIckhq}G^b@p%zk0Ms!0w-=-*v(xQkmkOr-kb_K2_T9G_u9ziPYT0LOs-C5pJi?$~;|Df<RM z(U!^Ct?%{l^Uw0~RAL3+1l0ZMCE$KQ4k!II5+LcQwe<=dnqw1rLMzI`Xl3SYGNZt9 zs!Oc)GKYGfZCY8?YGU@&>zu#CynJg9`kSZ8h>zT-_>;lt6mxNRAMQ~ng&#j|9uU|b zC*zn>S#pK_F?_)APW0{@y6=N4@#i4ID968MJl^qJ4bX&3tmc z)gOciVTK+1g%MdlpVrbF5K^j)4X&IjDDsYpl~z$r2|0MsYU#Gd#@71nijh{RQ%j2i z`FL5W)p-X{{S+pu>k)$Y@2}=)7i20KDLsD4$l~5!Kn-!G#T@O08T+*dhe1900-ks9 z{1Yf@q+eJHiY5Zg5Ax1A6jD5Ckka=ANsybZR7Qw4`0G>@A0Vfm=+5S;p{E-SNV z*|h>nvdxTm8e$q{`zf#&C}ZDwLV0GOBpc#^dn)OL86x<3fsb#td!pU)v@}NRiXLJL z9RRmnC}1mj?2dt061L3^j zF}kL^yN9ixkkF0 zd{bW+g03HhX>4t-s6j*!g7KDVr*Z3P+sdM>9}8))bd;QU8~KlugS);yDKGcV{Cw@% zObAQY)I(ZcJ_^Qrz_FG@nRNP00S*^;h#N(};~}1eLNoA@D|*sRw%`7ZQWJ&F}pW%!b|U literal 0 HcmV?d00001 From 
40f8edd180403b4265b68d92de8efac6f361498c Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Wed, 24 Dec 2025 05:59:47 -0800 Subject: [PATCH 81/87] initial add --- .../simple_workflow/snakemake_workflow/Makefile | 2 ++ 1 file changed, 2 insertions(+) create mode 100644 src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/Makefile diff --git a/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/Makefile b/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/Makefile new file mode 100644 index 0000000..c31e427 --- /dev/null +++ b/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/Makefile @@ -0,0 +1,2 @@ +graph: + snakemake --rulegraph --config output_dir=./output --cores 2 | dot -Tpng -Gdpi=300 > output/rulegraph.png From 08ce603b006ce557663167b04155af58865c2d0e Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Wed, 24 Dec 2025 08:34:34 -0800 Subject: [PATCH 82/87] remove .snakemake before running --- .../simple_workflow/snakemake_workflow/Makefile | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/Makefile b/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/Makefile index c31e427..2b72b47 100644 --- a/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/Makefile +++ b/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/Makefile @@ -1,2 +1,9 @@ +run: + -rm .snakemake + snakemake --cores 1 --config output_dir=~/data_unsynced/BCBS/simple_workflow/wf_snakemake + +report: + snakemake --report report.html --config output_dir=~/data_unsynced/BCBS/simple_workflow/wf_snakemake + graph: snakemake --rulegraph --config output_dir=./output --cores 2 | dot -Tpng -Gdpi=300 > output/rulegraph.png From bfc5de03366b7caa34e1b7cb56379e545cea3979 Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Wed, 24 Dec 2025 12:27:28 -0800 Subject: [PATCH 83/87] add expanduser to deal with relative paths --- 
.../snakemake_workflow/scripts/compute_correlation.py | 4 ++-- .../snakemake_workflow/scripts/download_data.py | 2 +- .../snakemake_workflow/scripts/filter_data.py | 4 ++-- .../simple_workflow/snakemake_workflow/scripts/join_data.py | 6 +++--- 4 files changed, 8 insertions(+), 8 deletions(-) diff --git a/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/scripts/compute_correlation.py b/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/scripts/compute_correlation.py index 5095888..33a29a1 100644 --- a/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/scripts/compute_correlation.py +++ b/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/scripts/compute_correlation.py @@ -12,8 +12,8 @@ def main(): """Compute Spearman correlation matrix.""" # ruff: noqa: F821 - input_path = Path(snakemake.input[0]) - output_path = Path(snakemake.output[0]) + input_path = Path(snakemake.input[0]).expanduser() + output_path = Path(snakemake.output[0]).expanduser() method = snakemake.params.method # Load data diff --git a/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/scripts/download_data.py b/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/scripts/download_data.py index 61474f2..abad205 100644 --- a/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/scripts/download_data.py +++ b/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/scripts/download_data.py @@ -9,7 +9,7 @@ def main(): """Download data from URL.""" # ruff: noqa: F821 url = snakemake.params.url - output_path = Path(snakemake.output[0]) + output_path = Path(snakemake.output[0]).expanduser() # Create output directory output_path.parent.mkdir(parents=True, exist_ok=True) diff --git a/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/scripts/filter_data.py b/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/scripts/filter_data.py index 325b818..6e5af11 100644 --- 
a/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/scripts/filter_data.py +++ b/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/scripts/filter_data.py @@ -12,8 +12,8 @@ def main(): """Filter data to numerical columns.""" # ruff: noqa: F821 - input_path = Path(snakemake.input[0]) - output_path = Path(snakemake.output[0]) + input_path = Path(snakemake.input[0]).expanduser() + output_path = Path(snakemake.output[0]).expanduser() # Load data df = pd.read_csv(input_path, index_col=0) diff --git a/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/scripts/join_data.py b/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/scripts/join_data.py index 73b915d..b484d92 100644 --- a/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/scripts/join_data.py +++ b/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/scripts/join_data.py @@ -10,9 +10,9 @@ def main(): """Join the two datasets.""" # ruff: noqa: F821 - mv_path = Path(snakemake.input.meaningful_vars) - demo_path = Path(snakemake.input.demographics) - output_path = Path(snakemake.output[0]) + mv_path = Path(snakemake.input.meaningful_vars).expanduser() + demo_path = Path(snakemake.input.demographics).expanduser() + output_path = Path(snakemake.output[0]).expanduser() # Load data meaningful_vars = pd.read_csv(mv_path, index_col=0) From 0aee5574177b15d593efea8bb612e2a2e97658c0 Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Wed, 24 Dec 2025 12:27:57 -0800 Subject: [PATCH 84/87] add log dir and onstart --- .../simple_workflow/snakemake_workflow/Snakefile | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/Snakefile b/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/Snakefile index 88104ce..789fd98 100644 --- a/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/Snakefile +++ b/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/Snakefile @@ 
-41,6 +41,12 @@ OUTPUT_DIR = Path(config["output_dir"]) DATA_DIR = OUTPUT_DIR / "data" RESULTS_DIR = OUTPUT_DIR / "results" FIGURES_DIR = OUTPUT_DIR / "figures" +LOGS_DIR = OUTPUT_DIR / "logs" + + +# Create output directories at workflow start +onstart: + shell(f"mkdir -p {DATA_DIR} {RESULTS_DIR} {FIGURES_DIR} {LOGS_DIR}") # Default target From 71f286f3535f77d92bbb8766f5f6e1512e3415f4 Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Wed, 24 Dec 2025 12:28:41 -0800 Subject: [PATCH 85/87] use DATADIR var instead of hard coding --- .../snakemake_workflow/Makefile | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/Makefile b/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/Makefile index 2b72b47..95954e9 100644 --- a/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/Makefile +++ b/src/BetterCodeBetterScience/simple_workflow/snakemake_workflow/Makefile @@ -1,9 +1,19 @@ -run: +OUTPUT_DIR := /Users/poldrack/data_unsynced/BCBS/simple_workflow/wf_snakemake + +.PHONY: run report graph + +clean: -rm .snakemake - snakemake --cores 1 --config output_dir=~/data_unsynced/BCBS/simple_workflow/wf_snakemake + -rm -rf $(OUTPUT_DIR)/* + +run: + snakemake --cores 1 --config output_dir=$(OUTPUT_DIR) + +dryrun: + snakemake --dry-run --cores 1 --config output_dir=$(OUTPUT_DIR) report: - snakemake --report report.html --config output_dir=~/data_unsynced/BCBS/simple_workflow/wf_snakemake + snakemake --report $(OUTPUT_DIR)/report.html --config output_dir=$(OUTPUT_DIR) graph: - snakemake --rulegraph --config output_dir=./output --cores 2 | dot -Tpng -Gdpi=300 > output/rulegraph.png + snakemake --rulegraph --config output_dir=$(OUTPUT_DIR) --cores 2 | dot -Tpng -Gdpi=300 > output/rulegraph.png From 3206f8d8bbe882683015f433f3a1ed0d35b38eaa Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Wed, 24 Dec 2025 12:30:31 -0800 Subject: [PATCH 86/87] add .snakemake --- 
.gitignore | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/.gitignore b/.gitignore index 977d7ee..c1df5e4 100644 --- a/.gitignore +++ b/.gitignore @@ -5,6 +5,10 @@ __pycache__ .hypothesis .env ._* +.snakemake + +# Workflow output directories +**/simple_workflow/*/output/ data exports From f7e942cb7706161b495e44924542dfa48273518e Mon Sep 17 00:00:00 2001 From: Russell Poldrack Date: Wed, 24 Dec 2025 12:31:34 -0800 Subject: [PATCH 87/87] close to a first draft. merging for now, will return to finish later --- book/workflows.md | 426 +++++++++++++++++++++++++++++++++++++++------- 1 file changed, 365 insertions(+), 61 deletions(-) diff --git a/book/workflows.md b/book/workflows.md index 075c0f8..4fea77a 100644 --- a/book/workflows.md +++ b/book/workflows.md @@ -32,7 +32,7 @@ It's worth noting that these different desiderata will sometimes conflict with o ## Pipelines versus workflows -The terms *workflow* and *pipeline* are sometimes used interchangeably, but in this chapter I will use them to refer to different kinds of applications. I will use *workflow* as the more general term to refer to any set of analysis procedures that are implemented as separate modules. I will use the term *pipeline* to refer more specifically to a data analysis workflow where several operations are combined into a single command through the use of *pipes*, which are a syntactic construct that feed the output of one process directly into the next process as input. Some readers may be familiar with pipes from the UNIX command line, where they are represented by the vertical bar "|". For example, let's say that we had a log file that contains the following entries: +The terms *workflow* and *pipeline* are sometimes used interchangeably, but in this chapter I will use them to refer to different kinds of applications. I will use *workflow* as the more general term to refer to any set of analysis procedures that are implemented as separate modules. 
I will use the term *pipeline* to refer more specifically to a data analysis workflow where several operations are combined into a single command through the use of *pipes*, which are a syntactic construct that feed the results of one process directly into the next process. Some readers may be familiar with pipes from the UNIX command line, where they are represented by the vertical bar "|". For example, let's say that we had a log file that contains the following entries: ```bash 2024-01-15 10:23:45 ERROR: Database connection failed @@ -58,6 +58,8 @@ where: - `sort -rn` sorts the rows in reverse numerical order (largest to smallest) - `> error_summary.txt` redirects the output into a file called `error_summary.txt` +Pipes are also commonly used in the R ecosystem, where they are a fundamental component of the *tidyverse* group of packages. + #### Method chaining One way that simple pipelines can be built in Python is using *method chaining*, where each method returns an object on which the next method is called; this is slightly different from the operation of UNIX pipes, where it is the result of each command that is being passed through the pipe. This is commonly used to perform data transformations in `pandas`, as it allows composing multiple transformations into a single command. As an example, we will work with the Eisenberg et al. dataset that we used in a previous chapter, to compute the probability of having ever been arrested separately for males and females in the sample. To do this we need to perform a number of operations: @@ -91,9 +93,38 @@ Name: EverArrested, dtype: float64 Note that `pandas` data frames also include an explicit `.pipe` method that allows using arbitrary functions within a pipeline. While these kinds of pipelines can be useful for simple data processing operations, they can become very difficult to debug, so I would generally avoid using complex functions within a method chain. 
+ +## FAIR-inspired practices for workflows + - FAIR workflows + - https://pmc.ncbi.nlm.nih.gov/articles/PMC10538699/ + - https://www.nature.com/articles/s41597-025-04451-9 + - this seems really heavyweight. + - 80/20 approach to reproducible workflows + - version control + documentation + - requirements file or container + - clear workflow structure + - standard file formats + - The full FAIR approach may be necessary in some contexts + +In the earlier chapter on Data Management I discussed the FAIR (Findable, Accessible, Interoperable, and Reusable) principles for data. Since those principles were proposed in 2016 they have been extended to many other types of research objects, including workflows (REFS - https://www.nature.com/articles/s41597-025-04451-9). The reader who is not an informatician is likely to quickly glaze over when reading these articles, as they ... + +Realizing that most scientists are unlikely to go to the lengths of a fully FAIR workflow, and preferring that the perfect never be the enemy of the good, I think that we can take an "80/20" approach, meaning that we can get 80% of the benefits for about 20% of the effort. We can adhere to the spirit of the FAIR Workflows principles by adopting the following practices, based in part on the "Ten Quick Tips for FAIR Workflows" presented by de Visser et al. (2023; https://pmc.ncbi.nlm.nih.gov/articles/PMC10538699): + +- *Metadata*: Provide sufficient metadata in a standard machine-readable format to make the workflow findable once it is shared. +- *Version control*: All workflow code should be kept under version control and hosted on a public repository such as GitHub. +- *Documentation*: Workflows should be well documented. Documentation should focus primarily on the scientific motivation and technical design of the workflow, along with instructions on how to run it and a description of the outputs. 
+- *Standard organization schemes*: Both the workflow files (code and configuration) and data files should follow established standards for organization. +- *Standard file formats*: The inputs and outputs to the workflow should use established standard file formats rather than inventing new formats. +- *Configurability*: The workflow should be easily configurable, and example configuration files should be included in the repository. +- *Requirements*: The requirements for the workflow should be clearly specified, either in a file (such as `pyproject.toml` or `requirements.txt`) or in a container configuration file (such as a Dockerfile). +- *Clear workflow structure*: The workflow structure should be easily understandable. + +There are certainly some contexts where a more formal structure adhering in detail to the FAIR Workflows standard may be required, as in large collaborative projects with specific compliance objectives, but these rough guidelines should get a researcher most of the way there. + + ## A simple workflow example -Most real scientific workflows are complex and can often run for hours, and we will encounter such a complex workflow later in the chapter. However, we will start our discussion of workflows with a relatively simple and fast-running example that will help us understand the basic concepts of workflow execution. We will use the same data as above (from Eisenberg et al.) to perform a simple workflow: +Most real scientific workflows are complex and can often run for hours, and we will encounter such a complex workflow later in the chapter. However, we will start our discussion of workflows with a relatively simple and fast-running example that will help demonstrate the basic concepts of workflow execution. We will use the same data as above (from Eisenberg et al.) 
to perform a simple workflow: - Load the demographic and meaningful variables files - Filter out any non-numeric variables from each data frame @@ -101,9 +132,11 @@ Most real scientific workflows are complex and can often run for hours, and we w - Compute the correlation matrix across all variables - Generate a clustered heatmap for the correlation matrix -### Running a simple workflow using UNIX make +I have implemented each of these components as a module [here](). The simplest possible workflow would be a script that simply imports and calls each of the methods in turn. For such a simple workflow this would be fine, but we will use the example to show how we might take advantage of more sophisticated workflow management tools. + +### Running a simple workflow using GNU make -One of the simplest ways to organize a workflow is using the UNIX `make` command, which executes commands defined in a file named `Makefile`. `make` is a very handy general-purpose tool that every user of UNIX systems should become familiar with. The Makefile defines a set of labeled commands, like this: +One of the simplest ways to organize a workflow is using the GNU `make` command, which executes commands defined in a file named `Makefile`. `make` is a very handy general-purpose tool that every user of UNIX systems should become familiar with. The Makefile defines a set of labeled commands, like this: ```Makefile @@ -169,23 +202,315 @@ The use of DAGs to represent workflows provides a number of important benefits: There are a couple of additional benefits to using a workflow engine, which we will discuss in more detail in the context of a more complex workflow. The first is that they generally deal automatically with the storage of intermediate results (known as *checkpointing*), which can help speed up execution when nothing has changed. The second is that the workflow engine uses the execution graph to optimize the computation, only performing those operations that are actually needed. 
This is similar in spirit to the concept of *lazy execution* used by packages like Polars, in which the system optimizes computational efficiency by first analyzing the full computational graph. -#### General-purpose versus domain-specific workflow engines +### General-purpose versus domain-specific workflow engines + +With the growth of data science within industry and research, there has been an explosion of new workflow management systems that aim to solve particular problems; a list of these can be found at [awesome-workflow-engines](https://github.com/meirwah/awesome-workflow-engines). It's also worth noting that there are a number of domain-specific workflow engines that are specialized for particular kinds of data and workflows. Examples include [Galaxy](https://galaxyproject.org/) which is specialized for bioinformatics and genomics, and [Nipype](https://nipype.readthedocs.io/en/latest/index.html) which is specialized for neuroimaging analysis workflows. If your research community uses one of these then it's worth exploring that engine as your first option, since it will probably be well supported within the community. However, a benefit of using a general-purpose engine is that they will often be better maintained and supported, and AI tools will likely have more examples to work from in generating workflows. + +### Workflow management using Snakemake + +We will use the Snakemake workflow system for our example, which I chose for several reasons: + +- It is a very well-established project that is actively maintained. +- It is Python-based, which makes it easy for Python users to grasp. +- Because of its long history and wide use, AI coding assistants are quite familiar with it and can easily generate the necessary files for complex workflows. + +Snakemake is a sort of "make on steroids", designed specifically to manage complex computational workflows. 
It uses a Python-like syntax to define the workflow, from which it infers the computational graph and optimizes the computation. The Snakemake workflow is defined using a `Snakefile`, the most important aspect of which is a set of rules that define the different workflow steps in terms of their outputs. Here is an initial portion of the `Snakefile` for our simple workflow: + +```Python +# Load configuration +configfile: "config/config.yaml" + +# Global report +report: "report/workflow.rst" + +OUTPUT_DIR = Path(config["output_dir"]) +DATA_DIR = OUTPUT_DIR / "data" +RESULTS_DIR = OUTPUT_DIR / "results" +FIGURES_DIR = OUTPUT_DIR / "figures" + +# Default target +rule all: + input: + FIGURES_DIR / "correlation_heatmap.png", +``` + +What this does is first specify the configuration file, which is a YAML file that defines various parameters for the workflow. Here are the contents of the config file for our simple example: + +```yaml +# Data URLs +meaningful_variables_url: "https://raw.githubusercontent.com/IanEisenberg/Self_Regulation_Ontology/refs/heads/master/Data/Complete_02-16-2019/meaningful_variables_clean.csv" +demographics_url: "https://raw.githubusercontent.com/IanEisenberg/Self_Regulation_Ontology/refs/heads/master/Data/Complete_02-16-2019/demographics.csv" + +# Correlation settings +correlation_method: "spearman" + +# Heatmap settings +heatmap: + figsize: [12, 10] + cmap: "coolwarm" + vmin: -1.0 + vmax: 1.0 +``` + +The only rule shown here is the `all` rule, which takes as its input the correlation figure that is the final output of the workflow. If snakemake is called and that file already exists, then the workflow won't be rerun (since it's the only requirement for the rule) unless 1) the `--forceall` flag is included, which forces rerunning the entire workflow, or 2) a rerun is triggered by one of the changes that Snakemake looks for (discussed more below). 
If the file doesn't exist, then Snakemake examines the additional rules to determine which steps need to be run in order to generate that output. In this case, it would start with the rule that generates the correlation figure: + +```python +# Step 5: Generate clustered heatmap +rule generate_heatmap: + input: + RESULTS_DIR / "correlation_matrix.csv", + output: + report( + FIGURES_DIR / "correlation_heatmap.png", + caption="report/heatmap.rst", + category="Results", + ), + params: + figsize=config["heatmap"]["figsize"], + cmap=config["heatmap"]["cmap"], + vmin=config["heatmap"]["vmin"], + vmax=config["heatmap"]["vmax"], + log: + OUTPUT_DIR / "logs" / "generate_heatmap.log", + script: + "scripts/generate_heatmap.py" +``` + +This step uses the `generate_heatmap.py` script to generate the correlation figure, and it requires the `correlation_matrix.csv` file as input. Snakemake would then work backward to identify which step is required to generate that file, which is the following: + +```python +# Step 4: Compute correlation matrix +rule compute_correlation: + input: + DATA_DIR / "joined_data.csv", + output: + RESULTS_DIR / "correlation_matrix.csv", + params: + method=config["correlation_method"], + log: + OUTPUT_DIR / "logs" / "compute_correlation.log", + script: + "scripts/compute_correlation.py" +``` + +By working backwards this way from the intended output, Snakemake can reconstruct the computational graph that we saw in #simpleDAG-fig. It then uses this graph to plan the computations that will be performed. + + +#### Snakemake scripts + +In order for Snakemake to execute each of our modules, we need to wrap those modules in a script that can use the configuration information from the config file. 
Here is an example of what the [generate_heatmap.py]() script would look like: + +```python +from pathlib import Path +import pandas as pd +from BetterCodeBetterScience.simple_workflow.visualization import ( + generate_clustered_heatmap, +) + +def main(): + """Generate and save clustered heatmap.""" + # ruff: noqa: F821 + input_path = Path(snakemake.input[0]) + output_path = Path(snakemake.output[0]) + figsize = tuple(snakemake.params.figsize) + cmap = snakemake.params.cmap + vmin = snakemake.params.vmin + vmax = snakemake.params.vmax + + # Load correlation matrix + corr_matrix = pd.read_csv(input_path, index_col=0) + print(f"Loaded correlation matrix: {corr_matrix.shape}") + + # Generate heatmap + output_path.parent.mkdir(parents=True, exist_ok=True) + generate_clustered_heatmap( + corr_matrix, + output_path=output_path, + figsize=figsize, + cmap=cmap, + vmin=vmin, + vmax=vmax, + ) + print(f"Saved heatmap to {output_path}") + +if __name__ == "__main__": + main() +``` + +You can see that the code refers to `snakemake` even though we haven't explicitly imported it; this is possible because Snakemake executes the script in an environment where it makes that object available, and the object contains the rule's inputs, outputs, and parameters. + +- Dry run + +```bash +Config file config/config.yaml is extended by additional config specified via the command line. +host: Russells-MacBook-Pro.local +Building DAG of jobs... +Job stats: +job count +----------------------------- ------- +all 1 +compute_correlation 1 +download_demographics 1 +download_meaningful_variables 1 +filter_demographics 1 +filter_meaningful_variables 1 +generate_heatmap 1 +join_datasets 1 +total 8 + +... 
(omitting intermediate output) + +Job stats: +job count +----------------------------- ------- +all 1 +compute_correlation 1 +download_demographics 1 +download_meaningful_variables 1 +filter_demographics 1 +filter_meaningful_variables 1 +generate_heatmap 1 +join_datasets 1 +total 8 + +Reasons: + (check individual jobs above for details) + input files updated by another job: + all, compute_correlation, filter_demographics, filter_meaningful_variables, generate_heatmap, join_datasets + output files have to be generated: + compute_correlation, download_demographics, download_meaningful_variables, filter_demographics, filter_meaningful_variables, generate_heatmap, join_datasets +This was a dry-run (flag -n). The order of jobs does not reflect the order of execution. +``` + +Once we have confirmed that everything is set up properly, we can then use `snakemake` to run the workflow: + +```bash +➤ snakemake --cores 1 --config output_dir=./output +Config file config/config.yaml is extended by additional config specified via the command line. +Assuming unrestricted shared filesystem usage. +host: Russells-MacBook-Pro.local +Building DAG of jobs... +Using shell: /bin/bash +Provided cores: 1 (use --cores to define parallelism) +Rules claiming more threads will be scaled down. +Job stats: +job count +----------------------------- ------- +all 1 +compute_correlation 1 +download_demographics 1 +download_meaningful_variables 1 +filter_demographics 1 +filter_meaningful_variables 1 +generate_heatmap 1 +join_datasets 1 +total 8 + +Select jobs to execute... +Execute 1 jobs... 
+ +[Wed Dec 24 08:17:57 2025] +localrule download_demographics: + output: output/data/demographics.csv + log: output/logs/download_demographics.log + jobid: 7 + reason: Missing output files: output/data/demographics.csv + resources: tmpdir=/var/folders/r2/f85nyfr1785fj4257wkdj7480000gn/T +Downloaded 522 rows from https://raw.githubusercontent.com/IanEisenberg/Self_Regulation_Ontology/refs/heads/master/Data/Complete_02-16-2019/demographics.csv +Saved to output/data/demographics.csv +[Wed Dec 24 08:17:58 2025] +Finished jobid: 7 (Rule: download_demographics) +1 of 8 steps (12%) done + +... (omitting intermediate output) + +8 of 8 steps (100%) done +Complete log(s): .snakemake/log/2025-12-24T081757.266320.snakemake.log +``` + +One handy feature of snakemake is that, just like `make`, we can give it a specific target file and it will perform only the portions of the workflow that are required to regenerate that specific file. -With the growth of data science within industry and research, there has been an explosion of new workflow management systems that aim to solve particular problems; a list of these can be found at [awesome-workflow-engines](https://github.com/meirwah/awesome-workflow-engines). One important distinction between engines is the degree to which the workflow definition is built into the code, or whether it is defined in a *domain-specific language* (DSL). We will look at two examples below, one of which (Prefect) builds the workflow details in the code, and the other (Snakemake) uses a specialized syntax built on Python to define the workflow. +#### Best practices for Snakemake workflows -It's also worth noting that there are a number of domain-specific workflow engines that are specialized for particular kinds of data and workflows. 
Examples include [Galaxy](https://galaxyproject.org/) which is specialized for bioinformatics and genomics, and [Nipype](https://nipype.readthedocs.io/en/latest/index.html) which is specialized for neuroimaging analysis workflows. If your research community uses one of these then it's worth exploring that engine as your first option, since it will probably be well supported within the community. However, a benefit of using a general-purpose engine is that they will often be better maintained and supported, and AI tools will likely have more examples to work from in generating workflows. +The Snakemake team has published a set of [best practices](https://snakemake.readthedocs.io/en/stable/snakefiles/best_practices.html) for the creation of Snakemake workflows, some of which I will outline here, along with one of my own (the first). -### Using the Snakemake workflow engine +##### Using a working directory - TBD +By default Snakemake looks for a Snakefile in the current directory, so it's tempting to run the workflow from the code repository. However, Snakemake creates a directory called `.snakemake` to store metadata in the directory where the workflow is run, which one generally doesn't want to mix with the code. Thus, it's best to create a working directory with its own copy of the config file (to allow local modifications), and then run the command from that directory using the +##### Workflow organization +There is a [standard format](https://snakemake.readthedocs.io/en/stable/snakefiles/deployment.html#distribution-and-reproducibility) for the organization of Snakemake workflow directories, which one should follow when developing new workflows. + +##### Snakefile formatting +Snakemake comes with a set of commands that help ensure that Snakemake files are properly formatted and follow best practices. First, there is a static analysis tool (i.e., a "linter", akin to ruff or flake8 for Python code), which can automatically identify problems with Snakemake rule files. 
Unfortunately, this tool assumes that one is using the Conda environment manager (which is increasingly being abandoned in favor of uv) or a container (which comes with substantial overhead), and it raises an issue for any rule that doesn't specify a Conda or container environment. Nonetheless, if those are ignored the linter can be useful in identifying problems. There is also a formatting tool called `snakefmt` (separately installed) that optimally formats Snakemake files in the way that `black` or `blue` format Python code. These can both be useful tools when developing a new workflow. + +##### Configurability +Workflow configuration details should be stored in configuration files, such as the `config.yaml` files that we have used in our workflow examples. However, these files should not be used for runtime parameters, such as the number of cores or the output directory; those should instead be handled using Snakemake's standard arguments. The initial workflow generated by Claude did not follow this guidance, and instead used custom variables to define runtime details such as the output directory and the number of cores. (TBD - CHECK THIS) + +#### Updating the workflow when inputs change + +Once the workflow has completed successfully, re-running it will not result in the re-execution of any of the analyses: + +```bash +snakemake --cores 1 --config output_dir=/Users/poldrack/data_unsynced/BCBS/simple_workflow/wf_snakemake +Config file config/config.yaml is extended by additional config specified via the command line. +Assuming unrestricted shared filesystem usage. +host: Russells-MacBook-Pro.local +Building DAG of jobs... +Nothing to be done (all requested files are present and up to date). +``` +However, Snakemake checks several features of the workflow (by default) when generating its DAG to see if anything relevant has changed. 
Specifically, it checks whether any of the following have changed (configurable using the `--rerun-triggers` flag): -- show how one can run snakemake with an output file name to reconstruct that file (using --force if it already exists) +- modification times of input files +- the code specified within the rule +- the input files or parameters for the rule -## Scaling to complex workflows +Snakemake also checks for changes in the details of the software environment, but as of the date of writing this only works for Conda environments. -We now turn to a more realistic and complex scientific data analysis workflow. For this example I will use an analysis of single-cell RNA-sequencing data to determine how gene expression in immune system cells changes with age. This analysis will utilize a [large openly available dataset](https://cellxgene.cziscience.com/collections/dde06e0f-ab3b-46be-96a2-a8082383c4a1) that includes data from 982 people comprising about 1.3 million immune system cells for about 35K transcripts. I chose this particular example for several reasons: +As an example, I will first update the modification time of the meaningful variables file from a previous successful run using the `touch` command: + +```bash +➤ ls -l data/meaningful_variables.csv +Permissions Size User Date Modified Name +.rw-r--r--@ 1.2M poldrack 24 Dec 10:11 data/meaningful_variables.csv + +➤ touch data/meaningful_variables.csv + +➤ ls -l data/meaningful_variables.csv +Permissions Size User Date Modified Name +.rw-r--r--@ 1.2M poldrack 24 Dec 10:14 data/meaningful_variables.csv +``` + +You can see that the `touch` command updated the modification time of the file. Now let's rerun the `snakemake` command: + +```bash +snakemake --cores 1 --config output_dir=/Users/poldrack/data_unsynced/BCBS/simple_workflow/wf_snakemake +Config file config/config.yaml is extended by additional config specified via the command line. +Assuming unrestricted shared filesystem usage. 
+host: Russells-MacBook-Pro.local +Building DAG of jobs... +Using shell: /bin/bash +Provided cores: 1 (use --cores to define parallelism) +Rules claiming more threads will be scaled down. +Job stats: +job count +--------------------------- ------- +all 1 +compute_correlation 1 +filter_meaningful_variables 1 +generate_heatmap 1 +join_datasets 1 +total 5 +``` + +Similarly, Snakemake will rerun the workflow if any of the scripts used to run the workflow are modified. However, it's important to note that it will not identify changes in the modules that are imported. In that case you would need to force a rerun of the workflow (for example, using the `--forcerun` or `--forceall` flags) in order to re-execute the relevant steps. + +## Scaling to a complex workflow + +We now turn to a more realistic and complex scientific data analysis workflow. For this example I will use an analysis of single-cell RNA-sequencing data to determine how gene expression in immune system cells changes with age. This analysis will utilize a [large openly available dataset](https://cellxgene.cziscience.com/collections/dde06e0f-ab3b-46be-96a2-a8082383c4a1) that includes data from 982 people comprising about 1.3 million peripheral blood mononuclear cells (i.e. white blood cells) for about 35K transcripts. I chose this particular example for several reasons: - It is a realistic example of a workflow that a researcher might actually perform. +- It has a large enough sample size to provide a robust answer to our scientific question. - The data are large enough to call for a real workflow management scheme, but small enough to be processed on a single laptop (assuming it has decent memory). - The workflow has many different steps, some of which can take a significant amount of time (over 30 minutes) - There is an established Python library ([scanpy](https://scanpy.readthedocs.io/en/stable/)) that implements the necessary workflow components. 
@@ -261,7 +586,7 @@ Here we will examine the first (recommended) option and the third solution; whil ### A workflow registry with checkpointing -We start with a custom approach in order to get a better view of the details of workflow orchestration. +We start with a custom approach in order to get a better view of the details of workflow orchestration. It's important to note that I generally would not recommend building one's own custom workflow manager, at least not before trying a general-purpose workflow engine, but I will show an example of a custom workflow engine in order to provide a better understanding of the detailed process of workflow management. We start with a prompt: > let's implement the recommended Stateless Workflow with Checkpointing. Please generate new code within src/BetterCodeBetterScience/rnaseq/stateless_workflow. @@ -359,105 +684,84 @@ Combining these strategies of reducing data duplication, eliminating some interm The use of a modular architecture for our stateless workflow helps to separate the actual workflow components from the execution logic of the workflow. One important benefit of this is that it allows us to plug those modules into any other workflow system, and as long as the inputs are correct it should work. We will see that next when we create new versions of this workflow using two common workflow engines. +### Managing a complex workflow with Snakemake -### Using a workflow engine +In general I recommend trying a general-purpose workflow engine instead of writing a custom one. In this example I will focus on the Snakemake workflow engine; in the project repository there is also a parallel workflow developed using the [Prefect] workflow engine (which is integrated directly into the Python code), but I will not discuss it here. + #rnaseq-DAG-fig shows the DAG generated by Snakemake for the RNA-seq analysis workflow. 
```{figure} images/snakemake-DAG.png -:label: DAG-fig +:label: rnaseq-DAG-fig :align: center :width: 300px The execution graph for the RNA-seq analysis workflow visualized as a DAG. ``` +The `Snakefile` is substantially larger for this complex workflow; to keep it from being even longer, we can also move some of the rules into separate rules files using the following syntax within the `Snakefile`: -### A language-specific workflow management example: Prefect - -- First build a very simple workflow example using Prefect - -#### Configuration management - -The initial version of the Prefect workflow generated by Claude had the default parameter settings for each workflow hard-coded into the task definitions in Python. This is not a great practice, since it means that any changes in parameter settings require changes in the code, making it difficult to easily test different parameter settings. A better practice is to put the parameter settings into a human-editable configuration file, which can be specified upon execution of the workflow. YAML is a common markup language for configuration files, which is used in both this workflow and the following one. - - - -### A general-purpose workflow management example: Snakemake +```python +# Include modular rule files +include: "rules/common.smk" +include: "rules/preprocessing.smk" +include: "rules/pseudobulk.smk" +include: "rules/per_cell_type.smk" +``` -- First build the simple example using snakemake #### Pipeline optimization -The first time that I ran my workflow using snakemake, I noticed that it was substantially slower than when I ran it with Prefect. When I asked Claude about this, it gave me a reasonable answer: +The first time that I ran this workflow using snakemake, I noticed that it was substantially slower than when I ran it using the custom workflow engine. When I asked Claude about this, it gave me a reasonable answer: > The Snakemake rules don't specify a threads directive. 
By default, Snakemake runs each job with 1 thread, which limits parallelization of computationally intensive operations like neighbor computation and UMAP (which use numba internally). I asked Claude to fix this, and it returned the following change: - > Fixes applied: > 1. Added threads: 8 to dimensionality_reduction rule > 2. Added threads: 8 to quality_control rule (Scrublet) > 3. Added threads: 4 to preprocess rule > 4. Set NUMBA_NUM_THREADS and OMP_NUM_THREADS environment variables in dimred.py -> In contrast, Prefect tasks run in the main process with access to all CPUs by default, which is why it was faster. +> In contrast, tasks run in the main process have access to all CPUs by default, which is why it was faster. This solves the problem but it's a brittle soluution: in particular, it will probably fail if there are fewer than 8 threads available on the system and it won't take advantage of more than 8 if they are available. Snakemake actually take a command line argument (`--cores`) to specify the number of cores to use, so I instead asked Claude to have Snakemake use the number of cores specified at the command line rather than an arbitrary number that might not be optimal. We will discuss optimization in much greater detail in a later chapter, but whenever a pipeline takes much longer to run using a workflow manager than one would expect, it's likely that there is optimization to be done. +#### Running snakemake -## Tracking provenance +It's important to know that when snakemake is run, it stores metadata regarding the workflow in a hidden directory called `.snakemake`. It's generally a good idea to add this to the `.gitignore` file since one probably doesn't want to include detailed workflow metadata in one's git repository. It's also a best practice to execute the +#### Report generation +One of the very handy features of Snakemake is its ability to generate reports for workflow execution. 
Report generation is as simple as: -### Configuration management - -- how to configure a workflow +```bash +snakemake --report $DATADIR/immune_aging/wf_snakemake/report.html --config datadir=$DATADIR/immune_aging/ +``` - - configuration files - - command line arguments - - defaults +This command uses the metadata stored in the .snakemake -- interaction with provenance +#### Tracking provenance -- discuss fit-transform model somewhere +As I discussed in the earlier chapter on data management, it is essential to be able to track the provenance of files in a workflow. That is, how did the file come to be, and what other files did it depend on? -https://workflowhub.eu/ +#### Parametric sweeps +A common pattern in some computational research domains is the *parametric sweep*, where a workflow is run using a range of values for specific parameters in the workflow. A key to successful execution of parametric sweeps is proper organization of the outputs so that they can be easily processed by downstream tools. Snakemake provides the ability to easily implement parametric sweeps simply by specifying a list of parameter values in the configuration file. For example... TBD -## Error handling and robustness +## Testing workflows -- Fail fast -- Gracefully handle missing data -- Checkpointing for long-running workflows - write tests for common edge cases - use a small toy dataset for testing - unit vs integration tests -## Logging - - -## Report generation - - ## Scaling workflows - maybe leave this to the HPC chapter? +## Choosing a workflow engine -## FAIR-inspired practices for workflows - - FAIR workflows - - https://pmc.ncbi.nlm.nih.gov/articles/PMC10538699/ - - https://www.nature.com/articles/s41597-025-04451-9 - - this seems really heavyweight. 
- - 80/20 approach to reproducible workflows - - version control + documentation - - requirements file or container - - clear workflow structure - - standard file formats - - The full FAIR approach may be necessary in some contexts -