Develop update for Oct 2025 by djsmith17 · Pull Request #169 · ncats/multiplex-analysis-web-apps

djsmith17 · 2025-10-21T19:40:19Z

This pull request introduces several enhancements and improvements across the environment configuration and multiple Streamlit application pages, focusing on environment reproducibility, data export functionality, and usability for cluster/heatmap analysis. The most important changes are summarized below.

Environment and Dependency Management:

Refactored .envs/maestro/meta.yaml to update, pin, and reorganize dependencies for improved reproducibility and compatibility, including explicit versioning for key packages and consolidating requirements between conda and pip.

Neighborhood Profiles Export and Usability:

Enhanced the neighborhood profiles workflow in pages2/Neighborhood_Profiles.py and basic_phenotyper_lib.py to allow users to export cluster comparison results as CSV files directly to the output folder, both for individual and batch (subplot) analyses. This includes capturing the computed dataframes, managing file suffixes, and providing UI buttons for saving results.
Improved cluster list management to dynamically include "Average Left" and "Average Right" when cluster difference toggles are enabled, ensuring correct options are available throughout the workflow.
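
A minimal sketch of the two behaviours described above, not the code from this pull request. The function names (build_cluster_options, export_comparison_csv), the one-toggle-per-side arguments, the suffix argument, and the file-naming pattern are all assumptions made for illustration; only the "Average Left" and "Average Right" labels come from the description.

```python
import csv
from pathlib import Path


def build_cluster_options(clusters, diff_left=False, diff_right=False):
    """Return selectable cluster labels, adding average entries when toggled.

    Hypothetical helper: the real toggle names and logic are not shown in the PR text.
    """
    options = [str(c) for c in clusters]
    if diff_left:
        options.append("Average Left")
    if diff_right:
        options.append("Average Right")
    return options


def export_comparison_csv(rows, output_dir, base_name, suffix=""):
    """Write a list of dict rows to <output_dir>/<base_name>[_<suffix>].csv.

    Hypothetical helper: the naming scheme is an assumption.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    stem = f"{base_name}_{suffix}" if suffix else base_name
    path = output_dir / f"{stem}.csv"
    fieldnames = list(rows[0].keys()) if rows else []
    with path.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return path
```

Keeping the option list in one function means every widget that offers clusters sees the same labels, and passing the suffix explicitly lets individual and subplot exports share one writer.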

Average Heatmap Data Export:

Added an option in pages2/Tool_parameter_selection.py for users to enable saving of slide-averaged heatmap data, and propagated this setting through the workflow. The pages2/Display_average_heatmaps.py page now provides a button to zip and export all relevant heatmap data CSVs, improving downstream data accessibility.
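
As a rough illustration of the zip-and-export step, the sketch below bundles every CSV under a directory into one archive using only the standard library. The function name zip_heatmap_csvs and the directory layout are assumptions; the actual page may collect and name files differently.

```python
import zipfile
from pathlib import Path


def zip_heatmap_csvs(heatmap_dir, zip_path):
    """Bundle every CSV found under heatmap_dir into one zip archive.

    Hypothetical helper; returns the number of files written.
    """
    heatmap_dir = Path(heatmap_dir)
    # Collect the files first so the archive itself is never picked up.
    csv_files = sorted(heatmap_dir.rglob("*.csv"))
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for csv_file in csv_files:
            # Store paths relative to heatmap_dir so the archive unpacks cleanly.
            zf.write(csv_file, arcname=csv_file.relative_to(heatmap_dir).as_posix())
    return len(csv_files)
```

In a Streamlit page, a helper like this would typically be called from a button handler, with the returned count used to report how many files were archived.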

Miscellaneous Improvements:

Updated dashboard publishing logic to support and enforce the use of resourcesV2 in .publish-dashboards.py.
Minor code cleanup and import fixes, such as adding a missing import in pages2/feature_creation.py.
Removed obsolete or commented-out UI code to streamline the interface in pages2/Neighborhood_Profiles.py.

These changes collectively improve the reproducibility, usability, and data export capabilities of the application, making it easier for users to manage, analyze, and extract results from their workflows.

Copilot

Pull Request Overview

This PR updates the project for October 2025, including dependency version updates, new functionality for saving heatmap data, and improvements to the neighborhood profile visualization workflow.

Updated parent template version and various package dependencies in the conda environment
Added capability to save heatmap CSV data alongside visualizations
Enhanced neighborhood profile functionality with CSV export options and improved cluster comparison handling

Reviewed Changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated no comments.

Show a summary per file

File	Description
templateConfig.json	Updated parent template version from 0.231.0 to 0.243.0
.envs/maestro/meta.yaml	Comprehensive update of conda environment dependencies with specific version pinning
utils.py	Added filtering to exclude CSV files from heatmaps directory listing
time_cell_interaction_lib.py	Added functionality to save heatmap data as CSV files and character replacement in species names
pages2/Tool_parameter_selection.py	Added UI control for enabling heatmap data saving
pages2/Run_workflow.py	Integrated new save_heatmap_data parameter into workflow execution
pages2/Display_average_heatmaps.py	Added UI for creating zip file of saved heatmap CSV files
pages2/Neighborhood_Profiles.py	Enhanced neighborhood profile with CSV export capability and removed commented code
basic_phenotyper_lib.py	Modified draw_neigh_profile_fig to return density dataframe and added cluster labels
pages2/feature_creation.py	Changed dataframe display to show sampled data with resample button
pages2/multiaxial_gating.py	Converted columns index to list for selectbox options
.publish-dashboards.py	Added support for resourcesV2 configuration
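
The utils.py row above mentions excluding CSV files from the heatmaps directory listing, presumably so that saved data files do not appear among the heatmap images. A minimal sketch of that kind of filter follows; the function name list_heatmap_images is hypothetical and the real implementation is not shown in this pull request text.

```python
from pathlib import Path


def list_heatmap_images(heatmap_dir):
    """Return sorted file names in heatmap_dir, skipping saved CSV data.

    Hypothetical helper illustrating the filter, not the code in utils.py.
    """
    return sorted(
        p.name
        for p in Path(heatmap_dir).iterdir()
        if p.is_file() and p.suffix.lower() != ".csv"
    )
```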

…e importantly, write the range index to the in-memory cells dataframe prior to writing it so that we can later scan it using streaming and be able to trust that the added index sumap_cell_index is always consistent. Pick up with going through assign_neighborhood_types.py and rest of HP workflow to assess usage of indices therein.

…et files to disk (two instances total) so that subsequent reads of the intermediate files don't need .with_row_index() after scanning, which could lead to different row indices each time a scan is done if the streaming engine is used. Also, clean up all files, particularly names and usages of indices in order to make things more clear.

…ithful_columns) a frame that only has one unique input_index per row: the unified lazyframe, as opposed to lf_phenotyped which in the marker case could have duplicated input_indices. Just in case I forget, add a guard for uniqueness on that RHS in the plotting function itself right at the join. Also add a note that the guard isn't in place for a pandas df lookup table, but hopefully we'd only ever use polars anyway.

djsmith17 requested a review from Copilot October 21, 2025 19:40

djsmith17 self-assigned this Oct 21, 2025

Copilot AI reviewed Oct 21, 2025

djsmith17 requested a review from andrew-weisman October 21, 2025 19:43

andrew-weisman added 26 commits October 27, 2025 22:51

Add plausible node types for the frontend and worker pools.

6eba2b8

Fix bug in comment in resource plot script.

f1ca5d2

Add comment.

9f6975d

Ensure the user role can control the compute resources.

1bb9427

Change ownership of the apps appropriately.

7d14ce2

Add string of resource types to later push into an env var, and delete already addressed statements.

beb72de

Add everything required to configure the images/repository, including adding the app_a_app_db database to local Docker postgresql.

683a017

Complete first page of ideal SQL aside from the three main service/Streamlit creations.

b1457fa

Stab at asynchronizing cell density analysis in Neighborhood Profiles. Untested but should come close to working if not working out of the box.

a2a4421

Create additional objects in both Snowflake and local (Docker) as appropriate per the Lucidchat diagram.

1e022c8

Add two stages to hold specs/scripts for the services/launcher.

e576b94

Work out the service spec template and definition.

ffef1aa

Work on job service definition with general resource selection.

af74edd

Continue adding functionality to select and log worker compute resource; add fields to tables for consistency and correctness.

17540d5

Continue migrating to more robust RBAC schema and allow dynamic launching of workers with arbitrary compute resources.

59114b1

Allow the app role to suspend the frontend compute pools as would be needed when pressing the shutdown button; complete ideal, RBAC, and dynamic snowflake_orchestrator.py; start comprehensive README for Snowflake deployment.

8f37a8f

Accordingly define the seven frontend services.

7e2c76e

Work on complete deployment README for both local and Snowflake.

dfee4ff

Allow user role to even access the frontend services and to monitor and operate on them. Start minimally modifying the launcher.

6df7575

Finish copying over to 01....sql from 03....sql and 04....sql.

985d168

Clean up deploy/snowflake directory.

3a2f8ba

Allow the launcher to determine a user group and update the launcher code for the new organization scheme.

3e52739

Complete generalized app launcher and rename the to-define-in-future data manager service.

10e265a

Remove some scratch code from the launcher and clean up a bit.

8afbf55

Update diagrams, combine READMEs into the main one.

4d07652

Set up the leandro environment in the Streamlit Dockerfile.

8f043eb

andrew-weisman added 30 commits December 10, 2025 23:29

Flush debugging output in fast_neighbors_counts_for_block2() to get feedback for large datasets; cache the two value counts functionalities in phenotype.py.

d497263

Cache the value_counts() generation and include a cache-buster... seems to work locally with small dataset, but not on Snowflake with a large one though. That might be because I didn't call .collect(), which I'm doing now.

1fb3099

Explore what if anything can be done to speed up the bottleneck between BBBB and CCCC when operating on large images nearing 1M cells each.

9126b8f

Parametrize using pandas or numpy in fast_neighbors_counts_for_block2 (numpy leads to 17% faster for 400k cells, 75s to 62s). Confirmed that changing to ckdtree doesn't make a difference, but parametrized that anyway.

7e7a579

Start making changes to subset neighbor trees but also eliminated accidental leave-in of old code that populated df_counts_holder twice.

277fc19

Chunking the neighbors does speed up query ball *but* the data structure population slows down, resulting in a break even. So, for now, not using neighbors chunking.

2be9756

Switch default from pandas to numpy. Confirmed methods yield the same results, even when some phenotypes are missing. Didn't port to version of fast counts in fnp_main, but about to. So we still get a round 17% speedup with no change in accuracy.

9854c0d

Implement updated fast neighbors counts function in main codebase so we can pull into full-stack branch.

053fc30

Delete scratch Jupyter notebook (well, move to archive/).

46c7520

Use streaming in lf.collect() statements everywhere in the fast_neighborhood_profiles workflow (changed five files total) since test showed for one of those statements that this prevented a container crash.

ac99be5

Stream file writes and don't duplicate objects in memory as best as possible.

3a3b155

Cast coordinate columns to float32 in plot_image_from_frame(). Doesn't actually reduce size of final figure but that's somewhat expected.

b946340

Fix call to time().

7ec9829

Organize fast framework main.py module; add incomplete delete_cells.py page prior to realizing that species phenotyping causes plotly redraws and investigating that.

e6092c9

Optimize streaming .collects() relative to functions such as .sort() that will change streaming to in-memory.

b9b5e78

Implement robust solution for plotly plots resetting and completely redrawing due to streaming; sort by an index so plotly sees the same figure between reruns.

ed785a6

Re-optimize polars operations a bit.

ed0e4ca

Allow everything to be fully rebuilt/recalculated from core components when loading previously calculated results.

00fc68b

Stop checking for valid session state values since we're about to implement more robust clearing of session state keys when jobs (via buttons) are run.

843df2d

Store counts of phenotypes in full dataset and in each image in the session state rather than in the cache.

41e1cc5

Add calls to clear data in memory appropriately in the first two files of the workflow (load_unified_input_file.py and phenotype.py) and set as the exported figure name the format similar to what Robert wanted.

d82bb62

Add calls to clear data in memory appropriately in the rest of the workflow.

513e5b9

Create main.clear_data_in_memory().

cc7dde3

Uncomment out the two reruns in the full stack framework to make feedback in the app more apparent (loading of an archive and submission of a job).

a3fd022

Add version of snowflake_connections.py from commit 3151d3643acd855aa6e5d1015534684691159b2c of git@github.com:CBIIT/snowflake-knowledge-hub.git that allows Snowflake connections the same way whether from SPCS or from a local machine.

cac5c3b

Add warnings to the main workflow node buttons that warn the user that downstream results will be overwritten.

aad7b4b

Merge branch 'full-stack' into develop

069cdbf

Develop update for Oct 2025 #169
djsmith17 wants to merge 343 commits into main from develop

2 participants
