
2025 12 16 improve performance #42

Merged
julenmendieta merged 13 commits into main from 2025-12-16-improve-performance on Feb 16, 2026


@julenmendieta (Contributor)

No description provided.

@gemini-code-assist

Summary of Changes

Hello @julenmendieta, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request overhauls the project's technical foundation by modernizing its build pipeline and dependency management, which should make the codebase more robust and maintainable. It also speeds up data loading and reduces memory use in the dimensionality-reduction scripts, adds Highly Variable Genes (HVG) selection for the analysis, and fixes random seeds so that results are reproducible.

Highlights

  • Build System and Dependency Refactor: The project's build system has been significantly refactored and migrated to a standardized monorepo setup. This includes adopting @milaboratories/ts-builder for TypeScript projects, consolidating ESLint and TypeScript configurations via @milaboratories/ts-configs, and updating turbo.json for streamlined task management (lint, type-check, build, pack). The obsolete tsup and vite build configurations have been removed.
  • Performance and Memory Optimization in Python Scripts: The core Python scripts (batch_correction.py and calculate_dim_reduction.py) have undergone major performance and memory efficiency improvements. This was achieved by replacing pandas with polars for initial CSV data loading and processing, and by optimizing the creation of sparse matrices using direct integer codes from categorical data. This reduces memory footprint and speeds up data ingestion.
  • Highly Variable Genes (HVG) Feature: A new option enables Highly Variable Genes (HVG) selection. Users can now restrict dimensionality reduction to a specified number of HVGs. This can substantially reduce computation time and focuses the analysis on the genes that vary most across cells, which are usually the most informative ones. The option is exposed in the UI and integrated into the Python workflows.
  • Reproducibility Enhancements: Random seeds (np.random.seed(0)) are now set explicitly in the Python scripts for the PCA, UMAP, and tSNE calculations. This should make the results of these stochastic algorithms reproducible across runs, provided the implementations draw from NumPy's global random state rather than from their own seeded generators.
  • Dependency Updates and Standardization: The pnpm-workspace.yaml file has been updated to pin SDK packages to exact versions and to add new SDK packages. Common dependencies such as typescript, turbo, vitest, and eslint have also been updated to newer versions for consistency across the workspace.
  • Block Structure and Output Handling: The block's internal structure has been updated with new block/index.d.ts and block/index.js files. The model's output definitions in model/src/index.ts now use .outputWithStatus for UMAP and tSNE, which changes how the status of these outputs is reported.
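The performance bullet describes building the sparse matrix directly from the integer codes of categorical columns, instead of pivoting a dense table. The following is a minimal, dependency-free sketch of that idea. It assumes long-format input with one row per (cell, gene, value) and unique (cell, gene) pairs. `build_csr` is a hypothetical helper and is not the code in batch_correction.py or calculate_dim_reduction.py. In the real scripts, polars categorical codes (for example via `Series.to_physical()`) and a scipy.sparse constructor would presumably fill these roles. `scipy.sparse.csr_matrix((data, indices, indptr), shape=...)` accepts arrays of exactly this form.

```python
from array import array


def build_csr(cells, genes, values):
    """Build CSR arrays for a cells x genes matrix from long-format rows.

    Hypothetical helper: each category label is mapped once to a dense
    integer code (first-seen order), and the codes are used directly as
    row/column indices. Assumes each (cell, gene) pair appears at most once.
    """
    cell_codes = {}
    gene_codes = {}
    rows = [cell_codes.setdefault(c, len(cell_codes)) for c in cells]
    cols = [gene_codes.setdefault(g, len(gene_codes)) for g in genes]

    n_rows = len(cell_codes)
    # Count stored values per row, then prefix-sum the counts into indptr.
    counts = [0] * n_rows
    for r in rows:
        counts[r] += 1
    indptr = array("q", [0] * (n_rows + 1))
    for i, c in enumerate(counts):
        indptr[i + 1] = indptr[i] + c

    nnz = len(rows)
    indices = array("q", [0] * nnz)
    data = array("f", [0.0] * nnz)  # 'f' is a 32-bit float, like np.float32
    cursor = list(indptr[:-1])  # next free slot in each row
    for r, c, v in zip(rows, cols, values):
        pos = cursor[r]
        indices[pos] = c
        data[pos] = float(v)
        cursor[r] += 1
    # dict keys keep insertion order, so position == integer code
    return indptr, indices, data, list(cell_codes), list(gene_codes)


indptr, indices, data, cell_names, gene_names = build_csr(
    ["c1", "c1", "c2"], ["g1", "g2", "g2"], [1.0, 2.0, 3.0]
)
print(list(indptr), list(indices), list(data))  # [0, 2, 3] [0, 1, 1] [1.0, 2.0, 3.0]
```

Each label is hashed once, and the matrix is filled in a single pass over the rows. No dense cells x genes table is ever materialized, and that table is the main memory cost this approach avoids.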


gemini-code-assist bot left a comment


Code Review

This pull request introduces significant performance and memory efficiency improvements, primarily by refactoring the Python data processing scripts to use polars and a more efficient method for creating sparse matrices. A new feature for selecting Highly Variable Genes (HVG) has been added across the stack (backend, model, UI), which further enhances performance and analytical focus. The build system and dependencies have also been substantially updated for better maintainability. My review focuses on ensuring the new logic is robust and consistent. I've identified a couple of minor issues: a misleading log message and an opportunity for further memory optimization by using a consistent data type. I also noted a potential UI improvement for a more robust user experience. Overall, this is a high-quality contribution that significantly improves the block.
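The HVG option restricts dimensionality reduction to the most variable genes before PCA. The PR does not say which score the Python scripts use to rank genes. scanpy's `sc.pp.highly_variable_genes`, for example, offers several flavors, such as the dispersion-based 'seurat' and 'cell_ranger'. The sketch below therefore substitutes plain per-gene variance as a simplified score. `select_top_variable_genes` is a hypothetical helper. Its `n_top > n_pcs` check mirrors the UI rule that the HVG count must exceed the number of PCs.

```python
from statistics import pvariance


def select_top_variable_genes(matrix, gene_names, n_top, n_pcs):
    """Return the names of the n_top genes with the highest variance.

    Hypothetical helper. `matrix` is a dense list of cells (rows) x genes
    (columns). Plain population variance stands in for a dispersion-based
    HVG score.
    """
    # Same constraint as the UI: the HVG count must be greater than nPCs.
    if n_top <= n_pcs:
        raise ValueError(f"n_top ({n_top}) must be greater than n_pcs ({n_pcs})")
    n_genes = len(gene_names)
    if n_top >= n_genes:
        return list(gene_names)
    scores = [pvariance([row[j] for row in matrix]) for j in range(n_genes)]
    # Rank gene indices by descending variance (stable sort keeps ties in order).
    ranked = sorted(range(n_genes), key=lambda j: -scores[j])
    keep = sorted(ranked[:n_top])  # restore the original column order
    return [gene_names[j] for j in keep]


counts = [
    [0, 5, 1, 2],
    [0, 1, 1, 8],
    [0, 9, 1, 0],
]
print(select_top_variable_genes(counts, ["a", "b", "c", "d"], n_top=2, n_pcs=1))  # ['b', 'd']
```

Genes "a" and "c" are constant across cells and are dropped. Only the two variable genes would be passed on to PCA.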


Priority: medium

This log message refers to ComBat correction, but the script has been updated to only use Harmony and no longer performs ComBat correction. This message is misleading and should be removed to avoid confusion.

Contributor Author


Right


Priority: medium

The sparse matrix is being created with dtype=np.float64. While this provides higher precision, scanpy and many single-cell analysis tools typically operate on np.float32 to conserve memory, which is particularly important for large datasets. The accompanying batch_correction.py script correctly uses np.float32. For consistency and to align with the performance goals of this PR, it would be better to use np.float32 here as well.

Suggested change:
- dtype=np.float64
+ dtype=np.float32
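To put the reviewer's point in numbers: halving the value width halves the memory used by a sparse matrix's stored values. The index arrays (indptr, indices) keep their own integer dtype and are unaffected. The quick stdlib sketch below uses Python's `array` typecodes 'd' (C double, the same width as np.float64) and 'f' (C float, the same width as np.float32). The 50 million nonzeros is an illustrative figure, not a number measured in this PR.

```python
from array import array


def data_bytes(n_values, typecode):
    """Bytes needed to store n_values floats of the given array typecode."""
    return n_values * array(typecode).itemsize


nnz = 50_000_000  # illustrative count of stored nonzero values
print(data_bytes(nnz, "d"))  # 64-bit floats, like np.float64: 400000000
print(data_bytes(nnz, "f"))  # 32-bit floats, like np.float32: 200000000
```

On typical platforms this prints 400000000 and 200000000. The gap scales linearly with the number of nonzeros, which is why the reviewer ties the dtype choice to large datasets.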

Contributor Author


Right

Comment on lines +220 to +226
<PlNumberField
v-if="app.model.args.hvgEnabled"
v-model="app.model.args.hvgCount"
label="Number of HVG"
:min-value="app.model.args.nPCs + 1"
:step="100"
/>


Priority: medium

The min-value for the "Number of HVG" field is correctly bound to app.model.args.nPCs + 1. However, this only enforces the minimum at the UI level. If a user sets hvgCount and then increases nPCs to hvgCount or higher, the model value for hvgCount becomes invalid but is not adjusted automatically. This could lead to downstream errors.

For a more robust user experience, consider adding a watcher on app.model.args.nPCs to automatically adjust hvgCount if it becomes invalid.

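import { watch } from 'vue';

// The :min-value binding only constrains the input field, so keep the model
// consistent whenever nPCs changes: hvgCount must stay strictly above nPCs.
// Assumes `app` (the block's app instance) is already in scope in this component.
watch(() => app.model.args.nPCs, (newNPCs) => {
  if (app.model.args.hvgEnabled && app.model.args.hvgCount <= newNPCs) {
    app.model.args.hvgCount = newNPCs + 1;
  }
});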

@julenmendieta julenmendieta added this pull request to the merge queue Feb 16, 2026
Merged via the queue into main with commit 822950a Feb 16, 2026
8 checks passed
@julenmendieta julenmendieta deleted the 2025-12-16-improve-performance branch February 16, 2026 08:57