2025 12 16 improve performance by julenmendieta · Pull Request #42 · platforma-open/dimensionality-reduction

julenmendieta · 2026-01-13T10:20:05Z

No description provided.


gemini-code-assist · 2026-01-13T10:20:45Z

Summary of Changes

Hello @julenmendieta, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly overhauls the project's technical foundation by modernizing its build pipeline and dependency management, leading to a more robust and maintainable codebase. Concurrently, it delivers substantial performance gains in data processing for dimensionality reduction tasks and introduces a key feature for biological analysis, all while enhancing the reproducibility of results.

Highlights

Build System and Dependency Refactor: The project's build system has been refactored into a standardized monorepo setup. This includes adopting @milaboratories/ts-builder for the TypeScript projects, consolidating ESLint and TypeScript configurations via @milaboratories/ts-configs, and updating turbo.json to streamline the lint, type-check, build, and pack tasks. Obsolete tsup and Vite build configurations have been removed.
Performance and Memory Optimization in Python Scripts: The core Python scripts (batch_correction.py and calculate_dim_reduction.py) have undergone major performance and memory efficiency improvements. This was achieved by replacing pandas with polars for initial CSV data loading and processing, and by optimizing the creation of sparse matrices using direct integer codes from categorical data. This reduces memory footprint and speeds up data ingestion.
Highly Variable Genes (HVG) Feature: A new option enables Highly Variable Genes (HVG) selection. Users can now restrict the dimensionality reduction analysis to a specified number of HVGs, which can substantially speed up computation and focuses the analysis on the genes that vary most across cells. The option is exposed in the UI and integrated into the Python workflows.
Reproducibility Enhancements: Random seeds (np.random.seed(0)) are now set explicitly in the Python scripts for the PCA, UMAP, and tSNE calculations, so that these stochastic algorithms produce the same results across runs. Note that np.random.seed seeds only NumPy's global random state; functions that receive their own random_state argument are controlled by that argument instead.
Dependency Updates and Standardization: The pnpm-workspace.yaml file now pins SDK packages to exact versions and adds new SDK packages. Common dependencies such as typescript, turbo, vitest, and eslint have also been bumped to newer versions.
Block Structure and Output Handling: The block's internal structure has been updated with new block/index.d.ts and block/index.js files. The model's output definitions in model/src/index.ts now use .outputWithStatus for UMAP and tSNE, indicating an enhancement in how output statuses are managed.
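The sparse-matrix optimization in the Performance and Memory Optimization highlight above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the function name `long_to_csr` and the long-format (cell, gene, value) input layout are assumptions, and `numpy.unique(..., return_inverse=True)` stands in for the categorical codes the scripts reportedly take from Polars. The idea is the same: every row's matrix coordinates come from precomputed integer codes, so the matrix is assembled in one vectorized call instead of through per-row label lookups.

```python
import numpy as np
import scipy.sparse as sp


def long_to_csr(cells, genes, values, dtype=np.float32):
    """Build a cells x genes CSR matrix from long-format (cell, gene, value) triples.

    Hypothetical helper. np.unique(..., return_inverse=True) plays the role of
    categorical codes: it returns the sorted unique labels plus, for every input
    row, the integer position of that row's label.
    """
    cell_labels, cell_idx = np.unique(np.asarray(cells), return_inverse=True)
    gene_labels, gene_idx = np.unique(np.asarray(genes), return_inverse=True)
    matrix = sp.coo_matrix(
        (np.asarray(values, dtype=dtype), (cell_idx, gene_idx)),
        shape=(len(cell_labels), len(gene_labels)),
    ).tocsr()  # duplicate (cell, gene) pairs are summed during the conversion
    return matrix, cell_labels, gene_labels


X, cell_names, gene_names = long_to_csr(["c1", "c2", "c1"], ["g2", "g1", "g1"], [3, 1, 2])
print(X.toarray())
```

Rows and columns follow the sorted label order, so the returned label arrays are what you would use as the cell and gene annotations of the matrix.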


gemini-code-assist

Code Review

This pull request introduces significant performance and memory efficiency improvements, primarily by refactoring the Python data processing scripts to use polars and a more efficient method for creating sparse matrices. A new feature for selecting Highly Variable Genes (HVG) has been added across the stack (backend, model, UI), which further enhances performance and analytical focus. The build system and dependencies have also been substantially updated for better maintainability. My review focuses on ensuring the new logic is robust and consistent. I've identified a couple of minor issues: a misleading log message and an opportunity for further memory optimization by using a consistent data type. I also noted a potential UI improvement for a more robust user experience. Overall, this is a high-quality contribution that significantly improves the block.
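To make the HVG idea concrete, here is a deliberately simplified sketch that ranks genes by variance on a sparse matrix and keeps the top `n_top`. It is not the PR's implementation: scanpy's `sc.pp.highly_variable_genes` (with `n_top_genes`) uses dispersion-based flavors rather than raw variance, and the function `select_top_variable_genes` and its parameters are hypothetical. The `n_top > n_pcs` check mirrors the UI constraint discussed in the MainPage.vue review comment.

```python
import numpy as np
import scipy.sparse as sp


def select_top_variable_genes(X, gene_names, n_top, n_pcs):
    """Keep the n_top highest-variance genes (simplified HVG stand-in, hypothetical)."""
    if n_top <= n_pcs:
        # Mirrors the UI rule that the HVG count must exceed the number of PCs
        raise ValueError("n_top must be greater than n_pcs")
    X = sp.csr_matrix(X)
    mean = np.asarray(X.mean(axis=0)).ravel()
    mean_sq = np.asarray(X.power(2).mean(axis=0)).ravel()
    var = mean_sq - mean ** 2  # population variance per gene, without densifying X
    top = np.argsort(-var, kind="stable")[:n_top]  # highest variance first
    keep = np.sort(top)  # restore the original gene order
    return X[:, keep], np.asarray(gene_names)[keep]


subset, kept = select_top_variable_genes(
    [[0, 1, 10], [0, 3, 0], [0, 5, 10]], ["g0", "g1", "g2"], n_top=2, n_pcs=1
)
print(list(kept))
```

Computing the variance as E[x²] − E[x]² keeps the matrix sparse; densifying a large cells × genes matrix just to rank genes would undo the memory savings this PR is after.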

gemini-code-assist · 2026-01-13T10:22:57Z

software/src/batch_correction.py

This log message refers to ComBat correction, but the script has been updated to only use Harmony and no longer performs ComBat correction. This message is misleading and should be removed to avoid confusion.

gemini-code-assist · 2026-01-13T10:22:58Z

software/src/calculate_dim_reduction.py

The sparse matrix is being created with dtype=np.float64. While this provides higher precision, scanpy and many single-cell analysis tools typically operate on np.float32 to conserve memory, which is particularly important for large datasets. The accompanying batch_correction.py script correctly uses np.float32. For consistency and to align with the performance goals of this PR, it would be better to use np.float32 here as well.

Suggested change

- dtype=np.float64
+ dtype=np.float32
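A quick way to check the memory argument behind this suggestion, using a hypothetical toy matrix rather than real data: converting the values to `np.float32` halves the size of the CSR `data` array while leaving the sparsity pattern, and therefore the `indices` and `indptr` arrays, untouched. In the script itself the change is just the `dtype` argument shown in the suggestion; `astype` is used here only to compare both versions side by side, and `sparse_nbytes` is a hypothetical helper.

```python
import numpy as np
import scipy.sparse as sp


def sparse_nbytes(X: sp.csr_matrix) -> int:
    """Bytes used by a CSR matrix's values, column indices, and row pointers."""
    return X.data.nbytes + X.indices.nbytes + X.indptr.nbytes


# Hypothetical toy matrix: 1,000 cells x 2,000 genes with about 5% non-zeros
rng = np.random.default_rng(0)
dense = rng.random((1000, 2000))
dense[dense < 0.95] = 0.0
X64 = sp.csr_matrix(dense)    # float64 values, as in the original script
X32 = X64.astype(np.float32)  # same sparsity pattern, 4-byte values

print("float64:", sparse_nbytes(X64), "bytes")
print("float32:", sparse_nbytes(X32), "bytes")
```

Because the index arrays keep their size, the total saving is somewhat less than 50%, but the value array, which dominates for dense-ish count matrices, is exactly halved.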

gemini-code-assist · 2026-01-13T10:22:58Z

ui/src/pages/MainPage.vue

+          <PlNumberField
+            v-if="app.model.args.hvgEnabled"
+            v-model="app.model.args.hvgCount"
+            label="Number of HVG"
+            :min-value="app.model.args.nPCs + 1"
+            :step="100"
+          />


The min-value for "Number of HVG" is correctly bound to app.model.args.nPCs + 1. However, this enforces the minimum only at the UI level: if a user sets hvgCount and then increases nPCs to hvgCount or greater, the model value of hvgCount becomes invalid but is not adjusted automatically. This could lead to downstream errors.

For a more robust user experience, consider adding a watcher on app.model.args.nPCs to automatically adjust hvgCount if it becomes invalid.

import { watch } from 'vue';

// When nPCs changes, bump hvgCount so it stays strictly greater than nPCs
watch(
  () => app.model.args.nPCs,
  (newNPCs) => {
    if (app.model.args.hvgEnabled && app.model.args.hvgCount <= newNPCs) {
      app.model.args.hvgCount = newNPCs + 1;
    }
  },
);

aiturbidemil and others added 12 commits December 18, 2025 09:55

SDK update

f3e7821

DimRed script update

b16bb8f

Load repetitive string columns as categorical to save memory (huge impact)

1442eb9

Improve dimensionality reduction from sparse matrix

57d31c2

Run batch correction on sparse data

1072f98

Make sparse results match non-sparse approach (not taking into account HVG)

7b2ca48

Improve reproducibility with non sparse version

85975a7

Implement user selected HVG

d384529

Update SDK and fix resulting issues

88e75e0

Update graph-maker

9bb1494

Migrate code to latest recommended layout

e28ca08

Changeset

a04ff01

julenmendieta requested a review from aiturbidemil January 13, 2026 10:20

gemini-code-assist bot reviewed Jan 13, 2026


Minor corrections after review

e86bcf8

julenmendieta added this pull request to the merge queue Feb 16, 2026

Merged via the queue into main with commit 822950a Feb 16, 2026
8 checks passed

julenmendieta deleted the 2025-12-16-improve-performance branch February 16, 2026 08:57

julenmendieta merged 13 commits into main from 2025-12-16-improve-performance


2 participants
