Skip to content

Idea: create an R script for the bipca algorithm #29

@hsuknowledge

Description

@hsuknowledge

Greetings, I have written an R script to replicate the python program. Here is my gist that I believe captures the core process from variance fitting, MP fitting, biwhitening, to the eventual denoising.

One thing I do differently, is not using Chebyshev polynomial fitting to find the (q, KS) minimum. The problem is it appears to not converge if I let it decide the number of terms from 5 up to 4097. Pychebfun will keep adding new terms if those coefficients don't fall below machine precision. Last time I checked, it continued well past 1025 terms.

Though 65 polynomial terms do approximate well, I can apply Brent's method to search for minimum much more quickly. It consistently uses fewer points to approach a lowered tolerance level (10-9). With this efficiency upgrade, using the whole matrix becomes feasible. My largest scRNAseq sample reaches 15300 genes x 19500 cells and can be done in around 1 hour when Brent's method concludes after 30-40 points, compared to 0.5 hour using bipca-python which performs subsampling that exchanges precision for time saving. On the other hand, medium samples like 17500 x 6000 take only 12 minutes beating the 20+ minutes with bipca-python, while small samples of 2000 cells are less than 3 minutes in run time.

May 1, 2025: I've finally sorted out the main circuit, so it now generates results correctly.
May 3, 2025: Replaced my improvised search method by stats::optimize() aka Brent's method.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions