-
Notifications
You must be signed in to change notification settings - Fork 0
Description
Greetings, I have written an R script to replicate the python program. Here is my gist that I believe captures the core process from variance fitting, MP fitting, biwhitening, to the eventual denoising.
One thing I do differently, is not using Chebyshev polynomial fitting to find the (q, KS) minimum. The problem is it appears to not converge if I let it decide the number of terms from 5 up to 4097. Pychebfun will keep adding new terms if those coefficients don't fall below machine precision. Last time I checked, it continued well past 1025 terms.
Though 65 polynomial terms do approximate well, I can apply Brent's method to search for minimum much more quickly. It consistently uses fewer points to approach a lowered tolerance level (10-9). With this efficiency upgrade, using the whole matrix becomes feasible. My largest scRNAseq sample reaches 15300 genes x 19500 cells and can be done in around 1 hour when Brent's method concludes after 30-40 points, compared to 0.5 hour using bipca-python which performs subsampling that exchanges precision for time saving. On the other hand, medium samples like 17500 x 6000 take only 12 minutes beating the 20+ minutes with bipca-python, while small samples of 2000 cells are less than 3 minutes in run time.
May 1, 2025: I've finally sorted out the main circuit, so it now generates results correctly.
May 3, 2025: Replaced my improvised search method by stats::optimize() aka Brent's method.