@mbuttner

This PR fixes the issue reported in aertslab/scenicplus#61.
How it's done: when the export_pseudobulk function loads the fragments as a DataFrame, scaffold chromosomes are filtered out by matching the chromosome names against a regular expression with the pandas pd.Series.str.contains() function. To control this, I added a new parameter to export_pseudobulk, chrom_filter, which defaults to None.
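
To illustrate the idea, here is a minimal sketch of regex-based chromosome filtering on a toy fragments table. It is not the code from this PR: the helper name `filter_chromosomes`, the `chrom_col` argument and the column names are assumptions made for the example, as is the choice to skip filtering when the pattern is None.

```python
import pandas as pd


def filter_chromosomes(fragments, chrom_filter=None, chrom_col="Chromosome"):
    """Drop rows whose chromosome name matches the regex `chrom_filter`.

    Hypothetical helper for illustration; not the pycisTopic implementation.
    """
    if chrom_filter is None:
        return fragments  # no pattern given, nothing is filtered
    # str.contains performs a regex search, so "GL|KI" matches any name
    # that contains "GL" or "KI" anywhere in the string.
    is_filtered = fragments[chrom_col].astype(str).str.contains(chrom_filter, regex=True)
    print(f"Filtering out {int(is_filtered.sum())} fragments.")
    return fragments.loc[~is_filtered].reset_index(drop=True)


# Toy fragments table; column names and values are made up for the example.
fragments = pd.DataFrame({
    "Chromosome": ["chr1", "GL000194.1", "chr2", "KI270711.1", "chrX"],
    "Start": [100, 200, 300, 400, 500],
    "End": [150, 260, 340, 480, 590],
})

filtered = filter_chromosomes(fragments, chrom_filter="GL|KI")
print(filtered["Chromosome"].tolist())  # ['chr1', 'chr2', 'chrX']
```

Because the pattern is searched anywhere in the name, a short pattern can match more than intended; anchoring it (for example "^(GL|KI)") restricts the match to names that start with those prefixes.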

Example following the SCENIC+ tutorial for 10X multiome data:

import os

from pycisTopic.pseudobulk_peak_calling import export_pseudobulk
bw_paths, bed_paths = export_pseudobulk(input_data = cell_data,
                 variable = 'celltype',                                                                     # variable by which to generate pseudobulk profiles, in this case we want pseudobulks per celltype
                 sample_id_col = 'sample_id',
                 chromsizes = chromsizes,
                 bed_path = os.path.join(work_dir, 'scATAC/consensus_peak_calling/pseudobulk_bed_files/'),  # specify where pseudobulk_bed_files should be stored
                 bigwig_path = os.path.join(work_dir, 'scATAC/consensus_peak_calling/pseudobulk_bw_files/'),# specify where pseudobulk_bw_files should be stored
                 path_to_fragments = fragments_dict,                                                        # location of fragment files
                 chrom_filter = "GL|KI",                                                                    # regular expression; fragments on chromosomes matching it (here scaffolds) are filtered out
                 n_cpu = 8,                                                                                 # specify the number of cores to use, we use ray for multi processing
                 normalize_bigwig = True,
                 remove_duplicates = True,
                 _temp_dir = tmp_dir,
                 split_pattern = '-')

Output:

2023-08-22 11:16:41,366 cisTopic     INFO     Reading fragments from ../atac_fragments.tsv.gz
2023-08-22 11:19:37,550 cisTopic     INFO     Filtering out 33056 fragments.
2023-08-22 11:20:42,732	INFO worker.py:1627 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265/ 
(export_pseudobulk_ray pid=3011257) 2023-08-22 11:20:46,836 cisTopic     INFO     Creating pseudobulk for CT1
(export_pseudobulk_ray pid=3011259) 2023-08-22 11:20:46,829 cisTopic     INFO     Creating pseudobulk for CT2
(export_pseudobulk_ray pid=3011259) 2023-08-22 11:20:47,958 cisTopic     INFO     Creating pseudobulk for CT3
(export_pseudobulk_ray pid=3011259) 2023-08-22 11:20:50,278 cisTopic     INFO     CT3 done!
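
Since the pattern decides which fragments are dropped, it can be useful to preview which chromosome names it would match before running the export. The sketch below uses Python's `re.search`, which has the same "match anywhere" behaviour as `pd.Series.str.contains`; the helper `matching_chromosomes` and the list of names are illustrative assumptions, not part of pycisTopic.

```python
import re


def matching_chromosomes(names, pattern):
    """Return the chromosome names that the regex `pattern` matches anywhere."""
    rx = re.compile(pattern)
    return [name for name in names if rx.search(name)]


# Illustrative chromosome names; in practice they could be taken from the
# chromosome sizes table or from the fragments file.
names = ["chr1", "chrX", "GL000194.1", "KI270711.1", "chrUn_GL000195v1"]

# Unanchored pattern: also matches names that merely contain "GL" or "KI".
print(matching_chromosomes(names, "GL|KI"))
# ['GL000194.1', 'KI270711.1', 'chrUn_GL000195v1']

# Anchored pattern: only names that start with "GL" or "KI".
print(matching_chromosomes(names, "^(GL|KI)"))
# ['GL000194.1', 'KI270711.1']
```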

Thank you for considering.
